diff --git a/doc/HandWrittenNotes/2025/FYSSTKweek37.pdf b/doc/HandWrittenNotes/2025/FYSSTKweek37.pdf new file mode 100644 index 000000000..caf451ded Binary files /dev/null and b/doc/HandWrittenNotes/2025/FYSSTKweek37.pdf differ diff --git a/doc/HandWrittenNotes/2025/FYSSTKweek38.pdf b/doc/HandWrittenNotes/2025/FYSSTKweek38.pdf new file mode 100644 index 000000000..ba905a254 Binary files /dev/null and b/doc/HandWrittenNotes/2025/FYSSTKweek38.pdf differ diff --git a/doc/HandWrittenNotes/2025/FYSSTKweek39.pdf b/doc/HandWrittenNotes/2025/FYSSTKweek39.pdf new file mode 100644 index 000000000..d6a57bb16 Binary files /dev/null and b/doc/HandWrittenNotes/2025/FYSSTKweek39.pdf differ diff --git a/doc/HandWrittenNotes/2025/FYSSTKweek40.pdf b/doc/HandWrittenNotes/2025/FYSSTKweek40.pdf new file mode 100644 index 000000000..491d4d210 Binary files /dev/null and b/doc/HandWrittenNotes/2025/FYSSTKweek40.pdf differ diff --git a/doc/HandWrittenNotes/2025/FYSSTKweek41.pdf b/doc/HandWrittenNotes/2025/FYSSTKweek41.pdf new file mode 100644 index 000000000..9199be871 Binary files /dev/null and b/doc/HandWrittenNotes/2025/FYSSTKweek41.pdf differ diff --git a/doc/HandWrittenNotes/2025/FYSSTKweek42.pdf b/doc/HandWrittenNotes/2025/FYSSTKweek42.pdf new file mode 100644 index 000000000..0b1000252 Binary files /dev/null and b/doc/HandWrittenNotes/2025/FYSSTKweek42.pdf differ diff --git a/doc/HandWrittenNotes/2025/FYSSTKweek43.pdf b/doc/HandWrittenNotes/2025/FYSSTKweek43.pdf new file mode 100644 index 000000000..4186ab06a Binary files /dev/null and b/doc/HandWrittenNotes/2025/FYSSTKweek43.pdf differ diff --git a/doc/HandWrittenNotes/2025/FYSSTKweek44.pdf b/doc/HandWrittenNotes/2025/FYSSTKweek44.pdf new file mode 100644 index 000000000..0b17ce9e4 Binary files /dev/null and b/doc/HandWrittenNotes/2025/FYSSTKweek44.pdf differ diff --git a/doc/HandWrittenNotes/2025/FYSSTweek45.pdf b/doc/HandWrittenNotes/2025/FYSSTweek45.pdf new file mode 100644 index 000000000..09fefba30 Binary files /dev/null and b/doc/HandWrittenNotes/2025/FYSSTweek45.pdf differ diff --git a/doc/LectureNotes/.DS_Store b/doc/LectureNotes/.DS_Store index 510782495..cb7504035 100644 Binary files a/doc/LectureNotes/.DS_Store and b/doc/LectureNotes/.DS_Store differ diff --git a/doc/LectureNotes/.ipynb_checkpoints/exercisesweek34-checkpoint.ipynb b/doc/LectureNotes/.ipynb_checkpoints/exercisesweek34-checkpoint.ipynb new file mode 100644 index 000000000..50e773ede --- /dev/null +++ b/doc/LectureNotes/.ipynb_checkpoints/exercisesweek34-checkpoint.ipynb @@ -0,0 +1,315 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "232d1306", + "metadata": {}, + "source": [ + "# Exercises week 34\n", + "\n", + "## Coding Setup and Linear Regression" + ] + }, + { + "cell_type": "markdown", + "id": "9b66a351", + "metadata": {}, + "source": [ + "Welcome to FYS-STK3155/4155!\n", + "\n", + "In this first week will focus on getting you set up with the programs you are going to be using throughout this course. We expect that many of you will encounter some trouble with setting these programs up, as they can be extremely finnicky and prone to not working the same on all machines, so we strongly encourage you to not get discouraged, and to show up to the group-sessions where we can help you along. The group sessions are also the best place to find group partners for the projects and to be challenged on your understanding of the material, which are both essential to doing well in this course. We strongly encourage you to form groups of 2-3 participants. 
\n", + "\n", + "If you are unable to complete this week's exercises, don't worry, this will likely be the most frustrating week for many of you. You have time to get back on track next week, especially if you come to the group-sessions! Note also that this week's set of exercises does not count for the additional score. The deadline for the weekly exercises is set to Fridays, at midnight." + ] + }, + { + "cell_type": "markdown", + "id": "36d8750b", + "metadata": {}, + "source": [ + "### Learning goals\n", + "\n", + "After completing these exercises, you will know how to\n", + "\n", + "- Create and use a Github repository\n", + "- Set up and use a virtual environment in Python\n", + "- Fit an OLS model to data using scikit-learn\n", + "- Fit a model on training data and evaluate it on test data\n", + "\n", + "### Deliverables\n", + "\n", + "Complete the following exercises while working in a jupyter notebook. Exercises 1,2 and 3 require no writing in the notebook. Then, in canvas, include\n", + "- The jupyter notebook with the exercises completed\n", + "- An exported PDF of the notebook (https://code.visualstudio.com/docs/datascience/jupyter-notebooks#_export-your-jupyter-notebook)\n", + "- Optional: A link to your github repository, which must be set to public, include the notebook file, a README file, requirements file and gitignore file.\n", + "\n", + "We require you to deliver a jupyter notebook so that we can evaluate the results of your code without needing to download and run the code of every student, as well as to teach you to use this useful tool." + ] + }, + { + "cell_type": "markdown", + "id": "2a9c7ef8", + "metadata": {}, + "source": [ + "## Exercise 1 - Github Setup\n" + ] + }, + { + "cell_type": "markdown", + "id": "1498aed1", + "metadata": {}, + "source": [ + "In this course, we require you to pay extra mind to the reproducibility of your results and the shareability of your code. The first step toward these goals is using a version control system like git and online repository like Github.\n", + "\n", + "**a)** Download git if you don't already have it on your machine, check with the terminal command ´git --version´ (https://git-scm.com/downloads).\n", + "\n", + "**b)** Create a Github account(https://github.com/), or log in to github with your UiO account (https://github.uio.no/login).\n", + "\n", + "**c)** Learn the basics of opening the terminal and navigating folders on your operating system. Things to learn: Opening a terminal, opening a terminal in a specific folder, listing the contents of the current folder, navigating into a folder, navigating out of a folder.\n", + "\n", + "**d)** Download the Github CLI tool and run ´gh auth login´ in your terminal to authenticate your local machine for some of the later steps. (https://github.com/cli/cli#installation). You might need to change file permissions to make it work, ask us or ChatGPT for help with these issues.\n", + "\n", + "**e)** As an alternative to the above terminal based instructions, you could install GitHub Desktop (see https://desktop.github.com/download/) or if you prefer GitLab, GitLab desktop (see https://about.gitlab.com/install/). This sets up all communications between your PC/Laptop and the repository. This allows you to combine exercises 1 and 2 in an easy way if you don't want to use terminarl. Keep in mind that these GUIs (graphical user interfaces) are not text editors." 
+ ] + }, + { + "cell_type": "markdown", + "id": "c56fbefa", + "metadata": {}, + "source": [ + "## Exercise 2 - Setting up a Github repository\n" + ] + }, + { + "cell_type": "markdown", + "id": "fb9b8acd", + "metadata": { + "vscode": { + "languageId": "plaintext" + } + }, + "source": [ + "**a)** Create an empty repository for your coursework in this course in your browser at github.com (or uio github).\n", + "\n", + "**b)** Open a terminal in the location you want to create your local folder for this repository, like your desktop.\n", + "\n", + "**c)** Clone the repository to your laptop using the terminal command ´gh repo clone username/repository-name´. This creates a folder with the same name as the repository. Moving it or renaming it might require some extra steps.\n", + "\n", + "**d)** Download this jupyter notebook. Add the notebook to the local folder.\n", + "\n", + "**e)** Run the ´git add .´ command command in a terminal opened in the local folder to stage the current changes in the folder to be commited to the version control history. Run ´git status´ to see the staged files.\n", + "\n", + "**f)** Run the ´git commit -m \"Adding first weekly assignment file\"´ command to commit the staged changes to the version control history. Run ´git status´ to see that no files are staged.\n", + "\n", + "**g)** Run the ´git push\" command to upload the commited changes to the remote repository on Github.\n", + "\n", + "**h)** Add a file called README.txt to the repository at Github.com. Don't do this in your local folder. Add a suitable title for your repository and some inforomation to the file.\n", + "\n", + "**i)** Run the ´git fetch origin´ command to fetch the latest remote changes to your repository.\n", + "\n", + "**j)** Run the ´git pull´ command to download and update files to match the remote changes.\n" + ] + }, + { + "cell_type": "markdown", + "id": "f84d0db6", + "metadata": {}, + "source": [ + "## Exercise 3 - Setting up a Python virtual environment\n" + ] + }, + { + "cell_type": "markdown", + "id": "b5a4818a", + "metadata": {}, + "source": [ + "Following the themes from the previous exercises, another way of improving the reproducibility of your results and shareability of your code is having a good handle on which python packages you are using.\n", + "\n", + "There are many ways to manage your packages in Python, and you are free to use any approach you want, but in this course we encourage you to use something called a virtual environment. A virtual environemnt is a folder in your project which contains a Python runtime executable as well as all the packages you are using in the current project. In this way, each of your projects has its required set of packages installed in the same folder, so that if anything goes wrong while managing your packages it only affects the one project, and if multiple projects require different versions of the same package, you don't need to worry about messing up old projects. Also, it's easy to just delete the folder and start over if anything goes wrong.\n", + "\n", + "Virtual environments are typically created, activated, managed and updated using terminal commands, but for now we recommend that you let for example VS Code (a popular cross-paltform package) handle it for you to make the coding experience much easier. 
If you are familiar with another approach for virtual environments that works for you, feel free to keep doing it that way.\n" + ] + }, + { + "cell_type": "markdown", + "id": "0f6de364", + "metadata": {}, + "source": [ + "**a)** Open this notebook in VS Code (https://code.visualstudio.com/Download). Download the Python and Jupyter extensions.\n", + "\n", + "**b)** Press ´Cmd + Shift + P´, then search and run ´Python: Create Environment...´\n", + "\n", + "**c)** Select ´Venv´\n", + "\n", + "**d)** Choose the most up-to-date version of Python your have installed.\n", + "\n", + "**e)** Press ´Cmd + Shift + P´, then search and run ´Python: Select Interpreter´\n", + "\n", + "**f)** Selevet the (.venv) option you just created.\n", + "\n", + "**g)** Open a terminal in VS Code, the venv name should be visible at the beginning of the line. Run `pip list` to see that there are no packages install in the environment.\n", + "\n", + "**h)** In this terminal, run `pip install matplotlib numpy scikit-learn`. This will install the listed packages.\n", + "\n", + "**i)** To make these installations reproducible, which is important for reproducing results and sharing your code, run ´pip freeze > requirements.txt´ to create the file requirements.txt with all your dependencies.\n", + "\n", + "Now, anyone who wants to recreate your package setup can download your requirements.txt file and run ´pip install -r requirements.txt´ to install the correct packages and versions. To keep the requirements.txt file up to date with your environment, you will need to re-run the freeze command whenever you install a new package.\n", + "\n", + "**j)** Create a .gitignore file at the root of your project folder, and add the line ´.venv´ to it. This way, you won't try to upload a copy of all your python packages when you regularly push your changes to Github. Ignored files should not show up when you run ´git status´, and are not staged when running ´git add .´, try it!" + ] + }, + { + "cell_type": "markdown", + "id": "5d184ab1", + "metadata": {}, + "source": [ + "## Exercise 3 - Fitting an OLS model to data\n" + ] + }, + { + "cell_type": "markdown", + "id": "d19ebd67", + "metadata": {}, + "source": [ + "Great job on getting through all of that! Now it is time to do some actual machine learning!\n", + "\n", + "**a)** Complete the code below so that you fit a second order polynomial to the data. You will need to look up some scikit-learn documentation online (look at the imported functions for hints).\n", + "\n", + "**b)** Compute the mean square error for the line model and for the second degree polynomial model." 
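As an illustration only (not the required solution), the sketch below shows one possible way to combine the imported scikit-learn tools — `PolynomialFeatures`, `LinearRegression` and `mean_squared_error` — to fit a second-degree polynomial and compute its MSE. It reuses the same data-generating line as the code cell below; the name `X_poly` and the choice `include_bias=False` (since `LinearRegression` already fits an intercept) are illustrative assumptions, not part of the exercise template.

```python
# Minimal sketch: second-degree polynomial fit with scikit-learn.
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

n = 100
x = np.random.rand(n, 1)
y = 2.0 + 5 * x**2 + 0.1 * np.random.randn(n, 1)

# Design matrix with columns [x, x^2]; include_bias=False because
# LinearRegression fits the intercept itself.
X_poly = PolynomialFeatures(degree=2, include_bias=False).fit_transform(x)

poly_model = LinearRegression().fit(X_poly, y)
poly_predict = poly_model.predict(X_poly)
poly_mse = mean_squared_error(y, poly_predict)
print(f"Second-degree polynomial MSE: {poly_mse:.4f}")
```

The same pattern applies to the straight-line model in the cell below: predict on the training inputs and pass targets and predictions to `mean_squared_error`.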
+ ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "b58fb9bf", + "metadata": {}, + "outputs": [], + "source": [ + "import numpy as np\n", + "import matplotlib.pyplot as plt\n", + "from sklearn.preprocessing import PolynomialFeatures # use the fit_transform method of the created object!\n", + "from sklearn.linear_model import LinearRegression\n", + "from sklearn.metrics import mean_squared_error" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "0208e9ca", + "metadata": {}, + "outputs": [ + { + "data": { + "image/png": "iVBORw0KGgoAAAANSUhEUgAAAhYAAAGdCAYAAABO2DpVAAAAOnRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjEwLjUsIGh0dHBzOi8vbWF0cGxvdGxpYi5vcmcvWftoOwAAAAlwSFlzAAAPYQAAD2EBqD+naQAASZ5JREFUeJzt3QuczPX++PH37GItdpclLVIuiSSV08U1KUVJOnXSr4vLOV1Jkn+FOn6ojqXboZtup1Kd8lMhKioJSeIkHZduWJdEInZdF7vzf7y/s7PNzs7l+535zv31fDzmrJ35zsx3J8f3vZ/P++JwOp1OAQAAsEGaHS8CAACgCCwAAIBtCCwAAIBtCCwAAIBtCCwAAIBtCCwAAIBtCCwAAIBtCCwAAIBtqkiUlZaWyi+//CJZWVnicDii/fYAACAE2k9z37590rBhQ0lLS4ufwEKDisaNG0f7bQEAgA22bt0qJ5xwgj2BRZMmTWTz5s2V7h88eLA888wzpl5DVyrcJ5adnW3l7QEAQIwUFRUZCwPu67gtgcWKFSukpKSk/Ps1a9bIxRdfLNdcc43p13Bvf2hQQWABAEBiCZbGYCmwOO644yp8P2HCBGnevLl07do1tLMDAABJJeQciyNHjsgbb7whw4cPDxi9FBcXGzfPpRQAAJCcQi43nTVrluzdu1cGDhwY8Lj8/HzJyckpv5G4CQBA8nI4tX4kBD169JBq1arJnDlzAh7na8VCg4vCwkK/ORaax3H06NFQTgsJJj09XapUqULpMQDEOb1+6wJBoOt3yFshWhkyf/58mTFjRtBjMzIyjJtZ+/fvl59//tmol0VqqFGjhjRo0MAIVAEAiS2kwOKVV16R+vXrS69evWw9GV2p0KBCLzSaKMpvsclNg0fN1fntt9+koKBAWrRoEbDpCgAgCQML7ZypgcWAAQOMJWw76faHXmw0qMjMzLT1tRGf9L9z1apVjVUwDTKqV68e61MCAITB8q+HugWyZcsW+dvf/iaRwkpFamGVAgCSh+Ulh0suuYT8BwAA4kxJqVOWF/wuO/cdlvpZ1eXcprmSnhb9X9SjPisEAADYG0x8sm6HzFr1i/x+4Ej5Yw1yqsuY3q2lZ5sGEk0EFgAAJKB5a7bLuDnrZHvhYZ+P7yg8LIPeWClTbmwX1eCCzW0baJMwzQvRmyYiHn/88cYMlZdfftlIdjXr1Vdfldq1a0f0XAEAyRFUDHpjpd+gQrmTFjT40JWNaEnKwEI/wC837Jb3Vm0zvkbjA+3Zs6ds375dNm3aJHPnzpVu3brJXXfdJZdffrkcO3Ys4u8PAEhuJWXXtpnfbJP7Z64uDxwC0WM0+NDtkmipkgpLQ9HYZ9ImYHl5ecafGzVqJO3atZP27dvLRRddZKxE3HzzzfLEE08YpbobN26U3Nxc6d27tzzyyCNSq1YtWbhwofz1r3+tUBUzZswYGTt2rLz++usyefJk+eGHH6RmzZpy4YUXyqRJk4xeIgCA5DcvyLZHMJrQGS1pqbA05N5n0sejSQOAM844o7xDqZZVPvnkk7J27VqZOnWqLFiwQO677z7jsY4dOxrBgrZJ1ZUPvd1zzz3l/T0eeugh+fbbb40ZLboqEmxGCwAgdbY9gtEqkWipkkxLRBrN+Voa0vt0DUAfv7h1XlTLb1q1aiX//e9/jT8PGzas/P4mTZrIww8/LLfffrs8++yzRjtr7cGuqxXulQ83z54hzZo1M4KTc845x2h/rqsdAIDkVBLg2maGXu3yclylp9GSNCsWun8ULIkl2vtMxvs6neVbG9pcTLdGdKskKytL+vXrJ7t375aDBw8GfI2vv/7a2DY58cQTjed17drVuF8blQEAktfyINe2QNy/QmsqQDR/oU6awMLs/lE095nUd999J02bNjW2LzSRs23btvLuu+8awcIzzzxjHKOtrP05cOCAMUlWt0j+/e9/y4oVK2TmzJlBnwcASHw7w7hm6UpFtEtNk2orxOz+UTT3mTSHYvXq1XL33XcbgYSWnj7++OPlLaynT59e4XjdDtFBbJ6+//57Y1VjwoQJxrh59Z///CdqPwMAIHbq1TI/HdzTkG7N5e6LW8ak82bSrFjo/pFWf/j7CPX+BhHcZyouLpYdO3bItm3bZOXKlTJ+/Hjp06ePsUrRv39/Ofnkk40kzKeeesqoCtFKj+eee67Ca2jeheZNfPrpp7Jr1y5ji0S3PzTgcD9v9uzZRiInACAFOEN7WqeTj4tJUJFUgYV+gLqPpBwx2GeaN2+eNGjQwAgOtKfFZ599ZiRZvvfee5Kenm5Uh2i56cSJE6VNmzbGtkZ+fn6F19DKEE3mvPbaa40Jr1qKql+1XPXtt9+W1q1bGysXjz32WER+BgBAfNl1oNjS8ZH+JdrUOTijPFGsqKjIqH4oLCw08gY8HT58WAoKCoychFDHZ8eqjwVCZ8d/dwBIRl9u2C3XvbjM1LHuX5sjlVcR6PqdlDkWbvphaklpPEx4AwDAjm1+7cfkNJGsGQ+/RCddYKE0iOjQvG6sTwMAgLD9zzknyj/n/1jpfv11WYONv3VqYvxCHS+/RCdlYAEAQLK38c6LkxUKbwQWAADEaRtvp5/H7+7eQoZc2CIuViiStioEAIBUaOPtEJFpK7ZKvCKwAAAgjiyP0xEVZhFYAAAQR3bG6YgKswgsAACII/XjcESFFSRvAgAQ5RyK5QF6LQXrXRGLUehWsGIRBTo2fdasWZIsxo4dK2eeeabp43Wyq34Gq1atiuh5AUAiVHt0nrjA6KZ517RVxlf9Xu/37l3hL3lT77/ijAZxWRGSvCsWpSUim5eK7P9VpNbxIid1FElLj9jbDRw4UPbu3es3eNi+fbvUqVMnYu8PAIh//kpIdWVC79dW3CpQ7wq3FxYXyFkn1om7HhbJGVismy0yb4RI0S9/3JfdUKTnRJHWV8TklPLy8mLyvgCA+C8hdZZtb4yasVr2HDxq+jX19bTjZrytXKQlXV
AxvX/FoEIVbXfdr4/HeCvEvS0wY8YM6datm9SoUcOYfPrll19WeM6SJUukS5cukpmZKY0bN5ahQ4fKgQMHgm5PvPzyy8ao9Vq1asngwYOlpKTEmJKqwU39+vXlH//4R4XnbdmyxRjvrsfrUJm+ffvKr7/+WuEYnah6/PHHS1ZWltx0003G0DBvL730kpx66qnGELFWrVrJs88+G+anBgCpVUK6x0JQEc8lp2lJtf2hKxV+40FdhxrpOi4OPPDAA3LPPfcYeQennHKKXHfddXLs2DHjsQ0bNhij16+++mr573//K//3f/9nBBpDhgwJ+Jr6vLlz5xoj3N966y3517/+Jb169ZKff/5ZFi1aZIxs//vf/y5fffWVcXxpaakRVPz+++/G45988ols3LjRGNvuNn36dCNoGT9+vPznP/8xRsN7Bw06Av5///d/jaDlu+++M44dPXq0TJ06NSKfHQAkmp0RKg2Nx5LT5NkK0ZwK75WKCpwiRdtcxzXtIrGmQYVe9NW4cePktNNOk/Xr1xu/7efn58sNN9wgw4YNMx5v0aKFPPnkk9K1a1eZMmWK39HiGijoioWuLLRu3dpYEfnhhx/kww8/lLS0NGnZsqURXHz22Wdy3nnnyaeffiqrV682Rpbrqoh67bXXjHNZsWKFnHPOOTJp0iRjlUJv6uGHH5b58+dXWLUYM2aMPP7443LVVVcZ3+v483Xr1snzzz8vAwYMiPhnCQDxrn6ESkPjseQ0eVYsNFHTzuMirG3btuV/1lUAtXPnTuPrt99+K6+++qqxPeG+9ejRwwgcNAjwp0mTJkZQ4abbFxpgaFDheZ/7fXR1QQMKd1Ch9PjatWsbj7mP0SDEU4cOHcr/rNszulKigYfn+WoAovcDAET+dFIdsTMVQl+qQZyWnCbPioVWf9h5XIRVrVq1/M+ac6E0cFD79++X2267zcir8Kb5E2Ze0/26vu5zv48d9FzViy++WCkASU+PXCUOACSSrzfvkVJ/9aMWueMTnWwab4mbyRVYaEmpVn9ooqa/liL6uB4X59q1a2dsJZx88skRfR9Ntty6datxc69a6Ptq6ayuXLiP0ZyM/v37lz9v2bJlFVZAGjZsaORm6PYNACC8XAgNFQLFIPE6Lj35AgvtU6ElpVr9Uek/S1lE13NCxPpZFBYWVmoAVbdu3QrbDGaNGDFC2rdvbyRr3nzzzVKzZk3jgq/JlU8//bRt59y9e3c5/fTTjYBAcyk0eVQrSTSX4+yzzzaOueuuu4w+Hfp9p06djETNtWvXSrNmzcpfR3NEdHUlJyfHSDotLi42Ej337Nkjw4cPt+18ASBR1TeZC6Hj0HVyqWcFSV52hlx37onSpF5Nn506403yBBZK+1T0fc1PH4sJEe1jsXDhQjnrrLMq3Kd5B1qGGUr+hVZpaOWIlpw6nU5p3rx5hWoNO+i2yHvvvSd33nmnnH/++UYuhgYGTz31VPkx+p6aK3HfffcZCZtaqTJo0CD56KOPyo/R4EfLZh999FG59957jUBIAxZ38ikApLpzTbbpHnJhC+MWqOV3vHM49aoVRUVFRcZvtvobvvZN8KQXLk1O1KoCf5UP8dh5E+Gx7b87ACRA503xvaZudN6M1+2NYNfv5F2xcNMgIg5KSgEAqafEz5AxDRo0ePBu2R3vORNWJWdgAQBAjFYlxnkFDg08Age9aRvuRN7qCIbAAgCAKA0Z69nGNZW0Q/O69p9AnKQBEFgAABCFIWPjIjk0LI4GcMZl580o55MixvjvDSAVhoxtj8TQMF2lWKitFvrFzQDOuAos3J0ajxw5EutTQRQdPHjQ+OrdJRQAkq0B1k47h4atnSXyaAuRheP9HBCbAZxxtRVSpUoVox/Cb7/9ZlxkPGdcIDlXKjSo0NklOp+EFuAAEqXCw/vxn37dF92hYR+PFln6pIkDoz+AM64CC23YpAO5tKfB5s2bY306iBINKvLy8mJ9GgBgqcLD1+P+uBtg2TI0bM0sk0FFbAZwxlVgoapVq2aMCWc7JDXoyhQrFQASrcLj1vObyguLCwLO9LB9aNixIyLLXxD59EHrz43iAM64CyyUboHQgREAEI8VHurFz80FFbY0wNL8iHdvFlk7M8h4MomLAZxxGVgAABCvFR7KzAj0Id1Olk4n1wuvAZZue8y6TeRYGEmfERzA6QuBBQAAEajcaHF8rfAaYZlO0PQju1HEB3D6QmABAEAEKjfqh/o6uvWx6JHwgoquI0W63kfnTQAAYu1PJ9WR3JrV5PcDoRcRpDlE9oTyfG1mNfc+kX3bQ35v6ThUpNsoiRUaRQAA4FEN0vXRz8IKKtw5GHe8udJ4PUtBhXbKDDWoqFFP5C9TRS55SGLJcmCxbds2ufHGG6Vu3bqSmZkpp59+uvznP/+JzNkBABDlElMzfSnMGjdnnVFlYmr7Q2d9WK76UA6RG2aI3POjSJsrJdYsbYXs2bNHOnXqJN26dZO5c+fKcccdJz/99JPUqVMncmcIAEAMS0xD5fSYDxI0iVM7Y3rP+jCr450iLS6SeGEpsJg4caI0btxYXnnllfL7mjZtGvA5xcXFxs2tqKgolPMEACCmJaa2VZkcOyKy4kWRPZtE6jQROeeWEDtjOlxBRYy3PsIKLGbPni09evSQa665RhYtWiSNGjWSwYMHyy233OL3Ofn5+TJu3Dg7zhUAgIiwdThYoOoQLSH98mkRZ6nHfX8XaW1xC6NJF5EbZ4hUqSbxxlKOxcaNG2XKlClGy+2PPvpIBg0aJEOHDpWpU6f6fc6oUaOksLCw/LZ161Y7zhsAANuEWhqaVb1Kectub46y2SLl80HcfSk8gwql36+dIVKtpkcD8AAJmtdMFRn4flwGFZZXLEpLS+Xss8+W8eNdI1rPOussWbNmjTz33HMyYMAAn8/JyMgwbgAAxCu9+GsQoLNArORZXPOnE+SVLzYZ4YAz0HyQY0dcKxWBHDlY9irer1bmgvtFzr8nJr0pIrZioZNHW7duXeG+U089VbZs2WL3eQEAEDV68dcgQFlpvn1x6zyZcmM7Yx6Ip7yc6sb95fNBNKfCe6WiEqfIGTeIZDeo3EGz7+siF4yI+6DC8oqFVoT88MMPFe778ccf5aSTTrL7vAAAiGpVSE5mNflbpyYyc9U2+f3AUdNj0DUo0QBDE0B37jtsbKtUmg+iiZpmZNQUGbbGVSWiCZ06lVQHiCVAQBFSYHH33XdLx44dja2Qvn37yvLly+WFF14wbgAAJGr/Ci019awK0c6bZzbOkQXf/xZ8m6NsxaO8pLS0RGTzkoqBgVZ/mKHHaRDRtIskKofT6bRUtvv+++8bCZnav0JLTYcPHx6wKsSblpvm5OQYiZzZ2dmhnDMAALY2xfK+ELqDh1vPbyqzv91eIehoEGgM+rrZrkZXnj0pdGz5Jf8QefemwNshjnSRB3bEbVKm2eu35cAiWicGAECktz86T1zgt3+Fe7tj0b3d5OvNe/xvc7hXKRY/JrJwvJ9XEpGWl4r88GHgGR9x1pMilOs3Q8gAACkpWFMsd+dMDSr8ds50B
xTLnhU5vDfAKzlEtn8r0mGI61jPlQtdqehwR1wHFVYQWAAAUpLZplh+j9NtjzlDRQ7tMfEqTpGibSKn9BS5aEzlzptxuv0RCgILAEBKMtsUq9JxAbc9gtCETg0idIUiSRFYAABSkpmmWHVrVpM/nVSn4irF3PtCH21e63hJdpbHpgMAkCpNsXYfOCLdHpkvyxfMEpk3SmR6vxCDCoer0ZWWniY5AgsAQMrSklFfnTPdLk37SmYX/03OXTzAlXQZ1ptNSKhGV6FiKwQAIKkeXBw75pQ7p31TYUtkZPqbcluV98Vhpce3L7pSoUFF6yskFRBYAAAk1ZtkDZn2Tfn3VeSY5Fd5Uf6S/nn4L35BYgwOsxOBBQAgpZtkaTtvlSalMqnK03J5+jLx7n9lWWauSO/JKbNK4YnAAgAgqd4kq2faMvln1SmS6Qg8fCyozDoi5w1KuVUKTwQWAICUpc2vbMulSMFtD18ILAAAKavV7gVyRZX3w3uRFEvODIbAAgCQmkpL5JSvx4S+UqFbHq16uXpTpPgqhScCCwBAatq8VBwHd4f23DifRBpLBBYAgNSkczusqlFP5LLHRdpcGYkzSgoEFgCA1GR1bsdpV4lc/RLbHkEQWAAAkqovhZaQarWHTiXVQWM6E8QnzY3IbihS9EuQV3WIdLyTrQ+TCCwAAEnTQVObXWlfCjedXqqDxrRtdyW68tBzosj0/iL+5ps26SJy4wzXqHOYwhAyAEBSBBWD3lgpvxYelPZp6+SKtKXG152FB4379XGftES072uulQvvXIprpooMfJ+gwiJWLAAASdGW+5K05TKm6mvS0PF7+WO/OHPlwaP9Zdyc6nJx6zzf2yIaXGjZ6OalroROzb2ghDRkBBYAgISmORVt9y2WKVUnVXosT36XZ6tOkkH79LgzpUPzur5fRIOIpl0if7IpgK0QAEBC21l0wFipUN4LEu7vx1R93TgOkUdgAQBIaCcfXG1sf/gr/tD7Gzp2G8ch8ggsAAAJ7dSsg7Yeh/AQWAAAElpaVp6txyE8BBYAgMRW1ujKqY2sfDDu1wmkehwijsACAJDYyhpdaVjhHVzo98Y9Otac8tGoILAAACS+skZXjuyKHTYd2vhKG2Dp44gK+lgAAJIDja7iAoEFACB+lJaEFxjQ6CrmCCwAAPFh3WyReSMqThvVrQwdFMZWRsIgxwIAEB9BhU4Z9R5hXrTddb8+joRAYAEAiP32h65U+BxdXnbfvJGu4xD3CCwAALGlORXeKxUVOEWKtrmOQ9wjsAAAxJYmatp5HGKKwAIAEFta/WHncYgpqkIAADEtIS1p3EF2SV05zrnb54TSUqfITkddOa5xB6EjRfwjsAAAxLSEdPnmQnn1SD+ZUnWSEUR4Bhf6vRpzpJ8M3FwoHZrXjfZPAIvYCgEAxLSEdOe+w/JR6bky6Ogw2SG5FQ7dIXWN+/VxPQ7xjxULAEAMSkgdrhLSVr2kflZ1414NHj4pPlvOTfte6ste2Sm1ZXlpKykt+x3YfRziG4EFACCmJaTnNu0sDXKqy47Cw0YQsay0dYUjdWckL6e6nNu04moG4hNbIQCAmJaQpqc5ZExvVzDhnbvp/l4f1+MQ/wgsAAAxLyHt2aaBTLmxnbEy4Um/1/v1cSQGtkIAAPbTklKt/tBETZ95Fg7X43pcGQ0eLm6dJ8sLfjcSNTWnQrc/WKlILAQWAAD7aZ8KLSnV6g9jQ8MzuCgLFHpOqDQSXYMISkoTG1shAIDI0D4VfV8TZ3bFbQynrlT0fY1R6EmKFQsAQMTMKz1HHjo8WRof+ba8hHTr4TNkdOnp0jPWJ4eIILAAAETEvDXbZdAbK41NkG3yRwmpo+iocT9JmcmJrRAAgO1KSp0ybs46v+2xlD6uxyGFA4uxY8eKw+GocGvVqlXkzg4AkJC0smN7of8W3BpO6ON6HFJ8K+S0006T+fPn//ECVdhNAYBUmURqltm5Hsz/SD6WowINJPLy8kwfX1xcbNzcioqKrL4lACBOJpGaZXauB/M/ko/lHIuffvpJGjZsKM2aNZMbbrhBtmzZEvD4/Px8ycnJKb81btw4nPMFAMRwEqlZ2thK53/4a22l9+vjzP9IPg6n02k6c2bu3Lmyf/9+admypWzfvl3GjRsn27ZtkzVr1khWVpbpFQsNLgoLCyU7O9uenwIAEPrWx6YlIm/3Fzm0189BZV0yh622tC3irgoR3+2xqApJMHr91gWCYNdvS4GFt71798pJJ50kTzzxhNx00022nhgAIMIBxeLHRL56NkBA4WXA+yJNu1h6Gw0utPrDM5FTVyp0qBhBRWIxe/0OK/Oydu3acsopp8j69evDeRkAQDStnSUye4hI8b7ITCz1wPyP1BNWYKHbIhs2bJB+/frZd0YAgMj5eLTI0icjO7HUC/M/Uoul5M177rlHFi1aJJs2bZKlS5fKn//8Z0lPT5frrrsucmcIALDHmlkhBhWaY9GowiRSwJYVi59//tkIInbv3i3HHXecdO7cWZYtW2b8GQAQ5zkVHw4P4Yn+J5ECYQcW06ZNs3I4ACBeaNOrg7stP00nkTo0qGASKUyibSYAJNvKRMHnIpuXuGo8tYqjSWdLiZc6vqNQasngo0Nly+GzmEQKSwgsACBZaBOrOUNFDu35477PHxXJzBU573ZTL+FuQDDy6M3yZWkbJpHCMqabAkAyrFIsnCgyvV/FoMLt0O8iC8eLZNYJ+lJ7pJYMOjpMPio91/ieSaSwihULAEj0VYoP7xXZv8PEwY6AKxWzS9rL3ceGSKnX75yek0gpG0UwrFgAQELP9+hnMqgoW7m44H45lFlxkOQuZ7YMOjpU7jo2tFJQ4YlJpDCDFQsASNTtD82nsOjbg7ly1Z7H5Jy076W+7JWdUluWl7YKGFC4MYkUZhBYAEAi0soPX/kUQTz/zUEpkTRZVtra9HN0AyWPSaQwia0QAEhEWk5qUXGNBjJvXzNLz3FnZejQMOZ7wAwCCwBIRCEUaHzbZqSpLQ9PulJBqSmsILAAgERkZXy59rHo+7qUtOxt6S1G9zpVloy4kKAClpBjAQCJSLtpasCglR7+pFcXuW6aSLPzjTkf55Y6pUFOdaN01Ix6WRlsf8AyViwAIB5bcq9+x/VVv/dFB4L1nhz4ta5+UeTkbuXDwzRIuOIM86sPVIEgFKxYAEA89aWYN0Kk6Jc/7stuKNJzou8hYHpf39crPUeTNDWfoiSjk7FK4V51mLdmu7ywuCDoaVAFgnA4nE53Z/joKCoqkpycHCksLJTs7OxovjUAxHmzq/4+sjLLtiL6vuZ/wqiuamxeKt9+971RTqqVH+4kTd360IqOi1vnSeeJC4Jug7g3PkjYRKjXb7ZCACDWNDDQVQefpR5l980b6XdbRPtSTN6QJ30WN5QP951cofJDA4nb31gpTy9Ybyq3IrdmNYIKhIWtEACItc1LK25/VOIUKdrmOs6rGkS3N8bOXis7iooDvsXzizeYOpW/9zqVoAJhIbAAgFjb/2tIx2lQoSPNzexnHzziJwnUS15OprlzAfxgKwQA
Yq3W8ZaP0xHmOsrcSpKcI8hjmo9BwibCRWABALF2UkdX9YffS79DJLuR67gyOsLcbD8KN3cQ4v0utO2GnQgsACDWtM+ElpQGuuz3nFDejyKcEeY3dWpilJJ6om037ESOBQBEglZwbFoiUrBIZO/PIrVPEGlyviv50iNAqNiT4jU/fSwmVCo1DbV5VffWeXJ/r9bGiocGJ/o6uv3BSgXsQmABAHZbO0tk9hCR4n0V7//8cZHMOiK9n/Tf8KpVL1f1hyZqak6Fbn/4CET+dFId0Vig1GSShWfTKw0iOjSvG+pPBwREYAEAdvp4tMjSJ/0/fmiPyPR+ro6ZvoILDSJMDBj7evMeS0GFIocC0UCOBQDYZc2swEGFp7kj/M8BMcFKjgU5FIgmViwAwA4aJHw43Pzx+37x2fBKy0jN5D+YzbHQ0ecDOzVlpQJRQ2ABAKEqm9Fh5EPo7eBua8/30fBKe1N4lpG6Z314rzZowKGP7Sg87LOXhTungqAC0UZgAQB2TSK1yqPhlb8umho46P3eWxkaLGjAoY9p2OD5PHIqEEvkWABAqJNIwwkqshqWN7wK1EXTfZ8+rsd50kBDAw76UiCesGIBALZNIrXg0onlZaTBumjqO+njepx3magGDzoSnb4UiBcEFgBg6yTSIDJzRXpPrlBqarbCw99x9KVAPCGwAIBITCL1VCVT5OSLRc65yWfnTbMVHqF22wSiicACAIJVfHh2wDQ7ibTHeCmtWV++21dD1tc4Xepn1/S7RWG2woPJo0gEBBYAYKbiw5jZMdHVclv/XLTdT56FTiJtKPNq9ZFx7/9QljuxOmDpKBUeSCYOp9MZZgaSNUVFRZKTkyOFhYWSnZ0dzbcGAJMVH/38P66tuJVWhRgqhwHfdJgsV31Wr1LY4Q4L/FVsWOljAUSb2es3gQUAeG5/PHqyyKHfAydf3rte5PsPfKxqNJKSHvnSeXYtv1Ue7m2NJSMu9LkCYbbzJhBtZq/fbIUAgJuOOQ8UVCh9XI/zM4l0ecFe2V64LKTSUUWFBxIdgQUAuBV8bv64Zl19TiKdv26H7UPEgERCYAEgNfmq+jC74+DnOM2R+NcXm0y9BKWjSFYEFgBSj7+qj7PcCZlBnNS50l3uttzBUDqKZEdgASA153x412xo+eiiCSLVaokc2R84edNr+8NMW243fVdKR5HMGEIGIHUEnPOh9zlE0qsFfg1tx+3VOdNKzsTfOjWhdBRJjcACQOoIOufD6ar6uOB+kawGlaeRag8LjxkfoeRM6MAwIJmxFQIgdZIzzc75qNtc5O61vlt6+0FbbsCFwAJA6iRnthto7vkaSPgoJQ0kUFtuKfv+f85pbPr1gETFVgiA5FqlWDjR1ZLbe8tDkzMX5otk1vFfL2rM+WjkWp0IgeZOaLtuXZnw5Z/zf5LOExcYZalAsiKwAJA8qxT/PE1k4Xg/B7jXENzrCd7BRdn3PScE3PIwE1xou+67u5/i83HdKtFVDYILJCsCCwCJvUKxcZHI//V3rVLsC3ax9kjOzPZKztStkr6v+U3OtGraii3+zsCgPS+09wWQbMixAJC4KxSzh4oc3mP9uZqcOWyNpeRMK4L1tAg2LwRI2RWLCRMmiMPhkGHDhtl3RgAQzNpZrhWKUIIK7+TM0//i+mpTUGGlpwXzQpCMQl6xWLFihTz//PPStm1be88IAAJZM0vkHZPVHT6TMxuGnJxpltmeFswLQTIKacVi//79csMNN8iLL74odepohjUARGn7450BfjpnmhRmcqYnzZH4csNueW/VNuOrO2diz4EjEqhjtz6kPS/oaYFkFNKKxR133CG9evWS7t27y8MPPxzw2OLiYuPmVlRUFMpbAkh15e24Q6RlpBpUmEzO1CBBcyB0u0JXFjQI8JzvoVUdmoDpmUuhwcIVZzSQFxYXBA19mBeCZGU5sJg2bZqsXLnS2AoxIz8/X8aNGxfKuQFIdZ7dM/UWsB13AFoFcv49plcqNGgYO3ut7Cj645eivOwMGXvFaUY5qT6uJaPewYMGGc8vLgj42hpLPH3dWcwLQdKyFFhs3bpV7rrrLvnkk0+kenVze4OjRo2S4cOHV1ixaNyY7nMAQuieGeFVCqVBw+1vrKx0vwYZev+z158lD33wXcibMbpbUqdmRojPBpIssPj6669l586d0q5du/L7SkpKZPHixfL0008bWx7p6RV/I8jIyDBuABD2aHMruo4U6XqfpXwK3f4YOWN1wGPuffe/cqC4JPTzohoESc5SYHHRRRfJ6tUV/0/317/+VVq1aiUjRoyoFFQAgOVtD21yNW9keEFFx6Ei3UZZftqyDbtl78GjAY8JN6hQVIMgmVkKLLKysqRNmzYV7qtZs6bUrVu30v0AENVtD5WRJXLF0yKnXRnS07/cuEsiiQmnSAV03gSQ+NseOljsvEGWEjR9M1elUSsj3Vi5CHTG3hNO3a9MNQiSXdiBxcKFC+05EwCpt/VR8LnInKGhBRU9xrs6aNrYjlvbaz/92fqgx93cuZlM/vQnv8HDrec3ldnfbq9QiqorFRpUUA2CZMeKBYAE2/oo65553u22tuFW7ZvVldo1qgbMs6hTo6rceVELadUgq1IfC8/g4b6epwbsgwEkKwILAAm09WHPaHN/9MI/4arTfZabuuVfdbpxnAYPF7fO8xs86FcGjCEVEVgAiN7Wx6YlInPuDD2fQlcqLPalsEoDhudubCdjZ6+THUUVu2p6b2UQPACVEVgAiOOtD4dIjboiPfNFshrYOto8kGCrEQD8I7AAEKdbH2UX8cv/GdEVCn9YjQBCQ2ABIAqDw5xR3/YINkQMQGQQWACIHO2kaXX7Q3tSXDNVpEnnkLc9/E0epdwTiLy0KLwHgFSlE0lN09UEh0jvJ0WadQ0rqNDJo55BhdpReNi4Xx8HEDkEFgAiR5tXWdn66PtaWPkUuv2hKxW+Nl7c9+njehyAyGArBEDkaBWHBgxFukrg52KemSvyl1dEmnYJu+Jj2cbdlVYqPOkZ6OOae0FiJhAZrFgAiBwNFHpOLPvG4WfrY7JI8wvCDip0i+OOf/tvbOWJseVA5BBYAIgs3drQLY7sBrZvfXjnVew9FHjkuZtWieh2yJcbdst7q7YZX9keAezBVgiAyNPgoVUvV5WIJnTaODgsUF6Fv7Hlew4US+eJC6gaASKAwAJAdGgQoXkUNtKg4uUlBQHzKrxdcUYDuePNbyoFIu6qkSk3tiO4AMLAVgiAhKTbH+0e+lj+8eF3po7XqaXPXN/OGGdO1QgQOaxYAIhb/rpnalARaAKpL89c107S0hxUjQARRmABIC6DCM2DeOiD7yrlQYzudarcP3ON6dd151W0b15X3v+vuS6gVI0AoSOwABBzvlpw+6J5EIPf/Mby62tSpq50aMBihtnjAFRGYAEgptylomayGqxmPmhexYSrTi9PxtStFF310ADFGWB1Q48DEBqSNwHEjJVS0VBoXoVnhYeuWujqhb92XZ6rGwBCQ2ABpPJI84LPRVa/4/qq30eZ5lRYKRW1ok6NqkZehTcNNLS
kVFcmPOn3lJoC4WMrBEhF62aLzBtRcaS5dsLU9ts2dMI0K5JJkv+4so3flQcNHi5uneez4gRAeAgsgFQMKqb3r5yxoIPC9H6b2mxHMklSL/+Btk9uO7+pXNa2YcDX0CCCklLAfgQWQCrQbQ5tp71vu8i8UX4uy3qfQ2TeSFf7bRvabQcTLJnSH3/H1sqoIo9c3VYua8t2BhAr5FgAyW7NLJHHWohMvVxkxi0iB3cFONgpUrTNFYREQaBkSl+C7VTUykiXHm3y7Dk5ACFhxQJIZh+PFln6pPXn6aCwKHEnU3r3sXA1w2otdWpWM/Igdu1zNcwKZEdRMV0zgRgjsACS1dpZoQUVSqePRpGZZEodb24GXTOB2CKwAJI1p+KD/xfCEx2u6hAdaR5lwZIp6ZoJJAZyLIBkpDkSAXMpfClbHeg5ISqJm6EmevpLs9D79XG6ZgKxRWABJGOjq1ByJHSlIoqlplbRNRNIDGyFAMnY6KrdQHPPz8gW6fW4SFYD1/ZHBFcqvKeX/umkOvL15j2WGlT5S/TUrpkaVNA1E4g9h9PpjFSbfp+KiookJydHCgsLJTs7O5pvDaRGoyt3+6jMXJFDvwd+jb9MFWlzpcRieqnGEKUep97AQnDgHaTQNROIn+s3KxZAom59zBkauNFVsB6VHYdGLajwNb3UM6hQ2iRLjzMzr4OumUD8IscCSLRVikltRF7vI3JoT4ADna7VigtGubZGPNWoJ3LNVJFLHoqr6aXuY/R4fR6AxMSKBZDwWx8B1G0uMmyNq0pEEzq1P0WEcynCmV6qP5keT5MrIHERWACJsv2hSZqWJmqUNbrSIKJpF4mFUJtV0eQKSFwEFkAi0BUHz8qPOG50ZUezKppcAYmLHAsgEVjqSxE/ja6CNbXyRpMrIPERWACJwMrsjjhqdGVleilNroDkQGABJALd0jCqOwJccDPriPSfLTJsdVwEFd5NrbSJlSfv2EEfN1NqCiC+kWMBJALd0ug5sawqxLs3RdkVuveTIs26SjzyNb00lM6bAOIfnTeBhG/h3ciVTxFHqxQAkg+dN4F4Lh0Nta+EBg+tesWsLwUABENgAcTD0DDd5jC74hDDvhQAEAzJm0C0O2d696Mo2u66Xx8HgARHYAHEvHNm2X3zRrqOA4AExlYIEI18ioJFQTpnOkWKtrmOtXmbgxHjAKKJwAKIZj6FrR02zY0s12mhnoPAtLOlNqGiXwSASGArBLB7haLgc5F5o0Sm97M438Nih00TQcWgN1ZWmi66o/Cwcb8+DgAxDSymTJkibdu2NepX9dahQweZO3eu7ScFJOwKxaQ2IlMvF1n2rMUn69CwRrYNDdPtD12pCJDRYTyuxwFAzLZCTjjhBJkwYYK0aNFCtK/W1KlTpU+fPvLNN9/IaaedZuuJAQm1SrH4MZGF40N8AfuHhmlOhfdKhScNJ/TxV78okHpZGeReAIifzpu5ubny6KOPyk033WTqeDpvIulWKebeJ7IvjG2FCHTOfG/VNrlr2ipLzyH3AkBMO2+WlJTI22+/LQcOHDC2RPwpLi42bp4nBiRVXwqfGw4mdLnXNdsjAp0zdQXCKnfuBYPAAEQ1eXP16tVSq1YtycjIkNtvv11mzpwprVu7xiL7kp+fb0Q47lvjxo3DOmEgLrY+Ni4SmXNniEFFWT5Ft1Gu0tIItOPWbQ1dgbCysUHuBYCYBBYtW7aUVatWyVdffSWDBg2SAQMGyLp16/weP2rUKGPZxH3bunVruOcMxD5B87UrRA7tjYt8Cl80V0K3NTze0RR37oXmaABATHIsunfvLs2bN5fnn3/e1PHkWCBltz5iMInUVx8LMyb/z5nS58xGETsvAIknatNNS0tLK+RQAKnXktuE9oNFWl5mSz6FlU6amitxceu88uN37SuWhz74LiI5GgBgObDQbY1LL71UTjzxRNm3b5+8+eabsnDhQvnoo4/4NJHctNW21WZXEVihCKWTpgYdHZrXLQ9KXlpSYCRq+gqRNDzJy3EFKwAQ8cBi586d0r9/f9m+fbuxHKLNsjSouPjii0N6cyDuZ3xoi23thhlKOekF94ucf49tuRTuTprOMKo53LkXerwGEZ6v5V7z0MfpZwEgZjkWVpFjgbh27IjInGEi62aKHD34x/016ooc3B2zPApdaeg8cYHfXAn3SsOSEReaCgqYIQIgbnMsgKTx8WiRpU/5zqMwE1Rk5or85ZWIlJCa7aSpx7m3PQLxzr2g8yYAuxBYILXplsemJSILHhL5eYXJJ/nZROg9WaT5BZE4S+PiH85x/hI+zQQhAGAFgQVSu3x0zlCRQ3usPc/YFtn1x/fZDSNeQmq2SsPXcWx7AIgmAgukcE+Kfsa6g+XF/575IlkN/kjstFhCaqVc1LuTptVqDjsSPgHACgILpOb2x5yhoQUVSoMKzaMIIYjYc+CIPPSB9dWDQNUcUvb9ZW1cORPuQCXY6HR9HX1ccy3IrQBgF6pCkHp0zoe25A6FVnwMW+13hcIzkNi064C8tXyL7CgK3EDOfUk3s3rga1tDYwLP0R7uQCUns5pc9+KyoD/SW7e0J9cCQFBUhQD++lL89HHorxNgxkeo7bOtrB54VnN8sm6HvPzFpgpBhec2x187NbE1MRQAzCCwQPIHFIsfE/nq2RCHhpWpVkvkyil+EzT95TKYZaVcVAMP3e4YPn2V39fS0OS9VeY6hdK+G4CdCCyQ3AHF0skiRw6E/DLGRfq0q0Sufing9oe/XAarzK4emOlrsfvAEcmtWVX2HDhK+24A8Ts2HYh7a2aJTDhRZOH4gEGFv0BAs470NsdxgZTev1PkmlcCVn0Eu8hbYXb1wGwA8ueyCaXemyu07wYQKQQWSL7ume8MEDmyP+ihgS6nzx+7XKpe/ZykV8uISo6Coyzp0uzqgdkApHvrPCMpVFcmPOn3lJoCiAS2QpA81s4SWfpkWC+xy5klj1e5TbpcfbNRVfHeqm1Be02Em6MQyuqBlb4W+pq07wYQLQQWSJ6cig/+X0hPLblkvGw4WEN2OmtLepNO0uXQMUu9JoJd5IPJC6ELptUppbTvBhAt9LFAcij4XGTq5WH3pfBX3RGo14Qmbz69YL38c/6PQd9OA5DRvU6VOjUzbFk9oF03gGihjwVSi/anCLMvRSidKoP1rtCL/P+cc6I0qVcjIlsQTCkFEG8ILJAcdGaHFTrivPdkKWnVW5Zv2G1clHftKzY1mnzZht2SluaQ+et2yL++2OT3+Lu7t5AhF7aI+EWebQ4A8YTAAslBB4HplNGiIE2hqtYU6XSXyPn3yLx1O2XcxAWWS0XveHOl7D10NOAxGkpMW7HVCCwAIJVQborkoNsZPScGLiLVRlejtopcMMIIKjSXIpT+E8GCCu9OmgCQSggskDy03Xbf11wrF55q1BO5Zmp5oys7O2UGwxwOAKmGrRAkxuAwzaHQ7Y4AHTDLg4tWvQI+z85OmcEwhwNAqiGwQOIMDtOVCN3u8DMIrJwGEU27+B1n/tOvwbtyhos5HABSFYEF4su62eKcc5c4DvnITSjaLj
K9v2u7I1hwYcM481AxhwNAKiOwQHwFFRo4+B8P5rpszxvp2u4IMG3UvTpR8Nt+mfTpeltOr2ZGuhwsLgmamxFKJ00ASBYEFogPpSVyaM69Ut3pFEfAX/KdIkXbXDkUXtsdkV6duKVzM5n86U8+W2jr93/r1MRoVkWDKgCpjMAiDnn+xp3UnRQ9kjNL9v0qmYd2BB45GqTTpr923HaoU6Oq3HlRC2nVIKtS4MIKBQD8gcAizqTM7Id1s0XmjShvaBWk1iNop81Il5DmX3W6EdzRQhsAAiOwiCP+fuPWqZl6v68BWAkbVATMpfDPyLLQwWFaQuohUiWkvoI6WmgDgH8EFnEilAFYCdmPouZxInPv8xtU6KxdfzkWxhxeR8XBYZFqRFU7s6o8c0M7ad+sbuJ93gAQQwQWcSLYb9yeLaIT6rdlrfSYO0Ic+4LM8CgTKHFzr6OWZF/zrKT7KDW1qxGV++0nXH26dDq5ni2vCQCphJbeccLsb9wJ1SLaKB/tJ2IyqPBnj7OWPH70L/LV1V9J+ml9fB6jeQ66bWGWLkLc0qVJpedoImbSbDkBQAywYhEnzP7GnRAtonXro+BzOfbubZIeYGsjkAeP3ii7nLVlp9SWrbXOkNHXnB7wYq/bFaN7tZbBb6409fpPX9dOLmvbQEZe2ppETACwEYFFnHD/xq2Jms5EbhHtUe1h/OWyeI0udYrskLpyQo9hckZ2DUsX+zo1q5l6j7u7n2IEFYpETACwF1shcUIvcFp9oBxx1CJak0q/3LBb3lu1zfiq3wet9igrIbXK/dLjjvaTutk1pM+ZjYyLvtmf2ew2UZN6NUI6PwBAcKxYxBFd6tf9fTMNmOxuouXr9T5Zt8N8Tw3d/tCVijA6SehKhQYVH5WeKwND2PJJqu0kAEhQBBZRphfwZRt3G7/960W4Q7N60t7jt3IzDZjsbqLl6/Vq16gqew8erXSs354aWk4awkrFQ0evl9+cuUYuxfLSVuKUNONn0Z/ZavCUNNtJAJDAHE6n0R0gaoqKiiQnJ0cKCwslOztbUolewEfOWF3pgq0X8QlXBU5ODNZEy325tVrREEobbPcFesmIC/+40K9+R+Tdm0y/hv6t2yO15Ozi56S0bEfO82dQoQRP7p/HeA+vc3a/NhUfABC56zc5FlGiF7zb31jpcxVA79PH9JhAOQ3BmmjpbezstYHzIGxog+3ZU8Nfi+2Az3e6XuPhtNvLgwrPUk+lwYF3Xw/3aol+TsG2k/S1PFFGCgDRwVZIFOgFfOzsdUGPG/Huf43jdhT5/i3dTNvqHUXF8vSC9XJX9xbl763P09f8fX+x5NasJnk5mcZ2QLhtsCskS2qL7eyGIkV60Q8cqvzqyJXtHcbKoxf3l2u8tjpU54kLwupAyjwPAIgdAosocF/Ygyk8dMy4+ctpKD5Waur9/jn/R2mZV8v4s78R4hqwXNYmT8JRIQlSW2z3nGhUhTjFIQ6P0MD9pw3N+0txs57S6rweklfF9VfPu9RTV2ns6EBKGSkAxAaBRRSE0y3T87f0x645w/TzNJej8OBRv2sHenH+1xebgr5OmpTKuWnfS33ZWyHB0mcSZOsr5JsOk6Xhl+PkeNHkVJdfpa5s7zBGzuoxIDU7kAJACiGwiIJwyxvdv6XrH3Slwcz2ha9cDl90d8Cd8+CtR9pyGVP1NWno+COX4hdnrjx4tL9c2fv2SlsLRuLkZ/XEIZMrBCMrSltJ6WdpMqXR9qA5DpSMAkBiI3kzCvQ3+7zs8C+Euw4UlzfRsovmebpXRTz1TFsuU6pOkjyPoELp91OqTZaeaSv8JoJqQuay0tYyu7Sj8bWk7K+ZPh4ssdRdMuovG0Lvd5ejAgDiT0oGFpa6SdpAf7Mfe0X4AYH+lq5JiTUzKo4MD9ffOjUxtjZ026N92jrpk7ZExld72Zjx4f0XRL83LvrzRrqaYlmczvrqFwUBP+947UAKADAn5bZC7G4uZZa+9nM3tvPdxyKzihwpccrBI39cqL1prwt3JceBYv/HhUKDlV7pX0nT5WMkV4pMPMMpUrTN1RSraRdLOQ8PffCdvLSkIODnbaUDKQAgvqRUYOGvGZTfbpI2c5dBenfePKdprpz54McBn3u0rCLE7qRFDarqL3tYmv3oWqGwZP+vIeU8mPm8KRkFgMSUMoFFsOZSZvoj2EFfu9PJ9Yyb2xc/7Qq4WqEOHCmRZRt225q0qD/l02dskWbLXw7tBTyaYgVrpx3K503JKAAknpTJsTCbA1Chm2SUfLlxl+njgiU3mqWvMeWGM6Tttw8ar2VttcIhkt3I1RTLRG5EvH3eAIDISZnAIr77I5i9qjsqXMBDlVuzqiy6t5v0rFUgVYt/D+1ce05wNcUy0U47EPpRAEBySZnAIlb9EcxUoJhd7ncf576Aa4Bglrvi44q0pXLKoW/l64JdFXIkTNO23X1fM5ph+aLnpsPJRvc61dTL0Y8CAFI4xyI/P19mzJgh33//vWRmZkrHjh1l4sSJ0rJlS4l30Rqp7Tnqe9Oug/LW8i1+Z3+4aYdM3YoINGe2To2q0r7ZHwGIPv/CVsdL+/xP5fcDRwKek69GV4fefVGkvblppEVSQ2r9eZKkaVCh2x9eKxXedFVlYKemRvUHI8wBILVYWrFYtGiR3HHHHbJs2TL55JNP5OjRo3LJJZfIgQMHJN7Z0R8h2OqDVp3oAK3rXlwmd01bZczs8J4R4j2hU7/e8ebKgEGFyr/q9ErnVq1Kmoz/cxtXjoSfVYqh6e/Kc1UnSQOpuOVR/fBOkYXjRTJz/SZbGh05nSIb2o+XtDOudZWWBgkq3OhHAQCpyeF0Bruk+ffbb79J/fr1jYDj/PPPt3Wee7z1sQj2PH+lrBLgt3XNc+j66GcBk0r1uvv0dWfJZW0bWjo3XaXIr/qi5DoCBX0Okcw6Iof2iNMYHVaR/iwFp9wkza5/wsRPFV99QwAA9jJ7/Q6r3FRfXOXm+l/OLi4uNm6eJxZLofRHCNb/4pnr28lDH/guZQ1UEfH6l5uCzv3QRZE6NTMs/Uzp38+Ry76bZCIl1Cly6HeRC+4Xx8pXRYp+KX/kSEZdSe/9uDRr82eTP5W5c6MfBQAkt5ADi9LSUhk2bJh06tRJ2rRpEzAvY9y4cRJPrPRHMNP/YvR7a2R3kDwHXzb/ftC2yol0KZUOaetE0raLbJ4oTof5WhOp21xk2BpXJ01N6Kx1vFQzkUthFv0oACB1hBxYaK7FmjVrZMmSJQGPGzVqlAwfPrzCikXjxo3FTp4Jk3b/Rmym/0UoQYU6KbeGPZUT62aLzBtRYcXBYbXRlQYRZe25AQCIamAxZMgQef/992Xx4sVywgknBDw2IyPDuEVKpPfwI9FnwZ1j0a9Dk/AqJ3QI2KJHRBZNCP1kvBpdAQAQtaoQzfPUoGLmzJmyYMECadq0qcSSO/fBe0XBu/IiHGb7LGhPCSurBBr4aFVHyJUTa2eJTDgxvKDCT6MrAACiEljo9scbb7whb775pmRlZcmOHTuM26FDhyTaguU+KH083JHowVpo6/36+MN9X
HkmwYILjRFuPb9p+WqKv26V+r3fIV0fjxZ5e4DIkf2h/VDGiaaL/GWq30ZXAABEvNzU4WegxCuvvCIDBw6Marmp9pHQfhHBvHVL+6CJg8FyNNwrI8rzw3If4Q4AfG3LePN+jtlzKLdmlsg7AyRs10wVOe3K8F8HAJASiiJRbhpGy4u4nf1hJkfDvargfVyej+OMsegbdhtNr/YeOmp6sqepygnNqfjwj0TYkGTmivSezEoFACAiqqTy7I9g/Sk8VxXM9mPQ79PSHD6DCl+TPS2VYWo56MHdYklmXZGz/+oKZ7Tqo0lncioAABGTsIFFuLM/zPSnCGlVIZKTVEMZGtZ7EqsTAICoSdjppuHOojDTn8K9qhDV1RTd7ti4SOTTh0UWPCyyYaHrPne/CbNIzgQAxEDCrlhYyX2I6qpCOKsp2uhqzlBjdscfHnXN8+j9pEirXq6x5R6NsPz6y8skZwIAoi6hA4twZlHYkaMRbDVF8zQcfipJKq2mBKr20EBjej+Rvq+L9JwoMr2/16t6qFZL5MoprFQAAGIiYbdCPLlzH/qc2cj4aqadt9n+FH67XgZhqT+FNrt6x0S57twRrlWLvq+5Vi48VcsSOX+kyMgtBBUAgJhJ+BWLUIW0qhCJ1RTd/tBmV2bs+8VVGaKBgwYYHkPDjLbcVHsAAGIsZQOLcHM0zApYSaJJmTo8LJTKEIaGAQDiUEoHFuHkaIRMg4lNS0QKPhcp3GIuEdOTlcoQAACiLOUDCyv9KcJmVH3cJXLIeglreR4Fk0gBAHGMwCLSdIVCcyF++FBk2bPhvdYVT5JHAQCIawQWkV6hmHufyL7wx7dLy8tE2lxlx1kBABAxBBaRDCq094QdOtwp0uNhe14LAIAIIrCIxNaHJmbOuj2816lWU6R1H5HLJ4tUqWbX2QEAEFEEFnavUmj5qNVKD09d7hVp1pW+FACAhERgYevWR4BW22ZkNRTpNoqAAgCQsJKipXfMlTe6CiOoUJdOJKgAACQ0Ags7aDlpONsfmbmuAWPM+AAAJDi2QuzgbrNtRZVMkQ6DRZqc72rNzUoFACAJEFjYIZQ221e9wAoFACDpEFiE0kXTe6KoftUx5kXbg+dZZDcS6TmBoAIAkJQILMIpJdVgoudEV5CgX42qEO8h7GXaD3Z1z6SMFACQxEjetFJK6p2gqSsUer8+rsFF39dEshtUXqHQxMye+eRSAACSHisWYZWS6n0OkXkjRVr1cgUX+tXXdgkAACmAwCLsUlKnSNE213HuFQn9CgBACmIrxK5S0lBKTgEASDKsWASq9rBSShpKySkAAEmGwCJYtUfQUlKH63E9DgCAFJeW0qPN540Smd4vcLWHrlxokGHQUlJPZd9rXwoSNAEASMHAQoOFSW1Epl4usuxZPweVrUxotYcGIX5LSRu67qfZFQAAKbgVYmm0uVe1B6WkAAAElTqBRaijzT2rPSglBQAgoNTZCgl1tDnVHgAAmJY6KxaW+0xQ7QEAgFWps2JhaeWBag8AAEKROoGFux9FpZJRH6j2AAAghbdCAnXOdHP3o2C0OQAAEVMl6TtnenL3o6h0fCPXtgcrFAAAhMXhdDot1l+Gp6ioSHJycqSwsFCys7Mj1JeibLvD33aGmRUOAABg+fpdJTn7Uuh9DlfnTG1q5WtbhH4UAADYLi15+1J4dM4EAABRkZb0fSks968AAACpF1iY7UtB50wAAKImLXn7UmjnzEZ0zgQAIIoSN7Bw96UweAcXdM4EACAWEjew8OxLkd2g4v10zgQAICYSt9zUTYMHLSmlLwUAADGX+IGFoi8FAABxIbG3QgAAQGIHFosXL5bevXtLw4YNxeFwyKxZsyJzZgAAIPkDiwMHDsgZZ5whzzzzTGTOCAAApE6OxaWXXmrcAAAAop68WVxcbNw8p6MBAIDkFPHkzfz8fGPMqvvWuHHjSL8lAABI1sBi1KhRxux2923r1q2RfksAAJCsWyEZGRnGDQAAJD/6WAAAgNitWOzfv1/Wr19f/n1BQYGsWrVKcnNz5cQTTwz6fKfTaXwliRMAgMThvm67r+N+OS367LPP9BUr3QYMGGDq+Vu3bvX5fG7cuHHjxo2bxP1Nr+OBOPR/JIpKS0vll19+kaysLKNzZziRk1aYaDJodna2recI3/jMo4/PPPr4zKOPzzwxPnMNF/bt22d03k5LS4ufIWR6MieccIJtr6cfCH8Ro4vPPPr4zKOPzzz6+Mzj/zPXthHBkLwJAABsQ2ABAABsk7CBhfbGGDNmDD0yoojPPPr4zKOPzzz6+MyT6zOPevImAABIXgm7YgEAAOIPgQUAALANgQUAALANgQUAALANgQUAAEiNwOKZZ56RJk2aSPXq1eW8886T5cuXBzz+7bffllatWhnHn3766fLhhx9G7VyThZXP/MUXX5QuXbpInTp1jFv37t2D/jdC+H/P3aZNm2a0xb/yyisjfo6p/pnv3btX7rjjDmnQoIFRnnfKKafw70sEP+9JkyZJy5YtJTMz02g7fffdd8vhw4ejdr6JbvHixdK7d2+j9bb+GzFr1qygz1m4cKG0a9fO+Pt98skny6uvvhr6CTjj1LRp05zVqlVzvvzyy861a9c6b7nlFmft2rWdv/76q8/jv/jiC2d6errzkUceca5bt87597//3Vm1alXn6tWro37uicrqZ3799dc7n3nmGec333zj/O6775wDBw505uTkOH/++eeon3uqfOZuBQUFzkaNGjm7dOni7NOnT9TONxU/8+LiYufZZ5/tvOyyy5xLliwxPvuFCxc6V61aFfVzT4XP+9///rczIyPD+Kqf9UcffeRs0KCB8+677476uSeqDz/80PnAAw84Z8yYYQwNmzlzZsDjN27c6KxRo4Zz+PDhxvXzqaeeMq6n8+bNC+n94zawOPfcc5133HFH+fclJSXOhg0bOvPz830e37dvX2evXr0q3Hfeeec5b7vttoifa7Kw+pl7O3bsmDMrK8s5derUCJ5lcgnlM9fPuWPHjs6XXnrJmCpMYBHZz3zKlCnOZs2aOY8cORLFs0zdz1uPvfDCCyvcpxe8Tp06Rfxck5GYCCzuu+8+52mnnVbhvmuvvdbZo0ePkN4zLrdCjhw5Il9//bWxtO45vEy///LLL30+R+/3PF716NHD7/EI/zP3dvDgQTl69Kjk5uZG8EyTR6if+YMPPij169eXm266KUpnmtqf+ezZs6VDhw7GVsjxxx8vbdq0kfHjx0tJSUkUzzx1Pu+OHTsaz3Fvl2zcuNHYdrrsssuidt6p5kubr59Rn25qxq5du4z/0+r/iT3p999//73P5+zYscPn8Xo/IvOZexsxYoSxp+f9FxT2feZLliyRf/3rX7Jq1aoonWVyCeUz1wvbggUL5IYbbjAucOvXr5fBgwcbQbS2RIa9n/f1119vPK9z587GmO5jx47J7bffLvfff3+Uzjr17PBz/dTR6ocOHTJyXayIyxULJJ4JEyYYyYQzZ840ErRgv3379km/fv2MpNl69erF+nRSRmlpqbFC9MILL8if/vQnufba
a+WBBx6Q5557LtanlpQ0iVBXhJ599llZuXKlzJgxQz744AN56KGHYn1qSOQVC/1HMz09XX799dcK9+v3eXl5Pp+j91s5HuF/5m6PPfaYEVjMnz9f2rZtG+EzTd3PfMOGDbJp0yYj29vzoqeqVKkiP/zwgzRv3jwKZ55af8+1EqRq1arG89xOPfVU47c8XeqvVq1axM87lT7v0aNHGwH0zTffbHyvFX4HDhyQW2+91QjodCsF9vJ3/czOzra8WqHi8r+Q/h9VfzP49NNPK/wDqt/rXqcver/n8eqTTz7xezzC/8zVI488YvwmMW/ePDn77LOjdLap+ZlrKfXq1auNbRD37YorrpBu3boZf9ayPNj/97xTp07G9oc7iFM//vijEXAQVNj/eWuulnfw4A7qmJkZGbZfP51xXKKkJUevvvqqUf5y6623GiVKO3bsMB7v16+fc+TIkRXKTatUqeJ87LHHjNLHMWPGUG4a4c98woQJRhnZO++849y+fXv5bd++fTH8KZL7M/dGVUjkP/MtW7YY1U5Dhgxx/vDDD87333/fWb9+fefDDz8cw58ieT9v/bdbP++33nrLKIP8+OOPnc2bNzcq/2CO/husbQD0ppf5J554wvjz5s2bjcf189bP3bvc9N577zWun9pGICnLTZXW0p544onGxUtLlpYtW1b+WNeuXY1/VD1Nnz7decoppxjHa+nMBx98EIOzTmxWPvOTTjrJ+EvrfdN/GBC5v+eeCCyi85kvXbrUKF/XC6SWnv7jH/8wyn5h/+d99OhR59ixY41gonr16s7GjRs7Bw8e7NyzZ0+Mzj7xfPbZZz7/bXZ/zvpVP3fv55x55pnGfyP9O/7KK6+E/P4O/R97FlMAAECqi8scCwAAkJgILAAAgG0ILAAAgG0ILAAAgG0ILAAAgG0ILAAAgG0ILAAAgG0ILAAAgG0ILAAAgG0ILAAAgG0ILAAAgNjl/wOB6vqgznVLNAAAAABJRU5ErkJggg==", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "n = 100\n", + "x = np.random.rand(n, 1)\n", + "y = 2.0 + 5 * x**2 + 0.1 * np.random.randn(n, 1)\n", + "\n", + "line_model = LinearRegression().fit(x, y)\n", + "line_predict = line_model.predict(x)\n", + "#line_mse = ...\n", + "\n", + "#poly_features = ...\n", + "#poly_model = LinearRegression().fit(..., y)\n", + "#poly_predict = ...\n", + "#poly_mse = ...\n", + "\n", + "plt.scatter(x, y, label = \"Data\")\n", + "plt.scatter(x, line_predict, label = \"Line model\")\n", + "plt.legend()\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "248d8931", + "metadata": {}, + "source": [ + "## Exercise 4 - The train-test split\n" + ] + }, + { + "cell_type": "markdown", + "id": "1efd3376", + "metadata": {}, + "source": [ + "Hopefully your model fit the data quite well, but to know how well the model actually generalizes to unseen data, which is most often what we care about, we need to split our data into training and testing data. " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "0f8d75fb", + "metadata": {}, + "outputs": [], + "source": [ + "from sklearn.model_selection import train_test_split" + ] + }, + { + "cell_type": "markdown", + "id": "edb213fc", + "metadata": {}, + "source": [ + "**a)** Complete the code below so that the polynomial features and the targets y get split into training and test data.\n", + "\n", + "**b)** What is the shape of X_test?\n", + "\n", + "**c)** Fit your model to X_train\n", + "\n", + "**d)** Compute the MSE when your model predicts on the training data and on the testing data, using y_train and y_test as targets for the two cases.\n", + "\n", + "**e)** Why do we not fit the model to X_test?" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a03e0388", + "metadata": {}, + "outputs": [], + "source": [ + "polynomial_features = ...\n", + "\n", + "#X_train, X_test, y_train, y_test = train_test_split(polynomial_features, y, test_size=0.2)\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "id": "22e7536e", + "metadata": {}, + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.15" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/doc/LectureNotes/.ipynb_checkpoints/exercisesweek35-checkpoint.ipynb b/doc/LectureNotes/.ipynb_checkpoints/exercisesweek35-checkpoint.ipynb index 8ce6a6c3d..403eab1f3 100644 --- a/doc/LectureNotes/.ipynb_checkpoints/exercisesweek35-checkpoint.ipynb +++ b/doc/LectureNotes/.ipynb_checkpoints/exercisesweek35-checkpoint.ipynb @@ -323,7 +323,7 @@ "source": [ "n = 100\n", "x = np.linspace(-3, 3, n)\n", - "y = np.exp(-x**2) + 1.5 * np.exp(-(x-2)**2) + np.random.normal(0, 0.1)" + "y = np.exp(-x**2) + 1.5 * np.exp(-(x-2)**2) + np.random.normal(n)" ] }, { diff --git a/doc/LectureNotes/.ipynb_checkpoints/exercisesweek36-checkpoint.ipynb b/doc/LectureNotes/.ipynb_checkpoints/exercisesweek36-checkpoint.ipynb index ddf3e11e5..3dd1ad167 100644 --- a/doc/LectureNotes/.ipynb_checkpoints/exercisesweek36-checkpoint.ipynb +++ b/doc/LectureNotes/.ipynb_checkpoints/exercisesweek36-checkpoint.ipynb @@ -172,7 +172,7 @@ "source": [ "n = 100\n", "x = np.linspace(-3, 3, n)\n", 
- "y = np.exp(-x**2) + 1.5 * np.exp(-(x-2)**2) + np.random.normal(0, 0.1)" + "y = np.exp(-x**2) + 1.5 * np.exp(-(x-2)**2) + np.random.normal(n)" ] }, { diff --git a/doc/LectureNotes/.ipynb_checkpoints/exercisesweek38-checkpoint.ipynb b/doc/LectureNotes/.ipynb_checkpoints/exercisesweek38-checkpoint.ipynb new file mode 100644 index 000000000..4ffd81af5 --- /dev/null +++ b/doc/LectureNotes/.ipynb_checkpoints/exercisesweek38-checkpoint.ipynb @@ -0,0 +1,483 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "1da77599", + "metadata": {}, + "source": [ + "# Exercises week 38\n", + "\n", + "## September 15-19\n", + "\n", + "## Resampling and the Bias-Variance Trade-off\n" + ] + }, + { + "cell_type": "markdown", + "id": "e9f27b0e", + "metadata": {}, + "source": [ + "### Learning goals\n", + "\n", + "After completing these exercises, you will know how to\n", + "\n", + "- Derive expectation and variances values related to linear regression\n", + "- Compute expectation and variances values related to linear regression\n", + "- Compute and evaluate the trade-off between bias and variance of a model\n", + "\n", + "### Deliverables\n", + "\n", + "Complete the following exercises while working in a jupyter notebook. Then, in canvas, include\n", + "\n", + "- The jupyter notebook with the exercises completed\n", + "- An exported PDF of the notebook (https://code.visualstudio.com/docs/datascience/jupyter-notebooks#_export-your-jupyter-notebook)\n" + ] + }, + { + "cell_type": "markdown", + "id": "984af8e3", + "metadata": {}, + "source": [ + "## Use the books!\n", + "\n", + "This week deals with various mean values and variances in linear regression methods (here it may be useful to look up chapter 3, equation (3.8) of [Trevor Hastie, Robert Tibshirani, Jerome H. 
Friedman, The Elements of Statistical Learning, Springer](https://www.springer.com/gp/book/9780387848570)).\n", + "\n", + "For more discussions on Ridge regression and calculation of expectation values, [Wessel van Wieringen's](https://arxiv.org/abs/1509.09169) article is highly recommended.\n", + "\n", + "The exercises this week are also a part of project 1 and can be reused in the theory part of the project.\n", + "\n", + "### Definitions\n", + "\n", + "We assume that there exists a continuous function $f(\\boldsymbol{x})$ and a normal distributed error $\\boldsymbol{\\varepsilon}\\sim N(0, \\sigma^2)$ which describes our data\n" + ] + }, + { + "cell_type": "markdown", + "id": "c16f7d0e", + "metadata": {}, + "source": [ + "$$\n", + "\\boldsymbol{y} = f(\\boldsymbol{x})+\\boldsymbol{\\varepsilon}\n", + "$$\n" + ] + }, + { + "cell_type": "markdown", + "id": "9fcf981a", + "metadata": { + "vscode": { + "languageId": "plaintext" + } + }, + "source": [ + "We further assume that this continous function can be modeled with a linear model $\\mathbf{\\tilde{y}}$ of some features $\\mathbf{X}$.\n" + ] + }, + { + "cell_type": "markdown", + "id": "d4189366", + "metadata": {}, + "source": [ + "$$\n", + "\\boldsymbol{y} = \\boldsymbol{\\tilde{y}} + \\boldsymbol{\\varepsilon} = \\boldsymbol{X}\\boldsymbol{\\beta} +\\boldsymbol{\\varepsilon}\n", + "$$\n" + ] + }, + { + "cell_type": "markdown", + "id": "f4fca21b", + "metadata": {}, + "source": [ + "We therefore get that our data $\\boldsymbol{y}$ has an expectation value $\\boldsymbol{X}\\boldsymbol{\\beta}$ and variance $\\sigma^2$, that is $\\boldsymbol{y}$ follows a normal distribution with mean value $\\boldsymbol{X}\\boldsymbol{\\beta}$ and variance $\\sigma^2$.\n" + ] + }, + { + "cell_type": "markdown", + "id": "5de0c7e6", + "metadata": {}, + "source": [ + "## Exercise 1: Expectation values for ordinary least squares expressions\n" + ] + }, + { + "cell_type": "markdown", + "id": "d878c699", + "metadata": {}, + "source": [ + "**a)** With the expressions for the optimal parameters $\\boldsymbol{\\hat{\\beta}_{OLS}}$ show that\n" + ] + }, + { + "cell_type": "markdown", + "id": "08b7007d", + "metadata": {}, + "source": [ + "$$\n", + "\\mathbb{E}(\\boldsymbol{\\hat{\\beta}_{OLS}}) = \\boldsymbol{\\beta}.\n", + "$$\n" + ] + }, + { + "cell_type": "markdown", + "id": "46e93394", + "metadata": {}, + "source": [ + "**b)** Show that the variance of $\\boldsymbol{\\hat{\\beta}_{OLS}}$ is\n" + ] + }, + { + "cell_type": "markdown", + "id": "be1b65be", + "metadata": {}, + "source": [ + "$$\n", + "\\mathbf{Var}(\\boldsymbol{\\hat{\\beta}_{OLS}}) = \\sigma^2 \\, (\\mathbf{X}^{T} \\mathbf{X})^{-1}.\n", + "$$\n" + ] + }, + { + "cell_type": "markdown", + "id": "d2143684", + "metadata": {}, + "source": [ + "We can use the last expression when we define a [confidence interval](https://en.wikipedia.org/wiki/Confidence_interval) for the parameters $\\boldsymbol{\\hat{\\beta}_{OLS}}$.\n", + "A given parameter ${\\boldsymbol{\\hat{\\beta}_{OLS}}}_j$ is given by the diagonal matrix element of the above matrix.\n" + ] + }, + { + "cell_type": "markdown", + "id": "f5c2dc22", + "metadata": {}, + "source": [ + "## Exercise 2: Expectation values for Ridge regression\n" + ] + }, + { + "cell_type": "markdown", + "id": "3893e3e7", + "metadata": {}, + "source": [ + "**a)** With the expressions for the optimal parameters $\\boldsymbol{\\hat{\\beta}_{Ridge}}$ show that\n" + ] + }, + { + "cell_type": "markdown", + "id": "79dc571f", + "metadata": {}, + "source": [ + "$$\n", + "\\mathbb{E} 
\\big[ \\hat{\\boldsymbol{\\beta}}^{\\mathrm{Ridge}} \\big]=(\\mathbf{X}^{T} \\mathbf{X} + \\lambda \\mathbf{I}_{pp})^{-1} (\\mathbf{X}^{\\top} \\mathbf{X})\\boldsymbol{\\beta}\n", + "$$\n" + ] + }, + { + "cell_type": "markdown", + "id": "028209a1", + "metadata": {}, + "source": [ + "We see that $\\mathbb{E} \\big[ \\hat{\\boldsymbol{\\beta}}^{\\mathrm{Ridge}} \\big] \\not= \\mathbb{E} \\big[\\hat{\\boldsymbol{\\beta}}^{\\mathrm{OLS}}\\big ]$ for any $\\lambda > 0$.\n" + ] + }, + { + "cell_type": "markdown", + "id": "b4e721fc", + "metadata": {}, + "source": [ + "**b)** Show that the variance is\n" + ] + }, + { + "cell_type": "markdown", + "id": "090eb1e1", + "metadata": {}, + "source": [ + "$$\n", + "\\mathbf{Var}[\\hat{\\boldsymbol{\\beta}}^{\\mathrm{Ridge}}]=\\sigma^2[ \\mathbf{X}^{T} \\mathbf{X} + \\lambda \\mathbf{I} ]^{-1} \\mathbf{X}^{T}\\mathbf{X} \\{ [ \\mathbf{X}^{\\top} \\mathbf{X} + \\lambda \\mathbf{I} ]^{-1}\\}^{T}\n", + "$$\n" + ] + }, + { + "cell_type": "markdown", + "id": "6b8e8697", + "metadata": {}, + "source": [ + "We see that if the parameter $\\lambda$ goes to infinity then the variance of the Ridge parameters $\\boldsymbol{\\beta}$ goes to zero.\n" + ] + }, + { + "cell_type": "markdown", + "id": "74bc300b", + "metadata": {}, + "source": [ + "## Exercise 3: Deriving the expression for the Bias-Variance Trade-off\n" + ] + }, + { + "cell_type": "markdown", + "id": "eeb86010", + "metadata": {}, + "source": [ + "The aim of this exercise is to derive the equations for the bias-variance tradeoff to be used in project 1.\n", + "\n", + "The parameters $\\boldsymbol{\\hat{\\beta}_{OLS}}$ are found by optimizing the mean squared error via the so-called cost function\n" + ] + }, + { + "cell_type": "markdown", + "id": "522a0d1d", + "metadata": {}, + "source": [ + "$$\n", + "C(\\boldsymbol{X},\\boldsymbol{\\beta}) =\\frac{1}{n}\\sum_{i=0}^{n-1}(y_i-\\tilde{y}_i)^2=\\mathbb{E}\\left[(\\boldsymbol{y}-\\boldsymbol{\\tilde{y}})^2\\right]\n", + "$$\n" + ] + }, + { + "cell_type": "markdown", + "id": "831db06c", + "metadata": {}, + "source": [ + "**a)** Show that you can rewrite this into an expression which contains\n", + "\n", + "- the variance of the model (the variance term)\n", + "- the expected deviation of the mean of the model from the true data (the bias term)\n", + "- the variance of the noise\n", + "\n", + "In other words, show that:\n" + ] + }, + { + "cell_type": "markdown", + "id": "8cc52b3c", + "metadata": {}, + "source": [ + "$$\n", + "\\mathbb{E}\\left[(\\boldsymbol{y}-\\boldsymbol{\\tilde{y}})^2\\right]=\\mathrm{Bias}[\\tilde{y}]+\\mathrm{var}[\\tilde{y}]+\\sigma^2,\n", + "$$\n" + ] + }, + { + "cell_type": "markdown", + "id": "8cb50416", + "metadata": {}, + "source": [ + "with\n" + ] + }, + { + "cell_type": "markdown", + "id": "e49bdbb4", + "metadata": {}, + "source": [ + "$$\n", + "\\mathrm{Bias}[\\tilde{y}]=\\mathbb{E}\\left[\\left(\\boldsymbol{y}-\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right]\\right)^2\\right],\n", + "$$\n" + ] + }, + { + "cell_type": "markdown", + "id": "eca5554a", + "metadata": {}, + "source": [ + "and\n" + ] + }, + { + "cell_type": "markdown", + "id": "b1054343", + "metadata": {}, + "source": [ + "$$\n", + "\\mathrm{var}[\\tilde{y}]=\\mathbb{E}\\left[\\left(\\tilde{\\boldsymbol{y}}-\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right]\\right)^2\\right]=\\frac{1}{n}\\sum_i(\\tilde{y}_i-\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right])^2.\n", + "$$\n", + "In order to arrive at the last equation, we have to approximate the unknown function $f$ with 
the output/target values $y$." + ] + }, + { + "cell_type": "markdown", + "id": "70fbfcd7", + "metadata": {}, + "source": [ + "**b)** Explain what the terms mean and discuss their interpretations.\n" + ] + }, + { + "cell_type": "markdown", + "id": "b8f8b9d1", + "metadata": {}, + "source": [ + "## Exercise 4: Computing the Bias and Variance\n" + ] + }, + { + "cell_type": "markdown", + "id": "9e012430", + "metadata": {}, + "source": [ + "Before you compute the bias and variance of a real model for different complexities, let's for now assume that you have sampled predictions and targets for a single model complexity using bootstrap resampling.\n", + "\n", + "**a)** Using the expression above, compute the mean squared error, bias and variance of the given data. Check that the sum of the bias and variance correctly gives (approximately) the mean squared error.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 67, + "id": "b5bf581c", + "metadata": {}, + "outputs": [], + "source": [ + "import numpy as np\n", + "\n", + "n = 100\n", + "bootstraps = 1000\n", + "\n", + "predictions = np.random.rand(bootstraps, n) * 10 + 10\n", + "targets = np.random.rand(bootstraps, n)\n", + "\n", + "mse = ...\n", + "bias = ...\n", + "variance = ..." + ] + }, + { + "cell_type": "markdown", + "id": "7b1dc621", + "metadata": {}, + "source": [ + "**b)** Change the prediction values in some way to increase the bias while decreasing the variance.\n", + "\n", + "**c)** Change the prediction values in some way to increase the variance while decreasing the bias.\n" + ] + }, + { + "cell_type": "markdown", + "id": "8da63362", + "metadata": {}, + "source": [ + "**d)** Perform a bias-variance analysis of a polynomial OLS model fit to a one-dimensional function by computing and plotting the bias and variances values as a function of the polynomial degree of your model.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "dd5855e4", + "metadata": {}, + "outputs": [], + "source": [ + "import numpy as np\n", + "import matplotlib.pyplot as plt\n", + "from sklearn.preprocessing import (\n", + " PolynomialFeatures,\n", + ") # use the fit_transform method of the created object!\n", + "from sklearn.linear_model import LinearRegression\n", + "from sklearn.metrics import mean_squared_error\n", + "from sklearn.model_selection import train_test_split\n", + "from sklearn.utils import resample" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "7e35fa37", + "metadata": {}, + "outputs": [], + "source": [ + "n = 100\n", + "bootstraps = 1000\n", + "\n", + "x = np.linspace(-3, 3, n)\n", + "y = np.exp(-(x**2)) + 1.5 * np.exp(-((x - 2) ** 2)) + np.random.normal(0, 0.1)\n", + "\n", + "biases = []\n", + "variances = []\n", + "mses = []\n", + "\n", + "# for p in range(1, 5):\n", + "# predictions = ...\n", + "# targets = ...\n", + "#\n", + "# X = ...\n", + "# X_train, X_test, y_train, y_test = ...\n", + "# for b in range(bootstraps):\n", + "# X_train_re, y_train_re = ...\n", + "#\n", + "# # fit your model on the sampled data\n", + "#\n", + "# # make predictions on the test data\n", + "# predictions[b, :] =\n", + "# targets[b, :] =\n", + "#\n", + "# biases.append(...)\n", + "# variances.append(...)\n", + "# mses.append(...)" + ] + }, + { + "cell_type": "markdown", + "id": "253b8461", + "metadata": {}, + "source": [ + "**e)** Discuss the bias-variance trade-off as function of your model complexity (the degree of the polynomial).\n", + "\n", + "**f)** Compute and discuss the bias and variance as function 
of the number of data points (choose a suitable polynomial degree to show something interesting).\n" + ] + }, + { + "cell_type": "markdown", + "id": "46250fbc", + "metadata": {}, + "source": [ + "## Exercise 5: Interpretation of scaling and metrics\n" + ] + }, + { + "cell_type": "markdown", + "id": "5af53055", + "metadata": {}, + "source": [ + "In this course, we often ask you to scale data and compute various metrics. Although these practices are \"standard\" in the field, we will require you to demonstrate an understanding of _why_ you need to scale data and use these metrics. Both so that you can make better arguements about your results, and so that you will hopefully make fewer mistakes.\n", + "\n", + "First, a few reminders: In this course you should always scale the columns of the feature matrix, and sometimes scale the target data, when it is worth the effort. By scaling, we mean subtracting the mean and dividing by the standard deviation, though there are many other ways to scale data. When scaling either the feature matrix or the target data, the intercept becomes a bit harder to implement and understand, so take care.\n", + "\n", + "Briefly answer the following:\n", + "\n", + "**a)** Why do we scale data?\n", + "\n", + "**b)** Why does the OLS method give practically equivelent models on scaled and unscaled data?\n", + "\n", + "**c)** Why does the Ridge method **not** give practically equivelent models on scaled and unscaled data? Why do we only consider the model on scaled data correct?\n", + "\n", + "**d)** Why do we say that the Ridge method gives a biased model?\n", + "\n", + "**e)** Is the MSE of the OLS method affected by scaling of the feature matrix? Is it affected by scaling of the target data?\n", + "\n", + "**f)** Read about the R2 score, a metric we will ask you to use a lot later in the course. Is the R2 score of the OLS method affected by scaling of the feature matrix? Is it affected by scaling of the target data?\n", + "\n", + "**g)** Give interpretations of the following R2 scores: 0, 0.5, 1.\n", + "\n", + "**h)** What is an advantage of the R2 score over the MSE?\n" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.15" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/doc/LectureNotes/.ipynb_checkpoints/exercisesweek39-checkpoint.ipynb b/doc/LectureNotes/.ipynb_checkpoints/exercisesweek39-checkpoint.ipynb new file mode 100644 index 000000000..7e520c96d --- /dev/null +++ b/doc/LectureNotes/.ipynb_checkpoints/exercisesweek39-checkpoint.ipynb @@ -0,0 +1,197 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "433db993", + "metadata": {}, + "source": [ + "# Exercises week 39\n", + "\n", + "## Getting started with project 1\n" + ] + }, + { + "cell_type": "markdown", + "id": "6b931365", + "metadata": {}, + "source": [ + "The aim of the exercises this week is to aid you in getting started with writing the report. This will be discussed during the lab sessions as well.\n", + "\n", + "A short feedback to the this exercise will be available before the project deadline. 
And you can reuse these elements in your final report.\n" + ] + }, + { + "cell_type": "markdown", + "id": "2a63bae1", + "metadata": {}, + "source": [ + "### Learning goals\n", + "\n", + "After completing these exercises, you will know how to\n", + "\n", + "- Create a properly formatted report in Overleaf\n", + "- Select and present graphs for a scientific report\n", + "- Write an abstract and introduction for a scientific report\n", + "\n", + "### Deliverables\n", + "\n", + "Complete the following exercises while working in an Overleaf project. Then, in canvas, include\n", + "\n", + "- An exported PDF of the report draft you have been working on.\n", + "- A comment linking to the github repository used in exercise 4.\n" + ] + }, + { + "cell_type": "markdown", + "id": "e0f2d99d", + "metadata": {}, + "source": [ + "## Exercise 1: Creating the report document\n" + ] + }, + { + "cell_type": "markdown", + "id": "d06bfb29", + "metadata": {}, + "source": [ + "We require all projects to be formatted as proper scientific reports, and this includes using LaTeX for typesetting. We strongly recommend that you use the online LaTeX editor Overleaf, as it is much easier to start using, and has excellent support for collaboration.\n", + "\n", + "**a)** Create an account on Overleaf.com\n", + "\n", + "**b)** Download [this](https://github.com/CompPhysics/MachineLearning/blob/master/doc/LectureNotes/data/FYS_STK_Template.zip) template project.\n", + "\n", + "**c)** Create a new Overleaf project with the correct formatting by uploading the template project.\n", + "\n", + "**d)** Read the general guideline for writing a report, which can be found at .\n", + "\n", + "**e)** Look at the provided example of an earlier project, found at \n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "338b2ee1", + "metadata": { + "vscode": { + "languageId": "plaintext" + } + }, + "outputs": [], + "source": [] + }, + { + "cell_type": "markdown", + "id": "ec36f4c3", + "metadata": {}, + "source": [ + "## Exercise 2: Adding good figures\n" + ] + }, + { + "cell_type": "markdown", + "id": "f50723f8", + "metadata": {}, + "source": [ + "**a)** Using what you have learned so far in this course, create a plot illustrating the Bias-Variance trade-off. Make sure the lines and axes are labeled, with font size being the same as in the text.\n", + "\n", + "**b)** Add this figure to the results section of your document, with a caption that describes it. A reader should be able to understand the figure with only its contents and caption.\n", + "\n", + "**c)** Refer to the figure in your text using \\ref.\n", + "\n", + "**d)** Create a heatmap showing the MSE of a Ridge regression model for various polynomial degrees and lambda values. Make sure the axes are labeled, and that the title or colorbar describes what is plotted.\n", + "\n", + "**e)** Add this second figure to your document with a caption and reference in the text. All figures in the final report must be captioned and be referenced and used in the text.\n" + ] + }, + { + "cell_type": "markdown", + "id": "276c214e", + "metadata": {}, + "source": [ + "## Exercise 3: Writing an abstract and introduction\n" + ] + }, + { + "cell_type": "markdown", + "id": "f4134eb5", + "metadata": {}, + "source": [ + "Although much of your project 1 results are not done yet, we want you to write an abstract and introduction to get you started on writing the report. 
It is generally a good idea to write a lot of a report before finishing all of the results, as you get a better understanding of your methods and inquiry from doing so, along with saving a lot of time. Where you would typically describe results in the abstract, instead make something up, just this once.\n", + "\n", + "**a)** Read the guidelines on abstract and introduction before you start.\n", + "\n", + "**b)** Write an abstract for project 1 in your report.\n", + "\n", + "**c)** Write an introduction for project 1 in your report.\n" + ] + }, + { + "cell_type": "markdown", + "id": "2f512b59", + "metadata": {}, + "source": [ + "## Exercise 4: Making the code avaliable and presentable\n" + ] + }, + { + "cell_type": "markdown", + "id": "77fe1fec", + "metadata": {}, + "source": [ + "A central part of the report is the code you write to implement the methods and generate the results. To get points for the code-part of the project, you need to make your code avaliable and presentable.\n", + "\n", + "**a)** Create a github repository for project 1, or create a dedicated folder for project 1 in a github repository. Only one person in your group needs to do this.\n", + "\n", + "**b)** Add a PDF of the report to this repository, after completing exercises 1-3\n", + "\n", + "**c)** Add a folder named Code, where you can put python files for your functions and notebooks for reproducing your results.\n", + "\n", + "**d)** Add python files for functions, and a notebook to produce the figures in exercise 2, to the Code folder. Remember to use a seed for generating random data and for train-test splits.\n", + "\n", + "**e)** Create a README file in the repository or project folder with\n", + "\n", + "- the name of the group members\n", + "- a short description of the project\n", + "- a description of how to install the required packages to run your code from a requirements.txt file\n", + "- names and descriptions of the various notebooks in the Code folder and the results they produce\n" + ] + }, + { + "cell_type": "markdown", + "id": "f1d72c56", + "metadata": {}, + "source": [ + "## Exercise 5: Referencing\n", + "\n", + "**a)** Add a reference to Hastie et al. using your preferred referencing style. See https://www.sokogskriv.no/referansestiler/ for an overview of styles.\n", + "\n", + "**b)** Add a reference to sklearn like this: https://scikit-learn.org/stable/about.html#citing-scikit-learn\n", + "\n", + "**c)** Make a prompt to your LLM of choice, and upload the exported conversation to your GitHub repository for the project.\n", + "\n", + "**d)** At the end of the methods section of the report, write a one paragraph declaration on how and for what you have used the LLM. 
Link to the log on GitHub.\n" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.15" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/doc/LectureNotes/.ipynb_checkpoints/exercisesweek43-checkpoint.ipynb b/doc/LectureNotes/.ipynb_checkpoints/exercisesweek43-checkpoint.ipynb index 6d6019289..f80e8787a 100644 --- a/doc/LectureNotes/.ipynb_checkpoints/exercisesweek43-checkpoint.ipynb +++ b/doc/LectureNotes/.ipynb_checkpoints/exercisesweek43-checkpoint.ipynb @@ -2,1492 +2,624 @@ "cells": [ { "cell_type": "markdown", - "id": "b2937d10", - "metadata": {}, + "id": "860d70d8", + "metadata": { + "editable": true + }, "source": [ "\n", - "" + "" ] }, { "cell_type": "markdown", - "id": "3dd00d19", - "metadata": {}, + "id": "119c0988", + "metadata": { + "editable": true + }, "source": [ - "# Exercises weeks 43 and 44 \n", - "**October 23-27, 2023**\n", - "\n", - "Date: **Deadline is Sunday November 5 at midnight**\n", + "# Exercises week 43 \n", + "**October 20-24, 2025**\n", "\n", - "You can hand in the exercises from week 43 and week 44 as one exercise and get a total score of two additional points." + "Date: **Deadline Friday October 24 at midnight**" ] }, { "cell_type": "markdown", - "id": "82a19a1d", - "metadata": {}, + "id": "909887eb", + "metadata": { + "editable": true + }, "source": [ - "# Overarching aims of the exercises weeks 43 and 44\n", - "\n", - "The aim of the exercises this week and next week is to get started with writing a neural network code\n", - "of relevance for project 2. \n", - "\n", - "During week 41 we discussed three different types of gates, the\n", - "so-called XOR, the OR and the AND gates. In order to develop a code\n", - "for neural networks, it can be useful to set up a simpler system with\n", - "only two inputs and one output. This can make it easier to debug and\n", - "study the feed forward pass and the back propagation part. In the\n", - "exercise this and next week, we propose to study this system with just\n", - "one hidden layer and two hidden nodes. There is only one output node\n", - "and we can choose to use either a simple regression case (fitting a\n", - "line) or just a binary classification case with the cross-entropy as\n", - "cost function.\n", + "# Overarching aims of the exercises for week 43\n", "\n", - "Their inputs and outputs can be\n", - "summarized using the following tables, first for the OR gate with\n", - "inputs $x_1$ and $x_2$ and outputs $y$:\n", + "The aim of the exercises this week is to gain some confidence with\n", + "ways to visualize the results of a classification problem. We will\n", + "target three ways of setting up the analysis. The first and simplest\n", + "one is the\n", + "1. so-called confusion matrix. The next one is the so-called\n", "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "
$x_1$ $x_2$ $y$
0 0 0
0 1 1
1 0 1
1 1 1
" - ] - }, - { - "cell_type": "markdown", - "id": "f74f69af", - "metadata": {}, - "source": [ - "## The AND and XOR Gates\n", + "2. ROC curve. Finally we have the\n", "\n", - "The AND gate is defined as\n", + "3. Cumulative gain curve.\n", "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "
$x_1$ $x_2$ $y$
0 0 0
0 1 0
1 0 0
1 1 1
\n", + "We will use Logistic Regression as method for the classification in\n", + "this exercise. You can compare these results with those obtained with\n", + "your neural network code from project 2 without a hidden layer.\n", "\n", - "And finally we have the XOR gate\n", + "In these exercises we will use binary and multi-class data sets\n", + "(the Iris data set from week 41).\n", "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "\n", - "
$x_1$ $x_2$ $y$
0 0 0
0 1 1
1 0 1
1 1 0
" + "The underlying mathematics is described here." ] }, { "cell_type": "markdown", - "id": "1b52d47a", - "metadata": {}, + "id": "1e1cb4fb", + "metadata": { + "editable": true + }, "source": [ - "## Representing the Data Sets\n", + "### Confusion Matrix\n", "\n", - "Our design matrix is defined by the input values $x_1$ and $x_2$. Since we have four possible outputs, our design matrix reads" + "A **confusion matrix** summarizes a classifier’s performance by\n", + "tabulating predictions versus true labels. For binary classification,\n", + "it is a $2\\times2$ table whose entries are counts of outcomes:" ] }, { "cell_type": "markdown", - "id": "3e6910cb", - "metadata": {}, + "id": "7b090385", + "metadata": { + "editable": true + }, "source": [ "$$\n", - "\\boldsymbol{X}=\\begin{bmatrix} 0 & 0 \\\\\n", - " 0 & 1 \\\\\n", - "\t\t 1 & 0 \\\\\n", - "\t\t 1 & 1 \\end{bmatrix},\n", + "\\begin{array}{l|cc} & \\text{Predicted Positive} & \\text{Predicted Negative} \\\\ \\hline \\text{Actual Positive} & TP & FN \\\\ \\text{Actual Negative} & FP & TN \\end{array}.\n", "$$" ] }, { "cell_type": "markdown", - "id": "90a3b78a", - "metadata": {}, + "id": "1e14904b", + "metadata": { + "editable": true + }, "source": [ - "while the vector of outputs is $\\boldsymbol{y}^T=[0,1,1,0]$ for the XOR gate, $\\boldsymbol{y}^T=[0,0,0,1]$ for the AND gate and $\\boldsymbol{y}^T=[0,1,1,1]$ for the OR gate.\n", - "\n", - "Your tasks here are\n", - "\n", - "1. Set up the design matrix with the inputs as discussed above and a vector containing the output, the so-called targets. Note that the design matrix is the same for all gates. You need just to define different outputs.\n", - "\n", - "2. Construct a neural network with only one hidden layer and two hidden nodes using the Sigmoid function as activation function.\n", - "\n", - "3. Set up the output layer with only one output node and use again the Sigmoid function as activation function for the output.\n", - "\n", - "4. Initialize the weights and biases and perform a feed forward pass and compare the outputs with the targets.\n", - "\n", - "5. Set up the cost function (cross entropy for classification of binary cases).\n", - "\n", - "6. Calculate the gradients needed for the back propagation part.\n", - "\n", - "7. Use the gradients to train the network in the back propagation part. Think of using automatic differentiation.\n", - "\n", - "8. Train the network and study your results and compare with results obtained either with **scikit-learn** or **TensorFlow**.\n", - "\n", - "Everything you develop here can be used directly into the code for the project." + "Here TP (true positives) is the number of cases correctly predicted as\n", + "positive, FP (false positives) is the number incorrectly predicted as\n", + "positive, TN (true negatives) is correctly predicted negative, and FN\n", + "(false negatives) is incorrectly predicted negative . In other words,\n", + "“positive” means class 1 and “negative” means class 0; for example, TP\n", + "occurs when the prediction and actual are both positive. Formally:" ] }, { "cell_type": "markdown", - "id": "d6a3ab1e", - "metadata": {}, + "id": "e93ea290", + "metadata": { + "editable": true + }, "source": [ - "## Setting up the Neural Network\n", - "\n", - "We define first our design matrix and the various output vectors for the different gates." 
+ "$$\n", + "\\text{TPR} = \\frac{\\text{TP}}{\\text{TP} + \\text{FN}}, \\quad \\text{FPR} = \\frac{\\text{FP}}{\\text{FP} + \\text{TN}},\n", + "$$" ] }, { - "cell_type": "code", - "execution_count": 1, - "id": "152123b0", - "metadata": {}, - "outputs": [], + "cell_type": "markdown", + "id": "c80bea5b", + "metadata": { + "editable": true + }, "source": [ - "%matplotlib inline\n", - "\n", - "\"\"\"\n", - "Simple code that tests XOR, OR and AND gates with linear regression\n", - "\"\"\"\n", - "\n", - "# import necessary packages\n", - "import numpy as np\n", - "import matplotlib.pyplot as plt\n", - "from sklearn import datasets\n", - "\n", - "def sigmoid(x):\n", - " return 1/(1 + np.exp(-x))\n", - "\n", - "def feed_forward(X):\n", - " # weighted sum of inputs to the hidden layer\n", - " z_h = np.matmul(X, hidden_weights) + hidden_bias\n", - " # activation in the hidden layer\n", - " a_h = sigmoid(z_h)\n", - " \n", - " # weighted sum of inputs to the output layer\n", - " z_o = np.matmul(a_h, output_weights) + output_bias\n", - " # softmax output\n", - " # axis 0 holds each input and axis 1 the probabilities of each category\n", - " probabilities = sigmoid(z_o)\n", - " return probabilities\n", - "\n", - "# we obtain a prediction by taking the class with the highest likelihood\n", - "def predict(X):\n", - " probabilities = feed_forward(X)\n", - " return np.argmax(probabilities, axis=1)\n", - "\n", - "# ensure the same random numbers appear every time\n", - "np.random.seed(0)\n", - "\n", - "# Design matrix\n", - "X = np.array([ [0, 0], [0, 1], [1, 0],[1, 1]],dtype=np.float64)\n", - "\n", - "# The XOR gate\n", - "yXOR = np.array( [ 0, 1 ,1, 0])\n", - "# The OR gate\n", - "yOR = np.array( [ 0, 1 ,1, 1])\n", - "# The AND gate\n", - "yAND = np.array( [ 0, 0 ,0, 1])\n", - "\n", - "# Defining the neural network\n", - "n_inputs, n_features = X.shape\n", - "n_hidden_neurons = 2\n", - "n_categories = 2\n", - "n_features = 2\n", + "where TPR and FPR are the true and false positive rates defined below.\n", "\n", - "# we make the weights normally distributed using numpy.random.randn\n", - "\n", - "# weights and bias in the hidden layer\n", - "hidden_weights = np.random.randn(n_features, n_hidden_neurons)\n", - "hidden_bias = np.zeros(n_hidden_neurons) + 0.01\n", - "\n", - "# weights and bias in the output layer\n", - "output_weights = np.random.randn(n_hidden_neurons, n_categories)\n", - "output_bias = np.zeros(n_categories) + 0.01\n", - "\n", - "probabilities = feed_forward(X)\n", - "print(probabilities)\n", - "\n", - "\n", - "predictions = predict(X)\n", - "print(predictions)" + "In multiclass classification with $K$ classes, the confusion matrix\n", + "generalizes to a $K\\times K$ table. Entry $N_{ij}$ in the table is\n", + "the count of instances whose true class is $i$ and whose predicted\n", + "class is $j$. For example, a three-class confusion matrix can be written\n", + "as:" ] }, { "cell_type": "markdown", - "id": "73319f0a", - "metadata": {}, + "id": "a0f68f5f", + "metadata": { + "editable": true + }, "source": [ - "Not an impressive result, but this was our first forward pass with randomly assigned weights. Let us now add the full network with the back-propagation algorithm discussed above." 
+ "$$\n", + "\\begin{array}{c|ccc} & \\text{Pred Class 1} & \\text{Pred Class 2} & \\text{Pred Class 3} \\\\ \\hline \\text{Act Class 1} & N_{11} & N_{12} & N_{13} \\\\ \\text{Act Class 2} & N_{21} & N_{22} & N_{23} \\\\ \\text{Act Class 3} & N_{31} & N_{32} & N_{33} \\end{array}.\n", + "$$" ] }, { "cell_type": "markdown", - "id": "a7e0c47a", - "metadata": {}, + "id": "869669b2", + "metadata": { + "editable": true + }, "source": [ - "## The Code using Scikit-Learn" + "Here the diagonal entries $N_{ii}$ are the true positives for each\n", + "class, and off-diagonal entries are misclassifications. This matrix\n", + "allows computation of per-class metrics: e.g. for class $i$,\n", + "$\\mathrm{TP}_i=N_{ii}$, $\\mathrm{FN}_i=\\sum_{j\\neq i}N_{ij}$,\n", + "$\\mathrm{FP}_i=\\sum_{j\\neq i}N_{ji}$, and $\\mathrm{TN}_i$ is the sum of\n", + "all remaining entries.\n", + "\n", + "As defined above, TPR and FPR come from the binary case. In binary\n", + "terms with $P$ actual positives and $N$ actual negatives, one has" ] }, { - "cell_type": "code", - "execution_count": 2, - "id": "dbbacc67", - "metadata": {}, - "outputs": [], + "cell_type": "markdown", + "id": "2abd82a7", + "metadata": { + "editable": true + }, "source": [ - "# import necessary packages\n", - "import numpy as np\n", - "import matplotlib.pyplot as plt\n", - "from sklearn.neural_network import MLPClassifier\n", - "from sklearn.metrics import accuracy_score\n", - "import seaborn as sns\n", - "\n", - "# ensure the same random numbers appear every time\n", - "np.random.seed(0)\n", - "\n", - "# Design matrix\n", - "X = np.array([ [0, 0], [0, 1], [1, 0],[1, 1]],dtype=np.float64)\n", - "\n", - "# The XOR gate\n", - "yXOR = np.array( [ 0, 1 ,1, 0])\n", - "# The OR gate\n", - "yOR = np.array( [ 0, 1 ,1, 1])\n", - "# The AND gate\n", - "yAND = np.array( [ 0, 0 ,0, 1])\n", - "\n", - "# Defining the neural network\n", - "n_inputs, n_features = X.shape\n", - "n_hidden_neurons = 2\n", - "n_categories = 2\n", - "n_features = 2\n", - "\n", - "eta_vals = np.logspace(-5, 1, 7)\n", - "lmbd_vals = np.logspace(-5, 1, 7)\n", - "# store models for later use\n", - "DNN_scikit = np.zeros((len(eta_vals), len(lmbd_vals)), dtype=object)\n", - "epochs = 100\n", - "\n", - "for i, eta in enumerate(eta_vals):\n", - " for j, lmbd in enumerate(lmbd_vals):\n", - " dnn = MLPClassifier(hidden_layer_sizes=(n_hidden_neurons), activation='logistic',\n", - " alpha=lmbd, learning_rate_init=eta, max_iter=epochs)\n", - " dnn.fit(X, yXOR)\n", - " DNN_scikit[i][j] = dnn\n", - " print(\"Learning rate = \", eta)\n", - " print(\"Lambda = \", lmbd)\n", - " print(\"Accuracy score on data set: \", dnn.score(X, yXOR))\n", - " print()\n", - "\n", - "sns.set()\n", - "test_accuracy = np.zeros((len(eta_vals), len(lmbd_vals)))\n", - "for i in range(len(eta_vals)):\n", - " for j in range(len(lmbd_vals)):\n", - " dnn = DNN_scikit[i][j]\n", - " test_pred = dnn.predict(X)\n", - " test_accuracy[i][j] = accuracy_score(yXOR, test_pred)\n", - "\n", - "fig, ax = plt.subplots(figsize = (10, 10))\n", - "sns.heatmap(test_accuracy, annot=True, ax=ax, cmap=\"viridis\")\n", - "ax.set_title(\"Test Accuracy\")\n", - "ax.set_ylabel(\"$\\eta$\")\n", - "ax.set_xlabel(\"$\\lambda$\")\n", - "plt.show()" + "$$\n", + "\\text{TPR} = \\frac{TP}{P} = \\frac{TP}{TP+FN}, \\quad \\text{FPR} =\n", + "\\frac{FP}{N} = \\frac{FP}{FP+TN},\n", + "$$" ] }, { "cell_type": "markdown", - "id": "1cac501a", - "metadata": {}, + "id": "2f79325c", + "metadata": { + "editable": true + }, "source": [ - "## Building a neural network 
code\n", - "\n", - "Here we present a flexible object oriented codebase\n", - "for a feed forward neural network, along with a demonstration of how\n", - "to use it. Before we get into the details of the neural network, we\n", - "will first present some implementations of various schedulers, cost\n", - "functions and activation functions that can be used together with the\n", - "neural network.\n", - "\n", - "The codes here were developed by Eric Reber and Gregor Kajda during spring 2023." + "as used in standard confusion-matrix\n", + "formulations. These rates will be used in constructing ROC curves." ] }, { "cell_type": "markdown", - "id": "dd153528", - "metadata": {}, + "id": "0ce65a47", + "metadata": { + "editable": true + }, "source": [ - "### Learning rate methods\n", + "### ROC Curve\n", "\n", - "The code below shows object oriented implementations of the Constant,\n", - "Momentum, Adagrad, AdagradMomentum, RMS prop and Adam schedulers. All\n", - "of the classes belong to the shared abstract Scheduler class, and\n", - "share the update_change() and reset() methods allowing for any of the\n", - "schedulers to be seamlessly used during the training stage, as will\n", - "later be shown in the fit() method of the neural\n", - "network. Update_change() only has one parameter, the gradient\n", - "($δ^l_ja^{l−1}_k$), and returns the change which will be subtracted\n", - "from the weights. The reset() function takes no parameters, and resets\n", - "the desired variables. For Constant and Momentum, reset does nothing." + "The Receiver Operating Characteristic (ROC) curve plots the trade-off\n", + "between true positives and false positives as a discrimination\n", + "threshold varies. Specifically, for a binary classifier that outputs\n", + "a score or probability, one varies the threshold $t$ for declaring\n", + "**positive**, and computes at each $t$ the true positive rate\n", + "$\\mathrm{TPR}(t)$ and false positive rate $\\mathrm{FPR}(t)$ using the\n", + "confusion matrix at that threshold. The ROC curve is then the graph\n", + "of TPR versus FPR. 
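As a concrete illustration of this threshold sweep (a minimal sketch with synthetic labels and scores, not part of the original exercise text), the curve can be traced directly with numpy before reaching for a library routine such as scikit-learn's roc_curve:

```python
import numpy as np

rng = np.random.default_rng(seed=0)
y_true = rng.integers(0, 2, size=200)                       # hypothetical binary labels
# hypothetical scores: positives tend to receive higher scores than negatives
scores = rng.normal(loc=0.35 + 0.3 * y_true, scale=0.2)

tpr_list, fpr_list = [], []
for t in np.linspace(scores.min(), scores.max(), 101):
    y_pred = (scores >= t).astype(int)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    tpr_list.append(tp / (tp + fn))
    fpr_list.append(fp / (fp + tn))

# (fpr_list, tpr_list) are the points of the ROC curve; integrating TPR over FPR
# on the sorted points gives an estimate of the AUC discussed below.
```
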
By definition," ] }, { - "cell_type": "code", - "execution_count": 3, - "id": "f55eea63", - "metadata": {}, - "outputs": [], + "cell_type": "markdown", + "id": "d750fdff", + "metadata": { + "editable": true + }, "source": [ - "import autograd.numpy as np\n", - "\n", - "class Scheduler:\n", - " \"\"\"\n", - " Abstract class for Schedulers\n", - " \"\"\"\n", - "\n", - " def __init__(self, eta):\n", - " self.eta = eta\n", - "\n", - " # should be overwritten\n", - " def update_change(self, gradient):\n", - " raise NotImplementedError\n", - "\n", - " # overwritten if needed\n", - " def reset(self):\n", - " pass\n", - "\n", - "\n", - "class Constant(Scheduler):\n", - " def __init__(self, eta):\n", - " super().__init__(eta)\n", - "\n", - " def update_change(self, gradient):\n", - " return self.eta * gradient\n", - " \n", - " def reset(self):\n", - " pass\n", - "\n", - "\n", - "class Momentum(Scheduler):\n", - " def __init__(self, eta: float, momentum: float):\n", - " super().__init__(eta)\n", - " self.momentum = momentum\n", - " self.change = 0\n", - "\n", - " def update_change(self, gradient):\n", - " self.change = self.momentum * self.change + self.eta * gradient\n", - " return self.change\n", - "\n", - " def reset(self):\n", - " pass\n", - "\n", - "\n", - "class Adagrad(Scheduler):\n", - " def __init__(self, eta):\n", - " super().__init__(eta)\n", - " self.G_t = None\n", - "\n", - " def update_change(self, gradient):\n", - " delta = 1e-8 # avoid division ny zero\n", - "\n", - " if self.G_t is None:\n", - " self.G_t = np.zeros((gradient.shape[0], gradient.shape[0]))\n", - "\n", - " self.G_t += gradient @ gradient.T\n", - "\n", - " G_t_inverse = 1 / (\n", - " delta + np.sqrt(np.reshape(np.diagonal(self.G_t), (self.G_t.shape[0], 1)))\n", - " )\n", - " return self.eta * gradient * G_t_inverse\n", - "\n", - " def reset(self):\n", - " self.G_t = None\n", - "\n", - "\n", - "class AdagradMomentum(Scheduler):\n", - " def __init__(self, eta, momentum):\n", - " super().__init__(eta)\n", - " self.G_t = None\n", - " self.momentum = momentum\n", - " self.change = 0\n", - "\n", - " def update_change(self, gradient):\n", - " delta = 1e-8 # avoid division ny zero\n", - "\n", - " if self.G_t is None:\n", - " self.G_t = np.zeros((gradient.shape[0], gradient.shape[0]))\n", - "\n", - " self.G_t += gradient @ gradient.T\n", - "\n", - " G_t_inverse = 1 / (\n", - " delta + np.sqrt(np.reshape(np.diagonal(self.G_t), (self.G_t.shape[0], 1)))\n", - " )\n", - " self.change = self.change * self.momentum + self.eta * gradient * G_t_inverse\n", - " return self.change\n", - "\n", - " def reset(self):\n", - " self.G_t = None\n", - "\n", - "\n", - "class RMS_prop(Scheduler):\n", - " def __init__(self, eta, rho):\n", - " super().__init__(eta)\n", - " self.rho = rho\n", - " self.second = 0.0\n", - "\n", - " def update_change(self, gradient):\n", - " delta = 1e-8 # avoid division ny zero\n", - " self.second = self.rho * self.second + (1 - self.rho) * gradient * gradient\n", - " return self.eta * gradient / (np.sqrt(self.second + delta))\n", - "\n", - " def reset(self):\n", - " self.second = 0.0\n", - "\n", - "\n", - "class Adam(Scheduler):\n", - " def __init__(self, eta, rho, rho2):\n", - " super().__init__(eta)\n", - " self.rho = rho\n", - " self.rho2 = rho2\n", - " self.moment = 0\n", - " self.second = 0\n", - " self.n_epochs = 1\n", - "\n", - " def update_change(self, gradient):\n", - " delta = 1e-8 # avoid division ny zero\n", - "\n", - " self.moment = self.rho * self.moment + (1 - self.rho) * gradient\n", - " self.second = 
self.rho2 * self.second + (1 - self.rho2) * gradient * gradient\n", - "\n", - " moment_corrected = self.moment / (1 - self.rho**self.n_epochs)\n", - " second_corrected = self.second / (1 - self.rho2**self.n_epochs)\n", - "\n", - " return self.eta * moment_corrected / (np.sqrt(second_corrected + delta))\n", - "\n", - " def reset(self):\n", - " self.n_epochs += 1\n", - " self.moment = 0\n", - " self.second = 0" + "$$\n", + "\\mathrm{TPR} = \\frac{TP}{TP+FN}, \\qquad \\mathrm{FPR} = \\frac{FP}{FP+TN},\n", + "$$" ] }, { "cell_type": "markdown", - "id": "1a9bcb3e", - "metadata": {}, + "id": "561bfb2c", + "metadata": { + "editable": true + }, "source": [ - "### Usage of the above learning rate schedulers\n", + "where $TP,FP,TN,FN$ are counts determined by threshold $t$. A perfect\n", + "classifier would reach the point (FPR=0, TPR=1) at some threshold.\n", "\n", - "To initalize a scheduler, simply create the object and pass in the\n", - "necessary parameters such as the learning rate and the momentum as\n", - "shown below. As the Scheduler class is an abstract class it should not\n", - "called directly, and will raise an error upon usage." + "Formally, the ROC curve is obtained by plotting\n", + "$(\\mathrm{FPR}(t),\\mathrm{TPR}(t))$ for all $t\\in[0,1]$ (or as $t$\n", + "sweeps through the sorted scores). The Area Under the ROC Curve (AUC)\n", + "quantifies the average performance over all thresholds. It can be\n", + "interpreted probabilistically: $\\mathrm{AUC} =\n", + "\\Pr\\bigl(s(X^+)>s(X^-)\\bigr)$, the probability that a random positive\n", + "instance $X^+$ receives a higher score $s$ than a random negative\n", + "instance $X^-$ . Equivalently, the AUC is the integral under the ROC\n", + "curve:" ] }, { - "cell_type": "code", - "execution_count": 4, - "id": "86013cb4", - "metadata": {}, - "outputs": [], + "cell_type": "markdown", + "id": "5ca722fe", + "metadata": { + "editable": true + }, "source": [ - "momentum_scheduler = Momentum(eta=1e-3, momentum=0.9)\n", - "adam_scheduler = Adam(eta=1e-3, rho=0.9, rho2=0.999)" + "$$\n", + "\\mathrm{AUC} \\;=\\; \\int_{0}^{1} \\mathrm{TPR}(f)\\,df,\n", + "$$" ] }, { "cell_type": "markdown", - "id": "535331f6", - "metadata": {}, + "id": "30080a86", + "metadata": { + "editable": true + }, "source": [ - "Here is a small example for how a segment of code using schedulers\n", - "could look. Switching out the schedulers is simple." + "where $f$ ranges over FPR (or fraction of negatives). A model that guesses at random yields a diagonal ROC (AUC=0.5), whereas a perfect model yields AUC=1.0." ] }, { - "cell_type": "code", - "execution_count": 5, - "id": "7e0f6b5a", - "metadata": {}, - "outputs": [], + "cell_type": "markdown", + "id": "9e627156", + "metadata": { + "editable": true + }, "source": [ - "weights = np.ones((3,3))\n", - "print(f\"Before scheduler:\\n{weights=}\")\n", + "### Cumulative Gain\n", "\n", - "epochs = 10\n", - "for e in range(epochs):\n", - " gradient = np.random.rand(3, 3)\n", - " change = adam_scheduler.update_change(gradient)\n", - " weights = weights - change\n", - " adam_scheduler.reset()\n", - "\n", - "print(f\"\\nAfter scheduler:\\n{weights=}\")" + "The cumulative gain curve (or gains chart) evaluates how many\n", + "positives are captured as one targets an increasing fraction of the\n", + "population, sorted by model confidence. 
To construct it, sort all\n", + "instances by decreasing predicted probability of the positive class.\n", + "Then, for the top $\\alpha$ fraction of instances, compute the fraction\n", + "of all actual positives that fall in this subset. In formula form, if\n", + "$P$ is the total number of positive instances and $P(\\alpha)$ is the\n", + "number of positives among the top $\\alpha$ of the data, the cumulative\n", + "gain at level $\\alpha$ is" ] }, { "cell_type": "markdown", - "id": "f018ae57", - "metadata": {}, + "id": "3e9132ef", + "metadata": { + "editable": true + }, "source": [ - "### Cost functions\n", - "\n", - "Here we discuss cost functions that can be used when creating the\n", - "neural network. Every cost function takes the target vector as its\n", - "parameter, and returns a function valued only at $x$ such that it may\n", - "easily be differentiated." + "$$\n", + "\\mathrm{Gain}(\\alpha) \\;=\\; \\frac{P(\\alpha)}{P}.\n", + "$$" ] }, { - "cell_type": "code", - "execution_count": 6, - "id": "c13507bf", - "metadata": {}, - "outputs": [], + "cell_type": "markdown", + "id": "75be6f5c", + "metadata": { + "editable": true + }, "source": [ - "import autograd.numpy as np\n", - "\n", - "def CostOLS(target):\n", - " \n", - " def func(X):\n", - " return (1.0 / target.shape[0]) * np.sum((target - X) ** 2)\n", - "\n", - " return func\n", + "For example, cutting off at the top 10% of predictions yields a gain\n", + "equal to (positives in top 10%) divided by (total positives) .\n", + "Plotting $\\mathrm{Gain}(\\alpha)$ versus $\\alpha$ (often in percent)\n", + "gives the gain curve. The baseline (random) curve is the diagonal\n", + "$\\mathrm{Gain}(\\alpha)=\\alpha$, while an ideal model has a steep climb\n", + "toward 1.\n", "\n", - "\n", - "def CostLogReg(target):\n", - "\n", - " def func(X):\n", - " \n", - " return -(1.0 / target.shape[0]) * np.sum(\n", - " (target * np.log(X + 10e-10)) + ((1 - target) * np.log(1 - X + 10e-10))\n", - " )\n", - "\n", - " return func\n", - "\n", - "\n", - "def CostCrossEntropy(target):\n", - " \n", - " def func(X):\n", - " return -(1.0 / target.size) * np.sum(target * np.log(X + 10e-10))\n", - "\n", - " return func" + "A related measure is the {\\em lift}, often called the gain ratio. It is the ratio of the model’s capture rate to that of random selection. Equivalently," ] }, { "cell_type": "markdown", - "id": "6dab17bc", - "metadata": {}, - "source": [ - "Below we give a short example of how these cost function may be used\n", - "to obtain results if you wish to test them out on your own using\n", - "AutoGrad's automatics differentiation." 
- ] - }, - { - "cell_type": "code", - "execution_count": 7, - "id": "a5dbba01", - "metadata": {}, - "outputs": [], + "id": "e5525570", + "metadata": { + "editable": true + }, "source": [ - "from autograd import grad\n", - "\n", - "target = np.array([[1, 2, 3]]).T\n", - "a = np.array([[4, 5, 6]]).T\n", - "\n", - "cost_func = CostCrossEntropy\n", - "cost_func_derivative = grad(cost_func(target))\n", - "\n", - "valued_at_a = cost_func_derivative(a)\n", - "print(f\"Derivative of cost function {cost_func.__name__} valued at a:\\n{valued_at_a}\")" + "$$\n", + "\\mathrm{Lift}(\\alpha) \\;=\\; \\frac{\\mathrm{Gain}(\\alpha)}{\\alpha}.\n", + "$$" ] }, { "cell_type": "markdown", - "id": "b55d31d4", - "metadata": {}, + "id": "18ff8dc2", + "metadata": { + "editable": true + }, "source": [ - "### Activation functions\n", - "\n", - "Finally, before we look at the neural network, we will look at the\n", - "activation functions which can be specified between the hidden layers\n", - "and as the output function. Each function can be valued for any given\n", - "vector or matrix X, and can be differentiated via derivate()." + "A lift $>1$ indicates better-than-random targeting. In practice, gain\n", + "and lift charts (used e.g.\\ in marketing or imbalanced classification)\n", + "show how many positives can be “gained” by focusing on a fraction of\n", + "the population ." ] }, { - "cell_type": "code", - "execution_count": 8, - "id": "b3e045a6", - "metadata": {}, - "outputs": [], + "cell_type": "markdown", + "id": "c3d3fde8", + "metadata": { + "editable": true + }, "source": [ - "import autograd.numpy as np\n", - "from autograd import elementwise_grad\n", - "\n", - "def identity(X):\n", - " return X\n", - "\n", - "\n", - "def sigmoid(X):\n", - " try:\n", - " return 1.0 / (1 + np.exp(-X))\n", - " except FloatingPointError:\n", - " return np.where(X > np.zeros(X.shape), np.ones(X.shape), np.zeros(X.shape))\n", - "\n", - "\n", - "def softmax(X):\n", - " X = X - np.max(X, axis=-1, keepdims=True)\n", - " delta = 10e-10\n", - " return np.exp(X) / (np.sum(np.exp(X), axis=-1, keepdims=True) + delta)\n", - "\n", - "\n", - "def RELU(X):\n", - " return np.where(X > np.zeros(X.shape), X, np.zeros(X.shape))\n", - "\n", - "\n", - "def LRELU(X):\n", - " delta = 10e-4\n", - " return np.where(X > np.zeros(X.shape), X, delta * X)\n", + "### Other measures: Precision, Recall, and the F$_1$ Measure\n", "\n", - "\n", - "def derivate(func):\n", - " if func.__name__ == \"RELU\":\n", - "\n", - " def func(X):\n", - " return np.where(X > 0, 1, 0)\n", - "\n", - " return func\n", - "\n", - " elif func.__name__ == \"LRELU\":\n", - "\n", - " def func(X):\n", - " delta = 10e-4\n", - " return np.where(X > 0, 1, delta)\n", - "\n", - " return func\n", - "\n", - " else:\n", - " return elementwise_grad(func)" + "Precision and recall (sensitivity) quantify binary classification\n", + "accuracy in terms of positive predictions. They are defined from the\n", + "confusion matrix as:" ] }, { "cell_type": "markdown", - "id": "c0189342", - "metadata": {}, + "id": "f1f14c8e", + "metadata": { + "editable": true + }, "source": [ - "Below follows a short demonstration of how to use an activation\n", - "function. The derivative of the activation function will be important\n", - "when calculating the output delta term during backpropagation. Note\n", - "that derivate() can also be used for cost functions for a more\n", - "generalized approach." 
+ "$$\n", + "\\text{Precision} = \\frac{TP}{TP + FP}, \\qquad \\text{Recall} = \\frac{TP}{TP + FN}.\n", + "$$" ] }, { - "cell_type": "code", - "execution_count": 9, - "id": "640aa861", - "metadata": {}, - "outputs": [], + "cell_type": "markdown", + "id": "422cc743", + "metadata": { + "editable": true + }, "source": [ - "z = np.array([[4, 5, 6]]).T\n", - "print(f\"Input to activation function:\\n{z}\")\n", - "\n", - "act_func = sigmoid\n", - "a = act_func(z)\n", - "print(f\"\\nOutput from {act_func.__name__} activation function:\\n{a}\")\n", + "Precision is the fraction of predicted positives that are correct, and\n", + "recall is the fraction of actual positives that are correctly\n", + "identified . A high-precision classifier makes few false-positive\n", + "errors, while a high-recall classifier makes few false-negative\n", + "errors.\n", "\n", - "act_func_derivative = derivate(act_func)\n", - "valued_at_z = act_func_derivative(a)\n", - "print(f\"\\nDerivative of {act_func.__name__} activation function valued at z:\\n{valued_at_z}\")" + "The F$_1$ score (balanced F-measure) combines precision and recall into a single metric via their harmonic mean. The usual formula is:" ] }, { "cell_type": "markdown", - "id": "1007ccdd", - "metadata": {}, + "id": "621a2e8b", + "metadata": { + "editable": true + }, "source": [ - "### The Neural Network\n", - "\n", - "Now that we have gotten a good understanding of the implementation of\n", - "some important components, we can take a look at an object oriented\n", - "implementation of a feed forward neural network. The feed forward\n", - "neural network has been implemented as a class named FFNN, which can\n", - "be initiated as a regressor or classifier dependant on the choice of\n", - "cost function. The FFNN can have any number of input nodes, hidden\n", - "layers with any amount of hidden nodes, and any amount of output nodes\n", - "meaning it can perform multiclass classification as well as binary\n", - "classification and regression problems. Although there is a lot of\n", - "code present, it makes for an easy to use and generalizeable interface\n", - "for creating many types of neural networks as will be demonstrated\n", - "below." + "$$\n", + "F_1 =2\\frac{\\text{Precision}\\times\\text{Recall}}{\\text{Precision} + \\text{Recall}}.\n", + "$$" ] }, { - "cell_type": "code", - "execution_count": 10, - "id": "9584a2da", - "metadata": {}, - "outputs": [], + "cell_type": "markdown", + "id": "62eee54a", + "metadata": { + "editable": true + }, "source": [ - "import math\n", - "import autograd.numpy as np\n", - "import sys\n", - "import warnings\n", - "from autograd import grad, elementwise_grad\n", - "from random import random, seed\n", - "from copy import deepcopy, copy\n", - "from typing import Tuple, Callable\n", - "from sklearn.utils import resample\n", - "\n", - "warnings.simplefilter(\"error\")\n", - "\n", - "\n", - "class FFNN:\n", - " \"\"\"\n", - " Description:\n", - " ------------\n", - " Feed Forward Neural Network with interface enabling flexible design of a\n", - " nerual networks architecture and the specification of activation function\n", - " in the hidden layers and output layer respectively. This model can be used\n", - " for both regression and classification problems, depending on the output function.\n", - "\n", - " Attributes:\n", - " ------------\n", - " I dimensions (tuple[int]): A list of positive integers, which specifies the\n", - " number of nodes in each of the networks layers. 
The first integer in the array\n", - " defines the number of nodes in the input layer, the second integer defines number\n", - " of nodes in the first hidden layer and so on until the last number, which\n", - " specifies the number of nodes in the output layer.\n", - " II hidden_func (Callable): The activation function for the hidden layers\n", - " III output_func (Callable): The activation function for the output layer\n", - " IV cost_func (Callable): Our cost function\n", - " V seed (int): Sets random seed, makes results reproducible\n", - " \"\"\"\n", - "\n", - " def __init__(\n", - " self,\n", - " dimensions: tuple[int],\n", - " hidden_func: Callable = sigmoid,\n", - " output_func: Callable = lambda x: x,\n", - " cost_func: Callable = CostOLS,\n", - " seed: int = None,\n", - " ):\n", - " self.dimensions = dimensions\n", - " self.hidden_func = hidden_func\n", - " self.output_func = output_func\n", - " self.cost_func = cost_func\n", - " self.seed = seed\n", - " self.weights = list()\n", - " self.schedulers_weight = list()\n", - " self.schedulers_bias = list()\n", - " self.a_matrices = list()\n", - " self.z_matrices = list()\n", - " self.classification = None\n", - "\n", - " self.reset_weights()\n", - " self._set_classification()\n", - "\n", - " def fit(\n", - " self,\n", - " X: np.ndarray,\n", - " t: np.ndarray,\n", - " scheduler: Scheduler,\n", - " batches: int = 1,\n", - " epochs: int = 100,\n", - " lam: float = 0,\n", - " X_val: np.ndarray = None,\n", - " t_val: np.ndarray = None,\n", - " ):\n", - " \"\"\"\n", - " Description:\n", - " ------------\n", - " This function performs the training the neural network by performing the feedforward and backpropagation\n", - " algorithm to update the networks weights.\n", - "\n", - " Parameters:\n", - " ------------\n", - " I X (np.ndarray) : training data\n", - " II t (np.ndarray) : target data\n", - " III scheduler (Scheduler) : specified scheduler (algorithm for optimization of gradient descent)\n", - " IV scheduler_args (list[int]) : list of all arguments necessary for scheduler\n", - "\n", - " Optional Parameters:\n", - " ------------\n", - " V batches (int) : number of batches the datasets are split into, default equal to 1\n", - " VI epochs (int) : number of iterations used to train the network, default equal to 100\n", - " VII lam (float) : regularization hyperparameter lambda\n", - " VIII X_val (np.ndarray) : validation set\n", - " IX t_val (np.ndarray) : validation target set\n", - "\n", - " Returns:\n", - " ------------\n", - " I scores (dict) : A dictionary containing the performance metrics of the model.\n", - " The number of the metrics depends on the parameters passed to the fit-function.\n", - "\n", - " \"\"\"\n", - "\n", - " # setup \n", - " if self.seed is not None:\n", - " np.random.seed(self.seed)\n", - "\n", - " val_set = False\n", - " if X_val is not None and t_val is not None:\n", - " val_set = True\n", - "\n", - " # creating arrays for score metrics\n", - " train_errors = np.empty(epochs)\n", - " train_errors.fill(np.nan)\n", - " val_errors = np.empty(epochs)\n", - " val_errors.fill(np.nan)\n", - "\n", - " train_accs = np.empty(epochs)\n", - " train_accs.fill(np.nan)\n", - " val_accs = np.empty(epochs)\n", - " val_accs.fill(np.nan)\n", - "\n", - " self.schedulers_weight = list()\n", - " self.schedulers_bias = list()\n", - "\n", - " batch_size = X.shape[0] // batches\n", - "\n", - " X, t = resample(X, t)\n", - "\n", - " # this function returns a function valued only at X\n", - " cost_function_train = self.cost_func(t)\n", 
- " if val_set:\n", - " cost_function_val = self.cost_func(t_val)\n", - "\n", - " # create schedulers for each weight matrix\n", - " for i in range(len(self.weights)):\n", - " self.schedulers_weight.append(copy(scheduler))\n", - " self.schedulers_bias.append(copy(scheduler))\n", - "\n", - " print(f\"{scheduler.__class__.__name__}: Eta={scheduler.eta}, Lambda={lam}\")\n", - "\n", - " try:\n", - " for e in range(epochs):\n", - " for i in range(batches):\n", - " # allows for minibatch gradient descent\n", - " if i == batches - 1:\n", - " # If the for loop has reached the last batch, take all thats left\n", - " X_batch = X[i * batch_size :, :]\n", - " t_batch = t[i * batch_size :, :]\n", - " else:\n", - " X_batch = X[i * batch_size : (i + 1) * batch_size, :]\n", - " t_batch = t[i * batch_size : (i + 1) * batch_size, :]\n", - "\n", - " self._feedforward(X_batch)\n", - " self._backpropagate(X_batch, t_batch, lam)\n", - "\n", - " # reset schedulers for each epoch (some schedulers pass in this call)\n", - " for scheduler in self.schedulers_weight:\n", - " scheduler.reset()\n", - "\n", - " for scheduler in self.schedulers_bias:\n", - " scheduler.reset()\n", - "\n", - " # computing performance metrics\n", - " pred_train = self.predict(X)\n", - " train_error = cost_function_train(pred_train)\n", - "\n", - " train_errors[e] = train_error\n", - " if val_set:\n", - " \n", - " pred_val = self.predict(X_val)\n", - " val_error = cost_function_val(pred_val)\n", - " val_errors[e] = val_error\n", - "\n", - " if self.classification:\n", - " train_acc = self._accuracy(self.predict(X), t)\n", - " train_accs[e] = train_acc\n", - " if val_set:\n", - " val_acc = self._accuracy(pred_val, t_val)\n", - " val_accs[e] = val_acc\n", - "\n", - " # printing progress bar\n", - " progression = e / epochs\n", - " print_length = self._progress_bar(\n", - " progression,\n", - " train_error=train_errors[e],\n", - " train_acc=train_accs[e],\n", - " val_error=val_errors[e],\n", - " val_acc=val_accs[e],\n", - " )\n", - " except KeyboardInterrupt:\n", - " # allows for stopping training at any point and seeing the result\n", - " pass\n", - "\n", - " # visualization of training progression (similiar to tensorflow progression bar)\n", - " sys.stdout.write(\"\\r\" + \" \" * print_length)\n", - " sys.stdout.flush()\n", - " self._progress_bar(\n", - " 1,\n", - " train_error=train_errors[e],\n", - " train_acc=train_accs[e],\n", - " val_error=val_errors[e],\n", - " val_acc=val_accs[e],\n", - " )\n", - " sys.stdout.write(\"\")\n", - "\n", - " # return performance metrics for the entire run\n", - " scores = dict()\n", - "\n", - " scores[\"train_errors\"] = train_errors\n", - "\n", - " if val_set:\n", - " scores[\"val_errors\"] = val_errors\n", - "\n", - " if self.classification:\n", - " scores[\"train_accs\"] = train_accs\n", - "\n", - " if val_set:\n", - " scores[\"val_accs\"] = val_accs\n", - "\n", - " return scores\n", - "\n", - " def predict(self, X: np.ndarray, *, threshold=0.5):\n", - " \"\"\"\n", - " Description:\n", - " ------------\n", - " Performs prediction after training of the network has been finished.\n", - "\n", - " Parameters:\n", - " ------------\n", - " I X (np.ndarray): The design matrix, with n rows of p features each\n", - "\n", - " Optional Parameters:\n", - " ------------\n", - " II threshold (float) : sets minimal value for a prediction to be predicted as the positive class\n", - " in classification problems\n", - "\n", - " Returns:\n", - " ------------\n", - " I z (np.ndarray): A prediction vector (row) for each row 
in our design matrix\n", - " This vector is thresholded if regression=False, meaning that classification results\n", - " in a vector of 1s and 0s, while regressions in an array of decimal numbers\n", - "\n", - " \"\"\"\n", - "\n", - " predict = self._feedforward(X)\n", - "\n", - " if self.classification:\n", - " return np.where(predict > threshold, 1, 0)\n", - " else:\n", - " return predict\n", - "\n", - " def reset_weights(self):\n", - " \"\"\"\n", - " Description:\n", - " ------------\n", - " Resets/Reinitializes the weights in order to train the network for a new problem.\n", - "\n", - " \"\"\"\n", - " if self.seed is not None:\n", - " np.random.seed(self.seed)\n", - "\n", - " self.weights = list()\n", - " for i in range(len(self.dimensions) - 1):\n", - " weight_array = np.random.randn(\n", - " self.dimensions[i] + 1, self.dimensions[i + 1]\n", - " )\n", - " weight_array[0, :] = np.random.randn(self.dimensions[i + 1]) * 0.01\n", - "\n", - " self.weights.append(weight_array)\n", - "\n", - " def _feedforward(self, X: np.ndarray):\n", - " \"\"\"\n", - " Description:\n", - " ------------\n", - " Calculates the activation of each layer starting at the input and ending at the output.\n", - " Each following activation is calculated from a weighted sum of each of the preceeding\n", - " activations (except in the case of the input layer).\n", - "\n", - " Parameters:\n", - " ------------\n", - " I X (np.ndarray): The design matrix, with n rows of p features each\n", - "\n", - " Returns:\n", - " ------------\n", - " I z (np.ndarray): A prediction vector (row) for each row in our design matrix\n", - " \"\"\"\n", - "\n", - " # reset matrices\n", - " self.a_matrices = list()\n", - " self.z_matrices = list()\n", - "\n", - " # if X is just a vector, make it into a matrix\n", - " if len(X.shape) == 1:\n", - " X = X.reshape((1, X.shape[0]))\n", - "\n", - " # Add a coloumn of zeros as the first coloumn of the design matrix, in order\n", - " # to add bias to our data\n", - " bias = np.ones((X.shape[0], 1)) * 0.01\n", - " X = np.hstack([bias, X])\n", - "\n", - " # a^0, the nodes in the input layer (one a^0 for each row in X - where the\n", - " # exponent indicates layer number).\n", - " a = X\n", - " self.a_matrices.append(a)\n", - " self.z_matrices.append(a)\n", - "\n", - " # The feed forward algorithm\n", - " for i in range(len(self.weights)):\n", - " if i < len(self.weights) - 1:\n", - " z = a @ self.weights[i]\n", - " self.z_matrices.append(z)\n", - " a = self.hidden_func(z)\n", - " # bias column again added to the data here\n", - " bias = np.ones((a.shape[0], 1)) * 0.01\n", - " a = np.hstack([bias, a])\n", - " self.a_matrices.append(a)\n", - " else:\n", - " try:\n", - " # a^L, the nodes in our output layers\n", - " z = a @ self.weights[i]\n", - " a = self.output_func(z)\n", - " self.a_matrices.append(a)\n", - " self.z_matrices.append(z)\n", - " except Exception as OverflowError:\n", - " print(\n", - " \"OverflowError in fit() in FFNN\\nHOW TO DEBUG ERROR: Consider lowering your learning rate or scheduler specific parameters such as momentum, or check if your input values need scaling\"\n", - " )\n", - "\n", - " # this will be a^L\n", - " return a\n", - "\n", - " def _backpropagate(self, X, t, lam):\n", - " \"\"\"\n", - " Description:\n", - " ------------\n", - " Performs the backpropagation algorithm. 
In other words, this method\n", - " calculates the gradient of all the layers starting at the\n", - " output layer, and moving from right to left accumulates the gradient until\n", - " the input layer is reached. Each layers respective weights are updated while\n", - " the algorithm propagates backwards from the output layer (auto-differentation in reverse mode).\n", - "\n", - " Parameters:\n", - " ------------\n", - " I X (np.ndarray): The design matrix, with n rows of p features each.\n", - " II t (np.ndarray): The target vector, with n rows of p targets.\n", - " III lam (float32): regularization parameter used to punish the weights in case of overfitting\n", - "\n", - " Returns:\n", - " ------------\n", - " No return value.\n", - "\n", - " \"\"\"\n", - " out_derivative = derivate(self.output_func)\n", - " hidden_derivative = derivate(self.hidden_func)\n", - "\n", - " for i in range(len(self.weights) - 1, -1, -1):\n", - " # delta terms for output\n", - " if i == len(self.weights) - 1:\n", - " # for multi-class classification\n", - " if (\n", - " self.output_func.__name__ == \"softmax\"\n", - " ):\n", - " delta_matrix = self.a_matrices[i + 1] - t\n", - " # for single class classification\n", - " else:\n", - " cost_func_derivative = grad(self.cost_func(t))\n", - " delta_matrix = out_derivative(\n", - " self.z_matrices[i + 1]\n", - " ) * cost_func_derivative(self.a_matrices[i + 1])\n", - "\n", - " # delta terms for hidden layer\n", - " else:\n", - " delta_matrix = (\n", - " self.weights[i + 1][1:, :] @ delta_matrix.T\n", - " ).T * hidden_derivative(self.z_matrices[i + 1])\n", - "\n", - " # calculate gradient\n", - " gradient_weights = self.a_matrices[i][:, 1:].T @ delta_matrix\n", - " gradient_bias = np.sum(delta_matrix, axis=0).reshape(\n", - " 1, delta_matrix.shape[1]\n", - " )\n", - "\n", - " # regularization term\n", - " gradient_weights += self.weights[i][1:, :] * lam\n", - "\n", - " # use scheduler\n", - " update_matrix = np.vstack(\n", - " [\n", - " self.schedulers_bias[i].update_change(gradient_bias),\n", - " self.schedulers_weight[i].update_change(gradient_weights),\n", - " ]\n", - " )\n", - "\n", - " # update weights and bias\n", - " self.weights[i] -= update_matrix\n", - "\n", - " def _accuracy(self, prediction: np.ndarray, target: np.ndarray):\n", - " \"\"\"\n", - " Description:\n", - " ------------\n", - " Calculates accuracy of given prediction to target\n", - "\n", - " Parameters:\n", - " ------------\n", - " I prediction (np.ndarray): vector of predicitons output network\n", - " (1s and 0s in case of classification, and real numbers in case of regression)\n", - " II target (np.ndarray): vector of true values (What the network ideally should predict)\n", - "\n", - " Returns:\n", - " ------------\n", - " A floating point number representing the percentage of correctly classified instances.\n", - " \"\"\"\n", - " assert prediction.size == target.size\n", - " return np.average((target == prediction))\n", - " def _set_classification(self):\n", - " \"\"\"\n", - " Description:\n", - " ------------\n", - " Decides if FFNN acts as classifier (True) og regressor (False),\n", - " sets self.classification during init()\n", - " \"\"\"\n", - " self.classification = False\n", - " if (\n", - " self.cost_func.__name__ == \"CostLogReg\"\n", - " or self.cost_func.__name__ == \"CostCrossEntropy\"\n", - " ):\n", - " self.classification = True\n", - "\n", - " def _progress_bar(self, progression, **kwargs):\n", - " \"\"\"\n", - " Description:\n", - " ------------\n", - " Displays progress of 
training\n", - " \"\"\"\n", - " print_length = 40\n", - " num_equals = int(progression * print_length)\n", - " num_not = print_length - num_equals\n", - " arrow = \">\" if num_equals > 0 else \"\"\n", - " bar = \"[\" + \"=\" * (num_equals - 1) + arrow + \"-\" * num_not + \"]\"\n", - " perc_print = self._format(progression * 100, decimals=5)\n", - " line = f\" {bar} {perc_print}% \"\n", - "\n", - " for key in kwargs:\n", - " if not np.isnan(kwargs[key]):\n", - " value = self._format(kwargs[key], decimals=4)\n", - " line += f\"| {key}: {value} \"\n", - " sys.stdout.write(\"\\r\" + line)\n", - " sys.stdout.flush()\n", - " return len(line)\n", - "\n", - " def _format(self, value, decimals=4):\n", - " \"\"\"\n", - " Description:\n", - " ------------\n", - " Formats decimal numbers for progress bar\n", - " \"\"\"\n", - " if value > 0:\n", - " v = value\n", - " elif value < 0:\n", - " v = -10 * value\n", - " else:\n", - " v = 1\n", - " n = 1 + math.floor(math.log10(v))\n", - " if n >= decimals - 1:\n", - " return str(round(value))\n", - " return f\"{value:.{decimals-n-1}f}\"" + "This can be shown to equal" ] }, { "cell_type": "markdown", - "id": "9ccd1fc1", - "metadata": {}, + "id": "7a6a2e7a", + "metadata": { + "editable": true + }, "source": [ - "Before we make a model, we will quickly generate a dataset we can use\n", - "for our linear regression problem as shown below" + "$$\n", + "\\frac{2\\,TP}{2\\,TP + FP + FN}.\n", + "$$" ] }, { - "cell_type": "code", - "execution_count": 11, - "id": "7f3a5b31", - "metadata": {}, - "outputs": [], + "cell_type": "markdown", + "id": "b96c9ff4", + "metadata": { + "editable": true + }, "source": [ - "import autograd.numpy as np\n", - "from sklearn.model_selection import train_test_split\n", - "\n", - "def SkrankeFunction(x, y):\n", - " return np.ravel(0 + 1*x + 2*y + 3*x**2 + 4*x*y + 5*y**2)\n", - "\n", - "def create_X(x, y, n):\n", - " if len(x.shape) > 1:\n", - " x = np.ravel(x)\n", - " y = np.ravel(y)\n", - "\n", - " N = len(x)\n", - " l = int((n + 1) * (n + 2) / 2) # Number of elements in beta\n", - " X = np.ones((N, l))\n", - "\n", - " for i in range(1, n + 1):\n", - " q = int((i) * (i + 1) / 2)\n", - " for k in range(i + 1):\n", - " X[:, q + k] = (x ** (i - k)) * (y**k)\n", + "The F$_1$ score ranges from 0 (worst) to 1 (best), and balances the\n", + "trade-off between precision and recall.\n", "\n", - " return X\n", + "For multi-class classification, one computes per-class\n", + "precision/recall/F$_1$ (treating each class as “positive” in a\n", + "one-vs-rest manner) and then averages. Common averaging methods are:\n", "\n", - "step=0.5\n", - "x = np.arange(0, 1, step)\n", - "y = np.arange(0, 1, step)\n", - "x, y = np.meshgrid(x, y)\n", - "target = SkrankeFunction(x, y)\n", - "target = target.reshape(target.shape[0], 1)\n", + "Micro-averaging: Sum all true positives, false positives, and false negatives across classes, then compute precision/recall/F$_1$ from these totals.\n", + "Macro-averaging: Compute the F$1$ score $F{1,i}$ for each class $i$ separately, then take the unweighted mean: $F_{1,\\mathrm{macro}} = \\frac{1}{K}\\sum_{i=1}^K F_{1,i}$ . This treats all classes equally regardless of size.\n", + "Weighted-averaging: Like macro-average, but weight each class’s $F_{1,i}$ by its support $n_i$ (true count): $F_{1,\\mathrm{weighted}} = \\frac{1}{N}\\sum_{i=1}^K n_i F_{1,i}$, where $N=\\sum_i n_i$. 
This accounts for class imbalance by giving more weight to larger classes .\n", "\n", - "poly_degree=3\n", - "X = create_X(x, y, poly_degree)\n", - "\n", - "X_train, X_test, t_train, t_test = train_test_split(X, target)" + "Each of these averages has different use-cases. Micro-average is\n", + "dominated by common classes, macro-average highlights performance on\n", + "rare classes, and weighted-average is a compromise. These formulas\n", + "and concepts allow rigorous evaluation of classifier performance in\n", + "both binary and multi-class settings." ] }, { "cell_type": "markdown", - "id": "1ac05bb6", - "metadata": {}, - "source": [ - "Now that we have our dataset ready for the regression, we can create\n", - "our regressor. Note that with the seed parameter, we can make sure our\n", - "results stay the same every time we run the neural network. For\n", - "inititialization, we simply specify the dimensions (we wish the amount\n", - "of input nodes to be equal to the datapoints, and the output to\n", - "predict one value)." - ] - }, - { - "cell_type": "code", - "execution_count": 12, - "id": "0f857604", - "metadata": {}, - "outputs": [], + "id": "9274bf3f", + "metadata": { + "editable": true + }, "source": [ - "input_nodes = X_train.shape[1]\n", - "output_nodes = 1\n", + "## Exercises\n", "\n", - "linear_regression = FFNN((input_nodes, output_nodes), output_func=identity, cost_func=CostOLS, seed=2023)" - ] - }, - { - "cell_type": "markdown", - "id": "eeff4315", - "metadata": {}, - "source": [ - "We then fit our model with our training data using the scheduler of our choice." + "Here is a simple code example which uses the Logistic regression machinery from **scikit-learn**.\n", + "At the end it sets up the confusion matrix and the ROC and cumulative gain curves.\n", + "Feel free to use these functionalities (we don't expect you to write your own code for say the confusion matrix)." 
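Before that placeholder snippet, here is one possible filled-in version (a sketch under stated assumptions: the unspecified dataset is replaced by the Wisconsin breast-cancer data, the mixed `mydata.data`/`cancer.target` references are unified into one `data` object, and the cumulative gain curve is computed by hand rather than through the scikit-plot package):

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (ConfusionMatrixDisplay, RocCurveDisplay,
                             classification_report)

# Wisconsin breast-cancer data as a stand-in for the dataset left as a placeholder
data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, random_state=0)

logreg = LogisticRegression(solver='lbfgs', max_iter=10000)
logreg.fit(X_train, y_train)
print("Test set accuracy with Logistic Regression: {:.2f}".format(
    logreg.score(X_test, y_test)))

# Confusion matrix (row-normalized) and ROC curve via scikit-learn's plotting helpers
ConfusionMatrixDisplay.from_estimator(logreg, X_test, y_test, normalize='true')
plt.show()
RocCurveDisplay.from_estimator(logreg, X_test, y_test)
plt.show()

# Cumulative gain: sort the test set by predicted probability of the positive class
proba = logreg.predict_proba(X_test)[:, 1]
order = np.argsort(proba)[::-1]
gain = np.cumsum(y_test[order]) / y_test.sum()
fraction = np.arange(1, len(y_test) + 1) / len(y_test)
plt.plot(fraction, gain, label="logistic regression")
plt.plot([0, 1], [0, 1], "k--", label="baseline (random)")
plt.xlabel("Fraction of samples targeted")
plt.ylabel("Fraction of positives captured")
plt.legend()
plt.show()

# Precision, recall and F1 per class, plus the macro and weighted averages above
print(classification_report(y_test, logreg.predict(X_test)))
```

The final classification_report call also prints the precision, recall and F$_1$ scores discussed earlier, including their macro and weighted averages, so the same run illustrates all the metrics of this exercise set.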
] }, { "cell_type": "code", - "execution_count": 13, - "id": "46246810", - "metadata": {}, + "execution_count": 1, + "id": "be9ff0b9", + "metadata": { + "collapsed": false, + "editable": true, + "jupyter": { + "outputs_hidden": false + } + }, "outputs": [], "source": [ - "linear_regression.reset_weights() # reset weights such that previous runs or reruns don't affect the weights\n", + "%matplotlib inline\n", "\n", - "scheduler = Constant(eta=1e-3)\n", - "scores = linear_regression.fit(X_train, t_train, scheduler)" + "import matplotlib.pyplot as plt\n", + "import numpy as np\n", + "from sklearn.model_selection import train_test_split \n", + "# from sklearn.datasets import fill in the data set\n", + "from sklearn.linear_model import LogisticRegression\n", + "\n", + "# Load the data, fill inn\n", + "mydata.data = ?\n", + "\n", + "X_train, X_test, y_train, y_test = train_test_split(mydata.data,cancer.target,random_state=0)\n", + "print(X_train.shape)\n", + "print(X_test.shape)\n", + "# Logistic Regression\n", + "# define which type of problem, binary or multiclass\n", + "logreg = LogisticRegression(solver='lbfgs')\n", + "logreg.fit(X_train, y_train)\n", + "\n", + "from sklearn.preprocessing import LabelEncoder\n", + "from sklearn.model_selection import cross_validate\n", + "#Cross validation\n", + "accuracy = cross_validate(logreg,X_test,y_test,cv=10)['test_score']\n", + "print(accuracy)\n", + "print(\"Test set accuracy with Logistic Regression: {:.2f}\".format(logreg.score(X_test,y_test)))\n", + "\n", + "import scikitplot as skplt\n", + "y_pred = logreg.predict(X_test)\n", + "skplt.metrics.plot_confusion_matrix(y_test, y_pred, normalize=True)\n", + "plt.show()\n", + "y_probas = logreg.predict_proba(X_test)\n", + "skplt.metrics.plot_roc(y_test, y_probas)\n", + "plt.show()\n", + "skplt.metrics.plot_cumulative_gain(y_test, y_probas)\n", + "plt.show()" ] }, { "cell_type": "markdown", - "id": "8c6f9954", - "metadata": {}, - "source": [ - "Due to the progress bar we can see the MSE (train_error) throughout\n", - "the FFNN's training. Note that the fit() function has some optional\n", - "parameters with defualt arguments. For example, the regularization\n", - "hyperparameter can be left ignored if not needed, and equally the FFNN\n", - "will by default run for 100 epochs. These can easily be changed, such\n", - "as for example:" - ] - }, - { - "cell_type": "code", - "execution_count": 14, - "id": "2661939c", - "metadata": {}, - "outputs": [], + "id": "51760b3e", + "metadata": { + "editable": true + }, "source": [ - "linear_regression.reset_weights() # reset weights such that previous runs or reruns don't affect the weights\n", + "### Exercise a)\n", "\n", - "scores = linear_regression.fit(X_train, t_train, scheduler, lam=1e-4, epochs=1000)" + "Convince yourself about the mathematics for the confusion matrix, the ROC and the cumlative gain curves for both a binary and a multiclass classification problem." ] }, { "cell_type": "markdown", - "id": "74c5624c", - "metadata": {}, - "source": [ - "We see that given more epochs to train on, the regressor reaches a lower MSE.\n", - "\n", - "Let us then switch to a binary classification. We use a binary\n", - "classification dataset, and follow a similar setup to the regression\n", - "case." 
- ] - }, - { - "cell_type": "code", - "execution_count": 15, - "id": "4b8eb115", - "metadata": {}, - "outputs": [], + "id": "c1d42f5f", + "metadata": { + "editable": true + }, "source": [ - "from sklearn.datasets import load_breast_cancer\n", - "from sklearn.preprocessing import MinMaxScaler\n", - "\n", - "wisconsin = load_breast_cancer()\n", - "X = wisconsin.data\n", - "target = wisconsin.target\n", - "target = target.reshape(target.shape[0], 1)\n", - "\n", - "X_train, X_val, t_train, t_val = train_test_split(X, target)\n", + "### Exercise b)\n", "\n", - "scaler = MinMaxScaler()\n", - "scaler.fit(X_train)\n", - "X_train = scaler.transform(X_train)\n", - "X_val = scaler.transform(X_val)" + "Use a binary classification data available from **scikit-learn**. As an example you can use\n", + "the MNIST data set and just specialize to two numbers. To do so you can use the following code lines" ] }, { "cell_type": "code", - "execution_count": 16, - "id": "2c0f92bd", - "metadata": {}, + "execution_count": 2, + "id": "d20bb8be", + "metadata": { + "collapsed": false, + "editable": true, + "jupyter": { + "outputs_hidden": false + } + }, "outputs": [], "source": [ - "input_nodes = X_train.shape[1]\n", - "output_nodes = 1\n", - "\n", - "logistic_regression = FFNN((input_nodes, output_nodes), output_func=sigmoid, cost_func=CostLogReg, seed=2023)" + "from sklearn.datasets import load_digits\n", + "digits = load_digits(n_class=2) # Load only two classes, e.g., 0 and 1\n", + "X, y = digits.data, digits.target" ] }, { "cell_type": "markdown", - "id": "49201ae4", - "metadata": {}, + "id": "828ea1cd", + "metadata": { + "editable": true + }, "source": [ - "We will now make use of our validation data by passing it into our fit function as a keyword argument" + "Alternatively, you can use the _make$\\_$classification_\n", + "functionality. This function generates a random $n$-class classification\n", + "dataset, which can be configured for binary classification by setting\n", + "n_classes=2. You can also control the number of samples, features,\n", + "informative features, redundant features, and more." ] }, { "cell_type": "code", - "execution_count": 17, - "id": "55b5e426", - "metadata": {}, + "execution_count": 3, + "id": "d271f0ba", + "metadata": { + "collapsed": false, + "editable": true, + "jupyter": { + "outputs_hidden": false + } + }, "outputs": [], "source": [ - "logistic_regression.reset_weights() # reset weights such that previous runs or reruns don't affect the weights\n", - "\n", - "scheduler = Adam(eta=1e-3, rho=0.9, rho2=0.999)\n", - "scores = logistic_regression.fit(X_train, t_train, scheduler, epochs=1000, X_val=X_val, t_val=t_val)" + "from sklearn.datasets import make_classification\n", + "X, y = make_classification(n_samples=1000, n_features=20, n_informative=10, n_redundant=5, n_classes=2, random_state=42)" ] }, { "cell_type": "markdown", - "id": "b51762fb", - "metadata": {}, + "id": "0068b032", + "metadata": { + "editable": true + }, "source": [ - "Finally, we will create a neural network with 2 hidden layers with activation functions." + "You can use this option for the multiclass case as well, see the next exercise.\n", + "If you prefer to study other binary classification datasets, feel free\n", + "to replace the above suggestions with your own dataset.\n", + "\n", + "Make plots of the confusion matrix, the ROC curve and the cumulative gain curve." 
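As a rough, optional starting point for this exercise, the sketch below trains logistic regression on the two-class digits data and produces the confusion matrix and ROC curve. It assumes only scikit-learn and matplotlib; the `ConfusionMatrixDisplay` and `RocCurveDisplay` helpers are used here instead of the `scikitplot` calls shown in the example code above, and you would still need to add the cumulative gain curve (for instance via `scikitplot`).

```python
# Possible sketch for the binary case: two digit classes, logistic regression,
# confusion matrix and ROC curve with scikit-learn's display helpers.
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import ConfusionMatrixDisplay, RocCurveDisplay, accuracy_score

digits = load_digits(n_class=2)                  # only the digits 0 and 1
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, random_state=0)

logreg = LogisticRegression(solver="lbfgs", max_iter=1000)
logreg.fit(X_train, y_train)
print("Test set accuracy:", accuracy_score(y_test, logreg.predict(X_test)))

ConfusionMatrixDisplay.from_estimator(logreg, X_test, y_test, normalize="true")
plt.show()
RocCurveDisplay.from_estimator(logreg, X_test, y_test)
plt.show()
```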
] }, { - "cell_type": "code", - "execution_count": 18, - "id": "6b59e27d", - "metadata": {}, - "outputs": [], + "cell_type": "markdown", + "id": "c45f5b41", + "metadata": { + "editable": true + }, "source": [ - "input_nodes = X_train.shape[1]\n", - "hidden_nodes1 = 100\n", - "hidden_nodes2 = 30\n", - "output_nodes = 1\n", - "\n", - "dims = (input_nodes, hidden_nodes1, hidden_nodes2, output_nodes)\n", + "### Exercise c) week 43\n", "\n", - "neural_network = FFNN(dims, hidden_func=RELU, output_func=sigmoid, cost_func=CostLogReg, seed=2023)" + "As a multiclass problem, we will use the Iris data set discussed in\n", + "the exercises from weeks 41 and 42. This is a three-class data set and\n", + "you can set it up using **scikit-learn**," ] }, { "cell_type": "code", - "execution_count": 19, - "id": "72c87921", - "metadata": {}, + "execution_count": 4, + "id": "3b045d56", + "metadata": { + "collapsed": false, + "editable": true, + "jupyter": { + "outputs_hidden": false + } + }, "outputs": [], "source": [ - "neural_network.reset_weights() # reset weights such that previous runs or reruns don't affect the weights\n", - "\n", - "scheduler = Adam(eta=1e-4, rho=0.9, rho2=0.999)\n", - "scores = neural_network.fit(X_train, t_train, scheduler, epochs=1000, X_val=X_val, t_val=t_val)" + "from sklearn.datasets import load_iris\n", + "iris = load_iris()\n", + "X = iris.data # Features\n", + "y = iris.target # Target labels" ] }, { "cell_type": "markdown", - "id": "ed40d7d2", - "metadata": {}, - "source": [ - "### Multiclass classification\n", - "\n", - "Finally, we will demonstrate the use case of multiclass classification\n", - "using our FFNN with the famous MNIST dataset, which contain images of\n", - "digits between the range of 0 to 9." - ] - }, - { - "cell_type": "code", - "execution_count": 20, - "id": "315ef3fe", - "metadata": {}, - "outputs": [], + "id": "14cc859c", + "metadata": { + "editable": true + }, "source": [ - "from sklearn.datasets import load_digits\n", - "\n", - "def onehot(target: np.ndarray):\n", - " onehot = np.zeros((target.size, target.max() + 1))\n", - " onehot[np.arange(target.size), target] = 1\n", - " return onehot\n", - "\n", - "digits = load_digits()\n", - "\n", - "X = digits.data\n", - "target = digits.target\n", - "target = onehot(target)\n", - "\n", - "input_nodes = 64\n", - "hidden_nodes1 = 100\n", - "hidden_nodes2 = 30\n", - "output_nodes = 10\n", - "\n", - "dims = (input_nodes, hidden_nodes1, hidden_nodes2, output_nodes)\n", - "\n", - "multiclass = FFNN(dims, hidden_func=LRELU, output_func=softmax, cost_func=CostCrossEntropy)\n", - "\n", - "multiclass.reset_weights() # reset weights such that previous runs or reruns don't affect the weights\n", - "\n", - "scheduler = Adam(eta=1e-4, rho=0.9, rho2=0.999)\n", - "scores = multiclass.fit(X, target, scheduler, epochs=1000)" + "Make plots of the confusion matrix, the ROC curve and the cumulative\n", + "gain curve for this (or other) multiclass data set." 
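A possible sketch for the multiclass case is shown below. It again assumes only scikit-learn and matplotlib; the one-vs-rest ROC curves are drawn by binarizing the labels and looping over the classes, which replaces the `scikitplot.metrics.plot_roc` call used earlier.

```python
# Possible sketch for the multiclass Iris case: confusion matrix and
# one-vs-rest ROC curves built from the predicted class probabilities.
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import ConfusionMatrixDisplay, RocCurveDisplay
from sklearn.preprocessing import label_binarize

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, random_state=0)

logreg = LogisticRegression(max_iter=1000)
logreg.fit(X_train, y_train)

ConfusionMatrixDisplay.from_estimator(logreg, X_test, y_test, normalize="true")
plt.show()

# One-vs-rest ROC curves: one curve per class, using that class's probability column
y_prob = logreg.predict_proba(X_test)
y_onehot = label_binarize(y_test, classes=[0, 1, 2])
fig, ax = plt.subplots()
for i, name in enumerate(iris.target_names):
    RocCurveDisplay.from_predictions(y_onehot[:, i], y_prob[:, i], name=name, ax=ax)
plt.show()
```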
] } ], @@ -1507,7 +639,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.9.10" + "version": "3.9.15" } }, "nbformat": 4, diff --git a/doc/LectureNotes/.ipynb_checkpoints/project1-checkpoint.ipynb b/doc/LectureNotes/.ipynb_checkpoints/project1-checkpoint.ipynb new file mode 100644 index 000000000..5170af951 --- /dev/null +++ b/doc/LectureNotes/.ipynb_checkpoints/project1-checkpoint.ipynb @@ -0,0 +1,688 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "b209e219", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "id": "6fa4c4bc", + "metadata": { + "editable": true + }, + "source": [ + "# Project 1 on Machine Learning, deadline October 6 (midnight), 2025\n", + "\n", + "**Data Analysis and Machine Learning FYS-STK3155/FYS4155**, University of Oslo, Norway\n", + "\n", + "Date: **September 2**\n" + ] + }, + { + "cell_type": "markdown", + "id": "beb333e3", + "metadata": {}, + "source": [ + "### Deliverables\n", + "\n", + "First, join a group in canvas with your group partners. Pick an avaliable group for Project 1 in the \"People\" page.\n", + "\n", + "In canvas, deliver as a group and include:\n", + "\n", + "- A PDF of your report which follows the guidelines covered below and in the week 39 exercises. Additional requirements include:\n", + " - It should be around 5000 words, use the word counter in Overleaf for this. This often corresponds to 10-12 pages. References and appendices are excluded from the word count\n", + " - It should include around 10-15 figures. You can include more figures in appendices and/or as supplemental material in your repository.\n", + "- A comment linking to your github repository (or folder in one of your github repositories) for this project. The repository must include\n", + " - A PDF file of the report\n", + " - A folder named Code, where you put python files for your functions and notebooks for reproducing your results. Remember to use a seed for generating random data and for train-test splits when generating final results.\n", + " - A README file with\n", + " - the name of the group members\n", + " - a short description of the project\n", + " - a description of how to install the required packages to run your code from a requirements.txt file or similar (such as a plain text description)\n", + " - names and descriptions of the various notebooks in the Code folder and the results they produce\n" + ] + }, + { + "cell_type": "markdown", + "id": "735b16c4", + "metadata": { + "editable": true + }, + "source": [ + "## Preamble: Note on writing reports, using reference material, AI and other tools\n", + "\n", + "We want you to answer the three different projects by handing in\n", + "reports written like a standard scientific/technical report. The\n", + "links at\n", + "\n", + "contain more information. There you can find examples of previous\n", + "reports, the projects themselves, how we grade reports etc. How to\n", + "write reports will also be discussed during the various lab\n", + "sessions. Please do ask us if you are in doubt.\n", + "\n", + "When using codes and material from other sources, you should refer to\n", + "these in the bibliography of your report, indicating wherefrom you for\n", + "example got the code, whether this is from the lecture notes,\n", + "softwares like Scikit-Learn, TensorFlow, PyTorch or other sources. These sources\n", + "should always be cited correctly. 
How to cite some\n", + "of the libraries is often indicated from their corresponding GitHub\n", + "sites or websites, see for example how to cite Scikit-Learn at\n", + ".\n", + "\n", + "We enocurage you to use tools like\n", + "[ChatGPT](https://openai.com/chatgpt/) or similar in writing the report. If you use for example ChatGPT,\n", + "please do cite it properly and include (if possible) your questions and answers as an addition to the report. This can\n", + "be uploaded to for example your website, GitHub/GitLab or similar as supplemental material.\n", + "\n", + "If you would like to study other data sets, feel free to propose other\n", + "sets. What we have proposed here are mere suggestions from our\n", + "side. If you opt for another data set, consider using a set which has\n", + "been studied in the scientific literature. This makes it easier for\n", + "you to compare and analyze your results. Comparing with existing\n", + "results from the scientific literature is also an essential element of\n", + "the scientific discussion. The University of California at Irvine\n", + "with its Machine Learning repository at\n", + " is an excellent site to\n", + "look up for examples and\n", + "inspiration. [Kaggle.com](https://www.kaggle.com/) is an equally\n", + "interesting site. Feel free to explore these sites. When selecting\n", + "other data sets, make sure these are sets used for regression problems\n", + "(not classification).\n" + ] + }, + { + "cell_type": "markdown", + "id": "0b7956ca", + "metadata": { + "editable": true + }, + "source": [ + "## Regression analysis and resampling methods\n", + "\n", + "The main aim of this project is to study in more detail various\n", + "regression methods, including Ordinary Least Squares (OLS) reegression, Ridge regression and LASSO regression.\n", + "In addition to the scientific part, in this course we want also to\n", + "give you an experience in writing scientific reports.\n", + "\n", + "We will study how to fit polynomials to specific\n", + "one-dimensional functions (feel free to replace the suggested function with more complicated ones).\n", + "\n", + "We will use Runge's function (see for a discussion). The one-dimensional function we will study is\n" + ] + }, + { + "cell_type": "markdown", + "id": "28ba3d22", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "f(x) = \\frac{1}{1+25x^2}.\n", + "$$\n" + ] + }, + { + "cell_type": "markdown", + "id": "9a3e10ba", + "metadata": { + "editable": true + }, + "source": [ + "Our first step will be to perform an OLS regression analysis of this\n", + "function, trying out a polynomial fit with an $x$ dependence of the\n", + "form $[x,x^2,\\dots]$. You can use a uniform distribution to set up the\n", + "arrays of values for $x \\in [-1,1]$, or alternatively use a fixed step size.\n", + "Thereafter we will repeat many of the same steps when using the Ridge and Lasso regression methods,\n", + "introducing thereby a dependence on the hyperparameter (penalty) $\\lambda$.\n", + "\n", + "We will also include bootstrap as a resampling technique in order to\n", + "study the so-called **bias-variance tradeoff**. After that we will\n", + "include the so-called cross-validation technique.\n" + ] + }, + { + "cell_type": "markdown", + "id": "8aa547a5", + "metadata": { + "editable": true + }, + "source": [ + "### Part a : Ordinary Least Square (OLS) for the Runge function\n", + "\n", + "We will generate our own dataset for abovementioned function\n", + "$\\mathrm{Runge}(x)$ function with $x\\in [-1,1]$. 
You should explore also the addition\n", + "of an added stochastic noise to this function using the normal\n", + "distribution $N(0,1)$.\n", + "\n", + "_Write your own code_ (using for example the pseudoinverse function **pinv** from **Numpy** ) and perform a standard **ordinary least square regression**\n", + "analysis using polynomials in $x$ up to order $15$ or higher. Explore the dependence on the number of data points and the polynomial degree.\n", + "\n", + "Evaluate the mean Squared error (MSE)\n" + ] + }, + { + "cell_type": "markdown", + "id": "68fbf03d", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "MSE(\\boldsymbol{y},\\tilde{\\boldsymbol{y}}) = \\frac{1}{n}\n", + "\\sum_{i=0}^{n-1}(y_i-\\tilde{y}_i)^2,\n", + "$$\n" + ] + }, + { + "cell_type": "markdown", + "id": "b49509bc", + "metadata": { + "editable": true + }, + "source": [ + "and the $R^2$ score function. If $\\tilde{\\boldsymbol{y}}_i$ is the predicted\n", + "value of the $i-th$ sample and $y_i$ is the corresponding true value,\n", + "then the score $R^2$ is defined as\n" + ] + }, + { + "cell_type": "markdown", + "id": "0fa4ffc6", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "R^2(\\boldsymbol{y}, \\tilde{\\boldsymbol{y}}) = 1 - \\frac{\\sum_{i=0}^{n - 1} (y_i - \\tilde{y}_i)^2}{\\sum_{i=0}^{n - 1} (y_i - \\bar{y})^2},\n", + "$$\n" + ] + }, + { + "cell_type": "markdown", + "id": "ce462b32", + "metadata": { + "editable": true + }, + "source": [ + "where we have defined the mean value of $\\boldsymbol{y}$ as\n" + ] + }, + { + "cell_type": "markdown", + "id": "a5fbef36", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\bar{y} = \\frac{1}{n} \\sum_{i=0}^{n - 1} y_i.\n", + "$$\n" + ] + }, + { + "cell_type": "markdown", + "id": "a6afe9cb", + "metadata": { + "editable": true + }, + "source": [ + "Plot the resulting scores (MSE and R$^2$) as functions of the polynomial degree (here up to polymial degree 15).\n", + "Plot also the parameters $\\theta$ as you increase the order of the polynomial. Comment your results.\n", + "\n", + "Your code has to include a scaling/centering of the data (for example by\n", + "subtracting the mean value), and\n", + "a split of the data in training and test data. For the scaling you can\n", + "either write your own code or use for example the function for\n", + "splitting training data provided by the library **Scikit-Learn** (make\n", + "sure you have installed it). This function is called\n", + "$train\\_test\\_split$. **You should present a critical discussion of why and how you have scaled or not scaled the data**.\n", + "\n", + "It is normal in essentially all Machine Learning studies to split the\n", + "data in a training set and a test set (eventually also an additional\n", + "validation set). There\n", + "is no explicit recipe for how much data should be included as training\n", + "data and say test data. An accepted rule of thumb is to use\n", + "approximately $2/3$ to $4/5$ of the data as training data.\n", + "\n", + "You can easily reuse the solutions to your exercises from week 35.\n", + "See also the lecture slides from week 35 and week 36.\n", + "\n", + "On scaling, we recommend reading the following section from the scikit-learn software description, see .\n" + ] + }, + { + "cell_type": "markdown", + "id": "3be10f68", + "metadata": { + "editable": true + }, + "source": [ + "### Part b: Adding Ridge regression for the Runge function\n", + "\n", + "Write your own code for the Ridge method as done in the previous\n", + "exercise. 
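To fix ideas, a minimal and entirely optional sketch of such a solution is shown below; the number of data points, the noise level, the polynomial degree and the value of $\lambda$ are illustrative choices only, and the closed-form Ridge solution is written here with a plain $\lambda I$ penalty (the convention with an extra factor $n$ that appears elsewhere in the lecture material only rescales $\lambda$).

```python
# Illustrative sketch for parts a) and b): closed-form OLS and Ridge fits of
# polynomials to noisy Runge-function data. All numbers are example choices.
import numpy as np

rng = np.random.default_rng(2025)
n, degree, lam = 200, 10, 1e-3

x = rng.uniform(-1, 1, n)
y = 1.0 / (1.0 + 25.0 * x**2) + 0.1 * rng.normal(0, 1, n)   # Runge function + noise

# Train/test split (roughly 4/5 - 1/5)
idx = rng.permutation(n)
train, test = idx[:160], idx[160:]

# Polynomial features [x, x^2, ..., x^degree], scaled with *training* statistics,
# and a centred target so that no explicit intercept column is needed
X_raw = np.column_stack([x**p for p in range(1, degree + 1)])
mu, sigma = X_raw[train].mean(axis=0), X_raw[train].std(axis=0)
X = (X_raw - mu) / sigma
y_c = y - y[train].mean()

theta_ols = np.linalg.pinv(X[train]) @ y_c[train]
theta_ridge = np.linalg.solve(X[train].T @ X[train] + lam * np.eye(degree),
                              X[train].T @ y_c[train])

def mse(t, p):
    return np.mean((t - p) ** 2)

def r2(t, p):
    return 1.0 - np.sum((t - p) ** 2) / np.sum((t - np.mean(t)) ** 2)

for name, theta in (("OLS", theta_ols), ("Ridge", theta_ridge)):
    pred = X[test] @ theta
    print(f"{name:5s} test MSE = {mse(y_c[test], pred):.5f}, R2 = {r2(y_c[test], pred):.5f}")
```

Looping this over polynomial degrees and values of $\lambda$ gives the plots asked for in parts a) and b), and replacing the closed-form solutions with your own gradient descent code gives the starting point for parts c) to f).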
The lecture notes from week 35 and 36 contain more information. Furthermore, the results from the exercise set from week 36 is something you can reuse here.\n", + "\n", + "Perform the same analysis as you did in the previous exercise but now for different values of $\\lambda$. Compare and\n", + "analyze your results with those obtained in part a) with the OLS method. Study the\n", + "dependence on $\\lambda$.\n" + ] + }, + { + "cell_type": "markdown", + "id": "caa7909c", + "metadata": { + "editable": true + }, + "source": [ + "### Part c: Writing your own gradient descent code\n", + "\n", + "Replace now the analytical expressions for the optimal parameters\n", + "$\\boldsymbol{\\theta}$ with your own gradient descent code. In this exercise we\n", + "focus only on the simplest gradient descent approach with a fixed\n", + "learning rate (see the exercises from week 37 and the lecture notes\n", + "from week 36).\n", + "\n", + "Study and compare your results from parts a) and b) with your gradient\n", + "descent approch. Discuss in particular the role of the learning rate.\n" + ] + }, + { + "cell_type": "markdown", + "id": "3aac4df1", + "metadata": { + "editable": true + }, + "source": [ + "### Part d: Including momentum and more advanced ways to update the learning the rate\n", + "\n", + "We keep our focus on OLS and Ridge regression and update our code for\n", + "the gradient descent method by including **momentum**, **ADAgrad**,\n", + "**RMSprop** and **ADAM** as methods fro iteratively updating your learning\n", + "rate. Discuss the results and compare the different methods applied to\n", + "the one-dimensional Runge function. The lecture notes from week 37 contain several examples on how to implement these methods.\n" + ] + }, + { + "cell_type": "markdown", + "id": "d0862a53", + "metadata": { + "editable": true + }, + "source": [ + "### Part e: Writing our own code for Lasso regression\n", + "\n", + "LASSO regression (see lecture slides from week 36 and week 37)\n", + "represents our first encounter with a machine learning method which\n", + "cannot be solved through analytical expressions (as in OLS and Ridge regression). Use the gradient\n", + "descent methods you developed in parts c) and d) to solve the LASSO\n", + "optimization problem. You can compare your results with\n", + "the functionalities of **Scikit-Learn**.\n", + "\n", + "Discuss (critically) your results for the Runge function from OLS,\n", + "Ridge and LASSO regression using the various gradient descent\n", + "approaches.\n" + ] + }, + { + "cell_type": "markdown", + "id": "9170032e", + "metadata": { + "editable": true + }, + "source": [ + "### Part f: Stochastic gradient descent\n", + "\n", + "Our last gradient step is to include stochastic gradient descent using\n", + "the same methods to update the learning rates as in parts c-e).\n", + "Compare and discuss your results with and without stochastic gradient\n", + "and give a critical assessment of the various methods.\n" + ] + }, + { + "cell_type": "markdown", + "id": "bacd1035", + "metadata": { + "editable": true + }, + "source": [ + "### Part g: Bias-variance trade-off and resampling techniques\n", + "\n", + "Our aim here is to study the bias-variance trade-off by implementing\n", + "the **bootstrap** resampling technique. 
**We will only use the simpler\n", + "ordinary least squares here**.\n", + "\n", + "With a code which does OLS and includes resampling techniques,\n", + "we will now discuss the bias-variance trade-off in the context of\n", + "continuous predictions such as regression. However, many of the\n", + "intuitions and ideas discussed here also carry over to classification\n", + "tasks and basically all Machine Learning algorithms.\n", + "\n", + "Before you perform an analysis of the bias-variance trade-off on your\n", + "test data, make first a figure similar to Fig. 2.11 of Hastie,\n", + "Tibshirani, and Friedman. Figure 2.11 of this reference displays only\n", + "the test and training MSEs. The test MSE can be used to indicate\n", + "possible regions of low/high bias and variance. You will most likely\n", + "not get an equally smooth curve! You may also need to increase the\n", + "polynomial order and play around with the number of data points as\n", + "well (see also the exercise set from week 35).\n", + "\n", + "With this result we move on to the bias-variance trade-off analysis.\n", + "\n", + "Consider a\n", + "dataset $\\mathcal{L}$ consisting of the data\n", + "$\\mathbf{X}_\\mathcal{L}=\\{(y_j, \\boldsymbol{x}_j), j=0\\ldots n-1\\}$.\n", + "\n", + "We assume that the true data is generated from a noisy model\n" + ] + }, + { + "cell_type": "markdown", + "id": "b871ec69", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{y}=f(\\boldsymbol{x}) + \\boldsymbol{\\epsilon}.\n", + "$$\n" + ] + }, + { + "cell_type": "markdown", + "id": "b47c19bc", + "metadata": { + "editable": true + }, + "source": [ + "Here $\\epsilon$ is normally distributed with mean zero and standard\n", + "deviation $\\sigma^2$.\n", + "\n", + "In our derivation of the ordinary least squares method we defined then\n", + "an approximation to the function $f$ in terms of the parameters\n", + "$\\boldsymbol{\\theta}$ and the design matrix $\\boldsymbol{X}$ which embody our model,\n", + "that is $\\boldsymbol{\\tilde{y}}=\\boldsymbol{X}\\boldsymbol{\\theta}$.\n", + "\n", + "The parameters $\\boldsymbol{\\theta}$ are in turn found by optimizing the mean\n", + "squared error via the so-called cost function\n" + ] + }, + { + "cell_type": "markdown", + "id": "6db622c2", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C(\\boldsymbol{X},\\boldsymbol{\\theta}) =\\frac{1}{n}\\sum_{i=0}^{n-1}(y_i-\\tilde{y}_i)^2=\\mathbb{E}\\left[(\\boldsymbol{y}-\\boldsymbol{\\tilde{y}})^2\\right].\n", + "$$\n" + ] + }, + { + "cell_type": "markdown", + "id": "5a7eb70d", + "metadata": { + "editable": true + }, + "source": [ + "Here the expected value $\\mathbb{E}$ is the sample value.\n", + "\n", + "Show that you can rewrite this in terms of a term which contains the\n", + "variance of the model itself (the so-called variance term), a term\n", + "which measures the deviation from the true data and the mean value of\n", + "the model (the bias term) and finally the variance of the noise.\n", + "\n", + "That is, show that\n" + ] + }, + { + "cell_type": "markdown", + "id": "d50292fe", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathbb{E}\\left[(\\boldsymbol{y}-\\boldsymbol{\\tilde{y}})^2\\right]=\\mathrm{Bias}[\\tilde{y}]+\\mathrm{var}[\\tilde{y}]+\\sigma^2,\n", + "$$\n" + ] + }, + { + "cell_type": "markdown", + "id": "50fa641f", + "metadata": { + "editable": true + }, + "source": [ + "with (we approximate $f(\\boldsymbol{x})\\approx \\boldsymbol{y}$)\n" + ] + }, + { + "cell_type": 
"markdown", + "id": "2bd429c9", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathrm{Bias}[\\tilde{y}]=\\mathbb{E}\\left[\\left(\\boldsymbol{y}-\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right]\\right)^2\\right],\n", + "$$\n" + ] + }, + { + "cell_type": "markdown", + "id": "737c2819", + "metadata": { + "editable": true + }, + "source": [ + "and\n" + ] + }, + { + "cell_type": "markdown", + "id": "41ef92ef", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathrm{var}[\\tilde{y}]=\\mathbb{E}\\left[\\left(\\tilde{\\boldsymbol{y}}-\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right]\\right)^2\\right]=\\frac{1}{n}\\sum_i(\\tilde{y}_i-\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right])^2.\n", + "$$\n" + ] + }, + { + "cell_type": "markdown", + "id": "b948dab0", + "metadata": { + "editable": true + }, + "source": [ + "**Important note**: Since the function $f(x)$ is unknown, in order to be able to evalute the bias, we replace $f(\\boldsymbol{x})$ in the expression for the bias with $\\boldsymbol{y}$.\n", + "\n", + "The answer to this exercise should be included in the theory part of\n", + "the report. This exercise is also part of the weekly exercises of\n", + "week 38. Explain what the terms mean and discuss their\n", + "interpretations.\n", + "\n", + "Perform then a bias-variance analysis of the Runge function by\n", + "studying the MSE value as function of the complexity of your model.\n", + "\n", + "Discuss the bias and variance trade-off as function\n", + "of your model complexity (the degree of the polynomial) and the number\n", + "of data points, and possibly also your training and test data using the **bootstrap** resampling method.\n", + "You can follow the code example in the jupyter-book at .\n" + ] + }, + { + "cell_type": "markdown", + "id": "6a0548bf", + "metadata": { + "editable": true + }, + "source": [ + "### Part h): Cross-validation as resampling techniques, adding more complexity\n", + "\n", + "The aim here is to implement another widely popular\n", + "resampling technique, the so-called cross-validation method.\n", + "\n", + "Implement the $k$-fold cross-validation algorithm (feel free to use\n", + "the functionality of **Scikit-Learn** or write your own code) and\n", + "evaluate again the MSE function resulting from the test folds.\n", + "\n", + "Compare the MSE you get from your cross-validation code with the one\n", + "you got from your **bootstrap** code from the previous exercise. Comment and interpret your results.\n", + "\n", + "In addition to using the ordinary least squares method, you should\n", + "include both Ridge and Lasso regression in the final analysis.\n" + ] + }, + { + "cell_type": "markdown", + "id": "df9845cb", + "metadata": { + "editable": true + }, + "source": [ + "## Background literature\n", + "\n", + "1. For a discussion and derivation of the variances and mean squared errors using linear regression, see the [Lecture notes on ridge regression by Wessel N. van Wieringen](https://arxiv.org/abs/1509.09169)\n", + "\n", + "2. The textbook of [Trevor Hastie, Robert Tibshirani, Jerome H. 
Friedman, The Elements of Statistical Learning, Springer](https://www.springer.com/gp/book/9780387848570), chapters 3 and 7 are the most relevant ones for the analysis of parts g) and h).\n" + ] + }, + { + "cell_type": "markdown", + "id": "b9e04791", + "metadata": { + "editable": true + }, + "source": [ + "## Introduction to numerical projects\n", + "\n", + "Here follows a brief recipe and recommendation on how to answer the various questions when preparing your answers.\n", + "\n", + "- Give a short description of the nature of the problem and the eventual numerical methods you have used.\n", + "\n", + "- Describe the algorithm you have used and/or developed. Here you may find it convenient to use pseudocoding. In many cases you can describe the algorithm in the program itself.\n", + "\n", + "- Include the source code of your program. Comment your program properly. You should have the code at your GitHub/GitLab link. You can also place the code in an appendix of your report.\n", + "\n", + "- If possible, try to find analytic solutions, or known limits in order to test your program when developing the code.\n", + "\n", + "- Include your results either in figure form or in a table. Remember to label your results. All tables and figures should have relevant captions and labels on the axes.\n", + "\n", + "- Try to evaluate the reliabilty and numerical stability/precision of your results. If possible, include a qualitative and/or quantitative discussion of the numerical stability, eventual loss of precision etc.\n", + "\n", + "- Try to give an interpretation of you results in your answers to the problems.\n", + "\n", + "- Critique: if possible include your comments and reflections about the exercise, whether you felt you learnt something, ideas for improvements and other thoughts you've made when solving the exercise. We wish to keep this course at the interactive level and your comments can help us improve it.\n", + "\n", + "- Try to establish a practice where you log your work at the computerlab. You may find such a logbook very handy at later stages in your work, especially when you don't properly remember what a previous test version of your program did. Here you could also record the time spent on solving the exercise, various algorithms you may have tested or other topics which you feel worthy of mentioning.\n" + ] + }, + { + "cell_type": "markdown", + "id": "3fab6237", + "metadata": { + "editable": true + }, + "source": [ + "## Format for electronic delivery of report and programs\n", + "\n", + "The preferred format for the report is a PDF file. You can also use DOC or postscript formats or as an ipython notebook file. As programming language we prefer that you choose between C/C++, Fortran2008, Julia or Python. The following prescription should be followed when preparing the report:\n", + "\n", + "- Use Canvas to hand in your projects, log in at with your normal UiO username and password.\n", + "\n", + "- Upload **only** the report file or the link to your GitHub/GitLab or similar typo of repos! For the source code file(s) you have developed please provide us with your link to your GitHub/GitLab or similar domain. The report file should include all of your discussions and a list of the codes you have developed. Do not include library files which are available at the course homepage, unless you have made specific changes to them.\n", + "\n", + "- In your GitHub/GitLab or similar repository, please include a folder which contains selected results. 
These can be in the form of output from your code for a selected set of runs and input parameters.\n", + "\n", + "Finally,\n", + "we encourage you to collaborate. Optimal working groups consist of\n", + "2-3 students. You can then hand in a common report.\n" + ] + }, + { + "cell_type": "markdown", + "id": "3388eb60", + "metadata": { + "editable": true + }, + "source": [ + "## Software and needed installations\n", + "\n", + "If you have Python installed (we recommend Python3) and you feel pretty familiar with installing different packages,\n", + "we recommend that you install the following Python packages via **pip** as\n", + "\n", + "1. pip install numpy scipy matplotlib ipython scikit-learn tensorflow sympy pandas pillow\n", + "\n", + "For Python3, replace **pip** with **pip3**.\n", + "\n", + "See below for a discussion of **tensorflow** and **scikit-learn**.\n", + "\n", + "For OSX users we recommend also, after having installed Xcode, to install **brew**. Brew allows\n", + "for a seamless installation of additional software via for example\n", + "\n", + "1. brew install python3\n", + "\n", + "For Linux users, with its variety of distributions like for example the widely popular Ubuntu distribution\n", + "you can use **pip** as well and simply install Python as\n", + "\n", + "1. sudo apt-get install python3 (or python for python2.7)\n", + "\n", + "etc etc.\n", + "\n", + "If you don't want to install various Python packages with their dependencies separately, we recommend two widely used distrubutions which set up all relevant dependencies for Python, namely\n", + "\n", + "1. [Anaconda](https://docs.anaconda.com/) Anaconda is an open source distribution of the Python and R programming languages for large-scale data processing, predictive analytics, and scientific computing, that aims to simplify package management and deployment. Package versions are managed by the package management system **conda**\n", + "\n", + "2. [Enthought canopy](https://www.enthought.com/product/canopy/) is a Python distribution for scientific and analytic computing distribution and analysis environment, available for free and under a commercial license.\n", + "\n", + "Popular software packages written in Python for ML are\n", + "\n", + "- [Scikit-learn](http://scikit-learn.org/stable/),\n", + "\n", + "- [Tensorflow](https://www.tensorflow.org/),\n", + "\n", + "- [PyTorch](http://pytorch.org/) and\n", + "\n", + "- [Keras](https://keras.io/).\n", + "\n", + "These are all freely available at their respective GitHub sites. They\n", + "encompass communities of developers in the thousands or more. 
And the number\n", + "of code developers and contributors keeps increasing.\n" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.15" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/doc/LectureNotes/.ipynb_checkpoints/week37-checkpoint.ipynb b/doc/LectureNotes/.ipynb_checkpoints/week37-checkpoint.ipynb new file mode 100644 index 000000000..9038066ab --- /dev/null +++ b/doc/LectureNotes/.ipynb_checkpoints/week37-checkpoint.ipynb @@ -0,0 +1,3484 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "311a2385", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "9e4484dc", + "metadata": { + "editable": true + }, + "source": [ + "# Week 37: Gradient descent methods\n", + "**Morten Hjorth-Jensen**, Department of Physics, University of Oslo, Norway\n", + "\n", + "Date: **September 8-12, 2025**\n", + "\n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "a24010ae", + "metadata": { + "editable": true + }, + "source": [ + "## Plans for week 37, lecture Monday\n", + "\n", + "**Plans and material for the lecture on Monday September 8.**\n", + "\n", + "The family of gradient descent methods\n", + "1. Plain gradient descent (constant learning rate), reminder from last week with examples using OLS and Ridge\n", + "\n", + "2. Improving gradient descent with momentum\n", + "\n", + "3. Introducing stochastic gradient descent\n", + "\n", + "4. More advanced updates of the learning rate: ADAgrad, RMSprop and ADAM\n", + "\n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "4a291d59", + "metadata": { + "editable": true + }, + "source": [ + "## Readings and Videos:\n", + "1. Recommended: Goodfellow et al, Deep Learning, introduction to gradient descent, see sections 4.3-4.5 at and chapter 8.3-8.5 at \n", + "\n", + "2. Rashcka et al, pages 37-44 and pages 278-283 with focus on linear regression.\n", + "\n", + "3. Video on gradient descent at \n", + "\n", + "4. Video on Stochastic gradient descent at " + ] + }, + { + "cell_type": "markdown", + "id": "85c747e2", + "metadata": { + "editable": true + }, + "source": [ + "## Material for lecture Monday September 8" + ] + }, + { + "cell_type": "markdown", + "id": "6580dfe2", + "metadata": { + "editable": true + }, + "source": [ + "## Gradient descent and revisiting Ordinary Least Squares from last week\n", + "\n", + "Last week we started with linear regression as a case study for the gradient descent\n", + "methods. Linear regression is a great test case for the gradient\n", + "descent methods discussed in the lectures since it has several\n", + "desirable properties such as:\n", + "\n", + "1. An analytical solution (recall homework sets for week 35).\n", + "\n", + "2. The gradient can be computed analytically.\n", + "\n", + "3. The cost function is convex which guarantees that gradient descent converges for small enough learning rates\n", + "\n", + "We revisit an example similar to what we had in the first homework set. 
We have a function of the type" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "c2ddcfe5", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import numpy as np\n", + "x = 2*np.random.rand(m,1)\n", + "y = 4+3*x+np.random.randn(m,1)" + ] + }, + { + "cell_type": "markdown", + "id": "e1e8a5b2", + "metadata": { + "editable": true + }, + "source": [ + "with $x_i \\in [0,1] $ is chosen randomly using a uniform distribution. Additionally we have a stochastic noise chosen according to a normal distribution $\\cal {N}(0,1)$. \n", + "The linear regression model is given by" + ] + }, + { + "cell_type": "markdown", + "id": "c8a5100b", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "h_\\theta(x) = \\boldsymbol{y} = \\theta_0 + \\theta_1 x,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b026883e", + "metadata": { + "editable": true + }, + "source": [ + "such that" + ] + }, + { + "cell_type": "markdown", + "id": "3a2f7b75", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{y}_i = \\theta_0 + \\theta_1 x_i.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "6380eed5", + "metadata": { + "editable": true + }, + "source": [ + "## Gradient descent example\n", + "\n", + "Let $\\mathbf{y} = (y_1,\\cdots,y_n)^T$, $\\mathbf{\\boldsymbol{y}} = (\\boldsymbol{y}_1,\\cdots,\\boldsymbol{y}_n)^T$ and $\\theta = (\\theta_0, \\theta_1)^T$\n", + "\n", + "It is convenient to write $\\mathbf{\\boldsymbol{y}} = X\\theta$ where $X \\in \\mathbb{R}^{100 \\times 2} $ is the design matrix given by (we keep the intercept here)" + ] + }, + { + "cell_type": "markdown", + "id": "c5d3766d", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "X \\equiv \\begin{bmatrix}\n", + "1 & x_1 \\\\\n", + "\\vdots & \\vdots \\\\\n", + "1 & x_{100} & \\\\\n", + "\\end{bmatrix}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "1d313807", + "metadata": { + "editable": true + }, + "source": [ + "The cost/loss/risk function is given by" + ] + }, + { + "cell_type": "markdown", + "id": "bee64882", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C(\\theta) = \\frac{1}{n}||X\\theta-\\mathbf{y}||_{2}^{2} = \\frac{1}{n}\\sum_{i=1}^{100}\\left[ (\\theta_0 + \\theta_1 x_i)^2 - 2 y_i (\\theta_0 + \\theta_1 x_i) + y_i^2\\right]\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "7ffe8d02", + "metadata": { + "editable": true + }, + "source": [ + "and we want to find $\\theta$ such that $C(\\theta)$ is minimized." + ] + }, + { + "cell_type": "markdown", + "id": "97225362", + "metadata": { + "editable": true + }, + "source": [ + "## The derivative of the cost/loss function\n", + "\n", + "Computing $\\partial C(\\theta) / \\partial \\theta_0$ and $\\partial C(\\theta) / \\partial \\theta_1$ we can show that the gradient can be written as" + ] + }, + { + "cell_type": "markdown", + "id": "9fe2a0b3", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\nabla_{\\theta} C(\\theta) = \\frac{2}{n}\\begin{bmatrix} \\sum_{i=1}^{100} \\left(\\theta_0+\\theta_1x_i-y_i\\right) \\\\\n", + "\\sum_{i=1}^{100}\\left( x_i (\\theta_0+\\theta_1x_i)-y_ix_i\\right) \\\\\n", + "\\end{bmatrix} = \\frac{2}{n}X^T(X\\theta - \\mathbf{y}),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "2e678439", + "metadata": { + "editable": true + }, + "source": [ + "where $X$ is the design matrix defined above." 
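A quick way to convince yourself of this expression for the gradient is to compare it with a finite-difference approximation. The short, self-contained sketch below does exactly that for the cost function above, using $n=100$ data points so that the dimensions match the design matrix in the text. Note also that in the small data-generation snippet earlier, the number of samples (written there as `m`) must of course be defined before use, for example as `m = 100`.

```python
# Sketch: numerical check of the analytic gradient (2/n) X^T (X theta - y)
# for the simple linear model above, using central finite differences.
import numpy as np

rng = np.random.default_rng(42)
n = 100
x = 2 * rng.random((n, 1))
y = 4 + 3 * x + rng.standard_normal((n, 1))
X = np.c_[np.ones((n, 1)), x]

def cost(theta):
    return np.sum((X @ theta - y) ** 2) / n

def analytic_gradient(theta):
    return (2.0 / n) * X.T @ (X @ theta - y)

theta = rng.standard_normal((2, 1))
eps = 1e-6
num_grad = np.zeros_like(theta)
for i in range(theta.size):
    e = np.zeros_like(theta)
    e[i] = eps
    num_grad[i] = (cost(theta + e) - cost(theta - e)) / (2 * eps)

print("analytic :", analytic_gradient(theta).ravel())
print("numerical:", num_grad.ravel())   # the two should agree to high precision
```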
+ ] + }, + { + "cell_type": "markdown", + "id": "5f45e358", + "metadata": { + "editable": true + }, + "source": [ + "## The Hessian matrix\n", + "The Hessian matrix of $C(\\theta)$ is given by" + ] + }, + { + "cell_type": "markdown", + "id": "1713ee43", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{H} \\equiv \\begin{bmatrix}\n", + "\\frac{\\partial^2 C(\\theta)}{\\partial \\theta_0^2} & \\frac{\\partial^2 C(\\theta)}{\\partial \\theta_0 \\partial \\theta_1} \\\\\n", + "\\frac{\\partial^2 C(\\theta)}{\\partial \\theta_0 \\partial \\theta_1} & \\frac{\\partial^2 C(\\theta)}{\\partial \\theta_1^2} & \\\\\n", + "\\end{bmatrix} = \\frac{2}{n}X^T X.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "671ea0fc", + "metadata": { + "editable": true + }, + "source": [ + "This result implies that $C(\\theta)$ is a convex function since the matrix $X^T X$ always is positive semi-definite." + ] + }, + { + "cell_type": "markdown", + "id": "7df56d17", + "metadata": { + "editable": true + }, + "source": [ + "## Simple program\n", + "\n", + "We can now write a program that minimizes $C(\\theta)$ using the gradient descent method with a constant learning rate $\\eta$ according to" + ] + }, + { + "cell_type": "markdown", + "id": "5887c657", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\theta_{k+1} = \\theta_k - \\eta \\nabla_\\theta C(\\theta_k), \\ k=0,1,\\cdots\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "5a012ac0", + "metadata": { + "editable": true + }, + "source": [ + "We can use the expression we computed for the gradient and let use a\n", + "$\\theta_0$ be chosen randomly and let $\\eta = 0.001$. Stop iterating\n", + "when $||\\nabla_\\theta C(\\theta_k) || \\leq \\epsilon = 10^{-8}$. **Note that the code below does not include the latter stop criterion**.\n", + "\n", + "And finally we can compare our solution for $\\theta$ with the analytic result given by \n", + "$\\theta= (X^TX)^{-1} X^T \\mathbf{y}$." 
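A short remark on the learning rate before the example: since the cost function is quadratic with the constant Hessian $\boldsymbol{H}=\frac{2}{n}X^TX$ derived above (positive definite here, since $X^TX$ has full rank), plain gradient descent with a fixed step size converges to the minimum as long as

$$
0 < \eta < \frac{2}{\lambda_{\mathrm{max}}(\boldsymbol{H})},
$$

where $\lambda_{\mathrm{max}}(\boldsymbol{H})$ is the largest eigenvalue of the Hessian. This is a standard result for convex quadratic problems, stated here for completeness, and it is why the code below first computes the eigenvalues of $\boldsymbol{H}$ and then sets $\eta = 1/\lambda_{\mathrm{max}}(\boldsymbol{H})$.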
+ ] + }, + { + "cell_type": "markdown", + "id": "cf1fd4f4", + "metadata": { + "editable": true + }, + "source": [ + "## Gradient Descent Example\n", + "\n", + "Here our simple example" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "4417d3aa", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "%matplotlib inline\n", + "\n", + "\n", + "# Importing various packages\n", + "from random import random, seed\n", + "import numpy as np\n", + "import matplotlib.pyplot as plt\n", + "from mpl_toolkits.mplot3d import Axes3D\n", + "from matplotlib import cm\n", + "from matplotlib.ticker import LinearLocator, FormatStrFormatter\n", + "import sys\n", + "\n", + "# the number of datapoints\n", + "n = 100\n", + "x = 2*np.random.rand(n,1)\n", + "y = 4+3*x+np.random.randn(n,1)\n", + "\n", + "X = np.c_[np.ones((n,1)), x]\n", + "# Hessian matrix\n", + "H = (2.0/n)* X.T @ X\n", + "# Get the eigenvalues\n", + "EigValues, EigVectors = np.linalg.eig(H)\n", + "print(f\"Eigenvalues of Hessian Matrix:{EigValues}\")\n", + "\n", + "theta_linreg = np.linalg.inv(X.T @ X) @ X.T @ y\n", + "print(theta_linreg)\n", + "theta = np.random.randn(2,1)\n", + "\n", + "eta = 1.0/np.max(EigValues)\n", + "Niterations = 1000\n", + "\n", + "for iter in range(Niterations):\n", + " gradient = (2.0/n)*X.T @ (X @ theta-y)\n", + " theta -= eta*gradient\n", + "\n", + "print(theta)\n", + "xnew = np.array([[0],[2]])\n", + "xbnew = np.c_[np.ones((2,1)), xnew]\n", + "ypredict = xbnew.dot(theta)\n", + "ypredict2 = xbnew.dot(theta_linreg)\n", + "plt.plot(xnew, ypredict, \"r-\")\n", + "plt.plot(xnew, ypredict2, \"b-\")\n", + "plt.plot(x, y ,'ro')\n", + "plt.axis([0,2.0,0, 15.0])\n", + "plt.xlabel(r'$x$')\n", + "plt.ylabel(r'$y$')\n", + "plt.title(r'Gradient descent example')\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "7d39d005", + "metadata": { + "editable": true + }, + "source": [ + "## Gradient descent and Ridge\n", + "\n", + "We have also discussed Ridge regression where the loss function contains a regularized term given by the $L_2$ norm of $\\theta$," + ] + }, + { + "cell_type": "markdown", + "id": "45a85d32", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C_{\\text{ridge}}(\\theta) = \\frac{1}{n}||X\\theta -\\mathbf{y}||^2 + \\lambda ||\\theta||^2, \\ \\lambda \\geq 0.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "31d267ea", + "metadata": { + "editable": true + }, + "source": [ + "In order to minimize $C_{\\text{ridge}}(\\theta)$ using GD we adjust the gradient as follows" + ] + }, + { + "cell_type": "markdown", + "id": "f8f50b02", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\nabla_\\theta C_{\\text{ridge}}(\\theta) = \\frac{2}{n}\\begin{bmatrix} \\sum_{i=1}^{100} \\left(\\theta_0+\\theta_1x_i-y_i\\right) \\\\\n", + "\\sum_{i=1}^{100}\\left( x_i (\\theta_0+\\theta_1x_i)-y_ix_i\\right) \\\\\n", + "\\end{bmatrix} + 2\\lambda\\begin{bmatrix} \\theta_0 \\\\ \\theta_1\\end{bmatrix} = 2 (\\frac{1}{n}X^T(X\\theta - \\mathbf{y})+\\lambda \\theta).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "ac21d44c", + "metadata": { + "editable": true + }, + "source": [ + "We can easily extend our program to minimize $C_{\\text{ridge}}(\\theta)$ using gradient descent and compare with the analytical solution given by" + ] + }, + { + "cell_type": "markdown", + "id": "aae5aaa1", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\theta_{\\text{ridge}} = \\left(X^T X + n\\lambda I_{2 \\times 
2} \\right)^{-1} X^T \\mathbf{y}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "319922a5", + "metadata": { + "editable": true + }, + "source": [ + "## The Hessian matrix for Ridge Regression\n", + "The Hessian matrix of Ridge Regression for our simple example is given by" + ] + }, + { + "cell_type": "markdown", + "id": "724078a1", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{H} \\equiv \\begin{bmatrix}\n", + "\\frac{\\partial^2 C(\\theta)}{\\partial \\theta_0^2} & \\frac{\\partial^2 C(\\theta)}{\\partial \\theta_0 \\partial \\theta_1} \\\\\n", + "\\frac{\\partial^2 C(\\theta)}{\\partial \\theta_0 \\partial \\theta_1} & \\frac{\\partial^2 C(\\theta)}{\\partial \\theta_1^2} & \\\\\n", + "\\end{bmatrix} = \\frac{2}{n}X^T X+2\\lambda\\boldsymbol{I}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "dbc443e3", + "metadata": { + "editable": true + }, + "source": [ + "This implies that the Hessian matrix is positive definite, hence the stationary point is a\n", + "minimum.\n", + "Note that the Ridge cost function is convex being a sum of two convex\n", + "functions. Therefore, the stationary point is a global\n", + "minimum of this function." + ] + }, + { + "cell_type": "markdown", + "id": "2ea2bf50", + "metadata": { + "editable": true + }, + "source": [ + "## Program example for gradient descent with Ridge Regression" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "9f431da1", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "from random import random, seed\n", + "import numpy as np\n", + "import matplotlib.pyplot as plt\n", + "from mpl_toolkits.mplot3d import Axes3D\n", + "from matplotlib import cm\n", + "from matplotlib.ticker import LinearLocator, FormatStrFormatter\n", + "import sys\n", + "\n", + "# the number of datapoints\n", + "n = 100\n", + "x = 2*np.random.rand(n,1)\n", + "y = 4+3*x+np.random.randn(n,1)\n", + "\n", + "X = np.c_[np.ones((n,1)), x]\n", + "XT_X = X.T @ X\n", + "\n", + "#Ridge parameter lambda\n", + "lmbda = 0.001\n", + "Id = n*lmbda* np.eye(XT_X.shape[0])\n", + "\n", + "# Hessian matrix\n", + "H = (2.0/n)* XT_X+2*lmbda* np.eye(XT_X.shape[0])\n", + "# Get the eigenvalues\n", + "EigValues, EigVectors = np.linalg.eig(H)\n", + "print(f\"Eigenvalues of Hessian Matrix:{EigValues}\")\n", + "\n", + "\n", + "theta_linreg = np.linalg.inv(XT_X+Id) @ X.T @ y\n", + "print(theta_linreg)\n", + "# Start plain gradient descent\n", + "theta = np.random.randn(2,1)\n", + "\n", + "eta = 1.0/np.max(EigValues)\n", + "Niterations = 100\n", + "\n", + "for iter in range(Niterations):\n", + " gradients = 2.0/n*X.T @ (X @ (theta)-y)+2*lmbda*theta\n", + " theta -= eta*gradients\n", + "\n", + "print(theta)\n", + "ypredict = X @ theta\n", + "ypredict2 = X @ theta_linreg\n", + "plt.plot(x, ypredict, \"r-\")\n", + "plt.plot(x, ypredict2, \"b-\")\n", + "plt.plot(x, y ,'ro')\n", + "plt.axis([0,2.0,0, 15.0])\n", + "plt.xlabel(r'$x$')\n", + "plt.ylabel(r'$y$')\n", + "plt.title(r'Gradient descent example for Ridge')\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "8aa155a9", + "metadata": { + "editable": true + }, + "source": [ + "## Using gradient descent methods, limitations\n", + "\n", + "* **Gradient descent (GD) finds local minima of our function**. Since the GD algorithm is deterministic, if it converges, it will converge to a local minimum of our cost/loss/risk function. 
Because in ML we are often dealing with extremely rugged landscapes with many local minima, this can lead to poor performance.\n", + "\n", + "* **GD is sensitive to initial conditions**. One consequence of the local nature of GD is that initial conditions matter. Depending on where one starts, one will end up at a different local minima. Therefore, it is very important to think about how one initializes the training process. This is true for GD as well as more complicated variants of GD.\n", + "\n", + "* **Gradients are computationally expensive to calculate for large datasets**. In many cases in statistics and ML, the cost/loss/risk function is a sum of terms, with one term for each data point. For example, in linear regression, $E \\propto \\sum_{i=1}^n (y_i - \\mathbf{w}^T\\cdot\\mathbf{x}_i)^2$; for logistic regression, the square error is replaced by the cross entropy. To calculate the gradient we have to sum over *all* $n$ data points. Doing this at every GD step becomes extremely computationally expensive. An ingenious solution to this, is to calculate the gradients using small subsets of the data called \"mini batches\". This has the added benefit of introducing stochasticity into our algorithm.\n", + "\n", + "* **GD is very sensitive to choices of learning rates**. GD is extremely sensitive to the choice of learning rates. If the learning rate is very small, the training process take an extremely long time. For larger learning rates, GD can diverge and give poor results. Furthermore, depending on what the local landscape looks like, we have to modify the learning rates to ensure convergence. Ideally, we would *adaptively* choose the learning rates to match the landscape.\n", + "\n", + "* **GD treats all directions in parameter space uniformly.** Another major drawback of GD is that unlike Newton's method, the learning rate for GD is the same in all directions in parameter space. For this reason, the maximum learning rate is set by the behavior of the steepest direction and this can significantly slow down training. Ideally, we would like to take large steps in flat directions and small steps in steep directions. Since we are exploring rugged landscapes where curvatures change, this requires us to keep track of not only the gradient but second derivatives. The ideal scenario would be to calculate the Hessian but this proves to be too computationally expensive. \n", + "\n", + "* GD can take exponential time to escape saddle points, even with random initialization. As we mentioned, GD is extremely sensitive to initial condition since it determines the particular local minimum GD would eventually reach. However, even with a good initialization scheme, through the introduction of randomness, GD can still take exponential time to escape saddle points." + ] + }, + { + "cell_type": "markdown", + "id": "03bd2e44", + "metadata": { + "editable": true + }, + "source": [ + "## Momentum based GD\n", + "\n", + "We discuss here some simple examples where we introduce what is called\n", + "'memory'about previous steps, or what is normally called momentum\n", + "gradient descent.\n", + "For the mathematical details, see whiteboad notes from lecture on September 8, 2025." 
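For reference, since the whiteboard notes are not included in this notebook, a standard way of writing the momentum (heavy-ball) update is to keep a running velocity $v$ and iterate

$$
v_{t+1} = \gamma\, v_t + \eta\, \nabla_\theta C(\theta_t), \qquad \theta_{t+1} = \theta_t - v_{t+1},
$$

with momentum parameter $0 \le \gamma < 1$ and $v_0 = 0$; setting $\gamma = 0$ recovers plain gradient descent. This is exactly what the line `new_change = step_size * gradient + momentum * change` implements in the momentum example further below.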
+ ] + }, + { + "cell_type": "markdown", + "id": "0e101e2d", + "metadata": { + "editable": true + }, + "source": [ + "## Improving gradient descent with momentum" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "09ecede4", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "from numpy import asarray\n", + "from numpy import arange\n", + "from numpy.random import rand\n", + "from numpy.random import seed\n", + "from matplotlib import pyplot\n", + " \n", + "# objective function\n", + "def objective(x):\n", + "\treturn x**2.0\n", + " \n", + "# derivative of objective function\n", + "def derivative(x):\n", + "\treturn x * 2.0\n", + " \n", + "# gradient descent algorithm\n", + "def gradient_descent(objective, derivative, bounds, n_iter, step_size):\n", + "\t# track all solutions\n", + "\tsolutions, scores = list(), list()\n", + "\t# generate an initial point\n", + "\tsolution = bounds[:, 0] + rand(len(bounds)) * (bounds[:, 1] - bounds[:, 0])\n", + "\t# run the gradient descent\n", + "\tfor i in range(n_iter):\n", + "\t\t# calculate gradient\n", + "\t\tgradient = derivative(solution)\n", + "\t\t# take a step\n", + "\t\tsolution = solution - step_size * gradient\n", + "\t\t# evaluate candidate point\n", + "\t\tsolution_eval = objective(solution)\n", + "\t\t# store solution\n", + "\t\tsolutions.append(solution)\n", + "\t\tscores.append(solution_eval)\n", + "\t\t# report progress\n", + "\t\tprint('>%d f(%s) = %.5f' % (i, solution, solution_eval))\n", + "\treturn [solutions, scores]\n", + " \n", + "# seed the pseudo random number generator\n", + "seed(4)\n", + "# define range for input\n", + "bounds = asarray([[-1.0, 1.0]])\n", + "# define the total iterations\n", + "n_iter = 30\n", + "# define the step size\n", + "step_size = 0.1\n", + "# perform the gradient descent search\n", + "solutions, scores = gradient_descent(objective, derivative, bounds, n_iter, step_size)\n", + "# sample input range uniformly at 0.1 increments\n", + "inputs = arange(bounds[0,0], bounds[0,1]+0.1, 0.1)\n", + "# compute targets\n", + "results = objective(inputs)\n", + "# create a line plot of input vs result\n", + "pyplot.plot(inputs, results)\n", + "# plot the solutions found\n", + "pyplot.plot(solutions, scores, '.-', color='red')\n", + "# show the plot\n", + "pyplot.show()" + ] + }, + { + "cell_type": "markdown", + "id": "3489dbbc", + "metadata": { + "editable": true + }, + "source": [ + "## Same code but now with momentum gradient descent" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "426eaa39", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "from numpy import asarray\n", + "from numpy import arange\n", + "from numpy.random import rand\n", + "from numpy.random import seed\n", + "from matplotlib import pyplot\n", + " \n", + "# objective function\n", + "def objective(x):\n", + "\treturn x**2.0\n", + " \n", + "# derivative of objective function\n", + "def derivative(x):\n", + "\treturn x * 2.0\n", + " \n", + "# gradient descent algorithm\n", + "def gradient_descent(objective, derivative, bounds, n_iter, step_size, momentum):\n", + "\t# track all solutions\n", + "\tsolutions, scores = list(), list()\n", + "\t# generate an initial point\n", + "\tsolution = bounds[:, 0] + rand(len(bounds)) * (bounds[:, 1] - bounds[:, 0])\n", + "\t# keep track of the change\n", + "\tchange = 0.0\n", + "\t# run the gradient descent\n", + "\tfor i in range(n_iter):\n", + "\t\t# calculate gradient\n", + 
"\t\tgradient = derivative(solution)\n", + "\t\t# calculate update\n", + "\t\tnew_change = step_size * gradient + momentum * change\n", + "\t\t# take a step\n", + "\t\tsolution = solution - new_change\n", + "\t\t# save the change\n", + "\t\tchange = new_change\n", + "\t\t# evaluate candidate point\n", + "\t\tsolution_eval = objective(solution)\n", + "\t\t# store solution\n", + "\t\tsolutions.append(solution)\n", + "\t\tscores.append(solution_eval)\n", + "\t\t# report progress\n", + "\t\tprint('>%d f(%s) = %.5f' % (i, solution, solution_eval))\n", + "\treturn [solutions, scores]\n", + " \n", + "# seed the pseudo random number generator\n", + "seed(4)\n", + "# define range for input\n", + "bounds = asarray([[-1.0, 1.0]])\n", + "# define the total iterations\n", + "n_iter = 30\n", + "# define the step size\n", + "step_size = 0.1\n", + "# define momentum\n", + "momentum = 0.3\n", + "# perform the gradient descent search with momentum\n", + "solutions, scores = gradient_descent(objective, derivative, bounds, n_iter, step_size, momentum)\n", + "# sample input range uniformly at 0.1 increments\n", + "inputs = arange(bounds[0,0], bounds[0,1]+0.1, 0.1)\n", + "# compute targets\n", + "results = objective(inputs)\n", + "# create a line plot of input vs result\n", + "pyplot.plot(inputs, results)\n", + "# plot the solutions found\n", + "pyplot.plot(solutions, scores, '.-', color='red')\n", + "# show the plot\n", + "pyplot.show()" + ] + }, + { + "cell_type": "markdown", + "id": "6220214d", + "metadata": { + "editable": true + }, + "source": [ + "## Overview video on Stochastic Gradient Descent (SGD)\n", + "\n", + "[What is Stochastic Gradient Descent](https://www.youtube.com/watch?v=vMh0zPT0tLI&ab_channel=StatQuestwithJoshStarmer)\n", + "There are several reasons for using stochastic gradient descent. Some of these are:\n", + "\n", + "1. Efficiency: Updates weights more frequently using a single or a small batch of samples, which speeds up convergence.\n", + "\n", + "2. Hopefully avoid Local Minima\n", + "\n", + "3. Memory Usage: Requires less memory compared to computing gradients for the entire dataset." + ] + }, + { + "cell_type": "markdown", + "id": "bf86ac65", + "metadata": { + "editable": true + }, + "source": [ + "## Batches and mini-batches\n", + "\n", + "In gradient descent we compute the cost function and its gradient for all data points we have.\n", + "\n", + "In large-scale applications such as the [ILSVRC challenge](https://www.image-net.org/challenges/LSVRC/), the\n", + "training data can have on order of millions of examples. Hence, it\n", + "seems wasteful to compute the full cost function over the entire\n", + "training set in order to perform only a single parameter update. A\n", + "very common approach to addressing this challenge is to compute the\n", + "gradient over batches of the training data. For example, a typical batch could contain some thousand examples from\n", + "an entire training set of several millions. This batch is then used to\n", + "perform a parameter update." + ] + }, + { + "cell_type": "markdown", + "id": "4ac61edb", + "metadata": { + "editable": true + }, + "source": [ + "## Pros and cons\n", + "\n", + "1. Speed: SGD is faster than gradient descent because it uses only one training example per iteration, whereas gradient descent requires the entire dataset. This speed advantage becomes more significant as the size of the dataset increases.\n", + "\n", + "2. 
Convergence: Gradient descent has a more predictable convergence behaviour because it uses the average gradient of the entire dataset. In contrast, SGD’s convergence behaviour can be more erratic due to its random sampling of individual training examples.\n", + "\n",
+ "3. Memory: Gradient descent requires more memory than SGD because it must store the entire dataset for each iteration. SGD only needs to store the current training example, making it more memory-efficient."
+ ] + }, + { + "cell_type": "markdown", + "id": "0058008d", + "metadata": { + "editable": true + }, + "source": [
+ "## Convergence rates\n", + "\n",
+ "1. Stochastic Gradient Descent has a faster convergence rate due to the use of single training examples in each iteration.\n", + "\n",
+ "2. Gradient Descent has a slower convergence rate, as it uses the entire dataset for each iteration."
+ ] + }, + { + "cell_type": "markdown", + "id": "f994e1e2", + "metadata": { + "editable": true + }, + "source": [
+ "## Accuracy\n", + "\n",
+ "In general, stochastic gradient descent is less accurate than gradient\n", + "descent, as it calculates the gradient on single examples, which may\n", + "not accurately represent the overall dataset. Gradient descent is\n", + "more accurate because it uses the average gradient calculated over the\n", + "entire dataset.\n", + "\n",
+ "There are other disadvantages to using SGD. The main drawback is that\n", + "its convergence behaviour can be more erratic due to the random\n", + "sampling of individual training examples. This can lead to less\n", + "accurate results, as the algorithm may not converge to the true\n", + "minimum of the cost function. Additionally, the learning rate, which\n", + "determines the step size of each update to the model’s parameters,\n", + "must be carefully chosen to ensure convergence.\n", + "\n",
+ "It is however the method of choice in deep learning algorithms, where\n", + "SGD is often used in combination with other optimization techniques,\n", + "such as momentum or adaptive learning rates."
+ ] + }, + { + "cell_type": "markdown", + "id": "842a8611", + "metadata": { + "editable": true + }, + "source": [
+ "## Stochastic Gradient Descent (SGD)\n", + "\n",
+ "In stochastic gradient descent, the extreme case is the setting where\n", + "each mini-batch contains only a single example, that is, we update the parameters one data point at a time.\n", + "\n",
+ "This process is called Stochastic Gradient\n", + "Descent (SGD) (or also sometimes on-line gradient descent). This is\n", + "relatively less common to see because in practice, due to vectorized\n", + "code optimizations, it can be computationally much more efficient to\n", + "evaluate the gradient for 100 examples than the gradient for one\n", + "example 100 times. Even though SGD technically refers to using a\n", + "single example at a time to evaluate the gradient, you will hear\n", + "people use the term SGD even when referring to mini-batch gradient\n", + "descent (i.e. mentions of MGD for “Minibatch Gradient Descent”, or BGD\n", + "for “Batch gradient descent” are rare to see), where it is usually\n", + "assumed that mini-batches are used. The size of the mini-batch is a\n", + "hyperparameter but it is not very common to cross-validate or bootstrap it. It is\n", + "usually based on memory constraints (if any), or set to some value,\n", + "e.g. 32, 64 or 128. 
We use powers of 2 in practice because many\n", + "vectorized operation implementations work faster when their inputs are\n", + "sized in powers of 2.\n", + "\n",
+ "In our notes, with SGD we mean stochastic gradient descent with mini-batches."
+ ] + }, + { + "cell_type": "markdown", + "id": "90bd121a", + "metadata": { + "editable": true + }, + "source": [
+ "## Stochastic Gradient Descent\n", + "\n",
+ "Stochastic gradient descent (SGD) and variants thereof address some of\n", + "the shortcomings of the gradient descent method discussed above.\n", + "\n",
+ "The underlying idea of SGD comes from the observation that the cost\n", + "function, which we want to minimize, can almost always be written as a\n", + "sum over $n$ data points $\\{\\mathbf{x}_i\\}_{i=1}^n$,"
+ ] + }, + { + "cell_type": "markdown", + "id": "5cd81303", + "metadata": { + "editable": true + }, + "source": [
+ "$$\n", + "C(\\mathbf{\\theta}) = \\sum_{i=1}^n c_i(\\mathbf{x}_i,\n", + "\\mathbf{\\theta}).\n", + "$$"
+ ] + }, + { + "cell_type": "markdown", + "id": "60e085a9", + "metadata": { + "editable": true + }, + "source": [
+ "## Computation of gradients\n", + "\n",
+ "This in turn means that the gradient can be\n", + "computed as a sum over $i$-gradients"
+ ] + }, + { + "cell_type": "markdown", + "id": "fef0100e", + "metadata": { + "editable": true + }, + "source": [
+ "$$\n", + "\\nabla_\\theta C(\\mathbf{\\theta}) = \\sum_{i=1}^n \\nabla_\\theta c_i(\\mathbf{x}_i,\n", + "\\mathbf{\\theta}).\n", + "$$"
+ ] + }, + { + "cell_type": "markdown", + "id": "aaba7f05", + "metadata": { + "editable": true + }, + "source": [
+ "Stochasticity/randomness is introduced by only taking the\n", + "gradient on a subset of the data called minibatches. If there are $n$\n", + "data points and the size of each minibatch is $M$, there will be $n/M$\n", + "minibatches. We denote these minibatches by $B_k$ where\n", + "$k=1,\\cdots,n/M$."
+ ] + }, + { + "cell_type": "markdown", + "id": "038b47ae", + "metadata": { + "editable": true + }, + "source": [
+ "## SGD example\n",
+ "As an example, suppose we have $10$ data points $(\\mathbf{x}_1,\\cdots, \\mathbf{x}_{10})$ \n", + "and we choose a minibatch size of $M=2$. We then have $n/M=5$ minibatches,\n", + "each containing two data points. In particular we have\n", + "$B_1 = (\\mathbf{x}_1,\\mathbf{x}_2), \\cdots, B_5 =\n", + "(\\mathbf{x}_9,\\mathbf{x}_{10})$. 
Note that if you choose $M=n$ you\n", + "have only a single batch containing all the data points, while on the other extreme\n", + "you may choose $M=1$, resulting in a minibatch for each datapoint, i.e.\n", + "$B_k = \\mathbf{x}_k$.\n", + "\n",
+ "The idea is now to approximate the gradient by replacing the sum over\n", + "all data points with a sum over the data points in one of the minibatches,\n", + "picked at random in each gradient descent step"
+ ] + }, + { + "cell_type": "markdown", + "id": "0ad42833", + "metadata": { + "editable": true + }, + "source": [
+ "$$\n", + "\\nabla_{\\theta}\n", + "C(\\mathbf{\\theta}) = \\sum_{i=1}^n \\nabla_\\theta c_i(\\mathbf{x}_i,\n", + "\\mathbf{\\theta}) \\rightarrow \\sum_{i \\in B_k} \\nabla_\\theta\n", + "c_i(\\mathbf{x}_i, \\mathbf{\\theta}).\n", + "$$"
+ ] + }, + { + "cell_type": "markdown", + "id": "64b15ba2", + "metadata": { + "editable": true + }, + "source": [
+ "## The gradient step\n", + "\n",
+ "Thus a gradient descent step now looks like"
+ ] + }, + { + "cell_type": "markdown", + "id": "49c6adb0", + "metadata": { + "editable": true + }, + "source": [
+ "$$\n", + "\\theta_{j+1} = \\theta_j - \\eta_j \\sum_{i \\in B_k} \\nabla_\\theta c_i(\\mathbf{x}_i,\n", + "\\mathbf{\\theta})\n", + "$$"
+ ] + }, + { + "cell_type": "markdown", + "id": "82873545", + "metadata": { + "editable": true + }, + "source": [
+ "where $k$ is picked at random with equal\n", + "probability from $[1,n/M]$. An iteration over the number of\n", + "minibatches ($n/M$) is commonly referred to as an epoch. Thus it is\n", + "typical to choose a number of epochs and for each epoch iterate over\n", + "the number of minibatches, as exemplified in the code below."
+ ] + }, + { + "cell_type": "markdown", + "id": "35a8e70d", + "metadata": { + "editable": true + }, + "source": [
+ "## Simple example code"
+ ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "6aa32b90", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [
+ "import numpy as np \n", + "\n",
+ "n = 100 #100 datapoints \n", + "M = 5 #size of each minibatch\n", + "m = int(n/M) #number of minibatches\n", + "n_epochs = 10 #number of epochs\n", + "\n",
+ "j = 0\n", + "for epoch in range(1,n_epochs+1):\n", + "    for i in range(m):\n", + "        k = np.random.randint(m) #Pick the k-th minibatch at random\n", + "        #Compute the gradient using the data in minibatch Bk\n", + "        #Compute new suggestion for theta\n", + "        j += 1"
+ ] + }, + { + "cell_type": "markdown", + "id": "6e20f534", + "metadata": { + "editable": true + }, + "source": [
+ "Taking the gradient only on a subset of the data has two important\n", + "benefits. First, it introduces randomness, which decreases the chance\n", + "that our optimization scheme gets stuck in a local minimum. Second, if\n", + "the size of the minibatches is small relative to the number of\n", + "datapoints ($M < n$), the computation of the gradient is much\n", + "cheaper since we sum over the datapoints in the $k$-th minibatch and not\n", + "all $n$ datapoints."
+ ] + }, + { + "cell_type": "markdown", + "id": "71745d3e", + "metadata": { + "editable": true + }, + "source": [
+ "## When do we stop?\n", + "\n",
+ "A natural question is when do we stop the search for a new minimum?\n", + "One possibility is to compute the full gradient after a given number\n", + "of epochs and check if the norm of the gradient is smaller than some\n", + "threshold and stop if true. 
However, the condition that the gradient\n", + "is zero is valid also for local minima, so this would only tell us\n", + "that we are close to a local/global minimum. However, we could also\n", + "evaluate the cost function at this point, store the result and\n", + "continue the search. If the test kicks in at a later stage we can\n", + "compare the values of the cost function and keep the $\\theta$ that\n", + "gave the lowest value." + ] + }, + { + "cell_type": "markdown", + "id": "bad95be2", + "metadata": { + "editable": true + }, + "source": [ + "## Slightly different approach\n", + "\n", + "Another approach is to let the step length $\\eta_j$ depend on the\n", + "number of epochs in such a way that it becomes very small after a\n", + "reasonable time such that we do not move at all. Such approaches are\n", + "also called scaling. There are many such ways to [scale the learning\n", + "rate](https://towardsdatascience.com/gradient-descent-the-learning-rate-and-the-importance-of-feature-scaling-6c0b416596e1)\n", + "and [discussions here](https://www.jmlr.org/papers/volume23/20-1258/20-1258.pdf). See\n", + "also\n", + "\n", + "for a discussion of different scaling functions for the learning rate." + ] + }, + { + "cell_type": "markdown", + "id": "40b4d87e", + "metadata": { + "editable": true + }, + "source": [ + "## Time decay rate\n", + "\n", + "As an example, let $e = 0,1,2,3,\\cdots$ denote the current epoch and let $t_0, t_1 > 0$ be two fixed numbers. Furthermore, let $t = e \\cdot m + i$ where $m$ is the number of minibatches and $i=0,\\cdots,m-1$. Then the function $$\\eta_j(t; t_0, t_1) = \\frac{t_0}{t+t_1} $$ goes to zero as the number of epochs gets large. I.e. we start with a step length $\\eta_j (0; t_0, t_1) = t_0/t_1$ which decays in *time* $t$.\n", + "\n", + "In this way we can fix the number of epochs, compute $\\theta$ and\n", + "evaluate the cost function at the end. Repeating the computation will\n", + "give a different result since the scheme is random by design. Then we\n", + "pick the final $\\theta$ that gives the lowest value of the cost\n", + "function." + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "1208bbec", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import numpy as np \n", + "\n", + "def step_length(t,t0,t1):\n", + " return t0/(t+t1)\n", + "\n", + "n = 100 #100 datapoints \n", + "M = 5 #size of each minibatch\n", + "m = int(n/M) #number of minibatches\n", + "n_epochs = 500 #number of epochs\n", + "t0 = 1.0\n", + "t1 = 10\n", + "\n", + "eta_j = t0/t1\n", + "j = 0\n", + "for epoch in range(1,n_epochs+1):\n", + " for i in range(m):\n", + " k = np.random.randint(m) #Pick the k-th minibatch at random\n", + " #Compute the gradient using the data in minibatch Bk\n", + " #Compute new suggestion for theta\n", + " t = epoch*m+i\n", + " eta_j = step_length(t,t0,t1)\n", + " j += 1\n", + "\n", + "print(\"eta_j after %d epochs: %g\" % (n_epochs,eta_j))" + ] + }, + { + "cell_type": "markdown", + "id": "b83b5ed1", + "metadata": { + "editable": true + }, + "source": [ + "## Code with a Number of Minibatches which varies\n", + "\n", + "In the code here we vary the number of mini-batches." 
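+ "\n",
+ "A side remark before the code (a sketch of one possible answer, under our own conventions, to the question asked in the code comments below about setting up the batches): minibatches can be drawn without replacement by shuffling the data indices once per epoch and slicing the permutation into $n/M$ blocks.\n",
+ "\n",
+ "```python\n",
+ "import numpy as np\n",
+ "\n",
+ "n, M = 100, 5\n",
+ "m = n // M                               # number of minibatches\n",
+ "rng = np.random.default_rng()\n",
+ "for epoch in range(10):\n",
+ "    indices = rng.permutation(n)         # reshuffle the data each epoch\n",
+ "    for batch in indices.reshape(m, M):  # m minibatches of size M, no replacement\n",
+ "        pass  # compute the gradient on X[batch], y[batch] and update theta here\n",
+ "```"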
+ ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "1f669db6", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# Importing various packages\n", + "from math import exp, sqrt\n", + "from random import random, seed\n", + "import numpy as np\n", + "import matplotlib.pyplot as plt\n", + "\n", + "n = 100\n", + "x = 2*np.random.rand(n,1)\n", + "y = 4+3*x+np.random.randn(n,1)\n", + "\n", + "X = np.c_[np.ones((n,1)), x]\n", + "XT_X = X.T @ X\n", + "theta_linreg = np.linalg.inv(X.T @ X) @ (X.T @ y)\n", + "print(\"Own inversion\")\n", + "print(theta_linreg)\n", + "# Hessian matrix\n", + "H = (2.0/n)* XT_X\n", + "EigValues, EigVectors = np.linalg.eig(H)\n", + "print(f\"Eigenvalues of Hessian Matrix:{EigValues}\")\n", + "\n", + "theta = np.random.randn(2,1)\n", + "eta = 1.0/np.max(EigValues)\n", + "Niterations = 1000\n", + "\n", + "\n", + "for iter in range(Niterations):\n", + " gradients = 2.0/n*X.T @ ((X @ theta)-y)\n", + " theta -= eta*gradients\n", + "print(\"theta from own gd\")\n", + "print(theta)\n", + "\n", + "xnew = np.array([[0],[2]])\n", + "Xnew = np.c_[np.ones((2,1)), xnew]\n", + "ypredict = Xnew.dot(theta)\n", + "ypredict2 = Xnew.dot(theta_linreg)\n", + "\n", + "n_epochs = 50\n", + "M = 5 #size of each minibatch\n", + "m = int(n/M) #number of minibatches\n", + "t0, t1 = 5, 50\n", + "\n", + "def learning_schedule(t):\n", + " return t0/(t+t1)\n", + "\n", + "theta = np.random.randn(2,1)\n", + "\n", + "for epoch in range(n_epochs):\n", + "# Can you figure out a better way of setting up the contributions to each batch?\n", + " for i in range(m):\n", + " random_index = M*np.random.randint(m)\n", + " xi = X[random_index:random_index+M]\n", + " yi = y[random_index:random_index+M]\n", + " gradients = (2.0/M)* xi.T @ ((xi @ theta)-yi)\n", + " eta = learning_schedule(epoch*m+i)\n", + " theta = theta - eta*gradients\n", + "print(\"theta from own sdg\")\n", + "print(theta)\n", + "\n", + "plt.plot(xnew, ypredict, \"r-\")\n", + "plt.plot(xnew, ypredict2, \"b-\")\n", + "plt.plot(x, y ,'ro')\n", + "plt.axis([0,2.0,0, 15.0])\n", + "plt.xlabel(r'$x$')\n", + "plt.ylabel(r'$y$')\n", + "plt.title(r'Random numbers ')\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "3e9ed564", + "metadata": { + "editable": true + }, + "source": [ + "## Replace or not\n", + "\n", + "In the above code, we have use replacement in setting up the\n", + "mini-batches. The discussion\n", + "[here](https://sebastianraschka.com/faq/docs/sgd-methods.html) may be\n", + "useful." + ] + }, + { + "cell_type": "markdown", + "id": "9c0ac318", + "metadata": { + "editable": true + }, + "source": [ + "## Second moment of the gradient\n", + "\n", + "In stochastic gradient descent, with and without momentum, we still\n", + "have to specify a schedule for tuning the learning rates $\\eta_t$\n", + "as a function of time. As discussed in the context of Newton's\n", + "method, this presents a number of dilemmas. The learning rate is\n", + "limited by the steepest direction which can change depending on the\n", + "current position in the landscape. To circumvent this problem, ideally\n", + "our algorithm would keep track of curvature and take large steps in\n", + "shallow, flat directions and small steps in steep, narrow directions.\n", + "Second-order methods accomplish this by calculating or approximating\n", + "the Hessian and normalizing the learning rate by the\n", + "curvature. However, this is very computationally expensive for\n", + "extremely large models. 
Ideally, we would like to be able to\n", + "adaptively change the step size to match the landscape without paying\n", + "the steep computational price of calculating or approximating\n", + "Hessians.\n", + "\n", + "During the last decade a number of methods have been introduced that accomplish\n", + "this by tracking not only the gradient, but also the second moment of\n", + "the gradient. These methods include AdaGrad, AdaDelta, Root Mean Squared Propagation (RMS-Prop), and\n", + "[ADAM](https://arxiv.org/abs/1412.6980)." + ] + }, + { + "cell_type": "markdown", + "id": "d8f518c4", + "metadata": { + "editable": true + }, + "source": [ + "## Challenge: Choosing a Fixed Learning Rate\n", + "A fixed $\\eta$ is hard to get right:\n", + "1. If $\\eta$ is too large, the updates can overshoot the minimum, causing oscillations or divergence\n", + "\n", + "2. If $\\eta$ is too small, convergence is very slow (many iterations to make progress)\n", + "\n", + "In practice, one often uses trial-and-error or schedules (decaying $\\eta$ over time) to find a workable balance.\n", + "For a function with steep directions and flat directions, a single global $\\eta$ may be inappropriate:\n", + "1. Steep coordinates require a smaller step size to avoid oscillation.\n", + "\n", + "2. Flat/shallow coordinates could use a larger step to speed up progress.\n", + "\n", + "3. This issue is pronounced in high-dimensional problems with **sparse or varying-scale features** – we need a method to adjust step sizesper feature." + ] + }, + { + "cell_type": "markdown", + "id": "3dcb89bd", + "metadata": { + "editable": true + }, + "source": [ + "## Motivation for Adaptive Step Sizes\n", + "\n", + "1. Instead of a fixed global $\\eta$, use an **adaptive learning rate** for each parameter that depends on the history of gradients.\n", + "\n", + "2. Parameters that have large accumulated gradient magnitude should get smaller steps (they've been changing a lot), whereas parameters with small or infrequent gradients can have larger relative steps.\n", + "\n", + "3. This is especially useful for sparse features: Rarely active features accumulate little gradient, so their learning rate remains comparatively high, ensuring they are not neglected\n", + "\n", + "4. Conversely, frequently active features accumulate large gradient sums, and their learning rate automatically decreases, preventing too-large updates\n", + "\n", + "5. Several algorithms implement this idea (AdaGrad, RMSProp, AdaDelta, Adam, etc.). We will derive **AdaGrad**, one of the first adaptive methods." + ] + }, + { + "cell_type": "markdown", + "id": "8f258bc2", + "metadata": { + "editable": true + }, + "source": [ + "## AdaGrad algorithm, taken from [Goodfellow et al](https://www.deeplearningbook.org/contents/optimization.html)\n", + "\n", + "\n", + "\n", + "\n", + "

[Figure 1: the AdaGrad algorithm, from Goodfellow et al.]
\n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "2a3715f8", + "metadata": { + "editable": true + }, + "source": [ + "## Derivation of the AdaGrad Algorithm\n", + "\n", + "**Accumulating Gradient History.**\n", + "\n", + "1. AdaGrad maintains a running sum of squared gradients for each parameter (coordinate)\n", + "\n", + "2. Let $g_t = \\nabla C_{i_t}(x_t)$ be the gradient at step $t$ (or a subgradient for nondifferentiable cases).\n", + "\n", + "3. Initialize $r_0 = 0$ (an all-zero vector in $\\mathbb{R}^d$).\n", + "\n", + "4. At each iteration $t$, update the accumulation:" + ] + }, + { + "cell_type": "markdown", + "id": "a1d9578a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "r_t = r_{t-1} + g_t \\circ g_t,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b6b5bc5e", + "metadata": { + "editable": true + }, + "source": [ + "1. Here $g_t \\circ g_t$ denotes element-wise square of the gradient vector. $g_t^{(j)} = g_{t-1}^{(j)} + (g_{t,j})^2$ for each parameter $j$.\n", + "\n", + "2. We can view $H_t = \\mathrm{diag}(r_t)$ as a diagonal matrix of past squared gradients. Initially $H_0 = 0$." + ] + }, + { + "cell_type": "markdown", + "id": "44b313c8", + "metadata": { + "editable": true + }, + "source": [ + "## AdaGrad Update Rule Derivation\n", + "\n", + "We scale the gradient by the inverse square root of the accumulated matrix $H_t$. The AdaGrad update at step $t$ is:" + ] + }, + { + "cell_type": "markdown", + "id": "b56c85b9", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\theta_{t+1} =\\theta_t - \\eta H_t^{-1/2} g_t,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "5bcc6bd2", + "metadata": { + "editable": true + }, + "source": [ + "where $H_t^{-1/2}$ is the diagonal matrix with entries $(r_{t}^{(1)})^{-1/2}, \\dots, (r_{t}^{(d)})^{-1/2}$\n", + "In coordinates, this means each parameter $j$ has an individual step size:" + ] + }, + { + "cell_type": "markdown", + "id": "41fc9f01", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\theta_{t+1,j} =\\theta_{t,j} -\\frac{\\eta}{\\sqrt{r_{t,j}}}g_{t,j}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "8151719b", + "metadata": { + "editable": true + }, + "source": [ + "In practice we add a small constant $\\epsilon$ in the denominator for numerical stability to avoid division by zero:" + ] + }, + { + "cell_type": "markdown", + "id": "bb75b0ad", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\theta_{t+1,j}= \\theta_{t,j}-\\frac{\\eta}{\\sqrt{\\epsilon + r_{t,j}}}g_{t,j}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "3c71fd46", + "metadata": { + "editable": true + }, + "source": [ + "Equivalently, the effective learning rate for parameter $j$ at time $t$ is $\\displaystyle \\alpha_{t,j} = \\frac{\\eta}{\\sqrt{\\epsilon + r_{t,j}}}$. This decreases over time as $r_{t,j}$ grows." + ] + }, + { + "cell_type": "markdown", + "id": "1d835a18", + "metadata": { + "editable": true + }, + "source": [ + "## AdaGrad Properties\n", + "\n", + "1. AdaGrad automatically tunes the step size for each parameter. Parameters with more *volatile or large gradients* get smaller steps, and those with *small or infrequent gradients* get relatively larger steps\n", + "\n", + "2. No manual schedule needed: The accumulation $r_t$ keeps increasing (or stays the same if gradient is zero), so step sizes $\\eta/\\sqrt{r_t}$ are non-increasing. 
This has a similar effect to a learning rate schedule, but individualized per coordinate.\n", + "\n", + "3. Sparse data benefit: For very sparse features, $r_{t,j}$ grows slowly, so that feature’s parameter retains a higher learning rate for longer, allowing it to make significant updates when it does get a gradient signal\n", + "\n", + "4. Convergence: In convex optimization, AdaGrad can be shown to achieve a sub-linear convergence rate comparable to the best fixed learning rate tuned for the problem\n", + "\n", + "It effectively reduces the need to tune $\\eta$ by hand.\n", + "1. Limitations: Because $r_t$ accumulates without bound, AdaGrad’s learning rates can become extremely small over long training, potentially slowing progress. (Later variants like RMSProp, AdaDelta, Adam address this by modifying the accumulation rule.)" + ] + }, + { + "cell_type": "markdown", + "id": "77dcc8c3", + "metadata": { + "editable": true + }, + "source": [ + "## RMSProp: Adaptive Learning Rates\n", + "\n", + "Addresses AdaGrad’s diminishing learning rate issue.\n", + "Uses a decaying average of squared gradients (instead of a cumulative sum):" + ] + }, + { + "cell_type": "markdown", + "id": "21161d57", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "v_t = \\rho v_{t-1} + (1-\\rho)(\\nabla C(\\theta_t))^2,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "e87e09a9", + "metadata": { + "editable": true + }, + "source": [ + "with $\\rho$ typically $0.9$ (or $0.99$).\n", + "1. Update: $\\theta_{t+1} = \\theta_t - \\frac{\\eta}{\\sqrt{v_t + \\epsilon}} \\nabla C(\\theta_t)$.\n", + "\n", + "2. Recent gradients have more weight, so $v_t$ adapts to the current landscape.\n", + "\n", + "3. Avoids AdaGrad’s “infinite memory” problem – learning rate does not continuously decay to zero.\n", + "\n", + "RMSProp was first proposed in lecture notes by Geoff Hinton, 2012 - unpublished.)" + ] + }, + { + "cell_type": "markdown", + "id": "1a98c681", + "metadata": { + "editable": true + }, + "source": [ + "## RMSProp algorithm, taken from [Goodfellow et al](https://www.deeplearningbook.org/contents/optimization.html)\n", + "\n", + "\n", + "\n", + "\n", + "

[Figure 1: the RMSProp algorithm, from Goodfellow et al.]
\n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "8b337277", + "metadata": { + "editable": true + }, + "source": [ + "## Adam Optimizer\n", + "\n", + "Why combine Momentum and RMSProp? Motivation for Adam: Adaptive Moment Estimation (Adam) was introduced by Kingma an Ba (2014) to combine the benefits of momentum and RMSProp.\n", + "\n", + "1. Fast convergence by smoothing gradients (accelerates in long-term gradient direction).\n", + "\n", + "2. Adaptive rates (RMSProp): Per-dimension learning rate scaling for stability (handles different feature scales, sparse gradients).\n", + "\n", + "3. Adam uses both: maintains moving averages of both first moment (gradients) and second moment (squared gradients)\n", + "\n", + "4. Additionally, includes a mechanism to correct the bias in these moving averages (crucial in early iterations)\n", + "\n", + "**Result**: Adam is robust, achieves faster convergence with less tuning, and often outperforms SGD (with momentum) in practice." + ] + }, + { + "cell_type": "markdown", + "id": "af77b83f", + "metadata": { + "editable": true + }, + "source": [ + "## [ADAM optimizer](https://arxiv.org/abs/1412.6980)\n", + "\n", + "In [ADAM](https://arxiv.org/abs/1412.6980), we keep a running average of\n", + "both the first and second moment of the gradient and use this\n", + "information to adaptively change the learning rate for different\n", + "parameters. The method is efficient when working with large\n", + "problems involving lots data and/or parameters. It is a combination of the\n", + "gradient descent with momentum algorithm and the RMSprop algorithm\n", + "discussed above." + ] + }, + { + "cell_type": "markdown", + "id": "bc924f77", + "metadata": { + "editable": true + }, + "source": [ + "## Why Combine Momentum and RMSProp?\n", + "\n", + "1. Momentum: Fast convergence by smoothing gradients (accelerates in long-term gradient direction).\n", + "\n", + "2. Adaptive rates (RMSProp): Per-dimension learning rate scaling for stability (handles different feature scales, sparse gradients).\n", + "\n", + "3. Adam uses both: maintains moving averages of both first moment (gradients) and second moment (squared gradients)\n", + "\n", + "4. Additionally, includes a mechanism to correct the bias in these moving averages (crucial in early iterations)\n", + "\n", + "Result: Adam is robust, achieves faster convergence with less tuning, and often outperforms SGD (with momentum) in practice" + ] + }, + { + "cell_type": "markdown", + "id": "86e5ab5e", + "metadata": { + "editable": true + }, + "source": [ + "## Adam: Exponential Moving Averages (Moments)\n", + "Adam maintains two moving averages at each time step $t$ for each parameter $w$:\n", + "**First moment (mean) $m_t$.**\n", + "\n", + "The Momentum term" + ] + }, + { + "cell_type": "markdown", + "id": "949f359d", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "m_t = \\beta_1m_{t-1} + (1-\\beta_1)\\, \\nabla C(\\theta_t),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "0ba26be3", + "metadata": { + "editable": true + }, + "source": [ + "**Second moment (uncentered variance) $v_t$.**\n", + "\n", + "The RMS term" + ] + }, + { + "cell_type": "markdown", + "id": "4fb9b2a2", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "v_t = \\beta_2v_{t-1} + (1-\\beta_2)(\\nabla C(\\theta_t))^2,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "8711e597", + "metadata": { + "editable": true + }, + "source": [ + "with typical $\\beta_1 = 0.9$, $\\beta_2 = 0.999$. 
Initialize $m_0 = 0$, $v_0 = 0$.\n", + "\n", + " These are **biased** estimators of the true first and second moment of the gradients, especially at the start (since $m_0,v_0$ are zero)" + ] + }, + { + "cell_type": "markdown", + "id": "49e6e73d", + "metadata": { + "editable": true + }, + "source": [ + "## Adam: Bias Correction\n", + "To counteract initialization bias in $m_t, v_t$, Adam computes bias-corrected estimates" + ] + }, + { + "cell_type": "markdown", + "id": "ca5bb491", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\hat{m}_t = \\frac{m_t}{1 - \\beta_1^t}, \\qquad \\hat{v}_t = \\frac{v_t}{1 - \\beta_2^t}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "5e19d7bf", + "metadata": { + "editable": true + }, + "source": [ + "* When $t$ is small, $1-\\beta_i^t \\approx 0$, so $\\hat{m}_t, \\hat{v}_t$ significantly larger than raw $m_t, v_t$, compensating for the initial zero bias.\n", + "\n", + "* As $t$ increases, $1-\\beta_i^t \\to 1$, and $\\hat{m}_t, \\hat{v}_t$ converge to $m_t, v_t$.\n", + "\n", + "* Bias correction is important for Adam’s stability in early iterations" + ] + }, + { + "cell_type": "markdown", + "id": "f79d952e", + "metadata": { + "editable": true + }, + "source": [ + "## Adam: Update Rule Derivation\n", + "Finally, Adam updates parameters using the bias-corrected moments:" + ] + }, + { + "cell_type": "markdown", + "id": "13e9862f", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\theta_{t+1} =\\theta_t -\\frac{\\alpha}{\\sqrt{\\hat{v}_t} + \\epsilon}\\hat{m}_t,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "5693500e", + "metadata": { + "editable": true + }, + "source": [ + "where $\\epsilon$ is a small constant (e.g. $10^{-8}$) to prevent division by zero.\n", + "Breaking it down:\n", + "1. Compute gradient $\\nabla C(\\theta_t)$.\n", + "\n", + "2. Update first moment $m_t$ and second moment $v_t$ (exponential moving averages).\n", + "\n", + "3. Bias-correct: $\\hat{m}_t = m_t/(1-\\beta_1^t)$, $\\; \\hat{v}_t = v_t/(1-\\beta_2^t)$.\n", + "\n", + "4. Compute step: $\\Delta \\theta_t = \\frac{\\hat{m}_t}{\\sqrt{\\hat{v}_t} + \\epsilon}$.\n", + "\n", + "5. Update parameters: $\\theta_{t+1} = \\theta_t - \\alpha\\, \\Delta \\theta_t$.\n", + "\n", + "This is the Adam update rule as given in the original paper." + ] + }, + { + "cell_type": "markdown", + "id": "65a5e1e7", + "metadata": { + "editable": true + }, + "source": [ + "## Adam vs. AdaGrad and RMSProp\n", + "\n", + "1. AdaGrad: Uses per-coordinate scaling like Adam, but no momentum. Tends to slow down too much due to cumulative history (no forgetting)\n", + "\n", + "2. RMSProp: Uses moving average of squared gradients (like Adam’s $v_t$) to maintain adaptive learning rates, but does not include momentum or bias-correction.\n", + "\n", + "3. Adam: Effectively RMSProp + Momentum + Bias-correction\n", + "\n", + " * Momentum ($m_t$) provides acceleration and smoother convergence.\n", + "\n", + " * Adaptive $v_t$ scaling moderates the step size per dimension.\n", + "\n", + " * Bias correction (absent in AdaGrad/RMSProp) ensures robust estimates early on.\n", + "\n", + "In practice, Adam often yields faster convergence and better tuning stability than RMSProp or AdaGrad alone" + ] + }, + { + "cell_type": "markdown", + "id": "27686255", + "metadata": { + "editable": true + }, + "source": [ + "## Adaptivity Across Dimensions\n", + "\n", + "1. 
Adam adapts the step size \\emph{per coordinate}: parameters with larger gradient variance get smaller effective steps, those with smaller or sparse gradients get larger steps.\n", + "\n", + "2. This per-dimension adaptivity is inherited from AdaGrad/RMSProp and helps handle ill-conditioned or sparse problems.\n", + "\n", + "3. Meanwhile, momentum (first moment) allows Adam to continue making progress even if gradients become small or noisy, by leveraging accumulated direction." + ] + }, + { + "cell_type": "markdown", + "id": "f3dfc1e2", + "metadata": { + "editable": true + }, + "source": [ + "## ADAM algorithm, taken from [Goodfellow et al](https://www.deeplearningbook.org/contents/optimization.html)\n", + "\n", + "\n", + "\n", + "\n", + "

[Figure 1: the ADAM algorithm, from Goodfellow et al.]
\n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "045d399c", + "metadata": { + "editable": true + }, + "source": [ + "## Algorithms and codes for Adagrad, RMSprop and Adam\n", + "\n", + "The algorithms we have implemented are well described in the text by [Goodfellow, Bengio and Courville, chapter 8](https://www.deeplearningbook.org/contents/optimization.html).\n", + "\n", + "The codes which implement these algorithms are discussed below here." + ] + }, + { + "cell_type": "markdown", + "id": "4e75ee41", + "metadata": { + "editable": true + }, + "source": [ + "## Practical tips\n", + "\n", + "* **Randomize the data when making mini-batches**. It is always important to randomly shuffle the data when forming mini-batches. Otherwise, the gradient descent method can fit spurious correlations resulting from the order in which data is presented.\n", + "\n", + "* **Transform your inputs**. Learning becomes difficult when our landscape has a mixture of steep and flat directions. One simple trick for minimizing these situations is to standardize the data by subtracting the mean and normalizing the variance of input variables. Whenever possible, also decorrelate the inputs. To understand why this is helpful, consider the case of linear regression. It is easy to show that for the squared error cost function, the Hessian of the cost function is just the correlation matrix between the inputs. Thus, by standardizing the inputs, we are ensuring that the landscape looks homogeneous in all directions in parameter space. Since most deep networks can be viewed as linear transformations followed by a non-linearity at each layer, we expect this intuition to hold beyond the linear case.\n", + "\n", + "* **Monitor the out-of-sample performance.** Always monitor the performance of your model on a validation set (a small portion of the training data that is held out of the training process to serve as a proxy for the test set. If the validation error starts increasing, then the model is beginning to overfit. Terminate the learning process. This *early stopping* significantly improves performance in many settings.\n", + "\n", + "* **Adaptive optimization methods don't always have good generalization.** Recent studies have shown that adaptive methods such as ADAM, RMSPorp, and AdaGrad tend to have poor generalization compared to SGD or SGD with momentum, particularly in the high-dimensional limit (i.e. the number of parameters exceeds the number of data points). Although it is not clear at this stage why these methods perform so well in training deep neural networks, simpler procedures like properly-tuned SGD may work as well or better in these applications." + ] + }, + { + "cell_type": "markdown", + "id": "ddbb28ab", + "metadata": { + "editable": true + }, + "source": [ + "## Sneaking in automatic differentiation using Autograd\n", + "\n", + "In the examples here we take the liberty of sneaking in automatic\n", + "differentiation (without having discussed the mathematics). In\n", + "project 1 you will write the gradients as discussed above, that is\n", + "hard-coding the gradients. By introducing automatic differentiation\n", + "via the library **autograd**, which is now replaced by **JAX**, we have\n", + "more flexibility in setting up alternative cost functions.\n", + "\n", + "The\n", + "first example shows results with ordinary leats squares." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 9, + "id": "dae38b6c", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# Using Autograd to calculate gradients for OLS\n", + "from random import random, seed\n", + "import numpy as np\n", + "import autograd.numpy as np\n", + "import matplotlib.pyplot as plt\n", + "from autograd import grad\n", + "\n", + "def CostOLS(theta):\n", + " return (1.0/n)*np.sum((y-X @ theta)**2)\n", + "\n", + "n = 100\n", + "x = 2*np.random.rand(n,1)\n", + "y = 4+3*x+np.random.randn(n,1)\n", + "\n", + "X = np.c_[np.ones((n,1)), x]\n", + "XT_X = X.T @ X\n", + "theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)\n", + "print(\"Own inversion\")\n", + "print(theta_linreg)\n", + "# Hessian matrix\n", + "H = (2.0/n)* XT_X\n", + "EigValues, EigVectors = np.linalg.eig(H)\n", + "print(f\"Eigenvalues of Hessian Matrix:{EigValues}\")\n", + "\n", + "theta = np.random.randn(2,1)\n", + "eta = 1.0/np.max(EigValues)\n", + "Niterations = 1000\n", + "# define the gradient\n", + "training_gradient = grad(CostOLS)\n", + "\n", + "for iter in range(Niterations):\n", + " gradients = training_gradient(theta)\n", + " theta -= eta*gradients\n", + "print(\"theta from own gd\")\n", + "print(theta)\n", + "\n", + "xnew = np.array([[0],[2]])\n", + "Xnew = np.c_[np.ones((2,1)), xnew]\n", + "ypredict = Xnew.dot(theta)\n", + "ypredict2 = Xnew.dot(theta_linreg)\n", + "\n", + "plt.plot(xnew, ypredict, \"r-\")\n", + "plt.plot(xnew, ypredict2, \"b-\")\n", + "plt.plot(x, y ,'ro')\n", + "plt.axis([0,2.0,0, 15.0])\n", + "plt.xlabel(r'$x$')\n", + "plt.ylabel(r'$y$')\n", + "plt.title(r'Random numbers ')\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "ca5a343a", + "metadata": { + "editable": true + }, + "source": [ + "## Same code but now with momentum gradient descent" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "id": "08d97c1e", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# Using Autograd to calculate gradients for OLS\n", + "from random import random, seed\n", + "import numpy as np\n", + "import autograd.numpy as np\n", + "import matplotlib.pyplot as plt\n", + "from autograd import grad\n", + "\n", + "def CostOLS(theta):\n", + " return (1.0/n)*np.sum((y-X @ theta)**2)\n", + "\n", + "n = 100\n", + "x = 2*np.random.rand(n,1)\n", + "y = 4+3*x#+np.random.randn(n,1)\n", + "\n", + "X = np.c_[np.ones((n,1)), x]\n", + "XT_X = X.T @ X\n", + "theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)\n", + "print(\"Own inversion\")\n", + "print(theta_linreg)\n", + "# Hessian matrix\n", + "H = (2.0/n)* XT_X\n", + "EigValues, EigVectors = np.linalg.eig(H)\n", + "print(f\"Eigenvalues of Hessian Matrix:{EigValues}\")\n", + "\n", + "theta = np.random.randn(2,1)\n", + "eta = 1.0/np.max(EigValues)\n", + "Niterations = 30\n", + "\n", + "# define the gradient\n", + "training_gradient = grad(CostOLS)\n", + "\n", + "for iter in range(Niterations):\n", + " gradients = training_gradient(theta)\n", + " theta -= eta*gradients\n", + " print(iter,gradients[0],gradients[1])\n", + "print(\"theta from own gd\")\n", + "print(theta)\n", + "\n", + "# Now improve with momentum gradient descent\n", + "change = 0.0\n", + "delta_momentum = 0.3\n", + "for iter in range(Niterations):\n", + " # calculate gradient\n", + " gradients = training_gradient(theta)\n", + " # calculate update\n", + " new_change = eta*gradients+delta_momentum*change\n", + " # take a step\n", + " theta -= new_change\n", + " # save the 
change\n", + " change = new_change\n", + " print(iter,gradients[0],gradients[1])\n", + "print(\"theta from own gd wth momentum\")\n", + "print(theta)" + ] + }, + { + "cell_type": "markdown", + "id": "727d8fc3", + "metadata": { + "editable": true + }, + "source": [ + "## Including Stochastic Gradient Descent with Autograd\n", + "\n", + "In this code we include the stochastic gradient descent approach\n", + "discussed above. Note here that we specify which argument we are\n", + "taking the derivative with respect to when using **autograd**." + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "id": "4e41c003", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# Using Autograd to calculate gradients using SGD\n", + "# OLS example\n", + "from random import random, seed\n", + "import numpy as np\n", + "import autograd.numpy as np\n", + "import matplotlib.pyplot as plt\n", + "from autograd import grad\n", + "\n", + "# Note change from previous example\n", + "def CostOLS(y,X,theta):\n", + " return np.sum((y-X @ theta)**2)\n", + "\n", + "n = 100\n", + "x = 2*np.random.rand(n,1)\n", + "y = 4+3*x+np.random.randn(n,1)\n", + "\n", + "X = np.c_[np.ones((n,1)), x]\n", + "XT_X = X.T @ X\n", + "theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)\n", + "print(\"Own inversion\")\n", + "print(theta_linreg)\n", + "# Hessian matrix\n", + "H = (2.0/n)* XT_X\n", + "EigValues, EigVectors = np.linalg.eig(H)\n", + "print(f\"Eigenvalues of Hessian Matrix:{EigValues}\")\n", + "\n", + "theta = np.random.randn(2,1)\n", + "eta = 1.0/np.max(EigValues)\n", + "Niterations = 1000\n", + "\n", + "# Note that we request the derivative wrt third argument (theta, 2 here)\n", + "training_gradient = grad(CostOLS,2)\n", + "\n", + "for iter in range(Niterations):\n", + " gradients = (1.0/n)*training_gradient(y, X, theta)\n", + " theta -= eta*gradients\n", + "print(\"theta from own gd\")\n", + "print(theta)\n", + "\n", + "xnew = np.array([[0],[2]])\n", + "Xnew = np.c_[np.ones((2,1)), xnew]\n", + "ypredict = Xnew.dot(theta)\n", + "ypredict2 = Xnew.dot(theta_linreg)\n", + "\n", + "plt.plot(xnew, ypredict, \"r-\")\n", + "plt.plot(xnew, ypredict2, \"b-\")\n", + "plt.plot(x, y ,'ro')\n", + "plt.axis([0,2.0,0, 15.0])\n", + "plt.xlabel(r'$x$')\n", + "plt.ylabel(r'$y$')\n", + "plt.title(r'Random numbers ')\n", + "plt.show()\n", + "\n", + "n_epochs = 50\n", + "M = 5 #size of each minibatch\n", + "m = int(n/M) #number of minibatches\n", + "t0, t1 = 5, 50\n", + "def learning_schedule(t):\n", + " return t0/(t+t1)\n", + "\n", + "theta = np.random.randn(2,1)\n", + "\n", + "for epoch in range(n_epochs):\n", + "# Can you figure out a better way of setting up the contributions to each batch?\n", + " for i in range(m):\n", + " random_index = M*np.random.randint(m)\n", + " xi = X[random_index:random_index+M]\n", + " yi = y[random_index:random_index+M]\n", + " gradients = (1.0/M)*training_gradient(yi, xi, theta)\n", + " eta = learning_schedule(epoch*m+i)\n", + " theta = theta - eta*gradients\n", + "print(\"theta from own sdg\")\n", + "print(theta)" + ] + }, + { + "cell_type": "markdown", + "id": "fe00db52", + "metadata": { + "editable": true + }, + "source": [ + "## Same code but now with momentum gradient descent" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "id": "8f22105b", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# Using Autograd to calculate gradients using SGD\n", + "# OLS example\n", + "from random import random, seed\n", 
+ "import numpy as np\n", + "import autograd.numpy as np\n", + "import matplotlib.pyplot as plt\n", + "from autograd import grad\n", + "\n", + "# Note change from previous example\n", + "def CostOLS(y,X,theta):\n", + " return np.sum((y-X @ theta)**2)\n", + "\n", + "n = 100\n", + "x = 2*np.random.rand(n,1)\n", + "y = 4+3*x+np.random.randn(n,1)\n", + "\n", + "X = np.c_[np.ones((n,1)), x]\n", + "XT_X = X.T @ X\n", + "theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)\n", + "print(\"Own inversion\")\n", + "print(theta_linreg)\n", + "# Hessian matrix\n", + "H = (2.0/n)* XT_X\n", + "EigValues, EigVectors = np.linalg.eig(H)\n", + "print(f\"Eigenvalues of Hessian Matrix:{EigValues}\")\n", + "\n", + "theta = np.random.randn(2,1)\n", + "eta = 1.0/np.max(EigValues)\n", + "Niterations = 100\n", + "\n", + "# Note that we request the derivative wrt third argument (theta, 2 here)\n", + "training_gradient = grad(CostOLS,2)\n", + "\n", + "for iter in range(Niterations):\n", + " gradients = (1.0/n)*training_gradient(y, X, theta)\n", + " theta -= eta*gradients\n", + "print(\"theta from own gd\")\n", + "print(theta)\n", + "\n", + "\n", + "n_epochs = 50\n", + "M = 5 #size of each minibatch\n", + "m = int(n/M) #number of minibatches\n", + "t0, t1 = 5, 50\n", + "def learning_schedule(t):\n", + " return t0/(t+t1)\n", + "\n", + "theta = np.random.randn(2,1)\n", + "\n", + "change = 0.0\n", + "delta_momentum = 0.3\n", + "\n", + "for epoch in range(n_epochs):\n", + " for i in range(m):\n", + " random_index = M*np.random.randint(m)\n", + " xi = X[random_index:random_index+M]\n", + " yi = y[random_index:random_index+M]\n", + " gradients = (1.0/M)*training_gradient(yi, xi, theta)\n", + " eta = learning_schedule(epoch*m+i)\n", + " # calculate update\n", + " new_change = eta*gradients+delta_momentum*change\n", + " # take a step\n", + " theta -= new_change\n", + " # save the change\n", + " change = new_change\n", + "print(\"theta from own sdg with momentum\")\n", + "print(theta)" + ] + }, + { + "cell_type": "markdown", + "id": "8956bf7a", + "metadata": { + "editable": true + }, + "source": [ + "## But none of these can compete with Newton's method\n", + "\n", + "Note that we here have introduced automatic differentiation" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "id": "044275ef", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# Using Newton's method\n", + "from random import random, seed\n", + "import numpy as np\n", + "import autograd.numpy as np\n", + "from autograd import grad\n", + "\n", + "def CostOLS(theta):\n", + " return (1.0/n)*np.sum((y-X @ theta)**2)\n", + "\n", + "n = 100\n", + "x = 2*np.random.rand(n,1)\n", + "y = 4+3*x+5*x*x\n", + "\n", + "X = np.c_[np.ones((n,1)), x, x*x]\n", + "XT_X = X.T @ X\n", + "theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)\n", + "print(\"Own inversion\")\n", + "print(theta_linreg)\n", + "# Hessian matrix\n", + "H = (2.0/n)* XT_X\n", + "# Note that here the Hessian does not depend on the parameters theta\n", + "invH = np.linalg.pinv(H)\n", + "theta = np.random.randn(3,1)\n", + "Niterations = 5\n", + "# define the gradient\n", + "training_gradient = grad(CostOLS)\n", + "\n", + "for iter in range(Niterations):\n", + " gradients = training_gradient(theta)\n", + " theta -= invH @ gradients\n", + " print(iter,gradients[0],gradients[1])\n", + "print(\"theta from own Newton code\")\n", + "print(theta)" + ] + }, + { + "cell_type": "markdown", + "id": "353b50b3", + "metadata": { + "editable": true + }, + "source": [ + "## 
Similar (second order function now) problem but now with AdaGrad" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "id": "fdc8debd", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# Using Autograd to calculate gradients using AdaGrad and Stochastic Gradient descent\n", + "# OLS example\n", + "from random import random, seed\n", + "import numpy as np\n", + "import autograd.numpy as np\n", + "import matplotlib.pyplot as plt\n", + "from autograd import grad\n", + "\n", + "# Note change from previous example\n", + "def CostOLS(y,X,theta):\n", + " return np.sum((y-X @ theta)**2)\n", + "\n", + "n = 1000\n", + "x = np.random.rand(n,1)\n", + "y = 2.0+3*x +4*x*x\n", + "\n", + "X = np.c_[np.ones((n,1)), x, x*x]\n", + "XT_X = X.T @ X\n", + "theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)\n", + "print(\"Own inversion\")\n", + "print(theta_linreg)\n", + "\n", + "\n", + "# Note that we request the derivative wrt third argument (theta, 2 here)\n", + "training_gradient = grad(CostOLS,2)\n", + "# Define parameters for Stochastic Gradient Descent\n", + "n_epochs = 50\n", + "M = 5 #size of each minibatch\n", + "m = int(n/M) #number of minibatches\n", + "# Guess for unknown parameters theta\n", + "theta = np.random.randn(3,1)\n", + "\n", + "# Value for learning rate\n", + "eta = 0.01\n", + "# Including AdaGrad parameter to avoid possible division by zero\n", + "delta = 1e-8\n", + "for epoch in range(n_epochs):\n", + " Giter = 0.0\n", + " for i in range(m):\n", + " random_index = M*np.random.randint(m)\n", + " xi = X[random_index:random_index+M]\n", + " yi = y[random_index:random_index+M]\n", + " gradients = (1.0/M)*training_gradient(yi, xi, theta)\n", + " Giter += gradients*gradients\n", + " update = gradients*eta/(delta+np.sqrt(Giter))\n", + " theta -= update\n", + "print(\"theta from own AdaGrad\")\n", + "print(theta)" + ] + }, + { + "cell_type": "markdown", + "id": "b738f1b8", + "metadata": { + "editable": true + }, + "source": [ + "Running this code we note an almost perfect agreement with the results from matrix inversion." 
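+ "\n",
+ "One detail worth flagging (our observation, not a statement from the text): in the loop above, `Giter` is reset to zero at the start of every epoch, while the AdaGrad algorithm box accumulates the squared gradients over the entire run. A minimal sketch of the fully accumulating variant, reusing the variables defined in the code cell above, could look like this:\n",
+ "\n",
+ "```python\n",
+ "Giter = np.zeros_like(theta)  # accumulated squared gradients, kept across epochs\n",
+ "for epoch in range(n_epochs):\n",
+ "    for i in range(m):\n",
+ "        random_index = M*np.random.randint(m)\n",
+ "        xi = X[random_index:random_index+M]\n",
+ "        yi = y[random_index:random_index+M]\n",
+ "        gradients = (1.0/M)*training_gradient(yi, xi, theta)\n",
+ "        Giter += gradients*gradients                      # r_t = r_{t-1} + g_t o g_t\n",
+ "        theta -= eta*gradients/(delta + np.sqrt(Giter))\n",
+ "```"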
+ ] + }, + { + "cell_type": "markdown", + "id": "65ce93ba", + "metadata": { + "editable": true + }, + "source": [ + "## RMSprop for adaptive learning rate with Stochastic Gradient Descent" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "id": "604d7286", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# Using Autograd to calculate gradients using RMSprop and Stochastic Gradient descent\n", + "# OLS example\n", + "from random import random, seed\n", + "import numpy as np\n", + "import autograd.numpy as np\n", + "import matplotlib.pyplot as plt\n", + "from autograd import grad\n", + "\n", + "# Note change from previous example\n", + "def CostOLS(y,X,theta):\n", + " return np.sum((y-X @ theta)**2)\n", + "\n", + "n = 1000\n", + "x = np.random.rand(n,1)\n", + "y = 2.0+3*x +4*x*x# +np.random.randn(n,1)\n", + "\n", + "X = np.c_[np.ones((n,1)), x, x*x]\n", + "XT_X = X.T @ X\n", + "theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)\n", + "print(\"Own inversion\")\n", + "print(theta_linreg)\n", + "\n", + "\n", + "# Note that we request the derivative wrt third argument (theta, 2 here)\n", + "training_gradient = grad(CostOLS,2)\n", + "# Define parameters for Stochastic Gradient Descent\n", + "n_epochs = 50\n", + "M = 5 #size of each minibatch\n", + "m = int(n/M) #number of minibatches\n", + "# Guess for unknown parameters theta\n", + "theta = np.random.randn(3,1)\n", + "\n", + "# Value for learning rate\n", + "eta = 0.01\n", + "# Value for parameter rho\n", + "rho = 0.99\n", + "# Including AdaGrad parameter to avoid possible division by zero\n", + "delta = 1e-8\n", + "for epoch in range(n_epochs):\n", + " Giter = 0.0\n", + " for i in range(m):\n", + " random_index = M*np.random.randint(m)\n", + " xi = X[random_index:random_index+M]\n", + " yi = y[random_index:random_index+M]\n", + " gradients = (1.0/M)*training_gradient(yi, xi, theta)\n", + "\t# Accumulated gradient\n", + "\t# Scaling with rho the new and the previous results\n", + " Giter = (rho*Giter+(1-rho)*gradients*gradients)\n", + "\t# Taking the diagonal only and inverting\n", + " update = gradients*eta/(delta+np.sqrt(Giter))\n", + "\t# Hadamard product\n", + " theta -= update\n", + "print(\"theta from own RMSprop\")\n", + "print(theta)" + ] + }, + { + "cell_type": "markdown", + "id": "e663a714", + "metadata": { + "editable": true + }, + "source": [ + "## And finally [ADAM](https://arxiv.org/pdf/1412.6980.pdf)" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "id": "749fa687", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# Using Autograd to calculate gradients using RMSprop and Stochastic Gradient descent\n", + "# OLS example\n", + "from random import random, seed\n", + "import numpy as np\n", + "import autograd.numpy as np\n", + "import matplotlib.pyplot as plt\n", + "from autograd import grad\n", + "\n", + "# Note change from previous example\n", + "def CostOLS(y,X,theta):\n", + " return np.sum((y-X @ theta)**2)\n", + "\n", + "n = 1000\n", + "x = np.random.rand(n,1)\n", + "y = 2.0+3*x +4*x*x# +np.random.randn(n,1)\n", + "\n", + "X = np.c_[np.ones((n,1)), x, x*x]\n", + "XT_X = X.T @ X\n", + "theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)\n", + "print(\"Own inversion\")\n", + "print(theta_linreg)\n", + "\n", + "\n", + "# Note that we request the derivative wrt third argument (theta, 2 here)\n", + "training_gradient = grad(CostOLS,2)\n", + "# Define parameters for Stochastic Gradient Descent\n", + "n_epochs = 50\n", + "M 
= 5 #size of each minibatch\n", + "m = int(n/M) #number of minibatches\n", + "# Guess for unknown parameters theta\n", + "theta = np.random.randn(3,1)\n", + "\n", + "# Value for learning rate\n", + "eta = 0.01\n", + "# Value for parameters theta1 and theta2, see https://arxiv.org/abs/1412.6980\n", + "theta1 = 0.9\n", + "theta2 = 0.999\n", + "# Including AdaGrad parameter to avoid possible division by zero\n", + "delta = 1e-7\n", + "iter = 0\n", + "for epoch in range(n_epochs):\n", + " first_moment = 0.0\n", + " second_moment = 0.0\n", + " iter += 1\n", + " for i in range(m):\n", + " random_index = M*np.random.randint(m)\n", + " xi = X[random_index:random_index+M]\n", + " yi = y[random_index:random_index+M]\n", + " gradients = (1.0/M)*training_gradient(yi, xi, theta)\n", + " # Computing moments first\n", + " first_moment = theta1*first_moment + (1-theta1)*gradients\n", + " second_moment = theta2*second_moment+(1-theta2)*gradients*gradients\n", + " first_term = first_moment/(1.0-theta1**iter)\n", + " second_term = second_moment/(1.0-theta2**iter)\n", + "\t# Scaling with rho the new and the previous results\n", + " update = eta*first_term/(np.sqrt(second_term)+delta)\n", + " theta -= update\n", + "print(\"theta from own ADAM\")\n", + "print(theta)" + ] + }, + { + "cell_type": "markdown", + "id": "8801fcd5", + "metadata": { + "editable": true + }, + "source": [ + "## Material for the lab sessions\n", + "\n", + "1. Exercise set for week 37 and reminder on scaling (from lab sessions of week 35)\n", + "\n", + "2. Work on project 1\n", + "\n", + "\n", + "For more discussions of Ridge regression and calculation of averages, [Wessel van Wieringen's](https://arxiv.org/abs/1509.09169) article is highly recommended." + ] + }, + { + "cell_type": "markdown", + "id": "8ea68725", + "metadata": { + "editable": true + }, + "source": [ + "## Reminder on different scaling methods\n", + "\n", + "Before fitting a regression model, it is good practice to normalize or\n", + "standardize the features. This ensures all features are on a\n", + "comparable scale, which is especially important when using\n", + "regularization. In the exercises this week we will perform standardization, scaling each\n", + "feature to have mean 0 and standard deviation 1.\n", + "\n", + "Here we compute the mean and standard deviation of each column (feature) in our design/feature matrix $\\boldsymbol{X}$.\n", + "Then we subtract the mean and divide by the standard deviation for each feature.\n", + "\n", + "In the example here we\n", + "we will also center the target $\\boldsymbol{y}$ to mean $0$. Centering $\\boldsymbol{y}$\n", + "(and each feature) means the model does not require a separate intercept\n", + "term, the data is shifted such that the intercept is effectively 0\n", + ". (In practice, one could include an intercept in the model and not\n", + "penalize it, but here we simplify by centering.)\n", + "Choose $n=100$ data points and set up $\\boldsymbol{x}, $\\boldsymbol{y}$ and the design matrix $\\boldsymbol{X}$." 
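+ "\n",
+ "One possible setup (the polynomial and the noise level below are our own choices, just so that the next cell has data to work on) could be:\n",
+ "\n",
+ "```python\n",
+ "import numpy as np\n",
+ "\n",
+ "np.random.seed(2025)\n",
+ "n = 100\n",
+ "x = np.linspace(-1, 1, n).reshape(-1, 1)\n",
+ "y = 2.0 + 3*x + 4*x**2 + 0.1*np.random.randn(n, 1)\n",
+ "\n",
+ "# design matrix without an intercept column, since we center the data below\n",
+ "X = np.c_[x, x**2]\n",
+ "```"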
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 17,
+ "id": "04811786",
+ "metadata": {
+ "collapsed": false,
+ "editable": true
+ },
+ "outputs": [],
+ "source": [
+ "# Standardize features (zero mean, unit variance for each feature)\n",
+ "X_mean = X.mean(axis=0)\n",
+ "X_std = X.std(axis=0)\n",
+ "X_std[X_std == 0] = 1 # safeguard to avoid division by zero for constant features\n",
+ "X_norm = (X - X_mean) / X_std\n",
+ "\n",
+ "# Center the target to zero mean (optional, to simplify intercept handling)\n",
+ "y_mean = ?\n",
+ "y_centered = ?"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "b0e7cc2c",
+ "metadata": {
+ "editable": true
+ },
+ "source": [
+ "Do we need to center the values of $y$?\n",
+ "\n",
+ "After this preprocessing, each column of $\boldsymbol{X}_{\mathrm{norm}}$ has mean zero and standard deviation $1$,\n",
+ "and $\boldsymbol{y}_{\mathrm{centered}}$ has mean 0. This can make the optimization landscape\n",
+ "nicer and ensures that the regularization penalty $\lambda \sum_j\n",
+ "\theta_j^2$ in Ridge regression treats each coefficient fairly (since the features are on the\n",
+ "same scale)."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "f8a8132d",
+ "metadata": {
+ "editable": true
+ },
+ "source": [
+ "## Functionality in Scikit-Learn\n",
+ "\n",
+ "**Scikit-Learn** has several functions which allow us to rescale the\n",
+ "data, normally resulting in much better results in terms of various\n",
+ "accuracy scores. The **StandardScaler** function in **Scikit-Learn**\n",
+ "ensures that, for each feature/predictor we study, the mean value is\n",
+ "zero and the variance is one (every column in the design/feature\n",
+ "matrix). This scaling has the drawback that it does not ensure that\n",
+ "we have a particular maximum or minimum in our data set. Another\n",
+ "function included in **Scikit-Learn** is the **MinMaxScaler**, which\n",
+ "ensures that all features are exactly between $0$ and $1$."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "03eca41f",
+ "metadata": {
+ "editable": true
+ },
+ "source": [
+ "## More preprocessing\n",
+ "\n",
+ "The **Normalizer** scales each data\n",
+ "point such that the feature vector has a Euclidean length of one. In other words, it\n",
+ "projects a data point onto the circle (or sphere in the case of higher dimensions) with a\n",
+ "radius of 1. This means every data point is scaled by a different number (by the\n",
+ "inverse of its length).\n",
+ "This normalization is often used when only the direction (or angle) of the data matters,\n",
+ "not the length of the feature vector.\n",
+ "\n",
+ "The **RobustScaler** works similarly to the StandardScaler in that it\n",
+ "ensures statistical properties for each feature that guarantee that\n",
+ "they are on the same scale. However, the RobustScaler uses the median\n",
+ "and quartiles, instead of mean and variance. This makes the\n",
+ "RobustScaler ignore data points that are very different from the rest\n",
+ "(like measurement errors). These odd data points are also called\n",
+ "outliers, and might often lead to trouble for other scaling\n",
+ "techniques."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "710e8f88",
+ "metadata": {
+ "editable": true
+ },
+ "source": [
+ "## Frequently used scaling functions\n",
+ "\n",
+ "Many features are often scaled using standardization to improve performance. In **Scikit-Learn** this is given by the **StandardScaler** function as discussed above. It is, however, easy to write your own.
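As a quick illustration of the Scikit-Learn functionality discussed above, the following sketch (an addition, assuming the design matrix $\boldsymbol{X}$ from the earlier cells) applies **StandardScaler**, **MinMaxScaler** and **RobustScaler**, and checks that **StandardScaler** reproduces the manual standardization we are about to write ourselves.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler, MinMaxScaler, RobustScaler

# Assumes the design matrix X from the cells above is already defined
X_standardized = StandardScaler().fit_transform(X)   # zero mean, unit variance per column
X_minmax = MinMaxScaler().fit_transform(X)           # every column mapped to [0, 1]
X_robust = RobustScaler().fit_transform(X)           # median/IQR based, less sensitive to outliers

# StandardScaler should agree with the manual (X - mean) / std used above
X_std = X.std(axis=0)
X_std[X_std == 0] = 1                                # same safeguard for constant features
X_manual = (X - X.mean(axis=0)) / X_std
print(np.allclose(X_standardized, X_manual))         # True, up to floating-point precision
```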
\n", + "Mathematically, this involves subtracting the mean and divide by the standard deviation over the data set, for each feature:" + ] + }, + { + "cell_type": "markdown", + "id": "5d3df9bf", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "x_j^{(i)} \\rightarrow \\frac{x_j^{(i)} - \\overline{x}_j}{\\sigma(x_j)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "be0fd5f1", + "metadata": { + "editable": true + }, + "source": [ + "where $\\overline{x}_j$ and $\\sigma(x_j)$ are the mean and standard deviation, respectively, of the feature $x_j$.\n", + "This ensures that each feature has zero mean and unit standard deviation. For data sets where we do not have the standard deviation or don't wish to calculate it, it is then common to simply set it to one.\n", + "\n", + "Keep in mind that when you transform your data set before training a model, the same transformation needs to be done\n", + "on your eventual new data set before making a prediction. If we translate this into a Python code, it would could be implemented as" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "id": "2a0924bb", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "\"\"\"\n", + "#Model training, we compute the mean value of y and X\n", + "y_train_mean = np.mean(y_train)\n", + "X_train_mean = np.mean(X_train,axis=0)\n", + "X_train = X_train - X_train_mean\n", + "y_train = y_train - y_train_mean\n", + "\n", + "# The we fit our model with the training data\n", + "trained_model = some_model.fit(X_train,y_train)\n", + "\n", + "\n", + "#Model prediction, we need also to transform our data set used for the prediction.\n", + "X_test = X_test - X_train_mean #Use mean from training data\n", + "y_pred = trained_model(X_test)\n", + "y_pred = y_pred + y_train_mean\n", + "\"\"\"" + ] + }, + { + "cell_type": "markdown", + "id": "d116f448", + "metadata": { + "editable": true + }, + "source": [ + "Let us try to understand what this may imply mathematically when we\n", + "subtract the mean values, also known as *zero centering*. For\n", + "simplicity, we will focus on ordinary regression, as done in the above example.\n", + "\n", + "The cost/loss function for regression is" + ] + }, + { + "cell_type": "markdown", + "id": "41caea07", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C(\\theta_0, \\theta_1, ... , \\theta_{p-1}) = \\frac{1}{n}\\sum_{i=0}^{n} \\left(y_i - \\theta_0 - \\sum_{j=1}^{p-1} X_{ij}\\theta_j\\right)^2,.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "1fa96f7c", + "metadata": { + "editable": true + }, + "source": [ + "Recall also that we use the squared value. This expression can lead to an\n", + "increased penalty for higher differences between predicted and\n", + "output/target values.\n", + "\n", + "What we have done is to single out the $\\theta_0$ term in the\n", + "definition of the mean squared error (MSE). The design matrix $X$\n", + "does in this case not contain any intercept column. When we take the\n", + "derivative with respect to $\\theta_0$, we want the derivative to obey" + ] + }, + { + "cell_type": "markdown", + "id": "70038d6a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial \\theta_j} = 0,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "852a77d0", + "metadata": { + "editable": true + }, + "source": [ + "for all $j$. 
For $\\theta_0$ we have" + ] + }, + { + "cell_type": "markdown", + "id": "fc4afaaf", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial \\theta_0} = -\\frac{2}{n}\\sum_{i=0}^{n-1} \\left(y_i - \\theta_0 - \\sum_{j=1}^{p-1} X_{ij} \\theta_j\\right).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "94b18ced", + "metadata": { + "editable": true + }, + "source": [ + "Multiplying away the constant $2/n$, we obtain" + ] + }, + { + "cell_type": "markdown", + "id": "d7a95314", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\sum_{i=0}^{n-1} \\theta_0 = \\sum_{i=0}^{n-1}y_i - \\sum_{i=0}^{n-1} \\sum_{j=1}^{p-1} X_{ij} \\theta_j.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "eaf6a485", + "metadata": { + "editable": true + }, + "source": [ + "Let us specialize first to the case where we have only two parameters $\\theta_0$ and $\\theta_1$.\n", + "Our result for $\\theta_0$ simplifies then to" + ] + }, + { + "cell_type": "markdown", + "id": "3d9442a2", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "n\\theta_0 = \\sum_{i=0}^{n-1}y_i - \\sum_{i=0}^{n-1} X_{i1} \\theta_1.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "e4aeef17", + "metadata": { + "editable": true + }, + "source": [ + "We obtain then" + ] + }, + { + "cell_type": "markdown", + "id": "4ce9dee9", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\theta_0 = \\frac{1}{n}\\sum_{i=0}^{n-1}y_i - \\theta_1\\frac{1}{n}\\sum_{i=0}^{n-1} X_{i1}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "752ce099", + "metadata": { + "editable": true + }, + "source": [ + "If we define" + ] + }, + { + "cell_type": "markdown", + "id": "7cad5229", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mu_{\\boldsymbol{x}_1}=\\frac{1}{n}\\sum_{i=0}^{n-1} X_{i1},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "46f1aaf9", + "metadata": { + "editable": true + }, + "source": [ + "and the mean value of the outputs as" + ] + }, + { + "cell_type": "markdown", + "id": "7d25a9fb", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mu_y=\\frac{1}{n}\\sum_{i=0}^{n-1}y_i,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "57b4c7d9", + "metadata": { + "editable": true + }, + "source": [ + "we have" + ] + }, + { + "cell_type": "markdown", + "id": "fb833214", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\theta_0 = \\mu_y - \\theta_1\\mu_{\\boldsymbol{x}_1}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "5fa29cd3", + "metadata": { + "editable": true + }, + "source": [ + "In the general case with more parameters than $\\theta_0$ and $\\theta_1$, we have" + ] + }, + { + "cell_type": "markdown", + "id": "6c0e668d", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\theta_0 = \\frac{1}{n}\\sum_{i=0}^{n-1}y_i - \\frac{1}{n}\\sum_{i=0}^{n-1}\\sum_{j=1}^{p-1} X_{ij}\\theta_j.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "9d928664", + "metadata": { + "editable": true + }, + "source": [ + "We can rewrite the latter equation as" + ] + }, + { + "cell_type": "markdown", + "id": "65434b84", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\theta_0 = \\frac{1}{n}\\sum_{i=0}^{n-1}y_i - \\sum_{j=1}^{p-1} \\mu_{\\boldsymbol{x}_j}\\theta_j,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "127c9817", + "metadata": { + "editable": true + }, + "source": [ + "where we have defined" + ] + }, + { + 
"cell_type": "markdown", + "id": "46f45c10", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mu_{\\boldsymbol{x}_j}=\\frac{1}{n}\\sum_{i=0}^{n-1} X_{ij},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "4fbaa69a", + "metadata": { + "editable": true + }, + "source": [ + "the mean value for all elements of the column vector $\\boldsymbol{x}_j$.\n", + "\n", + "Replacing $y_i$ with $y_i - y_i - \\overline{\\boldsymbol{y}}$ and centering also our design matrix results in a cost function (in vector-matrix disguise)" + ] + }, + { + "cell_type": "markdown", + "id": "25f1abd4", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C(\\boldsymbol{\\theta}) = (\\boldsymbol{\\tilde{y}} - \\tilde{X}\\boldsymbol{\\theta})^T(\\boldsymbol{\\tilde{y}} - \\tilde{X}\\boldsymbol{\\theta}).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "9fd5ef9e", + "metadata": { + "editable": true + }, + "source": [ + "If we minimize with respect to $\\boldsymbol{\\theta}$ we have then" + ] + }, + { + "cell_type": "markdown", + "id": "f1cb8e35", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\hat{\\boldsymbol{\\theta}} = (\\tilde{X}^T\\tilde{X})^{-1}\\tilde{X}^T\\boldsymbol{\\tilde{y}},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "c0c5100a", + "metadata": { + "editable": true + }, + "source": [ + "where $\\boldsymbol{\\tilde{y}} = \\boldsymbol{y} - \\overline{\\boldsymbol{y}}$\n", + "and $\\tilde{X}_{ij} = X_{ij} - \\frac{1}{n}\\sum_{k=0}^{n-1}X_{kj}$.\n", + "\n", + "For Ridge regression we need to add $\\lambda \\boldsymbol{\\theta}^T\\boldsymbol{\\theta}$ to the cost function and get then" + ] + }, + { + "cell_type": "markdown", + "id": "c80e55cb", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\hat{\\boldsymbol{\\theta}} = (\\tilde{X}^T\\tilde{X} + \\lambda I)^{-1}\\tilde{X}^T\\boldsymbol{\\tilde{y}}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "a47f5c5e", + "metadata": { + "editable": true + }, + "source": [ + "What does this mean? And why do we insist on all this? Let us look at some examples.\n", + "\n", + "This code shows a simple first-order fit to a data set using the above transformed data, where we consider the role of the intercept first, by either excluding it or including it (*code example thanks to Øyvind Sigmundson Schøyen*). Here our scaling of the data is done by subtracting the mean values only.\n", + "Note also that we do not split the data into training and test." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 19, + "id": "e093186c", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import numpy as np\n", + "import matplotlib.pyplot as plt\n", + "\n", + "from sklearn.linear_model import LinearRegression\n", + "\n", + "\n", + "np.random.seed(2021)\n", + "\n", + "def MSE(y_data,y_model):\n", + " n = np.size(y_model)\n", + " return np.sum((y_data-y_model)**2)/n\n", + "\n", + "\n", + "def fit_theta(X, y):\n", + " return np.linalg.pinv(X.T @ X) @ X.T @ y\n", + "\n", + "\n", + "true_theta = [2, 0.5, 3.7]\n", + "\n", + "x = np.linspace(0, 1, 11)\n", + "y = np.sum(\n", + " np.asarray([x ** p * b for p, b in enumerate(true_theta)]), axis=0\n", + ") + 0.1 * np.random.normal(size=len(x))\n", + "\n", + "degree = 3\n", + "X = np.zeros((len(x), degree))\n", + "\n", + "# Include the intercept in the design matrix\n", + "for p in range(degree):\n", + " X[:, p] = x ** p\n", + "\n", + "theta = fit_theta(X, y)\n", + "\n", + "# Intercept is included in the design matrix\n", + "skl = LinearRegression(fit_intercept=False).fit(X, y)\n", + "\n", + "print(f\"True theta: {true_theta}\")\n", + "print(f\"Fitted theta: {theta}\")\n", + "print(f\"Sklearn fitted theta: {skl.coef_}\")\n", + "ypredictOwn = X @ theta\n", + "ypredictSKL = skl.predict(X)\n", + "print(f\"MSE with intercept column\")\n", + "print(MSE(y,ypredictOwn))\n", + "print(f\"MSE with intercept column from SKL\")\n", + "print(MSE(y,ypredictSKL))\n", + "\n", + "\n", + "plt.figure()\n", + "plt.scatter(x, y, label=\"Data\")\n", + "plt.plot(x, X @ theta, label=\"Fit\")\n", + "plt.plot(x, skl.predict(X), label=\"Sklearn (fit_intercept=False)\")\n", + "\n", + "\n", + "# Do not include the intercept in the design matrix\n", + "X = np.zeros((len(x), degree - 1))\n", + "\n", + "for p in range(degree - 1):\n", + " X[:, p] = x ** (p + 1)\n", + "\n", + "# Intercept is not included in the design matrix\n", + "skl = LinearRegression(fit_intercept=True).fit(X, y)\n", + "\n", + "# Use centered values for X and y when computing coefficients\n", + "y_offset = np.average(y, axis=0)\n", + "X_offset = np.average(X, axis=0)\n", + "\n", + "theta = fit_theta(X - X_offset, y - y_offset)\n", + "intercept = np.mean(y_offset - X_offset @ theta)\n", + "\n", + "print(f\"Manual intercept: {intercept}\")\n", + "print(f\"Fitted theta (without intercept): {theta}\")\n", + "print(f\"Sklearn intercept: {skl.intercept_}\")\n", + "print(f\"Sklearn fitted theta (without intercept): {skl.coef_}\")\n", + "ypredictOwn = X @ theta\n", + "ypredictSKL = skl.predict(X)\n", + "print(f\"MSE with Manual intercept\")\n", + "print(MSE(y,ypredictOwn+intercept))\n", + "print(f\"MSE with Sklearn intercept\")\n", + "print(MSE(y,ypredictSKL))\n", + "\n", + "plt.plot(x, X @ theta + intercept, \"--\", label=\"Fit (manual intercept)\")\n", + "plt.plot(x, skl.predict(X), \"--\", label=\"Sklearn (fit_intercept=True)\")\n", + "plt.grid()\n", + "plt.legend()\n", + "\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "de555fff", + "metadata": { + "editable": true + }, + "source": [ + "The intercept is the value of our output/target variable\n", + "when all our features are zero and our function crosses the $y$-axis (for a one-dimensional case). \n", + "\n", + "Printing the MSE, we see first that both methods give the same MSE, as\n", + "they should. 
However, when we move to for example Ridge regression,\n", + "the way we treat the intercept may give a larger or smaller MSE,\n", + "meaning that the MSE can be penalized by the value of the\n", + "intercept. Not including the intercept in the fit, means that the\n", + "regularization term does not include $\\theta_0$. For different values\n", + "of $\\lambda$, this may lead to different MSE values. \n", + "\n", + "To remind the reader, the regularization term, with the intercept in Ridge regression, is given by" + ] + }, + { + "cell_type": "markdown", + "id": "72178d39", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\lambda \\vert\\vert \\boldsymbol{\\theta} \\vert\\vert_2^2 = \\lambda \\sum_{j=0}^{p-1}\\theta_j^2,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "8e5d822b", + "metadata": { + "editable": true + }, + "source": [ + "but when we take out the intercept, this equation becomes" + ] + }, + { + "cell_type": "markdown", + "id": "e9218f82", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\lambda \\vert\\vert \\boldsymbol{\\theta} \\vert\\vert_2^2 = \\lambda \\sum_{j=1}^{p-1}\\theta_j^2.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "2223d1b1", + "metadata": { + "editable": true + }, + "source": [ + "For Lasso regression we have" + ] + }, + { + "cell_type": "markdown", + "id": "e5474a5b", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\lambda \\vert\\vert \\boldsymbol{\\theta} \\vert\\vert_1 = \\lambda \\sum_{j=1}^{p-1}\\vert\\theta_j\\vert.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "691295ed", + "metadata": { + "editable": true + }, + "source": [ + "It means that, when scaling the design matrix and the outputs/targets,\n", + "by subtracting the mean values, we have an optimization problem which\n", + "is not penalized by the intercept. The MSE value can then be smaller\n", + "since it focuses only on the remaining quantities. If we however bring\n", + "back the intercept, we will get a MSE which then contains the\n", + "intercept.\n", + "\n", + "Armed with this wisdom, we attempt first to simply set the intercept equal to **False** in our implementation of Ridge regression for our well-known vanilla data set." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 20, + "id": "e243cef5", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import numpy as np\n", + "import pandas as pd\n", + "import matplotlib.pyplot as plt\n", + "from sklearn.model_selection import train_test_split\n", + "from sklearn import linear_model\n", + "\n", + "def MSE(y_data,y_model):\n", + " n = np.size(y_model)\n", + " return np.sum((y_data-y_model)**2)/n\n", + "\n", + "\n", + "# A seed just to ensure that the random numbers are the same for every run.\n", + "# Useful for eventual debugging.\n", + "np.random.seed(3155)\n", + "\n", + "n = 100\n", + "x = np.random.rand(n)\n", + "y = np.exp(-x**2) + 1.5 * np.exp(-(x-2)**2)\n", + "\n", + "Maxpolydegree = 20\n", + "X = np.zeros((n,Maxpolydegree))\n", + "#We include explicitely the intercept column\n", + "for degree in range(Maxpolydegree):\n", + " X[:,degree] = x**degree\n", + "# We split the data in test and training data\n", + "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)\n", + "\n", + "p = Maxpolydegree\n", + "I = np.eye(p,p)\n", + "# Decide which values of lambda to use\n", + "nlambdas = 6\n", + "MSEOwnRidgePredict = np.zeros(nlambdas)\n", + "MSERidgePredict = np.zeros(nlambdas)\n", + "lambdas = np.logspace(-4, 2, nlambdas)\n", + "for i in range(nlambdas):\n", + " lmb = lambdas[i]\n", + " OwnRidgeTheta = np.linalg.pinv(X_train.T @ X_train+lmb*I) @ X_train.T @ y_train\n", + " # Note: we include the intercept column and no scaling\n", + " RegRidge = linear_model.Ridge(lmb,fit_intercept=False)\n", + " RegRidge.fit(X_train,y_train)\n", + " # and then make the prediction\n", + " ytildeOwnRidge = X_train @ OwnRidgeTheta\n", + " ypredictOwnRidge = X_test @ OwnRidgeTheta\n", + " ytildeRidge = RegRidge.predict(X_train)\n", + " ypredictRidge = RegRidge.predict(X_test)\n", + " MSEOwnRidgePredict[i] = MSE(y_test,ypredictOwnRidge)\n", + " MSERidgePredict[i] = MSE(y_test,ypredictRidge)\n", + " print(\"Theta values for own Ridge implementation\")\n", + " print(OwnRidgeTheta)\n", + " print(\"Theta values for Scikit-Learn Ridge implementation\")\n", + " print(RegRidge.coef_)\n", + " print(\"MSE values for own Ridge implementation\")\n", + " print(MSEOwnRidgePredict[i])\n", + " print(\"MSE values for Scikit-Learn Ridge implementation\")\n", + " print(MSERidgePredict[i])\n", + "\n", + "# Now plot the results\n", + "plt.figure()\n", + "plt.plot(np.log10(lambdas), MSEOwnRidgePredict, 'r', label = 'MSE own Ridge Test')\n", + "plt.plot(np.log10(lambdas), MSERidgePredict, 'g', label = 'MSE Ridge Test')\n", + "\n", + "plt.xlabel('log10(lambda)')\n", + "plt.ylabel('MSE')\n", + "plt.legend()\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "ef2eaa7a", + "metadata": { + "editable": true + }, + "source": [ + "The results here agree when we force **Scikit-Learn**'s Ridge function to include the first column in our design matrix.\n", + "We see that the results agree very well. Here we have thus explicitely included the intercept column in the design matrix.\n", + "What happens if we do not include the intercept in our fit?\n", + "Let us see how we can change this code by zero centering." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 21, + "id": "546e3504", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import numpy as np\n", + "import pandas as pd\n", + "import matplotlib.pyplot as plt\n", + "from sklearn.model_selection import train_test_split\n", + "from sklearn import linear_model\n", + "from sklearn.preprocessing import StandardScaler\n", + "\n", + "def MSE(y_data,y_model):\n", + " n = np.size(y_model)\n", + " return np.sum((y_data-y_model)**2)/n\n", + "# A seed just to ensure that the random numbers are the same for every run.\n", + "# Useful for eventual debugging.\n", + "np.random.seed(315)\n", + "\n", + "n = 100\n", + "x = np.random.rand(n)\n", + "y = np.exp(-x**2) + 1.5 * np.exp(-(x-2)**2)\n", + "\n", + "Maxpolydegree = 20\n", + "X = np.zeros((n,Maxpolydegree-1))\n", + "\n", + "for degree in range(1,Maxpolydegree): #No intercept column\n", + " X[:,degree-1] = x**(degree)\n", + "\n", + "# We split the data in test and training data\n", + "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)\n", + "\n", + "#For our own implementation, we will need to deal with the intercept by centering the design matrix and the target variable\n", + "X_train_mean = np.mean(X_train,axis=0)\n", + "#Center by removing mean from each feature\n", + "X_train_scaled = X_train - X_train_mean \n", + "X_test_scaled = X_test - X_train_mean\n", + "#The model intercept (called y_scaler) is given by the mean of the target variable (IF X is centered)\n", + "#Remove the intercept from the training data.\n", + "y_scaler = np.mean(y_train) \n", + "y_train_scaled = y_train - y_scaler \n", + "\n", + "p = Maxpolydegree-1\n", + "I = np.eye(p,p)\n", + "# Decide which values of lambda to use\n", + "nlambdas = 6\n", + "MSEOwnRidgePredict = np.zeros(nlambdas)\n", + "MSERidgePredict = np.zeros(nlambdas)\n", + "\n", + "lambdas = np.logspace(-4, 2, nlambdas)\n", + "for i in range(nlambdas):\n", + " lmb = lambdas[i]\n", + " OwnRidgeTheta = np.linalg.pinv(X_train_scaled.T @ X_train_scaled+lmb*I) @ X_train_scaled.T @ (y_train_scaled)\n", + " intercept_ = y_scaler - X_train_mean@OwnRidgeTheta #The intercept can be shifted so the model can predict on uncentered data\n", + " #Add intercept to prediction\n", + " ypredictOwnRidge = X_test_scaled @ OwnRidgeTheta + y_scaler \n", + " RegRidge = linear_model.Ridge(lmb)\n", + " RegRidge.fit(X_train,y_train)\n", + " ypredictRidge = RegRidge.predict(X_test)\n", + " MSEOwnRidgePredict[i] = MSE(y_test,ypredictOwnRidge)\n", + " MSERidgePredict[i] = MSE(y_test,ypredictRidge)\n", + " print(\"Theta values for own Ridge implementation\")\n", + " print(OwnRidgeTheta) #Intercept is given by mean of target variable\n", + " print(\"Theta values for Scikit-Learn Ridge implementation\")\n", + " print(RegRidge.coef_)\n", + " print('Intercept from own implementation:')\n", + " print(intercept_)\n", + " print('Intercept from Scikit-Learn Ridge implementation')\n", + " print(RegRidge.intercept_)\n", + " print(\"MSE values for own Ridge implementation\")\n", + " print(MSEOwnRidgePredict[i])\n", + " print(\"MSE values for Scikit-Learn Ridge implementation\")\n", + " print(MSERidgePredict[i])\n", + "\n", + "\n", + "# Now plot the results\n", + "plt.figure()\n", + "plt.plot(np.log10(lambdas), MSEOwnRidgePredict, 'b--', label = 'MSE own Ridge Test')\n", + "plt.plot(np.log10(lambdas), MSERidgePredict, 'g--', label = 'MSE SL Ridge Test')\n", + "plt.xlabel('log10(lambda)')\n", + "plt.ylabel('MSE')\n", + "plt.legend()\n", 
+ "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "f6787352", + "metadata": { + "editable": true + }, + "source": [ + "We see here, when compared to the code which includes explicitely the\n", + "intercept column, that our MSE value is actually smaller. This is\n", + "because the regularization term does not include the intercept value\n", + "$\\theta_0$ in the fitting. This applies to Lasso regularization as\n", + "well. It means that our optimization is now done only with the\n", + "centered matrix and/or vector that enter the fitting procedure." + ] + } + ], + "metadata": {}, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/a11y_pygments/a11y_dark/README.doctree b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/a11y_pygments/a11y_dark/README.doctree new file mode 100644 index 000000000..d34aaa2a0 Binary files /dev/null and b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/a11y_pygments/a11y_dark/README.doctree differ diff --git a/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/a11y_pygments/a11y_high_contrast_dark/README.doctree b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/a11y_pygments/a11y_high_contrast_dark/README.doctree new file mode 100644 index 000000000..4ab79f481 Binary files /dev/null and b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/a11y_pygments/a11y_high_contrast_dark/README.doctree differ diff --git a/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/a11y_pygments/a11y_high_contrast_light/README.doctree b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/a11y_pygments/a11y_high_contrast_light/README.doctree new file mode 100644 index 000000000..c6faf048b Binary files /dev/null and b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/a11y_pygments/a11y_high_contrast_light/README.doctree differ diff --git a/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/a11y_pygments/a11y_light/README.doctree b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/a11y_pygments/a11y_light/README.doctree new file mode 100644 index 000000000..80e654d7f Binary files /dev/null and b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/a11y_pygments/a11y_light/README.doctree differ diff --git a/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/a11y_pygments/blinds_dark/README.doctree b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/a11y_pygments/blinds_dark/README.doctree new file mode 100644 index 000000000..477788e73 Binary files /dev/null and b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/a11y_pygments/blinds_dark/README.doctree differ diff --git a/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/a11y_pygments/blinds_light/README.doctree b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/a11y_pygments/blinds_light/README.doctree new file mode 100644 index 000000000..21a71eca2 Binary files /dev/null and b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/a11y_pygments/blinds_light/README.doctree differ diff --git a/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/a11y_pygments/github_dark/README.doctree b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/a11y_pygments/github_dark/README.doctree new file mode 100644 index 
000000000..be2cbefec Binary files /dev/null and b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/a11y_pygments/github_dark/README.doctree differ diff --git a/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/a11y_pygments/github_dark_colorblind/README.doctree b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/a11y_pygments/github_dark_colorblind/README.doctree new file mode 100644 index 000000000..154b70b88 Binary files /dev/null and b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/a11y_pygments/github_dark_colorblind/README.doctree differ diff --git a/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/a11y_pygments/github_dark_high_contrast/README.doctree b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/a11y_pygments/github_dark_high_contrast/README.doctree new file mode 100644 index 000000000..956f542aa Binary files /dev/null and b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/a11y_pygments/github_dark_high_contrast/README.doctree differ diff --git a/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/a11y_pygments/github_light/README.doctree b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/a11y_pygments/github_light/README.doctree new file mode 100644 index 000000000..48c14d047 Binary files /dev/null and b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/a11y_pygments/github_light/README.doctree differ diff --git a/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/a11y_pygments/github_light_colorblind/README.doctree b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/a11y_pygments/github_light_colorblind/README.doctree new file mode 100644 index 000000000..a50c92fae Binary files /dev/null and b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/a11y_pygments/github_light_colorblind/README.doctree differ diff --git a/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/a11y_pygments/github_light_high_contrast/README.doctree b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/a11y_pygments/github_light_high_contrast/README.doctree new file mode 100644 index 000000000..be662a48b Binary files /dev/null and b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/a11y_pygments/github_light_high_contrast/README.doctree differ diff --git a/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/a11y_pygments/gotthard_dark/README.doctree b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/a11y_pygments/gotthard_dark/README.doctree new file mode 100644 index 000000000..709722fbc Binary files /dev/null and b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/a11y_pygments/gotthard_dark/README.doctree differ diff --git a/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/a11y_pygments/gotthard_light/README.doctree b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/a11y_pygments/gotthard_light/README.doctree new file mode 100644 index 000000000..01a2cdd03 Binary files /dev/null and b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/a11y_pygments/gotthard_light/README.doctree differ diff --git a/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/a11y_pygments/greative/README.doctree 
b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/a11y_pygments/greative/README.doctree new file mode 100644 index 000000000..c7d11aa68 Binary files /dev/null and b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/a11y_pygments/greative/README.doctree differ diff --git a/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/a11y_pygments/pitaya_smoothie/README.doctree b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/a11y_pygments/pitaya_smoothie/README.doctree new file mode 100644 index 000000000..8c79795a1 Binary files /dev/null and b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/a11y_pygments/pitaya_smoothie/README.doctree differ diff --git a/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/alabaster-0.7.16.dist-info/LICENSE.doctree b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/alabaster-0.7.16.dist-info/LICENSE.doctree new file mode 100644 index 000000000..45e7ab3c7 Binary files /dev/null and b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/alabaster-0.7.16.dist-info/LICENSE.doctree differ diff --git a/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/debugpy/_vendored/pydevd/pydevd_plugins/extensions/README.doctree b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/debugpy/_vendored/pydevd/pydevd_plugins/extensions/README.doctree new file mode 100644 index 000000000..c56b43479 Binary files /dev/null and b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/debugpy/_vendored/pydevd/pydevd_plugins/extensions/README.doctree differ diff --git a/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/idna-3.10.dist-info/LICENSE.doctree b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/idna-3.10.dist-info/LICENSE.doctree new file mode 100644 index 000000000..cf4965621 Binary files /dev/null and b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/idna-3.10.dist-info/LICENSE.doctree differ diff --git a/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/imagesize-1.4.1.dist-info/LICENSE.doctree b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/imagesize-1.4.1.dist-info/LICENSE.doctree new file mode 100644 index 000000000..c59e69a76 Binary files /dev/null and b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/imagesize-1.4.1.dist-info/LICENSE.doctree differ diff --git a/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/ipython-9.5.0.dist-info/licenses/COPYING.doctree b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/ipython-9.5.0.dist-info/licenses/COPYING.doctree new file mode 100644 index 000000000..cecfe527d Binary files /dev/null and b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/ipython-9.5.0.dist-info/licenses/COPYING.doctree differ diff --git a/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/jupyter_book/book_template/intro.doctree b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/jupyter_book/book_template/intro.doctree new file mode 100644 index 000000000..2490a33e5 Binary files /dev/null and b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/jupyter_book/book_template/intro.doctree differ diff --git a/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/jupyter_book/book_template/markdown-notebooks.doctree 
b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/jupyter_book/book_template/markdown-notebooks.doctree new file mode 100644 index 000000000..1241e9190 Binary files /dev/null and b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/jupyter_book/book_template/markdown-notebooks.doctree differ diff --git a/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/jupyter_book/book_template/markdown.doctree b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/jupyter_book/book_template/markdown.doctree new file mode 100644 index 000000000..dff26e46b Binary files /dev/null and b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/jupyter_book/book_template/markdown.doctree differ diff --git a/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/jupyter_book/book_template/notebooks.doctree b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/jupyter_book/book_template/notebooks.doctree new file mode 100644 index 000000000..9a11b76dd Binary files /dev/null and b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/jupyter_book/book_template/notebooks.doctree differ diff --git a/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/latexcodec-3.0.1.dist-info/licenses/AUTHORS.doctree b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/latexcodec-3.0.1.dist-info/licenses/AUTHORS.doctree new file mode 100644 index 000000000..28c738f27 Binary files /dev/null and b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/latexcodec-3.0.1.dist-info/licenses/AUTHORS.doctree differ diff --git a/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/latexcodec-3.0.1.dist-info/licenses/LICENSE.doctree b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/latexcodec-3.0.1.dist-info/licenses/LICENSE.doctree new file mode 100644 index 000000000..e12b8a69e Binary files /dev/null and b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/latexcodec-3.0.1.dist-info/licenses/LICENSE.doctree differ diff --git a/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/mdit_py_plugins/container/README.doctree b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/mdit_py_plugins/container/README.doctree new file mode 100644 index 000000000..2c161d7bb Binary files /dev/null and b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/mdit_py_plugins/container/README.doctree differ diff --git a/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/mdit_py_plugins/deflist/README.doctree b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/mdit_py_plugins/deflist/README.doctree new file mode 100644 index 000000000..b8db93782 Binary files /dev/null and b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/mdit_py_plugins/deflist/README.doctree differ diff --git a/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/mdit_py_plugins/texmath/README.doctree b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/mdit_py_plugins/texmath/README.doctree new file mode 100644 index 000000000..30aa1080d Binary files /dev/null and b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/mdit_py_plugins/texmath/README.doctree differ diff --git a/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/numpy/ma/README.doctree 
b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/numpy/ma/README.doctree new file mode 100644 index 000000000..237a4fd91 Binary files /dev/null and b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/numpy/ma/README.doctree differ diff --git a/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/numpy/random/LICENSE.doctree b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/numpy/random/LICENSE.doctree new file mode 100644 index 000000000..f87551977 Binary files /dev/null and b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/numpy/random/LICENSE.doctree differ diff --git a/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/pip-25.2.dist-info/licenses/src/pip/_vendor/idna/LICENSE.doctree b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/pip-25.2.dist-info/licenses/src/pip/_vendor/idna/LICENSE.doctree new file mode 100644 index 000000000..0cbde70f4 Binary files /dev/null and b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/pip-25.2.dist-info/licenses/src/pip/_vendor/idna/LICENSE.doctree differ diff --git a/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/prompt_toolkit-3.0.52.dist-info/licenses/AUTHORS.doctree b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/prompt_toolkit-3.0.52.dist-info/licenses/AUTHORS.doctree new file mode 100644 index 000000000..e3dc468e7 Binary files /dev/null and b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/prompt_toolkit-3.0.52.dist-info/licenses/AUTHORS.doctree differ diff --git a/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/pybtex_docutils-1.0.3.dist-info/LICENSE.doctree b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/pybtex_docutils-1.0.3.dist-info/LICENSE.doctree new file mode 100644 index 000000000..37d431da9 Binary files /dev/null and b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/pybtex_docutils-1.0.3.dist-info/LICENSE.doctree differ diff --git a/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/pyzmq-27.0.2.dist-info/licenses/LICENSE.doctree b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/pyzmq-27.0.2.dist-info/licenses/LICENSE.doctree new file mode 100644 index 000000000..33dd0fdfc Binary files /dev/null and b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/pyzmq-27.0.2.dist-info/licenses/LICENSE.doctree differ diff --git a/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/soupsieve-2.8.dist-info/licenses/LICENSE.doctree b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/soupsieve-2.8.dist-info/licenses/LICENSE.doctree new file mode 100644 index 000000000..3b98e4a91 Binary files /dev/null and b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/soupsieve-2.8.dist-info/licenses/LICENSE.doctree differ diff --git a/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/sphinx-7.4.7.dist-info/LICENSE.doctree b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/sphinx-7.4.7.dist-info/LICENSE.doctree new file mode 100644 index 000000000..6240076ea Binary files /dev/null and b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/sphinx-7.4.7.dist-info/LICENSE.doctree differ diff --git a/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/sphinx_book_theme/assets/translations/README.doctree 
b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/sphinx_book_theme/assets/translations/README.doctree new file mode 100644 index 000000000..b686f1bf6 Binary files /dev/null and b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/sphinx_book_theme/assets/translations/README.doctree differ diff --git a/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/sphinxcontrib_bibtex-2.6.5.dist-info/licenses/LICENSE.doctree b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/sphinxcontrib_bibtex-2.6.5.dist-info/licenses/LICENSE.doctree new file mode 100644 index 000000000..6ae9ca3f7 Binary files /dev/null and b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/sphinxcontrib_bibtex-2.6.5.dist-info/licenses/LICENSE.doctree differ diff --git a/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/zmq/backend/cffi/README.doctree b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/zmq/backend/cffi/README.doctree new file mode 100644 index 000000000..8e0dc1337 Binary files /dev/null and b/doc/LectureNotes/_build/.doctrees/.venv/lib/python3.13/site-packages/zmq/backend/cffi/README.doctree differ diff --git a/doc/LectureNotes/_build/.doctrees/environment.pickle b/doc/LectureNotes/_build/.doctrees/environment.pickle index 98ef94a83..f9bc94dbb 100644 Binary files a/doc/LectureNotes/_build/.doctrees/environment.pickle and b/doc/LectureNotes/_build/.doctrees/environment.pickle differ diff --git a/doc/LectureNotes/_build/.doctrees/exercisesweek35.doctree b/doc/LectureNotes/_build/.doctrees/exercisesweek35.doctree index e4d37262e..3e8c9fa18 100644 Binary files a/doc/LectureNotes/_build/.doctrees/exercisesweek35.doctree and b/doc/LectureNotes/_build/.doctrees/exercisesweek35.doctree differ diff --git a/doc/LectureNotes/_build/.doctrees/exercisesweek36.doctree b/doc/LectureNotes/_build/.doctrees/exercisesweek36.doctree index f8892c80f..9aab23cea 100644 Binary files a/doc/LectureNotes/_build/.doctrees/exercisesweek36.doctree and b/doc/LectureNotes/_build/.doctrees/exercisesweek36.doctree differ diff --git a/doc/LectureNotes/_build/.doctrees/exercisesweek37.doctree b/doc/LectureNotes/_build/.doctrees/exercisesweek37.doctree index bd5448904..2824a9bf4 100644 Binary files a/doc/LectureNotes/_build/.doctrees/exercisesweek37.doctree and b/doc/LectureNotes/_build/.doctrees/exercisesweek37.doctree differ diff --git a/doc/LectureNotes/_build/.doctrees/exercisesweek38.doctree b/doc/LectureNotes/_build/.doctrees/exercisesweek38.doctree new file mode 100644 index 000000000..9961e3be8 Binary files /dev/null and b/doc/LectureNotes/_build/.doctrees/exercisesweek38.doctree differ diff --git a/doc/LectureNotes/_build/.doctrees/exercisesweek39.doctree b/doc/LectureNotes/_build/.doctrees/exercisesweek39.doctree new file mode 100644 index 000000000..cc6fae4d9 Binary files /dev/null and b/doc/LectureNotes/_build/.doctrees/exercisesweek39.doctree differ diff --git a/doc/LectureNotes/_build/.doctrees/exercisesweek41.doctree b/doc/LectureNotes/_build/.doctrees/exercisesweek41.doctree new file mode 100644 index 000000000..abdaa4371 Binary files /dev/null and b/doc/LectureNotes/_build/.doctrees/exercisesweek41.doctree differ diff --git a/doc/LectureNotes/_build/.doctrees/exercisesweek42.doctree b/doc/LectureNotes/_build/.doctrees/exercisesweek42.doctree new file mode 100644 index 000000000..5d34d9619 Binary files /dev/null and b/doc/LectureNotes/_build/.doctrees/exercisesweek42.doctree differ diff --git 
a/doc/LectureNotes/_build/.doctrees/exercisesweek43.doctree b/doc/LectureNotes/_build/.doctrees/exercisesweek43.doctree new file mode 100644 index 000000000..793ebcfad Binary files /dev/null and b/doc/LectureNotes/_build/.doctrees/exercisesweek43.doctree differ diff --git a/doc/LectureNotes/_build/.doctrees/exercisesweek44.doctree b/doc/LectureNotes/_build/.doctrees/exercisesweek44.doctree new file mode 100644 index 000000000..41ce5a839 Binary files /dev/null and b/doc/LectureNotes/_build/.doctrees/exercisesweek44.doctree differ diff --git a/doc/LectureNotes/_build/.doctrees/intro.doctree b/doc/LectureNotes/_build/.doctrees/intro.doctree index 79d28ce2c..f7f4a762d 100644 Binary files a/doc/LectureNotes/_build/.doctrees/intro.doctree and b/doc/LectureNotes/_build/.doctrees/intro.doctree differ diff --git a/doc/LectureNotes/_build/.doctrees/project1.doctree b/doc/LectureNotes/_build/.doctrees/project1.doctree index a9a908212..03d34307b 100644 Binary files a/doc/LectureNotes/_build/.doctrees/project1.doctree and b/doc/LectureNotes/_build/.doctrees/project1.doctree differ diff --git a/doc/LectureNotes/_build/.doctrees/project2.doctree b/doc/LectureNotes/_build/.doctrees/project2.doctree new file mode 100644 index 000000000..23faf0efe Binary files /dev/null and b/doc/LectureNotes/_build/.doctrees/project2.doctree differ diff --git a/doc/LectureNotes/_build/.doctrees/week37.doctree b/doc/LectureNotes/_build/.doctrees/week37.doctree new file mode 100644 index 000000000..d6aa660bb Binary files /dev/null and b/doc/LectureNotes/_build/.doctrees/week37.doctree differ diff --git a/doc/LectureNotes/_build/.doctrees/week38.doctree b/doc/LectureNotes/_build/.doctrees/week38.doctree new file mode 100644 index 000000000..7cd935a7f Binary files /dev/null and b/doc/LectureNotes/_build/.doctrees/week38.doctree differ diff --git a/doc/LectureNotes/_build/.doctrees/week39.doctree b/doc/LectureNotes/_build/.doctrees/week39.doctree new file mode 100644 index 000000000..d09449378 Binary files /dev/null and b/doc/LectureNotes/_build/.doctrees/week39.doctree differ diff --git a/doc/LectureNotes/_build/.doctrees/week40.doctree b/doc/LectureNotes/_build/.doctrees/week40.doctree new file mode 100644 index 000000000..da8a5efe0 Binary files /dev/null and b/doc/LectureNotes/_build/.doctrees/week40.doctree differ diff --git a/doc/LectureNotes/_build/.doctrees/week41.doctree b/doc/LectureNotes/_build/.doctrees/week41.doctree new file mode 100644 index 000000000..3b83ec1c9 Binary files /dev/null and b/doc/LectureNotes/_build/.doctrees/week41.doctree differ diff --git a/doc/LectureNotes/_build/.doctrees/week42.doctree b/doc/LectureNotes/_build/.doctrees/week42.doctree new file mode 100644 index 000000000..a93db2b6f Binary files /dev/null and b/doc/LectureNotes/_build/.doctrees/week42.doctree differ diff --git a/doc/LectureNotes/_build/.doctrees/week43.doctree b/doc/LectureNotes/_build/.doctrees/week43.doctree new file mode 100644 index 000000000..66ccdcbeb Binary files /dev/null and b/doc/LectureNotes/_build/.doctrees/week43.doctree differ diff --git a/doc/LectureNotes/_build/.doctrees/week44.doctree b/doc/LectureNotes/_build/.doctrees/week44.doctree new file mode 100644 index 000000000..cc92706c4 Binary files /dev/null and b/doc/LectureNotes/_build/.doctrees/week44.doctree differ diff --git a/doc/LectureNotes/_build/.doctrees/week45.doctree b/doc/LectureNotes/_build/.doctrees/week45.doctree new file mode 100644 index 000000000..b114b1f95 Binary files /dev/null and b/doc/LectureNotes/_build/.doctrees/week45.doctree differ 
diff --git a/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/a11y_pygments/a11y_dark/README.html b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/a11y_pygments/a11y_dark/README.html new file mode 100644 index 000000000..62d549a2d --- /dev/null +++ b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/a11y_pygments/a11y_dark/README.html @@ -0,0 +1,587 @@ + + + + + + + + + + + A11Y Dark — Applied Data Analysis and Machine Learning + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + + + + + + + +
+
+
+
+
+ +
+ +
+ + + + + +
+
+ + + + + +
+ + + + + + + + + + + + + +
+ +
+ + + +
+ +
+
+ +
+
+ +
+ +
+ +
+ + +
+ +
+ +
+ + + + + + + + + + + + + + + + + + + +
+ +
+ +
+
+ + + +
+

A11Y Dark

+ +
+
+ +
+

Contents

+
+ +
+
+
+ + + + +
+ +
+

A11Y Dark#

+

This is the Pygments implementation of a11y-dark from Eric Bailey’s +accessible themes for syntax +highlighting

+

Screenshot of the a11y-dark theme in a bash script

+
+

Colors#

+

Background color: #2b2b2b #2b2b2b

+

Highlight color: #ffd9002e #ffd9002e

+

WCAG compliance

+
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +

Color

Hex

Ratio

Normal text

Large text

#d4d0ab

#d4d0ab

9.0 : 1

AAA

AAA

#ffa07a

#ffa07a

7.1 : 1

AAA

AAA

#f5ab35

#f5ab35

7.3 : 1

AAA

AAA

#ffd700

#ffd700

10.1 : 1

AAA

AAA

#abe338

#abe338

9.3 : 1

AAA

AAA

#00e0e0

#00e0e0

8.6 : 1

AAA

AAA

#dcc6e0

#dcc6e0

8.9 : 1

AAA

AAA

#f8f8f2

#f8f8f2

13.3 : 1

AAA

AAA

+
+
+
+ + + + +
+ + + + + + +
+ +
+
+
+ +
+ + + +
+ + +
+
+ + +
+ + +
+
+
+ + + + + +
+
+ + \ No newline at end of file diff --git a/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/a11y_pygments/a11y_high_contrast_dark/README.html b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/a11y_pygments/a11y_high_contrast_dark/README.html new file mode 100644 index 000000000..5b04aa896 --- /dev/null +++ b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/a11y_pygments/a11y_high_contrast_dark/README.html @@ -0,0 +1,573 @@ + + + + + + + + + + + A11Y High Contrast Dark — Applied Data Analysis and Machine Learning + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + + + + + + + +
+
+
+
+
+ +
+ +
+ + + + + +
+
+ + + + + +
+ + + + + + + + + + + + + +
+ +
+ + + +
+ +
+
+ +
+
+ +
+ +
+ +
+ + +
+ +
+ +
+ + + + + + + + + + + + + + + + + + + +
+ +
+ +
+
+ + + +
+

A11Y High Contrast Dark

+ +
+
+ +
+

Contents

+
+ +
+
+
+ + + + +
+ +
+

A11Y High Contrast Dark#

+

This style mimics the a11 light theme from eric bailey’s accessible themes.

+

Screenshot of the a11y-high-contrast-dark theme in a bash script

+
+

Colors#

+

Background color: #2b2b2b #2b2b2b

+

Highlight color: #ffd9002e #ffd9002e

+

WCAG compliance

+
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +

Color

Hex

Ratio

Normal text

Large text

#ffd900

#ffd900

10.2 : 1

AAA

AAA

#ffa07a

#ffa07a

7.1 : 1

AAA

AAA

#abe338

#abe338

9.3 : 1

AAA

AAA

#00e0e0

#00e0e0

8.6 : 1

AAA

AAA

#dcc6e0

#dcc6e0

8.9 : 1

AAA

AAA

#f8f8f2

#f8f8f2

13.3 : 1

AAA

AAA

+
+
+
+ + + + +
+ + + + + + +
+ +
+
+
+ +
+ + + +
+ + +
+
+ + +
+ + +
+
+
+ + + + + +
+
+ + \ No newline at end of file diff --git a/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/a11y_pygments/a11y_high_contrast_light/README.html b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/a11y_pygments/a11y_high_contrast_light/README.html new file mode 100644 index 000000000..d6b6b1c7b --- /dev/null +++ b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/a11y_pygments/a11y_high_contrast_light/README.html @@ -0,0 +1,585 @@ + + + + + + + + + + + A11Y High Contrast Light — Applied Data Analysis and Machine Learning + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + + + + + + + +
+
+
+
+
+ +
+ +
+ + + + + +
+
+ + + + + +
+ + + + + + + + + + + + + +
+ +
+ + + +
+ +
+
+ +
+
+ +
+ +
+ +
+ + +
+ +
+ +
+ + + + + + + + + + + + + + + + + + + +
+ +
+ +
+
+ + + +
+

A11Y High Contrast Light

+ +
+
+ +
+

Contents

+
+ +
+
+
+ + + + +
+ +
+

A11Y High Contrast Light#

+

This style mimics the a11y-light theme (but with more contrast) from eric bailey’s accessible themes.

+

Screenshot of the a11y-high-contrast-light theme in a bash script

+
+

Colors#

+

Background color: #fefefe #fefefe

+

Highlight color: #fae4c2 #fae4c2

+

WCAG compliance

+
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +

Color

Hex

Ratio

Normal text

Large text

#515151

#515151

7.9 : 1

AAA

AAA

#a12236

#a12236

7.4 : 1

AAA

AAA

#7f4707

#7f4707

7.4 : 1

AAA

AAA

#912583

#912583

7.4 : 1

AAA

AAA

#00622f

#00622f

7.5 : 1

AAA

AAA

#005b82

#005b82

7.4 : 1

AAA

AAA

#6730c5

#6730c5

7.4 : 1

AAA

AAA

#080808

#080808

19.9 : 1

AAA

AAA

+
+
+
+ + + + +
+ + + + + + +
+ +
+
+
+ +
+ + + +
+ + +
+
+ + +
+ + +
+
+
+ + + + + +
+
\ No newline at end of file
diff --git a/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/a11y_pygments/a11y_light/README.html b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/a11y_pygments/a11y_light/README.html
new file mode 100644
index 000000000..ff367bdb9
--- /dev/null
+++ b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/a11y_pygments/a11y_light/README.html
@@ -0,0 +1,579 @@
A11Y Light — Applied Data Analysis and Machine Learning

This style is inspired by the a11y-light theme from eric bailey’s accessible themes.
[Screenshot of the a11y-light theme in a bash script]

Colors
Background color: #f2f2f2
Highlight color: #fdf2e2

WCAG compliance
Color      Ratio       Normal text  Large text
#515151    7.1 : 1     AAA          AAA
#d71835    4.6 : 1     AA           AAA
#7f4707    6.7 : 1     AA           AAA
#116633    6.3 : 1     AA           AAA
#00749c    4.7 : 1     AA           AAA
#8045e5    4.8 : 1     AA           AAA
#1e1e1e    14.9 : 1    AAA          AAA
\ No newline at end of file
diff --git a/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/a11y_pygments/blinds_dark/README.html b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/a11y_pygments/blinds_dark/README.html
new file mode 100644
index 000000000..daad1ecd8
--- /dev/null
+++ b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/a11y_pygments/blinds_dark/README.html
@@ -0,0 +1,579 @@
Blinds Dark — Applied Data Analysis and Machine Learning

This style mimics the blinds dark theme from vscode themes.
[Screenshot of the blinds-dark theme in a bash script]

Colors
Background color: #242424
Highlight color: #66666691

WCAG compliance
Color      Ratio       Normal text  Large text
#8c8c8c    4.6 : 1     AA           AAA
#ee6677    5.0 : 1     AA           AAA
#ccbb44    8.0 : 1     AAA          AAA
#66ccee    8.5 : 1     AAA          AAA
#5391cf    4.7 : 1     AA           AAA
#d166a3    4.5 : 1     AA           AAA
#bbbbbb    8.1 : 1     AAA          AAA
\ No newline at end of file
diff --git a/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/a11y_pygments/blinds_light/README.html b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/a11y_pygments/blinds_light/README.html
new file mode 100644
index 000000000..f36e2e54d
--- /dev/null
+++ b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/a11y_pygments/blinds_light/README.html
@@ -0,0 +1,579 @@
Blinds Light — Applied Data Analysis and Machine Learning

This style mimics the blinds light theme from vscode themes.
[Screenshot of the blinds-light theme in a bash script]

Colors
Background color: #fcfcfc
Highlight color: #add6ff

WCAG compliance
Color      Ratio       Normal text  Large text
#737373    4.6 : 1     AA           AAA
#bf5400    4.6 : 1     AA           AAA
#996b00    4.6 : 1     AA           AAA
#008561    4.5 : 1     AA           AAA
#0072b2    5.1 : 1     AA           AAA
#cc398b    4.5 : 1     AA           AAA
#000000    20.5 : 1    AAA          AAA
\ No newline at end of file
diff --git a/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/a11y_pygments/github_dark/README.html b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/a11y_pygments/github_dark/README.html
new file mode 100644
index 000000000..2d6c83ac7
--- /dev/null
+++ b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/a11y_pygments/github_dark/README.html
@@ -0,0 +1,579 @@
Github Dark — Applied Data Analysis and Machine Learning

This style mimics the github dark default theme from vs code themes.
[Screenshot of the github-dark theme in a bash script]

Colors
Background color: #0d1117
Highlight color: #6e7681

WCAG compliance
Color      Ratio       Normal text  Large text
#8b949e    6.2 : 1     AA           AAA
#ff7b72    7.5 : 1     AAA          AAA
#ffa657    9.8 : 1     AAA          AAA
#7ee787    12.3 : 1    AAA          AAA
#79c0ff    9.7 : 1     AAA          AAA
#d2a8ff    9.7 : 1     AAA          AAA
#c9d1d9    12.3 : 1    AAA          AAA
\ No newline at end of file
diff --git a/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/a11y_pygments/github_dark_colorblind/README.html b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/a11y_pygments/github_dark_colorblind/README.html
new file mode 100644
index 000000000..b913171f5
--- /dev/null
+++ b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/a11y_pygments/github_dark_colorblind/README.html
@@ -0,0 +1,579 @@
Github Dark Colorblind — Applied Data Analysis and Machine Learning

This style mimics the github dark colorblind theme from vscode.
[Screenshot of the github-dark-colorblind theme in a bash script]

Colors
Background color: #0d1117
Highlight color: #58a6ff70

WCAG compliance
Color      Ratio       Normal text  Large text
#b1bac4    9.6 : 1     AAA          AAA
#ec8e2c    7.6 : 1     AAA          AAA
#fdac54    10.1 : 1    AAA          AAA
#a5d6ff    12.3 : 1    AAA          AAA
#79c0ff    9.7 : 1     AAA          AAA
#d2a8ff    9.7 : 1     AAA          AAA
#c9d1d9    12.3 : 1    AAA          AAA
\ No newline at end of file
diff --git a/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/a11y_pygments/github_dark_high_contrast/README.html b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/a11y_pygments/github_dark_high_contrast/README.html
new file mode 100644
index 000000000..982873a19
--- /dev/null
+++ b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/a11y_pygments/github_dark_high_contrast/README.html
@@ -0,0 +1,579 @@
Github Dark High Contrast — Applied Data Analysis and Machine Learning

This style mimics the github dark high contrast theme from vs code themes.
[Screenshot of the github-dark-high-contrast theme in a bash script]

Colors
Background color: #0d1117
Highlight color: #58a6ff70

WCAG compliance
Color      Ratio       Normal text  Large text
#d9dee3    14.0 : 1    AAA          AAA
#ff9492    8.9 : 1     AAA          AAA
#ffb757    11.0 : 1    AAA          AAA
#72f088    13.1 : 1    AAA          AAA
#91cbff    11.0 : 1    AAA          AAA
#dbb7ff    11.0 : 1    AAA          AAA
#c9d1d9    12.3 : 1    AAA          AAA
\ No newline at end of file
diff --git a/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/a11y_pygments/github_light/README.html b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/a11y_pygments/github_light/README.html
new file mode 100644
index 000000000..1200efee2
--- /dev/null
+++ b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/a11y_pygments/github_light/README.html
@@ -0,0 +1,579 @@
Github Light — Applied Data Analysis and Machine Learning

This style mimics the github light theme from vscode themes.
[Screenshot of the github-light theme in a bash script]

Colors
Background color: #ffffff
Highlight color: #0969da4a

WCAG compliance
Color      Ratio       Normal text  Large text
#6e7781    4.5 : 1     AA           AAA
#cf222e    5.4 : 1     AA           AAA
#953800    7.4 : 1     AAA          AAA
#116329    7.4 : 1     AAA          AAA
#0550ae    7.6 : 1     AAA          AAA
#8250df    5.0 : 1     AA           AAA
#24292f    14.7 : 1    AAA          AAA
\ No newline at end of file
diff --git a/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/a11y_pygments/github_light_colorblind/README.html b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/a11y_pygments/github_light_colorblind/README.html
new file mode 100644
index 000000000..4d5c0b528
--- /dev/null
+++ b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/a11y_pygments/github_light_colorblind/README.html
@@ -0,0 +1,573 @@
Github Light Colorblind — Applied Data Analysis and Machine Learning

This style mimics the github light colorblind theme from vscode themes.
[Screenshot of the github-light-colorblind theme in a bash script]

Colors
Background color: #ffffff
Highlight color: #0969da4a

WCAG compliance
Color      Ratio       Normal text  Large text
#6e7781    4.5 : 1     AA           AAA
#b35900    4.8 : 1     AA           AAA
#8a4600    7.1 : 1     AAA          AAA
#0550ae    7.6 : 1     AAA          AAA
#8250df    5.0 : 1     AA           AAA
#24292f    14.7 : 1    AAA          AAA
\ No newline at end of file
diff --git a/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/a11y_pygments/github_light_high_contrast/README.html b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/a11y_pygments/github_light_high_contrast/README.html
new file mode 100644
index 000000000..a48958f20
--- /dev/null
+++ b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/a11y_pygments/github_light_high_contrast/README.html
@@ -0,0 +1,579 @@
Github Light High Contrast — Applied Data Analysis and Machine Learning

This style mimics the github light high contrast theme from vscode themes.
[Screenshot of the github-light-high-contrast theme in a bash script]

Colors
Background color: #ffffff
Highlight color: #0969da4a

WCAG compliance
Color      Ratio       Normal text  Large text
#66707b    5.0 : 1     AA           AAA
#a0111f    8.1 : 1     AAA          AAA
#702c00    10.2 : 1    AAA          AAA
#024c1a    10.2 : 1    AAA          AAA
#023b95    10.2 : 1    AAA          AAA
#622cbc    8.1 : 1     AAA          AAA
#24292f    14.7 : 1    AAA          AAA
\ No newline at end of file
diff --git a/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/a11y_pygments/gotthard_dark/README.html b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/a11y_pygments/gotthard_dark/README.html
new file mode 100644
index 000000000..a86c6b072
--- /dev/null
+++ b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/a11y_pygments/gotthard_dark/README.html
@@ -0,0 +1,579 @@
Gotthard Dark — Applied Data Analysis and Machine Learning

This style mimics the gotthard dark theme from vscode.
[Screenshot of the gotthard-dark theme in a bash script]

Colors
Background color: #000000
Highlight color: #4c4b4be8

WCAG compliance
Color      Ratio       Normal text  Large text
#f5f5f5    19.3 : 1    AAA          AAA
#ab6369    4.7 : 1     AA           AAA
#b89784    7.8 : 1     AAA          AAA
#caab6d    9.6 : 1     AAA          AAA
#81b19b    8.7 : 1     AAA          AAA
#6f98b3    6.8 : 1     AA           AAA
#b19db4    8.4 : 1     AAA          AAA
\ No newline at end of file
diff --git a/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/a11y_pygments/gotthard_light/README.html b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/a11y_pygments/gotthard_light/README.html
new file mode 100644
index 000000000..f6919644d
--- /dev/null
+++ b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/a11y_pygments/gotthard_light/README.html
@@ -0,0 +1,579 @@
Gotthard Light — Applied Data Analysis and Machine Learning

This style mimics the gotthard light theme from vscode.
[Screenshot of the gotthard-light theme in a bash script]

Colors
Background color: #F5F5F5
Highlight color: #E1E1E1

WCAG compliance
Color      Ratio       Normal text  Large text
#141414    16.9 : 1    AAA          AAA
#9f4e55    5.2 : 1     AA           AAA
#a25e53    4.5 : 1     AA           AAA
#98661b    4.5 : 1     AA           AAA
#437a6b    4.5 : 1     AA           AAA
#3d73a9    4.6 : 1     AA           AAA
#974eb7    4.7 : 1     AA           AAA
\ No newline at end of file
diff --git a/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/a11y_pygments/greative/README.html b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/a11y_pygments/greative/README.html
new file mode 100644
index 000000000..e2d7a65c3
--- /dev/null
+++ b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/a11y_pygments/greative/README.html
@@ -0,0 +1,579 @@
Greative — Applied Data Analysis and Machine Learning

This style mimics the greative theme from vscode themes.
[Screenshot of the greative theme in a bash script]

Colors
Background color: #010726
Highlight color: #473d18

WCAG compliance
Color      Ratio       Normal text  Large text
#797979    4.6 : 1     AA           AAA
#f78c6c    8.4 : 1     AAA          AAA
#9e8741    5.7 : 1     AA           AAA
#c5e478    13.9 : 1    AAA          AAA
#a2bffc    10.8 : 1    AAA          AAA
#5ca7e4    7.6 : 1     AAA          AAA
#9e86c8    6.3 : 1     AA           AAA
\ No newline at end of file
diff --git a/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/a11y_pygments/pitaya_smoothie/README.html b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/a11y_pygments/pitaya_smoothie/README.html
new file mode 100644
index 000000000..dcd757c82
--- /dev/null
+++ b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/a11y_pygments/pitaya_smoothie/README.html
@@ -0,0 +1,591 @@
Pitaya Smoothie — Applied Data Analysis and Machine Learning

This style mimics the a11 light theme from eric bailey’s accessible themes.
[Screenshot of the pitaya-smoothie theme in a bash script]

Colors
Background color: #181036
Highlight color: #2A1968

WCAG compliance
Color      Ratio       Normal text  Large text
#8786ac    5.2 : 1     AA           AAA
#f26196    5.9 : 1     AA           AAA
#f5a394    9.0 : 1     AAA          AAA
#fad000    12.1 : 1    AAA          AAA
#18c1c4    8.1 : 1     AAA          AAA
#66e9ec    12.4 : 1    AAA          AAA
#7998f2    6.5 : 1     AA           AAA
#c4a2f5    8.4 : 1     AAA          AAA
#fefeff    17.9 : 1    AAA          AAA
\ No newline at end of file
diff --git a/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/alabaster-0.7.16.dist-info/LICENSE.html b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/alabaster-0.7.16.dist-info/LICENSE.html
new file mode 100644
index 000000000..2afc98044
--- /dev/null
+++ b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/alabaster-0.7.16.dist-info/LICENSE.html
@@ -0,0 +1,542 @@
<no title> — Applied Data Analysis and Machine Learning

Copyright (c) 2020 Jeff Forcier.

Based on original work copyright (c) 2011 Kenneth Reitz and copyright (c) 2010 Armin Ronacher.

Some rights reserved.

Redistribution and use in source and binary forms of the theme, with or without modification, are permitted provided that the following conditions are met:

  • Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
  • Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
  • The names of the contributors may not be used to endorse or promote products derived from this software without specific prior written permission.

THIS THEME IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS “AS IS” AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS THEME, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
\ No newline at end of file
diff --git a/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/debugpy/_vendored/pydevd/pydevd_plugins/extensions/README.html b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/debugpy/_vendored/pydevd/pydevd_plugins/extensions/README.html
new file mode 100644
index 000000000..c3d0376f1
--- /dev/null
+++ b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/debugpy/_vendored/pydevd/pydevd_plugins/extensions/README.html
@@ -0,0 +1,546 @@
<no title> — Applied Data Analysis and Machine Learning

Extensions allow extending the debugger without modifying the debugger code. This is implemented with explicit namespace packages.

To implement your own extension:

  1. Ensure that the root folder of your extension is in sys.path (add it to PYTHONPATH).
  2. Ensure that your module follows the directory structure below.
  3. The __init__.py files inside the pydevd_plugin and extension folder must contain the preamble below, and nothing else.

     Preamble:

         try:
             __import__('pkg_resources').declare_namespace(__name__)
         except ImportError:
             import pkgutil
             __path__ = pkgutil.extend_path(__path__, __name__)

  4. Your plugin name inside the extensions folder must start with "pydevd_plugin".
  5. Implement one or more of the abstract base classes defined in _pydevd_bundle.pydevd_extension_api. This can be done by either inheriting from them or registering with the abstract base class.

Directory structure:

    |--  root_directory -> must be on python path
    |    |-- pydevd_plugins
    |    |   |-- __init__.py -> must contain preamble
    |    |   |-- extensions
    |    |   |   |-- __init__.py -> must contain preamble
    |    |   |   |-- pydevd_plugin_plugin_name.py
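As a concrete illustration of the last step, a provider module could look roughly like the sketch below. The class and method names (StrPresentationProvider, can_provide, get_str) are recalled from _pydevd_bundle.pydevd_extension_api and should be verified against the installed debugpy/pydevd version; the whole file is a hypothetical example, not part of the committed page.

```python
# Hypothetical pydevd_plugins/extensions/pydevd_plugin_fraction_str.py
# Assumes a StrPresentationProvider ABC with can_provide()/get_str() in
# _pydevd_bundle.pydevd_extension_api; check your debugpy/pydevd version.
from fractions import Fraction

from _pydevd_bundle.pydevd_extension_api import StrPresentationProvider


class FractionStrProvider(StrPresentationProvider):
    """Show Fraction objects as 'p/q (~decimal)' in the debugger's variables view."""

    def can_provide(self, type_object, type_name):
        return type_object is Fraction

    def get_str(self, val):
        return "%s (~%.4f)" % (val, float(val))
```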
\ No newline at end of file
diff --git a/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/idna-3.10.dist-info/LICENSE.html b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/idna-3.10.dist-info/LICENSE.html
new file mode 100644
index 000000000..384a01789
--- /dev/null
+++ b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/idna-3.10.dist-info/LICENSE.html
@@ -0,0 +1,540 @@
<no title> — Applied Data Analysis and Machine Learning

BSD 3-Clause License

Copyright (c) 2013-2024, Kim Davies and contributors. All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

  1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
  2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
  3. Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS “AS IS” AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
\ No newline at end of file
diff --git a/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/imagesize-1.4.1.dist-info/LICENSE.html b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/imagesize-1.4.1.dist-info/LICENSE.html
new file mode 100644
index 000000000..2773d816b
--- /dev/null
+++ b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/imagesize-1.4.1.dist-info/LICENSE.html
@@ -0,0 +1,504 @@
The MIT License (MIT) — Applied Data Analysis and Machine Learning

Copyright © 2016 Yoshiki Shibukawa

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
\ No newline at end of file
diff --git a/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/ipython-9.5.0.dist-info/licenses/COPYING.html b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/ipython-9.5.0.dist-info/licenses/COPYING.html
new file mode 100644
index 000000000..4b1989c67
--- /dev/null
+++ b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/ipython-9.5.0.dist-info/licenses/COPYING.html
@@ -0,0 +1,548 @@
The IPython licensing terms — Applied Data Analysis and Machine Learning

IPython is licensed under the terms of the Modified BSD License (also known as New or Revised or 3-Clause BSD). See the LICENSE file.

About the IPython Development Team
Fernando Perez began IPython in 2001 based on code from Janko Hauser <jhauser@zscout.de> and Nathaniel Gray <n8gray@caltech.edu>. Fernando is still the project lead.

The IPython Development Team is the set of all contributors to the IPython project. This includes all of the IPython subprojects.

The core team that coordinates development on GitHub can be found here: ipython/.
\ No newline at end of file
diff --git a/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/jupyter_book/book_template/intro.html b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/jupyter_book/book_template/intro.html
new file mode 100644
index 000000000..4825ab866
--- /dev/null
+++ b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/jupyter_book/book_template/intro.html
@@ -0,0 +1,496 @@
Welcome to your Jupyter Book — Applied Data Analysis and Machine Learning

This is a small sample book to give you a feel for how book content is structured. It shows off a few of the major file types, as well as some sample content. It does not go in-depth into any particular topic - check out the Jupyter Book documentation for more information.

Check out the content pages bundled with this sample book to see more.
\ No newline at end of file
diff --git a/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/jupyter_book/book_template/markdown-notebooks.html b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/jupyter_book/book_template/markdown-notebooks.html
new file mode 100644
index 000000000..8397c2c25
--- /dev/null
+++ b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/jupyter_book/book_template/markdown-notebooks.html
@@ -0,0 +1,558 @@
Notebooks with MyST Markdown — Applied Data Analysis and Machine Learning

Jupyter Book also lets you write text-based notebooks using MyST Markdown. See the Notebooks with MyST Markdown documentation for more detailed instructions. This page shows off a notebook written in MyST Markdown.

An example cell
With MyST Markdown, you can define code cells with a directive like so:

    print(2 + 2)

When your book is built, the contents of any {code-cell} blocks will be executed with your default Jupyter kernel, and their outputs will be displayed in-line with the rest of your content.

See also: Jupyter Book uses Jupytext to convert text-based files to notebooks, and can support many other text-based notebook files.

Create a notebook with MyST Markdown
MyST Markdown notebooks are defined by two things:

  1. YAML metadata that is needed to understand if / how it should convert text files to notebooks (including information about the kernel needed). See the YAML at the top of this page for example.
  2. The presence of {code-cell} directives, which will be executed with your book.

That’s all that is needed to get started!

Quickly add YAML metadata for MyST Notebooks
If you have a markdown file and you’d like to quickly add YAML metadata to it, so that Jupyter Book will treat it as a MyST Markdown Notebook, run the following command:

    jupyter-book myst init path/to/markdownfile.md
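For reference, the YAML front matter that marks a markdown file as a MyST notebook typically looks like the sketch below. The exact fields written by jupyter-book myst init depend on the installed Jupytext version, and the Python 3 kernelspec here is only an assumed example:

```yaml
---
jupytext:
  text_representation:
    extension: .md
    format_name: myst
kernelspec:
  display_name: Python 3
  language: python
  name: python3
---
```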
\ No newline at end of file
diff --git a/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/jupyter_book/book_template/markdown.html b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/jupyter_book/book_template/markdown.html
new file mode 100644
index 000000000..96b13e256
--- /dev/null
+++ b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/jupyter_book/book_template/markdown.html
@@ -0,0 +1,566 @@
Markdown Files — Applied Data Analysis and Machine Learning

Whether you write your book’s content in Jupyter Notebooks (.ipynb) or in regular markdown files (.md), you’ll write in the same flavor of markdown called MyST Markdown. This is a simple file to help you get started and show off some syntax.

What is MyST?
MyST stands for “Markedly Structured Text”. It is a slight variation on a flavor of markdown called “CommonMark” markdown, with small syntax extensions to allow you to write roles and directives in the Sphinx ecosystem.

For more about MyST, see the MyST Markdown Overview.

Sample Roles and Directives
Roles and directives are two of the most powerful tools in Jupyter Book. They are like functions, but written in a markup language. They both serve a similar purpose, but roles are written in one line, whereas directives span many lines. They both accept different kinds of inputs, and what they do with those inputs depends on the specific role or directive that is being called.

Here is a “note” directive:

    Note
    Here is a note

It will be rendered in a special box when you build your book.

Here is an inline directive to refer to a document: Notebooks with MyST Markdown.

Citations
You can also cite references that are stored in a bibtex file. For example, the following syntax: {cite}`holdgraf_evidence_2014` will render like this: .

Moreover, you can insert a bibliography into your page with this syntax: The {bibliography} directive must be used for all the {cite} roles to render properly. For example, if the references for your book are stored in references.bib, then the bibliography is inserted with the {bibliography} directive, as sketched below.
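The code sample that followed this sentence did not survive the HTML-to-text conversion; in MyST/Jupyter Book the bibliography is normally inserted with the directive sketched below (assuming references.bib is registered as the book's bibtex file):

```
```{bibliography}
```
```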

Learn more
This is just a simple starter to get you started. You can learn a lot more at jupyterbook.org.
\ No newline at end of file
diff --git a/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/jupyter_book/book_template/notebooks.html b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/jupyter_book/book_template/notebooks.html
new file mode 100644
index 000000000..09cc2a1db
--- /dev/null
+++ b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/jupyter_book/book_template/notebooks.html
@@ -0,0 +1,590 @@
Content with notebooks — Applied Data Analysis and Machine Learning

You can also create content with Jupyter Notebooks. This means that you can include code blocks and their outputs in your book.

Markdown + notebooks
As it is markdown, you can embed images, HTML, etc into your posts!

You can also $add_{math}$ and

$$
math^{blocks}
$$

or

$$
\begin{aligned}
\mbox{mean} la_{tex} \\ \\
math blocks
\end{aligned}
$$

But make sure you $Escape $your $dollar signs $you want to keep!

MyST markdown
MyST markdown works in Jupyter Notebooks as well. For more information about MyST markdown, check out the MyST guide in Jupyter Book, or see the MyST markdown documentation.

Code blocks and outputs
Jupyter Book will also embed your code blocks and output in your book. For example, here’s some sample Matplotlib code:

    from matplotlib import rcParams, cycler
    import matplotlib.pyplot as plt
    import numpy as np
    plt.ion()

    # Fixing random state for reproducibility
    np.random.seed(19680801)

    N = 10
    data = [np.logspace(0, 1, 100) + np.random.randn(100) + ii for ii in range(N)]
    data = np.array(data).T
    cmap = plt.cm.coolwarm
    rcParams['axes.prop_cycle'] = cycler(color=cmap(np.linspace(0, 1, N)))

    from matplotlib.lines import Line2D
    custom_lines = [Line2D([0], [0], color=cmap(0.), lw=4),
                    Line2D([0], [0], color=cmap(.5), lw=4),
                    Line2D([0], [0], color=cmap(1.), lw=4)]

    fig, ax = plt.subplots(figsize=(10, 5))
    lines = ax.plot(data)
    ax.legend(custom_lines, ['Cold', 'Medium', 'Hot']);

There is a lot more that you can do with outputs (such as including interactive outputs) with your book. For more information about this, see the Jupyter Book documentation.
\ No newline at end of file
diff --git a/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/latexcodec-3.0.1.dist-info/licenses/AUTHORS.html b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/latexcodec-3.0.1.dist-info/licenses/AUTHORS.html
new file mode 100644
index 000000000..6d98fb80e
--- /dev/null
+++ b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/latexcodec-3.0.1.dist-info/licenses/AUTHORS.html
@@ -0,0 +1,541 @@
<no title> — Applied Data Analysis and Machine Learning

Main authors:

  • David Eppstein
  • Peter Tröger
      • wrote the original latexcodec package, which contained a simple but very effective LaTeX encoder
  • Matthias Troffaes (matthias.troffaes@gmail.com)
      • wrote the lexer
      • integrated codec with the lexer for a simpler and more robust design
      • various bugfixes

Contributors:

  • Michael Radziej
  • Philipp Spitzer
\ No newline at end of file
diff --git a/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/latexcodec-3.0.1.dist-info/licenses/LICENSE.html b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/latexcodec-3.0.1.dist-info/licenses/LICENSE.html
new file mode 100644
index 000000000..f4b6e00f6
--- /dev/null
+++ b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/latexcodec-3.0.1.dist-info/licenses/LICENSE.html
@@ -0,0 +1,535 @@
<no title> — Applied Data Analysis and Machine Learning

latexcodec is a lexer and codec to work with LaTeX code in Python
Copyright (c) 2011-2020 by Matthias C. M. Troffaes

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
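The license page only names the package; for orientation, its basic documented usage is roughly the following (a minimal sketch, not part of the committed files, with the expected output shown only approximately):

```python
# Importing latexcodec registers the "latex" codec with Python's codec machinery.
import latexcodec  # noqa: F401  (needed only for its registration side effect)

print("élève".encode("latex"))        # LaTeX-escaped bytes, e.g. b"\\'el\\`eve"
print(b"\\'el\\`eve".decode("latex"))  # back to unicode text: élève
```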
\ No newline at end of file
diff --git a/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/mdit_py_plugins/container/README.html b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/mdit_py_plugins/container/README.html
new file mode 100644
index 000000000..a65ea1098
--- /dev/null
+++ b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/mdit_py_plugins/container/README.html
@@ -0,0 +1,609 @@
markdown-it-container — Applied Data Analysis and Machine Learning

[Badges: Build Status, NPM version, Coverage Status]

Plugin for creating block-level custom containers for markdown-it markdown parser.

v2.+ requires markdown-it v5.+, see changelog.

With this plugin you can create block containers like:

    ::: warning
    *here be dragons*
    :::

… and specify how they should be rendered. If no renderer defined, <div> with container name class will be created:

    <div class="warning">
    <em>here be dragons</em>
    </div>

Markup is the same as for fenced code blocks. Difference is, that marker use another character and content is rendered as markdown markup.

Installation
node.js, browser:

    $ npm install markdown-it-container --save
    $ bower install markdown-it-container --save

API

    var md = require('markdown-it')()
                .use(require('markdown-it-container'), name [, options]);

Params:

  • name - container name (mandatory)
  • options:
      • validate - optional, function to validate tail after opening marker, should return true on success.
      • render - optional, renderer function for opening/closing tokens.
      • marker - optional (:), character to use in delimiter.

Example

    var md = require('markdown-it')();

    md.use(require('markdown-it-container'), 'spoiler', {

      validate: function(params) {
        return params.trim().match(/^spoiler\s+(.*)$/);
      },

      render: function (tokens, idx) {
        var m = tokens[idx].info.trim().match(/^spoiler\s+(.*)$/);

        if (tokens[idx].nesting === 1) {
          // opening tag
          return '<details><summary>' + md.utils.escapeHtml(m[1]) + '</summary>\n';

        } else {
          // closing tag
          return '</details>\n';
        }
      }
    });

    console.log(md.render('::: spoiler click me\n*content*\n:::\n'));

    // Output:
    //
    // <details><summary>click me</summary>
    // <p><em>content</em></p>
    // </details>

License
MIT
\ No newline at end of file
diff --git a/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/mdit_py_plugins/deflist/README.html b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/mdit_py_plugins/deflist/README.html
new file mode 100644
index 000000000..977ff6ea8
--- /dev/null
+++ b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/mdit_py_plugins/deflist/README.html
@@ -0,0 +1,551 @@
markdown-it-deflist — Applied Data Analysis and Machine Learning

[Badges: Build Status, NPM version, Coverage Status]

Definition list (<dl>) tag plugin for markdown-it markdown parser.

v2.+ requires markdown-it v5.+, see changelog.

Syntax is based on pandoc definition lists.
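For orientation, pandoc-style definition-list input, and the kind of <dl> markup this plugin renders it to, look roughly as follows (an illustrative sketch with made-up terms, not taken from the plugin's documentation):

```
Term 1
:   Definition of term 1.

Term 2
:   Definition of term 2, which may
    continue on an indented line.
```

```html
<dl>
  <dt>Term 1</dt>
  <dd>Definition of term 1.</dd>
  <dt>Term 2</dt>
  <dd>Definition of term 2, which may continue on an indented line.</dd>
</dl>
```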

Install
node.js, browser:

    npm install markdown-it-deflist --save
    bower install markdown-it-deflist --save

Use

    var md = require('markdown-it')()
                .use(require('markdown-it-deflist'));

    md.render(/*...*/);

Differences in browser. If you load script directly into the page, without package system, module will add itself globally as window.markdownitDeflist.

License
MIT
\ No newline at end of file
diff --git a/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/mdit_py_plugins/texmath/README.html b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/mdit_py_plugins/texmath/README.html
new file mode 100644
index 000000000..2b12331ff
--- /dev/null
+++ b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/mdit_py_plugins/texmath/README.html
@@ -0,0 +1,747 @@
markdown-it-texmath — Applied Data Analysis and Machine Learning

[Badges: License, npm, npm]

Add TeX math equations to your Markdown documents rendered by markdown-it parser. KaTeX is used as a fast math renderer.

Features
Simplify the process of authoring markdown documents containing math formulas. This extension is a comfortable tool for scientists, engineers and students with markdown as their first choice document format.

  • Macro support
  • Simple formula numbering
  • Inline math with tables, lists and blockquote.
  • User setting delimiters:
      • 'dollars' (default): inline $...$, display $$...$$, display + equation number $$...$$ (1)
      • 'brackets': inline \(...\), display \[...\], display + equation number \[...\] (1)
      • 'gitlab': inline $`...`$, display ```math ... ```, display + equation number ```math ... ``` (1)
      • 'julia': inline $...$ or ``...``, display ```math ... ```, display + equation number ```math ... ``` (1)
      • 'kramdown': inline $$...$$, display $$...$$, display + equation number $$...$$ (1)

Show me
View a test table.
try it out …

Use with node.js
Install the extension. Verify having markdown-it and katex already installed.

    npm install markdown-it-texmath

Use it with JavaScript.

    let kt = require('katex'),
        tm = require('markdown-it-texmath').use(kt),
        md = require('markdown-it')().use(tm,{delimiters:'dollars',macros:{"\\RR": "\\mathbb{R}"}});

    md.render('Euler\'s identity \(e^{i\pi}+1=0\) is a beautiful formula in $\\RR 2$.')

Use in Browser

    <html>
    <head>
      <meta charset='utf-8'>
      <link rel="stylesheet" href="katex.min.css">
      <link rel="stylesheet" href="texmath.css">
      <script src="markdown-it.min.js"></script>
      <script src="katex.min.js"></script>
      <script src="texmath.js"></script>
    </head>
    <body>
      <div id="out"></div>
      <script>
        let md;
        document.addEventListener("DOMContentLoaded", () => {
            const tm = texmath.use(katex);
            md = markdownit().use(tm,{delimiters:'dollars',macros:{"\\RR": "\\mathbb{R}"}});
            out.innerHTML = md.render('Euler\'s identity $e^{i\pi}+1=0$ is a beautiful formula in //RR 2.');
        })
      </script>
    </body>
    </html>

CDN
Use following links for texmath.js and texmath.css

  • https://gitcdn.xyz/cdn/goessner/markdown-it-texmath/master/texmath.js
  • https://gitcdn.xyz/cdn/goessner/markdown-it-texmath/master/texmath.css

Dependencies

  • markdown-it: Markdown parser done right. Fast and easy to extend.
  • katex: This is where credits for fast rendering TeX math in HTML go to.

ToDo
nothing yet

FAQ

  • markdown-it-texmath with React Native does not work, why?
      • markdown-it-texmath is using regular expressions with y (sticky) property and cannot avoid this. The use of the y flag in regular expressions means the plugin is not compatible with React Native (which as of now doesn’t support it and throws an error Invalid flags supplied to RegExp constructor).

CHANGELOG
[0.6.0] on October 04, 2019
[0.5.5] on February 07, 2019
[0.5.4] on January 20, 2019
[0.5.3] on November 11, 2018
[0.5.2] on September 07, 2018
[0.5.0] on August 15, 2018
  • Fatal blockquote bug investigated. Implemented workaround to vscode bug, which has finally gone with vscode 1.26.0.
[0.4.6] on January 05, 2018
  • Escaped underscore bug removed.
[0.4.5] on November 06, 2017
  • Backslash bug removed.
[0.4.4] on September 27, 2017
  • Modifying the block mode regular expression with gitlab delimiters, so removing the newline bug.

License
markdown-it-texmath is licensed under the MIT License
© Stefan Gössner
+ + \ No newline at end of file diff --git a/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/numpy/ma/README.html b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/numpy/ma/README.html new file mode 100644 index 000000000..4087f4257 --- /dev/null +++ b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/numpy/ma/README.html @@ -0,0 +1,777 @@ + + + + + + + + + + + A guide to masked arrays in NumPy — Applied Data Analysis and Machine Learning + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + + + + + + + +
+
+
+
+
+ +
+ +
+ + + + + +
+
+ + + + + +
+ + + + + + + + + + + + + +
+ +
+ + + +
+ +
+
+ +
+
+ +
+ +
+ +
+ + +
+ +
+ +
+ + + + + + + + + + + + + + + + + + + +
+ +
+ +
+
+ + + + + + + + +
+ +
+

A guide to masked arrays in NumPy#

+ +

See http://www.scipy.org/scipy/numpy/wiki/MaskedArray (dead link) +for updates of this document.

+
+

History#

As a regular user of MaskedArray, I (Pierre G.F. Gerard-Marchant) became increasingly frustrated with the subclassing of masked arrays (even if I can only blame my inexperience). I needed to develop a class of arrays that could store some additional information along with numerical values, while keeping the possibility for missing data (picture storing a series of dates along with measurements, which would later become the TimeSeries Scikit (dead link)).

I started to implement such a class, but then quickly realized that any additional information disappeared when processing these subarrays (for example, adding a constant value to a subarray would erase its dates). I ended up writing the equivalent of numpy.core.ma for my particular class, ufuncs included. Everything went fine until I needed to subclass my new class, when more problems showed up: some attributes of the new subclass were lost during processing. I identified the culprit as MaskedArray, which returns masked ndarrays when I expected masked arrays of my class. I was preparing myself to rewrite numpy.core.ma when I forced myself to learn how to subclass ndarrays. As I became more familiar with the __new__ and __array_finalize__ methods, I started to wonder why masked arrays were objects, and not ndarrays, and whether it wouldn’t be more convenient for subclassing if they did behave like regular ndarrays.

The new maskedarray is what I eventually came up with. The main differences with the initial numpy.core.ma package are that MaskedArray is now a subclass of ndarray and that the _data section can now be any subclass of ndarray. Apart from a couple of issues listed below, the behavior of the new MaskedArray class reproduces the old one. Initially the maskedarray implementation was marginally slower than numpy.ma in some areas, but work is underway to speed it up; the expectation is that it can be made substantially faster than the present numpy.ma.

Note that if the subclass has some special methods and attributes, they are not propagated to the masked version: this would require a modification of the __getattribute__ method (first trying ndarray.__getattribute__, then trying self._data.__getattribute__ if an exception is raised in the first place), which really slows things down.

Main differences#

  • The _data part of the masked array can be any subclass of ndarray (but not recarray, cf below).

  • fill_value is now a property, not a function.

  • in the majority of cases, the mask is forced to nomask when no value is actually masked. A notable exception is when a masked array (with no masked values) has just been unpickled.

  • I got rid of the share_mask flag, I never understood its purpose.

  • put, putmask and take now mimic the ndarray methods, to avoid unpleasant surprises. Moreover, put and putmask both update the mask when needed.

  • if a is a masked array, bool(a) raises a ValueError, as it does with ndarrays.

  • in the same way, the comparison of two masked arrays is a masked array, not a boolean.

  • filled(a) returns an array of the same subclass as a._data, and no test is performed on whether it is contiguous or not.

  • the mask is always printed, even if it’s nomask, which makes things easy (for me at least) to remember that a masked array is used.

  • cumsum works as if the _data array was filled with 0. The mask is preserved, but not updated.

  • cumprod works as if the _data array was filled with 1. The mask is preserved, but not updated.
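
For illustration, here is a minimal sketch of a few of the behaviours listed above, written against the numpy.ma module that ships with current NumPy (the modern descendant of the package described here); the array values are made-up examples.

import numpy.ma as ma

a = ma.array([1.0, 2.0, 3.0], mask=[0, 1, 0])   # second entry masked
b = ma.array([1.0, 5.0, 3.0], mask=[0, 0, 0])

print(a.fill_value)       # fill_value is a property, not a function
print(a == b)             # comparing two masked arrays yields a masked array, not a plain boolean
print(ma.filled(a, 0.0))  # filled() returns an array built from the underlying _data

try:
    bool(a)               # as with ndarrays, the truth value of a multi-element array is ambiguous
except ValueError as err:
    print("ValueError:", err)

print(a.cumsum())         # cumsum acts as if masked entries were filled with 0; the mask is preserved

The exact printed representations differ between NumPy versions, but the behaviours themselves match the corresponding items in the list above.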

New features#


This list is non-exhaustive…

  • the mr_ function mimics r_ for masked arrays.

  • the anom method returns the anomalies (deviations from the average).
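
As a rough sketch of these two features (again using today’s numpy.ma, with arbitrary values):

import numpy.ma as ma

x = ma.array([1.0, 2.0, 3.0, 4.0], mask=[0, 0, 1, 0])

combined = ma.mr_[x, ma.array([5.0, 6.0])]   # mr_ concatenates masked arrays, like r_ does for ndarrays
print(combined)

print(x.anom())   # anom() returns the deviations from the mean of the unmasked entries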

Using the new package with numpy.core.ma#

I tried to make sure that the new package can understand old masked arrays. Unfortunately, there’s no upward compatibility.

For example:

>>> import numpy.core.ma as old_ma
>>> import maskedarray as new_ma
>>> x = old_ma.array([1,2,3,4,5], mask=[0,0,1,0,0])
>>> x
array(data =
 [     1      2 999999      4      5],
      mask =
 [False False True False False],
      fill_value=999999)
>>> y = new_ma.array([1,2,3,4,5], mask=[0,0,1,0,0])
>>> y
array(data = [1 2 -- 4 5],
      mask = [False False True False False],
      fill_value=999999)
>>> x==y
array(data =
 [True True True True True],
      mask =
 [False False True False False],
      fill_value=?)
>>> old_ma.getmask(x) == new_ma.getmask(x)
array([True, True, True, True, True])
>>> old_ma.getmask(y) == new_ma.getmask(y)
array([True, True, False, True, True])
>>> old_ma.getmask(y)
False

Using maskedarray with matplotlib#

Starting with matplotlib 0.91.2, the masked array importing will work with the maskedarray branch as well as with earlier versions.

By default matplotlib still uses numpy.ma, but there is an rcParams setting that you can use to select maskedarray instead. In the matplotlibrc file you will find:

#maskedarray : False       # True to use external maskedarray module
                           # instead of numpy.ma; this is a temporary
                           # setting for testing maskedarray.

Uncomment and set to True to select maskedarray everywhere. Alternatively, you can test a script with maskedarray by using a command-line option, e.g.:

python simple_plot.py --maskedarray

Masked records#

Like numpy.ma.core, the ndarray-based implementation of MaskedArray is limited when working with records: you can mask any record of the array, but not a field in a record. If you need this feature, you may want to give the mrecords package a try (available in the maskedarray directory in the scipy sandbox). This module defines a new class, MaskedRecord. An instance of this class accepts a recarray as data, and uses two masks: the fieldmask has as many entries as records in the array, each entry with the same fields as a record, but of boolean types: they indicate whether the field is masked or not; a record entry is flagged as masked in the mask array if all the fields are masked. A few examples in the file should give you an idea of what can be done. Note that mrecords is still experimental…
+

Optimizing maskedarray#

+
+
+

Should masked arrays be filled before processing or not?#

In the current implementation, most operations on masked arrays involve the following steps:

  • the input arrays are filled

  • the operation is performed on the filled arrays

  • the mask is set for the results, from the combination of the input masks and the mask corresponding to the domain of the operation.

For example, consider the division of two masked arrays:

import numpy
import maskedarray as ma
x = ma.array([1,2,3,4], mask=[1,0,0,0], dtype=numpy.float64)
y = ma.array([-1,0,1,2], mask=[0,0,0,1], dtype=numpy.float64)

The division of x by y is then computed as:

d1 = x.filled(0) # d1 = array([0., 2., 3., 4.])
d2 = y.filled(1) # d2 = array([-1.,  0.,  1.,  1.])
m = ma.mask_or(ma.getmask(x), ma.getmask(y)) # m = array([True, False, False, True])
dm = ma.divide.domain(d1, d2) # dm = array([False,  True, False, False])
result = (d1/d2).view(MaskedArray) # masked_array([-0., inf, 3., 4.])
result._mask = logical_or(m, dm)

Note that a division by zero takes place. To avoid it, we can consider filling the input arrays, taking the domain mask into account, so that:

d1 = x._data.copy() # d1 = array([1., 2., 3., 4.])
d2 = y._data.copy() # d2 = array([-1.,  0.,  1.,  2.])
dm = ma.divide.domain(d1, d2) # dm = array([False,  True, False, False])
numpy.putmask(d2, dm, 1) # d2 = array([-1.,  1.,  1.,  2.])
m = ma.mask_or(ma.getmask(x), ma.getmask(y)) # m = array([True, False, False, True])
result = (d1/d2).view(MaskedArray) # masked_array([-1., 2., 3., 2.])
result._mask = logical_or(m, dm)

Note that the .copy() is required to avoid updating the inputs with putmask. The .filled() method also involves a .copy().

A third possibility consists in not filling the arrays at all:

d1 = x._data # d1 = array([1., 2., 3., 4.])
d2 = y._data # d2 = array([-1.,  0.,  1.,  2.])
dm = ma.divide.domain(d1, d2) # dm = array([False,  True, False, False])
m = ma.mask_or(ma.getmask(x), ma.getmask(y)) # m = array([True, False, False, True])
result = (d1/d2).view(MaskedArray) # masked_array([-1., inf, 3., 2.])
result._mask = logical_or(m, dm)

Note that here again the division by zero takes place.


A quick benchmark gives the following results:

  • numpy.ma.divide : 2.69 ms per loop

  • classical division : 2.21 ms per loop

  • division w/ prefilling : 2.34 ms per loop

  • division w/o filling : 1.55 ms per loop

So, is it worth filling the arrays beforehand? Yes, if we are interested in avoiding floating-point exceptions that may fill the result with infs and nans. No, if we are only interested in speed…

Thanks#

I’d like to thank Paul Dubois, Travis Oliphant and Sasha for the original masked array package: without you, I would never have started that (it might be argued that I shouldn’t have anyway, but that’s another story…). I also wish to extend these thanks to Reggie Dugard and Eric Firing for their suggestions and numerous improvements.

Revision notes#

  • 08/25/2007 : Creation of this page

  • 01/23/2007 : The package has been moved to the SciPy sandbox, and is regularly updated: please check out your SVN version!

\ No newline at end of file
diff --git a/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/numpy/random/LICENSE.html b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/numpy/random/LICENSE.html
new file mode 100644
index 000000000..9e2a43677
--- /dev/null
+++ b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/numpy/random/LICENSE.html
@@ -0,0 +1,582 @@
NCSA Open Source License — Applied Data Analysis and Machine Learning
This software is dual-licensed under The University of Illinois/NCSA Open Source License (NCSA) and The 3-Clause BSD License.

NCSA Open Source License#

Copyright (c) 2019 Kevin Sheppard. All rights reserved.

Developed by: Kevin Sheppard (kevin.sheppard@economics.ox.ac.uk, kevin.k.sheppard@gmail.com) http://www.kevinsheppard.com

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal with the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimers.

Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimers in the documentation and/or other materials provided with the distribution.

Neither the names of Kevin Sheppard, nor the names of any contributors may be used to endorse or promote products derived from this Software without specific prior written permission.

THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS WITH THE SOFTWARE.

3-Clause BSD License#

Copyright (c) 2019 Kevin Sheppard. All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

  1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

  2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

  3. Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS “AS IS” AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Components#

Many parts of this module have been derived from original sources, often the algorithm’s designer. Component licenses are located with the component code.

\ No newline at end of file
diff --git a/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/pip-25.2.dist-info/licenses/src/pip/_vendor/idna/LICENSE.html b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/pip-25.2.dist-info/licenses/src/pip/_vendor/idna/LICENSE.html
new file mode 100644
index 000000000..04fe3fdfa
--- /dev/null
+++ b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/pip-25.2.dist-info/licenses/src/pip/_vendor/idna/LICENSE.html
@@ -0,0 +1,540 @@
<no title> — Applied Data Analysis and Machine Learning
BSD 3-Clause License

Copyright (c) 2013-2024, Kim Davies and contributors. All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

  1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

  2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

  3. Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS “AS IS” AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

\ No newline at end of file
diff --git a/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/prompt_toolkit-3.0.52.dist-info/licenses/AUTHORS.html b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/prompt_toolkit-3.0.52.dist-info/licenses/AUTHORS.html
new file mode 100644
index 000000000..399a27069
--- /dev/null
+++ b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/prompt_toolkit-3.0.52.dist-info/licenses/AUTHORS.html
@@ -0,0 +1,528 @@
Authors — Applied Data Analysis and Machine Learning
Authors#

Creator#

Jonathan Slenders <jonathan AT slenders.be>

Contributors#

  • Amjith Ramanujam <amjith.r AT gmail.com>

\ No newline at end of file
diff --git a/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/pybtex_docutils-1.0.3.dist-info/LICENSE.html b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/pybtex_docutils-1.0.3.dist-info/LICENSE.html
new file mode 100644
index 000000000..1100b02f2
--- /dev/null
+++ b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/pybtex_docutils-1.0.3.dist-info/LICENSE.html
@@ -0,0 +1,535 @@
<no title> — Applied Data Analysis and Machine Learning
pybtex-docutils is a docutils backend for pybtex
Copyright (c) 2013-2021 by Matthias C. M. Troffaes

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

\ No newline at end of file
diff --git a/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/pyzmq-27.0.2.dist-info/licenses/LICENSE.html b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/pyzmq-27.0.2.dist-info/licenses/LICENSE.html
new file mode 100644
index 000000000..2dd213461
--- /dev/null
+++ b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/pyzmq-27.0.2.dist-info/licenses/LICENSE.html
@@ -0,0 +1,538 @@
<no title> — Applied Data Analysis and Machine Learning
BSD 3-Clause License

Copyright (c) 2009-2012, Brian Granger, Min Ragan-Kelley

All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

  1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

  2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

  3. Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS “AS IS” AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

\ No newline at end of file
diff --git a/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/soupsieve-2.8.dist-info/licenses/LICENSE.html b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/soupsieve-2.8.dist-info/licenses/LICENSE.html
new file mode 100644
index 000000000..e26a1b3d2
--- /dev/null
+++ b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/soupsieve-2.8.dist-info/licenses/LICENSE.html
@@ -0,0 +1,530 @@
<no title> — Applied Data Analysis and Machine Learning
MIT License

Copyright (c) 2018 - 2025 Isaac Muse isaacmuse@gmail.com

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

\ No newline at end of file
diff --git a/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/sphinx-7.4.7.dist-info/LICENSE.html b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/sphinx-7.4.7.dist-info/LICENSE.html
new file mode 100644
index 000000000..73d7a77f2
--- /dev/null
+++ b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/sphinx-7.4.7.dist-info/LICENSE.html
@@ -0,0 +1,577 @@
License for Sphinx — Applied Data Analysis and Machine Learning
License for Sphinx#

Unless otherwise indicated, all code in the Sphinx project is licenced under the two clause BSD licence below.

Copyright (c) 2007-2024 by the Sphinx team (see AUTHORS file). All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

  • Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

  • Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS “AS IS” AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Licenses for incorporated software#

The included implementation of NumpyDocstring._parse_numpydoc_see_also_section was derived from code under the following license:

Copyright (C) 2008 Stefan van der Walt <stefan@mentat.za.net>, Pauli Virtanen <pav@iki.fi>

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

  1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

  2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS’’ AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

\ No newline at end of file
diff --git a/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/sphinx_book_theme/assets/translations/README.html b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/sphinx_book_theme/assets/translations/README.html
new file mode 100644
index 000000000..f731983cf
--- /dev/null
+++ b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/sphinx_book_theme/assets/translations/README.html
@@ -0,0 +1,578 @@
Translation workflow — Applied Data Analysis and Machine Learning

Translation workflow#

This folder contains code and translations for supporting multiple languages with Sphinx. See the Sphinx internationalization documentation for more details.

Structure of translation files#


Translation source files#

The source files for our translations are hand-edited, and contain the raw mapping of words onto various languages. They are checked in to git history with this repository.

src/sphinx_book_theme/assets/translations/jsons contains a collection of JSON files that define the translation for various phrases in this repository. Each file is a different phrase, and its contents define language codes and translated phrases for each language we support. They were originally created with the smodin.io language translator (see below for how to update them).
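
As an illustration of this layout, the sketch below writes and reads one such phrase file; the phrase, the language codes and the translations are invented for the example and are not taken from the actual jsons/ folder.

import json

# Hypothetical phrase file: the file name is the English phrase, the keys are language codes.
phrase_file = "Next page.json"
translations = {"de": "Nächste Seite", "fr": "Page suivante", "no": "Neste side"}

with open(phrase_file, "w", encoding="utf-8") as f:
    json.dump(translations, f, ensure_ascii=False, indent=2)

with open(phrase_file, encoding="utf-8") as f:
    loaded = json.load(f)
print(loaded["fr"])   # Page suivante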

Compiled translation files#

The translation source files are compiled at build time (when we run stb compile) automatically. This is executed by the Python script at python src/sphinx_book_theme/_compile_translations.py (more information on that below).

These compiled files are not checked into .git history, but they are bundled with the theme when it is distributed in a package. Here’s a brief explanation of each:

  • src/sphinx_book_theme/theme/sphinx_book_theme/static/locales contains Sphinx locale files that were auto-converted from the files in jsons/ by the helper script below.

  • src/sphinx_book_theme/_compile_translations.py is a helper script to auto-generate Sphinx locale files from the JSONs in jsons/.

Workflow of translations#

Here’s a short workflow of how to add a new translation, assuming that you are translating using the smodin.io service.

  1. Go to the smodin.io service

  2. Select as many languages as you like.

  3. Type in the phrase you’d like to translate.

  4. Click TRANSLATE and then Download JSON.

  5. This will download a JSON file with a bunch of language-code: translated-phrase mappings.

  6. Put this JSON in the jsons/ folder, and rename it to be the phrase you’ve translated in English. So if the original phrase is My phrase, you should name the file My phrase.json.

  7. Run the prettier formatter on this JSON to split it into multiple lines (this makes it easier to read and edit if translations should be updated):

     prettier sphinx_book_theme/translations/jsons/<message name>.json

  8. Run python src/sphinx_book_theme/_compile_translations.py

  9. This will generate the locale files (.mo) that Sphinx uses in its translation machinery, and put them in locales/<language-code>/LC_MESSAGES/<msg>.mo.

Sphinx should now know how to translate this message!
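
Once the .mo files exist, they can be inspected with Python’s standard gettext machinery. A minimal sketch, where the domain name, directory and message are assumptions for illustration rather than the theme’s actual values:

import gettext

# Load a compiled catalogue from locales/<language-code>/LC_MESSAGES/<domain>.mo
catalog = gettext.translation("booktheme", localedir="locales", languages=["fr"], fallback=True)
print(catalog.gettext("My phrase"))   # the translated phrase if present, otherwise the original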


To update a translation#

To update a translation, you may go to the phrase you’d like to modify in jsons/, then find the entry for the language you’d like to update, and change its value. Finally, run python src/sphinx_book_theme/_compile_translations.py and this will update the .mo files.

\ No newline at end of file
diff --git a/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/sphinxcontrib_bibtex-2.6.5.dist-info/licenses/LICENSE.html b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/sphinxcontrib_bibtex-2.6.5.dist-info/licenses/LICENSE.html
new file mode 100644
index 000000000..3fc23d826
--- /dev/null
+++ b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/sphinxcontrib_bibtex-2.6.5.dist-info/licenses/LICENSE.html
@@ -0,0 +1,539 @@
<no title> — Applied Data Analysis and Machine Learning
sphinxcontrib-bibtex is a Sphinx extension for BibTeX style citations
Copyright (c) 2011-2024 by Matthias C. M. Troffaes
All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

  • Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

  • Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS “AS IS” AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

\ No newline at end of file
diff --git a/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/zmq/backend/cffi/README.html b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/zmq/backend/cffi/README.html
new file mode 100644
index 000000000..1e7e089e2
--- /dev/null
+++ b/doc/LectureNotes/_build/html/.venv/lib/python3.13/site-packages/zmq/backend/cffi/README.html
@@ -0,0 +1,514 @@
<no title> — Applied Data Analysis and Machine Learning

PyZMQ’s CFFI support is designed only for (Unix) systems conforming to have_sys_un_h = True.

+ + \ No newline at end of file diff --git a/doc/LectureNotes/_build/html/_images/000000.png b/doc/LectureNotes/_build/html/_images/000000.png new file mode 100644 index 000000000..6495ae016 Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/000000.png differ diff --git a/doc/LectureNotes/_build/html/_images/005b82.png b/doc/LectureNotes/_build/html/_images/005b82.png new file mode 100644 index 000000000..1842c5ee3 Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/005b82.png differ diff --git a/doc/LectureNotes/_build/html/_images/00622f.png b/doc/LectureNotes/_build/html/_images/00622f.png new file mode 100644 index 000000000..1226459ce Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/00622f.png differ diff --git a/doc/LectureNotes/_build/html/_images/0072b2.png b/doc/LectureNotes/_build/html/_images/0072b2.png new file mode 100644 index 000000000..03e4db02f Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/0072b2.png differ diff --git a/doc/LectureNotes/_build/html/_images/00749c.png b/doc/LectureNotes/_build/html/_images/00749c.png new file mode 100644 index 000000000..64f9a001a Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/00749c.png differ diff --git a/doc/LectureNotes/_build/html/_images/008561.png b/doc/LectureNotes/_build/html/_images/008561.png new file mode 100644 index 000000000..8c4c56b64 Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/008561.png differ diff --git a/doc/LectureNotes/_build/html/_images/00e0e0.png b/doc/LectureNotes/_build/html/_images/00e0e0.png new file mode 100644 index 000000000..0ceb5bcd8 Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/00e0e0.png differ diff --git a/doc/LectureNotes/_build/html/_images/023b95.png b/doc/LectureNotes/_build/html/_images/023b95.png new file mode 100644 index 000000000..8b48ae837 Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/023b95.png differ diff --git a/doc/LectureNotes/_build/html/_images/024c1a.png b/doc/LectureNotes/_build/html/_images/024c1a.png new file mode 100644 index 000000000..692ad9457 Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/024c1a.png differ diff --git a/doc/LectureNotes/_build/html/_images/0550ae.png b/doc/LectureNotes/_build/html/_images/0550ae.png new file mode 100644 index 000000000..b6e505fa9 Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/0550ae.png differ diff --git a/doc/LectureNotes/_build/html/_images/080808.png b/doc/LectureNotes/_build/html/_images/080808.png new file mode 100644 index 000000000..ad395a689 Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/080808.png differ diff --git a/doc/LectureNotes/_build/html/_images/116329.png b/doc/LectureNotes/_build/html/_images/116329.png new file mode 100644 index 000000000..55ddb67a3 Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/116329.png differ diff --git a/doc/LectureNotes/_build/html/_images/116633.png b/doc/LectureNotes/_build/html/_images/116633.png new file mode 100644 index 000000000..340ef62ef Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/116633.png differ diff --git a/doc/LectureNotes/_build/html/_images/141414.png b/doc/LectureNotes/_build/html/_images/141414.png new file mode 100644 index 000000000..4aa24384e Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/141414.png differ diff --git a/doc/LectureNotes/_build/html/_images/18c1c4.png b/doc/LectureNotes/_build/html/_images/18c1c4.png new file 
mode 100644 index 000000000..1cfdaff04 Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/18c1c4.png differ diff --git a/doc/LectureNotes/_build/html/_images/1e1e1e.png b/doc/LectureNotes/_build/html/_images/1e1e1e.png new file mode 100644 index 000000000..bd434c627 Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/1e1e1e.png differ diff --git a/doc/LectureNotes/_build/html/_images/24292f.png b/doc/LectureNotes/_build/html/_images/24292f.png new file mode 100644 index 000000000..8a6e5b70d Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/24292f.png differ diff --git a/doc/LectureNotes/_build/html/_images/3d73a9.png b/doc/LectureNotes/_build/html/_images/3d73a9.png new file mode 100644 index 000000000..bb65f9821 Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/3d73a9.png differ diff --git a/doc/LectureNotes/_build/html/_images/437a6b.png b/doc/LectureNotes/_build/html/_images/437a6b.png new file mode 100644 index 000000000..3be95ecdd Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/437a6b.png differ diff --git a/doc/LectureNotes/_build/html/_images/515151.png b/doc/LectureNotes/_build/html/_images/515151.png new file mode 100644 index 000000000..491fc486c Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/515151.png differ diff --git a/doc/LectureNotes/_build/html/_images/5391cf.png b/doc/LectureNotes/_build/html/_images/5391cf.png new file mode 100644 index 000000000..9676ecf32 Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/5391cf.png differ diff --git a/doc/LectureNotes/_build/html/_images/5ca7e4.png b/doc/LectureNotes/_build/html/_images/5ca7e4.png new file mode 100644 index 000000000..b580c19ec Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/5ca7e4.png differ diff --git a/doc/LectureNotes/_build/html/_images/622cbc.png b/doc/LectureNotes/_build/html/_images/622cbc.png new file mode 100644 index 000000000..3591ab100 Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/622cbc.png differ diff --git a/doc/LectureNotes/_build/html/_images/66707b.png b/doc/LectureNotes/_build/html/_images/66707b.png new file mode 100644 index 000000000..f4189a06c Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/66707b.png differ diff --git a/doc/LectureNotes/_build/html/_images/66ccee.png b/doc/LectureNotes/_build/html/_images/66ccee.png new file mode 100644 index 000000000..a83dab6e2 Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/66ccee.png differ diff --git a/doc/LectureNotes/_build/html/_images/66e9ec.png b/doc/LectureNotes/_build/html/_images/66e9ec.png new file mode 100644 index 000000000..1c98cea18 Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/66e9ec.png differ diff --git a/doc/LectureNotes/_build/html/_images/6730c5.png b/doc/LectureNotes/_build/html/_images/6730c5.png new file mode 100644 index 000000000..38814dbc4 Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/6730c5.png differ diff --git a/doc/LectureNotes/_build/html/_images/6e7781.png b/doc/LectureNotes/_build/html/_images/6e7781.png new file mode 100644 index 000000000..db5ddb9ea Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/6e7781.png differ diff --git a/doc/LectureNotes/_build/html/_images/6f98b3.png b/doc/LectureNotes/_build/html/_images/6f98b3.png new file mode 100644 index 000000000..fbaa00f2f Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/6f98b3.png differ diff --git 
a/doc/LectureNotes/_build/html/_images/702c00.png b/doc/LectureNotes/_build/html/_images/702c00.png new file mode 100644 index 000000000..64de65cc3 Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/702c00.png differ diff --git a/doc/LectureNotes/_build/html/_images/72f088.png b/doc/LectureNotes/_build/html/_images/72f088.png new file mode 100644 index 000000000..e624bc7f6 Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/72f088.png differ diff --git a/doc/LectureNotes/_build/html/_images/737373.png b/doc/LectureNotes/_build/html/_images/737373.png new file mode 100644 index 000000000..436059c52 Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/737373.png differ diff --git a/doc/LectureNotes/_build/html/_images/797979.png b/doc/LectureNotes/_build/html/_images/797979.png new file mode 100644 index 000000000..5642e0e9e Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/797979.png differ diff --git a/doc/LectureNotes/_build/html/_images/7998f2.png b/doc/LectureNotes/_build/html/_images/7998f2.png new file mode 100644 index 000000000..fc8b9ec22 Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/7998f2.png differ diff --git a/doc/LectureNotes/_build/html/_images/79c0ff.png b/doc/LectureNotes/_build/html/_images/79c0ff.png new file mode 100644 index 000000000..0c15a6509 Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/79c0ff.png differ diff --git a/doc/LectureNotes/_build/html/_images/7ee787.png b/doc/LectureNotes/_build/html/_images/7ee787.png new file mode 100644 index 000000000..639863c5c Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/7ee787.png differ diff --git a/doc/LectureNotes/_build/html/_images/7f4707.png b/doc/LectureNotes/_build/html/_images/7f4707.png new file mode 100644 index 000000000..248de1972 Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/7f4707.png differ diff --git a/doc/LectureNotes/_build/html/_images/8045e5.png b/doc/LectureNotes/_build/html/_images/8045e5.png new file mode 100644 index 000000000..08ab32e85 Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/8045e5.png differ diff --git a/doc/LectureNotes/_build/html/_images/81b19b.png b/doc/LectureNotes/_build/html/_images/81b19b.png new file mode 100644 index 000000000..e2b23db8f Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/81b19b.png differ diff --git a/doc/LectureNotes/_build/html/_images/8250df.png b/doc/LectureNotes/_build/html/_images/8250df.png new file mode 100644 index 000000000..fd096abf0 Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/8250df.png differ diff --git a/doc/LectureNotes/_build/html/_images/8786ac.png b/doc/LectureNotes/_build/html/_images/8786ac.png new file mode 100644 index 000000000..995c0e551 Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/8786ac.png differ diff --git a/doc/LectureNotes/_build/html/_images/8a4600.png b/doc/LectureNotes/_build/html/_images/8a4600.png new file mode 100644 index 000000000..2fc6f5809 Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/8a4600.png differ diff --git a/doc/LectureNotes/_build/html/_images/8b949e.png b/doc/LectureNotes/_build/html/_images/8b949e.png new file mode 100644 index 000000000..ad7978584 Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/8b949e.png differ diff --git a/doc/LectureNotes/_build/html/_images/8c8c8c.png b/doc/LectureNotes/_build/html/_images/8c8c8c.png new file mode 100644 index 000000000..b8cd92d80 
Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/8c8c8c.png differ diff --git a/doc/LectureNotes/_build/html/_images/912583.png b/doc/LectureNotes/_build/html/_images/912583.png new file mode 100644 index 000000000..a71611eae Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/912583.png differ diff --git a/doc/LectureNotes/_build/html/_images/91cbff.png b/doc/LectureNotes/_build/html/_images/91cbff.png new file mode 100644 index 000000000..58e7706f7 Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/91cbff.png differ diff --git a/doc/LectureNotes/_build/html/_images/953800.png b/doc/LectureNotes/_build/html/_images/953800.png new file mode 100644 index 000000000..b102d5e58 Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/953800.png differ diff --git a/doc/LectureNotes/_build/html/_images/974eb7.png b/doc/LectureNotes/_build/html/_images/974eb7.png new file mode 100644 index 000000000..0cd42cd71 Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/974eb7.png differ diff --git a/doc/LectureNotes/_build/html/_images/98661b.png b/doc/LectureNotes/_build/html/_images/98661b.png new file mode 100644 index 000000000..030036eaf Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/98661b.png differ diff --git a/doc/LectureNotes/_build/html/_images/996b00.png b/doc/LectureNotes/_build/html/_images/996b00.png new file mode 100644 index 000000000..1f1404f3f Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/996b00.png differ diff --git a/doc/LectureNotes/_build/html/_images/9e86c8.png b/doc/LectureNotes/_build/html/_images/9e86c8.png new file mode 100644 index 000000000..d66df7500 Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/9e86c8.png differ diff --git a/doc/LectureNotes/_build/html/_images/9e8741.png b/doc/LectureNotes/_build/html/_images/9e8741.png new file mode 100644 index 000000000..53d7ec828 Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/9e8741.png differ diff --git a/doc/LectureNotes/_build/html/_images/9f4e55.png b/doc/LectureNotes/_build/html/_images/9f4e55.png new file mode 100644 index 000000000..422fefee1 Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/9f4e55.png differ diff --git a/doc/LectureNotes/_build/html/_images/a0111f.png b/doc/LectureNotes/_build/html/_images/a0111f.png new file mode 100644 index 000000000..dc3d2c82d Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/a0111f.png differ diff --git a/doc/LectureNotes/_build/html/_images/a11y-dark.png b/doc/LectureNotes/_build/html/_images/a11y-dark.png new file mode 100644 index 000000000..08447103a Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/a11y-dark.png differ diff --git a/doc/LectureNotes/_build/html/_images/a11y-high-contrast-dark.png b/doc/LectureNotes/_build/html/_images/a11y-high-contrast-dark.png new file mode 100644 index 000000000..6e422ed62 Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/a11y-high-contrast-dark.png differ diff --git a/doc/LectureNotes/_build/html/_images/a11y-high-contrast-light.png b/doc/LectureNotes/_build/html/_images/a11y-high-contrast-light.png new file mode 100644 index 000000000..6bb19b562 Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/a11y-high-contrast-light.png differ diff --git a/doc/LectureNotes/_build/html/_images/a11y-light.png b/doc/LectureNotes/_build/html/_images/a11y-light.png new file mode 100644 index 000000000..7585d6db7 Binary files /dev/null and 
b/doc/LectureNotes/_build/html/_images/a11y-light.png differ diff --git a/doc/LectureNotes/_build/html/_images/a12236.png b/doc/LectureNotes/_build/html/_images/a12236.png new file mode 100644 index 000000000..a61aa4df5 Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/a12236.png differ diff --git a/doc/LectureNotes/_build/html/_images/a25e53.png b/doc/LectureNotes/_build/html/_images/a25e53.png new file mode 100644 index 000000000..67d5db79c Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/a25e53.png differ diff --git a/doc/LectureNotes/_build/html/_images/a2bffc.png b/doc/LectureNotes/_build/html/_images/a2bffc.png new file mode 100644 index 000000000..74fd8d7fd Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/a2bffc.png differ diff --git a/doc/LectureNotes/_build/html/_images/a5d6ff.png b/doc/LectureNotes/_build/html/_images/a5d6ff.png new file mode 100644 index 000000000..85dc8ab5c Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/a5d6ff.png differ diff --git a/doc/LectureNotes/_build/html/_images/ab6369.png b/doc/LectureNotes/_build/html/_images/ab6369.png new file mode 100644 index 000000000..bbe790f35 Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/ab6369.png differ diff --git a/doc/LectureNotes/_build/html/_images/abe338.png b/doc/LectureNotes/_build/html/_images/abe338.png new file mode 100644 index 000000000..f71421dc4 Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/abe338.png differ diff --git a/doc/LectureNotes/_build/html/_images/b19db4.png b/doc/LectureNotes/_build/html/_images/b19db4.png new file mode 100644 index 000000000..2bcf14bf7 Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/b19db4.png differ diff --git a/doc/LectureNotes/_build/html/_images/b1bac4.png b/doc/LectureNotes/_build/html/_images/b1bac4.png new file mode 100644 index 000000000..fa99254a4 Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/b1bac4.png differ diff --git a/doc/LectureNotes/_build/html/_images/b35900.png b/doc/LectureNotes/_build/html/_images/b35900.png new file mode 100644 index 000000000..af6314be1 Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/b35900.png differ diff --git a/doc/LectureNotes/_build/html/_images/b89784.png b/doc/LectureNotes/_build/html/_images/b89784.png new file mode 100644 index 000000000..553c732ed Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/b89784.png differ diff --git a/doc/LectureNotes/_build/html/_images/bbbbbb.png b/doc/LectureNotes/_build/html/_images/bbbbbb.png new file mode 100644 index 000000000..45fa9bad4 Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/bbbbbb.png differ diff --git a/doc/LectureNotes/_build/html/_images/bf5400.png b/doc/LectureNotes/_build/html/_images/bf5400.png new file mode 100644 index 000000000..ba4f6a062 Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/bf5400.png differ diff --git a/doc/LectureNotes/_build/html/_images/blinds-dark.png b/doc/LectureNotes/_build/html/_images/blinds-dark.png new file mode 100644 index 000000000..fca57415b Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/blinds-dark.png differ diff --git a/doc/LectureNotes/_build/html/_images/blinds-light.png b/doc/LectureNotes/_build/html/_images/blinds-light.png new file mode 100644 index 000000000..c715ebf74 Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/blinds-light.png differ diff --git 
a/doc/LectureNotes/_build/html/_images/c4a2f5.png b/doc/LectureNotes/_build/html/_images/c4a2f5.png new file mode 100644 index 000000000..815fa9a09 Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/c4a2f5.png differ diff --git a/doc/LectureNotes/_build/html/_images/c5e478.png b/doc/LectureNotes/_build/html/_images/c5e478.png new file mode 100644 index 000000000..450f66bdf Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/c5e478.png differ diff --git a/doc/LectureNotes/_build/html/_images/c9d1d9.png b/doc/LectureNotes/_build/html/_images/c9d1d9.png new file mode 100644 index 000000000..56df44316 Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/c9d1d9.png differ diff --git a/doc/LectureNotes/_build/html/_images/caab6d.png b/doc/LectureNotes/_build/html/_images/caab6d.png new file mode 100644 index 000000000..a20ea8967 Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/caab6d.png differ diff --git a/doc/LectureNotes/_build/html/_images/cc398b.png b/doc/LectureNotes/_build/html/_images/cc398b.png new file mode 100644 index 000000000..b05f6b3f6 Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/cc398b.png differ diff --git a/doc/LectureNotes/_build/html/_images/ccbb44.png b/doc/LectureNotes/_build/html/_images/ccbb44.png new file mode 100644 index 000000000..20d1da3f3 Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/ccbb44.png differ diff --git a/doc/LectureNotes/_build/html/_images/cf222e.png b/doc/LectureNotes/_build/html/_images/cf222e.png new file mode 100644 index 000000000..eba14bfb1 Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/cf222e.png differ diff --git a/doc/LectureNotes/_build/html/_images/d166a3.png b/doc/LectureNotes/_build/html/_images/d166a3.png new file mode 100644 index 000000000..34af2ff43 Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/d166a3.png differ diff --git a/doc/LectureNotes/_build/html/_images/d2a8ff.png b/doc/LectureNotes/_build/html/_images/d2a8ff.png new file mode 100644 index 000000000..d3ba734d9 Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/d2a8ff.png differ diff --git a/doc/LectureNotes/_build/html/_images/d4d0ab.png b/doc/LectureNotes/_build/html/_images/d4d0ab.png new file mode 100644 index 000000000..4c7b827a4 Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/d4d0ab.png differ diff --git a/doc/LectureNotes/_build/html/_images/d71835.png b/doc/LectureNotes/_build/html/_images/d71835.png new file mode 100644 index 000000000..aee961355 Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/d71835.png differ diff --git a/doc/LectureNotes/_build/html/_images/d9dee3.png b/doc/LectureNotes/_build/html/_images/d9dee3.png new file mode 100644 index 000000000..61bac902b Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/d9dee3.png differ diff --git a/doc/LectureNotes/_build/html/_images/dbb7ff.png b/doc/LectureNotes/_build/html/_images/dbb7ff.png new file mode 100644 index 000000000..fe7039bd1 Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/dbb7ff.png differ diff --git a/doc/LectureNotes/_build/html/_images/dcc6e0.png b/doc/LectureNotes/_build/html/_images/dcc6e0.png new file mode 100644 index 000000000..ad963c944 Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/dcc6e0.png differ diff --git a/doc/LectureNotes/_build/html/_images/ec8e2c.png b/doc/LectureNotes/_build/html/_images/ec8e2c.png new file mode 100644 index 000000000..857cbbd5a 
Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/ec8e2c.png differ diff --git a/doc/LectureNotes/_build/html/_images/ee6677.png b/doc/LectureNotes/_build/html/_images/ee6677.png new file mode 100644 index 000000000..a074ed315 Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/ee6677.png differ diff --git a/doc/LectureNotes/_build/html/_images/f26196.png b/doc/LectureNotes/_build/html/_images/f26196.png new file mode 100644 index 000000000..136cec902 Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/f26196.png differ diff --git a/doc/LectureNotes/_build/html/_images/f5a394.png b/doc/LectureNotes/_build/html/_images/f5a394.png new file mode 100644 index 000000000..4650b86f2 Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/f5a394.png differ diff --git a/doc/LectureNotes/_build/html/_images/f5ab35.png b/doc/LectureNotes/_build/html/_images/f5ab35.png new file mode 100644 index 000000000..5df91ee45 Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/f5ab35.png differ diff --git a/doc/LectureNotes/_build/html/_images/f5f5f5.png b/doc/LectureNotes/_build/html/_images/f5f5f5.png new file mode 100644 index 000000000..6703ca30c Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/f5f5f5.png differ diff --git a/doc/LectureNotes/_build/html/_images/f78c6c.png b/doc/LectureNotes/_build/html/_images/f78c6c.png new file mode 100644 index 000000000..5cf8e8bcd Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/f78c6c.png differ diff --git a/doc/LectureNotes/_build/html/_images/f8f8f2.png b/doc/LectureNotes/_build/html/_images/f8f8f2.png new file mode 100644 index 000000000..34d0957f3 Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/f8f8f2.png differ diff --git a/doc/LectureNotes/_build/html/_images/fad000.png b/doc/LectureNotes/_build/html/_images/fad000.png new file mode 100644 index 000000000..58ac7e8bf Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/fad000.png differ diff --git a/doc/LectureNotes/_build/html/_images/fdac54.png b/doc/LectureNotes/_build/html/_images/fdac54.png new file mode 100644 index 000000000..9f681b21d Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/fdac54.png differ diff --git a/doc/LectureNotes/_build/html/_images/fefeff.png b/doc/LectureNotes/_build/html/_images/fefeff.png new file mode 100644 index 000000000..ab6d16c80 Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/fefeff.png differ diff --git a/doc/LectureNotes/_build/html/_images/ff7b72.png b/doc/LectureNotes/_build/html/_images/ff7b72.png new file mode 100644 index 000000000..d5a3abbcb Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/ff7b72.png differ diff --git a/doc/LectureNotes/_build/html/_images/ff9492.png b/doc/LectureNotes/_build/html/_images/ff9492.png new file mode 100644 index 000000000..5368ed567 Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/ff9492.png differ diff --git a/doc/LectureNotes/_build/html/_images/ffa07a.png b/doc/LectureNotes/_build/html/_images/ffa07a.png new file mode 100644 index 000000000..c771ccf9d Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/ffa07a.png differ diff --git a/doc/LectureNotes/_build/html/_images/ffa657.png b/doc/LectureNotes/_build/html/_images/ffa657.png new file mode 100644 index 000000000..ea0e84e3d Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/ffa657.png differ diff --git a/doc/LectureNotes/_build/html/_images/ffb757.png 
b/doc/LectureNotes/_build/html/_images/ffb757.png new file mode 100644 index 000000000..cb52b6c2a Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/ffb757.png differ diff --git a/doc/LectureNotes/_build/html/_images/ffd700.png b/doc/LectureNotes/_build/html/_images/ffd700.png new file mode 100644 index 000000000..86dca1571 Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/ffd700.png differ diff --git a/doc/LectureNotes/_build/html/_images/ffd900.png b/doc/LectureNotes/_build/html/_images/ffd900.png new file mode 100644 index 000000000..786918447 Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/ffd900.png differ diff --git a/doc/LectureNotes/_build/html/_images/github-dark-colorblind.png b/doc/LectureNotes/_build/html/_images/github-dark-colorblind.png new file mode 100644 index 000000000..96cf5944d Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/github-dark-colorblind.png differ diff --git a/doc/LectureNotes/_build/html/_images/github-dark-high-contrast.png b/doc/LectureNotes/_build/html/_images/github-dark-high-contrast.png new file mode 100644 index 000000000..f73c3480a Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/github-dark-high-contrast.png differ diff --git a/doc/LectureNotes/_build/html/_images/github-dark.png b/doc/LectureNotes/_build/html/_images/github-dark.png new file mode 100644 index 000000000..50bac6465 Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/github-dark.png differ diff --git a/doc/LectureNotes/_build/html/_images/github-light-colorblind.png b/doc/LectureNotes/_build/html/_images/github-light-colorblind.png new file mode 100644 index 000000000..d20cedcaf Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/github-light-colorblind.png differ diff --git a/doc/LectureNotes/_build/html/_images/github-light-high-contrast.png b/doc/LectureNotes/_build/html/_images/github-light-high-contrast.png new file mode 100644 index 000000000..9a2bd4dbe Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/github-light-high-contrast.png differ diff --git a/doc/LectureNotes/_build/html/_images/github-light.png b/doc/LectureNotes/_build/html/_images/github-light.png new file mode 100644 index 000000000..de457f2ba Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/github-light.png differ diff --git a/doc/LectureNotes/_build/html/_images/gotthard-dark.png b/doc/LectureNotes/_build/html/_images/gotthard-dark.png new file mode 100644 index 000000000..4976ac1d6 Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/gotthard-dark.png differ diff --git a/doc/LectureNotes/_build/html/_images/gotthard-light.png b/doc/LectureNotes/_build/html/_images/gotthard-light.png new file mode 100644 index 000000000..b9c67150d Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/gotthard-light.png differ diff --git a/doc/LectureNotes/_build/html/_images/greative.png b/doc/LectureNotes/_build/html/_images/greative.png new file mode 100644 index 000000000..935a4b6cd Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/greative.png differ diff --git a/doc/LectureNotes/_build/html/_images/pitaya-smoothie.png b/doc/LectureNotes/_build/html/_images/pitaya-smoothie.png new file mode 100644 index 000000000..ce1f7ba74 Binary files /dev/null and b/doc/LectureNotes/_build/html/_images/pitaya-smoothie.png differ diff --git a/doc/LectureNotes/_build/html/_panels_static/panels-main.c949a650a448cc0ae9fd3441c0e17fb0.css 
b/doc/LectureNotes/_build/html/_panels_static/panels-main.c949a650a448cc0ae9fd3441c0e17fb0.css new file mode 100644 index 000000000..fc14abc85 --- /dev/null +++ b/doc/LectureNotes/_build/html/_panels_static/panels-main.c949a650a448cc0ae9fd3441c0e17fb0.css @@ -0,0 +1 @@ +details.dropdown .summary-title{padding-right:3em !important;-moz-user-select:none;-ms-user-select:none;-webkit-user-select:none;user-select:none}details.dropdown:hover{cursor:pointer}details.dropdown .summary-content{cursor:default}details.dropdown summary{list-style:none;padding:1em}details.dropdown summary .octicon.no-title{vertical-align:middle}details.dropdown[open] summary .octicon.no-title{visibility:hidden}details.dropdown summary::-webkit-details-marker{display:none}details.dropdown summary:focus{outline:none}details.dropdown summary:hover .summary-up svg,details.dropdown summary:hover .summary-down svg{opacity:1}details.dropdown .summary-up svg,details.dropdown .summary-down svg{display:block;opacity:.6}details.dropdown .summary-up,details.dropdown .summary-down{pointer-events:none;position:absolute;right:1em;top:.75em}details.dropdown[open] .summary-down{visibility:hidden}details.dropdown:not([open]) .summary-up{visibility:hidden}details.dropdown.fade-in[open] summary~*{-moz-animation:panels-fade-in .5s ease-in-out;-webkit-animation:panels-fade-in .5s ease-in-out;animation:panels-fade-in .5s ease-in-out}details.dropdown.fade-in-slide-down[open] summary~*{-moz-animation:panels-fade-in .5s ease-in-out, panels-slide-down .5s ease-in-out;-webkit-animation:panels-fade-in .5s ease-in-out, panels-slide-down .5s ease-in-out;animation:panels-fade-in .5s ease-in-out, panels-slide-down .5s ease-in-out}@keyframes panels-fade-in{0%{opacity:0}100%{opacity:1}}@keyframes panels-slide-down{0%{transform:translate(0, -10px)}100%{transform:translate(0, 0)}}.octicon{display:inline-block;fill:currentColor;vertical-align:text-top}.tabbed-content{box-shadow:0 -.0625rem var(--tabs-color-overline),0 .0625rem var(--tabs-color-underline);display:none;order:99;padding-bottom:.75rem;padding-top:.75rem;width:100%}.tabbed-content>:first-child{margin-top:0 !important}.tabbed-content>:last-child{margin-bottom:0 !important}.tabbed-content>.tabbed-set{margin:0}.tabbed-set{border-radius:.125rem;display:flex;flex-wrap:wrap;margin:1em 0;position:relative}.tabbed-set>input{opacity:0;position:absolute}.tabbed-set>input:checked+label{border-color:var(--tabs-color-label-active);color:var(--tabs-color-label-active)}.tabbed-set>input:checked+label+.tabbed-content{display:block}.tabbed-set>input:focus+label{outline-style:auto}.tabbed-set>input:not(.focus-visible)+label{outline:none;-webkit-tap-highlight-color:transparent}.tabbed-set>label{border-bottom:.125rem solid transparent;color:var(--tabs-color-label-inactive);cursor:pointer;font-size:var(--tabs-size-label);font-weight:700;padding:1em 1.25em .5em;transition:color 250ms;width:auto;z-index:1}html .tabbed-set>label:hover{color:var(--tabs-color-label-active)} diff --git a/doc/LectureNotes/_build/html/_panels_static/panels-variables.06eb56fa6e07937060861dad626602ad.css b/doc/LectureNotes/_build/html/_panels_static/panels-variables.06eb56fa6e07937060861dad626602ad.css new file mode 100644 index 000000000..adc616622 --- /dev/null +++ b/doc/LectureNotes/_build/html/_panels_static/panels-variables.06eb56fa6e07937060861dad626602ad.css @@ -0,0 +1,7 @@ +:root { +--tabs-color-label-active: hsla(231, 99%, 66%, 1); +--tabs-color-label-inactive: rgba(178, 206, 245, 0.62); +--tabs-color-overline: rgb(207, 236, 238); 
+--tabs-color-underline: rgb(207, 236, 238); +--tabs-size-label: 1rem; +} \ No newline at end of file diff --git a/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/a11y_pygments/a11y_dark/README.md b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/a11y_pygments/a11y_dark/README.md new file mode 100644 index 000000000..23af038a3 --- /dev/null +++ b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/a11y_pygments/a11y_dark/README.md @@ -0,0 +1,26 @@ +# A11Y Dark + +This is the Pygments implementation of a11y-dark from [Eric Bailey's +accessible themes for syntax +highlighting](https://github.com/ericwbailey/a11y-syntax-highlighting) + +![Screenshot of the a11y-dark theme in a bash script](./images/a11y-dark.png) + +## Colors + +Background color: ![#2b2b2b](https://via.placeholder.com/20/2b2b2b/2b2b2b.png) `#2b2b2b` + +Highlight color: ![#ffd9002e](https://via.placeholder.com/20/ffd9002e/ffd9002e.png) `#ffd9002e` + +**WCAG compliance** + +| Color | Hex | Ratio | Normal text | Large text | +| ------------------------------------------------- | --------- | -------- | ----------- | ---------- | +| ![#d4d0ab](../../a11y_pygments/assets/d4d0ab.png) | `#d4d0ab` | 9.0 : 1 | AAA | AAA | +| ![#ffa07a](../../a11y_pygments/assets/ffa07a.png) | `#ffa07a` | 7.1 : 1 | AAA | AAA | +| ![#f5ab35](../../a11y_pygments/assets/f5ab35.png) | `#f5ab35` | 7.3 : 1 | AAA | AAA | +| ![#ffd700](../../a11y_pygments/assets/ffd700.png) | `#ffd700` | 10.1 : 1 | AAA | AAA | +| ![#abe338](../../a11y_pygments/assets/abe338.png) | `#abe338` | 9.3 : 1 | AAA | AAA | +| ![#00e0e0](../../a11y_pygments/assets/00e0e0.png) | `#00e0e0` | 8.6 : 1 | AAA | AAA | +| ![#dcc6e0](../../a11y_pygments/assets/dcc6e0.png) | `#dcc6e0` | 8.9 : 1 | AAA | AAA | +| ![#f8f8f2](../../a11y_pygments/assets/f8f8f2.png) | `#f8f8f2` | 13.3 : 1 | AAA | AAA | diff --git a/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/a11y_pygments/a11y_high_contrast_dark/README.md b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/a11y_pygments/a11y_high_contrast_dark/README.md new file mode 100644 index 000000000..575b3759b --- /dev/null +++ b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/a11y_pygments/a11y_high_contrast_dark/README.md @@ -0,0 +1,22 @@ +# A11Y High Contrast Dark + +This style mimics the a11 light theme from eric bailey's accessible themes. 
+ +![Screenshot of the a11y-high-contrast-dark theme in a bash script](./images/a11y-high-contrast-dark.png) + +## Colors + +Background color: ![#2b2b2b](https://via.placeholder.com/20/2b2b2b/2b2b2b.png) `#2b2b2b` + +Highlight color: ![#ffd9002e](https://via.placeholder.com/20/ffd9002e/ffd9002e.png) `#ffd9002e` + +**WCAG compliance** + +| Color | Hex | Ratio | Normal text | Large text | +| ------------------------------------------------- | --------- | -------- | ----------- | ---------- | +| ![#ffd900](../../a11y_pygments/assets/ffd900.png) | `#ffd900` | 10.2 : 1 | AAA | AAA | +| ![#ffa07a](../../a11y_pygments/assets/ffa07a.png) | `#ffa07a` | 7.1 : 1 | AAA | AAA | +| ![#abe338](../../a11y_pygments/assets/abe338.png) | `#abe338` | 9.3 : 1 | AAA | AAA | +| ![#00e0e0](../../a11y_pygments/assets/00e0e0.png) | `#00e0e0` | 8.6 : 1 | AAA | AAA | +| ![#dcc6e0](../../a11y_pygments/assets/dcc6e0.png) | `#dcc6e0` | 8.9 : 1 | AAA | AAA | +| ![#f8f8f2](../../a11y_pygments/assets/f8f8f2.png) | `#f8f8f2` | 13.3 : 1 | AAA | AAA | diff --git a/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/a11y_pygments/a11y_high_contrast_light/README.md b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/a11y_pygments/a11y_high_contrast_light/README.md new file mode 100644 index 000000000..a0b6be8fc --- /dev/null +++ b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/a11y_pygments/a11y_high_contrast_light/README.md @@ -0,0 +1,24 @@ +# A11Y High Contrast Light + +This style mimics the a11y-light theme (but with more contrast) from eric bailey's accessible themes. + +![Screenshot of the a11y-high-contrast-light theme in a bash script](./images/a11y-high-contrast-light.png) + +## Colors + +Background color: ![#fefefe](https://via.placeholder.com/20/fefefe/fefefe.png) `#fefefe` + +Highlight color: ![#fae4c2](https://via.placeholder.com/20/fae4c2/fae4c2.png) `#fae4c2` + +**WCAG compliance** + +| Color | Hex | Ratio | Normal text | Large text | +| ------------------------------------------------- | --------- | -------- | ----------- | ---------- | +| ![#515151](../../a11y_pygments/assets/515151.png) | `#515151` | 7.9 : 1 | AAA | AAA | +| ![#a12236](../../a11y_pygments/assets/a12236.png) | `#a12236` | 7.4 : 1 | AAA | AAA | +| ![#7f4707](../../a11y_pygments/assets/7f4707.png) | `#7f4707` | 7.4 : 1 | AAA | AAA | +| ![#912583](../../a11y_pygments/assets/912583.png) | `#912583` | 7.4 : 1 | AAA | AAA | +| ![#00622f](../../a11y_pygments/assets/00622f.png) | `#00622f` | 7.5 : 1 | AAA | AAA | +| ![#005b82](../../a11y_pygments/assets/005b82.png) | `#005b82` | 7.4 : 1 | AAA | AAA | +| ![#6730c5](../../a11y_pygments/assets/6730c5.png) | `#6730c5` | 7.4 : 1 | AAA | AAA | +| ![#080808](../../a11y_pygments/assets/080808.png) | `#080808` | 19.9 : 1 | AAA | AAA | diff --git a/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/a11y_pygments/a11y_light/README.md b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/a11y_pygments/a11y_light/README.md new file mode 100644 index 000000000..911cef825 --- /dev/null +++ b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/a11y_pygments/a11y_light/README.md @@ -0,0 +1,23 @@ +# A11Y Light + +This style inspired by the a11y-light theme from eric bailey's accessible themes. 
+ +![Screenshot of the a11y-light theme in a bash script](./images/a11y-light.png) + +## Colors + +Background color: ![#f2f2f2](https://via.placeholder.com/20/f2f2f2/f2f2f2.png) `#f2f2f2` + +Highlight color: ![#fdf2e2](https://via.placeholder.com/20/fdf2e2/fdf2e2.png) `#fdf2e2` + +**WCAG compliance** + +| Color | Hex | Ratio | Normal text | Large text | +| ------------------------------------------------- | --------- | -------- | ----------- | ---------- | +| ![#515151](../../a11y_pygments/assets/515151.png) | `#515151` | 7.1 : 1 | AAA | AAA | +| ![#d71835](../../a11y_pygments/assets/d71835.png) | `#d71835` | 4.6 : 1 | AA | AAA | +| ![#7f4707](../../a11y_pygments/assets/7f4707.png) | `#7f4707` | 6.7 : 1 | AA | AAA | +| ![#116633](../../a11y_pygments/assets/116633.png) | `#116633` | 6.3 : 1 | AA | AAA | +| ![#00749c](../../a11y_pygments/assets/00749c.png) | `#00749c` | 4.7 : 1 | AA | AAA | +| ![#8045e5](../../a11y_pygments/assets/8045e5.png) | `#8045e5` | 4.8 : 1 | AA | AAA | +| ![#1e1e1e](../../a11y_pygments/assets/1e1e1e.png) | `#1e1e1e` | 14.9 : 1 | AAA | AAA | diff --git a/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/a11y_pygments/blinds_dark/README.md b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/a11y_pygments/blinds_dark/README.md new file mode 100644 index 000000000..62529463f --- /dev/null +++ b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/a11y_pygments/blinds_dark/README.md @@ -0,0 +1,23 @@ +# Blinds Dark + +This style mimics the blinds dark theme from vscode themes. + +![Screenshot of the blinds-dark theme in a bash script](./images/blinds-dark.png) + +## Colors + +Background color: ![#242424](https://via.placeholder.com/20/242424/242424.png) `#242424` + +Highlight color: ![#66666691](https://via.placeholder.com/20/66666691/66666691.png) `#66666691` + +**WCAG compliance** + +| Color | Hex | Ratio | Normal text | Large text | +| ------------------------------------------------- | --------- | ------- | ----------- | ---------- | +| ![#8c8c8c](../../a11y_pygments/assets/8c8c8c.png) | `#8c8c8c` | 4.6 : 1 | AA | AAA | +| ![#ee6677](../../a11y_pygments/assets/ee6677.png) | `#ee6677` | 5.0 : 1 | AA | AAA | +| ![#ccbb44](../../a11y_pygments/assets/ccbb44.png) | `#ccbb44` | 8.0 : 1 | AAA | AAA | +| ![#66ccee](../../a11y_pygments/assets/66ccee.png) | `#66ccee` | 8.5 : 1 | AAA | AAA | +| ![#5391cf](../../a11y_pygments/assets/5391cf.png) | `#5391cf` | 4.7 : 1 | AA | AAA | +| ![#d166a3](../../a11y_pygments/assets/d166a3.png) | `#d166a3` | 4.5 : 1 | AA | AAA | +| ![#bbbbbb](../../a11y_pygments/assets/bbbbbb.png) | `#bbbbbb` | 8.1 : 1 | AAA | AAA | diff --git a/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/a11y_pygments/blinds_light/README.md b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/a11y_pygments/blinds_light/README.md new file mode 100644 index 000000000..28e724e59 --- /dev/null +++ b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/a11y_pygments/blinds_light/README.md @@ -0,0 +1,23 @@ +# Blinds Light + +This style mimics the blinds light theme from vscode themes. 
+ +![Screenshot of the blinds-light theme in a bash script](./images/blinds-light.png) + +## Colors + +Background color: ![#fcfcfc](https://via.placeholder.com/20/fcfcfc/fcfcfc.png) `#fcfcfc` + +Highlight color: ![#add6ff](https://via.placeholder.com/20/add6ff/add6ff.png) `#add6ff` + +**WCAG compliance** + +| Color | Hex | Ratio | Normal text | Large text | +| ------------------------------------------------- | --------- | -------- | ----------- | ---------- | +| ![#737373](../../a11y_pygments/assets/737373.png) | `#737373` | 4.6 : 1 | AA | AAA | +| ![#bf5400](../../a11y_pygments/assets/bf5400.png) | `#bf5400` | 4.6 : 1 | AA | AAA | +| ![#996b00](../../a11y_pygments/assets/996b00.png) | `#996b00` | 4.6 : 1 | AA | AAA | +| ![#008561](../../a11y_pygments/assets/008561.png) | `#008561` | 4.5 : 1 | AA | AAA | +| ![#0072b2](../../a11y_pygments/assets/0072b2.png) | `#0072b2` | 5.1 : 1 | AA | AAA | +| ![#cc398b](../../a11y_pygments/assets/cc398b.png) | `#cc398b` | 4.5 : 1 | AA | AAA | +| ![#000000](../../a11y_pygments/assets/000000.png) | `#000000` | 20.5 : 1 | AAA | AAA | diff --git a/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/a11y_pygments/github_dark/README.md b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/a11y_pygments/github_dark/README.md new file mode 100644 index 000000000..0e24df43e --- /dev/null +++ b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/a11y_pygments/github_dark/README.md @@ -0,0 +1,23 @@ +# Github Dark + +This style mimics the github dark default theme from vs code themes. + +![Screenshot of the github-dark theme in a bash script](./images/github-dark.png) + +## Colors + +Background color: ![#0d1117](https://via.placeholder.com/20/0d1117/0d1117.png) `#0d1117` + +Highlight color: ![#6e7681](https://via.placeholder.com/20/6e7681/6e7681.png) `#6e7681` + +**WCAG compliance** + +| Color | Hex | Ratio | Normal text | Large text | +| ------------------------------------------------- | --------- | -------- | ----------- | ---------- | +| ![#8b949e](../../a11y_pygments/assets/8b949e.png) | `#8b949e` | 6.2 : 1 | AA | AAA | +| ![#ff7b72](../../a11y_pygments/assets/ff7b72.png) | `#ff7b72` | 7.5 : 1 | AAA | AAA | +| ![#ffa657](../../a11y_pygments/assets/ffa657.png) | `#ffa657` | 9.8 : 1 | AAA | AAA | +| ![#7ee787](../../a11y_pygments/assets/7ee787.png) | `#7ee787` | 12.3 : 1 | AAA | AAA | +| ![#79c0ff](../../a11y_pygments/assets/79c0ff.png) | `#79c0ff` | 9.7 : 1 | AAA | AAA | +| ![#d2a8ff](../../a11y_pygments/assets/d2a8ff.png) | `#d2a8ff` | 9.7 : 1 | AAA | AAA | +| ![#c9d1d9](../../a11y_pygments/assets/c9d1d9.png) | `#c9d1d9` | 12.3 : 1 | AAA | AAA | diff --git a/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/a11y_pygments/github_dark_colorblind/README.md b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/a11y_pygments/github_dark_colorblind/README.md new file mode 100644 index 000000000..9ad72f9f9 --- /dev/null +++ b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/a11y_pygments/github_dark_colorblind/README.md @@ -0,0 +1,23 @@ +# Github Dark Colorblind + +This style mimics the github dark colorblind theme from vscode. 
+ +![Screenshot of the github-dark-colorblind theme in a bash script](./images/github-dark-colorblind.png) + +## Colors + +Background color: ![#0d1117](https://via.placeholder.com/20/0d1117/0d1117.png) `#0d1117` + +Highlight color: ![#58a6ff70](https://via.placeholder.com/20/58a6ff70/58a6ff70.png) `#58a6ff70` + +**WCAG compliance** + +| Color | Hex | Ratio | Normal text | Large text | +| ------------------------------------------------- | --------- | -------- | ----------- | ---------- | +| ![#b1bac4](../../a11y_pygments/assets/b1bac4.png) | `#b1bac4` | 9.6 : 1 | AAA | AAA | +| ![#ec8e2c](../../a11y_pygments/assets/ec8e2c.png) | `#ec8e2c` | 7.6 : 1 | AAA | AAA | +| ![#fdac54](../../a11y_pygments/assets/fdac54.png) | `#fdac54` | 10.1 : 1 | AAA | AAA | +| ![#a5d6ff](../../a11y_pygments/assets/a5d6ff.png) | `#a5d6ff` | 12.3 : 1 | AAA | AAA | +| ![#79c0ff](../../a11y_pygments/assets/79c0ff.png) | `#79c0ff` | 9.7 : 1 | AAA | AAA | +| ![#d2a8ff](../../a11y_pygments/assets/d2a8ff.png) | `#d2a8ff` | 9.7 : 1 | AAA | AAA | +| ![#c9d1d9](../../a11y_pygments/assets/c9d1d9.png) | `#c9d1d9` | 12.3 : 1 | AAA | AAA | diff --git a/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/a11y_pygments/github_dark_high_contrast/README.md b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/a11y_pygments/github_dark_high_contrast/README.md new file mode 100644 index 000000000..c395966c5 --- /dev/null +++ b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/a11y_pygments/github_dark_high_contrast/README.md @@ -0,0 +1,23 @@ +# Github Dark High Contrast + +This style mimics the github dark high contrast theme from vs code themes. + +![Screenshot of the github-dark-high-contrast theme in a bash script](./images/github-dark-high-contrast.png) + +## Colors + +Background color: ![#0d1117](https://via.placeholder.com/20/0d1117/0d1117.png) `#0d1117` + +Highlight color: ![#58a6ff70](https://via.placeholder.com/20/58a6ff70/58a6ff70.png) `#58a6ff70` + +**WCAG compliance** + +| Color | Hex | Ratio | Normal text | Large text | +| ------------------------------------------------- | --------- | -------- | ----------- | ---------- | +| ![#d9dee3](../../a11y_pygments/assets/d9dee3.png) | `#d9dee3` | 14.0 : 1 | AAA | AAA | +| ![#ff9492](../../a11y_pygments/assets/ff9492.png) | `#ff9492` | 8.9 : 1 | AAA | AAA | +| ![#ffb757](../../a11y_pygments/assets/ffb757.png) | `#ffb757` | 11.0 : 1 | AAA | AAA | +| ![#72f088](../../a11y_pygments/assets/72f088.png) | `#72f088` | 13.1 : 1 | AAA | AAA | +| ![#91cbff](../../a11y_pygments/assets/91cbff.png) | `#91cbff` | 11.0 : 1 | AAA | AAA | +| ![#dbb7ff](../../a11y_pygments/assets/dbb7ff.png) | `#dbb7ff` | 11.0 : 1 | AAA | AAA | +| ![#c9d1d9](../../a11y_pygments/assets/c9d1d9.png) | `#c9d1d9` | 12.3 : 1 | AAA | AAA | diff --git a/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/a11y_pygments/github_light/README.md b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/a11y_pygments/github_light/README.md new file mode 100644 index 000000000..8059f6d20 --- /dev/null +++ b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/a11y_pygments/github_light/README.md @@ -0,0 +1,23 @@ +# Github Light + +This style mimics the github light theme from vscode themes. 
+ +![Screenshot of the github-light theme in a bash script](./images/github-light.png) + +## Colors + +Background color: ![#ffffff](https://via.placeholder.com/20/ffffff/ffffff.png) `#ffffff` + +Highlight color: ![#0969da4a](https://via.placeholder.com/20/0969da4a/0969da4a.png) `#0969da4a` + +**WCAG compliance** + +| Color | Hex | Ratio | Normal text | Large text | +| ------------------------------------------------- | --------- | -------- | ----------- | ---------- | +| ![#6e7781](../../a11y_pygments/assets/6e7781.png) | `#6e7781` | 4.5 : 1 | AA | AAA | +| ![#cf222e](../../a11y_pygments/assets/cf222e.png) | `#cf222e` | 5.4 : 1 | AA | AAA | +| ![#953800](../../a11y_pygments/assets/953800.png) | `#953800` | 7.4 : 1 | AAA | AAA | +| ![#116329](../../a11y_pygments/assets/116329.png) | `#116329` | 7.4 : 1 | AAA | AAA | +| ![#0550ae](../../a11y_pygments/assets/0550ae.png) | `#0550ae` | 7.6 : 1 | AAA | AAA | +| ![#8250df](../../a11y_pygments/assets/8250df.png) | `#8250df` | 5.0 : 1 | AA | AAA | +| ![#24292f](../../a11y_pygments/assets/24292f.png) | `#24292f` | 14.7 : 1 | AAA | AAA | diff --git a/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/a11y_pygments/github_light_colorblind/README.md b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/a11y_pygments/github_light_colorblind/README.md new file mode 100644 index 000000000..120cbd39a --- /dev/null +++ b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/a11y_pygments/github_light_colorblind/README.md @@ -0,0 +1,22 @@ +# Github Light Colorblind + +This style mimics the github light colorblind theme from vscode themes. + +![Screenshot of the github-light-colorblind theme in a bash script](./images/github-light-colorblind.png) + +## Colors + +Background color: ![#ffffff](https://via.placeholder.com/20/ffffff/ffffff.png) `#ffffff` + +Highlight color: ![#0969da4a](https://via.placeholder.com/20/0969da4a/0969da4a.png) `#0969da4a` + +**WCAG compliance** + +| Color | Hex | Ratio | Normal text | Large text | +| ------------------------------------------------- | --------- | -------- | ----------- | ---------- | +| ![#6e7781](../../a11y_pygments/assets/6e7781.png) | `#6e7781` | 4.5 : 1 | AA | AAA | +| ![#b35900](../../a11y_pygments/assets/b35900.png) | `#b35900` | 4.8 : 1 | AA | AAA | +| ![#8a4600](../../a11y_pygments/assets/8a4600.png) | `#8a4600` | 7.1 : 1 | AAA | AAA | +| ![#0550ae](../../a11y_pygments/assets/0550ae.png) | `#0550ae` | 7.6 : 1 | AAA | AAA | +| ![#8250df](../../a11y_pygments/assets/8250df.png) | `#8250df` | 5.0 : 1 | AA | AAA | +| ![#24292f](../../a11y_pygments/assets/24292f.png) | `#24292f` | 14.7 : 1 | AAA | AAA | diff --git a/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/a11y_pygments/github_light_high_contrast/README.md b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/a11y_pygments/github_light_high_contrast/README.md new file mode 100644 index 000000000..e938e986f --- /dev/null +++ b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/a11y_pygments/github_light_high_contrast/README.md @@ -0,0 +1,23 @@ +# Github Light High Contrast + +This style mimics the github light high contrast theme from vscode themes. 
+ +![Screenshot of the github-light-high-contrast theme in a bash script](./images/github-light-high-contrast.png) + +## Colors + +Background color: ![#ffffff](https://via.placeholder.com/20/ffffff/ffffff.png) `#ffffff` + +Highlight color: ![#0969da4a](https://via.placeholder.com/20/0969da4a/0969da4a.png) `#0969da4a` + +**WCAG compliance** + +| Color | Hex | Ratio | Normal text | Large text | +| ------------------------------------------------- | --------- | -------- | ----------- | ---------- | +| ![#66707b](../../a11y_pygments/assets/66707b.png) | `#66707b` | 5.0 : 1 | AA | AAA | +| ![#a0111f](../../a11y_pygments/assets/a0111f.png) | `#a0111f` | 8.1 : 1 | AAA | AAA | +| ![#702c00](../../a11y_pygments/assets/702c00.png) | `#702c00` | 10.2 : 1 | AAA | AAA | +| ![#024c1a](../../a11y_pygments/assets/024c1a.png) | `#024c1a` | 10.2 : 1 | AAA | AAA | +| ![#023b95](../../a11y_pygments/assets/023b95.png) | `#023b95` | 10.2 : 1 | AAA | AAA | +| ![#622cbc](../../a11y_pygments/assets/622cbc.png) | `#622cbc` | 8.1 : 1 | AAA | AAA | +| ![#24292f](../../a11y_pygments/assets/24292f.png) | `#24292f` | 14.7 : 1 | AAA | AAA | diff --git a/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/a11y_pygments/gotthard_dark/README.md b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/a11y_pygments/gotthard_dark/README.md new file mode 100644 index 000000000..8fa52f430 --- /dev/null +++ b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/a11y_pygments/gotthard_dark/README.md @@ -0,0 +1,23 @@ +# Gotthard Dark + +This style mimics the gotthard dark theme from vscode. + +![Screenshot of the gotthard-dark theme in a bash script](./images/gotthard-dark.png) + +## Colors + +Background color: ![#000000](https://via.placeholder.com/20/000000/000000.png) `#000000` + +Highlight color: ![#4c4b4be8](https://via.placeholder.com/20/4c4b4be8/4c4b4be8.png) `#4c4b4be8` + +**WCAG compliance** + +| Color | Hex | Ratio | Normal text | Large text | +| ------------------------------------------------- | --------- | -------- | ----------- | ---------- | +| ![#f5f5f5](../../a11y_pygments/assets/f5f5f5.png) | `#f5f5f5` | 19.3 : 1 | AAA | AAA | +| ![#ab6369](../../a11y_pygments/assets/ab6369.png) | `#ab6369` | 4.7 : 1 | AA | AAA | +| ![#b89784](../../a11y_pygments/assets/b89784.png) | `#b89784` | 7.8 : 1 | AAA | AAA | +| ![#caab6d](../../a11y_pygments/assets/caab6d.png) | `#caab6d` | 9.6 : 1 | AAA | AAA | +| ![#81b19b](../../a11y_pygments/assets/81b19b.png) | `#81b19b` | 8.7 : 1 | AAA | AAA | +| ![#6f98b3](../../a11y_pygments/assets/6f98b3.png) | `#6f98b3` | 6.8 : 1 | AA | AAA | +| ![#b19db4](../../a11y_pygments/assets/b19db4.png) | `#b19db4` | 8.4 : 1 | AAA | AAA | diff --git a/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/a11y_pygments/gotthard_light/README.md b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/a11y_pygments/gotthard_light/README.md new file mode 100644 index 000000000..4ff0d9874 --- /dev/null +++ b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/a11y_pygments/gotthard_light/README.md @@ -0,0 +1,23 @@ +# Gotthard Light + +This style mimics the gotthard light theme from vscode. 
+ +![Screenshot of the gotthard-light theme in a bash script](./images/gotthard-light.png) + +## Colors + +Background color: ![#F5F5F5](https://via.placeholder.com/20/F5F5F5/F5F5F5.png) `#F5F5F5` + +Highlight color: ![#E1E1E1](https://via.placeholder.com/20/E1E1E1/E1E1E1.png) `#E1E1E1` + +**WCAG compliance** + +| Color | Hex | Ratio | Normal text | Large text | +| ------------------------------------------------- | --------- | -------- | ----------- | ---------- | +| ![#141414](../../a11y_pygments/assets/141414.png) | `#141414` | 16.9 : 1 | AAA | AAA | +| ![#9f4e55](../../a11y_pygments/assets/9f4e55.png) | `#9f4e55` | 5.2 : 1 | AA | AAA | +| ![#a25e53](../../a11y_pygments/assets/a25e53.png) | `#a25e53` | 4.5 : 1 | AA | AAA | +| ![#98661b](../../a11y_pygments/assets/98661b.png) | `#98661b` | 4.5 : 1 | AA | AAA | +| ![#437a6b](../../a11y_pygments/assets/437a6b.png) | `#437a6b` | 4.5 : 1 | AA | AAA | +| ![#3d73a9](../../a11y_pygments/assets/3d73a9.png) | `#3d73a9` | 4.6 : 1 | AA | AAA | +| ![#974eb7](../../a11y_pygments/assets/974eb7.png) | `#974eb7` | 4.7 : 1 | AA | AAA | diff --git a/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/a11y_pygments/greative/README.md b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/a11y_pygments/greative/README.md new file mode 100644 index 000000000..5e70f12fe --- /dev/null +++ b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/a11y_pygments/greative/README.md @@ -0,0 +1,23 @@ +# Greative + +This style mimics greative theme from vscode themes. + +![Screenshot of the greative theme in a bash script](./images/greative.png) + +## Colors + +Background color: ![#010726](https://via.placeholder.com/20/010726/010726.png) `#010726` + +Highlight color: ![#473d18](https://via.placeholder.com/20/473d18/473d18.png) `#473d18` + +**WCAG compliance** + +| Color | Hex | Ratio | Normal text | Large text | +| ------------------------------------------------- | --------- | -------- | ----------- | ---------- | +| ![#797979](../../a11y_pygments/assets/797979.png) | `#797979` | 4.6 : 1 | AA | AAA | +| ![#f78c6c](../../a11y_pygments/assets/f78c6c.png) | `#f78c6c` | 8.4 : 1 | AAA | AAA | +| ![#9e8741](../../a11y_pygments/assets/9e8741.png) | `#9e8741` | 5.7 : 1 | AA | AAA | +| ![#c5e478](../../a11y_pygments/assets/c5e478.png) | `#c5e478` | 13.9 : 1 | AAA | AAA | +| ![#a2bffc](../../a11y_pygments/assets/a2bffc.png) | `#a2bffc` | 10.8 : 1 | AAA | AAA | +| ![#5ca7e4](../../a11y_pygments/assets/5ca7e4.png) | `#5ca7e4` | 7.6 : 1 | AAA | AAA | +| ![#9e86c8](../../a11y_pygments/assets/9e86c8.png) | `#9e86c8` | 6.3 : 1 | AA | AAA | diff --git a/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/a11y_pygments/pitaya_smoothie/README.md b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/a11y_pygments/pitaya_smoothie/README.md new file mode 100644 index 000000000..a83a734b2 --- /dev/null +++ b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/a11y_pygments/pitaya_smoothie/README.md @@ -0,0 +1,25 @@ +# Pitaya Smoothie + +This style mimics the a11 light theme from eric bailey's accessible themes. 
+ +![Screenshot of the pitaya-smoothie theme in a bash script](./images/pitaya-smoothie.png) + +## Colors + +Background color: ![#181036](https://via.placeholder.com/20/181036/181036.png) `#181036` + +Highlight color: ![#2A1968](https://via.placeholder.com/20/2A1968/2A1968.png) `#2A1968` + +**WCAG compliance** + +| Color | Hex | Ratio | Normal text | Large text | +| ------------------------------------------------- | --------- | -------- | ----------- | ---------- | +| ![#8786ac](../../a11y_pygments/assets/8786ac.png) | `#8786ac` | 5.2 : 1 | AA | AAA | +| ![#f26196](../../a11y_pygments/assets/f26196.png) | `#f26196` | 5.9 : 1 | AA | AAA | +| ![#f5a394](../../a11y_pygments/assets/f5a394.png) | `#f5a394` | 9.0 : 1 | AAA | AAA | +| ![#fad000](../../a11y_pygments/assets/fad000.png) | `#fad000` | 12.1 : 1 | AAA | AAA | +| ![#18c1c4](../../a11y_pygments/assets/18c1c4.png) | `#18c1c4` | 8.1 : 1 | AAA | AAA | +| ![#66e9ec](../../a11y_pygments/assets/66e9ec.png) | `#66e9ec` | 12.4 : 1 | AAA | AAA | +| ![#7998f2](../../a11y_pygments/assets/7998f2.png) | `#7998f2` | 6.5 : 1 | AA | AAA | +| ![#c4a2f5](../../a11y_pygments/assets/c4a2f5.png) | `#c4a2f5` | 8.4 : 1 | AAA | AAA | +| ![#fefeff](../../a11y_pygments/assets/fefeff.png) | `#fefeff` | 17.9 : 1 | AAA | AAA | diff --git a/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/alabaster-0.7.16.dist-info/LICENSE.rst b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/alabaster-0.7.16.dist-info/LICENSE.rst new file mode 100644 index 000000000..19361a719 --- /dev/null +++ b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/alabaster-0.7.16.dist-info/LICENSE.rst @@ -0,0 +1,34 @@ +Copyright (c) 2020 Jeff Forcier. + +Based on original work copyright (c) 2011 Kenneth Reitz and copyright (c) 2010 +Armin Ronacher. + +Some rights reserved. + +Redistribution and use in source and binary forms of the theme, with or +without modification, are permitted provided that the following conditions +are met: + +* Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + +* Redistributions in binary form must reproduce the above + copyright notice, this list of conditions and the following + disclaimer in the documentation and/or other materials provided + with the distribution. + +* The names of the contributors may not be used to endorse or + promote products derived from this software without specific + prior written permission. + +THIS THEME IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" +AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE +LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR +CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS +INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN +CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) +ARISING IN ANY WAY OUT OF THE USE OF THIS THEME, EVEN IF ADVISED OF THE +POSSIBILITY OF SUCH DAMAGE. 
diff --git a/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/debugpy/_vendored/pydevd/pydevd_plugins/extensions/README.md b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/debugpy/_vendored/pydevd/pydevd_plugins/extensions/README.md new file mode 100644 index 000000000..030e303ee --- /dev/null +++ b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/debugpy/_vendored/pydevd/pydevd_plugins/extensions/README.md @@ -0,0 +1,30 @@ +Extensions allow extending the debugger without modifying the debugger code. This is implemented with explicit namespace +packages. + +To implement your own extension: + +1. Ensure that the root folder of your extension is in sys.path (add it to PYTHONPATH) +2. Ensure that your module follows the directory structure below +3. The ``__init__.py`` files inside the pydevd_plugin and extension folder must contain the preamble below, +and nothing else. +Preamble: +```python +try: + __import__('pkg_resources').declare_namespace(__name__) +except ImportError: + import pkgutil + __path__ = pkgutil.extend_path(__path__, __name__) +``` +4. Your plugin name inside the extensions folder must start with `"pydevd_plugin"` +5. Implement one or more of the abstract base classes defined in `_pydevd_bundle.pydevd_extension_api`. This can be done +by either inheriting from them or registering with the abstract base class. + +* Directory structure: +``` +|-- root_directory-> must be on python path +| |-- pydevd_plugins +| | |-- __init__.py -> must contain preamble +| | |-- extensions +| | | |-- __init__.py -> must contain preamble +| | | |-- pydevd_plugin_plugin_name.py +``` \ No newline at end of file diff --git a/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/idna-3.10.dist-info/LICENSE.md b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/idna-3.10.dist-info/LICENSE.md new file mode 100644 index 000000000..19b6b4524 --- /dev/null +++ b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/idna-3.10.dist-info/LICENSE.md @@ -0,0 +1,31 @@ +BSD 3-Clause License + +Copyright (c) 2013-2024, Kim Davies and contributors. +All rights reserved. + +Redistribution and use in source and binary forms, with or without +modification, are permitted provided that the following conditions are +met: + +1. Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + +2. Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + +3. Neither the name of the copyright holder nor the names of its + contributors may be used to endorse or promote products derived from + this software without specific prior written permission. + +THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS +"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT +LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR +A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT +HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, +SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED +TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR +PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF +LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING +NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS +SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. diff --git a/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/imagesize-1.4.1.dist-info/LICENSE.rst b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/imagesize-1.4.1.dist-info/LICENSE.rst new file mode 100644 index 000000000..58a2394f0 --- /dev/null +++ b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/imagesize-1.4.1.dist-info/LICENSE.rst @@ -0,0 +1,19 @@ +The MIT License (MIT) +---------------------------- + +Copyright © 2016 Yoshiki Shibukawa + +Permission is hereby granted, free of charge, to any person obtaining a copy of this software +and associated documentation files (the “Software”), to deal in the Software without restriction, +including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, +and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, +subject to the following conditions: + +The above copyright notice and this permission notice shall be included in all copies or substantial +portions of the Software. + +THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT +NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. +IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, +WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH +THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. diff --git a/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/ipython-9.5.0.dist-info/licenses/COPYING.rst b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/ipython-9.5.0.dist-info/licenses/COPYING.rst new file mode 100644 index 000000000..e5c79ef38 --- /dev/null +++ b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/ipython-9.5.0.dist-info/licenses/COPYING.rst @@ -0,0 +1,41 @@ +============================= + The IPython licensing terms +============================= + +IPython is licensed under the terms of the Modified BSD License (also known as +New or Revised or 3-Clause BSD). See the LICENSE file. + + +About the IPython Development Team +---------------------------------- + +Fernando Perez began IPython in 2001 based on code from Janko Hauser + and Nathaniel Gray . Fernando is still +the project lead. + +The IPython Development Team is the set of all contributors to the IPython +project. This includes all of the IPython subprojects. + +The core team that coordinates development on GitHub can be found here: +https://github.com/ipython/. + +Our Copyright Policy +-------------------- + +IPython uses a shared copyright model. Each contributor maintains copyright +over their contributions to IPython. But, it is important to note that these +contributions are typically only changes to the repositories. 
Thus, the IPython +source code, in its entirety is not the copyright of any single person or +institution. Instead, it is the collective copyright of the entire IPython +Development Team. If individual contributors want to maintain a record of what +changes/contributions they have specific copyright on, they should indicate +their copyright in the commit message of the change, when they commit the +change to one of the IPython repositories. + +With this in mind, the following banner should be used in any source code file +to indicate the copyright and license terms: + +:: + + # Copyright (c) IPython Development Team. + # Distributed under the terms of the Modified BSD License. diff --git a/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/jupyter_book/book_template/intro.md b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/jupyter_book/book_template/intro.md new file mode 100644 index 000000000..f8cdc73cb --- /dev/null +++ b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/jupyter_book/book_template/intro.md @@ -0,0 +1,11 @@ +# Welcome to your Jupyter Book + +This is a small sample book to give you a feel for how book content is +structured. +It shows off a few of the major file types, as well as some sample content. +It does not go in-depth into any particular topic - check out [the Jupyter Book documentation](https://jupyterbook.org) for more information. + +Check out the content pages bundled with this sample book to see more. + +```{tableofcontents} +``` diff --git a/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/jupyter_book/book_template/markdown-notebooks.md b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/jupyter_book/book_template/markdown-notebooks.md new file mode 100644 index 000000000..a057a320d --- /dev/null +++ b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/jupyter_book/book_template/markdown-notebooks.md @@ -0,0 +1,53 @@ +--- +jupytext: + formats: md:myst + text_representation: + extension: .md + format_name: myst + format_version: 0.13 + jupytext_version: 1.11.5 +kernelspec: + display_name: Python 3 + language: python + name: python3 +--- + +# Notebooks with MyST Markdown + +Jupyter Book also lets you write text-based notebooks using MyST Markdown. +See [the Notebooks with MyST Markdown documentation](https://jupyterbook.org/file-types/myst-notebooks.html) for more detailed instructions. +This page shows off a notebook written in MyST Markdown. + +## An example cell + +With MyST Markdown, you can define code cells with a directive like so: + +```{code-cell} +print(2 + 2) +``` + +When your book is built, the contents of any `{code-cell}` blocks will be +executed with your default Jupyter kernel, and their outputs will be displayed +in-line with the rest of your content. + +```{seealso} +Jupyter Book uses [Jupytext](https://jupytext.readthedocs.io/en/latest/) to convert text-based files to notebooks, and can support [many other text-based notebook files](https://jupyterbook.org/file-types/jupytext.html). +``` + +## Create a notebook with MyST Markdown + +MyST Markdown notebooks are defined by two things: + +1. YAML metadata that is needed to understand if / how it should convert text files to notebooks (including information about the kernel needed). + See the YAML at the top of this page for example. +2. The presence of `{code-cell}` directives, which will be executed with your book. 
+ +That's all that is needed to get started! + +## Quickly add YAML metadata for MyST Notebooks + +If you have a markdown file and you'd like to quickly add YAML metadata to it, so that Jupyter Book will treat it as a MyST Markdown Notebook, run the following command: + +``` +jupyter-book myst init path/to/markdownfile.md +``` diff --git a/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/jupyter_book/book_template/markdown.md b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/jupyter_book/book_template/markdown.md new file mode 100644 index 000000000..faeea6061 --- /dev/null +++ b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/jupyter_book/book_template/markdown.md @@ -0,0 +1,55 @@ +# Markdown Files + +Whether you write your book's content in Jupyter Notebooks (`.ipynb`) or +in regular markdown files (`.md`), you'll write in the same flavor of markdown +called **MyST Markdown**. +This is a simple file to help you get started and show off some syntax. + +## What is MyST? + +MyST stands for "Markedly Structured Text". It +is a slight variation on a flavor of markdown called "CommonMark" markdown, +with small syntax extensions to allow you to write **roles** and **directives** +in the Sphinx ecosystem. + +For more about MyST, see [the MyST Markdown Overview](https://jupyterbook.org/content/myst.html). + +## Sample Roles and Directives + +Roles and directives are two of the most powerful tools in Jupyter Book. They +are like functions, but written in a markup language. They both +serve a similar purpose, but **roles are written in one line**, whereas +**directives span many lines**. They both accept different kinds of inputs, +and what they do with those inputs depends on the specific role or directive +that is being called. + +Here is a "note" directive: + +```{note} +Here is a note +``` + +It will be rendered in a special box when you build your book. + +Here is an inline directive to refer to a document: {doc}`markdown-notebooks`. + + +## Citations + +You can also cite references that are stored in a `bibtex` file. For example, +the following syntax: `` {cite}`holdgraf_evidence_2014` `` will render like +this: {cite}`holdgraf_evidence_2014`. + +Moreover, you can insert a bibliography into your page with this syntax: +The `{bibliography}` directive must be used for all the `{cite}` roles to +render properly. +For example, if the references for your book are stored in `references.bib`, +then the bibliography is inserted with: + +```{bibliography} +``` + +## Learn more + +This is just a simple starter to get you started. +You can learn a lot more at [jupyterbook.org](https://jupyterbook.org). diff --git a/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/jupyter_book/book_template/notebooks.ipynb b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/jupyter_book/book_template/notebooks.ipynb new file mode 100644 index 000000000..fdb7176c4 --- /dev/null +++ b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/jupyter_book/book_template/notebooks.ipynb @@ -0,0 +1,122 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Content with notebooks\n", + "\n", + "You can also create content with Jupyter Notebooks. 
This means that you can include\n", + "code blocks and their outputs in your book.\n", + "\n", + "## Markdown + notebooks\n", + "\n", + "As it is markdown, you can embed images, HTML, etc into your posts!\n", + "\n", + "![](https://myst-parser.readthedocs.io/en/latest/_static/logo-wide.svg)\n", + "\n", + "You can also $add_{math}$ and\n", + "\n", + "$$\n", + "math^{blocks}\n", + "$$\n", + "\n", + "or\n", + "\n", + "$$\n", + "\\begin{aligned}\n", + "\\mbox{mean} la_{tex} \\\\ \\\\\n", + "math blocks\n", + "\\end{aligned}\n", + "$$\n", + "\n", + "But make sure you \\$Escape \\$your \\$dollar signs \\$you want to keep!\n", + "\n", + "## MyST markdown\n", + "\n", + "MyST markdown works in Jupyter Notebooks as well. For more information about MyST markdown, check\n", + "out [the MyST guide in Jupyter Book](https://jupyterbook.org/content/myst.html),\n", + "or see [the MyST markdown documentation](https://myst-parser.readthedocs.io/en/latest/).\n", + "\n", + "## Code blocks and outputs\n", + "\n", + "Jupyter Book will also embed your code blocks and output in your book.\n", + "For example, here's some sample Matplotlib code:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from matplotlib import rcParams, cycler\n", + "import matplotlib.pyplot as plt\n", + "import numpy as np\n", + "plt.ion()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Fixing random state for reproducibility\n", + "np.random.seed(19680801)\n", + "\n", + "N = 10\n", + "data = [np.logspace(0, 1, 100) + np.random.randn(100) + ii for ii in range(N)]\n", + "data = np.array(data).T\n", + "cmap = plt.cm.coolwarm\n", + "rcParams['axes.prop_cycle'] = cycler(color=cmap(np.linspace(0, 1, N)))\n", + "\n", + "\n", + "from matplotlib.lines import Line2D\n", + "custom_lines = [Line2D([0], [0], color=cmap(0.), lw=4),\n", + " Line2D([0], [0], color=cmap(.5), lw=4),\n", + " Line2D([0], [0], color=cmap(1.), lw=4)]\n", + "\n", + "fig, ax = plt.subplots(figsize=(10, 5))\n", + "lines = ax.plot(data)\n", + "ax.legend(custom_lines, ['Cold', 'Medium', 'Hot']);" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "There is a lot more that you can do with outputs (such as including interactive outputs)\n", + "with your book. 
For more information about this, see [the Jupyter Book documentation](https://jupyterbook.org)" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.8.0" + }, + "widgets": { + "application/vnd.jupyter.widget-state+json": { + "state": {}, + "version_major": 2, + "version_minor": 0 + } + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/latexcodec-3.0.1.dist-info/licenses/AUTHORS.rst b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/latexcodec-3.0.1.dist-info/licenses/AUTHORS.rst new file mode 100644 index 000000000..a9d662da7 --- /dev/null +++ b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/latexcodec-3.0.1.dist-info/licenses/AUTHORS.rst @@ -0,0 +1,26 @@ +Main authors: + +* David Eppstein + + - wrote the original LaTeX codec as a recipe on ActiveState + http://code.activestate.com/recipes/252124-latex-codec/ + +* Peter Tröger + + - wrote the original latexcodec package, which contained a simple + but very effective LaTeX encoder + +* Matthias Troffaes (matthias.troffaes@gmail.com) + + - wrote the lexer + + - integrated codec with the lexer for a simpler and more robust + design + + - various bugfixes + +Contributors: + +* Michael Radziej + +* Philipp Spitzer diff --git a/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/latexcodec-3.0.1.dist-info/licenses/LICENSE.rst b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/latexcodec-3.0.1.dist-info/licenses/LICENSE.rst new file mode 100644 index 000000000..a7dbb5e82 --- /dev/null +++ b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/latexcodec-3.0.1.dist-info/licenses/LICENSE.rst @@ -0,0 +1,23 @@ +| latexcodec is a lexer and codec to work with LaTeX code in Python +| Copyright (c) 2011-2020 by Matthias C. M. Troffaes + +Permission is hereby granted, free of charge, to any person +obtaining a copy of this software and associated documentation +files (the "Software"), to deal in the Software without +restriction, including without limitation the rights to use, +copy, modify, merge, publish, distribute, sublicense, and/or sell +copies of the Software, and to permit persons to whom the +Software is furnished to do so, subject to the following +conditions: + +The above copyright notice and this permission notice shall be +included in all copies or substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, +EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES +OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND +NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT +HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, +WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING +FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR +OTHER DEALINGS IN THE SOFTWARE. 
diff --git a/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/mdit_py_plugins/container/README.md b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/mdit_py_plugins/container/README.md new file mode 100644 index 000000000..03868d78b --- /dev/null +++ b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/mdit_py_plugins/container/README.md @@ -0,0 +1,95 @@ +# markdown-it-container + +[![Build Status](https://img.shields.io/travis/markdown-it/markdown-it-container/master.svg?style=flat)](https://travis-ci.org/markdown-it/markdown-it-container) +[![NPM version](https://img.shields.io/npm/v/markdown-it-container.svg?style=flat)](https://www.npmjs.org/package/markdown-it-container) +[![Coverage Status](https://img.shields.io/coveralls/markdown-it/markdown-it-container/master.svg?style=flat)](https://coveralls.io/r/markdown-it/markdown-it-container?branch=master) + +> Plugin for creating block-level custom containers for [markdown-it](https://github.com/markdown-it/markdown-it) markdown parser. + +__v2.+ requires `markdown-it` v5.+, see changelog.__ + +With this plugin you can create block containers like: + +``` +::: warning +*here be dragons* +::: +``` + +.... and specify how they should be rendered. If no renderer defined, `
<div>` with +container name class will be created: + +```html +<div class="warning"> +<em>here be dragons</em> +</div>
+``` + +Markup is the same as for [fenced code blocks](http://spec.commonmark.org/0.18/#fenced-code-blocks). +Difference is, that marker use another character and content is rendered as markdown markup. + + +## Installation + +node.js, browser: + +```bash +$ npm install markdown-it-container --save +$ bower install markdown-it-container --save +``` + + +## API + +```js +var md = require('markdown-it')() + .use(require('markdown-it-container'), name [, options]); +``` + +Params: + +- __name__ - container name (mandatory) +- __options:__ + - __validate__ - optional, function to validate tail after opening marker, should + return `true` on success. + - __render__ - optional, renderer function for opening/closing tokens. + - __marker__ - optional (`:`), character to use in delimiter. + + +## Example + +```js +var md = require('markdown-it')(); + +md.use(require('markdown-it-container'), 'spoiler', { + + validate: function(params) { + return params.trim().match(/^spoiler\s+(.*)$/); + }, + + render: function (tokens, idx) { + var m = tokens[idx].info.trim().match(/^spoiler\s+(.*)$/); + + if (tokens[idx].nesting === 1) { + // opening tag + return '
<details><summary>' + md.utils.escapeHtml(m[1]) + '</summary>\n'; + + } else { + // closing tag + return '</details>\n'; + } + } +}); + +console.log(md.render('::: spoiler click me\n*content*\n:::\n')); + +// Output: + +// <details><summary>click me</summary> +// <p><em>content</em></p> +// </details>
+``` + +## License + +[MIT](https://github.com/markdown-it/markdown-it-container/blob/master/LICENSE) diff --git a/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/mdit_py_plugins/deflist/README.md b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/mdit_py_plugins/deflist/README.md new file mode 100644 index 000000000..414157bcc --- /dev/null +++ b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/mdit_py_plugins/deflist/README.md @@ -0,0 +1,38 @@ +# markdown-it-deflist + +[![Build Status](https://img.shields.io/travis/markdown-it/markdown-it-deflist/master.svg?style=flat)](https://travis-ci.org/markdown-it/markdown-it-deflist) +[![NPM version](https://img.shields.io/npm/v/markdown-it-deflist.svg?style=flat)](https://www.npmjs.org/package/markdown-it-deflist) +[![Coverage Status](https://img.shields.io/coveralls/markdown-it/markdown-it-deflist/master.svg?style=flat)](https://coveralls.io/r/markdown-it/markdown-it-deflist?branch=master) + +> Definition list (`
`) tag plugin for [markdown-it](https://github.com/markdown-it/markdown-it) markdown parser. + +__v2.+ requires `markdown-it` v5.+, see changelog.__ + +Syntax is based on [pandoc definition lists](http://johnmacfarlane.net/pandoc/README.html#definition-lists). + + +## Install + +node.js, browser: + +```bash +npm install markdown-it-deflist --save +bower install markdown-it-deflist --save +``` + +## Use + +```js +var md = require('markdown-it')() + .use(require('markdown-it-deflist')); + +md.render(/*...*/); +``` + +_Differences in browser._ If you load script directly into the page, without +package system, module will add itself globally as `window.markdownitDeflist`. + + +## License + +[MIT](https://github.com/markdown-it/markdown-it-deflist/blob/master/LICENSE) diff --git a/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/mdit_py_plugins/texmath/README.md b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/mdit_py_plugins/texmath/README.md new file mode 100644 index 000000000..f79f33563 --- /dev/null +++ b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/mdit_py_plugins/texmath/README.md @@ -0,0 +1,137 @@ +[![License](https://img.shields.io/github/license/goessner/markdown-it-texmath.svg)](https://github.com/goessner/markdown-it-texmath/blob/master/licence.txt) +[![npm](https://img.shields.io/npm/v/markdown-it-texmath.svg)](https://www.npmjs.com/package/markdown-it-texmath) +[![npm](https://img.shields.io/npm/dt/markdown-it-texmath.svg)](https://www.npmjs.com/package/markdown-it-texmath) + +# markdown-it-texmath + +Add TeX math equations to your Markdown documents rendered by [markdown-it](https://github.com/markdown-it/markdown-it) parser. [KaTeX](https://github.com/Khan/KaTeX) is used as a fast math renderer. + +## Features +Simplify the process of authoring markdown documents containing math formulas. +This extension is a comfortable tool for scientists, engineers and students with markdown as their first choice document format. + +* Macro support +* Simple formula numbering +* Inline math with tables, lists and blockquote. +* User setting delimiters: + * `'dollars'` (default) + * inline: `$...$` + * display: `$$...$$` + * display + equation number: `$$...$$ (1)` + * `'brackets'` + * inline: `\(...\)` + * display: `\[...\]` + * display + equation number: `\[...\] (1)` + * `'gitlab'` + * inline: ``$`...`$`` + * display: `` ```math ... ``` `` + * display + equation number: `` ```math ... ``` (1)`` + * `'julia'` + * inline: `$...$` or ``` ``...`` ``` + * display: `` ```math ... ``` `` + * display + equation number: `` ```math ... ``` (1)`` + * `'kramdown'` + * inline: ``$$...$$`` + * display: `$$...$$` + * display + equation number: `$$...$$ (1)` + +## Show me + +View a [test table](https://goessner.github.io/markdown-it-texmath/index.html). + +[try it out ...](https://goessner.github.io/markdown-it-texmath/markdown-it-texmath-demo.html) + +## Use with `node.js` + +Install the extension. Verify having `markdown-it` and `katex` already installed . +``` +npm install markdown-it-texmath +``` +Use it with JavaScript. +```js +let kt = require('katex'), + tm = require('markdown-it-texmath').use(kt), + md = require('markdown-it')().use(tm,{delimiters:'dollars',macros:{"\\RR": "\\mathbb{R}"}}); + +md.render('Euler\'s identity \(e^{i\pi}+1=0\) is a beautiful formula in $\\RR 2$.') +``` + +## Use in Browser +```html + + + + + + + + + + +
+ + + +``` +## CDN + +Use following links for `texmath.js` and `texmath.css` +* `https://gitcdn.xyz/cdn/goessner/markdown-it-texmath/master/texmath.js` +* `https://gitcdn.xyz/cdn/goessner/markdown-it-texmath/master/texmath.css` + +## Dependencies + +* [`markdown-it`](https://github.com/markdown-it/markdown-it): Markdown parser done right. Fast and easy to extend. +* [`katex`](https://github.com/Khan/KaTeX): This is where credits for fast rendering TeX math in HTML go to. + +## ToDo + + nothing yet + +## FAQ + +* __`markdown-it-texmath` with React Native does not work, why ?__ + * `markdown-it-texmath` is using regular expressions with `y` [(sticky) property](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/sticky) and cannot avoid this. The use of the `y` flag in regular expressions means the plugin is not compatible with React Native (which as of now doesn't support it and throws an error `Invalid flags supplied to RegExp constructor`). + +## CHANGELOG + +### [0.6.0] on October 04, 2019 +* Add support for [Julia Markdown](https://docs.julialang.org/en/v1/stdlib/Markdown/) on [request](https://github.com/goessner/markdown-it-texmath/issues/15). + +### [0.5.5] on February 07, 2019 +* Remove [rendering bug with brackets delimiters](https://github.com/goessner/markdown-it-texmath/issues/9). + +### [0.5.4] on January 20, 2019 +* Remove pathological [bug within blockquotes](https://github.com/goessner/mdmath/issues/50). + +### [0.5.3] on November 11, 2018 +* Add support for Tex macros (https://katex.org/docs/supported.html#macros) . +* Bug with [brackets delimiters](https://github.com/goessner/markdown-it-texmath/issues/9) . + +### [0.5.2] on September 07, 2018 +* Add support for [Kramdown](https://kramdown.gettalong.org/) . + +### [0.5.0] on August 15, 2018 +* Fatal blockquote bug investigated. Implemented workaround to vscode bug, which has finally gone with vscode 1.26.0 . + +### [0.4.6] on January 05, 2018 +* Escaped underscore bug removed. + +### [0.4.5] on November 06, 2017 +* Backslash bug removed. + +### [0.4.4] on September 27, 2017 +* Modifying the `block` mode regular expression with `gitlab` delimiters, so removing the `newline` bug. + +## License + +`markdown-it-texmath` is licensed under the [MIT License](./license.txt) + + © [Stefan Gössner](https://github.com/goessner) diff --git a/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/numpy/ma/README.rst b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/numpy/ma/README.rst new file mode 100644 index 000000000..cd1010329 --- /dev/null +++ b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/numpy/ma/README.rst @@ -0,0 +1,236 @@ +================================== +A guide to masked arrays in NumPy +================================== + +.. Contents:: + +See http://www.scipy.org/scipy/numpy/wiki/MaskedArray (dead link) +for updates of this document. + + +History +------- + +As a regular user of MaskedArray, I (Pierre G.F. Gerard-Marchant) became +increasingly frustrated with the subclassing of masked arrays (even if +I can only blame my inexperience). I needed to develop a class of arrays +that could store some additional information along with numerical values, +while keeping the possibility for missing data (picture storing a series +of dates along with measurements, what would later become the `TimeSeries +Scikit `__ +(dead link). 
+ +I started to implement such a class, but then quickly realized that +any additional information disappeared when processing these subarrays +(for example, adding a constant value to a subarray would erase its +dates). I ended up writing the equivalent of *numpy.core.ma* for my +particular class, ufuncs included. Everything went fine until I needed to +subclass my new class, when more problems showed up: some attributes of +the new subclass were lost during processing. I identified the culprit as +MaskedArray, which returns masked ndarrays when I expected masked +arrays of my class. I was preparing myself to rewrite *numpy.core.ma* +when I forced myself to learn how to subclass ndarrays. As I became more +familiar with the *__new__* and *__array_finalize__* methods, +I started to wonder why masked arrays were objects, and not ndarrays, +and whether it wouldn't be more convenient for subclassing if they did +behave like regular ndarrays. + +The new *maskedarray* is what I eventually come up with. The +main differences with the initial *numpy.core.ma* package are +that MaskedArray is now a subclass of *ndarray* and that the +*_data* section can now be any subclass of *ndarray*. Apart from a +couple of issues listed below, the behavior of the new MaskedArray +class reproduces the old one. Initially the *maskedarray* +implementation was marginally slower than *numpy.ma* in some areas, +but work is underway to speed it up; the expectation is that it can be +made substantially faster than the present *numpy.ma*. + + +Note that if the subclass has some special methods and +attributes, they are not propagated to the masked version: +this would require a modification of the *__getattribute__* +method (first trying *ndarray.__getattribute__*, then trying +*self._data.__getattribute__* if an exception is raised in the first +place), which really slows things down. + +Main differences +---------------- + + * The *_data* part of the masked array can be any subclass of ndarray (but not recarray, cf below). + * *fill_value* is now a property, not a function. + * in the majority of cases, the mask is forced to *nomask* when no value is actually masked. A notable exception is when a masked array (with no masked values) has just been unpickled. + * I got rid of the *share_mask* flag, I never understood its purpose. + * *put*, *putmask* and *take* now mimic the ndarray methods, to avoid unpleasant surprises. Moreover, *put* and *putmask* both update the mask when needed. * if *a* is a masked array, *bool(a)* raises a *ValueError*, as it does with ndarrays. + * in the same way, the comparison of two masked arrays is a masked array, not a boolean + * *filled(a)* returns an array of the same subclass as *a._data*, and no test is performed on whether it is contiguous or not. + * the mask is always printed, even if it's *nomask*, which makes things easy (for me at least) to remember that a masked array is used. + * *cumsum* works as if the *_data* array was filled with 0. The mask is preserved, but not updated. + * *cumprod* works as if the *_data* array was filled with 1. The mask is preserved, but not updated. + +New features +------------ + +This list is non-exhaustive... + + * the *mr_* function mimics *r_* for masked arrays. + * the *anom* method returns the anomalies (deviations from the average) + +Using the new package with numpy.core.ma +---------------------------------------- + +I tried to make sure that the new package can understand old masked +arrays. 
Unfortunately, there's no upward compatibility. + +For example: + +>>> import numpy.core.ma as old_ma +>>> import maskedarray as new_ma +>>> x = old_ma.array([1,2,3,4,5], mask=[0,0,1,0,0]) +>>> x +array(data = + [ 1 2 999999 4 5], + mask = + [False False True False False], + fill_value=999999) +>>> y = new_ma.array([1,2,3,4,5], mask=[0,0,1,0,0]) +>>> y +array(data = [1 2 -- 4 5], + mask = [False False True False False], + fill_value=999999) +>>> x==y +array(data = + [True True True True True], + mask = + [False False True False False], + fill_value=?) +>>> old_ma.getmask(x) == new_ma.getmask(x) +array([True, True, True, True, True]) +>>> old_ma.getmask(y) == new_ma.getmask(y) +array([True, True, False, True, True]) +>>> old_ma.getmask(y) +False + + +Using maskedarray with matplotlib +--------------------------------- + +Starting with matplotlib 0.91.2, the masked array importing will work with +the maskedarray branch) as well as with earlier versions. + +By default matplotlib still uses numpy.ma, but there is an rcParams setting +that you can use to select maskedarray instead. In the matplotlibrc file +you will find:: + + #maskedarray : False # True to use external maskedarray module + # instead of numpy.ma; this is a temporary # + setting for testing maskedarray. + + +Uncomment and set to True to select maskedarray everywhere. +Alternatively, you can test a script with maskedarray by using a +command-line option, e.g.:: + + python simple_plot.py --maskedarray + + +Masked records +-------------- + +Like *numpy.ma.core*, the *ndarray*-based implementation +of MaskedArray is limited when working with records: you can +mask any record of the array, but not a field in a record. If you +need this feature, you may want to give the *mrecords* package +a try (available in the *maskedarray* directory in the scipy +sandbox). This module defines a new class, *MaskedRecord*. An +instance of this class accepts a *recarray* as data, and uses two +masks: the *fieldmask* has as many entries as records in the array, +each entry with the same fields as a record, but of boolean types: +they indicate whether the field is masked or not; a record entry +is flagged as masked in the *mask* array if all the fields are +masked. A few examples in the file should give you an idea of what +can be done. Note that *mrecords* is still experimental... + +Optimizing maskedarray +---------------------- + +Should masked arrays be filled before processing or not? +-------------------------------------------------------- + +In the current implementation, most operations on masked arrays involve +the following steps: + + * the input arrays are filled + * the operation is performed on the filled arrays + * the mask is set for the results, from the combination of the input masks and the mask corresponding to the domain of the operation. + +For example, consider the division of two masked arrays:: + + import numpy + import maskedarray as ma + x = ma.array([1,2,3,4],mask=[1,0,0,0], dtype=numpy.float64) + y = ma.array([-1,0,1,2], mask=[0,0,0,1], dtype=numpy.float64) + +The division of x by y is then computed as:: + + d1 = x.filled(0) # d1 = array([0., 2., 3., 4.]) + d2 = y.filled(1) # array([-1., 0., 1., 1.]) + m = ma.mask_or(ma.getmask(x), ma.getmask(y)) # m = + array([True,False,False,True]) + dm = ma.divide.domain(d1,d2) # array([False, True, False, False]) + result = (d1/d2).view(MaskedArray) # masked_array([-0. inf, 3., 4.]) + result._mask = logical_or(m, dm) + +Note that a division by zero takes place. 
To avoid it, we can consider +to fill the input arrays, taking the domain mask into account, so that:: + + d1 = x._data.copy() # d1 = array([1., 2., 3., 4.]) + d2 = y._data.copy() # array([-1., 0., 1., 2.]) + dm = ma.divide.domain(d1,d2) # array([False, True, False, False]) + numpy.putmask(d2, dm, 1) # d2 = array([-1., 1., 1., 2.]) + m = ma.mask_or(ma.getmask(x), ma.getmask(y)) # m = + array([True,False,False,True]) + result = (d1/d2).view(MaskedArray) # masked_array([-1. 0., 3., 2.]) + result._mask = logical_or(m, dm) + +Note that the *.copy()* is required to avoid updating the inputs with +*putmask*. The *.filled()* method also involves a *.copy()*. + +A third possibility consists in avoid filling the arrays:: + + d1 = x._data # d1 = array([1., 2., 3., 4.]) + d2 = y._data # array([-1., 0., 1., 2.]) + dm = ma.divide.domain(d1,d2) # array([False, True, False, False]) + m = ma.mask_or(ma.getmask(x), ma.getmask(y)) # m = + array([True,False,False,True]) + result = (d1/d2).view(MaskedArray) # masked_array([-1. inf, 3., 2.]) + result._mask = logical_or(m, dm) + +Note that here again the division by zero takes place. + +A quick benchmark gives the following results: + + * *numpy.ma.divide* : 2.69 ms per loop + * classical division : 2.21 ms per loop + * division w/ prefilling : 2.34 ms per loop + * division w/o filling : 1.55 ms per loop + +So, is it worth filling the arrays beforehand ? Yes, if we are interested +in avoiding floating-point exceptions that may fill the result with infs +and nans. No, if we are only interested into speed... + + +Thanks +------ + +I'd like to thank Paul Dubois, Travis Oliphant and Sasha for the +original masked array package: without you, I would never have started +that (it might be argued that I shouldn't have anyway, but that's +another story...). I also wish to extend these thanks to Reggie Dugard +and Eric Firing for their suggestions and numerous improvements. + + +Revision notes +-------------- + + * 08/25/2007 : Creation of this page + * 01/23/2007 : The package has been moved to the SciPy sandbox, and is regularly updated: please check out your SVN version! diff --git a/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/numpy/random/LICENSE.md b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/numpy/random/LICENSE.md new file mode 100644 index 000000000..a6cf1b17e --- /dev/null +++ b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/numpy/random/LICENSE.md @@ -0,0 +1,71 @@ +**This software is dual-licensed under the The University of Illinois/NCSA +Open Source License (NCSA) and The 3-Clause BSD License** + +# NCSA Open Source License +**Copyright (c) 2019 Kevin Sheppard. All rights reserved.** + +Developed by: Kevin Sheppard (, +) +[http://www.kevinsheppard.com](http://www.kevinsheppard.com) + +Permission is hereby granted, free of charge, to any person obtaining a copy of +this software and associated documentation files (the "Software"), to deal with +the Software without restriction, including without limitation the rights to +use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies +of the Software, and to permit persons to whom the Software is furnished to do +so, subject to the following conditions: + +Redistributions of source code must retain the above copyright notice, this +list of conditions and the following disclaimers. 
+ +Redistributions in binary form must reproduce the above copyright notice, this +list of conditions and the following disclaimers in the documentation and/or +other materials provided with the distribution. + +Neither the names of Kevin Sheppard, nor the names of any contributors may be +used to endorse or promote products derived from this Software without specific +prior written permission. + +**THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS WITH +THE SOFTWARE.** + + +# 3-Clause BSD License +**Copyright (c) 2019 Kevin Sheppard. All rights reserved.** + +Redistribution and use in source and binary forms, with or without +modification, are permitted provided that the following conditions are met: + +1. Redistributions of source code must retain the above copyright notice, + this list of conditions and the following disclaimer. + +2. Redistributions in binary form must reproduce the above copyright notice, + this list of conditions and the following disclaimer in the documentation + and/or other materials provided with the distribution. + +3. Neither the name of the copyright holder nor the names of its contributors + may be used to endorse or promote products derived from this software + without specific prior written permission. + +**THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" +AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE +LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR +CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS +INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN +CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) +ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF +THE POSSIBILITY OF SUCH DAMAGE.** + +# Components + +Many parts of this module have been derived from original sources, +often the algorithm's designer. Component licenses are located with +the component code. diff --git a/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/pip-25.2.dist-info/licenses/src/pip/_vendor/idna/LICENSE.md b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/pip-25.2.dist-info/licenses/src/pip/_vendor/idna/LICENSE.md new file mode 100644 index 000000000..19b6b4524 --- /dev/null +++ b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/pip-25.2.dist-info/licenses/src/pip/_vendor/idna/LICENSE.md @@ -0,0 +1,31 @@ +BSD 3-Clause License + +Copyright (c) 2013-2024, Kim Davies and contributors. +All rights reserved. + +Redistribution and use in source and binary forms, with or without +modification, are permitted provided that the following conditions are +met: + +1. Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + +2. 
Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + +3. Neither the name of the copyright holder nor the names of its + contributors may be used to endorse or promote products derived from + this software without specific prior written permission. + +THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS +"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT +LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR +A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT +HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, +SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED +TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR +PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF +LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING +NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS +SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. diff --git a/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/prompt_toolkit-3.0.52.dist-info/licenses/AUTHORS.rst b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/prompt_toolkit-3.0.52.dist-info/licenses/AUTHORS.rst new file mode 100644 index 000000000..f7c8f60f4 --- /dev/null +++ b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/prompt_toolkit-3.0.52.dist-info/licenses/AUTHORS.rst @@ -0,0 +1,11 @@ +Authors +======= + +Creator +------- +Jonathan Slenders + +Contributors +------------ + +- Amjith Ramanujam diff --git a/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/pybtex_docutils-1.0.3.dist-info/LICENSE.rst b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/pybtex_docutils-1.0.3.dist-info/LICENSE.rst new file mode 100644 index 000000000..aff5c5aa2 --- /dev/null +++ b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/pybtex_docutils-1.0.3.dist-info/LICENSE.rst @@ -0,0 +1,23 @@ +| pybtex-docutils is a docutils backend for pybtex +| Copyright (c) 2013-2021 by Matthias C. M. Troffaes + +Permission is hereby granted, free of charge, to any person +obtaining a copy of this software and associated documentation +files (the "Software"), to deal in the Software without +restriction, including without limitation the rights to use, +copy, modify, merge, publish, distribute, sublicense, and/or sell +copies of the Software, and to permit persons to whom the +Software is furnished to do so, subject to the following +conditions: + +The above copyright notice and this permission notice shall be +included in all copies or substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, +EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES +OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND +NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT +HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, +WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING +FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR +OTHER DEALINGS IN THE SOFTWARE. 
diff --git a/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/pyzmq-27.0.2.dist-info/licenses/LICENSE.md b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/pyzmq-27.0.2.dist-info/licenses/LICENSE.md new file mode 100644 index 000000000..f7072d1c9 --- /dev/null +++ b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/pyzmq-27.0.2.dist-info/licenses/LICENSE.md @@ -0,0 +1,30 @@ +BSD 3-Clause License + +Copyright (c) 2009-2012, Brian Granger, Min Ragan-Kelley + +All rights reserved. + +Redistribution and use in source and binary forms, with or without +modification, are permitted provided that the following conditions are met: + +1. Redistributions of source code must retain the above copyright notice, this + list of conditions and the following disclaimer. + +2. Redistributions in binary form must reproduce the above copyright notice, + this list of conditions and the following disclaimer in the documentation + and/or other materials provided with the distribution. + +3. Neither the name of the copyright holder nor the names of its + contributors may be used to endorse or promote products derived from + this software without specific prior written permission. + +THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" +AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE +DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE +FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR +SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER +CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, +OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE +OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. diff --git a/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/soupsieve-2.8.dist-info/licenses/LICENSE.md b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/soupsieve-2.8.dist-info/licenses/LICENSE.md new file mode 100644 index 000000000..20528a992 --- /dev/null +++ b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/soupsieve-2.8.dist-info/licenses/LICENSE.md @@ -0,0 +1,21 @@ +MIT License + +Copyright (c) 2018 - 2025 Isaac Muse + +Permission is hereby granted, free of charge, to any person obtaining a copy +of this software and associated documentation files (the "Software"), to deal +in the Software without restriction, including without limitation the rights +to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +copies of the Software, and to permit persons to whom the Software is +furnished to do so, subject to the following conditions: + +The above copyright notice and this permission notice shall be included in all +copies or substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +SOFTWARE. diff --git a/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/sphinx-7.4.7.dist-info/LICENSE.rst b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/sphinx-7.4.7.dist-info/LICENSE.rst new file mode 100644 index 000000000..db36b190c --- /dev/null +++ b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/sphinx-7.4.7.dist-info/LICENSE.rst @@ -0,0 +1,67 @@ +License for Sphinx +================== + +Unless otherwise indicated, all code in the Sphinx project is licenced under the +two clause BSD licence below. + +Copyright (c) 2007-2024 by the Sphinx team (see AUTHORS file). +All rights reserved. + +Redistribution and use in source and binary forms, with or without +modification, are permitted provided that the following conditions are +met: + +* Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + +* Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + +THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS +"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT +LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR +A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT +HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, +SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT +LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, +DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY +THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT +(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE +OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + + +Licenses for incorporated software +================================== + +The included implementation of NumpyDocstring._parse_numpydoc_see_also_section +was derived from code under the following license: + +------------------------------------------------------------------------------- + +Copyright (C) 2008 Stefan van der Walt , Pauli Virtanen + +Redistribution and use in source and binary forms, with or without +modification, are permitted provided that the following conditions are +met: + + 1. Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + 2. Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in + the documentation and/or other materials provided with the + distribution. + +THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR +IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED +WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE +DISCLAIMED. 
IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, +INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES +(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR +SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, +STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING +IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE +POSSIBILITY OF SUCH DAMAGE. + +------------------------------------------------------------------------------- diff --git a/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/sphinx_book_theme/assets/translations/README.md b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/sphinx_book_theme/assets/translations/README.md new file mode 100644 index 000000000..9e7e1deb5 --- /dev/null +++ b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/sphinx_book_theme/assets/translations/README.md @@ -0,0 +1,53 @@ +# Translation workflow + +This folder contains code and translations for supporting multiple languages with Sphinx. +See [the Sphinx internationalization documentation](https://www.sphinx-doc.org/en/master/usage/configuration.html) for more details. + +## Structure of translation files + +### Translation source files + +The source files for our translations are hand-edited, and contain the raw mapping of words onto various languages. +They are checked in to `git` history with this repository. + +`src/sphinx_book_theme/assets/translations/jsons` contains a collection of JSON files that define the translation for various phrases in this repository. +Each file is a different phrase, and its contents define language codes and translated phrases for each language we support. +They were originally created with [the smodin.io language translator](https://smodin.me/translate-one-text-into-multiple-languages) (see below for how to update them). + +### Compiled translation files + +The translation source files are compiled at build time (when we run `stb compile`) automatically. +This is executed by the Python script at `python src/sphinx_book_theme/_compile_translations.py` (more information on that below). + +These compiled files are **not checked into `.git` history**, but they **are** bundled with the theme when it is distributed in a package. +Here's a brief explanation of each: + +- `src/sphinx_book_theme/theme/sphinx_book_theme/static/locales` contains Sphinx locale files that were auto-converted from the files in `jsons/` by the helper script below. +- `src/sphinx_book_theme/_compile_translations.py` is a helper script to auto-generate Sphinx locale files from the JSONs in `jsons/`. + +## Workflow of translations + +Here's a short workflow of how to add a new translation, assuming that you are translating using the [smodin.io service](https://smodin.io/translate-one-text-into-multiple-languages). + +1. Go to [the smodin.io service](https://smodin.io/translate-one-text-into-multiple-languages) +2. Select as many languages as you like. +3. Type in the phrase you'd like to translate. +4. Click `TRANSLATE` and then `Download JSON`. +5. This will download a JSON file with a bunch of `language-code: translated-phrase` mappings. +6. Put this JSON in the `jsons/` folder, and rename it to be the phrase you've translated in English. + So if the original phrase is `My phrase`, you should name the file `My phrase.json`. +7. 
Run [the `prettier` formatter](https://prettier.io/) on this JSON to split it into multiple lines (this makes it easier to read and edit if translations should be updated) + + ```bash + prettier sphinx_book_theme/translations/jsons/.json + ``` + +8. Run `python src/sphinx_book_theme/_compile_translations.py` +9. This will generate the locale files (`.mo`) that Sphinx uses in its translation machinery, and put them in `locales//LC_MESSAGES/.mo`. + +Sphinx should now know how to translate this message! + +## To update a translation + +To update a translation, you may go to the phase you'd like to modify in `jsons/`, then find the entry for the language you'd like to update, and change its value. +Finally, run `python src/sphinx_book_theme/_compile_translations.py` and this will update the `.mo` files. diff --git a/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/sphinxcontrib_bibtex-2.6.5.dist-info/licenses/LICENSE.rst b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/sphinxcontrib_bibtex-2.6.5.dist-info/licenses/LICENSE.rst new file mode 100644 index 000000000..fb3f16937 --- /dev/null +++ b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/sphinxcontrib_bibtex-2.6.5.dist-info/licenses/LICENSE.rst @@ -0,0 +1,26 @@ +| sphinxcontrib-bibtex is a Sphinx extension for BibTeX style citations +| Copyright (c) 2011-2024 by Matthias C. M. Troffaes +| All rights reserved. + +Redistribution and use in source and binary forms, with or without +modification, are permitted provided that the following conditions are +met: + +* Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + +* Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + +THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS +"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT +LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR +A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT +OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, +SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT +LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, +DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY +THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT +(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE +OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. diff --git a/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/zmq/backend/cffi/README.md b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/zmq/backend/cffi/README.md new file mode 100644 index 000000000..00bb32989 --- /dev/null +++ b/doc/LectureNotes/_build/html/_sources/.venv/lib/python3.13/site-packages/zmq/backend/cffi/README.md @@ -0,0 +1 @@ +PyZMQ's CFFI support is designed only for (Unix) systems conforming to `have_sys_un_h = True`. 
diff --git a/doc/LectureNotes/_build/html/_sources/exercisesweek35.ipynb b/doc/LectureNotes/_build/html/_sources/exercisesweek35.ipynb index 886db99ef..403eab1f3 100644 --- a/doc/LectureNotes/_build/html/_sources/exercisesweek35.ipynb +++ b/doc/LectureNotes/_build/html/_sources/exercisesweek35.ipynb @@ -323,7 +323,7 @@ "source": [ "n = 100\n", "x = np.linspace(-3, 3, n)\n", - "y = np.exp(-x**2) + 1.5 * np.exp(-(x-2)**2) + np.random.normal(0, 1.0)" + "y = np.exp(-x**2) + 1.5 * np.exp(-(x-2)**2) + np.random.normal(n)" ] }, { diff --git a/doc/LectureNotes/_build/html/_sources/exercisesweek36.ipynb b/doc/LectureNotes/_build/html/_sources/exercisesweek36.ipynb index ddf3e11e5..3dd1ad167 100644 --- a/doc/LectureNotes/_build/html/_sources/exercisesweek36.ipynb +++ b/doc/LectureNotes/_build/html/_sources/exercisesweek36.ipynb @@ -172,7 +172,7 @@ "source": [ "n = 100\n", "x = np.linspace(-3, 3, n)\n", - "y = np.exp(-x**2) + 1.5 * np.exp(-(x-2)**2) + np.random.normal(0, 0.1)" + "y = np.exp(-x**2) + 1.5 * np.exp(-(x-2)**2) + np.random.normal(n)" ] }, { diff --git a/doc/LectureNotes/_build/html/_sources/exercisesweek37.ipynb b/doc/LectureNotes/_build/html/_sources/exercisesweek37.ipynb index 25296c4e0..bb6ba7a35 100644 --- a/doc/LectureNotes/_build/html/_sources/exercisesweek37.ipynb +++ b/doc/LectureNotes/_build/html/_sources/exercisesweek37.ipynb @@ -2,32 +2,33 @@ "cells": [ { "cell_type": "markdown", - "id": "1b941c35", + "id": "8e6632a0", "metadata": { "editable": true }, "source": [ "\n", - "" + "\n" ] }, { "cell_type": "markdown", - "id": "dc05b096", + "id": "82705c4f", "metadata": { "editable": true }, "source": [ "# Exercises week 37\n", + "\n", "**Implementing gradient descent for Ridge and ordinary Least Squares Regression**\n", "\n", - "Date: **September 8-12, 2025**" + "Date: **September 8-12, 2025**\n" ] }, { "cell_type": "markdown", - "id": "2cf07405", + "id": "921bf331", "metadata": { "editable": true }, @@ -35,55 +36,56 @@ "## Learning goals\n", "\n", "After having completed these exercises you will have:\n", + "\n", "1. Your own code for the implementation of the simplest gradient descent approach applied to ordinary least squares (OLS) and Ridge regression\n", "\n", "2. Be able to compare the analytical expressions for OLS and Ridge regression with the gradient descent approach\n", "\n", "3. Explore the role of the learning rate in the gradient descent approach and the hyperparameter $\\lambda$ in Ridge regression\n", "\n", - "4. Scale the data properly" + "4. Scale the data properly\n" ] }, { "cell_type": "markdown", - "id": "3c139edb", + "id": "adff65d5", "metadata": { "editable": true }, "source": [ "## Simple one-dimensional second-order polynomial\n", "\n", - "We start with a very simple function" + "We start with a very simple function\n" ] }, { "cell_type": "markdown", - "id": "aad4cfac", + "id": "70418b3d", "metadata": { "editable": true }, "source": [ "$$\n", "f(x)= 2-x+5x^2,\n", - "$$" + "$$\n" ] }, { "cell_type": "markdown", - "id": "6682282f", + "id": "11a3cf73", "metadata": { "editable": true }, "source": [ - "defined for $x\\in [-2,2]$. You can add noise if you wish. \n", + "defined for $x\\in [-2,2]$. You can add noise if you wish.\n", "\n", "We are going to fit this function with a polynomial ansatz. The easiest thing is to set up a second-order polynomial and see if you can fit the above function.\n", - "Feel free to play around with higher-order polynomials." 
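A minimal illustrative sketch of this fit, using an explicit design matrix and the OLS normal equations, could look like the following. The variable names and the noiseless choice are assumptions made for illustration only, not the official solution:

```python
import numpy as np

# Sketch: fit f(x) = 2 - x + 5x^2 on [-2, 2] with a second-order polynomial ansatz
n = 100
x = np.linspace(-2, 2, n)
y = 2 - x + 5 * x**2          # optionally add noise, e.g. + np.random.normal(0, 0.1, n)

# Design matrix for a second-order polynomial: columns 1, x, x^2
X = np.column_stack((np.ones(n), x, x**2))

# Ordinary least squares via the normal equations
theta = np.linalg.solve(X.T @ X, X.T @ y)
print(theta)                  # without noise this recovers approximately [2, -1, 5]
```

Exercise 1 below repeats this setup with standardized features and a centered target, which is the form used in the rest of the exercises.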
+ "Feel free to play around with higher-order polynomials.\n" ] }, { "cell_type": "markdown", - "id": "89e2f4c4", + "id": "04a06b51", "metadata": { "editable": true }, @@ -94,12 +96,12 @@ "standardize the features. This ensures all features are on a\n", "comparable scale, which is especially important when using\n", "regularization. Here we will perform standardization, scaling each\n", - "feature to have mean 0 and standard deviation 1." + "feature to have mean 0 and standard deviation 1.\n" ] }, { "cell_type": "markdown", - "id": "b06d4e53", + "id": "408db3d9", "metadata": { "editable": true }, @@ -114,13 +116,13 @@ "term, the data is shifted such that the intercept is effectively 0\n", ". (In practice, one could include an intercept in the model and not\n", "penalize it, but here we simplify by centering.)\n", - "Choose $n=100$ data points and set up $\\boldsymbol{x}, $\\boldsymbol{y}$ and the design matrix $\\boldsymbol{X}$." + "Choose $n=100$ data points and set up $\\boldsymbol{x}$, $\\boldsymbol{y}$ and the design matrix $\\boldsymbol{X}$.\n" ] }, { "cell_type": "code", "execution_count": 1, - "id": "63796480", + "id": "37fb732c", "metadata": { "collapsed": false, "editable": true @@ -140,46 +142,46 @@ }, { "cell_type": "markdown", - "id": "80748600", + "id": "d861e1e3", "metadata": { "editable": true }, "source": [ - "Fill in the necessary details.\n", + "Fill in the necessary details. Do we need to center the $y$-values?\n", "\n", "After this preprocessing, each column of $\\boldsymbol{X}_{\\mathrm{norm}}$ has mean zero and standard deviation $1$\n", "and $\\boldsymbol{y}_{\\mathrm{centered}}$ has mean 0. This makes the optimization landscape\n", "nicer and ensures the regularization penalty $\\lambda \\sum_j\n", "\\theta_j^2$ in Ridge regression treats each coefficient fairly (since features are on the\n", - "same scale)." + "same scale).\n" ] }, { "cell_type": "markdown", - "id": "92751e5f", + "id": "b3e774d0", "metadata": { "editable": true }, "source": [ "## Exercise 2, calculate the gradients\n", "\n", - "Find the gradients for OLS and Ridge regression using the mean-squared error as cost/loss function." + "Find the gradients for OLS and Ridge regression using the mean-squared error as cost/loss function.\n" ] }, { "cell_type": "markdown", - "id": "aedfbd7a", + "id": "d5dc7708", "metadata": { "editable": true }, "source": [ - "## Exercise 3, using the analytical formulae for OLS and Ridge regression to find the optimal paramters $\\boldsymbol{\\theta}$" + "## Exercise 3, using the analytical formulae for OLS and Ridge regression to find the optimal paramters $\\boldsymbol{\\theta}$\n" ] }, { "cell_type": "code", "execution_count": 2, - "id": "5d1288fa", + "id": "4c9c86ac", "metadata": { "collapsed": false, "editable": true @@ -187,7 +189,9 @@ "outputs": [], "source": [ "# Set regularization parameter, either a single value or a vector of values\n", - "lambda = ?\n", + "# Note that lambda is a python keyword. The lambda keyword is used to create small, single-expression functions without a formal name. These are often called \"anonymous functions\" or \"lambda functions.\"\n", + "lam = ?\n", + "\n", "\n", "# Analytical form for OLS and Ridge solution: theta_Ridge = (X^T X + lambda * I)^{-1} X^T y and theta_OLS = (X^T X)^{-1} X^T y\n", "I = np.eye(n_features)\n", @@ -200,7 +204,7 @@ }, { "cell_type": "markdown", - "id": "628f5e89", + "id": "eeae00fd", "metadata": { "editable": true }, @@ -208,37 +212,37 @@ "This computes the Ridge and OLS regression coefficients directly. 
The identity\n", "matrix $I$ has the same size as $X^T X$. It adds $\\lambda$ to the diagonal of $X^T X$ for Ridge regression. We\n", "then invert this matrix and multiply by $X^T y$. The result\n", - "for $\\boldsymbol{\\theta}$ is a NumPy array of shape (n$\\_$features,) containing the\n", - "fitted parameters $\\boldsymbol{\\theta}$." + "for $\\boldsymbol{\\theta}$ is a NumPy array of shape (n$\\_$features,) containing the\n", + "fitted parameters $\\boldsymbol{\\theta}$.\n" ] }, { "cell_type": "markdown", - "id": "f115ba4e", + "id": "e1c215d5", "metadata": { "editable": true }, "source": [ "### 3a)\n", "\n", - "Finalize, in the above code, the OLS and Ridge regression determination of the optimal parameters $\\boldsymbol{\\theta}$." + "Finalize, in the above code, the OLS and Ridge regression determination of the optimal parameters $\\boldsymbol{\\theta}$.\n" ] }, { "cell_type": "markdown", - "id": "a9b5189c", + "id": "587dd3dc", "metadata": { "editable": true }, "source": [ "### 3b)\n", "\n", - "Explore the results as function of different values of the hyperparameter $\\lambda$. See for example exercise 4 from week 36." + "Explore the results as function of different values of the hyperparameter $\\lambda$. See for example exercise 4 from week 36.\n" ] }, { "cell_type": "markdown", - "id": "a3969ff6", + "id": "bfa34697", "metadata": { "editable": true }, @@ -250,15 +254,15 @@ "necessary if $n$ and $p$ are so large that the closed-form might be\n", "too slow or memory-intensive. We derive the gradients from the cost\n", "functions defined above. Use the gradients of the Ridge and OLS cost functions with respect to\n", - "the parameters $\\boldsymbol{\\theta}$ and set up (using the template below) your own gradient descent code for OLS and Ridge regression.\n", + "the parameters $\\boldsymbol{\\theta}$ and set up (using the template below) your own gradient descent code for OLS and Ridge regression.\n", "\n", - "Below is a template code for gradient descent implementation of ridge:" + "Below is a template code for gradient descent implementation of ridge:\n" ] }, { "cell_type": "code", "execution_count": 3, - "id": "34d87303", + "id": "49245f55", "metadata": { "collapsed": false, "editable": true @@ -273,19 +277,8 @@ "# Initialize weights for gradient descent\n", "theta = np.zeros(n_features)\n", "\n", - "# Arrays to store history for plotting\n", - "cost_history = np.zeros(num_iters)\n", - "\n", "# Gradient descent loop\n", - "m = n_samples # number of data points\n", "for t in range(num_iters):\n", - " # Compute prediction error\n", - " error = X_norm.dot(theta) - y_centered \n", - " # Compute cost for OLS and Ridge (MSE + regularization for Ridge) for monitoring\n", - " cost_OLS = ?\n", - " cost_Ridge = ?\n", - " # You could add a history for both methods (optional)\n", - " cost_history[t] = ?\n", " # Compute gradients for OSL and Ridge\n", " grad_OLS = ?\n", " grad_Ridge = ?\n", @@ -302,31 +295,33 @@ }, { "cell_type": "markdown", - "id": "989f70bb", + "id": "f3f43f2c", "metadata": { "editable": true }, "source": [ "### 4a)\n", "\n", - "Discuss the results as function of the learning rate parameters and the number of iterations." 
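A hedged sketch of one way to complete the gradient-descent template above for both OLS and Ridge is given below. It assumes the cost functions are the MSE and the MSE plus $\lambda \sum_j \theta_j^2$, that `X_norm`, `y_centered`, `n_features` and `lam` are defined as in the earlier cells, and uses an illustrative learning rate `eta` and tolerance `tol`; it is not the official solution:

```python
import numpy as np

eta = 0.01        # learning rate (illustrative choice; experiment with it)
num_iters = 1000
tol = 1e-8        # stopping tolerance on the size of the parameter update

n_samples = X_norm.shape[0]
theta_OLS = np.zeros(n_features)
theta_Ridge = np.zeros(n_features)

for t in range(num_iters):
    # Gradients of the MSE cost (OLS) and of MSE + lam * ||theta||^2 (Ridge)
    grad_OLS = (2.0 / n_samples) * X_norm.T @ (X_norm @ theta_OLS - y_centered)
    grad_Ridge = (2.0 / n_samples) * X_norm.T @ (X_norm @ theta_Ridge - y_centered) \
                 + 2.0 * lam * theta_Ridge

    new_OLS = theta_OLS - eta * grad_OLS
    new_Ridge = theta_Ridge - eta * grad_Ridge

    # Simple stopping criterion: stop when the parameters barely change
    if max(np.linalg.norm(new_OLS - theta_OLS), np.linalg.norm(new_Ridge - theta_Ridge)) < tol:
        theta_OLS, theta_Ridge = new_OLS, new_Ridge
        break

    theta_OLS, theta_Ridge = new_OLS, new_Ridge

print("theta_OLS  :", theta_OLS)
print("theta_Ridge:", theta_Ridge)
```

For a sufficiently small learning rate and enough iterations, both parameter vectors should approach the corresponding closed-form solutions from Exercise 3, which gives a direct check of the implementation.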
+ "Write first a gradient descent code for OLS only using the above template.\n", + "Discuss the results as function of the learning rate parameters and the number of iterations\n" ] }, { "cell_type": "markdown", - "id": "370b2dad", + "id": "9ba303be", "metadata": { "editable": true }, "source": [ "### 4b)\n", "\n", - "Try to add a stopping parameter as function of the number iterations and the difference between the new and old $\\theta$ values. How would you define a stopping criterion?" + "Write then a similar code for Ridge regression using the above template.\n", + "Try to add a stopping parameter as function of the number iterations and the difference between the new and old $\\theta$ values. How would you define a stopping criterion?\n" ] }, { "cell_type": "markdown", - "id": "ef197cd7", + "id": "78362c6c", "metadata": { "editable": true }, @@ -346,13 +341,13 @@ "Then we sample feature values for $\\boldsymbol{X}$ randomly (e.g. from a normal distribution). We use a normal distribution so features are roughly centered around 0.\n", "Then we compute the target values $y$ using the linear combination $\\boldsymbol{X}\\hat{\\boldsymbol{\\theta}}$ and add some noise (to simulate measurement error or unexplained variance).\n", "\n", - "Below is the code to generate the dataset:" + "Below is the code to generate the dataset:\n" ] }, { "cell_type": "code", - "execution_count": 4, - "id": "4ccc2f65", + "execution_count": null, + "id": "8be1cebe", "metadata": { "collapsed": false, "editable": true @@ -375,13 +370,13 @@ "X = np.random.randn(n_samples, n_features) # standard normal distribution\n", "\n", "# Generate target values y with a linear combination of X and theta_true, plus noise\n", - "noise = 0.5 * np.random.randn(n_samples) # Gaussian noise\n", + "noise = 0.5 * np.random.randn(n_samples) # Gaussian noise\n", "y = X.dot @ theta_true + noise" ] }, { "cell_type": "markdown", - "id": "00e279ef", + "id": "e2693666", "metadata": { "editable": true }, @@ -390,29 +385,29 @@ "significantly influence $\\boldsymbol{y}$. The rest of the features have zero true\n", "coefficient. For example, feature 0 has\n", "a true weight of 5.0, feature 1 has -3.0, and feature 6 has 2.0, so\n", - "the expected relationship is:" + "the expected relationship is:\n" ] }, { "cell_type": "markdown", - "id": "c910b3f4", + "id": "bc954d12", "metadata": { "editable": true }, "source": [ "$$\n", "y \\approx 5 \\times x_0 \\;-\\; 3 \\times x_1 \\;+\\; 2 \\times x_6 \\;+\\; \\text{noise}.\n", - "$$" + "$$\n" ] }, { "cell_type": "markdown", - "id": "89e6e040", + "id": "6534b610", "metadata": { "editable": true }, "source": [ - "You can remove the noise if you wish to. \n", + "You can remove the noise if you wish to.\n", "\n", "Try to fit the above data set using OLS and Ridge regression with the analytical expressions and your own gradient descent codes.\n", "\n", @@ -420,11 +415,15 @@ "close to the true values [5.0, -3.0, 0.0, …, 2.0, …] that we used to\n", "generate the data. Keep in mind that due to regularization and noise,\n", "the learned values will not exactly equal the true ones, but they\n", - "should be in the same ballpark. Which method (OLS or Ridge) gives the best results?" + "should be in the same ballpark. 
Which method (OLS or Ridge) gives the best results?\n" ] } ], - "metadata": {}, + "metadata": { + "language_info": { + "name": "python" + } + }, "nbformat": 4, "nbformat_minor": 5 } diff --git a/doc/LectureNotes/_build/html/_sources/exercisesweek38.ipynb b/doc/LectureNotes/_build/html/_sources/exercisesweek38.ipynb new file mode 100644 index 000000000..c100028a5 --- /dev/null +++ b/doc/LectureNotes/_build/html/_sources/exercisesweek38.ipynb @@ -0,0 +1,485 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "1da77599", + "metadata": {}, + "source": [ + "# Exercises week 38\n", + "\n", + "## September 15-19\n", + "\n", + "## Resampling and the Bias-Variance Trade-off\n" + ] + }, + { + "cell_type": "markdown", + "id": "e9f27b0e", + "metadata": {}, + "source": [ + "### Learning goals\n", + "\n", + "After completing these exercises, you will know how to\n", + "\n", + "- Derive expectation and variances values related to linear regression\n", + "- Compute expectation and variances values related to linear regression\n", + "- Compute and evaluate the trade-off between bias and variance of a model\n", + "\n", + "### Deliverables\n", + "\n", + "Complete the following exercises while working in a jupyter notebook. Then, in canvas, include\n", + "\n", + "- The jupyter notebook with the exercises completed\n", + "- An exported PDF of the notebook (https://code.visualstudio.com/docs/datascience/jupyter-notebooks#_export-your-jupyter-notebook)\n" + ] + }, + { + "cell_type": "markdown", + "id": "984af8e3", + "metadata": {}, + "source": [ + "## Use the books!\n", + "\n", + "This week deals with various mean values and variances in linear regression methods (here it may be useful to look up chapter 3, equation (3.8) of [Trevor Hastie, Robert Tibshirani, Jerome H. 
Friedman, The Elements of Statistical Learning, Springer](https://www.springer.com/gp/book/9780387848570)).\n", + "\n", + "For more discussions on Ridge regression and calculation of expectation values, [Wessel van Wieringen's](https://arxiv.org/abs/1509.09169) article is highly recommended.\n", + "\n", + "The exercises this week are also a part of project 1 and can be reused in the theory part of the project.\n", + "\n", + "### Definitions\n", + "\n", + "We assume that there exists a continuous function $f(\\boldsymbol{x})$ and a normal distributed error $\\boldsymbol{\\varepsilon}\\sim N(0, \\sigma^2)$ which describes our data\n" + ] + }, + { + "cell_type": "markdown", + "id": "c16f7d0e", + "metadata": {}, + "source": [ + "$$\n", + "\\boldsymbol{y} = f(\\boldsymbol{x})+\\boldsymbol{\\varepsilon}\n", + "$$\n" + ] + }, + { + "cell_type": "markdown", + "id": "9fcf981a", + "metadata": { + "vscode": { + "languageId": "plaintext" + } + }, + "source": [ + "We further assume that this continous function can be modeled with a linear model $\\mathbf{\\tilde{y}}$ of some features $\\mathbf{X}$.\n" + ] + }, + { + "cell_type": "markdown", + "id": "d4189366", + "metadata": {}, + "source": [ + "$$\n", + "\\boldsymbol{y} = \\boldsymbol{\\tilde{y}} + \\boldsymbol{\\varepsilon} = \\boldsymbol{X}\\boldsymbol{\\beta} +\\boldsymbol{\\varepsilon}\n", + "$$\n" + ] + }, + { + "cell_type": "markdown", + "id": "f4fca21b", + "metadata": {}, + "source": [ + "We therefore get that our data $\\boldsymbol{y}$ has an expectation value $\\boldsymbol{X}\\boldsymbol{\\beta}$ and variance $\\sigma^2$, that is $\\boldsymbol{y}$ follows a normal distribution with mean value $\\boldsymbol{X}\\boldsymbol{\\beta}$ and variance $\\sigma^2$.\n" + ] + }, + { + "cell_type": "markdown", + "id": "5de0c7e6", + "metadata": {}, + "source": [ + "## Exercise 1: Expectation values for ordinary least squares expressions\n" + ] + }, + { + "cell_type": "markdown", + "id": "d878c699", + "metadata": {}, + "source": [ + "**a)** With the expressions for the optimal parameters $\\boldsymbol{\\hat{\\beta}_{OLS}}$ show that\n" + ] + }, + { + "cell_type": "markdown", + "id": "08b7007d", + "metadata": {}, + "source": [ + "$$\n", + "\\mathbb{E}(\\boldsymbol{\\hat{\\beta}_{OLS}}) = \\boldsymbol{\\beta}.\n", + "$$\n" + ] + }, + { + "cell_type": "markdown", + "id": "46e93394", + "metadata": {}, + "source": [ + "**b)** Show that the variance of $\\boldsymbol{\\hat{\\beta}_{OLS}}$ is\n" + ] + }, + { + "cell_type": "markdown", + "id": "be1b65be", + "metadata": {}, + "source": [ + "$$\n", + "\\mathbf{Var}(\\boldsymbol{\\hat{\\beta}_{OLS}}) = \\sigma^2 \\, (\\mathbf{X}^{T} \\mathbf{X})^{-1}.\n", + "$$\n" + ] + }, + { + "cell_type": "markdown", + "id": "d2143684", + "metadata": {}, + "source": [ + "We can use the last expression when we define a [confidence interval](https://en.wikipedia.org/wiki/Confidence_interval) for the parameters $\\boldsymbol{\\hat{\\beta}_{OLS}}$.\n", + "A given parameter ${\\boldsymbol{\\hat{\\beta}_{OLS}}}_j$ is given by the diagonal matrix element of the above matrix.\n" + ] + }, + { + "cell_type": "markdown", + "id": "f5c2dc22", + "metadata": {}, + "source": [ + "## Exercise 2: Expectation values for Ridge regression\n" + ] + }, + { + "cell_type": "markdown", + "id": "3893e3e7", + "metadata": {}, + "source": [ + "**a)** With the expressions for the optimal parameters $\\boldsymbol{\\hat{\\beta}_{Ridge}}$ show that\n" + ] + }, + { + "cell_type": "markdown", + "id": "79dc571f", + "metadata": {}, + "source": [ + "$$\n", + "\\mathbb{E} 
\\big[ \\hat{\\boldsymbol{\\beta}}^{\\mathrm{Ridge}} \\big]=(\\mathbf{X}^{T} \\mathbf{X} + \\lambda \\mathbf{I}_{pp})^{-1} (\\mathbf{X}^{\\top} \\mathbf{X})\\boldsymbol{\\beta}\n", + "$$\n" + ] + }, + { + "cell_type": "markdown", + "id": "028209a1", + "metadata": {}, + "source": [ + "We see that $\\mathbb{E} \\big[ \\hat{\\boldsymbol{\\beta}}^{\\mathrm{Ridge}} \\big] \\not= \\mathbb{E} \\big[\\hat{\\boldsymbol{\\beta}}^{\\mathrm{OLS}}\\big ]$ for any $\\lambda > 0$.\n" + ] + }, + { + "cell_type": "markdown", + "id": "b4e721fc", + "metadata": {}, + "source": [ + "**b)** Show that the variance is\n" + ] + }, + { + "cell_type": "markdown", + "id": "090eb1e1", + "metadata": {}, + "source": [ + "$$\n", + "\\mathbf{Var}[\\hat{\\boldsymbol{\\beta}}^{\\mathrm{Ridge}}]=\\sigma^2[ \\mathbf{X}^{T} \\mathbf{X} + \\lambda \\mathbf{I} ]^{-1} \\mathbf{X}^{T}\\mathbf{X} \\{ [ \\mathbf{X}^{\\top} \\mathbf{X} + \\lambda \\mathbf{I} ]^{-1}\\}^{T}\n", + "$$\n" + ] + }, + { + "cell_type": "markdown", + "id": "6b8e8697", + "metadata": {}, + "source": [ + "We see that if the parameter $\\lambda$ goes to infinity then the variance of the Ridge parameters $\\boldsymbol{\\beta}$ goes to zero.\n" + ] + }, + { + "cell_type": "markdown", + "id": "74bc300b", + "metadata": {}, + "source": [ + "## Exercise 3: Deriving the expression for the Bias-Variance Trade-off\n" + ] + }, + { + "cell_type": "markdown", + "id": "eeb86010", + "metadata": {}, + "source": [ + "The aim of this exercise is to derive the equations for the bias-variance tradeoff to be used in project 1.\n", + "\n", + "The parameters $\\boldsymbol{\\hat{\\beta}_{OLS}}$ are found by optimizing the mean squared error via the so-called cost function\n" + ] + }, + { + "cell_type": "markdown", + "id": "522a0d1d", + "metadata": {}, + "source": [ + "$$\n", + "C(\\boldsymbol{X},\\boldsymbol{\\beta}) =\\frac{1}{n}\\sum_{i=0}^{n-1}(y_i-\\tilde{y}_i)^2=\\mathbb{E}\\left[(\\boldsymbol{y}-\\boldsymbol{\\tilde{y}})^2\\right]\n", + "$$\n" + ] + }, + { + "cell_type": "markdown", + "id": "831db06c", + "metadata": {}, + "source": [ + "**a)** Show that you can rewrite this into an expression which contains\n", + "\n", + "- the variance of the model (the variance term)\n", + "- the expected deviation of the mean of the model from the true data (the bias term)\n", + "- the variance of the noise\n", + "\n", + "In other words, show that:\n" + ] + }, + { + "cell_type": "markdown", + "id": "8cc52b3c", + "metadata": {}, + "source": [ + "$$\n", + "\\mathbb{E}\\left[(\\boldsymbol{y}-\\boldsymbol{\\tilde{y}})^2\\right]=\\mathrm{Bias}[\\tilde{y}]+\\mathrm{var}[\\tilde{y}]+\\sigma^2,\n", + "$$\n" + ] + }, + { + "cell_type": "markdown", + "id": "8cb50416", + "metadata": {}, + "source": [ + "with\n" + ] + }, + { + "cell_type": "markdown", + "id": "e49bdbb4", + "metadata": {}, + "source": [ + "$$\n", + "\\mathrm{Bias}[\\tilde{y}]=\\mathbb{E}\\left[\\left(\\boldsymbol{y}-\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right]\\right)^2\\right],\n", + "$$\n" + ] + }, + { + "cell_type": "markdown", + "id": "eca5554a", + "metadata": {}, + "source": [ + "and\n" + ] + }, + { + "cell_type": "markdown", + "id": "b1054343", + "metadata": {}, + "source": [ + "$$\n", + "\\mathrm{var}[\\tilde{y}]=\\mathbb{E}\\left[\\left(\\tilde{\\boldsymbol{y}}-\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right]\\right)^2\\right]=\\frac{1}{n}\\sum_i(\\tilde{y}_i-\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right])^2.\n", + "$$\n", + "\n", + "In order to arrive at the equation for the bias, we have to approximate the unknown 
function $f$ with the output/target values $y$.\n" + ] + }, + { + "cell_type": "markdown", + "id": "70fbfcd7", + "metadata": {}, + "source": [ + "**b)** Explain what the terms mean and discuss their interpretations.\n" + ] + }, + { + "cell_type": "markdown", + "id": "b8f8b9d1", + "metadata": {}, + "source": [ + "## Exercise 4: Computing the Bias and Variance\n" + ] + }, + { + "cell_type": "markdown", + "id": "9e012430", + "metadata": {}, + "source": [ + "Before you compute the bias and variance of a real model for different complexities, let's for now assume that you have sampled predictions and targets for a single model complexity using bootstrap resampling.\n", + "\n", + "**a)** Using the expression above, compute the mean squared error, bias and variance of the given data. Check that the sum of the bias and variance correctly gives (approximately) the mean squared error.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "b5bf581c", + "metadata": {}, + "outputs": [], + "source": [ + "import numpy as np\n", + "\n", + "n = 100\n", + "bootstraps = 1000\n", + "\n", + "predictions = np.random.rand(bootstraps, n) * 10 + 10\n", + "# The definition of targets has been updated, and was wrong earlier in the week.\n", + "targets = np.random.rand(1, n)\n", + "\n", + "mse = ...\n", + "bias = ...\n", + "variance = ..." + ] + }, + { + "cell_type": "markdown", + "id": "7b1dc621", + "metadata": {}, + "source": [ + "**b)** Change the prediction values in some way to increase the bias while decreasing the variance.\n", + "\n", + "**c)** Change the prediction values in some way to increase the variance while decreasing the bias.\n" + ] + }, + { + "cell_type": "markdown", + "id": "8da63362", + "metadata": {}, + "source": [ + "**d)** Perform a bias-variance analysis of a polynomial OLS model fit to a one-dimensional function by computing and plotting the bias and variances values as a function of the polynomial degree of your model.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "dd5855e4", + "metadata": {}, + "outputs": [], + "source": [ + "import numpy as np\n", + "import matplotlib.pyplot as plt\n", + "from sklearn.preprocessing import (\n", + " PolynomialFeatures,\n", + ") # use the fit_transform method of the created object!\n", + "from sklearn.linear_model import LinearRegression\n", + "from sklearn.metrics import mean_squared_error\n", + "from sklearn.model_selection import train_test_split\n", + "from sklearn.utils import resample" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "7e35fa37", + "metadata": {}, + "outputs": [], + "source": [ + "n = 100\n", + "bootstraps = 1000\n", + "\n", + "x = np.linspace(-3, 3, n)\n", + "y = np.exp(-(x**2)) + 1.5 * np.exp(-((x - 2) ** 2)) + np.random.normal(0, 0.1)\n", + "\n", + "biases = []\n", + "variances = []\n", + "mses = []\n", + "\n", + "# for p in range(1, 5):\n", + "# predictions = ...\n", + "# targets = ...\n", + "#\n", + "# X = ...\n", + "# X_train, X_test, y_train, y_test = ...\n", + "# for b in range(bootstraps):\n", + "# X_train_re, y_train_re = ...\n", + "#\n", + "# # fit your model on the sampled data\n", + "#\n", + "# # make predictions on the test data\n", + "# predictions[b, :] =\n", + "# targets[b, :] =\n", + "#\n", + "# biases.append(...)\n", + "# variances.append(...)\n", + "# mses.append(...)" + ] + }, + { + "cell_type": "markdown", + "id": "253b8461", + "metadata": {}, + "source": [ + "**e)** Discuss the bias-variance trade-off as function of your model complexity (the 
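+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "A small optional sketch related to exercise 4a) above (not part of the exercise): one possible way to compute the MSE, bias and variance from arrays with the same layout as `predictions` and `targets`, using the definitions from exercise 3. The names `toy_predictions` and `toy_targets` are made up for illustration only.\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import numpy as np\n",
+    "\n",
+    "# Toy arrays with the same layout as in exercise 4a):\n",
+    "# predictions of shape (bootstraps, n), targets of shape (1, n).\n",
+    "rng = np.random.default_rng(2024)\n",
+    "toy_predictions = rng.normal(size=(500, 50))\n",
+    "toy_targets = rng.normal(size=(1, 50))\n",
+    "\n",
+    "# Definitions from exercise 3, averaged over the n data points\n",
+    "toy_mse = np.mean((toy_predictions - toy_targets) ** 2)\n",
+    "toy_bias = np.mean((toy_targets - np.mean(toy_predictions, axis=0)) ** 2)\n",
+    "toy_variance = np.mean(np.var(toy_predictions, axis=0))\n",
+    "\n",
+    "# For fixed targets this decomposition holds exactly\n",
+    "print(np.allclose(toy_mse, toy_bias + toy_variance))"
+   ]
+  },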
degree of the polynomial).\n", + "\n", + "**f)** Compute and discuss the bias and variance as function of the number of data points (choose a suitable polynomial degree to show something interesting).\n" + ] + }, + { + "cell_type": "markdown", + "id": "46250fbc", + "metadata": {}, + "source": [ + "## Exercise 5: Interpretation of scaling and metrics\n" + ] + }, + { + "cell_type": "markdown", + "id": "5af53055", + "metadata": {}, + "source": [ + "In this course, we often ask you to scale data and compute various metrics. Although these practices are \"standard\" in the field, we will require you to demonstrate an understanding of _why_ you need to scale data and use these metrics. Both so that you can make better arguements about your results, and so that you will hopefully make fewer mistakes.\n", + "\n", + "First, a few reminders: In this course you should always scale the columns of the feature matrix, and sometimes scale the target data, when it is worth the effort. By scaling, we mean subtracting the mean and dividing by the standard deviation, though there are many other ways to scale data. When scaling either the feature matrix or the target data, the intercept becomes a bit harder to implement and understand, so take care.\n", + "\n", + "Briefly answer the following:\n", + "\n", + "**a)** Why do we scale data?\n", + "\n", + "**b)** Why does the OLS method give practically equivelent models on scaled and unscaled data?\n", + "\n", + "**c)** Why does the Ridge method **not** give practically equivelent models on scaled and unscaled data? Why do we only consider the model on scaled data correct?\n", + "\n", + "**d)** Why do we say that the Ridge method gives a biased model?\n", + "\n", + "**e)** Is the MSE of the OLS method affected by scaling of the feature matrix? Is it affected by scaling of the target data?\n", + "\n", + "**f)** Read about the R2 score, a metric we will ask you to use a lot later in the course. Is the R2 score of the OLS method affected by scaling of the feature matrix? Is it affected by scaling of the target data?\n", + "\n", + "**g)** Give interpretations of the following R2 scores: 0, 0.5, 1.\n", + "\n", + "**h)** What is an advantage of the R2 score over the MSE?\n" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.15" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/doc/LectureNotes/_build/html/_sources/exercisesweek39.ipynb b/doc/LectureNotes/_build/html/_sources/exercisesweek39.ipynb new file mode 100644 index 000000000..22a86cb56 --- /dev/null +++ b/doc/LectureNotes/_build/html/_sources/exercisesweek39.ipynb @@ -0,0 +1,185 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "433db993", + "metadata": {}, + "source": [ + "# Exercises week 39\n", + "\n", + "## Getting started with project 1\n" + ] + }, + { + "cell_type": "markdown", + "id": "6b931365", + "metadata": {}, + "source": [ + "The aim of the exercises this week is to aid you in getting started with writing the report. This will be discussed during the lab sessions as well.\n", + "\n", + "A short feedback to the this exercise will be available before the project deadline. 
And you can reuse these elements in your final report.\n" + ] + }, + { + "cell_type": "markdown", + "id": "2a63bae1", + "metadata": {}, + "source": [ + "### Learning goals\n", + "\n", + "After completing these exercises, you will know how to\n", + "\n", + "- Create a properly formatted report in Overleaf\n", + "- Select and present graphs for a scientific report\n", + "- Write an abstract and introduction for a scientific report\n", + "\n", + "### Deliverables\n", + "\n", + "Complete the following exercises while working in an Overleaf project. Then, in canvas, include\n", + "\n", + "- An exported PDF of the report draft you have been working on.\n", + "- A comment linking to the github repository used in exercise 4.\n" + ] + }, + { + "cell_type": "markdown", + "id": "e0f2d99d", + "metadata": {}, + "source": [ + "## Exercise 1: Creating the report document\n" + ] + }, + { + "cell_type": "markdown", + "id": "d06bfb29", + "metadata": {}, + "source": [ + "We require all projects to be formatted as proper scientific reports, and this includes using LaTeX for typesetting. We strongly recommend that you use the online LaTeX editor Overleaf, as it is much easier to start using, and has excellent support for collaboration.\n", + "\n", + "**a)** Create an account on Overleaf.com, or log in using SSO with your UiO email.\n", + "\n", + "**b)** Download [this](https://github.com/CompPhysics/MachineLearning/blob/master/doc/LectureNotes/data/FYS_STK_Template.zip) template project.\n", + "\n", + "**c)** Create a new Overleaf project with the correct formatting by uploading the template project.\n", + "\n", + "**d)** Read the general guideline for writing a report, which can be found at .\n", + "\n", + "**e)** Look at the provided example of an earlier project, found at \n" + ] + }, + { + "cell_type": "markdown", + "id": "ec36f4c3", + "metadata": {}, + "source": [ + "## Exercise 2: Adding good figures\n" + ] + }, + { + "cell_type": "markdown", + "id": "f50723f8", + "metadata": {}, + "source": [ + "**a)** Using what you have learned so far in this course, create a plot illustrating the Bias-Variance trade-off. Make sure the lines and axes are labeled, with font size being the same as in the text.\n", + "\n", + "**b)** Add this figure to the results section of your document, with a caption that describes it. A reader should be able to understand the figure with only its contents and caption.\n", + "\n", + "**c)** Refer to the figure in your text using \\ref.\n", + "\n", + "**d)** Create a heatmap showing the MSE of a Ridge regression model for various polynomial degrees and lambda values. Make sure the axes are labeled, and that the title or colorbar describes what is plotted.\n", + "\n", + "**e)** Add this second figure to your document with a caption and reference in the text. All figures in the final report must be captioned and be referenced and used in the text.\n" + ] + }, + { + "cell_type": "markdown", + "id": "276c214e", + "metadata": {}, + "source": [ + "## Exercise 3: Writing an abstract and introduction\n" + ] + }, + { + "cell_type": "markdown", + "id": "f4134eb5", + "metadata": {}, + "source": [ + "Although much of your project 1 results are not done yet, we want you to write an abstract and introduction to get you started on writing the report. It is generally a good idea to write a lot of a report before finishing all of the results, as you get a better understanding of your methods and inquiry from doing so, along with saving a lot of time. 
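+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Related to exercise 2 d) above: the cell below is a rough, optional sketch of how a labeled heatmap with a colorbar can be made in matplotlib. The arrays `degrees`, `lambdas` and `mse_grid` are placeholders; swap in your own Ridge results.\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import numpy as np\n",
+    "import matplotlib.pyplot as plt\n",
+    "\n",
+    "# Placeholder data; replace with your own polynomial degrees, lambda values and test MSEs\n",
+    "degrees = np.arange(1, 6)\n",
+    "lambdas = np.logspace(-4, 1, 6)\n",
+    "mse_grid = np.random.rand(len(degrees), len(lambdas))  # shape (n_degrees, n_lambdas)\n",
+    "\n",
+    "fig, ax = plt.subplots()\n",
+    "im = ax.imshow(mse_grid, aspect='auto', origin='lower')\n",
+    "ax.set_xticks(range(len(lambdas)))\n",
+    "ax.set_xticklabels([f'{lam:.0e}' for lam in lambdas])\n",
+    "ax.set_yticks(range(len(degrees)))\n",
+    "ax.set_yticklabels([str(d) for d in degrees])\n",
+    "ax.set_xlabel('lambda')\n",
+    "ax.set_ylabel('polynomial degree')\n",
+    "fig.colorbar(im, ax=ax, label='test MSE')\n",
+    "plt.show()"
+   ]
+  },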
Where you would typically describe results in the abstract, instead make something up, just this once.\n", + "\n", + "**a)** Read the guidelines on abstract and introduction before you start.\n", + "\n", + "**b)** Write an abstract for project 1 in your report.\n", + "\n", + "**c)** Write an introduction for project 1 in your report.\n" + ] + }, + { + "cell_type": "markdown", + "id": "2f512b59", + "metadata": {}, + "source": [ + "## Exercise 4: Making the code available and presentable\n" + ] + }, + { + "cell_type": "markdown", + "id": "77fe1fec", + "metadata": {}, + "source": [ + "A central part of the report is the code you write to implement the methods and generate the results. To get points for the code-part of the project, you need to make your code avaliable and presentable.\n", + "\n", + "**a)** Create a github repository for project 1, or create a dedicated folder for project 1 in a github repository. Only one person in your group needs to do this.\n", + "\n", + "**b)** Add a PDF of the report to this repository, after completing exercises 1-3\n", + "\n", + "**c)** Add a folder named Code, where you can put python files for your functions and notebooks for reproducing your results.\n", + "\n", + "**d)** Add python files for functions, and a notebook to produce the figures in exercise 2, to the Code folder. Remember to use a seed for generating random data and for train-test splits.\n", + "\n", + "**e)** Create a README file in the repository or project folder with\n", + "\n", + "- the name of the group members\n", + "- a short description of the project\n", + "- a description of how to install the required packages to run your code from a requirements.txt file\n", + "- names and descriptions of the various notebooks in the Code folder and the results they produce\n" + ] + }, + { + "cell_type": "markdown", + "id": "f1d72c56", + "metadata": {}, + "source": [ + "## Exercise 5: Referencing\n", + "\n", + "**a)** Add a reference to Hastie et al. using your preferred referencing style. See https://www.sokogskriv.no/referansestiler/ for an overview of styles.\n", + "\n", + "**b)** Add a reference to sklearn like this: https://scikit-learn.org/stable/about.html#citing-scikit-learn\n", + "\n", + "**c)** Make a prompt to your LLM of choice, and upload the exported conversation to your GitHub repository for the project.\n", + "\n", + "**d)** At the end of the methods section of the report, write a one paragraph declaration on how and for what you have used the LLM. 
Link to the log on GitHub.\n" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.15" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/doc/LectureNotes/_build/html/_sources/exercisesweek41.ipynb b/doc/LectureNotes/_build/html/_sources/exercisesweek41.ipynb new file mode 100644 index 000000000..190c0b96a --- /dev/null +++ b/doc/LectureNotes/_build/html/_sources/exercisesweek41.ipynb @@ -0,0 +1,804 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "4b4c06bc", + "metadata": {}, + "source": [ + "\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "id": "bcb25e64", + "metadata": {}, + "source": [ + "# Exercises week 41\n", + "\n", + "**October 6-10, 2025**\n", + "\n", + "Date: **Deadline is Friday October 10 at midnight**\n" + ] + }, + { + "cell_type": "markdown", + "id": "bb01f126", + "metadata": {}, + "source": [ + "# Overarching aims of the exercises this week\n", + "\n", + "This week, you will implement the entire feed-forward pass of a neural network! Next week you will compute the gradient of the network by implementing back-propagation manually, and by using autograd which does back-propagation for you (much easier!). Next week, you will also use the gradient to optimize the network with a gradient method! However, there is an optional exercise this week to get started on training the network and getting good results!\n", + "\n", + "We recommend that you do the exercises this week by editing and running this notebook file, as it includes some checks along the way that you have implemented the pieces of the feed-forward pass correctly, and running small parts of the code at a time will be important for understanding the methods.\n", + "\n", + "If you have trouble running a notebook, you can run this notebook in google colab instead (https://colab.research.google.com/drive/1zKibVQf-iAYaAn2-GlKfgRjHtLnPlBX4#offline=true&sandboxMode=true), an updated link will be provided on the course discord (you can also send an email to k.h.fredly@fys.uio.no if you encounter any trouble), though we recommend that you set up VSCode and your python environment to run code like this locally.\n", + "\n", + "First, here are some functions you are going to need, don't change this cell. 
If you are unable to import autograd, just swap in normal numpy until you want to do the final optional exercise.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c6f61b09", + "metadata": {}, + "outputs": [], + "source": [ + "import autograd.numpy as np # We need to use this numpy wrapper to make automatic differentiation work later\n", + "from sklearn import datasets\n", + "import matplotlib.pyplot as plt\n", + "from sklearn.metrics import accuracy_score\n", + "\n", + "\n", + "# Defining some activation functions\n", + "def ReLU(z):\n", + " return np.where(z > 0, z, 0)\n", + "\n", + "\n", + "def sigmoid(z):\n", + " return 1 / (1 + np.exp(-z))\n", + "\n", + "\n", + "def softmax(z):\n", + " \"\"\"Compute softmax values for each set of scores in the rows of the matrix z.\n", + " Used with batched input data.\"\"\"\n", + " e_z = np.exp(z - np.max(z, axis=0))\n", + " return e_z / np.sum(e_z, axis=1)[:, np.newaxis]\n", + "\n", + "\n", + "def softmax_vec(z):\n", + " \"\"\"Compute softmax values for each set of scores in the vector z.\n", + " Use this function when you use the activation function on one vector at a time\"\"\"\n", + " e_z = np.exp(z - np.max(z))\n", + " return e_z / np.sum(e_z)" + ] + }, + { + "cell_type": "markdown", + "id": "6248ec53", + "metadata": {}, + "source": [ + "# Exercise 1\n", + "\n", + "In this exercise you will compute the activation of the first layer. You only need to change the code in the cells right below an exercise, the rest works out of the box. Feel free to make changes and see how stuff works though!\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "37f30740", + "metadata": {}, + "outputs": [], + "source": [ + "np.random.seed(2024)\n", + "\n", + "x = np.random.randn(2) # network input. This is a single input with two features\n", + "W1 = np.random.randn(4, 2) # first layer weights" + ] + }, + { + "cell_type": "markdown", + "id": "4ed2cf3d", + "metadata": {}, + "source": [ + "**a)** Given the shape of the first layer weight matrix, what is the input shape of the neural network? What is the output shape of the first layer?\n" + ] + }, + { + "cell_type": "markdown", + "id": "edf7217b", + "metadata": {}, + "source": [ + "**b)** Define the bias of the first layer, `b1`with the correct shape. (Run the next cell right after the previous to get the random generated values to line up with the test solution below)\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "2129c19f", + "metadata": {}, + "outputs": [], + "source": [ + "b1 = ..." + ] + }, + { + "cell_type": "markdown", + "id": "09e8d453", + "metadata": {}, + "source": [ + "**c)** Compute the intermediary `z1` for the first layer\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "6837119b", + "metadata": {}, + "outputs": [], + "source": [ + "z1 = ..." + ] + }, + { + "cell_type": "markdown", + "id": "6f71374e", + "metadata": {}, + "source": [ + "**d)** Compute the activation `a1` for the first layer using the ReLU activation function defined earlier.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "8d41ed19", + "metadata": {}, + "outputs": [], + "source": [ + "a1 = ..." + ] + }, + { + "cell_type": "markdown", + "id": "088710c0", + "metadata": {}, + "source": [ + "Confirm that you got the correct activation with the test below. 
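+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "As an optional aside, here is a standalone shape illustration with made-up sizes, separate from the exercise variables above (it assumes the `ReLU` function and `np` import from the setup cell): a layer is a matrix-vector product plus a bias, followed by the activation.\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Standalone shape illustration: a layer mapping 3 input features to 5 outputs\n",
+    "example_x = np.arange(3.0)                 # input vector with 3 features\n",
+    "example_W = np.arange(15.0).reshape(5, 3)  # weight matrix: (n_outputs, n_inputs)\n",
+    "example_b = np.ones(5)                     # one bias per output node\n",
+    "example_z = example_W @ example_x + example_b\n",
+    "example_a = ReLU(example_z)                # ReLU defined in the setup cell\n",
+    "print(example_z.shape, example_a.shape)    # (5,) (5,)"
+   ]
+  },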
Make sure that you define `b1` with the randn function right after you define `W1`.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "4d2f54b4", + "metadata": {}, + "outputs": [], + "source": [ + "sol1 = np.array([0.60610368, 4.0076268, 0.0, 0.56469864])\n", + "\n", + "print(np.allclose(a1, sol1))" + ] + }, + { + "cell_type": "markdown", + "id": "7fb0cf46", + "metadata": {}, + "source": [ + "# Exercise 2\n", + "\n", + "Now we will add a layer to the network with an output of length 8 and ReLU activation.\n", + "\n", + "**a)** What is the input of the second layer? What is its shape?\n", + "\n", + "**b)** Define the weight and bias of the second layer with the right shapes.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "00063acf", + "metadata": {}, + "outputs": [], + "source": [ + "W2 = ...\n", + "b2 = ..." + ] + }, + { + "cell_type": "markdown", + "id": "5bd7d84b", + "metadata": {}, + "source": [ + "**c)** Compute the intermediary `z2` and activation `a2` for the second layer.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "2fd0383d", + "metadata": {}, + "outputs": [], + "source": [ + "z2 = ...\n", + "a2 = ..." + ] + }, + { + "cell_type": "markdown", + "id": "1b5daae5", + "metadata": {}, + "source": [ + "Confirm that you got the correct activation shape with the test below.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f7f2f8a1", + "metadata": {}, + "outputs": [], + "source": [ + "print(\n", + " np.allclose(np.exp(len(a2)), 2980.9579870417283)\n", + ") # This should evaluate to True if a2 has the correct shape :)" + ] + }, + { + "cell_type": "markdown", + "id": "3759620d", + "metadata": {}, + "source": [ + "# Exercise 3\n", + "\n", + "We often want our neural networks to have many layers of varying sizes. To avoid writing very long and error-prone code where we explicitly define and evaluate each layer we should keep all our layers in a single variable which is easy to create and use.\n", + "\n", + "**a)** Complete the function below so that it returns a list `layers` of weight and bias tuples `(W, b)` for each layer, in order, with the correct shapes that we can use later as our network parameters.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c58f10f9", + "metadata": {}, + "outputs": [], + "source": [ + "def create_layers(network_input_size, layer_output_sizes):\n", + " layers = []\n", + "\n", + " i_size = network_input_size\n", + " for layer_output_size in layer_output_sizes:\n", + " W = ...\n", + " b = ...\n", + " layers.append((W, b))\n", + "\n", + " i_size = layer_output_size\n", + " return layers" + ] + }, + { + "cell_type": "markdown", + "id": "bdc0cda2", + "metadata": {}, + "source": [ + "**b)** Comple the function below so that it evaluates the intermediary `z` and activation `a` for each layer, with ReLU actication, and returns the final activation `a`. This is the complete feed-forward pass, a full neural network!\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "5262df05", + "metadata": {}, + "outputs": [], + "source": [ + "def feed_forward_all_relu(layers, input):\n", + " a = input\n", + " for W, b in layers:\n", + " z = ...\n", + " a = ...\n", + " return a" + ] + }, + { + "cell_type": "markdown", + "id": "245adbcb", + "metadata": {}, + "source": [ + "**c)** Create a network with input size 8 and layers with output sizes 10, 16, 6, 2. 
Evaluate it and make sure that you get the correct size vectors along the way.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "89a8f70d", + "metadata": {}, + "outputs": [], + "source": [ + "input_size = ...\n", + "layer_output_sizes = [...]\n", + "\n", + "x = np.random.rand(input_size)\n", + "layers = ...\n", + "predict = ...\n", + "print(predict)" + ] + }, + { + "cell_type": "markdown", + "id": "0da7fd52", + "metadata": {}, + "source": [ + "**d)** Why is a neural network with no activation functions mathematically equivelent to(can be reduced to) a neural network with only one layer?\n" + ] + }, + { + "cell_type": "markdown", + "id": "306d8b7c", + "metadata": {}, + "source": [ + "# Exercise 4 - Custom activation for each layer\n" + ] + }, + { + "cell_type": "markdown", + "id": "221c7b6c", + "metadata": {}, + "source": [ + "So far, every layer has used the same activation, ReLU. We often want to use other types of activation however, so we need to update our code to support multiple types of activation functions. Make sure that you have completed every previous exercise before trying this one.\n" + ] + }, + { + "cell_type": "markdown", + "id": "10896d06", + "metadata": {}, + "source": [ + "**a)** Complete the `feed_forward` function which accepts a list of activation functions as an argument, and which evaluates these activation functions at each layer.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "de062369", + "metadata": {}, + "outputs": [], + "source": [ + "def feed_forward(input, layers, activation_funcs):\n", + " a = input\n", + " for (W, b), activation_func in zip(layers, activation_funcs):\n", + " z = ...\n", + " a = ...\n", + " return a" + ] + }, + { + "cell_type": "markdown", + "id": "8f7df363", + "metadata": {}, + "source": [ + "**b)** You are now given a list with three activation functions, two ReLU and one sigmoid. (Don't call them yet! you can make a list with function names as elements, and then call these elements of the list later. If you add other functions than the ones defined at the start of the notebook, make sure everything is defined using autograd's numpy wrapper, like above, since we want to use automatic differentiation on all of these functions later.)\n", + "\n", + "Evaluate a network with three layers and these activation functions.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "301b46dc", + "metadata": {}, + "outputs": [], + "source": [ + "network_input_size = ...\n", + "layer_output_sizes = [...]\n", + "activation_funcs = [ReLU, ReLU, sigmoid]\n", + "layers = ...\n", + "\n", + "x = np.random.randn(network_input_size)\n", + "feed_forward(x, layers, activation_funcs)" + ] + }, + { + "cell_type": "markdown", + "id": "9c914fd0", + "metadata": {}, + "source": [ + "**c)** How does the output of the network change if you use sigmoid in the hidden layers and ReLU in the output layer?\n" + ] + }, + { + "cell_type": "markdown", + "id": "a8d6c425", + "metadata": {}, + "source": [ + "# Exercise 5 - Processing multiple inputs at once\n" + ] + }, + { + "cell_type": "markdown", + "id": "0f4330a4", + "metadata": {}, + "source": [ + "So far, the feed forward function has taken one input vector as an input. This vector then undergoes a linear transformation and then an element-wise non-linear operation for each layer. 
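+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Related to exercise 3 d) above, here is a small optional numerical illustration (with made-up matrices) of why layers without activation functions collapse into a single linear layer.\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Two affine layers WITHOUT activations equal one affine layer:\n",
+    "# W2 @ (W1 @ x + b1) + b2 == (W2 @ W1) @ x + (W2 @ b1 + b2)\n",
+    "x_demo = np.random.randn(6)\n",
+    "W1_demo, b1_demo = np.random.randn(4, 6), np.random.randn(4)\n",
+    "W2_demo, b2_demo = np.random.randn(2, 4), np.random.randn(2)\n",
+    "\n",
+    "two_layer_out = W2_demo @ (W1_demo @ x_demo + b1_demo) + b2_demo\n",
+    "one_layer_out = (W2_demo @ W1_demo) @ x_demo + (W2_demo @ b1_demo + b2_demo)\n",
+    "print(np.allclose(two_layer_out, one_layer_out))  # True"
+   ]
+  },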
This approach of sending one vector in at a time is great for interpreting how the network transforms data with its linear and non-linear operations, but not the best for numerical efficiency. Now, we want to be able to send many inputs through the network at once. This will make the code a bit harder to understand, but it will make it faster, and more compact. It will be worth the trouble.\n", + "\n", + "To process multiple inputs at once, while still performing the same operations, you will only need to flip a couple things around.\n" + ] + }, + { + "cell_type": "markdown", + "id": "17023bb7", + "metadata": {}, + "source": [ + "**a)** Complete the function `create_layers_batch` so that the weight matrix is the transpose of what it was when you only sent in one input at a time.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a241fd79", + "metadata": {}, + "outputs": [], + "source": [ + "def create_layers_batch(network_input_size, layer_output_sizes):\n", + " layers = []\n", + "\n", + " i_size = network_input_size\n", + " for layer_output_size in layer_output_sizes:\n", + " W = ...\n", + " b = ...\n", + " layers.append((W, b))\n", + "\n", + " i_size = layer_output_size\n", + " return layers" + ] + }, + { + "cell_type": "markdown", + "id": "a6349db6", + "metadata": {}, + "source": [ + "**b)** Make a matrix of inputs with the shape (number of inputs, number of features), you choose the number of inputs and features per input. Then complete the function `feed_forward_batch` so that you can process this matrix of inputs with only one matrix multiplication and one broadcasted vector addition per layer. (Hint: You will only need to swap two variable around from your previous implementation, but remember to test that you get the same results for equivelent inputs!)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "425f3bcc", + "metadata": {}, + "outputs": [], + "source": [ + "inputs = np.random.rand(1000, 4)\n", + "\n", + "\n", + "def feed_forward_batch(inputs, layers, activation_funcs):\n", + " a = inputs\n", + " for (W, b), activation_func in zip(layers, activation_funcs):\n", + " z = ...\n", + " a = ...\n", + " return a" + ] + }, + { + "cell_type": "markdown", + "id": "efd07b4e", + "metadata": {}, + "source": [ + "**c)** Create and evaluate a neural network with 4 input features, and layers with output sizes 12, 10, 3 and activations ReLU, ReLU, softmax.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ce6fcc2f", + "metadata": {}, + "outputs": [], + "source": [ + "network_input_size = ...\n", + "layer_output_sizes = [...]\n", + "activation_funcs = [...]\n", + "layers = create_layers_batch(network_input_size, layer_output_sizes)\n", + "\n", + "x = np.random.randn(network_input_size)\n", + "feed_forward_batch(inputs, layers, activation_funcs)" + ] + }, + { + "cell_type": "markdown", + "id": "87999271", + "metadata": {}, + "source": [ + "You should use this batched approach moving forward, as it will lead to much more compact code. 
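+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "A quick optional check related to the hint in b) above, with made-up arrays: with inputs stored as rows of a batch, multiplying by the transposed weight matrix gives the same result, row by row, as the single-vector version.\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Each row of X @ W.T + b equals W @ x + b for the corresponding input x\n",
+    "X_batch = np.random.randn(8, 3)   # 8 inputs with 3 features each\n",
+    "W_single = np.random.randn(5, 3)  # single-vector convention: (n_outputs, n_features)\n",
+    "b_vec = np.random.randn(5)\n",
+    "\n",
+    "batched = X_batch @ W_single.T + b_vec  # shape (8, 5), b_vec broadcasts over rows\n",
+    "row_by_row = np.array([W_single @ x + b_vec for x in X_batch])\n",
+    "print(np.allclose(batched, row_by_row))  # True"
+   ]
+  },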
However, remember that each input is still treated separately, and that you will need to keep in mind the transposed weight matrix and other details when implementing backpropagation.\n" + ] + }, + { + "cell_type": "markdown", + "id": "237eb782", + "metadata": {}, + "source": [ + "# Exercise 6 - Predicting on real data\n" + ] + }, + { + "cell_type": "markdown", + "id": "54d5fde7", + "metadata": {}, + "source": [ + "You will now evaluate your neural network on the iris data set (https://scikit-learn.org/1.5/auto_examples/datasets/plot_iris_dataset.html).\n", + "\n", + "This dataset contains data on 150 flowers of 3 different types which can be separated pretty well using the four features given for each flower, which includes the width and length of their leaves. You are will later train your network to actually make good predictions.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "6bd4c148", + "metadata": {}, + "outputs": [], + "source": [ + "iris = datasets.load_iris()\n", + "\n", + "_, ax = plt.subplots()\n", + "scatter = ax.scatter(iris.data[:, 0], iris.data[:, 1], c=iris.target)\n", + "ax.set(xlabel=iris.feature_names[0], ylabel=iris.feature_names[1])\n", + "_ = ax.legend(\n", + " scatter.legend_elements()[0], iris.target_names, loc=\"lower right\", title=\"Classes\"\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ed3e2fc9", + "metadata": {}, + "outputs": [], + "source": [ + "inputs = iris.data\n", + "\n", + "# Since each prediction is a vector with a score for each of the three types of flowers,\n", + "# we need to make each target a vector with a 1 for the correct flower and a 0 for the others.\n", + "targets = np.zeros((len(iris.data), 3))\n", + "for i, t in enumerate(iris.target):\n", + " targets[i, t] = 1\n", + "\n", + "\n", + "def accuracy(predictions, targets):\n", + " one_hot_predictions = np.zeros(predictions.shape)\n", + "\n", + " for i, prediction in enumerate(predictions):\n", + " one_hot_predictions[i, np.argmax(prediction)] = 1\n", + " return accuracy_score(one_hot_predictions, targets)" + ] + }, + { + "cell_type": "markdown", + "id": "0362c4a9", + "metadata": {}, + "source": [ + "**a)** What should the input size for the network be with this dataset? What should the output size of the last layer be?\n" + ] + }, + { + "cell_type": "markdown", + "id": "bf62607e", + "metadata": {}, + "source": [ + "**b)** Create a network with two hidden layers, the first with sigmoid activation and the last with softmax, the first layer should have 8 \"nodes\", the second has the number of nodes you found in exercise a). Softmax returns a \"probability distribution\", in the sense that the numbers in the output are positive and add up to 1 and, their magnitude are in some sense relative to their magnitude before going through the softmax function. Remember to use the batched version of the create_layers and feed forward functions.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "5366d4ae", + "metadata": {}, + "outputs": [], + "source": [ + "...\n", + "layers = ..." + ] + }, + { + "cell_type": "markdown", + "id": "c528846f", + "metadata": {}, + "source": [ + "**c)** Evaluate your model on the entire iris dataset! For later purposes, we will split the data into train and test sets, and compute gradients on smaller batches of the training data. 
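+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "A small optional sanity check of the claim in b) above, using the batched `softmax` from the setup cell on made-up scores: every output row is positive and sums to one.\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "scores = np.random.randn(6, 3)  # 6 made-up inputs, 3 classes\n",
+    "probs = softmax(scores)         # batched softmax from the setup cell\n",
+    "print(np.all(probs > 0), np.allclose(probs.sum(axis=1), 1.0))"
+   ]
+  },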
But for now, evaluate the network on the whole thing at once.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "6c783105", + "metadata": {}, + "outputs": [], + "source": [ + "predictions = feed_forward_batch(inputs, layers, activation_funcs)" + ] + }, + { + "cell_type": "markdown", + "id": "01a3caa8", + "metadata": {}, + "source": [ + "**d)** Compute the accuracy of your model using the accuracy function defined above. Recreate your model a couple times and see how the accuracy changes.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a2612b82", + "metadata": {}, + "outputs": [], + "source": [ + "print(accuracy(predictions, targets))" + ] + }, + { + "cell_type": "markdown", + "id": "334560b6", + "metadata": {}, + "source": [ + "# Exercise 7 - Training on real data (Optional)\n", + "\n", + "To be able to actually do anything useful with your neural network, you need to train it. For this, we need a cost function and a way to take the gradient of the cost function wrt. the network parameters. The following exercises guide you through taking the gradient using autograd, and updating the network parameters using the gradient. Feel free to implement gradient methods like ADAM if you finish everything.\n" + ] + }, + { + "cell_type": "markdown", + "id": "700cabe4", + "metadata": {}, + "source": [ + "Since we are doing a classification task with multiple output classes, we use the cross-entropy loss function, which can evaluate performance on classification tasks. It sees if your prediction is \"most certain\" on the correct target.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f30e6e2c", + "metadata": {}, + "outputs": [], + "source": [ + "def cross_entropy(predict, target):\n", + " return np.sum(-target * np.log(predict))\n", + "\n", + "\n", + "def cost(input, layers, activation_funcs, target):\n", + " predict = feed_forward_batch(input, layers, activation_funcs)\n", + " return cross_entropy(predict, target)" + ] + }, + { + "cell_type": "markdown", + "id": "7ea9c1a4", + "metadata": {}, + "source": [ + "To improve our network on whatever prediction task we have given it, we need to use a sensible cost function, take the gradient of that cost function with respect to our network parameters, the weights and biases, and then update the weights and biases using these gradients. To clarify, we need to find and use these\n", + "\n", + "$$\n", + "\\frac{\\partial C}{\\partial W}, \\frac{\\partial C}{\\partial b}\n", + "$$\n" + ] + }, + { + "cell_type": "markdown", + "id": "6c753e3b", + "metadata": {}, + "source": [ + "Now we need to compute these gradients. This is pretty hard to do for a neural network, we will use most of next week to do this, but we can also use autograd to just do it for us, which is what we always do in practice. With the code cell below, we create a function which takes all of these gradients for us.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "56bef776", + "metadata": {}, + "outputs": [], + "source": [ + "from autograd import grad\n", + "\n", + "\n", + "gradient_func = grad(\n", + " cost, 1\n", + ") # Taking the gradient wrt. the second input to the cost function, i.e. the layers" + ] + }, + { + "cell_type": "markdown", + "id": "7b1b74bc", + "metadata": {}, + "source": [ + "**a)** What shape should the gradient of the cost function wrt. weights and biases be?\n", + "\n", + "**b)** Use the `gradient_func` function to take the gradient of the cross entropy wrt. 
the weights and biases of the network. Check the shapes of what's inside. What does the `grad` func from autograd actually do?\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "841c9e87", + "metadata": {}, + "outputs": [], + "source": [ + "layers_grad = gradient_func(\n", + " inputs, layers, activation_funcs, targets\n", + ") # Don't change this" + ] + }, + { + "cell_type": "markdown", + "id": "adc9e9be", + "metadata": {}, + "source": [ + "**c)** Finish the `train_network` function.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "6e4d38d3", + "metadata": {}, + "outputs": [], + "source": [ + "def train_network(\n", + " inputs, layers, activation_funcs, targets, learning_rate=0.001, epochs=100\n", + "):\n", + " for i in range(epochs):\n", + " layers_grad = gradient_func(inputs, layers, activation_funcs, targets)\n", + " for (W, b), (W_g, b_g) in zip(layers, layers_grad):\n", + " W -= ...\n", + " b -= ..." + ] + }, + { + "cell_type": "markdown", + "id": "2f65d663", + "metadata": {}, + "source": [ + "**e)** What do we call the gradient method used above?\n" + ] + }, + { + "cell_type": "markdown", + "id": "7059dd8c", + "metadata": {}, + "source": [ + "**d)** Train your network and see how the accuracy changes! Make a plot if you want.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "5027c7a5", + "metadata": {}, + "outputs": [], + "source": [ + "..." + ] + }, + { + "cell_type": "markdown", + "id": "3bc77016", + "metadata": {}, + "source": [ + "**e)** How high of an accuracy is it possible to acheive with a neural network on this dataset, if we use the whole thing as training data?\n" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.15" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/doc/LectureNotes/_build/html/_sources/exercisesweek42.ipynb b/doc/LectureNotes/_build/html/_sources/exercisesweek42.ipynb new file mode 100644 index 000000000..9925836a4 --- /dev/null +++ b/doc/LectureNotes/_build/html/_sources/exercisesweek42.ipynb @@ -0,0 +1,719 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Exercises week 42\n", + "\n", + "**October 13-17, 2025**\n", + "\n", + "Date: **Deadline is Friday October 17 at midnight**\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Overarching aims of the exercises this week\n", + "\n", + "The aim of the exercises this week is to train the neural network you implemented last week.\n", + "\n", + "To train neural networks, we use gradient descent, since there is no analytical expression for the optimal parameters. This means you will need to compute the gradient of the cost function wrt. the network parameters. And then you will need to implement some gradient method.\n", + "\n", + "You will begin by computing gradients for a network with one layer, then two layers, then any number of layers. 
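+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Before the derivations, an optional toy reminder (with made-up names) of the loop you are building towards: compute a gradient with autograd, then take a small step against it.\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import autograd.numpy as np\n",
+    "from autograd import grad\n",
+    "\n",
+    "\n",
+    "# Toy example: minimize f(w) = sum((w - 3)^2) with plain gradient descent\n",
+    "def f_toy(w):\n",
+    "    return np.sum((w - 3.0) ** 2)\n",
+    "\n",
+    "\n",
+    "f_toy_grad = grad(f_toy)  # returns df/dw with the same shape as w\n",
+    "w_toy = np.zeros(4)\n",
+    "learning_rate = 0.1\n",
+    "for _ in range(100):\n",
+    "    w_toy = w_toy - learning_rate * f_toy_grad(w_toy)\n",
+    "print(w_toy)  # all entries close to 3"
+   ]
+  },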
Keeping track of the shapes and doing things step by step will be very important this week.\n", + "\n", + "We recommend that you do the exercises this week by editing and running this notebook file, as it includes some checks along the way that you have implemented the neural network correctly, and running small parts of the code at a time will be important for understanding the methods. If you have trouble running a notebook, you can run this notebook in google colab instead(https://colab.research.google.com/drive/1FfvbN0XlhV-lATRPyGRTtTBnJr3zNuHL#offline=true&sandboxMode=true), though we recommend that you set up VSCode and your python environment to run code like this locally.\n", + "\n", + "First, some setup code that you will need.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [], + "source": [ + "import autograd.numpy as np # We need to use this numpy wrapper to make automatic differentiation work later\n", + "from autograd import grad, elementwise_grad\n", + "from sklearn import datasets\n", + "import matplotlib.pyplot as plt\n", + "from sklearn.metrics import accuracy_score\n", + "\n", + "\n", + "# Defining some activation functions\n", + "def ReLU(z):\n", + " return np.where(z > 0, z, 0)\n", + "\n", + "\n", + "# Derivative of the ReLU function\n", + "def ReLU_der(z):\n", + " return np.where(z > 0, 1, 0)\n", + "\n", + "\n", + "def sigmoid(z):\n", + " return 1 / (1 + np.exp(-z))\n", + "\n", + "\n", + "def mse(predict, target):\n", + " return np.mean((predict - target) ** 2)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Exercise 1 - Understand the feed forward pass\n", + "\n", + "**a)** Complete last weeks' exercises if you haven't already (recommended).\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Exercise 2 - Gradient with one layer using autograd\n", + "\n", + "For the first few exercises, we will not use batched inputs. Only a single input vector is passed through the layer at a time.\n", + "\n", + "In this exercise you will compute the gradient of a single layer. You only need to change the code in the cells right below an exercise, the rest works out of the box. Feel free to make changes and see how stuff works though!\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**a)** If the weights and bias of a layer has shapes (10, 4) and (10), what will the shapes of the gradients of the cost function wrt. these weights and this bias be?\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**b)** Complete the feed_forward_one_layer function. It should use the sigmoid activation function. Also define the weigth and bias with the correct shapes.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 41, + "metadata": {}, + "outputs": [], + "source": [ + "def feed_forward_one_layer(W, b, x):\n", + " z = ...\n", + " a = ...\n", + " return a\n", + "\n", + "\n", + "def cost_one_layer(W, b, x, target):\n", + " predict = feed_forward_one_layer(W, b, x)\n", + " return mse(predict, target)\n", + "\n", + "\n", + "x = np.random.rand(2)\n", + "target = np.random.rand(3)\n", + "\n", + "W = ...\n", + "b = ..." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**c)** Compute the gradient of the cost function wrt. the weigth and bias by running the cell below. You will not need to change anything, just make sure it runs by defining things correctly in the cell above. 
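+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "An optional aside relevant to a) above, using a toy cost function and made-up names: the gradient autograd returns always has the same shape as the parameter you differentiate with respect to.\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from autograd import grad\n",
+    "import autograd.numpy as np\n",
+    "\n",
+    "\n",
+    "def toy_cost(W, b):\n",
+    "    return np.sum(W**2) + np.sum(b**2)\n",
+    "\n",
+    "\n",
+    "toy_grad = grad(toy_cost, [0, 1])  # differentiate wrt. both arguments\n",
+    "W_toy, b_toy = np.random.rand(7, 3), np.random.rand(7)\n",
+    "gW, gb = toy_grad(W_toy, b_toy)\n",
+    "print(gW.shape, gb.shape)  # (7, 3) (7,)"
+   ]
+  },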
This code uses the autograd package which uses backprogagation to compute the gradient!\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "autograd_one_layer = grad(cost_one_layer, [0, 1])\n", + "W_g, b_g = autograd_one_layer(W, b, x, target)\n", + "print(W_g, b_g)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Exercise 3 - Gradient with one layer writing backpropagation by hand\n", + "\n", + "Before you use the gradient you found using autograd, you will have to find the gradient \"manually\", to better understand how the backpropagation computation works. To do backpropagation \"manually\", you will need to write out expressions for many derivatives along the computation.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We want to find the gradient of the cost function wrt. the weight and bias. This is quite hard to do directly, so we instead use the chain rule to combine multiple derivatives which are easier to compute.\n", + "\n", + "$$\n", + "\\frac{dC}{dW} = \\frac{dC}{da}\\frac{da}{dz}\\frac{dz}{dW}\n", + "$$\n", + "\n", + "$$\n", + "\\frac{dC}{db} = \\frac{dC}{da}\\frac{da}{dz}\\frac{dz}{db}\n", + "$$\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**a)** Which intermediary results can be reused between the two expressions?\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**b)** What is the derivative of the cost wrt. the final activation? You can use the autograd calculation to make sure you get the correct result. Remember that we compute the mean in mse.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "z = W @ x + b\n", + "a = sigmoid(z)\n", + "\n", + "predict = a\n", + "\n", + "\n", + "def mse_der(predict, target):\n", + " return ...\n", + "\n", + "\n", + "print(mse_der(predict, target))\n", + "\n", + "cost_autograd = grad(mse, 0)\n", + "print(cost_autograd(predict, target))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**c)** What is the expression for the derivative of the sigmoid activation function? You can use the autograd calculation to make sure you get the correct result.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def sigmoid_der(z):\n", + " return ...\n", + "\n", + "\n", + "print(sigmoid_der(z))\n", + "\n", + "sigmoid_autograd = elementwise_grad(sigmoid, 0)\n", + "print(sigmoid_autograd(z))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**d)** Using the two derivatives you just computed, compute this intermetidary gradient you will use later:\n", + "\n", + "$$\n", + "\\frac{dC}{dz} = \\frac{dC}{da}\\frac{da}{dz}\n", + "$$\n" + ] + }, + { + "cell_type": "code", + "execution_count": 54, + "metadata": {}, + "outputs": [], + "source": [ + "dC_da = ...\n", + "dC_dz = ..." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**e)** What is the derivative of the intermediary z wrt. the weight and bias? What should the shapes be? The one for the weights is a little tricky, it can be easier to play around in the next exercise first. You can also try computing it with autograd to get a hint.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**f)** Now combine the expressions you have worked with so far to compute the gradients! 
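+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "An optional aside on the `elementwise_grad` used in the checks above, shown on `np.tanh` (a function not used in this exercise): it returns the pointwise derivative evaluated at every entry of the input.\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from autograd import elementwise_grad\n",
+    "import autograd.numpy as np\n",
+    "\n",
+    "\n",
+    "def tanh_demo(z):\n",
+    "    return np.tanh(z)\n",
+    "\n",
+    "\n",
+    "tanh_der_autograd = elementwise_grad(tanh_demo, 0)\n",
+    "z_demo = np.linspace(-2.0, 2.0, 5)\n",
+    "print(tanh_der_autograd(z_demo))\n",
+    "print(1 - np.tanh(z_demo) ** 2)  # analytical derivative of tanh, same values"
+   ]
+  },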
Note that you always need to do a feed forward pass while saving the zs and as before you do backpropagation, as they are used in the derivative expressions\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "dC_da = ...\n", + "dC_dz = ...\n", + "dC_dW = ...\n", + "dC_db = ...\n", + "\n", + "print(dC_dW, dC_db)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "You should get the same results as with autograd.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "W_g, b_g = autograd_one_layer(W, b, x, target)\n", + "print(W_g, b_g)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Exercise 4 - Gradient with two layers writing backpropagation by hand\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now that you have implemented backpropagation for one layer, you have found most of the expressions you will need for more layers. Let's move up to two layers.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 59, + "metadata": {}, + "outputs": [], + "source": [ + "x = np.random.rand(2)\n", + "target = np.random.rand(4)\n", + "\n", + "W1 = np.random.rand(3, 2)\n", + "b1 = np.random.rand(3)\n", + "\n", + "W2 = np.random.rand(4, 3)\n", + "b2 = np.random.rand(4)\n", + "\n", + "layers = [(W1, b1), (W2, b2)]" + ] + }, + { + "cell_type": "code", + "execution_count": 60, + "metadata": {}, + "outputs": [], + "source": [ + "z1 = W1 @ x + b1\n", + "a1 = sigmoid(z1)\n", + "z2 = W2 @ a1 + b2\n", + "a2 = sigmoid(z2)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We begin by computing the gradients of the last layer, as the gradients must be propagated backwards from the end.\n", + "\n", + "**a)** Compute the gradients of the last layer, just like you did the single layer in the previous exercise.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 61, + "metadata": {}, + "outputs": [], + "source": [ + "dC_da2 = ...\n", + "dC_dz2 = ...\n", + "dC_dW2 = ...\n", + "dC_db2 = ..." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "To find the derivative of the cost wrt. the activation of the first layer, we need a new expression, the one furthest to the right in the following.\n", + "\n", + "$$\n", + "\\frac{dC}{da_1} = \\frac{dC}{dz_2}\\frac{dz_2}{da_1}\n", + "$$\n", + "\n", + "**b)** What is the derivative of the second layer intermetiate wrt. the first layer activation? (First recall how you compute $z_2$)\n", + "\n", + "$$\n", + "\\frac{dz_2}{da_1}\n", + "$$\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**c)** Use this expression, together with expressions which are equivelent to ones for the last layer to compute all the derivatives of the first layer.\n", + "\n", + "$$\n", + "\\frac{dC}{dW_1} = \\frac{dC}{da_1}\\frac{da_1}{dz_1}\\frac{dz_1}{dW_1}\n", + "$$\n", + "\n", + "$$\n", + "\\frac{dC}{db_1} = \\frac{dC}{da_1}\\frac{da_1}{dz_1}\\frac{dz_1}{db_1}\n", + "$$\n" + ] + }, + { + "cell_type": "code", + "execution_count": 63, + "metadata": {}, + "outputs": [], + "source": [ + "dC_da1 = ...\n", + "dC_dz1 = ...\n", + "dC_dW1 = ...\n", + "dC_db1 = ..." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "print(dC_dW1, dC_db1)\n", + "print(dC_dW2, dC_db2)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**d)** Make sure you got the same gradient as the following code which uses autograd to do backpropagation.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 67, + "metadata": {}, + "outputs": [], + "source": [ + "def feed_forward_two_layers(layers, x):\n", + " W1, b1 = layers[0]\n", + " z1 = W1 @ x + b1\n", + " a1 = sigmoid(z1)\n", + "\n", + " W2, b2 = layers[1]\n", + " z2 = W2 @ a1 + b2\n", + " a2 = sigmoid(z2)\n", + "\n", + " return a2" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def cost_two_layers(layers, x, target):\n", + " predict = feed_forward_two_layers(layers, x)\n", + " return mse(predict, target)\n", + "\n", + "\n", + "grad_two_layers = grad(cost_two_layers, 0)\n", + "grad_two_layers(layers, x, target)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**e)** How would you use the gradient from this layer to compute the gradient of an even earlier layer? Would the expressions be any different?\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Exercise 5 - Gradient with any number of layers writing backpropagation by hand\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Well done on getting this far! Now it's time to compute the gradient with any number of layers.\n", + "\n", + "First, some code from the general neural network code from last week. Note that we are still sending in one input vector at a time. We will change it to use batched inputs later.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [], + "source": [ + "def create_layers(network_input_size, layer_output_sizes):\n", + " layers = []\n", + "\n", + " i_size = network_input_size\n", + " for layer_output_size in layer_output_sizes:\n", + " W = np.random.randn(layer_output_size, i_size)\n", + " b = np.random.randn(layer_output_size)\n", + " layers.append((W, b))\n", + "\n", + " i_size = layer_output_size\n", + " return layers\n", + "\n", + "\n", + "def feed_forward(input, layers, activation_funcs):\n", + " a = input\n", + " for (W, b), activation_func in zip(layers, activation_funcs):\n", + " z = W @ a + b\n", + " a = activation_func(z)\n", + " return a\n", + "\n", + "\n", + "def cost(layers, input, activation_funcs, target):\n", + " predict = feed_forward(input, layers, activation_funcs)\n", + " return mse(predict, target)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "You might have already have noticed a very important detail in backpropagation: You need the values from the forward pass to compute all the gradients! 
The feed forward method above is great for efficiency and for using autograd, as it only cares about computing the final output, but now we need to also save the results along the way.\n", + "\n", + "Here is a function which does that for you.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [], + "source": [ + "def feed_forward_saver(input, layers, activation_funcs):\n", + " layer_inputs = []\n", + " zs = []\n", + " a = input\n", + " for (W, b), activation_func in zip(layers, activation_funcs):\n", + " layer_inputs.append(a)\n", + " z = W @ a + b\n", + " a = activation_func(z)\n", + "\n", + " zs.append(z)\n", + "\n", + " return layer_inputs, zs, a" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**a)** Now, complete the backpropagation function so that it returns the gradient of the cost function wrt. all the weigths and biases. Use the autograd calculation below to make sure you get the correct answer.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def backpropagation(\n", + " input, layers, activation_funcs, target, activation_ders, cost_der=mse_der\n", + "):\n", + " layer_inputs, zs, predict = feed_forward_saver(input, layers, activation_funcs)\n", + "\n", + " layer_grads = [() for layer in layers]\n", + "\n", + " # We loop over the layers, from the last to the first\n", + " for i in reversed(range(len(layers))):\n", + " layer_input, z, activation_der = layer_inputs[i], zs[i], activation_ders[i]\n", + "\n", + " if i == len(layers) - 1:\n", + " # For last layer we use cost derivative as dC_da(L) can be computed directly\n", + " dC_da = ...\n", + " else:\n", + " # For other layers we build on previous z derivative, as dC_da(i) = dC_dz(i+1) * dz(i+1)_da(i)\n", + " (W, b) = layers[i + 1]\n", + " dC_da = ...\n", + "\n", + " dC_dz = ...\n", + " dC_dW = ...\n", + " dC_db = ...\n", + "\n", + " layer_grads[i] = (dC_dW, dC_db)\n", + "\n", + " return layer_grads" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "network_input_size = 2\n", + "layer_output_sizes = [3, 4]\n", + "activation_funcs = [sigmoid, ReLU]\n", + "activation_ders = [sigmoid_der, ReLU_der]\n", + "\n", + "layers = create_layers(network_input_size, layer_output_sizes)\n", + "\n", + "x = np.random.rand(network_input_size)\n", + "target = np.random.rand(4)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "layer_grads = backpropagation(x, layers, activation_funcs, target, activation_ders)\n", + "print(layer_grads)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "cost_grad = grad(cost, 0)\n", + "cost_grad(layers, x, [sigmoid, ReLU], target)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Exercise 6 - Batched inputs\n", + "\n", + "Make new versions of all the functions in exercise 5 which now take batched inputs instead. See last weeks exercise 5 for details on how to batch inputs to neural networks. 
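+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "When you rewrite things for batched inputs (and in general), it can help to compare your manual gradients against the autograd ones layer by layer. Below is an optional sketch of such a helper; the name `grads_allclose` is made up, and the `np` import from the setup cell is assumed.\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "def grads_allclose(layer_grads_a, layer_grads_b, tol=1e-8):\n",
+    "    # Compare two lists of (W_grad, b_grad) tuples entry by entry\n",
+    "    for (W_a, b_a), (W_b, b_b) in zip(layer_grads_a, layer_grads_b):\n",
+    "        if not (np.allclose(W_a, W_b, atol=tol) and np.allclose(b_a, b_b, atol=tol)):\n",
+    "            return False\n",
+    "    return True\n",
+    "\n",
+    "\n",
+    "# Example usage with the objects from exercise 5:\n",
+    "# print(grads_allclose(layer_grads, cost_grad(layers, x, activation_funcs, target)))"
+   ]
+  },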
You will also need to update the backpropogation function.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Exercise 7 - Training\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**a)** Complete exercise 6 and 7 from last week, but use your own backpropogation implementation to compute the gradient.\n", + "- IMPORTANT: Do not implement the derivative terms for softmax and cross-entropy separately, it will be very hard!\n", + "- Instead, use the fact that the derivatives multiplied together simplify to **prediction - target** (see [source1](https://medium.com/data-science/derivative-of-the-softmax-function-and-the-categorical-cross-entropy-loss-ffceefc081d1), [source2](https://shivammehta25.github.io/posts/deriving-categorical-cross-entropy-and-softmax/))\n", + "\n", + "**b)** Use stochastic gradient descent with momentum when you train your network.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Exercise 8 (Optional) - Object orientation\n", + "\n", + "Passing in the layers, activations functions, activation derivatives and cost derivatives into the functions each time leads to code which is easy to understand in isoloation, but messier when used in a larger context with data splitting, data scaling, gradient methods and so forth. Creating an object which stores these values can lead to code which is much easier to use.\n", + "\n", + "**a)** Write a neural network class. You are free to implement it how you see fit, though we strongly recommend to not save any input or output values as class attributes, nor let the neural network class handle gradient methods internally. Gradient methods should be handled outside, by performing general operations on the layer_grads list using functions or classes separate to the neural network.\n", + "\n", + "We provide here a skeleton structure which should get you started.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "class NeuralNetwork:\n", + " def __init__(\n", + " self,\n", + " network_input_size,\n", + " layer_output_sizes,\n", + " activation_funcs,\n", + " activation_ders,\n", + " cost_fun,\n", + " cost_der,\n", + " ):\n", + " pass\n", + "\n", + " def predict(self, inputs):\n", + " # Simple feed forward pass\n", + " pass\n", + "\n", + " def cost(self, inputs, targets):\n", + " pass\n", + "\n", + " def _feed_forward_saver(self, inputs):\n", + " pass\n", + "\n", + " def compute_gradient(self, inputs, targets):\n", + " pass\n", + "\n", + " def update_weights(self, layer_grads):\n", + " pass\n", + "\n", + " # These last two methods are not needed in the project, but they can be nice to have! 
The first one has a layers parameter so that you can use autograd on it\n", + " def autograd_compliant_predict(self, layers, inputs):\n", + " pass\n", + "\n", + " def autograd_gradient(self, inputs, targets):\n", + " pass" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.15" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/doc/LectureNotes/_build/html/_sources/exercisesweek43.ipynb b/doc/LectureNotes/_build/html/_sources/exercisesweek43.ipynb new file mode 100644 index 000000000..f80e8787a --- /dev/null +++ b/doc/LectureNotes/_build/html/_sources/exercisesweek43.ipynb @@ -0,0 +1,647 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "860d70d8", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "119c0988", + "metadata": { + "editable": true + }, + "source": [ + "# Exercises week 43 \n", + "**October 20-24, 2025**\n", + "\n", + "Date: **Deadline Friday October 24 at midnight**" + ] + }, + { + "cell_type": "markdown", + "id": "909887eb", + "metadata": { + "editable": true + }, + "source": [ + "# Overarching aims of the exercises for week 43\n", + "\n", + "The aim of the exercises this week is to gain some confidence with\n", + "ways to visualize the results of a classification problem. We will\n", + "target three ways of setting up the analysis. The first and simplest\n", + "one is the\n", + "1. so-called confusion matrix. The next one is the so-called\n", + "\n", + "2. ROC curve. Finally we have the\n", + "\n", + "3. Cumulative gain curve.\n", + "\n", + "We will use Logistic Regression as method for the classification in\n", + "this exercise. You can compare these results with those obtained with\n", + "your neural network code from project 2 without a hidden layer.\n", + "\n", + "In these exercises we will use binary and multi-class data sets\n", + "(the Iris data set from week 41).\n", + "\n", + "The underlying mathematics is described here." + ] + }, + { + "cell_type": "markdown", + "id": "1e1cb4fb", + "metadata": { + "editable": true + }, + "source": [ + "### Confusion Matrix\n", + "\n", + "A **confusion matrix** summarizes a classifier’s performance by\n", + "tabulating predictions versus true labels. For binary classification,\n", + "it is a $2\\times2$ table whose entries are counts of outcomes:" + ] + }, + { + "cell_type": "markdown", + "id": "7b090385", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{array}{l|cc} & \\text{Predicted Positive} & \\text{Predicted Negative} \\\\ \\hline \\text{Actual Positive} & TP & FN \\\\ \\text{Actual Negative} & FP & TN \\end{array}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "1e14904b", + "metadata": { + "editable": true + }, + "source": [ + "Here TP (true positives) is the number of cases correctly predicted as\n", + "positive, FP (false positives) is the number incorrectly predicted as\n", + "positive, TN (true negatives) is correctly predicted negative, and FN\n", + "(false negatives) is incorrectly predicted negative . In other words,\n", + "“positive” means class 1 and “negative” means class 0; for example, TP\n", + "occurs when the prediction and actual are both positive. 
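+    "\n",
+    "As a small illustration (with made-up 0/1 label arrays), the four counts can be read off directly with numpy:\n",
+    "\n",
+    "```python\n",
+    "import numpy as np\n",
+    "\n",
+    "y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])  # made-up true labels\n",
+    "y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])  # made-up predictions\n",
+    "\n",
+    "TP = np.sum((y_pred == 1) & (y_true == 1))\n",
+    "FP = np.sum((y_pred == 1) & (y_true == 0))\n",
+    "TN = np.sum((y_pred == 0) & (y_true == 0))\n",
+    "FN = np.sum((y_pred == 0) & (y_true == 1))\n",
+    "print(TP, FP, TN, FN)  # here: 3 1 3 1\n",
+    "```\n",
+    "\n",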
Formally:" + ] + }, + { + "cell_type": "markdown", + "id": "e93ea290", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\text{TPR} = \\frac{\\text{TP}}{\\text{TP} + \\text{FN}}, \\quad \\text{FPR} = \\frac{\\text{FP}}{\\text{FP} + \\text{TN}},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "c80bea5b", + "metadata": { + "editable": true + }, + "source": [ + "where TPR and FPR are the true and false positive rates defined below.\n", + "\n", + "In multiclass classification with $K$ classes, the confusion matrix\n", + "generalizes to a $K\\times K$ table. Entry $N_{ij}$ in the table is\n", + "the count of instances whose true class is $i$ and whose predicted\n", + "class is $j$. For example, a three-class confusion matrix can be written\n", + "as:" + ] + }, + { + "cell_type": "markdown", + "id": "a0f68f5f", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{array}{c|ccc} & \\text{Pred Class 1} & \\text{Pred Class 2} & \\text{Pred Class 3} \\\\ \\hline \\text{Act Class 1} & N_{11} & N_{12} & N_{13} \\\\ \\text{Act Class 2} & N_{21} & N_{22} & N_{23} \\\\ \\text{Act Class 3} & N_{31} & N_{32} & N_{33} \\end{array}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "869669b2", + "metadata": { + "editable": true + }, + "source": [ + "Here the diagonal entries $N_{ii}$ are the true positives for each\n", + "class, and off-diagonal entries are misclassifications. This matrix\n", + "allows computation of per-class metrics: e.g. for class $i$,\n", + "$\\mathrm{TP}_i=N_{ii}$, $\\mathrm{FN}_i=\\sum_{j\\neq i}N_{ij}$,\n", + "$\\mathrm{FP}_i=\\sum_{j\\neq i}N_{ji}$, and $\\mathrm{TN}_i$ is the sum of\n", + "all remaining entries.\n", + "\n", + "As defined above, TPR and FPR come from the binary case. In binary\n", + "terms with $P$ actual positives and $N$ actual negatives, one has" + ] + }, + { + "cell_type": "markdown", + "id": "2abd82a7", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\text{TPR} = \\frac{TP}{P} = \\frac{TP}{TP+FN}, \\quad \\text{FPR} =\n", + "\\frac{FP}{N} = \\frac{FP}{FP+TN},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "2f79325c", + "metadata": { + "editable": true + }, + "source": [ + "as used in standard confusion-matrix\n", + "formulations. These rates will be used in constructing ROC curves." + ] + }, + { + "cell_type": "markdown", + "id": "0ce65a47", + "metadata": { + "editable": true + }, + "source": [ + "### ROC Curve\n", + "\n", + "The Receiver Operating Characteristic (ROC) curve plots the trade-off\n", + "between true positives and false positives as a discrimination\n", + "threshold varies. Specifically, for a binary classifier that outputs\n", + "a score or probability, one varies the threshold $t$ for declaring\n", + "**positive**, and computes at each $t$ the true positive rate\n", + "$\\mathrm{TPR}(t)$ and false positive rate $\\mathrm{FPR}(t)$ using the\n", + "confusion matrix at that threshold. The ROC curve is then the graph\n", + "of TPR versus FPR. By definition," + ] + }, + { + "cell_type": "markdown", + "id": "d750fdff", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathrm{TPR} = \\frac{TP}{TP+FN}, \\qquad \\mathrm{FPR} = \\frac{FP}{FP+TN},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "561bfb2c", + "metadata": { + "editable": true + }, + "source": [ + "where $TP,FP,TN,FN$ are counts determined by threshold $t$. 
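+    "\n",
+    "A minimal sketch of this threshold sweep on made-up labels and scores (in practice `sklearn.metrics.roc_curve` does the same bookkeeping for you):\n",
+    "\n",
+    "```python\n",
+    "import numpy as np\n",
+    "\n",
+    "rng = np.random.default_rng(42)\n",
+    "y_true = rng.integers(0, 2, size=200)           # made-up 0/1 labels\n",
+    "scores = 0.3 * y_true + 0.7 * rng.random(200)   # made-up scores in [0, 1)\n",
+    "\n",
+    "tpr_list, fpr_list = [], []\n",
+    "for t in np.linspace(0.0, 1.0, 101):\n",
+    "    y_pred = (scores >= t).astype(int)\n",
+    "    tpr_list.append(np.sum((y_pred == 1) & (y_true == 1)) / np.sum(y_true == 1))\n",
+    "    fpr_list.append(np.sum((y_pred == 1) & (y_true == 0)) / np.sum(y_true == 0))\n",
+    "\n",
+    "fpr, tpr = np.array(fpr_list), np.array(tpr_list)\n",
+    "order = np.argsort(fpr)\n",
+    "auc = np.sum(np.diff(fpr[order]) * (tpr[order][1:] + tpr[order][:-1]) / 2)  # trapezoidal rule\n",
+    "print(f'approximate AUC: {auc:.3f}')\n",
+    "```\n",
+    "\n",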
A perfect\n", + "classifier would reach the point (FPR=0, TPR=1) at some threshold.\n", + "\n", + "Formally, the ROC curve is obtained by plotting\n", + "$(\\mathrm{FPR}(t),\\mathrm{TPR}(t))$ for all $t\\in[0,1]$ (or as $t$\n", + "sweeps through the sorted scores). The Area Under the ROC Curve (AUC)\n", + "quantifies the average performance over all thresholds. It can be\n", + "interpreted probabilistically: $\\mathrm{AUC} =\n", + "\\Pr\\bigl(s(X^+)>s(X^-)\\bigr)$, the probability that a random positive\n", + "instance $X^+$ receives a higher score $s$ than a random negative\n", + "instance $X^-$ . Equivalently, the AUC is the integral under the ROC\n", + "curve:" + ] + }, + { + "cell_type": "markdown", + "id": "5ca722fe", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathrm{AUC} \\;=\\; \\int_{0}^{1} \\mathrm{TPR}(f)\\,df,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "30080a86", + "metadata": { + "editable": true + }, + "source": [ + "where $f$ ranges over FPR (or fraction of negatives). A model that guesses at random yields a diagonal ROC (AUC=0.5), whereas a perfect model yields AUC=1.0." + ] + }, + { + "cell_type": "markdown", + "id": "9e627156", + "metadata": { + "editable": true + }, + "source": [ + "### Cumulative Gain\n", + "\n", + "The cumulative gain curve (or gains chart) evaluates how many\n", + "positives are captured as one targets an increasing fraction of the\n", + "population, sorted by model confidence. To construct it, sort all\n", + "instances by decreasing predicted probability of the positive class.\n", + "Then, for the top $\\alpha$ fraction of instances, compute the fraction\n", + "of all actual positives that fall in this subset. In formula form, if\n", + "$P$ is the total number of positive instances and $P(\\alpha)$ is the\n", + "number of positives among the top $\\alpha$ of the data, the cumulative\n", + "gain at level $\\alpha$ is" + ] + }, + { + "cell_type": "markdown", + "id": "3e9132ef", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathrm{Gain}(\\alpha) \\;=\\; \\frac{P(\\alpha)}{P}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "75be6f5c", + "metadata": { + "editable": true + }, + "source": [ + "For example, cutting off at the top 10% of predictions yields a gain\n", + "equal to (positives in top 10%) divided by (total positives) .\n", + "Plotting $\\mathrm{Gain}(\\alpha)$ versus $\\alpha$ (often in percent)\n", + "gives the gain curve. The baseline (random) curve is the diagonal\n", + "$\\mathrm{Gain}(\\alpha)=\\alpha$, while an ideal model has a steep climb\n", + "toward 1.\n", + "\n", + "A related measure is the {\\em lift}, often called the gain ratio. It is the ratio of the model’s capture rate to that of random selection. Equivalently," + ] + }, + { + "cell_type": "markdown", + "id": "e5525570", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathrm{Lift}(\\alpha) \\;=\\; \\frac{\\mathrm{Gain}(\\alpha)}{\\alpha}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "18ff8dc2", + "metadata": { + "editable": true + }, + "source": [ + "A lift $>1$ indicates better-than-random targeting. In practice, gain\n", + "and lift charts (used e.g.\\ in marketing or imbalanced classification)\n", + "show how many positives can be “gained” by focusing on a fraction of\n", + "the population ." 
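+    "\n",
+    "A minimal sketch of how the gain and lift can be computed from sorted scores (the labels and scores below are made up; the plotting itself can be left to a library such as scikit-plot, as in the code example further down):\n",
+    "\n",
+    "```python\n",
+    "import numpy as np\n",
+    "\n",
+    "rng = np.random.default_rng(1)\n",
+    "y_true = rng.integers(0, 2, size=500)          # made-up 0/1 labels\n",
+    "scores = 0.4 * y_true + 0.6 * rng.random(500)  # made-up scores\n",
+    "\n",
+    "order = np.argsort(scores)[::-1]               # sort by decreasing score\n",
+    "sorted_labels = y_true[order]\n",
+    "\n",
+    "alpha = np.arange(1, len(y_true) + 1) / len(y_true)  # fraction of samples targeted\n",
+    "gain = np.cumsum(sorted_labels) / np.sum(y_true)     # fraction of positives captured\n",
+    "lift = gain / alpha\n",
+    "\n",
+    "top10 = int(0.1 * len(y_true)) - 1\n",
+    "print(f'gain at top 10%: {gain[top10]:.2f}, lift at top 10%: {lift[top10]:.2f}')\n",
+    "```\n",
+    "\n",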
+ ] + }, + { + "cell_type": "markdown", + "id": "c3d3fde8", + "metadata": { + "editable": true + }, + "source": [ + "### Other measures: Precision, Recall, and the F$_1$ Measure\n", + "\n", + "Precision and recall (sensitivity) quantify binary classification\n", + "accuracy in terms of positive predictions. They are defined from the\n", + "confusion matrix as:" + ] + }, + { + "cell_type": "markdown", + "id": "f1f14c8e", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\text{Precision} = \\frac{TP}{TP + FP}, \\qquad \\text{Recall} = \\frac{TP}{TP + FN}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "422cc743", + "metadata": { + "editable": true + }, + "source": [ + "Precision is the fraction of predicted positives that are correct, and\n", + "recall is the fraction of actual positives that are correctly\n", + "identified . A high-precision classifier makes few false-positive\n", + "errors, while a high-recall classifier makes few false-negative\n", + "errors.\n", + "\n", + "The F$_1$ score (balanced F-measure) combines precision and recall into a single metric via their harmonic mean. The usual formula is:" + ] + }, + { + "cell_type": "markdown", + "id": "621a2e8b", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "F_1 =2\\frac{\\text{Precision}\\times\\text{Recall}}{\\text{Precision} + \\text{Recall}}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "62eee54a", + "metadata": { + "editable": true + }, + "source": [ + "This can be shown to equal" + ] + }, + { + "cell_type": "markdown", + "id": "7a6a2e7a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{2\\,TP}{2\\,TP + FP + FN}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b96c9ff4", + "metadata": { + "editable": true + }, + "source": [ + "The F$_1$ score ranges from 0 (worst) to 1 (best), and balances the\n", + "trade-off between precision and recall.\n", + "\n", + "For multi-class classification, one computes per-class\n", + "precision/recall/F$_1$ (treating each class as “positive” in a\n", + "one-vs-rest manner) and then averages. Common averaging methods are:\n", + "\n", + "Micro-averaging: Sum all true positives, false positives, and false negatives across classes, then compute precision/recall/F$_1$ from these totals.\n", + "Macro-averaging: Compute the F$1$ score $F{1,i}$ for each class $i$ separately, then take the unweighted mean: $F_{1,\\mathrm{macro}} = \\frac{1}{K}\\sum_{i=1}^K F_{1,i}$ . This treats all classes equally regardless of size.\n", + "Weighted-averaging: Like macro-average, but weight each class’s $F_{1,i}$ by its support $n_i$ (true count): $F_{1,\\mathrm{weighted}} = \\frac{1}{N}\\sum_{i=1}^K n_i F_{1,i}$, where $N=\\sum_i n_i$. This accounts for class imbalance by giving more weight to larger classes .\n", + "\n", + "Each of these averages has different use-cases. Micro-average is\n", + "dominated by common classes, macro-average highlights performance on\n", + "rare classes, and weighted-average is a compromise. These formulas\n", + "and concepts allow rigorous evaluation of classifier performance in\n", + "both binary and multi-class settings." 
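+    "\n",
+    "A minimal sketch of how these averages can be obtained with **scikit-learn** (the three-class labels below are made up):\n",
+    "\n",
+    "```python\n",
+    "import numpy as np\n",
+    "from sklearn.metrics import classification_report, f1_score\n",
+    "\n",
+    "y_true = np.array([0, 1, 2, 2, 1, 0, 2, 1, 0, 2])  # made-up true labels\n",
+    "y_pred = np.array([0, 2, 2, 2, 1, 0, 1, 1, 0, 2])  # made-up predictions\n",
+    "\n",
+    "for avg in ['micro', 'macro', 'weighted']:\n",
+    "    print(avg, f1_score(y_true, y_pred, average=avg))\n",
+    "\n",
+    "print(classification_report(y_true, y_pred))  # per-class precision, recall and F1\n",
+    "```\n",
+    "\n",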
+ ] + }, + { + "cell_type": "markdown", + "id": "9274bf3f", + "metadata": { + "editable": true + }, + "source": [ + "## Exercises\n", + "\n", + "Here is a simple code example which uses the Logistic regression machinery from **scikit-learn**.\n", + "At the end it sets up the confusion matrix and the ROC and cumulative gain curves.\n", + "Feel free to use these functionalities (we don't expect you to write your own code for say the confusion matrix)." + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "be9ff0b9", + "metadata": { + "collapsed": false, + "editable": true, + "jupyter": { + "outputs_hidden": false + } + }, + "outputs": [], + "source": [ + "%matplotlib inline\n", + "\n", + "import matplotlib.pyplot as plt\n", + "import numpy as np\n", + "from sklearn.model_selection import train_test_split \n", + "# from sklearn.datasets import fill in the data set\n", + "from sklearn.linear_model import LogisticRegression\n", + "\n", + "# Load the data, fill inn\n", + "mydata.data = ?\n", + "\n", + "X_train, X_test, y_train, y_test = train_test_split(mydata.data,cancer.target,random_state=0)\n", + "print(X_train.shape)\n", + "print(X_test.shape)\n", + "# Logistic Regression\n", + "# define which type of problem, binary or multiclass\n", + "logreg = LogisticRegression(solver='lbfgs')\n", + "logreg.fit(X_train, y_train)\n", + "\n", + "from sklearn.preprocessing import LabelEncoder\n", + "from sklearn.model_selection import cross_validate\n", + "#Cross validation\n", + "accuracy = cross_validate(logreg,X_test,y_test,cv=10)['test_score']\n", + "print(accuracy)\n", + "print(\"Test set accuracy with Logistic Regression: {:.2f}\".format(logreg.score(X_test,y_test)))\n", + "\n", + "import scikitplot as skplt\n", + "y_pred = logreg.predict(X_test)\n", + "skplt.metrics.plot_confusion_matrix(y_test, y_pred, normalize=True)\n", + "plt.show()\n", + "y_probas = logreg.predict_proba(X_test)\n", + "skplt.metrics.plot_roc(y_test, y_probas)\n", + "plt.show()\n", + "skplt.metrics.plot_cumulative_gain(y_test, y_probas)\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "51760b3e", + "metadata": { + "editable": true + }, + "source": [ + "### Exercise a)\n", + "\n", + "Convince yourself about the mathematics for the confusion matrix, the ROC and the cumlative gain curves for both a binary and a multiclass classification problem." + ] + }, + { + "cell_type": "markdown", + "id": "c1d42f5f", + "metadata": { + "editable": true + }, + "source": [ + "### Exercise b)\n", + "\n", + "Use a binary classification data available from **scikit-learn**. As an example you can use\n", + "the MNIST data set and just specialize to two numbers. To do so you can use the following code lines" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "d20bb8be", + "metadata": { + "collapsed": false, + "editable": true, + "jupyter": { + "outputs_hidden": false + } + }, + "outputs": [], + "source": [ + "from sklearn.datasets import load_digits\n", + "digits = load_digits(n_class=2) # Load only two classes, e.g., 0 and 1\n", + "X, y = digits.data, digits.target" + ] + }, + { + "cell_type": "markdown", + "id": "828ea1cd", + "metadata": { + "editable": true + }, + "source": [ + "Alternatively, you can use the _make$\\_$classification_\n", + "functionality. This function generates a random $n$-class classification\n", + "dataset, which can be configured for binary classification by setting\n", + "n_classes=2. 
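+    "\n",
+    "For reference, here is a minimal end-to-end sketch along the lines of the template code above, using the two-class digits data and plain **scikit-learn**/matplotlib instead of scikit-plot (the variable names and the hand-rolled gain curve are just one possible way of doing this):\n",
+    "\n",
+    "```python\n",
+    "import matplotlib.pyplot as plt\n",
+    "import numpy as np\n",
+    "from sklearn.datasets import load_digits\n",
+    "from sklearn.linear_model import LogisticRegression\n",
+    "from sklearn.metrics import auc, confusion_matrix, roc_curve\n",
+    "from sklearn.model_selection import train_test_split\n",
+    "\n",
+    "digits = load_digits(n_class=2)\n",
+    "X_train, X_test, y_train, y_test = train_test_split(digits.data, digits.target, random_state=0)\n",
+    "\n",
+    "model = LogisticRegression(solver='lbfgs', max_iter=1000)\n",
+    "model.fit(X_train, y_train)\n",
+    "print('test accuracy:', model.score(X_test, y_test))\n",
+    "print(confusion_matrix(y_test, model.predict(X_test)))\n",
+    "\n",
+    "# ROC curve from the predicted probability of the positive class\n",
+    "probs = model.predict_proba(X_test)[:, 1]\n",
+    "fpr, tpr, _ = roc_curve(y_test, probs)\n",
+    "plt.plot(fpr, tpr, label=f'AUC = {auc(fpr, tpr):.3f}')\n",
+    "plt.plot([0, 1], [0, 1], linestyle='--', label='random guess')\n",
+    "plt.xlabel('False positive rate')\n",
+    "plt.ylabel('True positive rate')\n",
+    "plt.legend()\n",
+    "plt.show()\n",
+    "\n",
+    "# cumulative gain computed by hand: fraction of positives among the top-ranked samples\n",
+    "order = np.argsort(probs)[::-1]\n",
+    "gain = np.cumsum(np.asarray(y_test)[order]) / np.sum(y_test)\n",
+    "frac = np.arange(1, len(y_test) + 1) / len(y_test)\n",
+    "plt.plot(frac, gain, label='model')\n",
+    "plt.plot([0, 1], [0, 1], linestyle='--', label='random')\n",
+    "plt.xlabel('Fraction of samples targeted')\n",
+    "plt.ylabel('Fraction of positives captured')\n",
+    "plt.legend()\n",
+    "plt.show()\n",
+    "```\n",
+    "\n",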
You can also control the number of samples, features,\n", + "informative features, redundant features, and more." + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "d271f0ba", + "metadata": { + "collapsed": false, + "editable": true, + "jupyter": { + "outputs_hidden": false + } + }, + "outputs": [], + "source": [ + "from sklearn.datasets import make_classification\n", + "X, y = make_classification(n_samples=1000, n_features=20, n_informative=10, n_redundant=5, n_classes=2, random_state=42)" + ] + }, + { + "cell_type": "markdown", + "id": "0068b032", + "metadata": { + "editable": true + }, + "source": [ + "You can use this option for the multiclass case as well, see the next exercise.\n", + "If you prefer to study other binary classification datasets, feel free\n", + "to replace the above suggestions with your own dataset.\n", + "\n", + "Make plots of the confusion matrix, the ROC curve and the cumulative gain curve." + ] + }, + { + "cell_type": "markdown", + "id": "c45f5b41", + "metadata": { + "editable": true + }, + "source": [ + "### Exercise c) week 43\n", + "\n", + "As a multiclass problem, we will use the Iris data set discussed in\n", + "the exercises from weeks 41 and 42. This is a three-class data set and\n", + "you can set it up using **scikit-learn**," + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "3b045d56", + "metadata": { + "collapsed": false, + "editable": true, + "jupyter": { + "outputs_hidden": false + } + }, + "outputs": [], + "source": [ + "from sklearn.datasets import load_iris\n", + "iris = load_iris()\n", + "X = iris.data # Features\n", + "y = iris.target # Target labels" + ] + }, + { + "cell_type": "markdown", + "id": "14cc859c", + "metadata": { + "editable": true + }, + "source": [ + "Make plots of the confusion matrix, the ROC curve and the cumulative\n", + "gain curve for this (or other) multiclass data set." + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.15" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/doc/LectureNotes/_build/html/_sources/exercisesweek44.ipynb b/doc/LectureNotes/_build/html/_sources/exercisesweek44.ipynb new file mode 100644 index 000000000..32aa0e723 --- /dev/null +++ b/doc/LectureNotes/_build/html/_sources/exercisesweek44.ipynb @@ -0,0 +1,182 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "55f7cd56", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "id": "37c83276", + "metadata": { + "editable": true + }, + "source": [ + "# Exercises week 44\n", + "\n", + "**October 27-31, 2025**\n", + "\n", + "Date: **Deadline is Friday October 31 at midnight**\n" + ] + }, + { + "cell_type": "markdown", + "id": "58a26983", + "metadata": { + "editable": true + }, + "source": [ + "# Overarching aims of the exercises this week\n", + "\n", + "The exercise set this week has two parts.\n", + "\n", + "1. The first is a version of the exercises from week 39, where you got started with the report and github repository for project 1, only this time for project 2. This part is required, and a short feedback to this exercise will be available before the project deadline. 
And you can reuse these elements in your final report.\n", + "\n", + "2. The second is a list of questions meant as a summary of many of the central elements we have discussed in connection with projects 1 and 2, with a slight bias towards deep learning methods and their training. The hope is that these exercises can be of use in your discussions about the neural network results in project 2. **You don't need to answer all the questions, but you should be able to answer them by the end of working on project 2.**\n" + ] + }, + { + "cell_type": "markdown", + "id": "350c58e2", + "metadata": { + "vscode": { + "languageId": "plaintext" + } + }, + "source": [ + "### Deliverables\n", + "\n", + "First, join a group in canvas with your group partners. Pick an avaliable group for Project 2 in the “People” page. If you don't have a group, you should really consider joining one!\n", + "\n", + "Complete exercise 1 while working in an Overleaf project. Then, in canvas, include\n", + "\n", + "- An exported PDF of the report draft you have been working on.\n", + "- A comment linking to the github repository used in exercise **1d)**\n" + ] + }, + { + "cell_type": "markdown", + "id": "00f65f6e", + "metadata": {}, + "source": [ + "## Exercise 1:\n", + "\n", + "Following the same directions as in the weekly exercises for week 39:\n", + "\n", + "**a)** Create a report document in Overleaf, and write a suitable abstract and introduction for project 2.\n", + "\n", + "**b)** Add a figure in your report of a heatmap showing the test accuracy of a neural network with [0, 1, 2, 3] hidden layers and [5, 10, 25, 50] nodes per hidden layer.\n", + "\n", + "**c)** Add a figure in your report which meets as few requirements as possible of what we consider a good figure in this course, while still including some results, a title, figure text, and axis labels. Describe in the text of the report the different ways in which the figure is lacking. (This should not be included in the final report for project 2.)\n", + "\n", + "**d)** Create a github repository or folder in a repository with all the elements described in exercise 4 of the weekly exercises of week 39.\n", + "\n", + "**e)** If applicable, add references to your report for the source of your data for regression and classification, the source of claims you make about your data, and for the sources of the gradient optimizers you use and your general claims about these.\n" + ] + }, + { + "cell_type": "markdown", + "id": "6dff53b8", + "metadata": { + "editable": true + }, + "source": [ + "## Exercise 2:\n", + "\n", + "**a)** Linear and logistic regression methods\n", + "\n", + "1. What is the main difference between ordinary least squares and Ridge regression?\n", + "\n", + "2. Which kind of data set would you use logistic regression for?\n", + "\n", + "3. In linear regression you assume that your output is described by a continuous non-stochastic function $f(x)$. Which is the equivalent function in logistic regression?\n", + "\n", + "4. Can you find an analytic solution to a logistic regression type of problem?\n", + "\n", + "5. What kind of cost function would you use in logistic regression?\n" + ] + }, + { + "cell_type": "markdown", + "id": "21a056a4", + "metadata": { + "editable": true + }, + "source": [ + "**b)** Deep learning\n", + "\n", + "1. What is an activation function and discuss the use of an activation function? Explain three different types of activation functions?\n", + "\n", + "2. 
Describe the architecture of a typical feed forward Neural Network (NN).\n", + "\n", + "3. You are using a deep neural network for a prediction task. After training your model, you notice that it is strongly overfitting the training set and that the performance on the test isn’t good. What can you do to reduce overfitting?\n", + "\n", + "4. How would you know if your model is suffering from the problem of exploding gradients?\n", + "\n", + "5. Can you name and explain a few hyperparameters used for training a neural network?\n", + "\n", + "6. Describe the architecture of a typical Convolutional Neural Network (CNN)\n", + "\n", + "7. What is the vanishing gradient problem in Neural Networks and how to fix it?\n", + "\n", + "8. When it comes to training an artificial neural network, what could the reason be for why the cost/loss doesn't decrease in a few epochs?\n", + "\n", + "9. How does L1/L2 regularization affect a neural network?\n", + "\n", + "10. What is(are) the advantage(s) of deep learning over traditional methods like linear regression or logistic regression?\n" + ] + }, + { + "cell_type": "markdown", + "id": "7c48bc09", + "metadata": { + "editable": true + }, + "source": [ + "**c)** Optimization part\n", + "\n", + "1. Which is the basic mathematical root-finding method behind essentially all gradient descent approaches(stochastic and non-stochastic)?\n", + "\n", + "2. And why don't we use it? Or stated differently, why do we introduce the learning rate as a parameter?\n", + "\n", + "3. What might happen if you set the momentum hyperparameter too close to 1 (e.g., 0.9999) when using an optimizer for the learning rate?\n", + "\n", + "4. Why should we use stochastic gradient descent instead of plain gradient descent?\n", + "\n", + "5. Which parameters would you need to tune when use a stochastic gradient descent approach?\n" + ] + }, + { + "cell_type": "markdown", + "id": "56b0b5f6", + "metadata": { + "editable": true + }, + "source": [ + "**d)** Analysis of results\n", + "\n", + "1. How do you assess overfitting and underfitting?\n", + "\n", + "2. Why do we divide the data in test and train and/or eventually validation sets?\n", + "\n", + "3. Why would you use resampling methods in the data analysis? Mention some widely popular resampling methods.\n", + "\n", + "4. Why might a model that does not overfit the data (maybe because there is a lot of data) perform worse when we add regularization?\n" + ] + } + ], + "metadata": { + "language_info": { + "name": "python" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/doc/LectureNotes/_build/html/_sources/project1.ipynb b/doc/LectureNotes/_build/html/_sources/project1.ipynb index aba42cd41..5170af951 100644 --- a/doc/LectureNotes/_build/html/_sources/project1.ipynb +++ b/doc/LectureNotes/_build/html/_sources/project1.ipynb @@ -9,7 +9,7 @@ "source": [ "\n", - "" + "\n" ] }, { @@ -20,9 +20,34 @@ }, "source": [ "# Project 1 on Machine Learning, deadline October 6 (midnight), 2025\n", + "\n", "**Data Analysis and Machine Learning FYS-STK3155/FYS4155**, University of Oslo, Norway\n", "\n", - "Date: **September 2**" + "Date: **September 2**\n" + ] + }, + { + "cell_type": "markdown", + "id": "beb333e3", + "metadata": {}, + "source": [ + "### Deliverables\n", + "\n", + "First, join a group in canvas with your group partners. 
Pick an avaliable group for Project 1 in the \"People\" page.\n", + "\n", + "In canvas, deliver as a group and include:\n", + "\n", + "- A PDF of your report which follows the guidelines covered below and in the week 39 exercises. Additional requirements include:\n", + " - It should be around 5000 words, use the word counter in Overleaf for this. This often corresponds to 10-12 pages. References and appendices are excluded from the word count\n", + " - It should include around 10-15 figures. You can include more figures in appendices and/or as supplemental material in your repository.\n", + "- A comment linking to your github repository (or folder in one of your github repositories) for this project. The repository must include\n", + " - A PDF file of the report\n", + " - A folder named Code, where you put python files for your functions and notebooks for reproducing your results. Remember to use a seed for generating random data and for train-test splits when generating final results.\n", + " - A README file with\n", + " - the name of the group members\n", + " - a short description of the project\n", + " - a description of how to install the required packages to run your code from a requirements.txt file or similar (such as a plain text description)\n", + " - names and descriptions of the various notebooks in the Code folder and the results they produce\n" ] }, { @@ -35,7 +60,7 @@ "## Preamble: Note on writing reports, using reference material, AI and other tools\n", "\n", "We want you to answer the three different projects by handing in\n", - "reports written like a standard scientific/technical report. The\n", + "reports written like a standard scientific/technical report. The\n", "links at\n", "\n", "contain more information. There you can find examples of previous\n", @@ -63,14 +88,14 @@ "been studied in the scientific literature. This makes it easier for\n", "you to compare and analyze your results. Comparing with existing\n", "results from the scientific literature is also an essential element of\n", - "the scientific discussion. The University of California at Irvine\n", + "the scientific discussion. The University of California at Irvine\n", "with its Machine Learning repository at\n", " is an excellent site to\n", "look up for examples and\n", "inspiration. [Kaggle.com](https://www.kaggle.com/) is an equally\n", "interesting site. Feel free to explore these sites. When selecting\n", "other data sets, make sure these are sets used for regression problems\n", - "(not classification)." + "(not classification).\n" ] }, { @@ -90,7 +115,7 @@ "We will study how to fit polynomials to specific\n", "one-dimensional functions (feel free to replace the suggested function with more complicated ones).\n", "\n", - "We will use Runge's function (see for a discussion). The one-dimensional function we will study is" + "We will use Runge's function (see for a discussion). The one-dimensional function we will study is\n" ] }, { @@ -102,7 +127,7 @@ "source": [ "$$\n", "f(x) = \\frac{1}{1+25x^2}.\n", - "$$" + "$$\n" ] }, { @@ -114,14 +139,14 @@ "source": [ "Our first step will be to perform an OLS regression analysis of this\n", "function, trying out a polynomial fit with an $x$ dependence of the\n", - "form $[x,x^2,\\dots]$. You can use a uniform distribution to set up the\n", + "form $[x,x^2,\\dots]$. 
You can use a uniform distribution to set up the\n", "arrays of values for $x \\in [-1,1]$, or alternatively use a fixed step size.\n", "Thereafter we will repeat many of the same steps when using the Ridge and Lasso regression methods,\n", - "introducing thereby a dependence on the hyperparameter (penalty) $\\lambda$.\n", + "introducing thereby a dependence on the hyperparameter (penalty) $\\lambda$.\n", "\n", "We will also include bootstrap as a resampling technique in order to\n", - "study the so-called **bias-variance tradeoff**. After that we will\n", - "include the so-called cross-validation technique." + "study the so-called **bias-variance tradeoff**. After that we will\n", + "include the so-called cross-validation technique.\n" ] }, { @@ -133,15 +158,15 @@ "source": [ "### Part a : Ordinary Least Square (OLS) for the Runge function\n", "\n", - "We will generate our own dataset for abovementioned function\n", + "We will generate our own dataset for abovementioned function\n", "$\\mathrm{Runge}(x)$ function with $x\\in [-1,1]$. You should explore also the addition\n", "of an added stochastic noise to this function using the normal\n", "distribution $N(0,1)$.\n", "\n", - "*Write your own code* (using for example the pseudoinverse function **pinv** from **Numpy** ) and perform a standard **ordinary least square regression**\n", - "analysis using polynomials in $x$ up to order $15$ or higher. Explore the dependence on the number of data points and the polynomial degree.\n", + "_Write your own code_ (using for example the pseudoinverse function **pinv** from **Numpy** ) and perform a standard **ordinary least square regression**\n", + "analysis using polynomials in $x$ up to order $15$ or higher. Explore the dependence on the number of data points and the polynomial degree.\n", "\n", - "Evaluate the mean Squared error (MSE)" + "Evaluate the mean Squared error (MSE)\n" ] }, { @@ -154,7 +179,7 @@ "$$\n", "MSE(\\boldsymbol{y},\\tilde{\\boldsymbol{y}}) = \\frac{1}{n}\n", "\\sum_{i=0}^{n-1}(y_i-\\tilde{y}_i)^2,\n", - "$$" + "$$\n" ] }, { @@ -164,9 +189,9 @@ "editable": true }, "source": [ - "and the $R^2$ score function. If $\\tilde{\\boldsymbol{y}}_i$ is the predicted\n", + "and the $R^2$ score function. If $\\tilde{\\boldsymbol{y}}_i$ is the predicted\n", "value of the $i-th$ sample and $y_i$ is the corresponding true value,\n", - "then the score $R^2$ is defined as" + "then the score $R^2$ is defined as\n" ] }, { @@ -178,7 +203,7 @@ "source": [ "$$\n", "R^2(\\boldsymbol{y}, \\tilde{\\boldsymbol{y}}) = 1 - \\frac{\\sum_{i=0}^{n - 1} (y_i - \\tilde{y}_i)^2}{\\sum_{i=0}^{n - 1} (y_i - \\bar{y})^2},\n", - "$$" + "$$\n" ] }, { @@ -188,7 +213,7 @@ "editable": true }, "source": [ - "where we have defined the mean value of $\\boldsymbol{y}$ as" + "where we have defined the mean value of $\\boldsymbol{y}$ as\n" ] }, { @@ -200,7 +225,7 @@ "source": [ "$$\n", "\\bar{y} = \\frac{1}{n} \\sum_{i=0}^{n - 1} y_i.\n", - "$$" + "$$\n" ] }, { @@ -215,23 +240,23 @@ "\n", "Your code has to include a scaling/centering of the data (for example by\n", "subtracting the mean value), and\n", - "a split of the data in training and test data. For the scaling you can\n", + "a split of the data in training and test data. For the scaling you can\n", "either write your own code or use for example the function for\n", "splitting training data provided by the library **Scikit-Learn** (make\n", - "sure you have installed it). This function is called\n", - "$train\\_test\\_split$. 
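+    "\n",
+    "A minimal sketch of these steps for a single polynomial degree (the degree, the number of points, the noise level $0.1$ and the choice to centre with the training means are illustrative choices, not requirements):\n",
+    "\n",
+    "```python\n",
+    "import numpy as np\n",
+    "from sklearn.model_selection import train_test_split\n",
+    "\n",
+    "rng = np.random.default_rng(2025)\n",
+    "n, degree = 100, 10\n",
+    "x = rng.uniform(-1, 1, n)\n",
+    "y = 1.0 / (1.0 + 25.0 * x**2) + 0.1 * rng.standard_normal(n)  # Runge function plus noise\n",
+    "\n",
+    "X = np.column_stack([x**p for p in range(1, degree + 1)])     # features x, x^2, ..., x^degree\n",
+    "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=1)\n",
+    "\n",
+    "# centre features and target using the training statistics only\n",
+    "X_mean, y_mean = X_train.mean(axis=0), y_train.mean()\n",
+    "theta = np.linalg.pinv(X_train - X_mean) @ (y_train - y_mean)  # OLS via the pseudoinverse\n",
+    "y_pred = (X_test - X_mean) @ theta + y_mean\n",
+    "\n",
+    "mse = np.mean((y_test - y_pred) ** 2)\n",
+    "r2 = 1 - np.sum((y_test - y_pred) ** 2) / np.sum((y_test - np.mean(y_test)) ** 2)\n",
+    "print(f'MSE = {mse:.4f}, R2 = {r2:.4f}')\n",
+    "```\n",
+    "\n",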
**You should present a critical discussion of why and how you have scaled or not scaled the data**.\n", + "sure you have installed it). This function is called\n", + "$train\\_test\\_split$. **You should present a critical discussion of why and how you have scaled or not scaled the data**.\n", "\n", "It is normal in essentially all Machine Learning studies to split the\n", - "data in a training set and a test set (eventually also an additional\n", - "validation set). There\n", + "data in a training set and a test set (eventually also an additional\n", + "validation set). There\n", "is no explicit recipe for how much data should be included as training\n", - "data and say test data. An accepted rule of thumb is to use\n", + "data and say test data. An accepted rule of thumb is to use\n", "approximately $2/3$ to $4/5$ of the data as training data.\n", "\n", "You can easily reuse the solutions to your exercises from week 35.\n", "See also the lecture slides from week 35 and week 36.\n", "\n", - "On scaling, we recommend reading the following section from the scikit-learn software description, see ." + "On scaling, we recommend reading the following section from the scikit-learn software description, see .\n" ] }, { @@ -241,14 +266,14 @@ "editable": true }, "source": [ - "### Part b: Adding Ridge regression for the Runge function\n", + "### Part b: Adding Ridge regression for the Runge function\n", "\n", "Write your own code for the Ridge method as done in the previous\n", - "exercise. The lecture notes from week 35 and 36 contain more information. Furthermore, the results from the exercise set from week 36 is something you can reuse here.\n", + "exercise. The lecture notes from week 35 and 36 contain more information. Furthermore, the results from the exercise set from week 36 is something you can reuse here.\n", "\n", "Perform the same analysis as you did in the previous exercise but now for different values of $\\lambda$. Compare and\n", - "analyze your results with those obtained in part a) with the OLS method. Study the\n", - "dependence on $\\lambda$." + "analyze your results with those obtained in part a) with the OLS method. Study the\n", + "dependence on $\\lambda$.\n" ] }, { @@ -267,7 +292,7 @@ "from week 36).\n", "\n", "Study and compare your results from parts a) and b) with your gradient\n", - "descent approch. Discuss in particular the role of the learning rate." + "descent approch. Discuss in particular the role of the learning rate.\n" ] }, { @@ -283,7 +308,7 @@ "the gradient descent method by including **momentum**, **ADAgrad**,\n", "**RMSprop** and **ADAM** as methods fro iteratively updating your learning\n", "rate. Discuss the results and compare the different methods applied to\n", - "the one-dimensional Runge function. The lecture notes from week 37 contain several examples on how to implement these methods." + "the one-dimensional Runge function. The lecture notes from week 37 contain several examples on how to implement these methods.\n" ] }, { @@ -299,12 +324,12 @@ "represents our first encounter with a machine learning method which\n", "cannot be solved through analytical expressions (as in OLS and Ridge regression). Use the gradient\n", "descent methods you developed in parts c) and d) to solve the LASSO\n", - "optimization problem. You can compare your results with \n", + "optimization problem. 
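+    "\n",
+    "One common gradient-based option (a sketch only, not the only possible choice) is proximal gradient descent (ISTA), where an ordinary gradient step on the squared-error part is followed by a soft-thresholding of the parameters; the data, penalty and step size below are made up for illustration.\n",
+    "\n",
+    "```python\n",
+    "import numpy as np\n",
+    "\n",
+    "def soft_threshold(u, tau):\n",
+    "    # proximal operator of the l1 penalty, applied elementwise\n",
+    "    return np.sign(u) * np.maximum(np.abs(u) - tau, 0.0)\n",
+    "\n",
+    "def lasso_ista(X, y, lam, eta, n_iter=10000):\n",
+    "    # minimises (1/n)*sum (y - X theta)^2 + lam * sum |theta|\n",
+    "    n = X.shape[0]\n",
+    "    theta = np.zeros(X.shape[1])\n",
+    "    for _ in range(n_iter):\n",
+    "        grad = -(2.0 / n) * X.T @ (y - X @ theta)   # gradient of the MSE part only\n",
+    "        theta = soft_threshold(theta - eta * grad, eta * lam)\n",
+    "    return theta\n",
+    "\n",
+    "# tiny illustration on made-up data with a sparse true parameter vector\n",
+    "rng = np.random.default_rng(0)\n",
+    "X = rng.standard_normal((50, 8))\n",
+    "y = X @ np.array([1.5, 0, 0, -2.0, 0, 0, 0, 0.5]) + 0.1 * rng.standard_normal(50)\n",
+    "print(lasso_ista(X, y, lam=0.1, eta=0.01))\n",
+    "```\n",
+    "\n",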
You can compare your results with\n", "the functionalities of **Scikit-Learn**.\n", "\n", "Discuss (critically) your results for the Runge function from OLS,\n", "Ridge and LASSO regression using the various gradient descent\n", - "approaches." + "approaches.\n" ] }, { @@ -319,7 +344,7 @@ "Our last gradient step is to include stochastic gradient descent using\n", "the same methods to update the learning rates as in parts c-e).\n", "Compare and discuss your results with and without stochastic gradient\n", - "and give a critical assessment of the various methods." + "and give a critical assessment of the various methods.\n" ] }, { @@ -332,14 +357,14 @@ "### Part g: Bias-variance trade-off and resampling techniques\n", "\n", "Our aim here is to study the bias-variance trade-off by implementing\n", - "the **bootstrap** resampling technique. **We will only use the simpler\n", + "the **bootstrap** resampling technique. **We will only use the simpler\n", "ordinary least squares here**.\n", "\n", - "With a code which does OLS and includes resampling techniques, \n", + "With a code which does OLS and includes resampling techniques,\n", "we will now discuss the bias-variance trade-off in the context of\n", "continuous predictions such as regression. However, many of the\n", "intuitions and ideas discussed here also carry over to classification\n", - "tasks and basically all Machine Learning algorithms. \n", + "tasks and basically all Machine Learning algorithms.\n", "\n", "Before you perform an analysis of the bias-variance trade-off on your\n", "test data, make first a figure similar to Fig. 2.11 of Hastie,\n", @@ -356,7 +381,7 @@ "dataset $\\mathcal{L}$ consisting of the data\n", "$\\mathbf{X}_\\mathcal{L}=\\{(y_j, \\boldsymbol{x}_j), j=0\\ldots n-1\\}$.\n", "\n", - "We assume that the true data is generated from a noisy model" + "We assume that the true data is generated from a noisy model\n" ] }, { @@ -368,7 +393,7 @@ "source": [ "$$\n", "\\boldsymbol{y}=f(\\boldsymbol{x}) + \\boldsymbol{\\epsilon}.\n", - "$$" + "$$\n" ] }, { @@ -387,7 +412,7 @@ "that is $\\boldsymbol{\\tilde{y}}=\\boldsymbol{X}\\boldsymbol{\\theta}$.\n", "\n", "The parameters $\\boldsymbol{\\theta}$ are in turn found by optimizing the mean\n", - "squared error via the so-called cost function" + "squared error via the so-called cost function\n" ] }, { @@ -399,7 +424,7 @@ "source": [ "$$\n", "C(\\boldsymbol{X},\\boldsymbol{\\theta}) =\\frac{1}{n}\\sum_{i=0}^{n-1}(y_i-\\tilde{y}_i)^2=\\mathbb{E}\\left[(\\boldsymbol{y}-\\boldsymbol{\\tilde{y}})^2\\right].\n", - "$$" + "$$\n" ] }, { @@ -409,14 +434,14 @@ "editable": true }, "source": [ - "Here the expected value $\\mathbb{E}$ is the sample value. 
\n", + "Here the expected value $\\mathbb{E}$ is the sample value.\n", "\n", "Show that you can rewrite this in terms of a term which contains the\n", "variance of the model itself (the so-called variance term), a term\n", "which measures the deviation from the true data and the mean value of\n", "the model (the bias term) and finally the variance of the noise.\n", "\n", - "That is, show that" + "That is, show that\n" ] }, { @@ -428,7 +453,7 @@ "source": [ "$$\n", "\\mathbb{E}\\left[(\\boldsymbol{y}-\\boldsymbol{\\tilde{y}})^2\\right]=\\mathrm{Bias}[\\tilde{y}]+\\mathrm{var}[\\tilde{y}]+\\sigma^2,\n", - "$$" + "$$\n" ] }, { @@ -438,7 +463,7 @@ "editable": true }, "source": [ - "with (we approximate $f(\\boldsymbol{x})\\approx \\boldsymbol{y}$)" + "with (we approximate $f(\\boldsymbol{x})\\approx \\boldsymbol{y}$)\n" ] }, { @@ -450,7 +475,7 @@ "source": [ "$$\n", "\\mathrm{Bias}[\\tilde{y}]=\\mathbb{E}\\left[\\left(\\boldsymbol{y}-\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right]\\right)^2\\right],\n", - "$$" + "$$\n" ] }, { @@ -460,7 +485,7 @@ "editable": true }, "source": [ - "and" + "and\n" ] }, { @@ -472,7 +497,7 @@ "source": [ "$$\n", "\\mathrm{var}[\\tilde{y}]=\\mathbb{E}\\left[\\left(\\tilde{\\boldsymbol{y}}-\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right]\\right)^2\\right]=\\frac{1}{n}\\sum_i(\\tilde{y}_i-\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right])^2.\n", - "$$" + "$$\n" ] }, { @@ -482,11 +507,11 @@ "editable": true }, "source": [ - "**Important note**: Since the function $f(x)$ is unknown, in order to be able to evalute the bias, we replace $f(\\boldsymbol{x})$ in the expression for the bias with $\\boldsymbol{y}$. \n", + "**Important note**: Since the function $f(x)$ is unknown, in order to be able to evalute the bias, we replace $f(\\boldsymbol{x})$ in the expression for the bias with $\\boldsymbol{y}$.\n", "\n", "The answer to this exercise should be included in the theory part of\n", - "the report. This exercise is also part of the weekly exercises of\n", - "week 38. Explain what the terms mean and discuss their\n", + "the report. This exercise is also part of the weekly exercises of\n", + "week 38. Explain what the terms mean and discuss their\n", "interpretations.\n", "\n", "Perform then a bias-variance analysis of the Runge function by\n", @@ -495,7 +520,7 @@ "Discuss the bias and variance trade-off as function\n", "of your model complexity (the degree of the polynomial) and the number\n", "of data points, and possibly also your training and test data using the **bootstrap** resampling method.\n", - "You can follow the code example in the jupyter-book at ." + "You can follow the code example in the jupyter-book at .\n" ] }, { @@ -505,20 +530,20 @@ "editable": true }, "source": [ - "### Part h): Cross-validation as resampling techniques, adding more complexity\n", + "### Part h): Cross-validation as resampling techniques, adding more complexity\n", "\n", "The aim here is to implement another widely popular\n", - "resampling technique, the so-called cross-validation method. \n", + "resampling technique, the so-called cross-validation method.\n", "\n", "Implement the $k$-fold cross-validation algorithm (feel free to use\n", "the functionality of **Scikit-Learn** or write your own code) and\n", "evaluate again the MSE function resulting from the test folds.\n", "\n", "Compare the MSE you get from your cross-validation code with the one\n", - "you got from your **bootstrap** code from the previous exercise. Comment and interpret your results. 
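+    "\n",
+    "A minimal sketch of the $k$-fold loop using the `KFold` splitter from **scikit-learn** (the polynomial degree, the number of folds and the OLS-by-pseudoinverse fit are illustrative choices):\n",
+    "\n",
+    "```python\n",
+    "import numpy as np\n",
+    "from sklearn.model_selection import KFold\n",
+    "\n",
+    "rng = np.random.default_rng(7)\n",
+    "x = rng.uniform(-1, 1, 200)\n",
+    "y = 1.0 / (1.0 + 25.0 * x**2) + 0.1 * rng.standard_normal(200)\n",
+    "X = np.column_stack([x**p for p in range(11)])   # polynomial features, including the x^0 column\n",
+    "\n",
+    "kfold = KFold(n_splits=5, shuffle=True, random_state=0)\n",
+    "mse_folds = []\n",
+    "for train_idx, test_idx in kfold.split(X):\n",
+    "    theta = np.linalg.pinv(X[train_idx]) @ y[train_idx]   # OLS fit on the training folds\n",
+    "    y_pred = X[test_idx] @ theta\n",
+    "    mse_folds.append(np.mean((y[test_idx] - y_pred) ** 2))\n",
+    "\n",
+    "print('MSE per fold:', np.round(mse_folds, 4))\n",
+    "print('mean CV MSE :', np.mean(mse_folds))\n",
+    "```\n",
+    "\n",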
\n", + "you got from your **bootstrap** code from the previous exercise. Comment and interpret your results.\n", "\n", "In addition to using the ordinary least squares method, you should\n", - "include both Ridge and Lasso regression in the final analysis." + "include both Ridge and Lasso regression in the final analysis.\n" ] }, { @@ -532,7 +557,7 @@ "\n", "1. For a discussion and derivation of the variances and mean squared errors using linear regression, see the [Lecture notes on ridge regression by Wessel N. van Wieringen](https://arxiv.org/abs/1509.09169)\n", "\n", - "2. The textbook of [Trevor Hastie, Robert Tibshirani, Jerome H. Friedman, The Elements of Statistical Learning, Springer](https://www.springer.com/gp/book/9780387848570), chapters 3 and 7 are the most relevant ones for the analysis of parts g) and h)." + "2. The textbook of [Trevor Hastie, Robert Tibshirani, Jerome H. Friedman, The Elements of Statistical Learning, Springer](https://www.springer.com/gp/book/9780387848570), chapters 3 and 7 are the most relevant ones for the analysis of parts g) and h).\n" ] }, { @@ -544,25 +569,25 @@ "source": [ "## Introduction to numerical projects\n", "\n", - "Here follows a brief recipe and recommendation on how to answer the various questions when preparing your answers. \n", + "Here follows a brief recipe and recommendation on how to answer the various questions when preparing your answers.\n", "\n", - " * Give a short description of the nature of the problem and the eventual numerical methods you have used.\n", + "- Give a short description of the nature of the problem and the eventual numerical methods you have used.\n", "\n", - " * Describe the algorithm you have used and/or developed. Here you may find it convenient to use pseudocoding. In many cases you can describe the algorithm in the program itself.\n", + "- Describe the algorithm you have used and/or developed. Here you may find it convenient to use pseudocoding. In many cases you can describe the algorithm in the program itself.\n", "\n", - " * Include the source code of your program. Comment your program properly. You should have the code at your GitHub/GitLab link. You can also place the code in an appendix of your report.\n", + "- Include the source code of your program. Comment your program properly. You should have the code at your GitHub/GitLab link. You can also place the code in an appendix of your report.\n", "\n", - " * If possible, try to find analytic solutions, or known limits in order to test your program when developing the code.\n", + "- If possible, try to find analytic solutions, or known limits in order to test your program when developing the code.\n", "\n", - " * Include your results either in figure form or in a table. Remember to label your results. All tables and figures should have relevant captions and labels on the axes.\n", + "- Include your results either in figure form or in a table. Remember to label your results. All tables and figures should have relevant captions and labels on the axes.\n", "\n", - " * Try to evaluate the reliabilty and numerical stability/precision of your results. If possible, include a qualitative and/or quantitative discussion of the numerical stability, eventual loss of precision etc.\n", + "- Try to evaluate the reliabilty and numerical stability/precision of your results. 
If possible, include a qualitative and/or quantitative discussion of the numerical stability, eventual loss of precision etc.\n", "\n", - " * Try to give an interpretation of you results in your answers to the problems.\n", + "- Try to give an interpretation of you results in your answers to the problems.\n", "\n", - " * Critique: if possible include your comments and reflections about the exercise, whether you felt you learnt something, ideas for improvements and other thoughts you've made when solving the exercise. We wish to keep this course at the interactive level and your comments can help us improve it.\n", + "- Critique: if possible include your comments and reflections about the exercise, whether you felt you learnt something, ideas for improvements and other thoughts you've made when solving the exercise. We wish to keep this course at the interactive level and your comments can help us improve it.\n", "\n", - " * Try to establish a practice where you log your work at the computerlab. You may find such a logbook very handy at later stages in your work, especially when you don't properly remember what a previous test version of your program did. Here you could also record the time spent on solving the exercise, various algorithms you may have tested or other topics which you feel worthy of mentioning." + "- Try to establish a practice where you log your work at the computerlab. You may find such a logbook very handy at later stages in your work, especially when you don't properly remember what a previous test version of your program did. Here you could also record the time spent on solving the exercise, various algorithms you may have tested or other topics which you feel worthy of mentioning.\n" ] }, { @@ -574,17 +599,17 @@ "source": [ "## Format for electronic delivery of report and programs\n", "\n", - "The preferred format for the report is a PDF file. You can also use DOC or postscript formats or as an ipython notebook file. As programming language we prefer that you choose between C/C++, Fortran2008, Julia or Python. The following prescription should be followed when preparing the report:\n", + "The preferred format for the report is a PDF file. You can also use DOC or postscript formats or as an ipython notebook file. As programming language we prefer that you choose between C/C++, Fortran2008, Julia or Python. The following prescription should be followed when preparing the report:\n", "\n", - " * Use Canvas to hand in your projects, log in at with your normal UiO username and password.\n", + "- Use Canvas to hand in your projects, log in at with your normal UiO username and password.\n", "\n", - " * Upload **only** the report file or the link to your GitHub/GitLab or similar typo of repos! For the source code file(s) you have developed please provide us with your link to your GitHub/GitLab or similar domain. The report file should include all of your discussions and a list of the codes you have developed. Do not include library files which are available at the course homepage, unless you have made specific changes to them.\n", + "- Upload **only** the report file or the link to your GitHub/GitLab or similar typo of repos! For the source code file(s) you have developed please provide us with your link to your GitHub/GitLab or similar domain. The report file should include all of your discussions and a list of the codes you have developed. 
Do not include library files which are available at the course homepage, unless you have made specific changes to them.\n", "\n", - " * In your GitHub/GitLab or similar repository, please include a folder which contains selected results. These can be in the form of output from your code for a selected set of runs and input parameters.\n", + "- In your GitHub/GitLab or similar repository, please include a folder which contains selected results. These can be in the form of output from your code for a selected set of runs and input parameters.\n", "\n", - "Finally, \n", - "we encourage you to collaborate. Optimal working groups consist of \n", - "2-3 students. You can then hand in a common report." + "Finally,\n", + "we encourage you to collaborate. Optimal working groups consist of\n", + "2-3 students. You can then hand in a common report.\n" ] }, { @@ -596,42 +621,46 @@ "source": [ "## Software and needed installations\n", "\n", - "If you have Python installed (we recommend Python3) and you feel pretty familiar with installing different packages, \n", + "If you have Python installed (we recommend Python3) and you feel pretty familiar with installing different packages,\n", "we recommend that you install the following Python packages via **pip** as\n", + "\n", "1. pip install numpy scipy matplotlib ipython scikit-learn tensorflow sympy pandas pillow\n", "\n", "For Python3, replace **pip** with **pip3**.\n", "\n", - "See below for a discussion of **tensorflow** and **scikit-learn**. \n", + "See below for a discussion of **tensorflow** and **scikit-learn**.\n", "\n", - "For OSX users we recommend also, after having installed Xcode, to install **brew**. Brew allows \n", + "For OSX users we recommend also, after having installed Xcode, to install **brew**. Brew allows\n", "for a seamless installation of additional software via for example\n", + "\n", "1. brew install python3\n", "\n", "For Linux users, with its variety of distributions like for example the widely popular Ubuntu distribution\n", - "you can use **pip** as well and simply install Python as \n", - "1. sudo apt-get install python3 (or python for python2.7)\n", + "you can use **pip** as well and simply install Python as\n", + "\n", + "1. sudo apt-get install python3 (or python for python2.7)\n", + "\n", + "etc etc.\n", "\n", - "etc etc. \n", + "If you don't want to install various Python packages with their dependencies separately, we recommend two widely used distrubutions which set up all relevant dependencies for Python, namely\n", "\n", - "If you don't want to install various Python packages with their dependencies separately, we recommend two widely used distrubutions which set up all relevant dependencies for Python, namely\n", "1. [Anaconda](https://docs.anaconda.com/) Anaconda is an open source distribution of the Python and R programming languages for large-scale data processing, predictive analytics, and scientific computing, that aims to simplify package management and deployment. Package versions are managed by the package management system **conda**\n", "\n", - "2. [Enthought canopy](https://www.enthought.com/product/canopy/) is a Python distribution for scientific and analytic computing distribution and analysis environment, available for free and under a commercial license.\n", + "2. 
[Enthought canopy](https://www.enthought.com/product/canopy/) is a Python distribution for scientific and analytic computing distribution and analysis environment, available for free and under a commercial license.\n", "\n", "Popular software packages written in Python for ML are\n", "\n", - "* [Scikit-learn](http://scikit-learn.org/stable/), \n", + "- [Scikit-learn](http://scikit-learn.org/stable/),\n", "\n", - "* [Tensorflow](https://www.tensorflow.org/),\n", + "- [Tensorflow](https://www.tensorflow.org/),\n", "\n", - "* [PyTorch](http://pytorch.org/) and \n", + "- [PyTorch](http://pytorch.org/) and\n", "\n", - "* [Keras](https://keras.io/).\n", + "- [Keras](https://keras.io/).\n", "\n", - "These are all freely available at their respective GitHub sites. They \n", + "These are all freely available at their respective GitHub sites. They\n", "encompass communities of developers in the thousands or more. And the number\n", - "of code developers and contributors keeps increasing." + "of code developers and contributors keeps increasing.\n" ] } ], diff --git a/doc/LectureNotes/_build/html/_sources/project2.ipynb b/doc/LectureNotes/_build/html/_sources/project2.ipynb new file mode 100644 index 000000000..faf4aee16 --- /dev/null +++ b/doc/LectureNotes/_build/html/_sources/project2.ipynb @@ -0,0 +1,635 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "96e577ca", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "067c02b9", + "metadata": { + "editable": true + }, + "source": [ + "# Project 2 on Machine Learning, deadline November 10 (Midnight)\n", + "**[Data Analysis and Machine Learning FYS-STK3155/FYS4155](http://www.uio.no/studier/emner/matnat/fys/FYS3155/index-eng.html)**, University of Oslo, Norway\n", + "\n", + "Date: **October 14, 2025**" + ] + }, + { + "cell_type": "markdown", + "id": "01f9fedd", + "metadata": { + "editable": true + }, + "source": [ + "## Deliverables\n", + "\n", + "First, join a group in canvas with your group partners. Pick an avaliable group for Project 2 in the **People** page.\n", + "\n", + "In canvas, deliver as a group and include:\n", + "\n", + "* A PDF of your report which follows the guidelines covered below and in the week 39 exercises. Additional requirements include:\n", + "\n", + " * It should be around 5000 words, use the word counter in Overleaf for this. This often corresponds to 10-12 pages. References and appendices are excluded from the word count\n", + "\n", + " * It should include around 10-15 figures. You can include more figures in appendices and/or as supplemental material in your repository.\n", + "\n", + "* A comment linking to your github repository (or folder in one of your github repositories) for this project. The repository must include\n", + "\n", + "A PDF file of the report\n", + " * A folder named Code, where you put python files for your functions and notebooks for reproducing your results. 
Remember to use a seed for generating random data and for train-test splits when generating final results.\n", + "\n", + " * A README file with the name of the group members\n", + "\n", + " * a short description of the project\n", + "\n", + " * a description of how to install the required packages to run your code from a requirements.txt file or similar (such as a plain text description) names and descriptions of the various notebooks in the Code folder and the results they produce" + ] + }, + { + "cell_type": "markdown", + "id": "9f8e4871", + "metadata": { + "editable": true + }, + "source": [ + "### Preamble: Note on writing reports, using reference material, AI and other tools\n", + "\n", + "We want you to answer the three different projects by handing in\n", + "reports written like a standard scientific/technical report. The links\n", + "at\n", + "https://github.com/CompPhysics/MachineLearning/tree/master/doc/Projects/n", + "contain more information. There you can find examples of previous\n", + "reports, the projects themselves, how we grade reports etc. How to\n", + "write reports will also be discussed during the various lab\n", + "sessions. Please do ask us if you are in doubt.\n", + "\n", + "When using codes and material from other sources, you should refer to\n", + "these in the bibliography of your report, indicating wherefrom you for\n", + "example got the code, whether this is from the lecture notes,\n", + "softwares like Scikit-Learn, TensorFlow, PyTorch or other\n", + "sources. These sources should always be cited correctly. How to cite\n", + "some of the libraries is often indicated from their corresponding\n", + "GitHub sites or websites, see for example how to cite Scikit-Learn at\n", + "https://scikit-learn.org/dev/about.html./n", + "\n", + "We enocurage you to use tools like ChatGPT or similar in writing the\n", + "report. If you use for example ChatGPT, please do cite it properly and\n", + "include (if possible) your questions and answers as an addition to the\n", + "report. This can be uploaded to for example your website,\n", + "GitHub/GitLab or similar as supplemental material.\n", + "\n", + "If you would like to study other data sets, feel free to propose other\n", + "sets. What we have proposed here are mere suggestions from our\n", + "side. If you opt for another data set, consider using a set which has\n", + "been studied in the scientific literature. This makes it easier for\n", + "you to compare and analyze your results. Comparing with existing\n", + "results from the scientific literature is also an essential element of\n", + "the scientific discussion. The University of California at Irvine with\n", + "its Machine Learning repository at\n", + "https://archive.ics.uci.edu/ml/index.php%20is%20an%20excellent%20site%20to%20look/n", + "up for examples and inspiration. Kaggle.com is an equally interesting\n", + "site. Feel free to explore these sites." + ] + }, + { + "cell_type": "markdown", + "id": "460cc6ea", + "metadata": { + "editable": true + }, + "source": [ + "## Classification and Regression, writing our own neural network code\n", + "\n", + "The main aim of this project is to study both classification and\n", + "regression problems by developing our own \n", + "feed-forward neural network (FFNN) code. 
The exercises from week 41 and 42 (see and ) as well as the lecture material from the same weeks (see and ) should contain enough information for you to get started with writing your own code.\n", + "\n", + "We will also reuse our codes on gradient descent methods from project 1.\n", + "\n", + "The data sets that we propose here are (the default sets)\n", + "\n", + "* Regression (fitting a continuous function). In this part you will need to bring back your results from project 1 and compare these with what you get from your Neural Network code to be developed here. The data sets could be\n", + "\n", + " * The simple one-dimensional function Runge function from project 1, that is $f(x) = \\frac{1}{1+25x^2}$. We recommend using a simpler function when developing your neural network code for regression problems. Feel however free to discuss and study other functions, such as the two-dimensional Runge function $f(x,y)=\\left[(10x - 5)^2 + (10y - 5)^2 + 1 \\right]^{-1}$, or even more complicated two-dimensional functions (see the supplementary material of for an extensive list of two-dimensional functions). \n", + "\n", + "* Classification.\n", + "\n", + " * We will consider a multiclass classification problem given by the full MNIST data set. The full data set is at .\n", + "\n", + "We will start with a regression problem and we will reuse our codes on gradient descent methods from project 1." + ] + }, + { + "cell_type": "markdown", + "id": "d62a07ef", + "metadata": { + "editable": true + }, + "source": [ + "### Part a): Analytical warm-up\n", + "\n", + "When using our gradient machinery from project 1, we will need the expressions for the cost/loss functions and their respective\n", + "gradients. The functions whose gradients we need are:\n", + "1. The mean-squared error (MSE) with and without the $L_1$ and $L_2$ norms (regression problems)\n", + "\n", + "2. The binary cross entropy (aka log loss) for binary classification problems with and without $L_1$ and $L_2$ norms\n", + "\n", + "3. The multiclass cross entropy cost/loss function (aka Softmax cross entropy or just Softmax loss function)\n", + "\n", + "Set up these three cost/loss functions and their respective derivatives and explain the various terms. In this project you will however only use the MSE and the Softmax cross entropy.\n", + "\n", + "We will test three activation functions for our neural network setup, these are the \n", + "1. The Sigmoid (aka **logit**) function,\n", + "\n", + "2. the RELU function and\n", + "\n", + "3. the Leaky RELU function\n", + "\n", + "Set up their expressions and their first derivatives.\n", + "You may consult the lecture notes (with codes and more) from week 42 at ." + ] + }, + { + "cell_type": "markdown", + "id": "9cd8b8ac", + "metadata": { + "editable": true + }, + "source": [ + "### Reminder about the gradient machinery from project 1\n", + "\n", + "In the setup of a neural network code you will need your gradient descent codes from\n", + "project 1. For neural networks we will recommend using stochastic\n", + "gradient descent with either the RMSprop or the ADAM algorithms for\n", + "updating the learning rates. But you should feel free to try plain gradient descent as well.\n", + "\n", + "We recommend reading chapter 8 on optimization from the textbook of\n", + "Goodfellow, Bengio and Courville at\n", + ". This chapter contains many\n", + "useful insights and discussions on the optimization part of machine\n", + "learning. 
A useful reference on the back progagation algorithm is\n", + "Nielsen's book at . \n", + "\n", + "You will find the Python [Seaborn\n", + "package](https://seaborn.pydata.org/generated/seaborn.heatmap.html)\n", + "useful when plotting the results as function of the learning rate\n", + "$\\eta$ and the hyper-parameter $\\lambda$ ." + ] + }, + { + "cell_type": "markdown", + "id": "5931b155", + "metadata": { + "editable": true + }, + "source": [ + "### Part b): Writing your own Neural Network code\n", + "\n", + "Your aim now, and this is the central part of this project, is to\n", + "write your own FFNN code implementing the back\n", + "propagation algorithm discussed in the lecture slides from week 41 at and week 42 at .\n", + "\n", + "We will focus on a regression problem first, using the one-dimensional Runge function" + ] + }, + { + "cell_type": "markdown", + "id": "b273fc8a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "f(x) = \\frac{1}{1+25x^2},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "e13db1ec", + "metadata": { + "editable": true + }, + "source": [ + "from project 1.\n", + "\n", + "Use only the mean-squared error as cost function (no regularization terms) and \n", + "write an FFNN code for a regression problem with a flexible number of hidden\n", + "layers and nodes using only the Sigmoid function as activation function for\n", + "the hidden layers. Initialize the weights using a normal\n", + "distribution. How would you initialize the biases? And which\n", + "activation function would you select for the final output layer?\n", + "And how would you set up your design/feature matrix? Hint: does it have to represent a polynomial approximation as you did in project 1? \n", + "\n", + "Train your network and compare the results with those from your OLS\n", + "regression code from project 1 using the one-dimensional Runge\n", + "function. When comparing your neural network code with the OLS\n", + "results from project 1, use the same data sets which gave you the best\n", + "MSE score. Moreover, use the polynomial order from project 1 that gave you the\n", + "best result. Compare these results with your neural network with one\n", + "and two hidden layers using $50$ and $100$ hidden nodes, respectively.\n", + "\n", + "Comment your results and give a critical discussion of the results\n", + "obtained with the OLS code from project 1 and your own neural network\n", + "code. Make an analysis of the learning rates employed to find the\n", + "optimal MSE score. Test both stochastic gradient descent\n", + "with RMSprop and ADAM and plain gradient descent with different\n", + "learning rates.\n", + "\n", + "You should, as you did in project 1, scale your data." + ] + }, + { + "cell_type": "markdown", + "id": "4f864e31", + "metadata": { + "editable": true + }, + "source": [ + "### Part c): Testing against other software libraries\n", + "\n", + "You should test your results against a similar code using **Scikit-Learn** (see the examples in the above lecture notes from weeks 41 and 42) or **tensorflow/keras** or **Pytorch** (for Pytorch, see Raschka et al.'s text chapters 12 and 13). \n", + "\n", + "Furthermore, you should also test that your derivatives are correctly\n", + "calculated using automatic differentiation, using for example the\n", + "**Autograd** library or the **JAX** library. It is optional to implement\n", + "these libraries for the present project. In this project they serve as\n", + "useful tests of our derivatives." 
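+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "3c9d71aa",
+ "metadata": {
+ "editable": true
+ },
+ "source": [
+ "As a minimal illustrative sketch of such a test (not a required part of the project), one can compare the analytical gradient of the plain MSE cost, $\\frac{2}{n}X^T(X\\theta-\\mathbf{y})$, with the gradient produced by **Autograd** on a small random data set. The data set and variable names below are only placeholders, and the snippet assumes the Autograd library is installed."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "7b2e64d1",
+ "metadata": {
+ "collapsed": false,
+ "editable": true
+ },
+ "outputs": [],
+ "source": [
+ "# Illustrative sketch: compare an analytical MSE gradient with Autograd\n",
+ "import autograd.numpy as np\n",
+ "from autograd import grad\n",
+ "\n",
+ "n = 20\n",
+ "x = np.random.rand(n,1)\n",
+ "y = 1.0/(1.0+25.0*x**2) + 0.1*np.random.randn(n,1)\n",
+ "X = np.hstack([np.ones((n,1)), x])   # simple design matrix with intercept and x\n",
+ "theta = np.random.randn(2,1)\n",
+ "\n",
+ "def mse(theta):\n",
+ "    return np.sum((np.dot(X, theta) - y)**2)/n\n",
+ "\n",
+ "grad_autograd = grad(mse)(theta)\n",
+ "grad_analytic = (2.0/n)*np.dot(X.T, np.dot(X, theta) - y)\n",
+ "print(np.allclose(grad_autograd, grad_analytic))"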
+ ] + }, + { + "cell_type": "markdown", + "id": "c9faeafd", + "metadata": { + "editable": true + }, + "source": [ + "### Part d): Testing different activation functions and depths of the neural network\n", + "\n", + "You should also test different activation functions for the hidden\n", + "layers. Try out the Sigmoid, the RELU and the Leaky RELU functions and\n", + "discuss your results. Test your results as functions of the number of hidden layers and nodes. Do you see signs of overfitting?\n", + "It is optional in this project to perform a bias-variance trade-off analysis." + ] + }, + { + "cell_type": "markdown", + "id": "d865c22b", + "metadata": { + "editable": true + }, + "source": [ + "### Part e): Testing different norms\n", + "\n", + "Finally, still using the one-dimensional Runge function, add now the\n", + "hyperparameters $\\lambda$ with the $L_2$ and $L_1$ norms. Find the\n", + "optimal results for the hyperparameters $\\lambda$ and the learning\n", + "rates $\\eta$ and neural network architecture and compare the $L_2$ results with Ridge regression from\n", + "project 1 and the $L_1$ results with the Lasso calculations of project 1.\n", + "Use again the same data sets and the best results from project 1 in your comparisons." + ] + }, + { + "cell_type": "markdown", + "id": "5270af8f", + "metadata": { + "editable": true + }, + "source": [ + "### Part f): Classification analysis using neural networks\n", + "\n", + "With a well-written code it should now be easy to change the\n", + "activation function for the output layer.\n", + "\n", + "Here we will change the cost function for our neural network code\n", + "developed in parts b), d) and e) in order to perform a classification\n", + "analysis. The classification problem we will study is the multiclass\n", + "MNIST problem, see the description of the full data set at\n", + ". We will use the Softmax cross entropy function discussed in a). \n", + "The MNIST data set discussed in the lecture notes from week 42 is a downscaled variant of the full dataset. \n", + "\n", + "Feel free to suggest other data sets. If you find the classic MNIST data set somewhat limited, feel free to try the \n", + "MNIST-Fashion data set at for example .\n", + "\n", + "To set up the data set, the following python programs may be useful" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "4e0e1fea", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "from sklearn.datasets import fetch_openml\n", + "\n", + "# Fetch the MNIST dataset\n", + "mnist = fetch_openml('mnist_784', version=1, as_frame=False, parser='auto')\n", + "\n", + "# Extract data (features) and target (labels)\n", + "X = mnist.data\n", + "y = mnist.target" + ] + }, + { + "cell_type": "markdown", + "id": "8fe85677", + "metadata": { + "editable": true + }, + "source": [ + "You should consider scaling the data. The Pixel values in MNIST range from 0 to 255. Scaling them to a 0-1 range can improve the performance of some models. 
That is, you could implement the following scaling" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "b28318b2", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "X = X / 255.0" + ] + }, + { + "cell_type": "markdown", + "id": "97e02c71", + "metadata": { + "editable": true + }, + "source": [ + "And then perform the standard train-test splitting" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "88af355c", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "from sklearn.model_selection import train_test_split\n", + "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)" + ] + }, + { + "cell_type": "markdown", + "id": "d1f8f0ed", + "metadata": { + "editable": true + }, + "source": [ + "To measure the performance of our classification problem we will use the\n", + "so-called *accuracy* score. The accuracy is as you would expect just\n", + "the number of correctly guessed targets $t_i$ divided by the total\n", + "number of targets, that is" + ] + }, + { + "cell_type": "markdown", + "id": "554b3a48", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\text{Accuracy} = \\frac{\\sum_{i=1}^n I(t_i = y_i)}{n} ,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "77bfdd5c", + "metadata": { + "editable": true + }, + "source": [ + "where $I$ is the indicator function, $1$ if $t_i = y_i$ and $0$\n", + "otherwise if we have a binary classification problem. Here $t_i$\n", + "represents the target and $y_i$ the outputs of your FFNN code and $n$ is simply the number of targets $t_i$.\n", + "\n", + "Discuss your results and give a critical analysis of the various parameters, including hyper-parameters like the learning rates and the regularization parameter $\\lambda$, various activation functions, number of hidden layers and nodes and activation functions. \n", + "\n", + "Again, we strongly recommend that you compare your own neural Network\n", + "code for classification and pertinent results against a similar code using **Scikit-Learn** or **tensorflow/keras** or **pytorch**.\n", + "\n", + "If you have time, you can use the functionality of **scikit-learn** and compare your neural network results with those from Logistic regression. This is optional.\n", + "The weblink here compares logistic regression and FFNN using the so-called MNIST data set. You may find several useful hints and ideas from this article. Your neural network code can implement the equivalent of logistic regression by simply setting the number of hidden layers to zero and keeping just the input and the output layers. 
\n", + "\n", + "If you wish to compare with say Logisti Regression from **scikit-learn**, the following code uses the above data set" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "eaa9e72e", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "from sklearn.linear_model import LogisticRegression\n", + "# Initialize the model\n", + "model = LogisticRegression(solver='saga', multi_class='multinomial', max_iter=1000, random_state=42)\n", + "# Train the model\n", + "model.fit(X_train, y_train)\n", + "from sklearn.metrics import accuracy_score\n", + "# Make predictions on the test set\n", + "y_pred = model.predict(X_test)\n", + "# Calculate accuracy\n", + "accuracy = accuracy_score(y_test, y_pred)\n", + "print(f\"Model Accuracy: {accuracy:.4f}\")" + ] + }, + { + "cell_type": "markdown", + "id": "c7ba883e", + "metadata": { + "editable": true + }, + "source": [ + "### Part g) Critical evaluation of the various algorithms\n", + "\n", + "After all these glorious calculations, you should now summarize the\n", + "various algorithms and come with a critical evaluation of their pros\n", + "and cons. Which algorithm works best for the regression case and which\n", + "is best for the classification case. These codes can also be part of\n", + "your final project 3, but now applied to other data sets." + ] + }, + { + "cell_type": "markdown", + "id": "595be693", + "metadata": { + "editable": true + }, + "source": [ + "## Summary of methods to implement and analyze\n", + "\n", + "**Required Implementation:**\n", + "1. Reuse the regression code and results from project 1, these will act as a benchmark for seeing how suited a neural network is for this regression task.\n", + "\n", + "2. Implement a neural network with\n", + "\n", + " * A flexible number of layers\n", + "\n", + " * A flexible number of nodes in each layer\n", + "\n", + " * A changeable activation function in each layer (Sigmoid, ReLU, LeakyReLU, as well as Linear and Softmax)\n", + "\n", + " * A changeable cost function, which will be set to MSE for regression and cross-entropy for multiple-classification\n", + "\n", + " * An optional L1 or L2 norm of the weights and biases in the cost function (only used for computing gradients, not interpretable metrics)\n", + "\n", + "3. Implement the back-propagation algorithm to compute the gradient of your neural network\n", + "\n", + "4. Reuse the implementation of Plain and Stochastic Gradient Descent from Project 1 (and adapt the code to work with the your neural network)\n", + "\n", + " * With no optimization algorithm\n", + "\n", + " * With RMS Prop\n", + "\n", + " * With ADAM\n", + "\n", + "5. Implement scaling and train-test splitting of your data, preferably using sklearn\n", + "\n", + "6. Implement and compute metrics like the MSE and Accuracy" + ] + }, + { + "cell_type": "markdown", + "id": "35138b41", + "metadata": { + "editable": true + }, + "source": [ + "### Required Analysis:\n", + "\n", + "1. Briefly show and argue for the advantages and disadvantages of the methods from Project 1.\n", + "\n", + "2. Explore and show the impact of changing the number of layers, nodes per layer, choice of activation function, and inclusion of L1 and L2 norms. Present only the most interesting results from this exploration. 2D Heatmaps will be good for this: Start with finding a well performing set of hyper-parameters, then change two at a time in a range that shows good and bad performance.\n", + "\n", + "3. 
Show and argue for the advantages and disadvantages of using a neural network for regression on your data\n", + "\n", + "4. Show and argue for the advantages and disadvantages of using a neural network for classification on your data\n", + "\n", + "5. Show and argue for the advantages and disadvantages of the different gradient methods and learning rates when training the neural network" + ] + }, + { + "cell_type": "markdown", + "id": "b18bea03", + "metadata": { + "editable": true + }, + "source": [ + "### Optional (Note that you should include at least two of these in the report):\n", + "\n", + "1. Implement Logistic Regression as simple classification model case (equivalent to a Neural Network with just the output layer)\n", + "\n", + "2. Compute the gradient of the neural network with autograd, to show that it gives the same result as your hand-written backpropagation.\n", + "\n", + "3. Compare your results with results from using a machine-learning library like pytorch (https://docs.pytorch.org/tutorials/beginner/basics/buildmodel_tutorial.html)\n", + "\n", + "4. Use a more complex classification dataset instead, like the fashion MNIST (see )\n", + "\n", + "5. Use a more complex regression dataset instead, like the two-dimensional Runge function $f(x,y)=\\left[(10x - 5)^2 + (10y - 5)^2 + 1 \\right]^{-1}$, or even more complicated two-dimensional functions (see the supplementary material of for an extensive list of two-dimensional functions). \n", + "\n", + "6. Compute and interpret a confusion matrix of your best classification model (see )" + ] + }, + { + "cell_type": "markdown", + "id": "580d8424", + "metadata": { + "editable": true + }, + "source": [ + "## Background literature\n", + "\n", + "1. The text of Michael Nielsen is highly recommended, see Nielsen's book at . It is an excellent read.\n", + "\n", + "2. Goodfellow, Bengio and Courville, Deep Learning at . Here we recommend chapters 6, 7 and 8\n", + "\n", + "3. Raschka et al. at . Here we recommend chapters 11, 12 and 13." + ] + }, + { + "cell_type": "markdown", + "id": "96f5c67e", + "metadata": { + "editable": true + }, + "source": [ + "## Introduction to numerical projects\n", + "\n", + "Here follows a brief recipe and recommendation on how to write a report for each\n", + "project.\n", + "\n", + " * Give a short description of the nature of the problem and the eventual numerical methods you have used.\n", + "\n", + " * Describe the algorithm you have used and/or developed. Here you may find it convenient to use pseudocoding. In many cases you can describe the algorithm in the program itself.\n", + "\n", + " * Include the source code of your program. Comment your program properly.\n", + "\n", + " * If possible, try to find analytic solutions, or known limits in order to test your program when developing the code.\n", + "\n", + " * Include your results either in figure form or in a table. Remember to label your results. All tables and figures should have relevant captions and labels on the axes.\n", + "\n", + " * Try to evaluate the reliabilty and numerical stability/precision of your results. If possible, include a qualitative and/or quantitative discussion of the numerical stability, eventual loss of precision etc.\n", + "\n", + " * Try to give an interpretation of you results in your answers to the problems.\n", + "\n", + " * Critique: if possible include your comments and reflections about the exercise, whether you felt you learnt something, ideas for improvements and other thoughts you've made when solving the exercise. 
We wish to keep this course at the interactive level and your comments can help us improve it.\n", + "\n", + " * Try to establish a practice where you log your work at the computerlab. You may find such a logbook very handy at later stages in your work, especially when you don't properly remember what a previous test version of your program did. Here you could also record the time spent on solving the exercise, various algorithms you may have tested or other topics which you feel worthy of mentioning." + ] + }, + { + "cell_type": "markdown", + "id": "d1bc28ba", + "metadata": { + "editable": true + }, + "source": [ + "## Format for electronic delivery of report and programs\n", + "\n", + "The preferred format for the report is a PDF file. You can also use DOC or postscript formats or as an ipython notebook file. As programming language we prefer that you choose between C/C++, Fortran2008 or Python. The following prescription should be followed when preparing the report:\n", + "\n", + " * Use Canvas to hand in your projects, log in at with your normal UiO username and password.\n", + "\n", + " * Upload **only** the report file or the link to your GitHub/GitLab or similar typo of repos! For the source code file(s) you have developed please provide us with your link to your GitHub/GitLab or similar domain. The report file should include all of your discussions and a list of the codes you have developed. Do not include library files which are available at the course homepage, unless you have made specific changes to them.\n", + "\n", + " * In your GitHub/GitLab or similar repository, please include a folder which contains selected results. These can be in the form of output from your code for a selected set of runs and input parameters.\n", + "\n", + "Finally, \n", + "we encourage you to collaborate. Optimal working groups consist of \n", + "2-3 students. You can then hand in a common report." + ] + } + ], + "metadata": {}, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/doc/LectureNotes/_build/html/_sources/week37.ipynb b/doc/LectureNotes/_build/html/_sources/week37.ipynb new file mode 100644 index 000000000..fe89adb05 --- /dev/null +++ b/doc/LectureNotes/_build/html/_sources/week37.ipynb @@ -0,0 +1,3856 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "d842e7e1", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "0cd52479", + "metadata": { + "editable": true + }, + "source": [ + "# Week 37: Gradient descent methods\n", + "**Morten Hjorth-Jensen**, Department of Physics, University of Oslo, Norway\n", + "\n", + "Date: **September 8-12, 2025**\n", + "\n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "699b6141", + "metadata": { + "editable": true + }, + "source": [ + "## Plans for week 37, lecture Monday\n", + "\n", + "**Plans and material for the lecture on Monday September 8.**\n", + "\n", + "The family of gradient descent methods\n", + "1. Plain gradient descent (constant learning rate), reminder from last week with examples using OLS and Ridge\n", + "\n", + "2. Improving gradient descent with momentum\n", + "\n", + "3. Introducing stochastic gradient descent\n", + "\n", + "4. More advanced updates of the learning rate: ADAgrad, RMSprop and ADAM\n", + "\n", + "5. [Video of Lecture](https://youtu.be/SuxK68tj-V8)\n", + "\n", + "6. 
[Whiteboard notes](https://github.com/CompPhysics/MachineLearning/blob/master/doc/HandWrittenNotes/2025/FYSSTKweek37.pdf)" + ] + }, + { + "cell_type": "markdown", + "id": "dd264b1c", + "metadata": { + "editable": true + }, + "source": [ + "## Readings and Videos:\n", + "1. Recommended: Goodfellow et al, Deep Learning, introduction to gradient descent, see sections 4.3-4.5 at and chapter 8.3-8.5 at \n", + "\n", + "2. Rashcka et al, pages 37-44 and pages 278-283 with focus on linear regression.\n", + "\n", + "3. Video on gradient descent at \n", + "\n", + "4. Video on Stochastic gradient descent at " + ] + }, + { + "cell_type": "markdown", + "id": "608927bc", + "metadata": { + "editable": true + }, + "source": [ + "## Material for lecture Monday September 8" + ] + }, + { + "cell_type": "markdown", + "id": "60640670", + "metadata": { + "editable": true + }, + "source": [ + "## Gradient descent and revisiting Ordinary Least Squares from last week\n", + "\n", + "Last week we started with linear regression as a case study for the gradient descent\n", + "methods. Linear regression is a great test case for the gradient\n", + "descent methods discussed in the lectures since it has several\n", + "desirable properties such as:\n", + "\n", + "1. An analytical solution (recall homework sets for week 35).\n", + "\n", + "2. The gradient can be computed analytically.\n", + "\n", + "3. The cost function is convex which guarantees that gradient descent converges for small enough learning rates\n", + "\n", + "We revisit an example similar to what we had in the first homework set. We have a function of the type" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "947b67ee", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import numpy as np\n", + "x = 2*np.random.rand(m,1)\n", + "y = 4+3*x+np.random.randn(m,1)" + ] + }, + { + "cell_type": "markdown", + "id": "0a787eca", + "metadata": { + "editable": true + }, + "source": [ + "with $x_i \\in [0,1] $ is chosen randomly using a uniform distribution. Additionally we have a stochastic noise chosen according to a normal distribution $\\cal {N}(0,1)$. 
\n", + "The linear regression model is given by" + ] + }, + { + "cell_type": "markdown", + "id": "d7e84ac7", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "h_\\theta(x) = \\boldsymbol{y} = \\theta_0 + \\theta_1 x,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "f34c217e", + "metadata": { + "editable": true + }, + "source": [ + "such that" + ] + }, + { + "cell_type": "markdown", + "id": "b145d4eb", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{y}_i = \\theta_0 + \\theta_1 x_i.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "2df6d60d", + "metadata": { + "editable": true + }, + "source": [ + "## Gradient descent example\n", + "\n", + "Let $\\mathbf{y} = (y_1,\\cdots,y_n)^T$, $\\mathbf{\\boldsymbol{y}} = (\\boldsymbol{y}_1,\\cdots,\\boldsymbol{y}_n)^T$ and $\\theta = (\\theta_0, \\theta_1)^T$\n", + "\n", + "It is convenient to write $\\mathbf{\\boldsymbol{y}} = X\\theta$ where $X \\in \\mathbb{R}^{100 \\times 2} $ is the design matrix given by (we keep the intercept here)" + ] + }, + { + "cell_type": "markdown", + "id": "1deafba0", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "X \\equiv \\begin{bmatrix}\n", + "1 & x_1 \\\\\n", + "\\vdots & \\vdots \\\\\n", + "1 & x_{100} & \\\\\n", + "\\end{bmatrix}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "520ac423", + "metadata": { + "editable": true + }, + "source": [ + "The cost/loss/risk function is given by" + ] + }, + { + "cell_type": "markdown", + "id": "48e7232b", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C(\\theta) = \\frac{1}{n}||X\\theta-\\mathbf{y}||_{2}^{2} = \\frac{1}{n}\\sum_{i=1}^{100}\\left[ (\\theta_0 + \\theta_1 x_i)^2 - 2 y_i (\\theta_0 + \\theta_1 x_i) + y_i^2\\right]\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "0194af20", + "metadata": { + "editable": true + }, + "source": [ + "and we want to find $\\theta$ such that $C(\\theta)$ is minimized." + ] + }, + { + "cell_type": "markdown", + "id": "9f58d823", + "metadata": { + "editable": true + }, + "source": [ + "## The derivative of the cost/loss function\n", + "\n", + "Computing $\\partial C(\\theta) / \\partial \\theta_0$ and $\\partial C(\\theta) / \\partial \\theta_1$ we can show that the gradient can be written as" + ] + }, + { + "cell_type": "markdown", + "id": "10129d02", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\nabla_{\\theta} C(\\theta) = \\frac{2}{n}\\begin{bmatrix} \\sum_{i=1}^{100} \\left(\\theta_0+\\theta_1x_i-y_i\\right) \\\\\n", + "\\sum_{i=1}^{100}\\left( x_i (\\theta_0+\\theta_1x_i)-y_ix_i\\right) \\\\\n", + "\\end{bmatrix} = \\frac{2}{n}X^T(X\\theta - \\mathbf{y}),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "4cd07523", + "metadata": { + "editable": true + }, + "source": [ + "where $X$ is the design matrix defined above." 
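+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "c51f08b2",
+ "metadata": {
+ "editable": true
+ },
+ "source": [
+ "As a quick numerical sanity check, the analytical gradient $\\frac{2}{n}X^T(X\\theta - \\mathbf{y})$ can be compared with a central finite-difference approximation of $\\nabla_\\theta C(\\theta)$; the small illustrative snippet below does this for the same type of data as above."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "e94a3b7c",
+ "metadata": {
+ "collapsed": false,
+ "editable": true
+ },
+ "outputs": [],
+ "source": [
+ "# Sketch: verify the analytical gradient with central finite differences\n",
+ "import numpy as np\n",
+ "\n",
+ "n = 100\n",
+ "x = 2*np.random.rand(n,1)\n",
+ "y = 4+3*x+np.random.randn(n,1)\n",
+ "X = np.c_[np.ones((n,1)), x]\n",
+ "theta = np.random.randn(2,1)\n",
+ "\n",
+ "def cost(theta):\n",
+ "    return np.sum((X @ theta - y)**2)/n\n",
+ "\n",
+ "grad_analytic = (2.0/n)*X.T @ (X @ theta - y)\n",
+ "\n",
+ "# Central finite differences, one component of theta at a time\n",
+ "eps = 1e-6\n",
+ "grad_fd = np.zeros_like(theta)\n",
+ "for j in range(theta.shape[0]):\n",
+ "    e = np.zeros_like(theta)\n",
+ "    e[j] = eps\n",
+ "    grad_fd[j] = (cost(theta + e) - cost(theta - e))/(2*eps)\n",
+ "\n",
+ "print(np.allclose(grad_analytic, grad_fd, atol=1e-4))"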
+ ] + }, + { + "cell_type": "markdown", + "id": "1bda7e01", + "metadata": { + "editable": true + }, + "source": [ + "## The Hessian matrix\n", + "The Hessian matrix of $C(\\theta)$ is given by" + ] + }, + { + "cell_type": "markdown", + "id": "aa64bdd1", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{H} \\equiv \\begin{bmatrix}\n", + "\\frac{\\partial^2 C(\\theta)}{\\partial \\theta_0^2} & \\frac{\\partial^2 C(\\theta)}{\\partial \\theta_0 \\partial \\theta_1} \\\\\n", + "\\frac{\\partial^2 C(\\theta)}{\\partial \\theta_0 \\partial \\theta_1} & \\frac{\\partial^2 C(\\theta)}{\\partial \\theta_1^2} & \\\\\n", + "\\end{bmatrix} = \\frac{2}{n}X^T X.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "3e7f4c5d", + "metadata": { + "editable": true + }, + "source": [ + "This result implies that $C(\\theta)$ is a convex function since the matrix $X^T X$ always is positive semi-definite." + ] + }, + { + "cell_type": "markdown", + "id": "79ed73a8", + "metadata": { + "editable": true + }, + "source": [ + "## Simple program\n", + "\n", + "We can now write a program that minimizes $C(\\theta)$ using the gradient descent method with a constant learning rate $\\eta$ according to" + ] + }, + { + "cell_type": "markdown", + "id": "1b70ad9b", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\theta_{k+1} = \\theta_k - \\eta \\nabla_\\theta C(\\theta_k), \\ k=0,1,\\cdots\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "2fbef92d", + "metadata": { + "editable": true + }, + "source": [ + "We can use the expression we computed for the gradient and let use a\n", + "$\\theta_0$ be chosen randomly and let $\\eta = 0.001$. Stop iterating\n", + "when $||\\nabla_\\theta C(\\theta_k) || \\leq \\epsilon = 10^{-8}$. **Note that the code below does not include the latter stop criterion**.\n", + "\n", + "And finally we can compare our solution for $\\theta$ with the analytic result given by \n", + "$\\theta= (X^TX)^{-1} X^T \\mathbf{y}$." 
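+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "5d2c9e40",
+ "metadata": {
+ "editable": true
+ },
+ "source": [
+ "A minimal sketch of such a stopping criterion is shown below, with an added cap on the number of iterations as a safeguard (the cap and the variable names are only illustrative choices); the complete worked example in the next section omits the criterion, as noted above."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "af6710d3",
+ "metadata": {
+ "collapsed": false,
+ "editable": true
+ },
+ "outputs": [],
+ "source": [
+ "# Sketch: plain gradient descent with a gradient-norm stopping criterion\n",
+ "import numpy as np\n",
+ "\n",
+ "n = 100\n",
+ "x = 2*np.random.rand(n,1)\n",
+ "y = 4+3*x+np.random.randn(n,1)\n",
+ "X = np.c_[np.ones((n,1)), x]\n",
+ "\n",
+ "eta = 0.001\n",
+ "epsilon = 1.0e-8\n",
+ "max_iterations = 1000000   # safety cap on the number of iterations\n",
+ "theta = np.random.randn(2,1)\n",
+ "\n",
+ "for k in range(max_iterations):\n",
+ "    gradient = (2.0/n)*X.T @ (X @ theta - y)\n",
+ "    if np.linalg.norm(gradient) <= epsilon:\n",
+ "        print(f'Converged after {k} iterations')\n",
+ "        break\n",
+ "    theta -= eta*gradient\n",
+ "\n",
+ "print(theta)"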
+ ] + }, + { + "cell_type": "markdown", + "id": "0728a369", + "metadata": { + "editable": true + }, + "source": [ + "## Gradient Descent Example\n", + "\n", + "Here our simple example" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "a48d43f0", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "%matplotlib inline\n", + "\n", + "\n", + "# Importing various packages\n", + "from random import random, seed\n", + "import numpy as np\n", + "import matplotlib.pyplot as plt\n", + "from mpl_toolkits.mplot3d import Axes3D\n", + "from matplotlib import cm\n", + "from matplotlib.ticker import LinearLocator, FormatStrFormatter\n", + "import sys\n", + "\n", + "# the number of datapoints\n", + "n = 100\n", + "x = 2*np.random.rand(n,1)\n", + "y = 4+3*x+np.random.randn(n,1)\n", + "\n", + "X = np.c_[np.ones((n,1)), x]\n", + "# Hessian matrix\n", + "H = (2.0/n)* X.T @ X\n", + "# Get the eigenvalues\n", + "EigValues, EigVectors = np.linalg.eig(H)\n", + "print(f\"Eigenvalues of Hessian Matrix:{EigValues}\")\n", + "\n", + "theta_linreg = np.linalg.inv(X.T @ X) @ X.T @ y\n", + "print(theta_linreg)\n", + "theta = np.random.randn(2,1)\n", + "\n", + "eta = 1.0/np.max(EigValues)\n", + "Niterations = 1000\n", + "\n", + "for iter in range(Niterations):\n", + " gradient = (2.0/n)*X.T @ (X @ theta-y)\n", + " theta -= eta*gradient\n", + "\n", + "print(theta)\n", + "xnew = np.array([[0],[2]])\n", + "xbnew = np.c_[np.ones((2,1)), xnew]\n", + "ypredict = xbnew.dot(theta)\n", + "ypredict2 = xbnew.dot(theta_linreg)\n", + "plt.plot(xnew, ypredict, \"r-\")\n", + "plt.plot(xnew, ypredict2, \"b-\")\n", + "plt.plot(x, y ,'ro')\n", + "plt.axis([0,2.0,0, 15.0])\n", + "plt.xlabel(r'$x$')\n", + "plt.ylabel(r'$y$')\n", + "plt.title(r'Gradient descent example')\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "6c1c6ed1", + "metadata": { + "editable": true + }, + "source": [ + "## Gradient descent and Ridge\n", + "\n", + "We have also discussed Ridge regression where the loss function contains a regularized term given by the $L_2$ norm of $\\theta$," + ] + }, + { + "cell_type": "markdown", + "id": "a82ce6e3", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C_{\\text{ridge}}(\\theta) = \\frac{1}{n}||X\\theta -\\mathbf{y}||^2 + \\lambda ||\\theta||^2, \\ \\lambda \\geq 0.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "cb0de7c2", + "metadata": { + "editable": true + }, + "source": [ + "In order to minimize $C_{\\text{ridge}}(\\theta)$ using GD we adjust the gradient as follows" + ] + }, + { + "cell_type": "markdown", + "id": "b76c0dea", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\nabla_\\theta C_{\\text{ridge}}(\\theta) = \\frac{2}{n}\\begin{bmatrix} \\sum_{i=1}^{100} \\left(\\theta_0+\\theta_1x_i-y_i\\right) \\\\\n", + "\\sum_{i=1}^{100}\\left( x_i (\\theta_0+\\theta_1x_i)-y_ix_i\\right) \\\\\n", + "\\end{bmatrix} + 2\\lambda\\begin{bmatrix} \\theta_0 \\\\ \\theta_1\\end{bmatrix} = 2 (\\frac{1}{n}X^T(X\\theta - \\mathbf{y})+\\lambda \\theta).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "4eeb07f6", + "metadata": { + "editable": true + }, + "source": [ + "We can easily extend our program to minimize $C_{\\text{ridge}}(\\theta)$ using gradient descent and compare with the analytical solution given by" + ] + }, + { + "cell_type": "markdown", + "id": "cc7d6c64", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\theta_{\\text{ridge}} = \\left(X^T X + n\\lambda I_{2 \\times 
2} \\right)^{-1} X^T \\mathbf{y}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "08bd65db", + "metadata": { + "editable": true + }, + "source": [ + "## The Hessian matrix for Ridge Regression\n", + "The Hessian matrix of Ridge Regression for our simple example is given by" + ] + }, + { + "cell_type": "markdown", + "id": "a1c5a4d1", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{H} \\equiv \\begin{bmatrix}\n", + "\\frac{\\partial^2 C(\\theta)}{\\partial \\theta_0^2} & \\frac{\\partial^2 C(\\theta)}{\\partial \\theta_0 \\partial \\theta_1} \\\\\n", + "\\frac{\\partial^2 C(\\theta)}{\\partial \\theta_0 \\partial \\theta_1} & \\frac{\\partial^2 C(\\theta)}{\\partial \\theta_1^2} & \\\\\n", + "\\end{bmatrix} = \\frac{2}{n}X^T X+2\\lambda\\boldsymbol{I}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "f178c97e", + "metadata": { + "editable": true + }, + "source": [ + "This implies that the Hessian matrix is positive definite, hence the stationary point is a\n", + "minimum.\n", + "Note that the Ridge cost function is convex being a sum of two convex\n", + "functions. Therefore, the stationary point is a global\n", + "minimum of this function." + ] + }, + { + "cell_type": "markdown", + "id": "3853aec7", + "metadata": { + "editable": true + }, + "source": [ + "## Program example for gradient descent with Ridge Regression" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "81740e7b", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "from random import random, seed\n", + "import numpy as np\n", + "import matplotlib.pyplot as plt\n", + "from mpl_toolkits.mplot3d import Axes3D\n", + "from matplotlib import cm\n", + "from matplotlib.ticker import LinearLocator, FormatStrFormatter\n", + "import sys\n", + "\n", + "# the number of datapoints\n", + "n = 100\n", + "x = 2*np.random.rand(n,1)\n", + "y = 4+3*x+np.random.randn(n,1)\n", + "\n", + "X = np.c_[np.ones((n,1)), x]\n", + "XT_X = X.T @ X\n", + "\n", + "#Ridge parameter lambda\n", + "lmbda = 0.001\n", + "Id = n*lmbda* np.eye(XT_X.shape[0])\n", + "\n", + "# Hessian matrix\n", + "H = (2.0/n)* XT_X+2*lmbda* np.eye(XT_X.shape[0])\n", + "# Get the eigenvalues\n", + "EigValues, EigVectors = np.linalg.eig(H)\n", + "print(f\"Eigenvalues of Hessian Matrix:{EigValues}\")\n", + "\n", + "\n", + "theta_linreg = np.linalg.inv(XT_X+Id) @ X.T @ y\n", + "print(theta_linreg)\n", + "# Start plain gradient descent\n", + "theta = np.random.randn(2,1)\n", + "\n", + "eta = 1.0/np.max(EigValues)\n", + "Niterations = 100\n", + "\n", + "for iter in range(Niterations):\n", + " gradients = 2.0/n*X.T @ (X @ (theta)-y)+2*lmbda*theta\n", + " theta -= eta*gradients\n", + "\n", + "print(theta)\n", + "ypredict = X @ theta\n", + "ypredict2 = X @ theta_linreg\n", + "plt.plot(x, ypredict, \"r-\")\n", + "plt.plot(x, ypredict2, \"b-\")\n", + "plt.plot(x, y ,'ro')\n", + "plt.axis([0,2.0,0, 15.0])\n", + "plt.xlabel(r'$x$')\n", + "plt.ylabel(r'$y$')\n", + "plt.title(r'Gradient descent example for Ridge')\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "aa1b6e08", + "metadata": { + "editable": true + }, + "source": [ + "## Using gradient descent methods, limitations\n", + "\n", + "* **Gradient descent (GD) finds local minima of our function**. Since the GD algorithm is deterministic, if it converges, it will converge to a local minimum of our cost/loss/risk function. 
Because in ML we are often dealing with extremely rugged landscapes with many local minima, this can lead to poor performance.\n", + "\n", + "* **GD is sensitive to initial conditions**. One consequence of the local nature of GD is that initial conditions matter. Depending on where one starts, one will end up at a different local minima. Therefore, it is very important to think about how one initializes the training process. This is true for GD as well as more complicated variants of GD.\n", + "\n", + "* **Gradients are computationally expensive to calculate for large datasets**. In many cases in statistics and ML, the cost/loss/risk function is a sum of terms, with one term for each data point. For example, in linear regression, $E \\propto \\sum_{i=1}^n (y_i - \\mathbf{w}^T\\cdot\\mathbf{x}_i)^2$; for logistic regression, the square error is replaced by the cross entropy. To calculate the gradient we have to sum over *all* $n$ data points. Doing this at every GD step becomes extremely computationally expensive. An ingenious solution to this, is to calculate the gradients using small subsets of the data called \"mini batches\". This has the added benefit of introducing stochasticity into our algorithm.\n", + "\n", + "* **GD is very sensitive to choices of learning rates**. GD is extremely sensitive to the choice of learning rates. If the learning rate is very small, the training process take an extremely long time. For larger learning rates, GD can diverge and give poor results. Furthermore, depending on what the local landscape looks like, we have to modify the learning rates to ensure convergence. Ideally, we would *adaptively* choose the learning rates to match the landscape.\n", + "\n", + "* **GD treats all directions in parameter space uniformly.** Another major drawback of GD is that unlike Newton's method, the learning rate for GD is the same in all directions in parameter space. For this reason, the maximum learning rate is set by the behavior of the steepest direction and this can significantly slow down training. Ideally, we would like to take large steps in flat directions and small steps in steep directions. Since we are exploring rugged landscapes where curvatures change, this requires us to keep track of not only the gradient but second derivatives. The ideal scenario would be to calculate the Hessian but this proves to be too computationally expensive. \n", + "\n", + "* GD can take exponential time to escape saddle points, even with random initialization. As we mentioned, GD is extremely sensitive to initial condition since it determines the particular local minimum GD would eventually reach. However, even with a good initialization scheme, through the introduction of randomness, GD can still take exponential time to escape saddle points." + ] + }, + { + "cell_type": "markdown", + "id": "d1b9be1a", + "metadata": { + "editable": true + }, + "source": [ + "## Momentum based GD\n", + "\n", + "We discuss here some simple examples where we introduce what is called\n", + "'memory'about previous steps, or what is normally called momentum\n", + "gradient descent.\n", + "For the mathematical details, see whiteboad notes from lecture on September 8, 2025." 
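+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "9e41d8f5",
+ "metadata": {
+ "editable": true
+ },
+ "source": [
+ "In compact form, and consistent with the code example that follows, one common variant of momentum gradient descent keeps a running *velocity* $v_t$ and updates\n",
+ "\n",
+ "$$\n",
+ "v_t = \\eta \\nabla_\\theta C(\\theta_{t-1}) + \\gamma v_{t-1}, \\qquad \\theta_t = \\theta_{t-1} - v_t,\n",
+ "$$\n",
+ "\n",
+ "with learning rate $\\eta$, momentum parameter $\\gamma \\in [0,1)$ and $v_0 = 0$. The step is then a geometrically decaying average of past gradients, and $\\gamma = 0$ recovers plain gradient descent."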
+ ] + }, + { + "cell_type": "markdown", + "id": "2e1267e6", + "metadata": { + "editable": true + }, + "source": [ + "## Improving gradient descent with momentum" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "494e82a7", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "from numpy import asarray\n", + "from numpy import arange\n", + "from numpy.random import rand\n", + "from numpy.random import seed\n", + "from matplotlib import pyplot\n", + " \n", + "# objective function\n", + "def objective(x):\n", + "\treturn x**2.0\n", + " \n", + "# derivative of objective function\n", + "def derivative(x):\n", + "\treturn x * 2.0\n", + " \n", + "# gradient descent algorithm\n", + "def gradient_descent(objective, derivative, bounds, n_iter, step_size):\n", + "\t# track all solutions\n", + "\tsolutions, scores = list(), list()\n", + "\t# generate an initial point\n", + "\tsolution = bounds[:, 0] + rand(len(bounds)) * (bounds[:, 1] - bounds[:, 0])\n", + "\t# run the gradient descent\n", + "\tfor i in range(n_iter):\n", + "\t\t# calculate gradient\n", + "\t\tgradient = derivative(solution)\n", + "\t\t# take a step\n", + "\t\tsolution = solution - step_size * gradient\n", + "\t\t# evaluate candidate point\n", + "\t\tsolution_eval = objective(solution)\n", + "\t\t# store solution\n", + "\t\tsolutions.append(solution)\n", + "\t\tscores.append(solution_eval)\n", + "\t\t# report progress\n", + "\t\tprint('>%d f(%s) = %.5f' % (i, solution, solution_eval))\n", + "\treturn [solutions, scores]\n", + " \n", + "# seed the pseudo random number generator\n", + "seed(4)\n", + "# define range for input\n", + "bounds = asarray([[-1.0, 1.0]])\n", + "# define the total iterations\n", + "n_iter = 30\n", + "# define the step size\n", + "step_size = 0.1\n", + "# perform the gradient descent search\n", + "solutions, scores = gradient_descent(objective, derivative, bounds, n_iter, step_size)\n", + "# sample input range uniformly at 0.1 increments\n", + "inputs = arange(bounds[0,0], bounds[0,1]+0.1, 0.1)\n", + "# compute targets\n", + "results = objective(inputs)\n", + "# create a line plot of input vs result\n", + "pyplot.plot(inputs, results)\n", + "# plot the solutions found\n", + "pyplot.plot(solutions, scores, '.-', color='red')\n", + "# show the plot\n", + "pyplot.show()" + ] + }, + { + "cell_type": "markdown", + "id": "46858c7c", + "metadata": { + "editable": true + }, + "source": [ + "## Same code but now with momentum gradient descent" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "6a917123", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "from numpy import asarray\n", + "from numpy import arange\n", + "from numpy.random import rand\n", + "from numpy.random import seed\n", + "from matplotlib import pyplot\n", + " \n", + "# objective function\n", + "def objective(x):\n", + "\treturn x**2.0\n", + " \n", + "# derivative of objective function\n", + "def derivative(x):\n", + "\treturn x * 2.0\n", + " \n", + "# gradient descent algorithm\n", + "def gradient_descent(objective, derivative, bounds, n_iter, step_size, momentum):\n", + "\t# track all solutions\n", + "\tsolutions, scores = list(), list()\n", + "\t# generate an initial point\n", + "\tsolution = bounds[:, 0] + rand(len(bounds)) * (bounds[:, 1] - bounds[:, 0])\n", + "\t# keep track of the change\n", + "\tchange = 0.0\n", + "\t# run the gradient descent\n", + "\tfor i in range(n_iter):\n", + "\t\t# calculate gradient\n", + 
"\t\tgradient = derivative(solution)\n", + "\t\t# calculate update\n", + "\t\tnew_change = step_size * gradient + momentum * change\n", + "\t\t# take a step\n", + "\t\tsolution = solution - new_change\n", + "\t\t# save the change\n", + "\t\tchange = new_change\n", + "\t\t# evaluate candidate point\n", + "\t\tsolution_eval = objective(solution)\n", + "\t\t# store solution\n", + "\t\tsolutions.append(solution)\n", + "\t\tscores.append(solution_eval)\n", + "\t\t# report progress\n", + "\t\tprint('>%d f(%s) = %.5f' % (i, solution, solution_eval))\n", + "\treturn [solutions, scores]\n", + " \n", + "# seed the pseudo random number generator\n", + "seed(4)\n", + "# define range for input\n", + "bounds = asarray([[-1.0, 1.0]])\n", + "# define the total iterations\n", + "n_iter = 30\n", + "# define the step size\n", + "step_size = 0.1\n", + "# define momentum\n", + "momentum = 0.3\n", + "# perform the gradient descent search with momentum\n", + "solutions, scores = gradient_descent(objective, derivative, bounds, n_iter, step_size, momentum)\n", + "# sample input range uniformly at 0.1 increments\n", + "inputs = arange(bounds[0,0], bounds[0,1]+0.1, 0.1)\n", + "# compute targets\n", + "results = objective(inputs)\n", + "# create a line plot of input vs result\n", + "pyplot.plot(inputs, results)\n", + "# plot the solutions found\n", + "pyplot.plot(solutions, scores, '.-', color='red')\n", + "# show the plot\n", + "pyplot.show()" + ] + }, + { + "cell_type": "markdown", + "id": "361b2aa8", + "metadata": { + "editable": true + }, + "source": [ + "## Overview video on Stochastic Gradient Descent (SGD)\n", + "\n", + "[What is Stochastic Gradient Descent](https://www.youtube.com/watch?v=vMh0zPT0tLI&ab_channel=StatQuestwithJoshStarmer)\n", + "There are several reasons for using stochastic gradient descent. Some of these are:\n", + "\n", + "1. Efficiency: Updates weights more frequently using a single or a small batch of samples, which speeds up convergence.\n", + "\n", + "2. Hopefully avoid Local Minima\n", + "\n", + "3. Memory Usage: Requires less memory compared to computing gradients for the entire dataset." + ] + }, + { + "cell_type": "markdown", + "id": "2dacb8ef", + "metadata": { + "editable": true + }, + "source": [ + "## Batches and mini-batches\n", + "\n", + "In gradient descent we compute the cost function and its gradient for all data points we have.\n", + "\n", + "In large-scale applications such as the [ILSVRC challenge](https://www.image-net.org/challenges/LSVRC/), the\n", + "training data can have on order of millions of examples. Hence, it\n", + "seems wasteful to compute the full cost function over the entire\n", + "training set in order to perform only a single parameter update. A\n", + "very common approach to addressing this challenge is to compute the\n", + "gradient over batches of the training data. For example, a typical batch could contain some thousand examples from\n", + "an entire training set of several millions. This batch is then used to\n", + "perform a parameter update." + ] + }, + { + "cell_type": "markdown", + "id": "59c9add4", + "metadata": { + "editable": true + }, + "source": [ + "## Pros and cons\n", + "\n", + "1. Speed: SGD is faster than gradient descent because it uses only one training example per iteration, whereas gradient descent requires the entire dataset. This speed advantage becomes more significant as the size of the dataset increases.\n", + "\n", + "2. 
Convergence: Gradient descent has a more predictable convergence behaviour because it uses the average gradient of the entire dataset. In contrast, SGD’s convergence behaviour can be more erratic due to its random sampling of individual training examples.\n", + "\n", + "3. Memory: Gradient descent requires more memory than SGD because it must store the entire dataset for each iteration. SGD only needs to store the current training example, making it more memory-efficient." + ] + }, + { + "cell_type": "markdown", + "id": "a5168cc9", + "metadata": { + "editable": true + }, + "source": [ + "## Convergence rates\n", + "\n", + "1. Stochastic Gradient Descent has a faster convergence rate due to the use of single training examples in each iteration.\n", + "\n", + "2. Gradient Descent as a slower convergence rate, as it uses the entire dataset for each iteration." + ] + }, + { + "cell_type": "markdown", + "id": "47321307", + "metadata": { + "editable": true + }, + "source": [ + "## Accuracy\n", + "\n", + "In general, stochastic Gradient Descent is Less accurate than gradient\n", + "descent, as it calculates the gradient on single examples, which may\n", + "not accurately represent the overall dataset. Gradient Descent is\n", + "more accurate because it uses the average gradient calculated over the\n", + "entire dataset.\n", + "\n", + "There are other disadvantages to using SGD. The main drawback is that\n", + "its convergence behaviour can be more erratic due to the random\n", + "sampling of individual training examples. This can lead to less\n", + "accurate results, as the algorithm may not converge to the true\n", + "minimum of the cost function. Additionally, the learning rate, which\n", + "determines the step size of each update to the model’s parameters,\n", + "must be carefully chosen to ensure convergence.\n", + "\n", + "It is however the method of choice in deep learning algorithms where\n", + "SGD is often used in combination with other optimization techniques,\n", + "such as momentum or adaptive learning rates" + ] + }, + { + "cell_type": "markdown", + "id": "96f44d6b", + "metadata": { + "editable": true + }, + "source": [ + "## Stochastic Gradient Descent (SGD)\n", + "\n", + "In stochastic gradient descent, the extreme case is the case where we\n", + "have only one batch, that is we include the whole data set.\n", + "\n", + "This process is called Stochastic Gradient\n", + "Descent (SGD) (or also sometimes on-line gradient descent). This is\n", + "relatively less common to see because in practice due to vectorized\n", + "code optimizations it can be computationally much more efficient to\n", + "evaluate the gradient for 100 examples, than the gradient for one\n", + "example 100 times. Even though SGD technically refers to using a\n", + "single example at a time to evaluate the gradient, you will hear\n", + "people use the term SGD even when referring to mini-batch gradient\n", + "descent (i.e. mentions of MGD for “Minibatch Gradient Descent”, or BGD\n", + "for “Batch gradient descent” are rare to see), where it is usually\n", + "assumed that mini-batches are used. The size of the mini-batch is a\n", + "hyperparameter but it is not very common to cross-validate or bootstrap it. It is\n", + "usually based on memory constraints (if any), or set to some value,\n", + "e.g. 32, 64 or 128. 
We use powers of 2 in practice because many\n", + "vectorized operation implementations work faster when their inputs are\n", + "sized in powers of 2.\n", + "\n", + "In our notes with SGD we mean stochastic gradient descent with mini-batches." + ] + }, + { + "cell_type": "markdown", + "id": "898ef421", + "metadata": { + "editable": true + }, + "source": [ + "## Stochastic Gradient Descent\n", + "\n", + "Stochastic gradient descent (SGD) and variants thereof address some of\n", + "the shortcomings of the Gradient descent method discussed above.\n", + "\n", + "The underlying idea of SGD comes from the observation that the cost\n", + "function, which we want to minimize, can almost always be written as a\n", + "sum over $n$ data points $\\{\\mathbf{x}_i\\}_{i=1}^n$," + ] + }, + { + "cell_type": "markdown", + "id": "4e827950", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C(\\mathbf{\\theta}) = \\sum_{i=1}^n c_i(\\mathbf{x}_i,\n", + "\\mathbf{\\theta}).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "05e99546", + "metadata": { + "editable": true + }, + "source": [ + "## Computation of gradients\n", + "\n", + "This in turn means that the gradient can be\n", + "computed as a sum over $i$-gradients" + ] + }, + { + "cell_type": "markdown", + "id": "b92afe6c", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\nabla_\\theta C(\\mathbf{\\theta}) = \\sum_i^n \\nabla_\\theta c_i(\\mathbf{x}_i,\n", + "\\mathbf{\\theta}).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b20a4aca", + "metadata": { + "editable": true + }, + "source": [ + "Stochasticity/randomness is introduced by only taking the\n", + "gradient on a subset of the data called minibatches. If there are $n$\n", + "data points and the size of each minibatch is $M$, there will be $n/M$\n", + "minibatches. We denote these minibatches by $B_k$ where\n", + "$k=1,\\cdots,n/M$." + ] + }, + { + "cell_type": "markdown", + "id": "7884cc0d", + "metadata": { + "editable": true + }, + "source": [ + "## SGD example\n", + "As an example, suppose we have $10$ data points $(\\mathbf{x}_1,\\cdots, \\mathbf{x}_{10})$ \n", + "and we choose to have $M=5$ minibathces,\n", + "then each minibatch contains two data points. In particular we have\n", + "$B_1 = (\\mathbf{x}_1,\\mathbf{x}_2), \\cdots, B_5 =\n", + "(\\mathbf{x}_9,\\mathbf{x}_{10})$. 
Note that if you choose $M=1$ you\n", + "have only a single batch with all data points and on the other extreme,\n", + "you may choose $M=n$ resulting in a minibatch for each datapoint, i.e\n", + "$B_k = \\mathbf{x}_k$.\n", + "\n", + "The idea is now to approximate the gradient by replacing the sum over\n", + "all data points with a sum over the data points in one the minibatches\n", + "picked at random in each gradient descent step" + ] + }, + { + "cell_type": "markdown", + "id": "392aeed0", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\nabla_{\\theta}\n", + "C(\\mathbf{\\theta}) = \\sum_{i=1}^n \\nabla_\\theta c_i(\\mathbf{x}_i,\n", + "\\mathbf{\\theta}) \\rightarrow \\sum_{i \\in B_k}^n \\nabla_\\theta\n", + "c_i(\\mathbf{x}_i, \\mathbf{\\theta}).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "04581249", + "metadata": { + "editable": true + }, + "source": [ + "## The gradient step\n", + "\n", + "Thus a gradient descent step now looks like" + ] + }, + { + "cell_type": "markdown", + "id": "d21077a4", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\theta_{j+1} = \\theta_j - \\eta_j \\sum_{i \\in B_k}^n \\nabla_\\theta c_i(\\mathbf{x}_i,\n", + "\\mathbf{\\theta})\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b4bed668", + "metadata": { + "editable": true + }, + "source": [ + "where $k$ is picked at random with equal\n", + "probability from $[1,n/M]$. An iteration over the number of\n", + "minibathces (n/M) is commonly referred to as an epoch. Thus it is\n", + "typical to choose a number of epochs and for each epoch iterate over\n", + "the number of minibatches, as exemplified in the code below." + ] + }, + { + "cell_type": "markdown", + "id": "9c15b282", + "metadata": { + "editable": true + }, + "source": [ + "## Simple example code" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "602bda4c", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import numpy as np \n", + "\n", + "n = 100 #100 datapoints \n", + "M = 5 #size of each minibatch\n", + "m = int(n/M) #number of minibatches\n", + "n_epochs = 10 #number of epochs\n", + "\n", + "j = 0\n", + "for epoch in range(1,n_epochs+1):\n", + " for i in range(m):\n", + " k = np.random.randint(m) #Pick the k-th minibatch at random\n", + " #Compute the gradient using the data in minibatch Bk\n", + " #Compute new suggestion for \n", + " j += 1" + ] + }, + { + "cell_type": "markdown", + "id": "332831a7", + "metadata": { + "editable": true + }, + "source": [ + "Taking the gradient only on a subset of the data has two important\n", + "benefits. First, it introduces randomness which decreases the chance\n", + "that our opmization scheme gets stuck in a local minima. Second, if\n", + "the size of the minibatches are small relative to the number of\n", + "datapoints ($M < n$), the computation of the gradient is much\n", + "cheaper since we sum over the datapoints in the $k-th$ minibatch and not\n", + "all $n$ datapoints." + ] + }, + { + "cell_type": "markdown", + "id": "187eb27c", + "metadata": { + "editable": true + }, + "source": [ + "## When do we stop?\n", + "\n", + "A natural question is when do we stop the search for a new minimum?\n", + "One possibility is to compute the full gradient after a given number\n", + "of epochs and check if the norm of the gradient is smaller than some\n", + "threshold and stop if true. 
+ "However, the condition that the gradient\n",
+ "is zero is valid also for local minima, so this would only tell us\n",
+ "that we are close to a local/global minimum. Alternatively, we could\n",
+ "evaluate the cost function at this point, store the result and\n",
+ "continue the search. If the test is repeated at a later stage we can\n",
+ "compare the values of the cost function and keep the $\theta$ that\n",
+ "gave the lowest value."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "8ddbdbb5",
+ "metadata": {
+ "editable": true
+ },
+ "source": [
+ "## Slightly different approach\n",
+ "\n",
+ "Another approach is to let the step length $\eta_j$ depend on the\n",
+ "number of epochs in such a way that it becomes very small after a\n",
+ "reasonable time, so that we essentially stop moving. Such approaches\n",
+ "are often referred to as learning rate schedules, or scaling of the\n",
+ "learning rate. There are many ways to [scale the learning\n",
+ "rate](https://towardsdatascience.com/gradient-descent-the-learning-rate-and-the-importance-of-feature-scaling-6c0b416596e1);\n",
+ "see also [this article](https://www.jmlr.org/papers/volume23/20-1258/20-1258.pdf)\n",
+ "for a discussion of different scaling functions for the learning rate."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "35ea8e21",
+ "metadata": {
+ "editable": true
+ },
+ "source": [
+ "## Time decay rate\n",
+ "\n",
+ "As an example, let $e = 0,1,2,3,\cdots$ denote the current epoch and let $t_0, t_1 > 0$ be two fixed numbers. Furthermore, let $t = e \cdot m + i$ where $m$ is the number of minibatches and $i=0,\cdots,m-1$. Then the function $$\eta_j(t; t_0, t_1) = \frac{t_0}{t+t_1} $$ goes to zero as the number of epochs gets large. That is, we start with a step length $\eta_j (0; t_0, t_1) = t_0/t_1$ which decays in *time* $t$.\n",
+ "\n",
+ "In this way we can fix the number of epochs, compute $\theta$ and\n",
+ "evaluate the cost function at the end. Repeating the computation will\n",
+ "give a different result since the scheme is random by design. Then we\n",
+ "pick the final $\theta$ that gives the lowest value of the cost\n",
+ "function."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 7,
+ "id": "77a60fcd",
+ "metadata": {
+ "collapsed": false,
+ "editable": true
+ },
+ "outputs": [],
+ "source": [
+ "import numpy as np \n",
+ "\n",
+ "def step_length(t,t0,t1):\n",
+ "    return t0/(t+t1)\n",
+ "\n",
+ "n = 100 #100 datapoints \n",
+ "M = 5 #size of each minibatch\n",
+ "m = int(n/M) #number of minibatches\n",
+ "n_epochs = 500 #number of epochs\n",
+ "t0 = 1.0\n",
+ "t1 = 10\n",
+ "\n",
+ "eta_j = t0/t1\n",
+ "j = 0\n",
+ "for epoch in range(1,n_epochs+1):\n",
+ "    for i in range(m):\n",
+ "        k = np.random.randint(m) #Pick the k-th minibatch at random\n",
+ "        #Compute the gradient using the data in minibatch Bk\n",
+ "        #Compute new suggestion for theta\n",
+ "        t = epoch*m+i\n",
+ "        eta_j = step_length(t,t0,t1)\n",
+ "        j += 1\n",
+ "\n",
+ "print(\"eta_j after %d epochs: %g\" % (n_epochs,eta_j))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "b030b80c",
+ "metadata": {
+ "editable": true
+ },
+ "source": [
+ "## Code with a Number of Minibatches which varies\n",
+ "\n",
+ "In the code below the number of mini-batches is controlled by the batch size $M$ (with $m=n/M$). The example first computes the analytical OLS solution and plain gradient descent, and then runs SGD over the mini-batches with the time-decaying learning rate defined above."
+ ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "9bdf875b", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# Importing various packages\n", + "from math import exp, sqrt\n", + "from random import random, seed\n", + "import numpy as np\n", + "import matplotlib.pyplot as plt\n", + "\n", + "n = 100\n", + "x = 2*np.random.rand(n,1)\n", + "y = 4+3*x+np.random.randn(n,1)\n", + "\n", + "X = np.c_[np.ones((n,1)), x]\n", + "XT_X = X.T @ X\n", + "theta_linreg = np.linalg.inv(X.T @ X) @ (X.T @ y)\n", + "print(\"Own inversion\")\n", + "print(theta_linreg)\n", + "# Hessian matrix\n", + "H = (2.0/n)* XT_X\n", + "EigValues, EigVectors = np.linalg.eig(H)\n", + "print(f\"Eigenvalues of Hessian Matrix:{EigValues}\")\n", + "\n", + "theta = np.random.randn(2,1)\n", + "eta = 1.0/np.max(EigValues)\n", + "Niterations = 1000\n", + "\n", + "\n", + "for iter in range(Niterations):\n", + " gradients = 2.0/n*X.T @ ((X @ theta)-y)\n", + " theta -= eta*gradients\n", + "print(\"theta from own gd\")\n", + "print(theta)\n", + "\n", + "xnew = np.array([[0],[2]])\n", + "Xnew = np.c_[np.ones((2,1)), xnew]\n", + "ypredict = Xnew.dot(theta)\n", + "ypredict2 = Xnew.dot(theta_linreg)\n", + "\n", + "n_epochs = 50\n", + "M = 5 #size of each minibatch\n", + "m = int(n/M) #number of minibatches\n", + "t0, t1 = 5, 50\n", + "\n", + "def learning_schedule(t):\n", + " return t0/(t+t1)\n", + "\n", + "theta = np.random.randn(2,1)\n", + "\n", + "for epoch in range(n_epochs):\n", + "# Can you figure out a better way of setting up the contributions to each batch?\n", + " for i in range(m):\n", + " random_index = M*np.random.randint(m)\n", + " xi = X[random_index:random_index+M]\n", + " yi = y[random_index:random_index+M]\n", + " gradients = (2.0/M)* xi.T @ ((xi @ theta)-yi)\n", + " eta = learning_schedule(epoch*m+i)\n", + " theta = theta - eta*gradients\n", + "print(\"theta from own sdg\")\n", + "print(theta)\n", + "\n", + "plt.plot(xnew, ypredict, \"r-\")\n", + "plt.plot(xnew, ypredict2, \"b-\")\n", + "plt.plot(x, y ,'ro')\n", + "plt.axis([0,2.0,0, 15.0])\n", + "plt.xlabel(r'$x$')\n", + "plt.ylabel(r'$y$')\n", + "plt.title(r'Random numbers ')\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "365cebd9", + "metadata": { + "editable": true + }, + "source": [ + "## Replace or not\n", + "\n", + "In the above code, we have use replacement in setting up the\n", + "mini-batches. The discussion\n", + "[here](https://sebastianraschka.com/faq/docs/sgd-methods.html) may be\n", + "useful." + ] + }, + { + "cell_type": "markdown", + "id": "e7c9011a", + "metadata": { + "editable": true + }, + "source": [ + "## SGD vs Full-Batch GD: Convergence Speed and Memory Comparison" + ] + }, + { + "cell_type": "markdown", + "id": "f1c85da0", + "metadata": { + "editable": true + }, + "source": [ + "### Theoretical Convergence Speed and convex optimization\n", + "\n", + "Consider minimizing an empirical cost function" + ] + }, + { + "cell_type": "markdown", + "id": "66df0f80", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C(\\theta) =\\frac{1}{N}\\sum_{i=1}^N l_i(\\theta),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "9f02b845", + "metadata": { + "editable": true + }, + "source": [ + "where each $l_i(\\theta)$ is a\n", + "differentiable loss term. 
Gradient Descent (GD) updates parameters\n",
+ "using the full gradient $\nabla C(\theta)$, while Stochastic Gradient\n",
+ "Descent (SGD) uses a single sample (or mini-batch) gradient $\nabla\n",
+ "l_i(\theta)$ selected at random. In equation form, one GD step is:"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "21997f1a",
+ "metadata": {
+ "editable": true
+ },
+ "source": [
+ "$$\n",
+ "\theta_{t+1} = \theta_t - \eta \nabla C(\theta_t) = \theta_t - \eta \frac{1}{N}\sum_{i=1}^N \nabla l_i(\theta_t),\n",
+ "$$"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "cdefe165",
+ "metadata": {
+ "editable": true
+ },
+ "source": [
+ "whereas one SGD step is:"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "ac200d56",
+ "metadata": {
+ "editable": true
+ },
+ "source": [
+ "$$\n",
+ "\theta_{t+1} = \theta_t - \eta \nabla l_{i_t}(\theta_t),\n",
+ "$$"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "eb3edfb3",
+ "metadata": {
+ "editable": true
+ },
+ "source": [
+ "with $i_t$ randomly chosen. On smooth convex problems, GD and SGD both\n",
+ "converge to the global minimum, but their rates differ. GD can take\n",
+ "larger, more stable steps since it uses the exact gradient, achieving\n",
+ "an error that decreases on the order of $O(1/t)$ per iteration for\n",
+ "convex objectives (and even exponentially fast for strongly convex\n",
+ "cases). In contrast, plain SGD has more variance in each step, leading\n",
+ "to sublinear convergence in expectation – typically $O(1/\sqrt{t})$\n",
+ "for general convex objectives (with appropriate diminishing step\n",
+ "sizes). Intuitively, GD’s trajectory is smoother and more\n",
+ "predictable, while SGD’s path oscillates due to noise but costs far\n",
+ "less per iteration, enabling many more updates in the same time."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "7fe05c0d",
+ "metadata": {
+ "editable": true
+ },
+ "source": [
+ "### Strongly Convex Case\n",
+ "\n",
+ "If $C(\theta)$ is $\mu$-strongly convex and $L$-smooth (so GD enjoys linear\n",
+ "convergence), the gap $C(\theta_t)-C(\theta^*)$ for GD shrinks as"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "2ae403f1",
+ "metadata": {
+ "editable": true
+ },
+ "source": [
+ "$$\n",
+ "C(\theta_t) - C(\theta^*) \le \Big(1 - \frac{\mu}{L}\Big)^t [C(\theta_0)-C(\theta^*)],\n",
+ "$$"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "44272171",
+ "metadata": {
+ "editable": true
+ },
+ "source": [
+ "a geometric (linear) convergence per iteration. Achieving an\n",
+ "$\epsilon$-accurate solution thus takes on the order of\n",
+ "$\log(1/\epsilon)$ iterations for GD. However, each GD iteration costs\n",
+ "$O(N)$ gradient evaluations. SGD cannot exploit strong convexity to\n",
+ "obtain a linear rate – instead, with a properly decaying step size\n",
+ "(e.g. $\eta_t = \frac{1}{\mu t}$) or iterate averaging, SGD attains an\n",
+ "$O(1/t)$ convergence rate in expectation. For example, one result\n",
+ "of Moulines and Bach (2011) shows that with $\eta_t = \Theta(1/t)$,"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "9cde29ef",
+ "metadata": {
+ "editable": true
+ },
+ "source": [
+ "$$\n",
+ "\mathbb{E}[C(\theta_t) - C(\theta^*)] = O(1/t),\n",
+ "$$"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "9b77f20e",
+ "metadata": {
+ "editable": true
+ },
+ "source": [
+ "for strongly convex, smooth $C$. This $1/t$ rate is slower per\n",
+ "iteration than GD’s exponential decay, but each SGD iteration is $N$\n",
+ "times cheaper. 
In fact, to reach error $\\epsilon$, plain SGD needs on\n", + "the order of $T=O(1/\\epsilon)$ iterations (sub-linear convergence),\n", + "while GD needs $O(\\log(1/\\epsilon))$ iterations. When accounting for\n", + "cost-per-iteration, GD requires $O(N \\log(1/\\epsilon))$ total gradient\n", + "computations versus SGD’s $O(1/\\epsilon)$ single-sample\n", + "computations. In large-scale regimes (huge $N$), SGD can be\n", + "faster in wall-clock time because $N \\log(1/\\epsilon)$ may far exceed\n", + "$1/\\epsilon$ for reasonable accuracy levels. In other words,\n", + "with millions of data points, one epoch of GD (one full gradient) is\n", + "extremely costly, whereas SGD can make $N$ cheap updates in the time\n", + "GD makes one – often yielding a good solution faster in practice, even\n", + "though SGD’s asymptotic error decays more slowly. As one lecture\n", + "succinctly puts it: “SGD can be super effective in terms of iteration\n", + "cost and memory, but SGD is slow to converge and can’t adapt to strong\n", + "convexity” . Thus, the break-even point depends on $N$ and the desired\n", + "accuracy: for moderate accuracy on very large $N$, SGD’s cheaper\n", + "updates win; for extremely high precision (very small $\\epsilon$) on a\n", + "modest $N$, GD’s fast convergence per step can be advantageous." + ] + }, + { + "cell_type": "markdown", + "id": "4479bd97", + "metadata": { + "editable": true + }, + "source": [ + "### Non-Convex Problems\n", + "\n", + "In non-convex optimization (e.g. deep neural networks), neither GD nor\n", + "SGD guarantees global minima, but SGD often displays faster progress\n", + "in finding useful minima. Theoretical results here are weaker, usually\n", + "showing convergence to a stationary point $\\theta$ ($|\\nabla C|$ is\n", + "small) in expectation. For example, GD might require $O(1/\\epsilon^2)$\n", + "iterations to ensure $|\\nabla C(\\theta)| < \\epsilon$, and SGD typically has\n", + "similar polynomial complexity (often worse due to gradient\n", + "noise). However, a noteworthy difference is that SGD’s stochasticity\n", + "can help escape saddle points or poor local minima. Random gradient\n", + "fluctuations act like implicit noise, helping the iterate “jump” out\n", + "of flat saddle regions where full-batch GD could stagnate . In fact,\n", + "research has shown that adding noise to GD can guarantee escaping\n", + "saddle points in polynomial time, and the inherent noise in SGD often\n", + "serves this role. Empirically, this means SGD can sometimes find a\n", + "lower loss basin faster, whereas full-batch GD might get “stuck” near\n", + "saddle points or need a very small learning rate to navigate complex\n", + "error surfaces . Overall, in modern high-dimensional machine learning,\n", + "SGD (or mini-batch SGD) is the workhorse for large non-convex problems\n", + "because it converges to good solutions much faster in practice,\n", + "despite the lack of a linear convergence guarantee. Full-batch GD is\n", + "rarely used on large neural networks, as it would require tiny steps\n", + "to avoid divergence and is extremely slow per iteration ." + ] + }, + { + "cell_type": "markdown", + "id": "31ea65c9", + "metadata": { + "editable": true + }, + "source": [ + "## Memory Usage and Scalability\n", + "\n", + "A major advantage of SGD is its memory efficiency in handling large\n", + "datasets. 
Full-batch GD requires access to the entire training set for\n", + "each iteration, which often means the whole dataset (or a large\n", + "subset) must reside in memory to compute $\\nabla C(\\theta)$ . This results\n", + "in memory usage that scales linearly with the dataset size $N$. For\n", + "instance, if each training sample is large (e.g. high-dimensional\n", + "features), computing a full gradient may require storing a substantial\n", + "portion of the data or all intermediate gradients until they are\n", + "aggregated. In contrast, SGD needs only a single (or a small\n", + "mini-batch of) training example(s) in memory at any time . The\n", + "algorithm processes one sample (or mini-batch) at a time and\n", + "immediately updates the model, discarding that sample before moving to\n", + "the next. This streaming approach means that memory footprint is\n", + "essentially independent of $N$ (apart from storing the model\n", + "parameters themselves). As one source notes, gradient descent\n", + "“requires more memory than SGD” because it “must store the entire\n", + "dataset for each iteration,” whereas SGD “only needs to store the\n", + "current training example” . In practical terms, if you have a dataset\n", + "of size, say, 1 million examples, full-batch GD would need memory for\n", + "all million every step, while SGD could be implemented to load just\n", + "one example at a time – a crucial benefit if data are too large to fit\n", + "in RAM or GPU memory. This scalability makes SGD suitable for\n", + "large-scale learning: as long as you can stream data from disk, SGD\n", + "can handle arbitrarily large datasets with fixed memory. In fact, SGD\n", + "“does not need to remember which examples were visited” in the past,\n", + "allowing it to run in an online fashion on infinite data streams\n", + ". Full-batch GD, on the other hand, would require multiple passes\n", + "through a giant dataset per update (or a complex distributed memory\n", + "system), which is often infeasible.\n", + "\n", + "There is also a secondary memory effect: computing a full-batch\n", + "gradient in deep learning requires storing all intermediate\n", + "activations for backpropagation across the entire batch. A very large\n", + "batch (approaching the full dataset) might exhaust GPU memory due to\n", + "the need to hold activation gradients for thousands or millions of\n", + "examples simultaneously. SGD/minibatches mitigate this by splitting\n", + "the workload – e.g. with a mini-batch of size 32 or 256, memory use\n", + "stays bounded, whereas a full-batch (size = $N$) forward/backward pass\n", + "could not even be executed if $N$ is huge. Techniques like gradient\n", + "accumulation exist to simulate large-batch GD by summing many\n", + "small-batch gradients – but these still process data in manageable\n", + "chunks to avoid memory overflow. In summary, memory complexity for GD\n", + "grows with $N$, while for SGD it remains $O(1)$ w.r.t. dataset size\n", + "(only the model and perhaps a mini-batch reside in memory) . This is a\n", + "key reason why batch GD “does not scale” to very large data and why\n", + "virtually all large-scale machine learning algorithms rely on\n", + "stochastic or mini-batch methods." + ] + }, + { + "cell_type": "markdown", + "id": "3f3fe4c4", + "metadata": { + "editable": true + }, + "source": [ + "## Empirical Evidence: Convergence Time and Memory in Practice\n", + "\n", + "Empirical studies strongly support the theoretical trade-offs\n", + "above. 
In large-scale machine learning tasks, SGD often converges to a\n", + "good solution much faster in wall-clock time than full-batch GD, and\n", + "it uses far less memory. For example, Bottou & Bousquet (2008)\n", + "analyzed learning time under a fixed computational budget and\n", + "concluded that when data is abundant, it’s better to use a faster\n", + "(even if less precise) optimization method to process more examples in\n", + "the same time . This analysis showed that for large-scale problems,\n", + "processing more data with SGD yields lower error than spending the\n", + "time to do exact (batch) optimization on fewer data . In other words,\n", + "if you have a time budget, it’s often optimal to accept slightly\n", + "slower convergence per step (as with SGD) in exchange for being able\n", + "to use many more training samples in that time. This phenomenon is\n", + "borne out by experiments:" + ] + }, + { + "cell_type": "markdown", + "id": "69d08c69", + "metadata": { + "editable": true + }, + "source": [ + "### Deep Neural Networks\n", + "\n", + "In modern deep learning, full-batch GD is so slow that it is rarely\n", + "attempted; instead, mini-batch SGD is standard. A recent study\n", + "demonstrated that it is possible to train a ResNet-50 on ImageNet\n", + "using full-batch gradient descent, but it required careful tuning\n", + "(e.g. gradient clipping, tiny learning rates) and vast computational\n", + "resources – and even then, each full-batch update was extremely\n", + "expensive.\n", + "\n", + "Using a huge batch\n", + "(closer to full GD) tends to slow down convergence if the learning\n", + "rate is not scaled up, and often encounters optimization difficulties\n", + "(plateaus) that small batches avoid.\n", + "Empirically, small or medium\n", + "batch SGD finds minima in fewer clock hours because it can rapidly\n", + "loop over the data with gradient noise aiding exploration." + ] + }, + { + "cell_type": "markdown", + "id": "4e2b549d", + "metadata": { + "editable": true + }, + "source": [ + "### Memory constraints\n", + "\n", + "From a memory standpoint, practitioners note that batch GD becomes\n", + "infeasible on large data. For example, if one tried to do full-batch\n", + "training on a dataset that doesn’t fit in RAM or GPU memory, the\n", + "program would resort to heavy disk I/O or simply crash. SGD\n", + "circumvents this by processing mini-batches. Even in cases where data\n", + "does fit in memory, using a full batch can spike memory usage due to\n", + "storing all gradients. One empirical observation is that mini-batch\n", + "training has a “lower, fluctuating usage pattern” of memory, whereas\n", + "full-batch loading “quickly consumes memory (often exceeding limits)”\n", + ". This is especially relevant for graph neural networks or other\n", + "models where a “batch” may include a huge chunk of a graph: full-batch\n", + "gradient computation can exhaust GPU memory, whereas mini-batch\n", + "methods keep memory usage manageable .\n", + "\n", + "In summary, SGD converges faster than full-batch GD in terms of actual\n", + "training time for large-scale problems, provided we measure\n", + "convergence as reaching a good-enough solution. Theoretical bounds\n", + "show SGD needs more iterations, but because it performs many more\n", + "updates per unit time (and requires far less memory), it often\n", + "achieves lower loss in a given time frame than GD. 
Full-batch GD might\n",
+ "take slightly fewer iterations in theory, but each iteration is so\n",
+ "costly that it is “slower… especially for large datasets”. Meanwhile,\n",
+ "memory scaling strongly favors SGD: GD’s memory cost grows with\n",
+ "dataset size, making it impractical beyond a point, whereas SGD’s\n",
+ "memory use is modest and mostly constant w.r.t. $N$. These\n",
+ "differences have made SGD (and mini-batch variants) the de facto\n",
+ "choice for training large machine learning models, from logistic\n",
+ "regression on millions of examples to deep neural networks with\n",
+ "billions of parameters. The consensus in both research and practice is\n",
+ "that for large-scale or high-dimensional tasks, SGD-type methods\n",
+ "converge quicker per unit of computation and handle memory constraints\n",
+ "better than standard full-batch gradient descent."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "48c2661e",
+ "metadata": {
+ "editable": true
+ },
+ "source": [
+ "## Second moment of the gradient\n",
+ "\n",
+ "In stochastic gradient descent, with and without momentum, we still\n",
+ "have to specify a schedule for tuning the learning rates $\eta_t$\n",
+ "as a function of time. As discussed in the context of Newton's\n",
+ "method, this presents a number of dilemmas. The learning rate is\n",
+ "limited by the steepest direction which can change depending on the\n",
+ "current position in the landscape. To circumvent this problem, ideally\n",
+ "our algorithm would keep track of curvature and take large steps in\n",
+ "shallow, flat directions and small steps in steep, narrow directions.\n",
+ "Second-order methods accomplish this by calculating or approximating\n",
+ "the Hessian and normalizing the learning rate by the\n",
+ "curvature. However, this is very computationally expensive for\n",
+ "extremely large models. Ideally, we would like to be able to\n",
+ "adaptively change the step size to match the landscape without paying\n",
+ "the steep computational price of calculating or approximating\n",
+ "Hessians.\n",
+ "\n",
+ "During the last decade a number of methods have been introduced that accomplish\n",
+ "this by tracking not only the gradient, but also the second moment of\n",
+ "the gradient. These methods include AdaGrad, AdaDelta, Root Mean Squared Propagation (RMS-Prop), and\n",
+ "[ADAM](https://arxiv.org/abs/1412.6980)."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "a2106298",
+ "metadata": {
+ "editable": true
+ },
+ "source": [
+ "## Challenge: Choosing a Fixed Learning Rate\n",
+ "A fixed $\eta$ is hard to get right:\n",
+ "1. If $\eta$ is too large, the updates can overshoot the minimum, causing oscillations or divergence\n",
+ "\n",
+ "2. If $\eta$ is too small, convergence is very slow (many iterations to make progress)\n",
+ "\n",
+ "In practice, one often uses trial-and-error or schedules (decaying $\eta$ over time) to find a workable balance; the short sketch at the end of this section illustrates both failure modes on a simple quadratic.\n",
+ "For a function with steep directions and flat directions, a single global $\eta$ may be inappropriate:\n",
+ "1. Steep coordinates require a smaller step size to avoid oscillation.\n",
+ "\n",
+ "2. Flat/shallow coordinates could use a larger step to speed up progress.\n",
+ "\n",
+ "3. This issue is pronounced in high-dimensional problems with **sparse or varying-scale features** – we need a method to adjust step sizes per feature.\n",
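+ "\n",
+ "As a minimal toy illustration of points 1 and 2 above, consider plain gradient descent on the one-dimensional quadratic $C(\theta)=\theta^2$, whose gradient is $2\theta$:\n",
+ "\n",
+ "```python\n",
+ "def gd(eta, n_iter=20, theta0=1.0):\n",
+ "    # plain gradient descent on C(theta) = theta**2, with gradient 2*theta\n",
+ "    theta = theta0\n",
+ "    for _ in range(n_iter):\n",
+ "        theta -= eta*2*theta\n",
+ "    return theta\n",
+ "\n",
+ "# eta = 1.1 overshoots and diverges (|theta| grows every step),\n",
+ "# eta = 0.01 converges only very slowly,\n",
+ "# eta = 0.5 happens to hit the minimum of this particular quadratic exactly\n",
+ "for eta in (1.1, 0.01, 0.5):\n",
+ "    print(f'eta = {eta}: theta after 20 steps = {gd(eta):.3e}')\n",
+ "```"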
+ ] + }, + { + "cell_type": "markdown", + "id": "477a053c", + "metadata": { + "editable": true + }, + "source": [ + "## Motivation for Adaptive Step Sizes\n", + "\n", + "1. Instead of a fixed global $\\eta$, use an **adaptive learning rate** for each parameter that depends on the history of gradients.\n", + "\n", + "2. Parameters that have large accumulated gradient magnitude should get smaller steps (they've been changing a lot), whereas parameters with small or infrequent gradients can have larger relative steps.\n", + "\n", + "3. This is especially useful for sparse features: Rarely active features accumulate little gradient, so their learning rate remains comparatively high, ensuring they are not neglected\n", + "\n", + "4. Conversely, frequently active features accumulate large gradient sums, and their learning rate automatically decreases, preventing too-large updates\n", + "\n", + "5. Several algorithms implement this idea (AdaGrad, RMSProp, AdaDelta, Adam, etc.). We will derive **AdaGrad**, one of the first adaptive methods." + ] + }, + { + "cell_type": "markdown", + "id": "f0924df8", + "metadata": { + "editable": true + }, + "source": [ + "## AdaGrad algorithm, taken from [Goodfellow et al](https://www.deeplearningbook.org/contents/optimization.html)\n", + "\n", + "\n", + "\n", + "\n", + "

Figure 1: The AdaGrad algorithm, from Goodfellow et al.
\n",
+ ""
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "7743f26d",
+ "metadata": {
+ "editable": true
+ },
+ "source": [
+ "## Derivation of the AdaGrad Algorithm\n",
+ "\n",
+ "**Accumulating Gradient History.**\n",
+ "\n",
+ "1. AdaGrad maintains a running sum of squared gradients for each parameter (coordinate)\n",
+ "\n",
+ "2. Let $g_t = \nabla_\theta C(\theta_t)$ be the (possibly mini-batch) gradient at step $t$ (or a subgradient for nondifferentiable cases).\n",
+ "\n",
+ "3. Initialize $r_0 = 0$ (an all-zero vector in $\mathbb{R}^d$).\n",
+ "\n",
+ "4. At each iteration $t$, update the accumulation:"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "ef4b5d6a",
+ "metadata": {
+ "editable": true
+ },
+ "source": [
+ "$$\n",
+ "r_t = r_{t-1} + g_t \circ g_t,\n",
+ "$$"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "927e2738",
+ "metadata": {
+ "editable": true
+ },
+ "source": [
+ "1. Here $g_t \circ g_t$ denotes the element-wise square of the gradient vector, i.e. $r_{t,j} = r_{t-1,j} + (g_{t,j})^2$ for each parameter $j$.\n",
+ "\n",
+ "2. We can view $H_t = \mathrm{diag}(r_t)$ as a diagonal matrix of past squared gradients. Initially $H_0 = 0$."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "1753de13",
+ "metadata": {
+ "editable": true
+ },
+ "source": [
+ "## AdaGrad Update Rule Derivation\n",
+ "\n",
+ "We scale the gradient by the inverse square root of the accumulated matrix $H_t$. The AdaGrad update at step $t$ is:"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "0db67ba3",
+ "metadata": {
+ "editable": true
+ },
+ "source": [
+ "$$\n",
+ "\theta_{t+1} =\theta_t - \eta H_t^{-1/2} g_t,\n",
+ "$$"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "7831e978",
+ "metadata": {
+ "editable": true
+ },
+ "source": [
+ "where $H_t^{-1/2}$ is the diagonal matrix with entries $(r_{t,1})^{-1/2}, \dots, (r_{t,d})^{-1/2}$.\n",
+ "In coordinates, this means each parameter $j$ has an individual step size:"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "92a7758a",
+ "metadata": {
+ "editable": true
+ },
+ "source": [
+ "$$\n",
+ "\theta_{t+1,j} =\theta_{t,j} -\frac{\eta}{\sqrt{r_{t,j}}}g_{t,j}.\n",
+ "$$"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "df62a4ff",
+ "metadata": {
+ "editable": true
+ },
+ "source": [
+ "In practice we add a small constant $\epsilon$ in the denominator for numerical stability to avoid division by zero:"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "c8a2b948",
+ "metadata": {
+ "editable": true
+ },
+ "source": [
+ "$$\n",
+ "\theta_{t+1,j}= \theta_{t,j}-\frac{\eta}{\sqrt{\epsilon + r_{t,j}}}g_{t,j}.\n",
+ "$$"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "3f269e80",
+ "metadata": {
+ "editable": true
+ },
+ "source": [
+ "Equivalently, the effective learning rate for parameter $j$ at time $t$ is $\displaystyle \alpha_{t,j} = \frac{\eta}{\sqrt{\epsilon + r_{t,j}}}$. This decreases over time as $r_{t,j}$ grows."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "f4ec584c",
+ "metadata": {
+ "editable": true
+ },
+ "source": [
+ "## AdaGrad Properties\n",
+ "\n",
+ "1. AdaGrad automatically tunes the step size for each parameter. Parameters with more *volatile or large gradients* get smaller steps, and those with *small or infrequent gradients* get relatively larger steps\n",
+ "\n",
+ "2. No manual schedule needed: The accumulation $r_t$ keeps increasing (or stays the same if gradient is zero), so step sizes $\eta/\sqrt{r_t}$ are non-increasing. 
This has a similar effect to a learning rate schedule, but individualized per coordinate.\n", + "\n", + "3. Sparse data benefit: For very sparse features, $r_{t,j}$ grows slowly, so that feature’s parameter retains a higher learning rate for longer, allowing it to make significant updates when it does get a gradient signal\n", + "\n", + "4. Convergence: In convex optimization, AdaGrad can be shown to achieve a sub-linear convergence rate comparable to the best fixed learning rate tuned for the problem\n", + "\n", + "It effectively reduces the need to tune $\\eta$ by hand.\n", + "1. Limitations: Because $r_t$ accumulates without bound, AdaGrad’s learning rates can become extremely small over long training, potentially slowing progress. (Later variants like RMSProp, AdaDelta, Adam address this by modifying the accumulation rule.)" + ] + }, + { + "cell_type": "markdown", + "id": "4b741016", + "metadata": { + "editable": true + }, + "source": [ + "## RMSProp: Adaptive Learning Rates\n", + "\n", + "Addresses AdaGrad’s diminishing learning rate issue.\n", + "Uses a decaying average of squared gradients (instead of a cumulative sum):" + ] + }, + { + "cell_type": "markdown", + "id": "76108e75", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "v_t = \\rho v_{t-1} + (1-\\rho)(\\nabla C(\\theta_t))^2,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "4c6a3353", + "metadata": { + "editable": true + }, + "source": [ + "with $\\rho$ typically $0.9$ (or $0.99$).\n", + "1. Update: $\\theta_{t+1} = \\theta_t - \\frac{\\eta}{\\sqrt{v_t + \\epsilon}} \\nabla C(\\theta_t)$.\n", + "\n", + "2. Recent gradients have more weight, so $v_t$ adapts to the current landscape.\n", + "\n", + "3. Avoids AdaGrad’s “infinite memory” problem – learning rate does not continuously decay to zero.\n", + "\n", + "RMSProp was first proposed in lecture notes by Geoff Hinton, 2012 - unpublished.)" + ] + }, + { + "cell_type": "markdown", + "id": "3e0a76ae", + "metadata": { + "editable": true + }, + "source": [ + "## RMSProp algorithm, taken from [Goodfellow et al](https://www.deeplearningbook.org/contents/optimization.html)\n", + "\n", + "\n", + "\n", + "\n", + "

Figure 1: The RMSProp algorithm, from Goodfellow et al.
\n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "fa5fd82e", + "metadata": { + "editable": true + }, + "source": [ + "## Adam Optimizer\n", + "\n", + "Why combine Momentum and RMSProp? Motivation for Adam: Adaptive Moment Estimation (Adam) was introduced by Kingma an Ba (2014) to combine the benefits of momentum and RMSProp.\n", + "\n", + "1. Fast convergence by smoothing gradients (accelerates in long-term gradient direction).\n", + "\n", + "2. Adaptive rates (RMSProp): Per-dimension learning rate scaling for stability (handles different feature scales, sparse gradients).\n", + "\n", + "3. Adam uses both: maintains moving averages of both first moment (gradients) and second moment (squared gradients)\n", + "\n", + "4. Additionally, includes a mechanism to correct the bias in these moving averages (crucial in early iterations)\n", + "\n", + "**Result**: Adam is robust, achieves faster convergence with less tuning, and often outperforms SGD (with momentum) in practice." + ] + }, + { + "cell_type": "markdown", + "id": "89cda2f6", + "metadata": { + "editable": true + }, + "source": [ + "## [ADAM optimizer](https://arxiv.org/abs/1412.6980)\n", + "\n", + "In [ADAM](https://arxiv.org/abs/1412.6980), we keep a running average of\n", + "both the first and second moment of the gradient and use this\n", + "information to adaptively change the learning rate for different\n", + "parameters. The method is efficient when working with large\n", + "problems involving lots data and/or parameters. It is a combination of the\n", + "gradient descent with momentum algorithm and the RMSprop algorithm\n", + "discussed above." + ] + }, + { + "cell_type": "markdown", + "id": "69310c2b", + "metadata": { + "editable": true + }, + "source": [ + "## Why Combine Momentum and RMSProp?\n", + "\n", + "1. Momentum: Fast convergence by smoothing gradients (accelerates in long-term gradient direction).\n", + "\n", + "2. Adaptive rates (RMSProp): Per-dimension learning rate scaling for stability (handles different feature scales, sparse gradients).\n", + "\n", + "3. Adam uses both: maintains moving averages of both first moment (gradients) and second moment (squared gradients)\n", + "\n", + "4. Additionally, includes a mechanism to correct the bias in these moving averages (crucial in early iterations)\n", + "\n", + "Result: Adam is robust, achieves faster convergence with less tuning, and often outperforms SGD (with momentum) in practice" + ] + }, + { + "cell_type": "markdown", + "id": "7d6b8734", + "metadata": { + "editable": true + }, + "source": [ + "## Adam: Exponential Moving Averages (Moments)\n", + "Adam maintains two moving averages at each time step $t$ for each parameter $w$:\n", + "**First moment (mean) $m_t$.**\n", + "\n", + "The Momentum term" + ] + }, + { + "cell_type": "markdown", + "id": "106ce6bf", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "m_t = \\beta_1m_{t-1} + (1-\\beta_1)\\, \\nabla C(\\theta_t),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "3ba64fd6", + "metadata": { + "editable": true + }, + "source": [ + "**Second moment (uncentered variance) $v_t$.**\n", + "\n", + "The RMS term" + ] + }, + { + "cell_type": "markdown", + "id": "d2e1a9ee", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "v_t = \\beta_2v_{t-1} + (1-\\beta_2)(\\nabla C(\\theta_t))^2,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "00aae51f", + "metadata": { + "editable": true + }, + "source": [ + "with typical $\\beta_1 = 0.9$, $\\beta_2 = 0.999$. 
Initialize $m_0 = 0$, $v_0 = 0$.\n", + "\n", + " These are **biased** estimators of the true first and second moment of the gradients, especially at the start (since $m_0,v_0$ are zero)" + ] + }, + { + "cell_type": "markdown", + "id": "38adfadd", + "metadata": { + "editable": true + }, + "source": [ + "## Adam: Bias Correction\n", + "To counteract initialization bias in $m_t, v_t$, Adam computes bias-corrected estimates" + ] + }, + { + "cell_type": "markdown", + "id": "484156fb", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\hat{m}_t = \\frac{m_t}{1 - \\beta_1^t}, \\qquad \\hat{v}_t = \\frac{v_t}{1 - \\beta_2^t}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "45d1d0c2", + "metadata": { + "editable": true + }, + "source": [ + "* When $t$ is small, $1-\\beta_i^t \\approx 0$, so $\\hat{m}_t, \\hat{v}_t$ significantly larger than raw $m_t, v_t$, compensating for the initial zero bias.\n", + "\n", + "* As $t$ increases, $1-\\beta_i^t \\to 1$, and $\\hat{m}_t, \\hat{v}_t$ converge to $m_t, v_t$.\n", + "\n", + "* Bias correction is important for Adam’s stability in early iterations" + ] + }, + { + "cell_type": "markdown", + "id": "e62d5568", + "metadata": { + "editable": true + }, + "source": [ + "## Adam: Update Rule Derivation\n", + "Finally, Adam updates parameters using the bias-corrected moments:" + ] + }, + { + "cell_type": "markdown", + "id": "3eb873c1", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\theta_{t+1} =\\theta_t -\\frac{\\alpha}{\\sqrt{\\hat{v}_t} + \\epsilon}\\hat{m}_t,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "fc1129f6", + "metadata": { + "editable": true + }, + "source": [ + "where $\\epsilon$ is a small constant (e.g. $10^{-8}$) to prevent division by zero.\n", + "Breaking it down:\n", + "1. Compute gradient $\\nabla C(\\theta_t)$.\n", + "\n", + "2. Update first moment $m_t$ and second moment $v_t$ (exponential moving averages).\n", + "\n", + "3. Bias-correct: $\\hat{m}_t = m_t/(1-\\beta_1^t)$, $\\; \\hat{v}_t = v_t/(1-\\beta_2^t)$.\n", + "\n", + "4. Compute step: $\\Delta \\theta_t = \\frac{\\hat{m}_t}{\\sqrt{\\hat{v}_t} + \\epsilon}$.\n", + "\n", + "5. Update parameters: $\\theta_{t+1} = \\theta_t - \\alpha\\, \\Delta \\theta_t$.\n", + "\n", + "This is the Adam update rule as given in the original paper." + ] + }, + { + "cell_type": "markdown", + "id": "6f15ce48", + "metadata": { + "editable": true + }, + "source": [ + "## Adam vs. AdaGrad and RMSProp\n", + "\n", + "1. AdaGrad: Uses per-coordinate scaling like Adam, but no momentum. Tends to slow down too much due to cumulative history (no forgetting)\n", + "\n", + "2. RMSProp: Uses moving average of squared gradients (like Adam’s $v_t$) to maintain adaptive learning rates, but does not include momentum or bias-correction.\n", + "\n", + "3. Adam: Effectively RMSProp + Momentum + Bias-correction\n", + "\n", + " * Momentum ($m_t$) provides acceleration and smoother convergence.\n", + "\n", + " * Adaptive $v_t$ scaling moderates the step size per dimension.\n", + "\n", + " * Bias correction (absent in AdaGrad/RMSProp) ensures robust estimates early on.\n", + "\n", + "In practice, Adam often yields faster convergence and better tuning stability than RMSProp or AdaGrad alone" + ] + }, + { + "cell_type": "markdown", + "id": "44cb65e2", + "metadata": { + "editable": true + }, + "source": [ + "## Adaptivity Across Dimensions\n", + "\n", + "1. 
Adam adapts the step size \\emph{per coordinate}: parameters with larger gradient variance get smaller effective steps, those with smaller or sparse gradients get larger steps.\n", + "\n", + "2. This per-dimension adaptivity is inherited from AdaGrad/RMSProp and helps handle ill-conditioned or sparse problems.\n", + "\n", + "3. Meanwhile, momentum (first moment) allows Adam to continue making progress even if gradients become small or noisy, by leveraging accumulated direction." + ] + }, + { + "cell_type": "markdown", + "id": "e3862c40", + "metadata": { + "editable": true + }, + "source": [ + "## ADAM algorithm, taken from [Goodfellow et al](https://www.deeplearningbook.org/contents/optimization.html)\n", + "\n", + "\n", + "\n", + "\n", + "

Figure 1: The ADAM algorithm, from Goodfellow et al.
\n",
+ ""
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "c4aa2b35",
+ "metadata": {
+ "editable": true
+ },
+ "source": [
+ "## Algorithms and codes for Adagrad, RMSprop and Adam\n",
+ "\n",
+ "The algorithms we have implemented are well described in the text by [Goodfellow, Bengio and Courville, chapter 8](https://www.deeplearningbook.org/contents/optimization.html).\n",
+ "\n",
+ "The codes which implement these algorithms are discussed below."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "01de27d3",
+ "metadata": {
+ "editable": true
+ },
+ "source": [
+ "## Practical tips\n",
+ "\n",
+ "* **Randomize the data when making mini-batches**. It is always important to randomly shuffle the data when forming mini-batches. Otherwise, the gradient descent method can fit spurious correlations resulting from the order in which data is presented.\n",
+ "\n",
+ "* **Transform your inputs**. Learning becomes difficult when our landscape has a mixture of steep and flat directions. One simple trick for minimizing these situations is to standardize the data by subtracting the mean and normalizing the variance of input variables. Whenever possible, also decorrelate the inputs. To understand why this is helpful, consider the case of linear regression. It is easy to show that for the squared error cost function, the Hessian of the cost function is just the correlation matrix between the inputs. Thus, by standardizing the inputs, we are ensuring that the landscape looks homogeneous in all directions in parameter space. Since most deep networks can be viewed as linear transformations followed by a non-linearity at each layer, we expect this intuition to hold beyond the linear case.\n",
+ "\n",
+ "* **Monitor the out-of-sample performance.** Always monitor the performance of your model on a validation set (a small portion of the training data that is held out of the training process to serve as a proxy for the test set). If the validation error starts increasing, then the model is beginning to overfit. Terminate the learning process. This *early stopping* significantly improves performance in many settings.\n",
+ "\n",
+ "* **Adaptive optimization methods don't always have good generalization.** Recent studies have shown that adaptive methods such as ADAM, RMSProp, and AdaGrad tend to have poor generalization compared to SGD or SGD with momentum, particularly in the high-dimensional limit (i.e. the number of parameters exceeds the number of data points). Although it is not clear at this stage why these methods perform so well in training deep neural networks, simpler procedures like properly-tuned SGD may work as well or better in these applications."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "78a1a601",
+ "metadata": {
+ "editable": true
+ },
+ "source": [
+ "## Sneaking in automatic differentiation using Autograd\n",
+ "\n",
+ "In the examples here we take the liberty of sneaking in automatic\n",
+ "differentiation (without having discussed the mathematics). In\n",
+ "project 1 you will write the gradients as discussed above, that is,\n",
+ "hard-coding the gradients. By introducing automatic differentiation\n",
+ "via the library **autograd**, which has largely been superseded by **JAX**, we have\n",
+ "more flexibility in setting up alternative cost functions.\n",
+ "\n",
+ "The\n",
+ "first example shows results with ordinary least squares.\n",
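+ "\n",
+ "As a quick aside before the OLS example, here is a minimal, self-contained sketch (assuming the **autograd** package is installed) of what `grad` returns: a new Python function that evaluates the derivative of the function it is given.\n",
+ "\n",
+ "```python\n",
+ "from autograd import grad\n",
+ "\n",
+ "def f(x):\n",
+ "    # f(x) = x^3 + 2x, with derivative 3x^2 + 2\n",
+ "    return x**3 + 2*x\n",
+ "\n",
+ "df = grad(f)      # df is a function computing f'(x)\n",
+ "print(df(2.0))    # 14.0, in agreement with 3*2**2 + 2\n",
+ "```\n",
+ "\n",
+ "In the cells below the same mechanism is applied to the cost function, so that `grad(CostOLS)` returns a function evaluating $\nabla_\theta C(\theta)$."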
+ ] + }, + { + "cell_type": "code", + "execution_count": 9, + "id": "c721352d", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# Using Autograd to calculate gradients for OLS\n", + "from random import random, seed\n", + "import numpy as np\n", + "import autograd.numpy as np\n", + "import matplotlib.pyplot as plt\n", + "from autograd import grad\n", + "\n", + "def CostOLS(theta):\n", + " return (1.0/n)*np.sum((y-X @ theta)**2)\n", + "\n", + "n = 100\n", + "x = 2*np.random.rand(n,1)\n", + "y = 4+3*x+np.random.randn(n,1)\n", + "\n", + "X = np.c_[np.ones((n,1)), x]\n", + "XT_X = X.T @ X\n", + "theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)\n", + "print(\"Own inversion\")\n", + "print(theta_linreg)\n", + "# Hessian matrix\n", + "H = (2.0/n)* XT_X\n", + "EigValues, EigVectors = np.linalg.eig(H)\n", + "print(f\"Eigenvalues of Hessian Matrix:{EigValues}\")\n", + "\n", + "theta = np.random.randn(2,1)\n", + "eta = 1.0/np.max(EigValues)\n", + "Niterations = 1000\n", + "# define the gradient\n", + "training_gradient = grad(CostOLS)\n", + "\n", + "for iter in range(Niterations):\n", + " gradients = training_gradient(theta)\n", + " theta -= eta*gradients\n", + "print(\"theta from own gd\")\n", + "print(theta)\n", + "\n", + "xnew = np.array([[0],[2]])\n", + "Xnew = np.c_[np.ones((2,1)), xnew]\n", + "ypredict = Xnew.dot(theta)\n", + "ypredict2 = Xnew.dot(theta_linreg)\n", + "\n", + "plt.plot(xnew, ypredict, \"r-\")\n", + "plt.plot(xnew, ypredict2, \"b-\")\n", + "plt.plot(x, y ,'ro')\n", + "plt.axis([0,2.0,0, 15.0])\n", + "plt.xlabel(r'$x$')\n", + "plt.ylabel(r'$y$')\n", + "plt.title(r'Random numbers ')\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "e36cec47", + "metadata": { + "editable": true + }, + "source": [ + "## Same code but now with momentum gradient descent" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "id": "fc5df7eb", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# Using Autograd to calculate gradients for OLS\n", + "from random import random, seed\n", + "import numpy as np\n", + "import autograd.numpy as np\n", + "import matplotlib.pyplot as plt\n", + "from autograd import grad\n", + "\n", + "def CostOLS(theta):\n", + " return (1.0/n)*np.sum((y-X @ theta)**2)\n", + "\n", + "n = 100\n", + "x = 2*np.random.rand(n,1)\n", + "y = 4+3*x#+np.random.randn(n,1)\n", + "\n", + "X = np.c_[np.ones((n,1)), x]\n", + "XT_X = X.T @ X\n", + "theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)\n", + "print(\"Own inversion\")\n", + "print(theta_linreg)\n", + "# Hessian matrix\n", + "H = (2.0/n)* XT_X\n", + "EigValues, EigVectors = np.linalg.eig(H)\n", + "print(f\"Eigenvalues of Hessian Matrix:{EigValues}\")\n", + "\n", + "theta = np.random.randn(2,1)\n", + "eta = 1.0/np.max(EigValues)\n", + "Niterations = 30\n", + "\n", + "# define the gradient\n", + "training_gradient = grad(CostOLS)\n", + "\n", + "for iter in range(Niterations):\n", + " gradients = training_gradient(theta)\n", + " theta -= eta*gradients\n", + " print(iter,gradients[0],gradients[1])\n", + "print(\"theta from own gd\")\n", + "print(theta)\n", + "\n", + "# Now improve with momentum gradient descent\n", + "change = 0.0\n", + "delta_momentum = 0.3\n", + "for iter in range(Niterations):\n", + " # calculate gradient\n", + " gradients = training_gradient(theta)\n", + " # calculate update\n", + " new_change = eta*gradients+delta_momentum*change\n", + " # take a step\n", + " theta -= new_change\n", + " # save the 
change\n", + " change = new_change\n", + " print(iter,gradients[0],gradients[1])\n", + "print(\"theta from own gd wth momentum\")\n", + "print(theta)" + ] + }, + { + "cell_type": "markdown", + "id": "0b27af70", + "metadata": { + "editable": true + }, + "source": [ + "## Including Stochastic Gradient Descent with Autograd\n", + "\n", + "In this code we include the stochastic gradient descent approach\n", + "discussed above. Note here that we specify which argument we are\n", + "taking the derivative with respect to when using **autograd**." + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "id": "adef9763", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# Using Autograd to calculate gradients using SGD\n", + "# OLS example\n", + "from random import random, seed\n", + "import numpy as np\n", + "import autograd.numpy as np\n", + "import matplotlib.pyplot as plt\n", + "from autograd import grad\n", + "\n", + "# Note change from previous example\n", + "def CostOLS(y,X,theta):\n", + " return np.sum((y-X @ theta)**2)\n", + "\n", + "n = 100\n", + "x = 2*np.random.rand(n,1)\n", + "y = 4+3*x+np.random.randn(n,1)\n", + "\n", + "X = np.c_[np.ones((n,1)), x]\n", + "XT_X = X.T @ X\n", + "theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)\n", + "print(\"Own inversion\")\n", + "print(theta_linreg)\n", + "# Hessian matrix\n", + "H = (2.0/n)* XT_X\n", + "EigValues, EigVectors = np.linalg.eig(H)\n", + "print(f\"Eigenvalues of Hessian Matrix:{EigValues}\")\n", + "\n", + "theta = np.random.randn(2,1)\n", + "eta = 1.0/np.max(EigValues)\n", + "Niterations = 1000\n", + "\n", + "# Note that we request the derivative wrt third argument (theta, 2 here)\n", + "training_gradient = grad(CostOLS,2)\n", + "\n", + "for iter in range(Niterations):\n", + " gradients = (1.0/n)*training_gradient(y, X, theta)\n", + " theta -= eta*gradients\n", + "print(\"theta from own gd\")\n", + "print(theta)\n", + "\n", + "xnew = np.array([[0],[2]])\n", + "Xnew = np.c_[np.ones((2,1)), xnew]\n", + "ypredict = Xnew.dot(theta)\n", + "ypredict2 = Xnew.dot(theta_linreg)\n", + "\n", + "plt.plot(xnew, ypredict, \"r-\")\n", + "plt.plot(xnew, ypredict2, \"b-\")\n", + "plt.plot(x, y ,'ro')\n", + "plt.axis([0,2.0,0, 15.0])\n", + "plt.xlabel(r'$x$')\n", + "plt.ylabel(r'$y$')\n", + "plt.title(r'Random numbers ')\n", + "plt.show()\n", + "\n", + "n_epochs = 50\n", + "M = 5 #size of each minibatch\n", + "m = int(n/M) #number of minibatches\n", + "t0, t1 = 5, 50\n", + "def learning_schedule(t):\n", + " return t0/(t+t1)\n", + "\n", + "theta = np.random.randn(2,1)\n", + "\n", + "for epoch in range(n_epochs):\n", + "# Can you figure out a better way of setting up the contributions to each batch?\n", + " for i in range(m):\n", + " random_index = M*np.random.randint(m)\n", + " xi = X[random_index:random_index+M]\n", + " yi = y[random_index:random_index+M]\n", + " gradients = (1.0/M)*training_gradient(yi, xi, theta)\n", + " eta = learning_schedule(epoch*m+i)\n", + " theta = theta - eta*gradients\n", + "print(\"theta from own sdg\")\n", + "print(theta)" + ] + }, + { + "cell_type": "markdown", + "id": "310fe5b2", + "metadata": { + "editable": true + }, + "source": [ + "## Same code but now with momentum gradient descent" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "id": "bcf65acf", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# Using Autograd to calculate gradients using SGD\n", + "# OLS example\n", + "from random import random, seed\n", 
+ "import numpy as np\n", + "import autograd.numpy as np\n", + "import matplotlib.pyplot as plt\n", + "from autograd import grad\n", + "\n", + "# Note change from previous example\n", + "def CostOLS(y,X,theta):\n", + " return np.sum((y-X @ theta)**2)\n", + "\n", + "n = 100\n", + "x = 2*np.random.rand(n,1)\n", + "y = 4+3*x+np.random.randn(n,1)\n", + "\n", + "X = np.c_[np.ones((n,1)), x]\n", + "XT_X = X.T @ X\n", + "theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)\n", + "print(\"Own inversion\")\n", + "print(theta_linreg)\n", + "# Hessian matrix\n", + "H = (2.0/n)* XT_X\n", + "EigValues, EigVectors = np.linalg.eig(H)\n", + "print(f\"Eigenvalues of Hessian Matrix:{EigValues}\")\n", + "\n", + "theta = np.random.randn(2,1)\n", + "eta = 1.0/np.max(EigValues)\n", + "Niterations = 100\n", + "\n", + "# Note that we request the derivative wrt third argument (theta, 2 here)\n", + "training_gradient = grad(CostOLS,2)\n", + "\n", + "for iter in range(Niterations):\n", + " gradients = (1.0/n)*training_gradient(y, X, theta)\n", + " theta -= eta*gradients\n", + "print(\"theta from own gd\")\n", + "print(theta)\n", + "\n", + "\n", + "n_epochs = 50\n", + "M = 5 #size of each minibatch\n", + "m = int(n/M) #number of minibatches\n", + "t0, t1 = 5, 50\n", + "def learning_schedule(t):\n", + " return t0/(t+t1)\n", + "\n", + "theta = np.random.randn(2,1)\n", + "\n", + "change = 0.0\n", + "delta_momentum = 0.3\n", + "\n", + "for epoch in range(n_epochs):\n", + " for i in range(m):\n", + " random_index = M*np.random.randint(m)\n", + " xi = X[random_index:random_index+M]\n", + " yi = y[random_index:random_index+M]\n", + " gradients = (1.0/M)*training_gradient(yi, xi, theta)\n", + " eta = learning_schedule(epoch*m+i)\n", + " # calculate update\n", + " new_change = eta*gradients+delta_momentum*change\n", + " # take a step\n", + " theta -= new_change\n", + " # save the change\n", + " change = new_change\n", + "print(\"theta from own sdg with momentum\")\n", + "print(theta)" + ] + }, + { + "cell_type": "markdown", + "id": "f5e2c550", + "metadata": { + "editable": true + }, + "source": [ + "## But none of these can compete with Newton's method\n", + "\n", + "Note that we here have introduced automatic differentiation" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "id": "300a02a4", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# Using Newton's method\n", + "from random import random, seed\n", + "import numpy as np\n", + "import autograd.numpy as np\n", + "from autograd import grad\n", + "\n", + "def CostOLS(theta):\n", + " return (1.0/n)*np.sum((y-X @ theta)**2)\n", + "\n", + "n = 100\n", + "x = 2*np.random.rand(n,1)\n", + "y = 4+3*x+5*x*x\n", + "\n", + "X = np.c_[np.ones((n,1)), x, x*x]\n", + "XT_X = X.T @ X\n", + "theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)\n", + "print(\"Own inversion\")\n", + "print(theta_linreg)\n", + "# Hessian matrix\n", + "H = (2.0/n)* XT_X\n", + "# Note that here the Hessian does not depend on the parameters theta\n", + "invH = np.linalg.pinv(H)\n", + "theta = np.random.randn(3,1)\n", + "Niterations = 5\n", + "# define the gradient\n", + "training_gradient = grad(CostOLS)\n", + "\n", + "for iter in range(Niterations):\n", + " gradients = training_gradient(theta)\n", + " theta -= invH @ gradients\n", + " print(iter,gradients[0],gradients[1])\n", + "print(\"theta from own Newton code\")\n", + "print(theta)" + ] + }, + { + "cell_type": "markdown", + "id": "5cb5fd26", + "metadata": { + "editable": true + }, + "source": [ + "## 
Similar (second order function now) problem but now with AdaGrad" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "id": "030efc5d", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# Using Autograd to calculate gradients using AdaGrad and Stochastic Gradient descent\n", + "# OLS example\n", + "from random import random, seed\n", + "import numpy as np\n", + "import autograd.numpy as np\n", + "import matplotlib.pyplot as plt\n", + "from autograd import grad\n", + "\n", + "# Note change from previous example\n", + "def CostOLS(y,X,theta):\n", + " return np.sum((y-X @ theta)**2)\n", + "\n", + "n = 1000\n", + "x = np.random.rand(n,1)\n", + "y = 2.0+3*x +4*x*x\n", + "\n", + "X = np.c_[np.ones((n,1)), x, x*x]\n", + "XT_X = X.T @ X\n", + "theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)\n", + "print(\"Own inversion\")\n", + "print(theta_linreg)\n", + "\n", + "\n", + "# Note that we request the derivative wrt third argument (theta, 2 here)\n", + "training_gradient = grad(CostOLS,2)\n", + "# Define parameters for Stochastic Gradient Descent\n", + "n_epochs = 50\n", + "M = 5 #size of each minibatch\n", + "m = int(n/M) #number of minibatches\n", + "# Guess for unknown parameters theta\n", + "theta = np.random.randn(3,1)\n", + "\n", + "# Value for learning rate\n", + "eta = 0.01\n", + "# Including AdaGrad parameter to avoid possible division by zero\n", + "delta = 1e-8\n", + "for epoch in range(n_epochs):\n", + " Giter = 0.0\n", + " for i in range(m):\n", + " random_index = M*np.random.randint(m)\n", + " xi = X[random_index:random_index+M]\n", + " yi = y[random_index:random_index+M]\n", + " gradients = (1.0/M)*training_gradient(yi, xi, theta)\n", + " Giter += gradients*gradients\n", + " update = gradients*eta/(delta+np.sqrt(Giter))\n", + " theta -= update\n", + "print(\"theta from own AdaGrad\")\n", + "print(theta)" + ] + }, + { + "cell_type": "markdown", + "id": "66850bb7", + "metadata": { + "editable": true + }, + "source": [ + "Running this code we note an almost perfect agreement with the results from matrix inversion." 
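+ ,
+ "\n",
+ "Note, however, that in the code above the accumulated squared gradient `Giter` is reset to zero at the start of every epoch, whereas in the formulation of AdaGrad in Goodfellow et al. the accumulation runs over all updates. A minimal sketch of that variant, reusing `X`, `y`, `M`, `m`, `n_epochs` and `training_gradient` exactly as defined in the cell above, reads:\n",
+ "\n",
+ "```python\n",
+ "Giter = np.zeros((3,1))   # accumulated squared gradients, kept across epochs\n",
+ "theta = np.random.randn(3,1)\n",
+ "eta, delta = 0.01, 1e-8\n",
+ "\n",
+ "for epoch in range(n_epochs):\n",
+ "    for i in range(m):\n",
+ "        random_index = M*np.random.randint(m)\n",
+ "        xi = X[random_index:random_index+M]\n",
+ "        yi = y[random_index:random_index+M]\n",
+ "        gradients = (1.0/M)*training_gradient(yi, xi, theta)\n",
+ "        Giter += gradients*gradients              # element-wise accumulation\n",
+ "        theta -= eta*gradients/(delta + np.sqrt(Giter))\n",
+ "print('theta from AdaGrad with persistent accumulation')\n",
+ "print(theta)\n",
+ "```\n",
+ "\n",
+ "The difference in bookkeeping matters most for long training runs, since the persistent sum makes the effective learning rate decay monotonically (the limitation that RMSProp and Adam address)."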
+ ] + }, + { + "cell_type": "markdown", + "id": "e1608bcf", + "metadata": { + "editable": true + }, + "source": [ + "## RMSprop for adaptive learning rate with Stochastic Gradient Descent" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "id": "0ba7d8f7", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# Using Autograd to calculate gradients using RMSprop and Stochastic Gradient descent\n", + "# OLS example\n", + "from random import random, seed\n", + "import numpy as np\n", + "import autograd.numpy as np\n", + "import matplotlib.pyplot as plt\n", + "from autograd import grad\n", + "\n", + "# Note change from previous example\n", + "def CostOLS(y,X,theta):\n", + " return np.sum((y-X @ theta)**2)\n", + "\n", + "n = 1000\n", + "x = np.random.rand(n,1)\n", + "y = 2.0+3*x +4*x*x# +np.random.randn(n,1)\n", + "\n", + "X = np.c_[np.ones((n,1)), x, x*x]\n", + "XT_X = X.T @ X\n", + "theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)\n", + "print(\"Own inversion\")\n", + "print(theta_linreg)\n", + "\n", + "\n", + "# Note that we request the derivative wrt third argument (theta, 2 here)\n", + "training_gradient = grad(CostOLS,2)\n", + "# Define parameters for Stochastic Gradient Descent\n", + "n_epochs = 50\n", + "M = 5 #size of each minibatch\n", + "m = int(n/M) #number of minibatches\n", + "# Guess for unknown parameters theta\n", + "theta = np.random.randn(3,1)\n", + "\n", + "# Value for learning rate\n", + "eta = 0.01\n", + "# Value for parameter rho\n", + "rho = 0.99\n", + "# Including AdaGrad parameter to avoid possible division by zero\n", + "delta = 1e-8\n", + "for epoch in range(n_epochs):\n", + " Giter = 0.0\n", + " for i in range(m):\n", + " random_index = M*np.random.randint(m)\n", + " xi = X[random_index:random_index+M]\n", + " yi = y[random_index:random_index+M]\n", + " gradients = (1.0/M)*training_gradient(yi, xi, theta)\n", + "\t# Accumulated gradient\n", + "\t# Scaling with rho the new and the previous results\n", + " Giter = (rho*Giter+(1-rho)*gradients*gradients)\n", + "\t# Taking the diagonal only and inverting\n", + " update = gradients*eta/(delta+np.sqrt(Giter))\n", + "\t# Hadamard product\n", + " theta -= update\n", + "print(\"theta from own RMSprop\")\n", + "print(theta)" + ] + }, + { + "cell_type": "markdown", + "id": "0503f74b", + "metadata": { + "editable": true + }, + "source": [ + "## And finally [ADAM](https://arxiv.org/pdf/1412.6980.pdf)" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "id": "c2a2732a", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# Using Autograd to calculate gradients using RMSprop and Stochastic Gradient descent\n", + "# OLS example\n", + "from random import random, seed\n", + "import numpy as np\n", + "import autograd.numpy as np\n", + "import matplotlib.pyplot as plt\n", + "from autograd import grad\n", + "\n", + "# Note change from previous example\n", + "def CostOLS(y,X,theta):\n", + " return np.sum((y-X @ theta)**2)\n", + "\n", + "n = 1000\n", + "x = np.random.rand(n,1)\n", + "y = 2.0+3*x +4*x*x# +np.random.randn(n,1)\n", + "\n", + "X = np.c_[np.ones((n,1)), x, x*x]\n", + "XT_X = X.T @ X\n", + "theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)\n", + "print(\"Own inversion\")\n", + "print(theta_linreg)\n", + "\n", + "\n", + "# Note that we request the derivative wrt third argument (theta, 2 here)\n", + "training_gradient = grad(CostOLS,2)\n", + "# Define parameters for Stochastic Gradient Descent\n", + "n_epochs = 50\n", + "M 
= 5 #size of each minibatch\n",
+    "m = int(n/M) #number of minibatches\n",
+    "# Guess for unknown parameters theta\n",
+    "theta = np.random.randn(3,1)\n",
+    "\n",
+    "# Value for learning rate\n",
+    "eta = 0.01\n",
+    "# Values for the ADAM decay parameters theta1 and theta2 (beta1 and beta2 in https://arxiv.org/abs/1412.6980)\n",
+    "theta1 = 0.9\n",
+    "theta2 = 0.999\n",
+    "# Small parameter delta to avoid possible division by zero\n",
+    "delta = 1e-7\n",
+    "iter = 0\n",
+    "for epoch in range(n_epochs):\n",
+    "    first_moment = 0.0\n",
+    "    second_moment = 0.0\n",
+    "    iter += 1\n",
+    "    for i in range(m):\n",
+    "        random_index = M*np.random.randint(m)\n",
+    "        xi = X[random_index:random_index+M]\n",
+    "        yi = y[random_index:random_index+M]\n",
+    "        gradients = (1.0/M)*training_gradient(yi, xi, theta)\n",
+    "        # Computing the first and second moments\n",
+    "        first_moment = theta1*first_moment + (1-theta1)*gradients\n",
+    "        second_moment = theta2*second_moment+(1-theta2)*gradients*gradients\n",
+    "        # Bias corrections of the moments\n",
+    "        first_term = first_moment/(1.0-theta1**iter)\n",
+    "        second_term = second_moment/(1.0-theta2**iter)\n",
+    "        # Update of the parameters using the bias-corrected moments\n",
+    "        update = eta*first_term/(np.sqrt(second_term)+delta)\n",
+    "        theta -= update\n",
+    "print(\"theta from own ADAM\")\n",
+    "print(theta)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "b8475863",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "## Material for the lab sessions\n",
+    "\n",
+    "1. Exercise set for week 37 and reminder on scaling (from lab sessions of week 35)\n",
+    "\n",
+    "2. Work on project 1\n",
+    "\n",
+    "\n",
+    "For more discussions of Ridge regression and calculation of averages, [Wessel van Wieringen's](https://arxiv.org/abs/1509.09169) article is highly recommended."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "4d4d0717",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "## Reminder on different scaling methods\n",
+    "\n",
+    "Before fitting a regression model, it is good practice to normalize or\n",
+    "standardize the features. This ensures all features are on a\n",
+    "comparable scale, which is especially important when using\n",
+    "regularization. In the exercises this week we will perform standardization, scaling each\n",
+    "feature to have mean 0 and standard deviation 1.\n",
+    "\n",
+    "Here we compute the mean and standard deviation of each column (feature) in our design/feature matrix $\boldsymbol{X}$.\n",
+    "Then we subtract the mean and divide by the standard deviation for each feature.\n",
+    "\n",
+    "In the example here\n",
+    "we will also center the target $\boldsymbol{y}$ to mean $0$. Centering $\boldsymbol{y}$\n",
+    "(and each feature) means the model does not require a separate intercept\n",
+    "term; the data is shifted such that the intercept is effectively 0.\n",
+    "(In practice, one could include an intercept in the model and not\n",
+    "penalize it, but here we simplify by centering.)\n",
+    "Choose $n=100$ data points and set up $\boldsymbol{x}$, $\boldsymbol{y}$ and the design matrix $\boldsymbol{X}$."
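,
+    "\n",
+    "As a minimal sketch (the second-order polynomial form and the noise level are assumptions made here for illustration, not part of the exercise), the data and design matrix could be set up along the following lines before the standardization step in the next cell:\n",
+    "\n",
+    "```python\n",
+    "import numpy as np\n",
+    "\n",
+    "n = 100\n",
+    "x = np.random.rand(n)                              # n data points in [0, 1]\n",
+    "y = 2.0 + 3*x + 4*x**2 + 0.1*np.random.randn(n)    # assumed test function plus noise\n",
+    "# polynomial features without an explicit intercept column, since we center the data\n",
+    "X = np.c_[x, x**2]\n",
+    "```"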
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 17,
+   "id": "46375144",
+   "metadata": {
+    "collapsed": false,
+    "editable": true
+   },
+   "outputs": [],
+   "source": [
+    "# Standardize features (zero mean, unit variance for each feature)\n",
+    "X_mean = X.mean(axis=0)\n",
+    "X_std = X.std(axis=0)\n",
+    "X_std[X_std == 0] = 1 # safeguard to avoid division by zero for constant features\n",
+    "X_norm = (X - X_mean) / X_std\n",
+    "\n",
+    "# Center the target to zero mean (optional, to simplify intercept handling)\n",
+    "y_mean = ?\n",
+    "y_centered = ?"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "39426ccf",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "Do we need to center the values of $y$?\n",
+    "\n",
+    "After this preprocessing, each column of $\boldsymbol{X}_{\mathrm{norm}}$ has mean zero and standard deviation $1$\n",
+    "and $\boldsymbol{y}_{\mathrm{centered}}$ has mean 0. This can make the optimization landscape\n",
+    "nicer and ensures the regularization penalty $\lambda \sum_j\n",
+    "\theta_j^2$ in Ridge regression treats each coefficient fairly (since features are on the\n",
+    "same scale)."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "df7fe27f",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "## Functionality in Scikit-Learn\n",
+    "\n",
+    "**Scikit-Learn** has several functions which allow us to rescale the\n",
+    "data, normally leading to much better results in terms of various\n",
+    "accuracy scores. The **StandardScaler** function in **Scikit-Learn**\n",
+    "ensures that for each feature/predictor we study the mean value is\n",
+    "zero and the variance is one (every column in the design/feature\n",
+    "matrix). This scaling has the drawback that it does not ensure that\n",
+    "we have a particular maximum or minimum in our data set. Another\n",
+    "function included in **Scikit-Learn** is the **MinMaxScaler**, which\n",
+    "ensures that all features are exactly between $0$ and $1$."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "8fd48e39",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "## More preprocessing\n",
+    "\n",
+    "The **Normalizer** scales each data\n",
+    "point such that the feature vector has a Euclidean length of one. In other words, it\n",
+    "projects a data point on the circle (or sphere in the case of higher dimensions) with a\n",
+    "radius of 1. This means every data point is scaled by a different number (by the\n",
+    "inverse of its length).\n",
+    "This normalization is often used when only the direction (or angle) of the data matters,\n",
+    "not the length of the feature vector.\n",
+    "\n",
+    "The **RobustScaler** works similarly to the StandardScaler in that it\n",
+    "ensures statistical properties for each feature that guarantee that\n",
+    "they are on the same scale. However, the RobustScaler uses the median\n",
+    "and quartiles, instead of the mean and variance. This makes the\n",
+    "RobustScaler ignore data points that are very different from the rest\n",
+    "(like measurement errors). These odd data points are also called\n",
+    "outliers, and they often lead to trouble for other scaling\n",
+    "techniques."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "d6c60a0a",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "## Frequently used scaling functions\n",
+    "\n",
+    "Many features are often scaled using standardization to improve performance. In **Scikit-Learn** this is given by the **StandardScaler** function as discussed above. It is easy, however, to write your own. 
\n", + "Mathematically, this involves subtracting the mean and divide by the standard deviation over the data set, for each feature:" + ] + }, + { + "cell_type": "markdown", + "id": "1bb6eaa0", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "x_j^{(i)} \\rightarrow \\frac{x_j^{(i)} - \\overline{x}_j}{\\sigma(x_j)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "25135896", + "metadata": { + "editable": true + }, + "source": [ + "where $\\overline{x}_j$ and $\\sigma(x_j)$ are the mean and standard deviation, respectively, of the feature $x_j$.\n", + "This ensures that each feature has zero mean and unit standard deviation. For data sets where we do not have the standard deviation or don't wish to calculate it, it is then common to simply set it to one.\n", + "\n", + "Keep in mind that when you transform your data set before training a model, the same transformation needs to be done\n", + "on your eventual new data set before making a prediction. If we translate this into a Python code, it would could be implemented as" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "id": "469ca11e", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "\"\"\"\n", + "#Model training, we compute the mean value of y and X\n", + "y_train_mean = np.mean(y_train)\n", + "X_train_mean = np.mean(X_train,axis=0)\n", + "X_train = X_train - X_train_mean\n", + "y_train = y_train - y_train_mean\n", + "\n", + "# The we fit our model with the training data\n", + "trained_model = some_model.fit(X_train,y_train)\n", + "\n", + "\n", + "#Model prediction, we need also to transform our data set used for the prediction.\n", + "X_test = X_test - X_train_mean #Use mean from training data\n", + "y_pred = trained_model(X_test)\n", + "y_pred = y_pred + y_train_mean\n", + "\"\"\"" + ] + }, + { + "cell_type": "markdown", + "id": "33722029", + "metadata": { + "editable": true + }, + "source": [ + "Let us try to understand what this may imply mathematically when we\n", + "subtract the mean values, also known as *zero centering*. For\n", + "simplicity, we will focus on ordinary regression, as done in the above example.\n", + "\n", + "The cost/loss function for regression is" + ] + }, + { + "cell_type": "markdown", + "id": "fe27291e", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C(\\theta_0, \\theta_1, ... , \\theta_{p-1}) = \\frac{1}{n}\\sum_{i=0}^{n} \\left(y_i - \\theta_0 - \\sum_{j=1}^{p-1} X_{ij}\\theta_j\\right)^2,.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "ead1167d", + "metadata": { + "editable": true + }, + "source": [ + "Recall also that we use the squared value. This expression can lead to an\n", + "increased penalty for higher differences between predicted and\n", + "output/target values.\n", + "\n", + "What we have done is to single out the $\\theta_0$ term in the\n", + "definition of the mean squared error (MSE). The design matrix $X$\n", + "does in this case not contain any intercept column. When we take the\n", + "derivative with respect to $\\theta_0$, we want the derivative to obey" + ] + }, + { + "cell_type": "markdown", + "id": "b2efb706", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial \\theta_j} = 0,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "65333100", + "metadata": { + "editable": true + }, + "source": [ + "for all $j$. 
For $\\theta_0$ we have" + ] + }, + { + "cell_type": "markdown", + "id": "1fde497c", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial \\theta_0} = -\\frac{2}{n}\\sum_{i=0}^{n-1} \\left(y_i - \\theta_0 - \\sum_{j=1}^{p-1} X_{ij} \\theta_j\\right).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "264ce562", + "metadata": { + "editable": true + }, + "source": [ + "Multiplying away the constant $2/n$, we obtain" + ] + }, + { + "cell_type": "markdown", + "id": "0f63a6f8", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\sum_{i=0}^{n-1} \\theta_0 = \\sum_{i=0}^{n-1}y_i - \\sum_{i=0}^{n-1} \\sum_{j=1}^{p-1} X_{ij} \\theta_j.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "2ba0a6e4", + "metadata": { + "editable": true + }, + "source": [ + "Let us specialize first to the case where we have only two parameters $\\theta_0$ and $\\theta_1$.\n", + "Our result for $\\theta_0$ simplifies then to" + ] + }, + { + "cell_type": "markdown", + "id": "3b377f93", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "n\\theta_0 = \\sum_{i=0}^{n-1}y_i - \\sum_{i=0}^{n-1} X_{i1} \\theta_1.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "f05e9d08", + "metadata": { + "editable": true + }, + "source": [ + "We obtain then" + ] + }, + { + "cell_type": "markdown", + "id": "84784b8e", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\theta_0 = \\frac{1}{n}\\sum_{i=0}^{n-1}y_i - \\theta_1\\frac{1}{n}\\sum_{i=0}^{n-1} X_{i1}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b62c6e5a", + "metadata": { + "editable": true + }, + "source": [ + "If we define" + ] + }, + { + "cell_type": "markdown", + "id": "ecce9763", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mu_{\\boldsymbol{x}_1}=\\frac{1}{n}\\sum_{i=0}^{n-1} X_{i1},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "c9e1842a", + "metadata": { + "editable": true + }, + "source": [ + "and the mean value of the outputs as" + ] + }, + { + "cell_type": "markdown", + "id": "be12163e", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mu_y=\\frac{1}{n}\\sum_{i=0}^{n-1}y_i,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "a097e9ab", + "metadata": { + "editable": true + }, + "source": [ + "we have" + ] + }, + { + "cell_type": "markdown", + "id": "239422b0", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\theta_0 = \\mu_y - \\theta_1\\mu_{\\boldsymbol{x}_1}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "ed9778bb", + "metadata": { + "editable": true + }, + "source": [ + "In the general case with more parameters than $\\theta_0$ and $\\theta_1$, we have" + ] + }, + { + "cell_type": "markdown", + "id": "7179b77b", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\theta_0 = \\frac{1}{n}\\sum_{i=0}^{n-1}y_i - \\frac{1}{n}\\sum_{i=0}^{n-1}\\sum_{j=1}^{p-1} X_{ij}\\theta_j.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "aad2f56e", + "metadata": { + "editable": true + }, + "source": [ + "We can rewrite the latter equation as" + ] + }, + { + "cell_type": "markdown", + "id": "26aa9739", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\theta_0 = \\frac{1}{n}\\sum_{i=0}^{n-1}y_i - \\sum_{j=1}^{p-1} \\mu_{\\boldsymbol{x}_j}\\theta_j,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "d270cb13", + "metadata": { + "editable": true + }, + "source": [ + "where we have defined" + ] + }, + { + 
"cell_type": "markdown", + "id": "5a52457b", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mu_{\\boldsymbol{x}_j}=\\frac{1}{n}\\sum_{i=0}^{n-1} X_{ij},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "8c98105d", + "metadata": { + "editable": true + }, + "source": [ + "the mean value for all elements of the column vector $\\boldsymbol{x}_j$.\n", + "\n", + "Replacing $y_i$ with $y_i - y_i - \\overline{\\boldsymbol{y}}$ and centering also our design matrix results in a cost function (in vector-matrix disguise)" + ] + }, + { + "cell_type": "markdown", + "id": "4d82302f", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C(\\boldsymbol{\\theta}) = (\\boldsymbol{\\tilde{y}} - \\tilde{X}\\boldsymbol{\\theta})^T(\\boldsymbol{\\tilde{y}} - \\tilde{X}\\boldsymbol{\\theta}).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "a3a07a10", + "metadata": { + "editable": true + }, + "source": [ + "If we minimize with respect to $\\boldsymbol{\\theta}$ we have then" + ] + }, + { + "cell_type": "markdown", + "id": "ea19374e", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\hat{\\boldsymbol{\\theta}} = (\\tilde{X}^T\\tilde{X})^{-1}\\tilde{X}^T\\boldsymbol{\\tilde{y}},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "11dd1361", + "metadata": { + "editable": true + }, + "source": [ + "where $\\boldsymbol{\\tilde{y}} = \\boldsymbol{y} - \\overline{\\boldsymbol{y}}$\n", + "and $\\tilde{X}_{ij} = X_{ij} - \\frac{1}{n}\\sum_{k=0}^{n-1}X_{kj}$.\n", + "\n", + "For Ridge regression we need to add $\\lambda \\boldsymbol{\\theta}^T\\boldsymbol{\\theta}$ to the cost function and get then" + ] + }, + { + "cell_type": "markdown", + "id": "f6a52f34", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\hat{\\boldsymbol{\\theta}} = (\\tilde{X}^T\\tilde{X} + \\lambda I)^{-1}\\tilde{X}^T\\boldsymbol{\\tilde{y}}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "9d6807dc", + "metadata": { + "editable": true + }, + "source": [ + "What does this mean? And why do we insist on all this? Let us look at some examples.\n", + "\n", + "This code shows a simple first-order fit to a data set using the above transformed data, where we consider the role of the intercept first, by either excluding it or including it (*code example thanks to Øyvind Sigmundson Schøyen*). Here our scaling of the data is done by subtracting the mean values only.\n", + "Note also that we do not split the data into training and test." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 19, + "id": "2ed0cafc", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import numpy as np\n", + "import matplotlib.pyplot as plt\n", + "\n", + "from sklearn.linear_model import LinearRegression\n", + "\n", + "\n", + "np.random.seed(2021)\n", + "\n", + "def MSE(y_data,y_model):\n", + " n = np.size(y_model)\n", + " return np.sum((y_data-y_model)**2)/n\n", + "\n", + "\n", + "def fit_theta(X, y):\n", + " return np.linalg.pinv(X.T @ X) @ X.T @ y\n", + "\n", + "\n", + "true_theta = [2, 0.5, 3.7]\n", + "\n", + "x = np.linspace(0, 1, 11)\n", + "y = np.sum(\n", + " np.asarray([x ** p * b for p, b in enumerate(true_theta)]), axis=0\n", + ") + 0.1 * np.random.normal(size=len(x))\n", + "\n", + "degree = 3\n", + "X = np.zeros((len(x), degree))\n", + "\n", + "# Include the intercept in the design matrix\n", + "for p in range(degree):\n", + " X[:, p] = x ** p\n", + "\n", + "theta = fit_theta(X, y)\n", + "\n", + "# Intercept is included in the design matrix\n", + "skl = LinearRegression(fit_intercept=False).fit(X, y)\n", + "\n", + "print(f\"True theta: {true_theta}\")\n", + "print(f\"Fitted theta: {theta}\")\n", + "print(f\"Sklearn fitted theta: {skl.coef_}\")\n", + "ypredictOwn = X @ theta\n", + "ypredictSKL = skl.predict(X)\n", + "print(f\"MSE with intercept column\")\n", + "print(MSE(y,ypredictOwn))\n", + "print(f\"MSE with intercept column from SKL\")\n", + "print(MSE(y,ypredictSKL))\n", + "\n", + "\n", + "plt.figure()\n", + "plt.scatter(x, y, label=\"Data\")\n", + "plt.plot(x, X @ theta, label=\"Fit\")\n", + "plt.plot(x, skl.predict(X), label=\"Sklearn (fit_intercept=False)\")\n", + "\n", + "\n", + "# Do not include the intercept in the design matrix\n", + "X = np.zeros((len(x), degree - 1))\n", + "\n", + "for p in range(degree - 1):\n", + " X[:, p] = x ** (p + 1)\n", + "\n", + "# Intercept is not included in the design matrix\n", + "skl = LinearRegression(fit_intercept=True).fit(X, y)\n", + "\n", + "# Use centered values for X and y when computing coefficients\n", + "y_offset = np.average(y, axis=0)\n", + "X_offset = np.average(X, axis=0)\n", + "\n", + "theta = fit_theta(X - X_offset, y - y_offset)\n", + "intercept = np.mean(y_offset - X_offset @ theta)\n", + "\n", + "print(f\"Manual intercept: {intercept}\")\n", + "print(f\"Fitted theta (without intercept): {theta}\")\n", + "print(f\"Sklearn intercept: {skl.intercept_}\")\n", + "print(f\"Sklearn fitted theta (without intercept): {skl.coef_}\")\n", + "ypredictOwn = X @ theta\n", + "ypredictSKL = skl.predict(X)\n", + "print(f\"MSE with Manual intercept\")\n", + "print(MSE(y,ypredictOwn+intercept))\n", + "print(f\"MSE with Sklearn intercept\")\n", + "print(MSE(y,ypredictSKL))\n", + "\n", + "plt.plot(x, X @ theta + intercept, \"--\", label=\"Fit (manual intercept)\")\n", + "plt.plot(x, skl.predict(X), \"--\", label=\"Sklearn (fit_intercept=True)\")\n", + "plt.grid()\n", + "plt.legend()\n", + "\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "f72dbb49", + "metadata": { + "editable": true + }, + "source": [ + "The intercept is the value of our output/target variable\n", + "when all our features are zero and our function crosses the $y$-axis (for a one-dimensional case). \n", + "\n", + "Printing the MSE, we see first that both methods give the same MSE, as\n", + "they should. 
However, when we move to for example Ridge regression,\n", + "the way we treat the intercept may give a larger or smaller MSE,\n", + "meaning that the MSE can be penalized by the value of the\n", + "intercept. Not including the intercept in the fit, means that the\n", + "regularization term does not include $\\theta_0$. For different values\n", + "of $\\lambda$, this may lead to different MSE values. \n", + "\n", + "To remind the reader, the regularization term, with the intercept in Ridge regression, is given by" + ] + }, + { + "cell_type": "markdown", + "id": "b7759b1f", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\lambda \\vert\\vert \\boldsymbol{\\theta} \\vert\\vert_2^2 = \\lambda \\sum_{j=0}^{p-1}\\theta_j^2,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "ba0ecd6e", + "metadata": { + "editable": true + }, + "source": [ + "but when we take out the intercept, this equation becomes" + ] + }, + { + "cell_type": "markdown", + "id": "ae897f1e", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\lambda \\vert\\vert \\boldsymbol{\\theta} \\vert\\vert_2^2 = \\lambda \\sum_{j=1}^{p-1}\\theta_j^2.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "f9c41f7f", + "metadata": { + "editable": true + }, + "source": [ + "For Lasso regression we have" + ] + }, + { + "cell_type": "markdown", + "id": "fa013cc4", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\lambda \\vert\\vert \\boldsymbol{\\theta} \\vert\\vert_1 = \\lambda \\sum_{j=1}^{p-1}\\vert\\theta_j\\vert.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "0c9b24be", + "metadata": { + "editable": true + }, + "source": [ + "It means that, when scaling the design matrix and the outputs/targets,\n", + "by subtracting the mean values, we have an optimization problem which\n", + "is not penalized by the intercept. The MSE value can then be smaller\n", + "since it focuses only on the remaining quantities. If we however bring\n", + "back the intercept, we will get a MSE which then contains the\n", + "intercept.\n", + "\n", + "Armed with this wisdom, we attempt first to simply set the intercept equal to **False** in our implementation of Ridge regression for our well-known vanilla data set." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 20, + "id": "4f9b1fa0", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import numpy as np\n", + "import pandas as pd\n", + "import matplotlib.pyplot as plt\n", + "from sklearn.model_selection import train_test_split\n", + "from sklearn import linear_model\n", + "\n", + "def MSE(y_data,y_model):\n", + " n = np.size(y_model)\n", + " return np.sum((y_data-y_model)**2)/n\n", + "\n", + "\n", + "# A seed just to ensure that the random numbers are the same for every run.\n", + "# Useful for eventual debugging.\n", + "np.random.seed(3155)\n", + "\n", + "n = 100\n", + "x = np.random.rand(n)\n", + "y = np.exp(-x**2) + 1.5 * np.exp(-(x-2)**2)\n", + "\n", + "Maxpolydegree = 20\n", + "X = np.zeros((n,Maxpolydegree))\n", + "#We include explicitely the intercept column\n", + "for degree in range(Maxpolydegree):\n", + " X[:,degree] = x**degree\n", + "# We split the data in test and training data\n", + "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)\n", + "\n", + "p = Maxpolydegree\n", + "I = np.eye(p,p)\n", + "# Decide which values of lambda to use\n", + "nlambdas = 6\n", + "MSEOwnRidgePredict = np.zeros(nlambdas)\n", + "MSERidgePredict = np.zeros(nlambdas)\n", + "lambdas = np.logspace(-4, 2, nlambdas)\n", + "for i in range(nlambdas):\n", + " lmb = lambdas[i]\n", + " OwnRidgeTheta = np.linalg.pinv(X_train.T @ X_train+lmb*I) @ X_train.T @ y_train\n", + " # Note: we include the intercept column and no scaling\n", + " RegRidge = linear_model.Ridge(lmb,fit_intercept=False)\n", + " RegRidge.fit(X_train,y_train)\n", + " # and then make the prediction\n", + " ytildeOwnRidge = X_train @ OwnRidgeTheta\n", + " ypredictOwnRidge = X_test @ OwnRidgeTheta\n", + " ytildeRidge = RegRidge.predict(X_train)\n", + " ypredictRidge = RegRidge.predict(X_test)\n", + " MSEOwnRidgePredict[i] = MSE(y_test,ypredictOwnRidge)\n", + " MSERidgePredict[i] = MSE(y_test,ypredictRidge)\n", + " print(\"Theta values for own Ridge implementation\")\n", + " print(OwnRidgeTheta)\n", + " print(\"Theta values for Scikit-Learn Ridge implementation\")\n", + " print(RegRidge.coef_)\n", + " print(\"MSE values for own Ridge implementation\")\n", + " print(MSEOwnRidgePredict[i])\n", + " print(\"MSE values for Scikit-Learn Ridge implementation\")\n", + " print(MSERidgePredict[i])\n", + "\n", + "# Now plot the results\n", + "plt.figure()\n", + "plt.plot(np.log10(lambdas), MSEOwnRidgePredict, 'r', label = 'MSE own Ridge Test')\n", + "plt.plot(np.log10(lambdas), MSERidgePredict, 'g', label = 'MSE Ridge Test')\n", + "\n", + "plt.xlabel('log10(lambda)')\n", + "plt.ylabel('MSE')\n", + "plt.legend()\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "1aa5ca37", + "metadata": { + "editable": true + }, + "source": [ + "The results here agree when we force **Scikit-Learn**'s Ridge function to include the first column in our design matrix.\n", + "We see that the results agree very well. Here we have thus explicitely included the intercept column in the design matrix.\n", + "What happens if we do not include the intercept in our fit?\n", + "Let us see how we can change this code by zero centering." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 21, + "id": "a731e32c", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import numpy as np\n", + "import pandas as pd\n", + "import matplotlib.pyplot as plt\n", + "from sklearn.model_selection import train_test_split\n", + "from sklearn import linear_model\n", + "from sklearn.preprocessing import StandardScaler\n", + "\n", + "def MSE(y_data,y_model):\n", + " n = np.size(y_model)\n", + " return np.sum((y_data-y_model)**2)/n\n", + "# A seed just to ensure that the random numbers are the same for every run.\n", + "# Useful for eventual debugging.\n", + "np.random.seed(315)\n", + "\n", + "n = 100\n", + "x = np.random.rand(n)\n", + "y = np.exp(-x**2) + 1.5 * np.exp(-(x-2)**2)\n", + "\n", + "Maxpolydegree = 20\n", + "X = np.zeros((n,Maxpolydegree-1))\n", + "\n", + "for degree in range(1,Maxpolydegree): #No intercept column\n", + " X[:,degree-1] = x**(degree)\n", + "\n", + "# We split the data in test and training data\n", + "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)\n", + "\n", + "#For our own implementation, we will need to deal with the intercept by centering the design matrix and the target variable\n", + "X_train_mean = np.mean(X_train,axis=0)\n", + "#Center by removing mean from each feature\n", + "X_train_scaled = X_train - X_train_mean \n", + "X_test_scaled = X_test - X_train_mean\n", + "#The model intercept (called y_scaler) is given by the mean of the target variable (IF X is centered)\n", + "#Remove the intercept from the training data.\n", + "y_scaler = np.mean(y_train) \n", + "y_train_scaled = y_train - y_scaler \n", + "\n", + "p = Maxpolydegree-1\n", + "I = np.eye(p,p)\n", + "# Decide which values of lambda to use\n", + "nlambdas = 6\n", + "MSEOwnRidgePredict = np.zeros(nlambdas)\n", + "MSERidgePredict = np.zeros(nlambdas)\n", + "\n", + "lambdas = np.logspace(-4, 2, nlambdas)\n", + "for i in range(nlambdas):\n", + " lmb = lambdas[i]\n", + " OwnRidgeTheta = np.linalg.pinv(X_train_scaled.T @ X_train_scaled+lmb*I) @ X_train_scaled.T @ (y_train_scaled)\n", + " intercept_ = y_scaler - X_train_mean@OwnRidgeTheta #The intercept can be shifted so the model can predict on uncentered data\n", + " #Add intercept to prediction\n", + " ypredictOwnRidge = X_test_scaled @ OwnRidgeTheta + y_scaler \n", + " RegRidge = linear_model.Ridge(lmb)\n", + " RegRidge.fit(X_train,y_train)\n", + " ypredictRidge = RegRidge.predict(X_test)\n", + " MSEOwnRidgePredict[i] = MSE(y_test,ypredictOwnRidge)\n", + " MSERidgePredict[i] = MSE(y_test,ypredictRidge)\n", + " print(\"Theta values for own Ridge implementation\")\n", + " print(OwnRidgeTheta) #Intercept is given by mean of target variable\n", + " print(\"Theta values for Scikit-Learn Ridge implementation\")\n", + " print(RegRidge.coef_)\n", + " print('Intercept from own implementation:')\n", + " print(intercept_)\n", + " print('Intercept from Scikit-Learn Ridge implementation')\n", + " print(RegRidge.intercept_)\n", + " print(\"MSE values for own Ridge implementation\")\n", + " print(MSEOwnRidgePredict[i])\n", + " print(\"MSE values for Scikit-Learn Ridge implementation\")\n", + " print(MSERidgePredict[i])\n", + "\n", + "\n", + "# Now plot the results\n", + "plt.figure()\n", + "plt.plot(np.log10(lambdas), MSEOwnRidgePredict, 'b--', label = 'MSE own Ridge Test')\n", + "plt.plot(np.log10(lambdas), MSERidgePredict, 'g--', label = 'MSE SL Ridge Test')\n", + "plt.xlabel('log10(lambda)')\n", + "plt.ylabel('MSE')\n", + "plt.legend()\n", 
+ "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "6ea197d8", + "metadata": { + "editable": true + }, + "source": [ + "We see here, when compared to the code which includes explicitely the\n", + "intercept column, that our MSE value is actually smaller. This is\n", + "because the regularization term does not include the intercept value\n", + "$\\theta_0$ in the fitting. This applies to Lasso regularization as\n", + "well. It means that our optimization is now done only with the\n", + "centered matrix and/or vector that enter the fitting procedure." + ] + } + ], + "metadata": {}, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/doc/LectureNotes/_build/html/_sources/week38.ipynb b/doc/LectureNotes/_build/html/_sources/week38.ipynb new file mode 100644 index 000000000..1d25f9941 --- /dev/null +++ b/doc/LectureNotes/_build/html/_sources/week38.ipynb @@ -0,0 +1,2283 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "8f27372d", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "fff8ca30", + "metadata": { + "editable": true + }, + "source": [ + "# Week 38: Statistical analysis, bias-variance tradeoff and resampling methods\n", + "**Morten Hjorth-Jensen**, Department of Physics and Center for Computing in Science Education, University of Oslo, Norway\n", + "\n", + "Date: **September 15-19, 2025**" + ] + }, + { + "cell_type": "markdown", + "id": "7ee7e714", + "metadata": { + "editable": true + }, + "source": [ + "## Plans for week 38, lecture Monday September 15\n", + "\n", + "**Material for the lecture on Monday September 15.**\n", + "\n", + "1. Statistical interpretation of OLS and various expectation values\n", + "\n", + "2. Resampling techniques, Bootstrap and cross validation and bias-variance tradeoff\n", + "\n", + "3. The material we did not cover last week, that is on more advanced methods for updating the learning rate, are covered by its own video. We will briefly discuss these topics at the beginning of the lecture and during the lab sessions. See video on ADAgrad, RMSprop and ADAM (material from last week not covered during lecture) at \n", + "\n", + "4. [Video of Lecture](https://youtu.be/4Fo7ITVA7V4)\n", + "\n", + "5. [Whiteboard notes](https://github.com/CompPhysics/MachineLearning/blob/master/doc/HandWrittenNotes/2025/FYSSTKweek38.pdf)" + ] + }, + { + "cell_type": "markdown", + "id": "3b5ac440", + "metadata": { + "editable": true + }, + "source": [ + "## Readings and Videos\n", + "1. Raschka et al, pages 175-192\n", + "\n", + "2. Hastie et al Chapter 7, here we recommend 7.1-7.5 and 7.10 (cross-validation) and 7.11 (bootstrap). See .\n", + "\n", + "3. [Video on bias-variance tradeoff](https://www.youtube.com/watch?v=EuBBz3bI-aA)\n", + "\n", + "4. [Video on Bootstrapping](https://www.youtube.com/watch?v=Xz0x-8-cgaQ)\n", + "\n", + "5. [Video on cross validation](https://www.youtube.com/watch?v=fSytzGwwBVw)\n", + "\n", + "For the lab session, the following video on cross validation (from 2024), could be helpful, see " + ] + }, + { + "cell_type": "markdown", + "id": "6d5dba52", + "metadata": { + "editable": true + }, + "source": [ + "## Linking the regression analysis with a statistical interpretation\n", + "\n", + "We will now couple the discussions of ordinary least squares, Ridge\n", + "and Lasso regression with a statistical interpretation, that is we\n", + "move from a linear algebra analysis to a statistical analysis. 
In\n", + "particular, we will focus on what the regularization terms can result\n", + "in. We will amongst other things show that the regularization\n", + "parameter can reduce considerably the variance of the parameters\n", + "$\\theta$.\n", + "\n", + "On of the advantages of doing linear regression is that we actually end up with\n", + "analytical expressions for several statistical quantities. \n", + "Standard least squares and Ridge regression allow us to\n", + "derive quantities like the variance and other expectation values in a\n", + "rather straightforward way.\n", + "\n", + "It is assumed that $\\varepsilon_i\n", + "\\sim \\mathcal{N}(0, \\sigma^2)$ and the $\\varepsilon_{i}$ are\n", + "independent, i.e.:" + ] + }, + { + "cell_type": "markdown", + "id": "bfc2983a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{align*} \n", + "\\mbox{Cov}(\\varepsilon_{i_1},\n", + "\\varepsilon_{i_2}) & = \\left\\{ \\begin{array}{lcc} \\sigma^2 & \\mbox{if}\n", + "& i_1 = i_2, \\\\ 0 & \\mbox{if} & i_1 \\not= i_2. \\end{array} \\right.\n", + "\\end{align*}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "2b5f5980", + "metadata": { + "editable": true + }, + "source": [ + "The randomness of $\\varepsilon_i$ implies that\n", + "$\\mathbf{y}_i$ is also a random variable. In particular,\n", + "$\\mathbf{y}_i$ is normally distributed, because $\\varepsilon_i \\sim\n", + "\\mathcal{N}(0, \\sigma^2)$ and $\\mathbf{X}_{i,\\ast} \\, \\boldsymbol{\\theta}$ is a\n", + "non-random scalar. To specify the parameters of the distribution of\n", + "$\\mathbf{y}_i$ we need to calculate its first two moments. \n", + "\n", + "Recall that $\\boldsymbol{X}$ is a matrix of dimensionality $n\\times p$. The\n", + "notation above $\\mathbf{X}_{i,\\ast}$ means that we are looking at the\n", + "row number $i$ and perform a sum over all values $p$." 
+ ] + }, + { + "cell_type": "markdown", + "id": "3464c7e8", + "metadata": { + "editable": true + }, + "source": [ + "## Assumptions made\n", + "\n", + "The assumption we have made here can be summarized as (and this is going to be useful when we discuss the bias-variance trade off)\n", + "that there exists a function $f(\\boldsymbol{x})$ and a normal distributed error $\\boldsymbol{\\varepsilon}\\sim \\mathcal{N}(0, \\sigma^2)$\n", + "which describe our data" + ] + }, + { + "cell_type": "markdown", + "id": "ed0fd2df", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{y} = f(\\boldsymbol{x})+\\boldsymbol{\\varepsilon}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "feb9d4c2", + "metadata": { + "editable": true + }, + "source": [ + "We approximate this function with our model from the solution of the linear regression equations, that is our\n", + "function $f$ is approximated by $\\boldsymbol{\\tilde{y}}$ where we want to minimize $(\\boldsymbol{y}-\\boldsymbol{\\tilde{y}})^2$, our MSE, with" + ] + }, + { + "cell_type": "markdown", + "id": "eb6d71f8", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{\\tilde{y}} = \\boldsymbol{X}\\boldsymbol{\\theta}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "566399f6", + "metadata": { + "editable": true + }, + "source": [ + "## Expectation value and variance\n", + "\n", + "We can calculate the expectation value of $\\boldsymbol{y}$ for a given element $i$" + ] + }, + { + "cell_type": "markdown", + "id": "6b33f497", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{align*} \n", + "\\mathbb{E}(y_i) & =\n", + "\\mathbb{E}(\\mathbf{X}_{i, \\ast} \\, \\boldsymbol{\\theta}) + \\mathbb{E}(\\varepsilon_i)\n", + "\\, \\, \\, = \\, \\, \\, \\mathbf{X}_{i, \\ast} \\, \\theta, \n", + "\\end{align*}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "5f2f79f2", + "metadata": { + "editable": true + }, + "source": [ + "while\n", + "its variance is" + ] + }, + { + "cell_type": "markdown", + "id": "199121b0", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{align*} \\mbox{Var}(y_i) & = \\mathbb{E} \\{ [y_i\n", + "- \\mathbb{E}(y_i)]^2 \\} \\, \\, \\, = \\, \\, \\, \\mathbb{E} ( y_i^2 ) -\n", + "[\\mathbb{E}(y_i)]^2 \\\\ & = \\mathbb{E} [ ( \\mathbf{X}_{i, \\ast} \\,\n", + "\\theta + \\varepsilon_i )^2] - ( \\mathbf{X}_{i, \\ast} \\, \\boldsymbol{\\theta})^2 \\\\ &\n", + "= \\mathbb{E} [ ( \\mathbf{X}_{i, \\ast} \\, \\boldsymbol{\\theta})^2 + 2 \\varepsilon_i\n", + "\\mathbf{X}_{i, \\ast} \\, \\boldsymbol{\\theta} + \\varepsilon_i^2 ] - ( \\mathbf{X}_{i,\n", + "\\ast} \\, \\theta)^2 \\\\ & = ( \\mathbf{X}_{i, \\ast} \\, \\boldsymbol{\\theta})^2 + 2\n", + "\\mathbb{E}(\\varepsilon_i) \\mathbf{X}_{i, \\ast} \\, \\boldsymbol{\\theta} +\n", + "\\mathbb{E}(\\varepsilon_i^2 ) - ( \\mathbf{X}_{i, \\ast} \\, \\boldsymbol{\\theta})^2 \n", + "\\\\ & = \\mathbb{E}(\\varepsilon_i^2 ) \\, \\, \\, = \\, \\, \\,\n", + "\\mbox{Var}(\\varepsilon_i) \\, \\, \\, = \\, \\, \\, \\sigma^2. \n", + "\\end{align*}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "9a1cc529", + "metadata": { + "editable": true + }, + "source": [ + "Hence, $y_i \\sim \\mathcal{N}( \\mathbf{X}_{i, \\ast} \\, \\boldsymbol{\\theta}, \\sigma^2)$, that is $\\boldsymbol{y}$ follows a normal distribution with \n", + "mean value $\\boldsymbol{X}\\boldsymbol{\\theta}$ and variance $\\sigma^2$ (not be confused with the singular values of the SVD)." 
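,
+    "\n",
+    "As a quick numerical sanity check (a sketch with an assumed toy design matrix and parameter values, not part of the derivation), one can simulate many noise realizations and compare the empirical mean and variance of a single $y_i$ with $\mathbf{X}_{i,\ast}\boldsymbol{\theta}$ and $\sigma^2$:\n",
+    "\n",
+    "```python\n",
+    "import numpy as np\n",
+    "\n",
+    "rng = np.random.default_rng(2025)\n",
+    "n = 50\n",
+    "X = np.c_[np.ones(n), rng.random(n), rng.random(n)**2]   # assumed design matrix\n",
+    "theta = np.array([2.0, 3.0, 4.0])                        # assumed true parameters\n",
+    "sigma = 0.5\n",
+    "\n",
+    "# draw many realizations of y = X theta + noise\n",
+    "samples = np.array([X @ theta + sigma*rng.standard_normal(n) for _ in range(10000)])\n",
+    "i = 7   # pick one arbitrary data point\n",
+    "print(samples[:, i].mean(), (X @ theta)[i])   # empirical mean vs X_{i,*} theta\n",
+    "print(samples[:, i].var(), sigma**2)          # empirical variance vs sigma^2\n",
+    "```"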
+ ] + }, + { + "cell_type": "markdown", + "id": "149e63be", + "metadata": { + "editable": true + }, + "source": [ + "## Expectation value and variance for $\\boldsymbol{\\theta}$\n", + "\n", + "With the OLS expressions for the optimal parameters $\\boldsymbol{\\hat{\\theta}}$ we can evaluate the expectation value" + ] + }, + { + "cell_type": "markdown", + "id": "6a6fb04a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathbb{E}(\\boldsymbol{\\hat{\\theta}}) = \\mathbb{E}[ (\\mathbf{X}^{\\top} \\mathbf{X})^{-1}\\mathbf{X}^{T} \\mathbf{Y}]=(\\mathbf{X}^{T} \\mathbf{X})^{-1}\\mathbf{X}^{T} \\mathbb{E}[ \\mathbf{Y}]=(\\mathbf{X}^{T} \\mathbf{X})^{-1} \\mathbf{X}^{T}\\mathbf{X}\\boldsymbol{\\theta}=\\boldsymbol{\\theta}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "79420d06", + "metadata": { + "editable": true + }, + "source": [ + "This means that the estimator of the regression parameters is unbiased.\n", + "\n", + "We can also calculate the variance\n", + "\n", + "The variance of the optimal value $\\boldsymbol{\\hat{\\theta}}$ is" + ] + }, + { + "cell_type": "markdown", + "id": "0e3de992", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{eqnarray*}\n", + "\\mbox{Var}(\\boldsymbol{\\hat{\\theta}}) & = & \\mathbb{E} \\{ [\\boldsymbol{\\theta} - \\mathbb{E}(\\boldsymbol{\\theta})] [\\boldsymbol{\\theta} - \\mathbb{E}(\\boldsymbol{\\theta})]^{T} \\}\n", + "\\\\\n", + "& = & \\mathbb{E} \\{ [(\\mathbf{X}^{T} \\mathbf{X})^{-1} \\, \\mathbf{X}^{T} \\mathbf{Y} - \\boldsymbol{\\theta}] \\, [(\\mathbf{X}^{T} \\mathbf{X})^{-1} \\, \\mathbf{X}^{T} \\mathbf{Y} - \\boldsymbol{\\theta}]^{T} \\}\n", + "\\\\\n", + "% & = & \\mathbb{E} \\{ [(\\mathbf{X}^{T} \\mathbf{X})^{-1} \\, \\mathbf{X}^{T} \\mathbf{Y}] \\, [(\\mathbf{X}^{T} \\mathbf{X})^{-1} \\, \\mathbf{X}^{T} \\mathbf{Y}]^{T} \\} - \\boldsymbol{\\theta} \\, \\boldsymbol{\\theta}^{T}\n", + "% \\\\\n", + "% & = & \\mathbb{E} \\{ (\\mathbf{X}^{T} \\mathbf{X})^{-1} \\, \\mathbf{X}^{T} \\mathbf{Y} \\, \\mathbf{Y}^{T} \\, \\mathbf{X} \\, (\\mathbf{X}^{T} \\mathbf{X})^{-1} \\} - \\boldsymbol{\\theta} \\, \\boldsymbol{\\theta}^{T}\n", + "% \\\\\n", + "& = & (\\mathbf{X}^{T} \\mathbf{X})^{-1} \\, \\mathbf{X}^{T} \\, \\mathbb{E} \\{ \\mathbf{Y} \\, \\mathbf{Y}^{T} \\} \\, \\mathbf{X} \\, (\\mathbf{X}^{T} \\mathbf{X})^{-1} - \\boldsymbol{\\theta} \\, \\boldsymbol{\\theta}^{T}\n", + "\\\\\n", + "& = & (\\mathbf{X}^{T} \\mathbf{X})^{-1} \\, \\mathbf{X}^{T} \\, \\{ \\mathbf{X} \\, \\boldsymbol{\\theta} \\, \\boldsymbol{\\theta}^{T} \\, \\mathbf{X}^{T} + \\sigma^2 \\} \\, \\mathbf{X} \\, (\\mathbf{X}^{T} \\mathbf{X})^{-1} - \\boldsymbol{\\theta} \\, \\boldsymbol{\\theta}^{T}\n", + "% \\\\\n", + "% & = & (\\mathbf{X}^T \\mathbf{X})^{-1} \\, \\mathbf{X}^T \\, \\mathbf{X} \\, \\boldsymbol{\\theta} \\, \\boldsymbol{\\theta}^T \\, \\mathbf{X}^T \\, \\mathbf{X} \\, (\\mathbf{X}^T % \\mathbf{X})^{-1}\n", + "% \\\\\n", + "% & & + \\, \\, \\sigma^2 \\, (\\mathbf{X}^T \\mathbf{X})^{-1} \\, \\mathbf{X}^T \\, \\mathbf{X} \\, (\\mathbf{X}^T \\mathbf{X})^{-1} - \\boldsymbol{\\theta} \\boldsymbol{\\theta}^T\n", + "\\\\\n", + "& = & \\boldsymbol{\\theta} \\, \\boldsymbol{\\theta}^{T} + \\sigma^2 \\, (\\mathbf{X}^{T} \\mathbf{X})^{-1} - \\boldsymbol{\\theta} \\, \\boldsymbol{\\theta}^{T}\n", + "\\, \\, \\, = \\, \\, \\, \\sigma^2 \\, (\\mathbf{X}^{T} \\mathbf{X})^{-1},\n", + "\\end{eqnarray*}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "d3ea2897", + "metadata": { + "editable": true + }, + "source": [ + 
"where we have used that $\\mathbb{E} (\\mathbf{Y} \\mathbf{Y}^{T}) =\n", + "\\mathbf{X} \\, \\boldsymbol{\\theta} \\, \\boldsymbol{\\theta}^{T} \\, \\mathbf{X}^{T} +\n", + "\\sigma^2 \\, \\mathbf{I}_{nn}$. From $\\mbox{Var}(\\boldsymbol{\\theta}) = \\sigma^2\n", + "\\, (\\mathbf{X}^{T} \\mathbf{X})^{-1}$, one obtains an estimate of the\n", + "variance of the estimate of the $j$-th regression coefficient:\n", + "$\\boldsymbol{\\sigma}^2 (\\boldsymbol{\\theta}_j ) = \\boldsymbol{\\sigma}^2 [(\\mathbf{X}^{T} \\mathbf{X})^{-1}]_{jj} $. This may be used to\n", + "construct a confidence interval for the estimates.\n", + "\n", + "In a similar way, we can obtain analytical expressions for say the\n", + "expectation values of the parameters $\\boldsymbol{\\theta}$ and their variance\n", + "when we employ Ridge regression, allowing us again to define a confidence interval. \n", + "\n", + "It is rather straightforward to show that" + ] + }, + { + "cell_type": "markdown", + "id": "da5e3927", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathbb{E} \\big[ \\boldsymbol{\\theta}^{\\mathrm{Ridge}} \\big]=(\\mathbf{X}^{T} \\mathbf{X} + \\lambda \\mathbf{I}_{pp})^{-1} (\\mathbf{X}^{\\top} \\mathbf{X})\\boldsymbol{\\theta}^{\\mathrm{OLS}}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "7ab5488b", + "metadata": { + "editable": true + }, + "source": [ + "We see clearly that \n", + "$\\mathbb{E} \\big[ \\boldsymbol{\\theta}^{\\mathrm{Ridge}} \\big] \\not= \\boldsymbol{\\theta}^{\\mathrm{OLS}}$ for any $\\lambda > 0$. We say then that the ridge estimator is biased.\n", + "\n", + "We can also compute the variance as" + ] + }, + { + "cell_type": "markdown", + "id": "f904a739", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mbox{Var}[\\boldsymbol{\\theta}^{\\mathrm{Ridge}}]=\\sigma^2[ \\mathbf{X}^{T} \\mathbf{X} + \\lambda \\mathbf{I} ]^{-1} \\mathbf{X}^{T} \\mathbf{X} \\{ [ \\mathbf{X}^{\\top} \\mathbf{X} + \\lambda \\mathbf{I} ]^{-1}\\}^{T},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "10fd648b", + "metadata": { + "editable": true + }, + "source": [ + "and it is easy to see that if the parameter $\\lambda$ goes to infinity then the variance of Ridge parameters $\\boldsymbol{\\theta}$ goes to zero. \n", + "\n", + "With this, we can compute the difference" + ] + }, + { + "cell_type": "markdown", + "id": "4812c2a4", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mbox{Var}[\\boldsymbol{\\theta}^{\\mathrm{OLS}}]-\\mbox{Var}(\\boldsymbol{\\theta}^{\\mathrm{Ridge}})=\\sigma^2 [ \\mathbf{X}^{T} \\mathbf{X} + \\lambda \\mathbf{I} ]^{-1}[ 2\\lambda\\mathbf{I} + \\lambda^2 (\\mathbf{X}^{T} \\mathbf{X})^{-1} ] \\{ [ \\mathbf{X}^{T} \\mathbf{X} + \\lambda \\mathbf{I} ]^{-1}\\}^{T}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "199d8531", + "metadata": { + "editable": true + }, + "source": [ + "The difference is non-negative definite since each component of the\n", + "matrix product is non-negative definite. \n", + "This means the variance we obtain with the standard OLS will always for $\\lambda > 0$ be larger than the variance of $\\boldsymbol{\\theta}$ obtained with the Ridge estimator. This has interesting consequences when we discuss the so-called bias-variance trade-off below." 
+ ] + }, + { + "cell_type": "markdown", + "id": "96c16676", + "metadata": { + "editable": true + }, + "source": [ + "## Deriving OLS from a probability distribution\n", + "\n", + "Our basic assumption when we derived the OLS equations was to assume\n", + "that our output is determined by a given continuous function\n", + "$f(\\boldsymbol{x})$ and a random noise $\\boldsymbol{\\epsilon}$ given by the normal\n", + "distribution with zero mean value and an undetermined variance\n", + "$\\sigma^2$.\n", + "\n", + "We found above that the outputs $\\boldsymbol{y}$ have a mean value given by\n", + "$\\boldsymbol{X}\\hat{\\boldsymbol{\\theta}}$ and variance $\\sigma^2$. Since the entries to\n", + "the design matrix are not stochastic variables, we can assume that the\n", + "probability distribution of our targets is also a normal distribution\n", + "but now with mean value $\\boldsymbol{X}\\hat{\\boldsymbol{\\theta}}$. This means that a\n", + "single output $y_i$ is given by the Gaussian distribution" + ] + }, + { + "cell_type": "markdown", + "id": "a2a1a004", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y_i\\sim \\mathcal{N}(\\boldsymbol{X}_{i,*}\\boldsymbol{\\theta}, \\sigma^2)=\\frac{1}{\\sqrt{2\\pi\\sigma^2}}\\exp{\\left[-\\frac{(y_i-\\boldsymbol{X}_{i,*}\\boldsymbol{\\theta})^2}{2\\sigma^2}\\right]}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "5aad445b", + "metadata": { + "editable": true + }, + "source": [ + "## Independent and Identically Distributed (iid)\n", + "\n", + "We assume now that the various $y_i$ values are stochastically distributed according to the above Gaussian distribution. \n", + "We define this distribution as" + ] + }, + { + "cell_type": "markdown", + "id": "d197c8bb", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "p(y_i, \\boldsymbol{X}\\vert\\boldsymbol{\\theta})=\\frac{1}{\\sqrt{2\\pi\\sigma^2}}\\exp{\\left[-\\frac{(y_i-\\boldsymbol{X}_{i,*}\\boldsymbol{\\theta})^2}{2\\sigma^2}\\right]},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "e2e7462f", + "metadata": { + "editable": true + }, + "source": [ + "which reads as finding the likelihood of an event $y_i$ with the input variables $\\boldsymbol{X}$ given the parameters (to be determined) $\\boldsymbol{\\theta}$.\n", + "\n", + "Since these events are assumed to be independent and identicall distributed we can build the probability distribution function (PDF) for all possible event $\\boldsymbol{y}$ as the product of the single events, that is we have" + ] + }, + { + "cell_type": "markdown", + "id": "eb635d3d", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "p(\\boldsymbol{y},\\boldsymbol{X}\\vert\\boldsymbol{\\theta})=\\prod_{i=0}^{n-1}\\frac{1}{\\sqrt{2\\pi\\sigma^2}}\\exp{\\left[-\\frac{(y_i-\\boldsymbol{X}_{i,*}\\boldsymbol{\\theta})^2}{2\\sigma^2}\\right]}=\\prod_{i=0}^{n-1}p(y_i,\\boldsymbol{X}\\vert\\boldsymbol{\\theta}).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "445ed13e", + "metadata": { + "editable": true + }, + "source": [ + "We will write this in a more compact form reserving $\\boldsymbol{D}$ for the domain of events, including the ouputs (targets) and the inputs. 
That is,\n",
+    "in the case of a simple one-dimensional input and output, we would have"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "319bfc6c",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "$$\n",
+    "\boldsymbol{D}=[(x_0,y_0), (x_1,y_1),\dots, (x_{n-1},y_{n-1})].\n",
+    "$$"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "90abf35a",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "In the more general case the various inputs should be replaced by the possible features represented by the input data set $\boldsymbol{X}$. \n",
+    "We can now rewrite the above probability as"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "04b66fbd",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "$$\n",
+    "p(\boldsymbol{D}\vert\boldsymbol{\theta})=\prod_{i=0}^{n-1}\frac{1}{\sqrt{2\pi\sigma^2}}\exp{\left[-\frac{(y_i-\boldsymbol{X}_{i,*}\boldsymbol{\theta})^2}{2\sigma^2}\right]}.\n",
+    "$$"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "4a27b5a7",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "It is a conditional probability (see below) and reads as the likelihood of a domain of events $\boldsymbol{D}$ given a set of parameters $\boldsymbol{\theta}$."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "8d12543f",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "## Maximum Likelihood Estimation (MLE)\n",
+    "\n",
+    "In statistics, maximum likelihood estimation (MLE) is a method of\n",
+    "estimating the parameters of an assumed probability distribution,\n",
+    "given some observed data. This is achieved by maximizing a likelihood\n",
+    "function so that, under the assumed statistical model, the observed\n",
+    "data is the most probable. \n",
+    "\n",
+    "We will assume here that our events are given by the above Gaussian\n",
+    "distribution and we will determine the optimal parameters $\theta$ by\n",
+    "maximizing the above PDF. However, computing the derivatives of a\n",
+    "product function is cumbersome and can easily lead to overflow and/or\n",
+    "underflow problems, with potential for loss of numerical precision.\n",
+    "\n",
+    "In practice, it is more convenient to maximize the logarithm of the\n",
+    "PDF because it is a monotonically increasing function of the argument.\n",
+    "Alternatively, and this will be our option, we will minimize the\n",
+    "negative of the logarithm since this is a monotonically decreasing\n",
+    "function.\n",
+    "\n",
+    "Note also that maximization/minimization of the logarithm of the PDF\n",
+    "is equivalent to the maximization/minimization of the PDF itself."
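,
+    "\n",
+    "The following sketch (using assumed toy data and SciPy's generic optimizer, an extra dependency not otherwise used in these notes) illustrates the point: minimizing the negative logarithm of the Gaussian likelihood numerically reproduces the analytical OLS solution derived below.\n",
+    "\n",
+    "```python\n",
+    "import numpy as np\n",
+    "from scipy.optimize import minimize\n",
+    "\n",
+    "rng = np.random.default_rng(1)\n",
+    "n = 100\n",
+    "x = rng.random(n)\n",
+    "X = np.c_[np.ones(n), x, x**2]\n",
+    "y = X @ np.array([2.0, 3.0, 4.0]) + 0.1*rng.standard_normal(n)\n",
+    "sigma = 0.1\n",
+    "\n",
+    "def neg_log_likelihood(theta):\n",
+    "    residual = y - X @ theta\n",
+    "    return 0.5*n*np.log(2*np.pi*sigma**2) + 0.5*(residual @ residual)/sigma**2\n",
+    "\n",
+    "res = minimize(neg_log_likelihood, np.zeros(3))\n",
+    "print(res.x)                                 # from minimizing the negative log-likelihood\n",
+    "print(np.linalg.solve(X.T @ X, X.T @ y))     # analytical OLS solution\n",
+    "```"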
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "2e5cd118",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "## A new Cost Function\n",
+    "\n",
+    "We could now define a new cost function to minimize, namely the negative logarithm of the above PDF"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "c71a5edf",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "$$\n",
+    "C(\boldsymbol{\theta})=-\log{\prod_{i=0}^{n-1}p(y_i,\boldsymbol{X}\vert\boldsymbol{\theta})}=-\sum_{i=0}^{n-1}\log{p(y_i,\boldsymbol{X}\vert\boldsymbol{\theta})},\n",
+    "$$"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "e663bf2e",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "which becomes"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "c4bc4873",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "$$\n",
+    "C(\boldsymbol{\theta})=\frac{n}{2}\log{2\pi\sigma^2}+\frac{\vert\vert (\boldsymbol{y}-\boldsymbol{X}\boldsymbol{\theta})\vert\vert_2^2}{2\sigma^2}.\n",
+    "$$"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "f5bc59b8",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "Taking the derivative of the *new* cost function with respect to the parameters $\theta$ and setting it to zero, we recognize our familiar OLS equation, namely"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "4f6ddf4a",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "$$\n",
+    "\boldsymbol{X}^T\left(\boldsymbol{y}-\boldsymbol{X}\boldsymbol{\theta}\right) =0,\n",
+    "$$"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "afda0a6b",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "which leads to the well-known OLS equation for the optimal parameters $\theta$"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "b5335dc0",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "$$\n",
+    "\hat{\boldsymbol{\theta}}^{\mathrm{OLS}}=\left(\boldsymbol{X}^T\boldsymbol{X}\right)^{-1}\boldsymbol{X}^T\boldsymbol{y}.\n",
+    "$$"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "4f86a52d",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "Next week we will make a similar analysis for Ridge and Lasso regression."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "5cdb1767",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "## Why resampling methods\n",
+    "\n",
+    "Before we proceed, we need to rethink what we have been doing. In our\n",
+    "eagerness to fit the data, we have omitted several important elements in\n",
+    "our regression analysis. In what follows we will\n",
+    "1. look at statistical properties, including a discussion of mean values, variance and the so-called bias-variance tradeoff\n",
+    "\n",
+    "2. introduce resampling techniques like cross-validation, bootstrapping, the jackknife and more\n",
+    "\n",
+    "and discuss how to select a given model (one of the difficult parts in machine learning)."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "69435d77",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "## Resampling methods\n",
+    "Resampling methods are an indispensable tool in modern\n",
+    "statistics. They involve repeatedly drawing samples from a training\n",
+    "set and refitting a model of interest on each sample in order to\n",
+    "obtain additional information about the fitted model. For example, in\n",
+    "order to estimate the variability of a linear regression fit, we can\n",
+    "repeatedly draw different samples from the training data, fit a linear\n",
+    "regression to each new sample, and then examine the extent to which\n",
+    "the resulting fits differ. 
Such an approach may allow us to obtain\n", + "information that would not be available from fitting the model only\n", + "once using the original training sample.\n", + "\n", + "Two resampling methods are often used in Machine Learning analyses,\n", + "1. The **bootstrap method**\n", + "\n", + "2. and **Cross-Validation**\n", + "\n", + "In addition there are several other methods such as the Jackknife and the Blocking methods. We will discuss in particular\n", + "cross-validation and the bootstrap method." + ] + }, + { + "cell_type": "markdown", + "id": "cefbb559", + "metadata": { + "editable": true + }, + "source": [ + "## Resampling approaches can be computationally expensive\n", + "\n", + "Resampling approaches can be computationally expensive, because they\n", + "involve fitting the same statistical method multiple times using\n", + "different subsets of the training data. However, due to recent\n", + "advances in computing power, the computational requirements of\n", + "resampling methods generally are not prohibitive. In this chapter, we\n", + "discuss two of the most commonly used resampling methods,\n", + "cross-validation and the bootstrap. Both methods are important tools\n", + "in the practical application of many statistical learning\n", + "procedures. For example, cross-validation can be used to estimate the\n", + "test error associated with a given statistical learning method in\n", + "order to evaluate its performance, or to select the appropriate level\n", + "of flexibility. The process of evaluating a model’s performance is\n", + "known as model assessment, whereas the process of selecting the proper\n", + "level of flexibility for a model is known as model selection. The\n", + "bootstrap is widely used." + ] + }, + { + "cell_type": "markdown", + "id": "2659401a", + "metadata": { + "editable": true + }, + "source": [ + "## Why resampling methods ?\n", + "**Statistical analysis.**\n", + "\n", + "* Our simulations can be treated as *computer experiments*. This is particularly the case for Monte Carlo methods which are widely used in statistical analyses.\n", + "\n", + "* The results can be analysed with the same statistical tools as we would use when analysing experimental data.\n", + "\n", + "* As in all experiments, we are looking for expectation values and an estimate of how accurate they are, i.e., possible sources for errors." + ] + }, + { + "cell_type": "markdown", + "id": "4d5d7748", + "metadata": { + "editable": true + }, + "source": [ + "## Statistical analysis\n", + "\n", + "* As in other experiments, many numerical experiments have two classes of errors:\n", + "\n", + " * Statistical errors\n", + "\n", + " * Systematical errors\n", + "\n", + "* Statistical errors can be estimated using standard tools from statistics\n", + "\n", + "* Systematical errors are method specific and must be treated differently from case to case." + ] + }, + { + "cell_type": "markdown", + "id": "54df92b3", + "metadata": { + "editable": true + }, + "source": [ + "## Resampling methods\n", + "\n", + "With all these analytical equations for both the OLS and Ridge\n", + "regression, we will now outline how to assess a given model. This will\n", + "lead to a discussion of the so-called bias-variance tradeoff (see\n", + "below) and so-called resampling methods.\n", + "\n", + "One of the quantities we have discussed as a way to measure errors is\n", + "the mean-squared error (MSE), mainly used for fitting of continuous\n", + "functions. 
Another choice is the absolute error.\n", + "\n", + "In the discussions below we will focus on the MSE and in particular since we will split the data into test and training data,\n", + "we discuss the\n", + "1. prediction error or simply the **test error** $\\mathrm{Err_{Test}}$, where we have a fixed training set and the test error is the MSE arising from the data reserved for testing. We discuss also the \n", + "\n", + "2. training error $\\mathrm{Err_{Train}}$, which is the average loss over the training data.\n", + "\n", + "As our model becomes more and more complex, more of the training data tends to used. The training may thence adapt to more complicated structures in the data. This may lead to a decrease in the bias (see below for code example) and a slight increase of the variance for the test error.\n", + "For a certain level of complexity the test error will reach minimum, before starting to increase again. The\n", + "training error reaches a saturation." + ] + }, + { + "cell_type": "markdown", + "id": "5b1a1390", + "metadata": { + "editable": true + }, + "source": [ + "## Resampling methods: Bootstrap\n", + "Bootstrapping is a [non-parametric approach](https://en.wikipedia.org/wiki/Nonparametric_statistics) to statistical inference\n", + "that substitutes computation for more traditional distributional\n", + "assumptions and asymptotic results. Bootstrapping offers a number of\n", + "advantages: \n", + "1. The bootstrap is quite general, although there are some cases in which it fails. \n", + "\n", + "2. Because it does not require distributional assumptions (such as normally distributed errors), the bootstrap can provide more accurate inferences when the data are not well behaved or when the sample size is small. \n", + "\n", + "3. It is possible to apply the bootstrap to statistics with sampling distributions that are difficult to derive, even asymptotically. \n", + "\n", + "4. It is relatively simple to apply the bootstrap to complex data-collection plans (such as stratified and clustered samples).\n", + "\n", + "The textbook by [Davison on the Bootstrap Methods and their Applications](https://www.cambridge.org/core/books/bootstrap-methods-and-their-application/ED2FD043579F27952363566DC09CBD6A) provides many more insights and proofs. In this course we will take a more practical approach and use the results and theorems provided in the literature. For those interested in reading more about the bootstrap methods, we recommend the above text and the one by [Efron and Tibshirani](https://www.routledge.com/An-Introduction-to-the-Bootstrap/Efron-Tibshirani/p/book/9780412042317).\n", + "\n", + "Before we proceed however, we need to remind ourselves about a central theorem in statistics, namely the so-called **central limit theorem**." + ] + }, + { + "cell_type": "markdown", + "id": "39f233e4", + "metadata": { + "editable": true + }, + "source": [ + "## The Central Limit Theorem\n", + "\n", + "Suppose we have a PDF $p(x)$ from which we generate a series $N$\n", + "of averages $\\mathbb{E}[x_i]$. Each mean value $\\mathbb{E}[x_i]$\n", + "is viewed as the average of a specific measurement, e.g., throwing \n", + "dice 100 times and then taking the average value, or producing a certain\n", + "amount of random numbers. \n", + "For notational ease, we set $\\mathbb{E}[x_i]=x_i$ in the discussion\n", + "which follows. 
We do the same for $\\mathbb{E}[z]=z$.\n", + "\n", + "If we compute the mean $z$ of $m$ such mean values $x_i$" + ] + }, + { + "cell_type": "markdown", + "id": "361320d8", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "z=\\frac{x_1+x_2+\\dots+x_m}{m},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "a363db1e", + "metadata": { + "editable": true + }, + "source": [ + "the question we pose is which is the PDF of the new variable $z$." + ] + }, + { + "cell_type": "markdown", + "id": "92967efc", + "metadata": { + "editable": true + }, + "source": [ + "## Finding the Limit\n", + "\n", + "The probability of obtaining an average value $z$ is the product of the \n", + "probabilities of obtaining arbitrary individual mean values $x_i$,\n", + "but with the constraint that the average is $z$. We can express this through\n", + "the following expression" + ] + }, + { + "cell_type": "markdown", + "id": "1bffca97", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\tilde{p}(z)=\\int dx_1p(x_1)\\int dx_2p(x_2)\\dots\\int dx_mp(x_m)\n", + " \\delta(z-\\frac{x_1+x_2+\\dots+x_m}{m}),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "0dacb6fc", + "metadata": { + "editable": true + }, + "source": [ + "where the $\\delta$-function enbodies the constraint that the mean is $z$.\n", + "All measurements that lead to each individual $x_i$ are expected to\n", + "be independent, which in turn means that we can express $\\tilde{p}$ as the \n", + "product of individual $p(x_i)$. The independence assumption is important in the derivation of the central limit theorem." + ] + }, + { + "cell_type": "markdown", + "id": "baeedf81", + "metadata": { + "editable": true + }, + "source": [ + "## Rewriting the $\\delta$-function\n", + "\n", + "If we use the integral expression for the $\\delta$-function" + ] + }, + { + "cell_type": "markdown", + "id": "20cc7770", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta(z-\\frac{x_1+x_2+\\dots+x_m}{m})=\\frac{1}{2\\pi}\\int_{-\\infty}^{\\infty}\n", + " dq\\exp{\\left(iq(z-\\frac{x_1+x_2+\\dots+x_m}{m})\\right)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "f67d3b94", + "metadata": { + "editable": true + }, + "source": [ + "and inserting $e^{i\\mu q-i\\mu q}$ where $\\mu$ is the mean value\n", + "we arrive at" + ] + }, + { + "cell_type": "markdown", + "id": "17f59fb6", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\tilde{p}(z)=\\frac{1}{2\\pi}\\int_{-\\infty}^{\\infty}\n", + " dq\\exp{\\left(iq(z-\\mu)\\right)}\\left[\\int_{-\\infty}^{\\infty}\n", + " dxp(x)\\exp{\\left(iq(\\mu-x)/m\\right)}\\right]^m,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "5f899fbe", + "metadata": { + "editable": true + }, + "source": [ + "with the integral over $x$ resulting in" + ] + }, + { + "cell_type": "markdown", + "id": "19a1f5bb", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\int_{-\\infty}^{\\infty}dxp(x)\\exp{\\left(iq(\\mu-x)/m\\right)}=\n", + " \\int_{-\\infty}^{\\infty}dxp(x)\n", + " \\left[1+\\frac{iq(\\mu-x)}{m}-\\frac{q^2(\\mu-x)^2}{2m^2}+\\dots\\right].\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "1db8fcf2", + "metadata": { + "editable": true + }, + "source": [ + "## Identifying Terms\n", + "\n", + "The second term on the rhs disappears since this is just the mean and \n", + "employing the definition of $\\sigma^2$ we have" + ] + }, + { + "cell_type": "markdown", + "id": "bfadf7e5", + "metadata": { + "editable": true + }, + 
"source": [ + "$$\n", + "\\int_{-\\infty}^{\\infty}dxp(x)e^{\\left(iq(\\mu-x)/m\\right)}=\n", + " 1-\\frac{q^2\\sigma^2}{2m^2}+\\dots,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "7c65ce24", + "metadata": { + "editable": true + }, + "source": [ + "resulting in" + ] + }, + { + "cell_type": "markdown", + "id": "8cd5650a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\left[\\int_{-\\infty}^{\\infty}dxp(x)\\exp{\\left(iq(\\mu-x)/m\\right)}\\right]^m\\approx\n", + " \\left[1-\\frac{q^2\\sigma^2}{2m^2}+\\dots \\right]^m,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "11fdc936", + "metadata": { + "editable": true + }, + "source": [ + "and in the limit $m\\rightarrow \\infty$ we obtain" + ] + }, + { + "cell_type": "markdown", + "id": "ed88642e", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\tilde{p}(z)=\\frac{1}{\\sqrt{2\\pi}(\\sigma/\\sqrt{m})}\n", + " \\exp{\\left(-\\frac{(z-\\mu)^2}{2(\\sigma/\\sqrt{m})^2}\\right)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "82c61b81", + "metadata": { + "editable": true + }, + "source": [ + "which is the normal distribution with variance\n", + "$\\sigma^2_m=\\sigma^2/m$, where $\\sigma$ is the variance of the PDF $p(x)$\n", + "and $\\mu$ is also the mean of the PDF $p(x)$." + ] + }, + { + "cell_type": "markdown", + "id": "bc43db46", + "metadata": { + "editable": true + }, + "source": [ + "## Wrapping it up\n", + "\n", + "Thus, the central limit theorem states that the PDF $\\tilde{p}(z)$ of\n", + "the average of $m$ random values corresponding to a PDF $p(x)$ \n", + "is a normal distribution whose mean is the \n", + "mean value of the PDF $p(x)$ and whose variance is the variance\n", + "of the PDF $p(x)$ divided by $m$, the number of values used to compute $z$.\n", + "\n", + "The central limit theorem leads to the well-known expression for the\n", + "standard deviation, given by" + ] + }, + { + "cell_type": "markdown", + "id": "25418113", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\sigma_m=\n", + "\\frac{\\sigma}{\\sqrt{m}}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "e5d3c3eb", + "metadata": { + "editable": true + }, + "source": [ + "The latter is true only if the average value is known exactly. This is obtained in the limit\n", + "$m\\rightarrow \\infty$ only. Because the mean and the variance are measured quantities we obtain \n", + "the familiar expression in statistics (the so-called Bessel correction)" + ] + }, + { + "cell_type": "markdown", + "id": "c504cba4", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\sigma_m\\approx \n", + "\\frac{\\sigma}{\\sqrt{m-1}}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "079ded2a", + "metadata": { + "editable": true + }, + "source": [ + "In many cases however the above estimate for the standard deviation,\n", + "in particular if correlations are strong, may be too simplistic. Keep\n", + "in mind that we have assumed that the variables $x$ are independent\n", + "and identically distributed. This is obviously not always the\n", + "case. For example, the random numbers (or better pseudorandom numbers)\n", + "we generate in various calculations do always exhibit some\n", + "correlations.\n", + "\n", + "The theorem is satisfied by a large class of PDFs. Note however that for a\n", + "finite $m$, it is not always possible to find a closed form /analytic expression for\n", + "$\\tilde{p}(x)$." 
+ ] + }, + { + "cell_type": "markdown", + "id": "e8534a50", + "metadata": { + "editable": true + }, + "source": [ + "## Confidence Intervals\n", + "\n", + "Confidence intervals are used in statistics and represent a type of estimate\n", + "computed from the observed data. This gives a range of values for an\n", + "unknown parameter such as the parameters $\\boldsymbol{\\theta}$ from linear regression.\n", + "\n", + "With the OLS expressions for the parameters $\\boldsymbol{\\theta}$ we found \n", + "$\\mathbb{E}(\\boldsymbol{\\theta}) = \\boldsymbol{\\theta}$, which means that the estimator of the regression parameters is unbiased.\n", + "\n", + "In the exercises this week we show that the variance of the estimate of the $j$-th regression coefficient is\n", + "$\\boldsymbol{\\sigma}^2 (\\boldsymbol{\\theta}_j ) = \\boldsymbol{\\sigma}^2 [(\\mathbf{X}^{T} \\mathbf{X})^{-1}]_{jj} $.\n", + "\n", + "This quantity can be used to\n", + "construct a confidence interval for the estimates." + ] + }, + { + "cell_type": "markdown", + "id": "2fc73431", + "metadata": { + "editable": true + }, + "source": [ + "## Standard Approach based on the Normal Distribution\n", + "\n", + "We will assume that the parameters $\\theta$ follow a normal\n", + "distribution. We can then define the confidence interval. Here we will be using as\n", + "shorthands $\\mu_{\\theta}$ for the above mean value and $\\sigma_{\\theta}$\n", + "for the standard deviation. We have then a confidence interval" + ] + }, + { + "cell_type": "markdown", + "id": "0f8b0845", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\left(\\mu_{\\theta}\\pm \\frac{z\\sigma_{\\theta}}{\\sqrt{n}}\\right),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "25105753", + "metadata": { + "editable": true + }, + "source": [ + "where $z$ defines the level of certainty (or confidence). For a normal\n", + "distribution typical parameters are $z=2.576$ which corresponds to a\n", + "confidence of $99\\%$ while $z=1.96$ corresponds to a confidence of\n", + "$95\\%$. A confidence level of $95\\%$ is commonly used and it is\n", + "normally referred to as a *two-sigmas* confidence level, that is we\n", + "approximate $z\\approx 2$.\n", + "\n", + "For more discussions of confidence intervals (and in particular linked with a discussion of the bootstrap method), see chapter 5 of the textbook by [Davison on the Bootstrap Methods and their Applications](https://www.cambridge.org/core/books/bootstrap-methods-and-their-application/ED2FD043579F27952363566DC09CBD6A)\n", + "\n", + "In this text you will also find an in-depth discussion of the\n", + "Bootstrap method, why it works and various theorems related to it." + ] + }, + { + "cell_type": "markdown", + "id": "89be6eea", + "metadata": { + "editable": true + }, + "source": [ + "## Resampling methods: Bootstrap background\n", + "\n", + "Since $\\widehat{\\theta} = \\widehat{\\theta}(\\boldsymbol{X})$ is a function of random variables,\n", + "$\\widehat{\\theta}$ itself must be a random variable. Thus it has\n", + "a pdf, call this function $p(\\boldsymbol{t})$. The aim of the bootstrap is to\n", + "estimate $p(\\boldsymbol{t})$ by the relative frequency of\n", + "$\\widehat{\\theta}$. You can think of this as using a histogram\n", + "in the place of $p(\\boldsymbol{t})$. If the relative frequency closely\n", + "resembles $p(\\vec{t})$, then using numerics, it is straight forward to\n", + "estimate all the interesting parameters of $p(\\boldsymbol{t})$ using point\n", + "estimators." 
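+ {
+ "cell_type": "markdown",
+ "id": "ad0d0001",
+ "metadata": {
+ "editable": true
+ },
+ "source": [
+ "Returning briefly to the confidence intervals above, the following sketch (synthetic data with a known noise level $\\sigma$; in a real analysis $\\sigma$ would have to be estimated from the residuals) computes the variances $\\sigma^2[(\\boldsymbol{X}^T\\boldsymbol{X})^{-1}]_{jj}$ of the OLS parameters and the corresponding approximate $95\\%$ confidence intervals with $z=1.96$."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "ad0d0002",
+ "metadata": {
+ "collapsed": false,
+ "editable": true
+ },
+ "outputs": [],
+ "source": [
+ "import numpy as np\n",
+ "\n",
+ "np.random.seed(3155)\n",
+ "\n",
+ "# Synthetic data from a known quadratic model with noise level sigma\n",
+ "n = 200\n",
+ "sigma = 0.5\n",
+ "x = np.random.rand(n)\n",
+ "X = np.column_stack((np.ones(n), x, x**2))   # design matrix with intercept column\n",
+ "theta_true = np.array([1.0, -2.0, 3.0])\n",
+ "y = X @ theta_true + sigma*np.random.randn(n)\n",
+ "\n",
+ "# OLS estimate and the diagonal of its covariance matrix sigma^2 (X^T X)^{-1}\n",
+ "XtXinv = np.linalg.inv(X.T @ X)\n",
+ "theta_hat = XtXinv @ X.T @ y\n",
+ "var_theta = sigma**2*np.diag(XtXinv)\n",
+ "\n",
+ "# Approximate 95 percent confidence intervals, z = 1.96\n",
+ "z = 1.96\n",
+ "for j in range(len(theta_hat)):\n",
+ "    half_width = z*np.sqrt(var_theta[j])\n",
+ "    print('theta_%d = %7.4f +/- %.4f   (true value %4.1f)' % (j, theta_hat[j], half_width, theta_true[j]))"
+ ]
+ },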
+ ] + }, + { + "cell_type": "markdown", + "id": "6c240b38", + "metadata": { + "editable": true + }, + "source": [ + "## Resampling methods: More Bootstrap background\n", + "\n", + "In the case that $\\widehat{\\theta}$ has\n", + "more than one component, and the components are independent, we use the\n", + "same estimator on each component separately. If the probability\n", + "density function of $X_i$, $p(x)$, had been known, then it would have\n", + "been straightforward to do this by: \n", + "1. Drawing lots of numbers from $p(x)$, suppose we call one such set of numbers $(X_1^*, X_2^*, \\cdots, X_n^*)$. \n", + "\n", + "2. Then using these numbers, we could compute a replica of $\\widehat{\\theta}$ called $\\widehat{\\theta}^*$. \n", + "\n", + "By repeated use of the above two points, many\n", + "estimates of $\\widehat{\\theta}$ can be obtained. The\n", + "idea is to use the relative frequency of $\\widehat{\\theta}^*$\n", + "(think of a histogram) as an estimate of $p(\\boldsymbol{t})$." + ] + }, + { + "cell_type": "markdown", + "id": "fbd95a5c", + "metadata": { + "editable": true + }, + "source": [ + "## Resampling methods: Bootstrap approach\n", + "\n", + "But\n", + "unless there is enough information available about the process that\n", + "generated $X_1,X_2,\\cdots,X_n$, $p(x)$ is in general\n", + "unknown. Therefore, [Efron in 1979](https://projecteuclid.org/euclid.aos/1176344552) asked the\n", + "question: What if we replace $p(x)$ by the relative frequency\n", + "of the observation $X_i$?\n", + "\n", + "If we draw observations in accordance with\n", + "the relative frequency of the observations, will we obtain the same\n", + "result in some asymptotic sense? The answer is yes." + ] + }, + { + "cell_type": "markdown", + "id": "dc50d43a", + "metadata": { + "editable": true + }, + "source": [ + "## Resampling methods: Bootstrap steps\n", + "\n", + "The independent bootstrap works like this: \n", + "\n", + "1. Draw with replacement $n$ numbers for the observed variables $\\boldsymbol{x} = (x_1,x_2,\\cdots,x_n)$. \n", + "\n", + "2. Define a vector $\\boldsymbol{x}^*$ containing the values which were drawn from $\\boldsymbol{x}$. \n", + "\n", + "3. Using the vector $\\boldsymbol{x}^*$ compute $\\widehat{\\theta}^*$ by evaluating $\\widehat \\theta$ under the observations $\\boldsymbol{x}^*$. \n", + "\n", + "4. Repeat this process $k$ times. \n", + "\n", + "When you are done, you can draw a histogram of the relative frequency\n", + "of $\\widehat \\theta^*$. This is your estimate of the probability\n", + "distribution $p(t)$. Using this probability distribution you can\n", + "estimate any statistics thereof. In principle you never draw the\n", + "histogram of the relative frequency of $\\widehat{\\theta}^*$. Instead\n", + "you use the estimators corresponding to the statistic of interest. For\n", + "example, if you are interested in estimating the variance of $\\widehat\n", + "\\theta$, apply the etsimator $\\widehat \\sigma^2$ to the values\n", + "$\\widehat \\theta^*$." + ] + }, + { + "cell_type": "markdown", + "id": "283068cc", + "metadata": { + "editable": true + }, + "source": [ + "## Code example for the Bootstrap method\n", + "\n", + "The following code starts with a Gaussian distribution with mean value\n", + "$\\mu =100$ and variance $\\sigma=15$. We use this to generate the data\n", + "used in the bootstrap analysis. The bootstrap analysis returns a data\n", + "set after a given number of bootstrap operations (as many as we have\n", + "data points). 
This data set consists of estimated mean values for each\n", + "bootstrap operation. The histogram generated by the bootstrap method\n", + "shows that the distribution for these mean values is also a Gaussian,\n", + "centered around the mean value $\\mu=100$ but with standard deviation\n", + "$\\sigma/\\sqrt{n}$, where $n$ is the number of bootstrap samples (in\n", + "this case the same as the number of original data points). The value\n", + "of the standard deviation is what we expect from the central limit\n", + "theorem." + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "ff4790ba", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "%matplotlib inline\n", + "\n", + "import numpy as np\n", + "from time import time\n", + "from scipy.stats import norm\n", + "import matplotlib.pyplot as plt\n", + "\n", + "# Returns mean of bootstrap samples \n", + "# Bootstrap algorithm\n", + "def bootstrap(data, datapoints):\n", + " t = np.zeros(datapoints)\n", + " n = len(data)\n", + " # non-parametric bootstrap \n", + " for i in range(datapoints):\n", + " t[i] = np.mean(data[np.random.randint(0,n,n)])\n", + " # analysis \n", + " print(\"Bootstrap Statistics :\")\n", + " print(\"original bias std. error\")\n", + " print(\"%8g %8g %14g %15g\" % (np.mean(data), np.std(data),np.mean(t),np.std(t)))\n", + " return t\n", + "\n", + "# We set the mean value to 100 and the standard deviation to 15\n", + "mu, sigma = 100, 15\n", + "datapoints = 10000\n", + "# We generate random numbers according to the normal distribution\n", + "x = mu + sigma*np.random.randn(datapoints)\n", + "# bootstrap returns the data sample \n", + "t = bootstrap(x, datapoints)" + ] + }, + { + "cell_type": "markdown", + "id": "3e6adc2f", + "metadata": { + "editable": true + }, + "source": [ + "We see that our new variance and from that the standard deviation, agrees with the central limit theorem." + ] + }, + { + "cell_type": "markdown", + "id": "6ec8223c", + "metadata": { + "editable": true + }, + "source": [ + "## Plotting the Histogram" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "3cf4144d", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# the histogram of the bootstrapped data (normalized data if density = True)\n", + "n, binsboot, patches = plt.hist(t, 50, density=True, facecolor='red', alpha=0.75)\n", + "# add a 'best fit' line \n", + "y = norm.pdf(binsboot, np.mean(t), np.std(t))\n", + "lt = plt.plot(binsboot, y, 'b', linewidth=1)\n", + "plt.xlabel('x')\n", + "plt.ylabel('Probability')\n", + "plt.grid(True)\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "db5a8f91", + "metadata": { + "editable": true + }, + "source": [ + "## The bias-variance tradeoff\n", + "\n", + "We will discuss the bias-variance tradeoff in the context of\n", + "continuous predictions such as regression. However, many of the\n", + "intuitions and ideas discussed here also carry over to classification\n", + "tasks. Consider a dataset $\\mathcal{D}$ consisting of the data\n", + "$\\mathbf{X}_\\mathcal{D}=\\{(y_j, \\boldsymbol{x}_j), j=0\\ldots n-1\\}$. 
\n", + "\n", + "Let us assume that the true data is generated from a noisy model" + ] + }, + { + "cell_type": "markdown", + "id": "327bce6a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{y}=f(\\boldsymbol{x}) + \\boldsymbol{\\epsilon}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "1c671d4e", + "metadata": { + "editable": true + }, + "source": [ + "where $\\epsilon$ is normally distributed with mean zero and standard deviation $\\sigma^2$.\n", + "\n", + "In our derivation of the ordinary least squares method we defined then\n", + "an approximation to the function $f$ in terms of the parameters\n", + "$\\boldsymbol{\\theta}$ and the design matrix $\\boldsymbol{X}$ which embody our model,\n", + "that is $\\boldsymbol{\\tilde{y}}=\\boldsymbol{X}\\boldsymbol{\\theta}$. \n", + "\n", + "Thereafter we found the parameters $\\boldsymbol{\\theta}$ by optimizing the means squared error via the so-called cost function" + ] + }, + { + "cell_type": "markdown", + "id": "6e05fc43", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C(\\boldsymbol{X},\\boldsymbol{\\theta}) =\\frac{1}{n}\\sum_{i=0}^{n-1}(y_i-\\tilde{y}_i)^2=\\mathbb{E}\\left[(\\boldsymbol{y}-\\boldsymbol{\\tilde{y}})^2\\right].\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "c45e0752", + "metadata": { + "editable": true + }, + "source": [ + "We can rewrite this as" + ] + }, + { + "cell_type": "markdown", + "id": "bafa4ab6", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathbb{E}\\left[(\\boldsymbol{y}-\\boldsymbol{\\tilde{y}})^2\\right]=\\frac{1}{n}\\sum_i(f_i-\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right])^2+\\frac{1}{n}\\sum_i(\\tilde{y}_i-\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right])^2+\\sigma^2.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "ea0bc471", + "metadata": { + "editable": true + }, + "source": [ + "The three terms represent the square of the bias of the learning\n", + "method, which can be thought of as the error caused by the simplifying\n", + "assumptions built into the method. The second term represents the\n", + "variance of the chosen model and finally the last terms is variance of\n", + "the error $\\boldsymbol{\\epsilon}$.\n", + "\n", + "To derive this equation, we need to recall that the variance of $\\boldsymbol{y}$ and $\\boldsymbol{\\epsilon}$ are both equal to $\\sigma^2$. The mean value of $\\boldsymbol{\\epsilon}$ is by definition equal to zero. 
Furthermore, the function $f$ is not a stochastics variable, idem for $\\boldsymbol{\\tilde{y}}$.\n", + "We use a more compact notation in terms of the expectation value" + ] + }, + { + "cell_type": "markdown", + "id": "08b603f3", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathbb{E}\\left[(\\boldsymbol{y}-\\boldsymbol{\\tilde{y}})^2\\right]=\\mathbb{E}\\left[(\\boldsymbol{f}+\\boldsymbol{\\epsilon}-\\boldsymbol{\\tilde{y}})^2\\right],\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "4114d10e", + "metadata": { + "editable": true + }, + "source": [ + "and adding and subtracting $\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right]$ we get" + ] + }, + { + "cell_type": "markdown", + "id": "8890c666", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathbb{E}\\left[(\\boldsymbol{y}-\\boldsymbol{\\tilde{y}})^2\\right]=\\mathbb{E}\\left[(\\boldsymbol{f}+\\boldsymbol{\\epsilon}-\\boldsymbol{\\tilde{y}}+\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right]-\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right])^2\\right],\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "7d5b7ce4", + "metadata": { + "editable": true + }, + "source": [ + "which, using the abovementioned expectation values can be rewritten as" + ] + }, + { + "cell_type": "markdown", + "id": "3913c5b9", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathbb{E}\\left[(\\boldsymbol{y}-\\boldsymbol{\\tilde{y}})^2\\right]=\\mathbb{E}\\left[(\\boldsymbol{y}-\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right])^2\\right]+\\mathrm{Var}\\left[\\boldsymbol{\\tilde{y}}\\right]+\\sigma^2,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "5e0067b1", + "metadata": { + "editable": true + }, + "source": [ + "that is the rewriting in terms of the so-called bias, the variance of the model $\\boldsymbol{\\tilde{y}}$ and the variance of $\\boldsymbol{\\epsilon}$." + ] + }, + { + "cell_type": "markdown", + "id": "326bc8f1", + "metadata": { + "editable": true + }, + "source": [ + "## A way to Read the Bias-Variance Tradeoff\n", + "\n", + "\n", + "\n", + "\n", + "

Figure 1: illustration of the bias-variance tradeoff.

\n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "d3713eca", + "metadata": { + "editable": true + }, + "source": [ + "## Example code for Bias-Variance tradeoff" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "01c3b507", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import matplotlib.pyplot as plt\n", + "import numpy as np\n", + "from sklearn.linear_model import LinearRegression, Ridge, Lasso\n", + "from sklearn.preprocessing import PolynomialFeatures\n", + "from sklearn.model_selection import train_test_split\n", + "from sklearn.pipeline import make_pipeline\n", + "from sklearn.utils import resample\n", + "\n", + "np.random.seed(2018)\n", + "\n", + "n = 500\n", + "n_boostraps = 100\n", + "degree = 18 # A quite high value, just to show.\n", + "noise = 0.1\n", + "\n", + "# Make data set.\n", + "x = np.linspace(-1, 3, n).reshape(-1, 1)\n", + "y = np.exp(-x**2) + 1.5 * np.exp(-(x-2)**2) + np.random.normal(0, 0.1, x.shape)\n", + "\n", + "# Hold out some test data that is never used in training.\n", + "x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)\n", + "\n", + "# Combine x transformation and model into one operation.\n", + "# Not neccesary, but convenient.\n", + "model = make_pipeline(PolynomialFeatures(degree=degree), LinearRegression(fit_intercept=False))\n", + "\n", + "# The following (m x n_bootstraps) matrix holds the column vectors y_pred\n", + "# for each bootstrap iteration.\n", + "y_pred = np.empty((y_test.shape[0], n_boostraps))\n", + "for i in range(n_boostraps):\n", + " x_, y_ = resample(x_train, y_train)\n", + "\n", + " # Evaluate the new model on the same test data each time.\n", + " y_pred[:, i] = model.fit(x_, y_).predict(x_test).ravel()\n", + "\n", + "# Note: Expectations and variances taken w.r.t. different training\n", + "# data sets, hence the axis=1. Subsequent means are taken across the test data\n", + "# set in order to obtain a total value, but before this we have error/bias/variance\n", + "# calculated per data point in the test set.\n", + "# Note 2: The use of keepdims=True is important in the calculation of bias as this \n", + "# maintains the column vector form. 
Dropping this yields very unexpected results.\n", + "error = np.mean( np.mean((y_test - y_pred)**2, axis=1, keepdims=True) )\n", + "bias = np.mean( (y_test - np.mean(y_pred, axis=1, keepdims=True))**2 )\n", + "variance = np.mean( np.var(y_pred, axis=1, keepdims=True) )\n", + "print('Error:', error)\n", + "print('Bias^2:', bias)\n", + "print('Var:', variance)\n", + "print('{} >= {} + {} = {}'.format(error, bias, variance, bias+variance))\n", + "\n", + "plt.plot(x[::5, :], y[::5, :], label='f(x)')\n", + "plt.scatter(x_test, y_test, label='Data points')\n", + "plt.scatter(x_test, np.mean(y_pred, axis=1), label='Pred')\n", + "plt.legend()\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "949e3a5e", + "metadata": { + "editable": true + }, + "source": [ + "## Understanding what happens" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "7e7f4926", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import matplotlib.pyplot as plt\n", + "import numpy as np\n", + "from sklearn.linear_model import LinearRegression, Ridge, Lasso\n", + "from sklearn.preprocessing import PolynomialFeatures\n", + "from sklearn.model_selection import train_test_split\n", + "from sklearn.pipeline import make_pipeline\n", + "from sklearn.utils import resample\n", + "\n", + "np.random.seed(2018)\n", + "\n", + "n = 40\n", + "n_boostraps = 100\n", + "maxdegree = 14\n", + "\n", + "\n", + "# Make data set.\n", + "x = np.linspace(-3, 3, n).reshape(-1, 1)\n", + "y = np.exp(-x**2) + 1.5 * np.exp(-(x-2)**2)+ np.random.normal(0, 0.1, x.shape)\n", + "error = np.zeros(maxdegree)\n", + "bias = np.zeros(maxdegree)\n", + "variance = np.zeros(maxdegree)\n", + "polydegree = np.zeros(maxdegree)\n", + "x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)\n", + "\n", + "for degree in range(maxdegree):\n", + " model = make_pipeline(PolynomialFeatures(degree=degree), LinearRegression(fit_intercept=False))\n", + " y_pred = np.empty((y_test.shape[0], n_boostraps))\n", + " for i in range(n_boostraps):\n", + " x_, y_ = resample(x_train, y_train)\n", + " y_pred[:, i] = model.fit(x_, y_).predict(x_test).ravel()\n", + "\n", + " polydegree[degree] = degree\n", + " error[degree] = np.mean( np.mean((y_test - y_pred)**2, axis=1, keepdims=True) )\n", + " bias[degree] = np.mean( (y_test - np.mean(y_pred, axis=1, keepdims=True))**2 )\n", + " variance[degree] = np.mean( np.var(y_pred, axis=1, keepdims=True) )\n", + " print('Polynomial degree:', degree)\n", + " print('Error:', error[degree])\n", + " print('Bias^2:', bias[degree])\n", + " print('Var:', variance[degree])\n", + " print('{} >= {} + {} = {}'.format(error[degree], bias[degree], variance[degree], bias[degree]+variance[degree]))\n", + "\n", + "plt.plot(polydegree, error, label='Error')\n", + "plt.plot(polydegree, bias, label='bias')\n", + "plt.plot(polydegree, variance, label='Variance')\n", + "plt.legend()\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "33c5cae5", + "metadata": { + "editable": true + }, + "source": [ + "## Summing up\n", + "\n", + "The bias-variance tradeoff summarizes the fundamental tension in\n", + "machine learning, particularly supervised learning, between the\n", + "complexity of a model and the amount of training data needed to train\n", + "it. 
Since data is often limited, in practice it is often useful to\n", + "use a less-complex model with higher bias, that is a model whose asymptotic\n", + "performance is worse than another model because it is easier to\n", + "train and less sensitive to sampling noise arising from having a\n", + "finite-sized training dataset (smaller variance). \n", + "\n", + "The above equations tell us that in\n", + "order to minimize the expected test error, we need to select a\n", + "statistical learning method that simultaneously achieves low variance\n", + "and low bias. Note that variance is inherently a nonnegative quantity,\n", + "and squared bias is also nonnegative. Hence, we see that the expected\n", + "test MSE can never lie below $Var(\\epsilon)$, the irreducible error.\n", + "\n", + "What do we mean by the variance and bias of a statistical learning\n", + "method? The variance refers to the amount by which our model would change if we\n", + "estimated it using a different training data set. Since the training\n", + "data are used to fit the statistical learning method, different\n", + "training data sets will result in a different estimate. But ideally the\n", + "estimate for our model should not vary too much between training\n", + "sets. However, if a method has high variance then small changes in\n", + "the training data can result in large changes in the model. In general, more\n", + "flexible statistical methods have higher variance.\n", + "\n", + "You may also find this recent [article](https://www.pnas.org/content/116/32/15849) of interest." + ] + }, + { + "cell_type": "markdown", + "id": "f931f0f2", + "metadata": { + "editable": true + }, + "source": [ + "## Another Example from Scikit-Learn's Repository\n", + "\n", + "This example demonstrates the problems of underfitting and overfitting and\n", + "how we can use linear regression with polynomial features to approximate\n", + "nonlinear functions. The plot shows the function that we want to approximate,\n", + "which is a part of the cosine function. In addition, the samples from the\n", + "real function and the approximations of different models are displayed. The\n", + "models have polynomial features of different degrees. We can see that a\n", + "linear function (polynomial with degree 1) is not sufficient to fit the\n", + "training samples. This is called **underfitting**. A polynomial of degree 4\n", + "approximates the true function almost perfectly. However, for higher degrees\n", + "the model will **overfit** the training data, i.e. it learns the noise of the\n", + "training data.\n", + "We evaluate quantitatively overfitting and underfitting by using\n", + "cross-validation. We calculate the mean squared error (MSE) on the validation\n", + "set, the higher, the less likely the model generalizes correctly from the\n", + "training data." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "58daa28d", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "\n", + "\n", + "#print(__doc__)\n", + "\n", + "import numpy as np\n", + "import matplotlib.pyplot as plt\n", + "from sklearn.pipeline import Pipeline\n", + "from sklearn.preprocessing import PolynomialFeatures\n", + "from sklearn.linear_model import LinearRegression\n", + "from sklearn.model_selection import cross_val_score\n", + "\n", + "\n", + "def true_fun(X):\n", + " return np.cos(1.5 * np.pi * X)\n", + "\n", + "np.random.seed(0)\n", + "\n", + "n_samples = 30\n", + "degrees = [1, 4, 15]\n", + "\n", + "X = np.sort(np.random.rand(n_samples))\n", + "y = true_fun(X) + np.random.randn(n_samples) * 0.1\n", + "\n", + "plt.figure(figsize=(14, 5))\n", + "for i in range(len(degrees)):\n", + " ax = plt.subplot(1, len(degrees), i + 1)\n", + " plt.setp(ax, xticks=(), yticks=())\n", + "\n", + " polynomial_features = PolynomialFeatures(degree=degrees[i],\n", + " include_bias=False)\n", + " linear_regression = LinearRegression()\n", + " pipeline = Pipeline([(\"polynomial_features\", polynomial_features),\n", + " (\"linear_regression\", linear_regression)])\n", + " pipeline.fit(X[:, np.newaxis], y)\n", + "\n", + " # Evaluate the models using crossvalidation\n", + " scores = cross_val_score(pipeline, X[:, np.newaxis], y,\n", + " scoring=\"neg_mean_squared_error\", cv=10)\n", + "\n", + " X_test = np.linspace(0, 1, 100)\n", + " plt.plot(X_test, pipeline.predict(X_test[:, np.newaxis]), label=\"Model\")\n", + " plt.plot(X_test, true_fun(X_test), label=\"True function\")\n", + " plt.scatter(X, y, edgecolor='b', s=20, label=\"Samples\")\n", + " plt.xlabel(\"x\")\n", + " plt.ylabel(\"y\")\n", + " plt.xlim((0, 1))\n", + " plt.ylim((-2, 2))\n", + " plt.legend(loc=\"best\")\n", + " plt.title(\"Degree {}\\nMSE = {:.2e}(+/- {:.2e})\".format(\n", + " degrees[i], -scores.mean(), scores.std()))\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "3bbcf741", + "metadata": { + "editable": true + }, + "source": [ + "## Various steps in cross-validation\n", + "\n", + "When the repetitive splitting of the data set is done randomly,\n", + "samples may accidently end up in a fast majority of the splits in\n", + "either training or test set. Such samples may have an unbalanced\n", + "influence on either model building or prediction evaluation. To avoid\n", + "this $k$-fold cross-validation structures the data splitting. The\n", + "samples are divided into $k$ more or less equally sized exhaustive and\n", + "mutually exclusive subsets. In turn (at each split) one of these\n", + "subsets plays the role of the test set while the union of the\n", + "remaining subsets constitutes the training set. Such a splitting\n", + "warrants a balanced representation of each sample in both training and\n", + "test set over the splits. Still the division into the $k$ subsets\n", + "involves a degree of randomness. This may be fully excluded when\n", + "choosing $k=n$. This particular case is referred to as leave-one-out\n", + "cross-validation (LOOCV)." + ] + }, + { + "cell_type": "markdown", + "id": "4b0ffe06", + "metadata": { + "editable": true + }, + "source": [ + "## Cross-validation in brief\n", + "\n", + "For the various values of $k$\n", + "\n", + "1. shuffle the dataset randomly.\n", + "\n", + "2. Split the dataset into $k$ groups.\n", + "\n", + "3. For each unique group:\n", + "\n", + "a. 
Decide which group to use as set for test data\n", + "\n", + "b. Take the remaining groups as a training data set\n", + "\n", + "c. Fit a model on the training set and evaluate it on the test set\n", + "\n", + "d. Retain the evaluation score and discard the model\n", + "\n", + "5. Summarize the model using the sample of model evaluation scores" + ] + }, + { + "cell_type": "markdown", + "id": "b11baed6", + "metadata": { + "editable": true + }, + "source": [ + "## Code Example for Cross-validation and $k$-fold Cross-validation\n", + "\n", + "The code here uses Ridge regression with cross-validation (CV) resampling and $k$-fold CV in order to fit a specific polynomial." + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "39e76d49", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import numpy as np\n", + "import matplotlib.pyplot as plt\n", + "from sklearn.model_selection import KFold\n", + "from sklearn.linear_model import Ridge\n", + "from sklearn.model_selection import cross_val_score\n", + "from sklearn.preprocessing import PolynomialFeatures\n", + "\n", + "# A seed just to ensure that the random numbers are the same for every run.\n", + "# Useful for eventual debugging.\n", + "np.random.seed(3155)\n", + "\n", + "# Generate the data.\n", + "nsamples = 100\n", + "x = np.random.randn(nsamples)\n", + "y = 3*x**2 + np.random.randn(nsamples)\n", + "\n", + "## Cross-validation on Ridge regression using KFold only\n", + "\n", + "# Decide degree on polynomial to fit\n", + "poly = PolynomialFeatures(degree = 6)\n", + "\n", + "# Decide which values of lambda to use\n", + "nlambdas = 500\n", + "lambdas = np.logspace(-3, 5, nlambdas)\n", + "\n", + "# Initialize a KFold instance\n", + "k = 5\n", + "kfold = KFold(n_splits = k)\n", + "\n", + "# Perform the cross-validation to estimate MSE\n", + "scores_KFold = np.zeros((nlambdas, k))\n", + "\n", + "i = 0\n", + "for lmb in lambdas:\n", + " ridge = Ridge(alpha = lmb)\n", + " j = 0\n", + " for train_inds, test_inds in kfold.split(x):\n", + " xtrain = x[train_inds]\n", + " ytrain = y[train_inds]\n", + "\n", + " xtest = x[test_inds]\n", + " ytest = y[test_inds]\n", + "\n", + " Xtrain = poly.fit_transform(xtrain[:, np.newaxis])\n", + " ridge.fit(Xtrain, ytrain[:, np.newaxis])\n", + "\n", + " Xtest = poly.fit_transform(xtest[:, np.newaxis])\n", + " ypred = ridge.predict(Xtest)\n", + "\n", + " scores_KFold[i,j] = np.sum((ypred - ytest[:, np.newaxis])**2)/np.size(ypred)\n", + "\n", + " j += 1\n", + " i += 1\n", + "\n", + "\n", + "estimated_mse_KFold = np.mean(scores_KFold, axis = 1)\n", + "\n", + "## Cross-validation using cross_val_score from sklearn along with KFold\n", + "\n", + "# kfold is an instance initialized above as:\n", + "# kfold = KFold(n_splits = k)\n", + "\n", + "estimated_mse_sklearn = np.zeros(nlambdas)\n", + "i = 0\n", + "for lmb in lambdas:\n", + " ridge = Ridge(alpha = lmb)\n", + "\n", + " X = poly.fit_transform(x[:, np.newaxis])\n", + " estimated_mse_folds = cross_val_score(ridge, X, y[:, np.newaxis], scoring='neg_mean_squared_error', cv=kfold)\n", + "\n", + " # cross_val_score return an array containing the estimated negative mse for every fold.\n", + " # we have to the the mean of every array in order to get an estimate of the mse of the model\n", + " estimated_mse_sklearn[i] = np.mean(-estimated_mse_folds)\n", + "\n", + " i += 1\n", + "\n", + "## Plot and compare the slightly different ways to perform cross-validation\n", + "\n", + "plt.figure()\n", + "\n", + 
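"# Both loops above use the same KFold splits (the same kfold instance with k = 5),\n",
+ "# so the two curves plotted below should essentially coincide.\n",
+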
"plt.plot(np.log10(lambdas), estimated_mse_sklearn, label = 'cross_val_score')\n", + "plt.plot(np.log10(lambdas), estimated_mse_KFold, 'r--', label = 'KFold')\n", + "\n", + "plt.xlabel('log10(lambda)')\n", + "plt.ylabel('mse')\n", + "\n", + "plt.legend()\n", + "\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "e7d12ef0", + "metadata": { + "editable": true + }, + "source": [ + "## More examples on bootstrap and cross-validation and errors" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "47f6ae18", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# Common imports\n", + "import os\n", + "import numpy as np\n", + "import pandas as pd\n", + "import matplotlib.pyplot as plt\n", + "from sklearn.linear_model import LinearRegression, Ridge, Lasso\n", + "from sklearn.model_selection import train_test_split\n", + "from sklearn.utils import resample\n", + "from sklearn.metrics import mean_squared_error\n", + "# Where to save the figures and data files\n", + "PROJECT_ROOT_DIR = \"Results\"\n", + "FIGURE_ID = \"Results/FigureFiles\"\n", + "DATA_ID = \"DataFiles/\"\n", + "\n", + "if not os.path.exists(PROJECT_ROOT_DIR):\n", + " os.mkdir(PROJECT_ROOT_DIR)\n", + "\n", + "if not os.path.exists(FIGURE_ID):\n", + " os.makedirs(FIGURE_ID)\n", + "\n", + "if not os.path.exists(DATA_ID):\n", + " os.makedirs(DATA_ID)\n", + "\n", + "def image_path(fig_id):\n", + " return os.path.join(FIGURE_ID, fig_id)\n", + "\n", + "def data_path(dat_id):\n", + " return os.path.join(DATA_ID, dat_id)\n", + "\n", + "def save_fig(fig_id):\n", + " plt.savefig(image_path(fig_id) + \".png\", format='png')\n", + "\n", + "infile = open(data_path(\"EoS.csv\"),'r')\n", + "\n", + "# Read the EoS data as csv file and organize the data into two arrays with density and energies\n", + "EoS = pd.read_csv(infile, names=('Density', 'Energy'))\n", + "EoS['Energy'] = pd.to_numeric(EoS['Energy'], errors='coerce')\n", + "EoS = EoS.dropna()\n", + "Energies = EoS['Energy']\n", + "Density = EoS['Density']\n", + "# The design matrix now as function of various polytrops\n", + "\n", + "Maxpolydegree = 30\n", + "X = np.zeros((len(Density),Maxpolydegree))\n", + "X[:,0] = 1.0\n", + "testerror = np.zeros(Maxpolydegree)\n", + "trainingerror = np.zeros(Maxpolydegree)\n", + "polynomial = np.zeros(Maxpolydegree)\n", + "\n", + "trials = 100\n", + "for polydegree in range(1, Maxpolydegree):\n", + " polynomial[polydegree] = polydegree\n", + " for degree in range(polydegree):\n", + " X[:,degree] = Density**(degree/3.0)\n", + "\n", + "# loop over trials in order to estimate the expectation value of the MSE\n", + " testerror[polydegree] = 0.0\n", + " trainingerror[polydegree] = 0.0\n", + " for samples in range(trials):\n", + " x_train, x_test, y_train, y_test = train_test_split(X, Energies, test_size=0.2)\n", + " model = LinearRegression(fit_intercept=False).fit(x_train, y_train)\n", + " ypred = model.predict(x_train)\n", + " ytilde = model.predict(x_test)\n", + " testerror[polydegree] += mean_squared_error(y_test, ytilde)\n", + " trainingerror[polydegree] += mean_squared_error(y_train, ypred) \n", + "\n", + " testerror[polydegree] /= trials\n", + " trainingerror[polydegree] /= trials\n", + " print(\"Degree of polynomial: %3d\"% polynomial[polydegree])\n", + " print(\"Mean squared error on training data: %.8f\" % trainingerror[polydegree])\n", + " print(\"Mean squared error on test data: %.8f\" % testerror[polydegree])\n", + "\n", + "plt.plot(polynomial, np.log10(trainingerror), 
label='Training Error')\n", + "plt.plot(polynomial, np.log10(testerror), label='Test Error')\n", + "plt.xlabel('Polynomial degree')\n", + "plt.ylabel('log10[MSE]')\n", + "plt.legend()\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "9c1d4754", + "metadata": { + "editable": true + }, + "source": [ + "Note that we kept the intercept column in the fitting here. This means that we need to set the **intercept** in the call to the **Scikit-Learn** function as **False**. Alternatively, we could have set up the design matrix $X$ without the first column of ones." + ] + }, + { + "cell_type": "markdown", + "id": "b698ac66", + "metadata": { + "editable": true + }, + "source": [ + "## The same example but now with cross-validation\n", + "\n", + "In this example we keep the intercept column again but add cross-validation in order to estimate the best possible value of the means squared error." + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "0a2409b0", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# Common imports\n", + "import os\n", + "import numpy as np\n", + "import pandas as pd\n", + "import matplotlib.pyplot as plt\n", + "from sklearn.linear_model import LinearRegression, Ridge, Lasso\n", + "from sklearn.metrics import mean_squared_error\n", + "from sklearn.model_selection import KFold\n", + "from sklearn.model_selection import cross_val_score\n", + "\n", + "\n", + "# Where to save the figures and data files\n", + "PROJECT_ROOT_DIR = \"Results\"\n", + "FIGURE_ID = \"Results/FigureFiles\"\n", + "DATA_ID = \"DataFiles/\"\n", + "\n", + "if not os.path.exists(PROJECT_ROOT_DIR):\n", + " os.mkdir(PROJECT_ROOT_DIR)\n", + "\n", + "if not os.path.exists(FIGURE_ID):\n", + " os.makedirs(FIGURE_ID)\n", + "\n", + "if not os.path.exists(DATA_ID):\n", + " os.makedirs(DATA_ID)\n", + "\n", + "def image_path(fig_id):\n", + " return os.path.join(FIGURE_ID, fig_id)\n", + "\n", + "def data_path(dat_id):\n", + " return os.path.join(DATA_ID, dat_id)\n", + "\n", + "def save_fig(fig_id):\n", + " plt.savefig(image_path(fig_id) + \".png\", format='png')\n", + "\n", + "infile = open(data_path(\"EoS.csv\"),'r')\n", + "\n", + "# Read the EoS data as csv file and organize the data into two arrays with density and energies\n", + "EoS = pd.read_csv(infile, names=('Density', 'Energy'))\n", + "EoS['Energy'] = pd.to_numeric(EoS['Energy'], errors='coerce')\n", + "EoS = EoS.dropna()\n", + "Energies = EoS['Energy']\n", + "Density = EoS['Density']\n", + "# The design matrix now as function of various polytrops\n", + "\n", + "Maxpolydegree = 30\n", + "X = np.zeros((len(Density),Maxpolydegree))\n", + "X[:,0] = 1.0\n", + "estimated_mse_sklearn = np.zeros(Maxpolydegree)\n", + "polynomial = np.zeros(Maxpolydegree)\n", + "k =5\n", + "kfold = KFold(n_splits = k)\n", + "\n", + "for polydegree in range(1, Maxpolydegree):\n", + " polynomial[polydegree] = polydegree\n", + " for degree in range(polydegree):\n", + " X[:,degree] = Density**(degree/3.0)\n", + " OLS = LinearRegression(fit_intercept=False)\n", + "# loop over trials in order to estimate the expectation value of the MSE\n", + " estimated_mse_folds = cross_val_score(OLS, X, Energies, scoring='neg_mean_squared_error', cv=kfold)\n", + "#[:, np.newaxis]\n", + " estimated_mse_sklearn[polydegree] = np.mean(-estimated_mse_folds)\n", + "\n", + "plt.plot(polynomial, np.log10(estimated_mse_sklearn), label='Test Error')\n", + "plt.xlabel('Polynomial degree')\n", + "plt.ylabel('log10[MSE]')\n", + 
"plt.legend()\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "56f130b5", + "metadata": { + "editable": true + }, + "source": [ + "## Material for the lab sessions\n", + "\n", + "This week we will discuss during the first hour of each lab session\n", + "some technicalities related to the project and methods for updating\n", + "the learning like ADAgrad, RMSprop and ADAM. As teaching material, see\n", + "the jupyter-notebook from week 37 (September 12-16).\n", + "\n", + "For the lab session, the following video on cross validation (from 2024), could be helpful, see \n", + "\n", + "See also video on ADAgrad, RMSprop and ADAM (material from last week not covered during lecture) at " + ] + } + ], + "metadata": {}, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/doc/LectureNotes/_build/html/_sources/week39.ipynb b/doc/LectureNotes/_build/html/_sources/week39.ipynb new file mode 100644 index 000000000..1f411fe62 --- /dev/null +++ b/doc/LectureNotes/_build/html/_sources/week39.ipynb @@ -0,0 +1,2430 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "3a65fcc4", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "284ac98b", + "metadata": { + "editable": true + }, + "source": [ + "# Week 39: Resampling methods and logistic regression\n", + "**Morten Hjorth-Jensen**, Department of Physics, University of Oslo\n", + "\n", + "Date: **Week 39**" + ] + }, + { + "cell_type": "markdown", + "id": "582e0b32", + "metadata": { + "editable": true + }, + "source": [ + "## Plan for week 39, September 22-26, 2025\n", + "\n", + "**Material for the lecture on Monday September 22.**\n", + "\n", + "1. Resampling techniques, Bootstrap and cross validation and bias-variance tradeoff\n", + "\n", + "2. Logistic regression, our first classification encounter and a stepping stone towards neural networks\n", + "\n", + "3. [Video of lecture](https://youtu.be/OVouJyhoksY)\n", + "\n", + "4. [Whiteboard notes](https://github.com/CompPhysics/MachineLearning/blob/master/doc/HandWrittenNotes/2024/FYSSTKweek39.pdf)" + ] + }, + { + "cell_type": "markdown", + "id": "08ea52de", + "metadata": { + "editable": true + }, + "source": [ + "## Readings and Videos, resampling methods\n", + "1. Raschka et al, pages 175-192\n", + "\n", + "2. Hastie et al Chapter 7, here we recommend 7.1-7.5 and 7.10 (cross-validation) and 7.11 (bootstrap). See .\n", + "\n", + "3. [Video on bias-variance tradeoff](https://www.youtube.com/watch?v=EuBBz3bI-aA)\n", + "\n", + "4. [Video on Bootstrapping](https://www.youtube.com/watch?v=Xz0x-8-cgaQ)\n", + "\n", + "5. [Video on cross validation](https://www.youtube.com/watch?v=fSytzGwwBVw)" + ] + }, + { + "cell_type": "markdown", + "id": "a8d5878f", + "metadata": { + "editable": true + }, + "source": [ + "## Readings and Videos, logistic regression\n", + "1. Hastie et al 4.1, 4.2 and 4.3 on logistic regression\n", + "\n", + "2. Raschka et al, pages 53-76 on Logistic regression and pages 37-52 on gradient optimization\n", + "\n", + "3. [Video on Logistic regression](https://www.youtube.com/watch?v=C5268D9t9Ak)\n", + "\n", + "4. [Yet another video on logistic regression](https://www.youtube.com/watch?v=yIYKR4sgzI8)" + ] + }, + { + "cell_type": "markdown", + "id": "e93210f9", + "metadata": { + "editable": true + }, + "source": [ + "## Lab sessions week 39\n", + "\n", + "**Material for the lab sessions on Tuesday and Wednesday.**\n", + "\n", + "1. 
Discussions on how to structure your report for the first project\n", + "\n", + "2. Exercise for week 39 on how to write the abstract and the introduction of the report and how to include references. \n", + "\n", + "3. Work on project 1, in particular resampling methods like cross-validation and bootstrap. **For more discussions of project 1, chapter 5 of Goodfellow et al is a good read, in particular sections 5.1-5.5 and 5.7-5.11**.\n", + "\n", + "4. [Video on how to write scientific reports recorded during one of the lab sessions](https://youtu.be/tVW1ZDmZnwM)\n", + "\n", + "5. A general guideline can be found at ." + ] + }, + { + "cell_type": "markdown", + "id": "c319a504", + "metadata": { + "editable": true + }, + "source": [ + "## Lecture material" + ] + }, + { + "cell_type": "markdown", + "id": "5f29284a", + "metadata": { + "editable": true + }, + "source": [ + "## Resampling methods\n", + "Resampling methods are an indispensable tool in modern\n", + "statistics. They involve repeatedly drawing samples from a training\n", + "set and refitting a model of interest on each sample in order to\n", + "obtain additional information about the fitted model. For example, in\n", + "order to estimate the variability of a linear regression fit, we can\n", + "repeatedly draw different samples from the training data, fit a linear\n", + "regression to each new sample, and then examine the extent to which\n", + "the resulting fits differ. Such an approach may allow us to obtain\n", + "information that would not be available from fitting the model only\n", + "once using the original training sample.\n", + "\n", + "Two resampling methods are often used in Machine Learning analyses,\n", + "1. The **bootstrap method**\n", + "\n", + "2. and **Cross-Validation**\n", + "\n", + "In addition there are several other methods such as the Jackknife and the Blocking methods. This week will repeat some of the elements of the bootstrap method and focus more on cross-validation." + ] + }, + { + "cell_type": "markdown", + "id": "4a774608", + "metadata": { + "editable": true + }, + "source": [ + "## Resampling approaches can be computationally expensive\n", + "\n", + "Resampling approaches can be computationally expensive, because they\n", + "involve fitting the same statistical method multiple times using\n", + "different subsets of the training data. However, due to recent\n", + "advances in computing power, the computational requirements of\n", + "resampling methods generally are not prohibitive. In this chapter, we\n", + "discuss two of the most commonly used resampling methods,\n", + "cross-validation and the bootstrap. Both methods are important tools\n", + "in the practical application of many statistical learning\n", + "procedures. For example, cross-validation can be used to estimate the\n", + "test error associated with a given statistical learning method in\n", + "order to evaluate its performance, or to select the appropriate level\n", + "of flexibility. The process of evaluating a model’s performance is\n", + "known as model assessment, whereas the process of selecting the proper\n", + "level of flexibility for a model is known as model selection. The\n", + "bootstrap is widely used." + ] + }, + { + "cell_type": "markdown", + "id": "5e62c381", + "metadata": { + "editable": true + }, + "source": [ + "## Why resampling methods ?\n", + "**Statistical analysis.**\n", + "\n", + "* Our simulations can be treated as *computer experiments*. 
This is particularly the case for Monte Carlo methods which are widely used in statistical analyses.\n", + "\n", + "* The results can be analysed with the same statistical tools as we would use when analysing experimental data.\n", + "\n", + "* As in all experiments, we are looking for expectation values and an estimate of how accurate they are, i.e., possible sources for errors." + ] + }, + { + "cell_type": "markdown", + "id": "96896342", + "metadata": { + "editable": true + }, + "source": [ + "## Statistical analysis\n", + "\n", + "* As in other experiments, many numerical experiments have two classes of errors:\n", + "\n", + " * Statistical errors\n", + "\n", + " * Systematical errors\n", + "\n", + "* Statistical errors can be estimated using standard tools from statistics\n", + "\n", + "* Systematical errors are method specific and must be treated differently from case to case." + ] + }, + { + "cell_type": "markdown", + "id": "d5318be7", + "metadata": { + "editable": true + }, + "source": [ + "## Resampling methods\n", + "\n", + "With all these analytical equations for both the OLS and Ridge\n", + "regression, we will now outline how to assess a given model. This will\n", + "lead to a discussion of the so-called bias-variance tradeoff (see\n", + "below) and so-called resampling methods.\n", + "\n", + "One of the quantities we have discussed as a way to measure errors is\n", + "the mean-squared error (MSE), mainly used for fitting of continuous\n", + "functions. Another choice is the absolute error.\n", + "\n", + "In the discussions below we will focus on the MSE and in particular since we will split the data into test and training data,\n", + "we discuss the\n", + "1. prediction error or simply the **test error** $\\mathrm{Err_{Test}}$, where we have a fixed training set and the test error is the MSE arising from the data reserved for testing. We discuss also the \n", + "\n", + "2. training error $\\mathrm{Err_{Train}}$, which is the average loss over the training data.\n", + "\n", + "As our model becomes more and more complex, more of the training data tends to used. The training may thence adapt to more complicated structures in the data. This may lead to a decrease in the bias (see below for code example) and a slight increase of the variance for the test error.\n", + "For a certain level of complexity the test error will reach minimum, before starting to increase again. The\n", + "training error reaches a saturation." + ] + }, + { + "cell_type": "markdown", + "id": "7597015e", + "metadata": { + "editable": true + }, + "source": [ + "## Resampling methods: Bootstrap\n", + "Bootstrapping is a [non-parametric approach](https://en.wikipedia.org/wiki/Nonparametric_statistics) to statistical inference\n", + "that substitutes computation for more traditional distributional\n", + "assumptions and asymptotic results. Bootstrapping offers a number of\n", + "advantages: \n", + "1. The bootstrap is quite general, although there are some cases in which it fails. \n", + "\n", + "2. Because it does not require distributional assumptions (such as normally distributed errors), the bootstrap can provide more accurate inferences when the data are not well behaved or when the sample size is small. \n", + "\n", + "3. It is possible to apply the bootstrap to statistics with sampling distributions that are difficult to derive, even asymptotically. \n", + "\n", + "4. 
It is relatively simple to apply the bootstrap to complex data-collection plans (such as stratified and clustered samples).\n", + "\n", + "The textbook by [Davison on the Bootstrap Methods and their Applications](https://www.cambridge.org/core/books/bootstrap-methods-and-their-application/ED2FD043579F27952363566DC09CBD6A) provides many more insights and proofs. In this course we will take a more practical approach and use the results and theorems provided in the literature. For those interested in reading more about the bootstrap methods, we recommend the above text and the one by [Efron and Tibshirani](https://www.routledge.com/An-Introduction-to-the-Bootstrap/Efron-Tibshirani/p/book/9780412042317)." + ] + }, + { + "cell_type": "markdown", + "id": "fbf69230", + "metadata": { + "editable": true + }, + "source": [ + "## The bias-variance tradeoff\n", + "\n", + "We will discuss the bias-variance tradeoff in the context of\n", + "continuous predictions such as regression. However, many of the\n", + "intuitions and ideas discussed here also carry over to classification\n", + "tasks. Consider a dataset $\\mathcal{D}$ consisting of the data\n", + "$\\mathbf{X}_\\mathcal{D}=\\{(y_j, \\boldsymbol{x}_j), j=0\\ldots n-1\\}$. \n", + "\n", + "Let us assume that the true data is generated from a noisy model" + ] + }, + { + "cell_type": "markdown", + "id": "358f7872", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{y}=f(\\boldsymbol{x}) + \\boldsymbol{\\epsilon}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "6a4aceef", + "metadata": { + "editable": true + }, + "source": [ + "where $\\epsilon$ is normally distributed with mean zero and standard deviation $\\sigma^2$.\n", + "\n", + "In our derivation of the ordinary least squares method we defined then\n", + "an approximation to the function $f$ in terms of the parameters\n", + "$\\boldsymbol{\\theta}$ and the design matrix $\\boldsymbol{X}$ which embody our model,\n", + "that is $\\boldsymbol{\\tilde{y}}=\\boldsymbol{X}\\boldsymbol{\\theta}$. \n", + "\n", + "Thereafter we found the parameters $\\boldsymbol{\\theta}$ by optimizing the means squared error via the so-called cost function" + ] + }, + { + "cell_type": "markdown", + "id": "84416669", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C(\\boldsymbol{X},\\boldsymbol{\\theta}) =\\frac{1}{n}\\sum_{i=0}^{n-1}(y_i-\\tilde{y}_i)^2=\\mathbb{E}\\left[(\\boldsymbol{y}-\\boldsymbol{\\tilde{y}})^2\\right].\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "0036358e", + "metadata": { + "editable": true + }, + "source": [ + "We can rewrite this as" + ] + }, + { + "cell_type": "markdown", + "id": "d712d2d7", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathbb{E}\\left[(\\boldsymbol{y}-\\boldsymbol{\\tilde{y}})^2\\right]=\\frac{1}{n}\\sum_i(f_i-\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right])^2+\\frac{1}{n}\\sum_i(\\tilde{y}_i-\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right])^2+\\sigma^2.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b71e48ac", + "metadata": { + "editable": true + }, + "source": [ + "The three terms represent the square of the bias of the learning\n", + "method, which can be thought of as the error caused by the simplifying\n", + "assumptions built into the method. 
The second term represents the\n", + "variance of the chosen model and finally the last terms is variance of\n", + "the error $\\boldsymbol{\\epsilon}$.\n", + "\n", + "To derive this equation, we need to recall that the variance of $\\boldsymbol{y}$ and $\\boldsymbol{\\epsilon}$ are both equal to $\\sigma^2$. The mean value of $\\boldsymbol{\\epsilon}$ is by definition equal to zero. Furthermore, the function $f$ is not a stochastics variable, idem for $\\boldsymbol{\\tilde{y}}$.\n", + "We use a more compact notation in terms of the expectation value" + ] + }, + { + "cell_type": "markdown", + "id": "c78ceafe", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathbb{E}\\left[(\\boldsymbol{y}-\\boldsymbol{\\tilde{y}})^2\\right]=\\mathbb{E}\\left[(\\boldsymbol{f}+\\boldsymbol{\\epsilon}-\\boldsymbol{\\tilde{y}})^2\\right],\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "74aae5bc", + "metadata": { + "editable": true + }, + "source": [ + "and adding and subtracting $\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right]$ we get" + ] + }, + { + "cell_type": "markdown", + "id": "1f2313f1", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathbb{E}\\left[(\\boldsymbol{y}-\\boldsymbol{\\tilde{y}})^2\\right]=\\mathbb{E}\\left[(\\boldsymbol{f}+\\boldsymbol{\\epsilon}-\\boldsymbol{\\tilde{y}}+\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right]-\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right])^2\\right],\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "a29b174f", + "metadata": { + "editable": true + }, + "source": [ + "which, using the abovementioned expectation values can be rewritten as" + ] + }, + { + "cell_type": "markdown", + "id": "3bc08002", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathbb{E}\\left[(\\boldsymbol{y}-\\boldsymbol{\\tilde{y}})^2\\right]=\\mathbb{E}\\left[(\\boldsymbol{y}-\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right])^2\\right]+\\mathrm{Var}\\left[\\boldsymbol{\\tilde{y}}\\right]+\\sigma^2,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "7b7d24e8", + "metadata": { + "editable": true + }, + "source": [ + "that is the rewriting in terms of the so-called bias, the variance of the model $\\boldsymbol{\\tilde{y}}$ and the variance of $\\boldsymbol{\\epsilon}$.\n", + "\n", + "**Note that in order to derive these equations we have assumed we can replace the unknown function $\\boldsymbol{f}$ with the target/output data $\\boldsymbol{y}$.**" + ] + }, + { + "cell_type": "markdown", + "id": "f2118d82", + "metadata": { + "editable": true + }, + "source": [ + "## A way to Read the Bias-Variance Tradeoff\n", + "\n", + "\n", + "\n", + "\n", + "

Figure 1: How to read the bias-variance tradeoff (figure not reproduced here).

\n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "baf08f8a", + "metadata": { + "editable": true + }, + "source": [ + "## Understanding what happens" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "1bd7ac4e", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "%matplotlib inline\n", + "\n", + "import matplotlib.pyplot as plt\n", + "import numpy as np\n", + "from sklearn.linear_model import LinearRegression, Ridge, Lasso\n", + "from sklearn.preprocessing import PolynomialFeatures\n", + "from sklearn.model_selection import train_test_split\n", + "from sklearn.pipeline import make_pipeline\n", + "from sklearn.utils import resample\n", + "\n", + "np.random.seed(2018)\n", + "\n", + "n = 40\n", + "n_boostraps = 100\n", + "maxdegree = 14\n", + "\n", + "\n", + "# Make data set.\n", + "x = np.linspace(-3, 3, n).reshape(-1, 1)\n", + "y = np.exp(-x**2) + 1.5 * np.exp(-(x-2)**2)+ np.random.normal(0, 0.1, x.shape)\n", + "error = np.zeros(maxdegree)\n", + "bias = np.zeros(maxdegree)\n", + "variance = np.zeros(maxdegree)\n", + "polydegree = np.zeros(maxdegree)\n", + "x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)\n", + "\n", + "for degree in range(maxdegree):\n", + " model = make_pipeline(PolynomialFeatures(degree=degree), LinearRegression(fit_intercept=False))\n", + " y_pred = np.empty((y_test.shape[0], n_boostraps))\n", + " for i in range(n_boostraps):\n", + " x_, y_ = resample(x_train, y_train)\n", + " y_pred[:, i] = model.fit(x_, y_).predict(x_test).ravel()\n", + "\n", + " polydegree[degree] = degree\n", + " error[degree] = np.mean( np.mean((y_test - y_pred)**2, axis=1, keepdims=True) )\n", + " bias[degree] = np.mean( (y_test - np.mean(y_pred, axis=1, keepdims=True))**2 )\n", + " variance[degree] = np.mean( np.var(y_pred, axis=1, keepdims=True) )\n", + " print('Polynomial degree:', degree)\n", + " print('Error:', error[degree])\n", + " print('Bias^2:', bias[degree])\n", + " print('Var:', variance[degree])\n", + " print('{} >= {} + {} = {}'.format(error[degree], bias[degree], variance[degree], bias[degree]+variance[degree]))\n", + "\n", + "plt.plot(polydegree, error, label='Error')\n", + "plt.plot(polydegree, bias, label='bias')\n", + "plt.plot(polydegree, variance, label='Variance')\n", + "plt.legend()\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "3edb75ab", + "metadata": { + "editable": true + }, + "source": [ + "## Summing up\n", + "\n", + "The bias-variance tradeoff summarizes the fundamental tension in\n", + "machine learning, particularly supervised learning, between the\n", + "complexity of a model and the amount of training data needed to train\n", + "it. Since data is often limited, in practice it is often useful to\n", + "use a less-complex model with higher bias, that is a model whose asymptotic\n", + "performance is worse than another model because it is easier to\n", + "train and less sensitive to sampling noise arising from having a\n", + "finite-sized training dataset (smaller variance). \n", + "\n", + "The above equations tell us that in\n", + "order to minimize the expected test error, we need to select a\n", + "statistical learning method that simultaneously achieves low variance\n", + "and low bias. Note that variance is inherently a nonnegative quantity,\n", + "and squared bias is also nonnegative. 
Hence, we see that the expected\n", + "test MSE can never lie below $Var(\\epsilon)$, the irreducible error.\n", + "\n", + "What do we mean by the variance and bias of a statistical learning\n", + "method? The variance refers to the amount by which our model would change if we\n", + "estimated it using a different training data set. Since the training\n", + "data are used to fit the statistical learning method, different\n", + "training data sets will result in a different estimate. But ideally the\n", + "estimate for our model should not vary too much between training\n", + "sets. However, if a method has high variance then small changes in\n", + "the training data can result in large changes in the model. In general, more\n", + "flexible statistical methods have higher variance.\n", + "\n", + "You may also find this recent [article](https://www.pnas.org/content/116/32/15849) of interest." + ] + }, + { + "cell_type": "markdown", + "id": "88ce8a48", + "metadata": { + "editable": true + }, + "source": [ + "## Another Example from Scikit-Learn's Repository\n", + "\n", + "This example demonstrates the problems of underfitting and overfitting and\n", + "how we can use linear regression with polynomial features to approximate\n", + "nonlinear functions. The plot shows the function that we want to approximate,\n", + "which is a part of the cosine function. In addition, the samples from the\n", + "real function and the approximations of different models are displayed. The\n", + "models have polynomial features of different degrees. We can see that a\n", + "linear function (polynomial with degree 1) is not sufficient to fit the\n", + "training samples. This is called **underfitting**. A polynomial of degree 4\n", + "approximates the true function almost perfectly. However, for higher degrees\n", + "the model will **overfit** the training data, i.e. it learns the noise of the\n", + "training data.\n", + "We evaluate quantitatively overfitting and underfitting by using\n", + "cross-validation. We calculate the mean squared error (MSE) on the validation\n", + "set, the higher, the less likely the model generalizes correctly from the\n", + "training data." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "40385eb8", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "\n", + "\n", + "#print(__doc__)\n", + "\n", + "import numpy as np\n", + "import matplotlib.pyplot as plt\n", + "from sklearn.pipeline import Pipeline\n", + "from sklearn.preprocessing import PolynomialFeatures\n", + "from sklearn.linear_model import LinearRegression\n", + "from sklearn.model_selection import cross_val_score\n", + "\n", + "\n", + "def true_fun(X):\n", + " return np.cos(1.5 * np.pi * X)\n", + "\n", + "np.random.seed(0)\n", + "\n", + "n_samples = 30\n", + "degrees = [1, 4, 15]\n", + "\n", + "X = np.sort(np.random.rand(n_samples))\n", + "y = true_fun(X) + np.random.randn(n_samples) * 0.1\n", + "\n", + "plt.figure(figsize=(14, 5))\n", + "for i in range(len(degrees)):\n", + " ax = plt.subplot(1, len(degrees), i + 1)\n", + " plt.setp(ax, xticks=(), yticks=())\n", + "\n", + " polynomial_features = PolynomialFeatures(degree=degrees[i],\n", + " include_bias=False)\n", + " linear_regression = LinearRegression()\n", + " pipeline = Pipeline([(\"polynomial_features\", polynomial_features),\n", + " (\"linear_regression\", linear_regression)])\n", + " pipeline.fit(X[:, np.newaxis], y)\n", + "\n", + " # Evaluate the models using crossvalidation\n", + " scores = cross_val_score(pipeline, X[:, np.newaxis], y,\n", + " scoring=\"neg_mean_squared_error\", cv=10)\n", + "\n", + " X_test = np.linspace(0, 1, 100)\n", + " plt.plot(X_test, pipeline.predict(X_test[:, np.newaxis]), label=\"Model\")\n", + " plt.plot(X_test, true_fun(X_test), label=\"True function\")\n", + " plt.scatter(X, y, edgecolor='b', s=20, label=\"Samples\")\n", + " plt.xlabel(\"x\")\n", + " plt.ylabel(\"y\")\n", + " plt.xlim((0, 1))\n", + " plt.ylim((-2, 2))\n", + " plt.legend(loc=\"best\")\n", + " plt.title(\"Degree {}\\nMSE = {:.2e}(+/- {:.2e})\".format(\n", + " degrees[i], -scores.mean(), scores.std()))\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "a0c0d4df", + "metadata": { + "editable": true + }, + "source": [ + "## Various steps in cross-validation\n", + "\n", + "When the repetitive splitting of the data set is done randomly,\n", + "samples may accidently end up in a fast majority of the splits in\n", + "either training or test set. Such samples may have an unbalanced\n", + "influence on either model building or prediction evaluation. To avoid\n", + "this $k$-fold cross-validation structures the data splitting. The\n", + "samples are divided into $k$ more or less equally sized exhaustive and\n", + "mutually exclusive subsets. In turn (at each split) one of these\n", + "subsets plays the role of the test set while the union of the\n", + "remaining subsets constitutes the training set. Such a splitting\n", + "warrants a balanced representation of each sample in both training and\n", + "test set over the splits. Still the division into the $k$ subsets\n", + "involves a degree of randomness. This may be fully excluded when\n", + "choosing $k=n$. This particular case is referred to as leave-one-out\n", + "cross-validation (LOOCV)." + ] + }, + { + "cell_type": "markdown", + "id": "68d3e653", + "metadata": { + "editable": true + }, + "source": [ + "## Cross-validation in brief\n", + "\n", + "For the various values of $k$\n", + "\n", + "1. shuffle the dataset randomly.\n", + "\n", + "2. Split the dataset into $k$ groups.\n", + "\n", + "3. For each unique group:\n", + "\n", + "a. 
Decide which group to use as set for test data\n", + "\n", + "b. Take the remaining groups as a training data set\n", + "\n", + "c. Fit a model on the training set and evaluate it on the test set\n", + "\n", + "d. Retain the evaluation score and discard the model\n", + "\n", + "5. Summarize the model using the sample of model evaluation scores" + ] + }, + { + "cell_type": "markdown", + "id": "7f7a6350", + "metadata": { + "editable": true + }, + "source": [ + "## Code Example for Cross-validation and $k$-fold Cross-validation\n", + "\n", + "The code here uses Ridge regression with cross-validation (CV) resampling and $k$-fold CV in order to fit a specific polynomial." + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "23eef50b", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import numpy as np\n", + "import matplotlib.pyplot as plt\n", + "from sklearn.model_selection import KFold\n", + "from sklearn.linear_model import Ridge\n", + "from sklearn.model_selection import cross_val_score\n", + "from sklearn.preprocessing import PolynomialFeatures\n", + "\n", + "# A seed just to ensure that the random numbers are the same for every run.\n", + "# Useful for eventual debugging.\n", + "np.random.seed(3155)\n", + "\n", + "# Generate the data.\n", + "nsamples = 100\n", + "x = np.random.randn(nsamples)\n", + "y = 3*x**2 + np.random.randn(nsamples)\n", + "\n", + "## Cross-validation on Ridge regression using KFold only\n", + "\n", + "# Decide degree on polynomial to fit\n", + "poly = PolynomialFeatures(degree = 6)\n", + "\n", + "# Decide which values of lambda to use\n", + "nlambdas = 500\n", + "lambdas = np.logspace(-3, 5, nlambdas)\n", + "\n", + "# Initialize a KFold instance\n", + "k = 5\n", + "kfold = KFold(n_splits = k)\n", + "\n", + "# Perform the cross-validation to estimate MSE\n", + "scores_KFold = np.zeros((nlambdas, k))\n", + "\n", + "i = 0\n", + "for lmb in lambdas:\n", + " ridge = Ridge(alpha = lmb)\n", + " j = 0\n", + " for train_inds, test_inds in kfold.split(x):\n", + " xtrain = x[train_inds]\n", + " ytrain = y[train_inds]\n", + "\n", + " xtest = x[test_inds]\n", + " ytest = y[test_inds]\n", + "\n", + " Xtrain = poly.fit_transform(xtrain[:, np.newaxis])\n", + " ridge.fit(Xtrain, ytrain[:, np.newaxis])\n", + "\n", + " Xtest = poly.fit_transform(xtest[:, np.newaxis])\n", + " ypred = ridge.predict(Xtest)\n", + "\n", + " scores_KFold[i,j] = np.sum((ypred - ytest[:, np.newaxis])**2)/np.size(ypred)\n", + "\n", + " j += 1\n", + " i += 1\n", + "\n", + "\n", + "estimated_mse_KFold = np.mean(scores_KFold, axis = 1)\n", + "\n", + "## Cross-validation using cross_val_score from sklearn along with KFold\n", + "\n", + "# kfold is an instance initialized above as:\n", + "# kfold = KFold(n_splits = k)\n", + "\n", + "estimated_mse_sklearn = np.zeros(nlambdas)\n", + "i = 0\n", + "for lmb in lambdas:\n", + " ridge = Ridge(alpha = lmb)\n", + "\n", + " X = poly.fit_transform(x[:, np.newaxis])\n", + " estimated_mse_folds = cross_val_score(ridge, X, y[:, np.newaxis], scoring='neg_mean_squared_error', cv=kfold)\n", + "\n", + " # cross_val_score return an array containing the estimated negative mse for every fold.\n", + " # we have to the the mean of every array in order to get an estimate of the mse of the model\n", + " estimated_mse_sklearn[i] = np.mean(-estimated_mse_folds)\n", + "\n", + " i += 1\n", + "\n", + "## Plot and compare the slightly different ways to perform cross-validation\n", + "\n", + "plt.figure()\n", + "\n", + 
"plt.plot(np.log10(lambdas), estimated_mse_sklearn, label = 'cross_val_score')\n", + "#plt.plot(np.log10(lambdas), estimated_mse_KFold, 'r--', label = 'KFold')\n", + "\n", + "plt.xlabel('log10(lambda)')\n", + "plt.ylabel('mse')\n", + "\n", + "plt.legend()\n", + "\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "76662787", + "metadata": { + "editable": true + }, + "source": [ + "## More examples on bootstrap and cross-validation and errors" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "166cd085", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# Common imports\n", + "import os\n", + "import numpy as np\n", + "import pandas as pd\n", + "import matplotlib.pyplot as plt\n", + "from sklearn.linear_model import LinearRegression, Ridge, Lasso\n", + "from sklearn.model_selection import train_test_split\n", + "from sklearn.utils import resample\n", + "from sklearn.metrics import mean_squared_error\n", + "# Where to save the figures and data files\n", + "PROJECT_ROOT_DIR = \"Results\"\n", + "FIGURE_ID = \"Results/FigureFiles\"\n", + "DATA_ID = \"DataFiles/\"\n", + "\n", + "if not os.path.exists(PROJECT_ROOT_DIR):\n", + " os.mkdir(PROJECT_ROOT_DIR)\n", + "\n", + "if not os.path.exists(FIGURE_ID):\n", + " os.makedirs(FIGURE_ID)\n", + "\n", + "if not os.path.exists(DATA_ID):\n", + " os.makedirs(DATA_ID)\n", + "\n", + "def image_path(fig_id):\n", + " return os.path.join(FIGURE_ID, fig_id)\n", + "\n", + "def data_path(dat_id):\n", + " return os.path.join(DATA_ID, dat_id)\n", + "\n", + "def save_fig(fig_id):\n", + " plt.savefig(image_path(fig_id) + \".png\", format='png')\n", + "\n", + "infile = open(data_path(\"EoS.csv\"),'r')\n", + "\n", + "# Read the EoS data as csv file and organize the data into two arrays with density and energies\n", + "EoS = pd.read_csv(infile, names=('Density', 'Energy'))\n", + "EoS['Energy'] = pd.to_numeric(EoS['Energy'], errors='coerce')\n", + "EoS = EoS.dropna()\n", + "Energies = EoS['Energy']\n", + "Density = EoS['Density']\n", + "# The design matrix now as function of various polytrops\n", + "\n", + "Maxpolydegree = 30\n", + "X = np.zeros((len(Density),Maxpolydegree))\n", + "X[:,0] = 1.0\n", + "testerror = np.zeros(Maxpolydegree)\n", + "trainingerror = np.zeros(Maxpolydegree)\n", + "polynomial = np.zeros(Maxpolydegree)\n", + "\n", + "trials = 100\n", + "for polydegree in range(1, Maxpolydegree):\n", + " polynomial[polydegree] = polydegree\n", + " for degree in range(polydegree):\n", + " X[:,degree] = Density**(degree/3.0)\n", + "\n", + "# loop over trials in order to estimate the expectation value of the MSE\n", + " testerror[polydegree] = 0.0\n", + " trainingerror[polydegree] = 0.0\n", + " for samples in range(trials):\n", + " x_train, x_test, y_train, y_test = train_test_split(X, Energies, test_size=0.2)\n", + " model = LinearRegression(fit_intercept=False).fit(x_train, y_train)\n", + " ypred = model.predict(x_train)\n", + " ytilde = model.predict(x_test)\n", + " testerror[polydegree] += mean_squared_error(y_test, ytilde)\n", + " trainingerror[polydegree] += mean_squared_error(y_train, ypred) \n", + "\n", + " testerror[polydegree] /= trials\n", + " trainingerror[polydegree] /= trials\n", + " print(\"Degree of polynomial: %3d\"% polynomial[polydegree])\n", + " print(\"Mean squared error on training data: %.8f\" % trainingerror[polydegree])\n", + " print(\"Mean squared error on test data: %.8f\" % testerror[polydegree])\n", + "\n", + "plt.plot(polynomial, np.log10(trainingerror), 
label='Training Error')\n", + "plt.plot(polynomial, np.log10(testerror), label='Test Error')\n", + "plt.xlabel('Polynomial degree')\n", + "plt.ylabel('log10[MSE]')\n", + "plt.legend()\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "53dc97b8", + "metadata": { + "editable": true + }, + "source": [ + "Note that we kept the intercept column in the fitting here. This means that we need to set the **intercept** in the call to the **Scikit-Learn** function as **False**. Alternatively, we could have set up the design matrix $X$ without the first column of ones." + ] + }, + { + "cell_type": "markdown", + "id": "660084ab", + "metadata": { + "editable": true + }, + "source": [ + "## The same example but now with cross-validation\n", + "\n", + "In this example we keep the intercept column again but add cross-validation in order to estimate the best possible value of the means squared error." + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "5dd5aec2", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# Common imports\n", + "import os\n", + "import numpy as np\n", + "import pandas as pd\n", + "import matplotlib.pyplot as plt\n", + "from sklearn.linear_model import LinearRegression, Ridge, Lasso\n", + "from sklearn.metrics import mean_squared_error\n", + "from sklearn.model_selection import KFold\n", + "from sklearn.model_selection import cross_val_score\n", + "\n", + "\n", + "# Where to save the figures and data files\n", + "PROJECT_ROOT_DIR = \"Results\"\n", + "FIGURE_ID = \"Results/FigureFiles\"\n", + "DATA_ID = \"DataFiles/\"\n", + "\n", + "if not os.path.exists(PROJECT_ROOT_DIR):\n", + " os.mkdir(PROJECT_ROOT_DIR)\n", + "\n", + "if not os.path.exists(FIGURE_ID):\n", + " os.makedirs(FIGURE_ID)\n", + "\n", + "if not os.path.exists(DATA_ID):\n", + " os.makedirs(DATA_ID)\n", + "\n", + "def image_path(fig_id):\n", + " return os.path.join(FIGURE_ID, fig_id)\n", + "\n", + "def data_path(dat_id):\n", + " return os.path.join(DATA_ID, dat_id)\n", + "\n", + "def save_fig(fig_id):\n", + " plt.savefig(image_path(fig_id) + \".png\", format='png')\n", + "\n", + "infile = open(data_path(\"EoS.csv\"),'r')\n", + "\n", + "# Read the EoS data as csv file and organize the data into two arrays with density and energies\n", + "EoS = pd.read_csv(infile, names=('Density', 'Energy'))\n", + "EoS['Energy'] = pd.to_numeric(EoS['Energy'], errors='coerce')\n", + "EoS = EoS.dropna()\n", + "Energies = EoS['Energy']\n", + "Density = EoS['Density']\n", + "# The design matrix now as function of various polytrops\n", + "\n", + "Maxpolydegree = 30\n", + "X = np.zeros((len(Density),Maxpolydegree))\n", + "X[:,0] = 1.0\n", + "estimated_mse_sklearn = np.zeros(Maxpolydegree)\n", + "polynomial = np.zeros(Maxpolydegree)\n", + "k =5\n", + "kfold = KFold(n_splits = k)\n", + "\n", + "for polydegree in range(1, Maxpolydegree):\n", + " polynomial[polydegree] = polydegree\n", + " for degree in range(polydegree):\n", + " X[:,degree] = Density**(degree/3.0)\n", + " OLS = LinearRegression(fit_intercept=False)\n", + "# loop over trials in order to estimate the expectation value of the MSE\n", + " estimated_mse_folds = cross_val_score(OLS, X, Energies, scoring='neg_mean_squared_error', cv=kfold)\n", + "#[:, np.newaxis]\n", + " estimated_mse_sklearn[polydegree] = np.mean(-estimated_mse_folds)\n", + "\n", + "plt.plot(polynomial, np.log10(estimated_mse_sklearn), label='Test Error')\n", + "plt.xlabel('Polynomial degree')\n", + "plt.ylabel('log10[MSE]')\n", + 
"plt.legend()\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "2c1f6d4b", + "metadata": { + "editable": true + }, + "source": [ + "## Logistic Regression\n", + "\n", + "In linear regression our main interest was centered on learning the\n", + "coefficients of a functional fit (say a polynomial) in order to be\n", + "able to predict the response of a continuous variable on some unseen\n", + "data. The fit to the continuous variable $y_i$ is based on some\n", + "independent variables $\\boldsymbol{x}_i$. Linear regression resulted in\n", + "analytical expressions for standard ordinary Least Squares or Ridge\n", + "regression (in terms of matrices to invert) for several quantities,\n", + "ranging from the variance and thereby the confidence intervals of the\n", + "parameters $\\boldsymbol{\\theta}$ to the mean squared error. If we can invert\n", + "the product of the design matrices, linear regression gives then a\n", + "simple recipe for fitting our data." + ] + }, + { + "cell_type": "markdown", + "id": "149e92ec", + "metadata": { + "editable": true + }, + "source": [ + "## Classification problems\n", + "\n", + "Classification problems, however, are concerned with outcomes taking\n", + "the form of discrete variables (i.e. categories). We may for example,\n", + "on the basis of DNA sequencing for a number of patients, like to find\n", + "out which mutations are important for a certain disease; or based on\n", + "scans of various patients' brains, figure out if there is a tumor or\n", + "not; or given a specific physical system, we'd like to identify its\n", + "state, say whether it is an ordered or disordered system (typical\n", + "situation in solid state physics); or classify the status of a\n", + "patient, whether she/he has a stroke or not and many other similar\n", + "situations.\n", + "\n", + "The most common situation we encounter when we apply logistic\n", + "regression is that of two possible outcomes, normally denoted as a\n", + "binary outcome, true or false, positive or negative, success or\n", + "failure etc." + ] + }, + { + "cell_type": "markdown", + "id": "ce85cd3a", + "metadata": { + "editable": true + }, + "source": [ + "## Optimization and Deep learning\n", + "\n", + "Logistic regression will also serve as our stepping stone towards\n", + "neural network algorithms and supervised deep learning. For logistic\n", + "learning, the minimization of the cost function leads to a non-linear\n", + "equation in the parameters $\\boldsymbol{\\theta}$. The optimization of the\n", + "problem calls therefore for minimization algorithms. This forms the\n", + "bottle neck of all machine learning algorithms, namely how to find\n", + "reliable minima of a multi-variable function. This leads us to the\n", + "family of gradient descent methods. The latter are the working horses\n", + "of basically all modern machine learning algorithms.\n", + "\n", + "We note also that many of the topics discussed here on logistic \n", + "regression are also commonly used in modern supervised Deep Learning\n", + "models, as we will see later." + ] + }, + { + "cell_type": "markdown", + "id": "2eb9e687", + "metadata": { + "editable": true + }, + "source": [ + "## Basics\n", + "\n", + "We consider the case where the outputs/targets, also called the\n", + "responses or the outcomes, $y_i$ are discrete and only take values\n", + "from $k=0,\\dots,K-1$ (i.e. 
$K$ classes).\n", + "\n", + "The goal is to predict the\n", + "output classes from the design matrix $\\boldsymbol{X}\\in\\mathbb{R}^{n\\times p}$\n", + "made of $n$ samples, each of which carries $p$ features or predictors. The\n", + "primary goal is to identify the classes to which new unseen samples\n", + "belong.\n", + "\n", + "Let us specialize to the case of two classes only, with outputs\n", + "$y_i=0$ and $y_i=1$. Our outcomes could represent the status of a\n", + "credit card user that could default or not on her/his credit card\n", + "debt. That is" + ] + }, + { + "cell_type": "markdown", + "id": "9b8b7d05", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y_i = \\begin{bmatrix} 0 & \\mathrm{no}\\\\ 1 & \\mathrm{yes} \\end{bmatrix}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "7db50d1a", + "metadata": { + "editable": true + }, + "source": [ + "## Linear classifier\n", + "\n", + "Before moving to the logistic model, let us try to use our linear\n", + "regression model to classify these two outcomes. We could for example\n", + "fit a linear model to the default case if $y_i > 0.5$ and the no\n", + "default case $y_i \\leq 0.5$.\n", + "\n", + "We would then have our \n", + "weighted linear combination, namely" + ] + }, + { + "cell_type": "markdown", + "id": "a78fc346", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
\n", + "\n", + "$$\n", + "\\begin{equation}\n", + "\\boldsymbol{y} = \\boldsymbol{X}^T\\boldsymbol{\\theta} + \\boldsymbol{\\epsilon},\n", + "\\label{_auto1} \\tag{1}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "661d8faf", + "metadata": { + "editable": true + }, + "source": [ + "where $\\boldsymbol{y}$ is a vector representing the possible outcomes, $\\boldsymbol{X}$ is our\n", + "$n\\times p$ design matrix and $\\boldsymbol{\\theta}$ represents our estimators/predictors." + ] + }, + { + "cell_type": "markdown", + "id": "8620ba1b", + "metadata": { + "editable": true + }, + "source": [ + "## Some selected properties\n", + "\n", + "The main problem with our function is that it takes values on the\n", + "entire real axis. In the case of logistic regression, however, the\n", + "labels $y_i$ are discrete variables. A typical example is the credit\n", + "card data discussed below here, where we can set the state of\n", + "defaulting the debt to $y_i=1$ and not to $y_i=0$ for one the persons\n", + "in the data set (see the full example below).\n", + "\n", + "One simple way to get a discrete output is to have sign\n", + "functions that map the output of a linear regressor to values $\\{0,1\\}$,\n", + "$f(s_i)=sign(s_i)=1$ if $s_i\\ge 0$ and 0 if otherwise. \n", + "We will encounter this model in our first demonstration of neural networks.\n", + "\n", + "Historically it is called the **perceptron** model in the machine learning\n", + "literature. This model is extremely simple. However, in many cases it is more\n", + "favorable to use a ``soft\" classifier that outputs\n", + "the probability of a given category. This leads us to the logistic function." + ] + }, + { + "cell_type": "markdown", + "id": "8fdbebd2", + "metadata": { + "editable": true + }, + "source": [ + "## Simple example\n", + "\n", + "The following example on data for coronary heart disease (CHD) as function of age may serve as an illustration. In the code here we read and plot whether a person has had CHD (output = 1) or not (output = 0). This ouput is plotted the person's against age. Clearly, the figure shows that attempting to make a standard linear regression fit may not be very meaningful." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "8dc64aeb", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# Common imports\n", + "import os\n", + "import numpy as np\n", + "import pandas as pd\n", + "import matplotlib.pyplot as plt\n", + "from sklearn.linear_model import LinearRegression, Ridge, Lasso\n", + "from sklearn.model_selection import train_test_split\n", + "from sklearn.utils import resample\n", + "from sklearn.metrics import mean_squared_error\n", + "from IPython.display import display\n", + "from pylab import plt, mpl\n", + "mpl.rcParams['font.family'] = 'serif'\n", + "\n", + "# Where to save the figures and data files\n", + "PROJECT_ROOT_DIR = \"Results\"\n", + "FIGURE_ID = \"Results/FigureFiles\"\n", + "DATA_ID = \"DataFiles/\"\n", + "\n", + "if not os.path.exists(PROJECT_ROOT_DIR):\n", + " os.mkdir(PROJECT_ROOT_DIR)\n", + "\n", + "if not os.path.exists(FIGURE_ID):\n", + " os.makedirs(FIGURE_ID)\n", + "\n", + "if not os.path.exists(DATA_ID):\n", + " os.makedirs(DATA_ID)\n", + "\n", + "def image_path(fig_id):\n", + " return os.path.join(FIGURE_ID, fig_id)\n", + "\n", + "def data_path(dat_id):\n", + " return os.path.join(DATA_ID, dat_id)\n", + "\n", + "def save_fig(fig_id):\n", + " plt.savefig(image_path(fig_id) + \".png\", format='png')\n", + "\n", + "infile = open(data_path(\"chddata.csv\"),'r')\n", + "\n", + "# Read the chd data as csv file and organize the data into arrays with age group, age, and chd\n", + "chd = pd.read_csv(infile, names=('ID', 'Age', 'Agegroup', 'CHD'))\n", + "chd.columns = ['ID', 'Age', 'Agegroup', 'CHD']\n", + "output = chd['CHD']\n", + "age = chd['Age']\n", + "agegroup = chd['Agegroup']\n", + "numberID = chd['ID'] \n", + "display(chd)\n", + "\n", + "plt.scatter(age, output, marker='o')\n", + "plt.axis([18,70.0,-0.1, 1.2])\n", + "plt.xlabel(r'Age')\n", + "plt.ylabel(r'CHD')\n", + "plt.title(r'Age distribution and Coronary heart disease')\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "40385068", + "metadata": { + "editable": true + }, + "source": [ + "## Plotting the mean value for each group\n", + "\n", + "What we could attempt however is to plot the mean value for each group." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "a473659b", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "agegroupmean = np.array([0.1, 0.133, 0.250, 0.333, 0.462, 0.625, 0.765, 0.800])\n", + "group = np.array([1, 2, 3, 4, 5, 6, 7, 8])\n", + "plt.plot(group, agegroupmean, \"r-\")\n", + "plt.axis([0,9,0, 1.0])\n", + "plt.xlabel(r'Age group')\n", + "plt.ylabel(r'CHD mean values')\n", + "plt.title(r'Mean values for each age group')\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "3e2ab512", + "metadata": { + "editable": true + }, + "source": [ + "We are now trying to find a function $f(y\\vert x)$, that is a function which gives us an expected value for the output $y$ with a given input $x$.\n", + "In standard linear regression with a linear dependence on $x$, we would write this in terms of our model" + ] + }, + { + "cell_type": "markdown", + "id": "40361f1b", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "f(y_i\\vert x_i)=\\theta_0+\\theta_1 x_i.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "a1b379fb", + "metadata": { + "editable": true + }, + "source": [ + "This expression implies however that $f(y_i\\vert x_i)$ could take any\n", + "value from minus infinity to plus infinity. If we however let\n", + "$f(y\\vert y)$ be represented by the mean value, the above example\n", + "shows us that we can constrain the function to take values between\n", + "zero and one, that is we have $0 \\le f(y_i\\vert x_i) \\le 1$. Looking\n", + "at our last curve we see also that it has an S-shaped form. This leads\n", + "us to a very popular model for the function $f$, namely the so-called\n", + "Sigmoid function or logistic model. We will consider this function as\n", + "representing the probability for finding a value of $y_i$ with a given\n", + "$x_i$." + ] + }, + { + "cell_type": "markdown", + "id": "bcbf3d2b", + "metadata": { + "editable": true + }, + "source": [ + "## The logistic function\n", + "\n", + "Another widely studied model, is the so-called \n", + "perceptron model, which is an example of a \"hard classification\" model. We\n", + "will encounter this model when we discuss neural networks as\n", + "well. Each datapoint is deterministically assigned to a category (i.e\n", + "$y_i=0$ or $y_i=1$). In many cases, and the coronary heart disease data forms one of many such examples, it is favorable to have a \"soft\"\n", + "classifier that outputs the probability of a given category rather\n", + "than a single value. For example, given $x_i$, the classifier\n", + "outputs the probability of being in a category $k$. Logistic regression\n", + "is the most common example of a so-called soft classifier. In logistic\n", + "regression, the probability that a data point $x_i$\n", + "belongs to a category $y_i=\\{0,1\\}$ is given by the so-called logit function (or Sigmoid) which is meant to represent the likelihood for a given event," + ] + }, + { + "cell_type": "markdown", + "id": "38918f44", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "p(t) = \\frac{1}{1+\\mathrm \\exp{-t}}=\\frac{\\exp{t}}{1+\\mathrm \\exp{t}}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "fd225d0f", + "metadata": { + "editable": true + }, + "source": [ + "Note that $1-p(t)= p(-t)$." 
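Before plotting the various activation functions below, it may help to see the logistic model in action. The short sketch here is only an added illustration: it uses synthetic age/outcome data (not the chddata.csv file read above) together with standard scikit-learn estimators, first checking numerically that $1-p(t)=p(-t)$ and then contrasting a plain linear fit with the S-shaped probability curve from logistic regression.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression, LogisticRegression

# Quick numerical check of the symmetry 1 - p(t) = p(-t)
t = 1.7
print(1 - 1/(1 + np.exp(-t)), 1/(1 + np.exp(t)))

# Synthetic binary data loosely mimicking the CHD-versus-age example:
# the probability of the outcome increases smoothly with age.
rng = np.random.default_rng(2024)
n = 200
age = rng.uniform(20, 70, n)
true_prob = 1.0/(1.0 + np.exp(-0.12*(age - 45)))   # logistic in age
y = rng.binomial(1, true_prob)

X = age.reshape(-1, 1)
linreg = LinearRegression().fit(X, y)      # predictions not restricted to [0, 1]
logreg = LogisticRegression().fit(X, y)    # predicted probabilities follow the sigmoid

age_grid = np.linspace(20, 70, 200).reshape(-1, 1)
plt.scatter(age, y, s=10, alpha=0.4, label='data')
plt.plot(age_grid, linreg.predict(age_grid), 'r--', label='linear fit')
plt.plot(age_grid, logreg.predict_proba(age_grid)[:, 1], 'k-', label='logistic fit')
plt.xlabel('Age')
plt.ylabel('Outcome / probability')
plt.legend()
plt.show()
```

The linear fit leaves the interval $[0,1]$ for small and large ages, while the logistic curve stays bounded and can be read directly as a probability.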
+ ] + }, + { + "cell_type": "markdown", + "id": "d340b5c1", + "metadata": { + "editable": true + }, + "source": [ + "## Examples of likelihood functions used in logistic regression and nueral networks\n", + "\n", + "The following code plots the logistic function, the step function and other functions we will encounter from here and on." + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "357d6f03", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "\"\"\"The sigmoid function (or the logistic curve) is a\n", + "function that takes any real number, z, and outputs a number (0,1).\n", + "It is useful in neural networks for assigning weights on a relative scale.\n", + "The value z is the weighted sum of parameters involved in the learning algorithm.\"\"\"\n", + "\n", + "import numpy\n", + "import matplotlib.pyplot as plt\n", + "import math as mt\n", + "\n", + "z = numpy.arange(-5, 5, .1)\n", + "sigma_fn = numpy.vectorize(lambda z: 1/(1+numpy.exp(-z)))\n", + "sigma = sigma_fn(z)\n", + "\n", + "fig = plt.figure()\n", + "ax = fig.add_subplot(111)\n", + "ax.plot(z, sigma)\n", + "ax.set_ylim([-0.1, 1.1])\n", + "ax.set_xlim([-5,5])\n", + "ax.grid(True)\n", + "ax.set_xlabel('z')\n", + "ax.set_title('sigmoid function')\n", + "\n", + "plt.show()\n", + "\n", + "\"\"\"Step Function\"\"\"\n", + "z = numpy.arange(-5, 5, .02)\n", + "step_fn = numpy.vectorize(lambda z: 1.0 if z >= 0.0 else 0.0)\n", + "step = step_fn(z)\n", + "\n", + "fig = plt.figure()\n", + "ax = fig.add_subplot(111)\n", + "ax.plot(z, step)\n", + "ax.set_ylim([-0.5, 1.5])\n", + "ax.set_xlim([-5,5])\n", + "ax.grid(True)\n", + "ax.set_xlabel('z')\n", + "ax.set_title('step function')\n", + "\n", + "plt.show()\n", + "\n", + "\"\"\"tanh Function\"\"\"\n", + "z = numpy.arange(-2*mt.pi, 2*mt.pi, 0.1)\n", + "t = numpy.tanh(z)\n", + "\n", + "fig = plt.figure()\n", + "ax = fig.add_subplot(111)\n", + "ax.plot(z, t)\n", + "ax.set_ylim([-1.0, 1.0])\n", + "ax.set_xlim([-2*mt.pi,2*mt.pi])\n", + "ax.grid(True)\n", + "ax.set_xlabel('z')\n", + "ax.set_title('tanh function')\n", + "\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "8be63821", + "metadata": { + "editable": true + }, + "source": [ + "## Two parameters\n", + "\n", + "We assume now that we have two classes with $y_i$ either $0$ or $1$. Furthermore we assume also that we have only two parameters $\\theta$ in our fitting of the Sigmoid function, that is we define probabilities" + ] + }, + { + "cell_type": "markdown", + "id": "f79d930e", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{align*}\n", + "p(y_i=1|x_i,\\boldsymbol{\\theta}) &= \\frac{\\exp{(\\theta_0+\\theta_1x_i)}}{1+\\exp{(\\theta_0+\\theta_1x_i)}},\\nonumber\\\\\n", + "p(y_i=0|x_i,\\boldsymbol{\\theta}) &= 1 - p(y_i=1|x_i,\\boldsymbol{\\theta}),\n", + "\\end{align*}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "8a758aae", + "metadata": { + "editable": true + }, + "source": [ + "where $\\boldsymbol{\\theta}$ are the weights we wish to extract from data, in our case $\\theta_0$ and $\\theta_1$. 
\n", + "\n", + "Note that we used" + ] + }, + { + "cell_type": "markdown", + "id": "88159170", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "p(y_i=0\\vert x_i, \\boldsymbol{\\theta}) = 1-p(y_i=1\\vert x_i, \\boldsymbol{\\theta}).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "f9972402", + "metadata": { + "editable": true + }, + "source": [ + "## Maximum likelihood\n", + "\n", + "In order to define the total likelihood for all possible outcomes from a \n", + "dataset $\\mathcal{D}=\\{(y_i,x_i)\\}$, with the binary labels\n", + "$y_i\\in\\{0,1\\}$ and where the data points are drawn independently, we use the so-called [Maximum Likelihood Estimation](https://en.wikipedia.org/wiki/Maximum_likelihood_estimation) (MLE) principle. \n", + "We aim thus at maximizing \n", + "the probability of seeing the observed data. We can then approximate the \n", + "likelihood in terms of the product of the individual probabilities of a specific outcome $y_i$, that is" + ] + }, + { + "cell_type": "markdown", + "id": "949524d2", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{align*}\n", + "P(\\mathcal{D}|\\boldsymbol{\\theta})& = \\prod_{i=1}^n \\left[p(y_i=1|x_i,\\boldsymbol{\\theta})\\right]^{y_i}\\left[1-p(y_i=1|x_i,\\boldsymbol{\\theta}))\\right]^{1-y_i}\\nonumber \\\\\n", + "\\end{align*}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "d9a7fded", + "metadata": { + "editable": true + }, + "source": [ + "from which we obtain the log-likelihood and our **cost/loss** function" + ] + }, + { + "cell_type": "markdown", + "id": "4c5f78fb", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathcal{C}(\\boldsymbol{\\theta}) = \\sum_{i=1}^n \\left( y_i\\log{p(y_i=1|x_i,\\boldsymbol{\\theta})} + (1-y_i)\\log\\left[1-p(y_i=1|x_i,\\boldsymbol{\\theta}))\\right]\\right).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "5ccce506", + "metadata": { + "editable": true + }, + "source": [ + "## The cost function rewritten\n", + "\n", + "Reordering the logarithms, we can rewrite the **cost/loss** function as" + ] + }, + { + "cell_type": "markdown", + "id": "bf58bb76", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathcal{C}(\\boldsymbol{\\theta}) = \\sum_{i=1}^n \\left(y_i(\\theta_0+\\theta_1x_i) -\\log{(1+\\exp{(\\theta_0+\\theta_1x_i)})}\\right).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "41543ca6", + "metadata": { + "editable": true + }, + "source": [ + "The maximum likelihood estimator is defined as the set of parameters that maximize the log-likelihood where we maximize with respect to $\\theta$.\n", + "Since the cost (error) function is just the negative log-likelihood, for logistic regression we have that" + ] + }, + { + "cell_type": "markdown", + "id": "e664b57a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathcal{C}(\\boldsymbol{\\theta})=-\\sum_{i=1}^n \\left(y_i(\\theta_0+\\theta_1x_i) -\\log{(1+\\exp{(\\theta_0+\\theta_1x_i)})}\\right).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "eb357503", + "metadata": { + "editable": true + }, + "source": [ + "This equation is known in statistics as the **cross entropy**. Finally, we note that just as in linear regression, \n", + "in practice we often supplement the cross-entropy with additional regularization terms, usually $L_1$ and $L_2$ regularization as we did for Ridge and Lasso regression." 
+ ] + }, + { + "cell_type": "markdown", + "id": "e388ad02", + "metadata": { + "editable": true + }, + "source": [ + "## Minimizing the cross entropy\n", + "\n", + "The cross entropy is a convex function of the weights $\\boldsymbol{\\theta}$ and,\n", + "therefore, any local minimizer is a global minimizer. \n", + "\n", + "Minimizing this\n", + "cost function with respect to the two parameters $\\theta_0$ and $\\theta_1$ we obtain" + ] + }, + { + "cell_type": "markdown", + "id": "1d4f2850", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial \\mathcal{C}(\\boldsymbol{\\theta})}{\\partial \\theta_0} = -\\sum_{i=1}^n \\left(y_i -\\frac{\\exp{(\\theta_0+\\theta_1x_i)}}{1+\\exp{(\\theta_0+\\theta_1x_i)}}\\right),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "68a0c133", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "c942a72b", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial \\mathcal{C}(\\boldsymbol{\\theta})}{\\partial \\theta_1} = -\\sum_{i=1}^n \\left(y_ix_i -x_i\\frac{\\exp{(\\theta_0+\\theta_1x_i)}}{1+\\exp{(\\theta_0+\\theta_1x_i)}}\\right).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "42caf6db", + "metadata": { + "editable": true + }, + "source": [ + "## A more compact expression\n", + "\n", + "Let us now define a vector $\\boldsymbol{y}$ with $n$ elements $y_i$, an\n", + "$n\\times p$ matrix $\\boldsymbol{X}$ which contains the $x_i$ values and a\n", + "vector $\\boldsymbol{p}$ of fitted probabilities $p(y_i\\vert x_i,\\boldsymbol{\\theta})$. We can rewrite in a more compact form the first\n", + "derivative of the cost function as" + ] + }, + { + "cell_type": "markdown", + "id": "22cd94c9", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial \\mathcal{C}(\\boldsymbol{\\theta})}{\\partial \\boldsymbol{\\theta}} = -\\boldsymbol{X}^T\\left(\\boldsymbol{y}-\\boldsymbol{p}\\right).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "d9428067", + "metadata": { + "editable": true + }, + "source": [ + "If we in addition define a diagonal matrix $\\boldsymbol{W}$ with elements \n", + "$p(y_i\\vert x_i,\\boldsymbol{\\theta})(1-p(y_i\\vert x_i,\\boldsymbol{\\theta})$, we can obtain a compact expression of the second derivative as" + ] + }, + { + "cell_type": "markdown", + "id": "29178d5a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial^2 \\mathcal{C}(\\boldsymbol{\\theta})}{\\partial \\boldsymbol{\\theta}\\partial \\boldsymbol{\\theta}^T} = \\boldsymbol{X}^T\\boldsymbol{W}\\boldsymbol{X}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "6b7671ad", + "metadata": { + "editable": true + }, + "source": [ + "## Extending to more predictors\n", + "\n", + "Within a binary classification problem, we can easily expand our model to include multiple predictors. 
Our ratio between likelihoods is then with $p$ predictors" + ] + }, + { + "cell_type": "markdown", + "id": "500b6574", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\log{ \\frac{p(\\boldsymbol{\\theta}\\boldsymbol{x})}{1-p(\\boldsymbol{\\theta}\\boldsymbol{x})}} = \\theta_0+\\theta_1x_1+\\theta_2x_2+\\dots+\\theta_px_p.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "cf0b50ce", + "metadata": { + "editable": true + }, + "source": [ + "Here we defined $\\boldsymbol{x}=[1,x_1,x_2,\\dots,x_p]$ and $\\boldsymbol{\\theta}=[\\theta_0, \\theta_1, \\dots, \\theta_p]$ leading to" + ] + }, + { + "cell_type": "markdown", + "id": "537486ee", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "p(\\boldsymbol{\\theta}\\boldsymbol{x})=\\frac{ \\exp{(\\theta_0+\\theta_1x_1+\\theta_2x_2+\\dots+\\theta_px_p)}}{1+\\exp{(\\theta_0+\\theta_1x_1+\\theta_2x_2+\\dots+\\theta_px_p)}}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "534fb571", + "metadata": { + "editable": true + }, + "source": [ + "## Including more classes\n", + "\n", + "Till now we have mainly focused on two classes, the so-called binary\n", + "system. Suppose we wish to extend to $K$ classes. Let us for the sake\n", + "of simplicity assume we have only two predictors. We have then following model" + ] + }, + { + "cell_type": "markdown", + "id": "fa7ca275", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\log{\\frac{p(C=1\\vert x)}{p(K\\vert x)}} = \\theta_{10}+\\theta_{11}x_1,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "cc765c0e", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "2c43387d", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\log{\\frac{p(C=2\\vert x)}{p(K\\vert x)}} = \\theta_{20}+\\theta_{21}x_1,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "e063f183", + "metadata": { + "editable": true + }, + "source": [ + "and so on till the class $C=K-1$ class" + ] + }, + { + "cell_type": "markdown", + "id": "060fa00c", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\log{\\frac{p(C=K-1\\vert x)}{p(K\\vert x)}} = \\theta_{(K-1)0}+\\theta_{(K-1)1}x_1,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b9034492", + "metadata": { + "editable": true + }, + "source": [ + "and the model is specified in term of $K-1$ so-called log-odds or\n", + "**logit** transformations." + ] + }, + { + "cell_type": "markdown", + "id": "b7fba1fc", + "metadata": { + "editable": true + }, + "source": [ + "## More classes\n", + "\n", + "In our discussion of neural networks we will encounter the above again\n", + "in terms of a slightly modified function, the so-called **Softmax** function.\n", + "\n", + "The softmax function is used in various multiclass classification\n", + "methods, such as multinomial logistic regression (also known as\n", + "softmax regression), multiclass linear discriminant analysis, naive\n", + "Bayes classifiers, and artificial neural networks. 
Specifically, in\n", + "multinomial logistic regression and linear discriminant analysis, the\n", + "input to the function is the result of $K$ distinct linear functions,\n", + "and the predicted probability for the $k$-th class given a sample\n", + "vector $\\boldsymbol{x}$ and a weighting vector $\\boldsymbol{\\theta}$ is (with two\n", + "predictors):" + ] + }, + { + "cell_type": "markdown", + "id": "a8346f86", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "p(C=k\\vert \\mathbf {x} )=\\frac{\\exp{(\\theta_{k0}+\\theta_{k1}x_1)}}{1+\\sum_{l=1}^{K-1}\\exp{(\\theta_{l0}+\\theta_{l1}x_1)}}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b05e18eb", + "metadata": { + "editable": true + }, + "source": [ + "It is easy to extend to more predictors. The final class is" + ] + }, + { + "cell_type": "markdown", + "id": "3bff89b1", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "p(C=K\\vert \\mathbf {x} )=\\frac{1}{1+\\sum_{l=1}^{K-1}\\exp{(\\theta_{l0}+\\theta_{l1}x_1)}},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "e89e832c", + "metadata": { + "editable": true + }, + "source": [ + "and they sum to one. Our earlier discussions were all specialized to\n", + "the case with two classes only. It is easy to see from the above that\n", + "what we derived earlier is compatible with these equations.\n", + "\n", + "To find the optimal parameters we would typically use a gradient\n", + "descent method. Newton's method and gradient descent methods are\n", + "discussed in the material on [optimization\n", + "methods](https://compphysics.github.io/MachineLearning/doc/pub/Splines/html/Splines-bs.html)." + ] + }, + { + "cell_type": "markdown", + "id": "464d4933", + "metadata": { + "editable": true + }, + "source": [ + "## Optimization, the central part of any Machine Learning algortithm\n", + "\n", + "Almost every problem in machine learning and data science starts with\n", + "a dataset $X$, a model $g(\\theta)$, which is a function of the\n", + "parameters $\\theta$ and a cost function $C(X, g(\\theta))$ that allows\n", + "us to judge how well the model $g(\\theta)$ explains the observations\n", + "$X$. The model is fit by finding the values of $\\theta$ that minimize\n", + "the cost function. Ideally we would be able to solve for $\\theta$\n", + "analytically, however this is not possible in general and we must use\n", + "some approximative/numerical method to compute the minimum." + ] + }, + { + "cell_type": "markdown", + "id": "c707d4a0", + "metadata": { + "editable": true + }, + "source": [ + "## Revisiting our Logistic Regression case\n", + "\n", + "In our discussion on Logistic Regression we studied the \n", + "case of\n", + "two classes, with $y_i$ either\n", + "$0$ or $1$. Furthermore we assumed also that we have only two\n", + "parameters $\\theta$ in our fitting, that is we\n", + "defined probabilities" + ] + }, + { + "cell_type": "markdown", + "id": "3f00d244", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{align*}\n", + "p(y_i=1|x_i,\\boldsymbol{\\theta}) &= \\frac{\\exp{(\\theta_0+\\theta_1x_i)}}{1+\\exp{(\\theta_0+\\theta_1x_i)}},\\nonumber\\\\\n", + "p(y_i=0|x_i,\\boldsymbol{\\theta}) &= 1 - p(y_i=1|x_i,\\boldsymbol{\\theta}),\n", + "\\end{align*}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "2d239661", + "metadata": { + "editable": true + }, + "source": [ + "where $\\boldsymbol{\\theta}$ are the weights we wish to extract from data, in our case $\\theta_0$ and $\\theta_1$." 
+ ] + }, + { + "cell_type": "markdown", + "id": "4243778f", + "metadata": { + "editable": true + }, + "source": [ + "## The equations to solve\n", + "\n", + "Our compact equations used a definition of a vector $\\boldsymbol{y}$ with $n$\n", + "elements $y_i$, an $n\\times p$ matrix $\\boldsymbol{X}$ which contains the\n", + "$x_i$ values and a vector $\\boldsymbol{p}$ of fitted probabilities\n", + "$p(y_i\\vert x_i,\\boldsymbol{\\theta})$. We rewrote in a more compact form\n", + "the first derivative of the cost function as" + ] + }, + { + "cell_type": "markdown", + "id": "21ce04bb", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial \\mathcal{C}(\\boldsymbol{\\theta})}{\\partial \\boldsymbol{\\theta}} = -\\boldsymbol{X}^T\\left(\\boldsymbol{y}-\\boldsymbol{p}\\right).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b854153c", + "metadata": { + "editable": true + }, + "source": [ + "If we in addition define a diagonal matrix $\\boldsymbol{W}$ with elements \n", + "$p(y_i\\vert x_i,\\boldsymbol{\\theta})(1-p(y_i\\vert x_i,\\boldsymbol{\\theta})$, we can obtain a compact expression of the second derivative as" + ] + }, + { + "cell_type": "markdown", + "id": "235c9b1d", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial^2 \\mathcal{C}(\\boldsymbol{\\theta})}{\\partial \\boldsymbol{\\theta}\\partial \\boldsymbol{\\theta}^T} = \\boldsymbol{X}^T\\boldsymbol{W}\\boldsymbol{X}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "1651fe82", + "metadata": { + "editable": true + }, + "source": [ + "This defines what is called the Hessian matrix." + ] + }, + { + "cell_type": "markdown", + "id": "f36a8c94", + "metadata": { + "editable": true + }, + "source": [ + "## Solving using Newton-Raphson's method\n", + "\n", + "If we can set up these equations, Newton-Raphson's iterative method is normally the method of choice. It requires however that we can compute in an efficient way the matrices that define the first and second derivatives. \n", + "\n", + "Our iterative scheme is then given by" + ] + }, + { + "cell_type": "markdown", + "id": "438b5efe", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{\\theta}^{\\mathrm{new}} = \\boldsymbol{\\theta}^{\\mathrm{old}}-\\left(\\frac{\\partial^2 \\mathcal{C}(\\boldsymbol{\\theta})}{\\partial \\boldsymbol{\\theta}\\partial \\boldsymbol{\\theta}^T}\\right)^{-1}_{\\boldsymbol{\\theta}^{\\mathrm{old}}}\\times \\left(\\frac{\\partial \\mathcal{C}(\\boldsymbol{\\theta})}{\\partial \\boldsymbol{\\theta}}\\right)_{\\boldsymbol{\\theta}^{\\mathrm{old}}},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "f3ae8207", + "metadata": { + "editable": true + }, + "source": [ + "or in matrix form as" + ] + }, + { + "cell_type": "markdown", + "id": "702a38c4", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{\\theta}^{\\mathrm{new}} = \\boldsymbol{\\theta}^{\\mathrm{old}}-\\left(\\boldsymbol{X}^T\\boldsymbol{W}\\boldsymbol{X} \\right)^{-1}\\times \\left(-\\boldsymbol{X}^T(\\boldsymbol{y}-\\boldsymbol{p}) \\right)_{\\boldsymbol{\\theta}^{\\mathrm{old}}}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "43b5a9ab", + "metadata": { + "editable": true + }, + "source": [ + "The right-hand side is computed with the old values of $\\theta$. \n", + "\n", + "If we can compute these matrices, in particular the Hessian, the above is often the easiest method to implement." 
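A minimal sketch of this Newton-Raphson update for a two-parameter model (with design matrix $[1, x]$) could look as follows. It is an illustration of the equations above rather than the class implementation that follows, and the number of iterations and convergence tolerance are arbitrary choices.

```python
import numpy as np

# Synthetic data from a known two-parameter logistic model
rng = np.random.default_rng(3155)
n = 500
x = rng.normal(size=n)
theta_true = np.array([-0.5, 2.0])
X = np.column_stack((np.ones(n), x))          # design matrix [1, x]
p_true = 1/(1 + np.exp(-X @ theta_true))
y = rng.binomial(1, p_true)

theta = np.zeros(2)
for iteration in range(10):
    p = 1/(1 + np.exp(-X @ theta))
    gradient = -X.T @ (y - p)                 # dC/dtheta
    W = np.diag(p*(1 - p))                    # diagonal weight matrix
    hessian = X.T @ W @ X                     # d^2C/dtheta dtheta^T
    step = np.linalg.solve(hessian, gradient)
    theta = theta - step                      # Newton-Raphson update
    if np.max(np.abs(step)) < 1e-10:          # stop when converged
        break

print('estimated theta:', theta)
print('true theta     :', theta_true)
```

In practice one would avoid forming the full diagonal matrix $\boldsymbol{W}$ explicitly (multiplying the rows of $\boldsymbol{X}$ by $p_i(1-p_i)$ achieves the same), but the explicit form mirrors the expressions above most directly.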
+ ] + }, + { + "cell_type": "markdown", + "id": "5b579d10", + "metadata": { + "editable": true + }, + "source": [ + "## Example code for Logistic Regression\n", + "\n", + "Here we make a class for Logistic regression. The code uses a simple data set and includes both a binary case and a multiclass case." + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "id": "a59b8c77", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import numpy as np\n", + "\n", + "class LogisticRegression:\n", + " \"\"\"\n", + " Logistic Regression for binary and multiclass classification.\n", + " \"\"\"\n", + " def __init__(self, lr=0.01, epochs=1000, fit_intercept=True, verbose=False):\n", + " self.lr = lr # Learning rate for gradient descent\n", + " self.epochs = epochs # Number of iterations\n", + " self.fit_intercept = fit_intercept # Whether to add intercept (bias)\n", + " self.verbose = verbose # Print loss during training if True\n", + " self.weights = None\n", + " self.multi_class = False # Will be determined at fit time\n", + "\n", + " def _add_intercept(self, X):\n", + " \"\"\"Add intercept term (column of ones) to feature matrix.\"\"\"\n", + " intercept = np.ones((X.shape[0], 1))\n", + " return np.concatenate((intercept, X), axis=1)\n", + "\n", + " def _sigmoid(self, z):\n", + " \"\"\"Sigmoid function for binary logistic.\"\"\"\n", + " return 1 / (1 + np.exp(-z))\n", + "\n", + " def _softmax(self, Z):\n", + " \"\"\"Softmax function for multiclass logistic.\"\"\"\n", + " exp_Z = np.exp(Z - np.max(Z, axis=1, keepdims=True))\n", + " return exp_Z / np.sum(exp_Z, axis=1, keepdims=True)\n", + "\n", + " def fit(self, X, y):\n", + " \"\"\"\n", + " Train the logistic regression model using gradient descent.\n", + " Supports binary (sigmoid) and multiclass (softmax) based on y.\n", + " \"\"\"\n", + " X = np.array(X)\n", + " y = np.array(y)\n", + " n_samples, n_features = X.shape\n", + "\n", + " # Add intercept if needed\n", + " if self.fit_intercept:\n", + " X = self._add_intercept(X)\n", + " n_features += 1\n", + "\n", + " # Determine classes and mode (binary vs multiclass)\n", + " unique_classes = np.unique(y)\n", + " if len(unique_classes) > 2:\n", + " self.multi_class = True\n", + " else:\n", + " self.multi_class = False\n", + "\n", + " # ----- Multiclass case -----\n", + " if self.multi_class:\n", + " n_classes = len(unique_classes)\n", + " # Map original labels to 0...n_classes-1\n", + " class_to_index = {c: idx for idx, c in enumerate(unique_classes)}\n", + " y_indices = np.array([class_to_index[c] for c in y])\n", + " # Initialize weight matrix (features x classes)\n", + " self.weights = np.zeros((n_features, n_classes))\n", + "\n", + " # One-hot encode y\n", + " Y_onehot = np.zeros((n_samples, n_classes))\n", + " Y_onehot[np.arange(n_samples), y_indices] = 1\n", + "\n", + " # Gradient descent\n", + " for epoch in range(self.epochs):\n", + " scores = X.dot(self.weights) # Linear scores (n_samples x n_classes)\n", + " probs = self._softmax(scores) # Probabilities (n_samples x n_classes)\n", + " # Compute gradient (features x classes)\n", + " gradient = (1 / n_samples) * X.T.dot(probs - Y_onehot)\n", + " # Update weights\n", + " self.weights -= self.lr * gradient\n", + "\n", + " if self.verbose and epoch % 100 == 0:\n", + " # Compute current loss (categorical cross-entropy)\n", + " loss = -np.sum(Y_onehot * np.log(probs + 1e-15)) / n_samples\n", + " print(f\"[Epoch {epoch}] Multiclass loss: {loss:.4f}\")\n", + "\n", + " # ----- Binary case -----\n", + 
" else:\n", + " # Convert y to 0/1 if not already\n", + " if not np.array_equal(unique_classes, [0, 1]):\n", + " # Map the two classes to 0 and 1\n", + " class0, class1 = unique_classes\n", + " y_binary = np.where(y == class1, 1, 0)\n", + " else:\n", + " y_binary = y.copy().astype(int)\n", + "\n", + " # Initialize weights vector (features,)\n", + " self.weights = np.zeros(n_features)\n", + "\n", + " # Gradient descent\n", + " for epoch in range(self.epochs):\n", + " linear_model = X.dot(self.weights) # (n_samples,)\n", + " probs = self._sigmoid(linear_model) # (n_samples,)\n", + " # Gradient for binary cross-entropy\n", + " gradient = (1 / n_samples) * X.T.dot(probs - y_binary)\n", + " self.weights -= self.lr * gradient\n", + "\n", + " if self.verbose and epoch % 100 == 0:\n", + " # Compute binary cross-entropy loss\n", + " loss = -np.mean(\n", + " y_binary * np.log(probs + 1e-15) + \n", + " (1 - y_binary) * np.log(1 - probs + 1e-15)\n", + " )\n", + " print(f\"[Epoch {epoch}] Binary loss: {loss:.4f}\")\n", + "\n", + " def predict_prob(self, X):\n", + " \"\"\"\n", + " Compute probability estimates. Returns a 1D array for binary or\n", + " a 2D array (n_samples x n_classes) for multiclass.\n", + " \"\"\"\n", + " X = np.array(X)\n", + " # Add intercept if the model used it\n", + " if self.fit_intercept:\n", + " X = self._add_intercept(X)\n", + " scores = X.dot(self.weights)\n", + " if self.multi_class:\n", + " return self._softmax(scores)\n", + " else:\n", + " return self._sigmoid(scores)\n", + "\n", + " def predict(self, X):\n", + " \"\"\"\n", + " Predict class labels for samples in X.\n", + " Returns integer class labels (0,1 for binary, or 0...C-1 for multiclass).\n", + " \"\"\"\n", + " probs = self.predict_prob(X)\n", + " if self.multi_class:\n", + " # Choose class with highest probability\n", + " return np.argmax(probs, axis=1)\n", + " else:\n", + " # Threshold at 0.5 for binary\n", + " return (probs >= 0.5).astype(int)" + ] + }, + { + "cell_type": "markdown", + "id": "d7401376", + "metadata": { + "editable": true + }, + "source": [ + "The class implements the sigmoid and softmax internally. During fit(),\n", + "we check the number of classes: if more than 2, we set\n", + "self.multi_class=True and perform multinomial logistic regression. We\n", + "one-hot encode the target vector and update a weight matrix with\n", + "softmax probabilities. Otherwise, we do standard binary logistic\n", + "regression, converting labels to 0/1 if needed and updating a weight\n", + "vector. In both cases we use batch gradient descent on the\n", + "cross-entropy loss (we add a small epsilon 1e-15 to logs for numerical\n", + "stability). Progress (loss) can be printed if verbose=True." + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "id": "8609fd64", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# Evaluation Metrics\n", + "#We define helper functions for accuracy and cross-entropy loss. Accuracy is the fraction of correct predictions . 
For loss, we compute the appropriate cross-entropy:\n", + "\n", + "def accuracy_score(y_true, y_pred):\n", + " \"\"\"Accuracy = (# correct predictions) / (total samples).\"\"\"\n", + " y_true = np.array(y_true)\n", + " y_pred = np.array(y_pred)\n", + " return np.mean(y_true == y_pred)\n", + "\n", + "def binary_cross_entropy(y_true, y_prob):\n", + " \"\"\"\n", + " Binary cross-entropy loss.\n", + " y_true: true binary labels (0 or 1), y_prob: predicted probabilities for class 1.\n", + " \"\"\"\n", + " y_true = np.array(y_true)\n", + " y_prob = np.clip(np.array(y_prob), 1e-15, 1-1e-15)\n", + " return -np.mean(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))\n", + "\n", + "def categorical_cross_entropy(y_true, y_prob):\n", + " \"\"\"\n", + " Categorical cross-entropy loss for multiclass.\n", + " y_true: true labels (0...C-1), y_prob: array of predicted probabilities (n_samples x C).\n", + " \"\"\"\n", + " y_true = np.array(y_true, dtype=int)\n", + " y_prob = np.clip(np.array(y_prob), 1e-15, 1-1e-15)\n", + " # One-hot encode true labels\n", + " n_samples, n_classes = y_prob.shape\n", + " one_hot = np.zeros_like(y_prob)\n", + " one_hot[np.arange(n_samples), y_true] = 1\n", + " # Compute cross-entropy\n", + " loss_vec = -np.sum(one_hot * np.log(y_prob), axis=1)\n", + " return np.mean(loss_vec)" + ] + }, + { + "cell_type": "markdown", + "id": "1879aba2", + "metadata": { + "editable": true + }, + "source": [ + "### Synthetic data generation\n", + "\n", + "Binary classification data: Create two Gaussian clusters in 2D. For example, class 0 around mean [-2,-2] and class 1 around [2,2].\n", + "Multiclass data: Create several Gaussian clusters (one per class) spread out in feature space." + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "id": "6083d844", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import numpy as np\n", + "\n", + "def generate_binary_data(n_samples=100, n_features=2, random_state=None):\n", + " \"\"\"\n", + " Generate synthetic binary classification data.\n", + " Returns (X, y) where X is (n_samples x n_features), y in {0,1}.\n", + " \"\"\"\n", + " rng = np.random.RandomState(random_state)\n", + " # Half samples for class 0, half for class 1\n", + " n0 = n_samples // 2\n", + " n1 = n_samples - n0\n", + " # Class 0 around mean -2, class 1 around +2\n", + " mean0 = -2 * np.ones(n_features)\n", + " mean1 = 2 * np.ones(n_features)\n", + " X0 = rng.randn(n0, n_features) + mean0\n", + " X1 = rng.randn(n1, n_features) + mean1\n", + " X = np.vstack((X0, X1))\n", + " y = np.array([0]*n0 + [1]*n1)\n", + " return X, y\n", + "\n", + "def generate_multiclass_data(n_samples=150, n_features=2, n_classes=3, random_state=None):\n", + " \"\"\"\n", + " Generate synthetic multiclass data with n_classes Gaussian clusters.\n", + " \"\"\"\n", + " rng = np.random.RandomState(random_state)\n", + " X = []\n", + " y = []\n", + " samples_per_class = n_samples // n_classes\n", + " for cls in range(n_classes):\n", + " # Random cluster center for each class\n", + " center = rng.uniform(-5, 5, size=n_features)\n", + " Xi = rng.randn(samples_per_class, n_features) + center\n", + " yi = [cls] * samples_per_class\n", + " X.append(Xi)\n", + " y.extend(yi)\n", + " X = np.vstack(X)\n", + " y = np.array(y)\n", + " return X, y\n", + "\n", + "\n", + "# Generate and test on binary data\n", + "X_bin, y_bin = generate_binary_data(n_samples=200, n_features=2, random_state=42)\n", + "model_bin = LogisticRegression(lr=0.1, epochs=1000)\n", + 
"model_bin.fit(X_bin, y_bin)\n", + "y_prob_bin = model_bin.predict_prob(X_bin) # probabilities for class 1\n", + "y_pred_bin = model_bin.predict(X_bin) # predicted classes 0 or 1\n", + "\n", + "acc_bin = accuracy_score(y_bin, y_pred_bin)\n", + "loss_bin = binary_cross_entropy(y_bin, y_prob_bin)\n", + "print(f\"Binary Classification - Accuracy: {acc_bin:.2f}, Cross-Entropy Loss: {loss_bin:.2f}\")\n", + "#For multiclass:\n", + "# Generate and test on multiclass data\n", + "X_multi, y_multi = generate_multiclass_data(n_samples=300, n_features=2, n_classes=3, random_state=1)\n", + "model_multi = LogisticRegression(lr=0.1, epochs=1000)\n", + "model_multi.fit(X_multi, y_multi)\n", + "y_prob_multi = model_multi.predict_prob(X_multi) # (n_samples x 3) probabilities\n", + "y_pred_multi = model_multi.predict(X_multi) # predicted labels 0,1,2\n", + "\n", + "acc_multi = accuracy_score(y_multi, y_pred_multi)\n", + "loss_multi = categorical_cross_entropy(y_multi, y_prob_multi)\n", + "print(f\"Multiclass Classification - Accuracy: {acc_multi:.2f}, Cross-Entropy Loss: {loss_multi:.2f}\")\n", + "\n", + "# CSV Export\n", + "import csv\n", + "\n", + "# Export binary results\n", + "with open('binary_results.csv', mode='w', newline='') as f:\n", + " writer = csv.writer(f)\n", + " writer.writerow([\"TrueLabel\", \"PredictedLabel\"])\n", + " for true, pred in zip(y_bin, y_pred_bin):\n", + " writer.writerow([true, pred])\n", + "\n", + "# Export multiclass results\n", + "with open('multiclass_results.csv', mode='w', newline='') as f:\n", + " writer = csv.writer(f)\n", + " writer.writerow([\"TrueLabel\", \"PredictedLabel\"])\n", + " for true, pred in zip(y_multi, y_pred_multi):\n", + " writer.writerow([true, pred])" + ] + } + ], + "metadata": {}, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/doc/LectureNotes/_build/html/_sources/week40.ipynb b/doc/LectureNotes/_build/html/_sources/week40.ipynb new file mode 100644 index 000000000..aa3733b88 --- /dev/null +++ b/doc/LectureNotes/_build/html/_sources/week40.ipynb @@ -0,0 +1,2459 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "2303c986", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "75c3b33e", + "metadata": { + "editable": true + }, + "source": [ + "# Week 40: Gradient descent methods (continued) and start Neural networks\n", + "**Morten Hjorth-Jensen**, Department of Physics, University of Oslo, Norway\n", + "\n", + "Date: **September 29-October 3, 2025**" + ] + }, + { + "cell_type": "markdown", + "id": "4ba50982", + "metadata": { + "editable": true + }, + "source": [ + "## Lecture Monday September 29, 2025\n", + "1. Logistic regression and gradient descent, examples on how to code\n", + "\n", + "\n", + "2. Start with the basics of Neural Networks, setting up the basic steps, from the simple perceptron model to the multi-layer perceptron model\n", + "\n", + "3. Video of lecture at \n", + "\n", + "4. Whiteboard notes at " + ] + }, + { + "cell_type": "markdown", + "id": "1d527020", + "metadata": { + "editable": true + }, + "source": [ + "## Suggested readings and videos\n", + "**Readings and Videos:**\n", + "\n", + "1. The lecture notes for week 40 (these notes)\n", + "\n", + "\n", + "2. For neural networks we recommend Goodfellow et al chapter 6 and Raschka et al chapter 2 (contains also material about gradient descent) and chapter 11 (we will use this next week)\n", + "\n", + "\n", + "\n", + "3. Neural Networks demystified at \n", + "\n", + "4. 
Building Neural Networks from scratch at URL:https://www.youtube.com/watch?v=Wo5dMEP_BbI&list=PLQVvvaa0QuDcjD5BAw2DxE6OF2tius3V3&ab_channel=sentdex\"" + ] + }, + { + "cell_type": "markdown", + "id": "63a4d497", + "metadata": { + "editable": true + }, + "source": [ + "## Lab sessions Tuesday and Wednesday\n", + "**Material for the active learning sessions on Tuesday and Wednesday.**\n", + "\n", + " * Work on project 1 and discussions on how to structure your report\n", + "\n", + " * No weekly exercises for week 40, project work only\n", + "\n", + " * Video on how to write scientific reports recorded during one of the lab sessions at \n", + "\n", + " * A general guideline can be found at ." + ] + }, + { + "cell_type": "markdown", + "id": "73621d6b", + "metadata": { + "editable": true + }, + "source": [ + "## Logistic Regression, from last week\n", + "\n", + "In linear regression our main interest was centered on learning the\n", + "coefficients of a functional fit (say a polynomial) in order to be\n", + "able to predict the response of a continuous variable on some unseen\n", + "data. The fit to the continuous variable $y_i$ is based on some\n", + "independent variables $\\boldsymbol{x}_i$. Linear regression resulted in\n", + "analytical expressions for standard ordinary Least Squares or Ridge\n", + "regression (in terms of matrices to invert) for several quantities,\n", + "ranging from the variance and thereby the confidence intervals of the\n", + "parameters $\\boldsymbol{\\theta}$ to the mean squared error. If we can invert\n", + "the product of the design matrices, linear regression gives then a\n", + "simple recipe for fitting our data." + ] + }, + { + "cell_type": "markdown", + "id": "fc1df17b", + "metadata": { + "editable": true + }, + "source": [ + "## Classification problems\n", + "\n", + "Classification problems, however, are concerned with outcomes taking\n", + "the form of discrete variables (i.e. categories). We may for example,\n", + "on the basis of DNA sequencing for a number of patients, like to find\n", + "out which mutations are important for a certain disease; or based on\n", + "scans of various patients' brains, figure out if there is a tumor or\n", + "not; or given a specific physical system, we'd like to identify its\n", + "state, say whether it is an ordered or disordered system (typical\n", + "situation in solid state physics); or classify the status of a\n", + "patient, whether she/he has a stroke or not and many other similar\n", + "situations.\n", + "\n", + "The most common situation we encounter when we apply logistic\n", + "regression is that of two possible outcomes, normally denoted as a\n", + "binary outcome, true or false, positive or negative, success or\n", + "failure etc." + ] + }, + { + "cell_type": "markdown", + "id": "a3d311e6", + "metadata": { + "editable": true + }, + "source": [ + "## Optimization and Deep learning\n", + "\n", + "Logistic regression will also serve as our stepping stone towards\n", + "neural network algorithms and supervised deep learning. For logistic\n", + "learning, the minimization of the cost function leads to a non-linear\n", + "equation in the parameters $\\boldsymbol{\\theta}$. The optimization of the\n", + "problem calls therefore for minimization algorithms.\n", + "\n", + "As we have discussed earlier, this forms the\n", + "bottle neck of all machine learning algorithms, namely how to find\n", + "reliable minima of a multi-variable function. This leads us to the\n", + "family of gradient descent methods. 
The latter are the working horses\n", + "of basically all modern machine learning algorithms.\n", + "\n", + "We note also that many of the topics discussed here on logistic \n", + "regression are also commonly used in modern supervised Deep Learning\n", + "models, as we will see later." + ] + }, + { + "cell_type": "markdown", + "id": "4120d6f9", + "metadata": { + "editable": true + }, + "source": [ + "## Basics\n", + "\n", + "We consider the case where the outputs/targets, also called the\n", + "responses or the outcomes, $y_i$ are discrete and only take values\n", + "from $k=0,\\dots,K-1$ (i.e. $K$ classes).\n", + "\n", + "The goal is to predict the\n", + "output classes from the design matrix $\\boldsymbol{X}\\in\\mathbb{R}^{n\\times p}$\n", + "made of $n$ samples, each of which carries $p$ features or predictors. The\n", + "primary goal is to identify the classes to which new unseen samples\n", + "belong.\n", + "\n", + "Last week we specialized to the case of two classes only, with outputs\n", + "$y_i=0$ and $y_i=1$. Our outcomes could represent the status of a\n", + "credit card user that could default or not on her/his credit card\n", + "debt. That is" + ] + }, + { + "cell_type": "markdown", + "id": "9e85d1e4", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y_i = \\begin{bmatrix} 0 & \\mathrm{no}\\\\ 1 & \\mathrm{yes} \\end{bmatrix}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "a0d8c838", + "metadata": { + "editable": true + }, + "source": [ + "## Two parameters\n", + "\n", + "We assume now that we have two classes with $y_i$ either $0$ or $1$. Furthermore we assume also that we have only two parameters $\\theta$ in our fitting of the Sigmoid function, that is we define probabilities" + ] + }, + { + "cell_type": "markdown", + "id": "7cea7945", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{align*}\n", + "p(y_i=1|x_i,\\boldsymbol{\\theta}) &= \\frac{\\exp{(\\theta_0+\\theta_1x_i)}}{1+\\exp{(\\theta_0+\\theta_1x_i)}},\\nonumber\\\\\n", + "p(y_i=0|x_i,\\boldsymbol{\\theta}) &= 1 - p(y_i=1|x_i,\\boldsymbol{\\theta}),\n", + "\\end{align*}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "6adc5106", + "metadata": { + "editable": true + }, + "source": [ + "where $\\boldsymbol{\\theta}$ are the weights we wish to extract from data, in our case $\\theta_0$ and $\\theta_1$. \n", + "\n", + "Note that we used" + ] + }, + { + "cell_type": "markdown", + "id": "f976068e", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "p(y_i=0\\vert x_i, \\boldsymbol{\\theta}) = 1-p(y_i=1\\vert x_i, \\boldsymbol{\\theta}).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "dedf9f0e", + "metadata": { + "editable": true + }, + "source": [ + "## Maximum likelihood\n", + "\n", + "In order to define the total likelihood for all possible outcomes from a \n", + "dataset $\\mathcal{D}=\\{(y_i,x_i)\\}$, with the binary labels\n", + "$y_i\\in\\{0,1\\}$ and where the data points are drawn independently, we use the so-called [Maximum Likelihood Estimation](https://en.wikipedia.org/wiki/Maximum_likelihood_estimation) (MLE) principle. \n", + "We aim thus at maximizing \n", + "the probability of seeing the observed data. 
We can then approximate the \n", + "likelihood in terms of the product of the individual probabilities of a specific outcome $y_i$, that is" + ] + }, + { + "cell_type": "markdown", + "id": "bd8b54ab", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{align*}\n", + "P(\\mathcal{D}|\\boldsymbol{\\theta})& = \\prod_{i=1}^n \\left[p(y_i=1|x_i,\\boldsymbol{\\theta})\\right]^{y_i}\\left[1-p(y_i=1|x_i,\\boldsymbol{\\theta}))\\right]^{1-y_i}\\nonumber \\\\\n", + "\\end{align*}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "57bfb17f", + "metadata": { + "editable": true + }, + "source": [ + "from which we obtain the log-likelihood and our **cost/loss** function" + ] + }, + { + "cell_type": "markdown", + "id": "00aee268", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathcal{C}(\\boldsymbol{\\theta}) = \\sum_{i=1}^n \\left( y_i\\log{p(y_i=1|x_i,\\boldsymbol{\\theta})} + (1-y_i)\\log\\left[1-p(y_i=1|x_i,\\boldsymbol{\\theta}))\\right]\\right).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "e12940f3", + "metadata": { + "editable": true + }, + "source": [ + "## The cost function rewritten\n", + "\n", + "Reordering the logarithms, we can rewrite the **cost/loss** function as" + ] + }, + { + "cell_type": "markdown", + "id": "e5b2b29e", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathcal{C}(\\boldsymbol{\\theta}) = \\sum_{i=1}^n \\left(y_i(\\theta_0+\\theta_1x_i) -\\log{(1+\\exp{(\\theta_0+\\theta_1x_i)})}\\right).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "c6c0ba4c", + "metadata": { + "editable": true + }, + "source": [ + "The maximum likelihood estimator is defined as the set of parameters that maximize the log-likelihood where we maximize with respect to $\\theta$.\n", + "Since the cost (error) function is just the negative log-likelihood, for logistic regression we have that" + ] + }, + { + "cell_type": "markdown", + "id": "46ee2ea8", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathcal{C}(\\boldsymbol{\\theta})=-\\sum_{i=1}^n \\left(y_i(\\theta_0+\\theta_1x_i) -\\log{(1+\\exp{(\\theta_0+\\theta_1x_i)})}\\right).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "9a05709b", + "metadata": { + "editable": true + }, + "source": [ + "This equation is known in statistics as the **cross entropy**. Finally, we note that just as in linear regression, \n", + "in practice we often supplement the cross-entropy with additional regularization terms, usually $L_1$ and $L_2$ regularization as we did for Ridge and Lasso regression." + ] + }, + { + "cell_type": "markdown", + "id": "ae1362c9", + "metadata": { + "editable": true + }, + "source": [ + "## Minimizing the cross entropy\n", + "\n", + "The cross entropy is a convex function of the weights $\\boldsymbol{\\theta}$ and,\n", + "therefore, any local minimizer is a global minimizer. 
\n", + "\n", + "Minimizing this\n", + "cost function with respect to the two parameters $\\theta_0$ and $\\theta_1$ we obtain" + ] + }, + { + "cell_type": "markdown", + "id": "57f4670b", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial \\mathcal{C}(\\boldsymbol{\\theta})}{\\partial \\theta_0} = -\\sum_{i=1}^n \\left(y_i -\\frac{\\exp{(\\theta_0+\\theta_1x_i)}}{1+\\exp{(\\theta_0+\\theta_1x_i)}}\\right),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "1dc19f59", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "4e96dc87", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial \\mathcal{C}(\\boldsymbol{\\theta})}{\\partial \\theta_1} = -\\sum_{i=1}^n \\left(y_ix_i -x_i\\frac{\\exp{(\\theta_0+\\theta_1x_i)}}{1+\\exp{(\\theta_0+\\theta_1x_i)}}\\right).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "fa77bec9", + "metadata": { + "editable": true + }, + "source": [ + "## A more compact expression\n", + "\n", + "Let us now define a vector $\\boldsymbol{y}$ with $n$ elements $y_i$, an\n", + "$n\\times p$ matrix $\\boldsymbol{X}$ which contains the $x_i$ values and a\n", + "vector $\\boldsymbol{p}$ of fitted probabilities $p(y_i\\vert x_i,\\boldsymbol{\\theta})$. We can rewrite in a more compact form the first\n", + "derivative of the cost function as" + ] + }, + { + "cell_type": "markdown", + "id": "1b013fd2", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial \\mathcal{C}(\\boldsymbol{\\theta})}{\\partial \\boldsymbol{\\theta}} = -\\boldsymbol{X}^T\\left(\\boldsymbol{y}-\\boldsymbol{p}\\right).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "910f36dd", + "metadata": { + "editable": true + }, + "source": [ + "If we in addition define a diagonal matrix $\\boldsymbol{W}$ with elements \n", + "$p(y_i\\vert x_i,\\boldsymbol{\\theta})(1-p(y_i\\vert x_i,\\boldsymbol{\\theta})$, we can obtain a compact expression of the second derivative as" + ] + }, + { + "cell_type": "markdown", + "id": "8212d0ed", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial^2 \\mathcal{C}(\\boldsymbol{\\theta})}{\\partial \\boldsymbol{\\theta}\\partial \\boldsymbol{\\theta}^T} = \\boldsymbol{X}^T\\boldsymbol{W}\\boldsymbol{X}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "7ae7078b", + "metadata": { + "editable": true + }, + "source": [ + "## Extending to more predictors\n", + "\n", + "Within a binary classification problem, we can easily expand our model to include multiple predictors. 
Our ratio between likelihoods is then with $p$ predictors" + ] + }, + { + "cell_type": "markdown", + "id": "59e57d7c", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\log{ \\frac{p(\\boldsymbol{\\theta}\\boldsymbol{x})}{1-p(\\boldsymbol{\\theta}\\boldsymbol{x})}} = \\theta_0+\\theta_1x_1+\\theta_2x_2+\\dots+\\theta_px_p.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "6ffe0955", + "metadata": { + "editable": true + }, + "source": [ + "Here we defined $\\boldsymbol{x}=[1,x_1,x_2,\\dots,x_p]$ and $\\boldsymbol{\\theta}=[\\theta_0, \\theta_1, \\dots, \\theta_p]$ leading to" + ] + }, + { + "cell_type": "markdown", + "id": "56e9bd82", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "p(\\boldsymbol{\\theta}\\boldsymbol{x})=\\frac{ \\exp{(\\theta_0+\\theta_1x_1+\\theta_2x_2+\\dots+\\theta_px_p)}}{1+\\exp{(\\theta_0+\\theta_1x_1+\\theta_2x_2+\\dots+\\theta_px_p)}}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "86b12946", + "metadata": { + "editable": true + }, + "source": [ + "## Including more classes\n", + "\n", + "Till now we have mainly focused on two classes, the so-called binary\n", + "system. Suppose we wish to extend to $K$ classes. Let us for the sake\n", + "of simplicity assume we have only two predictors. We have then following model" + ] + }, + { + "cell_type": "markdown", + "id": "d55394df", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\log{\\frac{p(C=1\\vert x)}{p(K\\vert x)}} = \\theta_{10}+\\theta_{11}x_1,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "ee01378a", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "c7fadfbb", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\log{\\frac{p(C=2\\vert x)}{p(K\\vert x)}} = \\theta_{20}+\\theta_{21}x_1,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "e8310f63", + "metadata": { + "editable": true + }, + "source": [ + "and so on till the class $C=K-1$ class" + ] + }, + { + "cell_type": "markdown", + "id": "be651647", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\log{\\frac{p(C=K-1\\vert x)}{p(K\\vert x)}} = \\theta_{(K-1)0}+\\theta_{(K-1)1}x_1,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "e277c601", + "metadata": { + "editable": true + }, + "source": [ + "and the model is specified in term of $K-1$ so-called log-odds or\n", + "**logit** transformations." + ] + }, + { + "cell_type": "markdown", + "id": "aea3a410", + "metadata": { + "editable": true + }, + "source": [ + "## More classes\n", + "\n", + "In our discussion of neural networks we will encounter the above again\n", + "in terms of a slightly modified function, the so-called **Softmax** function.\n", + "\n", + "The softmax function is used in various multiclass classification\n", + "methods, such as multinomial logistic regression (also known as\n", + "softmax regression), multiclass linear discriminant analysis, naive\n", + "Bayes classifiers, and artificial neural networks. 
Specifically, in\n", + "multinomial logistic regression and linear discriminant analysis, the\n", + "input to the function is the result of $K$ distinct linear functions,\n", + "and the predicted probability for the $k$-th class given a sample\n", + "vector $\\boldsymbol{x}$ and a weighting vector $\\boldsymbol{\\theta}$ is (with two\n", + "predictors):" + ] + }, + { + "cell_type": "markdown", + "id": "bfa7221f", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "p(C=k\\vert \\mathbf {x} )=\\frac{\\exp{(\\theta_{k0}+\\theta_{k1}x_1)}}{1+\\sum_{l=1}^{K-1}\\exp{(\\theta_{l0}+\\theta_{l1}x_1)}}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "3d749c39", + "metadata": { + "editable": true + }, + "source": [ + "It is easy to extend to more predictors. The final class is" + ] + }, + { + "cell_type": "markdown", + "id": "dc061a39", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "p(C=K\\vert \\mathbf {x} )=\\frac{1}{1+\\sum_{l=1}^{K-1}\\exp{(\\theta_{l0}+\\theta_{l1}x_1)}},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "8ea10488", + "metadata": { + "editable": true + }, + "source": [ + "and they sum to one. Our earlier discussions were all specialized to\n", + "the case with two classes only. It is easy to see from the above that\n", + "what we derived earlier is compatible with these equations.\n", + "\n", + "To find the optimal parameters we would typically use a gradient\n", + "descent method. Newton's method and gradient descent methods are\n", + "discussed in the material on [optimization\n", + "methods](https://compphysics.github.io/MachineLearning/doc/pub/Splines/html/Splines-bs.html)." + ] + }, + { + "cell_type": "markdown", + "id": "9cb3baf8", + "metadata": { + "editable": true + }, + "source": [ + "## Optimization, the central part of any Machine Learning algortithm\n", + "\n", + "Almost every problem in machine learning and data science starts with\n", + "a dataset $X$, a model $g(\\theta)$, which is a function of the\n", + "parameters $\\theta$ and a cost function $C(X, g(\\theta))$ that allows\n", + "us to judge how well the model $g(\\theta)$ explains the observations\n", + "$X$. The model is fit by finding the values of $\\theta$ that minimize\n", + "the cost function. Ideally we would be able to solve for $\\theta$\n", + "analytically, however this is not possible in general and we must use\n", + "some approximative/numerical method to compute the minimum." + ] + }, + { + "cell_type": "markdown", + "id": "387393d7", + "metadata": { + "editable": true + }, + "source": [ + "## Revisiting our Logistic Regression case\n", + "\n", + "In our discussion on Logistic Regression we studied the \n", + "case of\n", + "two classes, with $y_i$ either\n", + "$0$ or $1$. Furthermore we assumed also that we have only two\n", + "parameters $\\theta$ in our fitting, that is we\n", + "defined probabilities" + ] + }, + { + "cell_type": "markdown", + "id": "30f64659", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{align*}\n", + "p(y_i=1|x_i,\\boldsymbol{\\theta}) &= \\frac{\\exp{(\\theta_0+\\theta_1x_i)}}{1+\\exp{(\\theta_0+\\theta_1x_i)}},\\nonumber\\\\\n", + "p(y_i=0|x_i,\\boldsymbol{\\theta}) &= 1 - p(y_i=1|x_i,\\boldsymbol{\\theta}),\n", + "\\end{align*}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "3ba65422", + "metadata": { + "editable": true + }, + "source": [ + "where $\\boldsymbol{\\theta}$ are the weights we wish to extract from data, in our case $\\theta_0$ and $\\theta_1$." 
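+ ,
+ "\n",
+ "As a quick numerical sanity check (a sketch with arbitrarily chosen, not fitted, parameters), we can verify that the $K$-class expressions above reduce to exactly these two-class probabilities when $K=2$: class $1$ carries the single linear function $\\theta_0+\\theta_1x$ and class $2$ is the reference class.\n",
+ "\n",
+ "```python\n",
+ "import numpy as np\n",
+ "\n",
+ "# Arbitrary illustrative parameters and inputs (not fitted values)\n",
+ "theta0, theta1 = 0.5, -1.5\n",
+ "x = np.linspace(-2, 2, 5)\n",
+ "z = theta0 + theta1 * x\n",
+ "\n",
+ "# K = 2: class 1 has the linear function z, class 2 is the reference class\n",
+ "p_class1 = np.exp(z) / (1 + np.exp(z))\n",
+ "p_class2 = 1 / (1 + np.exp(z))\n",
+ "\n",
+ "# The same probabilities written in the two-class (Sigmoid) form\n",
+ "sigmoid = 1 / (1 + np.exp(-z))\n",
+ "print(np.allclose(p_class1, sigmoid))\n",
+ "print(np.allclose(p_class1 + p_class2, 1.0))\n",
+ "```"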
+ ] + }, + { + "cell_type": "markdown", + "id": "005f46d7", + "metadata": { + "editable": true + }, + "source": [ + "## The equations to solve\n", + "\n", + "Our compact equations used a definition of a vector $\\boldsymbol{y}$ with $n$\n", + "elements $y_i$, an $n\\times p$ matrix $\\boldsymbol{X}$ which contains the\n", + "$x_i$ values and a vector $\\boldsymbol{p}$ of fitted probabilities\n", + "$p(y_i\\vert x_i,\\boldsymbol{\\theta})$. We rewrote in a more compact form\n", + "the first derivative of the cost function as" + ] + }, + { + "cell_type": "markdown", + "id": "61a638bc", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial \\mathcal{C}(\\boldsymbol{\\theta})}{\\partial \\boldsymbol{\\theta}} = -\\boldsymbol{X}^T\\left(\\boldsymbol{y}-\\boldsymbol{p}\\right).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "469c0042", + "metadata": { + "editable": true + }, + "source": [ + "If we in addition define a diagonal matrix $\\boldsymbol{W}$ with elements \n", + "$p(y_i\\vert x_i,\\boldsymbol{\\theta})(1-p(y_i\\vert x_i,\\boldsymbol{\\theta})$, we can obtain a compact expression of the second derivative as" + ] + }, + { + "cell_type": "markdown", + "id": "0af5449a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial^2 \\mathcal{C}(\\boldsymbol{\\theta})}{\\partial \\boldsymbol{\\theta}\\partial \\boldsymbol{\\theta}^T} = \\boldsymbol{X}^T\\boldsymbol{W}\\boldsymbol{X}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "f4c16b4f", + "metadata": { + "editable": true + }, + "source": [ + "This defines what is called the Hessian matrix." + ] + }, + { + "cell_type": "markdown", + "id": "ddbe7f50", + "metadata": { + "editable": true + }, + "source": [ + "## Solving using Newton-Raphson's method\n", + "\n", + "If we can set up these equations, Newton-Raphson's iterative method is normally the method of choice. It requires however that we can compute in an efficient way the matrices that define the first and second derivatives. \n", + "\n", + "Our iterative scheme is then given by" + ] + }, + { + "cell_type": "markdown", + "id": "52830f96", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{\\theta}^{\\mathrm{new}} = \\boldsymbol{\\theta}^{\\mathrm{old}}-\\left(\\frac{\\partial^2 \\mathcal{C}(\\boldsymbol{\\theta})}{\\partial \\boldsymbol{\\theta}\\partial \\boldsymbol{\\theta}^T}\\right)^{-1}_{\\boldsymbol{\\theta}^{\\mathrm{old}}}\\times \\left(\\frac{\\partial \\mathcal{C}(\\boldsymbol{\\theta})}{\\partial \\boldsymbol{\\theta}}\\right)_{\\boldsymbol{\\theta}^{\\mathrm{old}}},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "1b8a1c14", + "metadata": { + "editable": true + }, + "source": [ + "or in matrix form as" + ] + }, + { + "cell_type": "markdown", + "id": "8ad73cea", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{\\theta}^{\\mathrm{new}} = \\boldsymbol{\\theta}^{\\mathrm{old}}-\\left(\\boldsymbol{X}^T\\boldsymbol{W}\\boldsymbol{X} \\right)^{-1}\\times \\left(-\\boldsymbol{X}^T(\\boldsymbol{y}-\\boldsymbol{p}) \\right)_{\\boldsymbol{\\theta}^{\\mathrm{old}}}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "6d47dd0b", + "metadata": { + "editable": true + }, + "source": [ + "The right-hand side is computed with the old values of $\\theta$. \n", + "\n", + "If we can compute these matrices, in particular the Hessian, the above is often the easiest method to implement." 
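+ ,
+ "\n",
+ "The update above is straightforward to sketch in code. The following is a minimal illustration for a one-feature model (intercept plus slope), using synthetic data generated from an arbitrarily chosen true parameter vector; it is only a sketch, with no regularization and no safeguards against an ill-conditioned $\\boldsymbol{X}^T\\boldsymbol{W}\\boldsymbol{X}$.\n",
+ "\n",
+ "```python\n",
+ "import numpy as np\n",
+ "\n",
+ "rng = np.random.default_rng(2025)\n",
+ "\n",
+ "# Synthetic one-feature data from an assumed true parameter vector (illustration only)\n",
+ "n = 200\n",
+ "x = rng.normal(size=n)\n",
+ "X = np.column_stack((np.ones(n), x))   # design matrix with intercept column\n",
+ "theta_true = np.array([-0.5, 2.0])\n",
+ "y = rng.binomial(1, 1 / (1 + np.exp(-X @ theta_true)))\n",
+ "\n",
+ "# Newton-Raphson iterations\n",
+ "theta = np.zeros(2)\n",
+ "for _ in range(10):\n",
+ "    p = 1 / (1 + np.exp(-X @ theta))   # fitted probabilities\n",
+ "    gradient = -X.T @ (y - p)          # first derivative of the cost function\n",
+ "    W = np.diag(p * (1 - p))           # diagonal weight matrix\n",
+ "    hessian = X.T @ W @ X              # second derivative (Hessian)\n",
+ "    theta = theta - np.linalg.solve(hessian, gradient)\n",
+ "\n",
+ "print(theta)   # should be close to theta_true for this simple data set\n",
+ "```"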
+ ] + }, + { + "cell_type": "markdown", + "id": "f399c2f4", + "metadata": { + "editable": true + }, + "source": [ + "## Example code for Logistic Regression\n", + "\n", + "Here we make a class for Logistic regression. The code uses a simple data set and includes both a binary case and a multiclass case." + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "79f6b6fc", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import numpy as np\n", + "\n", + "class LogisticRegression:\n", + " \"\"\"\n", + " Logistic Regression for binary and multiclass classification.\n", + " \"\"\"\n", + " def __init__(self, lr=0.01, epochs=1000, fit_intercept=True, verbose=False):\n", + " self.lr = lr # Learning rate for gradient descent\n", + " self.epochs = epochs # Number of iterations\n", + " self.fit_intercept = fit_intercept # Whether to add intercept (bias)\n", + " self.verbose = verbose # Print loss during training if True\n", + " self.weights = None\n", + " self.multi_class = False # Will be determined at fit time\n", + "\n", + " def _add_intercept(self, X):\n", + " \"\"\"Add intercept term (column of ones) to feature matrix.\"\"\"\n", + " intercept = np.ones((X.shape[0], 1))\n", + " return np.concatenate((intercept, X), axis=1)\n", + "\n", + " def _sigmoid(self, z):\n", + " \"\"\"Sigmoid function for binary logistic.\"\"\"\n", + " return 1 / (1 + np.exp(-z))\n", + "\n", + " def _softmax(self, Z):\n", + " \"\"\"Softmax function for multiclass logistic.\"\"\"\n", + " exp_Z = np.exp(Z - np.max(Z, axis=1, keepdims=True))\n", + " return exp_Z / np.sum(exp_Z, axis=1, keepdims=True)\n", + "\n", + " def fit(self, X, y):\n", + " \"\"\"\n", + " Train the logistic regression model using gradient descent.\n", + " Supports binary (sigmoid) and multiclass (softmax) based on y.\n", + " \"\"\"\n", + " X = np.array(X)\n", + " y = np.array(y)\n", + " n_samples, n_features = X.shape\n", + "\n", + " # Add intercept if needed\n", + " if self.fit_intercept:\n", + " X = self._add_intercept(X)\n", + " n_features += 1\n", + "\n", + " # Determine classes and mode (binary vs multiclass)\n", + " unique_classes = np.unique(y)\n", + " if len(unique_classes) > 2:\n", + " self.multi_class = True\n", + " else:\n", + " self.multi_class = False\n", + "\n", + " # ----- Multiclass case -----\n", + " if self.multi_class:\n", + " n_classes = len(unique_classes)\n", + " # Map original labels to 0...n_classes-1\n", + " class_to_index = {c: idx for idx, c in enumerate(unique_classes)}\n", + " y_indices = np.array([class_to_index[c] for c in y])\n", + " # Initialize weight matrix (features x classes)\n", + " self.weights = np.zeros((n_features, n_classes))\n", + "\n", + " # One-hot encode y\n", + " Y_onehot = np.zeros((n_samples, n_classes))\n", + " Y_onehot[np.arange(n_samples), y_indices] = 1\n", + "\n", + " # Gradient descent\n", + " for epoch in range(self.epochs):\n", + " scores = X.dot(self.weights) # Linear scores (n_samples x n_classes)\n", + " probs = self._softmax(scores) # Probabilities (n_samples x n_classes)\n", + " # Compute gradient (features x classes)\n", + " gradient = (1 / n_samples) * X.T.dot(probs - Y_onehot)\n", + " # Update weights\n", + " self.weights -= self.lr * gradient\n", + "\n", + " if self.verbose and epoch % 100 == 0:\n", + " # Compute current loss (categorical cross-entropy)\n", + " loss = -np.sum(Y_onehot * np.log(probs + 1e-15)) / n_samples\n", + " print(f\"[Epoch {epoch}] Multiclass loss: {loss:.4f}\")\n", + "\n", + " # ----- Binary case -----\n", + 
" else:\n", + " # Convert y to 0/1 if not already\n", + " if not np.array_equal(unique_classes, [0, 1]):\n", + " # Map the two classes to 0 and 1\n", + " class0, class1 = unique_classes\n", + " y_binary = np.where(y == class1, 1, 0)\n", + " else:\n", + " y_binary = y.copy().astype(int)\n", + "\n", + " # Initialize weights vector (features,)\n", + " self.weights = np.zeros(n_features)\n", + "\n", + " # Gradient descent\n", + " for epoch in range(self.epochs):\n", + " linear_model = X.dot(self.weights) # (n_samples,)\n", + " probs = self._sigmoid(linear_model) # (n_samples,)\n", + " # Gradient for binary cross-entropy\n", + " gradient = (1 / n_samples) * X.T.dot(probs - y_binary)\n", + " self.weights -= self.lr * gradient\n", + "\n", + " if self.verbose and epoch % 100 == 0:\n", + " # Compute binary cross-entropy loss\n", + " loss = -np.mean(\n", + " y_binary * np.log(probs + 1e-15) + \n", + " (1 - y_binary) * np.log(1 - probs + 1e-15)\n", + " )\n", + " print(f\"[Epoch {epoch}] Binary loss: {loss:.4f}\")\n", + "\n", + " def predict_prob(self, X):\n", + " \"\"\"\n", + " Compute probability estimates. Returns a 1D array for binary or\n", + " a 2D array (n_samples x n_classes) for multiclass.\n", + " \"\"\"\n", + " X = np.array(X)\n", + " # Add intercept if the model used it\n", + " if self.fit_intercept:\n", + " X = self._add_intercept(X)\n", + " scores = X.dot(self.weights)\n", + " if self.multi_class:\n", + " return self._softmax(scores)\n", + " else:\n", + " return self._sigmoid(scores)\n", + "\n", + " def predict(self, X):\n", + " \"\"\"\n", + " Predict class labels for samples in X.\n", + " Returns integer class labels (0,1 for binary, or 0...C-1 for multiclass).\n", + " \"\"\"\n", + " probs = self.predict_prob(X)\n", + " if self.multi_class:\n", + " # Choose class with highest probability\n", + " return np.argmax(probs, axis=1)\n", + " else:\n", + " # Threshold at 0.5 for binary\n", + " return (probs >= 0.5).astype(int)" + ] + }, + { + "cell_type": "markdown", + "id": "24e84b29", + "metadata": { + "editable": true + }, + "source": [ + "The class implements the sigmoid and softmax internally. During fit(),\n", + "we check the number of classes: if more than 2, we set\n", + "self.multi_class=True and perform multinomial logistic regression. We\n", + "one-hot encode the target vector and update a weight matrix with\n", + "softmax probabilities. Otherwise, we do standard binary logistic\n", + "regression, converting labels to 0/1 if needed and updating a weight\n", + "vector. In both cases we use batch gradient descent on the\n", + "cross-entropy loss (we add a small epsilon 1e-15 to logs for numerical\n", + "stability). Progress (loss) can be printed if verbose=True." + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "7a73eca4", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# Evaluation Metrics\n", + "#We define helper functions for accuracy and cross-entropy loss. Accuracy is the fraction of correct predictions . 
For loss, we compute the appropriate cross-entropy:\n", + "\n", + "def accuracy_score(y_true, y_pred):\n", + " \"\"\"Accuracy = (# correct predictions) / (total samples).\"\"\"\n", + " y_true = np.array(y_true)\n", + " y_pred = np.array(y_pred)\n", + " return np.mean(y_true == y_pred)\n", + "\n", + "def binary_cross_entropy(y_true, y_prob):\n", + " \"\"\"\n", + " Binary cross-entropy loss.\n", + " y_true: true binary labels (0 or 1), y_prob: predicted probabilities for class 1.\n", + " \"\"\"\n", + " y_true = np.array(y_true)\n", + " y_prob = np.clip(np.array(y_prob), 1e-15, 1-1e-15)\n", + " return -np.mean(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))\n", + "\n", + "def categorical_cross_entropy(y_true, y_prob):\n", + " \"\"\"\n", + " Categorical cross-entropy loss for multiclass.\n", + " y_true: true labels (0...C-1), y_prob: array of predicted probabilities (n_samples x C).\n", + " \"\"\"\n", + " y_true = np.array(y_true, dtype=int)\n", + " y_prob = np.clip(np.array(y_prob), 1e-15, 1-1e-15)\n", + " # One-hot encode true labels\n", + " n_samples, n_classes = y_prob.shape\n", + " one_hot = np.zeros_like(y_prob)\n", + " one_hot[np.arange(n_samples), y_true] = 1\n", + " # Compute cross-entropy\n", + " loss_vec = -np.sum(one_hot * np.log(y_prob), axis=1)\n", + " return np.mean(loss_vec)" + ] + }, + { + "cell_type": "markdown", + "id": "40d4b30f", + "metadata": { + "editable": true + }, + "source": [ + "### Synthetic data generation\n", + "\n", + "Binary classification data: Create two Gaussian clusters in 2D. For example, class 0 around mean [-2,-2] and class 1 around [2,2].\n", + "Multiclass data: Create several Gaussian clusters (one per class) spread out in feature space." + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "ac0089bf", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import numpy as np\n", + "\n", + "def generate_binary_data(n_samples=100, n_features=2, random_state=None):\n", + " \"\"\"\n", + " Generate synthetic binary classification data.\n", + " Returns (X, y) where X is (n_samples x n_features), y in {0,1}.\n", + " \"\"\"\n", + " rng = np.random.RandomState(random_state)\n", + " # Half samples for class 0, half for class 1\n", + " n0 = n_samples // 2\n", + " n1 = n_samples - n0\n", + " # Class 0 around mean -2, class 1 around +2\n", + " mean0 = -2 * np.ones(n_features)\n", + " mean1 = 2 * np.ones(n_features)\n", + " X0 = rng.randn(n0, n_features) + mean0\n", + " X1 = rng.randn(n1, n_features) + mean1\n", + " X = np.vstack((X0, X1))\n", + " y = np.array([0]*n0 + [1]*n1)\n", + " return X, y\n", + "\n", + "def generate_multiclass_data(n_samples=150, n_features=2, n_classes=3, random_state=None):\n", + " \"\"\"\n", + " Generate synthetic multiclass data with n_classes Gaussian clusters.\n", + " \"\"\"\n", + " rng = np.random.RandomState(random_state)\n", + " X = []\n", + " y = []\n", + " samples_per_class = n_samples // n_classes\n", + " for cls in range(n_classes):\n", + " # Random cluster center for each class\n", + " center = rng.uniform(-5, 5, size=n_features)\n", + " Xi = rng.randn(samples_per_class, n_features) + center\n", + " yi = [cls] * samples_per_class\n", + " X.append(Xi)\n", + " y.extend(yi)\n", + " X = np.vstack(X)\n", + " y = np.array(y)\n", + " return X, y\n", + "\n", + "\n", + "# Generate and test on binary data\n", + "X_bin, y_bin = generate_binary_data(n_samples=200, n_features=2, random_state=42)\n", + "model_bin = LogisticRegression(lr=0.1, epochs=1000)\n", + 
"model_bin.fit(X_bin, y_bin)\n", + "y_prob_bin = model_bin.predict_prob(X_bin) # probabilities for class 1\n", + "y_pred_bin = model_bin.predict(X_bin) # predicted classes 0 or 1\n", + "\n", + "acc_bin = accuracy_score(y_bin, y_pred_bin)\n", + "loss_bin = binary_cross_entropy(y_bin, y_prob_bin)\n", + "print(f\"Binary Classification - Accuracy: {acc_bin:.2f}, Cross-Entropy Loss: {loss_bin:.2f}\")\n", + "#For multiclass:\n", + "# Generate and test on multiclass data\n", + "X_multi, y_multi = generate_multiclass_data(n_samples=300, n_features=2, n_classes=3, random_state=1)\n", + "model_multi = LogisticRegression(lr=0.1, epochs=1000)\n", + "model_multi.fit(X_multi, y_multi)\n", + "y_prob_multi = model_multi.predict_prob(X_multi) # (n_samples x 3) probabilities\n", + "y_pred_multi = model_multi.predict(X_multi) # predicted labels 0,1,2\n", + "\n", + "acc_multi = accuracy_score(y_multi, y_pred_multi)\n", + "loss_multi = categorical_cross_entropy(y_multi, y_prob_multi)\n", + "print(f\"Multiclass Classification - Accuracy: {acc_multi:.2f}, Cross-Entropy Loss: {loss_multi:.2f}\")\n", + "\n", + "# CSV Export\n", + "import csv\n", + "\n", + "# Export binary results\n", + "with open('binary_results.csv', mode='w', newline='') as f:\n", + " writer = csv.writer(f)\n", + " writer.writerow([\"TrueLabel\", \"PredictedLabel\"])\n", + " for true, pred in zip(y_bin, y_pred_bin):\n", + " writer.writerow([true, pred])\n", + "\n", + "# Export multiclass results\n", + "with open('multiclass_results.csv', mode='w', newline='') as f:\n", + " writer = csv.writer(f)\n", + " writer.writerow([\"TrueLabel\", \"PredictedLabel\"])\n", + " for true, pred in zip(y_multi, y_pred_multi):\n", + " writer.writerow([true, pred])" + ] + }, + { + "cell_type": "markdown", + "id": "1e9acef3", + "metadata": { + "editable": true + }, + "source": [ + "## Using **Scikit-learn**\n", + "\n", + "We show here how we can use a logistic regression case on a data set\n", + "included in _scikit_learn_, the so-called Wisconsin breast cancer data\n", + "using Logistic regression as our algorithm for classification. This is\n", + "a widely studied data set and can easily be included in demonstrations\n", + "of classification problems." + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "9153234a", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "%matplotlib inline\n", + "\n", + "import matplotlib.pyplot as plt\n", + "import numpy as np\n", + "from sklearn.model_selection import train_test_split \n", + "from sklearn.datasets import load_breast_cancer\n", + "from sklearn.linear_model import LogisticRegression\n", + "\n", + "# Load the data\n", + "cancer = load_breast_cancer()\n", + "\n", + "X_train, X_test, y_train, y_test = train_test_split(cancer.data,cancer.target,random_state=0)\n", + "print(X_train.shape)\n", + "print(X_test.shape)\n", + "# Logistic Regression\n", + "logreg = LogisticRegression(solver='lbfgs')\n", + "logreg.fit(X_train, y_train)\n", + "print(\"Test set accuracy with Logistic Regression: {:.2f}\".format(logreg.score(X_test,y_test)))" + ] + }, + { + "cell_type": "markdown", + "id": "908d547b", + "metadata": { + "editable": true + }, + "source": [ + "## Using the correlation matrix\n", + "\n", + "In addition to the above scores, we could also study the covariance (and the correlation matrix).\n", + "We use **Pandas** to compute the correlation matrix." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "8a46f4f3", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import matplotlib.pyplot as plt\n", + "import numpy as np\n", + "from sklearn.model_selection import train_test_split \n", + "from sklearn.datasets import load_breast_cancer\n", + "from sklearn.linear_model import LogisticRegression\n", + "cancer = load_breast_cancer()\n", + "import pandas as pd\n", + "# Making a data frame\n", + "cancerpd = pd.DataFrame(cancer.data, columns=cancer.feature_names)\n", + "\n", + "fig, axes = plt.subplots(15,2,figsize=(10,20))\n", + "malignant = cancer.data[cancer.target == 0]\n", + "benign = cancer.data[cancer.target == 1]\n", + "ax = axes.ravel()\n", + "\n", + "for i in range(30):\n", + " _, bins = np.histogram(cancer.data[:,i], bins =50)\n", + " ax[i].hist(malignant[:,i], bins = bins, alpha = 0.5)\n", + " ax[i].hist(benign[:,i], bins = bins, alpha = 0.5)\n", + " ax[i].set_title(cancer.feature_names[i])\n", + " ax[i].set_yticks(())\n", + "ax[0].set_xlabel(\"Feature magnitude\")\n", + "ax[0].set_ylabel(\"Frequency\")\n", + "ax[0].legend([\"Malignant\", \"Benign\"], loc =\"best\")\n", + "fig.tight_layout()\n", + "plt.show()\n", + "\n", + "import seaborn as sns\n", + "correlation_matrix = cancerpd.corr().round(1)\n", + "# use the heatmap function from seaborn to plot the correlation matrix\n", + "# annot = True to print the values inside the square\n", + "plt.figure(figsize=(15,8))\n", + "sns.heatmap(data=correlation_matrix, annot=True)\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "ba0275a7", + "metadata": { + "editable": true + }, + "source": [ + "## Discussing the correlation data\n", + "\n", + "In the above example we note two things. In the first plot we display\n", + "the overlap of benign and malignant tumors as functions of the various\n", + "features in the Wisconsin data set. We see that for\n", + "some of the features we can distinguish clearly the benign and\n", + "malignant cases while for other features we cannot. This can point to\n", + "us which features may be of greater interest when we wish to classify\n", + "a benign or not benign tumour.\n", + "\n", + "In the second figure we have computed the so-called correlation\n", + "matrix, which in our case with thirty features becomes a $30\\times 30$\n", + "matrix.\n", + "\n", + "We constructed this matrix using **pandas** via the statements" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "1af34f8e", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "cancerpd = pd.DataFrame(cancer.data, columns=cancer.feature_names)" + ] + }, + { + "cell_type": "markdown", + "id": "1eac30d3", + "metadata": { + "editable": true + }, + "source": [ + "and then" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "a0cdd9c9", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "correlation_matrix = cancerpd.corr().round(1)" + ] + }, + { + "cell_type": "markdown", + "id": "013777ad", + "metadata": { + "editable": true + }, + "source": [ + "Diagonalizing this matrix we can in turn say something about which\n", + "features are of relevance and which are not. This leads us to\n", + "the classical Principal Component Analysis (PCA) theorem with\n", + "applications. This will be discussed later this semester." 
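+ ,
+ "\n",
+ "As a small preview of that discussion (a sketch only), the diagonalization can be done directly with NumPy on the correlation matrix of the thirty features; large eigenvalues correspond to directions that carry most of the variation in the data.\n",
+ "\n",
+ "```python\n",
+ "import numpy as np\n",
+ "import pandas as pd\n",
+ "from sklearn.datasets import load_breast_cancer\n",
+ "\n",
+ "# Recompute the correlation matrix as above and diagonalize it\n",
+ "cancer = load_breast_cancer()\n",
+ "cancerpd = pd.DataFrame(cancer.data, columns=cancer.feature_names)\n",
+ "corr = cancerpd.corr().to_numpy()\n",
+ "\n",
+ "eigenvalues, eigenvectors = np.linalg.eigh(corr)   # symmetric matrix\n",
+ "order = np.argsort(eigenvalues)[::-1]              # largest eigenvalues first\n",
+ "explained = eigenvalues[order] / eigenvalues.sum()\n",
+ "\n",
+ "# Cumulative fraction of the total variation captured by the leading directions\n",
+ "print(np.cumsum(explained)[:5])\n",
+ "```"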
+ ] + }, + { + "cell_type": "markdown", + "id": "410f90ac", + "metadata": { + "editable": true + }, + "source": [ + "## Other measures in classification studies" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "fa16a459", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import matplotlib.pyplot as plt\n", + "import numpy as np\n", + "from sklearn.model_selection import train_test_split \n", + "from sklearn.datasets import load_breast_cancer\n", + "from sklearn.linear_model import LogisticRegression\n", + "\n", + "# Load the data\n", + "cancer = load_breast_cancer()\n", + "\n", + "X_train, X_test, y_train, y_test = train_test_split(cancer.data,cancer.target,random_state=0)\n", + "print(X_train.shape)\n", + "print(X_test.shape)\n", + "# Logistic Regression\n", + "logreg = LogisticRegression(solver='lbfgs')\n", + "logreg.fit(X_train, y_train)\n", + "\n", + "from sklearn.preprocessing import LabelEncoder\n", + "from sklearn.model_selection import cross_validate\n", + "#Cross validation\n", + "accuracy = cross_validate(logreg,X_test,y_test,cv=10)['test_score']\n", + "print(accuracy)\n", + "print(\"Test set accuracy with Logistic Regression: {:.2f}\".format(logreg.score(X_test,y_test)))\n", + "\n", + "import scikitplot as skplt\n", + "y_pred = logreg.predict(X_test)\n", + "skplt.metrics.plot_confusion_matrix(y_test, y_pred, normalize=True)\n", + "plt.show()\n", + "y_probas = logreg.predict_proba(X_test)\n", + "skplt.metrics.plot_roc(y_test, y_probas)\n", + "plt.show()\n", + "skplt.metrics.plot_cumulative_gain(y_test, y_probas)\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "a721de53", + "metadata": { + "editable": true + }, + "source": [ + "## Introduction to Neural networks\n", + "\n", + "Artificial neural networks are computational systems that can learn to\n", + "perform tasks by considering examples, generally without being\n", + "programmed with any task-specific rules. It is supposed to mimic a\n", + "biological system, wherein neurons interact by sending signals in the\n", + "form of mathematical functions between layers. All layers can contain\n", + "an arbitrary number of neurons, and each connection is represented by\n", + "a weight variable." + ] + }, + { + "cell_type": "markdown", + "id": "68de5052", + "metadata": { + "editable": true + }, + "source": [ + "## Artificial neurons\n", + "\n", + "The field of artificial neural networks has a long history of\n", + "development, and is closely connected with the advancement of computer\n", + "science and computers in general. A model of artificial neurons was\n", + "first developed by McCulloch and Pitts in 1943 to study signal\n", + "processing in the brain and has later been refined by others. The\n", + "general idea is to mimic neural networks in the human brain, which is\n", + "composed of billions of neurons that communicate with each other by\n", + "sending electrical signals. Each neuron accumulates its incoming\n", + "signals, which must exceed an activation threshold to yield an\n", + "output. If the threshold is not overcome, the neuron remains inactive,\n", + "i.e. has zero output.\n", + "\n", + "This behaviour has inspired a simple mathematical model for an artificial neuron." + ] + }, + { + "cell_type": "markdown", + "id": "7685af02", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
\n", + "\n", + "$$\n", + "\\begin{equation}\n", + " y = f\\left(\\sum_{i=1}^n w_ix_i\\right) = f(u)\n", + "\\label{artificialNeuron} \\tag{1}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "3dfcfcb0", + "metadata": { + "editable": true + }, + "source": [ + "Here, the output $y$ of the neuron is the value of its activation function, which have as input\n", + "a weighted sum of signals $x_i, \\dots ,x_n$ received by $n$ other neurons.\n", + "\n", + "Conceptually, it is helpful to divide neural networks into four\n", + "categories:\n", + "1. general purpose neural networks for supervised learning,\n", + "\n", + "2. neural networks designed specifically for image processing, the most prominent example of this class being Convolutional Neural Networks (CNNs),\n", + "\n", + "3. neural networks for sequential data such as Recurrent Neural Networks (RNNs), and\n", + "\n", + "4. neural networks for unsupervised learning such as Deep Boltzmann Machines.\n", + "\n", + "In natural science, DNNs and CNNs have already found numerous\n", + "applications. In statistical physics, they have been applied to detect\n", + "phase transitions in 2D Ising and Potts models, lattice gauge\n", + "theories, and different phases of polymers, or solving the\n", + "Navier-Stokes equation in weather forecasting. Deep learning has also\n", + "found interesting applications in quantum physics. Various quantum\n", + "phase transitions can be detected and studied using DNNs and CNNs,\n", + "topological phases, and even non-equilibrium many-body\n", + "localization. Representing quantum states as DNNs quantum state\n", + "tomography are among some of the impressive achievements to reveal the\n", + "potential of DNNs to facilitate the study of quantum systems.\n", + "\n", + "In quantum information theory, it has been shown that one can perform\n", + "gate decompositions with the help of neural. \n", + "\n", + "The applications are not limited to the natural sciences. There is a\n", + "plethora of applications in essentially all disciplines, from the\n", + "humanities to life science and medicine." + ] + }, + { + "cell_type": "markdown", + "id": "0d037ca7", + "metadata": { + "editable": true + }, + "source": [ + "## Neural network types\n", + "\n", + "An artificial neural network (ANN), is a computational model that\n", + "consists of layers of connected neurons, or nodes or units. We will\n", + "refer to these interchangeably as units or nodes, and sometimes as\n", + "neurons.\n", + "\n", + "It is supposed to mimic a biological nervous system by letting each\n", + "neuron interact with other neurons by sending signals in the form of\n", + "mathematical functions between layers. A wide variety of different\n", + "ANNs have been developed, but most of them consist of an input layer,\n", + "an output layer and eventual layers in-between, called *hidden\n", + "layers*. All layers can contain an arbitrary number of nodes, and each\n", + "connection between two nodes is associated with a weight variable.\n", + "\n", + "Neural networks (also called neural nets) are neural-inspired\n", + "nonlinear models for supervised learning. As we will see, neural nets\n", + "can be viewed as natural, more powerful extensions of supervised\n", + "learning methods such as linear and logistic regression and soft-max\n", + "methods we discussed earlier." 
+ ] + }, + { + "cell_type": "markdown", + "id": "7bcf7188", + "metadata": { + "editable": true + }, + "source": [ + "## Feed-forward neural networks\n", + "\n", + "The feed-forward neural network (FFNN) was the first and simplest type\n", + "of ANNs that were devised. In this network, the information moves in\n", + "only one direction: forward through the layers.\n", + "\n", + "Nodes are represented by circles, while the arrows display the\n", + "connections between the nodes, including the direction of information\n", + "flow. Additionally, each arrow corresponds to a weight variable\n", + "(figure to come). We observe that each node in a layer is connected\n", + "to *all* nodes in the subsequent layer, making this a so-called\n", + "*fully-connected* FFNN." + ] + }, + { + "cell_type": "markdown", + "id": "cd094e20", + "metadata": { + "editable": true + }, + "source": [ + "## Convolutional Neural Network\n", + "\n", + "A different variant of FFNNs are *convolutional neural networks*\n", + "(CNNs), which have a connectivity pattern inspired by the animal\n", + "visual cortex. Individual neurons in the visual cortex only respond to\n", + "stimuli from small sub-regions of the visual field, called a receptive\n", + "field. This makes the neurons well-suited to exploit the strong\n", + "spatially local correlation present in natural images. The response of\n", + "each neuron can be approximated mathematically as a convolution\n", + "operation. (figure to come)\n", + "\n", + "Convolutional neural networks emulate the behaviour of neurons in the\n", + "visual cortex by enforcing a *local* connectivity pattern between\n", + "nodes of adjacent layers: Each node in a convolutional layer is\n", + "connected only to a subset of the nodes in the previous layer, in\n", + "contrast to the fully-connected FFNN. Often, CNNs consist of several\n", + "convolutional layers that learn local features of the input, with a\n", + "fully-connected layer at the end, which gathers all the local data and\n", + "produces the outputs. They have wide applications in image and video\n", + "recognition." + ] + }, + { + "cell_type": "markdown", + "id": "ea99157e", + "metadata": { + "editable": true + }, + "source": [ + "## Recurrent neural networks\n", + "\n", + "So far we have only mentioned ANNs where information flows in one\n", + "direction: forward. *Recurrent neural networks* on the other hand,\n", + "have connections between nodes that form directed *cycles*. This\n", + "creates a form of internal memory which are able to capture\n", + "information on what has been calculated before; the output is\n", + "dependent on the previous computations. Recurrent NNs make use of\n", + "sequential information by performing the same task for every element\n", + "in a sequence, where each element depends on previous elements. An\n", + "example of such information is sentences, making recurrent NNs\n", + "especially well-suited for handwriting and speech recognition." + ] + }, + { + "cell_type": "markdown", + "id": "b73754c2", + "metadata": { + "editable": true + }, + "source": [ + "## Other types of networks\n", + "\n", + "There are many other kinds of ANNs that have been developed. One type\n", + "that is specifically designed for interpolation in multidimensional\n", + "space is the radial basis function (RBF) network. 
RBFs are typically\n", + "made up of three layers: an input layer, a hidden layer with\n", + "non-linear radial symmetric activation functions and a linear output\n", + "layer (''linear'' here means that each node in the output layer has a\n", + "linear activation function). The layers are normally fully-connected\n", + "and there are no cycles, thus RBFs can be viewed as a type of\n", + "fully-connected FFNN. They are however usually treated as a separate\n", + "type of NN due the unusual activation functions." + ] + }, + { + "cell_type": "markdown", + "id": "aa97c83d", + "metadata": { + "editable": true + }, + "source": [ + "## Multilayer perceptrons\n", + "\n", + "One uses often so-called fully-connected feed-forward neural networks\n", + "with three or more layers (an input layer, one or more hidden layers\n", + "and an output layer) consisting of neurons that have non-linear\n", + "activation functions.\n", + "\n", + "Such networks are often called *multilayer perceptrons* (MLPs)." + ] + }, + { + "cell_type": "markdown", + "id": "abe84919", + "metadata": { + "editable": true + }, + "source": [ + "## Why multilayer perceptrons?\n", + "\n", + "According to the *Universal approximation theorem*, a feed-forward\n", + "neural network with just a single hidden layer containing a finite\n", + "number of neurons can approximate a continuous multidimensional\n", + "function to arbitrary accuracy, assuming the activation function for\n", + "the hidden layer is a **non-constant, bounded and\n", + "monotonically-increasing continuous function**.\n", + "\n", + "Note that the requirements on the activation function only applies to\n", + "the hidden layer, the output nodes are always assumed to be linear, so\n", + "as to not restrict the range of output values." + ] + }, + { + "cell_type": "markdown", + "id": "d3ff207b", + "metadata": { + "editable": true + }, + "source": [ + "## Illustration of a single perceptron model and a multi-perceptron model\n", + "\n", + "\n", + "\n", + "\n", + "

Figure 1: In a) we show a single perceptron model while in b) we display a network with two hidden layers, an input layer and an output layer.

\n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "f982c11f", + "metadata": { + "editable": true + }, + "source": [ + "## Examples of XOR, OR and AND gates\n", + "\n", + "Let us first try to fit various gates using standard linear\n", + "regression. The gates we are thinking of are the classical XOR, OR and\n", + "AND gates, well-known elements in computer science. The tables here\n", + "show how we can set up the inputs $x_1$ and $x_2$ in order to yield a\n", + "specific target $y_i$." + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "id": "04a3e090", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "\"\"\"\n", + "Simple code that tests XOR, OR and AND gates with linear regression\n", + "\"\"\"\n", + "\n", + "import numpy as np\n", + "# Design matrix\n", + "X = np.array([ [1, 0, 0], [1, 0, 1], [1, 1, 0],[1, 1, 1]],dtype=np.float64)\n", + "print(f\"The X.TX matrix:{X.T @ X}\")\n", + "Xinv = np.linalg.pinv(X.T @ X)\n", + "print(f\"The invers of X.TX matrix:{Xinv}\")\n", + "\n", + "# The XOR gate \n", + "yXOR = np.array( [ 0, 1 ,1, 0])\n", + "ThetaXOR = Xinv @ X.T @ yXOR\n", + "print(f\"The values of theta for the XOR gate:{ThetaXOR}\")\n", + "print(f\"The linear regression prediction for the XOR gate:{X @ ThetaXOR}\")\n", + "\n", + "\n", + "# The OR gate \n", + "yOR = np.array( [ 0, 1 ,1, 1])\n", + "ThetaOR = Xinv @ X.T @ yOR\n", + "print(f\"The values of theta for the OR gate:{ThetaOR}\")\n", + "print(f\"The linear regression prediction for the OR gate:{X @ ThetaOR}\")\n", + "\n", + "\n", + "# The OR gate \n", + "yAND = np.array( [ 0, 0 ,0, 1])\n", + "ThetaAND = Xinv @ X.T @ yAND\n", + "print(f\"The values of theta for the AND gate:{ThetaAND}\")\n", + "print(f\"The linear regression prediction for the AND gate:{X @ ThetaAND}\")" + ] + }, + { + "cell_type": "markdown", + "id": "95b1f5a5", + "metadata": { + "editable": true + }, + "source": [ + "What is happening here?" + ] + }, + { + "cell_type": "markdown", + "id": "0d200eff", + "metadata": { + "editable": true + }, + "source": [ + "## Does Logistic Regression do a better Job?" 
+ ] + }, + { + "cell_type": "code", + "execution_count": 10, + "id": "040a69d0", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "\"\"\"\n", + "Simple code that tests XOR and OR gates with linear regression\n", + "and logistic regression\n", + "\"\"\"\n", + "\n", + "import matplotlib.pyplot as plt\n", + "from sklearn.linear_model import LogisticRegression\n", + "import numpy as np\n", + "\n", + "# Design matrix\n", + "X = np.array([ [1, 0, 0], [1, 0, 1], [1, 1, 0],[1, 1, 1]],dtype=np.float64)\n", + "print(f\"The X.TX matrix:{X.T @ X}\")\n", + "Xinv = np.linalg.pinv(X.T @ X)\n", + "print(f\"The invers of X.TX matrix:{Xinv}\")\n", + "\n", + "# The XOR gate \n", + "yXOR = np.array( [ 0, 1 ,1, 0])\n", + "ThetaXOR = Xinv @ X.T @ yXOR\n", + "print(f\"The values of theta for the XOR gate:{ThetaXOR}\")\n", + "print(f\"The linear regression prediction for the XOR gate:{X @ ThetaXOR}\")\n", + "\n", + "\n", + "# The OR gate \n", + "yOR = np.array( [ 0, 1 ,1, 1])\n", + "ThetaOR = Xinv @ X.T @ yOR\n", + "print(f\"The values of theta for the OR gate:{ThetaOR}\")\n", + "print(f\"The linear regression prediction for the OR gate:{X @ ThetaOR}\")\n", + "\n", + "\n", + "# The OR gate \n", + "yAND = np.array( [ 0, 0 ,0, 1])\n", + "ThetaAND = Xinv @ X.T @ yAND\n", + "print(f\"The values of theta for the AND gate:{ThetaAND}\")\n", + "print(f\"The linear regression prediction for the AND gate:{X @ ThetaAND}\")\n", + "\n", + "# Now we change to logistic regression\n", + "\n", + "\n", + "# Logistic Regression\n", + "logreg = LogisticRegression()\n", + "logreg.fit(X, yOR)\n", + "print(\"Test set accuracy with Logistic Regression for OR gate: {:.2f}\".format(logreg.score(X,yOR)))\n", + "\n", + "logreg.fit(X, yXOR)\n", + "print(\"Test set accuracy with Logistic Regression for XOR gate: {:.2f}\".format(logreg.score(X,yXOR)))\n", + "\n", + "\n", + "logreg.fit(X, yAND)\n", + "print(\"Test set accuracy with Logistic Regression for AND gate: {:.2f}\".format(logreg.score(X,yAND)))" + ] + }, + { + "cell_type": "markdown", + "id": "49f17f65", + "metadata": { + "editable": true + }, + "source": [ + "Not exactly impressive, but somewhat better." 
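+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "ad42f003",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "As a bridge to the next section, the sketch below (added for illustration, not part of the original notes) trains a small scikit-learn neural network directly on the four rows of the XOR truth table, in contrast to the next code cell which uses a synthetic data set from make_classification. The network size, activation, solver and random seed are illustrative choices; with a different initialization another seed may be needed to reach full accuracy."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "ad42f004",
+   "metadata": {
+    "collapsed": false,
+    "editable": true
+   },
+   "outputs": [],
+   "source": [
+    "import numpy as np\n",
+    "from sklearn.neural_network import MLPClassifier\n",
+    "\n",
+    "# The four possible inputs and the XOR targets\n",
+    "X_xor = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])\n",
+    "y_xor = np.array([0, 1, 1, 0])\n",
+    "\n",
+    "# Small fully-connected network with one hidden layer\n",
+    "mlp = MLPClassifier(hidden_layer_sizes=(4,), activation='tanh',\n",
+    "                    solver='lbfgs', max_iter=10000, random_state=0)\n",
+    "mlp.fit(X_xor, y_xor)\n",
+    "print(f\"Predictions for the XOR gate: {mlp.predict(X_xor)}\")\n",
+    "print(f\"Training accuracy for the XOR gate: {mlp.score(X_xor, y_xor)}\")"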
+ ] + }, + { + "cell_type": "markdown", + "id": "714e0891", + "metadata": { + "editable": true + }, + "source": [ + "## Adding Neural Networks" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "id": "28bde670", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "\n", + "# and now neural networks with Scikit-Learn and the XOR\n", + "\n", + "from sklearn.neural_network import MLPClassifier\n", + "from sklearn.datasets import make_classification\n", + "X, yXOR = make_classification(n_samples=100, random_state=1)\n", + "FFNN = MLPClassifier(random_state=1, max_iter=300).fit(X, yXOR)\n", + "FFNN.predict_proba(X)\n", + "print(f\"Test set accuracy with Feed Forward Neural Network for XOR gate:{FFNN.score(X, yXOR)}\")" + ] + }, + { + "cell_type": "markdown", + "id": "4440856f", + "metadata": { + "editable": true + }, + "source": [ + "## Mathematical model\n", + "\n", + "The output $y$ is produced via the activation function $f$" + ] + }, + { + "cell_type": "markdown", + "id": "6199da92", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y = f\\left(\\sum_{i=1}^n w_ix_i + b_i\\right) = f(z),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "62c964e3", + "metadata": { + "editable": true + }, + "source": [ + "This function receives $x_i$ as inputs.\n", + "Here the activation $z=(\\sum_{i=1}^n w_ix_i+b_i)$. \n", + "In an FFNN of such neurons, the *inputs* $x_i$ are the *outputs* of\n", + "the neurons in the preceding layer. Furthermore, an MLP is\n", + "fully-connected, which means that each neuron receives a weighted sum\n", + "of the outputs of *all* neurons in the previous layer." + ] + }, + { + "cell_type": "markdown", + "id": "64ba4c70", + "metadata": { + "editable": true + }, + "source": [ + "## Mathematical model\n", + "\n", + "First, for each node $i$ in the first hidden layer, we calculate a weighted sum $z_i^1$ of the input coordinates $x_j$," + ] + }, + { + "cell_type": "markdown", + "id": "66c11135", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
\n", + "\n", + "$$\n", + "\\begin{equation} z_i^1 = \\sum_{j=1}^{M} w_{ij}^1 x_j + b_i^1\n", + "\\label{_auto1} \\tag{2}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "0f47b20a", + "metadata": { + "editable": true + }, + "source": [ + "Here $b_i$ is the so-called bias which is normally needed in\n", + "case of zero activation weights or inputs. How to fix the biases and\n", + "the weights will be discussed below. The value of $z_i^1$ is the\n", + "argument to the activation function $f_i$ of each node $i$, The\n", + "variable $M$ stands for all possible inputs to a given node $i$ in the\n", + "first layer. We define the output $y_i^1$ of all neurons in layer 1 as" + ] + }, + { + "cell_type": "markdown", + "id": "bda56156", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
\n", + "\n", + "$$\n", + "\\begin{equation}\n", + " y_i^1 = f(z_i^1) = f\\left(\\sum_{j=1}^M w_{ij}^1 x_j + b_i^1\\right)\n", + "\\label{outputLayer1} \\tag{3}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "1330fab9", + "metadata": { + "editable": true + }, + "source": [ + "where we assume that all nodes in the same layer have identical\n", + "activation functions, hence the notation $f$. In general, we could assume in the more general case that different layers have different activation functions.\n", + "In this case we would identify these functions with a superscript $l$ for the $l$-th layer," + ] + }, + { + "cell_type": "markdown", + "id": "ae474dfb", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
\n", + "\n", + "$$\n", + "\\begin{equation}\n", + " y_i^l = f^l(u_i^l) = f^l\\left(\\sum_{j=1}^{N_{l-1}} w_{ij}^l y_j^{l-1} + b_i^l\\right)\n", + "\\label{generalLayer} \\tag{4}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b6cb6fed", + "metadata": { + "editable": true + }, + "source": [ + "where $N_l$ is the number of nodes in layer $l$. When the output of\n", + "all the nodes in the first hidden layer are computed, the values of\n", + "the subsequent layer can be calculated and so forth until the output\n", + "is obtained." + ] + }, + { + "cell_type": "markdown", + "id": "2f8f9b4e", + "metadata": { + "editable": true + }, + "source": [ + "## Mathematical model\n", + "\n", + "The output of neuron $i$ in layer 2 is thus," + ] + }, + { + "cell_type": "markdown", + "id": "18e74238", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
\n", + "\n", + "$$\n", + "\\begin{equation}\n", + " y_i^2 = f^2\\left(\\sum_{j=1}^N w_{ij}^2 y_j^1 + b_i^2\\right) \n", + "\\label{_auto2} \\tag{5}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "d10df3e7", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
\n", + "\n", + "$$\n", + "\\begin{equation} \n", + " = f^2\\left[\\sum_{j=1}^N w_{ij}^2f^1\\left(\\sum_{k=1}^M w_{jk}^1 x_k + b_j^1\\right) + b_i^2\\right]\n", + "\\label{outputLayer2} \\tag{6}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "da21a316", + "metadata": { + "editable": true + }, + "source": [ + "where we have substituted $y_k^1$ with the inputs $x_k$. Finally, the ANN output reads" + ] + }, + { + "cell_type": "markdown", + "id": "76938a28", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
\n", + "\n", + "$$\n", + "\\begin{equation}\n", + " y_i^3 = f^3\\left(\\sum_{j=1}^N w_{ij}^3 y_j^2 + b_i^3\\right) \n", + "\\label{_auto3} \\tag{7}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "65434967", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
\n", + "\n", + "$$\n", + "\\begin{equation} \n", + " = f_3\\left[\\sum_{j} w_{ij}^3 f^2\\left(\\sum_{k} w_{jk}^2 f^1\\left(\\sum_{m} w_{km}^1 x_m + b_k^1\\right) + b_j^2\\right)\n", + " + b_1^3\\right]\n", + "\\label{_auto4} \\tag{8}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "31d4f5aa", + "metadata": { + "editable": true + }, + "source": [ + "## Mathematical model\n", + "\n", + "We can generalize this expression to an MLP with $l$ hidden\n", + "layers. The complete functional form is," + ] + }, + { + "cell_type": "markdown", + "id": "114030e5", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
\n", + "\n", + "$$\n", + "\\begin{equation}\n", + "y^{l+1}_i = f^{l+1}\\left[\\!\\sum_{j=1}^{N_l} w_{ij}^3 f^l\\left(\\sum_{k=1}^{N_{l-1}}w_{jk}^{l-1}\\left(\\dots f^1\\left(\\sum_{n=1}^{N_0} w_{mn}^1 x_n+ b_m^1\\right)\\dots\\right)+b_k^2\\right)+b_1^3\\right] \n", + "\\label{completeNN} \\tag{9}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "a93aec4e", + "metadata": { + "editable": true + }, + "source": [ + "which illustrates a basic property of MLPs: The only independent\n", + "variables are the input values $x_n$." + ] + }, + { + "cell_type": "markdown", + "id": "7c85562d", + "metadata": { + "editable": true + }, + "source": [ + "## Mathematical model\n", + "\n", + "This confirms that an MLP, despite its quite convoluted mathematical\n", + "form, is nothing more than an analytic function, specifically a\n", + "mapping of real-valued vectors $\\hat{x} \\in \\mathbb{R}^n \\rightarrow\n", + "\\hat{y} \\in \\mathbb{R}^m$.\n", + "\n", + "Furthermore, the flexibility and universality of an MLP can be\n", + "illustrated by realizing that the expression is essentially a nested\n", + "sum of scaled activation functions of the form" + ] + }, + { + "cell_type": "markdown", + "id": "1152ea5e", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
\n", + "\n", + "$$\n", + "\\begin{equation}\n", + " f(x) = c_1 f(c_2 x + c_3) + c_4\n", + "\\label{_auto5} \\tag{10}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "4f3d4b33", + "metadata": { + "editable": true + }, + "source": [ + "where the parameters $c_i$ are weights and biases. By adjusting these\n", + "parameters, the activation functions can be shifted up and down or\n", + "left and right, change slope or be rescaled which is the key to the\n", + "flexibility of a neural network." + ] + }, + { + "cell_type": "markdown", + "id": "4c1ac54e", + "metadata": { + "editable": true + }, + "source": [ + "### Matrix-vector notation\n", + "\n", + "We can introduce a more convenient notation for the activations in an A NN. \n", + "\n", + "Additionally, we can represent the biases and activations\n", + "as layer-wise column vectors $\\hat{b}_l$ and $\\hat{y}_l$, so that the $i$-th element of each vector \n", + "is the bias $b_i^l$ and activation $y_i^l$ of node $i$ in layer $l$ respectively. \n", + "\n", + "We have that $\\mathrm{W}_l$ is an $N_{l-1} \\times N_l$ matrix, while $\\hat{b}_l$ and $\\hat{y}_l$ are $N_l \\times 1$ column vectors. \n", + "With this notation, the sum becomes a matrix-vector multiplication, and we can write\n", + "the equation for the activations of hidden layer 2 (assuming three nodes for simplicity) as" + ] + }, + { + "cell_type": "markdown", + "id": "5c4a861f", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
\n", + "\n", + "$$\n", + "\\begin{equation}\n", + " \\hat{y}_2 = f_2(\\mathrm{W}_2 \\hat{y}_{1} + \\hat{b}_{2}) = \n", + " f_2\\left(\\left[\\begin{array}{ccc}\n", + " w^2_{11} &w^2_{12} &w^2_{13} \\\\\n", + " w^2_{21} &w^2_{22} &w^2_{23} \\\\\n", + " w^2_{31} &w^2_{32} &w^2_{33} \\\\\n", + " \\end{array} \\right] \\cdot\n", + " \\left[\\begin{array}{c}\n", + " y^1_1 \\\\\n", + " y^1_2 \\\\\n", + " y^1_3 \\\\\n", + " \\end{array}\\right] + \n", + " \\left[\\begin{array}{c}\n", + " b^2_1 \\\\\n", + " b^2_2 \\\\\n", + " b^2_3 \\\\\n", + " \\end{array}\\right]\\right).\n", + "\\label{_auto6} \\tag{11}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "276b271b", + "metadata": { + "editable": true + }, + "source": [ + "### Matrix-vector notation and activation\n", + "\n", + "The activation of node $i$ in layer 2 is" + ] + }, + { + "cell_type": "markdown", + "id": "63a5b8f1", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
\n", + "\n", + "$$\n", + "\\begin{equation}\n", + " y^2_i = f_2\\Bigr(w^2_{i1}y^1_1 + w^2_{i2}y^1_2 + w^2_{i3}y^1_3 + b^2_i\\Bigr) = \n", + " f_2\\left(\\sum_{j=1}^3 w^2_{ij} y_j^1 + b^2_i\\right).\n", + "\\label{_auto7} \\tag{12}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "316b8c32", + "metadata": { + "editable": true + }, + "source": [ + "This is not just a convenient and compact notation, but also a useful\n", + "and intuitive way to think about MLPs: The output is calculated by a\n", + "series of matrix-vector multiplications and vector additions that are\n", + "used as input to the activation functions. For each operation\n", + "$\\mathrm{W}_l \\hat{y}_{l-1}$ we move forward one layer." + ] + }, + { + "cell_type": "markdown", + "id": "34ba90c8", + "metadata": { + "editable": true + }, + "source": [ + "### Activation functions\n", + "\n", + "A property that characterizes a neural network, other than its\n", + "connectivity, is the choice of activation function(s). As described\n", + "in, the following restrictions are imposed on an activation function\n", + "for a FFNN to fulfill the universal approximation theorem\n", + "\n", + " * Non-constant\n", + "\n", + " * Bounded\n", + "\n", + " * Monotonically-increasing\n", + "\n", + " * Continuous" + ] + }, + { + "cell_type": "markdown", + "id": "3019fcaf", + "metadata": { + "editable": true + }, + "source": [ + "### Activation functions, Logistic and Hyperbolic ones\n", + "\n", + "The second requirement excludes all linear functions. Furthermore, in\n", + "a MLP with only linear activation functions, each layer simply\n", + "performs a linear transformation of its inputs.\n", + "\n", + "Regardless of the number of layers, the output of the NN will be\n", + "nothing but a linear function of the inputs. Thus we need to introduce\n", + "some kind of non-linearity to the NN to be able to fit non-linear\n", + "functions Typical examples are the logistic *Sigmoid*" + ] + }, + { + "cell_type": "markdown", + "id": "389ff36b", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "f(x) = \\frac{1}{1 + e^{-x}},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "ee9b399a", + "metadata": { + "editable": true + }, + "source": [ + "and the *hyperbolic tangent* function" + ] + }, + { + "cell_type": "markdown", + "id": "36f98b26", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "f(x) = \\tanh(x)\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "cb7b8839", + "metadata": { + "editable": true + }, + "source": [ + "### Relevance\n", + "\n", + "The *sigmoid* function are more biologically plausible because the\n", + "output of inactive neurons are zero. Such activation function are\n", + "called *one-sided*. However, it has been shown that the hyperbolic\n", + "tangent performs better than the sigmoid for training MLPs. 
has\n", + "become the most popular for *deep neural networks*" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "id": "db8d28b5", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "\"\"\"The sigmoid function (or the logistic curve) is a \n", + "function that takes any real number, z, and outputs a number (0,1).\n", + "It is useful in neural networks for assigning weights on a relative scale.\n", + "The value z is the weighted sum of parameters involved in the learning algorithm.\"\"\"\n", + "\n", + "import numpy\n", + "import matplotlib.pyplot as plt\n", + "import math as mt\n", + "\n", + "z = numpy.arange(-5, 5, .1)\n", + "sigma_fn = numpy.vectorize(lambda z: 1/(1+numpy.exp(-z)))\n", + "sigma = sigma_fn(z)\n", + "\n", + "fig = plt.figure()\n", + "ax = fig.add_subplot(111)\n", + "ax.plot(z, sigma)\n", + "ax.set_ylim([-0.1, 1.1])\n", + "ax.set_xlim([-5,5])\n", + "ax.grid(True)\n", + "ax.set_xlabel('z')\n", + "ax.set_title('sigmoid function')\n", + "\n", + "plt.show()\n", + "\n", + "\"\"\"Step Function\"\"\"\n", + "z = numpy.arange(-5, 5, .02)\n", + "step_fn = numpy.vectorize(lambda z: 1.0 if z >= 0.0 else 0.0)\n", + "step = step_fn(z)\n", + "\n", + "fig = plt.figure()\n", + "ax = fig.add_subplot(111)\n", + "ax.plot(z, step)\n", + "ax.set_ylim([-0.5, 1.5])\n", + "ax.set_xlim([-5,5])\n", + "ax.grid(True)\n", + "ax.set_xlabel('z')\n", + "ax.set_title('step function')\n", + "\n", + "plt.show()\n", + "\n", + "\"\"\"Sine Function\"\"\"\n", + "z = numpy.arange(-2*mt.pi, 2*mt.pi, 0.1)\n", + "t = numpy.sin(z)\n", + "\n", + "fig = plt.figure()\n", + "ax = fig.add_subplot(111)\n", + "ax.plot(z, t)\n", + "ax.set_ylim([-1.0, 1.0])\n", + "ax.set_xlim([-2*mt.pi,2*mt.pi])\n", + "ax.grid(True)\n", + "ax.set_xlabel('z')\n", + "ax.set_title('sine function')\n", + "\n", + "plt.show()\n", + "\n", + "\"\"\"Plots a graph of the squashing function used by a rectified linear\n", + "unit\"\"\"\n", + "z = numpy.arange(-2, 2, .1)\n", + "zero = numpy.zeros(len(z))\n", + "y = numpy.max([zero, z], axis=0)\n", + "\n", + "fig = plt.figure()\n", + "ax = fig.add_subplot(111)\n", + "ax.plot(z, y)\n", + "ax.set_ylim([-2.0, 2.0])\n", + "ax.set_xlim([-2.0, 2.0])\n", + "ax.grid(True)\n", + "ax.set_xlabel('z')\n", + "ax.set_title('Rectified linear unit')\n", + "\n", + "plt.show()" + ] + } + ], + "metadata": {}, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/doc/LectureNotes/_build/html/_sources/week41.ipynb b/doc/LectureNotes/_build/html/_sources/week41.ipynb new file mode 100644 index 000000000..c9b1adcdd --- /dev/null +++ b/doc/LectureNotes/_build/html/_sources/week41.ipynb @@ -0,0 +1,3820 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "b625bb28", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "679109d4", + "metadata": { + "editable": true + }, + "source": [ + "# Week 41 Neural networks and constructing a neural network code\n", + "**Morten Hjorth-Jensen**, Department of Physics, University of Oslo, Norway\n", + "\n", + "Date: **Week 41**" + ] + }, + { + "cell_type": "markdown", + "id": "d7401ab9", + "metadata": { + "editable": true + }, + "source": [ + "## Plan for week 41, October 6-10" + ] + }, + { + "cell_type": "markdown", + "id": "f47e1c5c", + "metadata": { + "editable": true + }, + "source": [ + "## Material for the lecture on Monday October 6, 2025\n", + "1. 
Neural Networks, setting up the basic steps, from the simple perceptron model to the multi-layer perceptron model.\n", + "\n", + "2. Building our own Feed-forward Neural Network, getting started\n", + "\n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "af0a9895", + "metadata": { + "editable": true + }, + "source": [ + "## Readings and Videos:\n", + "1. These lecture notes\n", + "\n", + "2. For neural networks we recommend Goodfellow et al chapters 6 and 7.\n", + "\n", + "3. Rashkca et al., chapter 11, jupyter-notebook sent separately, from [GitHub](https://github.com/rasbt/machine-learning-book)\n", + "\n", + "4. Neural Networks demystified at \n", + "\n", + "5. Building Neural Networks from scratch at \n", + "\n", + "6. Video on Neural Networks at \n", + "\n", + "7. Video on the back propagation algorithm at \n", + "\n", + "8. We also recommend Michael Nielsen's intuitive approach to the neural networks and the universal approximation theorem, see the slides at ." + ] + }, + { + "cell_type": "markdown", + "id": "be1e5c03", + "metadata": { + "editable": true + }, + "source": [ + "## Mathematics of deep learning\n", + "\n", + "**Two recent books online.**\n", + "\n", + "1. [The Modern Mathematics of Deep Learning, by Julius Berner, Philipp Grohs, Gitta Kutyniok, Philipp Petersen](https://arxiv.org/abs/2105.04026), published as [Mathematical Aspects of Deep Learning, pp. 1-111. Cambridge University Press, 2022](https://doi.org/10.1017/9781009025096.002)\n", + "\n", + "2. [Mathematical Introduction to Deep Learning: Methods, Implementations, and Theory, Arnulf Jentzen, Benno Kuckuck, Philippe von Wurstemberger](https://doi.org/10.48550/arXiv.2310.20360)" + ] + }, + { + "cell_type": "markdown", + "id": "52520e8f", + "metadata": { + "editable": true + }, + "source": [ + "## Reminder on books with hands-on material and codes\n", + "[Sebastian Rashcka et al, Machine learning with Sickit-Learn and PyTorch](https://sebastianraschka.com/blog/2022/ml-pytorch-book.html)" + ] + }, + { + "cell_type": "markdown", + "id": "408a0487", + "metadata": { + "editable": true + }, + "source": [ + "## Lab sessions on Tuesday and Wednesday\n", + "\n", + "Aim: Getting started with coding neural network. The exercises this\n", + "week aim at setting up the feed-forward part of a neural network." + ] + }, + { + "cell_type": "markdown", + "id": "23056baf", + "metadata": { + "editable": true + }, + "source": [ + "## Lecture Monday October 6" + ] + }, + { + "cell_type": "markdown", + "id": "56a2f2f2", + "metadata": { + "editable": true + }, + "source": [ + "## Introduction to Neural networks\n", + "\n", + "Artificial neural networks are computational systems that can learn to\n", + "perform tasks by considering examples, generally without being\n", + "programmed with any task-specific rules. It is supposed to mimic a\n", + "biological system, wherein neurons interact by sending signals in the\n", + "form of mathematical functions between layers. All layers can contain\n", + "an arbitrary number of neurons, and each connection is represented by\n", + "a weight variable." + ] + }, + { + "cell_type": "markdown", + "id": "2e3fa93d", + "metadata": { + "editable": true + }, + "source": [ + "## Artificial neurons\n", + "\n", + "The field of artificial neural networks has a long history of\n", + "development, and is closely connected with the advancement of computer\n", + "science and computers in general. 
A model of artificial neurons was\n", + "first developed by McCulloch and Pitts in 1943 to study signal\n", + "processing in the brain and has later been refined by others. The\n", + "general idea is to mimic neural networks in the human brain, which is\n", + "composed of billions of neurons that communicate with each other by\n", + "sending electrical signals. Each neuron accumulates its incoming\n", + "signals, which must exceed an activation threshold to yield an\n", + "output. If the threshold is not overcome, the neuron remains inactive,\n", + "i.e. has zero output.\n", + "\n", + "This behaviour has inspired a simple mathematical model for an artificial neuron." + ] + }, + { + "cell_type": "markdown", + "id": "0afafe3e", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
\n", + "\n", + "$$\n", + "\\begin{equation}\n", + " y = f\\left(\\sum_{i=1}^n w_ix_i\\right) = f(u)\n", + "\\label{artificialNeuron} \\tag{1}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "bc113056", + "metadata": { + "editable": true + }, + "source": [ + "Here, the output $y$ of the neuron is the value of its activation function, which have as input\n", + "a weighted sum of signals $x_i, \\dots ,x_n$ received by $n$ other neurons.\n", + "\n", + "Conceptually, it is helpful to divide neural networks into four\n", + "categories:\n", + "1. general purpose neural networks for supervised learning,\n", + "\n", + "2. neural networks designed specifically for image processing, the most prominent example of this class being Convolutional Neural Networks (CNNs),\n", + "\n", + "3. neural networks for sequential data such as Recurrent Neural Networks (RNNs), and\n", + "\n", + "4. neural networks for unsupervised learning such as Deep Boltzmann Machines.\n", + "\n", + "In natural science, DNNs and CNNs have already found numerous\n", + "applications. In statistical physics, they have been applied to detect\n", + "phase transitions in 2D Ising and Potts models, lattice gauge\n", + "theories, and different phases of polymers, or solving the\n", + "Navier-Stokes equation in weather forecasting. Deep learning has also\n", + "found interesting applications in quantum physics. Various quantum\n", + "phase transitions can be detected and studied using DNNs and CNNs,\n", + "topological phases, and even non-equilibrium many-body\n", + "localization. Representing quantum states as DNNs quantum state\n", + "tomography are among some of the impressive achievements to reveal the\n", + "potential of DNNs to facilitate the study of quantum systems.\n", + "\n", + "In quantum information theory, it has been shown that one can perform\n", + "gate decompositions with the help of neural. \n", + "\n", + "The applications are not limited to the natural sciences. There is a\n", + "plethora of applications in essentially all disciplines, from the\n", + "humanities to life science and medicine." + ] + }, + { + "cell_type": "markdown", + "id": "872c3321", + "metadata": { + "editable": true + }, + "source": [ + "## Neural network types\n", + "\n", + "An artificial neural network (ANN), is a computational model that\n", + "consists of layers of connected neurons, or nodes or units. We will\n", + "refer to these interchangeably as units or nodes, and sometimes as\n", + "neurons.\n", + "\n", + "It is supposed to mimic a biological nervous system by letting each\n", + "neuron interact with other neurons by sending signals in the form of\n", + "mathematical functions between layers. A wide variety of different\n", + "ANNs have been developed, but most of them consist of an input layer,\n", + "an output layer and eventual layers in-between, called *hidden\n", + "layers*. All layers can contain an arbitrary number of nodes, and each\n", + "connection between two nodes is associated with a weight variable.\n", + "\n", + "Neural networks (also called neural nets) are neural-inspired\n", + "nonlinear models for supervised learning. As we will see, neural nets\n", + "can be viewed as natural, more powerful extensions of supervised\n", + "learning methods such as linear and logistic regression and soft-max\n", + "methods we discussed earlier." 
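+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "ad42f005",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "To illustrate the last point, the sketch below (added for illustration, not part of the original notes) fits scikit-learn's logistic regression to a small synthetic data set and then reproduces its predicted probabilities by hand as a single sigmoid neuron, $\sigma(\boldsymbol{x}^T\boldsymbol{w}+b)$. The data set and all parameter choices are illustrative assumptions."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "ad42f006",
+   "metadata": {
+    "collapsed": false,
+    "editable": true
+   },
+   "outputs": [],
+   "source": [
+    "import numpy as np\n",
+    "from sklearn.datasets import make_classification\n",
+    "from sklearn.linear_model import LogisticRegression\n",
+    "\n",
+    "# Small synthetic binary classification problem (illustrative only)\n",
+    "X, y = make_classification(n_samples=200, n_features=4, random_state=0)\n",
+    "logreg = LogisticRegression(max_iter=1000).fit(X, y)\n",
+    "\n",
+    "# Logistic regression viewed as a single neuron with a sigmoid activation\n",
+    "z = X @ logreg.coef_.T + logreg.intercept_      # weighted sum plus bias\n",
+    "p_neuron = (1.0/(1.0 + np.exp(-z))).ravel()     # sigmoid output\n",
+    "p_sklearn = logreg.predict_proba(X)[:, 1]\n",
+    "print(f\"Maximum difference: {np.max(np.abs(p_neuron - p_sklearn)):.2e}\")"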
+ ] + }, + { + "cell_type": "markdown", + "id": "53edae74", + "metadata": { + "editable": true + }, + "source": [ + "## Feed-forward neural networks\n", + "\n", + "The feed-forward neural network (FFNN) was the first and simplest type\n", + "of ANNs that were devised. In this network, the information moves in\n", + "only one direction: forward through the layers.\n", + "\n", + "Nodes are represented by circles, while the arrows display the\n", + "connections between the nodes, including the direction of information\n", + "flow. Additionally, each arrow corresponds to a weight variable\n", + "(figure to come). We observe that each node in a layer is connected\n", + "to *all* nodes in the subsequent layer, making this a so-called\n", + "*fully-connected* FFNN." + ] + }, + { + "cell_type": "markdown", + "id": "0eef36d6", + "metadata": { + "editable": true + }, + "source": [ + "## Convolutional Neural Network\n", + "\n", + "A different variant of FFNNs are *convolutional neural networks*\n", + "(CNNs), which have a connectivity pattern inspired by the animal\n", + "visual cortex. Individual neurons in the visual cortex only respond to\n", + "stimuli from small sub-regions of the visual field, called a receptive\n", + "field. This makes the neurons well-suited to exploit the strong\n", + "spatially local correlation present in natural images. The response of\n", + "each neuron can be approximated mathematically as a convolution\n", + "operation. (figure to come)\n", + "\n", + "Convolutional neural networks emulate the behaviour of neurons in the\n", + "visual cortex by enforcing a *local* connectivity pattern between\n", + "nodes of adjacent layers: Each node in a convolutional layer is\n", + "connected only to a subset of the nodes in the previous layer, in\n", + "contrast to the fully-connected FFNN. Often, CNNs consist of several\n", + "convolutional layers that learn local features of the input, with a\n", + "fully-connected layer at the end, which gathers all the local data and\n", + "produces the outputs. They have wide applications in image and video\n", + "recognition." + ] + }, + { + "cell_type": "markdown", + "id": "bf602451", + "metadata": { + "editable": true + }, + "source": [ + "## Recurrent neural networks\n", + "\n", + "So far we have only mentioned ANNs where information flows in one\n", + "direction: forward. *Recurrent neural networks* on the other hand,\n", + "have connections between nodes that form directed *cycles*. This\n", + "creates a form of internal memory which are able to capture\n", + "information on what has been calculated before; the output is\n", + "dependent on the previous computations. Recurrent NNs make use of\n", + "sequential information by performing the same task for every element\n", + "in a sequence, where each element depends on previous elements. An\n", + "example of such information is sentences, making recurrent NNs\n", + "especially well-suited for handwriting and speech recognition." + ] + }, + { + "cell_type": "markdown", + "id": "0afbe2d0", + "metadata": { + "editable": true + }, + "source": [ + "## Other types of networks\n", + "\n", + "There are many other kinds of ANNs that have been developed. One type\n", + "that is specifically designed for interpolation in multidimensional\n", + "space is the radial basis function (RBF) network. 
RBFs are typically\n", + "made up of three layers: an input layer, a hidden layer with\n", + "non-linear radial symmetric activation functions and a linear output\n", + "layer (''linear'' here means that each node in the output layer has a\n", + "linear activation function). The layers are normally fully-connected\n", + "and there are no cycles, thus RBFs can be viewed as a type of\n", + "fully-connected FFNN. They are however usually treated as a separate\n", + "type of NN due the unusual activation functions." + ] + }, + { + "cell_type": "markdown", + "id": "d957cfe8", + "metadata": { + "editable": true + }, + "source": [ + "## Multilayer perceptrons\n", + "\n", + "One uses often so-called fully-connected feed-forward neural networks\n", + "with three or more layers (an input layer, one or more hidden layers\n", + "and an output layer) consisting of neurons that have non-linear\n", + "activation functions.\n", + "\n", + "Such networks are often called *multilayer perceptrons* (MLPs)." + ] + }, + { + "cell_type": "markdown", + "id": "57b218ab", + "metadata": { + "editable": true + }, + "source": [ + "## Why multilayer perceptrons?\n", + "\n", + "According to the *Universal approximation theorem*, a feed-forward\n", + "neural network with just a single hidden layer containing a finite\n", + "number of neurons can approximate a continuous multidimensional\n", + "function to arbitrary accuracy, assuming the activation function for\n", + "the hidden layer is a **non-constant, bounded and\n", + "monotonically-increasing continuous function**.\n", + "\n", + "Note that the requirements on the activation function only applies to\n", + "the hidden layer, the output nodes are always assumed to be linear, so\n", + "as to not restrict the range of output values." + ] + }, + { + "cell_type": "markdown", + "id": "6bda8dda", + "metadata": { + "editable": true + }, + "source": [ + "## Illustration of a single perceptron model and a multi-perceptron model\n", + "\n", + "\n", + "\n", + "\n", + "

Figure 1: In a) we show a single perceptron model while in b) we display a network with two hidden layers, an input layer and an output layer.

\n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "f7d514be", + "metadata": { + "editable": true + }, + "source": [ + "## Mathematics of deep learning and neural networks\n", + "\n", + "Neural networks, in its so-called feed-forward form, where each\n", + "iterations contains a feed-forward stage and a back-propgagation\n", + "stage, consist of series of affine matrix-matrix and matrix-vector\n", + "multiplications. The unknown parameters (the so-called biases and\n", + "weights which deternine the architecture of a neural network), are\n", + "uptaded iteratively using the so-called back-propagation algorithm.\n", + "This algorithm corresponds to the so-called reverse mode of \n", + "automatic differentation." + ] + }, + { + "cell_type": "markdown", + "id": "02ed299b", + "metadata": { + "editable": true + }, + "source": [ + "## Basics of an NN\n", + "\n", + "A neural network consists of a series of hidden layers, in addition to\n", + "the input and output layers. Each layer $l$ has a set of parameters\n", + "$\\boldsymbol{\\Theta}^{(l)}=(\\boldsymbol{W}^{(l)},\\boldsymbol{b}^{(l)})$ which are related to the\n", + "parameters in other layers through a series of affine transformations,\n", + "for a standard NN these are matrix-matrix and matrix-vector\n", + "multiplications. For all layers we will simply use a collective variable $\\boldsymbol{\\Theta}$.\n", + "\n", + "It consist of two basic steps:\n", + "1. a feed forward stage which takes a given input and produces a final output which is compared with the target values through our cost/loss function.\n", + "\n", + "2. a back-propagation state where the unknown parameters $\\boldsymbol{\\Theta}$ are updated through the optimization of the their gradients. The expressions for the gradients are obtained via the chain rule, starting from the derivative of the cost/function.\n", + "\n", + "These two steps make up one iteration. This iterative process is continued till we reach an eventual stopping criterion." + ] + }, + { + "cell_type": "markdown", + "id": "96b8c13c", + "metadata": { + "editable": true + }, + "source": [ + "## Overarching view of a neural network\n", + "\n", + "The architecture of a neural network defines our model. This model\n", + "aims at describing some function $f(\\boldsymbol{x}$ which represents\n", + "some final result (outputs or tagrget values) given a specific inpput\n", + "$\\boldsymbol{x}$. Note that here $\\boldsymbol{y}$ and $\\boldsymbol{x}$ are not limited to be\n", + "vectors.\n", + "\n", + "The architecture consists of\n", + "1. An input and an output layer where the input layer is defined by the inputs $\\boldsymbol{x}$. The output layer produces the model ouput $\\boldsymbol{\\tilde{y}}$ which is compared with the target value $\\boldsymbol{y}$\n", + "\n", + "2. A given number of hidden layers and neurons/nodes/units for each layer (this may vary)\n", + "\n", + "3. A given activation function $\\sigma(\\boldsymbol{z})$ with arguments $\\boldsymbol{z}$ to be defined below. The activation functions may differ from layer to layer.\n", + "\n", + "4. The last layer, normally called **output** layer has normally an activation function tailored to the specific problem\n", + "\n", + "5. Finally we define a so-called cost or loss function which is used to gauge the quality of our model." 
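+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "ad42f007",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "The short sketch below (added for illustration, not part of the original notes) assembles the ingredients listed above for a tiny network: random weights and biases, a sigmoid activation in all layers, a feed-forward pass through two hidden layers, and a squared-error cost against a target. The layer sizes, the choice of activation and all numerical values are illustrative assumptions."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "ad42f008",
+   "metadata": {
+    "collapsed": false,
+    "editable": true
+   },
+   "outputs": [],
+   "source": [
+    "import numpy as np\n",
+    "\n",
+    "rng = np.random.default_rng(2025)   # illustrative seed\n",
+    "\n",
+    "def sigmoid(z):\n",
+    "    return 1.0/(1.0 + np.exp(-z))\n",
+    "\n",
+    "# Tiny architecture: 3 inputs, hidden layers with 4 and 3 nodes, 1 output\n",
+    "sizes = [3, 4, 3, 1]\n",
+    "weights = [rng.normal(size=(m, n)) for m, n in zip(sizes[:-1], sizes[1:])]\n",
+    "biases = [rng.normal(size=n) for n in sizes[1:]]\n",
+    "\n",
+    "x = rng.normal(size=sizes[0])        # made-up input\n",
+    "y = np.array([0.5])                  # made-up target\n",
+    "\n",
+    "# Feed-forward pass: a^l = f(W^T a^(l-1) + b^l) for each layer l\n",
+    "a = x\n",
+    "for W, b in zip(weights, biases):\n",
+    "    z = W.T @ a + b\n",
+    "    a = sigmoid(z)\n",
+    "\n",
+    "cost = 0.5*np.sum((a - y)**2)        # squared-error cost\n",
+    "print(f\"Network output: {a}, cost: {cost:.4f}\")"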
+ ] + }, + { + "cell_type": "markdown", + "id": "089704bf", + "metadata": { + "editable": true + }, + "source": [ + "## The optimization problem\n", + "\n", + "The cost function is a function of the unknown parameters\n", + "$\\boldsymbol{\\Theta}$ where the latter is a container for all possible\n", + "parameters needed to define a neural network\n", + "\n", + "If we are dealing with a regression task a typical cost/loss function\n", + "is the mean squared error" + ] + }, + { + "cell_type": "markdown", + "id": "91ef7170", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C(\\boldsymbol{\\Theta})=\\frac{1}{n}\\left\\{\\left(\\boldsymbol{y}-\\boldsymbol{X}\\boldsymbol{\\theta}\\right)^T\\left(\\boldsymbol{y}-\\boldsymbol{X}\\boldsymbol{\\theta}\\right)\\right\\}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "c9402737", + "metadata": { + "editable": true + }, + "source": [ + "This function represents one of many possible ways to define\n", + "the so-called cost function. Note that here we have assumed a linear dependence in terms of the paramters $\\boldsymbol{\\Theta}$. This is in general not the case." + ] + }, + { + "cell_type": "markdown", + "id": "09940e05", + "metadata": { + "editable": true + }, + "source": [ + "## Parameters of neural networks\n", + "For neural networks the parameters\n", + "$\\boldsymbol{\\Theta}$ are given by the so-called weights and biases (to be\n", + "defined below).\n", + "\n", + "The weights are given by matrix elements $w_{ij}^{(l)}$ where the\n", + "superscript indicates the layer number. The biases are typically given\n", + "by vector elements representing each single node of a given layer,\n", + "that is $b_j^{(l)}$." + ] + }, + { + "cell_type": "markdown", + "id": "2bd7b3ff", + "metadata": { + "editable": true + }, + "source": [ + "## Other ingredients of a neural network\n", + "\n", + "Having defined the architecture of a neural network, the optimization\n", + "of the cost function with respect to the parameters $\\boldsymbol{\\Theta}$,\n", + "involves the calculations of gradients and their optimization. The\n", + "gradients represent the derivatives of a multidimensional object and\n", + "are often approximated by various gradient methods, including\n", + "1. various quasi-Newton methods,\n", + "\n", + "2. plain gradient descent (GD) with a constant learning rate $\\eta$,\n", + "\n", + "3. GD with momentum and other approximations to the learning rates such as\n", + "\n", + " * Adapative gradient (ADAgrad)\n", + "\n", + " * Root mean-square propagation (RMSprop)\n", + "\n", + " * Adaptive gradient with momentum (ADAM) and many other\n", + "\n", + "4. Stochastic gradient descent and various families of learning rate approximations" + ] + }, + { + "cell_type": "markdown", + "id": "1a771f02", + "metadata": { + "editable": true + }, + "source": [ + "## Other parameters\n", + "\n", + "In addition to the above, there are often additional hyperparamaters\n", + "which are included in the setup of a neural network. These will be\n", + "discussed below." + ] + }, + { + "cell_type": "markdown", + "id": "3291a232", + "metadata": { + "editable": true + }, + "source": [ + "## Universal approximation theorem\n", + "\n", + "The universal approximation theorem plays a central role in deep\n", + "learning. 
[Cybenko (1989)](https://link.springer.com/article/10.1007/BF02551274) showed\n", + "the following:\n", + "\n", + "Let $\\sigma$ be any continuous sigmoidal function such that" + ] + }, + { + "cell_type": "markdown", + "id": "74cc209d", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\sigma(z) = \\left\\{\\begin{array}{cc} 1 & z\\rightarrow \\infty\\\\ 0 & z \\rightarrow -\\infty \\end{array}\\right.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "fe210f2f", + "metadata": { + "editable": true + }, + "source": [ + "Given a continuous and deterministic function $F(\\boldsymbol{x})$ on the unit\n", + "cube in $d$-dimensions $F\\in [0,1]^d$, $x\\in [0,1]^d$ and a parameter\n", + "$\\epsilon >0$, there is a one-layer (hidden) neural network\n", + "$f(\\boldsymbol{x};\\boldsymbol{\\Theta})$ with $\\boldsymbol{\\Theta}=(\\boldsymbol{W},\\boldsymbol{b})$ and $\\boldsymbol{W}\\in\n", + "\\mathbb{R}^{m\\times n}$ and $\\boldsymbol{b}\\in \\mathbb{R}^{n}$, for which" + ] + }, + { + "cell_type": "markdown", + "id": "4dfec9c9", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\vert F(\\boldsymbol{x})-f(\\boldsymbol{x};\\boldsymbol{\\Theta})\\vert < \\epsilon \\hspace{0.1cm} \\forall \\boldsymbol{x}\\in[0,1]^d.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "a65f0cd5", + "metadata": { + "editable": true + }, + "source": [ + "## Some parallels from real analysis\n", + "\n", + "For those of you familiar with for example the [Stone-Weierstrass\n", + "theorem](https://en.wikipedia.org/wiki/Stone%E2%80%93Weierstrass_theorem)\n", + "for polynomial approximations or the convergence criterion for Fourier\n", + "series, there are similarities in the derivation of the proof for\n", + "neural networks." + ] + }, + { + "cell_type": "markdown", + "id": "d006386b", + "metadata": { + "editable": true + }, + "source": [ + "## The approximation theorem in words\n", + "\n", + "**Any continuous function $y=F(\\boldsymbol{x})$ supported on the unit cube in\n", + "$d$-dimensions can be approximated by a one-layer sigmoidal network to\n", + "arbitrary accuracy.**\n", + "\n", + "[Hornik (1991)](https://www.sciencedirect.com/science/article/abs/pii/089360809190009T) extended the theorem by letting any non-constant, bounded activation function to be included using that the expectation value" + ] + }, + { + "cell_type": "markdown", + "id": "0b094d43", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathbb{E}[\\vert F(\\boldsymbol{x})\\vert^2] =\\int_{\\boldsymbol{x}\\in D} \\vert F(\\boldsymbol{x})\\vert^2p(\\boldsymbol{x})d\\boldsymbol{x} < \\infty.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "f2b9ca56", + "metadata": { + "editable": true + }, + "source": [ + "Then we have" + ] + }, + { + "cell_type": "markdown", + "id": "db4817b0", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathbb{E}[\\vert F(\\boldsymbol{x})-f(\\boldsymbol{x};\\boldsymbol{\\Theta})\\vert^2] =\\int_{\\boldsymbol{x}\\in D} \\vert F(\\boldsymbol{x})-f(\\boldsymbol{x};\\boldsymbol{\\Theta})\\vert^2p(\\boldsymbol{x})d\\boldsymbol{x} < \\epsilon.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "43216143", + "metadata": { + "editable": true + }, + "source": [ + "## More on the general approximation theorem\n", + "\n", + "None of the proofs give any insight into the relation between the\n", + "number of of hidden layers and nodes and the approximation error\n", + "$\\epsilon$, nor the magnitudes of $\\boldsymbol{W}$ and 
$\\boldsymbol{b}$.\n", + "\n", + "Neural networks (NNs) have what we may call a kind of universality no matter what function we want to compute.\n", + "\n", + "It does not mean that an NN can be used to exactly compute any function. Rather, we get an approximation that is as good as we want." + ] + }, + { + "cell_type": "markdown", + "id": "ef48ad88", + "metadata": { + "editable": true + }, + "source": [ + "## Class of functions we can approximate\n", + "\n", + "The class of functions that can be approximated are the continuous ones.\n", + "If the function $F(\\boldsymbol{x})$ is discontinuous, it won't in general be possible to approximate it. However, an NN may still give an approximation even if we fail in some points." + ] + }, + { + "cell_type": "markdown", + "id": "7c4fed36", + "metadata": { + "editable": true + }, + "source": [ + "## Setting up the equations for a neural network\n", + "\n", + "The questions we want to ask are how do changes in the biases and the\n", + "weights in our network change the cost function and how can we use the\n", + "final output to modify the weights and biases?\n", + "\n", + "To derive these equations let us start with a plain regression problem\n", + "and define our cost function as" + ] + }, + { + "cell_type": "markdown", + "id": "c4cf04e8", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "{\\cal C}(\\boldsymbol{\\Theta}) = \\frac{1}{2}\\sum_{i=1}^n\\left(y_i - \\tilde{y}_i\\right)^2,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "ecc9a1bd", + "metadata": { + "editable": true + }, + "source": [ + "where the $y_i$s are our $n$ targets (the values we want to\n", + "reproduce), while the outputs of the network after having propagated\n", + "all inputs $\\boldsymbol{x}$ are given by $\\boldsymbol{\\tilde{y}}_i$." + ] + }, + { + "cell_type": "markdown", + "id": "91e6e150", + "metadata": { + "editable": true + }, + "source": [ + "## Layout of a neural network with three hidden layers\n", + "\n", + "\n", + "\n", + "\n", + "

Figure 2: Layout of a neural network with three hidden layers.

\n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "4fabe3cc", + "metadata": { + "editable": true + }, + "source": [ + "## Definitions\n", + "\n", + "With our definition of the targets $\\boldsymbol{y}$, the outputs of the\n", + "network $\\boldsymbol{\\tilde{y}}$ and the inputs $\\boldsymbol{x}$ we\n", + "define now the activation $z_j^l$ of node/neuron/unit $j$ of the\n", + "$l$-th layer as a function of the bias, the weights which add up from\n", + "the previous layer $l-1$ and the forward passes/outputs\n", + "$\\hat{a}^{l-1}$ from the previous layer as" + ] + }, + { + "cell_type": "markdown", + "id": "8c25e4cf", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "z_j^l = \\sum_{i=1}^{M_{l-1}}w_{ij}^la_i^{l-1}+b_j^l,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "ae861380", + "metadata": { + "editable": true + }, + "source": [ + "where $b_k^l$ are the biases from layer $l$. Here $M_{l-1}$\n", + "represents the total number of nodes/neurons/units of layer $l-1$. The\n", + "figure in the whiteboard notes illustrates this equation. We can rewrite this in a more\n", + "compact form as the matrix-vector products we discussed earlier," + ] + }, + { + "cell_type": "markdown", + "id": "2b7f7b74", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\hat{z}^l = \\left(\\hat{W}^l\\right)^T\\hat{a}^{l-1}+\\hat{b}^l.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "e76386ca", + "metadata": { + "editable": true + }, + "source": [ + "## Inputs to the activation function\n", + "\n", + "With the activation values $\\boldsymbol{z}^l$ we can in turn define the\n", + "output of layer $l$ as $\\boldsymbol{a}^l = f(\\boldsymbol{z}^l)$ where $f$ is our\n", + "activation function. In the examples here we will use the sigmoid\n", + "function discussed in our logistic regression lectures. We will also use the same activation function $f$ for all layers\n", + "and their nodes. 
It means we have" + ] + }, + { + "cell_type": "markdown", + "id": "12a9fb38", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "a_j^l = \\sigma(z_j^l) = \\frac{1}{1+\\exp{-(z_j^l)}}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "08bbe824", + "metadata": { + "editable": true + }, + "source": [ + "## Derivatives and the chain rule\n", + "\n", + "From the definition of the activation $z_j^l$ we have" + ] + }, + { + "cell_type": "markdown", + "id": "3783fe53", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial z_j^l}{\\partial w_{ij}^l} = a_i^{l-1},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "2b70d213", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "209db1b2", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial z_j^l}{\\partial a_i^{l-1}} = w_{ji}^l.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "6e42f02f", + "metadata": { + "editable": true + }, + "source": [ + "With our definition of the activation function we have that (note that this function depends only on $z_j^l$)" + ] + }, + { + "cell_type": "markdown", + "id": "78422fdc", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial a_j^l}{\\partial z_j^{l}} = a_j^l(1-a_j^l)=\\sigma(z_j^l)(1-\\sigma(z_j^l)).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "8c8491cf", + "metadata": { + "editable": true + }, + "source": [ + "## Derivative of the cost function\n", + "\n", + "With these definitions we can now compute the derivative of the cost function in terms of the weights.\n", + "\n", + "Let us specialize to the output layer $l=L$. Our cost function is" + ] + }, + { + "cell_type": "markdown", + "id": "82fb3ded", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "{\\cal C}(\\boldsymbol{\\Theta}^L) = \\frac{1}{2}\\sum_{i=1}^n\\left(y_i - \\tilde{y}_i\\right)^2=\\frac{1}{2}\\sum_{i=1}^n\\left(a_i^L - y_i\\right)^2,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "88fe7049", + "metadata": { + "editable": true + }, + "source": [ + "The derivative of this function with respect to the weights is" + ] + }, + { + "cell_type": "markdown", + "id": "af856571", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial{\\cal C}(\\boldsymbol{\\Theta}^L)}{\\partial w_{jk}^L} = \\left(a_j^L - y_j\\right)\\frac{\\partial a_j^L}{\\partial w_{jk}^{L}},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "d684ab45", + "metadata": { + "editable": true + }, + "source": [ + "The last partial derivative can easily be computed and reads (by applying the chain rule)" + ] + }, + { + "cell_type": "markdown", + "id": "ac371b5c", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial a_j^L}{\\partial w_{jk}^{L}} = \\frac{\\partial a_j^L}{\\partial z_{j}^{L}}\\frac{\\partial z_j^L}{\\partial w_{jk}^{L}}=a_j^L(1-a_j^L)a_k^{L-1}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "8dbfe230", + "metadata": { + "editable": true + }, + "source": [ + "## Simpler examples first, and automatic differentiation\n", + "\n", + "In order to understand the back propagation algorithm and its\n", + "derivation (an implementation of the chain rule), let us first digress\n", + "with some simple examples. These examples are also meant to motivate\n", + "the link with back propagation and [automatic differentiation](https://en.wikipedia.org/wiki/Automatic_differentiation). 
We will discuss these topics next week (week 42)." + ] + }, + { + "cell_type": "markdown", + "id": "7244f7f3", + "metadata": { + "editable": true + }, + "source": [ + "## Reminder on the chain rule and gradients\n", + "\n", + "If we have a multivariate function $f(x,y)$ where $x=x(t)$ and $y=y(t)$ are functions of a variable $t$, we have that the gradient of $f$ with respect to $t$ (without the explicit unit vector components)" + ] + }, + { + "cell_type": "markdown", + "id": "ffb80d86", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{df}{dt} = \\begin{bmatrix}\\frac{\\partial f}{\\partial x} & \\frac{\\partial f}{\\partial y} \\end{bmatrix} \\begin{bmatrix}\\frac{\\partial x}{\\partial t} \\\\ \\frac{\\partial y}{\\partial t} \\end{bmatrix}=\\frac{\\partial f}{\\partial x} \\frac{\\partial x}{\\partial t} +\\frac{\\partial f}{\\partial y} \\frac{\\partial y}{\\partial t}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "6f15ef23", + "metadata": { + "editable": true + }, + "source": [ + "## Multivariable functions\n", + "\n", + "If we have a multivariate function $f(x,y)$ where $x=x(t,s)$ and $y=y(t,s)$ are functions of the variables $t$ and $s$, we have that the partial derivatives" + ] + }, + { + "cell_type": "markdown", + "id": "1734d532", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial f}{\\partial s}=\\frac{\\partial f}{\\partial x}\\frac{\\partial x}{\\partial s}+\\frac{\\partial f}{\\partial y}\\frac{\\partial y}{\\partial s},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "8c013e25", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "f416e200", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial f}{\\partial t}=\\frac{\\partial f}{\\partial x}\\frac{\\partial x}{\\partial t}+\\frac{\\partial f}{\\partial y}\\frac{\\partial y}{\\partial t}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "943d440c", + "metadata": { + "editable": true + }, + "source": [ + "the gradient of $f$ with respect to $t$ and $s$ (without the explicit unit vector components)" + ] + }, + { + "cell_type": "markdown", + "id": "9a88f9e3", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{df}{d(s,t)} = \\begin{bmatrix}\\frac{\\partial f}{\\partial x} & \\frac{\\partial f}{\\partial y} \\end{bmatrix} \\begin{bmatrix}\\frac{\\partial x}{\\partial s} &\\frac{\\partial x}{\\partial t} \\\\ \\frac{\\partial y}{\\partial s} & \\frac{\\partial y}{\\partial t} \\end{bmatrix}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "6bc993bf", + "metadata": { + "editable": true + }, + "source": [ + "## Automatic differentiation through examples\n", + "\n", + "A great introduction to automatic differentiation is given by Baydin et al., see .\n", + "See also the video at .\n", + "\n", + "Automatic differentiation is a represented by a repeated application\n", + "of the chain rule on well-known functions and allows for the\n", + "calculation of derivatives to numerical precision. It is not the same\n", + "as the calculation of symbolic derivatives via for example SymPy, nor\n", + "does it use approximative formulae based on Taylor-expansions of a\n", + "function around a given value. The latter are error prone due to\n", + "truncation errors and values of the step size $\\Delta$." 
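As a small numerical aside, the sensitivity of difference formulae to the step size is easy to demonstrate. The snippet below is only an illustration (it borrows the simple example $f(x)=\exp{x^2}$ discussed next, and the test point and step sizes are arbitrary): a central-difference estimate of the derivative first improves as $\Delta$ shrinks and then deteriorates again once round-off errors dominate.

```python
import numpy as np

def f(x):
    return np.exp(x**2)

def fprime(x):
    return 2*x*np.exp(x**2)

x0 = 1.5
for delta in [1e-1, 1e-3, 1e-5, 1e-8, 1e-11]:
    # central-difference approximation of f'(x0)
    estimate = (f(x0 + delta) - f(x0 - delta))/(2*delta)
    print(f"delta = {delta:8.1e}  error = {abs(estimate - fprime(x0)):.3e}")
```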
+ ] + }, + { + "cell_type": "markdown", + "id": "0685fdd2", + "metadata": { + "editable": true + }, + "source": [ + "## Simple example\n", + "\n", + "Our first example is rather simple," + ] + }, + { + "cell_type": "markdown", + "id": "9a2b16de", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "f(x) =\\exp{x^2},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "ba5c3f8a", + "metadata": { + "editable": true + }, + "source": [ + "with derivative" + ] + }, + { + "cell_type": "markdown", + "id": "d0c973a9", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "f'(x) =2x\\exp{x^2}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "34c21223", + "metadata": { + "editable": true + }, + "source": [ + "We can use SymPy to extract the pertinent lines of Python code through the following simple example" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "72fa0f44", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "from __future__ import division\n", + "from sympy import *\n", + "x = symbols('x')\n", + "expr = exp(x*x)\n", + "simplify(expr)\n", + "derivative = diff(expr,x)\n", + "print(python(expr))\n", + "print(python(derivative))" + ] + }, + { + "cell_type": "markdown", + "id": "78884bc6", + "metadata": { + "editable": true + }, + "source": [ + "## Smarter way of evaluating the above function\n", + "If we study this function, we note that we can reduce the number of operations by introducing an intermediate variable" + ] + }, + { + "cell_type": "markdown", + "id": "f13d7286", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "a = x^2,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "443739d9", + "metadata": { + "editable": true + }, + "source": [ + "leading to" + ] + }, + { + "cell_type": "markdown", + "id": "48b45da1", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "f(x) = f(a(x)) = b= \\exp{a}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "81e7fd8f", + "metadata": { + "editable": true + }, + "source": [ + "We now assume that all operations can be counted in terms of equal\n", + "floating point operations. This means that in order to calculate\n", + "$f(x)$ we need first to square $x$ and then compute the exponential. We\n", + "have thus two floating point operations only." + ] + }, + { + "cell_type": "markdown", + "id": "824bbfa1", + "metadata": { + "editable": true + }, + "source": [ + "## Reducing the number of operations\n", + "\n", + "With the introduction of a precalculated quantity $a$ and thereby $f(x)$ we have that the derivative can be written as" + ] + }, + { + "cell_type": "markdown", + "id": "42d2716e", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "f'(x) = 2xb,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "f27855c1", + "metadata": { + "editable": true + }, + "source": [ + "which reduces the number of operations from four in the orginal\n", + "expression to two. This means that if we need to compute $f(x)$ and\n", + "its derivative (a common task in optimizations), we have reduced the\n", + "number of operations from six to four in total.\n", + "\n", + "**Note** that the usage of a symbolic software like SymPy does not\n", + "include such simplifications and the calculations of the function and\n", + "the derivatives yield in general more floating point operations." 
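As a minimal sketch of this counting argument in plain Python (the function name is just illustrative), we store the intermediate quantity once and reuse it for both the function value and its derivative:

```python
import numpy as np

def f_and_derivative(x):
    # forward pass with one intermediate variable
    a = x*x           # a = x^2            (operation 1)
    b = np.exp(a)     # b = exp(a) = f(x)  (operation 2)
    # the derivative reuses the stored value b
    df = 2*x*b        # f'(x) = 2x exp(x^2)  (operations 3 and 4)
    return b, df

print(f_and_derivative(2.0))   # (exp(4), 4*exp(4))
```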
+ ] + }, + { + "cell_type": "markdown", + "id": "d4fe531f", + "metadata": { + "editable": true + }, + "source": [ + "## Chain rule, forward and reverse modes\n", + "\n", + "In the above example we have introduced the variables $a$ and $b$, and our function is" + ] + }, + { + "cell_type": "markdown", + "id": "aba8f666", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "f(x) = f(a(x)) = b= \\exp{a},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "404c698a", + "metadata": { + "editable": true + }, + "source": [ + "with $a=x^2$. We can decompose the derivative of $f$ with respect to $x$ as" + ] + }, + { + "cell_type": "markdown", + "id": "2c73032a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{df}{dx}=\\frac{df}{db}\\frac{db}{da}\\frac{da}{dx}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "95a71a82", + "metadata": { + "editable": true + }, + "source": [ + "We note that since $b=f(x)$ that" + ] + }, + { + "cell_type": "markdown", + "id": "c71b8e66", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{df}{db}=1,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "23998633", + "metadata": { + "editable": true + }, + "source": [ + "leading to" + ] + }, + { + "cell_type": "markdown", + "id": "0708e562", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{df}{dx}=\\frac{db}{da}\\frac{da}{dx}=2x\\exp{x^2},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "ee8c4ade", + "metadata": { + "editable": true + }, + "source": [ + "as before." + ] + }, + { + "cell_type": "markdown", + "id": "860d410c", + "metadata": { + "editable": true + }, + "source": [ + "## Forward and reverse modes\n", + "\n", + "We have that" + ] + }, + { + "cell_type": "markdown", + "id": "064e5852", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{df}{dx}=\\frac{df}{db}\\frac{db}{da}\\frac{da}{dx},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "983c3afe", + "metadata": { + "editable": true + }, + "source": [ + "which we can rewrite either as" + ] + }, + { + "cell_type": "markdown", + "id": "a1f9638f", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{df}{dx}=\\left[\\frac{df}{db}\\frac{db}{da}\\right]\\frac{da}{dx},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "84a07e04", + "metadata": { + "editable": true + }, + "source": [ + "or" + ] + }, + { + "cell_type": "markdown", + "id": "4383650d", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{df}{dx}=\\frac{df}{db}\\left[\\frac{db}{da}\\frac{da}{dx}\\right].\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "36a2d607", + "metadata": { + "editable": true + }, + "source": [ + "The first expression is called reverse mode (or back propagation)\n", + "since we start by evaluating the derivatives at the end point and then\n", + "propagate backwards. This is the standard way of evaluating\n", + "derivatives (gradients) when optimizing the parameters of a neural\n", + "network. In the context of deep learning this is computationally\n", + "more efficient since the output of a neural network consists of either\n", + "one or some few other output variables.\n", + "\n", + "The second equation defines the so-called **forward mode**." 
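To see the two modes side by side without any hand derivation, automatic-differentiation libraries expose both. The sketch below assumes JAX is installed (autograd offers similar functionality) and reuses the simple example above: `jax.grad` performs a reverse-mode sweep, while `jax.jvp` propagates a tangent forward together with the function value; for a scalar function of one variable both return the same number.

```python
import jax
import jax.numpy as jnp

def f(x):
    return jnp.exp(x**2)

x0 = 1.5
# reverse mode (back propagation): evaluate f(x0), then sweep backwards
df_reverse = jax.grad(f)(x0)
# forward mode: carry the tangent dx/dx = 1.0 along with the evaluation
value, df_forward = jax.jvp(f, (x0,), (1.0,))
print(df_reverse, df_forward, 2*x0*jnp.exp(x0**2))
```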
+ ] + }, + { + "cell_type": "markdown", + "id": "ab0a9ca8", + "metadata": { + "editable": true + }, + "source": [ + "## More complicated function\n", + "\n", + "We increase our ambitions and introduce a slightly more complicated function" + ] + }, + { + "cell_type": "markdown", + "id": "e85a7d29", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "f(x) =\\sqrt{x^2+exp{x^2}},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "91c151e1", + "metadata": { + "editable": true + }, + "source": [ + "with derivative" + ] + }, + { + "cell_type": "markdown", + "id": "037a60e4", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "f'(x) =\\frac{x(1+\\exp{x^2})}{\\sqrt{x^2+exp{x^2}}}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "9f198b96", + "metadata": { + "editable": true + }, + "source": [ + "The corresponding SymPy code reads" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "620b6c3e", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "from __future__ import division\n", + "from sympy import *\n", + "x = symbols('x')\n", + "expr = sqrt(x*x+exp(x*x))\n", + "simplify(expr)\n", + "derivative = diff(expr,x)\n", + "print(python(expr))\n", + "print(python(derivative))" + ] + }, + { + "cell_type": "markdown", + "id": "d1fe5ce8", + "metadata": { + "editable": true + }, + "source": [ + "## Counting the number of floating point operations\n", + "\n", + "A simple count of operations shows that we need five operations for\n", + "the function itself and ten for the derivative. Fifteen operations in total if we wish to proceed with the above codes.\n", + "\n", + "Can we reduce this to\n", + "say half the number of operations?" + ] + }, + { + "cell_type": "markdown", + "id": "746e84de", + "metadata": { + "editable": true + }, + "source": [ + "## Defining intermediate operations\n", + "\n", + "We can indeed reduce the number of operation to half of those listed in the brute force approach above.\n", + "We define the following quantities" + ] + }, + { + "cell_type": "markdown", + "id": "cbb4abde", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "a = x^2,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "640a0037", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "e3b8b12d", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "b = \\exp{x^2} = \\exp{a},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "5b2087bf", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "5c397a99", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "c= a+b,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "c4884822", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "c1834aef", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "d=f(x)=\\sqrt{c}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "aeee8fc4", + "metadata": { + "editable": true + }, + "source": [ + "## New expression for the derivative\n", + "\n", + "With these definitions we obtain the following partial derivatives" + ] + }, + { + "cell_type": "markdown", + "id": "df71e889", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial a}{\\partial x} = 2x,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "358a49a2", + "metadata": { + "editable": true 
+ }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "95138b08", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial b}{\\partial a} = \\exp{a},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "0a0e2f81", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "7fa7f3b5", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial c}{\\partial a} = 1,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "c74442e2", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "2e9ebae8", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial c}{\\partial b} = 1,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "db89516c", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "0bc2735a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial d}{\\partial c} = \\frac{1}{2\\sqrt{c}},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "42e0cb08", + "metadata": { + "editable": true + }, + "source": [ + "and finally" + ] + }, + { + "cell_type": "markdown", + "id": "56ccf1d5", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial f}{\\partial d} = 1.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "557f2482", + "metadata": { + "editable": true + }, + "source": [ + "## Final derivatives\n", + "Our final derivatives are thus" + ] + }, + { + "cell_type": "markdown", + "id": "90eeebe1", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial f}{\\partial c} = \\frac{\\partial f}{\\partial d} \\frac{\\partial d}{\\partial c} = \\frac{1}{2\\sqrt{c}},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "6c2abeb4", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial f}{\\partial b} = \\frac{\\partial f}{\\partial c} \\frac{\\partial c}{\\partial b} = \\frac{1}{2\\sqrt{c}},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "3f5af305", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial f}{\\partial a} = \\frac{\\partial f}{\\partial c} \\frac{\\partial c}{\\partial a}+\n", + "\\frac{\\partial f}{\\partial b} \\frac{\\partial b}{\\partial a} = \\frac{1+\\exp{a}}{2\\sqrt{c}},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b78e9f43", + "metadata": { + "editable": true + }, + "source": [ + "and finally" + ] + }, + { + "cell_type": "markdown", + "id": "d197d721", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial f}{\\partial x} = \\frac{\\partial f}{\\partial a} \\frac{\\partial a}{\\partial x} = \\frac{x(1+\\exp{a})}{\\sqrt{c}},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "17334528", + "metadata": { + "editable": true + }, + "source": [ + "which is just" + ] + }, + { + "cell_type": "markdown", + "id": "f69ca3fd", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial f}{\\partial x} = \\frac{x(1+b)}{d},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "e937d622", + "metadata": { + "editable": true + }, + "source": [ + "and requires only three operations if we can reuse all intermediate variables." 
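The derivation above maps directly onto a few lines of code. This is only a sketch that mirrors the intermediate variables $a,b,c,d$ and the backward accumulation of the partial derivatives; the analytical derivative is printed alongside as a check.

```python
import numpy as np

def f_and_derivative(x):
    # forward pass, storing the intermediate variables
    a = x*x              # a = x^2
    b = np.exp(a)        # b = exp(a)
    c = a + b            # c = a + b
    d = np.sqrt(c)       # d = f(x)
    # reverse sweep, accumulating df/d(variable) from the output backwards
    df_dc = 1.0/(2.0*np.sqrt(c))        # df/dd * dd/dc, with df/dd = 1
    df_db = df_dc                       # dc/db = 1
    df_da = df_dc + df_db*np.exp(a)     # dc/da = 1, plus the path through b
    df_dx = df_da*2*x                   # da/dx = 2x
    return d, df_dx

x0 = 1.3
value, derivative = f_and_derivative(x0)
analytical = x0*(1 + np.exp(x0**2))/np.sqrt(x0**2 + np.exp(x0**2))
print(value, derivative, analytical)
```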
+ ] + }, + { + "cell_type": "markdown", + "id": "8ab7ba6b", + "metadata": { + "editable": true + }, + "source": [ + "## In general not this simple\n", + "\n", + "In general, see the generalization below, unless we can obtain simple\n", + "analytical expressions which we can simplify further, the final\n", + "implementation of automatic differentiation involves repeated\n", + "calculations (and thereby operations) of derivatives of elementary\n", + "functions." + ] + }, + { + "cell_type": "markdown", + "id": "02665ba6", + "metadata": { + "editable": true + }, + "source": [ + "## Automatic differentiation\n", + "\n", + "We can make this example more formal. Automatic differentiation is a\n", + "formalization of the previous example (see graph).\n", + "\n", + "We define $\\boldsymbol{x}\\in x_1,\\dots, x_l$ input variables to a given function $f(\\boldsymbol{x})$ and $x_{l+1},\\dots, x_L$ intermediate variables.\n", + "\n", + "In the above example we have only one input variable, $l=1$ and four intermediate variables, that is" + ] + }, + { + "cell_type": "markdown", + "id": "c473a49a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{bmatrix} x_1=x & x_2 = x^2=a & x_3 =\\exp{a}= b & x_4=c=a+b & x_5 = \\sqrt{c}=d \\end{bmatrix}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "6beeffc2", + "metadata": { + "editable": true + }, + "source": [ + "Furthemore, for $i=l+1, \\dots, L$ (here $i=2,3,4,5$ and $f=x_L=d$), we\n", + "define the elementary functions $g_i(x_{Pa(x_i)})$ where $x_{Pa(x_i)}$ are the parent nodes of the variable $x_i$.\n", + "\n", + "In our case, we have for example for $x_3=g_3(x_{Pa(x_i)})=\\exp{a}$, that $g_3=\\exp{()}$ and $x_{Pa(x_3)}=a$." + ] + }, + { + "cell_type": "markdown", + "id": "814918db", + "metadata": { + "editable": true + }, + "source": [ + "## Chain rule\n", + "\n", + "We can now compute the gradients by back-propagating the derivatives using the chain rule.\n", + "We have defined" + ] + }, + { + "cell_type": "markdown", + "id": "a7a72e3b", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial f}{\\partial x_L} = 1,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "041df7ab", + "metadata": { + "editable": true + }, + "source": [ + "which allows us to find the derivatives of the various variables $x_i$ as" + ] + }, + { + "cell_type": "markdown", + "id": "b687bc51", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial f}{\\partial x_i} = \\sum_{x_j:x_i\\in Pa(x_j)}\\frac{\\partial f}{\\partial x_j} \\frac{\\partial x_j}{\\partial x_i}=\\sum_{x_j:x_i\\in Pa(x_j)}\\frac{\\partial f}{\\partial x_j} \\frac{\\partial g_j}{\\partial x_i}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "5c87f3af", + "metadata": { + "editable": true + }, + "source": [ + "Whenever we have a function which can be expressed as a computation\n", + "graph and the various functions can be expressed in terms of\n", + "elementary functions that are differentiable, then automatic\n", + "differentiation works. The functions may not need to be elementary\n", + "functions, they could also be computer programs, although not all\n", + "programs can be automatically differentiated." + ] + }, + { + "cell_type": "markdown", + "id": "02df0535", + "metadata": { + "editable": true + }, + "source": [ + "## First network example, simple percepetron with one input\n", + "\n", + "As yet another example we define now a simple perceptron model with\n", + "all quantities given by scalars. 
We consider only one input variable\n", + "$x$ and one target value $y$. We define an activation function\n", + "$\\sigma_1$ which takes as input" + ] + }, + { + "cell_type": "markdown", + "id": "dc45fa01", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "z_1 = w_1x+b_1,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "5568395b", + "metadata": { + "editable": true + }, + "source": [ + "where $w_1$ is the weight and $b_1$ is the bias. These are the\n", + "parameters we want to optimize. The output is $a_1=\\sigma(z_1)$ (see\n", + "graph from whiteboard notes). This output is then fed into the\n", + "**cost/loss** function, which we here for the sake of simplicity just\n", + "define as the squared error" + ] + }, + { + "cell_type": "markdown", + "id": "e6ae6f18", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C(x;w_1,b_1)=\\frac{1}{2}(a_1-y)^2.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "7d6abd22", + "metadata": { + "editable": true + }, + "source": [ + "## Layout of a simple neural network with no hidden layer\n", + "\n", + "\n", + "\n", + "\n", + "

Figure 1: Layout of a simple neural network with no hidden layer.

\n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "1e466108", + "metadata": { + "editable": true + }, + "source": [ + "## Optimizing the parameters\n", + "\n", + "In setting up the feed forward and back propagation parts of the\n", + "algorithm, we need now the derivative of the various variables we want\n", + "to train.\n", + "\n", + "We need" + ] + }, + { + "cell_type": "markdown", + "id": "3b6fd059", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial w_1} \\hspace{0.1cm}\\mathrm{and}\\hspace{0.1cm}\\frac{\\partial C}{\\partial b_1}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "cfad60fc", + "metadata": { + "editable": true + }, + "source": [ + "Using the chain rule we find" + ] + }, + { + "cell_type": "markdown", + "id": "5c5014b3", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial w_1}=\\frac{\\partial C}{\\partial a_1}\\frac{\\partial a_1}{\\partial z_1}\\frac{\\partial z_1}{\\partial w_1}=(a_1-y)\\sigma_1'x,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "1c677323", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "93362833", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial b_1}=\\frac{\\partial C}{\\partial a_1}\\frac{\\partial a_1}{\\partial z_1}\\frac{\\partial z_1}{\\partial b_1}=(a_1-y)\\sigma_1',\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "c857a902", + "metadata": { + "editable": true + }, + "source": [ + "which we later will just define as" + ] + }, + { + "cell_type": "markdown", + "id": "b7b95721", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial a_1}\\frac{\\partial a_1}{\\partial z_1}=\\delta_1.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "e2574534", + "metadata": { + "editable": true + }, + "source": [ + "## Adding a hidden layer\n", + "\n", + "We change our simple model to (see graph)\n", + "a network with just one hidden layer but with scalar variables only.\n", + "\n", + "Our output variable changes to $a_2$ and $a_1$ is now the output from the hidden node and $a_0=x$.\n", + "We have then" + ] + }, + { + "cell_type": "markdown", + "id": "ae7a5afa", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "z_1 = w_1a_0+b_1 \\hspace{0.1cm} \\wedge a_1 = \\sigma_1(z_1),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "7962e138", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "z_2 = w_2a_1+b_2 \\hspace{0.1cm} \\wedge a_2 = \\sigma_2(z_2),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "0add2cb1", + "metadata": { + "editable": true + }, + "source": [ + "and the cost function" + ] + }, + { + "cell_type": "markdown", + "id": "2ea986fc", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C(x;\\boldsymbol{\\Theta})=\\frac{1}{2}(a_2-y)^2,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "683c4849", + "metadata": { + "editable": true + }, + "source": [ + "with $\\boldsymbol{\\Theta}=[w_1,w_2,b_1,b_2]$." + ] + }, + { + "cell_type": "markdown", + "id": "f345670c", + "metadata": { + "editable": true + }, + "source": [ + "## Layout of a simple neural network with one hidden layer\n", + "\n", + "\n", + "\n", + "\n", + "

Figure 1: Layout of a simple neural network with one hidden layer.

\n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "bb15a76b", + "metadata": { + "editable": true + }, + "source": [ + "## The derivatives\n", + "\n", + "The derivatives are now, using the chain rule again" + ] + }, + { + "cell_type": "markdown", + "id": "d0882362", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial w_2}=\\frac{\\partial C}{\\partial a_2}\\frac{\\partial a_2}{\\partial z_2}\\frac{\\partial z_2}{\\partial w_2}=(a_2-y)\\sigma_2'a_1=\\delta_2a_1,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "3e16d45d", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial b_2}=\\frac{\\partial C}{\\partial a_2}\\frac{\\partial a_2}{\\partial z_2}\\frac{\\partial z_2}{\\partial b_2}=(a_2-y)\\sigma_2'=\\delta_2,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b2a0a41b", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial w_1}=\\frac{\\partial C}{\\partial a_2}\\frac{\\partial a_2}{\\partial z_2}\\frac{\\partial z_2}{\\partial a_1}\\frac{\\partial a_1}{\\partial z_1}\\frac{\\partial z_1}{\\partial w_1}=(a_2-y)\\sigma_2'a_1\\sigma_1'a_0,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "e8f61358", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial b_1}=\\frac{\\partial C}{\\partial a_2}\\frac{\\partial a_2}{\\partial z_2}\\frac{\\partial z_2}{\\partial a_1}\\frac{\\partial a_1}{\\partial z_1}\\frac{\\partial z_1}{\\partial b_1}=(a_2-y)\\sigma_2'\\sigma_1'=\\delta_1.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "5a8258cb", + "metadata": { + "editable": true + }, + "source": [ + "Can you generalize this to more than one hidden layer?" + ] + }, + { + "cell_type": "markdown", + "id": "bb720314", + "metadata": { + "editable": true + }, + "source": [ + "## Important observations\n", + "\n", + "From the above equations we see that the derivatives of the activation\n", + "functions play a central role. If they vanish, the training may\n", + "stop. This is called the vanishing gradient problem, see discussions below. If they become\n", + "large, the parameters $w_i$ and $b_i$ may simply go to infinity. This\n", + "is referenced as the exploding gradient problem." + ] + }, + { + "cell_type": "markdown", + "id": "52217a26", + "metadata": { + "editable": true + }, + "source": [ + "## The training\n", + "\n", + "The training of the parameters is done through various gradient descent approximations with" + ] + }, + { + "cell_type": "markdown", + "id": "eb647e50", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "w_{i}\\leftarrow w_{i}- \\eta \\delta_i a_{i-1},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "cda95964", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "130a2766", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "b_i \\leftarrow b_i-\\eta \\delta_i,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "ac7cc3bc", + "metadata": { + "editable": true + }, + "source": [ + "with $\\eta$ is the learning rate.\n", + "\n", + "One iteration consists of one feed forward step and one back-propagation step. Each back-propagation step does one update of the parameters $\\boldsymbol{\\Theta}$.\n", + "\n", + "For the first hidden layer $a_{i-1}=a_0=x$ for this simple model." 
+ ] + }, + { + "cell_type": "markdown", + "id": "cde60cd2", + "metadata": { + "editable": true + }, + "source": [ + "## Code example\n", + "\n", + "The code here implements the above model with one hidden layer and\n", + "scalar variables for the same function we studied in the previous\n", + "example. The code is however set up so that we can add multiple\n", + "inputs $x$ and target values $y$. Note also that we have the\n", + "possibility of defining a feature matrix $\\boldsymbol{X}$ with more than just\n", + "one column for the input values. This will turn useful in our next example. We have also defined matrices and vectors for all of our operations although it is not necessary here." + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "3616dd69", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import numpy as np\n", + "# We use the Sigmoid function as activation function\n", + "def sigmoid(z):\n", + " return 1.0/(1.0+np.exp(-z))\n", + "\n", + "def forwardpropagation(x):\n", + " # weighted sum of inputs to the hidden layer\n", + " z_1 = np.matmul(x, w_1) + b_1\n", + " # activation in the hidden layer\n", + " a_1 = sigmoid(z_1)\n", + " # weighted sum of inputs to the output layer\n", + " z_2 = np.matmul(a_1, w_2) + b_2\n", + " a_2 = z_2\n", + " return a_1, a_2\n", + "\n", + "def backpropagation(x, y):\n", + " a_1, a_2 = forwardpropagation(x)\n", + " # parameter delta for the output layer, note that a_2=z_2 and its derivative wrt z_2 is just 1\n", + " delta_2 = a_2 - y\n", + " print(0.5*((a_2-y)**2))\n", + " # delta for the hidden layer\n", + " delta_1 = np.matmul(delta_2, w_2.T) * a_1 * (1 - a_1)\n", + " # gradients for the output layer\n", + " output_weights_gradient = np.matmul(a_1.T, delta_2)\n", + " output_bias_gradient = np.sum(delta_2, axis=0)\n", + " # gradient for the hidden layer\n", + " hidden_weights_gradient = np.matmul(x.T, delta_1)\n", + " hidden_bias_gradient = np.sum(delta_1, axis=0)\n", + " return output_weights_gradient, output_bias_gradient, hidden_weights_gradient, hidden_bias_gradient\n", + "\n", + "\n", + "# ensure the same random numbers appear every time\n", + "np.random.seed(0)\n", + "# Input variable\n", + "x = np.array([4.0],dtype=np.float64)\n", + "# Target values\n", + "y = 2*x+1.0 \n", + "\n", + "# Defining the neural network, only scalars here\n", + "n_inputs = x.shape\n", + "n_features = 1\n", + "n_hidden_neurons = 1\n", + "n_outputs = 1\n", + "\n", + "# Initialize the network\n", + "# weights and bias in the hidden layer\n", + "w_1 = np.random.randn(n_features, n_hidden_neurons)\n", + "b_1 = np.zeros(n_hidden_neurons) + 0.01\n", + "\n", + "# weights and bias in the output layer\n", + "w_2 = np.random.randn(n_hidden_neurons, n_outputs)\n", + "b_2 = np.zeros(n_outputs) + 0.01\n", + "\n", + "eta = 0.1\n", + "for i in range(50):\n", + " # calculate gradients\n", + " derivW2, derivB2, derivW1, derivB1 = backpropagation(x, y)\n", + " # update weights and biases\n", + " w_2 -= eta * derivW2\n", + " b_2 -= eta * derivB2\n", + " w_1 -= eta * derivW1\n", + " b_1 -= eta * derivB1" + ] + }, + { + "cell_type": "markdown", + "id": "3348a149", + "metadata": { + "editable": true + }, + "source": [ + "We see that after some few iterations (the results do depend on the learning rate however), we get an error which is rather small." 
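A quick way to convince ourselves, using the functions and variables already defined in the code above, is to do one extra forward pass after the training loop and compare the network output with the target value:

```python
# one extra forward pass after training, reusing the trained w_1, b_1, w_2, b_2
a_1, a_2 = forwardpropagation(x)
print("prediction:", a_2, " target:", y)
```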
+ ] + }, + { + "cell_type": "markdown", + "id": "b9b47543", + "metadata": { + "editable": true + }, + "source": [ + "## Exercise 1: Including more data\n", + "\n", + "Try to increase the amount of input and\n", + "target/output data. Try also to perform calculations for more values\n", + "of the learning rates. Feel free to add either hyperparameters with an\n", + "$l_1$ norm or an $l_2$ norm and discuss your results.\n", + "Discuss your results as functions of the amount of training data and various learning rates.\n", + "\n", + "**Challenge:** Try to change the activation functions and replace the hard-coded analytical expressions with automatic derivation via either **autograd** or **JAX**." + ] + }, + { + "cell_type": "markdown", + "id": "3d2a82c9", + "metadata": { + "editable": true + }, + "source": [ + "## Simple neural network and the back propagation equations\n", + "\n", + "Let us now try to increase our level of ambition and attempt at setting \n", + "up the equations for a neural network with two input nodes, one hidden\n", + "layer with two hidden nodes and one output layer with one output node/neuron only (see graph)..\n", + "\n", + "We need to define the following parameters and variables with the input layer (layer $(0)$) \n", + "where we label the nodes $x_0$ and $x_1$" + ] + }, + { + "cell_type": "markdown", + "id": "e2bda122", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "x_0 = a_0^{(0)} \\wedge x_1 = a_1^{(0)}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "d4324d91", + "metadata": { + "editable": true + }, + "source": [ + "The hidden layer (layer $(1)$) has nodes which yield the outputs $a_0^{(1)}$ and $a_1^{(1)}$) with weight $\\boldsymbol{w}$ and bias $\\boldsymbol{b}$ parameters" + ] + }, + { + "cell_type": "markdown", + "id": "b3c0b344", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "w_{ij}^{(1)}=\\left\\{w_{00}^{(1)},w_{01}^{(1)},w_{10}^{(1)},w_{11}^{(1)}\\right\\} \\wedge b^{(1)}=\\left\\{b_0^{(1)},b_1^{(1)}\\right\\}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "fb200d12", + "metadata": { + "editable": true + }, + "source": [ + "## Layout of a simple neural network with two input nodes, one hidden layer and one output node\n", + "\n", + "\n", + "\n", + "\n", + "

Figure 1: Layout of a simple neural network with two input nodes, one hidden layer and one output node.

\n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "5a7e37cd", + "metadata": { + "editable": true + }, + "source": [ + "## The ouput layer\n", + "\n", + "Finally, we have the ouput layer given by layer label $(2)$ with output $a^{(2)}$ and weights and biases to be determined given by the variables" + ] + }, + { + "cell_type": "markdown", + "id": "11f25dfa", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "w_{i}^{(2)}=\\left\\{w_{0}^{(2)},w_{1}^{(2)}\\right\\} \\wedge b^{(2)}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "8755dbae", + "metadata": { + "editable": true + }, + "source": [ + "Our output is $\\tilde{y}=a^{(2)}$ and we define a generic cost function $C(a^{(2)},y;\\boldsymbol{\\Theta})$ where $y$ is the target value (a scalar here).\n", + "The parameters we need to optimize are given by" + ] + }, + { + "cell_type": "markdown", + "id": "51983594", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{\\Theta}=\\left\\{w_{00}^{(1)},w_{01}^{(1)},w_{10}^{(1)},w_{11}^{(1)},w_{0}^{(2)},w_{1}^{(2)},b_0^{(1)},b_1^{(1)},b^{(2)}\\right\\}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "20a70d90", + "metadata": { + "editable": true + }, + "source": [ + "## Compact expressions\n", + "\n", + "We can define the inputs to the activation functions for the various layers in terms of various matrix-vector multiplications and vector additions.\n", + "The inputs to the first hidden layer are" + ] + }, + { + "cell_type": "markdown", + "id": "76e186dc", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{bmatrix}z_0^{(1)} \\\\ z_1^{(1)} \\end{bmatrix}=\\begin{bmatrix}w_{00}^{(1)} & w_{01}^{(1)}\\\\ w_{10}^{(1)} &w_{11}^{(1)} \\end{bmatrix}\\begin{bmatrix}a_0^{(0)} \\\\ a_1^{(0)} \\end{bmatrix}+\\begin{bmatrix}b_0^{(1)} \\\\ b_1^{(1)} \\end{bmatrix},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "3396d1b9", + "metadata": { + "editable": true + }, + "source": [ + "with outputs" + ] + }, + { + "cell_type": "markdown", + "id": "2f4d2eed", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{bmatrix}a_0^{(1)} \\\\ a_1^{(1)} \\end{bmatrix}=\\begin{bmatrix}\\sigma^{(1)}(z_0^{(1)}) \\\\ \\sigma^{(1)}(z_1^{(1)}) \\end{bmatrix}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "6863edaa", + "metadata": { + "editable": true + }, + "source": [ + "## Output layer\n", + "\n", + "For the final output layer we have the inputs to the final activation function" + ] + }, + { + "cell_type": "markdown", + "id": "569b5a62", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "z^{(2)} = w_{0}^{(2)}a_0^{(1)} +w_{1}^{(2)}a_1^{(1)}+b^{(2)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "88775a53", + "metadata": { + "editable": true + }, + "source": [ + "resulting in the output" + ] + }, + { + "cell_type": "markdown", + "id": "11852c41", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "a^{(2)}=\\sigma^{(2)}(z^{(2)}).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "4e2e26a9", + "metadata": { + "editable": true + }, + "source": [ + "## Explicit derivatives\n", + "\n", + "In total we have nine parameters which we need to train. Using the\n", + "chain rule (or just the back-propagation algorithm) we can find all\n", + "derivatives. 
Since we will use automatic differentiation in reverse\n", + "mode, we start with the derivatives of the cost function with respect\n", + "to the parameters of the output layer, namely" + ] + }, + { + "cell_type": "markdown", + "id": "25da37b5", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial w_{i}^{(2)}}=\\frac{\\partial C}{\\partial a^{(2)}}\\frac{\\partial a^{(2)}}{\\partial z^{(2)}}\\frac{\\partial z^{(2)}}{\\partial w_{i}^{(2)}}=\\delta^{(2)}a_i^{(1)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "4094b188", + "metadata": { + "editable": true + }, + "source": [ + "with" + ] + }, + { + "cell_type": "markdown", + "id": "99f40072", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta^{(2)}=\\frac{\\partial C}{\\partial a^{(2)}}\\frac{\\partial a^{(2)}}{\\partial z^{(2)}}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "a93180cb", + "metadata": { + "editable": true + }, + "source": [ + "and finally" + ] + }, + { + "cell_type": "markdown", + "id": "312c8e22", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial b^{(2)}}=\\frac{\\partial C}{\\partial a^{(2)}}\\frac{\\partial a^{(2)}}{\\partial z^{(2)}}\\frac{\\partial z^{(2)}}{\\partial b^{(2)}}=\\delta^{(2)}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "4db8065c", + "metadata": { + "editable": true + }, + "source": [ + "## Derivatives of the hidden layer\n", + "\n", + "Using the chain rule we have the following expressions for say one of the weight parameters (it is easy to generalize to the other weight parameters)" + ] + }, + { + "cell_type": "markdown", + "id": "316b7cc7", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial w_{00}^{(1)}}=\\frac{\\partial C}{\\partial a^{(2)}}\\frac{\\partial a^{(2)}}{\\partial z^{(2)}}\n", + "\\frac{\\partial z^{(2)}}{\\partial z_0^{(1)}}\\frac{\\partial z_0^{(1)}}{\\partial w_{00}^{(1)}}= \\delta^{(2)}\\frac{\\partial z^{(2)}}{\\partial z_0^{(1)}}\\frac{\\partial z_0^{(1)}}{\\partial w_{00}^{(1)}},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "8ef16e76", + "metadata": { + "editable": true + }, + "source": [ + "which, noting that" + ] + }, + { + "cell_type": "markdown", + "id": "85a0f70d", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "z^{(2)} =w_0^{(2)}a_0^{(1)}+w_1^{(2)}a_1^{(1)}+b^{(2)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "108db06e", + "metadata": { + "editable": true + }, + "source": [ + "allows us to rewrite" + ] + }, + { + "cell_type": "markdown", + "id": "2922e5c6", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial z^{(2)}}{\\partial z_0^{(1)}}\\frac{\\partial z_0^{(1)}}{\\partial w_{00}^{(1)}}=w_0^{(2)}\\frac{\\partial a_0^{(1)}}{\\partial z_0^{(1)}}a_0^{(1)}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "cb6f6fe5", + "metadata": { + "editable": true + }, + "source": [ + "## Final expression\n", + "Defining" + ] + }, + { + "cell_type": "markdown", + "id": "3a0d272d", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_0^{(1)}=w_0^{(2)}\\frac{\\partial a_0^{(1)}}{\\partial z_0^{(1)}}\\delta^{(2)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "70a6cf5c", + "metadata": { + "editable": true + }, + "source": [ + "we have" + ] + }, + { + "cell_type": "markdown", + "id": "a862fb73", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial 
C}{\\partial w_{00}^{(1)}}=\\delta_0^{(1)}a_0^{(1)}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "703fa2c1", + "metadata": { + "editable": true + }, + "source": [ + "Similarly, we obtain" + ] + }, + { + "cell_type": "markdown", + "id": "2032458a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial w_{01}^{(1)}}=\\delta_0^{(1)}a_1^{(1)}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "97d8acd7", + "metadata": { + "editable": true + }, + "source": [ + "## Completing the list\n", + "\n", + "Similarly, we find" + ] + }, + { + "cell_type": "markdown", + "id": "972e5301", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial w_{10}^{(1)}}=\\delta_1^{(1)}a_0^{(1)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "ba8f5955", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "3ac41463", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial w_{11}^{(1)}}=\\delta_1^{(1)}a_1^{(1)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "ab92a69c", + "metadata": { + "editable": true + }, + "source": [ + "where we have defined" + ] + }, + { + "cell_type": "markdown", + "id": "8224b6f2", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_1^{(1)}=w_1^{(2)}\\frac{\\partial a_1^{(1)}}{\\partial z_1^{(1)}}\\delta^{(2)}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b55a566b", + "metadata": { + "editable": true + }, + "source": [ + "## Final expressions for the biases of the hidden layer\n", + "\n", + "For the sake of completeness, we list the derivatives of the biases, which are" + ] + }, + { + "cell_type": "markdown", + "id": "cb5f687e", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial b_{0}^{(1)}}=\\delta_0^{(1)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "6d8361e8", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "ccfb7fa8", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial b_{1}^{(1)}}=\\delta_1^{(1)}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "20fd0aa3", + "metadata": { + "editable": true + }, + "source": [ + "As we will see below, these expressions can be generalized in a more compact form." 
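Before moving to the compact matrix form, it can be instructive to evaluate these expressions numerically. The sketch below is only an illustration: it assumes sigmoid activations in both layers and the squared-error cost $C=\frac{1}{2}(a^{(2)}-y)^2$ (the derivation above keeps $\sigma$ and $C$ generic), and it pairs the hidden-layer errors $\delta_i^{(1)}$ with the input activations $a_j^{(0)}$, since $\partial z_i^{(1)}/\partial w_{ij}^{(1)}=a_j^{(0)}$, exactly as in the gradient-descent update rules listed next.

```python
import numpy as np

def sigmoid(z):
    return 1.0/(1.0 + np.exp(-z))

# one input pair and a scalar target (arbitrary numbers, for illustration only)
a0 = np.array([0.5, -1.2])        # a_0^{(0)}, a_1^{(0)}
y = 0.3

rng = np.random.default_rng(0)
W1 = rng.normal(size=(2, 2))      # w_{ij}^{(1)}
b1 = np.zeros(2)                  # b_i^{(1)}
w2 = rng.normal(size=2)           # w_i^{(2)}
b2 = 0.0                          # b^{(2)}

# forward pass
z1 = W1 @ a0 + b1
a1 = sigmoid(z1)                  # a_i^{(1)}
z2 = w2 @ a1 + b2
a2 = sigmoid(z2)                  # a^{(2)}

# back propagation of the errors
delta2 = (a2 - y)*a2*(1 - a2)     # dC/da^{(2)} * da^{(2)}/dz^{(2)}
delta1 = w2*a1*(1 - a1)*delta2    # delta_i^{(1)}

# the nine derivatives
grad_w2 = delta2*a1               # dC/dw_i^{(2)}    = delta^{(2)} a_i^{(1)}
grad_b2 = delta2                  # dC/db^{(2)}      = delta^{(2)}
grad_W1 = np.outer(delta1, a0)    # dC/dw_{ij}^{(1)} = delta_i^{(1)} a_j^{(0)}
grad_b1 = delta1                  # dC/db_i^{(1)}    = delta_i^{(1)}
print(grad_W1, grad_b1, grad_w2, grad_b2)
```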
+ ] + }, + { + "cell_type": "markdown", + "id": "6bca7f99", + "metadata": { + "editable": true + }, + "source": [ + "## Gradient expressions\n", + "\n", + "For this specific model, with just one output node and two hidden\n", + "nodes, the gradient descent equations take the following form for output layer" + ] + }, + { + "cell_type": "markdown", + "id": "430e26d8", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "w_{i}^{(2)}\\leftarrow w_{i}^{(2)}- \\eta \\delta^{(2)} a_{i}^{(1)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "ced71f83", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "ec12ee1a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "b^{(2)} \\leftarrow b^{(2)}-\\eta \\delta^{(2)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "f46fe24d", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "af8f924d", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "w_{ij}^{(1)}\\leftarrow w_{ij}^{(1)}- \\eta \\delta_{i}^{(1)} a_{j}^{(0)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "4aeb6140", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "0bc2f26c", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "b_{i}^{(1)} \\leftarrow b_{i}^{(1)}-\\eta \\delta_{i}^{(1)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "7eafd358", + "metadata": { + "editable": true + }, + "source": [ + "where $\\eta$ is the learning rate." + ] + }, + { + "cell_type": "markdown", + "id": "548f58f6", + "metadata": { + "editable": true + }, + "source": [ + "## Exercise 2: Extended program\n", + "\n", + "We extend our simple code to a function which depends on two variable $x_0$ and $x_1$, that is" + ] + }, + { + "cell_type": "markdown", + "id": "4c38514a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y=f(x_0,x_1)=x_0^2+3x_0x_1+x_1^2+5.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "06303245", + "metadata": { + "editable": true + }, + "source": [ + "We feed our network with $n=100$ entries $x_0$ and $x_1$. We have thus two features represented by these variable and an input matrix/design matrix $\\boldsymbol{X}\\in \\mathbf{R}^{n\\times 2}$" + ] + }, + { + "cell_type": "markdown", + "id": "ed0c0029", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{X}=\\begin{bmatrix} x_{00} & x_{01} \\\\ x_{00} & x_{01} \\\\ x_{10} & x_{11} \\\\ x_{20} & x_{21} \\\\ \\dots & \\dots \\\\ \\dots & \\dots \\\\ x_{n-20} & x_{n-21} \\\\ x_{n-10} & x_{n-11} \\end{bmatrix}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "93df389e", + "metadata": { + "editable": true + }, + "source": [ + "Write a code, based on the previous code examples, which takes as input these data and fit the above function.\n", + "You can extend your code to include automatic differentiation.\n", + "\n", + "With these examples, we are now ready to embark upon the writing of more a general code for neural networks." + ] + }, + { + "cell_type": "markdown", + "id": "5df18704", + "metadata": { + "editable": true + }, + "source": [ + "## Getting serious, the back propagation equations for a neural network\n", + "\n", + "Now it is time to move away from one node in each layer only. 
Our inputs are also represented either by several inputs.\n", + "\n", + "We have thus" + ] + }, + { + "cell_type": "markdown", + "id": "ae3765be", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial{\\cal C}((\\boldsymbol{\\Theta}^L)}{\\partial w_{jk}^L} = \\left(a_j^L - y_j\\right)a_j^L(1-a_j^L)a_k^{L-1},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "dd8f7882", + "metadata": { + "editable": true + }, + "source": [ + "Defining" + ] + }, + { + "cell_type": "markdown", + "id": "f204fdd7", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_j^L = a_j^L(1-a_j^L)\\left(a_j^L - y_j\\right) = \\sigma'(z_j^L)\\frac{\\partial {\\cal C}}{\\partial (a_j^L)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "c28e8401", + "metadata": { + "editable": true + }, + "source": [ + "and using the Hadamard product of two vectors we can write this as" + ] + }, + { + "cell_type": "markdown", + "id": "910c4eb1", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{\\delta}^L = \\sigma'(\\hat{z}^L)\\circ\\frac{\\partial {\\cal C}}{\\partial (\\boldsymbol{a}^L)}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "efd2f948", + "metadata": { + "editable": true + }, + "source": [ + "## Analyzing the last results\n", + "\n", + "This is an important expression. The second term on the right handside\n", + "measures how fast the cost function is changing as a function of the $j$th\n", + "output activation. If, for example, the cost function doesn't depend\n", + "much on a particular output node $j$, then $\\delta_j^L$ will be small,\n", + "which is what we would expect. The first term on the right, measures\n", + "how fast the activation function $f$ is changing at a given activation\n", + "value $z_j^L$." + ] + }, + { + "cell_type": "markdown", + "id": "e1eeeba2", + "metadata": { + "editable": true + }, + "source": [ + "## More considerations\n", + "\n", + "Notice that everything in the above equations is easily computed. In\n", + "particular, we compute $z_j^L$ while computing the behaviour of the\n", + "network, and it is only a small additional overhead to compute\n", + "$\\sigma'(z^L_j)$. 
The exact form of the derivative with respect to the\n", + "output depends on the form of the cost function.\n", + "However, provided the cost function is known there should be little\n", + "trouble in calculating" + ] + }, + { + "cell_type": "markdown", + "id": "b5e74c11", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial {\\cal C}}{\\partial (a_j^L)}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "e129fe72", + "metadata": { + "editable": true + }, + "source": [ + "With the definition of $\\delta_j^L$ we have a more compact definition of the derivative of the cost function in terms of the weights, namely" + ] + }, + { + "cell_type": "markdown", + "id": "3879d293", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial{\\cal C}}{\\partial w_{jk}^L} = \\delta_j^La_k^{L-1}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "1ea1da9d", + "metadata": { + "editable": true + }, + "source": [ + "## Derivatives in terms of $z_j^L$\n", + "\n", + "It is also easy to see that our previous equation can be written as" + ] + }, + { + "cell_type": "markdown", + "id": "c7156e16", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_j^L =\\frac{\\partial {\\cal C}}{\\partial z_j^L}= \\frac{\\partial {\\cal C}}{\\partial a_j^L}\\frac{\\partial a_j^L}{\\partial z_j^L},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "8311b4aa", + "metadata": { + "editable": true + }, + "source": [ + "which can also be interpreted as the partial derivative of the cost function with respect to the biases $b_j^L$, namely" + ] + }, + { + "cell_type": "markdown", + "id": "7bb3d820", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_j^L = \\frac{\\partial {\\cal C}}{\\partial b_j^L}\\frac{\\partial b_j^L}{\\partial z_j^L}=\\frac{\\partial {\\cal C}}{\\partial b_j^L},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "1eeb0c00", + "metadata": { + "editable": true + }, + "source": [ + "That is, the error $\\delta_j^L$ is exactly equal to the rate of change of the cost function as a function of the bias." + ] + }, + { + "cell_type": "markdown", + "id": "bc7d3757", + "metadata": { + "editable": true + }, + "source": [ + "## Bringing it together\n", + "\n", + "We have now three equations that are essential for the computations of the derivatives of the cost function at the output layer. These equations are needed to start the algorithm and they are" + ] + }, + { + "cell_type": "markdown", + "id": "9f018cff", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
\n", + "\n", + "$$\n", + "\\begin{equation}\n", + "\\frac{\\partial{\\cal C}(\\hat{W^L})}{\\partial w_{jk}^L} = \\delta_j^La_k^{L-1},\n", + "\\label{_auto1} \\tag{2}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "ebde7551", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "f96aa8f7", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
\n", + "\n", + "$$\n", + "\\begin{equation}\n", + "\\delta_j^L = \\sigma'(z_j^L)\\frac{\\partial {\\cal C}}{\\partial (a_j^L)},\n", + "\\label{_auto2} \\tag{3}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "1215d118", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "c5f6885e", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
\n", + "\n", + "$$\n", + "\\begin{equation}\n", + "\\delta_j^L = \\frac{\\partial {\\cal C}}{\\partial b_j^L},\n", + "\\label{_auto3} \\tag{4}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "1dedde99", + "metadata": { + "editable": true + }, + "source": [ + "## Final back propagating equation\n", + "\n", + "We have that (replacing $L$ with a general layer $l$)" + ] + }, + { + "cell_type": "markdown", + "id": "a182b912", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_j^l =\\frac{\\partial {\\cal C}}{\\partial z_j^l}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "9fcc3201", + "metadata": { + "editable": true + }, + "source": [ + "We want to express this in terms of the equations for layer $l+1$." + ] + }, + { + "cell_type": "markdown", + "id": "54237463", + "metadata": { + "editable": true + }, + "source": [ + "## Using the chain rule and summing over all $k$ entries\n", + "\n", + "We obtain" + ] + }, + { + "cell_type": "markdown", + "id": "dc069f5a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_j^l =\\sum_k \\frac{\\partial {\\cal C}}{\\partial z_k^{l+1}}\\frac{\\partial z_k^{l+1}}{\\partial z_j^{l}}=\\sum_k \\delta_k^{l+1}\\frac{\\partial z_k^{l+1}}{\\partial z_j^{l}},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "71ba0435", + "metadata": { + "editable": true + }, + "source": [ + "and recalling that" + ] + }, + { + "cell_type": "markdown", + "id": "bd00cbe9", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "z_j^{l+1} = \\sum_{i=1}^{M_{l}}w_{ij}^{l+1}a_i^{l}+b_j^{l+1},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "1e7e0241", + "metadata": { + "editable": true + }, + "source": [ + "with $M_l$ being the number of nodes in layer $l$, we obtain" + ] + }, + { + "cell_type": "markdown", + "id": "e8e3697e", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_j^l =\\sum_k \\delta_k^{l+1}w_{kj}^{l+1}\\sigma'(z_j^l),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "7d86a02b", + "metadata": { + "editable": true + }, + "source": [ + "This is our final equation.\n", + "\n", + "We are now ready to set up the algorithm for back propagation and learning the weights and biases." + ] + }, + { + "cell_type": "markdown", + "id": "ff1dc46f", + "metadata": { + "editable": true + }, + "source": [ + "## Setting up the back propagation algorithm\n", + "\n", + "The four equations provide us with a way of computing the gradient of the cost function. Let us write this out in the form of an algorithm.\n", + "\n", + "**First**, we set up the input data $\\hat{x}$ and the activations\n", + "$\\hat{z}_1$ of the input layer and compute the activation function and\n", + "the pertinent outputs $\\hat{a}^1$.\n", + "\n", + "**Secondly**, we perform then the feed forward till we reach the output\n", + "layer and compute all $\\hat{z}_l$ of the input layer and compute the\n", + "activation function and the pertinent outputs $\\hat{a}^l$ for\n", + "$l=1,2,3,\\dots,L$.\n", + "\n", + "**Notation**: The first hidden layer has $l=1$ as label and the final output layer has $l=L$." 
+ ] + }, + { + "cell_type": "markdown", + "id": "1313e6dc", + "metadata": { + "editable": true + }, + "source": [ + "## Setting up the back propagation algorithm, part 2\n", + "\n", + "Thereafter we compute the ouput error $\\hat{\\delta}^L$ by computing all" + ] + }, + { + "cell_type": "markdown", + "id": "74378773", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_j^L = \\sigma'(z_j^L)\\frac{\\partial {\\cal C}}{\\partial (a_j^L)}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "70450254", + "metadata": { + "editable": true + }, + "source": [ + "Then we compute the back propagate error for each $l=L-1,L-2,\\dots,1$ as" + ] + }, + { + "cell_type": "markdown", + "id": "81a28b23", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_j^l = \\sum_k \\delta_k^{l+1}w_{kj}^{l+1}\\sigma'(z_j^l).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "9a733356", + "metadata": { + "editable": true + }, + "source": [ + "## Setting up the Back propagation algorithm, part 3\n", + "\n", + "Finally, we update the weights and the biases using gradient descent\n", + "for each $l=L-1,L-2,\\dots,1$ and update the weights and biases\n", + "according to the rules" + ] + }, + { + "cell_type": "markdown", + "id": "f469f486", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "w_{jk}^l\\leftarrow = w_{jk}^l- \\eta \\delta_j^la_k^{l-1},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "7461e5e6", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "b_j^l \\leftarrow b_j^l-\\eta \\frac{\\partial {\\cal C}}{\\partial b_j^l}=b_j^l-\\eta \\delta_j^l,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "50a1b605", + "metadata": { + "editable": true + }, + "source": [ + "with $\\eta$ being the learning rate." 
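Putting the three steps together, here is a compact sketch of the full algorithm for a network with an arbitrary number of layers. It is not a polished implementation: it assumes sigmoid activations in every layer and the squared-error cost, stores one weight matrix and one bias vector per layer, trains on a single data point, and the helper names (`feed_forward`, `back_propagation`) are purely illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0/(1.0 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s*(1 - s)

def feed_forward(x, weights, biases):
    """Return the lists of z^l and a^l, with a[0] being the input layer."""
    a, zs = [x], []
    for W, b in zip(weights, biases):
        z = W @ a[-1] + b
        zs.append(z)
        a.append(sigmoid(z))
    return zs, a

def back_propagation(x, y, weights, biases):
    zs, a = feed_forward(x, weights, biases)
    L = len(weights)
    # output error: delta^L = sigma'(z^L) * dC/da^L, with C = 0.5*||a^L - y||^2
    delta = sigmoid_prime(zs[-1])*(a[-1] - y)
    grad_W, grad_b = [None]*L, [None]*L
    grad_W[-1] = np.outer(delta, a[-2])
    grad_b[-1] = delta
    # back propagate: delta^l = (W^{l+1})^T delta^{l+1} * sigma'(z^l)
    for l in range(L - 2, -1, -1):
        delta = (weights[l + 1].T @ delta)*sigmoid_prime(zs[l])
        grad_W[l] = np.outer(delta, a[l])   # dC/dw_{jk}^l = delta_j^l a_k^{l-1}
        grad_b[l] = delta
    return grad_W, grad_b

# usage example: 2 inputs, 3 hidden nodes, 1 output
rng = np.random.default_rng(0)
sizes = [2, 3, 1]
weights = [rng.normal(size=(sizes[l + 1], sizes[l])) for l in range(len(sizes) - 1)]
biases = [np.zeros(sizes[l + 1]) for l in range(len(sizes) - 1)]
x, y = np.array([0.5, -1.0]), np.array([0.2])

eta = 0.1
for iteration in range(100):
    # one iteration = one feed-forward pass plus one back-propagation step
    grad_W, grad_b = back_propagation(x, y, weights, biases)
    weights = [W - eta*gW for W, gW in zip(weights, grad_W)]
    biases = [b - eta*gb for b, gb in zip(biases, grad_b)]
```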
+ ] + }, + { + "cell_type": "markdown", + "id": "0cebce43", + "metadata": { + "editable": true + }, + "source": [ + "## Updating the gradients\n", + "\n", + "With the back propagate error for each $l=L-1,L-2,\\dots,1$ as" + ] + }, + { + "cell_type": "markdown", + "id": "2e4405bd", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_j^l = \\sum_k \\delta_k^{l+1}w_{kj}^{l+1}sigma'(z_j^l),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "2920aa4e", + "metadata": { + "editable": true + }, + "source": [ + "we update the weights and the biases using gradient descent for each $l=L-1,L-2,\\dots,1$ and update the weights and biases according to the rules" + ] + }, + { + "cell_type": "markdown", + "id": "bc4357b0", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "w_{jk}^l\\leftarrow = w_{jk}^l- \\eta \\delta_j^la_k^{l-1},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "d9b66569", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "b_j^l \\leftarrow b_j^l-\\eta \\frac{\\partial {\\cal C}}{\\partial b_j^l}=b_j^l-\\eta \\delta_j^l,\n", + "$$" + ] + } + ], + "metadata": {}, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/doc/pub/week42/ipynb/week42.ipynb b/doc/LectureNotes/_build/html/_sources/week42.ipynb similarity index 87% rename from doc/pub/week42/ipynb/week42.ipynb rename to doc/LectureNotes/_build/html/_sources/week42.ipynb index 0000be88d..45a126e79 100644 --- a/doc/pub/week42/ipynb/week42.ipynb +++ b/doc/LectureNotes/_build/html/_sources/week42.ipynb @@ -2,8 +2,10 @@ "cells": [ { "cell_type": "markdown", - "id": "0b7206c9", - "metadata": {}, + "id": "d231eeee", + "metadata": { + "editable": true + }, "source": [ "\n", @@ -12,32 +14,45 @@ }, { "cell_type": "markdown", - "id": "66a4424e", - "metadata": {}, + "id": "5e782cb1", + "metadata": { + "editable": true + }, "source": [ "# Week 42 Constructing a Neural Network code with examples\n", - "**Morten Hjorth-Jensen**, Department of Physics, University of Oslo and Department of Physics and Astronomy and Facility for Rare Isotope Beams, Michigan State University\n", + "**Morten Hjorth-Jensen**, Department of Physics, University of Oslo, Norway\n", "\n", - "Date: **October 14-18, 2024**" + "Date: **October 13-17, 2025**" ] }, { "cell_type": "markdown", - "id": "2d48e612", - "metadata": {}, + "id": "53309290", + "metadata": { + "editable": true + }, "source": [ - "## Lecture October 14, 2024\n", + "## Lecture October 13, 2025\n", "1. Building our own Feed-forward Neural Network and discussion of project 2\n", "\n", - "**Readings and videos.**\n", - "\n", + "2. Project 2 is available at " + ] + }, + { + "cell_type": "markdown", + "id": "71367514", + "metadata": { + "editable": true + }, + "source": [ + "## Readings and videos\n", "1. These lecture notes\n", "\n", - "2. [Video of lecture](https://youtu.be/7B2F35gNj2Y)\n", + "2. Video of lecture at \n", "\n", - "3. [Whiteboard notes](https://github.com/CompPhysics/MachineLearning/blob/master/doc/HandWrittenNotes/2024/NotesOct14.pdf)\n", + "3. Whiteboard notes at \n", "\n", - "4. For a more in depth discussion on neural networks we recommend Goodfellow et al chapters 6 and 7. \n", + "4. For a more in depth discussion on neural networks we recommend Goodfellow et al chapters 6 and 7. For the optimization part, see chapter 8. \n", "\n", "5. 
Neural Networks demystified at \n", "\n", @@ -52,25 +67,25 @@ }, { "cell_type": "markdown", - "id": "493dbcac", - "metadata": {}, + "id": "c7be87be", + "metadata": { + "editable": true + }, "source": [ - "## Material for the active learning sessions on Tuesday and Wednesday\n", - " * Exercise on starting to write a code for neural networks, feed forward part. We will also continue ur discussions of gradient descent methods from last week. If you have time, start considering the back-propagation part as well (exercises for next week)\n", + "## Material for the lab sessions on Tuesday and Wednesday\n", + "1. Exercises on writing a code for neural networks, back propagation part, see exercises for week 42 at \n", "\n", - " * Discussion of project 2\n", - "\n", - " \n", - "\n", - "**Note**: some of the codes will also be discussed next week in connection with the solution of differential equations." + "2. Discussion of project 2" ] }, { "cell_type": "markdown", - "id": "0a83a2c3", - "metadata": {}, + "id": "8e0567a2", + "metadata": { + "editable": true + }, "source": [ - "## Writing a code which implements a feed-forward neural network\n", + "## Lecture material: Writing a code which implements a feed-forward neural network\n", "\n", "Last week we discussed the basics of neural networks and deep learning\n", "and the basics of automatic differentiation. We looked also at\n", @@ -79,13 +94,15 @@ "\n", "We ended our discussions with the derivation of the equations for a\n", "neural network with one hidden layers and two input variables and two\n", - "hidden nodes but only one output node." + "hidden nodes but only one output node. We did almost finish the derivation of the back propagation algorithm." ] }, { "cell_type": "markdown", - "id": "93735ab4", - "metadata": {}, + "id": "549dcc05", + "metadata": { + "editable": true + }, "source": [ "## Mathematics of deep learning\n", "\n", @@ -98,8 +115,10 @@ }, { "cell_type": "markdown", - "id": "9f8617c7", - "metadata": {}, + "id": "21203bae", + "metadata": { + "editable": true + }, "source": [ "## Reminder on books with hands-on material and codes\n", "* [Sebastian Rashcka et al, Machine learning with Sickit-Learn and PyTorch](https://sebastianraschka.com/blog/2022/ml-pytorch-book.html)" @@ -107,8 +126,10 @@ }, { "cell_type": "markdown", - "id": "e35c3c4a", - "metadata": {}, + "id": "1c102a30", + "metadata": { + "editable": true + }, "source": [ "## Reading recommendations\n", "\n", @@ -119,10 +140,12 @@ }, { "cell_type": "markdown", - "id": "35d77455", - "metadata": {}, + "id": "53f11afe", + "metadata": { + "editable": true + }, "source": [ - "## First network example, simple percepetron with one input\n", + "## Reminder from last week: First network example, simple percepetron with one input\n", "\n", "As yet another example we define now a simple perceptron model with\n", "all quantities given by scalars. We consider only one input variable\n", @@ -132,8 +155,10 @@ }, { "cell_type": "markdown", - "id": "aed7f415", - "metadata": {}, + "id": "afa8c42a", + "metadata": { + "editable": true + }, "source": [ "$$\n", "z_1 = w_1x+b_1,\n", @@ -142,8 +167,10 @@ }, { "cell_type": "markdown", - "id": "012d3932", - "metadata": {}, + "id": "cb5c959f", + "metadata": { + "editable": true + }, "source": [ "where $w_1$ is the weight and $b_1$ is the bias. These are the\n", "parameters we want to optimize. 
The output is $a_1=\\sigma(z_1)$ (see\n", @@ -154,8 +181,10 @@ }, { "cell_type": "markdown", - "id": "6c916a40", - "metadata": {}, + "id": "0083ae15", + "metadata": { + "editable": true + }, "source": [ "$$\n", "C(x;w_1,b_1)=\\frac{1}{2}(a_1-y)^2.\n", @@ -164,8 +193,10 @@ }, { "cell_type": "markdown", - "id": "de97e0a8", - "metadata": {}, + "id": "f4931203", + "metadata": { + "editable": true + }, "source": [ "## Layout of a simple neural network with no hidden layer\n", "\n", @@ -178,8 +209,10 @@ }, { "cell_type": "markdown", - "id": "b2a74b7e", - "metadata": {}, + "id": "d3a3754d", + "metadata": { + "editable": true + }, "source": [ "## Optimizing the parameters\n", "\n", @@ -192,8 +225,10 @@ }, { "cell_type": "markdown", - "id": "a09160e9", - "metadata": {}, + "id": "bcd5dbab", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\frac{\\partial C}{\\partial w_1} \\hspace{0.1cm}\\mathrm{and}\\hspace{0.1cm}\\frac{\\partial C}{\\partial b_1}.\n", @@ -202,16 +237,20 @@ }, { "cell_type": "markdown", - "id": "6e00f28f", - "metadata": {}, + "id": "2cbc30f1", + "metadata": { + "editable": true + }, "source": [ "Using the chain rule we find" ] }, { "cell_type": "markdown", - "id": "91ca6f32", - "metadata": {}, + "id": "1a1d803d", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\frac{\\partial C}{\\partial w_1}=\\frac{\\partial C}{\\partial a_1}\\frac{\\partial a_1}{\\partial z_1}\\frac{\\partial z_1}{\\partial w_1}=(a_1-y)\\sigma_1'x,\n", @@ -220,16 +259,20 @@ }, { "cell_type": "markdown", - "id": "234f9dd4", - "metadata": {}, + "id": "776735c7", + "metadata": { + "editable": true + }, "source": [ "and" ] }, { "cell_type": "markdown", - "id": "0a5bcd5f", - "metadata": {}, + "id": "c1a2e5af", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\frac{\\partial C}{\\partial b_1}=\\frac{\\partial C}{\\partial a_1}\\frac{\\partial a_1}{\\partial z_1}\\frac{\\partial z_1}{\\partial b_1}=(a_1-y)\\sigma_1',\n", @@ -238,16 +281,20 @@ }, { "cell_type": "markdown", - "id": "b781fb94", - "metadata": {}, + "id": "9e603df9", + "metadata": { + "editable": true + }, "source": [ "which we later will just define as" ] }, { "cell_type": "markdown", - "id": "b3d748ee", - "metadata": {}, + "id": "533212cd", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\frac{\\partial C}{\\partial a_1}\\frac{\\partial a_1}{\\partial z_1}=\\delta_1.\n", @@ -256,8 +303,10 @@ }, { "cell_type": "markdown", - "id": "59e42ceb", - "metadata": {}, + "id": "09d91067", + "metadata": { + "editable": true + }, "source": [ "## Adding a hidden layer\n", "\n", @@ -270,8 +319,10 @@ }, { "cell_type": "markdown", - "id": "c2f312ae", - "metadata": {}, + "id": "f767afe7", + "metadata": { + "editable": true + }, "source": [ "$$\n", "z_1 = w_1a_0+b_1 \\hspace{0.1cm} \\wedge a_1 = \\sigma_1(z_1),\n", @@ -280,8 +331,10 @@ }, { "cell_type": "markdown", - "id": "1476ad2f", - "metadata": {}, + "id": "f38ded54", + "metadata": { + "editable": true + }, "source": [ "$$\n", "z_2 = w_2a_1+b_2 \\hspace{0.1cm} \\wedge a_2 = \\sigma_2(z_2),\n", @@ -290,16 +343,20 @@ }, { "cell_type": "markdown", - "id": "907e90de", - "metadata": {}, + "id": "f3f03bc3", + "metadata": { + "editable": true + }, "source": [ "and the cost function" ] }, { "cell_type": "markdown", - "id": "1d1157b0", - "metadata": {}, + "id": "9062730e", + "metadata": { + "editable": true + }, "source": [ "$$\n", "C(x;\\boldsymbol{\\Theta})=\\frac{1}{2}(a_2-y)^2,\n", @@ -308,16 +365,20 @@ }, { "cell_type": "markdown", - "id": "348ddd64", - "metadata": {}, + 
"id": "75bbc32c", + "metadata": { + "editable": true + }, "source": [ "with $\\boldsymbol{\\Theta}=[w_1,w_2,b_1,b_2]$." ] }, { "cell_type": "markdown", - "id": "3672dcce", - "metadata": {}, + "id": "fcf02dbf", + "metadata": { + "editable": true + }, "source": [ "## Layout of a simple neural network with one hidden layer\n", "\n", @@ -330,8 +391,10 @@ }, { "cell_type": "markdown", - "id": "785c3632", - "metadata": {}, + "id": "aa97678f", + "metadata": { + "editable": true + }, "source": [ "## The derivatives\n", "\n", @@ -340,8 +403,10 @@ }, { "cell_type": "markdown", - "id": "af633a03", - "metadata": {}, + "id": "98f68e27", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\frac{\\partial C}{\\partial w_2}=\\frac{\\partial C}{\\partial a_2}\\frac{\\partial a_2}{\\partial z_2}\\frac{\\partial z_2}{\\partial w_2}=(a_2-y)\\sigma_2'a_1=\\delta_2a_1,\n", @@ -350,8 +415,10 @@ }, { "cell_type": "markdown", - "id": "c0fa4b25", - "metadata": {}, + "id": "c4528178", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\frac{\\partial C}{\\partial b_2}=\\frac{\\partial C}{\\partial a_2}\\frac{\\partial a_2}{\\partial z_2}\\frac{\\partial z_2}{\\partial b_2}=(a_2-y)\\sigma_2'=\\delta_2,\n", @@ -360,8 +427,10 @@ }, { "cell_type": "markdown", - "id": "c6ed00a0", - "metadata": {}, + "id": "d6304298", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\frac{\\partial C}{\\partial w_1}=\\frac{\\partial C}{\\partial a_2}\\frac{\\partial a_2}{\\partial z_2}\\frac{\\partial z_2}{\\partial a_1}\\frac{\\partial a_1}{\\partial z_1}\\frac{\\partial z_1}{\\partial w_1}=(a_2-y)\\sigma_2'a_1\\sigma_1'a_0,\n", @@ -370,8 +439,10 @@ }, { "cell_type": "markdown", - "id": "a4ff465c", - "metadata": {}, + "id": "dfc47ba6", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\frac{\\partial C}{\\partial b_1}=\\frac{\\partial C}{\\partial a_2}\\frac{\\partial a_2}{\\partial z_2}\\frac{\\partial z_2}{\\partial a_1}\\frac{\\partial a_1}{\\partial z_1}\\frac{\\partial z_1}{\\partial b_1}=(a_2-y)\\sigma_2'\\sigma_1'=\\delta_1.\n", @@ -380,16 +451,20 @@ }, { "cell_type": "markdown", - "id": "fd71eaf0", - "metadata": {}, + "id": "8834c3dc", + "metadata": { + "editable": true + }, "source": [ "Can you generalize this to more than one hidden layer?" 
] }, { "cell_type": "markdown", - "id": "771d3788", - "metadata": {}, + "id": "40956770", + "metadata": { + "editable": true + }, "source": [ "## Important observations\n", "\n", @@ -402,8 +477,10 @@ }, { "cell_type": "markdown", - "id": "8f4f2e0d", - "metadata": {}, + "id": "69e7fdcf", + "metadata": { + "editable": true + }, "source": [ "## The training\n", "\n", @@ -412,8 +489,10 @@ }, { "cell_type": "markdown", - "id": "6f0e3d04", - "metadata": {}, + "id": "726d4c90", + "metadata": { + "editable": true + }, "source": [ "$$\n", "w_{i}\\leftarrow w_{i}- \\eta \\delta_i a_{i-1},\n", @@ -422,16 +501,20 @@ }, { "cell_type": "markdown", - "id": "dffb7c57", - "metadata": {}, + "id": "0ee83d1c", + "metadata": { + "editable": true + }, "source": [ "and" ] }, { "cell_type": "markdown", - "id": "82d6c20b", - "metadata": {}, + "id": "f5b3b5a5", + "metadata": { + "editable": true + }, "source": [ "$$\n", "b_i \\leftarrow b_i-\\eta \\delta_i,\n", @@ -440,8 +523,10 @@ }, { "cell_type": "markdown", - "id": "fb058f95", - "metadata": {}, + "id": "b2746792", + "metadata": { + "editable": true + }, "source": [ "with $\\eta$ is the learning rate.\n", "\n", @@ -452,8 +537,10 @@ }, { "cell_type": "markdown", - "id": "49ee11bd", - "metadata": {}, + "id": "76e2e41a", + "metadata": { + "editable": true + }, "source": [ "## Code example\n", "\n", @@ -468,8 +555,11 @@ { "cell_type": "code", "execution_count": 1, - "id": "96f8781a", - "metadata": {}, + "id": "1c4719c1", + "metadata": { + "collapsed": false, + "editable": true + }, "outputs": [], "source": [ "import numpy as np\n", @@ -538,16 +628,20 @@ }, { "cell_type": "markdown", - "id": "89957128", - "metadata": {}, + "id": "debaaadc", + "metadata": { + "editable": true + }, "source": [ "We see that after some few iterations (the results do depend on the learning rate however), we get an error which is rather small." 
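To see the learning-rate dependence mentioned above in isolation, here is a small self-contained toy example (deliberately simpler than the code above): the single-input perceptron $a_1=\sigma(w_1x+b_1)$ trained on one data point with plain gradient descent for a few values of $\eta$. The numbers are arbitrary and only meant to illustrate how strongly the final error depends on $\eta$.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# train a_1 = sigma(w_1 x + b_1) on one data point (x, y) for different eta
x, y = 1.0, 0.25
for eta in (0.01, 0.1, 1.0, 10.0):
    w1, b1 = 0.5, -0.5                      # same starting point for every eta
    for _ in range(100):
        a1 = sigmoid(w1 * x + b1)
        delta1 = (a1 - y) * a1 * (1 - a1)   # dC/dz1
        w1 -= eta * delta1 * x              # dC/dw1 = delta1 * x
        b1 -= eta * delta1                  # dC/db1 = delta1
    final_cost = 0.5 * (sigmoid(w1 * x + b1) - y) ** 2
    print(f"eta = {eta:5.2f}: final cost = {final_cost:.2e}")
```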
] }, { "cell_type": "markdown", - "id": "786ee004", - "metadata": {}, + "id": "7d576f19", + "metadata": { + "editable": true + }, "source": [ "## Simple neural network and the back propagation equations\n", "\n", @@ -561,8 +655,10 @@ }, { "cell_type": "markdown", - "id": "81c9e4d3", - "metadata": {}, + "id": "582b3b43", + "metadata": { + "editable": true + }, "source": [ "$$\n", "x_1 = a_1^{(0)} \\wedge x_2 = a_2^{(0)}.\n", @@ -571,16 +667,20 @@ }, { "cell_type": "markdown", - "id": "521d11f5", - "metadata": {}, + "id": "c8eace47", + "metadata": { + "editable": true + }, "source": [ "The hidden layer (layer $(1)$) has nodes which yield the outputs $a_1^{(1)}$ and $a_2^{(1)}$) with weight $\\boldsymbol{w}$ and bias $\\boldsymbol{b}$ parameters" ] }, { "cell_type": "markdown", - "id": "973b290c", - "metadata": {}, + "id": "81ec9945", + "metadata": { + "editable": true + }, "source": [ "$$\n", "w_{ij}^{(1)}=\\left\\{w_{11}^{(1)},w_{12}^{(1)},w_{21}^{(1)},w_{22}^{(1)}\\right\\} \\wedge b^{(1)}=\\left\\{b_1^{(1)},b_2^{(1)}\\right\\}.\n", @@ -589,8 +689,10 @@ }, { "cell_type": "markdown", - "id": "32390f24", - "metadata": {}, + "id": "c35e1f69", + "metadata": { + "editable": true + }, "source": [ "## Layout of a simple neural network with two input nodes, one hidden layer with two hidden noeds and one output node\n", "\n", @@ -603,8 +705,10 @@ }, { "cell_type": "markdown", - "id": "945def24", - "metadata": {}, + "id": "05b8eea9", + "metadata": { + "editable": true + }, "source": [ "## The ouput layer\n", "\n", @@ -613,8 +717,10 @@ }, { "cell_type": "markdown", - "id": "7f0f65a4", - "metadata": {}, + "id": "7ef9cb55", + "metadata": { + "editable": true + }, "source": [ "$$\n", "w_{i}^{(2)}=\\left\\{w_{1}^{(2)},w_{2}^{(2)}\\right\\} \\wedge b^{(2)}.\n", @@ -623,8 +729,10 @@ }, { "cell_type": "markdown", - "id": "3851aa3b", - "metadata": {}, + "id": "1eb5c5ac", + "metadata": { + "editable": true + }, "source": [ "Our output is $\\tilde{y}=a^{(2)}$ and we define a generic cost function $C(a^{(2)},y;\\boldsymbol{\\Theta})$ where $y$ is the target value (a scalar here).\n", "The parameters we need to optimize are given by" @@ -632,8 +740,10 @@ }, { "cell_type": "markdown", - "id": "56cf96e2", - "metadata": {}, + "id": "00492358", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\boldsymbol{\\Theta}=\\left\\{w_{11}^{(1)},w_{12}^{(1)},w_{21}^{(1)},w_{22}^{(1)},w_{1}^{(2)},w_{2}^{(2)},b_1^{(1)},b_2^{(1)},b^{(2)}\\right\\}.\n", @@ -642,8 +752,10 @@ }, { "cell_type": "markdown", - "id": "a17c14e8", - "metadata": {}, + "id": "45cca5aa", + "metadata": { + "editable": true + }, "source": [ "## Compact expressions\n", "\n", @@ -653,8 +765,10 @@ }, { "cell_type": "markdown", - "id": "e62b0591", - "metadata": {}, + "id": "22cfb40b", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\begin{bmatrix}z_1^{(1)} \\\\ z_2^{(1)} \\end{bmatrix}=\\left(\\begin{bmatrix}w_{11}^{(1)} & w_{12}^{(1)}\\\\ w_{21}^{(1)} &w_{22}^{(1)} \\end{bmatrix}\\right)^{T}\\begin{bmatrix}a_1^{(0)} \\\\ a_2^{(0)} \\end{bmatrix}+\\begin{bmatrix}b_1^{(1)} \\\\ b_2^{(1)} \\end{bmatrix},\n", @@ -663,16 +777,20 @@ }, { "cell_type": "markdown", - "id": "081802b0", - "metadata": {}, + "id": "45b30d06", + "metadata": { + "editable": true + }, "source": [ "with outputs" ] }, { "cell_type": "markdown", - "id": "5d153d02", - "metadata": {}, + "id": "ebd6a7a5", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\begin{bmatrix}a_1^{(1)} \\\\ a_2^{(1)} \\end{bmatrix}=\\begin{bmatrix}\\sigma^{(1)}(z_1^{(1)}) \\\\ 
\\sigma^{(1)}(z_2^{(1)}) \\end{bmatrix}.\n", @@ -681,8 +799,10 @@ }, { "cell_type": "markdown", - "id": "cd1f6429", - "metadata": {}, + "id": "659dd686", + "metadata": { + "editable": true + }, "source": [ "## Output layer\n", "\n", @@ -691,8 +811,10 @@ }, { "cell_type": "markdown", - "id": "d9f4dbc5", - "metadata": {}, + "id": "34a1d4ca", + "metadata": { + "editable": true + }, "source": [ "$$\n", "z^{(2)} = w_{1}^{(2)}a_1^{(1)} +w_{2}^{(2)}a_2^{(1)}+b^{(2)},\n", @@ -701,16 +823,20 @@ }, { "cell_type": "markdown", - "id": "1de28add", - "metadata": {}, + "id": "34471712", + "metadata": { + "editable": true + }, "source": [ "resulting in the output" ] }, { "cell_type": "markdown", - "id": "59b0576f", - "metadata": {}, + "id": "0b3a74fd", + "metadata": { + "editable": true + }, "source": [ "$$\n", "a^{(2)}=\\sigma^{(2)}(z^{(2)}).\n", @@ -719,8 +845,10 @@ }, { "cell_type": "markdown", - "id": "7d33341b", - "metadata": {}, + "id": "1a5bdab3", + "metadata": { + "editable": true + }, "source": [ "## Explicit derivatives\n", "\n", @@ -733,8 +861,10 @@ }, { "cell_type": "markdown", - "id": "428a98ec", - "metadata": {}, + "id": "37f19e78", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\frac{\\partial C}{\\partial w_{i}^{(2)}}=\\frac{\\partial C}{\\partial a^{(2)}}\\frac{\\partial a^{(2)}}{\\partial z^{(2)}}\\frac{\\partial z^{(2)}}{\\partial w_{i}^{(2)}}=\\delta^{(2)}a_i^{(1)},\n", @@ -743,16 +873,20 @@ }, { "cell_type": "markdown", - "id": "77447fe6", - "metadata": {}, + "id": "5505aab8", + "metadata": { + "editable": true + }, "source": [ "with" ] }, { "cell_type": "markdown", - "id": "63aef148", - "metadata": {}, + "id": "d55d045c", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\delta^{(2)}=\\frac{\\partial C}{\\partial a^{(2)}}\\frac{\\partial a^{(2)}}{\\partial z^{(2)}}\n", @@ -761,16 +895,20 @@ }, { "cell_type": "markdown", - "id": "730b31af", - "metadata": {}, + "id": "04f101e7", + "metadata": { + "editable": true + }, "source": [ "and finally" ] }, { "cell_type": "markdown", - "id": "d590fdc8", - "metadata": {}, + "id": "bfab2e91", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\frac{\\partial C}{\\partial b^{(2)}}=\\frac{\\partial C}{\\partial a^{(2)}}\\frac{\\partial a^{(2)}}{\\partial z^{(2)}}\\frac{\\partial z^{(2)}}{\\partial b^{(2)}}=\\delta^{(2)}.\n", @@ -779,8 +917,10 @@ }, { "cell_type": "markdown", - "id": "0923fb8e", - "metadata": {}, + "id": "77f35b7e", + "metadata": { + "editable": true + }, "source": [ "## Derivatives of the hidden layer\n", "\n", @@ -789,8 +929,10 @@ }, { "cell_type": "markdown", - "id": "74c764da", - "metadata": {}, + "id": "8cf4a606", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\frac{\\partial C}{\\partial w_{11}^{(1)}}=\\frac{\\partial C}{\\partial a^{(2)}}\\frac{\\partial a^{(2)}}{\\partial z^{(2)}}\n", @@ -800,16 +942,20 @@ }, { "cell_type": "markdown", - "id": "b384b7ef", - "metadata": {}, + "id": "86951351", + "metadata": { + "editable": true + }, "source": [ "which, noting that" ] }, { "cell_type": "markdown", - "id": "f74a8bc9", - "metadata": {}, + "id": "73414e65", + "metadata": { + "editable": true + }, "source": [ "$$\n", "z^{(2)} =w_1^{(2)}a_1^{(1)}+w_2^{(2)}a_2^{(1)}+b^{(2)},\n", @@ -818,16 +964,20 @@ }, { "cell_type": "markdown", - "id": "913ae0bd", - "metadata": {}, + "id": "8f0aaa15", + "metadata": { + "editable": true + }, "source": [ "allows us to rewrite" ] }, { "cell_type": "markdown", - "id": "895cb126", - "metadata": {}, + "id": "730c5415", + "metadata": { + "editable": 
true + }, "source": [ "$$\n", "\\frac{\\partial z^{(2)}}{\\partial z_1^{(1)}}\\frac{\\partial z_1^{(1)}}{\\partial w_{11}^{(1)}}=w_1^{(2)}\\frac{\\partial a_1^{(1)}}{\\partial z_1^{(1)}}a_1^{(1)}.\n", @@ -836,8 +986,10 @@ }, { "cell_type": "markdown", - "id": "d279d84b", - "metadata": {}, + "id": "1afcb5a1", + "metadata": { + "editable": true + }, "source": [ "## Final expression\n", "Defining" @@ -845,8 +997,10 @@ }, { "cell_type": "markdown", - "id": "e646a164", - "metadata": {}, + "id": "7f30cb44", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\delta_1^{(1)}=w_1^{(2)}\\frac{\\partial a_1^{(1)}}{\\partial z_1^{(1)}}\\delta^{(2)},\n", @@ -855,16 +1009,20 @@ }, { "cell_type": "markdown", - "id": "f46a9699", - "metadata": {}, + "id": "14c045ce", + "metadata": { + "editable": true + }, "source": [ "we have" ] }, { "cell_type": "markdown", - "id": "72a5bd14", - "metadata": {}, + "id": "0c1a2c68", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\frac{\\partial C}{\\partial w_{11}^{(1)}}=\\delta_1^{(1)}a_1^{(1)}.\n", @@ -873,16 +1031,20 @@ }, { "cell_type": "markdown", - "id": "b7389977", - "metadata": {}, + "id": "a3385222", + "metadata": { + "editable": true + }, "source": [ "Similarly, we obtain" ] }, { "cell_type": "markdown", - "id": "ecbc82bb", - "metadata": {}, + "id": "18ee3804", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\frac{\\partial C}{\\partial w_{12}^{(1)}}=\\delta_1^{(1)}a_2^{(1)}.\n", @@ -891,8 +1053,10 @@ }, { "cell_type": "markdown", - "id": "4749523a", - "metadata": {}, + "id": "ad741d56", + "metadata": { + "editable": true + }, "source": [ "## Completing the list\n", "\n", @@ -901,8 +1065,10 @@ }, { "cell_type": "markdown", - "id": "93ea8c62", - "metadata": {}, + "id": "65870a70", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\frac{\\partial C}{\\partial w_{21}^{(1)}}=\\delta_2^{(1)}a_1^{(1)},\n", @@ -911,16 +1077,20 @@ }, { "cell_type": "markdown", - "id": "aeea971d", - "metadata": {}, + "id": "f7807fdc", + "metadata": { + "editable": true + }, "source": [ "and" ] }, { "cell_type": "markdown", - "id": "bbaf4d20", - "metadata": {}, + "id": "9af4a759", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\frac{\\partial C}{\\partial w_{22}^{(1)}}=\\delta_2^{(1)}a_2^{(1)},\n", @@ -929,16 +1099,20 @@ }, { "cell_type": "markdown", - "id": "4469f285", - "metadata": {}, + "id": "dc548cb7", + "metadata": { + "editable": true + }, "source": [ "where we have defined" ] }, { "cell_type": "markdown", - "id": "9735e5df", - "metadata": {}, + "id": "83b75e94", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\delta_2^{(1)}=w_2^{(2)}\\frac{\\partial a_2^{(1)}}{\\partial z_2^{(1)}}\\delta^{(2)}.\n", @@ -947,8 +1121,10 @@ }, { "cell_type": "markdown", - "id": "54d79226", - "metadata": {}, + "id": "1c2be559", + "metadata": { + "editable": true + }, "source": [ "## Final expressions for the biases of the hidden layer\n", "\n", @@ -957,8 +1133,10 @@ }, { "cell_type": "markdown", - "id": "4365ba97", - "metadata": {}, + "id": "18b85f86", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\frac{\\partial C}{\\partial b_{1}^{(1)}}=\\delta_1^{(1)},\n", @@ -967,16 +1145,20 @@ }, { "cell_type": "markdown", - "id": "bcd2d94c", - "metadata": {}, + "id": "63e39eb4", + "metadata": { + "editable": true + }, "source": [ "and" ] }, { "cell_type": "markdown", - "id": "5584b15e", - "metadata": {}, + "id": "a55371c1", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\frac{\\partial C}{\\partial 
b_{2}^{(1)}}=\\delta_2^{(1)}.\n", @@ -985,16 +1167,20 @@ }, { "cell_type": "markdown", - "id": "f0c53017", - "metadata": {}, + "id": "fa31a9b3", + "metadata": { + "editable": true + }, "source": [ "As we will see below, these expressions can be generalized in a more compact form." ] }, { "cell_type": "markdown", - "id": "2ebbdf34", - "metadata": {}, + "id": "580df891", + "metadata": { + "editable": true + }, "source": [ "## Gradient expressions\n", "\n", @@ -1004,8 +1190,10 @@ }, { "cell_type": "markdown", - "id": "b94ec668", - "metadata": {}, + "id": "c10bf2ce", + "metadata": { + "editable": true + }, "source": [ "$$\n", "w_{i}^{(2)}\\leftarrow w_{i}^{(2)}- \\eta \\delta^{(2)} a_{i}^{(1)},\n", @@ -1014,16 +1202,20 @@ }, { "cell_type": "markdown", - "id": "8bcb00fc", - "metadata": {}, + "id": "0bae11f8", + "metadata": { + "editable": true + }, "source": [ "and" ] }, { "cell_type": "markdown", - "id": "f8725166", - "metadata": {}, + "id": "ed4a8b93", + "metadata": { + "editable": true + }, "source": [ "$$\n", "b^{(2)} \\leftarrow b^{(2)}-\\eta \\delta^{(2)},\n", @@ -1032,16 +1224,20 @@ }, { "cell_type": "markdown", - "id": "bcb99786", - "metadata": {}, + "id": "2d582987", + "metadata": { + "editable": true + }, "source": [ "and" ] }, { "cell_type": "markdown", - "id": "975fb151", - "metadata": {}, + "id": "5fa760a1", + "metadata": { + "editable": true + }, "source": [ "$$\n", "w_{ij}^{(1)}\\leftarrow w_{ij}^{(1)}- \\eta \\delta_{i}^{(1)} a_{j}^{(0)},\n", @@ -1050,16 +1246,20 @@ }, { "cell_type": "markdown", - "id": "e17fa81d", - "metadata": {}, + "id": "bc9de8bf", + "metadata": { + "editable": true + }, "source": [ "and" ] }, { "cell_type": "markdown", - "id": "12912a16", - "metadata": {}, + "id": "f00e3ace", + "metadata": { + "editable": true + }, "source": [ "$$\n", "b_{i}^{(1)} \\leftarrow b_{i}^{(1)}-\\eta \\delta_{i}^{(1)},\n", @@ -1068,16 +1268,20 @@ }, { "cell_type": "markdown", - "id": "0c14e44b", - "metadata": {}, + "id": "7ac96362", + "metadata": { + "editable": true + }, "source": [ "where $\\eta$ is the learning rate." 
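To connect these update rules to code, the sketch below trains the $2$-$2$-$1$ network on a single input with plain gradient descent, using exactly the four rules above. We assume sigmoid activations in both layers and the cost $C=\frac{1}{2}(a^{(2)}-y)^2$ (so that $\partial C/\partial a^{(2)}=a^{(2)}-y$), together with the convention $z_i^{(1)}=\sum_j w_{ij}^{(1)}a_j^{(0)}+b_i^{(1)}$; all numerical values are arbitrary.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
W1 = rng.normal(size=(2, 2))    # w_ij^(1), row i = hidden node, column j = input node
b1 = np.zeros(2)                # b_i^(1)
w2 = rng.normal(size=2)         # w_i^(2)
b2 = 0.0                        # b^(2)
a0 = np.array([0.3, 0.9])       # inputs a^(0)
y, eta = 0.5, 0.1

for _ in range(500):
    # feed forward
    z1 = W1 @ a0 + b1;  a1 = sigmoid(z1)
    z2 = w2 @ a1 + b2;  a2 = sigmoid(z2)
    # errors
    delta2 = (a2 - y) * a2 * (1 - a2)        # delta^(2) = dC/da^(2) da^(2)/dz^(2)
    delta1 = w2 * a1 * (1 - a1) * delta2     # delta_i^(1) = w_i^(2) sigma'(z_i^(1)) delta^(2)
    # the four update rules
    w2 -= eta * delta2 * a1                  # w_i^(2)  <- w_i^(2)  - eta delta^(2) a_i^(1)
    b2 -= eta * delta2                       # b^(2)    <- b^(2)    - eta delta^(2)
    W1 -= eta * np.outer(delta1, a0)         # w_ij^(1) <- w_ij^(1) - eta delta_i^(1) a_j^(0)
    b1 -= eta * delta1                       # b_i^(1)  <- b_i^(1)  - eta delta_i^(1)

output = sigmoid(w2 @ sigmoid(W1 @ a0 + b1) + b2)
print(f"network output after training: {output:.4f}  (target y = {y})")
```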
] }, { "cell_type": "markdown", - "id": "6e854ca6", - "metadata": {}, + "id": "9c46f966", + "metadata": { + "editable": true + }, "source": [ "## Setting up the equations for a neural network\n", "\n", @@ -1091,8 +1295,10 @@ }, { "cell_type": "markdown", - "id": "d10945a7", - "metadata": {}, + "id": "ea509b11", + "metadata": { + "editable": true + }, "source": [ "$$\n", "{\\cal C}(\\boldsymbol{\\Theta}) = \\frac{1}{2}\\sum_{i=1}^n\\left(y_i - \\tilde{y}_i\\right)^2,\n", @@ -1101,8 +1307,10 @@ }, { "cell_type": "markdown", - "id": "44212558", - "metadata": {}, + "id": "e08ff771", + "metadata": { + "editable": true + }, "source": [ "where the $y_i$s are our $n$ targets (the values we want to\n", "reproduce), while the outputs of the network after having propagated\n", @@ -1111,10 +1319,12 @@ }, { "cell_type": "markdown", - "id": "3fd80944", - "metadata": {}, + "id": "6f476983", + "metadata": { + "editable": true + }, "source": [ - "## Layout of a neural network with three hidden layers (last later = $l=L=4$, first layer $l=0$)\n", + "## Layout of a neural network with three hidden layers (last layer = $l=L=4$, first layer $l=0$)\n", "\n", "\n", "\n", @@ -1125,8 +1335,10 @@ }, { "cell_type": "markdown", - "id": "42598707", - "metadata": {}, + "id": "0535d087", + "metadata": { + "editable": true + }, "source": [ "## Definitions\n", "\n", @@ -1140,8 +1352,10 @@ }, { "cell_type": "markdown", - "id": "21638e6e", - "metadata": {}, + "id": "5e024ec1", + "metadata": { + "editable": true + }, "source": [ "$$\n", "z_j^l = \\sum_{i=1}^{M_{l-1}}w_{ij}^la_i^{l-1}+b_j^l,\n", @@ -1150,8 +1364,10 @@ }, { "cell_type": "markdown", - "id": "2843d78a", - "metadata": {}, + "id": "239fb4c6", + "metadata": { + "editable": true + }, "source": [ "where $b_k^l$ are the biases from layer $l$. Here $M_{l-1}$\n", "represents the total number of nodes/neurons/units of layer $l-1$. 
The\n", @@ -1161,8 +1377,10 @@ }, { "cell_type": "markdown", - "id": "152a46dd", - "metadata": {}, + "id": "7e4fa6c5", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\boldsymbol{z}^l = \\left(\\boldsymbol{W}^l\\right)^T\\boldsymbol{a}^{l-1}+\\boldsymbol{b}^l.\n", @@ -1171,8 +1389,10 @@ }, { "cell_type": "markdown", - "id": "f888a137", - "metadata": {}, + "id": "c47cc3c6", + "metadata": { + "editable": true + }, "source": [ "## Inputs to the activation function\n", "\n", @@ -1185,8 +1405,10 @@ }, { "cell_type": "markdown", - "id": "adc3a5e4", - "metadata": {}, + "id": "4eb89f11", + "metadata": { + "editable": true + }, "source": [ "$$\n", "a_j^l = \\sigma(z_j^l) = \\frac{1}{1+\\exp{-(z_j^l)}}.\n", @@ -1195,8 +1417,10 @@ }, { "cell_type": "markdown", - "id": "8c598490", - "metadata": {}, + "id": "92744a90", + "metadata": { + "editable": true + }, "source": [ "## Layout of input to first hidden layer $l=1$ from input layer $l=0$\n", "\n", @@ -1209,8 +1433,10 @@ }, { "cell_type": "markdown", - "id": "0ae86831", - "metadata": {}, + "id": "35424d45", + "metadata": { + "editable": true + }, "source": [ "## Derivatives and the chain rule\n", "\n", @@ -1219,8 +1445,10 @@ }, { "cell_type": "markdown", - "id": "077d65f3", - "metadata": {}, + "id": "b8502930", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\frac{\\partial z_j^l}{\\partial w_{ij}^l} = a_i^{l-1},\n", @@ -1229,16 +1457,20 @@ }, { "cell_type": "markdown", - "id": "816b7643", - "metadata": {}, + "id": "81ad45a5", + "metadata": { + "editable": true + }, "source": [ "and" ] }, { "cell_type": "markdown", - "id": "c748d997", - "metadata": {}, + "id": "11bb8afb", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\frac{\\partial z_j^l}{\\partial a_i^{l-1}} = w_{ji}^l.\n", @@ -1247,16 +1479,20 @@ }, { "cell_type": "markdown", - "id": "6186a477", - "metadata": {}, + "id": "b53ec752", + "metadata": { + "editable": true + }, "source": [ "With our definition of the activation function we have that (note that this function depends only on $z_j^l$)" ] }, { "cell_type": "markdown", - "id": "5159e465", - "metadata": {}, + "id": "b7519a84", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\frac{\\partial a_j^l}{\\partial z_j^{l}} = a_j^l(1-a_j^l)=\\sigma(z_j^l)(1-\\sigma(z_j^l)).\n", @@ -1265,8 +1501,10 @@ }, { "cell_type": "markdown", - "id": "1717c046", - "metadata": {}, + "id": "c57689db", + "metadata": { + "editable": true + }, "source": [ "## Derivative of the cost function\n", "\n", @@ -1277,8 +1515,10 @@ }, { "cell_type": "markdown", - "id": "43b02473", - "metadata": {}, + "id": "a9f83b15", + "metadata": { + "editable": true + }, "source": [ "$$\n", "{\\cal C}(\\boldsymbol{\\Theta}^L) = \\frac{1}{2}\\sum_{i=1}^n\\left(y_i - \\tilde{y}_i\\right)^2=\\frac{1}{2}\\sum_{i=1}^n\\left(a_i^L - y_i\\right)^2,\n", @@ -1287,16 +1527,20 @@ }, { "cell_type": "markdown", - "id": "5034b9a1", - "metadata": {}, + "id": "067c2583", + "metadata": { + "editable": true + }, "source": [ "The derivative of this function with respect to the weights is" ] }, { "cell_type": "markdown", - "id": "cd13d020", - "metadata": {}, + "id": "43545710", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\frac{\\partial{\\cal C}(\\boldsymbol{\\Theta}^L)}{\\partial w_{ij}^L} = \\left(a_j^L - y_j\\right)\\frac{\\partial a_j^L}{\\partial w_{ij}^{L}},\n", @@ -1305,16 +1549,20 @@ }, { "cell_type": "markdown", - "id": "592375f7", - "metadata": {}, + "id": "1eb33717", + "metadata": { + "editable": true + }, "source": [ "The 
last partial derivative can easily be computed and reads (by applying the chain rule)" ] }, { "cell_type": "markdown", - "id": "2bbcf893", - "metadata": {}, + "id": "e09a8734", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\frac{\\partial a_j^L}{\\partial w_{ij}^{L}} = \\frac{\\partial a_j^L}{\\partial z_{j}^{L}}\\frac{\\partial z_j^L}{\\partial w_{ij}^{L}}=a_j^L(1-a_j^L)a_i^{L-1}.\n", @@ -1323,8 +1571,10 @@ }, { "cell_type": "markdown", - "id": "58fc5cdc", - "metadata": {}, + "id": "3dc0f5a3", + "metadata": { + "editable": true + }, "source": [ "## The back propagation equations for a neural network\n", "\n", @@ -1333,8 +1583,10 @@ }, { "cell_type": "markdown", - "id": "9ec4e6ef", - "metadata": {}, + "id": "bb58784b", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\frac{\\partial{\\cal C}((\\boldsymbol{\\Theta}^L)}{\\partial w_{ij}^L} = \\left(a_j^L - y_j\\right)a_j^L(1-a_j^L)a_i^{L-1},\n", @@ -1343,16 +1595,20 @@ }, { "cell_type": "markdown", - "id": "fcdfd63d", - "metadata": {}, + "id": "10aea094", + "metadata": { + "editable": true + }, "source": [ "Defining" ] }, { "cell_type": "markdown", - "id": "5199bd46", - "metadata": {}, + "id": "b7cc2db8", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\delta_j^L = a_j^L(1-a_j^L)\\left(a_j^L - y_j\\right) = \\sigma'(z_j^L)\\frac{\\partial {\\cal C}}{\\partial (a_j^L)},\n", @@ -1361,16 +1617,20 @@ }, { "cell_type": "markdown", - "id": "b255a6df", - "metadata": {}, + "id": "6cce9a62", + "metadata": { + "editable": true + }, "source": [ "and using the Hadamard product of two vectors we can write this as" ] }, { "cell_type": "markdown", - "id": "a6617bc8", - "metadata": {}, + "id": "43e5a84b", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\boldsymbol{\\delta}^L = \\sigma'(\\boldsymbol{z}^L)\\circ\\frac{\\partial {\\cal C}}{\\partial (\\boldsymbol{a}^L)}.\n", @@ -1379,8 +1639,10 @@ }, { "cell_type": "markdown", - "id": "1ae198f0", - "metadata": {}, + "id": "d5c607a7", + "metadata": { + "editable": true + }, "source": [ "## Analyzing the last results\n", "\n", @@ -1395,8 +1657,10 @@ }, { "cell_type": "markdown", - "id": "f0333d5b", - "metadata": {}, + "id": "a51b3b58", + "metadata": { + "editable": true + }, "source": [ "## More considerations\n", "\n", @@ -1411,8 +1675,10 @@ }, { "cell_type": "markdown", - "id": "05f19c67", - "metadata": {}, + "id": "4cd9d058", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\frac{\\partial {\\cal C}}{\\partial (a_j^L)}\n", @@ -1421,16 +1687,20 @@ }, { "cell_type": "markdown", - "id": "f35623e3", - "metadata": {}, + "id": "c80b630d", + "metadata": { + "editable": true + }, "source": [ "With the definition of $\\delta_j^L$ we have a more compact definition of the derivative of the cost function in terms of the weights, namely" ] }, { "cell_type": "markdown", - "id": "422aabb5", - "metadata": {}, + "id": "dc0c1a06", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\frac{\\partial{\\cal C}}{\\partial w_{ij}^L} = \\delta_j^La_i^{L-1}.\n", @@ -1439,8 +1709,10 @@ }, { "cell_type": "markdown", - "id": "5f2f2143", - "metadata": {}, + "id": "8f2065b7", + "metadata": { + "editable": true + }, "source": [ "## Derivatives in terms of $z_j^L$\n", "\n", @@ -1449,8 +1721,10 @@ }, { "cell_type": "markdown", - "id": "e0ec6446", - "metadata": {}, + "id": "7f89b9d8", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\delta_j^L =\\frac{\\partial {\\cal C}}{\\partial z_j^L}= \\frac{\\partial {\\cal C}}{\\partial a_j^L}\\frac{\\partial 
a_j^L}{\\partial z_j^L},\n", @@ -1459,16 +1733,20 @@ }, { "cell_type": "markdown", - "id": "ab3e824e", - "metadata": {}, + "id": "49c2cd3f", + "metadata": { + "editable": true + }, "source": [ "which can also be interpreted as the partial derivative of the cost function with respect to the biases $b_j^L$, namely" ] }, { "cell_type": "markdown", - "id": "b63e0260", - "metadata": {}, + "id": "517b1a37", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\delta_j^L = \\frac{\\partial {\\cal C}}{\\partial b_j^L}\\frac{\\partial b_j^L}{\\partial z_j^L}=\\frac{\\partial {\\cal C}}{\\partial b_j^L},\n", @@ -1477,16 +1755,20 @@ }, { "cell_type": "markdown", - "id": "8f85d588", - "metadata": {}, + "id": "65c8107f", + "metadata": { + "editable": true + }, "source": [ "That is, the error $\\delta_j^L$ is exactly equal to the rate of change of the cost function as a function of the bias." ] }, { "cell_type": "markdown", - "id": "c53b387b", - "metadata": {}, + "id": "2a10f902", + "metadata": { + "editable": true + }, "source": [ "## Bringing it together\n", "\n", @@ -1495,8 +1777,10 @@ }, { "cell_type": "markdown", - "id": "8bcd2836", - "metadata": {}, + "id": "b2ebf9c2", + "metadata": { + "editable": true + }, "source": [ "\n", "
\n", @@ -1511,16 +1795,20 @@ }, { "cell_type": "markdown", - "id": "e37eab2a", - "metadata": {}, + "id": "90336322", + "metadata": { + "editable": true + }, "source": [ "and" ] }, { "cell_type": "markdown", - "id": "e7371843", - "metadata": {}, + "id": "f25ff166", + "metadata": { + "editable": true + }, "source": [ "\n", "
\n", @@ -1535,16 +1823,20 @@ }, { "cell_type": "markdown", - "id": "c6b5e4ee", - "metadata": {}, + "id": "4cf11d5e", + "metadata": { + "editable": true + }, "source": [ "and" ] }, { "cell_type": "markdown", - "id": "d49e9c2c", - "metadata": {}, + "id": "2670748d", + "metadata": { + "editable": true + }, "source": [ "\n", "
\n", @@ -1559,8 +1851,10 @@ }, { "cell_type": "markdown", - "id": "bc813237", - "metadata": {}, + "id": "18c29f71", + "metadata": { + "editable": true + }, "source": [ "## Final back propagating equation\n", "\n", @@ -1569,8 +1863,10 @@ }, { "cell_type": "markdown", - "id": "0adc23ee", - "metadata": {}, + "id": "c593470c", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\delta_j^l =\\frac{\\partial {\\cal C}}{\\partial z_j^l}.\n", @@ -1579,16 +1875,20 @@ }, { "cell_type": "markdown", - "id": "9f18ba82", - "metadata": {}, + "id": "28e8caef", + "metadata": { + "editable": true + }, "source": [ "We want to express this in terms of the equations for layer $l+1$." ] }, { "cell_type": "markdown", - "id": "b9c3658c", - "metadata": {}, + "id": "516de9d7", + "metadata": { + "editable": true + }, "source": [ "## Using the chain rule and summing over all $k$ entries\n", "\n", @@ -1597,8 +1897,10 @@ }, { "cell_type": "markdown", - "id": "3792e41e", - "metadata": {}, + "id": "004c0bf4", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\delta_j^l =\\sum_k \\frac{\\partial {\\cal C}}{\\partial z_k^{l+1}}\\frac{\\partial z_k^{l+1}}{\\partial z_j^{l}}=\\sum_k \\delta_k^{l+1}\\frac{\\partial z_k^{l+1}}{\\partial z_j^{l}},\n", @@ -1607,16 +1909,20 @@ }, { "cell_type": "markdown", - "id": "d052d0c1", - "metadata": {}, + "id": "d62a3b1f", + "metadata": { + "editable": true + }, "source": [ "and recalling that" ] }, { "cell_type": "markdown", - "id": "cb86b10e", - "metadata": {}, + "id": "e9af770e", + "metadata": { + "editable": true + }, "source": [ "$$\n", "z_j^{l+1} = \\sum_{i=1}^{M_{l}}w_{ij}^{l+1}a_i^{l}+b_j^{l+1},\n", @@ -1625,16 +1931,20 @@ }, { "cell_type": "markdown", - "id": "4053ba69", - "metadata": {}, + "id": "eca56f17", + "metadata": { + "editable": true + }, "source": [ "with $M_l$ being the number of nodes in layer $l$, we obtain" ] }, { "cell_type": "markdown", - "id": "444986c0", - "metadata": {}, + "id": "bb0e4414", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\delta_j^l =\\sum_k \\delta_k^{l+1}w_{kj}^{l+1}\\sigma'(z_j^l),\n", @@ -1643,8 +1953,10 @@ }, { "cell_type": "markdown", - "id": "19dc83fd", - "metadata": {}, + "id": "a4b190fc", + "metadata": { + "editable": true + }, "source": [ "This is our final equation.\n", "\n", @@ -1653,8 +1965,10 @@ }, { "cell_type": "markdown", - "id": "a10386fd", - "metadata": {}, + "id": "ec0f87c0", + "metadata": { + "editable": true + }, "source": [ "## Setting up the back propagation algorithm and algorithm for a feed forward NN, initalizations\n", "\n", @@ -1677,8 +1991,10 @@ }, { "cell_type": "markdown", - "id": "08b046a2", - "metadata": {}, + "id": "2fb45155", + "metadata": { + "editable": true + }, "source": [ "## Setting up the back propagation algorithm, part 1\n", "\n", @@ -1698,8 +2014,10 @@ }, { "cell_type": "markdown", - "id": "abd09f94", - "metadata": {}, + "id": "3d5c2a0e", + "metadata": { + "editable": true + }, "source": [ "## Setting up the back propagation algorithm, part 2\n", "\n", @@ -1708,8 +2026,10 @@ }, { "cell_type": "markdown", - "id": "d2b58a05", - "metadata": {}, + "id": "9183bbd0", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\delta_j^L = \\sigma'(z_j^L)\\frac{\\partial {\\cal C}}{\\partial (a_j^L)}.\n", @@ -1718,16 +2038,20 @@ }, { "cell_type": "markdown", - "id": "9eb706a2", - "metadata": {}, + "id": "32ece956", + "metadata": { + "editable": true + }, "source": [ "Then we compute the back propagate error for each $l=L-1,L-2,\\dots,1$ as" ] }, { "cell_type": 
"markdown", - "id": "4dd2c404", - "metadata": {}, + "id": "466d6bda", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\delta_j^l = \\sum_k \\delta_k^{l+1}w_{kj}^{l+1}\\sigma'(z_j^l).\n", @@ -1736,8 +2060,10 @@ }, { "cell_type": "markdown", - "id": "10ec2807", - "metadata": {}, + "id": "9f31b228", + "metadata": { + "editable": true + }, "source": [ "## Setting up the Back propagation algorithm, part 3\n", "\n", @@ -1748,8 +2074,10 @@ }, { "cell_type": "markdown", - "id": "dc831415", - "metadata": {}, + "id": "fbeac005", + "metadata": { + "editable": true + }, "source": [ "$$\n", "w_{ij}^l\\leftarrow = w_{ij}^l- \\eta \\delta_j^la_i^{l-1},\n", @@ -1758,8 +2086,10 @@ }, { "cell_type": "markdown", - "id": "b6841649", - "metadata": {}, + "id": "bc6ae984", + "metadata": { + "editable": true + }, "source": [ "$$\n", "b_j^l \\leftarrow b_j^l-\\eta \\frac{\\partial {\\cal C}}{\\partial b_j^l}=b_j^l-\\eta \\delta_j^l,\n", @@ -1768,16 +2098,20 @@ }, { "cell_type": "markdown", - "id": "355caec2", - "metadata": {}, + "id": "65f3133d", + "metadata": { + "editable": true + }, "source": [ "with $\\eta$ being the learning rate." ] }, { "cell_type": "markdown", - "id": "472bf1c5", - "metadata": {}, + "id": "5d27bbe1", + "metadata": { + "editable": true + }, "source": [ "## Updating the gradients\n", "\n", @@ -1786,8 +2120,10 @@ }, { "cell_type": "markdown", - "id": "d0a449d0", - "metadata": {}, + "id": "5e5d0aa0", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\delta_j^l = \\sum_k \\delta_k^{l+1}w_{kj}^{l+1}\\sigma'(z_j^l),\n", @@ -1796,16 +2132,20 @@ }, { "cell_type": "markdown", - "id": "128ab1d1", - "metadata": {}, + "id": "ea32e5bb", + "metadata": { + "editable": true + }, "source": [ "we update the weights and the biases using gradient descent for each $l=L-1,L-2,\\dots,1$ and update the weights and biases according to the rules" ] }, { "cell_type": "markdown", - "id": "bfae04eb", - "metadata": {}, + "id": "3a9bb5a6", + "metadata": { + "editable": true + }, "source": [ "$$\n", "w_{ij}^l\\leftarrow = w_{ij}^l- \\eta \\delta_j^la_i^{l-1},\n", @@ -1814,8 +2154,10 @@ }, { "cell_type": "markdown", - "id": "4530efa3", - "metadata": {}, + "id": "9008dcf8", + "metadata": { + "editable": true + }, "source": [ "$$\n", "b_j^l \\leftarrow b_j^l-\\eta \\frac{\\partial {\\cal C}}{\\partial b_j^l}=b_j^l-\\eta \\delta_j^l,\n", @@ -1824,8 +2166,10 @@ }, { "cell_type": "markdown", - "id": "4da3f467", - "metadata": {}, + "id": "89aba7d6", + "metadata": { + "editable": true + }, "source": [ "## Activation functions\n", "\n", @@ -1845,8 +2189,10 @@ }, { "cell_type": "markdown", - "id": "a3f2750e", - "metadata": {}, + "id": "ea0cdce2", + "metadata": { + "editable": true + }, "source": [ "### Activation functions, Logistic and Hyperbolic ones\n", "\n", @@ -1862,8 +2208,10 @@ }, { "cell_type": "markdown", - "id": "a8996317", - "metadata": {}, + "id": "91342c80", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\sigma(x) = \\frac{1}{1 + e^{-x}},\n", @@ -1872,16 +2220,20 @@ }, { "cell_type": "markdown", - "id": "bb6de399", - "metadata": {}, + "id": "bd6eb22a", + "metadata": { + "editable": true + }, "source": [ "and the *hyperbolic tangent* function" ] }, { "cell_type": "markdown", - "id": "c6aa6e93", - "metadata": {}, + "id": "4e75b2ab", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\sigma(x) = \\tanh(x)\n", @@ -1890,8 +2242,10 @@ }, { "cell_type": "markdown", - "id": "e10b7b94", - "metadata": {}, + "id": "1626d9b7", + "metadata": { + "editable": true + }, "source": [ 
"## Relevance\n", "\n", @@ -1905,8 +2259,11 @@ { "cell_type": "code", "execution_count": 2, - "id": "7255188c", - "metadata": {}, + "id": "4ac7c23c", + "metadata": { + "collapsed": false, + "editable": true + }, "outputs": [], "source": [ "%matplotlib inline\n", @@ -1986,8 +2343,10 @@ }, { "cell_type": "markdown", - "id": "cbe6427f", - "metadata": {}, + "id": "6aeb0ee4", + "metadata": { + "editable": true + }, "source": [ "## Vanishing gradients\n", "\n", @@ -2006,8 +2365,10 @@ }, { "cell_type": "markdown", - "id": "7aa9a09d", - "metadata": {}, + "id": "ea47d1d6", + "metadata": { + "editable": true + }, "source": [ "## Exploding gradients\n", "\n", @@ -2022,8 +2383,10 @@ }, { "cell_type": "markdown", - "id": "a6833949", - "metadata": {}, + "id": "1947aa95", + "metadata": { + "editable": true + }, "source": [ "## Is the Logistic activation function (Sigmoid) our choice?\n", "\n", @@ -2043,8 +2406,10 @@ }, { "cell_type": "markdown", - "id": "6c4a6dfd", - "metadata": {}, + "id": "d024119f", + "metadata": { + "editable": true + }, "source": [ "## Logistic function as the root of problems\n", "\n", @@ -2060,8 +2425,10 @@ }, { "cell_type": "markdown", - "id": "7ebf35c4", - "metadata": {}, + "id": "c9178132", + "metadata": { + "editable": true + }, "source": [ "## The derivative of the Logistic funtion\n", "\n", @@ -2086,8 +2453,10 @@ }, { "cell_type": "markdown", - "id": "81e57dfd", - "metadata": {}, + "id": "756185f5", + "metadata": { + "editable": true + }, "source": [ "## Insights from the paper by Glorot and Bengio\n", "\n", @@ -2104,8 +2473,10 @@ }, { "cell_type": "markdown", - "id": "c0ad35af", - "metadata": {}, + "id": "3d92cad4", + "metadata": { + "editable": true + }, "source": [ "## The RELU function family\n", "\n", @@ -2123,8 +2494,10 @@ }, { "cell_type": "markdown", - "id": "9abee1c8", - "metadata": {}, + "id": "cbc6f721", + "metadata": { + "editable": true + }, "source": [ "## ELU function\n", "\n", @@ -2135,8 +2508,10 @@ }, { "cell_type": "markdown", - "id": "1d1ff061", - "metadata": {}, + "id": "9249dc7b", + "metadata": { + "editable": true + }, "source": [ "$$\n", "ELU(z) = \\left\\{\\begin{array}{cc} \\alpha\\left( \\exp{(z)}-1\\right) & z < 0,\\\\ z & z \\ge 0.\\end{array}\\right.\n", @@ -2145,8 +2520,10 @@ }, { "cell_type": "markdown", - "id": "a8b62fc1", - "metadata": {}, + "id": "e59de3af", + "metadata": { + "editable": true + }, "source": [ "## Which activation function should we use?\n", "\n", @@ -2165,8 +2542,10 @@ }, { "cell_type": "markdown", - "id": "a8858730", - "metadata": {}, + "id": "e2da998c", + "metadata": { + "editable": true + }, "source": [ "## More on activation functions, output layers\n", "\n", @@ -2185,8 +2564,10 @@ }, { "cell_type": "markdown", - "id": "be1e67e4", - "metadata": {}, + "id": "e1abf01e", + "metadata": { + "editable": true + }, "source": [ "## Fine-tuning neural network hyperparameters\n", "\n", @@ -2212,8 +2593,10 @@ }, { "cell_type": "markdown", - "id": "361acaff", - "metadata": {}, + "id": "a8ded7cd", + "metadata": { + "editable": true + }, "source": [ "## Hidden layers\n", "\n", @@ -2236,8 +2619,10 @@ }, { "cell_type": "markdown", - "id": "636cc811", - "metadata": {}, + "id": "96da4f48", + "metadata": { + "editable": true + }, "source": [ "## Batch Normalization\n", "\n", @@ -2260,8 +2645,10 @@ }, { "cell_type": "markdown", - "id": "a9c3c37b", - "metadata": {}, + "id": "395346a7", + "metadata": { + "editable": true + }, "source": [ "## Dropout\n", "\n", @@ -2278,8 +2665,10 @@ }, { "cell_type": "markdown", - "id": "99899732", - 
"metadata": {}, + "id": "9c712bbb", + "metadata": { + "editable": true + }, "source": [ "## Gradient Clipping\n", "\n", @@ -2296,8 +2685,10 @@ }, { "cell_type": "markdown", - "id": "d76992f2", - "metadata": {}, + "id": "2b66ea72", + "metadata": { + "editable": true + }, "source": [ "## A top-down perspective on Neural networks\n", "\n", @@ -2320,8 +2711,10 @@ }, { "cell_type": "markdown", - "id": "f3f3714c", - "metadata": {}, + "id": "5acbc082", + "metadata": { + "editable": true + }, "source": [ "## More top-down perspectives\n", "\n", @@ -2347,8 +2740,10 @@ }, { "cell_type": "markdown", - "id": "f9eea049", - "metadata": {}, + "id": "31825b65", + "metadata": { + "editable": true + }, "source": [ "## Limitations of supervised learning with deep networks\n", "\n", @@ -2363,8 +2758,10 @@ }, { "cell_type": "markdown", - "id": "ffe27b44", - "metadata": {}, + "id": "c76d9af9", + "metadata": { + "editable": true + }, "source": [ "## Limitations of NNs\n", "\n", @@ -2377,8 +2774,10 @@ }, { "cell_type": "markdown", - "id": "8a4d6517", - "metadata": {}, + "id": "bdc93363", + "metadata": { + "editable": true + }, "source": [ "## Homogeneous data\n", "\n", @@ -2387,8 +2786,10 @@ }, { "cell_type": "markdown", - "id": "322a5ffb", - "metadata": {}, + "id": "a1d6ff64", + "metadata": { + "editable": true + }, "source": [ "## More limitations\n", "\n", @@ -2399,8 +2800,10 @@ }, { "cell_type": "markdown", - "id": "d7c906f4", - "metadata": {}, + "id": "0c2e5742", + "metadata": { + "editable": true + }, "source": [ "## Setting up a Multi-layer perceptron model for classification\n", "\n", @@ -2425,8 +2828,10 @@ }, { "cell_type": "markdown", - "id": "d9b1c4a9", - "metadata": {}, + "id": "d4da3f02", + "metadata": { + "editable": true + }, "source": [ "$$\n", "P(y = 0 \\mid \\boldsymbol{x}, \\boldsymbol{\\theta}) = \\frac{1}{1 + \\exp{(- \\boldsymbol{x}})} ,\n", @@ -2435,16 +2840,20 @@ }, { "cell_type": "markdown", - "id": "62795b6a", - "metadata": {}, + "id": "01ea2e0b", + "metadata": { + "editable": true + }, "source": [ "and" ] }, { "cell_type": "markdown", - "id": "a302cc7a", - "metadata": {}, + "id": "9c1c7bec", + "metadata": { + "editable": true + }, "source": [ "$$\n", "P(y = 1 \\mid \\boldsymbol{x}, \\boldsymbol{\\theta}) = 1 - P(y = 0 \\mid \\boldsymbol{x}, \\boldsymbol{\\theta}) ,\n", @@ -2453,8 +2862,10 @@ }, { "cell_type": "markdown", - "id": "892da0f2", - "metadata": {}, + "id": "9238ff2d", + "metadata": { + "editable": true + }, "source": [ "where $y \\in \\{0, 1\\}$ and $\\boldsymbol{\\theta}$ represents the weights and biases\n", "of our network." @@ -2462,8 +2873,10 @@ }, { "cell_type": "markdown", - "id": "0702951f", - "metadata": {}, + "id": "3be74bd1", + "metadata": { + "editable": true + }, "source": [ "## Defining the cost function\n", "\n", @@ -2472,8 +2885,10 @@ }, { "cell_type": "markdown", - "id": "cb0f2050", - "metadata": {}, + "id": "2e2fd39c", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\mathcal{C}(\\boldsymbol{\\theta}) = - \\ln P(\\mathcal{D} \\mid \\boldsymbol{\\theta}) = - \\sum_{i=1}^n\n", @@ -2483,8 +2898,10 @@ }, { "cell_type": "markdown", - "id": "9d6da3d7", - "metadata": {}, + "id": "42b1d26b", + "metadata": { + "editable": true + }, "source": [ "This last equality means that we can interpret our *cost* function as a sum over the *loss* function\n", "for each point in the dataset $\\mathcal{L}_i(\\boldsymbol{\\theta})$. 
\n", @@ -2506,8 +2923,10 @@ }, { "cell_type": "markdown", - "id": "eb7f6521", - "metadata": {}, + "id": "f740a484", + "metadata": { + "editable": true + }, "source": [ "$$\n", "P(y_{ic} = 1 \\mid \\boldsymbol{x}_i, \\boldsymbol{\\theta}) = \\frac{\\exp{((\\boldsymbol{a}_i^{hidden})^T \\boldsymbol{w}_c)}}\n", @@ -2517,8 +2936,10 @@ }, { "cell_type": "markdown", - "id": "2d3caa00", - "metadata": {}, + "id": "19189bfc", + "metadata": { + "editable": true + }, "source": [ "which reduces to the logistic function in the binary case. \n", "The likelihood of this $C$-class classifier\n", @@ -2527,8 +2948,10 @@ }, { "cell_type": "markdown", - "id": "b84b3da0", - "metadata": {}, + "id": "aeb3ef60", + "metadata": { + "editable": true + }, "source": [ "$$\n", "P(\\mathcal{D} \\mid \\boldsymbol{\\theta}) = \\prod_{i=1}^n \\prod_{c=0}^{C-1} [P(y_{ic} = 1)]^{y_{ic}} .\n", @@ -2537,16 +2960,20 @@ }, { "cell_type": "markdown", - "id": "5be6ff37", - "metadata": {}, + "id": "dbf419a1", + "metadata": { + "editable": true + }, "source": [ "Again we take the negative log-likelihood to define our cost function:" ] }, { "cell_type": "markdown", - "id": "491b47ff", - "metadata": {}, + "id": "9e345753", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\mathcal{C}(\\boldsymbol{\\theta}) = - \\log{P(\\mathcal{D} \\mid \\boldsymbol{\\theta})}.\n", @@ -2555,8 +2982,10 @@ }, { "cell_type": "markdown", - "id": "c752e63e", - "metadata": {}, + "id": "3b13095e", + "metadata": { + "editable": true + }, "source": [ "See the logistic regression lectures for a full definition of the cost function.\n", "\n", @@ -2565,8 +2994,10 @@ }, { "cell_type": "markdown", - "id": "efe0f0c7", - "metadata": {}, + "id": "96501a91", + "metadata": { + "editable": true + }, "source": [ "## Example: binary classification problem\n", "\n", @@ -2575,8 +3006,10 @@ }, { "cell_type": "markdown", - "id": "13ee778d", - "metadata": {}, + "id": "48cf79fe", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\mathcal{C}(\\boldsymbol{\\beta}) = - \\sum_{i=1}^n \\left(y_i\\log{p(y_i \\vert x_i,\\boldsymbol{\\beta})}+(1-y_i)\\log{1-p(y_i \\vert x_i,\\boldsymbol{\\beta})}\\right),\n", @@ -2585,16 +3018,20 @@ }, { "cell_type": "markdown", - "id": "6884cc1e", - "metadata": {}, + "id": "3243c0b1", + "metadata": { + "editable": true + }, "source": [ "where we had defined the logistic (sigmoid) function" ] }, { "cell_type": "markdown", - "id": "9d7a0a4e", - "metadata": {}, + "id": "bb312a09", + "metadata": { + "editable": true + }, "source": [ "$$\n", "p(y_i =1\\vert x_i,\\boldsymbol{\\beta})=\\frac{\\exp{(\\beta_0+\\beta_1 x_i)}}{1+\\exp{(\\beta_0+\\beta_1 x_i)}},\n", @@ -2603,16 +3040,20 @@ }, { "cell_type": "markdown", - "id": "35bfbdc8", - "metadata": {}, + "id": "484cf2b4", + "metadata": { + "editable": true + }, "source": [ "and" ] }, { "cell_type": "markdown", - "id": "631ba4a6", - "metadata": {}, + "id": "2b9c5483", + "metadata": { + "editable": true + }, "source": [ "$$\n", "p(y_i =0\\vert x_i,\\boldsymbol{\\beta})=1-p(y_i =1\\vert x_i,\\boldsymbol{\\beta}).\n", @@ -2621,8 +3062,10 @@ }, { "cell_type": "markdown", - "id": "a37f8d69", - "metadata": {}, + "id": "5ca21f09", + "metadata": { + "editable": true + }, "source": [ "The parameters $\\boldsymbol{\\beta}$ were defined using a minimization method like gradient descent or Newton-Raphson's method. 
\n", "\n", @@ -2632,8 +3075,10 @@ }, { "cell_type": "markdown", - "id": "fcedfc85", - "metadata": {}, + "id": "4852e4d2", + "metadata": { + "editable": true + }, "source": [ "$$\n", "a_i^l = y_i = \\frac{\\exp{(z_i^l)}}{1+\\exp{(z_i^l)}},\n", @@ -2642,16 +3087,20 @@ }, { "cell_type": "markdown", - "id": "4c1ab15e", - "metadata": {}, + "id": "e3b7cbef", + "metadata": { + "editable": true + }, "source": [ "with" ] }, { "cell_type": "markdown", - "id": "6518cab5", - "metadata": {}, + "id": "0c1e69a1", + "metadata": { + "editable": true + }, "source": [ "$$\n", "z_i^l = \\sum_{j}w_{ij}^l a_j^{l-1}+b_i^l,\n", @@ -2660,8 +3109,10 @@ }, { "cell_type": "markdown", - "id": "ccfcb38c", - "metadata": {}, + "id": "e71df7f4", + "metadata": { + "editable": true + }, "source": [ "where the superscript $l-1$ indicates that these are the outputs from layer $l-1$.\n", "Our cost function at the final layer $l=L$ is now" @@ -2669,8 +3120,10 @@ }, { "cell_type": "markdown", - "id": "4174ce25", - "metadata": {}, + "id": "50d6fecc", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\mathcal{C}(\\boldsymbol{W}) = - \\sum_{i=1}^n \\left(t_i\\log{a_i^L}+(1-t_i)\\log{(1-a_i^L)}\\right),\n", @@ -2679,16 +3132,20 @@ }, { "cell_type": "markdown", - "id": "4602e3f2", - "metadata": {}, + "id": "e145e461", + "metadata": { + "editable": true + }, "source": [ "where we have defined the targets $t_i$. The derivatives of the cost function with respect to the output $a_i^L$ are then easily calculated and we get" ] }, { "cell_type": "markdown", - "id": "4b71d88d", - "metadata": {}, + "id": "97f13260", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\frac{\\partial \\mathcal{C}(\\boldsymbol{W})}{\\partial a_i^L} = \\frac{a_i^L-t_i}{a_i^L(1-a_i^L)}.\n", @@ -2697,16 +3154,20 @@ }, { "cell_type": "markdown", - "id": "d5f0d911", - "metadata": {}, + "id": "4361ce3b", + "metadata": { + "editable": true + }, "source": [ "In case we use another activation function than the logistic one, we need to evaluate other derivatives." 
] }, { "cell_type": "markdown", - "id": "d0c13a16", - "metadata": {}, + "id": "52a16654", + "metadata": { + "editable": true + }, "source": [ "## The Softmax function\n", "In case we employ the more general case given by the Softmax equation, we need to evaluate the derivative of the activation function with respect to the activation $z_i^l$, that is we need" @@ -2714,8 +3175,10 @@ }, { "cell_type": "markdown", - "id": "7af1d556", - "metadata": {}, + "id": "3bfb321e", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\frac{\\partial f(z_i^l)}{\\partial w_{jk}^l} =\n", @@ -2725,16 +3188,20 @@ }, { "cell_type": "markdown", - "id": "a42c51d7", - "metadata": {}, + "id": "eccac6c9", + "metadata": { + "editable": true + }, "source": [ "For the Softmax function we have" ] }, { "cell_type": "markdown", - "id": "6365d360", - "metadata": {}, + "id": "23634198", + "metadata": { + "editable": true + }, "source": [ "$$\n", "f(z_i^l) = \\frac{\\exp{(z_i^l)}}{\\sum_{m=1}^K\\exp{(z_m^l)}}.\n", @@ -2743,16 +3210,20 @@ }, { "cell_type": "markdown", - "id": "557a932f", - "metadata": {}, + "id": "7a2e75ba", + "metadata": { + "editable": true + }, "source": [ "Its derivative with respect to $z_j^l$ gives" ] }, { "cell_type": "markdown", - "id": "055db867", - "metadata": {}, + "id": "2dad2d14", + "metadata": { + "editable": true + }, "source": [ "$$\n", "\\frac{\\partial f(z_i^l)}{\\partial z_j^l}= f(z_i^l)\\left(\\delta_{ij}-f(z_j^l)\\right),\n", @@ -2761,16 +3232,20 @@ }, { "cell_type": "markdown", - "id": "ccd85582", - "metadata": {}, + "id": "46415917", + "metadata": { + "editable": true + }, "source": [ "which in case of the simply binary model reduces to having $i=j$." ] }, { "cell_type": "markdown", - "id": "d2d1077a", - "metadata": {}, + "id": "6adc7c1e", + "metadata": { + "editable": true + }, "source": [ "## Developing a code for doing neural networks with back propagation\n", "\n", @@ -2791,8 +3266,10 @@ }, { "cell_type": "markdown", - "id": "3dff91e3", - "metadata": {}, + "id": "4110d83e", + "metadata": { + "editable": true + }, "source": [ "## Collect and pre-process data\n", "\n", @@ -2839,8 +3316,11 @@ { "cell_type": "code", "execution_count": 3, - "id": "0527bd02", - "metadata": {}, + "id": "070c610d", + "metadata": { + "collapsed": false, + "editable": true + }, "outputs": [], "source": [ "# import necessary packages\n", @@ -2889,8 +3369,10 @@ }, { "cell_type": "markdown", - "id": "b5238d8c", - "metadata": {}, + "id": "28bb6085", + "metadata": { + "editable": true + }, "source": [ "## Train and test datasets\n", "\n", @@ -2908,8 +3390,11 @@ { "cell_type": "code", "execution_count": 4, - "id": "bba6be21", - "metadata": {}, + "id": "5a6ae0b0", + "metadata": { + "collapsed": false, + "editable": true + }, "outputs": [], "source": [ "from sklearn.model_selection import train_test_split\n", @@ -2943,8 +3428,10 @@ }, { "cell_type": "markdown", - "id": "4a77d660", - "metadata": {}, + "id": "c26d604d", + "metadata": { + "editable": true + }, "source": [ "## Define model and architecture\n", "\n", @@ -2985,8 +3472,10 @@ }, { "cell_type": "markdown", - "id": "0d276258", - "metadata": {}, + "id": "2775283b", + "metadata": { + "editable": true + }, "source": [ "## Layers\n", "\n", @@ -3023,8 +3512,10 @@ }, { "cell_type": "markdown", - "id": "f25c6fa1", - "metadata": {}, + "id": "f7455c00", + "metadata": { + "editable": true + }, "source": [ "## Weights and biases\n", "\n", @@ -3042,8 +3533,11 @@ { "cell_type": "code", "execution_count": 5, - "id": "3a4a6bba", - "metadata": {}, + "id": 
"20b3c8c0", + "metadata": { + "collapsed": false, + "editable": true + }, "outputs": [], "source": [ "# building our neural network\n", @@ -3065,8 +3559,10 @@ }, { "cell_type": "markdown", - "id": "de886e02", - "metadata": {}, + "id": "a41d9acd", + "metadata": { + "editable": true + }, "source": [ "## Feed-forward pass\n", "\n", @@ -3091,8 +3587,10 @@ }, { "cell_type": "markdown", - "id": "baeb1290", - "metadata": {}, + "id": "b2f64238", + "metadata": { + "editable": true + }, "source": [ "## Matrix multiplications\n", "\n", @@ -3126,8 +3624,11 @@ { "cell_type": "code", "execution_count": 6, - "id": "b575ea05", - "metadata": {}, + "id": "1f5589af", + "metadata": { + "collapsed": false, + "editable": true + }, "outputs": [], "source": [ "# setup the feed-forward pass, subscript h = hidden layer\n", @@ -3169,8 +3670,10 @@ }, { "cell_type": "markdown", - "id": "a981e9cf", - "metadata": {}, + "id": "4518e911", + "metadata": { + "editable": true + }, "source": [ "## Choose cost function and optimizer\n", "\n", @@ -3198,8 +3701,10 @@ }, { "cell_type": "markdown", - "id": "fcb5b7b4", - "metadata": {}, + "id": "d519516b", + "metadata": { + "editable": true + }, "source": [ "## Optimizing the cost function\n", "\n", @@ -3234,8 +3739,10 @@ }, { "cell_type": "markdown", - "id": "135edd42", - "metadata": {}, + "id": "46b71202", + "metadata": { + "editable": true + }, "source": [ "## Regularization\n", "\n", @@ -3266,8 +3773,10 @@ }, { "cell_type": "markdown", - "id": "6a94b210", - "metadata": {}, + "id": "129c39d3", + "metadata": { + "editable": true + }, "source": [ "## Matrix multiplication\n", "\n", @@ -3305,8 +3814,11 @@ { "cell_type": "code", "execution_count": 7, - "id": "10a0f4b1", - "metadata": {}, + "id": "8abafb44", + "metadata": { + "collapsed": false, + "editable": true + }, "outputs": [], "source": [ "# to categorical turns our integer vector into a onehot representation\n", @@ -3381,8 +3893,10 @@ }, { "cell_type": "markdown", - "id": "37e26c5f", - "metadata": {}, + "id": "e95c7166", + "metadata": { + "editable": true + }, "source": [ "## Improving performance\n", "\n", @@ -3400,8 +3914,10 @@ }, { "cell_type": "markdown", - "id": "3721f1b2", - "metadata": {}, + "id": "b4365471", + "metadata": { + "editable": true + }, "source": [ "## Full object-oriented implementation\n", "\n", @@ -3412,8 +3928,11 @@ { "cell_type": "code", "execution_count": 8, - "id": "a2225589", - "metadata": {}, + "id": "5a0357b2", + "metadata": { + "collapsed": false, + "editable": true + }, "outputs": [], "source": [ "class NeuralNetwork:\n", @@ -3519,8 +4038,10 @@ }, { "cell_type": "markdown", - "id": "9d915d49", - "metadata": {}, + "id": "a417307d", + "metadata": { + "editable": true + }, "source": [ "## Evaluate model performance on test data\n", "\n", @@ -3536,8 +4057,11 @@ { "cell_type": "code", "execution_count": 9, - "id": "62e979c5", - "metadata": {}, + "id": "8ee4b306", + "metadata": { + "collapsed": false, + "editable": true + }, "outputs": [], "source": [ "epochs = 100\n", @@ -3560,8 +4084,10 @@ }, { "cell_type": "markdown", - "id": "32464471", - "metadata": {}, + "id": "efcbd954", + "metadata": { + "editable": true + }, "source": [ "## Adjust hyperparameters\n", "\n", @@ -3572,8 +4098,11 @@ { "cell_type": "code", "execution_count": 10, - "id": "c3277c8f", - "metadata": {}, + "id": "bb527e6e", + "metadata": { + "collapsed": false, + "editable": true + }, "outputs": [], "source": [ "eta_vals = np.logspace(-5, 1, 7)\n", @@ -3600,8 +4129,10 @@ }, { "cell_type": "markdown", - "id": "bb09a45b", - "metadata": 
{}, + "id": "d282951d", + "metadata": { + "editable": true + }, "source": [ "## Visualization" ] @@ -3609,8 +4140,11 @@ { "cell_type": "code", "execution_count": 11, - "id": "6e2566d5", - "metadata": {}, + "id": "69d3d9c8", + "metadata": { + "collapsed": false, + "editable": true + }, "outputs": [], "source": [ "# visual representation of grid search\n", @@ -3650,8 +4184,10 @@ }, { "cell_type": "markdown", - "id": "eec163df", - "metadata": {}, + "id": "99f5058c", + "metadata": { + "editable": true + }, "source": [ "## scikit-learn implementation\n", "\n", @@ -3671,8 +4207,11 @@ { "cell_type": "code", "execution_count": 12, - "id": "b33a66c0", - "metadata": {}, + "id": "7898d99f", + "metadata": { + "collapsed": false, + "editable": true + }, "outputs": [], "source": [ "from sklearn.neural_network import MLPClassifier\n", @@ -3695,8 +4234,10 @@ }, { "cell_type": "markdown", - "id": "e1d6d9d7", - "metadata": {}, + "id": "7ceec918", + "metadata": { + "editable": true + }, "source": [ "## Visualization" ] @@ -3704,8 +4245,11 @@ { "cell_type": "code", "execution_count": 13, - "id": "4bc9d5c7", - "metadata": {}, + "id": "98abf229", + "metadata": { + "collapsed": false, + "editable": true + }, "outputs": [], "source": [ "# optional\n", @@ -3746,8 +4290,10 @@ }, { "cell_type": "markdown", - "id": "c1c46aeb", - "metadata": {}, + "id": "ba07c374", + "metadata": { + "editable": true + }, "source": [ "## Building neural networks in Tensorflow and Keras\n", "\n", @@ -3762,8 +4308,10 @@ }, { "cell_type": "markdown", - "id": "b5205123", - "metadata": {}, + "id": "1cf09819", + "metadata": { + "editable": true + }, "source": [ "## Tensorflow\n", "\n", @@ -3795,8 +4343,11 @@ { "cell_type": "code", "execution_count": 14, - "id": "b9c0dfe3", - "metadata": {}, + "id": "2c2c3ec5", + "metadata": { + "collapsed": false, + "editable": true + }, "outputs": [], "source": [ "pip3 install tensorflow" @@ -3804,8 +4355,10 @@ }, { "cell_type": "markdown", - "id": "f562b8e8", - "metadata": {}, + "id": "39d013b1", + "metadata": { + "editable": true + }, "source": [ "and/or if you use **anaconda**, just write (or install from the graphical user interface)\n", "(current release of CPU-only TensorFlow)" @@ -3814,8 +4367,11 @@ { "cell_type": "code", "execution_count": 15, - "id": "d4526899", - "metadata": {}, + "id": "fbf36c26", + "metadata": { + "collapsed": false, + "editable": true + }, "outputs": [], "source": [ "conda create -n tf tensorflow\n", @@ -3824,8 +4380,10 @@ }, { "cell_type": "markdown", - "id": "59617395", - "metadata": {}, + "id": "94e66380", + "metadata": { + "editable": true + }, "source": [ "To install the current release of GPU TensorFlow" ] @@ -3833,8 +4391,11 @@ { "cell_type": "code", "execution_count": 16, - "id": "9c975a67", - "metadata": {}, + "id": "5e72b1d2", + "metadata": { + "collapsed": false, + "editable": true + }, "outputs": [], "source": [ "conda create -n tf-gpu tensorflow-gpu\n", @@ -3843,8 +4404,10 @@ }, { "cell_type": "markdown", - "id": "975357be", - "metadata": {}, + "id": "40470dbd", + "metadata": { + "editable": true + }, "source": [ "## Using Keras\n", "\n", @@ -3856,8 +4419,11 @@ { "cell_type": "code", "execution_count": 17, - "id": "e558fb1f", - "metadata": {}, + "id": "f2cd4f41", + "metadata": { + "collapsed": false, + "editable": true + }, "outputs": [], "source": [ "conda install keras" @@ -3865,8 +4431,10 @@ }, { "cell_type": "markdown", - "id": "a6f5c4d4", - "metadata": {}, + "id": "636940c6", + "metadata": { + "editable": true + }, "source": [ "You can look up the 
[instructions here](https://keras.io/) for more information.\n", "\n", @@ -3875,8 +4443,10 @@ }, { "cell_type": "markdown", - "id": "6bb12225", - "metadata": {}, + "id": "d9f47b57", + "metadata": { + "editable": true + }, "source": [ "## Collect and pre-process data\n", "\n", @@ -3886,8 +4456,11 @@ { "cell_type": "code", "execution_count": 18, - "id": "41fd6ccf", - "metadata": {}, + "id": "1489b5d5", + "metadata": { + "collapsed": false, + "editable": true + }, "outputs": [], "source": [ "# import necessary packages\n", @@ -3938,8 +4511,11 @@ { "cell_type": "code", "execution_count": 19, - "id": "c70145e7", - "metadata": {}, + "id": "672dc5a2", + "metadata": { + "collapsed": false, + "editable": true + }, "outputs": [], "source": [ "from tensorflow.keras.layers import Input\n", @@ -3964,8 +4540,11 @@ { "cell_type": "code", "execution_count": 20, - "id": "f0413064", - "metadata": {}, + "id": "0513084f", + "metadata": { + "collapsed": false, + "editable": true + }, "outputs": [], "source": [ "\n", @@ -3991,8 +4570,11 @@ { "cell_type": "code", "execution_count": 21, - "id": "a4ba8bc0", - "metadata": {}, + "id": "02a34777", + "metadata": { + "collapsed": false, + "editable": true + }, "outputs": [], "source": [ "DNN_keras = np.zeros((len(eta_vals), len(lmbd_vals)), dtype=object)\n", @@ -4015,8 +4597,11 @@ { "cell_type": "code", "execution_count": 22, - "id": "e38856a7", - "metadata": {}, + "id": "52c1d6e2", + "metadata": { + "collapsed": false, + "editable": true + }, "outputs": [], "source": [ "# optional\n", @@ -4054,191 +4639,10 @@ }, { "cell_type": "markdown", - "id": "143ff6b2", - "metadata": {}, - "source": [ - "## The Breast Cancer Data, now with Keras" - ] - }, - { - "cell_type": "code", - "execution_count": 23, - "id": "8830114d", - "metadata": {}, - "outputs": [], - "source": [ - "\n", - "import tensorflow as tf\n", - "from tensorflow.keras.layers import Input\n", - "from tensorflow.keras.models import Sequential #This allows appending layers to existing models\n", - "from tensorflow.keras.layers import Dense #This allows defining the characteristics of a particular layer\n", - "from tensorflow.keras import optimizers #This allows using whichever optimiser we want (sgd,adam,RMSprop)\n", - "from tensorflow.keras import regularizers #This allows using whichever regularizer we want (l1,l2,l1_l2)\n", - "from tensorflow.keras.utils import to_categorical #This allows using categorical cross entropy as the cost function\n", - "import numpy as np\n", - "import matplotlib.pyplot as plt\n", - "import seaborn as sns\n", - "from sklearn.model_selection import train_test_split as splitter\n", - "from sklearn.datasets import load_breast_cancer\n", - "import pickle\n", - "import os \n", - "\n", - "\n", - "\"\"\"Load breast cancer dataset\"\"\"\n", - "\n", - "np.random.seed(0) #create same seed for random number every time\n", - "\n", - "cancer=load_breast_cancer() #Download breast cancer dataset\n", - "\n", - "inputs=cancer.data #Feature matrix of 569 rows (samples) and 30 columns (parameters)\n", - "outputs=cancer.target #Label array of 569 rows (0 for benign and 1 for malignant)\n", - "labels=cancer.feature_names[0:30]\n", - "\n", - "print('The content of the breast cancer dataset is:') #Print information about the datasets\n", - "print(labels)\n", - "print('-------------------------')\n", - "print(\"inputs = \" + str(inputs.shape))\n", - "print(\"outputs = \" + str(outputs.shape))\n", - "print(\"labels = \"+ str(labels.shape))\n", - "\n", - "x=inputs #Reassign the Feature and Label matrices to 
other variables\n", - "y=outputs\n", - "\n", - "#%% \n", - "\n", - "# Visualisation of dataset (for correlation analysis)\n", - "\n", - "plt.figure()\n", - "plt.scatter(x[:,0],x[:,2],s=40,c=y,cmap=plt.cm.Spectral)\n", - "plt.xlabel('Mean radius',fontweight='bold')\n", - "plt.ylabel('Mean perimeter',fontweight='bold')\n", - "plt.show()\n", - "\n", - "plt.figure()\n", - "plt.scatter(x[:,5],x[:,6],s=40,c=y, cmap=plt.cm.Spectral)\n", - "plt.xlabel('Mean compactness',fontweight='bold')\n", - "plt.ylabel('Mean concavity',fontweight='bold')\n", - "plt.show()\n", - "\n", - "\n", - "plt.figure()\n", - "plt.scatter(x[:,0],x[:,1],s=40,c=y,cmap=plt.cm.Spectral)\n", - "plt.xlabel('Mean radius',fontweight='bold')\n", - "plt.ylabel('Mean texture',fontweight='bold')\n", - "plt.show()\n", - "\n", - "plt.figure()\n", - "plt.scatter(x[:,2],x[:,1],s=40,c=y,cmap=plt.cm.Spectral)\n", - "plt.xlabel('Mean perimeter',fontweight='bold')\n", - "plt.ylabel('Mean compactness',fontweight='bold')\n", - "plt.show()\n", - "\n", - "\n", - "# Generate training and testing datasets\n", - "\n", - "#Select features relevant to classification (texture,perimeter,compactness and symmetery) \n", - "#and add to input matrix\n", - "\n", - "temp1=np.reshape(x[:,1],(len(x[:,1]),1))\n", - "temp2=np.reshape(x[:,2],(len(x[:,2]),1))\n", - "X=np.hstack((temp1,temp2)) \n", - "temp=np.reshape(x[:,5],(len(x[:,5]),1))\n", - "X=np.hstack((X,temp)) \n", - "temp=np.reshape(x[:,8],(len(x[:,8]),1))\n", - "X=np.hstack((X,temp)) \n", - "\n", - "X_train,X_test,y_train,y_test=splitter(X,y,test_size=0.1) #Split datasets into training and testing\n", - "\n", - "y_train=to_categorical(y_train) #Convert labels to categorical when using categorical cross entropy\n", - "y_test=to_categorical(y_test)\n", - "\n", - "del temp1,temp2,temp\n", - "\n", - "# %%\n", - "\n", - "# Define tunable parameters\"\n", - "\n", - "eta=np.logspace(-3,-1,3) #Define vector of learning rates (parameter to SGD optimiser)\n", - "lamda=0.01 #Define hyperparameter\n", - "n_layers=2 #Define number of hidden layers in the model\n", - "n_neuron=np.logspace(0,3,4,dtype=int) #Define number of neurons per layer\n", - "epochs=100 #Number of reiterations over the input data\n", - "batch_size=100 #Number of samples per gradient update\n", - "\n", - "# %%\n", - "\n", - "\"\"\"Define function to return Deep Neural Network model\"\"\"\n", - "\n", - "def NN_model(inputsize,n_layers,n_neuron,eta,lamda):\n", - " model=Sequential() \n", - " for i in range(n_layers): #Run loop to add hidden layers to the model\n", - " if (i==0): #First layer requires input dimensions\n", - " model.add(Dense(n_neuron,activation='relu',kernel_regularizer=regularizers.l2(lamda),input_dim=inputsize))\n", - " else: #Subsequent layers are capable of automatic shape inferencing\n", - " model.add(Dense(n_neuron,activation='relu',kernel_regularizer=regularizers.l2(lamda)))\n", - " model.add(Dense(2,activation='softmax')) #2 outputs - ordered and disordered (softmax for prob)\n", - " sgd=optimizers.SGD(lr=eta)\n", - " model.compile(loss='categorical_crossentropy',optimizer=sgd,metrics=['accuracy'])\n", - " return model\n", - "\n", - " \n", - "Train_accuracy=np.zeros((len(n_neuron),len(eta))) #Define matrices to store accuracy scores as a function\n", - "Test_accuracy=np.zeros((len(n_neuron),len(eta))) #of learning rate and number of hidden neurons for \n", - "\n", - "for i in range(len(n_neuron)): #run loops over hidden neurons and learning rates to calculate \n", - " for j in range(len(eta)): #accuracy scores \n", - " 
DNN_model=NN_model(X_train.shape[1],n_layers,n_neuron[i],eta[j],lamda)\n", - " DNN_model.fit(X_train,y_train,epochs=epochs,batch_size=batch_size,verbose=1)\n", - " Train_accuracy[i,j]=DNN_model.evaluate(X_train,y_train)[1]\n", - " Test_accuracy[i,j]=DNN_model.evaluate(X_test,y_test)[1]\n", - " \n", - "\n", - "def plot_data(x,y,data,title=None):\n", - "\n", - " # plot results\n", - " fontsize=16\n", - "\n", - "\n", - " fig = plt.figure()\n", - " ax = fig.add_subplot(111)\n", - " cax = ax.matshow(data, interpolation='nearest', vmin=0, vmax=1)\n", - " \n", - " cbar=fig.colorbar(cax)\n", - " cbar.ax.set_ylabel('accuracy (%)',rotation=90,fontsize=fontsize)\n", - " cbar.set_ticks([0,.2,.4,0.6,0.8,1.0])\n", - " cbar.set_ticklabels(['0%','20%','40%','60%','80%','100%'])\n", - "\n", - " # put text on matrix elements\n", - " for i, x_val in enumerate(np.arange(len(x))):\n", - " for j, y_val in enumerate(np.arange(len(y))):\n", - " c = \"${0:.1f}\\\\%$\".format( 100*data[j,i]) \n", - " ax.text(x_val, y_val, c, va='center', ha='center')\n", - "\n", - " # convert axis vaues to to string labels\n", - " x=[str(i) for i in x]\n", - " y=[str(i) for i in y]\n", - "\n", - "\n", - " ax.set_xticklabels(['']+x)\n", - " ax.set_yticklabels(['']+y)\n", - "\n", - " ax.set_xlabel('$\\\\mathrm{learning\\\\ rate}$',fontsize=fontsize)\n", - " ax.set_ylabel('$\\\\mathrm{hidden\\\\ neurons}$',fontsize=fontsize)\n", - " if title is not None:\n", - " ax.set_title(title)\n", - "\n", - " plt.tight_layout()\n", - "\n", - " plt.show()\n", - " \n", - "plot_data(eta,n_neuron,Train_accuracy, 'training')\n", - "plot_data(eta,n_neuron,Test_accuracy, 'testing')" - ] - }, - { - "cell_type": "markdown", - "id": "3a018087", - "metadata": {}, + "id": "53f9be79", + "metadata": { + "editable": true + }, "source": [ "## Building a neural network code\n", "\n", @@ -4254,8 +4658,10 @@ }, { "cell_type": "markdown", - "id": "e513a294", - "metadata": {}, + "id": "39bd1718", + "metadata": { + "editable": true + }, "source": [ "### Learning rate methods\n", "\n", @@ -4273,9 +4679,12 @@ }, { "cell_type": "code", - "execution_count": 24, - "id": "5a213611", - "metadata": {}, + "execution_count": 23, + "id": "4c1f42f1", + "metadata": { + "collapsed": false, + "editable": true + }, "outputs": [], "source": [ "import autograd.numpy as np\n", @@ -4412,8 +4821,10 @@ }, { "cell_type": "markdown", - "id": "961917fd", - "metadata": {}, + "id": "532aecc2", + "metadata": { + "editable": true + }, "source": [ "### Usage of the above learning rate schedulers\n", "\n", @@ -4425,9 +4836,12 @@ }, { "cell_type": "code", - "execution_count": 25, - "id": "e3745f70", - "metadata": {}, + "execution_count": 24, + "id": "b24b4414", + "metadata": { + "collapsed": false, + "editable": true + }, "outputs": [], "source": [ "momentum_scheduler = Momentum(eta=1e-3, momentum=0.9)\n", @@ -4436,8 +4850,10 @@ }, { "cell_type": "markdown", - "id": "22aca85a", - "metadata": {}, + "id": "32a25c0b", + "metadata": { + "editable": true + }, "source": [ "Here is a small example for how a segment of code using schedulers\n", "could look. Switching out the schedulers is simple." 
@@ -4445,9 +4861,12 @@ }, { "cell_type": "code", - "execution_count": 26, - "id": "3c9a6d4a", - "metadata": {}, + "execution_count": 25, + "id": "7a7d273f", + "metadata": { + "collapsed": false, + "editable": true + }, "outputs": [], "source": [ "weights = np.ones((3,3))\n", @@ -4465,8 +4884,10 @@ }, { "cell_type": "markdown", - "id": "6cea2036", - "metadata": {}, + "id": "d34cd45c", + "metadata": { + "editable": true + }, "source": [ "### Cost functions\n", "\n", @@ -4478,9 +4899,12 @@ }, { "cell_type": "code", - "execution_count": 27, - "id": "76f5c3ab", - "metadata": {}, + "execution_count": 26, + "id": "9ad6425d", + "metadata": { + "collapsed": false, + "editable": true + }, "outputs": [], "source": [ "import autograd.numpy as np\n", @@ -4514,8 +4938,10 @@ }, { "cell_type": "markdown", - "id": "8cbf4208", - "metadata": {}, + "id": "baaaff79", + "metadata": { + "editable": true + }, "source": [ "Below we give a short example of how these cost function may be used\n", "to obtain results if you wish to test them out on your own using\n", @@ -4524,9 +4950,12 @@ }, { "cell_type": "code", - "execution_count": 28, - "id": "d6e082d7", - "metadata": {}, + "execution_count": 27, + "id": "78f11b83", + "metadata": { + "collapsed": false, + "editable": true + }, "outputs": [], "source": [ "from autograd import grad\n", @@ -4543,8 +4972,10 @@ }, { "cell_type": "markdown", - "id": "110f03f8", - "metadata": {}, + "id": "05285af5", + "metadata": { + "editable": true + }, "source": [ "### Activation functions\n", "\n", @@ -4556,9 +4987,12 @@ }, { "cell_type": "code", - "execution_count": 29, - "id": "3c6712b7", - "metadata": {}, + "execution_count": 28, + "id": "7ac52c84", + "metadata": { + "collapsed": false, + "editable": true + }, "outputs": [], "source": [ "import autograd.numpy as np\n", @@ -4612,8 +5046,10 @@ }, { "cell_type": "markdown", - "id": "498d3949", - "metadata": {}, + "id": "873e7caa", + "metadata": { + "editable": true + }, "source": [ "Below follows a short demonstration of how to use an activation\n", "function. 
The derivative of the activation function will be important\n", @@ -4624,9 +5060,12 @@ }, { "cell_type": "code", - "execution_count": 30, - "id": "33947583", - "metadata": {}, + "execution_count": 29, + "id": "bd43ac18", + "metadata": { + "collapsed": false, + "editable": true + }, "outputs": [], "source": [ "z = np.array([[4, 5, 6]]).T\n", @@ -4643,8 +5082,10 @@ }, { "cell_type": "markdown", - "id": "731fc79c", - "metadata": {}, + "id": "3dc2175e", + "metadata": { + "editable": true + }, "source": [ "### The Neural Network\n", "\n", @@ -4664,9 +5105,12 @@ }, { "cell_type": "code", - "execution_count": 31, - "id": "f27ea6ab", - "metadata": {}, + "execution_count": 30, + "id": "5b4b161c", + "metadata": { + "collapsed": false, + "editable": true + }, "outputs": [], "source": [ "import math\n", @@ -5134,8 +5578,10 @@ }, { "cell_type": "markdown", - "id": "bf5cdac7", - "metadata": {}, + "id": "9596ae53", + "metadata": { + "editable": true + }, "source": [ "Before we make a model, we will quickly generate a dataset we can use\n", "for our linear regression problem as shown below" @@ -5143,9 +5589,12 @@ }, { "cell_type": "code", - "execution_count": 32, - "id": "c57cb644", - "metadata": {}, + "execution_count": 31, + "id": "a11f680f", + "metadata": { + "collapsed": false, + "editable": true + }, "outputs": [], "source": [ "import autograd.numpy as np\n", @@ -5185,8 +5634,10 @@ }, { "cell_type": "markdown", - "id": "c5864d33", - "metadata": {}, + "id": "0fc39e40", + "metadata": { + "editable": true + }, "source": [ "Now that we have our dataset ready for the regression, we can create\n", "our regressor. Note that with the seed parameter, we can make sure our\n", @@ -5198,9 +5649,12 @@ }, { "cell_type": "code", - "execution_count": 33, - "id": "474c34e0", - "metadata": {}, + "execution_count": 32, + "id": "a67ab3a0", + "metadata": { + "collapsed": false, + "editable": true + }, "outputs": [], "source": [ "input_nodes = X_train.shape[1]\n", @@ -5211,17 +5665,22 @@ }, { "cell_type": "markdown", - "id": "74f3bc91", - "metadata": {}, + "id": "3add8665", + "metadata": { + "editable": true + }, "source": [ "We then fit our model with our training data using the scheduler of our choice." ] }, { "cell_type": "code", - "execution_count": 34, - "id": "a47d9dc5", - "metadata": {}, + "execution_count": 33, + "id": "4a4fbc7a", + "metadata": { + "collapsed": false, + "editable": true + }, "outputs": [], "source": [ "linear_regression.reset_weights() # reset weights such that previous runs or reruns don't affect the weights\n", @@ -5232,8 +5691,10 @@ }, { "cell_type": "markdown", - "id": "bb2d666b", - "metadata": {}, + "id": "4dff1871", + "metadata": { + "editable": true + }, "source": [ "Due to the progress bar we can see the MSE (train_error) throughout\n", "the FFNN's training. 
Note that the fit() function has some optional\n", @@ -5245,9 +5706,12 @@ }, { "cell_type": "code", - "execution_count": 35, - "id": "f05cdd60", - "metadata": {}, + "execution_count": 34, + "id": "ad40e38c", + "metadata": { + "collapsed": false, + "editable": true + }, "outputs": [], "source": [ "linear_regression.reset_weights() # reset weights such that previous runs or reruns don't affect the weights\n", @@ -5257,8 +5721,10 @@ }, { "cell_type": "markdown", - "id": "0034d61c", - "metadata": {}, + "id": "43cd1e22", + "metadata": { + "editable": true + }, "source": [ "We see that given more epochs to train on, the regressor reaches a lower MSE.\n", "\n", @@ -5269,9 +5735,12 @@ }, { "cell_type": "code", - "execution_count": 36, - "id": "67ecf987", - "metadata": {}, + "execution_count": 35, + "id": "cde36b38", + "metadata": { + "collapsed": false, + "editable": true + }, "outputs": [], "source": [ "from sklearn.datasets import load_breast_cancer\n", @@ -5292,9 +5761,12 @@ }, { "cell_type": "code", - "execution_count": 37, - "id": "729ba5dd", - "metadata": {}, + "execution_count": 36, + "id": "2bc572a4", + "metadata": { + "collapsed": false, + "editable": true + }, "outputs": [], "source": [ "input_nodes = X_train.shape[1]\n", @@ -5305,17 +5777,22 @@ }, { "cell_type": "markdown", - "id": "719a054a", - "metadata": {}, + "id": "e3e6fa31", + "metadata": { + "editable": true + }, "source": [ "We will now make use of our validation data by passing it into our fit function as a keyword argument" ] }, { "cell_type": "code", - "execution_count": 38, - "id": "7e58cd70", - "metadata": {}, + "execution_count": 37, + "id": "575ceb29", + "metadata": { + "collapsed": false, + "editable": true + }, "outputs": [], "source": [ "logistic_regression.reset_weights() # reset weights such that previous runs or reruns don't affect the weights\n", @@ -5326,17 +5803,22 @@ }, { "cell_type": "markdown", - "id": "8225cc6d", - "metadata": {}, + "id": "622015f0", + "metadata": { + "editable": true + }, "source": [ "Finally, we will create a neural network with 2 hidden layers with activation functions." 
] }, { "cell_type": "code", - "execution_count": 39, - "id": "a134deda", - "metadata": {}, + "execution_count": 38, + "id": "9c075b36", + "metadata": { + "collapsed": false, + "editable": true + }, "outputs": [], "source": [ "input_nodes = X_train.shape[1]\n", @@ -5351,9 +5833,12 @@ }, { "cell_type": "code", - "execution_count": 40, - "id": "92aeced5", - "metadata": {}, + "execution_count": 39, + "id": "44ded771", + "metadata": { + "collapsed": false, + "editable": true + }, "outputs": [], "source": [ "neural_network.reset_weights() # reset weights such that previous runs or reruns don't affect the weights\n", @@ -5364,8 +5849,10 @@ }, { "cell_type": "markdown", - "id": "3f0373a0", - "metadata": {}, + "id": "317e6e5c", + "metadata": { + "editable": true + }, "source": [ "### Multiclass classification\n", "\n", @@ -5376,9 +5863,12 @@ }, { "cell_type": "code", - "execution_count": 41, - "id": "02888bf9", - "metadata": {}, + "execution_count": 40, + "id": "8911de9d", + "metadata": { + "collapsed": false, + "editable": true + }, "outputs": [], "source": [ "from sklearn.datasets import load_digits\n", @@ -5411,8 +5901,10 @@ }, { "cell_type": "markdown", - "id": "e7714a6d", - "metadata": {}, + "id": "82d61377", + "metadata": { + "editable": true + }, "source": [ "## Testing the XOR gate and other gates\n", "\n", @@ -5421,9 +5913,12 @@ }, { "cell_type": "code", - "execution_count": 42, - "id": "08a9206b", - "metadata": {}, + "execution_count": 41, + "id": "2a72a374", + "metadata": { + "collapsed": false, + "editable": true + }, "outputs": [], "source": [ "X = np.array([ [0, 0], [0, 1], [1, 0],[1, 1]],dtype=np.float64)\n", @@ -5442,32 +5937,16 @@ }, { "cell_type": "markdown", - "id": "82cfd04b", - "metadata": {}, + "id": "2d892009", + "metadata": { + "editable": true + }, "source": [ "Not bad, but the results depend strongly on the learning reate. Try different learning rates." ] } ], - "metadata": { - "kernelspec": { - "display_name": "Python 3 (ipykernel)", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.9.15" - } - }, + "metadata": {}, "nbformat": 4, "nbformat_minor": 5 } diff --git a/doc/LectureNotes/_build/html/_sources/week43.ipynb b/doc/LectureNotes/_build/html/_sources/week43.ipynb new file mode 100644 index 000000000..b190102b6 --- /dev/null +++ b/doc/LectureNotes/_build/html/_sources/week43.ipynb @@ -0,0 +1,5950 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "5e07edf2", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "44b465a0", + "metadata": { + "editable": true + }, + "source": [ + "# Week 43: Deep Learning: Constructing a Neural Network code and solving differential equations\n", + "**Morten Hjorth-Jensen**, Department of Physics, University of Oslo, Norway\n", + "\n", + "Date: **October 20, 2025**" + ] + }, + { + "cell_type": "markdown", + "id": "9d7bd8c9", + "metadata": { + "editable": true + }, + "source": [ + "## Plans for week 43\n", + "\n", + "**Material for the lecture on Monday October 20, 2025.**\n", + "\n", + "1. Reminder from last week, see also lecture notes from week 42 at as well as those from week 41, see see . \n", + "\n", + "2. Building our own Feed-forward Neural Network.\n", + "\n", + "3. Coding examples using Tensorflow/Keras and Pytorch examples. 
The Pytorch examples are adapted from Rashcka's text, see chapters 11-13.. \n", + "\n", + "4. Start discussions on how to use neural networks for solving differential equations (ordinary and partial ones). This topic continues next week as well.\n", + "\n", + "5. Video of lecture at \n", + "\n", + "6. Whiteboard notes at " + ] + }, + { + "cell_type": "markdown", + "id": "c50cff0f", + "metadata": { + "editable": true + }, + "source": [ + "## Exercises and lab session week 43\n", + "**Lab sessions on Tuesday and Wednesday.**\n", + "\n", + "1. Work on writing your own neural network code and discussions of project 2. If you didn't get time to do the exercises from the two last weeks, we recommend doing so as these exercises give you the basic elements of a neural network code.\n", + "\n", + "2. The exercises this week are tailored to the optional part of project 2, and deal with studying ways to display results from classification problems" + ] + }, + { + "cell_type": "markdown", + "id": "fe8d32ed", + "metadata": { + "editable": true + }, + "source": [ + "## Using Automatic differentiation\n", + "\n", + "In our discussions of ordinary differential equations and neural network codes\n", + "we will also study the usage of Autograd, see for example in computing gradients for deep learning. For the documentation of Autograd and examples see the Autograd documentation at and the lecture slides from week 41, see ." + ] + }, + { + "cell_type": "markdown", + "id": "99999ab4", + "metadata": { + "editable": true + }, + "source": [ + "## Back propagation and automatic differentiation\n", + "\n", + "For more details on the back propagation algorithm and automatic differentiation see\n", + "1. \n", + "\n", + "2. \n", + "\n", + "3. Slides 12-44 at " + ] + }, + { + "cell_type": "markdown", + "id": "b4489372", + "metadata": { + "editable": true + }, + "source": [ + "## Lecture Monday October 20" + ] + }, + { + "cell_type": "markdown", + "id": "f7435e4a", + "metadata": { + "editable": true + }, + "source": [ + "## Setting up the back propagation algorithm and algorithm for a feed forward NN, initalizations\n", + "This is a reminder from last week.\n", + "\n", + "**The architecture (our model).**\n", + "\n", + "1. Set up your inputs and outputs (scalars, vectors, matrices or higher-order arrays)\n", + "\n", + "2. Define the number of hidden layers and hidden nodes\n", + "\n", + "3. Define activation functions for hidden layers and output layers\n", + "\n", + "4. Define optimizer (plan learning rate, momentum, ADAgrad, RMSprop, ADAM etc) and array of initial learning rates\n", + "\n", + "5. Define cost function and possible regularization terms with hyperparameters\n", + "\n", + "6. Initialize weights and biases\n", + "\n", + "7. 
Fix number of iterations for the feed forward part and back propagation part" + ] + }, + { + "cell_type": "markdown", + "id": "e2561576", + "metadata": { + "editable": true + }, + "source": [ + "## Setting up the back propagation algorithm, part 1\n", + "\n", + "Let us write this out in the form of an algorithm.\n", + "\n", + "**First**, we set up the input data $\\boldsymbol{x}$ and the activations\n", + "$\\boldsymbol{z}_1$ of the input layer and compute the activation function and\n", + "the pertinent outputs $\\boldsymbol{a}^1$.\n", + "\n", + "**Secondly**, we perform then the feed forward till we reach the output\n", + "layer and compute all $\\boldsymbol{z}_l$ of the input layer and compute the\n", + "activation function and the pertinent outputs $\\boldsymbol{a}^l$ for\n", + "$l=1,2,3,\\dots,L$.\n", + "\n", + "**Notation**: The first hidden layer has $l=1$ as label and the final output layer has $l=L$." + ] + }, + { + "cell_type": "markdown", + "id": "39ed46ed", + "metadata": { + "editable": true + }, + "source": [ + "## Setting up the back propagation algorithm, part 2\n", + "\n", + "Thereafter we compute the ouput error $\\boldsymbol{\\delta}^L$ by computing all" + ] + }, + { + "cell_type": "markdown", + "id": "776b50ac", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_j^L = \\sigma'(z_j^L)\\frac{\\partial {\\cal C}}{\\partial (a_j^L)}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b0ad385d", + "metadata": { + "editable": true + }, + "source": [ + "Then we compute the back propagate error for each $l=L-1,L-2,\\dots,1$ as" + ] + }, + { + "cell_type": "markdown", + "id": "bb592830", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_j^l = \\sum_k \\delta_k^{l+1}w_{kj}^{l+1}\\sigma'(z_j^l).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "41259526", + "metadata": { + "editable": true + }, + "source": [ + "## Setting up the Back propagation algorithm, part 3\n", + "\n", + "Finally, we update the weights and the biases using gradient descent\n", + "for each $l=L-1,L-2,\\dots,1$ (the first hidden layer) and update the weights and biases\n", + "according to the rules" + ] + }, + { + "cell_type": "markdown", + "id": "47eaff91", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "w_{ij}^l\\leftarrow = w_{ij}^l- \\eta \\delta_j^la_i^{l-1},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "05b74533", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "b_j^l \\leftarrow b_j^l-\\eta \\frac{\\partial {\\cal C}}{\\partial b_j^l}=b_j^l-\\eta \\delta_j^l,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "6edb8648", + "metadata": { + "editable": true + }, + "source": [ + "with $\\eta$ being the learning rate." 
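A minimal NumPy sketch of these update rules for a toy network with one hidden layer and sigmoid activations is shown below. The layer sizes, data and learning rate are invented for illustration, and the output error $a^L-t$ assumes the cross-entropy cost with a sigmoid output discussed earlier.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 1))          # one input with 4 features
t = np.array([[1.0]])                # target
W1, b1 = rng.normal(size=(3, 4)), np.zeros((3, 1))   # hidden layer, 3 nodes
W2, b2 = rng.normal(size=(1, 3)), np.zeros((1, 1))   # output layer
eta = 0.1                            # learning rate, illustrative only

# feed forward
a1 = sigmoid(W1 @ x + b1)
a2 = sigmoid(W2 @ a1 + b2)

# output error: delta^L = a^L - t for cross-entropy cost with sigmoid output
delta2 = a2 - t
# back-propagated error: delta^l = (W^{l+1})^T delta^{l+1} * sigma'(z^l)
delta1 = (W2.T @ delta2) * a1 * (1 - a1)

# gradient-descent updates: w <- w - eta delta a^{l-1},  b <- b - eta delta
W2 -= eta * delta2 @ a1.T
b2 -= eta * delta2
W1 -= eta * delta1 @ x.T
b1 -= eta * delta1
```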
+ ] + }, + { + "cell_type": "markdown", + "id": "a663fc08", + "metadata": { + "editable": true + }, + "source": [ + "## Updating the gradients\n", + "\n", + "With the back propagate error for each $l=L-1,L-2,\\dots,1$ as" + ] + }, + { + "cell_type": "markdown", + "id": "479150e0", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_j^l = \\sum_k \\delta_k^{l+1}w_{kj}^{l+1}\\sigma'(z_j^l),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "41b9b1ea", + "metadata": { + "editable": true + }, + "source": [ + "we update the weights and the biases using gradient descent for each $l=L-1,L-2,\\dots,1$ and update the weights and biases according to the rules" + ] + }, + { + "cell_type": "markdown", + "id": "590c403a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "w_{ij}^l\\leftarrow = w_{ij}^l- \\eta \\delta_j^la_i^{l-1},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "3db8cbb4", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "b_j^l \\leftarrow b_j^l-\\eta \\frac{\\partial {\\cal C}}{\\partial b_j^l}=b_j^l-\\eta \\delta_j^l,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "a204182a", + "metadata": { + "editable": true + }, + "source": [ + "## Activation functions\n", + "\n", + "A property that characterizes a neural network, other than its\n", + "connectivity, is the choice of activation function(s). The following\n", + "restrictions are imposed on an activation function for an FFNN to\n", + "fulfill the universal approximation theorem\n", + "\n", + " * Non-constant\n", + "\n", + " * Bounded\n", + "\n", + " * Monotonically-increasing\n", + "\n", + " * Continuous" + ] + }, + { + "cell_type": "markdown", + "id": "4fe58cce", + "metadata": { + "editable": true + }, + "source": [ + "### Activation functions, examples\n", + "\n", + "Typical examples are the logistic *Sigmoid*" + ] + }, + { + "cell_type": "markdown", + "id": "a14f6d08", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\sigma(x) = \\frac{1}{1 + e^{-x}},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "4c290410", + "metadata": { + "editable": true + }, + "source": [ + "and the *hyperbolic tangent* function" + ] + }, + { + "cell_type": "markdown", + "id": "ca1ac514", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\sigma(x) = \\tanh(x)\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b9bcfab3", + "metadata": { + "editable": true + }, + "source": [ + "## The RELU function family\n", + "\n", + "The ReLU activation function suffers from a problem known as the dying\n", + "ReLUs: during training, some neurons effectively die, meaning they\n", + "stop outputting anything other than 0.\n", + "\n", + "In some cases, you may find that half of your network’s neurons are\n", + "dead, especially if you used a large learning rate. During training,\n", + "if a neuron’s weights get updated such that the weighted sum of the\n", + "neuron’s inputs is negative, it will start outputting 0. When this\n", + "happen, the neuron is unlikely to come back to life since the gradient\n", + "of the ReLU function is 0 when its input is negative." 
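A small sketch (toy values only) contrasting the ReLU gradient, which is exactly zero for negative inputs, with the leaky variant, whose small slope (here the common default $\alpha=0.01$) lets a dead neuron receive weight updates again:

```python
import numpy as np

def relu(z):
    return np.where(z > 0, z, 0.0)

def relu_derivative(z):
    # zero gradient for z < 0: a "dead" neuron receives no weight updates
    return np.where(z > 0, 1.0, 0.0)

def leaky_relu(z, alpha=0.01):
    return np.where(z > 0, z, alpha * z)

def leaky_relu_derivative(z, alpha=0.01):
    # small but non-zero gradient for z < 0, so the neuron can recover
    return np.where(z > 0, 1.0, alpha)

z = np.array([-2.0, -0.5, 0.5, 2.0])
print(relu(z), relu_derivative(z))
print(leaky_relu(z), leaky_relu_derivative(z))
```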
+ ] + }, + { + "cell_type": "markdown", + "id": "2fdf56f7", + "metadata": { + "editable": true + }, + "source": [ + "## ELU function\n", + "\n", + "To solve this problem, nowadays practitioners use a variant of the\n", + "ReLU function, such as the leaky ReLU discussed above or the so-called\n", + "exponential linear unit (ELU) function" + ] + }, + { + "cell_type": "markdown", + "id": "14bf193c", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "ELU(z) = \\left\\{\\begin{array}{cc} \\alpha\\left( \\exp{(z)}-1\\right) & z < 0,\\\\ z & z \\ge 0.\\end{array}\\right.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "df29068f", + "metadata": { + "editable": true + }, + "source": [ + "## Which activation function should we use?\n", + "\n", + "In general it seems that the ELU activation function is better than\n", + "the leaky ReLU function (and its variants), which is better than\n", + "ReLU. ReLU performs better than $\\tanh$ which in turn performs better\n", + "than the logistic function.\n", + "\n", + "If runtime performance is an issue, then you may opt for the leaky\n", + "ReLU function over the ELU function If you don’t want to tweak yet\n", + "another hyperparameter, you may just use the default $\\alpha$ of\n", + "$0.01$ for the leaky ReLU, and $1$ for ELU. If you have spare time and\n", + "computing power, you can use cross-validation or bootstrap to evaluate\n", + "other activation functions." + ] + }, + { + "cell_type": "markdown", + "id": "2fb5a29e", + "metadata": { + "editable": true + }, + "source": [ + "## More on activation functions, output layers\n", + "\n", + "In most cases you can use the ReLU activation function in the hidden\n", + "layers (or one of its variants).\n", + "\n", + "It is a bit faster to compute than other activation functions, and the\n", + "gradient descent optimization does in general not get stuck.\n", + "\n", + "**For the output layer:**\n", + "\n", + "* For classification the softmax activation function is generally a good choice for classification tasks (when the classes are mutually exclusive).\n", + "\n", + "* For regression tasks, you can simply use no activation function at all." + ] + }, + { + "cell_type": "markdown", + "id": "bab79791", + "metadata": { + "editable": true + }, + "source": [ + "## Building neural networks in Tensorflow and Keras\n", + "\n", + "Now we want to build on the experience gained from our neural network implementation in NumPy and scikit-learn\n", + "and use it to construct a neural network in Tensorflow. Once we have constructed a neural network in NumPy\n", + "and Tensorflow, building one in Keras is really quite trivial, though the performance may suffer. \n", + "\n", + "In our previous example we used only one hidden layer, and in this we will use two. From this it should be quite\n", + "clear how to build one using an arbitrary number of hidden layers, using data structures such as Python lists or\n", + "NumPy arrays." + ] + }, + { + "cell_type": "markdown", + "id": "cc32bc9d", + "metadata": { + "editable": true + }, + "source": [ + "## Tensorflow\n", + "\n", + "Tensorflow is an open source library machine learning library\n", + "developed by the Google Brain team for internal use. 
It was released\n", + "under the Apache 2.0 open source license in November 9, 2015.\n", + "\n", + "Tensorflow is a computational framework that allows you to construct\n", + "machine learning models at different levels of abstraction, from\n", + "high-level, object-oriented APIs like Keras, down to the C++ kernels\n", + "that Tensorflow is built upon. The higher levels of abstraction are\n", + "simpler to use, but less flexible, and our choice of implementation\n", + "should reflect the problems we are trying to solve.\n", + "\n", + "[Tensorflow uses](https://www.tensorflow.org/guide/graphs) so-called graphs to represent your computation\n", + "in terms of the dependencies between individual operations, such that you first build a Tensorflow *graph*\n", + "to represent your model, and then create a Tensorflow *session* to run the graph.\n", + "\n", + "In this guide we will analyze the same data as we did in our NumPy and\n", + "scikit-learn tutorial, gathered from the MNIST database of images. We\n", + "will give an introduction to the lower level Python Application\n", + "Program Interfaces (APIs), and see how we use them to build our graph.\n", + "Then we will build (effectively) the same graph in Keras, to see just\n", + "how simple solving a machine learning problem can be.\n", + "\n", + "To install tensorflow on Unix/Linux systems, use pip as" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "deb81088", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "pip3 install tensorflow" + ] + }, + { + "cell_type": "markdown", + "id": "979148b0", + "metadata": { + "editable": true + }, + "source": [ + "and/or if you use **anaconda**, just write (or install from the graphical user interface)\n", + "(current release of CPU-only TensorFlow)" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "ad63b8d9", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "conda create -n tf tensorflow\n", + "conda activate tf" + ] + }, + { + "cell_type": "markdown", + "id": "1417a40e", + "metadata": { + "editable": true + }, + "source": [ + "To install the current release of GPU TensorFlow" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "d56acb3a", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "conda create -n tf-gpu tensorflow-gpu\n", + "conda activate tf-gpu" + ] + }, + { + "cell_type": "markdown", + "id": "6a163d27", + "metadata": { + "editable": true + }, + "source": [ + "## Using Keras\n", + "\n", + "Keras is a high level [neural network](https://en.wikipedia.org/wiki/Application_programming_interface)\n", + "that supports Tensorflow, CTNK and Theano as backends. \n", + "If you have Anaconda installed you may run the following command" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "9ee390a8", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "conda install keras" + ] + }, + { + "cell_type": "markdown", + "id": "528ea3d5", + "metadata": { + "editable": true + }, + "source": [ + "You can look up the [instructions here](https://keras.io/) for more information.\n", + "\n", + "We will to a large extent use **keras** in this course." + ] + }, + { + "cell_type": "markdown", + "id": "32178225", + "metadata": { + "editable": true + }, + "source": [ + "## Collect and pre-process data\n", + "\n", + "Let us look again at the MINST data set." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "e37f86e4", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "%matplotlib inline\n", + "\n", + "# import necessary packages\n", + "import numpy as np\n", + "import matplotlib.pyplot as plt\n", + "import tensorflow as tf\n", + "from sklearn import datasets\n", + "\n", + "\n", + "# ensure the same random numbers appear every time\n", + "np.random.seed(0)\n", + "\n", + "# display images in notebook\n", + "%matplotlib inline\n", + "plt.rcParams['figure.figsize'] = (12,12)\n", + "\n", + "\n", + "# download MNIST dataset\n", + "digits = datasets.load_digits()\n", + "\n", + "# define inputs and labels\n", + "inputs = digits.images\n", + "labels = digits.target\n", + "\n", + "print(\"inputs = (n_inputs, pixel_width, pixel_height) = \" + str(inputs.shape))\n", + "print(\"labels = (n_inputs) = \" + str(labels.shape))\n", + "\n", + "\n", + "# flatten the image\n", + "# the value -1 means dimension is inferred from the remaining dimensions: 8x8 = 64\n", + "n_inputs = len(inputs)\n", + "inputs = inputs.reshape(n_inputs, -1)\n", + "print(\"X = (n_inputs, n_features) = \" + str(inputs.shape))\n", + "\n", + "\n", + "# choose some random images to display\n", + "indices = np.arange(n_inputs)\n", + "random_indices = np.random.choice(indices, size=5)\n", + "\n", + "for i, image in enumerate(digits.images[random_indices]):\n", + " plt.subplot(1, 5, i+1)\n", + " plt.axis('off')\n", + " plt.imshow(image, cmap=plt.cm.gray_r, interpolation='nearest')\n", + " plt.title(\"Label: %d\" % digits.target[random_indices[i]])\n", + "plt.show()" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "06a7c3bd", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "from tensorflow.keras.layers import Input\n", + "from tensorflow.keras.models import Sequential #This allows appending layers to existing models\n", + "from tensorflow.keras.layers import Dense #This allows defining the characteristics of a particular layer\n", + "from tensorflow.keras import optimizers #This allows using whichever optimiser we want (sgd,adam,RMSprop)\n", + "from tensorflow.keras import regularizers #This allows using whichever regularizer we want (l1,l2,l1_l2)\n", + "from tensorflow.keras.utils import to_categorical #This allows using categorical cross entropy as the cost function\n", + "\n", + "from sklearn.model_selection import train_test_split\n", + "\n", + "# one-hot representation of labels\n", + "labels = to_categorical(labels)\n", + "\n", + "# split into train and test data\n", + "train_size = 0.8\n", + "test_size = 1 - train_size\n", + "X_train, X_test, Y_train, Y_test = train_test_split(inputs, labels, train_size=train_size,\n", + " test_size=test_size)" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "358b46c5", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "\n", + "epochs = 100\n", + "batch_size = 100\n", + "n_neurons_layer1 = 100\n", + "n_neurons_layer2 = 50\n", + "n_categories = 10\n", + "eta_vals = np.logspace(-5, 1, 7)\n", + "lmbd_vals = np.logspace(-5, 1, 7)\n", + "def create_neural_network_keras(n_neurons_layer1, n_neurons_layer2, n_categories, eta, lmbd):\n", + " model = Sequential()\n", + " model.add(Dense(n_neurons_layer1, activation='sigmoid', kernel_regularizer=regularizers.l2(lmbd)))\n", + " model.add(Dense(n_neurons_layer2, activation='sigmoid', 
kernel_regularizer=regularizers.l2(lmbd)))\n", + " model.add(Dense(n_categories, activation='softmax'))\n", + " \n", + " sgd = optimizers.SGD(learning_rate=eta)\n", + " model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])\n", + " \n", + " return model" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "5a0445fb", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "DNN_keras = np.zeros((len(eta_vals), len(lmbd_vals)), dtype=object)\n", + " \n", + "for i, eta in enumerate(eta_vals):\n", + " for j, lmbd in enumerate(lmbd_vals):\n", + " DNN = create_neural_network_keras(n_neurons_layer1, n_neurons_layer2, n_categories,\n", + " eta=eta, lmbd=lmbd)\n", + " DNN.fit(X_train, Y_train, epochs=epochs, batch_size=batch_size, verbose=0)\n", + " scores = DNN.evaluate(X_test, Y_test)\n", + " \n", + " DNN_keras[i][j] = DNN\n", + " \n", + " print(\"Learning rate = \", eta)\n", + " print(\"Lambda = \", lmbd)\n", + " print(\"Test accuracy: %.3f\" % scores[1])\n", + " print()" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "id": "f301c7cf", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# optional\n", + "# visual representation of grid search\n", + "# uses seaborn heatmap, could probably do this in matplotlib\n", + "import seaborn as sns\n", + "\n", + "sns.set()\n", + "\n", + "train_accuracy = np.zeros((len(eta_vals), len(lmbd_vals)))\n", + "test_accuracy = np.zeros((len(eta_vals), len(lmbd_vals)))\n", + "\n", + "for i in range(len(eta_vals)):\n", + " for j in range(len(lmbd_vals)):\n", + " DNN = DNN_keras[i][j]\n", + "\n", + " train_accuracy[i][j] = DNN.evaluate(X_train, Y_train)[1]\n", + " test_accuracy[i][j] = DNN.evaluate(X_test, Y_test)[1]\n", + "\n", + " \n", + "fig, ax = plt.subplots(figsize = (10, 10))\n", + "sns.heatmap(train_accuracy, annot=True, ax=ax, cmap=\"viridis\")\n", + "ax.set_title(\"Training Accuracy\")\n", + "ax.set_ylabel(\"$\\eta$\")\n", + "ax.set_xlabel(\"$\\lambda$\")\n", + "plt.show()\n", + "\n", + "fig, ax = plt.subplots(figsize = (10, 10))\n", + "sns.heatmap(test_accuracy, annot=True, ax=ax, cmap=\"viridis\")\n", + "ax.set_title(\"Test Accuracy\")\n", + "ax.set_ylabel(\"$\\eta$\")\n", + "ax.set_xlabel(\"$\\lambda$\")\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "610c95e1", + "metadata": { + "editable": true + }, + "source": [ + "## Using Pytorch with the full MNIST data set" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "id": "d0f3ad9a", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import torch\n", + "import torch.nn as nn\n", + "import torch.optim as optim\n", + "import torchvision\n", + "import torchvision.transforms as transforms\n", + "\n", + "# Device configuration: use GPU if available\n", + "device = torch.device(\"cuda\" if torch.cuda.is_available() else \"cpu\")\n", + "\n", + "# MNIST dataset (downloads if not already present)\n", + "transform = transforms.Compose([\n", + " transforms.ToTensor(),\n", + " transforms.Normalize((0.5,), (0.5,)) # normalize to mean=0.5, std=0.5 (approx. 
[-1,1] pixel range)\n", + "])\n", + "train_dataset = torchvision.datasets.MNIST(root='./data', train=True, download=True, transform=transform)\n", + "test_dataset = torchvision.datasets.MNIST(root='./data', train=False, download=True, transform=transform)\n", + "\n", + "train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=64, shuffle=True)\n", + "test_loader = torch.utils.data.DataLoader(test_dataset, batch_size=64, shuffle=False)\n", + "\n", + "\n", + "class NeuralNet(nn.Module):\n", + " def __init__(self):\n", + " super(NeuralNet, self).__init__()\n", + " self.fc1 = nn.Linear(28*28, 100) # first hidden layer (784 -> 100)\n", + " self.fc2 = nn.Linear(100, 100) # second hidden layer (100 -> 100)\n", + " self.fc3 = nn.Linear(100, 10) # output layer (100 -> 10 classes)\n", + " def forward(self, x):\n", + " x = x.view(x.size(0), -1) # flatten images into vectors of size 784\n", + " x = torch.relu(self.fc1(x)) # hidden layer 1 + ReLU activation\n", + " x = torch.relu(self.fc2(x)) # hidden layer 2 + ReLU activation\n", + " x = self.fc3(x) # output layer (logits for 10 classes)\n", + " return x\n", + "\n", + "model = NeuralNet().to(device)\n", + "\n", + "\n", + "criterion = nn.CrossEntropyLoss()\n", + "optimizer = optim.SGD(model.parameters(), lr=0.01, weight_decay=1e-4)\n", + "\n", + "num_epochs = 10\n", + "for epoch in range(num_epochs):\n", + " model.train() # set model to training mode\n", + " running_loss = 0.0\n", + " for images, labels in train_loader:\n", + " # Move data to device (GPU if available, else CPU)\n", + " images, labels = images.to(device), labels.to(device)\n", + "\n", + " optimizer.zero_grad() # reset gradients to zero\n", + " outputs = model(images) # forward pass: compute predictions\n", + " loss = criterion(outputs, labels) # compute cross-entropy loss\n", + " loss.backward() # backpropagate to compute gradients\n", + " optimizer.step() # update weights using SGD step \n", + "\n", + " running_loss += loss.item()\n", + " # Compute average loss over all batches in this epoch\n", + " avg_loss = running_loss / len(train_loader)\n", + " print(f\"Epoch {epoch+1}/{num_epochs}, Loss: {avg_loss:.4f}\")\n", + "\n", + "#Evaluation on the Test Set\n", + "\n", + "\n", + "\n", + "model.eval() # set model to evaluation mode \n", + "correct = 0\n", + "total = 0\n", + "with torch.no_grad(): # disable gradient calculation for evaluation \n", + " for images, labels in test_loader:\n", + " images, labels = images.to(device), labels.to(device)\n", + " outputs = model(images)\n", + " _, predicted = torch.max(outputs, dim=1) # class with highest score\n", + " total += labels.size(0)\n", + " correct += (predicted == labels).sum().item()\n", + "\n", + "accuracy = 100 * correct / total\n", + "print(f\"Test Accuracy: {accuracy:.2f}%\")" + ] + }, + { + "cell_type": "markdown", + "id": "aad687aa", + "metadata": { + "editable": true + }, + "source": [ + "## And a similar example using Tensorflow with Keras" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "id": "b6c4fad4", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "\n", + "import tensorflow as tf\n", + "from tensorflow import keras\n", + "from tensorflow.keras import layers, regularizers\n", + "\n", + "# Check for GPU (TensorFlow will use it automatically if available)\n", + "gpus = tf.config.list_physical_devices('GPU')\n", + "print(f\"GPUs available: {gpus}\")\n", + "\n", + "# 1) Load and preprocess MNIST\n", + "(x_train, y_train), (x_test, y_test) = 
keras.datasets.mnist.load_data()\n", + "# Normalize to [0, 1]\n", + "x_train = (x_train.astype(\"float32\") / 255.0)\n", + "x_test = (x_test.astype(\"float32\") / 255.0)\n", + "\n", + "# 2) Build the model: 784 -> 100 -> 100 -> 10\n", + "l2_reg = 1e-4 # L2 regularization strength\n", + "\n", + "model = keras.Sequential([\n", + " layers.Input(shape=(28, 28)),\n", + " layers.Flatten(),\n", + " layers.Dense(100, activation=\"relu\",\n", + " kernel_regularizer=regularizers.l2(l2_reg)),\n", + " layers.Dense(100, activation=\"relu\",\n", + " kernel_regularizer=regularizers.l2(l2_reg)),\n", + " layers.Dense(10, activation=\"softmax\") # output probabilities for 10 classes\n", + "])\n", + "\n", + "# 3) Compile with SGD + weight decay via L2 regularizers\n", + "model.compile(\n", + " optimizer=keras.optimizers.SGD(learning_rate=0.01),\n", + " loss=\"sparse_categorical_crossentropy\",\n", + " metrics=[\"accuracy\"],\n", + ")\n", + "\n", + "model.summary()\n", + "\n", + "# 4) Train\n", + "history = model.fit(\n", + " x_train, y_train,\n", + " epochs=10,\n", + " batch_size=64,\n", + " validation_split=0.1, # optional: monitor validation during training\n", + " verbose=1\n", + ")\n", + "\n", + "# 5) Evaluate on test set\n", + "test_loss, test_acc = model.evaluate(x_test, y_test, verbose=0)\n", + "print(f\"Test accuracy: {test_acc:.4f}, Test loss: {test_loss:.4f}\")" + ] + }, + { + "cell_type": "markdown", + "id": "73162fbb", + "metadata": { + "editable": true + }, + "source": [ + "## Building our own neural network code\n", + "\n", + "Here we present a flexible object oriented codebase\n", + "for a feed forward neural network, along with a demonstration of how\n", + "to use it. Before we get into the details of the neural network, we\n", + "will first present some implementations of various schedulers, cost\n", + "functions and activation functions that can be used together with the\n", + "neural network.\n", + "\n", + "The codes here were developed by Eric Reber and Gregor Kajda during spring 2023." + ] + }, + { + "cell_type": "markdown", + "id": "86f36041", + "metadata": { + "editable": true + }, + "source": [ + "### Learning rate methods\n", + "\n", + "The code below shows object oriented implementations of the Constant,\n", + "Momentum, Adagrad, AdagradMomentum, RMS prop and Adam schedulers. All\n", + "of the classes belong to the shared abstract Scheduler class, and\n", + "share the update_change() and reset() methods allowing for any of the\n", + "schedulers to be seamlessly used during the training stage, as will\n", + "later be shown in the fit() method of the neural\n", + "network. Update_change() only has one parameter, the gradient\n", + "($δ^l_ja^{l−1}_k$), and returns the change which will be subtracted\n", + "from the weights. The reset() function takes no parameters, and resets\n", + "the desired variables. For Constant and Momentum, reset does nothing." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 12, + "id": "bcbec449", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import autograd.numpy as np\n", + "\n", + "class Scheduler:\n", + " \"\"\"\n", + " Abstract class for Schedulers\n", + " \"\"\"\n", + "\n", + " def __init__(self, eta):\n", + " self.eta = eta\n", + "\n", + " # should be overwritten\n", + " def update_change(self, gradient):\n", + " raise NotImplementedError\n", + "\n", + " # overwritten if needed\n", + " def reset(self):\n", + " pass\n", + "\n", + "\n", + "class Constant(Scheduler):\n", + " def __init__(self, eta):\n", + " super().__init__(eta)\n", + "\n", + " def update_change(self, gradient):\n", + " return self.eta * gradient\n", + " \n", + " def reset(self):\n", + " pass\n", + "\n", + "\n", + "class Momentum(Scheduler):\n", + " def __init__(self, eta: float, momentum: float):\n", + " super().__init__(eta)\n", + " self.momentum = momentum\n", + " self.change = 0\n", + "\n", + " def update_change(self, gradient):\n", + " self.change = self.momentum * self.change + self.eta * gradient\n", + " return self.change\n", + "\n", + " def reset(self):\n", + " pass\n", + "\n", + "\n", + "class Adagrad(Scheduler):\n", + " def __init__(self, eta):\n", + " super().__init__(eta)\n", + " self.G_t = None\n", + "\n", + " def update_change(self, gradient):\n", + " delta = 1e-8 # avoid division ny zero\n", + "\n", + " if self.G_t is None:\n", + " self.G_t = np.zeros((gradient.shape[0], gradient.shape[0]))\n", + "\n", + " self.G_t += gradient @ gradient.T\n", + "\n", + " G_t_inverse = 1 / (\n", + " delta + np.sqrt(np.reshape(np.diagonal(self.G_t), (self.G_t.shape[0], 1)))\n", + " )\n", + " return self.eta * gradient * G_t_inverse\n", + "\n", + " def reset(self):\n", + " self.G_t = None\n", + "\n", + "\n", + "class AdagradMomentum(Scheduler):\n", + " def __init__(self, eta, momentum):\n", + " super().__init__(eta)\n", + " self.G_t = None\n", + " self.momentum = momentum\n", + " self.change = 0\n", + "\n", + " def update_change(self, gradient):\n", + " delta = 1e-8 # avoid division ny zero\n", + "\n", + " if self.G_t is None:\n", + " self.G_t = np.zeros((gradient.shape[0], gradient.shape[0]))\n", + "\n", + " self.G_t += gradient @ gradient.T\n", + "\n", + " G_t_inverse = 1 / (\n", + " delta + np.sqrt(np.reshape(np.diagonal(self.G_t), (self.G_t.shape[0], 1)))\n", + " )\n", + " self.change = self.change * self.momentum + self.eta * gradient * G_t_inverse\n", + " return self.change\n", + "\n", + " def reset(self):\n", + " self.G_t = None\n", + "\n", + "\n", + "class RMS_prop(Scheduler):\n", + " def __init__(self, eta, rho):\n", + " super().__init__(eta)\n", + " self.rho = rho\n", + " self.second = 0.0\n", + "\n", + " def update_change(self, gradient):\n", + " delta = 1e-8 # avoid division ny zero\n", + " self.second = self.rho * self.second + (1 - self.rho) * gradient * gradient\n", + " return self.eta * gradient / (np.sqrt(self.second + delta))\n", + "\n", + " def reset(self):\n", + " self.second = 0.0\n", + "\n", + "\n", + "class Adam(Scheduler):\n", + " def __init__(self, eta, rho, rho2):\n", + " super().__init__(eta)\n", + " self.rho = rho\n", + " self.rho2 = rho2\n", + " self.moment = 0\n", + " self.second = 0\n", + " self.n_epochs = 1\n", + "\n", + " def update_change(self, gradient):\n", + " delta = 1e-8 # avoid division ny zero\n", + "\n", + " self.moment = self.rho * self.moment + (1 - self.rho) * gradient\n", + " self.second = self.rho2 * self.second + (1 - self.rho2) * gradient 
* gradient\n", + "\n", + " moment_corrected = self.moment / (1 - self.rho**self.n_epochs)\n", + " second_corrected = self.second / (1 - self.rho2**self.n_epochs)\n", + "\n", + " return self.eta * moment_corrected / (np.sqrt(second_corrected + delta))\n", + "\n", + " def reset(self):\n", + " self.n_epochs += 1\n", + " self.moment = 0\n", + " self.second = 0" + ] + }, + { + "cell_type": "markdown", + "id": "961989d9", + "metadata": { + "editable": true + }, + "source": [ + "### Usage of the above learning rate schedulers\n", + "\n", + "To initalize a scheduler, simply create the object and pass in the\n", + "necessary parameters such as the learning rate and the momentum as\n", + "shown below. As the Scheduler class is an abstract class it should not\n", + "called directly, and will raise an error upon usage." + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "id": "1e9fbe0f", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "momentum_scheduler = Momentum(eta=1e-3, momentum=0.9)\n", + "adam_scheduler = Adam(eta=1e-3, rho=0.9, rho2=0.999)" + ] + }, + { + "cell_type": "markdown", + "id": "b5adb1b4", + "metadata": { + "editable": true + }, + "source": [ + "Here is a small example for how a segment of code using schedulers\n", + "could look. Switching out the schedulers is simple." + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "id": "dc4f4d28", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "weights = np.ones((3,3))\n", + "print(f\"Before scheduler:\\n{weights=}\")\n", + "\n", + "epochs = 10\n", + "for e in range(epochs):\n", + " gradient = np.random.rand(3, 3)\n", + " change = adam_scheduler.update_change(gradient)\n", + " weights = weights - change\n", + " adam_scheduler.reset()\n", + "\n", + "print(f\"\\nAfter scheduler:\\n{weights=}\")" + ] + }, + { + "cell_type": "markdown", + "id": "8964d118", + "metadata": { + "editable": true + }, + "source": [ + "### Cost functions\n", + "\n", + "Here we discuss cost functions that can be used when creating the\n", + "neural network. Every cost function takes the target vector as its\n", + "parameter, and returns a function valued only at $x$ such that it may\n", + "easily be differentiated." + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "id": "3a8470bd", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import autograd.numpy as np\n", + "\n", + "def CostOLS(target):\n", + " \n", + " def func(X):\n", + " return (1.0 / target.shape[0]) * np.sum((target - X) ** 2)\n", + "\n", + " return func\n", + "\n", + "\n", + "def CostLogReg(target):\n", + "\n", + " def func(X):\n", + " \n", + " return -(1.0 / target.shape[0]) * np.sum(\n", + " (target * np.log(X + 10e-10)) + ((1 - target) * np.log(1 - X + 10e-10))\n", + " )\n", + "\n", + " return func\n", + "\n", + "\n", + "def CostCrossEntropy(target):\n", + " \n", + " def func(X):\n", + " return -(1.0 / target.size) * np.sum(target * np.log(X + 10e-10))\n", + "\n", + " return func" + ] + }, + { + "cell_type": "markdown", + "id": "ab4daf8f", + "metadata": { + "editable": true + }, + "source": [ + "Below we give a short example of how these cost function may be used\n", + "to obtain results if you wish to test them out on your own using\n", + "AutoGrad's automatics differentiation." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 16, + "id": "cf8922ac", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "from autograd import grad\n", + "\n", + "target = np.array([[1, 2, 3]]).T\n", + "a = np.array([[4, 5, 6]]).T\n", + "\n", + "cost_func = CostCrossEntropy\n", + "cost_func_derivative = grad(cost_func(target))\n", + "\n", + "valued_at_a = cost_func_derivative(a)\n", + "print(f\"Derivative of cost function {cost_func.__name__} valued at a:\\n{valued_at_a}\")" + ] + }, + { + "cell_type": "markdown", + "id": "fab332c4", + "metadata": { + "editable": true + }, + "source": [ + "### Activation functions\n", + "\n", + "Finally, before we look at the neural network, we will look at the\n", + "activation functions which can be specified between the hidden layers\n", + "and as the output function. Each function can be valued for any given\n", + "vector or matrix X, and can be differentiated via derivate()." + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "id": "5ab56013", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import autograd.numpy as np\n", + "from autograd import elementwise_grad\n", + "\n", + "def identity(X):\n", + " return X\n", + "\n", + "\n", + "def sigmoid(X):\n", + " try:\n", + " return 1.0 / (1 + np.exp(-X))\n", + " except FloatingPointError:\n", + " return np.where(X > np.zeros(X.shape), np.ones(X.shape), np.zeros(X.shape))\n", + "\n", + "\n", + "def softmax(X):\n", + " X = X - np.max(X, axis=-1, keepdims=True)\n", + " delta = 10e-10\n", + " return np.exp(X) / (np.sum(np.exp(X), axis=-1, keepdims=True) + delta)\n", + "\n", + "\n", + "def RELU(X):\n", + " return np.where(X > np.zeros(X.shape), X, np.zeros(X.shape))\n", + "\n", + "\n", + "def LRELU(X):\n", + " delta = 10e-4\n", + " return np.where(X > np.zeros(X.shape), X, delta * X)\n", + "\n", + "\n", + "def derivate(func):\n", + " if func.__name__ == \"RELU\":\n", + "\n", + " def func(X):\n", + " return np.where(X > 0, 1, 0)\n", + "\n", + " return func\n", + "\n", + " elif func.__name__ == \"LRELU\":\n", + "\n", + " def func(X):\n", + " delta = 10e-4\n", + " return np.where(X > 0, 1, delta)\n", + "\n", + " return func\n", + "\n", + " else:\n", + " return elementwise_grad(func)" + ] + }, + { + "cell_type": "markdown", + "id": "969612c3", + "metadata": { + "editable": true + }, + "source": [ + "Below follows a short demonstration of how to use an activation\n", + "function. The derivative of the activation function will be important\n", + "when calculating the output delta term during backpropagation. Note\n", + "that derivate() can also be used for cost functions for a more\n", + "generalized approach." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 18, + "id": "313878c6", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "z = np.array([[4, 5, 6]]).T\n", + "print(f\"Input to activation function:\\n{z}\")\n", + "\n", + "act_func = sigmoid\n", + "a = act_func(z)\n", + "print(f\"\\nOutput from {act_func.__name__} activation function:\\n{a}\")\n", + "\n", + "act_func_derivative = derivate(act_func)\n", + "valued_at_z = act_func_derivative(a)\n", + "print(f\"\\nDerivative of {act_func.__name__} activation function valued at z:\\n{valued_at_z}\")" + ] + }, + { + "cell_type": "markdown", + "id": "095347a2", + "metadata": { + "editable": true + }, + "source": [ + "### The Neural Network\n", + "\n", + "Now that we have gotten a good understanding of the implementation of\n", + "some important components, we can take a look at an object oriented\n", + "implementation of a feed forward neural network. The feed forward\n", + "neural network has been implemented as a class named FFNN, which can\n", + "be initiated as a regressor or classifier dependant on the choice of\n", + "cost function. The FFNN can have any number of input nodes, hidden\n", + "layers with any amount of hidden nodes, and any amount of output nodes\n", + "meaning it can perform multiclass classification as well as binary\n", + "classification and regression problems. Although there is a lot of\n", + "code present, it makes for an easy to use and generalizeable interface\n", + "for creating many types of neural networks as will be demonstrated\n", + "below." + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "id": "9ea2b0b7", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import math\n", + "import autograd.numpy as np\n", + "import sys\n", + "import warnings\n", + "from autograd import grad, elementwise_grad\n", + "from random import random, seed\n", + "from copy import deepcopy, copy\n", + "from typing import Tuple, Callable\n", + "from sklearn.utils import resample\n", + "\n", + "warnings.simplefilter(\"error\")\n", + "\n", + "\n", + "class FFNN:\n", + " \"\"\"\n", + " Description:\n", + " ------------\n", + " Feed Forward Neural Network with interface enabling flexible design of a\n", + " nerual networks architecture and the specification of activation function\n", + " in the hidden layers and output layer respectively. This model can be used\n", + " for both regression and classification problems, depending on the output function.\n", + "\n", + " Attributes:\n", + " ------------\n", + " I dimensions (tuple[int]): A list of positive integers, which specifies the\n", + " number of nodes in each of the networks layers. 
The first integer in the array\n", + " defines the number of nodes in the input layer, the second integer defines number\n", + " of nodes in the first hidden layer and so on until the last number, which\n", + " specifies the number of nodes in the output layer.\n", + " II hidden_func (Callable): The activation function for the hidden layers\n", + " III output_func (Callable): The activation function for the output layer\n", + " IV cost_func (Callable): Our cost function\n", + " V seed (int): Sets random seed, makes results reproducible\n", + " \"\"\"\n", + "\n", + " def __init__(\n", + " self,\n", + " dimensions: tuple[int],\n", + " hidden_func: Callable = sigmoid,\n", + " output_func: Callable = lambda x: x,\n", + " cost_func: Callable = CostOLS,\n", + " seed: int = None,\n", + " ):\n", + " self.dimensions = dimensions\n", + " self.hidden_func = hidden_func\n", + " self.output_func = output_func\n", + " self.cost_func = cost_func\n", + " self.seed = seed\n", + " self.weights = list()\n", + " self.schedulers_weight = list()\n", + " self.schedulers_bias = list()\n", + " self.a_matrices = list()\n", + " self.z_matrices = list()\n", + " self.classification = None\n", + "\n", + " self.reset_weights()\n", + " self._set_classification()\n", + "\n", + " def fit(\n", + " self,\n", + " X: np.ndarray,\n", + " t: np.ndarray,\n", + " scheduler: Scheduler,\n", + " batches: int = 1,\n", + " epochs: int = 100,\n", + " lam: float = 0,\n", + " X_val: np.ndarray = None,\n", + " t_val: np.ndarray = None,\n", + " ):\n", + " \"\"\"\n", + " Description:\n", + " ------------\n", + " This function performs the training the neural network by performing the feedforward and backpropagation\n", + " algorithm to update the networks weights.\n", + "\n", + " Parameters:\n", + " ------------\n", + " I X (np.ndarray) : training data\n", + " II t (np.ndarray) : target data\n", + " III scheduler (Scheduler) : specified scheduler (algorithm for optimization of gradient descent)\n", + " IV scheduler_args (list[int]) : list of all arguments necessary for scheduler\n", + "\n", + " Optional Parameters:\n", + " ------------\n", + " V batches (int) : number of batches the datasets are split into, default equal to 1\n", + " VI epochs (int) : number of iterations used to train the network, default equal to 100\n", + " VII lam (float) : regularization hyperparameter lambda\n", + " VIII X_val (np.ndarray) : validation set\n", + " IX t_val (np.ndarray) : validation target set\n", + "\n", + " Returns:\n", + " ------------\n", + " I scores (dict) : A dictionary containing the performance metrics of the model.\n", + " The number of the metrics depends on the parameters passed to the fit-function.\n", + "\n", + " \"\"\"\n", + "\n", + " # setup \n", + " if self.seed is not None:\n", + " np.random.seed(self.seed)\n", + "\n", + " val_set = False\n", + " if X_val is not None and t_val is not None:\n", + " val_set = True\n", + "\n", + " # creating arrays for score metrics\n", + " train_errors = np.empty(epochs)\n", + " train_errors.fill(np.nan)\n", + " val_errors = np.empty(epochs)\n", + " val_errors.fill(np.nan)\n", + "\n", + " train_accs = np.empty(epochs)\n", + " train_accs.fill(np.nan)\n", + " val_accs = np.empty(epochs)\n", + " val_accs.fill(np.nan)\n", + "\n", + " self.schedulers_weight = list()\n", + " self.schedulers_bias = list()\n", + "\n", + " batch_size = X.shape[0] // batches\n", + "\n", + " X, t = resample(X, t)\n", + "\n", + " # this function returns a function valued only at X\n", + " cost_function_train = self.cost_func(t)\n", 
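+ "        # (note) resample() above draws a bootstrap sample of X and t (sampling with\n",
+ "        # replacement by default); the mini-batches in the training loop below are then\n",
+ "        # taken as contiguous slices of this resampled data\n",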
+ " if val_set:\n", + " cost_function_val = self.cost_func(t_val)\n", + "\n", + " # create schedulers for each weight matrix\n", + " for i in range(len(self.weights)):\n", + " self.schedulers_weight.append(copy(scheduler))\n", + " self.schedulers_bias.append(copy(scheduler))\n", + "\n", + " print(f\"{scheduler.__class__.__name__}: Eta={scheduler.eta}, Lambda={lam}\")\n", + "\n", + " try:\n", + " for e in range(epochs):\n", + " for i in range(batches):\n", + " # allows for minibatch gradient descent\n", + " if i == batches - 1:\n", + " # If the for loop has reached the last batch, take all thats left\n", + " X_batch = X[i * batch_size :, :]\n", + " t_batch = t[i * batch_size :, :]\n", + " else:\n", + " X_batch = X[i * batch_size : (i + 1) * batch_size, :]\n", + " t_batch = t[i * batch_size : (i + 1) * batch_size, :]\n", + "\n", + " self._feedforward(X_batch)\n", + " self._backpropagate(X_batch, t_batch, lam)\n", + "\n", + " # reset schedulers for each epoch (some schedulers pass in this call)\n", + " for scheduler in self.schedulers_weight:\n", + " scheduler.reset()\n", + "\n", + " for scheduler in self.schedulers_bias:\n", + " scheduler.reset()\n", + "\n", + " # computing performance metrics\n", + " pred_train = self.predict(X)\n", + " train_error = cost_function_train(pred_train)\n", + "\n", + " train_errors[e] = train_error\n", + " if val_set:\n", + " \n", + " pred_val = self.predict(X_val)\n", + " val_error = cost_function_val(pred_val)\n", + " val_errors[e] = val_error\n", + "\n", + " if self.classification:\n", + " train_acc = self._accuracy(self.predict(X), t)\n", + " train_accs[e] = train_acc\n", + " if val_set:\n", + " val_acc = self._accuracy(pred_val, t_val)\n", + " val_accs[e] = val_acc\n", + "\n", + " # printing progress bar\n", + " progression = e / epochs\n", + " print_length = self._progress_bar(\n", + " progression,\n", + " train_error=train_errors[e],\n", + " train_acc=train_accs[e],\n", + " val_error=val_errors[e],\n", + " val_acc=val_accs[e],\n", + " )\n", + " except KeyboardInterrupt:\n", + " # allows for stopping training at any point and seeing the result\n", + " pass\n", + "\n", + " # visualization of training progression (similiar to tensorflow progression bar)\n", + " sys.stdout.write(\"\\r\" + \" \" * print_length)\n", + " sys.stdout.flush()\n", + " self._progress_bar(\n", + " 1,\n", + " train_error=train_errors[e],\n", + " train_acc=train_accs[e],\n", + " val_error=val_errors[e],\n", + " val_acc=val_accs[e],\n", + " )\n", + " sys.stdout.write(\"\")\n", + "\n", + " # return performance metrics for the entire run\n", + " scores = dict()\n", + "\n", + " scores[\"train_errors\"] = train_errors\n", + "\n", + " if val_set:\n", + " scores[\"val_errors\"] = val_errors\n", + "\n", + " if self.classification:\n", + " scores[\"train_accs\"] = train_accs\n", + "\n", + " if val_set:\n", + " scores[\"val_accs\"] = val_accs\n", + "\n", + " return scores\n", + "\n", + " def predict(self, X: np.ndarray, *, threshold=0.5):\n", + " \"\"\"\n", + " Description:\n", + " ------------\n", + " Performs prediction after training of the network has been finished.\n", + "\n", + " Parameters:\n", + " ------------\n", + " I X (np.ndarray): The design matrix, with n rows of p features each\n", + "\n", + " Optional Parameters:\n", + " ------------\n", + " II threshold (float) : sets minimal value for a prediction to be predicted as the positive class\n", + " in classification problems\n", + "\n", + " Returns:\n", + " ------------\n", + " I z (np.ndarray): A prediction vector (row) for each row 
in our design matrix\n", + " This vector is thresholded if regression=False, meaning that classification results\n", + " in a vector of 1s and 0s, while regressions in an array of decimal numbers\n", + "\n", + " \"\"\"\n", + "\n", + " predict = self._feedforward(X)\n", + "\n", + " if self.classification:\n", + " return np.where(predict > threshold, 1, 0)\n", + " else:\n", + " return predict\n", + "\n", + " def reset_weights(self):\n", + " \"\"\"\n", + " Description:\n", + " ------------\n", + " Resets/Reinitializes the weights in order to train the network for a new problem.\n", + "\n", + " \"\"\"\n", + " if self.seed is not None:\n", + " np.random.seed(self.seed)\n", + "\n", + " self.weights = list()\n", + " for i in range(len(self.dimensions) - 1):\n", + " weight_array = np.random.randn(\n", + " self.dimensions[i] + 1, self.dimensions[i + 1]\n", + " )\n", + " weight_array[0, :] = np.random.randn(self.dimensions[i + 1]) * 0.01\n", + "\n", + " self.weights.append(weight_array)\n", + "\n", + " def _feedforward(self, X: np.ndarray):\n", + " \"\"\"\n", + " Description:\n", + " ------------\n", + " Calculates the activation of each layer starting at the input and ending at the output.\n", + " Each following activation is calculated from a weighted sum of each of the preceeding\n", + " activations (except in the case of the input layer).\n", + "\n", + " Parameters:\n", + " ------------\n", + " I X (np.ndarray): The design matrix, with n rows of p features each\n", + "\n", + " Returns:\n", + " ------------\n", + " I z (np.ndarray): A prediction vector (row) for each row in our design matrix\n", + " \"\"\"\n", + "\n", + " # reset matrices\n", + " self.a_matrices = list()\n", + " self.z_matrices = list()\n", + "\n", + " # if X is just a vector, make it into a matrix\n", + " if len(X.shape) == 1:\n", + " X = X.reshape((1, X.shape[0]))\n", + "\n", + " # Add a coloumn of zeros as the first coloumn of the design matrix, in order\n", + " # to add bias to our data\n", + " bias = np.ones((X.shape[0], 1)) * 0.01\n", + " X = np.hstack([bias, X])\n", + "\n", + " # a^0, the nodes in the input layer (one a^0 for each row in X - where the\n", + " # exponent indicates layer number).\n", + " a = X\n", + " self.a_matrices.append(a)\n", + " self.z_matrices.append(a)\n", + "\n", + " # The feed forward algorithm\n", + " for i in range(len(self.weights)):\n", + " if i < len(self.weights) - 1:\n", + " z = a @ self.weights[i]\n", + " self.z_matrices.append(z)\n", + " a = self.hidden_func(z)\n", + " # bias column again added to the data here\n", + " bias = np.ones((a.shape[0], 1)) * 0.01\n", + " a = np.hstack([bias, a])\n", + " self.a_matrices.append(a)\n", + " else:\n", + " try:\n", + " # a^L, the nodes in our output layers\n", + " z = a @ self.weights[i]\n", + " a = self.output_func(z)\n", + " self.a_matrices.append(a)\n", + " self.z_matrices.append(z)\n", + " except Exception as OverflowError:\n", + " print(\n", + " \"OverflowError in fit() in FFNN\\nHOW TO DEBUG ERROR: Consider lowering your learning rate or scheduler specific parameters such as momentum, or check if your input values need scaling\"\n", + " )\n", + "\n", + " # this will be a^L\n", + " return a\n", + "\n", + " def _backpropagate(self, X, t, lam):\n", + " \"\"\"\n", + " Description:\n", + " ------------\n", + " Performs the backpropagation algorithm. 
In other words, this method\n", + " calculates the gradient of all the layers starting at the\n", + " output layer, and moving from right to left accumulates the gradient until\n", + " the input layer is reached. Each layers respective weights are updated while\n", + " the algorithm propagates backwards from the output layer (auto-differentation in reverse mode).\n", + "\n", + " Parameters:\n", + " ------------\n", + " I X (np.ndarray): The design matrix, with n rows of p features each.\n", + " II t (np.ndarray): The target vector, with n rows of p targets.\n", + " III lam (float32): regularization parameter used to punish the weights in case of overfitting\n", + "\n", + " Returns:\n", + " ------------\n", + " No return value.\n", + "\n", + " \"\"\"\n", + " out_derivative = derivate(self.output_func)\n", + " hidden_derivative = derivate(self.hidden_func)\n", + "\n", + " for i in range(len(self.weights) - 1, -1, -1):\n", + " # delta terms for output\n", + " if i == len(self.weights) - 1:\n", + " # for multi-class classification\n", + " if (\n", + " self.output_func.__name__ == \"softmax\"\n", + " ):\n", + " delta_matrix = self.a_matrices[i + 1] - t\n", + " # for single class classification\n", + " else:\n", + " cost_func_derivative = grad(self.cost_func(t))\n", + " delta_matrix = out_derivative(\n", + " self.z_matrices[i + 1]\n", + " ) * cost_func_derivative(self.a_matrices[i + 1])\n", + "\n", + " # delta terms for hidden layer\n", + " else:\n", + " delta_matrix = (\n", + " self.weights[i + 1][1:, :] @ delta_matrix.T\n", + " ).T * hidden_derivative(self.z_matrices[i + 1])\n", + "\n", + " # calculate gradient\n", + " gradient_weights = self.a_matrices[i][:, 1:].T @ delta_matrix\n", + " gradient_bias = np.sum(delta_matrix, axis=0).reshape(\n", + " 1, delta_matrix.shape[1]\n", + " )\n", + "\n", + " # regularization term\n", + " gradient_weights += self.weights[i][1:, :] * lam\n", + "\n", + " # use scheduler\n", + " update_matrix = np.vstack(\n", + " [\n", + " self.schedulers_bias[i].update_change(gradient_bias),\n", + " self.schedulers_weight[i].update_change(gradient_weights),\n", + " ]\n", + " )\n", + "\n", + " # update weights and bias\n", + " self.weights[i] -= update_matrix\n", + "\n", + " def _accuracy(self, prediction: np.ndarray, target: np.ndarray):\n", + " \"\"\"\n", + " Description:\n", + " ------------\n", + " Calculates accuracy of given prediction to target\n", + "\n", + " Parameters:\n", + " ------------\n", + " I prediction (np.ndarray): vector of predicitons output network\n", + " (1s and 0s in case of classification, and real numbers in case of regression)\n", + " II target (np.ndarray): vector of true values (What the network ideally should predict)\n", + "\n", + " Returns:\n", + " ------------\n", + " A floating point number representing the percentage of correctly classified instances.\n", + " \"\"\"\n", + " assert prediction.size == target.size\n", + " return np.average((target == prediction))\n", + " def _set_classification(self):\n", + " \"\"\"\n", + " Description:\n", + " ------------\n", + " Decides if FFNN acts as classifier (True) og regressor (False),\n", + " sets self.classification during init()\n", + " \"\"\"\n", + " self.classification = False\n", + " if (\n", + " self.cost_func.__name__ == \"CostLogReg\"\n", + " or self.cost_func.__name__ == \"CostCrossEntropy\"\n", + " ):\n", + " self.classification = True\n", + "\n", + " def _progress_bar(self, progression, **kwargs):\n", + " \"\"\"\n", + " Description:\n", + " ------------\n", + " Displays progress of 
training\n", + " \"\"\"\n", + " print_length = 40\n", + " num_equals = int(progression * print_length)\n", + " num_not = print_length - num_equals\n", + " arrow = \">\" if num_equals > 0 else \"\"\n", + " bar = \"[\" + \"=\" * (num_equals - 1) + arrow + \"-\" * num_not + \"]\"\n", + " perc_print = self._format(progression * 100, decimals=5)\n", + " line = f\" {bar} {perc_print}% \"\n", + "\n", + " for key in kwargs:\n", + " if not np.isnan(kwargs[key]):\n", + " value = self._format(kwargs[key], decimals=4)\n", + " line += f\"| {key}: {value} \"\n", + " sys.stdout.write(\"\\r\" + line)\n", + " sys.stdout.flush()\n", + " return len(line)\n", + "\n", + " def _format(self, value, decimals=4):\n", + " \"\"\"\n", + " Description:\n", + " ------------\n", + " Formats decimal numbers for progress bar\n", + " \"\"\"\n", + " if value > 0:\n", + " v = value\n", + " elif value < 0:\n", + " v = -10 * value\n", + " else:\n", + " v = 1\n", + " n = 1 + math.floor(math.log10(v))\n", + " if n >= decimals - 1:\n", + " return str(round(value))\n", + " return f\"{value:.{decimals-n-1}f}\"" + ] + }, + { + "cell_type": "markdown", + "id": "0f29bccd", + "metadata": { + "editable": true + }, + "source": [ + "Before we make a model, we will quickly generate a dataset we can use\n", + "for our linear regression problem as shown below" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "id": "dc37b403", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import autograd.numpy as np\n", + "from sklearn.model_selection import train_test_split\n", + "\n", + "def SkrankeFunction(x, y):\n", + " return np.ravel(0 + 1*x + 2*y + 3*x**2 + 4*x*y + 5*y**2)\n", + "\n", + "def create_X(x, y, n):\n", + " if len(x.shape) > 1:\n", + " x = np.ravel(x)\n", + " y = np.ravel(y)\n", + "\n", + " N = len(x)\n", + " l = int((n + 1) * (n + 2) / 2) # Number of elements in beta\n", + " X = np.ones((N, l))\n", + "\n", + " for i in range(1, n + 1):\n", + " q = int((i) * (i + 1) / 2)\n", + " for k in range(i + 1):\n", + " X[:, q + k] = (x ** (i - k)) * (y**k)\n", + "\n", + " return X\n", + "\n", + "step=0.5\n", + "x = np.arange(0, 1, step)\n", + "y = np.arange(0, 1, step)\n", + "x, y = np.meshgrid(x, y)\n", + "target = SkrankeFunction(x, y)\n", + "target = target.reshape(target.shape[0], 1)\n", + "\n", + "poly_degree=3\n", + "X = create_X(x, y, poly_degree)\n", + "\n", + "X_train, X_test, t_train, t_test = train_test_split(X, target)" + ] + }, + { + "cell_type": "markdown", + "id": "91790369", + "metadata": { + "editable": true + }, + "source": [ + "Now that we have our dataset ready for the regression, we can create\n", + "our regressor. Note that with the seed parameter, we can make sure our\n", + "results stay the same every time we run the neural network. For\n", + "inititialization, we simply specify the dimensions (we wish the amount\n", + "of input nodes to be equal to the datapoints, and the output to\n", + "predict one value)." + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "id": "62585c7a", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "input_nodes = X_train.shape[1]\n", + "output_nodes = 1\n", + "\n", + "linear_regression = FFNN((input_nodes, output_nodes), output_func=identity, cost_func=CostOLS, seed=2023)" + ] + }, + { + "cell_type": "markdown", + "id": "69cdc171", + "metadata": { + "editable": true + }, + "source": [ + "We then fit our model with our training data using the scheduler of our choice." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 22, + "id": "d0713298", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "linear_regression.reset_weights() # reset weights such that previous runs or reruns don't affect the weights\n", + "\n", + "scheduler = Constant(eta=1e-3)\n", + "scores = linear_regression.fit(X_train, t_train, scheduler)" + ] + }, + { + "cell_type": "markdown", + "id": "310f805d", + "metadata": { + "editable": true + }, + "source": [ + "Due to the progress bar we can see the MSE (train_error) throughout\n", + "the FFNN's training. Note that the fit() function has some optional\n", + "parameters with defualt arguments. For example, the regularization\n", + "hyperparameter can be left ignored if not needed, and equally the FFNN\n", + "will by default run for 100 epochs. These can easily be changed, such\n", + "as for example:" + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "id": "216d1c44", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "linear_regression.reset_weights() # reset weights such that previous runs or reruns don't affect the weights\n", + "\n", + "scores = linear_regression.fit(X_train, t_train, scheduler, lam=1e-4, epochs=1000)" + ] + }, + { + "cell_type": "markdown", + "id": "ba2e5a39", + "metadata": { + "editable": true + }, + "source": [ + "We see that given more epochs to train on, the regressor reaches a lower MSE.\n", + "\n", + "Let us then switch to a binary classification. We use a binary\n", + "classification dataset, and follow a similar setup to the regression\n", + "case." + ] + }, + { + "cell_type": "code", + "execution_count": 24, + "id": "8c5b291e", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "from sklearn.datasets import load_breast_cancer\n", + "from sklearn.preprocessing import MinMaxScaler\n", + "\n", + "wisconsin = load_breast_cancer()\n", + "X = wisconsin.data\n", + "target = wisconsin.target\n", + "target = target.reshape(target.shape[0], 1)\n", + "\n", + "X_train, X_val, t_train, t_val = train_test_split(X, target)\n", + "\n", + "scaler = MinMaxScaler()\n", + "scaler.fit(X_train)\n", + "X_train = scaler.transform(X_train)\n", + "X_val = scaler.transform(X_val)" + ] + }, + { + "cell_type": "code", + "execution_count": 25, + "id": "4f6aa682", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "input_nodes = X_train.shape[1]\n", + "output_nodes = 1\n", + "\n", + "logistic_regression = FFNN((input_nodes, output_nodes), output_func=sigmoid, cost_func=CostLogReg, seed=2023)" + ] + }, + { + "cell_type": "markdown", + "id": "3ff7c54a", + "metadata": { + "editable": true + }, + "source": [ + "We will now make use of our validation data by passing it into our fit function as a keyword argument" + ] + }, + { + "cell_type": "code", + "execution_count": 26, + "id": "4bbcaedd", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "logistic_regression.reset_weights() # reset weights such that previous runs or reruns don't affect the weights\n", + "\n", + "scheduler = Adam(eta=1e-3, rho=0.9, rho2=0.999)\n", + "scores = logistic_regression.fit(X_train, t_train, scheduler, epochs=1000, X_val=X_val, t_val=t_val)" + ] + }, + { + "cell_type": "markdown", + "id": "aa4f54fe", + "metadata": { + "editable": true + }, + "source": [ + "Finally, we will create a neural network with 2 hidden layers with 
activation functions." + ] + }, + { + "cell_type": "code", + "execution_count": 27, + "id": "c11be1f5", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "input_nodes = X_train.shape[1]\n", + "hidden_nodes1 = 100\n", + "hidden_nodes2 = 30\n", + "output_nodes = 1\n", + "\n", + "dims = (input_nodes, hidden_nodes1, hidden_nodes2, output_nodes)\n", + "\n", + "neural_network = FFNN(dims, hidden_func=RELU, output_func=sigmoid, cost_func=CostLogReg, seed=2023)" + ] + }, + { + "cell_type": "code", + "execution_count": 28, + "id": "78482f24", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "neural_network.reset_weights() # reset weights such that previous runs or reruns don't affect the weights\n", + "\n", + "scheduler = Adam(eta=1e-4, rho=0.9, rho2=0.999)\n", + "scores = neural_network.fit(X_train, t_train, scheduler, epochs=1000, X_val=X_val, t_val=t_val)" + ] + }, + { + "cell_type": "markdown", + "id": "678b88e7", + "metadata": { + "editable": true + }, + "source": [ + "### Multiclass classification\n", + "\n", + "Finally, we will demonstrate the use case of multiclass classification\n", + "using our FFNN with the famous MNIST dataset, which contain images of\n", + "digits between the range of 0 to 9." + ] + }, + { + "cell_type": "code", + "execution_count": 29, + "id": "833a7321", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "from sklearn.datasets import load_digits\n", + "\n", + "def onehot(target: np.ndarray):\n", + " onehot = np.zeros((target.size, target.max() + 1))\n", + " onehot[np.arange(target.size), target] = 1\n", + " return onehot\n", + "\n", + "digits = load_digits()\n", + "\n", + "X = digits.data\n", + "target = digits.target\n", + "target = onehot(target)\n", + "\n", + "input_nodes = 64\n", + "hidden_nodes1 = 100\n", + "hidden_nodes2 = 30\n", + "output_nodes = 10\n", + "\n", + "dims = (input_nodes, hidden_nodes1, hidden_nodes2, output_nodes)\n", + "\n", + "multiclass = FFNN(dims, hidden_func=LRELU, output_func=softmax, cost_func=CostCrossEntropy)\n", + "\n", + "multiclass.reset_weights() # reset weights such that previous runs or reruns don't affect the weights\n", + "\n", + "scheduler = Adam(eta=1e-4, rho=0.9, rho2=0.999)\n", + "scores = multiclass.fit(X, target, scheduler, epochs=1000)" + ] + }, + { + "cell_type": "markdown", + "id": "1af2ad7b", + "metadata": { + "editable": true + }, + "source": [ + "## Testing the XOR gate and other gates\n", + "\n", + "Let us now use our code to test the XOR gate." + ] + }, + { + "cell_type": "code", + "execution_count": 30, + "id": "752c6403", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "X = np.array([ [0, 0], [0, 1], [1, 0],[1, 1]],dtype=np.float64)\n", + "\n", + "# The XOR gate\n", + "yXOR = np.array( [[ 0], [1] ,[1], [0]])\n", + "\n", + "input_nodes = X.shape[1]\n", + "output_nodes = 1\n", + "\n", + "logistic_regression = FFNN((input_nodes, output_nodes), output_func=sigmoid, cost_func=CostLogReg, seed=2023)\n", + "logistic_regression.reset_weights() # reset weights such that previous runs or reruns don't affect the weights\n", + "scheduler = Adam(eta=1e-1, rho=0.9, rho2=0.999)\n", + "scores = logistic_regression.fit(X, yXOR, scheduler, epochs=1000)" + ] + }, + { + "cell_type": "markdown", + "id": "0a7c91e3", + "metadata": { + "editable": true + }, + "source": [ + "Not bad, but the results depend strongly on the learning reate. 
Try different learning rates." + ] + }, + { + "cell_type": "markdown", + "id": "40ffa1fb", + "metadata": { + "editable": true + }, + "source": [ + "## Solving differential equations with Deep Learning\n", + "\n", + "The Universal Approximation Theorem states that a neural network can\n", + "approximate any function at a single hidden layer along with one input\n", + "and output layer to any given precision.\n", + "\n", + "**Book on solving differential equations with ML methods.**\n", + "\n", + "[An Introduction to Neural Network Methods for Differential Equations](https://www.springer.com/gp/book/9789401798150), by Yadav and Kumar.\n", + "\n", + "**Physics informed neural networks.**\n", + "\n", + "[Scientific Machine Learning Through Physics–Informed Neural Networks: Where we are and What’s Next](https://link.springer.com/article/10.1007/s10915-022-01939-z), by Cuomo et al\n", + "\n", + "**Thanks to Kristine Baluka Hein.**\n", + "\n", + "The lectures on differential equations were developed by Kristine Baluka Hein, now PhD student at IFI.\n", + "A great thanks to Kristine." + ] + }, + { + "cell_type": "markdown", + "id": "191ba3eb", + "metadata": { + "editable": true + }, + "source": [ + "## Ordinary Differential Equations first\n", + "\n", + "An ordinary differential equation (ODE) is an equation involving functions having one variable.\n", + "\n", + "In general, an ordinary differential equation looks like" + ] + }, + { + "cell_type": "markdown", + "id": "a0be312a", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
\n", + "\n", + "$$\n", + "\\begin{equation} \\label{ode} \\tag{1}\n", + "f\\left(x, \\, g(x), \\, g'(x), \\, g''(x), \\, \\dots \\, , \\, g^{(n)}(x)\\right) = 0\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "000663cf", + "metadata": { + "editable": true + }, + "source": [ + "where $g(x)$ is the function to find, and $g^{(n)}(x)$ is the $n$-th derivative of $g(x)$.\n", + "\n", + "The $f\\left(x, g(x), g'(x), g''(x), \\, \\dots \\, , g^{(n)}(x)\\right)$ is just a way to write that there is an expression involving $x$ and $g(x), \\ g'(x), \\ g''(x), \\, \\dots \\, , \\text{ and } g^{(n)}(x)$ on the left side of the equality sign in ([1](#ode)).\n", + "The highest order of derivative, that is the value of $n$, determines to the order of the equation.\n", + "The equation is referred to as a $n$-th order ODE.\n", + "Along with ([1](#ode)), some additional conditions of the function $g(x)$ are typically given\n", + "for the solution to be unique." + ] + }, + { + "cell_type": "markdown", + "id": "f5b87995", + "metadata": { + "editable": true + }, + "source": [ + "## The trial solution\n", + "\n", + "Let the trial solution $g_t(x)$ be" + ] + }, + { + "cell_type": "markdown", + "id": "a166c0b6", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
\n", + "\n", + "$$\n", + "\\begin{equation}\n", + "\tg_t(x) = h_1(x) + h_2(x,N(x,P))\n", + "\\label{_auto1} \\tag{2}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "f1e49a2c", + "metadata": { + "editable": true + }, + "source": [ + "where $h_1(x)$ is a function that makes $g_t(x)$ satisfy a given set\n", + "of conditions, $N(x,P)$ a neural network with weights and biases\n", + "described by $P$ and $h_2(x, N(x,P))$ some expression involving the\n", + "neural network. The role of the function $h_2(x, N(x,P))$, is to\n", + "ensure that the output from $N(x,P)$ is zero when $g_t(x)$ is\n", + "evaluated at the values of $x$ where the given conditions must be\n", + "satisfied. The function $h_1(x)$ should alone make $g_t(x)$ satisfy\n", + "the conditions.\n", + "\n", + "But what about the network $N(x,P)$?\n", + "\n", + "As described previously, an optimization method could be used to minimize the parameters of a neural network, that being its weights and biases, through backward propagation." + ] + }, + { + "cell_type": "markdown", + "id": "207d1a97", + "metadata": { + "editable": true + }, + "source": [ + "## Minimization process\n", + "\n", + "For the minimization to be defined, we need to have a cost function at hand to minimize.\n", + "\n", + "It is given that $f\\left(x, \\, g(x), \\, g'(x), \\, g''(x), \\, \\dots \\, , \\, g^{(n)}(x)\\right)$ should be equal to zero in ([1](#ode)).\n", + "We can choose to consider the mean squared error as the cost function for an input $x$.\n", + "Since we are looking at one input, the cost function is just $f$ squared.\n", + "The cost function $c\\left(x, P \\right)$ can therefore be expressed as" + ] + }, + { + "cell_type": "markdown", + "id": "94a061a1", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C\\left(x, P\\right) = \\big(f\\left(x, \\, g(x), \\, g'(x), \\, g''(x), \\, \\dots \\, , \\, g^{(n)}(x)\\right)\\big)^2\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "93244d03", + "metadata": { + "editable": true + }, + "source": [ + "If $N$ inputs are given as a vector $\\boldsymbol{x}$ with elements $x_i$ for $i = 1,\\dots,N$,\n", + "the cost function becomes" + ] + }, + { + "cell_type": "markdown", + "id": "6dc16fd4", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
\n", + "\n", + "$$\n", + "\\begin{equation} \\label{cost} \\tag{3}\n", + "\tC\\left(\\boldsymbol{x}, P\\right) = \\frac{1}{N} \\sum_{i=1}^N \\big(f\\left(x_i, \\, g(x_i), \\, g'(x_i), \\, g''(x_i), \\, \\dots \\, , \\, g^{(n)}(x_i)\\right)\\big)^2\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "01f4c14a", + "metadata": { + "editable": true + }, + "source": [ + "The neural net should then find the parameters $P$ that minimizes the cost function in\n", + "([3](#cost)) for a set of $N$ training samples $x_i$." + ] + }, + { + "cell_type": "markdown", + "id": "1784066c", + "metadata": { + "editable": true + }, + "source": [ + "## Minimizing the cost function using gradient descent and automatic differentiation\n", + "\n", + "To perform the minimization using gradient descent, the gradient of $C\\left(\\boldsymbol{x}, P\\right)$ is needed.\n", + "It might happen so that finding an analytical expression of the gradient of $C(\\boldsymbol{x}, P)$ from ([3](#cost)) gets too messy, depending on which cost function one desires to use.\n", + "\n", + "Luckily, there exists libraries that makes the job for us through automatic differentiation.\n", + "Automatic differentiation is a method of finding the derivatives numerically with very high precision." + ] + }, + { + "cell_type": "markdown", + "id": "43e1b7bf", + "metadata": { + "editable": true + }, + "source": [ + "## Example: Exponential decay\n", + "\n", + "An exponential decay of a quantity $g(x)$ is described by the equation" + ] + }, + { + "cell_type": "markdown", + "id": "5c28e60a", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
\n", + "\n", + "$$\n", + "\\begin{equation} \\label{solve_expdec} \\tag{4}\n", + " g'(x) = -\\gamma g(x)\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "cfd2e420", + "metadata": { + "editable": true + }, + "source": [ + "with $g(0) = g_0$ for some chosen initial value $g_0$.\n", + "\n", + "The analytical solution of ([4](#solve_expdec)) is" + ] + }, + { + "cell_type": "markdown", + "id": "b93aa0f8", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
\n", + "\n", + "$$\n", + "\\begin{equation}\n", + " g(x) = g_0 \\exp\\left(-\\gamma x\\right)\n", + "\\label{_auto2} \\tag{5}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "093952f0", + "metadata": { + "editable": true + }, + "source": [ + "Having an analytical solution at hand, it is possible to use it to compare how well a neural network finds a solution of ([4](#solve_expdec))." + ] + }, + { + "cell_type": "markdown", + "id": "8f82fa61", + "metadata": { + "editable": true + }, + "source": [ + "## The function to solve for\n", + "\n", + "The program will use a neural network to solve" + ] + }, + { + "cell_type": "markdown", + "id": "027d9c52", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
\n", + "\n", + "$$\n", + "\\begin{equation} \\label{solveode} \\tag{6}\n", + "g'(x) = -\\gamma g(x)\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "c18c4ee8", + "metadata": { + "editable": true + }, + "source": [ + "where $g(0) = g_0$ with $\\gamma$ and $g_0$ being some chosen values.\n", + "\n", + "In this example, $\\gamma = 2$ and $g_0 = 10$." + ] + }, + { + "cell_type": "markdown", + "id": "a0d7fc0a", + "metadata": { + "editable": true + }, + "source": [ + "## The trial solution\n", + "To begin with, a trial solution $g_t(t)$ must be chosen. A general trial solution for ordinary differential equations could be" + ] + }, + { + "cell_type": "markdown", + "id": "73cd72f4", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "g_t(x, P) = h_1(x) + h_2(x, N(x, P))\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "a4d0850f", + "metadata": { + "editable": true + }, + "source": [ + "with $h_1(x)$ ensuring that $g_t(x)$ satisfies some conditions and $h_2(x,N(x, P))$ an expression involving $x$ and the output from the neural network $N(x,P)$ with $P $ being the collection of the weights and biases for each layer. For now, it is assumed that the network consists of one input layer, one hidden layer, and one output layer." + ] + }, + { + "cell_type": "markdown", + "id": "62f3b94f", + "metadata": { + "editable": true + }, + "source": [ + "## Setup of Network\n", + "\n", + "In this network, there are no weights and bias at the input layer, so $P = \\{ P_{\\text{hidden}}, P_{\\text{output}} \\}$.\n", + "If there are $N_{\\text{hidden} }$ neurons in the hidden layer, then $P_{\\text{hidden}}$ is a $N_{\\text{hidden} } \\times (1 + N_{\\text{input}})$ matrix, given that there are $N_{\\text{input}}$ neurons in the input layer.\n", + "\n", + "The first column in $P_{\\text{hidden} }$ represents the bias for each neuron in the hidden layer and the second column represents the weights for each neuron in the hidden layer from the input layer.\n", + "If there are $N_{\\text{output} }$ neurons in the output layer, then $P_{\\text{output}} $ is a $N_{\\text{output} } \\times (1 + N_{\\text{hidden} })$ matrix.\n", + "\n", + "Its first column represents the bias of each neuron and the remaining columns represents the weights to each neuron.\n", + "\n", + "It is given that $g(0) = g_0$. The trial solution must fulfill this condition to be a proper solution of ([6](#solveode)). A possible way to ensure that $g_t(0, P) = g_0$, is to let $F(N(x,P)) = x \\cdot N(x,P)$ and $A(x) = g_0$. This gives the following trial solution:" + ] + }, + { + "cell_type": "markdown", + "id": "f5144858", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
\n", + "\n", + "$$\n", + "\\begin{equation} \\label{trial} \\tag{7}\n", + "g_t(x, P) = g_0 + x \\cdot N(x, P)\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "6b441362", + "metadata": { + "editable": true + }, + "source": [ + "## Reformulating the problem\n", + "\n", + "We wish that our neural network manages to minimize a given cost function.\n", + "\n", + "A reformulation of out equation, ([6](#solveode)), must therefore be done,\n", + "such that it describes the problem a neural network can solve for.\n", + "\n", + "The neural network must find the set of weights and biases $P$ such that the trial solution in ([7](#trial)) satisfies ([6](#solveode)).\n", + "\n", + "The trial solution" + ] + }, + { + "cell_type": "markdown", + "id": "abfe2d6d", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "g_t(x, P) = g_0 + x \\cdot N(x, P)\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "aabb6c7b", + "metadata": { + "editable": true + }, + "source": [ + "has been chosen such that it already solves the condition $g(0) = g_0$. What remains, is to find $P$ such that" + ] + }, + { + "cell_type": "markdown", + "id": "11fc8b1b", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
\n", + "\n", + "$$\n", + "\\begin{equation} \\label{nnmin} \\tag{8}\n", + "g_t'(x, P) = - \\gamma g_t(x, P)\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "604c92b4", + "metadata": { + "editable": true + }, + "source": [ + "is fulfilled as *best as possible*." + ] + }, + { + "cell_type": "markdown", + "id": "e2cd7572", + "metadata": { + "editable": true + }, + "source": [ + "## More technicalities\n", + "\n", + "The left hand side and right hand side of ([8](#nnmin)) must be computed separately, and then the neural network must choose weights and biases, contained in $P$, such that the sides are equal as best as possible.\n", + "This means that the absolute or squared difference between the sides must be as close to zero, ideally equal to zero.\n", + "In this case, the difference squared shows to be an appropriate measurement of how erroneous the trial solution is with respect to $P$ of the neural network.\n", + "\n", + "This gives the following cost function our neural network must solve for:" + ] + }, + { + "cell_type": "markdown", + "id": "d916a5f6", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\min_{P}\\Big\\{ \\big(g_t'(x, P) - ( -\\gamma g_t(x, P) \\big)^2 \\Big\\}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "d746e69c", + "metadata": { + "editable": true + }, + "source": [ + "(the notation $\\min_{P}\\{ f(x, P) \\}$ means that we desire to find $P$ that yields the minimum of $f(x, P)$)\n", + "\n", + "or, in terms of weights and biases for the hidden and output layer in our network:" + ] + }, + { + "cell_type": "markdown", + "id": "4c34c242", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\min_{P_{\\text{hidden} }, \\ P_{\\text{output} }}\\Big\\{ \\big(g_t'(x, \\{ P_{\\text{hidden} }, P_{\\text{output} }\\}) - ( -\\gamma g_t(x, \\{ P_{\\text{hidden} }, P_{\\text{output} }\\}) \\big)^2 \\Big\\}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "f55f3047", + "metadata": { + "editable": true + }, + "source": [ + "for an input value $x$." + ] + }, + { + "cell_type": "markdown", + "id": "485e4671", + "metadata": { + "editable": true + }, + "source": [ + "## More details\n", + "\n", + "If the neural network evaluates $g_t(x, P)$ at more values for $x$, say $N$ values $x_i$ for $i = 1, \\dots, N$, then the *total* error to minimize becomes" + ] + }, + { + "cell_type": "markdown", + "id": "5628ca35", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
\n", + "\n", + "$$\n", + "\\begin{equation} \\label{min} \\tag{9}\n", + "\\min_{P}\\Big\\{\\frac{1}{N} \\sum_{i=1}^N \\big(g_t'(x_i, P) - ( -\\gamma g_t(x_i, P) \\big)^2 \\Big\\}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "da2c90ea", + "metadata": { + "editable": true + }, + "source": [ + "Letting $\\boldsymbol{x}$ be a vector with elements $x_i$ and $C(\\boldsymbol{x}, P) = \\frac{1}{N} \\sum_i \\big(g_t'(x_i, P) - ( -\\gamma g_t(x_i, P) \\big)^2$ denote the cost function, the minimization problem that our network must solve, becomes" + ] + }, + { + "cell_type": "markdown", + "id": "d386a466", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\min_{P} C(\\boldsymbol{x}, P)\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "ec3d975a", + "metadata": { + "editable": true + }, + "source": [ + "In terms of $P_{\\text{hidden} }$ and $P_{\\text{output} }$, this could also be expressed as\n", + "\n", + "$$\n", + "\\min_{P_{\\text{hidden} }, \\ P_{\\text{output} }} C(\\boldsymbol{x}, \\{P_{\\text{hidden} }, P_{\\text{output} }\\})\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "4f0f47e7", + "metadata": { + "editable": true + }, + "source": [ + "## A possible implementation of a neural network\n", + "\n", + "For simplicity, it is assumed that the input is an array $\\boldsymbol{x} = (x_1, \\dots, x_N)$ with $N$ elements. It is at these points the neural network should find $P$ such that it fulfills ([9](#min)).\n", + "\n", + "First, the neural network must feed forward the inputs.\n", + "This means that $\\boldsymbol{x}s$ must be passed through an input layer, a hidden layer and a output layer. The input layer in this case, does not need to process the data any further.\n", + "The input layer will consist of $N_{\\text{input} }$ neurons, passing its element to each neuron in the hidden layer. The number of neurons in the hidden layer will be $N_{\\text{hidden} }$." 
+ ] + }, + { + "cell_type": "markdown", + "id": "a757d9cf", + "metadata": { + "editable": true + }, + "source": [ + "## Technicalities\n", + "\n", + "For the $i$-th in the hidden layer with weight $w_i^{\\text{hidden} }$ and bias $b_i^{\\text{hidden} }$, the weighting from the $j$-th neuron at the input layer is:" + ] + }, + { + "cell_type": "markdown", + "id": "ee093dd9", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{aligned}\n", + "z_{i,j}^{\\text{hidden}} &= b_i^{\\text{hidden}} + w_i^{\\text{hidden}}x_j \\\\\n", + "&=\n", + "\\begin{pmatrix}\n", + "b_i^{\\text{hidden}} & w_i^{\\text{hidden}}\n", + "\\end{pmatrix}\n", + "\\begin{pmatrix}\n", + "1 \\\\\n", + "x_j\n", + "\\end{pmatrix}\n", + "\\end{aligned}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "4d3954bf", + "metadata": { + "editable": true + }, + "source": [ + "## Final technicalities I\n", + "\n", + "The result after weighting the inputs at the $i$-th hidden neuron can be written as a vector:" + ] + }, + { + "cell_type": "markdown", + "id": "b4b36b8c", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{aligned}\n", + "\\boldsymbol{z}_{i}^{\\text{hidden}} &= \\Big( b_i^{\\text{hidden}} + w_i^{\\text{hidden}}x_1 , \\ b_i^{\\text{hidden}} + w_i^{\\text{hidden}} x_2, \\ \\dots \\, , \\ b_i^{\\text{hidden}} + w_i^{\\text{hidden}} x_N\\Big) \\\\\n", + "&=\n", + "\\begin{pmatrix}\n", + " b_i^{\\text{hidden}} & w_i^{\\text{hidden}}\n", + "\\end{pmatrix}\n", + "\\begin{pmatrix}\n", + "1 & 1 & \\dots & 1 \\\\\n", + "x_1 & x_2 & \\dots & x_N\n", + "\\end{pmatrix} \\\\\n", + "&= \\boldsymbol{p}_{i, \\text{hidden}}^T X\n", + "\\end{aligned}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "36e8a1dd", + "metadata": { + "editable": true + }, + "source": [ + "## Final technicalities II\n", + "\n", + "The vector $\\boldsymbol{p}_{i, \\text{hidden}}^T$ constitutes each row in $P_{\\text{hidden} }$, which contains the weights for the neural network to minimize according to ([9](#min)).\n", + "\n", + "After having found $\\boldsymbol{z}_{i}^{\\text{hidden}} $ for every $i$-th neuron within the hidden layer, the vector will be sent to an activation function $a_i(\\boldsymbol{z})$.\n", + "\n", + "In this example, the sigmoid function has been chosen to be the activation function for each hidden neuron:" + ] + }, + { + "cell_type": "markdown", + "id": "af2e68be", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "f(z) = \\frac{1}{1 + \\exp{(-z)}}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "7b8922c6", + "metadata": { + "editable": true + }, + "source": [ + "It is possible to use other activations functions for the hidden layer also.\n", + "\n", + "The output $\\boldsymbol{x}_i^{\\text{hidden}}$ from each $i$-th hidden neuron is:\n", + "\n", + "$$\n", + "\\boldsymbol{x}_i^{\\text{hidden} } = f\\big( \\boldsymbol{z}_{i}^{\\text{hidden}} \\big)\n", + "$$\n", + "\n", + "The outputs $\\boldsymbol{x}_i^{\\text{hidden} } $ are then sent to the output layer.\n", + "\n", + "The output layer consists of one neuron in this case, and combines the\n", + "output from each of the neurons in the hidden layers. The output layer\n", + "combines the results from the hidden layer using some weights $w_i^{\\text{output}}$\n", + "and biases $b_i^{\\text{output}}$. In this case,\n", + "it is assumes that the number of neurons in the output layer is one." 
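+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "hidden_layer_sketch_note",
+ "metadata": {
+ "editable": true
+ },
+ "source": [
+ "As a minimal sketch of the hidden-layer computation just described (again with hypothetical sizes, independent of the implementation further below), the products $\boldsymbol{p}_{i, \text{hidden}}^T X$ for all hidden neurons at once, followed by the sigmoid, can be written as:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "hidden_layer_sketch",
+ "metadata": {
+ "collapsed": false,
+ "editable": true
+ },
+ "outputs": [],
+ "source": [
+ "import autograd.numpy as np\n",
+ "import autograd.numpy.random as npr\n",
+ "\n",
+ "def sigmoid(z):\n",
+ "    return 1/(1 + np.exp(-z))\n",
+ "\n",
+ "N = 5                                  # hypothetical number of input points\n",
+ "x = np.linspace(0, 1, N)               # the inputs x_1, ..., x_N\n",
+ "P_hidden = npr.randn(10, 2)            # 10 hidden neurons; first column = biases\n",
+ "\n",
+ "# X has a row of ones (for the biases) on top of the row of inputs\n",
+ "X = np.concatenate((np.ones((1, N)), x.reshape(1, N)), axis=0)\n",
+ "\n",
+ "z_hidden = np.matmul(P_hidden, X)      # row i equals p_{i,hidden}^T X\n",
+ "x_hidden = sigmoid(z_hidden)           # hidden-layer outputs, shape (10, N)\n",
+ "print(x_hidden.shape)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "hidden_layer_sketch_followup",
+ "metadata": {
+ "editable": true
+ },
+ "source": [
+ "The output layer, described next, treats the hidden-layer outputs in the same way, using the weights and bias stored in $P_{\text{output} }$."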
+ ] + }, + { + "cell_type": "markdown", + "id": "2aa977d9", + "metadata": { + "editable": true + }, + "source": [ + "## Final technicalities III\n", + "\n", + "The procedure of weighting the output neuron $j$ in the hidden layer to the $i$-th neuron in the output layer is similar as for the hidden layer described previously." + ] + }, + { + "cell_type": "markdown", + "id": "48eccfa6", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{aligned}\n", + "z_{1,j}^{\\text{output}} & =\n", + "\\begin{pmatrix}\n", + "b_1^{\\text{output}} & \\boldsymbol{w}_1^{\\text{output}}\n", + "\\end{pmatrix}\n", + "\\begin{pmatrix}\n", + "1 \\\\\n", + "\\boldsymbol{x}_j^{\\text{hidden}}\n", + "\\end{pmatrix}\n", + "\\end{aligned}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "d4c2cdbf", + "metadata": { + "editable": true + }, + "source": [ + "## Final technicalities IV\n", + "\n", + "Expressing $z_{1,j}^{\\text{output}}$ as a vector gives the following way of weighting the inputs from the hidden layer:" + ] + }, + { + "cell_type": "markdown", + "id": "be26d9c9", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{z}_{1}^{\\text{output}} =\n", + "\\begin{pmatrix}\n", + "b_1^{\\text{output}} & \\boldsymbol{w}_1^{\\text{output}}\n", + "\\end{pmatrix}\n", + "\\begin{pmatrix}\n", + "1 & 1 & \\dots & 1 \\\\\n", + "\\boldsymbol{x}_1^{\\text{hidden}} & \\boldsymbol{x}_2^{\\text{hidden}} & \\dots & \\boldsymbol{x}_N^{\\text{hidden}}\n", + "\\end{pmatrix}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "f3703c9a", + "metadata": { + "editable": true + }, + "source": [ + "In this case we seek a continuous range of values since we are approximating a function. This means that after computing $\\boldsymbol{z}_{1}^{\\text{output}}$ the neural network has finished its feed forward step, and $\\boldsymbol{z}_{1}^{\\text{output}}$ is the final output of the network." + ] + }, + { + "cell_type": "markdown", + "id": "9859680c", + "metadata": { + "editable": true + }, + "source": [ + "## Back propagation\n", + "\n", + "The next step is to decide how the parameters should be changed such that they minimize the cost function.\n", + "\n", + "The chosen cost function for this problem is" + ] + }, + { + "cell_type": "markdown", + "id": "c3df269d", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C(\\boldsymbol{x}, P) = \\frac{1}{N} \\sum_i \\big(g_t'(x_i, P) - ( -\\gamma g_t(x_i, P) \\big)^2\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "dc69023a", + "metadata": { + "editable": true + }, + "source": [ + "In order to minimize the cost function, an optimization method must be chosen.\n", + "\n", + "Here, gradient descent with a constant step size has been chosen." 
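+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "In the programs below, the backpropagation step itself is delegated to Autograd: grad(cost_function, 0) returns a function that evaluates the gradient of the cost with respect to the parameter set $P$. The toy example below uses a stand-in quadratic cost and arbitrary parameter shapes, not the ODE cost itself; it is only a minimal sketch of the constant-step-size update that the next section states more precisely.\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import autograd.numpy as np\n",
+ "from autograd import grad\n",
+ "\n",
+ "# Stand-in cost: the sum of squared parameters (x is unused, kept only for the call signature)\n",
+ "def toy_cost(P, x):\n",
+ "    return np.sum(P[0]**2) + np.sum(P[1]**2)\n",
+ "\n",
+ "# Gradient descent with a constant step size on a parameter set P given as a list of arrays\n",
+ "def gradient_descent(P, x, cost_function, lmb=0.1, num_iter=100):\n",
+ "    grad_cost = grad(cost_function, 0)      # gradient w.r.t. the 0-th argument, i.e. P\n",
+ "    for _ in range(num_iter):\n",
+ "        gradients = grad_cost(P, x)         # one gradient array per array in P\n",
+ "        P = [p - lmb*g for p, g in zip(P, gradients)]\n",
+ "    return P\n",
+ "\n",
+ "P = [np.ones((2, 2)), np.ones((1, 3))]\n",
+ "P = gradient_descent(P, None, toy_cost)\n",
+ "print(toy_cost(P, None))                    # close to zero after the updates"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "The full solver below does exactly this, with the ODE cost function and the network parameters in place of the toy ingredients."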
+ ] + }, + { + "cell_type": "markdown", + "id": "d4bed3bd", + "metadata": { + "editable": true + }, + "source": [ + "## Gradient descent\n", + "\n", + "The idea of the gradient descent algorithm is to update parameters in\n", + "a direction where the cost function decreases goes to a minimum.\n", + "\n", + "In general, the update of some parameters $\\boldsymbol{\\omega}$ given a cost\n", + "function defined by some weights $\\boldsymbol{\\omega}$, $C(\\boldsymbol{x},\n", + "\\boldsymbol{\\omega})$, goes as follows:" + ] + }, + { + "cell_type": "markdown", + "id": "ed2a4f9a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{\\omega}_{\\text{new} } = \\boldsymbol{\\omega} - \\lambda \\nabla_{\\boldsymbol{\\omega}} C(\\boldsymbol{x}, \\boldsymbol{\\omega})\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b9a4f604", + "metadata": { + "editable": true + }, + "source": [ + "for a number of iterations or until $ \\big|\\big| \\boldsymbol{\\omega}_{\\text{new} } - \\boldsymbol{\\omega} \\big|\\big|$ becomes smaller than some given tolerance.\n", + "\n", + "The value of $\\lambda$ decides how large steps the algorithm must take\n", + "in the direction of $ \\nabla_{\\boldsymbol{\\omega}} C(\\boldsymbol{x}, \\boldsymbol{\\omega})$.\n", + "The notation $\\nabla_{\\boldsymbol{\\omega}}$ express the gradient with respect\n", + "to the elements in $\\boldsymbol{\\omega}$.\n", + "\n", + "In our case, we have to minimize the cost function $C(\\boldsymbol{x}, P)$ with\n", + "respect to the two sets of weights and biases, that is for the hidden\n", + "layer $P_{\\text{hidden} }$ and for the output layer $P_{\\text{output}\n", + "}$ .\n", + "\n", + "This means that $P_{\\text{hidden} }$ and $P_{\\text{output} }$ is updated by" + ] + }, + { + "cell_type": "markdown", + "id": "e48d507f", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{aligned}\n", + "P_{\\text{hidden},\\text{new}} &= P_{\\text{hidden}} - \\lambda \\nabla_{P_{\\text{hidden}}} C(\\boldsymbol{x}, P) \\\\\n", + "P_{\\text{output},\\text{new}} &= P_{\\text{output}} - \\lambda \\nabla_{P_{\\text{output}}} C(\\boldsymbol{x}, P)\n", + "\\end{aligned}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b84c5cf5", + "metadata": { + "editable": true + }, + "source": [ + "## The code for solving the ODE" + ] + }, + { + "cell_type": "code", + "execution_count": 31, + "id": "293d0f7d", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import autograd.numpy as np\n", + "from autograd import grad, elementwise_grad\n", + "import autograd.numpy.random as npr\n", + "from matplotlib import pyplot as plt\n", + "\n", + "def sigmoid(z):\n", + " return 1/(1 + np.exp(-z))\n", + "\n", + "# Assuming one input, hidden, and output layer\n", + "def neural_network(params, x):\n", + "\n", + " # Find the weights (including and biases) for the hidden and output layer.\n", + " # Assume that params is a list of parameters for each layer.\n", + " # The biases are the first element for each array in params,\n", + " # and the weights are the remaning elements in each array in params.\n", + "\n", + " w_hidden = params[0]\n", + " w_output = params[1]\n", + "\n", + " # Assumes input x being an one-dimensional array\n", + " num_values = np.size(x)\n", + " x = x.reshape(-1, num_values)\n", + "\n", + " # Assume that the input layer does nothing to the input x\n", + " x_input = x\n", + "\n", + " ## Hidden layer:\n", + "\n", + " # Add a row of ones to include bias\n", 
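+ "    # (This concatenation implements the augmented input (1, x_j)^T from the matrix\n",
+ "    # formulation above, so each row of w_hidden stores both the bias b_i and the weight w_i.)\n",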
+ " x_input = np.concatenate((np.ones((1,num_values)), x_input ), axis = 0)\n", + "\n", + " z_hidden = np.matmul(w_hidden, x_input)\n", + " x_hidden = sigmoid(z_hidden)\n", + "\n", + " ## Output layer:\n", + "\n", + " # Include bias:\n", + " x_hidden = np.concatenate((np.ones((1,num_values)), x_hidden ), axis = 0)\n", + "\n", + " z_output = np.matmul(w_output, x_hidden)\n", + " x_output = z_output\n", + "\n", + " return x_output\n", + "\n", + "# The trial solution using the deep neural network:\n", + "def g_trial(x,params, g0 = 10):\n", + " return g0 + x*neural_network(params,x)\n", + "\n", + "# The right side of the ODE:\n", + "def g(x, g_trial, gamma = 2):\n", + " return -gamma*g_trial\n", + "\n", + "# The cost function:\n", + "def cost_function(P, x):\n", + "\n", + " # Evaluate the trial function with the current parameters P\n", + " g_t = g_trial(x,P)\n", + "\n", + " # Find the derivative w.r.t x of the neural network\n", + " d_net_out = elementwise_grad(neural_network,1)(P,x)\n", + "\n", + " # Find the derivative w.r.t x of the trial function\n", + " d_g_t = elementwise_grad(g_trial,0)(x,P)\n", + "\n", + " # The right side of the ODE\n", + " func = g(x, g_t)\n", + "\n", + " err_sqr = (d_g_t - func)**2\n", + " cost_sum = np.sum(err_sqr)\n", + "\n", + " return cost_sum / np.size(err_sqr)\n", + "\n", + "# Solve the exponential decay ODE using neural network with one input, hidden, and output layer\n", + "def solve_ode_neural_network(x, num_neurons_hidden, num_iter, lmb):\n", + " ## Set up initial weights and biases\n", + "\n", + " # For the hidden layer\n", + " p0 = npr.randn(num_neurons_hidden, 2 )\n", + "\n", + " # For the output layer\n", + " p1 = npr.randn(1, num_neurons_hidden + 1 ) # +1 since bias is included\n", + "\n", + " P = [p0, p1]\n", + "\n", + " print('Initial cost: %g'%cost_function(P, x))\n", + "\n", + " ## Start finding the optimal weights using gradient descent\n", + "\n", + " # Find the Python function that represents the gradient of the cost function\n", + " # w.r.t the 0-th input argument -- that is the weights and biases in the hidden and output layer\n", + " cost_function_grad = grad(cost_function,0)\n", + "\n", + " # Let the update be done num_iter times\n", + " for i in range(num_iter):\n", + " # Evaluate the gradient at the current weights and biases in P.\n", + " # The cost_grad consist now of two arrays;\n", + " # one for the gradient w.r.t P_hidden and\n", + " # one for the gradient w.r.t P_output\n", + " cost_grad = cost_function_grad(P, x)\n", + "\n", + " P[0] = P[0] - lmb * cost_grad[0]\n", + " P[1] = P[1] - lmb * cost_grad[1]\n", + "\n", + " print('Final cost: %g'%cost_function(P, x))\n", + "\n", + " return P\n", + "\n", + "def g_analytic(x, gamma = 2, g0 = 10):\n", + " return g0*np.exp(-gamma*x)\n", + "\n", + "# Solve the given problem\n", + "if __name__ == '__main__':\n", + " # Set seed such that the weight are initialized\n", + " # with same weights and biases for every run.\n", + " npr.seed(15)\n", + "\n", + " ## Decide the vales of arguments to the function to solve\n", + " N = 10\n", + " x = np.linspace(0, 1, N)\n", + "\n", + " ## Set up the initial parameters\n", + " num_hidden_neurons = 10\n", + " num_iter = 10000\n", + " lmb = 0.001\n", + "\n", + " # Use the network\n", + " P = solve_ode_neural_network(x, num_hidden_neurons, num_iter, lmb)\n", + "\n", + " # Print the deviation from the trial solution and true solution\n", + " res = g_trial(x,P)\n", + " res_analytical = g_analytic(x)\n", + "\n", + " print('Max absolute difference: 
%g'%np.max(np.abs(res - res_analytical)))\n", + "\n", + " # Plot the results\n", + " plt.figure(figsize=(10,10))\n", + "\n", + " plt.title('Performance of neural network solving an ODE compared to the analytical solution')\n", + " plt.plot(x, res_analytical)\n", + " plt.plot(x, res[0,:])\n", + " plt.legend(['analytical','nn'])\n", + " plt.xlabel('x')\n", + " plt.ylabel('g(x)')\n", + " plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "54c070e1", + "metadata": { + "editable": true + }, + "source": [ + "## The network with one input layer, specified number of hidden layers, and one output layer\n", + "\n", + "It is also possible to extend the construction of our network into a more general one, allowing the network to contain more than one hidden layers.\n", + "\n", + "The number of neurons within each hidden layer are given as a list of integers in the program below." + ] + }, + { + "cell_type": "code", + "execution_count": 32, + "id": "4ab2467e", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import autograd.numpy as np\n", + "from autograd import grad, elementwise_grad\n", + "import autograd.numpy.random as npr\n", + "from matplotlib import pyplot as plt\n", + "\n", + "def sigmoid(z):\n", + " return 1/(1 + np.exp(-z))\n", + "\n", + "# The neural network with one input layer and one output layer,\n", + "# but with number of hidden layers specified by the user.\n", + "def deep_neural_network(deep_params, x):\n", + " # N_hidden is the number of hidden layers\n", + " # deep_params is a list, len() should be used\n", + " N_hidden = len(deep_params) - 1 # -1 since params consists of\n", + " # parameters to all the hidden\n", + " # layers AND the output layer.\n", + "\n", + " # Assumes input x being an one-dimensional array\n", + " num_values = np.size(x)\n", + " x = x.reshape(-1, num_values)\n", + "\n", + " # Assume that the input layer does nothing to the input x\n", + " x_input = x\n", + "\n", + " # Due to multiple hidden layers, define a variable referencing to the\n", + " # output of the previous layer:\n", + " x_prev = x_input\n", + "\n", + " ## Hidden layers:\n", + "\n", + " for l in range(N_hidden):\n", + " # From the list of parameters P; find the correct weigths and bias for this layer\n", + " w_hidden = deep_params[l]\n", + "\n", + " # Add a row of ones to include bias\n", + " x_prev = np.concatenate((np.ones((1,num_values)), x_prev ), axis = 0)\n", + "\n", + " z_hidden = np.matmul(w_hidden, x_prev)\n", + " x_hidden = sigmoid(z_hidden)\n", + "\n", + " # Update x_prev such that next layer can use the output from this layer\n", + " x_prev = x_hidden\n", + "\n", + " ## Output layer:\n", + "\n", + " # Get the weights and bias for this layer\n", + " w_output = deep_params[-1]\n", + "\n", + " # Include bias:\n", + " x_prev = np.concatenate((np.ones((1,num_values)), x_prev), axis = 0)\n", + "\n", + " z_output = np.matmul(w_output, x_prev)\n", + " x_output = z_output\n", + "\n", + " return x_output\n", + "\n", + "# The trial solution using the deep neural network:\n", + "def g_trial_deep(x,params, g0 = 10):\n", + " return g0 + x*deep_neural_network(params, x)\n", + "\n", + "# The right side of the ODE:\n", + "def g(x, g_trial, gamma = 2):\n", + " return -gamma*g_trial\n", + "\n", + "# The same cost function as before, but calls deep_neural_network instead.\n", + "def cost_function_deep(P, x):\n", + "\n", + " # Evaluate the trial function with the current parameters P\n", + " g_t = g_trial_deep(x,P)\n", + "\n", + " # Find the 
derivative w.r.t x of the neural network\n", + " d_net_out = elementwise_grad(deep_neural_network,1)(P,x)\n", + "\n", + " # Find the derivative w.r.t x of the trial function\n", + " d_g_t = elementwise_grad(g_trial_deep,0)(x,P)\n", + "\n", + " # The right side of the ODE\n", + " func = g(x, g_t)\n", + "\n", + " err_sqr = (d_g_t - func)**2\n", + " cost_sum = np.sum(err_sqr)\n", + "\n", + " return cost_sum / np.size(err_sqr)\n", + "\n", + "# Solve the exponential decay ODE using neural network with one input and one output layer,\n", + "# but with specified number of hidden layers from the user.\n", + "def solve_ode_deep_neural_network(x, num_neurons, num_iter, lmb):\n", + " # num_hidden_neurons is now a list of number of neurons within each hidden layer\n", + "\n", + " # The number of elements in the list num_hidden_neurons thus represents\n", + " # the number of hidden layers.\n", + "\n", + " # Find the number of hidden layers:\n", + " N_hidden = np.size(num_neurons)\n", + "\n", + " ## Set up initial weights and biases\n", + "\n", + " # Initialize the list of parameters:\n", + " P = [None]*(N_hidden + 1) # + 1 to include the output layer\n", + "\n", + " P[0] = npr.randn(num_neurons[0], 2 )\n", + " for l in range(1,N_hidden):\n", + " P[l] = npr.randn(num_neurons[l], num_neurons[l-1] + 1) # +1 to include bias\n", + "\n", + " # For the output layer\n", + " P[-1] = npr.randn(1, num_neurons[-1] + 1 ) # +1 since bias is included\n", + "\n", + " print('Initial cost: %g'%cost_function_deep(P, x))\n", + "\n", + " ## Start finding the optimal weights using gradient descent\n", + "\n", + " # Find the Python function that represents the gradient of the cost function\n", + " # w.r.t the 0-th input argument -- that is the weights and biases in the hidden and output layer\n", + " cost_function_deep_grad = grad(cost_function_deep,0)\n", + "\n", + " # Let the update be done num_iter times\n", + " for i in range(num_iter):\n", + " # Evaluate the gradient at the current weights and biases in P.\n", + " # The cost_grad consist now of N_hidden + 1 arrays; the gradient w.r.t the weights and biases\n", + " # in the hidden layers and output layers evaluated at x.\n", + " cost_deep_grad = cost_function_deep_grad(P, x)\n", + "\n", + " for l in range(N_hidden+1):\n", + " P[l] = P[l] - lmb * cost_deep_grad[l]\n", + "\n", + " print('Final cost: %g'%cost_function_deep(P, x))\n", + "\n", + " return P\n", + "\n", + "def g_analytic(x, gamma = 2, g0 = 10):\n", + " return g0*np.exp(-gamma*x)\n", + "\n", + "# Solve the given problem\n", + "if __name__ == '__main__':\n", + " npr.seed(15)\n", + "\n", + " ## Decide the vales of arguments to the function to solve\n", + " N = 10\n", + " x = np.linspace(0, 1, N)\n", + "\n", + " ## Set up the initial parameters\n", + " num_hidden_neurons = np.array([10,10])\n", + " num_iter = 10000\n", + " lmb = 0.001\n", + "\n", + " P = solve_ode_deep_neural_network(x, num_hidden_neurons, num_iter, lmb)\n", + "\n", + " res = g_trial_deep(x,P)\n", + " res_analytical = g_analytic(x)\n", + "\n", + " plt.figure(figsize=(10,10))\n", + "\n", + " plt.title('Performance of a deep neural network solving an ODE compared to the analytical solution')\n", + " plt.plot(x, res_analytical)\n", + " plt.plot(x, res[0,:])\n", + " plt.legend(['analytical','dnn'])\n", + " plt.ylabel('g(x)')\n", + " plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "05126a03", + "metadata": { + "editable": true + }, + "source": [ + "## Example: Population growth\n", + "\n", + "A logistic model of population growth assumes 
that a population converges toward an equilibrium.\n", + "The population growth can be modeled by" + ] + }, + { + "cell_type": "markdown", + "id": "7b4e9871", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
\n", + "\n", + "$$\n", + "\\begin{equation} \\label{log} \\tag{10}\n", + "\tg'(t) = \\alpha g(t)(A - g(t))\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "20266e3a", + "metadata": { + "editable": true + }, + "source": [ + "where $g(t)$ is the population density at time $t$, $\\alpha > 0$ the growth rate and $A > 0$ is the maximum population number in the environment.\n", + "Also, at $t = 0$ the population has the size $g(0) = g_0$, where $g_0$ is some chosen constant.\n", + "\n", + "In this example, similar network as for the exponential decay using Autograd has been used to solve the equation. However, as the implementation might suffer from e.g numerical instability\n", + "and high execution time (this might be more apparent in the examples solving PDEs),\n", + "using a library like TensorFlow is recommended.\n", + "Here, we stay with a more simple approach and implement for comparison, the simple forward Euler method." + ] + }, + { + "cell_type": "markdown", + "id": "8a3f1b3d", + "metadata": { + "editable": true + }, + "source": [ + "## Setting up the problem\n", + "\n", + "Here, we will model a population $g(t)$ in an environment having carrying capacity $A$.\n", + "The population follows the model" + ] + }, + { + "cell_type": "markdown", + "id": "14dfc04b", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
\n", + "\n", + "$$\n", + "\\begin{equation} \\label{solveode_population} \\tag{11}\n", + "g'(t) = \\alpha g(t)(A - g(t))\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b125d1d3", + "metadata": { + "editable": true + }, + "source": [ + "where $g(0) = g_0$.\n", + "\n", + "In this example, we let $\\alpha = 2$, $A = 1$, and $g_0 = 1.2$." + ] + }, + { + "cell_type": "markdown", + "id": "226a3528", + "metadata": { + "editable": true + }, + "source": [ + "## The trial solution\n", + "\n", + "We will get a slightly different trial solution, as the boundary conditions are different\n", + "compared to the case for exponential decay.\n", + "\n", + "A possible trial solution satisfying the condition $g(0) = g_0$ could be\n", + "\n", + "$$\n", + "h_1(t) = g_0 + t \\cdot N(t,P)\n", + "$$\n", + "\n", + "with $N(t,P)$ being the output from the neural network with weights and biases for each layer collected in the set $P$.\n", + "\n", + "The analytical solution is\n", + "\n", + "$$\n", + "g(t) = \\frac{Ag_0}{g_0 + (A - g_0)\\exp(-\\alpha A t)}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "adeeb731", + "metadata": { + "editable": true + }, + "source": [ + "## The program using Autograd\n", + "\n", + "The network will be the similar as for the exponential decay example, but with some small modifications for our problem." + ] + }, + { + "cell_type": "code", + "execution_count": 33, + "id": "eb3ed6d1", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import autograd.numpy as np\n", + "from autograd import grad, elementwise_grad\n", + "import autograd.numpy.random as npr\n", + "from matplotlib import pyplot as plt\n", + "\n", + "def sigmoid(z):\n", + " return 1/(1 + np.exp(-z))\n", + "\n", + "# Function to get the parameters.\n", + "# Done such that one can easily change the paramaters after one's liking.\n", + "def get_parameters():\n", + " alpha = 2\n", + " A = 1\n", + " g0 = 1.2\n", + " return alpha, A, g0\n", + "\n", + "def deep_neural_network(deep_params, x):\n", + " # N_hidden is the number of hidden layers\n", + " # deep_params is a list, len() should be used\n", + " N_hidden = len(deep_params) - 1 # -1 since params consists of\n", + " # parameters to all the hidden\n", + " # layers AND the output layer.\n", + "\n", + " # Assumes input x being an one-dimensional array\n", + " num_values = np.size(x)\n", + " x = x.reshape(-1, num_values)\n", + "\n", + " # Assume that the input layer does nothing to the input x\n", + " x_input = x\n", + "\n", + " # Due to multiple hidden layers, define a variable referencing to the\n", + " # output of the previous layer:\n", + " x_prev = x_input\n", + "\n", + " ## Hidden layers:\n", + "\n", + " for l in range(N_hidden):\n", + " # From the list of parameters P; find the correct weigths and bias for this layer\n", + " w_hidden = deep_params[l]\n", + "\n", + " # Add a row of ones to include bias\n", + " x_prev = np.concatenate((np.ones((1,num_values)), x_prev ), axis = 0)\n", + "\n", + " z_hidden = np.matmul(w_hidden, x_prev)\n", + " x_hidden = sigmoid(z_hidden)\n", + "\n", + " # Update x_prev such that next layer can use the output from this layer\n", + " x_prev = x_hidden\n", + "\n", + " ## Output layer:\n", + "\n", + " # Get the weights and bias for this layer\n", + " w_output = deep_params[-1]\n", + "\n", + " # Include bias:\n", + " x_prev = np.concatenate((np.ones((1,num_values)), x_prev), axis = 0)\n", + "\n", + " z_output = np.matmul(w_output, x_prev)\n", + " x_output = 
z_output\n", + "\n", + " return x_output\n", + "\n", + "\n", + "\n", + "\n", + "def cost_function_deep(P, x):\n", + "\n", + " # Evaluate the trial function with the current parameters P\n", + " g_t = g_trial_deep(x,P)\n", + "\n", + " # Find the derivative w.r.t x of the trial function\n", + " d_g_t = elementwise_grad(g_trial_deep,0)(x,P)\n", + "\n", + " # The right side of the ODE\n", + " func = f(x, g_t)\n", + "\n", + " err_sqr = (d_g_t - func)**2\n", + " cost_sum = np.sum(err_sqr)\n", + "\n", + " return cost_sum / np.size(err_sqr)\n", + "\n", + "# The right side of the ODE:\n", + "def f(x, g_trial):\n", + " alpha,A, g0 = get_parameters()\n", + " return alpha*g_trial*(A - g_trial)\n", + "\n", + "# The trial solution using the deep neural network:\n", + "def g_trial_deep(x, params):\n", + " alpha,A, g0 = get_parameters()\n", + " return g0 + x*deep_neural_network(params,x)\n", + "\n", + "# The analytical solution:\n", + "def g_analytic(t):\n", + " alpha,A, g0 = get_parameters()\n", + " return A*g0/(g0 + (A - g0)*np.exp(-alpha*A*t))\n", + "\n", + "def solve_ode_deep_neural_network(x, num_neurons, num_iter, lmb):\n", + " # num_hidden_neurons is now a list of number of neurons within each hidden layer\n", + "\n", + " # Find the number of hidden layers:\n", + " N_hidden = np.size(num_neurons)\n", + "\n", + " ## Set up initial weigths and biases\n", + "\n", + " # Initialize the list of parameters:\n", + " P = [None]*(N_hidden + 1) # + 1 to include the output layer\n", + "\n", + " P[0] = npr.randn(num_neurons[0], 2 )\n", + " for l in range(1,N_hidden):\n", + " P[l] = npr.randn(num_neurons[l], num_neurons[l-1] + 1) # +1 to include bias\n", + "\n", + " # For the output layer\n", + " P[-1] = npr.randn(1, num_neurons[-1] + 1 ) # +1 since bias is included\n", + "\n", + " print('Initial cost: %g'%cost_function_deep(P, x))\n", + "\n", + " ## Start finding the optimal weigths using gradient descent\n", + "\n", + " # Find the Python function that represents the gradient of the cost function\n", + " # w.r.t the 0-th input argument -- that is the weights and biases in the hidden and output layer\n", + " cost_function_deep_grad = grad(cost_function_deep,0)\n", + "\n", + " # Let the update be done num_iter times\n", + " for i in range(num_iter):\n", + " # Evaluate the gradient at the current weights and biases in P.\n", + " # The cost_grad consist now of N_hidden + 1 arrays; the gradient w.r.t the weights and biases\n", + " # in the hidden layers and output layers evaluated at x.\n", + " cost_deep_grad = cost_function_deep_grad(P, x)\n", + "\n", + " for l in range(N_hidden+1):\n", + " P[l] = P[l] - lmb * cost_deep_grad[l]\n", + "\n", + " print('Final cost: %g'%cost_function_deep(P, x))\n", + "\n", + " return P\n", + "\n", + "if __name__ == '__main__':\n", + " npr.seed(4155)\n", + "\n", + " ## Decide the vales of arguments to the function to solve\n", + " Nt = 10\n", + " T = 1\n", + " t = np.linspace(0,T, Nt)\n", + "\n", + " ## Set up the initial parameters\n", + " num_hidden_neurons = [100, 50, 25]\n", + " num_iter = 1000\n", + " lmb = 1e-3\n", + "\n", + " P = solve_ode_deep_neural_network(t, num_hidden_neurons, num_iter, lmb)\n", + "\n", + " g_dnn_ag = g_trial_deep(t,P)\n", + " g_analytical = g_analytic(t)\n", + "\n", + " # Find the maximum absolute difference between the solutons:\n", + " diff_ag = np.max(np.abs(g_dnn_ag - g_analytical))\n", + " print(\"The max absolute difference between the solutions is: %g\"%diff_ag)\n", + "\n", + " plt.figure(figsize=(10,10))\n", + "\n", + " plt.title('Performance of 
neural network solving an ODE compared to the analytical solution')\n", + " plt.plot(t, g_analytical)\n", + " plt.plot(t, g_dnn_ag[0,:])\n", + " plt.legend(['analytical','nn'])\n", + " plt.xlabel('t')\n", + " plt.ylabel('g(t)')\n", + "\n", + " plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "2407df1c", + "metadata": { + "editable": true + }, + "source": [ + "## Using forward Euler to solve the ODE\n", + "\n", + "A straightforward way of solving an ODE numerically, is to use Euler's method.\n", + "\n", + "Euler's method uses Taylor series to approximate the value at a function $f$ at a step $\\Delta x$ from $x$:\n", + "\n", + "$$\n", + "f(x + \\Delta x) \\approx f(x) + \\Delta x f'(x)\n", + "$$\n", + "\n", + "In our case, using Euler's method to approximate the value of $g$ at a step $\\Delta t$ from $t$ yields" + ] + }, + { + "cell_type": "markdown", + "id": "e30d9840", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{aligned}\n", + " g(t + \\Delta t) &\\approx g(t) + \\Delta t g'(t) \\\\\n", + " &= g(t) + \\Delta t \\big(\\alpha g(t)(A - g(t))\\big)\n", + "\\end{aligned}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "4af6e338", + "metadata": { + "editable": true + }, + "source": [ + "along with the condition that $g(0) = g_0$.\n", + "\n", + "Let $t_i = i \\cdot \\Delta t$ where $\\Delta t = \\frac{T}{N_t-1}$ where $T$ is the final time our solver must solve for and $N_t$ the number of values for $t \\in [0, T]$ for $i = 0, \\dots, N_t-1$.\n", + "\n", + "For $i \\geq 1$, we have that" + ] + }, + { + "cell_type": "markdown", + "id": "606cf0d3", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{aligned}\n", + "t_i &= i\\Delta t \\\\\n", + "&= (i - 1)\\Delta t + \\Delta t \\\\\n", + "&= t_{i-1} + \\Delta t\n", + "\\end{aligned}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "3275ea67", + "metadata": { + "editable": true + }, + "source": [ + "Now, if $g_i = g(t_i)$ then" + ] + }, + { + "cell_type": "markdown", + "id": "8c36efec", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
\n", + "\n", + "$$\n", + "\\begin{equation}\n", + " \\begin{aligned}\n", + " g_i &= g(t_i) \\\\\n", + " &= g(t_{i-1} + \\Delta t) \\\\\n", + " &\\approx g(t_{i-1}) + \\Delta t \\big(\\alpha g(t_{i-1})(A - g(t_{i-1}))\\big) \\\\\n", + " &= g_{i-1} + \\Delta t \\big(\\alpha g_{i-1}(A - g_{i-1})\\big)\n", + " \\end{aligned}\n", + "\\end{equation} \\label{odenum} \\tag{12}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "5290cde6", + "metadata": { + "editable": true + }, + "source": [ + "for $i \\geq 1$ and $g_0 = g(t_0) = g(0) = g_0$.\n", + "\n", + "Equation ([12](#odenum)) could be implemented in the following way,\n", + "extending the program that uses the network using Autograd:" + ] + }, + { + "cell_type": "code", + "execution_count": 34, + "id": "d5488516", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# Assume that all function definitions from the example program using Autograd\n", + "# are located here.\n", + "\n", + "if __name__ == '__main__':\n", + " npr.seed(4155)\n", + "\n", + " ## Decide the vales of arguments to the function to solve\n", + " Nt = 10\n", + " T = 1\n", + " t = np.linspace(0,T, Nt)\n", + "\n", + " ## Set up the initial parameters\n", + " num_hidden_neurons = [100,50,25]\n", + " num_iter = 1000\n", + " lmb = 1e-3\n", + "\n", + " P = solve_ode_deep_neural_network(t, num_hidden_neurons, num_iter, lmb)\n", + "\n", + " g_dnn_ag = g_trial_deep(t,P)\n", + " g_analytical = g_analytic(t)\n", + "\n", + " # Find the maximum absolute difference between the solutons:\n", + " diff_ag = np.max(np.abs(g_dnn_ag - g_analytical))\n", + " print(\"The max absolute difference between the solutions is: %g\"%diff_ag)\n", + "\n", + " plt.figure(figsize=(10,10))\n", + "\n", + " plt.title('Performance of neural network solving an ODE compared to the analytical solution')\n", + " plt.plot(t, g_analytical)\n", + " plt.plot(t, g_dnn_ag[0,:])\n", + " plt.legend(['analytical','nn'])\n", + " plt.xlabel('t')\n", + " plt.ylabel('g(t)')\n", + "\n", + " ## Find an approximation to the funtion using forward Euler\n", + "\n", + " alpha, A, g0 = get_parameters()\n", + " dt = T/(Nt - 1)\n", + "\n", + " # Perform forward Euler to solve the ODE\n", + " g_euler = np.zeros(Nt)\n", + " g_euler[0] = g0\n", + "\n", + " for i in range(1,Nt):\n", + " g_euler[i] = g_euler[i-1] + dt*(alpha*g_euler[i-1]*(A - g_euler[i-1]))\n", + "\n", + " # Print the errors done by each method\n", + " diff1 = np.max(np.abs(g_euler - g_analytical))\n", + " diff2 = np.max(np.abs(g_dnn_ag[0,:] - g_analytical))\n", + "\n", + " print('Max absolute difference between Euler method and analytical: %g'%diff1)\n", + " print('Max absolute difference between deep neural network and analytical: %g'%diff2)\n", + "\n", + " # Plot results\n", + " plt.figure(figsize=(10,10))\n", + "\n", + " plt.plot(t,g_euler)\n", + " plt.plot(t,g_analytical)\n", + " plt.plot(t,g_dnn_ag[0,:])\n", + "\n", + " plt.legend(['euler','analytical','dnn'])\n", + " plt.xlabel('Time t')\n", + " plt.ylabel('g(t)')\n", + "\n", + " plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "d631641d", + "metadata": { + "editable": true + }, + "source": [ + "## Example: Solving the one dimensional Poisson equation\n", + "\n", + "The Poisson equation for $g(x)$ in one dimension is" + ] + }, + { + "cell_type": "markdown", + "id": "3bd8043b", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
\n", + "\n", + "$$\n", + "\\begin{equation} \\label{poisson} \\tag{13}\n", + " -g''(x) = f(x)\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "818ac1d8", + "metadata": { + "editable": true + }, + "source": [ + "where $f(x)$ is a given function for $x \\in (0,1)$.\n", + "\n", + "The conditions that $g(x)$ is chosen to fulfill, are" + ] + }, + { + "cell_type": "markdown", + "id": "894be116", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{align*}\n", + " g(0) &= 0 \\\\\n", + " g(1) &= 0\n", + "\\end{align*}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "c2fce07f", + "metadata": { + "editable": true + }, + "source": [ + "This equation can be solved numerically using programs where e.g Autograd and TensorFlow are used.\n", + "The results from the networks can then be compared to the analytical solution.\n", + "In addition, it could be interesting to see how a typical method for numerically solving second order ODEs compares to the neural networks." + ] + }, + { + "cell_type": "markdown", + "id": "1e2ffb5e", + "metadata": { + "editable": true + }, + "source": [ + "## The specific equation to solve for\n", + "\n", + "Here, the function $g(x)$ to solve for follows the equation" + ] + }, + { + "cell_type": "markdown", + "id": "5677eb07", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "-g''(x) = f(x),\\qquad x \\in (0,1)\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "89173815", + "metadata": { + "editable": true + }, + "source": [ + "where $f(x)$ is a given function, along with the chosen conditions" + ] + }, + { + "cell_type": "markdown", + "id": "f6e81c01", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
\n", + "\n", + "$$\n", + "\\begin{aligned}\n", + "g(0) = g(1) = 0\n", + "\\end{aligned}\\label{cond} \\tag{14}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "82b4c100", + "metadata": { + "editable": true + }, + "source": [ + "In this example, we consider the case when $f(x) = (3x + x^2)\\exp(x)$.\n", + "\n", + "For this case, a possible trial solution satisfying the conditions could be" + ] + }, + { + "cell_type": "markdown", + "id": "05574f7f", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "g_t(x) = x \\cdot (1-x) \\cdot N(P,x)\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "5c17a08c", + "metadata": { + "editable": true + }, + "source": [ + "The analytical solution for this problem is" + ] + }, + { + "cell_type": "markdown", + "id": "a0ce240a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "g(x) = x(1 - x)\\exp(x)\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "d90da9be", + "metadata": { + "editable": true + }, + "source": [ + "## Solving the equation using Autograd" + ] + }, + { + "cell_type": "code", + "execution_count": 35, + "id": "ffd8b552", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import autograd.numpy as np\n", + "from autograd import grad, elementwise_grad\n", + "import autograd.numpy.random as npr\n", + "from matplotlib import pyplot as plt\n", + "\n", + "def sigmoid(z):\n", + " return 1/(1 + np.exp(-z))\n", + "\n", + "def deep_neural_network(deep_params, x):\n", + " # N_hidden is the number of hidden layers\n", + " # deep_params is a list, len() should be used\n", + " N_hidden = len(deep_params) - 1 # -1 since params consists of\n", + " # parameters to all the hidden\n", + " # layers AND the output layer.\n", + "\n", + " # Assumes input x being an one-dimensional array\n", + " num_values = np.size(x)\n", + " x = x.reshape(-1, num_values)\n", + "\n", + " # Assume that the input layer does nothing to the input x\n", + " x_input = x\n", + "\n", + " # Due to multiple hidden layers, define a variable referencing to the\n", + " # output of the previous layer:\n", + " x_prev = x_input\n", + "\n", + " ## Hidden layers:\n", + "\n", + " for l in range(N_hidden):\n", + " # From the list of parameters P; find the correct weigths and bias for this layer\n", + " w_hidden = deep_params[l]\n", + "\n", + " # Add a row of ones to include bias\n", + " x_prev = np.concatenate((np.ones((1,num_values)), x_prev ), axis = 0)\n", + "\n", + " z_hidden = np.matmul(w_hidden, x_prev)\n", + " x_hidden = sigmoid(z_hidden)\n", + "\n", + " # Update x_prev such that next layer can use the output from this layer\n", + " x_prev = x_hidden\n", + "\n", + " ## Output layer:\n", + "\n", + " # Get the weights and bias for this layer\n", + " w_output = deep_params[-1]\n", + "\n", + " # Include bias:\n", + " x_prev = np.concatenate((np.ones((1,num_values)), x_prev), axis = 0)\n", + "\n", + " z_output = np.matmul(w_output, x_prev)\n", + " x_output = z_output\n", + "\n", + " return x_output\n", + "\n", + "\n", + "def solve_ode_deep_neural_network(x, num_neurons, num_iter, lmb):\n", + " # num_hidden_neurons is now a list of number of neurons within each hidden layer\n", + "\n", + " # Find the number of hidden layers:\n", + " N_hidden = np.size(num_neurons)\n", + "\n", + " ## Set up initial weigths and biases\n", + "\n", + " # Initialize the list of parameters:\n", + " P = [None]*(N_hidden + 1) # + 1 to include the output layer\n", + "\n", + " P[0] = npr.randn(num_neurons[0], 2 )\n", + " for l 
in range(1,N_hidden):\n", + " P[l] = npr.randn(num_neurons[l], num_neurons[l-1] + 1) # +1 to include bias\n", + "\n", + " # For the output layer\n", + " P[-1] = npr.randn(1, num_neurons[-1] + 1 ) # +1 since bias is included\n", + "\n", + " print('Initial cost: %g'%cost_function_deep(P, x))\n", + "\n", + " ## Start finding the optimal weigths using gradient descent\n", + "\n", + " # Find the Python function that represents the gradient of the cost function\n", + " # w.r.t the 0-th input argument -- that is the weights and biases in the hidden and output layer\n", + " cost_function_deep_grad = grad(cost_function_deep,0)\n", + "\n", + " # Let the update be done num_iter times\n", + " for i in range(num_iter):\n", + " # Evaluate the gradient at the current weights and biases in P.\n", + " # The cost_grad consist now of N_hidden + 1 arrays; the gradient w.r.t the weights and biases\n", + " # in the hidden layers and output layers evaluated at x.\n", + " cost_deep_grad = cost_function_deep_grad(P, x)\n", + "\n", + " for l in range(N_hidden+1):\n", + " P[l] = P[l] - lmb * cost_deep_grad[l]\n", + "\n", + " print('Final cost: %g'%cost_function_deep(P, x))\n", + "\n", + " return P\n", + "\n", + "## Set up the cost function specified for this Poisson equation:\n", + "\n", + "# The right side of the ODE\n", + "def f(x):\n", + " return (3*x + x**2)*np.exp(x)\n", + "\n", + "def cost_function_deep(P, x):\n", + "\n", + " # Evaluate the trial function with the current parameters P\n", + " g_t = g_trial_deep(x,P)\n", + "\n", + " # Find the derivative w.r.t x of the trial function\n", + " d2_g_t = elementwise_grad(elementwise_grad(g_trial_deep,0))(x,P)\n", + "\n", + " right_side = f(x)\n", + "\n", + " err_sqr = (-d2_g_t - right_side)**2\n", + " cost_sum = np.sum(err_sqr)\n", + "\n", + " return cost_sum/np.size(err_sqr)\n", + "\n", + "# The trial solution:\n", + "def g_trial_deep(x,P):\n", + " return x*(1-x)*deep_neural_network(P,x)\n", + "\n", + "# The analytic solution;\n", + "def g_analytic(x):\n", + " return x*(1-x)*np.exp(x)\n", + "\n", + "if __name__ == '__main__':\n", + " npr.seed(4155)\n", + "\n", + " ## Decide the vales of arguments to the function to solve\n", + " Nx = 10\n", + " x = np.linspace(0,1, Nx)\n", + "\n", + " ## Set up the initial parameters\n", + " num_hidden_neurons = [200,100]\n", + " num_iter = 1000\n", + " lmb = 1e-3\n", + "\n", + " P = solve_ode_deep_neural_network(x, num_hidden_neurons, num_iter, lmb)\n", + "\n", + " g_dnn_ag = g_trial_deep(x,P)\n", + " g_analytical = g_analytic(x)\n", + "\n", + " # Find the maximum absolute difference between the solutons:\n", + " max_diff = np.max(np.abs(g_dnn_ag - g_analytical))\n", + " print(\"The max absolute difference between the solutions is: %g\"%max_diff)\n", + "\n", + " plt.figure(figsize=(10,10))\n", + "\n", + " plt.title('Performance of neural network solving an ODE compared to the analytical solution')\n", + " plt.plot(x, g_analytical)\n", + " plt.plot(x, g_dnn_ag[0,:])\n", + " plt.legend(['analytical','nn'])\n", + " plt.xlabel('x')\n", + " plt.ylabel('g(x)')\n", + " plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "2cde42e7", + "metadata": { + "editable": true + }, + "source": [ + "## Comparing with a numerical scheme\n", + "\n", + "The Poisson equation is possible to solve using Taylor series to approximate the second derivative.\n", + "\n", + "Using Taylor series, the second derivative can be expressed as\n", + "\n", + "$$\n", + "g''(x) = \\frac{g(x + \\Delta x) - 2g(x) + g(x-\\Delta x)}{\\Delta x^2} + E_{\\Delta 
x}(x)\n", + "$$\n", + "\n", + "where $\\Delta x$ is a small step size and $E_{\\Delta x}(x)$ being the error term.\n", + "\n", + "Looking away from the error terms gives an approximation to the second derivative:" + ] + }, + { + "cell_type": "markdown", + "id": "e24a46af", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
\n", + "\n", + "$$\n", + "\\begin{equation} \\label{approx} \\tag{15}\n", + "g''(x) \\approx \\frac{g(x + \\Delta x) - 2g(x) + g(x-\\Delta x)}{\\Delta x^2}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "2417ec7c", + "metadata": { + "editable": true + }, + "source": [ + "If $x_i = i \\Delta x = x_{i-1} + \\Delta x$ and $g_i = g(x_i)$ for $i = 1,\\dots N_x - 2$ with $N_x$ being the number of values for $x$, ([15](#approx)) becomes" + ] + }, + { + "cell_type": "markdown", + "id": "012a9c2b", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{aligned}\n", + "g''(x_i) &\\approx \\frac{g(x_i + \\Delta x) - 2g(x_i) + g(x_i -\\Delta x)}{\\Delta x^2} \\\\\n", + "&= \\frac{g_{i+1} - 2g_i + g_{i-1}}{\\Delta x^2}\n", + "\\end{aligned}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "101bccb8", + "metadata": { + "editable": true + }, + "source": [ + "Since we know from our problem that" + ] + }, + { + "cell_type": "markdown", + "id": "280cdc54", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{aligned}\n", + "-g''(x) &= f(x) \\\\\n", + "&= (3x + x^2)\\exp(x)\n", + "\\end{aligned}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "38bc9035", + "metadata": { + "editable": true + }, + "source": [ + "along with the conditions $g(0) = g(1) = 0$,\n", + "the following scheme can be used to find an approximate solution for $g(x)$ numerically:" + ] + }, + { + "cell_type": "markdown", + "id": "3925a117", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
\n", + "\n", + "$$\n", + "\\begin{equation}\n", + " \\begin{aligned}\n", + " -\\Big( \\frac{g_{i+1} - 2g_i + g_{i-1}}{\\Delta x^2} \\Big) &= f(x_i) \\\\\n", + " -g_{i+1} + 2g_i - g_{i-1} &= \\Delta x^2 f(x_i)\n", + " \\end{aligned}\n", + "\\end{equation} \\label{odesys} \\tag{16}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "6f86e85b", + "metadata": { + "editable": true + }, + "source": [ + "for $i = 1, \\dots, N_x - 2$ where $g_0 = g_{N_x - 1} = 0$ and $f(x_i) = (3x_i + x_i^2)\\exp(x_i)$, which is given for our specific problem.\n", + "\n", + "The equation can be rewritten into a matrix equation:" + ] + }, + { + "cell_type": "markdown", + "id": "394b14bc", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{aligned}\n", + "\\begin{pmatrix}\n", + "2 & -1 & 0 & \\dots & 0 \\\\\n", + "-1 & 2 & -1 & \\dots & 0 \\\\\n", + "\\vdots & & \\ddots & & \\vdots \\\\\n", + "0 & \\dots & -1 & 2 & -1 \\\\\n", + "0 & \\dots & 0 & -1 & 2\\\\\n", + "\\end{pmatrix}\n", + "\\begin{pmatrix}\n", + "g_1 \\\\\n", + "g_2 \\\\\n", + "\\vdots \\\\\n", + "g_{N_x - 3} \\\\\n", + "g_{N_x - 2}\n", + "\\end{pmatrix}\n", + "&=\n", + "\\Delta x^2\n", + "\\begin{pmatrix}\n", + "f(x_1) \\\\\n", + "f(x_2) \\\\\n", + "\\vdots \\\\\n", + "f(x_{N_x - 3}) \\\\\n", + "f(x_{N_x - 2})\n", + "\\end{pmatrix} \\\\\n", + "\\boldsymbol{A}\\boldsymbol{g} &= \\boldsymbol{f},\n", + "\\end{aligned}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "5ab07ae1", + "metadata": { + "editable": true + }, + "source": [ + "which makes it possible to solve for the vector $\\boldsymbol{g}$." + ] + }, + { + "cell_type": "markdown", + "id": "8134c34f", + "metadata": { + "editable": true + }, + "source": [ + "## Setting up the code\n", + "\n", + "We can then compare the result from this numerical scheme with the output from our network using Autograd:" + ] + }, + { + "cell_type": "code", + "execution_count": 36, + "id": "4362f9a9", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import autograd.numpy as np\n", + "from autograd import grad, elementwise_grad\n", + "import autograd.numpy.random as npr\n", + "from matplotlib import pyplot as plt\n", + "\n", + "def sigmoid(z):\n", + " return 1/(1 + np.exp(-z))\n", + "\n", + "def deep_neural_network(deep_params, x):\n", + " # N_hidden is the number of hidden layers\n", + " # deep_params is a list, len() should be used\n", + " N_hidden = len(deep_params) - 1 # -1 since params consists of\n", + " # parameters to all the hidden\n", + " # layers AND the output layer.\n", + "\n", + " # Assumes input x being an one-dimensional array\n", + " num_values = np.size(x)\n", + " x = x.reshape(-1, num_values)\n", + "\n", + " # Assume that the input layer does nothing to the input x\n", + " x_input = x\n", + "\n", + " # Due to multiple hidden layers, define a variable referencing to the\n", + " # output of the previous layer:\n", + " x_prev = x_input\n", + "\n", + " ## Hidden layers:\n", + "\n", + " for l in range(N_hidden):\n", + " # From the list of parameters P; find the correct weigths and bias for this layer\n", + " w_hidden = deep_params[l]\n", + "\n", + " # Add a row of ones to include bias\n", + " x_prev = np.concatenate((np.ones((1,num_values)), x_prev ), axis = 0)\n", + "\n", + " z_hidden = np.matmul(w_hidden, x_prev)\n", + " x_hidden = sigmoid(z_hidden)\n", + "\n", + " # Update x_prev such that next layer can use the output from this layer\n", + " x_prev = x_hidden\n", + "\n", + " ## Output layer:\n", + "\n", + " # Get 
the weights and bias for this layer\n", + " w_output = deep_params[-1]\n", + "\n", + " # Include bias:\n", + " x_prev = np.concatenate((np.ones((1,num_values)), x_prev), axis = 0)\n", + "\n", + " z_output = np.matmul(w_output, x_prev)\n", + " x_output = z_output\n", + "\n", + " return x_output\n", + "\n", + "\n", + "def solve_ode_deep_neural_network(x, num_neurons, num_iter, lmb):\n", + " # num_hidden_neurons is now a list of number of neurons within each hidden layer\n", + "\n", + " # Find the number of hidden layers:\n", + " N_hidden = np.size(num_neurons)\n", + "\n", + " ## Set up initial weigths and biases\n", + "\n", + " # Initialize the list of parameters:\n", + " P = [None]*(N_hidden + 1) # + 1 to include the output layer\n", + "\n", + " P[0] = npr.randn(num_neurons[0], 2 )\n", + " for l in range(1,N_hidden):\n", + " P[l] = npr.randn(num_neurons[l], num_neurons[l-1] + 1) # +1 to include bias\n", + "\n", + " # For the output layer\n", + " P[-1] = npr.randn(1, num_neurons[-1] + 1 ) # +1 since bias is included\n", + "\n", + " print('Initial cost: %g'%cost_function_deep(P, x))\n", + "\n", + " ## Start finding the optimal weigths using gradient descent\n", + "\n", + " # Find the Python function that represents the gradient of the cost function\n", + " # w.r.t the 0-th input argument -- that is the weights and biases in the hidden and output layer\n", + " cost_function_deep_grad = grad(cost_function_deep,0)\n", + "\n", + " # Let the update be done num_iter times\n", + " for i in range(num_iter):\n", + " # Evaluate the gradient at the current weights and biases in P.\n", + " # The cost_grad consist now of N_hidden + 1 arrays; the gradient w.r.t the weights and biases\n", + " # in the hidden layers and output layers evaluated at x.\n", + " cost_deep_grad = cost_function_deep_grad(P, x)\n", + "\n", + " for l in range(N_hidden+1):\n", + " P[l] = P[l] - lmb * cost_deep_grad[l]\n", + "\n", + " print('Final cost: %g'%cost_function_deep(P, x))\n", + "\n", + " return P\n", + "\n", + "## Set up the cost function specified for this Poisson equation:\n", + "\n", + "# The right side of the ODE\n", + "def f(x):\n", + " return (3*x + x**2)*np.exp(x)\n", + "\n", + "def cost_function_deep(P, x):\n", + "\n", + " # Evaluate the trial function with the current parameters P\n", + " g_t = g_trial_deep(x,P)\n", + "\n", + " # Find the derivative w.r.t x of the trial function\n", + " d2_g_t = elementwise_grad(elementwise_grad(g_trial_deep,0))(x,P)\n", + "\n", + " right_side = f(x)\n", + "\n", + " err_sqr = (-d2_g_t - right_side)**2\n", + " cost_sum = np.sum(err_sqr)\n", + "\n", + " return cost_sum/np.size(err_sqr)\n", + "\n", + "# The trial solution:\n", + "def g_trial_deep(x,P):\n", + " return x*(1-x)*deep_neural_network(P,x)\n", + "\n", + "# The analytic solution;\n", + "def g_analytic(x):\n", + " return x*(1-x)*np.exp(x)\n", + "\n", + "if __name__ == '__main__':\n", + " npr.seed(4155)\n", + "\n", + " ## Decide the vales of arguments to the function to solve\n", + " Nx = 10\n", + " x = np.linspace(0,1, Nx)\n", + "\n", + " ## Set up the initial parameters\n", + " num_hidden_neurons = [200,100]\n", + " num_iter = 1000\n", + " lmb = 1e-3\n", + "\n", + " P = solve_ode_deep_neural_network(x, num_hidden_neurons, num_iter, lmb)\n", + "\n", + " g_dnn_ag = g_trial_deep(x,P)\n", + " g_analytical = g_analytic(x)\n", + "\n", + " # Find the maximum absolute difference between the solutons:\n", + "\n", + " plt.figure(figsize=(10,10))\n", + "\n", + " plt.title('Performance of neural network solving an ODE compared to the 
analytical solution')\n", + " plt.plot(x, g_analytical)\n", + " plt.plot(x, g_dnn_ag[0,:])\n", + " plt.legend(['analytical','nn'])\n", + " plt.xlabel('x')\n", + " plt.ylabel('g(x)')\n", + "\n", + " ## Perform the computation using the numerical scheme\n", + "\n", + " dx = 1/(Nx - 1)\n", + "\n", + " # Set up the matrix A\n", + " A = np.zeros((Nx-2,Nx-2))\n", + "\n", + " A[0,0] = 2\n", + " A[0,1] = -1\n", + "\n", + " for i in range(1,Nx-3):\n", + " A[i,i-1] = -1\n", + " A[i,i] = 2\n", + " A[i,i+1] = -1\n", + "\n", + " A[Nx - 3, Nx - 4] = -1\n", + " A[Nx - 3, Nx - 3] = 2\n", + "\n", + " # Set up the vector f\n", + " f_vec = dx**2 * f(x[1:-1])\n", + "\n", + " # Solve the equation\n", + " g_res = np.linalg.solve(A,f_vec)\n", + "\n", + " g_vec = np.zeros(Nx)\n", + " g_vec[1:-1] = g_res\n", + "\n", + " # Print the differences between each method\n", + " max_diff1 = np.max(np.abs(g_dnn_ag - g_analytical))\n", + " max_diff2 = np.max(np.abs(g_vec - g_analytical))\n", + " print(\"The max absolute difference between the analytical solution and DNN Autograd: %g\"%max_diff1)\n", + " print(\"The max absolute difference between the analytical solution and numerical scheme: %g\"%max_diff2)\n", + "\n", + " # Plot the results\n", + " plt.figure(figsize=(10,10))\n", + "\n", + " plt.plot(x,g_vec)\n", + " plt.plot(x,g_analytical)\n", + " plt.plot(x,g_dnn_ag[0,:])\n", + "\n", + " plt.legend(['numerical scheme','analytical','dnn'])\n", + " plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "c66dc85a", + "metadata": { + "editable": true + }, + "source": [ + "## Partial Differential Equations\n", + "\n", + "A partial differential equation (PDE) has a solution here the function\n", + "is defined by multiple variables. The equation may involve all kinds\n", + "of combinations of which variables the function is differentiated with\n", + "respect to.\n", + "\n", + "In general, a partial differential equation for a function $g(x_1,\\dots,x_N)$ with $N$ variables may be expressed as" + ] + }, + { + "cell_type": "markdown", + "id": "cf60d1fc", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
\n", + "\n", + "$$\n", + "\\begin{equation} \\label{PDE} \\tag{17}\n", + " f\\left(x_1, \\, \\dots \\, , x_N, \\frac{\\partial g(x_1,\\dots,x_N) }{\\partial x_1}, \\dots , \\frac{\\partial g(x_1,\\dots,x_N) }{\\partial x_N}, \\frac{\\partial g(x_1,\\dots,x_N) }{\\partial x_1\\partial x_2}, \\, \\dots \\, , \\frac{\\partial^n g(x_1,\\dots,x_N) }{\\partial x_N^n} \\right) = 0\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "bff85f6e", + "metadata": { + "editable": true + }, + "source": [ + "where $f$ is an expression involving all kinds of possible mixed derivatives of $g(x_1,\\dots,x_N)$ up to an order $n$. In order for the solution to be unique, some additional conditions must also be given." + ] + }, + { + "cell_type": "markdown", + "id": "64289867", + "metadata": { + "editable": true + }, + "source": [ + "## Type of problem\n", + "\n", + "The problem our network must solve for, is similar to the ODE case.\n", + "We must have a trial solution $g_t$ at hand.\n", + "\n", + "For instance, the trial solution could be expressed as" + ] + }, + { + "cell_type": "markdown", + "id": "75d3a4d2", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{align*}\n", + " g_t(x_1,\\dots,x_N) = h_1(x_1,\\dots,x_N) + h_2(x_1,\\dots,x_N,N(x_1,\\dots,x_N,P))\n", + "\\end{align*}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "6f3e695d", + "metadata": { + "editable": true + }, + "source": [ + "where $h_1(x_1,\\dots,x_N)$ is a function that ensures $g_t(x_1,\\dots,x_N)$ satisfies some given conditions.\n", + "The neural network $N(x_1,\\dots,x_N,P)$ has weights and biases described by $P$ and $h_2(x_1,\\dots,x_N,N(x_1,\\dots,x_N,P))$ is an expression using the output from the neural network in some way.\n", + "\n", + "The role of the function $h_2(x_1,\\dots,x_N,N(x_1,\\dots,x_N,P))$, is to ensure that the output of $N(x_1,\\dots,x_N,P)$ is zero when $g_t(x_1,\\dots,x_N)$ is evaluated at the values of $x_1,\\dots,x_N$ where the given conditions must be satisfied. The function $h_1(x_1,\\dots,x_N)$ should alone make $g_t(x_1,\\dots,x_N)$ satisfy the conditions." + ] + }, + { + "cell_type": "markdown", + "id": "da1ba3cf", + "metadata": { + "editable": true + }, + "source": [ + "## Network requirements\n", + "\n", + "The network tries then the minimize the cost function following the\n", + "same ideas as described for the ODE case, but now with more than one\n", + "variables to consider. The concept still remains the same; find a set\n", + "of parameters $P$ such that the expression $f$ in ([17](#PDE)) is as\n", + "close to zero as possible.\n", + "\n", + "As for the ODE case, the cost function is the mean squared error that\n", + "the network must try to minimize. 
The cost function for the network to\n", + "minimize is" + ] + }, + { + "cell_type": "markdown", + "id": "373065ff", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C\\left(x_1, \\dots, x_N, P\\right) = \\left( f\\left(x_1, \\, \\dots \\, , x_N, \\frac{\\partial g(x_1,\\dots,x_N) }{\\partial x_1}, \\dots , \\frac{\\partial g(x_1,\\dots,x_N) }{\\partial x_N}, \\frac{\\partial g(x_1,\\dots,x_N) }{\\partial x_1\\partial x_2}, \\, \\dots \\, , \\frac{\\partial^n g(x_1,\\dots,x_N) }{\\partial x_N^n} \\right) \\right)^2\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "2281eade", + "metadata": { + "editable": true + }, + "source": [ + "## More details\n", + "\n", + "If we let $\\boldsymbol{x} = \\big( x_1, \\dots, x_N \\big)$ be an array containing the values for $x_1, \\dots, x_N$ respectively, the cost function can be reformulated into the following:" + ] + }, + { + "cell_type": "markdown", + "id": "989a8905", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C\\left(\\boldsymbol{x}, P\\right) = f\\left( \\left( \\boldsymbol{x}, \\frac{\\partial g(\\boldsymbol{x}) }{\\partial x_1}, \\dots , \\frac{\\partial g(\\boldsymbol{x}) }{\\partial x_N}, \\frac{\\partial g(\\boldsymbol{x}) }{\\partial x_1\\partial x_2}, \\, \\dots \\, , \\frac{\\partial^n g(\\boldsymbol{x}) }{\\partial x_N^n} \\right) \\right)^2\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b36367a0", + "metadata": { + "editable": true + }, + "source": [ + "If we also have $M$ different sets of values for $x_1, \\dots, x_N$, that is $\\boldsymbol{x}_i = \\big(x_1^{(i)}, \\dots, x_N^{(i)}\\big)$ for $i = 1,\\dots,M$ being the rows in matrix $X$, the cost function can be generalized into" + ] + }, + { + "cell_type": "markdown", + "id": "6f6f51dd", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C\\left(X, P \\right) = \\sum_{i=1}^M f\\left( \\left( \\boldsymbol{x}_i, \\frac{\\partial g(\\boldsymbol{x}_i) }{\\partial x_1}, \\dots , \\frac{\\partial g(\\boldsymbol{x}_i) }{\\partial x_N}, \\frac{\\partial g(\\boldsymbol{x}_i) }{\\partial x_1\\partial x_2}, \\, \\dots \\, , \\frac{\\partial^n g(\\boldsymbol{x}_i) }{\\partial x_N^n} \\right) \\right)^2.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "35bd1e4a", + "metadata": { + "editable": true + }, + "source": [ + "## Example: The diffusion equation\n", + "\n", + "In one spatial dimension, the equation reads" + ] + }, + { + "cell_type": "markdown", + "id": "2b804c0a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial g(x,t)}{\\partial t} = \\frac{\\partial^2 g(x,t)}{\\partial x^2}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "07f20557", + "metadata": { + "editable": true + }, + "source": [ + "where a possible choice of conditions are" + ] + }, + { + "cell_type": "markdown", + "id": "0e14c702", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{align*}\n", + "g(0,t) &= 0 ,\\qquad t \\geq 0 \\\\\n", + "g(1,t) &= 0, \\qquad t \\geq 0 \\\\\n", + "g(x,0) &= u(x),\\qquad x\\in [0,1]\n", + "\\end{align*}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "a19c5cae", + "metadata": { + "editable": true + }, + "source": [ + "with $u(x)$ being some given function." 
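+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Following the splitting into $h_1$ and $h_2$ discussed above, one possible trial solution for this set of conditions (written here only for illustration; the form used in the implementation may be organized slightly differently) is\n",
+ "\n",
+ "$$\n",
+ "g_t(x,t) = (1-t)u(x) + x(1-x)t N(x,t,P)\n",
+ "$$\n",
+ "\n",
+ "At $t = 0$ the second term vanishes and $g_t(x,0) = u(x)$, while at $x = 0$ and $x = 1$ both terms vanish provided $u(0) = u(1) = 0$. A quick numerical check of these properties, with a stand-in constant network output, could look like this:\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import numpy as np\n",
+ "\n",
+ "# Toy check of the trial-solution form above, with a stand-in network output N = 1\n",
+ "# and u(x) = sin(pi*x), which satisfies u(0) = u(1) = 0\n",
+ "def u(x):\n",
+ "    return np.sin(np.pi*x)\n",
+ "\n",
+ "def g_trial(x, t, N=1.0):\n",
+ "    return (1 - t)*u(x) + x*(1 - x)*t*N\n",
+ "\n",
+ "x = np.linspace(0, 1, 5)\n",
+ "print(np.allclose(g_trial(x, 0.0), u(x)))      # initial condition g_t(x, 0) = u(x)\n",
+ "print(g_trial(0.0, 0.7), g_trial(1.0, 0.7))    # boundary values vanish (up to round-off) for any t"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "With such a trial solution in hand, the network only has to learn the behaviour in the interior of the domain, while the given conditions are satisfied automatically."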
+ ] + }, + { + "cell_type": "markdown", + "id": "de041a40", + "metadata": { + "editable": true + }, + "source": [ + "## Defining the problem\n", + "\n", + "For this case, we want to find $g(x,t)$ such that" + ] + }, + { + "cell_type": "markdown", + "id": "519bb7a7", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
\n", + "\n", + "$$\n", + "\\begin{equation}\n", + " \\frac{\\partial g(x,t)}{\\partial t} = \\frac{\\partial^2 g(x,t)}{\\partial x^2}\n", + "\\end{equation} \\label{diffonedim} \\tag{18}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "129322ea", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "ddc7b725", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{align*}\n", + "g(0,t) &= 0 ,\\qquad t \\geq 0 \\\\\n", + "g(1,t) &= 0, \\qquad t \\geq 0 \\\\\n", + "g(x,0) &= u(x),\\qquad x\\in [0,1]\n", + "\\end{align*}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "5497b34b", + "metadata": { + "editable": true + }, + "source": [ + "with $u(x) = \\sin(\\pi x)$.\n", + "\n", + "First, let us set up the deep neural network.\n", + "The deep neural network will follow the same structure as discussed in the examples solving the ODEs.\n", + "First, we will look into how Autograd could be used in a network tailored to solve for bivariate functions." + ] + }, + { + "cell_type": "markdown", + "id": "0b9040e4", + "metadata": { + "editable": true + }, + "source": [ + "## Setting up the network using Autograd\n", + "\n", + "The only change to do here, is to extend our network such that\n", + "functions of multiple parameters are correctly handled. In this case\n", + "we have two variables in our function to solve for, that is time $t$\n", + "and position $x$. The variables will be represented by a\n", + "one-dimensional array in the program. The program will evaluate the\n", + "network at each possible pair $(x,t)$, given an array for the desired\n", + "$x$-values and $t$-values to approximate the solution at." + ] + }, + { + "cell_type": "code", + "execution_count": 37, + "id": "17097802", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "def sigmoid(z):\n", + " return 1/(1 + np.exp(-z))\n", + "\n", + "def deep_neural_network(deep_params, x):\n", + " # x is now a point and a 1D numpy array; make it a column vector\n", + " num_coordinates = np.size(x,0)\n", + " x = x.reshape(num_coordinates,-1)\n", + "\n", + " num_points = np.size(x,1)\n", + "\n", + " # N_hidden is the number of hidden layers\n", + " N_hidden = len(deep_params) - 1 # -1 since params consist of parameters to all the hidden layers AND the output layer\n", + "\n", + " # Assume that the input layer does nothing to the input x\n", + " x_input = x\n", + " x_prev = x_input\n", + "\n", + " ## Hidden layers:\n", + "\n", + " for l in range(N_hidden):\n", + " # From the list of parameters P; find the correct weigths and bias for this layer\n", + " w_hidden = deep_params[l]\n", + "\n", + " # Add a row of ones to include bias\n", + " x_prev = np.concatenate((np.ones((1,num_points)), x_prev ), axis = 0)\n", + "\n", + " z_hidden = np.matmul(w_hidden, x_prev)\n", + " x_hidden = sigmoid(z_hidden)\n", + "\n", + " # Update x_prev such that next layer can use the output from this layer\n", + " x_prev = x_hidden\n", + "\n", + " ## Output layer:\n", + "\n", + " # Get the weights and bias for this layer\n", + " w_output = deep_params[-1]\n", + "\n", + " # Include bias:\n", + " x_prev = np.concatenate((np.ones((1,num_points)), x_prev), axis = 0)\n", + "\n", + " z_output = np.matmul(w_output, x_prev)\n", + " x_output = z_output\n", + "\n", + " return x_output[0][0]" + ] + }, + { + "cell_type": "markdown", + "id": "a2178b56", + "metadata": { + "editable": true + }, + "source": [ + "## Setting up the network using 
Autograd; The trial solution\n",
+ "\n",
+ "The cost function must iterate through the given arrays of $x$- and $t$-values, define the point $(x,t)$ at which the deep\n",
+ "neural network and the trial solution are evaluated, and then compute\n",
+ "the Jacobian of the trial solution at that point.\n",
+ "\n",
+ "A possible trial solution for this PDE is\n",
+ "\n",
+ "$$\n",
+ "g_t(x,t) = h_1(x,t) + x(1-x)tN(x,t,P)\n",
+ "$$\n",
+ "\n",
+ "with $h_1(x,t)$ being a function ensuring that $g_t(x,t)$ satisfies our given conditions, and $N(x,t,P)$ being the output from the deep neural network using weights and biases for each layer from $P$.\n",
+ "\n",
+ "To fulfill the conditions, $h_1(x,t)$ could be:\n",
+ "\n",
+ "$$\n",
+ "h_1(x,t) = (1-t)\Big(u(x) - \big((1-x)u(0) + x u(1)\big)\Big) = (1-t)u(x) = (1-t)\sin(\pi x)\n",
+ "$$\n",
+ "since $u(0) = u(1) = 0$ and $u(x) = \sin(\pi x)$."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "533f4e84",
+ "metadata": {
+ "editable": true
+ },
+ "source": [
+ "## Why the Jacobian?\n",
+ "\n",
+ "The Jacobian is used because the program must find the derivative of\n",
+ "the trial solution with respect to $x$ and $t$.\n",
+ "\n",
+ "This gives the necessity of computing the Jacobian matrix, as we want\n",
+ "to evaluate the gradient with respect to $x$ and $t$ (note that the\n",
+ "Jacobian of a scalar-valued multivariate function is simply its\n",
+ "gradient).\n",
+ "\n",
+ "In Autograd, the differentiation is by default done with respect to\n",
+ "the first input argument of your Python function. Since the point is\n",
+ "an array holding the values of $x$ and $t$, the Jacobian is calculated\n",
+ "with respect to both $x$ and $t$.\n",
+ "\n",
+ "To find the second derivatives with respect to $x$ and $t$, the\n",
+ "Jacobian can be computed a second time. The result is the Hessian\n",
+ "matrix, which contains all the possible second-order\n",
+ "mixed derivatives of $g(x,t)$."
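+ ,
+ "\n",
+ "Before the full program, here is a minimal stand-alone sketch of this idea. The function `g_simple` below is only a hypothetical stand-in for the trial solution (it happens to solve the diffusion equation), used to show how `jacobian` and `hessian` from Autograd expose the derivatives we need:\n",
+ "\n",
+ "```python\n",
+ "import autograd.numpy as np\n",
+ "from autograd import jacobian, hessian\n",
+ "\n",
+ "def g_simple(point):\n",
+ "    # scalar function of point = [x, t], standing in for the trial solution\n",
+ "    x, t = point\n",
+ "    return np.exp(-np.pi**2*t)*np.sin(np.pi*x)\n",
+ "\n",
+ "point = np.array([0.3, 0.1])\n",
+ "gradient = jacobian(g_simple)(point)  # array [dg/dx, dg/dt], since g_simple is scalar valued\n",
+ "second = hessian(g_simple)(point)     # 2 x 2 matrix of second derivatives\n",
+ "dg_dt = gradient[1]\n",
+ "d2g_dx2 = second[0][0]\n",
+ "print(dg_dt, d2g_dx2)  # equal here, since this particular g_simple solves the diffusion equation\n",
+ "```"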
+ ] + }, + { + "cell_type": "code", + "execution_count": 38, + "id": "7b494481", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# Set up the trial function:\n", + "def u(x):\n", + " return np.sin(np.pi*x)\n", + "\n", + "def g_trial(point,P):\n", + " x,t = point\n", + " return (1-t)*u(x) + x*(1-x)*t*deep_neural_network(P,point)\n", + "\n", + "# The right side of the ODE:\n", + "def f(point):\n", + " return 0.\n", + "\n", + "# The cost function:\n", + "def cost_function(P, x, t):\n", + " cost_sum = 0\n", + "\n", + " g_t_jacobian_func = jacobian(g_trial)\n", + " g_t_hessian_func = hessian(g_trial)\n", + "\n", + " for x_ in x:\n", + " for t_ in t:\n", + " point = np.array([x_,t_])\n", + "\n", + " g_t = g_trial(point,P)\n", + " g_t_jacobian = g_t_jacobian_func(point,P)\n", + " g_t_hessian = g_t_hessian_func(point,P)\n", + "\n", + " g_t_dt = g_t_jacobian[1]\n", + " g_t_d2x = g_t_hessian[0][0]\n", + "\n", + " func = f(point)\n", + "\n", + " err_sqr = ( (g_t_dt - g_t_d2x) - func)**2\n", + " cost_sum += err_sqr\n", + "\n", + " return cost_sum" + ] + }, + { + "cell_type": "markdown", + "id": "9f4b4939", + "metadata": { + "editable": true + }, + "source": [ + "## Setting up the network using Autograd; The full program\n", + "\n", + "Having set up the network, along with the trial solution and cost function, we can now see how the deep neural network performs by comparing the results to the analytical solution.\n", + "\n", + "The analytical solution of our problem is\n", + "\n", + "$$\n", + "g(x,t) = \\exp(-\\pi^2 t)\\sin(\\pi x)\n", + "$$\n", + "\n", + "A possible way to implement a neural network solving the PDE, is given below.\n", + "Be aware, though, that it is fairly slow for the parameters used.\n", + "A better result is possible, but requires more iterations, and thus longer time to complete.\n", + "\n", + "Indeed, the program below is not optimal in its implementation, but rather serves as an example on how to implement and use a neural network to solve a PDE.\n", + "Using TensorFlow results in a much better execution time. Try it!" 
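+ ,
+ "\n",
+ "As a quick sanity check of the stated analytical solution: differentiating $g(x,t) = \exp(-\pi^2 t)\sin(\pi x)$ gives\n",
+ "\n",
+ "$$\n",
+ "\frac{\partial g}{\partial t} = -\pi^2 \exp(-\pi^2 t)\sin(\pi x) = \frac{\partial^2 g}{\partial x^2},\n",
+ "$$\n",
+ "\n",
+ "while $g(0,t) = g(1,t) = 0$ and $g(x,0) = \sin(\pi x) = u(x)$, so both ([18](#diffonedim)) and the given conditions are satisfied."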
+ ] + }, + { + "cell_type": "code", + "execution_count": 39, + "id": "83d6eb7d", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import autograd.numpy as np\n", + "from autograd import jacobian,hessian,grad\n", + "import autograd.numpy.random as npr\n", + "from matplotlib import cm\n", + "from matplotlib import pyplot as plt\n", + "from mpl_toolkits.mplot3d import axes3d\n", + "\n", + "## Set up the network\n", + "\n", + "def sigmoid(z):\n", + " return 1/(1 + np.exp(-z))\n", + "\n", + "def deep_neural_network(deep_params, x):\n", + " # x is now a point and a 1D numpy array; make it a column vector\n", + " num_coordinates = np.size(x,0)\n", + " x = x.reshape(num_coordinates,-1)\n", + "\n", + " num_points = np.size(x,1)\n", + "\n", + " # N_hidden is the number of hidden layers\n", + " N_hidden = len(deep_params) - 1 # -1 since params consist of parameters to all the hidden layers AND the output layer\n", + "\n", + " # Assume that the input layer does nothing to the input x\n", + " x_input = x\n", + " x_prev = x_input\n", + "\n", + " ## Hidden layers:\n", + "\n", + " for l in range(N_hidden):\n", + " # From the list of parameters P; find the correct weigths and bias for this layer\n", + " w_hidden = deep_params[l]\n", + "\n", + " # Add a row of ones to include bias\n", + " x_prev = np.concatenate((np.ones((1,num_points)), x_prev ), axis = 0)\n", + "\n", + " z_hidden = np.matmul(w_hidden, x_prev)\n", + " x_hidden = sigmoid(z_hidden)\n", + "\n", + " # Update x_prev such that next layer can use the output from this layer\n", + " x_prev = x_hidden\n", + "\n", + " ## Output layer:\n", + "\n", + " # Get the weights and bias for this layer\n", + " w_output = deep_params[-1]\n", + "\n", + " # Include bias:\n", + " x_prev = np.concatenate((np.ones((1,num_points)), x_prev), axis = 0)\n", + "\n", + " z_output = np.matmul(w_output, x_prev)\n", + " x_output = z_output\n", + "\n", + " return x_output[0][0]\n", + "\n", + "## Define the trial solution and cost function\n", + "def u(x):\n", + " return np.sin(np.pi*x)\n", + "\n", + "def g_trial(point,P):\n", + " x,t = point\n", + " return (1-t)*u(x) + x*(1-x)*t*deep_neural_network(P,point)\n", + "\n", + "# The right side of the ODE:\n", + "def f(point):\n", + " return 0.\n", + "\n", + "# The cost function:\n", + "def cost_function(P, x, t):\n", + " cost_sum = 0\n", + "\n", + " g_t_jacobian_func = jacobian(g_trial)\n", + " g_t_hessian_func = hessian(g_trial)\n", + "\n", + " for x_ in x:\n", + " for t_ in t:\n", + " point = np.array([x_,t_])\n", + "\n", + " g_t = g_trial(point,P)\n", + " g_t_jacobian = g_t_jacobian_func(point,P)\n", + " g_t_hessian = g_t_hessian_func(point,P)\n", + "\n", + " g_t_dt = g_t_jacobian[1]\n", + " g_t_d2x = g_t_hessian[0][0]\n", + "\n", + " func = f(point)\n", + "\n", + " err_sqr = ( (g_t_dt - g_t_d2x) - func)**2\n", + " cost_sum += err_sqr\n", + "\n", + " return cost_sum /( np.size(x)*np.size(t) )\n", + "\n", + "## For comparison, define the analytical solution\n", + "def g_analytic(point):\n", + " x,t = point\n", + " return np.exp(-np.pi**2*t)*np.sin(np.pi*x)\n", + "\n", + "## Set up a function for training the network to solve for the equation\n", + "def solve_pde_deep_neural_network(x,t, num_neurons, num_iter, lmb):\n", + " ## Set up initial weigths and biases\n", + " N_hidden = np.size(num_neurons)\n", + "\n", + " ## Set up initial weigths and biases\n", + "\n", + " # Initialize the list of parameters:\n", + " P = [None]*(N_hidden + 1) # + 1 to include the output layer\n", + "\n", + 
" P[0] = npr.randn(num_neurons[0], 2 + 1 ) # 2 since we have two points, +1 to include bias\n", + " for l in range(1,N_hidden):\n", + " P[l] = npr.randn(num_neurons[l], num_neurons[l-1] + 1) # +1 to include bias\n", + "\n", + " # For the output layer\n", + " P[-1] = npr.randn(1, num_neurons[-1] + 1 ) # +1 since bias is included\n", + "\n", + " print('Initial cost: ',cost_function(P, x, t))\n", + "\n", + " cost_function_grad = grad(cost_function,0)\n", + "\n", + " # Let the update be done num_iter times\n", + " for i in range(num_iter):\n", + " cost_grad = cost_function_grad(P, x , t)\n", + "\n", + " for l in range(N_hidden+1):\n", + " P[l] = P[l] - lmb * cost_grad[l]\n", + "\n", + " print('Final cost: ',cost_function(P, x, t))\n", + "\n", + " return P\n", + "\n", + "if __name__ == '__main__':\n", + " ### Use the neural network:\n", + " npr.seed(15)\n", + "\n", + " ## Decide the vales of arguments to the function to solve\n", + " Nx = 10; Nt = 10\n", + " x = np.linspace(0, 1, Nx)\n", + " t = np.linspace(0,1,Nt)\n", + "\n", + " ## Set up the parameters for the network\n", + " num_hidden_neurons = [100, 25]\n", + " num_iter = 250\n", + " lmb = 0.01\n", + "\n", + " P = solve_pde_deep_neural_network(x,t, num_hidden_neurons, num_iter, lmb)\n", + "\n", + " ## Store the results\n", + " g_dnn_ag = np.zeros((Nx, Nt))\n", + " G_analytical = np.zeros((Nx, Nt))\n", + " for i,x_ in enumerate(x):\n", + " for j, t_ in enumerate(t):\n", + " point = np.array([x_, t_])\n", + " g_dnn_ag[i,j] = g_trial(point,P)\n", + "\n", + " G_analytical[i,j] = g_analytic(point)\n", + "\n", + " # Find the map difference between the analytical and the computed solution\n", + " diff_ag = np.abs(g_dnn_ag - G_analytical)\n", + " print('Max absolute difference between the analytical solution and the network: %g'%np.max(diff_ag))\n", + "\n", + " ## Plot the solutions in two dimensions, that being in position and time\n", + "\n", + " T,X = np.meshgrid(t,x)\n", + "\n", + " fig = plt.figure(figsize=(10,10))\n", + " ax = fig.add_suplot(projection='3d')\n", + " ax.set_title('Solution from the deep neural network w/ %d layer'%len(num_hidden_neurons))\n", + " s = ax.plot_surface(T,X,g_dnn_ag,linewidth=0,antialiased=False,cmap=cm.viridis)\n", + " ax.set_xlabel('Time $t$')\n", + " ax.set_ylabel('Position $x$');\n", + "\n", + "\n", + " fig = plt.figure(figsize=(10,10))\n", + " ax = fig.add_suplot(projection='3d')\n", + " ax.set_title('Analytical solution')\n", + " s = ax.plot_surface(T,X,G_analytical,linewidth=0,antialiased=False,cmap=cm.viridis)\n", + " ax.set_xlabel('Time $t$')\n", + " ax.set_ylabel('Position $x$');\n", + "\n", + " fig = plt.figure(figsize=(10,10))\n", + " ax = fig.add_suplot(projection='3d')\n", + " ax.set_title('Difference')\n", + " s = ax.plot_surface(T,X,diff_ag,linewidth=0,antialiased=False,cmap=cm.viridis)\n", + " ax.set_xlabel('Time $t$')\n", + " ax.set_ylabel('Position $x$');\n", + "\n", + " ## Take some slices of the 3D plots just to see the solutions at particular times\n", + " indx1 = 0\n", + " indx2 = int(Nt/2)\n", + " indx3 = Nt-1\n", + "\n", + " t1 = t[indx1]\n", + " t2 = t[indx2]\n", + " t3 = t[indx3]\n", + "\n", + " # Slice the results from the DNN\n", + " res1 = g_dnn_ag[:,indx1]\n", + " res2 = g_dnn_ag[:,indx2]\n", + " res3 = g_dnn_ag[:,indx3]\n", + "\n", + " # Slice the analytical results\n", + " res_analytical1 = G_analytical[:,indx1]\n", + " res_analytical2 = G_analytical[:,indx2]\n", + " res_analytical3 = G_analytical[:,indx3]\n", + "\n", + " # Plot the slices\n", + " plt.figure(figsize=(10,10))\n", 
+ " plt.title(\"Computed solutions at time = %g\"%t1)\n", + " plt.plot(x, res1)\n", + " plt.plot(x,res_analytical1)\n", + " plt.legend(['dnn','analytical'])\n", + "\n", + " plt.figure(figsize=(10,10))\n", + " plt.title(\"Computed solutions at time = %g\"%t2)\n", + " plt.plot(x, res2)\n", + " plt.plot(x,res_analytical2)\n", + " plt.legend(['dnn','analytical'])\n", + "\n", + " plt.figure(figsize=(10,10))\n", + " plt.title(\"Computed solutions at time = %g\"%t3)\n", + " plt.plot(x, res3)\n", + " plt.plot(x,res_analytical3)\n", + " plt.legend(['dnn','analytical'])\n", + "\n", + " plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "ada13a48", + "metadata": { + "editable": true + }, + "source": [ + "## Example: Solving the wave equation with Neural Networks\n", + "\n", + "The wave equation is" + ] + }, + { + "cell_type": "markdown", + "id": "e4727d73", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial^2 g(x,t)}{\\partial t^2} = c^2\\frac{\\partial^2 g(x,t)}{\\partial x^2}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "0b86d555", + "metadata": { + "editable": true + }, + "source": [ + "with $c$ being the specified wave speed.\n", + "\n", + "Here, the chosen conditions are" + ] + }, + { + "cell_type": "markdown", + "id": "216948d5", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{align*}\n", + "\tg(0,t) &= 0 \\\\\n", + "\tg(1,t) &= 0 \\\\\n", + "\tg(x,0) &= u(x) \\\\\n", + "\t\\frac{\\partial g(x,t)}{\\partial t} \\Big |_{t = 0} &= v(x)\n", + "\\end{align*}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "44c25fdc", + "metadata": { + "editable": true + }, + "source": [ + "where $\\frac{\\partial g(x,t)}{\\partial t} \\Big |_{t = 0}$ means the derivative of $g(x,t)$ with respect to $t$ is evaluated at $t = 0$, and $u(x)$ and $v(x)$ being given functions." + ] + }, + { + "cell_type": "markdown", + "id": "98f919eb", + "metadata": { + "editable": true + }, + "source": [ + "## The problem to solve for\n", + "\n", + "The wave equation to solve for, is" + ] + }, + { + "cell_type": "markdown", + "id": "01299767", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
\n", + "\n", + "$$\n", + "\\begin{equation} \\label{wave} \\tag{19}\n", + "\\frac{\\partial^2 g(x,t)}{\\partial t^2} = c^2 \\frac{\\partial^2 g(x,t)}{\\partial x^2}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "556587c5", + "metadata": { + "editable": true + }, + "source": [ + "where $c$ is the given wave speed.\n", + "The chosen conditions for this equation are" + ] + }, + { + "cell_type": "markdown", + "id": "c9eb4f3a", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
\n", + "\n", + "$$\n", + "\\begin{aligned}\n", + "g(0,t) &= 0, &t \\geq 0 \\\\\n", + "g(1,t) &= 0, &t \\geq 0 \\\\\n", + "g(x,0) &= u(x), &x\\in[0,1] \\\\\n", + "\\frac{\\partial g(x,t)}{\\partial t}\\Big |_{t = 0} &= v(x), &x \\in [0,1]\n", + "\\end{aligned} \\label{condwave} \\tag{20}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "63128ef6", + "metadata": { + "editable": true + }, + "source": [ + "In this example, let $c = 1$ and $u(x) = \\sin(\\pi x)$ and $v(x) = -\\pi\\sin(\\pi x)$." + ] + }, + { + "cell_type": "markdown", + "id": "ff568c81", + "metadata": { + "editable": true + }, + "source": [ + "## The trial solution\n", + "Setting up the network is done in similar matter as for the example of solving the diffusion equation.\n", + "The only things we have to change, is the trial solution such that it satisfies the conditions from ([20](#condwave)) and the cost function.\n", + "\n", + "The trial solution becomes slightly different since we have other conditions than in the example of solving the diffusion equation. Here, a possible trial solution $g_t(x,t)$ is\n", + "\n", + "$$\n", + "g_t(x,t) = h_1(x,t) + x(1-x)t^2N(x,t,P)\n", + "$$\n", + "\n", + "where\n", + "\n", + "$$\n", + "h_1(x,t) = (1-t^2)u(x) + tv(x)\n", + "$$\n", + "\n", + "Note that this trial solution satisfies the conditions only if $u(0) = v(0) = u(1) = v(1) = 0$, which is the case in this example." + ] + }, + { + "cell_type": "markdown", + "id": "7b32c8dd", + "metadata": { + "editable": true + }, + "source": [ + "## The analytical solution\n", + "\n", + "The analytical solution for our specific problem, is\n", + "\n", + "$$\n", + "g(x,t) = \\sin(\\pi x)\\cos(\\pi t) - \\sin(\\pi x)\\sin(\\pi t)\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "fc33e683", + "metadata": { + "editable": true + }, + "source": [ + "## Solving the wave equation - the full program using Autograd" + ] + }, + { + "cell_type": "code", + "execution_count": 40, + "id": "2f923958", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import autograd.numpy as np\n", + "from autograd import hessian,grad\n", + "import autograd.numpy.random as npr\n", + "from matplotlib import cm\n", + "from matplotlib import pyplot as plt\n", + "from mpl_toolkits.mplot3d import axes3d\n", + "\n", + "## Set up the trial function:\n", + "def u(x):\n", + " return np.sin(np.pi*x)\n", + "\n", + "def v(x):\n", + " return -np.pi*np.sin(np.pi*x)\n", + "\n", + "def h1(point):\n", + " x,t = point\n", + " return (1 - t**2)*u(x) + t*v(x)\n", + "\n", + "def g_trial(point,P):\n", + " x,t = point\n", + " return h1(point) + x*(1-x)*t**2*deep_neural_network(P,point)\n", + "\n", + "## Define the cost function\n", + "def cost_function(P, x, t):\n", + " cost_sum = 0\n", + "\n", + " g_t_hessian_func = hessian(g_trial)\n", + "\n", + " for x_ in x:\n", + " for t_ in t:\n", + " point = np.array([x_,t_])\n", + "\n", + " g_t_hessian = g_t_hessian_func(point,P)\n", + "\n", + " g_t_d2x = g_t_hessian[0][0]\n", + " g_t_d2t = g_t_hessian[1][1]\n", + "\n", + " err_sqr = ( (g_t_d2t - g_t_d2x) )**2\n", + " cost_sum += err_sqr\n", + "\n", + " return cost_sum / (np.size(t) * np.size(x))\n", + "\n", + "## The neural network\n", + "def sigmoid(z):\n", + " return 1/(1 + np.exp(-z))\n", + "\n", + "def deep_neural_network(deep_params, x):\n", + " # x is now a point and a 1D numpy array; make it a column vector\n", + " num_coordinates = np.size(x,0)\n", + " x = x.reshape(num_coordinates,-1)\n", + "\n", + " num_points = np.size(x,1)\n", + "\n", 
+ " # N_hidden is the number of hidden layers\n", + " N_hidden = len(deep_params) - 1 # -1 since params consist of parameters to all the hidden layers AND the output layer\n", + "\n", + " # Assume that the input layer does nothing to the input x\n", + " x_input = x\n", + " x_prev = x_input\n", + "\n", + " ## Hidden layers:\n", + "\n", + " for l in range(N_hidden):\n", + " # From the list of parameters P; find the correct weigths and bias for this layer\n", + " w_hidden = deep_params[l]\n", + "\n", + " # Add a row of ones to include bias\n", + " x_prev = np.concatenate((np.ones((1,num_points)), x_prev ), axis = 0)\n", + "\n", + " z_hidden = np.matmul(w_hidden, x_prev)\n", + " x_hidden = sigmoid(z_hidden)\n", + "\n", + " # Update x_prev such that next layer can use the output from this layer\n", + " x_prev = x_hidden\n", + "\n", + " ## Output layer:\n", + "\n", + " # Get the weights and bias for this layer\n", + " w_output = deep_params[-1]\n", + "\n", + " # Include bias:\n", + " x_prev = np.concatenate((np.ones((1,num_points)), x_prev), axis = 0)\n", + "\n", + " z_output = np.matmul(w_output, x_prev)\n", + " x_output = z_output\n", + "\n", + " return x_output[0][0]\n", + "\n", + "## The analytical solution\n", + "def g_analytic(point):\n", + " x,t = point\n", + " return np.sin(np.pi*x)*np.cos(np.pi*t) - np.sin(np.pi*x)*np.sin(np.pi*t)\n", + "\n", + "def solve_pde_deep_neural_network(x,t, num_neurons, num_iter, lmb):\n", + " ## Set up initial weigths and biases\n", + " N_hidden = np.size(num_neurons)\n", + "\n", + " ## Set up initial weigths and biases\n", + "\n", + " # Initialize the list of parameters:\n", + " P = [None]*(N_hidden + 1) # + 1 to include the output layer\n", + "\n", + " P[0] = npr.randn(num_neurons[0], 2 + 1 ) # 2 since we have two points, +1 to include bias\n", + " for l in range(1,N_hidden):\n", + " P[l] = npr.randn(num_neurons[l], num_neurons[l-1] + 1) # +1 to include bias\n", + "\n", + " # For the output layer\n", + " P[-1] = npr.randn(1, num_neurons[-1] + 1 ) # +1 since bias is included\n", + "\n", + " print('Initial cost: ',cost_function(P, x, t))\n", + "\n", + " cost_function_grad = grad(cost_function,0)\n", + "\n", + " # Let the update be done num_iter times\n", + " for i in range(num_iter):\n", + " cost_grad = cost_function_grad(P, x , t)\n", + "\n", + " for l in range(N_hidden+1):\n", + " P[l] = P[l] - lmb * cost_grad[l]\n", + "\n", + "\n", + " print('Final cost: ',cost_function(P, x, t))\n", + "\n", + " return P\n", + "\n", + "if __name__ == '__main__':\n", + " ### Use the neural network:\n", + " npr.seed(15)\n", + "\n", + " ## Decide the vales of arguments to the function to solve\n", + " Nx = 10; Nt = 10\n", + " x = np.linspace(0, 1, Nx)\n", + " t = np.linspace(0,1,Nt)\n", + "\n", + " ## Set up the parameters for the network\n", + " num_hidden_neurons = [50,20]\n", + " num_iter = 1000\n", + " lmb = 0.01\n", + "\n", + " P = solve_pde_deep_neural_network(x,t, num_hidden_neurons, num_iter, lmb)\n", + "\n", + " ## Store the results\n", + " res = np.zeros((Nx, Nt))\n", + " res_analytical = np.zeros((Nx, Nt))\n", + " for i,x_ in enumerate(x):\n", + " for j, t_ in enumerate(t):\n", + " point = np.array([x_, t_])\n", + " res[i,j] = g_trial(point,P)\n", + "\n", + " res_analytical[i,j] = g_analytic(point)\n", + "\n", + " diff = np.abs(res - res_analytical)\n", + " print(\"Max difference between analytical and solution from nn: %g\"%np.max(diff))\n", + "\n", + " ## Plot the solutions in two dimensions, that being in position and time\n", + "\n", + " T,X = 
np.meshgrid(t,x)\n",
+ "\n",
+ "    fig = plt.figure(figsize=(10,10))\n",
+ "    ax = fig.add_subplot(projection='3d')\n",
+ "    ax.set_title('Solution from the deep neural network w/ %d layers'%len(num_hidden_neurons))\n",
+ "    s = ax.plot_surface(T,X,res,linewidth=0,antialiased=False,cmap=cm.viridis)\n",
+ "    ax.set_xlabel('Time $t$')\n",
+ "    ax.set_ylabel('Position $x$');\n",
+ "\n",
+ "\n",
+ "    fig = plt.figure(figsize=(10,10))\n",
+ "    ax = fig.add_subplot(projection='3d')\n",
+ "    ax.set_title('Analytical solution')\n",
+ "    s = ax.plot_surface(T,X,res_analytical,linewidth=0,antialiased=False,cmap=cm.viridis)\n",
+ "    ax.set_xlabel('Time $t$')\n",
+ "    ax.set_ylabel('Position $x$');\n",
+ "\n",
+ "\n",
+ "    fig = plt.figure(figsize=(10,10))\n",
+ "    ax = fig.add_subplot(projection='3d')\n",
+ "    ax.set_title('Difference')\n",
+ "    s = ax.plot_surface(T,X,diff,linewidth=0,antialiased=False,cmap=cm.viridis)\n",
+ "    ax.set_xlabel('Time $t$')\n",
+ "    ax.set_ylabel('Position $x$');\n",
+ "\n",
+ "    ## Take some slices of the 3D plots just to see the solutions at particular times\n",
+ "    indx1 = 0\n",
+ "    indx2 = int(Nt/2)\n",
+ "    indx3 = Nt-1\n",
+ "\n",
+ "    t1 = t[indx1]\n",
+ "    t2 = t[indx2]\n",
+ "    t3 = t[indx3]\n",
+ "\n",
+ "    # Slice the results from the DNN\n",
+ "    res1 = res[:,indx1]\n",
+ "    res2 = res[:,indx2]\n",
+ "    res3 = res[:,indx3]\n",
+ "\n",
+ "    # Slice the analytical results\n",
+ "    res_analytical1 = res_analytical[:,indx1]\n",
+ "    res_analytical2 = res_analytical[:,indx2]\n",
+ "    res_analytical3 = res_analytical[:,indx3]\n",
+ "\n",
+ "    # Plot the slices\n",
+ "    plt.figure(figsize=(10,10))\n",
+ "    plt.title(\"Computed solutions at time = %g\"%t1)\n",
+ "    plt.plot(x, res1)\n",
+ "    plt.plot(x,res_analytical1)\n",
+ "    plt.legend(['dnn','analytical'])\n",
+ "\n",
+ "    plt.figure(figsize=(10,10))\n",
+ "    plt.title(\"Computed solutions at time = %g\"%t2)\n",
+ "    plt.plot(x, res2)\n",
+ "    plt.plot(x,res_analytical2)\n",
+ "    plt.legend(['dnn','analytical'])\n",
+ "\n",
+ "    plt.figure(figsize=(10,10))\n",
+ "    plt.title(\"Computed solutions at time = %g\"%t3)\n",
+ "    plt.plot(x, res3)\n",
+ "    plt.plot(x,res_analytical3)\n",
+ "    plt.legend(['dnn','analytical'])\n",
+ "\n",
+ "    plt.show()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "95dea76f",
+ "metadata": {
+ "editable": true
+ },
+ "source": [
+ "## Resources on differential equations and deep learning\n",
+ "\n",
+ "1. [Artificial neural networks for solving ordinary and partial differential equations by I.E. Lagaris et al](https://pdfs.semanticscholar.org/d061/df393e0e8fbfd0ea24976458b7d42419040d.pdf)\n",
+ "\n",
+ "2. [Neural networks for solving differential equations by A. Honchar](https://becominghuman.ai/neural-networks-for-solving-differential-equations-fa230ac5e04c)\n",
+ "\n",
+ "3. [Solving differential equations using neural networks by M.M Chiaramonte and M. Kiener](http://cs229.stanford.edu/proj2013/ChiaramonteKiener-SolvingDifferentialEquationsUsingNeuralNetworks.pdf)\n",
+ "\n",
+ "4. [Introduction to Partial Differential Equations by A. Tveito, R. 
Winther](https://www.springer.com/us/book/9783540225515)" + ] + } + ], + "metadata": {}, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/doc/LectureNotes/_build/html/_sources/week44.ipynb b/doc/LectureNotes/_build/html/_sources/week44.ipynb new file mode 100644 index 000000000..6193b11ee --- /dev/null +++ b/doc/LectureNotes/_build/html/_sources/week44.ipynb @@ -0,0 +1,4983 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "67995f17", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "d31bb6a0", + "metadata": { + "editable": true + }, + "source": [ + "# Week 44, Solving differential equations with neural networks and start Convolutional Neural Networks (CNN)\n", + "**Morten Hjorth-Jensen**, Department of Physics, University of Oslo, Norway\n", + "\n", + "Date: **Week 44**" + ] + }, + { + "cell_type": "markdown", + "id": "846f5bd7", + "metadata": { + "editable": true + }, + "source": [ + "## Plan for week 44\n", + "\n", + "**Material for the lecture Monday October 27, 2025.**\n", + "\n", + "1. Solving differential equations, continuation from last week, first lecture\n", + "\n", + "2. Convolutional Neural Networks, second lecture\n", + "\n", + "3. Readings and Videos:\n", + "\n", + " * These lecture notes at \n", + "\n", + " * For a more in depth discussion on neural networks we recommend Goodfellow et al chapter 9. See also chapter 11 and 12 on practicalities and applications\n", + "\n", + " * Reading suggestions for implementation of CNNs see Rashcka et al.'s chapter 14 at . \n", + "\n", + " * Video on Deep Learning at \n", + "\n", + " * Video on Convolutional Neural Networks from MIT at \n", + "\n", + " * Video on CNNs from Stanford at \n", + "\n", + " * Video of lecture October 27 at \n", + "\n", + " * Whiteboard notes at " + ] + }, + { + "cell_type": "markdown", + "id": "855f98ab", + "metadata": { + "editable": true + }, + "source": [ + "## Lab sessions on Tuesday and Wednesday\n", + "\n", + "* Main focus is discussion of and work on project 2\n", + "\n", + "* If you did not get time to finish the exercises from weeks 41-42, you can also keep working on them and hand in this coming Friday" + ] + }, + { + "cell_type": "markdown", + "id": "12675cc5", + "metadata": { + "editable": true + }, + "source": [ + "## Material for Lecture Monday October 27" + ] + }, + { + "cell_type": "markdown", + "id": "f714320f", + "metadata": { + "editable": true + }, + "source": [ + "## Solving differential equations with Deep Learning\n", + "\n", + "The Universal Approximation Theorem states that a neural network can\n", + "approximate any function at a single hidden layer along with one input\n", + "and output layer to any given precision.\n", + "\n", + "**Book on solving differential equations with ML methods.**\n", + "\n", + "[An Introduction to Neural Network Methods for Differential Equations](https://www.springer.com/gp/book/9789401798150), by Yadav and Kumar.\n", + "\n", + "**Physics informed neural networks.**\n", + "\n", + "[Scientific Machine Learning Through Physics–Informed Neural Networks: Where we are and What’s Next](https://link.springer.com/article/10.1007/s10915-022-01939-z), by Cuomo et al\n", + "\n", + "**Thanks to Kristine Baluka Hein.**\n", + "\n", + "The lectures on differential equations were developed by Kristine Baluka Hein, now PhD student at IFI.\n", + "A great thanks to Kristine." 
+ ] + }, + { + "cell_type": "markdown", + "id": "ebe354b6", + "metadata": { + "editable": true + }, + "source": [ + "## Ordinary Differential Equations first\n", + "\n", + "An ordinary differential equation (ODE) is an equation involving functions having one variable.\n", + "\n", + "In general, an ordinary differential equation looks like" + ] + }, + { + "cell_type": "markdown", + "id": "f16621c0", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
\n", + "\n", + "$$\n", + "\\begin{equation} \\label{ode} \\tag{1}\n", + "f\\left(x, \\, g(x), \\, g'(x), \\, g''(x), \\, \\dots \\, , \\, g^{(n)}(x)\\right) = 0\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "2b272a0d", + "metadata": { + "editable": true + }, + "source": [ + "where $g(x)$ is the function to find, and $g^{(n)}(x)$ is the $n$-th derivative of $g(x)$.\n", + "\n", + "The $f\\left(x, g(x), g'(x), g''(x), \\, \\dots \\, , g^{(n)}(x)\\right)$ is just a way to write that there is an expression involving $x$ and $g(x), \\ g'(x), \\ g''(x), \\, \\dots \\, , \\text{ and } g^{(n)}(x)$ on the left side of the equality sign in ([1](#ode)).\n", + "The highest order of derivative, that is the value of $n$, determines to the order of the equation.\n", + "The equation is referred to as a $n$-th order ODE.\n", + "Along with ([1](#ode)), some additional conditions of the function $g(x)$ are typically given\n", + "for the solution to be unique." + ] + }, + { + "cell_type": "markdown", + "id": "611b2399", + "metadata": { + "editable": true + }, + "source": [ + "## The trial solution\n", + "\n", + "Let the trial solution $g_t(x)$ be" + ] + }, + { + "cell_type": "markdown", + "id": "cab2d9fb", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
\n", + "\n", + "$$\n", + "\\begin{equation}\n", + "\tg_t(x) = h_1(x) + h_2(x,N(x,P))\n", + "\\label{_auto1} \\tag{2}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "fbd68a84", + "metadata": { + "editable": true + }, + "source": [ + "where $h_1(x)$ is a function that makes $g_t(x)$ satisfy a given set\n", + "of conditions, $N(x,P)$ a neural network with weights and biases\n", + "described by $P$ and $h_2(x, N(x,P))$ some expression involving the\n", + "neural network. The role of the function $h_2(x, N(x,P))$, is to\n", + "ensure that the output from $N(x,P)$ is zero when $g_t(x)$ is\n", + "evaluated at the values of $x$ where the given conditions must be\n", + "satisfied. The function $h_1(x)$ should alone make $g_t(x)$ satisfy\n", + "the conditions.\n", + "\n", + "But what about the network $N(x,P)$?\n", + "\n", + "As described previously, an optimization method could be used to minimize the parameters of a neural network, that being its weights and biases, through backward propagation." + ] + }, + { + "cell_type": "markdown", + "id": "24929e78", + "metadata": { + "editable": true + }, + "source": [ + "## Minimization process\n", + "\n", + "For the minimization to be defined, we need to have a cost function at hand to minimize.\n", + "\n", + "It is given that $f\\left(x, \\, g(x), \\, g'(x), \\, g''(x), \\, \\dots \\, , \\, g^{(n)}(x)\\right)$ should be equal to zero in ([1](#ode)).\n", + "We can choose to consider the mean squared error as the cost function for an input $x$.\n", + "Since we are looking at one input, the cost function is just $f$ squared.\n", + "The cost function $c\\left(x, P \\right)$ can therefore be expressed as" + ] + }, + { + "cell_type": "markdown", + "id": "8da0a4d4", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C\\left(x, P\\right) = \\big(f\\left(x, \\, g(x), \\, g'(x), \\, g''(x), \\, \\dots \\, , \\, g^{(n)}(x)\\right)\\big)^2\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "3de8b89e", + "metadata": { + "editable": true + }, + "source": [ + "If $N$ inputs are given as a vector $\\boldsymbol{x}$ with elements $x_i$ for $i = 1,\\dots,N$,\n", + "the cost function becomes" + ] + }, + { + "cell_type": "markdown", + "id": "1275ce7a", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
\n", + "\n", + "$$\n", + "\\begin{equation} \\label{cost} \\tag{3}\n", + "\tC\\left(\\boldsymbol{x}, P\\right) = \\frac{1}{N} \\sum_{i=1}^N \\big(f\\left(x_i, \\, g(x_i), \\, g'(x_i), \\, g''(x_i), \\, \\dots \\, , \\, g^{(n)}(x_i)\\right)\\big)^2\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "a522e0fa", + "metadata": { + "editable": true + }, + "source": [ + "The neural net should then find the parameters $P$ that minimizes the cost function in\n", + "([3](#cost)) for a set of $N$ training samples $x_i$." + ] + }, + { + "cell_type": "markdown", + "id": "8a18955b", + "metadata": { + "editable": true + }, + "source": [ + "## Minimizing the cost function using gradient descent and automatic differentiation\n", + "\n", + "To perform the minimization using gradient descent, the gradient of $C\\left(\\boldsymbol{x}, P\\right)$ is needed.\n", + "It might happen so that finding an analytical expression of the gradient of $C(\\boldsymbol{x}, P)$ from ([3](#cost)) gets too messy, depending on which cost function one desires to use.\n", + "\n", + "Luckily, there exists libraries that makes the job for us through automatic differentiation.\n", + "Automatic differentiation is a method of finding the derivatives numerically with very high precision." + ] + }, + { + "cell_type": "markdown", + "id": "888808f7", + "metadata": { + "editable": true + }, + "source": [ + "## Example: Exponential decay\n", + "\n", + "An exponential decay of a quantity $g(x)$ is described by the equation" + ] + }, + { + "cell_type": "markdown", + "id": "fcefd7fb", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
\n", + "\n", + "$$\n", + "\\begin{equation} \\label{solve_expdec} \\tag{4}\n", + " g'(x) = -\\gamma g(x)\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "02cb2ce9", + "metadata": { + "editable": true + }, + "source": [ + "with $g(0) = g_0$ for some chosen initial value $g_0$.\n", + "\n", + "The analytical solution of ([4](#solve_expdec)) is" + ] + }, + { + "cell_type": "markdown", + "id": "bdd9ef4d", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
\n", + "\n", + "$$\n", + "\\begin{equation}\n", + " g(x) = g_0 \\exp\\left(-\\gamma x\\right)\n", + "\\label{_auto2} \\tag{5}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "867cbb56", + "metadata": { + "editable": true + }, + "source": [ + "Having an analytical solution at hand, it is possible to use it to compare how well a neural network finds a solution of ([4](#solve_expdec))." + ] + }, + { + "cell_type": "markdown", + "id": "2f9ac7ae", + "metadata": { + "editable": true + }, + "source": [ + "## The function to solve for\n", + "\n", + "The program will use a neural network to solve" + ] + }, + { + "cell_type": "markdown", + "id": "49a68337", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
\n", + "\n", + "$$\n", + "\\begin{equation} \\label{solveode} \\tag{6}\n", + "g'(x) = -\\gamma g(x)\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "a6a70316", + "metadata": { + "editable": true + }, + "source": [ + "where $g(0) = g_0$ with $\\gamma$ and $g_0$ being some chosen values.\n", + "\n", + "In this example, $\\gamma = 2$ and $g_0 = 10$." + ] + }, + { + "cell_type": "markdown", + "id": "15622597", + "metadata": { + "editable": true + }, + "source": [ + "## The trial solution\n", + "To begin with, a trial solution $g_t(t)$ must be chosen. A general trial solution for ordinary differential equations could be" + ] + }, + { + "cell_type": "markdown", + "id": "3661d5fe", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "g_t(x, P) = h_1(x) + h_2(x, N(x, P))\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "245327b3", + "metadata": { + "editable": true + }, + "source": [ + "with $h_1(x)$ ensuring that $g_t(x)$ satisfies some conditions and $h_2(x,N(x, P))$ an expression involving $x$ and the output from the neural network $N(x,P)$ with $P $ being the collection of the weights and biases for each layer. For now, it is assumed that the network consists of one input layer, one hidden layer, and one output layer." + ] + }, + { + "cell_type": "markdown", + "id": "57ae96b2", + "metadata": { + "editable": true + }, + "source": [ + "## Setup of Network\n", + "\n", + "In this network, there are no weights and bias at the input layer, so $P = \\{ P_{\\text{hidden}}, P_{\\text{output}} \\}$.\n", + "If there are $N_{\\text{hidden} }$ neurons in the hidden layer, then $P_{\\text{hidden}}$ is a $N_{\\text{hidden} } \\times (1 + N_{\\text{input}})$ matrix, given that there are $N_{\\text{input}}$ neurons in the input layer.\n", + "\n", + "The first column in $P_{\\text{hidden} }$ represents the bias for each neuron in the hidden layer and the second column represents the weights for each neuron in the hidden layer from the input layer.\n", + "If there are $N_{\\text{output} }$ neurons in the output layer, then $P_{\\text{output}} $ is a $N_{\\text{output} } \\times (1 + N_{\\text{hidden} })$ matrix.\n", + "\n", + "Its first column represents the bias of each neuron and the remaining columns represents the weights to each neuron.\n", + "\n", + "It is given that $g(0) = g_0$. The trial solution must fulfill this condition to be a proper solution of ([6](#solveode)). A possible way to ensure that $g_t(0, P) = g_0$, is to let $F(N(x,P)) = x \\cdot N(x,P)$ and $h_1(x) = g_0$. This gives the following trial solution:" + ] + }, + { + "cell_type": "markdown", + "id": "6e7ea73f", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
\n", + "\n", + "$$\n", + "\\begin{equation} \\label{trial} \\tag{7}\n", + "g_t(x, P) = g_0 + x \\cdot N(x, P)\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "3ef84086", + "metadata": { + "editable": true + }, + "source": [ + "## Reformulating the problem\n", + "\n", + "We wish that our neural network manages to minimize a given cost function.\n", + "\n", + "A reformulation of out equation, ([6](#solveode)), must therefore be done,\n", + "such that it describes the problem a neural network can solve for.\n", + "\n", + "The neural network must find the set of weights and biases $P$ such that the trial solution in ([7](#trial)) satisfies ([6](#solveode)).\n", + "\n", + "The trial solution" + ] + }, + { + "cell_type": "markdown", + "id": "03980965", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "g_t(x, P) = g_0 + x \\cdot N(x, P)\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "f838bf7c", + "metadata": { + "editable": true + }, + "source": [ + "has been chosen such that it already solves the condition $g(0) = g_0$. What remains, is to find $P$ such that" + ] + }, + { + "cell_type": "markdown", + "id": "3e1ebb62", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
\n", + "\n", + "$$\n", + "\\begin{equation} \\label{nnmin} \\tag{8}\n", + "g_t'(x, P) = - \\gamma g_t(x, P)\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "a85dcbea", + "metadata": { + "editable": true + }, + "source": [ + "is fulfilled as *best as possible*." + ] + }, + { + "cell_type": "markdown", + "id": "dc4a2fc0", + "metadata": { + "editable": true + }, + "source": [ + "## More technicalities\n", + "\n", + "The left hand side and right hand side of ([8](#nnmin)) must be computed separately, and then the neural network must choose weights and biases, contained in $P$, such that the sides are equal as best as possible.\n", + "This means that the absolute or squared difference between the sides must be as close to zero, ideally equal to zero.\n", + "In this case, the difference squared shows to be an appropriate measurement of how erroneous the trial solution is with respect to $P$ of the neural network.\n", + "\n", + "This gives the following cost function our neural network must solve for:" + ] + }, + { + "cell_type": "markdown", + "id": "20921b20", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\min_{P}\\Big\\{ \\big(g_t'(x, P) - ( -\\gamma g_t(x, P) \\big)^2 \\Big\\}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "06e89d99", + "metadata": { + "editable": true + }, + "source": [ + "(the notation $\\min_{P}\\{ f(x, P) \\}$ means that we desire to find $P$ that yields the minimum of $f(x, P)$)\n", + "\n", + "or, in terms of weights and biases for the hidden and output layer in our network:" + ] + }, + { + "cell_type": "markdown", + "id": "fb4b7d00", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\min_{P_{\\text{hidden} }, \\ P_{\\text{output} }}\\Big\\{ \\big(g_t'(x, \\{ P_{\\text{hidden} }, P_{\\text{output} }\\}) - ( -\\gamma g_t(x, \\{ P_{\\text{hidden} }, P_{\\text{output} }\\}) \\big)^2 \\Big\\}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "925d8872", + "metadata": { + "editable": true + }, + "source": [ + "for an input value $x$." + ] + }, + { + "cell_type": "markdown", + "id": "46f38d69", + "metadata": { + "editable": true + }, + "source": [ + "## More details\n", + "\n", + "If the neural network evaluates $g_t(x, P)$ at more values for $x$, say $N$ values $x_i$ for $i = 1, \\dots, N$, then the *total* error to minimize becomes" + ] + }, + { + "cell_type": "markdown", + "id": "adca56df", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
\n", + "\n", + "$$\n", + "\\begin{equation} \\label{min} \\tag{9}\n", + "\\min_{P}\\Big\\{\\frac{1}{N} \\sum_{i=1}^N \\big(g_t'(x_i, P) - ( -\\gamma g_t(x_i, P) \\big)^2 \\Big\\}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "9e260216", + "metadata": { + "editable": true + }, + "source": [ + "Letting $\\boldsymbol{x}$ be a vector with elements $x_i$ and $C(\\boldsymbol{x}, P) = \\frac{1}{N} \\sum_i \\big(g_t'(x_i, P) - ( -\\gamma g_t(x_i, P) \\big)^2$ denote the cost function, the minimization problem that our network must solve, becomes" + ] + }, + { + "cell_type": "markdown", + "id": "7d5e7f63", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\min_{P} C(\\boldsymbol{x}, P)\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "d7442d44", + "metadata": { + "editable": true + }, + "source": [ + "In terms of $P_{\\text{hidden} }$ and $P_{\\text{output} }$, this could also be expressed as\n", + "\n", + "$$\n", + "\\min_{P_{\\text{hidden} }, \\ P_{\\text{output} }} C(\\boldsymbol{x}, \\{P_{\\text{hidden} }, P_{\\text{output} }\\})\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "af21673a", + "metadata": { + "editable": true + }, + "source": [ + "## A possible implementation of a neural network\n", + "\n", + "For simplicity, it is assumed that the input is an array $\\boldsymbol{x} = (x_1, \\dots, x_N)$ with $N$ elements. It is at these points the neural network should find $P$ such that it fulfills ([9](#min)).\n", + "\n", + "First, the neural network must feed forward the inputs.\n", + "This means that $\\boldsymbol{x}s$ must be passed through an input layer, a hidden layer and a output layer. The input layer in this case, does not need to process the data any further.\n", + "The input layer will consist of $N_{\\text{input} }$ neurons, passing its element to each neuron in the hidden layer. The number of neurons in the hidden layer will be $N_{\\text{hidden} }$." 
+ ] + }, + { + "cell_type": "markdown", + "id": "6687f370", + "metadata": { + "editable": true + }, + "source": [ + "## Technicalities\n", + "\n", + "For the $i$-th in the hidden layer with weight $w_i^{\\text{hidden} }$ and bias $b_i^{\\text{hidden} }$, the weighting from the $j$-th neuron at the input layer is:" + ] + }, + { + "cell_type": "markdown", + "id": "7c07e210", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{aligned}\n", + "z_{i,j}^{\\text{hidden}} &= b_i^{\\text{hidden}} + w_i^{\\text{hidden}}x_j \\\\\n", + "&=\n", + "\\begin{pmatrix}\n", + "b_i^{\\text{hidden}} & w_i^{\\text{hidden}}\n", + "\\end{pmatrix}\n", + "\\begin{pmatrix}\n", + "1 \\\\\n", + "x_j\n", + "\\end{pmatrix}\n", + "\\end{aligned}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "7747386f", + "metadata": { + "editable": true + }, + "source": [ + "## Final technicalities I\n", + "\n", + "The result after weighting the inputs at the $i$-th hidden neuron can be written as a vector:" + ] + }, + { + "cell_type": "markdown", + "id": "981c5e4b", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{aligned}\n", + "\\boldsymbol{z}_{i}^{\\text{hidden}} &= \\Big( b_i^{\\text{hidden}} + w_i^{\\text{hidden}}x_1 , \\ b_i^{\\text{hidden}} + w_i^{\\text{hidden}} x_2, \\ \\dots \\, , \\ b_i^{\\text{hidden}} + w_i^{\\text{hidden}} x_N\\Big) \\\\\n", + "&=\n", + "\\begin{pmatrix}\n", + " b_i^{\\text{hidden}} & w_i^{\\text{hidden}}\n", + "\\end{pmatrix}\n", + "\\begin{pmatrix}\n", + "1 & 1 & \\dots & 1 \\\\\n", + "x_1 & x_2 & \\dots & x_N\n", + "\\end{pmatrix} \\\\\n", + "&= \\boldsymbol{p}_{i, \\text{hidden}}^T X\n", + "\\end{aligned}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "7eedb1ed", + "metadata": { + "editable": true + }, + "source": [ + "## Final technicalities II\n", + "\n", + "The vector $\\boldsymbol{p}_{i, \\text{hidden}}^T$ constitutes each row in $P_{\\text{hidden} }$, which contains the weights for the neural network to minimize according to ([9](#min)).\n", + "\n", + "After having found $\\boldsymbol{z}_{i}^{\\text{hidden}} $ for every $i$-th neuron within the hidden layer, the vector will be sent to an activation function $a_i(\\boldsymbol{z})$.\n", + "\n", + "In this example, the sigmoid function has been chosen to be the activation function for each hidden neuron:" + ] + }, + { + "cell_type": "markdown", + "id": "8507388c", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "f(z) = \\frac{1}{1 + \\exp{(-z)}}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "32c6ce19", + "metadata": { + "editable": true + }, + "source": [ + "It is possible to use other activations functions for the hidden layer also.\n", + "\n", + "The output $\\boldsymbol{x}_i^{\\text{hidden}}$ from each $i$-th hidden neuron is:\n", + "\n", + "$$\n", + "\\boldsymbol{x}_i^{\\text{hidden} } = f\\big( \\boldsymbol{z}_{i}^{\\text{hidden}} \\big)\n", + "$$\n", + "\n", + "The outputs $\\boldsymbol{x}_i^{\\text{hidden} } $ are then sent to the output layer.\n", + "\n", + "The output layer consists of one neuron in this case, and combines the\n", + "output from each of the neurons in the hidden layers. The output layer\n", + "combines the results from the hidden layer using some weights $w_i^{\\text{output}}$\n", + "and biases $b_i^{\\text{output}}$. In this case,\n", + "it is assumes that the number of neurons in the output layer is one." 
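+ ,
+ "\n",
+ "A small sketch (with made-up sizes) of this hidden-layer computation, using the row of ones to absorb the biases into a single matrix product; the output-layer weighting described next works in exactly the same way:\n",
+ "\n",
+ "```python\n",
+ "import autograd.numpy as np\n",
+ "import autograd.numpy.random as npr\n",
+ "\n",
+ "def sigmoid(z):\n",
+ "    return 1/(1 + np.exp(-z))\n",
+ "\n",
+ "x = np.linspace(0, 1, 5)                                          # N = 5 input points\n",
+ "X = np.concatenate((np.ones((1, 5)), x.reshape(1, -1)), axis=0)   # first row of ones gives the bias term\n",
+ "P_hidden = npr.randn(10, 2)                                       # 10 hidden neurons, each row is [bias, weight]\n",
+ "x_hidden = sigmoid(np.matmul(P_hidden, X))                        # shape (10, 5): every hidden neuron at every x\n",
+ "```"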
+ ] + }, + { + "cell_type": "markdown", + "id": "d3adb503", + "metadata": { + "editable": true + }, + "source": [ + "## Final technicalities III\n", + "\n", + "The procedure of weighting the output neuron $j$ in the hidden layer to the $i$-th neuron in the output layer is similar as for the hidden layer described previously." + ] + }, + { + "cell_type": "markdown", + "id": "41fb7d85", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{aligned}\n", + "z_{1,j}^{\\text{output}} & =\n", + "\\begin{pmatrix}\n", + "b_1^{\\text{output}} & \\boldsymbol{w}_1^{\\text{output}}\n", + "\\end{pmatrix}\n", + "\\begin{pmatrix}\n", + "1 \\\\\n", + "\\boldsymbol{x}_j^{\\text{hidden}}\n", + "\\end{pmatrix}\n", + "\\end{aligned}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "6af6c5f6", + "metadata": { + "editable": true + }, + "source": [ + "## Final technicalities IV\n", + "\n", + "Expressing $z_{1,j}^{\\text{output}}$ as a vector gives the following way of weighting the inputs from the hidden layer:" + ] + }, + { + "cell_type": "markdown", + "id": "bfdfcfe5", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{z}_{1}^{\\text{output}} =\n", + "\\begin{pmatrix}\n", + "b_1^{\\text{output}} & \\boldsymbol{w}_1^{\\text{output}}\n", + "\\end{pmatrix}\n", + "\\begin{pmatrix}\n", + "1 & 1 & \\dots & 1 \\\\\n", + "\\boldsymbol{x}_1^{\\text{hidden}} & \\boldsymbol{x}_2^{\\text{hidden}} & \\dots & \\boldsymbol{x}_N^{\\text{hidden}}\n", + "\\end{pmatrix}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "224fb7a0", + "metadata": { + "editable": true + }, + "source": [ + "In this case we seek a continuous range of values since we are approximating a function. This means that after computing $\\boldsymbol{z}_{1}^{\\text{output}}$ the neural network has finished its feed forward step, and $\\boldsymbol{z}_{1}^{\\text{output}}$ is the final output of the network." + ] + }, + { + "cell_type": "markdown", + "id": "03c8c39e", + "metadata": { + "editable": true + }, + "source": [ + "## Back propagation\n", + "\n", + "The next step is to decide how the parameters should be changed such that they minimize the cost function.\n", + "\n", + "The chosen cost function for this problem is" + ] + }, + { + "cell_type": "markdown", + "id": "f467feb4", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C(\\boldsymbol{x}, P) = \\frac{1}{N} \\sum_i \\big(g_t'(x_i, P) - ( -\\gamma g_t(x_i, P) \\big)^2\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "287a0aed", + "metadata": { + "editable": true + }, + "source": [ + "In order to minimize the cost function, an optimization method must be chosen.\n", + "\n", + "Here, gradient descent with a constant step size has been chosen." 
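+ ,
+ "\n",
+ "As a toy illustration of this choice, here is gradient descent with a constant step size on a made-up one-parameter cost, using Autograd's `grad`; the full program below applies the same update to every array in $P$:\n",
+ "\n",
+ "```python\n",
+ "from autograd import grad\n",
+ "\n",
+ "def C(w):\n",
+ "    # stand-in cost with minimum at w = 3\n",
+ "    return (w - 3.0)**2\n",
+ "\n",
+ "dC = grad(C)       # derivative of C with respect to w\n",
+ "w = 0.0            # initial guess\n",
+ "lmb = 0.1          # constant step size\n",
+ "for _ in range(200):\n",
+ "    w = w - lmb*dC(w)\n",
+ "print(w)           # close to 3.0\n",
+ "```"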
+ ] + }, + { + "cell_type": "markdown", + "id": "a49835f1", + "metadata": { + "editable": true + }, + "source": [ + "## Gradient descent\n", + "\n", + "The idea of the gradient descent algorithm is to update parameters in\n", + "a direction where the cost function decreases goes to a minimum.\n", + "\n", + "In general, the update of some parameters $\\boldsymbol{\\omega}$ given a cost\n", + "function defined by some weights $\\boldsymbol{\\omega}$, $C(\\boldsymbol{x},\n", + "\\boldsymbol{\\omega})$, goes as follows:" + ] + }, + { + "cell_type": "markdown", + "id": "62d6f51d", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{\\omega}_{\\text{new} } = \\boldsymbol{\\omega} - \\lambda \\nabla_{\\boldsymbol{\\omega}} C(\\boldsymbol{x}, \\boldsymbol{\\omega})\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "3ca20573", + "metadata": { + "editable": true + }, + "source": [ + "for a number of iterations or until $ \\big|\\big| \\boldsymbol{\\omega}_{\\text{new} } - \\boldsymbol{\\omega} \\big|\\big|$ becomes smaller than some given tolerance.\n", + "\n", + "The value of $\\lambda$ decides how large steps the algorithm must take\n", + "in the direction of $ \\nabla_{\\boldsymbol{\\omega}} C(\\boldsymbol{x}, \\boldsymbol{\\omega})$.\n", + "The notation $\\nabla_{\\boldsymbol{\\omega}}$ express the gradient with respect\n", + "to the elements in $\\boldsymbol{\\omega}$.\n", + "\n", + "In our case, we have to minimize the cost function $C(\\boldsymbol{x}, P)$ with\n", + "respect to the two sets of weights and biases, that is for the hidden\n", + "layer $P_{\\text{hidden} }$ and for the output layer $P_{\\text{output}\n", + "}$ .\n", + "\n", + "This means that $P_{\\text{hidden} }$ and $P_{\\text{output} }$ is updated by" + ] + }, + { + "cell_type": "markdown", + "id": "8b16bc94", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{aligned}\n", + "P_{\\text{hidden},\\text{new}} &= P_{\\text{hidden}} - \\lambda \\nabla_{P_{\\text{hidden}}} C(\\boldsymbol{x}, P) \\\\\n", + "P_{\\text{output},\\text{new}} &= P_{\\text{output}} - \\lambda \\nabla_{P_{\\text{output}}} C(\\boldsymbol{x}, P)\n", + "\\end{aligned}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "a339b3a7", + "metadata": { + "editable": true + }, + "source": [ + "## The code for solving the ODE" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "a63e587a", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "%matplotlib inline\n", + "\n", + "import autograd.numpy as np\n", + "from autograd import grad, elementwise_grad\n", + "import autograd.numpy.random as npr\n", + "from matplotlib import pyplot as plt\n", + "\n", + "def sigmoid(z):\n", + " return 1/(1 + np.exp(-z))\n", + "\n", + "# Assuming one input, hidden, and output layer\n", + "def neural_network(params, x):\n", + "\n", + " # Find the weights (including and biases) for the hidden and output layer.\n", + " # Assume that params is a list of parameters for each layer.\n", + " # The biases are the first element for each array in params,\n", + " # and the weights are the remaning elements in each array in params.\n", + "\n", + " w_hidden = params[0]\n", + " w_output = params[1]\n", + "\n", + " # Assumes input x being an one-dimensional array\n", + " num_values = np.size(x)\n", + " x = x.reshape(-1, num_values)\n", + "\n", + " # Assume that the input layer does nothing to the input x\n", + " x_input = x\n", + "\n", + " ## Hidden layer:\n", + "\n", + " # Add 
a row of ones to include bias\n", + " x_input = np.concatenate((np.ones((1,num_values)), x_input ), axis = 0)\n", + "\n", + " z_hidden = np.matmul(w_hidden, x_input)\n", + " x_hidden = sigmoid(z_hidden)\n", + "\n", + " ## Output layer:\n", + "\n", + " # Include bias:\n", + " x_hidden = np.concatenate((np.ones((1,num_values)), x_hidden ), axis = 0)\n", + "\n", + " z_output = np.matmul(w_output, x_hidden)\n", + " x_output = z_output\n", + "\n", + " return x_output\n", + "\n", + "# The trial solution using the deep neural network:\n", + "def g_trial(x,params, g0 = 10):\n", + " return g0 + x*neural_network(params,x)\n", + "\n", + "# The right side of the ODE:\n", + "def g(x, g_trial, gamma = 2):\n", + " return -gamma*g_trial\n", + "\n", + "# The cost function:\n", + "def cost_function(P, x):\n", + "\n", + " # Evaluate the trial function with the current parameters P\n", + " g_t = g_trial(x,P)\n", + "\n", + " # Find the derivative w.r.t x of the neural network\n", + " d_net_out = elementwise_grad(neural_network,1)(P,x)\n", + "\n", + " # Find the derivative w.r.t x of the trial function\n", + " d_g_t = elementwise_grad(g_trial,0)(x,P)\n", + "\n", + " # The right side of the ODE\n", + " func = g(x, g_t)\n", + "\n", + " err_sqr = (d_g_t - func)**2\n", + " cost_sum = np.sum(err_sqr)\n", + "\n", + " return cost_sum / np.size(err_sqr)\n", + "\n", + "# Solve the exponential decay ODE using neural network with one input, hidden, and output layer\n", + "def solve_ode_neural_network(x, num_neurons_hidden, num_iter, lmb):\n", + " ## Set up initial weights and biases\n", + "\n", + " # For the hidden layer\n", + " p0 = npr.randn(num_neurons_hidden, 2 )\n", + "\n", + " # For the output layer\n", + " p1 = npr.randn(1, num_neurons_hidden + 1 ) # +1 since bias is included\n", + "\n", + " P = [p0, p1]\n", + "\n", + " print('Initial cost: %g'%cost_function(P, x))\n", + "\n", + " ## Start finding the optimal weights using gradient descent\n", + "\n", + " # Find the Python function that represents the gradient of the cost function\n", + " # w.r.t the 0-th input argument -- that is the weights and biases in the hidden and output layer\n", + " cost_function_grad = grad(cost_function,0)\n", + "\n", + " # Let the update be done num_iter times\n", + " for i in range(num_iter):\n", + " # Evaluate the gradient at the current weights and biases in P.\n", + " # The cost_grad consist now of two arrays;\n", + " # one for the gradient w.r.t P_hidden and\n", + " # one for the gradient w.r.t P_output\n", + " cost_grad = cost_function_grad(P, x)\n", + "\n", + " P[0] = P[0] - lmb * cost_grad[0]\n", + " P[1] = P[1] - lmb * cost_grad[1]\n", + "\n", + " print('Final cost: %g'%cost_function(P, x))\n", + "\n", + " return P\n", + "\n", + "def g_analytic(x, gamma = 2, g0 = 10):\n", + " return g0*np.exp(-gamma*x)\n", + "\n", + "# Solve the given problem\n", + "if __name__ == '__main__':\n", + " # Set seed such that the weight are initialized\n", + " # with same weights and biases for every run.\n", + " npr.seed(15)\n", + "\n", + " ## Decide the vales of arguments to the function to solve\n", + " N = 10\n", + " x = np.linspace(0, 1, N)\n", + "\n", + " ## Set up the initial parameters\n", + " num_hidden_neurons = 10\n", + " num_iter = 10000\n", + " lmb = 0.001\n", + "\n", + " # Use the network\n", + " P = solve_ode_neural_network(x, num_hidden_neurons, num_iter, lmb)\n", + "\n", + " # Print the deviation from the trial solution and true solution\n", + " res = g_trial(x,P)\n", + " res_analytical = g_analytic(x)\n", + "\n", + " print('Max 
absolute difference: %g'%np.max(np.abs(res - res_analytical)))\n", + "\n", + " # Plot the results\n", + " plt.figure(figsize=(10,10))\n", + "\n", + " plt.title('Performance of neural network solving an ODE compared to the analytical solution')\n", + " plt.plot(x, res_analytical)\n", + " plt.plot(x, res[0,:])\n", + " plt.legend(['analytical','nn'])\n", + " plt.xlabel('x')\n", + " plt.ylabel('g(x)')\n", + " plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "85985bda", + "metadata": { + "editable": true + }, + "source": [ + "## The network with one input layer, specified number of hidden layers, and one output layer\n", + "\n", + "It is also possible to extend the construction of our network into a more general one, allowing the network to contain more than one hidden layers.\n", + "\n", + "The number of neurons within each hidden layer are given as a list of integers in the program below." + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "91831f8e", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import autograd.numpy as np\n", + "from autograd import grad, elementwise_grad\n", + "import autograd.numpy.random as npr\n", + "from matplotlib import pyplot as plt\n", + "\n", + "def sigmoid(z):\n", + " return 1/(1 + np.exp(-z))\n", + "\n", + "# The neural network with one input layer and one output layer,\n", + "# but with number of hidden layers specified by the user.\n", + "def deep_neural_network(deep_params, x):\n", + " # N_hidden is the number of hidden layers\n", + " # deep_params is a list, len() should be used\n", + " N_hidden = len(deep_params) - 1 # -1 since params consists of\n", + " # parameters to all the hidden\n", + " # layers AND the output layer.\n", + "\n", + " # Assumes input x being an one-dimensional array\n", + " num_values = np.size(x)\n", + " x = x.reshape(-1, num_values)\n", + "\n", + " # Assume that the input layer does nothing to the input x\n", + " x_input = x\n", + "\n", + " # Due to multiple hidden layers, define a variable referencing to the\n", + " # output of the previous layer:\n", + " x_prev = x_input\n", + "\n", + " ## Hidden layers:\n", + "\n", + " for l in range(N_hidden):\n", + " # From the list of parameters P; find the correct weigths and bias for this layer\n", + " w_hidden = deep_params[l]\n", + "\n", + " # Add a row of ones to include bias\n", + " x_prev = np.concatenate((np.ones((1,num_values)), x_prev ), axis = 0)\n", + "\n", + " z_hidden = np.matmul(w_hidden, x_prev)\n", + " x_hidden = sigmoid(z_hidden)\n", + "\n", + " # Update x_prev such that next layer can use the output from this layer\n", + " x_prev = x_hidden\n", + "\n", + " ## Output layer:\n", + "\n", + " # Get the weights and bias for this layer\n", + " w_output = deep_params[-1]\n", + "\n", + " # Include bias:\n", + " x_prev = np.concatenate((np.ones((1,num_values)), x_prev), axis = 0)\n", + "\n", + " z_output = np.matmul(w_output, x_prev)\n", + " x_output = z_output\n", + "\n", + " return x_output\n", + "\n", + "# The trial solution using the deep neural network:\n", + "def g_trial_deep(x,params, g0 = 10):\n", + " return g0 + x*deep_neural_network(params, x)\n", + "\n", + "# The right side of the ODE:\n", + "def g(x, g_trial, gamma = 2):\n", + " return -gamma*g_trial\n", + "\n", + "# The same cost function as before, but calls deep_neural_network instead.\n", + "def cost_function_deep(P, x):\n", + "\n", + " # Evaluate the trial function with the current parameters P\n", + " g_t = g_trial_deep(x,P)\n", + "\n", + " 
# Find the derivative w.r.t x of the neural network\n", + " d_net_out = elementwise_grad(deep_neural_network,1)(P,x)\n", + "\n", + " # Find the derivative w.r.t x of the trial function\n", + " d_g_t = elementwise_grad(g_trial_deep,0)(x,P)\n", + "\n", + " # The right side of the ODE\n", + " func = g(x, g_t)\n", + "\n", + " err_sqr = (d_g_t - func)**2\n", + " cost_sum = np.sum(err_sqr)\n", + "\n", + " return cost_sum / np.size(err_sqr)\n", + "\n", + "# Solve the exponential decay ODE using neural network with one input and one output layer,\n", + "# but with specified number of hidden layers from the user.\n", + "def solve_ode_deep_neural_network(x, num_neurons, num_iter, lmb):\n", + " # num_hidden_neurons is now a list of number of neurons within each hidden layer\n", + "\n", + " # The number of elements in the list num_hidden_neurons thus represents\n", + " # the number of hidden layers.\n", + "\n", + " # Find the number of hidden layers:\n", + " N_hidden = np.size(num_neurons)\n", + "\n", + " ## Set up initial weights and biases\n", + "\n", + " # Initialize the list of parameters:\n", + " P = [None]*(N_hidden + 1) # + 1 to include the output layer\n", + "\n", + " P[0] = npr.randn(num_neurons[0], 2 )\n", + " for l in range(1,N_hidden):\n", + " P[l] = npr.randn(num_neurons[l], num_neurons[l-1] + 1) # +1 to include bias\n", + "\n", + " # For the output layer\n", + " P[-1] = npr.randn(1, num_neurons[-1] + 1 ) # +1 since bias is included\n", + "\n", + " print('Initial cost: %g'%cost_function_deep(P, x))\n", + "\n", + " ## Start finding the optimal weights using gradient descent\n", + "\n", + " # Find the Python function that represents the gradient of the cost function\n", + " # w.r.t the 0-th input argument -- that is the weights and biases in the hidden and output layer\n", + " cost_function_deep_grad = grad(cost_function_deep,0)\n", + "\n", + " # Let the update be done num_iter times\n", + " for i in range(num_iter):\n", + " # Evaluate the gradient at the current weights and biases in P.\n", + " # The cost_grad consist now of N_hidden + 1 arrays; the gradient w.r.t the weights and biases\n", + " # in the hidden layers and output layers evaluated at x.\n", + " cost_deep_grad = cost_function_deep_grad(P, x)\n", + "\n", + " for l in range(N_hidden+1):\n", + " P[l] = P[l] - lmb * cost_deep_grad[l]\n", + "\n", + " print('Final cost: %g'%cost_function_deep(P, x))\n", + "\n", + " return P\n", + "\n", + "def g_analytic(x, gamma = 2, g0 = 10):\n", + " return g0*np.exp(-gamma*x)\n", + "\n", + "# Solve the given problem\n", + "if __name__ == '__main__':\n", + " npr.seed(15)\n", + "\n", + " ## Decide the vales of arguments to the function to solve\n", + " N = 10\n", + " x = np.linspace(0, 1, N)\n", + "\n", + " ## Set up the initial parameters\n", + " num_hidden_neurons = np.array([10,10])\n", + " num_iter = 10000\n", + " lmb = 0.001\n", + "\n", + " P = solve_ode_deep_neural_network(x, num_hidden_neurons, num_iter, lmb)\n", + "\n", + " res = g_trial_deep(x,P)\n", + " res_analytical = g_analytic(x)\n", + "\n", + " plt.figure(figsize=(10,10))\n", + "\n", + " plt.title('Performance of a deep neural network solving an ODE compared to the analytical solution')\n", + " plt.plot(x, res_analytical)\n", + " plt.plot(x, res[0,:])\n", + " plt.legend(['analytical','dnn'])\n", + " plt.ylabel('g(x)')\n", + " plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "e6de1553", + "metadata": { + "editable": true + }, + "source": [ + "## Example: Population growth\n", + "\n", + "A logistic model of population growth 
assumes that a population converges toward an equilibrium.\n", + "The population growth can be modeled by" + ] + }, + { + "cell_type": "markdown", + "id": "6e4c5e3a", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
\n",
+    "$$\n",
+    "\\begin{equation} \\label{log} \\tag{10}\n",
+    "\tg'(t) = \\alpha g(t)(A - g(t))\n",
+    "\\end{equation}\n",
+    "$$"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "64a97256",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "where $g(t)$ is the population density at time $t$, $\\alpha > 0$ is the growth rate and $A > 0$ is the maximum population number in the environment.\n",
+    "Also, at $t = 0$ the population has the size $g(0) = g_0$, where $g_0$ is some chosen constant.\n",
+    "\n",
+    "In this example, we use a network similar to the one used for the exponential decay, again implemented with Autograd. However, since such an implementation may suffer from, e.g., numerical instability\n",
+    "and long execution times (this becomes more apparent in the examples solving PDEs),\n",
+    "using a library like TensorFlow is recommended for larger problems.\n",
+    "Here, we stay with the simpler approach and, for comparison, also implement the forward Euler method."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "94bb8aaa",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "## Setting up the problem\n",
+    "\n",
+    "Here, we will model a population $g(t)$ in an environment having carrying capacity $A$.\n",
+    "The population follows the model"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "29ead54b",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "\n",
+    "
\n", + "\n", + "$$\n", + "\\begin{equation} \\label{solveode_population} \\tag{11}\n", + "g'(t) = \\alpha g(t)(A - g(t))\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "5685f6e2", + "metadata": { + "editable": true + }, + "source": [ + "where $g(0) = g_0$.\n", + "\n", + "In this example, we let $\\alpha = 2$, $A = 1$, and $g_0 = 1.2$." + ] + }, + { + "cell_type": "markdown", + "id": "adaea719", + "metadata": { + "editable": true + }, + "source": [ + "## The trial solution\n", + "\n", + "We will get a slightly different trial solution, as the boundary conditions are different\n", + "compared to the case for exponential decay.\n", + "\n", + "A possible trial solution satisfying the condition $g(0) = g_0$ could be\n", + "\n", + "$$\n", + "h_1(t) = g_0 + t \\cdot N(t,P)\n", + "$$\n", + "\n", + "with $N(t,P)$ being the output from the neural network with weights and biases for each layer collected in the set $P$.\n", + "\n", + "The analytical solution is\n", + "\n", + "$$\n", + "g(t) = \\frac{Ag_0}{g_0 + (A - g_0)\\exp(-\\alpha A t)}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "4ee7e543", + "metadata": { + "editable": true + }, + "source": [ + "## The program using Autograd\n", + "\n", + "The network will be the similar as for the exponential decay example, but with some small modifications for our problem." + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "e50f4369", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import autograd.numpy as np\n", + "from autograd import grad, elementwise_grad\n", + "import autograd.numpy.random as npr\n", + "from matplotlib import pyplot as plt\n", + "\n", + "def sigmoid(z):\n", + " return 1/(1 + np.exp(-z))\n", + "\n", + "# Function to get the parameters.\n", + "# Done such that one can easily change the paramaters after one's liking.\n", + "def get_parameters():\n", + " alpha = 2\n", + " A = 1\n", + " g0 = 1.2\n", + " return alpha, A, g0\n", + "\n", + "def deep_neural_network(deep_params, x):\n", + " # N_hidden is the number of hidden layers\n", + " # deep_params is a list, len() should be used\n", + " N_hidden = len(deep_params) - 1 # -1 since params consists of\n", + " # parameters to all the hidden\n", + " # layers AND the output layer.\n", + "\n", + " # Assumes input x being an one-dimensional array\n", + " num_values = np.size(x)\n", + " x = x.reshape(-1, num_values)\n", + "\n", + " # Assume that the input layer does nothing to the input x\n", + " x_input = x\n", + "\n", + " # Due to multiple hidden layers, define a variable referencing to the\n", + " # output of the previous layer:\n", + " x_prev = x_input\n", + "\n", + " ## Hidden layers:\n", + "\n", + " for l in range(N_hidden):\n", + " # From the list of parameters P; find the correct weigths and bias for this layer\n", + " w_hidden = deep_params[l]\n", + "\n", + " # Add a row of ones to include bias\n", + " x_prev = np.concatenate((np.ones((1,num_values)), x_prev ), axis = 0)\n", + "\n", + " z_hidden = np.matmul(w_hidden, x_prev)\n", + " x_hidden = sigmoid(z_hidden)\n", + "\n", + " # Update x_prev such that next layer can use the output from this layer\n", + " x_prev = x_hidden\n", + "\n", + " ## Output layer:\n", + "\n", + " # Get the weights and bias for this layer\n", + " w_output = deep_params[-1]\n", + "\n", + " # Include bias:\n", + " x_prev = np.concatenate((np.ones((1,num_values)), x_prev), axis = 0)\n", + "\n", + " z_output = np.matmul(w_output, x_prev)\n", + " x_output = 
z_output\n", + "\n", + " return x_output\n", + "\n", + "\n", + "\n", + "\n", + "def cost_function_deep(P, x):\n", + "\n", + " # Evaluate the trial function with the current parameters P\n", + " g_t = g_trial_deep(x,P)\n", + "\n", + " # Find the derivative w.r.t x of the trial function\n", + " d_g_t = elementwise_grad(g_trial_deep,0)(x,P)\n", + "\n", + " # The right side of the ODE\n", + " func = f(x, g_t)\n", + "\n", + " err_sqr = (d_g_t - func)**2\n", + " cost_sum = np.sum(err_sqr)\n", + "\n", + " return cost_sum / np.size(err_sqr)\n", + "\n", + "# The right side of the ODE:\n", + "def f(x, g_trial):\n", + " alpha,A, g0 = get_parameters()\n", + " return alpha*g_trial*(A - g_trial)\n", + "\n", + "# The trial solution using the deep neural network:\n", + "def g_trial_deep(x, params):\n", + " alpha,A, g0 = get_parameters()\n", + " return g0 + x*deep_neural_network(params,x)\n", + "\n", + "# The analytical solution:\n", + "def g_analytic(t):\n", + " alpha,A, g0 = get_parameters()\n", + " return A*g0/(g0 + (A - g0)*np.exp(-alpha*A*t))\n", + "\n", + "def solve_ode_deep_neural_network(x, num_neurons, num_iter, lmb):\n", + " # num_hidden_neurons is now a list of number of neurons within each hidden layer\n", + "\n", + " # Find the number of hidden layers:\n", + " N_hidden = np.size(num_neurons)\n", + "\n", + " ## Set up initial weigths and biases\n", + "\n", + " # Initialize the list of parameters:\n", + " P = [None]*(N_hidden + 1) # + 1 to include the output layer\n", + "\n", + " P[0] = npr.randn(num_neurons[0], 2 )\n", + " for l in range(1,N_hidden):\n", + " P[l] = npr.randn(num_neurons[l], num_neurons[l-1] + 1) # +1 to include bias\n", + "\n", + " # For the output layer\n", + " P[-1] = npr.randn(1, num_neurons[-1] + 1 ) # +1 since bias is included\n", + "\n", + " print('Initial cost: %g'%cost_function_deep(P, x))\n", + "\n", + " ## Start finding the optimal weigths using gradient descent\n", + "\n", + " # Find the Python function that represents the gradient of the cost function\n", + " # w.r.t the 0-th input argument -- that is the weights and biases in the hidden and output layer\n", + " cost_function_deep_grad = grad(cost_function_deep,0)\n", + "\n", + " # Let the update be done num_iter times\n", + " for i in range(num_iter):\n", + " # Evaluate the gradient at the current weights and biases in P.\n", + " # The cost_grad consist now of N_hidden + 1 arrays; the gradient w.r.t the weights and biases\n", + " # in the hidden layers and output layers evaluated at x.\n", + " cost_deep_grad = cost_function_deep_grad(P, x)\n", + "\n", + " for l in range(N_hidden+1):\n", + " P[l] = P[l] - lmb * cost_deep_grad[l]\n", + "\n", + " print('Final cost: %g'%cost_function_deep(P, x))\n", + "\n", + " return P\n", + "\n", + "if __name__ == '__main__':\n", + " npr.seed(4155)\n", + "\n", + " ## Decide the vales of arguments to the function to solve\n", + " Nt = 10\n", + " T = 1\n", + " t = np.linspace(0,T, Nt)\n", + "\n", + " ## Set up the initial parameters\n", + " num_hidden_neurons = [100, 50, 25]\n", + " num_iter = 1000\n", + " lmb = 1e-3\n", + "\n", + " P = solve_ode_deep_neural_network(t, num_hidden_neurons, num_iter, lmb)\n", + "\n", + " g_dnn_ag = g_trial_deep(t,P)\n", + " g_analytical = g_analytic(t)\n", + "\n", + " # Find the maximum absolute difference between the solutons:\n", + " diff_ag = np.max(np.abs(g_dnn_ag - g_analytical))\n", + " print(\"The max absolute difference between the solutions is: %g\"%diff_ag)\n", + "\n", + " plt.figure(figsize=(10,10))\n", + "\n", + " plt.title('Performance of 
neural network solving an ODE compared to the analytical solution')\n", + " plt.plot(t, g_analytical)\n", + " plt.plot(t, g_dnn_ag[0,:])\n", + " plt.legend(['analytical','nn'])\n", + " plt.xlabel('t')\n", + " plt.ylabel('g(t)')\n", + "\n", + " plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "cf212644", + "metadata": { + "editable": true + }, + "source": [ + "## Using forward Euler to solve the ODE\n", + "\n", + "A straightforward way of solving an ODE numerically, is to use Euler's method.\n", + "\n", + "Euler's method uses Taylor series to approximate the value at a function $f$ at a step $\\Delta x$ from $x$:\n", + "\n", + "$$\n", + "f(x + \\Delta x) \\approx f(x) + \\Delta x f'(x)\n", + "$$\n", + "\n", + "In our case, using Euler's method to approximate the value of $g$ at a step $\\Delta t$ from $t$ yields" + ] + }, + { + "cell_type": "markdown", + "id": "46f2fb77", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{aligned}\n", + " g(t + \\Delta t) &\\approx g(t) + \\Delta t g'(t) \\\\\n", + " &= g(t) + \\Delta t \\big(\\alpha g(t)(A - g(t))\\big)\n", + "\\end{aligned}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "aab2dfa5", + "metadata": { + "editable": true + }, + "source": [ + "along with the condition that $g(0) = g_0$.\n", + "\n", + "Let $t_i = i \\cdot \\Delta t$ where $\\Delta t = \\frac{T}{N_t-1}$ where $T$ is the final time our solver must solve for and $N_t$ the number of values for $t \\in [0, T]$ for $i = 0, \\dots, N_t-1$.\n", + "\n", + "For $i \\geq 1$, we have that" + ] + }, + { + "cell_type": "markdown", + "id": "8eea575e", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{aligned}\n", + "t_i &= i\\Delta t \\\\\n", + "&= (i - 1)\\Delta t + \\Delta t \\\\\n", + "&= t_{i-1} + \\Delta t\n", + "\\end{aligned}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b91b116d", + "metadata": { + "editable": true + }, + "source": [ + "Now, if $g_i = g(t_i)$ then" + ] + }, + { + "cell_type": "markdown", + "id": "b438159d", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
\n", + "\n", + "$$\n", + "\\begin{equation}\n", + " \\begin{aligned}\n", + " g_i &= g(t_i) \\\\\n", + " &= g(t_{i-1} + \\Delta t) \\\\\n", + " &\\approx g(t_{i-1}) + \\Delta t \\big(\\alpha g(t_{i-1})(A - g(t_{i-1}))\\big) \\\\\n", + " &= g_{i-1} + \\Delta t \\big(\\alpha g_{i-1}(A - g_{i-1})\\big)\n", + " \\end{aligned}\n", + "\\end{equation} \\label{odenum} \\tag{12}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "c4fcc89b", + "metadata": { + "editable": true + }, + "source": [ + "for $i \\geq 1$ and $g_0 = g(t_0) = g(0) = g_0$.\n", + "\n", + "Equation ([12](#odenum)) could be implemented in the following way,\n", + "extending the program that uses the network using Autograd:" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "98f55b29", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# Assume that all function definitions from the example program using Autograd\n", + "# are located here.\n", + "\n", + "if __name__ == '__main__':\n", + " npr.seed(4155)\n", + "\n", + " ## Decide the vales of arguments to the function to solve\n", + " Nt = 10\n", + " T = 1\n", + " t = np.linspace(0,T, Nt)\n", + "\n", + " ## Set up the initial parameters\n", + " num_hidden_neurons = [100,50,25]\n", + " num_iter = 1000\n", + " lmb = 1e-3\n", + "\n", + " P = solve_ode_deep_neural_network(t, num_hidden_neurons, num_iter, lmb)\n", + "\n", + " g_dnn_ag = g_trial_deep(t,P)\n", + " g_analytical = g_analytic(t)\n", + "\n", + " # Find the maximum absolute difference between the solutons:\n", + " diff_ag = np.max(np.abs(g_dnn_ag - g_analytical))\n", + " print(\"The max absolute difference between the solutions is: %g\"%diff_ag)\n", + "\n", + " plt.figure(figsize=(10,10))\n", + "\n", + " plt.title('Performance of neural network solving an ODE compared to the analytical solution')\n", + " plt.plot(t, g_analytical)\n", + " plt.plot(t, g_dnn_ag[0,:])\n", + " plt.legend(['analytical','nn'])\n", + " plt.xlabel('t')\n", + " plt.ylabel('g(t)')\n", + "\n", + " ## Find an approximation to the funtion using forward Euler\n", + "\n", + " alpha, A, g0 = get_parameters()\n", + " dt = T/(Nt - 1)\n", + "\n", + " # Perform forward Euler to solve the ODE\n", + " g_euler = np.zeros(Nt)\n", + " g_euler[0] = g0\n", + "\n", + " for i in range(1,Nt):\n", + " g_euler[i] = g_euler[i-1] + dt*(alpha*g_euler[i-1]*(A - g_euler[i-1]))\n", + "\n", + " # Print the errors done by each method\n", + " diff1 = np.max(np.abs(g_euler - g_analytical))\n", + " diff2 = np.max(np.abs(g_dnn_ag[0,:] - g_analytical))\n", + "\n", + " print('Max absolute difference between Euler method and analytical: %g'%diff1)\n", + " print('Max absolute difference between deep neural network and analytical: %g'%diff2)\n", + "\n", + " # Plot results\n", + " plt.figure(figsize=(10,10))\n", + "\n", + " plt.plot(t,g_euler)\n", + " plt.plot(t,g_analytical)\n", + " plt.plot(t,g_dnn_ag[0,:])\n", + "\n", + " plt.legend(['euler','analytical','dnn'])\n", + " plt.xlabel('Time t')\n", + " plt.ylabel('g(t)')\n", + "\n", + " plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "a6e8888e", + "metadata": { + "editable": true + }, + "source": [ + "## Example: Solving the one dimensional Poisson equation\n", + "\n", + "The Poisson equation for $g(x)$ in one dimension is" + ] + }, + { + "cell_type": "markdown", + "id": "ac2720d4", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
\n", + "\n", + "$$\n", + "\\begin{equation} \\label{poisson} \\tag{13}\n", + " -g''(x) = f(x)\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "65554b02", + "metadata": { + "editable": true + }, + "source": [ + "where $f(x)$ is a given function for $x \\in (0,1)$.\n", + "\n", + "The conditions that $g(x)$ is chosen to fulfill, are" + ] + }, + { + "cell_type": "markdown", + "id": "0cdf0586", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{align*}\n", + " g(0) &= 0 \\\\\n", + " g(1) &= 0\n", + "\\end{align*}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "f7e65a6a", + "metadata": { + "editable": true + }, + "source": [ + "This equation can be solved numerically using programs where e.g Autograd and TensorFlow are used.\n", + "The results from the networks can then be compared to the analytical solution.\n", + "In addition, it could be interesting to see how a typical method for numerically solving second order ODEs compares to the neural networks." + ] + }, + { + "cell_type": "markdown", + "id": "cd827e12", + "metadata": { + "editable": true + }, + "source": [ + "## The specific equation to solve for\n", + "\n", + "Here, the function $g(x)$ to solve for follows the equation" + ] + }, + { + "cell_type": "markdown", + "id": "a6100e41", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "-g''(x) = f(x),\\qquad x \\in (0,1)\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "15c06751", + "metadata": { + "editable": true + }, + "source": [ + "where $f(x)$ is a given function, along with the chosen conditions" + ] + }, + { + "cell_type": "markdown", + "id": "b2b4dd2f", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
\n", + "\n", + "$$\n", + "\\begin{aligned}\n", + "g(0) = g(1) = 0\n", + "\\end{aligned}\\label{cond} \\tag{14}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "2133aeed", + "metadata": { + "editable": true + }, + "source": [ + "In this example, we consider the case when $f(x) = (3x + x^2)\\exp(x)$.\n", + "\n", + "For this case, a possible trial solution satisfying the conditions could be" + ] + }, + { + "cell_type": "markdown", + "id": "5baf9b4b", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "g_t(x) = x \\cdot (1-x) \\cdot N(P,x)\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "ed82aba2", + "metadata": { + "editable": true + }, + "source": [ + "The analytical solution for this problem is" + ] + }, + { + "cell_type": "markdown", + "id": "c9bce69c", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "g(x) = x(1 - x)\\exp(x)\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "ce42c4a8", + "metadata": { + "editable": true + }, + "source": [ + "## Solving the equation using Autograd" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "2fcb9045", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import autograd.numpy as np\n", + "from autograd import grad, elementwise_grad\n", + "import autograd.numpy.random as npr\n", + "from matplotlib import pyplot as plt\n", + "\n", + "def sigmoid(z):\n", + " return 1/(1 + np.exp(-z))\n", + "\n", + "def deep_neural_network(deep_params, x):\n", + " # N_hidden is the number of hidden layers\n", + " # deep_params is a list, len() should be used\n", + " N_hidden = len(deep_params) - 1 # -1 since params consists of\n", + " # parameters to all the hidden\n", + " # layers AND the output layer.\n", + "\n", + " # Assumes input x being an one-dimensional array\n", + " num_values = np.size(x)\n", + " x = x.reshape(-1, num_values)\n", + "\n", + " # Assume that the input layer does nothing to the input x\n", + " x_input = x\n", + "\n", + " # Due to multiple hidden layers, define a variable referencing to the\n", + " # output of the previous layer:\n", + " x_prev = x_input\n", + "\n", + " ## Hidden layers:\n", + "\n", + " for l in range(N_hidden):\n", + " # From the list of parameters P; find the correct weigths and bias for this layer\n", + " w_hidden = deep_params[l]\n", + "\n", + " # Add a row of ones to include bias\n", + " x_prev = np.concatenate((np.ones((1,num_values)), x_prev ), axis = 0)\n", + "\n", + " z_hidden = np.matmul(w_hidden, x_prev)\n", + " x_hidden = sigmoid(z_hidden)\n", + "\n", + " # Update x_prev such that next layer can use the output from this layer\n", + " x_prev = x_hidden\n", + "\n", + " ## Output layer:\n", + "\n", + " # Get the weights and bias for this layer\n", + " w_output = deep_params[-1]\n", + "\n", + " # Include bias:\n", + " x_prev = np.concatenate((np.ones((1,num_values)), x_prev), axis = 0)\n", + "\n", + " z_output = np.matmul(w_output, x_prev)\n", + " x_output = z_output\n", + "\n", + " return x_output\n", + "\n", + "\n", + "def solve_ode_deep_neural_network(x, num_neurons, num_iter, lmb):\n", + " # num_hidden_neurons is now a list of number of neurons within each hidden layer\n", + "\n", + " # Find the number of hidden layers:\n", + " N_hidden = np.size(num_neurons)\n", + "\n", + " ## Set up initial weigths and biases\n", + "\n", + " # Initialize the list of parameters:\n", + " P = [None]*(N_hidden + 1) # + 1 to include the output layer\n", + "\n", + " P[0] = npr.randn(num_neurons[0], 2 )\n", + " for l in 
range(1,N_hidden):\n", + " P[l] = npr.randn(num_neurons[l], num_neurons[l-1] + 1) # +1 to include bias\n", + "\n", + " # For the output layer\n", + " P[-1] = npr.randn(1, num_neurons[-1] + 1 ) # +1 since bias is included\n", + "\n", + " print('Initial cost: %g'%cost_function_deep(P, x))\n", + "\n", + " ## Start finding the optimal weigths using gradient descent\n", + "\n", + " # Find the Python function that represents the gradient of the cost function\n", + " # w.r.t the 0-th input argument -- that is the weights and biases in the hidden and output layer\n", + " cost_function_deep_grad = grad(cost_function_deep,0)\n", + "\n", + " # Let the update be done num_iter times\n", + " for i in range(num_iter):\n", + " # Evaluate the gradient at the current weights and biases in P.\n", + " # The cost_grad consist now of N_hidden + 1 arrays; the gradient w.r.t the weights and biases\n", + " # in the hidden layers and output layers evaluated at x.\n", + " cost_deep_grad = cost_function_deep_grad(P, x)\n", + "\n", + " for l in range(N_hidden+1):\n", + " P[l] = P[l] - lmb * cost_deep_grad[l]\n", + "\n", + " print('Final cost: %g'%cost_function_deep(P, x))\n", + "\n", + " return P\n", + "\n", + "## Set up the cost function specified for this Poisson equation:\n", + "\n", + "# The right side of the ODE\n", + "def f(x):\n", + " return (3*x + x**2)*np.exp(x)\n", + "\n", + "def cost_function_deep(P, x):\n", + "\n", + " # Evaluate the trial function with the current parameters P\n", + " g_t = g_trial_deep(x,P)\n", + "\n", + " # Find the derivative w.r.t x of the trial function\n", + " d2_g_t = elementwise_grad(elementwise_grad(g_trial_deep,0))(x,P)\n", + "\n", + " right_side = f(x)\n", + "\n", + " err_sqr = (-d2_g_t - right_side)**2\n", + " cost_sum = np.sum(err_sqr)\n", + "\n", + " return cost_sum/np.size(err_sqr)\n", + "\n", + "# The trial solution:\n", + "def g_trial_deep(x,P):\n", + " return x*(1-x)*deep_neural_network(P,x)\n", + "\n", + "# The analytic solution;\n", + "def g_analytic(x):\n", + " return x*(1-x)*np.exp(x)\n", + "\n", + "if __name__ == '__main__':\n", + " npr.seed(4155)\n", + "\n", + " ## Decide the vales of arguments to the function to solve\n", + " Nx = 10\n", + " x = np.linspace(0,1, Nx)\n", + "\n", + " ## Set up the initial parameters\n", + " num_hidden_neurons = [200,100]\n", + " num_iter = 1000\n", + " lmb = 1e-3\n", + "\n", + " P = solve_ode_deep_neural_network(x, num_hidden_neurons, num_iter, lmb)\n", + "\n", + " g_dnn_ag = g_trial_deep(x,P)\n", + " g_analytical = g_analytic(x)\n", + "\n", + " # Find the maximum absolute difference between the solutons:\n", + " max_diff = np.max(np.abs(g_dnn_ag - g_analytical))\n", + " print(\"The max absolute difference between the solutions is: %g\"%max_diff)\n", + "\n", + " plt.figure(figsize=(10,10))\n", + "\n", + " plt.title('Performance of neural network solving an ODE compared to the analytical solution')\n", + " plt.plot(x, g_analytical)\n", + " plt.plot(x, g_dnn_ag[0,:])\n", + " plt.legend(['analytical','nn'])\n", + " plt.xlabel('x')\n", + " plt.ylabel('g(x)')\n", + " plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "9db2e30e", + "metadata": { + "editable": true + }, + "source": [ + "## Comparing with a numerical scheme\n", + "\n", + "The Poisson equation is possible to solve using Taylor series to approximate the second derivative.\n", + "\n", + "Using Taylor series, the second derivative can be expressed as\n", + "\n", + "$$\n", + "g''(x) = \\frac{g(x + \\Delta x) - 2g(x) + g(x-\\Delta x)}{\\Delta x^2} + E_{\\Delta x}(x)\n", 
+    "$$\n",
+    "\n",
+    "where $\\Delta x$ is a small step size and $E_{\\Delta x}(x)$ is the error term.\n",
+    "\n",
+    "Neglecting the error term gives an approximation to the second derivative:"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "2cea098e",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "\n",
+    "
\n", + "\n", + "$$\n", + "\\begin{equation} \\label{approx} \\tag{15}\n", + "g''(x) \\approx \\frac{g(x + \\Delta x) - 2g(x) + g(x-\\Delta x)}{\\Delta x^2}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "4606d139", + "metadata": { + "editable": true + }, + "source": [ + "If $x_i = i \\Delta x = x_{i-1} + \\Delta x$ and $g_i = g(x_i)$ for $i = 1,\\dots N_x - 2$ with $N_x$ being the number of values for $x$, ([15](#approx)) becomes" + ] + }, + { + "cell_type": "markdown", + "id": "bf52b218", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{aligned}\n", + "g''(x_i) &\\approx \\frac{g(x_i + \\Delta x) - 2g(x_i) + g(x_i -\\Delta x)}{\\Delta x^2} \\\\\n", + "&= \\frac{g_{i+1} - 2g_i + g_{i-1}}{\\Delta x^2}\n", + "\\end{aligned}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "5649b303", + "metadata": { + "editable": true + }, + "source": [ + "Since we know from our problem that" + ] + }, + { + "cell_type": "markdown", + "id": "cabbaeeb", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{aligned}\n", + "-g''(x) &= f(x) \\\\\n", + "&= (3x + x^2)\\exp(x)\n", + "\\end{aligned}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "9116da9a", + "metadata": { + "editable": true + }, + "source": [ + "along with the conditions $g(0) = g(1) = 0$,\n", + "the following scheme can be used to find an approximate solution for $g(x)$ numerically:" + ] + }, + { + "cell_type": "markdown", + "id": "fa0313ed", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
\n", + "\n", + "$$\n", + "\\begin{equation}\n", + " \\begin{aligned}\n", + " -\\Big( \\frac{g_{i+1} - 2g_i + g_{i-1}}{\\Delta x^2} \\Big) &= f(x_i) \\\\\n", + " -g_{i+1} + 2g_i - g_{i-1} &= \\Delta x^2 f(x_i)\n", + " \\end{aligned}\n", + "\\end{equation} \\label{odesys} \\tag{16}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "d4bff256", + "metadata": { + "editable": true + }, + "source": [ + "for $i = 1, \\dots, N_x - 2$ where $g_0 = g_{N_x - 1} = 0$ and $f(x_i) = (3x_i + x_i^2)\\exp(x_i)$, which is given for our specific problem.\n", + "\n", + "The equation can be rewritten into a matrix equation:" + ] + }, + { + "cell_type": "markdown", + "id": "2817b619", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{aligned}\n", + "\\begin{pmatrix}\n", + "2 & -1 & 0 & \\dots & 0 \\\\\n", + "-1 & 2 & -1 & \\dots & 0 \\\\\n", + "\\vdots & & \\ddots & & \\vdots \\\\\n", + "0 & \\dots & -1 & 2 & -1 \\\\\n", + "0 & \\dots & 0 & -1 & 2\\\\\n", + "\\end{pmatrix}\n", + "\\begin{pmatrix}\n", + "g_1 \\\\\n", + "g_2 \\\\\n", + "\\vdots \\\\\n", + "g_{N_x - 3} \\\\\n", + "g_{N_x - 2}\n", + "\\end{pmatrix}\n", + "&=\n", + "\\Delta x^2\n", + "\\begin{pmatrix}\n", + "f(x_1) \\\\\n", + "f(x_2) \\\\\n", + "\\vdots \\\\\n", + "f(x_{N_x - 3}) \\\\\n", + "f(x_{N_x - 2})\n", + "\\end{pmatrix} \\\\\n", + "\\boldsymbol{A}\\boldsymbol{g} &= \\boldsymbol{f},\n", + "\\end{aligned}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "5130b233", + "metadata": { + "editable": true + }, + "source": [ + "which makes it possible to solve for the vector $\\boldsymbol{g}$." + ] + }, + { + "cell_type": "markdown", + "id": "18a4fdda", + "metadata": { + "editable": true + }, + "source": [ + "## Setting up the code\n", + "\n", + "We can then compare the result from this numerical scheme with the output from our network using Autograd:" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "3cff184d", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import autograd.numpy as np\n", + "from autograd import grad, elementwise_grad\n", + "import autograd.numpy.random as npr\n", + "from matplotlib import pyplot as plt\n", + "\n", + "def sigmoid(z):\n", + " return 1/(1 + np.exp(-z))\n", + "\n", + "def deep_neural_network(deep_params, x):\n", + " # N_hidden is the number of hidden layers\n", + " # deep_params is a list, len() should be used\n", + " N_hidden = len(deep_params) - 1 # -1 since params consists of\n", + " # parameters to all the hidden\n", + " # layers AND the output layer.\n", + "\n", + " # Assumes input x being an one-dimensional array\n", + " num_values = np.size(x)\n", + " x = x.reshape(-1, num_values)\n", + "\n", + " # Assume that the input layer does nothing to the input x\n", + " x_input = x\n", + "\n", + " # Due to multiple hidden layers, define a variable referencing to the\n", + " # output of the previous layer:\n", + " x_prev = x_input\n", + "\n", + " ## Hidden layers:\n", + "\n", + " for l in range(N_hidden):\n", + " # From the list of parameters P; find the correct weigths and bias for this layer\n", + " w_hidden = deep_params[l]\n", + "\n", + " # Add a row of ones to include bias\n", + " x_prev = np.concatenate((np.ones((1,num_values)), x_prev ), axis = 0)\n", + "\n", + " z_hidden = np.matmul(w_hidden, x_prev)\n", + " x_hidden = sigmoid(z_hidden)\n", + "\n", + " # Update x_prev such that next layer can use the output from this layer\n", + " x_prev = x_hidden\n", + "\n", + " ## Output layer:\n", + "\n", + " # Get 
the weights and bias for this layer\n", + " w_output = deep_params[-1]\n", + "\n", + " # Include bias:\n", + " x_prev = np.concatenate((np.ones((1,num_values)), x_prev), axis = 0)\n", + "\n", + " z_output = np.matmul(w_output, x_prev)\n", + " x_output = z_output\n", + "\n", + " return x_output\n", + "\n", + "\n", + "def solve_ode_deep_neural_network(x, num_neurons, num_iter, lmb):\n", + " # num_hidden_neurons is now a list of number of neurons within each hidden layer\n", + "\n", + " # Find the number of hidden layers:\n", + " N_hidden = np.size(num_neurons)\n", + "\n", + " ## Set up initial weigths and biases\n", + "\n", + " # Initialize the list of parameters:\n", + " P = [None]*(N_hidden + 1) # + 1 to include the output layer\n", + "\n", + " P[0] = npr.randn(num_neurons[0], 2 )\n", + " for l in range(1,N_hidden):\n", + " P[l] = npr.randn(num_neurons[l], num_neurons[l-1] + 1) # +1 to include bias\n", + "\n", + " # For the output layer\n", + " P[-1] = npr.randn(1, num_neurons[-1] + 1 ) # +1 since bias is included\n", + "\n", + " print('Initial cost: %g'%cost_function_deep(P, x))\n", + "\n", + " ## Start finding the optimal weigths using gradient descent\n", + "\n", + " # Find the Python function that represents the gradient of the cost function\n", + " # w.r.t the 0-th input argument -- that is the weights and biases in the hidden and output layer\n", + " cost_function_deep_grad = grad(cost_function_deep,0)\n", + "\n", + " # Let the update be done num_iter times\n", + " for i in range(num_iter):\n", + " # Evaluate the gradient at the current weights and biases in P.\n", + " # The cost_grad consist now of N_hidden + 1 arrays; the gradient w.r.t the weights and biases\n", + " # in the hidden layers and output layers evaluated at x.\n", + " cost_deep_grad = cost_function_deep_grad(P, x)\n", + "\n", + " for l in range(N_hidden+1):\n", + " P[l] = P[l] - lmb * cost_deep_grad[l]\n", + "\n", + " print('Final cost: %g'%cost_function_deep(P, x))\n", + "\n", + " return P\n", + "\n", + "## Set up the cost function specified for this Poisson equation:\n", + "\n", + "# The right side of the ODE\n", + "def f(x):\n", + " return (3*x + x**2)*np.exp(x)\n", + "\n", + "def cost_function_deep(P, x):\n", + "\n", + " # Evaluate the trial function with the current parameters P\n", + " g_t = g_trial_deep(x,P)\n", + "\n", + " # Find the derivative w.r.t x of the trial function\n", + " d2_g_t = elementwise_grad(elementwise_grad(g_trial_deep,0))(x,P)\n", + "\n", + " right_side = f(x)\n", + "\n", + " err_sqr = (-d2_g_t - right_side)**2\n", + " cost_sum = np.sum(err_sqr)\n", + "\n", + " return cost_sum/np.size(err_sqr)\n", + "\n", + "# The trial solution:\n", + "def g_trial_deep(x,P):\n", + " return x*(1-x)*deep_neural_network(P,x)\n", + "\n", + "# The analytic solution;\n", + "def g_analytic(x):\n", + " return x*(1-x)*np.exp(x)\n", + "\n", + "if __name__ == '__main__':\n", + " npr.seed(4155)\n", + "\n", + " ## Decide the vales of arguments to the function to solve\n", + " Nx = 10\n", + " x = np.linspace(0,1, Nx)\n", + "\n", + " ## Set up the initial parameters\n", + " num_hidden_neurons = [200,100]\n", + " num_iter = 1000\n", + " lmb = 1e-3\n", + "\n", + " P = solve_ode_deep_neural_network(x, num_hidden_neurons, num_iter, lmb)\n", + "\n", + " g_dnn_ag = g_trial_deep(x,P)\n", + " g_analytical = g_analytic(x)\n", + "\n", + " # Find the maximum absolute difference between the solutons:\n", + "\n", + " plt.figure(figsize=(10,10))\n", + "\n", + " plt.title('Performance of neural network solving an ODE compared to the 
analytical solution')\n", + " plt.plot(x, g_analytical)\n", + " plt.plot(x, g_dnn_ag[0,:])\n", + " plt.legend(['analytical','nn'])\n", + " plt.xlabel('x')\n", + " plt.ylabel('g(x)')\n", + "\n", + " ## Perform the computation using the numerical scheme\n", + "\n", + " dx = 1/(Nx - 1)\n", + "\n", + " # Set up the matrix A\n", + " A = np.zeros((Nx-2,Nx-2))\n", + "\n", + " A[0,0] = 2\n", + " A[0,1] = -1\n", + "\n", + " for i in range(1,Nx-3):\n", + " A[i,i-1] = -1\n", + " A[i,i] = 2\n", + " A[i,i+1] = -1\n", + "\n", + " A[Nx - 3, Nx - 4] = -1\n", + " A[Nx - 3, Nx - 3] = 2\n", + "\n", + " # Set up the vector f\n", + " f_vec = dx**2 * f(x[1:-1])\n", + "\n", + " # Solve the equation\n", + " g_res = np.linalg.solve(A,f_vec)\n", + "\n", + " g_vec = np.zeros(Nx)\n", + " g_vec[1:-1] = g_res\n", + "\n", + " # Print the differences between each method\n", + " max_diff1 = np.max(np.abs(g_dnn_ag - g_analytical))\n", + " max_diff2 = np.max(np.abs(g_vec - g_analytical))\n", + " print(\"The max absolute difference between the analytical solution and DNN Autograd: %g\"%max_diff1)\n", + " print(\"The max absolute difference between the analytical solution and numerical scheme: %g\"%max_diff2)\n", + "\n", + " # Plot the results\n", + " plt.figure(figsize=(10,10))\n", + "\n", + " plt.plot(x,g_vec)\n", + " plt.plot(x,g_analytical)\n", + " plt.plot(x,g_dnn_ag[0,:])\n", + "\n", + " plt.legend(['numerical scheme','analytical','dnn'])\n", + " plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "89115be0", + "metadata": { + "editable": true + }, + "source": [ + "## Partial Differential Equations\n", + "\n", + "A partial differential equation (PDE) has a solution here the function\n", + "is defined by multiple variables. The equation may involve all kinds\n", + "of combinations of which variables the function is differentiated with\n", + "respect to.\n", + "\n", + "In general, a partial differential equation for a function $g(x_1,\\dots,x_N)$ with $N$ variables may be expressed as" + ] + }, + { + "cell_type": "markdown", + "id": "c43a6341", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
\n", + "\n", + "$$\n", + "\\begin{equation} \\label{PDE} \\tag{17}\n", + " f\\left(x_1, \\, \\dots \\, , x_N, \\frac{\\partial g(x_1,\\dots,x_N) }{\\partial x_1}, \\dots , \\frac{\\partial g(x_1,\\dots,x_N) }{\\partial x_N}, \\frac{\\partial g(x_1,\\dots,x_N) }{\\partial x_1\\partial x_2}, \\, \\dots \\, , \\frac{\\partial^n g(x_1,\\dots,x_N) }{\\partial x_N^n} \\right) = 0\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "218a7a68", + "metadata": { + "editable": true + }, + "source": [ + "where $f$ is an expression involving all kinds of possible mixed derivatives of $g(x_1,\\dots,x_N)$ up to an order $n$. In order for the solution to be unique, some additional conditions must also be given." + ] + }, + { + "cell_type": "markdown", + "id": "902f8f61", + "metadata": { + "editable": true + }, + "source": [ + "## Type of problem\n", + "\n", + "The problem our network must solve for, is similar to the ODE case.\n", + "We must have a trial solution $g_t$ at hand.\n", + "\n", + "For instance, the trial solution could be expressed as" + ] + }, + { + "cell_type": "markdown", + "id": "1c2bbcbd", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{align*}\n", + " g_t(x_1,\\dots,x_N) = h_1(x_1,\\dots,x_N) + h_2(x_1,\\dots,x_N,N(x_1,\\dots,x_N,P))\n", + "\\end{align*}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "73f5bf7b", + "metadata": { + "editable": true + }, + "source": [ + "where $h_1(x_1,\\dots,x_N)$ is a function that ensures $g_t(x_1,\\dots,x_N)$ satisfies some given conditions.\n", + "The neural network $N(x_1,\\dots,x_N,P)$ has weights and biases described by $P$ and $h_2(x_1,\\dots,x_N,N(x_1,\\dots,x_N,P))$ is an expression using the output from the neural network in some way.\n", + "\n", + "The role of the function $h_2(x_1,\\dots,x_N,N(x_1,\\dots,x_N,P))$, is to ensure that the output of $N(x_1,\\dots,x_N,P)$ is zero when $g_t(x_1,\\dots,x_N)$ is evaluated at the values of $x_1,\\dots,x_N$ where the given conditions must be satisfied. The function $h_1(x_1,\\dots,x_N)$ should alone make $g_t(x_1,\\dots,x_N)$ satisfy the conditions." + ] + }, + { + "cell_type": "markdown", + "id": "dbb4ece5", + "metadata": { + "editable": true + }, + "source": [ + "## Network requirements\n", + "\n", + "The network tries then the minimize the cost function following the\n", + "same ideas as described for the ODE case, but now with more than one\n", + "variables to consider. The concept still remains the same; find a set\n", + "of parameters $P$ such that the expression $f$ in ([17](#PDE)) is as\n", + "close to zero as possible.\n", + "\n", + "As for the ODE case, the cost function is the mean squared error that\n", + "the network must try to minimize. 
The cost function for the network to\n", + "minimize is" + ] + }, + { + "cell_type": "markdown", + "id": "d01d3943", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C\\left(x_1, \\dots, x_N, P\\right) = \\left( f\\left(x_1, \\, \\dots \\, , x_N, \\frac{\\partial g(x_1,\\dots,x_N) }{\\partial x_1}, \\dots , \\frac{\\partial g(x_1,\\dots,x_N) }{\\partial x_N}, \\frac{\\partial g(x_1,\\dots,x_N) }{\\partial x_1\\partial x_2}, \\, \\dots \\, , \\frac{\\partial^n g(x_1,\\dots,x_N) }{\\partial x_N^n} \\right) \\right)^2\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "6514db22", + "metadata": { + "editable": true + }, + "source": [ + "## More details\n", + "\n", + "If we let $\\boldsymbol{x} = \\big( x_1, \\dots, x_N \\big)$ be an array containing the values for $x_1, \\dots, x_N$ respectively, the cost function can be reformulated into the following:" + ] + }, + { + "cell_type": "markdown", + "id": "5a0ed10c", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C\\left(\\boldsymbol{x}, P\\right) = f\\left( \\left( \\boldsymbol{x}, \\frac{\\partial g(\\boldsymbol{x}) }{\\partial x_1}, \\dots , \\frac{\\partial g(\\boldsymbol{x}) }{\\partial x_N}, \\frac{\\partial g(\\boldsymbol{x}) }{\\partial x_1\\partial x_2}, \\, \\dots \\, , \\frac{\\partial^n g(\\boldsymbol{x}) }{\\partial x_N^n} \\right) \\right)^2\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "200fc78c", + "metadata": { + "editable": true + }, + "source": [ + "If we also have $M$ different sets of values for $x_1, \\dots, x_N$, that is $\\boldsymbol{x}_i = \\big(x_1^{(i)}, \\dots, x_N^{(i)}\\big)$ for $i = 1,\\dots,M$ being the rows in matrix $X$, the cost function can be generalized into" + ] + }, + { + "cell_type": "markdown", + "id": "0c87647d", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C\\left(X, P \\right) = \\sum_{i=1}^M f\\left( \\left( \\boldsymbol{x}_i, \\frac{\\partial g(\\boldsymbol{x}_i) }{\\partial x_1}, \\dots , \\frac{\\partial g(\\boldsymbol{x}_i) }{\\partial x_N}, \\frac{\\partial g(\\boldsymbol{x}_i) }{\\partial x_1\\partial x_2}, \\, \\dots \\, , \\frac{\\partial^n g(\\boldsymbol{x}_i) }{\\partial x_N^n} \\right) \\right)^2.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "6484a267", + "metadata": { + "editable": true + }, + "source": [ + "## Example: The diffusion equation\n", + "\n", + "In one spatial dimension, the equation reads" + ] + }, + { + "cell_type": "markdown", + "id": "2c2a2467", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial g(x,t)}{\\partial t} = \\frac{\\partial^2 g(x,t)}{\\partial x^2}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "6df58357", + "metadata": { + "editable": true + }, + "source": [ + "where a possible choice of conditions are" + ] + }, + { + "cell_type": "markdown", + "id": "13d9c7f6", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{align*}\n", + "g(0,t) &= 0 ,\\qquad t \\geq 0 \\\\\n", + "g(1,t) &= 0, \\qquad t \\geq 0 \\\\\n", + "g(x,0) &= u(x),\\qquad x\\in [0,1]\n", + "\\end{align*}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "627708ec", + "metadata": { + "editable": true + }, + "source": [ + "with $u(x)$ being some given function." 
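To make the connection to the general cost function above concrete (this is only a restatement for the present example, with the grid sizes $N_x$ and $N_t$ as notation of our choosing): for the diffusion equation the expression $f$ inside the square is simply the residual $\frac{\partial g_t}{\partial t} - \frac{\partial^2 g_t}{\partial x^2}$, evaluated at chosen grid points $(x_i, t_j)$. The network is therefore trained to make

$$
C\left(\{x_i, t_j\}, P\right) = \sum_{i=1}^{N_x}\sum_{j=1}^{N_t}\left(\frac{\partial g_t(x_i,t_j)}{\partial t} - \frac{\partial^2 g_t(x_i,t_j)}{\partial x^2}\right)^2
$$

as small as possible; dividing by the number of grid points, as the implementation further below does, turns this into a mean squared residual and does not change the minimizer.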
+ ] + }, + { + "cell_type": "markdown", + "id": "43cdd945", + "metadata": { + "editable": true + }, + "source": [ + "## Defining the problem\n", + "\n", + "For this case, we want to find $g(x,t)$ such that" + ] + }, + { + "cell_type": "markdown", + "id": "ccdcb67e", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
\n", + "\n", + "$$\n", + "\\begin{equation}\n", + " \\frac{\\partial g(x,t)}{\\partial t} = \\frac{\\partial^2 g(x,t)}{\\partial x^2}\n", + "\\end{equation} \\label{diffonedim} \\tag{18}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "ebe711f8", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "2174f30f", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{align*}\n", + "g(0,t) &= 0 ,\\qquad t \\geq 0 \\\\\n", + "g(1,t) &= 0, \\qquad t \\geq 0 \\\\\n", + "g(x,0) &= u(x),\\qquad x\\in [0,1]\n", + "\\end{align*}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "083ed2ff", + "metadata": { + "editable": true + }, + "source": [ + "with $u(x) = \\sin(\\pi x)$.\n", + "\n", + "First, let us set up the deep neural network.\n", + "The deep neural network will follow the same structure as discussed in the examples solving the ODEs.\n", + "First, we will look into how Autograd could be used in a network tailored to solve for bivariate functions." + ] + }, + { + "cell_type": "markdown", + "id": "cf5e3f46", + "metadata": { + "editable": true + }, + "source": [ + "## Setting up the network using Autograd\n", + "\n", + "The only change to do here, is to extend our network such that\n", + "functions of multiple parameters are correctly handled. In this case\n", + "we have two variables in our function to solve for, that is time $t$\n", + "and position $x$. The variables will be represented by a\n", + "one-dimensional array in the program. The program will evaluate the\n", + "network at each possible pair $(x,t)$, given an array for the desired\n", + "$x$-values and $t$-values to approximate the solution at." + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "4fee106b", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "def sigmoid(z):\n", + " return 1/(1 + np.exp(-z))\n", + "\n", + "def deep_neural_network(deep_params, x):\n", + " # x is now a point and a 1D numpy array; make it a column vector\n", + " num_coordinates = np.size(x,0)\n", + " x = x.reshape(num_coordinates,-1)\n", + "\n", + " num_points = np.size(x,1)\n", + "\n", + " # N_hidden is the number of hidden layers\n", + " N_hidden = len(deep_params) - 1 # -1 since params consist of parameters to all the hidden layers AND the output layer\n", + "\n", + " # Assume that the input layer does nothing to the input x\n", + " x_input = x\n", + " x_prev = x_input\n", + "\n", + " ## Hidden layers:\n", + "\n", + " for l in range(N_hidden):\n", + " # From the list of parameters P; find the correct weigths and bias for this layer\n", + " w_hidden = deep_params[l]\n", + "\n", + " # Add a row of ones to include bias\n", + " x_prev = np.concatenate((np.ones((1,num_points)), x_prev ), axis = 0)\n", + "\n", + " z_hidden = np.matmul(w_hidden, x_prev)\n", + " x_hidden = sigmoid(z_hidden)\n", + "\n", + " # Update x_prev such that next layer can use the output from this layer\n", + " x_prev = x_hidden\n", + "\n", + " ## Output layer:\n", + "\n", + " # Get the weights and bias for this layer\n", + " w_output = deep_params[-1]\n", + "\n", + " # Include bias:\n", + " x_prev = np.concatenate((np.ones((1,num_points)), x_prev), axis = 0)\n", + "\n", + " z_output = np.matmul(w_output, x_prev)\n", + " x_output = z_output\n", + "\n", + " return x_output[0][0]" + ] + }, + { + "cell_type": "markdown", + "id": "63e5fb7e", + "metadata": { + "editable": true + }, + "source": [ + "## Setting up the network using 
Autograd; The trial solution\n",
+    "\n",
+    "The cost function must then iterate through the given arrays\n",
+    "containing values for $x$ and $t$, define the point $(x,t)$ at which the deep\n",
+    "neural network and the trial solution are evaluated, and then find\n",
+    "the Jacobian of the trial solution.\n",
+    "\n",
+    "A possible trial solution for this PDE is\n",
+    "\n",
+    "$$\n",
+    "g_t(x,t) = h_1(x,t) + x(1-x)tN(x,t,P)\n",
+    "$$\n",
+    "\n",
+    "with $h_1(x,t)$ being a function ensuring that $g_t(x,t)$ satisfies our given conditions, and $N(x,t,P)$ being the output from the deep neural network using weights and biases for each layer from $P$.\n",
+    "\n",
+    "To fulfill the conditions, $h_1(x,t)$ could be:\n",
+    "\n",
+    "$$\n",
+    "h_1(x,t) = (1-t)\\Big(u(x) - \\big((1-x)u(0) + x u(1)\\big)\\Big) = (1-t)u(x) = (1-t)\\sin(\\pi x)\n",
+    "$$\n",
+    "since $u(0) = u(1) = 0$ and $u(x) = \\sin(\\pi x)$."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "50cfea81",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "## Why the Jacobian?\n",
+    "\n",
+    "The Jacobian is used because the program must find the derivative of\n",
+    "the trial solution with respect to both $x$ and $t$.\n",
+    "\n",
+    "This requires computing the Jacobian matrix, as we want\n",
+    "to evaluate the gradient with respect to $x$ and $t$ (note that the\n",
+    "Jacobian of a scalar-valued multivariate function is simply its\n",
+    "gradient).\n",
+    "\n",
+    "In Autograd, the differentiation is by default done with respect to\n",
+    "the first input argument of your Python function. Since the point is\n",
+    "an array representing $x$ and $t$, the Jacobian is calculated using\n",
+    "the values of $x$ and $t$.\n",
+    "\n",
+    "To find the second derivative with respect to $x$ and $t$, the\n",
+    "Jacobian can be computed a second time. The result is a Hessian\n",
+    "matrix, which is the matrix containing all the possible second order\n",
+    "mixed derivatives of $g(x,t)$."
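As a minimal, self-contained illustration of how the Jacobian and Hessian returned by Autograd are indexed (the function `h` and the evaluation point below are made up purely for this demonstration and are not part of the PDE solver), consider a scalar function of a point $(x,t)$:

```python
import autograd.numpy as np
from autograd import jacobian, hessian

def h(point):
    # A simple scalar function of a point (x, t), chosen only for illustration
    x, t = point
    return x**2*t + np.sin(t)

point = np.array([1.0, 2.0])

jac = jacobian(h)(point)   # the gradient: [dh/dx, dh/dt]
hes = hessian(h)(point)    # the matrix of all second derivatives

print(jac[0], 2*point[0]*point[1])             # dh/dx     = 2xt
print(jac[1], point[0]**2 + np.cos(point[1]))  # dh/dt     = x^2 + cos(t)
print(hes[0][0], 2*point[1])                   # d2h/dx2   = 2t
print(hes[0][1], 2*point[0])                   # d2h/dxdt  = 2x
print(hes[1][1], -np.sin(point[1]))            # d2h/dt2   = -sin(t)
```

This is exactly the indexing used in the cost function below: `g_t_jacobian[1]` picks out $\partial g_t/\partial t$ and `g_t_hessian[0][0]` picks out $\partial^2 g_t/\partial x^2$.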
+ ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "309808f6", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# Set up the trial function:\n", + "def u(x):\n", + " return np.sin(np.pi*x)\n", + "\n", + "def g_trial(point,P):\n", + " x,t = point\n", + " return (1-t)*u(x) + x*(1-x)*t*deep_neural_network(P,point)\n", + "\n", + "# The right side of the ODE:\n", + "def f(point):\n", + " return 0.\n", + "\n", + "# The cost function:\n", + "def cost_function(P, x, t):\n", + " cost_sum = 0\n", + "\n", + " g_t_jacobian_func = jacobian(g_trial)\n", + " g_t_hessian_func = hessian(g_trial)\n", + "\n", + " for x_ in x:\n", + " for t_ in t:\n", + " point = np.array([x_,t_])\n", + "\n", + " g_t = g_trial(point,P)\n", + " g_t_jacobian = g_t_jacobian_func(point,P)\n", + " g_t_hessian = g_t_hessian_func(point,P)\n", + "\n", + " g_t_dt = g_t_jacobian[1]\n", + " g_t_d2x = g_t_hessian[0][0]\n", + "\n", + " func = f(point)\n", + "\n", + " err_sqr = ( (g_t_dt - g_t_d2x) - func)**2\n", + " cost_sum += err_sqr\n", + "\n", + " return cost_sum" + ] + }, + { + "cell_type": "markdown", + "id": "9880d94c", + "metadata": { + "editable": true + }, + "source": [ + "## Setting up the network using Autograd; The full program\n", + "\n", + "Having set up the network, along with the trial solution and cost function, we can now see how the deep neural network performs by comparing the results to the analytical solution.\n", + "\n", + "The analytical solution of our problem is\n", + "\n", + "$$\n", + "g(x,t) = \\exp(-\\pi^2 t)\\sin(\\pi x)\n", + "$$\n", + "\n", + "A possible way to implement a neural network solving the PDE, is given below.\n", + "Be aware, though, that it is fairly slow for the parameters used.\n", + "A better result is possible, but requires more iterations, and thus longer time to complete.\n", + "\n", + "Indeed, the program below is not optimal in its implementation, but rather serves as an example on how to implement and use a neural network to solve a PDE.\n", + "Using TensorFlow results in a much better execution time. Try it!" 
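Before running the full program, a quick sanity check can be useful: by construction, the trial solution should satisfy the boundary and initial conditions for any value of the network parameters, trained or not. The short sketch below assumes that the cells above defining `u`, the point-version `deep_neural_network` and `g_trial` have been executed; the parameter shapes, random seed and tolerance are illustrative choices only.

```python
import autograd.numpy as np
import autograd.numpy.random as npr

npr.seed(0)
# Untrained, random parameters: 2 inputs (x and t) + 1 bias, one hidden layer with 10 neurons
P_random = [npr.randn(10, 2 + 1), npr.randn(1, 10 + 1)]

# Boundary conditions: g_trial(0, t) = g_trial(1, t) = 0 for all t
for t_ in [0.0, 0.3, 0.7, 1.0]:
    assert abs(g_trial(np.array([0.0, t_]), P_random)) < 1e-12
    assert abs(g_trial(np.array([1.0, t_]), P_random)) < 1e-12

# Initial condition: g_trial(x, 0) = u(x) = sin(pi*x) for all x
for x_ in [0.0, 0.25, 0.5, 0.75, 1.0]:
    assert abs(g_trial(np.array([x_, 0.0]), P_random) - np.sin(np.pi*x_)) < 1e-12

print('The trial solution satisfies the boundary and initial conditions for random parameters')
```

Since the conditions are built into $g_t(x,t)$ itself, the training only has to drive the PDE residual towards zero, which is what makes this construction convenient.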
+ ] + }, + { + "cell_type": "code", + "execution_count": 9, + "id": "fcd284e3", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import autograd.numpy as np\n", + "from autograd import jacobian,hessian,grad\n", + "import autograd.numpy.random as npr\n", + "from matplotlib import cm\n", + "from matplotlib import pyplot as plt\n", + "from mpl_toolkits.mplot3d import axes3d\n", + "\n", + "## Set up the network\n", + "\n", + "def sigmoid(z):\n", + " return 1/(1 + np.exp(-z))\n", + "\n", + "def deep_neural_network(deep_params, x):\n", + " # x is now a point and a 1D numpy array; make it a column vector\n", + " num_coordinates = np.size(x,0)\n", + " x = x.reshape(num_coordinates,-1)\n", + "\n", + " num_points = np.size(x,1)\n", + "\n", + " # N_hidden is the number of hidden layers\n", + " N_hidden = len(deep_params) - 1 # -1 since params consist of parameters to all the hidden layers AND the output layer\n", + "\n", + " # Assume that the input layer does nothing to the input x\n", + " x_input = x\n", + " x_prev = x_input\n", + "\n", + " ## Hidden layers:\n", + "\n", + " for l in range(N_hidden):\n", + " # From the list of parameters P; find the correct weigths and bias for this layer\n", + " w_hidden = deep_params[l]\n", + "\n", + " # Add a row of ones to include bias\n", + " x_prev = np.concatenate((np.ones((1,num_points)), x_prev ), axis = 0)\n", + "\n", + " z_hidden = np.matmul(w_hidden, x_prev)\n", + " x_hidden = sigmoid(z_hidden)\n", + "\n", + " # Update x_prev such that next layer can use the output from this layer\n", + " x_prev = x_hidden\n", + "\n", + " ## Output layer:\n", + "\n", + " # Get the weights and bias for this layer\n", + " w_output = deep_params[-1]\n", + "\n", + " # Include bias:\n", + " x_prev = np.concatenate((np.ones((1,num_points)), x_prev), axis = 0)\n", + "\n", + " z_output = np.matmul(w_output, x_prev)\n", + " x_output = z_output\n", + "\n", + " return x_output[0][0]\n", + "\n", + "## Define the trial solution and cost function\n", + "def u(x):\n", + " return np.sin(np.pi*x)\n", + "\n", + "def g_trial(point,P):\n", + " x,t = point\n", + " return (1-t)*u(x) + x*(1-x)*t*deep_neural_network(P,point)\n", + "\n", + "# The right side of the ODE:\n", + "def f(point):\n", + " return 0.\n", + "\n", + "# The cost function:\n", + "def cost_function(P, x, t):\n", + " cost_sum = 0\n", + "\n", + " g_t_jacobian_func = jacobian(g_trial)\n", + " g_t_hessian_func = hessian(g_trial)\n", + "\n", + " for x_ in x:\n", + " for t_ in t:\n", + " point = np.array([x_,t_])\n", + "\n", + " g_t = g_trial(point,P)\n", + " g_t_jacobian = g_t_jacobian_func(point,P)\n", + " g_t_hessian = g_t_hessian_func(point,P)\n", + "\n", + " g_t_dt = g_t_jacobian[1]\n", + " g_t_d2x = g_t_hessian[0][0]\n", + "\n", + " func = f(point)\n", + "\n", + " err_sqr = ( (g_t_dt - g_t_d2x) - func)**2\n", + " cost_sum += err_sqr\n", + "\n", + " return cost_sum /( np.size(x)*np.size(t) )\n", + "\n", + "## For comparison, define the analytical solution\n", + "def g_analytic(point):\n", + " x,t = point\n", + " return np.exp(-np.pi**2*t)*np.sin(np.pi*x)\n", + "\n", + "## Set up a function for training the network to solve for the equation\n", + "def solve_pde_deep_neural_network(x,t, num_neurons, num_iter, lmb):\n", + " ## Set up initial weigths and biases\n", + " N_hidden = np.size(num_neurons)\n", + "\n", + " ## Set up initial weigths and biases\n", + "\n", + " # Initialize the list of parameters:\n", + " P = [None]*(N_hidden + 1) # + 1 to include the output layer\n", + "\n", + " 
P[0] = npr.randn(num_neurons[0], 2 + 1 ) # 2 since we have two points, +1 to include bias\n", + " for l in range(1,N_hidden):\n", + " P[l] = npr.randn(num_neurons[l], num_neurons[l-1] + 1) # +1 to include bias\n", + "\n", + " # For the output layer\n", + " P[-1] = npr.randn(1, num_neurons[-1] + 1 ) # +1 since bias is included\n", + "\n", + " print('Initial cost: ',cost_function(P, x, t))\n", + "\n", + " cost_function_grad = grad(cost_function,0)\n", + "\n", + " # Let the update be done num_iter times\n", + " for i in range(num_iter):\n", + " cost_grad = cost_function_grad(P, x , t)\n", + "\n", + " for l in range(N_hidden+1):\n", + " P[l] = P[l] - lmb * cost_grad[l]\n", + "\n", + " print('Final cost: ',cost_function(P, x, t))\n", + "\n", + " return P\n", + "\n", + "if __name__ == '__main__':\n", + " ### Use the neural network:\n", + " npr.seed(15)\n", + "\n", + " ## Decide the vales of arguments to the function to solve\n", + " Nx = 10; Nt = 10\n", + " x = np.linspace(0, 1, Nx)\n", + " t = np.linspace(0,1,Nt)\n", + "\n", + " ## Set up the parameters for the network\n", + " num_hidden_neurons = [100, 25]\n", + " num_iter = 250\n", + " lmb = 0.01\n", + "\n", + " P = solve_pde_deep_neural_network(x,t, num_hidden_neurons, num_iter, lmb)\n", + "\n", + " ## Store the results\n", + " g_dnn_ag = np.zeros((Nx, Nt))\n", + " G_analytical = np.zeros((Nx, Nt))\n", + " for i,x_ in enumerate(x):\n", + " for j, t_ in enumerate(t):\n", + " point = np.array([x_, t_])\n", + " g_dnn_ag[i,j] = g_trial(point,P)\n", + "\n", + " G_analytical[i,j] = g_analytic(point)\n", + "\n", + " # Find the map difference between the analytical and the computed solution\n", + " diff_ag = np.abs(g_dnn_ag - G_analytical)\n", + " print('Max absolute difference between the analytical solution and the network: %g'%np.max(diff_ag))\n", + "\n", + " ## Plot the solutions in two dimensions, that being in position and time\n", + "\n", + " T,X = np.meshgrid(t,x)\n", + "\n", + " fig = plt.figure(figsize=(10,10))\n", + " ax = fig.add_suplot(projection='3d')\n", + " ax.set_title('Solution from the deep neural network w/ %d layer'%len(num_hidden_neurons))\n", + " s = ax.plot_surface(T,X,g_dnn_ag,linewidth=0,antialiased=False,cmap=cm.viridis)\n", + " ax.set_xlabel('Time $t$')\n", + " ax.set_ylabel('Position $x$');\n", + "\n", + "\n", + " fig = plt.figure(figsize=(10,10))\n", + " ax = fig.add_suplot(projection='3d')\n", + " ax.set_title('Analytical solution')\n", + " s = ax.plot_surface(T,X,G_analytical,linewidth=0,antialiased=False,cmap=cm.viridis)\n", + " ax.set_xlabel('Time $t$')\n", + " ax.set_ylabel('Position $x$');\n", + "\n", + " fig = plt.figure(figsize=(10,10))\n", + " ax = fig.add_suplot(projection='3d')\n", + " ax.set_title('Difference')\n", + " s = ax.plot_surface(T,X,diff_ag,linewidth=0,antialiased=False,cmap=cm.viridis)\n", + " ax.set_xlabel('Time $t$')\n", + " ax.set_ylabel('Position $x$');\n", + "\n", + " ## Take some slices of the 3D plots just to see the solutions at particular times\n", + " indx1 = 0\n", + " indx2 = int(Nt/2)\n", + " indx3 = Nt-1\n", + "\n", + " t1 = t[indx1]\n", + " t2 = t[indx2]\n", + " t3 = t[indx3]\n", + "\n", + " # Slice the results from the DNN\n", + " res1 = g_dnn_ag[:,indx1]\n", + " res2 = g_dnn_ag[:,indx2]\n", + " res3 = g_dnn_ag[:,indx3]\n", + "\n", + " # Slice the analytical results\n", + " res_analytical1 = G_analytical[:,indx1]\n", + " res_analytical2 = G_analytical[:,indx2]\n", + " res_analytical3 = G_analytical[:,indx3]\n", + "\n", + " # Plot the slices\n", + " plt.figure(figsize=(10,10))\n", + 
" plt.title(\"Computed solutions at time = %g\"%t1)\n", + " plt.plot(x, res1)\n", + " plt.plot(x,res_analytical1)\n", + " plt.legend(['dnn','analytical'])\n", + "\n", + " plt.figure(figsize=(10,10))\n", + " plt.title(\"Computed solutions at time = %g\"%t2)\n", + " plt.plot(x, res2)\n", + " plt.plot(x,res_analytical2)\n", + " plt.legend(['dnn','analytical'])\n", + "\n", + " plt.figure(figsize=(10,10))\n", + " plt.title(\"Computed solutions at time = %g\"%t3)\n", + " plt.plot(x, res3)\n", + " plt.plot(x,res_analytical3)\n", + " plt.legend(['dnn','analytical'])\n", + "\n", + " plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "51ff4964", + "metadata": { + "editable": true + }, + "source": [ + "## Resources on differential equations and deep learning\n", + "\n", + "1. [Artificial neural networks for solving ordinary and partial differential equations by I.E. Lagaris et al](https://pdfs.semanticscholar.org/d061/df393e0e8fbfd0ea24976458b7d42419040d.pdf)\n", + "\n", + "2. [Neural networks for solving differential equations by A. Honchar](https://becominghuman.ai/neural-networks-for-solving-differential-equations-fa230ac5e04c)\n", + "\n", + "3. [Solving differential equations using neural networks by M.M Chiaramonte and M. Kiener](http://cs229.stanford.edu/proj2013/ChiaramonteKiener-SolvingDifferentialEquationsUsingNeuralNetworks.pdf)\n", + "\n", + "4. [Introduction to Partial Differential Equations by A. Tveito, R. Winther](https://www.springer.com/us/book/9783540225515)" + ] + }, + { + "cell_type": "markdown", + "id": "f7c3b9fc", + "metadata": { + "editable": true + }, + "source": [ + "## Convolutional Neural Networks (recognizing images)\n", + "\n", + "Convolutional neural networks (CNNs) were developed during the last\n", + "decade of the previous century, with a focus on character recognition\n", + "tasks. Nowadays, CNNs are a central element in the spectacular success\n", + "of deep learning methods. The success in for example image\n", + "classifications have made them a central tool for most machine\n", + "learning practitioners.\n", + "\n", + "CNNs are very similar to ordinary Neural Networks.\n", + "They are made up of neurons that have learnable weights and\n", + "biases. Each neuron receives some inputs, performs a dot product and\n", + "optionally follows it with a non-linearity. The whole network still\n", + "expresses a single differentiable score function: from the raw image\n", + "pixels on one end to class scores at the other. And they still have a\n", + "loss function (for example Softmax) on the last (fully-connected) layer\n", + "and all the tips/tricks we developed for learning regular Neural\n", + "Networks still apply (back propagation, gradient descent etc etc)." + ] + }, + { + "cell_type": "markdown", + "id": "5d3a5ee8", + "metadata": { + "editable": true + }, + "source": [ + "## What is the Difference\n", + "\n", + "**CNN architectures make the explicit assumption that\n", + "the inputs are images, which allows us to encode certain properties\n", + "into the architecture. 
These then make the forward function more\n", + "efficient to implement and vastly reduce the amount of parameters in\n", + "the network.**" + ] + }, + { + "cell_type": "markdown", + "id": "e8618fc8", + "metadata": { + "editable": true + }, + "source": [ + "## Neural Networks vs CNNs\n", + "\n", + "Neural networks are defined as **affine transformations**, that is \n", + "a vector is received as input and is multiplied with a matrix of so-called weights (our unknown paramters) to produce an\n", + "output (to which a bias vector is usually added before passing the result\n", + "through a nonlinear activation function). This is applicable to any type of input, be it an\n", + "image, a sound clip or an unordered collection of features: whatever their\n", + "dimensionality, their representation can always be flattened into a vector\n", + "before the transformation." + ] + }, + { + "cell_type": "markdown", + "id": "b41b4781", + "metadata": { + "editable": true + }, + "source": [ + "## Why CNNS for images, sound files, medical images from CT scans etc?\n", + "\n", + "However, when we consider images, sound clips and many other similar kinds of data, these data have an intrinsic\n", + "structure. More formally, they share these important properties:\n", + "* They are stored as multi-dimensional arrays (think of the pixels of a figure) .\n", + "\n", + "* They feature one or more axes for which ordering matters (e.g., width and height axes for an image, time axis for a sound clip).\n", + "\n", + "* One axis, called the channel axis, is used to access different views of the data (e.g., the red, green and blue channels of a color image, or the left and right channels of a stereo audio track).\n", + "\n", + "These properties are not exploited when an affine transformation is applied; in\n", + "fact, all the axes are treated in the same way and the topological information\n", + "is not taken into account. Still, taking advantage of the implicit structure of\n", + "the data may prove very handy in solving some tasks, like computer vision and\n", + "speech recognition, and in these cases it would be best to preserve it. This is\n", + "where discrete convolutions come into play.\n", + "\n", + "A discrete convolution is a linear transformation that preserves this notion of\n", + "ordering. It is sparse (only a few input units contribute to a given output\n", + "unit) and reuses parameters (the same weights are applied to multiple locations\n", + "in the input)." + ] + }, + { + "cell_type": "markdown", + "id": "33bf8922", + "metadata": { + "editable": true + }, + "source": [ + "## Regular NNs don’t scale well to full images\n", + "\n", + "As an example, consider\n", + "an image of size $32\\times 32\\times 3$ (32 wide, 32 high, 3 color channels), so a\n", + "single fully-connected neuron in a first hidden layer of a regular\n", + "Neural Network would have $32\\times 32\\times 3 = 3072$ weights. This amount still\n", + "seems manageable, but clearly this fully-connected structure does not\n", + "scale to larger images. For example, an image of more respectable\n", + "size, say $200\\times 200\\times 3$, would lead to neurons that have \n", + "$200\\times 200\\times 3 = 120,000$ weights. \n", + "\n", + "We could have\n", + "several such neurons, and the parameters would add up quickly! Clearly,\n", + "this full connectivity is wasteful and the huge number of parameters\n", + "would quickly lead to possible overfitting.\n", + "\n", + "\n", + "\n", + "\n", + "

Figure 1: A regular 3-layer Neural Network.
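To make the scaling argument above concrete, here is a minimal back-of-the-envelope check in Python (plain arithmetic, using only the image sizes quoted above):

```python
# Weights needed by ONE fully-connected neuron attached to a flattened image
def weights_per_neuron(width, height, channels):
    return width * height * channels

print(weights_per_neuron(32, 32, 3))     # 3072
print(weights_per_neuron(200, 200, 3))   # 120000
```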

\n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "95c20234", + "metadata": { + "editable": true + }, + "source": [ + "## 3D volumes of neurons\n", + "\n", + "Convolutional Neural Networks take advantage of the fact that the\n", + "input consists of images and they constrain the architecture in a more\n", + "sensible way. \n", + "\n", + "In particular, unlike a regular Neural Network, the\n", + "layers of a CNN have neurons arranged in 3 dimensions: width,\n", + "height, depth. (Note that the word depth here refers to the third\n", + "dimension of an activation volume, not to the depth of a full Neural\n", + "Network, which can refer to the total number of layers in a network.)\n", + "\n", + "To understand it better, the above example of an image \n", + "with an input volume of\n", + "activations has dimensions $32\\times 32\\times 3$ (width, height,\n", + "depth respectively). \n", + "\n", + "The neurons in a layer will\n", + "only be connected to a small region of the layer before it, instead of\n", + "all of the neurons in a fully-connected manner. Moreover, the final\n", + "output layer could for this specific image have dimensions $1\\times 1 \\times 10$, \n", + "because by the\n", + "end of the CNN architecture we will reduce the full image into a\n", + "single vector of class scores, arranged along the depth\n", + "dimension. \n", + "\n", + "\n", + "\n", + "\n", + "

Figure 1: A CNN arranges its neurons in three dimensions (width, height, depth), as visualized in one of the layers. Every layer of a CNN transforms the 3D input volume to a 3D output volume of neuron activations. In this example, the red input layer holds the image, so its width and height would be the dimensions of the image, and the depth would be 3 (Red, Green, Blue channels).
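As a small illustration of the three-dimensional layout discussed above, a color image is simply an array with two spatial axes and a channel axis; flattening it, as a dense network would, discards that structure. A minimal sketch with a random stand-in for an image:

```python
import numpy as np

rng = np.random.default_rng(2025)
image = rng.random((32, 32, 3))        # height, width, RGB channels
print(image.shape)                     # (32, 32, 3)
print(image.reshape(-1).shape)         # (3072,) -- the flattened vector a dense layer would see
```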

\n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "2b7ba652", + "metadata": { + "editable": true + }, + "source": [ + "## More on Dimensionalities\n", + "\n", + "In fields like signal processing (and imaging as well), one designs\n", + "so-called filters. These filters are defined by the convolutions and\n", + "are often hand-crafted. One may specify filters for smoothing, edge\n", + "detection, frequency reshaping, and similar operations. However with\n", + "neural networks the idea is to automatically learn the filters and use\n", + "many of them in conjunction with non-linear operations (activation\n", + "functions).\n", + "\n", + "As an example consider a neural network operating on sound sequence\n", + "data. Assume that we an input vector $\\boldsymbol{x}$ of length $d=10^6$. We\n", + "construct then a neural network with onle hidden layer only with\n", + "$10^4$ nodes. This means that we will have a weight matrix with\n", + "$10^4\\times 10^6=10^{10}$ weights to be determined, together with $10^4$ biases.\n", + "\n", + "Assume furthermore that we have an output layer which is meant to train whether the sound sequence represents a human voice (true) or something else (false).\n", + "It means that we have only one output node. But since this output node connects to $10^4$ nodes in the hidden layer, there are in total $10^4$ weights to be determined for the output layer, plus one bias. In total we have" + ] + }, + { + "cell_type": "markdown", + "id": "b6a7ae46", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathrm{NumberParameters}=10^{10}+10^4+10^4+1 \\approx 10^{10},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "0d56b05e", + "metadata": { + "editable": true + }, + "source": [ + "that is ten billion parameters to determine." + ] + }, + { + "cell_type": "markdown", + "id": "35c90423", + "metadata": { + "editable": true + }, + "source": [ + "## Further remarks\n", + "\n", + "The main principles that justify convolutions is locality of\n", + "information and repetion of patterns within the signal. Sound samples\n", + "of the input in adjacent spots are much more likely to affect each\n", + "other than those that are very far away. Similarly, sounds are\n", + "repeated in multiple times in the signal. While slightly simplistic,\n", + "reasoning about such a sound example demonstrates this. The same\n", + "principles then apply to images and other similar data." + ] + }, + { + "cell_type": "markdown", + "id": "d08d4fb6", + "metadata": { + "editable": true + }, + "source": [ + "## Layers used to build CNNs\n", + "\n", + "A simple CNN is a sequence of layers, and every layer of a CNN\n", + "transforms one volume of activations to another through a\n", + "differentiable function. We use three main types of layers to build\n", + "CNN architectures: Convolutional Layer, Pooling Layer, and\n", + "Fully-Connected Layer (exactly as seen in regular Neural Networks). We\n", + "will stack these layers to form a full CNN architecture.\n", + "\n", + "A simple CNN for image classification could have the architecture:\n", + "\n", + "* **INPUT** ($32\\times 32 \\times 3$) will hold the raw pixel values of the image, in this case an image of width 32, height 32, and with three color channels R,G,B.\n", + "\n", + "* **CONV** (convolutional )layer will compute the output of neurons that are connected to local regions in the input, each computing a dot product between their weights and a small region they are connected to in the input volume. 
This may result in volume such as $[32\\times 32\\times 12]$ if we decided to use 12 filters.\n", + "\n", + "* **RELU** layer will apply an elementwise activation function, such as the $max(0,x)$ thresholding at zero. This leaves the size of the volume unchanged ($[32\\times 32\\times 12]$).\n", + "\n", + "* **POOL** (pooling) layer will perform a downsampling operation along the spatial dimensions (width, height), resulting in volume such as $[16\\times 16\\times 12]$.\n", + "\n", + "* **FC** (i.e. fully-connected) layer will compute the class scores, resulting in volume of size $[1\\times 1\\times 10]$, where each of the 10 numbers correspond to a class score, such as among the 10 categories of the MNIST images we considered above . As with ordinary Neural Networks and as the name implies, each neuron in this layer will be connected to all the numbers in the previous volume." + ] + }, + { + "cell_type": "markdown", + "id": "dd95dcc6", + "metadata": { + "editable": true + }, + "source": [ + "## Transforming images\n", + "\n", + "CNNs transform the original image layer by layer from the original\n", + "pixel values to the final class scores. \n", + "\n", + "Observe that some layers contain\n", + "parameters and other don’t. In particular, the CNN layers perform\n", + "transformations that are a function of not only the activations in the\n", + "input volume, but also of the parameters (the weights and biases of\n", + "the neurons). On the other hand, the RELU/POOL layers will implement a\n", + "fixed function. The parameters in the CONV/FC layers will be trained\n", + "with gradient descent so that the class scores that the CNN computes\n", + "are consistent with the labels in the training set for each image." + ] + }, + { + "cell_type": "markdown", + "id": "5fdbdbfd", + "metadata": { + "editable": true + }, + "source": [ + "## CNNs in brief\n", + "\n", + "In summary:\n", + "\n", + "* A CNN architecture is in the simplest case a list of Layers that transform the image volume into an output volume (e.g. holding the class scores)\n", + "\n", + "* There are a few distinct types of Layers (e.g. CONV/FC/RELU/POOL are by far the most popular)\n", + "\n", + "* Each Layer accepts an input 3D volume and transforms it to an output 3D volume through a differentiable function\n", + "\n", + "* Each Layer may or may not have parameters (e.g. CONV/FC do, RELU/POOL don’t)\n", + "\n", + "* Each Layer may or may not have additional hyperparameters (e.g. CONV/FC/POOL do, RELU doesn’t)" + ] + }, + { + "cell_type": "markdown", + "id": "c0cbb6b0", + "metadata": { + "editable": true + }, + "source": [ + "## A deep CNN model ([From Raschka et al](https://github.com/rasbt/machine-learning-book))\n", + "\n", + "\n", + "\n", + "\n", + "

Figure 1: A deep CNN
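The INPUT-CONV-RELU-POOL-FC stack listed above can be written down in a few lines. The following is only a sketch (it assumes TensorFlow/Keras is installed and reuses the $32\times 32\times 3$ example with 12 filters and 10 classes from the text); implementations in both Keras and PyTorch are discussed in more detail next week:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# INPUT -> CONV (+ RELU) -> POOL -> FC, following the simple architecture above
model = models.Sequential([
    layers.Input(shape=(32, 32, 3)),                                       # INPUT volume
    layers.Conv2D(12, kernel_size=3, padding="same", activation="relu"),   # CONV + RELU, 32x32x12
    layers.MaxPooling2D(pool_size=2),                                      # POOL, 16x16x12
    layers.Flatten(),
    layers.Dense(10, activation="softmax"),                                # FC class scores, 1x1x10
])
model.summary()
```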

\n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "caf2418d", + "metadata": { + "editable": true + }, + "source": [ + "## Key Idea\n", + "\n", + "A dense neural network is representd by an affine operation (like\n", + "matrix-matrix multiplication) where all parameters are included.\n", + "\n", + "The key idea in CNNs for say imaging is that in images neighbor pixels tend to be related! So we connect\n", + "only neighboring neurons in the input instead of connecting all with the first hidden layer.\n", + "\n", + "We say we perform a filtering (convolution is the mathematical operation)." + ] + }, + { + "cell_type": "markdown", + "id": "7d5552d8", + "metadata": { + "editable": true + }, + "source": [ + "## How to do image compression before the era of deep learning\n", + "\n", + "The singular-value decomposition (SVD) algorithm has been for decades one of the standard ways of compressing images.\n", + "The [lectures on the SVD](https://compphysics.github.io/MachineLearning/doc/LectureNotes/_build/html/chapter2.html#the-singular-value-decomposition) give many of the essential details concerning the SVD.\n", + "\n", + "The orthogonal vectors which are obtained from the SVD, can be used to\n", + "project down the dimensionality of a given image. In the example here\n", + "we gray-scale an image and downsize it.\n", + "\n", + "This recipe relies on us being able to actually perform the SVD. For\n", + "large images, and in particular with many images to reconstruct, using the SVD \n", + "may quickly become an overwhelming task. With the advent of efficient deep\n", + "learning methods like CNNs and later generative methods, these methods\n", + "have become in the last years the premier way of performing image\n", + "analysis. In particular for classification problems with labelled images." 
+ ] + }, + { + "cell_type": "markdown", + "id": "d0bc0489", + "metadata": { + "editable": true + }, + "source": [ + "## The SVD example" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "id": "cec697e6", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "from matplotlib.image import imread\n", + "import matplotlib.pyplot as plt\n", + "import scipy.linalg as ln\n", + "import numpy as np\n", + "import os\n", + "from PIL import Image\n", + "from math import log10, sqrt \n", + "plt.rcParams['figure.figsize'] = [16, 8]\n", + "# Import image\n", + "A = imread(os.path.join(\"figslides/photo1.jpg\"))\n", + "X = A.dot([0.299, 0.5870, 0.114]) # Convert RGB to grayscale\n", + "img = plt.imshow(X)\n", + "# convert to gray\n", + "img.set_cmap('gray')\n", + "plt.axis('off')\n", + "plt.show()\n", + "# Call image size\n", + "print(': %s'%str(X.shape))\n", + "\n", + "\n", + "# split the matrix into U, S, VT\n", + "U, S, VT = np.linalg.svd(X,full_matrices=False)\n", + "S = np.diag(S)\n", + "m = 800 # Image's width\n", + "n = 1200 # Image's height\n", + "j = 0\n", + "# Try compression with different k vectors (these represent projections):\n", + "for k in (5,10, 20, 100,200,400,500):\n", + " # Original size of the image\n", + " originalSize = m * n \n", + " # Size after compressed\n", + " compressedSize = k * (1 + m + n) \n", + " # The projection of the original image\n", + " Xapprox = U[:,:k] @ S[0:k,:k] @ VT[:k,:]\n", + " plt.figure(j+1)\n", + " j += 1\n", + " img = plt.imshow(Xapprox)\n", + " img.set_cmap('gray')\n", + " \n", + " plt.axis('off')\n", + " plt.title('k = ' + str(k))\n", + " plt.show() \n", + " print('Original size of image:')\n", + " print(originalSize)\n", + " print('Compression rate as Compressed image / Original size:')\n", + " ratio = compressedSize * 1.0 / originalSize\n", + " print(ratio)\n", + " print('Compression rate is ' + str( round(ratio * 100 ,2)) + '%' ) \n", + " # Estimate MQA\n", + " x= X.astype(\"float\")\n", + " y=Xapprox.astype(\"float\")\n", + " err = np.sum((x - y) ** 2)\n", + " err /= float(X.shape[0] * Xapprox.shape[1])\n", + " print('The mean-square deviation '+ str(round( err)))\n", + " max_pixel = 255.0\n", + " # Estimate Signal Noise Ratio\n", + " srv = 20 * (log10(max_pixel / sqrt(err)))\n", + " print('Signa to noise ratio '+ str(round(srv)) +'dB')" + ] + }, + { + "cell_type": "markdown", + "id": "6a578704", + "metadata": { + "editable": true + }, + "source": [ + "## Mathematics of CNNs\n", + "\n", + "The mathematics of CNNs is based on the mathematical operation of\n", + "**convolution**. 
In mathematics (in particular in functional analysis),\n", + "convolution is represented by mathematical operations (integration,\n", + "summation etc) on two functions in order to produce a third function\n", + "that expresses how the shape of one gets modified by the other.\n", + "Convolution has a plethora of applications in a variety of\n", + "disciplines, spanning from statistics to signal processing, computer\n", + "vision, solutions of differential equations,linear algebra,\n", + "engineering, and yes, machine learning.\n", + "\n", + "Mathematically, convolution is defined as follows (one-dimensional example):\n", + "Let us define a continuous function $y(t)$ given by" + ] + }, + { + "cell_type": "markdown", + "id": "5c858d52", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y(t) = \\int x(a) w(t-a) da,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "a96333c3", + "metadata": { + "editable": true + }, + "source": [ + "where $x(a)$ represents a so-called input and $w(t-a)$ is normally called the weight function or kernel.\n", + "\n", + "The above integral is written in a more compact form as" + ] + }, + { + "cell_type": "markdown", + "id": "9834d45e", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y(t) = \\left(x * w\\right)(t).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "13e15c5f", + "metadata": { + "editable": true + }, + "source": [ + "The discretized version reads" + ] + }, + { + "cell_type": "markdown", + "id": "0a496b2f", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y(t) = \\sum_{a=-\\infty}^{a=\\infty}x(a)w(t-a).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "48c5ecd3", + "metadata": { + "editable": true + }, + "source": [ + "Computing the inverse of the above convolution operations is known as deconvolution and the process is commutative.\n", + "\n", + "How can we use this? And what does it mean? Let us study some familiar examples first." + ] + }, + { + "cell_type": "markdown", + "id": "7cab11e7", + "metadata": { + "editable": true + }, + "source": [ + "## Convolution Examples: Polynomial multiplication\n", + "\n", + "Our first example is that of a multiplication between two polynomials,\n", + "which we will rewrite in terms of the mathematics of convolution. 
In\n", + "the final stage, since the problem here is a discrete one, we will\n", + "recast the final expression in terms of a matrix-vector\n", + "multiplication, where the matrix is a so-called [Toeplitz matrix\n", + "](https://link.springer.com/book/10.1007/978-93-86279-04-0).\n", + "\n", + "Let us look a the following polynomials to second and third order, respectively:" + ] + }, + { + "cell_type": "markdown", + "id": "c90333f8", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "p(t) = \\alpha_0+\\alpha_1 t+\\alpha_2 t^2,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "7c1b0c9b", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "9c8df6e8", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "s(t) = \\beta_0+\\beta_1 t+\\beta_2 t^2+\\beta_3 t^3.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "50667dfa", + "metadata": { + "editable": true + }, + "source": [ + "The polynomial multiplication gives us a new polynomial of degree $5$" + ] + }, + { + "cell_type": "markdown", + "id": "11f2ea4b", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "z(t) = \\delta_0+\\delta_1 t+\\delta_2 t^2+\\delta_3 t^3+\\delta_4 t^4+\\delta_5 t^5.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "4abea758", + "metadata": { + "editable": true + }, + "source": [ + "## Efficient Polynomial Multiplication\n", + "\n", + "Computing polynomial products can be implemented efficiently if we rewrite the more brute force multiplications using convolution.\n", + "We note first that the new coefficients are given as" + ] + }, + { + "cell_type": "markdown", + "id": "ad22b2d2", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{split}\n", + "\\delta_0=&\\alpha_0\\beta_0\\\\\n", + "\\delta_1=&\\alpha_1\\beta_0+\\alpha_0\\beta_1\\\\\n", + "\\delta_2=&\\alpha_0\\beta_2+\\alpha_1\\beta_1+\\alpha_2\\beta_0\\\\\n", + "\\delta_3=&\\alpha_1\\beta_2+\\alpha_2\\beta_1+\\alpha_0\\beta_3\\\\\n", + "\\delta_4=&\\alpha_2\\beta_2+\\alpha_1\\beta_3\\\\\n", + "\\delta_5=&\\alpha_2\\beta_3.\\\\\n", + "\\end{split}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "6a3ee064", + "metadata": { + "editable": true + }, + "source": [ + "We note that $\\alpha_i=0$ except for $i\\in \\left\\{0,1,2\\right\\}$ and $\\beta_i=0$ except for $i\\in\\left\\{0,1,2,3\\right\\}$.\n", + "\n", + "We can then rewrite the coefficients $\\delta_j$ using a discrete convolution as" + ] + }, + { + "cell_type": "markdown", + "id": "3aca65d8", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_j = \\sum_{i=-\\infty}^{i=\\infty}\\alpha_i\\beta_{j-i}=(\\alpha * \\beta)_j,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "0e04ce27", + "metadata": { + "editable": true + }, + "source": [ + "or as a double sum with restriction $l=i+j$" + ] + }, + { + "cell_type": "markdown", + "id": "173eda29", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_l = \\sum_{ij}\\alpha_i\\beta_{j}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "a196c2cd", + "metadata": { + "editable": true + }, + "source": [ + "## Further simplification\n", + "\n", + "Although we may have redundant operations with some few zeros for $\\beta_i$, we can rewrite the above sum in a more compact way as" + ] + }, + { + "cell_type": "markdown", + "id": "56018bb8", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_i = 
\\sum_{k=0}^{k=m-1}\\alpha_k\\beta_{i-k},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "ba91ab7b", + "metadata": { + "editable": true + }, + "source": [ + "where $m=3$ in our case, the maximum length of\n", + "the vector $\\alpha$. Note that the vector $\\boldsymbol{\\beta}$ has length $n=4$. Below we will find an even more efficient representation." + ] + }, + { + "cell_type": "markdown", + "id": "1b25324b", + "metadata": { + "editable": true + }, + "source": [ + "## A more efficient way of coding the above Convolution\n", + "\n", + "Since we only have a finite number of $\\alpha$ and $\\beta$ values\n", + "which are non-zero, we can rewrite the above convolution expressions\n", + "as a matrix-vector multiplication" + ] + }, + { + "cell_type": "markdown", + "id": "dd6d9155", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{\\delta}=\\begin{bmatrix}\\alpha_0 & 0 & 0 & 0 \\\\\n", + " \\alpha_1 & \\alpha_0 & 0 & 0 \\\\\n", + "\t\t\t \\alpha_2 & \\alpha_1 & \\alpha_0 & 0 \\\\\n", + "\t\t\t 0 & \\alpha_2 & \\alpha_1 & \\alpha_0 \\\\\n", + "\t\t\t 0 & 0 & \\alpha_2 & \\alpha_1 \\\\\n", + "\t\t\t 0 & 0 & 0 & \\alpha_2\n", + "\t\t\t \\end{bmatrix}\\begin{bmatrix} \\beta_0 \\\\ \\beta_1 \\\\ \\beta_2 \\\\ \\beta_3\\end{bmatrix}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "28050537", + "metadata": { + "editable": true + }, + "source": [ + "## Commutative process\n", + "\n", + "The process is commutative and we can easily see that we can rewrite the multiplication in terms of a matrix holding $\\beta$ and a vector holding $\\alpha$.\n", + "In this case we have" + ] + }, + { + "cell_type": "markdown", + "id": "f8278af4", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{\\delta}=\\begin{bmatrix}\\beta_0 & 0 & 0 \\\\\n", + " \\beta_1 & \\beta_0 & 0 \\\\\n", + "\t\t\t \\beta_2 & \\beta_1 & \\beta_0 \\\\\n", + "\t\t\t \\beta_3 & \\beta_2 & \\beta_1 \\\\\n", + "\t\t\t 0 & \\beta_3 & \\beta_2 \\\\\n", + "\t\t\t 0 & 0 & \\beta_3\n", + "\t\t\t \\end{bmatrix}\\begin{bmatrix} \\alpha_0 \\\\ \\alpha_1 \\\\ \\alpha_2\\end{bmatrix}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "cfa8bf9e", + "metadata": { + "editable": true + }, + "source": [ + "Note that the use of these matrices is for mathematical purposes only\n", + "and not implementation purposes. When implementing the above equation\n", + "we do not encode (and allocate memory) the matrices explicitely. We\n", + "rather code the convolutions in the minimal memory footprint that they\n", + "require." + ] + }, + { + "cell_type": "markdown", + "id": "4ad971ca", + "metadata": { + "editable": true + }, + "source": [ + "## Toeplitz matrices\n", + "\n", + "The above matrices are examples of so-called [Toeplitz\n", + "matrices](https://link.springer.com/book/10.1007/978-93-86279-04-0). A\n", + "Toeplitz matrix is a matrix in which each descending diagonal from\n", + "left to right is constant. 
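Numerically, the polynomial product above is exactly what numpy's discrete convolution computes, and the Toeplitz matrix-vector form gives the same coefficients. A short check with hypothetical coefficient values (scipy's `toeplitz` builds the matrix from its first column and first row):

```python
import numpy as np
from scipy.linalg import toeplitz

alpha = np.array([1.0, 2.0, 3.0])         # coefficients of p(t), hypothetical values
beta  = np.array([4.0, 5.0, 6.0, 7.0])    # coefficients of s(t), hypothetical values

# Coefficients of the product polynomial via discrete convolution
delta = np.convolve(alpha, beta)

# The same product as a Toeplitz matrix acting on beta
A = toeplitz(np.concatenate([alpha, np.zeros(len(beta) - 1)]),  # first column (alpha_0, alpha_1, alpha_2, 0, 0, 0)
             np.zeros(len(beta)))                               # first row    (alpha_0, 0, 0, 0)
print(np.allclose(delta, A @ beta))   # True
```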
For instance the last matrix, which we\n", + "rewrite as" + ] + }, + { + "cell_type": "markdown", + "id": "ff12250a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{A}=\\begin{bmatrix}a_0 & 0 & 0 \\\\\n", + " a_1 & a_0 & 0 \\\\\n", + "\t\t\t a_2 & a_1 & a_0 \\\\\n", + "\t\t\t a_3 & a_2 & a_1 \\\\\n", + "\t\t\t 0 & a_3 & a_2 \\\\\n", + "\t\t\t 0 & 0 & a_3\n", + "\t\t\t \\end{bmatrix},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "d7ebe2e9", + "metadata": { + "editable": true + }, + "source": [ + "with elements $a_{ii}=a_{i+1,j+1}=a_{i-j}$ is an example of a Toeplitz\n", + "matrix. Such a matrix does not need to be a square matrix. Toeplitz\n", + "matrices are also closely connected with Fourier series, because the multiplication operator by a trigonometric\n", + "polynomial, compressed to a finite-dimensional space, can be\n", + "represented by such a matrix. The example above shows that we can\n", + "represent linear convolution as multiplication of a Toeplitz matrix by\n", + "a vector." + ] + }, + { + "cell_type": "markdown", + "id": "5bfa6cd4", + "metadata": { + "editable": true + }, + "source": [ + "## Fourier series and Toeplitz matrices\n", + "\n", + "This is an active and ogoing research area concerning CNNs. The following articles may be of interest\n", + "1. [Read more about the convolution theorem and Fouriers series](https://www.sciencedirect.com/topics/engineering/convolution-theorem#:~:text=The%20convolution%20theorem%20(together%20with,k%20)%20G%20(%20k%20)%20.)\n", + "\n", + "2. [Fourier Transform Layer](https://www.sciencedirect.com/science/article/pii/S1568494623006257)" + ] + }, + { + "cell_type": "markdown", + "id": "4cb64d8c", + "metadata": { + "editable": true + }, + "source": [ + "## Generalizing the above one-dimensional case\n", + "\n", + "In order to align the above simple case with the more general\n", + "convolution cases, we rename $\\boldsymbol{\\alpha}$, whose length is $m=3$,\n", + "with $\\boldsymbol{w}$. We will interpret $\\boldsymbol{w}$ as a weight/filter function\n", + "with which we want to perform the convolution with an input variable\n", + "$\\boldsymbol{x}$ of length $n$. We will assume always that the filter\n", + "$\\boldsymbol{w}$ has dimensionality $m \\le n$.\n", + "\n", + "We replace thus $\\boldsymbol{\\beta}$ with $\\boldsymbol{x}$ and $\\boldsymbol{\\delta}$ with $\\boldsymbol{y}$ and have" + ] + }, + { + "cell_type": "markdown", + "id": "b05f94fc", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y(i)= \\left(x*w\\right)(i)= \\sum_{k=0}^{k=m-1}w(k)x(i-k),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "e95bb8b8", + "metadata": { + "editable": true + }, + "source": [ + "where $m=3$ in our case, the maximum length of the vector $\\boldsymbol{w}$.\n", + "Here the symbol $*$ represents the mathematical operation of convolution." + ] + }, + { + "cell_type": "markdown", + "id": "490b28d9", + "metadata": { + "editable": true + }, + "source": [ + "## Memory considerations\n", + "\n", + "This expression leaves us however with some terms with negative\n", + "indices, for example $x(-1)$ and $x(-2)$ which may not be defined. 
Our\n", + "vector $\\boldsymbol{x}$ has components $x(0)$, $x(1)$, $x(2)$ and $x(3)$.\n", + "\n", + "The index $j$ for $\\boldsymbol{x}$ runs from $j=0$ to $j=3$ since $\\boldsymbol{x}$ is meant to\n", + "represent a third-order polynomial.\n", + "\n", + "Furthermore, the index $i$ runs from $i=0$ to $i=5$ since $\\boldsymbol{y}$\n", + "contains the coefficients of a fifth-order polynomial. When $i=5$ we\n", + "may also have values of $x(4)$ and $x(5)$ which are not defined." + ] + }, + { + "cell_type": "markdown", + "id": "73dba37b", + "metadata": { + "editable": true + }, + "source": [ + "## Padding\n", + "\n", + "The solution to this is what is called **padding**! We simply define a\n", + "new vector $x$ with two added elements set to zero before $x(0)$ and\n", + "two new elements after $x(3)$ set to zero. That is, we augment the\n", + "length of $\\boldsymbol{x}$ from $n=4$ to $n+2P=8$, where $P=2$ is the padding\n", + "constant (a new hyperparameter), see discussions below as well." + ] + }, + { + "cell_type": "markdown", + "id": "a4ef9cfb", + "metadata": { + "editable": true + }, + "source": [ + "## New vector\n", + "\n", + "We have a new vector defined as $x(0)=0$, $x(1)=0$,\n", + "$x(2)=\\beta_0$, $x(3)=\\beta_1$, $x(4)=\\beta_2$, $x(5)=\\beta_3$,\n", + "$x(6)=0$, and $x(7)=0$.\n", + "\n", + "We have added four new elements, which\n", + "are all zero. The benefit is that we can rewrite the equation for\n", + "$\\boldsymbol{y}$, with $i=0,1,\\dots,5$," + ] + }, + { + "cell_type": "markdown", + "id": "a3df037d", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y(i) = \\sum_{k=0}^{k=m-1}w(k)x(i+(m-1)-k).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "a10c95fd", + "metadata": { + "editable": true + }, + "source": [ + "As an example, we have" + ] + }, + { + "cell_type": "markdown", + "id": "be674b8a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y(4)=x(6)w(0)+x(5)w(1)+x(4)w(2)=0\\times \\alpha_0+\\beta_3\\alpha_1+\\beta_2\\alpha_2,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "c903130e", + "metadata": { + "editable": true + }, + "source": [ + "as before except that we have an additional term $x(6)w(0)$, which is zero.\n", + "\n", + "Similarly, for the fifth-order term we have" + ] + }, + { + "cell_type": "markdown", + "id": "369fb648", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y(5)=x(7)w(0)+x(6)w(1)+x(5)w(2)=0\\times \\alpha_0+0\\times\\alpha_1+\\beta_3\\alpha_2.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "9eae3982", + "metadata": { + "editable": true + }, + "source": [ + "The zeroth-order term is" + ] + }, + { + "cell_type": "markdown", + "id": "52147ec0", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y(0)=x(2)w(0)+x(1)w(1)+x(0)w(2)=\\beta_0 \\alpha_0+0\\times\\alpha_1+0\\times\\alpha_2=\\alpha_0\\beta_0.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "f26b1f24", + "metadata": { + "editable": true + }, + "source": [ + "## Rewriting as dot products\n", + "\n", + "If we now flip the filter/weight vector, with the following term as a typical example" + ] + }, + { + "cell_type": "markdown", + "id": "1cda7b7e", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y(0)=x(2)w(0)+x(1)w(1)+x(0)w(2)=x(2)\\tilde{w}(2)+x(1)\\tilde{w}(1)+x(0)\\tilde{w}(0),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "de80daa7", + "metadata": { + "editable": true + }, + "source": [ + "with $\\tilde{w}(0)=w(2)$, $\\tilde{w}(1)=w(1)$, and 
$\\tilde{w}(2)=w(0)$, we can then rewrite the above sum as a dot product of\n", + "$x(i:i+(m-1))\\tilde{w}$ for element $y(i)$, where $x(i:i+(m-1))$ is simply a patch of $\\boldsymbol{x}$ of size $m-1$.\n", + "\n", + "The padding $P$ we have introduced for the convolution stage is just\n", + "another hyperparameter which is introduced as part of the\n", + "architecture. Similarly, below we will also introduce another\n", + "hyperparameter called **Stride** $S$." + ] + }, + { + "cell_type": "markdown", + "id": "bdb16a64", + "metadata": { + "editable": true + }, + "source": [ + "## Cross correlation\n", + "\n", + "In essentially all applications one uses what is called cross correlation instead of the standard convolution described above.\n", + "This means that multiplication is performed in the same direction and instead of the general expression we have discussed above (with infinite sums)" + ] + }, + { + "cell_type": "markdown", + "id": "a88a1043", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y(i) = \\sum_{k=-\\infty}^{k=\\infty}w(k)x(i-k),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "03659d77", + "metadata": { + "editable": true + }, + "source": [ + "we have now" + ] + }, + { + "cell_type": "markdown", + "id": "532e84de", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y(i) = \\sum_{k=-\\infty}^{k=\\infty}w(k)x(i+k).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "0487f1f5", + "metadata": { + "editable": true + }, + "source": [ + "Both TensorFlow and PyTorch (as well as our own code example below),\n", + "implement the last equation, although it is normally referred to as\n", + "convolution. The same padding rules and stride rules discussed below\n", + "apply to this expression as well.\n", + "\n", + "We leave it as an exercise for you to convince yourself that the example we have discussed till now, gives the same final expression using the last expression." + ] + }, + { + "cell_type": "markdown", + "id": "98475dfa", + "metadata": { + "editable": true + }, + "source": [ + "## Two-dimensional objects\n", + "\n", + "We are now ready to start studying the discrete convolutions relevant for convolutional neural networks.\n", + "We often use convolutions over more than one dimension at a time. If\n", + "we have a two-dimensional image $X$ as input, we can have a **filter**\n", + "defined by a two-dimensional **kernel/weight/filter** $W$. 
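As a quick numerical check of the padded one-dimensional sum derived above, here is a minimal sketch with hypothetical numbers, compared against numpy's built-in convolution:

```python
import numpy as np

w = np.array([1.0, 2.0, 3.0])          # filter of length m = 3 (the alphas above)
x = np.array([4.0, 5.0, 6.0, 7.0])     # input of length n = 4 (the betas above)
m, n = len(w), len(x)

# Pad the input with m - 1 zeros on each side, as described above
x_pad = np.pad(x, (m - 1, m - 1))

# y(i) = sum_k w(k) x(i + (m-1) - k) for i = 0, ..., n + m - 2
y = np.array([sum(w[k] * x_pad[i + (m - 1) - k] for k in range(m))
              for i in range(n + m - 1)])

print(np.allclose(y, np.convolve(x, w)))   # True
```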
This leads to an output $Y$" + ] + }, + { + "cell_type": "markdown", + "id": "1cb3be71", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "Y(i,j)=(X * W)(i,j) = \\sum_m\\sum_n X(m,n)W(i-m,j-n).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "bd9cd9fb", + "metadata": { + "editable": true + }, + "source": [ + "Convolution is a commutative process, which means we can rewrite this equation as" + ] + }, + { + "cell_type": "markdown", + "id": "1ba314a8", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "Y(i,j)=(X * W)(i,j) = \\sum_m\\sum_n X(i-m,j-n)W(m,n).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b9fb3fef", + "metadata": { + "editable": true + }, + "source": [ + "Normally the latter is more straightforward to implement in a machine\n", + "larning library since there is less variation in the range of values\n", + "of $m$ and $n$.\n", + "\n", + "As mentioned above, most deep learning libraries implement\n", + "cross-correlation instead of convolution (although it is referred to as\n", + "convolution)" + ] + }, + { + "cell_type": "markdown", + "id": "2d48086b", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "Y(i,j)=(X * W)(i,j) = \\sum_m\\sum_n X(i+m,j+n)W(m,n).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "2a62fbae", + "metadata": { + "editable": true + }, + "source": [ + "## CNNs in more detail, simple example\n", + "\n", + "Let assume we have an input matrix $X$ of dimensionality $3\\times 3$\n", + "and a $2\\times 2$ filter $W$ given by the following matrices" + ] + }, + { + "cell_type": "markdown", + "id": "0176ecc6", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{X}=\\begin{bmatrix}x_{00} & x_{01} & x_{02} \\\\\n", + " x_{10} & x_{11} & x_{12} \\\\\n", + "\t x_{20} & x_{21} & x_{22} \\end{bmatrix},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "f87b6051", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "164502cc", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{W}=\\begin{bmatrix}w_{00} & w_{01} \\\\\n", + "\t w_{10} & w_{11}\\end{bmatrix}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "1d4e61fe", + "metadata": { + "editable": true + }, + "source": [ + "We introduce now the hyperparameter $S$ **stride**. Stride represents how the filter $W$ moves the convolution process on the matrix $X$.\n", + "We strongly recommend the repository on [Arithmetic of deep learning by Dumoulin and Visin](https://github.com/vdumoulin/conv_arithmetic) \n", + "\n", + "Here we set the stride equal to $S=1$, which means that, starting with the element $x_{00}$, the filter will act on $2\\times 2$ submatrices each time, starting with the upper corner and moving according to the stride value column by column. 
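This sliding operation is easy to check numerically; the sketch below (with hypothetical entries) uses scipy's cross-correlation, which is the operation deep-learning libraries call convolution, and matches the explicit $2\times 2$ expression written out next:

```python
import numpy as np
from scipy.signal import correlate2d

X = np.arange(1.0, 10.0).reshape(3, 3)     # hypothetical 3x3 input
W = np.array([[1.0, 2.0],
              [3.0, 4.0]])                 # hypothetical 2x2 filter

# Stride S = 1, no padding ('valid'): the filter visits every 2x2 patch of X
Y = correlate2d(X, W, mode="valid")
print(Y)

# Manual check of the upper-left output element: elementwise product over the first patch
print(np.isclose(Y[0, 0], (X[:2, :2] * W).sum()))   # True
```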
\n", + "\n", + "Here we perform the operation" + ] + }, + { + "cell_type": "markdown", + "id": "7aae890d", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "Y_(i,j)=(X * W)(i,j) = \\sum_m\\sum_n X(i-m,j-n)W(m,n),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "352ba109", + "metadata": { + "editable": true + }, + "source": [ + "and obtain" + ] + }, + { + "cell_type": "markdown", + "id": "4660c16f", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{Y}=\\begin{bmatrix}x_{00}w_{00}+x_{01}w_{01}+x_{10}w_{10}+x_{11}w_{11} & x_{01}w_{00}+x_{02}w_{01}+x_{11}w_{10}+x_{12}w_{11} \\\\\n", + "\t x_{10}w_{00}+x_{11}w_{01}+x_{20}w_{10}+x_{21}w_{11} & x_{11}w_{00}+x_{12}w_{01}+x_{21}w_{10}+x_{22}w_{11}\\end{bmatrix}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "edb9d39b", + "metadata": { + "editable": true + }, + "source": [ + "We can rewrite this operation in terms of a matrix-vector multiplication by defining a new vector where we flatten out the inputs as a vector $\\boldsymbol{X}'$ of length $9$ and\n", + "a matrix $\\boldsymbol{W}'$ with dimension $4\\times 9$ as" + ] + }, + { + "cell_type": "markdown", + "id": "11470079", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{X}'=\\begin{bmatrix}x_{00} \\\\ x_{01} \\\\ x_{02} \\\\ x_{10} \\\\ x_{11} \\\\ x_{12} \\\\ x_{20} \\\\ x_{21} \\\\ x_{22} \\end{bmatrix},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "9b505f16", + "metadata": { + "editable": true + }, + "source": [ + "and the new matrix" + ] + }, + { + "cell_type": "markdown", + "id": "30c903b3", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{W}'=\\begin{bmatrix} w_{00} & w_{01} & 0 & w_{10} & w_{11} & 0 & 0 & 0 & 0 \\\\\n", + " 0 & w_{00} & w_{01} & 0 & w_{10} & w_{11} & 0 & 0 & 0 \\\\\n", + "\t\t\t0 & 0 & 0 & w_{00} & w_{01} & 0 & w_{10} & w_{11} & 0 \\\\\n", + " 0 & 0 & 0 & 0 & w_{00} & w_{01} & 0 & w_{10} & w_{11}\\end{bmatrix}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "057a5e31", + "metadata": { + "editable": true + }, + "source": [ + "We see easily that performing the matrix-vector multiplication $\\boldsymbol{W}'\\boldsymbol{X}'$ is the same as the above convolution with stride $S=1$, that is" + ] + }, + { + "cell_type": "markdown", + "id": "e5f35917", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "Y=(\\boldsymbol{W}*\\boldsymbol{X}),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "7c7cca5e", + "metadata": { + "editable": true + }, + "source": [ + "is now given by $\\boldsymbol{W}'\\boldsymbol{X}'$ which is a vector of length $4$ instead of the originally resulting $2\\times 2$ output matrix." + ] + }, + { + "cell_type": "markdown", + "id": "ed8782fc", + "metadata": { + "editable": true + }, + "source": [ + "## The convolution stage\n", + "\n", + "The convolution stage, where we apply different filters $\\boldsymbol{W}$ in\n", + "order to reduce the dimensionality of an image, adds, in addition to\n", + "the weights and biases (to be trained by the back propagation\n", + "algorithm) that define the filters, two new hyperparameters, the so-called\n", + "**padding** $P$ and the stride $S$." + ] + }, + { + "cell_type": "markdown", + "id": "3582873f", + "metadata": { + "editable": true + }, + "source": [ + "## Finding the number of parameters\n", + "\n", + "In the above example we have an input matrix of dimension $3\\times\n", + "3$. 
In general we call the input for an input volume and it is defined\n", + "by its width $H_1$, height $H_1$ and depth $D_1$. If we have the\n", + "standard three color channels $D_1=3$.\n", + "\n", + "The above example has $W_1=H_1=3$ and $D_1=1$.\n", + "\n", + "When we introduce the filter we have the following additional hyperparameters\n", + "1. $K$ the number of filters. It is common to perform the convolution of the input several times since by experience shrinking the input too fast does not work well\n", + "\n", + "2. $F$ as the filter's spatial extent\n", + "\n", + "3. $S$ as the stride parameter\n", + "\n", + "4. $P$ as the padding parameter\n", + "\n", + "These parameters are defined by the architecture of the network and are not included in the training." + ] + }, + { + "cell_type": "markdown", + "id": "c06b2b85", + "metadata": { + "editable": true + }, + "source": [ + "## New image (or volume)\n", + "\n", + "Acting with the filter on the input volume produces an output volume\n", + "which is defined by its width $W_2$, its height $H_2$ and its depth\n", + "$D_2$.\n", + "\n", + "These are defined by the following relations" + ] + }, + { + "cell_type": "markdown", + "id": "aa9ff748", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "W_2 = \\frac{(W_1-F+2P)}{S}+1,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "0508533e", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "H_2 = \\frac{(H_1-F+2P)}{S}+1,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "2b59d0d6", + "metadata": { + "editable": true + }, + "source": [ + "and $D_2=K$." + ] + }, + { + "cell_type": "markdown", + "id": "e283f13b", + "metadata": { + "editable": true + }, + "source": [ + "## Parameters to train, common settings\n", + "\n", + "With parameter sharing, the convolution involves thus for each filter $F\\times F\\times D_1$ weights plus one bias parameter.\n", + "\n", + "In total we have" + ] + }, + { + "cell_type": "markdown", + "id": "59617fcb", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\left(F\\times F\\times D_1)\\right) \\times K+(K\\mathrm{--biases}),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "f406e197", + "metadata": { + "editable": true + }, + "source": [ + "parameters to train by back propagation.\n", + "\n", + "It is common to let $K$ come in powers of $2$, that is $32$, $64$, $128$ etc.\n", + "\n", + "**Common settings.**\n", + "\n", + "1. $\\begin{array}{c} F=3 & S=1 & P=1 \\end{array}$\n", + "\n", + "2. $\\begin{array}{c} F=5 & S=1 & P=2 \\end{array}$\n", + "\n", + "3. $\\begin{array}{c} F=5 & S=2 & P=\\mathrm{open} \\end{array}$\n", + "\n", + "4. $\\begin{array}{c} F=1 & S=1 & P=0 \\end{array}$" + ] + }, + { + "cell_type": "markdown", + "id": "82febfb4", + "metadata": { + "editable": true + }, + "source": [ + "## Examples of CNN setups\n", + "\n", + "Let us assume we have an input volume $V$ given by an image of dimensionality\n", + "$32\\times 32 \\times 3$, that is three color channels and $32\\times 32$ pixels.\n", + "\n", + "We apply a filter of dimension $5\\times 5$ ten times with stride $S=1$ and padding $P=0$.\n", + "\n", + "The output volume is given by $(32-5)/1+1=28$, resulting in ten images\n", + "of dimensionality $28\\times 28\\times 3$.\n", + "\n", + "The total number of parameters to train for each filter is then\n", + "$5\\times 5\\times 3+1$, where the last parameter is the bias. 
This\n", + "gives us $76$ parameters for each filter, leading to a total of $760$\n", + "parameters for the ten filters.\n", + "\n", + "How many parameters will a filter of dimensionality $3\\times 3$\n", + "(adding color channels) result in if we produce $32$ new images? Use $S=1$ and $P=0$.\n", + "\n", + "Note that strides constitute a form of **subsampling**. As an alternative to\n", + "being interpreted as a measure of how much the kernel/filter is translated, strides\n", + "can also be viewed as how much of the output is retained. For instance, moving\n", + "the kernel by hops of two is equivalent to moving the kernel by hops of one but\n", + "retaining only odd output elements." + ] + }, + { + "cell_type": "markdown", + "id": "638e063c", + "metadata": { + "editable": true + }, + "source": [ + "## Summarizing: Performing a general discrete convolution ([From Raschka et al](https://github.com/rasbt/machine-learning-book))\n", + "\n", + "\n", + "\n", + "\n", + "

Figure 1: A deep CNN
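The bookkeeping formulas above are easy to wrap in a small helper; the numbers below reproduce the $32\times 32\times 3$ example with $K=10$ filters of size $F=5$, stride $S=1$ and padding $P=0$ (a sketch, assuming the division is exact):

```python
def conv_layer_summary(W1, H1, D1, K, F, S, P):
    # Output volume from W2 = (W1 - F + 2P)/S + 1 (and the same for H2), with D2 = K
    W2 = (W1 - F + 2 * P) // S + 1
    H2 = (H1 - F + 2 * P) // S + 1
    params = (F * F * D1 + 1) * K        # F*F*D1 weights plus one bias per filter
    return (W2, H2, K), params

shape, params = conv_layer_summary(W1=32, H1=32, D1=3, K=10, F=5, S=1, P=0)
print(shape)    # (28, 28, 10)
print(params)   # 760, i.e. 76 parameters per filter
```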

\n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "d182de4b", + "metadata": { + "editable": true + }, + "source": [ + "## Pooling\n", + "\n", + "In addition to discrete convolutions themselves, **pooling** operations\n", + "make up another important building block in CNNs. Pooling operations reduce\n", + "the size of feature maps by using some function to summarize subregions, such\n", + "as taking the average or the maximum value.\n", + "\n", + "Pooling works by sliding a window across the input and feeding the content of\n", + "the window to a **pooling function**. In some sense, pooling works very much\n", + "like a discrete convolution, but replaces the linear combination described by\n", + "the kernel with some other function." + ] + }, + { + "cell_type": "markdown", + "id": "1159bffe", + "metadata": { + "editable": true + }, + "source": [ + "## Pooling arithmetic\n", + "\n", + "In a neural network, pooling layers provide invariance to small translations of\n", + "the input. The most common kind of pooling is **max pooling**, which\n", + "consists in splitting the input in (usually non-overlapping) patches and\n", + "outputting the maximum value of each patch. Other kinds of pooling exist, e.g.,\n", + "mean or average pooling, which all share the same idea of aggregating the input\n", + "locally by applying a non-linearity to the content of some patches." + ] + }, + { + "cell_type": "markdown", + "id": "138b6d6a", + "metadata": { + "editable": true + }, + "source": [ + "## Pooling types ([From Raschka et al](https://github.com/rasbt/machine-learning-book))\n", + "\n", + "\n", + "\n", + "\n", + "

Figure 1: A deep CNN
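Max pooling over non-overlapping $2\times 2$ patches can be written compactly with a reshape; a minimal numpy sketch with hypothetical input values:

```python
import numpy as np

X = np.array([[ 1.,  2.,  3.,  4.],
              [ 5.,  6.,  7.,  8.],
              [ 9., 10., 11., 12.],
              [13., 14., 15., 16.]])

# Split the 4x4 input into non-overlapping 2x2 patches and take the maximum of each
pooled = X.reshape(2, 2, 2, 2).max(axis=(1, 3))
print(pooled)
# [[ 6.  8.]
#  [14. 16.]]
```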

\n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "97123878", + "metadata": { + "editable": true + }, + "source": [ + "## Building convolutional neural networks in Tensorflow/Keras and PyTorch\n", + "\n", + "As discussed above, CNNs are neural networks built from the assumption\n", + "that the inputs to the network are 2D images. This is important\n", + "because the number of features or pixels in images grows very fast\n", + "with the image size, and an enormous number of weights and biases are\n", + "needed in order to build an accurate network. Next week we will\n", + "discuss in more detail how we can build a CNN using either TensorFlow\n", + "with Keras and PyTorch." + ] + } + ], + "metadata": {}, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/doc/LectureNotes/_build/html/_sources/week45.ipynb b/doc/LectureNotes/_build/html/_sources/week45.ipynb new file mode 100644 index 000000000..c5336e2ab --- /dev/null +++ b/doc/LectureNotes/_build/html/_sources/week45.ipynb @@ -0,0 +1,2335 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "9686648f", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "45892517", + "metadata": { + "editable": true + }, + "source": [ + "# Week 45, Convolutional Neural Networks (CCNs)\n", + "**Morten Hjorth-Jensen**, Department of Physics, University of Oslo\n", + "\n", + "Date: **November 3-7, 2025**" + ] + }, + { + "cell_type": "markdown", + "id": "8449fbfd", + "metadata": { + "editable": true + }, + "source": [ + "## Plans for week 45\n", + "\n", + "**Material for the lecture on Monday November 3, 2025.**\n", + "\n", + "1. Convolutional Neural Networks, codes and examples (TensorFlow and Pytorch implementations)\n", + "\n", + "2. Readings and Videos:\n", + "\n", + "3. These lecture notes at \n", + "\n", + "4. Video of lecture at \n", + "\n", + "5. Whiteboard notes at \n", + "\n", + "6. For a more in depth discussion on CNNs we recommend Goodfellow et al chapters 9. See also chapter 11 and 12 on practicalities and applications \n", + "\n", + "7. Reading suggestions for implementation of CNNs, see Raschka et al chapters 14-15 at .\n", + "\n", + "\n", + "a. Video on Deep Learning at " + ] + }, + { + "cell_type": "markdown", + "id": "4ad8a4b2", + "metadata": { + "editable": true + }, + "source": [ + "## Material for the lab sessions\n", + "\n", + "Discussion of and work on project 2, no exercises this week, only project work" + ] + }, + { + "cell_type": "markdown", + "id": "48e99fbe", + "metadata": { + "editable": true + }, + "source": [ + "## Material for Lecture Monday November 3" + ] + }, + { + "cell_type": "markdown", + "id": "661e183c", + "metadata": { + "editable": true + }, + "source": [ + "## Convolutional Neural Networks (recognizing images), reminder from last week\n", + "\n", + "Convolutional neural networks (CNNs) were developed during the last\n", + "decade of the previous century, with a focus on character recognition\n", + "tasks. Nowadays, CNNs are a central element in the spectacular success\n", + "of deep learning methods. The success in for example image\n", + "classifications have made them a central tool for most machine\n", + "learning practitioners.\n", + "\n", + "CNNs are very similar to ordinary Neural Networks.\n", + "They are made up of neurons that have learnable weights and\n", + "biases. Each neuron receives some inputs, performs a dot product and\n", + "optionally follows it with a non-linearity. 
The whole network still\n", + "expresses a single differentiable score function: from the raw image\n", + "pixels on one end to class scores at the other. And they still have a\n", + "loss function (for example Softmax) on the last (fully-connected) layer\n", + "and all the tips/tricks we developed for learning regular Neural\n", + "Networks still apply (back propagation, gradient descent etc etc)." + ] + }, + { + "cell_type": "markdown", + "id": "96a38398", + "metadata": { + "editable": true + }, + "source": [ + "## What is the Difference\n", + "\n", + "**CNN architectures make the explicit assumption that\n", + "the inputs are images, which allows us to encode certain properties\n", + "into the architecture. These then make the forward function more\n", + "efficient to implement and vastly reduce the amount of parameters in\n", + "the network.**" + ] + }, + { + "cell_type": "markdown", + "id": "3ca522fb", + "metadata": { + "editable": true + }, + "source": [ + "## Neural Networks vs CNNs\n", + "\n", + "Neural networks are defined as **affine transformations**, that is \n", + "a vector is received as input and is multiplied with a matrix of so-called weights (our unknown paramters) to produce an\n", + "output (to which a bias vector is usually added before passing the result\n", + "through a nonlinear activation function). This is applicable to any type of input, be it an\n", + "image, a sound clip or an unordered collection of features: whatever their\n", + "dimensionality, their representation can always be flattened into a vector\n", + "before the transformation." + ] + }, + { + "cell_type": "markdown", + "id": "609aa156", + "metadata": { + "editable": true + }, + "source": [ + "## Why CNNS for images, sound files, medical images from CT scans etc?\n", + "\n", + "However, when we consider images, sound clips and many other similar kinds of data, these data have an intrinsic\n", + "structure. More formally, they share these important properties:\n", + "* They are stored as multi-dimensional arrays (think of the pixels of a figure) .\n", + "\n", + "* They feature one or more axes for which ordering matters (e.g., width and height axes for an image, time axis for a sound clip).\n", + "\n", + "* One axis, called the channel axis, is used to access different views of the data (e.g., the red, green and blue channels of a color image, or the left and right channels of a stereo audio track).\n", + "\n", + "These properties are not exploited when an affine transformation is applied; in\n", + "fact, all the axes are treated in the same way and the topological information\n", + "is not taken into account. Still, taking advantage of the implicit structure of\n", + "the data may prove very handy in solving some tasks, like computer vision and\n", + "speech recognition, and in these cases it would be best to preserve it. This is\n", + "where discrete convolutions come into play.\n", + "\n", + "A discrete convolution is a linear transformation that preserves this notion of\n", + "ordering. It is sparse (only a few input units contribute to a given output\n", + "unit) and reuses parameters (the same weights are applied to multiple locations\n", + "in the input)." 
+ ] + }, + { + "cell_type": "markdown", + "id": "c280e4de", + "metadata": { + "editable": true + }, + "source": [ + "## Regular NNs don’t scale well to full images\n", + "\n", + "As an example, consider\n", + "an image of size $32\\times 32\\times 3$ (32 wide, 32 high, 3 color channels), so a\n", + "single fully-connected neuron in a first hidden layer of a regular\n", + "Neural Network would have $32\\times 32\\times 3 = 3072$ weights. This amount still\n", + "seems manageable, but clearly this fully-connected structure does not\n", + "scale to larger images. For example, an image of more respectable\n", + "size, say $200\\times 200\\times 3$, would lead to neurons that have \n", + "$200\\times 200\\times 3 = 120,000$ weights. \n", + "\n", + "We could have\n", + "several such neurons, and the parameters would add up quickly! Clearly,\n", + "this full connectivity is wasteful and the huge number of parameters\n", + "would quickly lead to possible overfitting.\n", + "\n", + "\n", + "\n", + "\n", + "

Figure 1: A regular 3-layer Neural Network.

\n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "0d86d50e", + "metadata": { + "editable": true + }, + "source": [ + "## 3D volumes of neurons\n", + "\n", + "Convolutional Neural Networks take advantage of the fact that the\n", + "input consists of images and they constrain the architecture in a more\n", + "sensible way. \n", + "\n", + "In particular, unlike a regular Neural Network, the\n", + "layers of a CNN have neurons arranged in 3 dimensions: width,\n", + "height, depth. (Note that the word depth here refers to the third\n", + "dimension of an activation volume, not to the depth of a full Neural\n", + "Network, which can refer to the total number of layers in a network.)\n", + "\n", + "To understand it better, the above example of an image \n", + "with an input volume of\n", + "activations has dimensions $32\\times 32\\times 3$ (width, height,\n", + "depth respectively). \n", + "\n", + "The neurons in a layer will\n", + "only be connected to a small region of the layer before it, instead of\n", + "all of the neurons in a fully-connected manner. Moreover, the final\n", + "output layer could for this specific image have dimensions $1\\times 1 \\times 10$, \n", + "because by the\n", + "end of the CNN architecture we will reduce the full image into a\n", + "single vector of class scores, arranged along the depth\n", + "dimension. \n", + "\n", + "\n", + "\n", + "\n", + "

Figure 1: A CNN arranges its neurons in three dimensions (width, height, depth), as visualized in one of the layers. Every layer of a CNN transforms the 3D input volume to a 3D output volume of neuron activations. In this example, the red input layer holds the image, so its width and height would be the dimensions of the image, and the depth would be 3 (Red, Green, Blue channels).

\n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "93102a35", + "metadata": { + "editable": true + }, + "source": [ + "## More on Dimensionalities\n", + "\n", + "In fields like signal processing (and imaging as well), one designs\n", + "so-called filters. These filters are defined by the convolutions and\n", + "are often hand-crafted. One may specify filters for smoothing, edge\n", + "detection, frequency reshaping, and similar operations. However with\n", + "neural networks the idea is to automatically learn the filters and use\n", + "many of them in conjunction with non-linear operations (activation\n", + "functions).\n", + "\n", + "As an example consider a neural network operating on sound sequence\n", + "data. Assume that we an input vector $\\boldsymbol{x}$ of length $d=10^6$. We\n", + "construct then a neural network with onle hidden layer only with\n", + "$10^4$ nodes. This means that we will have a weight matrix with\n", + "$10^4\\times 10^6=10^{10}$ weights to be determined, together with $10^4$ biases.\n", + "\n", + "Assume furthermore that we have an output layer which is meant to train whether the sound sequence represents a human voice (true) or something else (false).\n", + "It means that we have only one output node. But since this output node connects to $10^4$ nodes in the hidden layer, there are in total $10^4$ weights to be determined for the output layer, plus one bias. In total we have" + ] + }, + { + "cell_type": "markdown", + "id": "b0e6ea33", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathrm{NumberParameters}=10^{10}+10^4+10^4+1 \\approx 10^{10},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "3fbba997", + "metadata": { + "editable": true + }, + "source": [ + "that is ten billion parameters to determine." + ] + }, + { + "cell_type": "markdown", + "id": "4be9d3e0", + "metadata": { + "editable": true + }, + "source": [ + "## Further remarks\n", + "\n", + "The main principles that justify convolutions is locality of\n", + "information and repetion of patterns within the signal. Sound samples\n", + "of the input in adjacent spots are much more likely to affect each\n", + "other than those that are very far away. Similarly, sounds are\n", + "repeated in multiple times in the signal. While slightly simplistic,\n", + "reasoning about such a sound example demonstrates this. The same\n", + "principles then apply to images and other similar data." + ] + }, + { + "cell_type": "markdown", + "id": "b93711ab", + "metadata": { + "editable": true + }, + "source": [ + "## Layers used to build CNNs\n", + "\n", + "A simple CNN is a sequence of layers, and every layer of a CNN\n", + "transforms one volume of activations to another through a\n", + "differentiable function. We use three main types of layers to build\n", + "CNN architectures: Convolutional Layer, Pooling Layer, and\n", + "Fully-Connected Layer (exactly as seen in regular Neural Networks). We\n", + "will stack these layers to form a full CNN architecture.\n", + "\n", + "A simple CNN for image classification could have the architecture:\n", + "\n", + "* **INPUT** ($32\\times 32 \\times 3$) will hold the raw pixel values of the image, in this case an image of width 32, height 32, and with three color channels R,G,B.\n", + "\n", + "* **CONV** (convolutional )layer will compute the output of neurons that are connected to local regions in the input, each computing a dot product between their weights and a small region they are connected to in the input volume. 
This may result in volume such as $[32\\times 32\\times 12]$ if we decided to use 12 filters.\n", + "\n", + "* **RELU** layer will apply an elementwise activation function, such as the $max(0,x)$ thresholding at zero. This leaves the size of the volume unchanged ($[32\\times 32\\times 12]$).\n", + "\n", + "* **POOL** (pooling) layer will perform a downsampling operation along the spatial dimensions (width, height), resulting in volume such as $[16\\times 16\\times 12]$.\n", + "\n", + "* **FC** (i.e. fully-connected) layer will compute the class scores, resulting in volume of size $[1\\times 1\\times 10]$, where each of the 10 numbers correspond to a class score, such as among the 10 categories of the MNIST images we considered above . As with ordinary Neural Networks and as the name implies, each neuron in this layer will be connected to all the numbers in the previous volume." + ] + }, + { + "cell_type": "markdown", + "id": "df93de2c", + "metadata": { + "editable": true + }, + "source": [ + "## Transforming images\n", + "\n", + "CNNs transform the original image layer by layer from the original\n", + "pixel values to the final class scores. \n", + "\n", + "Observe that some layers contain\n", + "parameters and other don’t. In particular, the CNN layers perform\n", + "transformations that are a function of not only the activations in the\n", + "input volume, but also of the parameters (the weights and biases of\n", + "the neurons). On the other hand, the RELU/POOL layers will implement a\n", + "fixed function. The parameters in the CONV/FC layers will be trained\n", + "with gradient descent so that the class scores that the CNN computes\n", + "are consistent with the labels in the training set for each image." + ] + }, + { + "cell_type": "markdown", + "id": "35b469f8", + "metadata": { + "editable": true + }, + "source": [ + "## CNNs in brief\n", + "\n", + "In summary:\n", + "\n", + "* A CNN architecture is in the simplest case a list of Layers that transform the image volume into an output volume (e.g. holding the class scores)\n", + "\n", + "* There are a few distinct types of Layers (e.g. CONV/FC/RELU/POOL are by far the most popular)\n", + "\n", + "* Each Layer accepts an input 3D volume and transforms it to an output 3D volume through a differentiable function\n", + "\n", + "* Each Layer may or may not have parameters (e.g. CONV/FC do, RELU/POOL don’t)\n", + "\n", + "* Each Layer may or may not have additional hyperparameters (e.g. CONV/FC/POOL do, RELU doesn’t)" + ] + }, + { + "cell_type": "markdown", + "id": "f2bc243c", + "metadata": { + "editable": true + }, + "source": [ + "## A deep CNN model ([From Raschka et al](https://github.com/rasbt/machine-learning-book))\n", + "\n", + "\n", + "\n", + "\n", + "

Figure 1: A deep CNN

\n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "92956a26", + "metadata": { + "editable": true + }, + "source": [ + "## Key Idea\n", + "\n", + "A dense neural network is representd by an affine operation (like matrix-matrix multiplication) where all parameters are included.\n", + "\n", + "The key idea in CNNs for say imaging is that in images neighbor pixels tend to be related! So we connect\n", + "only neighboring neurons in the input instead of connecting all with the first hidden layer.\n", + "\n", + "We say we perform a filtering (convolution is the mathematical operation)." + ] + }, + { + "cell_type": "markdown", + "id": "b758f4ee", + "metadata": { + "editable": true + }, + "source": [ + "## Mathematics of CNNs\n", + "\n", + "The mathematics of CNNs is based on the mathematical operation of\n", + "**convolution**. In mathematics (in particular in functional analysis),\n", + "convolution is represented by mathematical operations (integration,\n", + "summation etc) on two functions in order to produce a third function\n", + "that expresses how the shape of one gets modified by the other.\n", + "Convolution has a plethora of applications in a variety of\n", + "disciplines, spanning from statistics to signal processing, computer\n", + "vision, solutions of differential equations,linear algebra,\n", + "engineering, and yes, machine learning.\n", + "\n", + "Mathematically, convolution is defined as follows (one-dimensional example):\n", + "Let us define a continuous function $y(t)$ given by" + ] + }, + { + "cell_type": "markdown", + "id": "9fa911b3", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y(t) = \\int x(a) w(t-a) da,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "918817a5", + "metadata": { + "editable": true + }, + "source": [ + "where $x(a)$ represents a so-called input and $w(t-a)$ is normally called the weight function or kernel.\n", + "\n", + "The above integral is written in a more compact form as" + ] + }, + { + "cell_type": "markdown", + "id": "d5538df6", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y(t) = \\left(x * w\\right)(t).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "d4a4e2bc", + "metadata": { + "editable": true + }, + "source": [ + "The discretized version reads" + ] + }, + { + "cell_type": "markdown", + "id": "68268e68", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y(t) = \\sum_{a=-\\infty}^{a=\\infty}x(a)w(t-a).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "198bcce9", + "metadata": { + "editable": true + }, + "source": [ + "Computing the inverse of the above convolution operations is known as deconvolution and the process is commutative.\n", + "\n", + "How can we use this? And what does it mean? Let us study some familiar examples first." + ] + }, + { + "cell_type": "markdown", + "id": "43b535c4", + "metadata": { + "editable": true + }, + "source": [ + "## Convolution Examples: Polynomial multiplication\n", + "\n", + "Our first example is that of a multiplication between two polynomials,\n", + "which we will rewrite in terms of the mathematics of convolution. 
In\n", + "the final stage, since the problem here is a discrete one, we will\n", + "recast the final expression in terms of a matrix-vector\n", + "multiplication, where the matrix is a so-called [Toeplitz matrix\n", + "](https://link.springer.com/book/10.1007/978-93-86279-04-0).\n", + "\n", + "Let us look a the following polynomials to second and third order, respectively:" + ] + }, + { + "cell_type": "markdown", + "id": "45bc8ffc", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "p(t) = \\alpha_0+\\alpha_1 t+\\alpha_2 t^2,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "2c42df04", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "08c139bf", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "s(t) = \\beta_0+\\beta_1 t+\\beta_2 t^2+\\beta_3 t^3.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "bf189420", + "metadata": { + "editable": true + }, + "source": [ + "The polynomial multiplication gives us a new polynomial of degree $5$" + ] + }, + { + "cell_type": "markdown", + "id": "7f5d7607", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "z(t) = \\delta_0+\\delta_1 t+\\delta_2 t^2+\\delta_3 t^3+\\delta_4 t^4+\\delta_5 t^5.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "a2f47e64", + "metadata": { + "editable": true + }, + "source": [ + "## Efficient Polynomial Multiplication\n", + "\n", + "Computing polynomial products can be implemented efficiently if we rewrite the more brute force multiplications using convolution.\n", + "We note first that the new coefficients are given as" + ] + }, + { + "cell_type": "markdown", + "id": "7890aee8", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{split}\n", + "\\delta_0=&\\alpha_0\\beta_0\\\\\n", + "\\delta_1=&\\alpha_1\\beta_0+\\alpha_0\\beta_1\\\\\n", + "\\delta_2=&\\alpha_0\\beta_2+\\alpha_1\\beta_1+\\alpha_2\\beta_0\\\\\n", + "\\delta_3=&\\alpha_1\\beta_2+\\alpha_2\\beta_1+\\alpha_0\\beta_3\\\\\n", + "\\delta_4=&\\alpha_2\\beta_2+\\alpha_1\\beta_3\\\\\n", + "\\delta_5=&\\alpha_2\\beta_3.\\\\\n", + "\\end{split}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "6a03a3eb", + "metadata": { + "editable": true + }, + "source": [ + "We note that $\\alpha_i=0$ except for $i\\in \\left\\{0,1,2\\right\\}$ and $\\beta_i=0$ except for $i\\in\\left\\{0,1,2,3\\right\\}$.\n", + "\n", + "We can then rewrite the coefficients $\\delta_j$ using a discrete convolution as" + ] + }, + { + "cell_type": "markdown", + "id": "b49e404f", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_j = \\sum_{i=-\\infty}^{i=\\infty}\\alpha_i\\beta_{j-i}=(\\alpha * \\beta)_j,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "4ef5b061", + "metadata": { + "editable": true + }, + "source": [ + "or as a double sum with restriction $l=i+j$" + ] + }, + { + "cell_type": "markdown", + "id": "61685a6c", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_l = \\sum_{ij}\\alpha_i\\beta_{j}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "7ced5341", + "metadata": { + "editable": true + }, + "source": [ + "## Further simplification\n", + "\n", + "Although we may have redundant operations with some few zeros for $\\beta_i$, we can rewrite the above sum in a more compact way as" + ] + }, + { + "cell_type": "markdown", + "id": "3d00697e", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_i = 
\\sum_{k=0}^{k=m-1}\\alpha_k\\beta_{i-k},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "22837be3", + "metadata": { + "editable": true + }, + "source": [ + "where $m=3$ in our case, the maximum length of\n", + "the vector $\\alpha$. Note that the vector $\\boldsymbol{\\beta}$ has length $n=4$. Below we will find an even more efficient representation." + ] + }, + { + "cell_type": "markdown", + "id": "1603a086", + "metadata": { + "editable": true + }, + "source": [ + "## A more efficient way of coding the above Convolution\n", + "\n", + "Since we only have a finite number of $\\alpha$ and $\\beta$ values\n", + "which are non-zero, we can rewrite the above convolution expressions\n", + "as a matrix-vector multiplication" + ] + }, + { + "cell_type": "markdown", + "id": "340acf5c", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{\\delta}=\\begin{bmatrix}\\alpha_0 & 0 & 0 & 0 \\\\\n", + " \\alpha_1 & \\alpha_0 & 0 & 0 \\\\\n", + "\t\t\t \\alpha_2 & \\alpha_1 & \\alpha_0 & 0 \\\\\n", + "\t\t\t 0 & \\alpha_2 & \\alpha_1 & \\alpha_0 \\\\\n", + "\t\t\t 0 & 0 & \\alpha_2 & \\alpha_1 \\\\\n", + "\t\t\t 0 & 0 & 0 & \\alpha_2\n", + "\t\t\t \\end{bmatrix}\\begin{bmatrix} \\beta_0 \\\\ \\beta_1 \\\\ \\beta_2 \\\\ \\beta_3\\end{bmatrix}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "cdc8d513", + "metadata": { + "editable": true + }, + "source": [ + "## Commutative process\n", + "\n", + "The process is commutative and we can easily see that we can rewrite the multiplication in terms of a matrix holding $\\beta$ and a vector holding $\\alpha$.\n", + "In this case we have" + ] + }, + { + "cell_type": "markdown", + "id": "51e1f3d8", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{\\delta}=\\begin{bmatrix}\\beta_0 & 0 & 0 \\\\\n", + " \\beta_1 & \\beta_0 & 0 \\\\\n", + "\t\t\t \\beta_2 & \\beta_1 & \\beta_0 \\\\\n", + "\t\t\t \\beta_3 & \\beta_2 & \\beta_1 \\\\\n", + "\t\t\t 0 & \\beta_3 & \\beta_2 \\\\\n", + "\t\t\t 0 & 0 & \\beta_3\n", + "\t\t\t \\end{bmatrix}\\begin{bmatrix} \\alpha_0 \\\\ \\alpha_1 \\\\ \\alpha_2\\end{bmatrix}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "ce936f65", + "metadata": { + "editable": true + }, + "source": [ + "Note that the use of these matrices is for mathematical purposes only\n", + "and not implementation purposes. When implementing the above equation\n", + "we do not encode (and allocate memory) the matrices explicitely. We\n", + "rather code the convolutions in the minimal memory footprint that they\n", + "require." + ] + }, + { + "cell_type": "markdown", + "id": "c93a683f", + "metadata": { + "editable": true + }, + "source": [ + "## Toeplitz matrices\n", + "\n", + "The above matrices are examples of so-called [Toeplitz\n", + "matrices](https://link.springer.com/book/10.1007/978-93-86279-04-0). A\n", + "Toeplitz matrix is a matrix in which each descending diagonal from\n", + "left to right is constant. 
For instance the last matrix, which we\n", + "rewrite as" + ] + }, + { + "cell_type": "markdown", + "id": "1e3cffca", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{A}=\\begin{bmatrix}a_0 & 0 & 0 \\\\\n", + " a_1 & a_0 & 0 \\\\\n", + "\t\t\t a_2 & a_1 & a_0 \\\\\n", + "\t\t\t a_3 & a_2 & a_1 \\\\\n", + "\t\t\t 0 & a_3 & a_2 \\\\\n", + "\t\t\t 0 & 0 & a_3\n", + "\t\t\t \\end{bmatrix},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "e27270d9", + "metadata": { + "editable": true + }, + "source": [ + "with elements $a_{ii}=a_{i+1,j+1}=a_{i-j}$ is an example of a Toeplitz\n", + "matrix. Such a matrix does not need to be a square matrix. Toeplitz\n", + "matrices are also closely connected with Fourier series, because the multiplication operator by a trigonometric\n", + "polynomial, compressed to a finite-dimensional space, can be\n", + "represented by such a matrix. The example above shows that we can\n", + "represent linear convolution as multiplication of a Toeplitz matrix by\n", + "a vector." + ] + }, + { + "cell_type": "markdown", + "id": "125ef645", + "metadata": { + "editable": true + }, + "source": [ + "## Fourier series and Toeplitz matrices\n", + "\n", + "This is an active and ogoing research area concerning CNNs. The following articles may be of interest\n", + "1. [Read more about the convolution theorem and Fouriers series](https://www.sciencedirect.com/topics/engineering/convolution-theorem#:~:text=The%20convolution%20theorem%20(together%20with,k%20)%20G%20(%20k%20)%20.)\n", + "\n", + "2. [Fourier Transform Layer](https://www.sciencedirect.com/science/article/pii/S1568494623006257)" + ] + }, + { + "cell_type": "markdown", + "id": "d13ab1e4", + "metadata": { + "editable": true + }, + "source": [ + "## Generalizing the above one-dimensional case\n", + "\n", + "In order to align the above simple case with the more general\n", + "convolution cases, we rename $\\boldsymbol{\\alpha}$, whose length is $m=3$,\n", + "with $\\boldsymbol{w}$. We will interpret $\\boldsymbol{w}$ as a weight/filter function\n", + "with which we want to perform the convolution with an input variable\n", + "$\\boldsymbol{x}$ of length $n$. We will assume always that the filter\n", + "$\\boldsymbol{w}$ has dimensionality $m \\le n$.\n", + "\n", + "We replace thus $\\boldsymbol{\\beta}$ with $\\boldsymbol{x}$ and $\\boldsymbol{\\delta}$ with $\\boldsymbol{y}$ and have" + ] + }, + { + "cell_type": "markdown", + "id": "b9eb4b1e", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y(i)= \\left(x*w\\right)(i)= \\sum_{k=0}^{k=m-1}w(k)x(i-k),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "bdf0893f", + "metadata": { + "editable": true + }, + "source": [ + "where $m=3$ in our case, the maximum length of the vector $\\boldsymbol{w}$.\n", + "Here the symbol $*$ represents the mathematical operation of convolution." + ] + }, + { + "cell_type": "markdown", + "id": "64cd5dbb", + "metadata": { + "editable": true + }, + "source": [ + "## Memory considerations\n", + "\n", + "This expression leaves us however with some terms with negative\n", + "indices, for example $x(-1)$ and $x(-2)$ which may not be defined. 
Our\n", + "vector $\\boldsymbol{x}$ has components $x(0)$, $x(1)$, $x(2)$ and $x(3)$.\n", + "\n", + "The index $j$ for $\\boldsymbol{x}$ runs from $j=0$ to $j=3$ since $\\boldsymbol{x}$ is meant to\n", + "represent a third-order polynomial.\n", + "\n", + "Furthermore, the index $i$ runs from $i=0$ to $i=5$ since $\\boldsymbol{y}$\n", + "contains the coefficients of a fifth-order polynomial. When $i=5$ we\n", + "may also have values of $x(4)$ and $x(5)$ which are not defined." + ] + }, + { + "cell_type": "markdown", + "id": "20fa0219", + "metadata": { + "editable": true + }, + "source": [ + "## Padding\n", + "\n", + "The solution to this is what is called **padding**! We simply define a\n", + "new vector $x$ with two added elements set to zero before $x(0)$ and\n", + "two new elements after $x(3)$ set to zero. That is, we augment the\n", + "length of $\\boldsymbol{x}$ from $n=4$ to $n+2P=8$, where $P=2$ is the padding\n", + "constant (a new hyperparameter), see discussions below as well." + ] + }, + { + "cell_type": "markdown", + "id": "d24c7e69", + "metadata": { + "editable": true + }, + "source": [ + "## New vector\n", + "\n", + "We have a new vector defined as $x(0)=0$, $x(1)=0$,\n", + "$x(2)=\\beta_0$, $x(3)=\\beta_1$, $x(4)=\\beta_2$, $x(5)=\\beta_3$,\n", + "$x(6)=0$, and $x(7)=0$.\n", + "\n", + "We have added four new elements, which\n", + "are all zero. The benefit is that we can rewrite the equation for\n", + "$\\boldsymbol{y}$, with $i=0,1,\\dots,5$," + ] + }, + { + "cell_type": "markdown", + "id": "c00151a8", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y(i) = \\sum_{k=0}^{k=m-1}w(k)x(i+(m-1)-k).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "c9b39bfd", + "metadata": { + "editable": true + }, + "source": [ + "As an example, we have" + ] + }, + { + "cell_type": "markdown", + "id": "53de5ac4", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y(4)=x(6)w(0)+x(5)w(1)+x(4)w(2)=0\\times \\alpha_0+\\beta_3\\alpha_1+\\beta_2\\alpha_2,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "e1025d77", + "metadata": { + "editable": true + }, + "source": [ + "as before except that we have an additional term $x(6)w(0)$, which is zero.\n", + "\n", + "Similarly, for the fifth-order term we have" + ] + }, + { + "cell_type": "markdown", + "id": "34a5a413", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y(5)=x(7)w(0)+x(6)w(1)+x(5)w(2)=0\\times \\alpha_0+0\\times\\alpha_1+\\beta_3\\alpha_2.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "5ef38242", + "metadata": { + "editable": true + }, + "source": [ + "The zeroth-order term is" + ] + }, + { + "cell_type": "markdown", + "id": "42a8bd2e", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y(0)=x(2)w(0)+x(1)w(1)+x(0)w(2)=\\beta_0 \\alpha_0+0\\times\\alpha_1+0\\times\\alpha_2=\\alpha_0\\beta_0.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "2580d624", + "metadata": { + "editable": true + }, + "source": [ + "## Rewriting as dot products\n", + "\n", + "If we now flip the filter/weight vector, with the following term as a typical example" + ] + }, + { + "cell_type": "markdown", + "id": "76157e3c", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y(0)=x(2)w(0)+x(1)w(1)+x(0)w(2)=x(2)\\tilde{w}(2)+x(1)\\tilde{w}(1)+x(0)\\tilde{w}(0),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "a47c0bbf", + "metadata": { + "editable": true + }, + "source": [ + "with $\\tilde{w}(0)=w(2)$, $\\tilde{w}(1)=w(1)$, and 
$\\tilde{w}(2)=w(0)$, we can then rewrite the above sum as a dot product of\n", + "$x(i:i+(m-1))\\tilde{w}$ for element $y(i)$, where $x(i:i+(m-1))$ is simply a patch of $\\boldsymbol{x}$ of size $m-1$.\n", + "\n", + "The padding $P$ we have introduced for the convolution stage is just\n", + "another hyperparameter which is introduced as part of the\n", + "architecture. Similarly, below we will also introduce another\n", + "hyperparameter called **Stride** $S$." + ] + }, + { + "cell_type": "markdown", + "id": "4de2c235", + "metadata": { + "editable": true + }, + "source": [ + "## Cross correlation\n", + "\n", + "In essentially all applications one uses what is called cross correlation instead of the standard convolution described above.\n", + "This means that multiplication is performed in the same direction and instead of the general expression we have discussed above (with infinite sums)" + ] + }, + { + "cell_type": "markdown", + "id": "33319954", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y(i) = \\sum_{k=-\\infty}^{k=\\infty}w(k)x(i-k),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "d46fb216", + "metadata": { + "editable": true + }, + "source": [ + "we have now" + ] + }, + { + "cell_type": "markdown", + "id": "1125a773", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y(i) = \\sum_{k=-\\infty}^{k=\\infty}w(k)x(i+k).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "4e9ea645", + "metadata": { + "editable": true + }, + "source": [ + "Both TensorFlow and PyTorch (as well as our own code example below),\n", + "implement the last equation, although it is normally referred to as\n", + "convolution. The same padding rules and stride rules discussed below\n", + "apply to this expression as well.\n", + "\n", + "We leave it as an exercise for you to convince yourself that the example we have discussed till now, gives the same final expression using the last expression." + ] + }, + { + "cell_type": "markdown", + "id": "711fc589", + "metadata": { + "editable": true + }, + "source": [ + "## Two-dimensional objects\n", + "\n", + "We are now ready to start studying the discrete convolutions relevant for convolutional neural networks.\n", + "We often use convolutions over more than one dimension at a time. If\n", + "we have a two-dimensional image $X$ as input, we can have a **filter**\n", + "defined by a two-dimensional **kernel/weight/filter** $W$. 
This leads to an output $Y$" + ] + }, + { + "cell_type": "markdown", + "id": "ea93186d", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "Y(i,j)=(X * W)(i,j) = \\sum_m\\sum_n X(m,n)W(i-m,j-n).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "2ce72e4f", + "metadata": { + "editable": true + }, + "source": [ + "Convolution is a commutative process, which means we can rewrite this equation as" + ] + }, + { + "cell_type": "markdown", + "id": "7c891889", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "Y(i,j)=(X * W)(i,j) = \\sum_m\\sum_n X(i-m,j-n)W(m,n).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "337f70a6", + "metadata": { + "editable": true + }, + "source": [ + "Normally the latter is more straightforward to implement in a machine\n", + "larning library since there is less variation in the range of values\n", + "of $m$ and $n$.\n", + "\n", + "As mentioned above, most deep learning libraries implement\n", + "cross-correlation instead of convolution (although it is referred to as\n", + "convolution)" + ] + }, + { + "cell_type": "markdown", + "id": "aa0e3c87", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "Y(i,j)=(X * W)(i,j) = \\sum_m\\sum_n X(i+m,j+n)W(m,n).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "77113b34", + "metadata": { + "editable": true + }, + "source": [ + "## CNNs in more detail, simple example\n", + "\n", + "Let assume we have an input matrix $X$ of dimensionality $3\\times 3$\n", + "and a $2\\times 2$ filter $W$ given by the following matrices" + ] + }, + { + "cell_type": "markdown", + "id": "d54278c7", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{X}=\\begin{bmatrix}x_{00} & x_{01} & x_{02} \\\\\n", + " x_{10} & x_{11} & x_{12} \\\\\n", + "\t x_{20} & x_{21} & x_{22} \\end{bmatrix},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "597d1ef3", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "c544ba40", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{W}=\\begin{bmatrix}w_{00} & w_{01} \\\\\n", + "\t w_{10} & w_{11}\\end{bmatrix}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b6c1b40b", + "metadata": { + "editable": true + }, + "source": [ + "We introduce now the hyperparameter $S$ **stride**. Stride represents how the filter $W$ moves the convolution process on the matrix $X$.\n", + "We strongly recommend the repository on [Arithmetic of deep learning by Dumoulin and Visin](https://github.com/vdumoulin/conv_arithmetic) \n", + "\n", + "Here we set the stride equal to $S=1$, which means that, starting with the element $x_{00}$, the filter will act on $2\\times 2$ submatrices each time, starting with the upper corner and moving according to the stride value column by column. 
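Before writing out the algebra, a small numerical sketch (with made-up values standing in for $x_{ij}$ and $w_{ij}$) of this sliding dot product may be useful; it computes the same $2\times 2$ output that the explicit matrix below writes out, using the cross-correlation form that the libraries implement.

```python
import numpy as np

# Made-up values standing in for x_ij and w_ij (illustration only)
X = np.array([[1., 2., 3.],
              [4., 5., 6.],
              [7., 8., 9.]])
W = np.array([[1., 0.],
              [0., -1.]])

S = 1                                              # stride
H_out = (X.shape[0] - W.shape[0]) // S + 1
W_out = (X.shape[1] - W.shape[1]) // S + 1
Y = np.zeros((H_out, W_out))
for i in range(H_out):
    for j in range(W_out):
        patch = X[i*S:i*S + W.shape[0], j*S:j*S + W.shape[1]]
        Y[i, j] = np.sum(patch * W)                # dot product of patch and filter
print(Y)                                           # a 2x2 output, as in the matrix below
```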
\n", + "\n", + "Here we perform the operation" + ] + }, + { + "cell_type": "markdown", + "id": "d8ee5cf0", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "Y_(i,j)=(X * W)(i,j) = \\sum_m\\sum_n X(i-m,j-n)W(m,n),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "5df35204", + "metadata": { + "editable": true + }, + "source": [ + "and obtain" + ] + }, + { + "cell_type": "markdown", + "id": "afe8a3ab", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{Y}=\\begin{bmatrix}x_{00}w_{00}+x_{01}w_{01}+x_{10}w_{10}+x_{11}w_{11} & x_{01}w_{00}+x_{02}w_{01}+x_{11}w_{10}+x_{12}w_{11} \\\\\n", + "\t x_{10}w_{00}+x_{11}w_{01}+x_{20}w_{10}+x_{21}w_{11} & x_{11}w_{00}+x_{12}w_{01}+x_{21}w_{10}+x_{22}w_{11}\\end{bmatrix}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "9a1c6848", + "metadata": { + "editable": true + }, + "source": [ + "We can rewrite this operation in terms of a matrix-vector multiplication by defining a new vector where we flatten out the inputs as a vector $\\boldsymbol{X}'$ of length $9$ and\n", + "a matrix $\\boldsymbol{W}'$ with dimension $4\\times 9$ as" + ] + }, + { + "cell_type": "markdown", + "id": "4506234a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{X}'=\\begin{bmatrix}x_{00} \\\\ x_{01} \\\\ x_{02} \\\\ x_{10} \\\\ x_{11} \\\\ x_{12} \\\\ x_{20} \\\\ x_{21} \\\\ x_{22} \\end{bmatrix},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "f1b2fef4", + "metadata": { + "editable": true + }, + "source": [ + "and the new matrix" + ] + }, + { + "cell_type": "markdown", + "id": "6c372fa6", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{W}'=\\begin{bmatrix} w_{00} & w_{01} & 0 & w_{10} & w_{11} & 0 & 0 & 0 & 0 \\\\\n", + " 0 & w_{00} & w_{01} & 0 & w_{10} & w_{11} & 0 & 0 & 0 \\\\\n", + "\t\t\t0 & 0 & 0 & w_{00} & w_{01} & 0 & w_{10} & w_{11} & 0 \\\\\n", + " 0 & 0 & 0 & 0 & w_{00} & w_{01} & 0 & w_{10} & w_{11}\\end{bmatrix}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "61ad1cf3", + "metadata": { + "editable": true + }, + "source": [ + "We see easily that performing the matrix-vector multiplication $\\boldsymbol{W}'\\boldsymbol{X}'$ is the same as the above convolution with stride $S=1$, that is" + ] + }, + { + "cell_type": "markdown", + "id": "a18a70a2", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "Y=(\\boldsymbol{W}*\\boldsymbol{X}),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b63a1613", + "metadata": { + "editable": true + }, + "source": [ + "is now given by $\\boldsymbol{W}'\\boldsymbol{X}'$ which is a vector of length $4$ instead of the originally resulting $2\\times 2$ output matrix." + ] + }, + { + "cell_type": "markdown", + "id": "8fa9fe57", + "metadata": { + "editable": true + }, + "source": [ + "## The convolution stage\n", + "\n", + "The convolution stage, where we apply different filters $\\boldsymbol{W}$ in\n", + "order to reduce the dimensionality of an image, adds, in addition to\n", + "the weights and biases (to be trained by the back propagation\n", + "algorithm) that define the filters, two new hyperparameters, the so-called\n", + "**padding** $P$ and the stride $S$." + ] + }, + { + "cell_type": "markdown", + "id": "a30b6ced", + "metadata": { + "editable": true + }, + "source": [ + "## Finding the number of parameters\n", + "\n", + "In the above example we have an input matrix of dimension $3\\times\n", + "3$. 
In general we call the input for an input volume and it is defined\n", + "by its width $H_1$, height $H_1$ and depth $D_1$. If we have the\n", + "standard three color channels $D_1=3$.\n", + "\n", + "The above example has $W_1=H_1=3$ and $D_1=1$.\n", + "\n", + "When we introduce the filter we have the following additional hyperparameters\n", + "1. $K$ the number of filters. It is common to perform the convolution of the input several times since by experience shrinking the input too fast does not work well\n", + "\n", + "2. $F$ as the filter's spatial extent\n", + "\n", + "3. $S$ as the stride parameter\n", + "\n", + "4. $P$ as the padding parameter\n", + "\n", + "These parameters are defined by the architecture of the network and are not included in the training." + ] + }, + { + "cell_type": "markdown", + "id": "b38d040f", + "metadata": { + "editable": true + }, + "source": [ + "## New image (or volume)\n", + "\n", + "Acting with the filter on the input volume produces an output volume\n", + "which is defined by its width $W_2$, its height $H_2$ and its depth\n", + "$D_2$.\n", + "\n", + "These are defined by the following relations" + ] + }, + { + "cell_type": "markdown", + "id": "3b090ce0", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "W_2 = \\frac{(W_1-F+2P)}{S}+1,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "52fa4212", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "H_2 = \\frac{(H_1-F+2P)}{S}+1,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "dfa9a926", + "metadata": { + "editable": true + }, + "source": [ + "and $D_2=K$." + ] + }, + { + "cell_type": "markdown", + "id": "9bb02c26", + "metadata": { + "editable": true + }, + "source": [ + "## Parameters to train, common settings\n", + "\n", + "With parameter sharing, the convolution involves thus for each filter $F\\times F\\times D_1$ weights plus one bias parameter.\n", + "\n", + "In total we have" + ] + }, + { + "cell_type": "markdown", + "id": "d98e6808", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\left(F\\times F\\times D_1)\\right) \\times K+(K\\mathrm{--biases}),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "601ecd16", + "metadata": { + "editable": true + }, + "source": [ + "parameters to train by back propagation.\n", + "\n", + "It is common to let $K$ come in powers of $2$, that is $32$, $64$, $128$ etc.\n", + "\n", + "**Common settings.**\n", + "\n", + "1. $\\begin{array}{c} F=3 & S=1 & P=1 \\end{array}$\n", + "\n", + "2. $\\begin{array}{c} F=5 & S=1 & P=2 \\end{array}$\n", + "\n", + "3. $\\begin{array}{c} F=5 & S=2 & P=\\mathrm{open} \\end{array}$\n", + "\n", + "4. $\\begin{array}{c} F=1 & S=1 & P=0 \\end{array}$" + ] + }, + { + "cell_type": "markdown", + "id": "3f87e148", + "metadata": { + "editable": true + }, + "source": [ + "## Examples of CNN setups\n", + "\n", + "Let us assume we have an input volume $V$ given by an image of dimensionality\n", + "$32\\times 32 \\times 3$, that is three color channels and $32\\times 32$ pixels.\n", + "\n", + "We apply a filter of dimension $5\\times 5$ ten times with stride $S=1$ and padding $P=0$.\n", + "\n", + "The output volume is given by $(32-5)/1+1=28$, resulting in ten images\n", + "of dimensionality $28\\times 28\\times 3$.\n", + "\n", + "The total number of parameters to train for each filter is then\n", + "$5\\times 5\\times 3+1$, where the last parameter is the bias. 
This\n", + "gives us $76$ parameters for each filter, leading to a total of $760$\n", + "parameters for the ten filters.\n", + "\n", + "How many parameters will a filter of dimensionality $3\\times 3$\n", + "(adding color channels) result in if we produce $32$ new images? Use $S=1$ and $P=0$.\n", + "\n", + "Note that strides constitute a form of **subsampling**. As an alternative to\n", + "being interpreted as a measure of how much the kernel/filter is translated, strides\n", + "can also be viewed as how much of the output is retained. For instance, moving\n", + "the kernel by hops of two is equivalent to moving the kernel by hops of one but\n", + "retaining only odd output elements." + ] + }, + { + "cell_type": "markdown", + "id": "45526eae", + "metadata": { + "editable": true + }, + "source": [ + "## Summarizing: Performing a general discrete convolution ([From Raschka et al](https://github.com/rasbt/machine-learning-book))\n", + "\n", + "\n", + "\n", + "\n", + "

Figure 1: Performing a general discrete convolution
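To make the arithmetic above concrete, here is a minimal sketch with made-up input values; the helper functions `conv_output_size`, `conv_parameters` and `cross_correlate2d` are introduced here for illustration only and implement the output-size and parameter-count formulas from the previous sections together with a padded, strided cross-correlation.

```python
import numpy as np

def conv_output_size(W1, F, P, S):
    # W2 = (W1 - F + 2P)/S + 1, as in the formulas above
    return (W1 - F + 2 * P) // S + 1

def conv_parameters(F, D1, K):
    # (F x F x D1) weights per filter plus one bias, times K filters
    return (F * F * D1 + 1) * K

def cross_correlate2d(X, W, P=0, S=1):
    # Slide the filter W over the zero-padded input X with stride S
    Xp = np.pad(X, P)
    H_out = (Xp.shape[0] - W.shape[0]) // S + 1
    W_out = (Xp.shape[1] - W.shape[1]) // S + 1
    Y = np.zeros((H_out, W_out))
    for i in range(H_out):
        for j in range(W_out):
            patch = Xp[i*S:i*S + W.shape[0], j*S:j*S + W.shape[1]]
            Y[i, j] = np.sum(patch * W)
    return Y

# The 32x32x3 example above: ten 5x5 filters with S=1 and P=0
print(conv_output_size(32, F=5, P=0, S=1))   # 28
print(conv_parameters(F=5, D1=3, K=10))      # 760 parameters in total

# A small made-up input to exercise padding and stride
X = np.arange(16, dtype=float).reshape(4, 4)
W = np.ones((3, 3)) / 9.0                    # a simple averaging filter
print(cross_correlate2d(X, W, P=1, S=2))     # output size (4-3+2)/2+1 = 2
```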

\n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "963177d2", + "metadata": { + "editable": true + }, + "source": [ + "## Pooling\n", + "\n", + "In addition to discrete convolutions themselves, **pooling** operations\n", + "make up another important building block in CNNs. Pooling operations reduce\n", + "the size of feature maps by using some function to summarize subregions, such\n", + "as taking the average or the maximum value.\n", + "\n", + "Pooling works by sliding a window across the input and feeding the content of\n", + "the window to a **pooling function**. In some sense, pooling works very much\n", + "like a discrete convolution, but replaces the linear combination described by\n", + "the kernel with some other function." + ] + }, + { + "cell_type": "markdown", + "id": "f657465b", + "metadata": { + "editable": true + }, + "source": [ + "## Pooling arithmetic\n", + "\n", + "In a neural network, pooling layers provide invariance to small translations of\n", + "the input. The most common kind of pooling is **max pooling**, which\n", + "consists in splitting the input in (usually non-overlapping) patches and\n", + "outputting the maximum value of each patch. Other kinds of pooling exist, e.g.,\n", + "mean or average pooling, which all share the same idea of aggregating the input\n", + "locally by applying a non-linearity to the content of some patches." + ] + }, + { + "cell_type": "markdown", + "id": "33142d01", + "metadata": { + "editable": true + }, + "source": [ + "## Pooling types ([From Raschka et al](https://github.com/rasbt/machine-learning-book))\n", + "\n", + "\n", + "\n", + "\n", + "

Figure 1: Pooling types
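As a minimal sketch of the pooling operations just described (the $4\times 4$ feature map and the helper `pool2d` are made up for illustration), the function below slides a window over the input and summarizes each patch; passing `np.max` gives max pooling and `np.mean` average pooling.

```python
import numpy as np

# Hypothetical 4x4 feature map
A = np.array([[1., 3., 2., 0.],
              [4., 6., 1., 1.],
              [0., 2., 5., 7.],
              [1., 2., 3., 4.]])

def pool2d(A, size=2, stride=2, op=np.max):
    # Slide a size x size window over A and summarize each patch with op
    H_out = (A.shape[0] - size) // stride + 1
    W_out = (A.shape[1] - size) // stride + 1
    out = np.zeros((H_out, W_out))
    for i in range(H_out):
        for j in range(W_out):
            out[i, j] = op(A[i*stride:i*stride + size, j*stride:j*stride + size])
    return out

print(pool2d(A, op=np.max))    # max pooling:     [[6. 2.] [2. 7.]]
print(pool2d(A, op=np.mean))   # average pooling: [[3.5 1.] [1.25 4.75]]
```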

\n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "7e8ee265", + "metadata": { + "editable": true + }, + "source": [ + "## Building convolutional neural networks using Tensorflow and Keras\n", + "\n", + "As discussed above, CNNs are neural networks built from the assumption that the inputs\n", + "to the network are 2D images. This is important because the number of features or pixels in images\n", + "grows very fast with the image size, and an enormous number of weights and biases are needed in order to build an accurate network. \n", + "\n", + "As before, we still have our input, a hidden layer and an output. What's novel about convolutional networks\n", + "are the **convolutional** and **pooling** layers stacked in pairs between the input and the hidden layer.\n", + "In addition, the data is no longer represented as a 2D feature matrix, instead each input is a number of 2D\n", + "matrices, typically 1 for each color dimension (Red, Green, Blue)." + ] + }, + { + "cell_type": "markdown", + "id": "c4e2bc6f", + "metadata": { + "editable": true + }, + "source": [ + "## Setting it up\n", + "\n", + "It means that to represent the entire\n", + "dataset of images, we require a 4D matrix or **tensor**. This tensor has the dimensions:" + ] + }, + { + "cell_type": "markdown", + "id": "f8d6e5be", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "(n_{inputs},\\, n_{pixels, width},\\, n_{pixels, height},\\, depth) .\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "bd170ded", + "metadata": { + "editable": true + }, + "source": [ + "## The MNIST dataset again\n", + "\n", + "The MNIST dataset consists of grayscale images with a pixel size of\n", + "$28\\times 28$, meaning we require $28 \\times 28 = 724$ weights to each\n", + "neuron in the first hidden layer.\n", + "\n", + "If we were to analyze images of size $128\\times 128$ we would require\n", + "$128 \\times 128 = 16384$ weights to each neuron. Even worse if we were\n", + "dealing with color images, as most images are, we have an image matrix\n", + "of size $128\\times 128$ for each color dimension (Red, Green, Blue),\n", + "meaning 3 times the number of weights $= 49152$ are required for every\n", + "single neuron in the first hidden layer." + ] + }, + { + "cell_type": "markdown", + "id": "5f8a4322", + "metadata": { + "editable": true + }, + "source": [ + "## Strong correlations\n", + "\n", + "Images typically have strong local correlations, meaning that a small\n", + "part of the image varies little from its neighboring regions. If for\n", + "example we have an image of a blue car, we can roughly assume that a\n", + "small blue part of the image is surrounded by other blue regions.\n", + "\n", + "Therefore, instead of connecting every single pixel to a neuron in the\n", + "first hidden layer, as we have previously done with deep neural\n", + "networks, we can instead connect each neuron to a small part of the\n", + "image (in all 3 RGB depth dimensions). The size of each small area is\n", + "fixed, and known as a [receptive](https://en.wikipedia.org/wiki/Receptive_field)." + ] + }, + { + "cell_type": "markdown", + "id": "bad994c1", + "metadata": { + "editable": true + }, + "source": [ + "## Layers of a CNN\n", + "\n", + "The layers of a convolutional neural network arrange neurons in 3D: width, height and depth. \n", + "The input image is typically a square matrix of depth 3. \n", + "\n", + "A **convolution** is performed on the image which outputs\n", + "a 3D volume of neurons. 
The weights to the input are arranged in a number of 2D matrices, known as **filters**.\n", + "\n", + "Each filter slides along the input image, taking the dot product\n", + "between each small part of the image and the filter, in all depth\n", + "dimensions. This is then passed through a non-linear function,\n", + "typically the **Rectified Linear (ReLu)** function, which serves as the\n", + "activation of the neurons in the first convolutional layer. This is\n", + "further passed through a **pooling layer**, which reduces the size of the\n", + "convolutional layer, e.g. by taking the maximum or average across some\n", + "small regions, and this serves as input to the next convolutional\n", + "layer." + ] + }, + { + "cell_type": "markdown", + "id": "3f9bf131", + "metadata": { + "editable": true + }, + "source": [ + "## Systematic reduction\n", + "\n", + "By systematically reducing the size of the input volume, through\n", + "convolution and pooling, the network should create representations of\n", + "small parts of the input, and then from them assemble representations\n", + "of larger areas. The final pooling layer is flattened to serve as\n", + "input to a hidden layer, such that each neuron in the final pooling\n", + "layer is connected to every single neuron in the hidden layer. This\n", + "then serves as input to the output layer, e.g. a softmax output for\n", + "classification." + ] + }, + { + "cell_type": "markdown", + "id": "625ace40", + "metadata": { + "editable": true + }, + "source": [ + "## Prerequisites: Collect and pre-process data" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "a3f06a64", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "%matplotlib inline\n", + "\n", + "# import necessary packages\n", + "import numpy as np\n", + "import matplotlib.pyplot as plt\n", + "from sklearn import datasets\n", + "\n", + "\n", + "# ensure the same random numbers appear every time\n", + "np.random.seed(0)\n", + "\n", + "# display images in notebook\n", + "%matplotlib inline\n", + "plt.rcParams['figure.figsize'] = (12,12)\n", + "\n", + "\n", + "# download MNIST dataset\n", + "digits = datasets.load_digits()\n", + "\n", + "# define inputs and labels\n", + "inputs = digits.images\n", + "labels = digits.target\n", + "\n", + "# RGB images have a depth of 3\n", + "# our images are grayscale so they should have a depth of 1\n", + "inputs = inputs[:,:,:,np.newaxis]\n", + "\n", + "print(\"inputs = (n_inputs, pixel_width, pixel_height, depth) = \" + str(inputs.shape))\n", + "print(\"labels = (n_inputs) = \" + str(labels.shape))\n", + "\n", + "\n", + "# choose some random images to display\n", + "n_inputs = len(inputs)\n", + "indices = np.arange(n_inputs)\n", + "random_indices = np.random.choice(indices, size=5)\n", + "\n", + "for i, image in enumerate(digits.images[random_indices]):\n", + " plt.subplot(1, 5, i+1)\n", + " plt.axis('off')\n", + " plt.imshow(image, cmap=plt.cm.gray_r, interpolation='nearest')\n", + " plt.title(\"Label: %d\" % digits.target[random_indices[i]])\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "764e7143", + "metadata": { + "editable": true + }, + "source": [ + "## Importing Keras and Tensorflow" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "1b8fd15a", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "from tensorflow.keras import datasets, layers, models\n", + "from tensorflow.keras.layers import Input\n", + "from 
tensorflow.keras.models import Sequential #This allows appending layers to existing models\n", + "from tensorflow.keras.layers import Dense #This allows defining the characteristics of a particular layer\n", + "from tensorflow.keras import optimizers #This allows using whichever optimiser we want (sgd,adam,RMSprop)\n", + "from tensorflow.keras import regularizers #This allows using whichever regularizer we want (l1,l2,l1_l2)\n", + "from tensorflow.keras.utils import to_categorical #This allows using categorical cross entropy as the cost function\n", + "#from tensorflow.keras import Conv2D\n", + "#from tensorflow.keras import MaxPooling2D\n", + "#from tensorflow.keras import Flatten\n", + "\n", + "from sklearn.model_selection import train_test_split\n", + "\n", + "# representation of labels\n", + "labels = to_categorical(labels)\n", + "\n", + "# split into train and test data\n", + "# one-liner from scikit-learn library\n", + "train_size = 0.8\n", + "test_size = 1 - train_size\n", + "X_train, X_test, Y_train, Y_test = train_test_split(inputs, labels, train_size=train_size,\n", + " test_size=test_size)" + ] + }, + { + "cell_type": "markdown", + "id": "bf68c3f4", + "metadata": { + "editable": true + }, + "source": [ + "## Running with Keras" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "d5a91d0e", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "def create_convolutional_neural_network_keras(input_shape, receptive_field,\n", + " n_filters, n_neurons_connected, n_categories,\n", + " eta, lmbd):\n", + " model = Sequential()\n", + " model.add(layers.Conv2D(n_filters, (receptive_field, receptive_field), input_shape=input_shape, padding='same',\n", + " activation='relu', kernel_regularizer=regularizers.l2(lmbd)))\n", + " model.add(layers.MaxPooling2D(pool_size=(2, 2)))\n", + " model.add(layers.Flatten())\n", + " model.add(layers.Dense(n_neurons_connected, activation='relu', kernel_regularizer=regularizers.l2(lmbd)))\n", + " model.add(layers.Dense(n_categories, activation='softmax', kernel_regularizer=regularizers.l2(lmbd)))\n", + " \n", + " sgd = optimizers.SGD(lr=eta)\n", + " model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])\n", + " \n", + " return model\n", + "\n", + "epochs = 100\n", + "batch_size = 100\n", + "input_shape = X_train.shape[1:4]\n", + "receptive_field = 3\n", + "n_filters = 10\n", + "n_neurons_connected = 50\n", + "n_categories = 10\n", + "\n", + "eta_vals = np.logspace(-5, 1, 7)\n", + "lmbd_vals = np.logspace(-5, 1, 7)" + ] + }, + { + "cell_type": "markdown", + "id": "8ff4d34b", + "metadata": { + "editable": true + }, + "source": [ + "## Final part" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "c1035646", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "CNN_keras = np.zeros((len(eta_vals), len(lmbd_vals)), dtype=object)\n", + " \n", + "for i, eta in enumerate(eta_vals):\n", + " for j, lmbd in enumerate(lmbd_vals):\n", + " CNN = create_convolutional_neural_network_keras(input_shape, receptive_field,\n", + " n_filters, n_neurons_connected, n_categories,\n", + " eta, lmbd)\n", + " CNN.fit(X_train, Y_train, epochs=epochs, batch_size=batch_size, verbose=0)\n", + " scores = CNN.evaluate(X_test, Y_test)\n", + " \n", + " CNN_keras[i][j] = CNN\n", + " \n", + " print(\"Learning rate = \", eta)\n", + " print(\"Lambda = \", lmbd)\n", + " print(\"Test accuracy: %.3f\" % scores[1])\n", + " print()" + ] + }, + { + 
"cell_type": "markdown", + "id": "dcdee4b4", + "metadata": { + "editable": true + }, + "source": [ + "## Final visualization" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "c34c4218", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# visual representation of grid search\n", + "# uses seaborn heatmap, could probably do this in matplotlib\n", + "import seaborn as sns\n", + "\n", + "sns.set()\n", + "\n", + "train_accuracy = np.zeros((len(eta_vals), len(lmbd_vals)))\n", + "test_accuracy = np.zeros((len(eta_vals), len(lmbd_vals)))\n", + "\n", + "for i in range(len(eta_vals)):\n", + " for j in range(len(lmbd_vals)):\n", + " CNN = CNN_keras[i][j]\n", + "\n", + " train_accuracy[i][j] = CNN.evaluate(X_train, Y_train)[1]\n", + " test_accuracy[i][j] = CNN.evaluate(X_test, Y_test)[1]\n", + "\n", + " \n", + "fig, ax = plt.subplots(figsize = (10, 10))\n", + "sns.heatmap(train_accuracy, annot=True, ax=ax, cmap=\"viridis\")\n", + "ax.set_title(\"Training Accuracy\")\n", + "ax.set_ylabel(\"$\\eta$\")\n", + "ax.set_xlabel(\"$\\lambda$\")\n", + "plt.show()\n", + "\n", + "fig, ax = plt.subplots(figsize = (10, 10))\n", + "sns.heatmap(test_accuracy, annot=True, ax=ax, cmap=\"viridis\")\n", + "ax.set_title(\"Test Accuracy\")\n", + "ax.set_ylabel(\"$\\eta$\")\n", + "ax.set_xlabel(\"$\\lambda$\")\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "9848777f", + "metadata": { + "editable": true + }, + "source": [ + "## The CIFAR01 data set\n", + "\n", + "The CIFAR10 dataset contains 60,000 color images in 10 classes, with\n", + "6,000 images in each class. The dataset is divided into 50,000\n", + "training images and 10,000 testing images. The classes are mutually\n", + "exclusive and there is no overlap between them." + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "e3c34685", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import tensorflow as tf\n", + "\n", + "from tensorflow.keras import datasets, layers, models\n", + "import matplotlib.pyplot as plt\n", + "\n", + "# We import the data set\n", + "(train_images, train_labels), (test_images, test_labels) = datasets.cifar10.load_data()\n", + "\n", + "# Normalize pixel values to be between 0 and 1 by dividing by 255. \n", + "train_images, test_images = train_images / 255.0, test_images / 255.0" + ] + }, + { + "cell_type": "markdown", + "id": "376a2959", + "metadata": { + "editable": true + }, + "source": [ + "## Verifying the data set\n", + "\n", + "To verify that the dataset looks correct, let's plot the first 25 images from the training set and display the class name below each image." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "fa4b303c", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer',\n", + " 'dog', 'frog', 'horse', 'ship', 'truck']\n", + "plt.figure(figsize=(10,10))\n", + "for i in range(25):\n", + " plt.subplot(5,5,i+1)\n", + " plt.xticks([])\n", + " plt.yticks([])\n", + " plt.grid(False)\n", + " plt.imshow(train_images[i], cmap=plt.cm.binary)\n", + " # The CIFAR labels happen to be arrays, \n", + " # which is why you need the extra index\n", + " plt.xlabel(class_names[train_labels[i][0]])\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "8f717ab7", + "metadata": { + "editable": true + }, + "source": [ + "## Set up the model\n", + "\n", + "The 6 lines of code below define the convolutional base using a common pattern: a stack of Conv2D and MaxPooling2D layers.\n", + "\n", + "As input, a CNN takes tensors of shape (image_height, image_width, color_channels), ignoring the batch size. If you are new to these dimensions, color_channels refers to (R,G,B). In this example, you will configure our CNN to process inputs of shape (32, 32, 3), which is the format of CIFAR images. You can do this by passing the argument input_shape to our first layer." + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "91013222", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "model = models.Sequential()\n", + "model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)))\n", + "model.add(layers.MaxPooling2D((2, 2)))\n", + "model.add(layers.Conv2D(64, (3, 3), activation='relu'))\n", + "model.add(layers.MaxPooling2D((2, 2)))\n", + "model.add(layers.Conv2D(64, (3, 3), activation='relu'))\n", + "\n", + "# Let's display the architecture of our model so far.\n", + "\n", + "model.summary()" + ] + }, + { + "cell_type": "markdown", + "id": "64f3581b", + "metadata": { + "editable": true + }, + "source": [ + "You can see that the output of every Conv2D and MaxPooling2D layer is a 3D tensor of shape (height, width, channels). The width and height dimensions tend to shrink as you go deeper in the network. The number of output channels for each Conv2D layer is controlled by the first argument (e.g., 32 or 64). Typically, as the width and height shrink, you can afford (computationally) to add more output channels in each Conv2D layer." + ] + }, + { + "cell_type": "markdown", + "id": "07774fd6", + "metadata": { + "editable": true + }, + "source": [ + "## Add Dense layers on top\n", + "\n", + "To complete our model, you will feed the last output tensor from the\n", + "convolutional base (of shape (4, 4, 64)) into one or more Dense layers\n", + "to perform classification. Dense layers take vectors as input (which\n", + "are 1D), while the current output is a 3D tensor. First, you will\n", + "flatten (or unroll) the 3D output to 1D, then add one or more Dense\n", + "layers on top. CIFAR has 10 output classes, so you use a final Dense\n", + "layer with 10 outputs and a softmax activation." 
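Note that in the code cell that follows the softmax is left implicit: the last layer is `Dense(10)`, so the model outputs raw logits, and the compile step later uses `SparseCategoricalCrossentropy(from_logits=True)`. A hedged sketch of an equivalent head with an explicit softmax, assuming the (4, 4, 64) output volume mentioned above, could look as follows; it would then pair with `from_logits=False`.

```python
# A sketch only, not the cell below: the same classification head with an
# explicit softmax on the assumed (4, 4, 64) output volume of the conv base.
import tensorflow as tf
from tensorflow.keras import layers, models

head = models.Sequential([
    layers.Flatten(input_shape=(4, 4, 64)),
    layers.Dense(64, activation='relu'),
    layers.Dense(10, activation='softmax'),
])
loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False)
head.summary()
```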
+ ] + }, + { + "cell_type": "code", + "execution_count": 9, + "id": "a6dc1206", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "model.add(layers.Flatten())\n", + "model.add(layers.Dense(64, activation='relu'))\n", + "model.add(layers.Dense(10))\n", + "Here's the complete architecture of our model.\n", + "\n", + "model.summary()" + ] + }, + { + "cell_type": "markdown", + "id": "71ef5715", + "metadata": { + "editable": true + }, + "source": [ + "As you can see, our (4, 4, 64) outputs were flattened into vectors of shape (1024) before going through two Dense layers." + ] + }, + { + "cell_type": "markdown", + "id": "596eaf51", + "metadata": { + "editable": true + }, + "source": [ + "## Compile and train the model" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "id": "1c8159af", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "model.compile(optimizer='adam',\n", + " loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),\n", + " metrics=['accuracy'])\n", + "\n", + "history = model.fit(train_images, train_labels, epochs=10, \n", + " validation_data=(test_images, test_labels))" + ] + }, + { + "cell_type": "markdown", + "id": "23913f02", + "metadata": { + "editable": true + }, + "source": [ + "## Finally, evaluate the model" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "id": "942cf136", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "plt.plot(history.history['accuracy'], label='accuracy')\n", + "plt.plot(history.history['val_accuracy'], label = 'val_accuracy')\n", + "plt.xlabel('Epoch')\n", + "plt.ylabel('Accuracy')\n", + "plt.ylim([0.5, 1])\n", + "plt.legend(loc='lower right')\n", + "\n", + "test_loss, test_acc = model.evaluate(test_images, test_labels, verbose=2)\n", + "\n", + "print(test_acc)" + ] + }, + { + "cell_type": "markdown", + "id": "9cf8f35b", + "metadata": { + "editable": true + }, + "source": [ + "## Building code using Pytorch\n", + "\n", + "This code loads and normalizes the MNIST dataset. Thereafter it defines a CNN architecture with:\n", + "1. Two convolutional layers\n", + "\n", + "2. Max pooling\n", + "\n", + "3. Dropout for regularization\n", + "\n", + "4. Two fully connected layers\n", + "\n", + "It uses the Adam optimizer and for cost function it employs the\n", + "Cross-Entropy function. It trains for 10 epochs.\n", + "You can modify the architecture (number of layers, channels, dropout\n", + "rate) or training parameters (learning rate, batch size, epochs) to\n", + "experiment with different configurations." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 12, + "id": "3f08edcf", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import torch\n", + "import torch.nn as nn\n", + "import torch.nn.functional as F\n", + "import torch.optim as optim\n", + "from torchvision import datasets, transforms\n", + "\n", + "# Set device\n", + "device = torch.device(\"cuda\" if torch.cuda.is_available() else \"cpu\")\n", + "\n", + "# Define transforms\n", + "transform = transforms.Compose([\n", + " transforms.ToTensor(),\n", + " transforms.Normalize((0.1307,), (0.3081,))\n", + "])\n", + "\n", + "# Load datasets\n", + "train_dataset = datasets.MNIST(root='./data', train=True, download=True, transform=transform)\n", + "test_dataset = datasets.MNIST(root='./data', train=False, download=True, transform=transform)\n", + "\n", + "# Create data loaders\n", + "train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=64, shuffle=True)\n", + "test_loader = torch.utils.data.DataLoader(test_dataset, batch_size=64, shuffle=False)\n", + "\n", + "# Define CNN model\n", + "class CNN(nn.Module):\n", + " def __init__(self):\n", + " super(CNN, self).__init__()\n", + " self.conv1 = nn.Conv2d(1, 32, 3, padding=1)\n", + " self.conv2 = nn.Conv2d(32, 64, 3, padding=1)\n", + " self.pool = nn.MaxPool2d(2, 2)\n", + " self.fc1 = nn.Linear(64*7*7, 1024)\n", + " self.fc2 = nn.Linear(1024, 10)\n", + " self.dropout = nn.Dropout(0.5)\n", + "\n", + " def forward(self, x):\n", + " x = self.pool(F.relu(self.conv1(x)))\n", + " x = self.pool(F.relu(self.conv2(x)))\n", + " x = x.view(-1, 64*7*7)\n", + " x = self.dropout(F.relu(self.fc1(x)))\n", + " x = self.fc2(x)\n", + " return x\n", + "\n", + "# Initialize model, loss function, and optimizer\n", + "model = CNN().to(device)\n", + "criterion = nn.CrossEntropyLoss()\n", + "optimizer = optim.Adam(model.parameters(), lr=0.001)\n", + "\n", + "# Training loop\n", + "num_epochs = 10\n", + "for epoch in range(num_epochs):\n", + " model.train()\n", + " running_loss = 0.0\n", + " for batch_idx, (data, target) in enumerate(train_loader):\n", + " data, target = data.to(device), target.to(device)\n", + " optimizer.zero_grad()\n", + " outputs = model(data)\n", + " loss = criterion(outputs, target)\n", + " loss.backward()\n", + " optimizer.step()\n", + " running_loss += loss.item()\n", + "\n", + " print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {running_loss/len(train_loader):.4f}')\n", + "\n", + "# Testing the model\n", + "model.eval()\n", + "correct = 0\n", + "total = 0\n", + "with torch.no_grad():\n", + " for data, target in test_loader:\n", + " data, target = data.to(device), target.to(device)\n", + " outputs = model(data)\n", + " _, predicted = torch.max(outputs.data, 1)\n", + " total += target.size(0)\n", + " correct += (predicted == target).sum().item()\n", + "\n", + "print(f'Test Accuracy: {100 * correct / total:.2f}%')" + ] + } + ], + "metadata": {}, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/doc/LectureNotes/_build/html/chapter1.html b/doc/LectureNotes/_build/html/chapter1.html index c323aa723..ca7455c5a 100644 --- a/doc/LectureNotes/_build/html/chapter1.html +++ b/doc/LectureNotes/_build/html/chapter1.html @@ -230,10 +230,45 @@
  • Exercises week 36
  • Week 36: Linear Regression and Gradient descent
  • Exercises week 37
  • +
  • Week 37: Gradient descent methods
  • +
  • Exercises week 38
  • +
  • Week 38: Statistical analysis, bias-variance tradeoff and resampling methods
  • +
  • Exercises week 39
  • +
  • Week 39: Resampling methods and logistic regression
  • +
  • Week 40: Gradient descent methods (continued) and start Neural networks
  • +
  • Week 41 Neural networks and constructing a neural network code
  • +
  • Exercises week 41
  • + + + + + + + + +
  • Week 42 Constructing a Neural Network code with examples
  • +
  • Exercises week 42
  • + + + + + + + + + +
  • Week 43: Deep Learning: Constructing a Neural Network code and solving differential equations
  • +
  • Exercises week 43
  • + +
  • Week 44, Solving differential equations with neural networks and start Convolutional Neural Networks (CNN)
  • +
  • Exercises week 44
  • + +
  • Week 45, Convolutional Neural Networks (CNNs)
  • Projects

    diff --git a/doc/LectureNotes/_build/html/chapter10.html b/doc/LectureNotes/_build/html/chapter10.html index be40b4d4a..612413b38 100644 --- a/doc/LectureNotes/_build/html/chapter10.html +++ b/doc/LectureNotes/_build/html/chapter10.html @@ -230,10 +230,45 @@
  • Exercises week 36
  • Week 36: Linear Regression and Gradient descent
  • Exercises week 37
  • +
  • Week 37: Gradient descent methods
  • +
  • Exercises week 38
  • +
  • Week 38: Statistical analysis, bias-variance tradeoff and resampling methods
  • +
  • Exercises week 39
  • +
  • Week 39: Resampling methods and logistic regression
  • +
  • Week 40: Gradient descent methods (continued) and start Neural networks
  • +
  • Week 41 Neural networks and constructing a neural network code
  • +
  • Exercises week 41
  • + + + + + + + + +
  • Week 42 Constructing a Neural Network code with examples
  • +
  • Exercises week 42
  • + + + + + + + + + +
  • Week 43: Deep Learning: Constructing a Neural Network code and solving differential equations
  • +
  • Exercises week 43
  • + +
  • Week 44, Solving differential equations with neural networks and start Convolutional Neural Networks (CNN)
  • +
  • Exercises week 44
  • + +
  • Week 45, Convolutional Neural Networks (CNNs)
  • Projects

    diff --git a/doc/LectureNotes/_build/html/chapter11.html b/doc/LectureNotes/_build/html/chapter11.html index 21280b666..25c63e7e2 100644 --- a/doc/LectureNotes/_build/html/chapter11.html +++ b/doc/LectureNotes/_build/html/chapter11.html @@ -230,10 +230,45 @@
  • Exercises week 36
  • Week 36: Linear Regression and Gradient descent
  • Exercises week 37
  • +
  • Week 37: Gradient descent methods
  • +
  • Exercises week 38
  • +
  • Week 38: Statistical analysis, bias-variance tradeoff and resampling methods
  • +
  • Exercises week 39
  • +
  • Week 39: Resampling methods and logistic regression
  • +
  • Week 40: Gradient descent methods (continued) and start Neural networks
  • +
  • Week 41 Neural networks and constructing a neural network code
  • +
  • Exercises week 41
  • + + + + + + + + +
  • Week 42 Constructing a Neural Network code with examples
  • +
  • Exercises week 42
  • + + + + + + + + + +
  • Week 43: Deep Learning: Constructing a Neural Network code and solving differential equations
  • +
  • Exercises week 43
  • + +
  • Week 44, Solving differential equations with neural networks and start Convolutional Neural Networks (CNN)
  • +
  • Exercises week 44
  • + +
  • Week 45, Convolutional Neural Networks (CNNs)
  • Projects

    diff --git a/doc/LectureNotes/_build/html/chapter12.html b/doc/LectureNotes/_build/html/chapter12.html index b3454b65b..e3ff223d2 100644 --- a/doc/LectureNotes/_build/html/chapter12.html +++ b/doc/LectureNotes/_build/html/chapter12.html @@ -230,10 +230,45 @@
  • Exercises week 36
  • Week 36: Linear Regression and Gradient descent
  • Exercises week 37
  • +
  • Week 37: Gradient descent methods
  • +
  • Exercises week 38
  • +
  • Week 38: Statistical analysis, bias-variance tradeoff and resampling methods
  • +
  • Exercises week 39
  • +
  • Week 39: Resampling methods and logistic regression
  • +
  • Week 40: Gradient descent methods (continued) and start Neural networks
  • +
  • Week 41 Neural networks and constructing a neural network code
  • +
  • Exercises week 41
  • + + + + + + + + +
  • Week 42 Constructing a Neural Network code with examples
  • +
  • Exercises week 42
  • + + + + + + + + + +
  • Week 43: Deep Learning: Constructing a Neural Network code and solving differential equations
  • +
  • Exercises week 43
  • + +
  • Week 44, Solving differential equations with neural networks and start Convolutional Neural Networks (CNN)
  • +
  • Exercises week 44
  • + +
  • Week 45, Convolutional Neural Networks (CNNs)
  • Projects

    diff --git a/doc/LectureNotes/_build/html/chapter13.html b/doc/LectureNotes/_build/html/chapter13.html index c8c0990c6..c366b8259 100644 --- a/doc/LectureNotes/_build/html/chapter13.html +++ b/doc/LectureNotes/_build/html/chapter13.html @@ -230,10 +230,45 @@
  • Exercises week 36
  • Week 36: Linear Regression and Gradient descent
  • Exercises week 37
  • +
  • Week 37: Gradient descent methods
  • +
  • Exercises week 38
  • +
  • Week 38: Statistical analysis, bias-variance tradeoff and resampling methods
  • +
  • Exercises week 39
  • +
  • Week 39: Resampling methods and logistic regression
  • +
  • Week 40: Gradient descent methods (continued) and start Neural networks
  • +
  • Week 41 Neural networks and constructing a neural network code
  • +
  • Exercises week 41
  • + + + + + + + + +
  • Week 42 Constructing a Neural Network code with examples
  • +
  • Exercises week 42
  • + + + + + + + + + +
  • Week 43: Deep Learning: Constructing a Neural Network code and solving differential equations
  • +
  • Exercises week 43
  • + +
  • Week 44, Solving differential equations with neural networks and start Convolutional Neural Networks (CNN)
  • +
  • Exercises week 44
  • + +
  • Week 45, Convolutional Neural Networks (CNNs)
  • Projects

    diff --git a/doc/LectureNotes/_build/html/chapter2.html b/doc/LectureNotes/_build/html/chapter2.html index 2b3176071..814f23a7c 100644 --- a/doc/LectureNotes/_build/html/chapter2.html +++ b/doc/LectureNotes/_build/html/chapter2.html @@ -230,10 +230,45 @@
  • Exercises week 36
  • Week 36: Linear Regression and Gradient descent
  • Exercises week 37
  • +
  • Week 37: Gradient descent methods
  • +
  • Exercises week 38
  • +
  • Week 38: Statistical analysis, bias-variance tradeoff and resampling methods
  • +
  • Exercises week 39
  • +
  • Week 39: Resampling methods and logistic regression
  • +
  • Week 40: Gradient descent methods (continued) and start Neural networks
  • +
  • Week 41 Neural networks and constructing a neural network code
  • +
  • Exercises week 41
  • + + + + + + + + +
  • Week 42 Constructing a Neural Network code with examples
  • +
  • Exercises week 42
  • + + + + + + + + + +
  • Week 43: Deep Learning: Constructing a Neural Network code and solving differential equations
  • +
  • Exercises week 43
  • + +
  • Week 44, Solving differential equations with neural networks and start Convolutional Neural Networks (CNN)
  • +
  • Exercises week 44
  • + +
  • Week 45, Convolutional Neural Networks (CNNs)
  • Projects

    diff --git a/doc/LectureNotes/_build/html/chapter3.html b/doc/LectureNotes/_build/html/chapter3.html index a06cc0460..8accd5d6d 100644 --- a/doc/LectureNotes/_build/html/chapter3.html +++ b/doc/LectureNotes/_build/html/chapter3.html @@ -230,10 +230,45 @@
  • Exercises week 36
  • Week 36: Linear Regression and Gradient descent
  • Exercises week 37
  • +
  • Week 37: Gradient descent methods
  • +
  • Exercises week 38
  • +
  • Week 38: Statistical analysis, bias-variance tradeoff and resampling methods
  • +
  • Exercises week 39
  • +
  • Week 39: Resampling methods and logistic regression
  • +
  • Week 40: Gradient descent methods (continued) and start Neural networks
  • +
  • Week 41 Neural networks and constructing a neural network code
  • +
  • Exercises week 41
  • + + + + + + + + +
  • Week 42 Constructing a Neural Network code with examples
  • +
  • Exercises week 42
  • + + + + + + + + + +
  • Week 43: Deep Learning: Constructing a Neural Network code and solving differential equations
  • +
  • Exercises week 43
  • + +
  • Week 44, Solving differential equations with neural networks and start Convolutional Neural Networks (CNN)
  • +
  • Exercises week 44
  • + +
  • Week 45, Convolutional Neural Networks (CNNs)
  • Projects

    diff --git a/doc/LectureNotes/_build/html/chapter4.html b/doc/LectureNotes/_build/html/chapter4.html index fdb544a13..90002fb38 100644 --- a/doc/LectureNotes/_build/html/chapter4.html +++ b/doc/LectureNotes/_build/html/chapter4.html @@ -230,10 +230,45 @@
  • Exercises week 36
  • Week 36: Linear Regression and Gradient descent
  • Exercises week 37
  • +
  • Week 37: Gradient descent methods
  • +
  • Exercises week 38
  • +
  • Week 38: Statistical analysis, bias-variance tradeoff and resampling methods
  • +
  • Exercises week 39
  • +
  • Week 39: Resampling methods and logistic regression
  • +
  • Week 40: Gradient descent methods (continued) and start Neural networks
  • +
  • Week 41 Neural networks and constructing a neural network code
  • +
  • Exercises week 41
  • + + + + + + + + +
  • Week 42 Constructing a Neural Network code with examples
  • +
  • Exercises week 42
  • + + + + + + + + + +
  • Week 43: Deep Learning: Constructing a Neural Network code and solving differential equations
  • +
  • Exercises week 43
  • + +
  • Week 44, Solving differential equations with neural networks and start Convolutional Neural Networks (CNN)
  • +
  • Exercises week 44
  • + +
  • Week 45, Convolutional Neural Networks (CNNs)
  • Projects

    diff --git a/doc/LectureNotes/_build/html/chapter5.html b/doc/LectureNotes/_build/html/chapter5.html index 7e4464070..a264f255a 100644 --- a/doc/LectureNotes/_build/html/chapter5.html +++ b/doc/LectureNotes/_build/html/chapter5.html @@ -230,10 +230,45 @@
  • Exercises week 36
  • Week 36: Linear Regression and Gradient descent
  • Exercises week 37
  • +
  • Week 37: Gradient descent methods
  • +
  • Exercises week 38
  • +
  • Week 38: Statistical analysis, bias-variance tradeoff and resampling methods
  • +
  • Exercises week 39
  • +
  • Week 39: Resampling methods and logistic regression
  • +
  • Week 40: Gradient descent methods (continued) and start Neural networks
  • +
  • Week 41 Neural networks and constructing a neural network code
  • +
  • Exercises week 41
  • + + + + + + + + +
  • Week 42 Constructing a Neural Network code with examples
  • +
  • Exercises week 42
  • + + + + + + + + + +
  • Week 43: Deep Learning: Constructing a Neural Network code and solving differential equations
  • +
  • Exercises week 43
  • + +
  • Week 44, Solving differential equations with neural networks and start Convolutional Neural Networks (CNN)
  • +
  • Exercises week 44
  • + +
  • Week 45, Convolutional Neural Networks (CNNs)
  • Projects

    diff --git a/doc/LectureNotes/_build/html/chapter6.html b/doc/LectureNotes/_build/html/chapter6.html index 8f9aec998..178cf9a00 100644 --- a/doc/LectureNotes/_build/html/chapter6.html +++ b/doc/LectureNotes/_build/html/chapter6.html @@ -230,10 +230,45 @@
  • Exercises week 36
  • Week 36: Linear Regression and Gradient descent
  • Exercises week 37
  • +
  • Week 37: Gradient descent methods
  • +
  • Exercises week 38
  • +
  • Week 38: Statistical analysis, bias-variance tradeoff and resampling methods
  • +
  • Exercises week 39
  • +
  • Week 39: Resampling methods and logistic regression
  • +
  • Week 40: Gradient descent methods (continued) and start Neural networks
  • +
  • Week 41 Neural networks and constructing a neural network code
  • +
  • Exercises week 41
  • + + + + + + + + +
  • Week 42 Constructing a Neural Network code with examples
  • +
  • Exercises week 42
  • + + + + + + + + + +
  • Week 43: Deep Learning: Constructing a Neural Network code and solving differential equations
  • +
  • Exercises week 43
  • + +
  • Week 44, Solving differential equations with neural networks and start Convolutional Neural Networks (CNN)
  • +
  • Exercises week 44
  • + +
  • Week 45, Convolutional Neural Networks (CNNs)
  • Projects

    diff --git a/doc/LectureNotes/_build/html/chapter7.html b/doc/LectureNotes/_build/html/chapter7.html index 53f16bfdf..e9af01fc7 100644 --- a/doc/LectureNotes/_build/html/chapter7.html +++ b/doc/LectureNotes/_build/html/chapter7.html @@ -230,10 +230,45 @@
  • Exercises week 36
  • Week 36: Linear Regression and Gradient descent
  • Exercises week 37
  • +
  • Week 37: Gradient descent methods
  • +
  • Exercises week 38
  • +
  • Week 38: Statistical analysis, bias-variance tradeoff and resampling methods
  • +
  • Exercises week 39
  • +
  • Week 39: Resampling methods and logistic regression
  • +
  • Week 40: Gradient descent methods (continued) and start Neural networks
  • +
  • Week 41 Neural networks and constructing a neural network code
  • +
  • Exercises week 41
  • + + + + + + + + +
  • Week 42 Constructing a Neural Network code with examples
  • +
  • Exercises week 42
  • + + + + + + + + + +
  • Week 43: Deep Learning: Constructing a Neural Network code and solving differential equations
  • +
  • Exercises week 43
  • + +
  • Week 44, Solving differential equations with neural networks and start Convolutional Neural Networks (CNN)
  • +
  • Exercises week 44
  • + +
  • Week 45, Convolutional Neural Networks (CNNs)
  • Projects

    diff --git a/doc/LectureNotes/_build/html/chapter8.html b/doc/LectureNotes/_build/html/chapter8.html index 47a013e0c..a068a54b0 100644 --- a/doc/LectureNotes/_build/html/chapter8.html +++ b/doc/LectureNotes/_build/html/chapter8.html @@ -230,10 +230,45 @@
  • Exercises week 36
  • Week 36: Linear Regression and Gradient descent
  • Exercises week 37
  • +
  • Week 37: Gradient descent methods
  • +
  • Exercises week 38
  • +
  • Week 38: Statistical analysis, bias-variance tradeoff and resampling methods
  • +
  • Exercises week 39
  • +
  • Week 39: Resampling methods and logistic regression
  • +
  • Week 40: Gradient descent methods (continued) and start Neural networks
  • +
  • Week 41 Neural networks and constructing a neural network code
  • +
  • Exercises week 41
  • + + + + + + + + +
  • Week 42 Constructing a Neural Network code with examples
  • +
  • Exercises week 42
  • + + + + + + + + + +
  • Week 43: Deep Learning: Constructing a Neural Network code and solving differential equations
  • +
  • Exercises week 43
  • + +
  • Week 44, Solving differential equations with neural networks and start Convolutional Neural Networks (CNN)
  • +
  • Exercises week 44
  • + +
  • Week 45, Convolutional Neural Networks (CNNs)
  • Projects

    diff --git a/doc/LectureNotes/_build/html/chapter9.html b/doc/LectureNotes/_build/html/chapter9.html index 39622eb94..d98e219ce 100644 --- a/doc/LectureNotes/_build/html/chapter9.html +++ b/doc/LectureNotes/_build/html/chapter9.html @@ -230,10 +230,45 @@
  • Exercises week 36
  • Week 36: Linear Regression and Gradient descent
  • Exercises week 37
  • +
  • Week 37: Gradient descent methods
  • +
  • Exercises week 38
  • +
  • Week 38: Statistical analysis, bias-variance tradeoff and resampling methods
  • +
  • Exercises week 39
  • +
  • Week 39: Resampling methods and logistic regression
  • +
  • Week 40: Gradient descent methods (continued) and start Neural networks
  • +
  • Week 41 Neural networks and constructing a neural network code
  • +
  • Exercises week 41
  • + + + + + + + + +
  • Week 42 Constructing a Neural Network code with examples
  • +
  • Exercises week 42
  • + + + + + + + + + +
  • Week 43: Deep Learning: Constructing a Neural Network code and solving differential equations
  • +
  • Exercises week 43
  • + +
  • Week 44, Solving differential equations with neural networks and start Convolutional Neural Networks (CNN)
  • +
  • Exercises week 44
  • + +
  • Week 45, Convolutional Neural Networks (CNNs)
  • Projects

    diff --git a/doc/LectureNotes/_build/html/chapteroptimization.html b/doc/LectureNotes/_build/html/chapteroptimization.html index 44829966d..7c1d042b7 100644 --- a/doc/LectureNotes/_build/html/chapteroptimization.html +++ b/doc/LectureNotes/_build/html/chapteroptimization.html @@ -230,10 +230,45 @@
  • Exercises week 36
  • Week 36: Linear Regression and Gradient descent
  • Exercises week 37
  • +
  • Week 37: Gradient descent methods
  • +
  • Exercises week 38
  • +
  • Week 38: Statistical analysis, bias-variance tradeoff and resampling methods
  • +
  • Exercises week 39
  • +
  • Week 39: Resampling methods and logistic regression
  • +
  • Week 40: Gradient descent methods (continued) and start Neural networks
  • +
  • Week 41 Neural networks and constructing a neural network code
  • +
  • Exercises week 41
  • + + + + + + + + +
  • Week 42 Constructing a Neural Network code with examples
  • +
  • Exercises week 42
  • + + + + + + + + + +
  • Week 43: Deep Learning: Constructing a Neural Network code and solving differential equations
  • +
  • Exercises week 43
  • + +
  • Week 44, Solving differential equations with neural networks and start Convolutional Neural Networks (CNN)
  • +
  • Exercises week 44
  • + +
  • Week 45, Convolutional Neural Networks (CNNs)
  • Projects

    diff --git a/doc/LectureNotes/_build/html/clustering.html b/doc/LectureNotes/_build/html/clustering.html index 7175fdd80..cb49d05d2 100644 --- a/doc/LectureNotes/_build/html/clustering.html +++ b/doc/LectureNotes/_build/html/clustering.html @@ -230,10 +230,45 @@
  • Exercises week 36
  • Week 36: Linear Regression and Gradient descent
  • Exercises week 37
  • +
  • Week 37: Gradient descent methods
  • +
  • Exercises week 38
  • +
  • Week 38: Statistical analysis, bias-variance tradeoff and resampling methods
  • +
  • Exercises week 39
  • +
  • Week 39: Resampling methods and logistic regression
  • +
  • Week 40: Gradient descent methods (continued) and start Neural networks
  • +
  • Week 41 Neural networks and constructing a neural network code
  • +
  • Exercises week 41
  • + + + + + + + + +
  • Week 42 Constructing a Neural Network code with examples
  • +
  • Exercises week 42
  • + + + + + + + + + +
  • Week 43: Deep Learning: Constructing a Neural Network code and solving differential equations
  • +
  • Exercises week 43
  • + +
  • Week 44, Solving differential equations with neural networks and start Convolutional Neural Networks (CNN)
  • +
  • Exercises week 44
  • + +
  • Week 45, Convolutional Neural Networks (CNNs)
  • Projects

    diff --git a/doc/LectureNotes/_build/html/exercisesweek34.html b/doc/LectureNotes/_build/html/exercisesweek34.html index bc02792c6..67011b8c8 100644 --- a/doc/LectureNotes/_build/html/exercisesweek34.html +++ b/doc/LectureNotes/_build/html/exercisesweek34.html @@ -228,10 +228,45 @@
  • Exercises week 36
  • Week 36: Linear Regression and Gradient descent
  • Exercises week 37
  • +
  • Week 37: Gradient descent methods
  • +
  • Exercises week 38
  • +
  • Week 38: Statistical analysis, bias-variance tradeoff and resampling methods
  • +
  • Exercises week 39
  • +
  • Week 39: Resampling methods and logistic regression
  • +
  • Week 40: Gradient descent methods (continued) and start Neural networks
  • +
  • Week 41 Neural networks and constructing a neural network code
  • +
  • Exercises week 41
  • + + + + + + + + +
  • Week 42 Constructing a Neural Network code with examples
  • +
  • Exercises week 42
  • + + + + + + + + + +
  • Week 43: Deep Learning: Constructing a Neural Network code and solving differential equations
  • +
  • Exercises week 43
  • + +
  • Week 44, Solving differential equations with neural networks and start Convolutional Neural Networks (CNN)
  • +
  • Exercises week 44
  • + +
  • Week 45, Convolutional Neural Networks (CNNs)
  • Projects

    diff --git a/doc/LectureNotes/_build/html/exercisesweek35.html b/doc/LectureNotes/_build/html/exercisesweek35.html index ddaf0e727..a3f715f2c 100644 --- a/doc/LectureNotes/_build/html/exercisesweek35.html +++ b/doc/LectureNotes/_build/html/exercisesweek35.html @@ -230,10 +230,45 @@
  • Exercises week 36
  • Week 36: Linear Regression and Gradient descent
  • Exercises week 37
  • +
  • Week 37: Gradient descent methods
  • +
  • Exercises week 38
  • +
  • Week 38: Statistical analysis, bias-variance tradeoff and resampling methods
  • +
  • Exercises week 39
  • +
  • Week 39: Resampling methods and logistic regression
  • +
  • Week 40: Gradient descent methods (continued) and start Neural networks
  • +
  • Week 41 Neural networks and constructing a neural network code
  • +
  • Exercises week 41
  • + + + + + + + + +
  • Week 42 Constructing a Neural Network code with examples
  • +
  • Exercises week 42
  • + + + + + + + + + +
  • Week 43: Deep Learning: Constructing a Neural Network code and solving differential equations
  • +
  • Exercises week 43
  • + +
  • Week 44, Solving differential equations with neural networks and start Convolutional Neural Networks (CNN)
  • +
  • Exercises week 44
  • + +
  • Week 45, Convolutional Neural Networks (CNNs)
  • Projects

    @@ -557,7 +592,7 @@

    Exercise 4 - Fitting a polynomial
    n = 100
     x = np.linspace(-3, 3, n)
    -y = np.exp(-x**2) + 1.5 * np.exp(-(x-2)**2) + np.random.normal(0, 1.0)
+y = np.exp(-x**2) + 1.5 * np.exp(-(x-2)**2) + np.random.normal(0, 1.0, n)
     
    diff --git a/doc/LectureNotes/_build/html/exercisesweek36.html b/doc/LectureNotes/_build/html/exercisesweek36.html index 8a389f299..275e30f1e 100644 --- a/doc/LectureNotes/_build/html/exercisesweek36.html +++ b/doc/LectureNotes/_build/html/exercisesweek36.html @@ -230,10 +230,45 @@
  • Exercises week 36
  • Week 36: Linear Regression and Gradient descent
  • Exercises week 37
  • +
  • Week 37: Gradient descent methods
  • +
  • Exercises week 38
  • +
  • Week 38: Statistical analysis, bias-variance tradeoff and resampling methods
  • +
  • Exercises week 39
  • +
  • Week 39: Resampling methods and logistic regression
  • +
  • Week 40: Gradient descent methods (continued) and start Neural networks
  • +
  • Week 41 Neural networks and constructing a neural network code
  • +
  • Exercises week 41
  • + + + + + + + + +
  • Week 42 Constructing a Neural Network code with examples
  • +
  • Exercises week 42
  • + + + + + + + + + +
  • Week 43: Deep Learning: Constructing a Neural Network code and solving differential equations
  • +
  • Exercises week 43
  • + +
  • Week 44, Solving differential equations with neural networks and start Convolutional Neural Networks (CNN)
  • +
  • Exercises week 44
  • + +
  • Week 45, Convolutional Neural Networks (CNNs)
  • Projects

    @@ -475,7 +510,7 @@

    Exercise 3 - Scaling data
    n = 100
     x = np.linspace(-3, 3, n)
    -y = np.exp(-x**2) + 1.5 * np.exp(-(x-2)**2) + np.random.normal(0, 0.1)
+y = np.exp(-x**2) + 1.5 * np.exp(-(x-2)**2) + np.random.normal(0, 0.1, n)
     
    diff --git a/doc/LectureNotes/_build/html/exercisesweek37.html b/doc/LectureNotes/_build/html/exercisesweek37.html index a366a3c71..2595944b9 100644 --- a/doc/LectureNotes/_build/html/exercisesweek37.html +++ b/doc/LectureNotes/_build/html/exercisesweek37.html @@ -62,7 +62,7 @@ - + @@ -230,10 +230,45 @@
  • Exercises week 36
  • Week 36: Linear Regression and Gradient descent
  • Exercises week 37
  • +
  • Week 37: Gradient descent methods
  • +
  • Exercises week 38
  • +
  • Week 38: Statistical analysis, bias-variance tradeoff and resampling methods
  • +
  • Exercises week 39
  • +
  • Week 39: Resampling methods and logistic regression
  • +
  • Week 40: Gradient descent methods (continued) and start Neural networks
  • +
  • Week 41 Neural networks and constructing a neural network code
  • +
  • Exercises week 41
  • + + + + + + + + +
  • Week 42 Constructing a Neural Network code with examples
  • +
  • Exercises week 42
  • + + + + + + + + + +
  • Week 43: Deep Learning: Constructing a Neural Network code and solving differential equations
  • +
  • Exercises week 43
  • + +
  • Week 44, Solving differential equations with neural networks and start Convolutional Neural Networks (CNN)
  • +
  • Exercises week 44
  • + +
  • Week 45, Convolutional Neural Networks (CNNs)
  • Projects

    @@ -413,7 +448,8 @@

    Contents

    -
    + +

    Exercises week 37#

    Implementing gradient descent for Ridge and ordinary Least Squares Regression

    Date: September 8-12, 2025

    @@ -454,23 +490,23 @@

    1a)#

    term, the data is shifted such that the intercept is effectively 0 . (In practice, one could include an intercept in the model and not penalize it, but here we simplify by centering.) -Choose \(n=100\) data points and set up \(\boldsymbol{x}, \)\boldsymbol{y}\( and the design matrix \)\boldsymbol{X}$.

    +Choose \(n=100\) data points and set up \(\boldsymbol{x}\), \(\boldsymbol{y}\) and the design matrix \(\boldsymbol{X}\).

    -
    # Standardize features (zero mean, unit variance for each feature)
    -X_mean = X.mean(axis=0)
    -X_std = X.std(axis=0)
    -X_std[X_std == 0] = 1  # safeguard to avoid division by zero for constant features
    -X_norm = (X - X_mean) / X_std
    -
    -# Center the target to zero mean (optional, to simplify intercept handling)
    -y_mean = ?
    -y_centered = ?
    +
    # Standardize features (zero mean, unit variance for each feature)
    +X_mean = X.mean(axis=0)
    +X_std = X.std(axis=0)
    +X_std[X_std == 0] = 1  # safeguard to avoid division by zero for constant features
    +X_norm = (X - X_mean) / X_std
    +
    +# Center the target to zero mean (optional, to simplify intercept handling)
    +y_mean = ?
    +y_centered = ?
     
    -

    Fill in the necessary details.

    +

    Fill in the necessary details. Do we need to center the \(y\)-values?

    After this preprocessing, each column of \(\boldsymbol{X}_{\mathrm{norm}}\) has mean zero and standard deviation \(1\) and \(\boldsymbol{y}_{\mathrm{centered}}\) has mean 0. This makes the optimization landscape nicer and ensures the regularization penalty \(\lambda \sum_j @@ -486,16 +522,18 @@

    Exercise 2, calculate the gradients\(\boldsymbol{\theta}\)#

    -
    # Set regularization parameter, either a single value or a vector of values
    -lambda = ?
    +
    # Set regularization parameter, either a single value or a vector of values
    +# Note that lambda is a python keyword. The lambda keyword is used to create small, single-expression functions without a formal name. These are often called "anonymous functions" or "lambda functions."
    +lam = ?
    +
     
    -# Analytical form for OLS and Ridge solution: theta_Ridge = (X^T X + lambda * I)^{-1} X^T y and theta_OLS = (X^T X)^{-1} X^T y
    -I = np.eye(n_features)
    -theta_closed_formRidge = ?
    -theta_closed_formOLS = ?
    +# Analytical form for OLS and Ridge solution: theta_Ridge = (X^T X + lambda * I)^{-1} X^T y and theta_OLS = (X^T X)^{-1} X^T y
    +I = np.eye(n_features)
    +theta_closed_formRidge = ?
    +theta_closed_formOLS = ?
     
    -print("Closed-form Ridge coefficients:", theta_closed_form)
    -print("Closed-form OLS coefficients:", theta_closed_form)
    +print("Closed-form Ridge coefficients:", theta_closed_form)
    +print("Closed-form OLS coefficients:", theta_closed_form)
     
    @@ -503,7 +541,7 @@

    Exercise 3, using the analytical formulae for OLS and Ridge regression to fi

    This computes the Ridge and OLS regression coefficients directly. The identity matrix \(I\) has the same size as \(X^T X\). It adds \(\lambda\) to the diagonal of \(X^T X\) for Ridge regression. We then invert this matrix and multiply by \(X^T y\). The result -for \(\boldsymbol{\theta}\) is a NumPy array of shape (n\(\_\)features,) containing the +for \(\boldsymbol{\theta}\) is a NumPy array of shape (n\(\_\)features,) containing the fitted parameters \(\boldsymbol{\theta}\).
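For reference, one possible way to fill in the closed-form template above is sketched below. This is only a sketch, not the official solution: it assumes numpy is imported as np and that X_norm, y_centered, n_features and the regularization parameter lam have already been defined as in the previous cells; the pseudo-inverse is used for the OLS part for numerical robustness.

I = np.eye(n_features)
# OLS: theta = (X^T X)^{-1} X^T y
theta_closed_formOLS = np.linalg.pinv(X_norm.T @ X_norm) @ X_norm.T @ y_centered
# Ridge: theta = (X^T X + lambda * I)^{-1} X^T y
theta_closed_formRidge = np.linalg.inv(X_norm.T @ X_norm + lam * I) @ X_norm.T @ y_centered

print("Closed-form OLS coefficients:", theta_closed_formOLS)
print("Closed-form Ridge coefficients:", theta_closed_formRidge)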

    3a)#

    @@ -521,54 +559,45 @@

    Exercise 4, Implementing the simplest form for gradient descent\(n\) and \(p\) are so large that the closed-form might be too slow or memory-intensive. We derive the gradients from the cost functions defined above. Use the gradients of the Ridge and OLS cost functions with respect to -the parameters \(\boldsymbol{\theta}\) and set up (using the template below) your own gradient descent code for OLS and Ridge regression.

    +the parameters \(\boldsymbol{\theta}\) and set up (using the template below) your own gradient descent code for OLS and Ridge regression.

    Below is a template code for gradient descent implementation of ridge:

    -
    # Gradient descent parameters, learning rate eta first
    -eta = 0.1
    -# Then number of iterations
    -num_iters = 1000
    -
    -# Initialize weights for gradient descent
    -theta = np.zeros(n_features)
    -
    -# Arrays to store history for plotting
    -cost_history = np.zeros(num_iters)
    -
    -# Gradient descent loop
    -m = n_samples  # number of data points
    -for t in range(num_iters):
    -    # Compute prediction error
    -    error = X_norm.dot(theta) - y_centered 
    -    # Compute cost for OLS and Ridge (MSE + regularization for Ridge) for monitoring
    -    cost_OLS = ?
    -    cost_Ridge = ?
    -    # You could add a history for both methods (optional)
    -    cost_history[t] = ?
    -    # Compute gradients for OSL and Ridge
    -    grad_OLS = ?
    -    grad_Ridge = ?
    -    # Update parameters theta
    -    theta_gdOLS = ?
    -    theta_gdRidge = ? 
    -
    -# After the loop, theta contains the fitted coefficients
    -theta_gdOLS = ?
    -theta_gdRidge = ?
    -print("Gradient Descent OLS coefficients:", theta_gdOLS)
    -print("Gradient Descent Ridge coefficients:", theta_gdRidge)
    +
    # Gradient descent parameters, learning rate eta first
    +eta = 0.1
    +# Then number of iterations
    +num_iters = 1000
    +
    +# Initialize weights for gradient descent
    +theta = np.zeros(n_features)
    +
    +# Gradient descent loop
    +for t in range(num_iters):
    +    # Compute gradients for OSL and Ridge
    +    grad_OLS = ?
    +    grad_Ridge = ?
    +    # Update parameters theta
    +    theta_gdOLS = ?
    +    theta_gdRidge = ? 
    +
    +# After the loop, theta contains the fitted coefficients
    +theta_gdOLS = ?
    +theta_gdRidge = ?
    +print("Gradient Descent OLS coefficients:", theta_gdOLS)
    +print("Gradient Descent Ridge coefficients:", theta_gdRidge)
     

    4a)#

    -

    Discuss the results as function of the learning rate parameters and the number of iterations.

    +

Write first a gradient descent code for OLS only using the above template. +Discuss the results as a function of the learning rate parameter and the number of iterations.

    4b)#

    -

    Try to add a stopping parameter as function of the number iterations and the difference between the new and old \(\theta\) values. How would you define a stopping criterion?

    +

Write then a similar code for Ridge regression using the above template. +Try to add a stopping parameter as a function of the number of iterations and the difference between the new and old \(\theta\) values. How would you define a stopping criterion?
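To give an idea of what the filled-in loop can look like, here is a minimal sketch (not the required solution). It assumes numpy is imported as np and that X_norm, y_centered, n_samples, n_features, lam, eta and num_iters are defined as in the cells above, and it keeps separate parameter vectors for OLS and Ridge.

theta_gdOLS = np.zeros(n_features)
theta_gdRidge = np.zeros(n_features)

for t in range(num_iters):
    # Gradient of the cost (1/n)*||y - X theta||^2 for OLS
    grad_OLS = (2.0 / n_samples) * X_norm.T @ (X_norm @ theta_gdOLS - y_centered)
    # Ridge adds the derivative of the penalty term lam*||theta||^2
    grad_Ridge = (2.0 / n_samples) * X_norm.T @ (X_norm @ theta_gdRidge - y_centered) + 2.0 * lam * theta_gdRidge
    # Plain gradient descent updates
    theta_gdOLS = theta_gdOLS - eta * grad_OLS
    theta_gdRidge = theta_gdRidge - eta * grad_Ridge

print("Gradient Descent OLS coefficients:", theta_gdOLS)
print("Gradient Descent Ridge coefficients:", theta_gdRidge)

A simple stopping criterion could be to break out of the loop once np.linalg.norm(eta * grad_Ridge) falls below a small tolerance, or after a chosen maximum number of iterations.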

    @@ -586,24 +615,24 @@

    Exercise 5, Ridge regression and a new Synthetic Dataset
    -
    import numpy as np
    +
    import numpy as np
     
    -# Set random seed for reproducibility
    -np.random.seed(0)
    +# Set random seed for reproducibility
    +np.random.seed(0)
     
    -# Define dataset size
    -n_samples = 100
    -n_features = 10
    +# Define dataset size
    +n_samples = 100
    +n_features = 10
     
    -# Define true coefficients (sparse linear relationship)
    -theta_true = np.array([5.0, -3.0, 0.0, 0.0, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0])
    +# Define true coefficients (sparse linear relationship)
    +theta_true = np.array([5.0, -3.0, 0.0, 0.0, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0])
     
    -# Generate feature matrix X (n_samples x n_features) with random values
    -X = np.random.randn(n_samples, n_features)  # standard normal distribution
    +# Generate feature matrix X (n_samples x n_features) with random values
    +X = np.random.randn(n_samples, n_features)  # standard normal distribution
     
    -# Generate target values y with a linear combination of X and theta_true, plus noise
    -noise = 0.5 * np.random.randn(n_samples)    # Gaussian noise
    -y = X.dot @ theta_true + noise
    +# Generate target values y with a linear combination of X and theta_true, plus noise
    +noise = 0.5 * np.random.randn(n_samples)  # Gaussian noise
+y = X @ theta_true + noise
     
    @@ -623,7 +652,7 @@

    Exercise 5, Ridge regression and a new Synthetic Dataset

    next

    -

    Project 1 on Machine Learning, deadline October 6 (midnight), 2025

    +

    Week 37: Gradient descent methods

    diff --git a/doc/LectureNotes/_build/html/exercisesweek38.html b/doc/LectureNotes/_build/html/exercisesweek38.html new file mode 100644 index 000000000..5f9730b95 --- /dev/null +++ b/doc/LectureNotes/_build/html/exercisesweek38.html @@ -0,0 +1,779 @@ + + + + + + + + + + + Exercises week 38 — Applied Data Analysis and Machine Learning + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
    + + + + + + + + + + +
    +
    +
    +
    +
    + +
    + +
    + + + + + +
    +
    + + + +
    + + + + + + + + + + + + + +
    + +
    + + + +
    + +
    +
    + +
    +
    + +
    + +
    + +
    + + +
    + +
    + +
    + + + + + + + + + + + + + + + + + + + +
    + +
    + +
    +
    + + + + + + + + +
    + +
    +

    Exercises week 38#

    +
    +

    September 15-19#

    +
    +
    +

    Resampling and the Bias-Variance Trade-off#

    +
    +

    Learning goals#

    +

    After completing these exercises, you will know how to

    +
      +
  • Derive expectation values and variances related to linear regression

    • +
  • Compute expectation values and variances related to linear regression

    • +
    • Compute and evaluate the trade-off between bias and variance of a model

    • +
    +
    +
    +

    Deliverables#

    +

    Complete the following exercises while working in a jupyter notebook. Then, in canvas, include

    + +
    +
    +
    +

    Use the books!#

    +

    This week deals with various mean values and variances in linear regression methods (here it may be useful to look up chapter 3, equation (3.8) of Trevor Hastie, Robert Tibshirani, Jerome H. Friedman, The Elements of Statistical Learning, Springer).

    +

    For more discussions on Ridge regression and calculation of expectation values, Wessel van Wieringen’s article is highly recommended.

    +

    The exercises this week are also a part of project 1 and can be reused in the theory part of the project.

    +
    +

    Definitions#

    +

We assume that there exists a continuous function \(f(\boldsymbol{x})\) and a normally distributed error \(\boldsymbol{\varepsilon}\sim N(0, \sigma^2)\) which together describe our data

    +
    +\[ +\boldsymbol{y} = f(\boldsymbol{x})+\boldsymbol{\varepsilon} +\]
    +

We further assume that this continuous function can be modeled with a linear model \(\mathbf{\tilde{y}}\) of some features \(\mathbf{X}\).

    +
    +\[ +\boldsymbol{y} = \boldsymbol{\tilde{y}} + \boldsymbol{\varepsilon} = \boldsymbol{X}\boldsymbol{\beta} +\boldsymbol{\varepsilon} +\]
    +

    We therefore get that our data \(\boldsymbol{y}\) has an expectation value \(\boldsymbol{X}\boldsymbol{\beta}\) and variance \(\sigma^2\), that is \(\boldsymbol{y}\) follows a normal distribution with mean value \(\boldsymbol{X}\boldsymbol{\beta}\) and variance \(\sigma^2\).

    +
    +
    +
    +

    Exercise 1: Expectation values for ordinary least squares expressions#

    +

    a) With the expressions for the optimal parameters \(\boldsymbol{\hat{\beta}_{OLS}}\) show that

    +
    +\[ +\mathbb{E}(\boldsymbol{\hat{\beta}_{OLS}}) = \boldsymbol{\beta}. +\]
    +

    b) Show that the variance of \(\boldsymbol{\hat{\beta}_{OLS}}\) is

    +
    +\[ +\mathbf{Var}(\boldsymbol{\hat{\beta}_{OLS}}) = \sigma^2 \, (\mathbf{X}^{T} \mathbf{X})^{-1}. +\]
    +

We can use the last expression when we define a confidence interval for the parameters \(\boldsymbol{\hat{\beta}_{OLS}}\). +The variance of a given parameter \({\boldsymbol{\hat{\beta}_{OLS}}}_j\) is given by the corresponding diagonal element of the above matrix.

    +
    +
    +

    Exercise 2: Expectation values for Ridge regression#

    +

    a) With the expressions for the optimal parameters \(\boldsymbol{\hat{\beta}_{Ridge}}\) show that

    +
    +\[ +\mathbb{E} \big[ \hat{\boldsymbol{\beta}}^{\mathrm{Ridge}} \big]=(\mathbf{X}^{T} \mathbf{X} + \lambda \mathbf{I}_{pp})^{-1} (\mathbf{X}^{\top} \mathbf{X})\boldsymbol{\beta} +\]
    +

    We see that \(\mathbb{E} \big[ \hat{\boldsymbol{\beta}}^{\mathrm{Ridge}} \big] \not= \mathbb{E} \big[\hat{\boldsymbol{\beta}}^{\mathrm{OLS}}\big ]\) for any \(\lambda > 0\).

    +

    b) Show that the variance is

    +
    +\[ +\mathbf{Var}[\hat{\boldsymbol{\beta}}^{\mathrm{Ridge}}]=\sigma^2[ \mathbf{X}^{T} \mathbf{X} + \lambda \mathbf{I} ]^{-1} \mathbf{X}^{T}\mathbf{X} \{ [ \mathbf{X}^{\top} \mathbf{X} + \lambda \mathbf{I} ]^{-1}\}^{T} +\]
    +

    We see that if the parameter \(\lambda\) goes to infinity then the variance of the Ridge parameters \(\boldsymbol{\beta}\) goes to zero.

    +
    +
    +

    Exercise 3: Deriving the expression for the Bias-Variance Trade-off#

    +

    The aim of this exercise is to derive the equations for the bias-variance tradeoff to be used in project 1.

    +

    The parameters \(\boldsymbol{\hat{\beta}_{OLS}}\) are found by optimizing the mean squared error via the so-called cost function

    +
    +\[ +C(\boldsymbol{X},\boldsymbol{\beta}) =\frac{1}{n}\sum_{i=0}^{n-1}(y_i-\tilde{y}_i)^2=\mathbb{E}\left[(\boldsymbol{y}-\boldsymbol{\tilde{y}})^2\right] +\]
    +

    a) Show that you can rewrite this into an expression which contains

    +
      +
    • the variance of the model (the variance term)

    • +
    • the expected deviation of the mean of the model from the true data (the bias term)

    • +
    • the variance of the noise

    • +
    +

    In other words, show that:

    +
    +\[ +\mathbb{E}\left[(\boldsymbol{y}-\boldsymbol{\tilde{y}})^2\right]=\mathrm{Bias}[\tilde{y}]+\mathrm{var}[\tilde{y}]+\sigma^2, +\]
    +

    with

    +
    +\[ +\mathrm{Bias}[\tilde{y}]=\mathbb{E}\left[\left(\boldsymbol{y}-\mathbb{E}\left[\boldsymbol{\tilde{y}}\right]\right)^2\right], +\]
    +

    and

    +
    +\[ +\mathrm{var}[\tilde{y}]=\mathbb{E}\left[\left(\tilde{\boldsymbol{y}}-\mathbb{E}\left[\boldsymbol{\tilde{y}}\right]\right)^2\right]=\frac{1}{n}\sum_i(\tilde{y}_i-\mathbb{E}\left[\boldsymbol{\tilde{y}}\right])^2. +\]
    +

    In order to arrive at the equation for the bias, we have to approximate the unknown function \(f\) with the output/target values \(y\).

    +

    b) Explain what the terms mean and discuss their interpretations.

    +
    +
    +

    Exercise 4: Computing the Bias and Variance#

    +

    Before you compute the bias and variance of a real model for different complexities, let’s for now assume that you have sampled predictions and targets for a single model complexity using bootstrap resampling.

    +

    a) Using the expression above, compute the mean squared error, bias and variance of the given data. Check that the sum of the bias and variance correctly gives (approximately) the mean squared error.

    +
    +
    +
    import numpy as np
    +
    +n = 100
    +bootstraps = 1000
    +
    +predictions = np.random.rand(bootstraps, n) * 10 + 10
    +# The definition of targets has been updated, and was wrong earlier in the week.
    +targets = np.random.rand(1, n)
    +
    +mse = ...
    +bias = ...
    +variance = ...
    +
    +
    +
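One possible way to fill in these quantities, following the definitions in Exercise 3, is sketched below (a sketch only; it assumes predictions and targets keep the shapes generated above and treats the targets as the true values):

mean_pred = np.mean(predictions, axis=0)            # E[y_tilde] for each data point
mse = np.mean((predictions - targets) ** 2)         # average over bootstraps and data points
bias = np.mean((targets - mean_pred) ** 2)          # (squared) bias term
variance = np.mean((predictions - mean_pred) ** 2)  # variance term
print(mse, bias + variance)                         # the two numbers should agree closely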
    +
    +

    b) Change the prediction values in some way to increase the bias while decreasing the variance.

    +

    c) Change the prediction values in some way to increase the variance while decreasing the bias.

    +

d) Perform a bias-variance analysis of a polynomial OLS model fit to a one-dimensional function by computing and plotting the bias and variance values as a function of the polynomial degree of your model.

    +
    +
    +
    import numpy as np
    +import matplotlib.pyplot as plt
    +from sklearn.preprocessing import (
    +    PolynomialFeatures,
    +)  # use the fit_transform method of the created object!
    +from sklearn.linear_model import LinearRegression
    +from sklearn.metrics import mean_squared_error
    +from sklearn.model_selection import train_test_split
    +from sklearn.utils import resample
    +
    +
    +
    +
    +
    +
    +
    n = 100
    +bootstraps = 1000
    +
    +x = np.linspace(-3, 3, n)
+y = np.exp(-(x**2)) + 1.5 * np.exp(-((x - 2) ** 2)) + np.random.normal(0, 0.1, n)
    +
    +biases = []
    +variances = []
    +mses = []
    +
    +# for p in range(1, 5):
    +#    predictions = ...
    +#    targets = ...
    +#
    +#    X = ...
    +#    X_train, X_test, y_train, y_test = ...
    +#    for b in range(bootstraps):
    +#        X_train_re, y_train_re = ...
    +#
    +#        # fit your model on the sampled data
    +#
    +#        # make predictions on the test data
    +#        predictions[b, :] =
    +#        targets[b, :] =
    +#
    +#    biases.append(...)
    +#    variances.append(...)
    +#    mses.append(...)
    +
    +
    +
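As a starting point for d), the body of the degree loop could look roughly like the sketch below for a single polynomial degree p. This is an assumption-laden sketch, not the required solution: it relies on the imports in the cell above, assumes x, y, n and bootstraps are defined, fits the model with fit_intercept=False because PolynomialFeatures already adds a bias column, and keeps the test set fixed while resampling only the training data.

p = 5
X = PolynomialFeatures(p).fit_transform(x.reshape(-1, 1))
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=2024)

predictions = np.empty((bootstraps, y_test.shape[0]))
targets = y_test.reshape(1, -1)
for b in range(bootstraps):
    # resample the training data with replacement
    X_train_re, y_train_re = resample(X_train, y_train)
    # fit the model on the resampled training data
    model = LinearRegression(fit_intercept=False).fit(X_train_re, y_train_re)
    # make predictions on the fixed test data
    predictions[b, :] = model.predict(X_test)

mean_pred = np.mean(predictions, axis=0)
biases.append(np.mean((targets - mean_pred) ** 2))
variances.append(np.mean((predictions - mean_pred) ** 2))
mses.append(np.mean((predictions - targets) ** 2))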
    +
    +

    e) Discuss the bias-variance trade-off as function of your model complexity (the degree of the polynomial).

    +

    f) Compute and discuss the bias and variance as function of the number of data points (choose a suitable polynomial degree to show something interesting).

    +
    +
    +

    Exercise 5: Interpretation of scaling and metrics#

    +

In this course, we often ask you to scale data and compute various metrics. Although these practices are “standard” in the field, we will require you to demonstrate an understanding of why you need to scale data and use these metrics, both so that you can make better arguments about your results and so that you will hopefully make fewer mistakes.

    +

    First, a few reminders: In this course you should always scale the columns of the feature matrix, and sometimes scale the target data, when it is worth the effort. By scaling, we mean subtracting the mean and dividing by the standard deviation, though there are many other ways to scale data. When scaling either the feature matrix or the target data, the intercept becomes a bit harder to implement and understand, so take care.
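As a concrete illustration of what is meant by scaling here (a small self-contained sketch with a made-up feature matrix):

import numpy as np

X = np.random.rand(5, 3)                          # example feature matrix, samples as rows
X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)   # each column gets zero mean and unit standard deviation
print(X_scaled.mean(axis=0), X_scaled.std(axis=0))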

    +

    Briefly answer the following:

    +

    a) Why do we scale data?

    +

b) Why does the OLS method give practically equivalent models on scaled and unscaled data?

    +

c) Why does the Ridge method not give practically equivalent models on scaled and unscaled data? Why do we only consider the model on scaled data correct?

    +

    d) Why do we say that the Ridge method gives a biased model?

    +

    e) Is the MSE of the OLS method affected by scaling of the feature matrix? Is it affected by scaling of the target data?

    +

    f) Read about the R2 score, a metric we will ask you to use a lot later in the course. Is the R2 score of the OLS method affected by scaling of the feature matrix? Is it affected by scaling of the target data?

    +

    g) Give interpretations of the following R2 scores: 0, 0.5, 1.

    +

    h) What is an advantage of the R2 score over the MSE?

    +
    +
    + + + + +
    + + + + + + + + +
    + + + + + + +
    +
    + + +
    + + +
    +
    +
    + + + + + +
    +
    + + \ No newline at end of file diff --git a/doc/LectureNotes/_build/html/exercisesweek39.html b/doc/LectureNotes/_build/html/exercisesweek39.html new file mode 100644 index 000000000..e0b03141c --- /dev/null +++ b/doc/LectureNotes/_build/html/exercisesweek39.html @@ -0,0 +1,639 @@ + + + + + + + + + + + Exercises week 39 — Applied Data Analysis and Machine Learning + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
    + + + + + + + + + + +
    +
    +
    +
    +
    + +
    + +
    + + + + + +
    +
    + + + +
    + + + + + + + + + + + + + +
    + +
    + + + +
    + +
    +
    + +
    +
    + +
    + +
    + +
    + + +
    + +
    + +
    + + + + + + + + + + + + + + + + + + + +
    + +
    + +
    +
    + + + + + + + + +
    + +
    +

    Exercises week 39#

    +
    +

    Getting started with project 1#

    +

    The aim of the exercises this week is to aid you in getting started with writing the report. This will be discussed during the lab sessions as well.

    +

Short feedback on this exercise will be available before the project deadline, and you can reuse these elements in your final report.

    +
    +

    Learning goals#

    +

    After completing these exercises, you will know how to

    +
      +
    • Create a properly formatted report in Overleaf

    • +
    • Select and present graphs for a scientific report

    • +
    • Write an abstract and introduction for a scientific report

    • +
    +
    +
    +

    Deliverables#

    +

    Complete the following exercises while working in an Overleaf project. Then, in canvas, include

    +
      +
    • An exported PDF of the report draft you have been working on.

    • +
    • A comment linking to the github repository used in exercise 4.

    • +
    +
    +
    +
    +

    Exercise 1: Creating the report document#

    +

    We require all projects to be formatted as proper scientific reports, and this includes using LaTeX for typesetting. We strongly recommend that you use the online LaTeX editor Overleaf, as it is much easier to start using, and has excellent support for collaboration.

    +

    a) Create an account on Overleaf.com, or log in using SSO with your UiO email.

    +

    b) Download this template project.

    +

    c) Create a new Overleaf project with the correct formatting by uploading the template project.

    +

    d) Read the general guideline for writing a report, which can be found at CompPhysics/MachineLearning.

    +

    e) Look at the provided example of an earlier project, found at CompPhysics/MachineLearning

    +
    +
    +

    Exercise 2: Adding good figures#

    +

    a) Using what you have learned so far in this course, create a plot illustrating the Bias-Variance trade-off. Make sure the lines and axes are labeled, with font size being the same as in the text.

    +

    b) Add this figure to the results section of your document, with a caption that describes it. A reader should be able to understand the figure with only its contents and caption.

    +

    c) Refer to the figure in your text using \ref.

    +

    d) Create a heatmap showing the MSE of a Ridge regression model for various polynomial degrees and lambda values. Make sure the axes are labeled, and that the title or colorbar describes what is plotted.

    +

    e) Add this second figure to your document with a caption and reference in the text. All figures in the final report must be captioned and be referenced and used in the text.
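For the heatmap in 2d), one possible structure is sketched below. It is only a sketch under assumptions: it generates data in the same style as earlier weekly exercises, scales the polynomial features, uses scikit-learn's Ridge, and plots the test MSE with matplotlib; adapt the degrees, the lambda values, the labels and the styling to your own report.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

np.random.seed(2025)
n = 100
x = np.linspace(-3, 3, n)
y = np.exp(-x**2) + 1.5 * np.exp(-(x - 2)**2) + np.random.normal(0, 0.1, n)

degrees = list(range(1, 11))
lambdas = np.logspace(-5, 1, 7)
mse = np.zeros((len(degrees), len(lambdas)))

for i, p in enumerate(degrees):
    X = PolynomialFeatures(p, include_bias=False).fit_transform(x.reshape(-1, 1))
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=2025)
    scaler = StandardScaler().fit(X_train)
    X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)
    for j, lam in enumerate(lambdas):
        model = Ridge(alpha=lam).fit(X_train, y_train)
        mse[i, j] = mean_squared_error(y_test, model.predict(X_test))

fig, ax = plt.subplots()
im = ax.imshow(mse, cmap="viridis", aspect="auto")
ax.set_xticks(range(len(lambdas)))
ax.set_xticklabels([f"{lam:.0e}" for lam in lambdas])
ax.set_yticks(range(len(degrees)))
ax.set_yticklabels(degrees)
ax.set_xlabel(r"$\lambda$")
ax.set_ylabel("polynomial degree")
fig.colorbar(im, label="test MSE")
plt.show()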

    +
    +
    +

    Exercise 3: Writing an abstract and introduction#

    +

Although most of your project 1 results are not done yet, we want you to write an abstract and introduction to get you started on writing the report. It is generally a good idea to write much of a report before finishing all of the results, as you get a better understanding of your methods and inquiry from doing so, and it saves a lot of time. Where you would typically describe results in the abstract, instead make something up, just this once.

    +

    a) Read the guidelines on abstract and introduction before you start.

    +

    b) Write an abstract for project 1 in your report.

    +

    c) Write an introduction for project 1 in your report.

    +
    +
    +

    Exercise 4: Making the code available and presentable#

    +

A central part of the report is the code you write to implement the methods and generate the results. To get points for the code part of the project, you need to make your code available and presentable.

    +

    a) Create a github repository for project 1, or create a dedicated folder for project 1 in a github repository. Only one person in your group needs to do this.

    +

    b) Add a PDF of the report to this repository, after completing exercises 1-3

    +

    c) Add a folder named Code, where you can put python files for your functions and notebooks for reproducing your results.

    +

    d) Add python files for functions, and a notebook to produce the figures in exercise 2, to the Code folder. Remember to use a seed for generating random data and for train-test splits.

    +

    e) Create a README file in the repository or project folder with

    +
      +
    • the name of the group members

    • +
    • a short description of the project

    • +
    • a description of how to install the required packages to run your code from a requirements.txt file

    • +
    • names and descriptions of the various notebooks in the Code folder and the results they produce

    • +
    +
    +
    +

    Exercise 5: Referencing#

    +

    a) Add a reference to Hastie et al. using your preferred referencing style. See https://www.sokogskriv.no/referansestiler/ for an overview of styles.

    +

    b) Add a reference to sklearn like this: https://scikit-learn.org/stable/about.html#citing-scikit-learn

    +

    c) Make a prompt to your LLM of choice, and upload the exported conversation to your GitHub repository for the project.

    +

    d) At the end of the methods section of the report, write a one paragraph declaration on how and for what you have used the LLM. Link to the log on GitHub.

    +
    +
    \ No newline at end of file diff --git a/doc/LectureNotes/_build/html/exercisesweek41.html b/doc/LectureNotes/_build/html/exercisesweek41.html new file mode 100644 index 000000000..9ede978a5 --- /dev/null +++ b/doc/LectureNotes/_build/html/exercisesweek41.html @@ -0,0 +1,975 @@

    Exercises week 41#

    +

    October 6-10, 2025

    +

    Date: Deadline is Friday October 10 at midnight

    +
    +
    +

    Overarching aims of the exercises this week#

    +

    This week, you will implement the entire feed-forward pass of a neural network! Next week you will compute the gradient of the network by implementing back-propagation manually, and by using autograd which does back-propagation for you (much easier!). Next week, you will also use the gradient to optimize the network with a gradient method! However, there is an optional exercise this week to get started on training the network and getting good results!

    +

    We recommend that you do the exercises this week by editing and running this notebook file, as it includes some checks along the way that you have implemented the pieces of the feed-forward pass correctly, and running small parts of the code at a time will be important for understanding the methods.

    +

    If you have trouble running a notebook, you can run this notebook in google colab instead (https://colab.research.google.com/drive/1zKibVQf-iAYaAn2-GlKfgRjHtLnPlBX4#offline=true&sandboxMode=true), an updated link will be provided on the course discord (you can also send an email to k.h.fredly@fys.uio.no if you encounter any trouble), though we recommend that you set up VSCode and your python environment to run code like this locally.

    +

    First, here are some functions you are going to need, don’t change this cell. If you are unable to import autograd, just swap in normal numpy until you want to do the final optional exercise.

    +
    +
    +
    import autograd.numpy as np  # We need to use this numpy wrapper to make automatic differentiation work later
    +from sklearn import datasets
    +import matplotlib.pyplot as plt
    +from sklearn.metrics import accuracy_score
    +
    +
    +# Defining some activation functions
    +def ReLU(z):
    +    return np.where(z > 0, z, 0)
    +
    +
    +def sigmoid(z):
    +    return 1 / (1 + np.exp(-z))
    +
    +
    +def softmax(z):
    +    """Compute softmax values for each set of scores in the rows of the matrix z.
    +    Used with batched input data."""
    +    e_z = np.exp(z - np.max(z, axis=1, keepdims=True))  # subtract the row-wise max for numerical stability
    +    return e_z / np.sum(e_z, axis=1, keepdims=True)
    +
    +
    +def softmax_vec(z):
    +    """Compute softmax values for each set of scores in the vector z.
    +    Use this function when you use the activation function on one vector at a time"""
    +    e_z = np.exp(z - np.max(z))
    +    return e_z / np.sum(e_z)
    +
    +
    +
    +
    +
    +
    +
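
    As a quick sanity check (not part of the exercises), you can verify that the batched softmax above maps each row of a matrix to a probability distribution; the array shape below is just an example:

    probs = softmax(np.random.randn(5, 3))  # 5 samples, 3 classes
    print(probs.shape)                      # (5, 3)
    print(probs.sum(axis=1))                # each row should sum to (numerically) 1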

    Exercise 1#

    +

    In this exercise you will compute the activation of the first layer. You only need to change the code in the cells right below an exercise, the rest works out of the box. Feel free to make changes and see how stuff works though!

    +
    +
    +
    np.random.seed(2024)
    +
    +x = np.random.randn(2)  # network input. This is a single input with two features
    +W1 = np.random.randn(4, 2)  # first layer weights
    +
    +
    +
    +
    +

    a) Given the shape of the first layer weight matrix, what is the input shape of the neural network? What is the output shape of the first layer?

    +

    b) Define the bias of the first layer, b1, with the correct shape. (Run the next cell right after the previous one so that the randomly generated values line up with the test solution below.)

    +
    +
    +
    b1 = ...
    +
    +
    +
    +
    +

    c) Compute the intermediary z1 for the first layer

    +
    +
    +
    z1 = ...
    +
    +
    +
    +
    +

    d) Compute the activation a1 for the first layer using the ReLU activation function defined earlier.

    +
    +
    +
    a1 = ...
    +
    +
    +
    +
    +

    Confirm that you got the correct activation with the test below. Make sure that you define b1 with the randn function right after you define W1.

    +
    +
    +
    sol1 = np.array([0.60610368, 4.0076268, 0.0, 0.56469864])
    +
    +print(np.allclose(a1, sol1))
    +
    +
    +
    +
    +
    +
    +

    Exercise 2#

    +

    Now we will add a layer to the network with an output of length 8 and ReLU activation.

    +

    a) What is the input of the second layer? What is its shape?

    +

    b) Define the weight and bias of the second layer with the right shapes.

    +
    +
    +
    W2 = ...
    +b2 = ...
    +
    +
    +
    +
    +

    c) Compute the intermediary z2 and activation a2 for the second layer.

    +
    +
    +
    z2 = ...
    +a2 = ...
    +
    +
    +
    +
    +

    Confirm that you got the correct activation shape with the test below.

    +
    +
    +
    print(
    +    np.allclose(np.exp(len(a2)), 2980.9579870417283)
    +)  # This should evaluate to True if a2 has the correct shape :)
    +
    +
    +
    +
    +
    +
    +

    Exercise 3#

    +

    We often want our neural networks to have many layers of varying sizes. To avoid writing very long and error-prone code where we explicitly define and evaluate each layer we should keep all our layers in a single variable which is easy to create and use.

    +

    a) Complete the function below so that it returns a list layers of weight and bias tuples (W, b) for each layer, in order, with the correct shapes that we can use later as our network parameters.

    +
    +
    +
    def create_layers(network_input_size, layer_output_sizes):
    +    layers = []
    +
    +    i_size = network_input_size
    +    for layer_output_size in layer_output_sizes:
    +        W = ...
    +        b = ...
    +        layers.append((W, b))
    +
    +        i_size = layer_output_size
    +    return layers
    +
    +
    +
    +
    +

    b) Complete the function below so that it evaluates the intermediary z and activation a for each layer, with ReLU activation, and returns the final activation a. This is the complete feed-forward pass, a full neural network!

    +
    +
    +
    def feed_forward_all_relu(layers, input):
    +    a = input
    +    for W, b in layers:
    +        z = ...
    +        a = ...
    +    return a
    +
    +
    +
    +
    +

    c) Create a network with input size 8 and layers with output sizes 10, 16, 6, 2. Evaluate it and make sure that you get the correct size vectors along the way.

    +
    +
    +
    input_size = ...
    +layer_output_sizes = [...]
    +
    +x = np.random.rand(input_size)
    +layers = ...
    +predict = ...
    +print(predict)
    +
    +
    +
    +
    +

    d) Why is a neural network with no activation functions mathematically equivalent to (i.e., reducible to) a neural network with only one layer?
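
    As a hint for d), consider two layers with no activation functions; the composition is itself just one affine map (a sketch of the algebra, assuming weights \(W_1, W_2\) and biases \(b_1, b_2\)):

    \[
    W_2(W_1 x + b_1) + b_2 = (W_2 W_1)x + (W_2 b_1 + b_2) = \tilde{W} x + \tilde{b}.
    \]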

    +
    +
    +

    Exercise 4 - Custom activation for each layer#

    +

    So far, every layer has used the same activation, ReLU. We often want to use other types of activation functions, however, so we need to update our code to support multiple types of activation functions. Make sure that you have completed every previous exercise before trying this one.

    +

    a) Complete the feed_forward function which accepts a list of activation functions as an argument, and which evaluates these activation functions at each layer.

    +
    +
    +
    def feed_forward(input, layers, activation_funcs):
    +    a = input
    +    for (W, b), activation_func in zip(layers, activation_funcs):
    +        z = ...
    +        a = ...
    +    return a
    +
    +
    +
    +
    +

    b) You are now given a list with three activation functions, two ReLU and one sigmoid. (Don’t call them yet! You can make a list with function names as elements, and then call these elements of the list later. If you add functions other than the ones defined at the start of the notebook, make sure everything is defined using autograd’s numpy wrapper, like above, since we want to use automatic differentiation on all of these functions later.)

    +

    Evaluate a network with three layers and these activation functions.

    +
    +
    +
    network_input_size = ...
    +layer_output_sizes = [...]
    +activation_funcs = [ReLU, ReLU, sigmoid]
    +layers = ...
    +
    +x = np.random.randn(network_input_size)
    +feed_forward(x, layers, activation_funcs)
    +
    +
    +
    +
    +

    c) How does the output of the network change if you use sigmoid in the hidden layers and ReLU in the output layer?

    +
    +
    +

    Exercise 5 - Processing multiple inputs at once#

    +

    So far, the feed forward function has taken one input vector as an input. This vector then undergoes a linear transformation and then an element-wise non-linear operation for each layer. This approach of sending one vector in at a time is great for interpreting how the network transforms data with its linear and non-linear operations, but not the best for numerical efficiency. Now, we want to be able to send many inputs through the network at once. This will make the code a bit harder to understand, but it will make it faster, and more compact. It will be worth the trouble.

    +

    To process multiple inputs at once, while still performing the same operations, you will only need to flip a couple things around.

    +

    a) Complete the function create_layers_batch so that the weight matrix is the transpose of what it was when you only sent in one input at a time.

    +
    +
    +
    def create_layers_batch(network_input_size, layer_output_sizes):
    +    layers = []
    +
    +    i_size = network_input_size
    +    for layer_output_size in layer_output_sizes:
    +        W = ...
    +        b = ...
    +        layers.append((W, b))
    +
    +        i_size = layer_output_size
    +    return layers
    +
    +
    +
    +
    +

    b) Make a matrix of inputs with the shape (number of inputs, number of features); you choose the number of inputs and features per input. Then complete the function feed_forward_batch so that you can process this matrix of inputs with only one matrix multiplication and one broadcasted vector addition per layer. (Hint: You will only need to swap two variables around from your previous implementation, but remember to test that you get the same results for equivalent inputs!)

    +
    +
    +
    inputs = np.random.rand(1000, 4)
    +
    +
    +def feed_forward_batch(inputs, layers, activation_funcs):
    +    a = inputs
    +    for (W, b), activation_func in zip(layers, activation_funcs):
    +        z = ...
    +        a = ...
    +    return a
    +
    +
    +
    +
    +

    c) Create and evaluate a neural network with 4 input features, and layers with output sizes 12, 10, 3 and activations ReLU, ReLU, softmax.

    +
    +
    +
    network_input_size = ...
    +layer_output_sizes = [...]
    +activation_funcs = [...]
    +layers = create_layers_batch(network_input_size, layer_output_sizes)
    +
    +x = np.random.randn(network_input_size)
    +feed_forward_batch(inputs, layers, activation_funcs)
    +
    +
    +
    +
    +

    You should use this batched approach moving forward, as it will lead to much more compact code. However, remember that each input is still treated separately, and that you will need to keep in mind the transposed weight matrix and other details when implementing backpropagation.
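
    To make the shape bookkeeping concrete, here is a small sketch (the batch size 3, input size 4 and layer size 5 are arbitrary) contrasting the single-vector and batched conventions:

    import numpy as np

    x = np.random.rand(4)             # one input vector with 4 features
    X = np.random.rand(3, 4)          # a batch of 3 such inputs

    # Single-vector convention: W has shape (outputs, inputs)
    W_single = np.random.rand(5, 4)
    b = np.random.rand(5)
    print((W_single @ x + b).shape)   # (5,)

    # Batched convention: W is the transpose, shape (inputs, outputs)
    W_batch = W_single.T
    print((X @ W_batch + b).shape)    # (3, 5); b is broadcast over the batch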

    +
    +
    +

    Exercise 6 - Predicting on real data#

    +

    You will now evaluate your neural network on the iris data set (https://scikit-learn.org/1.5/auto_examples/datasets/plot_iris_dataset.html).

    +

    This dataset contains data on 150 flowers of 3 different types which can be separated pretty well using the four features given for each flower: the widths and lengths of their petals and sepals. You will later train your network to actually make good predictions.

    +
    +
    +
    iris = datasets.load_iris()
    +
    +_, ax = plt.subplots()
    +scatter = ax.scatter(iris.data[:, 0], iris.data[:, 1], c=iris.target)
    +ax.set(xlabel=iris.feature_names[0], ylabel=iris.feature_names[1])
    +_ = ax.legend(
    +    scatter.legend_elements()[0], iris.target_names, loc="lower right", title="Classes"
    +)
    +
    +
    +
    +
    +
    +
    +
    inputs = iris.data
    +
    +# Since each prediction is a vector with a score for each of the three types of flowers,
    +# we need to make each target a vector with a 1 for the correct flower and a 0 for the others.
    +targets = np.zeros((len(iris.data), 3))
    +for i, t in enumerate(iris.target):
    +    targets[i, t] = 1
    +
    +
    +def accuracy(predictions, targets):
    +    one_hot_predictions = np.zeros(predictions.shape)
    +
    +    for i, prediction in enumerate(predictions):
    +        one_hot_predictions[i, np.argmax(prediction)] = 1
    +    return accuracy_score(one_hot_predictions, targets)
    +
    +
    +
    +
    +

    a) What should the input size for the network be with this dataset? What should the output size of the last layer be?

    +

    b) Create a network with two hidden layers, the first with sigmoid activation and the last with softmax. The first layer should have 8 “nodes”, and the second should have the number of nodes you found in exercise a). Softmax returns a “probability distribution”, in the sense that the numbers in the output are positive and add up to 1, and their magnitudes reflect the relative magnitudes of the values before they went through the softmax function. Remember to use the batched versions of the create_layers and feed forward functions.

    +
    +
    +
    ...
    +layers = ...
    +
    +
    +
    +
    +

    c) Evaluate your model on the entire iris dataset! For later purposes, we will split the data into train and test sets, and compute gradients on smaller batches of the training data. But for now, evaluate the network on the whole thing at once.

    +
    +
    +
    predictions = feed_forward_batch(inputs, layers, activation_funcs)
    +
    +
    +
    +
    +

    d) Compute the accuracy of your model using the accuracy function defined above. Recreate your model a couple times and see how the accuracy changes.

    +
    +
    +
    print(accuracy(predictions, targets))
    +
    +
    +
    +
    +
    +
    +

    Exercise 7 - Training on real data (Optional)#

    +

    To be able to actually do anything useful with your neural network, you need to train it. For this, we need a cost function and a way to take the gradient of the cost function wrt. the network parameters. The following exercises guide you through taking the gradient using autograd, and updating the network parameters using the gradient. Feel free to implement gradient methods like Adam if you finish everything.
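
    Schematically, the parameter update you will implement later in train_network is, for each weight matrix \(W\) and bias \(b\) and with learning rate \(\eta\),

    \[
    W \leftarrow W - \eta \frac{\partial C}{\partial W}, \qquad b \leftarrow b - \eta \frac{\partial C}{\partial b}.
    \]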

    +

    Since we are doing a classification task with multiple output classes, we use the cross-entropy loss function, which evaluates performance on classification tasks by checking whether your prediction places the most “certainty” on the correct target.

    +
    +
    +
    def cross_entropy(predict, target):
    +    return np.sum(-target * np.log(predict))
    +
    +
    +def cost(input, layers, activation_funcs, target):
    +    predict = feed_forward_batch(input, layers, activation_funcs)
    +    return cross_entropy(predict, target)
    +
    +
    +
    +
    +

    To improve our network on whatever prediction task we have given it, we need to use a sensible cost function, take the gradient of that cost function with respect to our network parameters, the weights and biases, and then update the weights and biases using these gradients. To clarify, we need to find and use these

    +
    +\[ +\frac{\partial C}{\partial W}, \frac{\partial C}{\partial b} +\]
    +

    Now we need to compute these gradients. This is pretty hard to do by hand for a neural network, and we will spend most of next week on it, but we can also use autograd to just do it for us, which is what we typically do in practice. With the code cell below, we create a function which computes all of these gradients for us.

    +
    +
    +
    from autograd import grad
    +
    +
    +gradient_func = grad(
    +    cost, 1
    +)  # Taking the gradient wrt. the second input to the cost function, i.e. the layers
    +
    +
    +
    +
    +

    a) What shape should the gradient of the cost function wrt. weights and biases be?

    +

    b) Use the gradient_func function to take the gradient of the cross entropy wrt. the weights and biases of the network. Check the shapes of what’s inside. What does the grad func from autograd actually do?

    +
    +
    +
    layers_grad = gradient_func(
    +    inputs, layers, activation_funcs, targets
    +)  # Don't change this
    +
    +
    +
    +
    +

    c) Finish the train_network function.

    +
    +
    +
    def train_network(
    +    inputs, layers, activation_funcs, targets, learning_rate=0.001, epochs=100
    +):
    +    for i in range(epochs):
    +        layers_grad = gradient_func(inputs, layers, activation_funcs, targets)
    +        for (W, b), (W_g, b_g) in zip(layers, layers_grad):
    +            W -= ...
    +            b -= ...
    +
    +
    +
    +
    +

    d) What do we call the gradient method used above?

    +

    e) Train your network and see how the accuracy changes! Make a plot if you want.

    +
    +
    +
    ...
    +
    +
    +
    +
    +

    f) How high an accuracy is it possible to achieve with a neural network on this dataset, if we use the whole thing as training data?

    +
    \ No newline at end of file diff --git a/doc/LectureNotes/_build/html/exercisesweek42.html b/doc/LectureNotes/_build/html/exercisesweek42.html new file mode 100644 index 000000000..b9a532931 --- /dev/null +++ b/doc/LectureNotes/_build/html/exercisesweek42.html @@ -0,0 +1,1031 @@

    Exercises week 42#

    +

    October 13-17, 2025

    +

    Date: Deadline is Friday October 17 at midnight

    +
    +
    +

    Overarching aims of the exercises this week#

    +

    The aim of the exercises this week is to train the neural network you implemented last week.

    +

    To train neural networks, we use gradient descent, since there is no analytical expression for the optimal parameters. This means you will need to compute the gradient of the cost function wrt. the network parameters. And then you will need to implement some gradient method.

    +

    You will begin by computing gradients for a network with one layer, then two layers, then any number of layers. Keeping track of the shapes and doing things step by step will be very important this week.

    +

    We recommend that you do the exercises this week by editing and running this notebook file, as it includes some checks along the way that you have implemented the neural network correctly, and running small parts of the code at a time will be important for understanding the methods. If you have trouble running a notebook, you can run this notebook in google colab instead(https://colab.research.google.com/drive/1FfvbN0XlhV-lATRPyGRTtTBnJr3zNuHL#offline=true&sandboxMode=true), though we recommend that you set up VSCode and your python environment to run code like this locally.

    +

    First, some setup code that you will need.

    +
    +
    +
    import autograd.numpy as np  # We need to use this numpy wrapper to make automatic differentiation work later
    +from autograd import grad, elementwise_grad
    +from sklearn import datasets
    +import matplotlib.pyplot as plt
    +from sklearn.metrics import accuracy_score
    +
    +
    +# Defining some activation functions
    +def ReLU(z):
    +    return np.where(z > 0, z, 0)
    +
    +
    +# Derivative of the ReLU function
    +def ReLU_der(z):
    +    return np.where(z > 0, 1, 0)
    +
    +
    +def sigmoid(z):
    +    return 1 / (1 + np.exp(-z))
    +
    +
    +def mse(predict, target):
    +    return np.mean((predict - target) ** 2)
    +
    +
    +
    +
    +
    +
    +

    Exercise 1 - Understand the feed forward pass#

    +

    a) Complete last week’s exercises if you haven’t already (recommended).

    +
    +
    +

    Exercise 2 - Gradient with one layer using autograd#

    +

    For the first few exercises, we will not use batched inputs. Only a single input vector is passed through the layer at a time.

    +

    In this exercise you will compute the gradient of a single layer. You only need to change the code in the cells right below an exercise, the rest works out of the box. Feel free to make changes and see how stuff works though!

    +

    a) If the weights and bias of a layer have shapes (10, 4) and (10), what will the shapes of the gradients of the cost function wrt. these weights and this bias be?

    +

    b) Complete the feed_forward_one_layer function. It should use the sigmoid activation function. Also define the weight and bias with the correct shapes.

    +
    +
    +
    def feed_forward_one_layer(W, b, x):
    +    z = ...
    +    a = ...
    +    return a
    +
    +
    +def cost_one_layer(W, b, x, target):
    +    predict = feed_forward_one_layer(W, b, x)
    +    return mse(predict, target)
    +
    +
    +x = np.random.rand(2)
    +target = np.random.rand(3)
    +
    +W = ...
    +b = ...
    +
    +
    +
    +
    +

    c) Compute the gradient of the cost function wrt. the weight and bias by running the cell below. You will not need to change anything; just make sure it runs by defining things correctly in the cell above. This code uses the autograd package, which uses backpropagation to compute the gradient!

    +
    +
    +
    autograd_one_layer = grad(cost_one_layer, [0, 1])
    +W_g, b_g = autograd_one_layer(W, b, x, target)
    +print(W_g, b_g)
    +
    +
    +
    +
    +
    +
    +

    Exercise 3 - Gradient with one layer writing backpropagation by hand#

    +

    Before you use the gradient you found using autograd, you will have to find the gradient “manually”, to better understand how the backpropagation computation works. To do backpropagation “manually”, you will need to write out expressions for many derivatives along the computation.

    +

    We want to find the gradient of the cost function wrt. the weight and bias. This is quite hard to do directly, so we instead use the chain rule to combine multiple derivatives which are easier to compute.

    +
    +\[ +\frac{dC}{dW} = \frac{dC}{da}\frac{da}{dz}\frac{dz}{dW} +\]
    +
    +\[ +\frac{dC}{db} = \frac{dC}{da}\frac{da}{dz}\frac{dz}{db} +\]
    +

    a) Which intermediary results can be reused between the two expressions?

    +

    b) What is the derivative of the cost wrt. the final activation? You can use the autograd calculation to make sure you get the correct result. Remember that we compute the mean in mse.

    +
    +
    +
    z = W @ x + b
    +a = sigmoid(z)
    +
    +predict = a
    +
    +
    +def mse_der(predict, target):
    +    return ...
    +
    +
    +print(mse_der(predict, target))
    +
    +cost_autograd = grad(mse, 0)
    +print(cost_autograd(predict, target))
    +
    +
    +
    +
    +

    c) What is the expression for the derivative of the sigmoid activation function? You can use the autograd calculation to make sure you get the correct result.

    +
    +
    +
    def sigmoid_der(z):
    +    return ...
    +
    +
    +print(sigmoid_der(z))
    +
    +sigmoid_autograd = elementwise_grad(sigmoid, 0)
    +print(sigmoid_autograd(z))
    +
    +
    +
    +
    +

    d) Using the two derivatives you just computed, compute this intermediary gradient you will use later:

    +
    +\[ +\frac{dC}{dz} = \frac{dC}{da}\frac{da}{dz} +\]
    +
    +
    +
    dC_da = ...
    +dC_dz = ...
    +
    +
    +
    +
    +

    e) What is the derivative of the intermediary z wrt. the weight and bias? What should the shapes be? The one for the weights is a little tricky; it can be easier to play around in the next exercise first. You can also try computing it with autograd to get a hint.

    +

    f) Now combine the expressions you have worked with so far to compute the gradients! Note that you always need to do a feed forward pass while saving the z’s and a’s before you do backpropagation, as they are used in the derivative expressions.

    +
    +
    +
    dC_da = ...
    +dC_dz = ...
    +dC_dW = ...
    +dC_db = ...
    +
    +print(dC_dW, dC_db)
    +
    +
    +
    +
    +

    You should get the same results as with autograd.

    +
    +
    +
    W_g, b_g = autograd_one_layer(W, b, x, target)
    +print(W_g, b_g)
    +
    +
    +
    +
    +
    +
    +

    Exercise 4 - Gradient with two layers writing backpropagation by hand#

    +

    Now that you have implemented backpropagation for one layer, you have found most of the expressions you will need for more layers. Let’s move up to two layers.

    +
    +
    +
    x = np.random.rand(2)
    +target = np.random.rand(4)
    +
    +W1 = np.random.rand(3, 2)
    +b1 = np.random.rand(3)
    +
    +W2 = np.random.rand(4, 3)
    +b2 = np.random.rand(4)
    +
    +layers = [(W1, b1), (W2, b2)]
    +
    +
    +
    +
    +
    +
    +
    z1 = W1 @ x + b1
    +a1 = sigmoid(z1)
    +z2 = W2 @ a1 + b2
    +a2 = sigmoid(z2)
    +
    +
    +
    +
    +

    We begin by computing the gradients of the last layer, as the gradients must be propagated backwards from the end.

    +

    a) Compute the gradients of the last layer, just like you did the single layer in the previous exercise.

    +
    +
    +
    dC_da2 = ...
    +dC_dz2 = ...
    +dC_dW2 = ...
    +dC_db2 = ...
    +
    +
    +
    +
    +

    To find the derivative of the cost wrt. the activation of the first layer, we need a new expression, the one furthest to the right in the following.

    +
    +\[ +\frac{dC}{da_1} = \frac{dC}{dz_2}\frac{dz_2}{da_1} +\]
    +

    b) What is the derivative of the second layer intermediate wrt. the first layer activation? (First recall how you compute \(z_2\).)

    +
    +\[ +\frac{dz_2}{da_1} +\]
    +

    c) Use this expression, together with expressions equivalent to the ones for the last layer, to compute all the derivatives of the first layer.

    +
    +\[ +\frac{dC}{dW_1} = \frac{dC}{da_1}\frac{da_1}{dz_1}\frac{dz_1}{dW_1} +\]
    +
    +\[ +\frac{dC}{db_1} = \frac{dC}{da_1}\frac{da_1}{dz_1}\frac{dz_1}{db_1} +\]
    +
    +
    +
    dC_da1 = ...
    +dC_dz1 = ...
    +dC_dW1 = ...
    +dC_db1 = ...
    +
    +
    +
    +
    +
    +
    +
    print(dC_dW1, dC_db1)
    +print(dC_dW2, dC_db2)
    +
    +
    +
    +
    +

    d) Make sure you got the same gradient as the following code which uses autograd to do backpropagation.

    +
    +
    +
    def feed_forward_two_layers(layers, x):
    +    W1, b1 = layers[0]
    +    z1 = W1 @ x + b1
    +    a1 = sigmoid(z1)
    +
    +    W2, b2 = layers[1]
    +    z2 = W2 @ a1 + b2
    +    a2 = sigmoid(z2)
    +
    +    return a2
    +
    +
    +
    +
    +
    +
    +
    def cost_two_layers(layers, x, target):
    +    predict = feed_forward_two_layers(layers, x)
    +    return mse(predict, target)
    +
    +
    +grad_two_layers = grad(cost_two_layers, 0)
    +grad_two_layers(layers, x, target)
    +
    +
    +
    +
    +

    e) How would you use the gradient from this layer to compute the gradient of an even earlier layer? Would the expressions be any different?

    +
    +
    +

    Exercise 5 - Gradient with any number of layers writing backpropagation by hand#

    +

    Well done on getting this far! Now it’s time to compute the gradient with any number of layers.

    +

    First, some code from the general neural network code from last week. Note that we are still sending in one input vector at a time. We will change it to use batched inputs later.

    +
    +
    +
    def create_layers(network_input_size, layer_output_sizes):
    +    layers = []
    +
    +    i_size = network_input_size
    +    for layer_output_size in layer_output_sizes:
    +        W = np.random.randn(layer_output_size, i_size)
    +        b = np.random.randn(layer_output_size)
    +        layers.append((W, b))
    +
    +        i_size = layer_output_size
    +    return layers
    +
    +
    +def feed_forward(input, layers, activation_funcs):
    +    a = input
    +    for (W, b), activation_func in zip(layers, activation_funcs):
    +        z = W @ a + b
    +        a = activation_func(z)
    +    return a
    +
    +
    +def cost(layers, input, activation_funcs, target):
    +    predict = feed_forward(input, layers, activation_funcs)
    +    return mse(predict, target)
    +
    +
    +
    +
    +

    You might already have noticed a very important detail in backpropagation: You need the values from the forward pass to compute all the gradients! The feed forward method above is great for efficiency and for using autograd, as it only cares about computing the final output, but now we need to also save the results along the way.

    +

    Here is a function which does that for you.

    +
    +
    +
    def feed_forward_saver(input, layers, activation_funcs):
    +    layer_inputs = []
    +    zs = []
    +    a = input
    +    for (W, b), activation_func in zip(layers, activation_funcs):
    +        layer_inputs.append(a)
    +        z = W @ a + b
    +        a = activation_func(z)
    +
    +        zs.append(z)
    +
    +    return layer_inputs, zs, a
    +
    +
    +
    +
    +

    a) Now, complete the backpropagation function so that it returns the gradient of the cost function wrt. all the weights and biases. Use the autograd calculation below to make sure you get the correct answer.

    +
    +
    +
    def backpropagation(
    +    input, layers, activation_funcs, target, activation_ders, cost_der=mse_der
    +):
    +    layer_inputs, zs, predict = feed_forward_saver(input, layers, activation_funcs)
    +
    +    layer_grads = [() for layer in layers]
    +
    +    # We loop over the layers, from the last to the first
    +    for i in reversed(range(len(layers))):
    +        layer_input, z, activation_der = layer_inputs[i], zs[i], activation_ders[i]
    +
    +        if i == len(layers) - 1:
    +            # For last layer we use cost derivative as dC_da(L) can be computed directly
    +            dC_da = ...
    +        else:
    +            # For other layers we build on previous z derivative, as dC_da(i) = dC_dz(i+1) * dz(i+1)_da(i)
    +            (W, b) = layers[i + 1]
    +            dC_da = ...
    +
    +        dC_dz = ...
    +        dC_dW = ...
    +        dC_db = ...
    +
    +        layer_grads[i] = (dC_dW, dC_db)
    +
    +    return layer_grads
    +
    +
    +
    +
    +
    +
    +
    network_input_size = 2
    +layer_output_sizes = [3, 4]
    +activation_funcs = [sigmoid, ReLU]
    +activation_ders = [sigmoid_der, ReLU_der]
    +
    +layers = create_layers(network_input_size, layer_output_sizes)
    +
    +x = np.random.rand(network_input_size)
    +target = np.random.rand(4)
    +
    +
    +
    +
    +
    +
    +
    layer_grads = backpropagation(x, layers, activation_funcs, target, activation_ders)
    +print(layer_grads)
    +
    +
    +
    +
    +
    +
    +
    cost_grad = grad(cost, 0)
    +cost_grad(layers, x, [sigmoid, ReLU], target)
    +
    +
    +
    +
    +
    +
    +

    Exercise 6 - Batched inputs#

    +

    Make new versions of all the functions in exercise 5 which now take batched inputs instead. See last week’s exercise 5 for details on how to batch inputs to neural networks. You will also need to update the backpropagation function.

    +
    +
    +

    Exercise 7 - Training#

    +

    a) Complete exercises 6 and 7 from last week, but use your own backpropagation implementation to compute the gradient.

    +
      +
    • IMPORTANT: Do not implement the derivative terms for softmax and cross-entropy separately, it will be very hard!

    • +
    • Instead, use the fact that the derivatives multiplied together simplify to prediction - target (see source1, source2, and the short sketch after this list)

    • +
    +
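
    A short sketch of the simplification mentioned in the list above: for a softmax output layer with one-hot targets and the cross-entropy cost, the combined derivative for the last layer is simply

    \[
    \frac{dC}{dz_L} = a_L - y,
    \]

    where \(a_L\) is the softmax output (the prediction) and \(y\) is the one-hot target; from there you proceed exactly as for the other layers.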

    b) Use stochastic gradient descent with momentum when you train your network.
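
    A minimal sketch of a momentum update for a single parameter array; the hyperparameter values and the stand-in gradient W_g are only placeholders for the gradients your backpropagation produces:

    import numpy as np

    W = np.random.randn(4, 3)          # some parameter of the network
    velocity = np.zeros_like(W)        # keep one velocity array per parameter
    learning_rate = 0.01
    momentum = 0.9

    # inside the training loop, after computing the mini-batch gradient W_g:
    W_g = np.random.randn(4, 3)        # stand-in for a real gradient
    velocity = momentum * velocity - learning_rate * W_g
    W = W + velocity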

    +
    +
    +

    Exercise 8 (Optional) - Object orientation#

    +

    Passing the layers, activation functions, activation derivatives and cost derivatives into the functions each time leads to code which is easy to understand in isolation, but messier when used in a larger context with data splitting, data scaling, gradient methods and so forth. Creating an object which stores these values can lead to code which is much easier to use.

    +

    a) Write a neural network class. You are free to implement it how you see fit, though we strongly recommend not saving any input or output values as class attributes, nor letting the neural network class handle gradient methods internally. Gradient methods should be handled outside, by performing general operations on the layer_grads list using functions or classes separate from the neural network.

    +

    We provide here a skeleton structure which should get you started.

    +
    +
    +
    class NeuralNetwork:
    +    def __init__(
    +        self,
    +        network_input_size,
    +        layer_output_sizes,
    +        activation_funcs,
    +        activation_ders,
    +        cost_fun,
    +        cost_der,
    +    ):
    +        pass
    +
    +    def predict(self, inputs):
    +        # Simple feed forward pass
    +        pass
    +
    +    def cost(self, inputs, targets):
    +        pass
    +
    +    def _feed_forward_saver(self, inputs):
    +        pass
    +
    +    def compute_gradient(self, inputs, targets):
    +        pass
    +
    +    def update_weights(self, layer_grads):
    +        pass
    +
    +    # These last two methods are not needed in the project, but they can be nice to have! The first one has a layers parameter so that you can use autograd on it
    +    def autograd_compliant_predict(self, layers, inputs):
    +        pass
    +
    +    def autograd_gradient(self, inputs, targets):
    +        pass
    +
    +
    +
    +
    +
    \ No newline at end of file diff --git a/doc/LectureNotes/_build/html/exercisesweek43.html b/doc/LectureNotes/_build/html/exercisesweek43.html new file mode 100644 index 000000000..037da43fb --- /dev/null +++ b/doc/LectureNotes/_build/html/exercisesweek43.html @@ -0,0 +1,850 @@

    Exercises week 43#

    +

    October 20-24, 2025

    +

    Date: Deadline Friday October 24 at midnight

    +
    +
    +

    Overarching aims of the exercises for week 43#

    +

    The aim of the exercises this week is to gain some confidence with ways to visualize the results of a classification problem. We will target three ways of setting up the analysis. The first and simplest one is the

    1. so-called confusion matrix. The next one is the so-called

    2. ROC curve. Finally we have the

    3. Cumulative gain curve.

    We will use Logistic Regression as the method for classification in this exercise. You can compare these results with those obtained with your neural network code from project 2 without a hidden layer.

    +

    In these exercises we will use binary and multi-class data sets +(the Iris data set from week 41).

    +

    The underlying mathematics is described here.

    +
    +

    Confusion Matrix#

    +

    A confusion matrix summarizes a classifier’s performance by +tabulating predictions versus true labels. For binary classification, +it is a \(2\times2\) table whose entries are counts of outcomes:

    +
    +\[\begin{split} +\begin{array}{l|cc} & \text{Predicted Positive} & \text{Predicted Negative} \\ \hline \text{Actual Positive} & TP & FN \\ \text{Actual Negative} & FP & TN \end{array}. +\end{split}\]
    +

    Here TP (true positives) is the number of cases correctly predicted as +positive, FP (false positives) is the number incorrectly predicted as +positive, TN (true negatives) is correctly predicted negative, and FN +(false negatives) is incorrectly predicted negative . In other words, +“positive” means class 1 and “negative” means class 0; for example, TP +occurs when the prediction and actual are both positive. Formally:

    +
    +\[ +\text{TPR} = \frac{\text{TP}}{\text{TP} + \text{FN}}, \quad \text{FPR} = \frac{\text{FP}}{\text{FP} + \text{TN}}, +\]
    +

    where TPR and FPR are the true and false positive rates defined below.

    +

    In multiclass classification with \(K\) classes, the confusion matrix +generalizes to a \(K\times K\) table. Entry \(N_{ij}\) in the table is +the count of instances whose true class is \(i\) and whose predicted +class is \(j\). For example, a three-class confusion matrix can be written +as:

    +
    +\[\begin{split} +\begin{array}{c|ccc} & \text{Pred Class 1} & \text{Pred Class 2} & \text{Pred Class 3} \\ \hline \text{Act Class 1} & N_{11} & N_{12} & N_{13} \\ \text{Act Class 2} & N_{21} & N_{22} & N_{23} \\ \text{Act Class 3} & N_{31} & N_{32} & N_{33} \end{array}. +\end{split}\]
    +

    Here the diagonal entries \(N_{ii}\) are the true positives for each +class, and off-diagonal entries are misclassifications. This matrix +allows computation of per-class metrics: e.g. for class \(i\), +\(\mathrm{TP}_i=N_{ii}\), \(\mathrm{FN}_i=\sum_{j\neq i}N_{ij}\), +\(\mathrm{FP}_i=\sum_{j\neq i}N_{ji}\), and \(\mathrm{TN}_i\) is the sum of +all remaining entries.

    +

    As defined above, TPR and FPR come from the binary case. In binary +terms with \(P\) actual positives and \(N\) actual negatives, one has

    +
    +\[ +\text{TPR} = \frac{TP}{P} = \frac{TP}{TP+FN}, \quad \text{FPR} = +\frac{FP}{N} = \frac{FP}{FP+TN}, +\]
    +

    as used in standard confusion-matrix +formulations. These rates will be used in constructing ROC curves.
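
    If you want to inspect these counts directly, scikit-learn provides a ready-made function; a small example with made-up labels (rows are actual classes, columns are predicted classes):

    from sklearn.metrics import confusion_matrix

    y_true = [0, 0, 1, 1, 1, 0, 1, 0]
    y_pred = [0, 1, 1, 1, 0, 0, 1, 0]

    print(confusion_matrix(y_true, y_pred))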

    +
    +
    +

    ROC Curve#

    +

    The Receiver Operating Characteristic (ROC) curve plots the trade-off +between true positives and false positives as a discrimination +threshold varies. Specifically, for a binary classifier that outputs +a score or probability, one varies the threshold \(t\) for declaring +positive, and computes at each \(t\) the true positive rate +\(\mathrm{TPR}(t)\) and false positive rate \(\mathrm{FPR}(t)\) using the +confusion matrix at that threshold. The ROC curve is then the graph +of TPR versus FPR. By definition,

    +
    +\[ +\mathrm{TPR} = \frac{TP}{TP+FN}, \qquad \mathrm{FPR} = \frac{FP}{FP+TN}, +\]
    +

    where \(TP,FP,TN,FN\) are counts determined by threshold \(t\). A perfect +classifier would reach the point (FPR=0, TPR=1) at some threshold.

    +

    Formally, the ROC curve is obtained by plotting +\((\mathrm{FPR}(t),\mathrm{TPR}(t))\) for all \(t\in[0,1]\) (or as \(t\) +sweeps through the sorted scores). The Area Under the ROC Curve (AUC) +quantifies the average performance over all thresholds. It can be +interpreted probabilistically: \(\mathrm{AUC} = +\Pr\bigl(s(X^+)>s(X^-)\bigr)\), the probability that a random positive +instance \(X^+\) receives a higher score \(s\) than a random negative +instance \(X^-\) . Equivalently, the AUC is the integral under the ROC +curve:

    +
    +\[ +\mathrm{AUC} \;=\; \int_{0}^{1} \mathrm{TPR}(f)\,df, +\]
    +

    where \(f\) ranges over FPR (or fraction of negatives). A model that guesses at random yields a diagonal ROC (AUC=0.5), whereas a perfect model yields AUC=1.0.
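
    For a binary classifier that outputs probabilities, the curve and its area can be obtained directly from scikit-learn; a sketch, assuming a fitted model clf with a predict_proba method and test data X_test, y_test:

    import matplotlib.pyplot as plt
    from sklearn.metrics import roc_curve, roc_auc_score

    y_score = clf.predict_proba(X_test)[:, 1]   # scores for the positive class

    fpr, tpr, thresholds = roc_curve(y_test, y_score)
    print("AUC =", roc_auc_score(y_test, y_score))

    plt.plot(fpr, tpr, label="model")
    plt.plot([0, 1], [0, 1], "--", label="random guess")
    plt.xlabel("False positive rate")
    plt.ylabel("True positive rate")
    plt.legend()
    plt.show()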

    +
    +
    +

    Cumulative Gain#

    +

    The cumulative gain curve (or gains chart) evaluates how many +positives are captured as one targets an increasing fraction of the +population, sorted by model confidence. To construct it, sort all +instances by decreasing predicted probability of the positive class. +Then, for the top \(\alpha\) fraction of instances, compute the fraction +of all actual positives that fall in this subset. In formula form, if +\(P\) is the total number of positive instances and \(P(\alpha)\) is the +number of positives among the top \(\alpha\) of the data, the cumulative +gain at level \(\alpha\) is

    +
    +\[ +\mathrm{Gain}(\alpha) \;=\; \frac{P(\alpha)}{P}. +\]
    +

    For example, cutting off at the top 10% of predictions yields a gain +equal to (positives in top 10%) divided by (total positives) . +Plotting \(\mathrm{Gain}(\alpha)\) versus \(\alpha\) (often in percent) +gives the gain curve. The baseline (random) curve is the diagonal +\(\mathrm{Gain}(\alpha)=\alpha\), while an ideal model has a steep climb +toward 1.

    +

    A related measure is the lift, often called the gain ratio. It is the ratio of the model’s capture rate to that of random selection. Equivalently,

    +
    +\[ +\mathrm{Lift}(\alpha) \;=\; \frac{\mathrm{Gain}(\alpha)}{\alpha}. +\]
    +

    A lift \(>1\) indicates better-than-random targeting. In practice, gain and lift charts (used e.g. in marketing or imbalanced classification) show how many positives can be “gained” by focusing on a fraction of the population.
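
    The gain curve itself is straightforward to compute by hand; a sketch with numpy, assuming binary labels y_test and positive-class scores y_score as in the ROC example above:

    import numpy as np
    import matplotlib.pyplot as plt

    order = np.argsort(y_score)[::-1]             # sort by decreasing predicted probability
    sorted_targets = np.asarray(y_test)[order]

    gain = np.cumsum(sorted_targets) / np.sum(sorted_targets)   # fraction of positives captured
    alpha = np.arange(1, len(gain) + 1) / len(gain)             # fraction of population targeted

    plt.plot(alpha, gain, label="model")
    plt.plot([0, 1], [0, 1], "--", label="baseline")
    plt.xlabel("Fraction of population targeted")
    plt.ylabel("Fraction of positives captured (gain)")
    plt.legend()
    plt.show()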

    +
    +
    +

    Other measures: Precision, Recall, and the F\(_1\) Measure#

    +

    Precision and recall (sensitivity) quantify binary classification +accuracy in terms of positive predictions. They are defined from the +confusion matrix as:

    +
    +\[ +\text{Precision} = \frac{TP}{TP + FP}, \qquad \text{Recall} = \frac{TP}{TP + FN}. +\]
    +

    Precision is the fraction of predicted positives that are correct, and +recall is the fraction of actual positives that are correctly +identified . A high-precision classifier makes few false-positive +errors, while a high-recall classifier makes few false-negative +errors.

    +

    The F\(_1\) score (balanced F-measure) combines precision and recall into a single metric via their harmonic mean. The usual formula is:

    +
    +\[ +F_1 =2\frac{\text{Precision}\times\text{Recall}}{\text{Precision} + \text{Recall}}. +\]
    +

    This can be shown to equal

    +
    +\[ +\frac{2\,TP}{2\,TP + FP + FN}. +\]
    +

    The F\(_1\) score ranges from 0 (worst) to 1 (best), and balances the +trade-off between precision and recall.

    +

    For multi-class classification, one computes per-class +precision/recall/F\(_1\) (treating each class as “positive” in a +one-vs-rest manner) and then averages. Common averaging methods are:

    +

    • Micro-averaging: Sum all true positives, false positives, and false negatives across classes, then compute precision/recall/F\(_1\) from these totals.

    • Macro-averaging: Compute the F\(_1\) score \(F_{1,i}\) for each class \(i\) separately, then take the unweighted mean: \(F_{1,\mathrm{macro}} = \frac{1}{K}\sum_{i=1}^K F_{1,i}\). This treats all classes equally regardless of size.

    • Weighted-averaging: Like the macro-average, but weight each class’s \(F_{1,i}\) by its support \(n_i\) (true count): \(F_{1,\mathrm{weighted}} = \frac{1}{N}\sum_{i=1}^K n_i F_{1,i}\), where \(N=\sum_i n_i\). This accounts for class imbalance by giving more weight to larger classes.

    +

    Each of these averages has different use-cases. Micro-average is +dominated by common classes, macro-average highlights performance on +rare classes, and weighted-average is a compromise. These formulas +and concepts allow rigorous evaluation of classifier performance in +both binary and multi-class settings.
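
    In practice all of these can be computed with scikit-learn; a short example with made-up multiclass labels illustrating the different averaging modes:

    from sklearn.metrics import f1_score, precision_score, recall_score

    y_true = [0, 1, 2, 2, 1, 0, 2, 1]
    y_pred = [0, 2, 2, 2, 1, 0, 1, 1]

    print("precision (macro):", precision_score(y_true, y_pred, average="macro"))
    print("recall (macro):   ", recall_score(y_true, y_pred, average="macro"))
    print("F1 (micro):       ", f1_score(y_true, y_pred, average="micro"))
    print("F1 (macro):       ", f1_score(y_true, y_pred, average="macro"))
    print("F1 (weighted):    ", f1_score(y_true, y_pred, average="weighted"))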

    +
    +
    +

    Exercises#

    +

    Here is a simple code example which uses the Logistic Regression machinery from scikit-learn. At the end it sets up the confusion matrix and the ROC and cumulative gain curves. Feel free to use these functionalities (we don’t expect you to write your own code for, say, the confusion matrix).

    +
    +
    +
    %matplotlib inline
    +
    +import matplotlib.pyplot as plt
    +import numpy as np
    +from sklearn.model_selection import  train_test_split 
    +# from sklearn.datasets import fill in the data set
    +from sklearn.linear_model import LogisticRegression
    +
    +# Load the data, fill inn
    +mydata.data = ?
    +
    +X_train, X_test, y_train, y_test = train_test_split(mydata.data, mydata.target, random_state=0)
    +print(X_train.shape)
    +print(X_test.shape)
    +# Logistic Regression
    +# define which type of problem, binary or multiclass
    +logreg = LogisticRegression(solver='lbfgs')
    +logreg.fit(X_train, y_train)
    +
    +from sklearn.preprocessing import LabelEncoder
    +from sklearn.model_selection import cross_validate
    +#Cross validation
    +accuracy = cross_validate(logreg,X_test,y_test,cv=10)['test_score']
    +print(accuracy)
    +print("Test set accuracy with Logistic Regression: {:.2f}".format(logreg.score(X_test,y_test)))
    +
    +import scikitplot as skplt
    +y_pred = logreg.predict(X_test)
    +skplt.metrics.plot_confusion_matrix(y_test, y_pred, normalize=True)
    +plt.show()
    +y_probas = logreg.predict_proba(X_test)
    +skplt.metrics.plot_roc(y_test, y_probas)
    +plt.show()
    +skplt.metrics.plot_cumulative_gain(y_test, y_probas)
    +plt.show()
    +
    +
    +
    +
    +
    +

    Exercise a)#

    +

    Convince yourself about the mathematics behind the confusion matrix, the ROC and the cumulative gain curves for both a binary and a multiclass classification problem.

    +
    +
    +

    Exercise b)#

    +

    Use a binary classification data set available from scikit-learn. As an example you can use the MNIST data set and just specialize to two numbers. To do so you can use the following code lines

    +
    +
    +
    from sklearn.datasets import load_digits
    +digits = load_digits(n_class=2) # Load only two classes, e.g., 0 and 1
    +X, y = digits.data, digits.target
    +
    +
    +
    +
    +

    Alternatively, you can use the make_classification functionality. This function generates a random \(n\)-class classification dataset, which can be configured for binary classification by setting n_classes=2. You can also control the number of samples, features, informative features, redundant features, and more.

    +
    +
    +
    from sklearn.datasets import make_classification
    +X, y = make_classification(n_samples=1000, n_features=20, n_informative=10, n_redundant=5, n_classes=2, random_state=42)
    +
    +
    +
    +
    +

    You can use this option for the multiclass case as well; see the next exercise. If you prefer to study other binary classification datasets, feel free to replace the above suggestions with your own dataset.

    +

    Make plots of the confusion matrix, the ROC curve and the cumulative gain curve.

    +
    +
    +

    Exercise c) week 43#

    +

    As a multiclass problem, we will use the Iris data set discussed in +the exercises from weeks 41 and 42. This is a three-class data set and +you can set it up using scikit-learn,

    +
    +
    +
    from sklearn.datasets import load_iris
    +iris = load_iris()
    +X = iris.data  # Features
    +y = iris.target # Target labels
    +
    +
    +
    +
    +

    Make plots of the confusion matrix, the ROC curve and the cumulative +gain curve for this (or other) multiclass data set.

    +
    +
    +
    \ No newline at end of file diff --git a/doc/LectureNotes/_build/html/exercisesweek44.html b/doc/LectureNotes/_build/html/exercisesweek44.html new file mode 100644 index 000000000..97dee6175 --- /dev/null +++ b/doc/LectureNotes/_build/html/exercisesweek44.html @@ -0,0 +1,639 @@

    Exercises week 44#

    +

    October 27-31, 2025

    +

    Date: Deadline is Friday October 31 at midnight

    +
    +
    +

    Overarching aims of the exercises this week#

    +

    The exercise set this week has two parts.

    +
      +
    1. The first is a version of the exercises from week 39, where you got started with the report and github repository for project 1, only this time for project 2. This part is required, and short feedback on this exercise will be available before the project deadline. You can also reuse these elements in your final report.

    2. +
    3. The second is a list of questions meant as a summary of many of the central elements we have discussed in connection with projects 1 and 2, with a slight bias towards deep learning methods and their training. The hope is that these exercises can be of use in your discussions about the neural network results in project 2. You don’t need to answer all the questions, but you should be able to answer them by the end of working on project 2.

    4. +
    +
    +

    Deliverables#

    +

    First, join a group in canvas with your group partners. Pick an available group for Project 2 on the “People” page. If you don’t have a group, you should really consider joining one!

    +

    Complete exercise 1 while working in an Overleaf project. Then, in canvas, include

    +
      +
    • An exported PDF of the report draft you have been working on.

    • +
    • A comment linking to the github repository used in exercise 1d)

    • +
    +
    +
    +

    Exercise 1:#

    +

    Following the same directions as in the weekly exercises for week 39:

    +

    a) Create a report document in Overleaf, and write a suitable abstract and introduction for project 2.

    +

    b) Add a figure in your report of a heatmap showing the test accuracy of a neural network with [0, 1, 2, 3] hidden layers and [5, 10, 25, 50] nodes per hidden layer.
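
    One possible way to produce such a heatmap is sketched below, using scikit-learn's MLPClassifier and LogisticRegression only as stand-ins; in the project you would use your own neural network implementation, and the dataset, scaling and training settings here are placeholders.

    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.datasets import load_breast_cancer
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.neural_network import MLPClassifier
    from sklearn.preprocessing import StandardScaler

    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=2025)
    scaler = StandardScaler().fit(X_train)
    X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

    n_layers = [0, 1, 2, 3]
    n_nodes = [5, 10, 25, 50]
    accuracy = np.zeros((len(n_layers), len(n_nodes)))

    for i, layers in enumerate(n_layers):
        for j, nodes in enumerate(n_nodes):
            if layers == 0:
                # no hidden layers: plain logistic regression as the stand-in
                clf = LogisticRegression(max_iter=2000)
            else:
                clf = MLPClassifier(hidden_layer_sizes=(nodes,) * layers, max_iter=2000, random_state=2025)
            clf.fit(X_train, y_train)
            accuracy[i, j] = clf.score(X_test, y_test)

    fig, ax = plt.subplots()
    im = ax.imshow(accuracy, origin="lower")
    ax.set_xticks(range(len(n_nodes)))
    ax.set_xticklabels(n_nodes)
    ax.set_yticks(range(len(n_layers)))
    ax.set_yticklabels(n_layers)
    ax.set_xlabel("nodes per hidden layer")
    ax.set_ylabel("number of hidden layers")
    fig.colorbar(im, ax=ax, label="Test accuracy")
    ax.set_title("Neural network test accuracy")
    plt.show()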

    +

    c) Add a figure in your report which meets as few requirements as possible of what we consider a good figure in this course, while still including some results, a title, figure text, and axis labels. Describe in the text of the report the different ways in which the figure is lacking. (This should not be included in the final report for project 2.)

    +

    d) Create a github repository or folder in a repository with all the elements described in exercise 4 of the weekly exercises of week 39.

    +

    e) If applicable, add references to your report for the source of your data for regression and classification, the source of claims you make about your data, and for the sources of the gradient optimizers you use and your general claims about these.

    +
    +
    +

    Exercise 2:#

    +

    a) Linear and logistic regression methods

    +
      +
    1. What is the main difference between ordinary least squares and Ridge regression?

    2. +
    3. Which kind of data set would you use logistic regression for?

    4. +
    5. In linear regression you assume that your output is described by a continuous non-stochastic function \(f(x)\). Which is the equivalent function in logistic regression?

    6. +
    7. Can you find an analytic solution to a logistic regression type of problem?

    8. +
    9. What kind of cost function would you use in logistic regression?

    10. +
    +

    b) Deep learning

    +
      +
    1. What is an activation function, and why do we use one? Explain three different types of activation functions.

    2. +
    3. Describe the architecture of a typical feed forward Neural Network (NN).

    4. +
    5. You are using a deep neural network for a prediction task. After training your model, you notice that it is strongly overfitting the training set and that the performance on the test isn’t good. What can you do to reduce overfitting?

    6. +
    7. How would you know if your model is suffering from the problem of exploding gradients?

    8. +
    9. Can you name and explain a few hyperparameters used for training a neural network?

    10. +
    11. Describe the architecture of a typical Convolutional Neural Network (CNN)

    12. +
    13. What is the vanishing gradient problem in Neural Networks and how to fix it?

    14. +
    15. When it comes to training an artificial neural network, what could the reason be for why the cost/loss doesn’t decrease in a few epochs?

    16. +
    17. How does L1/L2 regularization affect a neural network?

    18. +
    19. What is(are) the advantage(s) of deep learning over traditional methods like linear regression or logistic regression?

    20. +
    +

    c) Optimization part

    +
      +
    1. Which is the basic mathematical root-finding method behind essentially all gradient descent approaches (stochastic and non-stochastic)?

    2. +
    3. And why don’t we use it? Or stated differently, why do we introduce the learning rate as a parameter?

    4. +
    5. What might happen if you set the momentum hyperparameter too close to 1 (e.g., 0.9999) when using an optimizer for the learning rate?

    6. +
    7. Why should we use stochastic gradient descent instead of plain gradient descent?

    8. +
    9. Which parameters would you need to tune when using a stochastic gradient descent approach?

    10. +
    +

    d) Analysis of results

    +
      +
    1. How do you assess overfitting and underfitting?

    2. +
    3. Why do we divide the data into train and test and/or possibly validation sets?

    4. +
    5. Why would you use resampling methods in the data analysis? Mention some widely popular resampling methods.

    6. +
    7. Why might a model that does not overfit the data (maybe because there is a lot of data) perform worse when we add regularization?

    8. +
    +
    +
    + + \ No newline at end of file diff --git a/doc/LectureNotes/_build/html/figslides/RNN1.png b/doc/LectureNotes/_build/html/figslides/RNN1.png new file mode 100644 index 000000000..6174bee40 Binary files /dev/null and b/doc/LectureNotes/_build/html/figslides/RNN1.png differ diff --git a/doc/LectureNotes/_build/html/figslides/RNN10.png b/doc/LectureNotes/_build/html/figslides/RNN10.png new file mode 100644 index 000000000..259fc5c22 Binary files /dev/null and b/doc/LectureNotes/_build/html/figslides/RNN10.png differ diff --git a/doc/LectureNotes/_build/html/figslides/RNN11.png b/doc/LectureNotes/_build/html/figslides/RNN11.png new file mode 100644 index 000000000..04423c850 Binary files /dev/null and b/doc/LectureNotes/_build/html/figslides/RNN11.png differ diff --git a/doc/LectureNotes/_build/html/figslides/RNN12.png b/doc/LectureNotes/_build/html/figslides/RNN12.png new file mode 100644 index 000000000..f0c1fc40b Binary files /dev/null and b/doc/LectureNotes/_build/html/figslides/RNN12.png differ diff --git a/doc/LectureNotes/_build/html/figslides/RNN13.png b/doc/LectureNotes/_build/html/figslides/RNN13.png new file mode 100644 index 000000000..f0f83c0d1 Binary files /dev/null and b/doc/LectureNotes/_build/html/figslides/RNN13.png differ diff --git a/doc/LectureNotes/_build/html/figslides/RNN14.png b/doc/LectureNotes/_build/html/figslides/RNN14.png new file mode 100644 index 000000000..cead8a2fa Binary files /dev/null and b/doc/LectureNotes/_build/html/figslides/RNN14.png differ diff --git a/doc/LectureNotes/_build/html/figslides/RNN15.png b/doc/LectureNotes/_build/html/figslides/RNN15.png new file mode 100644 index 000000000..2d894680e Binary files /dev/null and b/doc/LectureNotes/_build/html/figslides/RNN15.png differ diff --git a/doc/LectureNotes/_build/html/figslides/RNN16.png b/doc/LectureNotes/_build/html/figslides/RNN16.png new file mode 100644 index 000000000..10bc64a05 Binary files /dev/null and b/doc/LectureNotes/_build/html/figslides/RNN16.png differ diff --git a/doc/LectureNotes/_build/html/figslides/RNN17.png b/doc/LectureNotes/_build/html/figslides/RNN17.png new file mode 100644 index 000000000..095e0df92 Binary files /dev/null and b/doc/LectureNotes/_build/html/figslides/RNN17.png differ diff --git a/doc/LectureNotes/_build/html/figslides/RNN18.png b/doc/LectureNotes/_build/html/figslides/RNN18.png new file mode 100644 index 000000000..aa5cfee07 Binary files /dev/null and b/doc/LectureNotes/_build/html/figslides/RNN18.png differ diff --git a/doc/LectureNotes/_build/html/figslides/RNN19.png b/doc/LectureNotes/_build/html/figslides/RNN19.png new file mode 100644 index 000000000..37ac76e53 Binary files /dev/null and b/doc/LectureNotes/_build/html/figslides/RNN19.png differ diff --git a/doc/LectureNotes/_build/html/figslides/RNN2.png b/doc/LectureNotes/_build/html/figslides/RNN2.png new file mode 100644 index 000000000..39bc7147c Binary files /dev/null and b/doc/LectureNotes/_build/html/figslides/RNN2.png differ diff --git a/doc/LectureNotes/_build/html/figslides/RNN20.png b/doc/LectureNotes/_build/html/figslides/RNN20.png new file mode 100644 index 000000000..12635c4c8 Binary files /dev/null and b/doc/LectureNotes/_build/html/figslides/RNN20.png differ diff --git a/doc/LectureNotes/_build/html/figslides/RNN21.png b/doc/LectureNotes/_build/html/figslides/RNN21.png new file mode 100644 index 000000000..3e55cd33b Binary files /dev/null and b/doc/LectureNotes/_build/html/figslides/RNN21.png differ diff --git a/doc/LectureNotes/_build/html/figslides/RNN22.png 
b/doc/LectureNotes/_build/html/figslides/RNN22.png new file mode 100644 index 000000000..fa4611af1 Binary files /dev/null and b/doc/LectureNotes/_build/html/figslides/RNN22.png differ diff --git a/doc/LectureNotes/_build/html/figslides/RNN3.png b/doc/LectureNotes/_build/html/figslides/RNN3.png new file mode 100644 index 000000000..07ca1d7d4 Binary files /dev/null and b/doc/LectureNotes/_build/html/figslides/RNN3.png differ diff --git a/doc/LectureNotes/_build/html/figslides/RNN4.png b/doc/LectureNotes/_build/html/figslides/RNN4.png new file mode 100644 index 000000000..5b204a801 Binary files /dev/null and b/doc/LectureNotes/_build/html/figslides/RNN4.png differ diff --git a/doc/LectureNotes/_build/html/figslides/RNN5.png b/doc/LectureNotes/_build/html/figslides/RNN5.png new file mode 100644 index 000000000..bc4d8e6ca Binary files /dev/null and b/doc/LectureNotes/_build/html/figslides/RNN5.png differ diff --git a/doc/LectureNotes/_build/html/figslides/RNN6.png b/doc/LectureNotes/_build/html/figslides/RNN6.png new file mode 100644 index 000000000..11faa4239 Binary files /dev/null and b/doc/LectureNotes/_build/html/figslides/RNN6.png differ diff --git a/doc/LectureNotes/_build/html/figslides/RNN7.png b/doc/LectureNotes/_build/html/figslides/RNN7.png new file mode 100644 index 000000000..6f9489814 Binary files /dev/null and b/doc/LectureNotes/_build/html/figslides/RNN7.png differ diff --git a/doc/LectureNotes/_build/html/figslides/RNN8.png b/doc/LectureNotes/_build/html/figslides/RNN8.png new file mode 100644 index 000000000..9ea7d412c Binary files /dev/null and b/doc/LectureNotes/_build/html/figslides/RNN8.png differ diff --git a/doc/LectureNotes/_build/html/figslides/RNN9.png b/doc/LectureNotes/_build/html/figslides/RNN9.png new file mode 100644 index 000000000..bd537ad0a Binary files /dev/null and b/doc/LectureNotes/_build/html/figslides/RNN9.png differ diff --git a/doc/LectureNotes/_build/html/figslides/cnn.jpeg b/doc/LectureNotes/_build/html/figslides/cnn.jpeg new file mode 100644 index 000000000..67bf3ced7 Binary files /dev/null and b/doc/LectureNotes/_build/html/figslides/cnn.jpeg differ diff --git a/doc/LectureNotes/_build/html/figslides/deepcnn.png b/doc/LectureNotes/_build/html/figslides/deepcnn.png new file mode 100644 index 000000000..a6c023d72 Binary files /dev/null and b/doc/LectureNotes/_build/html/figslides/deepcnn.png differ diff --git a/doc/LectureNotes/_build/html/figslides/discreteconv.png b/doc/LectureNotes/_build/html/figslides/discreteconv.png new file mode 100644 index 000000000..3d40abfcb Binary files /dev/null and b/doc/LectureNotes/_build/html/figslides/discreteconv.png differ diff --git a/doc/LectureNotes/_build/html/figslides/discreteconv1.png b/doc/LectureNotes/_build/html/figslides/discreteconv1.png new file mode 100644 index 000000000..4d57c1e99 Binary files /dev/null and b/doc/LectureNotes/_build/html/figslides/discreteconv1.png differ diff --git a/doc/LectureNotes/_build/html/figslides/lstm.pdf b/doc/LectureNotes/_build/html/figslides/lstm.pdf new file mode 100644 index 000000000..31ca2c643 Binary files /dev/null and b/doc/LectureNotes/_build/html/figslides/lstm.pdf differ diff --git a/doc/LectureNotes/_build/html/figslides/lstm.png b/doc/LectureNotes/_build/html/figslides/lstm.png new file mode 100644 index 000000000..fc543251a Binary files /dev/null and b/doc/LectureNotes/_build/html/figslides/lstm.png differ diff --git a/doc/LectureNotes/_build/html/figslides/maxpooling.png b/doc/LectureNotes/_build/html/figslides/maxpooling.png new file mode 100644 index 
000000000..752651f85 Binary files /dev/null and b/doc/LectureNotes/_build/html/figslides/maxpooling.png differ diff --git a/doc/LectureNotes/_build/html/figslides/nn.jpeg b/doc/LectureNotes/_build/html/figslides/nn.jpeg new file mode 100644 index 000000000..0a495cfe4 Binary files /dev/null and b/doc/LectureNotes/_build/html/figslides/nn.jpeg differ diff --git a/doc/LectureNotes/_build/html/figslides/photo.jpg b/doc/LectureNotes/_build/html/figslides/photo.jpg new file mode 100755 index 000000000..426220598 Binary files /dev/null and b/doc/LectureNotes/_build/html/figslides/photo.jpg differ diff --git a/doc/LectureNotes/_build/html/figslides/photo1.jpg b/doc/LectureNotes/_build/html/figslides/photo1.jpg new file mode 100755 index 000000000..2989b6347 Binary files /dev/null and b/doc/LectureNotes/_build/html/figslides/photo1.jpg differ diff --git a/doc/LectureNotes/_build/html/figures/adagrad.png b/doc/LectureNotes/_build/html/figures/adagrad.png new file mode 100644 index 000000000..97a9cf908 Binary files /dev/null and b/doc/LectureNotes/_build/html/figures/adagrad.png differ diff --git a/doc/LectureNotes/_build/html/figures/adam.png b/doc/LectureNotes/_build/html/figures/adam.png new file mode 100644 index 000000000..a3a39f025 Binary files /dev/null and b/doc/LectureNotes/_build/html/figures/adam.png differ diff --git a/doc/LectureNotes/_build/html/figures/generativelearning.png b/doc/LectureNotes/_build/html/figures/generativelearning.png new file mode 100644 index 000000000..78168b7a0 Binary files /dev/null and b/doc/LectureNotes/_build/html/figures/generativelearning.png differ diff --git a/doc/LectureNotes/_build/html/figures/nn1.pdf b/doc/LectureNotes/_build/html/figures/nn1.pdf new file mode 100644 index 000000000..bebe5cabd Binary files /dev/null and b/doc/LectureNotes/_build/html/figures/nn1.pdf differ diff --git a/doc/LectureNotes/_build/html/figures/nn1.png b/doc/LectureNotes/_build/html/figures/nn1.png new file mode 100644 index 000000000..05c359481 Binary files /dev/null and b/doc/LectureNotes/_build/html/figures/nn1.png differ diff --git a/doc/LectureNotes/_build/html/figures/nn2.pdf b/doc/LectureNotes/_build/html/figures/nn2.pdf new file mode 100644 index 000000000..7b62d8ff7 Binary files /dev/null and b/doc/LectureNotes/_build/html/figures/nn2.pdf differ diff --git a/doc/LectureNotes/_build/html/figures/nn2.png b/doc/LectureNotes/_build/html/figures/nn2.png new file mode 100644 index 000000000..c402d795b Binary files /dev/null and b/doc/LectureNotes/_build/html/figures/nn2.png differ diff --git a/doc/LectureNotes/_build/html/figures/nns.png b/doc/LectureNotes/_build/html/figures/nns.png new file mode 100644 index 000000000..19e31ef05 Binary files /dev/null and b/doc/LectureNotes/_build/html/figures/nns.png differ diff --git a/doc/LectureNotes/_build/html/figures/rmsprop.png b/doc/LectureNotes/_build/html/figures/rmsprop.png new file mode 100644 index 000000000..9f336d033 Binary files /dev/null and b/doc/LectureNotes/_build/html/figures/rmsprop.png differ diff --git a/doc/LectureNotes/_build/html/figures/simplenn1.png b/doc/LectureNotes/_build/html/figures/simplenn1.png new file mode 100644 index 000000000..3c87aa3ee Binary files /dev/null and b/doc/LectureNotes/_build/html/figures/simplenn1.png differ diff --git a/doc/LectureNotes/_build/html/figures/simplenn2.png b/doc/LectureNotes/_build/html/figures/simplenn2.png new file mode 100644 index 000000000..2ce83dd53 Binary files /dev/null and b/doc/LectureNotes/_build/html/figures/simplenn2.png differ diff --git 
a/doc/LectureNotes/_build/html/figures/simplenn3.pdf b/doc/LectureNotes/_build/html/figures/simplenn3.pdf new file mode 100644 index 000000000..c27014f4a Binary files /dev/null and b/doc/LectureNotes/_build/html/figures/simplenn3.pdf differ diff --git a/doc/LectureNotes/_build/html/figures/simplenn3.png b/doc/LectureNotes/_build/html/figures/simplenn3.png new file mode 100644 index 000000000..a377fad3c Binary files /dev/null and b/doc/LectureNotes/_build/html/figures/simplenn3.png differ diff --git a/doc/LectureNotes/_build/html/figures/standarddeeplearning.png b/doc/LectureNotes/_build/html/figures/standarddeeplearning.png new file mode 100644 index 000000000..f21133ff9 Binary files /dev/null and b/doc/LectureNotes/_build/html/figures/standarddeeplearning.png differ diff --git a/doc/LectureNotes/_build/html/figures/structure.pdf b/doc/LectureNotes/_build/html/figures/structure.pdf new file mode 100644 index 000000000..d21e6d3d9 Binary files /dev/null and b/doc/LectureNotes/_build/html/figures/structure.pdf differ diff --git a/doc/LectureNotes/_build/html/figures/structure.png b/doc/LectureNotes/_build/html/figures/structure.png new file mode 100644 index 000000000..bf82679e3 Binary files /dev/null and b/doc/LectureNotes/_build/html/figures/structure.png differ diff --git a/doc/LectureNotes/_build/html/genindex.html b/doc/LectureNotes/_build/html/genindex.html index 5583cc411..161d3d12f 100644 --- a/doc/LectureNotes/_build/html/genindex.html +++ b/doc/LectureNotes/_build/html/genindex.html @@ -227,10 +227,45 @@
[Navigation sidebar in genindex.html; entries marked "+" are added by this commit:]
  • Exercises week 36
  • Week 36: Linear Regression and Gradient descent
  • Exercises week 37
+ • Week 37: Gradient descent methods
+ • Exercises week 38
+ • Week 38: Statistical analysis, bias-variance tradeoff and resampling methods
+ • Exercises week 39
+ • Week 39: Resampling methods and logistic regression
+ • Week 40: Gradient descent methods (continued) and start Neural networks
+ • Week 41 Neural networks and constructing a neural network code
+ • Exercises week 41
+ • Week 42 Constructing a Neural Network code with examples
+ • Exercises week 42
+ • Week 43: Deep Learning: Constructing a Neural Network code and solving differential equations
+ • Exercises week 43
+ • Week 44, Solving differential equations with neural networks and start Convolutional Neural Networks (CNN)
+ • Exercises week 44
+ • Week 45, Convolutional Neural Networks (CNNs)
  • Projects
    diff --git a/doc/LectureNotes/_build/html/intro.html b/doc/LectureNotes/_build/html/intro.html index 01ecdb733..ae09bc9c6 100644 --- a/doc/LectureNotes/_build/html/intro.html +++ b/doc/LectureNotes/_build/html/intro.html @@ -231,10 +231,45 @@
[Same navigation-sidebar additions as in genindex.html above: Week 37 through Week 45 lectures and their exercise sets.]
    diff --git a/doc/LectureNotes/_build/html/linalg.html b/doc/LectureNotes/_build/html/linalg.html index 52c213f10..f7d1bcb07 100644 --- a/doc/LectureNotes/_build/html/linalg.html +++ b/doc/LectureNotes/_build/html/linalg.html @@ -230,10 +230,45 @@
[Same navigation-sidebar additions as in genindex.html above: Week 37 through Week 45 lectures and their exercise sets.]
    diff --git a/doc/LectureNotes/_build/html/objects.inv b/doc/LectureNotes/_build/html/objects.inv index 26f7645df..3220aedd5 100644 Binary files a/doc/LectureNotes/_build/html/objects.inv and b/doc/LectureNotes/_build/html/objects.inv differ diff --git a/doc/LectureNotes/_build/html/project1.html b/doc/LectureNotes/_build/html/project1.html index 44d08f6bb..b7ae745b1 100644 --- a/doc/LectureNotes/_build/html/project1.html +++ b/doc/LectureNotes/_build/html/project1.html @@ -62,7 +62,8 @@ - + + @@ -229,10 +230,45 @@
[Same navigation-sidebar additions as in genindex.html above: Week 37 through Week 45 lectures and their exercise sets.]
@@ -381,16 +417,17 @@ [hunk touching the in-page "Contents" navigation]
@@ -774,16 +848,17 @@ [hunk touching the "Software and needed installations" and "Projects" sections]
    diff --git a/doc/LectureNotes/_build/html/search.html b/doc/LectureNotes/_build/html/search.html index be23d8ca2..109651ed7 100644 --- a/doc/LectureNotes/_build/html/search.html +++ b/doc/LectureNotes/_build/html/search.html @@ -229,10 +229,45 @@
[Same navigation-sidebar additions as in genindex.html above: Week 37 through Week 45 lectures and their exercise sets.]
    diff --git a/doc/LectureNotes/_build/html/searchindex.js b/doc/LectureNotes/_build/html/searchindex.js index 4c7aa149b..30d44a50d 100644 --- a/doc/LectureNotes/_build/html/searchindex.js +++ b/doc/LectureNotes/_build/html/searchindex.js @@ -1 +1 @@ -Search.setIndex({"alltitles": {"1a)": [[18, "a"]], "3a)": [[18, "id1"]], "3b)": [[18, "b"]], "4a)": [[18, "id2"]], "4b)": [[18, "id3"]], "A Classification Tree": [[9, "a-classification-tree"]], "A Frequentist approach to data analysis": [[0, "a-frequentist-approach-to-data-analysis"], [26, "a-frequentist-approach-to-data-analysis"]], "A better approach": [[8, "a-better-approach"]], "A first summary": [[26, "a-first-summary"]], "A quick Reminder on Lagrangian Multipliers": [[8, "a-quick-reminder-on-lagrangian-multipliers"]], "A simple example": [[4, "a-simple-example"]], "A soft classifier": [[8, "a-soft-classifier"]], "A top-down perspective on Neural networks": [[1, "a-top-down-perspective-on-neural-networks"]], "ADAM optimizer": [[13, "adam-optimizer"]], "Activation functions": [[12, "activation-functions"]], "Adaptive boosting: AdaBoost, Basic Algorithm": [[10, "adaptive-boosting-adaboost-basic-algorithm"]], "Adding error analysis and training set up": [[26, "adding-error-analysis-and-training-set-up"], [27, "adding-error-analysis-and-training-set-up"]], "Adjust hyperparameters": [[1, "adjust-hyperparameters"]], "Algorithms for Setting up Decision Trees": [[9, "algorithms-for-setting-up-decision-trees"]], "An Overview of Ensemble Methods": [[10, "an-overview-of-ensemble-methods"]], "An extrapolation example": [[4, "an-extrapolation-example"]], "An optimization/minimization problem": [[26, "an-optimization-minimization-problem"]], "And finally \\boldsymbol{X}\\boldsymbol{X}^T": [[27, "and-finally-boldsymbol-x-boldsymbol-x-t"]], "And what about using neural networks?": [[26, "and-what-about-using-neural-networks"]], "Another Example, now with a polynomial fit": [[28, "another-example-now-with-a-polynomial-fit"]], "Another example, the moons again": [[9, "another-example-the-moons-again"]], "Applied Data Analysis and Machine Learning": [[19, null]], "Autocorrelation function": [[23, "autocorrelation-function"]], "Automatic differentiation": [[13, "automatic-differentiation"]], "Back to Ridge and LASSO Regression": [[27, "back-to-ridge-and-lasso-regression"], [28, "back-to-ridge-and-lasso-regression"]], "Back to the Cancer Data": [[11, "back-to-the-cancer-data"]], "Background literature": [[21, "background-literature"]], "Bagging": [[10, "bagging"]], "Bagging Examples": [[10, "bagging-examples"]], "Basic Matrix Features": [[20, "basic-matrix-features"]], "Basic ideas of the Principal Component Analysis (PCA)": [[11, null]], "Basic math of the SVD": [[5, "basic-math-of-the-svd"], [27, "basic-math-of-the-svd"], [28, "basic-math-of-the-svd"]], "Basics": [[7, "basics"]], "Basics of a tree": [[9, "basics-of-a-tree"]], "Batch Normalization": [[1, "batch-normalization"]], "Bayes\u2019 Theorem and Ridge and Lasso Regression": [[5, "bayes-theorem-and-ridge-and-lasso-regression"]], "Boosting, a Bird\u2019s Eye View": [[10, "boosting-a-bird-s-eye-view"]], "Bootstrap": [[6, "bootstrap"]], "Bringing it together, first back propagation equation": [[12, "bringing-it-together-first-back-propagation-equation"]], "Building a Feed Forward Neural Network": [[1, null]], "Building a tree, regression": [[9, "building-a-tree-regression"]], "Building neural networks in Tensorflow and Keras": [[1, "building-neural-networks-in-tensorflow-and-keras"]], "CNNs in more 
detail, building convolutional neural networks in Tensorflow and Keras": [[3, "cnns-in-more-detail-building-convolutional-neural-networks-in-tensorflow-and-keras"]], "Cancer Data again now with Decision Trees and other Methods": [[9, "cancer-data-again-now-with-decision-trees-and-other-methods"]], "Choose cost function and optimizer": [[1, "choose-cost-function-and-optimizer"]], "Classical PCA Theorem": [[11, "classical-pca-theorem"]], "Clustering and Unsupervised Learning": [[14, null]], "Code for SVD and Inversion of Matrices": [[5, "code-for-svd-and-inversion-of-matrices"]], "Codes and Approaches": [[14, "codes-and-approaches"]], "Codes for the SVD": [[5, "codes-for-the-svd"], [27, "codes-for-the-svd"], [28, "codes-for-the-svd"]], "Coding Setup and Linear Regression": [[15, "coding-setup-and-linear-regression"]], "Collect and pre-process data": [[1, "collect-and-pre-process-data"]], "Communication channels": [[26, "communication-channels"]], "Compare Bagging on Trees with Random Forests": [[10, "compare-bagging-on-trees-with-random-forests"]], "Comparing with a numerical scheme": [[2, "comparing-with-a-numerical-scheme"]], "Comparison with OLS": [[28, "comparison-with-ols"]], "Computing the Gini index": [[9, "computing-the-gini-index"]], "Conditions on convex functions": [[28, "conditions-on-convex-functions"]], "Conjugate gradient method": [[13, "conjugate-gradient-method"]], "Convex function": [[28, "convex-function"]], "Convex functions": [[13, "convex-functions"], [28, "convex-functions"]], "Convolution Examples: Polynomial multiplication": [[3, "convolution-examples-polynomial-multiplication"]], "Convolution Examples: Principle of Superposition and Periodic Forces (Fourier Transforms)": [[3, "convolution-examples-principle-of-superposition-and-periodic-forces-fourier-transforms"]], "Convolutional Neural Network": [[12, "convolutional-neural-network"]], "Convolutional Neural Networks": [[3, null]], "Correlation Function and Design/Feature Matrix": [[27, "correlation-function-and-design-feature-matrix"]], "Correlation Matrix": [[11, "correlation-matrix"], [27, "correlation-matrix"]], "Correlation Matrix with Pandas": [[27, "correlation-matrix-with-pandas"]], "Course Format": [[26, "course-format"]], "Course setting": [[22, null]], "Covariance Matrix Examples": [[27, "covariance-matrix-examples"]], "Covariance and Correlation Matrix": [[27, "covariance-and-correlation-matrix"]], "Cross-validation": [[6, "cross-validation"]], "Deadlines for projects (tentative)": [[26, "deadlines-for-projects-tentative"]], "Decision trees, overarching aims": [[9, null]], "Deep learning methods": [[26, "deep-learning-methods"]], "Define model and architecture": [[1, "define-model-and-architecture"]], "Defining the cost function": [[1, "defining-the-cost-function"]], "Deliverables": [[15, "deliverables"], [16, "deliverables"]], "Derivatives and the chain rule": [[12, "derivatives-and-the-chain-rule"]], "Derivatives, example 1": [[27, "derivatives-example-1"]], "Deriving OLS from a probability distribution": [[5, "deriving-ols-from-a-probability-distribution"]], "Deriving and Implementing Ordinary Least Squares": [[16, "deriving-and-implementing-ordinary-least-squares"]], "Deriving and Implementing Ridge Regression": [[17, "deriving-and-implementing-ridge-regression"]], "Deriving the Lasso Regression Equations": [[27, "deriving-the-lasso-regression-equations"], [28, "deriving-the-lasso-regression-equations"], [28, "id6"]], "Deriving the Ridge Regression Equations": [[27, 
"deriving-the-ridge-regression-equations"], [28, "deriving-the-ridge-regression-equations"], [28, "id3"]], "Deriving the back propagation code for a multilayer perceptron model": [[12, "deriving-the-back-propagation-code-for-a-multilayer-perceptron-model"]], "Developing a code for doing neural networks with back propagation": [[1, "developing-a-code-for-doing-neural-networks-with-back-propagation"]], "Diagonalize the sample covariance matrix to obtain the principal components": [[11, "diagonalize-the-sample-covariance-matrix-to-obtain-the-principal-components"]], "Different kernels and Mercer\u2019s theorem": [[8, "different-kernels-and-mercer-s-theorem"]], "Disadvantages": [[9, "disadvantages"]], "Discriminative Modeling": [[26, "discriminative-modeling"]], "Domains and probabilities": [[23, "domains-and-probabilities"]], "Dropout": [[1, "dropout"]], "Economy-size SVD": [[27, "economy-size-svd"], [28, "economy-size-svd"]], "Elements of Probability Theory and Statistical Data Analysis": [[23, null]], "Ensemble Methods: From a Single Tree to Many Trees and Extreme Boosting, Meet the Jungle of Methods": [[10, null]], "Entropy and the ID3 algorithm": [[9, "entropy-and-the-id3-algorithm"]], "Essential elements of ML": [[26, "essential-elements-of-ml"]], "Evaluate model performance on test data": [[1, "evaluate-model-performance-on-test-data"]], "Example 2": [[27, "example-2"]], "Example 3": [[27, "example-3"]], "Example 4": [[27, "example-4"]], "Example Matrix": [[27, "example-matrix"], [28, "example-matrix"]], "Example of discriminative modeling, taken from Generative Deep Learning by David Foster": [[26, "example-of-discriminative-modeling-taken-from-generative-deep-learning-by-david-foster"]], "Example of generative modeling, taken from Generative Deep Learning by David Foster": [[26, "example-of-generative-modeling-taken-from-generative-deep-learning-by-david-foster"]], "Example of own Standard scaling": [[27, "example-of-own-standard-scaling"]], "Example relevant for the exercises": [[27, "example-relevant-for-the-exercises"]], "Example: Exponential decay": [[2, "example-exponential-decay"]], "Example: Population growth": [[2, "example-population-growth"]], "Example: The diffusion equation": [[2, "example-the-diffusion-equation"]], "Example: binary classification problem": [[1, "example-binary-classification-problem"]], "Examples": [[26, "examples"]], "Examples of likelihood functions used in logistic regression and neural networks": [[7, "examples-of-likelihood-functions-used-in-logistic-regression-and-neural-networks"]], "Exercise 1 - Choice of model and degrees of freedom": [[17, "exercise-1-choice-of-model-and-degrees-of-freedom"]], "Exercise 1 - Finding the derivative of Matrix-Vector expressions": [[16, "exercise-1-finding-the-derivative-of-matrix-vector-expressions"]], "Exercise 1 - Github Setup": [[15, "exercise-1-github-setup"]], "Exercise 1, scale your data": [[18, "exercise-1-scale-your-data"]], "Exercise 1: Setting up various Python environments": [[0, "exercise-1-setting-up-various-python-environments"]], "Exercise 2 - Deriving the expression for OLS": [[16, "exercise-2-deriving-the-expression-for-ols"]], "Exercise 2 - Deriving the expression for Ridge Regression": [[17, "exercise-2-deriving-the-expression-for-ridge-regression"]], "Exercise 2 - Setting up a Github repository": [[15, "exercise-2-setting-up-a-github-repository"]], "Exercise 2, calculate the gradients": [[18, "exercise-2-calculate-the-gradients"]], "Exercise 2: making your own data and exploring scikit-learn": 
[[0, "exercise-2-making-your-own-data-and-exploring-scikit-learn"]], "Exercise 3 - Creating feature matrix and implementing OLS using the analytical expression": [[16, "exercise-3-creating-feature-matrix-and-implementing-ols-using-the-analytical-expression"]], "Exercise 3 - Fitting an OLS model to data": [[15, "exercise-3-fitting-an-ols-model-to-data"]], "Exercise 3 - Scaling data": [[17, "exercise-3-scaling-data"]], "Exercise 3 - Setting up a Python virtual environment": [[15, "exercise-3-setting-up-a-python-virtual-environment"]], "Exercise 3, using the analytical formulae for OLS and Ridge regression to find the optimal paramters \\boldsymbol{\\theta}": [[18, "exercise-3-using-the-analytical-formulae-for-ols-and-ridge-regression-to-find-the-optimal-paramters-boldsymbol-theta"]], "Exercise 3: Normalizing our data": [[0, "exercise-3-normalizing-our-data"]], "Exercise 4 - Fitting a polynomial": [[16, "exercise-4-fitting-a-polynomial"]], "Exercise 4 - Implementing Ridge Regression": [[17, "exercise-4-implementing-ridge-regression"]], "Exercise 4 - Testing multiple hyperparameters": [[17, "exercise-4-testing-multiple-hyperparameters"]], "Exercise 4 - The train-test split": [[15, "exercise-4-the-train-test-split"]], "Exercise 4, Implementing the simplest form for gradient descent": [[18, "exercise-4-implementing-the-simplest-form-for-gradient-descent"]], "Exercise 4: Adding Ridge Regression": [[0, "exercise-4-adding-ridge-regression"]], "Exercise 5 - Comparing your code with sklearn": [[16, "exercise-5-comparing-your-code-with-sklearn"]], "Exercise 5, Ridge regression and a new Synthetic Dataset": [[18, "exercise-5-ridge-regression-and-a-new-synthetic-dataset"]], "Exercise 5: Analytical exercises": [[0, "exercise-5-analytical-exercises"]], "Exercise: Cross-validation as resampling techniques, adding more complexity": [[6, "exercise-cross-validation-as-resampling-techniques-adding-more-complexity"]], "Exercise: Analysis of real data": [[6, "exercise-analysis-of-real-data"]], "Exercise: Bias-variance trade-off and resampling techniques": [[6, "exercise-bias-variance-trade-off-and-resampling-techniques"]], "Exercise: Lasso Regression on the Franke function with resampling": [[6, "exercise-lasso-regression-on-the-franke-function-with-resampling"]], "Exercise: Ordinary Least Square (OLS) on the Franke function": [[6, "exercise-ordinary-least-square-ols-on-the-franke-function"]], "Exercise: Ridge Regression on the Franke function with resampling": [[6, "exercise-ridge-regression-on-the-franke-function-with-resampling"]], "Exercises": [[0, "exercises"]], "Exercises and Projects": [[6, "exercises-and-projects"]], "Exercises week 34": [[15, null]], "Exercises week 35": [[16, null]], "Exercises week 36": [[17, null]], "Exercises week 37": [[18, null]], "Expectation values": [[23, "expectation-values"]], "Extending to more than one variable": [[28, "extending-to-more-than-one-variable"]], "Extremely useful tools, strongly recommended": [[26, "extremely-useful-tools-strongly-recommended"]], "Feed-forward neural networks": [[12, "feed-forward-neural-networks"]], "Feed-forward pass": [[1, "feed-forward-pass"]], "Final back propagating equation": [[12, "final-back-propagating-equation"]], "Fine-tuning neural network hyperparameters": [[1, "fine-tuning-neural-network-hyperparameters"]], "Fitting an Equation of State for Dense Nuclear Matter": [[0, "fitting-an-equation-of-state-for-dense-nuclear-matter"]], "Fixing the singularity": [[27, "fixing-the-singularity"], [28, "fixing-the-singularity"]], "Format for 
electronic delivery of report and programs": [[21, "format-for-electronic-delivery-of-report-and-programs"]], "Frequently used scaling functions": [[27, "frequently-used-scaling-functions"]], "From OLS to Ridge and Lasso": [[28, "from-ols-to-ridge-and-lasso"]], "From one to many layers, the universal approximation theorem": [[12, "from-one-to-many-layers-the-universal-approximation-theorem"]], "Functionality in Scikit-Learn": [[27, "functionality-in-scikit-learn"]], "Further Dimensionality Remarks": [[3, "further-dimensionality-remarks"]], "Further properties (important for our analyses later)": [[5, "further-properties-important-for-our-analyses-later"], [27, "further-properties-important-for-our-analyses-later"], [28, "further-properties-important-for-our-analyses-later"]], "Gaussian Elimination": [[20, "gaussian-elimination"]], "General Features": [[9, "general-features"]], "General linear models and linear algebra": [[26, "general-linear-models-and-linear-algebra"]], "Generalizing the fitting procedure as a linear algebra problem": [[26, "generalizing-the-fitting-procedure-as-a-linear-algebra-problem"], [26, "id1"]], "Generative Adversarial Networks": [[4, "generative-adversarial-networks"]], "Generative Models": [[4, "generative-models"]], "Generative Versus Discriminative Modeling": [[26, "generative-versus-discriminative-modeling"]], "Geometric Interpretation and link with Singular Value Decomposition": [[11, "geometric-interpretation-and-link-with-singular-value-decomposition"]], "Gradient Boosting, Classification Example": [[10, "gradient-boosting-classification-example"]], "Gradient Boosting, Examples of Regression": [[10, "gradient-boosting-examples-of-regression"]], "Gradient Clipping": [[1, "gradient-clipping"]], "Gradient Descent Example": [[28, "id1"]], "Gradient boosting: Basics with Steepest Descent/Functional Gradient Descent": [[10, "gradient-boosting-basics-with-steepest-descent-functional-gradient-descent"]], "Gradient descent": [[2, "gradient-descent"]], "Gradient descent and Ridge": [[28, "gradient-descent-and-ridge"]], "Gradient descent example": [[28, "gradient-descent-example"]], "Grading": [[24, "grading"], [24, "id2"], [26, "grading"]], "How to take derivatives of Matrix-Vector expressions": [[16, "how-to-take-derivatives-of-matrix-vector-expressions"]], "Hyperplanes and all that": [[8, "hyperplanes-and-all-that"]], "Important Matrix and vector handling packages": [[20, "important-matrix-and-vector-handling-packages"]], "Important technicalities: More on Rescaling data": [[27, "important-technicalities-more-on-rescaling-data"]], "Improving performance": [[1, "improving-performance"]], "In summary": [[24, "in-summary"]], "Including Stochastic Gradient Descent with Autograd": [[13, "including-stochastic-gradient-descent-with-autograd"]], "Incremental PCA": [[11, "incremental-pca"]], "Installing R, C++, cython or Julia": [[26, "installing-r-c-cython-or-julia"]], "Installing R, C++, cython, Numba etc": [[26, "installing-r-c-cython-numba-etc"]], "Instructor information": [[24, "instructor-information"]], "Interpretations and optimizing our parameters": [[26, "interpretations-and-optimizing-our-parameters"], [26, "id2"], [26, "id3"], [27, "interpretations-and-optimizing-our-parameters"], [27, "id1"], [27, "id2"]], "Interpreting the Ridge results": [[27, "interpreting-the-ridge-results"], [28, "interpreting-the-ridge-results"], [28, "id4"]], "Introducing JAX": [[13, "introducing-jax"]], "Introducing the Covariance and Correlation functions": [[11, 
"introducing-the-covariance-and-correlation-functions"], [27, "introducing-the-covariance-and-correlation-functions"]], "Introduction": [[0, "introduction"], [6, "introduction"], [19, "introduction"], [20, "introduction"]], "Introduction to numerical projects": [[21, "introduction-to-numerical-projects"]], "Iterative Fitting, Classification and AdaBoost": [[10, "iterative-fitting-classification-and-adaboost"]], "Iterative Fitting, Regression and Squared-error Cost Function": [[10, "iterative-fitting-regression-and-squared-error-cost-function"]], "Kernel PCA": [[11, "kernel-pca"]], "Kernels and non-linearity": [[8, "kernels-and-non-linearity"]], "LU Decomposition, the inverse of a matrix": [[20, "lu-decomposition-the-inverse-of-a-matrix"]], "Lasso Regression": [[28, "lasso-regression"]], "Lasso case": [[28, "lasso-case"]], "Layers": [[1, "layers"]], "Layers used to build CNNs": [[3, "layers-used-to-build-cnns"]], "Learning goals": [[15, "learning-goals"], [16, "learning-goals"], [17, "learning-goals"], [18, "learning-goals"]], "Learning outcomes": [[19, "learning-outcomes"], [26, "learning-outcomes"]], "Lectures and ComputerLab": [[26, "lectures-and-computerlab"]], "Limitations of supervised learning with deep networks": [[1, "limitations-of-supervised-learning-with-deep-networks"]], "Linear Algebra, Handling of Arrays and more Python Features": [[20, null]], "Linear Regression": [[0, null]], "Linear Regression Problems": [[27, "linear-regression-problems"], [28, "linear-regression-problems"]], "Linear Regression and the SVD": [[28, "linear-regression-and-the-svd"]], "Linear Regression, basic elements": [[0, "linear-regression-basic-elements"]], "Linking Bayes\u2019 Theorem with Ridge and Lasso Regression": [[5, "linking-bayes-theorem-with-ridge-and-lasso-regression"]], "Linking the regression analysis with a statistical interpretation": [[5, "linking-the-regression-analysis-with-a-statistical-interpretation"]], "Linking with the SVD": [[5, "linking-with-the-svd"], [27, "linking-with-the-svd"]], "Links to relevant courses at the University of Oslo": [[25, "links-to-relevant-courses-at-the-university-of-oslo"]], "Logistic Regression": [[7, null], [7, "id1"]], "MNIST and GANs": [[4, "mnist-and-gans"]], "Machine Learning": [[26, "machine-learning"]], "Machine learning": [[19, "machine-learning"]], "Main textbooks": [[26, "main-textbooks"]], "Making a tree": [[9, "making-a-tree"]], "Making your own Bootstrap: Changing the Level of the Decision Tree": [[10, "making-your-own-bootstrap-changing-the-level-of-the-decision-tree"]], "Making your own test-train splitting": [[27, "making-your-own-test-train-splitting"]], "Material for exercises week 35": [[27, "material-for-exercises-week-35"]], "Material for lab sessions sessions Tuesday and Wednesday": [[28, "material-for-lab-sessions-sessions-tuesday-and-wednesday"]], "Material for lecture Monday September 2": [[28, "material-for-lecture-monday-september-2"]], "Mathematical Interpretation of Ordinary Least Squares": [[5, "mathematical-interpretation-of-ordinary-least-squares"], [27, "mathematical-interpretation-of-ordinary-least-squares"], [28, "mathematical-interpretation-of-ordinary-least-squares"]], "Mathematical optimization of convex functions": [[8, "mathematical-optimization-of-convex-functions"]], "Mathematics of CNNs": [[3, "mathematics-of-cnns"]], "Mathematics of the SVD and implications": [[5, "mathematics-of-the-svd-and-implications"], [27, "mathematics-of-the-svd-and-implications"], [28, "mathematics-of-the-svd-and-implications"]], 
"Matrices in Python": [[26, "matrices-in-python"]], "Matrix multiplication": [[1, "matrix-multiplication"]], "Matrix-vector notation and activation": [[12, "matrix-vector-notation-and-activation"]], "Meet the covariance!": [[23, "meet-the-covariance"]], "Meet the Covariance Matrix": [[5, "meet-the-covariance-matrix"], [27, "meet-the-covariance-matrix"]], "Meet the Hessian Matrix": [[27, "meet-the-hessian-matrix"]], "Meet the Pandas": [[26, "meet-the-pandas"]], "Min-Max Scaling": [[27, "min-max-scaling"]], "Momentum based GD": [[13, "momentum-based-gd"]], "More complicated Example: The Ising model": [[6, "more-complicated-example-the-ising-model"]], "More interpretations": [[27, "more-interpretations"], [28, "more-interpretations"], [28, "id5"]], "More on Dimensionalities": [[3, "more-on-dimensionalities"]], "More on Rescaling data": [[6, "more-on-rescaling-data"]], "More on Steepest descent": [[28, "more-on-steepest-descent"]], "More on convex functions": [[28, "more-on-convex-functions"]], "More preprocessing": [[27, "more-preprocessing"]], "Multilayer perceptrons": [[12, "multilayer-perceptrons"]], "Network requirements": [[2, "network-requirements"]], "Neural Networks vs CNNs": [[3, "neural-networks-vs-cnns"]], "Neural networks": [[12, null]], "Note about SVD Calculations": [[27, "note-about-svd-calculations"], [28, "note-about-svd-calculations"]], "Note on Scikit-Learn": [[28, "note-on-scikit-learn"]], "Numerical experiments and the covariance, central limit theorem": [[23, "numerical-experiments-and-the-covariance-central-limit-theorem"]], "Numpy and arrays": [[20, "numpy-and-arrays"], [26, "numpy-and-arrays"]], "Numpy examples and Important Matrix and vector handling packages": [[26, "numpy-examples-and-important-matrix-and-vector-handling-packages"]], "Optimization and gradient descent, the central part of any Machine Learning algortithm": [[28, "optimization-and-gradient-descent-the-central-part-of-any-machine-learning-algortithm"]], "Optimization, the central part of any Machine Learning algortithm": [[13, null]], "Optimizing our parameters": [[26, "optimizing-our-parameters"]], "Optimizing our parameters, more details": [[26, "optimizing-our-parameters-more-details"]], "Optimizing the cost function": [[1, "optimizing-the-cost-function"]], "Organizing our data": [[0, "organizing-our-data"], [26, "organizing-our-data"]], "Other Matrix and Vector Operations": [[20, "other-matrix-and-vector-operations"]], "Other Types of Recurrent Neural Networks": [[4, "other-types-of-recurrent-neural-networks"]], "Other courses on Data science and Machine Learning at UiO": [[26, "other-courses-on-data-science-and-machine-learning-at-uio"]], "Other courses on Data science and Machine Learning at UiO, contn": [[26, "other-courses-on-data-science-and-machine-learning-at-uio-contn"]], "Other popular texts": [[26, "other-popular-texts"]], "Other techniques": [[11, "other-techniques"]], "Other types of networks": [[12, "other-types-of-networks"]], "Other ways of visualizing the trees": [[9, "other-ways-of-visualizing-the-trees"]], "Our model for the nuclear binding energies": [[26, "our-model-for-the-nuclear-binding-energies"]], "Overview of first week": [[26, "overview-of-first-week"]], "Own code for Ordinary Least Squares": [[26, "own-code-for-ordinary-least-squares"], [27, "own-code-for-ordinary-least-squares"]], "PCA and scikit-learn": [[11, "pca-and-scikit-learn"]], "Pandas AI": [[26, "pandas-ai"]], "Part a : Ordinary Least Square (OLS) for the Runge function": [[21, 
"part-a-ordinary-least-square-ols-for-the-runge-function"]], "Part b: Adding Ridge regression for the Runge function": [[21, "part-b-adding-ridge-regression-for-the-runge-function"]], "Part c: Writing your own gradient descent code": [[21, "part-c-writing-your-own-gradient-descent-code"]], "Part d: Including momentum and more advanced ways to update the learning the rate": [[21, "part-d-including-momentum-and-more-advanced-ways-to-update-the-learning-the-rate"]], "Part e: Writing our own code for Lasso regression": [[21, "part-e-writing-our-own-code-for-lasso-regression"]], "Part f: Stochastic gradient descent": [[21, "part-f-stochastic-gradient-descent"]], "Part g: Bias-variance trade-off and resampling techniques": [[21, "part-g-bias-variance-trade-off-and-resampling-techniques"]], "Part h): Cross-validation as resampling techniques, adding more complexity": [[21, "part-h-cross-validation-as-resampling-techniques-adding-more-complexity"]], "Partial Differential Equations": [[2, "partial-differential-equations"]], "Plans for week 35": [[27, "plans-for-week-35"]], "Plans for week 36": [[28, "plans-for-week-36"]], "Practical tips": [[13, "practical-tips"]], "Practicalities": [[24, "practicalities"], [24, "id1"]], "Preamble: Note on writing reports, using reference material, AI and other tools": [[21, "preamble-note-on-writing-reports-using-reference-material-ai-and-other-tools"]], "Predicting New Points With A Trained Recurrent Neural Network": [[4, "predicting-new-points-with-a-trained-recurrent-neural-network"]], "Preprocessing our data": [[27, "preprocessing-our-data"]], "Prerequisites": [[26, "prerequisites"]], "Prerequisites and background": [[19, "prerequisites-and-background"]], "Prerequisites: Collect and pre-process data": [[3, "prerequisites-collect-and-pre-process-data"]], "Probability Distribution Functions": [[23, "probability-distribution-functions"]], "Program example for gradient descent with Ridge Regression": [[28, "program-example-for-gradient-descent-with-ridge-regression"]], "Program for stochastic gradient": [[13, "program-for-stochastic-gradient"]], "Project 1 on Machine Learning, deadline October 6 (midnight), 2025": [[21, null]], "Properties of PDFs": [[23, "properties-of-pdfs"]], "Pros and cons of trees, pros": [[9, "pros-and-cons-of-trees-pros"]], "Python installers": [[19, "python-installers"], [26, "python-installers"]], "RMS prop": [[13, "rms-prop"]], "Random Numbers": [[23, "random-numbers"]], "Random forests": [[10, "random-forests"]], "Randomized PCA": [[11, "randomized-pca"]], "Reading material": [[26, "reading-material"]], "Reading recommendations:": [[27, "reading-recommendations"]], "Reading suggestions week 34": [[26, "reading-suggestions-week-34"]], "Recurrent neural networks": [[12, "recurrent-neural-networks"]], "Recurrent neural networks: Overarching view": [[4, null]], "Reducing the number of degrees of freedom, overarching view": [[0, "reducing-the-number-of-degrees-of-freedom-overarching-view"], [27, "reducing-the-number-of-degrees-of-freedom-overarching-view"]], "Reformulating the problem": [[2, "reformulating-the-problem"]], "Regression Case": [[10, "regression-case"]], "Regression analysis and resampling methods": [[21, "regression-analysis-and-resampling-methods"]], "Regression analysis, overarching aims": [[26, "regression-analysis-overarching-aims"]], "Regression analysis, overarching aims II": [[26, "regression-analysis-overarching-aims-ii"]], "Regularization": [[1, "regularization"]], "Reminder from last week": [[27, 
"reminder-from-last-week"]], "Reminder on Newton-Raphson\u2019s method": [[28, "reminder-on-newton-raphson-s-method"]], "Reminder on Statistics": [[6, "reminder-on-statistics"]], "Replace or not": [[13, "replace-or-not"]], "Required Technologies": [[19, "required-technologies"]], "Resampling Methods": [[6, null]], "Resampling methods": [[6, "id1"]], "Residual Error": [[27, "residual-error"], [28, "residual-error"]], "Resources on differential equations and deep learning": [[2, "resources-on-differential-equations-and-deep-learning"]], "Revisiting Ordinary Least Squares": [[28, "revisiting-ordinary-least-squares"]], "Revisiting our Linear Regression Solvers": [[13, "revisiting-our-linear-regression-solvers"]], "Rewriting the Covariance and/or Correlation Matrix": [[27, "rewriting-the-covariance-and-or-correlation-matrix"]], "Rewriting the fitting procedure as a linear algebra problem": [[26, "rewriting-the-fitting-procedure-as-a-linear-algebra-problem"]], "Rewriting the fitting procedure as a linear algebra problem, more details": [[26, "rewriting-the-fitting-procedure-as-a-linear-algebra-problem-more-details"]], "Ridge Regression": [[28, "ridge-regression"]], "Ridge and LASSO Regression": [[27, "ridge-and-lasso-regression"], [28, "ridge-and-lasso-regression"], [28, "id2"]], "Ridge and Lasso Regression": [[5, null], [5, "id1"]], "SVD analysis": [[28, "svd-analysis"]], "Same code but now with momentum gradient descent": [[13, "same-code-but-now-with-momentum-gradient-descent"]], "Schedule first week": [[26, "schedule-first-week"]], "Schematic Regression Procedure": [[9, "schematic-regression-procedure"]], "Setting up the Back propagation algorithm": [[12, "setting-up-the-back-propagation-algorithm"]], "Setting up the Matrix to be inverted": [[27, "setting-up-the-matrix-to-be-inverted"], [28, "setting-up-the-matrix-to-be-inverted"]], "Setting up the network using Autograd; The full program": [[2, "setting-up-the-network-using-autograd-the-full-program"]], "Similar (second order function now) problem but now with AdaGrad": [[13, "similar-second-order-function-now-problem-but-now-with-adagrad"]], "Simple Python Code to read in Data and perform Classification": [[9, "simple-python-code-to-read-in-data-and-perform-classification"]], "Simple case": [[27, "simple-case"], [28, "simple-case"]], "Simple code for solving the above problem": [[28, "simple-code-for-solving-the-above-problem"]], "Simple example to illustrate Ordinary Least Squares, Ridge and Lasso Regression": [[28, "simple-example-to-illustrate-ordinary-least-squares-ridge-and-lasso-regression"]], "Simple geometric interpretation": [[28, "simple-geometric-interpretation"]], "Simple linear regression model using scikit-learn": [[0, "simple-linear-regression-model-using-scikit-learn"], [26, "simple-linear-regression-model-using-scikit-learn"]], "Simple one-dimensional second-order polynomial": [[18, "simple-one-dimensional-second-order-polynomial"]], "Simple program": [[28, "simple-program"]], "Software and needed installations": [[21, "software-and-needed-installations"], [26, "software-and-needed-installations"]], "Solving Differential Equations with Deep Learning": [[2, null]], "Solving the one dimensional Poisson equation": [[2, "solving-the-one-dimensional-poisson-equation"]], "Solving the wave equation with Neural Networks": [[2, "solving-the-wave-equation-with-neural-networks"]], "Some famous Matrices": [[20, "some-famous-matrices"]], "Some simple problems": [[13, "some-simple-problems"], [28, "some-simple-problems"]], "Some useful 
matrix and vector expressions": [[27, "some-useful-matrix-and-vector-expressions"]], "Splitting our Data in Training and Test data": [[0, "splitting-our-data-in-training-and-test-data"], [27, "splitting-our-data-in-training-and-test-data"]], "Standard steepest descent": [[13, "standard-steepest-descent"]], "Statistical analysis and optimization of data": [[19, "statistical-analysis-and-optimization-of-data"], [26, "statistical-analysis-and-optimization-of-data"]], "Steepest descent": [[13, "steepest-descent"], [28, "steepest-descent"]], "Stochastic Gradient Descent (SGD)": [[13, "stochastic-gradient-descent-sgd"]], "Stochastic variables and the main concepts, the discrete case": [[23, "stochastic-variables-and-the-main-concepts-the-discrete-case"]], "Support Vector Machines, overarching aims": [[8, null]], "Systematic reduction": [[3, "systematic-reduction"]], "Teachers": [[26, "teachers"]], "Teachers and Grading": [[24, null]], "Teaching Assistants Fall semester 2023": [[24, "teaching-assistants-fall-semester-2023"]], "Tentative deadllines for projects": [[24, "tentative-deadllines-for-projects"]], "Testing the Means Squared Error as function of Complexity": [[0, "testing-the-means-squared-error-as-function-of-complexity"], [27, "testing-the-means-squared-error-as-function-of-complexity"]], "Textbooks": [[25, null]], "The Algorithm before theorem": [[11, "the-algorithm-before-theorem"]], "The Breast Cancer Data, now with Keras": [[1, "the-breast-cancer-data-now-with-keras"]], "The CART algorithm for Classification": [[9, "the-cart-algorithm-for-classification"]], "The CART algorithm for Regression": [[9, "the-cart-algorithm-for-regression"]], "The CIFAR01 data set": [[3, "the-cifar01-data-set"]], "The Hessian matrix": [[28, "the-hessian-matrix"]], "The Hessian matrix for Ridge Regression": [[28, "the-hessian-matrix-for-ridge-regression"]], "The Jacobian": [[27, "the-jacobian"]], "The MNIST dataset again": [[3, "the-mnist-dataset-again"]], "The OLS case": [[28, "the-ols-case"]], "The RELU function family": [[1, "the-relu-function-family"]], "The Ridge case": [[28, "the-ridge-case"]], "The SVD, a Fantastic Algorithm": [[27, "the-svd-a-fantastic-algorithm"], [28, "the-svd-a-fantastic-algorithm"]], "The Softmax function": [[1, "the-softmax-function"]], "The \\chi^2 function": [[0, "the-chi-2-function"], [26, "the-chi-2-function"], [26, "id4"], [26, "id5"], [26, "id6"], [26, "id7"], [26, "id8"]], "The bias-variance tradeoff": [[6, "the-bias-variance-tradeoff"]], "The code for solving the ODE": [[2, "the-code-for-solving-the-ode"]], "The complete code with a simple data set": [[27, "the-complete-code-with-a-simple-data-set"]], "The cost/loss function": [[27, "the-cost-loss-function"]], "The course has two central parts": [[19, "the-course-has-two-central-parts"]], "The derivative of the cost/loss function": [[28, "the-derivative-of-the-cost-loss-function"]], "The equations": [[28, "the-equations"]], "The equations for ordinary least squares": [[27, "the-equations-for-ordinary-least-squares"]], "The first Case": [[28, "the-first-case"]], "The ideal": [[28, "the-ideal"]], "The logistic function": [[7, "the-logistic-function"]], "The mean squared error and its derivative": [[27, "the-mean-squared-error-and-its-derivative"]], "The moons example": [[8, "the-moons-example"]], "The multilayer perceptron (MLP)": [[12, "the-multilayer-perceptron-mlp"]], "The network with one input layer, specified number of hidden layers, and one output layer": [[2, 
"the-network-with-one-input-layer-specified-number-of-hidden-layers-and-one-output-layer"]], "The plethora of machine learning algorithms/methods": [[26, "the-plethora-of-machine-learning-algorithms-methods"]], "The sensitiveness of the gradient descent": [[28, "the-sensitiveness-of-the-gradient-descent"]], "The singular value decomposition": [[5, "the-singular-value-decomposition"], [27, "the-singular-value-decomposition"], [28, "the-singular-value-decomposition"]], "The two-dimensional case": [[8, "the-two-dimensional-case"]], "To our real data: nuclear binding energies. Brief reminder on masses and binding energies": [[26, "to-our-real-data-nuclear-binding-energies-brief-reminder-on-masses-and-binding-energies"]], "Topics covered in this course: Statistical analysis and optimization of data": [[26, "topics-covered-in-this-course-statistical-analysis-and-optimization-of-data"]], "Towards the PCA theorem": [[11, "towards-the-pca-theorem"]], "Train and test datasets": [[1, "train-and-test-datasets"]], "Two-dimensional Objects": [[3, "two-dimensional-objects"]], "Type of problem": [[2, "type-of-problem"]], "Types of Machine Learning": [[26, "types-of-machine-learning"]], "Useful Python libraries": [[19, "useful-python-libraries"], [26, "useful-python-libraries"]], "Using Autograd": [[13, "using-autograd"]], "Using forward Euler to solve the ODE": [[2, "using-forward-euler-to-solve-the-ode"]], "Using gradient descent methods, limitations": [[13, "using-gradient-descent-methods-limitations"], [28, "using-gradient-descent-methods-limitations"]], "Visualization": [[1, "visualization"], [1, "id1"]], "Visualizing the Tree, Classification": [[9, "visualizing-the-tree-classification"]], "Week 34: Introduction to the course, Logistics and Practicalities": [[26, null]], "Week 35: From Ordinary Linear Regression to Ridge and Lasso Regression": [[27, null]], "Week 36: Linear Regression and Gradient descent": [[28, null]], "What Is Generative Modeling?": [[26, "what-is-generative-modeling"]], "What does it mean?": [[27, "what-does-it-mean"], [28, "what-does-it-mean"]], "What is Machine Learning?": [[0, "what-is-machine-learning"]], "What is a good model?": [[0, "what-is-a-good-model"], [26, "what-is-a-good-model"]], "What is a good model? 
25, 27, 28], "licens": [0, 1, 19, 21, 26], "lie": [0, 6, 11, 23, 26, 27], "life": [0, 1, 8, 12, 26], "lifetim": 13, "like": [0, 1, 2, 3, 4, 5, 6, 7, 9, 10, 11, 12, 13, 15, 16, 19, 20, 21, 23, 26, 27, 28], "likelihood": [0, 1, 5, 9, 26, 27], "lim_": 23, "limit": [0, 5, 6, 8, 12, 20, 21, 26, 27], "lin_clf": 8, "lin_model": [], "lin_reg": 9, "linalg": [0, 2, 5, 6, 8, 11, 13, 17, 20, 23, 26, 27, 28], "line": [0, 3, 6, 8, 11, 13, 15, 16, 26, 28], "line1": 8, "line2": 8, "line3": 8, "line_model": 15, "line_ms": 15, "line_predict": 15, "linear": [1, 3, 5, 6, 7, 9, 10, 11, 12, 16, 17, 18, 19, 21, 23], "linear_model": [0, 5, 6, 7, 8, 9, 10, 11, 13, 15, 16, 26, 27, 28], "linear_regress": 6, "linearli": [5, 27, 28], "linearloc": [6, 13, 28], "linearregress": [0, 6, 7, 9, 15, 16, 26, 27], "linearsvc": 8, "lineat": 28, "liner": [1, 3], "linerar": 10, "linewidth": [0, 2, 4, 6, 8, 9, 10], "link": [0, 4, 9, 12, 15, 19, 21, 22, 24, 26], "linlag": 5, "linpack": [20, 26], "linreg": [0, 26], "linspac": [0, 2, 3, 4, 6, 8, 9, 10, 13, 16, 17, 20, 23, 26, 27], "linu": 4, "linux": [0, 1, 19, 21, 26], "liquid": [0, 26], "list": [1, 2, 3, 4, 9, 15, 19, 21, 26], "listedcolormap": [9, 10], "literatur": [1, 7, 14, 25], "littl": [1, 3, 9, 12], "live": [8, 16], "ll": [0, 18, 23, 26, 27], "lle": [0, 27], "lloyd": [4, 14], "lmb": [0, 2, 5, 6, 27, 28], "lmbd": [0, 1, 3, 26], "lmbd_val": [0, 1, 3, 26], "lmbda": [13, 28], "ln": [1, 13, 28], "load": [1, 4, 6, 7, 9, 10], "load_boston": [], "load_breast_canc": [1, 7, 9, 10, 11], "load_data": [3, 4], "load_digit": [1, 3], "load_iri": [8, 9], "loc": [3, 6, 7, 8, 9, 10, 26], "local": [0, 1, 3, 7, 12, 13, 15, 27, 28], "locat": [2, 3, 8, 15], "log": [0, 1, 2, 4, 5, 6, 7, 9, 10, 11, 13, 15, 20, 21, 26], "log10": [0, 5, 6, 27, 28], "log_": [0, 26], "log_clf": 10, "logarithm": [0, 5, 7, 17, 20, 26], "logbook": 21, "logic": [0, 1, 9, 26], "login": 15, "logist": [0, 1, 2, 8, 9, 10, 11, 12, 13, 19, 27, 28], "logisticregress": [7, 9, 10, 11], "logit": 7, "logreg": [7, 9, 10, 11], "logspac": [0, 1, 3, 5, 6, 26, 27, 28], "long": [0, 1, 3, 4, 12, 13, 26, 28], "longer": [2, 3, 8, 10, 14, 20, 23, 26], "loocv": 6, "look": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 15, 16, 20, 21, 23, 26, 27, 28], "loop": [1, 4, 6, 10, 12, 14, 16, 17, 18, 19, 20, 26], "lose": 1, "loss": [0, 1, 3, 4, 5, 6, 7, 8, 10, 11, 13, 18, 20, 21, 26], "loss_fil": 4, "lossfil": 4, "lost": 4, "lot": [1, 4, 6, 16], "low": [0, 6, 9, 10, 11, 21, 26, 27], "lower": [0, 1, 3, 6, 9, 10, 16, 20, 27], "lowercas": [20, 26], "lowest": [9, 13, 23], "lr": [1, 3, 4, 10], "lstat": [], "lstm": 4, "lstm_2layer": 4, "lstsq": [0, 26, 27], "lt": 6, "lu": [0, 5, 26, 27, 28], "lubksb": 20, "luckili": 2, "ludcmp": 20, "lux": 20, "lvert": 1, "lw": [0, 26], "m": [0, 1, 2, 3, 5, 6, 8, 9, 10, 11, 12, 13, 15, 18, 20, 23, 24, 25, 26, 27, 28], "m_": [9, 12], "m_1": 14, "m_h": [0, 26], "m_k": 14, "m_l": 12, "m_n": [0, 26], "m_p": [0, 26], "m_t": 13, "ma": 11, "machin": [1, 3, 4, 5, 6, 7, 9, 10, 11, 12, 15, 16, 20, 25, 27], "machinelearn": [0, 6, 16, 19, 21, 22, 24, 25, 26, 27, 28], "mackai": 25, "made": [0, 1, 3, 4, 5, 6, 7, 9, 11, 12, 21, 26, 27], "mae": [0, 26], "magic": 4, "magnitud": [1, 6, 7, 13, 27], "mai": [0, 1, 2, 3, 5, 6, 7, 8, 9, 11, 12, 13, 19, 20, 21, 23, 26, 27, 28], "mail": [22, 24], "main": [0, 1, 3, 4, 5, 6, 7, 9, 20, 21, 25, 27, 28], "mainli": [0, 5, 6, 7, 9, 26, 27], "maintain": 6, "major": [1, 6, 9, 10, 13, 20, 26, 28], "make": [1, 2, 3, 4, 5, 6, 7, 8, 11, 12, 13, 15, 16, 18, 19, 20, 21, 23, 25, 26, 28], "make_axes_locat": 6, "make_moon": 
[8, 9, 10], "make_pipelin": [0, 6, 10, 27], "makedir": [0, 6, 7, 9, 26], "malcondit": 20, "malign": [1, 7, 9], "mammographi": 5, "manag": [0, 2, 3, 15, 19, 21, 26], "mandatori": [24, 26], "mani": [0, 1, 3, 4, 5, 6, 7, 8, 9, 11, 13, 14, 15, 16, 17, 18, 19, 20, 21, 23, 25, 26, 27, 28], "manifold": 11, "manner": 3, "manual": [6, 27], "map": [0, 1, 2, 6, 7, 8, 11, 12, 14, 23, 26], "marc": 27, "margin": [0, 5, 8], "marit": [0, 26], "mark": 26, "marker": [7, 20, 26], "markov": [19, 26], "marsaglia": 23, "mass": [0, 1, 5, 13, 27, 28], "massag": [0, 26], "masses2016": [0, 26], "masses2016ol": [0, 26], "masses2016tre": 0, "masseval2016": [0, 26], "master": [22, 24], "mat": [19, 26], "mat1100": [19, 26], "mat1110": [19, 26], "mat1120": [19, 26], "match": [1, 4, 5, 13, 14, 15, 27, 28], "materi": [4, 5, 7, 13, 15, 20, 22, 24], "math": [3, 7, 12, 13, 20, 23, 25, 26], "mathbb": [0, 4, 5, 6, 7, 8, 11, 12, 13, 14, 17, 20, 21, 23, 26, 27, 28], "mathbf": [0, 5, 6, 7, 8, 13, 20, 21, 26, 27, 28], "mathcal": [1, 5, 6, 7, 13, 21], "matheemat": 3, "mathemat": [0, 6, 11, 12, 13, 19, 20, 23, 25, 26], "mathemati": 26, "mathrm": [0, 1, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 17, 18, 21, 23, 26, 27, 28], "matmul": [1, 2, 5], "matnat": 25, "matplotlib": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 17, 19, 20, 21, 23, 26, 27, 28], "matric": [0, 1, 3, 4, 6, 7, 8, 11, 13, 16, 17, 19, 27, 28], "matrix": [0, 2, 3, 4, 6, 7, 8, 10, 13, 17, 18, 21, 23], "matshow": 1, "matter": [2, 3, 13, 27, 28], "max": [0, 1, 2, 3, 4, 9, 10, 12, 13, 24, 26, 28], "max_depth": [0, 9, 10], "max_diff": 2, "max_diff1": 2, "max_diff2": 2, "max_it": [0, 1, 8, 13, 26], "max_iter": 14, "max_leaf_nod": 10, "max_sampl": 10, "maxdegre": [0, 6, 10, 27], "maxdepth": 10, "maxim": [1, 4, 5, 7, 8, 11], "maximum": [0, 2, 3, 5, 7, 8, 9, 10, 13, 14, 26, 27, 28], "maxpolydegre": [5, 6, 27, 28], "maxpooling2d": 3, "mbox": [5, 6, 27, 28], "mcculloch": 12, "md": 11, "mdoel": 4, "mean": [1, 2, 3, 4, 5, 6, 7, 9, 10, 11, 12, 13, 14, 15, 17, 18, 19, 20, 21, 23, 26], "mean_absolute_error": [0, 26], "mean_divisor": 14, "mean_i": 23, "mean_matrix": 14, "mean_squared_error": [0, 4, 6, 7, 10, 15, 26, 27], "mean_squared_log_error": [0, 26], "mean_vector": 14, "mean_x": 23, "meaning": [0, 4, 7, 26], "meansquarederror": [0, 26], "meant": [3, 7, 10, 13], "measur": [0, 1, 2, 5, 6, 9, 11, 12, 14, 16, 18, 21, 23, 26, 27], "mechan": [0, 4, 23, 26], "median": [0, 26, 27], "medicin": 12, "medium": [4, 8, 13], "medv": [], "meet": [0, 24], "mehta": [0, 26, 27, 28], "memori": [3, 4, 11, 12, 13, 18, 20], "mention": [0, 12, 13, 21, 23, 26, 28], "mere": [0, 21], "meshgrid": [2, 5, 6, 8, 9, 10, 11], "mess": 15, "messag": [5, 13], "messi": 2, "met": [0, 3, 8, 27], "meteorolog": 9, "meter": [6, 27], "method": [0, 1, 2, 3, 4, 5, 7, 8, 11, 12, 14, 15, 16, 17, 18, 19, 20, 23, 25, 27], "metion": 6, "metric": [0, 1, 3, 6, 7, 9, 10, 14, 15, 26, 27], "metropoli": [19, 26], "mev": [0, 23, 26], "mgd": 13, "mglearn": [19, 26], "mgrid": 13, "mhjensen": [], "mi": 10, "mia": [24, 26], "microsoft": 25, "mid": 1, "midel": 4, "midnight": 15, "midpoint": 9, "might": [0, 1, 2, 4, 6, 9, 13, 15, 17, 18, 27, 28], "migth": 17, "mild": 9, "millimet": [6, 27], "million": [0, 26, 27], "mimic": 12, "min": [0, 2, 5, 8, 9, 28], "min_": [0, 2, 5, 14, 17, 26, 27, 28], "min_samples_leaf": 9, "mind": [0, 6, 13, 15, 18, 26, 27, 28], "mindboard": 4, "mine": [19, 26], "mini": [1, 11, 12, 13, 28], "minibatch": [1, 11, 13], "minibathc": 13, "miniforge3": [], "minim": [0, 1, 2, 3, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 16, 
17, 27, 28], "minima": [0, 1, 7, 13, 26, 28], "minimum": [0, 1, 2, 6, 8, 9, 11, 13, 27, 28], "minmaxscal": [0, 27], "minor": 23, "minst": 1, "minu": 7, "mirjalili": 26, "mirror": 9, "misc": 6, "misclassif": [8, 9, 10], "misclassifi": [8, 10], "miser": 0, "mismatch": 1, "miss": [7, 10], "mistak": 4, "mit": 25, "mix": [1, 2, 26], "mixtur": 13, "mk": [9, 20], "mkdir": [0, 6, 7, 9, 26], "ml": [0, 1, 10, 13, 20, 21, 27, 28], "mlab": 23, "mle": [5, 7], "mlp": 1, "mlpclassifi": 1, "mlpregressor": [0, 26], "mm": 20, "mml": 27, "mn": [12, 23], "mnist": [1, 11], "mod": 23, "mode": [22, 24, 26], "model": [2, 3, 5, 7, 8, 9, 10, 11, 13, 14, 16, 18, 19, 21, 23, 25, 27, 28], "model_select": [0, 1, 3, 5, 6, 7, 9, 10, 11, 15, 16, 17, 26, 27, 28], "moder": 10, "modern": [0, 6, 7, 19, 26], "modif": [2, 12, 13], "modifi": [0, 1, 3, 5, 7, 8, 10, 12, 13, 26, 27, 28], "modul": [0, 16, 20, 26], "modular": 23, "modulo": 23, "moe": [11, 27], "moment": [5, 6, 13, 23], "mondai": [24, 26], "monitor": [13, 18], "monoton": [5, 12, 23], "mont": [0, 6, 19, 23, 25, 26], "montli": 16, "moor": [5, 6], "more": [0, 1, 2, 4, 5, 7, 8, 9, 10, 11, 12, 13, 14, 16, 17, 19, 23], "moreov": [0, 3], "morten": [24, 26, 27, 28], "mortenhj": 26, "most": [0, 1, 3, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 19, 21, 23, 26, 27, 28], "mostli": [1, 11, 18], "motion": [0, 13], "motiv": [1, 4], "move": [0, 4, 5, 6, 7, 9, 12, 13, 14, 15, 16, 21, 23, 27, 28], "mpl": [7, 26], "mpl_toolkit": [2, 6, 13, 28], "mplot3d": [2, 6, 13, 28], "mplregressor": 1, "mse": [0, 4, 5, 6, 9, 10, 15, 16, 17, 18, 21, 26, 27, 28], "mse_simpletre": 10, "mselassopredict": [5, 28], "mselassotrain": [5, 28], "mseownridgepredict": [6, 27, 28], "msepredict": [5, 28], "mseridgepredict": [0, 5, 6, 27, 28], "msetrain": [5, 28], "msle": [0, 26], "mt": [7, 12], "mu": [0, 6, 11, 13, 23, 26], "mu0": 23, "mu1": 23, "mu2": 23, "mu_": [6, 23, 27], "mu_i": [6, 27], "mu_n": 11, "mu_x": 23, "much": [0, 1, 2, 3, 4, 5, 6, 8, 9, 10, 11, 12, 13, 15, 20, 21, 23, 26, 27, 28], "multi": [0, 1, 3, 7, 19, 26], "multiclass": [1, 7], "multidimension": [11, 12, 26], "multilay": 1, "multinomi": 7, "multipl": [2, 4, 5, 6, 7, 12, 13, 15, 23, 27, 28], "multipli": [3, 5, 6, 11, 13, 18, 20, 23, 27, 28], "multiplum": 8, "multivari": [0, 2, 10, 11, 19, 23, 26], "multivariate_norm": [11, 14], "multpli": 16, "murphi": [11, 25, 26], "must": [1, 2, 5, 6, 8, 10, 12, 13, 14, 15, 23, 27, 28], "mutat": 7, "mutual": [1, 3, 6, 13], "mx_": 23, "my": 26, "myenv": [], "myriad": [0, 19, 26], "mz1": 23, "mz2": 23, "m\u00f8svatn": 6, "n": [0, 1, 2, 3, 4, 5, 6, 7, 8, 10, 11, 12, 13, 14, 15, 16, 17, 18, 20, 21, 23, 26, 27, 28], "n1": 20, "n2": 20, "n_": [1, 2, 3, 8, 12, 23], "n_0": [12, 23], "n_boostrap": [6, 10], "n_bootstrap": 6, "n_categori": [1, 3], "n_cluster": 14, "n_compon": 11, "n_epoch": 13, "n_estim": 10, "n_examples_to_gener": 4, "n_featur": [1, 18], "n_filter": 3, "n_hidden": 2, "n_hidden_neuron": [0, 1, 26], "n_i": 23, "n_input": [0, 1, 3, 27], "n_instanc": 9, "n_job": 10, "n_k": 14, "n_l": [12, 23], "n_layer": 1, "n_m": 9, "n_neuron": 1, "n_neurons_connect": 3, "n_neurons_layer1": 1, "n_neurons_layer2": 1, "n_point": 14, "n_sampl": [6, 8, 9, 10, 14, 18], "n_split": 6, "n_step": 4, "n_t": 2, "n_x": 2, "nabla": [1, 13, 28], "nabla_": [2, 13, 28], "nabla_w": 13, "nag": 13, "naimi": [0, 26], "naiv": 7, "naive_kmean": 14, "name": [0, 1, 3, 4, 5, 6, 7, 8, 9, 10, 12, 13, 14, 15, 19, 20, 21, 23, 24, 26, 27, 28], "narrow": 13, "nation": [1, 5], "nativ": [19, 26], "natur": [0, 1, 4, 8, 9, 12, 13, 21, 23, 25, 26, 28], 
"navier": 12, "navig": 15, "nb": 23, "nb_": 20, "nbconvert": 26, "nd": 14, "ndarrai": 6, "ne": [9, 10, 20, 23, 27, 28], "nearest": [1, 3, 6, 11], "nearli": [13, 28], "neat": 26, "neccesari": 6, "necess": 2, "necessari": [0, 1, 3, 4, 8, 14, 18, 26], "necessarili": [0, 4, 11, 23, 26], "necesserali": 5, "neck": 7, "need": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 20, 23, 27, 28], "neg": [0, 1, 3, 5, 6, 7, 10, 13, 20, 23, 26, 28], "neg_mean_squared_error": 6, "neglect": 23, "neglig": 23, "neighbor": [3, 6, 11], "neither": [4, 13], "neq": [13, 14, 23, 28], "nervou": 12, "nest": [9, 12], "nesterov": 13, "net": [2, 4, 12], "netlib": [20, 26], "network": [0, 9, 13, 19, 25, 27], "neural": [0, 13, 19, 25, 27], "neural_network": [0, 1, 2, 26], "neuralnetwork": 1, "neuron": [1, 2, 3, 4, 12], "neutral": [0, 26], "neutron": [0, 26], "never": [1, 4, 6, 9, 23], "new": [0, 1, 2, 3, 5, 6, 7, 8, 9, 10, 11, 13, 14, 15, 17, 20, 26, 27, 28], "new_chang": 13, "new_hobbit": 26, "newaxi": [0, 3, 6, 9], "newli": [0, 26], "newton": [1, 7, 8, 13, 23], "next": [0, 1, 2, 3, 4, 5, 6, 8, 9, 13, 14, 15, 16, 26, 27, 28], "next_guess": 13, "next_input": 4, "ng": 1, "ni": 14, "nice": [0, 1, 5, 11, 26, 27, 28], "nicer": 18, "niter": [13, 28], "nitric": [], "nlambda": [0, 5, 6, 27, 28], "nlp": 25, "nm": 23, "nm_n": [0, 26], "nmse": 6, "nn": [2, 5, 6, 12, 20, 26], "nn_model": 1, "nnmin": 2, "node": [1, 3, 9, 10, 12], "nois": [0, 4, 5, 6, 8, 9, 10, 13, 18, 21, 26, 27, 28], "noise_dimens": 4, "noisi": [1, 6, 21], "non": [0, 1, 3, 5, 6, 7, 9, 10, 11, 12, 13, 14, 18, 20, 23, 26, 27, 28], "none": [0, 1, 2, 4, 5, 9, 10, 13, 23, 26, 27], "nonlinear": [3, 6, 8, 9, 11, 12], "nonneg": [6, 9, 13, 28], "nonparametr": 6, "nonsens": 23, "nonsingular": 20, "nonumb": [3, 7, 8, 13, 20], "nor": [1, 4, 13], "norm": [0, 1, 5, 6, 8, 11, 13, 18, 26, 27, 28], "normal": [3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 16, 17, 18, 19, 20, 21, 23, 26, 27, 28], "normali": [20, 26], "norwai": [6, 21, 26, 28], "notat": [0, 2, 5, 6, 13, 14, 23, 26, 27, 28], "note": [0, 1, 2, 3, 4, 5, 6, 7, 8, 11, 12, 13, 14, 15, 16, 19, 20, 23, 25, 26], "notebook": [0, 1, 3, 9, 15, 16, 19, 21, 26], "noth": [1, 2, 5, 8, 12, 14, 23, 27, 28], "notic": [4, 5, 12, 13, 20, 23, 26], "notion": 3, "novel": [3, 6, 10, 26], "novemb": [1, 24, 26], "now": [0, 2, 4, 5, 6, 7, 8, 10, 11, 12, 14, 15, 16, 19, 20, 21, 23, 26, 27], "nowadai": [0, 1, 3, 9, 19, 26], "nox": [], "np": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 14, 15, 16, 17, 18, 20, 23, 26, 27, 28], "npr": 2, "nsampl": 6, "nt": 2, "nu": 23, "nuclear": [5, 27, 28], "nuclei": [0, 23, 26], "nucleon": [0, 26], "nucleu": [0, 26], "num": 4, "num_coordin": 2, "num_hidden_neuron": 2, "num_it": [2, 18], "num_neuron": 2, "num_neurons_hidden": 2, "num_point": 2, "num_tre": 10, "num_valu": 2, "number": [1, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 16, 17, 18, 20, 21, 22, 24, 26, 28], "numberid": 7, "numberparamet": 3, "numer": [0, 5, 6, 9, 10, 11, 12, 13, 19, 20, 25, 26, 27, 28], "numpi": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 21, 23, 27, 28], "nunmpi": [5, 27], "nve_frngahw": 28, "nx": 2, "ny": 23, "o": [0, 1, 4, 5, 6, 7, 8, 9, 11, 20, 24, 25, 26, 27, 28], "obei": [6, 11, 13, 27], "object": [0, 1, 4, 8, 10, 15, 20, 26], "obliqu": [5, 27, 28], "observ": [0, 1, 3, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 23, 26, 28], "obtain": [0, 1, 5, 6, 7, 8, 9, 10, 12, 13, 14, 17, 20, 21, 23, 26, 27, 28], "obviou": [5, 6, 11, 23, 27, 28], "obviouli": 26, "obvious": [0, 4, 5, 6, 20, 26], "oc": [27, 28], "occupi": [], "occur": 
[0, 6, 8, 9, 20, 23, 26], "octob": [24, 26], "od": 0, "odd": [0, 3, 7, 26, 27], "odenum": 2, "odesi": 2, "oen": 0, "off": [1, 3, 4, 5, 9, 13, 23], "offer": [6, 11, 19, 20, 22, 24, 26], "offic": [24, 26], "offici": [22, 26], "often": [0, 1, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 19, 20, 21, 23, 26, 27, 28], "ofter": [20, 26], "ol": [0, 13, 17, 27], "old": [1, 5, 10, 13, 15, 18], "ols_paramet": 16, "ols_sk": 6, "ols_svd": 6, "olsbeta": 28, "olstheta": [0, 5], "omega": [2, 3, 6], "omega_0": 3, "omit": [0, 5, 26, 27, 28], "onc": [1, 6, 9, 11, 13], "one": [0, 1, 3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 14, 15, 19, 20, 21, 23, 24, 26, 27], "onehot": 1, "onehot_vector": 1, "onehotencod": 9, "ones": [0, 2, 5, 6, 8, 9, 10, 11, 13, 16, 18, 20, 21, 26, 27, 28], "ones_lik": 4, "ong": 27, "onl": 3, "onli": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 17, 18, 20, 21, 23, 26, 27, 28], "onlin": [11, 15, 22], "onto": [5, 11, 27, 28], "open": [0, 1, 4, 6, 7, 9, 15, 19, 21, 22, 24, 26], "oper": [0, 1, 3, 5, 6, 10, 11, 12, 13, 15, 16, 19, 23, 26, 27, 28], "operation": 23, "oplu": 23, "opmiz": 13, "opportun": 0, "oppos": [6, 13], "opposit": [1, 5, 8, 27, 28], "opt": [1, 5, 21, 26, 28], "optim": [0, 2, 3, 4, 5, 6, 7, 9, 10, 11, 14, 16, 17, 21], "optimis": [1, 3], "option": [0, 1, 3, 5, 6, 8, 11, 15, 18, 20, 27], "optmiz": [1, 8, 13, 27], "oral": 26, "orang": 0, "order": [0, 1, 2, 3, 5, 6, 7, 8, 9, 10, 11, 12, 15, 20, 21, 23, 26, 27, 28], "ordinari": [0, 2, 3, 7, 11, 13, 17, 18, 19], "oreilli": [25, 26], "org": [0, 3, 4, 16, 19, 20, 21, 25, 26, 27, 28], "organ": [6, 7, 10, 20], "orient": [1, 5, 23, 27, 28], "origin": [0, 3, 5, 6, 8, 11, 12, 13, 15, 20, 26, 27, 28], "orthogn": [5, 27, 28], "orthogon": [0, 5, 6, 8, 11, 13, 20, 26, 27, 28], "orthonorm": [5, 27, 28], "os": [24, 26], "oscar": 1, "oscil": [3, 13], "oskar": 26, "oskarlei": 26, "osl": 18, "oslo": [0, 19, 21, 22, 24, 26, 27, 28], "osx": [0, 19, 21, 26], "other": [0, 1, 2, 3, 5, 6, 7, 8, 10, 13, 14, 16, 19, 22, 23, 24, 25, 27, 28], "otherwis": [0, 1, 4, 7, 13, 20, 26], "ouput": [5, 7, 12], "our": [1, 2, 3, 6, 7, 8, 9, 10, 12, 14, 15, 16, 17, 18, 19, 20, 23], "ourmodel": 0, "ourselv": [0, 5, 6, 8, 11, 13, 26, 27, 28], "out": [0, 1, 2, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 15, 16, 19, 20, 21, 23, 26, 27], "out_fil": 9, "outcom": [0, 7, 9, 10, 12, 23, 27], "outdoor": 9, "outer": [6, 12, 13], "outfil": 4, "outlier": [0, 8, 26, 27], "outlin": [6, 10, 11], "outlook": 9, "outperform": 10, "output": [0, 1, 3, 4, 5, 6, 7, 8, 9, 10, 12, 13, 20, 21, 23, 26, 27, 28], "output_bia": 1, "output_bias_gradi": 1, "output_shap": 4, "output_weight": 1, "output_weights_gradi": 1, "outputlayer1": 12, "outputlayer2": 12, "outsid": 4, "over": [0, 1, 3, 4, 5, 6, 9, 10, 12, 13, 15, 16, 20, 21, 26, 27, 28], "over1": 13, "overal": [1, 10], "overcast": 9, "overcom": [12, 13], "overdetermin": [0, 26], "overfit": [0, 1, 3, 6, 9, 10, 13], "overflow": 5, "overhead": 12, "overlap": [3, 7, 8, 9], "overlin": [0, 5, 6, 9, 10, 11, 14, 20, 26, 27], "overst": 0, "overtrain": 4, "overview": 3, "own": [4, 5, 6, 8, 12, 13, 16, 18, 19, 20, 28], "owner": [], "ownmsepredict": 0, "ownmsetrain": 0, "ownridgebeta": 27, "ownridgetheta": [0, 6, 27, 28], "ownypredictridg": 0, "ownytilderidg": 0, "oxid": [], "p": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 14, 15, 16, 17, 18, 20, 23, 26, 27, 28], "p0": 2, "p1": 2, "p_": [2, 4, 8, 9], "p_hidden": 2, "p_i": [5, 23], "p_j": 23, "p_n": 23, "p_output": 2, "p_x": 23, "pack": [0, 26], "packag": [0, 1, 3, 4, 5, 8, 11, 13, 15, 19, 21, 23, 27, 28], "packtpub": 
26, "packtpublish": 26, "pad": [3, 4], "page": [0, 19, 26, 28], "pai": [0, 1, 9, 13, 15], "pair": [0, 2, 3, 9, 19, 23, 26], "paltform": 15, "panda": [0, 4, 5, 6, 7, 9, 11, 19, 21, 28], "panel": 26, "paper": 1, "paradigm": [0, 26], "parallel": [10, 13, 19, 20, 26], "param": 2, "paramat": 2, "paramet": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 13, 16, 17, 18, 21, 23, 28], "parameter": [0, 6, 10, 26, 27], "parametr": [0, 6, 26, 27], "paramt": [3, 5], "part": [0, 1, 3, 5, 6, 10, 17, 20, 22, 23, 24, 26, 27], "partial": [0, 1, 5, 6, 7, 8, 10, 11, 12, 13, 16, 23, 26, 27, 28], "particip": [15, 19, 22, 24, 26], "particl": [0, 4, 13, 23, 26], "particular": [0, 1, 2, 3, 5, 6, 9, 10, 11, 12, 13, 16, 21, 23, 25, 26, 27, 28], "particularli": [5, 6, 8, 11, 13, 23, 27, 28], "partit": [1, 4, 9], "partli": [6, 26], "partner": 15, "pass": [2, 3, 12, 14], "password": 21, "past": [10, 23], "patch": [6, 23], "path": [0, 4, 6, 7, 9, 19, 26], "pathcollect": 17, "patient": 7, "patter": 4, "pattern": [0, 3, 4, 12, 25, 26], "pauli": [0, 26], "pc": [11, 15, 19], "pca": [0, 7, 19, 26, 27], "pd": [0, 4, 5, 6, 7, 9, 11, 26, 27, 28], "pde": 2, "pdf": [0, 3, 4, 5, 6, 9, 15, 16, 21, 25, 26], "pedagog": [0, 26, 27], "penal": [6, 18, 27], "penalti": [6, 13, 18, 21, 27], "penros": [5, 6], "pentagon": [13, 28], "peopl": [1, 9, 13, 19], "per": [0, 1, 6, 22, 24, 26], "percentag": [10, 11, 24], "perceptron": [0, 1, 7, 26], "peregrin": 26, "perfect": [0, 1, 13, 26], "perfectli": [4, 6], "perform": [0, 2, 3, 4, 5, 6, 8, 10, 11, 12, 13, 14, 16, 18, 19, 20, 21, 23, 26, 27, 28], "performac": 4, "perhap": [0, 5, 13, 26, 27, 28], "perimet": 1, "period": [1, 4, 23], "permiss": 15, "permut": 11, "persist": 13, "person": [5, 6, 7, 16, 22, 24, 26, 27], "perspect": 25, "pertin": [12, 26], "petal": [8, 9], "peter": [25, 27], "phantom": 23, "phase": [6, 12], "phenomena": 23, "phi": 8, "phi_k": 8, "philosophi": 13, "phone": [24, 26], "photo": [4, 26], "php": 21, "phrase": [0, 26], "physic": [0, 1, 4, 7, 12, 13, 23, 24, 25, 26, 27, 28], "pi": [2, 3, 5, 6, 7, 9, 12, 13, 23], "pick": [1, 9, 10, 11, 13, 14], "pickl": 1, "pictur": [0, 26], "pie": [19, 26], "piec": [11, 14], "pillow": [0, 19, 21, 26], "pinv": [5, 6, 13, 21, 27, 28], "pip": [0, 1, 15, 19, 21, 26], "pip3": [0, 1, 21, 26], "pipelin": [0, 6, 8, 10, 27], "pippin": 26, "pit": 4, "pitfal": [6, 27], "pitt": 12, "pixel": [1, 3, 4, 26], "pixel_height": [1, 3], "pixel_width": [1, 3], "place": [0, 4, 6, 8, 13, 15, 20, 21, 26, 28], "plai": [0, 3, 4, 5, 6, 8, 11, 18, 19, 21, 26, 27, 28], "plain": [8, 10, 12, 13, 14, 28], "plan": [6, 9, 24, 25, 26], "plane": [8, 9], "plateau": [5, 28], "platform": [19, 26], "plausibl": 12, "pleas": [13, 21, 24, 26], "plenti": 1, "plethora": [3, 12], "plot": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 16, 17, 18, 19, 20, 21, 23, 26, 27, 28], "plot_all_sc": [21, 27], "plot_confusion_matrix": [7, 10], "plot_count": 6, "plot_cumulative_gain": [7, 10], "plot_data": 1, "plot_dataset": 8, "plot_decision_boundari": [9, 10], "plot_import": 10, "plot_max": 4, "plot_min": 4, "plot_model": 4, "plot_numb": 4, "plot_predict": 8, "plot_regression_predict": 9, "plot_result": 4, "plot_roc": [7, 10], "plot_surfac": [2, 6, 13], "plot_train": 9, "plot_tre": [9, 10], "plt": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 17, 20, 23, 26, 27, 28], "plu": [0, 3, 5, 7, 18, 26, 27], "pm": 8, "pmatrix": 2, "pml": 25, "pn": 3, "png": [0, 4, 6, 7, 9, 26], "point": [0, 1, 2, 3, 5, 6, 7, 8, 9, 10, 11, 13, 14, 18, 20, 21, 23, 24, 26, 27, 28], "point_1": 4, "point_2": 4, "poisson": [19, 
23, 26], "poli": [6, 8], "poly100_kernel_svm_clf": 8, "poly3": 0, "poly3_plot": 0, "poly_featur": [8, 9, 15], "poly_features10": 9, "poly_fit": 9, "poly_fit10": 9, "poly_kernel_svm_clf": 8, "poly_model": 15, "poly_ms": 15, "poly_predict": 15, "polydegre": [0, 5, 6, 10, 27], "polygon": [13, 28], "polym": 12, "polymi": 21, "polynomi": [0, 5, 6, 7, 8, 9, 10, 11, 15, 17, 21, 26, 27], "polynomial_featur": [6, 15, 16, 17], "polynomial_svm_clf": 8, "polynomialfeatur": [0, 6, 8, 9, 15, 16, 27], "polytrop": [0, 6], "pool": 3, "pool_siz": 3, "poor": [1, 13, 28], "poorli": [0, 27], "popul": [0, 5, 26, 27], "popular": [0, 1, 3, 6, 7, 8, 9, 11, 12, 15, 19, 20, 21, 23, 27], "popularli": [0, 26], "portabl": 10, "portion": [11, 13], "pose": [0, 4, 5, 6, 11, 23, 26], "posit": [0, 1, 2, 3, 5, 7, 8, 10, 11, 13, 14, 20, 23, 26, 27, 28], "possibl": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 19, 20, 21, 23, 24, 26, 27, 28], "possibli": [6, 8, 13, 21], "posterior": 5, "postpon": [0, 27], "postscript": 21, "postul": 5, "potenti": [0, 3, 5, 6, 12, 13, 27], "pott": 12, "power": [0, 1, 5, 6, 8, 9, 12, 13, 26, 27, 28], "pp": [5, 6], "practic": [0, 5, 6, 7, 8, 16, 18, 21, 23, 27], "practition": [0, 1, 3, 26], "pre": 26, "preced": [1, 11, 12, 23], "preceed": 4, "preceq": 8, "precis": [0, 2, 5, 11, 13, 20, 21, 23, 26, 27], "pred": 6, "predicit": 0, "predict": [0, 1, 5, 6, 7, 8, 9, 10, 15, 16, 17, 18, 19, 21, 25, 26, 27, 28], "predict_prob": 1, "predict_proba": [7, 10], "predictor": [0, 5, 6, 7, 9, 10, 11, 26, 27], "prefer": [0, 1, 6, 8, 9, 11, 13, 15, 19, 21, 26], "prepar": [0, 6, 20, 21, 26, 27], "preprocess": [0, 4, 6, 7, 8, 9, 10, 11, 15, 16, 17, 18, 21], "prerequisit": 0, "prescript": 21, "presenc": 13, "present": [0, 5, 6, 7, 9, 12, 13, 20, 21, 23, 26, 27, 28], "preserv": [3, 11, 20], "press": [13, 15, 25, 28], "pretrain": [1, 4], "pretti": [0, 4, 8, 9, 19, 21, 26], "prev_centroid": 14, "prevent": [13, 23], "previou": [0, 1, 2, 3, 4, 5, 6, 8, 10, 11, 12, 13, 15, 16, 20, 21, 23, 27, 28], "previous": [2, 3, 9, 10, 23], "price": [0, 4, 9, 13], "primal": 8, "primari": [0, 7, 26], "prime": 23, "princip": [0, 5, 7, 19, 26, 27, 28], "principl": [0, 6, 7, 8, 14, 26], "print": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 14, 18, 20, 23, 26, 27, 28], "print_funct": [8, 9], "printout": [0, 26], "prior": [0, 5, 6, 26], "privat": 0, "prob": [1, 23], "probabilist": [0, 25, 26, 27], "probabl": [0, 1, 3, 4, 6, 7, 10, 13, 19, 26, 27], "problem": [0, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 17, 19, 20, 21, 23], "probml": 25, "proce": [0, 5, 6, 7, 8, 9, 10, 11, 13, 20, 26, 27], "procedur": [2, 4, 5, 6, 8, 10, 11, 13, 27, 28], "proceed": 20, "process": [0, 2, 4, 6, 9, 10, 12, 13, 19, 20, 21, 23, 25, 26, 28], "prod": 25, "prod_": [1, 5, 7], "produc": [0, 3, 4, 5, 6, 9, 10, 11, 12, 13, 18, 19, 20, 23, 26, 27], "product": [0, 1, 3, 5, 6, 7, 8, 12, 13, 16, 17, 19, 20, 26, 27], "profess": [0, 26], "program": [0, 1, 4, 5, 6, 8, 12, 14, 15, 19, 20, 22, 23, 24, 26, 27], "programm": 20, "progress": [1, 4, 14], "prohibit": 6, "project": [0, 1, 2, 3, 5, 11, 13, 15, 19, 22, 27, 28], "project_root_dir": [0, 6, 7, 9, 26], "promin": 12, "promis": 8, "promot": [24, 26], "prone": [9, 15], "pronounc": [13, 19, 26], "proof": [0, 11, 12, 13, 26, 28], "propag": [2, 3, 13], "proper": [0, 2, 6, 7], "properli": [1, 6, 8, 10, 13, 18, 21], "properti": [0, 1, 3, 12, 13, 16, 20, 26], "proport": [0, 1, 5, 9, 11, 13, 23, 26, 27], "propos": [1, 4, 6, 10, 21, 26], "propto": [5, 13, 28], "proton": [0, 26], "prove": [3, 13, 28], "provid": [0, 1, 3, 4, 5, 6, 8, 9, 10, 12, 
13, 19, 20, 21, 23, 26, 27, 28], "proxi": [1, 13], "prune": 9, "pseudo": [20, 23], "pseudocod": 21, "pseudoinv": 5, "pseudoinvers": [5, 6, 21], "pseudorandom": [6, 23], "psychologi": [0, 26], "pt": 13, "public": [0, 15, 19, 26], "pull": 15, "punish": [0, 1, 26], "pure": [3, 9, 23], "purest": 9, "puriti": 9, "purpos": [0, 3, 10, 12, 14, 26], "push": 15, "put": 1, "py": 5, "pycod": 26, "pydata": 19, "pydot": 9, "pyhton2": 26, "pylab": [7, 26], "pypi": 19, "pyplot": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 17, 20, 23, 26, 27, 28], "pythagora": 5, "python": [1, 2, 3, 5, 6, 8, 11, 12, 13, 14, 21, 23, 27], "python2": [0, 21], "python3": [0, 19, 21, 26], "pytorch": [0, 19, 21, 26], "q": [5, 6, 8, 11, 23], "qp": 8, "qquad": [2, 11, 13, 20], "qr": [5, 6, 20, 27, 28], "quad": [1, 13, 20], "quadrat": [0, 8, 9, 13, 26], "qualit": [4, 9, 21, 23], "qualiti": [0, 9, 19, 26, 27], "quantifi": 1, "quantil": 10, "quantit": [0, 6, 9, 21, 26], "quantiti": [0, 2, 5, 6, 7, 9, 10, 11, 12, 14, 16, 20, 23, 26, 27, 28], "quantum": [4, 12, 25, 26], "quartil": [0, 27], "quench": 5, "queri": 9, "question": [0, 5, 6, 9, 11, 12, 13, 21, 24, 26, 27], "qugan": 4, "quick": [4, 23], "quickli": [1, 3, 9, 11, 13, 28], "quit": [1, 5, 6, 9, 10, 12, 15, 27, 28], "quot": 4, "r": [0, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 17, 19, 20, 21, 23, 27, 28], "r2": [0, 5, 6, 26, 27, 28], "r2_score": [0, 26], "r2score": [0, 26], "r_1": 9, "r_2": 9, "r_j": 9, "r_m": 9, "rad": [], "rade": [], "radial": [8, 12], "radioact": 23, "radiu": [0, 1, 27], "rain": 9, "ramp": 1, "ran0": 23, "ran1": 23, "ran2": 23, "ran3": 23, "rand": [0, 4, 5, 6, 9, 10, 13, 15, 20, 26, 27, 28], "randint": [6, 9, 13], "randn": [0, 1, 2, 5, 6, 9, 11, 13, 15, 18, 26, 27, 28], "random": [0, 1, 2, 3, 4, 5, 6, 8, 9, 13, 14, 15, 16, 17, 18, 19, 20, 26, 27, 28], "random_forest_model": 10, "random_index": 13, "random_indic": [1, 3], "random_st": [7, 8, 9, 10, 11], "randomforestclassifi": 10, "randomli": [1, 6, 9, 13, 14, 18, 28], "rang": [0, 1, 2, 3, 4, 5, 6, 7, 9, 10, 11, 12, 13, 14, 18, 20, 23, 26, 27, 28], "rangl": [0, 6, 11, 23, 26, 27], "rangle_x": 23, "rank": [5, 27, 28], "rankdir": 4, "raphson": [1, 8, 13], "rapidli": 0, "rare": [1, 13], "raschka": [26, 27], "rasckha": 26, "rashcka": 28, "rate": [1, 2, 3, 4, 8, 9, 10, 12, 13, 18, 28], "rather": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 20, 23, 26, 27, 28], "ratio": [4, 7, 9, 10, 11], "rational": [0, 26], "ravel": [5, 6, 7, 8, 9, 10, 11, 13, 20], "raw": 3, "rbf": [8, 11, 12], "rbf_kernel_svm_clf": 8, "rbf_pca": 11, "rc": 23, "rcond": [0, 26, 27], "rcparam": [1, 3, 7, 8, 9, 10, 23, 26], "re": [2, 4, 13, 15, 28], "reach": [1, 4, 5, 6, 9, 10, 12, 13, 14, 28], "read": [0, 2, 3, 4, 5, 6, 7, 8, 11, 12, 16, 17, 20, 21, 23, 25, 28], "read_csv": [0, 6, 7, 9], "read_fwf": [0, 26], "reader": [0, 6, 20, 23, 26, 27], "readi": [0, 1, 5, 6, 8, 10, 11, 12, 20, 26], "readili": 1, "readm": 15, "readthedoc": 19, "real": [0, 1, 4, 7, 10, 11, 12, 16, 18, 20, 27], "real_loss": 4, "real_output": 4, "realist": [8, 26], "realiti": 23, "realiz": [1, 12], "realli": [0, 1, 26], "rearrang": 13, "reason": [0, 1, 3, 4, 10, 13, 25, 26, 28], "reassign": 1, "recal": [5, 6, 9, 10, 11, 12, 20, 23, 26, 27, 28], "recast": 3, "receiv": [1, 3, 10, 12, 23], "recent": [0, 6, 13, 25], "recept": [3, 12], "receptive_field": 3, "recip": [0, 6, 7, 20, 21, 26, 27], "reciproc": 5, "recogn": [0, 4, 5, 10, 26], "recognit": [0, 1, 3, 12, 25, 26], "recommen": 26, "recommend": [0, 2, 3, 4, 5, 6, 8, 13, 15, 19, 20, 21, 25, 28], "reconsid": 9, 
"reconstruct": 11, "record": [10, 21, 22, 24, 26], "recreat": 15, "rectangl": [9, 13, 28], "rectangular": [5, 27, 28], "rectifi": [1, 3, 12], "recur": [0, 19, 26], "recurr": [0, 1, 19, 26], "recurs": [9, 19, 20, 26], "red": [0, 3, 4, 6, 8, 9], "redefin": [0, 10, 26, 27, 28], "redefinit": 28, "reduc": [1, 3, 5, 6, 9, 10, 11, 13, 26, 28], "reduct": [0, 10, 11, 19, 23, 26, 27], "reegress": 21, "refer": [0, 1, 2, 3, 5, 6, 11, 12, 13, 14, 20, 25, 26, 27, 28], "referenc": 2, "refin": 12, "refit": 6, "reflect": [0, 1, 4, 5, 21, 23, 26], "refresh": [19, 26], "refreshprogrammingskil": 26, "reg": [10, 11], "regard": [1, 9, 13], "regardless": [12, 16], "region": [3, 4, 6, 9, 12, 21], "regist": [6, 23], "reglasso": [5, 28], "regr_1": [0, 9], "regr_2": [0, 9], "regr_3": [0, 9], "regress": [1, 8, 11, 12, 16, 19, 20], "regressor": [0, 7, 10], "regridg": [0, 5, 6, 27, 28], "regular": [0, 3, 4, 5, 6, 7, 9, 13, 17, 18, 24, 26, 27, 28], "regularli": 15, "reilli": [0, 25, 26], "reinforc": [0, 8, 19, 26], "reiter": 1, "reject": 7, "rel": [0, 4, 6, 7, 9, 12, 13, 23, 26, 27], "relat": [0, 1, 3, 4, 5, 11, 13, 14, 20, 23, 26, 27, 28], "relationship": [0, 4, 9, 18, 26], "relativeerror": [0, 26, 27], "releas": [1, 19, 26], "relev": [0, 1, 5, 7, 11, 19, 21, 23, 26, 28], "reli": [0, 6, 8], "reliabilti": 21, "reliabl": [7, 23], "relu": [3, 4, 26], "remain": [1, 2, 4, 6, 12, 20, 23, 27], "remaind": 23, "reman": 2, "remark": 1, "rememb": [0, 8, 13, 20, 21, 26], "remind": [0, 5, 11, 13, 20, 23], "remot": 15, "remov": [4, 5, 6, 18, 27, 28], "renam": 15, "render": [0, 26, 27], "reorder": [5, 7, 27, 28], "reorgan": [0, 26], "repeat": [0, 1, 3, 4, 5, 6, 9, 10, 11, 13, 14, 20, 21, 23, 26, 27, 28], "repeated": 26, "repeatedli": [0, 6, 10, 13], "repet": 3, "repetit": [6, 26, 27], "rephras": [13, 28], "replac": [0, 1, 3, 4, 5, 6, 10, 12, 14, 19, 21, 26, 27, 28], "replica": 6, "repo": [15, 21], "report": 26, "repositori": [4, 21, 26], "repres": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 13, 21, 23, 26, 27, 28], "represent": [0, 1, 3, 6, 23, 26], "representd": 3, "reproduc": [0, 5, 6, 9, 12, 15, 16, 18, 19, 23, 26, 27], "repuls": [0, 26], "request": [0, 13], "requir": [0, 1, 3, 4, 5, 6, 8, 9, 11, 12, 13, 15, 17, 18, 20, 26, 27, 28], "res1": 2, "res2": 2, "res3": 2, "res_analyt": 2, "res_analytical1": 2, "res_analytical2": 2, "res_analytical3": 2, "resaml": 6, "resampl": [0, 7, 10, 19, 26, 27], "rescal": [0, 11, 12], "rescu": 5, "reseach": 6, "research": [0, 4, 13, 19, 25, 26], "resembl": [6, 23], "reserv": [1, 5, 6, 23], "reshap": [0, 1, 2, 3, 4, 6, 8, 9, 10, 20, 26, 27], "residenti": [], "residu": [0, 5, 13, 26], "resiz": [5, 27, 28], "resourc": 26, "respect": [0, 1, 2, 3, 5, 6, 7, 8, 10, 11, 12, 13, 14, 16, 17, 18, 21, 23, 26, 27, 28], "respond": 12, "respons": [0, 7, 9, 12, 26, 27], "rest": [0, 5, 18, 27, 28], "restat": [0, 12, 26], "restor": 4, "restored_discrimin": 4, "restored_gener": 4, "restrict": [0, 3, 9, 12, 26], "result": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 23, 26], "retail": [], "retain": [5, 6, 27, 28], "return": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 13, 14, 16, 17, 20, 23, 26, 27, 28], "return_data": 14, "return_sequ": 4, "return_x_i": 9, "reus": [1, 3, 6, 21], "reveal": [0, 12, 26], "revers": [1, 20], "review": [19, 20], "revisit": 14, "revolut": 26, "reward": [0, 4, 26], "rewrit": [0, 3, 5, 6, 7, 8, 10, 11, 12, 13, 16, 20, 21, 23, 28], "rewritten": [2, 6, 8, 10, 23], "rewrot": 13, "rf": 10, "rgb": 3, "rgoj5yh7evk": 19, "rh": 6, "rho": [0, 10, 13], "rho_1": 10, "rho_2": 10, 
"rho_m": 10, "rich": [0, 26], "ride": 9, "rideclass": 9, "ridedata": 9, "ridg": [7, 11, 13, 19, 26], "ridge_paramet": 17, "ridge_sk": 6, "ridgebeta": 28, "ridgetheta": 5, "right": [0, 1, 2, 3, 5, 6, 7, 8, 9, 10, 12, 13, 14, 16, 17, 20, 21, 23, 26, 27, 28], "right_sid": 2, "rightarrow": [0, 1, 5, 6, 8, 11, 12, 13, 23, 26, 27, 28], "rigor": [0, 26, 27, 28], "ring": 6, "rise": [0, 26], "risk": [0, 13, 26, 28], "rival": 4, "river": [], "rlm": 26, "rm": 23, "rmse": [], "rmsporp": 13, "rmsprop": [1, 3, 4, 13, 21], "rnd_clf": 10, "rng": 23, "rnn": [4, 12], "rnn1": 4, "rnn2": 4, "rnn_2layer": 4, "rnn_input": 4, "rnn_output": 4, "rnn_train": 4, "rntrick1": 23, "rntrick2": 23, "rntrick3": 23, "rntrick4": 23, "ro": [0, 13, 26, 28], "robert": [21, 25], "robust": [0, 26], "robustscal": [0, 27], "roc": [7, 10], "role": [0, 2, 5, 6, 8, 18, 19, 21, 26, 27, 28], "roll": 6, "room": [0, 24, 26], "root": [0, 5, 9, 13, 15, 23, 27, 28], "rot": 26, "rotat": [1, 8, 9, 10], "rotation_matrix": 9, "roughli": [1, 3, 18], "round": [7, 9, 13], "routin": [13, 20, 26, 28], "row": [0, 1, 2, 5, 6, 9, 11, 16, 20, 26, 27, 28], "rr": [5, 27, 28], "rrr": [5, 27, 28], "rudg": [], "rug": [13, 28], "rule": [0, 1, 5, 6, 13, 21, 26, 27, 28], "run": [0, 1, 2, 4, 5, 6, 8, 9, 11, 13, 15, 19, 21, 26, 27, 28], "runtim": [1, 6, 14, 15], "rust": [0, 19, 20, 26], "rvert": 1, "rvert_2": 1, "s_": [3, 6], "s_1": 6, "s_i": [6, 7], "s_j": 6, "s_k": 6, "s_phenomenon": 21, "saddl": [13, 28], "safeguard": 18, "sai": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 20, 21, 23, 26, 27, 28], "said": [6, 9, 13, 28], "sake": [0, 5, 7, 11, 26, 27, 28], "sale": [0, 26], "sam": 26, "same": [0, 1, 2, 3, 4, 5, 6, 8, 9, 11, 12, 14, 15, 16, 18, 20, 21, 23, 26, 27, 28], "samm": 10, "sampl": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 13, 14, 18, 19, 20, 21, 23, 26, 27], "sample_vari": 14, "sampleexptvari": 23, "samwis": 26, "sastri": 11, "satisfactori": [0, 26], "satisfi": [1, 2, 3, 6, 8, 13, 20, 23, 28], "satur": [1, 6], "save": [0, 4, 6, 7, 9, 13, 26], "save_fig": [0, 6, 7, 9, 10, 26], "savefig": [0, 4, 6, 7, 9, 23, 26], "savetxt": 4, "saw": [5, 27], "scalabl": 10, "scalar": [2, 5, 6, 10, 27], "scale": [0, 1, 3, 5, 6, 7, 8, 9, 10, 11, 12, 13, 19, 20, 21, 24, 26, 28], "scale_mean": 4, "scale_std": 4, "scaler": [0, 7, 8, 9, 10, 11, 17, 21, 27], "scan": [5, 7], "scari": 5, "scatter": [0, 1, 6, 7, 8, 9, 14, 15, 17, 26, 27], "scenario": [6, 13, 28], "schedul": 13, "scheme": [1, 13, 28], "schrage": 23, "sch\u00f8yen": [6, 27], "scienc": [0, 1, 10, 12, 13, 19, 22, 23, 24, 25, 28], "scientif": [0, 19, 21, 26], "scientist": [0, 26], "scikit": [3, 5, 6, 8, 9, 10, 13, 15, 16, 19, 20, 21, 25], "scikit_learn": 0, "scikitlearn": 26, "scikitplot": [7, 10], "scipi": [0, 3, 5, 6, 13, 19, 20, 21, 26, 27, 28], "scl": 6, "scm": 15, "score": [0, 1, 3, 6, 7, 9, 10, 11, 15, 16, 21, 24, 26, 27], "scores_kfold": 6, "scratch": [1, 13, 16], "sdg": 13, "sdv4f4s2sb8": 28, "seaborn": [0, 1, 3, 6, 7, 26], "seamless": [0, 19, 21, 26], "search": [0, 1, 3, 5, 9, 13, 15, 26, 28], "sebastian": 26, "sebastianraschka": 26, "sec": 6, "second": [0, 2, 3, 4, 5, 6, 7, 8, 9, 11, 12, 14, 15, 16, 19, 20, 23, 24, 26, 27, 28], "secondeigvector": 11, "secondli": 12, "section": [4, 11, 16, 20, 21, 23, 27], "sector": 0, "see": [0, 1, 2, 3, 4, 5, 6, 7, 8, 10, 11, 12, 13, 15, 16, 18, 19, 20, 21, 23, 26, 27, 28], "seed": [0, 1, 2, 3, 4, 5, 6, 8, 9, 11, 13, 14, 18, 23, 26, 27, 28], "seed_imag": 4, "seek": [1, 2, 8], "seem": [1, 3, 4], "seemingli": [0, 26], "seen": [0, 1, 3, 5, 10, 12, 23], "segment": [13, 28], "seismic": 6, 
"seldomli": [0, 26], "select": [1, 5, 6, 8, 9, 10, 11, 15, 21, 22, 23, 24, 25, 26, 27, 28], "selevet": 15, "self": [1, 5, 27], "sell": 4, "semest": [7, 22], "semi": [8, 13, 28], "semilogx": 6, "send": [5, 12, 13, 24, 26], "senior": [22, 24], "sens": [0, 4, 6, 8, 26], "sensibl": 3, "sensit": [0, 5, 6, 9, 13, 26, 27], "sent": 2, "sentenc": [4, 12], "separ": [0, 1, 2, 4, 6, 8, 9, 12, 14, 18, 19, 21, 23, 26], "septemb": [18, 21, 26], "sequenc": [3, 4, 7, 9, 10, 12, 13, 19, 20, 23, 26, 28], "sequenti": [1, 3, 4, 10, 12, 23], "seri": [0, 1, 2, 3, 4, 5, 6, 10, 11, 12, 13, 20, 26, 27, 28], "serif": [7, 23, 26], "serv": [0, 1, 2, 3, 5, 7, 13, 25, 26, 27, 28], "servic": 21, "session": [1, 15, 21, 22, 24, 26], "set": [1, 4, 5, 6, 7, 8, 10, 11, 13, 14, 16, 17, 18, 19, 20, 21, 23, 24], "set_major_formatt": 6, "set_major_loc": 6, "set_tick": [1, 8], "set_ticklabel": 1, "set_titl": [0, 1, 2, 3, 7, 12, 14, 26], "set_xlabel": [0, 1, 2, 3, 7, 12, 26], "set_xlim": [7, 12], "set_xticklabel": 1, "set_ylabel": [0, 1, 2, 3, 7, 26], "set_ylim": [7, 12], "set_ytick": 7, "set_yticklabel": [1, 6], "set_zlim": 6, "seth": 4, "setminu": 6, "setosa": [8, 9], "setosa_or_versicolor": 8, "setp": 6, "setup": [1, 4, 6, 8, 19, 26, 27, 28], "sever": [0, 3, 5, 6, 7, 8, 9, 11, 12, 13, 16, 19, 20, 21, 23, 26, 27, 28], "sgd": [1, 3, 28], "sgd_clf": 8, "sgdclassifi": 8, "sgdreg": 13, "sgdregressor": 13, "sgn": [5, 27, 28], "shallow": 13, "shape": [0, 1, 3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 14, 15, 16, 18, 20, 26, 27, 28], "share": [1, 3, 15, 26], "shareabl": 15, "she": 7, "shift": [1, 6, 12, 15, 18, 23, 27], "ship": 3, "shire": 26, "short": [4, 5, 21], "shortcom": [13, 28], "shorten": 4, "shorter": 23, "shorthand": 26, "shortli": [20, 26], "should": [0, 2, 3, 5, 6, 8, 9, 11, 12, 15, 18, 20, 21, 23, 26, 27], "show": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 20, 21, 23, 26, 27, 28], "show_shap": 4, "shown": [0, 4, 5, 8, 12, 13, 20, 27, 28], "shrink": [3, 5, 6, 8, 11, 27, 28], "shrinkag": [5, 6, 27, 28], "shrunk": 11, "shuffl": [0, 1, 4, 6, 13, 27], "side": [0, 2, 5, 8, 12, 13, 20, 21, 26, 28], "sigh": [19, 26], "sigma": [0, 1, 5, 6, 7, 10, 11, 12, 13, 20, 21, 23, 26, 27, 28], "sigma0": 23, "sigma1": 23, "sigma2": 23, "sigma_": [5, 20, 26, 27, 28], "sigma_0": [5, 27, 28], "sigma_1": [5, 27, 28], "sigma_2": [5, 27, 28], "sigma_fn": [7, 12], "sigma_i": [0, 5, 26, 27, 28], "sigma_j": [5, 27, 28], "sigma_m": [6, 23], "sigma_n": [11, 23], "sigma_t": 13, "sigma_x": 23, "sigmoid": [1, 2, 4, 7, 8, 10, 12], "sigmundson": [6, 27], "sign": [1, 2, 7, 8, 10, 23, 24], "signal": [1, 3, 10, 12], "signifi": 4, "signific": 1, "significantli": [1, 13, 18, 23, 28], "sim": [4, 5, 6, 13, 23], "similar": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 14, 19, 20, 21, 26, 28], "similarli": [0, 1, 3, 5, 8, 10, 13, 23, 26, 27, 28], "simpl": [1, 2, 3, 5, 6, 7, 8, 10, 11, 12, 14, 16, 17, 19, 20, 23], "simplepredict": 10, "simpler": [0, 1, 5, 6, 7, 13, 16, 19, 21, 26, 28], "simplernn": 4, "simplest": [0, 1, 3, 4, 9, 10, 12, 14, 21, 26], "simpletre": 10, "simpli": [0, 1, 2, 4, 5, 6, 8, 9, 10, 11, 12, 19, 20, 21, 23, 26, 27, 28], "simplic": [2, 5, 6, 7, 8, 9, 10, 11, 12, 14, 27, 28], "simplicti": [5, 27, 28], "simplifi": [0, 6, 9, 18, 19, 21, 26, 27], "simplist": [3, 6, 23], "simul": [6, 18], "simultan": 6, "sin": [0, 1, 2, 3, 4, 9, 12, 13, 20, 26], "sinc": [0, 1, 2, 3, 5, 6, 7, 8, 9, 10, 11, 13, 16, 18, 20, 21, 23, 25, 26, 27, 28], "sine": [3, 12], "singl": [0, 1, 2, 3, 5, 6, 7, 8, 9, 12, 13, 18, 20, 23, 26, 27, 28], "singular": [0, 6, 13, 20, 26], 
"sinusoid": 3, "site": [0, 21, 22, 27], "situat": [0, 4, 5, 7, 13, 23, 26, 27, 28], "six": [3, 23], "size": [0, 1, 2, 3, 4, 5, 6, 8, 9, 10, 11, 13, 18, 20, 21, 23, 26], "sketch": 10, "ski": 9, "skill": 0, "skip": 11, "skl": [0, 6, 26, 27], "sklearn": [0, 1, 3, 5, 6, 7, 8, 9, 10, 11, 13, 14, 15, 17, 26, 27, 28], "skplt": [7, 10], "sl": [6, 27], "slack": 8, "slice": [2, 20, 26], "slide": [0, 3, 16, 21, 23, 26, 27, 28], "slight": [6, 13], "slightli": [1, 2, 3, 5, 6, 7, 10, 23, 27, 28], "slope": [8, 11, 12], "slow": [0, 2, 8, 13, 18, 27, 28], "slower": [5, 20, 26, 27, 28], "slowest": 20, "slowli": 12, "slp": 1, "small": [0, 1, 2, 3, 5, 6, 8, 9, 10, 11, 12, 13, 19, 20, 23, 26, 27, 28], "smaller": [0, 1, 2, 5, 6, 8, 9, 11, 13, 23, 26, 27, 28], "smallest": [0, 4, 14, 26], "smallest_row_index": 14, "smooth": [0, 3, 6, 13, 21, 26, 28], "sn": [0, 1, 3, 6, 7, 26], "sne": 11, "so": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 18, 19, 20, 21, 23, 24, 26, 27, 28], "soar": 6, "social": 0, "soft": [1, 7, 10, 12], "soften": 8, "softmax": [3, 7], "softwar": [0, 8, 19, 20], "sol": 8, "sole": [0, 6, 26], "solid": [0, 7], "solut": [0, 1, 2, 3, 5, 6, 8, 10, 11, 13, 18, 20, 21, 23, 26, 27, 28], "soluton": 2, "solv": [0, 1, 3, 5, 6, 8, 10, 11, 12, 13, 16, 20, 21, 26, 27], "solve_expdec": 2, "solve_ode_deep_neural_network": 2, "solve_ode_neural_network": 2, "solve_pde_deep_neural_network": 2, "solveod": 2, "solveode_popul": 2, "solver": [2, 7, 8, 9, 10, 20, 26], "some": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 14, 15, 16, 18, 21, 23, 26], "some_model": [6, 27], "somehow": 4, "someon": 16, "someth": [0, 1, 3, 4, 7, 9, 11, 15, 21, 23, 26, 27], "sometim": [0, 1, 11, 12, 13, 14, 27], "soon": [20, 24, 27], "sophist": [0, 26], "sopt": 13, "sort": [5, 6, 9, 11, 23], "sound": [3, 5], "sourc": [0, 1, 3, 6, 19, 20, 21, 23, 26], "space": [0, 1, 4, 5, 8, 9, 11, 12, 13, 14, 23, 27, 28], "span": [0, 3, 5, 9, 11, 20, 26, 27, 28], "spare": 1, "spars": [3, 6, 18, 20, 26], "sparse_mtx": [20, 26], "sparsecategoricalcrossentropi": 3, "sparsiti": [10, 18], "spatial": [1, 2, 3, 12], "speak": 23, "special": [6, 7, 10, 12, 13, 20, 23, 26, 27, 28], "specif": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 12, 15, 16, 19, 20, 21, 23, 25, 26, 27, 28], "specifi": [0, 3, 5, 6, 7, 9, 11, 13, 14, 23, 26, 28], "specifici": [0, 10, 26], "spectacular": 3, "spectral": 1, "speech": [0, 1, 3, 4, 12], "speed": [1, 2, 4, 13], "spend": [16, 23], "spent": 21, "sphere": [0, 27], "spin": 6, "spite": 0, "spline": 8, "split": [1, 3, 4, 5, 6, 8, 9, 10, 11, 14, 16, 17, 21, 23, 26, 28], "splite": 0, "splitter": [1, 10], "spontan": 23, "spot": 3, "spread": [0, 11, 23, 26, 27], "springer": [21, 25, 26], "spuriou": 13, "sqquar": 28, "sqrsignal": 3, "sqrt": [3, 4, 5, 6, 8, 10, 11, 13, 23, 27, 28], "squar": [1, 2, 3, 4, 7, 8, 9, 11, 13, 14, 15, 17, 18, 19, 20, 23], "squarederror": 10, "squaredeuclidean": 14, "squash": 12, "srtm": 6, "srtm_data_norway_1": 6, "stabil": [5, 21], "stabl": [0, 4, 5, 6, 9, 16, 19, 21, 26, 27, 28], "stack": [3, 4], "stage": [5, 13, 15, 21], "stai": [0, 2, 4, 5, 11, 26, 27], "stand": [0, 5, 9, 12, 26, 27, 28], "standard": [0, 1, 4, 5, 6, 7, 8, 10, 12, 17, 18, 20, 21, 23, 26, 28], "standardscal": [0, 6, 7, 8, 9, 10, 11, 17, 27], "stanford": [13, 28], "start": [0, 1, 2, 3, 4, 5, 6, 8, 9, 10, 11, 12, 13, 14, 15, 16, 18, 20, 23, 24, 26, 27, 28], "start_tim": 14, "stat": 6, "state": [1, 2, 4, 5, 6, 7, 8, 10, 11, 12, 13, 19, 23, 26, 27, 28], "statement": [0, 7, 20, 26], "stationari": 28, "statist": [0, 1, 3, 4, 7, 9, 10, 11, 12, 13, 14, 20, 
21, 25, 27, 28], "statu": [0, 7, 15, 26], "stavang": 6, "std": [0, 4, 6, 18, 26, 27], "steep": [13, 28], "step": [0, 1, 2, 4, 6, 7, 9, 10, 11, 12, 13, 14, 15, 18, 20, 21, 26, 28], "step_fn": [7, 12], "step_length": 13, "steps_list": 9, "stereo": 3, "still": [0, 2, 3, 5, 6, 11, 13, 23, 27, 28], "stimuli": 12, "stk": [25, 26], "stk2100": [25, 26], "stk3155": [15, 21, 22, 24], "stk4021": [25, 26], "stk4051": [25, 26], "stk4155": [22, 24], "stk5000": 25, "stochast": [0, 1, 5, 6, 8, 11, 12, 28], "stock": 4, "stoke": 12, "stone": [0, 7], "stop": [1, 4, 9, 13, 14, 18, 28], "storag": [5, 27, 28], "store": [0, 1, 2, 3, 6, 11, 13, 18, 23, 26], "storehaug": [24, 26], "str": [1, 3, 4], "straight": [0, 6, 8, 13, 26, 28], "straightforward": [0, 2, 3, 5, 6, 8, 9, 10, 13, 20, 26, 27, 28], "strategi": [0, 1, 9, 26], "stratifi": 6, "strength": [0, 5, 14, 27, 28], "stretch": 11, "strict": [8, 13, 28], "strictli": [8, 13, 28], "stride": [4, 20], "strike": 6, "string": 1, "stroke": 7, "strong": [3, 6, 9, 10, 12, 20, 23], "strongli": [0, 8, 15, 19, 20], "stronli": [], "structur": [0, 1, 2, 3, 6, 9, 10, 12, 19, 26], "stuck": [1, 13, 28], "student": [0, 15, 21, 22, 24, 25, 26], "studi": [0, 3, 4, 5, 6, 7, 8, 11, 12, 13, 19, 21, 25, 26, 27, 28], "studier": 25, "style": [7, 9, 20, 26], "st\u00f8land": 24, "sub": [9, 12], "subdivid": [0, 20, 26], "subfield": 0, "subject": [6, 8, 23], "submit": 26, "subplot": [0, 1, 3, 4, 6, 7, 8, 9, 10, 14, 26], "subplots_adjust": [8, 23], "subprogram": [20, 26], "subract": [0, 27], "subroutin": [0, 26], "subscript": 1, "subsequ": [1, 4, 5, 6, 12, 20, 23, 27, 28], "subset": [1, 6, 9, 12, 13, 19, 26, 28], "subspac": [0, 8, 11, 27], "substanti": [9, 10], "substep": 11, "substitut": [3, 6, 12, 16, 20], "subsubset": 9, "subtask": 6, "subtl": 1, "subtract": [0, 4, 5, 6, 11, 13, 18, 20, 21, 23, 27], "subtre": 9, "succeed": [0, 4, 26], "success": [3, 7, 9, 13, 23], "successfulli": [4, 9], "sudo": [0, 19, 21, 26], "suffer": [0, 1, 2, 5, 10, 26, 27, 28], "suffici": [1, 6, 8, 11, 13, 28], "suggest": [1, 13, 21, 25, 28], "suit": [8, 12], "suitabl": [0, 15, 23, 27], "sum": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 20, 23, 26, 27, 28], "sum_": [0, 1, 2, 3, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 16, 20, 21, 23, 26, 27, 28], "sum_i": [0, 2, 5, 6, 8, 13, 21, 27, 28], "sum_j": [6, 18], "sum_ja_": 0, "sum_k": [6, 8, 12, 20], "sum_logist": 13, "sum_m": 3, "sum_n": 3, "sum_nx_": 3, "summar": [5, 6, 9], "summari": [1, 3, 4, 10, 22, 28], "summat": [0, 3, 16, 27, 28], "sunni": 9, "super": [5, 27, 28], "superfici": 3, "superscript": [1, 12], "supervis": [0, 5, 6, 7, 9, 12, 19, 26, 27, 28], "supplement": [7, 21], "support": [0, 1, 9, 10, 11, 13, 19, 26, 27], "suppos": [0, 5, 6, 7, 8, 10, 11, 12, 13, 20, 26, 27, 28], "suppress": [5, 13, 28], "sure": [0, 1, 4, 6, 16, 21], "surf": 6, "surfac": [0, 6, 26], "surpass": 6, "surpris": [0, 26], "surround": [3, 19], "survei": [0, 5, 6, 26, 27], "svc": [8, 9, 10], "svd": [0, 6, 11, 26], "svdinv": 5, "svm": [8, 9, 10, 11], "svm_clf": [8, 10], "swath": [5, 27, 28], "switch": 0, "sy": [13, 28], "symbol": [1, 5, 11, 13, 19, 23, 26, 27, 28], "symmeteri": 1, "symmetr": [0, 5, 8, 11, 12, 13, 20, 26, 27], "symmetri": 6, "sympi": [0, 19, 21, 26], "synonim": 23, "syntax": 13, "system": [0, 1, 3, 4, 6, 7, 9, 10, 12, 13, 15, 19, 20, 21, 26, 28], "systemat": [4, 6], "t": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 23, 24, 26, 28], "t0": [3, 6, 13], "t1": [2, 13], "t2": 2, "t3": 2, "t_": 2, "t_0": [2, 9, 13], "t_1": 13, "t_b": 10, "t_i": [1, 
2, 5, 12, 27, 28], "t_j": 12, "t_k": 9, "tabl": [9, 21, 23, 24, 26], "tabul": [0, 26], "tabular": 26, "tackl": 4, "tag": [2, 3, 4, 5, 6, 7, 12, 13, 14, 20, 23, 27, 28], "taht": [0, 26], "tail": 23, "tailor": [2, 8, 11, 26], "taiwan": [0, 26], "take": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 17, 19, 20, 23, 26, 27, 28], "taken": [0, 1, 3, 6, 10, 13, 20], "tan": 3, "tangent": [1, 4, 12, 13, 28], "tanh": [1, 4, 7, 8, 12], "target": [0, 1, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 15, 16, 18, 26, 27, 28], "target_nam": 9, "task": [0, 1, 3, 6, 9, 11, 12, 14, 21, 26], "tau": [3, 5, 23], "taught": 26, "tax": [], "taylor": [2, 13, 28], "taylornr": [13, 28], "tc": 8, "teach": [15, 22, 26], "team": 1, "teaser": 0, "technic": [0, 5, 6, 13, 21, 28], "techniqu": [0, 1, 8, 10, 13, 19, 23, 25, 26, 27], "technologi": [0, 1], "tell": [0, 4, 6, 10, 11, 13, 16, 23], "temp": 1, "temp1": 1, "temp2": 1, "temperatur": [0, 9, 26], "templat": 18, "temporarili": 1, "ten": [3, 26], "tend": [3, 5, 6, 8, 9, 10, 12, 13, 14, 27], "tendenc": [0, 26], "tension": 6, "tensor": 3, "tensorflow": [0, 2, 4, 8, 14, 19, 20, 21, 25, 26, 27], "term": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 16, 18, 21, 23, 26, 27, 28], "term1": [5, 6, 11], "term2": [5, 6, 11], "term3": [5, 6, 11], "term4": [5, 6, 11], "termin": [0, 4, 5, 9, 10, 13, 15, 27, 28], "terminarl": 15, "terrain": 6, "terrain1": 6, "test": [3, 4, 5, 6, 7, 8, 9, 10, 13, 16, 20, 21, 23, 26, 28], "test_acc": 3, "test_accuraci": [1, 3], "test_error": 6, "test_imag": [3, 4], "test_ind": 6, "test_input": 4, "test_label": [3, 4], "test_loss": 3, "test_pr": 1, "test_predict": 1, "test_rnn": 4, "test_scor": [7, 10], "test_siz": [0, 1, 3, 5, 6, 10, 15, 17, 27, 28], "test_split": 9, "testerror": [0, 6, 27], "testi": 4, "testpredict": 4, "testx": 4, "text": [0, 1, 2, 4, 5, 8, 9, 11, 13, 15, 18, 20, 23, 25, 27, 28], "textbook": [16, 21, 27, 28], "textual": 9, "textur": 1, "tf": [1, 3, 4, 13, 14, 28], "th": [0, 1, 2, 5, 6, 7, 9, 12, 13, 14, 20, 21, 23, 26, 27], "than": [0, 1, 2, 3, 4, 5, 6, 7, 9, 10, 11, 12, 13, 17, 19, 23, 26, 27], "thank": [4, 6, 27], "theano": [1, 19, 26], "thei": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 12, 13, 15, 16, 18, 20, 21, 23, 26, 27, 28], "them": [0, 1, 3, 4, 6, 8, 9, 10, 11, 12, 13, 18, 20, 21, 26, 27], "theme": [0, 15, 26], "themselv": [0, 21, 23, 26], "thenc": 6, "theorem": [2, 6, 7, 27, 28], "theoret": [0, 4, 10], "theori": [0, 1, 3, 8, 9, 12, 13, 19, 21, 25, 26], "thereaft": [0, 5, 6, 11, 12, 20, 21, 26], "therebi": [0, 5, 7, 11, 21, 26, 27, 28], "therefor": [0, 1, 2, 3, 4, 6, 7, 8, 11, 13, 23, 26, 27, 28], "therein": 11, "thereof": [0, 6, 13, 26], "theta": [0, 1, 4, 5, 6, 7, 13, 16, 21, 23, 26, 27, 28], "theta_": [0, 1, 6, 7, 13, 26, 27, 28], "theta_0": [0, 5, 6, 7, 16, 26, 27, 28], "theta_0x_": [0, 26, 27], "theta_1": [0, 5, 6, 7, 26, 27, 28], "theta_1x_": [0, 26, 27], "theta_1x_0": [0, 26], "theta_1x_1": [0, 7, 26], "theta_1x_2": [0, 26], "theta_1x_i": [7, 27, 28], "theta_2": [0, 26, 27], "theta_2x_": [0, 26, 27], "theta_2x_0": [0, 26], "theta_2x_1": [0, 26], "theta_2x_2": [0, 7, 26], "theta_2x_i": 27, "theta_3x_i": 27, "theta_4x_i": 27, "theta_closed_form": 18, "theta_closed_formol": 18, "theta_closed_formridg": 18, "theta_gdol": 18, "theta_gdridg": 18, "theta_i": [0, 1, 5, 26, 27, 28], "theta_j": [0, 5, 6, 18, 26, 27], "theta_k": 28, "theta_linreg": [13, 28], "theta_ol": 18, "theta_p": 7, "theta_px_p": 7, "theta_ridg": 18, "theta_t": 13, "theta_tru": 18, "thetavalu": 5, "thi": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 
20, 21, 22, 23, 25, 27, 28], "thing": [0, 1, 2, 4, 5, 7, 9, 15, 16, 18, 23, 26], "think": [0, 1, 3, 4, 6, 9, 12, 13, 14, 23, 26, 27, 28], "third": [0, 3, 6, 13, 24, 26, 28], "thirti": 7, "thorughout": 26, "those": [0, 3, 5, 6, 8, 9, 10, 11, 20, 21, 26, 27, 28], "though": [1, 2, 3, 4, 13, 16, 17, 20, 23], "thought": [6, 14, 21, 23], "thousand": [0, 1, 21, 27], "three": [0, 1, 3, 5, 6, 8, 9, 12, 20, 21, 22, 23, 24, 26, 27, 28], "threshold": [1, 3, 9, 10, 11, 12, 13], "through": [0, 1, 2, 3, 4, 5, 6, 8, 11, 12, 13, 14, 15, 19, 20, 21, 23, 26, 27, 28], "throughout": [0, 4, 5, 14, 15, 19, 20, 23, 26], "throw": [3, 6, 23], "thu": [0, 1, 2, 5, 6, 7, 8, 10, 11, 12, 13, 24, 26, 27, 28], "thumb": [0, 6, 21, 27], "thursdai": [], "tibshirani": [6, 21, 25, 26], "tick_param": 6, "ticker": [6, 13, 23, 28], "tif": 6, "tight_layout": [1, 7], "tightli": 11, "tild": [0, 5, 6, 7, 11, 21, 23, 26, 27, 28], "till": [0, 4, 7, 8, 9, 10, 12, 20, 26, 27], "time": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 23, 26, 27, 28], "timeit": 4, "timer": 4, "tini": 1, "tip": 3, "titl": [0, 1, 2, 3, 4, 6, 7, 8, 9, 10, 13, 15, 23, 26, 28], "tmp": 13, "tn": [2, 3, 7], "to_categor": [1, 3, 4], "to_categorical_numpi": 1, "to_numer": [0, 6, 26], "todai": 3, "togeth": [0, 3, 6, 8, 11, 13, 19, 26], "toi": 14, "told": 13, "toler": [2, 14], "tolist": 4, "tomographi": 12, "too": [0, 2, 4, 5, 6, 9, 11, 13, 17, 18, 23, 25, 27, 28], "took": [8, 26], "tool": [0, 1, 3, 6, 13, 15, 19, 27], "toolbox": 8, "top": [0, 3, 5, 6, 9, 10, 19, 26], "topic": [0, 5, 6, 7, 8, 19, 21, 27, 28], "topolog": [3, 12], "topologi": [1, 12], "torkjellsdatt": [24, 26], "toss": [10, 23], "total": [0, 1, 2, 3, 4, 6, 7, 8, 10, 11, 12, 13, 14, 20, 23, 24, 26, 27, 28], "total_loss": 4, "totalclustervari": 14, "totalscatt": 14, "toward": [1, 2, 7, 12, 13, 15, 28], "town": [], "tp": [4, 7], "tpng": 9, "tpu": [13, 19, 26], "tqdm": 6, "tr": [], "track": [3, 13, 14, 15, 20, 27, 28], "tract": [], "tractabl": [0, 26, 27], "trade": [5, 9], "tradeoff": [0, 5, 21, 26, 27, 28], "tradit": [0, 1, 4, 6, 26], "train": [2, 3, 5, 6, 8, 9, 10, 11, 12, 13, 16, 17, 21, 28], "train_accuraci": [0, 1, 3, 26], "train_dataset": 4, "train_end": [0, 1, 27], "train_error": 6, "train_imag": [3, 4], "train_ind": 6, "train_label": [3, 4], "train_pr": 1, "train_siz": [0, 1, 3, 27], "train_step": 4, "train_test_split": [0, 1, 3, 5, 6, 7, 9, 10, 11, 15, 16, 17, 26, 27, 28], "train_test_split_numpi": [0, 1, 27], "trainable_vari": 4, "trained_model": [6, 27], "trainerror": [0, 27], "traini": 4, "training_checkpoint": 4, "training_dataset": 4, "training_gradi": 13, "trainingerror": 6, "trainpredict": 4, "trainscor": 4, "trainx": 4, "trait": [0, 26], "trajectori": 4, "transfer": [9, 26], "transform": [0, 5, 6, 7, 8, 9, 10, 11, 12, 13, 17, 19, 20, 26, 27, 28], "transit": [6, 12], "translat": [1, 4, 6, 10, 26, 27], "transpos": [1, 5, 11, 20, 27, 28], "travers": [0, 5], "treat": [0, 1, 3, 6, 12, 13, 18, 23, 26, 27, 28], "tree": [0, 1, 19, 26], "tree_clf": [9, 10], "tree_clf_": 9, "tree_clf_sr": 9, "tree_reg": 9, "tree_reg1": 9, "tree_reg2": 9, "trend": 23, "treue": 7, "trevor": [21, 25], "tri": [2, 3, 4, 9, 13, 16], "triain": 0, "trial": [0, 2, 4, 6, 13, 23, 26, 28], "triangl": [13, 28], "triangular": 20, "trick": [3, 4, 8, 11, 13, 23], "trickier": 23, "tridiagon": 20, "trillion": 19, "trivial": [0, 1, 5, 11, 23, 26, 28], "troubl": [0, 8, 12, 15, 27], "truck": 3, "true": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 13, 14, 16, 17, 18, 20, 21, 23, 26, 27, 28], "true_beta": 27, 
"true_fun": 6, "true_theta": 6, "truli": 26, "try": [0, 1, 2, 4, 5, 6, 7, 8, 9, 10, 11, 13, 14, 15, 18, 19, 20, 21, 23, 26, 27, 28], "tucker": 8, "tuesdai": [24, 26], "tumor": [7, 9], "tumour": 7, "tunabl": 1, "tune": [4, 9, 13, 20, 26], "turn": [0, 1, 5, 6, 7, 8, 9, 10, 11, 12, 13, 20, 21, 23, 26, 27, 28], "tutori": [1, 4], "tv": 2, "tveito": 2, "tweak": [1, 4, 10, 23], "twice": [13, 28], "twist": 11, "two": [0, 1, 2, 4, 5, 6, 7, 9, 10, 11, 12, 13, 15, 17, 20, 21, 22, 23, 25, 26, 27, 28], "tx": [13, 28], "tx_1": [13, 28], "txt": [4, 15], "ty": [13, 28], "type": [0, 1, 3, 6, 8, 10, 13, 20, 23, 27, 28], "typic": [0, 1, 2, 3, 4, 5, 7, 9, 10, 12, 13, 15, 16, 23, 26, 27, 28], "typo": 21, "u": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 20, 21, 23, 25, 26, 27, 28], "u_": 20, "u_i": 12, "u_m": 10, "ua": [0, 26], "ubuntu": [0, 19, 21, 26], "uci": 21, "uio": [15, 21, 24, 25], "un": 14, "unabl": 15, "unari": [20, 26], "unbalanc": [6, 9], "unbias": [0, 5, 6, 26], "uncent": [6, 27], "uncertainti": [0, 5, 26], "uncertitud": 23, "unchang": [1, 3], "uncorrel": [10, 23], "undefin": [5, 27, 28], "under": [0, 1, 5, 6, 10, 13, 19, 21, 26, 27, 28], "underdetermin": [0, 26], "underfit": [1, 6], "underflowproblem": 5, "undergo": 5, "undergradu": [22, 24], "underli": [0, 1, 9, 13, 18, 23, 26], "underset": [4, 14], "understand": [0, 1, 3, 5, 6, 10, 13, 14, 15, 19, 26, 27, 28], "understood": [8, 13], "undesir": 8, "undetermin": [5, 8], "undo": 4, "unexpect": 6, "unexpected": 23, "unexplain": 18, "unfair": [6, 27], "unfortun": [1, 8, 9, 10], "unicode_liter": [8, 9], "uniform": [0, 1, 5, 6, 11, 13, 21, 23, 26, 28], "uniformli": [13, 23, 28], "unifrompdf": 23, "unimport": [13, 28], "union": [5, 6], "uniqu": [0, 2, 6, 13, 14, 20, 26], "unique_cluster_label": 14, "unit": [0, 1, 3, 4, 5, 10, 12, 18, 23, 26, 27, 28], "unitari": [5, 6, 20, 27, 28], "unitarili": [20, 26], "uniti": 23, "univari": 23, "univers": [0, 1, 2, 13, 19, 21, 22, 24, 26, 27, 28], "unix": 1, "unknow": [0, 20, 26], "unknown": [0, 1, 3, 4, 5, 6, 8, 10, 13, 20, 21, 26, 27, 28], "unknowwn": 12, "unlabel": 1, "unless": [0, 3, 6, 11, 13, 21, 26, 28], "unlik": [1, 3, 8, 13, 28], "unnecessarili": 9, "unord": 3, "unravel": 1, "unrol": [3, 11], "unseen": [0, 7, 9, 15], "unstabl": 1, "unsupervis": [0, 1, 4, 12, 19, 26], "unsymmetr": [20, 26], "until": [1, 2, 4, 9, 12, 13, 14, 28], "untouch": 0, "unusu": 12, "up": [1, 3, 4, 5, 6, 8, 10, 11, 13, 14, 16, 18, 19, 20, 21, 23, 24], "updat": [1, 2, 10, 12, 13, 14, 15, 18], "uploa": 26, "upload": [15, 19, 21, 25], "upon": [0, 1, 6, 7, 11, 20], "upper": [0, 8, 9, 16, 20, 27], "uppercas": [20, 26], "upsampl": 4, "upscal": 4, "url": [26, 27], "us": [4, 5, 6, 8, 9, 10, 11, 12, 14, 15, 17, 20, 23, 25], "usag": [0, 8, 19, 26, 27], "usd": [], "usd10000": [], "use_bia": 4, "usecol": [0, 26], "useless": 1, "user": [0, 1, 2, 4, 6, 7, 15, 19, 20, 21, 26, 27], "usernam": [15, 21], "usetex": 23, "usg": 6, "usr": 23, "usual": [0, 3, 4, 7, 12, 13, 14, 26], "ut": 5, "util": [1, 3, 4, 6, 7, 10, 14, 26], "ux": 20, "v": [2, 4, 5, 6, 11, 13, 15, 19, 27, 28], "v0": 23, "v1": 23, "v2": 23, "v_0": 11, "va": 1, "vahid": 26, "val": 13, "val_accuraci": 3, "val_loss": 4, "vale": 2, "valid": [0, 1, 4, 7, 9, 10, 13, 19, 23, 26, 27], "validation_data": 3, "validation_split": 4, "valu": [0, 1, 2, 3, 4, 6, 7, 8, 9, 10, 12, 13, 14, 16, 17, 18, 19, 20, 21, 26], "valuat": 9, "valy": 4, "van": [0, 21, 26, 27, 28], "vandenbergh": [8, 13, 28], "vandermond": [0, 26], "vanilla": [0, 6, 11, 14, 27], "vanish": [1, 4, 13, 23, 28], "var": [5, 6, 
10, 11, 21, 23, 27], "var_x": 23, "varabl": 8, "varepsilon": [5, 6], "varepsilon_": [5, 6], "varepsilon_i": [5, 6], "vari": [0, 1, 3, 5, 6, 10, 26], "variabl": [0, 1, 2, 5, 6, 7, 8, 10, 11, 12, 13, 14, 20, 26, 27], "varianc": [0, 1, 5, 7, 9, 10, 11, 13, 14, 18, 19, 20, 23, 26, 27, 28], "variance_i": [5, 11, 27], "variance_x": [5, 11, 27], "variant": [0, 1, 6, 8, 12, 13, 26, 27, 28], "variat": [3, 4, 11, 26], "varieti": [0, 3, 12, 19, 21, 26], "variou": [1, 3, 5, 6, 7, 8, 9, 11, 12, 13, 16, 19, 20, 21, 23, 26, 27, 28], "varydimens": 4, "vastli": 3, "vaue": 1, "vault": 0, "vdot": [2, 13, 28], "ve": 21, "vec": 6, "vector": [0, 1, 2, 3, 4, 5, 6, 7, 9, 10, 11, 13, 14, 17, 18, 19, 28], "vector_mean": 14, "ventur": [0, 8, 19, 26], "venv": 15, "verbos": [1, 3, 4], "veri": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 16, 18, 21, 23, 25, 26, 27, 28], "verifi": [3, 11, 20, 26], "versatil": [8, 26], "versicolor": [8, 9], "version": [0, 3, 10, 13, 14, 15, 19, 20, 21, 23, 26], "versu": 1, "vert": [0, 1, 5, 6, 7, 8, 9, 11, 13, 16, 17, 26, 27, 28], "vert_1": [5, 6, 27, 28], "vert_2": [5, 6, 11, 17, 27, 28], "via": [0, 5, 6, 7, 8, 9, 10, 11, 12, 19, 20, 21, 22, 23, 24, 26, 27, 28], "vidal": 11, "video": [0, 1, 12, 19, 22, 24, 26, 27, 28], "view": [1, 3, 5, 6, 12, 13, 23, 25, 26, 28], "violat": 8, "virginica": 9, "viridi": [0, 1, 2, 3, 26], "virtual": 1, "viscos": 13, "viscou": 13, "visibl": 15, "vision": [0, 3], "visual": [0, 3, 11, 12, 18, 19, 26, 27], "visualis": 1, "visualstudio": [15, 16], "viz": [6, 8, 23], "vmap": 13, "vmax": [1, 6], "vmin": [1, 6], "voic": 3, "volum": [0, 3, 26], "vote": [10, 26], "voting_clf": 10, "votingclassifi": 10, "votingsimpl": 10, "vstack": [5, 11, 20, 23, 26, 27], "vt": [5, 27, 28], "w": [0, 1, 2, 3, 4, 5, 6, 7, 8, 10, 11, 12, 13, 14, 20, 23, 26, 27, 28], "w1": 8, "w2": [8, 11], "w3": 8, "w_": [1, 12], "w_1": [8, 20], "w_1x_": 8, "w_1x_1": 8, "w_2": [8, 20], "w_2x_": 8, "w_2x_2": 8, "w_3": 20, "w_4": 20, "w_hidden": 2, "w_i": [1, 2, 10], "w_ix_i": 12, "w_j": 20, "w_m": 20, "w_output": 2, "w_px_": 8, "w_px_p": 8, "wa": [1, 3, 4, 5, 6, 7, 10, 11, 12, 14, 17, 20, 26, 27], "wai": [0, 1, 2, 3, 4, 5, 6, 7, 8, 10, 11, 12, 13, 14, 15, 18, 20, 23, 26, 27, 28], "walk": 9, "walker": 23, "wang": [0, 26], "want": [0, 1, 2, 3, 4, 5, 6, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 19, 21, 23, 26, 27, 28], "warn": 4, "warrant": 6, "wast": 3, "watch": [19, 28], "wave": 3, "wavelet": 8, "we": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 23, 24, 25, 27, 28], "weak": [9, 10, 14], "weather": [1, 12], "web": [19, 22, 24, 26], "webpag": 26, "websit": [6, 20, 21, 22, 26], "wedg": [8, 23], "wednesdai": [24, 26], "wee": 11, "week": [0, 5, 6, 7, 21, 22, 24], "weekli": [15, 16, 19, 21, 22, 24, 25, 26], "weight": [1, 2, 3, 6, 7, 9, 10, 12, 13, 18, 23], "weigth": 2, "welcom": [8, 15, 19], "well": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 13, 15, 16, 19, 20, 21, 23, 25, 26, 27, 28], "went": 8, "were": [0, 1, 3, 4, 5, 6, 7, 8, 10, 11, 12, 14, 23, 26], "wessel": [0, 21, 26, 27, 28], "what": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 19, 20, 21, 23], "whatev": 3, "when": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 18, 20, 21, 23, 26, 27, 28], "whenev": [13, 15, 23], "where": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 23, 24, 26, 27, 28], "wherea": [6, 23], "wherefrom": 21, "wherein": [1, 12], "whether": [0, 3, 5, 7, 9, 21, 23, 26], "which": [0, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 
21, 22, 23, 24, 26, 27, 28], "whichev": [1, 3], "while": [0, 1, 3, 4, 5, 6, 7, 8, 9, 11, 12, 13, 15, 16, 23, 26, 27, 28], "white": 9, "whiteboard": [27, 28], "who": [0, 15], "whole": [1, 3, 4, 5, 9, 11, 13], "whose": [0, 6, 10, 23, 27], "whow": [11, 27], "why": [0, 1, 3, 6, 13, 15, 16, 17, 21, 27, 28], "wide": [0, 1, 3, 6, 7, 12, 19, 20, 21, 26], "widehat": 6, "width": [0, 3, 8, 9, 26], "wieringen": [0, 21, 26, 27, 28], "wiki": 21, "wikipedia": 21, "win": 10, "wind": 9, "wing": [24, 26], "winther": 2, "wiothout": 6, "wiscons": 7, "wisconsin": 10, "wisdom": [6, 27], "wise": [1, 5, 12, 13, 27, 28], "wish": [0, 2, 5, 7, 8, 11, 13, 14, 18, 20, 21, 26, 27, 28], "with_std": [0, 27], "wither": 6, "within": [0, 2, 3, 4, 7, 9, 12, 13, 14, 23, 25, 26, 28], "withinclust": 14, "without": [0, 1, 5, 6, 8, 9, 11, 12, 13, 15, 21, 26, 27, 28], "won": [0, 15, 26], "wonder": 8, "word": [0, 1, 3, 4, 5, 6, 7, 14, 23, 26, 27, 28], "work": [0, 1, 4, 6, 7, 8, 9, 13, 15, 16, 18, 19, 21, 22, 23, 24, 26, 27], "workshop": 26, "world": [0, 8, 16, 27], "worldwid": [0, 26], "worri": 15, "wors": [0, 1, 3, 4, 6, 26], "worth": 9, "worthi": 21, "would": [0, 1, 3, 5, 6, 7, 8, 9, 10, 11, 12, 13, 16, 18, 20, 21, 23, 26, 27, 28], "wrap": [6, 20, 26], "write": [0, 1, 2, 3, 5, 6, 7, 8, 12, 13, 15, 16, 20, 26, 27], "written": [0, 2, 3, 5, 11, 12, 13, 16, 19, 20, 21, 23, 26, 27, 28], "wrong": [1, 8, 15], "wrongli": 10, "wrote": [5, 11, 27], "wrt": [10, 13], "wth": [10, 13], "www": [19, 20, 21, 25, 26, 28], "wx_1": 8, "x": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 20, 21, 23, 26, 28], "x0": 8, "x1": [4, 8, 9, 10, 13], "x1_exampl": 8, "x1d": 8, "x2": [8, 9, 10, 13], "x2d": [8, 11], "x2d_train": 11, "x2dsl": 11, "x3": 8, "x_": [0, 2, 3, 5, 6, 8, 10, 11, 13, 14, 20, 23, 26, 27, 28], "x_0": [0, 5, 11, 18, 20, 26, 27], "x_1": [0, 2, 5, 6, 7, 8, 9, 10, 11, 13, 18, 20, 23, 26, 27, 28], "x_2": [0, 2, 5, 6, 7, 8, 9, 10, 11, 13, 20, 23, 26, 27, 28], "x_3": [8, 20, 23], "x_4": 20, "x_6": 18, "x_center": 11, "x_data": 1, "x_data_ful": 1, "x_hidden": 2, "x_i": [0, 1, 2, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 20, 23, 26, 27, 28], "x_input": 2, "x_ix_": [0, 26], "x_iy_i": 8, "x_j": [0, 2, 8, 9, 12, 16, 23, 27], "x_jy_j": 8, "x_k": [12, 14, 20, 23, 27], "x_l": 23, "x_m": [6, 12, 20, 23], "x_mean": 18, "x_n": [0, 2, 3, 6, 8, 11, 12, 13, 20, 23, 26, 28], "x_new": [9, 10], "x_norm": 18, "x_offset": [6, 27], "x_output": 2, "x_p": [3, 7, 9], "x_poli": 9, "x_poly10": 9, "x_pred": 4, "x_prev": 2, "x_reduc": 11, "x_scale": 8, "x_small": 13, "x_std": 18, "x_test": [0, 1, 3, 5, 6, 7, 9, 10, 11, 15, 16, 17, 27, 28], "x_test_": 17, "x_test_own": 6, "x_test_scal": [0, 6, 7, 9, 10, 11, 27], "x_tot": 4, "x_train": [0, 1, 3, 4, 5, 6, 7, 9, 10, 11, 15, 16, 17, 26, 27, 28], "x_train_": 17, "x_train_mean": [6, 27], "x_train_own": 6, "x_train_scal": [0, 6, 7, 9, 10, 11, 27], "x_val": 1, "xarrai": [19, 26], "xavier": 1, "xbnew": [13, 28], "xcode": [0, 19, 21, 26], "xdclassiffierconfus": 10, "xdclassiffierroc": 10, "xg_clf": 10, "xgb": 10, "xgbclassifi": 10, "xgboost": 9, "xgboot": 10, "xgbregressor": 10, "xgparam": 10, "xgtree": 10, "xi": [8, 13], "xi_": 8, "xi_1": 8, "xi_i": 8, "xk": 8, "xla": [13, 19, 26], "xlabel": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 13, 23, 26, 27, 28], "xlim": [6, 10], "xm": 9, "xmesh": 13, "xnew": [0, 13, 26, 28], "xp": 23, "xpanda": [0, 27], "xpd": [5, 11, 27], "xplot": 0, "xscale": [0, 27], "xsr": 9, "xt_x": [13, 28], "xtest": 6, "xtick": [3, 6, 8, 9], "xtrain": 6, "xu": [0, 26], "xx": [0, 20, 26], "xy": [0, 6, 8, 20, 26], 
"xytext": 8, "xz": [20, 26], "y": [0, 1, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 20, 21, 23, 26, 27, 28], "y1": 4, "y2": 4, "y3": 4, "y_": [0, 1, 5, 6, 10, 11, 20, 26, 27], "y_0": [0, 5, 11, 20, 26, 27], "y_1": [0, 5, 8, 9, 11, 13, 20, 26, 27, 28], "y_1y_1": 8, "y_1y_1k": 8, "y_1y_2": 8, "y_1y_2k": 8, "y_1y_n": 8, "y_1y_nk": 8, "y_2": [0, 5, 8, 9, 11, 20, 26, 27], "y_2y_1": 8, "y_2y_1k": 8, "y_2y_2": 8, "y_2y_2k": 8, "y_3": [0, 9, 20], "y_4": 20, "y_center": 18, "y_data": [0, 1, 5, 6, 26, 27, 28], "y_data_ful": 1, "y_decis": 8, "y_fit": [0, 27], "y_i": [0, 1, 5, 6, 7, 8, 9, 10, 11, 12, 13, 20, 21, 26, 27, 28], "y_if_": 10, "y_ix_": [0, 26], "y_ix_i": [7, 8, 13, 27, 28], "y_iy_jk": 8, "y_j": [6, 8, 12, 21], "y_k": 12, "y_m": 20, "y_mean": 18, "y_model": [0, 4, 5, 6, 26, 27, 28], "y_n": [8, 13, 28], "y_ny_1": 8, "y_ny_1k": 8, "y_ny_2": 8, "y_ny_2k": 8, "y_ny_n": 8, "y_ny_nk": 8, "y_offset": [6, 17, 27], "y_plot": 9, "y_pred": [0, 1, 4, 6, 7, 8, 9, 10, 27], "y_pred1": 9, "y_pred2": 9, "y_pred_rf": 10, "y_pred_tre": 10, "y_proba": [7, 10], "y_scaler": [6, 27], "y_test": [0, 1, 3, 4, 5, 6, 7, 9, 10, 11, 15, 16, 17, 27, 28], "y_test_onehot": 1, "y_test_predict": [], "y_tot": 4, "y_train": [0, 1, 3, 4, 5, 6, 7, 9, 10, 11, 15, 16, 17, 26, 27, 28], "y_train_mean": [6, 27], "y_train_onehot": 1, "y_train_predict": [], "y_train_scal": [6, 27], "y_val": 1, "ye": [3, 6, 7], "year": [0, 19, 26], "yet": [0, 1, 6, 8, 11, 13, 26], "yi": 13, "yield": [0, 2, 5, 6, 8, 10, 12, 13, 14, 20, 23, 26, 28], "yk": 8, "ylabel": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 13, 23, 26, 27, 28], "ylim": [3, 6], "ym": 9, "ymesh": 13, "yn": 0, "yo": [8, 9, 10], "yoshua": [1, 25], "you": [0, 1, 3, 4, 5, 6, 8, 9, 10, 11, 13, 15, 16, 17, 18, 19, 20, 21, 23, 24, 25, 26, 27, 28], "young": 0, "your": [1, 2, 4, 5, 6, 8, 11, 13, 15, 17, 19, 20, 26, 28], "your_model_object": 16, "yourself": [11, 13, 26, 28], "youtu": [27, 28], "youtub": [19, 28], "ypred": 6, "ypredict": [0, 13, 26, 27, 28], "ypredict2": [13, 28], "ypredictlasso": [5, 28], "ypredictol": [0, 5, 28], "ypredictown": [6, 27], "ypredictownridg": [6, 27, 28], "ypredictridg": [0, 5, 6, 27, 28], "ypredictskl": [6, 27], "ytest": 6, "ytick": [3, 6, 8, 9], "ytild": [0, 6, 26, 27], "ytildelasso": [5, 28], "ytildenp": [0, 26, 27], "ytildeol": [0, 5, 28], "ytildeownridg": [6, 27, 28], "ytilderidg": [5, 6, 27, 28], "ytrain": 6, "yuxi": 26, "yx": [20, 26], "yy": [20, 26], "yz": [20, 26], "z": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 12, 13, 20, 23, 26, 27], "z_": [1, 2, 12, 20, 26], "z_0": [20, 26], "z_1": [20, 26], "z_2": [20, 26], "z_c": 1, "z_h": 1, "z_hidden": 2, "z_i": [1, 12], "z_j": [1, 12], "z_k": [12, 27], "z_m": 1, "z_mod": 9, "z_o": 1, "z_output": 2, "zaman": 23, "zaxi": 6, "zero": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 16, 17, 18, 20, 21, 23, 26, 27, 28], "zeros_lik": 4, "zeroth": 27, "zfill": 4, "zip": [4, 6], "zm_h": [0, 26], "zn": [], "zone": [], "zoom": 26, "zx": [20, 26], "zy": [20, 26], "zz": [20, 26], "\u00f8yvind": [6, 27]}, "titles": ["3. Linear Regression", "14. Building a Feed Forward Neural Network", "15. Solving Differential Equations with Deep Learning", "16. Convolutional Neural Networks", "17. Recurrent neural networks: Overarching view", "4. Ridge and Lasso Regression", "5. Resampling Methods", "6. Logistic Regression", "8. Support Vector Machines, overarching aims", "9. Decision trees, overarching aims", "10. Ensemble Methods: From a Single Tree to Many Trees and Extreme Boosting, Meet the Jungle of Methods", "11. 
Basic ideas of the Principal Component Analysis (PCA)", "13. Neural networks", "7. Optimization, the central part of any Machine Learning algortithm", "12. Clustering and Unsupervised Learning", "Exercises week 34", "Exercises week 35", "Exercises week 36", "Exercises week 37", "Applied Data Analysis and Machine Learning", "2. Linear Algebra, Handling of Arrays and more Python Features", "Project 1 on Machine Learning, deadline October 6 (midnight), 2025", "Course setting", "1. Elements of Probability Theory and Statistical Data Analysis", "Teachers and Grading", "Textbooks", "Week 34: Introduction to the course, Logistics and Practicalities", "Week 35: From Ordinary Linear Regression to Ridge and Lasso Regression", "Week 36: Linear Regression and Gradient descent"], "titleterms": {"": [8, 10, 28], "1": [0, 15, 16, 17, 18, 21, 27], "1a": 18, "2": [0, 15, 16, 17, 18, 26, 27, 28], "2023": 24, "2025": 21, "2a": [], "2b": [], "3": [0, 15, 16, 17, 18, 27], "34": [15, 26], "35": [16, 27], "36": [17, 28], "37": 18, "3a": 18, "3b": 18, "4": [0, 15, 16, 17, 18, 27], "4a": 18, "4b": 18, "5": [0, 16, 18], "6": 21, "A": [0, 1, 4, 8, 9, 26], "And": [26, 27], "In": 24, "Ising": 6, "The": [0, 1, 2, 3, 5, 6, 7, 8, 9, 11, 12, 15, 19, 26, 27, 28], "To": 26, "With": [4, 28], "about": [26, 27, 28], "abov": 28, "activ": [1, 12], "ad": [0, 6, 21, 26, 27], "adaboost": 10, "adagrad": 13, "adam": 13, "adapt": 10, "adjust": 1, "advanc": 21, "adversari": 4, "again": [3, 9], "ai": [21, 26], "aim": [8, 9, 26], "aka": 26, "algebra": [20, 26], "algorithm": [9, 10, 11, 12, 26, 27, 28], "algortithm": [13, 28], "all": 8, "an": [0, 4, 10, 15, 26], "analys": [5, 27, 28], "analysi": [0, 5, 6, 11, 19, 21, 23, 26, 27, 28], "analyt": [0, 16, 18], "ani": [13, 28], "anoth": [9, 28], "appli": 19, "approach": [0, 8, 14, 26], "approxim": 12, "architectur": 1, "arrai": [20, 26], "assist": 24, "autocorrel": 23, "autograd": [2, 13], "automat": 13, "b": 21, "back": [1, 11, 12, 27, 28], "background": [19, 21], "bag": 10, "base": 13, "basic": [0, 5, 7, 9, 10, 11, 20, 27, 28], "batch": 1, "bay": 5, "befor": 11, "better": 8, "bia": [6, 21], "binari": 1, "bind": 26, "bird": 10, "boldsymbol": [18, 27], "boost": 10, "bootstrap": [6, 10], "boston": [], "breast": 1, "brief": 26, "bring": 12, "build": [1, 3, 9], "c": [21, 26], "calcul": [18, 27, 28], "can": 26, "cancer": [1, 7, 9, 11], "cart": 9, "case": [8, 10, 23, 27, 28], "central": [13, 19, 23, 28], "chain": 12, "chang": 10, "channel": 26, "chi": [0, 26], "choic": 17, "choos": 1, "cifar01": 3, "classic": 11, "classif": [1, 9, 10], "classifi": 8, "clip": 1, "cluster": 14, "cnn": 3, "code": [1, 2, 5, 9, 11, 12, 13, 14, 15, 16, 21, 26, 27, 28], "collect": [1, 3], "commun": 26, "compar": [2, 10, 16], "comparison": 28, "complet": 27, "complex": [0, 6, 21, 27], "complic": 6, "compon": 11, "comput": 9, "computerlab": 26, "con": 9, "concept": 23, "condit": 28, "conjug": 13, "contn": 26, "convex": [8, 13, 28], "convolut": [3, 12], "correl": [11, 27], "correspond": [], "cost": [1, 10, 27, 28], "cours": [19, 22, 25, 26], "covari": [5, 11, 23, 27], "cover": 26, "creat": 16, "cross": [6, 21], "cython": 26, "d": 21, "data": [0, 1, 3, 6, 7, 9, 11, 15, 17, 18, 19, 23, 26, 27], "dataset": [1, 3, 18], "david": 26, "deadlin": [21, 26], "deadllin": 24, "decai": 2, "decis": [9, 10], "decomposit": [5, 11, 20, 27, 28], "deeep": [], "deep": [1, 2, 26], "defin": [1, 26], "degre": [0, 17, 27], "deliver": [15, 16], "deliveri": 21, "dens": 0, "deriv": [5, 12, 16, 17, 27, 28], "descent": [2, 10, 13, 18, 21, 28], "design": 
27, "detail": [3, 26], "develop": 1, "diagon": 11, "differ": 8, "differenti": [2, 13], "diffus": 2, "dimension": [2, 3, 8, 18], "disadvantag": 9, "discret": 23, "discrimin": 26, "distribut": [5, 23], "do": 1, "doe": [27, 28], "domain": 23, "down": 1, "dropout": 1, "e": 21, "economi": [27, 28], "electron": 21, "element": [0, 23, 26], "elimin": 20, "energi": 26, "ensembl": 10, "entropi": 9, "environ": [0, 15], "equat": [0, 2, 12, 27, 28], "error": [0, 10, 26, 27, 28], "essenti": 26, "etc": 26, "euler": 2, "evalu": 1, "exampl": [1, 2, 3, 4, 6, 7, 8, 9, 10, 26, 27, 28], "exercis": [0, 6, 15, 16, 17, 18, 27], "expect": 23, "experi": 23, "explor": 0, "exponenti": 2, "express": [16, 17, 27], "extend": 28, "extrapol": 4, "extrem": [10, 26], "ey": 10, "f": 21, "fall": 24, "famili": [1, 26], "famou": 20, "fantast": [27, 28], "featur": [9, 16, 20, 27], "feed": [1, 12], "final": [12, 27], "find": [16, 18], "fine": 1, "first": [4, 12, 26, 28], "fit": [0, 10, 15, 16, 26, 28], "fix": [27, 28], "forc": 3, "forest": 10, "form": 18, "format": [21, 26], "formula": 18, "forward": [1, 2, 12], "foster": 26, "fourier": 3, "frank": 6, "freedom": [0, 17, 27], "frequent": 27, "frequentist": [0, 26], "from": [5, 10, 12, 26, 27, 28], "full": 2, "function": [0, 1, 6, 7, 8, 10, 11, 12, 13, 21, 23, 26, 27, 28], "further": [3, 5, 27, 28], "g": 21, "gan": 4, "gaussian": 20, "gd": 13, "gener": [4, 9, 26], "geometr": [11, 28], "gini": 9, "github": 15, "goal": [15, 16, 17, 18], "good": [0, 26], "grade": [24, 26], "gradient": [1, 2, 10, 13, 18, 21, 28], "growth": 2, "h": 21, "ha": 19, "handl": [20, 26], "hessian": [27, 28], "hidden": 2, "hous": [], "how": 16, "hyperparamet": [1, 17], "hyperplan": 8, "i": [0, 1, 26], "id3": 9, "idea": 11, "ideal": 28, "ii": 26, "illustr": 28, "implement": [1, 16, 17, 18], "implic": [5, 27, 28], "import": [5, 20, 26, 27, 28], "improv": 1, "includ": [13, 21], "increment": 11, "index": 9, "inform": 24, "input": 2, "instal": [19, 21, 26], "instructor": 24, "interpret": [5, 11, 26, 27, 28], "introduc": [11, 13, 27], "introduct": [0, 6, 19, 20, 21, 26], "invers": [5, 20], "invert": [27, 28], "iter": 10, "its": 27, "jacobian": 27, "jax": 13, "julia": 26, "jungl": 10, "kera": [1, 3], "kernel": [8, 11], "lab": 28, "lagrangian": 8, "lasso": [5, 6, 21, 27, 28], "last": 27, "later": [5, 27, 28], "layer": [1, 2, 3, 12], "learn": [0, 1, 2, 11, 13, 14, 15, 16, 17, 18, 19, 21, 26, 27, 28], "least": [5, 6, 16, 21, 26, 27, 28], "lectur": [26, 28], "level": 10, "librari": [19, 26], "likelihood": 7, "limit": [1, 13, 23, 28], "linear": [0, 8, 13, 15, 20, 26, 27, 28], "link": [5, 11, 25, 27], "literatur": 21, "logist": [7, 26], "loss": [27, 28], "lu": 20, "machin": [0, 8, 13, 19, 21, 26, 28], "main": [23, 26], "make": [0, 9, 10, 27], "mani": [10, 12], "mass": 26, "materi": [21, 26, 27, 28], "math": [5, 27, 28], "mathemat": [3, 5, 8, 27, 28], "matric": [5, 20, 26], "matrix": [1, 5, 11, 12, 16, 20, 26, 27, 28], "matter": 0, "max": 27, "mean": [0, 27, 28], "meet": [5, 10, 23, 26, 27], "mercer": 8, "method": [6, 9, 10, 13, 21, 26, 28], "midnight": 21, "min": 27, "minim": 26, "ml": 26, "mlp": 12, "mnist": [3, 4], "model": [0, 1, 4, 6, 12, 15, 17, 26], "momentum": [13, 21], "mondai": 28, "moon": [8, 9], "more": [3, 6, 20, 21, 26, 27, 28], "multilay": 12, "multipl": [1, 3, 17], "multipli": 8, "need": [21, 26], "network": [1, 2, 3, 4, 7, 12, 26], "neural": [1, 2, 3, 4, 7, 12, 26], "new": [4, 18], "newton": 28, "non": 8, "normal": [0, 1], "notat": 12, "note": [21, 27, 28], "now": [1, 9, 13, 28], "nuclear": [0, 26], "numba": 
26, "number": [0, 2, 23, 27], "numer": [2, 21, 23], "numpi": [20, 26], "object": 3, "obtain": 11, "octob": 21, "od": 2, "off": [6, 21], "ol": [5, 6, 15, 16, 18, 21, 28], "one": [2, 12, 18, 28], "oper": 20, "optim": [1, 8, 13, 18, 19, 26, 27, 28], "order": [13, 18], "ordinari": [5, 6, 16, 21, 26, 27, 28], "organ": [0, 26], "oslo": 25, "other": [4, 9, 11, 12, 20, 21, 26], "our": [0, 4, 5, 11, 13, 21, 26, 27, 28], "outcom": [19, 26], "output": 2, "overarch": [0, 4, 8, 9, 26, 27], "overview": [10, 26], "own": [0, 10, 11, 21, 26, 27], "packag": [20, 26], "panda": [26, 27], "paramet": [26, 27], "paramt": 18, "part": [13, 19, 21, 28], "partial": 2, "pass": 1, "pca": 11, "pdf": 23, "perceptron": 12, "perform": [1, 9], "period": 3, "perspect": 1, "plan": [27, 28], "plethora": 26, "point": 4, "poisson": 2, "polynomi": [3, 16, 18, 28], "popul": 2, "popular": 26, "practic": [13, 24, 26], "pre": [1, 3], "preambl": 21, "predict": 4, "preprocess": 27, "prerequisit": [3, 19, 26], "princip": 11, "principl": 3, "pro": 9, "probabl": [5, 23], "problem": [1, 2, 13, 26, 27, 28], "procedur": [9, 26], "process": [1, 3], "program": [2, 13, 21, 28], "project": [6, 21, 24, 26], "prop": 13, "propag": [1, 12], "properti": [5, 23, 27, 28], "python": [0, 9, 15, 19, 20, 26], "quick": 8, "r": 26, "random": [10, 11, 23], "raphson": 28, "rate": 21, "read": [9, 26, 27], "real": [6, 26], "recommend": [26, 27], "recurr": [4, 12], "reduc": [0, 27], "reduct": 3, "refer": 21, "reformul": 2, "regress": [0, 5, 6, 7, 9, 10, 13, 15, 17, 18, 21, 26, 27, 28], "regular": 1, "relat": [], "relev": [25, 27], "relu": 1, "remark": 3, "remind": [6, 8, 26, 27, 28], "replac": 13, "report": 21, "repositori": 15, "requir": [2, 19], "resampl": [6, 21], "rescal": [6, 27], "residu": [27, 28], "resourc": 2, "result": [27, 28], "revisit": [13, 28], "rewrit": [26, 27], "ridg": [0, 5, 6, 17, 18, 21, 27, 28], "rm": 13, "rule": 12, "rung": 21, "same": 13, "sampl": 11, "scale": [17, 18, 27], "schedul": 26, "schemat": 9, "scheme": 2, "scienc": 26, "scikit": [0, 1, 11, 26, 27, 28], "second": [13, 18], "semest": 24, "sensit": 28, "septemb": 28, "session": 28, "set": [0, 2, 3, 9, 12, 15, 22, 26, 27, 28], "setup": 15, "sgd": 13, "should": 1, "similar": 13, "simpl": [0, 4, 9, 13, 18, 26, 27, 28], "simplest": 18, "singl": 10, "singular": [5, 11, 27, 28], "size": [27, 28], "sklearn": 16, "soft": 8, "softmax": 1, "softwar": [21, 26], "solv": [2, 28], "solver": 13, "some": [13, 20, 27, 28], "specifi": 2, "split": [0, 15, 27], "squar": [0, 5, 6, 10, 16, 21, 26, 27, 28], "standard": [13, 27], "state": 0, "statist": [5, 6, 19, 23, 26], "steepest": [10, 13, 28], "stochast": [13, 21, 23], "strongli": 26, "suggest": 26, "summari": [24, 26], "superposit": 3, "supervis": 1, "support": 8, "svd": [5, 27, 28], "synthet": 18, "systemat": 3, "t": 27, "take": 16, "taken": 26, "teach": 24, "teacher": [24, 26], "technic": 27, "techniqu": [6, 11, 21], "technologi": 19, "tensorflow": [1, 3], "tent": [24, 26], "test": [0, 1, 15, 17, 27], "text": 26, "textbook": [25, 26], "than": 28, "theorem": [5, 8, 11, 12, 23], "theori": 23, "theta": 18, "thi": 26, "tip": 13, "togeth": 12, "tool": [21, 26], "top": 1, "topic": 26, "toward": 11, "trade": [6, 21], "tradeoff": 6, "train": [0, 1, 4, 15, 26, 27], "transform": 3, "tree": [9, 10], "tuesdai": 28, "tune": 1, "two": [3, 8, 19], "type": [2, 4, 12, 26], "uio": 26, "univers": [12, 25], "unsupervis": 14, "up": [0, 2, 9, 12, 15, 26, 27, 28], "updat": 21, "us": [0, 1, 2, 3, 7, 13, 16, 18, 19, 21, 26, 27, 28], "v": 3, "valid": [6, 21], "valu": [5, 
11, 23, 27, 28], "variabl": [23, 28], "varianc": [6, 21], "variou": 0, "vector": [8, 12, 16, 20, 26, 27], "versu": 26, "view": [0, 4, 10, 27], "virtual": 15, "visual": [1, 9], "wai": [9, 21], "wave": 2, "we": 26, "wednesdai": 28, "week": [15, 16, 17, 18, 26, 27, 28], "weekli": [], "what": [0, 26, 27, 28], "which": 1, "why": 26, "wisconsin": 7, "write": [4, 11, 21, 28], "x": 27, "xgboost": 10, "yet": 28, "your": [0, 10, 16, 18, 21, 27]}}) \ No newline at end of file +Search.setIndex({"alltitles": {"1a)": [[18, "a"]], "3D volumes of neurons": [[43, "d-volumes-of-neurons"], [44, "d-volumes-of-neurons"]], "3a)": [[18, "id1"]], "3b)": [[18, "b"]], "4a)": [[18, "id2"]], "4b)": [[18, "id3"]], "A Classification Tree": [[9, "a-classification-tree"]], "A Frequentist approach to data analysis": [[0, "a-frequentist-approach-to-data-analysis"], [33, "a-frequentist-approach-to-data-analysis"]], "A better approach": [[8, "a-better-approach"]], "A deep CNN model (From Raschka et al)": [[43, "a-deep-cnn-model-from-raschka-et-al"], [44, "a-deep-cnn-model-from-raschka-et-al"]], "A first summary": [[33, "a-first-summary"]], "A more compact expression": [[38, "a-more-compact-expression"], [39, "a-more-compact-expression"]], "A more efficient way of coding the above Convolution": [[43, "a-more-efficient-way-of-coding-the-above-convolution"], [44, "a-more-efficient-way-of-coding-the-above-convolution"]], "A new Cost Function": [[37, "a-new-cost-function"]], "A possible implementation of a neural network": [[42, "a-possible-implementation-of-a-neural-network"], [43, "a-possible-implementation-of-a-neural-network"]], "A quick Reminder on Lagrangian Multipliers": [[8, "a-quick-reminder-on-lagrangian-multipliers"]], "A simple example": [[4, "a-simple-example"]], "A soft classifier": [[8, "a-soft-classifier"]], "A top-down perspective on Neural networks": [[1, "a-top-down-perspective-on-neural-networks"], [41, "a-top-down-perspective-on-neural-networks"]], "A way to Read the Bias-Variance Tradeoff": [[37, "a-way-to-read-the-bias-variance-tradeoff"], [38, "a-way-to-read-the-bias-variance-tradeoff"]], "ADAM algorithm, taken from Goodfellow et al": [[36, "adam-algorithm-taken-from-goodfellow-et-al"]], "ADAM optimizer": [[13, "adam-optimizer"], [36, "id2"]], "Accuracy": [[36, "accuracy"]], "Activation functions": [[12, "activation-functions"], [39, "activation-functions"], [41, "activation-functions"], [41, "id3"], [42, "activation-functions"], [42, "id1"]], "Activation functions, Logistic and Hyperbolic ones": [[39, "activation-functions-logistic-and-hyperbolic-ones"], [41, "activation-functions-logistic-and-hyperbolic-ones"]], "Activation functions, examples": [[42, "activation-functions-examples"]], "AdaGrad Properties": [[36, "adagrad-properties"]], "AdaGrad Update Rule Derivation": [[36, "adagrad-update-rule-derivation"]], "AdaGrad algorithm, taken from Goodfellow et al": [[36, "adagrad-algorithm-taken-from-goodfellow-et-al"]], "Adam Optimizer": [[36, "adam-optimizer"]], "Adam vs. 
AdaGrad and RMSProp": [[36, "adam-vs-adagrad-and-rmsprop"]], "Adam: Bias Correction": [[36, "adam-bias-correction"]], "Adam: Exponential Moving Averages (Moments)": [[36, "adam-exponential-moving-averages-moments"]], "Adam: Update Rule Derivation": [[36, "adam-update-rule-derivation"]], "Adaptive boosting: AdaBoost, Basic Algorithm": [[10, "adaptive-boosting-adaboost-basic-algorithm"]], "Adaptivity Across Dimensions": [[36, "adaptivity-across-dimensions"]], "Add Dense layers on top": [[44, "add-dense-layers-on-top"]], "Adding Neural Networks": [[39, "adding-neural-networks"]], "Adding a hidden layer": [[40, "adding-a-hidden-layer"], [41, "adding-a-hidden-layer"]], "Adding error analysis and training set up": [[33, "adding-error-analysis-and-training-set-up"], [34, "adding-error-analysis-and-training-set-up"]], "Adjust hyperparameters": [[1, "adjust-hyperparameters"], [41, "adjust-hyperparameters"]], "Algorithms and codes for Adagrad, RMSprop and Adam": [[36, "algorithms-and-codes-for-adagrad-rmsprop-and-adam"]], "Algorithms for Setting up Decision Trees": [[9, "algorithms-for-setting-up-decision-trees"]], "An Overview of Ensemble Methods": [[10, "an-overview-of-ensemble-methods"]], "An extrapolation example": [[4, "an-extrapolation-example"]], "An optimization/minimization problem": [[33, "an-optimization-minimization-problem"]], "Analyzing the last results": [[40, "analyzing-the-last-results"], [41, "analyzing-the-last-results"]], "And a similar example using Tensorflow with Keras": [[42, "and-a-similar-example-using-tensorflow-with-keras"]], "And finally \\boldsymbol{X}\\boldsymbol{X}^T": [[34, "and-finally-boldsymbol-x-boldsymbol-x-t"]], "And finally ADAM": [[36, "and-finally-adam"]], "And what about using neural networks?": [[33, "and-what-about-using-neural-networks"]], "Another Example from Scikit-Learn\u2019s Repository": [[37, "another-example-from-scikit-learn-s-repository"], [38, "another-example-from-scikit-learn-s-repository"]], "Another Example, now with a polynomial fit": [[35, "another-example-now-with-a-polynomial-fit"]], "Another example, the moons again": [[9, "another-example-the-moons-again"]], "Applied Data Analysis and Machine Learning": [[25, null]], "Artificial neurons": [[39, "artificial-neurons"], [40, "artificial-neurons"]], "Assumptions made": [[37, "assumptions-made"]], "Autocorrelation function": [[30, "autocorrelation-function"]], "Automatic differentiation": [[13, "automatic-differentiation"], [40, "automatic-differentiation"]], "Automatic differentiation through examples": [[40, "automatic-differentiation-through-examples"]], "Back propagation": [[42, "back-propagation"], [43, "back-propagation"]], "Back propagation and automatic differentiation": [[42, "back-propagation-and-automatic-differentiation"]], "Back to Ridge and LASSO Regression": [[34, "back-to-ridge-and-lasso-regression"], [35, "back-to-ridge-and-lasso-regression"]], "Back to the Cancer Data": [[11, "back-to-the-cancer-data"]], "Background literature": [[27, "background-literature"], [28, "background-literature"]], "Bagging": [[10, "bagging"]], "Bagging Examples": [[10, "bagging-examples"]], "Basic Matrix Features": [[26, "basic-matrix-features"]], "Basic ideas of the Principal Component Analysis (PCA)": [[11, null]], "Basic math of the SVD": [[5, "basic-math-of-the-svd"], [34, "basic-math-of-the-svd"], [35, "basic-math-of-the-svd"]], "Basics": [[7, "basics"], [38, "basics"], [39, "basics"]], "Basics of a tree": [[9, "basics-of-a-tree"]], "Basics of an NN": [[40, "basics-of-an-nn"]], "Batch 
Normalization": [[1, "batch-normalization"], [41, "batch-normalization"]], "Batches and mini-batches": [[36, "batches-and-mini-batches"]], "Bayes\u2019 Theorem and Ridge and Lasso Regression": [[5, "bayes-theorem-and-ridge-and-lasso-regression"]], "Boosting, a Bird\u2019s Eye View": [[10, "boosting-a-bird-s-eye-view"]], "Bootstrap": [[6, "bootstrap"]], "Bringing it together": [[40, "bringing-it-together"], [41, "bringing-it-together"]], "Bringing it together, first back propagation equation": [[12, "bringing-it-together-first-back-propagation-equation"]], "Building a Feed Forward Neural Network": [[1, null]], "Building a neural network code": [[41, "building-a-neural-network-code"]], "Building a tree, regression": [[9, "building-a-tree-regression"]], "Building code using Pytorch": [[44, "building-code-using-pytorch"]], "Building convolutional neural networks in Tensorflow/Keras and PyTorch": [[43, "building-convolutional-neural-networks-in-tensorflow-keras-and-pytorch"]], "Building convolutional neural networks using Tensorflow and Keras": [[44, "building-convolutional-neural-networks-using-tensorflow-and-keras"]], "Building neural networks in Tensorflow and Keras": [[1, "building-neural-networks-in-tensorflow-and-keras"], [41, "building-neural-networks-in-tensorflow-and-keras"], [42, "building-neural-networks-in-tensorflow-and-keras"]], "Building our own neural network code": [[42, "building-our-own-neural-network-code"]], "But none of these can compete with Newton\u2019s method": [[36, "but-none-of-these-can-compete-with-newton-s-method"]], "CNNs in brief": [[43, "cnns-in-brief"], [44, "cnns-in-brief"]], "CNNs in more detail, building convolutional neural networks in Tensorflow and Keras": [[3, "cnns-in-more-detail-building-convolutional-neural-networks-in-tensorflow-and-keras"]], "CNNs in more detail, simple example": [[43, "cnns-in-more-detail-simple-example"], [44, "cnns-in-more-detail-simple-example"]], "Cancer Data again now with Decision Trees and other Methods": [[9, "cancer-data-again-now-with-decision-trees-and-other-methods"]], "Chain rule": [[40, "chain-rule"]], "Chain rule, forward and reverse modes": [[40, "chain-rule-forward-and-reverse-modes"]], "Challenge: Choosing a Fixed Learning Rate": [[36, "challenge-choosing-a-fixed-learning-rate"]], "Choose cost function and optimizer": [[1, "choose-cost-function-and-optimizer"], [41, "choose-cost-function-and-optimizer"]], "Class of functions we can approximate": [[40, "class-of-functions-we-can-approximate"]], "Classical PCA Theorem": [[11, "classical-pca-theorem"]], "Classification and Regression, writing our own neural network code": [[28, "classification-and-regression-writing-our-own-neural-network-code"]], "Classification problems": [[38, "classification-problems"], [39, "classification-problems"]], "Clustering and Unsupervised Learning": [[14, null]], "Code Example for Cross-validation and k-fold Cross-validation": [[37, "code-example-for-cross-validation-and-k-fold-cross-validation"], [38, "code-example-for-cross-validation-and-k-fold-cross-validation"]], "Code example": [[40, "code-example"], [41, "code-example"]], "Code example for the Bootstrap method": [[37, "code-example-for-the-bootstrap-method"]], "Code for SVD and Inversion of Matrices": [[5, "code-for-svd-and-inversion-of-matrices"]], "Code with a Number of Minibatches which varies": [[36, "code-with-a-number-of-minibatches-which-varies"]], "Codes and Approaches": [[14, "codes-and-approaches"]], "Codes for the SVD": [[5, "codes-for-the-svd"], [34, 
"codes-for-the-svd"], [35, "codes-for-the-svd"]], "Coding Setup and Linear Regression": [[15, "coding-setup-and-linear-regression"]], "Collect and pre-process data": [[1, "collect-and-pre-process-data"], [41, "collect-and-pre-process-data"], [41, "id2"], [42, "collect-and-pre-process-data"]], "Communication channels": [[33, "communication-channels"]], "Commutative process": [[43, "commutative-process"], [44, "commutative-process"]], "Compact expressions": [[40, "compact-expressions"], [41, "compact-expressions"]], "Compare Bagging on Trees with Random Forests": [[10, "compare-bagging-on-trees-with-random-forests"]], "Comparing with a numerical scheme": [[2, "comparing-with-a-numerical-scheme"], [42, "comparing-with-a-numerical-scheme"], [43, "comparing-with-a-numerical-scheme"]], "Comparison with OLS": [[35, "comparison-with-ols"]], "Compile and train the model": [[44, "compile-and-train-the-model"]], "Completing the list": [[40, "completing-the-list"], [41, "completing-the-list"]], "Computation of gradients": [[36, "computation-of-gradients"]], "Computing the Gini index": [[9, "computing-the-gini-index"]], "Conditions on convex functions": [[35, "conditions-on-convex-functions"]], "Confidence Intervals": [[37, "confidence-intervals"]], "Confusion Matrix": [[23, "confusion-matrix"]], "Conjugate gradient method": [[13, "conjugate-gradient-method"]], "Convergence rates": [[36, "convergence-rates"]], "Convex function": [[35, "convex-function"]], "Convex functions": [[13, "convex-functions"], [35, "convex-functions"]], "Convolution Examples: Polynomial multiplication": [[3, "convolution-examples-polynomial-multiplication"], [43, "convolution-examples-polynomial-multiplication"], [44, "convolution-examples-polynomial-multiplication"]], "Convolution Examples: Principle of Superposition and Periodic Forces (Fourier Transforms)": [[3, "convolution-examples-principle-of-superposition-and-periodic-forces-fourier-transforms"]], "Convolutional Neural Network": [[12, "convolutional-neural-network"], [39, "convolutional-neural-network"], [40, "convolutional-neural-network"]], "Convolutional Neural Networks": [[3, null]], "Convolutional Neural Networks (recognizing images)": [[43, "convolutional-neural-networks-recognizing-images"]], "Convolutional Neural Networks (recognizing images), reminder from last week": [[44, "convolutional-neural-networks-recognizing-images-reminder-from-last-week"]], "Correlation Function and Design/Feature Matrix": [[34, "correlation-function-and-design-feature-matrix"]], "Correlation Matrix": [[11, "correlation-matrix"], [34, "correlation-matrix"]], "Correlation Matrix with Pandas": [[34, "correlation-matrix-with-pandas"]], "Cost functions": [[41, "cost-functions"], [42, "cost-functions"]], "Counting the number of floating point operations": [[40, "counting-the-number-of-floating-point-operations"]], "Course Format": [[33, "course-format"]], "Course setting": [[29, null]], "Covariance Matrix Examples": [[34, "covariance-matrix-examples"]], "Covariance and Correlation Matrix": [[34, "covariance-and-correlation-matrix"]], "Cross correlation": [[43, "cross-correlation"], [44, "cross-correlation"]], "Cross-validation": [[6, "cross-validation"]], "Cross-validation in brief": [[37, "cross-validation-in-brief"], [38, "cross-validation-in-brief"]], "Cumulative Gain": [[23, "cumulative-gain"]], "Deadlines for projects (tentative)": [[33, "deadlines-for-projects-tentative"]], "Decision trees, overarching aims": [[9, null]], "Deep Neural Networks": [[36, "deep-neural-networks"]], "Deep 
learning methods": [[33, "deep-learning-methods"]], "Define model and architecture": [[1, "define-model-and-architecture"], [41, "define-model-and-architecture"]], "Defining intermediate operations": [[40, "defining-intermediate-operations"]], "Defining the cost function": [[1, "defining-the-cost-function"], [41, "defining-the-cost-function"]], "Defining the problem": [[42, "defining-the-problem"], [43, "defining-the-problem"]], "Definitions": [[19, "definitions"], [40, "definitions"], [41, "definitions"]], "Deliverables": [[15, "deliverables"], [16, "deliverables"], [19, "deliverables"], [20, "deliverables"], [24, "deliverables"], [27, "deliverables"], [28, "deliverables"]], "Derivation of the AdaGrad Algorithm": [[36, "derivation-of-the-adagrad-algorithm"]], "Derivative of the cost function": [[40, "derivative-of-the-cost-function"], [41, "derivative-of-the-cost-function"]], "Derivatives and the chain rule": [[12, "derivatives-and-the-chain-rule"], [40, "derivatives-and-the-chain-rule"], [41, "derivatives-and-the-chain-rule"]], "Derivatives in terms of z_j^L": [[40, "derivatives-in-terms-of-z-j-l"], [41, "derivatives-in-terms-of-z-j-l"]], "Derivatives of the hidden layer": [[40, "derivatives-of-the-hidden-layer"], [41, "derivatives-of-the-hidden-layer"]], "Derivatives, example 1": [[34, "derivatives-example-1"]], "Deriving OLS from a probability distribution": [[5, "deriving-ols-from-a-probability-distribution"], [37, "deriving-ols-from-a-probability-distribution"]], "Deriving and Implementing Ordinary Least Squares": [[16, "deriving-and-implementing-ordinary-least-squares"]], "Deriving and Implementing Ridge Regression": [[17, "deriving-and-implementing-ridge-regression"]], "Deriving the Lasso Regression Equations": [[34, "deriving-the-lasso-regression-equations"], [35, "deriving-the-lasso-regression-equations"], [35, "id6"]], "Deriving the Ridge Regression Equations": [[34, "deriving-the-ridge-regression-equations"], [35, "deriving-the-ridge-regression-equations"], [35, "id3"]], "Deriving the back propagation code for a multilayer perceptron model": [[12, "deriving-the-back-propagation-code-for-a-multilayer-perceptron-model"]], "Developing a code for doing neural networks with back propagation": [[1, "developing-a-code-for-doing-neural-networks-with-back-propagation"], [41, "developing-a-code-for-doing-neural-networks-with-back-propagation"]], "Diagonalize the sample covariance matrix to obtain the principal components": [[11, "diagonalize-the-sample-covariance-matrix-to-obtain-the-principal-components"]], "Different kernels and Mercer\u2019s theorem": [[8, "different-kernels-and-mercer-s-theorem"]], "Disadvantages": [[9, "disadvantages"]], "Discriminative Modeling": [[33, "discriminative-modeling"]], "Discussing the correlation data": [[39, "discussing-the-correlation-data"]], "Does Logistic Regression do a better Job?": [[39, "does-logistic-regression-do-a-better-job"]], "Domains and probabilities": [[30, "domains-and-probabilities"]], "Dropout": [[1, "dropout"], [41, "dropout"]], "ELU function": [[41, "elu-function"], [42, "elu-function"]], "Economy-size SVD": [[34, "economy-size-svd"], [35, "economy-size-svd"]], "Efficient Polynomial Multiplication": [[43, "efficient-polynomial-multiplication"], [44, "efficient-polynomial-multiplication"]], "Elements of Probability Theory and Statistical Data Analysis": [[30, null]], "Empirical Evidence: Convergence Time and Memory in Practice": [[36, "empirical-evidence-convergence-time-and-memory-in-practice"]], "Ensemble Methods: From a Single 
Tree to Many Trees and Extreme Boosting, Meet the Jungle of Methods": [[10, null]], "Entropy and the ID3 algorithm": [[9, "entropy-and-the-id3-algorithm"]], "Essential elements of ML": [[33, "essential-elements-of-ml"]], "Evaluate model performance on test data": [[1, "evaluate-model-performance-on-test-data"], [41, "evaluate-model-performance-on-test-data"]], "Example 2": [[34, "example-2"]], "Example 3": [[34, "example-3"]], "Example 4": [[34, "example-4"]], "Example Matrix": [[34, "example-matrix"], [35, "example-matrix"]], "Example code for Bias-Variance tradeoff": [[37, "example-code-for-bias-variance-tradeoff"]], "Example code for Logistic Regression": [[38, "example-code-for-logistic-regression"], [39, "example-code-for-logistic-regression"]], "Example of discriminative modeling, taken from Generative Deep Learning by David Foster": [[33, "example-of-discriminative-modeling-taken-from-generative-deep-learning-by-david-foster"]], "Example of generative modeling, taken from Generative Deep Learning by David Foster": [[33, "example-of-generative-modeling-taken-from-generative-deep-learning-by-david-foster"]], "Example of own Standard scaling": [[34, "example-of-own-standard-scaling"]], "Example relevant for the exercises": [[34, "example-relevant-for-the-exercises"]], "Example: Exponential decay": [[2, "example-exponential-decay"], [42, "example-exponential-decay"], [43, "example-exponential-decay"]], "Example: Population growth": [[2, "example-population-growth"], [42, "example-population-growth"], [43, "example-population-growth"]], "Example: Solving the one dimensional Poisson equation": [[42, "example-solving-the-one-dimensional-poisson-equation"], [43, "example-solving-the-one-dimensional-poisson-equation"]], "Example: Solving the wave equation with Neural Networks": [[42, "example-solving-the-wave-equation-with-neural-networks"]], "Example: The diffusion equation": [[2, "example-the-diffusion-equation"], [42, "example-the-diffusion-equation"], [43, "example-the-diffusion-equation"]], "Example: binary classification problem": [[1, "example-binary-classification-problem"], [41, "example-binary-classification-problem"]], "Examples": [[33, "examples"]], "Examples of CNN setups": [[43, "examples-of-cnn-setups"], [44, "examples-of-cnn-setups"]], "Examples of XOR, OR and AND gates": [[39, "examples-of-xor-or-and-and-gates"]], "Examples of likelihood functions used in logistic regression and neural networks": [[7, "examples-of-likelihood-functions-used-in-logistic-regression-and-neural-networks"]], "Examples of likelihood functions used in logistic regression and nueral networks": [[38, "examples-of-likelihood-functions-used-in-logistic-regression-and-nueral-networks"]], "Exercise 1": [[21, "exercise-1"]], "Exercise 1 - Choice of model and degrees of freedom": [[17, "exercise-1-choice-of-model-and-degrees-of-freedom"]], "Exercise 1 - Finding the derivative of Matrix-Vector expressions": [[16, "exercise-1-finding-the-derivative-of-matrix-vector-expressions"]], "Exercise 1 - Github Setup": [[15, "exercise-1-github-setup"]], "Exercise 1 - Understand the feed forward pass": [[22, "exercise-1-understand-the-feed-forward-pass"]], "Exercise 1, scale your data": [[18, "exercise-1-scale-your-data"]], "Exercise 1:": [[24, "exercise-1"]], "Exercise 1: Creating the report document": [[20, "exercise-1-creating-the-report-document"]], "Exercise 1: Expectation values for ordinary least squares expressions": [[19, "exercise-1-expectation-values-for-ordinary-least-squares-expressions"]], "Exercise 1: 
Including more data": [[40, "exercise-1-including-more-data"]], "Exercise 1: Setting up various Python environments": [[0, "exercise-1-setting-up-various-python-environments"]], "Exercise 2": [[21, "exercise-2"]], "Exercise 2 - Deriving the expression for OLS": [[16, "exercise-2-deriving-the-expression-for-ols"]], "Exercise 2 - Deriving the expression for Ridge Regression": [[17, "exercise-2-deriving-the-expression-for-ridge-regression"]], "Exercise 2 - Gradient with one layer using autograd": [[22, "exercise-2-gradient-with-one-layer-using-autograd"]], "Exercise 2 - Setting up a Github repository": [[15, "exercise-2-setting-up-a-github-repository"]], "Exercise 2, calculate the gradients": [[18, "exercise-2-calculate-the-gradients"]], "Exercise 2:": [[24, "exercise-2"]], "Exercise 2: Adding good figures": [[20, "exercise-2-adding-good-figures"]], "Exercise 2: Expectation values for Ridge regression": [[19, "exercise-2-expectation-values-for-ridge-regression"]], "Exercise 2: Extended program": [[40, "exercise-2-extended-program"]], "Exercise 2: making your own data and exploring scikit-learn": [[0, "exercise-2-making-your-own-data-and-exploring-scikit-learn"]], "Exercise 3": [[21, "exercise-3"]], "Exercise 3 - Creating feature matrix and implementing OLS using the analytical expression": [[16, "exercise-3-creating-feature-matrix-and-implementing-ols-using-the-analytical-expression"]], "Exercise 3 - Fitting an OLS model to data": [[15, "exercise-3-fitting-an-ols-model-to-data"]], "Exercise 3 - Gradient with one layer writing backpropagation by hand": [[22, "exercise-3-gradient-with-one-layer-writing-backpropagation-by-hand"]], "Exercise 3 - Scaling data": [[17, "exercise-3-scaling-data"]], "Exercise 3 - Setting up a Python virtual environment": [[15, "exercise-3-setting-up-a-python-virtual-environment"]], "Exercise 3, using the analytical formulae for OLS and Ridge regression to find the optimal paramters \\boldsymbol{\\theta}": [[18, "exercise-3-using-the-analytical-formulae-for-ols-and-ridge-regression-to-find-the-optimal-paramters-boldsymbol-theta"]], "Exercise 3: Deriving the expression for the Bias-Variance Trade-off": [[19, "exercise-3-deriving-the-expression-for-the-bias-variance-trade-off"]], "Exercise 3: Normalizing our data": [[0, "exercise-3-normalizing-our-data"]], "Exercise 3: Writing an abstract and introduction": [[20, "exercise-3-writing-an-abstract-and-introduction"]], "Exercise 4 - Custom activation for each layer": [[21, "exercise-4-custom-activation-for-each-layer"]], "Exercise 4 - Fitting a polynomial": [[16, "exercise-4-fitting-a-polynomial"]], "Exercise 4 - Gradient with two layers writing backpropagation by hand": [[22, "exercise-4-gradient-with-two-layers-writing-backpropagation-by-hand"]], "Exercise 4 - Implementing Ridge Regression": [[17, "exercise-4-implementing-ridge-regression"]], "Exercise 4 - Testing multiple hyperparameters": [[17, "exercise-4-testing-multiple-hyperparameters"]], "Exercise 4 - The train-test split": [[15, "exercise-4-the-train-test-split"]], "Exercise 4, Implementing the simplest form for gradient descent": [[18, "exercise-4-implementing-the-simplest-form-for-gradient-descent"]], "Exercise 4: Adding Ridge Regression": [[0, "exercise-4-adding-ridge-regression"]], "Exercise 4: Computing the Bias and Variance": [[19, "exercise-4-computing-the-bias-and-variance"]], "Exercise 4: Making the code available and presentable": [[20, "exercise-4-making-the-code-available-and-presentable"]], "Exercise 5 - Comparing your code with sklearn": [[16, 
"exercise-5-comparing-your-code-with-sklearn"]], "Exercise 5 - Gradient with any number of layers writing backpropagation by hand": [[22, "exercise-5-gradient-with-any-number-of-layers-writing-backpropagation-by-hand"]], "Exercise 5 - Processing multiple inputs at once": [[21, "exercise-5-processing-multiple-inputs-at-once"]], "Exercise 5, Ridge regression and a new Synthetic Dataset": [[18, "exercise-5-ridge-regression-and-a-new-synthetic-dataset"]], "Exercise 5: Analytical exercises": [[0, "exercise-5-analytical-exercises"]], "Exercise 5: Interpretation of scaling and metrics": [[19, "exercise-5-interpretation-of-scaling-and-metrics"]], "Exercise 5: Referencing": [[20, "exercise-5-referencing"]], "Exercise 6 - Batched inputs": [[22, "exercise-6-batched-inputs"]], "Exercise 6 - Predicting on real data": [[21, "exercise-6-predicting-on-real-data"]], "Exercise 7 - Training": [[22, "exercise-7-training"]], "Exercise 7 - Training on real data (Optional)": [[21, "exercise-7-training-on-real-data-optional"]], "Exercise 8 (Optional) - Object orientation": [[22, "exercise-8-optional-object-orientation"]], "Exercise a)": [[23, "exercise-a"]], "Exercise b)": [[23, "exercise-b"]], "Exercise c) week 43": [[23, "exercise-c-week-43"]], "Exercise: Cross-validation as resampling techniques, adding more complexity": [[6, "exercise-cross-validation-as-resampling-techniques-adding-more-complexity"]], "Exercise: Analysis of real data": [[6, "exercise-analysis-of-real-data"]], "Exercise: Bias-variance trade-off and resampling techniques": [[6, "exercise-bias-variance-trade-off-and-resampling-techniques"]], "Exercise: Lasso Regression on the Franke function with resampling": [[6, "exercise-lasso-regression-on-the-franke-function-with-resampling"]], "Exercise: Ordinary Least Square (OLS) on the Franke function": [[6, "exercise-ordinary-least-square-ols-on-the-franke-function"]], "Exercise: Ridge Regression on the Franke function with resampling": [[6, "exercise-ridge-regression-on-the-franke-function-with-resampling"]], "Exercises": [[0, "exercises"], [23, "exercises"]], "Exercises and Projects": [[6, "exercises-and-projects"]], "Exercises and lab session week 43": [[42, "exercises-and-lab-session-week-43"]], "Exercises week 34": [[15, null]], "Exercises week 35": [[16, null]], "Exercises week 36": [[17, null]], "Exercises week 37": [[18, null]], "Exercises week 38": [[19, null]], "Exercises week 39": [[20, null]], "Exercises week 41": [[21, null]], "Exercises week 42": [[22, null]], "Exercises week 43": [[23, null]], "Exercises week 44": [[24, null]], "Expectation value and variance": [[37, "expectation-value-and-variance"]], "Expectation value and variance for \\boldsymbol{\\theta}": [[37, "expectation-value-and-variance-for-boldsymbol-theta"]], "Expectation values": [[30, "expectation-values"]], "Explicit derivatives": [[40, "explicit-derivatives"], [41, "explicit-derivatives"]], "Exploding gradients": [[41, "exploding-gradients"]], "Extending to more predictors": [[38, "extending-to-more-predictors"], [39, "extending-to-more-predictors"]], "Extending to more than one variable": [[35, "extending-to-more-than-one-variable"]], "Extremely useful tools, strongly recommended": [[33, "extremely-useful-tools-strongly-recommended"]], "Feed-forward neural networks": [[12, "feed-forward-neural-networks"], [39, "feed-forward-neural-networks"], [40, "feed-forward-neural-networks"]], "Feed-forward pass": [[1, "feed-forward-pass"], [41, "feed-forward-pass"]], "Final back propagating equation": [[12, 
"final-back-propagating-equation"], [40, "final-back-propagating-equation"], [41, "final-back-propagating-equation"]], "Final derivatives": [[40, "final-derivatives"]], "Final expression": [[40, "final-expression"], [41, "final-expression"]], "Final expressions for the biases of the hidden layer": [[40, "final-expressions-for-the-biases-of-the-hidden-layer"], [41, "final-expressions-for-the-biases-of-the-hidden-layer"]], "Final part": [[44, "final-part"]], "Final technicalities I": [[42, "final-technicalities-i"], [43, "final-technicalities-i"]], "Final technicalities II": [[42, "final-technicalities-ii"], [43, "final-technicalities-ii"]], "Final technicalities III": [[42, "final-technicalities-iii"], [43, "final-technicalities-iii"]], "Final technicalities IV": [[42, "final-technicalities-iv"], [43, "final-technicalities-iv"]], "Final visualization": [[44, "final-visualization"]], "Finally, evaluate the model": [[44, "finally-evaluate-the-model"]], "Finding the Limit": [[37, "finding-the-limit"]], "Finding the number of parameters": [[43, "finding-the-number-of-parameters"], [44, "finding-the-number-of-parameters"]], "Fine-tuning neural network hyperparameters": [[1, "fine-tuning-neural-network-hyperparameters"], [41, "fine-tuning-neural-network-hyperparameters"]], "First network example, simple percepetron with one input": [[40, "first-network-example-simple-percepetron-with-one-input"]], "Fitting an Equation of State for Dense Nuclear Matter": [[0, "fitting-an-equation-of-state-for-dense-nuclear-matter"]], "Fixing the singularity": [[34, "fixing-the-singularity"], [35, "fixing-the-singularity"]], "Format for electronic delivery of report and programs": [[27, "format-for-electronic-delivery-of-report-and-programs"], [28, "format-for-electronic-delivery-of-report-and-programs"]], "Forward and reverse modes": [[40, "forward-and-reverse-modes"]], "Fourier series and Toeplitz matrices": [[43, "fourier-series-and-toeplitz-matrices"], [44, "fourier-series-and-toeplitz-matrices"]], "Frequently used scaling functions": [[34, "frequently-used-scaling-functions"], [36, "frequently-used-scaling-functions"]], "From OLS to Ridge and Lasso": [[35, "from-ols-to-ridge-and-lasso"]], "From one to many layers, the universal approximation theorem": [[12, "from-one-to-many-layers-the-universal-approximation-theorem"]], "Full object-oriented implementation": [[41, "full-object-oriented-implementation"]], "Functionality in Scikit-Learn": [[34, "functionality-in-scikit-learn"], [36, "functionality-in-scikit-learn"]], "Further Dimensionality Remarks": [[3, "further-dimensionality-remarks"]], "Further properties (important for our analyses later)": [[5, "further-properties-important-for-our-analyses-later"], [34, "further-properties-important-for-our-analyses-later"], [35, "further-properties-important-for-our-analyses-later"]], "Further remarks": [[43, "further-remarks"], [44, "further-remarks"]], "Further simplification": [[43, "further-simplification"], [44, "further-simplification"]], "Gaussian Elimination": [[26, "gaussian-elimination"]], "General Features": [[9, "general-features"]], "General linear models and linear algebra": [[33, "general-linear-models-and-linear-algebra"]], "Generalizing the above one-dimensional case": [[43, "generalizing-the-above-one-dimensional-case"], [44, "generalizing-the-above-one-dimensional-case"]], "Generalizing the fitting procedure as a linear algebra problem": [[33, "generalizing-the-fitting-procedure-as-a-linear-algebra-problem"], [33, "id1"]], "Generative Adversarial 
Networks": [[4, "generative-adversarial-networks"]], "Generative Models": [[4, "generative-models"]], "Generative Versus Discriminative Modeling": [[33, "generative-versus-discriminative-modeling"]], "Geometric Interpretation and link with Singular Value Decomposition": [[11, "geometric-interpretation-and-link-with-singular-value-decomposition"]], "Getting serious, the back propagation equations for a neural network": [[40, "getting-serious-the-back-propagation-equations-for-a-neural-network"]], "Getting started with project 1": [[20, "getting-started-with-project-1"]], "Gradient Boosting, Classification Example": [[10, "gradient-boosting-classification-example"]], "Gradient Boosting, Examples of Regression": [[10, "gradient-boosting-examples-of-regression"]], "Gradient Clipping": [[1, "gradient-clipping"], [41, "gradient-clipping"]], "Gradient Descent Example": [[35, "id1"], [36, "id1"]], "Gradient boosting: Basics with Steepest Descent/Functional Gradient Descent": [[10, "gradient-boosting-basics-with-steepest-descent-functional-gradient-descent"]], "Gradient descent": [[2, "gradient-descent"], [42, "gradient-descent"], [43, "gradient-descent"]], "Gradient descent and Ridge": [[35, "gradient-descent-and-ridge"], [36, "gradient-descent-and-ridge"]], "Gradient descent and revisiting Ordinary Least Squares from last week": [[36, "gradient-descent-and-revisiting-ordinary-least-squares-from-last-week"]], "Gradient descent example": [[35, "gradient-descent-example"], [36, "gradient-descent-example"]], "Gradient expressions": [[40, "gradient-expressions"], [41, "gradient-expressions"]], "Grading": [[31, "grading"], [31, "id2"], [33, "grading"]], "Hidden layers": [[41, "hidden-layers"]], "Homogeneous data": [[41, "homogeneous-data"]], "How to do image compression before the era of deep learning": [[43, "how-to-do-image-compression-before-the-era-of-deep-learning"]], "How to take derivatives of Matrix-Vector expressions": [[16, "how-to-take-derivatives-of-matrix-vector-expressions"]], "Hyperplanes and all that": [[8, "hyperplanes-and-all-that"]], "Identifying Terms": [[37, "identifying-terms"]], "Illustration of a single perceptron model and a multi-perceptron model": [[39, "illustration-of-a-single-perceptron-model-and-a-multi-perceptron-model"], [40, "illustration-of-a-single-perceptron-model-and-a-multi-perceptron-model"]], "Important Matrix and vector handling packages": [[26, "important-matrix-and-vector-handling-packages"]], "Important observations": [[40, "important-observations"], [41, "important-observations"]], "Important technicalities: More on Rescaling data": [[34, "important-technicalities-more-on-rescaling-data"]], "Importing Keras and Tensorflow": [[44, "importing-keras-and-tensorflow"]], "Improving gradient descent with momentum": [[36, "improving-gradient-descent-with-momentum"]], "Improving performance": [[1, "improving-performance"], [41, "improving-performance"]], "In general not this simple": [[40, "in-general-not-this-simple"]], "In summary": [[31, "in-summary"]], "Including Stochastic Gradient Descent with Autograd": [[13, "including-stochastic-gradient-descent-with-autograd"], [36, "including-stochastic-gradient-descent-with-autograd"]], "Including more classes": [[38, "including-more-classes"], [39, "including-more-classes"]], "Incremental PCA": [[11, "incremental-pca"]], "Independent and Identically Distributed (iid)": [[37, "independent-and-identically-distributed-iid"]], "Inputs to the activation function": [[40, "inputs-to-the-activation-function"], [41, 
"inputs-to-the-activation-function"]], "Insights from the paper by Glorot and Bengio": [[41, "insights-from-the-paper-by-glorot-and-bengio"]], "Installing R, C++, cython or Julia": [[33, "installing-r-c-cython-or-julia"]], "Installing R, C++, cython, Numba etc": [[33, "installing-r-c-cython-numba-etc"]], "Instructor information": [[31, "instructor-information"]], "Interpretations and optimizing our parameters": [[33, "interpretations-and-optimizing-our-parameters"], [33, "id2"], [33, "id3"], [34, "interpretations-and-optimizing-our-parameters"], [34, "id1"], [34, "id2"]], "Interpreting the Ridge results": [[34, "interpreting-the-ridge-results"], [35, "interpreting-the-ridge-results"], [35, "id4"]], "Introducing JAX": [[13, "introducing-jax"]], "Introducing the Covariance and Correlation functions": [[11, "introducing-the-covariance-and-correlation-functions"], [34, "introducing-the-covariance-and-correlation-functions"]], "Introduction": [[0, "introduction"], [6, "introduction"], [25, "introduction"], [26, "introduction"]], "Introduction to Neural networks": [[39, "introduction-to-neural-networks"], [40, "introduction-to-neural-networks"]], "Introduction to numerical projects": [[27, "introduction-to-numerical-projects"], [28, "introduction-to-numerical-projects"]], "Is the Logistic activation function (Sigmoid) our choice?": [[41, "is-the-logistic-activation-function-sigmoid-our-choice"]], "Iterative Fitting, Classification and AdaBoost": [[10, "iterative-fitting-classification-and-adaboost"]], "Iterative Fitting, Regression and Squared-error Cost Function": [[10, "iterative-fitting-regression-and-squared-error-cost-function"]], "Kernel PCA": [[11, "kernel-pca"]], "Kernels and non-linearity": [[8, "kernels-and-non-linearity"]], "Key Idea": [[43, "key-idea"], [44, "key-idea"]], "LU Decomposition, the inverse of a matrix": [[26, "lu-decomposition-the-inverse-of-a-matrix"]], "Lab sessions on Tuesday and Wednesday": [[43, "lab-sessions-on-tuesday-and-wednesday"]], "Lab sessions Tuesday and Wednesday": [[39, "lab-sessions-tuesday-and-wednesday"]], "Lab sessions on Tuesday and Wednesday": [[40, "lab-sessions-on-tuesday-and-wednesday"]], "Lab sessions week 39": [[38, "lab-sessions-week-39"]], "Lasso Regression": [[35, "lasso-regression"]], "Lasso case": [[35, "lasso-case"]], "Layers": [[1, "layers"], [41, "layers"]], "Layers of a CNN": [[44, "layers-of-a-cnn"]], "Layers used to build CNNs": [[3, "layers-used-to-build-cnns"], [43, "layers-used-to-build-cnns"], [44, "layers-used-to-build-cnns"]], "Layout of a neural network with three hidden layers": [[40, "layout-of-a-neural-network-with-three-hidden-layers"]], "Layout of a neural network with three hidden layers (last layer = l=L=4, first layer l=0)": [[41, "layout-of-a-neural-network-with-three-hidden-layers-last-layer-l-l-4-first-layer-l-0"]], "Layout of a simple neural network with no hidden layer": [[40, "layout-of-a-simple-neural-network-with-no-hidden-layer"], [41, "layout-of-a-simple-neural-network-with-no-hidden-layer"]], "Layout of a simple neural network with one hidden layer": [[40, "layout-of-a-simple-neural-network-with-one-hidden-layer"], [41, "layout-of-a-simple-neural-network-with-one-hidden-layer"]], "Layout of a simple neural network with two input nodes, one hidden layer and one output node": [[40, "layout-of-a-simple-neural-network-with-two-input-nodes-one-hidden-layer-and-one-output-node"]], "Layout of a simple neural network with two input nodes, one hidden layer with two hidden noeds and one output node": [[41, 
"layout-of-a-simple-neural-network-with-two-input-nodes-one-hidden-layer-with-two-hidden-noeds-and-one-output-node"]], "Layout of input to first hidden layer l=1 from input layer l=0": [[41, "layout-of-input-to-first-hidden-layer-l-1-from-input-layer-l-0"]], "Learning goals": [[15, "learning-goals"], [16, "learning-goals"], [17, "learning-goals"], [18, "learning-goals"], [19, "learning-goals"], [20, "learning-goals"]], "Learning outcomes": [[25, "learning-outcomes"], [33, "learning-outcomes"]], "Learning rate methods": [[41, "learning-rate-methods"], [42, "learning-rate-methods"]], "Lecture Monday October 20": [[42, "lecture-monday-october-20"]], "Lecture Monday October 6": [[40, "lecture-monday-october-6"]], "Lecture Monday September 29, 2025": [[39, "lecture-monday-september-29-2025"]], "Lecture October 13, 2025": [[41, "lecture-october-13-2025"]], "Lecture material": [[38, "lecture-material"]], "Lecture material: Writing a code which implements a feed-forward neural network": [[41, "lecture-material-writing-a-code-which-implements-a-feed-forward-neural-network"]], "Lectures and ComputerLab": [[33, "lectures-and-computerlab"]], "Limitations of NNs": [[41, "limitations-of-nns"]], "Limitations of supervised learning with deep networks": [[1, "limitations-of-supervised-learning-with-deep-networks"], [41, "limitations-of-supervised-learning-with-deep-networks"]], "Linear Algebra, Handling of Arrays and more Python Features": [[26, null]], "Linear Regression": [[0, null]], "Linear Regression Problems": [[34, "linear-regression-problems"], [35, "linear-regression-problems"]], "Linear Regression and the SVD": [[35, "linear-regression-and-the-svd"]], "Linear Regression, basic elements": [[0, "linear-regression-basic-elements"]], "Linear classifier": [[38, "linear-classifier"]], "Linking Bayes\u2019 Theorem with Ridge and Lasso Regression": [[5, "linking-bayes-theorem-with-ridge-and-lasso-regression"]], "Linking the regression analysis with a statistical interpretation": [[5, "linking-the-regression-analysis-with-a-statistical-interpretation"], [37, "linking-the-regression-analysis-with-a-statistical-interpretation"]], "Linking with the SVD": [[5, "linking-with-the-svd"], [34, "linking-with-the-svd"]], "Links to relevant courses at the University of Oslo": [[32, "links-to-relevant-courses-at-the-university-of-oslo"]], "Logistic Regression": [[7, null], [7, "id1"], [38, "logistic-regression"]], "Logistic Regression, from last week": [[39, "logistic-regression-from-last-week"]], "Logistic function as the root of problems": [[41, "logistic-function-as-the-root-of-problems"]], "MNIST and GANs": [[4, "mnist-and-gans"]], "Machine Learning": [[33, "machine-learning"]], "Machine learning": [[25, "machine-learning"]], "Main textbooks": [[33, "main-textbooks"]], "Making a tree": [[9, "making-a-tree"]], "Making your own Bootstrap: Changing the Level of the Decision Tree": [[10, "making-your-own-bootstrap-changing-the-level-of-the-decision-tree"]], "Making your own test-train splitting": [[34, "making-your-own-test-train-splitting"]], "Material for Lecture Monday November 3": [[44, "material-for-lecture-monday-november-3"]], "Material for Lecture Monday October 27": [[43, "material-for-lecture-monday-october-27"]], "Material for exercises week 35": [[34, "material-for-exercises-week-35"]], "Material for lab sessions sessions Tuesday and Wednesday": [[35, "material-for-lab-sessions-sessions-tuesday-and-wednesday"]], "Material for lecture Monday September 2": [[35, "material-for-lecture-monday-september-2"]], 
"Material for lecture Monday September 8": [[36, "material-for-lecture-monday-september-8"]], "Material for the lab sessions": [[36, "material-for-the-lab-sessions"], [37, "material-for-the-lab-sessions"], [44, "material-for-the-lab-sessions"]], "Material for the lab sessions on Tuesday and Wednesday": [[41, "material-for-the-lab-sessions-on-tuesday-and-wednesday"]], "Material for the lecture on Monday October 6, 2025": [[40, "material-for-the-lecture-on-monday-october-6-2025"]], "Mathematical Interpretation of Ordinary Least Squares": [[5, "mathematical-interpretation-of-ordinary-least-squares"], [34, "mathematical-interpretation-of-ordinary-least-squares"], [35, "mathematical-interpretation-of-ordinary-least-squares"]], "Mathematical model": [[39, "mathematical-model"], [39, "id1"], [39, "id2"], [39, "id3"], [39, "id4"]], "Mathematical optimization of convex functions": [[8, "mathematical-optimization-of-convex-functions"]], "Mathematics of CNNs": [[3, "mathematics-of-cnns"], [43, "mathematics-of-cnns"], [44, "mathematics-of-cnns"]], "Mathematics of deep learning": [[40, "mathematics-of-deep-learning"], [41, "mathematics-of-deep-learning"]], "Mathematics of deep learning and neural networks": [[40, "mathematics-of-deep-learning-and-neural-networks"]], "Mathematics of the SVD and implications": [[5, "mathematics-of-the-svd-and-implications"], [34, "mathematics-of-the-svd-and-implications"], [35, "mathematics-of-the-svd-and-implications"]], "Matrices in Python": [[33, "matrices-in-python"]], "Matrix multiplication": [[1, "matrix-multiplication"], [41, "matrix-multiplication"]], "Matrix multiplications": [[41, "matrix-multiplications"]], "Matrix-vector notation": [[39, "matrix-vector-notation"]], "Matrix-vector notation and activation": [[12, "matrix-vector-notation-and-activation"], [39, "matrix-vector-notation-and-activation"]], "Maximum Likelihood Estimation (MLE)": [[37, "maximum-likelihood-estimation-mle"]], "Maximum likelihood": [[38, "maximum-likelihood"], [39, "maximum-likelihood"]], "Meet the covariance!": [[30, "meet-the-covariance"]], "Meet the Covariance Matrix": [[5, "meet-the-covariance-matrix"], [34, "meet-the-covariance-matrix"]], "Meet the Hessian Matrix": [[34, "meet-the-hessian-matrix"]], "Meet the Pandas": [[33, "meet-the-pandas"]], "Memory Usage and Scalability": [[36, "memory-usage-and-scalability"]], "Memory considerations": [[43, "memory-considerations"], [44, "memory-considerations"]], "Memory constraints": [[36, "memory-constraints"]], "Min-Max Scaling": [[34, "min-max-scaling"]], "Minimization process": [[42, "minimization-process"], [43, "minimization-process"]], "Minimizing the cost function using gradient descent and automatic differentiation": [[42, "minimizing-the-cost-function-using-gradient-descent-and-automatic-differentiation"], [43, "minimizing-the-cost-function-using-gradient-descent-and-automatic-differentiation"]], "Minimizing the cross entropy": [[38, "minimizing-the-cross-entropy"], [39, "minimizing-the-cross-entropy"]], "Momentum based GD": [[13, "momentum-based-gd"], [36, "momentum-based-gd"]], "More classes": [[38, "more-classes"], [39, "more-classes"]], "More complicated Example: The Ising model": [[6, "more-complicated-example-the-ising-model"]], "More complicated function": [[40, "more-complicated-function"]], "More considerations": [[40, "more-considerations"], [41, "more-considerations"]], "More details": [[42, "more-details"], [42, "id4"], [43, "more-details"], [43, "id3"]], "More examples on bootstrap and cross-validation and errors": 
[[37, "more-examples-on-bootstrap-and-cross-validation-and-errors"], [38, "more-examples-on-bootstrap-and-cross-validation-and-errors"]], "More interpretations": [[34, "more-interpretations"], [35, "more-interpretations"], [35, "id5"]], "More limitations": [[41, "more-limitations"]], "More on Dimensionalities": [[3, "more-on-dimensionalities"], [43, "more-on-dimensionalities"], [44, "more-on-dimensionalities"]], "More on Rescaling data": [[6, "more-on-rescaling-data"]], "More on Steepest descent": [[35, "more-on-steepest-descent"]], "More on activation functions, output layers": [[41, "more-on-activation-functions-output-layers"], [42, "more-on-activation-functions-output-layers"]], "More on convex functions": [[35, "more-on-convex-functions"]], "More on the general approximation theorem": [[40, "more-on-the-general-approximation-theorem"]], "More preprocessing": [[34, "more-preprocessing"], [36, "more-preprocessing"]], "More technicalities": [[42, "more-technicalities"], [43, "more-technicalities"]], "More top-down perspectives": [[41, "more-top-down-perspectives"]], "Motivation for Adaptive Step Sizes": [[36, "motivation-for-adaptive-step-sizes"]], "Multiclass classification": [[41, "multiclass-classification"], [42, "multiclass-classification"]], "Multilayer perceptrons": [[12, "multilayer-perceptrons"], [39, "multilayer-perceptrons"], [40, "multilayer-perceptrons"]], "Multivariable functions": [[40, "multivariable-functions"]], "Network requirements": [[2, "network-requirements"], [42, "network-requirements"], [43, "network-requirements"]], "Neural Networks vs CNNs": [[3, "neural-networks-vs-cnns"], [43, "neural-networks-vs-cnns"], [44, "neural-networks-vs-cnns"]], "Neural network types": [[39, "neural-network-types"], [40, "neural-network-types"]], "Neural networks": [[12, null]], "New expression for the derivative": [[40, "new-expression-for-the-derivative"]], "New image (or volume)": [[43, "new-image-or-volume"], [44, "new-image-or-volume"]], "New vector": [[43, "new-vector"], [44, "new-vector"]], "Non-Convex Problems": [[36, "non-convex-problems"]], "Note about SVD Calculations": [[34, "note-about-svd-calculations"], [35, "note-about-svd-calculations"]], "Note on Scikit-Learn": [[35, "note-on-scikit-learn"]], "Numerical experiments and the covariance, central limit theorem": [[30, "numerical-experiments-and-the-covariance-central-limit-theorem"]], "Numpy and arrays": [[26, "numpy-and-arrays"], [33, "numpy-and-arrays"]], "Numpy examples and Important Matrix and vector handling packages": [[33, "numpy-examples-and-important-matrix-and-vector-handling-packages"]], "Optimization and Deep learning": [[38, "optimization-and-deep-learning"], [39, "optimization-and-deep-learning"]], "Optimization and gradient descent, the central part of any Machine Learning algortithm": [[35, "optimization-and-gradient-descent-the-central-part-of-any-machine-learning-algortithm"]], "Optimization, the central part of any Machine Learning algortithm": [[13, null], [38, "optimization-the-central-part-of-any-machine-learning-algortithm"], [39, "optimization-the-central-part-of-any-machine-learning-algortithm"]], "Optimizing our parameters": [[33, "optimizing-our-parameters"]], "Optimizing our parameters, more details": [[33, "optimizing-our-parameters-more-details"]], "Optimizing the cost function": [[1, "optimizing-the-cost-function"], [41, "optimizing-the-cost-function"]], "Optimizing the parameters": [[40, "optimizing-the-parameters"], [41, "optimizing-the-parameters"]], "Optional (Note that you should 
include at least two of these in the report):": [[28, "optional-note-that-you-should-include-at-least-two-of-these-in-the-report"]], "Ordinary Differential Equations first": [[42, "ordinary-differential-equations-first"], [43, "ordinary-differential-equations-first"]], "Organizing our data": [[0, "organizing-our-data"], [33, "organizing-our-data"]], "Other Matrix and Vector Operations": [[26, "other-matrix-and-vector-operations"]], "Other Types of Recurrent Neural Networks": [[4, "other-types-of-recurrent-neural-networks"]], "Other courses on Data science and Machine Learning at UiO": [[33, "other-courses-on-data-science-and-machine-learning-at-uio"]], "Other courses on Data science and Machine Learning at UiO, contn": [[33, "other-courses-on-data-science-and-machine-learning-at-uio-contn"]], "Other ingredients of a neural network": [[40, "other-ingredients-of-a-neural-network"]], "Other measures in classification studies": [[39, "other-measures-in-classification-studies"]], "Other measures: Precision, Recall, and the F_1 Measure": [[23, "other-measures-precision-recall-and-the-f-1-measure"]], "Other parameters": [[40, "other-parameters"]], "Other popular texts": [[33, "other-popular-texts"]], "Other techniques": [[11, "other-techniques"]], "Other types of networks": [[12, "other-types-of-networks"], [39, "other-types-of-networks"], [40, "other-types-of-networks"]], "Other ways of visualizing the trees": [[9, "other-ways-of-visualizing-the-trees"]], "Our model for the nuclear binding energies": [[33, "our-model-for-the-nuclear-binding-energies"]], "Output layer": [[40, "output-layer"], [41, "output-layer"]], "Overarching aims of the exercises for week 43": [[23, "overarching-aims-of-the-exercises-for-week-43"]], "Overarching aims of the exercises this week": [[21, "overarching-aims-of-the-exercises-this-week"], [22, "overarching-aims-of-the-exercises-this-week"], [24, "overarching-aims-of-the-exercises-this-week"]], "Overarching view of a neural network": [[40, "overarching-view-of-a-neural-network"]], "Overview of first week": [[33, "overview-of-first-week"]], "Overview video on Stochastic Gradient Descent (SGD)": [[36, "overview-video-on-stochastic-gradient-descent-sgd"]], "Own code for Ordinary Least Squares": [[33, "own-code-for-ordinary-least-squares"], [34, "own-code-for-ordinary-least-squares"]], "PCA and scikit-learn": [[11, "pca-and-scikit-learn"]], "Padding": [[43, "padding"], [44, "padding"]], "Pandas AI": [[33, "pandas-ai"]], "Parameters of neural networks": [[40, "parameters-of-neural-networks"]], "Parameters to train, common settings": [[43, "parameters-to-train-common-settings"], [44, "parameters-to-train-common-settings"]], "Part a : Ordinary Least Square (OLS) for the Runge function": [[27, "part-a-ordinary-least-square-ols-for-the-runge-function"]], "Part a): Analytical warm-up": [[28, "part-a-analytical-warm-up"]], "Part b): Writing your own Neural Network code": [[28, "part-b-writing-your-own-neural-network-code"]], "Part b: Adding Ridge regression for the Runge function": [[27, "part-b-adding-ridge-regression-for-the-runge-function"]], "Part c): Testing against other software libraries": [[28, "part-c-testing-against-other-software-libraries"]], "Part c: Writing your own gradient descent code": [[27, "part-c-writing-your-own-gradient-descent-code"]], "Part d): Testing different activation functions and depths of the neural network": [[28, "part-d-testing-different-activation-functions-and-depths-of-the-neural-network"]], "Part d: Including momentum and more advanced 
ways to update the learning the rate": [[27, "part-d-including-momentum-and-more-advanced-ways-to-update-the-learning-the-rate"]], "Part e): Testing different norms": [[28, "part-e-testing-different-norms"]], "Part e: Writing our own code for Lasso regression": [[27, "part-e-writing-our-own-code-for-lasso-regression"]], "Part f): Classification analysis using neural networks": [[28, "part-f-classification-analysis-using-neural-networks"]], "Part f: Stochastic gradient descent": [[27, "part-f-stochastic-gradient-descent"]], "Part g) Critical evaluation of the various algorithms": [[28, "part-g-critical-evaluation-of-the-various-algorithms"]], "Part g: Bias-variance trade-off and resampling techniques": [[27, "part-g-bias-variance-trade-off-and-resampling-techniques"]], "Part h): Cross-validation as resampling techniques, adding more complexity": [[27, "part-h-cross-validation-as-resampling-techniques-adding-more-complexity"]], "Partial Differential Equations": [[2, "partial-differential-equations"], [42, "partial-differential-equations"], [43, "partial-differential-equations"]], "Plan for week 39, September 22-26, 2025": [[38, "plan-for-week-39-september-22-26-2025"]], "Plan for week 41, October 6-10": [[40, "plan-for-week-41-october-6-10"]], "Plan for week 44": [[43, "plan-for-week-44"]], "Plans for week 35": [[34, "plans-for-week-35"]], "Plans for week 36": [[35, "plans-for-week-36"]], "Plans for week 37, lecture Monday": [[36, "plans-for-week-37-lecture-monday"]], "Plans for week 38, lecture Monday September 15": [[37, "plans-for-week-38-lecture-monday-september-15"]], "Plans for week 43": [[42, "plans-for-week-43"]], "Plans for week 45": [[44, "plans-for-week-45"]], "Plotting the Histogram": [[37, "plotting-the-histogram"]], "Plotting the mean value for each group": [[38, "plotting-the-mean-value-for-each-group"]], "Pooling": [[43, "pooling"], [44, "pooling"]], "Pooling arithmetic": [[43, "pooling-arithmetic"], [44, "pooling-arithmetic"]], "Pooling types (From Raschka et al)": [[43, "pooling-types-from-raschka-et-al"], [44, "pooling-types-from-raschka-et-al"]], "Practical tips": [[13, "practical-tips"], [36, "practical-tips"]], "Practicalities": [[31, "practicalities"], [31, "id1"]], "Preamble: Note on writing reports, using reference material, AI and other tools": [[27, "preamble-note-on-writing-reports-using-reference-material-ai-and-other-tools"], [28, "preamble-note-on-writing-reports-using-reference-material-ai-and-other-tools"]], "Predicting New Points With A Trained Recurrent Neural Network": [[4, "predicting-new-points-with-a-trained-recurrent-neural-network"]], "Preprocessing our data": [[34, "preprocessing-our-data"]], "Prerequisites": [[33, "prerequisites"]], "Prerequisites and background": [[25, "prerequisites-and-background"]], "Prerequisites: Collect and pre-process data": [[3, "prerequisites-collect-and-pre-process-data"], [44, "prerequisites-collect-and-pre-process-data"]], "Probability Distribution Functions": [[30, "probability-distribution-functions"]], "Program example for gradient descent with Ridge Regression": [[35, "program-example-for-gradient-descent-with-ridge-regression"], [36, "program-example-for-gradient-descent-with-ridge-regression"]], "Program for stochastic gradient": [[13, "program-for-stochastic-gradient"]], "Project 1 on Machine Learning, deadline October 6 (midnight), 2025": [[27, null]], "Project 2 on Machine Learning, deadline November 10 (Midnight)": [[28, null]], "Properties of PDFs": [[30, "properties-of-pdfs"]], "Pros and cons": [[36, 
"pros-and-cons"]], "Pros and cons of trees, pros": [[9, "pros-and-cons-of-trees-pros"]], "Python installers": [[25, "python-installers"], [33, "python-installers"]], "RMS prop": [[13, "rms-prop"]], "RMSProp algorithm, taken from Goodfellow et al": [[36, "rmsprop-algorithm-taken-from-goodfellow-et-al"]], "RMSProp: Adaptive Learning Rates": [[36, "rmsprop-adaptive-learning-rates"]], "RMSprop for adaptive learning rate with Stochastic Gradient Descent": [[36, "rmsprop-for-adaptive-learning-rate-with-stochastic-gradient-descent"]], "ROC Curve": [[23, "roc-curve"]], "Random Numbers": [[30, "random-numbers"]], "Random forests": [[10, "random-forests"]], "Randomized PCA": [[11, "randomized-pca"]], "Reading material": [[33, "reading-material"]], "Reading recommendations": [[41, "reading-recommendations"]], "Reading recommendations:": [[34, "reading-recommendations"]], "Reading suggestions week 34": [[33, "reading-suggestions-week-34"]], "Readings and Videos": [[37, "readings-and-videos"]], "Readings and Videos, logistic regression": [[38, "readings-and-videos-logistic-regression"]], "Readings and Videos, resampling methods": [[38, "readings-and-videos-resampling-methods"]], "Readings and Videos:": [[36, "readings-and-videos"], [40, "readings-and-videos"]], "Readings and videos": [[41, "readings-and-videos"]], "Recurrent neural networks": [[12, "recurrent-neural-networks"], [39, "recurrent-neural-networks"], [40, "recurrent-neural-networks"]], "Recurrent neural networks: Overarching view": [[4, null]], "Reducing the number of degrees of freedom, overarching view": [[0, "reducing-the-number-of-degrees-of-freedom-overarching-view"], [34, "reducing-the-number-of-degrees-of-freedom-overarching-view"]], "Reducing the number of operations": [[40, "reducing-the-number-of-operations"]], "Reformulating the problem": [[2, "reformulating-the-problem"], [42, "reformulating-the-problem"], [43, "reformulating-the-problem"]], "Regression Case": [[10, "regression-case"]], "Regression analysis and resampling methods": [[27, "regression-analysis-and-resampling-methods"]], "Regression analysis, overarching aims": [[33, "regression-analysis-overarching-aims"]], "Regression analysis, overarching aims II": [[33, "regression-analysis-overarching-aims-ii"]], "Regular NNs don\u2019t scale well to full images": [[43, "regular-nns-dont-scale-well-to-full-images"], [44, "regular-nns-dont-scale-well-to-full-images"]], "Regularization": [[1, "regularization"], [41, "regularization"]], "Relevance": [[39, "relevance"], [41, "relevance"]], "Reminder about the gradient machinery from project 1": [[28, "reminder-about-the-gradient-machinery-from-project-1"]], "Reminder from last week": [[34, "reminder-from-last-week"]], "Reminder from last week: First network example, simple percepetron with one input": [[41, "reminder-from-last-week-first-network-example-simple-percepetron-with-one-input"]], "Reminder on Newton-Raphson\u2019s method": [[35, "reminder-on-newton-raphson-s-method"]], "Reminder on Statistics": [[6, "reminder-on-statistics"]], "Reminder on books with hands-on material and codes": [[40, "reminder-on-books-with-hands-on-material-and-codes"], [41, "reminder-on-books-with-hands-on-material-and-codes"]], "Reminder on different scaling methods": [[36, "reminder-on-different-scaling-methods"]], "Reminder on the chain rule and gradients": [[40, "reminder-on-the-chain-rule-and-gradients"]], "Replace or not": [[13, "replace-or-not"], [36, "replace-or-not"]], "Required Analysis:": [[28, "required-analysis"]], "Required 
Technologies": [[25, "required-technologies"]], "Resampling Methods": [[6, null]], "Resampling and the Bias-Variance Trade-off": [[19, "resampling-and-the-bias-variance-trade-off"]], "Resampling approaches can be computationally expensive": [[37, "resampling-approaches-can-be-computationally-expensive"], [38, "resampling-approaches-can-be-computationally-expensive"]], "Resampling methods": [[6, "id1"], [37, "resampling-methods"], [37, "id2"], [38, "resampling-methods"], [38, "id1"]], "Resampling methods: Bootstrap": [[37, "resampling-methods-bootstrap"], [38, "resampling-methods-bootstrap"]], "Resampling methods: Bootstrap approach": [[37, "resampling-methods-bootstrap-approach"]], "Resampling methods: Bootstrap background": [[37, "resampling-methods-bootstrap-background"]], "Resampling methods: Bootstrap steps": [[37, "resampling-methods-bootstrap-steps"]], "Resampling methods: More Bootstrap background": [[37, "resampling-methods-more-bootstrap-background"]], "Residual Error": [[34, "residual-error"], [35, "residual-error"]], "Resources on differential equations and deep learning": [[2, "resources-on-differential-equations-and-deep-learning"], [42, "resources-on-differential-equations-and-deep-learning"], [43, "resources-on-differential-equations-and-deep-learning"]], "Revisiting Ordinary Least Squares": [[35, "revisiting-ordinary-least-squares"]], "Revisiting our Linear Regression Solvers": [[13, "revisiting-our-linear-regression-solvers"]], "Revisiting our Logistic Regression case": [[38, "revisiting-our-logistic-regression-case"], [39, "revisiting-our-logistic-regression-case"]], "Rewriting as dot products": [[43, "rewriting-as-dot-products"], [44, "rewriting-as-dot-products"]], "Rewriting the Covariance and/or Correlation Matrix": [[34, "rewriting-the-covariance-and-or-correlation-matrix"]], "Rewriting the \\delta-function": [[37, "rewriting-the-delta-function"]], "Rewriting the fitting procedure as a linear algebra problem": [[33, "rewriting-the-fitting-procedure-as-a-linear-algebra-problem"]], "Rewriting the fitting procedure as a linear algebra problem, more details": [[33, "rewriting-the-fitting-procedure-as-a-linear-algebra-problem-more-details"]], "Ridge Regression": [[35, "ridge-regression"]], "Ridge and LASSO Regression": [[34, "ridge-and-lasso-regression"], [35, "ridge-and-lasso-regression"], [35, "id2"]], "Ridge and Lasso Regression": [[5, null], [5, "id1"]], "Running with Keras": [[44, "running-with-keras"]], "SGD example": [[36, "sgd-example"]], "SGD vs Full-Batch GD: Convergence Speed and Memory Comparison": [[36, "sgd-vs-full-batch-gd-convergence-speed-and-memory-comparison"]], "SVD analysis": [[35, "svd-analysis"]], "Same code but now with momentum gradient descent": [[13, "same-code-but-now-with-momentum-gradient-descent"], [36, "same-code-but-now-with-momentum-gradient-descent"], [36, "id3"], [36, "id4"]], "Schedule first week": [[33, "schedule-first-week"]], "Schematic Regression Procedure": [[9, "schematic-regression-procedure"]], "Second moment of the gradient": [[36, "second-moment-of-the-gradient"]], "September 15-19": [[19, "september-15-19"]], "Set up the model": [[44, "set-up-the-model"]], "Setting it up": [[44, "setting-it-up"]], "Setting up a Multi-layer perceptron model for classification": [[41, "setting-up-a-multi-layer-perceptron-model-for-classification"]], "Setting up the Back propagation algorithm": [[12, "setting-up-the-back-propagation-algorithm"]], "Setting up the Back propagation algorithm, part 3": [[40, 
"setting-up-the-back-propagation-algorithm-part-3"], [41, "setting-up-the-back-propagation-algorithm-part-3"], [42, "setting-up-the-back-propagation-algorithm-part-3"]], "Setting up the Matrix to be inverted": [[34, "setting-up-the-matrix-to-be-inverted"], [35, "setting-up-the-matrix-to-be-inverted"]], "Setting up the back propagation algorithm": [[40, "setting-up-the-back-propagation-algorithm"]], "Setting up the back propagation algorithm and algorithm for a feed forward NN, initalizations": [[41, "setting-up-the-back-propagation-algorithm-and-algorithm-for-a-feed-forward-nn-initalizations"], [42, "setting-up-the-back-propagation-algorithm-and-algorithm-for-a-feed-forward-nn-initalizations"]], "Setting up the back propagation algorithm, part 1": [[41, "setting-up-the-back-propagation-algorithm-part-1"], [42, "setting-up-the-back-propagation-algorithm-part-1"]], "Setting up the back propagation algorithm, part 2": [[40, "setting-up-the-back-propagation-algorithm-part-2"], [41, "setting-up-the-back-propagation-algorithm-part-2"], [42, "setting-up-the-back-propagation-algorithm-part-2"]], "Setting up the code": [[42, "setting-up-the-code"], [43, "setting-up-the-code"]], "Setting up the equations for a neural network": [[40, "setting-up-the-equations-for-a-neural-network"], [41, "setting-up-the-equations-for-a-neural-network"]], "Setting up the network using Autograd": [[42, "setting-up-the-network-using-autograd"], [43, "setting-up-the-network-using-autograd"]], "Setting up the network using Autograd; The full program": [[2, "setting-up-the-network-using-autograd-the-full-program"], [42, "setting-up-the-network-using-autograd-the-full-program"], [43, "setting-up-the-network-using-autograd-the-full-program"]], "Setting up the network using Autograd; The trial solution": [[42, "setting-up-the-network-using-autograd-the-trial-solution"], [43, "setting-up-the-network-using-autograd-the-trial-solution"]], "Setting up the problem": [[42, "setting-up-the-problem"], [43, "setting-up-the-problem"]], "Setup of Network": [[42, "setup-of-network"], [43, "setup-of-network"]], "Similar (second order function now) problem but now with AdaGrad": [[13, "similar-second-order-function-now-problem-but-now-with-adagrad"], [36, "similar-second-order-function-now-problem-but-now-with-adagrad"]], "Simple Python Code to read in Data and perform Classification": [[9, "simple-python-code-to-read-in-data-and-perform-classification"]], "Simple case": [[34, "simple-case"], [35, "simple-case"]], "Simple code for solving the above problem": [[35, "simple-code-for-solving-the-above-problem"]], "Simple example": [[38, "simple-example"], [40, "simple-example"]], "Simple example code": [[36, "simple-example-code"]], "Simple example to illustrate Ordinary Least Squares, Ridge and Lasso Regression": [[35, "simple-example-to-illustrate-ordinary-least-squares-ridge-and-lasso-regression"]], "Simple geometric interpretation": [[35, "simple-geometric-interpretation"]], "Simple linear regression model using scikit-learn": [[0, "simple-linear-regression-model-using-scikit-learn"], [33, "simple-linear-regression-model-using-scikit-learn"]], "Simple neural network and the back propagation equations": [[40, "simple-neural-network-and-the-back-propagation-equations"], [41, "simple-neural-network-and-the-back-propagation-equations"]], "Simple one-dimensional second-order polynomial": [[18, "simple-one-dimensional-second-order-polynomial"]], "Simple program": [[35, "simple-program"], [36, "simple-program"]], "Simpler examples first, and 
automatic differentiation": [[40, "simpler-examples-first-and-automatic-differentiation"]], "Slightly different approach": [[36, "slightly-different-approach"]], "Smarter way of evaluating the above function": [[40, "smarter-way-of-evaluating-the-above-function"]], "Sneaking in automatic differentiation using Autograd": [[36, "sneaking-in-automatic-differentiation-using-autograd"]], "Software and needed installations": [[27, "software-and-needed-installations"], [33, "software-and-needed-installations"]], "Solving Differential Equations with Deep Learning": [[2, null]], "Solving differential equations with Deep Learning": [[42, "solving-differential-equations-with-deep-learning"], [43, "solving-differential-equations-with-deep-learning"]], "Solving the equation using Autograd": [[42, "solving-the-equation-using-autograd"], [43, "solving-the-equation-using-autograd"]], "Solving the one dimensional Poisson equation": [[2, "solving-the-one-dimensional-poisson-equation"]], "Solving the wave equation - the full program using Autograd": [[42, "solving-the-wave-equation-the-full-program-using-autograd"]], "Solving the wave equation with Neural Networks": [[2, "solving-the-wave-equation-with-neural-networks"]], "Solving using Newton-Raphson\u2019s method": [[38, "solving-using-newton-raphson-s-method"], [39, "solving-using-newton-raphson-s-method"]], "Some famous Matrices": [[26, "some-famous-matrices"]], "Some parallels from real analysis": [[40, "some-parallels-from-real-analysis"]], "Some selected properties": [[38, "some-selected-properties"]], "Some simple problems": [[13, "some-simple-problems"], [35, "some-simple-problems"]], "Some useful matrix and vector expressions": [[34, "some-useful-matrix-and-vector-expressions"]], "Splitting our Data in Training and Test data": [[0, "splitting-our-data-in-training-and-test-data"], [34, "splitting-our-data-in-training-and-test-data"]], "Standard Approach based on the Normal Distribution": [[37, "standard-approach-based-on-the-normal-distribution"]], "Standard steepest descent": [[13, "standard-steepest-descent"]], "Statistical analysis": [[37, "statistical-analysis"], [38, "statistical-analysis"]], "Statistical analysis and optimization of data": [[25, "statistical-analysis-and-optimization-of-data"], [33, "statistical-analysis-and-optimization-of-data"]], "Steepest descent": [[13, "steepest-descent"], [35, "steepest-descent"]], "Stochastic Gradient Descent": [[36, "stochastic-gradient-descent"]], "Stochastic Gradient Descent (SGD)": [[13, "stochastic-gradient-descent-sgd"], [36, "stochastic-gradient-descent-sgd"]], "Stochastic variables and the main concepts, the discrete case": [[30, "stochastic-variables-and-the-main-concepts-the-discrete-case"]], "Strong correlations": [[44, "strong-correlations"]], "Strongly Convex Case": [[36, "strongly-convex-case"]], "Suggested readings and videos": [[39, "suggested-readings-and-videos"]], "Summarizing: Performing a general discrete convolution (From Raschka et al)": [[43, "summarizing-performing-a-general-discrete-convolution-from-raschka-et-al"], [44, "summarizing-performing-a-general-discrete-convolution-from-raschka-et-al"]], "Summary of methods to implement and analyze": [[28, "summary-of-methods-to-implement-and-analyze"]], "Summing up": [[37, "summing-up"], [38, "summing-up"]], "Support Vector Machines, overarching aims": [[8, null]], "Synthetic data generation": [[38, "synthetic-data-generation"], [39, "synthetic-data-generation"]], "Systematic reduction": [[3, "systematic-reduction"], [44, 
"systematic-reduction"]], "Teachers": [[33, "teachers"]], "Teachers and Grading": [[31, null]], "Teaching Assistants Fall semester 2023": [[31, "teaching-assistants-fall-semester-2023"]], "Technicalities": [[42, "technicalities"], [43, "technicalities"]], "Tensorflow": [[41, "tensorflow"], [42, "tensorflow"]], "Tentative deadllines for projects": [[31, "tentative-deadllines-for-projects"]], "Testing the Means Squared Error as function of Complexity": [[0, "testing-the-means-squared-error-as-function-of-complexity"], [34, "testing-the-means-squared-error-as-function-of-complexity"]], "Testing the XOR gate and other gates": [[41, "testing-the-xor-gate-and-other-gates"], [42, "testing-the-xor-gate-and-other-gates"]], "Textbooks": [[32, null]], "The back propagation equations for a neural network": [[41, "the-back-propagation-equations-for-a-neural-network"]], "The Algorithm before theorem": [[11, "the-algorithm-before-theorem"]], "The Breast Cancer Data, now with Keras": [[1, "the-breast-cancer-data-now-with-keras"]], "The CART algorithm for Classification": [[9, "the-cart-algorithm-for-classification"]], "The CART algorithm for Regression": [[9, "the-cart-algorithm-for-regression"]], "The CIFAR01 data set": [[3, "the-cifar01-data-set"], [44, "the-cifar01-data-set"]], "The Central Limit Theorem": [[37, "the-central-limit-theorem"]], "The Hessian matrix": [[35, "the-hessian-matrix"], [36, "the-hessian-matrix"]], "The Hessian matrix for Ridge Regression": [[35, "the-hessian-matrix-for-ridge-regression"], [36, "the-hessian-matrix-for-ridge-regression"]], "The Jacobian": [[34, "the-jacobian"]], "The MNIST dataset again": [[3, "the-mnist-dataset-again"], [44, "the-mnist-dataset-again"]], "The Neural Network": [[41, "the-neural-network"], [42, "the-neural-network"]], "The OLS case": [[35, "the-ols-case"]], "The RELU function family": [[1, "the-relu-function-family"], [41, "the-relu-function-family"], [42, "the-relu-function-family"]], "The Ridge case": [[35, "the-ridge-case"]], "The SVD example": [[43, "the-svd-example"]], "The SVD, a Fantastic Algorithm": [[34, "the-svd-a-fantastic-algorithm"], [35, "the-svd-a-fantastic-algorithm"]], "The Softmax function": [[1, "the-softmax-function"], [41, "the-softmax-function"]], "The \\chi^2 function": [[0, "the-chi-2-function"], [33, "the-chi-2-function"], [33, "id4"], [33, "id5"], [33, "id6"], [33, "id7"], [33, "id8"]], "The analytical solution": [[42, "the-analytical-solution"]], "The approximation theorem in words": [[40, "the-approximation-theorem-in-words"]], "The bias-variance tradeoff": [[6, "the-bias-variance-tradeoff"], [37, "the-bias-variance-tradeoff"], [38, "the-bias-variance-tradeoff"]], "The code for solving the ODE": [[2, "the-code-for-solving-the-ode"], [42, "the-code-for-solving-the-ode"], [43, "the-code-for-solving-the-ode"]], "The complete code with a simple data set": [[34, "the-complete-code-with-a-simple-data-set"]], "The convolution stage": [[43, "the-convolution-stage"], [44, "the-convolution-stage"]], "The cost function rewritten": [[38, "the-cost-function-rewritten"], [39, "the-cost-function-rewritten"]], "The cost/loss function": [[34, "the-cost-loss-function"]], "The course has two central parts": [[25, "the-course-has-two-central-parts"]], "The derivative of the Logistic funtion": [[41, "the-derivative-of-the-logistic-funtion"]], "The derivative of the cost/loss function": [[35, "the-derivative-of-the-cost-loss-function"], [36, "the-derivative-of-the-cost-loss-function"]], "The derivatives": [[40, "the-derivatives"], [41, 
"the-derivatives"]], "The equations": [[35, "the-equations"]], "The equations for ordinary least squares": [[34, "the-equations-for-ordinary-least-squares"]], "The equations to solve": [[38, "the-equations-to-solve"], [39, "the-equations-to-solve"]], "The first Case": [[35, "the-first-case"]], "The function to solve for": [[42, "the-function-to-solve-for"], [43, "the-function-to-solve-for"]], "The gradient step": [[36, "the-gradient-step"]], "The ideal": [[35, "the-ideal"]], "The logistic function": [[7, "the-logistic-function"], [38, "the-logistic-function"]], "The mean squared error and its derivative": [[34, "the-mean-squared-error-and-its-derivative"]], "The moons example": [[8, "the-moons-example"]], "The multilayer perceptron (MLP)": [[12, "the-multilayer-perceptron-mlp"]], "The network with one input layer, specified number of hidden layers, and one output layer": [[2, "the-network-with-one-input-layer-specified-number-of-hidden-layers-and-one-output-layer"], [42, "the-network-with-one-input-layer-specified-number-of-hidden-layers-and-one-output-layer"], [43, "the-network-with-one-input-layer-specified-number-of-hidden-layers-and-one-output-layer"]], "The optimization problem": [[40, "the-optimization-problem"]], "The ouput layer": [[40, "the-ouput-layer"], [41, "the-ouput-layer"]], "The plethora of machine learning algorithms/methods": [[33, "the-plethora-of-machine-learning-algorithms-methods"]], "The problem to solve for": [[42, "the-problem-to-solve-for"]], "The program using Autograd": [[42, "the-program-using-autograd"], [43, "the-program-using-autograd"]], "The same example but now with cross-validation": [[37, "the-same-example-but-now-with-cross-validation"], [38, "the-same-example-but-now-with-cross-validation"]], "The sensitiveness of the gradient descent": [[35, "the-sensitiveness-of-the-gradient-descent"]], "The singular value decomposition": [[5, "the-singular-value-decomposition"], [34, "the-singular-value-decomposition"], [35, "the-singular-value-decomposition"]], "The specific equation to solve for": [[42, "the-specific-equation-to-solve-for"], [43, "the-specific-equation-to-solve-for"]], "The training": [[40, "the-training"], [41, "the-training"]], "The trial solution": [[42, "the-trial-solution"], [42, "id2"], [42, "id3"], [42, "id5"], [43, "the-trial-solution"], [43, "id1"], [43, "id2"]], "The two-dimensional case": [[8, "the-two-dimensional-case"]], "Theoretical Convergence Speed and convex optimization": [[36, "theoretical-convergence-speed-and-convex-optimization"]], "Time decay rate": [[36, "time-decay-rate"]], "To our real data: nuclear binding energies. 
Brief reminder on masses and binding energies": [[33, "to-our-real-data-nuclear-binding-energies-brief-reminder-on-masses-and-binding-energies"]], "Toeplitz matrices": [[43, "toeplitz-matrices"], [44, "toeplitz-matrices"]], "Topics covered in this course: Statistical analysis and optimization of data": [[33, "topics-covered-in-this-course-statistical-analysis-and-optimization-of-data"]], "Towards the PCA theorem": [[11, "towards-the-pca-theorem"]], "Train and test datasets": [[1, "train-and-test-datasets"], [41, "train-and-test-datasets"]], "Transforming images": [[43, "transforming-images"], [44, "transforming-images"]], "Two parameters": [[38, "two-parameters"], [39, "two-parameters"]], "Two-dimensional Objects": [[3, "two-dimensional-objects"]], "Two-dimensional objects": [[43, "two-dimensional-objects"], [44, "two-dimensional-objects"]], "Type of problem": [[2, "type-of-problem"], [42, "type-of-problem"], [43, "type-of-problem"]], "Types of Machine Learning": [[33, "types-of-machine-learning"]], "Understanding what happens": [[37, "understanding-what-happens"], [38, "understanding-what-happens"]], "Universal approximation theorem": [[40, "universal-approximation-theorem"]], "Updating the gradients": [[40, "updating-the-gradients"], [41, "updating-the-gradients"], [42, "updating-the-gradients"]], "Usage of the above learning rate schedulers": [[41, "usage-of-the-above-learning-rate-schedulers"], [42, "usage-of-the-above-learning-rate-schedulers"]], "Use the books!": [[19, "use-the-books"]], "Useful Python libraries": [[25, "useful-python-libraries"], [33, "useful-python-libraries"]], "Using Autograd": [[13, "using-autograd"]], "Using Automatic differentiation": [[42, "using-automatic-differentiation"]], "Using Keras": [[41, "using-keras"], [42, "using-keras"]], "Using Pytorch with the full MNIST data set": [[42, "using-pytorch-with-the-full-mnist-data-set"]], "Using Scikit-learn": [[39, "using-scikit-learn"]], "Using forward Euler to solve the ODE": [[2, "using-forward-euler-to-solve-the-ode"], [42, "using-forward-euler-to-solve-the-ode"], [43, "using-forward-euler-to-solve-the-ode"]], "Using gradient descent methods, limitations": [[13, "using-gradient-descent-methods-limitations"], [35, "using-gradient-descent-methods-limitations"], [36, "using-gradient-descent-methods-limitations"]], "Using the chain rule and summing over all k entries": [[40, "using-the-chain-rule-and-summing-over-all-k-entries"], [41, "using-the-chain-rule-and-summing-over-all-k-entries"]], "Using the correlation matrix": [[39, "using-the-correlation-matrix"]], "Vanishing gradients": [[41, "vanishing-gradients"]], "Various steps in cross-validation": [[37, "various-steps-in-cross-validation"], [38, "various-steps-in-cross-validation"]], "Verifying the data set": [[44, "verifying-the-data-set"]], "Visualization": [[1, "visualization"], [1, "id1"], [41, "visualization"], [41, "id1"]], "Visualizing the Tree, Classification": [[9, "visualizing-the-tree-classification"]], "Week 34: Introduction to the course, Logistics and Practicalities": [[33, null]], "Week 35: From Ordinary Linear Regression to Ridge and Lasso Regression": [[34, null]], "Week 36: Linear Regression and Gradient descent": [[35, null]], "Week 37: Gradient descent methods": [[36, null]], "Week 38: Statistical analysis, bias-variance tradeoff and resampling methods": [[37, null]], "Week 39: Resampling methods and logistic regression": [[38, null]], "Week 40: Gradient descent methods (continued) and start Neural networks": [[39, null]], "Week 41 Neural 
networks and constructing a neural network code": [[40, null]], "Week 42 Constructing a Neural Network code with examples": [[41, null]], "Week 43: Deep Learning: Constructing a Neural Network code and solving differential equations": [[42, null]], "Week 44, Solving differential equations with neural networks and start Convolutional Neural Networks (CNN)": [[43, null]], "Week 45, Convolutional Neural Networks (CCNs)": [[44, null]], "Weights and biases": [[41, "weights-and-biases"]], "What Is Generative Modeling?": [[33, "what-is-generative-modeling"]], "What does it mean?": [[34, "what-does-it-mean"], [35, "what-does-it-mean"]], "What is Machine Learning?": [[0, "what-is-machine-learning"]], "What is a good model?": [[0, "what-is-a-good-model"], [33, "what-is-a-good-model"]], "What is a good model? Can we define it?": [[33, "what-is-a-good-model-can-we-define-it"]], "What is the Difference": [[43, "what-is-the-difference"], [44, "what-is-the-difference"]], "When do we stop?": [[36, "when-do-we-stop"]], "Which activation function should I use?": [[1, "which-activation-function-should-i-use"]], "Which activation function should we use?": [[41, "which-activation-function-should-we-use"], [42, "which-activation-function-should-we-use"]], "Why CNNS for images, sound files, medical images from CT scans etc?": [[43, "why-cnns-for-images-sound-files-medical-images-from-ct-scans-etc"], [44, "why-cnns-for-images-sound-files-medical-images-from-ct-scans-etc"]], "Why Combine Momentum and RMSProp?": [[36, "why-combine-momentum-and-rmsprop"]], "Why Linear Regression (aka Ordinary Least Squares and family)": [[33, "why-linear-regression-aka-ordinary-least-squares-and-family"]], "Why multilayer perceptrons?": [[39, "why-multilayer-perceptrons"], [40, "why-multilayer-perceptrons"]], "Why resampling methods": [[37, "why-resampling-methods"]], "Why resampling methods ?": [[37, "id1"], [38, "why-resampling-methods"]], "Why the Jacobian?": [[43, "why-the-jacobian"]], "Why the jacobian?": [[42, "why-the-jacobian"]], "Wisconsin Cancer Data": [[7, "wisconsin-cancer-data"]], "With Lasso Regression": [[35, "with-lasso-regression"]], "Wrapping it up": [[37, "wrapping-it-up"]], "Writing Our First Generative Adversarial Network": [[4, "writing-our-first-generative-adversarial-network"]], "Writing our own PCA code": [[11, "writing-our-own-pca-code"]], "Writing the Cost Function": [[35, "writing-the-cost-function"]], "XGBoost: Extreme Gradient Boosting": [[10, "xgboost-extreme-gradient-boosting"]], "Yet another Example": [[35, "yet-another-example"]], "a) Expression for Ridge regression": [[17, "a-expression-for-ridge-regression"]], "scikit-learn implementation": [[1, "scikit-learn-implementation"], [41, "scikit-learn-implementation"]]}, "docnames": ["chapter1", "chapter10", "chapter11", "chapter12", "chapter13", "chapter2", "chapter3", "chapter4", "chapter5", "chapter6", "chapter7", "chapter8", "chapter9", "chapteroptimization", "clustering", "exercisesweek34", "exercisesweek35", "exercisesweek36", "exercisesweek37", "exercisesweek38", "exercisesweek39", "exercisesweek41", "exercisesweek42", "exercisesweek43", "exercisesweek44", "intro", "linalg", "project1", "project2", "schedule", "statistics", "teachers", "textbooks", "week34", "week35", "week36", "week37", "week38", "week39", "week40", "week41", "week42", "week43", "week44", "week45"], "envversion": {"sphinx": 62, "sphinx.domains.c": 3, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 9, "sphinx.domains.index": 1, 
"sphinx.domains.javascript": 3, "sphinx.domains.math": 2, "sphinx.domains.python": 4, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx.ext.intersphinx": 1}, "filenames": ["chapter1.ipynb", "chapter10.ipynb", "chapter11.ipynb", "chapter12.ipynb", "chapter13.ipynb", "chapter2.ipynb", "chapter3.ipynb", "chapter4.ipynb", "chapter5.ipynb", "chapter6.ipynb", "chapter7.ipynb", "chapter8.ipynb", "chapter9.ipynb", "chapteroptimization.ipynb", "clustering.ipynb", "exercisesweek34.ipynb", "exercisesweek35.ipynb", "exercisesweek36.ipynb", "exercisesweek37.ipynb", "exercisesweek38.ipynb", "exercisesweek39.ipynb", "exercisesweek41.ipynb", "exercisesweek42.ipynb", "exercisesweek43.ipynb", "exercisesweek44.ipynb", "intro.md", "linalg.ipynb", "project1.ipynb", "project2.ipynb", "schedule.md", "statistics.ipynb", "teachers.md", "textbooks.md", "week34.ipynb", "week35.ipynb", "week36.ipynb", "week37.ipynb", "week38.ipynb", "week39.ipynb", "week40.ipynb", "week41.ipynb", "week42.ipynb", "week43.ipynb", "week44.ipynb", "week45.ipynb"], "indexentries": {}, "objects": {}, "objnames": {}, "objtypes": {}, "terms": {"": [0, 1, 2, 3, 4, 5, 6, 7, 9, 11, 12, 13, 15, 16, 17, 19, 21, 22, 23, 24, 25, 26, 27, 28, 30, 31, 33, 34, 40, 41, 42, 43, 44], "0": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 21, 22, 23, 24, 26, 27, 28, 30, 31, 33, 34, 35, 36, 37, 38, 39, 40, 42, 43, 44], "00": [0, 1, 5, 11, 33, 34, 40, 41, 43, 44], "000": [1, 3, 41, 43, 44], "000000": [], "00000000e": [], "001": [2, 8, 13, 21, 35, 36, 42, 43, 44], "004": 5, "004113634617443131": 34, "004113634617443139": 34, "00411363461744314": 34, "004113634617443147": 34, "005b82": [], "00622f": [], "00727646693": [0, 33], "0072b2": [], "00749c": [], "0076268": 21, "008561": [], "0086649156": [0, 33], "00e0e0": [], "01": [0, 1, 2, 5, 9, 11, 13, 17, 32, 33, 34, 36, 38, 39, 40, 41, 42, 43, 44], "010726": [], "0110": 30, "01719003e": [], "02": [0, 4, 7, 12, 33, 38, 39, 41, 43, 44], "02334824": [], "023b95": [], "024c1a": [], "025": 28, "02857": 4, "02f": 6, "03077640549": 4, "03097597e": [], "031": 5, "04": 11, "0458": 9, "05": [4, 6], "0550ae": [], "05767": 40, "062292565": 4, "062435": [], "06730814": [], "07": [], "0713": [0, 33], "07285": 3, "08": 30, "08078025e": [], "080808": [], "08336233266": 4, "08376632": 34, "083766322923899": 34, "0837663229239043": 34, "0917": 9, "0969da4a": [], "0d1117": [], "0n": [0, 33], "0x113e21950": 17, "1": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 23, 26, 29, 30, 31, 32, 33, 35, 36, 37, 38, 39, 43, 44], "10": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 16, 17, 18, 19, 21, 22, 23, 24, 26, 27, 29, 30, 31, 33, 34, 35, 36, 37, 38, 39, 41, 42, 43, 44], "100": [0, 1, 2, 3, 4, 5, 6, 8, 9, 10, 11, 13, 14, 15, 16, 17, 18, 19, 21, 26, 28, 30, 31, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "1000": [0, 1, 2, 4, 5, 8, 11, 13, 14, 18, 19, 21, 23, 25, 28, 30, 33, 35, 36, 38, 39, 41, 42, 43], "10000": [2, 5, 6, 10, 11, 13, 30, 37, 42, 43], "100000": 8, "10001": 10, "1001": 30, "1002": 30, "1003": 30, "1005": 30, "1007": [37, 38], "1009": 30, "101": 16, "1011": 30, "1013": 30, "1013904243": 30, "1015": 30, "102": 16, "1023": 30, "1024": [3, 44], "1026": 30, "1027": 30, "103": [1, 41], "1030": 30, "1037": 30, "1038": 30, "1040": 30, "1047": 30, "107": 16, "108": [], "10e": [41, 42], "10th": 9, "10x": [0, 28, 33], "10y": 28, "11": [0, 2, 5, 6, 7, 8, 9, 10, 11, 12, 13, 16, 23, 26, 27, 28, 30, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "110": [], "1100": 30, "1101": 30, "111": [1, 7, 12, 
38, 39, 40, 41], "112": 16, "11340253": [], "114": 43, "11590451": [], "116": 16, "116329": [], "116633": [], "117": 16, "118": 16, "12": [0, 1, 2, 3, 4, 5, 6, 8, 9, 11, 12, 18, 21, 23, 26, 27, 28, 30, 32, 33, 34, 35, 36, 37, 39, 41, 42, 43, 44], "120": [3, 43, 44], "1200": 43, "121": [8, 9, 10, 16], "1215pm": [31, 33], "122": [8, 9, 10], "124": [0, 33], "125": 16, "127": [4, 16], "128": [3, 4, 13, 36, 43, 44], "129": 16, "1298": 9, "12pm": [31, 33], "13": [0, 2, 9, 12, 22, 23, 26, 28, 30, 33, 39, 42, 43], "1307": 44, "131": 16, "133": [7, 38], "135": 16, "136": 16, "14": [0, 2, 4, 6, 8, 9, 10, 12, 26, 28, 30, 32, 34, 37, 38, 42, 43, 44], "141": 16, "1412": 36, "141414": [], "143": 16, "1446729567": 4, "149": 16, "14g": [6, 37], "15": [0, 2, 4, 6, 7, 8, 9, 12, 13, 27, 28, 30, 33, 35, 36, 38, 39, 42, 43, 44], "150": [4, 8, 21, 38, 39], "1502": 40, "152": 16, "153760": [], "156": 16, "157": [], "158": [], "159": 16, "15g": [6, 37], "15pm": 33, "16": [1, 2, 3, 4, 5, 8, 9, 10, 21, 30, 33, 35, 37, 42, 43, 44], "160": 16, "1603": 3, "161": 16, "162": 16, "16231451": 4, "163": 16, "16384": [3, 44], "164": 16, "167": 16, "17": [1, 2, 8, 22, 30, 41, 42, 43], "172": 16, "173": 16, "175": [37, 38], "176": 16, "178": 16, "179": 16, "1797": [1, 41], "18": [2, 6, 7, 8, 9, 10, 30, 33, 37, 38, 42, 43], "1807": 4, "181036": [], "18392847": [], "18c1c4": [], "19": [2, 30, 33, 37, 42], "192": [37, 38], "1940": [], "1943": [12, 39, 40], "19569961": 34, "19680801": [], "1970": [26, 33], "1973": 9, "1979": [6, 37], "1989": 40, "1991": 40, "1_1": [12, 39], "1_2": [12, 39], "1_3": [12, 39], "1cm": [0, 8, 10, 30, 33, 40, 41], "1d": [1, 2, 3, 24, 38, 39, 41, 42, 43, 44], "1e": [2, 4, 13, 14, 36, 38, 39, 41, 42, 43], "1e10": 14, "1e1e1": [], "1e4": 6, "1f": 1, "1ffvbn0xlhv": 22, "1k": 26, "1n": [0, 33], "1x": [0, 33], "1zkibvqf": 21, "2": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 23, 25, 26, 27, 30, 32, 36, 37, 38, 39, 43, 44], "20": [0, 1, 2, 6, 7, 8, 16, 17, 23, 30, 31, 33, 34, 35, 36, 37, 38, 39, 40, 41, 43, 44], "200": [0, 2, 3, 4, 8, 9, 10, 38, 39, 42, 43, 44], "2000": [0, 34], "2001": [], "2004": [13, 35], "2006": 32, "2007": [], "20072279": [], "2008": [33, 36], "2009": [], "2010": [1, 41], "2011": [1, 36, 41], "2012": 36, "2013": [], "2014": [4, 36], "2015": [1, 41, 42], "2016": [0, 33], "2017": 42, "2018": [0, 6, 34, 37, 38], "2019": [], "2020": [], "2021": [6, 14, 34, 36], "2022": [28, 33, 40, 41], "2023": [41, 42], "2024": [21, 37], "2025": [18, 21, 22, 23, 24, 28, 33, 34, 35, 36, 37, 42, 43, 44], "21": [0, 1, 5, 7, 9, 12, 23, 26, 33, 34, 35, 38, 39, 40, 41, 43, 44], "2116753732": 4, "215pm": [31, 33], "2167072": [], "22": [0, 1, 5, 12, 13, 23, 26, 33, 34, 35, 39, 41, 43, 44], "221": 8, "225": 4, "22948497": [], "23": [1, 12, 23, 26, 39, 41], "24": [0, 1, 23, 26, 33, 41], "242424": [], "24292f": [], "25": [2, 3, 4, 5, 6, 8, 9, 11, 24, 34, 42, 43, 44], "250": [2, 4, 7, 9, 38, 42, 43], "25000": [], "250154": [], "252124": [], "253775": [], "255": [3, 28, 42, 43, 44], "256": [4, 36], "25x": [27, 28], "26": [], "26303845": [], "264": [], "265": [], "265109911": 4, "266": [], "269": [], "27": [1, 24, 41], "270": [], "278": [35, 36], "27n_": 30, "28": [1, 3, 4, 41, 42, 43, 44], "283": [35, 36], "2830637392": 4, "2861": 30, "2873": 9, "2882": 30, "2886": 30, "2890": [0, 33], "2892": 30, "29": 34, "2915": 30, "2931": 33, "29364655": [], "294399745619595": [], "296247": [], "2968": 33, "2980": [21, 33], "298273": [], "298375": [], "299": 43, "2990": 33, "2_": [12, 39], "2_1": [12, 39], "2_2": [12, 39], 
"2_3": [12, 39], "2_i": [12, 39], "2_m": [6, 30, 37], "2_t": 13, "2_x": 30, "2a": 17, "2a1968": [], "2b": 30, "2b2b2b": [], "2c8f433990d1": 36, "2cm": 8, "2d": [1, 3, 11, 12, 25, 28, 33, 38, 39, 40, 41, 43, 44], "2e": [6, 37, 38], "2f": [0, 7, 9, 10, 11, 12, 23, 33, 38, 39, 42, 44], "2g": [2, 42, 43], "2g_i": [2, 42, 43], "2k": 3, "2m": [6, 37], "2mvizaqfst8": 34, "2n": [0, 2, 3, 33, 34, 42], "2nd": 9, "2p": [30, 40, 43, 44], "2pt": 4, "2x": [0, 3, 8, 13, 33, 40], "2x_ix_jy_iy_j": 8, "2x_j": 8, "2xb": 40, "2y_i": 10, "2y_j": 8, "3": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 23, 24, 25, 26, 27, 28, 29, 30, 31, 33, 35, 36, 37, 38, 39, 43], "30": [0, 1, 4, 6, 7, 10, 13, 31, 36, 37, 38, 39, 41, 42], "300": [38, 39], "30000": [0, 33], "3072": [3, 43, 44], "3081": 44, "31": [12, 23, 24, 26, 30, 39], "315": [6, 34, 36], "3155": [0, 5, 6, 34, 35, 36, 37, 38], "32": [3, 4, 6, 12, 13, 23, 26, 30, 36, 39, 43, 44], "3200": [1, 41], "3250": [1, 41], "3297": [], "33": [12, 23, 26, 31, 39], "3303": [], "3310": [], "332331": [], "333": [7, 38], "3331": [], "3337": [], "34": 26, "3436": [0, 33], "3437": [0, 33], "35": [0, 6, 27, 33, 35, 36], "3581341341": 4, "359": [5, 35], "36": [0, 5, 6, 18, 27, 30], "37": [27, 35, 37, 38], "370782966": 4, "38": [27, 30], "387": [37, 38], "39": [0, 24, 27, 28, 31, 33], "3d": [2, 3, 4, 6, 13, 16, 37, 38, 42], "3d73a9": [], "3f": [1, 3, 9, 41, 42, 44], "3n": 26, "3pi": [43, 44], "3x": [2, 8, 42, 43], "3x_0x_1": 40, "3x_i": [2, 42, 43], "3y": 8, "4": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 24, 26, 27, 28, 30, 33, 35, 36, 37, 38, 39, 40, 42, 43, 44], "40": [1, 6, 31, 33, 37, 38, 41, 42], "400": [4, 43], "4000": 33, "40008b9a5380fcacce3976bf7c08af5b": 36, "4050": [32, 33], "41": [23, 26, 28, 42, 43], "4155": [2, 15, 42, 43], "41589548": [], "42": [1, 4, 8, 9, 10, 23, 26, 28, 38, 39, 40, 42, 43], "43": [0, 7, 26], "4310": 33, "436462435": 4, "437a6b": [], "44": [0, 26, 35, 36, 42], "45": [31, 33], "46": [31, 33], "462": [7, 38], "468": 42, "47": [31, 33], "473d18": [], "479465113": 4, "47958494": [], "48": [], "48257387": [31, 33], "49": [5, 6, 11], "49152": [3, 44], "4940954": [0, 33], "4990": 30, "4992": 30, "4997": 30, "4c4b4be8": [], "4c4c7f": [9, 10], "4d": [3, 44], "4f": [6, 28, 38, 39, 42, 44], "4pm": [31, 33], "4y": 8, "4y_i": 10, "5": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 17, 23, 24, 26, 27, 28, 30, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "50": [1, 2, 3, 4, 6, 7, 8, 10, 13, 24, 28, 33, 34, 36, 37, 39, 40, 41, 42, 43, 44], "500": [1, 3, 4, 6, 9, 10, 13, 36, 37, 38, 41, 43], "5000": [27, 28], "5018": 30, "506": [], "507d50": [9, 10], "50j": 13, "50x10": [1, 41], "51": 10, "510": [1, 41], "512132": [], "515151": [], "5177783846": 4, "52": 38, "53": [9, 38], "5391cf": [], "54": [6, 30], "5411205": [], "54894451": [], "55": [1, 41], "56": [1, 41], "56469864": 21, "56536": [0, 33], "569": 1, "57": [0, 8, 31, 33], "571": [5, 35], "576": 37, "58": [10, 31, 33], "5870": 43, "58a6ff70": [], "591317992": 4, "5ca7e4": [], "5cm": 30, "5f": [8, 36], "5x": [8, 18], "5y": 8, "6": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 13, 18, 26, 28, 30, 31, 33, 34, 35, 36, 37, 38, 39, 41, 42, 43, 44], "60": [1, 3, 44], "60000": 4, "6019067271": 4, "60610368": 21, "606439": [], "61362": 28, "622cbc": [], "625": [7, 38], "63": [1, 41], "64": [1, 3, 4, 13, 26, 33, 36, 41, 42, 43, 44], "64x50": [1, 41], "65": [1, 8, 9, 41], "66666691": [], "66707b": [], "66ccee": [], "66e9ec": [], "6730c5": [], "6887363571": 4, "69": [16, 30], "69069n_": 30, "691": [], "6980": 36, 
"6e7681": [], "6e7781": [], "6f98b3": [], "6n_": 30, "7": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 12, 13, 26, 27, 28, 30, 32, 33, 34, 36, 37, 38, 39, 40, 41, 42, 43, 44], "70": [1, 7, 38, 41], "702c00": [], "70653767": 4, "71": [1, 41], "724": [3, 44], "72f088": [], "73": [], "7304881": [], "737373": [], "75": [5, 6, 8, 11, 37], "76": [31, 33, 38, 43, 44], "760": [43, 44], "765": [7, 38], "77": [31, 33], "7718": 9, "7782028952": 4, "77893972": [], "78": [], "784": 42, "797979": [], "7998f2": [], "79c0ff": [], "7d7d58": [9, 10], "7ee787": [], "7f4707": [], "8": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 13, 14, 18, 19, 21, 26, 28, 30, 31, 33, 35, 38, 39, 41, 42, 43, 44], "80": [0, 1, 5, 8, 17, 34, 41], "800": [4, 7, 38, 43], "8045e5": [], "81": [1, 41], "815am": [31, 33], "81b19b": [], "8250df": [], "84858": [37, 38], "85": [1, 41], "8702784034": 4, "8786ac": [], "88": 33, "8a4600": [], "8b949e": [], "8c8c8c": [], "8f": [6, 37, 38], "8g": [6, 37], "8n": 26, "8x8": [1, 41, 42], "9": [0, 1, 2, 4, 5, 6, 7, 8, 9, 11, 12, 13, 26, 30, 33, 36, 38, 39, 41, 42, 43, 44], "90": 1, "9040": 9, "91": [31, 33], "912583": [], "91cbff": [], "92": [31, 33], "93": 16, "931": [0, 33], "933": [5, 35], "937": 30, "938": 30, "939": [0, 30, 33], "94": 30, "95": [1, 11, 37, 41], "953800": [], "954": 30, "955820c21e8b": 4, "9579870417283": 21, "96": [6, 37], "960": 30, "961": 30, "962": 30, "9649652536": 4, "96611194e": [], "974eb7": [], "978": [37, 38], "9780387310732": 32, "9780387848570": 32, "9781098134174": 33, "9781492032632": 32, "9781801819312": 33, "97898392": 34, "98": [0, 1, 16, 41], "985": 30, "986": 30, "98661b": [], "989": 30, "9898ff": [9, 10], "99": [13, 16, 36, 37], "991": 30, "992": 30, "993": 30, "996": 5, "996b00": [], "999": [9, 30, 36, 41, 42], "9999": 24, "999999": [], "9e86c8": [], "9e8741": [], "9f4e55": [], "9x": 6, "9y": 6, "A": [2, 3, 5, 6, 7, 10, 11, 12, 13, 15, 16, 19, 20, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 34, 35, 36, 40], "AND": [2, 42, 43], "AS": [], "AT": [], "And": [0, 3, 4, 5, 6, 9, 13, 20, 22, 24, 25, 27, 28, 30, 35, 43, 44], "As": [0, 1, 2, 3, 4, 5, 6, 8, 10, 12, 13, 15, 16, 23, 26, 27, 28, 30, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "At": [0, 4, 6, 13, 20, 23, 33, 36], "BE": [0, 33], "BUT": [], "BY": [], "Be": [2, 18, 25, 33, 42, 43], "Being": 13, "But": [0, 1, 2, 3, 5, 6, 9, 10, 16, 21, 28, 30, 34, 37, 38, 41, 42, 43, 44], "By": [0, 3, 5, 6, 12, 13, 17, 19, 23, 26, 33, 34, 35, 36, 37, 39, 44], "FOR": [], "For": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 18, 19, 21, 22, 23, 25, 26, 27, 28, 30, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "IF": [6, 34, 36], "IN": 32, "If": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 15, 16, 18, 21, 22, 23, 24, 25, 26, 27, 28, 30, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "In": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 18, 19, 21, 22, 23, 24, 25, 26, 27, 28, 30, 32, 33, 34, 35, 36, 37, 38, 39, 41, 42, 43, 44], "Ising": [5, 12, 34, 35, 39, 40], "It": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 18, 20, 21, 22, 23, 25, 26, 27, 28, 30, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "Its": [1, 2, 4, 11, 41, 42, 43], "NO": [], "NOT": [], "No": [6, 9, 33, 34, 36, 39, 41, 42], "Not": [0, 1, 5, 6, 34, 35, 36, 37, 39, 41, 42], "OF": [], "ON": [], "OR": 30, "Of": 30, "On": [0, 3, 27, 30, 31, 32, 33, 36, 37, 43, 44], "One": [0, 1, 3, 4, 5, 6, 7, 8, 11, 12, 13, 17, 30, 34, 35, 36, 37, 38, 39, 40, 41, 43, 44], "Or": [0, 1, 6, 24, 33, 41], "SUCH": [], "Such": [0, 6, 12, 16, 30, 36, 37, 38, 39, 40, 41, 43, 
44], "THE": [], "TO": [41, 42], "That": [0, 5, 7, 10, 11, 12, 14, 27, 28, 30, 33, 37, 38, 39, 40, 41, 43, 44], "The": [4, 10, 13, 14, 16, 17, 18, 19, 20, 21, 22, 23, 24, 26, 27, 28, 29, 30, 31, 32], "Then": [0, 1, 6, 8, 9, 10, 11, 12, 13, 14, 15, 16, 18, 19, 20, 21, 23, 24, 26, 33, 35, 36, 37, 40, 41, 42], "There": [0, 3, 4, 5, 6, 8, 9, 11, 12, 14, 15, 26, 27, 28, 30, 31, 33, 34, 35, 36, 39, 40, 43, 44], "These": [0, 3, 4, 5, 8, 9, 10, 11, 12, 13, 14, 17, 18, 22, 23, 26, 27, 28, 30, 31, 33, 34, 35, 36, 40, 41, 42, 43, 44], "To": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 15, 16, 20, 21, 22, 23, 26, 28, 30, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "WITH": [], "Will": [38, 39], "With": [0, 5, 6, 8, 9, 10, 11, 12, 14, 16, 19, 21, 26, 27, 28, 30, 33, 34, 37, 38, 39, 40, 41, 42, 43, 44], "_": [0, 1, 2, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 16, 17, 18, 19, 21, 23, 26, 27, 33, 34, 35, 36, 37, 38, 39, 41, 42, 43, 44], "_0": [5, 8, 10, 11, 13, 34, 35], "_1": [2, 5, 6, 8, 10, 11, 12, 13, 14, 23, 26, 34, 35, 36, 40, 41, 42, 43], "_2": [2, 5, 8, 11, 12, 13, 26, 34, 36, 39, 42, 43], "_3": 26, "_4": 26, "_9": [13, 36], "__array_finalize__": [], "__class__": [10, 41, 42], "__doc__": [6, 37, 38], "__future__": [8, 9, 40], "__getattribute__": [], "__import__": [], "__init__": [1, 22, 38, 39, 41, 42, 44], "__main__": [2, 42, 43], "__name__": [2, 10, 41, 42, 43], "__new__": [], "__path__": [], "_accuraci": [41, 42], "_add_intercept": [38, 39], "_auto1": [2, 3, 4, 5, 6, 7, 12, 13, 26, 30, 34, 35, 38, 39, 40, 41, 42, 43], "_auto10": [6, 12], "_auto11": 6, "_auto12": 6, "_auto2": [2, 3, 4, 5, 6, 12, 13, 26, 30, 39, 40, 41, 42, 43], "_auto3": [3, 4, 5, 6, 12, 13, 26, 39, 40, 41], "_auto4": [4, 6, 12, 13, 26, 39], "_auto5": [4, 6, 12, 13, 26, 39], "_auto6": [4, 6, 12, 26, 39], "_auto7": [4, 6, 12, 26, 39], "_auto8": [6, 12], "_auto9": [6, 12], "_backpropag": [41, 42], "_build": [0, 25, 27, 28, 32, 33, 41, 42], "_c": [1, 41], "_center": [], "_compile_transl": [], "_compon": 11, "_da": 22, "_data": [], "_depth": 9, "_export": [15, 16, 19], "_feed_forward_sav": 22, "_feedforward": [41, 42], "_format": [41, 42], "_fraction": 9, "_i": [0, 1, 2, 5, 6, 7, 8, 11, 12, 13, 19, 23, 27, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43], "_j": [0, 1, 2, 3, 5, 6, 8, 13, 19, 27, 34, 35, 36, 37, 38, 41, 42, 43, 44], "_k": [13, 35, 36, 41, 42], "_l": [12, 39, 40, 41, 42], "_lambda": 6, "_leaf": 9, "_m": 10, "_mask": [], "_multilayer_perceptron": [], "_n": [2, 5, 8, 11, 13, 34, 35, 36, 42, 43], "_node": 9, "_norm": [], "_p": [5, 8, 34, 35], "_parse_numpydoc_see_also_sect": [], "_progress_bar": [41, 42], "_pydevd_bundl": [], "_ratio": 11, "_sampl": 9, "_set_classif": [41, 42], "_sigmoid": [38, 39], "_softmax": [38, 39], "_split": [6, 9, 27], "_t": [13, 36], "_test": [6, 27], "_varianc": 11, "_weight": 9, "a0": 3, "a0111f": [], "a0faa0": [9, 10], "a1": [0, 21, 22, 33], "a11": [], "a12236": [], "a2": [0, 21, 22, 33], "a25e53": [], "a2bffc": [], "a3": [0, 33], "a4": [0, 33], "a5d6ff": [], "a_": [0, 1, 16, 26, 33, 34, 40, 41, 43, 44], "a_0": [0, 33, 40, 41, 43, 44], "a_1": [40, 41, 43, 44], "a_1a": [0, 33], "a_2": [40, 41, 43, 44], "a_2a": [0, 33], "a_3": [0, 33, 43, 44], "a_3a": [0, 33], "a_4": [0, 33], "a_4a": [0, 33], "a_h": [1, 41], "a_i": [0, 1, 2, 12, 33, 40, 41, 42, 43], "a_j": [1, 12, 40, 41, 42], "a_k": [0, 1, 12, 40, 41], "a_matric": [41, 42], "aa": [], "aaa": [], "aaron": 32, "ab": [0, 2, 5, 13, 14, 33, 34, 36, 40, 42, 43], "ab6369": [], "ab_channel": [25, 39, 40, 41, 42, 43], "abandon": [1, 41], "abe338": [], "abid": 30, 
"abil": [0, 10], "abl": [0, 1, 4, 5, 6, 7, 10, 12, 13, 16, 18, 20, 21, 24, 27, 34, 35, 36, 38, 39, 40, 41, 43], "about": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 15, 16, 19, 20, 22, 23, 24, 25, 26, 27, 31, 36, 37, 38, 39, 41, 42, 43, 44], "abov": [0, 1, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 21, 22, 23, 26, 28, 30, 32, 33, 34, 36, 37, 38, 39], "abovement": [6, 27, 33, 37, 38], "abscissa": [13, 35], "absent": 36, "absolut": [0, 2, 5, 6, 13, 33, 34, 35, 37, 38, 42, 43], "absorb": [34, 35], "abstract": [1, 24, 36, 38, 41, 42], "abund": 36, "ac": [], "acc_bin": [38, 39], "acc_multi": [38, 39], "acceler": [13, 36], "accept": [0, 3, 6, 9, 21, 27, 34, 36, 43, 44], "access": [3, 11, 30, 33, 36, 43, 44], "accid": [4, 6, 37, 38], "accompani": [0, 33, 34], "accomplish": [8, 9, 13, 36], "accord": [0, 1, 2, 5, 6, 9, 12, 13, 14, 30, 33, 35, 36, 37, 39, 40, 41, 42, 43, 44], "accordingli": 11, "account": [0, 3, 5, 13, 15, 16, 20, 23, 30, 33, 36, 43, 44], "accumul": [12, 13, 30, 36, 39, 40, 41, 42], "accur": [0, 3, 4, 6, 10, 13, 36, 37, 38, 43, 44], "accuraci": [0, 1, 3, 4, 5, 6, 7, 9, 10, 11, 12, 21, 23, 24, 28, 33, 34, 35, 38, 39, 40, 41, 42, 44], "accuracy_scor": [0, 1, 10, 21, 22, 28, 33, 38, 39, 41], "accuracy_score_numpi": [1, 41], "acheiv": 21, "achiev": [0, 1, 5, 6, 8, 12, 26, 33, 36, 37, 38, 39, 40, 41], "aco": 30, "acquaint": 25, "acquir": [1, 25, 33, 41], "acr": [], "across": [1, 3, 6, 9, 17, 23, 25, 33, 37, 41, 43, 44], "act": [1, 3, 23, 26, 28, 36, 41, 42, 43, 44], "act_func": [41, 42], "act_func_deriv": [41, 42], "actic": 21, "action": 30, "activ": [0, 2, 3, 4, 9, 15, 22, 24, 29, 31, 33, 36, 43, 44], "activation_d": 22, "activation_func": [21, 22], "activest": [], "actual": [0, 1, 4, 5, 6, 8, 11, 15, 16, 18, 21, 23, 26, 30, 33, 34, 35, 36, 37, 41, 43], "ad": [1, 3, 4, 5, 8, 13, 15, 16, 26, 35, 36, 37, 38, 42, 43, 44], "ada_clf": 10, "adaboostclassifi": 10, "adadelta": [13, 36], "adagrad": [27, 37, 40, 41, 42], "adagradmomentum": [41, 42], "adam": [1, 3, 4, 21, 27, 28, 33, 37, 40, 41, 42, 44], "adam_schedul": [41, 42], "adap": 40, "adapt": [4, 6, 13, 17, 28, 32, 35, 37, 38, 40, 42], "add": [0, 1, 2, 3, 4, 5, 6, 8, 10, 11, 12, 15, 16, 17, 18, 20, 21, 24, 28, 30, 31, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43], "add6ff": [], "add_": [], "add_subplot": [1, 7, 12, 14, 38, 39, 41], "add_suplot": [42, 43], "addendum": 5, "addeventlisten": [], "addit": [0, 2, 3, 5, 6, 7, 8, 9, 10, 12, 13, 15, 21, 25, 26, 27, 28, 30, 31, 32, 33, 34, 37, 38, 39, 40, 41, 42, 43, 44], "addition": [12, 13, 35, 36, 39, 40], "address": [1, 9, 11, 13, 33, 36, 41], "adjac": [3, 12, 39, 40, 43, 44], "adjoint": [5, 34], "adjust": [0, 5, 12, 13, 35, 36, 39], "admir": [0, 33], "advanc": [4, 6, 12, 32, 33, 36, 37, 38, 39, 40], "advantag": [1, 3, 5, 6, 10, 13, 19, 24, 26, 28, 35, 36, 37, 38, 41, 43, 44], "advent": 43, "adversari": 33, "advis": [], "afecionado": 33, "affect": [3, 15, 19, 24, 41, 42, 43, 44], "affin": [0, 3, 8, 11, 34, 40, 43, 44], "afford": [3, 44], "aficionado": 33, "aforement": 14, "african": [], "after": [0, 1, 2, 4, 5, 6, 9, 11, 12, 13, 15, 16, 17, 18, 19, 20, 21, 24, 25, 26, 27, 28, 30, 33, 34, 35, 36, 37, 40, 41, 42, 43, 44], "afterward": [0, 33], "ag": [0, 7, 33, 34, 38], "ag_0": [2, 42, 43], "again": [0, 1, 4, 5, 6, 7, 8, 10, 11, 12, 13, 27, 28, 30, 33, 34, 35, 37, 38, 39, 40, 41, 42], "against": [1, 4, 7, 10, 38, 41], "agegroup": [7, 38], "agegroupmean": [7, 38], "aggreg": [9, 10, 36, 43, 44], "agorithm": 10, "agre": [5, 6, 30, 34, 35, 36, 37], "agreement": [13, 36], "ahead": 9, 
"ai": [0, 32], "aid": [11, 20, 36], "aim": [0, 1, 4, 6, 7, 11, 14, 16, 17, 19, 20, 25, 26, 27, 28, 34, 37, 38, 39, 40, 41], "ainv": 5, "airplan": [3, 44], "aka": [5, 28], "al": [0, 2, 4, 16, 17, 20, 28, 32, 33, 34, 35, 37, 38, 39, 40, 41, 42], "alarm": [5, 7], "aldo": 34, "alexanderamini": 43, "alexsmola": 42, "algebra": [0, 3, 5, 13, 25, 34, 35, 37, 43, 44], "algorithm": [0, 1, 2, 4, 5, 6, 7, 8, 13, 14, 16, 25, 26, 27, 30, 32, 37, 38, 39, 43, 44], "align": [0, 2, 5, 6, 7, 8, 13, 30, 33, 34, 35, 37, 38, 39, 42, 43, 44], "all": [0, 1, 2, 3, 4, 5, 6, 7, 9, 10, 11, 12, 13, 14, 15, 18, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 42, 43, 44], "allclos": 21, "allevi": [1, 13, 35, 41], "alloc": [3, 26, 43, 44], "allow": [0, 1, 2, 3, 5, 6, 8, 10, 13, 15, 23, 25, 26, 27, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "almost": [0, 1, 6, 8, 11, 13, 30, 35, 36, 37, 38, 39, 41], "alon": [2, 9, 36, 42, 43], "along": [2, 3, 4, 5, 6, 9, 10, 11, 15, 20, 21, 22, 25, 26, 33, 34, 35, 37, 38, 41, 42, 43, 44], "alpha": [0, 1, 2, 3, 4, 6, 7, 8, 9, 10, 13, 14, 23, 30, 33, 34, 35, 36, 37, 38, 39, 41, 42, 43, 44], "alpha_": [10, 36], "alpha_0": [3, 43, 44], "alpha_1": [3, 43, 44], "alpha_2": [3, 43, 44], "alpha_i": [3, 13, 43, 44], "alpha_k": [13, 43, 44], "alpha_m": 10, "alpha_n": 3, "alpha_opt": 13, "alreadi": [2, 3, 4, 5, 6, 10, 12, 15, 22, 25, 26, 30, 33, 34, 35, 38, 39, 40, 42, 43], "also": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 18, 19, 21, 22, 23, 25, 26, 27, 28, 30, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "alter": [1, 41], "altern": [0, 1, 4, 5, 6, 8, 9, 11, 13, 15, 18, 23, 26, 27, 33, 34, 36, 37, 38, 41, 43, 44], "although": [0, 1, 5, 6, 8, 10, 13, 16, 19, 20, 33, 36, 37, 38, 40, 41, 42, 43, 44], "alwai": [0, 3, 5, 6, 12, 13, 16, 19, 21, 22, 27, 28, 30, 33, 34, 35, 36, 37, 39, 40, 43, 44], "am": 4, "ambit": [40, 41], "ame2016": [0, 33], "american": [], "amjith": [], "among": [0, 3, 5, 9, 10, 12, 23, 26, 33, 34, 39, 40, 43, 44], "amongst": [5, 37], "amount": [0, 1, 3, 4, 6, 8, 10, 14, 25, 37, 38, 40, 41, 42, 43, 44], "an": [1, 2, 3, 5, 6, 7, 8, 9, 11, 12, 13, 14, 16, 17, 18, 19, 21, 22, 23, 24, 25, 26, 27, 28, 30, 31, 32, 34, 35, 36, 37, 38, 39, 41, 42, 43, 44], "an_": 30, "anaconda": [0, 1, 25, 27, 33, 41, 42], "analogi": 13, "analys": [6, 37, 38], "analysi": [1, 3, 4, 7, 14, 19, 23, 24, 26, 32, 36, 39, 41, 43, 44], "analyt": [2, 3, 5, 6, 7, 12, 13, 17, 22, 24, 25, 27, 33, 34, 35, 36, 37, 38, 39, 40, 43], "analyz": [0, 1, 3, 4, 5, 6, 16, 27, 30, 34, 35, 36, 42, 44], "andrew": [1, 41], "angl": [0, 3, 9, 34, 36], "anharmon": 3, "ani": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 14, 15, 16, 19, 21, 30, 33, 34, 36, 37, 40, 41, 42, 43, 44], "anim": [4, 12, 39, 40], "ann": [12, 39, 40], "annot": [0, 1, 3, 7, 8, 33, 39, 41, 42, 44], "announc": 33, "anom": [], "anomali": [], "anonym": 18, "anoth": [0, 1, 3, 4, 5, 6, 7, 8, 10, 11, 12, 13, 15, 26, 27, 28, 30, 33, 34, 36, 40, 41, 42, 43, 44], "ansatz": [0, 18, 33], "answer": [0, 1, 3, 5, 6, 19, 22, 24, 26, 27, 28, 31, 33, 37, 41], "antialias": [2, 6, 42, 43], "anticip": 4, "anymor": [1, 8, 41], "anyon": [4, 8, 15], "anyth": [1, 15, 16, 21, 22, 30, 41, 42], "anytim": [31, 33], "anywai": [], "apach": [1, 41, 42], "apart": [11, 13, 35, 36], "api": [1, 25, 33, 41, 42], "appar": [2, 42, 43], "appear": [0, 1, 3, 13, 26, 30, 40, 41, 42, 44], "append": [1, 3, 4, 8, 9, 13, 19, 21, 22, 33, 36, 38, 39, 41, 42, 44], "appendic": [27, 28], "appendix": 27, "appli": [0, 1, 3, 4, 6, 7, 8, 9, 10, 11, 12, 13, 18, 27, 28, 30, 
32, 33, 34, 36, 37, 38, 39, 40, 41, 43, 44], "applic": [0, 1, 3, 4, 5, 6, 7, 9, 12, 13, 16, 24, 26, 30, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "apply_gradi": 4, "approach": [1, 2, 4, 5, 6, 9, 10, 11, 12, 13, 15, 16, 18, 21, 24, 25, 27, 30, 32, 34, 35, 40, 41, 42, 43], "approch": 27, "appropri": [2, 6, 9, 12, 13, 17, 25, 30, 36, 37, 38, 39, 42, 43], "approv": 33, "approx": [0, 2, 3, 6, 10, 11, 13, 18, 27, 30, 33, 35, 36, 37, 42, 43, 44], "approxim": [0, 1, 2, 3, 4, 5, 6, 7, 10, 11, 13, 19, 27, 28, 30, 33, 34, 35, 36, 37, 38, 39, 41, 42, 43], "apt": [0, 25, 27, 33], "aq": 30, "ar": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "aragorn": 33, "arang": [1, 3, 4, 6, 7, 9, 10, 12, 13, 33, 36, 38, 39, 41, 42, 44], "arbitrari": [1, 4, 6, 8, 12, 13, 30, 35, 37, 39, 40, 41, 42], "arbitrarili": [0, 1, 11, 33, 36, 41], "arc": 6, "architectur": [3, 4, 12, 24, 28, 40, 42, 43, 44], "archiv": [27, 28], "area": [0, 3, 6, 23, 32, 33, 43, 44], "argmax": [1, 11, 21, 38, 39, 41], "argmin": [4, 10, 14], "argsort": 11, "argu": [1, 13, 28, 41], "arguement": 19, "argument": [0, 2, 3, 5, 11, 12, 13, 17, 21, 33, 34, 36, 37, 39, 40, 41, 42, 43, 44], "aris": [0, 6, 12, 13, 30, 33, 35, 37, 38], "arithmet": [0, 13, 26, 33], "arm": [6, 34, 36], "armadillo": 26, "armin": [], "arnulf": [40, 41], "around": [0, 1, 4, 5, 6, 11, 18, 21, 22, 27, 28, 30, 33, 37, 38, 39, 40, 41], "arrai": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 12, 13, 14, 16, 18, 21, 23, 25, 27, 30, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "arrang": [3, 33, 43, 44], "array_equ": [38, 39], "arraybox": 13, "arriv": [0, 6, 9, 11, 19, 26, 30, 33, 37], "arrow": [12, 39, 40, 41, 42], "arrowprop": 8, "art": [0, 1, 25, 41], "articl": [0, 3, 4, 6, 10, 19, 28, 33, 34, 35, 36, 37, 38, 43, 44], "artifici": [0, 2, 7, 12, 24, 32, 33, 38, 42, 43], "artificialneuron": [12, 39, 40], "arug": 13, "arxiv": [3, 4, 36, 40], "as_fram": 28, "asarrai": [0, 6, 9, 34, 36], "asid": 34, "ask": [5, 6, 11, 12, 15, 19, 27, 28, 37, 40, 41], "aspect": [0, 6, 25, 33, 34, 40, 41], "assembl": [3, 44], "assembli": [0, 33], "assert": [4, 41, 42], "assess": [0, 6, 24, 27, 33, 34, 37, 38], "asset": [], "assici": 4, "assign": [0, 7, 8, 9, 12, 13, 14, 15, 29, 31, 32, 33, 38, 39, 41], "associ": [0, 6, 9, 12, 14, 30, 33, 37, 38, 39, 40], "assum": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 17, 19, 24, 26, 27, 30, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "assumpt": [0, 3, 5, 6, 9, 11, 30, 33, 34, 38, 43, 44], "ast": [0, 5, 6, 33, 37], "astyp": [4, 9, 10, 38, 39, 42, 43], "asymmetri": [0, 33], "asymptot": [4, 6, 36, 37, 38], "atom": [0, 33], "attain": 36, "attempt": [0, 4, 6, 7, 8, 10, 33, 34, 36, 38, 40, 41], "attend": 33, "attent": [0, 26, 33], "attract": [0, 10, 33], "attribut": [0, 9, 22, 33, 41, 42], "auc": 23, "audi": [0, 33], "audio": [3, 4, 43, 44], "augment": [43, 44], "august": [33, 34], "aurelien": [0, 32, 33], "austfjel": 6, "auth": 15, "authent": 15, "author": [0, 1, 10, 30, 41], "authour": 33, "auto": [9, 10, 28, 30, 41, 42], "auto_exampl": [21, 27, 34], "autocor": 30, "autocorrelation_tim": 30, "autocorrelform": 30, "autocovari": 30, "autoencod": [4, 25, 33], "autoencond": 25, "autograd": [21, 25, 28, 33, 40, 41], "autograd_compliant_predict": 22, "autograd_gradi": 22, "autograd_one_lay": 22, "autom": [0, 25, 32, 33], "automac": 26, "automag": 33, "automat": [0, 1, 2, 3, 4, 11, 16, 21, 22, 25, 26, 28, 33, 39, 41, 44], "automobil": [3, 44], 
"autonom": 4, "avail": [0, 1, 4, 6, 10, 11, 23, 24, 25, 26, 27, 28, 29, 31, 32, 33, 37, 38, 41, 42], "avali": [20, 24, 27, 28], "averag": [0, 1, 3, 6, 9, 10, 13, 14, 23, 30, 31, 33, 34, 37, 38, 41, 42, 43, 44], "avg_loss": 42, "avoid": [0, 4, 5, 6, 9, 11, 13, 18, 21, 26, 34, 36, 37, 38, 41, 42], "awai": [2, 3, 6, 34, 36, 40, 42, 43, 44], "awar": [2, 10, 42, 43], "award": [31, 33], "ax": [0, 1, 2, 3, 4, 6, 7, 8, 9, 10, 11, 12, 13, 14, 20, 21, 26, 27, 28, 33, 37, 38, 39, 41, 42, 43, 44], "axes3d": [2, 6, 13, 35, 36, 42, 43], "axes_grid1": 6, "axhlin": 8, "axi": [0, 1, 2, 3, 4, 6, 7, 8, 9, 10, 11, 12, 13, 14, 18, 21, 24, 26, 30, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "axiom": 5, "axvlin": [4, 8], "axvspan": 4, "b": [0, 1, 3, 4, 5, 6, 8, 9, 10, 12, 13, 14, 15, 16, 17, 19, 20, 21, 22, 24, 30, 31, 33, 34, 35, 36, 37, 38, 39, 40, 41, 43, 44], "b1": [8, 21, 22], "b19db4": [], "b1bac4": [], "b2": [8, 21, 22], "b3": 8, "b35900": [], "b89784": [], "b_": [0, 1, 26, 40, 41], "b_0": [0, 40], "b_1": [0, 2, 12, 13, 36, 39, 40, 41, 42, 43], "b_2": [0, 13, 40, 41], "b_5": [13, 36], "b_g": [21, 22], "b_group": 9, "b_i": [0, 1, 2, 12, 33, 39, 40, 41, 42, 43], "b_ia_": [0, 33], "b_ia_i": 0, "b_index": 9, "b_j": [1, 12, 39, 40, 41, 42], "b_k": [0, 1, 12, 13, 36, 39, 40, 41], "b_m": [12, 39], "b_score": 9, "b_valu": 9, "ba": 36, "babcock": 33, "bach": 36, "bachelor": [29, 31], "back": [0, 3, 4, 5, 6, 8, 9, 10, 15, 16, 21, 26, 28, 30, 33, 36, 44], "backbon": 26, "backend": [1, 4, 41, 42], "background": [32, 33, 41], "backprogag": 22, "backpropag": [1, 21, 28, 36, 40, 41, 42], "backpropog": 22, "backslash": [], "backtrack": 9, "backup": 26, "backward": [1, 2, 4, 12, 22, 26, 36, 40, 41, 42, 43, 44], "bad": [6, 17, 28, 34, 41, 42], "badli": 30, "bag": [9, 25, 33], "bag_clf": 10, "baggin": 33, "baggingboot": 10, "baggingclassifi": 10, "baggingtre": 10, "bailei": [], "balanc": [6, 23, 36, 37, 38], "ballpark": 18, "baluka": [42, 43], "band": 26, "bandwidth": 26, "banner": [], "bar": [0, 6, 11, 27, 33, 41, 42], "barber": 32, "bare": [4, 10], "base": [0, 1, 3, 4, 5, 7, 8, 9, 10, 14, 15, 16, 17, 25, 30, 31, 32, 33, 34, 35, 38, 39, 40, 41, 43, 44], "baselin": 23, "basi": [5, 7, 8, 10, 11, 12, 13, 26, 34, 35, 38, 39, 40], "basic": [6, 8, 12, 13, 14, 15, 24, 25, 27, 28, 30, 33, 37, 41, 42], "basin": 36, "batch": [3, 4, 11, 12, 13, 21, 35, 38, 39, 42, 44], "batch_idx": 44, "batch_shap": 4, "batch_siz": [1, 3, 4, 41, 42, 44], "batchnorm": 4, "bay": [7, 38, 39], "baydin": 40, "bayesian": [5, 25, 32, 33], "bbbbbb": [], "beauti": [], "becam": [], "becaus": [0, 1, 2, 3, 4, 5, 6, 8, 9, 12, 13, 14, 24, 33, 34, 35, 36, 37, 38, 39, 41, 42, 43, 44], "becom": [0, 1, 2, 5, 6, 7, 9, 12, 13, 19, 30, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43], "been": [0, 1, 2, 3, 4, 5, 6, 11, 12, 13, 19, 20, 24, 25, 26, 27, 28, 33, 34, 36, 37, 39, 40, 41, 42, 43], "befor": [0, 1, 2, 3, 4, 5, 6, 7, 8, 12, 13, 14, 16, 17, 18, 19, 20, 21, 22, 24, 26, 27, 30, 33, 34, 36, 37, 38, 39, 40, 41, 42, 44], "beforehand": [0, 30, 33], "began": [], "begin": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 12, 13, 14, 15, 22, 23, 26, 30, 31, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "beginn": 28, "behav": [1, 6, 13, 35, 37, 38, 41], "behavior": [0, 1, 13, 33, 35, 36, 41], "behaviour": [12, 36, 39, 40, 41], "behind": [0, 1, 6, 8, 13, 24, 33, 35, 41], "being": [0, 1, 2, 3, 4, 5, 7, 8, 10, 11, 12, 13, 17, 20, 30, 33, 34, 35, 36, 38, 39, 40, 41, 42, 43, 44], "believ": [9, 26], "belong": [7, 8, 9, 13, 14, 35, 38, 39, 41, 42], "below": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 
11, 12, 13, 15, 18, 21, 22, 23, 26, 27, 28, 30, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "benchmark": [10, 28], "benefici": [1, 13, 41], "benefit": [0, 1, 4, 11, 13, 25, 33, 35, 36, 41, 43, 44], "bengio": [1, 28, 32, 33, 34, 36], "benign": [1, 7, 39], "benno": [40, 41], "berner": [40, 41], "besid": [4, 5, 35], "bessel": [5, 34, 37], "best": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 13, 15, 16, 18, 21, 23, 28, 31, 33, 34, 35, 36, 37, 38, 39, 41, 42, 43, 44], "beta": [1, 3, 10, 11, 13, 16, 17, 19, 33, 34, 35, 41, 42, 43, 44], "beta1": [], "beta2": [], "beta_": [3, 13, 17, 34, 43, 44], "beta_0": [1, 3, 13, 34, 41, 43, 44], "beta_1": [1, 3, 10, 13, 34, 36, 41, 43, 44], "beta_1m_": 36, "beta_1x_i": 13, "beta_2": [3, 13, 36, 43, 44], "beta_2v_": 36, "beta_3": [3, 43, 44], "beta_i": [3, 36, 43, 44], "beta_j": [13, 34], "beta_k": 13, "beta_linreg": 13, "beta_m": 10, "beta_mg_m": 10, "beta_n": 3, "better": [0, 1, 2, 3, 4, 6, 9, 10, 11, 12, 13, 19, 20, 22, 23, 33, 34, 36, 37, 41, 42, 43, 44], "between": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 12, 13, 14, 15, 16, 17, 18, 19, 22, 23, 24, 27, 28, 30, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "beyond": [0, 1, 5, 6, 8, 13, 33, 34, 35, 36, 41], "bf": [13, 14, 26, 30, 35], "bf5400": [], "bg": 33, "bgd": [13, 36], "bia": [0, 1, 2, 3, 5, 8, 9, 10, 12, 13, 20, 21, 22, 24, 28, 33, 34, 35, 39, 40, 41, 42, 43, 44], "bias": [1, 2, 3, 5, 6, 9, 12, 19, 21, 22, 28, 36, 37, 39, 42, 43, 44], "bib": [], "bibliographi": [27, 28], "bibtex": [], "big": [0, 1, 2, 5, 6, 14, 19, 36, 37, 41, 42, 43], "bigger": [1, 6, 34, 41], "bigl": 23, "bigr": [12, 23, 39], "bike": 9, "bilbo": 33, "billion": [3, 12, 25, 36, 39, 40, 43, 44], "bin": [7, 30, 39], "binari": [0, 3, 5, 7, 9, 10, 12, 23, 28, 33, 38, 39, 42, 44], "binary_cross_entropi": [38, 39], "binary_result": [38, 39], "binarycrossentropi": 4, "bind": 0, "binomi": [25, 30, 33], "binsboot": [6, 37], "bioinformat": 0, "biolog": [1, 12, 39, 40, 41], "bios1100": [25, 33], "bird": [0, 3, 44], "birth": 33, "bishop": [32, 33], "bit": [1, 4, 19, 21, 26, 30, 33, 41, 42], "bitwis": 30, "bivari": [2, 42, 43], "bk": [13, 36], "bla": [26, 33], "black": [8, 9, 14], "blame": [], "block": [6, 10, 25, 26, 30, 33, 37, 38, 43, 44], "blockquot": [], "blog": [28, 33], "blogpost": 4, "blue": [0, 3, 43, 44], "bm": [], "bmatrix": [0, 1, 3, 5, 7, 8, 11, 13, 26, 33, 34, 35, 36, 38, 39, 40, 41, 43, 44], "bmi": [1, 41], "bnb2fevkeeo": 43, "bodi": [0, 1, 4, 12, 39, 40, 41], "bold": 1, "boldfac": [0, 5, 16, 34, 35], "boldsymbol": [0, 1, 2, 3, 5, 6, 7, 8, 10, 11, 13, 14, 16, 17, 19, 27, 33, 35, 36, 38, 39, 40, 41, 42, 43, 44], "boltzmann": [12, 25, 33, 39, 40], "book": [17, 27, 28, 32, 33, 34, 37, 38, 42, 43, 44], "book1": 32, "bool": [], "boolean": [4, 17], "boost": [1, 9, 25, 33, 41], "boostrap": 10, "bootstrap": [1, 13, 19, 25, 27, 33, 36, 41, 42], "born": 36, "borrow": 33, "boston_dataset": [], "bot": 8, "both": [0, 1, 4, 5, 6, 8, 9, 10, 13, 14, 15, 16, 17, 19, 23, 25, 26, 27, 28, 30, 31, 33, 34, 35, 36, 37, 38, 39, 41, 42, 43, 44], "bottl": [7, 38, 39], "bottou": 36, "bound": [8, 12, 36, 39, 40, 41, 42], "boundari": [2, 4, 8, 11, 12, 42, 43], "bousquet": 36, "bower": [], "box": [4, 9, 21, 22], "boyd": [8, 13, 35], "bracket": [4, 30], "brain": [1, 7, 12, 38, 39, 40, 41, 42], "branch": [9, 33], "break": [0, 4, 6, 11, 14, 33, 36], "breast": [5, 7, 11, 39], "breviti": 13, "brew": [0, 25, 27, 33], "brg": 8, "brian": [], "brief": [27, 28, 34], "briefli": [0, 16, 19, 28, 33, 37], "bring": [0, 5, 6, 10, 28, 34, 36], "britt": [31, 33], "broad": 0, 
"broadcast": 21, "broadli": 33, "brought": [13, 25, 33], "brownle": 4, "browser": [15, 33], "brute": [3, 5, 11, 34, 40, 43, 44], "bsd": [], "budget": 36, "buffer_s": 4, "bug": [], "bugfix": [], "bui": 4, "build": [0, 4, 5, 6, 10, 16, 22, 26, 30, 33, 37, 38, 39, 40], "buildmodel_tutori": 28, "built": [1, 3, 4, 6, 37, 38, 41, 42, 43, 44], "bunch": 11, "bundl": [], "busi": [], "bxe2t": [39, 40, 41], "byte": [26, 33], "c": [0, 1, 2, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 19, 20, 21, 22, 24, 25, 26, 29, 30, 31, 32, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "c1": [8, 11], "c2": [8, 11], "c4a2f5": [], "c5e478": [], "c9d1d9": [], "c_": [8, 9, 10, 13, 30, 35, 36], "c_0": 30, "c_1": [12, 39], "c_2": [12, 39], "c_3": [12, 39], "c_4": [12, 39], "c_i": [12, 13, 36, 39], "c_k": 30, "ca": [1, 33], "caab6d": [], "cach": 10, "cal": [0, 8, 10, 12, 13, 35, 36, 40, 41, 42], "calcul": [0, 1, 2, 4, 5, 6, 8, 9, 10, 11, 12, 13, 14, 16, 19, 22, 26, 28, 30, 33, 36, 37, 38, 39, 40, 41, 42, 43], "california": [27, 28], "call": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 18, 19, 21, 23, 25, 26, 27, 28, 30, 31, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "callabl": [41, 42], "calor": [0, 34], "caltech": [], "cambridg": [13, 32, 35, 40, 41], "can": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 30, 31, 32, 34, 35, 39, 41, 42, 43, 44], "cancel": [0, 13, 33, 34], "cancer": [5, 10, 23, 39], "cancerpd": [7, 39], "candid": [8, 9, 10, 36], "cannot": [0, 1, 4, 5, 6, 7, 8, 9, 27, 30, 34, 35, 36, 39, 41], "canopi": [0, 25, 27, 33], "canva": [15, 16, 19, 20, 24, 27, 28, 33], "cap": 5, "capabl": [0, 1, 8, 13, 25, 33], "capac": [2, 31, 42, 43], "capita": [], "caption": [20, 27, 28], "captur": [4, 11, 12, 23, 33, 39, 40], "car": [3, 4, 44], "card": [0, 7, 33, 38, 39], "cardin": [1, 41], "care": [11, 15, 19, 22, 36], "carefulli": [13, 36], "carlo": [0, 6, 25, 30, 32, 33, 37, 38], "carri": [2, 6, 7, 27, 37, 38, 39, 42, 43], "cart": 10, "case": [0, 1, 2, 3, 4, 5, 6, 7, 11, 12, 13, 14, 15, 16, 23, 25, 26, 27, 28, 33, 37, 40, 41, 42], "casella": 32, "cast": [1, 41], "cat": [3, 4, 44], "catch": 0, "categor": [0, 1, 3, 9, 11, 33, 38, 39, 41, 42, 44], "categori": [0, 1, 3, 7, 10, 12, 14, 33, 38, 39, 40, 41, 43, 44], "categorical_cross_entropi": [38, 39], "categorical_crossentropi": [1, 3, 41, 42, 44], "caus": [0, 5, 6, 30, 33, 34, 35, 36, 37, 38], "causal": 0, "causat": [0, 33], "cax": 1, "cb": [6, 33], "cbar": 1, "cc": [0, 1, 5, 13, 23, 33, 34, 35, 36, 40, 41, 42], "cc398b": [], "ccbb44": [], "ccc": [5, 12, 23, 35, 39], "cdf": 30, "cdot": [0, 2, 6, 12, 13, 14, 26, 30, 33, 35, 36, 37, 39, 42, 43], "celebr": [13, 35], "cell": [4, 21, 22], "center": [0, 1, 6, 7, 8, 9, 11, 14, 18, 27, 30, 33, 34, 36, 37, 38, 39, 41], "central": [0, 3, 5, 6, 8, 16, 20, 24, 26, 28, 33, 34, 40, 41, 43, 44], "centroid": [14, 30], "centroid_differ": 14, "centuri": [3, 43, 44], "certain": [0, 3, 6, 7, 9, 21, 30, 33, 34, 37, 38, 39, 43, 44], "certainti": 37, "cf": [], "cf222e": [], "cffi": [], "cg": 13, "cha": [], "chain": [0, 1, 13, 22, 25, 30, 33], "challeng": [15, 40], "chanc": [1, 5, 13, 30, 36, 41], "chang": [0, 1, 2, 3, 4, 5, 6, 8, 9, 11, 12, 13, 14, 15, 16, 19, 21, 22, 26, 27, 28, 30, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43], "changeabl": 28, "changelog": [], "channel": [3, 43, 44], "chap4": [40, 41], "chapter": [0, 6, 10, 11, 16, 17, 19, 26, 27, 28, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "chapter3": [0, 27], "charact": [0, 3, 5, 33, 34, 35, 43, 44], 
"character": [8, 9, 10, 12, 30, 39, 41, 42], "characterist": [0, 1, 3, 10, 13, 23, 33, 41, 42, 44], "charg": [0, 33], "charl": [], "charset": [], "chart": 23, "chase": 4, "chatgpt": [15, 27, 28], "chd": [7, 38], "chddata": [7, 38], "cheap": [5, 34, 35, 36], "cheaper": [1, 13, 36, 41], "check": [1, 3, 4, 5, 11, 13, 15, 16, 19, 21, 22, 26, 33, 36, 38, 39, 41, 42], "checkmark": 3, "checkpoint": 4, "checkpoint_dir": 4, "checkpoint_prefix": 4, "chen": 10, "cheng": 34, "chiaramont": [2, 42, 43], "childcar": 16, "children": 16, "choic": [0, 1, 2, 3, 4, 6, 9, 12, 13, 14, 20, 26, 28, 33, 34, 35, 36, 37, 38, 39, 42, 43, 44], "choleski": [5, 26, 34, 35], "choos": [2, 3, 6, 9, 10, 11, 13, 14, 15, 18, 19, 21, 27, 28, 35, 37, 38, 39, 42, 43, 44], "chosen": [0, 1, 2, 6, 8, 9, 10, 13, 16, 30, 33, 35, 36, 37, 38, 41, 42, 43], "chosen_datapoint": [1, 41], "christian": 32, "christoph": [32, 33], "chunk": 36, "cifar": [3, 44], "cifar10": [3, 44], "circ": [1, 12, 36, 40, 41], "circl": [0, 8, 12, 34, 36, 39, 40], "circuit": 3, "circumfer": 9, "circumv": [1, 5, 13, 34, 35, 36, 41], "citat": [], "cite": [20, 27, 28], "ckpt": 4, "cl": [38, 39], "claim": 24, "clarifi": 21, "clariti": 30, "class": [0, 1, 3, 4, 6, 7, 8, 9, 11, 12, 13, 21, 22, 23, 30, 33, 37, 41, 42, 43, 44], "class0": [38, 39], "class1": [38, 39], "class_nam": [3, 9, 44], "class_to_index": [38, 39], "class_val": 9, "class_valu": 9, "classic": [7, 9, 13, 28, 39], "classif": [0, 3, 5, 6, 7, 8, 11, 12, 21, 23, 24, 25, 27, 32, 33, 34, 37, 43, 44], "classifi": [0, 1, 4, 7, 9, 10, 11, 23, 28, 33, 39, 41, 42], "classificaton": [1, 41], "classifii": 10, "claus": [], "clean": [1, 41], "clear": [1, 5, 10, 12, 13, 36, 41, 42], "clearli": [0, 3, 5, 6, 7, 8, 30, 34, 35, 37, 38, 39, 43, 44], "clever": [1, 10, 41], "clf": [0, 6, 8, 9, 10, 33, 34], "clf3": 0, "clf_lasso": 6, "clf_ridg": 6, "cli": 15, "click": [], "climb": 23, "clip": [3, 30, 36, 38, 39, 43, 44], "clock": 36, "clone": [15, 31], "close": [0, 1, 2, 4, 6, 8, 9, 11, 12, 13, 14, 18, 24, 30, 32, 33, 35, 36, 37, 39, 40, 41, 42, 43, 44], "closer": [3, 5, 13, 34, 35, 36], "closest": [8, 11, 13, 14], "closur": [25, 33], "cloud": [25, 33], "cluster": [0, 1, 4, 6, 11, 25, 33, 37, 38, 39, 41], "cluster_label": 14, "cm": [1, 2, 3, 6, 8, 13, 35, 36, 41, 42, 43, 44], "cmap": [0, 1, 2, 3, 4, 6, 8, 9, 10, 33, 41, 42, 43, 44], "cmap_arg": 6, "cmd": [9, 15], "cn_": 30, "cnn": [12, 24, 39, 40], "cnn_kera": [3, 44], "cntk": [25, 33], "co": [0, 2, 3, 6, 9, 13, 33, 37, 38, 42], "code": [0, 3, 4, 6, 7, 8, 18, 19, 21, 22, 23, 25, 26, 30, 32], "codebas": [41, 42], "codec": [], "coef": [0, 33], "coef0": 8, "coef_": [0, 5, 6, 8, 9, 13, 16, 33, 34, 35, 36], "coeff": 5, "coeffici": [0, 3, 5, 6, 7, 8, 9, 13, 18, 26, 33, 34, 36, 37, 38, 39, 43, 44], "coerc": [0, 6, 33, 37, 38], "coin": [10, 30], "coin_toss": 10, "col": [0, 11, 33, 34], "colab": [21, 22, 25, 33], "cold": 9, "colinear": [], "collabor": [20, 27, 28], "collaps": 8, "collect": [2, 6, 10, 11, 17, 25, 30, 32, 33, 37, 38, 40, 43], "collinear": [5, 34, 35], "color": [0, 3, 4, 6, 8, 9, 10, 30, 36, 43, 44], "color_channel": [3, 44], "color_cod": 6, "colorbar": [1, 6, 20], "coloumn": [41, 42], "colsample_bytre": 10, "colsaobject": 10, "column": [0, 1, 2, 5, 6, 7, 8, 9, 11, 12, 16, 17, 18, 19, 26, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "columntransform": 9, "com": [4, 6, 15, 16, 19, 20, 21, 22, 25, 27, 28, 32, 33, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "combin": [1, 2, 5, 6, 7, 10, 15, 18, 22, 23, 30, 37, 38, 41, 42, 43, 44], "come": [0, 1, 3, 4, 5, 12, 13, 14, 15, 
23, 24, 28, 33, 34, 35, 36, 39, 40, 41, 42, 43, 44], "comfort": [], "command": [0, 1, 15, 41, 42], "comment": [0, 4, 5, 6, 20, 24, 27, 28], "commerci": [0, 25, 27, 33], "commit": 15, "commod": [0, 33], "common": [0, 1, 3, 5, 6, 7, 9, 11, 13, 14, 16, 23, 27, 28, 30, 33, 34, 35, 36, 37, 38, 39, 40, 41], "commonli": [0, 1, 4, 6, 7, 9, 13, 14, 34, 36, 37, 38, 39, 41], "commonmark": [], "commun": [0, 12, 15, 27, 39, 40], "commut": 3, "commutatitav": 3, "compact": [0, 1, 3, 5, 6, 7, 9, 11, 12, 13, 14, 21, 33, 34, 37, 43, 44], "compair": 0, "compar": [0, 3, 4, 5, 6, 11, 13, 18, 23, 26, 27, 28, 33, 34, 35, 36, 37, 38, 40], "comparison": [2, 4, 13, 28, 42, 43], "compat": [7, 38, 39], "compens": 36, "compet": 0, "competit": 10, "compil": [0, 1, 3, 4, 13, 25, 26, 33, 41, 42], "compl": 21, "complet": [0, 2, 3, 4, 9, 12, 15, 16, 17, 18, 19, 20, 21, 22, 24, 33, 39, 42, 43, 44], "completenn": [12, 39], "complex": [1, 5, 8, 9, 11, 12, 13, 16, 19, 28, 33, 35, 36, 37, 38, 41], "complianc": [], "complic": [0, 1, 9, 13, 27, 28, 33, 35, 36, 37, 38, 41], "compoment": 34, "compon": [0, 1, 3, 4, 5, 6, 7, 9, 14, 16, 25, 33, 34, 35, 37, 39, 40, 41, 42, 43, 44], "components_": 11, "compos": [9, 12, 13, 14, 25, 33, 39, 40, 42, 44], "compphys": [0, 6, 16, 20, 25, 27, 28, 29, 31, 32, 33, 34, 35, 38, 39, 41, 42, 43, 44], "compress": [0, 33, 34, 44], "compresseds": 43, "compris": 6, "compromis": [5, 23, 34, 35], "compulsori": [25, 33], "comput": [0, 1, 2, 3, 4, 5, 6, 7, 8, 10, 11, 12, 13, 15, 16, 17, 18, 21, 22, 23, 25, 26, 27, 28, 29, 30, 32, 33, 34, 35, 37, 38, 39, 40, 41, 42, 43, 44], "computation": [0, 3, 6, 9, 13, 30, 33, 35, 36, 40, 44], "computationalscienceuio": 33, "compute_gradi": 22, "computerlab": [27, 28], "con": 28, "concaten": [2, 4, 6, 14, 38, 39, 42, 43], "concav": [1, 13, 34, 35], "concentr": 10, "concept": [0, 2, 23, 25, 33, 34, 42, 43], "conceptu": [12, 13, 35, 39, 40], "concern": [0, 1, 4, 7, 33, 35, 38, 39, 41, 43, 44], "concic": 33, "conclud": [0, 5, 13, 36], "conclus": [1, 41], "cond": [2, 42, 43], "conda": [0, 1, 25, 27, 33, 41, 42], "condis": 34, "condit": [0, 2, 4, 5, 6, 8, 9, 11, 13, 30, 33, 34, 36, 37, 42, 43], "conduct": 25, "condwav": [2, 42], "confid": [0, 5, 6, 7, 8, 19, 23, 33, 34, 38, 39], "config": 42, "configur": [3, 23, 42, 44], "confirm": [5, 12, 21, 39], "conform": [], "confus": [5, 6, 7, 10, 26, 28, 34, 37], "confusion_matrix": 9, "congruenti": 30, "conjug": [4, 8], "conjugaci": 13, "conjunct": [3, 43, 44], "connect": [0, 1, 3, 4, 9, 11, 12, 13, 24, 26, 33, 34, 35, 39, 40, 41, 42, 43, 44], "consensu": 36, "consequ": [5, 6, 8, 10, 12, 13, 34, 35, 36, 37], "consequenti": [], "conserv": [5, 14, 34, 35], "consid": [0, 1, 2, 3, 5, 6, 7, 8, 9, 10, 12, 13, 16, 19, 24, 26, 27, 28, 30, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "consider": [0, 1, 5, 13, 33, 34, 35, 37], "consist": [1, 2, 3, 4, 6, 12, 13, 27, 28, 30, 34, 35, 37, 38, 39, 40, 41, 42, 43, 44], "consol": [], "const": [], "constant": [0, 2, 4, 5, 6, 8, 12, 13, 16, 18, 30, 33, 34, 35, 36, 39, 40, 41, 42, 43, 44], "constitu": [0, 33], "constitut": [2, 6, 37, 38, 42, 43, 44], "constrain": [1, 3, 5, 7, 11, 35, 38, 41, 43, 44], "constraint": [5, 6, 8, 13, 34, 35, 37], "construct": [0, 1, 2, 3, 5, 6, 7, 8, 9, 10, 11, 23, 26, 30, 33, 34, 37, 39, 43, 44], "constructor": [], "consult": 28, "consum": 36, "contact": [0, 33], "contain": [0, 2, 3, 4, 5, 6, 7, 8, 9, 11, 12, 13, 15, 18, 19, 21, 26, 27, 28, 30, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "contemporari": 33, "content": [1, 15, 20, 25, 26, 33, 35, 36, 43, 44], 
"context": [6, 10, 13, 22, 27, 35, 36, 37, 38, 40], "contigu": 26, "contin": 19, "continu": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 13, 19, 24, 26, 27, 28, 30, 33, 34, 35, 36, 37, 38, 40, 41, 42, 43, 44], "contour": [9, 10, 13], "contourf": [8, 9, 10], "contract": [], "contrast": [1, 4, 9, 10, 12, 33, 36, 39, 40, 41], "contribut": [0, 3, 5, 13, 18, 30, 33, 34, 35, 36, 43, 44], "contributor": [0, 27], "control": [0, 1, 3, 9, 13, 15, 23, 25, 33, 41, 44], "conv": [3, 4, 43, 44], "conv1": 44, "conv2": 44, "conv2d": [3, 4, 44], "conv2dtranspos": 4, "convei": 33, "conveni": [5, 6, 12, 13, 26, 27, 28, 33, 35, 36, 37, 39], "convent": [12, 34], "converg": [1, 2, 4, 5, 8, 13, 14, 18, 34, 35, 40, 41, 42, 43], "convergencewarn": [], "convers": [20, 36], "convert": [0, 1, 4, 5, 9, 11, 13, 26, 33, 34, 35, 38, 39, 43], "converttomatrix": 4, "convex": [4, 5, 7, 34, 38, 39], "convinc": [13, 23, 35, 43, 44], "convolut": [1, 4, 24, 25, 33, 41], "cool": [4, 9], "coolwarm": 6, "coordin": [5, 12, 14, 34, 35, 36, 39], "coorel": [], "copi": [0, 1, 14, 15, 34, 38, 39, 41, 42], "copyright": [], "core": 10, "corel": 33, "corner": [43, 44], "coronari": [7, 38], "corr": [5, 7, 11, 34, 39], "correalt": [11, 25], "correct": [0, 1, 2, 3, 4, 5, 7, 13, 15, 19, 20, 21, 22, 23, 26, 30, 33, 34, 35, 37, 38, 39, 41, 42, 43, 44], "correctli": [1, 2, 6, 7, 10, 18, 19, 21, 22, 23, 27, 28, 37, 38, 41, 42, 43], "correl": [0, 1, 3, 5, 6, 7, 10, 12, 13, 25, 30, 33, 35, 36, 37, 40], "correlation_matrix": [5, 7, 11, 34, 39], "correspond": [0, 3, 5, 6, 8, 9, 11, 12, 25, 26, 27, 28, 30, 33, 34, 35, 37, 39, 40, 43, 44], "cortex": [12, 39, 40], "cosin": [3, 6, 37, 38], "cost": [0, 2, 3, 5, 6, 7, 8, 9, 12, 13, 16, 17, 18, 19, 21, 22, 24, 27, 28, 33, 44], "cost_autograd": 22, "cost_deep_grad": [2, 42, 43], "cost_der": 22, "cost_fun": 22, "cost_func": [41, 42], "cost_func_deriv": [41, 42], "cost_funct": [2, 42, 43], "cost_function_deep": [2, 42, 43], "cost_function_deep_grad": [2, 42, 43], "cost_function_grad": [2, 42, 43], "cost_function_train": [41, 42], "cost_function_v": [41, 42], "cost_grad": [2, 22, 42, 43], "cost_histori": [], "cost_ol": [], "cost_one_lay": 22, "cost_ridg": [], "cost_sum": [2, 42, 43], "cost_two_lay": 22, "costcrossentropi": [41, 42], "costli": 36, "costlogreg": [41, 42], "costol": [13, 36, 41, 42], "could": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 15, 16, 17, 18, 24, 26, 27, 28, 30, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "coulomb": [0, 33], "count": [0, 9, 15, 23, 27, 28, 29, 30, 31, 33], "counter": [27, 28], "counteract": 36, "counterpart": 33, "countor": 13, "coupl": [4, 5, 6, 21, 37], "cours": [0, 1, 3, 5, 11, 15, 16, 17, 19, 20, 21, 24, 27, 28, 31, 34, 37, 38, 41, 42], "coursework": 15, "courvil": [28, 32, 33, 34, 36], "cov": [5, 6, 11, 26, 30, 33, 34, 37], "cov_xi": [5, 11, 34], "cov_xx": [5, 11, 34], "cov_yi": [5, 11, 34], "covari": [0, 7, 25, 26, 33, 35, 39], "covariance_matrix": [5, 11, 14], "cover": [0, 5, 25, 27, 28, 31, 32, 34, 35, 37], "covert": [0, 33], "covxi": 30, "covxx": 30, "covxz": 30, "covyi": 30, "covyz": 30, "covzz": 30, "cpu": [1, 41, 42, 44], "cqofi41lfdw": [40, 41], "craft": [3, 43, 44], "crash": 36, "creat": [1, 3, 4, 5, 9, 10, 11, 12, 15, 18, 19, 21, 22, 24, 25, 33, 36, 38, 39, 40, 41, 42, 44], "create_biases_and_weight": [1, 41], "create_convolutional_neural_network_kera": [3, 44], "create_lay": [21, 22], "create_layers_batch": 21, "create_neural_network_kera": [1, 41, 42], "create_x": [5, 11, 41, 42], "creation": [], "credit": [0, 7, 31, 33, 38, 39], "crim": [], "crime": [], 
"criteria": [0, 4, 9, 10, 14, 30, 33], "criterion": [9, 10, 13, 18, 35, 36, 40, 42, 44], "critic": [6, 27, 34], "critiqu": [27, 28], "cross": [0, 1, 3, 7, 9, 10, 13, 15, 21, 22, 23, 25, 28, 30, 33, 34, 35, 36, 41, 42], "cross_entropi": [4, 21], "cross_val_scor": [6, 37, 38], "cross_valid": [7, 10, 23, 39], "crossentropyloss": [42, 44], "crossvalid": [6, 37, 38], "crucial": [1, 30, 36, 41], "cs231": 3, "cs231n": 42, "cs231n_2017_lecture4": 42, "csr_matrix": [26, 33], "css": [], "csv": [0, 4, 6, 7, 9, 37, 38, 39], "ctnk": [1, 41, 42], "cube": 40, "cubic": 0, "cuda": [42, 44], "culprit": [], "cumbersom": [5, 37], "cuml": 23, "cumprod": [], "cumsum": [10, 11, 33], "cumul": [7, 10, 30, 36], "cumulative_heads_ratio": 10, "cuomo": [42, 43], "cup": 5, "current": [1, 2, 3, 4, 13, 14, 15, 16, 32, 35, 36, 38, 39, 41, 42, 43, 44], "curs": [0, 34], "curv": [6, 7, 10, 12, 27, 38, 39, 41], "curvatur": [13, 35, 36], "custom": [6, 14], "custom_cmap": [9, 10], "custom_cmap2": [9, 10], "custom_lin": [], "cut": 23, "cutpoint": 9, "cv": [6, 7, 10, 23, 37, 38, 39], "cvxbook": [13, 35], "cvxopt": [5, 8, 34], "cybenko": 40, "cycl": [1, 12, 39, 40, 41], "cycler": [], "d": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 14, 15, 16, 17, 19, 20, 21, 22, 24, 26, 30, 31, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "d1": [], "d166a3": [], "d2": [], "d2_g_t": [2, 42, 43], "d2a8ff": [], "d4d0ab": [], "d71835": [], "d9dee3": [], "d_1": [43, 44], "d_2": [43, 44], "d_f": [13, 35], "d_g_t": [2, 42, 43], "d_net_out": [2, 42, 43], "da": [3, 22, 40, 43, 44], "da_1": 22, "dagger": [5, 26, 34, 35], "dai": [1, 9, 25, 41], "damag": [], "damp": 3, "darget": 9, "darkr": 30, "dat": [0, 33], "dat_id": [0, 6, 7, 9, 33, 37, 38], "data": [2, 4, 5, 8, 10, 12, 13, 14, 16, 19, 20, 22, 23, 24, 26, 27, 28, 32, 35, 36, 37, 43], "data1": 14, "data2": 14, "data3": 14, "data4": 14, "data_id": [0, 6, 7, 9, 33, 37, 38], "data_indic": [1, 41], "data_panda": 33, "data_path": [0, 6, 7, 9, 33, 37, 38], "databas": [1, 41, 42], "datafil": [0, 6, 7, 9, 33, 37, 38], "datafram": [0, 4, 5, 7, 9, 11, 33, 34, 39], "dataload": [42, 44], "datapoint": [1, 5, 6, 7, 11, 13, 16, 35, 36, 37, 38, 41, 42], "datasci": [15, 16, 19], "dataset": [0, 4, 6, 7, 8, 9, 10, 11, 13, 14, 16, 21, 22, 23, 27, 28, 33, 35, 36, 37, 38, 39, 42], "datatyp": 4, "date": [15, 18, 21, 22, 23, 24, 27, 28, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "daughter": 10, "davi": [], "david": 32, "davison": [37, 38], "db": [22, 40, 43], "db_1": 22, "dbb7ff": [], "dbh": [1, 41], "dbo": [1, 41], "dc": 22, "dc5e85cd93c3": 28, "dc_da": 22, "dc_da1": 22, "dc_da2": 22, "dc_db": 22, "dc_db1": 22, "dc_db2": 22, "dc_dw": 22, "dc_dw1": 22, "dc_dw2": 22, "dc_dz": 22, "dc_dz1": 22, "dc_dz2": 22, "dcc6e0": [], "dcomposit": 26, "ddot": [2, 42, 43], "de": 36, "dead": [1, 41, 42], "deadlin": [15, 20, 21, 22, 23, 24], "deal": [0, 1, 3, 5, 6, 8, 11, 13, 14, 19, 26, 30, 33, 34, 35, 36, 40, 41, 42, 44], "dealt": 0, "debt": [7, 38, 39], "debug": [0, 5, 6, 34, 35, 36, 37, 38, 41, 42], "debugg": [], "decad": [0, 3, 36, 43, 44], "decai": [0, 13, 30, 33], "decemb": [31, 33], "decent": 10, "decid": [0, 2, 3, 5, 6, 9, 18, 34, 35, 36, 37, 38, 41, 42, 43, 44], "decim": [0, 33, 41, 42], "decis": [0, 1, 8, 11, 25, 32, 33, 41], "decision_funct": 8, "decision_tre": 9, "decisiontreeclassifi": [9, 10], "decisiontreeregressor": [0, 9, 10], "declar": [0, 4, 20, 23, 26, 33], "declare_namespac": [], "decompos": [5, 6, 26, 34, 35, 40], "decomposit": [0, 6, 12, 33, 39, 40, 43], "decompost": [5, 34, 35], "deconvolut": [3, 43, 44], 
"decorrel": [10, 13, 36], "decreas": [1, 2, 4, 5, 6, 10, 11, 13, 19, 23, 24, 35, 36, 37, 38, 41, 42, 43], "dedic": 20, "deduc": [0, 33], "deep": [3, 7, 12, 13, 24, 25, 28, 32, 34, 35], "deep_neural_network": [2, 42, 43], "deep_param": [2, 42, 43], "deep_tree_clf": [9, 10], "deep_tree_clf1": 9, "deep_tree_clf2": 9, "deepcopi": [41, 42], "deepen": [5, 25, 33], "deeper": [0, 3, 4, 33, 44], "deepimag": 42, "deeplearningbook": [28, 32, 33, 35, 36], "deer": [3, 44], "def": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 14, 16, 17, 21, 22, 30, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "def_covari": 30, "default": [0, 1, 2, 4, 6, 7, 26, 28, 33, 34, 38, 39, 41, 42, 43], "default_tim": 4, "defect": [5, 34, 35], "defici": [5, 34, 35], "defin": [0, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 16, 17, 18, 19, 21, 22, 23, 26, 27, 30, 34, 35, 36, 37, 38, 39, 44], "definit": [1, 2, 5, 6, 7, 8, 10, 11, 12, 13, 23, 26, 30, 34, 35, 36, 37, 38, 39, 42, 43], "defint": 30, "defualt": [41, 42], "degre": [3, 5, 6, 8, 9, 10, 11, 15, 16, 19, 20, 27, 30, 33, 35, 36, 37, 38, 43, 44], "deisenroth": 34, "del": 1, "delet": [6, 15], "delimit": 4, "deliv": [15, 27, 28, 29, 33], "delta": [0, 2, 3, 6, 8, 12, 13, 14, 33, 36, 40, 41, 42, 43, 44], "delta_": [1, 26, 40, 41], "delta_0": [3, 40, 43, 44], "delta_1": [3, 40, 41, 43, 44], "delta_2": [3, 40, 41, 43, 44], "delta_2a_1": [40, 41], "delta_3": [3, 43, 44], "delta_4": [3, 43, 44], "delta_5": [3, 43, 44], "delta_h": [0, 1, 33, 41], "delta_i": [40, 41, 43, 44], "delta_j": [3, 12, 40, 41, 42, 43, 44], "delta_k": [12, 40, 41, 42], "delta_l": [1, 3, 41, 43, 44], "delta_matrix": [41, 42], "delta_momentum": [13, 36], "delta_n": [0, 3, 33], "delug": 25, "delv": 0, "demand": [13, 35], "demonstr": [0, 3, 5, 6, 7, 11, 12, 19, 25, 33, 34, 35, 36, 37, 38, 39, 41, 42, 43, 44], "demystifi": [39, 40, 41], "den": 4, "denomin": [1, 5, 36, 41], "denot": [1, 2, 6, 7, 13, 30, 35, 36, 38, 39, 41, 42, 43], "dens": [1, 3, 4, 41, 42, 43], "densiti": [0, 2, 6, 30, 37, 38, 42, 43], "depart": [31, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "depend": [0, 1, 2, 4, 5, 6, 7, 8, 11, 12, 13, 15, 16, 25, 26, 27, 30, 33, 34, 35, 36, 38, 39, 40, 41, 42, 43], "depict": 30, "deploy": [0, 25, 27, 33], "depth": [0, 3, 9, 10, 26, 37, 41, 43, 44], "der": [], "deriv": [0, 1, 2, 6, 7, 8, 10, 11, 13, 18, 22, 25, 27, 28, 33, 38, 39, 42, 43], "derivati": 13, "derivative_fn": 13, "derivb1": [40, 41], "derivb2": [40, 41], "derivw1": [40, 41], "derivw2": [40, 41], "descend": [5, 9, 11, 34, 35, 43, 44], "descent": [0, 1, 3, 7, 8, 12, 22, 24, 28, 33, 34, 38, 40, 41, 44], "describ": [0, 2, 4, 5, 6, 8, 10, 11, 12, 13, 19, 20, 23, 24, 26, 27, 28, 33, 36, 37, 39, 40, 42, 43, 44], "descript": [0, 8, 9, 20, 27, 28, 33, 41, 42], "design": [0, 1, 3, 4, 5, 6, 7, 10, 11, 12, 13, 17, 18, 27, 28, 33, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "designmatrix": [0, 33], "desir": [0, 2, 4, 5, 13, 14, 33, 34, 35, 36, 41, 42, 43], "desktop": 15, "despit": [1, 12, 36, 39, 41], "destroi": 26, "det": [5, 26, 34, 35], "detail": [0, 6, 11, 13, 14, 18, 21, 22, 26, 27, 34, 35, 36, 41], "detect": [3, 8, 12, 39, 40, 43, 44], "determin": [0, 2, 3, 4, 5, 6, 8, 9, 10, 11, 12, 13, 18, 23, 26, 30, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "determinist": [7, 13, 30, 35, 36, 38, 40], "deternin": 40, "dev": [1, 27, 28, 41], "develop": [0, 3, 5, 8, 10, 11, 12, 25, 26, 27, 28, 33, 34, 39, 40, 42, 43, 44], "deviat": [0, 1, 2, 4, 5, 6, 17, 18, 19, 27, 30, 33, 34, 36, 37, 38, 41, 42, 43], "devic": [42, 44], "devis": [12, 39, 40], "df": [4, 8, 11, 13, 
23, 33, 40], "df1": 33, "di": [], "diag": [5, 8, 34, 35, 36, 43], "diagnost": [1, 10, 41], "diagon": [0, 5, 7, 13, 18, 19, 23, 26, 30, 33, 34, 35, 36, 38, 39, 41, 42, 43, 44], "diagonaliz": [5, 34, 35], "diagram": 10, "diagsvd": 6, "dice": [6, 30, 37], "dict": [6, 8, 41, 42], "dictionari": [41, 42], "did": [0, 1, 5, 6, 7, 10, 11, 14, 16, 22, 27, 28, 33, 37, 38, 39, 41, 42, 43], "didn": 42, "die": [1, 41, 42], "diff": [2, 40, 42], "diff1": [2, 42, 43], "diff2": [2, 42, 43], "diff_ag": [2, 42, 43], "diffeent": 8, "differ": [0, 1, 2, 3, 4, 5, 6, 7, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 21, 22, 23, 24, 25, 26, 27, 30, 32, 33, 34, 35, 37, 38, 39, 40, 41, 42], "different": [40, 41, 42], "differenti": [0, 3, 16, 21, 22, 25, 26, 28, 33, 34, 35, 39, 41, 44], "difficult": [0, 1, 6, 10, 13, 30, 33, 36, 37, 38, 41], "difficulti": [0, 1, 13, 33, 35, 36, 41], "diffonedim": [2, 42, 43], "digit": [0, 1, 3, 4, 6, 23, 28, 31, 33, 41, 42, 44], "digress": 40, "dilemma": [13, 36], "dilut": [1, 41], "dim": [4, 11, 14, 26, 41, 42], "dimens": [0, 1, 2, 3, 4, 5, 8, 11, 14, 16, 26, 33, 34, 35, 40, 41, 42, 43, 44], "dimension": [0, 4, 5, 6, 9, 11, 13, 14, 19, 25, 26, 27, 28, 33, 34, 35, 36, 37], "dimensionless": [0, 3, 33], "diment": 26, "diminish": 36, "dimnsion": 4, "diod": 3, "direct": [0, 1, 2, 4, 11, 12, 13, 14, 24, 33, 34, 35, 36, 39, 40, 41, 42, 43, 44], "directli": [1, 4, 5, 6, 18, 22, 30, 34, 35, 41, 42], "directori": [], "disabl": 42, "disadvantag": [0, 28, 33, 36], "disappear": [3, 6, 37], "disc_loss": 4, "disc_tap": 4, "discard": [6, 11, 36, 37, 38], "disciplin": [0, 3, 12, 39, 40, 43, 44], "disclaim": 30, "discontinu": 40, "discord": [21, 33], "discourag": [13, 15, 35], "discov": [0, 33], "discover": 5, "discret": [1, 3, 5, 7, 13, 38, 39, 41], "discrimin": [4, 7, 10, 11, 23, 38, 39], "discriminator_loss": 4, "discriminator_loss_list": 4, "discriminator_model": 4, "discriminator_optim": 4, "discuss": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 16, 18, 19, 20, 23, 24, 25, 26, 27, 28, 30, 32, 33, 34, 35, 36, 37, 38, 40, 41, 42, 43, 44], "diseas": [7, 38, 39], "disguis": [6, 34, 36], "disk": 36, "disord": [1, 7, 38, 39], "dispai": [39, 40], "displai": [0, 1, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 14, 27, 30, 33, 34, 36, 37, 38, 39, 40, 41, 42, 44], "displaystyl": [0, 5, 17, 33, 34, 35, 36], "disregard": [0, 33], "dissimilar": [11, 14], "dist": 14, "distanc": [8, 9, 11, 14, 30], "distance_list": 9, "distinct": [3, 7, 8, 9, 10, 14, 38, 39, 43, 44], "distinctli": 8, "distinguish": [0, 4, 7, 8, 30, 33, 39], "distplot": [], "distribut": [0, 1, 4, 6, 7, 10, 11, 13, 14, 18, 19, 21, 25, 26, 27, 28, 33, 34, 35, 36, 38, 41], "distrubut": [0, 25, 27, 33], "div": [], "dive": [0, 8, 26, 33], "diverg": [1, 13, 35, 36, 41], "divid": [0, 1, 3, 5, 6, 7, 8, 9, 11, 12, 18, 19, 23, 24, 28, 30, 33, 34, 36, 37, 38, 39, 40, 41, 44], "divis": [6, 8, 9, 13, 18, 26, 30, 36, 37, 38, 40, 41, 42], "dl": [], "dm": [], "dna": [7, 38, 39], "dnn": [0, 1, 2, 4, 12, 33, 39, 40, 41, 42, 43], "dnn1": 4, "dnn2_gru2": 4, "dnn_kera": [1, 41, 42], "dnn_model": 1, "dnn_numpi": [1, 41], "dnn_scikit": [0, 1, 33, 41], "do": [0, 2, 3, 4, 5, 6, 8, 9, 10, 11, 12, 13, 14, 15, 16, 18, 19, 20, 21, 22, 23, 24, 26, 27, 28, 33, 34, 35, 37, 38, 40, 42, 44], "doc": [0, 15, 16, 19, 25, 27, 28, 29, 31, 32, 33, 41, 42], "document": [4, 13, 15, 24, 42], "docutil": [], "doe": [0, 1, 2, 3, 4, 5, 6, 8, 10, 11, 12, 13, 15, 16, 17, 18, 19, 21, 22, 24, 26, 27, 28, 30, 33, 36, 37, 38, 40, 41, 42, 43, 44], "doesn": [3, 9, 12, 24, 33, 36, 40, 41, 43, 44], "dog": [1, 3, 
4, 41, 44], "dollar": [], "domain": [5, 8, 13, 27, 28, 35, 37], "domcontentload": [], "domin": [0, 23, 33], "don": [0, 1, 3, 5, 6, 8, 11, 13, 15, 16, 21, 23, 24, 25, 27, 28, 33, 34, 36, 41, 42], "done": [0, 2, 3, 4, 5, 6, 9, 10, 11, 13, 16, 20, 22, 26, 27, 33, 34, 35, 36, 37, 38, 40, 41, 42, 43, 44], "dot": [0, 2, 3, 5, 6, 7, 8, 9, 10, 11, 12, 13, 18, 26, 27, 30, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42], "doubl": [3, 4, 16, 26, 33, 43, 44], "doubli": [1, 41], "doubt": [27, 28], "down": [0, 3, 6, 9, 11, 12, 13, 35, 36, 39, 42, 43], "download": [0, 1, 3, 5, 6, 15, 20, 26, 32, 33, 41, 42, 44], "downsampl": [3, 43, 44], "downscal": 28, "downsiz": 43, "dozen": [1, 41], "dq": [6, 37], "draft": [20, 24], "drag": 13, "dragon": [], "dramat": 11, "drastic": 4, "draw": [4, 6, 10, 13, 35, 37, 38], "drawback": [0, 1, 3, 13, 34, 35, 36, 41], "drawn": [1, 4, 6, 7, 11, 30, 33, 37, 38, 39, 41], "drive": [3, 4, 21, 22], "driven": 3, "drop": [0, 1, 5, 6, 11, 13, 30, 33, 34, 35, 37, 41], "dropna": [0, 6, 33, 37, 38], "dropout": [4, 44], "dt": [2, 3, 13, 30, 40, 42, 43], "dtype": [0, 1, 3, 4, 14, 26, 33, 38, 39, 40, 41, 42, 44], "dual": [], "dub": [0, 33], "duboi": [], "due": [1, 2, 5, 6, 8, 10, 12, 13, 18, 31, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43], "dugard": [], "dummi": [], "dumoulin": [43, 44], "dure": [0, 1, 3, 4, 8, 9, 11, 20, 25, 27, 28, 33, 36, 37, 38, 39, 41, 42, 43, 44], "dw": 22, "dw_1": 22, "dwell": [], "dwh": [1, 41], "dwo": [1, 41], "dx": [2, 3, 8, 30, 40, 42, 43], "dx_1": 30, "dx_1p": [6, 37], "dx_2p": [6, 37], "dx_mp": [6, 37], "dx_n": 30, "dxp": [6, 37], "dy": [1, 8, 30, 41, 42], "dynam": 4, "dz": [8, 22], "dz_1": 22, "dz_2": 22, "dzt6vm1wjh": 44, "e": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 30, 31, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "e1e1e1": [], "e_": [0, 2, 33, 42, 43], "e_z": 21, "each": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 18, 22, 23, 25, 26, 28, 29, 30, 31, 33, 34, 35, 36, 37, 39, 40, 41, 42, 43, 44], "eager": 37, "eapprox": [0, 33], "earli": [1, 13, 36, 41], "earlier": [0, 5, 7, 8, 9, 11, 12, 13, 19, 20, 21, 22, 33, 34, 38, 39, 40, 41], "earthexplor": 6, "eas": [6, 9, 14, 37], "easi": [0, 5, 6, 7, 8, 9, 10, 11, 12, 13, 15, 21, 22, 25, 26, 28, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42], "easier": [5, 6, 8, 9, 13, 15, 20, 21, 22, 27, 28, 30, 33, 34, 35, 37, 38], "easiest": [13, 18, 38, 39], "easili": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 26, 27, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "eastern": [31, 33], "ebind": [0, 33], "eblock": 9, "ec8e2c": [], "econom": [], "econometr": 33, "economi": 5, "ecosystem": [25, 33], "ect": 29, "edg": [3, 43, 44], "edgecolor": [6, 37, 38], "edit": [21, 22], "editor": [15, 20], "edu": [13, 27, 28, 35, 42], "educ": [0, 27, 28, 33, 37], "ee6677": [], "eff": 30, "effect": [1, 4, 10, 13, 16, 17, 18, 30, 36, 41, 42], "effic": [1, 41], "effici": [0, 3, 10, 13, 21, 22, 25, 26, 30, 33, 36, 38, 39, 40], "effort": 19, "efron": [6, 37, 38], "egrad": 13, "eig": [5, 11, 13, 26, 30, 33, 34, 35, 36], "eigen": 30, "eigenpair": [5, 11, 34, 35], "eigenvalu": [0, 5, 8, 11, 13, 26, 33, 34, 35, 36], "eigenvector": [5, 11, 13, 34, 35], "eight": [26, 33], "eigval": [26, 30, 33], "eigvalu": [11, 13, 35, 36], "eigvec": [26, 30, 33], "eigvector": [11, 13, 35, 36], "eir": [31, 33], "eispack": [26, 33], "either": [1, 5, 6, 7, 8, 9, 10, 11, 13, 18, 19, 27, 28, 30, 33, 34, 35, 36, 37, 38, 39, 40, 41, 43], "eivind": 31, "eivinsto": 31, "ekstr\u00f8m": 4, "elabor": 30, "elarn": 3, 
"electr": [0, 3, 12, 33, 39, 40], "electron": 33, "eleg": 11, "element": [1, 2, 3, 4, 5, 6, 7, 8, 11, 12, 13, 19, 20, 21, 24, 25, 26, 27, 28, 32, 34, 36, 37, 38, 39, 40, 41, 42, 43, 44], "elementari": [10, 13, 26, 40], "elementwis": [3, 13, 43, 44], "elementwise_grad": [2, 13, 22, 41, 42, 43], "elessar": 33, "elif": [14, 41, 42], "elim": 26, "elimin": [3, 8], "elin": [31, 33], "ell_": [], "ellipsi": 16, "els": [1, 3, 4, 7, 9, 12, 13, 16, 22, 26, 38, 39, 41, 42, 43, 44], "elu": 1, "elus": [0, 33], "em": 23, "email": [20, 21, 29, 31, 33], "emb": [], "embark": 40, "embed": [0, 11, 34], "embeddings_fig5_349758607": 28, "embodi": [6, 27, 37, 38], "emit": 30, "emner": 32, "emph": 36, "emphas": [0, 10, 25, 33], "emphasi": [0, 25, 32, 33], "empir": [1, 11, 30, 41], "emploi": [0, 1, 5, 6, 11, 13, 28, 30, 33, 34, 35, 37, 41, 44], "employ": 0, "empti": [6, 10, 15, 37, 38, 41, 42], "emul": [12, 39, 40], "en": [25, 27, 32], "enabl": [11, 36, 41, 42], "enbodi": [6, 37], "encod": [0, 3, 5, 9, 11, 14, 33, 34, 35, 38, 39, 43, 44], "encompass": [0, 27, 30], "encount": [0, 1, 5, 7, 13, 15, 21, 27, 30, 33, 34, 35, 36, 38, 39, 41], "encourag": [15, 27, 28], "end": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 20, 22, 23, 24, 26, 30, 31, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "endblock": [], "endfor": [], "endif": [], "endors": [], "endpoint": [3, 6], "energi": [0, 4, 6, 37, 38], "enforc": [12, 39, 40], "eng": 32, "engin": [0, 1, 3, 4, 25, 33, 41, 43, 44], "english": [27, 28], "enjoi": 36, "enocurag": [27, 28], "enorm": [3, 43, 44], "enough": [0, 6, 13, 28, 33, 35, 36, 37], "ensembl": [1, 9, 33, 41], "ensur": [0, 1, 2, 3, 5, 6, 11, 13, 18, 30, 34, 35, 36, 37, 38, 40, 41, 42, 43, 44], "entail": 33, "enter": [5, 6, 34, 35, 36], "enthought": [0, 25, 27, 33], "entir": [1, 3, 7, 9, 21, 25, 30, 33, 36, 38, 41, 42, 44], "entireti": [], "entiti": [9, 12, 26, 33], "entri": [0, 5, 8, 11, 12, 23, 26, 33, 34, 36, 37], "entropi": [1, 3, 7, 10, 13, 21, 22, 28, 33, 35, 36, 41, 42, 44], "enumer": [0, 1, 2, 3, 4, 6, 8, 21, 33, 34, 36, 38, 39, 41, 42, 43, 44], "env": 30, "environ": [2, 21, 22, 25, 27, 33, 42, 43], "environemnt": 15, "eo": [0, 6, 37, 38], "eol": 0, "eosfit": 0, "epoch": [0, 1, 3, 4, 12, 13, 21, 24, 33, 36, 38, 39, 41, 42, 44], "eppstein": [], "epsilon": [0, 5, 6, 7, 13, 27, 33, 34, 35, 36, 37, 38, 39, 40], "epsilon_": [0, 33], "epsilon_0": [0, 33], "epsilon_1": [0, 33], "epsilon_2": [0, 33], "epsilon_i": [0, 33, 34], "eq": [3, 13, 14, 26, 30, 35], "eqnarrai": [3, 5, 6, 37], "equal": [0, 1, 2, 3, 4, 5, 6, 8, 9, 11, 12, 13, 14, 16, 18, 23, 26, 27, 28, 30, 33, 34, 35, 36, 37, 38, 40, 41, 42, 43, 44], "equat": [1, 3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 14, 17, 19, 26, 30, 33, 36, 37, 44], "equilibrium": [2, 12, 39, 40, 42, 43], "equiv": [3, 13, 26, 30, 35, 36], "equival": [0, 1, 5, 7, 8, 11, 13, 23, 24, 25, 26, 28, 33, 34, 35, 36, 37, 41, 43, 44], "equivel": [19, 21, 22], "eqynreyrxni": 41, "eras": [], "erf": 30, "eriador": 33, "eric": [41, 42], "err": [0, 10, 43], "err_": [6, 37, 38], "err_sqr": [2, 42, 43], "errat": [13, 35, 36], "erron": [2, 42, 43], "error": [1, 2, 4, 5, 6, 7, 9, 11, 12, 13, 15, 16, 17, 18, 19, 21, 23, 25, 26, 27, 28, 30, 36, 39, 40, 41, 42, 43], "error_estimate_corr_tim": 30, "error_hidden": [1, 41], "error_output": [1, 41], "escap": [13, 35, 36], "escapehtml": [], "especi": [1, 3, 9, 12, 13, 15, 18, 27, 28, 36, 39, 40, 41, 42], "essenti": [0, 5, 6, 9, 10, 12, 14, 15, 24, 27, 28, 30, 34, 35, 36, 39, 40, 41, 43, 44], "establish": [0, 6, 10, 11, 16, 27, 28], "estim": [0, 1, 5, 6, 7, 10, 
11, 13, 25, 30, 33, 34, 35, 36, 38, 39, 41, 43], "estimated_mse_fold": [6, 37, 38], "estimated_mse_kfold": [6, 37, 38], "estimated_mse_sklearn": [6, 37, 38], "et": [0, 2, 4, 16, 17, 20, 28, 32, 33, 34, 35, 37, 38, 39, 40, 41, 42], "eta": [0, 1, 3, 8, 12, 13, 18, 28, 33, 35, 36, 40, 41, 42, 44], "eta0": [8, 13], "eta_": 13, "eta_j": 36, "eta_t": [13, 36], "eta_v": [0, 1, 3, 33, 41, 42, 44], "etc": [0, 1, 3, 5, 7, 8, 9, 11, 12, 13, 14, 25, 26, 27, 28, 30, 34, 35, 36, 38, 39, 41, 42], "ethic": 25, "etsim": 37, "euclidean": [0, 14, 34, 36], "euler": [], "eval": [42, 44], "evalu": [0, 2, 3, 4, 5, 6, 9, 13, 15, 16, 17, 19, 21, 23, 27, 30, 33, 34, 35, 36, 37, 38, 39, 42, 43], "evalut": [13, 27], "even": [0, 1, 3, 4, 5, 6, 8, 9, 10, 11, 12, 13, 14, 22, 25, 26, 28, 30, 33, 34, 35, 36, 37, 38, 39, 40, 41, 43, 44], "evenli": 4, "event": [5, 7, 10, 30, 37, 38], "eventu": [0, 5, 6, 11, 12, 13, 24, 27, 28, 31, 34, 35, 36, 37, 38, 39, 40], "everi": [0, 1, 2, 3, 4, 5, 6, 9, 10, 11, 12, 13, 14, 15, 21, 25, 30, 31, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "everyth": [4, 12, 16, 18, 21, 40, 41], "everywher": [4, 13, 35], "evolv": 0, "exact": [0, 5, 11, 12, 13, 26, 30, 33, 34, 36, 40, 41], "exactli": [0, 3, 4, 6, 12, 18, 25, 34, 36, 37, 39, 40, 41, 43, 44], "exam": 33, "examin": [6, 37, 38], "exampl": [0, 5, 11, 12, 13, 15, 16, 18, 20, 23, 25, 26, 27, 28, 30, 32], "exce": [1, 12, 13, 36, 39, 40, 41], "exceed": 36, "excel": [0, 1, 4, 5, 10, 20, 27, 28, 33, 34, 41], "except": [3, 4, 6, 8, 9, 26, 41, 42, 43, 44], "excess": [0, 33], "exchang": 36, "excit": 0, "exclud": [1, 6, 12, 27, 28, 34, 36, 37, 38, 39, 41], "exclus": [0, 1, 3, 6, 30, 33, 37, 38, 41, 42, 44], "execut": [2, 5, 13, 15, 34, 35, 36, 42, 43], "exemplari": [], "exemplifi": [13, 36], "exercic": [31, 33], "exercis": [5, 25, 27, 28, 29, 31, 33, 35, 36, 37, 38, 39, 41, 43, 44], "exercisesweek41": 28, "exercisesweek42": [28, 41], "exhaust": [6, 36, 37, 38], "exhibit": [0, 5, 6, 8, 33, 34, 37], "exist": [0, 1, 2, 3, 5, 6, 7, 8, 9, 13, 19, 26, 27, 28, 33, 35, 36, 37, 38, 41, 42, 43, 44], "exit": [5, 26, 34, 35], "exp": [0, 1, 2, 5, 6, 7, 8, 10, 11, 12, 13, 16, 17, 19, 21, 22, 30, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43], "exp_term": [1, 41], "exp_z": [38, 39], "expand": [5, 7, 11, 13, 35, 38, 39], "expans": [0, 3, 5, 8, 10, 12, 13, 33, 34, 35, 40], "expect": [0, 1, 5, 6, 7, 11, 12, 13, 15, 18, 23, 25, 27, 28, 33, 34, 36, 38, 40, 41], "expectation_value_of_h_wrt_p": 30, "expens": [6, 10, 13, 16, 35, 36], "experi": [0, 1, 6, 8, 13, 15, 25, 27, 33, 34, 35, 36, 37, 38, 41, 42, 43, 44], "experiment": [0, 4, 6, 9, 30, 33, 37, 38], "expert": [1, 9, 41], "explain": [0, 6, 9, 10, 11, 13, 16, 19, 24, 27, 28, 33, 35, 38, 39], "explained_variance_ratio_": 11, "explan": [], "explanatori": [0, 33], "explicit": [0, 3, 6, 13, 26, 27, 33, 34, 35, 36, 43, 44], "explicitli": [0, 4, 21], "explod": [1, 24, 40], "exploit": [0, 3, 12, 13, 33, 36, 39, 40, 43, 44], "explor": [1, 4, 6, 8, 13, 18, 25, 27, 28, 33, 35, 36, 41], "expon": [1, 41, 42], "exponenti": [0, 1, 5, 6, 10, 13, 30, 33, 35, 40, 41], "export": [9, 15, 16, 19, 20, 24, 38, 39], "export_graphviz": 9, "export_text": 9, "exporttext": 9, "expos": 25, "expr": 40, "express": [0, 2, 3, 5, 6, 7, 10, 12, 13, 18, 22, 26, 27, 28, 30, 33, 35, 36, 37, 42, 43, 44], "exptmean": 30, "exptvari": 30, "extend": [0, 2, 7, 11, 13, 25, 33, 36, 42, 43], "extend_path": [], "extens": [0, 12, 15, 25, 28, 33, 39, 40], "extent": [0, 1, 6, 32, 37, 38, 41, 42, 43, 44], "extern": [3, 6, 9], "extra": [1, 3, 5, 15, 31, 33, 34, 35, 41, 44], 
"extract": [0, 3, 5, 6, 7, 8, 11, 13, 16, 17, 26, 28, 33, 34, 38, 39, 40], "extrapol": [0, 33], "extrem": [0, 1, 4, 5, 6, 7, 8, 9, 13, 15, 16, 26, 34, 35, 36, 38, 41], "extremum": [13, 35], "extrins": 11, "ey": [0, 5, 6, 13, 14, 18, 26, 33, 34, 35, 36], "f": [0, 1, 2, 3, 4, 5, 6, 7, 8, 10, 12, 13, 14, 15, 16, 17, 18, 19, 22, 23, 24, 26, 30, 31, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "f1": 13, "f11": [0, 33], "f12": [0, 33], "f13": [0, 33], "f1_grad": 13, "f1d": 13, "f2": 13, "f26196": [], "f2_grad_x1": 13, "f2_grad_x1_analyt": 13, "f2_grad_x2": 13, "f2_grad_x2_analyt": 13, "f2f2f2": [], "f3": 13, "f3_grad": 13, "f3_grad_analyt": 13, "f4": 13, "f4_grad": 13, "f4_grad_analyt": 13, "f5": 13, "f5_grad": 13, "f5a394": [], "f5ab35": [], "f5f5f5": [], "f6": 13, "f6_for": 13, "f6_for_grad": 13, "f6_grad_analyt": 13, "f6_while": 13, "f6_while_grad": 13, "f7": 13, "f78c6c": [], "f7_grad": 13, "f7_grad_analyt": 13, "f8": 13, "f8_grad": 13, "f8f8f2": [], "f9": [0, 13, 33], "f9_altern": 13, "f9_alternative_grad": 13, "f9_grad": 13, "f_": [10, 23], "f_0": [3, 10], "f_1": [10, 13, 35], "f_2": [12, 13, 35, 39], "f_3": [12, 39], "f_d": 30, "f_grad": 13, "f_grad_analyt": 13, "f_i": [0, 6, 12, 16, 37, 38, 39], "f_m": [3, 10], "f_n": 3, "f_vec": [2, 42, 43], "face": [13, 33, 35], "facecolor": [6, 8, 30, 37], "facil": [0, 25], "facilit": [12, 39, 40], "fact": [0, 1, 3, 5, 9, 11, 12, 13, 22, 33, 34, 35, 36, 41, 43, 44], "facto": 36, "factor": [0, 1, 3, 5, 6, 9, 10, 11, 13, 26, 30, 33, 34, 35, 41], "factori": 13, "fad000": [], "fade": 6, "fae4c2": [], "fafab0": [9, 10], "fail": [0, 6, 13, 31, 33, 35, 37, 38, 40], "failur": [7, 38, 39], "fairli": [1, 2, 18, 30, 36, 41, 42, 43], "faisal": [16, 34], "fake": 4, "fake_loss": 4, "fake_output": 4, "fall": [8, 9, 23, 29], "fals": [0, 1, 2, 3, 4, 5, 6, 7, 9, 10, 14, 16, 17, 23, 26, 28, 33, 34, 35, 36, 37, 38, 39, 41, 42, 43, 44], "famili": [0, 7, 8, 30, 34, 36, 38, 39, 40], "familiar": [0, 3, 5, 6, 8, 15, 25, 26, 27, 30, 33, 37, 40, 43, 44], "famou": [6, 12, 41, 42], "far": [0, 3, 4, 5, 6, 8, 11, 12, 13, 14, 16, 20, 21, 22, 33, 34, 35, 36, 39, 40, 43, 44], "fashion": [0, 9, 10, 28, 33, 36], "fashionmnist": 28, "fast": [1, 3, 6, 10, 12, 13, 25, 30, 33, 35, 36, 37, 38, 40, 41, 43, 44], "faster": [1, 11, 13, 21, 36, 41, 42], "fastest": [13, 26, 35], "fatal": [], "favor": [7, 36, 38], "favorit": 30, "fc": [3, 43, 44], "fc1": [42, 44], "fc2": [42, 44], "fc3": 42, "fcfcfc": [], "fdac54": [], "fdf2e2": [], "featur": [0, 1, 3, 5, 6, 7, 8, 10, 11, 12, 13, 15, 17, 18, 19, 21, 23, 25, 28, 30, 33, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "feature_nam": [1, 7, 9, 21, 39], "feautur": 9, "fed": [1, 40, 41], "feed": [0, 2, 3, 11, 21, 24, 25, 28, 33, 43, 44], "feed_forward": [1, 21, 22, 41], "feed_forward_all_relu": 21, "feed_forward_batch": 21, "feed_forward_one_lay": 22, "feed_forward_out": [1, 41], "feed_forward_sav": 22, "feed_forward_train": [1, 41], "feed_forward_two_lay": 22, "feedback": [4, 20, 24, 33], "feeddorward": 4, "feedforward": [1, 4, 12, 41, 42], "feel": [0, 5, 6, 11, 13, 15, 16, 18, 21, 22, 23, 25, 27, 28, 31, 33, 40], "feet": [], "fefef": [], "fefeff": [], "felt": [27, 28], "fenc": [], "fernando": [], "fetch": [6, 15, 28], "fetch_openml": 28, "few": [1, 3, 4, 5, 9, 17, 18, 19, 22, 23, 24, 30, 33, 40, 41, 43, 44], "fewer": [0, 9, 11, 19, 33, 36], "ff7b72": [], "ff9492": [], "ffa07a": [], "ffa657": [], "ffb757": [], "ffd700": [], "ffd900": [], "ffd9002e": [], "ffffff": [], "ffnn": [1, 12, 28, 39, 40, 41, 42], "fi": [], "field": [0, 3, 6, 12, 19, 25, 39, 40, 43, 
44], "fieldmask": [], "fifteen": 40, "fifth": [0, 6, 33, 43, 44], "fig": [0, 1, 2, 3, 4, 6, 7, 12, 13, 14, 27, 33, 38, 39, 41, 42, 43, 44], "fig_id": [0, 6, 7, 9, 33, 37, 38], "figaxi": 30, "figsiz": [0, 1, 2, 3, 4, 6, 7, 8, 9, 10, 33, 37, 38, 39, 41, 42, 43, 44], "figslid": 43, "figur": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 13, 14, 16, 24, 25, 27, 28, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "figure_id": [0, 6, 7, 9, 33, 37, 38], "figurefil": [0, 6, 7, 9, 33, 37, 38], "file": [0, 4, 5, 6, 7, 9, 15, 20, 21, 22, 27, 28, 33, 37, 38], "file_prefix": 4, "filenam": 33, "fill": [5, 9, 18, 23, 34, 35, 41, 42], "fill_valu": [], "filter": [3, 4, 43, 44], "final": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 14, 18, 20, 21, 22, 23, 24, 27, 28, 29, 30, 31, 33, 35, 37, 38, 39], "financ": 0, "find": [0, 1, 2, 3, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 21, 22, 24, 25, 27, 28, 30, 33, 34, 35, 36, 38, 39, 40, 41, 42], "fine": [0, 14], "finish": [2, 20, 21, 41, 42, 43], "finit": [3, 5, 6, 12, 13, 17, 30, 34, 35, 37, 38, 39, 40, 43, 44], "finnicki": 15, "fire": [], "first": [0, 1, 2, 3, 5, 6, 7, 8, 9, 10, 11, 13, 14, 15, 16, 18, 19, 21, 22, 23, 24, 26, 27, 28, 30, 31, 32, 34, 36, 37, 38, 39, 44], "first_moment": 36, "first_term": 36, "firsteigvector": 11, "fit": [1, 3, 4, 5, 6, 7, 8, 9, 11, 12, 13, 17, 18, 19, 22, 23, 27, 28, 30, 34, 36, 37, 38, 39, 40, 41, 42, 44], "fit_beta": 34, "fit_intercept": [0, 5, 6, 16, 34, 35, 36, 37, 38, 39], "fit_mod": 9, "fit_theta": [6, 36], "fit_transform": [0, 6, 8, 9, 11, 15, 19, 37, 38], "fiti": [0, 33], "five": [0, 9, 33, 34, 40], "fix": [0, 3, 4, 6, 10, 11, 12, 13, 24, 27, 33, 37, 38, 39, 41, 42, 43, 44], "flag": 4, "flat": [12, 13, 35, 36], "flatten": [1, 3, 4, 5, 26, 41, 42, 43, 44], "flavor": [], "flexibl": [1, 6, 8, 10, 12, 28, 33, 36, 37, 38, 39, 41, 42], "flip": [21, 31, 33, 43, 44], "float": [0, 3, 4, 5, 9, 11, 13, 14, 26, 33, 34, 35, 41, 42, 43], "float32": [4, 9, 41, 42], "float64": [4, 26, 33, 39, 40, 41, 42], "floatingpointerror": [41, 42], "floor": [41, 42], "flop": [5, 26, 34, 35], "flow": [1, 4, 12, 39, 40, 41], "flower": 21, "fluctuat": [5, 36], "flush": [41, 42], "fly": 11, "fm": 0, "fmax": 3, "fmesh": 13, "fn": [7, 23], "focu": [0, 3, 4, 5, 6, 15, 25, 27, 28, 32, 33, 34, 35, 36, 37, 38, 43, 44], "focus": [1, 6, 7, 23, 26, 34, 36, 38, 39, 41], "fold": [6, 9, 27], "folder": [0, 4, 6, 15, 20, 24, 27, 28, 33], "follow": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "font": [7, 20, 30, 33, 38], "fontdict": 30, "fontsiz": [1, 6, 8, 9, 10, 30], "fontweight": 1, "footprint": [3, 36, 43, 44], "foral": [8, 34, 40], "forc": [0, 5, 6, 10, 11, 34, 35, 36, 40, 43, 44], "forcast": 4, "forcier": [], "forecast": [4, 12, 39, 40], "forest": [0, 1, 9, 25, 33, 41], "forget": [11, 36], "form": [0, 3, 4, 5, 6, 7, 8, 9, 11, 12, 13, 15, 16, 23, 25, 26, 27, 28, 30, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "formal": [3, 4, 14, 18, 23, 30, 40, 43, 44], "format": [0, 1, 3, 4, 6, 7, 8, 9, 10, 11, 20, 23, 25, 30, 32, 37, 38, 39, 41, 42, 44], "format_data": 4, "formatstrformatt": [6, 13, 35, 36], "formatt": [], "formul": [4, 6, 11, 14, 23], "formula": [3, 13, 23, 30, 35, 40], "forth": [4, 12, 22, 39], "fortran": [0, 25, 26, 33], "fortran2003": [25, 33], "fortran2008": [27, 28], "fortran90": 30, "fortun": [0, 11, 34], "forward": [0, 3, 6, 21, 24, 25, 26, 28, 33, 36, 37, 44], "forwardpropag": [40, 41], "found": [1, 2, 4, 5, 6, 12, 13, 19, 20, 21, 22, 27, 33, 34, 
36, 37, 38, 39, 40, 41, 42, 43], "foundat": [25, 33], "four": [4, 5, 6, 8, 12, 21, 26, 29, 31, 33, 35, 39, 40, 41, 43, 44], "fourier": [0, 33, 40], "fourierdef1": 3, "fourierdef2": 3, "fourierseriessign": 3, "fourth": [12, 33, 34], "fp": [7, 23], "fpr": 23, "frac": [0, 1, 2, 3, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 16, 17, 19, 21, 22, 23, 26, 27, 28, 30, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "fraction": [9, 23, 38, 39], "frame": [7, 36, 39], "framework": [1, 8, 10, 30, 41, 42], "frank": [5, 11], "frankefunct": [5, 6, 11], "fredli": [21, 31, 33], "free": [0, 6, 11, 13, 15, 16, 18, 21, 22, 23, 25, 26, 27, 28, 30, 31, 32, 33, 40], "freecodecamp": 25, "freedom": [5, 35], "freeli": [0, 27], "freez": 15, "frequenc": [3, 6, 7, 30, 37, 39, 43, 44], "frequent": [0, 8, 9, 13, 35], "frequentist": 25, "fresh": 10, "frf4l5qax1m": 42, "fridai": [15, 21, 22, 23, 24, 31, 33, 43], "friedman": [6, 19, 27, 32, 33], "friendli": 4, "fro": 27, "frodo": 33, "frog": [3, 44], "from": [0, 1, 2, 3, 4, 6, 7, 8, 9, 11, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 30, 31, 32, 42], "from_cod": 9, "from_logit": [3, 4, 44], "from_tensor_slic": 4, "front": [0, 4, 5, 33, 34, 35], "frustrat": 15, "fulfil": [2, 5, 12, 34, 35, 39, 41, 42, 43], "full": [1, 3, 5, 7, 9, 10, 13, 21, 28, 30, 33, 34, 35, 38], "full_matric": [5, 34, 35, 43], "fulli": [3, 6, 12, 30, 37, 38, 39, 40, 43, 44], "fullnam": [], "fun": [25, 33], "func": [2, 21, 41, 42, 43], "function": [2, 3, 4, 5, 9, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 44], "functionali": 11, "fundament": [0, 6, 25, 33, 37, 38], "funtion": [2, 42, 43], "furnish": [], "furthemor": 40, "further": [2, 7, 9, 19, 33, 40, 42], "furthermor": [0, 3, 5, 6, 7, 11, 12, 13, 25, 27, 28, 33, 34, 35, 36, 37, 38, 39, 41, 43, 44], "furthest": 22, "futur": [0, 4, 8, 9, 33], "fy": [15, 21, 27, 28, 29, 31, 32, 33], "fys4155": [27, 28], "fys5419": [32, 33], "fys5429": [32, 33], "f\u00f8470": [31, 33], "g": [0, 1, 2, 3, 4, 6, 8, 9, 10, 11, 13, 15, 18, 19, 23, 24, 30, 33, 34, 35, 36, 37, 38, 39, 41, 42, 43, 44], "g0": [2, 42, 43], "g_": [2, 9, 10, 36, 42, 43], "g_0": [2, 42, 43], "g_1": [2, 10, 42, 43], "g_2": [2, 10, 42, 43], "g_3": 40, "g_analyt": [2, 42, 43], "g_dnn_ag": [2, 42, 43], "g_euler": [2, 42, 43], "g_i": [2, 40, 42, 43], "g_j": 40, "g_m": [3, 10], "g_n": 3, "g_re": [2, 42, 43], "g_t": [2, 36, 41, 42, 43], "g_t_d2t": [2, 42], "g_t_d2x": [2, 42, 43], "g_t_dt": [2, 42, 43], "g_t_hessian": [2, 42, 43], "g_t_hessian_func": [2, 42, 43], "g_t_invers": [41, 42], "g_t_jacobian": [2, 42, 43], "g_t_jacobian_func": [2, 42, 43], "g_trial": [2, 42, 43], "g_trial_deep": [2, 42, 43], "g_vec": [2, 42, 43], "gain": [1, 5, 7, 9, 10, 13, 34, 35, 41, 42], "galleri": [0, 33], "game": 4, "gamge": 33, "gamma": [0, 2, 8, 9, 10, 11, 13, 33, 35, 42, 43], "gamma1": 8, "gamma2": 8, "gamma_": [0, 33], "gamma_0": 10, "gamma_1": 10, "gamma_1x": 10, "gamma_i": [0, 8, 30, 33], "gamma_j": 13, "gamma_k": [13, 35], "gamma_m": 10, "gamma_x": [0, 33], "gap": [8, 36], "gate": [4, 12, 40], "gather": [0, 1, 12, 34, 39, 40, 41, 42], "gaug": [12, 39, 40], "gaussbacksub": 26, "gaussian": [4, 5, 6, 8, 14, 18, 30, 33, 37, 38, 39], "gaussian_point": 14, "gaussian_rbf": 8, "gave": [13, 28, 36], "gavra": 33, "gbc": 33, "gca": [2, 6, 8, 13], "gd": [1, 35, 40, 41], "gd_clf": 10, "gdclassiffiercgain": 10, "gdclassiffierconfus": 10, "gdclassiffierroc": 10, "gdm": 13, "gdregress": 10, "ge": [1, 5, 7, 30, 34, 35, 38, 41, 42], "gen_loss": 4, "gen_tap": 4, "gender": [0, 33], "genener": 4, "gener": [0, 1, 2, 3, 5, 6, 8, 
10, 11, 12, 13, 14, 15, 16, 18, 20, 21, 22, 23, 24, 26, 27, 28, 30, 32, 34, 35, 36, 37, 41, 42], "generaliz": [16, 41, 42], "generallay": [12, 39], "generate_and_save_imag": 4, "generate_binary_data": [38, 39], "generate_imag": 4, "generate_latent_point": 4, "generate_multiclass_data": [38, 39], "generate_simple_clustering_dataset": 14, "generated_imag": 4, "generator_loss": 4, "generator_loss_list": 4, "generator_model": 4, "generator_optim": 4, "genom": 25, "geodes": 11, "geoff": 36, "geometr": [0, 13, 33, 36], "geometri": 5, "georg": 32, "geotif": 6, "geq": [2, 5, 8, 9, 13, 34, 35, 36, 42, 43], "gerard": [], "geron": [0, 32, 33], "get": [0, 1, 2, 3, 4, 5, 6, 7, 9, 10, 11, 13, 15, 19, 21, 22, 25, 26, 27, 28, 30, 31, 33, 34, 35, 36, 37, 38, 41, 42, 43, 44], "get_dummi": 9, "get_paramet": [2, 42, 43], "get_split": 9, "get_yaxi": 8, "get_yticklabel": 6, "getmask": [], "gh": 15, "gi6mzxat0ew": 42, "giant": 36, "gibb": [25, 33], "gif": 4, "gini": 10, "gini_index": 9, "ginvers": 13, "git": [0, 15, 25, 33], "gitcdn": [], "giter": [13, 36], "github": [0, 20, 24, 25, 27, 28, 29, 31, 32, 33, 34, 40, 41, 42], "gitignor": 15, "gitlab": [0, 15, 25, 27, 28, 33], "gitta": [40, 41], "give": [0, 1, 2, 3, 5, 6, 7, 8, 9, 10, 12, 13, 14, 18, 19, 23, 25, 27, 28, 30, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "given": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 17, 19, 21, 26, 28, 30, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "glkfgrjhtlnplbx4": 21, "global": [6, 7, 13, 35, 36, 38, 39], "gloriou": 28, "glorot": 1, "gmail": [], "gnew": 13, "go": [0, 1, 3, 5, 6, 8, 9, 11, 12, 13, 15, 16, 18, 21, 33, 34, 35, 37, 40, 41, 44], "goal": [0, 7, 9, 33, 38, 39], "goe": [0, 1, 2, 5, 6, 13, 14, 15, 19, 26, 33, 34, 35, 36, 37, 41, 42, 43], "goessner": [], "golden": 13, "gone": [5, 34, 35], "gong": [1, 41], "good": [1, 3, 4, 5, 6, 9, 10, 11, 13, 15, 18, 21, 24, 25, 28, 30, 32, 34, 35, 36, 38, 40, 41, 42], "goodfellow": [4, 28, 32, 33, 34, 35, 38, 39, 40, 41, 43, 44], "googl": [1, 4, 21, 22, 25, 33, 41, 42], "got": [1, 6, 21, 22, 24, 27, 28, 41], "gotten": [33, 41, 42], "gov": 6, "govern": 33, "gp": 32, "gpu": [1, 13, 25, 33, 36, 41, 42], "grad": [2, 13, 21, 22, 36, 41, 42, 43], "grad_analyt": 13, "grad_ol": 18, "grad_ridg": 18, "grad_two_lay": 22, "grade": [27, 28, 29], "gradient": [0, 3, 4, 7, 8, 9, 12, 21, 24, 25, 33, 34, 38, 44], "gradient_bia": [41, 42], "gradient_desc": 36, "gradient_func": 21, "gradient_weight": [41, 42], "gradientboostingclassifi": 10, "gradientboostingregressor": 10, "gradients_of_discrimin": 4, "gradients_of_gener": 4, "gradienttap": 4, "gradual": [1, 14, 41], "grai": [4, 6, 43], "granger": [], "grant": [], "graph": [1, 9, 11, 12, 13, 16, 20, 23, 35, 36, 39, 40, 41, 42], "graph_from_dot_data": 9, "graphic": [0, 1, 9, 15, 33, 41, 42], "grasp": 0, "gray_r": [1, 3, 41, 42, 44], "grayscal": [3, 43, 44], "great": [5, 13, 15, 21, 22, 35, 36, 40, 42, 43], "greater": [1, 7, 30, 34, 39, 41], "greatli": 13, "greedi": 9, "green": [0, 3, 9, 30, 43, 44], "gregor": [41, 42], "grei": 4, "grid": [1, 3, 6, 7, 8, 12, 30, 34, 36, 37, 38, 39, 41, 42, 44], "groh": [40, 41], "grossli": [13, 35], "ground": [0, 33], "group": [0, 6, 7, 9, 14, 15, 20, 24, 25, 27, 28, 29, 31, 33, 37], "groupbi": [0, 33], "grow": [1, 3, 9, 10, 36, 41, 43, 44], "growth": [0, 33], "gru": 4, "guarante": [0, 4, 13, 30, 33, 34, 35, 36], "guess": [1, 4, 10, 13, 14, 23, 28, 35, 36, 41], "guestrin": 10, "gui": 15, "guid": [1, 21, 41, 42], "guidelin": [20, 27, 28, 38, 39], "g\u00f6ssner": [], "h": [0, 1, 5, 6, 8, 13, 15, 19, 21, 
30, 31, 32, 33, 34, 35, 36, 41], "h1": [2, 42], "h_": [0, 13, 33, 35, 36], "h_0": 36, "h_1": [2, 13, 35, 42, 43, 44], "h_2": [2, 13, 35, 42, 43, 44], "h_m": 10, "h_t": 36, "ha": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 17, 18, 19, 20, 21, 22, 23, 24, 26, 27, 28, 30, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "haanen": [31, 33], "habit": [0, 34], "had": [0, 1, 6, 7, 13, 33, 35, 36, 37, 38, 41], "hadamard": [1, 12, 13, 36, 40, 41], "half": [1, 8, 9, 38, 39, 40, 41, 42], "halv": 10, "hand": [0, 1, 2, 3, 5, 11, 12, 13, 25, 26, 27, 28, 30, 31, 32, 33, 34, 35, 36, 38, 39, 42, 43, 44], "handi": [3, 27, 28, 43, 44], "handl": [0, 1, 2, 5, 9, 11, 15, 18, 22, 25, 34, 35, 36, 41, 42, 43], "handle_unknown": 9, "handsid": [12, 40, 41], "handwrit": [12, 39, 40], "handwritten": [1, 5, 41], "happen": [1, 2, 3, 4, 5, 6, 10, 13, 24, 30, 34, 35, 36, 39, 41, 42, 43, 44], "hard": [1, 7, 8, 10, 13, 21, 22, 35, 36, 38, 40, 41], "hardcopi": [25, 33], "harder": [0, 1, 19, 21, 34, 41], "harmon": [3, 23], "hash": 36, "hasn": [], "hassl": [0, 25, 33], "hast": [25, 33], "hasti": [0, 6, 16, 17, 19, 20, 27, 32, 33, 34, 37, 38], "hat": [0, 1, 5, 6, 7, 9, 10, 11, 12, 13, 16, 17, 18, 19, 26, 34, 35, 36, 37, 39, 40], "hauser": [], "have": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 30, 31, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "have_sys_un_h": [], "haven": [1, 22, 41], "he": [7, 38, 39], "head": [4, 10, 30], "header": [0, 33], "heads_proba": 10, "health": [0, 34], "hear": [0, 13, 33, 36], "heart": [0, 7, 33, 38], "heatmap": [0, 1, 3, 7, 17, 20, 24, 28, 33, 39, 41, 42, 44], "heavi": 36, "heavili": 0, "heavisid": [1, 41], "height": [1, 3, 6, 34, 41, 43, 44], "hein": [42, 43], "held": [13, 36], "help": [0, 1, 4, 12, 13, 15, 16, 27, 28, 33, 36, 37, 39, 40, 41], "helper": [4, 14, 38, 39], "henc": [0, 5, 6, 8, 9, 10, 12, 13, 33, 34, 35, 36, 37, 38, 39], "henrik": [31, 33], "her": [7, 38, 39], "here": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 17, 18, 19, 21, 22, 23, 25, 26, 27, 28, 30, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "hereaft": [0, 8, 12, 33], "herebi": [], "hermitian": 26, "hessenberg": 26, "hessian": [0, 2, 5, 13, 38, 39, 42, 43], "heterogen": [9, 10], "hex": [], "hi": [7, 38, 39], "hidden": [1, 3, 4, 12, 21, 23, 24, 28, 39, 44], "hidden_bia": [1, 41], "hidden_bias_gradi": [1, 40, 41], "hidden_deriv": [41, 42], "hidden_func": [41, 42], "hidden_layer_s": [0, 1, 33, 41], "hidden_neuron": 4, "hidden_nodes1": [41, 42], "hidden_nodes2": [41, 42], "hidden_weight": [1, 41], "hidden_weights_gradi": [1, 40, 41], "hierarch": [5, 34, 35], "high": [0, 1, 2, 3, 4, 5, 6, 9, 10, 11, 13, 14, 21, 23, 25, 26, 27, 33, 34, 35, 36, 37, 38, 41, 42, 43, 44], "higher": [0, 1, 3, 5, 6, 8, 13, 18, 23, 27, 33, 34, 35, 36, 37, 38, 41, 42], "highest": [1, 2, 38, 39, 41, 42, 43], "highli": [0, 3, 4, 10, 19, 25, 26, 28, 32, 33, 34, 35, 36], "highlight": 23, "highwai": [], "hing": 8, "hint": [13, 15, 16, 21, 22, 28, 34, 35], "hinton": 36, "hip": [25, 42], "hire": 0, "hist": [4, 6, 7, 30, 37, 39], "histogram": [6, 7, 30, 39], "histor": [7, 11, 38], "histori": [3, 4, 12, 15, 36, 39, 40, 42, 44], "hitherto": 5, "hjorth": [31, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "hline": 23, "hobbi": 30, "hoc": [5, 34, 35], "hoff": 32, "hojjatk": 28, "hold": [1, 3, 6, 13, 14, 35, 36, 37, 41, 43, 44], "holder": [0, 33], "holdgraf_evidence_2014": [], "home": [], "homepag": [27, 28, 33], "homework": [6, 13, 35, 36], "homogen": [1, 3, 9, 10, 13, 36], 
"honchar": [2, 42, 43], "hop": [43, 44], "hope": 24, "hopefulli": [0, 11, 15, 19, 30, 33, 36], "horizont": 11, "horlyk": [31, 33], "hornik": 40, "hors": [3, 7, 33, 38, 39, 44], "hot": [1, 9, 38, 39, 41, 42], "hour": [1, 25, 29, 30, 31, 33, 36, 37, 41], "how": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 30, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 44], "howev": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 21, 25, 26, 27, 28, 30, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "href": [], "hspace": [0, 4, 8, 10, 30, 33, 40, 41], "hstack": [1, 41, 42], "htf": 33, "html": [0, 16, 20, 21, 25, 27, 28, 29, 31, 32, 33, 34, 35, 36, 40, 41, 42], "http": [0, 3, 4, 6, 13, 15, 16, 19, 20, 21, 22, 25, 26, 27, 28, 29, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "huang": [0, 33], "huber": [0, 33], "huge": [1, 3, 4, 25, 36, 41, 43, 44], "human": [0, 1, 3, 6, 9, 12, 34, 39, 40, 41, 43, 44], "humid": 9, "hundr": [1, 41], "hungri": [1, 41], "hybrid": 29, "hydrogen": [0, 33], "hyper": 28, "hyperbol": [1, 4, 12, 42], "hyperparam": 8, "hyperparamat": 40, "hyperparamet": [3, 4, 5, 6, 9, 13, 18, 24, 27, 28, 34, 35, 36, 40, 42, 43, 44], "hyperplan": 11, "h\u00f8rlyk": [31, 33], "i": [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 34, 35, 36, 37, 38, 39, 40], "i0": [0, 33], "i1": [0, 6, 8, 12, 33, 34, 36, 39], "i2": [0, 8, 12, 33, 39], "i3": [0, 12, 33, 39], "i5": [0, 33], "i_": [13, 35, 36], "i_1": [5, 6, 37], "i_2": [5, 6, 37], "i_siz": [21, 22], "i_t": 36, "ian": 32, "iasuyvmceki": 43, "iayaan2": 21, "ic": [1, 27, 28, 41], "id": [7, 13, 35, 36, 38], "ida": [31, 33], "idea": [0, 1, 2, 3, 4, 6, 9, 10, 12, 13, 20, 26, 27, 28, 34, 35, 36, 37, 38, 39, 40, 41, 42], "ideal": [0, 2, 6, 8, 13, 23, 30, 33, 36, 37, 38, 39, 41, 42, 43], "idem": [6, 37, 38], "ident": [5, 6, 12, 13, 17, 18, 26, 34, 35, 39, 41, 42], "identical": 37, "identifi": [0, 1, 7, 9, 11, 12, 13, 14, 23, 33, 34, 38, 39, 41], "idx": [38, 39], "ieor": 30, "ifi": [32, 42, 43], "ifs": [25, 33], "ignor": [0, 1, 3, 9, 15, 34, 36, 41, 42, 44], "ii": [23, 26, 30, 41, 44], "iii": [26, 33, 41], "ij": [0, 1, 3, 6, 8, 12, 14, 16, 23, 26, 30, 33, 34, 36, 39, 40, 41, 42, 43, 44], "ik": [0, 26, 33, 34], "iki": [], "ilg3ggewq5u": [40, 41], "ill": 36, "illinoi": [], "illustr": [5, 7, 10, 12, 13, 14, 20, 25, 33, 38, 41], "ilsvrc": 36, "im": 6, "imag": [1, 3, 4, 6, 9, 11, 12, 14, 32, 33, 39, 40, 41, 42], "image_at_epoch_": 4, "image_batch": 4, "image_height": [3, 44], "image_path": [0, 6, 7, 9, 33, 37, 38], "image_width": [3, 44], "imageio": 6, "imagenet": 36, "images_from_seed_imag": 4, "imagin": [1, 41], "imbal": 23, "imbalanc": 23, "img": 43, "immedi": [0, 3, 4, 6, 25, 33, 36], "impact": 28, "implement": [0, 2, 3, 4, 5, 6, 8, 9, 10, 11, 12, 13, 14, 19, 20, 21, 22, 27, 30, 33, 34, 35, 36, 38, 39, 40, 44], "impli": [3, 5, 6, 7, 13, 26, 34, 35, 36, 37, 38, 43, 44], "implicit": [3, 36, 43, 44], "implicitli": [11, 30], "import": [0, 1, 2, 3, 4, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 21, 22, 23, 27, 28, 30, 36, 37, 38, 39, 42, 43], "importantli": 3, "importerror": [], "impos": [0, 6, 11, 12, 33, 39, 41, 42], "imposs": [0, 5, 33, 34, 35], "impract": 36, "impress": [0, 12, 33, 39, 40], "improv": [0, 4, 5, 9, 10, 11, 13, 15, 21, 27, 28, 34, 35], "impur": 9, "imread": [6, 43], "imshow": [1, 3, 4, 6, 41, 42, 43, 44], "in3050": [32, 33], "in3310": 33, "in4080": [32, 33], "in4300": [32, 33], "in4310": 32, "in5400": 3, 
"in5550": 32, "in_out_neuron": 4, "inaccur": [13, 35], "inact": [12, 39, 40, 41], "inadequ": [0, 33], "inappropri": 36, "inch": [6, 34], "incident": [], "includ": [0, 1, 2, 3, 4, 5, 6, 7, 11, 12, 15, 16, 17, 18, 19, 20, 21, 22, 24, 25, 30, 31, 32, 33, 34, 35, 37, 41, 42, 43, 44], "include_bia": [6, 9, 37, 38], "inclus": 28, "incom": [12, 16, 39, 40], "incorrect": [1, 41], "incorrectli": 23, "incoveni": 8, "increas": [0, 1, 3, 4, 5, 6, 9, 12, 13, 19, 23, 27, 30, 33, 34, 36, 37, 38, 39, 40, 41, 42], "increasingli": 30, "increment": 36, "ind": 6, "inde": [0, 2, 4, 5, 6, 13, 33, 34, 35, 40, 42, 43], "indefinit": 4, "independ": [0, 5, 6, 7, 8, 12, 13, 30, 33, 34, 35, 36, 38, 39], "index": [0, 1, 3, 4, 10, 14, 25, 26, 27, 28, 30, 32, 33, 41, 43, 44], "index_col": [0, 33], "indic": [0, 1, 3, 4, 5, 6, 9, 10, 11, 13, 16, 23, 27, 28, 33, 34, 40, 41, 42, 43, 44], "indirect": [], "indispens": [6, 37, 38], "individu": [1, 6, 7, 10, 12, 30, 33, 34, 36, 37, 38, 39, 40, 41, 42], "indu": [], "indx": 26, "indx1": [2, 42, 43], "indx2": [2, 42, 43], "indx3": [2, 42, 43], "ineffici": [3, 13], "inequ": [8, 13], "inequaltii": 35, "inertia": 13, "inexperi": [], "inf": [], "inf1000": [25, 33], "inf1100": [25, 33], "inf1100l": [25, 33], "inf1110": [25, 33], "inf3000": 33, "infeas": [9, 36], "infer": [0, 1, 4, 6, 32, 33, 37, 38, 41, 42], "inferenc": 1, "infil": [0, 6, 7, 9, 33, 37, 38], "infin": [5, 6, 7, 11, 19, 34, 35, 37, 38, 40, 41], "infinit": [3, 36, 43, 44], "infinitesim": 30, "influenc": [6, 10, 18, 37, 38], "influenti": [1, 41], "info": 33, "inform": [0, 1, 3, 4, 6, 9, 11, 12, 13, 14, 23, 26, 27, 28, 32, 33, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "inforom": 15, "infrequ": 36, "infti": [3, 6, 13, 30, 35, 37, 40, 43, 44], "ingeni": [13, 35, 36], "ingredi": [0, 9, 33], "inher": [6, 36, 37, 38], "inherit": [26, 33, 36], "init": [], "initi": [0, 1, 2, 6, 10, 13, 14, 18, 26, 28, 30, 33, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "inititi": [41, 42], "inject": 14, "inlin": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 23, 26, 30, 33, 34, 35, 36, 37, 38, 39, 41, 42, 43, 44], "inn": 23, "inner": [0, 13, 34], "innerhtml": [], "inp": 4, "inplac": 13, "inpput": 40, "input": [0, 1, 3, 4, 5, 6, 7, 8, 12, 13, 14, 16, 27, 28, 30, 33, 34, 35, 36, 37, 38, 39, 44], "input_dim": 1, "input_nod": [41, 42], "input_s": 21, "input_shap": [3, 4, 44], "inputs": 1, "inputs_shuffl": [0, 1, 34, 41], "inquiri": 20, "insert": [3, 5, 6, 8, 10, 30, 34, 35, 37], "insid": [4, 7, 21, 39], "insight": [0, 1, 5, 25, 28, 33, 34, 35, 37, 38, 40], "insist": [6, 13, 34, 36], "inspir": [0, 1, 12, 27, 28, 33, 39, 40, 41], "instabl": [2, 42, 43], "instal": [0, 1, 5, 6, 9, 15, 20, 28, 41, 42], "instanc": [0, 1, 2, 4, 6, 9, 11, 13, 16, 23, 33, 34, 35, 36, 37, 38, 41, 42, 43, 44], "instanti": 10, "instead": [0, 1, 2, 3, 4, 5, 6, 8, 9, 11, 13, 14, 17, 20, 21, 22, 24, 26, 28, 30, 33, 34, 36, 37, 41, 42, 43, 44], "institut": [1, 41], "instruct": [0, 1, 15, 41, 42], "int": [0, 1, 2, 3, 4, 5, 6, 11, 13, 14, 26, 30, 34, 36, 37, 38, 39, 41, 42, 43, 44], "int32": 10, "int_": [3, 6, 23, 30, 37, 40], "int_0": 30, "int_a": 30, "intak": [0, 34], "integ": [1, 2, 13, 14, 26, 30, 33, 38, 39, 41, 42, 43], "integer_vector": [1, 41], "integr": [3, 6, 23, 30, 33, 37, 43, 44], "intellig": [0, 14, 32, 33], "intend": 10, "intens": [1, 18, 41], "intention": 14, "interact": [0, 6, 9, 12, 25, 27, 28, 33, 39, 40], "intercept": [0, 6, 8, 11, 13, 16, 17, 18, 19, 33, 34, 35, 36, 37, 38, 39], "intercept_": [0, 6, 8, 9, 13, 33, 34, 36], "interchang": [5, 12, 26, 39, 40], 
"interconnect": [1, 41], "interesit": [], "interest": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 12, 19, 25, 27, 28, 30, 33, 34, 35, 37, 38, 39, 40, 41, 42, 43, 44], "interfac": [0, 1, 15, 26, 34, 41, 42], "interior": [0, 9, 33], "intermedi": [26, 34, 36], "intermediari": [21, 22], "intermeti": 22, "intermetidari": 22, "intern": [1, 10, 12, 22, 38, 39, 40, 41, 42], "internation": [], "interpol": [1, 3, 4, 6, 12, 39, 40, 41, 42, 44], "interpr": [5, 34, 35], "interpret": [0, 1, 6, 9, 10, 12, 13, 15, 16, 21, 23, 26, 27, 28, 30, 40, 41, 43, 44], "interrupt": [], "interv": [0, 3, 5, 6, 7, 13, 19, 30, 33, 34, 35, 38, 39], "intial": [13, 35], "intract": [0, 4, 34], "intrins": [3, 11, 26, 30, 33, 43, 44], "intro": [25, 32, 33], "introduc": [0, 1, 5, 6, 8, 10, 12, 24, 26, 27, 30, 33, 35, 36, 37, 39, 40, 41, 43, 44], "introduct": [1, 2, 4, 13, 24, 32, 34, 35, 36, 38, 41, 42, 43], "introductori": [0, 4, 26, 32, 33, 34], "intuit": [0, 5, 6, 8, 12, 13, 27, 33, 36, 37, 38, 39, 40, 41], "inv": [0, 5, 13, 17, 33, 34, 35, 36], "invalid": [], "invalu": [0, 13, 25, 33, 35], "invari": [1, 41, 43, 44], "invd": 5, "inver": [8, 39], "invers": [0, 3, 6, 13, 33, 34, 35, 36, 43, 44], "inverse_transform": 8, "invert": [0, 5, 7, 10, 13, 16, 18, 33, 36, 38, 39], "investig": [], "invh": [13, 36], "invok": 8, "involv": [0, 2, 6, 7, 11, 12, 33, 34, 36, 37, 38, 39, 40, 41, 42, 43, 44], "io": [0, 25, 27, 28, 29, 31, 32, 33, 34, 41, 42], "ion": [], "ip": [0, 8, 30, 33], "ipca": 11, "ipynb": [25, 33], "ipython": [0, 5, 7, 9, 11, 14, 25, 27, 28, 33, 34, 38], "iq": [6, 37], "iri": [8, 9, 21, 23], "irreduc": [6, 37, 38], "irrelev": [5, 34, 35], "irrespect": [0, 33], "irvin": [27, 28], "is_avail": [42, 44], "isaac": [], "isaacmus": [], "iseffici": [], "isn": [5, 24], "isnan": [41, 42], "isnul": [], "isolo": 22, "isomap": 11, "issu": [1, 9, 15, 26, 36, 41, 42], "it_arrai": 13, "item": [0, 13, 33, 42, 44], "items": [26, 33], "iter": [1, 2, 4, 6, 8, 13, 14, 18, 27, 30, 35, 36, 37, 38, 39, 40, 41, 42, 43], "its": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 20, 21, 23, 25, 26, 27, 28, 30, 33, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "itself": [5, 6, 12, 27, 28, 30, 33, 34, 37, 40], "iv": 41, "ix": [41, 42], "j": [0, 1, 2, 3, 4, 5, 6, 8, 9, 11, 12, 13, 14, 15, 16, 23, 26, 27, 30, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "j1": 26, "j_": 6, "j_41hld6ttu": 37, "j_lasso_sk": 6, "j_ridge_sk": 6, "j_sk": 6, "jackknif": [6, 25, 33, 37, 38], "jacobian": [2, 13, 35], "janko": [], "jason": 4, "javascript": [], "jax": [25, 28, 33, 36, 40], "jeff": [], "jensen": [31, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "jentzen": [40, 41], "jerom": [19, 27, 32], "jhauser": [], "ji": [12, 23, 26, 40, 41], "jit": 13, "jj": [0, 5, 6, 33, 37], "jk": [0, 1, 6, 12, 26, 33, 39, 40, 41], "jl": [0, 33], "jm": 26, "jmlr": 42, "jnp": 13, "job": [2, 8, 10, 15, 42, 43], "join": [0, 4, 6, 7, 9, 24, 27, 28, 33, 37, 38, 43], "joint": [4, 5], "jonathan": [], "jpg": 43, "json": [], "judg": [13, 35, 38, 39], "judgement": 6, "julia": [25, 26, 27], "juliu": [40, 41], "jump": [30, 36], "junk": 4, "jupit": 33, "jupyt": [0, 15, 16, 19, 25, 27, 32, 33, 37, 40, 41], "jupyterbook": [], "jupytext": [], "just": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 20, 21, 22, 23, 25, 28, 30, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "justif": 0, "justifi": [3, 10, 43, 44], "k": [0, 1, 3, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 21, 23, 25, 26, 27, 30, 31, 33, 34, 35, 36, 39, 42, 43, 44], "k0": [7, 38, 39], "k1": [7, 38, 39], "kaggl": [6, 27, 28], "kajda": [41, 
42], "kappa_d": 30, "karl": [31, 33], "karush": 8, "katex": [], "katrin": [31, 33], "keep": [0, 1, 4, 5, 6, 11, 13, 14, 15, 18, 21, 22, 26, 27, 28, 33, 34, 35, 36, 37, 38, 41, 43], "keepdim": [1, 6, 10, 26, 37, 38, 39, 41, 42], "kei": [1, 3, 6, 12, 36, 39, 41, 42], "kellei": [], "kenneth": [], "kept": [4, 6, 14, 37, 38], "kera": [0, 4, 25, 27, 28, 33], "kernel": [0, 1, 3, 25, 33, 34, 41, 42, 43, 44], "kernel_regular": [1, 3, 41, 42, 44], "kernel_s": 4, "kernelpca": 11, "kev": [0, 33], "kevin": [32, 33], "kevinsheppard": [], "keyboardinterrupt": [41, 42], "keyword": [18, 26, 33, 41, 42], "kfold": [6, 37, 38], "kg": [1, 41], "ki": 26, "kick": [1, 13, 36, 41], "kiener": [2, 42, 43], "kilomet": [6, 34], "kim": [], "kind": [0, 2, 3, 4, 8, 12, 13, 14, 24, 33, 34, 39, 40, 41, 42, 43, 44], "kingma": 36, "kj": [6, 12, 26, 34, 36, 40, 41, 42], "kjm": [25, 33], "kkt": 8, "kl": 30, "km": [12, 33, 39], "kmean": 14, "kmeanspoint": 14, "kn_k": 14, "know": [0, 1, 2, 5, 6, 8, 13, 15, 16, 17, 19, 20, 24, 25, 33, 34, 35, 41, 42, 43], "knowledg": [0, 25, 33], "known": [1, 3, 4, 5, 6, 7, 8, 9, 12, 18, 26, 27, 28, 30, 32, 34, 36, 37, 38, 39, 40, 41, 42, 43, 44], "kondev": [0, 33], "kp": 30, "kpca": 11, "kramdown": [], "kristin": [42, 43], "kroneck": 14, "kt": [], "kuckuck": [40, 41], "kuhn": 8, "kumar": [42, 43], "kutyniok": [40, 41], "kvalsund": [31, 33], "kwarg": [41, 42], "kwown": [0, 33], "l": [0, 1, 2, 3, 5, 6, 7, 8, 10, 11, 12, 13, 22, 23, 26, 27, 30, 33, 35, 36, 38, 39, 42, 43, 44], "l0": [7, 38, 39], "l1": [0, 1, 3, 7, 24, 28, 33, 38, 39, 41, 42, 44], "l1_l2": [1, 3, 41, 42, 44], "l1regl": 5, "l2": [1, 3, 24, 28, 41, 42, 44], "l2_reg": 42, "l_": [26, 36], "l_1": [7, 28, 38, 39, 40], "l_2": [7, 13, 28, 35, 36, 38, 39, 40], "l_i": 36, "l_j": [12, 40, 41], "l_ja": [41, 42], "la": 13, "la_": [], "la_i": [12, 40, 41, 42], "la_k": [12, 40], "lab": [20, 25, 27, 28, 33], "label": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 13, 14, 15, 20, 23, 24, 25, 26, 27, 28, 30, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "labelencod": [7, 10, 23, 39], "labels": [6, 8, 9], "labels_shuffl": [0, 1, 34, 41], "laboratori": 29, "lack": [0, 24, 33, 36], "lagari": [2, 42, 43], "lagrang": [8, 11], "lam": [18, 41, 42], "lambda": [0, 1, 2, 3, 5, 6, 7, 8, 10, 12, 13, 17, 18, 19, 20, 27, 28, 30, 33, 34, 35, 36, 37, 38, 39, 41, 42, 43, 44], "lambda_": 11, "lambda_0": 11, "lambda_1": [5, 8, 11, 34, 35], "lambda_2": [8, 11], "lambda_i": [8, 11], "lambda_iy_i": 8, "lambda_jy_iy_j": 8, "lambda_k": 8, "lambda_n": [5, 8, 34, 35], "lamda": 1, "land": 8, "landmark": 8, "landscap": [13, 18, 35, 36], "langl": [0, 6, 11, 30, 33, 34], "languag": [0, 1, 4, 8, 25, 26, 27, 28, 32, 33, 41], "lapack": [26, 33], "laplac": 5, "laptop": [15, 25], "larg": [0, 1, 2, 4, 5, 6, 8, 9, 10, 11, 13, 18, 25, 26, 27, 30, 32, 33, 34, 35, 36, 37, 38, 40, 41, 42, 43], "larger": [0, 3, 5, 6, 8, 10, 11, 13, 17, 22, 23, 30, 33, 34, 35, 36, 37, 43, 44], "largest": [4, 8, 11], "larn": [43, 44], "lasso": [0, 7, 25, 28, 33, 36, 37, 38, 39], "lasso_sk": 6, "last": [0, 1, 3, 4, 5, 6, 7, 8, 12, 16, 17, 19, 21, 22, 26, 27, 30, 31, 33, 35, 37, 38, 42, 43], "latent": 4, "latent_dim": 4, "latent_point": 4, "latent_space_value_rang": 4, "later": [0, 1, 4, 7, 8, 12, 13, 14, 15, 19, 21, 22, 25, 27, 28, 33, 36, 38, 39, 40, 41, 42, 43], "latest": [4, 15, 25], "latest_checkpoint": 4, "latex": [20, 33], "latexcodec": [], "latrpygrtttbnjr3znuhl": 22, "latter": [0, 3, 6, 7, 8, 11, 13, 26, 30, 33, 34, 35, 36, 37, 38, 39, 40, 43, 44], "lattic": [12, 39, 40], "law": 0, "layer": [0, 4, 13, 23, 
24, 28, 33, 36, 39], "layer_grad": 22, "layer_input": 22, "layer_output_s": [21, 22], "layers_grad": 21, "lbfg": [7, 9, 10, 23, 39], "lc_messag": [], "lcc": [5, 6, 37], "lda": 11, "ldot": [0, 6, 11, 27, 33, 37, 38], "le": [5, 7, 10, 13, 17, 30, 34, 35, 36, 38, 43, 44], "lead": [0, 1, 3, 5, 6, 7, 8, 9, 10, 11, 12, 13, 16, 17, 21, 22, 26, 30, 33, 34, 35, 36, 37, 38, 39, 40, 41, 43, 44], "leaf": 9, "leaki": [1, 28, 41, 42], "leakyrelu": [4, 28], "lear": [13, 35], "learn": [3, 4, 5, 6, 7, 8, 9, 10, 12, 21, 23, 24, 26, 31, 32, 44], "learnabl": [3, 43, 44], "learner": 10, "learnig": 33, "learning_r": [8, 10, 21, 42], "learning_rate_init": [0, 1, 33, 41], "learning_schedul": [13, 36], "learnt": [27, 28], "least": [0, 7, 8, 10, 11, 17, 18, 24, 25, 26, 30, 37, 38, 39], "leat": [13, 36], "leav": [0, 1, 3, 5, 6, 9, 11, 21, 33, 35, 37, 38, 41, 43, 44], "lectur": [0, 1, 5, 10, 11, 12, 13, 25, 26, 27, 28, 29, 31, 32, 34], "lecture_11_backpropag": 42, "lecturenot": [0, 25, 27, 28, 32, 33, 41, 42], "left": [0, 1, 2, 3, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 16, 17, 19, 26, 27, 28, 30, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "leftarrow": [8, 12, 40, 41, 42], "legend": [0, 2, 3, 4, 5, 6, 7, 8, 9, 10, 13, 15, 21, 33, 34, 35, 36, 37, 38, 39, 42, 43, 44], "legend_el": 21, "leinonen": 33, "len": [0, 1, 2, 3, 4, 5, 6, 8, 9, 10, 11, 12, 16, 17, 21, 22, 26, 33, 34, 35, 36, 37, 38, 39, 41, 42, 43, 44], "length": [0, 1, 3, 4, 8, 9, 13, 16, 21, 25, 33, 34, 35, 36, 41, 43, 44], "length_of_sequ": 4, "leq": [0, 5, 7, 8, 13, 14, 30, 33, 34, 35, 36, 38], "less": [0, 1, 3, 4, 5, 6, 8, 9, 13, 25, 30, 33, 34, 35, 36, 37, 38, 41, 42, 43, 44], "lessen": [1, 41], "let": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 19, 22, 26, 30, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "letter": [0, 16, 26, 30, 33, 34], "level": [0, 1, 5, 6, 9, 23, 25, 26, 27, 28, 29, 31, 33, 36, 37, 38, 40, 41, 42], "leverag": 36, "lexer": [], "li": [8, 11], "liabil": [], "liabl": [], "lib": [], "liberti": 36, "liblinear": 10, "librari": [0, 1, 2, 3, 4, 5, 6, 9, 10, 11, 26, 27, 30, 32, 34, 35, 36, 41, 42, 43, 44], "licenc": [], "licens": [0, 1, 25, 27, 33, 41, 42], "lie": [0, 6, 11, 30, 33, 34, 37, 38], "life": [0, 1, 8, 12, 33, 39, 40, 41, 42], "lifetim": 13, "lift": 23, "light": [], "like": [0, 1, 2, 3, 4, 5, 6, 7, 9, 10, 11, 12, 13, 15, 16, 20, 21, 22, 23, 24, 25, 26, 27, 28, 30, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "likelihood": [0, 1, 5, 9, 33, 34, 41], "lim_": 30, "limit": [0, 5, 6, 8, 12, 26, 27, 28, 33, 34, 38, 39, 40], "lin_clf": 8, "lin_model": [], "lin_reg": 9, "linalg": [0, 2, 5, 6, 8, 11, 13, 17, 26, 30, 33, 34, 35, 36, 39, 42, 43], "line": [0, 3, 6, 8, 11, 13, 15, 16, 20, 21, 23, 33, 35, 36, 37, 40, 41, 42, 44], "line1": 8, "line2": 8, "line2d": [], "line3": 8, "line_model": 15, "line_ms": 15, "line_predict": 15, "linear": [1, 3, 5, 6, 7, 9, 10, 11, 12, 16, 17, 18, 19, 21, 24, 25, 27, 28, 30, 36, 37, 39, 40, 41, 42, 43, 44], "linear_model": [0, 5, 6, 7, 8, 9, 10, 11, 13, 15, 16, 19, 23, 28, 33, 34, 35, 36, 37, 38, 39], "linear_regress": [6, 37, 38, 41, 42], "linearli": [5, 34, 35, 36], "linearloc": [6, 13, 35, 36], "linearregress": [0, 6, 7, 9, 15, 16, 19, 33, 34, 36, 37, 38], "linearsvc": 8, "lineat": 35, "liner": [1, 3, 41, 44], "linerar": 10, "linewidth": [0, 2, 4, 6, 8, 9, 10, 37, 42, 43], "link": [0, 4, 9, 12, 15, 20, 21, 24, 25, 27, 28, 29, 31, 33, 38, 40], "linlag": 5, "linpack": [26, 33], "linreg": [0, 33], "linspac": [0, 2, 3, 4, 6, 8, 9, 10, 13, 16, 17, 19, 26, 30, 33, 34, 36, 37, 38, 42, 43], "linu": 
4, "linux": [0, 1, 25, 27, 33, 41, 42], "liquid": [0, 33], "list": [1, 2, 3, 4, 9, 15, 21, 22, 24, 25, 27, 28, 33, 36, 39, 42, 43, 44], "list_physical_devic": 42, "listedcolormap": [9, 10], "literatur": [1, 7, 14, 32, 37, 38, 41], "littl": [1, 3, 9, 12, 22, 36, 40, 41, 44], "live": [8, 16], "ll": [0, 18, 30, 33, 34], "lle": [0, 34], "llm": 20, "lloyd": [4, 14], "lmb": [0, 2, 5, 6, 34, 35, 36, 37, 38, 42, 43], "lmbd": [0, 1, 3, 33, 41, 42, 44], "lmbd_val": [0, 1, 3, 33, 41, 42, 44], "lmbda": [13, 35, 36], "ln": [1, 13, 35, 41, 43], "load": [1, 4, 6, 7, 9, 10, 23, 36, 39, 42, 44], "load_boston": [], "load_breast_canc": [1, 7, 9, 10, 11, 39, 41, 42], "load_data": [3, 4, 42, 44], "load_digit": [1, 3, 23, 41, 42, 44], "load_iri": [8, 9, 21, 23], "loader": 44, "loc": [3, 6, 7, 8, 9, 10, 21, 33, 37, 38, 39, 44], "local": [0, 1, 3, 7, 12, 13, 15, 21, 22, 34, 35, 36, 38, 39, 40, 41, 43, 44], "locat": [2, 3, 8, 15, 42, 43, 44], "log": [0, 1, 2, 4, 5, 6, 7, 9, 10, 11, 13, 15, 20, 21, 26, 27, 28, 33, 36, 37, 38, 39, 41, 42, 43], "log10": [0, 5, 6, 34, 35, 36, 37, 38, 41, 42, 43], "log_": [0, 33], "log_clf": 10, "logarithm": [0, 5, 7, 17, 26, 33, 37, 38, 39], "logbook": [27, 28], "logic": [0, 1, 9, 33, 41], "logical_or": [], "login": 15, "logist": [0, 1, 2, 8, 9, 10, 11, 12, 13, 23, 24, 25, 28, 34, 35, 36, 40, 42, 43], "logisti": 28, "logistic_regress": [41, 42], "logisticregress": [7, 9, 10, 11, 23, 28, 38, 39], "logit": [7, 28, 38, 39, 42], "logreg": [7, 9, 10, 11, 23, 39], "logspac": [0, 1, 3, 5, 6, 33, 34, 35, 36, 37, 38, 41, 42, 44], "long": [0, 1, 3, 4, 12, 13, 21, 33, 35, 36, 39, 40, 41], "longer": [2, 3, 8, 10, 14, 26, 30, 33, 36, 42, 43, 44], "loocv": [6, 37, 38], "look": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 15, 16, 19, 20, 26, 27, 28, 30, 33, 34, 35, 36, 37, 38, 41, 42, 43, 44], "loop": [1, 4, 6, 10, 12, 14, 16, 17, 18, 22, 25, 26, 33, 36, 37, 38, 41, 42, 44], "lose": [1, 41], "loss": [0, 1, 3, 4, 5, 6, 7, 8, 10, 11, 13, 18, 21, 24, 26, 27, 28, 33, 37, 38, 39, 40, 41, 42, 43, 44], "loss_bin": [38, 39], "loss_fil": 4, "loss_multi": [38, 39], "loss_vec": [38, 39], "lossfil": 4, "lost": 4, "lot": [1, 4, 6, 16, 19, 20, 24, 36, 37, 41, 42], "low": [0, 6, 9, 10, 11, 27, 33, 34, 37, 38], "lower": [0, 1, 3, 6, 9, 10, 16, 21, 26, 34, 36, 41, 42, 44], "lowercas": [26, 33], "lowest": [9, 13, 30, 36], "lr": [1, 3, 4, 10, 38, 39, 41, 42, 44], "lrelu": [41, 42], "lstat": [], "lstm": 4, "lstm_2layer": 4, "lstsq": [0, 33, 34], "lt": [6, 37], "lu": [0, 5, 33, 34, 35], "lubksb": 26, "luckili": [2, 42, 43], "ludcmp": 26, "lux": 26, "lvert": [1, 41], "lw": [0, 33], "lwwrf64f4qkqt": 43, "m": [0, 1, 2, 3, 5, 6, 8, 9, 10, 11, 12, 13, 15, 26, 30, 31, 32, 33, 34, 35, 36, 37, 39, 40, 41, 42, 43, 44], "m_": [9, 12, 40, 41], "m_0": 36, "m_1": 14, "m_h": [0, 33], "m_k": 14, "m_l": [12, 40, 41], "m_n": [0, 33], "m_p": [0, 33], "m_t": [13, 36], "ma": 11, "machin": [1, 3, 4, 5, 6, 7, 9, 10, 11, 12, 15, 16, 26, 32, 34, 36, 37, 40, 41, 42, 43, 44], "machinelearn": [0, 6, 16, 20, 25, 27, 28, 29, 31, 32, 33, 34, 35, 38, 39, 41, 42, 43, 44], "machineri": 23, "mackai": 32, "macro": 23, "made": [0, 1, 3, 4, 5, 6, 7, 9, 11, 12, 27, 28, 33, 34, 36, 38, 39, 40, 41, 43, 44], "mae": [0, 33], "magic": 4, "magnitud": [1, 6, 7, 13, 21, 34, 36, 39, 40, 41], "mai": [0, 1, 2, 3, 5, 6, 7, 8, 9, 11, 12, 13, 19, 25, 26, 27, 28, 30, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "mail": [29, 31], "main": [0, 1, 3, 4, 5, 6, 7, 9, 24, 26, 27, 28, 32, 34, 35, 36, 38, 39, 41, 43, 44], "mainli": [0, 5, 6, 7, 9, 33, 34, 37, 38, 39], 
"maintain": [6, 36, 37], "major": [1, 6, 9, 10, 13, 26, 33, 35, 36, 37, 38, 41], "make": [1, 2, 3, 4, 5, 6, 7, 8, 11, 12, 13, 15, 16, 18, 19, 21, 22, 23, 24, 25, 26, 27, 28, 30, 32, 33, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "make_axes_locat": 6, "make_classif": [23, 39], "make_moon": [8, 9, 10], "make_pipelin": [0, 6, 10, 34, 37, 38], "makedir": [0, 6, 7, 9, 33, 37, 38], "malcondit": 26, "malign": [1, 7, 9, 39], "mammographi": 5, "manag": [0, 2, 3, 15, 25, 27, 33, 36, 42, 43, 44], "mandatori": [31, 33], "mani": [0, 1, 3, 4, 5, 6, 7, 8, 9, 11, 13, 14, 15, 16, 17, 18, 19, 21, 22, 23, 24, 25, 26, 27, 28, 30, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "manifold": 11, "manner": [3, 23, 43, 44], "manual": [6, 21, 22, 34, 36], "map": [0, 1, 2, 6, 7, 8, 11, 12, 14, 30, 33, 38, 39, 41, 42, 43, 44], "marc": 34, "marchant": [], "margin": [0, 5, 8], "marit": [0, 33], "mark": 33, "markdownfil": [], "markdownit": [], "markdownitdeflist": [], "markedli": [], "marker": [7, 26, 33, 38], "market": 23, "markov": [25, 33], "markup": [], "marsaglia": 30, "mask_or": [], "masked_arrai": [], "maskedrecord": [], "mass": [0, 1, 5, 13, 34, 35, 41], "massag": [0, 33], "masses2016": [0, 33], "masses2016ol": [0, 33], "masses2016tre": 0, "masseval2016": [0, 33], "master": [29, 31], "mat": [25, 33], "mat1100": [25, 33], "mat1110": [25, 33], "mat1120": [25, 33], "match": [1, 4, 5, 13, 14, 15, 34, 35, 36, 41], "materi": [4, 5, 7, 13, 15, 26, 29, 31, 39, 42], "math": [3, 7, 12, 13, 26, 30, 32, 33, 36, 38, 39, 41, 42, 43], "mathbb": [0, 4, 5, 6, 7, 8, 11, 12, 13, 14, 17, 19, 26, 27, 30, 33, 34, 35, 36, 37, 38, 39, 40], "mathbf": [0, 5, 6, 7, 8, 13, 19, 26, 27, 33, 34, 35, 36, 37, 38, 39, 40], "mathcal": [1, 5, 6, 7, 13, 27, 37, 38, 39, 41], "matheemat": 3, "mathemat": [0, 6, 11, 12, 13, 21, 23, 24, 25, 26, 30, 32, 33, 36], "mathemati": 33, "mathrm": [0, 1, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 17, 18, 19, 23, 27, 30, 33, 34, 35, 36, 37, 38, 39, 40, 41, 43, 44], "matmul": [1, 2, 5, 40, 41, 42, 43], "matnat": 32, "matplotlib": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 17, 19, 21, 22, 23, 25, 26, 27, 30, 33, 34, 35, 36, 37, 38, 39, 41, 42, 43, 44], "matplotlibrc": [], "matric": [0, 1, 3, 4, 6, 7, 8, 11, 13, 16, 17, 25, 34, 35, 38, 39, 40, 41, 42], "matrix": [0, 2, 3, 4, 6, 7, 8, 10, 13, 17, 18, 19, 21, 27, 28, 30, 37, 38, 40, 42, 43, 44], "matshow": 1, "matter": [2, 3, 13, 34, 35, 36, 40, 42, 43, 44], "matthia": [], "max": [0, 1, 2, 3, 4, 9, 10, 12, 13, 21, 31, 33, 35, 36, 38, 39, 40, 41, 42, 43, 44], "max_depth": [0, 9, 10], "max_diff": [2, 42, 43], "max_diff1": [2, 42, 43], "max_diff2": [2, 42, 43], "max_it": [0, 1, 8, 13, 28, 33, 39, 41], "max_iter": 14, "max_leaf_nod": 10, "max_pixel": 43, "max_sampl": 10, "maxdegre": [0, 6, 10, 34, 37, 38], "maxdepth": 10, "maxim": [1, 4, 5, 7, 8, 11, 37, 38, 39, 41], "maximum": [0, 2, 3, 5, 7, 8, 9, 10, 13, 14, 33, 34, 35, 36, 42, 43, 44], "maxpolydegre": [5, 6, 34, 35, 36, 37, 38], "maxpool2d": 44, "maxpooling2d": [3, 44], "mayb": 24, "mbox": [5, 6, 34, 35, 37], "mcculloch": [12, 39, 40], "md": 11, "mdoel": 4, "me": [], "mean": [1, 2, 3, 4, 5, 6, 7, 9, 10, 11, 12, 13, 14, 15, 17, 18, 19, 22, 23, 25, 26, 27, 28, 30, 33, 36, 37, 39, 40, 41, 42, 43, 44], "mean0": [38, 39], "mean1": [38, 39], "mean_absolute_error": [0, 33], "mean_divisor": 14, "mean_i": 30, "mean_matrix": 14, "mean_squared_error": [0, 4, 6, 7, 10, 15, 19, 33, 34, 37, 38], "mean_squared_log_error": [0, 33], "mean_vector": 14, "mean_x": 30, "meaning": [0, 4, 7, 33, 38], "meansquarederror": [0, 33], 
"meant": [3, 7, 10, 13, 24, 38, 40, 43, 44], "meanwhil": 36, "measur": [0, 1, 2, 5, 6, 9, 11, 12, 14, 16, 18, 27, 28, 30, 33, 34, 36, 37, 38, 40, 41, 42, 43, 44], "mechan": [0, 4, 30, 33, 36], "median": [0, 33, 34, 36], "medicin": [12, 39, 40], "medium": [4, 8, 13, 28, 36], "medv": [], "meet": [0, 24, 31], "mehta": [0, 33, 34, 35], "member": [20, 27, 28], "memori": [3, 4, 11, 12, 13, 18, 26, 39, 40], "mentat": [], "mention": [0, 12, 13, 24, 27, 28, 30, 33, 35, 36, 39, 40, 43, 44], "merchant": [], "mere": [0, 27, 28], "merg": [], "meshgrid": [2, 5, 6, 8, 9, 10, 11, 41, 42, 43], "mess": 15, "messag": [5, 13], "messi": [2, 42, 43], "messier": 22, "met": [0, 3, 8, 34], "meta": [], "meteorolog": 9, "meter": [6, 34], "method": [0, 1, 2, 3, 4, 5, 7, 8, 11, 12, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 30, 32, 34, 40, 43, 44], "metion": 6, "metric": [0, 1, 3, 6, 7, 9, 10, 14, 15, 21, 22, 23, 28, 33, 34, 37, 38, 39, 41, 42, 44], "metropoli": [25, 33], "mev": [0, 30, 33], "mgd": [13, 36], "mglearn": [25, 33], "mgrid": 13, "mhjensen": [], "mi": 10, "mia": [31, 33], "michael": [28, 40, 41], "micro": 23, "microsoft": 32, "mid": [1, 41], "midel": 4, "midnight": [15, 21, 22, 23, 24], "midpoint": 9, "might": [0, 1, 2, 4, 6, 9, 13, 15, 17, 18, 22, 24, 34, 35, 36, 41, 42, 43], "migth": 17, "mild": 9, "millimet": [6, 34], "million": [0, 33, 34, 36], "mimic": [12, 39, 40], "min": [0, 2, 5, 8, 9, 35, 42, 43], "min_": [0, 2, 5, 14, 17, 33, 34, 35, 42, 43], "min_samples_leaf": 9, "mind": [0, 6, 13, 15, 18, 21, 33, 34, 35, 36, 37], "mindboard": 4, "mine": [25, 33], "mini": [1, 11, 12, 13, 35, 41], "minibatch": [1, 11, 13, 41, 42], "minibathc": [13, 36], "miniforge3": [], "minim": [0, 1, 2, 3, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 16, 17, 34, 35, 36, 37, 41, 44], "minima": [0, 1, 7, 13, 33, 35, 36, 38, 39, 41], "minimum": [0, 1, 2, 6, 8, 9, 11, 13, 34, 35, 36, 37, 38, 39, 41, 42, 43], "minmaxscal": [0, 34, 36, 41, 42], "minor": 30, "minst": [1, 41, 42], "minu": [7, 38], "mirjalili": 33, "mirror": 9, "misc": 6, "misclassif": [8, 9, 10, 23], "misclassifi": [8, 10], "miser": 0, "mismatch": [1, 41], "miss": [7, 10], "mistak": [4, 19], "mit": [32, 43, 44], "mitig": 36, "mix": [1, 2, 33, 41, 42, 43], "mixtur": [13, 36], "mk": [9, 26], "mkdir": [0, 6, 7, 9, 33, 37, 38], "ml": [0, 1, 10, 13, 26, 27, 28, 34, 35, 36, 41, 42, 43], "mlab": 30, "mle": [5, 7, 38, 39], "mlp": [1, 39, 40, 41], "mlpclassifi": [1, 39, 41], "mlpregressor": [0, 33], "mm": 26, "mml": 34, "mn": [12, 30, 39], "mnist": [1, 11, 23, 28, 41, 43], "mnist_784": 28, "mo": [], "mod": 30, "mode": [29, 31, 33, 38, 39, 41, 42], "model": [2, 3, 5, 7, 8, 9, 10, 11, 13, 14, 16, 18, 19, 20, 21, 23, 24, 25, 27, 28, 30, 32, 34, 35, 36, 37, 38, 42], "model_bin": [38, 39], "model_multi": [38, 39], "model_select": [0, 1, 3, 5, 6, 7, 9, 10, 11, 15, 16, 17, 19, 23, 28, 33, 34, 35, 36, 37, 38, 39, 41, 42, 44], "moder": [10, 36], "modern": [0, 6, 7, 25, 33, 36, 37, 38, 39, 40, 41], "modest": 36, "modif": [2, 12, 13, 42, 43], "modifi": [0, 1, 3, 5, 7, 8, 10, 12, 13, 33, 34, 35, 36, 38, 39, 40, 41, 43, 44], "modul": [0, 16, 26, 33, 42, 44], "modular": 30, "modulo": 30, "moe": [11, 34], "moment": [5, 6, 13, 30, 37, 41, 42], "moment_correct": [41, 42], "momentum": [22, 24, 40, 41, 42], "momentum_schedul": [41, 42], "mondai": [31, 33, 38], "monitor": [13, 36, 42], "monoton": [5, 12, 30, 37, 39, 40, 41, 42], "mont": [0, 6, 25, 30, 32, 33, 37, 38], "montli": 16, "moor": [5, 6], "more": [0, 1, 2, 4, 5, 7, 8, 9, 10, 11, 12, 13, 14, 16, 17, 19, 21, 22, 23, 25, 28, 30], "moreov": 
[0, 3, 28, 43, 44], "morten": [31, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "mortenhj": 33, "most": [0, 1, 3, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 21, 22, 25, 27, 28, 30, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "mostli": [1, 11, 18, 36, 41], "motion": [0, 13], "motiv": [1, 4, 40, 41], "moulin": 36, "move": [0, 4, 5, 6, 7, 9, 12, 13, 14, 15, 16, 21, 22, 27, 30, 34, 35, 37, 38, 39, 40, 41, 42, 43, 44], "mpl": [7, 33, 38], "mpl_toolkit": [2, 6, 13, 35, 36, 42, 43], "mplot3d": [2, 6, 13, 35, 36, 42, 43], "mplregressor": [1, 41], "mqa": 43, "mr_": [], "mrecord": [], "ms3tv8fvar": 39, "mse": [0, 4, 5, 6, 9, 10, 15, 16, 17, 19, 20, 22, 27, 28, 33, 34, 35, 36, 37, 38, 41, 42], "mse_der": 22, "mse_simpletre": 10, "mselassopredict": [5, 35], "mselassotrain": [5, 35], "mseownridgepredict": [6, 34, 35, 36], "msepredict": [5, 35], "mseridgepredict": [0, 5, 6, 34, 35, 36], "msetrain": [5, 35], "msg": [], "msle": [0, 33], "mt": [7, 12, 38, 39, 41], "mu": [0, 6, 11, 13, 30, 33, 36, 37], "mu0": 30, "mu1": 30, "mu2": 30, "mu_": [6, 30, 34, 36, 37], "mu_i": [6, 34, 36], "mu_n": 11, "mu_x": 30, "much": [0, 1, 2, 3, 4, 5, 6, 8, 9, 10, 11, 12, 13, 15, 20, 21, 22, 26, 27, 30, 33, 34, 35, 36, 37, 38, 40, 41, 42, 43, 44], "multi": [0, 1, 3, 7, 23, 25, 33, 38, 42, 43, 44], "multi_class": [28, 38, 39], "multiclass": [1, 7, 23, 28, 38, 39], "multiclass_result": [38, 39], "multidimension": [11, 12, 33, 39, 40], "multilay": [1, 41], "multinomi": [7, 28, 38, 39], "multipl": [2, 4, 5, 6, 7, 12, 13, 15, 22, 28, 30, 34, 35, 36, 37, 38, 39, 40, 42], "multipli": [3, 5, 6, 11, 13, 18, 22, 26, 30, 34, 35, 36, 43, 44], "multiplum": 8, "multivari": [0, 2, 10, 11, 25, 30, 33, 42, 43], "multivariate_norm": [11, 14], "multpli": 16, "murphi": [11, 32, 33], "muse": [], "must": [1, 2, 5, 6, 8, 10, 12, 13, 14, 15, 20, 22, 27, 28, 30, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43], "mutat": [7, 38, 39], "mutual": [1, 3, 6, 13, 37, 38, 41, 42, 44], "mx_": 30, "my": 33, "mydata": 23, "myenv": [], "myriad": [0, 25, 33], "myself": [], "mz1": 30, "mz2": 30, "m\u00f8svatn": 6, "n": [0, 1, 2, 3, 4, 5, 6, 7, 8, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 23, 26, 27, 28, 30, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "n0": [38, 39], "n1": [26, 38, 39], "n2": 26, "n8grai": [], "n_": [1, 2, 3, 8, 12, 23, 30, 39, 41, 42, 43, 44], "n_0": [12, 30, 39], "n_boostrap": [6, 10, 37, 38], "n_bootstrap": [6, 37], "n_categori": [1, 3, 41, 42, 44], "n_class": [23, 38, 39], "n_cluster": 14, "n_compon": 11, "n_epoch": [13, 36, 41, 42], "n_estim": 10, "n_examples_to_gener": 4, "n_featur": [1, 18, 23, 38, 39, 40, 41, 42], "n_filter": [3, 44], "n_hidden": [2, 42, 43], "n_hidden_neuron": [0, 1, 33, 40, 41], "n_i": [23, 30], "n_inform": 23, "n_input": [0, 1, 3, 34, 40, 41, 42, 44], "n_instanc": 9, "n_iter": 36, "n_job": 10, "n_k": 14, "n_l": [12, 30, 39], "n_layer": 1, "n_m": 9, "n_neuron": 1, "n_neurons_connect": [3, 44], "n_neurons_layer1": [1, 41, 42], "n_neurons_layer2": [1, 41, 42], "n_output": [40, 41], "n_point": 14, "n_redund": 23, "n_sampl": [6, 8, 9, 10, 14, 18, 23, 37, 38, 39], "n_split": [6, 37, 38], "n_step": 4, "n_t": [2, 42, 43], "n_x": [2, 42, 43], "nabla": [1, 13, 35, 36, 41], "nabla_": [2, 13, 35, 36, 42, 43], "nabla_w": 13, "nafter": [41, 42], "nag": 13, "naimi": [0, 33], "naiv": [7, 38, 39], "naive_kmean": 14, "name": [0, 1, 3, 4, 5, 6, 7, 8, 9, 10, 12, 13, 14, 15, 18, 20, 21, 24, 25, 26, 27, 28, 30, 31, 33, 34, 35, 37, 38, 39, 40, 41, 42, 43, 44], "namespac": [], "nan": [41, 42], "narrow": [13, 36], "nathaniel": [], "nation": 
[1, 5, 41], "nativ": [25, 33], "natur": [0, 1, 4, 8, 9, 12, 13, 27, 28, 30, 32, 33, 35, 36, 39, 40, 41], "navier": [12, 39, 40], "navig": [15, 36], "nb": 30, "nb_": 26, "nbconvert": 33, "nd": 14, "ndarrai": [6, 41, 42], "nderiv": [41, 42], "ne": [9, 10, 26, 30, 34, 35], "nearest": [1, 3, 6, 11, 41, 42, 44], "nearli": [13, 35], "neat": 33, "neccesari": [6, 37], "necess": [2, 42, 43], "necessari": [0, 1, 3, 4, 8, 14, 18, 33, 40, 41, 42, 44], "necessarili": [0, 4, 11, 30, 33], "necesserali": 5, "neck": [7, 38, 39], "need": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 18, 19, 20, 21, 22, 24, 26, 28, 30, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "neg": [0, 1, 3, 5, 6, 7, 10, 13, 23, 26, 30, 33, 35, 37, 38, 39, 41, 42, 43, 44], "neg_mean_squared_error": [6, 37, 38], "neglect": [30, 36], "neglig": 30, "neighbor": [3, 6, 11, 43, 44], "neither": [4, 13, 36], "neq": [13, 14, 23, 30, 35], "nerual": [41, 42], "nervou": [12, 39, 40], "nest": [9, 12, 39], "nesterov": 13, "net": [2, 4, 12, 28, 39, 40, 42, 43], "netlib": [26, 33], "network": [0, 9, 13, 21, 22, 23, 24, 25, 32, 34], "network_input_s": [21, 22], "neural": [0, 13, 21, 22, 23, 24, 25, 32, 34, 38], "neural_network": [0, 1, 2, 33, 39, 41, 42, 43], "neuralnet": 42, "neuralnetwork": [1, 22, 41], "neuralnetworksanddeeplearn": [28, 40, 41], "neuron": [1, 2, 3, 4, 12, 41, 42], "neutral": [0, 33], "neutron": [0, 33], "never": [1, 4, 6, 9, 30, 37, 38, 41], "new": [0, 1, 2, 3, 5, 6, 7, 8, 9, 10, 11, 13, 14, 15, 17, 20, 22, 26, 33, 34, 35, 36, 38, 39, 41, 42], "new_chang": [13, 36], "new_hobbit": 33, "new_ma": [], "newaxi": [0, 3, 6, 9, 21, 37, 38, 44], "newli": [0, 33], "newlin": [38, 39], "newton": [1, 7, 8, 13, 30, 40, 41], "next": [0, 1, 2, 3, 4, 5, 6, 8, 9, 13, 14, 15, 16, 21, 22, 23, 33, 34, 35, 36, 37, 39, 40, 41, 42, 43, 44], "next_guess": 13, "next_input": 4, "ng": [1, 41], "nhow": [41, 42], "ni": 14, "nice": [0, 1, 5, 11, 22, 33, 34, 35, 41], "nicer": [18, 36], "nielsen": [28, 40, 41], "nine": [40, 41], "nip": 36, "niter": [13, 35, 36], "nitric": [], "nlambda": [0, 5, 6, 34, 35, 36, 37, 38], "nlp": 32, "nm": 30, "nm_n": [0, 33], "nmse": [6, 37, 38], "nn": [2, 5, 6, 12, 24, 26, 33, 37, 39], "nn_model": 1, "nnmin": [2, 42, 43], "no_grad": [42, 44], "node": [1, 3, 9, 10, 12, 21, 24, 28, 39, 42, 43, 44], "nois": [0, 4, 5, 6, 8, 9, 10, 13, 18, 19, 27, 33, 34, 35, 36, 37, 38, 43], "noise_dimens": 4, "noisi": [1, 6, 27, 36, 37, 38, 41], "nomask": [], "non": [0, 1, 3, 5, 6, 7, 9, 10, 11, 12, 13, 14, 18, 21, 24, 26, 30, 33, 34, 35, 37, 38, 39, 40, 41, 42, 43, 44], "nondifferenti": 36, "none": [0, 1, 2, 4, 5, 9, 10, 13, 30, 33, 34, 38, 39, 40, 41, 42, 43], "noninfring": [], "nonlinear": [3, 6, 8, 9, 11, 12, 37, 38, 39, 40, 43, 44], "nonneg": [6, 9, 13, 35, 37, 38], "nonparametr": 6, "nonsens": 30, "nonsingular": 26, "nonumb": [3, 7, 8, 13, 26, 38, 39], "nor": [1, 4, 13, 22, 36, 40, 41], "norm": [0, 1, 5, 6, 8, 11, 13, 18, 33, 34, 35, 36, 37, 40, 41], "normal": [3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 16, 17, 18, 19, 21, 23, 25, 26, 27, 28, 30, 33, 34, 35, 36, 38, 39, 40, 42, 43, 44], "normali": [26, 33], "norwai": [6, 27, 28, 33, 35, 36, 37, 39, 40, 41, 42, 43], "notabl": [], "notat": [0, 2, 5, 6, 13, 14, 30, 33, 34, 35, 37, 38, 40, 41, 42, 43], "note": [0, 1, 2, 3, 4, 5, 6, 7, 8, 11, 12, 13, 14, 15, 16, 18, 22, 25, 26, 30, 32, 33, 36, 37, 38, 39, 40, 41, 42, 43, 44], "notebook": [0, 1, 3, 9, 15, 16, 19, 20, 21, 22, 25, 27, 28, 33, 37, 40, 41, 42, 44], "noteworthi": 36, "noth": [1, 2, 5, 8, 12, 14, 30, 34, 35, 39, 41, 42, 43], "notic": [4, 
5, 12, 13, 22, 24, 26, 30, 33, 40, 41], "notimplementederror": [41, 42], "notion": [3, 43, 44], "noutput": [41, 42], "novel": [3, 6, 10, 33, 44], "novemb": [1, 31, 33, 41, 42], "now": [0, 2, 4, 5, 6, 7, 8, 10, 11, 12, 14, 15, 16, 19, 21, 22, 25, 26, 27, 28, 30, 33, 34, 39, 40, 41, 42, 43, 44], "nowadai": [0, 1, 3, 9, 25, 33, 41, 42, 43, 44], "nox": [], "np": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 14, 15, 16, 17, 18, 19, 21, 22, 23, 26, 30, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "npm": [], "npr": [2, 42, 43], "nsampl": [6, 37, 38], "nt": [2, 42, 43], "nu": 30, "nuclear": [5, 34, 35], "nuclei": [0, 30, 33], "nucleon": [0, 33], "nucleu": [0, 33], "num": 4, "num_coordin": [2, 42, 43], "num_epoch": [42, 44], "num_equ": [41, 42], "num_hidden_neuron": [2, 42, 43], "num_it": [2, 18, 42, 43], "num_neuron": [2, 42, 43], "num_neurons_hidden": [2, 42, 43], "num_not": [41, 42], "num_point": [2, 42, 43], "num_tre": 10, "num_valu": [2, 42, 43], "number": [1, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 16, 17, 18, 19, 21, 23, 26, 27, 28, 29, 31, 33, 35, 37, 38, 39, 41], "numberid": [7, 38], "numberparamet": [3, 43, 44], "numer": [0, 5, 6, 9, 10, 11, 12, 13, 21, 25, 26, 32, 33, 34, 35, 36, 37, 38, 39, 40], "numpi": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 21, 22, 23, 25, 27, 30, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "numpydocstr": [], "nunmpi": [5, 34], "nve_frngahw": 35, "nx": [2, 42, 43], "ny": [30, 41, 42], "o": [0, 1, 4, 5, 6, 7, 8, 9, 11, 26, 31, 32, 33, 34, 35, 36, 37, 38, 39, 43, 44], "obei": [6, 11, 13, 34, 36], "object": [0, 1, 4, 8, 10, 15, 19, 26, 33, 36, 40, 42], "obliqu": [5, 34, 35], "observ": [0, 1, 3, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 30, 33, 35, 36, 37, 38, 39, 43, 44], "obtain": [0, 1, 5, 6, 7, 8, 9, 10, 12, 13, 14, 17, 23, 26, 27, 28, 30, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "obviou": [5, 6, 11, 30, 34, 35], "obviouli": 33, "obvious": [0, 4, 5, 6, 26, 33, 37], "oc": [34, 35], "occupi": [], "occur": [0, 6, 8, 9, 23, 26, 30, 33], "octob": [21, 22, 23, 24, 28, 31, 33, 39], "od": 0, "odd": [0, 3, 7, 33, 34, 36, 38, 39, 43, 44], "odenum": [2, 42, 43], "odesi": [2, 42, 43], "oen": 0, "off": [1, 3, 4, 5, 9, 13, 20, 23, 28, 30, 36, 37, 41, 42, 43, 44], "offer": [6, 11, 25, 26, 29, 31, 33, 37, 38], "offic": [31, 33], "offici": [29, 33], "offlin": [21, 22], "often": [0, 1, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 18, 19, 21, 23, 25, 26, 27, 28, 30, 33, 34, 35, 36, 37, 38, 39, 40, 41, 43, 44], "ofter": [26, 33], "og": [41, 42], "ogo": [43, 44], "ol": [0, 13, 17, 19, 28, 34, 36, 38], "old": [1, 5, 10, 13, 15, 18, 38, 39, 41], "old_ma": [], "oliph": [], "ols_paramet": 16, "ols_sk": 6, "ols_svd": 6, "olsbeta": 35, "olstheta": [0, 5], "omega": [2, 3, 6, 42, 43], "omega_0": 3, "omit": [0, 5, 33, 34, 35, 37], "onc": [1, 6, 9, 11, 13, 20, 37, 38, 41, 42], "one": [0, 1, 3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 14, 15, 19, 20, 21, 23, 24, 25, 26, 27, 28, 30, 31, 33, 34, 36, 37, 38, 39], "one_hot": [38, 39], "one_hot_predict": 21, "onehot": [1, 41, 42], "onehot_vector": [1, 41], "onehotencod": 9, "ones": [0, 2, 5, 6, 8, 9, 10, 11, 13, 16, 18, 21, 22, 26, 27, 33, 34, 35, 36, 37, 38, 40, 42, 43], "ones_lik": 4, "ong": 34, "onl": [3, 43, 44], "onli": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 17, 18, 19, 20, 21, 22, 23, 24, 26, 27, 28, 30, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "onlin": [11, 15, 20, 29, 36, 40, 41], "onto": [5, 11, 34, 35], "open": [0, 1, 4, 6, 7, 9, 15, 25, 27, 29, 31, 33, 37, 38, 39, 41, 42, 43, 44], "oper": 
[0, 1, 3, 5, 6, 10, 11, 12, 13, 15, 16, 21, 22, 23, 25, 30, 33, 34, 35, 36, 37, 39, 41, 42, 43, 44], "operation": 30, "oplu": 30, "opmiz": [13, 36], "opportun": 0, "oppos": [6, 13], "opposit": [1, 5, 8, 34, 35, 41], "opt": [1, 5, 27, 28, 33, 35, 41, 42], "optim": [0, 2, 3, 4, 5, 6, 7, 9, 10, 11, 14, 16, 17, 19, 21, 22, 24, 27, 28, 37, 42, 43, 44], "optimis": [1, 3, 41, 42, 44], "option": [0, 1, 3, 5, 6, 8, 11, 15, 18, 23, 26, 34, 36, 37, 41, 42, 43, 44], "optmiz": [1, 8, 13, 34, 41], "oral": 33, "orang": 0, "order": [0, 1, 2, 3, 5, 6, 7, 8, 9, 10, 11, 12, 15, 19, 21, 26, 27, 28, 30, 33, 34, 35, 37, 38, 39, 40, 41, 42, 43, 44], "ordinari": [0, 2, 3, 7, 11, 13, 17, 18, 24, 25, 37, 38, 39, 44], "oreilli": [32, 33], "org": [0, 3, 4, 16, 20, 21, 25, 26, 27, 28, 32, 33, 34, 35, 36, 40, 42], "organ": [6, 7, 10, 26, 37, 38], "orgin": 40, "orient": [1, 5, 30, 34, 35, 42], "origin": [0, 3, 5, 6, 8, 11, 12, 13, 15, 26, 33, 34, 35, 36, 37, 38, 39, 43, 44], "originals": 43, "orthogn": [5, 34, 35], "orthogon": [0, 5, 6, 8, 11, 13, 26, 33, 34, 35, 43], "orthonorm": [5, 34, 35], "os": [31, 33], "oscar": [1, 41], "oscil": [3, 13, 36], "oskar": 33, "oskarlei": 33, "osl": 18, "oslo": [0, 25, 27, 28, 29, 31, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "osx": [0, 25, 27, 33], "other": [0, 1, 2, 3, 5, 6, 7, 8, 10, 13, 14, 16, 19, 21, 22, 25, 29, 30, 31, 32, 34, 35, 36, 37, 38, 43, 44], "otherwis": [0, 1, 4, 7, 13, 26, 28, 33, 36, 38, 39, 41], "ouput": [5, 7, 12, 37, 38, 42], "our": [1, 2, 3, 6, 7, 8, 9, 10, 12, 14, 15, 16, 17, 18, 19, 21, 25, 26, 30, 36, 37, 40, 43, 44], "ourmodel": 0, "ourselv": [0, 5, 6, 8, 11, 13, 33, 34, 35, 37], "out": [0, 1, 2, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 15, 16, 21, 22, 25, 26, 27, 28, 30, 33, 34, 36, 37, 38, 39, 40, 41, 42, 43, 44], "out_deriv": [41, 42], "out_fil": 9, "outcom": [0, 7, 9, 10, 12, 23, 30, 34, 38, 39], "outdoor": 9, "outer": [6, 12, 13], "outfil": 4, "outlier": [0, 8, 33, 34, 36], "outlin": [6, 10, 11, 37, 38], "outlook": 9, "outperform": [10, 36], "output": [0, 1, 3, 4, 5, 6, 7, 8, 9, 10, 12, 13, 19, 21, 22, 23, 24, 26, 27, 28, 30, 33, 34, 35, 36, 37, 38, 39, 44], "output_bia": [1, 41], "output_bias_gradi": [1, 40, 41], "output_func": [41, 42], "output_nod": [41, 42], "output_shap": 4, "output_weight": [1, 41], "output_weights_gradi": [1, 40, 41], "outputlayer1": [12, 39], "outputlayer2": [12, 39], "outsid": [4, 22], "over": [0, 1, 3, 4, 5, 6, 9, 10, 12, 13, 15, 16, 19, 22, 23, 24, 26, 27, 33, 34, 35, 36, 37, 38, 42, 43, 44], "over1": 13, "overal": [1, 10, 36, 41], "overcast": 9, "overcom": [12, 13, 39, 40], "overdetermin": [0, 33], "overfit": [0, 1, 3, 6, 9, 10, 13, 24, 28, 36, 37, 38, 41, 42, 43, 44], "overflow": [5, 36, 37], "overflowerror": [41, 42], "overhead": [12, 40, 41], "overlap": [3, 7, 8, 9, 39, 43, 44], "overleaf": [20, 24, 27, 28], "overlin": [0, 5, 6, 9, 10, 11, 14, 26, 33, 34, 36], "overshoot": 36, "overst": 0, "overtrain": 4, "overview": [3, 20], "overwhelm": 43, "overwritten": [41, 42], "own": [4, 5, 6, 8, 12, 13, 16, 18, 22, 23, 25, 26, 35, 36, 37, 40, 41, 43, 44], "owner": [], "ownmsepredict": 0, "ownmsetrain": 0, "ownridgebeta": 34, "ownridgetheta": [0, 6, 34, 35, 36], "ownypredictridg": 0, "ownytilderidg": 0, "ox": [], "oxid": [], "p": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 14, 15, 16, 17, 18, 19, 23, 26, 30, 33, 34, 35, 36, 37, 38, 39, 41, 42, 43, 44], "p0": [2, 42, 43], "p1": [2, 42, 43], "p_": [2, 4, 8, 9, 42, 43], "p_hidden": [2, 42, 43], "p_i": [5, 30], "p_j": 30, "p_n": 30, "p_output": [2, 42, 43], "p_x": 30, "pa": 40, "pack": 
[0, 33], "packag": [0, 1, 3, 4, 5, 8, 11, 13, 15, 20, 22, 25, 27, 28, 30, 34, 35, 36, 41, 42, 44], "packtpub": 33, "packtpublish": 33, "pad": [3, 4], "page": [0, 24, 25, 27, 28, 33, 35, 36, 37, 38], "pai": [0, 1, 9, 13, 15, 36, 41], "pair": [0, 2, 3, 9, 25, 30, 33, 42, 43, 44], "paltform": 15, "panda": [0, 4, 5, 6, 7, 9, 11, 25, 27, 35, 36, 37, 38, 39], "pandoc": [], "panel": 33, "paper": [1, 36, 42], "paper_fil": 36, "paradigm": [0, 33], "paragraph": 20, "parallel": [10, 13, 25, 26, 33], "param": [2, 42, 43], "paramat": [2, 42, 43], "paramet": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 13, 16, 17, 18, 19, 21, 22, 24, 27, 28, 30, 35, 36, 37, 42], "parameter": [0, 6, 10, 33, 34], "parametr": [0, 6, 33, 34, 37, 38], "paramt": [3, 5, 37, 40, 43, 44], "parent": 40, "parser": 28, "part": [0, 1, 3, 5, 6, 10, 17, 19, 20, 21, 22, 24, 26, 29, 30, 31, 33, 34, 37, 43], "partial": [0, 1, 5, 6, 7, 8, 10, 11, 12, 13, 16, 21, 30, 33, 34, 35, 36, 38, 39, 40, 41], "particip": [15, 25, 29, 31, 33], "particl": [0, 4, 13, 30, 33], "particular": [0, 1, 2, 3, 5, 6, 9, 10, 11, 12, 13, 16, 27, 30, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "particularli": [5, 6, 8, 11, 13, 30, 34, 35, 36, 37, 38], "partit": [1, 4, 9, 41], "partli": [6, 33], "partner": [15, 24, 27, 28], "pass": [2, 3, 12, 14, 21, 36, 40, 42, 43, 44], "password": [27, 28], "past": [10, 30, 36], "patch": [6, 30, 37, 43, 44], "path": [0, 4, 6, 7, 9, 25, 33, 36, 37, 38, 43], "pathcollect": 17, "patholog": [], "patient": [7, 38, 39], "patter": 4, "pattern": [0, 3, 4, 12, 32, 33, 36, 39, 40, 43, 44], "paul": [], "pauli": [0, 33], "pav": [], "pc": [11, 15, 25], "pca": [0, 7, 25, 33, 34, 39], "pd": [0, 4, 5, 6, 7, 9, 11, 33, 34, 35, 36, 37, 38, 39], "pde": [2, 42, 43], "pdf": [0, 3, 4, 5, 6, 9, 15, 16, 19, 20, 24, 27, 28, 32, 33, 37, 42], "pedagog": [0, 33, 34], "penal": [6, 18, 34, 36], "penalti": [6, 13, 18, 27, 34, 36], "penros": [5, 6], "pentagon": [13, 35], "peopl": [1, 9, 13, 24, 25, 27, 28, 36, 41], "per": [0, 1, 6, 21, 23, 24, 28, 29, 31, 33, 36, 37, 38, 39, 41], "perc_print": [41, 42], "percent": 23, "percentag": [10, 11, 31, 41, 42], "perceptron": [0, 1, 7, 33, 38], "peregrin": 33, "perez": [], "perfect": [0, 1, 13, 23, 33, 36, 41], "perfectli": [4, 6, 37, 38], "perform": [0, 2, 3, 4, 5, 6, 8, 10, 11, 12, 13, 14, 16, 18, 19, 21, 22, 23, 24, 25, 26, 27, 28, 30, 33, 34, 35, 36, 37, 38, 39, 40, 42], "performac": 4, "perhap": [0, 5, 13, 33, 34, 35, 36], "perimet": 1, "period": [1, 4, 30, 41], "permiss": 15, "permit": [], "permut": 11, "persist": 13, "person": [5, 6, 7, 16, 20, 29, 31, 33, 34, 38], "perspect": 32, "pertin": [12, 28, 33, 40, 41, 42], "petal": [8, 9], "peter": [32, 34], "petersen": [40, 41], "phantom": 30, "phase": [6, 12, 39, 40], "phd": [42, 43], "phenomena": 30, "phenomenon": 36, "phi": 8, "phi_k": 8, "philipp": [40, 41], "philosophi": 13, "phone": [31, 33], "photo": [4, 33], "photo1": 43, "php": [27, 28], "phrase": [0, 33], "physic": [0, 1, 4, 7, 12, 13, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "pi": [2, 3, 5, 6, 7, 9, 12, 13, 30, 37, 38, 39, 41, 42, 43], "pick": [1, 9, 10, 11, 13, 14, 24, 27, 28, 36, 41], "pickl": 1, "pictur": [0, 33], "pie": [25, 33], "piec": [11, 14, 21], "pierr": [], "pil": 43, "pillow": [0, 25, 27, 33], "pinv": [5, 6, 13, 27, 34, 35, 36, 39], "pip": [0, 1, 15, 25, 27, 33, 41, 42], "pip3": [0, 1, 27, 33, 41, 42], "pipelin": [0, 6, 8, 10, 34, 37, 38], "pippin": 33, "pit": 4, "pitfal": [6, 34], "pitt": [12, 39, 40], "pixel": [1, 3, 4, 28, 33, 41, 42, 43, 44], "pixel_height": [1, 3, 41, 
42, 44], "pixel_width": [1, 3, 41, 42, 44], "pkg_resourc": [], "pkgutil": [], "place": [0, 4, 6, 8, 13, 15, 26, 27, 33, 35, 37], "plai": [0, 3, 4, 5, 6, 8, 11, 18, 22, 25, 27, 33, 34, 35, 37, 38, 40, 41, 43, 44], "plain": [8, 10, 12, 13, 14, 24, 27, 28, 35, 36, 40, 41], "plan": [6, 9, 31, 32, 33, 41], "plane": [8, 9], "plateau": [5, 35, 36], "platform": [25, 33], "plausibl": [12, 39, 41], "playlist": [43, 44], "plc1qu": 43, "pleas": [13, 27, 28, 31, 33], "plenti": [1, 41], "plethora": [3, 12, 39, 40, 43, 44], "pliahhy2ibx9hdharr6b7xevztgzra1p": [39, 40, 41], "plot": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 16, 17, 19, 20, 21, 23, 25, 26, 27, 28, 30, 33, 34, 35, 36, 39, 41, 42, 43, 44], "plot_all_sc": [27, 34], "plot_confusion_matrix": [7, 10, 23, 39], "plot_count": 6, "plot_cumulative_gain": [7, 10, 23, 39], "plot_data": 1, "plot_dataset": 8, "plot_decision_boundari": [9, 10], "plot_import": 10, "plot_iris_dataset": 21, "plot_max": 4, "plot_min": 4, "plot_model": 4, "plot_numb": 4, "plot_predict": 8, "plot_regression_predict": 9, "plot_result": 4, "plot_roc": [7, 10, 23, 39], "plot_surfac": [2, 6, 13, 42, 43], "plot_train": 9, "plot_tre": [9, 10], "plqvvvaa0qudcjd5baw2dxe6of2tius3v3": [39, 40, 41], "plt": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 17, 19, 21, 22, 23, 26, 30, 33, 34, 35, 36, 37, 38, 39, 41, 42, 43, 44], "plu": [0, 3, 5, 7, 18, 33, 34, 38, 43, 44], "plugin": [], "plzhqobowtqdnu6r1_67000dx_zcjb": [43, 44], "pm": [8, 37], "pmatrix": [2, 42, 43], "pml": 32, "pn": 3, "png": [0, 4, 6, 7, 9, 33, 37, 38], "point": [0, 1, 2, 3, 5, 6, 7, 8, 9, 10, 11, 13, 14, 18, 19, 20, 23, 26, 27, 30, 31, 33, 34, 35, 36, 37, 38, 39, 41, 42, 43], "point_1": 4, "point_2": 4, "poisson": [25, 30, 33], "poli": [6, 8, 37, 38], "poly100_kernel_svm_clf": 8, "poly3": 0, "poly3_plot": 0, "poly_degre": [41, 42], "poly_featur": [8, 9, 15], "poly_features10": 9, "poly_fit": 9, "poly_fit10": 9, "poly_kernel_svm_clf": 8, "poly_model": 15, "poly_ms": 15, "poly_predict": 15, "polydegre": [0, 5, 6, 10, 34, 37, 38], "polygon": [13, 35], "polym": [12, 39, 40], "polymi": 27, "polynomi": [0, 5, 6, 7, 8, 9, 10, 11, 15, 17, 19, 20, 27, 28, 33, 34, 36, 37, 38, 39, 40], "polynomial_featur": [6, 15, 16, 17, 37, 38], "polynomial_svm_clf": 8, "polynomialfeatur": [0, 6, 8, 9, 15, 16, 19, 34, 37, 38], "polytrop": [0, 6, 37, 38], "pool": 3, "pool_siz": [3, 44], "poor": [1, 13, 35, 36, 41], "poorli": [0, 34], "popul": [0, 5, 23, 33, 34], "popular": [0, 1, 3, 6, 7, 8, 9, 11, 12, 15, 24, 25, 26, 27, 30, 34, 38, 39, 41, 43, 44], "popularli": [0, 33], "portabl": 10, "portion": [11, 13, 36], "pose": [0, 4, 5, 6, 11, 30, 33, 37], "posit": [0, 1, 2, 3, 5, 7, 8, 10, 11, 13, 14, 21, 23, 26, 30, 33, 34, 35, 36, 38, 39, 41, 42, 43], "possibl": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 21, 24, 25, 26, 27, 28, 30, 31, 33, 34, 35, 36, 37, 38, 39, 40, 41, 44], "possibli": [6, 8, 13, 27], "post": [], "posterior": 5, "postpon": [0, 34], "postscript": [27, 28], "postul": 5, "potenti": [0, 3, 5, 6, 12, 13, 34, 36, 37, 39, 40], "pott": [12, 39, 40], "power": [0, 1, 5, 6, 8, 9, 12, 13, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "pp": [5, 6, 19, 37, 40, 41], "pr": 23, "practic": [0, 5, 6, 7, 8, 16, 18, 19, 21, 23, 27, 28, 30, 34, 37, 38, 39, 43, 44], "practition": [0, 1, 3, 33, 36, 41, 42, 43, 44], "pre": 33, "preambl": [], "precalcul": 40, "preced": [1, 11, 12, 30, 39, 41], "preceed": [4, 41, 42], "preceq": 8, "precis": [0, 2, 5, 11, 13, 26, 27, 28, 30, 33, 34, 36, 37, 40, 42, 43], "pred": [6, 23, 37, 38, 39], 
"pred_train": [41, 42], "pred_val": [41, 42], "predicit": 0, "prediciton": [41, 42], "predict": [0, 1, 5, 6, 7, 8, 9, 10, 15, 16, 17, 19, 22, 23, 24, 25, 27, 28, 32, 33, 34, 35, 36, 37, 38, 39, 41, 42, 44], "predict_prob": [1, 38, 39, 41], "predict_proba": [7, 10, 23, 39], "predictedlabel": [38, 39], "predictor": [0, 5, 6, 7, 9, 10, 11, 33, 34, 36], "prefer": [0, 1, 6, 8, 9, 11, 13, 15, 20, 23, 25, 27, 28, 33, 41], "prefil": [], "premier": 43, "prepar": [0, 6, 26, 27, 28, 33, 34], "preprocess": [0, 4, 6, 7, 8, 9, 10, 11, 15, 16, 17, 18, 19, 23, 27, 37, 38, 39, 41, 42], "prerequisit": 0, "prescript": [27, 28], "presenc": 13, "present": [0, 5, 6, 7, 9, 12, 13, 26, 27, 28, 30, 33, 34, 35, 36, 39, 40, 41, 42], "preserv": [3, 11, 26, 43, 44], "press": [13, 15, 32, 35, 40, 41], "pretrain": [1, 4, 41], "pretti": [0, 4, 8, 9, 21, 25, 27, 33], "prettier": [], "prev_centroid": 14, "prevent": [13, 30, 36], "previou": [0, 1, 2, 3, 4, 5, 6, 8, 10, 11, 12, 13, 15, 16, 21, 22, 26, 27, 28, 30, 34, 35, 36, 39, 40, 41, 42, 43, 44], "previous": [2, 3, 9, 10, 30, 42, 43, 44], "price": [0, 4, 9, 13, 36], "primal": 8, "primari": [0, 7, 33, 38, 39], "prime": 30, "princip": [0, 5, 7, 25, 33, 34, 35, 39], "principl": [0, 6, 7, 8, 14, 33, 37, 38, 39, 43, 44], "print": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 14, 18, 21, 22, 23, 26, 28, 30, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "print_funct": [8, 9], "print_length": [41, 42], "printout": [0, 33], "prior": [0, 5, 6, 33], "privat": 0, "pro": 28, "prob": [1, 30, 38, 39], "probabilist": [0, 23, 32, 33, 34], "probabl": [0, 1, 3, 4, 6, 7, 10, 13, 21, 23, 25, 33, 34, 36, 38, 39, 41, 42, 44], "problem": [0, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 17, 23, 24, 25, 26, 27, 28, 30, 37, 44], "probml": 32, "proce": [0, 5, 6, 7, 8, 9, 10, 11, 13, 26, 33, 34, 37, 40], "procedur": [2, 4, 5, 6, 8, 10, 11, 13, 34, 35, 36, 37, 38, 42, 43], "proceed": 26, "process": [0, 2, 4, 6, 9, 10, 12, 13, 25, 26, 27, 30, 32, 33, 35, 36, 37, 38, 39, 40], "procur": [], "prod": 32, "prod_": [1, 5, 7, 37, 38, 39, 41], "produc": [0, 3, 4, 5, 6, 9, 10, 11, 12, 13, 18, 20, 25, 26, 27, 28, 30, 33, 34, 37, 39, 40, 43, 44], "product": [0, 1, 3, 5, 6, 7, 8, 12, 13, 16, 17, 25, 26, 33, 34, 36, 37, 38, 39, 40, 41], "profess": [0, 33], "profit": [], "progag": 28, "program": [0, 1, 4, 5, 6, 8, 12, 14, 15, 25, 26, 29, 30, 31, 33, 34, 39, 41], "programm": 26, "progress": [1, 4, 14, 36, 38, 39, 41, 42], "prohibit": [6, 37, 38], "project": [0, 1, 2, 3, 5, 11, 13, 15, 19, 22, 23, 24, 25, 29, 34, 35, 36, 37, 38, 39, 41, 42, 43, 44], "project_root_dir": [0, 6, 7, 9, 33, 37, 38], "promin": [12, 39, 40], "promis": 8, "promot": [31, 33], "prompt": 20, "prone": [9, 15, 21, 40], "pronounc": [13, 25, 33, 36], "proof": [0, 11, 12, 13, 33, 35, 37, 38, 40], "prop": [28, 36, 41, 42], "prop_cycl": [], "propag": [2, 3, 13, 21, 22, 28, 36, 44], "proper": [0, 2, 6, 7, 20, 37, 38, 42, 43], "properli": [1, 6, 8, 10, 13, 18, 20, 27, 28, 36, 41], "properti": [0, 1, 3, 12, 13, 16, 26, 33, 37, 39, 41, 42, 43, 44], "propgag": 40, "proport": [0, 1, 5, 9, 11, 13, 30, 33, 34, 41], "propos": [1, 4, 6, 10, 27, 28, 33, 36, 41], "propto": [5, 13, 35, 36], "proton": [0, 33], "prove": [3, 13, 35, 36, 43, 44], "provid": [0, 1, 3, 4, 5, 6, 8, 9, 10, 12, 13, 20, 21, 22, 25, 26, 27, 28, 30, 33, 34, 35, 36, 37, 38, 40, 41, 43, 44], "proxi": [1, 13, 36, 41], "prune": 9, "pseudo": [26, 30, 36], "pseudocod": [27, 28], "pseudoinv": 5, "pseudoinvers": [5, 6, 27], "pseudorandom": [6, 30, 37], "psychologi": [0, 33], "pt": 13, "public": [0, 15, 25, 33], 
"publish": [40, 41], "pull": 15, "punish": [0, 1, 33, 41, 42], "pure": [3, 9, 30], "purest": 9, "puriti": 9, "purpos": [0, 3, 10, 12, 14, 21, 33, 39, 40, 43, 44], "push": 15, "put": [1, 20, 27, 28, 36], "putmask": [], "py": 5, "pybtex": [], "pycod": 33, "pydata": 25, "pydevd_extension_api": [], "pydevd_plugin": [], "pydevd_plugin_plugin_nam": [], "pydot": 9, "pygment": [], "pyhton2": 33, "pylab": [7, 33, 38], "pypi": 25, "pyplot": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 17, 19, 21, 22, 23, 26, 30, 33, 34, 35, 36, 37, 38, 39, 41, 42, 43, 44], "pythagora": 5, "python": [1, 2, 3, 5, 6, 8, 11, 12, 13, 14, 18, 20, 21, 22, 27, 28, 30, 34, 36, 40, 41, 42, 43], "python2": [0, 27], "python3": [0, 25, 27, 33], "pythonpath": [], "pytorch": [0, 25, 27, 28, 33, 40, 41], "pyzmq": [], "q": [5, 6, 8, 11, 30, 37, 41, 42], "qp": 8, "qqoghlgkig0": 43, "qquad": [2, 11, 13, 23, 26, 36, 42, 43], "qr": [5, 6, 26, 34, 35], "quad": [1, 13, 23, 26, 41], "quadrat": [0, 8, 9, 13, 33], "qualit": [4, 9, 27, 28, 30], "qualiti": [0, 9, 25, 33, 34, 40], "quantifi": [1, 23, 41], "quantil": 10, "quantit": [0, 6, 9, 27, 28, 33, 37, 38], "quantiti": [0, 2, 5, 6, 7, 9, 10, 11, 12, 14, 16, 26, 30, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43], "quantum": [4, 12, 32, 33, 39, 40], "quartil": [0, 34, 36], "quasi": 40, "quench": 5, "queri": 9, "question": [0, 5, 6, 9, 11, 12, 13, 24, 27, 28, 31, 33, 34, 36, 37, 40, 41], "qugan": 4, "quick": [4, 30], "quicker": 36, "quickli": [1, 3, 9, 11, 13, 35, 36, 41, 42, 43, 44], "quit": [1, 5, 6, 9, 10, 12, 15, 22, 34, 35, 37, 38, 39, 41, 42], "quot": 4, "r": [0, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 17, 25, 26, 27, 30, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "r2": [0, 5, 6, 19, 33, 34, 35], "r2_score": [0, 33], "r2score": [0, 33], "r_": 36, "r_0": 36, "r_1": 9, "r_2": 9, "r_j": 9, "r_m": 9, "r_t": 36, "rad": [], "rade": [], "radial": [8, 12, 39, 40], "radioact": 30, "radiu": [0, 1, 34, 36], "radziej": [], "ragan": [], "rain": 9, "rais": [41, 42], "ram": 36, "ramanujam": [], "ramp": [1, 41], "ran0": 30, "ran1": 30, "ran2": 30, "ran3": 30, "rand": [0, 4, 5, 6, 9, 10, 13, 15, 19, 21, 22, 26, 33, 34, 35, 36, 37, 38, 41, 42], "randint": [6, 9, 13, 36, 37], "randn": [0, 1, 2, 5, 6, 9, 11, 13, 15, 18, 21, 22, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43], "random": [0, 1, 2, 3, 4, 5, 6, 8, 9, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 25, 26, 27, 28, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "random_forest_model": 10, "random_index": [13, 36], "random_indic": [1, 3, 41, 42, 44], "random_st": [7, 8, 9, 10, 11, 23, 28, 38, 39], "randomforestclassifi": 10, "randomli": [1, 6, 9, 13, 14, 18, 35, 36, 37, 38, 41], "randomst": [38, 39], "rang": [0, 1, 2, 3, 4, 5, 6, 7, 9, 10, 11, 12, 13, 14, 18, 19, 21, 22, 23, 26, 28, 30, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "rangl": [0, 6, 11, 30, 33, 34], "rangle_x": 30, "rank": [5, 34, 35], "rankdir": 4, "raphson": [1, 8, 13, 41], "rapidli": [0, 36], "rare": [1, 13, 23, 36, 41], "rasbt": [43, 44], "raschka": [28, 33, 34, 37, 38, 39], "rasckha": 33, "rashcka": [35, 36, 40, 41, 42, 43], "rashkca": [40, 41], "rate": [1, 2, 3, 4, 8, 9, 10, 12, 13, 18, 23, 24, 28, 35, 37, 38, 39, 40, 43, 44], "rather": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 26, 30, 33, 34, 35, 37, 38, 40, 41, 42, 43, 44], "ratio": [4, 7, 9, 10, 11, 23, 38, 39, 43], "rational": [0, 33], "ravel": [5, 6, 7, 8, 9, 10, 11, 13, 26, 37, 38, 39, 41, 42], "raw": [3, 36, 43, 44], "rbf": [8, 11, 12, 39, 40], "rbf_kernel_svm_clf": 8, "rbf_pca": 11, "rc": 30, "rcond": 
[0, 33, 34], "rcparam": [1, 3, 7, 8, 9, 10, 30, 33, 38, 41, 42, 43, 44], "re": [2, 4, 13, 15, 35, 42, 43], "reach": [1, 4, 5, 6, 9, 10, 12, 13, 14, 23, 35, 36, 37, 38, 40, 41, 42], "react": [], "read": [0, 2, 3, 4, 5, 6, 7, 8, 11, 12, 16, 17, 19, 20, 26, 27, 28, 30, 32, 35, 42, 43, 44], "read_csv": [0, 6, 7, 9, 37, 38], "read_fwf": [0, 33], "reader": [0, 6, 20, 26, 30, 33, 34, 36], "readi": [0, 1, 5, 6, 8, 10, 11, 12, 26, 33, 40, 41, 42, 43, 44], "readili": [1, 41], "readm": [15, 20, 27, 28], "readthedoc": 25, "real": [0, 1, 4, 7, 10, 11, 12, 16, 18, 19, 26, 34, 37, 38, 39, 41, 42], "real_loss": 4, "real_output": 4, "realist": [8, 33], "realiti": 30, "realiz": [1, 12, 39, 41], "realli": [0, 1, 24, 33, 41, 42], "rearrang": 13, "reason": [0, 1, 3, 4, 10, 13, 24, 32, 33, 35, 36, 41, 43, 44], "reassign": 1, "reat": [41, 42], "reber": [41, 42], "recal": [5, 6, 9, 10, 11, 12, 22, 26, 30, 33, 34, 35, 36, 37, 38, 40, 41], "recarrai": [], "recast": [3, 43, 44], "receiv": [1, 3, 10, 12, 23, 30, 39, 40, 41, 43, 44], "recent": [0, 6, 13, 32, 36, 37, 38, 40, 41], "recept": [3, 12, 39, 40, 44], "receptive_field": [3, 44], "recip": [0, 6, 7, 26, 27, 28, 33, 34, 38, 39, 43], "reciproc": 5, "recogn": [0, 4, 5, 10, 33, 37], "recognit": [0, 1, 3, 12, 32, 33, 39, 40, 41, 43, 44], "recommen": 33, "recommend": [0, 2, 3, 4, 5, 6, 8, 13, 15, 19, 20, 21, 22, 25, 26, 27, 28, 32, 35, 36, 37, 38, 39, 40, 42, 43, 44], "reconsid": 9, "reconstruct": [11, 43], "record": [10, 27, 28, 29, 31, 33, 38, 39], "recreat": [15, 21], "rectangl": [9, 13, 35], "rectangular": [5, 34, 35], "rectifi": [1, 3, 12, 39, 41, 44], "recur": [0, 25, 33], "recurr": [0, 1, 25, 33, 41, 44], "recurs": [9, 25, 26, 33], "red": [0, 3, 4, 6, 8, 9, 36, 37, 43, 44], "redefin": [0, 10, 33, 34, 35], "redefinit": 35, "redistribut": [], "reduc": [1, 3, 5, 6, 9, 10, 11, 13, 21, 24, 33, 35, 36, 37, 41, 43, 44], "reduct": [0, 10, 11, 25, 30, 33, 34], "redund": [23, 43, 44], "reegress": 27, "ref": 20, "refer": [0, 1, 2, 3, 5, 6, 11, 12, 13, 14, 20, 24, 26, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "referansestil": 20, "referenc": [2, 40, 41, 42, 43], "refin": [12, 39, 40], "refit": [6, 37, 38], "reflect": [0, 1, 4, 5, 27, 28, 30, 33, 41, 42], "refresh": [25, 33], "refreshprogrammingskil": 33, "reg": [10, 11], "regard": [1, 9, 13, 41], "regardless": [12, 16, 23, 39, 41], "regexp": [], "reggi": [], "regim": 36, "region": [3, 4, 6, 9, 12, 27, 36, 39, 40, 43, 44], "regist": [6, 30], "reglasso": [5, 35], "regr_1": [0, 9], "regr_2": [0, 9], "regr_3": [0, 9], "regress": [1, 8, 11, 12, 16, 20, 23, 24, 25, 26, 40, 41, 42], "regressor": [0, 7, 10, 38, 41, 42], "regret": [], "regridg": [0, 5, 6, 34, 35, 36], "regular": [0, 3, 4, 5, 6, 7, 9, 13, 17, 18, 24, 28, 31, 33, 34, 35, 36, 37, 38, 39, 42], "regularli": 15, "reilli": [0, 32, 33], "reinforc": [0, 8, 25, 33], "reiniti": [41, 42], "reiter": 1, "reitz": [], "reject": 7, "rel": [0, 4, 6, 7, 9, 12, 13, 21, 30, 33, 34, 36, 37, 38, 39, 41], "relat": [0, 1, 3, 4, 5, 11, 13, 14, 19, 23, 26, 30, 33, 34, 35, 37, 40, 41, 43, 44], "relationship": [0, 4, 9, 18, 33], "relativeerror": [0, 33, 34], "releas": [1, 25, 33, 41, 42], "relev": [0, 1, 5, 7, 11, 25, 27, 28, 30, 33, 35, 36, 43, 44], "reli": [0, 6, 8, 36, 43], "reliabilti": [27, 28], "reliabl": [7, 30, 38, 39], "relu": [3, 4, 21, 22, 28, 33, 43, 44], "relu_d": 22, "remain": [1, 2, 4, 6, 12, 23, 26, 30, 34, 36, 37, 38, 39, 40, 41, 42, 43], "remaind": 30, "reman": [2, 42, 43], "remark": [1, 41], "rememb": [0, 8, 13, 20, 21, 22, 26, 27, 28, 33, 36], "remind": [0, 5, 
11, 13, 19, 26, 30, 37, 42], "remot": 15, "remov": [4, 5, 6, 18, 34, 35, 36], "renam": [15, 43, 44], "render": [0, 33, 34], "reorder": [5, 7, 34, 35, 38, 39], "reorgan": [0, 33], "repeat": [0, 1, 3, 4, 5, 6, 9, 10, 11, 13, 14, 26, 27, 30, 33, 34, 35, 36, 37, 38, 40, 41, 43, 44], "repeated": 33, "repeatedli": [0, 6, 10, 13, 37, 38], "repet": [3, 43, 44], "repetit": [6, 33, 34, 37, 38], "rephras": [13, 35], "replac": [0, 1, 3, 4, 5, 6, 10, 12, 14, 23, 25, 27, 33, 34, 35, 37, 38, 40, 41, 43, 44], "replica": [6, 37], "repo": [15, 27, 28], "report": [24, 33, 36, 38, 39], "repositori": [4, 20, 24, 27, 28, 33, 43, 44], "reposotori": [], "repres": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 13, 27, 28, 30, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "represent": [0, 1, 3, 6, 30, 33, 37, 38, 41, 42, 43, 44], "representd": [3, 43, 44], "reproduc": [0, 5, 6, 9, 12, 15, 16, 18, 20, 25, 27, 28, 30, 33, 34, 40, 41, 42], "repuls": [0, 33], "request": [0, 13, 36], "requir": [0, 1, 3, 4, 5, 6, 8, 9, 11, 12, 13, 15, 17, 18, 19, 20, 24, 26, 27, 33, 34, 35, 36, 37, 38, 39, 40, 41, 44], "rerun": [41, 42], "res1": [2, 42, 43], "res2": [2, 42, 43], "res3": [2, 42, 43], "res_analyt": [2, 42, 43], "res_analytical1": [2, 42, 43], "res_analytical2": [2, 42, 43], "res_analytical3": [2, 42, 43], "resaml": 6, "resampl": [0, 7, 10, 24, 25, 33, 34, 41, 42], "rescal": [0, 11, 12, 36, 39], "rescu": 5, "reseach": 6, "research": [0, 4, 13, 21, 22, 25, 28, 32, 33, 36, 43, 44], "researchg": 28, "resembl": [6, 30, 37], "reserv": [1, 5, 6, 30, 37, 38, 41], "reset": [41, 42], "reset_weight": [41, 42], "reshap": [0, 1, 2, 3, 4, 6, 8, 9, 10, 26, 33, 34, 37, 38, 41, 42, 43, 44], "resid": 36, "residenti": [], "residu": [0, 5, 13, 33], "resiz": [5, 34, 35], "resnet": 36, "resort": 36, "resourc": [33, 36], "respect": [0, 1, 2, 3, 5, 6, 7, 8, 10, 11, 12, 13, 14, 16, 17, 18, 21, 27, 28, 30, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "respond": [12, 39, 40], "respons": [0, 7, 9, 12, 33, 34, 38, 39, 40], "rest": [0, 5, 18, 21, 22, 23, 34, 35, 36], "restat": [0, 12, 33], "restor": 4, "restored_discrimin": 4, "restored_gener": 4, "restrict": [0, 3, 9, 12, 33, 39, 40, 41, 42, 43, 44], "result": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 30, 33, 36, 37, 38, 39, 42, 43, 44], "retail": [], "retain": [5, 6, 34, 35, 36, 37, 38, 43, 44], "rethink": 37, "return": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 13, 14, 16, 17, 21, 22, 26, 30, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "return_data": 14, "return_sequ": 4, "return_x_i": 9, "reus": [1, 3, 6, 19, 20, 22, 24, 27, 28, 40, 41, 43, 44], "reveal": [0, 12, 33, 39, 40], "revers": [1, 22, 26, 41, 42], "review": [25, 26], "revis": [], "revisit": 14, "revolut": 33, "reward": [0, 4, 33], "rewrit": [0, 3, 5, 6, 7, 8, 10, 11, 12, 13, 16, 19, 26, 27, 30, 35, 36, 38, 39, 40, 41], "rewritten": [2, 6, 8, 10, 30, 37, 42, 43], "rewrot": [13, 38, 39], "rf": 10, "rgb": [3, 43, 44], "rgoj5yh7evk": 25, "rh": [6, 37], "rho": [0, 10, 13, 36, 41, 42], "rho2": [41, 42], "rho_1": 10, "rho_2": 10, "rho_m": 10, "rich": [0, 33], "rid": [], "ride": 9, "rideclass": 9, "ridedata": 9, "ridg": [7, 11, 13, 20, 24, 25, 28, 33, 37, 38, 39], "ridge_paramet": 17, "ridge_sk": 6, "ridgebeta": 35, "ridgetheta": 5, "right": [0, 1, 2, 3, 5, 6, 7, 8, 9, 10, 12, 13, 14, 16, 17, 19, 21, 22, 26, 27, 28, 30, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "right_sid": [2, 42, 43], "rightarrow": [0, 1, 5, 6, 8, 11, 12, 13, 30, 33, 34, 35, 36, 37, 39, 40, 41], "rigor": [0, 23, 
33, 34, 35], "ring": 6, "rise": [0, 33], "risk": [0, 13, 33, 35, 36], "rival": 4, "river": [], "rlm": 33, "rm": [28, 30, 36, 41, 42], "rms_prop": [41, 42], "rmse": [], "rmsporp": [13, 36], "rmsprop": [1, 3, 4, 13, 27, 28, 37, 40, 41, 42, 44], "rnd_clf": 10, "rng": [30, 38, 39], "rnn": [4, 12, 39, 40], "rnn1": 4, "rnn2": 4, "rnn_2layer": 4, "rnn_input": 4, "rnn_output": 4, "rnn_train": 4, "rntrick1": 30, "rntrick2": 30, "rntrick3": 30, "rntrick4": 30, "ro": [0, 13, 33, 35, 36], "robert": [19, 27, 32], "robust": [0, 33, 36], "robustscal": [0, 34, 36], "roc": [7, 10], "role": [0, 2, 5, 6, 8, 18, 25, 27, 33, 34, 35, 36, 37, 38, 40, 41, 42, 43], "roll": 6, "ronach": [], "room": [0, 31, 33], "root": [0, 5, 9, 13, 15, 24, 30, 34, 35, 36, 40, 42, 44], "root_directori": [], "rot": 33, "rotat": [1, 8, 9, 10], "rotation_matrix": 9, "roughli": [1, 3, 18, 41, 44], "round": [7, 9, 13, 39, 41, 42, 43], "routin": [13, 26, 33, 35], "row": [0, 1, 2, 5, 6, 9, 11, 16, 21, 26, 33, 34, 35, 37, 41, 42, 43], "rr": [5, 34, 35], "rrr": [5, 34, 35], "rubric": [], "rudg": [], "rug": [13, 35, 36], "rule": [0, 1, 5, 6, 13, 22, 27, 33, 34, 35, 39, 42, 43, 44], "run": [0, 1, 2, 4, 5, 6, 8, 9, 11, 13, 15, 20, 21, 22, 25, 27, 28, 33, 34, 35, 36, 37, 38, 41, 42, 43], "rung": 28, "running_loss": [42, 44], "runtim": [1, 6, 14, 15, 41, 42], "rust": [0, 25, 26, 33], "rvert": [1, 41], "rvert_2": [1, 41], "s41467": 28, "s_": [3, 6], "s_1": 6, "s_i": [6, 7, 38], "s_j": 6, "s_k": 6, "s_phenomenon": 27, "saddl": [13, 35, 36], "safeguard": [18, 36], "saga": 28, "sai": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 19, 23, 26, 27, 28, 30, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "said": [6, 9, 13, 35], "sake": [0, 5, 7, 11, 33, 34, 35, 38, 39, 40, 41], "sale": [0, 33], "sam": 33, "same": [0, 1, 2, 3, 4, 5, 6, 8, 9, 11, 12, 14, 15, 16, 18, 20, 21, 22, 24, 26, 27, 28, 30, 33, 34, 35, 39, 40, 41, 42, 43, 44], "samm": 10, "sampl": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 13, 14, 18, 19, 23, 25, 26, 27, 30, 33, 34, 36, 37, 38, 39, 41, 42, 43, 44], "sample_vari": 14, "sampleexptvari": 30, "samples_per_class": [38, 39], "samwis": 33, "sandbox": [], "sandboxmod": [21, 22], "sasha": [], "sastri": 11, "satisfactori": [0, 33], "satisfi": [1, 2, 3, 6, 8, 13, 26, 30, 35, 37, 41, 42, 43], "satur": [1, 6, 37, 38, 41], "save": [0, 4, 6, 7, 9, 13, 20, 22, 33, 36, 37, 38], "save_fig": [0, 6, 7, 9, 10, 33, 37, 38], "savefig": [0, 4, 6, 7, 9, 30, 33, 37, 38], "savetxt": 4, "saw": [5, 34], "scalabl": 10, "scalar": [2, 5, 6, 10, 34, 37, 40, 41, 42, 43], "scale": [0, 1, 3, 5, 6, 7, 8, 9, 10, 11, 12, 13, 22, 25, 26, 27, 28, 31, 33, 35, 38, 39, 41, 42], "scale_mean": 4, "scale_std": 4, "scaler": [0, 7, 8, 9, 10, 11, 17, 27, 34, 41, 42], "scan": [5, 7, 38, 39], "scari": 5, "scatter": [0, 1, 6, 7, 8, 9, 14, 15, 17, 21, 33, 34, 36, 37, 38], "scenario": [6, 13, 35, 36], "schedul": [13, 36], "scheduler_arg": [41, 42], "schedulers_bia": [41, 42], "schedulers_weight": [41, 42], "scheme": [1, 13, 35, 36, 38, 39, 41], "schrage": 30, "sch\u00f8yen": [6, 34, 36], "scienc": [0, 1, 10, 12, 13, 25, 29, 30, 31, 32, 35, 37, 38, 39, 40, 41], "scientif": [0, 20, 25, 27, 28, 33, 38, 39, 42, 43], "scientist": [0, 33], "scikit": [3, 5, 6, 8, 9, 10, 13, 15, 16, 20, 21, 23, 25, 26, 27, 28, 32, 42, 44], "scikit_learn": [0, 39], "scikitlearn": 33, "scikitplot": [7, 10, 23, 39], "scipi": [0, 3, 5, 6, 13, 25, 26, 27, 33, 34, 35, 37, 43], "scl": 6, "scm": 15, "score": [0, 1, 3, 6, 7, 9, 10, 11, 15, 16, 19, 21, 23, 27, 28, 31, 33, 34, 36, 37, 38, 39, 41, 42, 43, 44], "scores_kfold": [6, 
37, 38], "scratch": [1, 13, 16, 39, 40, 41], "script": [], "sdg": [13, 36], "sdv4f4s2sb8": [35, 36], "seaborn": [0, 1, 3, 6, 7, 28, 33, 39, 41, 42, 44], "seamless": [0, 25, 27, 33], "seamlessli": [41, 42], "search": [0, 1, 3, 5, 9, 13, 15, 33, 35, 36, 41, 42, 44], "sebastian": [33, 40, 41], "sebastianraschka": [28, 33], "sec": 6, "second": [0, 2, 3, 4, 5, 6, 7, 8, 9, 11, 12, 14, 15, 16, 20, 21, 22, 24, 25, 26, 30, 31, 33, 34, 35, 37, 38, 39, 40, 41, 42, 43, 44], "second_correct": [41, 42], "second_mo": 36, "second_term": 36, "secondari": 36, "secondeigvector": 11, "secondli": [12, 40, 41, 42], "section": [4, 11, 16, 20, 26, 27, 30, 34, 36, 38], "sector": 0, "see": [0, 1, 2, 3, 4, 5, 6, 7, 8, 10, 11, 12, 13, 15, 16, 18, 19, 20, 21, 22, 23, 25, 26, 27, 28, 30, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "seed": [0, 1, 2, 3, 4, 5, 6, 8, 9, 11, 13, 14, 18, 20, 21, 27, 28, 30, 33, 34, 35, 36, 37, 38, 40, 41, 42, 43, 44], "seed_imag": 4, "seek": [1, 2, 8, 41, 42, 43], "seem": [1, 3, 4, 36, 41, 42, 43, 44], "seemingli": [0, 33], "seen": [0, 1, 3, 5, 10, 12, 30, 41, 43, 44], "segment": [13, 35, 41, 42], "seismic": 6, "seldomli": [0, 33], "select": [1, 5, 6, 8, 9, 10, 11, 15, 20, 23, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 41], "selevet": 15, "self": [1, 5, 22, 34, 38, 39, 41, 42, 44], "sell": 4, "semest": [7, 29, 39], "semi": [8, 13, 35, 36], "semilogx": 6, "send": [5, 12, 13, 21, 22, 31, 33, 39, 40], "senior": [29, 31], "sens": [0, 4, 6, 8, 21, 33, 37, 43, 44], "sensibl": [3, 21, 43, 44], "sensit": [0, 5, 6, 9, 13, 23, 33, 34, 36, 37, 38], "sent": [2, 21, 40, 41, 42, 43], "sentdex": [39, 40, 41], "sentenc": [4, 12, 39, 40], "separ": [0, 1, 2, 4, 6, 8, 9, 12, 14, 18, 21, 22, 23, 25, 27, 30, 33, 36, 37, 39, 40, 41, 42, 43], "septemb": [18, 27, 33], "sequenc": [3, 4, 7, 9, 10, 12, 13, 25, 26, 30, 33, 35, 38, 39, 40, 43, 44], "sequenti": [1, 3, 4, 10, 12, 30, 39, 40, 41, 42, 44], "seri": [0, 1, 2, 3, 4, 5, 6, 10, 11, 12, 13, 26, 33, 34, 35, 37, 39, 40, 41, 42], "serif": [7, 30, 33, 38], "serv": [0, 1, 2, 3, 5, 7, 13, 28, 32, 33, 34, 35, 36, 38, 39, 41, 42, 43, 44], "servic": [27, 28], "session": [1, 15, 20, 27, 28, 29, 31, 33], "set": [1, 4, 5, 6, 7, 8, 10, 11, 13, 14, 16, 17, 18, 21, 22, 23, 24, 25, 26, 27, 28, 30, 31, 36, 37, 38, 39], "set_cmap": 43, "set_major_formatt": 6, "set_major_loc": 6, "set_tick": [1, 8], "set_ticklabel": 1, "set_titl": [0, 1, 2, 3, 7, 12, 14, 33, 38, 39, 41, 42, 43, 44], "set_xlabel": [0, 1, 2, 3, 7, 12, 33, 38, 39, 41, 42, 43, 44], "set_xlim": [7, 12, 38, 39, 41], "set_xticklabel": 1, "set_ylabel": [0, 1, 2, 3, 7, 33, 39, 41, 42, 43, 44], "set_ylim": [7, 12, 38, 39, 41], "set_ytick": [7, 39], "set_yticklabel": [1, 6], "set_zlim": 6, "seth": 4, "setminu": 6, "setosa": [8, 9], "setosa_or_versicolor": 8, "setp": [6, 37, 38], "setup": [1, 4, 6, 8, 22, 25, 28, 33, 34, 35, 40, 41], "sever": [0, 3, 5, 6, 7, 8, 9, 11, 12, 13, 16, 25, 26, 27, 28, 30, 33, 34, 35, 36, 37, 38, 39, 40, 43, 44], "sgd": [1, 3, 35, 41, 42, 44], "sgd_clf": 8, "sgdclassifi": 8, "sgdreg": 13, "sgdregressor": 13, "sgn": [5, 34, 35], "shall": [], "shallow": [13, 36], "shape": [0, 1, 3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 14, 15, 16, 18, 21, 22, 23, 26, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "share": [1, 3, 15, 33, 41, 42, 43, 44], "share_mask": [], "shareabl": 15, "she": [7, 38, 39], "sheppard": [], "shibukawa": [], "shift": [1, 6, 12, 15, 18, 30, 34, 36, 39, 41], "ship": [3, 44], "shire": 33, "short": [4, 5, 20, 24, 27, 28, 41, 42], "shortcom": [13, 35, 36], "shorten": 4, "shorter": 30, 
"shorthand": [33, 37], "shortli": [26, 33], "should": [0, 2, 3, 5, 6, 8, 9, 11, 12, 15, 18, 19, 20, 21, 22, 24, 26, 27, 30, 33, 34, 36, 37, 38, 40, 43, 44], "shouldn": [], "show": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 19, 20, 23, 24, 26, 27, 28, 30, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "show_shap": 4, "shown": [0, 4, 5, 8, 12, 13, 23, 26, 34, 35, 36, 39, 40, 41, 42], "shrink": [3, 5, 6, 8, 11, 34, 35, 36, 43, 44], "shrinkag": [5, 6, 34, 35], "shrunk": 11, "shuffl": [0, 1, 4, 6, 13, 34, 36, 37, 38, 41, 42, 44], "sickit": [40, 41], "side": [0, 2, 5, 8, 12, 13, 26, 27, 28, 33, 35, 38, 39, 41, 42, 43], "sigh": [25, 33], "sigma": [0, 1, 5, 6, 7, 10, 11, 12, 13, 19, 26, 27, 30, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42], "sigma0": 30, "sigma1": 30, "sigma2": 30, "sigma_": [5, 26, 33, 34, 35, 37], "sigma_0": [5, 34, 35], "sigma_1": [5, 34, 35, 40, 41], "sigma_2": [5, 34, 35, 40, 41], "sigma_fn": [7, 12, 38, 39, 41], "sigma_i": [0, 5, 33, 34, 35], "sigma_j": [5, 34, 35], "sigma_m": [6, 30, 37], "sigma_n": [11, 30], "sigma_t": 13, "sigma_x": 30, "sigmoid": [1, 2, 4, 7, 8, 10, 12, 21, 22, 28, 38, 39, 40, 42, 43], "sigmoid_autograd": 22, "sigmoid_d": 22, "sigmundson": [6, 34, 36], "sign": [1, 2, 7, 8, 10, 28, 30, 31, 38, 41, 42, 43], "signa": 43, "signal": [1, 3, 10, 12, 36, 39, 40, 41, 43, 44], "signifi": 4, "signific": [1, 36, 41], "significantli": [1, 13, 18, 30, 35, 36, 41], "sim": [4, 5, 6, 13, 19, 30, 37], "similar": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 14, 18, 25, 26, 27, 28, 33, 35, 37, 38, 39, 40, 41, 43, 44], "similarli": [0, 1, 3, 5, 8, 10, 13, 30, 33, 34, 35, 36, 40, 41, 43, 44], "similiar": [41, 42], "simpl": [1, 2, 3, 5, 6, 7, 8, 10, 11, 12, 14, 16, 17, 22, 23, 25, 26, 28, 30, 37, 39, 42], "simple_plot": [], "simplefilt": [41, 42], "simplepredict": 10, "simpler": [0, 1, 5, 6, 7, 13, 16, 25, 27, 28, 33, 35, 36, 41, 42], "simplernn": 4, "simplest": [0, 1, 3, 4, 9, 10, 12, 14, 23, 27, 33, 39, 40, 41, 43, 44], "simpletre": 10, "simpli": [0, 1, 2, 4, 5, 6, 8, 9, 10, 11, 12, 25, 26, 27, 28, 30, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "simplic": [2, 5, 6, 7, 8, 9, 10, 11, 12, 14, 34, 35, 36, 38, 39, 40, 41, 42, 43], "simplicti": [5, 34, 35], "simplif": 40, "simplifi": [0, 6, 9, 18, 22, 25, 27, 33, 34, 36, 37, 38, 40], "simplist": [3, 6, 30, 37, 43, 44], "simul": [6, 18, 36, 37, 38], "simultan": [6, 36, 37, 38], "sin": [0, 1, 2, 3, 4, 9, 12, 13, 26, 33, 39, 41, 42, 43], "sinc": [0, 1, 2, 3, 5, 6, 7, 8, 9, 10, 11, 13, 16, 18, 21, 22, 26, 27, 30, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "sine": [3, 12, 39, 41], "singl": [0, 1, 2, 3, 5, 6, 7, 8, 9, 12, 13, 18, 19, 21, 22, 23, 26, 30, 33, 34, 35, 36, 37, 38, 41, 42, 43, 44], "singular": [0, 6, 13, 26, 33, 37, 43], "sinusoid": 3, "site": [0, 27, 28, 29, 34], "situat": [0, 4, 5, 7, 13, 30, 33, 34, 35, 36, 38, 39], "six": [3, 30, 40], "size": [0, 1, 2, 3, 4, 5, 6, 8, 9, 10, 11, 13, 18, 20, 21, 23, 26, 27, 30, 33, 37, 38, 39, 40, 41, 42, 43, 44], "sizesp": 36, "skeleton": 22, "sketch": 10, "ski": 9, "skill": 0, "skip": 11, "skl": [0, 6, 33, 34, 36], "sklearn": [0, 1, 3, 5, 6, 7, 8, 9, 10, 11, 13, 14, 15, 17, 19, 20, 21, 22, 23, 28, 33, 34, 35, 36, 37, 38, 39, 41, 42, 44], "skplt": [7, 10, 23, 39], "skrankefunct": [41, 42], "sl": [6, 34, 36], "slack": 8, "slender": [], "slice": [2, 26, 33, 42, 43], "slide": [0, 3, 16, 27, 28, 30, 33, 34, 35, 40, 41, 42, 43, 44], "slight": [6, 13, 24, 37, 38], "slightli": [1, 2, 3, 5, 6, 7, 10, 30, 34, 35, 37, 38, 39, 40, 41, 42, 43, 44], "slope": [8, 11, 12, 39], 
"slow": [0, 2, 8, 13, 18, 34, 35, 36, 42, 43], "slower": [5, 26, 33, 34, 35, 36], "slowest": 26, "slowli": [12, 36], "slp": [1, 41], "small": [0, 1, 2, 3, 5, 6, 8, 9, 10, 11, 12, 13, 18, 21, 22, 25, 26, 30, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "smaller": [0, 1, 2, 5, 6, 8, 9, 11, 13, 21, 30, 33, 34, 35, 36, 37, 38, 41, 42, 43], "smallest": [0, 4, 14, 33], "smallest_row_index": 14, "smodin": [], "smooth": [0, 3, 6, 13, 27, 33, 35, 36, 43, 44], "smoother": 36, "sn": [0, 1, 3, 6, 7, 33, 39, 41, 42, 44], "sne": 11, "so": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 18, 19, 20, 21, 22, 23, 25, 26, 27, 28, 30, 31, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "soar": 6, "social": 0, "soft": [1, 7, 10, 12, 38, 39, 40, 41], "soften": 8, "softmax": [3, 7, 21, 22, 28, 38, 39, 42, 43, 44], "softmax_vec": 21, "softwar": [0, 8, 25, 26, 40], "sokogskriv": 20, "sol": 8, "sol1": 21, "sole": [0, 6, 33], "solid": [0, 7, 38, 39], "solut": [0, 1, 2, 3, 5, 6, 8, 10, 11, 13, 18, 21, 24, 26, 27, 28, 30, 33, 34, 35, 36, 37, 41, 44], "solution_ev": 36, "soluton": [2, 42, 43], "solv": [0, 1, 3, 5, 6, 8, 10, 11, 12, 13, 16, 26, 27, 28, 33, 34, 40, 41, 44], "solve_expdec": [2, 42, 43], "solve_ode_deep_neural_network": [2, 42, 43], "solve_ode_neural_network": [2, 42, 43], "solve_pde_deep_neural_network": [2, 42, 43], "solveod": [2, 42, 43], "solveode_popul": [2, 42, 43], "solver": [2, 7, 8, 9, 10, 23, 26, 28, 33, 39, 42, 43], "some": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 14, 15, 16, 18, 19, 21, 22, 23, 24, 27, 28, 30, 33, 36, 37, 39, 41, 42, 43, 44], "some_model": [6, 34, 36], "somehow": 4, "someon": 16, "someth": [0, 1, 3, 4, 7, 9, 11, 15, 19, 20, 27, 28, 30, 33, 34, 39, 41, 43, 44], "sometim": [0, 1, 11, 12, 13, 14, 19, 34, 36, 39, 40, 41], "somewhat": [28, 39], "soon": [26, 31, 34], "sophist": [0, 33], "sopt": 13, "sort": [5, 6, 9, 11, 23, 30, 37, 38], "sound": [3, 5], "sourc": [0, 1, 3, 6, 24, 25, 26, 27, 28, 30, 33, 36, 37, 38, 41, 42], "source1": 22, "source2": 22, "space": [0, 1, 4, 5, 8, 9, 11, 12, 13, 14, 30, 34, 35, 36, 38, 39, 40, 41, 43, 44], "span": [0, 3, 5, 9, 11, 26, 33, 34, 35, 43, 44], "spare": [1, 41, 42], "spars": [3, 6, 18, 26, 33, 36, 43, 44], "sparse_categorical_crossentropi": 42, "sparse_mtx": [26, 33], "sparsecategoricalcrossentropi": [3, 44], "sparsiti": [10, 18], "spatial": [1, 2, 3, 12, 39, 40, 41, 42, 43, 44], "speak": 30, "special": [6, 7, 10, 12, 13, 23, 26, 30, 33, 34, 35, 36, 38, 39, 40, 41], "specif": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 12, 15, 16, 23, 25, 26, 27, 28, 30, 32, 33, 34, 35, 37, 38, 39, 40, 41, 44], "specifi": [0, 3, 5, 6, 7, 9, 11, 13, 14, 30, 33, 35, 36, 37, 38, 39, 41, 44], "specifici": [0, 10, 33], "spectacular": [3, 43, 44], "spectral": 1, "speech": [0, 1, 3, 4, 12, 39, 40, 41, 43, 44], "speed": [1, 2, 4, 13, 41, 42], "spend": [16, 30, 36], "spent": [27, 28], "sphere": [0, 34, 36], "sphinx": [], "sphinx_book_them": [], "sphinxcontrib": [], "spike": 36, "spin": 6, "spite": 0, "spitzer": [], "spline": 8, "split": [1, 3, 4, 5, 6, 8, 9, 10, 11, 14, 16, 17, 20, 21, 22, 27, 28, 30, 33, 35, 36, 37, 38, 41, 42, 43, 44], "splite": 0, "splitter": [1, 10], "spoiler": [], "spontan": 30, "spot": [3, 43, 44], "spread": [0, 11, 30, 33, 34, 38, 39], "spring": [41, 42], "springer": [19, 27, 32, 33, 37, 38], "spuriou": [13, 36], "sqquar": 35, "sqrsignal": 3, "sqrt": [3, 4, 5, 6, 8, 10, 11, 13, 30, 34, 35, 36, 37, 40, 41, 42, 43], "squar": [1, 2, 3, 4, 7, 8, 9, 11, 13, 14, 15, 17, 18, 24, 25, 26, 28, 30, 37, 38, 39, 40, 41, 42, 43, 44], "squarederror": 
10, "squaredeuclidean": 14, "squash": [12, 39, 41], "src": [], "srtm": 6, "srtm_data_norway_1": 6, "srv": 43, "sso": 20, "stabil": [5, 27, 28, 36, 38, 39], "stabl": [0, 4, 5, 6, 9, 16, 20, 25, 27, 33, 34, 35, 36], "stack": [3, 4, 43, 44], "stage": [5, 13, 15, 27, 28, 36, 40, 41, 42], "stagnat": 36, "stai": [0, 2, 4, 5, 11, 33, 34, 36, 41, 42, 43], "stand": [0, 5, 9, 12, 33, 34, 35, 39], "standard": [0, 1, 4, 5, 6, 7, 8, 10, 12, 17, 18, 19, 23, 26, 27, 28, 30, 33, 35, 36, 38, 39, 40, 41, 43, 44], "standardscal": [0, 6, 7, 8, 9, 10, 11, 17, 34, 36], "standpoint": 36, "stanford": [13, 35, 42, 43], "stanforduniversityschoolofengin": 43, "start": [0, 1, 2, 3, 4, 5, 6, 8, 9, 10, 11, 12, 13, 14, 15, 16, 18, 21, 22, 24, 26, 28, 30, 31, 33, 34, 35, 36, 37, 38, 40, 41, 42, 44], "start_tim": 14, "starter": [], "stat": [6, 37], "state": [1, 2, 4, 5, 6, 7, 8, 10, 11, 12, 13, 24, 25, 30, 33, 34, 35, 37, 38, 39, 40, 41, 42, 43], "statement": [0, 7, 26, 33, 39], "static": [], "stationari": [35, 36], "statist": [0, 1, 3, 4, 7, 9, 10, 11, 12, 13, 14, 19, 26, 27, 32, 34, 35, 36, 39, 40, 41, 43, 44], "statu": [0, 7, 15, 33, 38, 39], "stavang": 6, "stb": [], "std": [0, 4, 6, 18, 33, 34, 36, 37, 38, 42], "stdout": [41, 42], "steep": [13, 23, 35, 36], "steepest": 36, "stefan": [], "step": [0, 1, 2, 4, 6, 7, 9, 10, 11, 12, 13, 14, 15, 18, 22, 26, 27, 33, 35, 39, 40, 41, 42, 43, 44], "step_fn": [7, 12, 38, 39, 41], "step_length": [13, 36], "step_siz": 36, "steps_list": 9, "stereo": [3, 43, 44], "sticki": [], "still": [0, 2, 3, 5, 6, 11, 13, 21, 22, 24, 28, 30, 34, 35, 36, 37, 38, 40, 42, 43, 44], "stimuli": [12, 39, 40], "stk": [32, 33], "stk2100": [32, 33], "stk3155": [15, 27, 28, 29, 31], "stk4021": [32, 33], "stk4051": [32, 33], "stk4155": [29, 31], "stk5000": 32, "stochast": [0, 1, 5, 6, 8, 11, 12, 22, 24, 28, 35, 37, 38, 40, 41], "stock": 4, "stoke": [12, 39, 40], "stone": [0, 7, 38, 39, 40], "stop": [1, 4, 9, 13, 14, 18, 35, 40, 41, 42], "storag": [5, 34, 35], "store": [0, 1, 2, 3, 6, 11, 13, 22, 30, 33, 36, 41, 42, 43, 44], "storehaug": [31, 33], "stori": [], "str": [1, 3, 4, 41, 42, 43, 44], "straight": [0, 6, 8, 13, 33, 35, 37], "straightforward": [0, 2, 3, 5, 6, 8, 9, 10, 13, 26, 33, 34, 35, 37, 42, 43, 44], "strategi": [0, 1, 9, 33, 41], "stratifi": [6, 37, 38], "stream": 36, "strength": [0, 5, 14, 34, 35, 42], "stretch": 11, "strict": [8, 13, 35], "strictli": [8, 13, 35], "stride": [4, 26, 43, 44], "strike": 6, "string": [1, 41], "stroke": [7, 38, 39], "strong": [3, 6, 9, 10, 12, 26, 30, 36, 37, 39, 40], "strongli": [0, 8, 15, 20, 22, 24, 25, 26, 28, 41, 42, 43, 44], "stronli": [], "structur": [0, 1, 2, 3, 6, 9, 10, 12, 22, 25, 33, 37, 38, 39, 41, 42, 43, 44], "stuck": [1, 13, 35, 36, 41, 42], "student": [0, 15, 27, 28, 29, 31, 32, 33, 42, 43], "studi": [0, 3, 4, 5, 6, 7, 8, 11, 12, 13, 23, 25, 27, 28, 32, 33, 34, 35, 36, 38, 40, 41, 42, 43, 44], "studier": 32, "stuff": [21, 22], "style": [7, 9, 20, 26, 33], "stylesheet": [], "st\u00f8land": 31, "sub": [9, 12, 36, 39, 40], "subarrai": [], "subclass": [], "subdivid": [0, 26, 33], "subfield": 0, "subgradi": 36, "subject": [6, 8, 30], "sublicens": [], "sublinear": 36, "submatric": [43, 44], "submit": 33, "subplot": [0, 1, 3, 4, 6, 7, 8, 9, 10, 14, 21, 33, 37, 38, 39, 41, 42, 44], "subplots_adjust": [8, 30], "subprogram": [26, 33], "subproject": [], "subract": [0, 34], "subregion": [43, 44], "subroutin": [0, 33], "subsampl": [43, 44], "subscript": [1, 41], "subsequ": [1, 4, 5, 6, 12, 26, 30, 34, 35, 37, 39, 40], "subset": [1, 6, 9, 12, 13, 23, 25, 33, 35, 
36, 37, 38, 39, 40, 41], "subspac": [0, 8, 11, 34], "substanti": [9, 10, 36], "substep": 11, "substitut": [3, 6, 12, 16, 26, 37, 38, 39], "subsubset": 9, "subtask": 6, "subtl": [1, 41], "subtract": [0, 4, 5, 6, 11, 13, 18, 19, 26, 27, 30, 34, 36, 37, 38, 41, 42], "subtre": 9, "succeed": [0, 4, 33], "success": [3, 7, 9, 13, 30, 38, 39, 43, 44], "successfulli": [4, 9], "succinctli": 36, "sudo": [0, 25, 27, 33], "suffer": [0, 1, 2, 5, 10, 24, 33, 34, 35, 41, 42, 43], "suffici": [1, 6, 8, 11, 13, 35, 37, 38, 41], "suggest": [1, 13, 23, 27, 28, 32, 35, 36, 41, 43, 44], "suit": [8, 12, 28, 39, 40], "suitabl": [0, 15, 19, 24, 30, 34, 36], "sum": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 19, 21, 23, 26, 30, 33, 34, 35, 36, 39, 42, 43, 44], "sum_": [0, 1, 2, 3, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 16, 19, 23, 26, 27, 28, 30, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "sum_i": [0, 2, 5, 6, 8, 13, 19, 23, 27, 34, 35, 36, 37, 38, 42, 43], "sum_j": [6, 18, 36], "sum_ja_": 0, "sum_k": [6, 8, 12, 26, 40, 41, 42], "sum_logist": 13, "sum_m": [3, 43, 44], "sum_n": [3, 43, 44], "sum_nx_": 3, "summar": [5, 6, 9, 23, 28, 37, 38], "summari": [1, 3, 4, 10, 24, 29, 35, 36, 41, 42, 43, 44], "summat": [0, 3, 16, 34, 35, 43, 44], "sunni": 9, "super": [5, 34, 35, 36, 41, 42, 44], "superfici": 3, "superscript": [1, 12, 39, 40, 41], "supervis": [0, 5, 6, 7, 9, 12, 25, 33, 34, 35, 37, 38, 39, 40], "supplement": [7, 27, 28, 38, 39], "supplementari": 28, "suppli": [], "support": [0, 1, 9, 10, 11, 13, 20, 21, 23, 25, 33, 34, 36, 38, 39, 40, 41, 42], "suppos": [0, 5, 6, 7, 8, 10, 11, 12, 13, 26, 33, 34, 35, 36, 37, 38, 39, 40], "suppress": [5, 13, 35], "sure": [0, 1, 4, 6, 16, 20, 21, 22, 27, 41, 42], "surf": 6, "surfac": [0, 6, 33, 36], "surpass": 6, "surpris": [0, 33], "surround": [3, 25, 44], "survei": [0, 5, 6, 33, 34], "svc": [8, 9, 10], "svd": [0, 6, 11, 33, 37], "svdinv": 5, "svm": [8, 9, 10, 11], "svm_clf": [8, 10], "svn": [], "swap": 21, "swath": [5, 34, 35], "sweep": 23, "switch": [0, 41, 42], "sy": [13, 35, 36, 41, 42], "symbol": [1, 5, 11, 13, 25, 30, 33, 34, 35, 40, 41, 43, 44], "symmeteri": 1, "symmetr": [0, 5, 8, 11, 12, 13, 26, 33, 34, 39, 40], "symmetri": 6, "sympi": [0, 25, 27, 33, 40], "synonim": 30, "syntax": 13, "system": [0, 1, 3, 4, 6, 7, 9, 10, 12, 13, 15, 25, 26, 27, 33, 35, 36, 38, 39, 40, 41, 42], "systemat": [4, 6, 37, 38], "t": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 21, 22, 23, 24, 25, 26, 27, 28, 30, 31, 33, 35, 36, 37, 38, 39, 40, 41, 42], "t0": [3, 6, 13, 36], "t1": [2, 13, 36, 42, 43], "t2": [2, 42, 43], "t3": [2, 42, 43], "t9jjwsmsd1o": 37, "t_": [2, 42, 43], "t_0": [2, 9, 13, 36, 42, 43], "t_1": [13, 36], "t_b": 10, "t_batch": [41, 42], "t_i": [1, 2, 5, 12, 28, 34, 35, 41, 42, 43], "t_j": 12, "t_k": 9, "t_test": [41, 42], "t_train": [41, 42], "t_val": [41, 42], "tabl": [9, 23, 27, 28, 30, 31, 33, 39], "tabul": [0, 23, 33], "tabular": 33, "tackl": 4, "tag": [2, 3, 4, 5, 6, 7, 12, 13, 14, 26, 30, 34, 35, 38, 39, 40, 41, 42, 43], "tagrget": 40, "taht": [0, 33], "tail": 30, "tailor": [2, 8, 11, 33, 40, 42, 43], "taiwan": [0, 33], "take": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 17, 19, 21, 22, 23, 25, 26, 30, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "taken": [0, 1, 3, 6, 10, 13, 21, 26, 37, 41, 43, 44], "tan": 3, "tangent": [1, 4, 12, 13, 35, 39, 41, 42], "tanh": [1, 4, 7, 8, 12, 38, 39, 41, 42], "target": [0, 1, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 15, 16, 18, 19, 21, 22, 23, 28, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 44], "target_nam": 
[9, 21], "task": [0, 1, 3, 6, 9, 11, 12, 14, 21, 24, 27, 28, 33, 36, 37, 38, 39, 40, 41, 42, 43, 44], "tau": [3, 5, 30], "taught": 33, "tax": [], "taylor": [2, 13, 35, 40, 42, 43], "taylornr": [13, 35], "tc": 8, "teach": [15, 29, 33, 37], "team": [1, 41, 42], "teaser": 0, "technic": [0, 5, 6, 13, 27, 28, 35, 36, 37], "techniqu": [0, 1, 8, 10, 13, 25, 30, 32, 33, 34, 36, 37, 38, 41], "technologi": [0, 1, 41], "tell": [0, 4, 6, 10, 11, 13, 16, 30, 36, 37, 38], "temp": 1, "temp1": 1, "temp2": 1, "temperatur": [0, 9, 33], "templat": [18, 20], "temporari": [], "temporarili": [1, 41], "ten": [3, 33, 40, 43, 44], "tend": [3, 5, 6, 8, 9, 10, 12, 13, 14, 34, 36, 37, 38, 43, 44], "tendenc": [0, 33], "tension": [6, 37, 38], "tensor": [3, 44], "tensorflow": [0, 2, 4, 8, 14, 25, 26, 27, 28, 32, 33, 34], "term": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 16, 18, 19, 22, 23, 27, 28, 30, 33, 34, 35, 36, 38, 39, 42, 43, 44], "term1": [5, 6, 11], "term2": [5, 6, 11], "term3": [5, 6, 11], "term4": [5, 6, 11], "termin": [0, 4, 5, 9, 10, 13, 15, 34, 35, 36], "terminarl": 15, "terrain": 6, "terrain1": 6, "test": [3, 4, 5, 6, 7, 8, 9, 10, 13, 16, 19, 20, 21, 23, 24, 26, 27, 30, 33, 35, 36, 37, 38, 39, 44], "test_acc": [3, 42, 44], "test_accuraci": [1, 3, 41, 42, 44], "test_dataset": [42, 44], "test_error": 6, "test_imag": [3, 4, 44], "test_ind": [6, 37, 38], "test_input": 4, "test_label": [3, 4, 44], "test_load": [42, 44], "test_loss": [3, 42, 44], "test_pr": [1, 41], "test_predict": [1, 41], "test_rnn": 4, "test_scor": [7, 10, 23, 39], "test_siz": [0, 1, 3, 5, 6, 10, 15, 17, 28, 34, 35, 36, 37, 38, 41, 42, 44], "test_split": 9, "testerror": [0, 6, 34, 37, 38], "testi": 4, "testpredict": 4, "testx": 4, "tex": [], "text": [0, 1, 2, 4, 5, 8, 9, 11, 13, 15, 18, 20, 23, 24, 26, 27, 28, 30, 32, 34, 35, 36, 37, 38, 41, 42, 43], "textbf": [], "textbook": [16, 27, 28, 34, 35, 37, 38], "textual": 9, "textur": 1, "tf": [1, 3, 4, 13, 14, 35, 41, 42, 44], "th": [0, 1, 2, 5, 6, 7, 9, 12, 13, 14, 26, 27, 30, 33, 34, 36, 37, 38, 39, 40, 41, 42, 43], "than": [0, 1, 2, 3, 4, 5, 6, 7, 9, 10, 11, 12, 13, 17, 21, 23, 25, 30, 33, 34, 36, 37, 38, 39, 40, 41, 42, 43, 44], "thank": [4, 6, 34, 36, 42, 43], "thats": [41, 42], "theano": [1, 25, 33, 41, 42], "thei": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 12, 13, 15, 16, 18, 20, 22, 23, 26, 27, 28, 30, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "them": [0, 1, 3, 4, 6, 8, 9, 10, 11, 12, 13, 18, 21, 24, 26, 27, 28, 33, 34, 39, 40, 41, 42, 43, 44], "theme": [0, 15, 33], "themselv": [0, 27, 28, 30, 33, 36, 43, 44], "thenc": [6, 37, 38], "theorem": [2, 6, 7, 34, 35, 38, 39, 41, 42, 43, 44], "theoret": [0, 4, 10], "theori": [0, 1, 3, 8, 9, 12, 13, 19, 25, 27, 32, 33, 36, 39, 40, 41], "thereaft": [0, 5, 6, 11, 12, 26, 27, 33, 37, 38, 40, 41, 42, 44], "therebi": [0, 5, 7, 11, 27, 33, 34, 35, 38, 39, 40], "therefor": [0, 1, 2, 3, 4, 6, 7, 8, 11, 13, 19, 30, 33, 34, 35, 36, 37, 38, 39, 41, 42, 43, 44], "therein": 11, "thereof": [0, 6, 13, 33, 36, 37], "theta": [0, 1, 4, 5, 6, 7, 13, 16, 27, 30, 33, 34, 35, 36, 38, 39, 40, 41], "theta1": 36, "theta2": 36, "theta_": [0, 1, 6, 7, 13, 33, 34, 35, 36, 38, 39, 41], "theta_0": [0, 5, 6, 7, 16, 33, 34, 35, 36, 38, 39], "theta_0x_": [0, 33, 34], "theta_1": [0, 5, 6, 7, 33, 34, 35, 36, 38, 39], "theta_1x_": [0, 33, 34], "theta_1x_0": [0, 33], "theta_1x_1": [0, 7, 33, 38, 39], "theta_1x_2": [0, 33], "theta_1x_i": [7, 34, 35, 36, 38, 39], "theta_2": [0, 33, 34], "theta_2x_": [0, 33, 34], "theta_2x_0": [0, 33], "theta_2x_1": [0, 33], "theta_2x_2": [0, 
7, 33, 38, 39], "theta_2x_i": 34, "theta_3x_i": 34, "theta_4x_i": 34, "theta_closed_form": 18, "theta_closed_formol": 18, "theta_closed_formridg": 18, "theta_gdol": 18, "theta_gdridg": 18, "theta_i": [0, 1, 5, 33, 34, 35, 41], "theta_j": [0, 5, 6, 18, 33, 34, 36], "theta_k": [35, 36], "theta_linreg": [13, 35, 36], "theta_ol": 18, "theta_p": [7, 38, 39], "theta_px_p": [7, 38, 39], "theta_ridg": 18, "theta_t": [13, 36], "theta_tru": 18, "thetaand": 39, "thetaith": 36, "thetaor": 39, "thetavalu": 5, "thetaxor": 39, "thi": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 23, 25, 26, 27, 28, 29, 30, 32, 34, 35, 36, 37, 38, 39, 41, 42, 43, 44], "thing": [0, 1, 2, 4, 5, 7, 9, 15, 16, 18, 21, 22, 30, 33, 37, 39, 41, 42], "think": [0, 1, 3, 4, 6, 9, 12, 13, 14, 30, 33, 34, 35, 36, 37, 39, 41, 43, 44], "third": [0, 3, 6, 13, 31, 33, 35, 36, 43, 44], "thirti": [7, 39], "thorughout": 33, "those": [0, 3, 5, 6, 8, 9, 10, 11, 23, 26, 27, 28, 33, 34, 35, 36, 37, 38, 40, 42, 43, 44], "though": [1, 2, 3, 4, 13, 16, 17, 19, 21, 22, 26, 30, 36, 41, 42, 43], "thought": [6, 14, 27, 28, 30, 37, 38], "thousand": [0, 1, 27, 34, 36, 41], "three": [0, 1, 3, 5, 6, 8, 9, 12, 21, 23, 24, 26, 27, 28, 29, 30, 31, 33, 34, 35, 37, 38, 39, 43, 44], "threshold": [1, 3, 9, 10, 11, 12, 13, 23, 36, 38, 39, 40, 41, 42, 43, 44], "through": [0, 1, 2, 3, 4, 5, 6, 8, 11, 12, 13, 14, 15, 21, 22, 23, 25, 26, 27, 30, 33, 34, 35, 36, 37, 39, 41, 42, 43, 44], "throughout": [0, 4, 5, 14, 15, 25, 26, 30, 33, 41, 42], "throw": [3, 6, 30, 37], "thu": [0, 1, 2, 5, 6, 7, 8, 10, 11, 12, 13, 31, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "thumb": [0, 6, 27, 34], "thursdai": [], "tibshirani": [6, 19, 27, 32, 33, 37, 38], "tick_param": 6, "ticker": [6, 13, 30, 35, 36], "tif": 6, "tight_layout": [1, 7, 39], "tightli": 11, "tild": [0, 5, 6, 7, 11, 19, 27, 30, 33, 34, 35, 36, 37, 38, 40, 41, 43, 44], "till": [0, 4, 7, 8, 9, 10, 12, 26, 33, 34, 38, 39, 40, 41, 42, 43, 44], "time": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 20, 21, 22, 23, 24, 25, 26, 27, 28, 30, 33, 34, 35, 37, 38, 39, 40, 41, 42, 43, 44], "timeit": 4, "timer": 4, "times2": 23, "timeseri": [], "tini": [1, 36, 41], "tip": [3, 43, 44], "titl": [0, 1, 2, 3, 4, 6, 7, 8, 9, 10, 13, 15, 20, 21, 24, 30, 33, 35, 36, 37, 38, 41, 42, 43, 44], "tm": [], "tmp": 13, "tn": [2, 3, 7, 23, 42, 43], "to_categor": [1, 3, 4, 41, 42, 44], "to_categorical_numpi": [1, 41], "to_numer": [0, 6, 33, 37, 38], "todai": 3, "togeth": [0, 3, 6, 8, 11, 13, 22, 25, 33, 42, 43, 44], "toi": 14, "token": [], "told": 13, "toler": [2, 14, 42, 43], "tolist": 4, "tomographi": [12, 39, 40], "too": [0, 2, 4, 5, 6, 9, 11, 13, 17, 18, 24, 30, 32, 34, 35, 36, 37, 38, 42, 43, 44], "took": [8, 33], "tool": [0, 1, 3, 6, 13, 15, 25, 34, 37, 38, 41, 43, 44], "toolbox": 8, "top": [0, 3, 5, 6, 9, 10, 19, 23, 25, 33, 37], "topic": [0, 5, 6, 7, 8, 25, 27, 28, 34, 35, 37, 38, 39, 40, 42], "topolog": [3, 12, 39, 40, 43, 44], "topologi": [1, 12, 41], "torch": [42, 44], "torchvis": [42, 44], "torkjellsdatt": [31, 33], "tort": [], "toss": [10, 30], "total": [0, 1, 2, 3, 4, 6, 7, 8, 10, 11, 12, 13, 14, 23, 26, 28, 30, 31, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "total_loss": 4, "totalclustervari": 14, "totalscatt": 14, "totensor": [42, 44], "toward": [1, 2, 7, 12, 13, 15, 23, 24, 35, 38, 39, 41, 42, 43], "towardsdatasci": 36, "town": [], "tp": [4, 7, 23], "tpng": 9, "tpr": 23, "tpu": [13, 25, 33], "tqdm": 6, "tr": [], "track": [3, 13, 14, 15, 22, 26, 34, 35, 36, 43, 44], "tract": 
[], "tractabl": [0, 33, 34], "trade": [5, 9, 20, 23, 28, 36, 37], "tradeoff": [0, 5, 19, 27, 33, 34, 35], "tradit": [0, 1, 4, 6, 24, 33, 37, 38, 41], "train": [2, 3, 5, 6, 8, 9, 10, 11, 12, 13, 16, 17, 20, 24, 27, 28, 35, 36, 37, 38, 39, 42], "train_acc": [41, 42], "train_accuraci": [0, 1, 3, 33, 41, 42, 44], "train_dataset": [4, 42, 44], "train_end": [0, 1, 34, 41], "train_error": [6, 41, 42], "train_imag": [3, 4, 44], "train_ind": [6, 37, 38], "train_label": [3, 4, 44], "train_load": [42, 44], "train_network": 21, "train_pr": [1, 41], "train_siz": [0, 1, 3, 34, 41, 42, 44], "train_step": 4, "train_test_split": [0, 1, 3, 5, 6, 7, 9, 10, 11, 15, 16, 17, 19, 23, 28, 33, 34, 35, 36, 37, 38, 39, 41, 42, 44], "train_test_split_numpi": [0, 1, 34, 41], "trainable_vari": 4, "trained_model": [6, 34, 36], "trainerror": [0, 34], "traini": 4, "training_checkpoint": 4, "training_dataset": 4, "training_gradi": [13, 36], "trainingerror": [6, 37, 38], "trainpredict": 4, "trainscor": 4, "trainx": 4, "trait": [0, 33], "trajectori": [4, 36], "transfer": [9, 33], "transform": [0, 5, 6, 7, 8, 9, 10, 11, 12, 13, 17, 21, 25, 26, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42], "transit": [6, 12, 39, 40], "translat": [1, 4, 6, 10, 33, 34, 36, 41, 43, 44], "transpos": [1, 5, 11, 21, 26, 34, 35, 41], "travers": [0, 5], "travi": [], "treat": [0, 1, 3, 6, 12, 13, 18, 21, 23, 30, 33, 34, 35, 36, 37, 38, 39, 40, 41, 43, 44], "tree": [0, 1, 25, 33, 41], "tree_clf": [9, 10], "tree_clf_": 9, "tree_clf_sr": 9, "tree_reg": 9, "tree_reg1": 9, "tree_reg2": 9, "trend": 30, "treue": 7, "trevor": [19, 27, 32], "tri": [2, 3, 4, 9, 13, 16, 36, 42, 43], "triain": 0, "trial": [0, 2, 4, 6, 13, 30, 33, 35, 36, 37, 38], "triangl": [13, 35], "triangular": 26, "trick": [3, 4, 8, 11, 13, 30, 36, 43, 44], "tricki": 22, "trickier": 30, "tridiagon": 26, "trigonometr": [43, 44], "trillion": 25, "trim": [], "trivial": [0, 1, 5, 11, 30, 33, 35, 41, 42], "troffa": [], "troubl": [0, 8, 12, 15, 21, 22, 34, 36, 40, 41], "truck": [3, 44], "true": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 13, 14, 16, 17, 18, 19, 21, 22, 23, 26, 27, 30, 33, 34, 35, 36, 37, 38, 39, 41, 42, 43, 44], "true_beta": 34, "true_fun": [6, 37, 38], "true_theta": [6, 36], "truelabel": [38, 39], "truli": 33, "truncat": 40, "try": [0, 1, 2, 4, 5, 6, 7, 8, 9, 10, 11, 13, 14, 15, 18, 21, 22, 25, 26, 27, 28, 30, 33, 34, 35, 36, 38, 39, 40, 41, 42, 43], "tr\u00f6ger": [], "tucker": 8, "tuesdai": [31, 33, 38, 42], "tumor": [7, 9, 38, 39], "tumour": [7, 39], "tunabl": 1, "tune": [4, 9, 13, 24, 26, 33, 36], "tupl": [21, 41, 42], "turn": [0, 1, 5, 6, 7, 8, 9, 10, 11, 12, 13, 26, 27, 30, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42], "tutori": [1, 4, 28, 41, 42], "tv": [2, 42], "tveito": [2, 42, 43], "tvw1zdmznwm": 39, "tweak": [1, 4, 10, 30, 41, 42], "twice": [13, 35], "twist": 11, "two": [0, 1, 2, 4, 5, 6, 7, 9, 10, 11, 12, 13, 15, 17, 21, 23, 24, 26, 27, 29, 30, 32, 33, 34, 35, 36, 37, 42], "tx": [13, 35, 36, 39], "tx_1": [13, 35], "txt": [4, 15, 20, 27, 28], "ty": [13, 35], "type": [0, 1, 3, 6, 8, 10, 13, 21, 23, 24, 26, 30, 34, 35, 36, 37, 41], "typeset": 20, "typic": [0, 1, 2, 3, 4, 5, 7, 9, 10, 12, 13, 15, 16, 20, 24, 30, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "typo": [27, 28], "u": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 21, 26, 27, 28, 30, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "u_": 26, "u_i": [12, 39], "u_m": 10, "ua": [0, 33], "ubuntu": [0, 25, 27, 33], "uci": [27, 28], "ufunc": [], "uio": [15, 20, 21, 27, 28, 31, 32], "uk": [], "un": 14, "unabl": 
[15, 21], "unari": [26, 33], "unbalanc": [6, 9, 37, 38], "unbias": [0, 5, 6, 33, 37], "uncent": [6, 34, 36], "uncertainti": [0, 5, 33], "uncertitud": 30, "unchang": [1, 3, 41, 43, 44], "uncom": [], "uncorrel": [10, 30], "undefin": [5, 34, 35], "under": [0, 1, 5, 6, 10, 13, 23, 25, 27, 33, 34, 35, 36, 37, 41, 42], "underdetermin": [0, 33], "underfit": [1, 6, 24, 37, 38, 41], "underflowproblem": [5, 37], "undergo": [5, 21], "undergradu": [29, 31], "underli": [0, 1, 9, 13, 18, 23, 30, 33, 36, 41], "underlin": [], "underscor": [], "underset": [4, 14], "understand": [0, 1, 3, 5, 6, 10, 13, 14, 15, 19, 20, 21, 25, 33, 34, 35, 36, 40, 41, 42, 43, 44], "understood": [8, 13], "underwai": [], "undesir": 8, "undetermin": [5, 8, 37], "undo": 4, "unexpect": [6, 37], "unexpected": 30, "unexplain": 18, "unfair": [6, 34], "unfortun": [1, 8, 9, 10, 41], "unicode_liter": [8, 9], "uniform": [0, 1, 5, 6, 11, 13, 27, 30, 33, 35, 36, 38, 39, 41], "uniformli": [13, 30, 35, 36], "unifrompdf": 30, "unimport": [13, 35], "union": [5, 6, 37, 38], "uniqu": [0, 2, 6, 13, 14, 26, 33, 37, 38, 39, 42, 43], "unique_class": [38, 39], "unique_cluster_label": 14, "unit": [0, 1, 3, 4, 5, 10, 12, 18, 30, 33, 34, 35, 36, 39, 40, 41, 42, 43, 44], "unitari": [5, 6, 26, 34, 35], "unitarili": [26, 33], "uniti": 30, "univari": 30, "univers": [0, 1, 2, 13, 25, 27, 28, 29, 31, 33, 34, 35, 36, 37, 38, 39, 41, 42, 43, 44], "unix": [1, 41, 42], "unknow": [0, 26, 33], "unknown": [0, 1, 3, 4, 5, 6, 8, 10, 13, 19, 26, 27, 33, 34, 35, 36, 37, 38, 40, 41, 43, 44], "unknowwn": 12, "unlabel": [1, 41], "unless": [0, 3, 6, 11, 13, 27, 28, 33, 35, 37, 40], "unlik": [1, 3, 8, 13, 35, 36, 41, 42, 43, 44], "unnecessarili": 9, "unord": [3, 43, 44], "unpickl": [], "unpleas": [], "unpublish": 36, "unravel": [1, 41], "unrol": [3, 11, 44], "unscal": 19, "unseen": [0, 7, 9, 15, 38, 39], "unstabl": [1, 41], "unsupervis": [0, 1, 4, 12, 25, 33, 39, 40, 41], "unsymmetr": [26, 33], "until": [1, 2, 4, 9, 12, 13, 14, 21, 35, 36, 39, 41, 42, 43], "untouch": 0, "unusu": [12, 39, 40], "unweight": 23, "up": [1, 3, 4, 5, 6, 8, 10, 11, 13, 14, 16, 18, 19, 20, 21, 22, 23, 25, 26, 27, 30, 31, 36, 39], "updat": [1, 2, 10, 12, 13, 14, 15, 18, 19, 21, 22, 28, 37, 38, 39, 43], "update_chang": [41, 42], "update_matrix": [41, 42], "update_weight": 22, "uploa": 33, "upload": [15, 20, 25, 27, 28, 32], "upon": [0, 1, 6, 7, 11, 26, 40, 41, 42], "upper": [0, 8, 9, 16, 26, 34, 43, 44], "uppercas": [26, 33], "upsampl": 4, "upscal": 4, "uptad": 40, "upward": [], "url": [33, 34, 39], "us": [4, 5, 6, 8, 9, 10, 11, 12, 14, 15, 17, 20, 21, 23, 24, 26, 30, 32, 37], "usag": [0, 8, 25, 33, 34, 40], "usd": [], "usd10000": [], "use_bia": 4, "usecol": [0, 33], "useless": [1, 41], "user": [0, 1, 2, 4, 6, 7, 15, 25, 26, 27, 33, 34, 38, 39, 41, 42, 43], "usernam": [15, 27, 28], "usetex": 30, "usg": 6, "usr": 30, "usual": [0, 3, 4, 7, 12, 13, 14, 23, 33, 36, 38, 39, 40, 43, 44], "ut": 5, "utf": [], "util": [1, 3, 4, 6, 7, 10, 14, 19, 33, 37, 38, 41, 42, 44], "ux": 26, "v": [2, 4, 5, 6, 11, 13, 15, 23, 25, 34, 35, 37, 38, 39, 40, 41, 42], "v0": 30, "v1": 30, "v2": 30, "v5": [], "v8xr": [39, 40, 41], "v_": 36, "v_0": [11, 36], "v_t": 36, "va": 1, "vahid": 33, "val": 13, "val_acc": [41, 42], "val_accuraci": [3, 44], "val_error": [41, 42], "val_loss": 4, "val_set": [41, 42], "vale": [2, 42, 43], "valid": [0, 1, 4, 7, 9, 10, 13, 23, 24, 25, 30, 33, 34, 36, 39, 41, 42], "validation_data": [3, 44], "validation_split": [4, 42], "valu": [0, 1, 2, 3, 4, 6, 7, 8, 9, 10, 12, 13, 14, 16, 17, 18, 20, 21, 22, 25, 
26, 27, 28, 33, 36, 39, 40, 41, 42, 43, 44], "valuat": 9, "valued_at_a": [41, 42], "valued_at_z": [41, 42], "valueerror": [], "valy": 4, "van": [0, 19, 27, 33, 34, 35, 36], "vandenbergh": [8, 13, 35], "vandermond": [0, 33], "vanilla": [0, 6, 11, 14, 34, 36], "vanish": [1, 4, 13, 24, 30, 35, 40], "var": [5, 6, 10, 11, 19, 27, 30, 34, 37, 38], "var_x": 30, "varabl": 8, "varepsilon": [5, 6, 19, 37], "varepsilon_": [5, 6, 37], "varepsilon_i": [5, 6, 37], "vari": [0, 1, 3, 5, 6, 10, 21, 23, 33, 37, 38, 40, 41, 44], "variabl": [0, 1, 2, 5, 6, 7, 8, 10, 11, 12, 13, 14, 21, 26, 33, 34, 36, 37, 38, 39, 40, 41, 42, 43, 44], "varianc": [0, 1, 5, 7, 9, 10, 11, 13, 14, 18, 20, 25, 26, 28, 30, 33, 34, 35, 36, 39, 41], "variance_i": [5, 11, 34], "variance_x": [5, 11, 34], "variant": [0, 1, 6, 8, 12, 13, 28, 33, 34, 35, 36, 39, 40, 41, 42], "variat": [3, 4, 11, 33, 43, 44], "varieti": [0, 3, 12, 25, 27, 33, 39, 40, 43, 44], "variou": [1, 3, 5, 6, 7, 8, 9, 11, 12, 13, 16, 19, 20, 25, 26, 27, 30, 33, 34, 35, 36, 39, 40, 41, 42], "varydimens": 4, "vast": 36, "vastli": [3, 43, 44], "vaue": 1, "vault": 0, "vdot": [2, 13, 35, 36, 42, 43], "ve": [27, 28, 36], "vec": [6, 37], "vector": [0, 1, 2, 3, 4, 5, 6, 7, 9, 10, 11, 13, 14, 17, 18, 21, 22, 25, 35, 36, 37, 38, 40, 41, 42], "vector_mean": 14, "ventur": [0, 8, 25, 33], "venv": 15, "verbos": [1, 3, 4, 38, 39, 41, 42, 44], "veri": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 16, 18, 21, 22, 27, 28, 30, 32, 33, 34, 35, 36, 37, 38, 41, 42, 43, 44], "verifi": [3, 11, 26, 33], "versatil": [8, 33], "versicolor": [8, 9], "version": [0, 3, 10, 13, 14, 15, 21, 22, 24, 25, 26, 27, 28, 30, 33, 43, 44], "versu": [1, 23, 36, 41], "vert": [0, 1, 5, 6, 7, 8, 9, 11, 13, 16, 17, 33, 34, 35, 36, 37, 38, 39, 40, 41], "vert_1": [5, 6, 34, 35, 36], "vert_2": [5, 6, 11, 17, 34, 35, 36, 37], "vg5wr4qee1zxk": 43, "vi": [41, 42], "via": [0, 5, 6, 7, 8, 9, 10, 11, 12, 19, 23, 25, 26, 27, 29, 30, 31, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42], "vidal": 11, "video": [0, 1, 12, 25, 29, 31, 33, 34, 35, 42, 43, 44], "view": [1, 3, 5, 6, 12, 13, 30, 32, 33, 35, 36, 37, 39, 41, 42, 43, 44], "vii": [41, 42], "viii": [41, 42], "violat": 8, "virginica": 9, "viridi": [0, 1, 2, 3, 33, 41, 42, 43, 44], "virtanen": [], "virtual": [1, 36, 41], "viscos": 13, "viscou": 13, "visibl": 15, "visin": [43, 44], "vision": [0, 3, 43, 44], "visit": 36, "visual": [0, 3, 11, 12, 18, 23, 25, 33, 34, 39, 40, 42, 43], "visualis": 1, "visualstudio": [15, 16, 19], "viz": [6, 8, 30], "vmap": 13, "vmax": [1, 6], "vmh0zpt0tli": 36, "vmin": [1, 6], "voic": [3, 43, 44], "volatil": 36, "volum": [0, 3, 33], "volume18": 42, "von": [40, 41], "vote": [10, 33], "voting_clf": 10, "votingclassifi": 10, "votingsimpl": 10, "vscode": [21, 22], "vstack": [5, 11, 26, 30, 33, 34, 38, 39, 41, 42], "vt": [5, 34, 35, 43], "w": [0, 1, 2, 3, 4, 5, 6, 7, 8, 10, 11, 12, 13, 14, 21, 22, 26, 30, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "w1": [8, 21, 22], "w2": [8, 11, 21, 22], "w3": 8, "w_": [1, 12, 39, 40, 41, 42, 43, 44], "w_0": 40, "w_1": [8, 26, 40, 41, 43, 44], "w_1a_0": [40, 41], "w_1x": [40, 41], "w_1x_": 8, "w_1x_1": 8, "w_2": [8, 26, 40, 41, 43, 44], "w_2a_1": [40, 41], "w_2x_": 8, "w_2x_2": 8, "w_3": 26, "w_4": 26, "w_g": [21, 22], "w_hidden": [2, 42, 43], "w_i": [1, 2, 10, 40, 41, 42, 43], "w_ix_i": [12, 39, 40], "w_j": 26, "w_m": 26, "w_output": [2, 42, 43], "w_px_": 8, "w_px_p": 8, "w_t": [], "wa": [1, 3, 4, 5, 6, 7, 10, 11, 12, 14, 17, 19, 21, 26, 33, 34, 36, 37, 38, 39, 40, 41, 42], "wai": [0, 1, 2, 3, 4, 5, 6, 7, 8, 10, 
11, 12, 13, 14, 15, 18, 19, 21, 22, 23, 24, 26, 30, 33, 34, 35, 36, 39, 41, 42], "walk": 9, "walker": 30, "wall": 36, "walt": [], "wang": [0, 33], "want": [0, 1, 2, 3, 4, 5, 6, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 20, 21, 22, 25, 27, 28, 30, 33, 34, 35, 36, 37, 38, 40, 41, 42, 43, 44], "warn": [4, 41, 42], "warrant": [6, 37, 38], "warranti": [], "wast": [3, 36, 43, 44], "watch": [25, 35, 36, 37, 39, 40, 41, 42, 43], "wave": 3, "wavelet": 8, "wcag": [], "we": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 30, 31, 32, 34, 35, 37, 38, 39, 43, 44], "weak": [9, 10, 14], "weaker": 36, "weather": [1, 12, 39, 40, 41], "web": [25, 29, 31, 33], "weblink": 28, "webpag": 33, "websit": [6, 26, 27, 28, 29, 33], "wedg": [8, 30, 40, 41], "wednesdai": [31, 33, 38, 42], "wee": 11, "week": [0, 5, 6, 7, 27, 28, 29, 31], "week41": [28, 42], "week42": [28, 42], "weekli": [15, 16, 24, 25, 27, 29, 31, 32, 33, 39], "weierstrass": 40, "weight": [1, 2, 3, 6, 7, 9, 10, 12, 13, 18, 21, 22, 23, 28, 30, 36, 38, 39, 40, 42, 43, 44], "weight_arrai": [41, 42], "weight_decai": 42, "weigth": [2, 22, 42, 43], "welchlab": [39, 40, 41], "welcom": [8, 15, 25], "well": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 13, 15, 16, 20, 21, 22, 23, 25, 26, 27, 28, 30, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42], "went": 8, "were": [0, 1, 3, 4, 5, 6, 7, 8, 10, 11, 12, 14, 30, 33, 36, 37, 38, 39, 40, 41, 42, 43, 44], "wessel": [0, 19, 27, 33, 34, 35, 36], "wg_nf1awssi": 40, "what": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 19, 20, 21, 22, 24, 25, 26, 27, 28, 30, 36, 39, 40, 41, 42], "whatev": [3, 21, 43, 44], "when": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 18, 19, 21, 22, 23, 24, 26, 27, 28, 30, 33, 34, 35, 37, 38, 39, 40, 41, 42, 43, 44], "whenev": [13, 15, 30, 36, 40], "where": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 20, 21, 22, 23, 24, 25, 26, 27, 28, 30, 31, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "wherea": [6, 23, 30, 36, 37, 38], "wherefrom": [27, 28], "wherein": [1, 12, 39, 40, 41], "whether": [0, 3, 5, 7, 9, 27, 28, 30, 33, 38, 39, 43, 44], "which": [0, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 33, 34, 35, 37, 38, 39, 40, 43, 44], "whichev": [1, 3, 41, 42, 44], "while": [0, 1, 3, 4, 5, 6, 7, 8, 9, 11, 12, 13, 15, 16, 19, 20, 21, 22, 23, 24, 30, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "white": 9, "whiteboad": 36, "whiteboard": [34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "who": [0, 15], "whole": [1, 3, 4, 5, 9, 11, 13, 21, 36, 41, 43, 44], "whom": [], "whose": [0, 6, 10, 23, 28, 30, 34, 37, 38, 43, 44], "whow": [11, 34], "why": [0, 1, 3, 6, 13, 15, 16, 17, 19, 21, 24, 27, 34, 35, 41], "wide": [0, 1, 3, 6, 7, 12, 24, 25, 26, 27, 33, 37, 38, 39, 40, 41, 43, 44], "widehat": [6, 37], "width": [0, 3, 8, 9, 21, 33, 43, 44], "wieringen": [0, 19, 27, 33, 34, 35, 36], "wiki": 27, "wikipedia": 27, "win": [10, 36], "wind": 9, "window": [43, 44], "wing": [31, 33], "winther": [2, 42, 43], "wiothout": 6, "wiscons": 7, "wisconsin": [10, 39, 41, 42], "wisdom": [6, 34, 36], "wise": [1, 5, 12, 13, 21, 34, 35, 36, 39, 41], "wish": [0, 2, 5, 7, 8, 11, 13, 14, 18, 26, 27, 28, 33, 34, 35, 36, 38, 39, 40, 41, 42, 43], "with_std": [0, 34], "wither": 6, "within": [0, 2, 3, 4, 7, 9, 12, 13, 14, 30, 32, 33, 35, 38, 39, 42, 43, 44], "withinclust": 14, "without": [0, 1, 5, 6, 8, 9, 11, 12, 13, 15, 18, 23, 27, 28, 33, 34, 35, 36, 37, 38, 39, 40, 41], 
"wo5dmep_bbi": [39, 40, 41], "won": [0, 15, 33, 40], "wonder": 8, "word": [0, 1, 3, 4, 5, 6, 7, 14, 19, 23, 27, 28, 30, 33, 34, 35, 36, 41, 42, 43, 44], "work": [0, 1, 4, 6, 7, 8, 9, 13, 15, 16, 18, 19, 20, 21, 22, 24, 25, 27, 28, 29, 30, 31, 33, 34, 36, 37, 38, 39, 40, 41, 42, 43, 44], "workabl": 36, "workaround": [], "workhors": 36, "workload": 36, "workshop": 33, "world": [0, 8, 16, 34], "worldwid": [0, 33], "worri": 15, "wors": [0, 1, 3, 4, 6, 24, 33, 36, 37, 38, 41, 44], "worst": 23, "worth": [9, 19, 21], "worthi": [27, 28], "would": [0, 1, 3, 5, 6, 7, 8, 9, 10, 11, 12, 13, 16, 18, 20, 22, 23, 24, 26, 27, 28, 30, 33, 34, 35, 36, 37, 38, 39, 40, 41, 43, 44], "wouldn": [], "wrap": [6, 26, 33], "wrapper": [21, 22], "write": [0, 1, 2, 3, 5, 6, 7, 8, 12, 13, 15, 16, 18, 21, 23, 24, 26, 33, 34, 36, 37, 38, 39, 40, 42, 43], "writer": [38, 39], "writerow": [38, 39], "written": [0, 2, 3, 5, 11, 12, 13, 16, 23, 25, 26, 27, 28, 30, 33, 34, 35, 36, 40, 41, 42, 43, 44], "wrong": [1, 8, 15, 19, 41], "wrongli": 10, "wrote": [5, 11, 34], "wrt": [10, 13, 21, 22, 36, 40, 41], "wth": [10, 13, 36], "wurstemberg": [40, 41], "www": [20, 25, 26, 27, 28, 32, 33, 35, 36, 37, 39, 40, 41, 42, 43, 44], "wx_1": 8, "x": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 21, 22, 23, 24, 26, 27, 28, 30, 33, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "x0": [8, 38, 39], "x1": [4, 8, 9, 10, 13, 38, 39], "x1_exampl": 8, "x1d": 8, "x2": [8, 9, 10, 13], "x2d": [8, 11], "x2d_train": 11, "x2dsl": 11, "x3": 8, "x_": [0, 2, 3, 5, 6, 8, 10, 11, 13, 14, 26, 30, 33, 34, 35, 36, 37, 38, 40, 42, 43, 44], "x_0": [0, 5, 11, 18, 26, 33, 34, 37, 40], "x_1": [0, 2, 5, 6, 7, 8, 9, 10, 11, 13, 18, 26, 30, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43], "x_2": [0, 2, 5, 6, 7, 8, 9, 10, 11, 13, 26, 30, 33, 34, 35, 37, 38, 39, 40, 41, 42, 43], "x_3": [8, 26, 30, 40], "x_4": [26, 40], "x_5": 40, "x_6": 18, "x_batch": [41, 42], "x_bin": [38, 39], "x_center": 11, "x_data": [1, 41], "x_data_ful": [1, 41], "x_hidden": [2, 42, 43], "x_i": [0, 1, 2, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 26, 30, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43], "x_input": [2, 42, 43], "x_ix_": [0, 33], "x_iy_i": 8, "x_j": [0, 2, 8, 9, 12, 16, 30, 34, 36, 39, 40, 42, 43], "x_jy_j": 8, "x_k": [12, 14, 26, 30, 34, 39], "x_l": [30, 40], "x_m": [6, 12, 26, 30, 37, 39], "x_mean": [18, 36], "x_multi": [38, 39], "x_n": [0, 2, 3, 6, 8, 11, 12, 13, 26, 30, 33, 35, 37, 39, 40, 42, 43], "x_new": [9, 10], "x_norm": [18, 36], "x_offset": [6, 34, 36], "x_output": [2, 42, 43], "x_p": [3, 7, 9, 38, 39], "x_poli": 9, "x_poly10": 9, "x_pred": 4, "x_prev": [2, 42, 43], "x_reduc": 11, "x_sampl": [], "x_scale": 8, "x_small": 13, "x_std": [18, 36], "x_t": 36, "x_test": [0, 1, 3, 5, 6, 7, 9, 10, 11, 15, 16, 17, 19, 23, 28, 34, 35, 36, 37, 38, 39, 41, 42, 44], "x_test_": 17, "x_test_own": 6, "x_test_scal": [0, 6, 7, 9, 10, 11, 34, 36], "x_tot": 4, "x_train": [0, 1, 3, 4, 5, 6, 7, 9, 10, 11, 15, 16, 17, 19, 23, 28, 33, 34, 35, 36, 37, 38, 39, 41, 42, 44], "x_train_": 17, "x_train_mean": [6, 34, 36], "x_train_own": 6, "x_train_r": 19, "x_train_scal": [0, 6, 7, 9, 10, 11, 34, 36], "x_val": [1, 41, 42], "xapprox": 43, "xarrai": [25, 33], "xavier": [1, 41], "xbnew": [13, 35, 36], "xcode": [0, 25, 27, 33], "xdclassiffierconfus": 10, "xdclassiffierroc": 10, "xg_clf": 10, "xgb": 10, "xgbclassifi": 10, "xgboost": 9, "xgboot": 10, "xgbregressor": 10, "xgparam": 10, "xgtree": 10, "xi": [8, 13, 36, 38, 39], "xi_": 8, "xi_1": 8, "xi_i": 8, "xinv": 39, "xk": 8, "xla": [13, 25, 33], "xlabel": [0, 
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 13, 21, 30, 33, 34, 35, 36, 37, 38, 42, 43, 44], "xlim": [6, 10, 37, 38], "xm": 9, "xmesh": 13, "xnew": [0, 13, 33, 35, 36], "xp": 30, "xpanda": [0, 34], "xpd": [5, 11, 34], "xplot": 0, "xscale": [0, 34], "xsr": 9, "xt_x": [13, 35, 36], "xtest": [6, 37, 38], "xtick": [3, 6, 8, 9, 37, 38, 44], "xtrain": [6, 37, 38], "xu": [0, 33], "xx": [0, 26, 33], "xy": [0, 6, 8, 26, 33], "xytext": 8, "xyz": [], "xz": [26, 33], "y": [0, 1, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 23, 26, 27, 28, 30, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "y1": 4, "y2": 4, "y3": 4, "y_": [0, 1, 5, 6, 10, 11, 26, 33, 34, 37, 38, 41, 43, 44], "y_0": [0, 5, 11, 26, 33, 34, 37], "y_1": [0, 5, 8, 9, 11, 13, 26, 33, 34, 35, 36, 37], "y_1y_1": 8, "y_1y_1k": 8, "y_1y_2": 8, "y_1y_2k": 8, "y_1y_n": 8, "y_1y_nk": 8, "y_2": [0, 5, 8, 9, 11, 26, 33, 34], "y_2y_1": 8, "y_2y_1k": 8, "y_2y_2": 8, "y_2y_2k": 8, "y_3": [0, 9, 26], "y_4": 26, "y_bin": [38, 39], "y_binari": [38, 39], "y_center": [18, 36], "y_data": [0, 1, 5, 6, 33, 34, 35, 36, 41], "y_data_ful": [1, 41], "y_decis": 8, "y_fit": [0, 34], "y_i": [0, 1, 5, 6, 7, 8, 9, 10, 11, 12, 13, 19, 26, 27, 28, 33, 34, 35, 36, 37, 38, 39, 40, 41], "y_if_": 10, "y_indic": [38, 39], "y_ix_": [0, 33], "y_ix_i": [7, 8, 13, 34, 35, 36, 38, 39], "y_iy_jk": 8, "y_j": [6, 8, 12, 27, 37, 38, 39, 40, 41], "y_k": [12, 39], "y_m": 26, "y_mean": [18, 36], "y_model": [0, 4, 5, 6, 33, 34, 35, 36], "y_multi": [38, 39], "y_n": [8, 13, 35, 36], "y_ny_1": 8, "y_ny_1k": 8, "y_ny_2": 8, "y_ny_2k": 8, "y_ny_n": 8, "y_ny_nk": 8, "y_offset": [6, 17, 34, 36], "y_onehot": [38, 39], "y_plot": 9, "y_pred": [0, 1, 4, 6, 7, 8, 9, 10, 23, 28, 34, 36, 37, 38, 39, 41], "y_pred1": 9, "y_pred2": 9, "y_pred_bin": [38, 39], "y_pred_multi": [38, 39], "y_pred_rf": 10, "y_pred_tre": 10, "y_prob": [38, 39], "y_prob_bin": [38, 39], "y_prob_multi": [38, 39], "y_proba": [7, 10, 23, 39], "y_sampl": [], "y_scaler": [6, 34, 36], "y_test": [0, 1, 3, 4, 5, 6, 7, 9, 10, 11, 15, 16, 17, 19, 23, 28, 34, 35, 36, 37, 38, 39, 41, 42, 44], "y_test_onehot": [1, 41], "y_test_predict": [], "y_tot": 4, "y_train": [0, 1, 3, 4, 5, 6, 7, 9, 10, 11, 15, 16, 17, 19, 23, 28, 33, 34, 35, 36, 37, 38, 39, 41, 42, 44], "y_train_mean": [6, 34, 36], "y_train_onehot": [1, 41], "y_train_predict": [], "y_train_r": 19, "y_train_scal": [6, 34, 36], "y_true": [38, 39], "y_val": 1, "yadav": [42, 43], "yand": 39, "ye": [3, 6, 7, 37, 38, 39, 43, 44], "year": [0, 25, 33, 43], "yet": [0, 1, 6, 8, 11, 13, 20, 21, 33, 38, 40, 41, 42], "yi": [13, 36, 38, 39], "yield": [0, 2, 5, 6, 8, 10, 12, 13, 14, 23, 26, 30, 33, 35, 36, 37, 39, 40, 41, 42, 43], "yk": 8, "ylabel": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 13, 21, 30, 33, 34, 35, 36, 37, 38, 42, 43, 44], "ylim": [3, 6, 37, 38, 44], "ym": 9, "ymesh": 13, "yn": 0, "yo": [8, 9, 10], "yor": 39, "yoshiki": [], "yoshua": [1, 32, 41], "you": [0, 1, 3, 4, 5, 6, 8, 9, 10, 11, 13, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 30, 31, 32, 33, 34, 35, 36, 37, 38, 40, 41, 42, 43, 44], "young": 0, "your": [1, 2, 4, 5, 6, 8, 11, 13, 15, 17, 19, 20, 21, 22, 23, 24, 25, 26, 33, 35, 36, 37, 38, 39, 40, 41, 42, 43], "your_model_object": 16, "yourself": [11, 13, 23, 33, 35, 43, 44], "youtu": [34, 35, 37, 39, 41, 42, 43, 44], "youtub": [25, 35, 36, 37, 39, 40, 41, 42, 43, 44], "ypred": [6, 37, 38], "ypredict": [0, 13, 33, 34, 35, 36], "ypredict2": [13, 35, 36], "ypredictlasso": [5, 35], "ypredictol": [0, 5, 35], "ypredictown": [6, 34, 36], "ypredictownridg": [6, 34, 35, 36], 
"ypredictridg": [0, 5, 6, 34, 35, 36], "ypredictskl": [6, 34, 36], "ytest": [6, 37, 38], "ytick": [3, 6, 8, 9, 37, 38, 44], "ytild": [0, 6, 33, 34, 37, 38], "ytildelasso": [5, 35], "ytildenp": [0, 33, 34], "ytildeol": [0, 5, 35], "ytildeownridg": [6, 34, 35, 36], "ytilderidg": [5, 6, 34, 35, 36], "ytrain": [6, 37, 38], "yuxi": 33, "yx": [26, 33], "yxor": [39, 41, 42], "yy": [26, 33], "yz": [26, 33], "z": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 12, 13, 21, 22, 26, 30, 33, 34, 37, 38, 39, 40, 41, 42, 43, 44], "z1": [21, 22], "z2": [21, 22], "z_": [1, 2, 12, 26, 33, 40, 41, 42, 43], "z_0": [26, 33, 40], "z_1": [26, 33, 40, 41], "z_2": [22, 26, 33, 40, 41], "z_c": [1, 41], "z_h": [1, 41], "z_hidden": [2, 42, 43], "z_i": [1, 12, 39, 41], "z_j": [1, 12, 42], "z_k": [12, 34, 40, 41], "z_m": [1, 41], "z_matric": [41, 42], "z_mod": 9, "z_o": [1, 41], "z_output": [2, 42, 43], "za": [], "zalando": 28, "zaman": 30, "zaxi": 6, "zero": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 16, 17, 18, 19, 21, 26, 27, 28, 30, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "zero_grad": [42, 44], "zeros_lik": [4, 38, 39], "zeroth": [34, 43, 44], "zfill": 4, "zip": [4, 6, 21, 22, 38, 39], "zm_h": [0, 33], "zn": [], "zone": [], "zoom": 33, "zscout": [], "zx": [26, 33], "zy": [26, 33], "zz": [26, 33], "\u00f8yvind": [6, 34, 36], "\u03b4": [41, 42]}, "titles": ["3. Linear Regression", "14. Building a Feed Forward Neural Network", "15. Solving Differential Equations with Deep Learning", "16. Convolutional Neural Networks", "17. Recurrent neural networks: Overarching view", "4. Ridge and Lasso Regression", "5. Resampling Methods", "6. Logistic Regression", "8. Support Vector Machines, overarching aims", "9. Decision trees, overarching aims", "10. Ensemble Methods: From a Single Tree to Many Trees and Extreme Boosting, Meet the Jungle of Methods", "11. Basic ideas of the Principal Component Analysis (PCA)", "13. Neural networks", "7. Optimization, the central part of any Machine Learning algortithm", "12. Clustering and Unsupervised Learning", "Exercises week 34", "Exercises week 35", "Exercises week 36", "Exercises week 37", "Exercises week 38", "Exercises week 39", "Exercises week 41", "Exercises week 42", "Exercises week 43", "Exercises week 44", "Applied Data Analysis and Machine Learning", "2. Linear Algebra, Handling of Arrays and more Python Features", "Project 1 on Machine Learning, deadline October 6 (midnight), 2025", "Project 2 on Machine Learning, deadline November 10 (Midnight)", "Course setting", "1. 
Elements of Probability Theory and Statistical Data Analysis", "Teachers and Grading", "Textbooks", "Week 34: Introduction to the course, Logistics and Practicalities", "Week 35: From Ordinary Linear Regression to Ridge and Lasso Regression", "Week 36: Linear Regression and Gradient descent", "Week 37: Gradient descent methods", "Week 38: Statistical analysis, bias-variance tradeoff and resampling methods", "Week 39: Resampling methods and logistic regression", "Week 40: Gradient descent methods (continued) and start Neural networks", "Week 41 Neural networks and constructing a neural network code", "Week 42 Constructing a Neural Network code with examples", "Week 43: Deep Learning: Constructing a Neural Network code and solving differential equations", "Week 44, Solving differential equations with neural networks and start Convolutional Neural Networks (CNN)", "Week 45, Convolutional Neural Networks (CCNs)"], "titleterms": {"": [8, 10, 35, 36, 37, 38, 39], "0": 41, "04": [], "05": [], "06": [], "07": [], "1": [0, 15, 16, 17, 18, 19, 20, 21, 22, 24, 27, 28, 34, 40, 41, 42], "10": [28, 40], "11": [], "13": 41, "15": [19, 37], "19": 19, "1a": 18, "2": [0, 15, 16, 17, 18, 19, 20, 21, 22, 24, 28, 33, 34, 35, 40, 41, 42], "20": 42, "2017": [], "2018": [], "2019": [], "2023": 31, "2025": [27, 38, 39, 40, 41], "22": 38, "26": 38, "27": 43, "29": 39, "2a": [], "2b": [], "3": [0, 15, 16, 17, 18, 19, 20, 21, 22, 34, 40, 41, 42, 44], "34": [15, 33], "35": [16, 34], "36": [17, 35], "37": [18, 36], "38": [19, 37], "39": [20, 38], "3a": 18, "3b": 18, "3d": [43, 44], "4": [0, 15, 16, 17, 18, 19, 20, 21, 22, 34, 41], "40": 39, "41": [21, 40], "42": [22, 41], "43": [23, 42], "44": [24, 43], "45": 44, "4a": 18, "4b": 18, "5": [0, 16, 18, 19, 20, 21, 22], "6": [21, 22, 27, 40], "7": [21, 22], "8": [22, 36], "A": [0, 1, 4, 8, 9, 33, 37, 38, 39, 41, 42, 43, 44], "AND": 39, "And": [33, 34, 36, 42], "But": 36, "In": [31, 40], "Ising": 6, "OR": 39, "The": [0, 1, 2, 3, 5, 6, 7, 8, 9, 11, 12, 15, 25, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "To": 33, "With": [4, 35], "a11i": [], "about": [28, 33, 34, 35], "abov": [35, 40, 41, 42, 43, 44], "abstract": 20, "accuraci": 36, "across": 36, "activ": [1, 12, 21, 28, 39, 40, 41, 42], "ad": [0, 6, 20, 27, 33, 34, 39, 40, 41], "adaboost": 10, "adagrad": [13, 36], "adam": [13, 36], "adapt": [10, 36], "add": 44, "addit": [], "adjust": [1, 41], "advanc": 27, "adversari": 4, "again": [3, 9, 44], "against": 28, "ai": [27, 28, 33], "aim": [8, 9, 21, 22, 23, 24, 33], "aka": 33, "al": [36, 43, 44], "algebra": [26, 33], "algorithm": [9, 10, 11, 12, 28, 33, 34, 35, 36, 40, 41, 42], "algortithm": [13, 35, 38, 39], "all": [8, 40, 41], "an": [0, 4, 10, 15, 20, 33, 40], "analys": [5, 34, 35], "analysi": [0, 5, 6, 11, 25, 27, 28, 30, 33, 34, 35, 37, 38, 40], "analyt": [0, 16, 18, 28, 42], "analyz": [28, 40, 41], "ani": [13, 22, 35, 38, 39], "anoth": [9, 35, 37, 38], "api": [], "appli": 25, "approach": [0, 8, 14, 33, 36, 37, 38], "approxim": [12, 40], "architectur": [1, 41], "arithmet": [43, 44], "arrai": [26, 33], "artifici": [39, 40], "assist": 31, "assumpt": 37, "august": [], "author": [], "autocorrel": 30, "autograd": [2, 13, 22, 36, 42, 43], "automat": [13, 36, 40, 42, 43], "avail": 20, "avali": [], "averag": 36, "b": [23, 27, 28], "back": [1, 11, 12, 34, 35, 40, 41, 42, 43], "background": [25, 27, 28, 37], "backpropag": 22, "bag": 10, "base": [13, 36, 37], "basic": [0, 5, 7, 9, 10, 11, 26, 34, 35, 38, 39, 40], "batch": [1, 22, 36, 41], "bay": 5, "befor": [11, 43], "bengio": 41, 
"beta": [], "better": [8, 39], "bia": [6, 19, 27, 36, 37, 38], "bias": [40, 41], "binari": [1, 41], "bind": 33, "bird": 10, "blind": [], "block": [], "boldsymbol": [18, 34, 37], "book": [19, 40, 41], "boost": 10, "bootstrap": [6, 10, 37, 38], "boston": [], "breast": 1, "brief": [33, 37, 38, 43, 44], "bring": [12, 40, 41], "browser": [], "bsd": [], "build": [1, 3, 9, 41, 42, 43, 44], "c": [23, 27, 28, 33], "calcul": [18, 34, 35], "can": [33, 36, 37, 38, 40], "cancer": [1, 7, 9, 11], "cart": 9, "case": [8, 10, 30, 34, 35, 36, 38, 39, 43, 44], "ccn": 44, "cdn": [], "cell": [], "central": [13, 25, 30, 35, 37, 38, 39], "chain": [12, 40, 41], "challeng": 36, "chang": 10, "changelog": [], "channel": 33, "chi": [0, 33], "choic": [17, 41], "choos": [1, 36, 41], "cifar01": [3, 44], "citat": [], "class": [38, 39, 40], "classic": 11, "classif": [1, 9, 10, 28, 38, 39, 41, 42], "classifi": [8, 38], "claus": [], "clip": [1, 41], "cluster": 14, "cnn": [3, 43, 44], "code": [1, 2, 5, 9, 11, 12, 13, 14, 15, 16, 20, 27, 28, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "collect": [1, 3, 41, 42, 44], "color": [], "colorblind": [], "combin": 36, "common": [43, 44], "commun": 33, "commut": [43, 44], "compact": [38, 39, 40, 41], "compar": [2, 10, 16, 42, 43], "comparison": [35, 36], "compet": 36, "compil": 44, "complet": [34, 40, 41], "complex": [0, 6, 27, 34], "complic": [6, 40], "compon": 11, "compress": 43, "comput": [9, 19, 36], "computation": [37, 38], "computerlab": 33, "con": [9, 36], "concept": 30, "condit": 35, "confid": 37, "confus": 23, "conjug": 13, "consider": [40, 41, 43, 44], "constraint": 36, "construct": [40, 41, 42], "contain": [], "content": [], "continu": 39, "contn": 33, "contrast": [], "contributor": [], "converg": 36, "convex": [8, 13, 35, 36], "convolut": [3, 12, 39, 40, 43, 44], "copyright": [], "core": [], "correct": 36, "correl": [11, 34, 39, 43, 44], "correspond": [], "cost": [1, 10, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43], "count": 40, "cours": [25, 29, 32, 33], "covari": [5, 11, 30, 34], "cover": 33, "creat": [16, 20], "creator": [], "critic": 28, "cross": [6, 27, 37, 38, 39, 43, 44], "ct": [43, 44], "cumul": 23, "curv": 23, "custom": 21, "cython": 33, "d": [27, 28], "dark": [], "data": [0, 1, 3, 6, 7, 9, 11, 15, 17, 18, 21, 25, 30, 33, 34, 38, 39, 40, 41, 42, 44], "dataset": [1, 3, 18, 41, 44], "david": 33, "deadlin": [27, 28, 33], "deadllin": 31, "decai": [2, 36, 42, 43], "decis": [9, 10], "decomposit": [5, 11, 26, 34, 35], "deeep": [], "deep": [1, 2, 33, 36, 38, 39, 40, 41, 42, 43, 44], "defin": [1, 33, 40, 41, 42, 43], "definit": [19, 40, 41], "deflist": [], "degre": [0, 17, 34], "deliver": [15, 16, 19, 20, 24, 27, 28], "deliveri": [27, 28], "delta": 37, "dens": [0, 44], "depend": [], "depth": 28, "deriv": [5, 12, 16, 17, 19, 34, 35, 36, 37, 40, 41], "descent": [2, 10, 13, 18, 27, 35, 36, 39, 42, 43], "design": 34, "detail": [3, 33, 42, 43, 44], "develop": [1, 41], "diagon": 11, "differ": [8, 28, 36, 43, 44], "differenti": [2, 13, 36, 40, 42, 43], "diffus": [2, 42, 43], "dimens": 36, "dimension": [2, 3, 8, 18, 42, 43, 44], "direct": [], "disadvantag": 9, "discret": [30, 43, 44], "discrimin": 33, "discuss": 39, "distribut": [5, 30, 37], "do": [1, 36, 39, 41, 43], "document": 20, "doe": [34, 35, 39], "domain": 30, "don": [43, 44], "dot": [43, 44], "down": [1, 41], "dropout": [1, 41], "e": [27, 28], "each": [21, 38], "economi": [34, 35], "effici": [43, 44], "electron": [27, 28], "element": [0, 30, 33], "elimin": 26, "elu": [41, 42], "empir": 36, "energi": 33, "ensembl": 10, 
"entri": [40, 41], "entropi": [9, 38, 39], "environ": [0, 15], "equat": [0, 2, 12, 34, 35, 38, 39, 40, 41, 42, 43], "era": 43, "error": [0, 10, 33, 34, 35, 37, 38], "essenti": 33, "estim": 37, "et": [36, 43, 44], "etc": [33, 43, 44], "euler": [2, 42, 43], "evalu": [1, 28, 40, 41, 44], "evid": 36, "exampl": [1, 2, 3, 4, 6, 7, 8, 9, 10, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "exercis": [0, 6, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 34, 40, 42], "expect": [19, 30, 37], "expens": [37, 38], "experi": 30, "explicit": [40, 41], "explod": 41, "explor": 0, "exponenti": [2, 36, 42, 43], "express": [16, 17, 19, 34, 38, 39, 40, 41], "extend": [35, 38, 39, 40], "extrapol": 4, "extrem": [10, 33], "ey": 10, "f": [27, 28], "f_1": 23, "fall": 31, "famili": [1, 33, 41, 42], "famou": 26, "fantast": [34, 35], "faq": [], "featur": [9, 16, 26, 34], "februari": [], "feed": [1, 12, 22, 39, 40, 41, 42], "figur": 20, "file": [43, 44], "fill": [], "final": [12, 34, 36, 40, 41, 42, 43, 44], "find": [16, 18, 37, 43, 44], "fine": [1, 41], "first": [4, 12, 33, 35, 40, 41, 42, 43], "fit": [0, 10, 15, 16, 33, 35], "fix": [34, 35, 36], "float": 40, "fold": [37, 38], "forc": 3, "forest": 10, "form": 18, "format": [27, 28, 33], "formula": 18, "forward": [1, 2, 12, 22, 39, 40, 41, 42, 43], "foster": 33, "fourier": [3, 43, 44], "frank": 6, "freedom": [0, 17, 34], "frequent": [34, 36], "frequentist": [0, 33], "from": [5, 10, 12, 28, 33, 34, 35, 36, 37, 38, 39, 40, 41, 43, 44], "full": [2, 36, 41, 42, 43, 44], "function": [0, 1, 6, 7, 8, 10, 11, 12, 13, 27, 28, 30, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43], "funtion": 41, "further": [3, 5, 34, 35, 43, 44], "g": [27, 28], "gain": 23, "gan": 4, "gate": [39, 41, 42], "gaussian": 26, "gd": [13, 36], "gener": [4, 9, 33, 38, 39, 40, 43, 44], "geometr": [11, 35], "get": [20, 40], "gini": 9, "github": 15, "glorot": 41, "goal": [15, 16, 17, 18, 19, 20], "good": [0, 20, 33], "goodfellow": 36, "gotthard": [], "grade": [31, 33], "gradient": [1, 2, 10, 13, 18, 22, 27, 28, 35, 36, 39, 40, 41, 42, 43], "greativ": [], "group": 38, "growth": [2, 42, 43], "guid": [], "h": 27, "ha": 25, "hand": [22, 40, 41], "handl": [26, 33], "happen": [37, 38], "hessian": [34, 35, 36], "hidden": [2, 40, 41, 42, 43], "high": [], "histogram": 37, "histori": [], "homogen": 41, "hous": [], "how": [16, 43], "hyperbol": [39, 41], "hyperparamet": [1, 17, 41], "hyperplan": 8, "i": [0, 1, 33, 41, 42, 43, 44], "id3": 9, "idea": [11, 43, 44], "ideal": 35, "ident": 37, "identifi": 37, "ii": [33, 42, 43], "iid": 37, "iii": [42, 43], "illustr": [35, 39, 40], "imag": [43, 44], "implement": [1, 16, 17, 18, 28, 41, 42, 43], "implic": [5, 34, 35], "import": [5, 26, 33, 34, 35, 40, 41, 44], "improv": [1, 36, 41], "includ": [13, 27, 28, 36, 38, 39, 40], "incorpor": [], "increment": 11, "independ": 37, "index": 9, "inform": 31, "ingredi": 40, "init": [41, 42], "input": [2, 21, 22, 40, 41, 42, 43], "insight": 41, "instal": [25, 27, 33], "instructor": 31, "intermedi": 40, "interpret": [5, 11, 19, 33, 34, 35, 37], "interv": 37, "introduc": [11, 13, 34], "introduct": [0, 6, 20, 25, 26, 27, 28, 33, 39, 40], "invers": [5, 26], "invert": [34, 35], "ipython": [], "iter": 10, "its": 34, "iv": [42, 43], "j": [], "jacobian": [34, 42, 43], "januari": [], "jax": 13, "job": 39, "julia": 33, "jungl": 10, "jupyt": [], "k": [37, 38, 40, 41], "kei": [43, 44], "kera": [1, 3, 41, 42, 43, 44], "kernel": [8, 11], "l": [40, 41], "lab": [35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "lagrangian": 8, "lasso": [5, 6, 27, 34, 35], "last": [34, 36, 39, 
40, 41, 44], "later": [5, 34, 35], "layer": [1, 2, 3, 12, 21, 22, 40, 41, 42, 43, 44], "layout": [40, 41], "learn": [0, 1, 2, 11, 13, 14, 15, 16, 17, 18, 19, 20, 25, 27, 28, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43], "least": [5, 6, 16, 19, 27, 28, 33, 34, 35, 36], "lectur": [33, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "level": 10, "librari": [25, 28, 33], "licens": [], "light": [], "likelihood": [7, 37, 38, 39], "limit": [1, 13, 30, 35, 36, 37, 41], "linear": [0, 8, 13, 15, 26, 33, 34, 35, 38], "link": [5, 11, 32, 34, 37], "list": [40, 41], "literatur": [27, 28], "logist": [7, 33, 38, 39, 41], "loss": [34, 35, 36], "lu": 26, "ma": [], "machin": [0, 8, 13, 25, 27, 28, 33, 35, 38, 39], "machineri": 28, "made": 37, "main": [30, 33], "make": [0, 9, 10, 20, 34], "mani": [10, 12], "markdown": [], "mask": [], "maskedarrai": [], "mass": 33, "materi": [27, 28, 33, 34, 35, 36, 37, 38, 40, 41, 43, 44], "math": [5, 34, 35], "mathemat": [3, 5, 8, 34, 35, 39, 40, 41, 43, 44], "matplotlib": [], "matric": [5, 26, 33, 43, 44], "matrix": [1, 5, 11, 12, 16, 23, 26, 33, 34, 35, 36, 39, 41], "matter": 0, "max": 34, "maximum": [37, 38, 39], "me": [], "mean": [0, 34, 35, 38], "measur": [23, 39], "medic": [43, 44], "meet": [5, 10, 30, 33, 34], "memori": [36, 43, 44], "mercer": 8, "metadata": [], "method": [6, 9, 10, 13, 27, 28, 33, 35, 36, 37, 38, 39, 41, 42], "metric": 19, "midnight": [27, 28], "min": 34, "mini": 36, "minibatch": 36, "minim": [33, 38, 39, 42, 43], "mit": [], "ml": 33, "mle": 37, "mlp": 12, "mnist": [3, 4, 42, 44], "mode": 40, "model": [0, 1, 4, 6, 12, 15, 17, 33, 39, 40, 41, 43, 44], "moment": 36, "momentum": [13, 27, 36], "mondai": [35, 36, 37, 39, 40, 42, 43, 44], "moon": [8, 9], "more": [3, 6, 26, 27, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "motiv": 36, "move": 36, "multi": [39, 40, 41], "multiclass": [41, 42], "multilay": [12, 39, 40], "multipl": [1, 3, 17, 21, 41, 43, 44], "multipli": 8, "multivari": 40, "myst": [], "ncsa": [], "need": [27, 33], "network": [1, 2, 3, 4, 7, 12, 28, 33, 36, 38, 39, 40, 41, 42, 43, 44], "neural": [1, 2, 3, 4, 7, 12, 28, 33, 36, 39, 40, 41, 42, 43, 44], "neuron": [39, 40, 43, 44], "new": [4, 18, 37, 40, 43, 44], "newton": [35, 36, 38, 39], "nn": [40, 41, 42, 43, 44], "node": [40, 41], "noeds": 41, "non": [8, 36], "none": 36, "norm": 28, "normal": [0, 1, 37, 41], "notat": [12, 39], "note": [27, 28, 34, 35], "notebook": [], "novemb": [28, 44], "now": [1, 9, 13, 35, 36, 37, 38], "nuclear": [0, 33], "nueral": 38, "numba": 33, "number": [0, 2, 22, 30, 34, 36, 40, 42, 43, 44], "numer": [2, 27, 28, 30, 42, 43], "numpi": [26, 33], "object": [3, 22, 41, 43, 44], "observ": [40, 41], "obtain": 11, "octob": [27, 40, 41, 42, 43], "od": [2, 42, 43], "off": [6, 19, 27], "ol": [5, 6, 15, 16, 18, 27, 35, 37], "onc": 21, "one": [2, 12, 18, 22, 35, 40, 41, 42, 43, 44], "ones": [39, 41], "open": [], "oper": [26, 40], "optim": [1, 8, 13, 18, 25, 33, 34, 35, 36, 38, 39, 40, 41], "option": [21, 22, 28], "order": [13, 18, 36], "ordinari": [5, 6, 16, 19, 27, 33, 34, 35, 36, 42, 43], "organ": [0, 33], "orient": [22, 41], "oslo": 32, "other": [4, 9, 11, 12, 23, 26, 27, 28, 33, 39, 40, 41, 42], "ouput": [40, 41], "our": [0, 4, 5, 11, 13, 27, 28, 33, 34, 35, 38, 39, 41, 42], "outcom": [25, 33], "output": [2, 40, 41, 42, 43], "over": [40, 41], "overarch": [0, 4, 8, 9, 21, 22, 23, 24, 33, 34, 40], "overview": [10, 33, 36], "own": [0, 10, 11, 27, 28, 33, 34, 42], "packag": [26, 33], "pad": [43, 44], "panda": [33, 34], "paper": 41, "parallel": 40, "paramet": [33, 34, 38, 39, 40, 
41, 43, 44], "paramt": 18, "part": [13, 25, 27, 28, 35, 38, 39, 40, 41, 42, 44], "partial": [2, 42, 43], "pass": [1, 22, 41], "pca": 11, "pdf": 30, "percepetron": [40, 41], "perceptron": [12, 39, 40, 41], "perform": [1, 9, 41, 43, 44], "period": 3, "perspect": [1, 41], "pitaya": [], "plan": [34, 35, 36, 37, 38, 40, 42, 43, 44], "plethora": 33, "plot": [37, 38], "point": [4, 40], "poisson": [2, 42, 43], "polici": [], "polynomi": [3, 16, 18, 35, 43, 44], "pool": [43, 44], "popul": [2, 42, 43], "popular": 33, "possibl": [42, 43], "practic": [13, 31, 33, 36], "pre": [1, 3, 41, 42, 44], "preambl": [27, 28], "precis": 23, "predict": [4, 21], "predictor": [38, 39], "preprocess": [34, 36], "prerequisit": [3, 25, 33, 44], "present": 20, "princip": 11, "principl": 3, "pro": [9, 36], "probabl": [5, 30, 37], "problem": [1, 2, 13, 33, 34, 35, 36, 38, 39, 40, 41, 42, 43], "procedur": [9, 33], "process": [1, 3, 21, 41, 42, 43, 44], "product": [43, 44], "program": [2, 13, 27, 28, 35, 36, 40, 42, 43], "project": [6, 20, 27, 28, 31, 33], "prop": 13, "propag": [1, 12, 40, 41, 42, 43], "properti": [5, 30, 34, 35, 36, 38], "python": [0, 9, 15, 25, 26, 33], "pytorch": [42, 43, 44], "quick": 8, "quickli": [], "r": 33, "random": [10, 11, 30], "raphson": [35, 38, 39], "raschka": [43, 44], "rate": [27, 36, 41, 42], "read": [9, 33, 34, 36, 37, 38, 39, 40, 41], "real": [6, 21, 33, 40], "recal": 23, "recogn": [43, 44], "recommend": [33, 34, 41], "record": [], "recurr": [4, 12, 39, 40], "reduc": [0, 34, 40], "reduct": [3, 44], "refer": [27, 28], "referenc": 20, "reformul": [2, 42, 43], "regress": [0, 5, 6, 7, 9, 10, 13, 15, 17, 18, 19, 27, 28, 33, 34, 35, 36, 37, 38, 39], "regular": [1, 41, 43, 44], "relat": [], "relev": [32, 34, 39, 41], "relu": [1, 41, 42], "remark": [3, 43, 44], "remind": [6, 8, 28, 33, 34, 35, 36, 40, 41, 44], "replac": [13, 36], "report": [20, 27, 28], "repositori": [15, 37, 38], "requir": [2, 25, 28, 42, 43], "resampl": [6, 19, 27, 37, 38], "rescal": [6, 34], "residu": [34, 35], "resourc": [2, 42, 43], "result": [34, 35, 40, 41], "revers": 40, "revis": [], "revisit": [13, 35, 36, 38, 39], "rewrit": [33, 34, 37, 43, 44], "rewritten": [38, 39], "ridg": [0, 5, 6, 17, 18, 19, 27, 34, 35, 36], "rm": 13, "rmsprop": 36, "roc": 23, "role": [], "root": 41, "rule": [12, 36, 40, 41], "run": 44, "rung": 27, "same": [13, 36, 37, 38], "sampl": 11, "scalabl": 36, "scale": [17, 18, 19, 34, 36, 43, 44], "scan": [43, 44], "schedul": [33, 41, 42], "schemat": 9, "scheme": [2, 42, 43], "scienc": 33, "scikit": [0, 1, 11, 33, 34, 35, 36, 37, 38, 39, 41], "second": [13, 18, 36], "select": 38, "semest": 31, "sensit": 35, "septemb": [19, 35, 36, 37, 38, 39], "seri": [43, 44], "seriou": 40, "session": [35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "set": [0, 2, 3, 9, 12, 15, 29, 33, 34, 35, 40, 41, 42, 43, 44], "setup": [15, 42, 43, 44], "sgd": [13, 36], "should": [1, 28, 41, 42], "show": [], "sigmoid": 41, "similar": [13, 36, 42], "simpl": [0, 4, 9, 13, 18, 33, 34, 35, 36, 38, 40, 41, 43, 44], "simpler": 40, "simplest": 18, "simplif": [43, 44], "singl": [10, 39, 40], "singular": [5, 11, 34, 35], "size": [34, 35, 36], "sklearn": 16, "slightli": 36, "smarter": 40, "smoothi": [], "sneak": 36, "soft": 8, "softmax": [1, 41], "softwar": [27, 28, 33], "solut": [42, 43], "solv": [2, 35, 38, 39, 42, 43], "solver": 13, "some": [13, 26, 34, 35, 38, 40], "sound": [43, 44], "sourc": [], "specif": [42, 43], "specifi": [2, 42, 43], "speed": 36, "sphinx": [], "split": [0, 15, 34], "squar": [0, 5, 6, 10, 16, 19, 27, 33, 34, 35, 36], "stage": 
[43, 44], "standard": [13, 34, 37], "start": [20, 39, 43], "state": 0, "statist": [5, 6, 25, 30, 33, 37, 38], "steepest": [10, 13, 35], "step": [36, 37, 38], "stochast": [13, 27, 30, 36], "stop": 36, "strong": 44, "strongli": [33, 36], "structur": [], "studi": 39, "suggest": [33, 39], "sum": [37, 38, 40, 41], "summar": [43, 44], "summari": [28, 31, 33], "superposit": 3, "supervis": [1, 41], "support": 8, "svd": [5, 34, 35, 43], "synthet": [18, 38, 39], "systemat": [3, 44], "t": [34, 43, 44], "take": 16, "taken": [33, 36], "teach": 31, "teacher": [31, 33], "team": [], "technic": [34, 42, 43], "techniqu": [6, 11, 27], "technologi": 25, "tensorflow": [1, 3, 41, 42, 43, 44], "tent": [31, 33], "term": [37, 40, 41], "test": [0, 1, 15, 17, 28, 34, 41, 42], "texmath": [], "text": 33, "textbook": [32, 33], "than": 35, "thank": [], "theorem": [5, 8, 11, 12, 30, 37, 40], "theoret": 36, "theori": 30, "theta": [18, 37], "thi": [21, 22, 24, 33, 40], "three": [40, 41], "through": 40, "time": 36, "tip": [13, 36], "todo": [], "toeplitz": [43, 44], "togeth": [12, 40, 41], "tool": [27, 28, 33], "top": [1, 41, 44], "topic": 33, "toward": 11, "trade": [6, 19, 27], "tradeoff": [6, 37, 38], "train": [0, 1, 4, 15, 21, 22, 33, 34, 40, 41, 43, 44], "transform": [3, 43, 44], "translat": [], "tree": [9, 10], "trial": [42, 43], "tuesdai": [35, 39, 40, 41, 43], "tune": [1, 41], "two": [3, 8, 22, 25, 28, 38, 39, 40, 41, 43, 44], "type": [2, 4, 12, 33, 39, 40, 42, 43, 44], "uio": 33, "understand": [22, 37, 38], "univers": [12, 32, 40], "unsupervis": 14, "up": [0, 2, 9, 12, 15, 28, 33, 34, 35, 37, 38, 40, 41, 42, 43, 44], "updat": [27, 36, 40, 41, 42], "us": [0, 1, 2, 3, 7, 13, 16, 18, 19, 22, 25, 27, 28, 33, 34, 35, 36, 38, 39, 40, 41, 42, 43, 44], "usag": [36, 41, 42], "v": [3, 36, 43, 44], "valid": [6, 27, 37, 38], "valu": [5, 11, 19, 30, 34, 35, 37, 38], "vanish": 41, "vari": 36, "variabl": [30, 35], "varianc": [6, 19, 27, 37, 38], "variou": [0, 28, 37, 38], "vector": [8, 12, 16, 26, 33, 34, 39, 43, 44], "verifi": 44, "versu": 33, "video": [36, 37, 38, 39, 40, 41], "view": [0, 4, 10, 34, 40], "virtual": 15, "visual": [1, 9, 41, 44], "volum": [43, 44], "wai": [9, 27, 37, 38, 40, 43, 44], "warm": 28, "wave": [2, 42], "we": [33, 36, 40, 41, 42], "wednesdai": [35, 39, 40, 41, 43], "week": [15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44], "weekli": [], "weight": 41, "welcom": [], "well": [43, 44], "what": [0, 33, 34, 35, 37, 38, 43, 44], "when": 36, "which": [1, 36, 41, 42], "why": [33, 36, 37, 38, 39, 40, 42, 43, 44], "wisconsin": 7, "word": 40, "workflow": [], "wrap": 37, "write": [4, 11, 20, 22, 27, 28, 35, 41], "x": 34, "xgboost": 10, "xor": [39, 41, 42], "yaml": [], "yet": 35, "you": 28, "your": [0, 10, 16, 18, 27, 28, 34], "z_j": [40, 41]}}) \ No newline at end of file diff --git a/doc/LectureNotes/_build/html/statistics.html b/doc/LectureNotes/_build/html/statistics.html index 97a898137..d973b08be 100644 --- a/doc/LectureNotes/_build/html/statistics.html +++ b/doc/LectureNotes/_build/html/statistics.html @@ -230,10 +230,45 @@
  • Exercises week 36
  • Week 36: Linear Regression and Gradient descent
  • Exercises week 37
  • Week 37: Gradient descent methods
  • Exercises week 38
  • Week 38: Statistical analysis, bias-variance tradeoff and resampling methods
  • Exercises week 39
  • Week 39: Resampling methods and logistic regression
  • Week 40: Gradient descent methods (continued) and start Neural networks
  • Week 41 Neural networks and constructing a neural network code
  • Exercises week 41
  • Week 42 Constructing a Neural Network code with examples
  • Exercises week 42
  • Week 43: Deep Learning: Constructing a Neural Network code and solving differential equations
  • Exercises week 43
  • Week 44, Solving differential equations with neural networks and start Convolutional Neural Networks (CNN)
  • Exercises week 44
  • Week 45, Convolutional Neural Networks (CCNs)
  • Projects

    diff --git a/doc/LectureNotes/_build/html/teachers.html b/doc/LectureNotes/_build/html/teachers.html index 1f89a2222..85acc656f 100644 --- a/doc/LectureNotes/_build/html/teachers.html +++ b/doc/LectureNotes/_build/html/teachers.html @@ -228,10 +228,45 @@

    diff --git a/doc/LectureNotes/_build/html/textbooks.html b/doc/LectureNotes/_build/html/textbooks.html index b169538f8..565ff7ef8 100644 --- a/doc/LectureNotes/_build/html/textbooks.html +++ b/doc/LectureNotes/_build/html/textbooks.html @@ -228,10 +228,45 @@

    diff --git a/doc/LectureNotes/_build/html/week34.html b/doc/LectureNotes/_build/html/week34.html index 4b2a537ec..d218bca00 100644 --- a/doc/LectureNotes/_build/html/week34.html +++ b/doc/LectureNotes/_build/html/week34.html @@ -230,10 +230,45 @@

    diff --git a/doc/LectureNotes/_build/html/week35.html b/doc/LectureNotes/_build/html/week35.html index 4b070b2bc..6a2b2b830 100644 --- a/doc/LectureNotes/_build/html/week35.html +++ b/doc/LectureNotes/_build/html/week35.html @@ -230,10 +230,45 @@

    diff --git a/doc/LectureNotes/_build/html/week36.html b/doc/LectureNotes/_build/html/week36.html index ac976bbb6..0fe2ee2e3 100644 --- a/doc/LectureNotes/_build/html/week36.html +++ b/doc/LectureNotes/_build/html/week36.html @@ -230,10 +230,45 @@

    diff --git a/doc/LectureNotes/_build/html/week37.html b/doc/LectureNotes/_build/html/week37.html new file mode 100644 index 000000000..345a65c8a --- /dev/null +++ b/doc/LectureNotes/_build/html/week37.html @@ -0,0 +1,2927 @@

    Week 37: Gradient descent methods#

    +

    Morten Hjorth-Jensen, Department of Physics, University of Oslo, Norway

    +

    Date: September 8-12, 2025

    +
    +

    Plans for week 37, lecture Monday#

    +

    Plans and material for the lecture on Monday September 8.

    +

    The family of gradient descent methods

    +
      +
1. Plain gradient descent (constant learning rate), reminder from last week with examples using OLS and Ridge
2. Improving gradient descent with momentum
3. Introducing stochastic gradient descent
4. More advanced updates of the learning rate: ADAgrad, RMSprop and ADAM
5. Video of Lecture
6. Whiteboard notes
    +
    +

    Readings and Videos:#

    +
      +
1. Recommended: Goodfellow et al., Deep Learning, introduction to gradient descent, see sections 4.3-4.5 at https://www.deeplearningbook.org/contents/numerical.html and chapters 8.3-8.5 at https://www.deeplearningbook.org/contents/optimization.html
2. Raschka et al., pages 37-44 and pages 278-283, with focus on linear regression
3. Video on gradient descent at https://www.youtube.com/watch?v=sDv4f4s2SB8
4. Video on stochastic gradient descent at https://www.youtube.com/watch?v=vMh0zPT0tLI
    +
    +

    Material for lecture Monday September 8#

    +
    +
    +

    Gradient descent and revisiting Ordinary Least Squares from last week#

    +

Last week we started with linear regression as a case study for the gradient descent methods. Linear regression is a great test case for the gradient descent methods discussed in the lectures since it has several desirable properties such as:

1. An analytical solution (recall the homework sets for week 35).
2. The gradient can be computed analytically.
3. The cost function is convex, which guarantees that gradient descent converges for small enough learning rates.

    We revisit an example similar to what we had in the first homework set. We have a function of the type

    +
    +
    +
import numpy as np

m = 100   # number of data points
x = 2*np.random.rand(m,1)
y = 4+3*x+np.random.randn(m,1)
    +
    +
    +
    +
    +

with \(x_i \in [0,2]\) chosen randomly using a uniform distribution (the code above draws \(x\) as \(2u\) with \(u\) uniform on \([0,1]\)). Additionally we have a stochastic noise chosen according to the normal distribution \(\cal{N}(0,1)\). The linear regression model is given by

\[
h_\theta(x) = \tilde{y} = \theta_0 + \theta_1 x,
\]

such that

\[
\tilde{y}_i = \theta_0 + \theta_1 x_i.
\]
    +
    +
    +

    Gradient descent example#

    +

Let \(\mathbf{y} = (y_1,\cdots,y_n)^T\), \(\tilde{\mathbf{y}} = (\tilde{y}_1,\cdots,\tilde{y}_n)^T\) and \(\theta = (\theta_0, \theta_1)^T\), where \(\tilde{y}_i\) denotes the model prediction for data point \(i\).

It is convenient to write \(\tilde{\mathbf{y}} = X\theta\), where \(X \in \mathbb{R}^{100 \times 2}\) is the design matrix given by (we keep the intercept here)

\[\begin{split}
X \equiv \begin{bmatrix}
1 & x_1 \\
\vdots & \vdots \\
1 & x_{100} \\
\end{bmatrix}.
\end{split}\]
    +

    The cost/loss/risk function is given by

    +
\[
C(\theta) = \frac{1}{n}||X\theta-\mathbf{y}||_{2}^{2} = \frac{1}{n}\sum_{i=1}^{100}\left[ (\theta_0 + \theta_1 x_i)^2 - 2 y_i (\theta_0 + \theta_1 x_i) + y_i^2\right]
\]
    +

    and we want to find \(\theta\) such that \(C(\theta)\) is minimized.

    +
    +
    +

    The derivative of the cost/loss function#

    +

    Computing \(\partial C(\theta) / \partial \theta_0\) and \(\partial C(\theta) / \partial \theta_1\) we can show that the gradient can be written as

    +
\[\begin{split}
\nabla_{\theta} C(\theta) = \frac{2}{n}\begin{bmatrix} \sum_{i=1}^{100} \left(\theta_0+\theta_1x_i-y_i\right) \\
\sum_{i=1}^{100}\left( x_i (\theta_0+\theta_1x_i)-y_ix_i\right) \\
\end{bmatrix} = \frac{2}{n}X^T(X\theta - \mathbf{y}),
\end{split}\]
    +

    where \(X\) is the design matrix defined above.
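As a quick sanity check (not part of the lecture code), the analytical gradient above can be compared against a central finite-difference approximation. The sketch below assumes the same synthetic data as in the example above.

import numpy as np

n = 100
x = 2*np.random.rand(n,1)
y = 4+3*x+np.random.randn(n,1)
X = np.c_[np.ones((n,1)), x]
theta = np.random.randn(2,1)

def C(theta):
    # cost function C(theta) = (1/n)||X theta - y||^2
    return (1.0/n)*np.sum((X @ theta - y)**2)

# analytical gradient from the expression above
grad_analytic = (2.0/n)*X.T @ (X @ theta - y)

# central finite-difference approximation as an independent check
eps = 1.0e-6
grad_numeric = np.zeros_like(theta)
for j in range(theta.shape[0]):
    e = np.zeros_like(theta)
    e[j] = eps
    grad_numeric[j] = (C(theta + e) - C(theta - e))/(2*eps)

print(np.max(np.abs(grad_analytic - grad_numeric)))  # should be close to zero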

    +
    +
    +

    The Hessian matrix#

    +

    The Hessian matrix of \(C(\theta)\) is given by

    +
\[\begin{split}
\boldsymbol{H} \equiv \begin{bmatrix}
\frac{\partial^2 C(\theta)}{\partial \theta_0^2} & \frac{\partial^2 C(\theta)}{\partial \theta_0 \partial \theta_1} \\
\frac{\partial^2 C(\theta)}{\partial \theta_0 \partial \theta_1} & \frac{\partial^2 C(\theta)}{\partial \theta_1^2} \\
\end{bmatrix} = \frac{2}{n}X^T X.
\end{split}\]

This result implies that \(C(\theta)\) is a convex function, since the matrix \(X^T X\) is always positive semi-definite.

    +
    +
    +

    Simple program#

    +

    We can now write a program that minimizes \(C(\theta)\) using the gradient descent method with a constant learning rate \(\eta\) according to

    +
\[
\theta_{k+1} = \theta_k - \eta \nabla_\theta C(\theta_k), \ k=0,1,\cdots
\]

We can use the expression we computed for the gradient, let the initial guess for \(\theta\) be chosen randomly, and set for example \(\eta = 0.001\). We stop iterating when \(||\nabla_\theta C(\theta_k) || \leq \epsilon = 10^{-8}\). Note that the code below instead runs a fixed number of iterations and sets the learning rate from the largest eigenvalue of the Hessian, \(\eta = 1/\lambda_{\mathrm{max}}\); a variant with the gradient-norm stopping criterion is sketched after the code.

And finally we can compare our solution for \(\theta\) with the analytic result given by \(\theta = (X^TX)^{-1} X^T \mathbf{y}\).

    +
    +
    +

    Gradient Descent Example#

    +

    Here our simple example

    +
    +
    +
    %matplotlib inline
    +
    +
    +# Importing various packages
    +from random import random, seed
    +import numpy as np
    +import matplotlib.pyplot as plt
    +from mpl_toolkits.mplot3d import Axes3D
    +from matplotlib import cm
    +from matplotlib.ticker import LinearLocator, FormatStrFormatter
    +import sys
    +
    +# the number of datapoints
    +n = 100
    +x = 2*np.random.rand(n,1)
    +y = 4+3*x+np.random.randn(n,1)
    +
    +X = np.c_[np.ones((n,1)), x]
    +# Hessian matrix
    +H = (2.0/n)* X.T @ X
    +# Get the eigenvalues
    +EigValues, EigVectors = np.linalg.eig(H)
    +print(f"Eigenvalues of Hessian Matrix:{EigValues}")
    +
    +theta_linreg = np.linalg.inv(X.T @ X) @ X.T @ y
    +print(theta_linreg)
    +theta = np.random.randn(2,1)
    +
    +eta = 1.0/np.max(EigValues)
    +Niterations = 1000
    +
    +for iter in range(Niterations):
    +    gradient = (2.0/n)*X.T @ (X @ theta-y)
    +    theta -= eta*gradient
    +
    +print(theta)
    +xnew = np.array([[0],[2]])
    +xbnew = np.c_[np.ones((2,1)), xnew]
    +ypredict = xbnew.dot(theta)
    +ypredict2 = xbnew.dot(theta_linreg)
    +plt.plot(xnew, ypredict, "r-")
    +plt.plot(xnew, ypredict2, "b-")
    +plt.plot(x, y ,'ro')
    +plt.axis([0,2.0,0, 15.0])
    +plt.xlabel(r'$x$')
    +plt.ylabel(r'$y$')
    +plt.title(r'Gradient descent example')
    +plt.show()
    +
    +
    +
    +
    +
    +
    +
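The code above runs for a fixed number of iterations. A minimal sketch of how the gradient-norm stopping criterion mentioned earlier could be added (the maximum number of iterations is an assumed safeguard, not specified in the text) is:

import numpy as np

n = 100
x = 2*np.random.rand(n,1)
y = 4+3*x+np.random.randn(n,1)
X = np.c_[np.ones((n,1)), x]

theta = np.random.randn(2,1)
eta = 0.001              # constant learning rate, as in the text
epsilon = 1.0e-8         # tolerance on the gradient norm
max_iterations = 1000000 # assumed safeguard against non-convergence

for k in range(max_iterations):
    gradient = (2.0/n)*X.T @ (X @ theta - y)
    if np.linalg.norm(gradient) <= epsilon:
        print(f"Converged after {k} iterations")
        break
    theta -= eta*gradient

print(theta)
print(np.linalg.inv(X.T @ X) @ X.T @ y)   # analytical solution for comparison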

    Gradient descent and Ridge#

    +

    We have also discussed Ridge regression where the loss function contains a regularized term given by the \(L_2\) norm of \(\theta\),

    +
\[
C_{\text{ridge}}(\theta) = \frac{1}{n}||X\theta -\mathbf{y}||^2 + \lambda ||\theta||^2, \ \lambda \geq 0.
\]

In order to minimize \(C_{\text{ridge}}(\theta)\) using GD we adjust the gradient as follows

\[\begin{split}
\nabla_\theta C_{\text{ridge}}(\theta) = \frac{2}{n}\begin{bmatrix} \sum_{i=1}^{100} \left(\theta_0+\theta_1x_i-y_i\right) \\
\sum_{i=1}^{100}\left( x_i (\theta_0+\theta_1x_i)-y_ix_i\right) \\
\end{bmatrix} + 2\lambda\begin{bmatrix} \theta_0 \\ \theta_1\end{bmatrix} = 2\left(\frac{1}{n}X^T(X\theta - \mathbf{y})+\lambda \theta\right).
\end{split}\]

We can easily extend our program to minimize \(C_{\text{ridge}}(\theta)\) using gradient descent and compare with the analytical solution given by

\[
\theta_{\text{ridge}} = \left(X^T X + n\lambda I_{2 \times 2} \right)^{-1} X^T \mathbf{y}.
\]
    +
    +
    +

    The Hessian matrix for Ridge Regression#

    +

    The Hessian matrix of Ridge Regression for our simple example is given by

    +
\[\begin{split}
\boldsymbol{H} \equiv \begin{bmatrix}
\frac{\partial^2 C(\theta)}{\partial \theta_0^2} & \frac{\partial^2 C(\theta)}{\partial \theta_0 \partial \theta_1} \\
\frac{\partial^2 C(\theta)}{\partial \theta_0 \partial \theta_1} & \frac{\partial^2 C(\theta)}{\partial \theta_1^2} \\
\end{bmatrix} = \frac{2}{n}X^T X+2\lambda\boldsymbol{I}.
\end{split}\]

This implies that for \(\lambda > 0\) the Hessian matrix is positive definite, hence the stationary point is a minimum. Note that the Ridge cost function is convex, being a sum of two convex functions. Therefore, the stationary point is a global minimum of this function.

    +
    +
    +

    Program example for gradient descent with Ridge Regression#

    +
    +
    +
    from random import random, seed
    +import numpy as np
    +import matplotlib.pyplot as plt
    +from mpl_toolkits.mplot3d import Axes3D
    +from matplotlib import cm
    +from matplotlib.ticker import LinearLocator, FormatStrFormatter
    +import sys
    +
    +# the number of datapoints
    +n = 100
    +x = 2*np.random.rand(n,1)
    +y = 4+3*x+np.random.randn(n,1)
    +
    +X = np.c_[np.ones((n,1)), x]
    +XT_X = X.T @ X
    +
    +#Ridge parameter lambda
    +lmbda  = 0.001
    +Id = n*lmbda* np.eye(XT_X.shape[0])
    +
    +# Hessian matrix
    +H = (2.0/n)* XT_X+2*lmbda* np.eye(XT_X.shape[0])
    +# Get the eigenvalues
    +EigValues, EigVectors = np.linalg.eig(H)
    +print(f"Eigenvalues of Hessian Matrix:{EigValues}")
    +
    +
    +theta_linreg = np.linalg.inv(XT_X+Id) @ X.T @ y
    +print(theta_linreg)
    +# Start plain gradient descent
    +theta = np.random.randn(2,1)
    +
    +eta = 1.0/np.max(EigValues)
    +Niterations = 100
    +
    +for iter in range(Niterations):
    +    gradients = 2.0/n*X.T @ (X @ (theta)-y)+2*lmbda*theta
    +    theta -= eta*gradients
    +
    +print(theta)
    +ypredict = X @ theta
    +ypredict2 = X @ theta_linreg
    +plt.plot(x, ypredict, "r-")
    +plt.plot(x, ypredict2, "b-")
    +plt.plot(x, y ,'ro')
    +plt.axis([0,2.0,0, 15.0])
    +plt.xlabel(r'$x$')
    +plt.ylabel(r'$y$')
    +plt.title(r'Gradient descent example for Ridge')
    +plt.show()
    +
    +
    +
    +
    +
    +
    +

    Using gradient descent methods, limitations#

    +
      +
• Gradient descent (GD) finds local minima of our function. Since the GD algorithm is deterministic, if it converges, it will converge to a local minimum of our cost/loss/risk function. Because in ML we are often dealing with extremely rugged landscapes with many local minima, this can lead to poor performance.
• GD is sensitive to initial conditions. One consequence of the local nature of GD is that initial conditions matter. Depending on where one starts, one will end up at a different local minimum. Therefore, it is very important to think about how one initializes the training process. This is true for GD as well as more complicated variants of GD.
• Gradients are computationally expensive to calculate for large datasets. In many cases in statistics and ML, the cost/loss/risk function is a sum of terms, with one term for each data point. For example, in linear regression, \(E \propto \sum_{i=1}^n (y_i - \mathbf{w}^T\cdot\mathbf{x}_i)^2\); for logistic regression, the square error is replaced by the cross entropy. To calculate the gradient we have to sum over all \(n\) data points. Doing this at every GD step becomes extremely computationally expensive. An ingenious solution to this is to calculate the gradients using small subsets of the data called “mini batches”. This has the added benefit of introducing stochasticity into our algorithm.
• GD is very sensitive to choices of learning rates. GD is extremely sensitive to the choice of learning rates. If the learning rate is very small, the training process takes an extremely long time. For larger learning rates, GD can diverge and give poor results. Furthermore, depending on what the local landscape looks like, we have to modify the learning rates to ensure convergence. Ideally, we would adaptively choose the learning rates to match the landscape.
• GD treats all directions in parameter space uniformly. Another major drawback of GD is that unlike Newton’s method, the learning rate for GD is the same in all directions in parameter space. For this reason, the maximum learning rate is set by the behavior of the steepest direction and this can significantly slow down training. Ideally, we would like to take large steps in flat directions and small steps in steep directions. Since we are exploring rugged landscapes where curvatures change, this requires us to keep track of not only the gradient but also the second derivatives. The ideal scenario would be to calculate the Hessian, but this proves to be too computationally expensive.
• GD can take exponential time to escape saddle points, even with random initialization. As we mentioned, GD is extremely sensitive to initial conditions since they determine the particular local minimum GD would eventually reach. However, even with a good initialization scheme and the introduction of randomness, GD can still take exponential time to escape saddle points.
    +
    +
    +

    Momentum based GD#

    +

We discuss here some simple examples where we introduce what is called ‘memory’ about previous steps, or what is normally called momentum gradient descent. For the mathematical details, see the whiteboard notes from the lecture on September 8, 2025.
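As a compact reference (a standard formulation, one common convention among several), momentum gradient descent introduces a velocity-like variable \(v\) with a momentum parameter \(0 \le \gamma < 1\),

\[\begin{split}
v_{k+1} = \gamma v_k + \eta \nabla_\theta C(\theta_k), \\
\theta_{k+1} = \theta_k - v_{k+1},
\end{split}\]

so that \(\gamma = 0\) recovers plain gradient descent. In the second code example below, the variable change plays the role of \(v\) and momentum the role of \(\gamma\).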

    +
    +
    +

    Improving gradient descent with momentum#

    +
    +
    +
    from numpy import asarray
    +from numpy import arange
    +from numpy.random import rand
    +from numpy.random import seed
    +from matplotlib import pyplot
    + 
    +# objective function
    +def objective(x):
    +	return x**2.0
    + 
    +# derivative of objective function
    +def derivative(x):
    +	return x * 2.0
    + 
    +# gradient descent algorithm
    +def gradient_descent(objective, derivative, bounds, n_iter, step_size):
    +	# track all solutions
    +	solutions, scores = list(), list()
    +	# generate an initial point
    +	solution = bounds[:, 0] + rand(len(bounds)) * (bounds[:, 1] - bounds[:, 0])
    +	# run the gradient descent
    +	for i in range(n_iter):
    +		# calculate gradient
    +		gradient = derivative(solution)
    +		# take a step
    +		solution = solution - step_size * gradient
    +		# evaluate candidate point
    +		solution_eval = objective(solution)
    +		# store solution
    +		solutions.append(solution)
    +		scores.append(solution_eval)
    +		# report progress
    +		print('>%d f(%s) = %.5f' % (i, solution, solution_eval))
    +	return [solutions, scores]
    + 
    +# seed the pseudo random number generator
    +seed(4)
    +# define range for input
    +bounds = asarray([[-1.0, 1.0]])
    +# define the total iterations
    +n_iter = 30
    +# define the step size
    +step_size = 0.1
    +# perform the gradient descent search
    +solutions, scores = gradient_descent(objective, derivative, bounds, n_iter, step_size)
    +# sample input range uniformly at 0.1 increments
    +inputs = arange(bounds[0,0], bounds[0,1]+0.1, 0.1)
    +# compute targets
    +results = objective(inputs)
    +# create a line plot of input vs result
    +pyplot.plot(inputs, results)
    +# plot the solutions found
    +pyplot.plot(solutions, scores, '.-', color='red')
    +# show the plot
    +pyplot.show()
    +
    +
    +
    +
    +
    +
    +

    Same code but now with momentum gradient descent#

    +
    +
    +
    from numpy import asarray
    +from numpy import arange
    +from numpy.random import rand
    +from numpy.random import seed
    +from matplotlib import pyplot
    + 
    +# objective function
    +def objective(x):
    +	return x**2.0
    + 
    +# derivative of objective function
    +def derivative(x):
    +	return x * 2.0
    + 
    +# gradient descent algorithm
    +def gradient_descent(objective, derivative, bounds, n_iter, step_size, momentum):
    +	# track all solutions
    +	solutions, scores = list(), list()
    +	# generate an initial point
    +	solution = bounds[:, 0] + rand(len(bounds)) * (bounds[:, 1] - bounds[:, 0])
    +	# keep track of the change
    +	change = 0.0
    +	# run the gradient descent
    +	for i in range(n_iter):
    +		# calculate gradient
    +		gradient = derivative(solution)
    +		# calculate update
    +		new_change = step_size * gradient + momentum * change
    +		# take a step
    +		solution = solution - new_change
    +		# save the change
    +		change = new_change
    +		# evaluate candidate point
    +		solution_eval = objective(solution)
    +		# store solution
    +		solutions.append(solution)
    +		scores.append(solution_eval)
    +		# report progress
    +		print('>%d f(%s) = %.5f' % (i, solution, solution_eval))
    +	return [solutions, scores]
    + 
    +# seed the pseudo random number generator
    +seed(4)
    +# define range for input
    +bounds = asarray([[-1.0, 1.0]])
    +# define the total iterations
    +n_iter = 30
    +# define the step size
    +step_size = 0.1
    +# define momentum
    +momentum = 0.3
    +# perform the gradient descent search with momentum
    +solutions, scores = gradient_descent(objective, derivative, bounds, n_iter, step_size, momentum)
    +# sample input range uniformly at 0.1 increments
    +inputs = arange(bounds[0,0], bounds[0,1]+0.1, 0.1)
    +# compute targets
    +results = objective(inputs)
    +# create a line plot of input vs result
    +pyplot.plot(inputs, results)
    +# plot the solutions found
    +pyplot.plot(solutions, scores, '.-', color='red')
    +# show the plot
    +pyplot.show()
    +
    +
    +
    +
    +
    +
    +

    Overview video on Stochastic Gradient Descent (SGD)#

    +

What is stochastic gradient descent? There are several reasons for using stochastic gradient descent. Some of these are:

    +
      +
1. Efficiency: updates the weights more frequently, using a single sample or a small batch of samples, which speeds up convergence.
2. Local minima: the stochasticity of the updates can hopefully help us avoid getting stuck in poor local minima.
3. Memory usage: requires less memory compared to computing gradients for the entire dataset.
    +
    +
    +

    Batches and mini-batches#

    +

    In gradient descent we compute the cost function and its gradient for all data points we have.

    +

In large-scale applications such as the ILSVRC challenge, the training data can have on the order of millions of examples. Hence, it seems wasteful to compute the full cost function over the entire training set in order to perform only a single parameter update. A very common approach to addressing this challenge is to compute the gradient over batches of the training data. For example, a typical batch could contain some thousand examples from an entire training set of several millions. This batch is then used to perform a parameter update.

    +
    +
    +

    Pros and cons#

    +
      +
1. Speed: SGD is faster than gradient descent because it uses only one training example per iteration, whereas gradient descent requires the entire dataset. This speed advantage becomes more significant as the size of the dataset increases.

2. Convergence: Gradient descent has a more predictable convergence behaviour because it uses the average gradient of the entire dataset. In contrast, SGD’s convergence behaviour can be more erratic due to its random sampling of individual training examples.

3. Memory: Gradient descent requires more memory than SGD because it must store the entire dataset for each iteration. SGD only needs to store the current training example, making it more memory-efficient.
    +
    +
    +

    Convergence rates#

    +
      +
1. Stochastic gradient descent is much cheaper per iteration, since each iteration uses only a single training example (or a small mini-batch); this often translates into faster progress in wall-clock time.

2. Gradient descent is slower per iteration, as it uses the entire dataset for each iteration, although it typically needs fewer iterations to reach a given accuracy.
    +
    +
    +

    Accuracy#

    +

In general, stochastic gradient descent is less accurate than gradient descent, as it calculates the gradient on single examples, which may not accurately represent the overall dataset. Gradient descent is more accurate because it uses the average gradient calculated over the entire dataset.

    +

    There are other disadvantages to using SGD. The main drawback is that +its convergence behaviour can be more erratic due to the random +sampling of individual training examples. This can lead to less +accurate results, as the algorithm may not converge to the true +minimum of the cost function. Additionally, the learning rate, which +determines the step size of each update to the model’s parameters, +must be carefully chosen to ensure convergence.

    +

It is however the method of choice in deep learning algorithms, where SGD is often used in combination with other optimization techniques, such as momentum or adaptive learning rates.

    +
    +
    +

    Stochastic Gradient Descent (SGD)#

    +

In stochastic gradient descent, the extreme case is the one where each mini-batch contains only a single data point.

    +

    This process is called Stochastic Gradient +Descent (SGD) (or also sometimes on-line gradient descent). This is +relatively less common to see because in practice due to vectorized +code optimizations it can be computationally much more efficient to +evaluate the gradient for 100 examples, than the gradient for one +example 100 times. Even though SGD technically refers to using a +single example at a time to evaluate the gradient, you will hear +people use the term SGD even when referring to mini-batch gradient +descent (i.e. mentions of MGD for “Minibatch Gradient Descent”, or BGD +for “Batch gradient descent” are rare to see), where it is usually +assumed that mini-batches are used. The size of the mini-batch is a +hyperparameter but it is not very common to cross-validate or bootstrap it. It is +usually based on memory constraints (if any), or set to some value, +e.g. 32, 64 or 128. We use powers of 2 in practice because many +vectorized operation implementations work faster when their inputs are +sized in powers of 2.

    +

    In our notes with SGD we mean stochastic gradient descent with mini-batches.

    +
    +
    +

    Stochastic Gradient Descent#

    +

    Stochastic gradient descent (SGD) and variants thereof address some of +the shortcomings of the Gradient descent method discussed above.

    +

    The underlying idea of SGD comes from the observation that the cost +function, which we want to minimize, can almost always be written as a +sum over \(n\) data points \(\{\mathbf{x}_i\}_{i=1}^n\),

    +
    +\[ +C(\mathbf{\theta}) = \sum_{i=1}^n c_i(\mathbf{x}_i, +\mathbf{\theta}). +\]
    +
    +
    +

    Computation of gradients#

    +

    This in turn means that the gradient can be +computed as a sum over \(i\)-gradients

    +
    +\[ +\nabla_\theta C(\mathbf{\theta}) = \sum_i^n \nabla_\theta c_i(\mathbf{x}_i, +\mathbf{\theta}). +\]
    +

    Stochasticity/randomness is introduced by only taking the +gradient on a subset of the data called minibatches. If there are \(n\) +data points and the size of each minibatch is \(M\), there will be \(n/M\) +minibatches. We denote these minibatches by \(B_k\) where +\(k=1,\cdots,n/M\).

    +
    +
    +

    SGD example#

    +

As an example, suppose we have \(10\) data points \((\mathbf{x}_1,\cdots, \mathbf{x}_{10})\) and we choose mini-batches of size \(M=2\). There are then \(n/M=5\) mini-batches, each containing two data points. In particular we have \(B_1 = (\mathbf{x}_1,\mathbf{x}_2), \cdots, B_5 = (\mathbf{x}_9,\mathbf{x}_{10})\). Note that if you choose \(M=n\) you have only a single batch with all data points, and on the other extreme, you may choose \(M=1\), resulting in a minibatch for each datapoint, i.e. \(B_k = \mathbf{x}_k\).
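To make the splitting concrete, here is a minimal sketch in NumPy (the variable names are only illustrative): we shuffle the data indices and split them into \(n/M\) mini-batches.

import numpy as np

n, M = 10, 2                    # 10 data points, mini-batches of size M = 2
x = np.arange(n)                # stand-in for the data points x_1, ..., x_10

indices = np.random.permutation(n)        # shuffle the data indices
minibatches = np.split(indices, n // M)   # n/M = 5 mini-batches B_1, ..., B_5

for k, batch in enumerate(minibatches, start=1):
    print(f"B_{k}: data points {x[batch]}")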

    +

The idea is now to approximate the gradient by replacing the sum over all data points with a sum over the data points in one of the minibatches, picked at random in each gradient descent step

    +
+\[ \nabla_{\theta} C(\mathbf{\theta}) = \sum_{i=1}^n \nabla_\theta c_i(\mathbf{x}_i, \mathbf{\theta}) \rightarrow \sum_{i \in B_k} \nabla_\theta c_i(\mathbf{x}_i, \mathbf{\theta}). \]
    +
    +
    +

    The gradient step#

    +

    Thus a gradient descent step now looks like

    +
+\[ \theta_{j+1} = \theta_j - \eta_j \sum_{i \in B_k} \nabla_\theta c_i(\mathbf{x}_i, \mathbf{\theta}) \]
    +

where \(k\) is picked at random with equal probability from \([1,n/M]\). An iteration over the number of mini-batches (\(n/M\)) is commonly referred to as an epoch. Thus it is typical to choose a number of epochs and for each epoch iterate over the number of mini-batches, as exemplified in the code below.

    +
    +
    +

    Simple example code#

    +
    +
    +
    import numpy as np 
    +
    +n = 100 #100 datapoints 
    +M = 5   #size of each minibatch
    +m = int(n/M) #number of minibatches
    +n_epochs = 10 #number of epochs
    +
    +j = 0
    +for epoch in range(1,n_epochs+1):
    +    for i in range(m):
    +        k = np.random.randint(m) #Pick the k-th minibatch at random
    +        #Compute the gradient using the data in minibatch Bk
+        #Compute new suggestion for theta
    +        j += 1
    +
    +
    +
    +
    +

Taking the gradient only on a subset of the data has two important benefits. First, it introduces randomness which decreases the chance that our optimization scheme gets stuck in a local minimum. Second, if the size of the minibatches is small relative to the number of datapoints (\(M < n\)), the computation of the gradient is much cheaper since we sum over the datapoints in the \(k\)-th minibatch and not all \(n\) datapoints.

    +
    +
    +

    When do we stop?#

    +

A natural question is when do we stop the search for a new minimum? One possibility is to compute the full gradient after a given number of epochs and check if the norm of the gradient is smaller than some threshold, and stop if it is. However, the condition that the gradient is zero is valid also for local minima, so this would only tell us that we are close to a local/global minimum. Alternatively, we could also evaluate the cost function at this point, store the result and continue the search. If the test kicks in at a later stage we can compare the values of the cost function and keep the \(\theta\) that gave the lowest value.
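A minimal sketch of such a stopping test, on a toy quadratic cost (the cost and gradient functions here are only stand-ins for your actual problem):

import numpy as np

# Toy cost C(theta) = theta^2 with gradient 2*theta (stand-ins for the real problem)
def cost(theta):
    return theta**2

def full_gradient(theta):
    return 2.0*theta

theta, eta = 1.0, 0.1
n_epochs, check_every, tol = 200, 10, 1e-6
best_theta, best_cost = theta, cost(theta)

for epoch in range(1, n_epochs+1):
    theta -= eta*full_gradient(theta)        # stand-in for the mini-batch updates of one epoch
    if epoch % check_every == 0:
        if cost(theta) < best_cost:          # store the best theta seen so far
            best_theta, best_cost = theta, cost(theta)
        if abs(full_gradient(theta)) < tol:  # gradient norm below threshold: stop
            print(f"Stopping at epoch {epoch}, best theta = {best_theta:.2e}")
            break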

    +
    +
    +

    Slightly different approach#

    +

Another approach is to let the step length \(\eta_j\) depend on the number of epochs in such a way that it becomes very small after a reasonable time, so that we barely move at all. Such approaches are also called scaling or learning rate scheduling. There are many ways to scale the learning rate; see for example https://towardsdatascience.com/learning-rate-schedules-and-adaptive-learning-rate-methods-for-deep-learning-2c8f433990d1 for a discussion of different scaling functions for the learning rate.

    +
    +
    +

    Time decay rate#

    +

As an example, let \(e = 0,1,2,3,\cdots\) denote the current epoch and let \(t_0, t_1 > 0\) be two fixed numbers. Furthermore, let \(t = e \cdot m + i\) where \(m\) is the number of minibatches and \(i=0,\cdots,m-1\). Then the function \(\eta_j(t; t_0, t_1) = \frac{t_0}{t+t_1}\) goes to zero as the number of epochs gets large. That is, we start with a step length \(\eta_j(0; t_0, t_1) = t_0/t_1\) which decays in time \(t\).

    +

    In this way we can fix the number of epochs, compute \(\theta\) and +evaluate the cost function at the end. Repeating the computation will +give a different result since the scheme is random by design. Then we +pick the final \(\theta\) that gives the lowest value of the cost +function.

    +
    +
    +
    import numpy as np 
    +
    +def step_length(t,t0,t1):
    +    return t0/(t+t1)
    +
    +n = 100 #100 datapoints 
    +M = 5   #size of each minibatch
    +m = int(n/M) #number of minibatches
    +n_epochs = 500 #number of epochs
    +t0 = 1.0
    +t1 = 10
    +
    +eta_j = t0/t1
    +j = 0
    +for epoch in range(1,n_epochs+1):
    +    for i in range(m):
    +        k = np.random.randint(m) #Pick the k-th minibatch at random
    +        #Compute the gradient using the data in minibatch Bk
    +        #Compute new suggestion for theta
    +        t = epoch*m+i
    +        eta_j = step_length(t,t0,t1)
    +        j += 1
    +
    +print("eta_j after %d epochs: %g" % (n_epochs,eta_j))
    +
    +
    +
    +
    +
    +
    +

    Code with a Number of Minibatches which varies#

    +

In the code here we use mini-batches together with a learning rate that varies with the iteration number; the size of each mini-batch \(M\), and thereby the number of mini-batches \(m=n/M\), can easily be changed.

    +
    +
    +
    # Importing various packages
    +from math import exp, sqrt
    +from random import random, seed
    +import numpy as np
    +import matplotlib.pyplot as plt
    +
    +n = 100
    +x = 2*np.random.rand(n,1)
    +y = 4+3*x+np.random.randn(n,1)
    +
    +X = np.c_[np.ones((n,1)), x]
    +XT_X = X.T @ X
    +theta_linreg = np.linalg.inv(X.T @ X) @ (X.T @ y)
    +print("Own inversion")
    +print(theta_linreg)
    +# Hessian matrix
    +H = (2.0/n)* XT_X
    +EigValues, EigVectors = np.linalg.eig(H)
    +print(f"Eigenvalues of Hessian Matrix:{EigValues}")
    +
    +theta = np.random.randn(2,1)
    +eta = 1.0/np.max(EigValues)
    +Niterations = 1000
    +
    +
    +for iter in range(Niterations):
    +    gradients = 2.0/n*X.T @ ((X @ theta)-y)
    +    theta -= eta*gradients
    +print("theta from own gd")
    +print(theta)
    +
    +xnew = np.array([[0],[2]])
    +Xnew = np.c_[np.ones((2,1)), xnew]
    +ypredict = Xnew.dot(theta)
    +ypredict2 = Xnew.dot(theta_linreg)
    +
    +n_epochs = 50
    +M = 5   #size of each minibatch
    +m = int(n/M) #number of minibatches
    +t0, t1 = 5, 50
    +
    +def learning_schedule(t):
    +    return t0/(t+t1)
    +
    +theta = np.random.randn(2,1)
    +
    +for epoch in range(n_epochs):
    +# Can you figure out a better way of setting up the contributions to each batch?
    +    for i in range(m):
    +        random_index = M*np.random.randint(m)
    +        xi = X[random_index:random_index+M]
    +        yi = y[random_index:random_index+M]
    +        gradients = (2.0/M)* xi.T @ ((xi @ theta)-yi)
    +        eta = learning_schedule(epoch*m+i)
    +        theta = theta - eta*gradients
    +print("theta from own sdg")
    +print(theta)
    +
    +plt.plot(xnew, ypredict, "r-")
    +plt.plot(xnew, ypredict2, "b-")
    +plt.plot(x, y ,'ro')
    +plt.axis([0,2.0,0, 15.0])
    +plt.xlabel(r'$x$')
    +plt.ylabel(r'$y$')
    +plt.title(r'Random numbers ')
    +plt.show()
    +
    +
    +
    +
    +
    +
    +

    Replace or not#

    +

In the above code, we have used sampling with replacement when setting up the mini-batches. The discussion here may be useful.
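As one possible answer to the question posed in the code comment above, here is a minimal sketch of mini-batches without replacement: the indices are reshuffled once per epoch, so that every data point is used exactly once per epoch. The setup mirrors the code above.

import numpy as np

n = 100
x = 2*np.random.rand(n,1)
y = 4+3*x+np.random.randn(n,1)
X = np.c_[np.ones((n,1)), x]

n_epochs = 50
M = 5                 # size of each minibatch
m = int(n/M)          # number of minibatches
t0, t1 = 5, 50
def learning_schedule(t):
    return t0/(t+t1)

theta = np.random.randn(2,1)
for epoch in range(n_epochs):
    indices = np.random.permutation(n)      # new shuffle every epoch
    for i in range(m):
        batch = indices[i*M:(i+1)*M]        # each data point appears exactly once per epoch
        xi, yi = X[batch], y[batch]
        gradients = (2.0/M)* xi.T @ ((xi @ theta)-yi)
        eta = learning_schedule(epoch*m+i)
        theta = theta - eta*gradients
print("theta from own sgd without replacement")
print(theta)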

    +
    +
    +

    SGD vs Full-Batch GD: Convergence Speed and Memory Comparison#

    +
    +

    Theoretical Convergence Speed and convex optimization#

    +

    Consider minimizing an empirical cost function

    +
    +\[ +C(\theta) =\frac{1}{N}\sum_{i=1}^N l_i(\theta), +\]
    +

    where each \(l_i(\theta)\) is a +differentiable loss term. Gradient Descent (GD) updates parameters +using the full gradient \(\nabla C(\theta)\), while Stochastic Gradient +Descent (SGD) uses a single sample (or mini-batch) gradient \(\nabla +l_i(\theta)\) selected at random. In equation form, one GD step is:

    +
    +\[ +\theta_{t+1} = \theta_t-\eta \nabla C(\theta_t) =\theta_t -\eta \frac{1}{N}\sum_{i=1}^N \nabla l_i(\theta_t), +\]
    +

    whereas one SGD step is:

    +
    +\[ +\theta_{t+1} = \theta_t -\eta \nabla l_{i_t}(\theta_t), +\]
    +

with \(i_t\) randomly chosen. On smooth convex problems, GD and SGD both converge to the global minimum, but their rates differ. GD can take larger, more stable steps since it uses the exact gradient, achieving an error that decreases on the order of \(O(1/t)\) per iteration for convex objectives (and even exponentially fast for strongly convex cases). In contrast, plain SGD has more variance in each step, leading to sublinear convergence in expectation – typically \(O(1/\sqrt{t})\) for general convex objectives (with appropriate diminishing step sizes). Intuitively, GD’s trajectory is smoother and more predictable, while SGD’s path oscillates due to noise but costs far less per iteration, enabling many more updates in the same time.

    +
    +
    +

    Strongly Convex Case#

    +

    If \(C(\theta)\) is strongly convex and \(L\)-smooth (so GD enjoys linear +convergence), the gap \(C(\theta_t)-C(\theta^*)\) for GD shrinks as

    +
    +\[ +C(\theta_t) - C(\theta^* ) \le \Big(1 - \frac{\mu}{L}\Big)^t [C(\theta_0)-C(\theta^*)], +\]
    +

a geometric (linear) convergence per iteration. Achieving an \(\epsilon\)-accurate solution thus takes on the order of \(\log(1/\epsilon)\) iterations for GD. However, each GD iteration costs \(O(N)\) gradient evaluations. SGD cannot exploit strong convexity to obtain a linear rate – instead, with a properly decaying step size (e.g. \(\eta_t = \frac{1}{\mu t}\)) or iterate averaging, SGD attains an \(O(1/t)\) convergence rate in expectation. For example, one result of Moulines and Bach (2011), see https://papers.nips.cc/paper_files/paper/2011/hash/40008b9a5380fcacce3976bf7c08af5b-Abstract.html, shows that with \(\eta_t = \Theta(1/t)\),

    +
    +\[ +\mathbb{E}[C(\theta_t) - C(\theta^*)] = O(1/t), +\]
    +

for strongly convex, smooth \(C\). This \(1/t\) rate is slower per iteration than GD’s exponential decay, but each SGD iteration is \(N\) times cheaper. In fact, to reach error \(\epsilon\), plain SGD needs on the order of \(T=O(1/\epsilon)\) iterations (sub-linear convergence), while GD needs \(O(\log(1/\epsilon))\) iterations. When accounting for cost-per-iteration, GD requires \(O(N \log(1/\epsilon))\) total gradient computations versus SGD’s \(O(1/\epsilon)\) single-sample computations. In large-scale regimes (huge \(N\)), SGD can be faster in wall-clock time because \(N \log(1/\epsilon)\) may far exceed \(1/\epsilon\) for reasonable accuracy levels. In other words, with millions of data points, one epoch of GD (one full gradient) is extremely costly, whereas SGD can make \(N\) cheap updates in the time GD makes one – often yielding a good solution faster in practice, even though SGD’s asymptotic error decays more slowly. As one lecture succinctly puts it: “SGD can be super effective in terms of iteration cost and memory, but SGD is slow to converge and can’t adapt to strong convexity”. Thus, the break-even point depends on \(N\) and the desired accuracy: for moderate accuracy on very large \(N\), SGD’s cheaper updates win; for extremely high precision (very small \(\epsilon\)) on a modest \(N\), GD’s fast convergence per step can be advantageous.

    +
    +
    +

    Non-Convex Problems#

    +

In non-convex optimization (e.g. deep neural networks), neither GD nor SGD guarantees global minima, but SGD often displays faster progress in finding useful minima. Theoretical results here are weaker, usually showing convergence to a stationary point \(\theta\) (\(|\nabla C|\) is small) in expectation. For example, GD might require \(O(1/\epsilon^2)\) iterations to ensure \(|\nabla C(\theta)| < \epsilon\), and SGD typically has similar polynomial complexity (often worse due to gradient noise). However, a noteworthy difference is that SGD’s stochasticity can help escape saddle points or poor local minima. Random gradient fluctuations act like implicit noise, helping the iterate “jump” out of flat saddle regions where full-batch GD could stagnate. In fact, research has shown that adding noise to GD can guarantee escaping saddle points in polynomial time, and the inherent noise in SGD often serves this role. Empirically, this means SGD can sometimes find a lower loss basin faster, whereas full-batch GD might get “stuck” near saddle points or need a very small learning rate to navigate complex error surfaces. Overall, in modern high-dimensional machine learning, SGD (or mini-batch SGD) is the workhorse for large non-convex problems because it converges to good solutions much faster in practice, despite the lack of a linear convergence guarantee. Full-batch GD is rarely used on large neural networks, as it would require tiny steps to avoid divergence and is extremely slow per iteration.

    +
    +
    +
    +

    Memory Usage and Scalability#

    +

    A major advantage of SGD is its memory efficiency in handling large +datasets. Full-batch GD requires access to the entire training set for +each iteration, which often means the whole dataset (or a large +subset) must reside in memory to compute \(\nabla C(\theta)\) . This results +in memory usage that scales linearly with the dataset size \(N\). For +instance, if each training sample is large (e.g. high-dimensional +features), computing a full gradient may require storing a substantial +portion of the data or all intermediate gradients until they are +aggregated. In contrast, SGD needs only a single (or a small +mini-batch of) training example(s) in memory at any time . The +algorithm processes one sample (or mini-batch) at a time and +immediately updates the model, discarding that sample before moving to +the next. This streaming approach means that memory footprint is +essentially independent of \(N\) (apart from storing the model +parameters themselves). As one source notes, gradient descent +“requires more memory than SGD” because it “must store the entire +dataset for each iteration,” whereas SGD “only needs to store the +current training example” . In practical terms, if you have a dataset +of size, say, 1 million examples, full-batch GD would need memory for +all million every step, while SGD could be implemented to load just +one example at a time – a crucial benefit if data are too large to fit +in RAM or GPU memory. This scalability makes SGD suitable for +large-scale learning: as long as you can stream data from disk, SGD +can handle arbitrarily large datasets with fixed memory. In fact, SGD +“does not need to remember which examples were visited” in the past, +allowing it to run in an online fashion on infinite data streams +. Full-batch GD, on the other hand, would require multiple passes +through a giant dataset per update (or a complex distributed memory +system), which is often infeasible.

    +

    There is also a secondary memory effect: computing a full-batch +gradient in deep learning requires storing all intermediate +activations for backpropagation across the entire batch. A very large +batch (approaching the full dataset) might exhaust GPU memory due to +the need to hold activation gradients for thousands or millions of +examples simultaneously. SGD/minibatches mitigate this by splitting +the workload – e.g. with a mini-batch of size 32 or 256, memory use +stays bounded, whereas a full-batch (size = \(N\)) forward/backward pass +could not even be executed if \(N\) is huge. Techniques like gradient +accumulation exist to simulate large-batch GD by summing many +small-batch gradients – but these still process data in manageable +chunks to avoid memory overflow. In summary, memory complexity for GD +grows with \(N\), while for SGD it remains \(O(1)\) w.r.t. dataset size +(only the model and perhaps a mini-batch reside in memory) . This is a +key reason why batch GD “does not scale” to very large data and why +virtually all large-scale machine learning algorithms rely on +stochastic or mini-batch methods.

    +
    +
    +

    Empirical Evidence: Convergence Time and Memory in Practice#

    +

    Empirical studies strongly support the theoretical trade-offs +above. In large-scale machine learning tasks, SGD often converges to a +good solution much faster in wall-clock time than full-batch GD, and +it uses far less memory. For example, Bottou & Bousquet (2008) +analyzed learning time under a fixed computational budget and +concluded that when data is abundant, it’s better to use a faster +(even if less precise) optimization method to process more examples in +the same time . This analysis showed that for large-scale problems, +processing more data with SGD yields lower error than spending the +time to do exact (batch) optimization on fewer data . In other words, +if you have a time budget, it’s often optimal to accept slightly +slower convergence per step (as with SGD) in exchange for being able +to use many more training samples in that time. This phenomenon is +borne out by experiments:

    +
    +

    Deep Neural Networks#

    +

    In modern deep learning, full-batch GD is so slow that it is rarely +attempted; instead, mini-batch SGD is standard. A recent study +demonstrated that it is possible to train a ResNet-50 on ImageNet +using full-batch gradient descent, but it required careful tuning +(e.g. gradient clipping, tiny learning rates) and vast computational +resources – and even then, each full-batch update was extremely +expensive.

    +

    Using a huge batch +(closer to full GD) tends to slow down convergence if the learning +rate is not scaled up, and often encounters optimization difficulties +(plateaus) that small batches avoid. +Empirically, small or medium +batch SGD finds minima in fewer clock hours because it can rapidly +loop over the data with gradient noise aiding exploration.

    +
    +
    +

    Memory constraints#

    +

    From a memory standpoint, practitioners note that batch GD becomes +infeasible on large data. For example, if one tried to do full-batch +training on a dataset that doesn’t fit in RAM or GPU memory, the +program would resort to heavy disk I/O or simply crash. SGD +circumvents this by processing mini-batches. Even in cases where data +does fit in memory, using a full batch can spike memory usage due to +storing all gradients. One empirical observation is that mini-batch +training has a “lower, fluctuating usage pattern” of memory, whereas +full-batch loading “quickly consumes memory (often exceeding limits)” +. This is especially relevant for graph neural networks or other +models where a “batch” may include a huge chunk of a graph: full-batch +gradient computation can exhaust GPU memory, whereas mini-batch +methods keep memory usage manageable .

    +

    In summary, SGD converges faster than full-batch GD in terms of actual +training time for large-scale problems, provided we measure +convergence as reaching a good-enough solution. Theoretical bounds +show SGD needs more iterations, but because it performs many more +updates per unit time (and requires far less memory), it often +achieves lower loss in a given time frame than GD. Full-batch GD might +take slightly fewer iterations in theory, but each iteration is so +costly that it is “slower… especially for large datasets” . Meanwhile, +memory scaling strongly favors SGD: GD’s memory cost grows with +dataset size, making it impractical beyond a point, whereas SGD’s +memory use is modest and mostly constant w.r.t. \(N\) . These +differences have made SGD (and mini-batch variants) the de facto +choice for training large machine learning models, from logistic +regression on millions of examples to deep neural networks with +billions of parameters. The consensus in both research and practice is +that for large-scale or high-dimensional tasks, SGD-type methods +converge quicker per unit of computation and handle memory constraints +better than standard full-batch gradient descent .

    +
    +
    +
    +

    Second moment of the gradient#

    +

    In stochastic gradient descent, with and without momentum, we still +have to specify a schedule for tuning the learning rates \(\eta_t\) +as a function of time. As discussed in the context of Newton’s +method, this presents a number of dilemmas. The learning rate is +limited by the steepest direction which can change depending on the +current position in the landscape. To circumvent this problem, ideally +our algorithm would keep track of curvature and take large steps in +shallow, flat directions and small steps in steep, narrow directions. +Second-order methods accomplish this by calculating or approximating +the Hessian and normalizing the learning rate by the +curvature. However, this is very computationally expensive for +extremely large models. Ideally, we would like to be able to +adaptively change the step size to match the landscape without paying +the steep computational price of calculating or approximating +Hessians.

    +

    During the last decade a number of methods have been introduced that accomplish +this by tracking not only the gradient, but also the second moment of +the gradient. These methods include AdaGrad, AdaDelta, Root Mean Squared Propagation (RMS-Prop), and +ADAM.

    +
    +
    +

    Challenge: Choosing a Fixed Learning Rate#

    +

    A fixed \(\eta\) is hard to get right:

    +
      +
1. If \(\eta\) is too large, the updates can overshoot the minimum, causing oscillations or divergence.

2. If \(\eta\) is too small, convergence is very slow (many iterations are needed to make progress).
    +

In practice, one often uses trial-and-error or schedules (decaying \(\eta\) over time) to find a workable balance; a small numerical illustration is given after the list below. For a function with steep directions and flat directions, a single global \(\eta\) may be inappropriate:

    +
      +
1. Steep coordinates require a smaller step size to avoid oscillation.

2. Flat/shallow coordinates could use a larger step to speed up progress.

3. This issue is pronounced in high-dimensional problems with sparse or varying-scale features – we need a method to adjust step sizes per feature.
    +
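To illustrate the first two points, a small sketch on the toy quadratic \(C(\theta)=\theta^2\): a too large \(\eta\) makes the iterates diverge, while a too small \(\eta\) barely moves them.

def gd(eta, n_iter=20, theta0=1.0):
    # plain gradient descent on C(theta) = theta^2, with gradient 2*theta
    theta = theta0
    for _ in range(n_iter):
        theta -= eta*2.0*theta
    return theta

print("eta = 1.1  :", gd(1.1))    # |1 - 2*eta| > 1: the iterates oscillate and diverge
print("eta = 0.5  :", gd(0.5))    # converges immediately for this particular quadratic
print("eta = 0.001:", gd(0.001))  # still far from the minimum after 20 iterations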
    +
    +

    Motivation for Adaptive Step Sizes#

    +
      +
1. Instead of a fixed global \(\eta\), use an adaptive learning rate for each parameter that depends on the history of gradients.

2. Parameters that have large accumulated gradient magnitude should get smaller steps (they’ve been changing a lot), whereas parameters with small or infrequent gradients can have larger relative steps.

3. This is especially useful for sparse features: rarely active features accumulate little gradient, so their learning rate remains comparatively high, ensuring they are not neglected.

4. Conversely, frequently active features accumulate large gradient sums, and their learning rate automatically decreases, preventing too-large updates.

5. Several algorithms implement this idea (AdaGrad, RMSProp, AdaDelta, Adam, etc.). We will derive AdaGrad, one of the first adaptive methods.
    +
    +
    +

    AdaGrad algorithm, taken from Goodfellow et al#

(Figure: the AdaGrad algorithm, reproduced from Goodfellow et al.)

    +
    +
    +

    Derivation of the AdaGrad Algorithm#

    +

    Accumulating Gradient History.

    +
      +
1. AdaGrad maintains a running sum of squared gradients for each parameter (coordinate).

2. Let \(g_t = \nabla C_{i_t}(\theta_t)\) be the gradient at step \(t\) (or a subgradient for nondifferentiable cases).

3. Initialize \(r_0 = 0\) (an all-zero vector in \(\mathbb{R}^d\)).

4. At each iteration \(t\), update the accumulation:
    +
    +\[ +r_t = r_{t-1} + g_t \circ g_t, +\]
    +
      +
1. Here \(g_t \circ g_t\) denotes the element-wise square of the gradient vector, i.e. \(r_t^{(j)} = r_{t-1}^{(j)} + (g_{t,j})^2\) for each parameter \(j\).

2. We can view \(H_t = \mathrm{diag}(r_t)\) as a diagonal matrix of past squared gradients. Initially \(H_0 = 0\).
    +
    +
    +

    AdaGrad Update Rule Derivation#

    +

    We scale the gradient by the inverse square root of the accumulated matrix \(H_t\). The AdaGrad update at step \(t\) is:

    +
    +\[ +\theta_{t+1} =\theta_t - \eta H_t^{-1/2} g_t, +\]
    +

where \(H_t^{-1/2}\) is the diagonal matrix with entries \((r_{t}^{(1)})^{-1/2}, \dots, (r_{t}^{(d)})^{-1/2}\). In coordinates, this means each parameter \(j\) has an individual step size:

    +
    +\[ +\theta_{t+1,j} =\theta_{t,j} -\frac{\eta}{\sqrt{r_{t,j}}}g_{t,j}. +\]
    +

    In practice we add a small constant \(\epsilon\) in the denominator for numerical stability to avoid division by zero:

    +
    +\[ +\theta_{t+1,j}= \theta_{t,j}-\frac{\eta}{\sqrt{\epsilon + r_{t,j}}}g_{t,j}. +\]
    +

    Equivalently, the effective learning rate for parameter \(j\) at time \(t\) is \(\displaystyle \alpha_{t,j} = \frac{\eta}{\sqrt{\epsilon + r_{t,j}}}\). This decreases over time as \(r_{t,j}\) grows.
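A minimal NumPy sketch of this per-coordinate update on a simple quadratic cost (a full AdaGrad example with automatic differentiation is given later in these notes):

import numpy as np

def gradient(theta):
    # gradient of the toy cost C(theta) = sum(theta**2)
    return 2.0*theta

eta, eps = 0.1, 1e-8
theta = np.array([1.0, -2.0])
r = np.zeros_like(theta)              # accumulated squared gradients r_t

for t in range(500):
    g = gradient(theta)
    r += g*g                          # r_t = r_{t-1} + g_t*g_t (element-wise)
    theta -= eta*g/np.sqrt(eps + r)   # per-coordinate step eta/sqrt(eps + r_t)
print(theta)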

    +
    +
    +

    AdaGrad Properties#

    +
      +
1. AdaGrad automatically tunes the step size for each parameter. Parameters with more volatile or large gradients get smaller steps, and those with small or infrequent gradients get relatively larger steps.

2. No manual schedule needed: the accumulation \(r_t\) keeps increasing (or stays the same if the gradient is zero), so step sizes \(\eta/\sqrt{r_t}\) are non-increasing. This has a similar effect to a learning rate schedule, but individualized per coordinate.

3. Sparse data benefit: for very sparse features, \(r_{t,j}\) grows slowly, so that feature’s parameter retains a higher learning rate for longer, allowing it to make significant updates when it does get a gradient signal.

4. Convergence: in convex optimization, AdaGrad can be shown to achieve a sub-linear convergence rate comparable to the best fixed learning rate tuned for the problem.
    +

    It effectively reduces the need to tune \(\eta\) by hand.

    +
      +
Limitations: Because \(r_t\) accumulates without bound, AdaGrad’s learning rates can become extremely small over long training, potentially slowing progress. (Later variants like RMSProp, AdaDelta and Adam address this by modifying the accumulation rule.)
    +
    +
    +

    RMSProp: Adaptive Learning Rates#

    +

RMSProp addresses AdaGrad’s diminishing learning rate issue. It uses a decaying average of squared gradients (instead of a cumulative sum):

    +
    +\[ +v_t = \rho v_{t-1} + (1-\rho)(\nabla C(\theta_t))^2, +\]
    +

    with \(\rho\) typically \(0.9\) (or \(0.99\)).

    +
      +
1. Update: \(\theta_{t+1} = \theta_t - \frac{\eta}{\sqrt{v_t + \epsilon}} \nabla C(\theta_t)\).

2. Recent gradients have more weight, so \(v_t\) adapts to the current landscape.

3. Avoids AdaGrad’s “infinite memory” problem – the learning rate does not continuously decay to zero.
    +

RMSProp was first proposed in lecture notes by Geoff Hinton (2012, unpublished).
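A compact sketch of the RMSProp update on the same toy quadratic cost as in the AdaGrad sketch above (the full stochastic version with Autograd appears further below):

import numpy as np

def gradient(theta):
    # gradient of the toy cost C(theta) = sum(theta**2)
    return 2.0*theta

eta, rho, eps = 0.01, 0.9, 1e-8
theta = np.array([1.0, -2.0])
v = np.zeros_like(theta)              # decaying average of squared gradients

for t in range(1000):
    g = gradient(theta)
    v = rho*v + (1 - rho)*g*g         # v_t = rho*v_{t-1} + (1-rho)*g_t**2
    theta -= eta*g/np.sqrt(v + eps)   # adaptive per-coordinate step
print(theta)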

    +
    +
    +

    RMSProp algorithm, taken from Goodfellow et al#

(Figure: the RMSProp algorithm, reproduced from Goodfellow et al.)

    +
    +
    +

    Adam Optimizer#

    +

Why combine Momentum and RMSProp? Motivation for Adam: Adaptive Moment Estimation (Adam) was introduced by Kingma and Ba (2014) to combine the benefits of momentum and RMSProp.

    +
      +
1. Momentum: Fast convergence by smoothing gradients (accelerates in the long-term gradient direction).

2. Adaptive rates (RMSProp): Per-dimension learning rate scaling for stability (handles different feature scales, sparse gradients).

3. Adam uses both: it maintains moving averages of both the first moment (gradients) and the second moment (squared gradients).

4. Additionally, it includes a mechanism to correct the bias in these moving averages (crucial in early iterations).
    +

    Result: Adam is robust, achieves faster convergence with less tuning, and often outperforms SGD (with momentum) in practice.

    +
    +
    +

    ADAM optimizer#

    +

In ADAM, we keep a running average of both the first and second moment of the gradient and use this information to adaptively change the learning rate for different parameters. The method is efficient when working with large problems involving lots of data and/or parameters. It is a combination of the gradient descent with momentum algorithm and the RMSprop algorithm discussed above.

    +
    +
    +

    Why Combine Momentum and RMSProp?#

    +
      +
1. Momentum: Fast convergence by smoothing gradients (accelerates in the long-term gradient direction).

2. Adaptive rates (RMSProp): Per-dimension learning rate scaling for stability (handles different feature scales, sparse gradients).

3. Adam uses both: it maintains moving averages of both the first moment (gradients) and the second moment (squared gradients).

4. Additionally, it includes a mechanism to correct the bias in these moving averages (crucial in early iterations).
    +

    Result: Adam is robust, achieves faster convergence with less tuning, and often outperforms SGD (with momentum) in practice

    +
    +
    +

    Adam: Exponential Moving Averages (Moments)#

    +

Adam maintains two moving averages at each time step \(t\) for each parameter \(\theta\): the first moment (mean) \(m_t\) and the second moment (uncentered variance) \(v_t\).

    +

    The Momentum term

    +
    +\[ +m_t = \beta_1m_{t-1} + (1-\beta_1)\, \nabla C(\theta_t), +\]
    +

    Second moment (uncentered variance) \(v_t\).

    +

    The RMS term

    +
    +\[ +v_t = \beta_2v_{t-1} + (1-\beta_2)(\nabla C(\theta_t))^2, +\]
    +

    with typical \(\beta_1 = 0.9\), \(\beta_2 = 0.999\). Initialize \(m_0 = 0\), \(v_0 = 0\).

    +

    These are biased estimators of the true first and second moment of the gradients, especially at the start (since \(m_0,v_0\) are zero)

    +
    +
    +

    Adam: Bias Correction#

    +

    To counteract initialization bias in \(m_t, v_t\), Adam computes bias-corrected estimates

    +
    +\[ +\hat{m}_t = \frac{m_t}{1 - \beta_1^t}, \qquad \hat{v}_t = \frac{v_t}{1 - \beta_2^t}. +\]
    +
      +
• When \(t\) is small, \(1-\beta_i^t \approx 0\), so \(\hat{m}_t, \hat{v}_t\) are significantly larger than the raw \(m_t, v_t\), compensating for the initial zero bias.

• As \(t\) increases, \(1-\beta_i^t \to 1\), and \(\hat{m}_t, \hat{v}_t\) converge to \(m_t, v_t\).

• Bias correction is important for Adam’s stability in early iterations.
    +
    +
    +

    Adam: Update Rule Derivation#

    +

    Finally, Adam updates parameters using the bias-corrected moments:

    +
    +\[ +\theta_{t+1} =\theta_t -\frac{\alpha}{\sqrt{\hat{v}_t} + \epsilon}\hat{m}_t, +\]
    +

where \(\epsilon\) is a small constant (e.g. \(10^{-8}\)) to prevent division by zero. Breaking it down:

    +
      +
1. Compute gradient \(\nabla C(\theta_t)\).

2. Update first moment \(m_t\) and second moment \(v_t\) (exponential moving averages).

3. Bias-correct: \(\hat{m}_t = m_t/(1-\beta_1^t)\), \(\; \hat{v}_t = v_t/(1-\beta_2^t)\).

4. Compute step: \(\Delta \theta_t = \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}\).

5. Update parameters: \(\theta_{t+1} = \theta_t - \alpha\, \Delta \theta_t\).
    +

    This is the Adam update rule as given in the original paper.
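Collecting the five steps in a single function, a minimal sketch of the Adam parameter update on a toy quadratic cost (the default values for \(\alpha\), \(\beta_1\), \(\beta_2\) and \(\epsilon\) follow the original paper):

import numpy as np

def adam_step(theta, grad, m, v, t, alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    # one Adam update; t counts iterations starting at 1
    m = beta1*m + (1 - beta1)*grad                  # first moment
    v = beta2*v + (1 - beta2)*grad*grad             # second moment
    m_hat = m/(1 - beta1**t)                        # bias-corrected first moment
    v_hat = v/(1 - beta2**t)                        # bias-corrected second moment
    theta = theta - alpha*m_hat/(np.sqrt(v_hat) + eps)
    return theta, m, v

# usage on the toy quadratic C(theta) = sum(theta**2), gradient 2*theta
theta = np.array([1.0, -2.0])
m = np.zeros_like(theta)
v = np.zeros_like(theta)
for t in range(1, 5001):
    theta, m, v = adam_step(theta, 2.0*theta, m, v, t)
print(theta)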

    +
    +
    +

    Adam vs. AdaGrad and RMSProp#

    +
      +
1. AdaGrad: Uses per-coordinate scaling like Adam, but no momentum. Tends to slow down too much due to cumulative history (no forgetting).

2. RMSProp: Uses a moving average of squared gradients (like Adam’s \(v_t\)) to maintain adaptive learning rates, but does not include momentum or bias-correction.

3. Adam: Effectively RMSProp + Momentum + Bias-correction.
    +
      +
• Momentum (\(m_t\)) provides acceleration and smoother convergence.

• Adaptive \(v_t\) scaling moderates the step size per dimension.

• Bias correction (absent in AdaGrad/RMSProp) ensures robust estimates early on.
    +

    In practice, Adam often yields faster convergence and better tuning stability than RMSProp or AdaGrad alone

    +
    +
    +

    Adaptivity Across Dimensions#

    +
      +
1. Adam adapts the step size per coordinate: parameters with larger gradient variance get smaller effective steps, those with smaller or sparse gradients get larger steps.

2. This per-dimension adaptivity is inherited from AdaGrad/RMSProp and helps handle ill-conditioned or sparse problems.

3. Meanwhile, momentum (the first moment) allows Adam to continue making progress even if gradients become small or noisy, by leveraging the accumulated direction.
    +
    +
    +

    ADAM algorithm, taken from Goodfellow et al#

(Figure: the ADAM algorithm, reproduced from Goodfellow et al.)

    +
    +
    +

    Algorithms and codes for Adagrad, RMSprop and Adam#

    +

    The algorithms we have implemented are well described in the text by Goodfellow, Bengio and Courville, chapter 8.

    +

The codes which implement these algorithms are discussed below.

    +
    +
    +

    Practical tips#

    +
      +
• Randomize the data when making mini-batches. It is always important to randomly shuffle the data when forming mini-batches. Otherwise, the gradient descent method can fit spurious correlations resulting from the order in which data is presented.

• Transform your inputs. Learning becomes difficult when our landscape has a mixture of steep and flat directions. One simple trick for minimizing these situations is to standardize the data by subtracting the mean and normalizing the variance of input variables. Whenever possible, also decorrelate the inputs. To understand why this is helpful, consider the case of linear regression. It is easy to show that for the squared error cost function, the Hessian of the cost function is just the correlation matrix between the inputs. Thus, by standardizing the inputs, we are ensuring that the landscape looks homogeneous in all directions in parameter space. Since most deep networks can be viewed as linear transformations followed by a non-linearity at each layer, we expect this intuition to hold beyond the linear case.

• Monitor the out-of-sample performance. Always monitor the performance of your model on a validation set (a small portion of the training data that is held out of the training process to serve as a proxy for the test set). If the validation error starts increasing, then the model is beginning to overfit; terminate the learning process. This early stopping significantly improves performance in many settings (a minimal sketch is given after this list).

• Adaptive optimization methods don’t always have good generalization. Recent studies have shown that adaptive methods such as ADAM, RMSProp, and AdaGrad tend to have poor generalization compared to SGD or SGD with momentum, particularly in the high-dimensional limit (i.e. when the number of parameters exceeds the number of data points). Although it is not clear at this stage why these methods perform so well in training deep neural networks, simpler procedures like properly-tuned SGD may work as well or better in these applications.
    +
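A minimal sketch of the early-stopping idea from the third point above, using plain gradient descent for OLS on noisy linear data (with only two parameters the model hardly overfits, so the loop may simply run to completion, but the mechanics are the same for larger models):

import numpy as np

rng = np.random.default_rng(2024)
n = 200
x = 2*rng.random((n,1))
y = 4 + 3*x + rng.normal(size=(n,1))
X = np.c_[np.ones((n,1)), x]

# hold out 20 percent of the data as a validation set
n_val = n//5
X_train, y_train = X[:-n_val], y[:-n_val]
X_val, y_val = X[-n_val:], y[-n_val:]

theta = rng.normal(size=(2,1))
eta, patience = 0.05, 10
best_val, wait = np.inf, 0
for epoch in range(1000):
    gradients = (2.0/len(y_train))*X_train.T @ (X_train @ theta - y_train)
    theta -= eta*gradients
    val_mse = np.mean((X_val @ theta - y_val)**2)
    if val_mse < best_val:                 # validation error still improving
        best_val, best_theta, wait = val_mse, theta.copy(), 0
    else:
        wait += 1
        if wait >= patience:               # no improvement for `patience` epochs: stop
            print(f"Early stopping at epoch {epoch}")
            break
print("best validation MSE:", best_val)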
    +
    +

    Sneaking in automatic differentiation using Autograd#

    +

    In the examples here we take the liberty of sneaking in automatic +differentiation (without having discussed the mathematics). In +project 1 you will write the gradients as discussed above, that is +hard-coding the gradients. By introducing automatic differentiation +via the library autograd, which is now replaced by JAX, we have +more flexibility in setting up alternative cost functions.

    +

The first example shows results with ordinary least squares.

    +
    +
    +
    # Using Autograd to calculate gradients for OLS
    +from random import random, seed
    +import numpy as np
    +import autograd.numpy as np
    +import matplotlib.pyplot as plt
    +from autograd import grad
    +
    +def CostOLS(theta):
    +    return (1.0/n)*np.sum((y-X @ theta)**2)
    +
    +n = 100
    +x = 2*np.random.rand(n,1)
    +y = 4+3*x+np.random.randn(n,1)
    +
    +X = np.c_[np.ones((n,1)), x]
    +XT_X = X.T @ X
    +theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)
    +print("Own inversion")
    +print(theta_linreg)
    +# Hessian matrix
    +H = (2.0/n)* XT_X
    +EigValues, EigVectors = np.linalg.eig(H)
    +print(f"Eigenvalues of Hessian Matrix:{EigValues}")
    +
    +theta = np.random.randn(2,1)
    +eta = 1.0/np.max(EigValues)
    +Niterations = 1000
    +# define the gradient
    +training_gradient = grad(CostOLS)
    +
    +for iter in range(Niterations):
    +    gradients = training_gradient(theta)
    +    theta -= eta*gradients
    +print("theta from own gd")
    +print(theta)
    +
    +xnew = np.array([[0],[2]])
    +Xnew = np.c_[np.ones((2,1)), xnew]
    +ypredict = Xnew.dot(theta)
    +ypredict2 = Xnew.dot(theta_linreg)
    +
    +plt.plot(xnew, ypredict, "r-")
    +plt.plot(xnew, ypredict2, "b-")
    +plt.plot(x, y ,'ro')
    +plt.axis([0,2.0,0, 15.0])
    +plt.xlabel(r'$x$')
    +plt.ylabel(r'$y$')
    +plt.title(r'Random numbers ')
    +plt.show()
    +
    +
    +
    +
    +
    +
    +

    Same code but now with momentum gradient descent#

    +
    +
    +
    # Using Autograd to calculate gradients for OLS
    +from random import random, seed
    +import numpy as np
    +import autograd.numpy as np
    +import matplotlib.pyplot as plt
    +from autograd import grad
    +
    +def CostOLS(theta):
    +    return (1.0/n)*np.sum((y-X @ theta)**2)
    +
    +n = 100
    +x = 2*np.random.rand(n,1)
    +y = 4+3*x#+np.random.randn(n,1)
    +
    +X = np.c_[np.ones((n,1)), x]
    +XT_X = X.T @ X
    +theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)
    +print("Own inversion")
    +print(theta_linreg)
    +# Hessian matrix
    +H = (2.0/n)* XT_X
    +EigValues, EigVectors = np.linalg.eig(H)
    +print(f"Eigenvalues of Hessian Matrix:{EigValues}")
    +
    +theta = np.random.randn(2,1)
    +eta = 1.0/np.max(EigValues)
    +Niterations = 30
    +
    +# define the gradient
    +training_gradient = grad(CostOLS)
    +
    +for iter in range(Niterations):
    +    gradients = training_gradient(theta)
    +    theta -= eta*gradients
    +    print(iter,gradients[0],gradients[1])
    +print("theta from own gd")
    +print(theta)
    +
    +# Now improve with momentum gradient descent
    +change = 0.0
    +delta_momentum = 0.3
    +for iter in range(Niterations):
    +    # calculate gradient
    +    gradients = training_gradient(theta)
    +    # calculate update
    +    new_change = eta*gradients+delta_momentum*change
    +    # take a step
    +    theta -= new_change
    +    # save the change
    +    change = new_change
    +    print(iter,gradients[0],gradients[1])
    +print("theta from own gd wth momentum")
    +print(theta)
    +
    +
    +
    +
    +
    +
    +

    Including Stochastic Gradient Descent with Autograd#

    +

    In this code we include the stochastic gradient descent approach +discussed above. Note here that we specify which argument we are +taking the derivative with respect to when using autograd.

    +
    +
    +
    # Using Autograd to calculate gradients using SGD
    +# OLS example
    +from random import random, seed
    +import numpy as np
    +import autograd.numpy as np
    +import matplotlib.pyplot as plt
    +from autograd import grad
    +
    +# Note change from previous example
    +def CostOLS(y,X,theta):
    +    return np.sum((y-X @ theta)**2)
    +
    +n = 100
    +x = 2*np.random.rand(n,1)
    +y = 4+3*x+np.random.randn(n,1)
    +
    +X = np.c_[np.ones((n,1)), x]
    +XT_X = X.T @ X
    +theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)
    +print("Own inversion")
    +print(theta_linreg)
    +# Hessian matrix
    +H = (2.0/n)* XT_X
    +EigValues, EigVectors = np.linalg.eig(H)
    +print(f"Eigenvalues of Hessian Matrix:{EigValues}")
    +
    +theta = np.random.randn(2,1)
    +eta = 1.0/np.max(EigValues)
    +Niterations = 1000
    +
    +# Note that we request the derivative wrt third argument (theta, 2 here)
    +training_gradient = grad(CostOLS,2)
    +
    +for iter in range(Niterations):
    +    gradients = (1.0/n)*training_gradient(y, X, theta)
    +    theta -= eta*gradients
    +print("theta from own gd")
    +print(theta)
    +
    +xnew = np.array([[0],[2]])
    +Xnew = np.c_[np.ones((2,1)), xnew]
    +ypredict = Xnew.dot(theta)
    +ypredict2 = Xnew.dot(theta_linreg)
    +
    +plt.plot(xnew, ypredict, "r-")
    +plt.plot(xnew, ypredict2, "b-")
    +plt.plot(x, y ,'ro')
    +plt.axis([0,2.0,0, 15.0])
    +plt.xlabel(r'$x$')
    +plt.ylabel(r'$y$')
    +plt.title(r'Random numbers ')
    +plt.show()
    +
    +n_epochs = 50
    +M = 5   #size of each minibatch
    +m = int(n/M) #number of minibatches
    +t0, t1 = 5, 50
    +def learning_schedule(t):
    +    return t0/(t+t1)
    +
    +theta = np.random.randn(2,1)
    +
    +for epoch in range(n_epochs):
    +# Can you figure out a better way of setting up the contributions to each batch?
    +    for i in range(m):
    +        random_index = M*np.random.randint(m)
    +        xi = X[random_index:random_index+M]
    +        yi = y[random_index:random_index+M]
    +        gradients = (1.0/M)*training_gradient(yi, xi, theta)
    +        eta = learning_schedule(epoch*m+i)
    +        theta = theta - eta*gradients
    +print("theta from own sdg")
    +print(theta)
    +
    +
    +
    +
    +
    +
    +

    Same code but now with momentum gradient descent#

    +
    +
    +
    # Using Autograd to calculate gradients using SGD
    +# OLS example
    +from random import random, seed
    +import numpy as np
    +import autograd.numpy as np
    +import matplotlib.pyplot as plt
    +from autograd import grad
    +
    +# Note change from previous example
    +def CostOLS(y,X,theta):
    +    return np.sum((y-X @ theta)**2)
    +
    +n = 100
    +x = 2*np.random.rand(n,1)
    +y = 4+3*x+np.random.randn(n,1)
    +
    +X = np.c_[np.ones((n,1)), x]
    +XT_X = X.T @ X
    +theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)
    +print("Own inversion")
    +print(theta_linreg)
    +# Hessian matrix
    +H = (2.0/n)* XT_X
    +EigValues, EigVectors = np.linalg.eig(H)
    +print(f"Eigenvalues of Hessian Matrix:{EigValues}")
    +
    +theta = np.random.randn(2,1)
    +eta = 1.0/np.max(EigValues)
    +Niterations = 100
    +
    +# Note that we request the derivative wrt third argument (theta, 2 here)
    +training_gradient = grad(CostOLS,2)
    +
    +for iter in range(Niterations):
    +    gradients = (1.0/n)*training_gradient(y, X, theta)
    +    theta -= eta*gradients
    +print("theta from own gd")
    +print(theta)
    +
    +
    +n_epochs = 50
    +M = 5   #size of each minibatch
    +m = int(n/M) #number of minibatches
    +t0, t1 = 5, 50
    +def learning_schedule(t):
    +    return t0/(t+t1)
    +
    +theta = np.random.randn(2,1)
    +
    +change = 0.0
    +delta_momentum = 0.3
    +
    +for epoch in range(n_epochs):
    +    for i in range(m):
    +        random_index = M*np.random.randint(m)
    +        xi = X[random_index:random_index+M]
    +        yi = y[random_index:random_index+M]
    +        gradients = (1.0/M)*training_gradient(yi, xi, theta)
    +        eta = learning_schedule(epoch*m+i)
    +        # calculate update
    +        new_change = eta*gradients+delta_momentum*change
    +        # take a step
    +        theta -= new_change
    +        # save the change
    +        change = new_change
    +print("theta from own sdg with momentum")
    +print(theta)
    +
    +
    +
    +
    +
    +
    +

    But none of these can compete with Newton’s method#

    +

    Note that we here have introduced automatic differentiation

    +
    +
    +
    # Using Newton's method
    +from random import random, seed
    +import numpy as np
    +import autograd.numpy as np
    +from autograd import grad
    +
    +def CostOLS(theta):
    +    return (1.0/n)*np.sum((y-X @ theta)**2)
    +
    +n = 100
    +x = 2*np.random.rand(n,1)
    +y = 4+3*x+5*x*x
    +
    +X = np.c_[np.ones((n,1)), x, x*x]
    +XT_X = X.T @ X
    +theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)
    +print("Own inversion")
    +print(theta_linreg)
    +# Hessian matrix
    +H = (2.0/n)* XT_X
    +# Note that here the Hessian does not depend on the parameters theta
    +invH = np.linalg.pinv(H)
    +theta = np.random.randn(3,1)
    +Niterations = 5
    +# define the gradient
    +training_gradient = grad(CostOLS)
    +
    +for iter in range(Niterations):
    +    gradients = training_gradient(theta)
    +    theta -= invH @ gradients
    +    print(iter,gradients[0],gradients[1])
    +print("theta from own Newton code")
    +print(theta)
    +
    +
    +
    +
    +
    +
    +

    Similar (second order function now) problem but now with AdaGrad#

    +
    +
    +
    # Using Autograd to calculate gradients using AdaGrad and Stochastic Gradient descent
    +# OLS example
    +from random import random, seed
    +import numpy as np
    +import autograd.numpy as np
    +import matplotlib.pyplot as plt
    +from autograd import grad
    +
    +# Note change from previous example
    +def CostOLS(y,X,theta):
    +    return np.sum((y-X @ theta)**2)
    +
    +n = 1000
    +x = np.random.rand(n,1)
    +y = 2.0+3*x +4*x*x
    +
    +X = np.c_[np.ones((n,1)), x, x*x]
    +XT_X = X.T @ X
    +theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)
    +print("Own inversion")
    +print(theta_linreg)
    +
    +
    +# Note that we request the derivative wrt third argument (theta, 2 here)
    +training_gradient = grad(CostOLS,2)
    +# Define parameters for Stochastic Gradient Descent
    +n_epochs = 50
    +M = 5   #size of each minibatch
    +m = int(n/M) #number of minibatches
    +# Guess for unknown parameters theta
    +theta = np.random.randn(3,1)
    +
    +# Value for learning rate
    +eta = 0.01
    +# Including AdaGrad parameter to avoid possible division by zero
    +delta  = 1e-8
    +for epoch in range(n_epochs):
    +    Giter = 0.0
    +    for i in range(m):
    +        random_index = M*np.random.randint(m)
    +        xi = X[random_index:random_index+M]
    +        yi = y[random_index:random_index+M]
    +        gradients = (1.0/M)*training_gradient(yi, xi, theta)
    +        Giter += gradients*gradients
    +        update = gradients*eta/(delta+np.sqrt(Giter))
    +        theta -= update
    +print("theta from own AdaGrad")
    +print(theta)
    +
    +
    +
    +
    +

    Running this code we note an almost perfect agreement with the results from matrix inversion.

    +
    +
    +

    RMSprop for adaptive learning rate with Stochastic Gradient Descent#

    +
    +
    +
    # Using Autograd to calculate gradients using RMSprop  and Stochastic Gradient descent
    +# OLS example
    +from random import random, seed
    +import numpy as np
    +import autograd.numpy as np
    +import matplotlib.pyplot as plt
    +from autograd import grad
    +
    +# Note change from previous example
    +def CostOLS(y,X,theta):
    +    return np.sum((y-X @ theta)**2)
    +
    +n = 1000
    +x = np.random.rand(n,1)
    +y = 2.0+3*x +4*x*x# +np.random.randn(n,1)
    +
    +X = np.c_[np.ones((n,1)), x, x*x]
    +XT_X = X.T @ X
    +theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)
    +print("Own inversion")
    +print(theta_linreg)
    +
    +
    +# Note that we request the derivative wrt third argument (theta, 2 here)
    +training_gradient = grad(CostOLS,2)
    +# Define parameters for Stochastic Gradient Descent
    +n_epochs = 50
    +M = 5   #size of each minibatch
    +m = int(n/M) #number of minibatches
    +# Guess for unknown parameters theta
    +theta = np.random.randn(3,1)
    +
    +# Value for learning rate
    +eta = 0.01
    +# Value for parameter rho
    +rho = 0.99
    +# Including AdaGrad parameter to avoid possible division by zero
    +delta  = 1e-8
    +for epoch in range(n_epochs):
    +    Giter = 0.0
    +    for i in range(m):
    +        random_index = M*np.random.randint(m)
    +        xi = X[random_index:random_index+M]
    +        yi = y[random_index:random_index+M]
    +        gradients = (1.0/M)*training_gradient(yi, xi, theta)
+        # Accumulated squared gradient: scaling with rho the new and the previous results
+        Giter = (rho*Giter+(1-rho)*gradients*gradients)
+        # Element-wise (Hadamard) scaling of the gradient by eta/sqrt(Giter)
+        update = gradients*eta/(delta+np.sqrt(Giter))
+        theta -= update
    +print("theta from own RMSprop")
    +print(theta)
    +
    +
    +
    +
    +
    +
    +

    And finally ADAM#

    +
    +
    +
# Using Autograd to calculate gradients using ADAM and Stochastic Gradient descent
    +# OLS example
    +from random import random, seed
    +import numpy as np
    +import autograd.numpy as np
    +import matplotlib.pyplot as plt
    +from autograd import grad
    +
    +# Note change from previous example
    +def CostOLS(y,X,theta):
    +    return np.sum((y-X @ theta)**2)
    +
    +n = 1000
    +x = np.random.rand(n,1)
    +y = 2.0+3*x +4*x*x# +np.random.randn(n,1)
    +
    +X = np.c_[np.ones((n,1)), x, x*x]
    +XT_X = X.T @ X
    +theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)
    +print("Own inversion")
    +print(theta_linreg)
    +
    +
    +# Note that we request the derivative wrt third argument (theta, 2 here)
    +training_gradient = grad(CostOLS,2)
    +# Define parameters for Stochastic Gradient Descent
    +n_epochs = 50
    +M = 5   #size of each minibatch
    +m = int(n/M) #number of minibatches
    +# Guess for unknown parameters theta
    +theta = np.random.randn(3,1)
    +
    +# Value for learning rate
    +eta = 0.01
+# Values for parameters beta1 and beta2, see https://arxiv.org/abs/1412.6980
+beta1 = 0.9
+beta2 = 0.999
    +# Including AdaGrad parameter to avoid possible division by zero
    +delta  = 1e-7
    +iter = 0
    +for epoch in range(n_epochs):
    +    first_moment = 0.0
    +    second_moment = 0.0
    +    iter += 1
    +    for i in range(m):
    +        random_index = M*np.random.randint(m)
    +        xi = X[random_index:random_index+M]
    +        yi = y[random_index:random_index+M]
    +        gradients = (1.0/M)*training_gradient(yi, xi, theta)
+        # Computing moments first
+        first_moment = beta1*first_moment + (1-beta1)*gradients
+        second_moment = beta2*second_moment+(1-beta2)*gradients*gradients
+        # Bias-corrected moments
+        first_term = first_moment/(1.0-beta1**iter)
+        second_term = second_moment/(1.0-beta2**iter)
+        # Adam update
+        update = eta*first_term/(np.sqrt(second_term)+delta)
    +        theta -= update
    +print("theta from own ADAM")
    +print(theta)
    +
    +
    +
    +
    +
    +
    +

    Material for the lab sessions#

    +
1. Exercise set for week 37 and reminder on scaling (from lab sessions of week 35)

2. Work on project 1

    For more discussions of Ridge regression and calculation of averages, Wessel van Wieringen’s article is highly recommended.

    +
    +
    +

    Reminder on different scaling methods#

    +

    Before fitting a regression model, it is good practice to normalize or +standardize the features. This ensures all features are on a +comparable scale, which is especially important when using +regularization. In the exercises this week we will perform standardization, scaling each +feature to have mean 0 and standard deviation 1.

    +

    Here we compute the mean and standard deviation of each column (feature) in our design/feature matrix \(\boldsymbol{X}\). +Then we subtract the mean and divide by the standard deviation for each feature.

    +

In the example here we will also center the target \(\boldsymbol{y}\) to mean \(0\). Centering \(\boldsymbol{y}\) (and each feature) means the model does not require a separate intercept term; the data is shifted such that the intercept is effectively \(0\). (In practice, one could include an intercept in the model and not penalize it, but here we simplify by centering.) Choose \(n=100\) data points and set up \(\boldsymbol{x}\), \(\boldsymbol{y}\) and the design matrix \(\boldsymbol{X}\).

    +
    +
    +
    # Standardize features (zero mean, unit variance for each feature)
    +X_mean = X.mean(axis=0)
    +X_std = X.std(axis=0)
    +X_std[X_std == 0] = 1  # safeguard to avoid division by zero for constant features
    +X_norm = (X - X_mean) / X_std
    +
    +# Center the target to zero mean (optional, to simplify intercept handling)
+y_mean = np.mean(y)          # one possible completion: the mean of the targets
+y_centered = y - y_mean      # subtract the mean so y has zero mean
    +
    +
    +
    +
    +

    Do we need to center the values of \(y\)?

    +

    After this preprocessing, each column of \(\boldsymbol{X}_{\mathrm{norm}}\) has mean zero and standard deviation \(1\) +and \(\boldsymbol{y}_{\mathrm{centered}}\) has mean 0. This can make the optimization landscape +nicer and ensures the regularization penalty \(\lambda \sum_j +\theta_j^2\) in Ridge regression treats each coefficient fairly (since features are on the +same scale).

    +
    +
    +

    Functionality in Scikit-Learn#

    +

Scikit-Learn has several functions which allow us to rescale the data, normally resulting in much better results in terms of various accuracy scores. The StandardScaler function in Scikit-Learn ensures that for each feature/predictor we study, the mean value is zero and the variance is one (for every column in the design/feature matrix). This scaling has the drawback that it does not ensure that we have a particular maximum or minimum in our data set. Another function included in Scikit-Learn is the MinMaxScaler, which ensures that all features lie between \(0\) and \(1\).
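To make the difference concrete, here is a minimal sketch (the random design matrix, seed and sizes below are assumptions made purely for illustration) that applies both scalers and prints the resulting column statistics.

import numpy as np
from sklearn.preprocessing import StandardScaler, MinMaxScaler

# Small random feature matrix, only for illustration
rng = np.random.default_rng(2025)
X = rng.normal(loc=5.0, scale=3.0, size=(100, 3))

# StandardScaler: zero mean and unit variance for each column
X_standard = StandardScaler().fit_transform(X)
print(X_standard.mean(axis=0), X_standard.std(axis=0))

# MinMaxScaler: each column mapped to the interval [0, 1]
X_minmax = MinMaxScaler().fit_transform(X)
print(X_minmax.min(axis=0), X_minmax.max(axis=0))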

    +
    +
    +

    More preprocessing#

    +

The Normalizer scales each data point such that the feature vector has a Euclidean length of one. In other words, it projects a data point onto the circle (or sphere in the case of higher dimensions) with a radius of 1. This means every data point is scaled by a different number (by the inverse of its length). This normalization is often used when only the direction (or angle) of the data matters, not the length of the feature vector.

    +

    The RobustScaler works similarly to the StandardScaler in that it +ensures statistical properties for each feature that guarantee that +they are on the same scale. However, the RobustScaler uses the median +and quartiles, instead of mean and variance. This makes the +RobustScaler ignore data points that are very different from the rest +(like measurement errors). These odd data points are also called +outliers, and might often lead to trouble for other scaling +techniques.
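As a small sketch of these two transformations (the synthetic data and the artificial outlier below are assumptions for illustration only):

import numpy as np
from sklearn.preprocessing import Normalizer, RobustScaler

rng = np.random.default_rng(2025)
X = rng.normal(size=(100, 2))
X[0] = [50.0, -50.0]   # artificial outlier

# Normalizer: rescale each row (data point) to unit Euclidean length
X_unit = Normalizer().fit_transform(X)
print(np.linalg.norm(X_unit, axis=1)[:5])   # all close to 1

# RobustScaler: subtract the median and divide by the interquartile range
X_robust = RobustScaler().fit_transform(X)
print(np.median(X_robust, axis=0))          # close to zero, the outlier has little influence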

    +
    +
    +

    Frequently used scaling functions#

    +

Many features are often scaled using standardization to improve performance. In Scikit-Learn this is given by the StandardScaler function as discussed above. It is however easy to write your own. Mathematically, this involves subtracting the mean and dividing by the standard deviation over the data set, for each feature:

    +
    +\[ +x_j^{(i)} \rightarrow \frac{x_j^{(i)} - \overline{x}_j}{\sigma(x_j)}, +\]
    +

    where \(\overline{x}_j\) and \(\sigma(x_j)\) are the mean and standard deviation, respectively, of the feature \(x_j\). +This ensures that each feature has zero mean and unit standard deviation. For data sets where we do not have the standard deviation or don’t wish to calculate it, it is then common to simply set it to one.

    +

Keep in mind that when you transform your data set before training a model, the same transformation needs to be applied to any new data set before making a prediction. If we translate this into Python code, it could be implemented as

    +
    +
    +
    """
    +#Model training, we compute the mean value of y and X
    +y_train_mean = np.mean(y_train)
    +X_train_mean = np.mean(X_train,axis=0)
    +X_train = X_train - X_train_mean
    +y_train = y_train - y_train_mean
    +
+# Then we fit our model with the (centered) training data
+trained_model = some_model.fit(X_train,y_train)
+
+
+#Model prediction: we need to apply the same transformation to the data used for prediction.
+X_test = X_test - X_train_mean #Use mean from training data
+y_pred = trained_model.predict(X_test)
    +y_pred = y_pred + y_train_mean
    +"""
    +
    +
    +
    +
    +
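As a concrete, runnable version of the sketch above (with synthetic data and Scikit-Learn's LinearRegression standing in for the generic some_model; both choices are assumptions made here for illustration):

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2025)
x = rng.uniform(size=(200, 1))
y = 2.0 + 3.0*x[:, 0] + 0.1*rng.normal(size=200)
X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.2)

# Center using the training data only
X_train_mean = np.mean(X_train, axis=0)
y_train_mean = np.mean(y_train)
X_train_centered = X_train - X_train_mean
y_train_centered = y_train - y_train_mean

# Fit without an intercept since the data are centered
trained_model = LinearRegression(fit_intercept=False).fit(X_train_centered, y_train_centered)

# Apply the same shifts when predicting on new data
X_test_centered = X_test - X_train_mean
y_pred = trained_model.predict(X_test_centered) + y_train_mean
print("MSE on test data:", np.mean((y_test - y_pred)**2))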

    Let us try to understand what this may imply mathematically when we +subtract the mean values, also known as zero centering. For +simplicity, we will focus on ordinary regression, as done in the above example.

    +

    The cost/loss function for regression is

    +
+\[ +C(\theta_0, \theta_1, \dots , \theta_{p-1}) = \frac{1}{n}\sum_{i=0}^{n-1} \left(y_i - \theta_0 - \sum_{j=1}^{p-1} X_{ij}\theta_j\right)^2. +\]
    +

Recall also that we use the squared error; larger differences between predicted and output/target values are thus penalized more strongly.

    +

    What we have done is to single out the \(\theta_0\) term in the +definition of the mean squared error (MSE). The design matrix \(X\) +does in this case not contain any intercept column. When we take the +derivative with respect to \(\theta_0\), we want the derivative to obey

    +
    +\[ +\frac{\partial C}{\partial \theta_j} = 0, +\]
    +

    for all \(j\). For \(\theta_0\) we have

    +
    +\[ +\frac{\partial C}{\partial \theta_0} = -\frac{2}{n}\sum_{i=0}^{n-1} \left(y_i - \theta_0 - \sum_{j=1}^{p-1} X_{ij} \theta_j\right). +\]
    +

    Multiplying away the constant \(2/n\), we obtain

    +
    +\[ +\sum_{i=0}^{n-1} \theta_0 = \sum_{i=0}^{n-1}y_i - \sum_{i=0}^{n-1} \sum_{j=1}^{p-1} X_{ij} \theta_j. +\]
    +

    Let us specialize first to the case where we have only two parameters \(\theta_0\) and \(\theta_1\). +Our result for \(\theta_0\) simplifies then to

    +
    +\[ +n\theta_0 = \sum_{i=0}^{n-1}y_i - \sum_{i=0}^{n-1} X_{i1} \theta_1. +\]
    +

    We obtain then

    +
    +\[ +\theta_0 = \frac{1}{n}\sum_{i=0}^{n-1}y_i - \theta_1\frac{1}{n}\sum_{i=0}^{n-1} X_{i1}. +\]
    +

    If we define

    +
    +\[ +\mu_{\boldsymbol{x}_1}=\frac{1}{n}\sum_{i=0}^{n-1} X_{i1}, +\]
    +

    and the mean value of the outputs as

    +
    +\[ +\mu_y=\frac{1}{n}\sum_{i=0}^{n-1}y_i, +\]
    +

    we have

    +
    +\[ +\theta_0 = \mu_y - \theta_1\mu_{\boldsymbol{x}_1}. +\]
    +

    In the general case with more parameters than \(\theta_0\) and \(\theta_1\), we have

    +
    +\[ +\theta_0 = \frac{1}{n}\sum_{i=0}^{n-1}y_i - \frac{1}{n}\sum_{i=0}^{n-1}\sum_{j=1}^{p-1} X_{ij}\theta_j. +\]
    +

    We can rewrite the latter equation as

    +
    +\[ +\theta_0 = \frac{1}{n}\sum_{i=0}^{n-1}y_i - \sum_{j=1}^{p-1} \mu_{\boldsymbol{x}_j}\theta_j, +\]
    +

    where we have defined

    +
    +\[ +\mu_{\boldsymbol{x}_j}=\frac{1}{n}\sum_{i=0}^{n-1} X_{ij}, +\]
    +

    the mean value for all elements of the column vector \(\boldsymbol{x}_j\).

    +

Replacing \(y_i\) with \(y_i - \overline{\boldsymbol{y}}\) and centering also our design matrix results in a cost function (in vector-matrix disguise)

    +
    +\[ +C(\boldsymbol{\theta}) = (\boldsymbol{\tilde{y}} - \tilde{X}\boldsymbol{\theta})^T(\boldsymbol{\tilde{y}} - \tilde{X}\boldsymbol{\theta}). +\]
    +

    If we minimize with respect to \(\boldsymbol{\theta}\) we have then

    +
    +\[ +\hat{\boldsymbol{\theta}} = (\tilde{X}^T\tilde{X})^{-1}\tilde{X}^T\boldsymbol{\tilde{y}}, +\]
    +

    where \(\boldsymbol{\tilde{y}} = \boldsymbol{y} - \overline{\boldsymbol{y}}\) +and \(\tilde{X}_{ij} = X_{ij} - \frac{1}{n}\sum_{k=0}^{n-1}X_{kj}\).

    +

    For Ridge regression we need to add \(\lambda \boldsymbol{\theta}^T\boldsymbol{\theta}\) to the cost function and get then

    +
    +\[ +\hat{\boldsymbol{\theta}} = (\tilde{X}^T\tilde{X} + \lambda I)^{-1}\tilde{X}^T\boldsymbol{\tilde{y}}. +\]
    +

    What does this mean? And why do we insist on all this? Let us look at some examples.

    +

This code shows a simple polynomial fit to a data set using the above transformed data, where we consider the role of the intercept by either including it in or excluding it from the design matrix (code example thanks to Øyvind Sigmundson Schøyen). Here our scaling of the data is done by subtracting the mean values only. Note also that we do not split the data into training and test data.

    +
    +
    +
    import numpy as np
    +import matplotlib.pyplot as plt
    +
    +from sklearn.linear_model import LinearRegression
    +
    +
    +np.random.seed(2021)
    +
    +def MSE(y_data,y_model):
    +    n = np.size(y_model)
    +    return np.sum((y_data-y_model)**2)/n
    +
    +
    +def fit_theta(X, y):
    +    return np.linalg.pinv(X.T @ X) @ X.T @ y
    +
    +
    +true_theta = [2, 0.5, 3.7]
    +
    +x = np.linspace(0, 1, 11)
    +y = np.sum(
    +    np.asarray([x ** p * b for p, b in enumerate(true_theta)]), axis=0
    +) + 0.1 * np.random.normal(size=len(x))
    +
    +degree = 3
    +X = np.zeros((len(x), degree))
    +
    +# Include the intercept in the design matrix
    +for p in range(degree):
    +    X[:, p] = x ** p
    +
    +theta = fit_theta(X, y)
    +
    +# Intercept is included in the design matrix
    +skl = LinearRegression(fit_intercept=False).fit(X, y)
    +
    +print(f"True theta: {true_theta}")
    +print(f"Fitted theta: {theta}")
    +print(f"Sklearn fitted theta: {skl.coef_}")
    +ypredictOwn = X @ theta
    +ypredictSKL = skl.predict(X)
    +print(f"MSE with intercept column")
    +print(MSE(y,ypredictOwn))
    +print(f"MSE with intercept column from SKL")
    +print(MSE(y,ypredictSKL))
    +
    +
    +plt.figure()
    +plt.scatter(x, y, label="Data")
    +plt.plot(x, X @ theta, label="Fit")
    +plt.plot(x, skl.predict(X), label="Sklearn (fit_intercept=False)")
    +
    +
    +# Do not include the intercept in the design matrix
    +X = np.zeros((len(x), degree - 1))
    +
    +for p in range(degree - 1):
    +    X[:, p] = x ** (p + 1)
    +
    +# Intercept is not included in the design matrix
    +skl = LinearRegression(fit_intercept=True).fit(X, y)
    +
    +# Use centered values for X and y when computing coefficients
    +y_offset = np.average(y, axis=0)
    +X_offset = np.average(X, axis=0)
    +
    +theta = fit_theta(X - X_offset, y - y_offset)
    +intercept = np.mean(y_offset - X_offset @ theta)
    +
    +print(f"Manual intercept: {intercept}")
    +print(f"Fitted theta (without intercept): {theta}")
    +print(f"Sklearn intercept: {skl.intercept_}")
    +print(f"Sklearn fitted theta (without intercept): {skl.coef_}")
    +ypredictOwn = X @ theta
    +ypredictSKL = skl.predict(X)
    +print(f"MSE with Manual intercept")
    +print(MSE(y,ypredictOwn+intercept))
    +print(f"MSE with Sklearn intercept")
    +print(MSE(y,ypredictSKL))
    +
    +plt.plot(x, X @ theta + intercept, "--", label="Fit (manual intercept)")
    +plt.plot(x, skl.predict(X), "--", label="Sklearn (fit_intercept=True)")
    +plt.grid()
    +plt.legend()
    +
    +plt.show()
    +
    +
    +
    +
    +

The intercept is the value of our output/target variable when all features are zero, that is, where the fitted function crosses the \(y\)-axis (in the one-dimensional case).

    +

Printing the MSE, we see first that both methods give the same MSE, as they should. However, when we move to, for example, Ridge regression, the way we treat the intercept matters: if the intercept column is kept in the design matrix, \(\theta_0\) is penalized together with the other parameters. Not including the intercept in the fit means that the regularization term does not include \(\theta_0\). For different values of \(\lambda\), this may lead to different MSE values.

    +

    To remind the reader, the regularization term, with the intercept in Ridge regression, is given by

    +
    +\[ +\lambda \vert\vert \boldsymbol{\theta} \vert\vert_2^2 = \lambda \sum_{j=0}^{p-1}\theta_j^2, +\]
    +

    but when we take out the intercept, this equation becomes

    +
    +\[ +\lambda \vert\vert \boldsymbol{\theta} \vert\vert_2^2 = \lambda \sum_{j=1}^{p-1}\theta_j^2. +\]
    +

    For Lasso regression we have

    +
    +\[ +\lambda \vert\vert \boldsymbol{\theta} \vert\vert_1 = \lambda \sum_{j=1}^{p-1}\vert\theta_j\vert. +\]
    +

It means that, when we scale the design matrix and the outputs/targets by subtracting the mean values, we have an optimization problem which does not penalize the intercept. The resulting MSE can then be smaller since the regularization acts only on the remaining parameters. If we instead keep the intercept in the design matrix, it enters the penalty term and thereby also affects the MSE.

    +

Armed with this wisdom, we first simply set fit_intercept=False in Scikit-Learn's Ridge function and compare with our own matrix-inversion implementation for our well-known vanilla data set.

    +
    +
    +
    import numpy as np
    +import pandas as pd
    +import matplotlib.pyplot as plt
    +from sklearn.model_selection import train_test_split
    +from sklearn import linear_model
    +
    +def MSE(y_data,y_model):
    +    n = np.size(y_model)
    +    return np.sum((y_data-y_model)**2)/n
    +
    +
    +# A seed just to ensure that the random numbers are the same for every run.
    +# Useful for eventual debugging.
    +np.random.seed(3155)
    +
    +n = 100
    +x = np.random.rand(n)
    +y = np.exp(-x**2) + 1.5 * np.exp(-(x-2)**2)
    +
    +Maxpolydegree = 20
    +X = np.zeros((n,Maxpolydegree))
+#We include explicitly the intercept column
    +for degree in range(Maxpolydegree):
    +    X[:,degree] = x**degree
    +# We split the data in test and training data
    +X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
    +
    +p = Maxpolydegree
    +I = np.eye(p,p)
    +# Decide which values of lambda to use
    +nlambdas = 6
    +MSEOwnRidgePredict = np.zeros(nlambdas)
    +MSERidgePredict = np.zeros(nlambdas)
    +lambdas = np.logspace(-4, 2, nlambdas)
    +for i in range(nlambdas):
    +    lmb = lambdas[i]
    +    OwnRidgeTheta = np.linalg.pinv(X_train.T @ X_train+lmb*I) @ X_train.T @ y_train
    +    # Note: we include the intercept column and no scaling
    +    RegRidge = linear_model.Ridge(lmb,fit_intercept=False)
    +    RegRidge.fit(X_train,y_train)
    +    # and then make the prediction
    +    ytildeOwnRidge = X_train @ OwnRidgeTheta
    +    ypredictOwnRidge = X_test @ OwnRidgeTheta
    +    ytildeRidge = RegRidge.predict(X_train)
    +    ypredictRidge = RegRidge.predict(X_test)
    +    MSEOwnRidgePredict[i] = MSE(y_test,ypredictOwnRidge)
    +    MSERidgePredict[i] = MSE(y_test,ypredictRidge)
    +    print("Theta values for own Ridge implementation")
    +    print(OwnRidgeTheta)
    +    print("Theta values for Scikit-Learn Ridge implementation")
    +    print(RegRidge.coef_)
    +    print("MSE values for own Ridge implementation")
    +    print(MSEOwnRidgePredict[i])
    +    print("MSE values for Scikit-Learn Ridge implementation")
    +    print(MSERidgePredict[i])
    +
    +# Now plot the results
    +plt.figure()
    +plt.plot(np.log10(lambdas), MSEOwnRidgePredict, 'r', label = 'MSE own Ridge Test')
    +plt.plot(np.log10(lambdas), MSERidgePredict, 'g', label = 'MSE Ridge Test')
    +
    +plt.xlabel('log10(lambda)')
    +plt.ylabel('MSE')
    +plt.legend()
    +plt.show()
    +
    +
    +
    +
    +

The results here agree very well when we force Scikit-Learn's Ridge function (with fit_intercept=False) to treat the first column of our design matrix as an ordinary feature. Here we have thus explicitly included the intercept column in the design matrix. What happens if we do not include the intercept column in our fit? Let us see how we can change this code by zero centering the data.

    +
    +
    +
    import numpy as np
    +import pandas as pd
    +import matplotlib.pyplot as plt
    +from sklearn.model_selection import train_test_split
    +from sklearn import linear_model
    +from sklearn.preprocessing import StandardScaler
    +
    +def MSE(y_data,y_model):
    +    n = np.size(y_model)
    +    return np.sum((y_data-y_model)**2)/n
    +# A seed just to ensure that the random numbers are the same for every run.
    +# Useful for eventual debugging.
    +np.random.seed(315)
    +
    +n = 100
    +x = np.random.rand(n)
    +y = np.exp(-x**2) + 1.5 * np.exp(-(x-2)**2)
    +
    +Maxpolydegree = 20
    +X = np.zeros((n,Maxpolydegree-1))
    +
    +for degree in range(1,Maxpolydegree): #No intercept column
    +    X[:,degree-1] = x**(degree)
    +
    +# We split the data in test and training data
    +X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
    +
    +#For our own implementation, we will need to deal with the intercept by centering the design matrix and the target variable
    +X_train_mean = np.mean(X_train,axis=0)
    +#Center by removing mean from each feature
    +X_train_scaled = X_train - X_train_mean 
    +X_test_scaled = X_test - X_train_mean
    +#The model intercept (called y_scaler) is given by the mean of the target variable (IF X is centered)
    +#Remove the intercept from the training data.
    +y_scaler = np.mean(y_train)           
    +y_train_scaled = y_train - y_scaler   
    +
    +p = Maxpolydegree-1
    +I = np.eye(p,p)
    +# Decide which values of lambda to use
    +nlambdas = 6
    +MSEOwnRidgePredict = np.zeros(nlambdas)
    +MSERidgePredict = np.zeros(nlambdas)
    +
    +lambdas = np.logspace(-4, 2, nlambdas)
    +for i in range(nlambdas):
    +    lmb = lambdas[i]
    +    OwnRidgeTheta = np.linalg.pinv(X_train_scaled.T @ X_train_scaled+lmb*I) @ X_train_scaled.T @ (y_train_scaled)
    +    intercept_ = y_scaler - X_train_mean@OwnRidgeTheta #The intercept can be shifted so the model can predict on uncentered data
    +    #Add intercept to prediction
    +    ypredictOwnRidge = X_test_scaled @ OwnRidgeTheta + y_scaler 
    +    RegRidge = linear_model.Ridge(lmb)
    +    RegRidge.fit(X_train,y_train)
    +    ypredictRidge = RegRidge.predict(X_test)
    +    MSEOwnRidgePredict[i] = MSE(y_test,ypredictOwnRidge)
    +    MSERidgePredict[i] = MSE(y_test,ypredictRidge)
    +    print("Theta values for own Ridge implementation")
    +    print(OwnRidgeTheta) #Intercept is given by mean of target variable
    +    print("Theta values for Scikit-Learn Ridge implementation")
    +    print(RegRidge.coef_)
    +    print('Intercept from own implementation:')
    +    print(intercept_)
    +    print('Intercept from Scikit-Learn Ridge implementation')
    +    print(RegRidge.intercept_)
    +    print("MSE values for own Ridge implementation")
    +    print(MSEOwnRidgePredict[i])
    +    print("MSE values for Scikit-Learn Ridge implementation")
    +    print(MSERidgePredict[i])
    +
    +
    +# Now plot the results
    +plt.figure()
    +plt.plot(np.log10(lambdas), MSEOwnRidgePredict, 'b--', label = 'MSE own Ridge Test')
    +plt.plot(np.log10(lambdas), MSERidgePredict, 'g--', label = 'MSE SL Ridge Test')
    +plt.xlabel('log10(lambda)')
    +plt.ylabel('MSE')
    +plt.legend()
    +plt.show()
    +
    +
    +
    +
    +

We see here, compared to the code which explicitly includes the intercept column, that our MSE value is actually smaller. This is because the regularization term no longer includes the intercept value \(\theta_0\) in the fit. This applies to Lasso regularization as well. Our optimization is now done only with the centered matrix and vector that enter the fitting procedure.

    +
    +
\ No newline at end of file diff --git a/doc/LectureNotes/_build/html/week38.html b/doc/LectureNotes/_build/html/week38.html new file mode 100644 index 000000000..b62f3b3f0 --- /dev/null +++ b/doc/LectureNotes/_build/html/week38.html @@ -0,0 +1,1860 @@


    Week 38: Statistical analysis, bias-variance tradeoff and resampling methods#

    +

    Morten Hjorth-Jensen, Department of Physics and Center for Computing in Science Education, University of Oslo, Norway

    +

    Date: September 15-19, 2025

    +
    +

    Plans for week 38, lecture Monday September 15#

    +

    Material for the lecture on Monday September 15.

    +
1. Statistical interpretation of OLS and various expectation values

2. Resampling techniques, Bootstrap and cross-validation, and the bias-variance tradeoff

3. The material we did not cover last week, that is the more advanced methods for updating the learning rate, is covered in its own video. We will briefly discuss these topics at the beginning of the lecture and during the lab sessions. See the video on ADAgrad, RMSprop and ADAM (material from last week not covered during lecture) at https://youtu.be/J_41Hld6tTU

4. Video of Lecture

5. Whiteboard notes
    +
    +
    +

    Readings and Videos#

    +
1. Raschka et al., pages 175-192

2. Hastie et al., Chapter 7; here we recommend 7.1-7.5, 7.10 (cross-validation) and 7.11 (bootstrap). See https://link.springer.com/book/10.1007/978-0-387-84858-7.

3. Video on bias-variance tradeoff

4. Video on Bootstrapping

5. Video on cross validation
    +

    For the lab session, the following video on cross validation (from 2024), could be helpful, see https://www.youtube.com/watch?v=T9jjWsmsd1o

    +
    +
    +

    Linking the regression analysis with a statistical interpretation#

    +

    We will now couple the discussions of ordinary least squares, Ridge +and Lasso regression with a statistical interpretation, that is we +move from a linear algebra analysis to a statistical analysis. In +particular, we will focus on what the regularization terms can result +in. We will amongst other things show that the regularization +parameter can reduce considerably the variance of the parameters +\(\theta\).

    +

One of the advantages of doing linear regression is that we actually end up with analytical expressions for several statistical quantities. Standard least squares and Ridge regression allow us to derive quantities like the variance and other expectation values in a rather straightforward way.

    +

    It is assumed that \(\varepsilon_i +\sim \mathcal{N}(0, \sigma^2)\) and the \(\varepsilon_{i}\) are +independent, i.e.:

    +
    +\[\begin{split} +\begin{align*} +\mbox{Cov}(\varepsilon_{i_1}, +\varepsilon_{i_2}) & = \left\{ \begin{array}{lcc} \sigma^2 & \mbox{if} +& i_1 = i_2, \\ 0 & \mbox{if} & i_1 \not= i_2. \end{array} \right. +\end{align*} +\end{split}\]
    +

    The randomness of \(\varepsilon_i\) implies that +\(\mathbf{y}_i\) is also a random variable. In particular, +\(\mathbf{y}_i\) is normally distributed, because \(\varepsilon_i \sim +\mathcal{N}(0, \sigma^2)\) and \(\mathbf{X}_{i,\ast} \, \boldsymbol{\theta}\) is a +non-random scalar. To specify the parameters of the distribution of +\(\mathbf{y}_i\) we need to calculate its first two moments.

    +

Recall that \(\boldsymbol{X}\) is a matrix of dimensionality \(n\times p\). The notation \(\mathbf{X}_{i,\ast}\) refers to row number \(i\); in the product \(\mathbf{X}_{i,\ast}\,\boldsymbol{\theta}\) we sum over all \(p\) columns.

    +
    +
    +

    Assumptions made#

    +

    The assumption we have made here can be summarized as (and this is going to be useful when we discuss the bias-variance trade off) +that there exists a function \(f(\boldsymbol{x})\) and a normal distributed error \(\boldsymbol{\varepsilon}\sim \mathcal{N}(0, \sigma^2)\) +which describe our data

    +
    +\[ +\boldsymbol{y} = f(\boldsymbol{x})+\boldsymbol{\varepsilon} +\]
    +

    We approximate this function with our model from the solution of the linear regression equations, that is our +function \(f\) is approximated by \(\boldsymbol{\tilde{y}}\) where we want to minimize \((\boldsymbol{y}-\boldsymbol{\tilde{y}})^2\), our MSE, with

    +
    +\[ +\boldsymbol{\tilde{y}} = \boldsymbol{X}\boldsymbol{\theta}. +\]
    +
    +
    +

    Expectation value and variance#

    +

    We can calculate the expectation value of \(\boldsymbol{y}\) for a given element \(i\)

    +
    +\[ +\begin{align*} +\mathbb{E}(y_i) & = +\mathbb{E}(\mathbf{X}_{i, \ast} \, \boldsymbol{\theta}) + \mathbb{E}(\varepsilon_i) +\, \, \, = \, \, \, \mathbf{X}_{i, \ast} \, \theta, +\end{align*} +\]
    +

    while +its variance is

    +
    +\[\begin{split} +\begin{align*} \mbox{Var}(y_i) & = \mathbb{E} \{ [y_i +- \mathbb{E}(y_i)]^2 \} \, \, \, = \, \, \, \mathbb{E} ( y_i^2 ) - +[\mathbb{E}(y_i)]^2 \\ & = \mathbb{E} [ ( \mathbf{X}_{i, \ast} \, +\theta + \varepsilon_i )^2] - ( \mathbf{X}_{i, \ast} \, \boldsymbol{\theta})^2 \\ & += \mathbb{E} [ ( \mathbf{X}_{i, \ast} \, \boldsymbol{\theta})^2 + 2 \varepsilon_i +\mathbf{X}_{i, \ast} \, \boldsymbol{\theta} + \varepsilon_i^2 ] - ( \mathbf{X}_{i, +\ast} \, \theta)^2 \\ & = ( \mathbf{X}_{i, \ast} \, \boldsymbol{\theta})^2 + 2 +\mathbb{E}(\varepsilon_i) \mathbf{X}_{i, \ast} \, \boldsymbol{\theta} + +\mathbb{E}(\varepsilon_i^2 ) - ( \mathbf{X}_{i, \ast} \, \boldsymbol{\theta})^2 +\\ & = \mathbb{E}(\varepsilon_i^2 ) \, \, \, = \, \, \, +\mbox{Var}(\varepsilon_i) \, \, \, = \, \, \, \sigma^2. +\end{align*} +\end{split}\]
    +

    Hence, \(y_i \sim \mathcal{N}( \mathbf{X}_{i, \ast} \, \boldsymbol{\theta}, \sigma^2)\), that is \(\boldsymbol{y}\) follows a normal distribution with +mean value \(\boldsymbol{X}\boldsymbol{\theta}\) and variance \(\sigma^2\) (not be confused with the singular values of the SVD).

    +
    +
    +

    Expectation value and variance for \(\boldsymbol{\theta}\)#

    +

    With the OLS expressions for the optimal parameters \(\boldsymbol{\hat{\theta}}\) we can evaluate the expectation value

    +
    +\[ +\mathbb{E}(\boldsymbol{\hat{\theta}}) = \mathbb{E}[ (\mathbf{X}^{\top} \mathbf{X})^{-1}\mathbf{X}^{T} \mathbf{Y}]=(\mathbf{X}^{T} \mathbf{X})^{-1}\mathbf{X}^{T} \mathbb{E}[ \mathbf{Y}]=(\mathbf{X}^{T} \mathbf{X})^{-1} \mathbf{X}^{T}\mathbf{X}\boldsymbol{\theta}=\boldsymbol{\theta}. +\]
    +

    This means that the estimator of the regression parameters is unbiased.

    +

    We can also calculate the variance

    +

    The variance of the optimal value \(\boldsymbol{\hat{\theta}}\) is

    +
    +\[\begin{split} +\begin{eqnarray*} +\mbox{Var}(\boldsymbol{\hat{\theta}}) & = & \mathbb{E} \{ [\boldsymbol{\theta} - \mathbb{E}(\boldsymbol{\theta})] [\boldsymbol{\theta} - \mathbb{E}(\boldsymbol{\theta})]^{T} \} +\\ +& = & \mathbb{E} \{ [(\mathbf{X}^{T} \mathbf{X})^{-1} \, \mathbf{X}^{T} \mathbf{Y} - \boldsymbol{\theta}] \, [(\mathbf{X}^{T} \mathbf{X})^{-1} \, \mathbf{X}^{T} \mathbf{Y} - \boldsymbol{\theta}]^{T} \} +\\ +% & = & \mathbb{E} \{ [(\mathbf{X}^{T} \mathbf{X})^{-1} \, \mathbf{X}^{T} \mathbf{Y}] \, [(\mathbf{X}^{T} \mathbf{X})^{-1} \, \mathbf{X}^{T} \mathbf{Y}]^{T} \} - \boldsymbol{\theta} \, \boldsymbol{\theta}^{T} +% \\ +% & = & \mathbb{E} \{ (\mathbf{X}^{T} \mathbf{X})^{-1} \, \mathbf{X}^{T} \mathbf{Y} \, \mathbf{Y}^{T} \, \mathbf{X} \, (\mathbf{X}^{T} \mathbf{X})^{-1} \} - \boldsymbol{\theta} \, \boldsymbol{\theta}^{T} +% \\ +& = & (\mathbf{X}^{T} \mathbf{X})^{-1} \, \mathbf{X}^{T} \, \mathbb{E} \{ \mathbf{Y} \, \mathbf{Y}^{T} \} \, \mathbf{X} \, (\mathbf{X}^{T} \mathbf{X})^{-1} - \boldsymbol{\theta} \, \boldsymbol{\theta}^{T} +\\ +& = & (\mathbf{X}^{T} \mathbf{X})^{-1} \, \mathbf{X}^{T} \, \{ \mathbf{X} \, \boldsymbol{\theta} \, \boldsymbol{\theta}^{T} \, \mathbf{X}^{T} + \sigma^2 \} \, \mathbf{X} \, (\mathbf{X}^{T} \mathbf{X})^{-1} - \boldsymbol{\theta} \, \boldsymbol{\theta}^{T} +% \\ +% & = & (\mathbf{X}^T \mathbf{X})^{-1} \, \mathbf{X}^T \, \mathbf{X} \, \boldsymbol{\theta} \, \boldsymbol{\theta}^T \, \mathbf{X}^T \, \mathbf{X} \, (\mathbf{X}^T % \mathbf{X})^{-1} +% \\ +% & & + \, \, \sigma^2 \, (\mathbf{X}^T \mathbf{X})^{-1} \, \mathbf{X}^T \, \mathbf{X} \, (\mathbf{X}^T \mathbf{X})^{-1} - \boldsymbol{\theta} \boldsymbol{\theta}^T +\\ +& = & \boldsymbol{\theta} \, \boldsymbol{\theta}^{T} + \sigma^2 \, (\mathbf{X}^{T} \mathbf{X})^{-1} - \boldsymbol{\theta} \, \boldsymbol{\theta}^{T} +\, \, \, = \, \, \, \sigma^2 \, (\mathbf{X}^{T} \mathbf{X})^{-1}, +\end{eqnarray*} +\end{split}\]
    +

    where we have used that \(\mathbb{E} (\mathbf{Y} \mathbf{Y}^{T}) = +\mathbf{X} \, \boldsymbol{\theta} \, \boldsymbol{\theta}^{T} \, \mathbf{X}^{T} + +\sigma^2 \, \mathbf{I}_{nn}\). From \(\mbox{Var}(\boldsymbol{\theta}) = \sigma^2 +\, (\mathbf{X}^{T} \mathbf{X})^{-1}\), one obtains an estimate of the +variance of the estimate of the \(j\)-th regression coefficient: +\(\boldsymbol{\sigma}^2 (\boldsymbol{\theta}_j ) = \boldsymbol{\sigma}^2 [(\mathbf{X}^{T} \mathbf{X})^{-1}]_{jj} \). This may be used to +construct a confidence interval for the estimates.

    +

    In a similar way, we can obtain analytical expressions for say the +expectation values of the parameters \(\boldsymbol{\theta}\) and their variance +when we employ Ridge regression, allowing us again to define a confidence interval.

    +

    It is rather straightforward to show that

    +
    +\[ +\mathbb{E} \big[ \boldsymbol{\theta}^{\mathrm{Ridge}} \big]=(\mathbf{X}^{T} \mathbf{X} + \lambda \mathbf{I}_{pp})^{-1} (\mathbf{X}^{\top} \mathbf{X})\boldsymbol{\theta}^{\mathrm{OLS}}. +\]
    +

    We see clearly that +\(\mathbb{E} \big[ \boldsymbol{\theta}^{\mathrm{Ridge}} \big] \not= \boldsymbol{\theta}^{\mathrm{OLS}}\) for any \(\lambda > 0\). We say then that the ridge estimator is biased.

    +

    We can also compute the variance as

    +
    +\[ +\mbox{Var}[\boldsymbol{\theta}^{\mathrm{Ridge}}]=\sigma^2[ \mathbf{X}^{T} \mathbf{X} + \lambda \mathbf{I} ]^{-1} \mathbf{X}^{T} \mathbf{X} \{ [ \mathbf{X}^{\top} \mathbf{X} + \lambda \mathbf{I} ]^{-1}\}^{T}, +\]
    +

    and it is easy to see that if the parameter \(\lambda\) goes to infinity then the variance of Ridge parameters \(\boldsymbol{\theta}\) goes to zero.

    +

    With this, we can compute the difference

    +
    +\[ +\mbox{Var}[\boldsymbol{\theta}^{\mathrm{OLS}}]-\mbox{Var}(\boldsymbol{\theta}^{\mathrm{Ridge}})=\sigma^2 [ \mathbf{X}^{T} \mathbf{X} + \lambda \mathbf{I} ]^{-1}[ 2\lambda\mathbf{I} + \lambda^2 (\mathbf{X}^{T} \mathbf{X})^{-1} ] \{ [ \mathbf{X}^{T} \mathbf{X} + \lambda \mathbf{I} ]^{-1}\}^{T}. +\]
    +

    The difference is non-negative definite since each component of the +matrix product is non-negative definite. +This means the variance we obtain with the standard OLS will always for \(\lambda > 0\) be larger than the variance of \(\boldsymbol{\theta}\) obtained with the Ridge estimator. This has interesting consequences when we discuss the so-called bias-variance trade-off below.

    +
    +
    +
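To make this statement concrete, the following small sketch (the random design matrix, the noise variance \(\sigma^2=1\) and \(\lambda=0.5\) are assumptions chosen only for illustration) computes both covariance matrices and checks that the eigenvalues of their difference are non-negative.

import numpy as np

rng = np.random.default_rng(2025)
n, p = 100, 5
X = rng.normal(size=(n, p))
sigma2 = 1.0   # assumed noise variance
lmb = 0.5      # assumed Ridge parameter

XtX = X.T @ X
I = np.eye(p)

# Var(theta_OLS) = sigma^2 (X^T X)^{-1}
var_ols = sigma2 * np.linalg.inv(XtX)
# Var(theta_Ridge) = sigma^2 (X^T X + lambda I)^{-1} X^T X [(X^T X + lambda I)^{-1}]^T
A = np.linalg.inv(XtX + lmb*I)
var_ridge = sigma2 * A @ XtX @ A.T

# The difference should be positive semi-definite (all eigenvalues non-negative)
eigvals = np.linalg.eigvalsh(var_ols - var_ridge)
print(eigvals.min() >= -1e-10)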

    Deriving OLS from a probability distribution#

    +

    Our basic assumption when we derived the OLS equations was to assume +that our output is determined by a given continuous function +\(f(\boldsymbol{x})\) and a random noise \(\boldsymbol{\epsilon}\) given by the normal +distribution with zero mean value and an undetermined variance +\(\sigma^2\).

    +

    We found above that the outputs \(\boldsymbol{y}\) have a mean value given by +\(\boldsymbol{X}\hat{\boldsymbol{\theta}}\) and variance \(\sigma^2\). Since the entries to +the design matrix are not stochastic variables, we can assume that the +probability distribution of our targets is also a normal distribution +but now with mean value \(\boldsymbol{X}\hat{\boldsymbol{\theta}}\). This means that a +single output \(y_i\) is given by the Gaussian distribution

    +
    +\[ +y_i\sim \mathcal{N}(\boldsymbol{X}_{i,*}\boldsymbol{\theta}, \sigma^2)=\frac{1}{\sqrt{2\pi\sigma^2}}\exp{\left[-\frac{(y_i-\boldsymbol{X}_{i,*}\boldsymbol{\theta})^2}{2\sigma^2}\right]}. +\]
    +
    +
    +

    Independent and Identically Distributed (iid)#

    +

    We assume now that the various \(y_i\) values are stochastically distributed according to the above Gaussian distribution. +We define this distribution as

    +
    +\[ +p(y_i, \boldsymbol{X}\vert\boldsymbol{\theta})=\frac{1}{\sqrt{2\pi\sigma^2}}\exp{\left[-\frac{(y_i-\boldsymbol{X}_{i,*}\boldsymbol{\theta})^2}{2\sigma^2}\right]}, +\]
    +

    which reads as finding the likelihood of an event \(y_i\) with the input variables \(\boldsymbol{X}\) given the parameters (to be determined) \(\boldsymbol{\theta}\).

    +

Since these events are assumed to be independent and identically distributed, we can build the probability distribution function (PDF) for all possible events \(\boldsymbol{y}\) as the product of the single-event distributions, that is we have

    +
    +\[ +p(\boldsymbol{y},\boldsymbol{X}\vert\boldsymbol{\theta})=\prod_{i=0}^{n-1}\frac{1}{\sqrt{2\pi\sigma^2}}\exp{\left[-\frac{(y_i-\boldsymbol{X}_{i,*}\boldsymbol{\theta})^2}{2\sigma^2}\right]}=\prod_{i=0}^{n-1}p(y_i,\boldsymbol{X}\vert\boldsymbol{\theta}). +\]
    +

We will write this in a more compact form reserving \(\boldsymbol{D}\) for the domain of events, including the outputs (targets) and the inputs. In the simple case of one-dimensional inputs and outputs we have

    +
    +\[ +\boldsymbol{D}=[(x_0,y_0), (x_1,y_1),\dots, (x_{n-1},y_{n-1})]. +\]
    +

    In the more general case the various inputs should be replaced by the possible features represented by the input data set \(\boldsymbol{X}\). +We can now rewrite the above probability as

    +
    +\[ +p(\boldsymbol{D}\vert\boldsymbol{\theta})=\prod_{i=0}^{n-1}\frac{1}{\sqrt{2\pi\sigma^2}}\exp{\left[-\frac{(y_i-\boldsymbol{X}_{i,*}\boldsymbol{\theta})^2}{2\sigma^2}\right]}. +\]
    +

    It is a conditional probability (see below) and reads as the likelihood of a domain of events \(\boldsymbol{D}\) given a set of parameters \(\boldsymbol{\theta}\).

    +
    +
    +

    Maximum Likelihood Estimation (MLE)#

    +

    In statistics, maximum likelihood estimation (MLE) is a method of +estimating the parameters of an assumed probability distribution, +given some observed data. This is achieved by maximizing a likelihood +function so that, under the assumed statistical model, the observed +data is the most probable.

    +

We will assume here that our events are given by the above Gaussian distribution and we will determine the optimal parameters \(\theta\) by maximizing the above PDF. However, computing the derivatives of a product function is cumbersome and can easily lead to overflow and/or underflow problems, with potential loss of numerical precision.

    +

    In practice, it is more convenient to maximize the logarithm of the +PDF because it is a monotonically increasing function of the argument. +Alternatively, and this will be our option, we will minimize the +negative of the logarithm since this is a monotonically decreasing +function.

    +

    Note also that maximization/minimization of the logarithm of the PDF +is equivalent to the maximization/minimization of the function itself.

    +
    +
    +

    A new Cost Function#

    +

    We could now define a new cost function to minimize, namely the negative logarithm of the above PDF

    +
    +\[ +C(\boldsymbol{\theta})=-\log{\prod_{i=0}^{n-1}p(y_i,\boldsymbol{X}\vert\boldsymbol{\theta})}=-\sum_{i=0}^{n-1}\log{p(y_i,\boldsymbol{X}\vert\boldsymbol{\theta})}, +\]
    +

    which becomes

    +
    +\[ +C(\boldsymbol{\theta})=\frac{n}{2}\log{2\pi\sigma^2}+\frac{\vert\vert (\boldsymbol{y}-\boldsymbol{X}\boldsymbol{\theta})\vert\vert_2^2}{2\sigma^2}. +\]
    +

    Taking the derivative of the new cost function with respect to the parameters \(\theta\) we recognize our familiar OLS equation, namely

    +
    +\[ +\boldsymbol{X}^T\left(\boldsymbol{y}-\boldsymbol{X}\boldsymbol{\theta}\right) =0, +\]
    +

which leads to the well-known OLS equation for the optimal parameters \(\theta\)

    +
+\[ +\hat{\boldsymbol{\theta}}^{\mathrm{OLS}}=\left(\boldsymbol{X}^T\boldsymbol{X}\right)^{-1}\boldsymbol{X}^T\boldsymbol{y}. +\]
    +
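As a quick numerical sketch (the synthetic data, the assumed known noise variance and the use of scipy.optimize.minimize are illustrative choices, not part of the derivation), we can verify that minimizing the negative log-likelihood gives the same parameters as the analytical OLS expression.

import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2025)
n = 100
x = rng.uniform(size=n)
X = np.c_[np.ones(n), x, x**2]
y = 2.0 + 3.0*x + 4.0*x**2 + 0.1*rng.normal(size=n)
sigma2 = 0.1**2   # assumed known noise variance

# Negative log-likelihood, including the constant term
def neg_log_likelihood(theta):
    residual = y - X @ theta
    return 0.5*n*np.log(2*np.pi*sigma2) + residual @ residual/(2*sigma2)

theta_mle = minimize(neg_log_likelihood, x0=np.zeros(3)).x
theta_ols = np.linalg.pinv(X.T @ X) @ X.T @ y
print(theta_mle)
print(theta_ols)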

    Next week we will make a similar analysis for Ridge and Lasso regression

    +
    +
    +

    Why resampling methods#

    +

Before we proceed, we need to rethink what we have been doing. In our eagerness to fit the data, we have omitted several important elements in our regression analysis. In what follows we will

    +
1. look at statistical properties, including a discussion of mean values, variance and the so-called bias-variance tradeoff

2. introduce resampling techniques like cross-validation, bootstrapping, the jackknife and more
    +

    and discuss how to select a given model (one of the difficult parts in machine learning).

    +
    +
    +

    Resampling methods#

    +

    Resampling methods are an indispensable tool in modern +statistics. They involve repeatedly drawing samples from a training +set and refitting a model of interest on each sample in order to +obtain additional information about the fitted model. For example, in +order to estimate the variability of a linear regression fit, we can +repeatedly draw different samples from the training data, fit a linear +regression to each new sample, and then examine the extent to which +the resulting fits differ. Such an approach may allow us to obtain +information that would not be available from fitting the model only +once using the original training sample.

    +

    Two resampling methods are often used in Machine Learning analyses,

    +
1. The bootstrap method

2. Cross-Validation
    +

    In addition there are several other methods such as the Jackknife and the Blocking methods. We will discuss in particular +cross-validation and the bootstrap method.

    +
    +
    +

    Resampling approaches can be computationally expensive#

    +

    Resampling approaches can be computationally expensive, because they +involve fitting the same statistical method multiple times using +different subsets of the training data. However, due to recent +advances in computing power, the computational requirements of +resampling methods generally are not prohibitive. In this chapter, we +discuss two of the most commonly used resampling methods, +cross-validation and the bootstrap. Both methods are important tools +in the practical application of many statistical learning +procedures. For example, cross-validation can be used to estimate the +test error associated with a given statistical learning method in +order to evaluate its performance, or to select the appropriate level +of flexibility. The process of evaluating a model’s performance is +known as model assessment, whereas the process of selecting the proper +level of flexibility for a model is known as model selection. The +bootstrap is widely used.

    +
    +
    +

    Why resampling methods ?#

    +

    Statistical analysis.

    +
• Our simulations can be treated as computer experiments. This is particularly the case for Monte Carlo methods, which are widely used in statistical analyses.

• The results can be analysed with the same statistical tools as we would use when analysing experimental data.

• As in all experiments, we are looking for expectation values and an estimate of how accurate they are, i.e., possible sources of errors.
    +
    +
    +

    Statistical analysis#

    +
• As in other experiments, many numerical experiments have two classes of errors:

  • Statistical errors

  • Systematical errors

• Statistical errors can be estimated using standard tools from statistics

• Systematical errors are method specific and must be treated differently from case to case.
    +
    +
    +

    Resampling methods#

    +

    With all these analytical equations for both the OLS and Ridge +regression, we will now outline how to assess a given model. This will +lead to a discussion of the so-called bias-variance tradeoff (see +below) and so-called resampling methods.

    +

    One of the quantities we have discussed as a way to measure errors is +the mean-squared error (MSE), mainly used for fitting of continuous +functions. Another choice is the absolute error.

    +

    In the discussions below we will focus on the MSE and in particular since we will split the data into test and training data, +we discuss the

    +
1. prediction error or simply the test error \(\mathrm{Err_{Test}}\), where we have a fixed training set and the test error is the MSE arising from the data reserved for testing. We discuss also the

2. training error \(\mathrm{Err_{Train}}\), which is the average loss over the training data.
    +

As our model becomes more and more complex, it adapts to more and more of the structure in the training data. This may lead to a decrease in the bias (see below for a code example) and a slight increase of the variance of the test error. For a certain level of complexity the test error reaches a minimum, before starting to increase again, while the training error saturates.
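A minimal sketch of this behaviour (using synthetic data, Scikit-Learn, and polynomial degrees chosen here only as illustrative assumptions) fits polynomials of increasing degree and prints the training and test MSE:

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(2025)
n = 100
x = rng.uniform(size=(n, 1))
y = np.exp(-x[:, 0]**2) + 1.5*np.exp(-(x[:, 0]-2)**2) + 0.05*rng.normal(size=n)
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)

for degree in range(1, 12):
    X_train = PolynomialFeatures(degree).fit_transform(x_train)
    X_test = PolynomialFeatures(degree).fit_transform(x_test)
    model = LinearRegression().fit(X_train, y_train)
    print(degree,
          mean_squared_error(y_train, model.predict(X_train)),
          mean_squared_error(y_test, model.predict(X_test)))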

    +
    +
    +

    Resampling methods: Bootstrap#

    +

    Bootstrapping is a non-parametric approach to statistical inference +that substitutes computation for more traditional distributional +assumptions and asymptotic results. Bootstrapping offers a number of +advantages:

    +
1. The bootstrap is quite general, although there are some cases in which it fails.

2. Because it does not require distributional assumptions (such as normally distributed errors), the bootstrap can provide more accurate inferences when the data are not well behaved or when the sample size is small.

3. It is possible to apply the bootstrap to statistics with sampling distributions that are difficult to derive, even asymptotically.

4. It is relatively simple to apply the bootstrap to complex data-collection plans (such as stratified and clustered samples).
    +

    The textbook by Davison on the Bootstrap Methods and their Applications provides many more insights and proofs. In this course we will take a more practical approach and use the results and theorems provided in the literature. For those interested in reading more about the bootstrap methods, we recommend the above text and the one by Efron and Tibshirani.

    +

    Before we proceed however, we need to remind ourselves about a central theorem in statistics, namely the so-called central limit theorem.

    +
    +
    +

    The Central Limit Theorem#

    +

Suppose we have a PDF \(p(x)\) from which we generate a series of \(N\) averages \(\mathbb{E}[x_i]\). Each mean value \(\mathbb{E}[x_i]\) is viewed as the average of a specific measurement, e.g., throwing dice 100 times and then taking the average value, or producing a certain number of random numbers. For notational ease, we set \(\mathbb{E}[x_i]=x_i\) in the discussion which follows. We do the same for \(\mathbb{E}[z]=z\).

    +

    If we compute the mean \(z\) of \(m\) such mean values \(x_i\)

    +
    +\[ +z=\frac{x_1+x_2+\dots+x_m}{m}, +\]
    +

    the question we pose is which is the PDF of the new variable \(z\).

    +
    +
    +

    Finding the Limit#

    +

    The probability of obtaining an average value \(z\) is the product of the +probabilities of obtaining arbitrary individual mean values \(x_i\), +but with the constraint that the average is \(z\). We can express this through +the following expression

    +
    +\[ +\tilde{p}(z)=\int dx_1p(x_1)\int dx_2p(x_2)\dots\int dx_mp(x_m) + \delta(z-\frac{x_1+x_2+\dots+x_m}{m}), +\]
    +

where the \(\delta\)-function embodies the constraint that the mean is \(z\). All measurements that lead to each individual \(x_i\) are expected to be independent, which in turn means that we can express \(\tilde{p}\) as the product of individual \(p(x_i)\). The independence assumption is important in the derivation of the central limit theorem.

    +
    +
    +

    Rewriting the \(\delta\)-function#

    +

    If we use the integral expression for the \(\delta\)-function

    +
    +\[ +\delta(z-\frac{x_1+x_2+\dots+x_m}{m})=\frac{1}{2\pi}\int_{-\infty}^{\infty} + dq\exp{\left(iq(z-\frac{x_1+x_2+\dots+x_m}{m})\right)}, +\]
    +

    and inserting \(e^{i\mu q-i\mu q}\) where \(\mu\) is the mean value +we arrive at

    +
    +\[ +\tilde{p}(z)=\frac{1}{2\pi}\int_{-\infty}^{\infty} + dq\exp{\left(iq(z-\mu)\right)}\left[\int_{-\infty}^{\infty} + dxp(x)\exp{\left(iq(\mu-x)/m\right)}\right]^m, +\]
    +

    with the integral over \(x\) resulting in

    +
    +\[ +\int_{-\infty}^{\infty}dxp(x)\exp{\left(iq(\mu-x)/m\right)}= + \int_{-\infty}^{\infty}dxp(x) + \left[1+\frac{iq(\mu-x)}{m}-\frac{q^2(\mu-x)^2}{2m^2}+\dots\right]. +\]
    +
    +
    +

    Identifying Terms#

    +

    The second term on the rhs disappears since this is just the mean and +employing the definition of \(\sigma^2\) we have

    +
    +\[ +\int_{-\infty}^{\infty}dxp(x)e^{\left(iq(\mu-x)/m\right)}= + 1-\frac{q^2\sigma^2}{2m^2}+\dots, +\]
    +

    resulting in

    +
    +\[ +\left[\int_{-\infty}^{\infty}dxp(x)\exp{\left(iq(\mu-x)/m\right)}\right]^m\approx + \left[1-\frac{q^2\sigma^2}{2m^2}+\dots \right]^m, +\]
    +

    and in the limit \(m\rightarrow \infty\) we obtain

    +
    +\[ +\tilde{p}(z)=\frac{1}{\sqrt{2\pi}(\sigma/\sqrt{m})} + \exp{\left(-\frac{(z-\mu)^2}{2(\sigma/\sqrt{m})^2}\right)}, +\]
    +

which is the normal distribution with variance \(\sigma^2_m=\sigma^2/m\), where \(\sigma^2\) is the variance of the PDF \(p(x)\) and \(\mu\) is the mean of the PDF \(p(x)\).

    +
    +
    +

    Wrapping it up#

    +

    Thus, the central limit theorem states that the PDF \(\tilde{p}(z)\) of +the average of \(m\) random values corresponding to a PDF \(p(x)\) +is a normal distribution whose mean is the +mean value of the PDF \(p(x)\) and whose variance is the variance +of the PDF \(p(x)\) divided by \(m\), the number of values used to compute \(z\).

    +

    The central limit theorem leads to the well-known expression for the +standard deviation, given by

    +
    +\[ +\sigma_m= +\frac{\sigma}{\sqrt{m}}. +\]
    +

    The latter is true only if the average value is known exactly. This is obtained in the limit +\(m\rightarrow \infty\) only. Because the mean and the variance are measured quantities we obtain +the familiar expression in statistics (the so-called Bessel correction)

    +
    +\[ +\sigma_m\approx +\frac{\sigma}{\sqrt{m-1}}. +\]
    +

    In many cases however the above estimate for the standard deviation, +in particular if correlations are strong, may be too simplistic. Keep +in mind that we have assumed that the variables \(x\) are independent +and identically distributed. This is obviously not always the +case. For example, the random numbers (or better pseudorandom numbers) +we generate in various calculations do always exhibit some +correlations.

    +

The theorem is satisfied by a large class of PDFs. Note however that for a finite \(m\), it is not always possible to find a closed-form analytic expression for \(\tilde{p}(z)\).
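As a small numerical sketch of the theorem (the uniform distribution and the sample sizes below are arbitrary choices for illustration), we can compare the spread of sample means with \(\sigma/\sqrt{m}\):

import numpy as np

rng = np.random.default_rng(2025)
m = 100          # number of values entering each average
n_means = 10000  # number of averages we compute

# Uniform distribution on [0,1): mean 1/2, standard deviation 1/sqrt(12)
samples = rng.uniform(size=(n_means, m))
z = samples.mean(axis=1)

sigma = 1.0/np.sqrt(12.0)
print("std of the computed means   :", z.std())
print("CLT prediction sigma/sqrt(m):", sigma/np.sqrt(m))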

    +
    +
    +

    Confidence Intervals#

    +

    Confidence intervals are used in statistics and represent a type of estimate +computed from the observed data. This gives a range of values for an +unknown parameter such as the parameters \(\boldsymbol{\theta}\) from linear regression.

    +

    With the OLS expressions for the parameters \(\boldsymbol{\theta}\) we found +\(\mathbb{E}(\boldsymbol{\theta}) = \boldsymbol{\theta}\), which means that the estimator of the regression parameters is unbiased.

    +

    In the exercises this week we show that the variance of the estimate of the \(j\)-th regression coefficient is +\(\boldsymbol{\sigma}^2 (\boldsymbol{\theta}_j ) = \boldsymbol{\sigma}^2 [(\mathbf{X}^{T} \mathbf{X})^{-1}]_{jj} \).

    +

    This quantity can be used to +construct a confidence interval for the estimates.

    +
    +
    +

    Standard Approach based on the Normal Distribution#

    +

    We will assume that the parameters \(\theta\) follow a normal +distribution. We can then define the confidence interval. Here we will be using as +shorthands \(\mu_{\theta}\) for the above mean value and \(\sigma_{\theta}\) +for the standard deviation. We have then a confidence interval

    +
    +\[ +\left(\mu_{\theta}\pm \frac{z\sigma_{\theta}}{\sqrt{n}}\right), +\]
    +

    where \(z\) defines the level of certainty (or confidence). For a normal +distribution typical parameters are \(z=2.576\) which corresponds to a +confidence of \(99\%\) while \(z=1.96\) corresponds to a confidence of +\(95\%\). A confidence level of \(95\%\) is commonly used and it is +normally referred to as a two-sigmas confidence level, that is we +approximate \(z\approx 2\).
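A small sketch of how such an interval could be computed for the OLS parameters, using the variance expression \(\sigma^2[(\mathbf{X}^{T}\mathbf{X})^{-1}]_{jj}\) derived above (the synthetic data, the assumed known \(\sigma\) and the choice \(z=1.96\) are illustrative assumptions):

import numpy as np

rng = np.random.default_rng(2025)
n = 100
x = rng.uniform(size=n)
X = np.c_[np.ones(n), x, x**2]
sigma = 0.1
y = 2.0 + 3.0*x + 4.0*x**2 + sigma*rng.normal(size=n)

# OLS estimate and the standard deviation of each coefficient
XtX_inv = np.linalg.inv(X.T @ X)
theta_hat = XtX_inv @ X.T @ y
std_theta = sigma*np.sqrt(np.diag(XtX_inv))

# 95% confidence intervals, z = 1.96
z = 1.96
for j in range(len(theta_hat)):
    print(f"theta_{j}: {theta_hat[j]:.3f} +/- {z*std_theta[j]:.3f}")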

    +

    For more discussions of confidence intervals (and in particular linked with a discussion of the bootstrap method), see chapter 5 of the textbook by Davison on the Bootstrap Methods and their Applications

    +

    In this text you will also find an in-depth discussion of the +Bootstrap method, why it works and various theorems related to it.

    +
    +
    +

    Resampling methods: Bootstrap background#

    +

Since \(\widehat{\theta} = \widehat{\theta}(\boldsymbol{X})\) is a function of random variables, \(\widehat{\theta}\) itself must be a random variable. Thus it has a pdf, call this function \(p(\boldsymbol{t})\). The aim of the bootstrap is to estimate \(p(\boldsymbol{t})\) by the relative frequency of \(\widehat{\theta}\). You can think of this as using a histogram in the place of \(p(\boldsymbol{t})\). If the relative frequency closely resembles \(p(\boldsymbol{t})\), then using numerics, it is straightforward to estimate all the interesting parameters of \(p(\boldsymbol{t})\) using point estimators.

    +
    +
    +

    Resampling methods: More Bootstrap background#

    +

    In the case that \(\widehat{\theta}\) has +more than one component, and the components are independent, we use the +same estimator on each component separately. If the probability +density function of \(X_i\), \(p(x)\), had been known, then it would have +been straightforward to do this by:

    +
      +
    1. Drawing lots of numbers from \(p(x)\), suppose we call one such set of numbers \((X_1^*, X_2^*, \cdots, X_n^*)\).

    2. +
    3. Then using these numbers, we could compute a replica of \(\widehat{\theta}\) called \(\widehat{\theta}^*\).

    4. +
    +

    By repeated use of the above two points, many +estimates of \(\widehat{\theta}\) can be obtained. The +idea is to use the relative frequency of \(\widehat{\theta}^*\) +(think of a histogram) as an estimate of \(p(\boldsymbol{t})\).

    +
    +
    +

    Resampling methods: Bootstrap approach#

    +

    But +unless there is enough information available about the process that +generated \(X_1,X_2,\cdots,X_n\), \(p(x)\) is in general +unknown. Therefore, Efron in 1979 asked the +question: What if we replace \(p(x)\) by the relative frequency +of the observation \(X_i\)?

    +

    If we draw observations in accordance with +the relative frequency of the observations, will we obtain the same +result in some asymptotic sense? The answer is yes.

    +
    +
    +

    Resampling methods: Bootstrap steps#

    +

    The independent bootstrap works like this:

    +
1. Draw with replacement \(n\) numbers for the observed variables \(\boldsymbol{x} = (x_1,x_2,\cdots,x_n)\).

2. Define a vector \(\boldsymbol{x}^*\) containing the values which were drawn from \(\boldsymbol{x}\).

3. Using the vector \(\boldsymbol{x}^*\), compute \(\widehat{\theta}^*\) by evaluating \(\widehat \theta\) under the observations \(\boldsymbol{x}^*\).

4. Repeat this process \(k\) times.
    +

When you are done, you can draw a histogram of the relative frequency of \(\widehat \theta^*\). This is your estimate of the probability distribution \(p(t)\). Using this probability distribution you can estimate any statistics thereof. In principle you never draw the histogram of the relative frequency of \(\widehat{\theta}^*\). Instead you use the estimators corresponding to the statistic of interest. For example, if you are interested in estimating the variance of \(\widehat \theta\), apply the estimator \(\widehat \sigma^2\) to the values \(\widehat \theta^*\).

    +
    +
    +

    Code example for the Bootstrap method#

    +

The following code starts with a Gaussian distribution with mean value \(\mu =100\) and standard deviation \(\sigma=15\). We use this to generate the data used in the bootstrap analysis. The bootstrap analysis returns a data set after a given number of bootstrap operations (as many as we have data points). This data set consists of the estimated mean value for each bootstrap operation. The histogram generated by the bootstrap method shows that the distribution of these mean values is also a Gaussian, centered around the mean value \(\mu=100\) but with standard deviation \(\sigma/\sqrt{n}\), where \(n\) is the size of each bootstrap sample (in this case the same as the number of original data points). The value of the standard deviation is what we expect from the central limit theorem.

    +
    +
    +
    %matplotlib inline
    +
    +import numpy as np
    +from time import time
    +from scipy.stats import norm
    +import matplotlib.pyplot as plt
    +
    +# Returns mean of bootstrap samples 
    +# Bootstrap algorithm
    +def bootstrap(data, datapoints):
    +    t = np.zeros(datapoints)
    +    n = len(data)
    +    # non-parametric bootstrap         
    +    for i in range(datapoints):
    +        t[i] = np.mean(data[np.random.randint(0,n,n)])
    +    # analysis    
    +    print("Bootstrap Statistics :")
    +    print("original           bias      std. error")
    +    print("%8g %8g %14g %15g" % (np.mean(data), np.std(data),np.mean(t),np.std(t)))
    +    return t
    +
    +# We set the mean value to 100 and the standard deviation to 15
    +mu, sigma = 100, 15
    +datapoints = 10000
    +# We generate random numbers according to the normal distribution
    +x = mu + sigma*np.random.randn(datapoints)
    +# bootstrap returns the data sample                                    
    +t = bootstrap(x, datapoints)
    +
    +
    +
    +
    +

We see that the resulting variance, and from that the standard deviation, agrees with the central limit theorem.

    +
    +
    +

    Plotting the Histogram#

    +
    +
    +
    # the histogram of the bootstrapped data (normalized data if density = True)
    +n, binsboot, patches = plt.hist(t, 50, density=True, facecolor='red', alpha=0.75)
    +# add a 'best fit' line  
    +y = norm.pdf(binsboot, np.mean(t), np.std(t))
    +lt = plt.plot(binsboot, y, 'b', linewidth=1)
    +plt.xlabel('x')
    +plt.ylabel('Probability')
    +plt.grid(True)
    +plt.show()
    +
    +
    +
    +
    +
    +
    +

    The bias-variance tradeoff#

    +

    We will discuss the bias-variance tradeoff in the context of +continuous predictions such as regression. However, many of the +intuitions and ideas discussed here also carry over to classification +tasks. Consider a dataset \(\mathcal{D}\) consisting of the data +\(\mathbf{X}_\mathcal{D}=\{(y_j, \boldsymbol{x}_j), j=0\ldots n-1\}\).

    +

    Let us assume that the true data is generated from a noisy model

    +
    +\[ +\boldsymbol{y}=f(\boldsymbol{x}) + \boldsymbol{\epsilon} +\]
    +

where \(\epsilon\) is normally distributed with mean zero and variance \(\sigma^2\).

    +

    In our derivation of the ordinary least squares method we defined then +an approximation to the function \(f\) in terms of the parameters +\(\boldsymbol{\theta}\) and the design matrix \(\boldsymbol{X}\) which embody our model, +that is \(\boldsymbol{\tilde{y}}=\boldsymbol{X}\boldsymbol{\theta}\).

    +

Thereafter we found the parameters \(\boldsymbol{\theta}\) by optimizing the mean squared error via the so-called cost function

    +
    +\[ +C(\boldsymbol{X},\boldsymbol{\theta}) =\frac{1}{n}\sum_{i=0}^{n-1}(y_i-\tilde{y}_i)^2=\mathbb{E}\left[(\boldsymbol{y}-\boldsymbol{\tilde{y}})^2\right]. +\]
    +

    We can rewrite this as

    +
    +\[ +\mathbb{E}\left[(\boldsymbol{y}-\boldsymbol{\tilde{y}})^2\right]=\frac{1}{n}\sum_i(f_i-\mathbb{E}\left[\boldsymbol{\tilde{y}}\right])^2+\frac{1}{n}\sum_i(\tilde{y}_i-\mathbb{E}\left[\boldsymbol{\tilde{y}}\right])^2+\sigma^2. +\]
    +

The first of the three terms represents the square of the bias of the learning method, which can be thought of as the error caused by the simplifying assumptions built into the method. The second term represents the variance of the chosen model and finally the last term is the variance of the error \(\boldsymbol{\epsilon}\).

    +

To derive this equation, we need to recall that the variances of \(\boldsymbol{y}\) and \(\boldsymbol{\epsilon}\) are both equal to \(\sigma^2\). The mean value of \(\boldsymbol{\epsilon}\) is by definition equal to zero. Furthermore, the function \(f\) is not a stochastic variable, idem for \(\boldsymbol{\tilde{y}}\). We use a more compact notation in terms of the expectation value

    +
    +\[ +\mathbb{E}\left[(\boldsymbol{y}-\boldsymbol{\tilde{y}})^2\right]=\mathbb{E}\left[(\boldsymbol{f}+\boldsymbol{\epsilon}-\boldsymbol{\tilde{y}})^2\right], +\]
    +

    and adding and subtracting \(\mathbb{E}\left[\boldsymbol{\tilde{y}}\right]\) we get

    +
    +\[ +\mathbb{E}\left[(\boldsymbol{y}-\boldsymbol{\tilde{y}})^2\right]=\mathbb{E}\left[(\boldsymbol{f}+\boldsymbol{\epsilon}-\boldsymbol{\tilde{y}}+\mathbb{E}\left[\boldsymbol{\tilde{y}}\right]-\mathbb{E}\left[\boldsymbol{\tilde{y}}\right])^2\right], +\]
    +

    which, using the abovementioned expectation values can be rewritten as

    +
    +\[ +\mathbb{E}\left[(\boldsymbol{y}-\boldsymbol{\tilde{y}})^2\right]=\mathbb{E}\left[(\boldsymbol{y}-\mathbb{E}\left[\boldsymbol{\tilde{y}}\right])^2\right]+\mathrm{Var}\left[\boldsymbol{\tilde{y}}\right]+\sigma^2, +\]
    +

    that is the rewriting in terms of the so-called bias, the variance of the model \(\boldsymbol{\tilde{y}}\) and the variance of \(\boldsymbol{\epsilon}\).
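Written out term by term with the same notation as above, the decomposition reads

\[
\mathbb{E}\left[(\boldsymbol{y}-\boldsymbol{\tilde{y}})^2\right]
=\underbrace{\frac{1}{n}\sum_i(f_i-\mathbb{E}\left[\boldsymbol{\tilde{y}}\right])^2}_{\mathrm{Bias}^2}
+\underbrace{\frac{1}{n}\sum_i(\tilde{y}_i-\mathbb{E}\left[\boldsymbol{\tilde{y}}\right])^2}_{\mathrm{Variance}}
+\underbrace{\sigma^2}_{\mathrm{irreducible\; error}},
\]

which is the quantity estimated numerically in the code examples below.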

    +
    +
    +

    A way to Read the Bias-Variance Tradeoff#

Figure 1:

    Example code for Bias-Variance tradeoff#

    +
    +
    +
    import matplotlib.pyplot as plt
    +import numpy as np
    +from sklearn.linear_model import LinearRegression, Ridge, Lasso
    +from sklearn.preprocessing import PolynomialFeatures
    +from sklearn.model_selection import train_test_split
    +from sklearn.pipeline import make_pipeline
    +from sklearn.utils import resample
    +
    +np.random.seed(2018)
    +
    +n = 500
    +n_boostraps = 100
    +degree = 18  # A quite high value, just to show.
    +noise = 0.1
    +
    +# Make data set.
    +x = np.linspace(-1, 3, n).reshape(-1, 1)
    +y = np.exp(-x**2) + 1.5 * np.exp(-(x-2)**2) + np.random.normal(0, 0.1, x.shape)
    +
    +# Hold out some test data that is never used in training.
    +x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)
    +
    +# Combine x transformation and model into one operation.
+# Not necessary, but convenient.
    +model = make_pipeline(PolynomialFeatures(degree=degree), LinearRegression(fit_intercept=False))
    +
    +# The following (m x n_bootstraps) matrix holds the column vectors y_pred
    +# for each bootstrap iteration.
    +y_pred = np.empty((y_test.shape[0], n_boostraps))
    +for i in range(n_boostraps):
    +    x_, y_ = resample(x_train, y_train)
    +
    +    # Evaluate the new model on the same test data each time.
    +    y_pred[:, i] = model.fit(x_, y_).predict(x_test).ravel()
    +
    +# Note: Expectations and variances taken w.r.t. different training
    +# data sets, hence the axis=1. Subsequent means are taken across the test data
    +# set in order to obtain a total value, but before this we have error/bias/variance
    +# calculated per data point in the test set.
    +# Note 2: The use of keepdims=True is important in the calculation of bias as this 
    +# maintains the column vector form. Dropping this yields very unexpected results.
    +error = np.mean( np.mean((y_test - y_pred)**2, axis=1, keepdims=True) )
    +bias = np.mean( (y_test - np.mean(y_pred, axis=1, keepdims=True))**2 )
    +variance = np.mean( np.var(y_pred, axis=1, keepdims=True) )
    +print('Error:', error)
    +print('Bias^2:', bias)
    +print('Var:', variance)
    +print('{} >= {} + {} = {}'.format(error, bias, variance, bias+variance))
    +
    +plt.plot(x[::5, :], y[::5, :], label='f(x)')
    +plt.scatter(x_test, y_test, label='Data points')
    +plt.scatter(x_test, np.mean(y_pred, axis=1), label='Pred')
    +plt.legend()
    +plt.show()
    +
    +
    +
    +
    +
    +
    +

    Understanding what happens#

    +
    +
    +
    import matplotlib.pyplot as plt
    +import numpy as np
    +from sklearn.linear_model import LinearRegression, Ridge, Lasso
    +from sklearn.preprocessing import PolynomialFeatures
    +from sklearn.model_selection import train_test_split
    +from sklearn.pipeline import make_pipeline
    +from sklearn.utils import resample
    +
    +np.random.seed(2018)
    +
    +n = 40
    +n_boostraps = 100
    +maxdegree = 14
    +
    +
    +# Make data set.
    +x = np.linspace(-3, 3, n).reshape(-1, 1)
    +y = np.exp(-x**2) + 1.5 * np.exp(-(x-2)**2)+ np.random.normal(0, 0.1, x.shape)
    +error = np.zeros(maxdegree)
    +bias = np.zeros(maxdegree)
    +variance = np.zeros(maxdegree)
    +polydegree = np.zeros(maxdegree)
    +x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)
    +
    +for degree in range(maxdegree):
    +    model = make_pipeline(PolynomialFeatures(degree=degree), LinearRegression(fit_intercept=False))
    +    y_pred = np.empty((y_test.shape[0], n_boostraps))
    +    for i in range(n_boostraps):
    +        x_, y_ = resample(x_train, y_train)
    +        y_pred[:, i] = model.fit(x_, y_).predict(x_test).ravel()
    +
    +    polydegree[degree] = degree
    +    error[degree] = np.mean( np.mean((y_test - y_pred)**2, axis=1, keepdims=True) )
    +    bias[degree] = np.mean( (y_test - np.mean(y_pred, axis=1, keepdims=True))**2 )
    +    variance[degree] = np.mean( np.var(y_pred, axis=1, keepdims=True) )
    +    print('Polynomial degree:', degree)
    +    print('Error:', error[degree])
    +    print('Bias^2:', bias[degree])
    +    print('Var:', variance[degree])
    +    print('{} >= {} + {} = {}'.format(error[degree], bias[degree], variance[degree], bias[degree]+variance[degree]))
    +
    +plt.plot(polydegree, error, label='Error')
    +plt.plot(polydegree, bias, label='bias')
    +plt.plot(polydegree, variance, label='Variance')
    +plt.legend()
    +plt.show()
    +
    +
    +
    +
    +
    +
    +

    Summing up#

    +

    The bias-variance tradeoff summarizes the fundamental tension in +machine learning, particularly supervised learning, between the +complexity of a model and the amount of training data needed to train +it. Since data is often limited, in practice it is often useful to +use a less-complex model with higher bias, that is a model whose asymptotic +performance is worse than another model because it is easier to +train and less sensitive to sampling noise arising from having a +finite-sized training dataset (smaller variance).

    +

    The above equations tell us that in +order to minimize the expected test error, we need to select a +statistical learning method that simultaneously achieves low variance +and low bias. Note that variance is inherently a nonnegative quantity, +and squared bias is also nonnegative. Hence, we see that the expected +test MSE can never lie below \(Var(\epsilon)\), the irreducible error.

    +

    What do we mean by the variance and bias of a statistical learning +method? The variance refers to the amount by which our model would change if we +estimated it using a different training data set. Since the training +data are used to fit the statistical learning method, different +training data sets will result in a different estimate. But ideally the +estimate for our model should not vary too much between training +sets. However, if a method has high variance then small changes in +the training data can result in large changes in the model. In general, more +flexible statistical methods have higher variance.

    +

    You may also find this recent article of interest.

    +
    +
    +

    Another Example from Scikit-Learn’s Repository#

    +

This example demonstrates the problems of underfitting and overfitting and how we can use linear regression with polynomial features to approximate nonlinear functions. The plot shows the function that we want to approximate, which is a part of the cosine function. In addition, the samples from the real function and the approximations of different models are displayed. The models have polynomial features of different degrees. We can see that a linear function (polynomial with degree 1) is not sufficient to fit the training samples. This is called underfitting. A polynomial of degree 4 approximates the true function almost perfectly. However, for higher degrees the model will overfit the training data, i.e. it learns the noise of the training data. We evaluate overfitting and underfitting quantitatively by using cross-validation. We calculate the mean squared error (MSE) on the validation set; the higher it is, the less likely the model generalizes correctly from the training data.

    +
    +
    +
    
    +
    +#print(__doc__)
    +
    +import numpy as np
    +import matplotlib.pyplot as plt
    +from sklearn.pipeline import Pipeline
    +from sklearn.preprocessing import PolynomialFeatures
    +from sklearn.linear_model import LinearRegression
    +from sklearn.model_selection import cross_val_score
    +
    +
    +def true_fun(X):
    +    return np.cos(1.5 * np.pi * X)
    +
    +np.random.seed(0)
    +
    +n_samples = 30
    +degrees = [1, 4, 15]
    +
    +X = np.sort(np.random.rand(n_samples))
    +y = true_fun(X) + np.random.randn(n_samples) * 0.1
    +
    +plt.figure(figsize=(14, 5))
    +for i in range(len(degrees)):
    +    ax = plt.subplot(1, len(degrees), i + 1)
    +    plt.setp(ax, xticks=(), yticks=())
    +
    +    polynomial_features = PolynomialFeatures(degree=degrees[i],
    +                                             include_bias=False)
    +    linear_regression = LinearRegression()
    +    pipeline = Pipeline([("polynomial_features", polynomial_features),
    +                         ("linear_regression", linear_regression)])
    +    pipeline.fit(X[:, np.newaxis], y)
    +
    +    # Evaluate the models using crossvalidation
    +    scores = cross_val_score(pipeline, X[:, np.newaxis], y,
    +                             scoring="neg_mean_squared_error", cv=10)
    +
    +    X_test = np.linspace(0, 1, 100)
    +    plt.plot(X_test, pipeline.predict(X_test[:, np.newaxis]), label="Model")
    +    plt.plot(X_test, true_fun(X_test), label="True function")
    +    plt.scatter(X, y, edgecolor='b', s=20, label="Samples")
    +    plt.xlabel("x")
    +    plt.ylabel("y")
    +    plt.xlim((0, 1))
    +    plt.ylim((-2, 2))
    +    plt.legend(loc="best")
    +    plt.title("Degree {}\nMSE = {:.2e}(+/- {:.2e})".format(
    +        degrees[i], -scores.mean(), scores.std()))
    +plt.show()
    +
    +
    +
    +
    +
    +
    +

    Various steps in cross-validation#

    +

When the repetitive splitting of the data set is done randomly, samples may accidentally end up in a vast majority of the splits in either training or test set. Such samples may have an unbalanced influence on either model building or prediction evaluation. To avoid this, \(k\)-fold cross-validation structures the data splitting. The samples are divided into \(k\) more or less equally sized exhaustive and mutually exclusive subsets. In turn (at each split) one of these subsets plays the role of the test set while the union of the remaining subsets constitutes the training set. Such a splitting warrants a balanced representation of each sample in both training and test set over the splits. Still, the division into the \(k\) subsets involves a degree of randomness. This may be fully excluded when choosing \(k=n\). This particular case is referred to as leave-one-out cross-validation (LOOCV).
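A minimal sketch of LOOCV on a small hypothetical data set; Scikit-Learn's LeaveOneOut splitter is equivalent to KFold with \(k=n\):

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score

# Hypothetical small data set, just for illustration
n = 20
x = np.random.randn(n, 1)
y = 3*x[:, 0]**2 + np.random.randn(n)

# Leave-one-out cross-validation: every sample is the test set exactly once
scores = cross_val_score(LinearRegression(), x, y,
                         scoring='neg_mean_squared_error', cv=LeaveOneOut())
print("LOOCV estimate of the MSE:", np.mean(-scores))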

    +
    +
    +

    Cross-validation in brief#

    +

    For the various values of \(k\)

    +
      +
1. Shuffle the dataset randomly.

2. Split the dataset into \(k\) groups.

3. For each unique group:

   a. Decide which group to use as the test data set

   b. Take the remaining groups as the training data set

   c. Fit a model on the training set and evaluate it on the test set

   d. Retain the evaluation score and discard the model

4. Summarize the model using the sample of model evaluation scores
    +
    +
    +

    Code Example for Cross-validation and \(k\)-fold Cross-validation#

    +

    The code here uses Ridge regression with cross-validation (CV) resampling and \(k\)-fold CV in order to fit a specific polynomial.

    +
    +
    +
    import numpy as np
    +import matplotlib.pyplot as plt
    +from sklearn.model_selection import KFold
    +from sklearn.linear_model import Ridge
    +from sklearn.model_selection import cross_val_score
    +from sklearn.preprocessing import PolynomialFeatures
    +
    +# A seed just to ensure that the random numbers are the same for every run.
    +# Useful for eventual debugging.
    +np.random.seed(3155)
    +
    +# Generate the data.
    +nsamples = 100
    +x = np.random.randn(nsamples)
    +y = 3*x**2 + np.random.randn(nsamples)
    +
    +## Cross-validation on Ridge regression using KFold only
    +
    +# Decide degree on polynomial to fit
    +poly = PolynomialFeatures(degree = 6)
    +
    +# Decide which values of lambda to use
    +nlambdas = 500
    +lambdas = np.logspace(-3, 5, nlambdas)
    +
    +# Initialize a KFold instance
    +k = 5
    +kfold = KFold(n_splits = k)
    +
    +# Perform the cross-validation to estimate MSE
    +scores_KFold = np.zeros((nlambdas, k))
    +
    +i = 0
    +for lmb in lambdas:
    +    ridge = Ridge(alpha = lmb)
    +    j = 0
    +    for train_inds, test_inds in kfold.split(x):
    +        xtrain = x[train_inds]
    +        ytrain = y[train_inds]
    +
    +        xtest = x[test_inds]
    +        ytest = y[test_inds]
    +
    +        Xtrain = poly.fit_transform(xtrain[:, np.newaxis])
    +        ridge.fit(Xtrain, ytrain[:, np.newaxis])
    +
    +        Xtest = poly.fit_transform(xtest[:, np.newaxis])
    +        ypred = ridge.predict(Xtest)
    +
    +        scores_KFold[i,j] = np.sum((ypred - ytest[:, np.newaxis])**2)/np.size(ypred)
    +
    +        j += 1
    +    i += 1
    +
    +
    +estimated_mse_KFold = np.mean(scores_KFold, axis = 1)
    +
    +## Cross-validation using cross_val_score from sklearn along with KFold
    +
    +# kfold is an instance initialized above as:
    +# kfold = KFold(n_splits = k)
    +
    +estimated_mse_sklearn = np.zeros(nlambdas)
    +i = 0
    +for lmb in lambdas:
    +    ridge = Ridge(alpha = lmb)
    +
    +    X = poly.fit_transform(x[:, np.newaxis])
    +    estimated_mse_folds = cross_val_score(ridge, X, y[:, np.newaxis], scoring='neg_mean_squared_error', cv=kfold)
    +
+    # cross_val_score returns an array containing the estimated negative MSE for every fold.
+    # We take the mean over the folds in order to get an estimate of the MSE of the model.
    +    estimated_mse_sklearn[i] = np.mean(-estimated_mse_folds)
    +
    +    i += 1
    +
    +## Plot and compare the slightly different ways to perform cross-validation
    +
    +plt.figure()
    +
    +plt.plot(np.log10(lambdas), estimated_mse_sklearn, label = 'cross_val_score')
    +plt.plot(np.log10(lambdas), estimated_mse_KFold, 'r--', label = 'KFold')
    +
    +plt.xlabel('log10(lambda)')
    +plt.ylabel('mse')
    +
    +plt.legend()
    +
    +plt.show()
    +
    +
    +
    +
    +
    +
    +

    More examples on bootstrap and cross-validation and errors#

    +
    +
    +
    # Common imports
    +import os
    +import numpy as np
    +import pandas as pd
    +import matplotlib.pyplot as plt
    +from sklearn.linear_model import LinearRegression, Ridge, Lasso
    +from sklearn.model_selection import train_test_split
    +from sklearn.utils import resample
    +from sklearn.metrics import mean_squared_error
    +# Where to save the figures and data files
    +PROJECT_ROOT_DIR = "Results"
    +FIGURE_ID = "Results/FigureFiles"
    +DATA_ID = "DataFiles/"
    +
    +if not os.path.exists(PROJECT_ROOT_DIR):
    +    os.mkdir(PROJECT_ROOT_DIR)
    +
    +if not os.path.exists(FIGURE_ID):
    +    os.makedirs(FIGURE_ID)
    +
    +if not os.path.exists(DATA_ID):
    +    os.makedirs(DATA_ID)
    +
    +def image_path(fig_id):
    +    return os.path.join(FIGURE_ID, fig_id)
    +
    +def data_path(dat_id):
    +    return os.path.join(DATA_ID, dat_id)
    +
    +def save_fig(fig_id):
    +    plt.savefig(image_path(fig_id) + ".png", format='png')
    +
    +infile = open(data_path("EoS.csv"),'r')
    +
    +# Read the EoS data as  csv file and organize the data into two arrays with density and energies
    +EoS = pd.read_csv(infile, names=('Density', 'Energy'))
    +EoS['Energy'] = pd.to_numeric(EoS['Energy'], errors='coerce')
    +EoS = EoS.dropna()
    +Energies = EoS['Energy']
    +Density = EoS['Density']
    +#  The design matrix now as function of various polytrops
    +
    +Maxpolydegree = 30
    +X = np.zeros((len(Density),Maxpolydegree))
    +X[:,0] = 1.0
    +testerror = np.zeros(Maxpolydegree)
    +trainingerror = np.zeros(Maxpolydegree)
    +polynomial = np.zeros(Maxpolydegree)
    +
    +trials = 100
    +for polydegree in range(1, Maxpolydegree):
    +    polynomial[polydegree] = polydegree
    +    for degree in range(polydegree):
    +        X[:,degree] = Density**(degree/3.0)
    +
    +# loop over trials in order to estimate the expectation value of the MSE
    +    testerror[polydegree] = 0.0
    +    trainingerror[polydegree] = 0.0
    +    for samples in range(trials):
    +        x_train, x_test, y_train, y_test = train_test_split(X, Energies, test_size=0.2)
    +        model = LinearRegression(fit_intercept=False).fit(x_train, y_train)
    +        ypred = model.predict(x_train)
    +        ytilde = model.predict(x_test)
    +        testerror[polydegree] += mean_squared_error(y_test, ytilde)
    +        trainingerror[polydegree] += mean_squared_error(y_train, ypred) 
    +
    +    testerror[polydegree] /= trials
    +    trainingerror[polydegree] /= trials
    +    print("Degree of polynomial: %3d"% polynomial[polydegree])
    +    print("Mean squared error on training data: %.8f" % trainingerror[polydegree])
    +    print("Mean squared error on test data: %.8f" % testerror[polydegree])
    +
    +plt.plot(polynomial, np.log10(trainingerror), label='Training Error')
    +plt.plot(polynomial, np.log10(testerror), label='Test Error')
    +plt.xlabel('Polynomial degree')
    +plt.ylabel('log10[MSE]')
    +plt.legend()
    +plt.show()
    +
    +
    +
    +
    +

    Note that we kept the intercept column in the fitting here. This means that we need to set the intercept in the call to the Scikit-Learn function as False. Alternatively, we could have set up the design matrix \(X\) without the first column of ones.
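As a minimal sketch of that alternative (with a small hypothetical data set), the two choices should give the same intercept estimate:

import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical design matrix with an explicit first column of ones
n = 50
x = np.random.randn(n)
X = np.column_stack([np.ones(n), x, x**2])
y = 1.0 + 2.0*x + 0.5*x**2 + 0.1*np.random.randn(n)

# Option 1: keep the column of ones, switch off Scikit-Learn's own intercept
fit1 = LinearRegression(fit_intercept=False).fit(X, y)

# Option 2: drop the column of ones, let Scikit-Learn fit the intercept itself
fit2 = LinearRegression(fit_intercept=True).fit(X[:, 1:], y)

print(fit1.coef_[0], fit2.intercept_)   # the two intercept estimates agree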

    +
    +
    +

    The same example but now with cross-validation#

    +

In this example we keep the intercept column again but add cross-validation in order to estimate the best possible value of the mean squared error.

    +
    +
    +
    # Common imports
    +import os
    +import numpy as np
    +import pandas as pd
    +import matplotlib.pyplot as plt
    +from sklearn.linear_model import LinearRegression, Ridge, Lasso
    +from sklearn.metrics import mean_squared_error
    +from sklearn.model_selection import KFold
    +from sklearn.model_selection import cross_val_score
    +
    +
    +# Where to save the figures and data files
    +PROJECT_ROOT_DIR = "Results"
    +FIGURE_ID = "Results/FigureFiles"
    +DATA_ID = "DataFiles/"
    +
    +if not os.path.exists(PROJECT_ROOT_DIR):
    +    os.mkdir(PROJECT_ROOT_DIR)
    +
    +if not os.path.exists(FIGURE_ID):
    +    os.makedirs(FIGURE_ID)
    +
    +if not os.path.exists(DATA_ID):
    +    os.makedirs(DATA_ID)
    +
    +def image_path(fig_id):
    +    return os.path.join(FIGURE_ID, fig_id)
    +
    +def data_path(dat_id):
    +    return os.path.join(DATA_ID, dat_id)
    +
    +def save_fig(fig_id):
    +    plt.savefig(image_path(fig_id) + ".png", format='png')
    +
    +infile = open(data_path("EoS.csv"),'r')
    +
    +# Read the EoS data as  csv file and organize the data into two arrays with density and energies
    +EoS = pd.read_csv(infile, names=('Density', 'Energy'))
    +EoS['Energy'] = pd.to_numeric(EoS['Energy'], errors='coerce')
    +EoS = EoS.dropna()
    +Energies = EoS['Energy']
    +Density = EoS['Density']
    +#  The design matrix now as function of various polytrops
    +
    +Maxpolydegree = 30
    +X = np.zeros((len(Density),Maxpolydegree))
    +X[:,0] = 1.0
    +estimated_mse_sklearn = np.zeros(Maxpolydegree)
    +polynomial = np.zeros(Maxpolydegree)
    +k =5
    +kfold = KFold(n_splits = k)
    +
    +for polydegree in range(1, Maxpolydegree):
    +    polynomial[polydegree] = polydegree
    +    for degree in range(polydegree):
    +        X[:,degree] = Density**(degree/3.0)
    +        OLS = LinearRegression(fit_intercept=False)
    +# loop over trials in order to estimate the expectation value of the MSE
    +    estimated_mse_folds = cross_val_score(OLS, X, Energies, scoring='neg_mean_squared_error', cv=kfold)
    +#[:, np.newaxis]
    +    estimated_mse_sklearn[polydegree] = np.mean(-estimated_mse_folds)
    +
    +plt.plot(polynomial, np.log10(estimated_mse_sklearn), label='Test Error')
    +plt.xlabel('Polynomial degree')
    +plt.ylabel('log10[MSE]')
    +plt.legend()
    +plt.show()
    +
    +
    +
    +
    +
    +
    +

    Material for the lab sessions#

    +

This week we will discuss during the first hour of each lab session some technicalities related to the project and methods for updating the learning rate like ADAgrad, RMSprop and ADAM. As teaching material, see the jupyter-notebook from week 37 (September 12-16).

    +

For the lab session, the following video on cross validation (from 2024) could be helpful, see https://www.youtube.com/watch?v=T9jjWsmsd1o

    +

    See also video on ADAgrad, RMSprop and ADAM (material from last week not covered during lecture) at https://youtu.be/J_41Hld6tTU

    +
    +
\ No newline at end of file
diff --git a/doc/LectureNotes/_build/html/week39.html b/doc/LectureNotes/_build/html/week39.html
new file mode 100644
index 000000000..a2746005a
--- /dev/null
+++ b/doc/LectureNotes/_build/html/week39.html
@@ -0,0 +1,2073 @@
Week 39: Resampling methods and logistic regression — Applied Data Analysis and Machine Learning
    Week 39: Resampling methods and logistic regression#

    +

    Morten Hjorth-Jensen, Department of Physics, University of Oslo

    +

    Date: Week 39

    +
    +

    Plan for week 39, September 22-26, 2025#

    +

    Material for the lecture on Monday September 22.

    +
      +
1. Resampling techniques, Bootstrap and cross validation and bias-variance tradeoff

2. Logistic regression, our first classification encounter and a stepping stone towards neural networks

3. Video of lecture

4. Whiteboard notes
    +
    +
    +

    Readings and Videos, resampling methods#

    +
      +
1. Raschka et al, pages 175-192

2. Hastie et al Chapter 7, here we recommend 7.1-7.5 and 7.10 (cross-validation) and 7.11 (bootstrap). See https://link.springer.com/book/10.1007/978-0-387-84858-7.

3. Video on bias-variance tradeoff

4. Video on Bootstrapping

5. Video on cross validation
    +
    +
    +

    Readings and Videos, logistic regression#

    +
      +
1. Hastie et al 4.1, 4.2 and 4.3 on logistic regression

2. Raschka et al, pages 53-76 on Logistic regression and pages 37-52 on gradient optimization

3. Video on Logistic regression

4. Yet another video on logistic regression
    +
    +
    +

    Lab sessions week 39#

    +

    Material for the lab sessions on Tuesday and Wednesday.

    +
      +
1. Discussions on how to structure your report for the first project

2. Exercise for week 39 on how to write the abstract and the introduction of the report and how to include references.

3. Work on project 1, in particular resampling methods like cross-validation and bootstrap. For more discussions of project 1, chapter 5 of Goodfellow et al is a good read, in particular sections 5.1-5.5 and 5.7-5.11.

4. Video on how to write scientific reports recorded during one of the lab sessions

5. A general guideline can be found at CompPhysics/MachineLearning.
    +
    +
    +

    Lecture material#

    +
    +
    +

    Resampling methods#

    +

    Resampling methods are an indispensable tool in modern +statistics. They involve repeatedly drawing samples from a training +set and refitting a model of interest on each sample in order to +obtain additional information about the fitted model. For example, in +order to estimate the variability of a linear regression fit, we can +repeatedly draw different samples from the training data, fit a linear +regression to each new sample, and then examine the extent to which +the resulting fits differ. Such an approach may allow us to obtain +information that would not be available from fitting the model only +once using the original training sample.

    +

    Two resampling methods are often used in Machine Learning analyses,

    +
      +
1. The bootstrap method

2. Cross-Validation
    +

    In addition there are several other methods such as the Jackknife and the Blocking methods. This week will repeat some of the elements of the bootstrap method and focus more on cross-validation.

    +
    +
    +

    Resampling approaches can be computationally expensive#

    +

    Resampling approaches can be computationally expensive, because they +involve fitting the same statistical method multiple times using +different subsets of the training data. However, due to recent +advances in computing power, the computational requirements of +resampling methods generally are not prohibitive. In this chapter, we +discuss two of the most commonly used resampling methods, +cross-validation and the bootstrap. Both methods are important tools +in the practical application of many statistical learning +procedures. For example, cross-validation can be used to estimate the +test error associated with a given statistical learning method in +order to evaluate its performance, or to select the appropriate level +of flexibility. The process of evaluating a model’s performance is +known as model assessment, whereas the process of selecting the proper +level of flexibility for a model is known as model selection. The +bootstrap is widely used.

    +
    +
    +

    Why resampling methods ?#

    +

    Statistical analysis.

    +
      +
• Our simulations can be treated as computer experiments. This is particularly the case for Monte Carlo methods which are widely used in statistical analyses.

• The results can be analysed with the same statistical tools as we would use when analysing experimental data.

• As in all experiments, we are looking for expectation values and an estimate of how accurate they are, i.e., possible sources for errors.
    +
    +
    +

    Statistical analysis#

    +
      +
• As in other experiments, many numerical experiments have two classes of errors:

  • Statistical errors

  • Systematical errors

• Statistical errors can be estimated using standard tools from statistics

• Systematical errors are method specific and must be treated differently from case to case.
    +
    +
    +

    Resampling methods#

    +

    With all these analytical equations for both the OLS and Ridge +regression, we will now outline how to assess a given model. This will +lead to a discussion of the so-called bias-variance tradeoff (see +below) and so-called resampling methods.

    +

    One of the quantities we have discussed as a way to measure errors is +the mean-squared error (MSE), mainly used for fitting of continuous +functions. Another choice is the absolute error.
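For reference, with targets \(y_i\) and predictions \(\tilde{y}_i\), these two error measures are the mean squared error and the (mean) absolute error,

\[
\mathrm{MSE}=\frac{1}{n}\sum_{i=0}^{n-1}(y_i-\tilde{y}_i)^2,
\qquad
\mathrm{MAE}=\frac{1}{n}\sum_{i=0}^{n-1}\vert y_i-\tilde{y}_i\vert.
\]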

    +

In the discussions below we will focus on the MSE. In particular, since we will split the data into test and training data, we discuss the

    +
      +
1. prediction error or simply the test error \(\mathrm{Err_{Test}}\), where we have a fixed training set and the test error is the MSE arising from the data reserved for testing. We discuss also the

2. training error \(\mathrm{Err_{Train}}\), which is the average loss over the training data.
    +

As our model becomes more and more complex, more of the training data tends to be used. The training may then adapt to more complicated structures in the data. This may lead to a decrease in the bias (see below for a code example) and a slight increase of the variance for the test error. For a certain level of complexity the test error will reach a minimum, before starting to increase again. The training error reaches a saturation.
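Written out with the MSE as the loss, the two quantities defined above are simply the average squared errors over the respective splits,

\[
\mathrm{Err_{Train}}=\frac{1}{n_{\mathrm{train}}}\sum_{i\in\mathrm{train}}(y_i-\tilde{y}_i)^2,
\qquad
\mathrm{Err_{Test}}=\frac{1}{n_{\mathrm{test}}}\sum_{i\in\mathrm{test}}(y_i-\tilde{y}_i)^2.
\]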

    +
    +
    +

    Resampling methods: Bootstrap#

    +

    Bootstrapping is a non-parametric approach to statistical inference +that substitutes computation for more traditional distributional +assumptions and asymptotic results. Bootstrapping offers a number of +advantages:

    +
      +
1. The bootstrap is quite general, although there are some cases in which it fails.

2. Because it does not require distributional assumptions (such as normally distributed errors), the bootstrap can provide more accurate inferences when the data are not well behaved or when the sample size is small.

3. It is possible to apply the bootstrap to statistics with sampling distributions that are difficult to derive, even asymptotically.

4. It is relatively simple to apply the bootstrap to complex data-collection plans (such as stratified and clustered samples).
    +

    The textbook by Davison on the Bootstrap Methods and their Applications provides many more insights and proofs. In this course we will take a more practical approach and use the results and theorems provided in the literature. For those interested in reading more about the bootstrap methods, we recommend the above text and the one by Efron and Tibshirani.

    +
    +
    +

    The bias-variance tradeoff#

    +

    We will discuss the bias-variance tradeoff in the context of +continuous predictions such as regression. However, many of the +intuitions and ideas discussed here also carry over to classification +tasks. Consider a dataset \(\mathcal{D}\) consisting of the data +\(\mathbf{X}_\mathcal{D}=\{(y_j, \boldsymbol{x}_j), j=0\ldots n-1\}\).

    +

    Let us assume that the true data is generated from a noisy model

    +
    +\[ +\boldsymbol{y}=f(\boldsymbol{x}) + \boldsymbol{\epsilon} +\]
    +

where \(\epsilon\) is normally distributed with mean zero and variance \(\sigma^2\).

    +

    In our derivation of the ordinary least squares method we defined then +an approximation to the function \(f\) in terms of the parameters +\(\boldsymbol{\theta}\) and the design matrix \(\boldsymbol{X}\) which embody our model, +that is \(\boldsymbol{\tilde{y}}=\boldsymbol{X}\boldsymbol{\theta}\).

    +

Thereafter we found the parameters \(\boldsymbol{\theta}\) by optimizing the mean squared error via the so-called cost function

    +
    +\[ +C(\boldsymbol{X},\boldsymbol{\theta}) =\frac{1}{n}\sum_{i=0}^{n-1}(y_i-\tilde{y}_i)^2=\mathbb{E}\left[(\boldsymbol{y}-\boldsymbol{\tilde{y}})^2\right]. +\]
    +

    We can rewrite this as

    +
    +\[ +\mathbb{E}\left[(\boldsymbol{y}-\boldsymbol{\tilde{y}})^2\right]=\frac{1}{n}\sum_i(f_i-\mathbb{E}\left[\boldsymbol{\tilde{y}}\right])^2+\frac{1}{n}\sum_i(\tilde{y}_i-\mathbb{E}\left[\boldsymbol{\tilde{y}}\right])^2+\sigma^2. +\]
    +

The first of the three terms represents the square of the bias of the learning method, which can be thought of as the error caused by the simplifying assumptions built into the method. The second term represents the variance of the chosen model and finally the last term is the variance of the error \(\boldsymbol{\epsilon}\).

    +

To derive this equation, we need to recall that the variances of \(\boldsymbol{y}\) and \(\boldsymbol{\epsilon}\) are both equal to \(\sigma^2\). The mean value of \(\boldsymbol{\epsilon}\) is by definition equal to zero. Furthermore, the function \(f\) is not a stochastic variable, idem for \(\boldsymbol{\tilde{y}}\). We use a more compact notation in terms of the expectation value

    +
    +\[ +\mathbb{E}\left[(\boldsymbol{y}-\boldsymbol{\tilde{y}})^2\right]=\mathbb{E}\left[(\boldsymbol{f}+\boldsymbol{\epsilon}-\boldsymbol{\tilde{y}})^2\right], +\]
    +

    and adding and subtracting \(\mathbb{E}\left[\boldsymbol{\tilde{y}}\right]\) we get

    +
    +\[ +\mathbb{E}\left[(\boldsymbol{y}-\boldsymbol{\tilde{y}})^2\right]=\mathbb{E}\left[(\boldsymbol{f}+\boldsymbol{\epsilon}-\boldsymbol{\tilde{y}}+\mathbb{E}\left[\boldsymbol{\tilde{y}}\right]-\mathbb{E}\left[\boldsymbol{\tilde{y}}\right])^2\right], +\]
    +

    which, using the abovementioned expectation values can be rewritten as

    +
    +\[ +\mathbb{E}\left[(\boldsymbol{y}-\boldsymbol{\tilde{y}})^2\right]=\mathbb{E}\left[(\boldsymbol{y}-\mathbb{E}\left[\boldsymbol{\tilde{y}}\right])^2\right]+\mathrm{Var}\left[\boldsymbol{\tilde{y}}\right]+\sigma^2, +\]
    +

    that is the rewriting in terms of the so-called bias, the variance of the model \(\boldsymbol{\tilde{y}}\) and the variance of \(\boldsymbol{\epsilon}\).

    +

    Note that in order to derive these equations we have assumed we can replace the unknown function \(\boldsymbol{f}\) with the target/output data \(\boldsymbol{y}\).

    +
    +
    +

    A way to Read the Bias-Variance Tradeoff#

Figure 1:

    Understanding what happens#

    +
    +
    +
    %matplotlib inline
    +
    +import matplotlib.pyplot as plt
    +import numpy as np
    +from sklearn.linear_model import LinearRegression, Ridge, Lasso
    +from sklearn.preprocessing import PolynomialFeatures
    +from sklearn.model_selection import train_test_split
    +from sklearn.pipeline import make_pipeline
    +from sklearn.utils import resample
    +
    +np.random.seed(2018)
    +
    +n = 40
    +n_boostraps = 100
    +maxdegree = 14
    +
    +
    +# Make data set.
    +x = np.linspace(-3, 3, n).reshape(-1, 1)
    +y = np.exp(-x**2) + 1.5 * np.exp(-(x-2)**2)+ np.random.normal(0, 0.1, x.shape)
    +error = np.zeros(maxdegree)
    +bias = np.zeros(maxdegree)
    +variance = np.zeros(maxdegree)
    +polydegree = np.zeros(maxdegree)
    +x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)
    +
    +for degree in range(maxdegree):
    +    model = make_pipeline(PolynomialFeatures(degree=degree), LinearRegression(fit_intercept=False))
    +    y_pred = np.empty((y_test.shape[0], n_boostraps))
    +    for i in range(n_boostraps):
    +        x_, y_ = resample(x_train, y_train)
    +        y_pred[:, i] = model.fit(x_, y_).predict(x_test).ravel()
    +
    +    polydegree[degree] = degree
    +    error[degree] = np.mean( np.mean((y_test - y_pred)**2, axis=1, keepdims=True) )
    +    bias[degree] = np.mean( (y_test - np.mean(y_pred, axis=1, keepdims=True))**2 )
    +    variance[degree] = np.mean( np.var(y_pred, axis=1, keepdims=True) )
    +    print('Polynomial degree:', degree)
    +    print('Error:', error[degree])
    +    print('Bias^2:', bias[degree])
    +    print('Var:', variance[degree])
    +    print('{} >= {} + {} = {}'.format(error[degree], bias[degree], variance[degree], bias[degree]+variance[degree]))
    +
    +plt.plot(polydegree, error, label='Error')
    +plt.plot(polydegree, bias, label='bias')
    +plt.plot(polydegree, variance, label='Variance')
    +plt.legend()
    +plt.show()
    +
    +
    +
    +
    +
    +
    +

    Summing up#

    +

    The bias-variance tradeoff summarizes the fundamental tension in +machine learning, particularly supervised learning, between the +complexity of a model and the amount of training data needed to train +it. Since data is often limited, in practice it is often useful to +use a less-complex model with higher bias, that is a model whose asymptotic +performance is worse than another model because it is easier to +train and less sensitive to sampling noise arising from having a +finite-sized training dataset (smaller variance).

    +

    The above equations tell us that in +order to minimize the expected test error, we need to select a +statistical learning method that simultaneously achieves low variance +and low bias. Note that variance is inherently a nonnegative quantity, +and squared bias is also nonnegative. Hence, we see that the expected +test MSE can never lie below \(Var(\epsilon)\), the irreducible error.

    +

    What do we mean by the variance and bias of a statistical learning +method? The variance refers to the amount by which our model would change if we +estimated it using a different training data set. Since the training +data are used to fit the statistical learning method, different +training data sets will result in a different estimate. But ideally the +estimate for our model should not vary too much between training +sets. However, if a method has high variance then small changes in +the training data can result in large changes in the model. In general, more +flexible statistical methods have higher variance.

    +

    You may also find this recent article of interest.

    +
    +
    +

    Another Example from Scikit-Learn’s Repository#

    +

This example demonstrates the problems of underfitting and overfitting and how we can use linear regression with polynomial features to approximate nonlinear functions. The plot shows the function that we want to approximate, which is a part of the cosine function. In addition, the samples from the real function and the approximations of different models are displayed. The models have polynomial features of different degrees. We can see that a linear function (polynomial with degree 1) is not sufficient to fit the training samples. This is called underfitting. A polynomial of degree 4 approximates the true function almost perfectly. However, for higher degrees the model will overfit the training data, i.e. it learns the noise of the training data. We evaluate overfitting and underfitting quantitatively by using cross-validation. We calculate the mean squared error (MSE) on the validation set; the higher it is, the less likely the model generalizes correctly from the training data.

    +
    +
    +
    
    +
    +#print(__doc__)
    +
    +import numpy as np
    +import matplotlib.pyplot as plt
    +from sklearn.pipeline import Pipeline
    +from sklearn.preprocessing import PolynomialFeatures
    +from sklearn.linear_model import LinearRegression
    +from sklearn.model_selection import cross_val_score
    +
    +
    +def true_fun(X):
    +    return np.cos(1.5 * np.pi * X)
    +
    +np.random.seed(0)
    +
    +n_samples = 30
    +degrees = [1, 4, 15]
    +
    +X = np.sort(np.random.rand(n_samples))
    +y = true_fun(X) + np.random.randn(n_samples) * 0.1
    +
    +plt.figure(figsize=(14, 5))
    +for i in range(len(degrees)):
    +    ax = plt.subplot(1, len(degrees), i + 1)
    +    plt.setp(ax, xticks=(), yticks=())
    +
    +    polynomial_features = PolynomialFeatures(degree=degrees[i],
    +                                             include_bias=False)
    +    linear_regression = LinearRegression()
    +    pipeline = Pipeline([("polynomial_features", polynomial_features),
    +                         ("linear_regression", linear_regression)])
    +    pipeline.fit(X[:, np.newaxis], y)
    +
    +    # Evaluate the models using crossvalidation
    +    scores = cross_val_score(pipeline, X[:, np.newaxis], y,
    +                             scoring="neg_mean_squared_error", cv=10)
    +
    +    X_test = np.linspace(0, 1, 100)
    +    plt.plot(X_test, pipeline.predict(X_test[:, np.newaxis]), label="Model")
    +    plt.plot(X_test, true_fun(X_test), label="True function")
    +    plt.scatter(X, y, edgecolor='b', s=20, label="Samples")
    +    plt.xlabel("x")
    +    plt.ylabel("y")
    +    plt.xlim((0, 1))
    +    plt.ylim((-2, 2))
    +    plt.legend(loc="best")
    +    plt.title("Degree {}\nMSE = {:.2e}(+/- {:.2e})".format(
    +        degrees[i], -scores.mean(), scores.std()))
    +plt.show()
    +
    +
    +
    +
    +
    +
    +

    Various steps in cross-validation#

    +

When the repetitive splitting of the data set is done randomly, samples may accidentally end up in a vast majority of the splits in either training or test set. Such samples may have an unbalanced influence on either model building or prediction evaluation. To avoid this, \(k\)-fold cross-validation structures the data splitting. The samples are divided into \(k\) more or less equally sized exhaustive and mutually exclusive subsets. In turn (at each split) one of these subsets plays the role of the test set while the union of the remaining subsets constitutes the training set. Such a splitting warrants a balanced representation of each sample in both training and test set over the splits. Still, the division into the \(k\) subsets involves a degree of randomness. This may be fully excluded when choosing \(k=n\). This particular case is referred to as leave-one-out cross-validation (LOOCV).

    +
    +
    +

    Cross-validation in brief#

    +

    For the various values of \(k\)

    +
      +
1. Shuffle the dataset randomly.

2. Split the dataset into \(k\) groups.

3. For each unique group:

   a. Decide which group to use as the test data set

   b. Take the remaining groups as the training data set

   c. Fit a model on the training set and evaluate it on the test set

   d. Retain the evaluation score and discard the model

4. Summarize the model using the sample of model evaluation scores
    +
    +
    +

    Code Example for Cross-validation and \(k\)-fold Cross-validation#

    +

    The code here uses Ridge regression with cross-validation (CV) resampling and \(k\)-fold CV in order to fit a specific polynomial.

    +
    +
    +
    import numpy as np
    +import matplotlib.pyplot as plt
    +from sklearn.model_selection import KFold
    +from sklearn.linear_model import Ridge
    +from sklearn.model_selection import cross_val_score
    +from sklearn.preprocessing import PolynomialFeatures
    +
    +# A seed just to ensure that the random numbers are the same for every run.
    +# Useful for eventual debugging.
    +np.random.seed(3155)
    +
    +# Generate the data.
    +nsamples = 100
    +x = np.random.randn(nsamples)
    +y = 3*x**2 + np.random.randn(nsamples)
    +
    +## Cross-validation on Ridge regression using KFold only
    +
    +# Decide degree on polynomial to fit
    +poly = PolynomialFeatures(degree = 6)
    +
    +# Decide which values of lambda to use
    +nlambdas = 500
    +lambdas = np.logspace(-3, 5, nlambdas)
    +
    +# Initialize a KFold instance
    +k = 5
    +kfold = KFold(n_splits = k)
    +
    +# Perform the cross-validation to estimate MSE
    +scores_KFold = np.zeros((nlambdas, k))
    +
    +i = 0
    +for lmb in lambdas:
    +    ridge = Ridge(alpha = lmb)
    +    j = 0
    +    for train_inds, test_inds in kfold.split(x):
    +        xtrain = x[train_inds]
    +        ytrain = y[train_inds]
    +
    +        xtest = x[test_inds]
    +        ytest = y[test_inds]
    +
    +        Xtrain = poly.fit_transform(xtrain[:, np.newaxis])
    +        ridge.fit(Xtrain, ytrain[:, np.newaxis])
    +
    +        Xtest = poly.fit_transform(xtest[:, np.newaxis])
    +        ypred = ridge.predict(Xtest)
    +
    +        scores_KFold[i,j] = np.sum((ypred - ytest[:, np.newaxis])**2)/np.size(ypred)
    +
    +        j += 1
    +    i += 1
    +
    +
    +estimated_mse_KFold = np.mean(scores_KFold, axis = 1)
    +
    +## Cross-validation using cross_val_score from sklearn along with KFold
    +
    +# kfold is an instance initialized above as:
    +# kfold = KFold(n_splits = k)
    +
    +estimated_mse_sklearn = np.zeros(nlambdas)
    +i = 0
    +for lmb in lambdas:
    +    ridge = Ridge(alpha = lmb)
    +
    +    X = poly.fit_transform(x[:, np.newaxis])
    +    estimated_mse_folds = cross_val_score(ridge, X, y[:, np.newaxis], scoring='neg_mean_squared_error', cv=kfold)
    +
+    # cross_val_score returns an array containing the estimated negative MSE for every fold.
+    # We take the mean over the folds in order to get an estimate of the MSE of the model.
    +    estimated_mse_sklearn[i] = np.mean(-estimated_mse_folds)
    +
    +    i += 1
    +
    +## Plot and compare the slightly different ways to perform cross-validation
    +
    +plt.figure()
    +
    +plt.plot(np.log10(lambdas), estimated_mse_sklearn, label = 'cross_val_score')
    +#plt.plot(np.log10(lambdas), estimated_mse_KFold, 'r--', label = 'KFold')
    +
    +plt.xlabel('log10(lambda)')
    +plt.ylabel('mse')
    +
    +plt.legend()
    +
    +plt.show()
    +
    +
    +
    +
    +
    +
    +

    More examples on bootstrap and cross-validation and errors#

    +
    +
    +
    # Common imports
    +import os
    +import numpy as np
    +import pandas as pd
    +import matplotlib.pyplot as plt
    +from sklearn.linear_model import LinearRegression, Ridge, Lasso
    +from sklearn.model_selection import train_test_split
    +from sklearn.utils import resample
    +from sklearn.metrics import mean_squared_error
    +# Where to save the figures and data files
    +PROJECT_ROOT_DIR = "Results"
    +FIGURE_ID = "Results/FigureFiles"
    +DATA_ID = "DataFiles/"
    +
    +if not os.path.exists(PROJECT_ROOT_DIR):
    +    os.mkdir(PROJECT_ROOT_DIR)
    +
    +if not os.path.exists(FIGURE_ID):
    +    os.makedirs(FIGURE_ID)
    +
    +if not os.path.exists(DATA_ID):
    +    os.makedirs(DATA_ID)
    +
    +def image_path(fig_id):
    +    return os.path.join(FIGURE_ID, fig_id)
    +
    +def data_path(dat_id):
    +    return os.path.join(DATA_ID, dat_id)
    +
    +def save_fig(fig_id):
    +    plt.savefig(image_path(fig_id) + ".png", format='png')
    +
    +infile = open(data_path("EoS.csv"),'r')
    +
    +# Read the EoS data as  csv file and organize the data into two arrays with density and energies
    +EoS = pd.read_csv(infile, names=('Density', 'Energy'))
    +EoS['Energy'] = pd.to_numeric(EoS['Energy'], errors='coerce')
    +EoS = EoS.dropna()
    +Energies = EoS['Energy']
    +Density = EoS['Density']
    +#  The design matrix now as function of various polytrops
    +
    +Maxpolydegree = 30
    +X = np.zeros((len(Density),Maxpolydegree))
    +X[:,0] = 1.0
    +testerror = np.zeros(Maxpolydegree)
    +trainingerror = np.zeros(Maxpolydegree)
    +polynomial = np.zeros(Maxpolydegree)
    +
    +trials = 100
    +for polydegree in range(1, Maxpolydegree):
    +    polynomial[polydegree] = polydegree
    +    for degree in range(polydegree):
    +        X[:,degree] = Density**(degree/3.0)
    +
    +# loop over trials in order to estimate the expectation value of the MSE
    +    testerror[polydegree] = 0.0
    +    trainingerror[polydegree] = 0.0
    +    for samples in range(trials):
    +        x_train, x_test, y_train, y_test = train_test_split(X, Energies, test_size=0.2)
    +        model = LinearRegression(fit_intercept=False).fit(x_train, y_train)
    +        ypred = model.predict(x_train)
    +        ytilde = model.predict(x_test)
    +        testerror[polydegree] += mean_squared_error(y_test, ytilde)
    +        trainingerror[polydegree] += mean_squared_error(y_train, ypred) 
    +
    +    testerror[polydegree] /= trials
    +    trainingerror[polydegree] /= trials
    +    print("Degree of polynomial: %3d"% polynomial[polydegree])
    +    print("Mean squared error on training data: %.8f" % trainingerror[polydegree])
    +    print("Mean squared error on test data: %.8f" % testerror[polydegree])
    +
    +plt.plot(polynomial, np.log10(trainingerror), label='Training Error')
    +plt.plot(polynomial, np.log10(testerror), label='Test Error')
    +plt.xlabel('Polynomial degree')
    +plt.ylabel('log10[MSE]')
    +plt.legend()
    +plt.show()
    +
    +
    +
    +
    +

    Note that we kept the intercept column in the fitting here. This means that we need to set the intercept in the call to the Scikit-Learn function as False. Alternatively, we could have set up the design matrix \(X\) without the first column of ones.

    +
    +
    +

    The same example but now with cross-validation#

    +

In this example we keep the intercept column again but add cross-validation in order to estimate the best possible value of the mean squared error.

    +
    +
    +
    # Common imports
    +import os
    +import numpy as np
    +import pandas as pd
    +import matplotlib.pyplot as plt
    +from sklearn.linear_model import LinearRegression, Ridge, Lasso
    +from sklearn.metrics import mean_squared_error
    +from sklearn.model_selection import KFold
    +from sklearn.model_selection import cross_val_score
    +
    +
    +# Where to save the figures and data files
    +PROJECT_ROOT_DIR = "Results"
    +FIGURE_ID = "Results/FigureFiles"
    +DATA_ID = "DataFiles/"
    +
    +if not os.path.exists(PROJECT_ROOT_DIR):
    +    os.mkdir(PROJECT_ROOT_DIR)
    +
    +if not os.path.exists(FIGURE_ID):
    +    os.makedirs(FIGURE_ID)
    +
    +if not os.path.exists(DATA_ID):
    +    os.makedirs(DATA_ID)
    +
    +def image_path(fig_id):
    +    return os.path.join(FIGURE_ID, fig_id)
    +
    +def data_path(dat_id):
    +    return os.path.join(DATA_ID, dat_id)
    +
    +def save_fig(fig_id):
    +    plt.savefig(image_path(fig_id) + ".png", format='png')
    +
    +infile = open(data_path("EoS.csv"),'r')
    +
    +# Read the EoS data as  csv file and organize the data into two arrays with density and energies
    +EoS = pd.read_csv(infile, names=('Density', 'Energy'))
    +EoS['Energy'] = pd.to_numeric(EoS['Energy'], errors='coerce')
    +EoS = EoS.dropna()
    +Energies = EoS['Energy']
    +Density = EoS['Density']
    +#  The design matrix now as function of various polytrops
    +
    +Maxpolydegree = 30
    +X = np.zeros((len(Density),Maxpolydegree))
    +X[:,0] = 1.0
    +estimated_mse_sklearn = np.zeros(Maxpolydegree)
    +polynomial = np.zeros(Maxpolydegree)
    +k =5
    +kfold = KFold(n_splits = k)
    +
    +for polydegree in range(1, Maxpolydegree):
    +    polynomial[polydegree] = polydegree
    +    for degree in range(polydegree):
    +        X[:,degree] = Density**(degree/3.0)
    +        OLS = LinearRegression(fit_intercept=False)
    +# loop over trials in order to estimate the expectation value of the MSE
    +    estimated_mse_folds = cross_val_score(OLS, X, Energies, scoring='neg_mean_squared_error', cv=kfold)
    +#[:, np.newaxis]
    +    estimated_mse_sklearn[polydegree] = np.mean(-estimated_mse_folds)
    +
    +plt.plot(polynomial, np.log10(estimated_mse_sklearn), label='Test Error')
    +plt.xlabel('Polynomial degree')
    +plt.ylabel('log10[MSE]')
    +plt.legend()
    +plt.show()
    +
    +
    +
    +
    +
    +
    +

    Logistic Regression#

    +

In linear regression our main interest was centered on learning the coefficients of a functional fit (say a polynomial) in order to be able to predict the response of a continuous variable on some unseen data. The fit to the continuous variable \(y_i\) is based on some independent variables \(\boldsymbol{x}_i\). Linear regression resulted in analytical expressions for standard ordinary Least Squares or Ridge regression (in terms of matrices to invert) for several quantities, ranging from the variance and thereby the confidence intervals of the parameters \(\boldsymbol{\theta}\) to the mean squared error. If we can invert the matrix \(\boldsymbol{X}^T\boldsymbol{X}\), linear regression then gives a simple recipe for fitting our data.
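For reference, the analytical expressions alluded to here are the familiar closed-form solutions (assuming the relevant matrix inverses exist),

\[
\hat{\boldsymbol{\theta}}_{\mathrm{OLS}}=\left(\boldsymbol{X}^T\boldsymbol{X}\right)^{-1}\boldsymbol{X}^T\boldsymbol{y},
\qquad
\hat{\boldsymbol{\theta}}_{\mathrm{Ridge}}=\left(\boldsymbol{X}^T\boldsymbol{X}+\lambda\boldsymbol{I}\right)^{-1}\boldsymbol{X}^T\boldsymbol{y}.
\]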

    +
    +
    +

    Classification problems#

    +

    Classification problems, however, are concerned with outcomes taking +the form of discrete variables (i.e. categories). We may for example, +on the basis of DNA sequencing for a number of patients, like to find +out which mutations are important for a certain disease; or based on +scans of various patients’ brains, figure out if there is a tumor or +not; or given a specific physical system, we’d like to identify its +state, say whether it is an ordered or disordered system (typical +situation in solid state physics); or classify the status of a +patient, whether she/he has a stroke or not and many other similar +situations.

    +

    The most common situation we encounter when we apply logistic +regression is that of two possible outcomes, normally denoted as a +binary outcome, true or false, positive or negative, success or +failure etc.

    +
    +
    +

    Optimization and Deep learning#

    +

Logistic regression will also serve as our stepping stone towards neural network algorithms and supervised deep learning. For logistic learning, the minimization of the cost function leads to a non-linear equation in the parameters \(\boldsymbol{\theta}\). The optimization of the problem therefore calls for minimization algorithms. This forms the bottleneck of all machine learning algorithms, namely how to find reliable minima of a multi-variable function. This leads us to the family of gradient descent methods. The latter are the workhorses of basically all modern machine learning algorithms.

    +

    We note also that many of the topics discussed here on logistic +regression are also commonly used in modern supervised Deep Learning +models, as we will see later.

    +
    +
    +

    Basics#

    +

    We consider the case where the outputs/targets, also called the +responses or the outcomes, \(y_i\) are discrete and only take values +from \(k=0,\dots,K-1\) (i.e. \(K\) classes).

    +

    The goal is to predict the +output classes from the design matrix \(\boldsymbol{X}\in\mathbb{R}^{n\times p}\) +made of \(n\) samples, each of which carries \(p\) features or predictors. The +primary goal is to identify the classes to which new unseen samples +belong.

    +

    Let us specialize to the case of two classes only, with outputs +\(y_i=0\) and \(y_i=1\). Our outcomes could represent the status of a +credit card user that could default or not on her/his credit card +debt. That is

    +
    +\[\begin{split} +y_i = \begin{bmatrix} 0 & \mathrm{no}\\ 1 & \mathrm{yes} \end{bmatrix}. +\end{split}\]
    +
    +
    +

    Linear classifier#

    +

Before moving to the logistic model, let us try to use our linear regression model to classify these two outcomes. We could, for example, fit a linear model and assign an outcome to the default case if the predicted value \(y_i > 0.5\) and to the no-default case if \(y_i \leq 0.5\).

    +

    We would then have our +weighted linear combination, namely

    + +
    +
+\[ +\begin{equation} +\boldsymbol{y} = \boldsymbol{X}\boldsymbol{\theta} + \boldsymbol{\epsilon}, +\label{_auto1} \tag{1} +\end{equation} +\]
    +

    where \(\boldsymbol{y}\) is a vector representing the possible outcomes, \(\boldsymbol{X}\) is our +\(n\times p\) design matrix and \(\boldsymbol{\theta}\) represents our estimators/predictors.

    +
    +
    +

    Some selected properties#

    +

The main problem with our function is that it takes values over the entire real axis. In the case of logistic regression, however, the labels \(y_i\) are discrete variables. A typical example is the credit card data discussed below, where for each person in the data set we can set \(y_i=1\) if she/he defaults on the debt and \(y_i=0\) otherwise (see the full example below).

    +

One simple way to get a discrete output is to use a sign function that maps the output of a linear regressor to the values \(\{0,1\}\), that is \(f(s_i)=\mathrm{sign}(s_i)=1\) if \(s_i\ge 0\) and \(0\) otherwise. We will encounter this model in our first demonstration of neural networks.

    +

Historically it is called the perceptron model in the machine learning literature. This model is extremely simple. However, in many cases it is more favorable to use a “soft” classifier that outputs the probability of a given category. This leads us to the logistic function.

    +
    +
    +

    Simple example#

    +

The following example on data for coronary heart disease (CHD) as a function of age may serve as an illustration. In the code here we read and plot whether a person has had CHD (output = 1) or not (output = 0). This output is then plotted against the person’s age. Clearly, the figure shows that attempting to make a standard linear regression fit may not be very meaningful.

    +
    +
    +
    # Common imports
    +import os
    +import numpy as np
    +import pandas as pd
    +import matplotlib.pyplot as plt
    +from sklearn.linear_model import LinearRegression, Ridge, Lasso
    +from sklearn.model_selection import train_test_split
    +from sklearn.utils import resample
    +from sklearn.metrics import mean_squared_error
    +from IPython.display import display
    +from pylab import plt, mpl
    +mpl.rcParams['font.family'] = 'serif'
    +
    +# Where to save the figures and data files
    +PROJECT_ROOT_DIR = "Results"
    +FIGURE_ID = "Results/FigureFiles"
    +DATA_ID = "DataFiles/"
    +
    +if not os.path.exists(PROJECT_ROOT_DIR):
    +    os.mkdir(PROJECT_ROOT_DIR)
    +
    +if not os.path.exists(FIGURE_ID):
    +    os.makedirs(FIGURE_ID)
    +
    +if not os.path.exists(DATA_ID):
    +    os.makedirs(DATA_ID)
    +
    +def image_path(fig_id):
    +    return os.path.join(FIGURE_ID, fig_id)
    +
    +def data_path(dat_id):
    +    return os.path.join(DATA_ID, dat_id)
    +
    +def save_fig(fig_id):
    +    plt.savefig(image_path(fig_id) + ".png", format='png')
    +
+# Read the chd data as a csv file and organize the data into arrays with age group, age, and chd
+chd = pd.read_csv(data_path("chddata.csv"), names=('ID', 'Age', 'Agegroup', 'CHD'))
    +output = chd['CHD']
    +age = chd['Age']
    +agegroup = chd['Agegroup']
    +numberID  = chd['ID'] 
    +display(chd)
    +
    +plt.scatter(age, output, marker='o')
    +plt.axis([18,70.0,-0.1, 1.2])
    +plt.xlabel(r'Age')
    +plt.ylabel(r'CHD')
    +plt.title(r'Age distribution and Coronary heart disease')
    +plt.show()
    +
    +
    +
    +
    +
    +
    +

    Plotting the mean value for each group#

    +

    What we could attempt however is to plot the mean value for each group.

    +
    +
    +
    agegroupmean = np.array([0.1, 0.133, 0.250, 0.333, 0.462, 0.625, 0.765, 0.800])
    +group = np.array([1, 2, 3, 4, 5, 6, 7, 8])
    +plt.plot(group, agegroupmean, "r-")
    +plt.axis([0,9,0, 1.0])
    +plt.xlabel(r'Age group')
    +plt.ylabel(r'CHD mean values')
    +plt.title(r'Mean values for each age group')
    +plt.show()
    +
    +
    +
    +
    +

    We are now trying to find a function \(f(y\vert x)\), that is a function which gives us an expected value for the output \(y\) with a given input \(x\). +In standard linear regression with a linear dependence on \(x\), we would write this in terms of our model

    +
    +\[ +f(y_i\vert x_i)=\theta_0+\theta_1 x_i. +\]
    +

This expression implies, however, that \(f(y_i\vert x_i)\) could take any value from minus infinity to plus infinity. If we instead let \(f(y\vert x)\) be represented by the mean value, the above example shows us that we can constrain the function to take values between zero and one, that is we have \(0 \le f(y_i\vert x_i) \le 1\). Looking at our last curve we see also that it has an S-shaped form. This leads us to a very popular model for the function \(f\), namely the so-called Sigmoid function or logistic model. We will consider this function as representing the probability of finding a value of \(y_i\) for a given \(x_i\).

    +
    +
    +

    The logistic function#

    +

Another widely studied model is the so-called perceptron model, which is an example of a “hard classification” model. We will encounter this model when we discuss neural networks as well. Each data point is deterministically assigned to a category (i.e., \(y_i=0\) or \(y_i=1\)). In many cases, and the coronary heart disease data forms one of many such examples, it is favorable to have a “soft” classifier that outputs the probability of a given category rather than a single value. For example, given \(x_i\), the classifier outputs the probability of being in a category \(k\). Logistic regression is the most common example of such a soft classifier. In logistic regression, the probability that a data point \(x_i\) belongs to a category \(y_i\in\{0,1\}\) is given by the so-called logit function (or Sigmoid), which is meant to represent the likelihood of a given event,

    +
+\[ +p(t) = \frac{1}{1+\exp{(-t)}}=\frac{\exp{(t)}}{1+\exp{(t)}}. +\]
    +

    Note that \(1-p(t)= p(-t)\).
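To verify this identity, a one-line check using only the definition above:

+\[ +1-p(t) = 1-\frac{1}{1+\exp{(-t)}} = \frac{\exp{(-t)}}{1+\exp{(-t)}} = \frac{1}{1+\exp{(t)}} = p(-t). +\]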

    +
    +
    +

Examples of likelihood functions used in logistic regression and neural networks#

    +

    The following code plots the logistic function, the step function and other functions we will encounter from here and on.

    +
    +
    +
    """The sigmoid function (or the logistic curve) is a
+function that takes any real number, z, and outputs a number in the interval (0,1).
    +It is useful in neural networks for assigning weights on a relative scale.
    +The value z is the weighted sum of parameters involved in the learning algorithm."""
    +
    +import numpy
    +import matplotlib.pyplot as plt
    +import math as mt
    +
    +z = numpy.arange(-5, 5, .1)
    +sigma_fn = numpy.vectorize(lambda z: 1/(1+numpy.exp(-z)))
    +sigma = sigma_fn(z)
    +
    +fig = plt.figure()
    +ax = fig.add_subplot(111)
    +ax.plot(z, sigma)
    +ax.set_ylim([-0.1, 1.1])
    +ax.set_xlim([-5,5])
    +ax.grid(True)
    +ax.set_xlabel('z')
    +ax.set_title('sigmoid function')
    +
    +plt.show()
    +
    +"""Step Function"""
    +z = numpy.arange(-5, 5, .02)
    +step_fn = numpy.vectorize(lambda z: 1.0 if z >= 0.0 else 0.0)
    +step = step_fn(z)
    +
    +fig = plt.figure()
    +ax = fig.add_subplot(111)
    +ax.plot(z, step)
    +ax.set_ylim([-0.5, 1.5])
    +ax.set_xlim([-5,5])
    +ax.grid(True)
    +ax.set_xlabel('z')
    +ax.set_title('step function')
    +
    +plt.show()
    +
    +"""tanh Function"""
    +z = numpy.arange(-2*mt.pi, 2*mt.pi, 0.1)
    +t = numpy.tanh(z)
    +
    +fig = plt.figure()
    +ax = fig.add_subplot(111)
    +ax.plot(z, t)
    +ax.set_ylim([-1.0, 1.0])
    +ax.set_xlim([-2*mt.pi,2*mt.pi])
    +ax.grid(True)
    +ax.set_xlabel('z')
    +ax.set_title('tanh function')
    +
    +plt.show()
    +
    +
    +
    +
    +
    +
    +

    Two parameters#

    +

    We assume now that we have two classes with \(y_i\) either \(0\) or \(1\). Furthermore we assume also that we have only two parameters \(\theta\) in our fitting of the Sigmoid function, that is we define probabilities

    +
    +\[\begin{split} +\begin{align*} +p(y_i=1|x_i,\boldsymbol{\theta}) &= \frac{\exp{(\theta_0+\theta_1x_i)}}{1+\exp{(\theta_0+\theta_1x_i)}},\nonumber\\ +p(y_i=0|x_i,\boldsymbol{\theta}) &= 1 - p(y_i=1|x_i,\boldsymbol{\theta}), +\end{align*} +\end{split}\]
    +

    where \(\boldsymbol{\theta}\) are the weights we wish to extract from data, in our case \(\theta_0\) and \(\theta_1\).

    +

    Note that we used

    +
    +\[ +p(y_i=0\vert x_i, \boldsymbol{\theta}) = 1-p(y_i=1\vert x_i, \boldsymbol{\theta}). +\]
    +
    +
    +

    Maximum likelihood#

    +

    In order to define the total likelihood for all possible outcomes from a
    +dataset \(\mathcal{D}=\{(y_i,x_i)\}\), with the binary labels +\(y_i\in\{0,1\}\) and where the data points are drawn independently, we use the so-called Maximum Likelihood Estimation (MLE) principle. +We aim thus at maximizing +the probability of seeing the observed data. We can then approximate the +likelihood in terms of the product of the individual probabilities of a specific outcome \(y_i\), that is

    +
+\[\begin{split} +\begin{align*} +P(\mathcal{D}|\boldsymbol{\theta})& = \prod_{i=1}^n \left[p(y_i=1|x_i,\boldsymbol{\theta})\right]^{y_i}\left[1-p(y_i=1|x_i,\boldsymbol{\theta})\right]^{1-y_i}\nonumber \\ +\end{align*} +\end{split}\]
    +

    from which we obtain the log-likelihood and our cost/loss function

    +
+\[ +\mathcal{C}(\boldsymbol{\theta}) = \sum_{i=1}^n \left( y_i\log{p(y_i=1|x_i,\boldsymbol{\theta})} + (1-y_i)\log\left[1-p(y_i=1|x_i,\boldsymbol{\theta})\right]\right). +\]
    +
    +
    +

    The cost function rewritten#

    +

    Reordering the logarithms, we can rewrite the cost/loss function as

    +
    +\[ +\mathcal{C}(\boldsymbol{\theta}) = \sum_{i=1}^n \left(y_i(\theta_0+\theta_1x_i) -\log{(1+\exp{(\theta_0+\theta_1x_i)})}\right). +\]
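To make the reordering explicit, write \(z_i=\theta_0+\theta_1x_i\), so that \(p(y_i=1|x_i,\boldsymbol{\theta})=\exp{(z_i)}/(1+\exp{(z_i)})\). Then

+\[ +\log{p(y_i=1|x_i,\boldsymbol{\theta})} = z_i-\log{(1+\exp{(z_i)})},\qquad \log{\left[1-p(y_i=1|x_i,\boldsymbol{\theta})\right]} = -\log{(1+\exp{(z_i)})}, +\]

and inserting these into the expression for \(\mathcal{C}(\boldsymbol{\theta})\) term by term gives the sum above.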
    +

The maximum likelihood estimator is defined as the set of parameters \(\boldsymbol{\theta}\) that maximizes the log-likelihood. Since the cost (error) function is just the negative log-likelihood, for logistic regression we have

    +
    +\[ +\mathcal{C}(\boldsymbol{\theta})=-\sum_{i=1}^n \left(y_i(\theta_0+\theta_1x_i) -\log{(1+\exp{(\theta_0+\theta_1x_i)})}\right). +\]
    +

    This equation is known in statistics as the cross entropy. Finally, we note that just as in linear regression, +in practice we often supplement the cross-entropy with additional regularization terms, usually \(L_1\) and \(L_2\) regularization as we did for Ridge and Lasso regression.
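As a minimal sketch of how this cost function can be evaluated numerically (the function name cross_entropy and the small data set below are illustrative only, not part of the original notes), one may write

+import numpy as np
+
+def cross_entropy(theta0, theta1, x, y):
+    """Negative log-likelihood for the two-parameter logistic model."""
+    z = theta0 + theta1*x
+    # -sum_i [ y_i z_i - log(1+exp(z_i)) ], with logaddexp for numerical stability
+    return -np.sum(y*z - np.logaddexp(0.0, z))
+
+# tiny illustrative data set
+x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
+y = np.array([0, 0, 1, 1, 1])
+print(cross_entropy(0.0, 1.0, x, y))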

    +
    +
    +

    Minimizing the cross entropy#

    +

    The cross entropy is a convex function of the weights \(\boldsymbol{\theta}\) and, +therefore, any local minimizer is a global minimizer.

    +

    Minimizing this +cost function with respect to the two parameters \(\theta_0\) and \(\theta_1\) we obtain

    +
    +\[ +\frac{\partial \mathcal{C}(\boldsymbol{\theta})}{\partial \theta_0} = -\sum_{i=1}^n \left(y_i -\frac{\exp{(\theta_0+\theta_1x_i)}}{1+\exp{(\theta_0+\theta_1x_i)}}\right), +\]
    +

    and

    +
    +\[ +\frac{\partial \mathcal{C}(\boldsymbol{\theta})}{\partial \theta_1} = -\sum_{i=1}^n \left(y_ix_i -x_i\frac{\exp{(\theta_0+\theta_1x_i)}}{1+\exp{(\theta_0+\theta_1x_i)}}\right). +\]
    +
    +
    +

    A more compact expression#

    +

    Let us now define a vector \(\boldsymbol{y}\) with \(n\) elements \(y_i\), an +\(n\times p\) matrix \(\boldsymbol{X}\) which contains the \(x_i\) values and a +vector \(\boldsymbol{p}\) of fitted probabilities \(p(y_i\vert x_i,\boldsymbol{\theta})\). We can rewrite in a more compact form the first +derivative of the cost function as

    +
    +\[ +\frac{\partial \mathcal{C}(\boldsymbol{\theta})}{\partial \boldsymbol{\theta}} = -\boldsymbol{X}^T\left(\boldsymbol{y}-\boldsymbol{p}\right). +\]
    +

If we in addition define a diagonal matrix \(\boldsymbol{W}\) with elements +\(p(y_i\vert x_i,\boldsymbol{\theta})(1-p(y_i\vert x_i,\boldsymbol{\theta}))\), we can obtain a compact expression of the second derivative as

    +
    +\[ +\frac{\partial^2 \mathcal{C}(\boldsymbol{\theta})}{\partial \boldsymbol{\theta}\partial \boldsymbol{\theta}^T} = \boldsymbol{X}^T\boldsymbol{W}\boldsymbol{X}. +\]
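A compact sketch of these two expressions in code (assuming X, y and theta are NumPy arrays of matching shapes; the function name is an illustrative choice):

+import numpy as np
+
+def gradient_and_hessian(X, y, theta):
+    """First and second derivatives of the logistic cross entropy in matrix form."""
+    p = 1.0/(1.0 + np.exp(-X @ theta))   # fitted probabilities
+    grad = -X.T @ (y - p)                # -X^T (y - p)
+    W = np.diag(p*(1.0 - p))             # diagonal matrix W
+    hessian = X.T @ W @ X                # X^T W X
+    return grad, hessian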
    +
    +
    +

    Extending to more predictors#

    +

Within a binary classification problem, we can easily expand our model to include multiple predictors. The logarithm of the ratio between the two likelihoods (the log-odds) is then, with \(p\) predictors,

    +
    +\[ +\log{ \frac{p(\boldsymbol{\theta}\boldsymbol{x})}{1-p(\boldsymbol{\theta}\boldsymbol{x})}} = \theta_0+\theta_1x_1+\theta_2x_2+\dots+\theta_px_p. +\]
    +

    Here we defined \(\boldsymbol{x}=[1,x_1,x_2,\dots,x_p]\) and \(\boldsymbol{\theta}=[\theta_0, \theta_1, \dots, \theta_p]\) leading to

    +
    +\[ +p(\boldsymbol{\theta}\boldsymbol{x})=\frac{ \exp{(\theta_0+\theta_1x_1+\theta_2x_2+\dots+\theta_px_p)}}{1+\exp{(\theta_0+\theta_1x_1+\theta_2x_2+\dots+\theta_px_p)}}. +\]
    +
    +
    +

    Including more classes#

    +

Till now we have mainly focused on two classes, the so-called binary system. Suppose we wish to extend to \(K\) classes. Let us for the sake of simplicity assume we have only two predictors. We then have the following model

    +
+\[ +\log{\frac{p(C=1\vert x)}{p(C=K\vert x)}} = \theta_{10}+\theta_{11}x_1, +\]
    +

    and

    +
+\[ +\log{\frac{p(C=2\vert x)}{p(C=K\vert x)}} = \theta_{20}+\theta_{21}x_1, +\]
    +

and so on up to the class \(C=K-1\),

    +
+\[ +\log{\frac{p(C=K-1\vert x)}{p(C=K\vert x)}} = \theta_{(K-1)0}+\theta_{(K-1)1}x_1, +\]
    +

and the model is specified in terms of \(K-1\) so-called log-odds or logit transformations.

    +
    +
    +

    More classes#

    +

    In our discussion of neural networks we will encounter the above again +in terms of a slightly modified function, the so-called Softmax function.

    +

    The softmax function is used in various multiclass classification +methods, such as multinomial logistic regression (also known as +softmax regression), multiclass linear discriminant analysis, naive +Bayes classifiers, and artificial neural networks. Specifically, in +multinomial logistic regression and linear discriminant analysis, the +input to the function is the result of \(K\) distinct linear functions, +and the predicted probability for the \(k\)-th class given a sample +vector \(\boldsymbol{x}\) and a weighting vector \(\boldsymbol{\theta}\) is (with two +predictors):

    +
    +\[ +p(C=k\vert \mathbf {x} )=\frac{\exp{(\theta_{k0}+\theta_{k1}x_1)}}{1+\sum_{l=1}^{K-1}\exp{(\theta_{l0}+\theta_{l1}x_1)}}. +\]
    +

    It is easy to extend to more predictors. The final class is

    +
    +\[ +p(C=K\vert \mathbf {x} )=\frac{1}{1+\sum_{l=1}^{K-1}\exp{(\theta_{l0}+\theta_{l1}x_1)}}, +\]
    +

    and they sum to one. Our earlier discussions were all specialized to +the case with two classes only. It is easy to see from the above that +what we derived earlier is compatible with these equations.
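A small numerical check (with made-up parameter values, purely illustrative) that these \(K\) class probabilities indeed sum to one:

+import numpy as np
+
+# hypothetical parameters (theta_{l0}, theta_{l1}) for the first K-1 classes, here K=4
+theta = np.array([[0.5, -1.0],
+                  [0.1,  0.3],
+                  [-0.7, 2.0]])
+x1 = 1.5
+scores = theta[:, 0] + theta[:, 1]*x1        # theta_{l0} + theta_{l1} x_1
+denominator = 1.0 + np.sum(np.exp(scores))
+p_first = np.exp(scores)/denominator         # p(C=k|x) for k=1,...,K-1
+p_last = 1.0/denominator                     # p(C=K|x)
+print(p_first, p_last, p_first.sum() + p_last)   # the last number is 1.0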

    +

    To find the optimal parameters we would typically use a gradient +descent method. Newton’s method and gradient descent methods are +discussed in the material on optimization +methods.

    +
    +
    +

Optimization, the central part of any Machine Learning algorithm#

    +

Almost every problem in machine learning and data science starts with a dataset \(X\), a model \(g(\theta)\), which is a function of the parameters \(\theta\), and a cost function \(C(X, g(\theta))\) that allows us to judge how well the model \(g(\theta)\) explains the observations \(X\). The model is fit by finding the values of \(\theta\) that minimize the cost function. Ideally we would be able to solve for \(\theta\) analytically; however, this is not possible in general, and we must use some approximate/numerical method to compute the minimum.

    +
    +
    +

    Revisiting our Logistic Regression case#

    +

    In our discussion on Logistic Regression we studied the +case of +two classes, with \(y_i\) either +\(0\) or \(1\). Furthermore we assumed also that we have only two +parameters \(\theta\) in our fitting, that is we +defined probabilities

    +
    +\[\begin{split} +\begin{align*} +p(y_i=1|x_i,\boldsymbol{\theta}) &= \frac{\exp{(\theta_0+\theta_1x_i)}}{1+\exp{(\theta_0+\theta_1x_i)}},\nonumber\\ +p(y_i=0|x_i,\boldsymbol{\theta}) &= 1 - p(y_i=1|x_i,\boldsymbol{\theta}), +\end{align*} +\end{split}\]
    +

    where \(\boldsymbol{\theta}\) are the weights we wish to extract from data, in our case \(\theta_0\) and \(\theta_1\).

    +
    +
    +

    The equations to solve#

    +

    Our compact equations used a definition of a vector \(\boldsymbol{y}\) with \(n\) +elements \(y_i\), an \(n\times p\) matrix \(\boldsymbol{X}\) which contains the +\(x_i\) values and a vector \(\boldsymbol{p}\) of fitted probabilities +\(p(y_i\vert x_i,\boldsymbol{\theta})\). We rewrote in a more compact form +the first derivative of the cost function as

    +
    +\[ +\frac{\partial \mathcal{C}(\boldsymbol{\theta})}{\partial \boldsymbol{\theta}} = -\boldsymbol{X}^T\left(\boldsymbol{y}-\boldsymbol{p}\right). +\]
    +

If we in addition define a diagonal matrix \(\boldsymbol{W}\) with elements +\(p(y_i\vert x_i,\boldsymbol{\theta})(1-p(y_i\vert x_i,\boldsymbol{\theta}))\), we can obtain a compact expression of the second derivative as

    +
    +\[ +\frac{\partial^2 \mathcal{C}(\boldsymbol{\theta})}{\partial \boldsymbol{\theta}\partial \boldsymbol{\theta}^T} = \boldsymbol{X}^T\boldsymbol{W}\boldsymbol{X}. +\]
    +

    This defines what is called the Hessian matrix.

    +
    +
    +

    Solving using Newton-Raphson’s method#

    +

If we can set up these equations, Newton-Raphson’s iterative method is normally the method of choice. It requires, however, that we can compute the matrices that define the first and second derivatives in an efficient way.

    +

    Our iterative scheme is then given by

    +
    +\[ +\boldsymbol{\theta}^{\mathrm{new}} = \boldsymbol{\theta}^{\mathrm{old}}-\left(\frac{\partial^2 \mathcal{C}(\boldsymbol{\theta})}{\partial \boldsymbol{\theta}\partial \boldsymbol{\theta}^T}\right)^{-1}_{\boldsymbol{\theta}^{\mathrm{old}}}\times \left(\frac{\partial \mathcal{C}(\boldsymbol{\theta})}{\partial \boldsymbol{\theta}}\right)_{\boldsymbol{\theta}^{\mathrm{old}}}, +\]
    +

    or in matrix form as

    +
    +\[ +\boldsymbol{\theta}^{\mathrm{new}} = \boldsymbol{\theta}^{\mathrm{old}}-\left(\boldsymbol{X}^T\boldsymbol{W}\boldsymbol{X} \right)^{-1}\times \left(-\boldsymbol{X}^T(\boldsymbol{y}-\boldsymbol{p}) \right)_{\boldsymbol{\theta}^{\mathrm{old}}}. +\]
    +

    The right-hand side is computed with the old values of \(\theta\).

    +

    If we can compute these matrices, in particular the Hessian, the above is often the easiest method to implement.
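A minimal sketch of this scheme (assuming a design matrix X whose first column is ones and binary targets y; for linearly separable data the iteration may diverge, so this is illustrative only):

+import numpy as np
+
+def newton_raphson_logreg(X, y, n_iter=20):
+    """Newton-Raphson iterations, theta^new = theta^old - H^{-1} * gradient."""
+    theta = np.zeros(X.shape[1])
+    for _ in range(n_iter):
+        p = 1.0/(1.0 + np.exp(-X @ theta))   # fitted probabilities
+        grad = -X.T @ (y - p)
+        W = np.diag(p*(1.0 - p))
+        hessian = X.T @ W @ X
+        theta = theta - np.linalg.solve(hessian, grad)
+    return theta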

    +
    +
    +

    Example code for Logistic Regression#

    +

    Here we make a class for Logistic regression. The code uses a simple data set and includes both a binary case and a multiclass case.

    +
    +
    +
    import numpy as np
    +
    +class LogisticRegression:
    +    """
    +    Logistic Regression for binary and multiclass classification.
    +    """
    +    def __init__(self, lr=0.01, epochs=1000, fit_intercept=True, verbose=False):
    +        self.lr = lr                  # Learning rate for gradient descent
    +        self.epochs = epochs          # Number of iterations
    +        self.fit_intercept = fit_intercept  # Whether to add intercept (bias)
    +        self.verbose = verbose        # Print loss during training if True
    +        self.weights = None
    +        self.multi_class = False      # Will be determined at fit time
    +
    +    def _add_intercept(self, X):
    +        """Add intercept term (column of ones) to feature matrix."""
    +        intercept = np.ones((X.shape[0], 1))
    +        return np.concatenate((intercept, X), axis=1)
    +
    +    def _sigmoid(self, z):
    +        """Sigmoid function for binary logistic."""
    +        return 1 / (1 + np.exp(-z))
    +
    +    def _softmax(self, Z):
    +        """Softmax function for multiclass logistic."""
    +        exp_Z = np.exp(Z - np.max(Z, axis=1, keepdims=True))
    +        return exp_Z / np.sum(exp_Z, axis=1, keepdims=True)
    +
    +    def fit(self, X, y):
    +        """
    +        Train the logistic regression model using gradient descent.
    +        Supports binary (sigmoid) and multiclass (softmax) based on y.
    +        """
    +        X = np.array(X)
    +        y = np.array(y)
    +        n_samples, n_features = X.shape
    +
    +        # Add intercept if needed
    +        if self.fit_intercept:
    +            X = self._add_intercept(X)
    +            n_features += 1
    +
    +        # Determine classes and mode (binary vs multiclass)
    +        unique_classes = np.unique(y)
    +        if len(unique_classes) > 2:
    +            self.multi_class = True
    +        else:
    +            self.multi_class = False
    +
    +        # ----- Multiclass case -----
    +        if self.multi_class:
    +            n_classes = len(unique_classes)
    +            # Map original labels to 0...n_classes-1
    +            class_to_index = {c: idx for idx, c in enumerate(unique_classes)}
    +            y_indices = np.array([class_to_index[c] for c in y])
    +            # Initialize weight matrix (features x classes)
    +            self.weights = np.zeros((n_features, n_classes))
    +
    +            # One-hot encode y
    +            Y_onehot = np.zeros((n_samples, n_classes))
    +            Y_onehot[np.arange(n_samples), y_indices] = 1
    +
    +            # Gradient descent
    +            for epoch in range(self.epochs):
    +                scores = X.dot(self.weights)          # Linear scores (n_samples x n_classes)
    +                probs = self._softmax(scores)        # Probabilities (n_samples x n_classes)
    +                # Compute gradient (features x classes)
    +                gradient = (1 / n_samples) * X.T.dot(probs - Y_onehot)
    +                # Update weights
    +                self.weights -= self.lr * gradient
    +
    +                if self.verbose and epoch % 100 == 0:
    +                    # Compute current loss (categorical cross-entropy)
    +                    loss = -np.sum(Y_onehot * np.log(probs + 1e-15)) / n_samples
    +                    print(f"[Epoch {epoch}] Multiclass loss: {loss:.4f}")
    +
    +        # ----- Binary case -----
    +        else:
    +            # Convert y to 0/1 if not already
    +            if not np.array_equal(unique_classes, [0, 1]):
    +                # Map the two classes to 0 and 1
    +                class0, class1 = unique_classes
    +                y_binary = np.where(y == class1, 1, 0)
    +            else:
    +                y_binary = y.copy().astype(int)
    +
    +            # Initialize weights vector (features,)
    +            self.weights = np.zeros(n_features)
    +
    +            # Gradient descent
    +            for epoch in range(self.epochs):
    +                linear_model = X.dot(self.weights)     # (n_samples,)
    +                probs = self._sigmoid(linear_model)   # (n_samples,)
    +                # Gradient for binary cross-entropy
    +                gradient = (1 / n_samples) * X.T.dot(probs - y_binary)
    +                self.weights -= self.lr * gradient
    +
    +                if self.verbose and epoch % 100 == 0:
    +                    # Compute binary cross-entropy loss
    +                    loss = -np.mean(
    +                        y_binary * np.log(probs + 1e-15) + 
    +                        (1 - y_binary) * np.log(1 - probs + 1e-15)
    +                    )
    +                    print(f"[Epoch {epoch}] Binary loss: {loss:.4f}")
    +
    +    def predict_prob(self, X):
    +        """
    +        Compute probability estimates. Returns a 1D array for binary or
    +        a 2D array (n_samples x n_classes) for multiclass.
    +        """
    +        X = np.array(X)
    +        # Add intercept if the model used it
    +        if self.fit_intercept:
    +            X = self._add_intercept(X)
    +        scores = X.dot(self.weights)
    +        if self.multi_class:
    +            return self._softmax(scores)
    +        else:
    +            return self._sigmoid(scores)
    +
    +    def predict(self, X):
    +        """
    +        Predict class labels for samples in X.
    +        Returns integer class labels (0,1 for binary, or 0...C-1 for multiclass).
    +        """
    +        probs = self.predict_prob(X)
    +        if self.multi_class:
    +            # Choose class with highest probability
    +            return np.argmax(probs, axis=1)
    +        else:
    +            # Threshold at 0.5 for binary
    +            return (probs >= 0.5).astype(int)
    +
    +
    +
    +
    +

    The class implements the sigmoid and softmax internally. During fit(), +we check the number of classes: if more than 2, we set +self.multi_class=True and perform multinomial logistic regression. We +one-hot encode the target vector and update a weight matrix with +softmax probabilities. Otherwise, we do standard binary logistic +regression, converting labels to 0/1 if needed and updating a weight +vector. In both cases we use batch gradient descent on the +cross-entropy loss (we add a small epsilon 1e-15 to logs for numerical +stability). Progress (loss) can be printed if verbose=True.

    +
    +
    +
    # Evaluation Metrics
+# We define helper functions for accuracy and cross-entropy loss. Accuracy is the fraction of correct predictions. For the loss, we compute the appropriate cross-entropy:
    +
    +def accuracy_score(y_true, y_pred):
    +    """Accuracy = (# correct predictions) / (total samples)."""
    +    y_true = np.array(y_true)
    +    y_pred = np.array(y_pred)
    +    return np.mean(y_true == y_pred)
    +
    +def binary_cross_entropy(y_true, y_prob):
    +    """
    +    Binary cross-entropy loss.
    +    y_true: true binary labels (0 or 1), y_prob: predicted probabilities for class 1.
    +    """
    +    y_true = np.array(y_true)
    +    y_prob = np.clip(np.array(y_prob), 1e-15, 1-1e-15)
    +    return -np.mean(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))
    +
    +def categorical_cross_entropy(y_true, y_prob):
    +    """
    +    Categorical cross-entropy loss for multiclass.
    +    y_true: true labels (0...C-1), y_prob: array of predicted probabilities (n_samples x C).
    +    """
    +    y_true = np.array(y_true, dtype=int)
    +    y_prob = np.clip(np.array(y_prob), 1e-15, 1-1e-15)
    +    # One-hot encode true labels
    +    n_samples, n_classes = y_prob.shape
    +    one_hot = np.zeros_like(y_prob)
    +    one_hot[np.arange(n_samples), y_true] = 1
    +    # Compute cross-entropy
    +    loss_vec = -np.sum(one_hot * np.log(y_prob), axis=1)
    +    return np.mean(loss_vec)
    +
    +
    +
    +
    +
    +

    Synthetic data generation#

    +

    Binary classification data: Create two Gaussian clusters in 2D. For example, class 0 around mean [-2,-2] and class 1 around [2,2]. +Multiclass data: Create several Gaussian clusters (one per class) spread out in feature space.

    +
    +
    +
    import numpy as np
    +
    +def generate_binary_data(n_samples=100, n_features=2, random_state=None):
    +    """
    +    Generate synthetic binary classification data.
    +    Returns (X, y) where X is (n_samples x n_features), y in {0,1}.
    +    """
    +    rng = np.random.RandomState(random_state)
    +    # Half samples for class 0, half for class 1
    +    n0 = n_samples // 2
    +    n1 = n_samples - n0
    +    # Class 0 around mean -2, class 1 around +2
    +    mean0 = -2 * np.ones(n_features)
    +    mean1 =  2 * np.ones(n_features)
    +    X0 = rng.randn(n0, n_features) + mean0
    +    X1 = rng.randn(n1, n_features) + mean1
    +    X = np.vstack((X0, X1))
    +    y = np.array([0]*n0 + [1]*n1)
    +    return X, y
    +
    +def generate_multiclass_data(n_samples=150, n_features=2, n_classes=3, random_state=None):
    +    """
    +    Generate synthetic multiclass data with n_classes Gaussian clusters.
    +    """
    +    rng = np.random.RandomState(random_state)
    +    X = []
    +    y = []
    +    samples_per_class = n_samples // n_classes
    +    for cls in range(n_classes):
    +        # Random cluster center for each class
    +        center = rng.uniform(-5, 5, size=n_features)
    +        Xi = rng.randn(samples_per_class, n_features) + center
    +        yi = [cls] * samples_per_class
    +        X.append(Xi)
    +        y.extend(yi)
    +    X = np.vstack(X)
    +    y = np.array(y)
    +    return X, y
    +
    +
    +# Generate and test on binary data
    +X_bin, y_bin = generate_binary_data(n_samples=200, n_features=2, random_state=42)
    +model_bin = LogisticRegression(lr=0.1, epochs=1000)
    +model_bin.fit(X_bin, y_bin)
    +y_prob_bin = model_bin.predict_prob(X_bin)      # probabilities for class 1
    +y_pred_bin = model_bin.predict(X_bin)           # predicted classes 0 or 1
    +
    +acc_bin = accuracy_score(y_bin, y_pred_bin)
    +loss_bin = binary_cross_entropy(y_bin, y_prob_bin)
    +print(f"Binary Classification - Accuracy: {acc_bin:.2f}, Cross-Entropy Loss: {loss_bin:.2f}")
    +#For multiclass:
    +# Generate and test on multiclass data
    +X_multi, y_multi = generate_multiclass_data(n_samples=300, n_features=2, n_classes=3, random_state=1)
    +model_multi = LogisticRegression(lr=0.1, epochs=1000)
    +model_multi.fit(X_multi, y_multi)
    +y_prob_multi = model_multi.predict_prob(X_multi)     # (n_samples x 3) probabilities
    +y_pred_multi = model_multi.predict(X_multi)          # predicted labels 0,1,2
    +
    +acc_multi = accuracy_score(y_multi, y_pred_multi)
    +loss_multi = categorical_cross_entropy(y_multi, y_prob_multi)
    +print(f"Multiclass Classification - Accuracy: {acc_multi:.2f}, Cross-Entropy Loss: {loss_multi:.2f}")
    +
    +# CSV Export
    +import csv
    +
    +# Export binary results
    +with open('binary_results.csv', mode='w', newline='') as f:
    +    writer = csv.writer(f)
    +    writer.writerow(["TrueLabel", "PredictedLabel"])
    +    for true, pred in zip(y_bin, y_pred_bin):
    +        writer.writerow([true, pred])
    +
    +# Export multiclass results
    +with open('multiclass_results.csv', mode='w', newline='') as f:
    +    writer = csv.writer(f)
    +    writer.writerow(["TrueLabel", "PredictedLabel"])
    +    for true, pred in zip(y_multi, y_pred_multi):
    +        writer.writerow([true, pred])
    +
    +
    +
    +
    +
    +
    +
\ No newline at end of file diff --git a/doc/LectureNotes/_build/html/week40.html b/doc/LectureNotes/_build/html/week40.html new file mode 100644 index 000000000..3a6944e4d --- /dev/null +++ b/doc/LectureNotes/_build/html/week40.html @@ -0,0 +1,2002 @@ +Week 40: Gradient descent methods (continued) and start Neural networks — Applied Data Analysis and Machine Learning


    Week 40: Gradient descent methods (continued) and start Neural networks#

    +

    Morten Hjorth-Jensen, Department of Physics, University of Oslo, Norway

    +

    Date: September 29-October 3, 2025

    +
    +

    Lecture Monday September 29, 2025#

    +
1. Logistic regression and gradient descent, examples on how to code

2. Start with the basics of Neural Networks, setting up the basic steps, from the simple perceptron model to the multi-layer perceptron model

3. Video of lecture at https://youtu.be/MS3Tv8FVArs

4. Whiteboard notes at CompPhysics/MachineLearning
    +
    +
    +

    Suggested readings and videos#

    +

    Readings and Videos:

    +
1. The lecture notes for week 40 (these notes)

2. For neural networks we recommend Goodfellow et al chapter 6 and Raschka et al chapter 2 (contains also material about gradient descent) and chapter 11 (we will use this next week)

3. Neural Networks demystified at https://www.youtube.com/watch?v=bxe2T-V8XRs&list=PLiaHhY2iBX9hdHaRr6b7XevZtgZRa1PoU&ab_channel=WelchLabs

4. Building Neural Networks from scratch at https://www.youtube.com/watch?v=Wo5dMEP_BbI&list=PLQVvvaa0QuDcjD5BAw2DxE6OF2tius3V3&ab_channel=sentdex
    +
    +
    +

    Lab sessions Tuesday and Wednesday#

    +

    Material for the active learning sessions on Tuesday and Wednesday.

    +
• Work on project 1 and discussions on how to structure your report

• No weekly exercises for week 40, project work only

• Video on how to write scientific reports recorded during one of the lab sessions at https://youtu.be/tVW1ZDmZnwM

• A general guideline can be found at CompPhysics/MachineLearning.
    +
    +
    +

    Logistic Regression, from last week#

    +

    In linear regression our main interest was centered on learning the +coefficients of a functional fit (say a polynomial) in order to be +able to predict the response of a continuous variable on some unseen +data. The fit to the continuous variable \(y_i\) is based on some +independent variables \(\boldsymbol{x}_i\). Linear regression resulted in +analytical expressions for standard ordinary Least Squares or Ridge +regression (in terms of matrices to invert) for several quantities, +ranging from the variance and thereby the confidence intervals of the +parameters \(\boldsymbol{\theta}\) to the mean squared error. If we can invert +the product of the design matrices, linear regression gives then a +simple recipe for fitting our data.

    +
    +
    +

    Classification problems#

    +

    Classification problems, however, are concerned with outcomes taking +the form of discrete variables (i.e. categories). We may for example, +on the basis of DNA sequencing for a number of patients, like to find +out which mutations are important for a certain disease; or based on +scans of various patients’ brains, figure out if there is a tumor or +not; or given a specific physical system, we’d like to identify its +state, say whether it is an ordered or disordered system (typical +situation in solid state physics); or classify the status of a +patient, whether she/he has a stroke or not and many other similar +situations.

    +

    The most common situation we encounter when we apply logistic +regression is that of two possible outcomes, normally denoted as a +binary outcome, true or false, positive or negative, success or +failure etc.

    +
    +
    +

    Optimization and Deep learning#

    +

    Logistic regression will also serve as our stepping stone towards +neural network algorithms and supervised deep learning. For logistic +learning, the minimization of the cost function leads to a non-linear +equation in the parameters \(\boldsymbol{\theta}\). The optimization of the +problem calls therefore for minimization algorithms.

    +

    As we have discussed earlier, this forms the +bottle neck of all machine learning algorithms, namely how to find +reliable minima of a multi-variable function. This leads us to the +family of gradient descent methods. The latter are the working horses +of basically all modern machine learning algorithms.

    +

    We note also that many of the topics discussed here on logistic +regression are also commonly used in modern supervised Deep Learning +models, as we will see later.

    +
    +
    +

    Basics#

    +

    We consider the case where the outputs/targets, also called the +responses or the outcomes, \(y_i\) are discrete and only take values +from \(k=0,\dots,K-1\) (i.e. \(K\) classes).

    +

    The goal is to predict the +output classes from the design matrix \(\boldsymbol{X}\in\mathbb{R}^{n\times p}\) +made of \(n\) samples, each of which carries \(p\) features or predictors. The +primary goal is to identify the classes to which new unseen samples +belong.

    +

    Last week we specialized to the case of two classes only, with outputs +\(y_i=0\) and \(y_i=1\). Our outcomes could represent the status of a +credit card user that could default or not on her/his credit card +debt. That is

    +
    +\[\begin{split} +y_i = \begin{bmatrix} 0 & \mathrm{no}\\ 1 & \mathrm{yes} \end{bmatrix}. +\end{split}\]
    +
    +
    +

    Two parameters#

    +

    We assume now that we have two classes with \(y_i\) either \(0\) or \(1\). Furthermore we assume also that we have only two parameters \(\theta\) in our fitting of the Sigmoid function, that is we define probabilities

    +
    +\[\begin{split} +\begin{align*} +p(y_i=1|x_i,\boldsymbol{\theta}) &= \frac{\exp{(\theta_0+\theta_1x_i)}}{1+\exp{(\theta_0+\theta_1x_i)}},\nonumber\\ +p(y_i=0|x_i,\boldsymbol{\theta}) &= 1 - p(y_i=1|x_i,\boldsymbol{\theta}), +\end{align*} +\end{split}\]
    +

    where \(\boldsymbol{\theta}\) are the weights we wish to extract from data, in our case \(\theta_0\) and \(\theta_1\).

    +

    Note that we used

    +
    +\[ +p(y_i=0\vert x_i, \boldsymbol{\theta}) = 1-p(y_i=1\vert x_i, \boldsymbol{\theta}). +\]
    +
    +
    +

    Maximum likelihood#

    +

    In order to define the total likelihood for all possible outcomes from a
    +dataset \(\mathcal{D}=\{(y_i,x_i)\}\), with the binary labels +\(y_i\in\{0,1\}\) and where the data points are drawn independently, we use the so-called Maximum Likelihood Estimation (MLE) principle. +We aim thus at maximizing +the probability of seeing the observed data. We can then approximate the +likelihood in terms of the product of the individual probabilities of a specific outcome \(y_i\), that is

    +
+\[\begin{split} +\begin{align*} +P(\mathcal{D}|\boldsymbol{\theta})& = \prod_{i=1}^n \left[p(y_i=1|x_i,\boldsymbol{\theta})\right]^{y_i}\left[1-p(y_i=1|x_i,\boldsymbol{\theta})\right]^{1-y_i}\nonumber \\ +\end{align*} +\end{split}\]
    +

    from which we obtain the log-likelihood and our cost/loss function

    +
+\[ +\mathcal{C}(\boldsymbol{\theta}) = \sum_{i=1}^n \left( y_i\log{p(y_i=1|x_i,\boldsymbol{\theta})} + (1-y_i)\log\left[1-p(y_i=1|x_i,\boldsymbol{\theta})\right]\right). +\]
    +
    +
    +

    The cost function rewritten#

    +

    Reordering the logarithms, we can rewrite the cost/loss function as

    +
    +\[ +\mathcal{C}(\boldsymbol{\theta}) = \sum_{i=1}^n \left(y_i(\theta_0+\theta_1x_i) -\log{(1+\exp{(\theta_0+\theta_1x_i)})}\right). +\]
    +

The maximum likelihood estimator is defined as the set of parameters \(\boldsymbol{\theta}\) that maximizes the log-likelihood. Since the cost (error) function is just the negative log-likelihood, for logistic regression we have

    +
    +\[ +\mathcal{C}(\boldsymbol{\theta})=-\sum_{i=1}^n \left(y_i(\theta_0+\theta_1x_i) -\log{(1+\exp{(\theta_0+\theta_1x_i)})}\right). +\]
    +

    This equation is known in statistics as the cross entropy. Finally, we note that just as in linear regression, +in practice we often supplement the cross-entropy with additional regularization terms, usually \(L_1\) and \(L_2\) regularization as we did for Ridge and Lasso regression.
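Since the notes mention \(L_2\) regularization here, a brief sketch (with a hypothetical regularization strength lam; function names are illustrative) of how a ridge-like penalty enters the cross entropy and its gradient:

+import numpy as np
+
+def cross_entropy_l2(theta, X, y, lam=0.01):
+    """Cross entropy with an added L2 penalty on the parameters."""
+    z = X @ theta
+    return -np.sum(y*z - np.logaddexp(0.0, z)) + lam*np.sum(theta**2)
+
+def gradient_l2(theta, X, y, lam=0.01):
+    """Gradient of the penalized cross entropy."""
+    p = 1.0/(1.0 + np.exp(-X @ theta))
+    return -X.T @ (y - p) + 2.0*lam*theta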

    +
    +
    +

    Minimizing the cross entropy#

    +

    The cross entropy is a convex function of the weights \(\boldsymbol{\theta}\) and, +therefore, any local minimizer is a global minimizer.

    +

    Minimizing this +cost function with respect to the two parameters \(\theta_0\) and \(\theta_1\) we obtain

    +
    +\[ +\frac{\partial \mathcal{C}(\boldsymbol{\theta})}{\partial \theta_0} = -\sum_{i=1}^n \left(y_i -\frac{\exp{(\theta_0+\theta_1x_i)}}{1+\exp{(\theta_0+\theta_1x_i)}}\right), +\]
    +

    and

    +
    +\[ +\frac{\partial \mathcal{C}(\boldsymbol{\theta})}{\partial \theta_1} = -\sum_{i=1}^n \left(y_ix_i -x_i\frac{\exp{(\theta_0+\theta_1x_i)}}{1+\exp{(\theta_0+\theta_1x_i)}}\right). +\]
    +
    +
    +

    A more compact expression#

    +

    Let us now define a vector \(\boldsymbol{y}\) with \(n\) elements \(y_i\), an +\(n\times p\) matrix \(\boldsymbol{X}\) which contains the \(x_i\) values and a +vector \(\boldsymbol{p}\) of fitted probabilities \(p(y_i\vert x_i,\boldsymbol{\theta})\). We can rewrite in a more compact form the first +derivative of the cost function as

    +
    +\[ +\frac{\partial \mathcal{C}(\boldsymbol{\theta})}{\partial \boldsymbol{\theta}} = -\boldsymbol{X}^T\left(\boldsymbol{y}-\boldsymbol{p}\right). +\]
    +

If we in addition define a diagonal matrix \(\boldsymbol{W}\) with elements +\(p(y_i\vert x_i,\boldsymbol{\theta})(1-p(y_i\vert x_i,\boldsymbol{\theta}))\), we can obtain a compact expression of the second derivative as

    +
    +\[ +\frac{\partial^2 \mathcal{C}(\boldsymbol{\theta})}{\partial \boldsymbol{\theta}\partial \boldsymbol{\theta}^T} = \boldsymbol{X}^T\boldsymbol{W}\boldsymbol{X}. +\]
    +
    +
    +

    Extending to more predictors#

    +

Within a binary classification problem, we can easily expand our model to include multiple predictors. The logarithm of the ratio between the two likelihoods (the log-odds) is then, with \(p\) predictors,

    +
    +\[ +\log{ \frac{p(\boldsymbol{\theta}\boldsymbol{x})}{1-p(\boldsymbol{\theta}\boldsymbol{x})}} = \theta_0+\theta_1x_1+\theta_2x_2+\dots+\theta_px_p. +\]
    +

    Here we defined \(\boldsymbol{x}=[1,x_1,x_2,\dots,x_p]\) and \(\boldsymbol{\theta}=[\theta_0, \theta_1, \dots, \theta_p]\) leading to

    +
    +\[ +p(\boldsymbol{\theta}\boldsymbol{x})=\frac{ \exp{(\theta_0+\theta_1x_1+\theta_2x_2+\dots+\theta_px_p)}}{1+\exp{(\theta_0+\theta_1x_1+\theta_2x_2+\dots+\theta_px_p)}}. +\]
    +
    +
    +

    Including more classes#

    +

Till now we have mainly focused on two classes, the so-called binary system. Suppose we wish to extend to \(K\) classes. Let us for the sake of simplicity assume we have only two predictors. We then have the following model

    +
+\[ +\log{\frac{p(C=1\vert x)}{p(C=K\vert x)}} = \theta_{10}+\theta_{11}x_1, +\]
    +

    and

    +
+\[ +\log{\frac{p(C=2\vert x)}{p(C=K\vert x)}} = \theta_{20}+\theta_{21}x_1, +\]
    +

and so on up to the class \(C=K-1\),

    +
+\[ +\log{\frac{p(C=K-1\vert x)}{p(C=K\vert x)}} = \theta_{(K-1)0}+\theta_{(K-1)1}x_1, +\]
    +

and the model is specified in terms of \(K-1\) so-called log-odds or logit transformations.

    +
    +
    +

    More classes#

    +

    In our discussion of neural networks we will encounter the above again +in terms of a slightly modified function, the so-called Softmax function.

    +

    The softmax function is used in various multiclass classification +methods, such as multinomial logistic regression (also known as +softmax regression), multiclass linear discriminant analysis, naive +Bayes classifiers, and artificial neural networks. Specifically, in +multinomial logistic regression and linear discriminant analysis, the +input to the function is the result of \(K\) distinct linear functions, +and the predicted probability for the \(k\)-th class given a sample +vector \(\boldsymbol{x}\) and a weighting vector \(\boldsymbol{\theta}\) is (with two +predictors):

    +
    +\[ +p(C=k\vert \mathbf {x} )=\frac{\exp{(\theta_{k0}+\theta_{k1}x_1)}}{1+\sum_{l=1}^{K-1}\exp{(\theta_{l0}+\theta_{l1}x_1)}}. +\]
    +

    It is easy to extend to more predictors. The final class is

    +
    +\[ +p(C=K\vert \mathbf {x} )=\frac{1}{1+\sum_{l=1}^{K-1}\exp{(\theta_{l0}+\theta_{l1}x_1)}}, +\]
    +

    and they sum to one. Our earlier discussions were all specialized to +the case with two classes only. It is easy to see from the above that +what we derived earlier is compatible with these equations.

    +

    To find the optimal parameters we would typically use a gradient +descent method. Newton’s method and gradient descent methods are +discussed in the material on optimization +methods.
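As a minimal sketch of such a plain gradient descent update for the binary case (assuming a design matrix X with an intercept column; the learning rate eta and the number of iterations are arbitrary illustrative choices):

+import numpy as np
+
+def gradient_descent_logreg(X, y, eta=0.01, n_iter=1000):
+    """Plain gradient descent on the logistic cross entropy: theta <- theta - eta*gradient."""
+    theta = np.zeros(X.shape[1])
+    for _ in range(n_iter):
+        p = 1.0/(1.0 + np.exp(-X @ theta))   # fitted probabilities
+        theta = theta - eta * (-X.T @ (y - p))
+    return theta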

    +
    +
    +

Optimization, the central part of any Machine Learning algorithm#

    +

Almost every problem in machine learning and data science starts with a dataset \(X\), a model \(g(\theta)\), which is a function of the parameters \(\theta\), and a cost function \(C(X, g(\theta))\) that allows us to judge how well the model \(g(\theta)\) explains the observations \(X\). The model is fit by finding the values of \(\theta\) that minimize the cost function. Ideally we would be able to solve for \(\theta\) analytically; however, this is not possible in general, and we must use some approximate/numerical method to compute the minimum.

    +
    +
    +

    Revisiting our Logistic Regression case#

    +

    In our discussion on Logistic Regression we studied the +case of +two classes, with \(y_i\) either +\(0\) or \(1\). Furthermore we assumed also that we have only two +parameters \(\theta\) in our fitting, that is we +defined probabilities

    +
    +\[\begin{split} +\begin{align*} +p(y_i=1|x_i,\boldsymbol{\theta}) &= \frac{\exp{(\theta_0+\theta_1x_i)}}{1+\exp{(\theta_0+\theta_1x_i)}},\nonumber\\ +p(y_i=0|x_i,\boldsymbol{\theta}) &= 1 - p(y_i=1|x_i,\boldsymbol{\theta}), +\end{align*} +\end{split}\]
    +

    where \(\boldsymbol{\theta}\) are the weights we wish to extract from data, in our case \(\theta_0\) and \(\theta_1\).

    +
    +
    +

    The equations to solve#

    +

    Our compact equations used a definition of a vector \(\boldsymbol{y}\) with \(n\) +elements \(y_i\), an \(n\times p\) matrix \(\boldsymbol{X}\) which contains the +\(x_i\) values and a vector \(\boldsymbol{p}\) of fitted probabilities +\(p(y_i\vert x_i,\boldsymbol{\theta})\). We rewrote in a more compact form +the first derivative of the cost function as

    +
    +\[ +\frac{\partial \mathcal{C}(\boldsymbol{\theta})}{\partial \boldsymbol{\theta}} = -\boldsymbol{X}^T\left(\boldsymbol{y}-\boldsymbol{p}\right). +\]
    +

If we in addition define a diagonal matrix \(\boldsymbol{W}\) with elements +\(p(y_i\vert x_i,\boldsymbol{\theta})(1-p(y_i\vert x_i,\boldsymbol{\theta}))\), we can obtain a compact expression of the second derivative as

    +
    +\[ +\frac{\partial^2 \mathcal{C}(\boldsymbol{\theta})}{\partial \boldsymbol{\theta}\partial \boldsymbol{\theta}^T} = \boldsymbol{X}^T\boldsymbol{W}\boldsymbol{X}. +\]
    +

    This defines what is called the Hessian matrix.

    +
    +
    +

    Solving using Newton-Raphson’s method#

    +

If we can set up these equations, Newton-Raphson’s iterative method is normally the method of choice. It requires, however, that we can compute the matrices that define the first and second derivatives in an efficient way.

    +

    Our iterative scheme is then given by

    +
    +\[ +\boldsymbol{\theta}^{\mathrm{new}} = \boldsymbol{\theta}^{\mathrm{old}}-\left(\frac{\partial^2 \mathcal{C}(\boldsymbol{\theta})}{\partial \boldsymbol{\theta}\partial \boldsymbol{\theta}^T}\right)^{-1}_{\boldsymbol{\theta}^{\mathrm{old}}}\times \left(\frac{\partial \mathcal{C}(\boldsymbol{\theta})}{\partial \boldsymbol{\theta}}\right)_{\boldsymbol{\theta}^{\mathrm{old}}}, +\]
    +

    or in matrix form as

    +
    +\[ +\boldsymbol{\theta}^{\mathrm{new}} = \boldsymbol{\theta}^{\mathrm{old}}-\left(\boldsymbol{X}^T\boldsymbol{W}\boldsymbol{X} \right)^{-1}\times \left(-\boldsymbol{X}^T(\boldsymbol{y}-\boldsymbol{p}) \right)_{\boldsymbol{\theta}^{\mathrm{old}}}. +\]
    +

    The right-hand side is computed with the old values of \(\theta\).

    +

    If we can compute these matrices, in particular the Hessian, the above is often the easiest method to implement.

    +
    +
    +

    Example code for Logistic Regression#

    +

    Here we make a class for Logistic regression. The code uses a simple data set and includes both a binary case and a multiclass case.

    +
    +
    +
    import numpy as np
    +
    +class LogisticRegression:
    +    """
    +    Logistic Regression for binary and multiclass classification.
    +    """
    +    def __init__(self, lr=0.01, epochs=1000, fit_intercept=True, verbose=False):
    +        self.lr = lr                  # Learning rate for gradient descent
    +        self.epochs = epochs          # Number of iterations
    +        self.fit_intercept = fit_intercept  # Whether to add intercept (bias)
    +        self.verbose = verbose        # Print loss during training if True
    +        self.weights = None
    +        self.multi_class = False      # Will be determined at fit time
    +
    +    def _add_intercept(self, X):
    +        """Add intercept term (column of ones) to feature matrix."""
    +        intercept = np.ones((X.shape[0], 1))
    +        return np.concatenate((intercept, X), axis=1)
    +
    +    def _sigmoid(self, z):
    +        """Sigmoid function for binary logistic."""
    +        return 1 / (1 + np.exp(-z))
    +
    +    def _softmax(self, Z):
    +        """Softmax function for multiclass logistic."""
    +        exp_Z = np.exp(Z - np.max(Z, axis=1, keepdims=True))
    +        return exp_Z / np.sum(exp_Z, axis=1, keepdims=True)
    +
    +    def fit(self, X, y):
    +        """
    +        Train the logistic regression model using gradient descent.
    +        Supports binary (sigmoid) and multiclass (softmax) based on y.
    +        """
    +        X = np.array(X)
    +        y = np.array(y)
    +        n_samples, n_features = X.shape
    +
    +        # Add intercept if needed
    +        if self.fit_intercept:
    +            X = self._add_intercept(X)
    +            n_features += 1
    +
    +        # Determine classes and mode (binary vs multiclass)
    +        unique_classes = np.unique(y)
    +        if len(unique_classes) > 2:
    +            self.multi_class = True
    +        else:
    +            self.multi_class = False
    +
    +        # ----- Multiclass case -----
    +        if self.multi_class:
    +            n_classes = len(unique_classes)
    +            # Map original labels to 0...n_classes-1
    +            class_to_index = {c: idx for idx, c in enumerate(unique_classes)}
    +            y_indices = np.array([class_to_index[c] for c in y])
    +            # Initialize weight matrix (features x classes)
    +            self.weights = np.zeros((n_features, n_classes))
    +
    +            # One-hot encode y
    +            Y_onehot = np.zeros((n_samples, n_classes))
    +            Y_onehot[np.arange(n_samples), y_indices] = 1
    +
    +            # Gradient descent
    +            for epoch in range(self.epochs):
    +                scores = X.dot(self.weights)          # Linear scores (n_samples x n_classes)
    +                probs = self._softmax(scores)        # Probabilities (n_samples x n_classes)
    +                # Compute gradient (features x classes)
    +                gradient = (1 / n_samples) * X.T.dot(probs - Y_onehot)
    +                # Update weights
    +                self.weights -= self.lr * gradient
    +
    +                if self.verbose and epoch % 100 == 0:
    +                    # Compute current loss (categorical cross-entropy)
    +                    loss = -np.sum(Y_onehot * np.log(probs + 1e-15)) / n_samples
    +                    print(f"[Epoch {epoch}] Multiclass loss: {loss:.4f}")
    +
    +        # ----- Binary case -----
    +        else:
    +            # Convert y to 0/1 if not already
    +            if not np.array_equal(unique_classes, [0, 1]):
    +                # Map the two classes to 0 and 1
    +                class0, class1 = unique_classes
    +                y_binary = np.where(y == class1, 1, 0)
    +            else:
    +                y_binary = y.copy().astype(int)
    +
    +            # Initialize weights vector (features,)
    +            self.weights = np.zeros(n_features)
    +
    +            # Gradient descent
    +            for epoch in range(self.epochs):
    +                linear_model = X.dot(self.weights)     # (n_samples,)
    +                probs = self._sigmoid(linear_model)   # (n_samples,)
    +                # Gradient for binary cross-entropy
    +                gradient = (1 / n_samples) * X.T.dot(probs - y_binary)
    +                self.weights -= self.lr * gradient
    +
    +                if self.verbose and epoch % 100 == 0:
    +                    # Compute binary cross-entropy loss
    +                    loss = -np.mean(
    +                        y_binary * np.log(probs + 1e-15) + 
    +                        (1 - y_binary) * np.log(1 - probs + 1e-15)
    +                    )
    +                    print(f"[Epoch {epoch}] Binary loss: {loss:.4f}")
    +
    +    def predict_prob(self, X):
    +        """
    +        Compute probability estimates. Returns a 1D array for binary or
    +        a 2D array (n_samples x n_classes) for multiclass.
    +        """
    +        X = np.array(X)
    +        # Add intercept if the model used it
    +        if self.fit_intercept:
    +            X = self._add_intercept(X)
    +        scores = X.dot(self.weights)
    +        if self.multi_class:
    +            return self._softmax(scores)
    +        else:
    +            return self._sigmoid(scores)
    +
    +    def predict(self, X):
    +        """
    +        Predict class labels for samples in X.
    +        Returns integer class labels (0,1 for binary, or 0...C-1 for multiclass).
    +        """
    +        probs = self.predict_prob(X)
    +        if self.multi_class:
    +            # Choose class with highest probability
    +            return np.argmax(probs, axis=1)
    +        else:
    +            # Threshold at 0.5 for binary
    +            return (probs >= 0.5).astype(int)
    +
    +
    +
    +
    +

    The class implements the sigmoid and softmax internally. During fit(), +we check the number of classes: if more than 2, we set +self.multi_class=True and perform multinomial logistic regression. We +one-hot encode the target vector and update a weight matrix with +softmax probabilities. Otherwise, we do standard binary logistic +regression, converting labels to 0/1 if needed and updating a weight +vector. In both cases we use batch gradient descent on the +cross-entropy loss (we add a small epsilon 1e-15 to logs for numerical +stability). Progress (loss) can be printed if verbose=True.

    +
    +
    +
    # Evaluation Metrics
    +#We define helper functions for accuracy and cross-entropy loss. Accuracy is the fraction of correct predictions . For loss, we compute the appropriate cross-entropy:
    +
    +def accuracy_score(y_true, y_pred):
    +    """Accuracy = (# correct predictions) / (total samples)."""
    +    y_true = np.array(y_true)
    +    y_pred = np.array(y_pred)
    +    return np.mean(y_true == y_pred)
    +
    +def binary_cross_entropy(y_true, y_prob):
    +    """
    +    Binary cross-entropy loss.
    +    y_true: true binary labels (0 or 1), y_prob: predicted probabilities for class 1.
    +    """
    +    y_true = np.array(y_true)
    +    y_prob = np.clip(np.array(y_prob), 1e-15, 1-1e-15)
    +    return -np.mean(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))
    +
    +def categorical_cross_entropy(y_true, y_prob):
    +    """
    +    Categorical cross-entropy loss for multiclass.
    +    y_true: true labels (0...C-1), y_prob: array of predicted probabilities (n_samples x C).
    +    """
    +    y_true = np.array(y_true, dtype=int)
    +    y_prob = np.clip(np.array(y_prob), 1e-15, 1-1e-15)
    +    # One-hot encode true labels
    +    n_samples, n_classes = y_prob.shape
    +    one_hot = np.zeros_like(y_prob)
    +    one_hot[np.arange(n_samples), y_true] = 1
    +    # Compute cross-entropy
    +    loss_vec = -np.sum(one_hot * np.log(y_prob), axis=1)
    +    return np.mean(loss_vec)
    +
    +
    +
    +
    +
    +

    Synthetic data generation#

    +

    Binary classification data: Create two Gaussian clusters in 2D, for example class 0 around mean [-2,-2] and class 1 around [2,2].

    Multiclass data: Create several Gaussian clusters (one per class) spread out in feature space.

    +
    +
    +
    import numpy as np
    +
    +def generate_binary_data(n_samples=100, n_features=2, random_state=None):
    +    """
    +    Generate synthetic binary classification data.
    +    Returns (X, y) where X is (n_samples x n_features), y in {0,1}.
    +    """
    +    rng = np.random.RandomState(random_state)
    +    # Half samples for class 0, half for class 1
    +    n0 = n_samples // 2
    +    n1 = n_samples - n0
    +    # Class 0 around mean -2, class 1 around +2
    +    mean0 = -2 * np.ones(n_features)
    +    mean1 =  2 * np.ones(n_features)
    +    X0 = rng.randn(n0, n_features) + mean0
    +    X1 = rng.randn(n1, n_features) + mean1
    +    X = np.vstack((X0, X1))
    +    y = np.array([0]*n0 + [1]*n1)
    +    return X, y
    +
    +def generate_multiclass_data(n_samples=150, n_features=2, n_classes=3, random_state=None):
    +    """
    +    Generate synthetic multiclass data with n_classes Gaussian clusters.
    +    """
    +    rng = np.random.RandomState(random_state)
    +    X = []
    +    y = []
    +    samples_per_class = n_samples // n_classes
    +    for cls in range(n_classes):
    +        # Random cluster center for each class
    +        center = rng.uniform(-5, 5, size=n_features)
    +        Xi = rng.randn(samples_per_class, n_features) + center
    +        yi = [cls] * samples_per_class
    +        X.append(Xi)
    +        y.extend(yi)
    +    X = np.vstack(X)
    +    y = np.array(y)
    +    return X, y
    +
    +
    +# Generate and test on binary data
    +X_bin, y_bin = generate_binary_data(n_samples=200, n_features=2, random_state=42)
    +model_bin = LogisticRegression(lr=0.1, epochs=1000)
    +model_bin.fit(X_bin, y_bin)
    +y_prob_bin = model_bin.predict_prob(X_bin)      # probabilities for class 1
    +y_pred_bin = model_bin.predict(X_bin)           # predicted classes 0 or 1
    +
    +acc_bin = accuracy_score(y_bin, y_pred_bin)
    +loss_bin = binary_cross_entropy(y_bin, y_prob_bin)
    +print(f"Binary Classification - Accuracy: {acc_bin:.2f}, Cross-Entropy Loss: {loss_bin:.2f}")
    +#For multiclass:
    +# Generate and test on multiclass data
    +X_multi, y_multi = generate_multiclass_data(n_samples=300, n_features=2, n_classes=3, random_state=1)
    +model_multi = LogisticRegression(lr=0.1, epochs=1000)
    +model_multi.fit(X_multi, y_multi)
    +y_prob_multi = model_multi.predict_prob(X_multi)     # (n_samples x 3) probabilities
    +y_pred_multi = model_multi.predict(X_multi)          # predicted labels 0,1,2
    +
    +acc_multi = accuracy_score(y_multi, y_pred_multi)
    +loss_multi = categorical_cross_entropy(y_multi, y_prob_multi)
    +print(f"Multiclass Classification - Accuracy: {acc_multi:.2f}, Cross-Entropy Loss: {loss_multi:.2f}")
    +
    +# CSV Export
    +import csv
    +
    +# Export binary results
    +with open('binary_results.csv', mode='w', newline='') as f:
    +    writer = csv.writer(f)
    +    writer.writerow(["TrueLabel", "PredictedLabel"])
    +    for true, pred in zip(y_bin, y_pred_bin):
    +        writer.writerow([true, pred])
    +
    +# Export multiclass results
    +with open('multiclass_results.csv', mode='w', newline='') as f:
    +    writer = csv.writer(f)
    +    writer.writerow(["TrueLabel", "PredictedLabel"])
    +    for true, pred in zip(y_multi, y_pred_multi):
    +        writer.writerow([true, pred])
    +
    +
    +
    +
    +
    +
    +
    +

    Using Scikit-learn#

    +

    We show here how we can use logistic regression on a data set included in scikit-learn, the so-called Wisconsin breast cancer data. This is a widely studied data set and can easily be included in demonstrations of classification problems.

    +
    +
    +
    %matplotlib inline
    +
    +import matplotlib.pyplot as plt
    +import numpy as np
    +from sklearn.model_selection import  train_test_split 
    +from sklearn.datasets import load_breast_cancer
    +from sklearn.linear_model import LogisticRegression
    +
    +# Load the data
    +cancer = load_breast_cancer()
    +
    +X_train, X_test, y_train, y_test = train_test_split(cancer.data,cancer.target,random_state=0)
    +print(X_train.shape)
    +print(X_test.shape)
    +# Logistic Regression
    +logreg = LogisticRegression(solver='lbfgs')
    +logreg.fit(X_train, y_train)
    +print("Test set accuracy with Logistic Regression: {:.2f}".format(logreg.score(X_test,y_test)))
    +
    +
    +
    +
    +
    +
    +

    Using the correlation matrix#

    +

    In addition to the above scores, we could also study the covariance (and the correlation matrix). +We use Pandas to compute the correlation matrix.

    +
    +
    +
    import matplotlib.pyplot as plt
    +import numpy as np
    +from sklearn.model_selection import  train_test_split 
    +from sklearn.datasets import load_breast_cancer
    +from sklearn.linear_model import LogisticRegression
    +cancer = load_breast_cancer()
    +import pandas as pd
    +# Making a data frame
    +cancerpd = pd.DataFrame(cancer.data, columns=cancer.feature_names)
    +
    +fig, axes = plt.subplots(15,2,figsize=(10,20))
    +malignant = cancer.data[cancer.target == 0]
    +benign = cancer.data[cancer.target == 1]
    +ax = axes.ravel()
    +
    +for i in range(30):
    +    _, bins = np.histogram(cancer.data[:,i], bins =50)
    +    ax[i].hist(malignant[:,i], bins = bins, alpha = 0.5)
    +    ax[i].hist(benign[:,i], bins = bins, alpha = 0.5)
    +    ax[i].set_title(cancer.feature_names[i])
    +    ax[i].set_yticks(())
    +ax[0].set_xlabel("Feature magnitude")
    +ax[0].set_ylabel("Frequency")
    +ax[0].legend(["Malignant", "Benign"], loc ="best")
    +fig.tight_layout()
    +plt.show()
    +
    +import seaborn as sns
    +correlation_matrix = cancerpd.corr().round(1)
    +# use the heatmap function from seaborn to plot the correlation matrix
    +# annot = True to print the values inside the square
    +plt.figure(figsize=(15,8))
    +sns.heatmap(data=correlation_matrix, annot=True)
    +plt.show()
    +
    +
    +
    +
    +
    +
    +

    Discussing the correlation data#

    +

    In the above example we note two things. In the first plot we display the overlap of benign and malignant tumors as functions of the various features in the Wisconsin data set. We see that for some of the features we can clearly distinguish the benign and malignant cases, while for other features we cannot. This indicates which features may be of greater interest when we wish to classify a tumour as benign or malignant.

    +

    In the second figure we have computed the so-called correlation +matrix, which in our case with thirty features becomes a \(30\times 30\) +matrix.

    +

    We constructed this matrix using pandas via the statements

    +
    +
    +
    cancerpd = pd.DataFrame(cancer.data, columns=cancer.feature_names)
    +
    +
    +
    +
    +

    and then

    +
    +
    +
    correlation_matrix = cancerpd.corr().round(1)
    +
    +
    +
    +
    +

    Diagonalizing this matrix we can in turn say something about which features are of relevance and which are not. This leads us to the classical Principal Component Analysis (PCA) theorem with applications, which will be discussed later this semester (a small sketch follows below).
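    The sketch below assumes only NumPy, Pandas and the scikit-learn data loader used above; it computes the eigenvalues of the correlation matrix and checks how much of the total variance the leading components carry (variable names are illustrative, not taken from the notes).

    import numpy as np
    import pandas as pd
    from sklearn.datasets import load_breast_cancer

    cancer = load_breast_cancer()
    cancerpd = pd.DataFrame(cancer.data, columns=cancer.feature_names)
    correlation_matrix = cancerpd.corr().to_numpy()

    # The correlation matrix is symmetric, so we can use eigh
    eigenvalues, eigenvectors = np.linalg.eigh(correlation_matrix)
    eigenvalues = eigenvalues[::-1]            # sort in descending order

    # Fraction of the total variance carried by each component
    explained = eigenvalues / eigenvalues.sum()
    print("Leading five components:", explained[:5])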

    +
    +
    +

    Other measures in classification studies#

    +
    +
    +
    import matplotlib.pyplot as plt
    +import numpy as np
    +from sklearn.model_selection import  train_test_split 
    +from sklearn.datasets import load_breast_cancer
    +from sklearn.linear_model import LogisticRegression
    +
    +# Load the data
    +cancer = load_breast_cancer()
    +
    +X_train, X_test, y_train, y_test = train_test_split(cancer.data,cancer.target,random_state=0)
    +print(X_train.shape)
    +print(X_test.shape)
    +# Logistic Regression
    +logreg = LogisticRegression(solver='lbfgs')
    +logreg.fit(X_train, y_train)
    +
    +from sklearn.preprocessing import LabelEncoder
    +from sklearn.model_selection import cross_validate
    +#Cross validation
    +accuracy = cross_validate(logreg,X_test,y_test,cv=10)['test_score']
    +print(accuracy)
    +print("Test set accuracy with Logistic Regression: {:.2f}".format(logreg.score(X_test,y_test)))
    +
    +import scikitplot as skplt
    +y_pred = logreg.predict(X_test)
    +skplt.metrics.plot_confusion_matrix(y_test, y_pred, normalize=True)
    +plt.show()
    +y_probas = logreg.predict_proba(X_test)
    +skplt.metrics.plot_roc(y_test, y_probas)
    +plt.show()
    +skplt.metrics.plot_cumulative_gain(y_test, y_probas)
    +plt.show()
    +
    +
    +
    +
    +
    +
    +

    Introduction to Neural networks#

    +

    Artificial neural networks are computational systems that can learn to +perform tasks by considering examples, generally without being +programmed with any task-specific rules. It is supposed to mimic a +biological system, wherein neurons interact by sending signals in the +form of mathematical functions between layers. All layers can contain +an arbitrary number of neurons, and each connection is represented by +a weight variable.

    +
    +
    +

    Artificial neurons#

    +

    The field of artificial neural networks has a long history of +development, and is closely connected with the advancement of computer +science and computers in general. A model of artificial neurons was +first developed by McCulloch and Pitts in 1943 to study signal +processing in the brain and has later been refined by others. The +general idea is to mimic neural networks in the human brain, which is +composed of billions of neurons that communicate with each other by +sending electrical signals. Each neuron accumulates its incoming +signals, which must exceed an activation threshold to yield an +output. If the threshold is not overcome, the neuron remains inactive, +i.e. has zero output.

    +

    This behaviour has inspired a simple mathematical model for an artificial neuron.

    + +
    +
    +\[ +\begin{equation} + y = f\left(\sum_{i=1}^n w_ix_i\right) = f(u) +\label{artificialNeuron} \tag{1} +\end{equation} +\]
    +

    Here, the output \(y\) of the neuron is the value of its activation function, which has as input a weighted sum of the signals \(x_1, \dots ,x_n\) received from \(n\) other neurons.
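    As a tiny illustration (my own sketch, with made-up numbers), the model in Eq. (1) is just a weighted sum passed through an activation function, here taken to be a step function that fires once the threshold is exceeded.

    import numpy as np

    x = np.array([0.5, -1.2, 3.0])   # signals received from n = 3 other neurons
    w = np.array([0.8, 0.1, 0.4])    # corresponding weights

    u = np.sum(w * x)                # weighted sum of the incoming signals
    y = 1.0 if u >= 0 else 0.0       # step activation: inactive (0) below threshold
    print(u, y)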

    +

    Conceptually, it is helpful to divide neural networks into four +categories:

    +
      +
    1. general purpose neural networks for supervised learning,

    2. neural networks designed specifically for image processing, the most prominent example of this class being Convolutional Neural Networks (CNNs),

    3. neural networks for sequential data such as Recurrent Neural Networks (RNNs), and

    4. neural networks for unsupervised learning such as Deep Boltzmann Machines.
    +

    In natural science, DNNs and CNNs have already found numerous applications. In statistical physics, they have been applied to detect phase transitions in 2D Ising and Potts models, lattice gauge theories, and different phases of polymers, or to solve the Navier-Stokes equation in weather forecasting. Deep learning has also found interesting applications in quantum physics. Various quantum phase transitions can be detected and studied using DNNs and CNNs, including topological phases and even non-equilibrium many-body localization. Representing quantum states with DNNs and performing quantum state tomography are among the impressive achievements that reveal the potential of DNNs to facilitate the study of quantum systems.

    +

    In quantum information theory, it has been shown that one can perform gate decompositions with the help of neural networks.

    +

    The applications are not limited to the natural sciences. There is a +plethora of applications in essentially all disciplines, from the +humanities to life science and medicine.

    +
    +
    +

    Neural network types#

    +

    An artificial neural network (ANN), is a computational model that +consists of layers of connected neurons, or nodes or units. We will +refer to these interchangeably as units or nodes, and sometimes as +neurons.

    +

    It is supposed to mimic a biological nervous system by letting each +neuron interact with other neurons by sending signals in the form of +mathematical functions between layers. A wide variety of different +ANNs have been developed, but most of them consist of an input layer, +an output layer and eventual layers in-between, called hidden +layers. All layers can contain an arbitrary number of nodes, and each +connection between two nodes is associated with a weight variable.

    +

    Neural networks (also called neural nets) are neural-inspired +nonlinear models for supervised learning. As we will see, neural nets +can be viewed as natural, more powerful extensions of supervised +learning methods such as linear and logistic regression and soft-max +methods we discussed earlier.

    +
    +
    +

    Feed-forward neural networks#

    +

    The feed-forward neural network (FFNN) was the first and simplest type +of ANNs that were devised. In this network, the information moves in +only one direction: forward through the layers.

    +

    Nodes are represented by circles, while the arrows display the +connections between the nodes, including the direction of information +flow. Additionally, each arrow corresponds to a weight variable +(figure to come). We observe that each node in a layer is connected +to all nodes in the subsequent layer, making this a so-called +fully-connected FFNN.

    +
    +
    +

    Convolutional Neural Network#

    +

    A different variant of FFNNs are convolutional neural networks +(CNNs), which have a connectivity pattern inspired by the animal +visual cortex. Individual neurons in the visual cortex only respond to +stimuli from small sub-regions of the visual field, called a receptive +field. This makes the neurons well-suited to exploit the strong +spatially local correlation present in natural images. The response of +each neuron can be approximated mathematically as a convolution +operation. (figure to come)

    +

    Convolutional neural networks emulate the behaviour of neurons in the +visual cortex by enforcing a local connectivity pattern between +nodes of adjacent layers: Each node in a convolutional layer is +connected only to a subset of the nodes in the previous layer, in +contrast to the fully-connected FFNN. Often, CNNs consist of several +convolutional layers that learn local features of the input, with a +fully-connected layer at the end, which gathers all the local data and +produces the outputs. They have wide applications in image and video +recognition.

    +
    +
    +

    Recurrent neural networks#

    +

    So far we have only mentioned ANNs where information flows in one direction: forward. Recurrent neural networks on the other hand have connections between nodes that form directed cycles. This creates a form of internal memory which is able to capture information on what has been calculated before; the output is dependent on the previous computations. Recurrent NNs make use of sequential information by performing the same task for every element in a sequence, where each element depends on previous elements. An example of such information is sentences, making recurrent NNs especially well-suited for handwriting and speech recognition.

    +
    +
    +

    Other types of networks#

    +

    There are many other kinds of ANNs that have been developed. One type that is specifically designed for interpolation in multidimensional space is the radial basis function (RBF) network. RBFs are typically made up of three layers: an input layer, a hidden layer with non-linear radial symmetric activation functions and a linear output layer (‘’linear’’ here means that each node in the output layer has a linear activation function). The layers are normally fully-connected and there are no cycles, thus RBFs can be viewed as a type of fully-connected FFNN. They are however usually treated as a separate type of NN due to the unusual activation functions.

    +
    +
    +

    Multilayer perceptrons#

    +

    One often uses so-called fully-connected feed-forward neural networks with three or more layers (an input layer, one or more hidden layers and an output layer) consisting of neurons that have non-linear activation functions.

    +

    Such networks are often called multilayer perceptrons (MLPs).

    +
    +
    +

    Why multilayer perceptrons?#

    +

    According to the Universal approximation theorem, a feed-forward +neural network with just a single hidden layer containing a finite +number of neurons can approximate a continuous multidimensional +function to arbitrary accuracy, assuming the activation function for +the hidden layer is a non-constant, bounded and +monotonically-increasing continuous function.

    +

    Note that the requirements on the activation function only apply to the hidden layer; the output nodes are always assumed to be linear, so as to not restrict the range of output values.

    +
    +
    +

    Illustration of a single perceptron model and a multi-perceptron model#

    + + +

    Figure 1: In a) we show a single perceptron model while in b) we display a network with two hidden layers, an input layer and an output layer.

    +
    +
    +

    Examples of XOR, OR and AND gates#

    +

    Let us first try to fit various gates using standard linear +regression. The gates we are thinking of are the classical XOR, OR and +AND gates, well-known elements in computer science. The tables here +show how we can set up the inputs \(x_1\) and \(x_2\) in order to yield a +specific target \(y_i\).

    +
    +
    +
    """
    +Simple code that tests XOR, OR and AND gates with linear regression
    +"""
    +
    +import numpy as np
    +# Design matrix
    +X = np.array([ [1, 0, 0], [1, 0, 1], [1, 1, 0],[1, 1, 1]],dtype=np.float64)
    +print(f"The X.TX  matrix:{X.T @ X}")
    +Xinv = np.linalg.pinv(X.T @ X)
    +print(f"The invers of X.TX  matrix:{Xinv}")
    +
    +# The XOR gate 
    +yXOR = np.array( [ 0, 1 ,1, 0])
    +ThetaXOR  = Xinv @ X.T @ yXOR
    +print(f"The values of theta for the XOR gate:{ThetaXOR}")
    +print(f"The linear regression prediction  for the XOR gate:{X @ ThetaXOR}")
    +
    +
    +# The OR gate 
    +yOR = np.array( [ 0, 1 ,1, 1])
    +ThetaOR  = Xinv @ X.T @ yOR
    +print(f"The values of theta for the OR gate:{ThetaOR}")
    +print(f"The linear regression prediction  for the OR gate:{X @ ThetaOR}")
    +
    +
    +# The AND gate
    +yAND = np.array( [ 0, 0 ,0, 1])
    +ThetaAND  = Xinv @ X.T @ yAND
    +print(f"The values of theta for the AND gate:{ThetaAND}")
    +print(f"The linear regression prediction  for the AND gate:{X @ ThetaAND}")
    +
    +
    +
    +
    +

    What is happening here?

    +
    +
    +

    Does Logistic Regression do a better Job?#

    +
    +
    +
    """
    +Simple code that tests XOR and OR gates with linear regression
    +and logistic regression
    +"""
    +
    +import matplotlib.pyplot as plt
    +from sklearn.linear_model import LogisticRegression
    +import numpy as np
    +
    +# Design matrix
    +X = np.array([ [1, 0, 0], [1, 0, 1], [1, 1, 0],[1, 1, 1]],dtype=np.float64)
    +print(f"The X.TX  matrix:{X.T @ X}")
    +Xinv = np.linalg.pinv(X.T @ X)
    +print(f"The invers of X.TX  matrix:{Xinv}")
    +
    +# The XOR gate 
    +yXOR = np.array( [ 0, 1 ,1, 0])
    +ThetaXOR  = Xinv @ X.T @ yXOR
    +print(f"The values of theta for the XOR gate:{ThetaXOR}")
    +print(f"The linear regression prediction  for the XOR gate:{X @ ThetaXOR}")
    +
    +
    +# The OR gate 
    +yOR = np.array( [ 0, 1 ,1, 1])
    +ThetaOR  = Xinv @ X.T @ yOR
    +print(f"The values of theta for the OR gate:{ThetaOR}")
    +print(f"The linear regression prediction  for the OR gate:{X @ ThetaOR}")
    +
    +
    +# The AND gate
    +yAND = np.array( [ 0, 0 ,0, 1])
    +ThetaAND  = Xinv @ X.T @ yAND
    +print(f"The values of theta for the AND gate:{ThetaAND}")
    +print(f"The linear regression prediction  for the AND gate:{X @ ThetaAND}")
    +
    +# Now we change to logistic regression
    +
    +
    +# Logistic Regression
    +logreg = LogisticRegression()
    +logreg.fit(X, yOR)
    +print("Test set accuracy with Logistic Regression for OR gate: {:.2f}".format(logreg.score(X,yOR)))
    +
    +logreg.fit(X, yXOR)
    +print("Test set accuracy with Logistic Regression for XOR gate: {:.2f}".format(logreg.score(X,yXOR)))
    +
    +
    +logreg.fit(X, yAND)
    +print("Test set accuracy with Logistic Regression for AND gate: {:.2f}".format(logreg.score(X,yAND)))
    +
    +
    +
    +
    +

    Not exactly impressive, but somewhat better.

    +
    +
    +

    Adding Neural Networks#

    +
    +
    +
    
    +# and now neural networks with Scikit-Learn and the XOR
    +
    +from sklearn.neural_network import MLPClassifier
    +from sklearn.datasets import make_classification
    +X, yXOR = make_classification(n_samples=100, random_state=1)
    +FFNN = MLPClassifier(random_state=1, max_iter=300).fit(X, yXOR)
    +FFNN.predict_proba(X)
    +print(f"Test set accuracy with Feed Forward Neural Network  for XOR gate:{FFNN.score(X, yXOR)}")
    +
    +
    +
    +
    +
    +
    +
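    Note that the scikit-learn example above trains on a synthetic data set from make_classification rather than on the four XOR input rows themselves. As a complement, here is a hedged sketch (the hyperparameters are my own illustrative choices, not taken from the notes) that trains a small MLP directly on the XOR truth table.

    import numpy as np
    from sklearn.neural_network import MLPClassifier

    X_xor = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
    y_xor = np.array([0, 1, 1, 0])

    # A small hidden layer suffices to represent XOR; convergence depends on
    # the random initialization, so one may need to try a few random seeds.
    clf = MLPClassifier(hidden_layer_sizes=(4,), activation='tanh',
                        solver='lbfgs', max_iter=1000, random_state=0)
    clf.fit(X_xor, y_xor)
    print("Predictions:", clf.predict(X_xor))
    print("Training accuracy:", clf.score(X_xor, y_xor))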

    Mathematical model#

    +

    The output \(y\) is produced via the activation function \(f\)

    +
    \[ y = f\left(\sum_{i=1}^n w_ix_i + b\right) = f(z), \]
    +

    This function receives \(x_i\) as inputs. Here the activation is \(z=\sum_{i=1}^n w_ix_i+b\). In an FFNN of such neurons, the inputs \(x_i\) are the outputs of the neurons in the preceding layer. Furthermore, an MLP is fully-connected, which means that each neuron receives a weighted sum of the outputs of all neurons in the previous layer.

    +
    +
    +

    Mathematical model#

    +

    First, for each node \(i\) in the first hidden layer, we calculate a weighted sum \(z_i^1\) of the input coordinates \(x_j\),

    + +
    +
    +\[ +\begin{equation} z_i^1 = \sum_{j=1}^{M} w_{ij}^1 x_j + b_i^1 +\label{_auto1} \tag{2} +\end{equation} +\]
    +

    Here \(b_i^1\) is the so-called bias, which is normally needed in case of zero activation weights or inputs. How to fix the biases and the weights will be discussed below. The value of \(z_i^1\) is the argument to the activation function \(f_i\) of each node \(i\). The variable \(M\) stands for all possible inputs to a given node \(i\) in the first layer. We define the output \(y_i^1\) of all neurons in layer 1 as

    + +
    +
    +\[ +\begin{equation} + y_i^1 = f(z_i^1) = f\left(\sum_{j=1}^M w_{ij}^1 x_j + b_i^1\right) +\label{outputLayer1} \tag{3} +\end{equation} +\]
    +

    where we assume that all nodes in the same layer have identical activation functions, hence the notation \(f\). More generally, different layers could have different activation functions. In that case we identify these functions with a superscript \(l\) for the \(l\)-th layer,

    + +
    +
    +\[ +\begin{equation} + y_i^l = f^l(u_i^l) = f^l\left(\sum_{j=1}^{N_{l-1}} w_{ij}^l y_j^{l-1} + b_i^l\right) +\label{generalLayer} \tag{4} +\end{equation} +\]
    +

    where \(N_l\) is the number of nodes in layer \(l\). When the output of +all the nodes in the first hidden layer are computed, the values of +the subsequent layer can be calculated and so forth until the output +is obtained.

    +
    +
    +

    Mathematical model#

    +

    The output of neuron \(i\) in layer 2 is thus,

    + +
    +
    +\[ +\begin{equation} + y_i^2 = f^2\left(\sum_{j=1}^N w_{ij}^2 y_j^1 + b_i^2\right) +\label{_auto2} \tag{5} +\end{equation} +\]
    + +
    +
    +\[ +\begin{equation} + = f^2\left[\sum_{j=1}^N w_{ij}^2f^1\left(\sum_{k=1}^M w_{jk}^1 x_k + b_j^1\right) + b_i^2\right] +\label{outputLayer2} \tag{6} +\end{equation} +\]
    +

    where we have substituted \(y_k^1\) with the inputs \(x_k\). Finally, the ANN output reads

    + +
    +
    +\[ +\begin{equation} + y_i^3 = f^3\left(\sum_{j=1}^N w_{ij}^3 y_j^2 + b_i^3\right) +\label{_auto3} \tag{7} +\end{equation} +\]
    + +
    +
    \[ \begin{equation} = f^3\left[\sum_{j} w_{ij}^3 f^2\left(\sum_{k} w_{jk}^2 f^1\left(\sum_{m} w_{km}^1 x_m + b_k^1\right) + b_j^2\right) + b_i^3\right] \label{_auto4} \tag{8} \end{equation} \]
    +
    +
    +

    Mathematical model#

    +

    We can generalize this expression to an MLP with \(l\) hidden +layers. The complete functional form is,

    + +
    +
    \[ \begin{equation} y^{l+1}_i = f^{l+1}\left[\!\sum_{j=1}^{N_l} w_{ij}^{l+1} f^l\left(\sum_{k=1}^{N_{l-1}}w_{jk}^{l}\left(\dots f^1\left(\sum_{n=1}^{N_0} w_{mn}^1 x_n+ b_m^1\right)\dots\right)+b_j^{l}\right)+b_i^{l+1}\right] \label{completeNN} \tag{9} \end{equation} \]
    +

    which illustrates a basic property of MLPs: The only independent +variables are the input values \(x_n\).

    +
    +
    +

    Mathematical model#

    +

    This confirms that an MLP, despite its quite convoluted mathematical +form, is nothing more than an analytic function, specifically a +mapping of real-valued vectors \(\hat{x} \in \mathbb{R}^n \rightarrow +\hat{y} \in \mathbb{R}^m\).

    +

    Furthermore, the flexibility and universality of an MLP can be +illustrated by realizing that the expression is essentially a nested +sum of scaled activation functions of the form

    + +
    +
    +\[ +\begin{equation} + f(x) = c_1 f(c_2 x + c_3) + c_4 +\label{_auto5} \tag{10} +\end{equation} +\]
    +

    where the parameters \(c_i\) are weights and biases. By adjusting these parameters, the activation functions can be shifted up and down or left and right, change slope or be rescaled, which is the key to the flexibility of a neural network.
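    A short sketch (my own illustration) of this point: plotting \(c_1 f(c_2 x + c_3) + c_4\) for a sigmoid \(f\) shows how the constants shift, flip and rescale the basic activation function.

    import numpy as np
    import matplotlib.pyplot as plt

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    x = np.linspace(-6, 6, 200)
    plt.plot(x, sigmoid(x), label="f(x)")
    plt.plot(x, 2 * sigmoid(3 * x - 1) - 0.5, label="2 f(3x - 1) - 0.5")
    plt.xlabel("x")
    plt.legend()
    plt.show()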

    +
    +

    Matrix-vector notation#

    +

    We can introduce a more convenient notation for the activations in an ANN.

    +

    Additionally, we can represent the biases and activations +as layer-wise column vectors \(\hat{b}_l\) and \(\hat{y}_l\), so that the \(i\)-th element of each vector +is the bias \(b_i^l\) and activation \(y_i^l\) of node \(i\) in layer \(l\) respectively.

    +

    We have that \(\mathrm{W}_l\) is an \(N_{l} \times N_{l-1}\) matrix, while \(\hat{b}_l\) and \(\hat{y}_l\) are \(N_l \times 1\) column vectors. With this notation, the sum becomes a matrix-vector multiplication, and we can write the equation for the activations of hidden layer 2 (assuming three nodes for simplicity) as

    + +
    +
    +\[\begin{split} +\begin{equation} + \hat{y}_2 = f_2(\mathrm{W}_2 \hat{y}_{1} + \hat{b}_{2}) = + f_2\left(\left[\begin{array}{ccc} + w^2_{11} &w^2_{12} &w^2_{13} \\ + w^2_{21} &w^2_{22} &w^2_{23} \\ + w^2_{31} &w^2_{32} &w^2_{33} \\ + \end{array} \right] \cdot + \left[\begin{array}{c} + y^1_1 \\ + y^1_2 \\ + y^1_3 \\ + \end{array}\right] + + \left[\begin{array}{c} + b^2_1 \\ + b^2_2 \\ + b^2_3 \\ + \end{array}\right]\right). +\label{_auto6} \tag{11} +\end{equation} +\end{split}\]
    +
    +
    +

    Matrix-vector notation and activation#

    +

    The activation of node \(i\) in layer 2 is

    + +
    +
    +\[ +\begin{equation} + y^2_i = f_2\Bigr(w^2_{i1}y^1_1 + w^2_{i2}y^1_2 + w^2_{i3}y^1_3 + b^2_i\Bigr) = + f_2\left(\sum_{j=1}^3 w^2_{ij} y_j^1 + b^2_i\right). +\label{_auto7} \tag{12} +\end{equation} +\]
    +

    This is not just a convenient and compact notation, but also a useful +and intuitive way to think about MLPs: The output is calculated by a +series of matrix-vector multiplications and vector additions that are +used as input to the activation functions. For each operation +\(\mathrm{W}_l \hat{y}_{l-1}\) we move forward one layer.
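    A minimal sketch (my own illustration, with random weights and biases) of this picture: each layer is one matrix-vector multiplication, a vector addition and an element-wise activation.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    rng = np.random.default_rng(2024)
    x = rng.normal(size=3)                      # input, plays the role of y^0

    W1, b1 = rng.normal(size=(3, 3)), rng.normal(size=3)   # layer 1
    W2, b2 = rng.normal(size=(3, 3)), rng.normal(size=3)   # layer 2

    # For each operation W_l y_{l-1} + b_l we move forward one layer
    y1 = sigmoid(W1 @ x + b1)
    y2 = sigmoid(W2 @ y1 + b2)
    print("Activations of layer 2:", y2)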

    +
    +
    +

    Activation functions#

    +

    A property that characterizes a neural network, other than its connectivity, is the choice of activation function(s). As described earlier, the following restrictions are imposed on an activation function for an FFNN to fulfill the universal approximation theorem:

    +
      +
    • Non-constant

    • Bounded

    • Monotonically-increasing

    • Continuous
    +
    +
    +

    Activation functions, Logistic and Hyperbolic ones#

    +

    The second requirement excludes all linear functions. Furthermore, in an MLP with only linear activation functions, each layer simply performs a linear transformation of its inputs.

    +

    Regardless of the number of layers, the output of the NN will be nothing but a linear function of the inputs. Thus we need to introduce some kind of non-linearity to the NN to be able to fit non-linear functions. Typical examples are the logistic Sigmoid

    +
    +\[ +f(x) = \frac{1}{1 + e^{-x}}, +\]
    +

    and the hyperbolic tangent function

    +
    +\[ +f(x) = \tanh(x) +\]
    +
    +
    +

    Relevance#

    +

    The sigmoid function is more biologically plausible because the output of inactive neurons is zero. Such activation functions are called one-sided. However, it has been shown that the hyperbolic tangent performs better than the sigmoid for training MLPs, while the rectified linear unit (ReLU) has become the most popular choice for deep neural networks.

    +
    +
    +
    """The sigmoid function (or the logistic curve) is a 
    +function that takes any real number, z, and outputs a number in (0,1).
    +It is useful in neural networks for assigning weights on a relative scale.
    +The value z is the weighted sum of parameters involved in the learning algorithm."""
    +
    +import numpy
    +import matplotlib.pyplot as plt
    +import math as mt
    +
    +z = numpy.arange(-5, 5, .1)
    +sigma_fn = numpy.vectorize(lambda z: 1/(1+numpy.exp(-z)))
    +sigma = sigma_fn(z)
    +
    +fig = plt.figure()
    +ax = fig.add_subplot(111)
    +ax.plot(z, sigma)
    +ax.set_ylim([-0.1, 1.1])
    +ax.set_xlim([-5,5])
    +ax.grid(True)
    +ax.set_xlabel('z')
    +ax.set_title('sigmoid function')
    +
    +plt.show()
    +
    +"""Step Function"""
    +z = numpy.arange(-5, 5, .02)
    +step_fn = numpy.vectorize(lambda z: 1.0 if z >= 0.0 else 0.0)
    +step = step_fn(z)
    +
    +fig = plt.figure()
    +ax = fig.add_subplot(111)
    +ax.plot(z, step)
    +ax.set_ylim([-0.5, 1.5])
    +ax.set_xlim([-5,5])
    +ax.grid(True)
    +ax.set_xlabel('z')
    +ax.set_title('step function')
    +
    +plt.show()
    +
    +"""Sine Function"""
    +z = numpy.arange(-2*mt.pi, 2*mt.pi, 0.1)
    +t = numpy.sin(z)
    +
    +fig = plt.figure()
    +ax = fig.add_subplot(111)
    +ax.plot(z, t)
    +ax.set_ylim([-1.0, 1.0])
    +ax.set_xlim([-2*mt.pi,2*mt.pi])
    +ax.grid(True)
    +ax.set_xlabel('z')
    +ax.set_title('sine function')
    +
    +plt.show()
    +
    +"""Plots a graph of the squashing function used by a rectified linear
    +unit"""
    +z = numpy.arange(-2, 2, .1)
    +zero = numpy.zeros(len(z))
    +y = numpy.max([zero, z], axis=0)
    +
    +fig = plt.figure()
    +ax = fig.add_subplot(111)
    +ax.plot(z, y)
    +ax.set_ylim([-2.0, 2.0])
    +ax.set_xlim([-2.0, 2.0])
    +ax.grid(True)
    +ax.set_xlabel('z')
    +ax.set_title('Rectified linear unit')
    +
    +plt.show()
    +
    +
    +
    +
    +
    +
    +
    \ No newline at end of file
    diff --git a/doc/LectureNotes/_build/html/week41.html b/doc/LectureNotes/_build/html/week41.html
    new file mode 100644
    index 000000000..c6cf52bc8
    --- /dev/null
    +++ b/doc/LectureNotes/_build/html/week41.html
    @@ -0,0 +1,2080 @@
    +Week 41 Neural networks and constructing a neural network code — Applied Data Analysis and Machine Learning

    Week 41 Neural networks and constructing a neural network code#

    +

    Morten Hjorth-Jensen, Department of Physics, University of Oslo, Norway

    +

    Date: Week 41

    +
    +

    Plan for week 41, October 6-10#

    +
    +
    +

    Material for the lecture on Monday October 6, 2025#

    +
      +
    1. Neural Networks, setting up the basic steps, from the simple perceptron model to the multi-layer perceptron model.

    2. Building our own Feed-forward Neural Network, getting started
    + +
    +
    +

    Readings and Videos:#

    +
      +
    1. These lecture notes

    2. For neural networks we recommend Goodfellow et al chapters 6 and 7.

    3. Raschka et al., chapter 11, jupyter-notebook sent separately, from GitHub

    4. Neural Networks demystified at https://www.youtube.com/watch?v=bxe2T-V8XRs&list=PLiaHhY2iBX9hdHaRr6b7XevZtgZRa1PoU&ab_channel=WelchLabs

    5. Building Neural Networks from scratch at https://www.youtube.com/watch?v=Wo5dMEP_BbI&list=PLQVvvaa0QuDcjD5BAw2DxE6OF2tius3V3&ab_channel=sentdex

    6. Video on Neural Networks at https://www.youtube.com/watch?v=CqOfi41LfDw

    7. Video on the back propagation algorithm at https://www.youtube.com/watch?v=Ilg3gGewQ5U

    8. We also recommend Michael Nielsen’s intuitive approach to the neural networks and the universal approximation theorem, see the slides at http://neuralnetworksanddeeplearning.com/chap4.html.
    +
    +
    +

    Mathematics of deep learning#

    +

    Two recent books online.

    +
      +
    1. The Modern Mathematics of Deep Learning, by Julius Berner, Philipp Grohs, Gitta Kutyniok, Philipp Petersen, published as Mathematical Aspects of Deep Learning, pp. 1-111. Cambridge University Press, 2022

    2. Mathematical Introduction to Deep Learning: Methods, Implementations, and Theory, Arnulf Jentzen, Benno Kuckuck, Philippe von Wurstemberger
    +
    +
    +

    Reminder on books with hands-on material and codes#

    +

    Sebastian Raschka et al., Machine Learning with Scikit-Learn and PyTorch

    +
    +
    +

    Lab sessions on Tuesday and Wednesday#

    +

    Aim: Getting started with coding a neural network. The exercises this week aim at setting up the feed-forward part of a neural network.

    +
    +
    +

    Lecture Monday October 6#

    +
    +
    +


    Mathematics of deep learning and neural networks#

    +

    Neural networks, in their so-called feed-forward form, where each iteration contains a feed-forward stage and a back-propagation stage, consist of a series of affine matrix-matrix and matrix-vector multiplications. The unknown parameters (the so-called biases and weights which determine the architecture of a neural network) are updated iteratively using the so-called back-propagation algorithm. This algorithm corresponds to the so-called reverse mode of automatic differentiation.

    +
    +
    +

    Basics of an NN#

    +

    A neural network consists of a series of hidden layers, in addition to +the input and output layers. Each layer \(l\) has a set of parameters +\(\boldsymbol{\Theta}^{(l)}=(\boldsymbol{W}^{(l)},\boldsymbol{b}^{(l)})\) which are related to the +parameters in other layers through a series of affine transformations, +for a standard NN these are matrix-matrix and matrix-vector +multiplications. For all layers we will simply use a collective variable \(\boldsymbol{\Theta}\).

    +

    It consists of two basic steps:

    +
      +
    1. a feed-forward stage which takes a given input and produces a final output which is compared with the target values through our cost/loss function.

    2. a back-propagation stage where the unknown parameters \(\boldsymbol{\Theta}\) are updated through the optimization of their gradients. The expressions for the gradients are obtained via the chain rule, starting from the derivative of the cost/loss function.
    +

    These two steps make up one iteration. This iterative process is continued until we reach a given stopping criterion.
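    A compact, self-contained sketch of these two steps (my own illustration on a toy regression problem, not the course's reference implementation): one hidden layer with sigmoid activation, a linear output and the (halved) mean squared error as cost function.

    import numpy as np

    rng = np.random.default_rng(0)
    x = np.linspace(-1, 1, 50).reshape(-1, 1)    # inputs, shape (n, 1)
    y = x**2                                     # targets for a toy regression

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    # Parameters Theta = (W1, b1, W2, b2): one hidden layer with 10 nodes
    W1, b1 = rng.normal(scale=0.5, size=(1, 10)), np.zeros(10)
    W2, b2 = rng.normal(scale=0.5, size=(10, 1)), np.zeros(1)
    eta = 0.1                                    # constant learning rate

    for iteration in range(2000):
        # 1) feed-forward stage: from the input to the model output
        a1 = sigmoid(x @ W1 + b1)                # hidden-layer activations
        y_tilde = a1 @ W2 + b2                   # linear output layer

        # 2) back-propagation stage: gradients of C = (1/2n) sum (y_tilde - y)^2
        delta2 = (y_tilde - y) / len(x)          # dC/dz at the output layer
        delta1 = (delta2 @ W2.T) * a1 * (1 - a1) # propagated to the hidden layer

        W2 -= eta * a1.T @ delta2
        b2 -= eta * delta2.sum(axis=0)
        W1 -= eta * x.T @ delta1
        b1 -= eta * delta1.sum(axis=0)

    print("Final cost:", 0.5 * np.mean((y_tilde - y)**2))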

    +
    +
    +

    Overarching view of a neural network#

    +

    The architecture of a neural network defines our model. This model aims at describing some function \(f(\boldsymbol{x})\) which represents some final result (outputs or target values) given a specific input \(\boldsymbol{x}\). Note that here \(\boldsymbol{y}\) and \(\boldsymbol{x}\) are not limited to be vectors.

    +

    The architecture consists of

    +
      +
    1. An input and an output layer where the input layer is defined by the inputs \(\boldsymbol{x}\). The output layer produces the model output \(\boldsymbol{\tilde{y}}\) which is compared with the target value \(\boldsymbol{y}\)

    2. A given number of hidden layers and neurons/nodes/units for each layer (this may vary)

    3. A given activation function \(\sigma(\boldsymbol{z})\) with arguments \(\boldsymbol{z}\) to be defined below. The activation functions may differ from layer to layer.

    4. The last layer, normally called the output layer, has an activation function tailored to the specific problem

    5. Finally we define a so-called cost or loss function which is used to gauge the quality of our model.
    +
    +
    +

    The optimization problem#

    +

    The cost function is a function of the unknown parameters +\(\boldsymbol{\Theta}\) where the latter is a container for all possible +parameters needed to define a neural network

    +

    If we are dealing with a regression task a typical cost/loss function +is the mean squared error

    +
    +\[ +C(\boldsymbol{\Theta})=\frac{1}{n}\left\{\left(\boldsymbol{y}-\boldsymbol{X}\boldsymbol{\theta}\right)^T\left(\boldsymbol{y}-\boldsymbol{X}\boldsymbol{\theta}\right)\right\}. +\]
    +

    This function represents one of many possible ways to define the so-called cost function. Note that here we have assumed a linear dependence in terms of the parameters \(\boldsymbol{\Theta}\). This is in general not the case.

    +
    +
    +

    Parameters of neural networks#

    +

    For neural networks the parameters +\(\boldsymbol{\Theta}\) are given by the so-called weights and biases (to be +defined below).

    +

    The weights are given by matrix elements \(w_{ij}^{(l)}\) where the +superscript indicates the layer number. The biases are typically given +by vector elements representing each single node of a given layer, +that is \(b_j^{(l)}\).

    +
    +
    +

    Other ingredients of a neural network#

    +

    Having defined the architecture of a neural network, the optimization of the cost function with respect to the parameters \(\boldsymbol{\Theta}\) involves the calculation of gradients and their optimization. The gradients represent the derivatives of a multidimensional object and are often approximated by various gradient methods (see the sketch after this list), including

    +
      +
    1. various quasi-Newton methods,

    2. plain gradient descent (GD) with a constant learning rate \(\eta\),

    3. GD with momentum and other approximations to the learning rates such as

      • Adaptive gradient (ADAgrad)

      • Root mean-square propagation (RMSprop)

      • Adaptive gradient with momentum (ADAM) and many others

    4. Stochastic gradient descent and various families of learning rate approximations.
    +
    +
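    A small sketch (my own illustration on a simple quadratic cost, not part of the lecture notes) contrasting plain GD with a constant learning rate against GD with momentum:

    import numpy as np

    A = np.array([[10.0, 0.0], [0.0, 1.0]])      # an ill-conditioned quadratic cost
    def gradient(theta):
        return A @ theta                         # gradient of C = theta^T A theta / 2

    eta, gamma = 0.08, 0.9                       # learning rate and momentum parameter
    theta_gd = np.array([1.0, 1.0])
    theta_mom = np.array([1.0, 1.0])
    velocity = np.zeros(2)

    for _ in range(100):
        theta_gd = theta_gd - eta * gradient(theta_gd)         # plain GD
        velocity = gamma * velocity + eta * gradient(theta_mom)
        theta_mom = theta_mom - velocity                       # GD with momentum

    print("Plain GD:      ", theta_gd)
    print("GD + momentum: ", theta_mom)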
    +

    Other parameters#

    +

    In addition to the above, there are often additional hyperparameters which are included in the setup of a neural network. These will be discussed below.

    +
    +
    +

    Universal approximation theorem#

    +

    The universal approximation theorem plays a central role in deep +learning. Cybenko (1989) showed +the following:

    +

    Let \(\sigma\) be any continuous sigmoidal function such that

    +
    +\[\begin{split} +\sigma(z) = \left\{\begin{array}{cc} 1 & z\rightarrow \infty\\ 0 & z \rightarrow -\infty \end{array}\right. +\end{split}\]
    +

    Given a continuous and deterministic function \(F(\boldsymbol{x})\) defined on the unit cube in \(d\) dimensions, that is for \(\boldsymbol{x}\in [0,1]^d\), and a parameter \(\epsilon >0\), there is a one-layer (hidden) neural network \(f(\boldsymbol{x};\boldsymbol{\Theta})\) with \(\boldsymbol{\Theta}=(\boldsymbol{W},\boldsymbol{b})\) and \(\boldsymbol{W}\in \mathbb{R}^{m\times n}\) and \(\boldsymbol{b}\in \mathbb{R}^{n}\), for which

    +
    +\[ +\vert F(\boldsymbol{x})-f(\boldsymbol{x};\boldsymbol{\Theta})\vert < \epsilon \hspace{0.1cm} \forall \boldsymbol{x}\in[0,1]^d. +\]
    +
    +
    +

    Some parallels from real analysis#

    +

    For those of you familiar with for example the Stone-Weierstrass +theorem +for polynomial approximations or the convergence criterion for Fourier +series, there are similarities in the derivation of the proof for +neural networks.

    +
    +
    +

    The approximation theorem in words#

    +

    Any continuous function \(y=F(\boldsymbol{x})\) supported on the unit cube in +\(d\)-dimensions can be approximated by a one-layer sigmoidal network to +arbitrary accuracy.

    +

    Hornik (1991) extended the theorem by letting any non-constant, bounded activation function to be included using that the expectation value

    +
    +\[ +\mathbb{E}[\vert F(\boldsymbol{x})\vert^2] =\int_{\boldsymbol{x}\in D} \vert F(\boldsymbol{x})\vert^2p(\boldsymbol{x})d\boldsymbol{x} < \infty. +\]
    +

    Then we have

    +
    +\[ +\mathbb{E}[\vert F(\boldsymbol{x})-f(\boldsymbol{x};\boldsymbol{\Theta})\vert^2] =\int_{\boldsymbol{x}\in D} \vert F(\boldsymbol{x})-f(\boldsymbol{x};\boldsymbol{\Theta})\vert^2p(\boldsymbol{x})d\boldsymbol{x} < \epsilon. +\]
    +
    +
    +

    More on the general approximation theorem#

    +

    None of the proofs give any insight into the relation between the number of hidden layers and nodes and the approximation error \(\epsilon\), nor the magnitudes of \(\boldsymbol{W}\) and \(\boldsymbol{b}\).

    +

    Neural networks (NNs) have what we may call a kind of universality no matter what function we want to compute.

    +

    It does not mean that an NN can be used to exactly compute any function. Rather, we get an approximation that is as good as we want.
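    As an illustration (my own sketch; the target function and hyperparameters are arbitrary choices, and the optimizer is not guaranteed to find the best possible approximation on every run), a single hidden layer typically fits a continuous function on \([0,1]\) better and better as the number of hidden nodes grows.

    import numpy as np
    from sklearn.neural_network import MLPRegressor

    x = np.linspace(0, 1, 400).reshape(-1, 1)
    y = np.sin(4 * np.pi * x).ravel()          # a continuous target F(x)

    for nodes in (5, 20, 100):
        net = MLPRegressor(hidden_layer_sizes=(nodes,), activation='logistic',
                           solver='lbfgs', max_iter=5000, random_state=0)
        net.fit(x, y)
        deviation = np.max(np.abs(net.predict(x) - y))
        print(f"{nodes:4d} hidden nodes, max deviation {deviation:.3f}")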

    +
    +
    +

    Class of functions we can approximate#

    +

    The class of functions that can be approximated are the continuous ones. +If the function \(F(\boldsymbol{x})\) is discontinuous, it won’t in general be possible to approximate it. However, an NN may still give an approximation even if we fail in some points.

    +
    +
    +

    Setting up the equations for a neural network#

    +

    The questions we want to ask are how do changes in the biases and the +weights in our network change the cost function and how can we use the +final output to modify the weights and biases?

    +

    To derive these equations let us start with a plain regression problem +and define our cost function as

    +
    +\[ +{\cal C}(\boldsymbol{\Theta}) = \frac{1}{2}\sum_{i=1}^n\left(y_i - \tilde{y}_i\right)^2, +\]
    +

    where the \(y_i\)s are our \(n\) targets (the values we want to +reproduce), while the outputs of the network after having propagated +all inputs \(\boldsymbol{x}\) are given by \(\boldsymbol{\tilde{y}}_i\).

    +
    +
    +

    Layout of a neural network with three hidden layers#

    + + +

    Figure 1:

    +
    +
    +

    Definitions#

    +

    With our definition of the targets \(\boldsymbol{y}\), the outputs of the +network \(\boldsymbol{\tilde{y}}\) and the inputs \(\boldsymbol{x}\) we +define now the activation \(z_j^l\) of node/neuron/unit \(j\) of the +\(l\)-th layer as a function of the bias, the weights which add up from +the previous layer \(l-1\) and the forward passes/outputs +\(\hat{a}^{l-1}\) from the previous layer as

    +
    +\[ +z_j^l = \sum_{i=1}^{M_{l-1}}w_{ij}^la_i^{l-1}+b_j^l, +\]
    +

    where \(b_j^l\) are the biases of layer \(l\). Here \(M_{l-1}\) represents the total number of nodes/neurons/units of layer \(l-1\). The figure in the whiteboard notes illustrates this equation. We can rewrite this in a more compact form as the matrix-vector products we discussed earlier,

    +
    +\[ +\hat{z}^l = \left(\hat{W}^l\right)^T\hat{a}^{l-1}+\hat{b}^l. +\]
    +
    +
    +

    Inputs to the activation function#

    +

    With the activation values \(\boldsymbol{z}^l\) we can in turn define the +output of layer \(l\) as \(\boldsymbol{a}^l = f(\boldsymbol{z}^l)\) where \(f\) is our +activation function. In the examples here we will use the sigmoid +function discussed in our logistic regression lectures. We will also use the same activation function \(f\) for all layers +and their nodes. It means we have

    +
    \[ a_j^l = \sigma(z_j^l) = \frac{1}{1+\exp(-z_j^l)}. \]
    +
    +
    +

    Derivatives and the chain rule#

    +

    From the definition of the activation \(z_j^l\) we have

    +
    +\[ +\frac{\partial z_j^l}{\partial w_{ij}^l} = a_i^{l-1}, +\]
    +

    and

    +
    \[ \frac{\partial z_j^l}{\partial a_i^{l-1}} = w_{ij}^l. \]
    +

    With our definition of the activation function we have that (note that this function depends only on \(z_j^l\))

    +
    +\[ +\frac{\partial a_j^l}{\partial z_j^{l}} = a_j^l(1-a_j^l)=\sigma(z_j^l)(1-\sigma(z_j^l)). +\]
    +
    +
    +

    Derivative of the cost function#

    +

    With these definitions we can now compute the derivative of the cost function in terms of the weights.

    +

    Let us specialize to the output layer \(l=L\). Our cost function is

    +
    +\[ +{\cal C}(\boldsymbol{\Theta}^L) = \frac{1}{2}\sum_{i=1}^n\left(y_i - \tilde{y}_i\right)^2=\frac{1}{2}\sum_{i=1}^n\left(a_i^L - y_i\right)^2, +\]
    +

    The derivative of this function with respect to the weights is

    +
    +\[ +\frac{\partial{\cal C}(\boldsymbol{\Theta}^L)}{\partial w_{jk}^L} = \left(a_j^L - y_j\right)\frac{\partial a_j^L}{\partial w_{jk}^{L}}, +\]
    +

    The last partial derivative can easily be computed and reads (by applying the chain rule)

    +
    +\[ +\frac{\partial a_j^L}{\partial w_{jk}^{L}} = \frac{\partial a_j^L}{\partial z_{j}^{L}}\frac{\partial z_j^L}{\partial w_{jk}^{L}}=a_j^L(1-a_j^L)a_k^{L-1}. +\]
    +
    +
    +

    Simpler examples first, and automatic differentiation#

    +

    In order to understand the back propagation algorithm and its +derivation (an implementation of the chain rule), let us first digress +with some simple examples. These examples are also meant to motivate +the link with back propagation and automatic differentiation. We will discuss these topics next week (week 42).

    +
    +
    +

    Reminder on the chain rule and gradients#

    +

If we have a multivariate function \(f(x,y)\) where \(x=x(t)\) and \(y=y(t)\) are functions of a variable \(t\), the derivative of \(f\) with respect to \(t\) (written here without the explicit unit vector components) is

    +
    +\[\begin{split} +\frac{df}{dt} = \begin{bmatrix}\frac{\partial f}{\partial x} & \frac{\partial f}{\partial y} \end{bmatrix} \begin{bmatrix}\frac{\partial x}{\partial t} \\ \frac{\partial y}{\partial t} \end{bmatrix}=\frac{\partial f}{\partial x} \frac{\partial x}{\partial t} +\frac{\partial f}{\partial y} \frac{\partial y}{\partial t}. +\end{split}\]
    +
    +
    +

    Multivariable functions#

    +

    If we have a multivariate function \(f(x,y)\) where \(x=x(t,s)\) and \(y=y(t,s)\) are functions of the variables \(t\) and \(s\), we have that the partial derivatives

    +
    +\[ +\frac{\partial f}{\partial s}=\frac{\partial f}{\partial x}\frac{\partial x}{\partial s}+\frac{\partial f}{\partial y}\frac{\partial y}{\partial s}, +\]
    +

    and

    +
    +\[ +\frac{\partial f}{\partial t}=\frac{\partial f}{\partial x}\frac{\partial x}{\partial t}+\frac{\partial f}{\partial y}\frac{\partial y}{\partial t}. +\]
    +

This gives the gradient of \(f\) with respect to \(t\) and \(s\) (without the explicit unit vector components) as

    +
    +\[\begin{split} +\frac{df}{d(s,t)} = \begin{bmatrix}\frac{\partial f}{\partial x} & \frac{\partial f}{\partial y} \end{bmatrix} \begin{bmatrix}\frac{\partial x}{\partial s} &\frac{\partial x}{\partial t} \\ \frac{\partial y}{\partial s} & \frac{\partial y}{\partial t} \end{bmatrix}. +\end{split}\]
    +
    +
    +

    Automatic differentiation through examples#

    +

    A great introduction to automatic differentiation is given by Baydin et al., see https://arxiv.org/abs/1502.05767. +See also the video at https://www.youtube.com/watch?v=wG_nF1awSSY.

    +

Automatic differentiation is a repeated application of the chain rule on well-known elementary functions and allows for the calculation of derivatives to numerical precision. It is not the same as the calculation of symbolic derivatives via, for example, SymPy, nor does it use approximate formulae based on Taylor expansions of a function around a given value. The latter are error prone due to truncation errors and the choice of the step size \(\Delta\).

    +
    +
    +

    Simple example#

    +

    Our first example is rather simple,

    +
    +\[ +f(x) =\exp{x^2}, +\]
    +

    with derivative

    +
    +\[ +f'(x) =2x\exp{x^2}. +\]
    +

    We can use SymPy to extract the pertinent lines of Python code through the following simple example

    +
    +
    +
from sympy import *
    +# symbolic variable and the function f(x) = exp(x^2)
    +x = symbols('x')
    +expr = exp(x*x)
    +simplify(expr)
    +# symbolic derivative, f'(x) = 2x exp(x^2)
    +derivative = diff(expr,x)
    +print(python(expr))
    +print(python(derivative))
    +
    +
    +
    +
    +
    +
    +

    Smarter way of evaluating the above function#

    +

    If we study this function, we note that we can reduce the number of operations by introducing an intermediate variable

    +
    +\[ +a = x^2, +\]
    +

    leading to

    +
    +\[ +f(x) = f(a(x)) = b= \exp{a}. +\]
    +

    We now assume that all operations can be counted in terms of equal +floating point operations. This means that in order to calculate +\(f(x)\) we need first to square \(x\) and then compute the exponential. We +have thus two floating point operations only.

    +
    +
    +

    Reducing the number of operations#

    +

With the precalculated quantity \(a\), and thereby \(b=f(x)\), the derivative can be written as

    +
    +\[ +f'(x) = 2xb, +\]
    +

which reduces the number of operations from four in the original +expression to two. This means that if we need to compute \(f(x)\) and +its derivative (a common task in optimizations), we have reduced the +number of operations from six to four in total.

    +

Note that symbolic software like SymPy does not automatically include such simplifications, and the calculations of the function and the derivatives then yield in general more floating point operations.

    +
    +
    +

    Chain rule, forward and reverse modes#

    +

    In the above example we have introduced the variables \(a\) and \(b\), and our function is

    +
    +\[ +f(x) = f(a(x)) = b= \exp{a}, +\]
    +

    with \(a=x^2\). We can decompose the derivative of \(f\) with respect to \(x\) as

    +
    +\[ +\frac{df}{dx}=\frac{df}{db}\frac{db}{da}\frac{da}{dx}. +\]
    +

We note that since \(b=f(x)\), we have

    +
    +\[ +\frac{df}{db}=1, +\]
    +

    leading to

    +
    +\[ +\frac{df}{dx}=\frac{db}{da}\frac{da}{dx}=2x\exp{x^2}, +\]
    +

    as before.

    +
    +
    +

    Forward and reverse modes#

    +

    We have that

    +
    +\[ +\frac{df}{dx}=\frac{df}{db}\frac{db}{da}\frac{da}{dx}, +\]
    +

    which we can rewrite either as

    +
    +\[ +\frac{df}{dx}=\left[\frac{df}{db}\frac{db}{da}\right]\frac{da}{dx}, +\]
    +

    or

    +
    +\[ +\frac{df}{dx}=\frac{df}{db}\left[\frac{db}{da}\frac{da}{dx}\right]. +\]
    +

The first expression is called reverse mode (or back propagation) since we start by evaluating the derivatives at the end point and then propagate backwards. This is the standard way of evaluating derivatives (gradients) when optimizing the parameters of a neural network. In the context of deep learning this is computationally more efficient since the output of a neural network consists of either one or just a few output variables.

    +

    The second equation defines the so-called forward mode.

    +
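    Both groupings give of course the same number. A tiny sketch (with an arbitrary input value) makes the bookkeeping explicit for \(f(x)=\exp{x^2}\):

    import numpy as np

    x = 1.5
    a = x**2              # a = x^2
    b = np.exp(a)         # b = f(x)

    da_dx = 2*x
    db_da = np.exp(a)
    df_db = 1.0

    # reverse mode: start at the output and move towards the input
    reverse = (df_db*db_da)*da_dx
    # forward mode: start at the input and move towards the output
    forward = df_db*(db_da*da_dx)
    print(reverse, forward, 2*x*np.exp(x**2))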
    +
    +

    More complicated function#

    +

    We increase our ambitions and introduce a slightly more complicated function

    +
+\[ +f(x) =\sqrt{x^2+\exp{x^2}}, +\]
    +

    with derivative

    +
+\[ +f'(x) =\frac{x(1+\exp{x^2})}{\sqrt{x^2+\exp{x^2}}}. +\]
    +

    The corresponding SymPy code reads

    +
    +
    +
from sympy import *
    +# symbolic variable and the function f(x) = sqrt(x^2 + exp(x^2))
    +x = symbols('x')
    +expr = sqrt(x*x+exp(x*x))
    +simplify(expr)
    +# symbolic derivative of f(x)
    +derivative = diff(expr,x)
    +print(python(expr))
    +print(python(derivative))
    +
    +
    +
    +
    +
    +
    +

    Counting the number of floating point operations#

    +

    A simple count of operations shows that we need five operations for +the function itself and ten for the derivative. Fifteen operations in total if we wish to proceed with the above codes.

    +

    Can we reduce this to +say half the number of operations?

    +
    +
    +

    Defining intermediate operations#

    +

We can indeed reduce the number of operations to half of those listed in the brute force approach above. +We define the following quantities

    +
    +\[ +a = x^2, +\]
    +

    and

    +
    +\[ +b = \exp{x^2} = \exp{a}, +\]
    +

    and

    +
    +\[ +c= a+b, +\]
    +

    and

    +
    +\[ +d=f(x)=\sqrt{c}. +\]
    +
    +
    +

    New expression for the derivative#

    +

    With these definitions we obtain the following partial derivatives

    +
    +\[ +\frac{\partial a}{\partial x} = 2x, +\]
    +

    and

    +
    +\[ +\frac{\partial b}{\partial a} = \exp{a}, +\]
    +

    and

    +
    +\[ +\frac{\partial c}{\partial a} = 1, +\]
    +

    and

    +
    +\[ +\frac{\partial c}{\partial b} = 1, +\]
    +

    and

    +
    +\[ +\frac{\partial d}{\partial c} = \frac{1}{2\sqrt{c}}, +\]
    +

    and finally

    +
    +\[ +\frac{\partial f}{\partial d} = 1. +\]
    +
    +
    +

    Final derivatives#

    +

    Our final derivatives are thus

    +
    +\[ +\frac{\partial f}{\partial c} = \frac{\partial f}{\partial d} \frac{\partial d}{\partial c} = \frac{1}{2\sqrt{c}}, +\]
    +
    +\[ +\frac{\partial f}{\partial b} = \frac{\partial f}{\partial c} \frac{\partial c}{\partial b} = \frac{1}{2\sqrt{c}}, +\]
    +
    +\[ +\frac{\partial f}{\partial a} = \frac{\partial f}{\partial c} \frac{\partial c}{\partial a}+ +\frac{\partial f}{\partial b} \frac{\partial b}{\partial a} = \frac{1+\exp{a}}{2\sqrt{c}}, +\]
    +

    and finally

    +
    +\[ +\frac{\partial f}{\partial x} = \frac{\partial f}{\partial a} \frac{\partial a}{\partial x} = \frac{x(1+\exp{a})}{\sqrt{c}}, +\]
    +

    which is just

    +
    +\[ +\frac{\partial f}{\partial x} = \frac{x(1+b)}{d}, +\]
    +

    and requires only three operations if we can reuse all intermediate variables.

    +
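    A small numerical sketch (with an arbitrary input value) of the forward sweep with the intermediate variables and the resulting derivative, compared with the closed-form expression:

    import numpy as np

    x = 1.3
    # forward sweep: the intermediate variables
    a = x**2
    b = np.exp(a)
    c = a + b
    d = np.sqrt(c)        # d = f(x)

    # derivative reusing the intermediates, df/dx = x(1+b)/d
    df_dx = x*(1 + b)/d

    # closed-form derivative for comparison
    exact = x*(1 + np.exp(x**2))/np.sqrt(x**2 + np.exp(x**2))
    print(df_dx, exact)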
    +
    +

    In general not this simple#

    +

    In general, see the generalization below, unless we can obtain simple +analytical expressions which we can simplify further, the final +implementation of automatic differentiation involves repeated +calculations (and thereby operations) of derivatives of elementary +functions.

    +
    +
    +

    Automatic differentiation#

    +

    We can make this example more formal. Automatic differentiation is a +formalization of the previous example (see graph).

    +

We define \(x_1,\dots, x_l\) as the input variables to a given function \(f(\boldsymbol{x})\) and \(x_{l+1},\dots, x_L\) as intermediate variables.

    +

    In the above example we have only one input variable, \(l=1\) and four intermediate variables, that is

    +
    +\[ +\begin{bmatrix} x_1=x & x_2 = x^2=a & x_3 =\exp{a}= b & x_4=c=a+b & x_5 = \sqrt{c}=d \end{bmatrix}. +\]
    +

Furthermore, for \(i=l+1, \dots, L\) (here \(i=2,3,4,5\) and \(f=x_L=d\)), we +define the elementary functions \(g_i(x_{Pa(x_i)})\) where \(x_{Pa(x_i)}\) are the parent nodes of the variable \(x_i\).

    +

In our case, we have for example \(x_3=g_3(x_{Pa(x_3)})=\exp{a}\), that is \(g_3=\exp{()}\) and \(x_{Pa(x_3)}=x_2=a\).

    +
    +
    +

    Chain rule#

    +

    We can now compute the gradients by back-propagating the derivatives using the chain rule. +We have defined

    +
    +\[ +\frac{\partial f}{\partial x_L} = 1, +\]
    +

    which allows us to find the derivatives of the various variables \(x_i\) as

    +
    +\[ +\frac{\partial f}{\partial x_i} = \sum_{x_j:x_i\in Pa(x_j)}\frac{\partial f}{\partial x_j} \frac{\partial x_j}{\partial x_i}=\sum_{x_j:x_i\in Pa(x_j)}\frac{\partial f}{\partial x_j} \frac{\partial g_j}{\partial x_i}. +\]
    +

Whenever we have a function which can be expressed as a computation graph, and the various functions can be expressed in terms of differentiable elementary functions, automatic differentiation works. The functions need not be elementary functions; they could also be computer programs, although not all programs can be automatically differentiated.

    +
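    If the autograd library is available (an assumption here; JAX offers the same functionality through jax.grad), the whole bookkeeping can be delegated to reverse-mode automatic differentiation, as in this minimal sketch:

    import autograd.numpy as anp
    from autograd import grad

    def f(x):
        return anp.sqrt(x**2 + anp.exp(x**2))

    dfdx = grad(f)        # reverse-mode derivative of f
    x = 1.3
    print(dfdx(x), x*(1 + anp.exp(x**2))/anp.sqrt(x**2 + anp.exp(x**2)))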
    +
    +

First network example, simple perceptron with one input#

    +

    As yet another example we define now a simple perceptron model with +all quantities given by scalars. We consider only one input variable +\(x\) and one target value \(y\). We define an activation function +\(\sigma_1\) which takes as input

    +
    +\[ +z_1 = w_1x+b_1, +\]
    +

where \(w_1\) is the weight and \(b_1\) is the bias. These are the +parameters we want to optimize. The output is \(a_1=\sigma_1(z_1)\) (see +graph from whiteboard notes). This output is then fed into the +cost/loss function, which we here for the sake of simplicity just +define as the squared error

    +
    +\[ +C(x;w_1,b_1)=\frac{1}{2}(a_1-y)^2. +\]
    +
    +
    +

    Layout of a simple neural network with no hidden layer#

    + + +

    Figure 1:

    +
    +
    +

    Optimizing the parameters#

    +

    In setting up the feed forward and back propagation parts of the +algorithm, we need now the derivative of the various variables we want +to train.

    +

    We need

    +
    +\[ +\frac{\partial C}{\partial w_1} \hspace{0.1cm}\mathrm{and}\hspace{0.1cm}\frac{\partial C}{\partial b_1}. +\]
    +

    Using the chain rule we find

    +
    +\[ +\frac{\partial C}{\partial w_1}=\frac{\partial C}{\partial a_1}\frac{\partial a_1}{\partial z_1}\frac{\partial z_1}{\partial w_1}=(a_1-y)\sigma_1'x, +\]
    +

    and

    +
    +\[ +\frac{\partial C}{\partial b_1}=\frac{\partial C}{\partial a_1}\frac{\partial a_1}{\partial z_1}\frac{\partial z_1}{\partial b_1}=(a_1-y)\sigma_1', +\]
    +

    which we later will just define as

    +
    +\[ +\frac{\partial C}{\partial a_1}\frac{\partial a_1}{\partial z_1}=\delta_1. +\]
    +
    +
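    A minimal training sketch for this single perceptron, with made-up data and initial values (the target is chosen inside the range of the sigmoid):

    import numpy as np

    def sigmoid(z):
        return 1.0/(1.0+np.exp(-z))

    x, y = 4.0, 0.75       # one input and one target, chosen arbitrarily
    w1, b1 = 0.1, 0.01     # initial parameters
    eta = 0.1              # learning rate

    for i in range(100):
        z1 = w1*x + b1
        a1 = sigmoid(z1)
        # delta_1 = dC/da_1 * da_1/dz_1
        delta1 = (a1 - y)*a1*(1 - a1)
        # gradient descent updates
        w1 -= eta*delta1*x
        b1 -= eta*delta1

    print(a1, y)           # the output has moved towards the target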
    +

    Adding a hidden layer#

    +

    We change our simple model to (see graph) +a network with just one hidden layer but with scalar variables only.

    +

    Our output variable changes to \(a_2\) and \(a_1\) is now the output from the hidden node and \(a_0=x\). +We have then

    +
    +\[ +z_1 = w_1a_0+b_1 \hspace{0.1cm} \wedge a_1 = \sigma_1(z_1), +\]
    +
    +\[ +z_2 = w_2a_1+b_2 \hspace{0.1cm} \wedge a_2 = \sigma_2(z_2), +\]
    +

    and the cost function

    +
    +\[ +C(x;\boldsymbol{\Theta})=\frac{1}{2}(a_2-y)^2, +\]
    +

    with \(\boldsymbol{\Theta}=[w_1,w_2,b_1,b_2]\).

    +
    +
    +

    Layout of a simple neural network with one hidden layer#

    + + +

    Figure 1:

    +
    +
    +

    The derivatives#

    +

    The derivatives are now, using the chain rule again

    +
    +\[ +\frac{\partial C}{\partial w_2}=\frac{\partial C}{\partial a_2}\frac{\partial a_2}{\partial z_2}\frac{\partial z_2}{\partial w_2}=(a_2-y)\sigma_2'a_1=\delta_2a_1, +\]
    +
    +\[ +\frac{\partial C}{\partial b_2}=\frac{\partial C}{\partial a_2}\frac{\partial a_2}{\partial z_2}\frac{\partial z_2}{\partial b_2}=(a_2-y)\sigma_2'=\delta_2, +\]
    +
+\[ +\frac{\partial C}{\partial w_1}=\frac{\partial C}{\partial a_2}\frac{\partial a_2}{\partial z_2}\frac{\partial z_2}{\partial a_1}\frac{\partial a_1}{\partial z_1}\frac{\partial z_1}{\partial w_1}=(a_2-y)\sigma_2'w_2\sigma_1'a_0, +\]
    +
+\[ +\frac{\partial C}{\partial b_1}=\frac{\partial C}{\partial a_2}\frac{\partial a_2}{\partial z_2}\frac{\partial z_2}{\partial a_1}\frac{\partial a_1}{\partial z_1}\frac{\partial z_1}{\partial b_1}=(a_2-y)\sigma_2'w_2\sigma_1'=\delta_1. +\]
    +

    Can you generalize this to more than one hidden layer?

    +
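    The four derivatives can be checked with a few lines of Python; this is a sketch with arbitrary numbers, using the sigmoid for both \(\sigma_1\) and \(\sigma_2\) so that \(\sigma'=a(1-a)\).

    import numpy as np

    def sigmoid(z):
        return 1.0/(1.0+np.exp(-z))

    # arbitrary numbers for a single forward and backward pass
    a0, y = 0.5, 1.2
    w1, b1, w2, b2 = 0.4, 0.01, -0.7, 0.01

    # forward pass
    z1 = w1*a0 + b1; a1 = sigmoid(z1)
    z2 = w2*a1 + b2; a2 = sigmoid(z2)

    # backward pass with the deltas defined above
    delta2 = (a2 - y)*a2*(1 - a2)
    delta1 = delta2*w2*a1*(1 - a1)

    dC_dw2, dC_db2 = delta2*a1, delta2
    dC_dw1, dC_db1 = delta1*a0, delta1
    print(dC_dw2, dC_db2, dC_dw1, dC_db1)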
    +
    +

    Important observations#

    +

From the above equations we see that the derivatives of the activation +functions play a central role. If they vanish, the training may +stop. This is called the vanishing gradient problem, see discussions below. If they become +large, the parameters \(w_i\) and \(b_i\) may simply go to infinity. This +is referred to as the exploding gradient problem.

    +
    +
    +

    The training#

    +

    The training of the parameters is done through various gradient descent approximations with

    +
    +\[ +w_{i}\leftarrow w_{i}- \eta \delta_i a_{i-1}, +\]
    +

    and

    +
    +\[ +b_i \leftarrow b_i-\eta \delta_i, +\]
    +

where \(\eta\) is the learning rate.

    +

    One iteration consists of one feed forward step and one back-propagation step. Each back-propagation step does one update of the parameters \(\boldsymbol{\Theta}\).

    +

    For the first hidden layer \(a_{i-1}=a_0=x\) for this simple model.

    +
    +
    +

    Code example#

    +

The code here implements the above model with one hidden layer and scalar variables for the same function we studied in the previous example. The code is, however, set up so that we can add multiple inputs \(x\) and target values \(y\). Note also that we have the possibility of defining a feature matrix \(\boldsymbol{X}\) with more than just one column for the input values. This will turn out to be useful in our next example. We have also defined matrices and vectors for all of our operations, although it is not necessary here.

    +
    +
    +
    import numpy as np
    +# We use the Sigmoid function as activation function
    +def sigmoid(z):
    +    return 1.0/(1.0+np.exp(-z))
    +
    +def forwardpropagation(x):
    +    # weighted sum of inputs to the hidden layer
    +    z_1 = np.matmul(x, w_1) + b_1
    +    # activation in the hidden layer
    +    a_1 = sigmoid(z_1)
    +    # weighted sum of inputs to the output layer
    +    z_2 = np.matmul(a_1, w_2) + b_2
    +    a_2 = z_2
    +    return a_1, a_2
    +
    +def backpropagation(x, y):
    +    a_1, a_2 = forwardpropagation(x)
    +    # parameter delta for the output layer, note that a_2=z_2 and its derivative wrt z_2 is just 1
    +    delta_2 = a_2 - y
    +    print(0.5*((a_2-y)**2))
    +    # delta for  the hidden layer
    +    delta_1 = np.matmul(delta_2, w_2.T) * a_1 * (1 - a_1)
    +    # gradients for the output layer
    +    output_weights_gradient = np.matmul(a_1.T, delta_2)
    +    output_bias_gradient = np.sum(delta_2, axis=0)
    +    # gradient for the hidden layer
    +    hidden_weights_gradient = np.matmul(x.T, delta_1)
    +    hidden_bias_gradient = np.sum(delta_1, axis=0)
    +    return output_weights_gradient, output_bias_gradient, hidden_weights_gradient, hidden_bias_gradient
    +
    +
    +# ensure the same random numbers appear every time
    +np.random.seed(0)
    +# Input variable
    +x = np.array([4.0],dtype=np.float64)
    +# Target values
    +y = 2*x+1.0 
    +
    +# Defining the neural network, only scalars here
    +n_inputs = x.shape
    +n_features = 1
    +n_hidden_neurons = 1
    +n_outputs = 1
    +
    +# Initialize the network
    +# weights and bias in the hidden layer
    +w_1 = np.random.randn(n_features, n_hidden_neurons)
    +b_1 = np.zeros(n_hidden_neurons) + 0.01
    +
    +# weights and bias in the output layer
    +w_2 = np.random.randn(n_hidden_neurons, n_outputs)
    +b_2 = np.zeros(n_outputs) + 0.01
    +
    +eta = 0.1
    +for i in range(50):
    +    # calculate gradients
    +    derivW2, derivB2, derivW1, derivB1 = backpropagation(x, y)
    +    # update weights and biases
    +    w_2 -= eta * derivW2
    +    b_2 -= eta * derivB2
    +    w_1 -= eta * derivW1
    +    b_1 -= eta * derivB1
    +
    +
    +
    +
    +

We see that after a few iterations (the results do, however, depend on the learning rate), we get an error which is rather small.

    +
    +
    +

    Exercise 1: Including more data#

    +

Try to increase the amount of input and target/output data. Try also to perform calculations for more values of the learning rate. Feel free to add a regularization term with either an \(l_1\) norm or an \(l_2\) norm (with its corresponding hyperparameter) and discuss your results. Discuss your results as functions of the amount of training data and the various learning rates.

    +

    Challenge: Try to change the activation functions and replace the hard-coded analytical expressions with automatic derivation via either autograd or JAX.

    +
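    As a pointer for the challenge, a minimal JAX sketch (assuming JAX is installed; the parameter values are made up) that returns the gradients of the cost with respect to all four parameters of the model above:

    import jax.numpy as jnp
    from jax import grad

    def sigmoid(z):
        return 1.0/(1.0+jnp.exp(-z))

    def loss(params, x, y):
        w1, b1, w2, b2 = params
        a1 = sigmoid(w1*x + b1)
        a2 = w2*a1 + b2                  # linear output, as in the code above
        return 0.5*jnp.sum((a2 - y)**2)

    params = (0.5, 0.01, -0.3, 0.01)     # made-up initial parameters
    x = jnp.array([4.0])
    y = 2*x + 1.0
    grads = grad(loss)(params, x, y)     # tuple with the same structure as params
    print(grads)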
    +
    +

    Simple neural network and the back propagation equations#

    +

Let us now try to increase our level of ambition and attempt to set up the equations for a neural network with two input nodes, one hidden layer with two hidden nodes and one output layer with only one output node/neuron (see graph).

    +

    We need to define the following parameters and variables with the input layer (layer \((0)\)) +where we label the nodes \(x_0\) and \(x_1\)

    +
    +\[ +x_0 = a_0^{(0)} \wedge x_1 = a_1^{(0)}. +\]
    +

The hidden layer (layer \((1)\)) has nodes which yield the outputs \(a_0^{(1)}\) and \(a_1^{(1)}\) with weight \(\boldsymbol{w}\) and bias \(\boldsymbol{b}\) parameters

    +
    +\[ +w_{ij}^{(1)}=\left\{w_{00}^{(1)},w_{01}^{(1)},w_{10}^{(1)},w_{11}^{(1)}\right\} \wedge b^{(1)}=\left\{b_0^{(1)},b_1^{(1)}\right\}. +\]
    +
    +
    +

    Layout of a simple neural network with two input nodes, one hidden layer and one output node#

    + + +

    Figure 1:

    +
    +
    +

The output layer#

    +

Finally, we have the output layer given by layer label \((2)\) with output \(a^{(2)}\) and weights and biases to be determined given by the variables

    +
    +\[ +w_{i}^{(2)}=\left\{w_{0}^{(2)},w_{1}^{(2)}\right\} \wedge b^{(2)}. +\]
    +

    Our output is \(\tilde{y}=a^{(2)}\) and we define a generic cost function \(C(a^{(2)},y;\boldsymbol{\Theta})\) where \(y\) is the target value (a scalar here). +The parameters we need to optimize are given by

    +
    +\[ +\boldsymbol{\Theta}=\left\{w_{00}^{(1)},w_{01}^{(1)},w_{10}^{(1)},w_{11}^{(1)},w_{0}^{(2)},w_{1}^{(2)},b_0^{(1)},b_1^{(1)},b^{(2)}\right\}. +\]
    +
    +
    +

    Compact expressions#

    +

    We can define the inputs to the activation functions for the various layers in terms of various matrix-vector multiplications and vector additions. +The inputs to the first hidden layer are

    +
    +\[\begin{split} +\begin{bmatrix}z_0^{(1)} \\ z_1^{(1)} \end{bmatrix}=\begin{bmatrix}w_{00}^{(1)} & w_{01}^{(1)}\\ w_{10}^{(1)} &w_{11}^{(1)} \end{bmatrix}\begin{bmatrix}a_0^{(0)} \\ a_1^{(0)} \end{bmatrix}+\begin{bmatrix}b_0^{(1)} \\ b_1^{(1)} \end{bmatrix}, +\end{split}\]
    +

    with outputs

    +
    +\[\begin{split} +\begin{bmatrix}a_0^{(1)} \\ a_1^{(1)} \end{bmatrix}=\begin{bmatrix}\sigma^{(1)}(z_0^{(1)}) \\ \sigma^{(1)}(z_1^{(1)}) \end{bmatrix}. +\end{split}\]
    +
    +
    +

    Output layer#

    +

    For the final output layer we have the inputs to the final activation function

    +
    +\[ +z^{(2)} = w_{0}^{(2)}a_0^{(1)} +w_{1}^{(2)}a_1^{(1)}+b^{(2)}, +\]
    +

    resulting in the output

    +
    +\[ +a^{(2)}=\sigma^{(2)}(z^{(2)}). +\]
    +
    +
    +

    Explicit derivatives#

    +

    In total we have nine parameters which we need to train. Using the +chain rule (or just the back-propagation algorithm) we can find all +derivatives. Since we will use automatic differentiation in reverse +mode, we start with the derivatives of the cost function with respect +to the parameters of the output layer, namely

    +
    +\[ +\frac{\partial C}{\partial w_{i}^{(2)}}=\frac{\partial C}{\partial a^{(2)}}\frac{\partial a^{(2)}}{\partial z^{(2)}}\frac{\partial z^{(2)}}{\partial w_{i}^{(2)}}=\delta^{(2)}a_i^{(1)}, +\]
    +

    with

    +
    +\[ +\delta^{(2)}=\frac{\partial C}{\partial a^{(2)}}\frac{\partial a^{(2)}}{\partial z^{(2)}} +\]
    +

    and finally

    +
    +\[ +\frac{\partial C}{\partial b^{(2)}}=\frac{\partial C}{\partial a^{(2)}}\frac{\partial a^{(2)}}{\partial z^{(2)}}\frac{\partial z^{(2)}}{\partial b^{(2)}}=\delta^{(2)}. +\]
    +
    +
    +

    Derivatives of the hidden layer#

    +

    Using the chain rule we have the following expressions for say one of the weight parameters (it is easy to generalize to the other weight parameters)

    +
    +\[ +\frac{\partial C}{\partial w_{00}^{(1)}}=\frac{\partial C}{\partial a^{(2)}}\frac{\partial a^{(2)}}{\partial z^{(2)}} +\frac{\partial z^{(2)}}{\partial z_0^{(1)}}\frac{\partial z_0^{(1)}}{\partial w_{00}^{(1)}}= \delta^{(2)}\frac{\partial z^{(2)}}{\partial z_0^{(1)}}\frac{\partial z_0^{(1)}}{\partial w_{00}^{(1)}}, +\]
    +

    which, noting that

    +
    +\[ +z^{(2)} =w_0^{(2)}a_0^{(1)}+w_1^{(2)}a_1^{(1)}+b^{(2)}, +\]
    +

    allows us to rewrite

    +
+\[ +\frac{\partial z^{(2)}}{\partial z_0^{(1)}}\frac{\partial z_0^{(1)}}{\partial w_{00}^{(1)}}=w_0^{(2)}\frac{\partial a_0^{(1)}}{\partial z_0^{(1)}}a_0^{(0)}. +\]
    +
    +
    +

    Final expression#

    +

    Defining

    +
    +\[ +\delta_0^{(1)}=w_0^{(2)}\frac{\partial a_0^{(1)}}{\partial z_0^{(1)}}\delta^{(2)}, +\]
    +

    we have

    +
+\[ +\frac{\partial C}{\partial w_{00}^{(1)}}=\delta_0^{(1)}a_0^{(0)}. +\]
    +

    Similarly, we obtain

    +
+\[ +\frac{\partial C}{\partial w_{01}^{(1)}}=\delta_0^{(1)}a_1^{(0)}. +\]
    +
    +
    +

    Completing the list#

    +

    Similarly, we find

    +
+\[ +\frac{\partial C}{\partial w_{10}^{(1)}}=\delta_1^{(1)}a_0^{(0)}, +\]
    +

    and

    +
+\[ +\frac{\partial C}{\partial w_{11}^{(1)}}=\delta_1^{(1)}a_1^{(0)}, +\]
    +

    where we have defined

    +
    +\[ +\delta_1^{(1)}=w_1^{(2)}\frac{\partial a_1^{(1)}}{\partial z_1^{(1)}}\delta^{(2)}. +\]
    +
    +
    +

    Final expressions for the biases of the hidden layer#

    +

    For the sake of completeness, we list the derivatives of the biases, which are

    +
    +\[ +\frac{\partial C}{\partial b_{0}^{(1)}}=\delta_0^{(1)}, +\]
    +

    and

    +
    +\[ +\frac{\partial C}{\partial b_{1}^{(1)}}=\delta_1^{(1)}. +\]
    +

    As we will see below, these expressions can be generalized in a more compact form.

    +
    +
    +

    Gradient expressions#

    +

For this specific model, with just one output node and two hidden +nodes, the gradient descent equations take the following form for the output layer

    +
    +\[ +w_{i}^{(2)}\leftarrow w_{i}^{(2)}- \eta \delta^{(2)} a_{i}^{(1)}, +\]
    +

    and

    +
    +\[ +b^{(2)} \leftarrow b^{(2)}-\eta \delta^{(2)}, +\]
    +

    and

    +
    +\[ +w_{ij}^{(1)}\leftarrow w_{ij}^{(1)}- \eta \delta_{i}^{(1)} a_{j}^{(0)}, +\]
    +

    and

    +
    +\[ +b_{i}^{(1)} \leftarrow b_{i}^{(1)}-\eta \delta_{i}^{(1)}, +\]
    +

    where \(\eta\) is the learning rate.

    +
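    A minimal sketch of one gradient-descent step for this 2-2-1 network with the deltas defined above (all numbers are made up; the hidden-layer weights are stored as a matrix with entries \(w_{ij}^{(1)}\) acting on the inputs as in the text):

    import numpy as np

    def sigmoid(z):
        return 1.0/(1.0+np.exp(-z))

    rng = np.random.default_rng(1)
    a0 = np.array([0.3, -0.2])        # inputs a^(0) = (x_0, x_1)
    y = 0.8                           # scalar target
    W1 = rng.normal(size=(2, 2))      # w_{ij}^(1)
    b1 = np.zeros(2) + 0.01
    w2 = rng.normal(size=2)           # w_i^(2)
    b2 = 0.01
    eta = 0.1

    # forward pass
    z1 = W1 @ a0 + b1
    a1 = sigmoid(z1)
    z2 = w2 @ a1 + b2
    a2 = sigmoid(z2)

    # backward pass
    delta2 = (a2 - y)*a2*(1 - a2)           # delta^(2)
    delta1 = w2*a1*(1 - a1)*delta2          # delta_i^(1)

    # gradient-descent updates as listed above
    w2 -= eta*delta2*a1
    b2 -= eta*delta2
    W1 -= eta*np.outer(delta1, a0)          # dC/dw_{ij}^(1) = delta_i^(1) a_j^(0)
    b1 -= eta*delta1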
    +
    +

    Exercise 2: Extended program#

    +

We extend our simple code to a function which depends on two variables \(x_0\) and \(x_1\), that is

    +
    +\[ +y=f(x_0,x_1)=x_0^2+3x_0x_1+x_1^2+5. +\]
    +

We feed our network with \(n=100\) entries \(x_0\) and \(x_1\). We have thus two features represented by these variables and an input matrix/design matrix \(\boldsymbol{X}\in \mathbf{R}^{n\times 2}\)

    +
+\[\begin{split} +\boldsymbol{X}=\begin{bmatrix} x_{00} & x_{01} \\ x_{10} & x_{11} \\ x_{20} & x_{21} \\ \dots & \dots \\ x_{n-2,0} & x_{n-2,1} \\ x_{n-1,0} & x_{n-1,1} \end{bmatrix}. +\end{split}\]
    +

Write a code, based on the previous code examples, which takes as input these data and fits the above function. +You can extend your code to include automatic differentiation.

    +

With these examples, we are now ready to embark upon the writing of a more general code for neural networks.

    +
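    A possible way of generating the inputs and targets for this exercise (a sketch only; the sampling interval is an arbitrary choice):

    import numpy as np

    rng = np.random.default_rng(2025)
    n = 100
    X = rng.uniform(-1.0, 1.0, size=(n, 2))      # columns are x_0 and x_1
    x0, x1 = X[:, 0], X[:, 1]
    y = (x0**2 + 3*x0*x1 + x1**2 + 5).reshape(n, 1)
    print(X.shape, y.shape)                      # (100, 2) (100, 1)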
    +
    +

    Getting serious, the back propagation equations for a neural network#

    +

Now it is time to move away from having only one node in each layer. Our inputs will also in general be represented by several input values.

    +

    We have thus

    +
+\[ +\frac{\partial{\cal C}(\boldsymbol{\Theta}^L)}{\partial w_{jk}^L} = \left(a_j^L - y_j\right)a_j^L(1-a_j^L)a_k^{L-1}, +\]
    +

    Defining

    +
    +\[ +\delta_j^L = a_j^L(1-a_j^L)\left(a_j^L - y_j\right) = \sigma'(z_j^L)\frac{\partial {\cal C}}{\partial (a_j^L)}, +\]
    +

    and using the Hadamard product of two vectors we can write this as

    +
    +\[ +\boldsymbol{\delta}^L = \sigma'(\hat{z}^L)\circ\frac{\partial {\cal C}}{\partial (\boldsymbol{a}^L)}. +\]
    +
    +
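    For the squared-error cost and sigmoid output used here, the Hadamard expression becomes a one-liner in NumPy; this sketch uses made-up numbers for the output layer.

    import numpy as np

    def sigmoid(z):
        return 1.0/(1.0+np.exp(-z))

    z_L = np.array([0.2, -1.0, 0.5])    # made-up activations z^L of the output layer
    y   = np.array([0.0,  1.0, 0.5])    # made-up targets
    a_L = sigmoid(z_L)

    # delta^L = sigma'(z^L) o dC/da^L, with dC/da^L = (a^L - y) for the squared error
    delta_L = a_L*(1 - a_L)*(a_L - y)
    print(delta_L)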
    +

    Analyzing the last results#

    +

This is an important expression. The second term on the right-hand side measures how fast the cost function is changing as a function of the \(j\)th output activation. If, for example, the cost function doesn’t depend much on a particular output node \(j\), then \(\delta_j^L\) will be small, which is what we would expect. The first term on the right measures how fast the activation function \(f\) is changing at a given activation value \(z_j^L\).

    +
    +
    +

    More considerations#

    +

    Notice that everything in the above equations is easily computed. In +particular, we compute \(z_j^L\) while computing the behaviour of the +network, and it is only a small additional overhead to compute +\(\sigma'(z^L_j)\). The exact form of the derivative with respect to the +output depends on the form of the cost function. +However, provided the cost function is known there should be little +trouble in calculating

    +
    +\[ +\frac{\partial {\cal C}}{\partial (a_j^L)} +\]
    +

    With the definition of \(\delta_j^L\) we have a more compact definition of the derivative of the cost function in terms of the weights, namely

    +
    +\[ +\frac{\partial{\cal C}}{\partial w_{jk}^L} = \delta_j^La_k^{L-1}. +\]
    +
    +
    +

    Derivatives in terms of \(z_j^L\)#

    +

    It is also easy to see that our previous equation can be written as

    +
    +\[ +\delta_j^L =\frac{\partial {\cal C}}{\partial z_j^L}= \frac{\partial {\cal C}}{\partial a_j^L}\frac{\partial a_j^L}{\partial z_j^L}, +\]
    +

    which can also be interpreted as the partial derivative of the cost function with respect to the biases \(b_j^L\), namely

    +
    +\[ +\delta_j^L = \frac{\partial {\cal C}}{\partial b_j^L}\frac{\partial b_j^L}{\partial z_j^L}=\frac{\partial {\cal C}}{\partial b_j^L}, +\]
    +

    That is, the error \(\delta_j^L\) is exactly equal to the rate of change of the cost function as a function of the bias.

    +
    +
    +

    Bringing it together#

    +

    We have now three equations that are essential for the computations of the derivatives of the cost function at the output layer. These equations are needed to start the algorithm and they are

    + +
    +
    +\[ +\begin{equation} +\frac{\partial{\cal C}(\hat{W^L})}{\partial w_{jk}^L} = \delta_j^La_k^{L-1}, +\label{_auto1} \tag{2} +\end{equation} +\]
    +

    and

    + +
    +
    +\[ +\begin{equation} +\delta_j^L = \sigma'(z_j^L)\frac{\partial {\cal C}}{\partial (a_j^L)}, +\label{_auto2} \tag{3} +\end{equation} +\]
    +

    and

    + +
    +
    +\[ +\begin{equation} +\delta_j^L = \frac{\partial {\cal C}}{\partial b_j^L}, +\label{_auto3} \tag{4} +\end{equation} +\]
    +
    +
    +

    Final back propagating equation#

    +

    We have that (replacing \(L\) with a general layer \(l\))

    +
    +\[ +\delta_j^l =\frac{\partial {\cal C}}{\partial z_j^l}. +\]
    +

    We want to express this in terms of the equations for layer \(l+1\).

    +
    +
    +

    Using the chain rule and summing over all \(k\) entries#

    +

    We obtain

    +
    +\[ +\delta_j^l =\sum_k \frac{\partial {\cal C}}{\partial z_k^{l+1}}\frac{\partial z_k^{l+1}}{\partial z_j^{l}}=\sum_k \delta_k^{l+1}\frac{\partial z_k^{l+1}}{\partial z_j^{l}}, +\]
    +

    and recalling that

    +
    +\[ +z_j^{l+1} = \sum_{i=1}^{M_{l}}w_{ij}^{l+1}a_i^{l}+b_j^{l+1}, +\]
    +

    with \(M_l\) being the number of nodes in layer \(l\), we obtain

    +
    +\[ +\delta_j^l =\sum_k \delta_k^{l+1}w_{kj}^{l+1}\sigma'(z_j^l), +\]
    +

    This is our final equation.

    +

    We are now ready to set up the algorithm for back propagation and learning the weights and biases.

    +
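    In vectorized form the recursion is a single line of NumPy. In this sketch (made-up sizes and numbers) the weight matrix is stored so that entry \([k,j]\) equals \(w_{kj}^{l+1}\), exactly as the indices appear in the formula above.

    import numpy as np

    def sigmoid_prime(z):
        s = 1.0/(1.0+np.exp(-z))
        return s*(1 - s)

    rng = np.random.default_rng(3)
    M_l, M_next = 3, 2                       # made-up layer sizes
    W_next = rng.normal(size=(M_next, M_l))  # entry [k, j] = w_{kj}^{l+1}
    z_l = rng.normal(size=M_l)
    delta_next = rng.normal(size=M_next)     # delta^{l+1}, assumed already computed

    # delta_j^l = sum_k delta_k^{l+1} w_{kj}^{l+1} sigma'(z_j^l)
    delta_l = (W_next.T @ delta_next)*sigmoid_prime(z_l)
    print(delta_l)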
    +
    +

    Setting up the back propagation algorithm#

    +

    The four equations provide us with a way of computing the gradient of the cost function. Let us write this out in the form of an algorithm.

    +

First, we set up the input data \(\hat{x}\) and compute the activations \(\hat{z}^1\) of the first hidden layer; applying the activation function then gives the pertinent outputs \(\hat{a}^1\).

    +

Secondly, we then perform the feed forward until we reach the output layer, computing all \(\hat{z}^l\) layer by layer and, via the activation function, the pertinent outputs \(\hat{a}^l\) for \(l=1,2,3,\dots,L\).

    +

    Notation: The first hidden layer has \(l=1\) as label and the final output layer has \(l=L\).

    +
    +
    +

    Setting up the back propagation algorithm, part 2#

    +

Thereafter we compute the output error \(\hat{\delta}^L\) by computing all

    +
    +\[ +\delta_j^L = \sigma'(z_j^L)\frac{\partial {\cal C}}{\partial (a_j^L)}. +\]
    +

    Then we compute the back propagate error for each \(l=L-1,L-2,\dots,1\) as

    +
    +\[ +\delta_j^l = \sum_k \delta_k^{l+1}w_{kj}^{l+1}\sigma'(z_j^l). +\]
    +
    +
    +

    Setting up the Back propagation algorithm, part 3#

    +

    Finally, we update the weights and the biases using gradient descent +for each \(l=L-1,L-2,\dots,1\) and update the weights and biases +according to the rules

    +
+\[ +w_{jk}^l\leftarrow w_{jk}^l- \eta \delta_j^la_k^{l-1}, +\]
    +
    +\[ +b_j^l \leftarrow b_j^l-\eta \frac{\partial {\cal C}}{\partial b_j^l}=b_j^l-\eta \delta_j^l, +\]
    +

    with \(\eta\) being the learning rate.

    +
    +
    +

    Updating the gradients#

    +

    With the back propagate error for each \(l=L-1,L-2,\dots,1\) as

    +
+\[ +\delta_j^l = \sum_k \delta_k^{l+1}w_{kj}^{l+1}\sigma'(z_j^l), +\]
    +

    we update the weights and the biases using gradient descent for each \(l=L-1,L-2,\dots,1\) and update the weights and biases according to the rules

    +
+\[ +w_{jk}^l\leftarrow w_{jk}^l- \eta \delta_j^la_k^{l-1}, +\]
    +
    +\[ +b_j^l \leftarrow b_j^l-\eta \frac{\partial {\cal C}}{\partial b_j^l}=b_j^l-\eta \delta_j^l, +\]
    +
    +
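    Putting the pieces together, here is a compact sketch of one full iteration (feed forward, back propagation, parameter update) for a fully connected network with sigmoid activations, the squared-error cost and a single sample. The layer sizes and data are made up, and the weight matrices are stored with shape (nodes in layer \(l-1\), nodes in layer \(l\)) as in the earlier code example, so the error from layer \(l+1\) is propagated by multiplying with that matrix.

    import numpy as np

    def sigmoid(z):
        return 1.0/(1.0+np.exp(-z))

    rng = np.random.default_rng(4)
    sizes = [2, 3, 3, 1]                 # input layer, two hidden layers, output layer
    Ws = [rng.normal(size=(m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
    bs = [np.zeros(n) + 0.01 for n in sizes[1:]]

    x = np.array([0.5, -0.3])            # one input sample
    y = np.array([0.8])                  # its target
    eta = 0.1

    # feed forward: store all activations a^l (a^0 is the input)
    a = x
    a_list = [a]
    for W, b in zip(Ws, bs):
        a = sigmoid(W.T @ a + b)
        a_list.append(a)

    # output error delta^L for the squared-error cost and sigmoid output
    delta = a_list[-1]*(1 - a_list[-1])*(a_list[-1] - y)

    # back propagate and update, layer by layer from L down to 1
    for l in range(len(Ws) - 1, -1, -1):
        grad_W = np.outer(a_list[l], delta)      # dC/dW^l[j, k] = a_j^{l-1} delta_k^l with this storage
        grad_b = delta
        if l > 0:
            # error of the layer below, computed before the weights are overwritten
            delta = (Ws[l] @ delta)*a_list[l]*(1 - a_list[l])
        Ws[l] -= eta*grad_W
        bs[l] -= eta*grad_b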
\ No newline at end of file diff --git a/doc/LectureNotes/_build/html/week42.html b/doc/LectureNotes/_build/html/week42.html new file mode 100644 index 000000000..efa142492 --- /dev/null +++ b/doc/LectureNotes/_build/html/week42.html @@ -0,0 +1,4066 @@ +Week 42 Constructing a Neural Network code with examples — Applied Data Analysis and Machine Learning


    Week 42 Constructing a Neural Network code with examples#

    +

    Morten Hjorth-Jensen, Department of Physics, University of Oslo, Norway

    +

    Date: October 13-17, 2025

    +
    +

    Lecture October 13, 2025#

    +
      +
1. Building our own Feed-forward Neural Network and discussion of project 2

2. Project 2 is available at CompPhysics/MachineLearning
    +
    +
    +

    Readings and videos#

    +
      +
1. These lecture notes

2. Video of lecture at https://youtu.be/eqyNrEYRXnY

3. Whiteboard notes at CompPhysics/MachineLearning

4. For a more in depth discussion on neural networks we recommend Goodfellow et al chapters 6 and 7. For the optimization part, see chapter 8.

5. Neural Networks demystified at https://www.youtube.com/watch?v=bxe2T-V8XRs&list=PLiaHhY2iBX9hdHaRr6b7XevZtgZRa1PoU&ab_channel=WelchLabs

6. Building Neural Networks from scratch at https://www.youtube.com/watch?v=Wo5dMEP_BbI&list=PLQVvvaa0QuDcjD5BAw2DxE6OF2tius3V3&ab_channel=sentdex

7. Video on Neural Networks at https://www.youtube.com/watch?v=CqOfi41LfDw

8. Video on the back propagation algorithm at https://www.youtube.com/watch?v=Ilg3gGewQ5U
    +

    I also recommend Michael Nielsen’s intuitive approach to the neural networks and the universal approximation theorem, see the slides at http://neuralnetworksanddeeplearning.com/chap4.html.

    +
    +
    +

    Material for the lab sessions on Tuesday and Wednesday#

    +
      +
1. Exercises on writing a code for neural networks, back propagation part, see exercises for week 42 at https://compphysics.github.io/MachineLearning/doc/LectureNotes/_build/html/exercisesweek42.html

2. Discussion of project 2
    +
    +
    +

    Lecture material: Writing a code which implements a feed-forward neural network#

    +

Last week we discussed the basics of neural networks and deep learning and the basics of automatic differentiation. We also looked at examples of how to compute the parameters of a simple network with scalar inputs and outputs and with no or just one hidden layer.

    +

We ended our discussions with the derivation of the equations for a neural network with one hidden layer, two input variables and two hidden nodes, but only one output node. We almost finished the derivation of the back propagation algorithm.

    +
    +
    +

    Mathematics of deep learning#

    +

    Two recent books online.

    +
      +
1. The Modern Mathematics of Deep Learning, by Julius Berner, Philipp Grohs, Gitta Kutyniok, Philipp Petersen, published as Mathematical Aspects of Deep Learning, pp. 1-111. Cambridge University Press, 2022

2. Mathematical Introduction to Deep Learning: Methods, Implementations, and Theory, Arnulf Jentzen, Benno Kuckuck, Philippe von Wurstemberger
    +
    +
    +

    Reminder on books with hands-on material and codes#

    + +
    +
    +

    Reading recommendations#

    +
      +
1. Raschka et al., chapter 11, jupyter-notebook sent separately, from GitHub

2. Goodfellow et al, chapters 6 and 7 contain most of the neural network background.
    +
    +
    +

Reminder from last week: First network example, simple perceptron with one input#

    +

    As yet another example we define now a simple perceptron model with +all quantities given by scalars. We consider only one input variable +\(x\) and one target value \(y\). We define an activation function +\(\sigma_1\) which takes as input

    +
    +\[ +z_1 = w_1x+b_1, +\]
    +

where \(w_1\) is the weight and \(b_1\) is the bias. These are the +parameters we want to optimize. The output is \(a_1=\sigma_1(z_1)\) (see +graph from whiteboard notes). This output is then fed into the +cost/loss function, which we here for the sake of simplicity just +define as the squared error

    +
    +\[ +C(x;w_1,b_1)=\frac{1}{2}(a_1-y)^2. +\]
    +
    +
    +

    Layout of a simple neural network with no hidden layer#

    + + +

    Figure 1:

    +
    +
    +

    Optimizing the parameters#

    +

    In setting up the feed forward and back propagation parts of the +algorithm, we need now the derivative of the various variables we want +to train.

    +

    We need

    +
    +\[ +\frac{\partial C}{\partial w_1} \hspace{0.1cm}\mathrm{and}\hspace{0.1cm}\frac{\partial C}{\partial b_1}. +\]
    +

    Using the chain rule we find

    +
    +\[ +\frac{\partial C}{\partial w_1}=\frac{\partial C}{\partial a_1}\frac{\partial a_1}{\partial z_1}\frac{\partial z_1}{\partial w_1}=(a_1-y)\sigma_1'x, +\]
    +

    and

    +
    +\[ +\frac{\partial C}{\partial b_1}=\frac{\partial C}{\partial a_1}\frac{\partial a_1}{\partial z_1}\frac{\partial z_1}{\partial b_1}=(a_1-y)\sigma_1', +\]
    +

    which we later will just define as

    +
    +\[ +\frac{\partial C}{\partial a_1}\frac{\partial a_1}{\partial z_1}=\delta_1. +\]
    +
    +
    +

    Adding a hidden layer#

    +

    We change our simple model to (see graph) +a network with just one hidden layer but with scalar variables only.

    +

    Our output variable changes to \(a_2\) and \(a_1\) is now the output from the hidden node and \(a_0=x\). +We have then

    +
    +\[ +z_1 = w_1a_0+b_1 \hspace{0.1cm} \wedge a_1 = \sigma_1(z_1), +\]
    +
    +\[ +z_2 = w_2a_1+b_2 \hspace{0.1cm} \wedge a_2 = \sigma_2(z_2), +\]
    +

    and the cost function

    +
    +\[ +C(x;\boldsymbol{\Theta})=\frac{1}{2}(a_2-y)^2, +\]
    +

    with \(\boldsymbol{\Theta}=[w_1,w_2,b_1,b_2]\).

    +
    +
    +

    Layout of a simple neural network with one hidden layer#

    + + +

    Figure 1:

    +
    +
    +

    The derivatives#

    +

    The derivatives are now, using the chain rule again

    +
    +\[ +\frac{\partial C}{\partial w_2}=\frac{\partial C}{\partial a_2}\frac{\partial a_2}{\partial z_2}\frac{\partial z_2}{\partial w_2}=(a_2-y)\sigma_2'a_1=\delta_2a_1, +\]
    +
    +\[ +\frac{\partial C}{\partial b_2}=\frac{\partial C}{\partial a_2}\frac{\partial a_2}{\partial z_2}\frac{\partial z_2}{\partial b_2}=(a_2-y)\sigma_2'=\delta_2, +\]
    +
+\[ +\frac{\partial C}{\partial w_1}=\frac{\partial C}{\partial a_2}\frac{\partial a_2}{\partial z_2}\frac{\partial z_2}{\partial a_1}\frac{\partial a_1}{\partial z_1}\frac{\partial z_1}{\partial w_1}=(a_2-y)\sigma_2'w_2\sigma_1'a_0, +\]
    +
+\[ +\frac{\partial C}{\partial b_1}=\frac{\partial C}{\partial a_2}\frac{\partial a_2}{\partial z_2}\frac{\partial z_2}{\partial a_1}\frac{\partial a_1}{\partial z_1}\frac{\partial z_1}{\partial b_1}=(a_2-y)\sigma_2'w_2\sigma_1'=\delta_1. +\]
    +

    Can you generalize this to more than one hidden layer?

    +
    +
    +

    Important observations#

    +

From the above equations we see that the derivatives of the activation +functions play a central role. If they vanish, the training may +stop. This is called the vanishing gradient problem, see discussions below. If they become +large, the parameters \(w_i\) and \(b_i\) may simply go to infinity. This +is referred to as the exploding gradient problem.

    +
    +
    +

    The training#

    +

    The training of the parameters is done through various gradient descent approximations with

    +
    +\[ +w_{i}\leftarrow w_{i}- \eta \delta_i a_{i-1}, +\]
    +

    and

    +
    +\[ +b_i \leftarrow b_i-\eta \delta_i, +\]
    +

where \(\eta\) is the learning rate.

    +

    One iteration consists of one feed forward step and one back-propagation step. Each back-propagation step does one update of the parameters \(\boldsymbol{\Theta}\).

    +

    For the first hidden layer \(a_{i-1}=a_0=x\) for this simple model.

    +
    +
    +

    Code example#

    +

The code here implements the above model with one hidden layer and scalar variables for the same function we studied in the previous example. The code is, however, set up so that we can add multiple inputs \(x\) and target values \(y\). Note also that we have the possibility of defining a feature matrix \(\boldsymbol{X}\) with more than just one column for the input values. This will turn out to be useful in our next example. We have also defined matrices and vectors for all of our operations, although it is not necessary here.

    +
    +
    +
    import numpy as np
    +# We use the Sigmoid function as activation function
    +def sigmoid(z):
    +    return 1.0/(1.0+np.exp(-z))
    +
    +def forwardpropagation(x):
    +    # weighted sum of inputs to the hidden layer
    +    z_1 = np.matmul(x, w_1) + b_1
    +    # activation in the hidden layer
    +    a_1 = sigmoid(z_1)
    +    # weighted sum of inputs to the output layer
    +    z_2 = np.matmul(a_1, w_2) + b_2
    +    a_2 = z_2
    +    return a_1, a_2
    +
    +def backpropagation(x, y):
    +    a_1, a_2 = forwardpropagation(x)
    +    # parameter delta for the output layer, note that a_2=z_2 and its derivative wrt z_2 is just 1
    +    delta_2 = a_2 - y
    +    print(0.5*((a_2-y)**2))
    +    # delta for  the hidden layer
    +    delta_1 = np.matmul(delta_2, w_2.T) * a_1 * (1 - a_1)
    +    # gradients for the output layer
    +    output_weights_gradient = np.matmul(a_1.T, delta_2)
    +    output_bias_gradient = np.sum(delta_2, axis=0)
    +    # gradient for the hidden layer
    +    hidden_weights_gradient = np.matmul(x.T, delta_1)
    +    hidden_bias_gradient = np.sum(delta_1, axis=0)
    +    return output_weights_gradient, output_bias_gradient, hidden_weights_gradient, hidden_bias_gradient
    +
    +
    +# ensure the same random numbers appear every time
    +np.random.seed(0)
    +# Input variable
    +x = np.array([4.0],dtype=np.float64)
    +# Target values
    +y = 2*x+1.0 
    +
    +# Defining the neural network, only scalars here
    +n_inputs = x.shape
    +n_features = 1
    +n_hidden_neurons = 1
    +n_outputs = 1
    +
    +# Initialize the network
    +# weights and bias in the hidden layer
    +w_1 = np.random.randn(n_features, n_hidden_neurons)
    +b_1 = np.zeros(n_hidden_neurons) + 0.01
    +
    +# weights and bias in the output layer
    +w_2 = np.random.randn(n_hidden_neurons, n_outputs)
    +b_2 = np.zeros(n_outputs) + 0.01
    +
    +eta = 0.1
    +for i in range(50):
    +    # calculate gradients
    +    derivW2, derivB2, derivW1, derivB1 = backpropagation(x, y)
    +    # update weights and biases
    +    w_2 -= eta * derivW2
    +    b_2 -= eta * derivB2
    +    w_1 -= eta * derivW1
    +    b_1 -= eta * derivB1
    +
    +
    +
    +
    +

We see that after a few iterations (the results do, however, depend on the learning rate), we get an error which is rather small.

    +
    +
    +

    Simple neural network and the back propagation equations#

    +

Let us now try to increase our level of ambition and attempt to set up the equations for a neural network with two input nodes, one hidden layer with two hidden nodes and one output layer with only one output node/neuron (see graph).

    +

    We need to define the following parameters and variables with the input layer (layer \((0)\)) +where we label the nodes \(x_1\) and \(x_2\)

    +
    +\[ +x_1 = a_1^{(0)} \wedge x_2 = a_2^{(0)}. +\]
    +

The hidden layer (layer \((1)\)) has nodes which yield the outputs \(a_1^{(1)}\) and \(a_2^{(1)}\) with weight \(\boldsymbol{w}\) and bias \(\boldsymbol{b}\) parameters

    +
    +\[ +w_{ij}^{(1)}=\left\{w_{11}^{(1)},w_{12}^{(1)},w_{21}^{(1)},w_{22}^{(1)}\right\} \wedge b^{(1)}=\left\{b_1^{(1)},b_2^{(1)}\right\}. +\]
    +
    +
    +

Layout of a simple neural network with two input nodes, one hidden layer with two hidden nodes and one output node#

    + + +

    Figure 1:

    +
    +
    +

The output layer#

    +

We have the output layer given by layer label \((2)\) with output \(a^{(2)}\) and weights and biases to be determined given by the variables

    +
    +\[ +w_{i}^{(2)}=\left\{w_{1}^{(2)},w_{2}^{(2)}\right\} \wedge b^{(2)}. +\]
    +

    Our output is \(\tilde{y}=a^{(2)}\) and we define a generic cost function \(C(a^{(2)},y;\boldsymbol{\Theta})\) where \(y\) is the target value (a scalar here). +The parameters we need to optimize are given by

    +
    +\[ +\boldsymbol{\Theta}=\left\{w_{11}^{(1)},w_{12}^{(1)},w_{21}^{(1)},w_{22}^{(1)},w_{1}^{(2)},w_{2}^{(2)},b_1^{(1)},b_2^{(1)},b^{(2)}\right\}. +\]
    +
    +
    +

    Compact expressions#

    +

    We can define the inputs to the activation functions for the various layers in terms of various matrix-vector multiplications and vector additions. +The inputs to the first hidden layer are

    +
    +\[\begin{split} +\begin{bmatrix}z_1^{(1)} \\ z_2^{(1)} \end{bmatrix}=\left(\begin{bmatrix}w_{11}^{(1)} & w_{12}^{(1)}\\ w_{21}^{(1)} &w_{22}^{(1)} \end{bmatrix}\right)^{T}\begin{bmatrix}a_1^{(0)} \\ a_2^{(0)} \end{bmatrix}+\begin{bmatrix}b_1^{(1)} \\ b_2^{(1)} \end{bmatrix}, +\end{split}\]
    +

    with outputs

    +
    +\[\begin{split} +\begin{bmatrix}a_1^{(1)} \\ a_2^{(1)} \end{bmatrix}=\begin{bmatrix}\sigma^{(1)}(z_1^{(1)}) \\ \sigma^{(1)}(z_2^{(1)}) \end{bmatrix}. +\end{split}\]
    +
    +
    +

    Output layer#

    +

    For the final output layer we have the inputs to the final activation function

    +
    +\[ +z^{(2)} = w_{1}^{(2)}a_1^{(1)} +w_{2}^{(2)}a_2^{(1)}+b^{(2)}, +\]
    +

    resulting in the output

    +
    +\[ +a^{(2)}=\sigma^{(2)}(z^{(2)}). +\]
    +
    +
    +

    Explicit derivatives#

    +

    In total we have nine parameters which we need to train. Using the +chain rule (or just the back-propagation algorithm) we can find all +derivatives. Since we will use automatic differentiation in reverse +mode, we start with the derivatives of the cost function with respect +to the parameters of the output layer, namely

    +
    +\[ +\frac{\partial C}{\partial w_{i}^{(2)}}=\frac{\partial C}{\partial a^{(2)}}\frac{\partial a^{(2)}}{\partial z^{(2)}}\frac{\partial z^{(2)}}{\partial w_{i}^{(2)}}=\delta^{(2)}a_i^{(1)}, +\]
    +

    with

    +
    +\[ +\delta^{(2)}=\frac{\partial C}{\partial a^{(2)}}\frac{\partial a^{(2)}}{\partial z^{(2)}} +\]
    +

    and finally

    +
    +\[ +\frac{\partial C}{\partial b^{(2)}}=\frac{\partial C}{\partial a^{(2)}}\frac{\partial a^{(2)}}{\partial z^{(2)}}\frac{\partial z^{(2)}}{\partial b^{(2)}}=\delta^{(2)}. +\]
    +
    +
    +

    Derivatives of the hidden layer#

    +

    Using the chain rule we have the following expressions for say one of the weight parameters (it is easy to generalize to the other weight parameters)

    +
    +\[ +\frac{\partial C}{\partial w_{11}^{(1)}}=\frac{\partial C}{\partial a^{(2)}}\frac{\partial a^{(2)}}{\partial z^{(2)}} +\frac{\partial z^{(2)}}{\partial z_1^{(1)}}\frac{\partial z_1^{(1)}}{\partial w_{11}^{(1)}}= \delta^{(2)}\frac{\partial z^{(2)}}{\partial z_1^{(1)}}\frac{\partial z_1^{(1)}}{\partial w_{11}^{(1)}}, +\]
    +

    which, noting that

    +
    +\[ +z^{(2)} =w_1^{(2)}a_1^{(1)}+w_2^{(2)}a_2^{(1)}+b^{(2)}, +\]
    +

    allows us to rewrite

    +
+\[ +\frac{\partial z^{(2)}}{\partial z_1^{(1)}}\frac{\partial z_1^{(1)}}{\partial w_{11}^{(1)}}=w_1^{(2)}\frac{\partial a_1^{(1)}}{\partial z_1^{(1)}}a_1^{(0)}. +\]
    +
    +
    +

    Final expression#

    +

    Defining

    +
    +\[ +\delta_1^{(1)}=w_1^{(2)}\frac{\partial a_1^{(1)}}{\partial z_1^{(1)}}\delta^{(2)}, +\]
    +

    we have

    +
+\[ +\frac{\partial C}{\partial w_{11}^{(1)}}=\delta_1^{(1)}a_1^{(0)}. +\]
    +

    Similarly, we obtain

    +
+\[ +\frac{\partial C}{\partial w_{12}^{(1)}}=\delta_1^{(1)}a_2^{(0)}. +\]
    +
    +
    +

    Completing the list#

    +

    Similarly, we find

    +
+\[ +\frac{\partial C}{\partial w_{21}^{(1)}}=\delta_2^{(1)}a_1^{(0)}, +\]
    +

    and

    +
+\[ +\frac{\partial C}{\partial w_{22}^{(1)}}=\delta_2^{(1)}a_2^{(0)}, +\]
    +

    where we have defined

    +
    +\[ +\delta_2^{(1)}=w_2^{(2)}\frac{\partial a_2^{(1)}}{\partial z_2^{(1)}}\delta^{(2)}. +\]
    +
    +
    +

    Final expressions for the biases of the hidden layer#

    +

    For the sake of completeness, we list the derivatives of the biases, which are

    +
    +\[ +\frac{\partial C}{\partial b_{1}^{(1)}}=\delta_1^{(1)}, +\]
    +

    and

    +
    +\[ +\frac{\partial C}{\partial b_{2}^{(1)}}=\delta_2^{(1)}. +\]
    +

    As we will see below, these expressions can be generalized in a more compact form.

    +
    +
    +

    Gradient expressions#

    +

For this specific model, with just one output node and two hidden +nodes, the gradient descent equations take the following form for the output layer

    +
    +\[ +w_{i}^{(2)}\leftarrow w_{i}^{(2)}- \eta \delta^{(2)} a_{i}^{(1)}, +\]
    +

    and

    +
    +\[ +b^{(2)} \leftarrow b^{(2)}-\eta \delta^{(2)}, +\]
    +

    and

    +
    +\[ +w_{ij}^{(1)}\leftarrow w_{ij}^{(1)}- \eta \delta_{i}^{(1)} a_{j}^{(0)}, +\]
    +

    and

    +
    +\[ +b_{i}^{(1)} \leftarrow b_{i}^{(1)}-\eta \delta_{i}^{(1)}, +\]
    +

    where \(\eta\) is the learning rate.

    +
    +
    +

    Setting up the equations for a neural network#

    +

    The questions we want to ask are how do changes in the biases and the +weights in our network change the cost function and how can we use the +final output to modify the weights and biases?

    +

    To derive these equations let us start with a plain regression problem +and define our cost function as

    +
    +\[ +{\cal C}(\boldsymbol{\Theta}) = \frac{1}{2}\sum_{i=1}^n\left(y_i - \tilde{y}_i\right)^2, +\]
    +

    where the \(y_i\)s are our \(n\) targets (the values we want to +reproduce), while the outputs of the network after having propagated +all inputs \(\boldsymbol{x}\) are given by \(\boldsymbol{\tilde{y}}_i\).

    +
    +
    +

    Layout of a neural network with three hidden layers (last layer = \(l=L=4\), first layer \(l=0\))#

    + + +

    Figure 1:

    +
    +
    +

    Definitions#

    +

    With our definition of the targets \(\boldsymbol{y}\), the outputs of the +network \(\boldsymbol{\tilde{y}}\) and the inputs \(\boldsymbol{x}\) we +define now the activation \(z_j^l\) of node/neuron/unit \(j\) of the +\(l\)-th layer as a function of the bias, the weights which add up from +the previous layer \(l-1\) and the forward passes/outputs +\(\boldsymbol{a}^{l-1}\) from the previous layer as

    +
    +\[ +z_j^l = \sum_{i=1}^{M_{l-1}}w_{ij}^la_i^{l-1}+b_j^l, +\]
    +

    where \(b_k^l\) are the biases from layer \(l\). Here \(M_{l-1}\) +represents the total number of nodes/neurons/units of layer \(l-1\). The +figure in the whiteboard notes illustrates this equation. We can rewrite this in a more +compact form as the matrix-vector products we discussed earlier,

    +
    +\[ +\boldsymbol{z}^l = \left(\boldsymbol{W}^l\right)^T\boldsymbol{a}^{l-1}+\boldsymbol{b}^l. +\]
    +
    +
    +

    Inputs to the activation function#

    +

    With the activation values \(\boldsymbol{z}^l\) we can in turn define the +output of layer \(l\) as \(\boldsymbol{a}^l = \sigma(\boldsymbol{z}^l)\) where \(\sigma\) is our +activation function. In the examples here we will use the sigmoid +function discussed in our logistic regression lectures. We will also use the same activation function \(\sigma\) for all layers +and their nodes. It means we have

    +
    +\[ +a_j^l = \sigma(z_j^l) = \frac{1}{1+\exp{-(z_j^l)}}. +\]
    +
    +
    +

    Layout of input to first hidden layer \(l=1\) from input layer \(l=0\)#

    + + +

    Figure 1:

    +
    +
    +

    Derivatives and the chain rule#

    +

    From the definition of the input variable to the activation function, that is \(z_j^l\) we have

    +
    +\[ +\frac{\partial z_j^l}{\partial w_{ij}^l} = a_i^{l-1}, +\]
    +

    and

    +
    +\[ +\frac{\partial z_j^l}{\partial a_i^{l-1}} = w_{ji}^l. +\]
    +

    With our definition of the activation function we have that (note that this function depends only on \(z_j^l\))

    +
    +\[ +\frac{\partial a_j^l}{\partial z_j^{l}} = a_j^l(1-a_j^l)=\sigma(z_j^l)(1-\sigma(z_j^l)). +\]
    +
    +
    +

    Derivative of the cost function#

    +

    With these definitions we can now compute the derivative of the cost function in terms of the weights.

    +

    Let us specialize to the output layer \(l=L\). Our cost function is

    +
    +\[ +{\cal C}(\boldsymbol{\Theta}^L) = \frac{1}{2}\sum_{i=1}^n\left(y_i - \tilde{y}_i\right)^2=\frac{1}{2}\sum_{i=1}^n\left(a_i^L - y_i\right)^2, +\]
    +

    The derivative of this function with respect to the weights is

    +
    +\[ +\frac{\partial{\cal C}(\boldsymbol{\Theta}^L)}{\partial w_{ij}^L} = \left(a_j^L - y_j\right)\frac{\partial a_j^L}{\partial w_{ij}^{L}}, +\]
    +

    The last partial derivative can easily be computed and reads (by applying the chain rule)

    +
    +\[ +\frac{\partial a_j^L}{\partial w_{ij}^{L}} = \frac{\partial a_j^L}{\partial z_{j}^{L}}\frac{\partial z_j^L}{\partial w_{ij}^{L}}=a_j^L(1-a_j^L)a_i^{L-1}. +\]
    +
    +
    +

    The back propagation equations for a neural network#

    +

    We have thus

    +
+\[ +\frac{\partial{\cal C}(\boldsymbol{\Theta}^L)}{\partial w_{ij}^L} = \left(a_j^L - y_j\right)a_j^L(1-a_j^L)a_i^{L-1}, +\]
    +

    Defining

    +
    +\[ +\delta_j^L = a_j^L(1-a_j^L)\left(a_j^L - y_j\right) = \sigma'(z_j^L)\frac{\partial {\cal C}}{\partial (a_j^L)}, +\]
    +

    and using the Hadamard product of two vectors we can write this as

    +
    +\[ +\boldsymbol{\delta}^L = \sigma'(\boldsymbol{z}^L)\circ\frac{\partial {\cal C}}{\partial (\boldsymbol{a}^L)}. +\]
    +
    +
    +

    Analyzing the last results#

    +

This is an important expression. The second term on the right-hand side measures how fast the cost function is changing as a function of the \(j\)th output activation. If, for example, the cost function doesn’t depend much on a particular output node \(j\), then \(\delta_j^L\) will be small, which is what we would expect. The first term on the right measures how fast the activation function \(\sigma\) is changing at a given activation value \(z_j^L\).

    +
    +
    +

    More considerations#

    +

    Notice that everything in the above equations is easily computed. In +particular, we compute \(z_j^L\) while computing the behaviour of the +network, and it is only a small additional overhead to compute +\(\sigma'(z^L_j)\). The exact form of the derivative with respect to the +output depends on the form of the cost function. +However, provided the cost function is known there should be little +trouble in calculating

    +
    +\[ +\frac{\partial {\cal C}}{\partial (a_j^L)} +\]
    +

    With the definition of \(\delta_j^L\) we have a more compact definition of the derivative of the cost function in terms of the weights, namely

    +
    +\[ +\frac{\partial{\cal C}}{\partial w_{ij}^L} = \delta_j^La_i^{L-1}. +\]
    +
    +
    +

    Derivatives in terms of \(z_j^L\)#

    +

    It is also easy to see that our previous equation can be written as

    +
    +\[ +\delta_j^L =\frac{\partial {\cal C}}{\partial z_j^L}= \frac{\partial {\cal C}}{\partial a_j^L}\frac{\partial a_j^L}{\partial z_j^L}, +\]
    +

which is also the partial derivative of the cost function with respect to the bias \(b_j^L\), since

\[ \frac{\partial {\cal C}}{\partial b_j^L} = \frac{\partial {\cal C}}{\partial z_j^L}\frac{\partial z_j^L}{\partial b_j^L}=\frac{\partial {\cal C}}{\partial z_j^L}=\delta_j^L, \]

where we used that \(\partial z_j^L/\partial b_j^L=1\).
    +

    That is, the error \(\delta_j^L\) is exactly equal to the rate of change of the cost function as a function of the bias.

    +
    +
    +

    Bringing it together#

    +

    We have now three equations that are essential for the computations of the derivatives of the cost function at the output layer. These equations are needed to start the algorithm and they are

    + +
    +
    +\[ +\begin{equation} +\frac{\partial{\cal C}(\boldsymbol{W^L})}{\partial w_{ij}^L} = \delta_j^La_i^{L-1}, +\label{_auto1} \tag{1} +\end{equation} +\]
    +

    and

    + +
    +
    +\[ +\begin{equation} +\delta_j^L = \sigma'(z_j^L)\frac{\partial {\cal C}}{\partial (a_j^L)}, +\label{_auto2} \tag{2} +\end{equation} +\]
    +

    and

    + +
    +
    +\[ +\begin{equation} +\delta_j^L = \frac{\partial {\cal C}}{\partial b_j^L}, +\label{_auto3} \tag{3} +\end{equation} +\]
    +
    +
    +
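As an illustration of equations (1)-(3), here is a minimal numpy sketch for a sigmoid output layer and the quadratic cost above; the array names, shapes and values are chosen only for this example.

import numpy as np

rng = np.random.default_rng(2025)
a_prev = rng.random((10, 4))          # a^{L-1}: 10 samples, 4 nodes in layer L-1
W_L = rng.normal(size=(4, 3))         # weights w_{ij}^L, shape (M_{L-1}, M_L)
b_L = np.zeros(3)                     # biases b_j^L
y = rng.random((10, 3))               # targets

z_L = a_prev @ W_L + b_L
a_L = 1.0 / (1.0 + np.exp(-z_L))      # sigmoid output a^L

# Eq. (2): delta_j^L = sigma'(z_j^L) * dC/da_j^L, with dC/da^L = a^L - y for the quadratic cost
delta_L = a_L * (1.0 - a_L) * (a_L - y)

# Eq. (1): dC/dw_{ij}^L = delta_j^L a_i^{L-1} (summed over the samples)
dW_L = a_prev.T @ delta_L
# Eq. (3): dC/db_j^L = delta_j^L (summed over the samples)
db_L = delta_L.sum(axis=0)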

    Final back propagating equation#

    +

    We have that (replacing \(L\) with a general layer \(l\))

    +
    +\[ +\delta_j^l =\frac{\partial {\cal C}}{\partial z_j^l}. +\]
    +

    We want to express this in terms of the equations for layer \(l+1\).

    +
    +
    +

    Using the chain rule and summing over all \(k\) entries#

    +

    We obtain

    +
    +\[ +\delta_j^l =\sum_k \frac{\partial {\cal C}}{\partial z_k^{l+1}}\frac{\partial z_k^{l+1}}{\partial z_j^{l}}=\sum_k \delta_k^{l+1}\frac{\partial z_k^{l+1}}{\partial z_j^{l}}, +\]
    +

    and recalling that

    +
    +\[ +z_j^{l+1} = \sum_{i=1}^{M_{l}}w_{ij}^{l+1}a_i^{l}+b_j^{l+1}, +\]
    +

    with \(M_l\) being the number of nodes in layer \(l\), we obtain

    +
\[ \delta_j^l =\sum_k \delta_k^{l+1}w_{jk}^{l+1}\sigma'(z_j^l). \]
    +

    This is our final equation.

    +

    We are now ready to set up the algorithm for back propagation and learning the weights and biases.

    +
    +
    +
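In vectorized form, with the weight matrix \(\boldsymbol{W}^{l+1}\) of shape \((M_l, M_{l+1})\) as above, this recursion amounts to one matrix product and one Hadamard product per layer. A minimal, self-contained sketch (array values are arbitrary):

import numpy as np

rng = np.random.default_rng(0)
delta_next = rng.random((10, 3))      # delta^{l+1}: 10 samples, M_{l+1} = 3
W_next = rng.normal(size=(4, 3))      # w_{jk}^{l+1}, shape (M_l, M_{l+1})
a_l = rng.random((10, 4))             # a^l = sigma(z^l) for a sigmoid activation

# delta_j^l = sum_k delta_k^{l+1} w_{jk}^{l+1} sigma'(z_j^l)
delta_l = (delta_next @ W_next.T) * a_l * (1.0 - a_l)   # shape (10, M_l)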

    Setting up the back propagation algorithm and algorithm for a feed forward NN, initalizations#

    +

    The architecture (our model).

    +
      +
1. Set up your inputs and outputs (scalars, vectors, matrices or higher-order arrays)

2. Define the number of hidden layers and hidden nodes

3. Define activation functions for hidden layers and output layers

4. Define the optimizer (plain learning rate, momentum, ADAgrad, RMSprop, ADAM etc.) and an array of initial learning rates

5. Define the cost function and possible regularization terms with hyperparameters

6. Initialize weights and biases

7. Fix the number of iterations for the feed-forward and back-propagation parts (a minimal configuration sketch of these choices is given below)
    +
    +
    +
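The configuration sketch below illustrates the choices in the list; all names and values are examples only, not a prescribed interface.

# illustrative architecture and hyperparameter choices (example values only)
config = {
    "n_inputs": 64,                 # e.g. 8x8 pixel images
    "n_outputs": 10,                # e.g. 10 classes
    "hidden_layers": [50, 50],      # number of nodes per hidden layer
    "hidden_activation": "sigmoid",
    "output_activation": "softmax",
    "optimizer": "adam",            # plain GD, momentum, ADAgrad, RMSprop, ADAM, ...
    "learning_rates": [1e-3, 1e-2, 1e-1],
    "cost": "cross_entropy",
    "l2_lambda": 1e-4,              # regularization hyperparameter
    "weight_init": "small_normal",  # small random weights, biases set to 0.01
    "n_epochs": 100,
    "batch_size": 100,
}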

    Setting up the back propagation algorithm, part 1#

    +

    The four equations provide us with a way of computing the gradients of the cost function. Let us write this out in the form of an algorithm.

    +

First, we set up the input data \(\boldsymbol{x}\) and the activations \(\boldsymbol{z}^1\) of the first hidden layer and compute the activation function and the pertinent outputs \(\boldsymbol{a}^1\).

    +

Secondly, we then perform the feed-forward pass until we reach the output layer, computing all \(\boldsymbol{z}^l\) and the pertinent outputs \(\boldsymbol{a}^l\) of the activation function for \(l=2,3,\dots,L\).

    +

    Notation: The first hidden layer has \(l=1\) as label and the final output layer has \(l=L\).

    +
    +
    +

    Setting up the back propagation algorithm, part 2#

    +

Thereafter we compute the output error \(\boldsymbol{\delta}^L\) by computing all

    +
    +\[ +\delta_j^L = \sigma'(z_j^L)\frac{\partial {\cal C}}{\partial (a_j^L)}. +\]
    +

Then we compute the back-propagated error for each \(l=L-1,L-2,\dots,1\) as

\[ \delta_j^l = \sum_k \delta_k^{l+1}w_{jk}^{l+1}\sigma'(z_j^l). \]
    +
    +
    +

    Setting up the Back propagation algorithm, part 3#

    +

Finally, we update the weights and the biases using gradient descent for each \(l=L,L-1,\dots,1\), where \(l=1\) is the first hidden layer, according to the rules

\[ w_{ij}^l\leftarrow w_{ij}^l- \eta \delta_j^la_i^{l-1}, \]

\[ b_j^l \leftarrow b_j^l-\eta \frac{\partial {\cal C}}{\partial b_j^l}=b_j^l-\eta \delta_j^l, \]
    +

    with \(\eta\) being the learning rate.

    +
    +
    +

    Updating the gradients#

    +

With the back-propagated error for each \(l=L-1,L-2,\dots,1\) given as

\[ \delta_j^l = \sum_k \delta_k^{l+1}w_{jk}^{l+1}\sigma'(z_j^l), \]

we update the weights and the biases using gradient descent for each \(l=L,L-1,\dots,1\) according to the rules

\[ w_{ij}^l\leftarrow w_{ij}^l- \eta \delta_j^la_i^{l-1}, \]

\[ b_j^l \leftarrow b_j^l-\eta \frac{\partial {\cal C}}{\partial b_j^l}=b_j^l-\eta \delta_j^l. \]
    +
    +
    +
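Put together, one gradient-descent update of all layers can be sketched as below. This is only a compact illustration assuming sigmoid activations in every layer and the quadratic cost; the full object-oriented implementation for the MNIST example follows later in these notes.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_step(x, y, weights, biases, eta):
    """One update for a fully connected sigmoid network with quadratic cost.
    weights[l] has shape (M_{l-1}, M_l); x and y hold one row per sample."""
    # feed forward, storing the activations a^l of every layer (a^0 = x)
    activations = [x]
    for W, b in zip(weights, biases):
        activations.append(sigmoid(activations[-1] @ W + b))

    # output error: delta^L = sigma'(z^L) * (a^L - y)
    a_L = activations[-1]
    delta = a_L * (1.0 - a_L) * (a_L - y)

    # go backwards through the layers
    for l in range(len(weights) - 1, -1, -1):
        grad_W = activations[l].T @ delta       # dC/dW^l = (a^{l-1})^T delta^l
        grad_b = delta.sum(axis=0)              # dC/db^l = delta^l
        if l > 0:
            a_prev = activations[l]
            # back-propagate the error to the previous layer (using the not-yet-updated weights)
            delta = (delta @ weights[l].T) * a_prev * (1.0 - a_prev)
        weights[l] -= eta * grad_W              # w <- w - eta * delta * a^{l-1}
        biases[l]  -= eta * grad_b              # b <- b - eta * delta

# tiny usage example with one hidden layer
rng = np.random.default_rng(1)
weights = [rng.normal(size=(2, 3)), rng.normal(size=(3, 1))]
biases = [np.zeros(3), np.zeros(1)]
x, y = rng.random((5, 2)), rng.random((5, 1))
backprop_step(x, y, weights, biases, eta=0.1)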

    Activation functions#

    +

    A property that characterizes a neural network, other than its +connectivity, is the choice of activation function(s). The following +restrictions are imposed on an activation function for an FFNN to +fulfill the universal approximation theorem

    +
      +
• Non-constant

• Bounded

• Monotonically increasing

• Continuous
    +
    +

    Activation functions, Logistic and Hyperbolic ones#

    +

The second requirement excludes all linear functions. Furthermore, in an MLP with only linear activation functions, each layer simply performs a linear transformation of its inputs.

    +

Regardless of the number of layers, the output of the NN will be nothing but a linear function of the inputs. Thus we need to introduce some kind of non-linearity to the NN to be able to fit non-linear functions. Typical examples are the logistic sigmoid

    +
    +\[ +\sigma(x) = \frac{1}{1 + e^{-x}}, +\]
    +

    and the hyperbolic tangent function

    +
    +\[ +\sigma(x) = \tanh(x) +\]
    +
    +
    +
    +

    Relevance#

    +

Sigmoid-type functions are considered more biologically plausible because the output of inactive neurons is zero. Such activation functions are called one-sided. However, it has been shown that the hyperbolic tangent performs better than the sigmoid for training MLPs, and the ReLU family discussed below has by now become the most popular choice for deep neural networks.

    +
    +
    +
    %matplotlib inline
    +
    +"""The sigmoid function (or the logistic curve) is a 
    +function that takes any real number, z, and outputs a number (0,1).
    +It is useful in neural networks for assigning weights on a relative scale.
    +The value z is the weighted sum of parameters involved in the learning algorithm."""
    +
    +import numpy
    +import matplotlib.pyplot as plt
    +import math as mt
    +
    +z = numpy.arange(-5, 5, .1)
    +sigma_fn = numpy.vectorize(lambda z: 1/(1+numpy.exp(-z)))
    +sigma = sigma_fn(z)
    +
    +fig = plt.figure()
    +ax = fig.add_subplot(111)
    +ax.plot(z, sigma)
    +ax.set_ylim([-0.1, 1.1])
    +ax.set_xlim([-5,5])
    +ax.grid(True)
    +ax.set_xlabel('z')
    +ax.set_title('sigmoid function')
    +
    +plt.show()
    +
    +"""Step Function"""
    +z = numpy.arange(-5, 5, .02)
    +step_fn = numpy.vectorize(lambda z: 1.0 if z >= 0.0 else 0.0)
    +step = step_fn(z)
    +
    +fig = plt.figure()
    +ax = fig.add_subplot(111)
    +ax.plot(z, step)
    +ax.set_ylim([-0.5, 1.5])
    +ax.set_xlim([-5,5])
    +ax.grid(True)
    +ax.set_xlabel('z')
    +ax.set_title('step function')
    +
    +plt.show()
    +
    +"""Sine Function"""
    +z = numpy.arange(-2*mt.pi, 2*mt.pi, 0.1)
    +t = numpy.sin(z)
    +
    +fig = plt.figure()
    +ax = fig.add_subplot(111)
    +ax.plot(z, t)
    +ax.set_ylim([-1.0, 1.0])
    +ax.set_xlim([-2*mt.pi,2*mt.pi])
    +ax.grid(True)
    +ax.set_xlabel('z')
    +ax.set_title('sine function')
    +
    +plt.show()
    +
    +"""Plots a graph of the squashing function used by a rectified linear
    +unit"""
    +z = numpy.arange(-2, 2, .1)
    +zero = numpy.zeros(len(z))
    +y = numpy.max([zero, z], axis=0)
    +
    +fig = plt.figure()
    +ax = fig.add_subplot(111)
    +ax.plot(z, y)
    +ax.set_ylim([-2.0, 2.0])
    +ax.set_xlim([-2.0, 2.0])
    +ax.grid(True)
    +ax.set_xlabel('z')
    +ax.set_title('Rectified linear unit')
    +
    +plt.show()
    +
    +
    +
    +
    +
    +
    +

    Vanishing gradients#

    +

    The Back propagation algorithm we derived above works by going from +the output layer to the input layer, propagating the error gradient on +the way. Once the algorithm has computed the gradient of the cost +function with regards to each parameter in the network, it uses these +gradients to update each parameter with a Gradient Descent (GD) step.

    +

    Unfortunately for us, the gradients often get smaller and smaller as +the algorithm progresses down to the first hidden layers. As a result, +the GD update leaves the lower layer connection weights virtually +unchanged, and training never converges to a good solution. This is +known in the literature as the vanishing gradients problem.

    +
    +
    +

    Exploding gradients#

    +

In other cases the opposite can happen, namely that the gradients grow bigger and bigger. The result is that many of the layers get very large weight updates and the algorithm diverges. This is the exploding gradients problem, which is mostly encountered in recurrent neural networks. More generally, deep neural networks suffer from unstable gradients: different layers may learn at widely different speeds.

    +
    +
    +

    Is the Logistic activation function (Sigmoid) our choice?#

    +

    Although this unfortunate behavior has been empirically observed for +quite a while (it was one of the reasons why deep neural networks were +mostly abandoned for a long time), it is only around 2010 that +significant progress was made in understanding it.

    +

A paper titled Understanding the Difficulty of Training Deep Feedforward Neural Networks by Xavier Glorot and Yoshua Bengio found that the problems were in large part due to the popular logistic sigmoid activation function and the weight initialization technique that was most common at the time, namely random initialization using a normal distribution with a mean of 0 and a standard deviation of 1.

    +
    +
    +

    Logistic function as the root of problems#

    +

    They showed that with this activation function and this +initialization scheme, the variance of the outputs of each layer is +much greater than the variance of its inputs. Going forward in the +network, the variance keeps increasing after each layer until the +activation function saturates at the top layers. This is actually made +worse by the fact that the logistic function has a mean of 0.5, not 0 +(the hyperbolic tangent function has a mean of 0 and behaves slightly +better than the logistic function in deep networks).

    +
    +
    +

The derivative of the Logistic function#

    +

    Looking at the logistic activation function, when inputs become large +(negative or positive), the function saturates at 0 or 1, with a +derivative extremely close to 0. Thus when backpropagation kicks in, +it has virtually no gradient to propagate back through the network, +and what little gradient exists keeps getting diluted as +backpropagation progresses down through the top layers, so there is +really nothing left for the lower layers.

    +

    In their paper, Glorot and Bengio propose a way to significantly +alleviate this problem. We need the signal to flow properly in both +directions: in the forward direction when making predictions, and in +the reverse direction when backpropagating gradients. We don’t want +the signal to die out, nor do we want it to explode and saturate. For +the signal to flow properly, the authors argue that we need the +variance of the outputs of each layer to be equal to the variance of +its inputs, and we also need the gradients to have equal variance +before and after flowing through a layer in the reverse direction.

    +
    +
    +

    Insights from the paper by Glorot and Bengio#

    +

    One of the insights in the 2010 paper by Glorot and Bengio was that +the vanishing/exploding gradients problems were in part due to a poor +choice of activation function. Until then most people had assumed that +if Nature had chosen to use roughly sigmoid activation functions in +biological neurons, they must be an excellent choice. But it turns out +that other activation functions behave much better in deep neural +networks, in particular the ReLU activation function, mostly because +it does not saturate for positive values (and also because it is quite +fast to compute).

    +
    +
    +

    The RELU function family#

    +

    The ReLU activation function suffers from a problem known as the dying +ReLUs: during training, some neurons effectively die, meaning they +stop outputting anything other than 0.

    +

In some cases, you may find that half of your network’s neurons are dead, especially if you used a large learning rate. During training, if a neuron’s weights get updated such that the weighted sum of the neuron’s inputs is negative, it will start outputting 0. When this happens, the neuron is unlikely to come back to life since the gradient of the ReLU function is 0 when its input is negative.

    +
    +
    +

    ELU function#

    +

    To solve this problem, nowadays practitioners use a variant of the +ReLU function, such as the leaky ReLU discussed above or the so-called +exponential linear unit (ELU) function

    +
    +\[\begin{split} +ELU(z) = \left\{\begin{array}{cc} \alpha\left( \exp{(z)}-1\right) & z < 0,\\ z & z \ge 0.\end{array}\right. +\end{split}\]
    +
    +
    +
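A minimal numpy sketch of the ELU above, together with the leaky ReLU for comparison; the \(\alpha\) values are just the common defaults mentioned below.

import numpy as np

def elu(z, alpha=1.0):
    # ELU(z) = alpha*(exp(z) - 1) for z < 0, and z otherwise
    return np.where(z < 0, alpha * (np.exp(z) - 1.0), z)

def leaky_relu(z, alpha=0.01):
    # leaky ReLU: a small non-zero slope alpha for negative inputs
    return np.where(z < 0, alpha * z, z)

z = np.linspace(-3, 3, 7)
print(elu(z))
print(leaky_relu(z))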

    Which activation function should we use?#

    +

    In general it seems that the ELU activation function is better than +the leaky ReLU function (and its variants), which is better than +ReLU. ReLU performs better than \(\tanh\) which in turn performs better +than the logistic function.

    +

If runtime performance is an issue, then you may opt for the leaky ReLU function over the ELU function. If you don’t want to tweak yet another hyperparameter, you may just use the default \(\alpha\) of \(0.01\) for the leaky ReLU, and \(1\) for ELU. If you have spare time and computing power, you can use cross-validation or the bootstrap to evaluate other activation functions.

    +
    +
    +

    More on activation functions, output layers#

    +

    In most cases you can use the ReLU activation function in the hidden +layers (or one of its variants).

    +

    It is a bit faster to compute than other activation functions, and the +gradient descent optimization does in general not get stuck.

    +

    For the output layer:

    +
      +
• For classification tasks the softmax activation function is generally a good choice (when the classes are mutually exclusive).

• For regression tasks, you can simply use no activation function at all.
    +
    +
    +

    Fine-tuning neural network hyperparameters#

    +

The flexibility of neural networks is also one of their main drawbacks: there are many hyperparameters to tweak. Not only can you use any imaginable network topology (how neurons/nodes are interconnected), but even in a simple FFNN you can change the number of layers, the number of neurons per layer, the type of activation function to use in each layer, the weight initialization logic, the stochastic gradient optimizer and much more. How do you know what combination of hyperparameters is the best for your task?

    +
      +
    • You can use grid search with cross-validation to find the right hyperparameters.

    +

However, since there are many hyperparameters to tune, and since training a neural network on a large dataset takes a lot of time, you will only be able to explore a tiny part of the hyperparameter space.

    +
      +
• You can use randomized search.

• Or use tools like Oscar, which implements more complex algorithms to help you find a good set of hyperparameters quickly.
    +
    +
    +

    Hidden layers#

    +

For many problems you can start with just one or two hidden layers and it will work just fine. For the MNIST data set discussed below you can easily get high accuracy using just one hidden layer with a few hundred neurons. With two hidden layers and the same total number of neurons you can reach above 98% accuracy on this data set, in roughly the same amount of training time.

    +

    For more complex problems, you can gradually ramp up the number of +hidden layers, until you start overfitting the training set. Very +complex tasks, such as large image classification or speech +recognition, typically require networks with dozens of layers and they +need a huge amount of training data. However, you will rarely have to +train such networks from scratch: it is much more common to reuse +parts of a pretrained state-of-the-art network that performs a similar +task.

    +
    +
    +

    Batch Normalization#

    +

    Batch Normalization aims to address the vanishing/exploding gradients +problems, and more generally the problem that the distribution of each +layer’s inputs changes during training, as the parameters of the +previous layers change.

    +

The technique consists of adding an operation in the model just before the activation function of each layer, simply zero-centering and normalizing the inputs, then scaling and shifting the result using two new parameters per layer (one for scaling, the other for shifting). In other words, this operation lets the model learn the optimal scale and mean of the inputs for each layer. In order to zero-center and normalize the inputs, the algorithm needs to estimate the inputs’ mean and standard deviation. It does so by evaluating the mean and standard deviation of the inputs over the current mini-batch, hence the name batch normalization.

    +
    +
    +
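A schematic numpy sketch of the operation described above (training-time statistics only; a complete implementation would also keep running averages of the mean and variance for use at test time, and treat \(\gamma\) and \(\beta\) as trainable parameters):

import numpy as np

def batch_norm_forward(z, gamma, beta, eps=1e-8):
    """Zero-center and normalize a mini-batch z of shape (batch_size, n_features),
    then scale by gamma and shift by beta (one value per feature)."""
    mean = z.mean(axis=0)
    var = z.var(axis=0)
    z_hat = (z - mean) / np.sqrt(var + eps)
    return gamma * z_hat + beta

z = np.random.randn(100, 50) * 3.0 + 2.0   # a mini-batch with non-zero mean and large variance
gamma, beta = np.ones(50), np.zeros(50)    # learnable scale and shift
out = batch_norm_forward(z, gamma, beta)
print(out.mean(axis=0)[:3], out.std(axis=0)[:3])   # approximately 0 and 1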

    Dropout#

    +

    It is a fairly simple algorithm: at every training step, every neuron +(including the input neurons but excluding the output neurons) has a +probability \(p\) of being temporarily dropped out, meaning it will be +entirely ignored during this training step, but it may be active +during the next step.

    +

    The hyperparameter \(p\) is called the dropout rate, and it is typically +set to 50%. After training, the neurons are not dropped anymore. It +is viewed as one of the most popular regularization techniques.

    +
    +
    +
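A minimal sketch of (inverted) dropout as described above; the rescaling by \(1/(1-p)\) keeps the expected activation unchanged, so nothing needs to be rescaled at test time.

import numpy as np

def dropout(a, p=0.5, training=True):
    """Zero each activation with probability p during training."""
    if not training:
        return a
    mask = (np.random.rand(*a.shape) >= p).astype(a.dtype)
    return a * mask / (1.0 - p)    # inverted dropout: rescale the surviving activations

a_h = np.random.rand(4, 10)        # activations of some hidden layer
print(dropout(a_h, p=0.5))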

    Gradient Clipping#

    +

    A popular technique to lessen the exploding gradients problem is to +simply clip the gradients during backpropagation so that they never +exceed some threshold (this is mostly useful for recurrent neural +networks).

    +

    This technique is called Gradient Clipping.

    +

    In general however, Batch +Normalization is preferred.

    +
    +
    +
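A sketch of clipping a gradient by its norm before the update; the threshold is an arbitrary example value.

import numpy as np

def clip_by_norm(grad, threshold=1.0):
    """Rescale the gradient if its L2 norm exceeds the threshold."""
    norm = np.linalg.norm(grad)
    if norm > threshold:
        grad = grad * (threshold / norm)
    return grad

g = 100.0 * np.random.randn(50, 10)          # an "exploding" gradient
print(np.linalg.norm(clip_by_norm(g)))       # at most 1.0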

    A top-down perspective on Neural networks#

    +

    The first thing we would like to do is divide the data into two or +three parts. A training set, a validation or dev (development) set, +and a test set. The test set is the data on which we want to make +predictions. The dev set is a subset of the training data we use to +check how well we are doing out-of-sample, after training the model on +the training dataset. We use the validation error as a proxy for the +test error in order to make tweaks to our model. It is crucial that we +do not use any of the test data to train the algorithm. This is a +cardinal sin in ML. Then:

    +
      +
1. Estimate the optimal error rate

2. Minimize underfitting (bias) on the training data set.

3. Make sure you are not overfitting.
    +
    +
    +

    More top-down perspectives#

    +

    If the validation and test sets are drawn from the same distributions, +then a good performance on the validation set should lead to similarly +good performance on the test set.

    +

    However, sometimes +the training data and test data differ in subtle ways because, for +example, they are collected using slightly different methods, or +because it is cheaper to collect data in one way versus another. In +this case, there can be a mismatch between the training and test +data. This can lead to the neural network overfitting these small +differences between the test and training sets, and a poor performance +on the test set despite having a good performance on the validation +set. To rectify this, Andrew Ng suggests making two validation or dev +sets, one constructed from the training data and one constructed from +the test data. The difference between the performance of the algorithm +on these two validation sets quantifies the train-test mismatch. This +can serve as another important diagnostic when using DNNs for +supervised learning.

    +
    +
    +

    Limitations of supervised learning with deep networks#

    +

    Like all statistical methods, supervised learning using neural +networks has important limitations. This is especially important when +one seeks to apply these methods, especially to physics problems. Like +all tools, DNNs are not a universal solution. Often, the same or +better performance on a task can be achieved by using a few +hand-engineered features (or even a collection of random +features).

    +
    +
    +

    Limitations of NNs#

    +

    Here we list some of the important limitations of supervised neural network based models.

    +
      +
• Need labeled data. All supervised learning methods, DNNs included, require labeled data. Often, labeled data is harder to acquire than unlabeled data (e.g. one must pay for human experts to label images).

• Supervised neural networks are extremely data intensive. DNNs are data hungry. They perform best when data is plentiful. This is doubly so for supervised methods where the data must also be labeled. The utility of DNNs is extremely limited if data is hard to acquire or the datasets are small (hundreds to a few thousand samples). In this case, the performance of other methods that utilize hand-engineered features can exceed that of DNNs.
    +
    +
    +

    Homogeneous data#

    +
      +
    • Homogeneous data. Almost all DNNs deal with homogeneous data of one type. It is very hard to design architectures that mix and match data types (i.e. some continuous variables, some discrete variables, some time series). In applications beyond images, video, and language, this is often what is required. In contrast, ensemble models like random forests or gradient-boosted trees have no difficulty handling mixed data types.

    +
    +
    +

    More limitations#

    +
      +
    • Many problems are not about prediction. In natural science we are often interested in learning something about the underlying distribution that generates the data. In this case, it is often difficult to cast these ideas in a supervised learning setting. While the problems are related, it is possible to make good predictions with a wrong model. The model might or might not be useful for understanding the underlying science.

    +

    Some of these remarks are particular to DNNs, others are shared by all supervised learning methods. This motivates the use of unsupervised methods which in part circumvent these problems.

    +
    +
    +

    Setting up a Multi-layer perceptron model for classification#

    +

We are now going to develop an example based on the MNIST database. This is a classification problem and we need to use our cross-entropy function discussed in connection with logistic regression. The cross-entropy defines our cost function for the classification problems with neural networks.

    +

In binary classification with two classes \((0, 1)\) we define the logistic/sigmoid function as the probability that a particular input is in class \(0\) or \(1\). This is possible because the logistic function takes any input from the real numbers and outputs a number between 0 and 1, and can therefore be interpreted as a probability. It also has other nice properties, such as a derivative that is simple to calculate.

    +

For an input \(\boldsymbol{a}\) from the hidden layer, the probability that the input \(\boldsymbol{x}\) is in class 0 or 1 is given below. We let \(\theta\) represent the unknown weights and biases to be adjusted by our equations, and the variable \(x\) represents our activation values \(z\). We have

    +
\[ P(y = 0 \mid \boldsymbol{x}, \boldsymbol{\theta}) = \frac{1}{1 + \exp{(-\boldsymbol{x})}} , \]
    +

    and

    +
    +\[ +P(y = 1 \mid \boldsymbol{x}, \boldsymbol{\theta}) = 1 - P(y = 0 \mid \boldsymbol{x}, \boldsymbol{\theta}) , +\]
    +

    where \(y \in \{0, 1\}\) and \(\boldsymbol{\theta}\) represents the weights and biases +of our network.

    +
    +
    +

    Defining the cost function#

    +

    Our cost function is given as (see the Logistic regression lectures)

    +
\[ \mathcal{C}(\boldsymbol{\theta}) = - \ln P(\mathcal{D} \mid \boldsymbol{\theta}) = - \sum_{i=1}^n \left( y_i \ln[P(y_i = 0)] + (1 - y_i) \ln [1 - P(y_i = 0)]\right) = \sum_{i=1}^n \mathcal{L}_i(\boldsymbol{\theta}) . \]
    +

    This last equality means that we can interpret our cost function as a sum over the loss function +for each point in the dataset \(\mathcal{L}_i(\boldsymbol{\theta})\).
    +The negative sign is just so that we can think about our algorithm as minimizing a positive number, rather +than maximizing a negative number.

    +

    In multiclass classification it is common to treat each integer label as a so called one-hot vector:

    +

    \(y = 5 \quad \rightarrow \quad \boldsymbol{y} = (0, 0, 0, 0, 0, 1, 0, 0, 0, 0) ,\) and

    +

    \(y = 1 \quad \rightarrow \quad \boldsymbol{y} = (0, 1, 0, 0, 0, 0, 0, 0, 0, 0) ,\)

    +

i.e. a binary bit string of length \(C\), where \(C = 10\) is the number of classes in the MNIST dataset (numbers from \(0\) to \(9\)).

    +

    If \(\boldsymbol{x}_i\) is the \(i\)-th input (image), \(y_{ic}\) refers to the \(c\)-th component of the \(i\)-th +output vector \(\boldsymbol{y}_i\).
    +The probability of \(\boldsymbol{x}_i\) being in class \(c\) will be given by the softmax function:

    +
    +\[ +P(y_{ic} = 1 \mid \boldsymbol{x}_i, \boldsymbol{\theta}) = \frac{\exp{((\boldsymbol{a}_i^{hidden})^T \boldsymbol{w}_c)}} +{\sum_{c'=0}^{C-1} \exp{((\boldsymbol{a}_i^{hidden})^T \boldsymbol{w}_{c'})}} , +\]
    +

    which reduces to the logistic function in the binary case.
    +The likelihood of this \(C\)-class classifier +is now given as:

    +
    +\[ +P(\mathcal{D} \mid \boldsymbol{\theta}) = \prod_{i=1}^n \prod_{c=0}^{C-1} [P(y_{ic} = 1)]^{y_{ic}} . +\]
    +

    Again we take the negative log-likelihood to define our cost function:

    +
    +\[ +\mathcal{C}(\boldsymbol{\theta}) = - \log{P(\mathcal{D} \mid \boldsymbol{\theta})}. +\]
    +

    See the logistic regression lectures for a full definition of the cost function.

    +

    The back propagation equations need now only a small change, namely the definition of a new cost function. We are thus ready to use the same equations as before!

    +
    +
    +

    Example: binary classification problem#

    +

As an example of the above, relevant for project 2 as well, let us consider a binary classification problem. As discussed in our logistic regression lectures, we defined a cost function in terms of the parameters \(\beta\) as

    +
\[ \mathcal{C}(\boldsymbol{\beta}) = - \sum_{i=1}^n \left(y_i\log{p(y_i \vert x_i,\boldsymbol{\beta})}+(1-y_i)\log{\left(1-p(y_i \vert x_i,\boldsymbol{\beta})\right)}\right), \]
    +

    where we had defined the logistic (sigmoid) function

    +
    +\[ +p(y_i =1\vert x_i,\boldsymbol{\beta})=\frac{\exp{(\beta_0+\beta_1 x_i)}}{1+\exp{(\beta_0+\beta_1 x_i)}}, +\]
    +

    and

    +
    +\[ +p(y_i =0\vert x_i,\boldsymbol{\beta})=1-p(y_i =1\vert x_i,\boldsymbol{\beta}). +\]
    +

    The parameters \(\boldsymbol{\beta}\) were defined using a minimization method like gradient descent or Newton-Raphson’s method.

    +

    Now we replace \(x_i\) with the activation \(z_i^l\) for a given layer \(l\) and the outputs as \(y_i=a_i^l=f(z_i^l)\), with \(z_i^l\) now being a function of the weights \(w_{ij}^l\) and biases \(b_i^l\). +We have then

    +
    +\[ +a_i^l = y_i = \frac{\exp{(z_i^l)}}{1+\exp{(z_i^l)}}, +\]
    +

    with

    +
    +\[ +z_i^l = \sum_{j}w_{ij}^l a_j^{l-1}+b_i^l, +\]
    +

    where the superscript \(l-1\) indicates that these are the outputs from layer \(l-1\). +Our cost function at the final layer \(l=L\) is now

    +
    +\[ +\mathcal{C}(\boldsymbol{W}) = - \sum_{i=1}^n \left(t_i\log{a_i^L}+(1-t_i)\log{(1-a_i^L)}\right), +\]
    +

    where we have defined the targets \(t_i\). The derivatives of the cost function with respect to the output \(a_i^L\) are then easily calculated and we get

    +
    +\[ +\frac{\partial \mathcal{C}(\boldsymbol{W})}{\partial a_i^L} = \frac{a_i^L-t_i}{a_i^L(1-a_i^L)}. +\]
    +

    In case we use another activation function than the logistic one, we need to evaluate other derivatives.

    +
    +
    +

    The Softmax function#

    +

    In case we employ the more general case given by the Softmax equation, we need to evaluate the derivative of the activation function with respect to the activation \(z_i^l\), that is we need

    +
    +\[ +\frac{\partial f(z_i^l)}{\partial w_{jk}^l} = +\frac{\partial f(z_i^l)}{\partial z_j^l} \frac{\partial z_j^l}{\partial w_{jk}^l}= \frac{\partial f(z_i^l)}{\partial z_j^l}a_k^{l-1}. +\]
    +

    For the Softmax function we have

    +
    +\[ +f(z_i^l) = \frac{\exp{(z_i^l)}}{\sum_{m=1}^K\exp{(z_m^l)}}. +\]
    +

    Its derivative with respect to \(z_j^l\) gives

    +
    +\[ +\frac{\partial f(z_i^l)}{\partial z_j^l}= f(z_i^l)\left(\delta_{ij}-f(z_j^l)\right), +\]
    +

which in the case of the simple binary model reduces to the case \(i=j\).

    +
    +
    +
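As a small check of the Softmax derivative above (not part of the original notes), we can compare the analytic expression \(f_i(\delta_{ij}-f_j)\) with finite differences for a single vector \(\boldsymbol{z}\):

import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))     # subtract the maximum for numerical stability
    return e / e.sum()

z = np.random.randn(5)
f = softmax(z)
jac_analytic = np.diag(f) - np.outer(f, f)    # df_i/dz_j = f_i (delta_ij - f_j)

h = 1e-6
jac_numeric = np.zeros((5, 5))
for j in range(5):
    dz = np.zeros(5)
    dz[j] = h
    jac_numeric[:, j] = (softmax(z + dz) - softmax(z - dz)) / (2 * h)

print(np.max(np.abs(jac_analytic - jac_numeric)))   # close to zero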

    Developing a code for doing neural networks with back propagation#

    +

    One can identify a set of key steps when using neural networks to solve supervised learning problems:

    +
      +
1. Collect and pre-process data

2. Define model and architecture

3. Choose cost function and optimizer

4. Train the model

5. Evaluate model performance on test data

6. Adjust hyperparameters (if necessary, network architecture)
    +
    +
    +

    Collect and pre-process data#

    +

    Here we will be using the MNIST dataset, which is readily available through the scikit-learn +package. You may also find it for example here.
    +The MNIST (Modified National Institute of Standards and Technology) database is a large database +of handwritten digits that is commonly used for training various image processing systems.
    +The MNIST dataset consists of 70 000 images of size \(28\times 28\) pixels, each labeled from 0 to 9.
    +The scikit-learn dataset we will use consists of a selection of 1797 images of size \(8\times 8\) collected and processed from this database.

    +

    To feed data into a feed-forward neural network we need to represent +the inputs as a design/feature matrix \(X = (n_{inputs}, n_{features})\). Each +row represents an input, in this case a handwritten digit, and +each column represents a feature, in this case a pixel. The +correct answers, also known as labels or targets are +represented as a 1D array of integers +\(Y = (n_{inputs}) = (5, 3, 1, 8,...)\).

    +

    As an example, say we want to build a neural network using supervised learning to predict Body-Mass Index (BMI) from +measurements of height (in m)
    +and weight (in kg). If we have measurements of 5 people the design/feature matrix could be for example:

    +
    +\[\begin{split} X = \begin{bmatrix} +1.85 & 81\\ +1.71 & 65\\ +1.95 & 103\\ +1.55 & 42\\ +1.63 & 56 +\end{bmatrix} ,\end{split}\]
    +

    and the targets would be:

    +
    +\[ Y = (23.7, 22.2, 27.1, 17.5, 21.1) \]
    +
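In numpy this toy example is simply (a sketch of the shapes involved):

import numpy as np

# design matrix: one row per person, columns = (height in m, weight in kg)
X = np.array([[1.85, 81],
              [1.71, 65],
              [1.95, 103],
              [1.55, 42],
              [1.63, 56]])
Y = np.array([23.7, 22.2, 27.1, 17.5, 21.1])   # targets (BMI)
print(X.shape, Y.shape)                         # (5, 2) (5,)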

    Since each input image is a 2D matrix, we need to flatten the image +(i.e. “unravel” the 2D matrix into a 1D array) to turn the data into a +design/feature matrix. This means we lose all spatial information in the +image, such as locality and translational invariance. More complicated +architectures such as Convolutional Neural Networks can take advantage +of such information, and are most commonly applied when analyzing +images.

    +
    +
    +
    # import necessary packages
    +import numpy as np
    +import matplotlib.pyplot as plt
    +from sklearn import datasets
    +
    +
    +# ensure the same random numbers appear every time
    +np.random.seed(0)
    +
    +# display images in notebook
    +%matplotlib inline
    +plt.rcParams['figure.figsize'] = (12,12)
    +
    +
    +# download MNIST dataset
    +digits = datasets.load_digits()
    +
    +# define inputs and labels
    +inputs = digits.images
    +labels = digits.target
    +
    +print("inputs = (n_inputs, pixel_width, pixel_height) = " + str(inputs.shape))
    +print("labels = (n_inputs) = " + str(labels.shape))
    +
    +
    +# flatten the image
    +# the value -1 means dimension is inferred from the remaining dimensions: 8x8 = 64
    +n_inputs = len(inputs)
    +inputs = inputs.reshape(n_inputs, -1)
    +print("X = (n_inputs, n_features) = " + str(inputs.shape))
    +
    +
    +# choose some random images to display
    +indices = np.arange(n_inputs)
    +random_indices = np.random.choice(indices, size=5)
    +
    +for i, image in enumerate(digits.images[random_indices]):
    +    plt.subplot(1, 5, i+1)
    +    plt.axis('off')
    +    plt.imshow(image, cmap=plt.cm.gray_r, interpolation='nearest')
    +    plt.title("Label: %d" % digits.target[random_indices[i]])
    +plt.show()
    +
    +
    +
    +
    +
    +
    +

    Train and test datasets#

    +

Performing analysis before partitioning the dataset is a major error that can lead to incorrect conclusions.

    +

    We will reserve \(80 \%\) of our dataset for training and \(20 \%\) for testing.

    +

    It is important that the train and test datasets are drawn randomly from our dataset, to ensure +no bias in the sampling.
    +Say you are taking measurements of weather data to predict the weather in the coming 5 days. +You don’t want to train your model on measurements taken from the hours 00.00 to 12.00, and then test it on data +collected from 12.00 to 24.00.

    +
    +
    +
    from sklearn.model_selection import train_test_split
    +
    +# one-liner from scikit-learn library
    +train_size = 0.8
    +test_size = 1 - train_size
    +X_train, X_test, Y_train, Y_test = train_test_split(inputs, labels, train_size=train_size,
    +                                                    test_size=test_size)
    +
    +# equivalently in numpy
def train_test_split_numpy(inputs, labels, train_size, test_size):
    n_inputs = len(inputs)

    # use one common permutation so that inputs and labels stay paired
    shuffled_indices = np.random.permutation(n_inputs)
    inputs_shuffled = inputs[shuffled_indices]
    labels_shuffled = labels[shuffled_indices]

    train_end = int(n_inputs*train_size)
    X_train, X_test = inputs_shuffled[:train_end], inputs_shuffled[train_end:]
    Y_train, Y_test = labels_shuffled[:train_end], labels_shuffled[train_end:]

    return X_train, X_test, Y_train, Y_test
    +
    +#X_train, X_test, Y_train, Y_test = train_test_split_numpy(inputs, labels, train_size, test_size)
    +
    +print("Number of training images: " + str(len(X_train)))
    +print("Number of test images: " + str(len(X_test)))
    +
    +
    +
    +
    +
    +
    +

    Define model and architecture#

    +

    Our simple feed-forward neural network will consist of an input layer, a single hidden layer and an output layer. The activation \(y\) of each neuron is a weighted sum of inputs, passed through an activation function. In case of the simple perceptron model we have

    +
    +\[ z = \sum_{i=1}^n w_i a_i ,\]
    +
    +\[ y = f(z) ,\]
    +

    where \(f\) is the activation function, \(a_i\) represents input from neuron \(i\) in the preceding layer +and \(w_i\) is the weight to input \(i\).
    +The activation of the neurons in the input layer is just the features (e.g. a pixel value).

    +

    The simplest activation function for a neuron is the Heaviside function:

    +
    +\[\begin{split} f(z) = +\begin{cases} +1, & z > 0\\ +0, & \text{otherwise} +\end{cases} +\end{split}\]
    +

    A feed-forward neural network with this activation is known as a perceptron.
    +For a binary classifier (i.e. two classes, 0 or 1, dog or not-dog) we can also use this in our output layer.
    +This activation can be generalized to \(k\) classes (using e.g. the one-against-all strategy), +and we call these architectures multiclass perceptrons.

    +

    However, it is now common to use the terms Single Layer Perceptron (SLP) (1 hidden layer) and
    +Multilayer Perceptron (MLP) (2 or more hidden layers) to refer to feed-forward neural networks with any activation function.

    +

    Typical choices for activation functions include the sigmoid function, hyperbolic tangent, and Rectified Linear Unit (ReLU).
    +We will be using the sigmoid function \(\sigma(x)\):

    +
    +\[ f(x) = \sigma(x) = \frac{1}{1 + e^{-x}} ,\]
    +

    which is inspired by probability theory (see logistic regression) and was most commonly used until about 2011. See the discussion below concerning other activation functions.

    +
    +
    +

    Layers#

    +
      +
    • Input

    +

    Since each input image has 8x8 = 64 pixels or features, we have an input layer of 64 neurons.

    +
      +
    • Hidden layer

    +

    We will use 50 neurons in the hidden layer receiving input from the neurons in the input layer.
    +Since each neuron in the hidden layer is connected to the 64 inputs we have 64x50 = 3200 weights to the hidden layer.

    +
      +
    • Output

    +

    If we were building a binary classifier, it would be sufficient with a single neuron in the output layer, +which could output 0 or 1 according to the Heaviside function. This would be an example of a hard classifier, meaning it outputs the class of the input directly. However, if we are dealing with noisy data it is often beneficial to use a soft classifier, which outputs the probability of being in class 0 or 1.

    +

    For a soft binary classifier, we could use a single neuron and interpret the output as either being the probability of being in class 0 or the probability of being in class 1. Alternatively we could use 2 neurons, and interpret each neuron as the probability of being in each class.

    +

    Since we are doing multiclass classification, with 10 categories, it is natural to use 10 neurons in the output layer. We number the neurons \(j = 0,1,...,9\). The activation of each output neuron \(j\) will be according to the softmax function:

    +
    +\[ P(\text{class $j$} \mid \text{input $\boldsymbol{a}$}) = \frac{\exp{(\boldsymbol{a}^T \boldsymbol{w}_j)}} +{\sum_{c=0}^{9} \exp{(\boldsymbol{a}^T \boldsymbol{w}_c)}} ,\]
    +

    i.e. each neuron \(j\) outputs the probability of being in class \(j\) given an input from the hidden layer \(\boldsymbol{a}\), with \(\boldsymbol{w}_j\) the weights of neuron \(j\) to the inputs.
    +The denominator is a normalization factor to ensure the outputs (probabilities) sum up to 1.
    +The exponent is just the weighted sum of inputs as before:

    +
    +\[ z_j = \sum_{i=1}^n w_ {ij} a_i+b_j.\]
    +

    Since each neuron in the output layer is connected to the 50 inputs from the hidden layer we have 50x10 = 500 +weights to the output layer.

    +
    +
    +

    Weights and biases#

    +

    Typically weights are initialized with small values distributed around zero, drawn from a uniform +or normal distribution. Setting all weights to zero means all neurons give the same output, making the network useless.

    +

    Adding a bias value to the weighted sum of inputs allows the neural network to represent a greater range +of values. Without it, any input with the value 0 will be mapped to zero (before being passed through the activation). The bias unit has an output of 1, and a weight to each neuron \(j\), \(b_j\):

    +
    +\[ z_j = \sum_{i=1}^n w_ {ij} a_i + b_j.\]
    +

    The bias weights \(\boldsymbol{b}\) are often initialized to zero, but a small value like \(0.01\) ensures all neurons have some output which can be backpropagated in the first training cycle.

    +
    +
    +
    # building our neural network
    +
    +n_inputs, n_features = X_train.shape
    +n_hidden_neurons = 50
    +n_categories = 10
    +
    +# we make the weights normally distributed using numpy.random.randn
    +
    +# weights and bias in the hidden layer
    +hidden_weights = np.random.randn(n_features, n_hidden_neurons)
    +hidden_bias = np.zeros(n_hidden_neurons) + 0.01
    +
    +# weights and bias in the output layer
    +output_weights = np.random.randn(n_hidden_neurons, n_categories)
    +output_bias = np.zeros(n_categories) + 0.01
    +
    +
    +
    +
    +
    +
    +

    Feed-forward pass#

    +

    Denote \(F\) the number of features, \(H\) the number of hidden neurons and \(C\) the number of categories.
    +For each input image we calculate a weighted sum of input features (pixel values) to each neuron \(j\) in the hidden layer \(l\):

    +
    +\[ z_{j}^{l} = \sum_{i=1}^{F} w_{ij}^{l} x_i + b_{j}^{l},\]
    +

    this is then passed through our activation function

    +
    +\[ a_{j}^{l} = f(z_{j}^{l}) .\]
    +

    We calculate a weighted sum of inputs (activations in the hidden layer) to each neuron \(j\) in the output layer:

    +
    +\[ z_{j}^{L} = \sum_{i=1}^{H} w_{ij}^{L} a_{i}^{l} + b_{j}^{L}.\]
    +

    Finally we calculate the output of neuron \(j\) in the output layer using the softmax function:

    +
    +\[ a_{j}^{L} = \frac{\exp{(z_j^{L})}} +{\sum_{c=0}^{C-1} \exp{(z_c^{L})}} .\]
    +
    +
    +

    Matrix multiplications#

    +

    Since our data has the dimensions \(X = (n_{inputs}, n_{features})\) and our weights to the hidden +layer have the dimensions
    +\(W_{hidden} = (n_{features}, n_{hidden})\), +we can easily feed the network all our training data in one go by taking the matrix product

    +
    +\[ X W^{h} = (n_{inputs}, n_{hidden}),\]
    +

    and obtain a matrix that holds the weighted sum of inputs to the hidden layer +for each input image and each hidden neuron.
    +We also add the bias to obtain a matrix of weighted sums to the hidden layer \(Z^{h}\):

    +
    +\[ \boldsymbol{z}^{l} = \boldsymbol{X} \boldsymbol{W}^{l} + \boldsymbol{b}^{l} ,\]
    +

    meaning the same bias (1D array with size equal number of hidden neurons) is added to each input image.
    +This is then passed through the activation:

    +
    +\[ \boldsymbol{a}^{l} = f(\boldsymbol{z}^l) .\]
    +

    This is fed to the output layer:

    +
\[ \boldsymbol{z}^{L} = \boldsymbol{a}^{l} \boldsymbol{W}^{L} + \boldsymbol{b}^{L} . \]
    +

    Finally we receive our output values for each image and each category by passing it through the softmax function:

    +
    +\[ output = softmax (\boldsymbol{z}^{L}) = (n_{inputs}, n_{categories}) .\]
    +
    +
    +
    # setup the feed-forward pass, subscript h = hidden layer
    +
    +def sigmoid(x):
    +    return 1/(1 + np.exp(-x))
    +
    +def feed_forward(X):
    +    # weighted sum of inputs to the hidden layer
    +    z_h = np.matmul(X, hidden_weights) + hidden_bias
    +    # activation in the hidden layer
    +    a_h = sigmoid(z_h)
    +    
    +    # weighted sum of inputs to the output layer
    +    z_o = np.matmul(a_h, output_weights) + output_bias
    +    # softmax output
    +    # axis 0 holds each input and axis 1 the probabilities of each category
    +    exp_term = np.exp(z_o)
    +    probabilities = exp_term / np.sum(exp_term, axis=1, keepdims=True)
    +    
    +    return probabilities
    +
    +probabilities = feed_forward(X_train)
    +print("probabilities = (n_inputs, n_categories) = " + str(probabilities.shape))
    +print("probability that image 0 is in category 0,1,2,...,9 = \n" + str(probabilities[0]))
    +print("probabilities sum up to: " + str(probabilities[0].sum()))
    +print()
    +
    +# we obtain a prediction by taking the class with the highest likelihood
    +def predict(X):
    +    probabilities = feed_forward(X)
    +    return np.argmax(probabilities, axis=1)
    +
    +predictions = predict(X_train)
    +print("predictions = (n_inputs) = " + str(predictions.shape))
    +print("prediction for image 0: " + str(predictions[0]))
    +print("correct label for image 0: " + str(Y_train[0]))
    +
    +
    +
    +
    +
    +
    +

    Choose cost function and optimizer#

    +

    To measure how well our neural network is doing we need to introduce a cost function.
    +We will call the function that gives the error of a single sample output the loss function, and the function +that gives the total error of our network across all samples the cost function. +A typical choice for multiclass classification is the cross-entropy loss, also known as the negative log likelihood.

    +

    In multiclass classification it is common to treat each integer label as a so called one-hot vector:

    +
    +\[ y = 5 \quad \rightarrow \quad \boldsymbol{y} = (0, 0, 0, 0, 0, 1, 0, 0, 0, 0) ,\]
    +
    +\[ y = 1 \quad \rightarrow \quad \boldsymbol{y} = (0, 1, 0, 0, 0, 0, 0, 0, 0, 0) ,\]
    +

    i.e. a binary bit string of length \(C\), where \(C = 10\) is the number of classes in the MNIST dataset.

    +

    Let \(y_{ic}\) denote the \(c\)-th component of the \(i\)-th one-hot vector.
    +We define the cost function \(\mathcal{C}\) as a sum over the cross-entropy loss for each point \(\boldsymbol{x}_i\) in the dataset.

    +

    In the one-hot representation only one of the terms in the loss function is non-zero, namely the +probability of the correct category \(c'\)
    +(i.e. the category \(c'\) such that \(y_{ic'} = 1\)). This means that the cross entropy loss only punishes you for how wrong +you got the correct label. The probability of category \(c\) is given by the softmax function. The vector \(\boldsymbol{\theta}\) represents the parameters of our network, i.e. all the weights and biases.

    +
    +
    +

    Optimizing the cost function#

    +

    The network is trained by finding the weights and biases that minimize the cost function. One of the most widely used classes of methods is gradient descent and its generalizations. The idea behind gradient descent +is simply to adjust the weights in the direction where the gradient of the cost function is large and negative. This ensures we flow toward a local minimum of the cost function.
    +Each parameter \(\theta\) is iteratively adjusted according to the rule

    +
    +\[ \theta_{i+1} = \theta_i - \eta \nabla \mathcal{C}(\theta_i) ,\]
    +

    where \(\eta\) is known as the learning rate, which controls how big a step we take towards the minimum.
    +This update can be repeated for any number of iterations, or until we are satisfied with the result.

    +

A simple and effective improvement is a variant called mini-batch (stochastic) gradient descent.
    +Instead of calculating the gradient on the whole dataset, we calculate an approximation of the gradient +on a subset of the data called a minibatch.
    +If there are \(N\) data points and we have a minibatch size of \(M\), the total number of batches +is \(N/M\).
    +We denote each minibatch \(B_k\), with \(k = 1, 2,...,N/M\). The gradient then becomes:

    +
    +\[ \nabla \mathcal{C}(\theta) = \frac{1}{N} \sum_{i=1}^N \nabla \mathcal{L}_i(\theta) \quad \rightarrow \quad +\frac{1}{M} \sum_{i \in B_k} \nabla \mathcal{L}_i(\theta) ,\]
    +

    i.e. instead of averaging the loss over the entire dataset, we average over a minibatch.

    +

    This has two important benefits:

    +
      +
1. Introducing stochasticity decreases the chance that the algorithm becomes stuck in a local minimum.

2. It significantly speeds up the calculation, since we do not have to use the entire dataset to calculate the gradient.
    +

The various optimization methods, with codes and algorithms, are discussed in our lectures on Gradient descent approaches.

    +
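As a sketch of how such a minibatch loop could look for the network built in this section (assuming the weights, biases, training arrays and the backpropagation function defined elsewhere in this section are available; the learning rate and batch size are example values):

import numpy as np

eta = 0.01      # learning rate (example value)
M = 100         # minibatch size (example value)
n = X_train.shape[0]
n_batches = n // M

for epoch in range(10):
    # reshuffle once per epoch so the minibatches differ between epochs
    permutation = np.random.permutation(n)
    for k in range(n_batches):
        batch = permutation[k*M:(k+1)*M]
        dWo, dBo, dWh, dBh = backpropagation(X_train[batch], Y_train_onehot[batch])
        output_weights -= eta * dWo
        output_bias -= eta * dBo
        hidden_weights -= eta * dWh
        hidden_bias -= eta * dBh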
    +
    +

    Regularization#

    +

    It is common to add an extra term to the cost function, proportional +to the size of the weights. This is equivalent to constraining the +size of the weights, so that they do not grow out of control. +Constraining the size of the weights means that the weights cannot +grow arbitrarily large to fit the training data, and in this way +reduces overfitting.

    +

    We will measure the size of the weights using the so called L2-norm, meaning our cost function becomes:

    +
\[ \mathcal{C}(\theta) = \frac{1}{N} \sum_{i=1}^N \mathcal{L}_i(\theta) \quad \rightarrow \quad \frac{1}{N} \sum_{i=1}^N \mathcal{L}_i(\theta) + \lambda \lvert \lvert \boldsymbol{w} \rvert \rvert_2^2 = \frac{1}{N} \sum_{i=1}^N \mathcal{L}_i(\theta) + \lambda \sum_{ij} w_{ij}^2,\]
    +

    i.e. we sum up all the weights squared. The factor \(\lambda\) is known as a regularization parameter.

    +

In order to train the model, we need to calculate the derivative of the cost function with respect to every bias and weight in the network. In total our network has \((64 + 1)\times 50=3250\) weights in the hidden layer and \((50 + 1)\times 10=510\) weights to the output layer (\(+1\) for the bias), and the gradient must be calculated for every parameter. We use the backpropagation algorithm discussed above. This is a clever use of the chain rule that allows us to calculate the gradient efficiently.

    +
    +
    +

    Matrix multiplication#

    +

To train our network more efficiently, these equations are implemented using matrix operations. The error in the output layer is calculated simply as, with \(\boldsymbol{t}\) being our targets and \(\boldsymbol{y}\) the outputs of the network,

    +
\[ \delta_L = \boldsymbol{y} - \boldsymbol{t} = (n_{inputs}, n_{categories}) . \]
    +

    The gradient for the output weights is calculated as

    +
    +\[ \nabla W_{L} = \boldsymbol{a}^T \delta_L = (n_{hidden}, n_{categories}) ,\]
    +

    where \(\boldsymbol{a} = (n_{inputs}, n_{hidden})\). This simply means that we are summing up the gradients for each input.
    +Since we are going backwards we have to transpose the activation matrix.

    +

    The gradient with respect to the output bias is then

    +
    +\[ \nabla \boldsymbol{b}_{L} = \sum_{i=1}^{n_{inputs}} \delta_L = (n_{categories}) .\]
    +

    The error in the hidden layer is

    +
\[ \delta_h = \delta_L W_{L}^T \circ f'(z_{h}) = \delta_L W_{L}^T \circ a_{h} \circ (1 - a_{h}) = (n_{inputs}, n_{hidden}) , \]
    +

where \(f'(z_{h})\) is the derivative of the activation in the hidden layer. The matrix products mean that we are summing up the products for each neuron in the output layer. The symbol \(\circ\) denotes the Hadamard product, meaning element-wise multiplication.

    +

    This again gives us the gradients in the hidden layer:

    +
    +\[ \nabla W_{h} = X^T \delta_h = (n_{features}, n_{hidden}) ,\]
    +
    +\[ \nabla b_{h} = \sum_{i=1}^{n_{inputs}} \delta_h = (n_{hidden}) .\]
    +
    +
    +
    # to categorical turns our integer vector into a onehot representation
    +from sklearn.metrics import accuracy_score
    +
    +# one-hot in numpy
    +def to_categorical_numpy(integer_vector):
    +    n_inputs = len(integer_vector)
    +    n_categories = np.max(integer_vector) + 1
    +    onehot_vector = np.zeros((n_inputs, n_categories))
    +    onehot_vector[range(n_inputs), integer_vector] = 1
    +    
    +    return onehot_vector
    +
    +#Y_train_onehot, Y_test_onehot = to_categorical(Y_train), to_categorical(Y_test)
    +Y_train_onehot, Y_test_onehot = to_categorical_numpy(Y_train), to_categorical_numpy(Y_test)
    +
    +def feed_forward_train(X):
    +    # weighted sum of inputs to the hidden layer
    +    z_h = np.matmul(X, hidden_weights) + hidden_bias
    +    # activation in the hidden layer
    +    a_h = sigmoid(z_h)
    +    
    +    # weighted sum of inputs to the output layer
    +    z_o = np.matmul(a_h, output_weights) + output_bias
    +    # softmax output
    +    # axis 0 holds each input and axis 1 the probabilities of each category
    +    exp_term = np.exp(z_o)
    +    probabilities = exp_term / np.sum(exp_term, axis=1, keepdims=True)
    +    
    +    # for backpropagation need activations in hidden and output layers
    +    return a_h, probabilities
    +
    +def backpropagation(X, Y):
    +    a_h, probabilities = feed_forward_train(X)
    +    
    +    # error in the output layer
    +    error_output = probabilities - Y
    +    # error in the hidden layer
    +    error_hidden = np.matmul(error_output, output_weights.T) * a_h * (1 - a_h)
    +    
    +    # gradients for the output layer
    +    output_weights_gradient = np.matmul(a_h.T, error_output)
    +    output_bias_gradient = np.sum(error_output, axis=0)
    +    
    +    # gradient for the hidden layer
    +    hidden_weights_gradient = np.matmul(X.T, error_hidden)
    +    hidden_bias_gradient = np.sum(error_hidden, axis=0)
    +
    +    return output_weights_gradient, output_bias_gradient, hidden_weights_gradient, hidden_bias_gradient
    +
    +print("Old accuracy on training data: " + str(accuracy_score(predict(X_train), Y_train)))
    +
    +eta = 0.01
    +lmbd = 0.01
    +for i in range(1000):
    +    # calculate gradients
    +    dWo, dBo, dWh, dBh = backpropagation(X_train, Y_train_onehot)
    +    
    +    # regularization term gradients
    +    dWo += lmbd * output_weights
    +    dWh += lmbd * hidden_weights
    +    
    +    # update weights and biases
    +    output_weights -= eta * dWo
    +    output_bias -= eta * dBo
    +    hidden_weights -= eta * dWh
    +    hidden_bias -= eta * dBh
    +
    +print("New accuracy on training data: " + str(accuracy_score(predict(X_train), Y_train)))
    +
    +
    +
    +
    +
    +
    +

    Improving performance#

    +

    As we can see the network does not seem to be learning at all. It seems to be just guessing the label for each image.
    +In order to obtain a network that does something useful, we will have to do a bit more work.

    +

    The choice of hyperparameters such as learning rate and regularization parameter is hugely influential for the performance of the network. Typically a grid-search is performed, wherein we test different hyperparameters separated by orders of magnitude. For example we could test the learning rates \(\eta = 10^{-6}, 10^{-5},...,10^{-1}\) with different regularization parameters \(\lambda = 10^{-6},...,10^{-0}\).

    +
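Such a grid search could be sketched as below; here train_and_evaluate is a hypothetical helper that trains one fresh network with the given \(\eta\) and \(\lambda\) (for instance by wrapping the training loop above or the class below) and returns its accuracy on the test set.

import numpy as np

eta_vals = np.logspace(-6, -1, 6)     # learning rates 10^-6, ..., 10^-1
lmbd_vals = np.logspace(-6, 0, 7)     # regularization parameters 10^-6, ..., 10^0

test_accuracy = np.zeros((len(eta_vals), len(lmbd_vals)))
for i, eta in enumerate(eta_vals):
    for j, lmbd in enumerate(lmbd_vals):
        # hypothetical helper: train a network with these hyperparameters
        # and return the accuracy on the test set
        test_accuracy[i, j] = train_and_evaluate(eta, lmbd)

best_i, best_j = np.unravel_index(np.argmax(test_accuracy), test_accuracy.shape)
print("best eta:", eta_vals[best_i], "best lambda:", lmbd_vals[best_j])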

Next, we haven’t implemented minibatching yet, which introduces stochasticity and is thought to act as an important regularizer on the weights. We call a feed-forward + backward pass with a minibatch an iteration, and a full training period going through the entire dataset (\(n/M\) batches) an epoch.

    +

    If this does not improve network performance, you may want to consider altering the network architecture, adding more neurons or hidden layers.
    +Andrew Ng goes through some of these considerations in this video. You can find a summary of the video here.

    +
    +
    +

    Full object-oriented implementation#

    +

    It is very natural to think of the network as an object, with specific instances of the network +being realizations of this object with different hyperparameters. An implementation using Python classes provides a clean structure and interface, and the full implementation of our neural network is given below.

    +
    +
    +
    class NeuralNetwork:
    +    def __init__(
    +            self,
    +            X_data,
    +            Y_data,
    +            n_hidden_neurons=50,
    +            n_categories=10,
    +            epochs=10,
    +            batch_size=100,
    +            eta=0.1,
    +            lmbd=0.0):
    +
    +        self.X_data_full = X_data
    +        self.Y_data_full = Y_data
    +
    +        self.n_inputs = X_data.shape[0]
    +        self.n_features = X_data.shape[1]
    +        self.n_hidden_neurons = n_hidden_neurons
    +        self.n_categories = n_categories
    +
    +        self.epochs = epochs
    +        self.batch_size = batch_size
    +        self.iterations = self.n_inputs // self.batch_size
    +        self.eta = eta
    +        self.lmbd = lmbd
    +
    +        self.create_biases_and_weights()
    +
    +    def create_biases_and_weights(self):
    +        self.hidden_weights = np.random.randn(self.n_features, self.n_hidden_neurons)
    +        self.hidden_bias = np.zeros(self.n_hidden_neurons) + 0.01
    +
    +        self.output_weights = np.random.randn(self.n_hidden_neurons, self.n_categories)
    +        self.output_bias = np.zeros(self.n_categories) + 0.01
    +
    +    def feed_forward(self):
    +        # feed-forward for training
    +        self.z_h = np.matmul(self.X_data, self.hidden_weights) + self.hidden_bias
    +        self.a_h = sigmoid(self.z_h)
    +
    +        self.z_o = np.matmul(self.a_h, self.output_weights) + self.output_bias
    +
    +        exp_term = np.exp(self.z_o)
    +        self.probabilities = exp_term / np.sum(exp_term, axis=1, keepdims=True)
    +
    +    def feed_forward_out(self, X):
    +        # feed-forward for output
    +        z_h = np.matmul(X, self.hidden_weights) + self.hidden_bias
    +        a_h = sigmoid(z_h)
    +
    +        z_o = np.matmul(a_h, self.output_weights) + self.output_bias
    +        
    +        exp_term = np.exp(z_o)
    +        probabilities = exp_term / np.sum(exp_term, axis=1, keepdims=True)
    +        return probabilities
    +
    +    def backpropagation(self):
    +        error_output = self.probabilities - self.Y_data
    +        error_hidden = np.matmul(error_output, self.output_weights.T) * self.a_h * (1 - self.a_h)
    +
    +        self.output_weights_gradient = np.matmul(self.a_h.T, error_output)
    +        self.output_bias_gradient = np.sum(error_output, axis=0)
    +
    +        self.hidden_weights_gradient = np.matmul(self.X_data.T, error_hidden)
    +        self.hidden_bias_gradient = np.sum(error_hidden, axis=0)
    +
    +        if self.lmbd > 0.0:
    +            self.output_weights_gradient += self.lmbd * self.output_weights
    +            self.hidden_weights_gradient += self.lmbd * self.hidden_weights
    +
    +        self.output_weights -= self.eta * self.output_weights_gradient
    +        self.output_bias -= self.eta * self.output_bias_gradient
    +        self.hidden_weights -= self.eta * self.hidden_weights_gradient
    +        self.hidden_bias -= self.eta * self.hidden_bias_gradient
    +
    +    def predict(self, X):
    +        probabilities = self.feed_forward_out(X)
    +        return np.argmax(probabilities, axis=1)
    +
    +    def predict_probabilities(self, X):
    +        probabilities = self.feed_forward_out(X)
    +        return probabilities
    +
    +    def train(self):
    +        data_indices = np.arange(self.n_inputs)
    +
    +        for i in range(self.epochs):
    +            for j in range(self.iterations):
+                # pick datapoints without replacement for this minibatch
    +                chosen_datapoints = np.random.choice(
    +                    data_indices, size=self.batch_size, replace=False
    +                )
    +
    +                # minibatch training data
    +                self.X_data = self.X_data_full[chosen_datapoints]
    +                self.Y_data = self.Y_data_full[chosen_datapoints]
    +
    +                self.feed_forward()
    +                self.backpropagation()
    +
    +
    +
    +
    +
    +
    +

    Evaluate model performance on test data#

    +

To measure the performance of our network we evaluate how well it does on data it has never seen before, i.e. the test data.
    +We measure the performance of the network using the accuracy score.
    +The accuracy is as you would expect just the number of images correctly labeled divided by the total number of images. A perfect classifier will have an accuracy score of \(1\).

    +
    +\[ \text{Accuracy} = \frac{\sum_{i=1}^n I(\tilde{y}_i = y_i)}{n} ,\]
    +

    where \(I\) is the indicator function, \(1\) if \(\tilde{y}_i = y_i\) and \(0\) otherwise.

    +
    +
    +
    epochs = 100
    +batch_size = 100
    +
    +dnn = NeuralNetwork(X_train, Y_train_onehot, eta=eta, lmbd=lmbd, epochs=epochs, batch_size=batch_size,
    +                    n_hidden_neurons=n_hidden_neurons, n_categories=n_categories)
    +dnn.train()
    +test_predict = dnn.predict(X_test)
    +
    +# accuracy score from scikit library
    +print("Accuracy score on test set: ", accuracy_score(Y_test, test_predict))
    +
    +# equivalent in numpy
    +def accuracy_score_numpy(Y_test, Y_pred):
    +    return np.sum(Y_test == Y_pred) / len(Y_test)
    +
    +#print("Accuracy score on test set: ", accuracy_score_numpy(Y_test, test_predict))
    +
    +
    +
    +
    +
    +
    +

    Adjust hyperparameters#

    +

    We now perform a grid search to find the optimal hyperparameters for the network.
    +Note that we are only using 1 layer with 50 neurons, and human performance is estimated to be around \(98\%\) (\(2\%\) error rate).

    +
    +
    +
    eta_vals = np.logspace(-5, 1, 7)
    +lmbd_vals = np.logspace(-5, 1, 7)
    +# store the models for later use
    +DNN_numpy = np.zeros((len(eta_vals), len(lmbd_vals)), dtype=object)
    +
    +# grid search
    +for i, eta in enumerate(eta_vals):
    +    for j, lmbd in enumerate(lmbd_vals):
    +        dnn = NeuralNetwork(X_train, Y_train_onehot, eta=eta, lmbd=lmbd, epochs=epochs, batch_size=batch_size,
    +                            n_hidden_neurons=n_hidden_neurons, n_categories=n_categories)
    +        dnn.train()
    +        
    +        DNN_numpy[i][j] = dnn
    +        
    +        test_predict = dnn.predict(X_test)
    +        
    +        print("Learning rate  = ", eta)
    +        print("Lambda = ", lmbd)
    +        print("Accuracy score on test set: ", accuracy_score(Y_test, test_predict))
    +        print()
    +
    +
    +
    +
    +
    +
    +

    Visualization#

    +
    +
    +
    # visual representation of grid search
    +# uses seaborn heatmap, you can also do this with matplotlib imshow
    +import seaborn as sns
    +
    +sns.set()
    +
    +train_accuracy = np.zeros((len(eta_vals), len(lmbd_vals)))
    +test_accuracy = np.zeros((len(eta_vals), len(lmbd_vals)))
    +
    +for i in range(len(eta_vals)):
    +    for j in range(len(lmbd_vals)):
    +        dnn = DNN_numpy[i][j]
    +        
    +        train_pred = dnn.predict(X_train) 
    +        test_pred = dnn.predict(X_test)
    +
    +        train_accuracy[i][j] = accuracy_score(Y_train, train_pred)
    +        test_accuracy[i][j] = accuracy_score(Y_test, test_pred)
    +
    +        
    +fig, ax = plt.subplots(figsize = (10, 10))
    +sns.heatmap(train_accuracy, annot=True, ax=ax, cmap="viridis")
    +ax.set_title("Training Accuracy")
    +ax.set_ylabel("$\eta$")
    +ax.set_xlabel("$\lambda$")
    +plt.show()
    +
    +fig, ax = plt.subplots(figsize = (10, 10))
    +sns.heatmap(test_accuracy, annot=True, ax=ax, cmap="viridis")
    +ax.set_title("Test Accuracy")
    +ax.set_ylabel("$\eta$")
    +ax.set_xlabel("$\lambda$")
    +plt.show()
    +
    +
    +
    +
    +
    +
    +

    scikit-learn implementation#

    +

scikit-learn focuses more on traditional machine learning methods, such as regression, clustering, decision trees, etc. As such, it has only two types of neural networks: a Multi Layer Perceptron outputting continuous values, MLPRegressor, and a Multi Layer Perceptron outputting labels, MLPClassifier. We will see how simple it is to use these classes.
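As a small, illustrative sketch (on hypothetical toy data, not the MNIST example used below), MLPRegressor is used in much the same way as the classifier:

from sklearn.neural_network import MLPRegressor
import numpy as np

# hypothetical one-dimensional toy regression data
rng = np.random.default_rng(2023)
x = rng.uniform(0, 1, size=(200, 1))
y = 2.0 + 3.0 * x[:, 0] + 0.1 * rng.normal(size=200)

regr = MLPRegressor(hidden_layer_sizes=(50,), activation='logistic',
                    learning_rate_init=0.01, max_iter=1000)
regr.fit(x, y)
print("R2 score on training data:", regr.score(x, y))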

    +

    scikit-learn implements a few improvements from our neural network, +such as early stopping, a varying learning rate, different +optimization methods, etc. We would therefore expect a better +performance overall.
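For instance, a minimal sketch of switching on some of these features in MLPClassifier (the parameter values below are illustrative, not tuned):

from sklearn.neural_network import MLPClassifier

dnn = MLPClassifier(hidden_layer_sizes=(50,), activation='logistic', solver='sgd',
                    learning_rate='adaptive', learning_rate_init=0.01,
                    early_stopping=True, validation_fraction=0.1, max_iter=100)
dnn.fit(X_train, Y_train)
print("Accuracy score on test set: ", dnn.score(X_test, Y_test))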

    +
    +
    +
    from sklearn.neural_network import MLPClassifier
    +# store models for later use
    +DNN_scikit = np.zeros((len(eta_vals), len(lmbd_vals)), dtype=object)
    +
    +for i, eta in enumerate(eta_vals):
    +    for j, lmbd in enumerate(lmbd_vals):
    +        dnn = MLPClassifier(hidden_layer_sizes=(n_hidden_neurons), activation='logistic',
    +                            alpha=lmbd, learning_rate_init=eta, max_iter=epochs)
    +        dnn.fit(X_train, Y_train)
    +        
    +        DNN_scikit[i][j] = dnn
    +        
    +        print("Learning rate  = ", eta)
    +        print("Lambda = ", lmbd)
    +        print("Accuracy score on test set: ", dnn.score(X_test, Y_test))
    +        print()
    +
    +
    +
    +
    +
    +
    +

    Visualization#

    +
    +
    +
    # optional
    +# visual representation of grid search
    +# uses seaborn heatmap, could probably do this in matplotlib
    +import seaborn as sns
    +
    +sns.set()
    +
    +train_accuracy = np.zeros((len(eta_vals), len(lmbd_vals)))
    +test_accuracy = np.zeros((len(eta_vals), len(lmbd_vals)))
    +
    +for i in range(len(eta_vals)):
    +    for j in range(len(lmbd_vals)):
    +        dnn = DNN_scikit[i][j]
    +        
    +        train_pred = dnn.predict(X_train) 
    +        test_pred = dnn.predict(X_test)
    +
    +        train_accuracy[i][j] = accuracy_score(Y_train, train_pred)
    +        test_accuracy[i][j] = accuracy_score(Y_test, test_pred)
    +
    +        
    +fig, ax = plt.subplots(figsize = (10, 10))
    +sns.heatmap(train_accuracy, annot=True, ax=ax, cmap="viridis")
    +ax.set_title("Training Accuracy")
    +ax.set_ylabel("$\eta$")
    +ax.set_xlabel("$\lambda$")
    +plt.show()
    +
    +fig, ax = plt.subplots(figsize = (10, 10))
    +sns.heatmap(test_accuracy, annot=True, ax=ax, cmap="viridis")
    +ax.set_title("Test Accuracy")
    +ax.set_ylabel("$\eta$")
    +ax.set_xlabel("$\lambda$")
    +plt.show()
    +
    +
    +
    +
    +
    +
    +

    Building neural networks in Tensorflow and Keras#

    +

    Now we want to build on the experience gained from our neural network implementation in NumPy and scikit-learn +and use it to construct a neural network in Tensorflow. Once we have constructed a neural network in NumPy +and Tensorflow, building one in Keras is really quite trivial, though the performance may suffer.

    +

In our previous example we used only one hidden layer, and in this example we will use two. From this it should be quite clear how to build one using an arbitrary number of hidden layers, using data structures such as Python lists or NumPy arrays.

    +
    +
    +

    Tensorflow#

    +

Tensorflow is an open source machine learning library developed by the Google Brain team, initially for internal use at Google. It was released under the Apache 2.0 open source license on November 9, 2015.

    +

    Tensorflow is a computational framework that allows you to construct +machine learning models at different levels of abstraction, from +high-level, object-oriented APIs like Keras, down to the C++ kernels +that Tensorflow is built upon. The higher levels of abstraction are +simpler to use, but less flexible, and our choice of implementation +should reflect the problems we are trying to solve.

    +

    Tensorflow uses so-called graphs to represent your computation +in terms of the dependencies between individual operations, such that you first build a Tensorflow graph +to represent your model, and then create a Tensorflow session to run the graph.
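Note that the explicit graph-and-session workflow describes TensorFlow 1.x; in TensorFlow 2.x eager execution is the default, and a graph is typically traced implicitly with tf.function. A minimal illustrative sketch:

import tensorflow as tf

@tf.function  # traces this Python function into a TensorFlow graph
def affine(x, W, b):
    return tf.matmul(x, W) + b

x = tf.ones((2, 3))
W = tf.ones((3, 4))
b = tf.zeros((4,))
print(affine(x, W, b))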

    +

    In this guide we will analyze the same data as we did in our NumPy and +scikit-learn tutorial, gathered from the MNIST database of images. We +will give an introduction to the lower level Python Application +Program Interfaces (APIs), and see how we use them to build our graph. +Then we will build (effectively) the same graph in Keras, to see just +how simple solving a machine learning problem can be.

    +

    To install tensorflow on Unix/Linux systems, use pip as

    +
    +
    +
    pip3 install tensorflow
    +
    +
    +
    +
    +

    and/or if you use anaconda, just write (or install from the graphical user interface) +(current release of CPU-only TensorFlow)

    +
    +
    +
    conda create -n tf tensorflow
    +conda activate tf
    +
    +
    +
    +
    +

    To install the current release of GPU TensorFlow

    +
    +
    +
    conda create -n tf-gpu tensorflow-gpu
    +conda activate tf-gpu
    +
    +
    +
    +
    +
    +
    +

    Using Keras#

    +

Keras is a high-level neural network API that supports Tensorflow, CNTK and Theano as backends.
    +If you have Anaconda installed you may run the following command

    +
    +
    +
    conda install keras
    +
    +
    +
    +
    +

    You can look up the instructions here for more information.

    +

We will to a large extent use Keras in this course.

    +
    +
    +

    Collect and pre-process data#

    +

Let us look again at the MNIST data set.

    +
    +
    +
    # import necessary packages
    +import numpy as np
    +import matplotlib.pyplot as plt
    +import tensorflow as tf
    +from sklearn import datasets
    +
    +
    +# ensure the same random numbers appear every time
    +np.random.seed(0)
    +
    +# display images in notebook
    +%matplotlib inline
    +plt.rcParams['figure.figsize'] = (12,12)
    +
    +
    +# download MNIST dataset
    +digits = datasets.load_digits()
    +
    +# define inputs and labels
    +inputs = digits.images
    +labels = digits.target
    +
    +print("inputs = (n_inputs, pixel_width, pixel_height) = " + str(inputs.shape))
    +print("labels = (n_inputs) = " + str(labels.shape))
    +
    +
    +# flatten the image
    +# the value -1 means dimension is inferred from the remaining dimensions: 8x8 = 64
    +n_inputs = len(inputs)
    +inputs = inputs.reshape(n_inputs, -1)
    +print("X = (n_inputs, n_features) = " + str(inputs.shape))
    +
    +
    +# choose some random images to display
    +indices = np.arange(n_inputs)
    +random_indices = np.random.choice(indices, size=5)
    +
    +for i, image in enumerate(digits.images[random_indices]):
    +    plt.subplot(1, 5, i+1)
    +    plt.axis('off')
    +    plt.imshow(image, cmap=plt.cm.gray_r, interpolation='nearest')
    +    plt.title("Label: %d" % digits.target[random_indices[i]])
    +plt.show()
    +
    +
    +
    +
    +
    +
    +
    from tensorflow.keras.layers import Input
    +from tensorflow.keras.models import Sequential      #This allows appending layers to existing models
    +from tensorflow.keras.layers import Dense           #This allows defining the characteristics of a particular layer
    +from tensorflow.keras import optimizers             #This allows using whichever optimiser we want (sgd,adam,RMSprop)
    +from tensorflow.keras import regularizers           #This allows using whichever regularizer we want (l1,l2,l1_l2)
    +from tensorflow.keras.utils import to_categorical   #This allows using categorical cross entropy as the cost function
    +
    +from sklearn.model_selection import train_test_split
    +
    +# one-hot representation of labels
    +labels = to_categorical(labels)
    +
    +# split into train and test data
    +train_size = 0.8
    +test_size = 1 - train_size
    +X_train, X_test, Y_train, Y_test = train_test_split(inputs, labels, train_size=train_size,
    +                                                    test_size=test_size)
    +
    +
    +
    +
    +
    +
    +
    
    +epochs = 100
    +batch_size = 100
    +n_neurons_layer1 = 100
    +n_neurons_layer2 = 50
    +n_categories = 10
    +eta_vals = np.logspace(-5, 1, 7)
    +lmbd_vals = np.logspace(-5, 1, 7)
    +def create_neural_network_keras(n_neurons_layer1, n_neurons_layer2, n_categories, eta, lmbd):
    +    model = Sequential()
    +    model.add(Dense(n_neurons_layer1, activation='sigmoid', kernel_regularizer=regularizers.l2(lmbd)))
    +    model.add(Dense(n_neurons_layer2, activation='sigmoid', kernel_regularizer=regularizers.l2(lmbd)))
    +    model.add(Dense(n_categories, activation='softmax'))
    +    
+    sgd = optimizers.SGD(learning_rate=eta)
    +    model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])
    +    
    +    return model
    +
    +
    +
    +
    +
    +
    +
    DNN_keras = np.zeros((len(eta_vals), len(lmbd_vals)), dtype=object)
    +        
    +for i, eta in enumerate(eta_vals):
    +    for j, lmbd in enumerate(lmbd_vals):
    +        DNN = create_neural_network_keras(n_neurons_layer1, n_neurons_layer2, n_categories,
    +                                         eta=eta, lmbd=lmbd)
    +        DNN.fit(X_train, Y_train, epochs=epochs, batch_size=batch_size, verbose=0)
    +        scores = DNN.evaluate(X_test, Y_test)
    +        
    +        DNN_keras[i][j] = DNN
    +        
    +        print("Learning rate = ", eta)
    +        print("Lambda = ", lmbd)
    +        print("Test accuracy: %.3f" % scores[1])
    +        print()
    +
    +
    +
    +
    +
    +
    +
    # optional
    +# visual representation of grid search
    +# uses seaborn heatmap, could probably do this in matplotlib
    +import seaborn as sns
    +
    +sns.set()
    +
    +train_accuracy = np.zeros((len(eta_vals), len(lmbd_vals)))
    +test_accuracy = np.zeros((len(eta_vals), len(lmbd_vals)))
    +
    +for i in range(len(eta_vals)):
    +    for j in range(len(lmbd_vals)):
    +        DNN = DNN_keras[i][j]
    +
    +        train_accuracy[i][j] = DNN.evaluate(X_train, Y_train)[1]
    +        test_accuracy[i][j] = DNN.evaluate(X_test, Y_test)[1]
    +
    +        
    +fig, ax = plt.subplots(figsize = (10, 10))
    +sns.heatmap(train_accuracy, annot=True, ax=ax, cmap="viridis")
    +ax.set_title("Training Accuracy")
    +ax.set_ylabel("$\eta$")
    +ax.set_xlabel("$\lambda$")
    +plt.show()
    +
    +fig, ax = plt.subplots(figsize = (10, 10))
    +sns.heatmap(test_accuracy, annot=True, ax=ax, cmap="viridis")
    +ax.set_title("Test Accuracy")
    +ax.set_ylabel("$\eta$")
    +ax.set_xlabel("$\lambda$")
    +plt.show()
    +
    +
    +
    +
    +
    +
    +

    Building a neural network code#

    +

    Here we present a flexible object oriented codebase +for a feed forward neural network, along with a demonstration of how +to use it. Before we get into the details of the neural network, we +will first present some implementations of various schedulers, cost +functions and activation functions that can be used together with the +neural network.

    +

    The codes here were developed by Eric Reber and Gregor Kajda during spring 2023.

    +
    +

    Learning rate methods#

    +

The code below shows object oriented implementations of the Constant, Momentum, Adagrad, AdagradMomentum, RMS prop and Adam schedulers. All of the classes inherit from the shared abstract Scheduler class, and share the update_change() and reset() methods, allowing any of the schedulers to be used seamlessly during the training stage, as will later be shown in the fit() method of the neural network. The update_change() method takes a single parameter, the gradient (\(\delta^l_j a^{l-1}_k\)), and returns the change which will be subtracted from the weights. The reset() method takes no parameters and resets the desired variables. For Constant and Momentum, reset() does nothing.

    +
    +
    +
    import autograd.numpy as np
    +
    +class Scheduler:
    +    """
    +    Abstract class for Schedulers
    +    """
    +
    +    def __init__(self, eta):
    +        self.eta = eta
    +
    +    # should be overwritten
    +    def update_change(self, gradient):
    +        raise NotImplementedError
    +
    +    # overwritten if needed
    +    def reset(self):
    +        pass
    +
    +
    +class Constant(Scheduler):
    +    def __init__(self, eta):
    +        super().__init__(eta)
    +
    +    def update_change(self, gradient):
    +        return self.eta * gradient
    +    
    +    def reset(self):
    +        pass
    +
    +
    +class Momentum(Scheduler):
    +    def __init__(self, eta: float, momentum: float):
    +        super().__init__(eta)
    +        self.momentum = momentum
    +        self.change = 0
    +
    +    def update_change(self, gradient):
    +        self.change = self.momentum * self.change + self.eta * gradient
    +        return self.change
    +
    +    def reset(self):
    +        pass
    +
    +
    +class Adagrad(Scheduler):
    +    def __init__(self, eta):
    +        super().__init__(eta)
    +        self.G_t = None
    +
    +    def update_change(self, gradient):
+        delta = 1e-8  # avoid division by zero
    +
    +        if self.G_t is None:
    +            self.G_t = np.zeros((gradient.shape[0], gradient.shape[0]))
    +
    +        self.G_t += gradient @ gradient.T
    +
    +        G_t_inverse = 1 / (
    +            delta + np.sqrt(np.reshape(np.diagonal(self.G_t), (self.G_t.shape[0], 1)))
    +        )
    +        return self.eta * gradient * G_t_inverse
    +
    +    def reset(self):
    +        self.G_t = None
    +
    +
    +class AdagradMomentum(Scheduler):
    +    def __init__(self, eta, momentum):
    +        super().__init__(eta)
    +        self.G_t = None
    +        self.momentum = momentum
    +        self.change = 0
    +
    +    def update_change(self, gradient):
+        delta = 1e-8  # avoid division by zero
    +
    +        if self.G_t is None:
    +            self.G_t = np.zeros((gradient.shape[0], gradient.shape[0]))
    +
    +        self.G_t += gradient @ gradient.T
    +
    +        G_t_inverse = 1 / (
    +            delta + np.sqrt(np.reshape(np.diagonal(self.G_t), (self.G_t.shape[0], 1)))
    +        )
    +        self.change = self.change * self.momentum + self.eta * gradient * G_t_inverse
    +        return self.change
    +
    +    def reset(self):
    +        self.G_t = None
    +
    +
    +class RMS_prop(Scheduler):
    +    def __init__(self, eta, rho):
    +        super().__init__(eta)
    +        self.rho = rho
    +        self.second = 0.0
    +
    +    def update_change(self, gradient):
+        delta = 1e-8  # avoid division by zero
    +        self.second = self.rho * self.second + (1 - self.rho) * gradient * gradient
    +        return self.eta * gradient / (np.sqrt(self.second + delta))
    +
    +    def reset(self):
    +        self.second = 0.0
    +
    +
    +class Adam(Scheduler):
    +    def __init__(self, eta, rho, rho2):
    +        super().__init__(eta)
    +        self.rho = rho
    +        self.rho2 = rho2
    +        self.moment = 0
    +        self.second = 0
    +        self.n_epochs = 1
    +
    +    def update_change(self, gradient):
+        delta = 1e-8  # avoid division by zero
    +
    +        self.moment = self.rho * self.moment + (1 - self.rho) * gradient
    +        self.second = self.rho2 * self.second + (1 - self.rho2) * gradient * gradient
    +
    +        moment_corrected = self.moment / (1 - self.rho**self.n_epochs)
    +        second_corrected = self.second / (1 - self.rho2**self.n_epochs)
    +
    +        return self.eta * moment_corrected / (np.sqrt(second_corrected + delta))
    +
    +    def reset(self):
    +        self.n_epochs += 1
    +        self.moment = 0
    +        self.second = 0
    +
    +
    +
    +
    +
    +
    +

    Usage of the above learning rate schedulers#

    +

To initialize a scheduler, simply create the object and pass in the necessary parameters such as the learning rate and the momentum as shown below. As the Scheduler class is an abstract class, it should not be called directly, and will raise an error upon usage.

    +
    +
    +
    momentum_scheduler = Momentum(eta=1e-3, momentum=0.9)
    +adam_scheduler = Adam(eta=1e-3, rho=0.9, rho2=0.999)
    +
    +
    +
    +
    +

Here is a small example of how a segment of code using schedulers could look. Switching out the schedulers is simple.

    +
    +
    +
    weights = np.ones((3,3))
    +print(f"Before scheduler:\n{weights=}")
    +
    +epochs = 10
    +for e in range(epochs):
    +    gradient = np.random.rand(3, 3)
    +    change = adam_scheduler.update_change(gradient)
    +    weights = weights - change
    +    adam_scheduler.reset()
    +
    +print(f"\nAfter scheduler:\n{weights=}")
    +
    +
    +
    +
    +
    +
    +

    Cost functions#

    +

Here we discuss cost functions that can be used when creating the neural network. Every cost function takes the target vector as its parameter, and returns a function valued only at the prediction \(X\), such that it may easily be differentiated.

    +
    +
    +
    import autograd.numpy as np
    +
    +def CostOLS(target):
    +    
    +    def func(X):
    +        return (1.0 / target.shape[0]) * np.sum((target - X) ** 2)
    +
    +    return func
    +
    +
    +def CostLogReg(target):
    +
    +    def func(X):
    +        
    +        return -(1.0 / target.shape[0]) * np.sum(
    +            (target * np.log(X + 10e-10)) + ((1 - target) * np.log(1 - X + 10e-10))
    +        )
    +
    +    return func
    +
    +
    +def CostCrossEntropy(target):
    +    
    +    def func(X):
    +        return -(1.0 / target.size) * np.sum(target * np.log(X + 10e-10))
    +
    +    return func
    +
    +
    +
    +
    +

Below we give a short example of how these cost functions may be used to obtain results if you wish to test them out on your own using Autograd’s automatic differentiation.

    +
    +
    +
    from autograd import grad
    +
    +target = np.array([[1, 2, 3]]).T
    +a = np.array([[4, 5, 6]]).T
    +
    +cost_func = CostCrossEntropy
    +cost_func_derivative = grad(cost_func(target))
    +
    +valued_at_a = cost_func_derivative(a)
    +print(f"Derivative of cost function {cost_func.__name__} valued at a:\n{valued_at_a}")
    +
    +
    +
    +
    +
    +
    +

    Activation functions#

    +

Finally, before we look at the neural network, we will look at the activation functions which can be specified for the hidden layers and as the output function. Each function can be evaluated for any given vector or matrix X, and can be differentiated via derivate().

    +
    +
    +
    import autograd.numpy as np
    +from autograd import elementwise_grad
    +
    +def identity(X):
    +    return X
    +
    +
    +def sigmoid(X):
    +    try:
    +        return 1.0 / (1 + np.exp(-X))
    +    except FloatingPointError:
    +        return np.where(X > np.zeros(X.shape), np.ones(X.shape), np.zeros(X.shape))
    +
    +
    +def softmax(X):
    +    X = X - np.max(X, axis=-1, keepdims=True)
    +    delta = 10e-10
    +    return np.exp(X) / (np.sum(np.exp(X), axis=-1, keepdims=True) + delta)
    +
    +
    +def RELU(X):
    +    return np.where(X > np.zeros(X.shape), X, np.zeros(X.shape))
    +
    +
    +def LRELU(X):
    +    delta = 10e-4
    +    return np.where(X > np.zeros(X.shape), X, delta * X)
    +
    +
    +def derivate(func):
    +    if func.__name__ == "RELU":
    +
    +        def func(X):
    +            return np.where(X > 0, 1, 0)
    +
    +        return func
    +
    +    elif func.__name__ == "LRELU":
    +
    +        def func(X):
    +            delta = 10e-4
    +            return np.where(X > 0, 1, delta)
    +
    +        return func
    +
    +    else:
    +        return elementwise_grad(func)
    +
    +
    +
    +
    +

    Below follows a short demonstration of how to use an activation +function. The derivative of the activation function will be important +when calculating the output delta term during backpropagation. Note +that derivate() can also be used for cost functions for a more +generalized approach.

    +
    +
    +
    z = np.array([[4, 5, 6]]).T
    +print(f"Input to activation function:\n{z}")
    +
    +act_func = sigmoid
    +a = act_func(z)
    +print(f"\nOutput from {act_func.__name__} activation function:\n{a}")
    +
    +act_func_derivative = derivate(act_func)
+valued_at_z = act_func_derivative(z)
    +print(f"\nDerivative of {act_func.__name__} activation function valued at z:\n{valued_at_z}")
    +
    +
    +
    +
    +
    +
    +

    The Neural Network#

    +

Now that we have gotten a good understanding of the implementation of some important components, we can take a look at an object oriented implementation of a feed forward neural network. The feed forward neural network has been implemented as a class named FFNN, which can be initiated as a regressor or classifier dependent on the choice of cost function. The FFNN can have any number of input nodes, hidden layers with any number of hidden nodes, and any number of output nodes, meaning it can perform multiclass classification as well as binary classification and regression problems. Although there is a lot of code present, it makes for an easy to use and generalizable interface for creating many types of neural networks, as will be demonstrated below.

    +
    +
    +
    import math
    +import autograd.numpy as np
    +import sys
    +import warnings
    +from autograd import grad, elementwise_grad
    +from random import random, seed
    +from copy import deepcopy, copy
    +from typing import Tuple, Callable
    +from sklearn.utils import resample
    +
    +warnings.simplefilter("error")
    +
    +
    +class FFNN:
    +    """
    +    Description:
    +    ------------
    +        Feed Forward Neural Network with interface enabling flexible design of a
+        neural network's architecture and the specification of activation function
    +        in the hidden layers and output layer respectively. This model can be used
    +        for both regression and classification problems, depending on the output function.
    +
    +    Attributes:
    +    ------------
    +        I   dimensions (tuple[int]): A list of positive integers, which specifies the
    +            number of nodes in each of the networks layers. The first integer in the array
    +            defines the number of nodes in the input layer, the second integer defines number
    +            of nodes in the first hidden layer and so on until the last number, which
    +            specifies the number of nodes in the output layer.
    +        II  hidden_func (Callable): The activation function for the hidden layers
    +        III output_func (Callable): The activation function for the output layer
    +        IV  cost_func (Callable): Our cost function
    +        V   seed (int): Sets random seed, makes results reproducible
    +    """
    +
    +    def __init__(
    +        self,
    +        dimensions: tuple[int],
    +        hidden_func: Callable = sigmoid,
    +        output_func: Callable = lambda x: x,
    +        cost_func: Callable = CostOLS,
    +        seed: int = None,
    +    ):
    +        self.dimensions = dimensions
    +        self.hidden_func = hidden_func
    +        self.output_func = output_func
    +        self.cost_func = cost_func
    +        self.seed = seed
    +        self.weights = list()
    +        self.schedulers_weight = list()
    +        self.schedulers_bias = list()
    +        self.a_matrices = list()
    +        self.z_matrices = list()
    +        self.classification = None
    +
    +        self.reset_weights()
    +        self._set_classification()
    +
    +    def fit(
    +        self,
    +        X: np.ndarray,
    +        t: np.ndarray,
    +        scheduler: Scheduler,
    +        batches: int = 1,
    +        epochs: int = 100,
    +        lam: float = 0,
    +        X_val: np.ndarray = None,
    +        t_val: np.ndarray = None,
    +    ):
    +        """
    +        Description:
    +        ------------
+            This function trains the neural network by performing the feedforward and backpropagation
+            algorithms to update the network's weights.
    +
    +        Parameters:
    +        ------------
    +            I    X (np.ndarray) : training data
    +            II   t (np.ndarray) : target data
    +            III  scheduler (Scheduler) : specified scheduler (algorithm for optimization of gradient descent)
    +            IV   scheduler_args (list[int]) : list of all arguments necessary for scheduler
    +
    +        Optional Parameters:
    +        ------------
    +            V    batches (int) : number of batches the datasets are split into, default equal to 1
    +            VI   epochs (int) : number of iterations used to train the network, default equal to 100
    +            VII  lam (float) : regularization hyperparameter lambda
    +            VIII X_val (np.ndarray) : validation set
    +            IX   t_val (np.ndarray) : validation target set
    +
    +        Returns:
    +        ------------
    +            I   scores (dict) : A dictionary containing the performance metrics of the model.
    +                The number of the metrics depends on the parameters passed to the fit-function.
    +
    +        """
    +
    +        # setup 
    +        if self.seed is not None:
    +            np.random.seed(self.seed)
    +
    +        val_set = False
    +        if X_val is not None and t_val is not None:
    +            val_set = True
    +
    +        # creating arrays for score metrics
    +        train_errors = np.empty(epochs)
    +        train_errors.fill(np.nan)
    +        val_errors = np.empty(epochs)
    +        val_errors.fill(np.nan)
    +
    +        train_accs = np.empty(epochs)
    +        train_accs.fill(np.nan)
    +        val_accs = np.empty(epochs)
    +        val_accs.fill(np.nan)
    +
    +        self.schedulers_weight = list()
    +        self.schedulers_bias = list()
    +
    +        batch_size = X.shape[0] // batches
    +
    +        X, t = resample(X, t)
    +
    +        # this function returns a function valued only at X
    +        cost_function_train = self.cost_func(t)
    +        if val_set:
    +            cost_function_val = self.cost_func(t_val)
    +
    +        # create schedulers for each weight matrix
    +        for i in range(len(self.weights)):
    +            self.schedulers_weight.append(copy(scheduler))
    +            self.schedulers_bias.append(copy(scheduler))
    +
    +        print(f"{scheduler.__class__.__name__}: Eta={scheduler.eta}, Lambda={lam}")
    +
    +        try:
    +            for e in range(epochs):
    +                for i in range(batches):
    +                    # allows for minibatch gradient descent
    +                    if i == batches - 1:
+                        # If the for loop has reached the last batch, take all that's left
    +                        X_batch = X[i * batch_size :, :]
    +                        t_batch = t[i * batch_size :, :]
    +                    else:
    +                        X_batch = X[i * batch_size : (i + 1) * batch_size, :]
    +                        t_batch = t[i * batch_size : (i + 1) * batch_size, :]
    +
    +                    self._feedforward(X_batch)
    +                    self._backpropagate(X_batch, t_batch, lam)
    +
    +                # reset schedulers for each epoch (some schedulers pass in this call)
    +                for scheduler in self.schedulers_weight:
    +                    scheduler.reset()
    +
    +                for scheduler in self.schedulers_bias:
    +                    scheduler.reset()
    +
    +                # computing performance metrics
    +                pred_train = self.predict(X)
    +                train_error = cost_function_train(pred_train)
    +
    +                train_errors[e] = train_error
    +                if val_set:
    +                    
    +                    pred_val = self.predict(X_val)
    +                    val_error = cost_function_val(pred_val)
    +                    val_errors[e] = val_error
    +
    +                if self.classification:
    +                    train_acc = self._accuracy(self.predict(X), t)
    +                    train_accs[e] = train_acc
    +                    if val_set:
    +                        val_acc = self._accuracy(pred_val, t_val)
    +                        val_accs[e] = val_acc
    +
    +                # printing progress bar
    +                progression = e / epochs
    +                print_length = self._progress_bar(
    +                    progression,
    +                    train_error=train_errors[e],
    +                    train_acc=train_accs[e],
    +                    val_error=val_errors[e],
    +                    val_acc=val_accs[e],
    +                )
    +        except KeyboardInterrupt:
    +            # allows for stopping training at any point and seeing the result
    +            pass
    +
+        # visualization of training progression (similar to tensorflow progression bar)
    +        sys.stdout.write("\r" + " " * print_length)
    +        sys.stdout.flush()
    +        self._progress_bar(
    +            1,
    +            train_error=train_errors[e],
    +            train_acc=train_accs[e],
    +            val_error=val_errors[e],
    +            val_acc=val_accs[e],
    +        )
    +        sys.stdout.write("")
    +
    +        # return performance metrics for the entire run
    +        scores = dict()
    +
    +        scores["train_errors"] = train_errors
    +
    +        if val_set:
    +            scores["val_errors"] = val_errors
    +
    +        if self.classification:
    +            scores["train_accs"] = train_accs
    +
    +            if val_set:
    +                scores["val_accs"] = val_accs
    +
    +        return scores
    +
    +    def predict(self, X: np.ndarray, *, threshold=0.5):
    +        """
    +         Description:
    +         ------------
    +             Performs prediction after training of the network has been finished.
    +
    +         Parameters:
    +        ------------
    +             I   X (np.ndarray): The design matrix, with n rows of p features each
    +
    +         Optional Parameters:
    +         ------------
    +             II  threshold (float) : sets minimal value for a prediction to be predicted as the positive class
    +                 in classification problems
    +
    +         Returns:
    +         ------------
    +             I   z (np.ndarray): A prediction vector (row) for each row in our design matrix
    +                 This vector is thresholded if regression=False, meaning that classification results
    +                 in a vector of 1s and 0s, while regressions in an array of decimal numbers
    +
    +        """
    +
    +        predict = self._feedforward(X)
    +
    +        if self.classification:
    +            return np.where(predict > threshold, 1, 0)
    +        else:
    +            return predict
    +
    +    def reset_weights(self):
    +        """
    +        Description:
    +        ------------
    +            Resets/Reinitializes the weights in order to train the network for a new problem.
    +
    +        """
    +        if self.seed is not None:
    +            np.random.seed(self.seed)
    +
    +        self.weights = list()
    +        for i in range(len(self.dimensions) - 1):
    +            weight_array = np.random.randn(
    +                self.dimensions[i] + 1, self.dimensions[i + 1]
    +            )
    +            weight_array[0, :] = np.random.randn(self.dimensions[i + 1]) * 0.01
    +
    +            self.weights.append(weight_array)
    +
    +    def _feedforward(self, X: np.ndarray):
    +        """
    +        Description:
    +        ------------
    +            Calculates the activation of each layer starting at the input and ending at the output.
+            Each following activation is calculated from a weighted sum of each of the preceding
    +            activations (except in the case of the input layer).
    +
    +        Parameters:
    +        ------------
    +            I   X (np.ndarray): The design matrix, with n rows of p features each
    +
    +        Returns:
    +        ------------
    +            I   z (np.ndarray): A prediction vector (row) for each row in our design matrix
    +        """
    +
    +        # reset matrices
    +        self.a_matrices = list()
    +        self.z_matrices = list()
    +
    +        # if X is just a vector, make it into a matrix
    +        if len(X.shape) == 1:
    +            X = X.reshape((1, X.shape[0]))
    +
+        # Add a column of constant values (0.01) as the first column of the design
+        # matrix, acting as the bias input to our data
    +        bias = np.ones((X.shape[0], 1)) * 0.01
    +        X = np.hstack([bias, X])
    +
    +        # a^0, the nodes in the input layer (one a^0 for each row in X - where the
    +        # exponent indicates layer number).
    +        a = X
    +        self.a_matrices.append(a)
    +        self.z_matrices.append(a)
    +
    +        # The feed forward algorithm
    +        for i in range(len(self.weights)):
    +            if i < len(self.weights) - 1:
    +                z = a @ self.weights[i]
    +                self.z_matrices.append(z)
    +                a = self.hidden_func(z)
    +                # bias column again added to the data here
    +                bias = np.ones((a.shape[0], 1)) * 0.01
    +                a = np.hstack([bias, a])
    +                self.a_matrices.append(a)
    +            else:
    +                try:
    +                    # a^L, the nodes in our output layers
    +                    z = a @ self.weights[i]
    +                    a = self.output_func(z)
    +                    self.a_matrices.append(a)
    +                    self.z_matrices.append(z)
    +                except Exception as OverflowError:
    +                    print(
    +                        "OverflowError in fit() in FFNN\nHOW TO DEBUG ERROR: Consider lowering your learning rate or scheduler specific parameters such as momentum, or check if your input values need scaling"
    +                    )
    +
    +        # this will be a^L
    +        return a
    +
    +    def _backpropagate(self, X, t, lam):
    +        """
    +        Description:
    +        ------------
    +            Performs the backpropagation algorithm. In other words, this method
    +            calculates the gradient of all the layers starting at the
    +            output layer, and moving from right to left accumulates the gradient until
    +            the input layer is reached. Each layers respective weights are updated while
    +            the algorithm propagates backwards from the output layer (auto-differentation in reverse mode).
    +
    +        Parameters:
    +        ------------
    +            I   X (np.ndarray): The design matrix, with n rows of p features each.
    +            II  t (np.ndarray): The target vector, with n rows of p targets.
    +            III lam (float32): regularization parameter used to punish the weights in case of overfitting
    +
    +        Returns:
    +        ------------
    +            No return value.
    +
    +        """
    +        out_derivative = derivate(self.output_func)
    +        hidden_derivative = derivate(self.hidden_func)
    +
    +        for i in range(len(self.weights) - 1, -1, -1):
    +            # delta terms for output
    +            if i == len(self.weights) - 1:
    +                # for multi-class classification
    +                if (
    +                    self.output_func.__name__ == "softmax"
    +                ):
    +                    delta_matrix = self.a_matrices[i + 1] - t
    +                # for single class classification
    +                else:
    +                    cost_func_derivative = grad(self.cost_func(t))
    +                    delta_matrix = out_derivative(
    +                        self.z_matrices[i + 1]
    +                    ) * cost_func_derivative(self.a_matrices[i + 1])
    +
    +            # delta terms for hidden layer
    +            else:
    +                delta_matrix = (
    +                    self.weights[i + 1][1:, :] @ delta_matrix.T
    +                ).T * hidden_derivative(self.z_matrices[i + 1])
    +
    +            # calculate gradient
    +            gradient_weights = self.a_matrices[i][:, 1:].T @ delta_matrix
    +            gradient_bias = np.sum(delta_matrix, axis=0).reshape(
    +                1, delta_matrix.shape[1]
    +            )
    +
    +            # regularization term
    +            gradient_weights += self.weights[i][1:, :] * lam
    +
    +            # use scheduler
    +            update_matrix = np.vstack(
    +                [
    +                    self.schedulers_bias[i].update_change(gradient_bias),
    +                    self.schedulers_weight[i].update_change(gradient_weights),
    +                ]
    +            )
    +
    +            # update weights and bias
    +            self.weights[i] -= update_matrix
    +
    +    def _accuracy(self, prediction: np.ndarray, target: np.ndarray):
    +        """
    +        Description:
    +        ------------
    +            Calculates accuracy of given prediction to target
    +
    +        Parameters:
    +        ------------
+            I   prediction (np.ndarray): vector of predictions output by the network
    +                (1s and 0s in case of classification, and real numbers in case of regression)
    +            II  target (np.ndarray): vector of true values (What the network ideally should predict)
    +
    +        Returns:
    +        ------------
    +            A floating point number representing the percentage of correctly classified instances.
    +        """
    +        assert prediction.size == target.size
    +        return np.average((target == prediction))
    +    def _set_classification(self):
    +        """
    +        Description:
    +        ------------
+            Decides if FFNN acts as classifier (True) or regressor (False),
    +            sets self.classification during init()
    +        """
    +        self.classification = False
    +        if (
    +            self.cost_func.__name__ == "CostLogReg"
    +            or self.cost_func.__name__ == "CostCrossEntropy"
    +        ):
    +            self.classification = True
    +
    +    def _progress_bar(self, progression, **kwargs):
    +        """
    +        Description:
    +        ------------
    +            Displays progress of training
    +        """
    +        print_length = 40
    +        num_equals = int(progression * print_length)
    +        num_not = print_length - num_equals
    +        arrow = ">" if num_equals > 0 else ""
    +        bar = "[" + "=" * (num_equals - 1) + arrow + "-" * num_not + "]"
    +        perc_print = self._format(progression * 100, decimals=5)
    +        line = f"  {bar} {perc_print}% "
    +
    +        for key in kwargs:
    +            if not np.isnan(kwargs[key]):
    +                value = self._format(kwargs[key], decimals=4)
    +                line += f"| {key}: {value} "
    +        sys.stdout.write("\r" + line)
    +        sys.stdout.flush()
    +        return len(line)
    +
    +    def _format(self, value, decimals=4):
    +        """
    +        Description:
    +        ------------
    +            Formats decimal numbers for progress bar
    +        """
    +        if value > 0:
    +            v = value
    +        elif value < 0:
    +            v = -10 * value
    +        else:
    +            v = 1
    +        n = 1 + math.floor(math.log10(v))
    +        if n >= decimals - 1:
    +            return str(round(value))
    +        return f"{value:.{decimals-n-1}f}"
    +
    +
    +
    +
    +

    Before we make a model, we will quickly generate a dataset we can use +for our linear regression problem as shown below

    +
    +
    +
    import autograd.numpy as np
    +from sklearn.model_selection import train_test_split
    +
    +def SkrankeFunction(x, y):
    +    return np.ravel(0 + 1*x + 2*y + 3*x**2 + 4*x*y + 5*y**2)
    +
    +def create_X(x, y, n):
    +    if len(x.shape) > 1:
    +        x = np.ravel(x)
    +        y = np.ravel(y)
    +
    +    N = len(x)
    +    l = int((n + 1) * (n + 2) / 2)  # Number of elements in beta
    +    X = np.ones((N, l))
    +
    +    for i in range(1, n + 1):
    +        q = int((i) * (i + 1) / 2)
    +        for k in range(i + 1):
    +            X[:, q + k] = (x ** (i - k)) * (y**k)
    +
    +    return X
    +
    +step=0.5
    +x = np.arange(0, 1, step)
    +y = np.arange(0, 1, step)
    +x, y = np.meshgrid(x, y)
    +target = SkrankeFunction(x, y)
    +target = target.reshape(target.shape[0], 1)
    +
    +poly_degree=3
    +X = create_X(x, y, poly_degree)
    +
    +X_train, X_test, t_train, t_test = train_test_split(X, target)
    +
    +
    +
    +
    +

Now that we have our dataset ready for the regression, we can create our regressor. Note that with the seed parameter, we can make sure our results stay the same every time we run the neural network. For initialization, we simply specify the dimensions (we want the number of input nodes to equal the number of features in the design matrix, and a single output node to predict one value).

    +
    +
    +
    input_nodes = X_train.shape[1]
    +output_nodes = 1
    +
    +linear_regression = FFNN((input_nodes, output_nodes), output_func=identity, cost_func=CostOLS, seed=2023)
    +
    +
    +
    +
    +

    We then fit our model with our training data using the scheduler of our choice.

    +
    +
    +
    linear_regression.reset_weights() # reset weights such that previous runs or reruns don't affect the weights
    +
    +scheduler = Constant(eta=1e-3)
    +scores = linear_regression.fit(X_train, t_train, scheduler)
    +
    +
    +
    +
    +

Due to the progress bar we can see the MSE (train_error) throughout the FFNN’s training. Note that the fit() function has some optional parameters with default arguments. For example, the regularization hyperparameter can be left out if not needed, and the FFNN will by default run for 100 epochs. These can easily be changed, for example:

    +
    +
    +
    linear_regression.reset_weights() # reset weights such that previous runs or reruns don't affect the weights
    +
    +scores = linear_regression.fit(X_train, t_train, scheduler, lam=1e-4, epochs=1000)
    +
    +
    +
    +
    +

    We see that given more epochs to train on, the regressor reaches a lower MSE.
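If you wish to inspect the training progression afterwards, the returned scores dictionary can be plotted directly. A minimal sketch, using matplotlib as earlier in these notes:

import matplotlib.pyplot as plt

plt.plot(scores["train_errors"], label="Train MSE")
plt.xlabel("Epochs")
plt.ylabel("MSE")
plt.legend()
plt.show()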

    +

Let us then switch to a binary classification problem. We use a binary classification dataset, and follow a similar setup to the regression case.

    +
    +
    +
    from sklearn.datasets import load_breast_cancer
    +from sklearn.preprocessing import MinMaxScaler
    +
    +wisconsin = load_breast_cancer()
    +X = wisconsin.data
    +target = wisconsin.target
    +target = target.reshape(target.shape[0], 1)
    +
    +X_train, X_val, t_train, t_val = train_test_split(X, target)
    +
    +scaler = MinMaxScaler()
    +scaler.fit(X_train)
    +X_train = scaler.transform(X_train)
    +X_val = scaler.transform(X_val)
    +
    +
    +
    +
    +
    +
    +
    input_nodes = X_train.shape[1]
    +output_nodes = 1
    +
    +logistic_regression = FFNN((input_nodes, output_nodes), output_func=sigmoid, cost_func=CostLogReg, seed=2023)
    +
    +
    +
    +
    +

    We will now make use of our validation data by passing it into our fit function as a keyword argument

    +
    +
    +
    logistic_regression.reset_weights() # reset weights such that previous runs or reruns don't affect the weights
    +
    +scheduler = Adam(eta=1e-3, rho=0.9, rho2=0.999)
    +scores = logistic_regression.fit(X_train, t_train, scheduler, epochs=1000, X_val=X_val, t_val=t_val)
    +
    +
    +
    +
    +

    Finally, we will create a neural network with 2 hidden layers with activation functions.

    +
    +
    +
    input_nodes = X_train.shape[1]
    +hidden_nodes1 = 100
    +hidden_nodes2 = 30
    +output_nodes = 1
    +
    +dims = (input_nodes, hidden_nodes1, hidden_nodes2, output_nodes)
    +
    +neural_network = FFNN(dims, hidden_func=RELU, output_func=sigmoid, cost_func=CostLogReg, seed=2023)
    +
    +
    +
    +
    +
    +
    +
    neural_network.reset_weights() # reset weights such that previous runs or reruns don't affect the weights
    +
    +scheduler = Adam(eta=1e-4, rho=0.9, rho2=0.999)
    +scores = neural_network.fit(X_train, t_train, scheduler, epochs=1000, X_val=X_val, t_val=t_val)
    +
    +
    +
    +
    +
    +
    +

    Multiclass classification#

    +

Finally, we will demonstrate the use case of multiclass classification using our FFNN with the famous MNIST dataset, which contains images of the digits 0 through 9.

    +
    +
    +
    from sklearn.datasets import load_digits
    +
    +def onehot(target: np.ndarray):
    +    onehot = np.zeros((target.size, target.max() + 1))
    +    onehot[np.arange(target.size), target] = 1
    +    return onehot
    +
    +digits = load_digits()
    +
    +X = digits.data
    +target = digits.target
    +target = onehot(target)
    +
    +input_nodes = 64
    +hidden_nodes1 = 100
    +hidden_nodes2 = 30
    +output_nodes = 10
    +
    +dims = (input_nodes, hidden_nodes1, hidden_nodes2, output_nodes)
    +
    +multiclass = FFNN(dims, hidden_func=LRELU, output_func=softmax, cost_func=CostCrossEntropy)
    +
    +multiclass.reset_weights() # reset weights such that previous runs or reruns don't affect the weights
    +
    +scheduler = Adam(eta=1e-4, rho=0.9, rho2=0.999)
    +scores = multiclass.fit(X, target, scheduler, epochs=1000)
    +
    +
    +
    +
    +
    +
    +
    +
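As a small optional sketch, we can inspect the multiclass predictions further, for instance with scikit-learn's confusion matrix; since predict() returns thresholded one-hot style predictions for classification, we convert back to digit labels with argmax:

from sklearn.metrics import confusion_matrix

pred_onehot = multiclass.predict(X)           # thresholded (0/1) predictions per class
pred_labels = np.argmax(pred_onehot, axis=1)  # back to digit labels
true_labels = np.argmax(target, axis=1)

print(confusion_matrix(true_labels, pred_labels))
print("Final training accuracy: ", scores["train_accs"][-1])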

    Testing the XOR gate and other gates#

    +

    Let us now use our code to test the XOR gate.

    +
    +
    +
    X = np.array([ [0, 0], [0, 1], [1, 0],[1, 1]],dtype=np.float64)
    +
    +# The XOR gate
    +yXOR = np.array( [[ 0], [1] ,[1], [0]])
    +
    +input_nodes = X.shape[1]
    +output_nodes = 1
    +
    +logistic_regression = FFNN((input_nodes, output_nodes), output_func=sigmoid, cost_func=CostLogReg, seed=2023)
    +logistic_regression.reset_weights() # reset weights such that previous runs or reruns don't affect the weights
    +scheduler = Adam(eta=1e-1, rho=0.9, rho2=0.999)
    +scores = logistic_regression.fit(X, yXOR, scheduler, epochs=1000)
    +
    +
    +
    +
    +

Not bad, but the results depend strongly on the learning rate. Try different learning rates.
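A minimal sketch of such an experiment could simply loop over a few candidate learning rates (the values below are illustrative):

for eta in [1e-3, 1e-2, 1e-1, 1.0]:
    logistic_regression.reset_weights()
    scheduler = Adam(eta=eta, rho=0.9, rho2=0.999)
    scores = logistic_regression.fit(X, yXOR, scheduler, epochs=1000)
    print(f"\neta={eta}: final train error {scores['train_errors'][-1]:.4f}")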

    +
    +
    + + + + +
    + + + + + + + + +
    + + + +
    + + +
    +
    + + +
    + + +
    +
    +
    + + + + + +
    +
    + + \ No newline at end of file diff --git a/doc/LectureNotes/_build/html/week43.html b/doc/LectureNotes/_build/html/week43.html new file mode 100644 index 000000000..0f4021673 --- /dev/null +++ b/doc/LectureNotes/_build/html/week43.html @@ -0,0 +1,4614 @@ + + + + + + + + + + + Week 43: Deep Learning: Constructing a Neural Network code and solving differential equations — Applied Data Analysis and Machine Learning + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +


    Week 43: Deep Learning: Constructing a Neural Network code and solving differential equations#

    +

    Morten Hjorth-Jensen, Department of Physics, University of Oslo, Norway

    +

    Date: October 20, 2025

    +
    +

    Plans for week 43#

    +

    Material for the lecture on Monday October 20, 2025.

    +
      +
1. Reminder from last week, see also the lecture notes from week 42 at https://compphysics.github.io/MachineLearning/doc/LectureNotes/_build/html/week42.html as well as those from week 41 at https://compphysics.github.io/MachineLearning/doc/LectureNotes/_build/html/week41.html.

2. Building our own Feed-forward Neural Network.

3. Coding examples using Tensorflow/Keras and Pytorch. The Pytorch examples are adapted from Raschka’s text, see chapters 11-13.

4. Start discussions on how to use neural networks for solving differential equations (ordinary and partial ones). This topic continues next week as well.

5. Video of lecture at https://youtu.be/Gi6mzxAT0Ew

6. Whiteboard notes at CompPhysics/MachineLearning
    +
    +
    +

    Exercises and lab session week 43#

    +

    Lab sessions on Tuesday and Wednesday.

    +
      +
1. Work on writing your own neural network code and discussions of project 2. If you didn’t get time to do the exercises from the last two weeks, we recommend doing so, as these exercises give you the basic elements of a neural network code.

2. The exercises this week are tailored to the optional part of project 2, and deal with studying ways to display results from classification problems.
    +
    +
    +

    Using Automatic differentiation#

    +

In our discussions of ordinary differential equations and neural network codes we will also study the use of Autograd for computing gradients in deep learning, see for example https://www.youtube.com/watch?v=fRf4l5qaX1M&ab_channel=AlexSmola. For the documentation of Autograd and examples, see the Autograd documentation at HIPS/autograd and the lecture slides from week 41 at https://compphysics.github.io/MachineLearning/doc/LectureNotes/_build/html/week41.html.
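As a minimal illustration of what Autograd does for us, the sketch below differentiates a simple scalar function; the function and the input values are of course only an example.

import autograd.numpy as np   # thinly wrapped NumPy, so that Autograd can trace the operations
from autograd import grad

def f(x):
    # a simple scalar function of a vector argument
    return np.sum(x**2) + np.sin(x[0])

df = grad(f)                  # df is a new function returning the gradient of f

x = np.array([1.0, 2.0, 3.0])
print(df(x))                  # analytically: [2*x[0] + cos(x[0]), 2*x[1], 2*x[2]]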

    +
    +
    +

    Back propagation and automatic differentiation#

    +

    For more details on the back propagation algorithm and automatic differentiation see

    +
      +
1. https://www.jmlr.org/papers/volume18/17-468/17-468.pdf

2. https://deepimaging.github.io/lectures/lecture_11_Backpropagation.pdf

3. Slides 12-44 at http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture4.pdf
    +
    +
    +

    Lecture Monday October 20#

    +
    +
    +

Setting up the back propagation algorithm and algorithm for a feed forward NN, initializations#

    +

    This is a reminder from last week.

    +

    The architecture (our model).

    +
      +
1. Set up your inputs and outputs (scalars, vectors, matrices or higher-order arrays)

2. Define the number of hidden layers and hidden nodes

3. Define activation functions for hidden layers and output layers

4. Define the optimizer (plain learning rate, momentum, ADAgrad, RMSprop, ADAM etc.) and an array of initial learning rates

5. Define the cost function and possible regularization terms with hyperparameters

6. Initialize weights and biases

7. Fix the number of iterations for the feed forward part and back propagation part
    +
    +
    +

    Setting up the back propagation algorithm, part 1#

    +

    Let us write this out in the form of an algorithm.

    +

    First, we set up the input data \(\boldsymbol{x}\) and the activations +\(\boldsymbol{z}_1\) of the input layer and compute the activation function and +the pertinent outputs \(\boldsymbol{a}^1\).

    +

Secondly, we perform the feed forward until we reach the output layer, computing all \(\boldsymbol{z}_l\) and, via the activation function, the pertinent outputs \(\boldsymbol{a}^l\) for \(l=1,2,3,\dots,L\).

    +

    Notation: The first hidden layer has \(l=1\) as label and the final output layer has \(l=L\).

    +
    +
    +

    Setting up the back propagation algorithm, part 2#

    +

Thereafter we compute the output error \(\boldsymbol{\delta}^L\) by computing all

    +
    +\[ +\delta_j^L = \sigma'(z_j^L)\frac{\partial {\cal C}}{\partial (a_j^L)}. +\]
    +

Then we compute the back propagated error for each \(l=L-1,L-2,\dots,1\) as

    +
    +\[ +\delta_j^l = \sum_k \delta_k^{l+1}w_{kj}^{l+1}\sigma'(z_j^l). +\]
    +
    +
    +

    Setting up the Back propagation algorithm, part 3#

    +

Finally, we update the weights and the biases using gradient descent for each \(l=L-1,L-2,\dots,1\) (where \(l=1\) denotes the first hidden layer), according to the rules

    +
+\[ +w_{ij}^l \leftarrow w_{ij}^l - \eta \delta_j^l a_i^{l-1}, +\]
    +
    +\[ +b_j^l \leftarrow b_j^l-\eta \frac{\partial {\cal C}}{\partial b_j^l}=b_j^l-\eta \delta_j^l, +\]
    +

    with \(\eta\) being the learning rate.

    +
    +
    +

    Updating the gradients#

    +

With the back propagated error for each \(l=L-1,L-2,\dots,1\) given as

    +
    +\[ +\delta_j^l = \sum_k \delta_k^{l+1}w_{kj}^{l+1}\sigma'(z_j^l), +\]
    +

we update the weights and the biases using gradient descent for each \(l=L-1,L-2,\dots,1\) according to the rules

    +
+\[ +w_{ij}^l \leftarrow w_{ij}^l - \eta \delta_j^l a_i^{l-1}, +\]
    +
    +\[ +b_j^l \leftarrow b_j^l-\eta \frac{\partial {\cal C}}{\partial b_j^l}=b_j^l-\eta \delta_j^l, +\]
    +
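To make these update rules concrete, here is a minimal NumPy sketch of a single backpropagation step for a network with one hidden layer, sigmoid activations and a squared-error cost. The layer sizes, variable names and learning rate are chosen purely for illustration; a full, general implementation follows later in these notes.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(2023)
x = rng.normal(size=(4, 1))            # one input sample with four features (column vector)
t = np.array([[1.0]])                  # its target

# one hidden layer with three nodes and a single output node
W1, b1 = rng.normal(size=(3, 4)), np.zeros((3, 1))
W2, b2 = rng.normal(size=(1, 3)), np.zeros((1, 1))
eta = 0.1                              # learning rate

# feed forward
z1 = W1 @ x + b1;   a1 = sigmoid(z1)
z2 = W2 @ a1 + b2;  a2 = sigmoid(z2)

# output error: delta^L = sigma'(z^L) * dC/da^L, here with C = 0.5*(a^L - t)^2
delta2 = sigmoid(z2) * (1.0 - sigmoid(z2)) * (a2 - t)
# back propagated error for the hidden layer: delta^l = (W^{l+1})^T delta^{l+1} * sigma'(z^l)
delta1 = (W2.T @ delta2) * sigmoid(z1) * (1.0 - sigmoid(z1))

# gradient descent updates of weights and biases
W2 -= eta * delta2 @ a1.T;  b2 -= eta * delta2
W1 -= eta * delta1 @ x.T;   b1 -= eta * delta1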
    +
    +

    Activation functions#

    +

    A property that characterizes a neural network, other than its +connectivity, is the choice of activation function(s). The following +restrictions are imposed on an activation function for an FFNN to +fulfill the universal approximation theorem

    +
      +
• Non-constant

• Bounded

• Monotonically-increasing

• Continuous
    +
    +

    Activation functions, examples#

    +

    Typical examples are the logistic Sigmoid

    +
    +\[ +\sigma(x) = \frac{1}{1 + e^{-x}}, +\]
    +

    and the hyperbolic tangent function

    +
    +\[ +\sigma(x) = \tanh(x) +\]
    +
    +
    +
    +

    The RELU function family#

    +

    The ReLU activation function suffers from a problem known as the dying +ReLUs: during training, some neurons effectively die, meaning they +stop outputting anything other than 0.

    +

In some cases, you may find that half of your network’s neurons are dead, especially if you used a large learning rate. During training, if a neuron’s weights get updated such that the weighted sum of the neuron’s inputs is negative, it will start outputting 0. When this happens, the neuron is unlikely to come back to life since the gradient of the ReLU function is 0 when its input is negative.

    +
    +
    +

    ELU function#

    +

    To solve this problem, nowadays practitioners use a variant of the +ReLU function, such as the leaky ReLU discussed above or the so-called +exponential linear unit (ELU) function

    +
    +\[\begin{split} +ELU(z) = \left\{\begin{array}{cc} \alpha\left( \exp{(z)}-1\right) & z < 0,\\ z & z \ge 0.\end{array}\right. +\end{split}\]
    +
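A minimal NumPy sketch of the ELU, together with the leaky ReLU for comparison (the α values used here, 1 for ELU and 0.01 for the leaky ReLU, are only the common defaults discussed below):

import numpy as np

def elu(z, alpha=1.0):
    # alpha*(exp(z) - 1) for z < 0, and z otherwise
    return np.where(z < 0, alpha * (np.exp(z) - 1.0), z)

def leaky_relu(z, alpha=0.01):
    # alpha*z for z < 0, and z otherwise
    return np.where(z < 0, alpha * z, z)

z = np.linspace(-3, 3, 7)
print(elu(z))
print(leaky_relu(z))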
    +
    +

    Which activation function should we use?#

    +

    In general it seems that the ELU activation function is better than +the leaky ReLU function (and its variants), which is better than +ReLU. ReLU performs better than \(\tanh\) which in turn performs better +than the logistic function.

    +

If runtime performance is an issue, then you may opt for the leaky ReLU function over the ELU function. If you don’t want to tweak yet another hyperparameter, you may just use the default \(\alpha\) of \(0.01\) for the leaky ReLU, and \(1\) for ELU. If you have spare time and computing power, you can use cross-validation or bootstrap to evaluate other activation functions.

    +
    +
    +

    More on activation functions, output layers#

    +

    In most cases you can use the ReLU activation function in the hidden +layers (or one of its variants).

    +

    It is a bit faster to compute than other activation functions, and the +gradient descent optimization does in general not get stuck.

    +

    For the output layer:

    +
      +
• For classification tasks, the softmax activation function is generally a good choice (when the classes are mutually exclusive).

• For regression tasks, you can simply use no activation function at all.
    +
    +
    +

    Building neural networks in Tensorflow and Keras#

    +

    Now we want to build on the experience gained from our neural network implementation in NumPy and scikit-learn +and use it to construct a neural network in Tensorflow. Once we have constructed a neural network in NumPy +and Tensorflow, building one in Keras is really quite trivial, though the performance may suffer.

    +

In our previous example we used only one hidden layer, and in this one we will use two. From this it should be quite clear how to build one using an arbitrary number of hidden layers, using data structures such as Python lists or NumPy arrays.

    +
    +
    +

    Tensorflow#

    +

Tensorflow is an open source machine learning library developed by the Google Brain team for internal use. It was released under the Apache 2.0 open source license on November 9, 2015.

    +

    Tensorflow is a computational framework that allows you to construct +machine learning models at different levels of abstraction, from +high-level, object-oriented APIs like Keras, down to the C++ kernels +that Tensorflow is built upon. The higher levels of abstraction are +simpler to use, but less flexible, and our choice of implementation +should reflect the problems we are trying to solve.

    +

    Tensorflow uses so-called graphs to represent your computation +in terms of the dependencies between individual operations, such that you first build a Tensorflow graph +to represent your model, and then create a Tensorflow session to run the graph.

    +

    In this guide we will analyze the same data as we did in our NumPy and +scikit-learn tutorial, gathered from the MNIST database of images. We +will give an introduction to the lower level Python Application +Program Interfaces (APIs), and see how we use them to build our graph. +Then we will build (effectively) the same graph in Keras, to see just +how simple solving a machine learning problem can be.

    +

    To install tensorflow on Unix/Linux systems, use pip as

    +
    +
    +
    pip3 install tensorflow
    +
    +
    +
    +
    +

    and/or if you use anaconda, just write (or install from the graphical user interface) +(current release of CPU-only TensorFlow)

    +
    +
    +
    conda create -n tf tensorflow
    +conda activate tf
    +
    +
    +
    +
    +

    To install the current release of GPU TensorFlow

    +
    +
    +
    conda create -n tf-gpu tensorflow-gpu
    +conda activate tf-gpu
    +
    +
    +
    +
    +
    +
    +

    Using Keras#

    +

Keras is a high-level neural network API that supports Tensorflow, CNTK and Theano as backends.
    +If you have Anaconda installed you may run the following command

    +
    +
    +
    conda install keras
    +
    +
    +
    +
    +

    You can look up the instructions here for more information.

    +

    We will to a large extent use keras in this course.

    +
    +
    +

    Collect and pre-process data#

    +

Let us look again at the MNIST data set.

    +
    +
    +
    %matplotlib inline
    +
    +# import necessary packages
    +import numpy as np
    +import matplotlib.pyplot as plt
    +import tensorflow as tf
    +from sklearn import datasets
    +
    +
    +# ensure the same random numbers appear every time
    +np.random.seed(0)
    +
    +# display images in notebook
    +%matplotlib inline
    +plt.rcParams['figure.figsize'] = (12,12)
    +
    +
    +# download MNIST dataset
    +digits = datasets.load_digits()
    +
    +# define inputs and labels
    +inputs = digits.images
    +labels = digits.target
    +
    +print("inputs = (n_inputs, pixel_width, pixel_height) = " + str(inputs.shape))
    +print("labels = (n_inputs) = " + str(labels.shape))
    +
    +
    +# flatten the image
    +# the value -1 means dimension is inferred from the remaining dimensions: 8x8 = 64
    +n_inputs = len(inputs)
    +inputs = inputs.reshape(n_inputs, -1)
    +print("X = (n_inputs, n_features) = " + str(inputs.shape))
    +
    +
    +# choose some random images to display
    +indices = np.arange(n_inputs)
    +random_indices = np.random.choice(indices, size=5)
    +
    +for i, image in enumerate(digits.images[random_indices]):
    +    plt.subplot(1, 5, i+1)
    +    plt.axis('off')
    +    plt.imshow(image, cmap=plt.cm.gray_r, interpolation='nearest')
    +    plt.title("Label: %d" % digits.target[random_indices[i]])
    +plt.show()
    +
    +
    +
    +
    +
    +
    +
    from tensorflow.keras.layers import Input
    +from tensorflow.keras.models import Sequential      #This allows appending layers to existing models
    +from tensorflow.keras.layers import Dense           #This allows defining the characteristics of a particular layer
    +from tensorflow.keras import optimizers             #This allows using whichever optimiser we want (sgd,adam,RMSprop)
    +from tensorflow.keras import regularizers           #This allows using whichever regularizer we want (l1,l2,l1_l2)
    +from tensorflow.keras.utils import to_categorical   #This allows using categorical cross entropy as the cost function
    +
    +from sklearn.model_selection import train_test_split
    +
    +# one-hot representation of labels
    +labels = to_categorical(labels)
    +
    +# split into train and test data
    +train_size = 0.8
    +test_size = 1 - train_size
    +X_train, X_test, Y_train, Y_test = train_test_split(inputs, labels, train_size=train_size,
    +                                                    test_size=test_size)
    +
    +
    +
    +
    +
    +
    +
    
    +epochs = 100
    +batch_size = 100
    +n_neurons_layer1 = 100
    +n_neurons_layer2 = 50
    +n_categories = 10
    +eta_vals = np.logspace(-5, 1, 7)
    +lmbd_vals = np.logspace(-5, 1, 7)
    +def create_neural_network_keras(n_neurons_layer1, n_neurons_layer2, n_categories, eta, lmbd):
    +    model = Sequential()
    +    model.add(Dense(n_neurons_layer1, activation='sigmoid', kernel_regularizer=regularizers.l2(lmbd)))
    +    model.add(Dense(n_neurons_layer2, activation='sigmoid', kernel_regularizer=regularizers.l2(lmbd)))
    +    model.add(Dense(n_categories, activation='softmax'))
    +    
    +    sgd = optimizers.SGD(learning_rate=eta)
    +    model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])
    +    
    +    return model
    +
    +
    +
    +
    +
    +
    +
    DNN_keras = np.zeros((len(eta_vals), len(lmbd_vals)), dtype=object)
    +        
    +for i, eta in enumerate(eta_vals):
    +    for j, lmbd in enumerate(lmbd_vals):
    +        DNN = create_neural_network_keras(n_neurons_layer1, n_neurons_layer2, n_categories,
    +                                         eta=eta, lmbd=lmbd)
    +        DNN.fit(X_train, Y_train, epochs=epochs, batch_size=batch_size, verbose=0)
    +        scores = DNN.evaluate(X_test, Y_test)
    +        
    +        DNN_keras[i][j] = DNN
    +        
    +        print("Learning rate = ", eta)
    +        print("Lambda = ", lmbd)
    +        print("Test accuracy: %.3f" % scores[1])
    +        print()
    +
    +
    +
    +
    +
    +
    +
    # optional
    +# visual representation of grid search
    +# uses seaborn heatmap, could probably do this in matplotlib
    +import seaborn as sns
    +
    +sns.set()
    +
    +train_accuracy = np.zeros((len(eta_vals), len(lmbd_vals)))
    +test_accuracy = np.zeros((len(eta_vals), len(lmbd_vals)))
    +
    +for i in range(len(eta_vals)):
    +    for j in range(len(lmbd_vals)):
    +        DNN = DNN_keras[i][j]
    +
    +        train_accuracy[i][j] = DNN.evaluate(X_train, Y_train)[1]
    +        test_accuracy[i][j] = DNN.evaluate(X_test, Y_test)[1]
    +
    +        
    +fig, ax = plt.subplots(figsize = (10, 10))
    +sns.heatmap(train_accuracy, annot=True, ax=ax, cmap="viridis")
    +ax.set_title("Training Accuracy")
+ax.set_ylabel(r"$\eta$")
+ax.set_xlabel(r"$\lambda$")
    +plt.show()
    +
    +fig, ax = plt.subplots(figsize = (10, 10))
    +sns.heatmap(test_accuracy, annot=True, ax=ax, cmap="viridis")
    +ax.set_title("Test Accuracy")
+ax.set_ylabel(r"$\eta$")
+ax.set_xlabel(r"$\lambda$")
    +plt.show()
    +
    +
    +
    +
    +
    +
    +

    Using Pytorch with the full MNIST data set#

    +
    +
    +
    import torch
    +import torch.nn as nn
    +import torch.optim as optim
    +import torchvision
    +import torchvision.transforms as transforms
    +
    +# Device configuration: use GPU if available
    +device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    +
    +# MNIST dataset (downloads if not already present)
    +transform = transforms.Compose([
    +    transforms.ToTensor(),
    +    transforms.Normalize((0.5,), (0.5,))  # normalize to mean=0.5, std=0.5 (approx. [-1,1] pixel range)
    +])
    +train_dataset = torchvision.datasets.MNIST(root='./data', train=True, download=True, transform=transform)
    +test_dataset  = torchvision.datasets.MNIST(root='./data', train=False, download=True, transform=transform)
    +
    +train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=64, shuffle=True)
    +test_loader  = torch.utils.data.DataLoader(test_dataset, batch_size=64, shuffle=False)
    +
    +
    +class NeuralNet(nn.Module):
    +    def __init__(self):
    +        super(NeuralNet, self).__init__()
    +        self.fc1 = nn.Linear(28*28, 100)   # first hidden layer (784 -> 100)
    +        self.fc2 = nn.Linear(100, 100)    # second hidden layer (100 -> 100)
    +        self.fc3 = nn.Linear(100, 10)     # output layer (100 -> 10 classes)
    +    def forward(self, x):
    +        x = x.view(x.size(0), -1)         # flatten images into vectors of size 784
    +        x = torch.relu(self.fc1(x))       # hidden layer 1 + ReLU activation
    +        x = torch.relu(self.fc2(x))       # hidden layer 2 + ReLU activation
    +        x = self.fc3(x)                   # output layer (logits for 10 classes)
    +        return x
    +
    +model = NeuralNet().to(device)
    +
    +
    +criterion = nn.CrossEntropyLoss()
    +optimizer = optim.SGD(model.parameters(), lr=0.01, weight_decay=1e-4)
    +
    +num_epochs = 10
    +for epoch in range(num_epochs):
    +    model.train()  # set model to training mode
    +    running_loss = 0.0
    +    for images, labels in train_loader:
    +        # Move data to device (GPU if available, else CPU)
    +        images, labels = images.to(device), labels.to(device)
    +
    +        optimizer.zero_grad()            # reset gradients to zero
    +        outputs = model(images)          # forward pass: compute predictions
    +        loss = criterion(outputs, labels)  # compute cross-entropy loss
    +        loss.backward()                 # backpropagate to compute gradients
    +        optimizer.step()                # update weights using SGD step 
    +
    +        running_loss += loss.item()
    +    # Compute average loss over all batches in this epoch
    +    avg_loss = running_loss / len(train_loader)
    +    print(f"Epoch {epoch+1}/{num_epochs}, Loss: {avg_loss:.4f}")
    +
    +#Evaluation on the Test Set
    +
    +
    +
    +model.eval()  # set model to evaluation mode 
    +correct = 0
    +total = 0
    +with torch.no_grad():  # disable gradient calculation for evaluation 
    +    for images, labels in test_loader:
    +        images, labels = images.to(device), labels.to(device)
    +        outputs = model(images)
    +        _, predicted = torch.max(outputs, dim=1)  # class with highest score
    +        total += labels.size(0)
    +        correct += (predicted == labels).sum().item()
    +
    +accuracy = 100 * correct / total
    +print(f"Test Accuracy: {accuracy:.2f}%")
    +
    +
    +
    +
    +
    +
    +

    And a similar example using Tensorflow with Keras#

    +
    +
    +
    
    +import tensorflow as tf
    +from tensorflow import keras
    +from tensorflow.keras import layers, regularizers
    +
    +# Check for GPU (TensorFlow will use it automatically if available)
    +gpus = tf.config.list_physical_devices('GPU')
    +print(f"GPUs available: {gpus}")
    +
    +# 1) Load and preprocess MNIST
    +(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
    +# Normalize to [0, 1]
    +x_train = (x_train.astype("float32") / 255.0)
    +x_test  = (x_test.astype("float32") / 255.0)
    +
    +# 2) Build the model: 784 -> 100 -> 100 -> 10
    +l2_reg = 1e-4  # L2 regularization strength
    +
    +model = keras.Sequential([
    +    layers.Input(shape=(28, 28)),
    +    layers.Flatten(),
    +    layers.Dense(100, activation="relu",
    +                 kernel_regularizer=regularizers.l2(l2_reg)),
    +    layers.Dense(100, activation="relu",
    +                 kernel_regularizer=regularizers.l2(l2_reg)),
    +    layers.Dense(10, activation="softmax")  # output probabilities for 10 classes
    +])
    +
    +# 3) Compile with SGD + weight decay via L2 regularizers
    +model.compile(
    +    optimizer=keras.optimizers.SGD(learning_rate=0.01),
    +    loss="sparse_categorical_crossentropy",
    +    metrics=["accuracy"],
    +)
    +
    +model.summary()
    +
    +# 4) Train
    +history = model.fit(
    +    x_train, y_train,
    +    epochs=10,
    +    batch_size=64,
    +    validation_split=0.1,  # optional: monitor validation during training
    +    verbose=1
    +)
    +
    +# 5) Evaluate on test set
    +test_loss, test_acc = model.evaluate(x_test, y_test, verbose=0)
    +print(f"Test accuracy: {test_acc:.4f}, Test loss: {test_loss:.4f}")
    +
    +
    +
    +
    +
    +
    +

    Building our own neural network code#

    +

    Here we present a flexible object oriented codebase +for a feed forward neural network, along with a demonstration of how +to use it. Before we get into the details of the neural network, we +will first present some implementations of various schedulers, cost +functions and activation functions that can be used together with the +neural network.

    +

    The codes here were developed by Eric Reber and Gregor Kajda during spring 2023.

    +
    +

    Learning rate methods#

    +

The code below shows object oriented implementations of the Constant, Momentum, Adagrad, AdagradMomentum, RMS prop and Adam schedulers. All of the classes inherit from the shared abstract Scheduler class, and share the update_change() and reset() methods, allowing any of the schedulers to be used seamlessly during the training stage, as will later be shown in the fit() method of the neural network. update_change() only has one parameter, the gradient (\(δ^l_ja^{l−1}_k\)), and returns the change which will be subtracted from the weights. The reset() function takes no parameters, and resets the desired variables. For Constant and Momentum, reset does nothing.

    +
    +
    +
    import autograd.numpy as np
    +
    +class Scheduler:
    +    """
    +    Abstract class for Schedulers
    +    """
    +
    +    def __init__(self, eta):
    +        self.eta = eta
    +
    +    # should be overwritten
    +    def update_change(self, gradient):
    +        raise NotImplementedError
    +
    +    # overwritten if needed
    +    def reset(self):
    +        pass
    +
    +
    +class Constant(Scheduler):
    +    def __init__(self, eta):
    +        super().__init__(eta)
    +
    +    def update_change(self, gradient):
    +        return self.eta * gradient
    +    
    +    def reset(self):
    +        pass
    +
    +
    +class Momentum(Scheduler):
    +    def __init__(self, eta: float, momentum: float):
    +        super().__init__(eta)
    +        self.momentum = momentum
    +        self.change = 0
    +
    +    def update_change(self, gradient):
    +        self.change = self.momentum * self.change + self.eta * gradient
    +        return self.change
    +
    +    def reset(self):
    +        pass
    +
    +
    +class Adagrad(Scheduler):
    +    def __init__(self, eta):
    +        super().__init__(eta)
    +        self.G_t = None
    +
    +    def update_change(self, gradient):
+        delta = 1e-8  # avoid division by zero
    +
    +        if self.G_t is None:
    +            self.G_t = np.zeros((gradient.shape[0], gradient.shape[0]))
    +
    +        self.G_t += gradient @ gradient.T
    +
    +        G_t_inverse = 1 / (
    +            delta + np.sqrt(np.reshape(np.diagonal(self.G_t), (self.G_t.shape[0], 1)))
    +        )
    +        return self.eta * gradient * G_t_inverse
    +
    +    def reset(self):
    +        self.G_t = None
    +
    +
    +class AdagradMomentum(Scheduler):
    +    def __init__(self, eta, momentum):
    +        super().__init__(eta)
    +        self.G_t = None
    +        self.momentum = momentum
    +        self.change = 0
    +
    +    def update_change(self, gradient):
+        delta = 1e-8  # avoid division by zero
    +
    +        if self.G_t is None:
    +            self.G_t = np.zeros((gradient.shape[0], gradient.shape[0]))
    +
    +        self.G_t += gradient @ gradient.T
    +
    +        G_t_inverse = 1 / (
    +            delta + np.sqrt(np.reshape(np.diagonal(self.G_t), (self.G_t.shape[0], 1)))
    +        )
    +        self.change = self.change * self.momentum + self.eta * gradient * G_t_inverse
    +        return self.change
    +
    +    def reset(self):
    +        self.G_t = None
    +
    +
    +class RMS_prop(Scheduler):
    +    def __init__(self, eta, rho):
    +        super().__init__(eta)
    +        self.rho = rho
    +        self.second = 0.0
    +
    +    def update_change(self, gradient):
+        delta = 1e-8  # avoid division by zero
    +        self.second = self.rho * self.second + (1 - self.rho) * gradient * gradient
    +        return self.eta * gradient / (np.sqrt(self.second + delta))
    +
    +    def reset(self):
    +        self.second = 0.0
    +
    +
    +class Adam(Scheduler):
    +    def __init__(self, eta, rho, rho2):
    +        super().__init__(eta)
    +        self.rho = rho
    +        self.rho2 = rho2
    +        self.moment = 0
    +        self.second = 0
    +        self.n_epochs = 1
    +
    +    def update_change(self, gradient):
+        delta = 1e-8  # avoid division by zero
    +
    +        self.moment = self.rho * self.moment + (1 - self.rho) * gradient
    +        self.second = self.rho2 * self.second + (1 - self.rho2) * gradient * gradient
    +
    +        moment_corrected = self.moment / (1 - self.rho**self.n_epochs)
    +        second_corrected = self.second / (1 - self.rho2**self.n_epochs)
    +
    +        return self.eta * moment_corrected / (np.sqrt(second_corrected + delta))
    +
    +    def reset(self):
    +        self.n_epochs += 1
    +        self.moment = 0
    +        self.second = 0
    +
    +
    +
    +
    +
    +
    +

    Usage of the above learning rate schedulers#

    +

To initialize a scheduler, simply create the object and pass in the necessary parameters such as the learning rate and the momentum as shown below. As the Scheduler class is an abstract class, it should not be called directly, and will raise an error upon usage.

    +
    +
    +
    momentum_scheduler = Momentum(eta=1e-3, momentum=0.9)
    +adam_scheduler = Adam(eta=1e-3, rho=0.9, rho2=0.999)
    +
    +
    +
    +
    +

Here is a small example of how a segment of code using schedulers could look. Switching out the schedulers is simple.

    +
    +
    +
    weights = np.ones((3,3))
    +print(f"Before scheduler:\n{weights=}")
    +
    +epochs = 10
    +for e in range(epochs):
    +    gradient = np.random.rand(3, 3)
    +    change = adam_scheduler.update_change(gradient)
    +    weights = weights - change
    +    adam_scheduler.reset()
    +
    +print(f"\nAfter scheduler:\n{weights=}")
    +
    +
    +
    +
    +
    +
    +

    Cost functions#

    +

    Here we discuss cost functions that can be used when creating the +neural network. Every cost function takes the target vector as its +parameter, and returns a function valued only at \(x\) such that it may +easily be differentiated.

    +
    +
    +
    import autograd.numpy as np
    +
    +def CostOLS(target):
    +    
    +    def func(X):
    +        return (1.0 / target.shape[0]) * np.sum((target - X) ** 2)
    +
    +    return func
    +
    +
    +def CostLogReg(target):
    +
    +    def func(X):
    +        
    +        return -(1.0 / target.shape[0]) * np.sum(
    +            (target * np.log(X + 10e-10)) + ((1 - target) * np.log(1 - X + 10e-10))
    +        )
    +
    +    return func
    +
    +
    +def CostCrossEntropy(target):
    +    
    +    def func(X):
    +        return -(1.0 / target.size) * np.sum(target * np.log(X + 10e-10))
    +
    +    return func
    +
    +
    +
    +
    +

Below we give a short example of how these cost functions may be used to obtain results if you wish to test them out on your own using AutoGrad’s automatic differentiation.

    +
    +
    +
    from autograd import grad
    +
    +target = np.array([[1, 2, 3]]).T
    +a = np.array([[4, 5, 6]]).T
    +
    +cost_func = CostCrossEntropy
    +cost_func_derivative = grad(cost_func(target))
    +
    +valued_at_a = cost_func_derivative(a)
    +print(f"Derivative of cost function {cost_func.__name__} valued at a:\n{valued_at_a}")
    +
    +
    +
    +
    +
    +
    +

    Activation functions#

    +

    Finally, before we look at the neural network, we will look at the +activation functions which can be specified between the hidden layers +and as the output function. Each function can be valued for any given +vector or matrix X, and can be differentiated via derivate().

    +
    +
    +
    import autograd.numpy as np
    +from autograd import elementwise_grad
    +
    +def identity(X):
    +    return X
    +
    +
    +def sigmoid(X):
    +    try:
    +        return 1.0 / (1 + np.exp(-X))
    +    except FloatingPointError:
    +        return np.where(X > np.zeros(X.shape), np.ones(X.shape), np.zeros(X.shape))
    +
    +
    +def softmax(X):
    +    X = X - np.max(X, axis=-1, keepdims=True)
    +    delta = 10e-10
    +    return np.exp(X) / (np.sum(np.exp(X), axis=-1, keepdims=True) + delta)
    +
    +
    +def RELU(X):
    +    return np.where(X > np.zeros(X.shape), X, np.zeros(X.shape))
    +
    +
    +def LRELU(X):
    +    delta = 10e-4
    +    return np.where(X > np.zeros(X.shape), X, delta * X)
    +
    +
    +def derivate(func):
    +    if func.__name__ == "RELU":
    +
    +        def func(X):
    +            return np.where(X > 0, 1, 0)
    +
    +        return func
    +
    +    elif func.__name__ == "LRELU":
    +
    +        def func(X):
    +            delta = 10e-4
    +            return np.where(X > 0, 1, delta)
    +
    +        return func
    +
    +    else:
    +        return elementwise_grad(func)
    +
    +
    +
    +
    +

    Below follows a short demonstration of how to use an activation +function. The derivative of the activation function will be important +when calculating the output delta term during backpropagation. Note +that derivate() can also be used for cost functions for a more +generalized approach.

    +
    +
    +
    z = np.array([[4, 5, 6]]).T
    +print(f"Input to activation function:\n{z}")
    +
    +act_func = sigmoid
    +a = act_func(z)
    +print(f"\nOutput from {act_func.__name__} activation function:\n{a}")
    +
    +act_func_derivative = derivate(act_func)
    +valued_at_z = act_func_derivative(a)
    +print(f"\nDerivative of {act_func.__name__} activation function valued at z:\n{valued_at_z}")
    +
    +
    +
    +
    +
    +
    +

    The Neural Network#

    +

Now that we have gotten a good understanding of the implementation of some important components, we can take a look at an object oriented implementation of a feed forward neural network. The feed forward neural network has been implemented as a class named FFNN, which can be initiated as a regressor or classifier depending on the choice of cost function. The FFNN can have any number of input nodes, hidden layers with any number of hidden nodes, and any number of output nodes, meaning it can perform multiclass classification as well as binary classification and regression problems. Although there is a lot of code present, it makes for an easy to use and generalizable interface for creating many types of neural networks, as will be demonstrated below.

    +
    +
    +
    import math
    +import autograd.numpy as np
    +import sys
    +import warnings
    +from autograd import grad, elementwise_grad
    +from random import random, seed
    +from copy import deepcopy, copy
    +from typing import Tuple, Callable
    +from sklearn.utils import resample
    +
    +warnings.simplefilter("error")
    +
    +
    +class FFNN:
    +    """
    +    Description:
    +    ------------
    +        Feed Forward Neural Network with interface enabling flexible design of a
+        neural network's architecture and the specification of activation function
    +        in the hidden layers and output layer respectively. This model can be used
    +        for both regression and classification problems, depending on the output function.
    +
    +    Attributes:
    +    ------------
    +        I   dimensions (tuple[int]): A list of positive integers, which specifies the
    +            number of nodes in each of the networks layers. The first integer in the array
    +            defines the number of nodes in the input layer, the second integer defines number
    +            of nodes in the first hidden layer and so on until the last number, which
    +            specifies the number of nodes in the output layer.
    +        II  hidden_func (Callable): The activation function for the hidden layers
    +        III output_func (Callable): The activation function for the output layer
    +        IV  cost_func (Callable): Our cost function
    +        V   seed (int): Sets random seed, makes results reproducible
    +    """
    +
    +    def __init__(
    +        self,
    +        dimensions: tuple[int],
    +        hidden_func: Callable = sigmoid,
    +        output_func: Callable = lambda x: x,
    +        cost_func: Callable = CostOLS,
    +        seed: int = None,
    +    ):
    +        self.dimensions = dimensions
    +        self.hidden_func = hidden_func
    +        self.output_func = output_func
    +        self.cost_func = cost_func
    +        self.seed = seed
    +        self.weights = list()
    +        self.schedulers_weight = list()
    +        self.schedulers_bias = list()
    +        self.a_matrices = list()
    +        self.z_matrices = list()
    +        self.classification = None
    +
    +        self.reset_weights()
    +        self._set_classification()
    +
    +    def fit(
    +        self,
    +        X: np.ndarray,
    +        t: np.ndarray,
    +        scheduler: Scheduler,
    +        batches: int = 1,
    +        epochs: int = 100,
    +        lam: float = 0,
    +        X_val: np.ndarray = None,
    +        t_val: np.ndarray = None,
    +    ):
    +        """
    +        Description:
    +        ------------
+            This function trains the neural network by performing the feedforward and backpropagation
    +            algorithm to update the networks weights.
    +
    +        Parameters:
    +        ------------
    +            I    X (np.ndarray) : training data
    +            II   t (np.ndarray) : target data
    +            III  scheduler (Scheduler) : specified scheduler (algorithm for optimization of gradient descent)
    +            IV   scheduler_args (list[int]) : list of all arguments necessary for scheduler
    +
    +        Optional Parameters:
    +        ------------
    +            V    batches (int) : number of batches the datasets are split into, default equal to 1
    +            VI   epochs (int) : number of iterations used to train the network, default equal to 100
    +            VII  lam (float) : regularization hyperparameter lambda
    +            VIII X_val (np.ndarray) : validation set
    +            IX   t_val (np.ndarray) : validation target set
    +
    +        Returns:
    +        ------------
    +            I   scores (dict) : A dictionary containing the performance metrics of the model.
    +                The number of the metrics depends on the parameters passed to the fit-function.
    +
    +        """
    +
    +        # setup 
    +        if self.seed is not None:
    +            np.random.seed(self.seed)
    +
    +        val_set = False
    +        if X_val is not None and t_val is not None:
    +            val_set = True
    +
    +        # creating arrays for score metrics
    +        train_errors = np.empty(epochs)
    +        train_errors.fill(np.nan)
    +        val_errors = np.empty(epochs)
    +        val_errors.fill(np.nan)
    +
    +        train_accs = np.empty(epochs)
    +        train_accs.fill(np.nan)
    +        val_accs = np.empty(epochs)
    +        val_accs.fill(np.nan)
    +
    +        self.schedulers_weight = list()
    +        self.schedulers_bias = list()
    +
    +        batch_size = X.shape[0] // batches
    +
    +        X, t = resample(X, t)
    +
    +        # this function returns a function valued only at X
    +        cost_function_train = self.cost_func(t)
    +        if val_set:
    +            cost_function_val = self.cost_func(t_val)
    +
    +        # create schedulers for each weight matrix
    +        for i in range(len(self.weights)):
    +            self.schedulers_weight.append(copy(scheduler))
    +            self.schedulers_bias.append(copy(scheduler))
    +
    +        print(f"{scheduler.__class__.__name__}: Eta={scheduler.eta}, Lambda={lam}")
    +
    +        try:
    +            for e in range(epochs):
    +                for i in range(batches):
    +                    # allows for minibatch gradient descent
    +                    if i == batches - 1:
    +                        # If the for loop has reached the last batch, take all thats left
    +                        X_batch = X[i * batch_size :, :]
    +                        t_batch = t[i * batch_size :, :]
    +                    else:
    +                        X_batch = X[i * batch_size : (i + 1) * batch_size, :]
    +                        t_batch = t[i * batch_size : (i + 1) * batch_size, :]
    +
    +                    self._feedforward(X_batch)
    +                    self._backpropagate(X_batch, t_batch, lam)
    +
    +                # reset schedulers for each epoch (some schedulers pass in this call)
    +                for scheduler in self.schedulers_weight:
    +                    scheduler.reset()
    +
    +                for scheduler in self.schedulers_bias:
    +                    scheduler.reset()
    +
    +                # computing performance metrics
    +                pred_train = self.predict(X)
    +                train_error = cost_function_train(pred_train)
    +
    +                train_errors[e] = train_error
    +                if val_set:
    +                    
    +                    pred_val = self.predict(X_val)
    +                    val_error = cost_function_val(pred_val)
    +                    val_errors[e] = val_error
    +
    +                if self.classification:
    +                    train_acc = self._accuracy(self.predict(X), t)
    +                    train_accs[e] = train_acc
    +                    if val_set:
    +                        val_acc = self._accuracy(pred_val, t_val)
    +                        val_accs[e] = val_acc
    +
    +                # printing progress bar
    +                progression = e / epochs
    +                print_length = self._progress_bar(
    +                    progression,
    +                    train_error=train_errors[e],
    +                    train_acc=train_accs[e],
    +                    val_error=val_errors[e],
    +                    val_acc=val_accs[e],
    +                )
    +        except KeyboardInterrupt:
    +            # allows for stopping training at any point and seeing the result
    +            pass
    +
+        # visualization of training progression (similar to the tensorflow progression bar)
    +        sys.stdout.write("\r" + " " * print_length)
    +        sys.stdout.flush()
    +        self._progress_bar(
    +            1,
    +            train_error=train_errors[e],
    +            train_acc=train_accs[e],
    +            val_error=val_errors[e],
    +            val_acc=val_accs[e],
    +        )
    +        sys.stdout.write("")
    +
    +        # return performance metrics for the entire run
    +        scores = dict()
    +
    +        scores["train_errors"] = train_errors
    +
    +        if val_set:
    +            scores["val_errors"] = val_errors
    +
    +        if self.classification:
    +            scores["train_accs"] = train_accs
    +
    +            if val_set:
    +                scores["val_accs"] = val_accs
    +
    +        return scores
    +
    +    def predict(self, X: np.ndarray, *, threshold=0.5):
    +        """
    +         Description:
    +         ------------
    +             Performs prediction after training of the network has been finished.
    +
    +         Parameters:
    +        ------------
    +             I   X (np.ndarray): The design matrix, with n rows of p features each
    +
    +         Optional Parameters:
    +         ------------
    +             II  threshold (float) : sets minimal value for a prediction to be predicted as the positive class
    +                 in classification problems
    +
    +         Returns:
    +         ------------
    +             I   z (np.ndarray): A prediction vector (row) for each row in our design matrix
    +                 This vector is thresholded if regression=False, meaning that classification results
    +                 in a vector of 1s and 0s, while regressions in an array of decimal numbers
    +
    +        """
    +
    +        predict = self._feedforward(X)
    +
    +        if self.classification:
    +            return np.where(predict > threshold, 1, 0)
    +        else:
    +            return predict
    +
    +    def reset_weights(self):
    +        """
    +        Description:
    +        ------------
    +            Resets/Reinitializes the weights in order to train the network for a new problem.
    +
    +        """
    +        if self.seed is not None:
    +            np.random.seed(self.seed)
    +
    +        self.weights = list()
    +        for i in range(len(self.dimensions) - 1):
    +            weight_array = np.random.randn(
    +                self.dimensions[i] + 1, self.dimensions[i + 1]
    +            )
    +            weight_array[0, :] = np.random.randn(self.dimensions[i + 1]) * 0.01
    +
    +            self.weights.append(weight_array)
    +
    +    def _feedforward(self, X: np.ndarray):
    +        """
    +        Description:
    +        ------------
    +            Calculates the activation of each layer starting at the input and ending at the output.
+            Each following activation is calculated from a weighted sum of each of the preceding
    +            activations (except in the case of the input layer).
    +
    +        Parameters:
    +        ------------
    +            I   X (np.ndarray): The design matrix, with n rows of p features each
    +
    +        Returns:
    +        ------------
    +            I   z (np.ndarray): A prediction vector (row) for each row in our design matrix
    +        """
    +
    +        # reset matrices
    +        self.a_matrices = list()
    +        self.z_matrices = list()
    +
    +        # if X is just a vector, make it into a matrix
    +        if len(X.shape) == 1:
    +            X = X.reshape((1, X.shape[0]))
    +
+        # Add a column of constant values (0.01) as the first column of the design matrix,
+        # in order to include the bias in our data
    +        bias = np.ones((X.shape[0], 1)) * 0.01
    +        X = np.hstack([bias, X])
    +
    +        # a^0, the nodes in the input layer (one a^0 for each row in X - where the
    +        # exponent indicates layer number).
    +        a = X
    +        self.a_matrices.append(a)
    +        self.z_matrices.append(a)
    +
    +        # The feed forward algorithm
    +        for i in range(len(self.weights)):
    +            if i < len(self.weights) - 1:
    +                z = a @ self.weights[i]
    +                self.z_matrices.append(z)
    +                a = self.hidden_func(z)
    +                # bias column again added to the data here
    +                bias = np.ones((a.shape[0], 1)) * 0.01
    +                a = np.hstack([bias, a])
    +                self.a_matrices.append(a)
    +            else:
    +                try:
    +                    # a^L, the nodes in our output layers
    +                    z = a @ self.weights[i]
    +                    a = self.output_func(z)
    +                    self.a_matrices.append(a)
    +                    self.z_matrices.append(z)
    +                except Exception as OverflowError:
    +                    print(
    +                        "OverflowError in fit() in FFNN\nHOW TO DEBUG ERROR: Consider lowering your learning rate or scheduler specific parameters such as momentum, or check if your input values need scaling"
    +                    )
    +
    +        # this will be a^L
    +        return a
    +
    +    def _backpropagate(self, X, t, lam):
    +        """
    +        Description:
    +        ------------
    +            Performs the backpropagation algorithm. In other words, this method
    +            calculates the gradient of all the layers starting at the
    +            output layer, and moving from right to left accumulates the gradient until
    +            the input layer is reached. Each layers respective weights are updated while
    +            the algorithm propagates backwards from the output layer (auto-differentation in reverse mode).
    +
    +        Parameters:
    +        ------------
    +            I   X (np.ndarray): The design matrix, with n rows of p features each.
    +            II  t (np.ndarray): The target vector, with n rows of p targets.
    +            III lam (float32): regularization parameter used to punish the weights in case of overfitting
    +
    +        Returns:
    +        ------------
    +            No return value.
    +
    +        """
    +        out_derivative = derivate(self.output_func)
    +        hidden_derivative = derivate(self.hidden_func)
    +
    +        for i in range(len(self.weights) - 1, -1, -1):
    +            # delta terms for output
    +            if i == len(self.weights) - 1:
    +                # for multi-class classification
    +                if (
    +                    self.output_func.__name__ == "softmax"
    +                ):
    +                    delta_matrix = self.a_matrices[i + 1] - t
    +                # for single class classification
    +                else:
    +                    cost_func_derivative = grad(self.cost_func(t))
    +                    delta_matrix = out_derivative(
    +                        self.z_matrices[i + 1]
    +                    ) * cost_func_derivative(self.a_matrices[i + 1])
    +
    +            # delta terms for hidden layer
    +            else:
    +                delta_matrix = (
    +                    self.weights[i + 1][1:, :] @ delta_matrix.T
    +                ).T * hidden_derivative(self.z_matrices[i + 1])
    +
    +            # calculate gradient
    +            gradient_weights = self.a_matrices[i][:, 1:].T @ delta_matrix
    +            gradient_bias = np.sum(delta_matrix, axis=0).reshape(
    +                1, delta_matrix.shape[1]
    +            )
    +
    +            # regularization term
    +            gradient_weights += self.weights[i][1:, :] * lam
    +
    +            # use scheduler
    +            update_matrix = np.vstack(
    +                [
    +                    self.schedulers_bias[i].update_change(gradient_bias),
    +                    self.schedulers_weight[i].update_change(gradient_weights),
    +                ]
    +            )
    +
    +            # update weights and bias
    +            self.weights[i] -= update_matrix
    +
    +    def _accuracy(self, prediction: np.ndarray, target: np.ndarray):
    +        """
    +        Description:
    +        ------------
    +            Calculates accuracy of given prediction to target
    +
    +        Parameters:
    +        ------------
+            I   prediction (np.ndarray): vector of predictions output by the network
    +                (1s and 0s in case of classification, and real numbers in case of regression)
    +            II  target (np.ndarray): vector of true values (What the network ideally should predict)
    +
    +        Returns:
    +        ------------
    +            A floating point number representing the percentage of correctly classified instances.
    +        """
    +        assert prediction.size == target.size
    +        return np.average((target == prediction))
    +    def _set_classification(self):
    +        """
    +        Description:
    +        ------------
+            Decides if FFNN acts as classifier (True) or regressor (False),
    +            sets self.classification during init()
    +        """
    +        self.classification = False
    +        if (
    +            self.cost_func.__name__ == "CostLogReg"
    +            or self.cost_func.__name__ == "CostCrossEntropy"
    +        ):
    +            self.classification = True
    +
    +    def _progress_bar(self, progression, **kwargs):
    +        """
    +        Description:
    +        ------------
    +            Displays progress of training
    +        """
    +        print_length = 40
    +        num_equals = int(progression * print_length)
    +        num_not = print_length - num_equals
    +        arrow = ">" if num_equals > 0 else ""
    +        bar = "[" + "=" * (num_equals - 1) + arrow + "-" * num_not + "]"
    +        perc_print = self._format(progression * 100, decimals=5)
    +        line = f"  {bar} {perc_print}% "
    +
    +        for key in kwargs:
    +            if not np.isnan(kwargs[key]):
    +                value = self._format(kwargs[key], decimals=4)
    +                line += f"| {key}: {value} "
    +        sys.stdout.write("\r" + line)
    +        sys.stdout.flush()
    +        return len(line)
    +
    +    def _format(self, value, decimals=4):
    +        """
    +        Description:
    +        ------------
    +            Formats decimal numbers for progress bar
    +        """
    +        if value > 0:
    +            v = value
    +        elif value < 0:
    +            v = -10 * value
    +        else:
    +            v = 1
    +        n = 1 + math.floor(math.log10(v))
    +        if n >= decimals - 1:
    +            return str(round(value))
    +        return f"{value:.{decimals-n-1}f}"
    +
    +
    +
    +
    +

    Before we make a model, we will quickly generate a dataset we can use +for our linear regression problem as shown below

    +
    +
    +
    import autograd.numpy as np
    +from sklearn.model_selection import train_test_split
    +
    +def SkrankeFunction(x, y):
    +    return np.ravel(0 + 1*x + 2*y + 3*x**2 + 4*x*y + 5*y**2)
    +
    +def create_X(x, y, n):
    +    if len(x.shape) > 1:
    +        x = np.ravel(x)
    +        y = np.ravel(y)
    +
    +    N = len(x)
    +    l = int((n + 1) * (n + 2) / 2)  # Number of elements in beta
    +    X = np.ones((N, l))
    +
    +    for i in range(1, n + 1):
    +        q = int((i) * (i + 1) / 2)
    +        for k in range(i + 1):
    +            X[:, q + k] = (x ** (i - k)) * (y**k)
    +
    +    return X
    +
    +step=0.5
    +x = np.arange(0, 1, step)
    +y = np.arange(0, 1, step)
    +x, y = np.meshgrid(x, y)
    +target = SkrankeFunction(x, y)
    +target = target.reshape(target.shape[0], 1)
    +
    +poly_degree=3
    +X = create_X(x, y, poly_degree)
    +
    +X_train, X_test, t_train, t_test = train_test_split(X, target)
    +
    +
    +
    +
    +

Now that we have our dataset ready for the regression, we can create our regressor. Note that with the seed parameter, we can make sure our results stay the same every time we run the neural network. For initialization, we simply specify the dimensions (we wish the number of input nodes to equal the number of features of each datapoint, and the output layer to predict one value).

    +
    +
    +
    input_nodes = X_train.shape[1]
    +output_nodes = 1
    +
    +linear_regression = FFNN((input_nodes, output_nodes), output_func=identity, cost_func=CostOLS, seed=2023)
    +
    +
    +
    +
    +

    We then fit our model with our training data using the scheduler of our choice.

    +
    +
    +
    linear_regression.reset_weights() # reset weights such that previous runs or reruns don't affect the weights
    +
    +scheduler = Constant(eta=1e-3)
    +scores = linear_regression.fit(X_train, t_train, scheduler)
    +
    +
    +
    +
    +

Due to the progress bar we can see the MSE (train_error) throughout the FFNN’s training. Note that the fit() function has some optional parameters with default arguments. For example, the regularization hyperparameter can be ignored if not needed, and the FFNN will by default run for 100 epochs. These can easily be changed, for example:

    +
    +
    +
    linear_regression.reset_weights() # reset weights such that previous runs or reruns don't affect the weights
    +
    +scores = linear_regression.fit(X_train, t_train, scheduler, lam=1e-4, epochs=1000)
    +
    +
    +
    +
    +

    We see that given more epochs to train on, the regressor reaches a lower MSE.

    +
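Since fit() returns the scores dictionary described in its docstring, we can also plot the training error per epoch instead of only reading the progress bar. A minimal sketch using matplotlib and the scores returned above:

import matplotlib.pyplot as plt

plt.plot(scores["train_errors"], label="training MSE")
plt.xlabel("Epoch")
plt.ylabel("MSE")
plt.legend()
plt.show()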

    Let us then switch to a binary classification. We use a binary +classification dataset, and follow a similar setup to the regression +case.

    +
    +
    +
    from sklearn.datasets import load_breast_cancer
    +from sklearn.preprocessing import MinMaxScaler
    +
    +wisconsin = load_breast_cancer()
    +X = wisconsin.data
    +target = wisconsin.target
    +target = target.reshape(target.shape[0], 1)
    +
    +X_train, X_val, t_train, t_val = train_test_split(X, target)
    +
    +scaler = MinMaxScaler()
    +scaler.fit(X_train)
    +X_train = scaler.transform(X_train)
    +X_val = scaler.transform(X_val)
    +
    +
    +
    +
    +
    +
    +
    input_nodes = X_train.shape[1]
    +output_nodes = 1
    +
    +logistic_regression = FFNN((input_nodes, output_nodes), output_func=sigmoid, cost_func=CostLogReg, seed=2023)
    +
    +
    +
    +
    +

We will now make use of our validation data by passing it into our fit function as a keyword argument.

    +
    +
    +
    logistic_regression.reset_weights() # reset weights such that previous runs or reruns don't affect the weights
    +
    +scheduler = Adam(eta=1e-3, rho=0.9, rho2=0.999)
    +scores = logistic_regression.fit(X_train, t_train, scheduler, epochs=1000, X_val=X_val, t_val=t_val)
    +
    +
    +
    +
    +

    Finally, we will create a neural network with 2 hidden layers with activation functions.

    +
    +
    +
    input_nodes = X_train.shape[1]
    +hidden_nodes1 = 100
    +hidden_nodes2 = 30
    +output_nodes = 1
    +
    +dims = (input_nodes, hidden_nodes1, hidden_nodes2, output_nodes)
    +
    +neural_network = FFNN(dims, hidden_func=RELU, output_func=sigmoid, cost_func=CostLogReg, seed=2023)
    +
    +
    +
    +
    +
    +
    +
    neural_network.reset_weights() # reset weights such that previous runs or reruns don't affect the weights
    +
    +scheduler = Adam(eta=1e-4, rho=0.9, rho2=0.999)
    +scores = neural_network.fit(X_train, t_train, scheduler, epochs=1000, X_val=X_val, t_val=t_val)
    +
    +
    +
    +
    +
    +
    +

    Multiclass classification#

    +

Finally, we demonstrate multiclass classification using our FFNN on the famous MNIST dataset, which contains images of the handwritten digits 0 to 9.

    +
    +
    +
    from sklearn.datasets import load_digits
    +
    +def onehot(target: np.ndarray):
    +    onehot = np.zeros((target.size, target.max() + 1))
    +    onehot[np.arange(target.size), target] = 1
    +    return onehot
    +
    +digits = load_digits()
    +
    +X = digits.data
    +target = digits.target
    +target = onehot(target)
    +
    +input_nodes = 64
    +hidden_nodes1 = 100
    +hidden_nodes2 = 30
    +output_nodes = 10
    +
    +dims = (input_nodes, hidden_nodes1, hidden_nodes2, output_nodes)
    +
    +multiclass = FFNN(dims, hidden_func=LRELU, output_func=softmax, cost_func=CostCrossEntropy)
    +
    +multiclass.reset_weights() # reset weights such that previous runs or reruns don't affect the weights
    +
    +scheduler = Adam(eta=1e-4, rho=0.9, rho2=0.999)
    +scores = multiclass.fit(X, target, scheduler, epochs=1000)
    +
    +
    +
    +
    +
    +
    +
    +

    Testing the XOR gate and other gates#

    +

    Let us now use our code to test the XOR gate.

    +
    +
    +
    X = np.array([ [0, 0], [0, 1], [1, 0],[1, 1]],dtype=np.float64)
    +
    +# The XOR gate
    +yXOR = np.array( [[ 0], [1] ,[1], [0]])
    +
    +input_nodes = X.shape[1]
    +output_nodes = 1
    +
    +logistic_regression = FFNN((input_nodes, output_nodes), output_func=sigmoid, cost_func=CostLogReg, seed=2023)
    +logistic_regression.reset_weights() # reset weights such that previous runs or reruns don't affect the weights
    +scheduler = Adam(eta=1e-1, rho=0.9, rho2=0.999)
    +scores = logistic_regression.fit(X, yXOR, scheduler, epochs=1000)
    +
    +
    +
    +
    +

Not bad, but the results depend strongly on the learning rate. Try different learning rates; a sketch of such a sweep follows below.
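A minimal sketch of such a sweep, reusing the FFNN class, the Adam scheduler and the activation and cost functions defined earlier (the particular learning rates are only suggestions):

etas = [1e-3, 1e-2, 1e-1, 1e0]   # candidate learning rates, chosen for illustration

for eta in etas:
    # a fresh model and fresh weights for every learning rate
    xor_model = FFNN((input_nodes, output_nodes), output_func=sigmoid, cost_func=CostLogReg, seed=2023)
    xor_model.reset_weights()
    scheduler = Adam(eta=eta, rho=0.9, rho2=0.999)
    scores = xor_model.fit(X, yXOR, scheduler, epochs=1000)
    # compare the reported training errors for the different values of eta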

    +
    +
    +

    Solving differential equations with Deep Learning#

    +

The Universal Approximation Theorem states that a neural network with a single hidden layer, together with one input and one output layer, can approximate any continuous function to any given precision.

    +

    Book on solving differential equations with ML methods.

    +

    An Introduction to Neural Network Methods for Differential Equations, by Yadav and Kumar.

    +

    Physics informed neural networks.

    +

    Scientific Machine Learning Through Physics–Informed Neural Networks: Where we are and What’s Next, by Cuomo et al

    +

    Thanks to Kristine Baluka Hein.

    +

The lectures on differential equations were developed by Kristine Baluka Hein, now a PhD student at IFI. Many thanks to Kristine.

    +
    +
    +

    Ordinary Differential Equations first#

    +

An ordinary differential equation (ODE) is an equation involving a function of one variable and its derivatives.

    +

    In general, an ordinary differential equation looks like

    + +
    +
    +\[ +\begin{equation} \label{ode} \tag{1} +f\left(x, \, g(x), \, g'(x), \, g''(x), \, \dots \, , \, g^{(n)}(x)\right) = 0 +\end{equation} +\]
    +

    where \(g(x)\) is the function to find, and \(g^{(n)}(x)\) is the \(n\)-th derivative of \(g(x)\).

    +

Here \(f\left(x, g(x), g'(x), g''(x), \, \dots \, , g^{(n)}(x)\right)\) is just a way of writing that there is an expression involving \(x\) and \(g(x), \ g'(x), \ g''(x), \, \dots \, , \text{ and } g^{(n)}(x)\) on the left-hand side of the equality sign in (1). The highest order of derivative, that is the value of \(n\), determines the order of the equation, which is then referred to as an \(n\)-th order ODE. Along with (1), some additional conditions on the function \(g(x)\) are typically given for the solution to be unique.
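For instance, the exponential decay equation \(g'(x) = -\gamma g(x)\), which we return to below, is a first-order ODE; written in the form of (1) it reads

\[
f\left(x, \, g(x), \, g'(x)\right) = g'(x) + \gamma g(x) = 0 .
\]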

    +
    +
    +

    The trial solution#

    +

    Let the trial solution \(g_t(x)\) be

    + +
    +
    +\[ +\begin{equation} + g_t(x) = h_1(x) + h_2(x,N(x,P)) +\label{_auto1} \tag{2} +\end{equation} +\]
    +

    where \(h_1(x)\) is a function that makes \(g_t(x)\) satisfy a given set +of conditions, \(N(x,P)\) a neural network with weights and biases +described by \(P\) and \(h_2(x, N(x,P))\) some expression involving the +neural network. The role of the function \(h_2(x, N(x,P))\), is to +ensure that the output from \(N(x,P)\) is zero when \(g_t(x)\) is +evaluated at the values of \(x\) where the given conditions must be +satisfied. The function \(h_1(x)\) should alone make \(g_t(x)\) satisfy +the conditions.

    +

    But what about the network \(N(x,P)\)?

    +

As described previously, an optimization method is used to adjust the parameters of the neural network, that is, its weights and biases, through backpropagation so that the cost function is minimized.

    +
    +
    +

    Minimization process#

    +

    For the minimization to be defined, we need to have a cost function at hand to minimize.

    +

It is given that \(f\left(x, \, g(x), \, g'(x), \, g''(x), \, \dots \, , \, g^{(n)}(x)\right)\) should be equal to zero in (1). We can choose the mean squared error as the cost function for an input \(x\). Since we are looking at a single input, the cost function is just \(f\) squared. The cost function \(C\left(x, P\right)\) can therefore be expressed as

    +
    +\[ +C\left(x, P\right) = \big(f\left(x, \, g(x), \, g'(x), \, g''(x), \, \dots \, , \, g^{(n)}(x)\right)\big)^2 +\]
    +

    If \(N\) inputs are given as a vector \(\boldsymbol{x}\) with elements \(x_i\) for \(i = 1,\dots,N\), +the cost function becomes

    + +
    +
    +\[ +\begin{equation} \label{cost} \tag{3} + C\left(\boldsymbol{x}, P\right) = \frac{1}{N} \sum_{i=1}^N \big(f\left(x_i, \, g(x_i), \, g'(x_i), \, g''(x_i), \, \dots \, , \, g^{(n)}(x_i)\right)\big)^2 +\end{equation} +\]
    +

The neural network should then find the parameters \(P\) that minimize the cost function in (3) for a set of \(N\) training samples \(x_i\).

    +
    +
    +

    Minimizing the cost function using gradient descent and automatic differentiation#

    +

To perform the minimization using gradient descent, the gradient of \(C\left(\boldsymbol{x}, P\right)\) is needed. It may happen that finding an analytical expression for the gradient of \(C(\boldsymbol{x}, P)\) from (3) becomes too messy, depending on which cost function one desires to use.

    +

Luckily, there exist libraries that do the job for us through automatic differentiation. Automatic differentiation is a method of computing derivatives numerically to essentially machine precision.
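As a minimal illustration of automatic differentiation with the Autograd library used below (the function \(h(x)=\sin(x^2)\) is chosen only for demonstration):

import autograd.numpy as np
from autograd import elementwise_grad

def h(x):
    return np.sin(x**2)

dh = elementwise_grad(h)                            # derivative of h via automatic differentiation
x = np.linspace(0, 1, 5)
print(np.max(np.abs(dh(x) - 2*x*np.cos(x**2))))     # agrees with the analytical derivative to machine precision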

    +
    +
    +

    Example: Exponential decay#

    +

    An exponential decay of a quantity \(g(x)\) is described by the equation

    + +
    +
    +\[ +\begin{equation} \label{solve_expdec} \tag{4} + g'(x) = -\gamma g(x) +\end{equation} +\]
    +

    with \(g(0) = g_0\) for some chosen initial value \(g_0\).

    +

    The analytical solution of (4) is

    + +
    +
    +\[ +\begin{equation} + g(x) = g_0 \exp\left(-\gamma x\right) +\label{_auto2} \tag{5} +\end{equation} +\]
    +

    Having an analytical solution at hand, it is possible to use it to compare how well a neural network finds a solution of (4).

    +
    +
    +

    The function to solve for#

    +

    The program will use a neural network to solve

    + +
    +
    +\[ +\begin{equation} \label{solveode} \tag{6} +g'(x) = -\gamma g(x) +\end{equation} +\]
    +

    where \(g(0) = g_0\) with \(\gamma\) and \(g_0\) being some chosen values.

    +

    In this example, \(\gamma = 2\) and \(g_0 = 10\).

    +
    +
    +

    The trial solution#

    +

To begin with, a trial solution \(g_t(x)\) must be chosen. A general trial solution for ordinary differential equations could be

    +
    +\[ +g_t(x, P) = h_1(x) + h_2(x, N(x, P)) +\]
    +

    with \(h_1(x)\) ensuring that \(g_t(x)\) satisfies some conditions and \(h_2(x,N(x, P))\) an expression involving \(x\) and the output from the neural network \(N(x,P)\) with \(P \) being the collection of the weights and biases for each layer. For now, it is assumed that the network consists of one input layer, one hidden layer, and one output layer.

    +
    +
    +

    Setup of Network#

    +

    In this network, there are no weights and bias at the input layer, so \(P = \{ P_{\text{hidden}}, P_{\text{output}} \}\). +If there are \(N_{\text{hidden} }\) neurons in the hidden layer, then \(P_{\text{hidden}}\) is a \(N_{\text{hidden} } \times (1 + N_{\text{input}})\) matrix, given that there are \(N_{\text{input}}\) neurons in the input layer.

    +

    The first column in \(P_{\text{hidden} }\) represents the bias for each neuron in the hidden layer and the second column represents the weights for each neuron in the hidden layer from the input layer. +If there are \(N_{\text{output} }\) neurons in the output layer, then \(P_{\text{output}} \) is a \(N_{\text{output} } \times (1 + N_{\text{hidden} })\) matrix.

    +

Its first column represents the bias of each neuron and the remaining columns represent the weights to each neuron.
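As a small, concrete sketch of these shapes (with \(N_{\text{input}} = 1\) and a hypothetical choice of 10 hidden and 1 output neuron, matching the program further below), the parameter arrays could be initialized as:

import autograd.numpy.random as npr

N_input, N_hidden, N_output = 1, 10, 1

# each row holds [bias, weights from the previous layer]
P_hidden = npr.randn(N_hidden, 1 + N_input)    # shape (10, 2)
P_output = npr.randn(N_output, 1 + N_hidden)   # shape (1, 11)
P = [P_hidden, P_output]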

    +

It is given that \(g(0) = g_0\). The trial solution must fulfill this condition to be a proper solution of (6). A possible way to ensure that \(g_t(0, P) = g_0\) is to let \(h_2(x, N(x,P)) = x \cdot N(x,P)\) and \(h_1(x) = g_0\). This gives the following trial solution:

    + +
    +
    +\[ +\begin{equation} \label{trial} \tag{7} +g_t(x, P) = g_0 + x \cdot N(x, P) +\end{equation} +\]
    +
    +
    +

    Reformulating the problem#

    +

    We wish that our neural network manages to minimize a given cost function.

    +

A reformulation of our equation, (6), must therefore be done, such that it describes a problem the neural network can solve.

    +

    The neural network must find the set of weights and biases \(P\) such that the trial solution in (7) satisfies (6).

    +

    The trial solution

    +
    +\[ +g_t(x, P) = g_0 + x \cdot N(x, P) +\]
    +

    has been chosen such that it already solves the condition \(g(0) = g_0\). What remains, is to find \(P\) such that

    + +
    +
    +\[ +\begin{equation} \label{nnmin} \tag{8} +g_t'(x, P) = - \gamma g_t(x, P) +\end{equation} +\]
    +

is fulfilled as well as possible.

    +
    +
    +

    More technicalities#

    +

The left-hand side and right-hand side of (8) must be computed separately, and the neural network must then choose the weights and biases contained in \(P\) such that the two sides are as equal as possible. This means that the absolute or squared difference between the sides must be as close to zero as possible, ideally equal to zero. In this case, the squared difference turns out to be an appropriate measure of how erroneous the trial solution is with respect to the parameters \(P\) of the neural network.

    +

    This gives the following cost function our neural network must solve for:

    +
+\[ +\min_{P}\Big\{ \big(g_t'(x, P) - ( -\gamma g_t(x, P)) \big)^2 \Big\} +\]
    +

    (the notation \(\min_{P}\{ f(x, P) \}\) means that we desire to find \(P\) that yields the minimum of \(f(x, P)\))

    +

    or, in terms of weights and biases for the hidden and output layer in our network:

    +
+\[ +\min_{P_{\text{hidden} }, \ P_{\text{output} }}\Big\{ \big(g_t'(x, \{ P_{\text{hidden} }, P_{\text{output} }\}) - ( -\gamma g_t(x, \{ P_{\text{hidden} }, P_{\text{output} }\})) \big)^2 \Big\} +\]
    +

    for an input value \(x\).

    +
    +
    +

    More details#

    +

    If the neural network evaluates \(g_t(x, P)\) at more values for \(x\), say \(N\) values \(x_i\) for \(i = 1, \dots, N\), then the total error to minimize becomes

    + +
    +
+\[ +\begin{equation} \label{min} \tag{9} +\min_{P}\Big\{\frac{1}{N} \sum_{i=1}^N \big(g_t'(x_i, P) - ( -\gamma g_t(x_i, P)) \big)^2 \Big\} +\end{equation} +\]
    +

Letting \(\boldsymbol{x}\) be a vector with elements \(x_i\) and \(C(\boldsymbol{x}, P) = \frac{1}{N} \sum_i \big(g_t'(x_i, P) - ( -\gamma g_t(x_i, P)) \big)^2\) denote the cost function, the minimization problem that our network must solve becomes

    +
    +\[ +\min_{P} C(\boldsymbol{x}, P) +\]
    +

    In terms of \(P_{\text{hidden} }\) and \(P_{\text{output} }\), this could also be expressed as

    +
    +\[ +\min_{P_{\text{hidden} }, \ P_{\text{output} }} C(\boldsymbol{x}, \{P_{\text{hidden} }, P_{\text{output} }\}) +\]
    +
    +
    +

    A possible implementation of a neural network#

    +

    For simplicity, it is assumed that the input is an array \(\boldsymbol{x} = (x_1, \dots, x_N)\) with \(N\) elements. It is at these points the neural network should find \(P\) such that it fulfills (9).

    +

First, the neural network must feed the inputs forward. This means that \(\boldsymbol{x}\) must be passed through an input layer, a hidden layer and an output layer. The input layer in this case does not need to process the data any further. It consists of \(N_{\text{input} }\) neurons, passing its elements on to each neuron in the hidden layer. The number of neurons in the hidden layer will be \(N_{\text{hidden} }\).

    +
    +
    +

    Technicalities#

    +

For the \(i\)-th neuron in the hidden layer with weight \(w_i^{\text{hidden} }\) and bias \(b_i^{\text{hidden} }\), the weighted input from the \(j\)-th neuron in the input layer is:

    +
    +\[\begin{split} +\begin{aligned} +z_{i,j}^{\text{hidden}} &= b_i^{\text{hidden}} + w_i^{\text{hidden}}x_j \\ +&= +\begin{pmatrix} +b_i^{\text{hidden}} & w_i^{\text{hidden}} +\end{pmatrix} +\begin{pmatrix} +1 \\ +x_j +\end{pmatrix} +\end{aligned} +\end{split}\]
    +
    +
    +

    Final technicalities I#

    +

    The result after weighting the inputs at the \(i\)-th hidden neuron can be written as a vector:

    +
    +\[\begin{split} +\begin{aligned} +\boldsymbol{z}_{i}^{\text{hidden}} &= \Big( b_i^{\text{hidden}} + w_i^{\text{hidden}}x_1 , \ b_i^{\text{hidden}} + w_i^{\text{hidden}} x_2, \ \dots \, , \ b_i^{\text{hidden}} + w_i^{\text{hidden}} x_N\Big) \\ +&= +\begin{pmatrix} + b_i^{\text{hidden}} & w_i^{\text{hidden}} +\end{pmatrix} +\begin{pmatrix} +1 & 1 & \dots & 1 \\ +x_1 & x_2 & \dots & x_N +\end{pmatrix} \\ +&= \boldsymbol{p}_{i, \text{hidden}}^T X +\end{aligned} +\end{split}\]
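In code, the row of ones carrying the bias can simply be stacked on top of the inputs, so that the whole weighted sum becomes a single matrix product (a small sketch with arbitrary, illustrative values for the bias and weight):

import numpy as np

x = np.array([0.0, 0.5, 1.0])                                          # N = 3 input values
X = np.concatenate((np.ones((1, x.size)), x.reshape(1, -1)), axis=0)   # shape (2, N), first row is ones

p_i = np.array([0.1, -0.3])   # [bias, weight] for the i-th hidden neuron
z_i = p_i @ X                 # equals 0.1 - 0.3*x, one weighted sum per input value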
    +
    +
    +

    Final technicalities II#

    +

The vectors \(\boldsymbol{p}_{i, \text{hidden}}^T\) constitute the rows of \(P_{\text{hidden} }\), which contains the weights and biases the neural network must adjust in order to minimize (9).

    +

    After having found \(\boldsymbol{z}_{i}^{\text{hidden}} \) for every \(i\)-th neuron within the hidden layer, the vector will be sent to an activation function \(a_i(\boldsymbol{z})\).

    +

    In this example, the sigmoid function has been chosen to be the activation function for each hidden neuron:

    +
    +\[ +f(z) = \frac{1}{1 + \exp{(-z)}} +\]
    +

It is possible to use other activation functions for the hidden layer as well.

    +

    The output \(\boldsymbol{x}_i^{\text{hidden}}\) from each \(i\)-th hidden neuron is:

    +
    +\[ +\boldsymbol{x}_i^{\text{hidden} } = f\big( \boldsymbol{z}_{i}^{\text{hidden}} \big) +\]
    +

    The outputs \(\boldsymbol{x}_i^{\text{hidden} } \) are then sent to the output layer.

    +

The output layer consists of one neuron in this case, and combines the outputs from the neurons in the hidden layer using some weights \(w_i^{\text{output}}\) and biases \(b_i^{\text{output}}\). That is, it is assumed that the number of neurons in the output layer is one.

    +
    +
    +

    Final technicalities III#

    +

The procedure of weighting the output of neuron \(j\) in the hidden layer into the \(i\)-th neuron in the output layer is similar to the one for the hidden layer described previously.

    +
    +\[\begin{split} +\begin{aligned} +z_{1,j}^{\text{output}} & = +\begin{pmatrix} +b_1^{\text{output}} & \boldsymbol{w}_1^{\text{output}} +\end{pmatrix} +\begin{pmatrix} +1 \\ +\boldsymbol{x}_j^{\text{hidden}} +\end{pmatrix} +\end{aligned} +\end{split}\]
    +
    +
    +

    Final technicalities IV#

    +

    Expressing \(z_{1,j}^{\text{output}}\) as a vector gives the following way of weighting the inputs from the hidden layer:

    +
    +\[\begin{split} +\boldsymbol{z}_{1}^{\text{output}} = +\begin{pmatrix} +b_1^{\text{output}} & \boldsymbol{w}_1^{\text{output}} +\end{pmatrix} +\begin{pmatrix} +1 & 1 & \dots & 1 \\ +\boldsymbol{x}_1^{\text{hidden}} & \boldsymbol{x}_2^{\text{hidden}} & \dots & \boldsymbol{x}_N^{\text{hidden}} +\end{pmatrix} +\end{split}\]
    +

    In this case we seek a continuous range of values since we are approximating a function. This means that after computing \(\boldsymbol{z}_{1}^{\text{output}}\) the neural network has finished its feed forward step, and \(\boldsymbol{z}_{1}^{\text{output}}\) is the final output of the network.

    +
    +
    +

    Back propagation#

    +

    The next step is to decide how the parameters should be changed such that they minimize the cost function.

    +

    The chosen cost function for this problem is

    +
+\[ +C(\boldsymbol{x}, P) = \frac{1}{N} \sum_i \big(g_t'(x_i, P) - ( -\gamma g_t(x_i, P)) \big)^2 +\]
    +

    In order to minimize the cost function, an optimization method must be chosen.

    +

    Here, gradient descent with a constant step size has been chosen.

    +
    +
    +

    Gradient descent#

    +

The idea of the gradient descent algorithm is to update the parameters in the direction in which the cost function decreases, so that it approaches a minimum.

    +

    In general, the update of some parameters \(\boldsymbol{\omega}\) given a cost +function defined by some weights \(\boldsymbol{\omega}\), \(C(\boldsymbol{x}, +\boldsymbol{\omega})\), goes as follows:

    +
    +\[ +\boldsymbol{\omega}_{\text{new} } = \boldsymbol{\omega} - \lambda \nabla_{\boldsymbol{\omega}} C(\boldsymbol{x}, \boldsymbol{\omega}) +\]
    +

    for a number of iterations or until \( \big|\big| \boldsymbol{\omega}_{\text{new} } - \boldsymbol{\omega} \big|\big|\) becomes smaller than some given tolerance.

    +

The value of \(\lambda\) decides how large a step the algorithm takes in the direction of the negative gradient \(-\nabla_{\boldsymbol{\omega}} C(\boldsymbol{x}, \boldsymbol{\omega})\). The notation \(\nabla_{\boldsymbol{\omega}}\) expresses the gradient with respect to the elements in \(\boldsymbol{\omega}\).

    +

    In our case, we have to minimize the cost function \(C(\boldsymbol{x}, P)\) with +respect to the two sets of weights and biases, that is for the hidden +layer \(P_{\text{hidden} }\) and for the output layer \(P_{\text{output} +}\) .

    +

This means that \(P_{\text{hidden} }\) and \(P_{\text{output} }\) are updated by

    +
    +\[\begin{split} +\begin{aligned} +P_{\text{hidden},\text{new}} &= P_{\text{hidden}} - \lambda \nabla_{P_{\text{hidden}}} C(\boldsymbol{x}, P) \\ +P_{\text{output},\text{new}} &= P_{\text{output}} - \lambda \nabla_{P_{\text{output}}} C(\boldsymbol{x}, P) +\end{aligned} +\end{split}\]
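The following stand-alone sketch shows the update rule and the stopping criterion on a toy one-parameter quadratic instead of the network cost (the function, step size and tolerance are only illustrative):

def C(w):
    return (w - 3.0)**2          # toy cost function with minimum at w = 3

def dC(w):
    return 2.0*(w - 3.0)         # its gradient

w = 0.0                          # initial guess
lmb = 0.1                        # step size
tol = 1e-8

for _ in range(10000):
    w_new = w - lmb*dC(w)        # gradient descent update
    if abs(w_new - w) < tol:     # stop when the update becomes negligible
        break
    w = w_new

print(w)                         # close to 3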
    +
    +
    +

    The code for solving the ODE#

    +
    +
    +
    import autograd.numpy as np
    +from autograd import grad, elementwise_grad
    +import autograd.numpy.random as npr
    +from matplotlib import pyplot as plt
    +
    +def sigmoid(z):
    +    return 1/(1 + np.exp(-z))
    +
    +# Assuming one input, hidden, and output layer
    +def neural_network(params, x):
    +
+    # Find the weights (and biases) for the hidden and output layer.
+    # Assume that params is a list of parameters for each layer.
+    # The biases are the first element of each array in params,
+    # and the weights are the remaining elements in each array in params.
    +
    +    w_hidden = params[0]
    +    w_output = params[1]
    +
+    # Assumes the input x is a one-dimensional array
    +    num_values = np.size(x)
    +    x = x.reshape(-1, num_values)
    +
    +    # Assume that the input layer does nothing to the input x
    +    x_input = x
    +
    +    ## Hidden layer:
    +
    +    # Add a row of ones to include bias
    +    x_input = np.concatenate((np.ones((1,num_values)), x_input ), axis = 0)
    +
    +    z_hidden = np.matmul(w_hidden, x_input)
    +    x_hidden = sigmoid(z_hidden)
    +
    +    ## Output layer:
    +
    +    # Include bias:
    +    x_hidden = np.concatenate((np.ones((1,num_values)), x_hidden ), axis = 0)
    +
    +    z_output = np.matmul(w_output, x_hidden)
    +    x_output = z_output
    +
    +    return x_output
    +
    +# The trial solution using the deep neural network:
    +def g_trial(x,params, g0 = 10):
    +    return g0 + x*neural_network(params,x)
    +
    +# The right side of the ODE:
    +def g(x, g_trial, gamma = 2):
    +    return -gamma*g_trial
    +
    +# The cost function:
    +def cost_function(P, x):
    +
    +    # Evaluate the trial function with the current parameters P
    +    g_t = g_trial(x,P)
    +
    +    # Find the derivative w.r.t x of the neural network
    +    d_net_out = elementwise_grad(neural_network,1)(P,x)
    +
    +    # Find the derivative w.r.t x of the trial function
    +    d_g_t = elementwise_grad(g_trial,0)(x,P)
    +
    +    # The right side of the ODE
    +    func = g(x, g_t)
    +
    +    err_sqr = (d_g_t - func)**2
    +    cost_sum = np.sum(err_sqr)
    +
    +    return cost_sum / np.size(err_sqr)
    +
    +# Solve the exponential decay ODE using neural network with one input, hidden, and output layer
    +def solve_ode_neural_network(x, num_neurons_hidden, num_iter, lmb):
    +    ## Set up initial weights and biases
    +
    +    # For the hidden layer
    +    p0 = npr.randn(num_neurons_hidden, 2 )
    +
    +    # For the output layer
    +    p1 = npr.randn(1, num_neurons_hidden + 1 ) # +1 since bias is included
    +
    +    P = [p0, p1]
    +
    +    print('Initial cost: %g'%cost_function(P, x))
    +
    +    ## Start finding the optimal weights using gradient descent
    +
    +    # Find the Python function that represents the gradient of the cost function
    +    # w.r.t the 0-th input argument -- that is the weights and biases in the hidden and output layer
    +    cost_function_grad = grad(cost_function,0)
    +
    +    # Let the update be done num_iter times
    +    for i in range(num_iter):
    +        # Evaluate the gradient at the current weights and biases in P.
    +        # The cost_grad consist now of two arrays;
    +        # one for the gradient w.r.t P_hidden and
    +        # one for the gradient w.r.t P_output
    +        cost_grad =  cost_function_grad(P, x)
    +
    +        P[0] = P[0] - lmb * cost_grad[0]
    +        P[1] = P[1] - lmb * cost_grad[1]
    +
    +    print('Final cost: %g'%cost_function(P, x))
    +
    +    return P
    +
    +def g_analytic(x, gamma = 2, g0 = 10):
    +    return g0*np.exp(-gamma*x)
    +
    +# Solve the given problem
    +if __name__ == '__main__':
+    # Set the seed such that the weights and biases are initialized
+    # with the same values for every run.
    +    npr.seed(15)
    +
+    ## Decide the values of arguments to the function to solve
    +    N = 10
    +    x = np.linspace(0, 1, N)
    +
    +    ## Set up the initial parameters
    +    num_hidden_neurons = 10
    +    num_iter = 10000
    +    lmb = 0.001
    +
    +    # Use the network
    +    P = solve_ode_neural_network(x, num_hidden_neurons, num_iter, lmb)
    +
    +    # Print the deviation from the trial solution and true solution
    +    res = g_trial(x,P)
    +    res_analytical = g_analytic(x)
    +
    +    print('Max absolute difference: %g'%np.max(np.abs(res - res_analytical)))
    +
    +    # Plot the results
    +    plt.figure(figsize=(10,10))
    +
    +    plt.title('Performance of neural network solving an ODE compared to the analytical solution')
    +    plt.plot(x, res_analytical)
    +    plt.plot(x, res[0,:])
    +    plt.legend(['analytical','nn'])
    +    plt.xlabel('x')
    +    plt.ylabel('g(x)')
    +    plt.show()
    +
    +
    +
    +
    +
    +
    +

    The network with one input layer, specified number of hidden layers, and one output layer#

    +

It is also possible to extend the construction of our network into a more general one, allowing the network to contain more than one hidden layer.

    +

The number of neurons within each hidden layer is given as a list of integers in the program below.

    +
    +
    +
    import autograd.numpy as np
    +from autograd import grad, elementwise_grad
    +import autograd.numpy.random as npr
    +from matplotlib import pyplot as plt
    +
    +def sigmoid(z):
    +    return 1/(1 + np.exp(-z))
    +
    +# The neural network with one input layer and one output layer,
    +# but with number of hidden layers specified by the user.
    +def deep_neural_network(deep_params, x):
    +    # N_hidden is the number of hidden layers
    +    # deep_params is a list, len() should be used
    +    N_hidden = len(deep_params) - 1 # -1 since params consists of
    +                                        # parameters to all the hidden
    +                                        # layers AND the output layer.
    +
+    # Assumes the input x is a one-dimensional array
    +    num_values = np.size(x)
    +    x = x.reshape(-1, num_values)
    +
    +    # Assume that the input layer does nothing to the input x
    +    x_input = x
    +
    +    # Due to multiple hidden layers, define a variable referencing to the
    +    # output of the previous layer:
    +    x_prev = x_input
    +
    +    ## Hidden layers:
    +
    +    for l in range(N_hidden):
+        # From the list of parameters P, find the correct weights and bias for this layer
    +        w_hidden = deep_params[l]
    +
    +        # Add a row of ones to include bias
    +        x_prev = np.concatenate((np.ones((1,num_values)), x_prev ), axis = 0)
    +
    +        z_hidden = np.matmul(w_hidden, x_prev)
    +        x_hidden = sigmoid(z_hidden)
    +
    +        # Update x_prev such that next layer can use the output from this layer
    +        x_prev = x_hidden
    +
    +    ## Output layer:
    +
    +    # Get the weights and bias for this layer
    +    w_output = deep_params[-1]
    +
    +    # Include bias:
    +    x_prev = np.concatenate((np.ones((1,num_values)), x_prev), axis = 0)
    +
    +    z_output = np.matmul(w_output, x_prev)
    +    x_output = z_output
    +
    +    return x_output
    +
    +# The trial solution using the deep neural network:
    +def g_trial_deep(x,params, g0 = 10):
    +    return g0 + x*deep_neural_network(params, x)
    +
    +# The right side of the ODE:
    +def g(x, g_trial, gamma = 2):
    +    return -gamma*g_trial
    +
    +# The same cost function as before, but calls deep_neural_network instead.
    +def cost_function_deep(P, x):
    +
    +    # Evaluate the trial function with the current parameters P
    +    g_t = g_trial_deep(x,P)
    +
    +    # Find the derivative w.r.t x of the neural network
    +    d_net_out = elementwise_grad(deep_neural_network,1)(P,x)
    +
    +    # Find the derivative w.r.t x of the trial function
    +    d_g_t = elementwise_grad(g_trial_deep,0)(x,P)
    +
    +    # The right side of the ODE
    +    func = g(x, g_t)
    +
    +    err_sqr = (d_g_t - func)**2
    +    cost_sum = np.sum(err_sqr)
    +
    +    return cost_sum / np.size(err_sqr)
    +
    +# Solve the exponential decay ODE using neural network with one input and one output layer,
    +# but with specified number of hidden layers from the user.
    +def solve_ode_deep_neural_network(x, num_neurons, num_iter, lmb):
    +    # num_hidden_neurons is now a list of number of neurons within each hidden layer
    +
    +    # The number of elements in the list num_hidden_neurons thus represents
    +    # the number of hidden layers.
    +
    +    # Find the number of hidden layers:
    +    N_hidden = np.size(num_neurons)
    +
    +    ## Set up initial weights and biases
    +
    +    # Initialize the list of parameters:
    +    P = [None]*(N_hidden + 1) # + 1 to include the output layer
    +
    +    P[0] = npr.randn(num_neurons[0], 2 )
    +    for l in range(1,N_hidden):
    +        P[l] = npr.randn(num_neurons[l], num_neurons[l-1] + 1) # +1 to include bias
    +
    +    # For the output layer
    +    P[-1] = npr.randn(1, num_neurons[-1] + 1 ) # +1 since bias is included
    +
    +    print('Initial cost: %g'%cost_function_deep(P, x))
    +
    +    ## Start finding the optimal weights using gradient descent
    +
    +    # Find the Python function that represents the gradient of the cost function
    +    # w.r.t the 0-th input argument -- that is the weights and biases in the hidden and output layer
    +    cost_function_deep_grad = grad(cost_function_deep,0)
    +
    +    # Let the update be done num_iter times
    +    for i in range(num_iter):
    +        # Evaluate the gradient at the current weights and biases in P.
    +        # The cost_grad consist now of N_hidden + 1 arrays; the gradient w.r.t the weights and biases
    +        # in the hidden layers and output layers evaluated at x.
    +        cost_deep_grad =  cost_function_deep_grad(P, x)
    +
    +        for l in range(N_hidden+1):
    +            P[l] = P[l] - lmb * cost_deep_grad[l]
    +
    +    print('Final cost: %g'%cost_function_deep(P, x))
    +
    +    return P
    +
    +def g_analytic(x, gamma = 2, g0 = 10):
    +    return g0*np.exp(-gamma*x)
    +
    +# Solve the given problem
    +if __name__ == '__main__':
    +    npr.seed(15)
    +
+    ## Decide the values of arguments to the function to solve
    +    N = 10
    +    x = np.linspace(0, 1, N)
    +
    +    ## Set up the initial parameters
    +    num_hidden_neurons = np.array([10,10])
    +    num_iter = 10000
    +    lmb = 0.001
    +
    +    P = solve_ode_deep_neural_network(x, num_hidden_neurons, num_iter, lmb)
    +
    +    res = g_trial_deep(x,P)
    +    res_analytical = g_analytic(x)
    +
    +    plt.figure(figsize=(10,10))
    +
    +    plt.title('Performance of a deep neural network solving an ODE compared to the analytical solution')
    +    plt.plot(x, res_analytical)
    +    plt.plot(x, res[0,:])
    +    plt.legend(['analytical','dnn'])
    +    plt.ylabel('g(x)')
    +    plt.show()
    +
    +
    +
    +
    +
    +
    +

    Example: Population growth#

    +

    A logistic model of population growth assumes that a population converges toward an equilibrium. +The population growth can be modeled by

    + +
    +
    +\[ +\begin{equation} \label{log} \tag{10} + g'(t) = \alpha g(t)(A - g(t)) +\end{equation} +\]
    +

    where \(g(t)\) is the population density at time \(t\), \(\alpha > 0\) the growth rate and \(A > 0\) is the maximum population number in the environment. +Also, at \(t = 0\) the population has the size \(g(0) = g_0\), where \(g_0\) is some chosen constant.

    +

In this example, a network similar to the one used for the exponential decay, implemented with Autograd, is used to solve the equation. However, as such an implementation might suffer from, e.g., numerical instability and long execution times (this may be more apparent in the examples solving PDEs), using a library like TensorFlow is recommended for larger problems. Here, we stay with the simpler approach and, for comparison, also implement the forward Euler method.

    +
    +
    +

    Setting up the problem#

    +

    Here, we will model a population \(g(t)\) in an environment having carrying capacity \(A\). +The population follows the model

    + +
    +
    +\[ +\begin{equation} \label{solveode_population} \tag{11} +g'(t) = \alpha g(t)(A - g(t)) +\end{equation} +\]
    +

    where \(g(0) = g_0\).

    +

    In this example, we let \(\alpha = 2\), \(A = 1\), and \(g_0 = 1.2\).

    +
    +
    +

    The trial solution#

    +

    We will get a slightly different trial solution, as the boundary conditions are different +compared to the case for exponential decay.

    +

    A possible trial solution satisfying the condition \(g(0) = g_0\) could be

    +
+\[ +g_t(t, P) = g_0 + t \cdot N(t,P) +\]
    +

    with \(N(t,P)\) being the output from the neural network with weights and biases for each layer collected in the set \(P\).

    +

    The analytical solution is

    +
    +\[ +g(t) = \frac{Ag_0}{g_0 + (A - g_0)\exp(-\alpha A t)} +\]
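As a quick consistency check (a small Autograd sketch with the parameter values of this example, not part of the solver itself), one can verify numerically that this expression satisfies \(g'(t) = \alpha g(t)(A - g(t))\):

import autograd.numpy as np
from autograd import elementwise_grad

alpha, A, g0 = 2, 1, 1.2

def g_analytic(t):
    return A*g0/(g0 + (A - g0)*np.exp(-alpha*A*t))

t = np.linspace(0, 1, 11)
lhs = elementwise_grad(g_analytic)(t)             # g'(t) by automatic differentiation
rhs = alpha*g_analytic(t)*(A - g_analytic(t))     # right-hand side of the ODE
print(np.max(np.abs(lhs - rhs)))                  # close to machine precision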
    +
    +
    +

    The program using Autograd#

    +

The network will be similar to the one for the exponential decay example, but with some small modifications for our problem.

    +
    +
    +
    import autograd.numpy as np
    +from autograd import grad, elementwise_grad
    +import autograd.numpy.random as npr
    +from matplotlib import pyplot as plt
    +
    +def sigmoid(z):
    +    return 1/(1 + np.exp(-z))
    +
    +# Function to get the parameters.
+# Done such that one can easily change the parameters to one's liking.
    +def get_parameters():
    +    alpha = 2
    +    A = 1
    +    g0 = 1.2
    +    return alpha, A, g0
    +
    +def deep_neural_network(deep_params, x):
    +    # N_hidden is the number of hidden layers
    +    # deep_params is a list, len() should be used
    +    N_hidden = len(deep_params) - 1 # -1 since params consists of
    +                                        # parameters to all the hidden
    +                                        # layers AND the output layer.
    +
+    # Assumes the input x is a one-dimensional array
    +    num_values = np.size(x)
    +    x = x.reshape(-1, num_values)
    +
    +    # Assume that the input layer does nothing to the input x
    +    x_input = x
    +
    +    # Due to multiple hidden layers, define a variable referencing to the
    +    # output of the previous layer:
    +    x_prev = x_input
    +
    +    ## Hidden layers:
    +
    +    for l in range(N_hidden):
+        # From the list of parameters P, find the correct weights and bias for this layer
    +        w_hidden = deep_params[l]
    +
    +        # Add a row of ones to include bias
    +        x_prev = np.concatenate((np.ones((1,num_values)), x_prev ), axis = 0)
    +
    +        z_hidden = np.matmul(w_hidden, x_prev)
    +        x_hidden = sigmoid(z_hidden)
    +
    +        # Update x_prev such that next layer can use the output from this layer
    +        x_prev = x_hidden
    +
    +    ## Output layer:
    +
    +    # Get the weights and bias for this layer
    +    w_output = deep_params[-1]
    +
    +    # Include bias:
    +    x_prev = np.concatenate((np.ones((1,num_values)), x_prev), axis = 0)
    +
    +    z_output = np.matmul(w_output, x_prev)
    +    x_output = z_output
    +
    +    return x_output
    +
    +
    +
    +
    +def cost_function_deep(P, x):
    +
    +    # Evaluate the trial function with the current parameters P
    +    g_t = g_trial_deep(x,P)
    +
    +    # Find the derivative w.r.t x of the trial function
    +    d_g_t = elementwise_grad(g_trial_deep,0)(x,P)
    +
    +    # The right side of the ODE
    +    func = f(x, g_t)
    +
    +    err_sqr = (d_g_t - func)**2
    +    cost_sum = np.sum(err_sqr)
    +
    +    return cost_sum / np.size(err_sqr)
    +
    +# The right side of the ODE:
    +def f(x, g_trial):
    +    alpha,A, g0 = get_parameters()
    +    return alpha*g_trial*(A - g_trial)
    +
    +# The trial solution using the deep neural network:
    +def g_trial_deep(x, params):
    +    alpha,A, g0 = get_parameters()
    +    return g0 + x*deep_neural_network(params,x)
    +
    +# The analytical solution:
    +def g_analytic(t):
    +    alpha,A, g0 = get_parameters()
    +    return A*g0/(g0 + (A - g0)*np.exp(-alpha*A*t))
    +
    +def solve_ode_deep_neural_network(x, num_neurons, num_iter, lmb):
    +    # num_hidden_neurons is now a list of number of neurons within each hidden layer
    +
    +    # Find the number of hidden layers:
    +    N_hidden = np.size(num_neurons)
    +
+    ## Set up initial weights and biases
    +
    +    # Initialize the list of parameters:
    +    P = [None]*(N_hidden + 1) # + 1 to include the output layer
    +
    +    P[0] = npr.randn(num_neurons[0], 2 )
    +    for l in range(1,N_hidden):
    +        P[l] = npr.randn(num_neurons[l], num_neurons[l-1] + 1) # +1 to include bias
    +
    +    # For the output layer
    +    P[-1] = npr.randn(1, num_neurons[-1] + 1 ) # +1 since bias is included
    +
    +    print('Initial cost: %g'%cost_function_deep(P, x))
    +
+    ## Start finding the optimal weights using gradient descent
    +
    +    # Find the Python function that represents the gradient of the cost function
    +    # w.r.t the 0-th input argument -- that is the weights and biases in the hidden and output layer
    +    cost_function_deep_grad = grad(cost_function_deep,0)
    +
    +    # Let the update be done num_iter times
    +    for i in range(num_iter):
    +        # Evaluate the gradient at the current weights and biases in P.
    +        # The cost_grad consist now of N_hidden + 1 arrays; the gradient w.r.t the weights and biases
    +        # in the hidden layers and output layers evaluated at x.
    +        cost_deep_grad =  cost_function_deep_grad(P, x)
    +
    +        for l in range(N_hidden+1):
    +            P[l] = P[l] - lmb * cost_deep_grad[l]
    +
    +    print('Final cost: %g'%cost_function_deep(P, x))
    +
    +    return P
    +
    +if __name__ == '__main__':
    +    npr.seed(4155)
    +
+    ## Decide the values of arguments to the function to solve
    +    Nt = 10
    +    T = 1
    +    t = np.linspace(0,T, Nt)
    +
    +    ## Set up the initial parameters
    +    num_hidden_neurons = [100, 50, 25]
    +    num_iter = 1000
    +    lmb = 1e-3
    +
    +    P = solve_ode_deep_neural_network(t, num_hidden_neurons, num_iter, lmb)
    +
    +    g_dnn_ag = g_trial_deep(t,P)
    +    g_analytical = g_analytic(t)
    +
+    # Find the maximum absolute difference between the solutions:
    +    diff_ag = np.max(np.abs(g_dnn_ag - g_analytical))
    +    print("The max absolute difference between the solutions is: %g"%diff_ag)
    +
    +    plt.figure(figsize=(10,10))
    +
    +    plt.title('Performance of neural network solving an ODE compared to the analytical solution')
    +    plt.plot(t, g_analytical)
    +    plt.plot(t, g_dnn_ag[0,:])
    +    plt.legend(['analytical','nn'])
    +    plt.xlabel('t')
    +    plt.ylabel('g(t)')
    +
    +    plt.show()
    +
    +
    +
    +
    +
    +
    +

    Using forward Euler to solve the ODE#

    +

A straightforward way of solving an ODE numerically is to use Euler’s method.

    +

Euler’s method uses a Taylor series to approximate the value of a function \(f\) at a step \(\Delta x\) away from \(x\):

    +
    +\[ +f(x + \Delta x) \approx f(x) + \Delta x f'(x) +\]
    +

    In our case, using Euler’s method to approximate the value of \(g\) at a step \(\Delta t\) from \(t\) yields

    +
    +\[\begin{split} +\begin{aligned} + g(t + \Delta t) &\approx g(t) + \Delta t g'(t) \\ + &= g(t) + \Delta t \big(\alpha g(t)(A - g(t))\big) +\end{aligned} +\end{split}\]
    +

    along with the condition that \(g(0) = g_0\).

    +

Let \(t_i = i \cdot \Delta t\) for \(i = 0, \dots, N_t-1\), where \(\Delta t = \frac{T}{N_t-1}\), \(T\) is the final time our solver must reach and \(N_t\) is the number of values of \(t \in [0, T]\).

    +

    For \(i \geq 1\), we have that

    +
    +\[\begin{split} +\begin{aligned} +t_i &= i\Delta t \\ +&= (i - 1)\Delta t + \Delta t \\ +&= t_{i-1} + \Delta t +\end{aligned} +\end{split}\]
    +

    Now, if \(g_i = g(t_i)\) then

    + +
    +
    +\[\begin{split} +\begin{equation} + \begin{aligned} + g_i &= g(t_i) \\ + &= g(t_{i-1} + \Delta t) \\ + &\approx g(t_{i-1}) + \Delta t \big(\alpha g(t_{i-1})(A - g(t_{i-1}))\big) \\ + &= g_{i-1} + \Delta t \big(\alpha g_{i-1}(A - g_{i-1})\big) + \end{aligned} +\end{equation} \label{odenum} \tag{12} +\end{split}\]
    +

for \(i \geq 1\), with \(g_0 = g(t_0) = g(0)\) given by the initial condition.

    +

Equation (12) can be implemented in the following way, extending the Autograd program for the network above:

    +
    +
    +
    # Assume that all function definitions from the example program using Autograd
    +# are located here.
    +
    +if __name__ == '__main__':
    +    npr.seed(4155)
    +
+    ## Decide the values of arguments to the function to solve
    +    Nt = 10
    +    T = 1
    +    t = np.linspace(0,T, Nt)
    +
    +    ## Set up the initial parameters
    +    num_hidden_neurons = [100,50,25]
    +    num_iter = 1000
    +    lmb = 1e-3
    +
    +    P = solve_ode_deep_neural_network(t, num_hidden_neurons, num_iter, lmb)
    +
    +    g_dnn_ag = g_trial_deep(t,P)
    +    g_analytical = g_analytic(t)
    +
+    # Find the maximum absolute difference between the solutions:
    +    diff_ag = np.max(np.abs(g_dnn_ag - g_analytical))
    +    print("The max absolute difference between the solutions is: %g"%diff_ag)
    +
    +    plt.figure(figsize=(10,10))
    +
    +    plt.title('Performance of neural network solving an ODE compared to the analytical solution')
    +    plt.plot(t, g_analytical)
    +    plt.plot(t, g_dnn_ag[0,:])
    +    plt.legend(['analytical','nn'])
    +    plt.xlabel('t')
    +    plt.ylabel('g(t)')
    +
+    ## Find an approximation to the function using forward Euler
    +
    +    alpha, A, g0 = get_parameters()
    +    dt = T/(Nt - 1)
    +
    +    # Perform forward Euler to solve the ODE
    +    g_euler = np.zeros(Nt)
    +    g_euler[0] = g0
    +
    +    for i in range(1,Nt):
    +        g_euler[i] = g_euler[i-1] + dt*(alpha*g_euler[i-1]*(A - g_euler[i-1]))
    +
    +    # Print the errors done by each method
    +    diff1 = np.max(np.abs(g_euler - g_analytical))
    +    diff2 = np.max(np.abs(g_dnn_ag[0,:] - g_analytical))
    +
    +    print('Max absolute difference between Euler method and analytical: %g'%diff1)
    +    print('Max absolute difference between deep neural network and analytical: %g'%diff2)
    +
    +    # Plot results
    +    plt.figure(figsize=(10,10))
    +
    +    plt.plot(t,g_euler)
    +    plt.plot(t,g_analytical)
    +    plt.plot(t,g_dnn_ag[0,:])
    +
    +    plt.legend(['euler','analytical','dnn'])
    +    plt.xlabel('Time t')
    +    plt.ylabel('g(t)')
    +
    +    plt.show()
    +
    +
    +
    +
    +
    +
    +

    Example: Solving the one dimensional Poisson equation#

    +

    The Poisson equation for \(g(x)\) in one dimension is

    + +
    +
    +\[ +\begin{equation} \label{poisson} \tag{13} + -g''(x) = f(x) +\end{equation} +\]
    +

    where \(f(x)\) is a given function for \(x \in (0,1)\).

    +

The conditions that \(g(x)\) is chosen to fulfill are

    +
    +\[\begin{split} +\begin{align*} + g(0) &= 0 \\ + g(1) &= 0 +\end{align*} +\end{split}\]
    +

This equation can be solved numerically using, e.g., Autograd or TensorFlow. The results from the networks can then be compared to the analytical solution. In addition, it is interesting to see how a standard numerical scheme for second-order ODEs compares to the neural networks.

    +
    +
    +

    The specific equation to solve for#

    +

    Here, the function \(g(x)\) to solve for follows the equation

    +
    +\[ +-g''(x) = f(x),\qquad x \in (0,1) +\]
    +

    where \(f(x)\) is a given function, along with the chosen conditions

    + +
    +
    +\[ +\begin{aligned} +g(0) = g(1) = 0 +\end{aligned}\label{cond} \tag{14} +\]
    +

    In this example, we consider the case when \(f(x) = (3x + x^2)\exp(x)\).

    +

    For this case, a possible trial solution satisfying the conditions could be

    +
    +\[ +g_t(x) = x \cdot (1-x) \cdot N(P,x) +\]
    +

    The analytical solution for this problem is

    +
    +\[ +g(x) = x(1 - x)\exp(x) +\]
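Before setting up the network, it can be reassuring to check with a few lines of Autograd (a small sketch, not part of the solver itself) that this analytical solution indeed satisfies \(-g''(x) = f(x)\):

import autograd.numpy as np
from autograd import elementwise_grad

def f(x):
    return (3*x + x**2)*np.exp(x)

def g_analytic(x):
    return x*(1 - x)*np.exp(x)

x = np.linspace(0, 1, 11)
d2g = elementwise_grad(elementwise_grad(g_analytic))(x)   # second derivative via nested automatic differentiation
print(np.max(np.abs(-d2g - f(x))))                        # close to machine precision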
    +
    +
    +

    Solving the equation using Autograd#

    +
    +
    +
    import autograd.numpy as np
    +from autograd import grad, elementwise_grad
    +import autograd.numpy.random as npr
    +from matplotlib import pyplot as plt
    +
    +def sigmoid(z):
    +    return 1/(1 + np.exp(-z))
    +
    +def deep_neural_network(deep_params, x):
    +    # N_hidden is the number of hidden layers
    +    # deep_params is a list, len() should be used
    +    N_hidden = len(deep_params) - 1 # -1 since params consists of
    +                                        # parameters to all the hidden
    +                                        # layers AND the output layer.
    +
+    # Assumes the input x is a one-dimensional array
    +    num_values = np.size(x)
    +    x = x.reshape(-1, num_values)
    +
    +    # Assume that the input layer does nothing to the input x
    +    x_input = x
    +
    +    # Due to multiple hidden layers, define a variable referencing to the
    +    # output of the previous layer:
    +    x_prev = x_input
    +
    +    ## Hidden layers:
    +
    +    for l in range(N_hidden):
+        # From the list of parameters P, find the correct weights and bias for this layer
    +        w_hidden = deep_params[l]
    +
    +        # Add a row of ones to include bias
    +        x_prev = np.concatenate((np.ones((1,num_values)), x_prev ), axis = 0)
    +
    +        z_hidden = np.matmul(w_hidden, x_prev)
    +        x_hidden = sigmoid(z_hidden)
    +
    +        # Update x_prev such that next layer can use the output from this layer
    +        x_prev = x_hidden
    +
    +    ## Output layer:
    +
    +    # Get the weights and bias for this layer
    +    w_output = deep_params[-1]
    +
    +    # Include bias:
    +    x_prev = np.concatenate((np.ones((1,num_values)), x_prev), axis = 0)
    +
    +    z_output = np.matmul(w_output, x_prev)
    +    x_output = z_output
    +
    +    return x_output
    +
    +
    +def solve_ode_deep_neural_network(x, num_neurons, num_iter, lmb):
    +    # num_hidden_neurons is now a list of number of neurons within each hidden layer
    +
    +    # Find the number of hidden layers:
    +    N_hidden = np.size(num_neurons)
    +
+    ## Set up initial weights and biases
    +
    +    # Initialize the list of parameters:
    +    P = [None]*(N_hidden + 1) # + 1 to include the output layer
    +
    +    P[0] = npr.randn(num_neurons[0], 2 )
    +    for l in range(1,N_hidden):
    +        P[l] = npr.randn(num_neurons[l], num_neurons[l-1] + 1) # +1 to include bias
    +
    +    # For the output layer
    +    P[-1] = npr.randn(1, num_neurons[-1] + 1 ) # +1 since bias is included
    +
    +    print('Initial cost: %g'%cost_function_deep(P, x))
    +
+    ## Start finding the optimal weights using gradient descent
    +
    +    # Find the Python function that represents the gradient of the cost function
    +    # w.r.t the 0-th input argument -- that is the weights and biases in the hidden and output layer
    +    cost_function_deep_grad = grad(cost_function_deep,0)
    +
    +    # Let the update be done num_iter times
    +    for i in range(num_iter):
    +        # Evaluate the gradient at the current weights and biases in P.
    +        # The cost_grad consist now of N_hidden + 1 arrays; the gradient w.r.t the weights and biases
    +        # in the hidden layers and output layers evaluated at x.
    +        cost_deep_grad =  cost_function_deep_grad(P, x)
    +
    +        for l in range(N_hidden+1):
    +            P[l] = P[l] - lmb * cost_deep_grad[l]
    +
    +    print('Final cost: %g'%cost_function_deep(P, x))
    +
    +    return P
    +
    +## Set up the cost function specified for this Poisson equation:
    +
    +# The right side of the ODE
    +def f(x):
    +    return (3*x + x**2)*np.exp(x)
    +
    +def cost_function_deep(P, x):
    +
    +    # Evaluate the trial function with the current parameters P
    +    g_t = g_trial_deep(x,P)
    +
    +    # Find the derivative w.r.t x of the trial function
    +    d2_g_t = elementwise_grad(elementwise_grad(g_trial_deep,0))(x,P)
    +
    +    right_side = f(x)
    +
    +    err_sqr = (-d2_g_t - right_side)**2
    +    cost_sum = np.sum(err_sqr)
    +
    +    return cost_sum/np.size(err_sqr)
    +
    +# The trial solution:
    +def g_trial_deep(x,P):
    +    return x*(1-x)*deep_neural_network(P,x)
    +
    +# The analytic solution;
    +def g_analytic(x):
    +    return x*(1-x)*np.exp(x)
    +
    +if __name__ == '__main__':
    +    npr.seed(4155)
    +
+    ## Decide the values of arguments to the function to solve
    +    Nx = 10
    +    x = np.linspace(0,1, Nx)
    +
    +    ## Set up the initial parameters
    +    num_hidden_neurons = [200,100]
    +    num_iter = 1000
    +    lmb = 1e-3
    +
    +    P = solve_ode_deep_neural_network(x, num_hidden_neurons, num_iter, lmb)
    +
    +    g_dnn_ag = g_trial_deep(x,P)
    +    g_analytical = g_analytic(x)
    +
+    # Find the maximum absolute difference between the solutions:
    +    max_diff = np.max(np.abs(g_dnn_ag - g_analytical))
    +    print("The max absolute difference between the solutions is: %g"%max_diff)
    +
    +    plt.figure(figsize=(10,10))
    +
    +    plt.title('Performance of neural network solving an ODE compared to the analytical solution')
    +    plt.plot(x, g_analytical)
    +    plt.plot(x, g_dnn_ag[0,:])
    +    plt.legend(['analytical','nn'])
    +    plt.xlabel('x')
    +    plt.ylabel('g(x)')
    +    plt.show()
    +
    +
    +
    +
    +
    +
    +

    Comparing with a numerical scheme#

    +

The Poisson equation can also be solved numerically by using a Taylor expansion to approximate the second derivative.

    +

    Using Taylor series, the second derivative can be expressed as

    +
    +\[ +g''(x) = \frac{g(x + \Delta x) - 2g(x) + g(x-\Delta x)}{\Delta x^2} + E_{\Delta x}(x) +\]
    +

where \(\Delta x\) is a small step size and \(E_{\Delta x}(x)\) is the error term.

    +

Neglecting the error term gives an approximation to the second derivative:

    + +
    +
    +\[ +\begin{equation} \label{approx} \tag{15} +g''(x) \approx \frac{g(x + \Delta x) - 2g(x) + g(x-\Delta x)}{\Delta x^2} +\end{equation} +\]
    +

    If \(x_i = i \Delta x = x_{i-1} + \Delta x\) and \(g_i = g(x_i)\) for \(i = 1,\dots N_x - 2\) with \(N_x\) being the number of values for \(x\), (15) becomes

    +
    +\[\begin{split} +\begin{aligned} +g''(x_i) &\approx \frac{g(x_i + \Delta x) - 2g(x_i) + g(x_i -\Delta x)}{\Delta x^2} \\ +&= \frac{g_{i+1} - 2g_i + g_{i-1}}{\Delta x^2} +\end{aligned} +\end{split}\]
    +

    Since we know from our problem that

    +
    +\[\begin{split} +\begin{aligned} +-g''(x) &= f(x) \\ +&= (3x + x^2)\exp(x) +\end{aligned} +\end{split}\]
    +

    along with the conditions \(g(0) = g(1) = 0\), +the following scheme can be used to find an approximate solution for \(g(x)\) numerically:

    + +
    +
    +\[\begin{split} +\begin{equation} + \begin{aligned} + -\Big( \frac{g_{i+1} - 2g_i + g_{i-1}}{\Delta x^2} \Big) &= f(x_i) \\ + -g_{i+1} + 2g_i - g_{i-1} &= \Delta x^2 f(x_i) + \end{aligned} +\end{equation} \label{odesys} \tag{16} +\end{split}\]
    +

    for \(i = 1, \dots, N_x - 2\) where \(g_0 = g_{N_x - 1} = 0\) and \(f(x_i) = (3x_i + x_i^2)\exp(x_i)\), which is given for our specific problem.

    +

    The equation can be rewritten into a matrix equation:

    +
    +\[\begin{split} +\begin{aligned} +\begin{pmatrix} +2 & -1 & 0 & \dots & 0 \\ +-1 & 2 & -1 & \dots & 0 \\ +\vdots & & \ddots & & \vdots \\ +0 & \dots & -1 & 2 & -1 \\ +0 & \dots & 0 & -1 & 2\\ +\end{pmatrix} +\begin{pmatrix} +g_1 \\ +g_2 \\ +\vdots \\ +g_{N_x - 3} \\ +g_{N_x - 2} +\end{pmatrix} +&= +\Delta x^2 +\begin{pmatrix} +f(x_1) \\ +f(x_2) \\ +\vdots \\ +f(x_{N_x - 3}) \\ +f(x_{N_x - 2}) +\end{pmatrix} \\ +\boldsymbol{A}\boldsymbol{g} &= \boldsymbol{f}, +\end{aligned} +\end{split}\]
    +

    which makes it possible to solve for the vector \(\boldsymbol{g}\).
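The program below fills the matrix with an explicit loop; an equivalent and somewhat more compact construction of the same tridiagonal system (a sketch, using shifted identity matrices) is:

import numpy as np

Nx = 10
dx = 1/(Nx - 1)
x = np.linspace(0, 1, Nx)

def f(x):
    return (3*x + x**2)*np.exp(x)

n = Nx - 2                                            # number of interior points
A = 2*np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)    # 2 on the diagonal, -1 on the off-diagonals

g_interior = np.linalg.solve(A, dx**2 * f(x[1:-1]))

g_vec = np.zeros(Nx)                                  # add back the boundary values g(0) = g(1) = 0
g_vec[1:-1] = g_interior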

    +
    +
    +

    Setting up the code#

    +

    We can then compare the result from this numerical scheme with the output from our network using Autograd:

    +
    +
    +
    import autograd.numpy as np
    +from autograd import grad, elementwise_grad
    +import autograd.numpy.random as npr
    +from matplotlib import pyplot as plt
    +
    +def sigmoid(z):
    +    return 1/(1 + np.exp(-z))
    +
    +def deep_neural_network(deep_params, x):
    +    # N_hidden is the number of hidden layers
    +    # deep_params is a list, len() should be used
    +    N_hidden = len(deep_params) - 1 # -1 since params consists of
    +                                        # parameters to all the hidden
    +                                        # layers AND the output layer.
    +
+    # Assumes the input x is a one-dimensional array
    +    num_values = np.size(x)
    +    x = x.reshape(-1, num_values)
    +
    +    # Assume that the input layer does nothing to the input x
    +    x_input = x
    +
    +    # Due to multiple hidden layers, define a variable referencing to the
    +    # output of the previous layer:
    +    x_prev = x_input
    +
    +    ## Hidden layers:
    +
    +    for l in range(N_hidden):
+        # From the list of parameters P, find the correct weights and bias for this layer
    +        w_hidden = deep_params[l]
    +
    +        # Add a row of ones to include bias
    +        x_prev = np.concatenate((np.ones((1,num_values)), x_prev ), axis = 0)
    +
    +        z_hidden = np.matmul(w_hidden, x_prev)
    +        x_hidden = sigmoid(z_hidden)
    +
    +        # Update x_prev such that next layer can use the output from this layer
    +        x_prev = x_hidden
    +
    +    ## Output layer:
    +
    +    # Get the weights and bias for this layer
    +    w_output = deep_params[-1]
    +
    +    # Include bias:
    +    x_prev = np.concatenate((np.ones((1,num_values)), x_prev), axis = 0)
    +
    +    z_output = np.matmul(w_output, x_prev)
    +    x_output = z_output
    +
    +    return x_output
    +
    +
    +def solve_ode_deep_neural_network(x, num_neurons, num_iter, lmb):
    +    # num_hidden_neurons is now a list of number of neurons within each hidden layer
    +
    +    # Find the number of hidden layers:
    +    N_hidden = np.size(num_neurons)
    +
+    ## Set up initial weights and biases
    +
    +    # Initialize the list of parameters:
    +    P = [None]*(N_hidden + 1) # + 1 to include the output layer
    +
    +    P[0] = npr.randn(num_neurons[0], 2 )
    +    for l in range(1,N_hidden):
    +        P[l] = npr.randn(num_neurons[l], num_neurons[l-1] + 1) # +1 to include bias
    +
    +    # For the output layer
    +    P[-1] = npr.randn(1, num_neurons[-1] + 1 ) # +1 since bias is included
    +
    +    print('Initial cost: %g'%cost_function_deep(P, x))
    +
+    ## Start finding the optimal weights using gradient descent
    +
    +    # Find the Python function that represents the gradient of the cost function
    +    # w.r.t the 0-th input argument -- that is the weights and biases in the hidden and output layer
    +    cost_function_deep_grad = grad(cost_function_deep,0)
    +
    +    # Let the update be done num_iter times
    +    for i in range(num_iter):
    +        # Evaluate the gradient at the current weights and biases in P.
    +        # The cost_grad consist now of N_hidden + 1 arrays; the gradient w.r.t the weights and biases
    +        # in the hidden layers and output layers evaluated at x.
    +        cost_deep_grad =  cost_function_deep_grad(P, x)
    +
    +        for l in range(N_hidden+1):
    +            P[l] = P[l] - lmb * cost_deep_grad[l]
    +
    +    print('Final cost: %g'%cost_function_deep(P, x))
    +
    +    return P
    +
    +## Set up the cost function specified for this Poisson equation:
    +
    +# The right side of the ODE
    +def f(x):
    +    return (3*x + x**2)*np.exp(x)
    +
    +def cost_function_deep(P, x):
    +
    +    # Evaluate the trial function with the current parameters P
    +    g_t = g_trial_deep(x,P)
    +
    +    # Find the derivative w.r.t x of the trial function
    +    d2_g_t = elementwise_grad(elementwise_grad(g_trial_deep,0))(x,P)
    +
    +    right_side = f(x)
    +
    +    err_sqr = (-d2_g_t - right_side)**2
    +    cost_sum = np.sum(err_sqr)
    +
    +    return cost_sum/np.size(err_sqr)
    +
    +# The trial solution:
    +def g_trial_deep(x,P):
    +    return x*(1-x)*deep_neural_network(P,x)
    +
    +# The analytic solution;
    +def g_analytic(x):
    +    return x*(1-x)*np.exp(x)
    +
    +if __name__ == '__main__':
    +    npr.seed(4155)
    +
+    ## Decide the values of arguments to the function to solve
    +    Nx = 10
    +    x = np.linspace(0,1, Nx)
    +
    +    ## Set up the initial parameters
    +    num_hidden_neurons = [200,100]
    +    num_iter = 1000
    +    lmb = 1e-3
    +
    +    P = solve_ode_deep_neural_network(x, num_hidden_neurons, num_iter, lmb)
    +
    +    g_dnn_ag = g_trial_deep(x,P)
    +    g_analytical = g_analytic(x)
    +
+    # Plot the solution from the network together with the analytical solution
    +
    +    plt.figure(figsize=(10,10))
    +
    +    plt.title('Performance of neural network solving an ODE compared to the analytical solution')
    +    plt.plot(x, g_analytical)
    +    plt.plot(x, g_dnn_ag[0,:])
    +    plt.legend(['analytical','nn'])
    +    plt.xlabel('x')
    +    plt.ylabel('g(x)')
    +
    +    ## Perform the computation using the numerical scheme
    +
    +    dx = 1/(Nx - 1)
    +
    +    # Set up the matrix A
    +    A = np.zeros((Nx-2,Nx-2))
    +
    +    A[0,0] = 2
    +    A[0,1] = -1
    +
    +    for i in range(1,Nx-3):
    +        A[i,i-1] = -1
    +        A[i,i] = 2
    +        A[i,i+1] = -1
    +
    +    A[Nx - 3, Nx - 4] = -1
    +    A[Nx - 3, Nx - 3] = 2
    +
    +    # Set up the vector f
    +    f_vec = dx**2 * f(x[1:-1])
    +
    +    # Solve the equation
    +    g_res = np.linalg.solve(A,f_vec)
    +
    +    g_vec = np.zeros(Nx)
    +    g_vec[1:-1] = g_res
    +
    +    # Print the differences between each method
    +    max_diff1 = np.max(np.abs(g_dnn_ag - g_analytical))
    +    max_diff2 = np.max(np.abs(g_vec - g_analytical))
    +    print("The max absolute difference between the analytical solution and DNN Autograd: %g"%max_diff1)
    +    print("The max absolute difference between the analytical solution and numerical scheme: %g"%max_diff2)
    +
    +    # Plot the results
    +    plt.figure(figsize=(10,10))
    +
    +    plt.plot(x,g_vec)
    +    plt.plot(x,g_analytical)
    +    plt.plot(x,g_dnn_ag[0,:])
    +
    +    plt.legend(['numerical scheme','analytical','dnn'])
    +    plt.show()
    +
    +
    +
    +
    +
    +
    +

    Partial Differential Equations#

    +

A partial differential equation (PDE) has a solution where the function is defined by multiple variables. The equation may involve all kinds of combinations of the variables the function is differentiated with respect to.

    +

    In general, a partial differential equation for a function \(g(x_1,\dots,x_N)\) with \(N\) variables may be expressed as

    + +
    +
    +\[ +\begin{equation} \label{PDE} \tag{17} + f\left(x_1, \, \dots \, , x_N, \frac{\partial g(x_1,\dots,x_N) }{\partial x_1}, \dots , \frac{\partial g(x_1,\dots,x_N) }{\partial x_N}, \frac{\partial g(x_1,\dots,x_N) }{\partial x_1\partial x_2}, \, \dots \, , \frac{\partial^n g(x_1,\dots,x_N) }{\partial x_N^n} \right) = 0 +\end{equation} +\]
    +

    where \(f\) is an expression involving all kinds of possible mixed derivatives of \(g(x_1,\dots,x_N)\) up to an order \(n\). In order for the solution to be unique, some additional conditions must also be given.

    +
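As a concrete illustration of this notation (anticipating the diffusion equation treated later in these notes), the one-dimensional diffusion equation corresponds to the choice

\[
f\left(x, t, \frac{\partial g(x,t)}{\partial t}, \frac{\partial^2 g(x,t)}{\partial x^2}\right) = \frac{\partial g(x,t)}{\partial t} - \frac{\partial^2 g(x,t)}{\partial x^2} = 0 .
\]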
    +
    +

    Type of problem#

    +

The problem our network must solve is similar to the ODE case. We must have a trial solution \(g_t\) at hand.

    +

    For instance, the trial solution could be expressed as

    +
    +\[ +\begin{align*} + g_t(x_1,\dots,x_N) = h_1(x_1,\dots,x_N) + h_2(x_1,\dots,x_N,N(x_1,\dots,x_N,P)) +\end{align*} +\]
    +

    where \(h_1(x_1,\dots,x_N)\) is a function that ensures \(g_t(x_1,\dots,x_N)\) satisfies some given conditions. +The neural network \(N(x_1,\dots,x_N,P)\) has weights and biases described by \(P\) and \(h_2(x_1,\dots,x_N,N(x_1,\dots,x_N,P))\) is an expression using the output from the neural network in some way.

    +

    The role of the function \(h_2(x_1,\dots,x_N,N(x_1,\dots,x_N,P))\), is to ensure that the output of \(N(x_1,\dots,x_N,P)\) is zero when \(g_t(x_1,\dots,x_N)\) is evaluated at the values of \(x_1,\dots,x_N\) where the given conditions must be satisfied. The function \(h_1(x_1,\dots,x_N)\) should alone make \(g_t(x_1,\dots,x_N)\) satisfy the conditions.

    +
    +
    +

    Network requirements#

    +

The network then tries to minimize the cost function following the same ideas as described for the ODE case, but now with more than one variable to consider. The concept remains the same: find a set of parameters \(P\) such that the expression \(f\) in (17) is as close to zero as possible.

    +

    As for the ODE case, the cost function is the mean squared error that +the network must try to minimize. The cost function for the network to +minimize is

    +
    +\[ +C\left(x_1, \dots, x_N, P\right) = \left( f\left(x_1, \, \dots \, , x_N, \frac{\partial g(x_1,\dots,x_N) }{\partial x_1}, \dots , \frac{\partial g(x_1,\dots,x_N) }{\partial x_N}, \frac{\partial g(x_1,\dots,x_N) }{\partial x_1\partial x_2}, \, \dots \, , \frac{\partial^n g(x_1,\dots,x_N) }{\partial x_N^n} \right) \right)^2 +\]
    +
    +
    +

    More details#

    +

    If we let \(\boldsymbol{x} = \big( x_1, \dots, x_N \big)\) be an array containing the values for \(x_1, \dots, x_N\) respectively, the cost function can be reformulated into the following:

    +
    +\[ +C\left(\boldsymbol{x}, P\right) = f\left( \left( \boldsymbol{x}, \frac{\partial g(\boldsymbol{x}) }{\partial x_1}, \dots , \frac{\partial g(\boldsymbol{x}) }{\partial x_N}, \frac{\partial g(\boldsymbol{x}) }{\partial x_1\partial x_2}, \, \dots \, , \frac{\partial^n g(\boldsymbol{x}) }{\partial x_N^n} \right) \right)^2 +\]
    +

    If we also have \(M\) different sets of values for \(x_1, \dots, x_N\), that is \(\boldsymbol{x}_i = \big(x_1^{(i)}, \dots, x_N^{(i)}\big)\) for \(i = 1,\dots,M\) being the rows in matrix \(X\), the cost function can be generalized into

    +
    +\[ +C\left(X, P \right) = \sum_{i=1}^M f\left( \left( \boldsymbol{x}_i, \frac{\partial g(\boldsymbol{x}_i) }{\partial x_1}, \dots , \frac{\partial g(\boldsymbol{x}_i) }{\partial x_N}, \frac{\partial g(\boldsymbol{x}_i) }{\partial x_1\partial x_2}, \, \dots \, , \frac{\partial^n g(\boldsymbol{x}_i) }{\partial x_N^n} \right) \right)^2. +\]
    +
    +
    +

    Example: The diffusion equation#

    +

    In one spatial dimension, the equation reads

    +
    +\[ +\frac{\partial g(x,t)}{\partial t} = \frac{\partial^2 g(x,t)}{\partial x^2} +\]
    +

where a possible choice of conditions is

    +
    +\[\begin{split} +\begin{align*} +g(0,t) &= 0 ,\qquad t \geq 0 \\ +g(1,t) &= 0, \qquad t \geq 0 \\ +g(x,0) &= u(x),\qquad x\in [0,1] +\end{align*} +\end{split}\]
    +

    with \(u(x)\) being some given function.

    +
    +
    +

    Defining the problem#

    +

    For this case, we want to find \(g(x,t)\) such that

    + +
    +
    +\[ +\begin{equation} + \frac{\partial g(x,t)}{\partial t} = \frac{\partial^2 g(x,t)}{\partial x^2} +\end{equation} \label{diffonedim} \tag{18} +\]
    +

    and

    +
    +\[\begin{split} +\begin{align*} +g(0,t) &= 0 ,\qquad t \geq 0 \\ +g(1,t) &= 0, \qquad t \geq 0 \\ +g(x,0) &= u(x),\qquad x\in [0,1] +\end{align*} +\end{split}\]
    +

    with \(u(x) = \sin(\pi x)\).

    +

First, let us set up the deep neural network, which follows the same structure as discussed in the examples solving the ODEs. We then look into how Autograd can be used in a network tailored to solving for bivariate functions.

    +
    +
    +

    Setting up the network using Autograd#

    +

The only change needed here is to extend our network such that functions of multiple variables are handled correctly. In this case the function we solve for has two variables, time \(t\) and position \(x\). Each point \((x,t)\) will be represented by a one-dimensional array in the program. The program will evaluate the network at each possible pair \((x,t)\), given arrays of the desired \(x\)-values and \(t\)-values to approximate the solution at.

    +
    +
    +
    def sigmoid(z):
    +    return 1/(1 + np.exp(-z))
    +
    +def deep_neural_network(deep_params, x):
    +    # x is now a point and a 1D numpy array; make it a column vector
    +    num_coordinates = np.size(x,0)
    +    x = x.reshape(num_coordinates,-1)
    +
    +    num_points = np.size(x,1)
    +
    +    # N_hidden is the number of hidden layers
    +    N_hidden = len(deep_params) - 1 # -1 since params consist of parameters to all the hidden layers AND the output layer
    +
    +    # Assume that the input layer does nothing to the input x
    +    x_input = x
    +    x_prev = x_input
    +
    +    ## Hidden layers:
    +
    +    for l in range(N_hidden):
+        # From the list of parameters P; find the correct weights and bias for this layer
    +        w_hidden = deep_params[l]
    +
    +        # Add a row of ones to include bias
    +        x_prev = np.concatenate((np.ones((1,num_points)), x_prev ), axis = 0)
    +
    +        z_hidden = np.matmul(w_hidden, x_prev)
    +        x_hidden = sigmoid(z_hidden)
    +
    +        # Update x_prev such that next layer can use the output from this layer
    +        x_prev = x_hidden
    +
    +    ## Output layer:
    +
    +    # Get the weights and bias for this layer
    +    w_output = deep_params[-1]
    +
    +    # Include bias:
    +    x_prev = np.concatenate((np.ones((1,num_points)), x_prev), axis = 0)
    +
    +    z_output = np.matmul(w_output, x_prev)
    +    x_output = z_output
    +
    +    return x_output[0][0]
    +
    +
    +
    +
    +
    +
    +

    Setting up the network using Autograd; The trial solution#

    +

The cost function must then iterate through the given arrays containing values for \(x\) and \(t\), define a point \((x,t)\) at which the deep neural network and the trial solution are evaluated, and then find the Jacobian of the trial solution.

    +

    A possible trial solution for this PDE is

    +
    +\[ +g_t(x,t) = h_1(x,t) + x(1-x)tN(x,t,P) +\]
    +

with \(h_1(x,t)\) being a function ensuring that \(g_t(x,t)\) satisfies our given conditions, and \(N(x,t,P)\) being the output from the deep neural network using weights and biases for each layer from \(P\).

    +

To fulfill the conditions, \(h_1(x,t)\) could be:

    +
    +\[ +h_1(x,t) = (1-t)\Big(u(x) - \big((1-x)u(0) + x u(1)\big)\Big) = (1-t)u(x) = (1-t)\sin(\pi x) +\]
    +

since \(u(0) = u(1) = 0\) and \(u(x) = \sin(\pi x)\).

    +
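As a quick check, added here since it follows directly from the expressions above, inserting the trial solution into the three conditions gives

\[
\begin{align*}
g_t(x,0) &= (1-0)u(x) + x(1-x)\cdot 0 \cdot N(x,0,P) = u(x), \\
g_t(0,t) &= (1-t)u(0) + 0\cdot(1-0)\,t\,N(0,t,P) = 0, \\
g_t(1,t) &= (1-t)u(1) + 1\cdot(1-1)\,t\,N(1,t,P) = 0,
\end{align*}
\]

using that \(u(0) = u(1) = 0\).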
    +
    +

Why the Jacobian?#

    +

    The Jacobian is used because the program must find the derivative of +the trial solution with respect to \(x\) and \(t\).

    +

    This gives the necessity of computing the Jacobian matrix, as we want +to evaluate the gradient with respect to \(x\) and \(t\) (note that the +Jacobian of a scalar-valued multivariate function is simply its +gradient).

    +

In Autograd, the differentiation is by default done with respect to the first input argument of your Python function. Since the point is an array representing \(x\) and \(t\), the Jacobian is calculated using the values of \(x\) and \(t\).

    +

To find the second derivatives with respect to \(x\) and \(t\), the Jacobian can be computed a second time. The result is the Hessian matrix, which contains all the possible second-order mixed derivatives of \(g(x,t)\).

    +
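Before the lecture code below, here is a minimal sketch (an added illustration, not part of the original program) that checks this derivative bookkeeping with Autograd on a simple, hypothetical test function of a point \((x,t)\):

import autograd.numpy as np
from autograd import jacobian, hessian

def g_simple(point):
    # Hypothetical scalar test function of point = (x, t)
    x, t = point
    return np.sin(np.pi*x)*np.exp(-t)

point = np.array([0.3, 0.5])
grad_g = jacobian(g_simple)(point)   # array [dg/dx, dg/dt]
hess_g = hessian(g_simple)(point)    # 2 x 2 matrix of second derivatives

# Compare with the hand-computed derivatives of the test function
print(grad_g[1] - (-np.sin(np.pi*0.3)*np.exp(-0.5)))              # dg/dt, approximately zero
print(hess_g[0][0] - (-np.pi**2*np.sin(np.pi*0.3)*np.exp(-0.5)))  # d2g/dx2, approximately zero

The entries grad_g[1] and hess_g[0][0] correspond to the quantities g_t_dt and g_t_d2x used in the cost function below.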
    +
    +
    # Set up the trial function:
    +def u(x):
    +    return np.sin(np.pi*x)
    +
    +def g_trial(point,P):
    +    x,t = point
    +    return (1-t)*u(x) + x*(1-x)*t*deep_neural_network(P,point)
    +
+# The right side of the PDE:
    +def f(point):
    +    return 0.
    +
    +# The cost function:
    +def cost_function(P, x, t):
    +    cost_sum = 0
    +
    +    g_t_jacobian_func = jacobian(g_trial)
    +    g_t_hessian_func = hessian(g_trial)
    +
    +    for x_ in x:
    +        for t_ in t:
    +            point = np.array([x_,t_])
    +
    +            g_t = g_trial(point,P)
    +            g_t_jacobian = g_t_jacobian_func(point,P)
    +            g_t_hessian = g_t_hessian_func(point,P)
    +
    +            g_t_dt = g_t_jacobian[1]
    +            g_t_d2x = g_t_hessian[0][0]
    +
    +            func = f(point)
    +
    +            err_sqr = ( (g_t_dt - g_t_d2x) - func)**2
    +            cost_sum += err_sqr
    +
    +    return cost_sum
    +
    +
    +
    +
    +
    +
    +

    Setting up the network using Autograd; The full program#

    +

    Having set up the network, along with the trial solution and cost function, we can now see how the deep neural network performs by comparing the results to the analytical solution.

    +

    The analytical solution of our problem is

    +
    +\[ +g(x,t) = \exp(-\pi^2 t)\sin(\pi x) +\]
    +
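As a quick consistency check (added here for completeness), differentiating this expression gives

\[
\frac{\partial g(x,t)}{\partial t} = -\pi^2\exp(-\pi^2 t)\sin(\pi x) = \frac{\partial^2 g(x,t)}{\partial x^2},
\]

so the diffusion equation is satisfied, while \(g(x,0) = \sin(\pi x) = u(x)\) and \(g(0,t) = g(1,t) = 0\), as required by the conditions.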

A possible way to implement a neural network solving the PDE is given below. Be aware, though, that it is fairly slow for the parameters used. A better result is possible, but requires more iterations, and thus a longer time to complete.

    +

Indeed, the program below is not optimal in its implementation, but rather serves as an example of how to implement and use a neural network to solve a PDE. Using TensorFlow results in a much better execution time. Try it!

    +
    +
    +
    import autograd.numpy as np
    +from autograd import jacobian,hessian,grad
    +import autograd.numpy.random as npr
    +from matplotlib import cm
    +from matplotlib import pyplot as plt
    +from mpl_toolkits.mplot3d import axes3d
    +
    +## Set up the network
    +
    +def sigmoid(z):
    +    return 1/(1 + np.exp(-z))
    +
    +def deep_neural_network(deep_params, x):
    +    # x is now a point and a 1D numpy array; make it a column vector
    +    num_coordinates = np.size(x,0)
    +    x = x.reshape(num_coordinates,-1)
    +
    +    num_points = np.size(x,1)
    +
    +    # N_hidden is the number of hidden layers
    +    N_hidden = len(deep_params) - 1 # -1 since params consist of parameters to all the hidden layers AND the output layer
    +
    +    # Assume that the input layer does nothing to the input x
    +    x_input = x
    +    x_prev = x_input
    +
    +    ## Hidden layers:
    +
    +    for l in range(N_hidden):
+        # From the list of parameters P; find the correct weights and bias for this layer
    +        w_hidden = deep_params[l]
    +
    +        # Add a row of ones to include bias
    +        x_prev = np.concatenate((np.ones((1,num_points)), x_prev ), axis = 0)
    +
    +        z_hidden = np.matmul(w_hidden, x_prev)
    +        x_hidden = sigmoid(z_hidden)
    +
    +        # Update x_prev such that next layer can use the output from this layer
    +        x_prev = x_hidden
    +
    +    ## Output layer:
    +
    +    # Get the weights and bias for this layer
    +    w_output = deep_params[-1]
    +
    +    # Include bias:
    +    x_prev = np.concatenate((np.ones((1,num_points)), x_prev), axis = 0)
    +
    +    z_output = np.matmul(w_output, x_prev)
    +    x_output = z_output
    +
    +    return x_output[0][0]
    +
    +## Define the trial solution and cost function
    +def u(x):
    +    return np.sin(np.pi*x)
    +
    +def g_trial(point,P):
    +    x,t = point
    +    return (1-t)*u(x) + x*(1-x)*t*deep_neural_network(P,point)
    +
+# The right side of the PDE:
    +def f(point):
    +    return 0.
    +
    +# The cost function:
    +def cost_function(P, x, t):
    +    cost_sum = 0
    +
    +    g_t_jacobian_func = jacobian(g_trial)
    +    g_t_hessian_func = hessian(g_trial)
    +
    +    for x_ in x:
    +        for t_ in t:
    +            point = np.array([x_,t_])
    +
    +            g_t = g_trial(point,P)
    +            g_t_jacobian = g_t_jacobian_func(point,P)
    +            g_t_hessian = g_t_hessian_func(point,P)
    +
    +            g_t_dt = g_t_jacobian[1]
    +            g_t_d2x = g_t_hessian[0][0]
    +
    +            func = f(point)
    +
    +            err_sqr = ( (g_t_dt - g_t_d2x) - func)**2
    +            cost_sum += err_sqr
    +
    +    return cost_sum /( np.size(x)*np.size(t) )
    +
    +## For comparison, define the analytical solution
    +def g_analytic(point):
    +    x,t = point
    +    return np.exp(-np.pi**2*t)*np.sin(np.pi*x)
    +
    +## Set up a function for training the network to solve for the equation
    +def solve_pde_deep_neural_network(x,t, num_neurons, num_iter, lmb):
+    # Find the number of hidden layers:
+    N_hidden = np.size(num_neurons)
+
+    ## Set up initial weights and biases
    +
    +    # Initialize the list of parameters:
    +    P = [None]*(N_hidden + 1) # + 1 to include the output layer
    +
+    P[0] = npr.randn(num_neurons[0], 2 + 1 ) # 2 since the input is a point (x,t) with two coordinates, +1 to include bias
    +    for l in range(1,N_hidden):
    +        P[l] = npr.randn(num_neurons[l], num_neurons[l-1] + 1) # +1 to include bias
    +
    +    # For the output layer
    +    P[-1] = npr.randn(1, num_neurons[-1] + 1 ) # +1 since bias is included
    +
    +    print('Initial cost: ',cost_function(P, x, t))
    +
    +    cost_function_grad = grad(cost_function,0)
    +
    +    # Let the update be done num_iter times
    +    for i in range(num_iter):
    +        cost_grad =  cost_function_grad(P, x , t)
    +
    +        for l in range(N_hidden+1):
    +            P[l] = P[l] - lmb * cost_grad[l]
    +
    +    print('Final cost: ',cost_function(P, x, t))
    +
    +    return P
    +
    +if __name__ == '__main__':
    +    ### Use the neural network:
    +    npr.seed(15)
    +
+    ## Decide the values of arguments to the function to solve
    +    Nx = 10; Nt = 10
    +    x = np.linspace(0, 1, Nx)
    +    t = np.linspace(0,1,Nt)
    +
    +    ## Set up the parameters for the network
    +    num_hidden_neurons = [100, 25]
    +    num_iter = 250
    +    lmb = 0.01
    +
    +    P = solve_pde_deep_neural_network(x,t, num_hidden_neurons, num_iter, lmb)
    +
    +    ## Store the results
    +    g_dnn_ag = np.zeros((Nx, Nt))
    +    G_analytical = np.zeros((Nx, Nt))
    +    for i,x_ in enumerate(x):
    +        for j, t_ in enumerate(t):
    +            point = np.array([x_, t_])
    +            g_dnn_ag[i,j] = g_trial(point,P)
    +
    +            G_analytical[i,j] = g_analytic(point)
    +
+    # Find the max absolute difference between the analytical and the computed solution
    +    diff_ag = np.abs(g_dnn_ag - G_analytical)
    +    print('Max absolute difference between the analytical solution and the network: %g'%np.max(diff_ag))
    +
    +    ## Plot the solutions in two dimensions, that being in position and time
    +
    +    T,X = np.meshgrid(t,x)
    +
    +    fig = plt.figure(figsize=(10,10))
+    ax = fig.add_subplot(projection='3d')
+    ax.set_title('Solution from the deep neural network w/ %d hidden layers'%len(num_hidden_neurons))
    +    s = ax.plot_surface(T,X,g_dnn_ag,linewidth=0,antialiased=False,cmap=cm.viridis)
    +    ax.set_xlabel('Time $t$')
    +    ax.set_ylabel('Position $x$');
    +
    +
    +    fig = plt.figure(figsize=(10,10))
+    ax = fig.add_subplot(projection='3d')
    +    ax.set_title('Analytical solution')
    +    s = ax.plot_surface(T,X,G_analytical,linewidth=0,antialiased=False,cmap=cm.viridis)
    +    ax.set_xlabel('Time $t$')
    +    ax.set_ylabel('Position $x$');
    +
    +    fig = plt.figure(figsize=(10,10))
+    ax = fig.add_subplot(projection='3d')
    +    ax.set_title('Difference')
    +    s = ax.plot_surface(T,X,diff_ag,linewidth=0,antialiased=False,cmap=cm.viridis)
    +    ax.set_xlabel('Time $t$')
    +    ax.set_ylabel('Position $x$');
    +
    +    ## Take some slices of the 3D plots just to see the solutions at particular times
    +    indx1 = 0
    +    indx2 = int(Nt/2)
    +    indx3 = Nt-1
    +
    +    t1 = t[indx1]
    +    t2 = t[indx2]
    +    t3 = t[indx3]
    +
    +    # Slice the results from the DNN
    +    res1 = g_dnn_ag[:,indx1]
    +    res2 = g_dnn_ag[:,indx2]
    +    res3 = g_dnn_ag[:,indx3]
    +
    +    # Slice the analytical results
    +    res_analytical1 = G_analytical[:,indx1]
    +    res_analytical2 = G_analytical[:,indx2]
    +    res_analytical3 = G_analytical[:,indx3]
    +
    +    # Plot the slices
    +    plt.figure(figsize=(10,10))
    +    plt.title("Computed solutions at time = %g"%t1)
    +    plt.plot(x, res1)
    +    plt.plot(x,res_analytical1)
    +    plt.legend(['dnn','analytical'])
    +
    +    plt.figure(figsize=(10,10))
    +    plt.title("Computed solutions at time = %g"%t2)
    +    plt.plot(x, res2)
    +    plt.plot(x,res_analytical2)
    +    plt.legend(['dnn','analytical'])
    +
    +    plt.figure(figsize=(10,10))
    +    plt.title("Computed solutions at time = %g"%t3)
    +    plt.plot(x, res3)
    +    plt.plot(x,res_analytical3)
    +    plt.legend(['dnn','analytical'])
    +
    +    plt.show()
    +
    +
    +
    +
    +
    +
    +

    Example: Solving the wave equation with Neural Networks#

    +

    The wave equation is

    +
    +\[ +\frac{\partial^2 g(x,t)}{\partial t^2} = c^2\frac{\partial^2 g(x,t)}{\partial x^2} +\]
    +

    with \(c\) being the specified wave speed.

    +

    Here, the chosen conditions are

    +
    +\[\begin{split} +\begin{align*} + g(0,t) &= 0 \\ + g(1,t) &= 0 \\ + g(x,0) &= u(x) \\ + \frac{\partial g(x,t)}{\partial t} \Big |_{t = 0} &= v(x) +\end{align*} +\end{split}\]
    +

where \(\frac{\partial g(x,t)}{\partial t} \Big |_{t = 0}\) means the derivative of \(g(x,t)\) with respect to \(t\) evaluated at \(t = 0\), and \(u(x)\) and \(v(x)\) are given functions.

    +
    +
    +

    The problem to solve for#

    +

The wave equation to solve is

    + +
    +
    +\[ +\begin{equation} \label{wave} \tag{19} +\frac{\partial^2 g(x,t)}{\partial t^2} = c^2 \frac{\partial^2 g(x,t)}{\partial x^2} +\end{equation} +\]
    +

    where \(c\) is the given wave speed. +The chosen conditions for this equation are

    + +
    +
    +\[\begin{split} +\begin{aligned} +g(0,t) &= 0, &t \geq 0 \\ +g(1,t) &= 0, &t \geq 0 \\ +g(x,0) &= u(x), &x\in[0,1] \\ +\frac{\partial g(x,t)}{\partial t}\Big |_{t = 0} &= v(x), &x \in [0,1] +\end{aligned} \label{condwave} \tag{20} +\end{split}\]
    +

    In this example, let \(c = 1\) and \(u(x) = \sin(\pi x)\) and \(v(x) = -\pi\sin(\pi x)\).

    +
    +
    +

    The trial solution#

    +

Setting up the network is done in a similar manner as for the example of solving the diffusion equation. The only things we have to change are the trial solution, such that it satisfies the conditions from (20), and the cost function.

    +

    The trial solution becomes slightly different since we have other conditions than in the example of solving the diffusion equation. Here, a possible trial solution \(g_t(x,t)\) is

    +
    +\[ +g_t(x,t) = h_1(x,t) + x(1-x)t^2N(x,t,P) +\]
    +

    where

    +
    +\[ +h_1(x,t) = (1-t^2)u(x) + tv(x) +\]
    +

    Note that this trial solution satisfies the conditions only if \(u(0) = v(0) = u(1) = v(1) = 0\), which is the case in this example.

    +
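A short verification, following directly from the expressions above: at \(t = 0\) the trial solution and its time derivative reduce to

\[
g_t(x,0) = h_1(x,0) = u(x), \qquad
\frac{\partial g_t(x,t)}{\partial t}\Big|_{t=0} = \Big[-2t\,u(x) + v(x) + x(1-x)\big(2t\,N + t^2\,\partial_t N\big)\Big]_{t=0} = v(x),
\]

while the boundary conditions at \(x = 0\) and \(x = 1\) hold because both \(h_1\) and the factor \(x(1-x)\) vanish there, given that \(u(0) = v(0) = u(1) = v(1) = 0\).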
    +
    +

    The analytical solution#

    +

    The analytical solution for our specific problem, is

    +
    +\[ +g(x,t) = \sin(\pi x)\cos(\pi t) - \sin(\pi x)\sin(\pi t) +\]
    +
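As a quick check added for completeness: both terms are products of \(\sin(\pi x)\) with \(\cos(\pi t)\) or \(\sin(\pi t)\), so

\[
\frac{\partial^2 g(x,t)}{\partial t^2} = -\pi^2 g(x,t) = \frac{\partial^2 g(x,t)}{\partial x^2},
\]

which is the wave equation with \(c = 1\). Moreover, \(g(x,0) = \sin(\pi x) = u(x)\) and \(\partial g/\partial t \big|_{t=0} = -\pi\sin(\pi x) = v(x)\), so the conditions are satisfied as well.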
    +
    +

    Solving the wave equation - the full program using Autograd#

    +
    +
    +
    import autograd.numpy as np
    +from autograd import hessian,grad
    +import autograd.numpy.random as npr
    +from matplotlib import cm
    +from matplotlib import pyplot as plt
    +from mpl_toolkits.mplot3d import axes3d
    +
    +## Set up the trial function:
    +def u(x):
    +    return np.sin(np.pi*x)
    +
    +def v(x):
    +    return -np.pi*np.sin(np.pi*x)
    +
    +def h1(point):
    +    x,t = point
    +    return (1 - t**2)*u(x) + t*v(x)
    +
    +def g_trial(point,P):
    +    x,t = point
    +    return h1(point) + x*(1-x)*t**2*deep_neural_network(P,point)
    +
    +## Define the cost function
    +def cost_function(P, x, t):
    +    cost_sum = 0
    +
    +    g_t_hessian_func = hessian(g_trial)
    +
    +    for x_ in x:
    +        for t_ in t:
    +            point = np.array([x_,t_])
    +
    +            g_t_hessian = g_t_hessian_func(point,P)
    +
    +            g_t_d2x = g_t_hessian[0][0]
    +            g_t_d2t = g_t_hessian[1][1]
    +
    +            err_sqr = ( (g_t_d2t - g_t_d2x) )**2
    +            cost_sum += err_sqr
    +
    +    return cost_sum / (np.size(t) * np.size(x))
    +
    +## The neural network
    +def sigmoid(z):
    +    return 1/(1 + np.exp(-z))
    +
    +def deep_neural_network(deep_params, x):
    +    # x is now a point and a 1D numpy array; make it a column vector
    +    num_coordinates = np.size(x,0)
    +    x = x.reshape(num_coordinates,-1)
    +
    +    num_points = np.size(x,1)
    +
    +    # N_hidden is the number of hidden layers
    +    N_hidden = len(deep_params) - 1 # -1 since params consist of parameters to all the hidden layers AND the output layer
    +
    +    # Assume that the input layer does nothing to the input x
    +    x_input = x
    +    x_prev = x_input
    +
    +    ## Hidden layers:
    +
    +    for l in range(N_hidden):
+        # From the list of parameters P; find the correct weights and bias for this layer
    +        w_hidden = deep_params[l]
    +
    +        # Add a row of ones to include bias
    +        x_prev = np.concatenate((np.ones((1,num_points)), x_prev ), axis = 0)
    +
    +        z_hidden = np.matmul(w_hidden, x_prev)
    +        x_hidden = sigmoid(z_hidden)
    +
    +        # Update x_prev such that next layer can use the output from this layer
    +        x_prev = x_hidden
    +
    +    ## Output layer:
    +
    +    # Get the weights and bias for this layer
    +    w_output = deep_params[-1]
    +
    +    # Include bias:
    +    x_prev = np.concatenate((np.ones((1,num_points)), x_prev), axis = 0)
    +
    +    z_output = np.matmul(w_output, x_prev)
    +    x_output = z_output
    +
    +    return x_output[0][0]
    +
    +## The analytical solution
    +def g_analytic(point):
    +    x,t = point
    +    return np.sin(np.pi*x)*np.cos(np.pi*t) - np.sin(np.pi*x)*np.sin(np.pi*t)
    +
    +def solve_pde_deep_neural_network(x,t, num_neurons, num_iter, lmb):
+    # Find the number of hidden layers:
+    N_hidden = np.size(num_neurons)
+
+    ## Set up initial weights and biases
    +
    +    # Initialize the list of parameters:
    +    P = [None]*(N_hidden + 1) # + 1 to include the output layer
    +
+    P[0] = npr.randn(num_neurons[0], 2 + 1 ) # 2 since the input is a point (x,t) with two coordinates, +1 to include bias
    +    for l in range(1,N_hidden):
    +        P[l] = npr.randn(num_neurons[l], num_neurons[l-1] + 1) # +1 to include bias
    +
    +    # For the output layer
    +    P[-1] = npr.randn(1, num_neurons[-1] + 1 ) # +1 since bias is included
    +
    +    print('Initial cost: ',cost_function(P, x, t))
    +
    +    cost_function_grad = grad(cost_function,0)
    +
    +    # Let the update be done num_iter times
    +    for i in range(num_iter):
    +        cost_grad =  cost_function_grad(P, x , t)
    +
    +        for l in range(N_hidden+1):
    +            P[l] = P[l] - lmb * cost_grad[l]
    +
    +
    +    print('Final cost: ',cost_function(P, x, t))
    +
    +    return P
    +
    +if __name__ == '__main__':
    +    ### Use the neural network:
    +    npr.seed(15)
    +
+    ## Decide the values of arguments to the function to solve
    +    Nx = 10; Nt = 10
    +    x = np.linspace(0, 1, Nx)
    +    t = np.linspace(0,1,Nt)
    +
    +    ## Set up the parameters for the network
    +    num_hidden_neurons = [50,20]
    +    num_iter = 1000
    +    lmb = 0.01
    +
    +    P = solve_pde_deep_neural_network(x,t, num_hidden_neurons, num_iter, lmb)
    +
    +    ## Store the results
    +    res = np.zeros((Nx, Nt))
    +    res_analytical = np.zeros((Nx, Nt))
    +    for i,x_ in enumerate(x):
    +        for j, t_ in enumerate(t):
    +            point = np.array([x_, t_])
    +            res[i,j] = g_trial(point,P)
    +
    +            res_analytical[i,j] = g_analytic(point)
    +
    +    diff = np.abs(res - res_analytical)
    +    print("Max difference between analytical and solution from nn: %g"%np.max(diff))
    +
    +    ## Plot the solutions in two dimensions, that being in position and time
    +
    +    T,X = np.meshgrid(t,x)
    +
    +    fig = plt.figure(figsize=(10,10))
+    ax = fig.add_subplot(projection='3d')
+    ax.set_title('Solution from the deep neural network w/ %d hidden layers'%len(num_hidden_neurons))
    +    s = ax.plot_surface(T,X,res,linewidth=0,antialiased=False,cmap=cm.viridis)
    +    ax.set_xlabel('Time $t$')
    +    ax.set_ylabel('Position $x$');
    +
    +
    +    fig = plt.figure(figsize=(10,10))
+    ax = fig.add_subplot(projection='3d')
    +    ax.set_title('Analytical solution')
    +    s = ax.plot_surface(T,X,res_analytical,linewidth=0,antialiased=False,cmap=cm.viridis)
    +    ax.set_xlabel('Time $t$')
    +    ax.set_ylabel('Position $x$');
    +
    +
    +    fig = plt.figure(figsize=(10,10))
+    ax = fig.add_subplot(projection='3d')
    +    ax.set_title('Difference')
    +    s = ax.plot_surface(T,X,diff,linewidth=0,antialiased=False,cmap=cm.viridis)
    +    ax.set_xlabel('Time $t$')
    +    ax.set_ylabel('Position $x$');
    +
    +    ## Take some slices of the 3D plots just to see the solutions at particular times
    +    indx1 = 0
    +    indx2 = int(Nt/2)
    +    indx3 = Nt-1
    +
    +    t1 = t[indx1]
    +    t2 = t[indx2]
    +    t3 = t[indx3]
    +
    +    # Slice the results from the DNN
    +    res1 = res[:,indx1]
    +    res2 = res[:,indx2]
    +    res3 = res[:,indx3]
    +
    +    # Slice the analytical results
    +    res_analytical1 = res_analytical[:,indx1]
    +    res_analytical2 = res_analytical[:,indx2]
    +    res_analytical3 = res_analytical[:,indx3]
    +
    +    # Plot the slices
    +    plt.figure(figsize=(10,10))
    +    plt.title("Computed solutions at time = %g"%t1)
    +    plt.plot(x, res1)
    +    plt.plot(x,res_analytical1)
    +    plt.legend(['dnn','analytical'])
    +
    +    plt.figure(figsize=(10,10))
    +    plt.title("Computed solutions at time = %g"%t2)
    +    plt.plot(x, res2)
    +    plt.plot(x,res_analytical2)
    +    plt.legend(['dnn','analytical'])
    +
    +    plt.figure(figsize=(10,10))
    +    plt.title("Computed solutions at time = %g"%t3)
    +    plt.plot(x, res3)
    +    plt.plot(x,res_analytical3)
    +    plt.legend(['dnn','analytical'])
    +
    +    plt.show()
    +
    +
    +
    +
    +
    +
    +

    Resources on differential equations and deep learning#

    +
      +
1. Artificial neural networks for solving ordinary and partial differential equations by I.E. Lagaris et al.

2. Neural networks for solving differential equations by A. Honchar

3. Solving differential equations using neural networks by M.M. Chiaramonte and M. Kiener

4. Introduction to Partial Differential Equations by A. Tveito and R. Winther
    +
    +
\ No newline at end of file diff --git a/doc/LectureNotes/_build/html/week44.html b/doc/LectureNotes/_build/html/week44.html new file mode 100644 index 000000000..f55e16b0d --- /dev/null +++ b/doc/LectureNotes/_build/html/week44.html @@ -0,0 +1,3493 @@ Week 44, Solving differential equations with neural networks and start Convolutional Neural Networks (CNN) — Applied Data Analysis and Machine Learning
    Week 44, Solving differential equations with neural networks and start Convolutional Neural Networks (CNN)#

    +

    Morten Hjorth-Jensen, Department of Physics, University of Oslo, Norway

    +

    Date: Week 44

    +
    +

    Plan for week 44#

    +

    Material for the lecture Monday October 27, 2025.

    +
      +
1. Solving differential equations, continuation from last week, first lecture

2. Convolutional Neural Networks, second lecture

3. Readings and Videos:
    + +
    +
    +

    Lab sessions on Tuesday and Wednesday#

    +
      +
• Main focus is discussion of and work on project 2

• If you did not get time to finish the exercises from weeks 41-42, you can also keep working on them and hand them in this coming Friday
    +
    +
    +

    Material for Lecture Monday October 27#

    +
    +
    +

    Solving differential equations with Deep Learning#

    +

The Universal Approximation Theorem states that a neural network with a single hidden layer, together with an input and an output layer, can approximate any continuous function to any given precision.

    +

    Book on solving differential equations with ML methods.

    +

    An Introduction to Neural Network Methods for Differential Equations, by Yadav and Kumar.

    +

    Physics informed neural networks.

    +

    Scientific Machine Learning Through Physics–Informed Neural Networks: Where we are and What’s Next, by Cuomo et al

    +

    Thanks to Kristine Baluka Hein.

    +

    The lectures on differential equations were developed by Kristine Baluka Hein, now PhD student at IFI. +A great thanks to Kristine.

    +
    +
    +

    Ordinary Differential Equations first#

    +

An ordinary differential equation (ODE) is an equation involving a function of one variable and its derivatives.

    +

    In general, an ordinary differential equation looks like

    + +
    +
    +\[ +\begin{equation} \label{ode} \tag{1} +f\left(x, \, g(x), \, g'(x), \, g''(x), \, \dots \, , \, g^{(n)}(x)\right) = 0 +\end{equation} +\]
    +

    where \(g(x)\) is the function to find, and \(g^{(n)}(x)\) is the \(n\)-th derivative of \(g(x)\).

    +

The \(f\left(x, g(x), g'(x), g''(x), \, \dots \, , g^{(n)}(x)\right)\) is just a way to write that there is an expression involving \(x\) and \(g(x), \ g'(x), \ g''(x), \, \dots \, , \text{ and } g^{(n)}(x)\) on the left side of the equality sign in (1). The highest order of derivative, that is the value of \(n\), determines the order of the equation. The equation is referred to as an \(n\)-th order ODE. Along with (1), some additional conditions on the function \(g(x)\) are typically given for the solution to be unique.

    +
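For example, the exponential decay equation treated later in these notes, \(g'(x) = -\gamma g(x)\), fits this form with

\[
f\left(x, \, g(x), \, g'(x)\right) = g'(x) + \gamma g(x) = 0,
\]

which makes it a first-order (\(n = 1\)) ODE, supplemented by the condition \(g(0) = g_0\) to make the solution unique.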
    +
    +

    The trial solution#

    +

    Let the trial solution \(g_t(x)\) be

    + +
    +
    +\[ +\begin{equation} + g_t(x) = h_1(x) + h_2(x,N(x,P)) +\label{_auto1} \tag{2} +\end{equation} +\]
    +

    where \(h_1(x)\) is a function that makes \(g_t(x)\) satisfy a given set +of conditions, \(N(x,P)\) a neural network with weights and biases +described by \(P\) and \(h_2(x, N(x,P))\) some expression involving the +neural network. The role of the function \(h_2(x, N(x,P))\), is to +ensure that the output from \(N(x,P)\) is zero when \(g_t(x)\) is +evaluated at the values of \(x\) where the given conditions must be +satisfied. The function \(h_1(x)\) should alone make \(g_t(x)\) satisfy +the conditions.

    +

    But what about the network \(N(x,P)\)?

    +

As described previously, an optimization method can be used to adjust the parameters of a neural network, that is, its weights and biases, through backward propagation.

    +
    +
    +

    Minimization process#

    +

    For the minimization to be defined, we need to have a cost function at hand to minimize.

    +

It is given that \(f\left(x, \, g(x), \, g'(x), \, g''(x), \, \dots \, , \, g^{(n)}(x)\right)\) should be equal to zero in (1). We can choose to consider the mean squared error as the cost function for an input \(x\). Since we are looking at one input, the cost function is just \(f\) squared. The cost function \(C\left(x, P \right)\) can therefore be expressed as

    +
    +\[ +C\left(x, P\right) = \big(f\left(x, \, g(x), \, g'(x), \, g''(x), \, \dots \, , \, g^{(n)}(x)\right)\big)^2 +\]
    +

    If \(N\) inputs are given as a vector \(\boldsymbol{x}\) with elements \(x_i\) for \(i = 1,\dots,N\), +the cost function becomes

    + +
    +
    +\[ +\begin{equation} \label{cost} \tag{3} + C\left(\boldsymbol{x}, P\right) = \frac{1}{N} \sum_{i=1}^N \big(f\left(x_i, \, g(x_i), \, g'(x_i), \, g''(x_i), \, \dots \, , \, g^{(n)}(x_i)\right)\big)^2 +\end{equation} +\]
    +

The neural net should then find the parameters \(P\) that minimize the cost function in (3) for a set of \(N\) training samples \(x_i\).

    +
    +
    +

    Minimizing the cost function using gradient descent and automatic differentiation#

    +

To perform the minimization using gradient descent, the gradient of \(C\left(\boldsymbol{x}, P\right)\) is needed. It might happen that finding an analytical expression of the gradient of \(C(\boldsymbol{x}, P)\) from (3) becomes too messy, depending on which cost function one desires to use.

    +

Luckily, there exist libraries that do the job for us through automatic differentiation. Automatic differentiation is a method of evaluating the derivatives numerically, to very high precision.

    +
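As a minimal illustration (an added sketch, not part of the lecture program), Autograd can differentiate a simple function with a known derivative:

import autograd.numpy as np
from autograd import elementwise_grad

def h(x):
    return np.exp(-2*x)

# elementwise_grad differentiates h element by element; d/dx exp(-2x) = -2 exp(-2x)
dh = elementwise_grad(h)
x = np.linspace(0, 1, 5)
print(np.max(np.abs(dh(x) + 2*np.exp(-2*x))))   # close to zero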
    +
    +

    Example: Exponential decay#

    +

    An exponential decay of a quantity \(g(x)\) is described by the equation

    + +
    +
    +\[ +\begin{equation} \label{solve_expdec} \tag{4} + g'(x) = -\gamma g(x) +\end{equation} +\]
    +

    with \(g(0) = g_0\) for some chosen initial value \(g_0\).

    +

    The analytical solution of (4) is

    + +
    +
    +\[ +\begin{equation} + g(x) = g_0 \exp\left(-\gamma x\right) +\label{_auto2} \tag{5} +\end{equation} +\]
    +

    Having an analytical solution at hand, it is possible to use it to compare how well a neural network finds a solution of (4).

    +
    +
    +

    The function to solve for#

    +

    The program will use a neural network to solve

    + +
    +
    +\[ +\begin{equation} \label{solveode} \tag{6} +g'(x) = -\gamma g(x) +\end{equation} +\]
    +

    where \(g(0) = g_0\) with \(\gamma\) and \(g_0\) being some chosen values.

    +

    In this example, \(\gamma = 2\) and \(g_0 = 10\).

    +
    +
    +

    The trial solution#

    +

To begin with, a trial solution \(g_t(x)\) must be chosen. A general trial solution for ordinary differential equations could be

    +
    +\[ +g_t(x, P) = h_1(x) + h_2(x, N(x, P)) +\]
    +

    with \(h_1(x)\) ensuring that \(g_t(x)\) satisfies some conditions and \(h_2(x,N(x, P))\) an expression involving \(x\) and the output from the neural network \(N(x,P)\) with \(P \) being the collection of the weights and biases for each layer. For now, it is assumed that the network consists of one input layer, one hidden layer, and one output layer.

    +
    +
    +

    Setup of Network#

    +

    In this network, there are no weights and bias at the input layer, so \(P = \{ P_{\text{hidden}}, P_{\text{output}} \}\). +If there are \(N_{\text{hidden} }\) neurons in the hidden layer, then \(P_{\text{hidden}}\) is a \(N_{\text{hidden} } \times (1 + N_{\text{input}})\) matrix, given that there are \(N_{\text{input}}\) neurons in the input layer.

    +

    The first column in \(P_{\text{hidden} }\) represents the bias for each neuron in the hidden layer and the second column represents the weights for each neuron in the hidden layer from the input layer. +If there are \(N_{\text{output} }\) neurons in the output layer, then \(P_{\text{output}} \) is a \(N_{\text{output} } \times (1 + N_{\text{hidden} })\) matrix.

    +

Its first column represents the bias of each neuron and the remaining columns represent the weights to each neuron.

    +
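A small sketch (added illustration; the variable names here are our own) of how these parameter matrices could be set up and inspected:

import autograd.numpy.random as npr

N_input, N_hidden, N_output = 1, 10, 1
# Column 0 holds the biases, the remaining columns hold the weights
P_hidden = npr.randn(N_hidden, 1 + N_input)
P_output = npr.randn(N_output, 1 + N_hidden)
print(P_hidden.shape, P_output.shape)   # (10, 2) (1, 11)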

It is given that \(g(0) = g_0\). The trial solution must fulfill this condition to be a proper solution of (6). A possible way to ensure that \(g_t(0, P) = g_0\) is to let \(h_2(x, N(x,P)) = x \cdot N(x,P)\) and \(h_1(x) = g_0\). This gives the following trial solution:

    + +
    +
    +\[ +\begin{equation} \label{trial} \tag{7} +g_t(x, P) = g_0 + x \cdot N(x, P) +\end{equation} +\]
    +
    +
    +

    Reformulating the problem#

    +

    We wish that our neural network manages to minimize a given cost function.

    +

A reformulation of our equation, (6), must therefore be done, such that it describes a problem a neural network can solve.

    +

    The neural network must find the set of weights and biases \(P\) such that the trial solution in (7) satisfies (6).

    +

    The trial solution

    +
    +\[ +g_t(x, P) = g_0 + x \cdot N(x, P) +\]
    +

    has been chosen such that it already solves the condition \(g(0) = g_0\). What remains, is to find \(P\) such that

    + +
    +
    +\[ +\begin{equation} \label{nnmin} \tag{8} +g_t'(x, P) = - \gamma g_t(x, P) +\end{equation} +\]
    +

    is fulfilled as best as possible.

    +
    +
    +

    More technicalities#

    +

The left hand side and right hand side of (8) must be computed separately, and then the neural network must choose weights and biases, contained in \(P\), such that the sides are equal as best as possible. This means that the absolute or squared difference between the sides must be as close to zero as possible, ideally equal to zero. In this case, the squared difference proves to be an appropriate measure of how erroneous the trial solution is with respect to the parameters \(P\) of the neural network.

    +

    This gives the following cost function our neural network must solve for:

    +
\[
\min_{P}\Big\{ \big(g_t'(x, P) - ( -\gamma g_t(x, P)) \big)^2 \Big\}
\]
    +

    (the notation \(\min_{P}\{ f(x, P) \}\) means that we desire to find \(P\) that yields the minimum of \(f(x, P)\))

    +

    or, in terms of weights and biases for the hidden and output layer in our network:

    +
\[
\min_{P_{\text{hidden} }, \ P_{\text{output} }}\Big\{ \big(g_t'(x, \{ P_{\text{hidden} }, P_{\text{output} }\}) - ( -\gamma g_t(x, \{ P_{\text{hidden} }, P_{\text{output} }\})) \big)^2 \Big\}
\]
    +

    for an input value \(x\).

    +
    +
    +

    More details#

    +

    If the neural network evaluates \(g_t(x, P)\) at more values for \(x\), say \(N\) values \(x_i\) for \(i = 1, \dots, N\), then the total error to minimize becomes

    + +
    +
\[
\begin{equation} \label{min} \tag{9}
\min_{P}\Big\{\frac{1}{N} \sum_{i=1}^N \big(g_t'(x_i, P) - ( -\gamma g_t(x_i, P)) \big)^2 \Big\}
\end{equation}
\]
    +

Letting \(\boldsymbol{x}\) be a vector with elements \(x_i\) and \(C(\boldsymbol{x}, P) = \frac{1}{N} \sum_i \big(g_t'(x_i, P) - ( -\gamma g_t(x_i, P)) \big)^2\) denote the cost function, the minimization problem that our network must solve becomes

    +
    +\[ +\min_{P} C(\boldsymbol{x}, P) +\]
    +

    In terms of \(P_{\text{hidden} }\) and \(P_{\text{output} }\), this could also be expressed as

    +
    +\[ +\min_{P_{\text{hidden} }, \ P_{\text{output} }} C(\boldsymbol{x}, \{P_{\text{hidden} }, P_{\text{output} }\}) +\]
    +
    +
    +

    A possible implementation of a neural network#

    +

    For simplicity, it is assumed that the input is an array \(\boldsymbol{x} = (x_1, \dots, x_N)\) with \(N\) elements. It is at these points the neural network should find \(P\) such that it fulfills (9).

    +

First, the neural network must feed forward the inputs. This means that \(\boldsymbol{x}\) must be passed through an input layer, a hidden layer and an output layer. The input layer in this case does not need to process the data any further. The input layer will consist of \(N_{\text{input}}\) neurons, each passing its element to every neuron in the hidden layer. The number of neurons in the hidden layer will be \(N_{\text{hidden}}\).

    +
    +
    +

    Technicalities#

    +

For the \(i\)-th neuron in the hidden layer, with weight \(w_i^{\text{hidden}}\) and bias \(b_i^{\text{hidden}}\), the weighting of the input from the \(j\)-th neuron at the input layer is:

    +
    +\[\begin{split} +\begin{aligned} +z_{i,j}^{\text{hidden}} &= b_i^{\text{hidden}} + w_i^{\text{hidden}}x_j \\ +&= +\begin{pmatrix} +b_i^{\text{hidden}} & w_i^{\text{hidden}} +\end{pmatrix} +\begin{pmatrix} +1 \\ +x_j +\end{pmatrix} +\end{aligned} +\end{split}\]
    +
    +
    +

    Final technicalities I#

    +

    The result after weighting the inputs at the \(i\)-th hidden neuron can be written as a vector:

    +
    +\[\begin{split} +\begin{aligned} +\boldsymbol{z}_{i}^{\text{hidden}} &= \Big( b_i^{\text{hidden}} + w_i^{\text{hidden}}x_1 , \ b_i^{\text{hidden}} + w_i^{\text{hidden}} x_2, \ \dots \, , \ b_i^{\text{hidden}} + w_i^{\text{hidden}} x_N\Big) \\ +&= +\begin{pmatrix} + b_i^{\text{hidden}} & w_i^{\text{hidden}} +\end{pmatrix} +\begin{pmatrix} +1 & 1 & \dots & 1 \\ +x_1 & x_2 & \dots & x_N +\end{pmatrix} \\ +&= \boldsymbol{p}_{i, \text{hidden}}^T X +\end{aligned} +\end{split}\]
    +
    +
    +

    Final technicalities II#

    +

    The vector \(\boldsymbol{p}_{i, \text{hidden}}^T\) constitutes each row in \(P_{\text{hidden} }\), which contains the weights for the neural network to minimize according to (9).

    +

    After having found \(\boldsymbol{z}_{i}^{\text{hidden}} \) for every \(i\)-th neuron within the hidden layer, the vector will be sent to an activation function \(a_i(\boldsymbol{z})\).

    +

    In this example, the sigmoid function has been chosen to be the activation function for each hidden neuron:

    +
    +\[ +f(z) = \frac{1}{1 + \exp{(-z)}} +\]
    +

It is possible to use other activation functions for the hidden layer as well.

    +

    The output \(\boldsymbol{x}_i^{\text{hidden}}\) from each \(i\)-th hidden neuron is:

    +
    +\[ +\boldsymbol{x}_i^{\text{hidden} } = f\big( \boldsymbol{z}_{i}^{\text{hidden}} \big) +\]
    +

    The outputs \(\boldsymbol{x}_i^{\text{hidden} } \) are then sent to the output layer.

    +

The output layer consists of one neuron in this case, and combines the output from each of the neurons in the hidden layer using some weights \(w_i^{\text{output}}\) and biases \(b_i^{\text{output}}\). That is, it is assumed that the number of neurons in the output layer is one.

    +
    +
    +

    Final technicalities III#

    +

The procedure of weighting the output from neuron \(j\) in the hidden layer into the \(i\)-th neuron in the output layer is similar to the procedure for the hidden layer described previously.

    +
    +\[\begin{split} +\begin{aligned} +z_{1,j}^{\text{output}} & = +\begin{pmatrix} +b_1^{\text{output}} & \boldsymbol{w}_1^{\text{output}} +\end{pmatrix} +\begin{pmatrix} +1 \\ +\boldsymbol{x}_j^{\text{hidden}} +\end{pmatrix} +\end{aligned} +\end{split}\]
    +
    +
    +

    Final technicalities IV#

    +

    Expressing \(z_{1,j}^{\text{output}}\) as a vector gives the following way of weighting the inputs from the hidden layer:

    +
    +\[\begin{split} +\boldsymbol{z}_{1}^{\text{output}} = +\begin{pmatrix} +b_1^{\text{output}} & \boldsymbol{w}_1^{\text{output}} +\end{pmatrix} +\begin{pmatrix} +1 & 1 & \dots & 1 \\ +\boldsymbol{x}_1^{\text{hidden}} & \boldsymbol{x}_2^{\text{hidden}} & \dots & \boldsymbol{x}_N^{\text{hidden}} +\end{pmatrix} +\end{split}\]
    +

    In this case we seek a continuous range of values since we are approximating a function. This means that after computing \(\boldsymbol{z}_{1}^{\text{output}}\) the neural network has finished its feed forward step, and \(\boldsymbol{z}_{1}^{\text{output}}\) is the final output of the network.

    +
    +
    +

    Back propagation#

    +

    The next step is to decide how the parameters should be changed such that they minimize the cost function.

    +

    The chosen cost function for this problem is

    +
\[
C(\boldsymbol{x}, P) = \frac{1}{N} \sum_i \big(g_t'(x_i, P) - ( -\gamma g_t(x_i, P)) \big)^2
\]
    +

    In order to minimize the cost function, an optimization method must be chosen.

    +

    Here, gradient descent with a constant step size has been chosen.

    +
    +
    +

    Gradient descent#

    +

The idea of the gradient descent algorithm is to update the parameters in a direction in which the cost function decreases towards a minimum.

    +

    In general, the update of some parameters \(\boldsymbol{\omega}\) given a cost +function defined by some weights \(\boldsymbol{\omega}\), \(C(\boldsymbol{x}, +\boldsymbol{\omega})\), goes as follows:

    +
    +\[ +\boldsymbol{\omega}_{\text{new} } = \boldsymbol{\omega} - \lambda \nabla_{\boldsymbol{\omega}} C(\boldsymbol{x}, \boldsymbol{\omega}) +\]
    +

    for a number of iterations or until \( \big|\big| \boldsymbol{\omega}_{\text{new} } - \boldsymbol{\omega} \big|\big|\) becomes smaller than some given tolerance.

    +

The value of \(\lambda\) decides how large a step the algorithm takes in the direction of \( \nabla_{\boldsymbol{\omega}} C(\boldsymbol{x}, \boldsymbol{\omega})\). The notation \(\nabla_{\boldsymbol{\omega}}\) expresses the gradient with respect to the elements in \(\boldsymbol{\omega}\).

    +

    In our case, we have to minimize the cost function \(C(\boldsymbol{x}, P)\) with +respect to the two sets of weights and biases, that is for the hidden +layer \(P_{\text{hidden} }\) and for the output layer \(P_{\text{output} +}\) .

    +

This means that \(P_{\text{hidden} }\) and \(P_{\text{output} }\) are updated by

    +
    +\[\begin{split} +\begin{aligned} +P_{\text{hidden},\text{new}} &= P_{\text{hidden}} - \lambda \nabla_{P_{\text{hidden}}} C(\boldsymbol{x}, P) \\ +P_{\text{output},\text{new}} &= P_{\text{output}} - \lambda \nabla_{P_{\text{output}}} C(\boldsymbol{x}, P) +\end{aligned} +\end{split}\]
    +
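To see the update rule in isolation, here is a minimal, generic sketch (with a hypothetical tolerance-based stopping criterion; the full program below instead uses a fixed number of iterations), minimizing a simple one-parameter cost function:

import autograd.numpy as np
from autograd import grad

def C(w):
    # Simple cost function with minimum at w = 3
    return (w - 3.0)**2

dC = grad(C)
w, lmb, tol = 0.0, 0.1, 1e-8
for _ in range(10000):
    w_new = w - lmb*dC(w)
    if np.abs(w_new - w) < tol:
        break
    w = w_new
print(w)   # close to 3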
    +
    +

    The code for solving the ODE#

    +
    +
    +
    %matplotlib inline
    +
    +import autograd.numpy as np
    +from autograd import grad, elementwise_grad
    +import autograd.numpy.random as npr
    +from matplotlib import pyplot as plt
    +
    +def sigmoid(z):
    +    return 1/(1 + np.exp(-z))
    +
    +# Assuming one input, hidden, and output layer
    +def neural_network(params, x):
    +
+    # Find the weights (and biases) for the hidden and output layer.
+    # Assume that params is a list of parameters for each layer.
+    # The biases are the first element for each array in params,
+    # and the weights are the remaining elements in each array in params.
    +
    +    w_hidden = params[0]
    +    w_output = params[1]
    +
+    # Assumes the input x is a one-dimensional array
    +    num_values = np.size(x)
    +    x = x.reshape(-1, num_values)
    +
    +    # Assume that the input layer does nothing to the input x
    +    x_input = x
    +
    +    ## Hidden layer:
    +
    +    # Add a row of ones to include bias
    +    x_input = np.concatenate((np.ones((1,num_values)), x_input ), axis = 0)
    +
    +    z_hidden = np.matmul(w_hidden, x_input)
    +    x_hidden = sigmoid(z_hidden)
    +
    +    ## Output layer:
    +
    +    # Include bias:
    +    x_hidden = np.concatenate((np.ones((1,num_values)), x_hidden ), axis = 0)
    +
    +    z_output = np.matmul(w_output, x_hidden)
    +    x_output = z_output
    +
    +    return x_output
    +
    +# The trial solution using the deep neural network:
    +def g_trial(x,params, g0 = 10):
    +    return g0 + x*neural_network(params,x)
    +
    +# The right side of the ODE:
    +def g(x, g_trial, gamma = 2):
    +    return -gamma*g_trial
    +
    +# The cost function:
    +def cost_function(P, x):
    +
    +    # Evaluate the trial function with the current parameters P
    +    g_t = g_trial(x,P)
    +
    +    # Find the derivative w.r.t x of the neural network
    +    d_net_out = elementwise_grad(neural_network,1)(P,x)
    +
    +    # Find the derivative w.r.t x of the trial function
    +    d_g_t = elementwise_grad(g_trial,0)(x,P)
    +
    +    # The right side of the ODE
    +    func = g(x, g_t)
    +
    +    err_sqr = (d_g_t - func)**2
    +    cost_sum = np.sum(err_sqr)
    +
    +    return cost_sum / np.size(err_sqr)
    +
    +# Solve the exponential decay ODE using neural network with one input, hidden, and output layer
    +def solve_ode_neural_network(x, num_neurons_hidden, num_iter, lmb):
    +    ## Set up initial weights and biases
    +
    +    # For the hidden layer
    +    p0 = npr.randn(num_neurons_hidden, 2 )
    +
    +    # For the output layer
    +    p1 = npr.randn(1, num_neurons_hidden + 1 ) # +1 since bias is included
    +
    +    P = [p0, p1]
    +
    +    print('Initial cost: %g'%cost_function(P, x))
    +
    +    ## Start finding the optimal weights using gradient descent
    +
    +    # Find the Python function that represents the gradient of the cost function
    +    # w.r.t the 0-th input argument -- that is the weights and biases in the hidden and output layer
    +    cost_function_grad = grad(cost_function,0)
    +
    +    # Let the update be done num_iter times
    +    for i in range(num_iter):
    +        # Evaluate the gradient at the current weights and biases in P.
    +        # The cost_grad consist now of two arrays;
    +        # one for the gradient w.r.t P_hidden and
    +        # one for the gradient w.r.t P_output
    +        cost_grad =  cost_function_grad(P, x)
    +
    +        P[0] = P[0] - lmb * cost_grad[0]
    +        P[1] = P[1] - lmb * cost_grad[1]
    +
    +    print('Final cost: %g'%cost_function(P, x))
    +
    +    return P
    +
    +def g_analytic(x, gamma = 2, g0 = 10):
    +    return g0*np.exp(-gamma*x)
    +
    +# Solve the given problem
    +if __name__ == '__main__':
+    # Set the seed such that the weights and biases are initialized
+    # to the same values for every run.
    +    npr.seed(15)
    +
+    ## Decide the values of arguments to the function to solve
    +    N = 10
    +    x = np.linspace(0, 1, N)
    +
    +    ## Set up the initial parameters
    +    num_hidden_neurons = 10
    +    num_iter = 10000
    +    lmb = 0.001
    +
    +    # Use the network
    +    P = solve_ode_neural_network(x, num_hidden_neurons, num_iter, lmb)
    +
    +    # Print the deviation from the trial solution and true solution
    +    res = g_trial(x,P)
    +    res_analytical = g_analytic(x)
    +
    +    print('Max absolute difference: %g'%np.max(np.abs(res - res_analytical)))
    +
    +    # Plot the results
    +    plt.figure(figsize=(10,10))
    +
    +    plt.title('Performance of neural network solving an ODE compared to the analytical solution')
    +    plt.plot(x, res_analytical)
    +    plt.plot(x, res[0,:])
    +    plt.legend(['analytical','nn'])
    +    plt.xlabel('x')
    +    plt.ylabel('g(x)')
    +    plt.show()
    +
    +
    +
    +
    +
    +
    +

    The network with one input layer, specified number of hidden layers, and one output layer#

    +

It is also possible to extend the construction of our network into a more general one, allowing the network to contain more than one hidden layer.

    +

The number of neurons within each hidden layer is given as a list of integers in the program below.

    +
    +
    +
    import autograd.numpy as np
    +from autograd import grad, elementwise_grad
    +import autograd.numpy.random as npr
    +from matplotlib import pyplot as plt
    +
    +def sigmoid(z):
    +    return 1/(1 + np.exp(-z))
    +
    +# The neural network with one input layer and one output layer,
    +# but with number of hidden layers specified by the user.
    +def deep_neural_network(deep_params, x):
    +    # N_hidden is the number of hidden layers
    +    # deep_params is a list, len() should be used
    +    N_hidden = len(deep_params) - 1 # -1 since params consists of
    +                                        # parameters to all the hidden
    +                                        # layers AND the output layer.
    +
    +    # Assumes input x being an one-dimensional array
    +    num_values = np.size(x)
    +    x = x.reshape(-1, num_values)
    +
    +    # Assume that the input layer does nothing to the input x
    +    x_input = x
    +
    +    # Due to multiple hidden layers, define a variable referencing to the
    +    # output of the previous layer:
    +    x_prev = x_input
    +
    +    ## Hidden layers:
    +
    +    for l in range(N_hidden):
    +        # From the list of parameters P; find the correct weights and bias for this layer
    +        w_hidden = deep_params[l]
    +
    +        # Add a row of ones to include bias
    +        x_prev = np.concatenate((np.ones((1,num_values)), x_prev ), axis = 0)
    +
    +        z_hidden = np.matmul(w_hidden, x_prev)
    +        x_hidden = sigmoid(z_hidden)
    +
    +        # Update x_prev such that next layer can use the output from this layer
    +        x_prev = x_hidden
    +
    +    ## Output layer:
    +
    +    # Get the weights and bias for this layer
    +    w_output = deep_params[-1]
    +
    +    # Include bias:
    +    x_prev = np.concatenate((np.ones((1,num_values)), x_prev), axis = 0)
    +
    +    z_output = np.matmul(w_output, x_prev)
    +    x_output = z_output
    +
    +    return x_output
    +
    +# The trial solution using the deep neural network:
    +def g_trial_deep(x,params, g0 = 10):
    +    return g0 + x*deep_neural_network(params, x)
    +
    +# The right side of the ODE:
    +def g(x, g_trial, gamma = 2):
    +    return -gamma*g_trial
    +
    +# The same cost function as before, but calls deep_neural_network instead.
    +def cost_function_deep(P, x):
    +
    +    # Evaluate the trial function with the current parameters P
    +    g_t = g_trial_deep(x,P)
    +
    +    # Find the derivative w.r.t x of the neural network
    +    d_net_out = elementwise_grad(deep_neural_network,1)(P,x)
    +
    +    # Find the derivative w.r.t x of the trial function
    +    d_g_t = elementwise_grad(g_trial_deep,0)(x,P)
    +
    +    # The right side of the ODE
    +    func = g(x, g_t)
    +
    +    err_sqr = (d_g_t - func)**2
    +    cost_sum = np.sum(err_sqr)
    +
    +    return cost_sum / np.size(err_sqr)
    +
    +# Solve the exponential decay ODE using neural network with one input and one output layer,
    +# but with specified number of hidden layers from the user.
    +def solve_ode_deep_neural_network(x, num_neurons, num_iter, lmb):
    +    # num_hidden_neurons is now a list of number of neurons within each hidden layer
    +
    +    # The number of elements in the list num_hidden_neurons thus represents
    +    # the number of hidden layers.
    +
    +    # Find the number of hidden layers:
    +    N_hidden = np.size(num_neurons)
    +
    +    ## Set up initial weights and biases
    +
    +    # Initialize the list of parameters:
    +    P = [None]*(N_hidden + 1) # + 1 to include the output layer
    +
    +    P[0] = npr.randn(num_neurons[0], 2 )
    +    for l in range(1,N_hidden):
    +        P[l] = npr.randn(num_neurons[l], num_neurons[l-1] + 1) # +1 to include bias
    +
    +    # For the output layer
    +    P[-1] = npr.randn(1, num_neurons[-1] + 1 ) # +1 since bias is included
    +
    +    print('Initial cost: %g'%cost_function_deep(P, x))
    +
    +    ## Start finding the optimal weights using gradient descent
    +
    +    # Find the Python function that represents the gradient of the cost function
    +    # w.r.t the 0-th input argument -- that is the weights and biases in the hidden and output layer
    +    cost_function_deep_grad = grad(cost_function_deep,0)
    +
    +    # Let the update be done num_iter times
    +    for i in range(num_iter):
    +        # Evaluate the gradient at the current weights and biases in P.
    +        # The cost_grad consist now of N_hidden + 1 arrays; the gradient w.r.t the weights and biases
    +        # in the hidden layers and output layers evaluated at x.
    +        cost_deep_grad =  cost_function_deep_grad(P, x)
    +
    +        for l in range(N_hidden+1):
    +            P[l] = P[l] - lmb * cost_deep_grad[l]
    +
    +    print('Final cost: %g'%cost_function_deep(P, x))
    +
    +    return P
    +
    +def g_analytic(x, gamma = 2, g0 = 10):
    +    return g0*np.exp(-gamma*x)
    +
    +# Solve the given problem
    +if __name__ == '__main__':
    +    npr.seed(15)
    +
    +    ## Decide the values of arguments to the function to solve
    +    N = 10
    +    x = np.linspace(0, 1, N)
    +
    +    ## Set up the initial parameters
    +    num_hidden_neurons = np.array([10,10])
    +    num_iter = 10000
    +    lmb = 0.001
    +
    +    P = solve_ode_deep_neural_network(x, num_hidden_neurons, num_iter, lmb)
    +
    +    res = g_trial_deep(x,P)
    +    res_analytical = g_analytic(x)
    +
    +    plt.figure(figsize=(10,10))
    +
    +    plt.title('Performance of a deep neural network solving an ODE compared to the analytical solution')
    +    plt.plot(x, res_analytical)
    +    plt.plot(x, res[0,:])
    +    plt.legend(['analytical','dnn'])
    +    plt.ylabel('g(x)')
    +    plt.show()
    +
    +
    +
    +
    +
    +
    +

    Example: Population growth#

    +

    A logistic model of population growth assumes that a population converges toward an equilibrium. +The population growth can be modeled by

    + +
    +
    +\[ \begin{equation} \label{log} \tag{10} g'(t) = \alpha g(t)(A - g(t)) \end{equation} \]
    +

    where \(g(t)\) is the population density at time \(t\), \(\alpha > 0\) is the growth rate and \(A > 0\) is the maximum population number the environment can sustain. Also, at \(t = 0\) the population has the size \(g(0) = g_0\), where \(g_0\) is some chosen constant.

    +

    In this example, a network similar to the one used for the exponential decay with Autograd is used to solve the equation. However, since the implementation may suffer from numerical instability and long execution times (this becomes more apparent in the examples solving PDEs), using a library like TensorFlow is recommended. Here we stay with the simpler approach and, for comparison, also implement the forward Euler method.

    +
    +
    +

    Setting up the problem#

    +

    Here, we will model a population \(g(t)\) in an environment having carrying capacity \(A\). +The population follows the model

    + +
    +
    +\[ \begin{equation} \label{solveode_population} \tag{11} g'(t) = \alpha g(t)(A - g(t)) \end{equation} \]
    +

    where \(g(0) = g_0\).

    +

    In this example, we let \(\alpha = 2\), \(A = 1\), and \(g_0 = 1.2\).

    +
    +
    +

    The trial solution#

    +

    We will get a slightly different trial solution, as the boundary conditions are different +compared to the case for exponential decay.

    +

    A possible trial solution satisfying the condition \(g(0) = g_0\) could be

    +
    +\[ h_1(t) = g_0 + t \cdot N(t,P) \]
    +

    with \(N(t,P)\) being the output from the neural network with weights and biases for each layer collected in the set \(P\).

    +

    The analytical solution is

    +
    +\[ g(t) = \frac{Ag_0}{g_0 + (A - g_0)\exp(-\alpha A t)} \]
    +
    +
    +

    The program using Autograd#

    +

    The network will be similar to the one used in the exponential decay example, but with some small modifications for our problem.

    +
    +
    +
    import autograd.numpy as np
    +from autograd import grad, elementwise_grad
    +import autograd.numpy.random as npr
    +from matplotlib import pyplot as plt
    +
    +def sigmoid(z):
    +    return 1/(1 + np.exp(-z))
    +
    +# Function to get the parameters.
    +# Done such that one can easily change the parameters to one's liking.
    +def get_parameters():
    +    alpha = 2
    +    A = 1
    +    g0 = 1.2
    +    return alpha, A, g0
    +
    +def deep_neural_network(deep_params, x):
    +    # N_hidden is the number of hidden layers
    +    # deep_params is a list, len() should be used
    +    N_hidden = len(deep_params) - 1 # -1 since params consists of
    +                                        # parameters to all the hidden
    +                                        # layers AND the output layer.
    +
    +    # Assumes input x being an one-dimensional array
    +    num_values = np.size(x)
    +    x = x.reshape(-1, num_values)
    +
    +    # Assume that the input layer does nothing to the input x
    +    x_input = x
    +
    +    # Due to multiple hidden layers, define a variable referencing to the
    +    # output of the previous layer:
    +    x_prev = x_input
    +
    +    ## Hidden layers:
    +
    +    for l in range(N_hidden):
    +        # From the list of parameters P; find the correct weights and bias for this layer
    +        w_hidden = deep_params[l]
    +
    +        # Add a row of ones to include bias
    +        x_prev = np.concatenate((np.ones((1,num_values)), x_prev ), axis = 0)
    +
    +        z_hidden = np.matmul(w_hidden, x_prev)
    +        x_hidden = sigmoid(z_hidden)
    +
    +        # Update x_prev such that next layer can use the output from this layer
    +        x_prev = x_hidden
    +
    +    ## Output layer:
    +
    +    # Get the weights and bias for this layer
    +    w_output = deep_params[-1]
    +
    +    # Include bias:
    +    x_prev = np.concatenate((np.ones((1,num_values)), x_prev), axis = 0)
    +
    +    z_output = np.matmul(w_output, x_prev)
    +    x_output = z_output
    +
    +    return x_output
    +
    +
    +
    +
    +def cost_function_deep(P, x):
    +
    +    # Evaluate the trial function with the current parameters P
    +    g_t = g_trial_deep(x,P)
    +
    +    # Find the derivative w.r.t x of the trial function
    +    d_g_t = elementwise_grad(g_trial_deep,0)(x,P)
    +
    +    # The right side of the ODE
    +    func = f(x, g_t)
    +
    +    err_sqr = (d_g_t - func)**2
    +    cost_sum = np.sum(err_sqr)
    +
    +    return cost_sum / np.size(err_sqr)
    +
    +# The right side of the ODE:
    +def f(x, g_trial):
    +    alpha,A, g0 = get_parameters()
    +    return alpha*g_trial*(A - g_trial)
    +
    +# The trial solution using the deep neural network:
    +def g_trial_deep(x, params):
    +    alpha,A, g0 = get_parameters()
    +    return g0 + x*deep_neural_network(params,x)
    +
    +# The analytical solution:
    +def g_analytic(t):
    +    alpha,A, g0 = get_parameters()
    +    return A*g0/(g0 + (A - g0)*np.exp(-alpha*A*t))
    +
    +def solve_ode_deep_neural_network(x, num_neurons, num_iter, lmb):
    +    # num_hidden_neurons is now a list of number of neurons within each hidden layer
    +
    +    # Find the number of hidden layers:
    +    N_hidden = np.size(num_neurons)
    +
    +    ## Set up initial weights and biases
    +
    +    # Initialize the list of parameters:
    +    P = [None]*(N_hidden + 1) # + 1 to include the output layer
    +
    +    P[0] = npr.randn(num_neurons[0], 2 )
    +    for l in range(1,N_hidden):
    +        P[l] = npr.randn(num_neurons[l], num_neurons[l-1] + 1) # +1 to include bias
    +
    +    # For the output layer
    +    P[-1] = npr.randn(1, num_neurons[-1] + 1 ) # +1 since bias is included
    +
    +    print('Initial cost: %g'%cost_function_deep(P, x))
    +
    +    ## Start finding the optimal weights using gradient descent
    +
    +    # Find the Python function that represents the gradient of the cost function
    +    # w.r.t the 0-th input argument -- that is the weights and biases in the hidden and output layer
    +    cost_function_deep_grad = grad(cost_function_deep,0)
    +
    +    # Let the update be done num_iter times
    +    for i in range(num_iter):
    +        # Evaluate the gradient at the current weights and biases in P.
    +        # The cost_grad consist now of N_hidden + 1 arrays; the gradient w.r.t the weights and biases
    +        # in the hidden layers and output layers evaluated at x.
    +        cost_deep_grad =  cost_function_deep_grad(P, x)
    +
    +        for l in range(N_hidden+1):
    +            P[l] = P[l] - lmb * cost_deep_grad[l]
    +
    +    print('Final cost: %g'%cost_function_deep(P, x))
    +
    +    return P
    +
    +if __name__ == '__main__':
    +    npr.seed(4155)
    +
    +    ## Decide the values of arguments to the function to solve
    +    Nt = 10
    +    T = 1
    +    t = np.linspace(0,T, Nt)
    +
    +    ## Set up the initial parameters
    +    num_hidden_neurons = [100, 50, 25]
    +    num_iter = 1000
    +    lmb = 1e-3
    +
    +    P = solve_ode_deep_neural_network(t, num_hidden_neurons, num_iter, lmb)
    +
    +    g_dnn_ag = g_trial_deep(t,P)
    +    g_analytical = g_analytic(t)
    +
    +    # Find the maximum absolute difference between the solutions:
    +    diff_ag = np.max(np.abs(g_dnn_ag - g_analytical))
    +    print("The max absolute difference between the solutions is: %g"%diff_ag)
    +
    +    plt.figure(figsize=(10,10))
    +
    +    plt.title('Performance of neural network solving an ODE compared to the analytical solution')
    +    plt.plot(t, g_analytical)
    +    plt.plot(t, g_dnn_ag[0,:])
    +    plt.legend(['analytical','nn'])
    +    plt.xlabel('t')
    +    plt.ylabel('g(t)')
    +
    +    plt.show()
    +
    +
    +
    +
    +
    +
    +

    Using forward Euler to solve the ODE#

    +

    A straightforward way of solving an ODE numerically is to use Euler’s method.

    +

    Euler’s method uses a Taylor series to approximate the value of a function \(f\) at a step \(\Delta x\) away from \(x\):

    +
    +\[ f(x + \Delta x) \approx f(x) + \Delta x f'(x) \]
    +

    In our case, using Euler’s method to approximate the value of \(g\) at a step \(\Delta t\) from \(t\) yields

    +
    +\[\begin{split}\begin{aligned}
    g(t + \Delta t) &\approx g(t) + \Delta t g'(t) \\
    &= g(t) + \Delta t \big(\alpha g(t)(A - g(t))\big)
    \end{aligned}\end{split}\]
    +

    along with the condition that \(g(0) = g_0\).

    +

    Let \(t_i = i \cdot \Delta t\) for \(i = 0, \dots, N_t-1\), where \(\Delta t = \frac{T}{N_t-1}\), \(T\) is the final time our solver must reach, and \(N_t\) is the number of values for \(t \in [0, T]\).

    +

    For \(i \geq 1\), we have that

    +
    +\[\begin{split}\begin{aligned}
    t_i &= i\Delta t \\
    &= (i - 1)\Delta t + \Delta t \\
    &= t_{i-1} + \Delta t
    \end{aligned}\end{split}\]
    +

    Now, if \(g_i = g(t_i)\) then

    + +
    +
    +\[\begin{split}\begin{equation} \begin{aligned}
    g_i &= g(t_i) \\
    &= g(t_{i-1} + \Delta t) \\
    &\approx g(t_{i-1}) + \Delta t \big(\alpha g(t_{i-1})(A - g(t_{i-1}))\big) \\
    &= g_{i-1} + \Delta t \big(\alpha g_{i-1}(A - g_{i-1})\big)
    \end{aligned} \end{equation} \label{odenum} \tag{12} \end{split}\]
    +

    for \(i \geq 1\), with the initial value \(g_0 = g(t_0) = g(0)\).

    +

    Equation (12) could be implemented in the following way, extending the Autograd-based program above:

    +
    +
    +
    # Assume that all function definitions from the example program using Autograd
    +# are located here.
    +
    +if __name__ == '__main__':
    +    npr.seed(4155)
    +
    +    ## Decide the values of arguments to the function to solve
    +    Nt = 10
    +    T = 1
    +    t = np.linspace(0,T, Nt)
    +
    +    ## Set up the initial parameters
    +    num_hidden_neurons = [100,50,25]
    +    num_iter = 1000
    +    lmb = 1e-3
    +
    +    P = solve_ode_deep_neural_network(t, num_hidden_neurons, num_iter, lmb)
    +
    +    g_dnn_ag = g_trial_deep(t,P)
    +    g_analytical = g_analytic(t)
    +
    +    # Find the maximum absolute difference between the solutions:
    +    diff_ag = np.max(np.abs(g_dnn_ag - g_analytical))
    +    print("The max absolute difference between the solutions is: %g"%diff_ag)
    +
    +    plt.figure(figsize=(10,10))
    +
    +    plt.title('Performance of neural network solving an ODE compared to the analytical solution')
    +    plt.plot(t, g_analytical)
    +    plt.plot(t, g_dnn_ag[0,:])
    +    plt.legend(['analytical','nn'])
    +    plt.xlabel('t')
    +    plt.ylabel('g(t)')
    +
    +    ## Find an approximation to the function using forward Euler
    +
    +    alpha, A, g0 = get_parameters()
    +    dt = T/(Nt - 1)
    +
    +    # Perform forward Euler to solve the ODE
    +    g_euler = np.zeros(Nt)
    +    g_euler[0] = g0
    +
    +    for i in range(1,Nt):
    +        g_euler[i] = g_euler[i-1] + dt*(alpha*g_euler[i-1]*(A - g_euler[i-1]))
    +
    +    # Print the errors done by each method
    +    diff1 = np.max(np.abs(g_euler - g_analytical))
    +    diff2 = np.max(np.abs(g_dnn_ag[0,:] - g_analytical))
    +
    +    print('Max absolute difference between Euler method and analytical: %g'%diff1)
    +    print('Max absolute difference between deep neural network and analytical: %g'%diff2)
    +
    +    # Plot results
    +    plt.figure(figsize=(10,10))
    +
    +    plt.plot(t,g_euler)
    +    plt.plot(t,g_analytical)
    +    plt.plot(t,g_dnn_ag[0,:])
    +
    +    plt.legend(['euler','analytical','dnn'])
    +    plt.xlabel('Time t')
    +    plt.ylabel('g(t)')
    +
    +    plt.show()
    +
    +
    +
    +
    +
    +
    +

    Example: Solving the one dimensional Poisson equation#

    +

    The Poisson equation for \(g(x)\) in one dimension is

    + +
    +
    +\[ \begin{equation} \label{poisson} \tag{13} -g''(x) = f(x) \end{equation} \]
    +

    where \(f(x)\) is a given function for \(x \in (0,1)\).

    +

    The conditions that \(g(x)\) is chosen to fulfill, are

    +
    +\[\begin{split}\begin{align*}
    g(0) &= 0 \\
    g(1) &= 0
    \end{align*}\end{split}\]
    +

    This equation can be solved numerically using neural networks implemented with e.g. Autograd or TensorFlow. The results from the networks can then be compared to the analytical solution. In addition, it is interesting to see how a standard numerical method for second-order ODEs compares to the neural networks.

    +
    +
    +

    The specific equation to solve for#

    +

    Here, the function \(g(x)\) to solve for follows the equation

    +
    +\[ -g''(x) = f(x),\qquad x \in (0,1) \]
    +

    where \(f(x)\) is a given function, along with the chosen conditions

    + +
    +
    +\[ \begin{aligned} g(0) = g(1) = 0 \end{aligned} \label{cond} \tag{14} \]
    +

    In this example, we consider the case when \(f(x) = (3x + x^2)\exp(x)\).

    +

    For this case, a possible trial solution satisfying the conditions could be

    +
    +\[ g_t(x) = x \cdot (1-x) \cdot N(P,x) \]
    +

    The analytical solution for this problem is

    +
    +\[ g(x) = x(1 - x)\exp(x) \]
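    As a quick sanity check, we can verify with Autograd (the same machinery used in the program below) that this analytical solution indeed satisfies \(-g''(x) = f(x)\). This is a minimal sketch; the grid of test points is an arbitrary choice.

    import autograd.numpy as np
    from autograd import elementwise_grad

    def f(x):
        return (3*x + x**2)*np.exp(x)

    def g_analytic(x):
        return x*(1 - x)*np.exp(x)

    # Differentiate the analytical solution twice with respect to x
    d2_g_analytic = elementwise_grad(elementwise_grad(g_analytic))

    x = np.linspace(0, 1, 11)
    # -g''(x) - f(x) should vanish (up to round-off) at every test point
    print(np.max(np.abs(-d2_g_analytic(x) - f(x))))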
    +
    +
    +

    Solving the equation using Autograd#

    +
    +
    +
    import autograd.numpy as np
    +from autograd import grad, elementwise_grad
    +import autograd.numpy.random as npr
    +from matplotlib import pyplot as plt
    +
    +def sigmoid(z):
    +    return 1/(1 + np.exp(-z))
    +
    +def deep_neural_network(deep_params, x):
    +    # N_hidden is the number of hidden layers
    +    # deep_params is a list, len() should be used
    +    N_hidden = len(deep_params) - 1 # -1 since params consists of
    +                                        # parameters to all the hidden
    +                                        # layers AND the output layer.
    +
    +    # Assumes input x being an one-dimensional array
    +    num_values = np.size(x)
    +    x = x.reshape(-1, num_values)
    +
    +    # Assume that the input layer does nothing to the input x
    +    x_input = x
    +
    +    # Due to multiple hidden layers, define a variable referencing to the
    +    # output of the previous layer:
    +    x_prev = x_input
    +
    +    ## Hidden layers:
    +
    +    for l in range(N_hidden):
    +        # From the list of parameters P; find the correct weights and bias for this layer
    +        w_hidden = deep_params[l]
    +
    +        # Add a row of ones to include bias
    +        x_prev = np.concatenate((np.ones((1,num_values)), x_prev ), axis = 0)
    +
    +        z_hidden = np.matmul(w_hidden, x_prev)
    +        x_hidden = sigmoid(z_hidden)
    +
    +        # Update x_prev such that next layer can use the output from this layer
    +        x_prev = x_hidden
    +
    +    ## Output layer:
    +
    +    # Get the weights and bias for this layer
    +    w_output = deep_params[-1]
    +
    +    # Include bias:
    +    x_prev = np.concatenate((np.ones((1,num_values)), x_prev), axis = 0)
    +
    +    z_output = np.matmul(w_output, x_prev)
    +    x_output = z_output
    +
    +    return x_output
    +
    +
    +def solve_ode_deep_neural_network(x, num_neurons, num_iter, lmb):
    +    # num_hidden_neurons is now a list of number of neurons within each hidden layer
    +
    +    # Find the number of hidden layers:
    +    N_hidden = np.size(num_neurons)
    +
    +    ## Set up initial weights and biases
    +
    +    # Initialize the list of parameters:
    +    P = [None]*(N_hidden + 1) # + 1 to include the output layer
    +
    +    P[0] = npr.randn(num_neurons[0], 2 )
    +    for l in range(1,N_hidden):
    +        P[l] = npr.randn(num_neurons[l], num_neurons[l-1] + 1) # +1 to include bias
    +
    +    # For the output layer
    +    P[-1] = npr.randn(1, num_neurons[-1] + 1 ) # +1 since bias is included
    +
    +    print('Initial cost: %g'%cost_function_deep(P, x))
    +
    +    ## Start finding the optimal weights using gradient descent
    +
    +    # Find the Python function that represents the gradient of the cost function
    +    # w.r.t the 0-th input argument -- that is the weights and biases in the hidden and output layer
    +    cost_function_deep_grad = grad(cost_function_deep,0)
    +
    +    # Let the update be done num_iter times
    +    for i in range(num_iter):
    +        # Evaluate the gradient at the current weights and biases in P.
    +        # The cost_grad consist now of N_hidden + 1 arrays; the gradient w.r.t the weights and biases
    +        # in the hidden layers and output layers evaluated at x.
    +        cost_deep_grad =  cost_function_deep_grad(P, x)
    +
    +        for l in range(N_hidden+1):
    +            P[l] = P[l] - lmb * cost_deep_grad[l]
    +
    +    print('Final cost: %g'%cost_function_deep(P, x))
    +
    +    return P
    +
    +## Set up the cost function specified for this Poisson equation:
    +
    +# The right side of the ODE
    +def f(x):
    +    return (3*x + x**2)*np.exp(x)
    +
    +def cost_function_deep(P, x):
    +
    +    # Evaluate the trial function with the current parameters P
    +    g_t = g_trial_deep(x,P)
    +
    +    # Find the derivative w.r.t x of the trial function
    +    d2_g_t = elementwise_grad(elementwise_grad(g_trial_deep,0))(x,P)
    +
    +    right_side = f(x)
    +
    +    err_sqr = (-d2_g_t - right_side)**2
    +    cost_sum = np.sum(err_sqr)
    +
    +    return cost_sum/np.size(err_sqr)
    +
    +# The trial solution:
    +def g_trial_deep(x,P):
    +    return x*(1-x)*deep_neural_network(P,x)
    +
    +# The analytic solution;
    +def g_analytic(x):
    +    return x*(1-x)*np.exp(x)
    +
    +if __name__ == '__main__':
    +    npr.seed(4155)
    +
    +    ## Decide the values of arguments to the function to solve
    +    Nx = 10
    +    x = np.linspace(0,1, Nx)
    +
    +    ## Set up the initial parameters
    +    num_hidden_neurons = [200,100]
    +    num_iter = 1000
    +    lmb = 1e-3
    +
    +    P = solve_ode_deep_neural_network(x, num_hidden_neurons, num_iter, lmb)
    +
    +    g_dnn_ag = g_trial_deep(x,P)
    +    g_analytical = g_analytic(x)
    +
    +    # Find the maximum absolute difference between the solutions:
    +    max_diff = np.max(np.abs(g_dnn_ag - g_analytical))
    +    print("The max absolute difference between the solutions is: %g"%max_diff)
    +
    +    plt.figure(figsize=(10,10))
    +
    +    plt.title('Performance of neural network solving an ODE compared to the analytical solution')
    +    plt.plot(x, g_analytical)
    +    plt.plot(x, g_dnn_ag[0,:])
    +    plt.legend(['analytical','nn'])
    +    plt.xlabel('x')
    +    plt.ylabel('g(x)')
    +    plt.show()
    +
    +
    +
    +
    +
    +
    +

    Comparing with a numerical scheme#

    +

    The Poisson equation can also be solved numerically by using a Taylor expansion to approximate the second derivative.

    +

    Using Taylor series, the second derivative can be expressed as

    +
    +\[ g''(x) = \frac{g(x + \Delta x) - 2g(x) + g(x-\Delta x)}{\Delta x^2} + E_{\Delta x}(x) \]
    +

    where \(\Delta x\) is a small step size and \(E_{\Delta x}(x)\) is the error term.

    +

    Neglecting the error term gives an approximation to the second derivative:

    + +
    +
    +\[ \begin{equation} \label{approx} \tag{15} g''(x) \approx \frac{g(x + \Delta x) - 2g(x) + g(x-\Delta x)}{\Delta x^2} \end{equation} \]
    +

    If \(x_i = i \Delta x = x_{i-1} + \Delta x\) and \(g_i = g(x_i)\) for \(i = 1,\dots N_x - 2\) with \(N_x\) being the number of values for \(x\), (15) becomes

    +
    +\[\begin{split}\begin{aligned}
    g''(x_i) &\approx \frac{g(x_i + \Delta x) - 2g(x_i) + g(x_i -\Delta x)}{\Delta x^2} \\
    &= \frac{g_{i+1} - 2g_i + g_{i-1}}{\Delta x^2}
    \end{aligned}\end{split}\]
    +

    Since we know from our problem that

    +
    +\[\begin{split}\begin{aligned}
    -g''(x) &= f(x) \\
    &= (3x + x^2)\exp(x)
    \end{aligned}\end{split}\]
    +

    along with the conditions \(g(0) = g(1) = 0\), +the following scheme can be used to find an approximate solution for \(g(x)\) numerically:

    + +
    +
    +\[\begin{split}\begin{equation} \begin{aligned}
    -\Big( \frac{g_{i+1} - 2g_i + g_{i-1}}{\Delta x^2} \Big) &= f(x_i) \\
    -g_{i+1} + 2g_i - g_{i-1} &= \Delta x^2 f(x_i)
    \end{aligned} \end{equation} \label{odesys} \tag{16} \end{split}\]
    +

    for \(i = 1, \dots, N_x - 2\) where \(g_0 = g_{N_x - 1} = 0\) and \(f(x_i) = (3x_i + x_i^2)\exp(x_i)\), which is given for our specific problem.

    +

    The equation can be rewritten into a matrix equation:

    +
    +\[\begin{split}\begin{aligned}
    \begin{pmatrix}
    2 & -1 & 0 & \dots & 0 \\
    -1 & 2 & -1 & \dots & 0 \\
    \vdots & & \ddots & & \vdots \\
    0 & \dots & -1 & 2 & -1 \\
    0 & \dots & 0 & -1 & 2
    \end{pmatrix}
    \begin{pmatrix}
    g_1 \\ g_2 \\ \vdots \\ g_{N_x - 3} \\ g_{N_x - 2}
    \end{pmatrix}
    &=
    \Delta x^2
    \begin{pmatrix}
    f(x_1) \\ f(x_2) \\ \vdots \\ f(x_{N_x - 3}) \\ f(x_{N_x - 2})
    \end{pmatrix} \\
    \boldsymbol{A}\boldsymbol{g} &= \boldsymbol{f},
    \end{aligned}\end{split}\]
    +

    which makes it possible to solve for the vector \(\boldsymbol{g}\).

    +
    +
    +

    Setting up the code#

    +

    We can then compare the result from this numerical scheme with the output from our network using Autograd:

    +
    +
    +
    import autograd.numpy as np
    +from autograd import grad, elementwise_grad
    +import autograd.numpy.random as npr
    +from matplotlib import pyplot as plt
    +
    +def sigmoid(z):
    +    return 1/(1 + np.exp(-z))
    +
    +def deep_neural_network(deep_params, x):
    +    # N_hidden is the number of hidden layers
    +    # deep_params is a list, len() should be used
    +    N_hidden = len(deep_params) - 1 # -1 since params consists of
    +                                        # parameters to all the hidden
    +                                        # layers AND the output layer.
    +
    +    # Assumes input x being an one-dimensional array
    +    num_values = np.size(x)
    +    x = x.reshape(-1, num_values)
    +
    +    # Assume that the input layer does nothing to the input x
    +    x_input = x
    +
    +    # Due to multiple hidden layers, define a variable referencing to the
    +    # output of the previous layer:
    +    x_prev = x_input
    +
    +    ## Hidden layers:
    +
    +    for l in range(N_hidden):
    +        # From the list of parameters P; find the correct weights and bias for this layer
    +        w_hidden = deep_params[l]
    +
    +        # Add a row of ones to include bias
    +        x_prev = np.concatenate((np.ones((1,num_values)), x_prev ), axis = 0)
    +
    +        z_hidden = np.matmul(w_hidden, x_prev)
    +        x_hidden = sigmoid(z_hidden)
    +
    +        # Update x_prev such that next layer can use the output from this layer
    +        x_prev = x_hidden
    +
    +    ## Output layer:
    +
    +    # Get the weights and bias for this layer
    +    w_output = deep_params[-1]
    +
    +    # Include bias:
    +    x_prev = np.concatenate((np.ones((1,num_values)), x_prev), axis = 0)
    +
    +    z_output = np.matmul(w_output, x_prev)
    +    x_output = z_output
    +
    +    return x_output
    +
    +
    +def solve_ode_deep_neural_network(x, num_neurons, num_iter, lmb):
    +    # num_hidden_neurons is now a list of number of neurons within each hidden layer
    +
    +    # Find the number of hidden layers:
    +    N_hidden = np.size(num_neurons)
    +
    +    ## Set up initial weights and biases
    +
    +    # Initialize the list of parameters:
    +    P = [None]*(N_hidden + 1) # + 1 to include the output layer
    +
    +    P[0] = npr.randn(num_neurons[0], 2 )
    +    for l in range(1,N_hidden):
    +        P[l] = npr.randn(num_neurons[l], num_neurons[l-1] + 1) # +1 to include bias
    +
    +    # For the output layer
    +    P[-1] = npr.randn(1, num_neurons[-1] + 1 ) # +1 since bias is included
    +
    +    print('Initial cost: %g'%cost_function_deep(P, x))
    +
    +    ## Start finding the optimal weights using gradient descent
    +
    +    # Find the Python function that represents the gradient of the cost function
    +    # w.r.t the 0-th input argument -- that is the weights and biases in the hidden and output layer
    +    cost_function_deep_grad = grad(cost_function_deep,0)
    +
    +    # Let the update be done num_iter times
    +    for i in range(num_iter):
    +        # Evaluate the gradient at the current weights and biases in P.
    +        # The cost_grad consist now of N_hidden + 1 arrays; the gradient w.r.t the weights and biases
    +        # in the hidden layers and output layers evaluated at x.
    +        cost_deep_grad =  cost_function_deep_grad(P, x)
    +
    +        for l in range(N_hidden+1):
    +            P[l] = P[l] - lmb * cost_deep_grad[l]
    +
    +    print('Final cost: %g'%cost_function_deep(P, x))
    +
    +    return P
    +
    +## Set up the cost function specified for this Poisson equation:
    +
    +# The right side of the ODE
    +def f(x):
    +    return (3*x + x**2)*np.exp(x)
    +
    +def cost_function_deep(P, x):
    +
    +    # Evaluate the trial function with the current parameters P
    +    g_t = g_trial_deep(x,P)
    +
    +    # Find the derivative w.r.t x of the trial function
    +    d2_g_t = elementwise_grad(elementwise_grad(g_trial_deep,0))(x,P)
    +
    +    right_side = f(x)
    +
    +    err_sqr = (-d2_g_t - right_side)**2
    +    cost_sum = np.sum(err_sqr)
    +
    +    return cost_sum/np.size(err_sqr)
    +
    +# The trial solution:
    +def g_trial_deep(x,P):
    +    return x*(1-x)*deep_neural_network(P,x)
    +
    +# The analytic solution;
    +def g_analytic(x):
    +    return x*(1-x)*np.exp(x)
    +
    +if __name__ == '__main__':
    +    npr.seed(4155)
    +
    +    ## Decide the values of arguments to the function to solve
    +    Nx = 10
    +    x = np.linspace(0,1, Nx)
    +
    +    ## Set up the initial parameters
    +    num_hidden_neurons = [200,100]
    +    num_iter = 1000
    +    lmb = 1e-3
    +
    +    P = solve_ode_deep_neural_network(x, num_hidden_neurons, num_iter, lmb)
    +
    +    g_dnn_ag = g_trial_deep(x,P)
    +    g_analytical = g_analytic(x)
    +
    +    # Find the maximum absolute difference between the solutions:
    +
    +    plt.figure(figsize=(10,10))
    +
    +    plt.title('Performance of neural network solving an ODE compared to the analytical solution')
    +    plt.plot(x, g_analytical)
    +    plt.plot(x, g_dnn_ag[0,:])
    +    plt.legend(['analytical','nn'])
    +    plt.xlabel('x')
    +    plt.ylabel('g(x)')
    +
    +    ## Perform the computation using the numerical scheme
    +
    +    dx = 1/(Nx - 1)
    +
    +    # Set up the matrix A
    +    A = np.zeros((Nx-2,Nx-2))
    +
    +    A[0,0] = 2
    +    A[0,1] = -1
    +
    +    for i in range(1,Nx-3):
    +        A[i,i-1] = -1
    +        A[i,i] = 2
    +        A[i,i+1] = -1
    +
    +    A[Nx - 3, Nx - 4] = -1
    +    A[Nx - 3, Nx - 3] = 2
    +
    +    # Set up the vector f
    +    f_vec = dx**2 * f(x[1:-1])
    +
    +    # Solve the equation
    +    g_res = np.linalg.solve(A,f_vec)
    +
    +    g_vec = np.zeros(Nx)
    +    g_vec[1:-1] = g_res
    +
    +    # Print the differences between each method
    +    max_diff1 = np.max(np.abs(g_dnn_ag - g_analytical))
    +    max_diff2 = np.max(np.abs(g_vec - g_analytical))
    +    print("The max absolute difference between the analytical solution and DNN Autograd: %g"%max_diff1)
    +    print("The max absolute difference between the analytical solution and numerical scheme: %g"%max_diff2)
    +
    +    # Plot the results
    +    plt.figure(figsize=(10,10))
    +
    +    plt.plot(x,g_vec)
    +    plt.plot(x,g_analytical)
    +    plt.plot(x,g_dnn_ag[0,:])
    +
    +    plt.legend(['numerical scheme','analytical','dnn'])
    +    plt.show()
    +
    +
    +
    +
    +
    +
    +

    Partial Differential Equations#

    +

    A partial differential equation (PDE) is an equation whose solution is a function of several variables. The equation may involve derivatives of the function with respect to any combination of these variables.

    +

    In general, a partial differential equation for a function \(g(x_1,\dots,x_N)\) with \(N\) variables may be expressed as

    + +
    +
    +\[ \begin{equation} \label{PDE} \tag{17} f\left(x_1, \, \dots \, , x_N, \frac{\partial g(x_1,\dots,x_N) }{\partial x_1}, \dots , \frac{\partial g(x_1,\dots,x_N) }{\partial x_N}, \frac{\partial^2 g(x_1,\dots,x_N) }{\partial x_1\partial x_2}, \, \dots \, , \frac{\partial^n g(x_1,\dots,x_N) }{\partial x_N^n} \right) = 0 \end{equation} \]
    +

    where \(f\) is an expression involving all kinds of possible mixed derivatives of \(g(x_1,\dots,x_N)\) up to an order \(n\). In order for the solution to be unique, some additional conditions must also be given.

    +
    +
    +

    Type of problem#

    +

    The problem our network must solve is similar to the ODE case: we must have a trial solution \(g_t\) at hand.

    +

    For instance, the trial solution could be expressed as

    +
    +\[ \begin{align*} g_t(x_1,\dots,x_N) = h_1(x_1,\dots,x_N) + h_2(x_1,\dots,x_N,N(x_1,\dots,x_N,P)) \end{align*} \]
    +

    where \(h_1(x_1,\dots,x_N)\) is a function that ensures \(g_t(x_1,\dots,x_N)\) satisfies some given conditions. +The neural network \(N(x_1,\dots,x_N,P)\) has weights and biases described by \(P\) and \(h_2(x_1,\dots,x_N,N(x_1,\dots,x_N,P))\) is an expression using the output from the neural network in some way.

    +

    The role of the function \(h_2(x_1,\dots,x_N,N(x_1,\dots,x_N,P))\) is to ensure that the contribution from the neural network vanishes when \(g_t(x_1,\dots,x_N)\) is evaluated at the values of \(x_1,\dots,x_N\) where the given conditions must be satisfied. The function \(h_1(x_1,\dots,x_N)\) should alone make \(g_t(x_1,\dots,x_N)\) satisfy the conditions.

    +
    +
    +

    Network requirements#

    +

    The network then tries to minimize the cost function following the same ideas as described for the ODE case, but now with more than one variable to consider. The concept remains the same: find a set of parameters \(P\) such that the expression \(f\) in (17) is as close to zero as possible.

    +

    As in the ODE case, the cost function is the mean squared error that the network must try to minimize. The cost function for the network to minimize is

    +
    +\[ C\left(x_1, \dots, x_N, P\right) = \left( f\left(x_1, \, \dots \, , x_N, \frac{\partial g(x_1,\dots,x_N) }{\partial x_1}, \dots , \frac{\partial g(x_1,\dots,x_N) }{\partial x_N}, \frac{\partial^2 g(x_1,\dots,x_N) }{\partial x_1\partial x_2}, \, \dots \, , \frac{\partial^n g(x_1,\dots,x_N) }{\partial x_N^n} \right) \right)^2 \]
    +
    +
    +

    More details#

    +

    If we let \(\boldsymbol{x} = \big( x_1, \dots, x_N \big)\) be an array containing the values for \(x_1, \dots, x_N\) respectively, the cost function can be reformulated into the following:

    +
    +\[ C\left(\boldsymbol{x}, P\right) = \left( f\left( \boldsymbol{x}, \frac{\partial g(\boldsymbol{x}) }{\partial x_1}, \dots , \frac{\partial g(\boldsymbol{x}) }{\partial x_N}, \frac{\partial^2 g(\boldsymbol{x}) }{\partial x_1\partial x_2}, \, \dots \, , \frac{\partial^n g(\boldsymbol{x}) }{\partial x_N^n} \right) \right)^2 \]
    +

    If we also have \(M\) different sets of values for \(x_1, \dots, x_N\), that is \(\boldsymbol{x}_i = \big(x_1^{(i)}, \dots, x_N^{(i)}\big)\) for \(i = 1,\dots,M\) being the rows in matrix \(X\), the cost function can be generalized into

    +
    +\[ C\left(X, P \right) = \sum_{i=1}^M \left( f\left( \boldsymbol{x}_i, \frac{\partial g(\boldsymbol{x}_i) }{\partial x_1}, \dots , \frac{\partial g(\boldsymbol{x}_i) }{\partial x_N}, \frac{\partial^2 g(\boldsymbol{x}_i) }{\partial x_1\partial x_2}, \, \dots \, , \frac{\partial^n g(\boldsymbol{x}_i) }{\partial x_N^n} \right) \right)^2. \]
    +
    +
    +

    Example: The diffusion equation#

    +

    In one spatial dimension, the equation reads

    +
    +\[ \frac{\partial g(x,t)}{\partial t} = \frac{\partial^2 g(x,t)}{\partial x^2} \]
    +

    where a possible choice of conditions are

    +
    +\[\begin{split}\begin{align*}
    g(0,t) &= 0 ,\qquad t \geq 0 \\
    g(1,t) &= 0, \qquad t \geq 0 \\
    g(x,0) &= u(x),\qquad x\in [0,1]
    \end{align*}\end{split}\]
    +

    with \(u(x)\) being some given function.

    +
    +
    +

    Defining the problem#

    +

    For this case, we want to find \(g(x,t)\) such that

    + +
    +
    +\[ \begin{equation} \frac{\partial g(x,t)}{\partial t} = \frac{\partial^2 g(x,t)}{\partial x^2} \end{equation} \label{diffonedim} \tag{18} \]
    +

    and

    +
    +\[\begin{split}\begin{align*}
    g(0,t) &= 0 ,\qquad t \geq 0 \\
    g(1,t) &= 0, \qquad t \geq 0 \\
    g(x,0) &= u(x),\qquad x\in [0,1]
    \end{align*}\end{split}\]
    +

    with \(u(x) = \sin(\pi x)\).

    +

    Let us first set up the deep neural network. It will follow the same structure as discussed in the examples solving the ODEs. We first look into how Autograd can be used in a network tailored to solve for bivariate functions.

    +
    +
    +

    Setting up the network using Autograd#

    +

    The only change needed here is to extend our network so that functions of multiple variables are handled correctly. In this case the function we solve for has two variables, time \(t\) and position \(x\). Each point will be represented by a one-dimensional array in the program. The program will evaluate the network at each possible pair \((x,t)\), given arrays of the desired \(x\)-values and \(t\)-values at which to approximate the solution.

    +
    +
    +
    def sigmoid(z):
    +    return 1/(1 + np.exp(-z))
    +
    +def deep_neural_network(deep_params, x):
    +    # x is now a point and a 1D numpy array; make it a column vector
    +    num_coordinates = np.size(x,0)
    +    x = x.reshape(num_coordinates,-1)
    +
    +    num_points = np.size(x,1)
    +
    +    # N_hidden is the number of hidden layers
    +    N_hidden = len(deep_params) - 1 # -1 since params consist of parameters to all the hidden layers AND the output layer
    +
    +    # Assume that the input layer does nothing to the input x
    +    x_input = x
    +    x_prev = x_input
    +
    +    ## Hidden layers:
    +
    +    for l in range(N_hidden):
    +        # From the list of parameters P; find the correct weights and bias for this layer
    +        w_hidden = deep_params[l]
    +
    +        # Add a row of ones to include bias
    +        x_prev = np.concatenate((np.ones((1,num_points)), x_prev ), axis = 0)
    +
    +        z_hidden = np.matmul(w_hidden, x_prev)
    +        x_hidden = sigmoid(z_hidden)
    +
    +        # Update x_prev such that next layer can use the output from this layer
    +        x_prev = x_hidden
    +
    +    ## Output layer:
    +
    +    # Get the weights and bias for this layer
    +    w_output = deep_params[-1]
    +
    +    # Include bias:
    +    x_prev = np.concatenate((np.ones((1,num_points)), x_prev), axis = 0)
    +
    +    z_output = np.matmul(w_output, x_prev)
    +    x_output = z_output
    +
    +    return x_output[0][0]
    +
    +
    +
    +
    +
    +
    +

    Setting up the network using Autograd; The trial solution#

    +

    The cost function must then iterate through the given arrays of \(x\)- and \(t\)-values, form a point \((x,t)\) at which the deep neural network and the trial solution are evaluated, and then compute the Jacobian of the trial solution.

    +

    A possible trial solution for this PDE is

    +
    +\[ g_t(x,t) = h_1(x,t) + x(1-x)tN(x,t,P) \]
    +

    with \(h_1(x,t)\) being a function ensuring that \(g_t(x,t)\) satisfies our given conditions, and \(N(x,t,P)\) being the output from the deep neural network using weights and biases for each layer from \(P\).

    +

    To fulfill the conditions, \(h_1(x,t)\) could be:

    +
    +\[ h_1(x,t) = (1-t)\Big(u(x) - \big((1-x)u(0) + x u(1)\big)\Big) = (1-t)u(x) = (1-t)\sin(\pi x) \]
    +

    since \(u(0) = u(1) = 0\) and \(u(x) = \sin(\pi x)\).

    +
    +
    +

    Why the Jacobian?#

    +

    The Jacobian is used because the program must find the derivative of +the trial solution with respect to \(x\) and \(t\).

    +

    This makes it necessary to compute the Jacobian matrix, as we want to evaluate the gradient with respect to \(x\) and \(t\) (note that the Jacobian of a scalar-valued multivariate function is simply its gradient).

    +

    In Autograd, the differentiation is by default done with respect to the first input argument of your Python function. Since the point is an array containing the values of \(x\) and \(t\), the Jacobian is calculated with respect to both \(x\) and \(t\).

    +

    To find the second derivatives with respect to \(x\) and \(t\), the Jacobian can be applied a second time. The result is the Hessian matrix, which contains all possible second-order (mixed) derivatives of \(g(x,t)\).
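    To make the bookkeeping concrete, here is a minimal sketch of how jacobian and hessian from Autograd return the quantities we need, \(\partial g/\partial t\) and \(\partial^2 g/\partial x^2\), at a single point \((x,t)\). The test function used here is just an arbitrary smooth function of the point, not the trial solution itself.

    import autograd.numpy as np
    from autograd import jacobian, hessian

    # An arbitrary scalar-valued test function of a point (x, t)
    def g(point):
        x, t = point
        return np.sin(np.pi*x)*t**2

    point = np.array([0.3, 0.1])

    g_jac = jacobian(g)(point)    # gradient: [dg/dx, dg/dt]
    g_hes = hessian(g)(point)     # 2 x 2 matrix of second derivatives

    dg_dt = g_jac[1]              # first derivative with respect to t
    d2g_dx2 = g_hes[0][0]         # second derivative with respect to x
    print(dg_dt, d2g_dx2)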

    +
    +
    +
    # Set up the trial function:
    +def u(x):
    +    return np.sin(np.pi*x)
    +
    +def g_trial(point,P):
    +    x,t = point
    +    return (1-t)*u(x) + x*(1-x)*t*deep_neural_network(P,point)
    +
    +# The right side of the ODE:
    +def f(point):
    +    return 0.
    +
    +# The cost function:
    +def cost_function(P, x, t):
    +    cost_sum = 0
    +
    +    g_t_jacobian_func = jacobian(g_trial)
    +    g_t_hessian_func = hessian(g_trial)
    +
    +    for x_ in x:
    +        for t_ in t:
    +            point = np.array([x_,t_])
    +
    +            g_t = g_trial(point,P)
    +            g_t_jacobian = g_t_jacobian_func(point,P)
    +            g_t_hessian = g_t_hessian_func(point,P)
    +
    +            g_t_dt = g_t_jacobian[1]
    +            g_t_d2x = g_t_hessian[0][0]
    +
    +            func = f(point)
    +
    +            err_sqr = ( (g_t_dt - g_t_d2x) - func)**2
    +            cost_sum += err_sqr
    +
    +    return cost_sum
    +
    +
    +
    +
    +
    +
    +

    Setting up the network using Autograd; The full program#

    +

    Having set up the network, along with the trial solution and cost function, we can now see how the deep neural network performs by comparing the results to the analytical solution.

    +

    The analytical solution of our problem is

    +
    +\[ g(x,t) = \exp(-\pi^2 t)\sin(\pi x) \]
    +

    A possible way to implement a neural network solving the PDE is given below. Be aware, though, that it is fairly slow for the parameters used. A better result is possible, but requires more iterations, and thus a longer time to complete.

    +

    Indeed, the program below is not optimal in its implementation, but rather serves as an example of how to implement and use a neural network to solve a PDE. Using TensorFlow results in a much better execution time. Try it!

    +
    +
    +
    import autograd.numpy as np
    +from autograd import jacobian,hessian,grad
    +import autograd.numpy.random as npr
    +from matplotlib import cm
    +from matplotlib import pyplot as plt
    +from mpl_toolkits.mplot3d import axes3d
    +
    +## Set up the network
    +
    +def sigmoid(z):
    +    return 1/(1 + np.exp(-z))
    +
    +def deep_neural_network(deep_params, x):
    +    # x is now a point and a 1D numpy array; make it a column vector
    +    num_coordinates = np.size(x,0)
    +    x = x.reshape(num_coordinates,-1)
    +
    +    num_points = np.size(x,1)
    +
    +    # N_hidden is the number of hidden layers
    +    N_hidden = len(deep_params) - 1 # -1 since params consist of parameters to all the hidden layers AND the output layer
    +
    +    # Assume that the input layer does nothing to the input x
    +    x_input = x
    +    x_prev = x_input
    +
    +    ## Hidden layers:
    +
    +    for l in range(N_hidden):
    +        # From the list of parameters P; find the correct weights and bias for this layer
    +        w_hidden = deep_params[l]
    +
    +        # Add a row of ones to include bias
    +        x_prev = np.concatenate((np.ones((1,num_points)), x_prev ), axis = 0)
    +
    +        z_hidden = np.matmul(w_hidden, x_prev)
    +        x_hidden = sigmoid(z_hidden)
    +
    +        # Update x_prev such that next layer can use the output from this layer
    +        x_prev = x_hidden
    +
    +    ## Output layer:
    +
    +    # Get the weights and bias for this layer
    +    w_output = deep_params[-1]
    +
    +    # Include bias:
    +    x_prev = np.concatenate((np.ones((1,num_points)), x_prev), axis = 0)
    +
    +    z_output = np.matmul(w_output, x_prev)
    +    x_output = z_output
    +
    +    return x_output[0][0]
    +
    +## Define the trial solution and cost function
    +def u(x):
    +    return np.sin(np.pi*x)
    +
    +def g_trial(point,P):
    +    x,t = point
    +    return (1-t)*u(x) + x*(1-x)*t*deep_neural_network(P,point)
    +
    +# The right side of the ODE:
    +def f(point):
    +    return 0.
    +
    +# The cost function:
    +def cost_function(P, x, t):
    +    cost_sum = 0
    +
    +    g_t_jacobian_func = jacobian(g_trial)
    +    g_t_hessian_func = hessian(g_trial)
    +
    +    for x_ in x:
    +        for t_ in t:
    +            point = np.array([x_,t_])
    +
    +            g_t = g_trial(point,P)
    +            g_t_jacobian = g_t_jacobian_func(point,P)
    +            g_t_hessian = g_t_hessian_func(point,P)
    +
    +            g_t_dt = g_t_jacobian[1]
    +            g_t_d2x = g_t_hessian[0][0]
    +
    +            func = f(point)
    +
    +            err_sqr = ( (g_t_dt - g_t_d2x) - func)**2
    +            cost_sum += err_sqr
    +
    +    return cost_sum /( np.size(x)*np.size(t) )
    +
    +## For comparison, define the analytical solution
    +def g_analytic(point):
    +    x,t = point
    +    return np.exp(-np.pi**2*t)*np.sin(np.pi*x)
    +
    +## Set up a function for training the network to solve for the equation
    +def solve_pde_deep_neural_network(x,t, num_neurons, num_iter, lmb):
    +    ## Set up initial weights and biases
    +    N_hidden = np.size(num_neurons)
    +
    +
    +    # Initialize the list of parameters:
    +    P = [None]*(N_hidden + 1) # + 1 to include the output layer
    +
    +    P[0] = npr.randn(num_neurons[0], 2 + 1 ) # 2 since we have two points, +1 to include bias
    +    for l in range(1,N_hidden):
    +        P[l] = npr.randn(num_neurons[l], num_neurons[l-1] + 1) # +1 to include bias
    +
    +    # For the output layer
    +    P[-1] = npr.randn(1, num_neurons[-1] + 1 ) # +1 since bias is included
    +
    +    print('Initial cost: ',cost_function(P, x, t))
    +
    +    cost_function_grad = grad(cost_function,0)
    +
    +    # Let the update be done num_iter times
    +    for i in range(num_iter):
    +        cost_grad =  cost_function_grad(P, x , t)
    +
    +        for l in range(N_hidden+1):
    +            P[l] = P[l] - lmb * cost_grad[l]
    +
    +    print('Final cost: ',cost_function(P, x, t))
    +
    +    return P
    +
    +if __name__ == '__main__':
    +    ### Use the neural network:
    +    npr.seed(15)
    +
    +    ## Decide the values of arguments to the function to solve
    +    Nx = 10; Nt = 10
    +    x = np.linspace(0, 1, Nx)
    +    t = np.linspace(0,1,Nt)
    +
    +    ## Set up the parameters for the network
    +    num_hidden_neurons = [100, 25]
    +    num_iter = 250
    +    lmb = 0.01
    +
    +    P = solve_pde_deep_neural_network(x,t, num_hidden_neurons, num_iter, lmb)
    +
    +    ## Store the results
    +    g_dnn_ag = np.zeros((Nx, Nt))
    +    G_analytical = np.zeros((Nx, Nt))
    +    for i,x_ in enumerate(x):
    +        for j, t_ in enumerate(t):
    +            point = np.array([x_, t_])
    +            g_dnn_ag[i,j] = g_trial(point,P)
    +
    +            G_analytical[i,j] = g_analytic(point)
    +
    +    # Find the max difference between the analytical and the computed solution
    +    diff_ag = np.abs(g_dnn_ag - G_analytical)
    +    print('Max absolute difference between the analytical solution and the network: %g'%np.max(diff_ag))
    +
    +    ## Plot the solutions in two dimensions, that being in position and time
    +
    +    T,X = np.meshgrid(t,x)
    +
    +    fig = plt.figure(figsize=(10,10))
    +    ax = fig.add_subplot(projection='3d')
    +    ax.set_title('Solution from the deep neural network w/ %d hidden layers'%len(num_hidden_neurons))
    +    s = ax.plot_surface(T,X,g_dnn_ag,linewidth=0,antialiased=False,cmap=cm.viridis)
    +    ax.set_xlabel('Time $t$')
    +    ax.set_ylabel('Position $x$');
    +
    +
    +    fig = plt.figure(figsize=(10,10))
    +    ax = fig.add_subplot(projection='3d')
    +    ax.set_title('Analytical solution')
    +    s = ax.plot_surface(T,X,G_analytical,linewidth=0,antialiased=False,cmap=cm.viridis)
    +    ax.set_xlabel('Time $t$')
    +    ax.set_ylabel('Position $x$');
    +
    +    fig = plt.figure(figsize=(10,10))
    +    ax = fig.add_subplot(projection='3d')
    +    ax.set_title('Difference')
    +    s = ax.plot_surface(T,X,diff_ag,linewidth=0,antialiased=False,cmap=cm.viridis)
    +    ax.set_xlabel('Time $t$')
    +    ax.set_ylabel('Position $x$');
    +
    +    ## Take some slices of the 3D plots just to see the solutions at particular times
    +    indx1 = 0
    +    indx2 = int(Nt/2)
    +    indx3 = Nt-1
    +
    +    t1 = t[indx1]
    +    t2 = t[indx2]
    +    t3 = t[indx3]
    +
    +    # Slice the results from the DNN
    +    res1 = g_dnn_ag[:,indx1]
    +    res2 = g_dnn_ag[:,indx2]
    +    res3 = g_dnn_ag[:,indx3]
    +
    +    # Slice the analytical results
    +    res_analytical1 = G_analytical[:,indx1]
    +    res_analytical2 = G_analytical[:,indx2]
    +    res_analytical3 = G_analytical[:,indx3]
    +
    +    # Plot the slices
    +    plt.figure(figsize=(10,10))
    +    plt.title("Computed solutions at time = %g"%t1)
    +    plt.plot(x, res1)
    +    plt.plot(x,res_analytical1)
    +    plt.legend(['dnn','analytical'])
    +
    +    plt.figure(figsize=(10,10))
    +    plt.title("Computed solutions at time = %g"%t2)
    +    plt.plot(x, res2)
    +    plt.plot(x,res_analytical2)
    +    plt.legend(['dnn','analytical'])
    +
    +    plt.figure(figsize=(10,10))
    +    plt.title("Computed solutions at time = %g"%t3)
    +    plt.plot(x, res3)
    +    plt.plot(x,res_analytical3)
    +    plt.legend(['dnn','analytical'])
    +
    +    plt.show()
    +
    +
    +
    +
    +
    +
    +

    Resources on differential equations and deep learning#

    +
    1. Artificial neural networks for solving ordinary and partial differential equations by I.E. Lagaris et al.
    2. Neural networks for solving differential equations by A. Honchar
    3. Solving differential equations using neural networks by M.M. Chiaramonte and M. Kiener
    4. Introduction to Partial Differential Equations by A. Tveito and R. Winther
    +
    +
    +

    Convolutional Neural Networks (recognizing images)#

    +

    Convolutional neural networks (CNNs) were developed during the last decade of the previous century, with a focus on character recognition tasks. Nowadays, CNNs are a central element in the spectacular success of deep learning methods. Their success in, for example, image classification has made them a central tool for most machine learning practitioners.

    +

    CNNs are very similar to ordinary Neural Networks. +They are made up of neurons that have learnable weights and +biases. Each neuron receives some inputs, performs a dot product and +optionally follows it with a non-linearity. The whole network still +expresses a single differentiable score function: from the raw image +pixels on one end to class scores at the other. And they still have a +loss function (for example Softmax) on the last (fully-connected) layer +and all the tips/tricks we developed for learning regular Neural +Networks still apply (back propagation, gradient descent etc etc).

    +
    +
    +

    What is the Difference#

    +

    CNN architectures make the explicit assumption that +the inputs are images, which allows us to encode certain properties +into the architecture. These then make the forward function more +efficient to implement and vastly reduce the amount of parameters in +the network.

    +
    +
    +

    Neural Networks vs CNNs#

    +

    Neural networks are defined as affine transformations, that is, a vector is received as input and is multiplied with a matrix of so-called weights (our unknown parameters) to produce an output (to which a bias vector is usually added before passing the result through a nonlinear activation function). This is applicable to any type of input, be it an image, a sound clip or an unordered collection of features: whatever their dimensionality, their representation can always be flattened into a vector before the transformation.
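    As an illustration of this statement, here is a minimal sketch (with made-up dimensions and random numbers) of flattening an image and applying a single affine layer followed by a nonlinearity:

    import numpy as np

    # A small "image": 32 x 32 pixels with 3 color channels
    image = np.random.rand(32, 32, 3)

    # Flatten the image into a vector of length 32*32*3 = 3072
    x = image.reshape(-1)

    # One affine (fully-connected) layer with 10 outputs
    W = np.random.randn(10, x.size)    # weight matrix (the unknown parameters)
    b = np.random.randn(10)            # bias vector

    z = W @ x + b                      # affine transformation
    a = 1/(1 + np.exp(-z))             # nonlinear activation (sigmoid)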

    +
    +
    +

    Why CNNs for images, sound files, medical images from CT scans etc?#

    +

    However, when we consider images, sound clips and many other similar kinds of data, these data have an intrinsic +structure. More formally, they share these important properties:

    +
    • They are stored as multi-dimensional arrays (think of the pixels of a figure).
    • They feature one or more axes for which ordering matters (e.g., width and height axes for an image, the time axis for a sound clip).
    • One axis, called the channel axis, is used to access different views of the data (e.g., the red, green and blue channels of a color image, or the left and right channels of a stereo audio track).
    +

    These properties are not exploited when an affine transformation is applied; in +fact, all the axes are treated in the same way and the topological information +is not taken into account. Still, taking advantage of the implicit structure of +the data may prove very handy in solving some tasks, like computer vision and +speech recognition, and in these cases it would be best to preserve it. This is +where discrete convolutions come into play.

    +

    A discrete convolution is a linear transformation that preserves this notion of +ordering. It is sparse (only a few input units contribute to a given output +unit) and reuses parameters (the same weights are applied to multiple locations +in the input).

    +
    +
    +

    Regular NNs don’t scale well to full images#

    +

    As an example, consider +an image of size \(32\times 32\times 3\) (32 wide, 32 high, 3 color channels), so a +single fully-connected neuron in a first hidden layer of a regular +Neural Network would have \(32\times 32\times 3 = 3072\) weights. This amount still +seems manageable, but clearly this fully-connected structure does not +scale to larger images. For example, an image of more respectable +size, say \(200\times 200\times 3\), would lead to neurons that have +\(200\times 200\times 3 = 120,000\) weights.

    +

    We could have +several such neurons, and the parameters would add up quickly! Clearly, +this full connectivity is wasteful and the huge number of parameters +would quickly lead to possible overfitting.

    + + +

    Figure 1: A regular 3-layer Neural Network.

    +
    +
    +

    3D volumes of neurons#

    +

    Convolutional Neural Networks take advantage of the fact that the +input consists of images and they constrain the architecture in a more +sensible way.

    +

    In particular, unlike a regular Neural Network, the +layers of a CNN have neurons arranged in 3 dimensions: width, +height, depth. (Note that the word depth here refers to the third +dimension of an activation volume, not to the depth of a full Neural +Network, which can refer to the total number of layers in a network.)

    +

    To understand it better, the above example of an image +with an input volume of +activations has dimensions \(32\times 32\times 3\) (width, height, +depth respectively).

    +

    The neurons in a layer will +only be connected to a small region of the layer before it, instead of +all of the neurons in a fully-connected manner. Moreover, the final +output layer could for this specific image have dimensions \(1\times 1 \times 10\), +because by the +end of the CNN architecture we will reduce the full image into a +single vector of class scores, arranged along the depth +dimension.

    + + +

    Figure 1: A CNN arranges its neurons in three dimensions (width, height, depth), as visualized in one of the layers. Every layer of a CNN transforms the 3D input volume to a 3D output volume of neuron activations. In this example, the red input layer holds the image, so its width and height would be the dimensions of the image, and the depth would be 3 (Red, Green, Blue channels).

    +
    +
    +

    More on Dimensionalities#

    +

    In fields like signal processing (and imaging as well), one designs +so-called filters. These filters are defined by the convolutions and +are often hand-crafted. One may specify filters for smoothing, edge +detection, frequency reshaping, and similar operations. However with +neural networks the idea is to automatically learn the filters and use +many of them in conjunction with non-linear operations (activation +functions).

    +

As an example, consider a neural network operating on sound sequence data. Assume that we have an input vector \(\boldsymbol{x}\) of length \(d=10^6\). We then construct a neural network with one hidden layer with \(10^4\) nodes. This means that we will have a weight matrix with \(10^4\times 10^6=10^{10}\) weights to be determined, together with \(10^4\) biases.

    +

Assume furthermore that we have an output layer which is meant to predict whether the sound sequence represents a human voice (true) or something else (false). This means that we have only one output node. But since this output node connects to the \(10^4\) nodes in the hidden layer, there are in total \(10^4\) weights to be determined for the output layer, plus one bias. In total we have

    +
    +\[ +\mathrm{NumberParameters}=10^{10}+10^4+10^4+1 \approx 10^{10}, +\]
    +

    that is ten billion parameters to determine.
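A quick numerical check of this count (a minimal sketch; the layer sizes are exactly those assumed above):

# Rough parameter count for the fully connected network sketched above:
# input of length 10**6, one hidden layer with 10**4 nodes, one output node.
d, hidden, output = 10**6, 10**4, 1
weights_hidden = hidden * d          # weight matrix input -> hidden
biases_hidden = hidden               # one bias per hidden node
weights_output = output * hidden     # weights hidden -> output
biases_output = output               # one bias for the output node
total = weights_hidden + biases_hidden + weights_output + biases_output
print(f"{total:.2e} parameters")     # approximately 10**10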

    +
    +
    +

    Further remarks#

    +

The main principles that justify convolutions are locality of information and repetition of patterns within the signal. Sound samples of the input in adjacent spots are much more likely to affect each other than those that are very far away. Similarly, sounds are repeated multiple times in the signal. While slightly simplistic, reasoning about such a sound example demonstrates this. The same principles then apply to images and other similar data.

    +
    +
    +

    Layers used to build CNNs#

    +

    A simple CNN is a sequence of layers, and every layer of a CNN +transforms one volume of activations to another through a +differentiable function. We use three main types of layers to build +CNN architectures: Convolutional Layer, Pooling Layer, and +Fully-Connected Layer (exactly as seen in regular Neural Networks). We +will stack these layers to form a full CNN architecture.

    +

A simple CNN for image classification could have the following architecture (a minimal code sketch is given right after this list):

• INPUT (\(32\times 32 \times 3\)) will hold the raw pixel values of the image, in this case an image of width 32, height 32, and with three color channels R,G,B.

• CONV (convolutional) layer will compute the output of neurons that are connected to local regions in the input, each computing a dot product between their weights and a small region they are connected to in the input volume. This may result in a volume such as \([32\times 32\times 12]\) if we decide to use 12 filters.

• RELU layer will apply an elementwise activation function, such as the \(max(0,x)\) thresholding at zero. This leaves the size of the volume unchanged (\([32\times 32\times 12]\)).

• POOL (pooling) layer will perform a downsampling operation along the spatial dimensions (width, height), resulting in a volume such as \([16\times 16\times 12]\).

• FC (i.e. fully-connected) layer will compute the class scores, resulting in a volume of size \([1\times 1\times 10]\), where each of the 10 numbers corresponds to a class score, such as among the 10 categories of the MNIST images we considered above. As with ordinary Neural Networks and as the name implies, each neuron in this layer will be connected to all the numbers in the previous volume.
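As a concrete illustration of the list above, here is a minimal PyTorch sketch of such an INPUT-CONV-RELU-POOL-FC stack. The \(5\times 5\) kernel size and the padding are choices made for this illustration only (the list fixes the number of filters and the volume sizes, not the kernel size); building CNNs with TensorFlow/Keras and PyTorch is discussed in more detail next week.

import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(in_channels=3, out_channels=12, kernel_size=5, padding=2),  # CONV: 32x32x3 -> 32x32x12
    nn.ReLU(),                                                            # RELU: elementwise, size unchanged
    nn.MaxPool2d(kernel_size=2),                                          # POOL: 32x32x12 -> 16x16x12
    nn.Flatten(),                                                         # flatten to a vector of length 16*16*12
    nn.Linear(16 * 16 * 12, 10),                                          # FC: class scores for 10 categories
)

x = torch.randn(1, 3, 32, 32)   # one fake RGB image of size 32x32
print(model(x).shape)           # torch.Size([1, 10])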
    +
    +

    Transforming images#

    +

    CNNs transform the original image layer by layer from the original +pixel values to the final class scores.

    +

    Observe that some layers contain +parameters and other don’t. In particular, the CNN layers perform +transformations that are a function of not only the activations in the +input volume, but also of the parameters (the weights and biases of +the neurons). On the other hand, the RELU/POOL layers will implement a +fixed function. The parameters in the CONV/FC layers will be trained +with gradient descent so that the class scores that the CNN computes +are consistent with the labels in the training set for each image.

    +
    +
    +

    CNNs in brief#

    +

    In summary:

    +
      +
• A CNN architecture is in the simplest case a list of Layers that transform the image volume into an output volume (e.g. holding the class scores)

• There are a few distinct types of Layers (e.g. CONV/FC/RELU/POOL are by far the most popular)

• Each Layer accepts an input 3D volume and transforms it to an output 3D volume through a differentiable function

• Each Layer may or may not have parameters (e.g. CONV/FC do, RELU/POOL don’t)

• Each Layer may or may not have additional hyperparameters (e.g. CONV/FC/POOL do, RELU doesn’t)
    +
    +

    A deep CNN model (From Raschka et al)#

    + + +

    Figure 1: A deep CNN

    +
    +
    +

    Key Idea#

    +

A dense neural network is represented by an affine operation (like matrix-matrix multiplication) where all parameters are included.

    +

The key idea in CNNs for, say, imaging is that in images neighboring pixels tend to be related! So we connect each neuron in the first hidden layer only to a small neighborhood of input neurons, instead of connecting it to all of the inputs.

    +

    We say we perform a filtering (convolution is the mathematical operation).

    +
    +
    +

    How to do image compression before the era of deep learning#

    +

    The singular-value decomposition (SVD) algorithm has been for decades one of the standard ways of compressing images. +The lectures on the SVD give many of the essential details concerning the SVD.

    +

The orthogonal vectors obtained from the SVD can be used to project down the dimensionality of a given image. In the example here we gray-scale an image and downsize it.

    +

This recipe relies on us being able to actually perform the SVD. For large images, and in particular with many images to reconstruct, using the SVD may quickly become an overwhelming task. With the advent of efficient deep learning methods like CNNs and, later, generative methods, these have in recent years become the premier way of performing image analysis, in particular for classification problems with labelled images.

    +
    +
    +

    The SVD example#

    +
    +
    +
from matplotlib.image import imread
import matplotlib.pyplot as plt
import numpy as np
import os
from math import log10, sqrt
plt.rcParams['figure.figsize'] = [16, 8]
# Import image and convert RGB to grayscale
A = imread(os.path.join("figslides/photo1.jpg"))
X = A.dot([0.299, 0.5870, 0.114])
img = plt.imshow(X)
img.set_cmap('gray')
plt.axis('off')
plt.show()
# Print the image size
print('Image size: %s' % str(X.shape))

# Split the matrix into U, S, VT
U, S, VT = np.linalg.svd(X, full_matrices=False)
S = np.diag(S)
m, n = X.shape  # image height and width
j = 0
# Try compression with different numbers k of singular vectors (these represent projections):
for k in (5, 10, 20, 100, 200, 400, 500):
    # Original size of the image
    originalSize = m * n
    # Size after compression: k singular values plus k columns of U and k rows of VT
    compressedSize = k * (1 + m + n)
    # The rank-k approximation of the original image
    Xapprox = U[:, :k] @ S[:k, :k] @ VT[:k, :]
    plt.figure(j + 1)
    j += 1
    img = plt.imshow(Xapprox)
    img.set_cmap('gray')
    plt.axis('off')
    plt.title('k = ' + str(k))
    plt.show()
    print('Original size of image:')
    print(originalSize)
    ratio = compressedSize * 1.0 / originalSize
    print('Compression ratio as compressed size / original size:')
    print(ratio)
    print('Compression ratio is ' + str(round(ratio * 100, 2)) + '%')
    # Estimate the mean-square error (MSE)
    x = X.astype("float")
    y = Xapprox.astype("float")
    err = np.sum((x - y) ** 2) / float(m * n)
    print('The mean-square error is ' + str(round(err)))
    max_pixel = 255.0
    # Estimate the peak signal-to-noise ratio (PSNR)
    psnr = 20 * log10(max_pixel / sqrt(err))
    print('Signal-to-noise ratio: ' + str(round(psnr)) + ' dB')
    +
    +
    +
    +
    +
    +
    +

    Mathematics of CNNs#

    +

The mathematics of CNNs is based on the mathematical operation of convolution. In mathematics (in particular in functional analysis), convolution is represented by mathematical operations (integration, summation etc.) on two functions in order to produce a third function that expresses how the shape of one gets modified by the other. Convolution has a plethora of applications in a variety of disciplines, spanning from statistics to signal processing, computer vision, solutions of differential equations, linear algebra, engineering, and yes, machine learning.

    +

    Mathematically, convolution is defined as follows (one-dimensional example): +Let us define a continuous function \(y(t)\) given by

    +
    +\[ +y(t) = \int x(a) w(t-a) da, +\]
    +

    where \(x(a)\) represents a so-called input and \(w(t-a)\) is normally called the weight function or kernel.

    +

    The above integral is written in a more compact form as

    +
    +\[ +y(t) = \left(x * w\right)(t). +\]
    +

    The discretized version reads

    +
    +\[ +y(t) = \sum_{a=-\infty}^{a=\infty}x(a)w(t-a). +\]
    +

    Computing the inverse of the above convolution operations is known as deconvolution and the process is commutative.

    +

    How can we use this? And what does it mean? Let us study some familiar examples first.

    +
    +
    +

    Convolution Examples: Polynomial multiplication#

    +

Our first example is that of a multiplication between two polynomials, which we will rewrite in terms of the mathematics of convolution. In the final stage, since the problem here is a discrete one, we will recast the final expression in terms of a matrix-vector multiplication, where the matrix is a so-called Toeplitz matrix.

    +

Let us look at the following polynomials of second and third order, respectively:

    +
    +\[ +p(t) = \alpha_0+\alpha_1 t+\alpha_2 t^2, +\]
    +

    and

    +
    +\[ +s(t) = \beta_0+\beta_1 t+\beta_2 t^2+\beta_3 t^3. +\]
    +

    The polynomial multiplication gives us a new polynomial of degree \(5\)

    +
    +\[ +z(t) = \delta_0+\delta_1 t+\delta_2 t^2+\delta_3 t^3+\delta_4 t^4+\delta_5 t^5. +\]
    +
    +
    +

    Efficient Polynomial Multiplication#

    +

    Computing polynomial products can be implemented efficiently if we rewrite the more brute force multiplications using convolution. +We note first that the new coefficients are given as

    +
    +\[\begin{split} +\begin{split} +\delta_0=&\alpha_0\beta_0\\ +\delta_1=&\alpha_1\beta_0+\alpha_0\beta_1\\ +\delta_2=&\alpha_0\beta_2+\alpha_1\beta_1+\alpha_2\beta_0\\ +\delta_3=&\alpha_1\beta_2+\alpha_2\beta_1+\alpha_0\beta_3\\ +\delta_4=&\alpha_2\beta_2+\alpha_1\beta_3\\ +\delta_5=&\alpha_2\beta_3.\\ +\end{split} +\end{split}\]
    +

    We note that \(\alpha_i=0\) except for \(i\in \left\{0,1,2\right\}\) and \(\beta_i=0\) except for \(i\in\left\{0,1,2,3\right\}\).

    +

    We can then rewrite the coefficients \(\delta_j\) using a discrete convolution as

    +
    +\[ +\delta_j = \sum_{i=-\infty}^{i=\infty}\alpha_i\beta_{j-i}=(\alpha * \beta)_j, +\]
    +

    or as a double sum with restriction \(l=i+j\)

    +
    +\[ +\delta_l = \sum_{ij}\alpha_i\beta_{j}. +\]
    +
    +
    +

    Further simplification#

    +

Although we may have some redundant operations due to the few zeros among the \(\beta_i\), we can rewrite the above sum in a more compact way as

    +
    +\[ +\delta_i = \sum_{k=0}^{k=m-1}\alpha_k\beta_{i-k}, +\]
    +

    where \(m=3\) in our case, the maximum length of +the vector \(\alpha\). Note that the vector \(\boldsymbol{\beta}\) has length \(n=4\). Below we will find an even more efficient representation.

    +
    +
    +
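As a quick numerical check of the coefficient relations above, the \(\delta_j\) are exactly what numpy.convolve returns for the coefficient vectors \(\boldsymbol{\alpha}\) and \(\boldsymbol{\beta}\); the numbers below are arbitrary example values.

import numpy as np
# Coefficients of p(t) = a0 + a1 t + a2 t^2 and s(t) = b0 + b1 t + b2 t^2 + b3 t^3
alpha = np.array([1.0, 2.0, 3.0])        # arbitrary example values
beta = np.array([4.0, 5.0, 6.0, 7.0])    # arbitrary example values
# The discrete convolution gives the coefficients delta_j of the product polynomial
delta = np.convolve(alpha, beta)
print(delta)
# Check against numpy's own polynomial multiplication
print(np.polynomial.polynomial.polymul(alpha, beta))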

    A more efficient way of coding the above Convolution#

    +

    Since we only have a finite number of \(\alpha\) and \(\beta\) values +which are non-zero, we can rewrite the above convolution expressions +as a matrix-vector multiplication

    +
    +\[\begin{split} +\boldsymbol{\delta}=\begin{bmatrix}\alpha_0 & 0 & 0 & 0 \\ + \alpha_1 & \alpha_0 & 0 & 0 \\ + \alpha_2 & \alpha_1 & \alpha_0 & 0 \\ + 0 & \alpha_2 & \alpha_1 & \alpha_0 \\ + 0 & 0 & \alpha_2 & \alpha_1 \\ + 0 & 0 & 0 & \alpha_2 + \end{bmatrix}\begin{bmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \\ \beta_3\end{bmatrix}. +\end{split}\]
    +
    +
    +

    Commutative process#

    +

    The process is commutative and we can easily see that we can rewrite the multiplication in terms of a matrix holding \(\beta\) and a vector holding \(\alpha\). +In this case we have

    +
    +\[\begin{split} +\boldsymbol{\delta}=\begin{bmatrix}\beta_0 & 0 & 0 \\ + \beta_1 & \beta_0 & 0 \\ + \beta_2 & \beta_1 & \beta_0 \\ + \beta_3 & \beta_2 & \beta_1 \\ + 0 & \beta_3 & \beta_2 \\ + 0 & 0 & \beta_3 + \end{bmatrix}\begin{bmatrix} \alpha_0 \\ \alpha_1 \\ \alpha_2\end{bmatrix}. +\end{split}\]
    +

Note that the use of these matrices is for mathematical purposes only and not for implementation purposes. When implementing the above equation we do not encode (and allocate memory for) the matrices explicitly. We rather code the convolutions with the minimal memory footprint that they require.

    +
    +
    +

    Toeplitz matrices#

    +

    The above matrices are examples of so-called Toeplitz +matrices. A +Toeplitz matrix is a matrix in which each descending diagonal from +left to right is constant. For instance the last matrix, which we +rewrite as

    +
    +\[\begin{split} +\boldsymbol{A}=\begin{bmatrix}a_0 & 0 & 0 \\ + a_1 & a_0 & 0 \\ + a_2 & a_1 & a_0 \\ + a_3 & a_2 & a_1 \\ + 0 & a_3 & a_2 \\ + 0 & 0 & a_3 + \end{bmatrix}, +\end{split}\]
    +

with elements \(a_{ij}=a_{i+1,j+1}=a_{i-j}\) is an example of a Toeplitz matrix. Such a matrix does not need to be a square matrix. Toeplitz matrices are also closely connected with Fourier series, because the multiplication operator by a trigonometric polynomial, compressed to a finite-dimensional space, can be represented by such a matrix. The example above shows that we can represent linear convolution as multiplication of a Toeplitz matrix by a vector.

    +
    +
    +
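A small sketch verifying that the Toeplitz matrix above reproduces the convolution, here built with scipy.linalg.toeplitz from its first column and first row (again with arbitrary example numbers):

import numpy as np
from scipy.linalg import toeplitz

alpha = np.array([1.0, 2.0, 3.0])        # length m = 3
beta = np.array([4.0, 5.0, 6.0, 7.0])    # length n = 4
# Toeplitz matrix holding beta: first column [b0, b1, b2, b3, 0, 0], first row [b0, 0, 0]
B = toeplitz(np.concatenate([beta, np.zeros(2)]), np.array([beta[0], 0.0, 0.0]))
delta = B @ alpha
print(delta)                  # same as the direct convolution
print(np.convolve(alpha, beta))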

    Fourier series and Toeplitz matrices#

    +

This is an active and ongoing research area concerning CNNs. The following articles may be of interest:

    +
      +
1. Read more about the convolution theorem and Fourier series

2. Fourier Transform Layer
    +
    +

    Generalizing the above one-dimensional case#

    +

    In order to align the above simple case with the more general +convolution cases, we rename \(\boldsymbol{\alpha}\), whose length is \(m=3\), +with \(\boldsymbol{w}\). We will interpret \(\boldsymbol{w}\) as a weight/filter function +with which we want to perform the convolution with an input variable +\(\boldsymbol{x}\) of length \(n\). We will assume always that the filter +\(\boldsymbol{w}\) has dimensionality \(m \le n\).

    +

    We replace thus \(\boldsymbol{\beta}\) with \(\boldsymbol{x}\) and \(\boldsymbol{\delta}\) with \(\boldsymbol{y}\) and have

    +
    +\[ +y(i)= \left(x*w\right)(i)= \sum_{k=0}^{k=m-1}w(k)x(i-k), +\]
    +

    where \(m=3\) in our case, the maximum length of the vector \(\boldsymbol{w}\). +Here the symbol \(*\) represents the mathematical operation of convolution.

    +
    +
    +

    Memory considerations#

    +

    This expression leaves us however with some terms with negative +indices, for example \(x(-1)\) and \(x(-2)\) which may not be defined. Our +vector \(\boldsymbol{x}\) has components \(x(0)\), \(x(1)\), \(x(2)\) and \(x(3)\).

    +

    The index \(j\) for \(\boldsymbol{x}\) runs from \(j=0\) to \(j=3\) since \(\boldsymbol{x}\) is meant to +represent a third-order polynomial.

    +

    Furthermore, the index \(i\) runs from \(i=0\) to \(i=5\) since \(\boldsymbol{y}\) +contains the coefficients of a fifth-order polynomial. When \(i=5\) we +may also have values of \(x(4)\) and \(x(5)\) which are not defined.

    +
    +
    +

    Padding#

    +

    The solution to this is what is called padding! We simply define a +new vector \(x\) with two added elements set to zero before \(x(0)\) and +two new elements after \(x(3)\) set to zero. That is, we augment the +length of \(\boldsymbol{x}\) from \(n=4\) to \(n+2P=8\), where \(P=2\) is the padding +constant (a new hyperparameter), see discussions below as well.

    +
    +
    +

    New vector#

    +

    We have a new vector defined as \(x(0)=0\), \(x(1)=0\), +\(x(2)=\beta_0\), \(x(3)=\beta_1\), \(x(4)=\beta_2\), \(x(5)=\beta_3\), +\(x(6)=0\), and \(x(7)=0\).

    +

    We have added four new elements, which +are all zero. The benefit is that we can rewrite the equation for +\(\boldsymbol{y}\), with \(i=0,1,\dots,5\),

    +
    +\[ +y(i) = \sum_{k=0}^{k=m-1}w(k)x(i+(m-1)-k). +\]
    +

    As an example, we have

    +
    +\[ +y(4)=x(6)w(0)+x(5)w(1)+x(4)w(2)=0\times \alpha_0+\beta_3\alpha_1+\beta_2\alpha_2, +\]
    +

    as before except that we have an additional term \(x(6)w(0)\), which is zero.

    +

    Similarly, for the fifth-order term we have

    +
    +\[ +y(5)=x(7)w(0)+x(6)w(1)+x(5)w(2)=0\times \alpha_0+0\times\alpha_1+\beta_3\alpha_2. +\]
    +

    The zeroth-order term is

    +
    +\[ +y(0)=x(2)w(0)+x(1)w(1)+x(0)w(2)=\beta_0 \alpha_0+0\times\alpha_1+0\times\alpha_2=\alpha_0\beta_0. +\]
    +
    +
    +
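A minimal sketch of the padded sum above, with the same lengths as in the example (\(m=3\), \(n=4\), \(P=2\)), compared with numpy's built-in full convolution:

import numpy as np

w = np.array([1.0, 2.0, 3.0])            # filter, length m = 3 (the alpha above)
x = np.array([4.0, 5.0, 6.0, 7.0])       # input, length n = 4 (the beta above)
m, n, P = len(w), len(x), len(w) - 1     # padding P = m - 1 = 2
xp = np.concatenate([np.zeros(P), x, np.zeros(P)])   # zero-padded input, length n + 2P
y = np.zeros(n + m - 1)
for i in range(len(y)):
    # y(i) = sum_k w(k) * x(i + (m-1) - k), using the padded vector
    for k in range(m):
        y[i] += w[k] * xp[i + (m - 1) - k]
print(y)
print(np.convolve(x, w))                  # same result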

    Rewriting as dot products#

    +

    If we now flip the filter/weight vector, with the following term as a typical example

    +
    +\[ +y(0)=x(2)w(0)+x(1)w(1)+x(0)w(2)=x(2)\tilde{w}(2)+x(1)\tilde{w}(1)+x(0)\tilde{w}(0), +\]
    +

with \(\tilde{w}(0)=w(2)\), \(\tilde{w}(1)=w(1)\), and \(\tilde{w}(2)=w(0)\), we can then rewrite the above sum as a dot product \(x(i:i+(m-1))\tilde{w}\) for element \(y(i)\), where \(x(i:i+(m-1))\) is simply a patch of \(\boldsymbol{x}\) of size \(m\), that is, the elements \(x(i)\) through \(x(i+m-1)\).

    +

    The padding \(P\) we have introduced for the convolution stage is just +another hyperparameter which is introduced as part of the +architecture. Similarly, below we will also introduce another +hyperparameter called Stride \(S\).

    +
    +
    +

    Cross correlation#

    +

    In essentially all applications one uses what is called cross correlation instead of the standard convolution described above. +This means that multiplication is performed in the same direction and instead of the general expression we have discussed above (with infinite sums)

    +
    +\[ +y(i) = \sum_{k=-\infty}^{k=\infty}w(k)x(i-k), +\]
    +

    we have now

    +
    +\[ +y(i) = \sum_{k=-\infty}^{k=\infty}w(k)x(i+k). +\]
    +

    Both TensorFlow and PyTorch (as well as our own code example below), +implement the last equation, although it is normally referred to as +convolution. The same padding rules and stride rules discussed below +apply to this expression as well.

    +

    We leave it as an exercise for you to convince yourself that the example we have discussed till now, gives the same final expression using the last expression.

    +
    +
    +
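A minimal sketch illustrating this point with PyTorch (which, as stated above, implements the cross-correlation form under the name convolution): applying a filter with torch.nn.functional.conv1d gives the same numbers as a true convolution with the flipped filter. The numbers are arbitrary example values.

import numpy as np
import torch
import torch.nn.functional as F

x = torch.tensor([[[4.0, 5.0, 6.0, 7.0]]])   # shape (batch, channels, length)
w = torch.tensor([[[1.0, 2.0, 3.0]]])        # shape (out_channels, in_channels, kernel length)

# What PyTorch calls convolution is the cross-correlation sum_k w(k) x(i+k)
print(F.conv1d(x, w, padding=2).numpy().ravel())
# The same numbers follow from an ordinary convolution with the flipped filter
print(np.convolve([4.0, 5.0, 6.0, 7.0], [3.0, 2.0, 1.0]))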

    Two-dimensional objects#

    +

    We are now ready to start studying the discrete convolutions relevant for convolutional neural networks. +We often use convolutions over more than one dimension at a time. If +we have a two-dimensional image \(X\) as input, we can have a filter +defined by a two-dimensional kernel/weight/filter \(W\). This leads to an output \(Y\)

    +
    +\[ +Y(i,j)=(X * W)(i,j) = \sum_m\sum_n X(m,n)W(i-m,j-n). +\]
    +

    Convolution is a commutative process, which means we can rewrite this equation as

    +
    +\[ +Y(i,j)=(X * W)(i,j) = \sum_m\sum_n X(i-m,j-n)W(m,n). +\]
    +

Normally the latter is more straightforward to implement in a machine learning library since there is less variation in the range of values of \(m\) and \(n\).

    +

    As mentioned above, most deep learning libraries implement +cross-correlation instead of convolution (although it is referred to as +convolution)

    +
    +\[ +Y(i,j)=(X * W)(i,j) = \sum_m\sum_n X(i+m,j+n)W(m,n). +\]
    +
    +
    +

    CNNs in more detail, simple example#

    +

Let us assume we have an input matrix \(X\) of dimensionality \(3\times 3\) and a \(2\times 2\) filter \(W\) given by the following matrices

    +
    +\[\begin{split} +\boldsymbol{X}=\begin{bmatrix}x_{00} & x_{01} & x_{02} \\ + x_{10} & x_{11} & x_{12} \\ + x_{20} & x_{21} & x_{22} \end{bmatrix}, +\end{split}\]
    +

    and

    +
    +\[\begin{split} +\boldsymbol{W}=\begin{bmatrix}w_{00} & w_{01} \\ + w_{10} & w_{11}\end{bmatrix}. +\end{split}\]
    +

We now introduce the stride hyperparameter \(S\). The stride determines how the filter \(W\) is moved across the matrix \(X\) during the convolution. We strongly recommend the repository on Arithmetic of deep learning by Dumoulin and Visin.

    +

    Here we set the stride equal to \(S=1\), which means that, starting with the element \(x_{00}\), the filter will act on \(2\times 2\) submatrices each time, starting with the upper corner and moving according to the stride value column by column.

    +

Here we perform the operation (written in the cross-correlation form that, as discussed above, is what the libraries actually implement)

    +
\[ Y(i,j)=(X * W)(i,j) = \sum_m\sum_n X(i+m,j+n)W(m,n), \]
    +

    and obtain

    +
    +\[\begin{split} +\boldsymbol{Y}=\begin{bmatrix}x_{00}w_{00}+x_{01}w_{01}+x_{10}w_{10}+x_{11}w_{11} & x_{01}w_{00}+x_{02}w_{01}+x_{11}w_{10}+x_{12}w_{11} \\ + x_{10}w_{00}+x_{11}w_{01}+x_{20}w_{10}+x_{21}w_{11} & x_{11}w_{00}+x_{12}w_{01}+x_{21}w_{10}+x_{22}w_{11}\end{bmatrix}. +\end{split}\]
    +

    We can rewrite this operation in terms of a matrix-vector multiplication by defining a new vector where we flatten out the inputs as a vector \(\boldsymbol{X}'\) of length \(9\) and +a matrix \(\boldsymbol{W}'\) with dimension \(4\times 9\) as

    +
    +\[\begin{split} +\boldsymbol{X}'=\begin{bmatrix}x_{00} \\ x_{01} \\ x_{02} \\ x_{10} \\ x_{11} \\ x_{12} \\ x_{20} \\ x_{21} \\ x_{22} \end{bmatrix}, +\end{split}\]
    +

    and the new matrix

    +
    +\[\begin{split} +\boldsymbol{W}'=\begin{bmatrix} w_{00} & w_{01} & 0 & w_{10} & w_{11} & 0 & 0 & 0 & 0 \\ + 0 & w_{00} & w_{01} & 0 & w_{10} & w_{11} & 0 & 0 & 0 \\ + 0 & 0 & 0 & w_{00} & w_{01} & 0 & w_{10} & w_{11} & 0 \\ + 0 & 0 & 0 & 0 & w_{00} & w_{01} & 0 & w_{10} & w_{11}\end{bmatrix}. +\end{split}\]
    +

    We see easily that performing the matrix-vector multiplication \(\boldsymbol{W}'\boldsymbol{X}'\) is the same as the above convolution with stride \(S=1\), that is

    +
    +\[ +Y=(\boldsymbol{W}*\boldsymbol{X}), +\]
    +

    is now given by \(\boldsymbol{W}'\boldsymbol{X}'\) which is a vector of length \(4\) instead of the originally resulting \(2\times 2\) output matrix.

    +
    +
    +
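A short numerical sketch of this example: sliding the \(2\times 2\) filter over the \(3\times 3\) input (in the cross-correlation form above) gives the same four numbers as the flattened product \(\boldsymbol{W}'\boldsymbol{X}'\). The entries of \(X\) and \(W\) are arbitrary example values.

import numpy as np

X = np.arange(1.0, 10.0).reshape(3, 3)     # x_00 ... x_22, example values
W = np.array([[1.0, 2.0], [3.0, 4.0]])     # w_00 ... w_11, example values

# Slide the 2x2 filter over X with stride S = 1
Y = np.zeros((2, 2))
for i in range(2):
    for j in range(2):
        Y[i, j] = np.sum(X[i:i+2, j:j+2] * W)
print(Y)

# The same operation as a matrix-vector product W' X' with the flattened input
Xflat = X.flatten()                        # length 9
Wprime = np.zeros((4, 9))
for i in range(2):
    for j in range(2):
        row = np.zeros((3, 3))
        row[i:i+2, j:j+2] = W              # place the filter at position (i, j)
        Wprime[2*i + j] = row.flatten()
print(Wprime @ Xflat)                      # same four numbers, as a vector of length 4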

    The convolution stage#

    +

    The convolution stage, where we apply different filters \(\boldsymbol{W}\) in +order to reduce the dimensionality of an image, adds, in addition to +the weights and biases (to be trained by the back propagation +algorithm) that define the filters, two new hyperparameters, the so-called +padding \(P\) and the stride \(S\).

    +
    +
    +

    Finding the number of parameters#

    +

In the above example we have an input matrix of dimension \(3\times 3\). In general we call the input an input volume, and it is defined by its width \(W_1\), height \(H_1\) and depth \(D_1\). If we have the standard three color channels, then \(D_1=3\).

    +

    The above example has \(W_1=H_1=3\) and \(D_1=1\).

    +

    When we introduce the filter we have the following additional hyperparameters

    +
      +
1. \(K\), the number of filters. It is common to perform the convolution of the input several times, since experience shows that shrinking the input too fast does not work well.

2. \(F\), the filter’s spatial extent.

3. \(S\), the stride parameter.

4. \(P\), the padding parameter.

    These parameters are defined by the architecture of the network and are not included in the training.

    +
    +
    +

    New image (or volume)#

    +

    Acting with the filter on the input volume produces an output volume +which is defined by its width \(W_2\), its height \(H_2\) and its depth +\(D_2\).

    +

    These are defined by the following relations

    +
    +\[ +W_2 = \frac{(W_1-F+2P)}{S}+1, +\]
    +
    +\[ +H_2 = \frac{(H_1-F+2P)}{S}+1, +\]
    +

    and \(D_2=K\).

    +
    +
    +
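These relations are straightforward to encode; a tiny helper function, checked on the \(3\times 3\) input and \(2\times 2\) filter from the earlier example (with \(S=1\) and \(P=0\)), might look as follows.

def output_size(W1, F, P, S):
    """Spatial output size (W1 - F + 2P)/S + 1 of a convolutional layer."""
    size = (W1 - F + 2 * P) / S + 1
    assert size == int(size), "hyperparameters do not fit the input size"
    return int(size)

print(output_size(W1=3, F=2, P=0, S=1))   # the 3x3 input with a 2x2 filter gives a 2x2 output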

    Parameters to train, common settings#

    +

    With parameter sharing, the convolution involves thus for each filter \(F\times F\times D_1\) weights plus one bias parameter.

    +

    In total we have

    +
\[ \left(F\times F\times D_1\right) \times K + K \quad (\mathrm{biases}), \]
    +

    parameters to train by back propagation.

    +

    It is common to let \(K\) come in powers of \(2\), that is \(32\), \(64\), \(128\) etc.

    +

    Common settings.

    +
      +
1. \(F=3\), \(S=1\), \(P=1\)

2. \(F=5\), \(S=1\), \(P=2\)

3. \(F=5\), \(S=2\), \(P=\mathrm{open}\)

4. \(F=1\), \(S=1\), \(P=0\)
    +
    +

    Examples of CNN setups#

    +

    Let us assume we have an input volume \(V\) given by an image of dimensionality +\(32\times 32 \times 3\), that is three color channels and \(32\times 32\) pixels.

    +

    We apply a filter of dimension \(5\times 5\) ten times with stride \(S=1\) and padding \(P=0\).

    +

The spatial size of the output is given by \((32-5)/1+1=28\), resulting in an output volume of dimensionality \(28\times 28\times 10\), that is, ten feature maps of size \(28\times 28\).

    +

    The total number of parameters to train for each filter is then +\(5\times 5\times 3+1\), where the last parameter is the bias. This +gives us \(76\) parameters for each filter, leading to a total of \(760\) +parameters for the ten filters.

    +
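A quick check of the numbers quoted in this example:

# Input volume 32x32x3, K = 10 filters of size F = 5, stride S = 1, padding P = 0
W1, D1, F, K, S, P = 32, 3, 5, 10, 1, 0
W2 = (W1 - F + 2 * P) // S + 1
params_per_filter = F * F * D1 + 1        # weights plus one bias
total_params = params_per_filter * K
print(W2, params_per_filter, total_params)   # 28 76 760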

    How many parameters will a filter of dimensionality \(3\times 3\) +(adding color channels) result in if we produce \(32\) new images? Use \(S=1\) and \(P=0\).

    +

    Note that strides constitute a form of subsampling. As an alternative to +being interpreted as a measure of how much the kernel/filter is translated, strides +can also be viewed as how much of the output is retained. For instance, moving +the kernel by hops of two is equivalent to moving the kernel by hops of one but +retaining only odd output elements.

    +
    +
    +

    Summarizing: Performing a general discrete convolution (From Raschka et al)#

    + + +

    Figure 1: A deep CNN

    +
    +
    +

    Pooling#

    +

    In addition to discrete convolutions themselves, pooling operations +make up another important building block in CNNs. Pooling operations reduce +the size of feature maps by using some function to summarize subregions, such +as taking the average or the maximum value.

    +

    Pooling works by sliding a window across the input and feeding the content of +the window to a pooling function. In some sense, pooling works very much +like a discrete convolution, but replaces the linear combination described by +the kernel with some other function.

    +
    +
    +

    Pooling arithmetic#

    +

    In a neural network, pooling layers provide invariance to small translations of +the input. The most common kind of pooling is max pooling, which +consists in splitting the input in (usually non-overlapping) patches and +outputting the maximum value of each patch. Other kinds of pooling exist, e.g., +mean or average pooling, which all share the same idea of aggregating the input +locally by applying a non-linearity to the content of some patches.

    +
    +
    +
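A minimal numpy sketch of non-overlapping \(2\times 2\) max pooling on a single feature map, the operation that takes a \(32\times 32\) map down to \(16\times 16\) in the architecture listed earlier:

import numpy as np

def max_pool2x2(feature_map):
    """Non-overlapping 2x2 max pooling of a 2D array with even side lengths."""
    H, W = feature_map.shape
    patches = feature_map.reshape(H // 2, 2, W // 2, 2)
    return patches.max(axis=(1, 3))

fmap = np.random.rand(32, 32)
print(max_pool2x2(fmap).shape)    # (16, 16)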

    Pooling types (From Raschka et al)#

    + + +

    Figure 1: A deep CNN

    +
    +
    +

    Building convolutional neural networks in Tensorflow/Keras and PyTorch#

    +

As discussed above, CNNs are neural networks built from the assumption that the inputs to the network are 2D images. This is important because the number of features or pixels in images grows very fast with the image size, and an enormous number of weights and biases are needed in order to build an accurate network. Next week we will discuss in more detail how we can build a CNN using either TensorFlow with Keras or PyTorch.

    +
    +
\ No newline at end of file diff --git a/doc/LectureNotes/_build/html/week45.html b/doc/LectureNotes/_build/html/week45.html new file mode 100644 index 000000000..463600da6 --- /dev/null +++ b/doc/LectureNotes/_build/html/week45.html @@ -0,0 +1,1800 @@ Week 45, Convolutional Neural Networks (CNNs) — Applied Data Analysis and Machine Learning


Week 45, Convolutional Neural Networks (CNNs)#

    +

    Morten Hjorth-Jensen, Department of Physics, University of Oslo

    +

    Date: November 3-7, 2025

    +
    +

    Plans for week 45#

    +

    Material for the lecture on Monday November 3, 2025.

    +
      +
1. Convolutional Neural Networks, codes and examples (TensorFlow and PyTorch implementations)

2. Readings and Videos:

3. These lecture notes at CompPhysics/MachineLearning

4. Video of lecture at https://youtu.be/dZt6Vm1wjhs

5. Whiteboard notes at CompPhysics/MachineLearning

6. For a more in-depth discussion of CNNs we recommend Goodfellow et al., chapter 9; see also chapters 11 and 12 on practicalities and applications

7. Reading suggestions for the implementation of CNNs: see Raschka et al., chapters 14-15 at rasbt/machine-learning-book

a. Video on Deep Learning at https://www.youtube.com/playlist?list=PLZHQObOWTQDNU6R1_67000Dx_ZCJB-3pi

    +
    +
    +

    Material for the lab sessions#

    +

    Discussion of and work on project 2, no exercises this week, only project work

    +
    +
    +

    Material for Lecture Monday November 3#

    +
    +
    +

    Convolutional Neural Networks (recognizing images), reminder from last week#

    +

Convolutional neural networks (CNNs) were developed during the last decade of the previous century, with a focus on character recognition tasks. Nowadays, CNNs are a central element in the spectacular success of deep learning methods. Their success in, for example, image classification has made them a central tool for most machine learning practitioners.

    +

    CNNs are very similar to ordinary Neural Networks. +They are made up of neurons that have learnable weights and +biases. Each neuron receives some inputs, performs a dot product and +optionally follows it with a non-linearity. The whole network still +expresses a single differentiable score function: from the raw image +pixels on one end to class scores at the other. And they still have a +loss function (for example Softmax) on the last (fully-connected) layer +and all the tips/tricks we developed for learning regular Neural +Networks still apply (back propagation, gradient descent etc etc).

    +
    +
    +

    What is the Difference#

    +

    CNN architectures make the explicit assumption that +the inputs are images, which allows us to encode certain properties +into the architecture. These then make the forward function more +efficient to implement and vastly reduce the amount of parameters in +the network.

    +
    +
    +

    Neural Networks vs CNNs#

    +

Neural networks are defined as affine transformations, that is, a vector is received as input and is multiplied with a matrix of so-called weights (our unknown parameters) to produce an output (to which a bias vector is usually added before passing the result through a nonlinear activation function). This is applicable to any type of input, be it an image, a sound clip or an unordered collection of features: whatever their dimensionality, their representation can always be flattened into a vector before the transformation.

    +
    +
    +

Why CNNs for images, sound files, medical images from CT scans etc.?#

    +

    However, when we consider images, sound clips and many other similar kinds of data, these data have an intrinsic +structure. More formally, they share these important properties:

    +
      +
• They are stored as multi-dimensional arrays (think of the pixels of a figure).

• They feature one or more axes for which ordering matters (e.g., width and height axes for an image, time axis for a sound clip).

• One axis, called the channel axis, is used to access different views of the data (e.g., the red, green and blue channels of a color image, or the left and right channels of a stereo audio track).

    These properties are not exploited when an affine transformation is applied; in +fact, all the axes are treated in the same way and the topological information +is not taken into account. Still, taking advantage of the implicit structure of +the data may prove very handy in solving some tasks, like computer vision and +speech recognition, and in these cases it would be best to preserve it. This is +where discrete convolutions come into play.

    +

    A discrete convolution is a linear transformation that preserves this notion of +ordering. It is sparse (only a few input units contribute to a given output +unit) and reuses parameters (the same weights are applied to multiple locations +in the input).

    +
    +
    +

    Regular NNs don’t scale well to full images#

    +

    As an example, consider +an image of size \(32\times 32\times 3\) (32 wide, 32 high, 3 color channels), so a +single fully-connected neuron in a first hidden layer of a regular +Neural Network would have \(32\times 32\times 3 = 3072\) weights. This amount still +seems manageable, but clearly this fully-connected structure does not +scale to larger images. For example, an image of more respectable +size, say \(200\times 200\times 3\), would lead to neurons that have +\(200\times 200\times 3 = 120,000\) weights.

    +

    We could have +several such neurons, and the parameters would add up quickly! Clearly, +this full connectivity is wasteful and the huge number of parameters +would quickly lead to possible overfitting.

    + + +

    Figure 1: A regular 3-layer Neural Network.

    +
    +
    +

    3D volumes of neurons#

    +

    Convolutional Neural Networks take advantage of the fact that the +input consists of images and they constrain the architecture in a more +sensible way.

    +

    In particular, unlike a regular Neural Network, the +layers of a CNN have neurons arranged in 3 dimensions: width, +height, depth. (Note that the word depth here refers to the third +dimension of an activation volume, not to the depth of a full Neural +Network, which can refer to the total number of layers in a network.)

    +

    To understand it better, the above example of an image +with an input volume of +activations has dimensions \(32\times 32\times 3\) (width, height, +depth respectively).

    +

    The neurons in a layer will +only be connected to a small region of the layer before it, instead of +all of the neurons in a fully-connected manner. Moreover, the final +output layer could for this specific image have dimensions \(1\times 1 \times 10\), +because by the +end of the CNN architecture we will reduce the full image into a +single vector of class scores, arranged along the depth +dimension.

    + + +

    Figure 1: A CNN arranges its neurons in three dimensions (width, height, depth), as visualized in one of the layers. Every layer of a CNN transforms the 3D input volume to a 3D output volume of neuron activations. In this example, the red input layer holds the image, so its width and height would be the dimensions of the image, and the depth would be 3 (Red, Green, Blue channels).

    +
    +
    +

    More on Dimensionalities#

    +

    In fields like signal processing (and imaging as well), one designs +so-called filters. These filters are defined by the convolutions and +are often hand-crafted. One may specify filters for smoothing, edge +detection, frequency reshaping, and similar operations. However with +neural networks the idea is to automatically learn the filters and use +many of them in conjunction with non-linear operations (activation +functions).

    +

As an example, consider a neural network operating on sound sequence data. Assume that we have an input vector \(\boldsymbol{x}\) of length \(d=10^6\). We then construct a neural network with one hidden layer with \(10^4\) nodes. This means that we will have a weight matrix with \(10^4\times 10^6=10^{10}\) weights to be determined, together with \(10^4\) biases.

    +

Assume furthermore that we have an output layer which is meant to predict whether the sound sequence represents a human voice (true) or something else (false). This means that we have only one output node. But since this output node connects to the \(10^4\) nodes in the hidden layer, there are in total \(10^4\) weights to be determined for the output layer, plus one bias. In total we have

    +
    +\[ +\mathrm{NumberParameters}=10^{10}+10^4+10^4+1 \approx 10^{10}, +\]
    +

    that is ten billion parameters to determine.

    +
    +
    +

    Further remarks#

    +

The main principles that justify convolutions are locality of information and repetition of patterns within the signal. Sound samples of the input in adjacent spots are much more likely to affect each other than those that are very far away. Similarly, sounds are repeated multiple times in the signal. While slightly simplistic, reasoning about such a sound example demonstrates this. The same principles then apply to images and other similar data.

    +
    +
    +

    Layers used to build CNNs#

    +

    A simple CNN is a sequence of layers, and every layer of a CNN +transforms one volume of activations to another through a +differentiable function. We use three main types of layers to build +CNN architectures: Convolutional Layer, Pooling Layer, and +Fully-Connected Layer (exactly as seen in regular Neural Networks). We +will stack these layers to form a full CNN architecture.

    +

    A simple CNN for image classification could have the architecture:

    +
      +
• INPUT (\(32\times 32 \times 3\)) will hold the raw pixel values of the image, in this case an image of width 32, height 32, and with three color channels R,G,B.

• CONV (convolutional) layer will compute the output of neurons that are connected to local regions in the input, each computing a dot product between their weights and a small region they are connected to in the input volume. This may result in a volume such as \([32\times 32\times 12]\) if we decide to use 12 filters.

• RELU layer will apply an elementwise activation function, such as the \(max(0,x)\) thresholding at zero. This leaves the size of the volume unchanged (\([32\times 32\times 12]\)).

• POOL (pooling) layer will perform a downsampling operation along the spatial dimensions (width, height), resulting in a volume such as \([16\times 16\times 12]\).

• FC (i.e. fully-connected) layer will compute the class scores, resulting in a volume of size \([1\times 1\times 10]\), where each of the 10 numbers corresponds to a class score, such as among the 10 categories of the MNIST images we considered above. As with ordinary Neural Networks and as the name implies, each neuron in this layer will be connected to all the numbers in the previous volume.
    +
    +

    Transforming images#

    +

    CNNs transform the original image layer by layer from the original +pixel values to the final class scores.

    +

    Observe that some layers contain +parameters and other don’t. In particular, the CNN layers perform +transformations that are a function of not only the activations in the +input volume, but also of the parameters (the weights and biases of +the neurons). On the other hand, the RELU/POOL layers will implement a +fixed function. The parameters in the CONV/FC layers will be trained +with gradient descent so that the class scores that the CNN computes +are consistent with the labels in the training set for each image.

    +
    +
    +

    CNNs in brief#

    +

    In summary:

    +
      +
• A CNN architecture is in the simplest case a list of Layers that transform the image volume into an output volume (e.g. holding the class scores)

• There are a few distinct types of Layers (e.g. CONV/FC/RELU/POOL are by far the most popular)

• Each Layer accepts an input 3D volume and transforms it to an output 3D volume through a differentiable function

• Each Layer may or may not have parameters (e.g. CONV/FC do, RELU/POOL don’t)

• Each Layer may or may not have additional hyperparameters (e.g. CONV/FC/POOL do, RELU doesn’t)
    +
    +

    A deep CNN model (From Raschka et al)#

    + + +

    Figure 1: A deep CNN

    +
    +
    +

    Key Idea#

    +

A dense neural network is represented by an affine operation (like matrix-matrix multiplication) where all parameters are included.

    +

The key idea in CNNs for, say, imaging is that in images neighboring pixels tend to be related! So we connect each neuron in the first hidden layer only to a small neighborhood of input neurons, instead of connecting it to all of the inputs.

    +

    We say we perform a filtering (convolution is the mathematical operation).

    +
    +
    +

    Mathematics of CNNs#

    +

The mathematics of CNNs is based on the mathematical operation of convolution. In mathematics (in particular in functional analysis), convolution is represented by mathematical operations (integration, summation etc.) on two functions in order to produce a third function that expresses how the shape of one gets modified by the other. Convolution has a plethora of applications in a variety of disciplines, spanning from statistics to signal processing, computer vision, solutions of differential equations, linear algebra, engineering, and yes, machine learning.

    +

    Mathematically, convolution is defined as follows (one-dimensional example): +Let us define a continuous function \(y(t)\) given by

    +
    +\[ +y(t) = \int x(a) w(t-a) da, +\]
    +

    where \(x(a)\) represents a so-called input and \(w(t-a)\) is normally called the weight function or kernel.

    +

    The above integral is written in a more compact form as

    +
    +\[ +y(t) = \left(x * w\right)(t). +\]
    +

    The discretized version reads

    +
    +\[ +y(t) = \sum_{a=-\infty}^{a=\infty}x(a)w(t-a). +\]
    +

    Computing the inverse of the above convolution operations is known as deconvolution and the process is commutative.

    +

    How can we use this? And what does it mean? Let us study some familiar examples first.

    +
    +
    +

    Convolution Examples: Polynomial multiplication#

    +

Our first example is that of a multiplication between two polynomials, which we will rewrite in terms of the mathematics of convolution. In the final stage, since the problem here is a discrete one, we will recast the final expression in terms of a matrix-vector multiplication, where the matrix is a so-called Toeplitz matrix.

    +

Let us look at the following polynomials of second and third order, respectively:

    +
    +\[ +p(t) = \alpha_0+\alpha_1 t+\alpha_2 t^2, +\]
    +

    and

    +
    +\[ +s(t) = \beta_0+\beta_1 t+\beta_2 t^2+\beta_3 t^3. +\]
    +

    The polynomial multiplication gives us a new polynomial of degree \(5\)

    +
    +\[ +z(t) = \delta_0+\delta_1 t+\delta_2 t^2+\delta_3 t^3+\delta_4 t^4+\delta_5 t^5. +\]
    +
    +
    +

    Efficient Polynomial Multiplication#

    +

    Computing polynomial products can be implemented efficiently if we rewrite the more brute force multiplications using convolution. +We note first that the new coefficients are given as

    +
    +\[\begin{split} +\begin{split} +\delta_0=&\alpha_0\beta_0\\ +\delta_1=&\alpha_1\beta_0+\alpha_0\beta_1\\ +\delta_2=&\alpha_0\beta_2+\alpha_1\beta_1+\alpha_2\beta_0\\ +\delta_3=&\alpha_1\beta_2+\alpha_2\beta_1+\alpha_0\beta_3\\ +\delta_4=&\alpha_2\beta_2+\alpha_1\beta_3\\ +\delta_5=&\alpha_2\beta_3.\\ +\end{split} +\end{split}\]
    +

    We note that \(\alpha_i=0\) except for \(i\in \left\{0,1,2\right\}\) and \(\beta_i=0\) except for \(i\in\left\{0,1,2,3\right\}\).

    +

    We can then rewrite the coefficients \(\delta_j\) using a discrete convolution as

    +
    +\[ +\delta_j = \sum_{i=-\infty}^{i=\infty}\alpha_i\beta_{j-i}=(\alpha * \beta)_j, +\]
    +

    or as a double sum with restriction \(l=i+j\)

    +
    +\[ +\delta_l = \sum_{ij}\alpha_i\beta_{j}. +\]
    +
    +
    +

    Further simplification#

    +

Although we may have some redundant operations due to the few zeros among the \(\beta_i\), we can rewrite the above sum in a more compact way as

    +
    +\[ +\delta_i = \sum_{k=0}^{k=m-1}\alpha_k\beta_{i-k}, +\]
    +

    where \(m=3\) in our case, the maximum length of +the vector \(\alpha\). Note that the vector \(\boldsymbol{\beta}\) has length \(n=4\). Below we will find an even more efficient representation.

    +
    +
    +

    A more efficient way of coding the above Convolution#

    +

    Since we only have a finite number of \(\alpha\) and \(\beta\) values +which are non-zero, we can rewrite the above convolution expressions +as a matrix-vector multiplication

    +
    +\[\begin{split} +\boldsymbol{\delta}=\begin{bmatrix}\alpha_0 & 0 & 0 & 0 \\ + \alpha_1 & \alpha_0 & 0 & 0 \\ + \alpha_2 & \alpha_1 & \alpha_0 & 0 \\ + 0 & \alpha_2 & \alpha_1 & \alpha_0 \\ + 0 & 0 & \alpha_2 & \alpha_1 \\ + 0 & 0 & 0 & \alpha_2 + \end{bmatrix}\begin{bmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \\ \beta_3\end{bmatrix}. +\end{split}\]
    +
    +
    +

    Commutative process#

    +

    The process is commutative and we can easily see that we can rewrite the multiplication in terms of a matrix holding \(\beta\) and a vector holding \(\alpha\). +In this case we have

    +
    +\[\begin{split} +\boldsymbol{\delta}=\begin{bmatrix}\beta_0 & 0 & 0 \\ + \beta_1 & \beta_0 & 0 \\ + \beta_2 & \beta_1 & \beta_0 \\ + \beta_3 & \beta_2 & \beta_1 \\ + 0 & \beta_3 & \beta_2 \\ + 0 & 0 & \beta_3 + \end{bmatrix}\begin{bmatrix} \alpha_0 \\ \alpha_1 \\ \alpha_2\end{bmatrix}. +\end{split}\]
    +

Note that the use of these matrices is for mathematical purposes only and not for implementation purposes. When implementing the above equation we do not encode (and allocate memory for) the matrices explicitly. We rather code the convolutions with the minimal memory footprint that they require.

    +
    +
    +

    Toeplitz matrices#

    +

    The above matrices are examples of so-called Toeplitz +matrices. A +Toeplitz matrix is a matrix in which each descending diagonal from +left to right is constant. For instance the last matrix, which we +rewrite as

    +
    +\[\begin{split} +\boldsymbol{A}=\begin{bmatrix}a_0 & 0 & 0 \\ + a_1 & a_0 & 0 \\ + a_2 & a_1 & a_0 \\ + a_3 & a_2 & a_1 \\ + 0 & a_3 & a_2 \\ + 0 & 0 & a_3 + \end{bmatrix}, +\end{split}\]
    +

with elements \(a_{ij}=a_{i+1,j+1}=a_{i-j}\) is an example of a Toeplitz matrix. Such a matrix does not need to be a square matrix. Toeplitz matrices are also closely connected with Fourier series, because the multiplication operator by a trigonometric polynomial, compressed to a finite-dimensional space, can be represented by such a matrix. The example above shows that we can represent linear convolution as multiplication of a Toeplitz matrix by a vector.

    +
    +
    +

    Fourier series and Toeplitz matrices#

    +

This is an active and ongoing research area concerning CNNs. The following articles may be of interest:

    +
      +
1. Read more about the convolution theorem and Fourier series

2. Fourier Transform Layer
    +
    +

    Generalizing the above one-dimensional case#

    +

    In order to align the above simple case with the more general +convolution cases, we rename \(\boldsymbol{\alpha}\), whose length is \(m=3\), +with \(\boldsymbol{w}\). We will interpret \(\boldsymbol{w}\) as a weight/filter function +with which we want to perform the convolution with an input variable +\(\boldsymbol{x}\) of length \(n\). We will assume always that the filter +\(\boldsymbol{w}\) has dimensionality \(m \le n\).

    +

    We replace thus \(\boldsymbol{\beta}\) with \(\boldsymbol{x}\) and \(\boldsymbol{\delta}\) with \(\boldsymbol{y}\) and have

    +
    +\[ +y(i)= \left(x*w\right)(i)= \sum_{k=0}^{k=m-1}w(k)x(i-k), +\]
    +

    where \(m=3\) in our case, the maximum length of the vector \(\boldsymbol{w}\). +Here the symbol \(*\) represents the mathematical operation of convolution.

    +
    +
    +

    Memory considerations#

    +

    This expression leaves us however with some terms with negative +indices, for example \(x(-1)\) and \(x(-2)\) which may not be defined. Our +vector \(\boldsymbol{x}\) has components \(x(0)\), \(x(1)\), \(x(2)\) and \(x(3)\).

    +

    The index \(j\) for \(\boldsymbol{x}\) runs from \(j=0\) to \(j=3\) since \(\boldsymbol{x}\) is meant to +represent a third-order polynomial.

    +

    Furthermore, the index \(i\) runs from \(i=0\) to \(i=5\) since \(\boldsymbol{y}\) +contains the coefficients of a fifth-order polynomial. When \(i=5\) we +may also have values of \(x(4)\) and \(x(5)\) which are not defined.

    +
    +
    +

    Padding#

    +

    The solution to this is what is called padding! We simply define a +new vector \(x\) with two added elements set to zero before \(x(0)\) and +two new elements after \(x(3)\) set to zero. That is, we augment the +length of \(\boldsymbol{x}\) from \(n=4\) to \(n+2P=8\), where \(P=2\) is the padding +constant (a new hyperparameter), see discussions below as well.

    +
    +
    +

    New vector#

    +

    We have a new vector defined as \(x(0)=0\), \(x(1)=0\), +\(x(2)=\beta_0\), \(x(3)=\beta_1\), \(x(4)=\beta_2\), \(x(5)=\beta_3\), +\(x(6)=0\), and \(x(7)=0\).

    +

    We have added four new elements, which +are all zero. The benefit is that we can rewrite the equation for +\(\boldsymbol{y}\), with \(i=0,1,\dots,5\),

    +
    +\[ +y(i) = \sum_{k=0}^{k=m-1}w(k)x(i+(m-1)-k). +\]
    +

    As an example, we have

    +
    +\[ +y(4)=x(6)w(0)+x(5)w(1)+x(4)w(2)=0\times \alpha_0+\beta_3\alpha_1+\beta_2\alpha_2, +\]
    +

    as before except that we have an additional term \(x(6)w(0)\), which is zero.

    +

    Similarly, for the fifth-order term we have

    +
    +\[ +y(5)=x(7)w(0)+x(6)w(1)+x(5)w(2)=0\times \alpha_0+0\times\alpha_1+\beta_3\alpha_2. +\]
    +

    The zeroth-order term is

    +
    +\[ +y(0)=x(2)w(0)+x(1)w(1)+x(0)w(2)=\beta_0 \alpha_0+0\times\alpha_1+0\times\alpha_2=\alpha_0\beta_0. +\]
    +
    +
    +

    Rewriting as dot products#

    +

    If we now flip the filter/weight vector, with the following term as a typical example

    +
    +\[ +y(0)=x(2)w(0)+x(1)w(1)+x(0)w(2)=x(2)\tilde{w}(2)+x(1)\tilde{w}(1)+x(0)\tilde{w}(0), +\]
    +

with \(\tilde{w}(0)=w(2)\), \(\tilde{w}(1)=w(1)\), and \(\tilde{w}(2)=w(0)\), we can then rewrite the above sum as a dot product \(x(i:i+(m-1))\tilde{w}\) for element \(y(i)\), where \(x(i:i+(m-1))\) is simply a patch of \(\boldsymbol{x}\) of size \(m\), that is, the elements \(x(i)\) through \(x(i+m-1)\).

    +

    The padding \(P\) we have introduced for the convolution stage is just +another hyperparameter which is introduced as part of the +architecture. Similarly, below we will also introduce another +hyperparameter called Stride \(S\).

    +
    +
    +

    Cross correlation#

    +

    In essentially all applications one uses what is called cross correlation instead of the standard convolution described above. +This means that multiplication is performed in the same direction and instead of the general expression we have discussed above (with infinite sums)

    +
    +\[ +y(i) = \sum_{k=-\infty}^{k=\infty}w(k)x(i-k), +\]
    +

    we have now

    +
    +\[ +y(i) = \sum_{k=-\infty}^{k=\infty}w(k)x(i+k). +\]
    +

    Both TensorFlow and PyTorch (as well as our own code example below), +implement the last equation, although it is normally referred to as +convolution. The same padding rules and stride rules discussed below +apply to this expression as well.

    +

    We leave it as an exercise for you to convince yourself that the example we have discussed till now, gives the same final expression using the last expression.

    +
    +
    +

    Two-dimensional objects#

    +

    We are now ready to start studying the discrete convolutions relevant for convolutional neural networks. +We often use convolutions over more than one dimension at a time. If +we have a two-dimensional image \(X\) as input, we can have a filter +defined by a two-dimensional kernel/weight/filter \(W\). This leads to an output \(Y\)

    +
    +\[ +Y(i,j)=(X * W)(i,j) = \sum_m\sum_n X(m,n)W(i-m,j-n). +\]
    +

    Convolution is a commutative process, which means we can rewrite this equation as

    +
    +\[ +Y(i,j)=(X * W)(i,j) = \sum_m\sum_n X(i-m,j-n)W(m,n). +\]
    +

Normally the latter is more straightforward to implement in a machine learning library since there is less variation in the range of values of \(m\) and \(n\).

    +

    As mentioned above, most deep learning libraries implement +cross-correlation instead of convolution (although it is referred to as +convolution)

    +
    +\[ +Y(i,j)=(X * W)(i,j) = \sum_m\sum_n X(i+m,j+n)W(m,n). +\]
    +
    +
    +
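A small check of the relation between the two operations in two dimensions (a sketch, assuming SciPy is available; the input and filter values are arbitrary): cross-correlating with a kernel gives the same result as convolving with the kernel flipped along both axes.

import numpy as np
from scipy import signal

X = np.arange(9.0).reshape(3, 3)            # a small example "image"
W = np.array([[1.0, -1.0],
              [2.0, 0.5]])                  # a 2x2 filter

# What deep-learning libraries call convolution: cross-correlation
Y_corr = signal.correlate2d(X, W, mode='valid')

# A true convolution with the doubly flipped kernel gives the same result
Y_conv = signal.convolve2d(X, W[::-1, ::-1], mode='valid')

print(np.allclose(Y_corr, Y_conv))          # True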

    CNNs in more detail, simple example#

    +

Let us assume we have an input matrix \(X\) of dimensionality \(3\times 3\) and a \(2\times 2\) filter \(W\) given by the following matrices

    +
    +\[\begin{split} +\boldsymbol{X}=\begin{bmatrix}x_{00} & x_{01} & x_{02} \\ + x_{10} & x_{11} & x_{12} \\ + x_{20} & x_{21} & x_{22} \end{bmatrix}, +\end{split}\]
    +

    and

    +
    +\[\begin{split} +\boldsymbol{W}=\begin{bmatrix}w_{00} & w_{01} \\ + w_{10} & w_{11}\end{bmatrix}. +\end{split}\]
    +

We now introduce the stride hyperparameter \(S\). The stride determines how the filter \(W\) moves across the matrix \(X\) during the convolution. We strongly recommend the guide and repository on convolution arithmetic for deep learning by Dumoulin and Visin.

    +

Here we set the stride equal to \(S=1\), which means that the filter acts on one \(2\times 2\) submatrix at a time, starting with the upper left corner at \(x_{00}\) and moving column by column (and then row by row) according to the stride value.

    +

Here we perform the operation (in the cross-correlation form which, as noted above, is what most libraries implement)

\[
Y(i,j)=(X * W)(i,j) = \sum_m\sum_n X(i+m,j+n)W(m,n),
\]
    +

    and obtain

    +
    +\[\begin{split} +\boldsymbol{Y}=\begin{bmatrix}x_{00}w_{00}+x_{01}w_{01}+x_{10}w_{10}+x_{11}w_{11} & x_{01}w_{00}+x_{02}w_{01}+x_{11}w_{10}+x_{12}w_{11} \\ + x_{10}w_{00}+x_{11}w_{01}+x_{20}w_{10}+x_{21}w_{11} & x_{11}w_{00}+x_{12}w_{01}+x_{21}w_{10}+x_{22}w_{11}\end{bmatrix}. +\end{split}\]
    +

    We can rewrite this operation in terms of a matrix-vector multiplication by defining a new vector where we flatten out the inputs as a vector \(\boldsymbol{X}'\) of length \(9\) and +a matrix \(\boldsymbol{W}'\) with dimension \(4\times 9\) as

    +
    +\[\begin{split} +\boldsymbol{X}'=\begin{bmatrix}x_{00} \\ x_{01} \\ x_{02} \\ x_{10} \\ x_{11} \\ x_{12} \\ x_{20} \\ x_{21} \\ x_{22} \end{bmatrix}, +\end{split}\]
    +

    and the new matrix

    +
    +\[\begin{split} +\boldsymbol{W}'=\begin{bmatrix} w_{00} & w_{01} & 0 & w_{10} & w_{11} & 0 & 0 & 0 & 0 \\ + 0 & w_{00} & w_{01} & 0 & w_{10} & w_{11} & 0 & 0 & 0 \\ + 0 & 0 & 0 & w_{00} & w_{01} & 0 & w_{10} & w_{11} & 0 \\ + 0 & 0 & 0 & 0 & w_{00} & w_{01} & 0 & w_{10} & w_{11}\end{bmatrix}. +\end{split}\]
    +

    We see easily that performing the matrix-vector multiplication \(\boldsymbol{W}'\boldsymbol{X}'\) is the same as the above convolution with stride \(S=1\), that is

    +
    +\[ +Y=(\boldsymbol{W}*\boldsymbol{X}), +\]
    +

    is now given by \(\boldsymbol{W}'\boldsymbol{X}'\) which is a vector of length \(4\) instead of the originally resulting \(2\times 2\) output matrix.
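The equivalence can be verified with a short NumPy sketch; the input values are random and the index bookkeeping follows the \(3\times 3\) input and \(2\times 2\) filter of the example above.

import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((3, 3))             # input matrix
W = rng.standard_normal((2, 2))             # 2x2 filter

# Direct (cross-correlation) computation with stride S=1: a 2x2 output
Y = np.array([[np.sum(X[i:i + 2, j:j + 2] * W) for j in range(2)]
              for i in range(2)])

# The same operation as a matrix-vector product W' X'
X_flat = X.reshape(-1)                      # the length-9 vector X'
W_prime = np.zeros((4, 9))
for row, (i, j) in enumerate([(0, 0), (0, 1), (1, 0), (1, 1)]):
    for a in range(2):
        for b in range(2):
            # Output element (i, j) receives X(i+a, j+b) * W(a, b)
            W_prime[row, (i + a) * 3 + (j + b)] = W[a, b]

print(np.allclose(W_prime @ X_flat, Y.reshape(-1)))   # True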

    +
    +
    +

    The convolution stage#

    +

    The convolution stage, where we apply different filters \(\boldsymbol{W}\) in +order to reduce the dimensionality of an image, adds, in addition to +the weights and biases (to be trained by the back propagation +algorithm) that define the filters, two new hyperparameters, the so-called +padding \(P\) and the stride \(S\).

    +
    +
    +

    Finding the number of parameters#

    +

In the above example we have an input matrix of dimension \(3\times 3\). In general we call the input an input volume, defined by its width \(W_1\), height \(H_1\) and depth \(D_1\). If we have the standard three color channels, then \(D_1=3\).

    +

    The above example has \(W_1=H_1=3\) and \(D_1=1\).

    +

    When we introduce the filter we have the following additional hyperparameters

    +
1. \(K\), the number of filters. It is common to perform the convolution of the input several times since, by experience, shrinking the input too fast does not work well.

2. \(F\), the filter’s spatial extent.

3. \(S\), the stride parameter.

4. \(P\), the padding parameter.

    These parameters are defined by the architecture of the network and are not included in the training.

    +
    +
    +

    New image (or volume)#

    +

    Acting with the filter on the input volume produces an output volume +which is defined by its width \(W_2\), its height \(H_2\) and its depth +\(D_2\).

    +

    These are defined by the following relations

    +
    +\[ +W_2 = \frac{(W_1-F+2P)}{S}+1, +\]
    +
    +\[ +H_2 = \frac{(H_1-F+2P)}{S}+1, +\]
    +

    and \(D_2=K\).
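These relations are easy to put into a small helper function (a sketch; the function name is ours and the code simply evaluates the formulas above):

def conv_output_size(W1, F, P, S):
    """Output width (or height) from the relation W2 = (W1 - F + 2P)/S + 1."""
    out = (W1 - F + 2 * P) / S + 1
    if not float(out).is_integer():
        raise ValueError("F, P and S do not tile the input evenly")
    return int(out)

# 32x32 input, 5x5 filter, no padding, stride 1  ->  28
print(conv_output_size(32, F=5, P=0, S=1))
# The same filter with padding P=2 preserves the width  ->  32
print(conv_output_size(32, F=5, P=2, S=1))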

    +
    +
    +

    Parameters to train, common settings#

    +

With parameter sharing, each filter thus involves \(F\times F\times D_1\) weights plus one bias parameter.

    +

    In total we have

    +
\[
\left(F\times F\times D_1\right) \times K + K\ (\mathrm{biases}),
\]
    +

    parameters to train by back propagation.

    +

    It is common to let \(K\) come in powers of \(2\), that is \(32\), \(64\), \(128\) etc.

    +

    Common settings.

    +
1. \(F=3\), \(S=1\), \(P=1\)

2. \(F=5\), \(S=1\), \(P=2\)

3. \(F=5\), \(S=2\), \(P\) open

4. \(F=1\), \(S=1\), \(P=0\)
    +
    +

    Examples of CNN setups#

    +

    Let us assume we have an input volume \(V\) given by an image of dimensionality +\(32\times 32 \times 3\), that is three color channels and \(32\times 32\) pixels.

    +

    We apply a filter of dimension \(5\times 5\) ten times with stride \(S=1\) and padding \(P=0\).

    +

The output width is given by \((32-5)/1+1=28\), resulting in an output volume of dimensionality \(28\times 28\times 10\), that is ten feature maps of dimensionality \(28\times 28\) each.

    +

    The total number of parameters to train for each filter is then +\(5\times 5\times 3+1\), where the last parameter is the bias. This +gives us \(76\) parameters for each filter, leading to a total of \(760\) +parameters for the ten filters.
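A small helper function (a sketch; the function name is ours) reproduces this counting:

def conv_parameters(F, D1, K):
    """Trainable parameters of a convolutional layer with parameter sharing:
    F*F*D1 weights per filter plus one bias, for K filters."""
    return (F * F * D1 + 1) * K

# The example above: 5x5 filters, depth D1=3, K=10 filters  ->  760
print(conv_parameters(F=5, D1=3, K=10))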

    +

    How many parameters will a filter of dimensionality \(3\times 3\) +(adding color channels) result in if we produce \(32\) new images? Use \(S=1\) and \(P=0\).

    +

    Note that strides constitute a form of subsampling. As an alternative to +being interpreted as a measure of how much the kernel/filter is translated, strides +can also be viewed as how much of the output is retained. For instance, moving +the kernel by hops of two is equivalent to moving the kernel by hops of one but +retaining only odd output elements.
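The following small NumPy sketch (arbitrary one-dimensional input and filter) illustrates this equivalence between striding and subsampling the stride-one output:

import numpy as np

x = np.arange(10.0)
w = np.array([1.0, 2.0, 3.0])
m = len(w)

# Cross-correlation with stride 1 ...
y_stride1 = np.array([x[i:i + m] @ w for i in range(0, len(x) - m + 1)])
# ... and directly with stride 2
y_stride2 = np.array([x[i:i + m] @ w for i in range(0, len(x) - m + 1, 2)])

# Keeping every second element of the stride-1 output gives the stride-2 result
print(np.allclose(y_stride1[::2], y_stride2))   # True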

    +
    +
    +

    Summarizing: Performing a general discrete convolution (From Raschka et al)#


    Figure 1: A deep CNN

    +
    +
    +

    Pooling#

    +

    In addition to discrete convolutions themselves, pooling operations +make up another important building block in CNNs. Pooling operations reduce +the size of feature maps by using some function to summarize subregions, such +as taking the average or the maximum value.

    +

    Pooling works by sliding a window across the input and feeding the content of +the window to a pooling function. In some sense, pooling works very much +like a discrete convolution, but replaces the linear combination described by +the kernel with some other function.

    +
    +
    +

    Pooling arithmetic#

    +

    In a neural network, pooling layers provide invariance to small translations of +the input. The most common kind of pooling is max pooling, which +consists in splitting the input in (usually non-overlapping) patches and +outputting the maximum value of each patch. Other kinds of pooling exist, e.g., +mean or average pooling, which all share the same idea of aggregating the input +locally by applying a non-linearity to the content of some patches.
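As an illustration, here is a minimal NumPy sketch of \(2\times 2\) max pooling (and average pooling) with stride 2 on a small, arbitrary input, using a simple reshape into non-overlapping patches:

import numpy as np

X = np.arange(16.0).reshape(4, 4)

# Split X into non-overlapping 2x2 patches and take the maximum of each patch
max_pooled = X.reshape(2, 2, 2, 2).max(axis=(1, 3))
print(max_pooled)

# Average pooling only changes the summary function
mean_pooled = X.reshape(2, 2, 2, 2).mean(axis=(1, 3))
print(mean_pooled)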

    +
    +
    +

    Pooling types (From Raschka et al)#


    Figure 1: A deep CNN

    +
    +
    +

    Building convolutional neural networks using Tensorflow and Keras#

    +

    As discussed above, CNNs are neural networks built from the assumption that the inputs +to the network are 2D images. This is important because the number of features or pixels in images +grows very fast with the image size, and an enormous number of weights and biases are needed in order to build an accurate network.

    +

As before, we still have our input, a hidden layer and an output. What is novel about convolutional networks are the convolutional and pooling layers stacked in pairs between the input and the hidden layer. In addition, the data is no longer represented as a 2D feature matrix; instead, each input is a set of 2D matrices, typically one for each color channel (Red, Green, Blue).

    +
    +
    +

    Setting it up#

    +

    It means that to represent the entire +dataset of images, we require a 4D matrix or tensor. This tensor has the dimensions:

    +
    +\[ +(n_{inputs},\, n_{pixels, width},\, n_{pixels, height},\, depth) . +\]
    +
    +
    +

    The MNIST dataset again#

    +

The MNIST dataset consists of grayscale images with a pixel size of \(28\times 28\), meaning we require \(28 \times 28 = 784\) weights for each neuron in the first hidden layer.

    +

If we were to analyze images of size \(128\times 128\) we would require \(128 \times 128 = 16384\) weights for each neuron. Even worse, if we were dealing with color images, as most images are, we would have an image matrix of size \(128\times 128\) for each color channel (Red, Green, Blue), meaning \(3\times 16384 = 49152\) weights for every single neuron in the first hidden layer.

    +
    +
    +

    Strong correlations#

    +

    Images typically have strong local correlations, meaning that a small +part of the image varies little from its neighboring regions. If for +example we have an image of a blue car, we can roughly assume that a +small blue part of the image is surrounded by other blue regions.

    +

Therefore, instead of connecting every single pixel to a neuron in the first hidden layer, as we have previously done with deep neural networks, we can instead connect each neuron to a small part of the image (in all 3 RGB depth dimensions). The size of each small area is fixed and is known as the receptive field.

    +
    +
    +

    Layers of a CNN#

    +

    The layers of a convolutional neural network arrange neurons in 3D: width, height and depth.
    +The input image is typically a square matrix of depth 3.

    +

    A convolution is performed on the image which outputs +a 3D volume of neurons. The weights to the input are arranged in a number of 2D matrices, known as filters.

    +

Each filter slides along the input image, taking the dot product between each small part of the image and the filter, in all depth dimensions. The result is then passed through a non-linear function, typically the Rectified Linear Unit (ReLU), which serves as the activation of the neurons in the first convolutional layer. This is further passed through a pooling layer, which reduces the size of the convolutional layer, e.g. by taking the maximum or average across some small regions, and this serves as input to the next convolutional layer.

    +
    +
    +

    Systematic reduction#

    +

    By systematically reducing the size of the input volume, through +convolution and pooling, the network should create representations of +small parts of the input, and then from them assemble representations +of larger areas. The final pooling layer is flattened to serve as +input to a hidden layer, such that each neuron in the final pooling +layer is connected to every single neuron in the hidden layer. This +then serves as input to the output layer, e.g. a softmax output for +classification.

    +
    +
    +

    Prerequisites: Collect and pre-process data#

    +
    +
    +
    %matplotlib inline
    +
    +# import necessary packages
    +import numpy as np
    +import matplotlib.pyplot as plt
    +from sklearn import datasets
    +
    +
    +# ensure the same random numbers appear every time
    +np.random.seed(0)
    +
    +# display images in notebook
    +%matplotlib inline
    +plt.rcParams['figure.figsize'] = (12,12)
    +
    +
+# load the digits dataset from scikit-learn (8x8 grayscale images, a small MNIST-like dataset)
    +digits = datasets.load_digits()
    +
    +# define inputs and labels
    +inputs = digits.images
    +labels = digits.target
    +
    +# RGB images have a depth of 3
    +# our images are grayscale so they should have a depth of 1
    +inputs = inputs[:,:,:,np.newaxis]
    +
    +print("inputs = (n_inputs, pixel_width, pixel_height, depth) = " + str(inputs.shape))
    +print("labels = (n_inputs) = " + str(labels.shape))
    +
    +
    +# choose some random images to display
    +n_inputs = len(inputs)
    +indices = np.arange(n_inputs)
    +random_indices = np.random.choice(indices, size=5)
    +
    +for i, image in enumerate(digits.images[random_indices]):
    +    plt.subplot(1, 5, i+1)
    +    plt.axis('off')
    +    plt.imshow(image, cmap=plt.cm.gray_r, interpolation='nearest')
    +    plt.title("Label: %d" % digits.target[random_indices[i]])
    +plt.show()
    +
    +
    +
    +
    +
    +
    +

    Importing Keras and Tensorflow#

    +
    +
    +
    from tensorflow.keras import datasets, layers, models
    +from tensorflow.keras.layers import Input
    +from tensorflow.keras.models import Sequential      #This allows appending layers to existing models
    +from tensorflow.keras.layers import Dense           #This allows defining the characteristics of a particular layer
    +from tensorflow.keras import optimizers             #This allows using whichever optimiser we want (sgd,adam,RMSprop)
    +from tensorflow.keras import regularizers           #This allows using whichever regularizer we want (l1,l2,l1_l2)
    +from tensorflow.keras.utils import to_categorical   #This allows using categorical cross entropy as the cost function
    +#from tensorflow.keras import Conv2D
    +#from tensorflow.keras import MaxPooling2D
    +#from tensorflow.keras import Flatten
    +
    +from sklearn.model_selection import train_test_split
    +
    +# representation of labels
    +labels = to_categorical(labels)
    +
    +# split into train and test data
    +# one-liner from scikit-learn library
    +train_size = 0.8
    +test_size = 1 - train_size
    +X_train, X_test, Y_train, Y_test = train_test_split(inputs, labels, train_size=train_size,
    +                                                    test_size=test_size)
    +
    +
    +
    +
    +
    +
    +

    Running with Keras#

    +
    +
    +
    def create_convolutional_neural_network_keras(input_shape, receptive_field,
    +                                              n_filters, n_neurons_connected, n_categories,
    +                                              eta, lmbd):
    +    model = Sequential()
    +    model.add(layers.Conv2D(n_filters, (receptive_field, receptive_field), input_shape=input_shape, padding='same',
    +              activation='relu', kernel_regularizer=regularizers.l2(lmbd)))
    +    model.add(layers.MaxPooling2D(pool_size=(2, 2)))
    +    model.add(layers.Flatten())
    +    model.add(layers.Dense(n_neurons_connected, activation='relu', kernel_regularizer=regularizers.l2(lmbd)))
    +    model.add(layers.Dense(n_categories, activation='softmax', kernel_regularizer=regularizers.l2(lmbd)))
    +    
+    sgd = optimizers.SGD(learning_rate=eta)   # 'lr' was renamed to 'learning_rate' in newer Keras versions
    +    model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])
    +    
    +    return model
    +
    +epochs = 100
    +batch_size = 100
    +input_shape = X_train.shape[1:4]
    +receptive_field = 3
    +n_filters = 10
    +n_neurons_connected = 50
    +n_categories = 10
    +
    +eta_vals = np.logspace(-5, 1, 7)
    +lmbd_vals = np.logspace(-5, 1, 7)
    +
    +
    +
    +
    +
    +
    +

    Final part#

    +
    +
    +
    CNN_keras = np.zeros((len(eta_vals), len(lmbd_vals)), dtype=object)
    +        
    +for i, eta in enumerate(eta_vals):
    +    for j, lmbd in enumerate(lmbd_vals):
    +        CNN = create_convolutional_neural_network_keras(input_shape, receptive_field,
    +                                              n_filters, n_neurons_connected, n_categories,
    +                                              eta, lmbd)
    +        CNN.fit(X_train, Y_train, epochs=epochs, batch_size=batch_size, verbose=0)
    +        scores = CNN.evaluate(X_test, Y_test)
    +        
    +        CNN_keras[i][j] = CNN
    +        
    +        print("Learning rate = ", eta)
    +        print("Lambda = ", lmbd)
    +        print("Test accuracy: %.3f" % scores[1])
    +        print()
    +
    +
    +
    +
    +
    +
    +

    Final visualization#

    +
    +
    +
    # visual representation of grid search
    +# uses seaborn heatmap, could probably do this in matplotlib
    +import seaborn as sns
    +
    +sns.set()
    +
    +train_accuracy = np.zeros((len(eta_vals), len(lmbd_vals)))
    +test_accuracy = np.zeros((len(eta_vals), len(lmbd_vals)))
    +
    +for i in range(len(eta_vals)):
    +    for j in range(len(lmbd_vals)):
    +        CNN = CNN_keras[i][j]
    +
    +        train_accuracy[i][j] = CNN.evaluate(X_train, Y_train)[1]
    +        test_accuracy[i][j] = CNN.evaluate(X_test, Y_test)[1]
    +
    +        
    +fig, ax = plt.subplots(figsize = (10, 10))
    +sns.heatmap(train_accuracy, annot=True, ax=ax, cmap="viridis")
    +ax.set_title("Training Accuracy")
    +ax.set_ylabel("$\eta$")
    +ax.set_xlabel("$\lambda$")
    +plt.show()
    +
    +fig, ax = plt.subplots(figsize = (10, 10))
    +sns.heatmap(test_accuracy, annot=True, ax=ax, cmap="viridis")
    +ax.set_title("Test Accuracy")
    +ax.set_ylabel("$\eta$")
    +ax.set_xlabel("$\lambda$")
    +plt.show()
    +
    +
    +
    +
    +
    +
    +

The CIFAR10 data set#

    +

    The CIFAR10 dataset contains 60,000 color images in 10 classes, with +6,000 images in each class. The dataset is divided into 50,000 +training images and 10,000 testing images. The classes are mutually +exclusive and there is no overlap between them.

    +
    +
    +
    import tensorflow as tf
    +
    +from tensorflow.keras import datasets, layers, models
    +import matplotlib.pyplot as plt
    +
    +# We import the data set
    +(train_images, train_labels), (test_images, test_labels) = datasets.cifar10.load_data()
    +
    +# Normalize pixel values to be between 0 and 1 by dividing by 255. 
    +train_images, test_images = train_images / 255.0, test_images / 255.0
    +
    +
    +
    +
    +
    +
    +

    Verifying the data set#

    +

    To verify that the dataset looks correct, let’s plot the first 25 images from the training set and display the class name below each image.

    +
    +
    +
    class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer',
    +               'dog', 'frog', 'horse', 'ship', 'truck']
    +plt.figure(figsize=(10,10))
    +for i in range(25):
    +    plt.subplot(5,5,i+1)
    +    plt.xticks([])
    +    plt.yticks([])
    +    plt.grid(False)
    +    plt.imshow(train_images[i], cmap=plt.cm.binary)
    +    # The CIFAR labels happen to be arrays, 
    +    # which is why you need the extra index
    +    plt.xlabel(class_names[train_labels[i][0]])
    +plt.show()
    +
    +
    +
    +
    +
    +
    +

    Set up the model#

    +

    The 6 lines of code below define the convolutional base using a common pattern: a stack of Conv2D and MaxPooling2D layers.

    +

    As input, a CNN takes tensors of shape (image_height, image_width, color_channels), ignoring the batch size. If you are new to these dimensions, color_channels refers to (R,G,B). In this example, you will configure our CNN to process inputs of shape (32, 32, 3), which is the format of CIFAR images. You can do this by passing the argument input_shape to our first layer.

    +
    +
    +
    model = models.Sequential()
    +model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)))
    +model.add(layers.MaxPooling2D((2, 2)))
    +model.add(layers.Conv2D(64, (3, 3), activation='relu'))
    +model.add(layers.MaxPooling2D((2, 2)))
    +model.add(layers.Conv2D(64, (3, 3), activation='relu'))
    +
    +# Let's display the architecture of our model so far.
    +
    +model.summary()
    +
    +
    +
    +
    +

    You can see that the output of every Conv2D and MaxPooling2D layer is a 3D tensor of shape (height, width, channels). The width and height dimensions tend to shrink as you go deeper in the network. The number of output channels for each Conv2D layer is controlled by the first argument (e.g., 32 or 64). Typically, as the width and height shrink, you can afford (computationally) to add more output channels in each Conv2D layer.

    +
    +
    +

    Add Dense layers on top#

    +

To complete our model, you will feed the last output tensor from the convolutional base (of shape (4, 4, 64)) into one or more Dense layers to perform classification. Dense layers take vectors as input (which are 1D), while the current output is a 3D tensor. First, you will flatten (or unroll) the 3D output to 1D, then add one or more Dense layers on top. CIFAR has 10 output classes, so you use a final Dense layer with 10 outputs. Note that in the code below this layer outputs raw logits (no softmax activation), which is why the loss function is used with from_logits=True.

    +
    +
    +
    model.add(layers.Flatten())
    +model.add(layers.Dense(64, activation='relu'))
    +model.add(layers.Dense(10))
+# Here's the complete architecture of our model.
    +
    +model.summary()
    +
    +
    +
    +
    +

    As you can see, our (4, 4, 64) outputs were flattened into vectors of shape (1024) before going through two Dense layers.

    +
    +
    +

    Compile and train the model#

    +
    +
    +
    model.compile(optimizer='adam',
    +              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    +              metrics=['accuracy'])
    +
    +history = model.fit(train_images, train_labels, epochs=10, 
    +                    validation_data=(test_images, test_labels))
    +
    +
    +
    +
    +
    +
    +

    Finally, evaluate the model#

    +
    +
    +
    plt.plot(history.history['accuracy'], label='accuracy')
    +plt.plot(history.history['val_accuracy'], label = 'val_accuracy')
    +plt.xlabel('Epoch')
    +plt.ylabel('Accuracy')
    +plt.ylim([0.5, 1])
    +plt.legend(loc='lower right')
    +
    +test_loss, test_acc = model.evaluate(test_images,  test_labels, verbose=2)
    +
    +print(test_acc)
    +
    +
    +
    +
    +
    +
    +

    Building code using Pytorch#

    +

    This code loads and normalizes the MNIST dataset. Thereafter it defines a CNN architecture with:

    +
1. Two convolutional layers

2. Max pooling

3. Dropout for regularization

4. Two fully connected layers

It uses the Adam optimizer and the cross-entropy cost function, and it trains for 10 epochs. You can modify the architecture (number of layers, channels, dropout rate) or the training parameters (learning rate, batch size, epochs) to experiment with different configurations.

    +
    +
    +
    import torch
    +import torch.nn as nn
    +import torch.nn.functional as F
    +import torch.optim as optim
    +from torchvision import datasets, transforms
    +
    +# Set device
    +device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    +
    +# Define transforms
    +transform = transforms.Compose([
    +   transforms.ToTensor(),
    +   transforms.Normalize((0.1307,), (0.3081,))
    +])
    +
    +# Load datasets
    +train_dataset = datasets.MNIST(root='./data', train=True, download=True, transform=transform)
    +test_dataset = datasets.MNIST(root='./data', train=False, download=True, transform=transform)
    +
    +# Create data loaders
    +train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=64, shuffle=True)
    +test_loader = torch.utils.data.DataLoader(test_dataset, batch_size=64, shuffle=False)
    +
    +# Define CNN model
    +class CNN(nn.Module):
    +   def __init__(self):
    +       super(CNN, self).__init__()
    +       self.conv1 = nn.Conv2d(1, 32, 3, padding=1)
    +       self.conv2 = nn.Conv2d(32, 64, 3, padding=1)
    +       self.pool = nn.MaxPool2d(2, 2)
    +       self.fc1 = nn.Linear(64*7*7, 1024)
    +       self.fc2 = nn.Linear(1024, 10)
    +       self.dropout = nn.Dropout(0.5)
    +
    +   def forward(self, x):
    +       x = self.pool(F.relu(self.conv1(x)))
    +       x = self.pool(F.relu(self.conv2(x)))
    +       x = x.view(-1, 64*7*7)
    +       x = self.dropout(F.relu(self.fc1(x)))
    +       x = self.fc2(x)
    +       return x
    +
    +# Initialize model, loss function, and optimizer
    +model = CNN().to(device)
    +criterion = nn.CrossEntropyLoss()
    +optimizer = optim.Adam(model.parameters(), lr=0.001)
    +
    +# Training loop
    +num_epochs = 10
    +for epoch in range(num_epochs):
    +   model.train()
    +   running_loss = 0.0
    +   for batch_idx, (data, target) in enumerate(train_loader):
    +       data, target = data.to(device), target.to(device)
    +       optimizer.zero_grad()
    +       outputs = model(data)
    +       loss = criterion(outputs, target)
    +       loss.backward()
    +       optimizer.step()
    +       running_loss += loss.item()
    +
    +   print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {running_loss/len(train_loader):.4f}')
    +
    +# Testing the model
    +model.eval()
    +correct = 0
    +total = 0
    +with torch.no_grad():
    +   for data, target in test_loader:
    +       data, target = data.to(device), target.to(device)
    +       outputs = model(data)
    +       _, predicted = torch.max(outputs.data, 1)
    +       total += target.size(0)
    +       correct += (predicted == target).sum().item()
    +
    +print(f'Test Accuracy: {100 * correct / total:.2f}%')
    +
    +
    +
    +
    +
    +
    +
    + + \ No newline at end of file diff --git a/doc/LectureNotes/_build/jupyter_execute/.venv/lib/python3.13/site-packages/jupyter_book/book_template/markdown-notebooks.ipynb b/doc/LectureNotes/_build/jupyter_execute/.venv/lib/python3.13/site-packages/jupyter_book/book_template/markdown-notebooks.ipynb new file mode 100644 index 000000000..ff55b6fa1 --- /dev/null +++ b/doc/LectureNotes/_build/jupyter_execute/.venv/lib/python3.13/site-packages/jupyter_book/book_template/markdown-notebooks.ipynb @@ -0,0 +1,85 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "1232311e", + "metadata": {}, + "source": [ + "# Notebooks with MyST Markdown\n", + "\n", + "Jupyter Book also lets you write text-based notebooks using MyST Markdown.\n", + "See [the Notebooks with MyST Markdown documentation](https://jupyterbook.org/file-types/myst-notebooks.html) for more detailed instructions.\n", + "This page shows off a notebook written in MyST Markdown.\n", + "\n", + "## An example cell\n", + "\n", + "With MyST Markdown, you can define code cells with a directive like so:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f961e284", + "metadata": {}, + "outputs": [], + "source": [ + "print(2 + 2)" + ] + }, + { + "cell_type": "markdown", + "id": "3b5f5a93", + "metadata": {}, + "source": [ + "When your book is built, the contents of any `{code-cell}` blocks will be\n", + "executed with your default Jupyter kernel, and their outputs will be displayed\n", + "in-line with the rest of your content.\n", + "\n", + "```{seealso}\n", + "Jupyter Book uses [Jupytext](https://jupytext.readthedocs.io/en/latest/) to convert text-based files to notebooks, and can support [many other text-based notebook files](https://jupyterbook.org/file-types/jupytext.html).\n", + "```\n", + "\n", + "## Create a notebook with MyST Markdown\n", + "\n", + "MyST Markdown notebooks are defined by two things:\n", + "\n", + "1. YAML metadata that is needed to understand if / how it should convert text files to notebooks (including information about the kernel needed).\n", + " See the YAML at the top of this page for example.\n", + "2. 
The presence of `{code-cell}` directives, which will be executed with your book.\n", + "\n", + "That's all that is needed to get started!\n", + "\n", + "## Quickly add YAML metadata for MyST Notebooks\n", + "\n", + "If you have a markdown file and you'd like to quickly add YAML metadata to it, so that Jupyter Book will treat it as a MyST Markdown Notebook, run the following command:\n", + "\n", + "```\n", + "jupyter-book myst init path/to/markdownfile.md\n", + "```" + ] + } + ], + "metadata": { + "jupytext": { + "formats": "md:myst", + "text_representation": { + "extension": ".md", + "format_name": "myst", + "format_version": 0.13, + "jupytext_version": "1.11.5" + } + }, + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "source_map": [ + 13, + 25, + 27 + ] + }, + "nbformat": 4, + "nbformat_minor": 5 +} \ No newline at end of file diff --git a/doc/LectureNotes/_build/jupyter_execute/.venv/lib/python3.13/site-packages/jupyter_book/book_template/notebooks.ipynb b/doc/LectureNotes/_build/jupyter_execute/.venv/lib/python3.13/site-packages/jupyter_book/book_template/notebooks.ipynb new file mode 100644 index 000000000..1e007e192 --- /dev/null +++ b/doc/LectureNotes/_build/jupyter_execute/.venv/lib/python3.13/site-packages/jupyter_book/book_template/notebooks.ipynb @@ -0,0 +1,122 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Content with notebooks\n", + "\n", + "You can also create content with Jupyter Notebooks. This means that you can include\n", + "code blocks and their outputs in your book.\n", + "\n", + "## Markdown + notebooks\n", + "\n", + "As it is markdown, you can embed images, HTML, etc into your posts!\n", + "\n", + "![](https://myst-parser.readthedocs.io/en/latest/_static/logo-wide.svg)\n", + "\n", + "You can also $add_{math}$ and\n", + "\n", + "$$\n", + "math^{blocks}\n", + "$$\n", + "\n", + "or\n", + "\n", + "$$\n", + "\\begin{aligned}\n", + "\\mbox{mean} la_{tex} \\\\ \\\\\n", + "math blocks\n", + "\\end{aligned}\n", + "$$\n", + "\n", + "But make sure you \\$Escape \\$your \\$dollar signs \\$you want to keep!\n", + "\n", + "## MyST markdown\n", + "\n", + "MyST markdown works in Jupyter Notebooks as well. 
For more information about MyST markdown, check\n", + "out [the MyST guide in Jupyter Book](https://jupyterbook.org/content/myst.html),\n", + "or see [the MyST markdown documentation](https://myst-parser.readthedocs.io/en/latest/).\n", + "\n", + "## Code blocks and outputs\n", + "\n", + "Jupyter Book will also embed your code blocks and output in your book.\n", + "For example, here's some sample Matplotlib code:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from matplotlib import rcParams, cycler\n", + "import matplotlib.pyplot as plt\n", + "import numpy as np\n", + "plt.ion()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Fixing random state for reproducibility\n", + "np.random.seed(19680801)\n", + "\n", + "N = 10\n", + "data = [np.logspace(0, 1, 100) + np.random.randn(100) + ii for ii in range(N)]\n", + "data = np.array(data).T\n", + "cmap = plt.cm.coolwarm\n", + "rcParams['axes.prop_cycle'] = cycler(color=cmap(np.linspace(0, 1, N)))\n", + "\n", + "\n", + "from matplotlib.lines import Line2D\n", + "custom_lines = [Line2D([0], [0], color=cmap(0.), lw=4),\n", + " Line2D([0], [0], color=cmap(.5), lw=4),\n", + " Line2D([0], [0], color=cmap(1.), lw=4)]\n", + "\n", + "fig, ax = plt.subplots(figsize=(10, 5))\n", + "lines = ax.plot(data)\n", + "ax.legend(custom_lines, ['Cold', 'Medium', 'Hot']);" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "There is a lot more that you can do with outputs (such as including interactive outputs)\n", + "with your book. For more information about this, see [the Jupyter Book documentation](https://jupyterbook.org)" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.8.0" + }, + "widgets": { + "application/vnd.jupyter.widget-state+json": { + "state": {}, + "version_major": 2, + "version_minor": 0 + } + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} \ No newline at end of file diff --git a/doc/LectureNotes/_build/jupyter_execute/exercisesweek35.ipynb b/doc/LectureNotes/_build/jupyter_execute/exercisesweek35.ipynb index e88352077..4c74687f4 100644 --- a/doc/LectureNotes/_build/jupyter_execute/exercisesweek35.ipynb +++ b/doc/LectureNotes/_build/jupyter_execute/exercisesweek35.ipynb @@ -323,7 +323,7 @@ "source": [ "n = 100\n", "x = np.linspace(-3, 3, n)\n", - "y = np.exp(-x**2) + 1.5 * np.exp(-(x-2)**2) + np.random.normal(0, 1.0)" + "y = np.exp(-x**2) + 1.5 * np.exp(-(x-2)**2) + np.random.normal(n)" ] }, { diff --git a/doc/LectureNotes/_build/jupyter_execute/exercisesweek36.ipynb b/doc/LectureNotes/_build/jupyter_execute/exercisesweek36.ipynb index 820e0a768..2b233c4e1 100644 --- a/doc/LectureNotes/_build/jupyter_execute/exercisesweek36.ipynb +++ b/doc/LectureNotes/_build/jupyter_execute/exercisesweek36.ipynb @@ -172,7 +172,7 @@ "source": [ "n = 100\n", "x = np.linspace(-3, 3, n)\n", - "y = np.exp(-x**2) + 1.5 * np.exp(-(x-2)**2) + np.random.normal(0, 0.1)" + "y = np.exp(-x**2) + 1.5 * np.exp(-(x-2)**2) + np.random.normal(n)" ] }, { diff --git a/doc/LectureNotes/_build/jupyter_execute/exercisesweek37.ipynb b/doc/LectureNotes/_build/jupyter_execute/exercisesweek37.ipynb index 
23a5a9d27..b7689e250 100644 --- a/doc/LectureNotes/_build/jupyter_execute/exercisesweek37.ipynb +++ b/doc/LectureNotes/_build/jupyter_execute/exercisesweek37.ipynb @@ -2,32 +2,33 @@ "cells": [ { "cell_type": "markdown", - "id": "1b941c35", + "id": "8e6632a0", "metadata": { "editable": true }, "source": [ "\n", - "" + "\n" ] }, { "cell_type": "markdown", - "id": "dc05b096", + "id": "82705c4f", "metadata": { "editable": true }, "source": [ "# Exercises week 37\n", + "\n", "**Implementing gradient descent for Ridge and ordinary Least Squares Regression**\n", "\n", - "Date: **September 8-12, 2025**" + "Date: **September 8-12, 2025**\n" ] }, { "cell_type": "markdown", - "id": "2cf07405", + "id": "921bf331", "metadata": { "editable": true }, @@ -35,55 +36,56 @@ "## Learning goals\n", "\n", "After having completed these exercises you will have:\n", + "\n", "1. Your own code for the implementation of the simplest gradient descent approach applied to ordinary least squares (OLS) and Ridge regression\n", "\n", "2. Be able to compare the analytical expressions for OLS and Ridge regression with the gradient descent approach\n", "\n", "3. Explore the role of the learning rate in the gradient descent approach and the hyperparameter $\\lambda$ in Ridge regression\n", "\n", - "4. Scale the data properly" + "4. Scale the data properly\n" ] }, { "cell_type": "markdown", - "id": "3c139edb", + "id": "adff65d5", "metadata": { "editable": true }, "source": [ "## Simple one-dimensional second-order polynomial\n", "\n", - "We start with a very simple function" + "We start with a very simple function\n" ] }, { "cell_type": "markdown", - "id": "aad4cfac", + "id": "70418b3d", "metadata": { "editable": true }, "source": [ "$$\n", "f(x)= 2-x+5x^2,\n", - "$$" + "$$\n" ] }, { "cell_type": "markdown", - "id": "6682282f", + "id": "11a3cf73", "metadata": { "editable": true }, "source": [ - "defined for $x\\in [-2,2]$. You can add noise if you wish. \n", + "defined for $x\\in [-2,2]$. You can add noise if you wish.\n", "\n", "We are going to fit this function with a polynomial ansatz. The easiest thing is to set up a second-order polynomial and see if you can fit the above function.\n", - "Feel free to play around with higher-order polynomials." + "Feel free to play around with higher-order polynomials.\n" ] }, { "cell_type": "markdown", - "id": "89e2f4c4", + "id": "04a06b51", "metadata": { "editable": true }, @@ -94,12 +96,12 @@ "standardize the features. This ensures all features are on a\n", "comparable scale, which is especially important when using\n", "regularization. Here we will perform standardization, scaling each\n", - "feature to have mean 0 and standard deviation 1." + "feature to have mean 0 and standard deviation 1.\n" ] }, { "cell_type": "markdown", - "id": "b06d4e53", + "id": "408db3d9", "metadata": { "editable": true }, @@ -114,13 +116,13 @@ "term, the data is shifted such that the intercept is effectively 0\n", ". (In practice, one could include an intercept in the model and not\n", "penalize it, but here we simplify by centering.)\n", - "Choose $n=100$ data points and set up $\\boldsymbol{x}, $\\boldsymbol{y}$ and the design matrix $\\boldsymbol{X}$." 
+ "Choose $n=100$ data points and set up $\\boldsymbol{x}$, $\\boldsymbol{y}$ and the design matrix $\\boldsymbol{X}$.\n" ] }, { "cell_type": "code", "execution_count": 1, - "id": "63796480", + "id": "37fb732c", "metadata": { "collapsed": false, "editable": true @@ -140,46 +142,46 @@ }, { "cell_type": "markdown", - "id": "80748600", + "id": "d861e1e3", "metadata": { "editable": true }, "source": [ - "Fill in the necessary details.\n", + "Fill in the necessary details. Do we need to center the $y$-values?\n", "\n", "After this preprocessing, each column of $\\boldsymbol{X}_{\\mathrm{norm}}$ has mean zero and standard deviation $1$\n", "and $\\boldsymbol{y}_{\\mathrm{centered}}$ has mean 0. This makes the optimization landscape\n", "nicer and ensures the regularization penalty $\\lambda \\sum_j\n", "\\theta_j^2$ in Ridge regression treats each coefficient fairly (since features are on the\n", - "same scale)." + "same scale).\n" ] }, { "cell_type": "markdown", - "id": "92751e5f", + "id": "b3e774d0", "metadata": { "editable": true }, "source": [ "## Exercise 2, calculate the gradients\n", "\n", - "Find the gradients for OLS and Ridge regression using the mean-squared error as cost/loss function." + "Find the gradients for OLS and Ridge regression using the mean-squared error as cost/loss function.\n" ] }, { "cell_type": "markdown", - "id": "aedfbd7a", + "id": "d5dc7708", "metadata": { "editable": true }, "source": [ - "## Exercise 3, using the analytical formulae for OLS and Ridge regression to find the optimal paramters $\\boldsymbol{\\theta}$" + "## Exercise 3, using the analytical formulae for OLS and Ridge regression to find the optimal paramters $\\boldsymbol{\\theta}$\n" ] }, { "cell_type": "code", "execution_count": 2, - "id": "5d1288fa", + "id": "4c9c86ac", "metadata": { "collapsed": false, "editable": true @@ -187,7 +189,9 @@ "outputs": [], "source": [ "# Set regularization parameter, either a single value or a vector of values\n", - "lambda = ?\n", + "# Note that lambda is a python keyword. The lambda keyword is used to create small, single-expression functions without a formal name. These are often called \"anonymous functions\" or \"lambda functions.\"\n", + "lam = ?\n", + "\n", "\n", "# Analytical form for OLS and Ridge solution: theta_Ridge = (X^T X + lambda * I)^{-1} X^T y and theta_OLS = (X^T X)^{-1} X^T y\n", "I = np.eye(n_features)\n", @@ -200,7 +204,7 @@ }, { "cell_type": "markdown", - "id": "628f5e89", + "id": "eeae00fd", "metadata": { "editable": true }, @@ -208,37 +212,37 @@ "This computes the Ridge and OLS regression coefficients directly. The identity\n", "matrix $I$ has the same size as $X^T X$. It adds $\\lambda$ to the diagonal of $X^T X$ for Ridge regression. We\n", "then invert this matrix and multiply by $X^T y$. The result\n", - "for $\\boldsymbol{\\theta}$ is a NumPy array of shape (n$\\_$features,) containing the\n", - "fitted parameters $\\boldsymbol{\\theta}$." + "for $\\boldsymbol{\\theta}$ is a NumPy array of shape (n$\\_$features,) containing the\n", + "fitted parameters $\\boldsymbol{\\theta}$.\n" ] }, { "cell_type": "markdown", - "id": "f115ba4e", + "id": "e1c215d5", "metadata": { "editable": true }, "source": [ "### 3a)\n", "\n", - "Finalize, in the above code, the OLS and Ridge regression determination of the optimal parameters $\\boldsymbol{\\theta}$." 
+ "Finalize, in the above code, the OLS and Ridge regression determination of the optimal parameters $\\boldsymbol{\\theta}$.\n" ] }, { "cell_type": "markdown", - "id": "a9b5189c", + "id": "587dd3dc", "metadata": { "editable": true }, "source": [ "### 3b)\n", "\n", - "Explore the results as function of different values of the hyperparameter $\\lambda$. See for example exercise 4 from week 36." + "Explore the results as function of different values of the hyperparameter $\\lambda$. See for example exercise 4 from week 36.\n" ] }, { "cell_type": "markdown", - "id": "a3969ff6", + "id": "bfa34697", "metadata": { "editable": true }, @@ -250,15 +254,15 @@ "necessary if $n$ and $p$ are so large that the closed-form might be\n", "too slow or memory-intensive. We derive the gradients from the cost\n", "functions defined above. Use the gradients of the Ridge and OLS cost functions with respect to\n", - "the parameters $\\boldsymbol{\\theta}$ and set up (using the template below) your own gradient descent code for OLS and Ridge regression.\n", + "the parameters $\\boldsymbol{\\theta}$ and set up (using the template below) your own gradient descent code for OLS and Ridge regression.\n", "\n", - "Below is a template code for gradient descent implementation of ridge:" + "Below is a template code for gradient descent implementation of ridge:\n" ] }, { "cell_type": "code", "execution_count": 3, - "id": "34d87303", + "id": "49245f55", "metadata": { "collapsed": false, "editable": true @@ -273,19 +277,8 @@ "# Initialize weights for gradient descent\n", "theta = np.zeros(n_features)\n", "\n", - "# Arrays to store history for plotting\n", - "cost_history = np.zeros(num_iters)\n", - "\n", "# Gradient descent loop\n", - "m = n_samples # number of data points\n", "for t in range(num_iters):\n", - " # Compute prediction error\n", - " error = X_norm.dot(theta) - y_centered \n", - " # Compute cost for OLS and Ridge (MSE + regularization for Ridge) for monitoring\n", - " cost_OLS = ?\n", - " cost_Ridge = ?\n", - " # You could add a history for both methods (optional)\n", - " cost_history[t] = ?\n", " # Compute gradients for OSL and Ridge\n", " grad_OLS = ?\n", " grad_Ridge = ?\n", @@ -302,31 +295,33 @@ }, { "cell_type": "markdown", - "id": "989f70bb", + "id": "f3f43f2c", "metadata": { "editable": true }, "source": [ "### 4a)\n", "\n", - "Discuss the results as function of the learning rate parameters and the number of iterations." + "Write first a gradient descent code for OLS only using the above template.\n", + "Discuss the results as function of the learning rate parameters and the number of iterations\n" ] }, { "cell_type": "markdown", - "id": "370b2dad", + "id": "9ba303be", "metadata": { "editable": true }, "source": [ "### 4b)\n", "\n", - "Try to add a stopping parameter as function of the number iterations and the difference between the new and old $\\theta$ values. How would you define a stopping criterion?" + "Write then a similar code for Ridge regression using the above template.\n", + "Try to add a stopping parameter as function of the number iterations and the difference between the new and old $\\theta$ values. How would you define a stopping criterion?\n" ] }, { "cell_type": "markdown", - "id": "ef197cd7", + "id": "78362c6c", "metadata": { "editable": true }, @@ -346,13 +341,13 @@ "Then we sample feature values for $\\boldsymbol{X}$ randomly (e.g. from a normal distribution). 
We use a normal distribution so features are roughly centered around 0.\n", "Then we compute the target values $y$ using the linear combination $\\boldsymbol{X}\\hat{\\boldsymbol{\\theta}}$ and add some noise (to simulate measurement error or unexplained variance).\n", "\n", - "Below is the code to generate the dataset:" + "Below is the code to generate the dataset:\n" ] }, { "cell_type": "code", - "execution_count": 4, - "id": "4ccc2f65", + "execution_count": null, + "id": "8be1cebe", "metadata": { "collapsed": false, "editable": true @@ -375,13 +370,13 @@ "X = np.random.randn(n_samples, n_features) # standard normal distribution\n", "\n", "# Generate target values y with a linear combination of X and theta_true, plus noise\n", - "noise = 0.5 * np.random.randn(n_samples) # Gaussian noise\n", + "noise = 0.5 * np.random.randn(n_samples) # Gaussian noise\n", "y = X.dot @ theta_true + noise" ] }, { "cell_type": "markdown", - "id": "00e279ef", + "id": "e2693666", "metadata": { "editable": true }, @@ -390,29 +385,29 @@ "significantly influence $\\boldsymbol{y}$. The rest of the features have zero true\n", "coefficient. For example, feature 0 has\n", "a true weight of 5.0, feature 1 has -3.0, and feature 6 has 2.0, so\n", - "the expected relationship is:" + "the expected relationship is:\n" ] }, { "cell_type": "markdown", - "id": "c910b3f4", + "id": "bc954d12", "metadata": { "editable": true }, "source": [ "$$\n", "y \\approx 5 \\times x_0 \\;-\\; 3 \\times x_1 \\;+\\; 2 \\times x_6 \\;+\\; \\text{noise}.\n", - "$$" + "$$\n" ] }, { "cell_type": "markdown", - "id": "89e6e040", + "id": "6534b610", "metadata": { "editable": true }, "source": [ - "You can remove the noise if you wish to. \n", + "You can remove the noise if you wish to.\n", "\n", "Try to fit the above data set using OLS and Ridge regression with the analytical expressions and your own gradient descent codes.\n", "\n", @@ -420,11 +415,15 @@ "close to the true values [5.0, -3.0, 0.0, …, 2.0, …] that we used to\n", "generate the data. Keep in mind that due to regularization and noise,\n", "the learned values will not exactly equal the true ones, but they\n", - "should be in the same ballpark. Which method (OLS or Ridge) gives the best results?" + "should be in the same ballpark. Which method (OLS or Ridge) gives the best results?\n" ] } ], - "metadata": {}, + "metadata": { + "language_info": { + "name": "python" + } + }, "nbformat": 4, "nbformat_minor": 5 } \ No newline at end of file diff --git a/doc/LectureNotes/_build/jupyter_execute/exercisesweek38.ipynb b/doc/LectureNotes/_build/jupyter_execute/exercisesweek38.ipynb new file mode 100644 index 000000000..93d8969c2 --- /dev/null +++ b/doc/LectureNotes/_build/jupyter_execute/exercisesweek38.ipynb @@ -0,0 +1,485 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "1da77599", + "metadata": {}, + "source": [ + "# Exercises week 38\n", + "\n", + "## September 15-19\n", + "\n", + "## Resampling and the Bias-Variance Trade-off\n" + ] + }, + { + "cell_type": "markdown", + "id": "e9f27b0e", + "metadata": {}, + "source": [ + "### Learning goals\n", + "\n", + "After completing these exercises, you will know how to\n", + "\n", + "- Derive expectation and variances values related to linear regression\n", + "- Compute expectation and variances values related to linear regression\n", + "- Compute and evaluate the trade-off between bias and variance of a model\n", + "\n", + "### Deliverables\n", + "\n", + "Complete the following exercises while working in a jupyter notebook. 
Then, in canvas, include\n", + "\n", + "- The jupyter notebook with the exercises completed\n", + "- An exported PDF of the notebook (https://code.visualstudio.com/docs/datascience/jupyter-notebooks#_export-your-jupyter-notebook)\n" + ] + }, + { + "cell_type": "markdown", + "id": "984af8e3", + "metadata": {}, + "source": [ + "## Use the books!\n", + "\n", + "This week deals with various mean values and variances in linear regression methods (here it may be useful to look up chapter 3, equation (3.8) of [Trevor Hastie, Robert Tibshirani, Jerome H. Friedman, The Elements of Statistical Learning, Springer](https://www.springer.com/gp/book/9780387848570)).\n", + "\n", + "For more discussions on Ridge regression and calculation of expectation values, [Wessel van Wieringen's](https://arxiv.org/abs/1509.09169) article is highly recommended.\n", + "\n", + "The exercises this week are also a part of project 1 and can be reused in the theory part of the project.\n", + "\n", + "### Definitions\n", + "\n", + "We assume that there exists a continuous function $f(\\boldsymbol{x})$ and a normal distributed error $\\boldsymbol{\\varepsilon}\\sim N(0, \\sigma^2)$ which describes our data\n" + ] + }, + { + "cell_type": "markdown", + "id": "c16f7d0e", + "metadata": {}, + "source": [ + "$$\n", + "\\boldsymbol{y} = f(\\boldsymbol{x})+\\boldsymbol{\\varepsilon}\n", + "$$\n" + ] + }, + { + "cell_type": "markdown", + "id": "9fcf981a", + "metadata": { + "vscode": { + "languageId": "plaintext" + } + }, + "source": [ + "We further assume that this continous function can be modeled with a linear model $\\mathbf{\\tilde{y}}$ of some features $\\mathbf{X}$.\n" + ] + }, + { + "cell_type": "markdown", + "id": "d4189366", + "metadata": {}, + "source": [ + "$$\n", + "\\boldsymbol{y} = \\boldsymbol{\\tilde{y}} + \\boldsymbol{\\varepsilon} = \\boldsymbol{X}\\boldsymbol{\\beta} +\\boldsymbol{\\varepsilon}\n", + "$$\n" + ] + }, + { + "cell_type": "markdown", + "id": "f4fca21b", + "metadata": {}, + "source": [ + "We therefore get that our data $\\boldsymbol{y}$ has an expectation value $\\boldsymbol{X}\\boldsymbol{\\beta}$ and variance $\\sigma^2$, that is $\\boldsymbol{y}$ follows a normal distribution with mean value $\\boldsymbol{X}\\boldsymbol{\\beta}$ and variance $\\sigma^2$.\n" + ] + }, + { + "cell_type": "markdown", + "id": "5de0c7e6", + "metadata": {}, + "source": [ + "## Exercise 1: Expectation values for ordinary least squares expressions\n" + ] + }, + { + "cell_type": "markdown", + "id": "d878c699", + "metadata": {}, + "source": [ + "**a)** With the expressions for the optimal parameters $\\boldsymbol{\\hat{\\beta}_{OLS}}$ show that\n" + ] + }, + { + "cell_type": "markdown", + "id": "08b7007d", + "metadata": {}, + "source": [ + "$$\n", + "\\mathbb{E}(\\boldsymbol{\\hat{\\beta}_{OLS}}) = \\boldsymbol{\\beta}.\n", + "$$\n" + ] + }, + { + "cell_type": "markdown", + "id": "46e93394", + "metadata": {}, + "source": [ + "**b)** Show that the variance of $\\boldsymbol{\\hat{\\beta}_{OLS}}$ is\n" + ] + }, + { + "cell_type": "markdown", + "id": "be1b65be", + "metadata": {}, + "source": [ + "$$\n", + "\\mathbf{Var}(\\boldsymbol{\\hat{\\beta}_{OLS}}) = \\sigma^2 \\, (\\mathbf{X}^{T} \\mathbf{X})^{-1}.\n", + "$$\n" + ] + }, + { + "cell_type": "markdown", + "id": "d2143684", + "metadata": {}, + "source": [ + "We can use the last expression when we define a [confidence interval](https://en.wikipedia.org/wiki/Confidence_interval) for the parameters $\\boldsymbol{\\hat{\\beta}_{OLS}}$.\n", + "A given parameter 
${\\boldsymbol{\\hat{\\beta}_{OLS}}}_j$ is given by the diagonal matrix element of the above matrix.\n" + ] + }, + { + "cell_type": "markdown", + "id": "f5c2dc22", + "metadata": {}, + "source": [ + "## Exercise 2: Expectation values for Ridge regression\n" + ] + }, + { + "cell_type": "markdown", + "id": "3893e3e7", + "metadata": {}, + "source": [ + "**a)** With the expressions for the optimal parameters $\\boldsymbol{\\hat{\\beta}_{Ridge}}$ show that\n" + ] + }, + { + "cell_type": "markdown", + "id": "79dc571f", + "metadata": {}, + "source": [ + "$$\n", + "\\mathbb{E} \\big[ \\hat{\\boldsymbol{\\beta}}^{\\mathrm{Ridge}} \\big]=(\\mathbf{X}^{T} \\mathbf{X} + \\lambda \\mathbf{I}_{pp})^{-1} (\\mathbf{X}^{\\top} \\mathbf{X})\\boldsymbol{\\beta}\n", + "$$\n" + ] + }, + { + "cell_type": "markdown", + "id": "028209a1", + "metadata": {}, + "source": [ + "We see that $\\mathbb{E} \\big[ \\hat{\\boldsymbol{\\beta}}^{\\mathrm{Ridge}} \\big] \\not= \\mathbb{E} \\big[\\hat{\\boldsymbol{\\beta}}^{\\mathrm{OLS}}\\big ]$ for any $\\lambda > 0$.\n" + ] + }, + { + "cell_type": "markdown", + "id": "b4e721fc", + "metadata": {}, + "source": [ + "**b)** Show that the variance is\n" + ] + }, + { + "cell_type": "markdown", + "id": "090eb1e1", + "metadata": {}, + "source": [ + "$$\n", + "\\mathbf{Var}[\\hat{\\boldsymbol{\\beta}}^{\\mathrm{Ridge}}]=\\sigma^2[ \\mathbf{X}^{T} \\mathbf{X} + \\lambda \\mathbf{I} ]^{-1} \\mathbf{X}^{T}\\mathbf{X} \\{ [ \\mathbf{X}^{\\top} \\mathbf{X} + \\lambda \\mathbf{I} ]^{-1}\\}^{T}\n", + "$$\n" + ] + }, + { + "cell_type": "markdown", + "id": "6b8e8697", + "metadata": {}, + "source": [ + "We see that if the parameter $\\lambda$ goes to infinity then the variance of the Ridge parameters $\\boldsymbol{\\beta}$ goes to zero.\n" + ] + }, + { + "cell_type": "markdown", + "id": "74bc300b", + "metadata": {}, + "source": [ + "## Exercise 3: Deriving the expression for the Bias-Variance Trade-off\n" + ] + }, + { + "cell_type": "markdown", + "id": "eeb86010", + "metadata": {}, + "source": [ + "The aim of this exercise is to derive the equations for the bias-variance tradeoff to be used in project 1.\n", + "\n", + "The parameters $\\boldsymbol{\\hat{\\beta}_{OLS}}$ are found by optimizing the mean squared error via the so-called cost function\n" + ] + }, + { + "cell_type": "markdown", + "id": "522a0d1d", + "metadata": {}, + "source": [ + "$$\n", + "C(\\boldsymbol{X},\\boldsymbol{\\beta}) =\\frac{1}{n}\\sum_{i=0}^{n-1}(y_i-\\tilde{y}_i)^2=\\mathbb{E}\\left[(\\boldsymbol{y}-\\boldsymbol{\\tilde{y}})^2\\right]\n", + "$$\n" + ] + }, + { + "cell_type": "markdown", + "id": "831db06c", + "metadata": {}, + "source": [ + "**a)** Show that you can rewrite this into an expression which contains\n", + "\n", + "- the variance of the model (the variance term)\n", + "- the expected deviation of the mean of the model from the true data (the bias term)\n", + "- the variance of the noise\n", + "\n", + "In other words, show that:\n" + ] + }, + { + "cell_type": "markdown", + "id": "8cc52b3c", + "metadata": {}, + "source": [ + "$$\n", + "\\mathbb{E}\\left[(\\boldsymbol{y}-\\boldsymbol{\\tilde{y}})^2\\right]=\\mathrm{Bias}[\\tilde{y}]+\\mathrm{var}[\\tilde{y}]+\\sigma^2,\n", + "$$\n" + ] + }, + { + "cell_type": "markdown", + "id": "8cb50416", + "metadata": {}, + "source": [ + "with\n" + ] + }, + { + "cell_type": "markdown", + "id": "e49bdbb4", + "metadata": {}, + "source": [ + "$$\n", + 
"\\mathrm{Bias}[\\tilde{y}]=\\mathbb{E}\\left[\\left(\\boldsymbol{y}-\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right]\\right)^2\\right],\n", + "$$\n" + ] + }, + { + "cell_type": "markdown", + "id": "eca5554a", + "metadata": {}, + "source": [ + "and\n" + ] + }, + { + "cell_type": "markdown", + "id": "b1054343", + "metadata": {}, + "source": [ + "$$\n", + "\\mathrm{var}[\\tilde{y}]=\\mathbb{E}\\left[\\left(\\tilde{\\boldsymbol{y}}-\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right]\\right)^2\\right]=\\frac{1}{n}\\sum_i(\\tilde{y}_i-\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right])^2.\n", + "$$\n", + "\n", + "In order to arrive at the equation for the bias, we have to approximate the unknown function $f$ with the output/target values $y$.\n" + ] + }, + { + "cell_type": "markdown", + "id": "70fbfcd7", + "metadata": {}, + "source": [ + "**b)** Explain what the terms mean and discuss their interpretations.\n" + ] + }, + { + "cell_type": "markdown", + "id": "b8f8b9d1", + "metadata": {}, + "source": [ + "## Exercise 4: Computing the Bias and Variance\n" + ] + }, + { + "cell_type": "markdown", + "id": "9e012430", + "metadata": {}, + "source": [ + "Before you compute the bias and variance of a real model for different complexities, let's for now assume that you have sampled predictions and targets for a single model complexity using bootstrap resampling.\n", + "\n", + "**a)** Using the expression above, compute the mean squared error, bias and variance of the given data. Check that the sum of the bias and variance correctly gives (approximately) the mean squared error.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "b5bf581c", + "metadata": {}, + "outputs": [], + "source": [ + "import numpy as np\n", + "\n", + "n = 100\n", + "bootstraps = 1000\n", + "\n", + "predictions = np.random.rand(bootstraps, n) * 10 + 10\n", + "# The definition of targets has been updated, and was wrong earlier in the week.\n", + "targets = np.random.rand(1, n)\n", + "\n", + "mse = ...\n", + "bias = ...\n", + "variance = ..." 
+ ] + }, + { + "cell_type": "markdown", + "id": "7b1dc621", + "metadata": {}, + "source": [ + "**b)** Change the prediction values in some way to increase the bias while decreasing the variance.\n", + "\n", + "**c)** Change the prediction values in some way to increase the variance while decreasing the bias.\n" + ] + }, + { + "cell_type": "markdown", + "id": "8da63362", + "metadata": {}, + "source": [ + "**d)** Perform a bias-variance analysis of a polynomial OLS model fit to a one-dimensional function by computing and plotting the bias and variances values as a function of the polynomial degree of your model.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "dd5855e4", + "metadata": {}, + "outputs": [], + "source": [ + "import numpy as np\n", + "import matplotlib.pyplot as plt\n", + "from sklearn.preprocessing import (\n", + " PolynomialFeatures,\n", + ") # use the fit_transform method of the created object!\n", + "from sklearn.linear_model import LinearRegression\n", + "from sklearn.metrics import mean_squared_error\n", + "from sklearn.model_selection import train_test_split\n", + "from sklearn.utils import resample" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "7e35fa37", + "metadata": {}, + "outputs": [], + "source": [ + "n = 100\n", + "bootstraps = 1000\n", + "\n", + "x = np.linspace(-3, 3, n)\n", + "y = np.exp(-(x**2)) + 1.5 * np.exp(-((x - 2) ** 2)) + np.random.normal(0, 0.1)\n", + "\n", + "biases = []\n", + "variances = []\n", + "mses = []\n", + "\n", + "# for p in range(1, 5):\n", + "# predictions = ...\n", + "# targets = ...\n", + "#\n", + "# X = ...\n", + "# X_train, X_test, y_train, y_test = ...\n", + "# for b in range(bootstraps):\n", + "# X_train_re, y_train_re = ...\n", + "#\n", + "# # fit your model on the sampled data\n", + "#\n", + "# # make predictions on the test data\n", + "# predictions[b, :] =\n", + "# targets[b, :] =\n", + "#\n", + "# biases.append(...)\n", + "# variances.append(...)\n", + "# mses.append(...)" + ] + }, + { + "cell_type": "markdown", + "id": "253b8461", + "metadata": {}, + "source": [ + "**e)** Discuss the bias-variance trade-off as function of your model complexity (the degree of the polynomial).\n", + "\n", + "**f)** Compute and discuss the bias and variance as function of the number of data points (choose a suitable polynomial degree to show something interesting).\n" + ] + }, + { + "cell_type": "markdown", + "id": "46250fbc", + "metadata": {}, + "source": [ + "## Exercise 5: Interpretation of scaling and metrics\n" + ] + }, + { + "cell_type": "markdown", + "id": "5af53055", + "metadata": {}, + "source": [ + "In this course, we often ask you to scale data and compute various metrics. Although these practices are \"standard\" in the field, we will require you to demonstrate an understanding of _why_ you need to scale data and use these metrics. Both so that you can make better arguements about your results, and so that you will hopefully make fewer mistakes.\n", + "\n", + "First, a few reminders: In this course you should always scale the columns of the feature matrix, and sometimes scale the target data, when it is worth the effort. By scaling, we mean subtracting the mean and dividing by the standard deviation, though there are many other ways to scale data. 
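As a small illustration of what is meant by scaling here, the following sketch (with hypothetical variable names `X_train` and `X_test`, assumed to come from a train-test split) standardizes each column of a feature matrix using the mean and standard deviation of the training data only, and then reuses those training statistics on the test data:

```python
import numpy as np

rng = np.random.default_rng(2024)
X_train = rng.normal(size=(80, 3)) * [1.0, 10.0, 100.0]  # columns on very different scales
X_test = rng.normal(size=(20, 3)) * [1.0, 10.0, 100.0]

X_mean = X_train.mean(axis=0)   # compute scaling parameters on the training data only
X_std = X_train.std(axis=0)
X_train_scaled = (X_train - X_mean) / X_std
X_test_scaled = (X_test - X_mean) / X_std  # reuse the training statistics on the test set
```

`sklearn.preprocessing.StandardScaler` does the same thing through its `fit` and `transform` methods.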
When scaling either the feature matrix or the target data, the intercept becomes a bit harder to implement and understand, so take care.\n", + "\n", + "Briefly answer the following:\n", + "\n", + "**a)** Why do we scale data?\n", + "\n", + "**b)** Why does the OLS method give practically equivelent models on scaled and unscaled data?\n", + "\n", + "**c)** Why does the Ridge method **not** give practically equivelent models on scaled and unscaled data? Why do we only consider the model on scaled data correct?\n", + "\n", + "**d)** Why do we say that the Ridge method gives a biased model?\n", + "\n", + "**e)** Is the MSE of the OLS method affected by scaling of the feature matrix? Is it affected by scaling of the target data?\n", + "\n", + "**f)** Read about the R2 score, a metric we will ask you to use a lot later in the course. Is the R2 score of the OLS method affected by scaling of the feature matrix? Is it affected by scaling of the target data?\n", + "\n", + "**g)** Give interpretations of the following R2 scores: 0, 0.5, 1.\n", + "\n", + "**h)** What is an advantage of the R2 score over the MSE?\n" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.15" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} \ No newline at end of file diff --git a/doc/LectureNotes/_build/jupyter_execute/exercisesweek39.ipynb b/doc/LectureNotes/_build/jupyter_execute/exercisesweek39.ipynb new file mode 100644 index 000000000..e9c229ddc --- /dev/null +++ b/doc/LectureNotes/_build/jupyter_execute/exercisesweek39.ipynb @@ -0,0 +1,185 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "433db993", + "metadata": {}, + "source": [ + "# Exercises week 39\n", + "\n", + "## Getting started with project 1\n" + ] + }, + { + "cell_type": "markdown", + "id": "6b931365", + "metadata": {}, + "source": [ + "The aim of the exercises this week is to aid you in getting started with writing the report. This will be discussed during the lab sessions as well.\n", + "\n", + "A short feedback to the this exercise will be available before the project deadline. And you can reuse these elements in your final report.\n" + ] + }, + { + "cell_type": "markdown", + "id": "2a63bae1", + "metadata": {}, + "source": [ + "### Learning goals\n", + "\n", + "After completing these exercises, you will know how to\n", + "\n", + "- Create a properly formatted report in Overleaf\n", + "- Select and present graphs for a scientific report\n", + "- Write an abstract and introduction for a scientific report\n", + "\n", + "### Deliverables\n", + "\n", + "Complete the following exercises while working in an Overleaf project. Then, in canvas, include\n", + "\n", + "- An exported PDF of the report draft you have been working on.\n", + "- A comment linking to the github repository used in exercise 4.\n" + ] + }, + { + "cell_type": "markdown", + "id": "e0f2d99d", + "metadata": {}, + "source": [ + "## Exercise 1: Creating the report document\n" + ] + }, + { + "cell_type": "markdown", + "id": "d06bfb29", + "metadata": {}, + "source": [ + "We require all projects to be formatted as proper scientific reports, and this includes using LaTeX for typesetting. 
We strongly recommend that you use the online LaTeX editor Overleaf, as it is much easier to start using, and has excellent support for collaboration.\n", + "\n", + "**a)** Create an account on Overleaf.com, or log in using SSO with your UiO email.\n", + "\n", + "**b)** Download [this](https://github.com/CompPhysics/MachineLearning/blob/master/doc/LectureNotes/data/FYS_STK_Template.zip) template project.\n", + "\n", + "**c)** Create a new Overleaf project with the correct formatting by uploading the template project.\n", + "\n", + "**d)** Read the general guideline for writing a report, which can be found at .\n", + "\n", + "**e)** Look at the provided example of an earlier project, found at \n" + ] + }, + { + "cell_type": "markdown", + "id": "ec36f4c3", + "metadata": {}, + "source": [ + "## Exercise 2: Adding good figures\n" + ] + }, + { + "cell_type": "markdown", + "id": "f50723f8", + "metadata": {}, + "source": [ + "**a)** Using what you have learned so far in this course, create a plot illustrating the Bias-Variance trade-off. Make sure the lines and axes are labeled, with font size being the same as in the text.\n", + "\n", + "**b)** Add this figure to the results section of your document, with a caption that describes it. A reader should be able to understand the figure with only its contents and caption.\n", + "\n", + "**c)** Refer to the figure in your text using \\ref.\n", + "\n", + "**d)** Create a heatmap showing the MSE of a Ridge regression model for various polynomial degrees and lambda values. Make sure the axes are labeled, and that the title or colorbar describes what is plotted.\n", + "\n", + "**e)** Add this second figure to your document with a caption and reference in the text. All figures in the final report must be captioned and be referenced and used in the text.\n" + ] + }, + { + "cell_type": "markdown", + "id": "276c214e", + "metadata": {}, + "source": [ + "## Exercise 3: Writing an abstract and introduction\n" + ] + }, + { + "cell_type": "markdown", + "id": "f4134eb5", + "metadata": {}, + "source": [ + "Although much of your project 1 results are not done yet, we want you to write an abstract and introduction to get you started on writing the report. It is generally a good idea to write a lot of a report before finishing all of the results, as you get a better understanding of your methods and inquiry from doing so, along with saving a lot of time. Where you would typically describe results in the abstract, instead make something up, just this once.\n", + "\n", + "**a)** Read the guidelines on abstract and introduction before you start.\n", + "\n", + "**b)** Write an abstract for project 1 in your report.\n", + "\n", + "**c)** Write an introduction for project 1 in your report.\n" + ] + }, + { + "cell_type": "markdown", + "id": "2f512b59", + "metadata": {}, + "source": [ + "## Exercise 4: Making the code available and presentable\n" + ] + }, + { + "cell_type": "markdown", + "id": "77fe1fec", + "metadata": {}, + "source": [ + "A central part of the report is the code you write to implement the methods and generate the results. To get points for the code-part of the project, you need to make your code avaliable and presentable.\n", + "\n", + "**a)** Create a github repository for project 1, or create a dedicated folder for project 1 in a github repository. 
Only one person in your group needs to do this.\n", + "\n", + "**b)** Add a PDF of the report to this repository, after completing exercises 1-3\n", + "\n", + "**c)** Add a folder named Code, where you can put python files for your functions and notebooks for reproducing your results.\n", + "\n", + "**d)** Add python files for functions, and a notebook to produce the figures in exercise 2, to the Code folder. Remember to use a seed for generating random data and for train-test splits.\n", + "\n", + "**e)** Create a README file in the repository or project folder with\n", + "\n", + "- the name of the group members\n", + "- a short description of the project\n", + "- a description of how to install the required packages to run your code from a requirements.txt file\n", + "- names and descriptions of the various notebooks in the Code folder and the results they produce\n" + ] + }, + { + "cell_type": "markdown", + "id": "f1d72c56", + "metadata": {}, + "source": [ + "## Exercise 5: Referencing\n", + "\n", + "**a)** Add a reference to Hastie et al. using your preferred referencing style. See https://www.sokogskriv.no/referansestiler/ for an overview of styles.\n", + "\n", + "**b)** Add a reference to sklearn like this: https://scikit-learn.org/stable/about.html#citing-scikit-learn\n", + "\n", + "**c)** Make a prompt to your LLM of choice, and upload the exported conversation to your GitHub repository for the project.\n", + "\n", + "**d)** At the end of the methods section of the report, write a one paragraph declaration on how and for what you have used the LLM. Link to the log on GitHub.\n" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.15" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} \ No newline at end of file diff --git a/doc/LectureNotes/_build/jupyter_execute/exercisesweek41.ipynb b/doc/LectureNotes/_build/jupyter_execute/exercisesweek41.ipynb new file mode 100644 index 000000000..d5378765f --- /dev/null +++ b/doc/LectureNotes/_build/jupyter_execute/exercisesweek41.ipynb @@ -0,0 +1,804 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "4b4c06bc", + "metadata": {}, + "source": [ + "\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "id": "bcb25e64", + "metadata": {}, + "source": [ + "# Exercises week 41\n", + "\n", + "**October 6-10, 2025**\n", + "\n", + "Date: **Deadline is Friday October 10 at midnight**\n" + ] + }, + { + "cell_type": "markdown", + "id": "bb01f126", + "metadata": {}, + "source": [ + "# Overarching aims of the exercises this week\n", + "\n", + "This week, you will implement the entire feed-forward pass of a neural network! Next week you will compute the gradient of the network by implementing back-propagation manually, and by using autograd which does back-propagation for you (much easier!). Next week, you will also use the gradient to optimize the network with a gradient method! 
However, there is an optional exercise this week to get started on training the network and getting good results!\n", + "\n", + "We recommend that you do the exercises this week by editing and running this notebook file, as it includes some checks along the way that you have implemented the pieces of the feed-forward pass correctly, and running small parts of the code at a time will be important for understanding the methods.\n", + "\n", + "If you have trouble running a notebook, you can run this notebook in google colab instead (https://colab.research.google.com/drive/1zKibVQf-iAYaAn2-GlKfgRjHtLnPlBX4#offline=true&sandboxMode=true), an updated link will be provided on the course discord (you can also send an email to k.h.fredly@fys.uio.no if you encounter any trouble), though we recommend that you set up VSCode and your python environment to run code like this locally.\n", + "\n", + "First, here are some functions you are going to need, don't change this cell. If you are unable to import autograd, just swap in normal numpy until you want to do the final optional exercise.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c6f61b09", + "metadata": {}, + "outputs": [], + "source": [ + "import autograd.numpy as np # We need to use this numpy wrapper to make automatic differentiation work later\n", + "from sklearn import datasets\n", + "import matplotlib.pyplot as plt\n", + "from sklearn.metrics import accuracy_score\n", + "\n", + "\n", + "# Defining some activation functions\n", + "def ReLU(z):\n", + " return np.where(z > 0, z, 0)\n", + "\n", + "\n", + "def sigmoid(z):\n", + " return 1 / (1 + np.exp(-z))\n", + "\n", + "\n", + "def softmax(z):\n", + " \"\"\"Compute softmax values for each set of scores in the rows of the matrix z.\n", + " Used with batched input data.\"\"\"\n", + " e_z = np.exp(z - np.max(z, axis=0))\n", + " return e_z / np.sum(e_z, axis=1)[:, np.newaxis]\n", + "\n", + "\n", + "def softmax_vec(z):\n", + " \"\"\"Compute softmax values for each set of scores in the vector z.\n", + " Use this function when you use the activation function on one vector at a time\"\"\"\n", + " e_z = np.exp(z - np.max(z))\n", + " return e_z / np.sum(e_z)" + ] + }, + { + "cell_type": "markdown", + "id": "6248ec53", + "metadata": {}, + "source": [ + "# Exercise 1\n", + "\n", + "In this exercise you will compute the activation of the first layer. You only need to change the code in the cells right below an exercise, the rest works out of the box. Feel free to make changes and see how stuff works though!\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "37f30740", + "metadata": {}, + "outputs": [], + "source": [ + "np.random.seed(2024)\n", + "\n", + "x = np.random.randn(2) # network input. This is a single input with two features\n", + "W1 = np.random.randn(4, 2) # first layer weights" + ] + }, + { + "cell_type": "markdown", + "id": "4ed2cf3d", + "metadata": {}, + "source": [ + "**a)** Given the shape of the first layer weight matrix, what is the input shape of the neural network? What is the output shape of the first layer?\n" + ] + }, + { + "cell_type": "markdown", + "id": "edf7217b", + "metadata": {}, + "source": [ + "**b)** Define the bias of the first layer, `b1`with the correct shape. (Run the next cell right after the previous to get the random generated values to line up with the test solution below)\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "2129c19f", + "metadata": {}, + "outputs": [], + "source": [ + "b1 = ..." 
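# One possible completion, as hinted in the exercise text (a sketch on our part,
# not necessarily the only valid choice): the first layer has 4 output nodes, so
# the bias needs one random entry per node, drawn with randn right after W1 so
# that the later test values line up.
b1 = np.random.randn(4)  # shape (4,): one bias per output node of the first layer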
+ ] + }, + { + "cell_type": "markdown", + "id": "09e8d453", + "metadata": {}, + "source": [ + "**c)** Compute the intermediary `z1` for the first layer\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "6837119b", + "metadata": {}, + "outputs": [], + "source": [ + "z1 = ..." + ] + }, + { + "cell_type": "markdown", + "id": "6f71374e", + "metadata": {}, + "source": [ + "**d)** Compute the activation `a1` for the first layer using the ReLU activation function defined earlier.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "8d41ed19", + "metadata": {}, + "outputs": [], + "source": [ + "a1 = ..." + ] + }, + { + "cell_type": "markdown", + "id": "088710c0", + "metadata": {}, + "source": [ + "Confirm that you got the correct activation with the test below. Make sure that you define `b1` with the randn function right after you define `W1`.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "4d2f54b4", + "metadata": {}, + "outputs": [], + "source": [ + "sol1 = np.array([0.60610368, 4.0076268, 0.0, 0.56469864])\n", + "\n", + "print(np.allclose(a1, sol1))" + ] + }, + { + "cell_type": "markdown", + "id": "7fb0cf46", + "metadata": {}, + "source": [ + "# Exercise 2\n", + "\n", + "Now we will add a layer to the network with an output of length 8 and ReLU activation.\n", + "\n", + "**a)** What is the input of the second layer? What is its shape?\n", + "\n", + "**b)** Define the weight and bias of the second layer with the right shapes.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "00063acf", + "metadata": {}, + "outputs": [], + "source": [ + "W2 = ...\n", + "b2 = ..." + ] + }, + { + "cell_type": "markdown", + "id": "5bd7d84b", + "metadata": {}, + "source": [ + "**c)** Compute the intermediary `z2` and activation `a2` for the second layer.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "2fd0383d", + "metadata": {}, + "outputs": [], + "source": [ + "z2 = ...\n", + "a2 = ..." + ] + }, + { + "cell_type": "markdown", + "id": "1b5daae5", + "metadata": {}, + "source": [ + "Confirm that you got the correct activation shape with the test below.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f7f2f8a1", + "metadata": {}, + "outputs": [], + "source": [ + "print(\n", + " np.allclose(np.exp(len(a2)), 2980.9579870417283)\n", + ") # This should evaluate to True if a2 has the correct shape :)" + ] + }, + { + "cell_type": "markdown", + "id": "3759620d", + "metadata": {}, + "source": [ + "# Exercise 3\n", + "\n", + "We often want our neural networks to have many layers of varying sizes. 
To avoid writing very long and error-prone code where we explicitly define and evaluate each layer we should keep all our layers in a single variable which is easy to create and use.\n", + "\n", + "**a)** Complete the function below so that it returns a list `layers` of weight and bias tuples `(W, b)` for each layer, in order, with the correct shapes that we can use later as our network parameters.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c58f10f9", + "metadata": {}, + "outputs": [], + "source": [ + "def create_layers(network_input_size, layer_output_sizes):\n", + " layers = []\n", + "\n", + " i_size = network_input_size\n", + " for layer_output_size in layer_output_sizes:\n", + " W = ...\n", + " b = ...\n", + " layers.append((W, b))\n", + "\n", + " i_size = layer_output_size\n", + " return layers" + ] + }, + { + "cell_type": "markdown", + "id": "bdc0cda2", + "metadata": {}, + "source": [ + "**b)** Comple the function below so that it evaluates the intermediary `z` and activation `a` for each layer, with ReLU actication, and returns the final activation `a`. This is the complete feed-forward pass, a full neural network!\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "5262df05", + "metadata": {}, + "outputs": [], + "source": [ + "def feed_forward_all_relu(layers, input):\n", + " a = input\n", + " for W, b in layers:\n", + " z = ...\n", + " a = ...\n", + " return a" + ] + }, + { + "cell_type": "markdown", + "id": "245adbcb", + "metadata": {}, + "source": [ + "**c)** Create a network with input size 8 and layers with output sizes 10, 16, 6, 2. Evaluate it and make sure that you get the correct size vectors along the way.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "89a8f70d", + "metadata": {}, + "outputs": [], + "source": [ + "input_size = ...\n", + "layer_output_sizes = [...]\n", + "\n", + "x = np.random.rand(input_size)\n", + "layers = ...\n", + "predict = ...\n", + "print(predict)" + ] + }, + { + "cell_type": "markdown", + "id": "0da7fd52", + "metadata": {}, + "source": [ + "**d)** Why is a neural network with no activation functions mathematically equivelent to(can be reduced to) a neural network with only one layer?\n" + ] + }, + { + "cell_type": "markdown", + "id": "306d8b7c", + "metadata": {}, + "source": [ + "# Exercise 4 - Custom activation for each layer\n" + ] + }, + { + "cell_type": "markdown", + "id": "221c7b6c", + "metadata": {}, + "source": [ + "So far, every layer has used the same activation, ReLU. We often want to use other types of activation however, so we need to update our code to support multiple types of activation functions. Make sure that you have completed every previous exercise before trying this one.\n" + ] + }, + { + "cell_type": "markdown", + "id": "10896d06", + "metadata": {}, + "source": [ + "**a)** Complete the `feed_forward` function which accepts a list of activation functions as an argument, and which evaluates these activation functions at each layer.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "de062369", + "metadata": {}, + "outputs": [], + "source": [ + "def feed_forward(input, layers, activation_funcs):\n", + " a = input\n", + " for (W, b), activation_func in zip(layers, activation_funcs):\n", + " z = ...\n", + " a = ...\n", + " return a" + ] + }, + { + "cell_type": "markdown", + "id": "8f7df363", + "metadata": {}, + "source": [ + "**b)** You are now given a list with three activation functions, two ReLU and one sigmoid. 
(Don't call them yet! you can make a list with function names as elements, and then call these elements of the list later. If you add other functions than the ones defined at the start of the notebook, make sure everything is defined using autograd's numpy wrapper, like above, since we want to use automatic differentiation on all of these functions later.)\n", + "\n", + "Evaluate a network with three layers and these activation functions.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "301b46dc", + "metadata": {}, + "outputs": [], + "source": [ + "network_input_size = ...\n", + "layer_output_sizes = [...]\n", + "activation_funcs = [ReLU, ReLU, sigmoid]\n", + "layers = ...\n", + "\n", + "x = np.random.randn(network_input_size)\n", + "feed_forward(x, layers, activation_funcs)" + ] + }, + { + "cell_type": "markdown", + "id": "9c914fd0", + "metadata": {}, + "source": [ + "**c)** How does the output of the network change if you use sigmoid in the hidden layers and ReLU in the output layer?\n" + ] + }, + { + "cell_type": "markdown", + "id": "a8d6c425", + "metadata": {}, + "source": [ + "# Exercise 5 - Processing multiple inputs at once\n" + ] + }, + { + "cell_type": "markdown", + "id": "0f4330a4", + "metadata": {}, + "source": [ + "So far, the feed forward function has taken one input vector as an input. This vector then undergoes a linear transformation and then an element-wise non-linear operation for each layer. This approach of sending one vector in at a time is great for interpreting how the network transforms data with its linear and non-linear operations, but not the best for numerical efficiency. Now, we want to be able to send many inputs through the network at once. This will make the code a bit harder to understand, but it will make it faster, and more compact. It will be worth the trouble.\n", + "\n", + "To process multiple inputs at once, while still performing the same operations, you will only need to flip a couple things around.\n" + ] + }, + { + "cell_type": "markdown", + "id": "17023bb7", + "metadata": {}, + "source": [ + "**a)** Complete the function `create_layers_batch` so that the weight matrix is the transpose of what it was when you only sent in one input at a time.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a241fd79", + "metadata": {}, + "outputs": [], + "source": [ + "def create_layers_batch(network_input_size, layer_output_sizes):\n", + " layers = []\n", + "\n", + " i_size = network_input_size\n", + " for layer_output_size in layer_output_sizes:\n", + " W = ...\n", + " b = ...\n", + " layers.append((W, b))\n", + "\n", + " i_size = layer_output_size\n", + " return layers" + ] + }, + { + "cell_type": "markdown", + "id": "a6349db6", + "metadata": {}, + "source": [ + "**b)** Make a matrix of inputs with the shape (number of inputs, number of features), you choose the number of inputs and features per input. Then complete the function `feed_forward_batch` so that you can process this matrix of inputs with only one matrix multiplication and one broadcasted vector addition per layer. 
(Hint: You will only need to swap two variable around from your previous implementation, but remember to test that you get the same results for equivelent inputs!)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "425f3bcc", + "metadata": {}, + "outputs": [], + "source": [ + "inputs = np.random.rand(1000, 4)\n", + "\n", + "\n", + "def feed_forward_batch(inputs, layers, activation_funcs):\n", + " a = inputs\n", + " for (W, b), activation_func in zip(layers, activation_funcs):\n", + " z = ...\n", + " a = ...\n", + " return a" + ] + }, + { + "cell_type": "markdown", + "id": "efd07b4e", + "metadata": {}, + "source": [ + "**c)** Create and evaluate a neural network with 4 input features, and layers with output sizes 12, 10, 3 and activations ReLU, ReLU, softmax.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ce6fcc2f", + "metadata": {}, + "outputs": [], + "source": [ + "network_input_size = ...\n", + "layer_output_sizes = [...]\n", + "activation_funcs = [...]\n", + "layers = create_layers_batch(network_input_size, layer_output_sizes)\n", + "\n", + "x = np.random.randn(network_input_size)\n", + "feed_forward_batch(inputs, layers, activation_funcs)" + ] + }, + { + "cell_type": "markdown", + "id": "87999271", + "metadata": {}, + "source": [ + "You should use this batched approach moving forward, as it will lead to much more compact code. However, remember that each input is still treated separately, and that you will need to keep in mind the transposed weight matrix and other details when implementing backpropagation.\n" + ] + }, + { + "cell_type": "markdown", + "id": "237eb782", + "metadata": {}, + "source": [ + "# Exercise 6 - Predicting on real data\n" + ] + }, + { + "cell_type": "markdown", + "id": "54d5fde7", + "metadata": {}, + "source": [ + "You will now evaluate your neural network on the iris data set (https://scikit-learn.org/1.5/auto_examples/datasets/plot_iris_dataset.html).\n", + "\n", + "This dataset contains data on 150 flowers of 3 different types which can be separated pretty well using the four features given for each flower, which includes the width and length of their leaves. 
You are will later train your network to actually make good predictions.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "6bd4c148", + "metadata": {}, + "outputs": [], + "source": [ + "iris = datasets.load_iris()\n", + "\n", + "_, ax = plt.subplots()\n", + "scatter = ax.scatter(iris.data[:, 0], iris.data[:, 1], c=iris.target)\n", + "ax.set(xlabel=iris.feature_names[0], ylabel=iris.feature_names[1])\n", + "_ = ax.legend(\n", + " scatter.legend_elements()[0], iris.target_names, loc=\"lower right\", title=\"Classes\"\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ed3e2fc9", + "metadata": {}, + "outputs": [], + "source": [ + "inputs = iris.data\n", + "\n", + "# Since each prediction is a vector with a score for each of the three types of flowers,\n", + "# we need to make each target a vector with a 1 for the correct flower and a 0 for the others.\n", + "targets = np.zeros((len(iris.data), 3))\n", + "for i, t in enumerate(iris.target):\n", + " targets[i, t] = 1\n", + "\n", + "\n", + "def accuracy(predictions, targets):\n", + " one_hot_predictions = np.zeros(predictions.shape)\n", + "\n", + " for i, prediction in enumerate(predictions):\n", + " one_hot_predictions[i, np.argmax(prediction)] = 1\n", + " return accuracy_score(one_hot_predictions, targets)" + ] + }, + { + "cell_type": "markdown", + "id": "0362c4a9", + "metadata": {}, + "source": [ + "**a)** What should the input size for the network be with this dataset? What should the output size of the last layer be?\n" + ] + }, + { + "cell_type": "markdown", + "id": "bf62607e", + "metadata": {}, + "source": [ + "**b)** Create a network with two hidden layers, the first with sigmoid activation and the last with softmax, the first layer should have 8 \"nodes\", the second has the number of nodes you found in exercise a). Softmax returns a \"probability distribution\", in the sense that the numbers in the output are positive and add up to 1 and, their magnitude are in some sense relative to their magnitude before going through the softmax function. Remember to use the batched version of the create_layers and feed forward functions.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "5366d4ae", + "metadata": {}, + "outputs": [], + "source": [ + "...\n", + "layers = ..." + ] + }, + { + "cell_type": "markdown", + "id": "c528846f", + "metadata": {}, + "source": [ + "**c)** Evaluate your model on the entire iris dataset! For later purposes, we will split the data into train and test sets, and compute gradients on smaller batches of the training data. But for now, evaluate the network on the whole thing at once.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "6c783105", + "metadata": {}, + "outputs": [], + "source": [ + "predictions = feed_forward_batch(inputs, layers, activation_funcs)" + ] + }, + { + "cell_type": "markdown", + "id": "01a3caa8", + "metadata": {}, + "source": [ + "**d)** Compute the accuracy of your model using the accuracy function defined above. 
Recreate your model a couple times and see how the accuracy changes.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a2612b82", + "metadata": {}, + "outputs": [], + "source": [ + "print(accuracy(predictions, targets))" + ] + }, + { + "cell_type": "markdown", + "id": "334560b6", + "metadata": {}, + "source": [ + "# Exercise 7 - Training on real data (Optional)\n", + "\n", + "To be able to actually do anything useful with your neural network, you need to train it. For this, we need a cost function and a way to take the gradient of the cost function wrt. the network parameters. The following exercises guide you through taking the gradient using autograd, and updating the network parameters using the gradient. Feel free to implement gradient methods like ADAM if you finish everything.\n" + ] + }, + { + "cell_type": "markdown", + "id": "700cabe4", + "metadata": {}, + "source": [ + "Since we are doing a classification task with multiple output classes, we use the cross-entropy loss function, which can evaluate performance on classification tasks. It sees if your prediction is \"most certain\" on the correct target.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f30e6e2c", + "metadata": {}, + "outputs": [], + "source": [ + "def cross_entropy(predict, target):\n", + " return np.sum(-target * np.log(predict))\n", + "\n", + "\n", + "def cost(input, layers, activation_funcs, target):\n", + " predict = feed_forward_batch(input, layers, activation_funcs)\n", + " return cross_entropy(predict, target)" + ] + }, + { + "cell_type": "markdown", + "id": "7ea9c1a4", + "metadata": {}, + "source": [ + "To improve our network on whatever prediction task we have given it, we need to use a sensible cost function, take the gradient of that cost function with respect to our network parameters, the weights and biases, and then update the weights and biases using these gradients. To clarify, we need to find and use these\n", + "\n", + "$$\n", + "\\frac{\\partial C}{\\partial W}, \\frac{\\partial C}{\\partial b}\n", + "$$\n" + ] + }, + { + "cell_type": "markdown", + "id": "6c753e3b", + "metadata": {}, + "source": [ + "Now we need to compute these gradients. This is pretty hard to do for a neural network, we will use most of next week to do this, but we can also use autograd to just do it for us, which is what we always do in practice. With the code cell below, we create a function which takes all of these gradients for us.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "56bef776", + "metadata": {}, + "outputs": [], + "source": [ + "from autograd import grad\n", + "\n", + "\n", + "gradient_func = grad(\n", + " cost, 1\n", + ") # Taking the gradient wrt. the second input to the cost function, i.e. the layers" + ] + }, + { + "cell_type": "markdown", + "id": "7b1b74bc", + "metadata": {}, + "source": [ + "**a)** What shape should the gradient of the cost function wrt. weights and biases be?\n", + "\n", + "**b)** Use the `gradient_func` function to take the gradient of the cross entropy wrt. the weights and biases of the network. Check the shapes of what's inside. 
What does the `grad` func from autograd actually do?\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "841c9e87", + "metadata": {}, + "outputs": [], + "source": [ + "layers_grad = gradient_func(\n", + " inputs, layers, activation_funcs, targets\n", + ") # Don't change this" + ] + }, + { + "cell_type": "markdown", + "id": "adc9e9be", + "metadata": {}, + "source": [ + "**c)** Finish the `train_network` function.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "6e4d38d3", + "metadata": {}, + "outputs": [], + "source": [ + "def train_network(\n", + " inputs, layers, activation_funcs, targets, learning_rate=0.001, epochs=100\n", + "):\n", + " for i in range(epochs):\n", + " layers_grad = gradient_func(inputs, layers, activation_funcs, targets)\n", + " for (W, b), (W_g, b_g) in zip(layers, layers_grad):\n", + " W -= ...\n", + " b -= ..." + ] + }, + { + "cell_type": "markdown", + "id": "2f65d663", + "metadata": {}, + "source": [ + "**e)** What do we call the gradient method used above?\n" + ] + }, + { + "cell_type": "markdown", + "id": "7059dd8c", + "metadata": {}, + "source": [ + "**d)** Train your network and see how the accuracy changes! Make a plot if you want.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "5027c7a5", + "metadata": {}, + "outputs": [], + "source": [ + "..." + ] + }, + { + "cell_type": "markdown", + "id": "3bc77016", + "metadata": {}, + "source": [ + "**e)** How high of an accuracy is it possible to acheive with a neural network on this dataset, if we use the whole thing as training data?\n" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.15" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} \ No newline at end of file diff --git a/doc/LectureNotes/_build/jupyter_execute/exercisesweek42.ipynb b/doc/LectureNotes/_build/jupyter_execute/exercisesweek42.ipynb new file mode 100644 index 000000000..19e4e09c7 --- /dev/null +++ b/doc/LectureNotes/_build/jupyter_execute/exercisesweek42.ipynb @@ -0,0 +1,719 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Exercises week 42\n", + "\n", + "**October 13-17, 2025**\n", + "\n", + "Date: **Deadline is Friday October 17 at midnight**\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Overarching aims of the exercises this week\n", + "\n", + "The aim of the exercises this week is to train the neural network you implemented last week.\n", + "\n", + "To train neural networks, we use gradient descent, since there is no analytical expression for the optimal parameters. This means you will need to compute the gradient of the cost function wrt. the network parameters. And then you will need to implement some gradient method.\n", + "\n", + "You will begin by computing gradients for a network with one layer, then two layers, then any number of layers. 
Keeping track of the shapes and doing things step by step will be very important this week.\n", + "\n", + "We recommend that you do the exercises this week by editing and running this notebook file, as it includes some checks along the way that you have implemented the neural network correctly, and running small parts of the code at a time will be important for understanding the methods. If you have trouble running a notebook, you can run this notebook in google colab instead(https://colab.research.google.com/drive/1FfvbN0XlhV-lATRPyGRTtTBnJr3zNuHL#offline=true&sandboxMode=true), though we recommend that you set up VSCode and your python environment to run code like this locally.\n", + "\n", + "First, some setup code that you will need.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [], + "source": [ + "import autograd.numpy as np # We need to use this numpy wrapper to make automatic differentiation work later\n", + "from autograd import grad, elementwise_grad\n", + "from sklearn import datasets\n", + "import matplotlib.pyplot as plt\n", + "from sklearn.metrics import accuracy_score\n", + "\n", + "\n", + "# Defining some activation functions\n", + "def ReLU(z):\n", + " return np.where(z > 0, z, 0)\n", + "\n", + "\n", + "# Derivative of the ReLU function\n", + "def ReLU_der(z):\n", + " return np.where(z > 0, 1, 0)\n", + "\n", + "\n", + "def sigmoid(z):\n", + " return 1 / (1 + np.exp(-z))\n", + "\n", + "\n", + "def mse(predict, target):\n", + " return np.mean((predict - target) ** 2)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Exercise 1 - Understand the feed forward pass\n", + "\n", + "**a)** Complete last weeks' exercises if you haven't already (recommended).\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Exercise 2 - Gradient with one layer using autograd\n", + "\n", + "For the first few exercises, we will not use batched inputs. Only a single input vector is passed through the layer at a time.\n", + "\n", + "In this exercise you will compute the gradient of a single layer. You only need to change the code in the cells right below an exercise, the rest works out of the box. Feel free to make changes and see how stuff works though!\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**a)** If the weights and bias of a layer has shapes (10, 4) and (10), what will the shapes of the gradients of the cost function wrt. these weights and this bias be?\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**b)** Complete the feed_forward_one_layer function. It should use the sigmoid activation function. Also define the weigth and bias with the correct shapes.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 41, + "metadata": {}, + "outputs": [], + "source": [ + "def feed_forward_one_layer(W, b, x):\n", + " z = ...\n", + " a = ...\n", + " return a\n", + "\n", + "\n", + "def cost_one_layer(W, b, x, target):\n", + " predict = feed_forward_one_layer(W, b, x)\n", + " return mse(predict, target)\n", + "\n", + "\n", + "x = np.random.rand(2)\n", + "target = np.random.rand(3)\n", + "\n", + "W = ...\n", + "b = ..." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**c)** Compute the gradient of the cost function wrt. the weigth and bias by running the cell below. You will not need to change anything, just make sure it runs by defining things correctly in the cell above. 
This code uses the autograd package which uses backprogagation to compute the gradient!\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "autograd_one_layer = grad(cost_one_layer, [0, 1])\n", + "W_g, b_g = autograd_one_layer(W, b, x, target)\n", + "print(W_g, b_g)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Exercise 3 - Gradient with one layer writing backpropagation by hand\n", + "\n", + "Before you use the gradient you found using autograd, you will have to find the gradient \"manually\", to better understand how the backpropagation computation works. To do backpropagation \"manually\", you will need to write out expressions for many derivatives along the computation.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We want to find the gradient of the cost function wrt. the weight and bias. This is quite hard to do directly, so we instead use the chain rule to combine multiple derivatives which are easier to compute.\n", + "\n", + "$$\n", + "\\frac{dC}{dW} = \\frac{dC}{da}\\frac{da}{dz}\\frac{dz}{dW}\n", + "$$\n", + "\n", + "$$\n", + "\\frac{dC}{db} = \\frac{dC}{da}\\frac{da}{dz}\\frac{dz}{db}\n", + "$$\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**a)** Which intermediary results can be reused between the two expressions?\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**b)** What is the derivative of the cost wrt. the final activation? You can use the autograd calculation to make sure you get the correct result. Remember that we compute the mean in mse.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "z = W @ x + b\n", + "a = sigmoid(z)\n", + "\n", + "predict = a\n", + "\n", + "\n", + "def mse_der(predict, target):\n", + " return ...\n", + "\n", + "\n", + "print(mse_der(predict, target))\n", + "\n", + "cost_autograd = grad(mse, 0)\n", + "print(cost_autograd(predict, target))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**c)** What is the expression for the derivative of the sigmoid activation function? You can use the autograd calculation to make sure you get the correct result.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def sigmoid_der(z):\n", + " return ...\n", + "\n", + "\n", + "print(sigmoid_der(z))\n", + "\n", + "sigmoid_autograd = elementwise_grad(sigmoid, 0)\n", + "print(sigmoid_autograd(z))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**d)** Using the two derivatives you just computed, compute this intermetidary gradient you will use later:\n", + "\n", + "$$\n", + "\\frac{dC}{dz} = \\frac{dC}{da}\\frac{da}{dz}\n", + "$$\n" + ] + }, + { + "cell_type": "code", + "execution_count": 54, + "metadata": {}, + "outputs": [], + "source": [ + "dC_da = ...\n", + "dC_dz = ..." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**e)** What is the derivative of the intermediary z wrt. the weight and bias? What should the shapes be? The one for the weights is a little tricky, it can be easier to play around in the next exercise first. You can also try computing it with autograd to get a hint.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**f)** Now combine the expressions you have worked with so far to compute the gradients! 
Note that you always need to do a feed forward pass while saving the zs and as before you do backpropagation, as they are used in the derivative expressions\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "dC_da = ...\n", + "dC_dz = ...\n", + "dC_dW = ...\n", + "dC_db = ...\n", + "\n", + "print(dC_dW, dC_db)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "You should get the same results as with autograd.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "W_g, b_g = autograd_one_layer(W, b, x, target)\n", + "print(W_g, b_g)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Exercise 4 - Gradient with two layers writing backpropagation by hand\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now that you have implemented backpropagation for one layer, you have found most of the expressions you will need for more layers. Let's move up to two layers.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 59, + "metadata": {}, + "outputs": [], + "source": [ + "x = np.random.rand(2)\n", + "target = np.random.rand(4)\n", + "\n", + "W1 = np.random.rand(3, 2)\n", + "b1 = np.random.rand(3)\n", + "\n", + "W2 = np.random.rand(4, 3)\n", + "b2 = np.random.rand(4)\n", + "\n", + "layers = [(W1, b1), (W2, b2)]" + ] + }, + { + "cell_type": "code", + "execution_count": 60, + "metadata": {}, + "outputs": [], + "source": [ + "z1 = W1 @ x + b1\n", + "a1 = sigmoid(z1)\n", + "z2 = W2 @ a1 + b2\n", + "a2 = sigmoid(z2)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We begin by computing the gradients of the last layer, as the gradients must be propagated backwards from the end.\n", + "\n", + "**a)** Compute the gradients of the last layer, just like you did the single layer in the previous exercise.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 61, + "metadata": {}, + "outputs": [], + "source": [ + "dC_da2 = ...\n", + "dC_dz2 = ...\n", + "dC_dW2 = ...\n", + "dC_db2 = ..." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "To find the derivative of the cost wrt. the activation of the first layer, we need a new expression, the one furthest to the right in the following.\n", + "\n", + "$$\n", + "\\frac{dC}{da_1} = \\frac{dC}{dz_2}\\frac{dz_2}{da_1}\n", + "$$\n", + "\n", + "**b)** What is the derivative of the second layer intermetiate wrt. the first layer activation? (First recall how you compute $z_2$)\n", + "\n", + "$$\n", + "\\frac{dz_2}{da_1}\n", + "$$\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**c)** Use this expression, together with expressions which are equivelent to ones for the last layer to compute all the derivatives of the first layer.\n", + "\n", + "$$\n", + "\\frac{dC}{dW_1} = \\frac{dC}{da_1}\\frac{da_1}{dz_1}\\frac{dz_1}{dW_1}\n", + "$$\n", + "\n", + "$$\n", + "\\frac{dC}{db_1} = \\frac{dC}{da_1}\\frac{da_1}{dz_1}\\frac{dz_1}{db_1}\n", + "$$\n" + ] + }, + { + "cell_type": "code", + "execution_count": 63, + "metadata": {}, + "outputs": [], + "source": [ + "dC_da1 = ...\n", + "dC_dz1 = ...\n", + "dC_dW1 = ...\n", + "dC_db1 = ..." 
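# A sketch of how the chain-rule pieces above might be combined, written with
# hypothetical *_sketch names so it stays separate from the exercise variables.
# It assumes the MSE cost averages over the len(predict) components and uses the
# fact that the sigmoid derivative can be written via the activation itself,
# sigmoid'(z) = a * (1 - a).
dC_da2_sketch = 2 * (a2 - target) / len(a2)      # dC/da2 for the MSE cost
dC_dz2_sketch = dC_da2_sketch * a2 * (1 - a2)    # multiply by sigmoid'(z2)
dC_dW2_sketch = np.outer(dC_dz2_sketch, a1)      # since z2 = W2 @ a1 + b2
dC_db2_sketch = dC_dz2_sketch                    # dz2/db2 is the identity
dC_da1_sketch = W2.T @ dC_dz2_sketch             # propagate backwards through W2
dC_dz1_sketch = dC_da1_sketch * a1 * (1 - a1)    # multiply by sigmoid'(z1)
dC_dW1_sketch = np.outer(dC_dz1_sketch, x)       # since z1 = W1 @ x + b1
dC_db1_sketch = dC_dz1_sketch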
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "print(dC_dW1, dC_db1)\n", + "print(dC_dW2, dC_db2)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**d)** Make sure you got the same gradient as the following code which uses autograd to do backpropagation.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 67, + "metadata": {}, + "outputs": [], + "source": [ + "def feed_forward_two_layers(layers, x):\n", + " W1, b1 = layers[0]\n", + " z1 = W1 @ x + b1\n", + " a1 = sigmoid(z1)\n", + "\n", + " W2, b2 = layers[1]\n", + " z2 = W2 @ a1 + b2\n", + " a2 = sigmoid(z2)\n", + "\n", + " return a2" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def cost_two_layers(layers, x, target):\n", + " predict = feed_forward_two_layers(layers, x)\n", + " return mse(predict, target)\n", + "\n", + "\n", + "grad_two_layers = grad(cost_two_layers, 0)\n", + "grad_two_layers(layers, x, target)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**e)** How would you use the gradient from this layer to compute the gradient of an even earlier layer? Would the expressions be any different?\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Exercise 5 - Gradient with any number of layers writing backpropagation by hand\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Well done on getting this far! Now it's time to compute the gradient with any number of layers.\n", + "\n", + "First, some code from the general neural network code from last week. Note that we are still sending in one input vector at a time. We will change it to use batched inputs later.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [], + "source": [ + "def create_layers(network_input_size, layer_output_sizes):\n", + " layers = []\n", + "\n", + " i_size = network_input_size\n", + " for layer_output_size in layer_output_sizes:\n", + " W = np.random.randn(layer_output_size, i_size)\n", + " b = np.random.randn(layer_output_size)\n", + " layers.append((W, b))\n", + "\n", + " i_size = layer_output_size\n", + " return layers\n", + "\n", + "\n", + "def feed_forward(input, layers, activation_funcs):\n", + " a = input\n", + " for (W, b), activation_func in zip(layers, activation_funcs):\n", + " z = W @ a + b\n", + " a = activation_func(z)\n", + " return a\n", + "\n", + "\n", + "def cost(layers, input, activation_funcs, target):\n", + " predict = feed_forward(input, layers, activation_funcs)\n", + " return mse(predict, target)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "You might have already have noticed a very important detail in backpropagation: You need the values from the forward pass to compute all the gradients! 
The feed forward method above is great for efficiency and for using autograd, as it only cares about computing the final output, but now we need to also save the results along the way.\n", + "\n", + "Here is a function which does that for you.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [], + "source": [ + "def feed_forward_saver(input, layers, activation_funcs):\n", + " layer_inputs = []\n", + " zs = []\n", + " a = input\n", + " for (W, b), activation_func in zip(layers, activation_funcs):\n", + " layer_inputs.append(a)\n", + " z = W @ a + b\n", + " a = activation_func(z)\n", + "\n", + " zs.append(z)\n", + "\n", + " return layer_inputs, zs, a" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**a)** Now, complete the backpropagation function so that it returns the gradient of the cost function wrt. all the weigths and biases. Use the autograd calculation below to make sure you get the correct answer.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def backpropagation(\n", + " input, layers, activation_funcs, target, activation_ders, cost_der=mse_der\n", + "):\n", + " layer_inputs, zs, predict = feed_forward_saver(input, layers, activation_funcs)\n", + "\n", + " layer_grads = [() for layer in layers]\n", + "\n", + " # We loop over the layers, from the last to the first\n", + " for i in reversed(range(len(layers))):\n", + " layer_input, z, activation_der = layer_inputs[i], zs[i], activation_ders[i]\n", + "\n", + " if i == len(layers) - 1:\n", + " # For last layer we use cost derivative as dC_da(L) can be computed directly\n", + " dC_da = ...\n", + " else:\n", + " # For other layers we build on previous z derivative, as dC_da(i) = dC_dz(i+1) * dz(i+1)_da(i)\n", + " (W, b) = layers[i + 1]\n", + " dC_da = ...\n", + "\n", + " dC_dz = ...\n", + " dC_dW = ...\n", + " dC_db = ...\n", + "\n", + " layer_grads[i] = (dC_dW, dC_db)\n", + "\n", + " return layer_grads" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "network_input_size = 2\n", + "layer_output_sizes = [3, 4]\n", + "activation_funcs = [sigmoid, ReLU]\n", + "activation_ders = [sigmoid_der, ReLU_der]\n", + "\n", + "layers = create_layers(network_input_size, layer_output_sizes)\n", + "\n", + "x = np.random.rand(network_input_size)\n", + "target = np.random.rand(4)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "layer_grads = backpropagation(x, layers, activation_funcs, target, activation_ders)\n", + "print(layer_grads)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "cost_grad = grad(cost, 0)\n", + "cost_grad(layers, x, [sigmoid, ReLU], target)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Exercise 6 - Batched inputs\n", + "\n", + "Make new versions of all the functions in exercise 5 which now take batched inputs instead. See last weeks exercise 5 for details on how to batch inputs to neural networks. 
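As a starting point, here is a minimal sketch of what the batched forward-pass functions might look like. The names `create_layers_batch` and `feed_forward_saver_batch` and the storage convention are assumptions on our part, following last week's batched layout where the weight matrix has shape (input size, output size):

```python
import autograd.numpy as np


def create_layers_batch(network_input_size, layer_output_sizes):
    # Weights with shape (input size, output size), the transpose of the
    # single-input version, so that a whole batch can be multiplied at once.
    layers = []
    i_size = network_input_size
    for layer_output_size in layer_output_sizes:
        W = np.random.randn(i_size, layer_output_size)
        b = np.random.randn(layer_output_size)
        layers.append((W, b))
        i_size = layer_output_size
    return layers


def feed_forward_saver_batch(inputs, layers, activation_funcs):
    # Same idea as feed_forward_saver, but z = a @ W + b now acts on a matrix of
    # inputs with shape (number of inputs, number of features); b is broadcast
    # over the rows, i.e. added to every input in the batch.
    layer_inputs = []
    zs = []
    a = inputs
    for (W, b), activation_func in zip(layers, activation_funcs):
        layer_inputs.append(a)
        z = a @ W + b
        a = activation_func(z)
        zs.append(z)
    return layer_inputs, zs, a
```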
You will also need to update the backpropogation function.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Exercise 7 - Training\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**a)** Complete exercise 6 and 7 from last week, but use your own backpropogation implementation to compute the gradient.\n", + "- IMPORTANT: Do not implement the derivative terms for softmax and cross-entropy separately, it will be very hard!\n", + "- Instead, use the fact that the derivatives multiplied together simplify to **prediction - target** (see [source1](https://medium.com/data-science/derivative-of-the-softmax-function-and-the-categorical-cross-entropy-loss-ffceefc081d1), [source2](https://shivammehta25.github.io/posts/deriving-categorical-cross-entropy-and-softmax/))\n", + "\n", + "**b)** Use stochastic gradient descent with momentum when you train your network.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Exercise 8 (Optional) - Object orientation\n", + "\n", + "Passing in the layers, activations functions, activation derivatives and cost derivatives into the functions each time leads to code which is easy to understand in isoloation, but messier when used in a larger context with data splitting, data scaling, gradient methods and so forth. Creating an object which stores these values can lead to code which is much easier to use.\n", + "\n", + "**a)** Write a neural network class. You are free to implement it how you see fit, though we strongly recommend to not save any input or output values as class attributes, nor let the neural network class handle gradient methods internally. Gradient methods should be handled outside, by performing general operations on the layer_grads list using functions or classes separate to the neural network.\n", + "\n", + "We provide here a skeleton structure which should get you started.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "class NeuralNetwork:\n", + " def __init__(\n", + " self,\n", + " network_input_size,\n", + " layer_output_sizes,\n", + " activation_funcs,\n", + " activation_ders,\n", + " cost_fun,\n", + " cost_der,\n", + " ):\n", + " pass\n", + "\n", + " def predict(self, inputs):\n", + " # Simple feed forward pass\n", + " pass\n", + "\n", + " def cost(self, inputs, targets):\n", + " pass\n", + "\n", + " def _feed_forward_saver(self, inputs):\n", + " pass\n", + "\n", + " def compute_gradient(self, inputs, targets):\n", + " pass\n", + "\n", + " def update_weights(self, layer_grads):\n", + " pass\n", + "\n", + " # These last two methods are not needed in the project, but they can be nice to have! 
The first one has a layers parameter so that you can use autograd on it\n", + " def autograd_compliant_predict(self, layers, inputs):\n", + " pass\n", + "\n", + " def autograd_gradient(self, inputs, targets):\n", + " pass" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.15" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} \ No newline at end of file diff --git a/doc/LectureNotes/_build/jupyter_execute/exercisesweek43.ipynb b/doc/LectureNotes/_build/jupyter_execute/exercisesweek43.ipynb new file mode 100644 index 000000000..e02a479e8 --- /dev/null +++ b/doc/LectureNotes/_build/jupyter_execute/exercisesweek43.ipynb @@ -0,0 +1,647 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "860d70d8", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "119c0988", + "metadata": { + "editable": true + }, + "source": [ + "# Exercises week 43 \n", + "**October 20-24, 2025**\n", + "\n", + "Date: **Deadline Friday October 24 at midnight**" + ] + }, + { + "cell_type": "markdown", + "id": "909887eb", + "metadata": { + "editable": true + }, + "source": [ + "# Overarching aims of the exercises for week 43\n", + "\n", + "The aim of the exercises this week is to gain some confidence with\n", + "ways to visualize the results of a classification problem. We will\n", + "target three ways of setting up the analysis. The first and simplest\n", + "one is the\n", + "1. so-called confusion matrix. The next one is the so-called\n", + "\n", + "2. ROC curve. Finally we have the\n", + "\n", + "3. Cumulative gain curve.\n", + "\n", + "We will use Logistic Regression as method for the classification in\n", + "this exercise. You can compare these results with those obtained with\n", + "your neural network code from project 2 without a hidden layer.\n", + "\n", + "In these exercises we will use binary and multi-class data sets\n", + "(the Iris data set from week 41).\n", + "\n", + "The underlying mathematics is described here." + ] + }, + { + "cell_type": "markdown", + "id": "1e1cb4fb", + "metadata": { + "editable": true + }, + "source": [ + "### Confusion Matrix\n", + "\n", + "A **confusion matrix** summarizes a classifier’s performance by\n", + "tabulating predictions versus true labels. For binary classification,\n", + "it is a $2\\times2$ table whose entries are counts of outcomes:" + ] + }, + { + "cell_type": "markdown", + "id": "7b090385", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{array}{l|cc} & \\text{Predicted Positive} & \\text{Predicted Negative} \\\\ \\hline \\text{Actual Positive} & TP & FN \\\\ \\text{Actual Negative} & FP & TN \\end{array}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "1e14904b", + "metadata": { + "editable": true + }, + "source": [ + "Here TP (true positives) is the number of cases correctly predicted as\n", + "positive, FP (false positives) is the number incorrectly predicted as\n", + "positive, TN (true negatives) is correctly predicted negative, and FN\n", + "(false negatives) is incorrectly predicted negative . 
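As a small side illustration, these four counts can be tallied directly with numpy for a binary problem (the arrays here are made up for the example):

```python
# Hedged illustration: counting TP, FP, TN and FN for a binary problem
import numpy as np

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])

TP = np.sum((y_pred == 1) & (y_true == 1))
FP = np.sum((y_pred == 1) & (y_true == 0))
TN = np.sum((y_pred == 0) & (y_true == 0))
FN = np.sum((y_pred == 0) & (y_true == 1))
print(TP, FP, TN, FN)   # 3 1 3 1
```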
In other words,\n", + "“positive” means class 1 and “negative” means class 0; for example, TP\n", + "occurs when the prediction and actual are both positive. Formally:" + ] + }, + { + "cell_type": "markdown", + "id": "e93ea290", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\text{TPR} = \\frac{\\text{TP}}{\\text{TP} + \\text{FN}}, \\quad \\text{FPR} = \\frac{\\text{FP}}{\\text{FP} + \\text{TN}},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "c80bea5b", + "metadata": { + "editable": true + }, + "source": [ + "where TPR and FPR are the true and false positive rates defined below.\n", + "\n", + "In multiclass classification with $K$ classes, the confusion matrix\n", + "generalizes to a $K\\times K$ table. Entry $N_{ij}$ in the table is\n", + "the count of instances whose true class is $i$ and whose predicted\n", + "class is $j$. For example, a three-class confusion matrix can be written\n", + "as:" + ] + }, + { + "cell_type": "markdown", + "id": "a0f68f5f", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{array}{c|ccc} & \\text{Pred Class 1} & \\text{Pred Class 2} & \\text{Pred Class 3} \\\\ \\hline \\text{Act Class 1} & N_{11} & N_{12} & N_{13} \\\\ \\text{Act Class 2} & N_{21} & N_{22} & N_{23} \\\\ \\text{Act Class 3} & N_{31} & N_{32} & N_{33} \\end{array}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "869669b2", + "metadata": { + "editable": true + }, + "source": [ + "Here the diagonal entries $N_{ii}$ are the true positives for each\n", + "class, and off-diagonal entries are misclassifications. This matrix\n", + "allows computation of per-class metrics: e.g. for class $i$,\n", + "$\\mathrm{TP}_i=N_{ii}$, $\\mathrm{FN}_i=\\sum_{j\\neq i}N_{ij}$,\n", + "$\\mathrm{FP}_i=\\sum_{j\\neq i}N_{ji}$, and $\\mathrm{TN}_i$ is the sum of\n", + "all remaining entries.\n", + "\n", + "As defined above, TPR and FPR come from the binary case. In binary\n", + "terms with $P$ actual positives and $N$ actual negatives, one has" + ] + }, + { + "cell_type": "markdown", + "id": "2abd82a7", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\text{TPR} = \\frac{TP}{P} = \\frac{TP}{TP+FN}, \\quad \\text{FPR} =\n", + "\\frac{FP}{N} = \\frac{FP}{FP+TN},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "2f79325c", + "metadata": { + "editable": true + }, + "source": [ + "as used in standard confusion-matrix\n", + "formulations. These rates will be used in constructing ROC curves." + ] + }, + { + "cell_type": "markdown", + "id": "0ce65a47", + "metadata": { + "editable": true + }, + "source": [ + "### ROC Curve\n", + "\n", + "The Receiver Operating Characteristic (ROC) curve plots the trade-off\n", + "between true positives and false positives as a discrimination\n", + "threshold varies. Specifically, for a binary classifier that outputs\n", + "a score or probability, one varies the threshold $t$ for declaring\n", + "**positive**, and computes at each $t$ the true positive rate\n", + "$\\mathrm{TPR}(t)$ and false positive rate $\\mathrm{FPR}(t)$ using the\n", + "confusion matrix at that threshold. The ROC curve is then the graph\n", + "of TPR versus FPR. 
By definition," + ] + }, + { + "cell_type": "markdown", + "id": "d750fdff", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathrm{TPR} = \\frac{TP}{TP+FN}, \\qquad \\mathrm{FPR} = \\frac{FP}{FP+TN},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "561bfb2c", + "metadata": { + "editable": true + }, + "source": [ + "where $TP,FP,TN,FN$ are counts determined by threshold $t$. A perfect\n", + "classifier would reach the point (FPR=0, TPR=1) at some threshold.\n", + "\n", + "Formally, the ROC curve is obtained by plotting\n", + "$(\\mathrm{FPR}(t),\\mathrm{TPR}(t))$ for all $t\\in[0,1]$ (or as $t$\n", + "sweeps through the sorted scores). The Area Under the ROC Curve (AUC)\n", + "quantifies the average performance over all thresholds. It can be\n", + "interpreted probabilistically: $\\mathrm{AUC} =\n", + "\\Pr\\bigl(s(X^+)>s(X^-)\\bigr)$, the probability that a random positive\n", + "instance $X^+$ receives a higher score $s$ than a random negative\n", + "instance $X^-$ . Equivalently, the AUC is the integral under the ROC\n", + "curve:" + ] + }, + { + "cell_type": "markdown", + "id": "5ca722fe", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathrm{AUC} \\;=\\; \\int_{0}^{1} \\mathrm{TPR}(f)\\,df,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "30080a86", + "metadata": { + "editable": true + }, + "source": [ + "where $f$ ranges over FPR (or fraction of negatives). A model that guesses at random yields a diagonal ROC (AUC=0.5), whereas a perfect model yields AUC=1.0." + ] + }, + { + "cell_type": "markdown", + "id": "9e627156", + "metadata": { + "editable": true + }, + "source": [ + "### Cumulative Gain\n", + "\n", + "The cumulative gain curve (or gains chart) evaluates how many\n", + "positives are captured as one targets an increasing fraction of the\n", + "population, sorted by model confidence. To construct it, sort all\n", + "instances by decreasing predicted probability of the positive class.\n", + "Then, for the top $\\alpha$ fraction of instances, compute the fraction\n", + "of all actual positives that fall in this subset. In formula form, if\n", + "$P$ is the total number of positive instances and $P(\\alpha)$ is the\n", + "number of positives among the top $\\alpha$ of the data, the cumulative\n", + "gain at level $\\alpha$ is" + ] + }, + { + "cell_type": "markdown", + "id": "3e9132ef", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathrm{Gain}(\\alpha) \\;=\\; \\frac{P(\\alpha)}{P}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "75be6f5c", + "metadata": { + "editable": true + }, + "source": [ + "For example, cutting off at the top 10% of predictions yields a gain\n", + "equal to (positives in top 10%) divided by (total positives) .\n", + "Plotting $\\mathrm{Gain}(\\alpha)$ versus $\\alpha$ (often in percent)\n", + "gives the gain curve. The baseline (random) curve is the diagonal\n", + "$\\mathrm{Gain}(\\alpha)=\\alpha$, while an ideal model has a steep climb\n", + "toward 1.\n", + "\n", + "A related measure is the {\\em lift}, often called the gain ratio. It is the ratio of the model’s capture rate to that of random selection. 
Equivalently," + ] + }, + { + "cell_type": "markdown", + "id": "e5525570", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathrm{Lift}(\\alpha) \\;=\\; \\frac{\\mathrm{Gain}(\\alpha)}{\\alpha}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "18ff8dc2", + "metadata": { + "editable": true + }, + "source": [ + "A lift $>1$ indicates better-than-random targeting. In practice, gain\n", + "and lift charts (used e.g.\\ in marketing or imbalanced classification)\n", + "show how many positives can be “gained” by focusing on a fraction of\n", + "the population ." + ] + }, + { + "cell_type": "markdown", + "id": "c3d3fde8", + "metadata": { + "editable": true + }, + "source": [ + "### Other measures: Precision, Recall, and the F$_1$ Measure\n", + "\n", + "Precision and recall (sensitivity) quantify binary classification\n", + "accuracy in terms of positive predictions. They are defined from the\n", + "confusion matrix as:" + ] + }, + { + "cell_type": "markdown", + "id": "f1f14c8e", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\text{Precision} = \\frac{TP}{TP + FP}, \\qquad \\text{Recall} = \\frac{TP}{TP + FN}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "422cc743", + "metadata": { + "editable": true + }, + "source": [ + "Precision is the fraction of predicted positives that are correct, and\n", + "recall is the fraction of actual positives that are correctly\n", + "identified . A high-precision classifier makes few false-positive\n", + "errors, while a high-recall classifier makes few false-negative\n", + "errors.\n", + "\n", + "The F$_1$ score (balanced F-measure) combines precision and recall into a single metric via their harmonic mean. The usual formula is:" + ] + }, + { + "cell_type": "markdown", + "id": "621a2e8b", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "F_1 =2\\frac{\\text{Precision}\\times\\text{Recall}}{\\text{Precision} + \\text{Recall}}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "62eee54a", + "metadata": { + "editable": true + }, + "source": [ + "This can be shown to equal" + ] + }, + { + "cell_type": "markdown", + "id": "7a6a2e7a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{2\\,TP}{2\\,TP + FP + FN}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b96c9ff4", + "metadata": { + "editable": true + }, + "source": [ + "The F$_1$ score ranges from 0 (worst) to 1 (best), and balances the\n", + "trade-off between precision and recall.\n", + "\n", + "For multi-class classification, one computes per-class\n", + "precision/recall/F$_1$ (treating each class as “positive” in a\n", + "one-vs-rest manner) and then averages. Common averaging methods are:\n", + "\n", + "Micro-averaging: Sum all true positives, false positives, and false negatives across classes, then compute precision/recall/F$_1$ from these totals.\n", + "Macro-averaging: Compute the F$1$ score $F{1,i}$ for each class $i$ separately, then take the unweighted mean: $F_{1,\\mathrm{macro}} = \\frac{1}{K}\\sum_{i=1}^K F_{1,i}$ . This treats all classes equally regardless of size.\n", + "Weighted-averaging: Like macro-average, but weight each class’s $F_{1,i}$ by its support $n_i$ (true count): $F_{1,\\mathrm{weighted}} = \\frac{1}{N}\\sum_{i=1}^K n_i F_{1,i}$, where $N=\\sum_i n_i$. This accounts for class imbalance by giving more weight to larger classes .\n", + "\n", + "Each of these averages has different use-cases. 
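In scikit-learn these three schemes correspond to the `average` argument of `f1_score`; a brief illustration with made-up labels:

```python
# Illustration of micro, macro and weighted averaging with scikit-learn's f1_score
from sklearn.metrics import f1_score

y_true = [0, 0, 0, 1, 1, 2]
y_pred = [0, 0, 1, 1, 2, 2]

print(f1_score(y_true, y_pred, average="micro"))     # from global TP/FP/FN counts
print(f1_score(y_true, y_pred, average="macro"))     # unweighted mean over classes
print(f1_score(y_true, y_pred, average="weighted"))  # weighted by class support
```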
Micro-average is\n", + "dominated by common classes, macro-average highlights performance on\n", + "rare classes, and weighted-average is a compromise. These formulas\n", + "and concepts allow rigorous evaluation of classifier performance in\n", + "both binary and multi-class settings." + ] + }, + { + "cell_type": "markdown", + "id": "9274bf3f", + "metadata": { + "editable": true + }, + "source": [ + "## Exercises\n", + "\n", + "Here is a simple code example which uses the Logistic regression machinery from **scikit-learn**.\n", + "At the end it sets up the confusion matrix and the ROC and cumulative gain curves.\n", + "Feel free to use these functionalities (we don't expect you to write your own code for say the confusion matrix)." + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "be9ff0b9", + "metadata": { + "collapsed": false, + "editable": true, + "jupyter": { + "outputs_hidden": false + } + }, + "outputs": [], + "source": [ + "%matplotlib inline\n", + "\n", + "import matplotlib.pyplot as plt\n", + "import numpy as np\n", + "from sklearn.model_selection import train_test_split \n", + "# from sklearn.datasets import fill in the data set\n", + "from sklearn.linear_model import LogisticRegression\n", + "\n", + "# Load the data, fill inn\n", + "mydata.data = ?\n", + "\n", + "X_train, X_test, y_train, y_test = train_test_split(mydata.data,cancer.target,random_state=0)\n", + "print(X_train.shape)\n", + "print(X_test.shape)\n", + "# Logistic Regression\n", + "# define which type of problem, binary or multiclass\n", + "logreg = LogisticRegression(solver='lbfgs')\n", + "logreg.fit(X_train, y_train)\n", + "\n", + "from sklearn.preprocessing import LabelEncoder\n", + "from sklearn.model_selection import cross_validate\n", + "#Cross validation\n", + "accuracy = cross_validate(logreg,X_test,y_test,cv=10)['test_score']\n", + "print(accuracy)\n", + "print(\"Test set accuracy with Logistic Regression: {:.2f}\".format(logreg.score(X_test,y_test)))\n", + "\n", + "import scikitplot as skplt\n", + "y_pred = logreg.predict(X_test)\n", + "skplt.metrics.plot_confusion_matrix(y_test, y_pred, normalize=True)\n", + "plt.show()\n", + "y_probas = logreg.predict_proba(X_test)\n", + "skplt.metrics.plot_roc(y_test, y_probas)\n", + "plt.show()\n", + "skplt.metrics.plot_cumulative_gain(y_test, y_probas)\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "51760b3e", + "metadata": { + "editable": true + }, + "source": [ + "### Exercise a)\n", + "\n", + "Convince yourself about the mathematics for the confusion matrix, the ROC and the cumlative gain curves for both a binary and a multiclass classification problem." + ] + }, + { + "cell_type": "markdown", + "id": "c1d42f5f", + "metadata": { + "editable": true + }, + "source": [ + "### Exercise b)\n", + "\n", + "Use a binary classification data available from **scikit-learn**. As an example you can use\n", + "the MNIST data set and just specialize to two numbers. 
To do so you can use the following code lines" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "d20bb8be", + "metadata": { + "collapsed": false, + "editable": true, + "jupyter": { + "outputs_hidden": false + } + }, + "outputs": [], + "source": [ + "from sklearn.datasets import load_digits\n", + "digits = load_digits(n_class=2) # Load only two classes, e.g., 0 and 1\n", + "X, y = digits.data, digits.target" + ] + }, + { + "cell_type": "markdown", + "id": "828ea1cd", + "metadata": { + "editable": true + }, + "source": [ + "Alternatively, you can use the _make$\\_$classification_\n", + "functionality. This function generates a random $n$-class classification\n", + "dataset, which can be configured for binary classification by setting\n", + "n_classes=2. You can also control the number of samples, features,\n", + "informative features, redundant features, and more." + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "d271f0ba", + "metadata": { + "collapsed": false, + "editable": true, + "jupyter": { + "outputs_hidden": false + } + }, + "outputs": [], + "source": [ + "from sklearn.datasets import make_classification\n", + "X, y = make_classification(n_samples=1000, n_features=20, n_informative=10, n_redundant=5, n_classes=2, random_state=42)" + ] + }, + { + "cell_type": "markdown", + "id": "0068b032", + "metadata": { + "editable": true + }, + "source": [ + "You can use this option for the multiclass case as well, see the next exercise.\n", + "If you prefer to study other binary classification datasets, feel free\n", + "to replace the above suggestions with your own dataset.\n", + "\n", + "Make plots of the confusion matrix, the ROC curve and the cumulative gain curve." + ] + }, + { + "cell_type": "markdown", + "id": "c45f5b41", + "metadata": { + "editable": true + }, + "source": [ + "### Exercise c) week 43\n", + "\n", + "As a multiclass problem, we will use the Iris data set discussed in\n", + "the exercises from weeks 41 and 42. This is a three-class data set and\n", + "you can set it up using **scikit-learn**," + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "3b045d56", + "metadata": { + "collapsed": false, + "editable": true, + "jupyter": { + "outputs_hidden": false + } + }, + "outputs": [], + "source": [ + "from sklearn.datasets import load_iris\n", + "iris = load_iris()\n", + "X = iris.data # Features\n", + "y = iris.target # Target labels" + ] + }, + { + "cell_type": "markdown", + "id": "14cc859c", + "metadata": { + "editable": true + }, + "source": [ + "Make plots of the confusion matrix, the ROC curve and the cumulative\n", + "gain curve for this (or other) multiclass data set." 
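If you prefer to stay within scikit-learn instead of using scikitplot, a minimal hedged sketch of the confusion matrix and one-vs-rest ROC curves for the Iris data could look as follows (the cumulative gain curve can still be produced with `skplt.metrics.plot_cumulative_gain` as in the example above):

```python
# Hedged sketch: multiclass confusion matrix and one-vs-rest ROC curves for Iris
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import ConfusionMatrixDisplay, auc, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import label_binarize

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Confusion matrix
ConfusionMatrixDisplay.from_estimator(clf, X_test, y_test)
plt.show()

# One-vs-rest ROC curves from the predicted class probabilities
y_test_bin = label_binarize(y_test, classes=[0, 1, 2])
y_probas = clf.predict_proba(X_test)
for i in range(3):
    fpr, tpr, _ = roc_curve(y_test_bin[:, i], y_probas[:, i])
    plt.plot(fpr, tpr, label=f"class {i} (AUC = {auc(fpr, tpr):.2f})")
plt.plot([0, 1], [0, 1], "k--", label="random")
plt.xlabel("False positive rate")
plt.ylabel("True positive rate")
plt.legend()
plt.show()
```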
+ ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.15" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} \ No newline at end of file diff --git a/doc/LectureNotes/_build/jupyter_execute/exercisesweek44.ipynb b/doc/LectureNotes/_build/jupyter_execute/exercisesweek44.ipynb new file mode 100644 index 000000000..e218688d9 --- /dev/null +++ b/doc/LectureNotes/_build/jupyter_execute/exercisesweek44.ipynb @@ -0,0 +1,182 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "55f7cd56", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "id": "37c83276", + "metadata": { + "editable": true + }, + "source": [ + "# Exercises week 44\n", + "\n", + "**October 27-31, 2025**\n", + "\n", + "Date: **Deadline is Friday October 31 at midnight**\n" + ] + }, + { + "cell_type": "markdown", + "id": "58a26983", + "metadata": { + "editable": true + }, + "source": [ + "# Overarching aims of the exercises this week\n", + "\n", + "The exercise set this week has two parts.\n", + "\n", + "1. The first is a version of the exercises from week 39, where you got started with the report and github repository for project 1, only this time for project 2. This part is required, and a short feedback to this exercise will be available before the project deadline. And you can reuse these elements in your final report.\n", + "\n", + "2. The second is a list of questions meant as a summary of many of the central elements we have discussed in connection with projects 1 and 2, with a slight bias towards deep learning methods and their training. The hope is that these exercises can be of use in your discussions about the neural network results in project 2. **You don't need to answer all the questions, but you should be able to answer them by the end of working on project 2.**\n" + ] + }, + { + "cell_type": "markdown", + "id": "350c58e2", + "metadata": { + "vscode": { + "languageId": "plaintext" + } + }, + "source": [ + "### Deliverables\n", + "\n", + "First, join a group in canvas with your group partners. Pick an avaliable group for Project 2 in the “People” page. If you don't have a group, you should really consider joining one!\n", + "\n", + "Complete exercise 1 while working in an Overleaf project. Then, in canvas, include\n", + "\n", + "- An exported PDF of the report draft you have been working on.\n", + "- A comment linking to the github repository used in exercise **1d)**\n" + ] + }, + { + "cell_type": "markdown", + "id": "00f65f6e", + "metadata": {}, + "source": [ + "## Exercise 1:\n", + "\n", + "Following the same directions as in the weekly exercises for week 39:\n", + "\n", + "**a)** Create a report document in Overleaf, and write a suitable abstract and introduction for project 2.\n", + "\n", + "**b)** Add a figure in your report of a heatmap showing the test accuracy of a neural network with [0, 1, 2, 3] hidden layers and [5, 10, 25, 50] nodes per hidden layer.\n", + "\n", + "**c)** Add a figure in your report which meets as few requirements as possible of what we consider a good figure in this course, while still including some results, a title, figure text, and axis labels. 
Describe in the text of the report the different ways in which the figure is lacking. (This should not be included in the final report for project 2.)\n", + "\n", + "**d)** Create a github repository or folder in a repository with all the elements described in exercise 4 of the weekly exercises of week 39.\n", + "\n", + "**e)** If applicable, add references to your report for the source of your data for regression and classification, the source of claims you make about your data, and for the sources of the gradient optimizers you use and your general claims about these.\n" + ] + }, + { + "cell_type": "markdown", + "id": "6dff53b8", + "metadata": { + "editable": true + }, + "source": [ + "## Exercise 2:\n", + "\n", + "**a)** Linear and logistic regression methods\n", + "\n", + "1. What is the main difference between ordinary least squares and Ridge regression?\n", + "\n", + "2. Which kind of data set would you use logistic regression for?\n", + "\n", + "3. In linear regression you assume that your output is described by a continuous non-stochastic function $f(x)$. Which is the equivalent function in logistic regression?\n", + "\n", + "4. Can you find an analytic solution to a logistic regression type of problem?\n", + "\n", + "5. What kind of cost function would you use in logistic regression?\n" + ] + }, + { + "cell_type": "markdown", + "id": "21a056a4", + "metadata": { + "editable": true + }, + "source": [ + "**b)** Deep learning\n", + "\n", + "1. What is an activation function and discuss the use of an activation function? Explain three different types of activation functions?\n", + "\n", + "2. Describe the architecture of a typical feed forward Neural Network (NN).\n", + "\n", + "3. You are using a deep neural network for a prediction task. After training your model, you notice that it is strongly overfitting the training set and that the performance on the test isn’t good. What can you do to reduce overfitting?\n", + "\n", + "4. How would you know if your model is suffering from the problem of exploding gradients?\n", + "\n", + "5. Can you name and explain a few hyperparameters used for training a neural network?\n", + "\n", + "6. Describe the architecture of a typical Convolutional Neural Network (CNN)\n", + "\n", + "7. What is the vanishing gradient problem in Neural Networks and how to fix it?\n", + "\n", + "8. When it comes to training an artificial neural network, what could the reason be for why the cost/loss doesn't decrease in a few epochs?\n", + "\n", + "9. How does L1/L2 regularization affect a neural network?\n", + "\n", + "10. What is(are) the advantage(s) of deep learning over traditional methods like linear regression or logistic regression?\n" + ] + }, + { + "cell_type": "markdown", + "id": "7c48bc09", + "metadata": { + "editable": true + }, + "source": [ + "**c)** Optimization part\n", + "\n", + "1. Which is the basic mathematical root-finding method behind essentially all gradient descent approaches(stochastic and non-stochastic)?\n", + "\n", + "2. And why don't we use it? Or stated differently, why do we introduce the learning rate as a parameter?\n", + "\n", + "3. What might happen if you set the momentum hyperparameter too close to 1 (e.g., 0.9999) when using an optimizer for the learning rate?\n", + "\n", + "4. Why should we use stochastic gradient descent instead of plain gradient descent?\n", + "\n", + "5. 
Which parameters would you need to tune when use a stochastic gradient descent approach?\n" + ] + }, + { + "cell_type": "markdown", + "id": "56b0b5f6", + "metadata": { + "editable": true + }, + "source": [ + "**d)** Analysis of results\n", + "\n", + "1. How do you assess overfitting and underfitting?\n", + "\n", + "2. Why do we divide the data in test and train and/or eventually validation sets?\n", + "\n", + "3. Why would you use resampling methods in the data analysis? Mention some widely popular resampling methods.\n", + "\n", + "4. Why might a model that does not overfit the data (maybe because there is a lot of data) perform worse when we add regularization?\n" + ] + } + ], + "metadata": { + "language_info": { + "name": "python" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} \ No newline at end of file diff --git a/doc/LectureNotes/_build/jupyter_execute/project1.ipynb b/doc/LectureNotes/_build/jupyter_execute/project1.ipynb index 85723ce73..3cca77865 100644 --- a/doc/LectureNotes/_build/jupyter_execute/project1.ipynb +++ b/doc/LectureNotes/_build/jupyter_execute/project1.ipynb @@ -9,7 +9,7 @@ "source": [ "\n", - "" + "\n" ] }, { @@ -20,9 +20,34 @@ }, "source": [ "# Project 1 on Machine Learning, deadline October 6 (midnight), 2025\n", + "\n", "**Data Analysis and Machine Learning FYS-STK3155/FYS4155**, University of Oslo, Norway\n", "\n", - "Date: **September 2**" + "Date: **September 2**\n" + ] + }, + { + "cell_type": "markdown", + "id": "beb333e3", + "metadata": {}, + "source": [ + "### Deliverables\n", + "\n", + "First, join a group in canvas with your group partners. Pick an avaliable group for Project 1 in the \"People\" page.\n", + "\n", + "In canvas, deliver as a group and include:\n", + "\n", + "- A PDF of your report which follows the guidelines covered below and in the week 39 exercises. Additional requirements include:\n", + " - It should be around 5000 words, use the word counter in Overleaf for this. This often corresponds to 10-12 pages. References and appendices are excluded from the word count\n", + " - It should include around 10-15 figures. You can include more figures in appendices and/or as supplemental material in your repository.\n", + "- A comment linking to your github repository (or folder in one of your github repositories) for this project. The repository must include\n", + " - A PDF file of the report\n", + " - A folder named Code, where you put python files for your functions and notebooks for reproducing your results. Remember to use a seed for generating random data and for train-test splits when generating final results.\n", + " - A README file with\n", + " - the name of the group members\n", + " - a short description of the project\n", + " - a description of how to install the required packages to run your code from a requirements.txt file or similar (such as a plain text description)\n", + " - names and descriptions of the various notebooks in the Code folder and the results they produce\n" ] }, { @@ -35,7 +60,7 @@ "## Preamble: Note on writing reports, using reference material, AI and other tools\n", "\n", "We want you to answer the three different projects by handing in\n", - "reports written like a standard scientific/technical report. The\n", + "reports written like a standard scientific/technical report. The\n", "links at\n", "\n", "contain more information. There you can find examples of previous\n", @@ -63,14 +88,14 @@ "been studied in the scientific literature. This makes it easier for\n", "you to compare and analyze your results. 
Comparing with existing\n", "results from the scientific literature is also an essential element of\n", - "the scientific discussion. The University of California at Irvine\n", + "the scientific discussion. The University of California at Irvine\n", "with its Machine Learning repository at\n", " is an excellent site to\n", "look up for examples and\n", "inspiration. [Kaggle.com](https://www.kaggle.com/) is an equally\n", "interesting site. Feel free to explore these sites. When selecting\n", "other data sets, make sure these are sets used for regression problems\n", - "(not classification)." + "(not classification).\n" ] }, { @@ -90,7 +115,7 @@ "We will study how to fit polynomials to specific\n", "one-dimensional functions (feel free to replace the suggested function with more complicated ones).\n", "\n", - "We will use Runge's function (see for a discussion). The one-dimensional function we will study is" + "We will use Runge's function (see for a discussion). The one-dimensional function we will study is\n" ] }, { @@ -102,7 +127,7 @@ "source": [ "$$\n", "f(x) = \\frac{1}{1+25x^2}.\n", - "$$" + "$$\n" ] }, { @@ -114,14 +139,14 @@ "source": [ "Our first step will be to perform an OLS regression analysis of this\n", "function, trying out a polynomial fit with an $x$ dependence of the\n", - "form $[x,x^2,\\dots]$. You can use a uniform distribution to set up the\n", + "form $[x,x^2,\\dots]$. You can use a uniform distribution to set up the\n", "arrays of values for $x \\in [-1,1]$, or alternatively use a fixed step size.\n", "Thereafter we will repeat many of the same steps when using the Ridge and Lasso regression methods,\n", - "introducing thereby a dependence on the hyperparameter (penalty) $\\lambda$.\n", + "introducing thereby a dependence on the hyperparameter (penalty) $\\lambda$.\n", "\n", "We will also include bootstrap as a resampling technique in order to\n", - "study the so-called **bias-variance tradeoff**. After that we will\n", - "include the so-called cross-validation technique." + "study the so-called **bias-variance tradeoff**. After that we will\n", + "include the so-called cross-validation technique.\n" ] }, { @@ -133,15 +158,15 @@ "source": [ "### Part a : Ordinary Least Square (OLS) for the Runge function\n", "\n", - "We will generate our own dataset for abovementioned function\n", + "We will generate our own dataset for abovementioned function\n", "$\\mathrm{Runge}(x)$ function with $x\\in [-1,1]$. You should explore also the addition\n", "of an added stochastic noise to this function using the normal\n", "distribution $N(0,1)$.\n", "\n", - "*Write your own code* (using for example the pseudoinverse function **pinv** from **Numpy** ) and perform a standard **ordinary least square regression**\n", - "analysis using polynomials in $x$ up to order $15$ or higher. Explore the dependence on the number of data points and the polynomial degree.\n", + "_Write your own code_ (using for example the pseudoinverse function **pinv** from **Numpy** ) and perform a standard **ordinary least square regression**\n", + "analysis using polynomials in $x$ up to order $15$ or higher. 
Explore the dependence on the number of data points and the polynomial degree.\n", "\n", - "Evaluate the mean Squared error (MSE)" + "Evaluate the mean Squared error (MSE)\n" ] }, { @@ -154,7 +179,7 @@ "$$\n", "MSE(\\boldsymbol{y},\\tilde{\\boldsymbol{y}}) = \\frac{1}{n}\n", "\\sum_{i=0}^{n-1}(y_i-\\tilde{y}_i)^2,\n", - "$$" + "$$\n" ] }, { @@ -164,9 +189,9 @@ "editable": true }, "source": [ - "and the $R^2$ score function. If $\\tilde{\\boldsymbol{y}}_i$ is the predicted\n", + "and the $R^2$ score function. If $\\tilde{\\boldsymbol{y}}_i$ is the predicted\n", "value of the $i-th$ sample and $y_i$ is the corresponding true value,\n", - "then the score $R^2$ is defined as" + "then the score $R^2$ is defined as\n" ] }, { @@ -178,7 +203,7 @@ "source": [ "$$\n", "R^2(\\boldsymbol{y}, \\tilde{\\boldsymbol{y}}) = 1 - \\frac{\\sum_{i=0}^{n - 1} (y_i - \\tilde{y}_i)^2}{\\sum_{i=0}^{n - 1} (y_i - \\bar{y})^2},\n", - "$$" + "$$\n" ] }, { @@ -188,7 +213,7 @@ "editable": true }, "source": [ - "where we have defined the mean value of $\\boldsymbol{y}$ as" + "where we have defined the mean value of $\\boldsymbol{y}$ as\n" ] }, { @@ -200,7 +225,7 @@ "source": [ "$$\n", "\\bar{y} = \\frac{1}{n} \\sum_{i=0}^{n - 1} y_i.\n", - "$$" + "$$\n" ] }, { @@ -215,23 +240,23 @@ "\n", "Your code has to include a scaling/centering of the data (for example by\n", "subtracting the mean value), and\n", - "a split of the data in training and test data. For the scaling you can\n", + "a split of the data in training and test data. For the scaling you can\n", "either write your own code or use for example the function for\n", "splitting training data provided by the library **Scikit-Learn** (make\n", - "sure you have installed it). This function is called\n", - "$train\\_test\\_split$. **You should present a critical discussion of why and how you have scaled or not scaled the data**.\n", + "sure you have installed it). This function is called\n", + "$train\\_test\\_split$. **You should present a critical discussion of why and how you have scaled or not scaled the data**.\n", "\n", "It is normal in essentially all Machine Learning studies to split the\n", - "data in a training set and a test set (eventually also an additional\n", - "validation set). There\n", + "data in a training set and a test set (eventually also an additional\n", + "validation set). There\n", "is no explicit recipe for how much data should be included as training\n", - "data and say test data. An accepted rule of thumb is to use\n", + "data and say test data. An accepted rule of thumb is to use\n", "approximately $2/3$ to $4/5$ of the data as training data.\n", "\n", "You can easily reuse the solutions to your exercises from week 35.\n", "See also the lecture slides from week 35 and week 36.\n", "\n", - "On scaling, we recommend reading the following section from the scikit-learn software description, see ." + "On scaling, we recommend reading the following section from the scikit-learn software description, see .\n" ] }, { @@ -241,14 +266,14 @@ "editable": true }, "source": [ - "### Part b: Adding Ridge regression for the Runge function\n", + "### Part b: Adding Ridge regression for the Runge function\n", "\n", "Write your own code for the Ridge method as done in the previous\n", - "exercise. The lecture notes from week 35 and 36 contain more information. Furthermore, the results from the exercise set from week 36 is something you can reuse here.\n", + "exercise. The lecture notes from week 35 and 36 contain more information. 
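For reference, a hedged sketch of the closed-form Ridge estimator on centred data (reusing the design matrix from part a)) is:

```python
# Hedged sketch: closed-form Ridge solution theta = (X^T X + lambda I)^(-1) X^T y.
# Assumes X and y are already centred, so the intercept is not penalized.
import numpy as np

def ridge_theta(X, y, lmbda):
    p = X.shape[1]
    return np.linalg.pinv(X.T @ X + lmbda * np.eye(p)) @ (X.T @ y)
```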
Furthermore, the results from the exercise set from week 36 is something you can reuse here.\n", "\n", "Perform the same analysis as you did in the previous exercise but now for different values of $\\lambda$. Compare and\n", - "analyze your results with those obtained in part a) with the OLS method. Study the\n", - "dependence on $\\lambda$." + "analyze your results with those obtained in part a) with the OLS method. Study the\n", + "dependence on $\\lambda$.\n" ] }, { @@ -267,7 +292,7 @@ "from week 36).\n", "\n", "Study and compare your results from parts a) and b) with your gradient\n", - "descent approch. Discuss in particular the role of the learning rate." + "descent approch. Discuss in particular the role of the learning rate.\n" ] }, { @@ -283,7 +308,7 @@ "the gradient descent method by including **momentum**, **ADAgrad**,\n", "**RMSprop** and **ADAM** as methods fro iteratively updating your learning\n", "rate. Discuss the results and compare the different methods applied to\n", - "the one-dimensional Runge function. The lecture notes from week 37 contain several examples on how to implement these methods." + "the one-dimensional Runge function. The lecture notes from week 37 contain several examples on how to implement these methods.\n" ] }, { @@ -299,12 +324,12 @@ "represents our first encounter with a machine learning method which\n", "cannot be solved through analytical expressions (as in OLS and Ridge regression). Use the gradient\n", "descent methods you developed in parts c) and d) to solve the LASSO\n", - "optimization problem. You can compare your results with \n", + "optimization problem. You can compare your results with\n", "the functionalities of **Scikit-Learn**.\n", "\n", "Discuss (critically) your results for the Runge function from OLS,\n", "Ridge and LASSO regression using the various gradient descent\n", - "approaches." + "approaches.\n" ] }, { @@ -319,7 +344,7 @@ "Our last gradient step is to include stochastic gradient descent using\n", "the same methods to update the learning rates as in parts c-e).\n", "Compare and discuss your results with and without stochastic gradient\n", - "and give a critical assessment of the various methods." + "and give a critical assessment of the various methods.\n" ] }, { @@ -332,14 +357,14 @@ "### Part g: Bias-variance trade-off and resampling techniques\n", "\n", "Our aim here is to study the bias-variance trade-off by implementing\n", - "the **bootstrap** resampling technique. **We will only use the simpler\n", + "the **bootstrap** resampling technique. **We will only use the simpler\n", "ordinary least squares here**.\n", "\n", - "With a code which does OLS and includes resampling techniques, \n", + "With a code which does OLS and includes resampling techniques,\n", "we will now discuss the bias-variance trade-off in the context of\n", "continuous predictions such as regression. However, many of the\n", "intuitions and ideas discussed here also carry over to classification\n", - "tasks and basically all Machine Learning algorithms. \n", + "tasks and basically all Machine Learning algorithms.\n", "\n", "Before you perform an analysis of the bias-variance trade-off on your\n", "test data, make first a figure similar to Fig. 
2.11 of Hastie,\n", @@ -356,7 +381,7 @@ "dataset $\\mathcal{L}$ consisting of the data\n", "$\\mathbf{X}_\\mathcal{L}=\\{(y_j, \\boldsymbol{x}_j), j=0\\ldots n-1\\}$.\n", "\n", - "We assume that the true data is generated from a noisy model" + "We assume that the true data is generated from a noisy model\n" ] }, { @@ -368,7 +393,7 @@ "source": [ "$$\n", "\\boldsymbol{y}=f(\\boldsymbol{x}) + \\boldsymbol{\\epsilon}.\n", - "$$" + "$$\n" ] }, { @@ -387,7 +412,7 @@ "that is $\\boldsymbol{\\tilde{y}}=\\boldsymbol{X}\\boldsymbol{\\theta}$.\n", "\n", "The parameters $\\boldsymbol{\\theta}$ are in turn found by optimizing the mean\n", - "squared error via the so-called cost function" + "squared error via the so-called cost function\n" ] }, { @@ -399,7 +424,7 @@ "source": [ "$$\n", "C(\\boldsymbol{X},\\boldsymbol{\\theta}) =\\frac{1}{n}\\sum_{i=0}^{n-1}(y_i-\\tilde{y}_i)^2=\\mathbb{E}\\left[(\\boldsymbol{y}-\\boldsymbol{\\tilde{y}})^2\\right].\n", - "$$" + "$$\n" ] }, { @@ -409,14 +434,14 @@ "editable": true }, "source": [ - "Here the expected value $\\mathbb{E}$ is the sample value. \n", + "Here the expected value $\\mathbb{E}$ is the sample value.\n", "\n", "Show that you can rewrite this in terms of a term which contains the\n", "variance of the model itself (the so-called variance term), a term\n", "which measures the deviation from the true data and the mean value of\n", "the model (the bias term) and finally the variance of the noise.\n", "\n", - "That is, show that" + "That is, show that\n" ] }, { @@ -428,7 +453,7 @@ "source": [ "$$\n", "\\mathbb{E}\\left[(\\boldsymbol{y}-\\boldsymbol{\\tilde{y}})^2\\right]=\\mathrm{Bias}[\\tilde{y}]+\\mathrm{var}[\\tilde{y}]+\\sigma^2,\n", - "$$" + "$$\n" ] }, { @@ -438,7 +463,7 @@ "editable": true }, "source": [ - "with (we approximate $f(\\boldsymbol{x})\\approx \\boldsymbol{y}$)" + "with (we approximate $f(\\boldsymbol{x})\\approx \\boldsymbol{y}$)\n" ] }, { @@ -450,7 +475,7 @@ "source": [ "$$\n", "\\mathrm{Bias}[\\tilde{y}]=\\mathbb{E}\\left[\\left(\\boldsymbol{y}-\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right]\\right)^2\\right],\n", - "$$" + "$$\n" ] }, { @@ -460,7 +485,7 @@ "editable": true }, "source": [ - "and" + "and\n" ] }, { @@ -472,7 +497,7 @@ "source": [ "$$\n", "\\mathrm{var}[\\tilde{y}]=\\mathbb{E}\\left[\\left(\\tilde{\\boldsymbol{y}}-\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right]\\right)^2\\right]=\\frac{1}{n}\\sum_i(\\tilde{y}_i-\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right])^2.\n", - "$$" + "$$\n" ] }, { @@ -482,11 +507,11 @@ "editable": true }, "source": [ - "**Important note**: Since the function $f(x)$ is unknown, in order to be able to evalute the bias, we replace $f(\\boldsymbol{x})$ in the expression for the bias with $\\boldsymbol{y}$. \n", + "**Important note**: Since the function $f(x)$ is unknown, in order to be able to evalute the bias, we replace $f(\\boldsymbol{x})$ in the expression for the bias with $\\boldsymbol{y}$.\n", "\n", "The answer to this exercise should be included in the theory part of\n", - "the report. This exercise is also part of the weekly exercises of\n", - "week 38. Explain what the terms mean and discuss their\n", + "the report. This exercise is also part of the weekly exercises of\n", + "week 38. 
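One possible starting point for the derivation (a sketch only, other routes exist) is to add and subtract $\mathbb{E}\left[\boldsymbol{\tilde{y}}\right]$ inside the square,

$$
\mathbb{E}\left[(\boldsymbol{y}-\boldsymbol{\tilde{y}})^2\right]
=\mathbb{E}\left[\left(f(\boldsymbol{x})+\boldsymbol{\epsilon}
-\mathbb{E}\left[\boldsymbol{\tilde{y}}\right]
+\mathbb{E}\left[\boldsymbol{\tilde{y}}\right]-\boldsymbol{\tilde{y}}\right)^2\right],
$$

after which the cross terms vanish in expectation since $\mathbb{E}[\boldsymbol{\epsilon}]=0$ and $\mathbb{E}\left[\boldsymbol{\tilde{y}}-\mathbb{E}\left[\boldsymbol{\tilde{y}}\right]\right]=0$, leaving the bias, variance and noise terms given above.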
Explain what the terms mean and discuss their\n", "interpretations.\n", "\n", "Perform then a bias-variance analysis of the Runge function by\n", @@ -495,7 +520,7 @@ "Discuss the bias and variance trade-off as function\n", "of your model complexity (the degree of the polynomial) and the number\n", "of data points, and possibly also your training and test data using the **bootstrap** resampling method.\n", - "You can follow the code example in the jupyter-book at ." + "You can follow the code example in the jupyter-book at .\n" ] }, { @@ -505,20 +530,20 @@ "editable": true }, "source": [ - "### Part h): Cross-validation as resampling techniques, adding more complexity\n", + "### Part h): Cross-validation as resampling techniques, adding more complexity\n", "\n", "The aim here is to implement another widely popular\n", - "resampling technique, the so-called cross-validation method. \n", + "resampling technique, the so-called cross-validation method.\n", "\n", "Implement the $k$-fold cross-validation algorithm (feel free to use\n", "the functionality of **Scikit-Learn** or write your own code) and\n", "evaluate again the MSE function resulting from the test folds.\n", "\n", "Compare the MSE you get from your cross-validation code with the one\n", - "you got from your **bootstrap** code from the previous exercise. Comment and interpret your results. \n", + "you got from your **bootstrap** code from the previous exercise. Comment and interpret your results.\n", "\n", "In addition to using the ordinary least squares method, you should\n", - "include both Ridge and Lasso regression in the final analysis." + "include both Ridge and Lasso regression in the final analysis.\n" ] }, { @@ -532,7 +557,7 @@ "\n", "1. For a discussion and derivation of the variances and mean squared errors using linear regression, see the [Lecture notes on ridge regression by Wessel N. van Wieringen](https://arxiv.org/abs/1509.09169)\n", "\n", - "2. The textbook of [Trevor Hastie, Robert Tibshirani, Jerome H. Friedman, The Elements of Statistical Learning, Springer](https://www.springer.com/gp/book/9780387848570), chapters 3 and 7 are the most relevant ones for the analysis of parts g) and h)." + "2. The textbook of [Trevor Hastie, Robert Tibshirani, Jerome H. Friedman, The Elements of Statistical Learning, Springer](https://www.springer.com/gp/book/9780387848570), chapters 3 and 7 are the most relevant ones for the analysis of parts g) and h).\n" ] }, { @@ -544,25 +569,25 @@ "source": [ "## Introduction to numerical projects\n", "\n", - "Here follows a brief recipe and recommendation on how to answer the various questions when preparing your answers. \n", + "Here follows a brief recipe and recommendation on how to answer the various questions when preparing your answers.\n", "\n", - " * Give a short description of the nature of the problem and the eventual numerical methods you have used.\n", + "- Give a short description of the nature of the problem and the eventual numerical methods you have used.\n", "\n", - " * Describe the algorithm you have used and/or developed. Here you may find it convenient to use pseudocoding. In many cases you can describe the algorithm in the program itself.\n", + "- Describe the algorithm you have used and/or developed. Here you may find it convenient to use pseudocoding. In many cases you can describe the algorithm in the program itself.\n", "\n", - " * Include the source code of your program. Comment your program properly. You should have the code at your GitHub/GitLab link. 
You can also place the code in an appendix of your report.\n", + "- Include the source code of your program. Comment your program properly. You should have the code at your GitHub/GitLab link. You can also place the code in an appendix of your report.\n", "\n", - " * If possible, try to find analytic solutions, or known limits in order to test your program when developing the code.\n", + "- If possible, try to find analytic solutions, or known limits in order to test your program when developing the code.\n", "\n", - " * Include your results either in figure form or in a table. Remember to label your results. All tables and figures should have relevant captions and labels on the axes.\n", + "- Include your results either in figure form or in a table. Remember to label your results. All tables and figures should have relevant captions and labels on the axes.\n", "\n", - " * Try to evaluate the reliabilty and numerical stability/precision of your results. If possible, include a qualitative and/or quantitative discussion of the numerical stability, eventual loss of precision etc.\n", + "- Try to evaluate the reliabilty and numerical stability/precision of your results. If possible, include a qualitative and/or quantitative discussion of the numerical stability, eventual loss of precision etc.\n", "\n", - " * Try to give an interpretation of you results in your answers to the problems.\n", + "- Try to give an interpretation of you results in your answers to the problems.\n", "\n", - " * Critique: if possible include your comments and reflections about the exercise, whether you felt you learnt something, ideas for improvements and other thoughts you've made when solving the exercise. We wish to keep this course at the interactive level and your comments can help us improve it.\n", + "- Critique: if possible include your comments and reflections about the exercise, whether you felt you learnt something, ideas for improvements and other thoughts you've made when solving the exercise. We wish to keep this course at the interactive level and your comments can help us improve it.\n", "\n", - " * Try to establish a practice where you log your work at the computerlab. You may find such a logbook very handy at later stages in your work, especially when you don't properly remember what a previous test version of your program did. Here you could also record the time spent on solving the exercise, various algorithms you may have tested or other topics which you feel worthy of mentioning." + "- Try to establish a practice where you log your work at the computerlab. You may find such a logbook very handy at later stages in your work, especially when you don't properly remember what a previous test version of your program did. Here you could also record the time spent on solving the exercise, various algorithms you may have tested or other topics which you feel worthy of mentioning.\n" ] }, { @@ -574,17 +599,17 @@ "source": [ "## Format for electronic delivery of report and programs\n", "\n", - "The preferred format for the report is a PDF file. You can also use DOC or postscript formats or as an ipython notebook file. As programming language we prefer that you choose between C/C++, Fortran2008, Julia or Python. The following prescription should be followed when preparing the report:\n", + "The preferred format for the report is a PDF file. You can also use DOC or postscript formats or as an ipython notebook file. As programming language we prefer that you choose between C/C++, Fortran2008, Julia or Python. 
The following prescription should be followed when preparing the report:\n", "\n", - " * Use Canvas to hand in your projects, log in at with your normal UiO username and password.\n", + "- Use Canvas to hand in your projects, log in at with your normal UiO username and password.\n", "\n", - " * Upload **only** the report file or the link to your GitHub/GitLab or similar typo of repos! For the source code file(s) you have developed please provide us with your link to your GitHub/GitLab or similar domain. The report file should include all of your discussions and a list of the codes you have developed. Do not include library files which are available at the course homepage, unless you have made specific changes to them.\n", + "- Upload **only** the report file or the link to your GitHub/GitLab or similar typo of repos! For the source code file(s) you have developed please provide us with your link to your GitHub/GitLab or similar domain. The report file should include all of your discussions and a list of the codes you have developed. Do not include library files which are available at the course homepage, unless you have made specific changes to them.\n", "\n", - " * In your GitHub/GitLab or similar repository, please include a folder which contains selected results. These can be in the form of output from your code for a selected set of runs and input parameters.\n", + "- In your GitHub/GitLab or similar repository, please include a folder which contains selected results. These can be in the form of output from your code for a selected set of runs and input parameters.\n", "\n", - "Finally, \n", - "we encourage you to collaborate. Optimal working groups consist of \n", - "2-3 students. You can then hand in a common report." + "Finally,\n", + "we encourage you to collaborate. Optimal working groups consist of\n", + "2-3 students. You can then hand in a common report.\n" ] }, { @@ -596,42 +621,46 @@ "source": [ "## Software and needed installations\n", "\n", - "If you have Python installed (we recommend Python3) and you feel pretty familiar with installing different packages, \n", + "If you have Python installed (we recommend Python3) and you feel pretty familiar with installing different packages,\n", "we recommend that you install the following Python packages via **pip** as\n", + "\n", "1. pip install numpy scipy matplotlib ipython scikit-learn tensorflow sympy pandas pillow\n", "\n", "For Python3, replace **pip** with **pip3**.\n", "\n", - "See below for a discussion of **tensorflow** and **scikit-learn**. \n", + "See below for a discussion of **tensorflow** and **scikit-learn**.\n", "\n", - "For OSX users we recommend also, after having installed Xcode, to install **brew**. Brew allows \n", + "For OSX users we recommend also, after having installed Xcode, to install **brew**. Brew allows\n", "for a seamless installation of additional software via for example\n", + "\n", "1. brew install python3\n", "\n", "For Linux users, with its variety of distributions like for example the widely popular Ubuntu distribution\n", - "you can use **pip** as well and simply install Python as \n", - "1. sudo apt-get install python3 (or python for python2.7)\n", + "you can use **pip** as well and simply install Python as\n", + "\n", + "1. sudo apt-get install python3 (or python for python2.7)\n", + "\n", + "etc etc.\n", "\n", - "etc etc. 
\n", + "If you don't want to install various Python packages with their dependencies separately, we recommend two widely used distrubutions which set up all relevant dependencies for Python, namely\n", "\n", - "If you don't want to install various Python packages with their dependencies separately, we recommend two widely used distrubutions which set up all relevant dependencies for Python, namely\n", "1. [Anaconda](https://docs.anaconda.com/) Anaconda is an open source distribution of the Python and R programming languages for large-scale data processing, predictive analytics, and scientific computing, that aims to simplify package management and deployment. Package versions are managed by the package management system **conda**\n", "\n", - "2. [Enthought canopy](https://www.enthought.com/product/canopy/) is a Python distribution for scientific and analytic computing distribution and analysis environment, available for free and under a commercial license.\n", + "2. [Enthought canopy](https://www.enthought.com/product/canopy/) is a Python distribution for scientific and analytic computing distribution and analysis environment, available for free and under a commercial license.\n", "\n", "Popular software packages written in Python for ML are\n", "\n", - "* [Scikit-learn](http://scikit-learn.org/stable/), \n", + "- [Scikit-learn](http://scikit-learn.org/stable/),\n", "\n", - "* [Tensorflow](https://www.tensorflow.org/),\n", + "- [Tensorflow](https://www.tensorflow.org/),\n", "\n", - "* [PyTorch](http://pytorch.org/) and \n", + "- [PyTorch](http://pytorch.org/) and\n", "\n", - "* [Keras](https://keras.io/).\n", + "- [Keras](https://keras.io/).\n", "\n", - "These are all freely available at their respective GitHub sites. They \n", + "These are all freely available at their respective GitHub sites. They\n", "encompass communities of developers in the thousands or more. And the number\n", - "of code developers and contributors keeps increasing." + "of code developers and contributors keeps increasing.\n" ] } ], diff --git a/doc/LectureNotes/_build/jupyter_execute/project2.ipynb b/doc/LectureNotes/_build/jupyter_execute/project2.ipynb new file mode 100644 index 000000000..f2130ba5a --- /dev/null +++ b/doc/LectureNotes/_build/jupyter_execute/project2.ipynb @@ -0,0 +1,635 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "96e577ca", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "067c02b9", + "metadata": { + "editable": true + }, + "source": [ + "# Project 2 on Machine Learning, deadline November 10 (Midnight)\n", + "**[Data Analysis and Machine Learning FYS-STK3155/FYS4155](http://www.uio.no/studier/emner/matnat/fys/FYS3155/index-eng.html)**, University of Oslo, Norway\n", + "\n", + "Date: **October 14, 2025**" + ] + }, + { + "cell_type": "markdown", + "id": "01f9fedd", + "metadata": { + "editable": true + }, + "source": [ + "## Deliverables\n", + "\n", + "First, join a group in canvas with your group partners. Pick an avaliable group for Project 2 in the **People** page.\n", + "\n", + "In canvas, deliver as a group and include:\n", + "\n", + "* A PDF of your report which follows the guidelines covered below and in the week 39 exercises. Additional requirements include:\n", + "\n", + " * It should be around 5000 words, use the word counter in Overleaf for this. This often corresponds to 10-12 pages. References and appendices are excluded from the word count\n", + "\n", + " * It should include around 10-15 figures. 
You can include more figures in appendices and/or as supplemental material in your repository.\n", + "\n", + "* A comment linking to your github repository (or folder in one of your github repositories) for this project. The repository must include\n", + "\n", + "A PDF file of the report\n", + " * A folder named Code, where you put python files for your functions and notebooks for reproducing your results. Remember to use a seed for generating random data and for train-test splits when generating final results.\n", + "\n", + " * A README file with the name of the group members\n", + "\n", + " * a short description of the project\n", + "\n", + " * a description of how to install the required packages to run your code from a requirements.txt file or similar (such as a plain text description) names and descriptions of the various notebooks in the Code folder and the results they produce" + ] + }, + { + "cell_type": "markdown", + "id": "9f8e4871", + "metadata": { + "editable": true + }, + "source": [ + "### Preamble: Note on writing reports, using reference material, AI and other tools\n", + "\n", + "We want you to answer the three different projects by handing in\n", + "reports written like a standard scientific/technical report. The links\n", + "at\n", + "https://github.com/CompPhysics/MachineLearning/tree/master/doc/Projects/n", + "contain more information. There you can find examples of previous\n", + "reports, the projects themselves, how we grade reports etc. How to\n", + "write reports will also be discussed during the various lab\n", + "sessions. Please do ask us if you are in doubt.\n", + "\n", + "When using codes and material from other sources, you should refer to\n", + "these in the bibliography of your report, indicating wherefrom you for\n", + "example got the code, whether this is from the lecture notes,\n", + "softwares like Scikit-Learn, TensorFlow, PyTorch or other\n", + "sources. These sources should always be cited correctly. How to cite\n", + "some of the libraries is often indicated from their corresponding\n", + "GitHub sites or websites, see for example how to cite Scikit-Learn at\n", + "https://scikit-learn.org/dev/about.html./n", + "\n", + "We enocurage you to use tools like ChatGPT or similar in writing the\n", + "report. If you use for example ChatGPT, please do cite it properly and\n", + "include (if possible) your questions and answers as an addition to the\n", + "report. This can be uploaded to for example your website,\n", + "GitHub/GitLab or similar as supplemental material.\n", + "\n", + "If you would like to study other data sets, feel free to propose other\n", + "sets. What we have proposed here are mere suggestions from our\n", + "side. If you opt for another data set, consider using a set which has\n", + "been studied in the scientific literature. This makes it easier for\n", + "you to compare and analyze your results. Comparing with existing\n", + "results from the scientific literature is also an essential element of\n", + "the scientific discussion. The University of California at Irvine with\n", + "its Machine Learning repository at\n", + "https://archive.ics.uci.edu/ml/index.php%20is%20an%20excellent%20site%20to%20look/n", + "up for examples and inspiration. Kaggle.com is an equally interesting\n", + "site. Feel free to explore these sites." 
+ ] + }, + { + "cell_type": "markdown", + "id": "460cc6ea", + "metadata": { + "editable": true + }, + "source": [ + "## Classification and Regression, writing our own neural network code\n", + "\n", + "The main aim of this project is to study both classification and\n", + "regression problems by developing our own \n", + "feed-forward neural network (FFNN) code. The exercises from week 41 and 42 (see and ) as well as the lecture material from the same weeks (see and ) should contain enough information for you to get started with writing your own code.\n", + "\n", + "We will also reuse our codes on gradient descent methods from project 1.\n", + "\n", + "The data sets that we propose here are (the default sets)\n", + "\n", + "* Regression (fitting a continuous function). In this part you will need to bring back your results from project 1 and compare these with what you get from your Neural Network code to be developed here. The data sets could be\n", + "\n", + " * The simple one-dimensional function Runge function from project 1, that is $f(x) = \\frac{1}{1+25x^2}$. We recommend using a simpler function when developing your neural network code for regression problems. Feel however free to discuss and study other functions, such as the two-dimensional Runge function $f(x,y)=\\left[(10x - 5)^2 + (10y - 5)^2 + 1 \\right]^{-1}$, or even more complicated two-dimensional functions (see the supplementary material of for an extensive list of two-dimensional functions). \n", + "\n", + "* Classification.\n", + "\n", + " * We will consider a multiclass classification problem given by the full MNIST data set. The full data set is at .\n", + "\n", + "We will start with a regression problem and we will reuse our codes on gradient descent methods from project 1." + ] + }, + { + "cell_type": "markdown", + "id": "d62a07ef", + "metadata": { + "editable": true + }, + "source": [ + "### Part a): Analytical warm-up\n", + "\n", + "When using our gradient machinery from project 1, we will need the expressions for the cost/loss functions and their respective\n", + "gradients. The functions whose gradients we need are:\n", + "1. The mean-squared error (MSE) with and without the $L_1$ and $L_2$ norms (regression problems)\n", + "\n", + "2. The binary cross entropy (aka log loss) for binary classification problems with and without $L_1$ and $L_2$ norms\n", + "\n", + "3. The multiclass cross entropy cost/loss function (aka Softmax cross entropy or just Softmax loss function)\n", + "\n", + "Set up these three cost/loss functions and their respective derivatives and explain the various terms. In this project you will however only use the MSE and the Softmax cross entropy.\n", + "\n", + "We will test three activation functions for our neural network setup, these are the \n", + "1. The Sigmoid (aka **logit**) function,\n", + "\n", + "2. the RELU function and\n", + "\n", + "3. the Leaky RELU function\n", + "\n", + "Set up their expressions and their first derivatives.\n", + "You may consult the lecture notes (with codes and more) from week 42 at ." + ] + }, + { + "cell_type": "markdown", + "id": "9cd8b8ac", + "metadata": { + "editable": true + }, + "source": [ + "### Reminder about the gradient machinery from project 1\n", + "\n", + "In the setup of a neural network code you will need your gradient descent codes from\n", + "project 1. For neural networks we will recommend using stochastic\n", + "gradient descent with either the RMSprop or the ADAM algorithms for\n", + "updating the learning rates. 
But you should feel free to try plain gradient descent as well.\n", + "\n", + "We recommend reading chapter 8 on optimization from the textbook of\n", + "Goodfellow, Bengio and Courville at\n", + ". This chapter contains many\n", + "useful insights and discussions on the optimization part of machine\n", + "learning. A useful reference on the back progagation algorithm is\n", + "Nielsen's book at . \n", + "\n", + "You will find the Python [Seaborn\n", + "package](https://seaborn.pydata.org/generated/seaborn.heatmap.html)\n", + "useful when plotting the results as function of the learning rate\n", + "$\\eta$ and the hyper-parameter $\\lambda$ ." + ] + }, + { + "cell_type": "markdown", + "id": "5931b155", + "metadata": { + "editable": true + }, + "source": [ + "### Part b): Writing your own Neural Network code\n", + "\n", + "Your aim now, and this is the central part of this project, is to\n", + "write your own FFNN code implementing the back\n", + "propagation algorithm discussed in the lecture slides from week 41 at and week 42 at .\n", + "\n", + "We will focus on a regression problem first, using the one-dimensional Runge function" + ] + }, + { + "cell_type": "markdown", + "id": "b273fc8a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "f(x) = \\frac{1}{1+25x^2},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "e13db1ec", + "metadata": { + "editable": true + }, + "source": [ + "from project 1.\n", + "\n", + "Use only the mean-squared error as cost function (no regularization terms) and \n", + "write an FFNN code for a regression problem with a flexible number of hidden\n", + "layers and nodes using only the Sigmoid function as activation function for\n", + "the hidden layers. Initialize the weights using a normal\n", + "distribution. How would you initialize the biases? And which\n", + "activation function would you select for the final output layer?\n", + "And how would you set up your design/feature matrix? Hint: does it have to represent a polynomial approximation as you did in project 1? \n", + "\n", + "Train your network and compare the results with those from your OLS\n", + "regression code from project 1 using the one-dimensional Runge\n", + "function. When comparing your neural network code with the OLS\n", + "results from project 1, use the same data sets which gave you the best\n", + "MSE score. Moreover, use the polynomial order from project 1 that gave you the\n", + "best result. Compare these results with your neural network with one\n", + "and two hidden layers using $50$ and $100$ hidden nodes, respectively.\n", + "\n", + "Comment your results and give a critical discussion of the results\n", + "obtained with the OLS code from project 1 and your own neural network\n", + "code. Make an analysis of the learning rates employed to find the\n", + "optimal MSE score. Test both stochastic gradient descent\n", + "with RMSprop and ADAM and plain gradient descent with different\n", + "learning rates.\n", + "\n", + "You should, as you did in project 1, scale your data." + ] + }, + { + "cell_type": "markdown", + "id": "4f864e31", + "metadata": { + "editable": true + }, + "source": [ + "### Part c): Testing against other software libraries\n", + "\n", + "You should test your results against a similar code using **Scikit-Learn** (see the examples in the above lecture notes from weeks 41 and 42) or **tensorflow/keras** or **Pytorch** (for Pytorch, see Raschka et al.'s text chapters 12 and 13). 
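As a rough sketch of such a cross-check (a minimal example assuming the one-dimensional Runge function on $[-1,1]$; the layer size, activation, learning rate and seed below are illustrative choices only), one could benchmark scikit-learn's MLPRegressor and compare its test MSE with that of your own network:

```python
# Illustrative benchmark only: fit sklearn's MLPRegressor to the 1D Runge function
# and report the test MSE, to be compared with the MSE of your own FFNN code.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

x = np.linspace(-1, 1, 400).reshape(-1, 1)   # assumed domain and sample size
y = 1.0 / (1.0 + 25.0 * x**2)                # one-dimensional Runge function

X_train, X_test, y_train, y_test = train_test_split(x, y.ravel(), test_size=0.2, random_state=42)

# One hidden layer with 50 sigmoid nodes, trained with ADAM (illustrative settings)
net = MLPRegressor(hidden_layer_sizes=(50,), activation='logistic', solver='adam',
                   learning_rate_init=0.01, max_iter=5000, random_state=42)
net.fit(X_train, y_train)
print("scikit-learn reference test MSE:", mean_squared_error(y_test, net.predict(X_test)))
```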
\n", + "\n", + "Furthermore, you should also test that your derivatives are correctly\n", + "calculated using automatic differentiation, using for example the\n", + "**Autograd** library or the **JAX** library. It is optional to implement\n", + "these libraries for the present project. In this project they serve as\n", + "useful tests of our derivatives." + ] + }, + { + "cell_type": "markdown", + "id": "c9faeafd", + "metadata": { + "editable": true + }, + "source": [ + "### Part d): Testing different activation functions and depths of the neural network\n", + "\n", + "You should also test different activation functions for the hidden\n", + "layers. Try out the Sigmoid, the RELU and the Leaky RELU functions and\n", + "discuss your results. Test your results as functions of the number of hidden layers and nodes. Do you see signs of overfitting?\n", + "It is optional in this project to perform a bias-variance trade-off analysis." + ] + }, + { + "cell_type": "markdown", + "id": "d865c22b", + "metadata": { + "editable": true + }, + "source": [ + "### Part e): Testing different norms\n", + "\n", + "Finally, still using the one-dimensional Runge function, add now the\n", + "hyperparameters $\\lambda$ with the $L_2$ and $L_1$ norms. Find the\n", + "optimal results for the hyperparameters $\\lambda$ and the learning\n", + "rates $\\eta$ and neural network architecture and compare the $L_2$ results with Ridge regression from\n", + "project 1 and the $L_1$ results with the Lasso calculations of project 1.\n", + "Use again the same data sets and the best results from project 1 in your comparisons." + ] + }, + { + "cell_type": "markdown", + "id": "5270af8f", + "metadata": { + "editable": true + }, + "source": [ + "### Part f): Classification analysis using neural networks\n", + "\n", + "With a well-written code it should now be easy to change the\n", + "activation function for the output layer.\n", + "\n", + "Here we will change the cost function for our neural network code\n", + "developed in parts b), d) and e) in order to perform a classification\n", + "analysis. The classification problem we will study is the multiclass\n", + "MNIST problem, see the description of the full data set at\n", + ". We will use the Softmax cross entropy function discussed in a). \n", + "The MNIST data set discussed in the lecture notes from week 42 is a downscaled variant of the full dataset. \n", + "\n", + "Feel free to suggest other data sets. If you find the classic MNIST data set somewhat limited, feel free to try the \n", + "MNIST-Fashion data set at for example .\n", + "\n", + "To set up the data set, the following python programs may be useful" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "4e0e1fea", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "from sklearn.datasets import fetch_openml\n", + "\n", + "# Fetch the MNIST dataset\n", + "mnist = fetch_openml('mnist_784', version=1, as_frame=False, parser='auto')\n", + "\n", + "# Extract data (features) and target (labels)\n", + "X = mnist.data\n", + "y = mnist.target" + ] + }, + { + "cell_type": "markdown", + "id": "8fe85677", + "metadata": { + "editable": true + }, + "source": [ + "You should consider scaling the data. The Pixel values in MNIST range from 0 to 255. Scaling them to a 0-1 range can improve the performance of some models. 
That is, you could implement the following scaling" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "b28318b2", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "X = X / 255.0" + ] + }, + { + "cell_type": "markdown", + "id": "97e02c71", + "metadata": { + "editable": true + }, + "source": [ + "And then perform the standard train-test splitting" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "88af355c", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "from sklearn.model_selection import train_test_split\n", + "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)" + ] + }, + { + "cell_type": "markdown", + "id": "d1f8f0ed", + "metadata": { + "editable": true + }, + "source": [ + "To measure the performance of our classification problem we will use the\n", + "so-called *accuracy* score. The accuracy is as you would expect just\n", + "the number of correctly guessed targets $t_i$ divided by the total\n", + "number of targets, that is" + ] + }, + { + "cell_type": "markdown", + "id": "554b3a48", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\text{Accuracy} = \\frac{\\sum_{i=1}^n I(t_i = y_i)}{n} ,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "77bfdd5c", + "metadata": { + "editable": true + }, + "source": [ + "where $I$ is the indicator function, $1$ if $t_i = y_i$ and $0$\n", + "otherwise if we have a binary classification problem. Here $t_i$\n", + "represents the target and $y_i$ the outputs of your FFNN code and $n$ is simply the number of targets $t_i$.\n", + "\n", + "Discuss your results and give a critical analysis of the various parameters, including hyper-parameters like the learning rates and the regularization parameter $\\lambda$, various activation functions, number of hidden layers and nodes and activation functions. \n", + "\n", + "Again, we strongly recommend that you compare your own neural Network\n", + "code for classification and pertinent results against a similar code using **Scikit-Learn** or **tensorflow/keras** or **pytorch**.\n", + "\n", + "If you have time, you can use the functionality of **scikit-learn** and compare your neural network results with those from Logistic regression. This is optional.\n", + "The weblink here compares logistic regression and FFNN using the so-called MNIST data set. You may find several useful hints and ideas from this article. Your neural network code can implement the equivalent of logistic regression by simply setting the number of hidden layers to zero and keeping just the input and the output layers. 
\n", + "\n", + "If you wish to compare with say Logisti Regression from **scikit-learn**, the following code uses the above data set" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "eaa9e72e", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "from sklearn.linear_model import LogisticRegression\n", + "# Initialize the model\n", + "model = LogisticRegression(solver='saga', multi_class='multinomial', max_iter=1000, random_state=42)\n", + "# Train the model\n", + "model.fit(X_train, y_train)\n", + "from sklearn.metrics import accuracy_score\n", + "# Make predictions on the test set\n", + "y_pred = model.predict(X_test)\n", + "# Calculate accuracy\n", + "accuracy = accuracy_score(y_test, y_pred)\n", + "print(f\"Model Accuracy: {accuracy:.4f}\")" + ] + }, + { + "cell_type": "markdown", + "id": "c7ba883e", + "metadata": { + "editable": true + }, + "source": [ + "### Part g) Critical evaluation of the various algorithms\n", + "\n", + "After all these glorious calculations, you should now summarize the\n", + "various algorithms and come with a critical evaluation of their pros\n", + "and cons. Which algorithm works best for the regression case and which\n", + "is best for the classification case. These codes can also be part of\n", + "your final project 3, but now applied to other data sets." + ] + }, + { + "cell_type": "markdown", + "id": "595be693", + "metadata": { + "editable": true + }, + "source": [ + "## Summary of methods to implement and analyze\n", + "\n", + "**Required Implementation:**\n", + "1. Reuse the regression code and results from project 1, these will act as a benchmark for seeing how suited a neural network is for this regression task.\n", + "\n", + "2. Implement a neural network with\n", + "\n", + " * A flexible number of layers\n", + "\n", + " * A flexible number of nodes in each layer\n", + "\n", + " * A changeable activation function in each layer (Sigmoid, ReLU, LeakyReLU, as well as Linear and Softmax)\n", + "\n", + " * A changeable cost function, which will be set to MSE for regression and cross-entropy for multiple-classification\n", + "\n", + " * An optional L1 or L2 norm of the weights and biases in the cost function (only used for computing gradients, not interpretable metrics)\n", + "\n", + "3. Implement the back-propagation algorithm to compute the gradient of your neural network\n", + "\n", + "4. Reuse the implementation of Plain and Stochastic Gradient Descent from Project 1 (and adapt the code to work with the your neural network)\n", + "\n", + " * With no optimization algorithm\n", + "\n", + " * With RMS Prop\n", + "\n", + " * With ADAM\n", + "\n", + "5. Implement scaling and train-test splitting of your data, preferably using sklearn\n", + "\n", + "6. Implement and compute metrics like the MSE and Accuracy" + ] + }, + { + "cell_type": "markdown", + "id": "35138b41", + "metadata": { + "editable": true + }, + "source": [ + "### Required Analysis:\n", + "\n", + "1. Briefly show and argue for the advantages and disadvantages of the methods from Project 1.\n", + "\n", + "2. Explore and show the impact of changing the number of layers, nodes per layer, choice of activation function, and inclusion of L1 and L2 norms. Present only the most interesting results from this exploration. 2D Heatmaps will be good for this: Start with finding a well performing set of hyper-parameters, then change two at a time in a range that shows good and bad performance.\n", + "\n", + "3. 
Show and argue for the advantages and disadvantages of using a neural network for regression on your data\n", + "\n", + "4. Show and argue for the advantages and disadvantages of using a neural network for classification on your data\n", + "\n", + "5. Show and argue for the advantages and disadvantages of the different gradient methods and learning rates when training the neural network" + ] + }, + { + "cell_type": "markdown", + "id": "b18bea03", + "metadata": { + "editable": true + }, + "source": [ + "### Optional (Note that you should include at least two of these in the report):\n", + "\n", + "1. Implement Logistic Regression as simple classification model case (equivalent to a Neural Network with just the output layer)\n", + "\n", + "2. Compute the gradient of the neural network with autograd, to show that it gives the same result as your hand-written backpropagation.\n", + "\n", + "3. Compare your results with results from using a machine-learning library like pytorch (https://docs.pytorch.org/tutorials/beginner/basics/buildmodel_tutorial.html)\n", + "\n", + "4. Use a more complex classification dataset instead, like the fashion MNIST (see )\n", + "\n", + "5. Use a more complex regression dataset instead, like the two-dimensional Runge function $f(x,y)=\\left[(10x - 5)^2 + (10y - 5)^2 + 1 \\right]^{-1}$, or even more complicated two-dimensional functions (see the supplementary material of for an extensive list of two-dimensional functions). \n", + "\n", + "6. Compute and interpret a confusion matrix of your best classification model (see )" + ] + }, + { + "cell_type": "markdown", + "id": "580d8424", + "metadata": { + "editable": true + }, + "source": [ + "## Background literature\n", + "\n", + "1. The text of Michael Nielsen is highly recommended, see Nielsen's book at . It is an excellent read.\n", + "\n", + "2. Goodfellow, Bengio and Courville, Deep Learning at . Here we recommend chapters 6, 7 and 8\n", + "\n", + "3. Raschka et al. at . Here we recommend chapters 11, 12 and 13." + ] + }, + { + "cell_type": "markdown", + "id": "96f5c67e", + "metadata": { + "editable": true + }, + "source": [ + "## Introduction to numerical projects\n", + "\n", + "Here follows a brief recipe and recommendation on how to write a report for each\n", + "project.\n", + "\n", + " * Give a short description of the nature of the problem and the eventual numerical methods you have used.\n", + "\n", + " * Describe the algorithm you have used and/or developed. Here you may find it convenient to use pseudocoding. In many cases you can describe the algorithm in the program itself.\n", + "\n", + " * Include the source code of your program. Comment your program properly.\n", + "\n", + " * If possible, try to find analytic solutions, or known limits in order to test your program when developing the code.\n", + "\n", + " * Include your results either in figure form or in a table. Remember to label your results. All tables and figures should have relevant captions and labels on the axes.\n", + "\n", + " * Try to evaluate the reliabilty and numerical stability/precision of your results. If possible, include a qualitative and/or quantitative discussion of the numerical stability, eventual loss of precision etc.\n", + "\n", + " * Try to give an interpretation of you results in your answers to the problems.\n", + "\n", + " * Critique: if possible include your comments and reflections about the exercise, whether you felt you learnt something, ideas for improvements and other thoughts you've made when solving the exercise. 
We wish to keep this course at the interactive level and your comments can help us improve it.\n", + "\n", + " * Try to establish a practice where you log your work at the computerlab. You may find such a logbook very handy at later stages in your work, especially when you don't properly remember what a previous test version of your program did. Here you could also record the time spent on solving the exercise, various algorithms you may have tested or other topics which you feel worthy of mentioning." + ] + }, + { + "cell_type": "markdown", + "id": "d1bc28ba", + "metadata": { + "editable": true + }, + "source": [ + "## Format for electronic delivery of report and programs\n", + "\n", + "The preferred format for the report is a PDF file. You can also use DOC or postscript formats or as an ipython notebook file. As programming language we prefer that you choose between C/C++, Fortran2008 or Python. The following prescription should be followed when preparing the report:\n", + "\n", + " * Use Canvas to hand in your projects, log in at with your normal UiO username and password.\n", + "\n", + " * Upload **only** the report file or the link to your GitHub/GitLab or similar typo of repos! For the source code file(s) you have developed please provide us with your link to your GitHub/GitLab or similar domain. The report file should include all of your discussions and a list of the codes you have developed. Do not include library files which are available at the course homepage, unless you have made specific changes to them.\n", + "\n", + " * In your GitHub/GitLab or similar repository, please include a folder which contains selected results. These can be in the form of output from your code for a selected set of runs and input parameters.\n", + "\n", + "Finally, \n", + "we encourage you to collaborate. Optimal working groups consist of \n", + "2-3 students. You can then hand in a common report." + ] + } + ], + "metadata": {}, + "nbformat": 4, + "nbformat_minor": 5 +} \ No newline at end of file diff --git a/doc/LectureNotes/_build/jupyter_execute/week37.ipynb b/doc/LectureNotes/_build/jupyter_execute/week37.ipynb new file mode 100644 index 000000000..b072ac35a --- /dev/null +++ b/doc/LectureNotes/_build/jupyter_execute/week37.ipynb @@ -0,0 +1,3856 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "d842e7e1", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "0cd52479", + "metadata": { + "editable": true + }, + "source": [ + "# Week 37: Gradient descent methods\n", + "**Morten Hjorth-Jensen**, Department of Physics, University of Oslo, Norway\n", + "\n", + "Date: **September 8-12, 2025**\n", + "\n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "699b6141", + "metadata": { + "editable": true + }, + "source": [ + "## Plans for week 37, lecture Monday\n", + "\n", + "**Plans and material for the lecture on Monday September 8.**\n", + "\n", + "The family of gradient descent methods\n", + "1. Plain gradient descent (constant learning rate), reminder from last week with examples using OLS and Ridge\n", + "\n", + "2. Improving gradient descent with momentum\n", + "\n", + "3. Introducing stochastic gradient descent\n", + "\n", + "4. More advanced updates of the learning rate: ADAgrad, RMSprop and ADAM\n", + "\n", + "5. [Video of Lecture](https://youtu.be/SuxK68tj-V8)\n", + "\n", + "6. 
[Whiteboard notes](https://github.com/CompPhysics/MachineLearning/blob/master/doc/HandWrittenNotes/2025/FYSSTKweek37.pdf)" + ] + }, + { + "cell_type": "markdown", + "id": "dd264b1c", + "metadata": { + "editable": true + }, + "source": [ + "## Readings and Videos:\n", + "1. Recommended: Goodfellow et al, Deep Learning, introduction to gradient descent, see sections 4.3-4.5 at and chapter 8.3-8.5 at \n", + "\n", + "2. Raschka et al, pages 37-44 and pages 278-283 with focus on linear regression.\n", + "\n", + "3. Video on gradient descent at \n", + "\n", + "4. Video on Stochastic gradient descent at " + ] + }, + { + "cell_type": "markdown", + "id": "608927bc", + "metadata": { + "editable": true + }, + "source": [ + "## Material for lecture Monday September 8" + ] + }, + { + "cell_type": "markdown", + "id": "60640670", + "metadata": { + "editable": true + }, + "source": [ + "## Gradient descent and revisiting Ordinary Least Squares from last week\n", + "\n", + "Last week we started with linear regression as a case study for the gradient descent\n", + "methods. Linear regression is a great test case for the gradient\n", + "descent methods discussed in the lectures since it has several\n", + "desirable properties such as:\n", + "\n", + "1. An analytical solution (recall homework sets for week 35).\n", + "\n", + "2. The gradient can be computed analytically.\n", + "\n", + "3. The cost function is convex, which guarantees that gradient descent converges for small enough learning rates.\n", + "\n", + "We revisit an example similar to what we had in the first homework set. We have a function of the type" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "947b67ee", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import numpy as np\n", + "n = 100  # number of data points\n", + "x = 2*np.random.rand(n,1)\n", + "y = 4+3*x+np.random.randn(n,1)" + ] + }, + { + "cell_type": "markdown", + "id": "0a787eca", + "metadata": { + "editable": true + }, + "source": [ + "where $x_i \in [0,2]$ is drawn randomly from a uniform distribution. Additionally we add a stochastic noise term drawn from the normal distribution $\cal {N}(0,1)$. 
\n", + "The linear regression model is given by" + ] + }, + { + "cell_type": "markdown", + "id": "d7e84ac7", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "h_\\theta(x) = \\boldsymbol{y} = \\theta_0 + \\theta_1 x,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "f34c217e", + "metadata": { + "editable": true + }, + "source": [ + "such that" + ] + }, + { + "cell_type": "markdown", + "id": "b145d4eb", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{y}_i = \\theta_0 + \\theta_1 x_i.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "2df6d60d", + "metadata": { + "editable": true + }, + "source": [ + "## Gradient descent example\n", + "\n", + "Let $\\mathbf{y} = (y_1,\\cdots,y_n)^T$, $\\mathbf{\\boldsymbol{y}} = (\\boldsymbol{y}_1,\\cdots,\\boldsymbol{y}_n)^T$ and $\\theta = (\\theta_0, \\theta_1)^T$\n", + "\n", + "It is convenient to write $\\mathbf{\\boldsymbol{y}} = X\\theta$ where $X \\in \\mathbb{R}^{100 \\times 2} $ is the design matrix given by (we keep the intercept here)" + ] + }, + { + "cell_type": "markdown", + "id": "1deafba0", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "X \\equiv \\begin{bmatrix}\n", + "1 & x_1 \\\\\n", + "\\vdots & \\vdots \\\\\n", + "1 & x_{100} & \\\\\n", + "\\end{bmatrix}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "520ac423", + "metadata": { + "editable": true + }, + "source": [ + "The cost/loss/risk function is given by" + ] + }, + { + "cell_type": "markdown", + "id": "48e7232b", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C(\\theta) = \\frac{1}{n}||X\\theta-\\mathbf{y}||_{2}^{2} = \\frac{1}{n}\\sum_{i=1}^{100}\\left[ (\\theta_0 + \\theta_1 x_i)^2 - 2 y_i (\\theta_0 + \\theta_1 x_i) + y_i^2\\right]\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "0194af20", + "metadata": { + "editable": true + }, + "source": [ + "and we want to find $\\theta$ such that $C(\\theta)$ is minimized." + ] + }, + { + "cell_type": "markdown", + "id": "9f58d823", + "metadata": { + "editable": true + }, + "source": [ + "## The derivative of the cost/loss function\n", + "\n", + "Computing $\\partial C(\\theta) / \\partial \\theta_0$ and $\\partial C(\\theta) / \\partial \\theta_1$ we can show that the gradient can be written as" + ] + }, + { + "cell_type": "markdown", + "id": "10129d02", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\nabla_{\\theta} C(\\theta) = \\frac{2}{n}\\begin{bmatrix} \\sum_{i=1}^{100} \\left(\\theta_0+\\theta_1x_i-y_i\\right) \\\\\n", + "\\sum_{i=1}^{100}\\left( x_i (\\theta_0+\\theta_1x_i)-y_ix_i\\right) \\\\\n", + "\\end{bmatrix} = \\frac{2}{n}X^T(X\\theta - \\mathbf{y}),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "4cd07523", + "metadata": { + "editable": true + }, + "source": [ + "where $X$ is the design matrix defined above." 
+ ] + }, + { + "cell_type": "markdown", + "id": "1bda7e01", + "metadata": { + "editable": true + }, + "source": [ + "## The Hessian matrix\n", + "The Hessian matrix of $C(\\theta)$ is given by" + ] + }, + { + "cell_type": "markdown", + "id": "aa64bdd1", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{H} \\equiv \\begin{bmatrix}\n", + "\\frac{\\partial^2 C(\\theta)}{\\partial \\theta_0^2} & \\frac{\\partial^2 C(\\theta)}{\\partial \\theta_0 \\partial \\theta_1} \\\\\n", + "\\frac{\\partial^2 C(\\theta)}{\\partial \\theta_0 \\partial \\theta_1} & \\frac{\\partial^2 C(\\theta)}{\\partial \\theta_1^2} & \\\\\n", + "\\end{bmatrix} = \\frac{2}{n}X^T X.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "3e7f4c5d", + "metadata": { + "editable": true + }, + "source": [ + "This result implies that $C(\\theta)$ is a convex function since the matrix $X^T X$ always is positive semi-definite." + ] + }, + { + "cell_type": "markdown", + "id": "79ed73a8", + "metadata": { + "editable": true + }, + "source": [ + "## Simple program\n", + "\n", + "We can now write a program that minimizes $C(\\theta)$ using the gradient descent method with a constant learning rate $\\eta$ according to" + ] + }, + { + "cell_type": "markdown", + "id": "1b70ad9b", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\theta_{k+1} = \\theta_k - \\eta \\nabla_\\theta C(\\theta_k), \\ k=0,1,\\cdots\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "2fbef92d", + "metadata": { + "editable": true + }, + "source": [ + "We can use the expression we computed for the gradient and let use a\n", + "$\\theta_0$ be chosen randomly and let $\\eta = 0.001$. Stop iterating\n", + "when $||\\nabla_\\theta C(\\theta_k) || \\leq \\epsilon = 10^{-8}$. **Note that the code below does not include the latter stop criterion**.\n", + "\n", + "And finally we can compare our solution for $\\theta$ with the analytic result given by \n", + "$\\theta= (X^TX)^{-1} X^T \\mathbf{y}$." 
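The example in the next section omits that stopping test; a minimal sketch of how to include the criterion $||\nabla_\theta C(\theta_k)|| \leq \epsilon$ is shown here (the iteration cap is only a safety guard, not part of the criterion itself):

```python
# Plain gradient descent with the gradient-norm stopping criterion discussed above.
import numpy as np

n = 100
x = 2*np.random.rand(n, 1)
y = 4 + 3*x + np.random.randn(n, 1)
X = np.c_[np.ones((n, 1)), x]

theta = np.random.randn(2, 1)
eta = 0.001         # constant learning rate, as in the text
eps = 1e-8          # tolerance on the norm of the gradient
max_iter = 1000000  # safety cap to avoid an infinite loop

for k in range(max_iter):
    gradient = (2.0/n) * X.T @ (X @ theta - y)
    if np.linalg.norm(gradient) <= eps:
        print(f"Converged after {k} iterations")
        break
    theta -= eta*gradient

print("Gradient descent: ", theta.ravel())
print("Analytic solution:", (np.linalg.inv(X.T @ X) @ X.T @ y).ravel())
```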
+ ] + }, + { + "cell_type": "markdown", + "id": "0728a369", + "metadata": { + "editable": true + }, + "source": [ + "## Gradient Descent Example\n", + "\n", + "Here our simple example" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "a48d43f0", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "%matplotlib inline\n", + "\n", + "\n", + "# Importing various packages\n", + "from random import random, seed\n", + "import numpy as np\n", + "import matplotlib.pyplot as plt\n", + "from mpl_toolkits.mplot3d import Axes3D\n", + "from matplotlib import cm\n", + "from matplotlib.ticker import LinearLocator, FormatStrFormatter\n", + "import sys\n", + "\n", + "# the number of datapoints\n", + "n = 100\n", + "x = 2*np.random.rand(n,1)\n", + "y = 4+3*x+np.random.randn(n,1)\n", + "\n", + "X = np.c_[np.ones((n,1)), x]\n", + "# Hessian matrix\n", + "H = (2.0/n)* X.T @ X\n", + "# Get the eigenvalues\n", + "EigValues, EigVectors = np.linalg.eig(H)\n", + "print(f\"Eigenvalues of Hessian Matrix:{EigValues}\")\n", + "\n", + "theta_linreg = np.linalg.inv(X.T @ X) @ X.T @ y\n", + "print(theta_linreg)\n", + "theta = np.random.randn(2,1)\n", + "\n", + "eta = 1.0/np.max(EigValues)\n", + "Niterations = 1000\n", + "\n", + "for iter in range(Niterations):\n", + " gradient = (2.0/n)*X.T @ (X @ theta-y)\n", + " theta -= eta*gradient\n", + "\n", + "print(theta)\n", + "xnew = np.array([[0],[2]])\n", + "xbnew = np.c_[np.ones((2,1)), xnew]\n", + "ypredict = xbnew.dot(theta)\n", + "ypredict2 = xbnew.dot(theta_linreg)\n", + "plt.plot(xnew, ypredict, \"r-\")\n", + "plt.plot(xnew, ypredict2, \"b-\")\n", + "plt.plot(x, y ,'ro')\n", + "plt.axis([0,2.0,0, 15.0])\n", + "plt.xlabel(r'$x$')\n", + "plt.ylabel(r'$y$')\n", + "plt.title(r'Gradient descent example')\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "6c1c6ed1", + "metadata": { + "editable": true + }, + "source": [ + "## Gradient descent and Ridge\n", + "\n", + "We have also discussed Ridge regression where the loss function contains a regularized term given by the $L_2$ norm of $\\theta$," + ] + }, + { + "cell_type": "markdown", + "id": "a82ce6e3", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C_{\\text{ridge}}(\\theta) = \\frac{1}{n}||X\\theta -\\mathbf{y}||^2 + \\lambda ||\\theta||^2, \\ \\lambda \\geq 0.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "cb0de7c2", + "metadata": { + "editable": true + }, + "source": [ + "In order to minimize $C_{\\text{ridge}}(\\theta)$ using GD we adjust the gradient as follows" + ] + }, + { + "cell_type": "markdown", + "id": "b76c0dea", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\nabla_\\theta C_{\\text{ridge}}(\\theta) = \\frac{2}{n}\\begin{bmatrix} \\sum_{i=1}^{100} \\left(\\theta_0+\\theta_1x_i-y_i\\right) \\\\\n", + "\\sum_{i=1}^{100}\\left( x_i (\\theta_0+\\theta_1x_i)-y_ix_i\\right) \\\\\n", + "\\end{bmatrix} + 2\\lambda\\begin{bmatrix} \\theta_0 \\\\ \\theta_1\\end{bmatrix} = 2 (\\frac{1}{n}X^T(X\\theta - \\mathbf{y})+\\lambda \\theta).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "4eeb07f6", + "metadata": { + "editable": true + }, + "source": [ + "We can easily extend our program to minimize $C_{\\text{ridge}}(\\theta)$ using gradient descent and compare with the analytical solution given by" + ] + }, + { + "cell_type": "markdown", + "id": "cc7d6c64", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\theta_{\\text{ridge}} = \\left(X^T X + n\\lambda I_{2 \\times 
2} \\right)^{-1} X^T \\mathbf{y}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "08bd65db", + "metadata": { + "editable": true + }, + "source": [ + "## The Hessian matrix for Ridge Regression\n", + "The Hessian matrix of Ridge Regression for our simple example is given by" + ] + }, + { + "cell_type": "markdown", + "id": "a1c5a4d1", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{H} \\equiv \\begin{bmatrix}\n", + "\\frac{\\partial^2 C(\\theta)}{\\partial \\theta_0^2} & \\frac{\\partial^2 C(\\theta)}{\\partial \\theta_0 \\partial \\theta_1} \\\\\n", + "\\frac{\\partial^2 C(\\theta)}{\\partial \\theta_0 \\partial \\theta_1} & \\frac{\\partial^2 C(\\theta)}{\\partial \\theta_1^2} & \\\\\n", + "\\end{bmatrix} = \\frac{2}{n}X^T X+2\\lambda\\boldsymbol{I}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "f178c97e", + "metadata": { + "editable": true + }, + "source": [ + "This implies that the Hessian matrix is positive definite, hence the stationary point is a\n", + "minimum.\n", + "Note that the Ridge cost function is convex being a sum of two convex\n", + "functions. Therefore, the stationary point is a global\n", + "minimum of this function." + ] + }, + { + "cell_type": "markdown", + "id": "3853aec7", + "metadata": { + "editable": true + }, + "source": [ + "## Program example for gradient descent with Ridge Regression" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "81740e7b", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "from random import random, seed\n", + "import numpy as np\n", + "import matplotlib.pyplot as plt\n", + "from mpl_toolkits.mplot3d import Axes3D\n", + "from matplotlib import cm\n", + "from matplotlib.ticker import LinearLocator, FormatStrFormatter\n", + "import sys\n", + "\n", + "# the number of datapoints\n", + "n = 100\n", + "x = 2*np.random.rand(n,1)\n", + "y = 4+3*x+np.random.randn(n,1)\n", + "\n", + "X = np.c_[np.ones((n,1)), x]\n", + "XT_X = X.T @ X\n", + "\n", + "#Ridge parameter lambda\n", + "lmbda = 0.001\n", + "Id = n*lmbda* np.eye(XT_X.shape[0])\n", + "\n", + "# Hessian matrix\n", + "H = (2.0/n)* XT_X+2*lmbda* np.eye(XT_X.shape[0])\n", + "# Get the eigenvalues\n", + "EigValues, EigVectors = np.linalg.eig(H)\n", + "print(f\"Eigenvalues of Hessian Matrix:{EigValues}\")\n", + "\n", + "\n", + "theta_linreg = np.linalg.inv(XT_X+Id) @ X.T @ y\n", + "print(theta_linreg)\n", + "# Start plain gradient descent\n", + "theta = np.random.randn(2,1)\n", + "\n", + "eta = 1.0/np.max(EigValues)\n", + "Niterations = 100\n", + "\n", + "for iter in range(Niterations):\n", + " gradients = 2.0/n*X.T @ (X @ (theta)-y)+2*lmbda*theta\n", + " theta -= eta*gradients\n", + "\n", + "print(theta)\n", + "ypredict = X @ theta\n", + "ypredict2 = X @ theta_linreg\n", + "plt.plot(x, ypredict, \"r-\")\n", + "plt.plot(x, ypredict2, \"b-\")\n", + "plt.plot(x, y ,'ro')\n", + "plt.axis([0,2.0,0, 15.0])\n", + "plt.xlabel(r'$x$')\n", + "plt.ylabel(r'$y$')\n", + "plt.title(r'Gradient descent example for Ridge')\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "aa1b6e08", + "metadata": { + "editable": true + }, + "source": [ + "## Using gradient descent methods, limitations\n", + "\n", + "* **Gradient descent (GD) finds local minima of our function**. Since the GD algorithm is deterministic, if it converges, it will converge to a local minimum of our cost/loss/risk function. 
Because in ML we are often dealing with extremely rugged landscapes with many local minima, this can lead to poor performance.\n", + "\n", + "* **GD is sensitive to initial conditions**. One consequence of the local nature of GD is that initial conditions matter. Depending on where one starts, one will end up at a different local minima. Therefore, it is very important to think about how one initializes the training process. This is true for GD as well as more complicated variants of GD.\n", + "\n", + "* **Gradients are computationally expensive to calculate for large datasets**. In many cases in statistics and ML, the cost/loss/risk function is a sum of terms, with one term for each data point. For example, in linear regression, $E \\propto \\sum_{i=1}^n (y_i - \\mathbf{w}^T\\cdot\\mathbf{x}_i)^2$; for logistic regression, the square error is replaced by the cross entropy. To calculate the gradient we have to sum over *all* $n$ data points. Doing this at every GD step becomes extremely computationally expensive. An ingenious solution to this, is to calculate the gradients using small subsets of the data called \"mini batches\". This has the added benefit of introducing stochasticity into our algorithm.\n", + "\n", + "* **GD is very sensitive to choices of learning rates**. GD is extremely sensitive to the choice of learning rates. If the learning rate is very small, the training process take an extremely long time. For larger learning rates, GD can diverge and give poor results. Furthermore, depending on what the local landscape looks like, we have to modify the learning rates to ensure convergence. Ideally, we would *adaptively* choose the learning rates to match the landscape.\n", + "\n", + "* **GD treats all directions in parameter space uniformly.** Another major drawback of GD is that unlike Newton's method, the learning rate for GD is the same in all directions in parameter space. For this reason, the maximum learning rate is set by the behavior of the steepest direction and this can significantly slow down training. Ideally, we would like to take large steps in flat directions and small steps in steep directions. Since we are exploring rugged landscapes where curvatures change, this requires us to keep track of not only the gradient but second derivatives. The ideal scenario would be to calculate the Hessian but this proves to be too computationally expensive. \n", + "\n", + "* GD can take exponential time to escape saddle points, even with random initialization. As we mentioned, GD is extremely sensitive to initial condition since it determines the particular local minimum GD would eventually reach. However, even with a good initialization scheme, through the introduction of randomness, GD can still take exponential time to escape saddle points." + ] + }, + { + "cell_type": "markdown", + "id": "d1b9be1a", + "metadata": { + "editable": true + }, + "source": [ + "## Momentum based GD\n", + "\n", + "We discuss here some simple examples where we introduce what is called\n", + "'memory'about previous steps, or what is normally called momentum\n", + "gradient descent.\n", + "For the mathematical details, see whiteboad notes from lecture on September 8, 2025." 
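In the notation of the momentum code further below, where $\eta$ is the step size (step_size) and $\gamma$ the momentum parameter (momentum), the update with memory reads

$$
v_{t+1} = \gamma v_t + \eta \nabla_\theta C(\theta_t), \qquad \theta_{t+1} = \theta_t - v_{t+1},
$$

so that setting $\gamma=0$ recovers plain gradient descent; $v_t$ corresponds to the variable change in the code.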
+ ] + }, + { + "cell_type": "markdown", + "id": "2e1267e6", + "metadata": { + "editable": true + }, + "source": [ + "## Improving gradient descent with momentum" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "494e82a7", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "from numpy import asarray\n", + "from numpy import arange\n", + "from numpy.random import rand\n", + "from numpy.random import seed\n", + "from matplotlib import pyplot\n", + " \n", + "# objective function\n", + "def objective(x):\n", + "\treturn x**2.0\n", + " \n", + "# derivative of objective function\n", + "def derivative(x):\n", + "\treturn x * 2.0\n", + " \n", + "# gradient descent algorithm\n", + "def gradient_descent(objective, derivative, bounds, n_iter, step_size):\n", + "\t# track all solutions\n", + "\tsolutions, scores = list(), list()\n", + "\t# generate an initial point\n", + "\tsolution = bounds[:, 0] + rand(len(bounds)) * (bounds[:, 1] - bounds[:, 0])\n", + "\t# run the gradient descent\n", + "\tfor i in range(n_iter):\n", + "\t\t# calculate gradient\n", + "\t\tgradient = derivative(solution)\n", + "\t\t# take a step\n", + "\t\tsolution = solution - step_size * gradient\n", + "\t\t# evaluate candidate point\n", + "\t\tsolution_eval = objective(solution)\n", + "\t\t# store solution\n", + "\t\tsolutions.append(solution)\n", + "\t\tscores.append(solution_eval)\n", + "\t\t# report progress\n", + "\t\tprint('>%d f(%s) = %.5f' % (i, solution, solution_eval))\n", + "\treturn [solutions, scores]\n", + " \n", + "# seed the pseudo random number generator\n", + "seed(4)\n", + "# define range for input\n", + "bounds = asarray([[-1.0, 1.0]])\n", + "# define the total iterations\n", + "n_iter = 30\n", + "# define the step size\n", + "step_size = 0.1\n", + "# perform the gradient descent search\n", + "solutions, scores = gradient_descent(objective, derivative, bounds, n_iter, step_size)\n", + "# sample input range uniformly at 0.1 increments\n", + "inputs = arange(bounds[0,0], bounds[0,1]+0.1, 0.1)\n", + "# compute targets\n", + "results = objective(inputs)\n", + "# create a line plot of input vs result\n", + "pyplot.plot(inputs, results)\n", + "# plot the solutions found\n", + "pyplot.plot(solutions, scores, '.-', color='red')\n", + "# show the plot\n", + "pyplot.show()" + ] + }, + { + "cell_type": "markdown", + "id": "46858c7c", + "metadata": { + "editable": true + }, + "source": [ + "## Same code but now with momentum gradient descent" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "6a917123", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "from numpy import asarray\n", + "from numpy import arange\n", + "from numpy.random import rand\n", + "from numpy.random import seed\n", + "from matplotlib import pyplot\n", + " \n", + "# objective function\n", + "def objective(x):\n", + "\treturn x**2.0\n", + " \n", + "# derivative of objective function\n", + "def derivative(x):\n", + "\treturn x * 2.0\n", + " \n", + "# gradient descent algorithm\n", + "def gradient_descent(objective, derivative, bounds, n_iter, step_size, momentum):\n", + "\t# track all solutions\n", + "\tsolutions, scores = list(), list()\n", + "\t# generate an initial point\n", + "\tsolution = bounds[:, 0] + rand(len(bounds)) * (bounds[:, 1] - bounds[:, 0])\n", + "\t# keep track of the change\n", + "\tchange = 0.0\n", + "\t# run the gradient descent\n", + "\tfor i in range(n_iter):\n", + "\t\t# calculate gradient\n", + 
"\t\tgradient = derivative(solution)\n", + "\t\t# calculate update\n", + "\t\tnew_change = step_size * gradient + momentum * change\n", + "\t\t# take a step\n", + "\t\tsolution = solution - new_change\n", + "\t\t# save the change\n", + "\t\tchange = new_change\n", + "\t\t# evaluate candidate point\n", + "\t\tsolution_eval = objective(solution)\n", + "\t\t# store solution\n", + "\t\tsolutions.append(solution)\n", + "\t\tscores.append(solution_eval)\n", + "\t\t# report progress\n", + "\t\tprint('>%d f(%s) = %.5f' % (i, solution, solution_eval))\n", + "\treturn [solutions, scores]\n", + " \n", + "# seed the pseudo random number generator\n", + "seed(4)\n", + "# define range for input\n", + "bounds = asarray([[-1.0, 1.0]])\n", + "# define the total iterations\n", + "n_iter = 30\n", + "# define the step size\n", + "step_size = 0.1\n", + "# define momentum\n", + "momentum = 0.3\n", + "# perform the gradient descent search with momentum\n", + "solutions, scores = gradient_descent(objective, derivative, bounds, n_iter, step_size, momentum)\n", + "# sample input range uniformly at 0.1 increments\n", + "inputs = arange(bounds[0,0], bounds[0,1]+0.1, 0.1)\n", + "# compute targets\n", + "results = objective(inputs)\n", + "# create a line plot of input vs result\n", + "pyplot.plot(inputs, results)\n", + "# plot the solutions found\n", + "pyplot.plot(solutions, scores, '.-', color='red')\n", + "# show the plot\n", + "pyplot.show()" + ] + }, + { + "cell_type": "markdown", + "id": "361b2aa8", + "metadata": { + "editable": true + }, + "source": [ + "## Overview video on Stochastic Gradient Descent (SGD)\n", + "\n", + "[What is Stochastic Gradient Descent](https://www.youtube.com/watch?v=vMh0zPT0tLI&ab_channel=StatQuestwithJoshStarmer)\n", + "There are several reasons for using stochastic gradient descent. Some of these are:\n", + "\n", + "1. Efficiency: Updates weights more frequently using a single or a small batch of samples, which speeds up convergence.\n", + "\n", + "2. Hopefully avoid Local Minima\n", + "\n", + "3. Memory Usage: Requires less memory compared to computing gradients for the entire dataset." + ] + }, + { + "cell_type": "markdown", + "id": "2dacb8ef", + "metadata": { + "editable": true + }, + "source": [ + "## Batches and mini-batches\n", + "\n", + "In gradient descent we compute the cost function and its gradient for all data points we have.\n", + "\n", + "In large-scale applications such as the [ILSVRC challenge](https://www.image-net.org/challenges/LSVRC/), the\n", + "training data can have on order of millions of examples. Hence, it\n", + "seems wasteful to compute the full cost function over the entire\n", + "training set in order to perform only a single parameter update. A\n", + "very common approach to addressing this challenge is to compute the\n", + "gradient over batches of the training data. For example, a typical batch could contain some thousand examples from\n", + "an entire training set of several millions. This batch is then used to\n", + "perform a parameter update." + ] + }, + { + "cell_type": "markdown", + "id": "59c9add4", + "metadata": { + "editable": true + }, + "source": [ + "## Pros and cons\n", + "\n", + "1. Speed: SGD is faster than gradient descent because it uses only one training example per iteration, whereas gradient descent requires the entire dataset. This speed advantage becomes more significant as the size of the dataset increases.\n", + "\n", + "2. 
Convergence: Gradient descent has a more predictable convergence behaviour because it uses the average gradient of the entire dataset. In contrast, SGD’s convergence behaviour can be more erratic due to its random sampling of individual training examples.\n", + "\n", + "3. Memory: Gradient descent requires more memory than SGD because it must store the entire dataset for each iteration. SGD only needs to store the current training example, making it more memory-efficient." + ] + }, + { + "cell_type": "markdown", + "id": "a5168cc9", + "metadata": { + "editable": true + }, + "source": [ + "## Convergence rates\n", + "\n", + "1. Stochastic Gradient Descent has a faster convergence rate due to the use of single training examples in each iteration.\n", + "\n", + "2. Gradient Descent as a slower convergence rate, as it uses the entire dataset for each iteration." + ] + }, + { + "cell_type": "markdown", + "id": "47321307", + "metadata": { + "editable": true + }, + "source": [ + "## Accuracy\n", + "\n", + "In general, stochastic Gradient Descent is Less accurate than gradient\n", + "descent, as it calculates the gradient on single examples, which may\n", + "not accurately represent the overall dataset. Gradient Descent is\n", + "more accurate because it uses the average gradient calculated over the\n", + "entire dataset.\n", + "\n", + "There are other disadvantages to using SGD. The main drawback is that\n", + "its convergence behaviour can be more erratic due to the random\n", + "sampling of individual training examples. This can lead to less\n", + "accurate results, as the algorithm may not converge to the true\n", + "minimum of the cost function. Additionally, the learning rate, which\n", + "determines the step size of each update to the model’s parameters,\n", + "must be carefully chosen to ensure convergence.\n", + "\n", + "It is however the method of choice in deep learning algorithms where\n", + "SGD is often used in combination with other optimization techniques,\n", + "such as momentum or adaptive learning rates" + ] + }, + { + "cell_type": "markdown", + "id": "96f44d6b", + "metadata": { + "editable": true + }, + "source": [ + "## Stochastic Gradient Descent (SGD)\n", + "\n", + "In stochastic gradient descent, the extreme case is the case where we\n", + "have only one batch, that is we include the whole data set.\n", + "\n", + "This process is called Stochastic Gradient\n", + "Descent (SGD) (or also sometimes on-line gradient descent). This is\n", + "relatively less common to see because in practice due to vectorized\n", + "code optimizations it can be computationally much more efficient to\n", + "evaluate the gradient for 100 examples, than the gradient for one\n", + "example 100 times. Even though SGD technically refers to using a\n", + "single example at a time to evaluate the gradient, you will hear\n", + "people use the term SGD even when referring to mini-batch gradient\n", + "descent (i.e. mentions of MGD for “Minibatch Gradient Descent”, or BGD\n", + "for “Batch gradient descent” are rare to see), where it is usually\n", + "assumed that mini-batches are used. The size of the mini-batch is a\n", + "hyperparameter but it is not very common to cross-validate or bootstrap it. It is\n", + "usually based on memory constraints (if any), or set to some value,\n", + "e.g. 32, 64 or 128. 
We use powers of 2 in practice because many\n", + "vectorized operation implementations work faster when their inputs are\n", + "sized in powers of 2.\n", + "\n", + "In our notes with SGD we mean stochastic gradient descent with mini-batches." + ] + }, + { + "cell_type": "markdown", + "id": "898ef421", + "metadata": { + "editable": true + }, + "source": [ + "## Stochastic Gradient Descent\n", + "\n", + "Stochastic gradient descent (SGD) and variants thereof address some of\n", + "the shortcomings of the Gradient descent method discussed above.\n", + "\n", + "The underlying idea of SGD comes from the observation that the cost\n", + "function, which we want to minimize, can almost always be written as a\n", + "sum over $n$ data points $\\{\\mathbf{x}_i\\}_{i=1}^n$," + ] + }, + { + "cell_type": "markdown", + "id": "4e827950", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C(\\mathbf{\\theta}) = \\sum_{i=1}^n c_i(\\mathbf{x}_i,\n", + "\\mathbf{\\theta}).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "05e99546", + "metadata": { + "editable": true + }, + "source": [ + "## Computation of gradients\n", + "\n", + "This in turn means that the gradient can be\n", + "computed as a sum over $i$-gradients" + ] + }, + { + "cell_type": "markdown", + "id": "b92afe6c", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\nabla_\\theta C(\\mathbf{\\theta}) = \\sum_i^n \\nabla_\\theta c_i(\\mathbf{x}_i,\n", + "\\mathbf{\\theta}).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b20a4aca", + "metadata": { + "editable": true + }, + "source": [ + "Stochasticity/randomness is introduced by only taking the\n", + "gradient on a subset of the data called minibatches. If there are $n$\n", + "data points and the size of each minibatch is $M$, there will be $n/M$\n", + "minibatches. We denote these minibatches by $B_k$ where\n", + "$k=1,\\cdots,n/M$." + ] + }, + { + "cell_type": "markdown", + "id": "7884cc0d", + "metadata": { + "editable": true + }, + "source": [ + "## SGD example\n", + "As an example, suppose we have $10$ data points $(\\mathbf{x}_1,\\cdots, \\mathbf{x}_{10})$ \n", + "and we choose to have $M=5$ minibathces,\n", + "then each minibatch contains two data points. In particular we have\n", + "$B_1 = (\\mathbf{x}_1,\\mathbf{x}_2), \\cdots, B_5 =\n", + "(\\mathbf{x}_9,\\mathbf{x}_{10})$. 
Note that if you choose $M=1$ you\n", + "have only a single batch with all data points and on the other extreme,\n", + "you may choose $M=n$ resulting in a minibatch for each datapoint, i.e\n", + "$B_k = \\mathbf{x}_k$.\n", + "\n", + "The idea is now to approximate the gradient by replacing the sum over\n", + "all data points with a sum over the data points in one the minibatches\n", + "picked at random in each gradient descent step" + ] + }, + { + "cell_type": "markdown", + "id": "392aeed0", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\nabla_{\\theta}\n", + "C(\\mathbf{\\theta}) = \\sum_{i=1}^n \\nabla_\\theta c_i(\\mathbf{x}_i,\n", + "\\mathbf{\\theta}) \\rightarrow \\sum_{i \\in B_k}^n \\nabla_\\theta\n", + "c_i(\\mathbf{x}_i, \\mathbf{\\theta}).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "04581249", + "metadata": { + "editable": true + }, + "source": [ + "## The gradient step\n", + "\n", + "Thus a gradient descent step now looks like" + ] + }, + { + "cell_type": "markdown", + "id": "d21077a4", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\theta_{j+1} = \\theta_j - \\eta_j \\sum_{i \\in B_k}^n \\nabla_\\theta c_i(\\mathbf{x}_i,\n", + "\\mathbf{\\theta})\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b4bed668", + "metadata": { + "editable": true + }, + "source": [ + "where $k$ is picked at random with equal\n", + "probability from $[1,n/M]$. An iteration over the number of\n", + "minibathces (n/M) is commonly referred to as an epoch. Thus it is\n", + "typical to choose a number of epochs and for each epoch iterate over\n", + "the number of minibatches, as exemplified in the code below." + ] + }, + { + "cell_type": "markdown", + "id": "9c15b282", + "metadata": { + "editable": true + }, + "source": [ + "## Simple example code" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "602bda4c", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import numpy as np \n", + "\n", + "n = 100 #100 datapoints \n", + "M = 5 #size of each minibatch\n", + "m = int(n/M) #number of minibatches\n", + "n_epochs = 10 #number of epochs\n", + "\n", + "j = 0\n", + "for epoch in range(1,n_epochs+1):\n", + " for i in range(m):\n", + " k = np.random.randint(m) #Pick the k-th minibatch at random\n", + " #Compute the gradient using the data in minibatch Bk\n", + " #Compute new suggestion for \n", + " j += 1" + ] + }, + { + "cell_type": "markdown", + "id": "332831a7", + "metadata": { + "editable": true + }, + "source": [ + "Taking the gradient only on a subset of the data has two important\n", + "benefits. First, it introduces randomness which decreases the chance\n", + "that our opmization scheme gets stuck in a local minima. Second, if\n", + "the size of the minibatches are small relative to the number of\n", + "datapoints ($M < n$), the computation of the gradient is much\n", + "cheaper since we sum over the datapoints in the $k-th$ minibatch and not\n", + "all $n$ datapoints." + ] + }, + { + "cell_type": "markdown", + "id": "187eb27c", + "metadata": { + "editable": true + }, + "source": [ + "## When do we stop?\n", + "\n", + "A natural question is when do we stop the search for a new minimum?\n", + "One possibility is to compute the full gradient after a given number\n", + "of epochs and check if the norm of the gradient is smaller than some\n", + "threshold and stop if true. 
However, the condition that the gradient\n", + "is zero is valid also for local minima, so this would only tell us\n", + "that we are close to a local/global minimum. However, we could also\n", + "evaluate the cost function at this point, store the result and\n", + "continue the search. If the test kicks in at a later stage we can\n", + "compare the values of the cost function and keep the $\\theta$ that\n", + "gave the lowest value." + ] + }, + { + "cell_type": "markdown", + "id": "8ddbdbb5", + "metadata": { + "editable": true + }, + "source": [ + "## Slightly different approach\n", + "\n", + "Another approach is to let the step length $\\eta_j$ depend on the\n", + "number of epochs in such a way that it becomes very small after a\n", + "reasonable time such that we do not move at all. Such approaches are\n", + "also called scaling. There are many such ways to [scale the learning\n", + "rate](https://towardsdatascience.com/gradient-descent-the-learning-rate-and-the-importance-of-feature-scaling-6c0b416596e1)\n", + "and [discussions here](https://www.jmlr.org/papers/volume23/20-1258/20-1258.pdf). See\n", + "also\n", + "\n", + "for a discussion of different scaling functions for the learning rate." + ] + }, + { + "cell_type": "markdown", + "id": "35ea8e21", + "metadata": { + "editable": true + }, + "source": [ + "## Time decay rate\n", + "\n", + "As an example, let $e = 0,1,2,3,\\cdots$ denote the current epoch and let $t_0, t_1 > 0$ be two fixed numbers. Furthermore, let $t = e \\cdot m + i$ where $m$ is the number of minibatches and $i=0,\\cdots,m-1$. Then the function $$\\eta_j(t; t_0, t_1) = \\frac{t_0}{t+t_1} $$ goes to zero as the number of epochs gets large. I.e. we start with a step length $\\eta_j (0; t_0, t_1) = t_0/t_1$ which decays in *time* $t$.\n", + "\n", + "In this way we can fix the number of epochs, compute $\\theta$ and\n", + "evaluate the cost function at the end. Repeating the computation will\n", + "give a different result since the scheme is random by design. Then we\n", + "pick the final $\\theta$ that gives the lowest value of the cost\n", + "function." + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "77a60fcd", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import numpy as np \n", + "\n", + "def step_length(t,t0,t1):\n", + " return t0/(t+t1)\n", + "\n", + "n = 100 #100 datapoints \n", + "M = 5 #size of each minibatch\n", + "m = int(n/M) #number of minibatches\n", + "n_epochs = 500 #number of epochs\n", + "t0 = 1.0\n", + "t1 = 10\n", + "\n", + "eta_j = t0/t1\n", + "j = 0\n", + "for epoch in range(1,n_epochs+1):\n", + " for i in range(m):\n", + " k = np.random.randint(m) #Pick the k-th minibatch at random\n", + " #Compute the gradient using the data in minibatch Bk\n", + " #Compute new suggestion for theta\n", + " t = epoch*m+i\n", + " eta_j = step_length(t,t0,t1)\n", + " j += 1\n", + "\n", + "print(\"eta_j after %d epochs: %g\" % (n_epochs,eta_j))" + ] + }, + { + "cell_type": "markdown", + "id": "b030b80c", + "metadata": { + "editable": true + }, + "source": [ + "## Code with a Number of Minibatches which varies\n", + "\n", + "In the code here we vary the number of mini-batches." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "9bdf875b", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# Importing various packages\n", + "from math import exp, sqrt\n", + "from random import random, seed\n", + "import numpy as np\n", + "import matplotlib.pyplot as plt\n", + "\n", + "n = 100\n", + "x = 2*np.random.rand(n,1)\n", + "y = 4+3*x+np.random.randn(n,1)\n", + "\n", + "X = np.c_[np.ones((n,1)), x]\n", + "XT_X = X.T @ X\n", + "theta_linreg = np.linalg.inv(X.T @ X) @ (X.T @ y)\n", + "print(\"Own inversion\")\n", + "print(theta_linreg)\n", + "# Hessian matrix\n", + "H = (2.0/n)* XT_X\n", + "EigValues, EigVectors = np.linalg.eig(H)\n", + "print(f\"Eigenvalues of Hessian Matrix:{EigValues}\")\n", + "\n", + "theta = np.random.randn(2,1)\n", + "eta = 1.0/np.max(EigValues)\n", + "Niterations = 1000\n", + "\n", + "\n", + "for iter in range(Niterations):\n", + " gradients = 2.0/n*X.T @ ((X @ theta)-y)\n", + " theta -= eta*gradients\n", + "print(\"theta from own gd\")\n", + "print(theta)\n", + "\n", + "xnew = np.array([[0],[2]])\n", + "Xnew = np.c_[np.ones((2,1)), xnew]\n", + "ypredict = Xnew.dot(theta)\n", + "ypredict2 = Xnew.dot(theta_linreg)\n", + "\n", + "n_epochs = 50\n", + "M = 5 #size of each minibatch\n", + "m = int(n/M) #number of minibatches\n", + "t0, t1 = 5, 50\n", + "\n", + "def learning_schedule(t):\n", + " return t0/(t+t1)\n", + "\n", + "theta = np.random.randn(2,1)\n", + "\n", + "for epoch in range(n_epochs):\n", + "# Can you figure out a better way of setting up the contributions to each batch?\n", + " for i in range(m):\n", + " random_index = M*np.random.randint(m)\n", + " xi = X[random_index:random_index+M]\n", + " yi = y[random_index:random_index+M]\n", + " gradients = (2.0/M)* xi.T @ ((xi @ theta)-yi)\n", + " eta = learning_schedule(epoch*m+i)\n", + " theta = theta - eta*gradients\n", + "print(\"theta from own sdg\")\n", + "print(theta)\n", + "\n", + "plt.plot(xnew, ypredict, \"r-\")\n", + "plt.plot(xnew, ypredict2, \"b-\")\n", + "plt.plot(x, y ,'ro')\n", + "plt.axis([0,2.0,0, 15.0])\n", + "plt.xlabel(r'$x$')\n", + "plt.ylabel(r'$y$')\n", + "plt.title(r'Random numbers ')\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "365cebd9", + "metadata": { + "editable": true + }, + "source": [ + "## Replace or not\n", + "\n", + "In the above code, we have use replacement in setting up the\n", + "mini-batches. The discussion\n", + "[here](https://sebastianraschka.com/faq/docs/sgd-methods.html) may be\n", + "useful." + ] + }, + { + "cell_type": "markdown", + "id": "e7c9011a", + "metadata": { + "editable": true + }, + "source": [ + "## SGD vs Full-Batch GD: Convergence Speed and Memory Comparison" + ] + }, + { + "cell_type": "markdown", + "id": "f1c85da0", + "metadata": { + "editable": true + }, + "source": [ + "### Theoretical Convergence Speed and convex optimization\n", + "\n", + "Consider minimizing an empirical cost function" + ] + }, + { + "cell_type": "markdown", + "id": "66df0f80", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C(\\theta) =\\frac{1}{N}\\sum_{i=1}^N l_i(\\theta),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "9f02b845", + "metadata": { + "editable": true + }, + "source": [ + "where each $l_i(\\theta)$ is a\n", + "differentiable loss term. 
Gradient Descent (GD) updates parameters\n", + "using the full gradient $\\nabla C(\\theta)$, while Stochastic Gradient\n", + "Descent (SGD) uses a single sample (or mini-batch) gradient $\\nabla\n", + "l_i(\\theta)$ selected at random. In equation form, one GD step is:" + ] + }, + { + "cell_type": "markdown", + "id": "21997f1a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\theta_{t+1} = \\theta_t-\\eta \\nabla C(\\theta_t) =\\theta_t -\\eta \\frac{1}{N}\\sum_{i=1}^N \\nabla l_i(\\theta_t),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "cdefe165", + "metadata": { + "editable": true + }, + "source": [ + "whereas one SGD step is:" + ] + }, + { + "cell_type": "markdown", + "id": "ac200d56", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\theta_{t+1} = \\theta_t -\\eta \\nabla l_{i_t}(\\theta_t),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "eb3edfb3", + "metadata": { + "editable": true + }, + "source": [ + "with $i_t$ randomly chosen. On smooth convex problems, GD and SGD both\n", + "converge to the global minimum, but their rates differ. GD can take\n", + "larger, more stable steps since it uses the exact gradient, achieving\n", + "an error that decreases on the order of $O(1/t)$ per iteration for\n", + "convex objectives (and even exponentially fast for strongly convex\n", + "cases). In contrast, plain SGD has more variance in each step, leading\n", + "to sublinear convergence in expectation – typically $O(1/\\sqrt{t})$\n", + "for general convex objectives (\\thetaith appropriate diminishing step\n", + "sizes) . Intuitively, GD’s trajectory is smoother and more\n", + "predictable, while SGD’s path oscillates due to noise but costs far\n", + "less per iteration, enabling many more updates in the same time." + ] + }, + { + "cell_type": "markdown", + "id": "7fe05c0d", + "metadata": { + "editable": true + }, + "source": [ + "### Strongly Convex Case\n", + "\n", + "If $C(\\theta)$ is strongly convex and $L$-smooth (so GD enjoys linear\n", + "convergence), the gap $C(\\theta_t)-C(\\theta^*)$ for GD shrinks as" + ] + }, + { + "cell_type": "markdown", + "id": "2ae403f1", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C(\\theta_t) - C(\\theta^* ) \\le \\Big(1 - \\frac{\\mu}{L}\\Big)^t [C(\\theta_0)-C(\\theta^*)],\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "44272171", + "metadata": { + "editable": true + }, + "source": [ + "a geometric (linear) convergence per iteration . Achieving an\n", + "$\\epsilon$-accurate solution thus takes on the order of\n", + "$\\log(1/\\epsilon)$ iterations for GD. However, each GD iteration costs\n", + "$O(N)$ gradient evaluations. SGD cannot exploit strong convexity to\n", + "obtain a linear rate – instead, with a properly decaying step size\n", + "(e.g. $\\eta_t = \\frac{1}{\\mu t}$) or iterate averaging, SGD attains an\n", + "$O(1/t)$ convergence rate in expectation . For example, one result\n", + "of Moulines and Bach 2011, see shows that with $\\eta_t = \\Theta(1/t)$," + ] + }, + { + "cell_type": "markdown", + "id": "9cde29ef", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathbb{E}[C(\\theta_t) - C(\\theta^*)] = O(1/t),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "9b77f20e", + "metadata": { + "editable": true + }, + "source": [ + "for strongly convex, smooth $F$ . This $1/t$ rate is slower per\n", + "iteration than GD’s exponential decay, but each SGD iteration is $N$\n", + "times cheaper. 
In fact, to reach error $\\epsilon$, plain SGD needs on\n", + "the order of $T=O(1/\\epsilon)$ iterations (sub-linear convergence),\n", + "while GD needs $O(\\log(1/\\epsilon))$ iterations. When accounting for\n", + "cost-per-iteration, GD requires $O(N \\log(1/\\epsilon))$ total gradient\n", + "computations versus SGD’s $O(1/\\epsilon)$ single-sample\n", + "computations. In large-scale regimes (huge $N$), SGD can be\n", + "faster in wall-clock time because $N \\log(1/\\epsilon)$ may far exceed\n", + "$1/\\epsilon$ for reasonable accuracy levels. In other words,\n", + "with millions of data points, one epoch of GD (one full gradient) is\n", + "extremely costly, whereas SGD can make $N$ cheap updates in the time\n", + "GD makes one – often yielding a good solution faster in practice, even\n", + "though SGD’s asymptotic error decays more slowly. As one lecture\n", + "succinctly puts it: “SGD can be super effective in terms of iteration\n", + "cost and memory, but SGD is slow to converge and can’t adapt to strong\n", + "convexity” . Thus, the break-even point depends on $N$ and the desired\n", + "accuracy: for moderate accuracy on very large $N$, SGD’s cheaper\n", + "updates win; for extremely high precision (very small $\\epsilon$) on a\n", + "modest $N$, GD’s fast convergence per step can be advantageous." + ] + }, + { + "cell_type": "markdown", + "id": "4479bd97", + "metadata": { + "editable": true + }, + "source": [ + "### Non-Convex Problems\n", + "\n", + "In non-convex optimization (e.g. deep neural networks), neither GD nor\n", + "SGD guarantees global minima, but SGD often displays faster progress\n", + "in finding useful minima. Theoretical results here are weaker, usually\n", + "showing convergence to a stationary point $\\theta$ ($|\\nabla C|$ is\n", + "small) in expectation. For example, GD might require $O(1/\\epsilon^2)$\n", + "iterations to ensure $|\\nabla C(\\theta)| < \\epsilon$, and SGD typically has\n", + "similar polynomial complexity (often worse due to gradient\n", + "noise). However, a noteworthy difference is that SGD’s stochasticity\n", + "can help escape saddle points or poor local minima. Random gradient\n", + "fluctuations act like implicit noise, helping the iterate “jump” out\n", + "of flat saddle regions where full-batch GD could stagnate . In fact,\n", + "research has shown that adding noise to GD can guarantee escaping\n", + "saddle points in polynomial time, and the inherent noise in SGD often\n", + "serves this role. Empirically, this means SGD can sometimes find a\n", + "lower loss basin faster, whereas full-batch GD might get “stuck” near\n", + "saddle points or need a very small learning rate to navigate complex\n", + "error surfaces . Overall, in modern high-dimensional machine learning,\n", + "SGD (or mini-batch SGD) is the workhorse for large non-convex problems\n", + "because it converges to good solutions much faster in practice,\n", + "despite the lack of a linear convergence guarantee. Full-batch GD is\n", + "rarely used on large neural networks, as it would require tiny steps\n", + "to avoid divergence and is extremely slow per iteration ." + ] + }, + { + "cell_type": "markdown", + "id": "31ea65c9", + "metadata": { + "editable": true + }, + "source": [ + "## Memory Usage and Scalability\n", + "\n", + "A major advantage of SGD is its memory efficiency in handling large\n", + "datasets. 
Full-batch GD requires access to the entire training set for\n", + "each iteration, which often means the whole dataset (or a large\n", + "subset) must reside in memory to compute $\\nabla C(\\theta)$ . This results\n", + "in memory usage that scales linearly with the dataset size $N$. For\n", + "instance, if each training sample is large (e.g. high-dimensional\n", + "features), computing a full gradient may require storing a substantial\n", + "portion of the data or all intermediate gradients until they are\n", + "aggregated. In contrast, SGD needs only a single (or a small\n", + "mini-batch of) training example(s) in memory at any time . The\n", + "algorithm processes one sample (or mini-batch) at a time and\n", + "immediately updates the model, discarding that sample before moving to\n", + "the next. This streaming approach means that memory footprint is\n", + "essentially independent of $N$ (apart from storing the model\n", + "parameters themselves). As one source notes, gradient descent\n", + "“requires more memory than SGD” because it “must store the entire\n", + "dataset for each iteration,” whereas SGD “only needs to store the\n", + "current training example” . In practical terms, if you have a dataset\n", + "of size, say, 1 million examples, full-batch GD would need memory for\n", + "all million every step, while SGD could be implemented to load just\n", + "one example at a time – a crucial benefit if data are too large to fit\n", + "in RAM or GPU memory. This scalability makes SGD suitable for\n", + "large-scale learning: as long as you can stream data from disk, SGD\n", + "can handle arbitrarily large datasets with fixed memory. In fact, SGD\n", + "“does not need to remember which examples were visited” in the past,\n", + "allowing it to run in an online fashion on infinite data streams\n", + ". Full-batch GD, on the other hand, would require multiple passes\n", + "through a giant dataset per update (or a complex distributed memory\n", + "system), which is often infeasible.\n", + "\n", + "There is also a secondary memory effect: computing a full-batch\n", + "gradient in deep learning requires storing all intermediate\n", + "activations for backpropagation across the entire batch. A very large\n", + "batch (approaching the full dataset) might exhaust GPU memory due to\n", + "the need to hold activation gradients for thousands or millions of\n", + "examples simultaneously. SGD/minibatches mitigate this by splitting\n", + "the workload – e.g. with a mini-batch of size 32 or 256, memory use\n", + "stays bounded, whereas a full-batch (size = $N$) forward/backward pass\n", + "could not even be executed if $N$ is huge. Techniques like gradient\n", + "accumulation exist to simulate large-batch GD by summing many\n", + "small-batch gradients – but these still process data in manageable\n", + "chunks to avoid memory overflow. In summary, memory complexity for GD\n", + "grows with $N$, while for SGD it remains $O(1)$ w.r.t. dataset size\n", + "(only the model and perhaps a mini-batch reside in memory) . This is a\n", + "key reason why batch GD “does not scale” to very large data and why\n", + "virtually all large-scale machine learning algorithms rely on\n", + "stochastic or mini-batch methods." + ] + }, + { + "cell_type": "markdown", + "id": "3f3fe4c4", + "metadata": { + "editable": true + }, + "source": [ + "## Empirical Evidence: Convergence Time and Memory in Practice\n", + "\n", + "Empirical studies strongly support the theoretical trade-offs\n", + "above. 
In large-scale machine learning tasks, SGD often converges to a\n", + "good solution much faster in wall-clock time than full-batch GD, and\n", + "it uses far less memory. For example, Bottou & Bousquet (2008)\n", + "analyzed learning time under a fixed computational budget and\n", + "concluded that when data is abundant, it’s better to use a faster\n", + "(even if less precise) optimization method to process more examples in\n", + "the same time . This analysis showed that for large-scale problems,\n", + "processing more data with SGD yields lower error than spending the\n", + "time to do exact (batch) optimization on fewer data . In other words,\n", + "if you have a time budget, it’s often optimal to accept slightly\n", + "slower convergence per step (as with SGD) in exchange for being able\n", + "to use many more training samples in that time. This phenomenon is\n", + "borne out by experiments:" + ] + }, + { + "cell_type": "markdown", + "id": "69d08c69", + "metadata": { + "editable": true + }, + "source": [ + "### Deep Neural Networks\n", + "\n", + "In modern deep learning, full-batch GD is so slow that it is rarely\n", + "attempted; instead, mini-batch SGD is standard. A recent study\n", + "demonstrated that it is possible to train a ResNet-50 on ImageNet\n", + "using full-batch gradient descent, but it required careful tuning\n", + "(e.g. gradient clipping, tiny learning rates) and vast computational\n", + "resources – and even then, each full-batch update was extremely\n", + "expensive.\n", + "\n", + "Using a huge batch\n", + "(closer to full GD) tends to slow down convergence if the learning\n", + "rate is not scaled up, and often encounters optimization difficulties\n", + "(plateaus) that small batches avoid.\n", + "Empirically, small or medium\n", + "batch SGD finds minima in fewer clock hours because it can rapidly\n", + "loop over the data with gradient noise aiding exploration." + ] + }, + { + "cell_type": "markdown", + "id": "4e2b549d", + "metadata": { + "editable": true + }, + "source": [ + "### Memory constraints\n", + "\n", + "From a memory standpoint, practitioners note that batch GD becomes\n", + "infeasible on large data. For example, if one tried to do full-batch\n", + "training on a dataset that doesn’t fit in RAM or GPU memory, the\n", + "program would resort to heavy disk I/O or simply crash. SGD\n", + "circumvents this by processing mini-batches. Even in cases where data\n", + "does fit in memory, using a full batch can spike memory usage due to\n", + "storing all gradients. One empirical observation is that mini-batch\n", + "training has a “lower, fluctuating usage pattern” of memory, whereas\n", + "full-batch loading “quickly consumes memory (often exceeding limits)”\n", + ". This is especially relevant for graph neural networks or other\n", + "models where a “batch” may include a huge chunk of a graph: full-batch\n", + "gradient computation can exhaust GPU memory, whereas mini-batch\n", + "methods keep memory usage manageable .\n", + "\n", + "In summary, SGD converges faster than full-batch GD in terms of actual\n", + "training time for large-scale problems, provided we measure\n", + "convergence as reaching a good-enough solution. Theoretical bounds\n", + "show SGD needs more iterations, but because it performs many more\n", + "updates per unit time (and requires far less memory), it often\n", + "achieves lower loss in a given time frame than GD. 
Full-batch GD might\n", + "take slightly fewer iterations in theory, but each iteration is so\n", + "costly that it is “slower… especially for large datasets” . Meanwhile,\n", + "memory scaling strongly favors SGD: GD’s memory cost grows with\n", + "dataset size, making it impractical beyond a point, whereas SGD’s\n", + "memory use is modest and mostly constant w.r.t. $N$ . These\n", + "differences have made SGD (and mini-batch variants) the de facto\n", + "choice for training large machine learning models, from logistic\n", + "regression on millions of examples to deep neural networks with\n", + "billions of parameters. The consensus in both research and practice is\n", + "that for large-scale or high-dimensional tasks, SGD-type methods\n", + "converge quicker per unit of computation and handle memory constraints\n", + "better than standard full-batch gradient descent ." + ] + }, + { + "cell_type": "markdown", + "id": "48c2661e", + "metadata": { + "editable": true + }, + "source": [ + "## Second moment of the gradient\n", + "\n", + "In stochastic gradient descent, with and without momentum, we still\n", + "have to specify a schedule for tuning the learning rates $\\eta_t$\n", + "as a function of time. As discussed in the context of Newton's\n", + "method, this presents a number of dilemmas. The learning rate is\n", + "limited by the steepest direction which can change depending on the\n", + "current position in the landscape. To circumvent this problem, ideally\n", + "our algorithm would keep track of curvature and take large steps in\n", + "shallow, flat directions and small steps in steep, narrow directions.\n", + "Second-order methods accomplish this by calculating or approximating\n", + "the Hessian and normalizing the learning rate by the\n", + "curvature. However, this is very computationally expensive for\n", + "extremely large models. Ideally, we would like to be able to\n", + "adaptively change the step size to match the landscape without paying\n", + "the steep computational price of calculating or approximating\n", + "Hessians.\n", + "\n", + "During the last decade a number of methods have been introduced that accomplish\n", + "this by tracking not only the gradient, but also the second moment of\n", + "the gradient. These methods include AdaGrad, AdaDelta, Root Mean Squared Propagation (RMS-Prop), and\n", + "[ADAM](https://arxiv.org/abs/1412.6980)." + ] + }, + { + "cell_type": "markdown", + "id": "a2106298", + "metadata": { + "editable": true + }, + "source": [ + "## Challenge: Choosing a Fixed Learning Rate\n", + "A fixed $\\eta$ is hard to get right:\n", + "1. If $\\eta$ is too large, the updates can overshoot the minimum, causing oscillations or divergence\n", + "\n", + "2. If $\\eta$ is too small, convergence is very slow (many iterations to make progress)\n", + "\n", + "In practice, one often uses trial-and-error or schedules (decaying $\\eta$ over time) to find a workable balance.\n", + "For a function with steep directions and flat directions, a single global $\\eta$ may be inappropriate:\n", + "1. Steep coordinates require a smaller step size to avoid oscillation.\n", + "\n", + "2. Flat/shallow coordinates could use a larger step to speed up progress.\n", + "\n", + "3. This issue is pronounced in high-dimensional problems with **sparse or varying-scale features** – we need a method to adjust step sizesper feature." 
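To see these two failure modes in a concrete (made-up) setting, the sketch below runs plain gradient descent on the one-dimensional quadratic $C(\theta)=\frac{1}{2}a\theta^2$, whose gradient is $a\theta$; for curvature $a=10$, any fixed step size above $2/a=0.2$ diverges, while a very small step size barely makes progress.

```python
import numpy as np

def gd_on_quadratic(eta, a=10.0, theta0=1.0, n_iter=50):
    """Plain gradient descent on C(theta) = 0.5*a*theta**2, with gradient a*theta."""
    theta = theta0
    for _ in range(n_iter):
        theta -= eta * a * theta
    return theta

for eta in (0.25, 0.001, 0.15):
    print(f"eta = {eta:6.3f} -> theta after 50 steps: {gd_on_quadratic(eta):.3e}")

# eta = 0.25 overshoots and diverges (|1 - eta*a| > 1),
# eta = 0.001 converges but very slowly,
# eta = 0.15 is a workable compromise for this curvature.
```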
+ ] + }, + { + "cell_type": "markdown", + "id": "477a053c", + "metadata": { + "editable": true + }, + "source": [ + "## Motivation for Adaptive Step Sizes\n", + "\n", + "1. Instead of a fixed global $\\eta$, use an **adaptive learning rate** for each parameter that depends on the history of gradients.\n", + "\n", + "2. Parameters that have large accumulated gradient magnitude should get smaller steps (they've been changing a lot), whereas parameters with small or infrequent gradients can have larger relative steps.\n", + "\n", + "3. This is especially useful for sparse features: Rarely active features accumulate little gradient, so their learning rate remains comparatively high, ensuring they are not neglected\n", + "\n", + "4. Conversely, frequently active features accumulate large gradient sums, and their learning rate automatically decreases, preventing too-large updates\n", + "\n", + "5. Several algorithms implement this idea (AdaGrad, RMSProp, AdaDelta, Adam, etc.). We will derive **AdaGrad**, one of the first adaptive methods." + ] + }, + { + "cell_type": "markdown", + "id": "f0924df8", + "metadata": { + "editable": true + }, + "source": [ + "## AdaGrad algorithm, taken from [Goodfellow et al](https://www.deeplearningbook.org/contents/optimization.html)\n", + "\n", + "\n", + "\n", + "\n", + "

    [Figure: the AdaGrad algorithm, reproduced from Goodfellow et al., chapter 8]

    \n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "7743f26d", + "metadata": { + "editable": true + }, + "source": [ + "## Derivation of the AdaGrad Algorithm\n", + "\n", + "**Accumulating Gradient History.**\n", + "\n", + "1. AdaGrad maintains a running sum of squared gradients for each parameter (coordinate)\n", + "\n", + "2. Let $g_t = \\nabla C_{i_t}(x_t)$ be the gradient at step $t$ (or a subgradient for nondifferentiable cases).\n", + "\n", + "3. Initialize $r_0 = 0$ (an all-zero vector in $\\mathbb{R}^d$).\n", + "\n", + "4. At each iteration $t$, update the accumulation:" + ] + }, + { + "cell_type": "markdown", + "id": "ef4b5d6a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "r_t = r_{t-1} + g_t \\circ g_t,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "927e2738", + "metadata": { + "editable": true + }, + "source": [ + "1. Here $g_t \\circ g_t$ denotes element-wise square of the gradient vector. $g_t^{(j)} = g_{t-1}^{(j)} + (g_{t,j})^2$ for each parameter $j$.\n", + "\n", + "2. We can view $H_t = \\mathrm{diag}(r_t)$ as a diagonal matrix of past squared gradients. Initially $H_0 = 0$." + ] + }, + { + "cell_type": "markdown", + "id": "1753de13", + "metadata": { + "editable": true + }, + "source": [ + "## AdaGrad Update Rule Derivation\n", + "\n", + "We scale the gradient by the inverse square root of the accumulated matrix $H_t$. The AdaGrad update at step $t$ is:" + ] + }, + { + "cell_type": "markdown", + "id": "0db67ba3", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\theta_{t+1} =\\theta_t - \\eta H_t^{-1/2} g_t,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "7831e978", + "metadata": { + "editable": true + }, + "source": [ + "where $H_t^{-1/2}$ is the diagonal matrix with entries $(r_{t}^{(1)})^{-1/2}, \\dots, (r_{t}^{(d)})^{-1/2}$\n", + "In coordinates, this means each parameter $j$ has an individual step size:" + ] + }, + { + "cell_type": "markdown", + "id": "92a7758a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\theta_{t+1,j} =\\theta_{t,j} -\\frac{\\eta}{\\sqrt{r_{t,j}}}g_{t,j}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "df62a4ff", + "metadata": { + "editable": true + }, + "source": [ + "In practice we add a small constant $\\epsilon$ in the denominator for numerical stability to avoid division by zero:" + ] + }, + { + "cell_type": "markdown", + "id": "c8a2b948", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\theta_{t+1,j}= \\theta_{t,j}-\\frac{\\eta}{\\sqrt{\\epsilon + r_{t,j}}}g_{t,j}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "3f269e80", + "metadata": { + "editable": true + }, + "source": [ + "Equivalently, the effective learning rate for parameter $j$ at time $t$ is $\\displaystyle \\alpha_{t,j} = \\frac{\\eta}{\\sqrt{\\epsilon + r_{t,j}}}$. This decreases over time as $r_{t,j}$ grows." + ] + }, + { + "cell_type": "markdown", + "id": "f4ec584c", + "metadata": { + "editable": true + }, + "source": [ + "## AdaGrad Properties\n", + "\n", + "1. AdaGrad automatically tunes the step size for each parameter. Parameters with more *volatile or large gradients* get smaller steps, and those with *small or infrequent gradients* get relatively larger steps\n", + "\n", + "2. No manual schedule needed: The accumulation $r_t$ keeps increasing (or stays the same if gradient is zero), so step sizes $\\eta/\\sqrt{r_t}$ are non-increasing. 
This has a similar effect to a learning rate schedule, but individualized per coordinate.\n", + "\n", + "3. Sparse data benefit: For very sparse features, $r_{t,j}$ grows slowly, so that feature’s parameter retains a higher learning rate for longer, allowing it to make significant updates when it does get a gradient signal\n", + "\n", + "4. Convergence: In convex optimization, AdaGrad can be shown to achieve a sub-linear convergence rate comparable to the best fixed learning rate tuned for the problem\n", + "\n", + "It effectively reduces the need to tune $\\eta$ by hand.\n", + "1. Limitations: Because $r_t$ accumulates without bound, AdaGrad’s learning rates can become extremely small over long training, potentially slowing progress. (Later variants like RMSProp, AdaDelta, Adam address this by modifying the accumulation rule.)" + ] + }, + { + "cell_type": "markdown", + "id": "4b741016", + "metadata": { + "editable": true + }, + "source": [ + "## RMSProp: Adaptive Learning Rates\n", + "\n", + "Addresses AdaGrad’s diminishing learning rate issue.\n", + "Uses a decaying average of squared gradients (instead of a cumulative sum):" + ] + }, + { + "cell_type": "markdown", + "id": "76108e75", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "v_t = \\rho v_{t-1} + (1-\\rho)(\\nabla C(\\theta_t))^2,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "4c6a3353", + "metadata": { + "editable": true + }, + "source": [ + "with $\\rho$ typically $0.9$ (or $0.99$).\n", + "1. Update: $\\theta_{t+1} = \\theta_t - \\frac{\\eta}{\\sqrt{v_t + \\epsilon}} \\nabla C(\\theta_t)$.\n", + "\n", + "2. Recent gradients have more weight, so $v_t$ adapts to the current landscape.\n", + "\n", + "3. Avoids AdaGrad’s “infinite memory” problem – learning rate does not continuously decay to zero.\n", + "\n", + "RMSProp was first proposed in lecture notes by Geoff Hinton, 2012 - unpublished.)" + ] + }, + { + "cell_type": "markdown", + "id": "3e0a76ae", + "metadata": { + "editable": true + }, + "source": [ + "## RMSProp algorithm, taken from [Goodfellow et al](https://www.deeplearningbook.org/contents/optimization.html)\n", + "\n", + "\n", + "\n", + "\n", + "

    [Figure: the RMSProp algorithm, reproduced from Goodfellow et al., chapter 8]

    \n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "fa5fd82e", + "metadata": { + "editable": true + }, + "source": [ + "## Adam Optimizer\n", + "\n", + "Why combine Momentum and RMSProp? Motivation for Adam: Adaptive Moment Estimation (Adam) was introduced by Kingma an Ba (2014) to combine the benefits of momentum and RMSProp.\n", + "\n", + "1. Fast convergence by smoothing gradients (accelerates in long-term gradient direction).\n", + "\n", + "2. Adaptive rates (RMSProp): Per-dimension learning rate scaling for stability (handles different feature scales, sparse gradients).\n", + "\n", + "3. Adam uses both: maintains moving averages of both first moment (gradients) and second moment (squared gradients)\n", + "\n", + "4. Additionally, includes a mechanism to correct the bias in these moving averages (crucial in early iterations)\n", + "\n", + "**Result**: Adam is robust, achieves faster convergence with less tuning, and often outperforms SGD (with momentum) in practice." + ] + }, + { + "cell_type": "markdown", + "id": "89cda2f6", + "metadata": { + "editable": true + }, + "source": [ + "## [ADAM optimizer](https://arxiv.org/abs/1412.6980)\n", + "\n", + "In [ADAM](https://arxiv.org/abs/1412.6980), we keep a running average of\n", + "both the first and second moment of the gradient and use this\n", + "information to adaptively change the learning rate for different\n", + "parameters. The method is efficient when working with large\n", + "problems involving lots data and/or parameters. It is a combination of the\n", + "gradient descent with momentum algorithm and the RMSprop algorithm\n", + "discussed above." + ] + }, + { + "cell_type": "markdown", + "id": "69310c2b", + "metadata": { + "editable": true + }, + "source": [ + "## Why Combine Momentum and RMSProp?\n", + "\n", + "1. Momentum: Fast convergence by smoothing gradients (accelerates in long-term gradient direction).\n", + "\n", + "2. Adaptive rates (RMSProp): Per-dimension learning rate scaling for stability (handles different feature scales, sparse gradients).\n", + "\n", + "3. Adam uses both: maintains moving averages of both first moment (gradients) and second moment (squared gradients)\n", + "\n", + "4. Additionally, includes a mechanism to correct the bias in these moving averages (crucial in early iterations)\n", + "\n", + "Result: Adam is robust, achieves faster convergence with less tuning, and often outperforms SGD (with momentum) in practice" + ] + }, + { + "cell_type": "markdown", + "id": "7d6b8734", + "metadata": { + "editable": true + }, + "source": [ + "## Adam: Exponential Moving Averages (Moments)\n", + "Adam maintains two moving averages at each time step $t$ for each parameter $w$:\n", + "**First moment (mean) $m_t$.**\n", + "\n", + "The Momentum term" + ] + }, + { + "cell_type": "markdown", + "id": "106ce6bf", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "m_t = \\beta_1m_{t-1} + (1-\\beta_1)\\, \\nabla C(\\theta_t),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "3ba64fd6", + "metadata": { + "editable": true + }, + "source": [ + "**Second moment (uncentered variance) $v_t$.**\n", + "\n", + "The RMS term" + ] + }, + { + "cell_type": "markdown", + "id": "d2e1a9ee", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "v_t = \\beta_2v_{t-1} + (1-\\beta_2)(\\nabla C(\\theta_t))^2,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "00aae51f", + "metadata": { + "editable": true + }, + "source": [ + "with typical $\\beta_1 = 0.9$, $\\beta_2 = 0.999$. 
Initialize $m_0 = 0$, $v_0 = 0$.\n", + "\n", + " These are **biased** estimators of the true first and second moment of the gradients, especially at the start (since $m_0,v_0$ are zero)" + ] + }, + { + "cell_type": "markdown", + "id": "38adfadd", + "metadata": { + "editable": true + }, + "source": [ + "## Adam: Bias Correction\n", + "To counteract initialization bias in $m_t, v_t$, Adam computes bias-corrected estimates" + ] + }, + { + "cell_type": "markdown", + "id": "484156fb", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\hat{m}_t = \\frac{m_t}{1 - \\beta_1^t}, \\qquad \\hat{v}_t = \\frac{v_t}{1 - \\beta_2^t}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "45d1d0c2", + "metadata": { + "editable": true + }, + "source": [ + "* When $t$ is small, $1-\\beta_i^t \\approx 0$, so $\\hat{m}_t, \\hat{v}_t$ significantly larger than raw $m_t, v_t$, compensating for the initial zero bias.\n", + "\n", + "* As $t$ increases, $1-\\beta_i^t \\to 1$, and $\\hat{m}_t, \\hat{v}_t$ converge to $m_t, v_t$.\n", + "\n", + "* Bias correction is important for Adam’s stability in early iterations" + ] + }, + { + "cell_type": "markdown", + "id": "e62d5568", + "metadata": { + "editable": true + }, + "source": [ + "## Adam: Update Rule Derivation\n", + "Finally, Adam updates parameters using the bias-corrected moments:" + ] + }, + { + "cell_type": "markdown", + "id": "3eb873c1", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\theta_{t+1} =\\theta_t -\\frac{\\alpha}{\\sqrt{\\hat{v}_t} + \\epsilon}\\hat{m}_t,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "fc1129f6", + "metadata": { + "editable": true + }, + "source": [ + "where $\\epsilon$ is a small constant (e.g. $10^{-8}$) to prevent division by zero.\n", + "Breaking it down:\n", + "1. Compute gradient $\\nabla C(\\theta_t)$.\n", + "\n", + "2. Update first moment $m_t$ and second moment $v_t$ (exponential moving averages).\n", + "\n", + "3. Bias-correct: $\\hat{m}_t = m_t/(1-\\beta_1^t)$, $\\; \\hat{v}_t = v_t/(1-\\beta_2^t)$.\n", + "\n", + "4. Compute step: $\\Delta \\theta_t = \\frac{\\hat{m}_t}{\\sqrt{\\hat{v}_t} + \\epsilon}$.\n", + "\n", + "5. Update parameters: $\\theta_{t+1} = \\theta_t - \\alpha\\, \\Delta \\theta_t$.\n", + "\n", + "This is the Adam update rule as given in the original paper." + ] + }, + { + "cell_type": "markdown", + "id": "6f15ce48", + "metadata": { + "editable": true + }, + "source": [ + "## Adam vs. AdaGrad and RMSProp\n", + "\n", + "1. AdaGrad: Uses per-coordinate scaling like Adam, but no momentum. Tends to slow down too much due to cumulative history (no forgetting)\n", + "\n", + "2. RMSProp: Uses moving average of squared gradients (like Adam’s $v_t$) to maintain adaptive learning rates, but does not include momentum or bias-correction.\n", + "\n", + "3. Adam: Effectively RMSProp + Momentum + Bias-correction\n", + "\n", + " * Momentum ($m_t$) provides acceleration and smoother convergence.\n", + "\n", + " * Adaptive $v_t$ scaling moderates the step size per dimension.\n", + "\n", + " * Bias correction (absent in AdaGrad/RMSProp) ensures robust estimates early on.\n", + "\n", + "In practice, Adam often yields faster convergence and better tuning stability than RMSProp or AdaGrad alone" + ] + }, + { + "cell_type": "markdown", + "id": "44cb65e2", + "metadata": { + "editable": true + }, + "source": [ + "## Adaptivity Across Dimensions\n", + "\n", + "1. 
Adam adapts the step size \\emph{per coordinate}: parameters with larger gradient variance get smaller effective steps, those with smaller or sparse gradients get larger steps.\n", + "\n", + "2. This per-dimension adaptivity is inherited from AdaGrad/RMSProp and helps handle ill-conditioned or sparse problems.\n", + "\n", + "3. Meanwhile, momentum (first moment) allows Adam to continue making progress even if gradients become small or noisy, by leveraging accumulated direction." + ] + }, + { + "cell_type": "markdown", + "id": "e3862c40", + "metadata": { + "editable": true + }, + "source": [ + "## ADAM algorithm, taken from [Goodfellow et al](https://www.deeplearningbook.org/contents/optimization.html)\n", + "\n", + "\n", + "\n", + "\n", + "

    [Figure: the ADAM algorithm, reproduced from Goodfellow et al., chapter 8]

    \n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "c4aa2b35", + "metadata": { + "editable": true + }, + "source": [ + "## Algorithms and codes for Adagrad, RMSprop and Adam\n", + "\n", + "The algorithms we have implemented are well described in the text by [Goodfellow, Bengio and Courville, chapter 8](https://www.deeplearningbook.org/contents/optimization.html).\n", + "\n", + "The codes which implement these algorithms are discussed below here." + ] + }, + { + "cell_type": "markdown", + "id": "01de27d3", + "metadata": { + "editable": true + }, + "source": [ + "## Practical tips\n", + "\n", + "* **Randomize the data when making mini-batches**. It is always important to randomly shuffle the data when forming mini-batches. Otherwise, the gradient descent method can fit spurious correlations resulting from the order in which data is presented.\n", + "\n", + "* **Transform your inputs**. Learning becomes difficult when our landscape has a mixture of steep and flat directions. One simple trick for minimizing these situations is to standardize the data by subtracting the mean and normalizing the variance of input variables. Whenever possible, also decorrelate the inputs. To understand why this is helpful, consider the case of linear regression. It is easy to show that for the squared error cost function, the Hessian of the cost function is just the correlation matrix between the inputs. Thus, by standardizing the inputs, we are ensuring that the landscape looks homogeneous in all directions in parameter space. Since most deep networks can be viewed as linear transformations followed by a non-linearity at each layer, we expect this intuition to hold beyond the linear case.\n", + "\n", + "* **Monitor the out-of-sample performance.** Always monitor the performance of your model on a validation set (a small portion of the training data that is held out of the training process to serve as a proxy for the test set. If the validation error starts increasing, then the model is beginning to overfit. Terminate the learning process. This *early stopping* significantly improves performance in many settings.\n", + "\n", + "* **Adaptive optimization methods don't always have good generalization.** Recent studies have shown that adaptive methods such as ADAM, RMSPorp, and AdaGrad tend to have poor generalization compared to SGD or SGD with momentum, particularly in the high-dimensional limit (i.e. the number of parameters exceeds the number of data points). Although it is not clear at this stage why these methods perform so well in training deep neural networks, simpler procedures like properly-tuned SGD may work as well or better in these applications." + ] + }, + { + "cell_type": "markdown", + "id": "78a1a601", + "metadata": { + "editable": true + }, + "source": [ + "## Sneaking in automatic differentiation using Autograd\n", + "\n", + "In the examples here we take the liberty of sneaking in automatic\n", + "differentiation (without having discussed the mathematics). In\n", + "project 1 you will write the gradients as discussed above, that is\n", + "hard-coding the gradients. By introducing automatic differentiation\n", + "via the library **autograd**, which is now replaced by **JAX**, we have\n", + "more flexibility in setting up alternative cost functions.\n", + "\n", + "The\n", + "first example shows results with ordinary leats squares." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 9, + "id": "c721352d", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# Using Autograd to calculate gradients for OLS\n", + "from random import random, seed\n", + "import numpy as np\n", + "import autograd.numpy as np\n", + "import matplotlib.pyplot as plt\n", + "from autograd import grad\n", + "\n", + "def CostOLS(theta):\n", + " return (1.0/n)*np.sum((y-X @ theta)**2)\n", + "\n", + "n = 100\n", + "x = 2*np.random.rand(n,1)\n", + "y = 4+3*x+np.random.randn(n,1)\n", + "\n", + "X = np.c_[np.ones((n,1)), x]\n", + "XT_X = X.T @ X\n", + "theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)\n", + "print(\"Own inversion\")\n", + "print(theta_linreg)\n", + "# Hessian matrix\n", + "H = (2.0/n)* XT_X\n", + "EigValues, EigVectors = np.linalg.eig(H)\n", + "print(f\"Eigenvalues of Hessian Matrix:{EigValues}\")\n", + "\n", + "theta = np.random.randn(2,1)\n", + "eta = 1.0/np.max(EigValues)\n", + "Niterations = 1000\n", + "# define the gradient\n", + "training_gradient = grad(CostOLS)\n", + "\n", + "for iter in range(Niterations):\n", + " gradients = training_gradient(theta)\n", + " theta -= eta*gradients\n", + "print(\"theta from own gd\")\n", + "print(theta)\n", + "\n", + "xnew = np.array([[0],[2]])\n", + "Xnew = np.c_[np.ones((2,1)), xnew]\n", + "ypredict = Xnew.dot(theta)\n", + "ypredict2 = Xnew.dot(theta_linreg)\n", + "\n", + "plt.plot(xnew, ypredict, \"r-\")\n", + "plt.plot(xnew, ypredict2, \"b-\")\n", + "plt.plot(x, y ,'ro')\n", + "plt.axis([0,2.0,0, 15.0])\n", + "plt.xlabel(r'$x$')\n", + "plt.ylabel(r'$y$')\n", + "plt.title(r'Random numbers ')\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "e36cec47", + "metadata": { + "editable": true + }, + "source": [ + "## Same code but now with momentum gradient descent" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "id": "fc5df7eb", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# Using Autograd to calculate gradients for OLS\n", + "from random import random, seed\n", + "import numpy as np\n", + "import autograd.numpy as np\n", + "import matplotlib.pyplot as plt\n", + "from autograd import grad\n", + "\n", + "def CostOLS(theta):\n", + " return (1.0/n)*np.sum((y-X @ theta)**2)\n", + "\n", + "n = 100\n", + "x = 2*np.random.rand(n,1)\n", + "y = 4+3*x#+np.random.randn(n,1)\n", + "\n", + "X = np.c_[np.ones((n,1)), x]\n", + "XT_X = X.T @ X\n", + "theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)\n", + "print(\"Own inversion\")\n", + "print(theta_linreg)\n", + "# Hessian matrix\n", + "H = (2.0/n)* XT_X\n", + "EigValues, EigVectors = np.linalg.eig(H)\n", + "print(f\"Eigenvalues of Hessian Matrix:{EigValues}\")\n", + "\n", + "theta = np.random.randn(2,1)\n", + "eta = 1.0/np.max(EigValues)\n", + "Niterations = 30\n", + "\n", + "# define the gradient\n", + "training_gradient = grad(CostOLS)\n", + "\n", + "for iter in range(Niterations):\n", + " gradients = training_gradient(theta)\n", + " theta -= eta*gradients\n", + " print(iter,gradients[0],gradients[1])\n", + "print(\"theta from own gd\")\n", + "print(theta)\n", + "\n", + "# Now improve with momentum gradient descent\n", + "change = 0.0\n", + "delta_momentum = 0.3\n", + "for iter in range(Niterations):\n", + " # calculate gradient\n", + " gradients = training_gradient(theta)\n", + " # calculate update\n", + " new_change = eta*gradients+delta_momentum*change\n", + " # take a step\n", + " theta -= new_change\n", + " # save the 
change\n", + " change = new_change\n", + " print(iter,gradients[0],gradients[1])\n", + "print(\"theta from own gd wth momentum\")\n", + "print(theta)" + ] + }, + { + "cell_type": "markdown", + "id": "0b27af70", + "metadata": { + "editable": true + }, + "source": [ + "## Including Stochastic Gradient Descent with Autograd\n", + "\n", + "In this code we include the stochastic gradient descent approach\n", + "discussed above. Note here that we specify which argument we are\n", + "taking the derivative with respect to when using **autograd**." + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "id": "adef9763", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# Using Autograd to calculate gradients using SGD\n", + "# OLS example\n", + "from random import random, seed\n", + "import numpy as np\n", + "import autograd.numpy as np\n", + "import matplotlib.pyplot as plt\n", + "from autograd import grad\n", + "\n", + "# Note change from previous example\n", + "def CostOLS(y,X,theta):\n", + " return np.sum((y-X @ theta)**2)\n", + "\n", + "n = 100\n", + "x = 2*np.random.rand(n,1)\n", + "y = 4+3*x+np.random.randn(n,1)\n", + "\n", + "X = np.c_[np.ones((n,1)), x]\n", + "XT_X = X.T @ X\n", + "theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)\n", + "print(\"Own inversion\")\n", + "print(theta_linreg)\n", + "# Hessian matrix\n", + "H = (2.0/n)* XT_X\n", + "EigValues, EigVectors = np.linalg.eig(H)\n", + "print(f\"Eigenvalues of Hessian Matrix:{EigValues}\")\n", + "\n", + "theta = np.random.randn(2,1)\n", + "eta = 1.0/np.max(EigValues)\n", + "Niterations = 1000\n", + "\n", + "# Note that we request the derivative wrt third argument (theta, 2 here)\n", + "training_gradient = grad(CostOLS,2)\n", + "\n", + "for iter in range(Niterations):\n", + " gradients = (1.0/n)*training_gradient(y, X, theta)\n", + " theta -= eta*gradients\n", + "print(\"theta from own gd\")\n", + "print(theta)\n", + "\n", + "xnew = np.array([[0],[2]])\n", + "Xnew = np.c_[np.ones((2,1)), xnew]\n", + "ypredict = Xnew.dot(theta)\n", + "ypredict2 = Xnew.dot(theta_linreg)\n", + "\n", + "plt.plot(xnew, ypredict, \"r-\")\n", + "plt.plot(xnew, ypredict2, \"b-\")\n", + "plt.plot(x, y ,'ro')\n", + "plt.axis([0,2.0,0, 15.0])\n", + "plt.xlabel(r'$x$')\n", + "plt.ylabel(r'$y$')\n", + "plt.title(r'Random numbers ')\n", + "plt.show()\n", + "\n", + "n_epochs = 50\n", + "M = 5 #size of each minibatch\n", + "m = int(n/M) #number of minibatches\n", + "t0, t1 = 5, 50\n", + "def learning_schedule(t):\n", + " return t0/(t+t1)\n", + "\n", + "theta = np.random.randn(2,1)\n", + "\n", + "for epoch in range(n_epochs):\n", + "# Can you figure out a better way of setting up the contributions to each batch?\n", + " for i in range(m):\n", + " random_index = M*np.random.randint(m)\n", + " xi = X[random_index:random_index+M]\n", + " yi = y[random_index:random_index+M]\n", + " gradients = (1.0/M)*training_gradient(yi, xi, theta)\n", + " eta = learning_schedule(epoch*m+i)\n", + " theta = theta - eta*gradients\n", + "print(\"theta from own sdg\")\n", + "print(theta)" + ] + }, + { + "cell_type": "markdown", + "id": "310fe5b2", + "metadata": { + "editable": true + }, + "source": [ + "## Same code but now with momentum gradient descent" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "id": "bcf65acf", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# Using Autograd to calculate gradients using SGD\n", + "# OLS example\n", + "from random import random, seed\n", 
+ "import numpy as np\n", + "import autograd.numpy as np\n", + "import matplotlib.pyplot as plt\n", + "from autograd import grad\n", + "\n", + "# Note change from previous example\n", + "def CostOLS(y,X,theta):\n", + " return np.sum((y-X @ theta)**2)\n", + "\n", + "n = 100\n", + "x = 2*np.random.rand(n,1)\n", + "y = 4+3*x+np.random.randn(n,1)\n", + "\n", + "X = np.c_[np.ones((n,1)), x]\n", + "XT_X = X.T @ X\n", + "theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)\n", + "print(\"Own inversion\")\n", + "print(theta_linreg)\n", + "# Hessian matrix\n", + "H = (2.0/n)* XT_X\n", + "EigValues, EigVectors = np.linalg.eig(H)\n", + "print(f\"Eigenvalues of Hessian Matrix:{EigValues}\")\n", + "\n", + "theta = np.random.randn(2,1)\n", + "eta = 1.0/np.max(EigValues)\n", + "Niterations = 100\n", + "\n", + "# Note that we request the derivative wrt third argument (theta, 2 here)\n", + "training_gradient = grad(CostOLS,2)\n", + "\n", + "for iter in range(Niterations):\n", + " gradients = (1.0/n)*training_gradient(y, X, theta)\n", + " theta -= eta*gradients\n", + "print(\"theta from own gd\")\n", + "print(theta)\n", + "\n", + "\n", + "n_epochs = 50\n", + "M = 5 #size of each minibatch\n", + "m = int(n/M) #number of minibatches\n", + "t0, t1 = 5, 50\n", + "def learning_schedule(t):\n", + " return t0/(t+t1)\n", + "\n", + "theta = np.random.randn(2,1)\n", + "\n", + "change = 0.0\n", + "delta_momentum = 0.3\n", + "\n", + "for epoch in range(n_epochs):\n", + " for i in range(m):\n", + " random_index = M*np.random.randint(m)\n", + " xi = X[random_index:random_index+M]\n", + " yi = y[random_index:random_index+M]\n", + " gradients = (1.0/M)*training_gradient(yi, xi, theta)\n", + " eta = learning_schedule(epoch*m+i)\n", + " # calculate update\n", + " new_change = eta*gradients+delta_momentum*change\n", + " # take a step\n", + " theta -= new_change\n", + " # save the change\n", + " change = new_change\n", + "print(\"theta from own sdg with momentum\")\n", + "print(theta)" + ] + }, + { + "cell_type": "markdown", + "id": "f5e2c550", + "metadata": { + "editable": true + }, + "source": [ + "## But none of these can compete with Newton's method\n", + "\n", + "Note that we here have introduced automatic differentiation" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "id": "300a02a4", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# Using Newton's method\n", + "from random import random, seed\n", + "import numpy as np\n", + "import autograd.numpy as np\n", + "from autograd import grad\n", + "\n", + "def CostOLS(theta):\n", + " return (1.0/n)*np.sum((y-X @ theta)**2)\n", + "\n", + "n = 100\n", + "x = 2*np.random.rand(n,1)\n", + "y = 4+3*x+5*x*x\n", + "\n", + "X = np.c_[np.ones((n,1)), x, x*x]\n", + "XT_X = X.T @ X\n", + "theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)\n", + "print(\"Own inversion\")\n", + "print(theta_linreg)\n", + "# Hessian matrix\n", + "H = (2.0/n)* XT_X\n", + "# Note that here the Hessian does not depend on the parameters theta\n", + "invH = np.linalg.pinv(H)\n", + "theta = np.random.randn(3,1)\n", + "Niterations = 5\n", + "# define the gradient\n", + "training_gradient = grad(CostOLS)\n", + "\n", + "for iter in range(Niterations):\n", + " gradients = training_gradient(theta)\n", + " theta -= invH @ gradients\n", + " print(iter,gradients[0],gradients[1])\n", + "print(\"theta from own Newton code\")\n", + "print(theta)" + ] + }, + { + "cell_type": "markdown", + "id": "5cb5fd26", + "metadata": { + "editable": true + }, + "source": [ + "## 
Similar (second order function now) problem but now with AdaGrad" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "id": "030efc5d", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# Using Autograd to calculate gradients using AdaGrad and Stochastic Gradient descent\n", + "# OLS example\n", + "from random import random, seed\n", + "import numpy as np\n", + "import autograd.numpy as np\n", + "import matplotlib.pyplot as plt\n", + "from autograd import grad\n", + "\n", + "# Note change from previous example\n", + "def CostOLS(y,X,theta):\n", + " return np.sum((y-X @ theta)**2)\n", + "\n", + "n = 1000\n", + "x = np.random.rand(n,1)\n", + "y = 2.0+3*x +4*x*x\n", + "\n", + "X = np.c_[np.ones((n,1)), x, x*x]\n", + "XT_X = X.T @ X\n", + "theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)\n", + "print(\"Own inversion\")\n", + "print(theta_linreg)\n", + "\n", + "\n", + "# Note that we request the derivative wrt third argument (theta, 2 here)\n", + "training_gradient = grad(CostOLS,2)\n", + "# Define parameters for Stochastic Gradient Descent\n", + "n_epochs = 50\n", + "M = 5 #size of each minibatch\n", + "m = int(n/M) #number of minibatches\n", + "# Guess for unknown parameters theta\n", + "theta = np.random.randn(3,1)\n", + "\n", + "# Value for learning rate\n", + "eta = 0.01\n", + "# Including AdaGrad parameter to avoid possible division by zero\n", + "delta = 1e-8\n", + "for epoch in range(n_epochs):\n", + " Giter = 0.0\n", + " for i in range(m):\n", + " random_index = M*np.random.randint(m)\n", + " xi = X[random_index:random_index+M]\n", + " yi = y[random_index:random_index+M]\n", + " gradients = (1.0/M)*training_gradient(yi, xi, theta)\n", + " Giter += gradients*gradients\n", + " update = gradients*eta/(delta+np.sqrt(Giter))\n", + " theta -= update\n", + "print(\"theta from own AdaGrad\")\n", + "print(theta)" + ] + }, + { + "cell_type": "markdown", + "id": "66850bb7", + "metadata": { + "editable": true + }, + "source": [ + "Running this code we note an almost perfect agreement with the results from matrix inversion." 
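As a quick numerical check of this agreement, one can compare the two parameter vectors directly; this small sketch assumes the arrays `theta` (from the AdaGrad loop) and `theta_linreg` (from the pseudoinverse) produced by the code above are still in memory:

```python
import numpy as np

# Assumes theta and theta_linreg from the AdaGrad example above are still defined
print("max absolute deviation:", np.max(np.abs(theta - theta_linreg)))
print("relative deviation    :",
      np.linalg.norm(theta - theta_linreg) / np.linalg.norm(theta_linreg))
```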
+ ] + }, + { + "cell_type": "markdown", + "id": "e1608bcf", + "metadata": { + "editable": true + }, + "source": [ + "## RMSprop for adaptive learning rate with Stochastic Gradient Descent" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "id": "0ba7d8f7", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# Using Autograd to calculate gradients using RMSprop and Stochastic Gradient descent\n", + "# OLS example\n", + "from random import random, seed\n", + "import numpy as np\n", + "import autograd.numpy as np\n", + "import matplotlib.pyplot as plt\n", + "from autograd import grad\n", + "\n", + "# Note change from previous example\n", + "def CostOLS(y,X,theta):\n", + " return np.sum((y-X @ theta)**2)\n", + "\n", + "n = 1000\n", + "x = np.random.rand(n,1)\n", + "y = 2.0+3*x +4*x*x# +np.random.randn(n,1)\n", + "\n", + "X = np.c_[np.ones((n,1)), x, x*x]\n", + "XT_X = X.T @ X\n", + "theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)\n", + "print(\"Own inversion\")\n", + "print(theta_linreg)\n", + "\n", + "\n", + "# Note that we request the derivative wrt third argument (theta, 2 here)\n", + "training_gradient = grad(CostOLS,2)\n", + "# Define parameters for Stochastic Gradient Descent\n", + "n_epochs = 50\n", + "M = 5 #size of each minibatch\n", + "m = int(n/M) #number of minibatches\n", + "# Guess for unknown parameters theta\n", + "theta = np.random.randn(3,1)\n", + "\n", + "# Value for learning rate\n", + "eta = 0.01\n", + "# Value for parameter rho\n", + "rho = 0.99\n", + "# Including AdaGrad parameter to avoid possible division by zero\n", + "delta = 1e-8\n", + "for epoch in range(n_epochs):\n", + " Giter = 0.0\n", + " for i in range(m):\n", + " random_index = M*np.random.randint(m)\n", + " xi = X[random_index:random_index+M]\n", + " yi = y[random_index:random_index+M]\n", + " gradients = (1.0/M)*training_gradient(yi, xi, theta)\n", + "\t# Accumulated gradient\n", + "\t# Scaling with rho the new and the previous results\n", + " Giter = (rho*Giter+(1-rho)*gradients*gradients)\n", + "\t# Taking the diagonal only and inverting\n", + " update = gradients*eta/(delta+np.sqrt(Giter))\n", + "\t# Hadamard product\n", + " theta -= update\n", + "print(\"theta from own RMSprop\")\n", + "print(theta)" + ] + }, + { + "cell_type": "markdown", + "id": "0503f74b", + "metadata": { + "editable": true + }, + "source": [ + "## And finally [ADAM](https://arxiv.org/pdf/1412.6980.pdf)" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "id": "c2a2732a", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# Using Autograd to calculate gradients using RMSprop and Stochastic Gradient descent\n", + "# OLS example\n", + "from random import random, seed\n", + "import numpy as np\n", + "import autograd.numpy as np\n", + "import matplotlib.pyplot as plt\n", + "from autograd import grad\n", + "\n", + "# Note change from previous example\n", + "def CostOLS(y,X,theta):\n", + " return np.sum((y-X @ theta)**2)\n", + "\n", + "n = 1000\n", + "x = np.random.rand(n,1)\n", + "y = 2.0+3*x +4*x*x# +np.random.randn(n,1)\n", + "\n", + "X = np.c_[np.ones((n,1)), x, x*x]\n", + "XT_X = X.T @ X\n", + "theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)\n", + "print(\"Own inversion\")\n", + "print(theta_linreg)\n", + "\n", + "\n", + "# Note that we request the derivative wrt third argument (theta, 2 here)\n", + "training_gradient = grad(CostOLS,2)\n", + "# Define parameters for Stochastic Gradient Descent\n", + "n_epochs = 50\n", + "M 
= 5 #size of each minibatch\n", + "m = int(n/M) #number of minibatches\n", + "# Guess for unknown parameters theta\n", + "theta = np.random.randn(3,1)\n", + "\n", + "# Value for learning rate\n", + "eta = 0.01\n", + "# Value for parameters theta1 and theta2, see https://arxiv.org/abs/1412.6980\n", + "theta1 = 0.9\n", + "theta2 = 0.999\n", + "# Including AdaGrad parameter to avoid possible division by zero\n", + "delta = 1e-7\n", + "iter = 0\n", + "for epoch in range(n_epochs):\n", + " first_moment = 0.0\n", + " second_moment = 0.0\n", + " iter += 1\n", + " for i in range(m):\n", + " random_index = M*np.random.randint(m)\n", + " xi = X[random_index:random_index+M]\n", + " yi = y[random_index:random_index+M]\n", + " gradients = (1.0/M)*training_gradient(yi, xi, theta)\n", + " # Computing moments first\n", + " first_moment = theta1*first_moment + (1-theta1)*gradients\n", + " second_moment = theta2*second_moment+(1-theta2)*gradients*gradients\n", + " first_term = first_moment/(1.0-theta1**iter)\n", + " second_term = second_moment/(1.0-theta2**iter)\n", + "\t# Scaling with rho the new and the previous results\n", + " update = eta*first_term/(np.sqrt(second_term)+delta)\n", + " theta -= update\n", + "print(\"theta from own ADAM\")\n", + "print(theta)" + ] + }, + { + "cell_type": "markdown", + "id": "b8475863", + "metadata": { + "editable": true + }, + "source": [ + "## Material for the lab sessions\n", + "\n", + "1. Exercise set for week 37 and reminder on scaling (from lab sessions of week 35)\n", + "\n", + "2. Work on project 1\n", + "\n", + "\n", + "For more discussions of Ridge regression and calculation of averages, [Wessel van Wieringen's](https://arxiv.org/abs/1509.09169) article is highly recommended." + ] + }, + { + "cell_type": "markdown", + "id": "4d4d0717", + "metadata": { + "editable": true + }, + "source": [ + "## Reminder on different scaling methods\n", + "\n", + "Before fitting a regression model, it is good practice to normalize or\n", + "standardize the features. This ensures all features are on a\n", + "comparable scale, which is especially important when using\n", + "regularization. In the exercises this week we will perform standardization, scaling each\n", + "feature to have mean 0 and standard deviation 1.\n", + "\n", + "Here we compute the mean and standard deviation of each column (feature) in our design/feature matrix $\\boldsymbol{X}$.\n", + "Then we subtract the mean and divide by the standard deviation for each feature.\n", + "\n", + "In the example here we\n", + "we will also center the target $\\boldsymbol{y}$ to mean $0$. Centering $\\boldsymbol{y}$\n", + "(and each feature) means the model does not require a separate intercept\n", + "term, the data is shifted such that the intercept is effectively 0\n", + ". (In practice, one could include an intercept in the model and not\n", + "penalize it, but here we simplify by centering.)\n", + "Choose $n=100$ data points and set up $\\boldsymbol{x}, $\\boldsymbol{y}$ and the design matrix $\\boldsymbol{X}$." 
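The notes ask the reader to choose $n=100$ data points and set up $\boldsymbol{x}$, $\boldsymbol{y}$ and the design matrix $\boldsymbol{X}$, but do not fix a particular data set. The sketch below is therefore only one possible setup: the quadratic functional form and the polynomial degree are assumptions made for illustration, not taken from the text. The last two lines show one way of filling in the centering step that is left open in the next code cell, which reuses the names `X` and `y` defined here.

```python
import numpy as np

np.random.seed(2025)

n = 100
x = np.random.rand(n)                                        # n = 100 data points on [0, 1]
y = 2.0 + 3.0 * x + 4.0 * x**2 + 0.1 * np.random.randn(n)    # assumed quadratic example with noise

# Design matrix with polynomial features x, x^2, x^3 (no intercept column,
# since the intercept is handled by centering instead)
degree = 3
X = np.column_stack([x**p for p in range(1, degree + 1)])

# One possible completion of the centering step used in the next cell
y_mean = np.mean(y)
y_centered = y - y_mean
```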
+ ] + }, + { + "cell_type": "code", + "execution_count": 17, + "id": "46375144", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# Standardize features (zero mean, unit variance for each feature)\n", + "X_mean = X.mean(axis=0)\n", + "X_std = X.std(axis=0)\n", + "X_std[X_std == 0] = 1 # safeguard to avoid division by zero for constant features\n", + "X_norm = (X - X_mean) / X_std\n", + "\n", + "# Center the target to zero mean (optional, to simplify intercept handling)\n", + "y_mean = ?\n", + "y_centered = ?" + ] + }, + { + "cell_type": "markdown", + "id": "39426ccf", + "metadata": { + "editable": true + }, + "source": [ + "Do we need to center the values of $y$?\n", + "\n", + "After this preprocessing, each column of $\\boldsymbol{X}_{\\mathrm{norm}}$ has mean zero and standard deviation $1$\n", + "and $\\boldsymbol{y}_{\\mathrm{centered}}$ has mean 0. This can make the optimization landscape\n", + "nicer and ensures the regularization penalty $\\lambda \\sum_j\n", + "\\theta_j^2$ in Ridge regression treats each coefficient fairly (since features are on the\n", + "same scale)." + ] + }, + { + "cell_type": "markdown", + "id": "df7fe27f", + "metadata": { + "editable": true + }, + "source": [ + "## Functionality in Scikit-Learn\n", + "\n", + "**Scikit-Learn** has several functions which allow us to rescale the\n", + "data, normally resulting in much better results in terms of various\n", + "accuracy scores. The **StandardScaler** function in **Scikit-Learn**\n", + "ensures that for each feature/predictor we study the mean value is\n", + "zero and the variance is one (every column in the design/feature\n", + "matrix). This scaling has the drawback that it does not ensure that\n", + "we have a particular maximum or minimum in our data set. Another\n", + "function included in **Scikit-Learn** is the **MinMaxScaler** which\n", + "ensures that all features are exactly between $0$ and $1$. The" + ] + }, + { + "cell_type": "markdown", + "id": "8fd48e39", + "metadata": { + "editable": true + }, + "source": [ + "## More preprocessing\n", + "\n", + "The **Normalizer** scales each data\n", + "point such that the feature vector has a euclidean length of one. In other words, it\n", + "projects a data point on the circle (or sphere in the case of higher dimensions) with a\n", + "radius of 1. This means every data point is scaled by a different number (by the\n", + "inverse of it’s length).\n", + "This normalization is often used when only the direction (or angle) of the data matters,\n", + "not the length of the feature vector.\n", + "\n", + "The **RobustScaler** works similarly to the StandardScaler in that it\n", + "ensures statistical properties for each feature that guarantee that\n", + "they are on the same scale. However, the RobustScaler uses the median\n", + "and quartiles, instead of mean and variance. This makes the\n", + "RobustScaler ignore data points that are very different from the rest\n", + "(like measurement errors). These odd data points are also called\n", + "outliers, and might often lead to trouble for other scaling\n", + "techniques." + ] + }, + { + "cell_type": "markdown", + "id": "d6c60a0a", + "metadata": { + "editable": true + }, + "source": [ + "## Frequently used scaling functions\n", + "\n", + "Many features are often scaled using standardization to improve performance. In **Scikit-Learn** this is given by the **StandardScaler** function as discussed above. It is easy however to write your own. 
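As a quick illustration of the Scikit-Learn functionality described above, the sketch below applies **StandardScaler** and **MinMaxScaler** to a small random data set; the data itself is arbitrary and chosen only for this example. Note that both scalers are fitted on the training data only and then applied to the test data, anticipating the rule discussed further down that the same transformation must be reused on new data.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, MinMaxScaler

np.random.seed(2025)
X = np.random.rand(100, 3) * 10.0                 # arbitrary data set with three features
X_train, X_test = train_test_split(X, test_size=0.2)

# StandardScaler: zero mean and unit variance per feature, statistics from the training data
scaler = StandardScaler().fit(X_train)
X_train_std = scaler.transform(X_train)
X_test_std = scaler.transform(X_test)             # same transformation, training statistics
print(X_train_std.mean(axis=0), X_train_std.std(axis=0))

# MinMaxScaler: each feature mapped to [0, 1], based on the training min and max
minmax = MinMaxScaler().fit(X_train)
print(minmax.transform(X_train).min(axis=0), minmax.transform(X_train).max(axis=0))
```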
\n", + "Mathematically, this involves subtracting the mean and divide by the standard deviation over the data set, for each feature:" + ] + }, + { + "cell_type": "markdown", + "id": "1bb6eaa0", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "x_j^{(i)} \\rightarrow \\frac{x_j^{(i)} - \\overline{x}_j}{\\sigma(x_j)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "25135896", + "metadata": { + "editable": true + }, + "source": [ + "where $\\overline{x}_j$ and $\\sigma(x_j)$ are the mean and standard deviation, respectively, of the feature $x_j$.\n", + "This ensures that each feature has zero mean and unit standard deviation. For data sets where we do not have the standard deviation or don't wish to calculate it, it is then common to simply set it to one.\n", + "\n", + "Keep in mind that when you transform your data set before training a model, the same transformation needs to be done\n", + "on your eventual new data set before making a prediction. If we translate this into a Python code, it would could be implemented as" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "id": "469ca11e", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "\"\"\"\n", + "#Model training, we compute the mean value of y and X\n", + "y_train_mean = np.mean(y_train)\n", + "X_train_mean = np.mean(X_train,axis=0)\n", + "X_train = X_train - X_train_mean\n", + "y_train = y_train - y_train_mean\n", + "\n", + "# The we fit our model with the training data\n", + "trained_model = some_model.fit(X_train,y_train)\n", + "\n", + "\n", + "#Model prediction, we need also to transform our data set used for the prediction.\n", + "X_test = X_test - X_train_mean #Use mean from training data\n", + "y_pred = trained_model(X_test)\n", + "y_pred = y_pred + y_train_mean\n", + "\"\"\"" + ] + }, + { + "cell_type": "markdown", + "id": "33722029", + "metadata": { + "editable": true + }, + "source": [ + "Let us try to understand what this may imply mathematically when we\n", + "subtract the mean values, also known as *zero centering*. For\n", + "simplicity, we will focus on ordinary regression, as done in the above example.\n", + "\n", + "The cost/loss function for regression is" + ] + }, + { + "cell_type": "markdown", + "id": "fe27291e", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C(\\theta_0, \\theta_1, ... , \\theta_{p-1}) = \\frac{1}{n}\\sum_{i=0}^{n} \\left(y_i - \\theta_0 - \\sum_{j=1}^{p-1} X_{ij}\\theta_j\\right)^2,.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "ead1167d", + "metadata": { + "editable": true + }, + "source": [ + "Recall also that we use the squared value. This expression can lead to an\n", + "increased penalty for higher differences between predicted and\n", + "output/target values.\n", + "\n", + "What we have done is to single out the $\\theta_0$ term in the\n", + "definition of the mean squared error (MSE). The design matrix $X$\n", + "does in this case not contain any intercept column. When we take the\n", + "derivative with respect to $\\theta_0$, we want the derivative to obey" + ] + }, + { + "cell_type": "markdown", + "id": "b2efb706", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial \\theta_j} = 0,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "65333100", + "metadata": { + "editable": true + }, + "source": [ + "for all $j$. 
For $\\theta_0$ we have" + ] + }, + { + "cell_type": "markdown", + "id": "1fde497c", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial \\theta_0} = -\\frac{2}{n}\\sum_{i=0}^{n-1} \\left(y_i - \\theta_0 - \\sum_{j=1}^{p-1} X_{ij} \\theta_j\\right).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "264ce562", + "metadata": { + "editable": true + }, + "source": [ + "Multiplying away the constant $2/n$, we obtain" + ] + }, + { + "cell_type": "markdown", + "id": "0f63a6f8", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\sum_{i=0}^{n-1} \\theta_0 = \\sum_{i=0}^{n-1}y_i - \\sum_{i=0}^{n-1} \\sum_{j=1}^{p-1} X_{ij} \\theta_j.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "2ba0a6e4", + "metadata": { + "editable": true + }, + "source": [ + "Let us specialize first to the case where we have only two parameters $\\theta_0$ and $\\theta_1$.\n", + "Our result for $\\theta_0$ simplifies then to" + ] + }, + { + "cell_type": "markdown", + "id": "3b377f93", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "n\\theta_0 = \\sum_{i=0}^{n-1}y_i - \\sum_{i=0}^{n-1} X_{i1} \\theta_1.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "f05e9d08", + "metadata": { + "editable": true + }, + "source": [ + "We obtain then" + ] + }, + { + "cell_type": "markdown", + "id": "84784b8e", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\theta_0 = \\frac{1}{n}\\sum_{i=0}^{n-1}y_i - \\theta_1\\frac{1}{n}\\sum_{i=0}^{n-1} X_{i1}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b62c6e5a", + "metadata": { + "editable": true + }, + "source": [ + "If we define" + ] + }, + { + "cell_type": "markdown", + "id": "ecce9763", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mu_{\\boldsymbol{x}_1}=\\frac{1}{n}\\sum_{i=0}^{n-1} X_{i1},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "c9e1842a", + "metadata": { + "editable": true + }, + "source": [ + "and the mean value of the outputs as" + ] + }, + { + "cell_type": "markdown", + "id": "be12163e", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mu_y=\\frac{1}{n}\\sum_{i=0}^{n-1}y_i,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "a097e9ab", + "metadata": { + "editable": true + }, + "source": [ + "we have" + ] + }, + { + "cell_type": "markdown", + "id": "239422b0", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\theta_0 = \\mu_y - \\theta_1\\mu_{\\boldsymbol{x}_1}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "ed9778bb", + "metadata": { + "editable": true + }, + "source": [ + "In the general case with more parameters than $\\theta_0$ and $\\theta_1$, we have" + ] + }, + { + "cell_type": "markdown", + "id": "7179b77b", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\theta_0 = \\frac{1}{n}\\sum_{i=0}^{n-1}y_i - \\frac{1}{n}\\sum_{i=0}^{n-1}\\sum_{j=1}^{p-1} X_{ij}\\theta_j.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "aad2f56e", + "metadata": { + "editable": true + }, + "source": [ + "We can rewrite the latter equation as" + ] + }, + { + "cell_type": "markdown", + "id": "26aa9739", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\theta_0 = \\frac{1}{n}\\sum_{i=0}^{n-1}y_i - \\sum_{j=1}^{p-1} \\mu_{\\boldsymbol{x}_j}\\theta_j,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "d270cb13", + "metadata": { + "editable": true + }, + "source": [ + "where we have defined" + ] + }, + { + 
"cell_type": "markdown", + "id": "5a52457b", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mu_{\\boldsymbol{x}_j}=\\frac{1}{n}\\sum_{i=0}^{n-1} X_{ij},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "8c98105d", + "metadata": { + "editable": true + }, + "source": [ + "the mean value for all elements of the column vector $\\boldsymbol{x}_j$.\n", + "\n", + "Replacing $y_i$ with $y_i - y_i - \\overline{\\boldsymbol{y}}$ and centering also our design matrix results in a cost function (in vector-matrix disguise)" + ] + }, + { + "cell_type": "markdown", + "id": "4d82302f", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C(\\boldsymbol{\\theta}) = (\\boldsymbol{\\tilde{y}} - \\tilde{X}\\boldsymbol{\\theta})^T(\\boldsymbol{\\tilde{y}} - \\tilde{X}\\boldsymbol{\\theta}).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "a3a07a10", + "metadata": { + "editable": true + }, + "source": [ + "If we minimize with respect to $\\boldsymbol{\\theta}$ we have then" + ] + }, + { + "cell_type": "markdown", + "id": "ea19374e", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\hat{\\boldsymbol{\\theta}} = (\\tilde{X}^T\\tilde{X})^{-1}\\tilde{X}^T\\boldsymbol{\\tilde{y}},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "11dd1361", + "metadata": { + "editable": true + }, + "source": [ + "where $\\boldsymbol{\\tilde{y}} = \\boldsymbol{y} - \\overline{\\boldsymbol{y}}$\n", + "and $\\tilde{X}_{ij} = X_{ij} - \\frac{1}{n}\\sum_{k=0}^{n-1}X_{kj}$.\n", + "\n", + "For Ridge regression we need to add $\\lambda \\boldsymbol{\\theta}^T\\boldsymbol{\\theta}$ to the cost function and get then" + ] + }, + { + "cell_type": "markdown", + "id": "f6a52f34", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\hat{\\boldsymbol{\\theta}} = (\\tilde{X}^T\\tilde{X} + \\lambda I)^{-1}\\tilde{X}^T\\boldsymbol{\\tilde{y}}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "9d6807dc", + "metadata": { + "editable": true + }, + "source": [ + "What does this mean? And why do we insist on all this? Let us look at some examples.\n", + "\n", + "This code shows a simple first-order fit to a data set using the above transformed data, where we consider the role of the intercept first, by either excluding it or including it (*code example thanks to Øyvind Sigmundson Schøyen*). Here our scaling of the data is done by subtracting the mean values only.\n", + "Note also that we do not split the data into training and test." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 19, + "id": "2ed0cafc", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import numpy as np\n", + "import matplotlib.pyplot as plt\n", + "\n", + "from sklearn.linear_model import LinearRegression\n", + "\n", + "\n", + "np.random.seed(2021)\n", + "\n", + "def MSE(y_data,y_model):\n", + " n = np.size(y_model)\n", + " return np.sum((y_data-y_model)**2)/n\n", + "\n", + "\n", + "def fit_theta(X, y):\n", + " return np.linalg.pinv(X.T @ X) @ X.T @ y\n", + "\n", + "\n", + "true_theta = [2, 0.5, 3.7]\n", + "\n", + "x = np.linspace(0, 1, 11)\n", + "y = np.sum(\n", + " np.asarray([x ** p * b for p, b in enumerate(true_theta)]), axis=0\n", + ") + 0.1 * np.random.normal(size=len(x))\n", + "\n", + "degree = 3\n", + "X = np.zeros((len(x), degree))\n", + "\n", + "# Include the intercept in the design matrix\n", + "for p in range(degree):\n", + " X[:, p] = x ** p\n", + "\n", + "theta = fit_theta(X, y)\n", + "\n", + "# Intercept is included in the design matrix\n", + "skl = LinearRegression(fit_intercept=False).fit(X, y)\n", + "\n", + "print(f\"True theta: {true_theta}\")\n", + "print(f\"Fitted theta: {theta}\")\n", + "print(f\"Sklearn fitted theta: {skl.coef_}\")\n", + "ypredictOwn = X @ theta\n", + "ypredictSKL = skl.predict(X)\n", + "print(f\"MSE with intercept column\")\n", + "print(MSE(y,ypredictOwn))\n", + "print(f\"MSE with intercept column from SKL\")\n", + "print(MSE(y,ypredictSKL))\n", + "\n", + "\n", + "plt.figure()\n", + "plt.scatter(x, y, label=\"Data\")\n", + "plt.plot(x, X @ theta, label=\"Fit\")\n", + "plt.plot(x, skl.predict(X), label=\"Sklearn (fit_intercept=False)\")\n", + "\n", + "\n", + "# Do not include the intercept in the design matrix\n", + "X = np.zeros((len(x), degree - 1))\n", + "\n", + "for p in range(degree - 1):\n", + " X[:, p] = x ** (p + 1)\n", + "\n", + "# Intercept is not included in the design matrix\n", + "skl = LinearRegression(fit_intercept=True).fit(X, y)\n", + "\n", + "# Use centered values for X and y when computing coefficients\n", + "y_offset = np.average(y, axis=0)\n", + "X_offset = np.average(X, axis=0)\n", + "\n", + "theta = fit_theta(X - X_offset, y - y_offset)\n", + "intercept = np.mean(y_offset - X_offset @ theta)\n", + "\n", + "print(f\"Manual intercept: {intercept}\")\n", + "print(f\"Fitted theta (without intercept): {theta}\")\n", + "print(f\"Sklearn intercept: {skl.intercept_}\")\n", + "print(f\"Sklearn fitted theta (without intercept): {skl.coef_}\")\n", + "ypredictOwn = X @ theta\n", + "ypredictSKL = skl.predict(X)\n", + "print(f\"MSE with Manual intercept\")\n", + "print(MSE(y,ypredictOwn+intercept))\n", + "print(f\"MSE with Sklearn intercept\")\n", + "print(MSE(y,ypredictSKL))\n", + "\n", + "plt.plot(x, X @ theta + intercept, \"--\", label=\"Fit (manual intercept)\")\n", + "plt.plot(x, skl.predict(X), \"--\", label=\"Sklearn (fit_intercept=True)\")\n", + "plt.grid()\n", + "plt.legend()\n", + "\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "f72dbb49", + "metadata": { + "editable": true + }, + "source": [ + "The intercept is the value of our output/target variable\n", + "when all our features are zero and our function crosses the $y$-axis (for a one-dimensional case). \n", + "\n", + "Printing the MSE, we see first that both methods give the same MSE, as\n", + "they should. 
However, when we move to for example Ridge regression,\n", + "the way we treat the intercept may give a larger or smaller MSE,\n", + "meaning that the MSE can be penalized by the value of the\n", + "intercept. Not including the intercept in the fit, means that the\n", + "regularization term does not include $\\theta_0$. For different values\n", + "of $\\lambda$, this may lead to different MSE values. \n", + "\n", + "To remind the reader, the regularization term, with the intercept in Ridge regression, is given by" + ] + }, + { + "cell_type": "markdown", + "id": "b7759b1f", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\lambda \\vert\\vert \\boldsymbol{\\theta} \\vert\\vert_2^2 = \\lambda \\sum_{j=0}^{p-1}\\theta_j^2,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "ba0ecd6e", + "metadata": { + "editable": true + }, + "source": [ + "but when we take out the intercept, this equation becomes" + ] + }, + { + "cell_type": "markdown", + "id": "ae897f1e", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\lambda \\vert\\vert \\boldsymbol{\\theta} \\vert\\vert_2^2 = \\lambda \\sum_{j=1}^{p-1}\\theta_j^2.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "f9c41f7f", + "metadata": { + "editable": true + }, + "source": [ + "For Lasso regression we have" + ] + }, + { + "cell_type": "markdown", + "id": "fa013cc4", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\lambda \\vert\\vert \\boldsymbol{\\theta} \\vert\\vert_1 = \\lambda \\sum_{j=1}^{p-1}\\vert\\theta_j\\vert.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "0c9b24be", + "metadata": { + "editable": true + }, + "source": [ + "It means that, when scaling the design matrix and the outputs/targets,\n", + "by subtracting the mean values, we have an optimization problem which\n", + "is not penalized by the intercept. The MSE value can then be smaller\n", + "since it focuses only on the remaining quantities. If we however bring\n", + "back the intercept, we will get a MSE which then contains the\n", + "intercept.\n", + "\n", + "Armed with this wisdom, we attempt first to simply set the intercept equal to **False** in our implementation of Ridge regression for our well-known vanilla data set." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 20, + "id": "4f9b1fa0", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import numpy as np\n", + "import pandas as pd\n", + "import matplotlib.pyplot as plt\n", + "from sklearn.model_selection import train_test_split\n", + "from sklearn import linear_model\n", + "\n", + "def MSE(y_data,y_model):\n", + " n = np.size(y_model)\n", + " return np.sum((y_data-y_model)**2)/n\n", + "\n", + "\n", + "# A seed just to ensure that the random numbers are the same for every run.\n", + "# Useful for eventual debugging.\n", + "np.random.seed(3155)\n", + "\n", + "n = 100\n", + "x = np.random.rand(n)\n", + "y = np.exp(-x**2) + 1.5 * np.exp(-(x-2)**2)\n", + "\n", + "Maxpolydegree = 20\n", + "X = np.zeros((n,Maxpolydegree))\n", + "#We include explicitely the intercept column\n", + "for degree in range(Maxpolydegree):\n", + " X[:,degree] = x**degree\n", + "# We split the data in test and training data\n", + "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)\n", + "\n", + "p = Maxpolydegree\n", + "I = np.eye(p,p)\n", + "# Decide which values of lambda to use\n", + "nlambdas = 6\n", + "MSEOwnRidgePredict = np.zeros(nlambdas)\n", + "MSERidgePredict = np.zeros(nlambdas)\n", + "lambdas = np.logspace(-4, 2, nlambdas)\n", + "for i in range(nlambdas):\n", + " lmb = lambdas[i]\n", + " OwnRidgeTheta = np.linalg.pinv(X_train.T @ X_train+lmb*I) @ X_train.T @ y_train\n", + " # Note: we include the intercept column and no scaling\n", + " RegRidge = linear_model.Ridge(lmb,fit_intercept=False)\n", + " RegRidge.fit(X_train,y_train)\n", + " # and then make the prediction\n", + " ytildeOwnRidge = X_train @ OwnRidgeTheta\n", + " ypredictOwnRidge = X_test @ OwnRidgeTheta\n", + " ytildeRidge = RegRidge.predict(X_train)\n", + " ypredictRidge = RegRidge.predict(X_test)\n", + " MSEOwnRidgePredict[i] = MSE(y_test,ypredictOwnRidge)\n", + " MSERidgePredict[i] = MSE(y_test,ypredictRidge)\n", + " print(\"Theta values for own Ridge implementation\")\n", + " print(OwnRidgeTheta)\n", + " print(\"Theta values for Scikit-Learn Ridge implementation\")\n", + " print(RegRidge.coef_)\n", + " print(\"MSE values for own Ridge implementation\")\n", + " print(MSEOwnRidgePredict[i])\n", + " print(\"MSE values for Scikit-Learn Ridge implementation\")\n", + " print(MSERidgePredict[i])\n", + "\n", + "# Now plot the results\n", + "plt.figure()\n", + "plt.plot(np.log10(lambdas), MSEOwnRidgePredict, 'r', label = 'MSE own Ridge Test')\n", + "plt.plot(np.log10(lambdas), MSERidgePredict, 'g', label = 'MSE Ridge Test')\n", + "\n", + "plt.xlabel('log10(lambda)')\n", + "plt.ylabel('MSE')\n", + "plt.legend()\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "1aa5ca37", + "metadata": { + "editable": true + }, + "source": [ + "The results here agree when we force **Scikit-Learn**'s Ridge function to include the first column in our design matrix.\n", + "We see that the results agree very well. Here we have thus explicitely included the intercept column in the design matrix.\n", + "What happens if we do not include the intercept in our fit?\n", + "Let us see how we can change this code by zero centering." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 21, + "id": "a731e32c", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import numpy as np\n", + "import pandas as pd\n", + "import matplotlib.pyplot as plt\n", + "from sklearn.model_selection import train_test_split\n", + "from sklearn import linear_model\n", + "from sklearn.preprocessing import StandardScaler\n", + "\n", + "def MSE(y_data,y_model):\n", + " n = np.size(y_model)\n", + " return np.sum((y_data-y_model)**2)/n\n", + "# A seed just to ensure that the random numbers are the same for every run.\n", + "# Useful for eventual debugging.\n", + "np.random.seed(315)\n", + "\n", + "n = 100\n", + "x = np.random.rand(n)\n", + "y = np.exp(-x**2) + 1.5 * np.exp(-(x-2)**2)\n", + "\n", + "Maxpolydegree = 20\n", + "X = np.zeros((n,Maxpolydegree-1))\n", + "\n", + "for degree in range(1,Maxpolydegree): #No intercept column\n", + " X[:,degree-1] = x**(degree)\n", + "\n", + "# We split the data in test and training data\n", + "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)\n", + "\n", + "#For our own implementation, we will need to deal with the intercept by centering the design matrix and the target variable\n", + "X_train_mean = np.mean(X_train,axis=0)\n", + "#Center by removing mean from each feature\n", + "X_train_scaled = X_train - X_train_mean \n", + "X_test_scaled = X_test - X_train_mean\n", + "#The model intercept (called y_scaler) is given by the mean of the target variable (IF X is centered)\n", + "#Remove the intercept from the training data.\n", + "y_scaler = np.mean(y_train) \n", + "y_train_scaled = y_train - y_scaler \n", + "\n", + "p = Maxpolydegree-1\n", + "I = np.eye(p,p)\n", + "# Decide which values of lambda to use\n", + "nlambdas = 6\n", + "MSEOwnRidgePredict = np.zeros(nlambdas)\n", + "MSERidgePredict = np.zeros(nlambdas)\n", + "\n", + "lambdas = np.logspace(-4, 2, nlambdas)\n", + "for i in range(nlambdas):\n", + " lmb = lambdas[i]\n", + " OwnRidgeTheta = np.linalg.pinv(X_train_scaled.T @ X_train_scaled+lmb*I) @ X_train_scaled.T @ (y_train_scaled)\n", + " intercept_ = y_scaler - X_train_mean@OwnRidgeTheta #The intercept can be shifted so the model can predict on uncentered data\n", + " #Add intercept to prediction\n", + " ypredictOwnRidge = X_test_scaled @ OwnRidgeTheta + y_scaler \n", + " RegRidge = linear_model.Ridge(lmb)\n", + " RegRidge.fit(X_train,y_train)\n", + " ypredictRidge = RegRidge.predict(X_test)\n", + " MSEOwnRidgePredict[i] = MSE(y_test,ypredictOwnRidge)\n", + " MSERidgePredict[i] = MSE(y_test,ypredictRidge)\n", + " print(\"Theta values for own Ridge implementation\")\n", + " print(OwnRidgeTheta) #Intercept is given by mean of target variable\n", + " print(\"Theta values for Scikit-Learn Ridge implementation\")\n", + " print(RegRidge.coef_)\n", + " print('Intercept from own implementation:')\n", + " print(intercept_)\n", + " print('Intercept from Scikit-Learn Ridge implementation')\n", + " print(RegRidge.intercept_)\n", + " print(\"MSE values for own Ridge implementation\")\n", + " print(MSEOwnRidgePredict[i])\n", + " print(\"MSE values for Scikit-Learn Ridge implementation\")\n", + " print(MSERidgePredict[i])\n", + "\n", + "\n", + "# Now plot the results\n", + "plt.figure()\n", + "plt.plot(np.log10(lambdas), MSEOwnRidgePredict, 'b--', label = 'MSE own Ridge Test')\n", + "plt.plot(np.log10(lambdas), MSERidgePredict, 'g--', label = 'MSE SL Ridge Test')\n", + "plt.xlabel('log10(lambda)')\n", + "plt.ylabel('MSE')\n", + "plt.legend()\n", 
+ "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "6ea197d8", + "metadata": { + "editable": true + }, + "source": [ + "We see here, when compared to the code which includes explicitely the\n", + "intercept column, that our MSE value is actually smaller. This is\n", + "because the regularization term does not include the intercept value\n", + "$\\theta_0$ in the fitting. This applies to Lasso regularization as\n", + "well. It means that our optimization is now done only with the\n", + "centered matrix and/or vector that enter the fitting procedure." + ] + } + ], + "metadata": {}, + "nbformat": 4, + "nbformat_minor": 5 +} \ No newline at end of file diff --git a/doc/LectureNotes/_build/jupyter_execute/week38.ipynb b/doc/LectureNotes/_build/jupyter_execute/week38.ipynb new file mode 100644 index 000000000..c9b413443 --- /dev/null +++ b/doc/LectureNotes/_build/jupyter_execute/week38.ipynb @@ -0,0 +1,2283 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "8f27372d", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "fff8ca30", + "metadata": { + "editable": true + }, + "source": [ + "# Week 38: Statistical analysis, bias-variance tradeoff and resampling methods\n", + "**Morten Hjorth-Jensen**, Department of Physics and Center for Computing in Science Education, University of Oslo, Norway\n", + "\n", + "Date: **September 15-19, 2025**" + ] + }, + { + "cell_type": "markdown", + "id": "7ee7e714", + "metadata": { + "editable": true + }, + "source": [ + "## Plans for week 38, lecture Monday September 15\n", + "\n", + "**Material for the lecture on Monday September 15.**\n", + "\n", + "1. Statistical interpretation of OLS and various expectation values\n", + "\n", + "2. Resampling techniques, Bootstrap and cross validation and bias-variance tradeoff\n", + "\n", + "3. The material we did not cover last week, that is on more advanced methods for updating the learning rate, are covered by its own video. We will briefly discuss these topics at the beginning of the lecture and during the lab sessions. See video on ADAgrad, RMSprop and ADAM (material from last week not covered during lecture) at \n", + "\n", + "4. [Video of Lecture](https://youtu.be/4Fo7ITVA7V4)\n", + "\n", + "5. [Whiteboard notes](https://github.com/CompPhysics/MachineLearning/blob/master/doc/HandWrittenNotes/2025/FYSSTKweek38.pdf)" + ] + }, + { + "cell_type": "markdown", + "id": "3b5ac440", + "metadata": { + "editable": true + }, + "source": [ + "## Readings and Videos\n", + "1. Raschka et al, pages 175-192\n", + "\n", + "2. Hastie et al Chapter 7, here we recommend 7.1-7.5 and 7.10 (cross-validation) and 7.11 (bootstrap). See .\n", + "\n", + "3. [Video on bias-variance tradeoff](https://www.youtube.com/watch?v=EuBBz3bI-aA)\n", + "\n", + "4. [Video on Bootstrapping](https://www.youtube.com/watch?v=Xz0x-8-cgaQ)\n", + "\n", + "5. [Video on cross validation](https://www.youtube.com/watch?v=fSytzGwwBVw)\n", + "\n", + "For the lab session, the following video on cross validation (from 2024), could be helpful, see " + ] + }, + { + "cell_type": "markdown", + "id": "6d5dba52", + "metadata": { + "editable": true + }, + "source": [ + "## Linking the regression analysis with a statistical interpretation\n", + "\n", + "We will now couple the discussions of ordinary least squares, Ridge\n", + "and Lasso regression with a statistical interpretation, that is we\n", + "move from a linear algebra analysis to a statistical analysis. 
In\n", + "particular, we will focus on what the regularization terms can result\n", + "in. We will amongst other things show that the regularization\n", + "parameter can reduce considerably the variance of the parameters\n", + "$\\theta$.\n", + "\n", + "On of the advantages of doing linear regression is that we actually end up with\n", + "analytical expressions for several statistical quantities. \n", + "Standard least squares and Ridge regression allow us to\n", + "derive quantities like the variance and other expectation values in a\n", + "rather straightforward way.\n", + "\n", + "It is assumed that $\\varepsilon_i\n", + "\\sim \\mathcal{N}(0, \\sigma^2)$ and the $\\varepsilon_{i}$ are\n", + "independent, i.e.:" + ] + }, + { + "cell_type": "markdown", + "id": "bfc2983a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{align*} \n", + "\\mbox{Cov}(\\varepsilon_{i_1},\n", + "\\varepsilon_{i_2}) & = \\left\\{ \\begin{array}{lcc} \\sigma^2 & \\mbox{if}\n", + "& i_1 = i_2, \\\\ 0 & \\mbox{if} & i_1 \\not= i_2. \\end{array} \\right.\n", + "\\end{align*}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "2b5f5980", + "metadata": { + "editable": true + }, + "source": [ + "The randomness of $\\varepsilon_i$ implies that\n", + "$\\mathbf{y}_i$ is also a random variable. In particular,\n", + "$\\mathbf{y}_i$ is normally distributed, because $\\varepsilon_i \\sim\n", + "\\mathcal{N}(0, \\sigma^2)$ and $\\mathbf{X}_{i,\\ast} \\, \\boldsymbol{\\theta}$ is a\n", + "non-random scalar. To specify the parameters of the distribution of\n", + "$\\mathbf{y}_i$ we need to calculate its first two moments. \n", + "\n", + "Recall that $\\boldsymbol{X}$ is a matrix of dimensionality $n\\times p$. The\n", + "notation above $\\mathbf{X}_{i,\\ast}$ means that we are looking at the\n", + "row number $i$ and perform a sum over all values $p$." 
+ ] + }, + { + "cell_type": "markdown", + "id": "3464c7e8", + "metadata": { + "editable": true + }, + "source": [ + "## Assumptions made\n", + "\n", + "The assumption we have made here can be summarized as (and this is going to be useful when we discuss the bias-variance trade off)\n", + "that there exists a function $f(\\boldsymbol{x})$ and a normal distributed error $\\boldsymbol{\\varepsilon}\\sim \\mathcal{N}(0, \\sigma^2)$\n", + "which describe our data" + ] + }, + { + "cell_type": "markdown", + "id": "ed0fd2df", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{y} = f(\\boldsymbol{x})+\\boldsymbol{\\varepsilon}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "feb9d4c2", + "metadata": { + "editable": true + }, + "source": [ + "We approximate this function with our model from the solution of the linear regression equations, that is our\n", + "function $f$ is approximated by $\\boldsymbol{\\tilde{y}}$ where we want to minimize $(\\boldsymbol{y}-\\boldsymbol{\\tilde{y}})^2$, our MSE, with" + ] + }, + { + "cell_type": "markdown", + "id": "eb6d71f8", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{\\tilde{y}} = \\boldsymbol{X}\\boldsymbol{\\theta}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "566399f6", + "metadata": { + "editable": true + }, + "source": [ + "## Expectation value and variance\n", + "\n", + "We can calculate the expectation value of $\\boldsymbol{y}$ for a given element $i$" + ] + }, + { + "cell_type": "markdown", + "id": "6b33f497", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{align*} \n", + "\\mathbb{E}(y_i) & =\n", + "\\mathbb{E}(\\mathbf{X}_{i, \\ast} \\, \\boldsymbol{\\theta}) + \\mathbb{E}(\\varepsilon_i)\n", + "\\, \\, \\, = \\, \\, \\, \\mathbf{X}_{i, \\ast} \\, \\theta, \n", + "\\end{align*}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "5f2f79f2", + "metadata": { + "editable": true + }, + "source": [ + "while\n", + "its variance is" + ] + }, + { + "cell_type": "markdown", + "id": "199121b0", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{align*} \\mbox{Var}(y_i) & = \\mathbb{E} \\{ [y_i\n", + "- \\mathbb{E}(y_i)]^2 \\} \\, \\, \\, = \\, \\, \\, \\mathbb{E} ( y_i^2 ) -\n", + "[\\mathbb{E}(y_i)]^2 \\\\ & = \\mathbb{E} [ ( \\mathbf{X}_{i, \\ast} \\,\n", + "\\theta + \\varepsilon_i )^2] - ( \\mathbf{X}_{i, \\ast} \\, \\boldsymbol{\\theta})^2 \\\\ &\n", + "= \\mathbb{E} [ ( \\mathbf{X}_{i, \\ast} \\, \\boldsymbol{\\theta})^2 + 2 \\varepsilon_i\n", + "\\mathbf{X}_{i, \\ast} \\, \\boldsymbol{\\theta} + \\varepsilon_i^2 ] - ( \\mathbf{X}_{i,\n", + "\\ast} \\, \\theta)^2 \\\\ & = ( \\mathbf{X}_{i, \\ast} \\, \\boldsymbol{\\theta})^2 + 2\n", + "\\mathbb{E}(\\varepsilon_i) \\mathbf{X}_{i, \\ast} \\, \\boldsymbol{\\theta} +\n", + "\\mathbb{E}(\\varepsilon_i^2 ) - ( \\mathbf{X}_{i, \\ast} \\, \\boldsymbol{\\theta})^2 \n", + "\\\\ & = \\mathbb{E}(\\varepsilon_i^2 ) \\, \\, \\, = \\, \\, \\,\n", + "\\mbox{Var}(\\varepsilon_i) \\, \\, \\, = \\, \\, \\, \\sigma^2. \n", + "\\end{align*}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "9a1cc529", + "metadata": { + "editable": true + }, + "source": [ + "Hence, $y_i \\sim \\mathcal{N}( \\mathbf{X}_{i, \\ast} \\, \\boldsymbol{\\theta}, \\sigma^2)$, that is $\\boldsymbol{y}$ follows a normal distribution with \n", + "mean value $\\boldsymbol{X}\\boldsymbol{\\theta}$ and variance $\\sigma^2$ (not be confused with the singular values of the SVD)." 
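These expectation values are easy to verify numerically. The small sketch below is an illustration added here, with an arbitrary design matrix, "true" parameters and $\sigma$ chosen only for the example: it simulates many realizations of $\boldsymbol{y} = \boldsymbol{X}\boldsymbol{\theta} + \boldsymbol{\varepsilon}$ and compares the sample mean and variance of a single component $y_i$ with $\mathbf{X}_{i,\ast}\boldsymbol{\theta}$ and $\sigma^2$.

```python
import numpy as np

rng = np.random.default_rng(2025)

n, p = 50, 3
sigma = 0.5
X = np.column_stack([np.ones(n), rng.random(n), rng.random(n)**2])  # arbitrary design matrix
theta = np.array([2.0, 3.0, 4.0])                                   # chosen "true" parameters

n_samples = 100_000
# Each row is one realization of y = X theta + eps with iid N(0, sigma^2) noise
Y = X @ theta + sigma * rng.standard_normal((n_samples, n))

i = 10  # look at one arbitrary component y_i
print("Sample mean of y_i    :", Y[:, i].mean(), "  expected:", X[i, :] @ theta)
print("Sample variance of y_i:", Y[:, i].var(),  "  expected:", sigma**2)
```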
+ ] + }, + { + "cell_type": "markdown", + "id": "149e63be", + "metadata": { + "editable": true + }, + "source": [ + "## Expectation value and variance for $\\boldsymbol{\\theta}$\n", + "\n", + "With the OLS expressions for the optimal parameters $\\boldsymbol{\\hat{\\theta}}$ we can evaluate the expectation value" + ] + }, + { + "cell_type": "markdown", + "id": "6a6fb04a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathbb{E}(\\boldsymbol{\\hat{\\theta}}) = \\mathbb{E}[ (\\mathbf{X}^{\\top} \\mathbf{X})^{-1}\\mathbf{X}^{T} \\mathbf{Y}]=(\\mathbf{X}^{T} \\mathbf{X})^{-1}\\mathbf{X}^{T} \\mathbb{E}[ \\mathbf{Y}]=(\\mathbf{X}^{T} \\mathbf{X})^{-1} \\mathbf{X}^{T}\\mathbf{X}\\boldsymbol{\\theta}=\\boldsymbol{\\theta}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "79420d06", + "metadata": { + "editable": true + }, + "source": [ + "This means that the estimator of the regression parameters is unbiased.\n", + "\n", + "We can also calculate the variance\n", + "\n", + "The variance of the optimal value $\\boldsymbol{\\hat{\\theta}}$ is" + ] + }, + { + "cell_type": "markdown", + "id": "0e3de992", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{eqnarray*}\n", + "\\mbox{Var}(\\boldsymbol{\\hat{\\theta}}) & = & \\mathbb{E} \\{ [\\boldsymbol{\\theta} - \\mathbb{E}(\\boldsymbol{\\theta})] [\\boldsymbol{\\theta} - \\mathbb{E}(\\boldsymbol{\\theta})]^{T} \\}\n", + "\\\\\n", + "& = & \\mathbb{E} \\{ [(\\mathbf{X}^{T} \\mathbf{X})^{-1} \\, \\mathbf{X}^{T} \\mathbf{Y} - \\boldsymbol{\\theta}] \\, [(\\mathbf{X}^{T} \\mathbf{X})^{-1} \\, \\mathbf{X}^{T} \\mathbf{Y} - \\boldsymbol{\\theta}]^{T} \\}\n", + "\\\\\n", + "% & = & \\mathbb{E} \\{ [(\\mathbf{X}^{T} \\mathbf{X})^{-1} \\, \\mathbf{X}^{T} \\mathbf{Y}] \\, [(\\mathbf{X}^{T} \\mathbf{X})^{-1} \\, \\mathbf{X}^{T} \\mathbf{Y}]^{T} \\} - \\boldsymbol{\\theta} \\, \\boldsymbol{\\theta}^{T}\n", + "% \\\\\n", + "% & = & \\mathbb{E} \\{ (\\mathbf{X}^{T} \\mathbf{X})^{-1} \\, \\mathbf{X}^{T} \\mathbf{Y} \\, \\mathbf{Y}^{T} \\, \\mathbf{X} \\, (\\mathbf{X}^{T} \\mathbf{X})^{-1} \\} - \\boldsymbol{\\theta} \\, \\boldsymbol{\\theta}^{T}\n", + "% \\\\\n", + "& = & (\\mathbf{X}^{T} \\mathbf{X})^{-1} \\, \\mathbf{X}^{T} \\, \\mathbb{E} \\{ \\mathbf{Y} \\, \\mathbf{Y}^{T} \\} \\, \\mathbf{X} \\, (\\mathbf{X}^{T} \\mathbf{X})^{-1} - \\boldsymbol{\\theta} \\, \\boldsymbol{\\theta}^{T}\n", + "\\\\\n", + "& = & (\\mathbf{X}^{T} \\mathbf{X})^{-1} \\, \\mathbf{X}^{T} \\, \\{ \\mathbf{X} \\, \\boldsymbol{\\theta} \\, \\boldsymbol{\\theta}^{T} \\, \\mathbf{X}^{T} + \\sigma^2 \\} \\, \\mathbf{X} \\, (\\mathbf{X}^{T} \\mathbf{X})^{-1} - \\boldsymbol{\\theta} \\, \\boldsymbol{\\theta}^{T}\n", + "% \\\\\n", + "% & = & (\\mathbf{X}^T \\mathbf{X})^{-1} \\, \\mathbf{X}^T \\, \\mathbf{X} \\, \\boldsymbol{\\theta} \\, \\boldsymbol{\\theta}^T \\, \\mathbf{X}^T \\, \\mathbf{X} \\, (\\mathbf{X}^T % \\mathbf{X})^{-1}\n", + "% \\\\\n", + "% & & + \\, \\, \\sigma^2 \\, (\\mathbf{X}^T \\mathbf{X})^{-1} \\, \\mathbf{X}^T \\, \\mathbf{X} \\, (\\mathbf{X}^T \\mathbf{X})^{-1} - \\boldsymbol{\\theta} \\boldsymbol{\\theta}^T\n", + "\\\\\n", + "& = & \\boldsymbol{\\theta} \\, \\boldsymbol{\\theta}^{T} + \\sigma^2 \\, (\\mathbf{X}^{T} \\mathbf{X})^{-1} - \\boldsymbol{\\theta} \\, \\boldsymbol{\\theta}^{T}\n", + "\\, \\, \\, = \\, \\, \\, \\sigma^2 \\, (\\mathbf{X}^{T} \\mathbf{X})^{-1},\n", + "\\end{eqnarray*}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "d3ea2897", + "metadata": { + "editable": true + }, + "source": [ + 
"where we have used that $\\mathbb{E} (\\mathbf{Y} \\mathbf{Y}^{T}) =\n", + "\\mathbf{X} \\, \\boldsymbol{\\theta} \\, \\boldsymbol{\\theta}^{T} \\, \\mathbf{X}^{T} +\n", + "\\sigma^2 \\, \\mathbf{I}_{nn}$. From $\\mbox{Var}(\\boldsymbol{\\theta}) = \\sigma^2\n", + "\\, (\\mathbf{X}^{T} \\mathbf{X})^{-1}$, one obtains an estimate of the\n", + "variance of the estimate of the $j$-th regression coefficient:\n", + "$\\boldsymbol{\\sigma}^2 (\\boldsymbol{\\theta}_j ) = \\boldsymbol{\\sigma}^2 [(\\mathbf{X}^{T} \\mathbf{X})^{-1}]_{jj} $. This may be used to\n", + "construct a confidence interval for the estimates.\n", + "\n", + "In a similar way, we can obtain analytical expressions for say the\n", + "expectation values of the parameters $\\boldsymbol{\\theta}$ and their variance\n", + "when we employ Ridge regression, allowing us again to define a confidence interval. \n", + "\n", + "It is rather straightforward to show that" + ] + }, + { + "cell_type": "markdown", + "id": "da5e3927", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathbb{E} \\big[ \\boldsymbol{\\theta}^{\\mathrm{Ridge}} \\big]=(\\mathbf{X}^{T} \\mathbf{X} + \\lambda \\mathbf{I}_{pp})^{-1} (\\mathbf{X}^{\\top} \\mathbf{X})\\boldsymbol{\\theta}^{\\mathrm{OLS}}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "7ab5488b", + "metadata": { + "editable": true + }, + "source": [ + "We see clearly that \n", + "$\\mathbb{E} \\big[ \\boldsymbol{\\theta}^{\\mathrm{Ridge}} \\big] \\not= \\boldsymbol{\\theta}^{\\mathrm{OLS}}$ for any $\\lambda > 0$. We say then that the ridge estimator is biased.\n", + "\n", + "We can also compute the variance as" + ] + }, + { + "cell_type": "markdown", + "id": "f904a739", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mbox{Var}[\\boldsymbol{\\theta}^{\\mathrm{Ridge}}]=\\sigma^2[ \\mathbf{X}^{T} \\mathbf{X} + \\lambda \\mathbf{I} ]^{-1} \\mathbf{X}^{T} \\mathbf{X} \\{ [ \\mathbf{X}^{\\top} \\mathbf{X} + \\lambda \\mathbf{I} ]^{-1}\\}^{T},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "10fd648b", + "metadata": { + "editable": true + }, + "source": [ + "and it is easy to see that if the parameter $\\lambda$ goes to infinity then the variance of Ridge parameters $\\boldsymbol{\\theta}$ goes to zero. \n", + "\n", + "With this, we can compute the difference" + ] + }, + { + "cell_type": "markdown", + "id": "4812c2a4", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mbox{Var}[\\boldsymbol{\\theta}^{\\mathrm{OLS}}]-\\mbox{Var}(\\boldsymbol{\\theta}^{\\mathrm{Ridge}})=\\sigma^2 [ \\mathbf{X}^{T} \\mathbf{X} + \\lambda \\mathbf{I} ]^{-1}[ 2\\lambda\\mathbf{I} + \\lambda^2 (\\mathbf{X}^{T} \\mathbf{X})^{-1} ] \\{ [ \\mathbf{X}^{T} \\mathbf{X} + \\lambda \\mathbf{I} ]^{-1}\\}^{T}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "199d8531", + "metadata": { + "editable": true + }, + "source": [ + "The difference is non-negative definite since each component of the\n", + "matrix product is non-negative definite. \n", + "This means the variance we obtain with the standard OLS will always for $\\lambda > 0$ be larger than the variance of $\\boldsymbol{\\theta}$ obtained with the Ridge estimator. This has interesting consequences when we discuss the so-called bias-variance trade-off below." 
+ ] + }, + { + "cell_type": "markdown", + "id": "96c16676", + "metadata": { + "editable": true + }, + "source": [ + "## Deriving OLS from a probability distribution\n", + "\n", + "Our basic assumption when we derived the OLS equations was to assume\n", + "that our output is determined by a given continuous function\n", + "$f(\\boldsymbol{x})$ and a random noise $\\boldsymbol{\\epsilon}$ given by the normal\n", + "distribution with zero mean value and an undetermined variance\n", + "$\\sigma^2$.\n", + "\n", + "We found above that the outputs $\\boldsymbol{y}$ have a mean value given by\n", + "$\\boldsymbol{X}\\hat{\\boldsymbol{\\theta}}$ and variance $\\sigma^2$. Since the entries to\n", + "the design matrix are not stochastic variables, we can assume that the\n", + "probability distribution of our targets is also a normal distribution\n", + "but now with mean value $\\boldsymbol{X}\\hat{\\boldsymbol{\\theta}}$. This means that a\n", + "single output $y_i$ is given by the Gaussian distribution" + ] + }, + { + "cell_type": "markdown", + "id": "a2a1a004", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y_i\\sim \\mathcal{N}(\\boldsymbol{X}_{i,*}\\boldsymbol{\\theta}, \\sigma^2)=\\frac{1}{\\sqrt{2\\pi\\sigma^2}}\\exp{\\left[-\\frac{(y_i-\\boldsymbol{X}_{i,*}\\boldsymbol{\\theta})^2}{2\\sigma^2}\\right]}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "5aad445b", + "metadata": { + "editable": true + }, + "source": [ + "## Independent and Identically Distributed (iid)\n", + "\n", + "We assume now that the various $y_i$ values are stochastically distributed according to the above Gaussian distribution. \n", + "We define this distribution as" + ] + }, + { + "cell_type": "markdown", + "id": "d197c8bb", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "p(y_i, \\boldsymbol{X}\\vert\\boldsymbol{\\theta})=\\frac{1}{\\sqrt{2\\pi\\sigma^2}}\\exp{\\left[-\\frac{(y_i-\\boldsymbol{X}_{i,*}\\boldsymbol{\\theta})^2}{2\\sigma^2}\\right]},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "e2e7462f", + "metadata": { + "editable": true + }, + "source": [ + "which reads as finding the likelihood of an event $y_i$ with the input variables $\\boldsymbol{X}$ given the parameters (to be determined) $\\boldsymbol{\\theta}$.\n", + "\n", + "Since these events are assumed to be independent and identicall distributed we can build the probability distribution function (PDF) for all possible event $\\boldsymbol{y}$ as the product of the single events, that is we have" + ] + }, + { + "cell_type": "markdown", + "id": "eb635d3d", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "p(\\boldsymbol{y},\\boldsymbol{X}\\vert\\boldsymbol{\\theta})=\\prod_{i=0}^{n-1}\\frac{1}{\\sqrt{2\\pi\\sigma^2}}\\exp{\\left[-\\frac{(y_i-\\boldsymbol{X}_{i,*}\\boldsymbol{\\theta})^2}{2\\sigma^2}\\right]}=\\prod_{i=0}^{n-1}p(y_i,\\boldsymbol{X}\\vert\\boldsymbol{\\theta}).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "445ed13e", + "metadata": { + "editable": true + }, + "source": [ + "We will write this in a more compact form reserving $\\boldsymbol{D}$ for the domain of events, including the ouputs (targets) and the inputs. 
That is\n", + "in case we have a simple one-dimensional input and output case" + ] + }, + { + "cell_type": "markdown", + "id": "319bfc6c", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{D}=[(x_0,y_0), (x_1,y_1),\\dots, (x_{n-1},y_{n-1})].\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "90abf35a", + "metadata": { + "editable": true + }, + "source": [ + "In the more general case the various inputs should be replaced by the possible features represented by the input data set $\\boldsymbol{X}$. \n", + "We can now rewrite the above probability as" + ] + }, + { + "cell_type": "markdown", + "id": "04b66fbd", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "p(\\boldsymbol{D}\\vert\\boldsymbol{\\theta})=\\prod_{i=0}^{n-1}\\frac{1}{\\sqrt{2\\pi\\sigma^2}}\\exp{\\left[-\\frac{(y_i-\\boldsymbol{X}_{i,*}\\boldsymbol{\\theta})^2}{2\\sigma^2}\\right]}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "4a27b5a7", + "metadata": { + "editable": true + }, + "source": [ + "It is a conditional probability (see below) and reads as the likelihood of a domain of events $\\boldsymbol{D}$ given a set of parameters $\\boldsymbol{\\theta}$." + ] + }, + { + "cell_type": "markdown", + "id": "8d12543f", + "metadata": { + "editable": true + }, + "source": [ + "## Maximum Likelihood Estimation (MLE)\n", + "\n", + "In statistics, maximum likelihood estimation (MLE) is a method of\n", + "estimating the parameters of an assumed probability distribution,\n", + "given some observed data. This is achieved by maximizing a likelihood\n", + "function so that, under the assumed statistical model, the observed\n", + "data is the most probable. \n", + "\n", + "We will assume here that our events are given by the above Gaussian\n", + "distribution and we will determine the optimal parameters $\\theta$ by\n", + "maximizing the above PDF. However, computing the derivatives of a\n", + "product function is cumbersome and can easily lead to overflow and/or\n", + "underflowproblems, with potentials for loss of numerical precision.\n", + "\n", + "In practice, it is more convenient to maximize the logarithm of the\n", + "PDF because it is a monotonically increasing function of the argument.\n", + "Alternatively, and this will be our option, we will minimize the\n", + "negative of the logarithm since this is a monotonically decreasing\n", + "function.\n", + "\n", + "Note also that maximization/minimization of the logarithm of the PDF\n", + "is equivalent to the maximization/minimization of the function itself." 
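The next section derives this result analytically. As a purely numerical illustration of the same point, the sketch below (with an arbitrary data set, $\sigma^2$ fixed to one, and `scipy.optimize.minimize` used only for convenience; none of these choices come from the notes) minimizes the negative logarithm of the Gaussian likelihood over $\boldsymbol{\theta}$ and compares the result with the analytical OLS expression.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2025)

n = 100
x = rng.random(n)
y = 2.0 + 3.0 * x + 4.0 * x**2 + 0.1 * rng.standard_normal(n)   # assumed example data
X = np.column_stack([np.ones(n), x, x**2])

sigma2 = 1.0  # any fixed sigma^2 > 0 gives the same minimizer in theta

def neg_log_likelihood(theta):
    # Negative logarithm of the product of Gaussians p(y_i, X | theta)
    residuals = y - X @ theta
    return 0.5 * n * np.log(2 * np.pi * sigma2) + residuals @ residuals / (2 * sigma2)

theta_mle = minimize(neg_log_likelihood, x0=np.zeros(X.shape[1])).x
theta_ols = np.linalg.pinv(X.T @ X) @ X.T @ y

print("theta from minimizing the negative log-likelihood:", theta_mle)
print("theta from the analytical OLS expression:         ", theta_ols)
```

The two sets of parameters agree to numerical precision, which is exactly the statement made analytically below: maximizing the Gaussian likelihood (or minimizing its negative logarithm) reproduces the ordinary least squares solution.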
+ ] + }, + { + "cell_type": "markdown", + "id": "2e5cd118", + "metadata": { + "editable": true + }, + "source": [ + "## A new Cost Function\n", + "\n", + "We could now define a new cost function to minimize, namely the negative logarithm of the above PDF" + ] + }, + { + "cell_type": "markdown", + "id": "c71a5edf", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C(\\boldsymbol{\\theta})=-\\log{\\prod_{i=0}^{n-1}p(y_i,\\boldsymbol{X}\\vert\\boldsymbol{\\theta})}=-\\sum_{i=0}^{n-1}\\log{p(y_i,\\boldsymbol{X}\\vert\\boldsymbol{\\theta})},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "e663bf2e", + "metadata": { + "editable": true + }, + "source": [ + "which becomes" + ] + }, + { + "cell_type": "markdown", + "id": "c4bc4873", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C(\\boldsymbol{\\theta})=\\frac{n}{2}\\log{2\\pi\\sigma^2}+\\frac{\\vert\\vert (\\boldsymbol{y}-\\boldsymbol{X}\\boldsymbol{\\theta})\\vert\\vert_2^2}{2\\sigma^2}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "f5bc59b8", + "metadata": { + "editable": true + }, + "source": [ + "Taking the derivative of the *new* cost function with respect to the parameters $\\theta$ we recognize our familiar OLS equation, namely" + ] + }, + { + "cell_type": "markdown", + "id": "4f6ddf4a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{X}^T\\left(\\boldsymbol{y}-\\boldsymbol{X}\\boldsymbol{\\theta}\\right) =0,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "afda0a6b", + "metadata": { + "editable": true + }, + "source": [ + "which leads to the well-known OLS equation for the optimal paramters $\\theta$" + ] + }, + { + "cell_type": "markdown", + "id": "b5335dc0", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\hat{\\boldsymbol{\\theta}}^{\\mathrm{OLS}}=\\left(\\boldsymbol{X}^T\\boldsymbol{X}\\right)^{-1}\\boldsymbol{X}^T\\boldsymbol{y}!\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "4f86a52d", + "metadata": { + "editable": true + }, + "source": [ + "Next week we will make a similar analysis for Ridge and Lasso regression" + ] + }, + { + "cell_type": "markdown", + "id": "5cdb1767", + "metadata": { + "editable": true + }, + "source": [ + "## Why resampling methods\n", + "\n", + "Before we proceed, we need to rethink what we have been doing. In our\n", + "eager to fit the data, we have omitted several important elements in\n", + "our regression analysis. In what follows we will\n", + "1. look at statistical properties, including a discussion of mean values, variance and the so-called bias-variance tradeoff\n", + "\n", + "2. introduce resampling techniques like cross-validation, bootstrapping and jackknife and more\n", + "\n", + "and discuss how to select a given model (one of the difficult parts in machine learning)." + ] + }, + { + "cell_type": "markdown", + "id": "69435d77", + "metadata": { + "editable": true + }, + "source": [ + "## Resampling methods\n", + "Resampling methods are an indispensable tool in modern\n", + "statistics. They involve repeatedly drawing samples from a training\n", + "set and refitting a model of interest on each sample in order to\n", + "obtain additional information about the fitted model. For example, in\n", + "order to estimate the variability of a linear regression fit, we can\n", + "repeatedly draw different samples from the training data, fit a linear\n", + "regression to each new sample, and then examine the extent to which\n", + "the resulting fits differ. 
Such an approach may allow us to obtain\n", + "information that would not be available from fitting the model only\n", + "once using the original training sample.\n", + "\n", + "Two resampling methods are often used in Machine Learning analyses,\n", + "1. The **bootstrap method**\n", + "\n", + "2. and **Cross-Validation**\n", + "\n", + "In addition there are several other methods such as the Jackknife and the Blocking methods. We will discuss in particular\n", + "cross-validation and the bootstrap method." + ] + }, + { + "cell_type": "markdown", + "id": "cefbb559", + "metadata": { + "editable": true + }, + "source": [ + "## Resampling approaches can be computationally expensive\n", + "\n", + "Resampling approaches can be computationally expensive, because they\n", + "involve fitting the same statistical method multiple times using\n", + "different subsets of the training data. However, due to recent\n", + "advances in computing power, the computational requirements of\n", + "resampling methods generally are not prohibitive. In this chapter, we\n", + "discuss two of the most commonly used resampling methods,\n", + "cross-validation and the bootstrap. Both methods are important tools\n", + "in the practical application of many statistical learning\n", + "procedures. For example, cross-validation can be used to estimate the\n", + "test error associated with a given statistical learning method in\n", + "order to evaluate its performance, or to select the appropriate level\n", + "of flexibility. The process of evaluating a model’s performance is\n", + "known as model assessment, whereas the process of selecting the proper\n", + "level of flexibility for a model is known as model selection. The\n", + "bootstrap is widely used." + ] + }, + { + "cell_type": "markdown", + "id": "2659401a", + "metadata": { + "editable": true + }, + "source": [ + "## Why resampling methods ?\n", + "**Statistical analysis.**\n", + "\n", + "* Our simulations can be treated as *computer experiments*. This is particularly the case for Monte Carlo methods which are widely used in statistical analyses.\n", + "\n", + "* The results can be analysed with the same statistical tools as we would use when analysing experimental data.\n", + "\n", + "* As in all experiments, we are looking for expectation values and an estimate of how accurate they are, i.e., possible sources for errors." + ] + }, + { + "cell_type": "markdown", + "id": "4d5d7748", + "metadata": { + "editable": true + }, + "source": [ + "## Statistical analysis\n", + "\n", + "* As in other experiments, many numerical experiments have two classes of errors:\n", + "\n", + " * Statistical errors\n", + "\n", + " * Systematical errors\n", + "\n", + "* Statistical errors can be estimated using standard tools from statistics\n", + "\n", + "* Systematical errors are method specific and must be treated differently from case to case." + ] + }, + { + "cell_type": "markdown", + "id": "54df92b3", + "metadata": { + "editable": true + }, + "source": [ + "## Resampling methods\n", + "\n", + "With all these analytical equations for both the OLS and Ridge\n", + "regression, we will now outline how to assess a given model. This will\n", + "lead to a discussion of the so-called bias-variance tradeoff (see\n", + "below) and so-called resampling methods.\n", + "\n", + "One of the quantities we have discussed as a way to measure errors is\n", + "the mean-squared error (MSE), mainly used for fitting of continuous\n", + "functions. 
Another choice is the absolute error.\n", + "\n", + "In the discussions below we will focus on the MSE and in particular since we will split the data into test and training data,\n", + "we discuss the\n", + "1. prediction error or simply the **test error** $\\mathrm{Err_{Test}}$, where we have a fixed training set and the test error is the MSE arising from the data reserved for testing. We discuss also the \n", + "\n", + "2. training error $\\mathrm{Err_{Train}}$, which is the average loss over the training data.\n", + "\n", + "As our model becomes more and more complex, more of the training data tends to used. The training may thence adapt to more complicated structures in the data. This may lead to a decrease in the bias (see below for code example) and a slight increase of the variance for the test error.\n", + "For a certain level of complexity the test error will reach minimum, before starting to increase again. The\n", + "training error reaches a saturation." + ] + }, + { + "cell_type": "markdown", + "id": "5b1a1390", + "metadata": { + "editable": true + }, + "source": [ + "## Resampling methods: Bootstrap\n", + "Bootstrapping is a [non-parametric approach](https://en.wikipedia.org/wiki/Nonparametric_statistics) to statistical inference\n", + "that substitutes computation for more traditional distributional\n", + "assumptions and asymptotic results. Bootstrapping offers a number of\n", + "advantages: \n", + "1. The bootstrap is quite general, although there are some cases in which it fails. \n", + "\n", + "2. Because it does not require distributional assumptions (such as normally distributed errors), the bootstrap can provide more accurate inferences when the data are not well behaved or when the sample size is small. \n", + "\n", + "3. It is possible to apply the bootstrap to statistics with sampling distributions that are difficult to derive, even asymptotically. \n", + "\n", + "4. It is relatively simple to apply the bootstrap to complex data-collection plans (such as stratified and clustered samples).\n", + "\n", + "The textbook by [Davison on the Bootstrap Methods and their Applications](https://www.cambridge.org/core/books/bootstrap-methods-and-their-application/ED2FD043579F27952363566DC09CBD6A) provides many more insights and proofs. In this course we will take a more practical approach and use the results and theorems provided in the literature. For those interested in reading more about the bootstrap methods, we recommend the above text and the one by [Efron and Tibshirani](https://www.routledge.com/An-Introduction-to-the-Bootstrap/Efron-Tibshirani/p/book/9780412042317).\n", + "\n", + "Before we proceed however, we need to remind ourselves about a central theorem in statistics, namely the so-called **central limit theorem**." + ] + }, + { + "cell_type": "markdown", + "id": "39f233e4", + "metadata": { + "editable": true + }, + "source": [ + "## The Central Limit Theorem\n", + "\n", + "Suppose we have a PDF $p(x)$ from which we generate a series $N$\n", + "of averages $\\mathbb{E}[x_i]$. Each mean value $\\mathbb{E}[x_i]$\n", + "is viewed as the average of a specific measurement, e.g., throwing \n", + "dice 100 times and then taking the average value, or producing a certain\n", + "amount of random numbers. \n", + "For notational ease, we set $\\mathbb{E}[x_i]=x_i$ in the discussion\n", + "which follows. 
We do the same for $\\mathbb{E}[z]=z$.\n", + "\n", + "If we compute the mean $z$ of $m$ such mean values $x_i$" + ] + }, + { + "cell_type": "markdown", + "id": "361320d8", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "z=\\frac{x_1+x_2+\\dots+x_m}{m},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "a363db1e", + "metadata": { + "editable": true + }, + "source": [ + "the question we pose is which is the PDF of the new variable $z$." + ] + }, + { + "cell_type": "markdown", + "id": "92967efc", + "metadata": { + "editable": true + }, + "source": [ + "## Finding the Limit\n", + "\n", + "The probability of obtaining an average value $z$ is the product of the \n", + "probabilities of obtaining arbitrary individual mean values $x_i$,\n", + "but with the constraint that the average is $z$. We can express this through\n", + "the following expression" + ] + }, + { + "cell_type": "markdown", + "id": "1bffca97", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\tilde{p}(z)=\\int dx_1p(x_1)\\int dx_2p(x_2)\\dots\\int dx_mp(x_m)\n", + " \\delta(z-\\frac{x_1+x_2+\\dots+x_m}{m}),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "0dacb6fc", + "metadata": { + "editable": true + }, + "source": [ + "where the $\\delta$-function enbodies the constraint that the mean is $z$.\n", + "All measurements that lead to each individual $x_i$ are expected to\n", + "be independent, which in turn means that we can express $\\tilde{p}$ as the \n", + "product of individual $p(x_i)$. The independence assumption is important in the derivation of the central limit theorem." + ] + }, + { + "cell_type": "markdown", + "id": "baeedf81", + "metadata": { + "editable": true + }, + "source": [ + "## Rewriting the $\\delta$-function\n", + "\n", + "If we use the integral expression for the $\\delta$-function" + ] + }, + { + "cell_type": "markdown", + "id": "20cc7770", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta(z-\\frac{x_1+x_2+\\dots+x_m}{m})=\\frac{1}{2\\pi}\\int_{-\\infty}^{\\infty}\n", + " dq\\exp{\\left(iq(z-\\frac{x_1+x_2+\\dots+x_m}{m})\\right)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "f67d3b94", + "metadata": { + "editable": true + }, + "source": [ + "and inserting $e^{i\\mu q-i\\mu q}$ where $\\mu$ is the mean value\n", + "we arrive at" + ] + }, + { + "cell_type": "markdown", + "id": "17f59fb6", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\tilde{p}(z)=\\frac{1}{2\\pi}\\int_{-\\infty}^{\\infty}\n", + " dq\\exp{\\left(iq(z-\\mu)\\right)}\\left[\\int_{-\\infty}^{\\infty}\n", + " dxp(x)\\exp{\\left(iq(\\mu-x)/m\\right)}\\right]^m,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "5f899fbe", + "metadata": { + "editable": true + }, + "source": [ + "with the integral over $x$ resulting in" + ] + }, + { + "cell_type": "markdown", + "id": "19a1f5bb", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\int_{-\\infty}^{\\infty}dxp(x)\\exp{\\left(iq(\\mu-x)/m\\right)}=\n", + " \\int_{-\\infty}^{\\infty}dxp(x)\n", + " \\left[1+\\frac{iq(\\mu-x)}{m}-\\frac{q^2(\\mu-x)^2}{2m^2}+\\dots\\right].\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "1db8fcf2", + "metadata": { + "editable": true + }, + "source": [ + "## Identifying Terms\n", + "\n", + "The second term on the rhs disappears since this is just the mean and \n", + "employing the definition of $\\sigma^2$ we have" + ] + }, + { + "cell_type": "markdown", + "id": "bfadf7e5", + "metadata": { + "editable": true + }, + 
"source": [ + "$$\n", + "\\int_{-\\infty}^{\\infty}dxp(x)e^{\\left(iq(\\mu-x)/m\\right)}=\n", + " 1-\\frac{q^2\\sigma^2}{2m^2}+\\dots,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "7c65ce24", + "metadata": { + "editable": true + }, + "source": [ + "resulting in" + ] + }, + { + "cell_type": "markdown", + "id": "8cd5650a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\left[\\int_{-\\infty}^{\\infty}dxp(x)\\exp{\\left(iq(\\mu-x)/m\\right)}\\right]^m\\approx\n", + " \\left[1-\\frac{q^2\\sigma^2}{2m^2}+\\dots \\right]^m,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "11fdc936", + "metadata": { + "editable": true + }, + "source": [ + "and in the limit $m\\rightarrow \\infty$ we obtain" + ] + }, + { + "cell_type": "markdown", + "id": "ed88642e", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\tilde{p}(z)=\\frac{1}{\\sqrt{2\\pi}(\\sigma/\\sqrt{m})}\n", + " \\exp{\\left(-\\frac{(z-\\mu)^2}{2(\\sigma/\\sqrt{m})^2}\\right)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "82c61b81", + "metadata": { + "editable": true + }, + "source": [ + "which is the normal distribution with variance\n", + "$\\sigma^2_m=\\sigma^2/m$, where $\\sigma$ is the variance of the PDF $p(x)$\n", + "and $\\mu$ is also the mean of the PDF $p(x)$." + ] + }, + { + "cell_type": "markdown", + "id": "bc43db46", + "metadata": { + "editable": true + }, + "source": [ + "## Wrapping it up\n", + "\n", + "Thus, the central limit theorem states that the PDF $\\tilde{p}(z)$ of\n", + "the average of $m$ random values corresponding to a PDF $p(x)$ \n", + "is a normal distribution whose mean is the \n", + "mean value of the PDF $p(x)$ and whose variance is the variance\n", + "of the PDF $p(x)$ divided by $m$, the number of values used to compute $z$.\n", + "\n", + "The central limit theorem leads to the well-known expression for the\n", + "standard deviation, given by" + ] + }, + { + "cell_type": "markdown", + "id": "25418113", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\sigma_m=\n", + "\\frac{\\sigma}{\\sqrt{m}}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "e5d3c3eb", + "metadata": { + "editable": true + }, + "source": [ + "The latter is true only if the average value is known exactly. This is obtained in the limit\n", + "$m\\rightarrow \\infty$ only. Because the mean and the variance are measured quantities we obtain \n", + "the familiar expression in statistics (the so-called Bessel correction)" + ] + }, + { + "cell_type": "markdown", + "id": "c504cba4", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\sigma_m\\approx \n", + "\\frac{\\sigma}{\\sqrt{m-1}}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "079ded2a", + "metadata": { + "editable": true + }, + "source": [ + "In many cases however the above estimate for the standard deviation,\n", + "in particular if correlations are strong, may be too simplistic. Keep\n", + "in mind that we have assumed that the variables $x$ are independent\n", + "and identically distributed. This is obviously not always the\n", + "case. For example, the random numbers (or better pseudorandom numbers)\n", + "we generate in various calculations do always exhibit some\n", + "correlations.\n", + "\n", + "The theorem is satisfied by a large class of PDFs. Note however that for a\n", + "finite $m$, it is not always possible to find a closed form /analytic expression for\n", + "$\\tilde{p}(x)$." 
+ ] + }, + { + "cell_type": "markdown", + "id": "e8534a50", + "metadata": { + "editable": true + }, + "source": [ + "## Confidence Intervals\n", + "\n", + "Confidence intervals are used in statistics and represent a type of estimate\n", + "computed from the observed data. This gives a range of values for an\n", + "unknown parameter such as the parameters $\\boldsymbol{\\theta}$ from linear regression.\n", + "\n", + "With the OLS expressions for the parameters $\\boldsymbol{\\theta}$ we found \n", + "$\\mathbb{E}(\\boldsymbol{\\theta}) = \\boldsymbol{\\theta}$, which means that the estimator of the regression parameters is unbiased.\n", + "\n", + "In the exercises this week we show that the variance of the estimate of the $j$-th regression coefficient is\n", + "$\\boldsymbol{\\sigma}^2 (\\boldsymbol{\\theta}_j ) = \\boldsymbol{\\sigma}^2 [(\\mathbf{X}^{T} \\mathbf{X})^{-1}]_{jj} $.\n", + "\n", + "This quantity can be used to\n", + "construct a confidence interval for the estimates." + ] + }, + { + "cell_type": "markdown", + "id": "2fc73431", + "metadata": { + "editable": true + }, + "source": [ + "## Standard Approach based on the Normal Distribution\n", + "\n", + "We will assume that the parameters $\\theta$ follow a normal\n", + "distribution. We can then define the confidence interval. Here we will be using as\n", + "shorthands $\\mu_{\\theta}$ for the above mean value and $\\sigma_{\\theta}$\n", + "for the standard deviation. We have then a confidence interval" + ] + }, + { + "cell_type": "markdown", + "id": "0f8b0845", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\left(\\mu_{\\theta}\\pm \\frac{z\\sigma_{\\theta}}{\\sqrt{n}}\\right),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "25105753", + "metadata": { + "editable": true + }, + "source": [ + "where $z$ defines the level of certainty (or confidence). For a normal\n", + "distribution typical parameters are $z=2.576$ which corresponds to a\n", + "confidence of $99\\%$ while $z=1.96$ corresponds to a confidence of\n", + "$95\\%$. A confidence level of $95\\%$ is commonly used and it is\n", + "normally referred to as a *two-sigmas* confidence level, that is we\n", + "approximate $z\\approx 2$.\n", + "\n", + "For more discussions of confidence intervals (and in particular linked with a discussion of the bootstrap method), see chapter 5 of the textbook by [Davison on the Bootstrap Methods and their Applications](https://www.cambridge.org/core/books/bootstrap-methods-and-their-application/ED2FD043579F27952363566DC09CBD6A)\n", + "\n", + "In this text you will also find an in-depth discussion of the\n", + "Bootstrap method, why it works and various theorems related to it." + ] + }, + { + "cell_type": "markdown", + "id": "89be6eea", + "metadata": { + "editable": true + }, + "source": [ + "## Resampling methods: Bootstrap background\n", + "\n", + "Since $\\widehat{\\theta} = \\widehat{\\theta}(\\boldsymbol{X})$ is a function of random variables,\n", + "$\\widehat{\\theta}$ itself must be a random variable. Thus it has\n", + "a pdf, call this function $p(\\boldsymbol{t})$. The aim of the bootstrap is to\n", + "estimate $p(\\boldsymbol{t})$ by the relative frequency of\n", + "$\\widehat{\\theta}$. You can think of this as using a histogram\n", + "in the place of $p(\\boldsymbol{t})$. If the relative frequency closely\n", + "resembles $p(\\vec{t})$, then using numerics, it is straight forward to\n", + "estimate all the interesting parameters of $p(\\boldsymbol{t})$ using point\n", + "estimators." 
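+    ,"\n",
+    "\n",
+    "As a brief aside before turning to the bootstrap algorithm itself, the variance expression $\sigma^2 [(\mathbf{X}^{T} \mathbf{X})^{-1}]_{jj}$ for the $j$-th OLS parameter quoted above can be evaluated directly to form approximate confidence intervals $\hat{\theta}_j \pm z\sigma(\theta_j)$. The sketch below uses synthetic data; the design matrix, the noise level and the $95\%$ factor $z=1.96$ are arbitrary illustrative choices:\n",
+    "\n",
+    "```python\n",
+    "import numpy as np\n",
+    "\n",
+    "rng = np.random.default_rng(3155)\n",
+    "n = 100\n",
+    "x = rng.uniform(-1, 1, n)\n",
+    "X = np.column_stack([np.ones(n), x, x**2])   # simple quadratic design matrix\n",
+    "y = X @ np.array([1.0, -2.0, 0.5]) + 0.3*rng.standard_normal(n)\n",
+    "\n",
+    "theta_hat = np.linalg.pinv(X.T @ X) @ X.T @ y\n",
+    "residuals = y - X @ theta_hat\n",
+    "sigma2_hat = residuals @ residuals/(n - X.shape[1])   # estimate of the noise variance\n",
+    "var_theta = sigma2_hat*np.diag(np.linalg.inv(X.T @ X))\n",
+    "\n",
+    "z = 1.96   # roughly a 95% confidence level\n",
+    "for j in range(X.shape[1]):\n",
+    "    print(f'theta_{j}: {theta_hat[j]:.3f} +/- {z*np.sqrt(var_theta[j]):.3f}')\n",
+    "```"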
+ ] + }, + { + "cell_type": "markdown", + "id": "6c240b38", + "metadata": { + "editable": true + }, + "source": [ + "## Resampling methods: More Bootstrap background\n", + "\n", + "In the case that $\\widehat{\\theta}$ has\n", + "more than one component, and the components are independent, we use the\n", + "same estimator on each component separately. If the probability\n", + "density function of $X_i$, $p(x)$, had been known, then it would have\n", + "been straightforward to do this by: \n", + "1. Drawing lots of numbers from $p(x)$, suppose we call one such set of numbers $(X_1^*, X_2^*, \\cdots, X_n^*)$. \n", + "\n", + "2. Then using these numbers, we could compute a replica of $\\widehat{\\theta}$ called $\\widehat{\\theta}^*$. \n", + "\n", + "By repeated use of the above two points, many\n", + "estimates of $\\widehat{\\theta}$ can be obtained. The\n", + "idea is to use the relative frequency of $\\widehat{\\theta}^*$\n", + "(think of a histogram) as an estimate of $p(\\boldsymbol{t})$." + ] + }, + { + "cell_type": "markdown", + "id": "fbd95a5c", + "metadata": { + "editable": true + }, + "source": [ + "## Resampling methods: Bootstrap approach\n", + "\n", + "But\n", + "unless there is enough information available about the process that\n", + "generated $X_1,X_2,\\cdots,X_n$, $p(x)$ is in general\n", + "unknown. Therefore, [Efron in 1979](https://projecteuclid.org/euclid.aos/1176344552) asked the\n", + "question: What if we replace $p(x)$ by the relative frequency\n", + "of the observation $X_i$?\n", + "\n", + "If we draw observations in accordance with\n", + "the relative frequency of the observations, will we obtain the same\n", + "result in some asymptotic sense? The answer is yes." + ] + }, + { + "cell_type": "markdown", + "id": "dc50d43a", + "metadata": { + "editable": true + }, + "source": [ + "## Resampling methods: Bootstrap steps\n", + "\n", + "The independent bootstrap works like this: \n", + "\n", + "1. Draw with replacement $n$ numbers for the observed variables $\\boldsymbol{x} = (x_1,x_2,\\cdots,x_n)$. \n", + "\n", + "2. Define a vector $\\boldsymbol{x}^*$ containing the values which were drawn from $\\boldsymbol{x}$. \n", + "\n", + "3. Using the vector $\\boldsymbol{x}^*$ compute $\\widehat{\\theta}^*$ by evaluating $\\widehat \\theta$ under the observations $\\boldsymbol{x}^*$. \n", + "\n", + "4. Repeat this process $k$ times. \n", + "\n", + "When you are done, you can draw a histogram of the relative frequency\n", + "of $\\widehat \\theta^*$. This is your estimate of the probability\n", + "distribution $p(t)$. Using this probability distribution you can\n", + "estimate any statistics thereof. In principle you never draw the\n", + "histogram of the relative frequency of $\\widehat{\\theta}^*$. Instead\n", + "you use the estimators corresponding to the statistic of interest. For\n", + "example, if you are interested in estimating the variance of $\\widehat\n", + "\\theta$, apply the etsimator $\\widehat \\sigma^2$ to the values\n", + "$\\widehat \\theta^*$." + ] + }, + { + "cell_type": "markdown", + "id": "283068cc", + "metadata": { + "editable": true + }, + "source": [ + "## Code example for the Bootstrap method\n", + "\n", + "The following code starts with a Gaussian distribution with mean value\n", + "$\\mu =100$ and variance $\\sigma=15$. We use this to generate the data\n", + "used in the bootstrap analysis. The bootstrap analysis returns a data\n", + "set after a given number of bootstrap operations (as many as we have\n", + "data points). 
This data set consists of estimated mean values for each\n", + "bootstrap operation. The histogram generated by the bootstrap method\n", + "shows that the distribution for these mean values is also a Gaussian,\n", + "centered around the mean value $\\mu=100$ but with standard deviation\n", + "$\\sigma/\\sqrt{n}$, where $n$ is the number of bootstrap samples (in\n", + "this case the same as the number of original data points). The value\n", + "of the standard deviation is what we expect from the central limit\n", + "theorem." + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "ff4790ba", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "%matplotlib inline\n", + "\n", + "import numpy as np\n", + "from time import time\n", + "from scipy.stats import norm\n", + "import matplotlib.pyplot as plt\n", + "\n", + "# Returns mean of bootstrap samples \n", + "# Bootstrap algorithm\n", + "def bootstrap(data, datapoints):\n", + " t = np.zeros(datapoints)\n", + " n = len(data)\n", + " # non-parametric bootstrap \n", + " for i in range(datapoints):\n", + " t[i] = np.mean(data[np.random.randint(0,n,n)])\n", + " # analysis \n", + " print(\"Bootstrap Statistics :\")\n", + " print(\"original bias std. error\")\n", + " print(\"%8g %8g %14g %15g\" % (np.mean(data), np.std(data),np.mean(t),np.std(t)))\n", + " return t\n", + "\n", + "# We set the mean value to 100 and the standard deviation to 15\n", + "mu, sigma = 100, 15\n", + "datapoints = 10000\n", + "# We generate random numbers according to the normal distribution\n", + "x = mu + sigma*np.random.randn(datapoints)\n", + "# bootstrap returns the data sample \n", + "t = bootstrap(x, datapoints)" + ] + }, + { + "cell_type": "markdown", + "id": "3e6adc2f", + "metadata": { + "editable": true + }, + "source": [ + "We see that our new variance and from that the standard deviation, agrees with the central limit theorem." + ] + }, + { + "cell_type": "markdown", + "id": "6ec8223c", + "metadata": { + "editable": true + }, + "source": [ + "## Plotting the Histogram" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "3cf4144d", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# the histogram of the bootstrapped data (normalized data if density = True)\n", + "n, binsboot, patches = plt.hist(t, 50, density=True, facecolor='red', alpha=0.75)\n", + "# add a 'best fit' line \n", + "y = norm.pdf(binsboot, np.mean(t), np.std(t))\n", + "lt = plt.plot(binsboot, y, 'b', linewidth=1)\n", + "plt.xlabel('x')\n", + "plt.ylabel('Probability')\n", + "plt.grid(True)\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "db5a8f91", + "metadata": { + "editable": true + }, + "source": [ + "## The bias-variance tradeoff\n", + "\n", + "We will discuss the bias-variance tradeoff in the context of\n", + "continuous predictions such as regression. However, many of the\n", + "intuitions and ideas discussed here also carry over to classification\n", + "tasks. Consider a dataset $\\mathcal{D}$ consisting of the data\n", + "$\\mathbf{X}_\\mathcal{D}=\\{(y_j, \\boldsymbol{x}_j), j=0\\ldots n-1\\}$. 
\n", + "\n", + "Let us assume that the true data is generated from a noisy model" + ] + }, + { + "cell_type": "markdown", + "id": "327bce6a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{y}=f(\\boldsymbol{x}) + \\boldsymbol{\\epsilon}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "1c671d4e", + "metadata": { + "editable": true + }, + "source": [ + "where $\\epsilon$ is normally distributed with mean zero and standard deviation $\\sigma^2$.\n", + "\n", + "In our derivation of the ordinary least squares method we defined then\n", + "an approximation to the function $f$ in terms of the parameters\n", + "$\\boldsymbol{\\theta}$ and the design matrix $\\boldsymbol{X}$ which embody our model,\n", + "that is $\\boldsymbol{\\tilde{y}}=\\boldsymbol{X}\\boldsymbol{\\theta}$. \n", + "\n", + "Thereafter we found the parameters $\\boldsymbol{\\theta}$ by optimizing the means squared error via the so-called cost function" + ] + }, + { + "cell_type": "markdown", + "id": "6e05fc43", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C(\\boldsymbol{X},\\boldsymbol{\\theta}) =\\frac{1}{n}\\sum_{i=0}^{n-1}(y_i-\\tilde{y}_i)^2=\\mathbb{E}\\left[(\\boldsymbol{y}-\\boldsymbol{\\tilde{y}})^2\\right].\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "c45e0752", + "metadata": { + "editable": true + }, + "source": [ + "We can rewrite this as" + ] + }, + { + "cell_type": "markdown", + "id": "bafa4ab6", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathbb{E}\\left[(\\boldsymbol{y}-\\boldsymbol{\\tilde{y}})^2\\right]=\\frac{1}{n}\\sum_i(f_i-\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right])^2+\\frac{1}{n}\\sum_i(\\tilde{y}_i-\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right])^2+\\sigma^2.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "ea0bc471", + "metadata": { + "editable": true + }, + "source": [ + "The three terms represent the square of the bias of the learning\n", + "method, which can be thought of as the error caused by the simplifying\n", + "assumptions built into the method. The second term represents the\n", + "variance of the chosen model and finally the last terms is variance of\n", + "the error $\\boldsymbol{\\epsilon}$.\n", + "\n", + "To derive this equation, we need to recall that the variance of $\\boldsymbol{y}$ and $\\boldsymbol{\\epsilon}$ are both equal to $\\sigma^2$. The mean value of $\\boldsymbol{\\epsilon}$ is by definition equal to zero. 
Furthermore, the function $f$ is not a stochastics variable, idem for $\\boldsymbol{\\tilde{y}}$.\n", + "We use a more compact notation in terms of the expectation value" + ] + }, + { + "cell_type": "markdown", + "id": "08b603f3", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathbb{E}\\left[(\\boldsymbol{y}-\\boldsymbol{\\tilde{y}})^2\\right]=\\mathbb{E}\\left[(\\boldsymbol{f}+\\boldsymbol{\\epsilon}-\\boldsymbol{\\tilde{y}})^2\\right],\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "4114d10e", + "metadata": { + "editable": true + }, + "source": [ + "and adding and subtracting $\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right]$ we get" + ] + }, + { + "cell_type": "markdown", + "id": "8890c666", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathbb{E}\\left[(\\boldsymbol{y}-\\boldsymbol{\\tilde{y}})^2\\right]=\\mathbb{E}\\left[(\\boldsymbol{f}+\\boldsymbol{\\epsilon}-\\boldsymbol{\\tilde{y}}+\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right]-\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right])^2\\right],\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "7d5b7ce4", + "metadata": { + "editable": true + }, + "source": [ + "which, using the abovementioned expectation values can be rewritten as" + ] + }, + { + "cell_type": "markdown", + "id": "3913c5b9", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathbb{E}\\left[(\\boldsymbol{y}-\\boldsymbol{\\tilde{y}})^2\\right]=\\mathbb{E}\\left[(\\boldsymbol{y}-\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right])^2\\right]+\\mathrm{Var}\\left[\\boldsymbol{\\tilde{y}}\\right]+\\sigma^2,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "5e0067b1", + "metadata": { + "editable": true + }, + "source": [ + "that is the rewriting in terms of the so-called bias, the variance of the model $\\boldsymbol{\\tilde{y}}$ and the variance of $\\boldsymbol{\\epsilon}$." + ] + }, + { + "cell_type": "markdown", + "id": "326bc8f1", + "metadata": { + "editable": true + }, + "source": [ + "## A way to Read the Bias-Variance Tradeoff\n", + "\n", + "\n", + "\n", + "\n", + "

    Figure 1: Illustration of the bias-variance tradeoff.
    \n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "d3713eca", + "metadata": { + "editable": true + }, + "source": [ + "## Example code for Bias-Variance tradeoff" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "01c3b507", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import matplotlib.pyplot as plt\n", + "import numpy as np\n", + "from sklearn.linear_model import LinearRegression, Ridge, Lasso\n", + "from sklearn.preprocessing import PolynomialFeatures\n", + "from sklearn.model_selection import train_test_split\n", + "from sklearn.pipeline import make_pipeline\n", + "from sklearn.utils import resample\n", + "\n", + "np.random.seed(2018)\n", + "\n", + "n = 500\n", + "n_boostraps = 100\n", + "degree = 18 # A quite high value, just to show.\n", + "noise = 0.1\n", + "\n", + "# Make data set.\n", + "x = np.linspace(-1, 3, n).reshape(-1, 1)\n", + "y = np.exp(-x**2) + 1.5 * np.exp(-(x-2)**2) + np.random.normal(0, 0.1, x.shape)\n", + "\n", + "# Hold out some test data that is never used in training.\n", + "x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)\n", + "\n", + "# Combine x transformation and model into one operation.\n", + "# Not neccesary, but convenient.\n", + "model = make_pipeline(PolynomialFeatures(degree=degree), LinearRegression(fit_intercept=False))\n", + "\n", + "# The following (m x n_bootstraps) matrix holds the column vectors y_pred\n", + "# for each bootstrap iteration.\n", + "y_pred = np.empty((y_test.shape[0], n_boostraps))\n", + "for i in range(n_boostraps):\n", + " x_, y_ = resample(x_train, y_train)\n", + "\n", + " # Evaluate the new model on the same test data each time.\n", + " y_pred[:, i] = model.fit(x_, y_).predict(x_test).ravel()\n", + "\n", + "# Note: Expectations and variances taken w.r.t. different training\n", + "# data sets, hence the axis=1. Subsequent means are taken across the test data\n", + "# set in order to obtain a total value, but before this we have error/bias/variance\n", + "# calculated per data point in the test set.\n", + "# Note 2: The use of keepdims=True is important in the calculation of bias as this \n", + "# maintains the column vector form. 
Dropping this yields very unexpected results.\n", + "error = np.mean( np.mean((y_test - y_pred)**2, axis=1, keepdims=True) )\n", + "bias = np.mean( (y_test - np.mean(y_pred, axis=1, keepdims=True))**2 )\n", + "variance = np.mean( np.var(y_pred, axis=1, keepdims=True) )\n", + "print('Error:', error)\n", + "print('Bias^2:', bias)\n", + "print('Var:', variance)\n", + "print('{} >= {} + {} = {}'.format(error, bias, variance, bias+variance))\n", + "\n", + "plt.plot(x[::5, :], y[::5, :], label='f(x)')\n", + "plt.scatter(x_test, y_test, label='Data points')\n", + "plt.scatter(x_test, np.mean(y_pred, axis=1), label='Pred')\n", + "plt.legend()\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "949e3a5e", + "metadata": { + "editable": true + }, + "source": [ + "## Understanding what happens" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "7e7f4926", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import matplotlib.pyplot as plt\n", + "import numpy as np\n", + "from sklearn.linear_model import LinearRegression, Ridge, Lasso\n", + "from sklearn.preprocessing import PolynomialFeatures\n", + "from sklearn.model_selection import train_test_split\n", + "from sklearn.pipeline import make_pipeline\n", + "from sklearn.utils import resample\n", + "\n", + "np.random.seed(2018)\n", + "\n", + "n = 40\n", + "n_boostraps = 100\n", + "maxdegree = 14\n", + "\n", + "\n", + "# Make data set.\n", + "x = np.linspace(-3, 3, n).reshape(-1, 1)\n", + "y = np.exp(-x**2) + 1.5 * np.exp(-(x-2)**2)+ np.random.normal(0, 0.1, x.shape)\n", + "error = np.zeros(maxdegree)\n", + "bias = np.zeros(maxdegree)\n", + "variance = np.zeros(maxdegree)\n", + "polydegree = np.zeros(maxdegree)\n", + "x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)\n", + "\n", + "for degree in range(maxdegree):\n", + " model = make_pipeline(PolynomialFeatures(degree=degree), LinearRegression(fit_intercept=False))\n", + " y_pred = np.empty((y_test.shape[0], n_boostraps))\n", + " for i in range(n_boostraps):\n", + " x_, y_ = resample(x_train, y_train)\n", + " y_pred[:, i] = model.fit(x_, y_).predict(x_test).ravel()\n", + "\n", + " polydegree[degree] = degree\n", + " error[degree] = np.mean( np.mean((y_test - y_pred)**2, axis=1, keepdims=True) )\n", + " bias[degree] = np.mean( (y_test - np.mean(y_pred, axis=1, keepdims=True))**2 )\n", + " variance[degree] = np.mean( np.var(y_pred, axis=1, keepdims=True) )\n", + " print('Polynomial degree:', degree)\n", + " print('Error:', error[degree])\n", + " print('Bias^2:', bias[degree])\n", + " print('Var:', variance[degree])\n", + " print('{} >= {} + {} = {}'.format(error[degree], bias[degree], variance[degree], bias[degree]+variance[degree]))\n", + "\n", + "plt.plot(polydegree, error, label='Error')\n", + "plt.plot(polydegree, bias, label='bias')\n", + "plt.plot(polydegree, variance, label='Variance')\n", + "plt.legend()\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "33c5cae5", + "metadata": { + "editable": true + }, + "source": [ + "## Summing up\n", + "\n", + "The bias-variance tradeoff summarizes the fundamental tension in\n", + "machine learning, particularly supervised learning, between the\n", + "complexity of a model and the amount of training data needed to train\n", + "it. 
Since data is often limited, in practice it is often useful to\n", + "use a less-complex model with higher bias, that is a model whose asymptotic\n", + "performance is worse than another model because it is easier to\n", + "train and less sensitive to sampling noise arising from having a\n", + "finite-sized training dataset (smaller variance). \n", + "\n", + "The above equations tell us that in\n", + "order to minimize the expected test error, we need to select a\n", + "statistical learning method that simultaneously achieves low variance\n", + "and low bias. Note that variance is inherently a nonnegative quantity,\n", + "and squared bias is also nonnegative. Hence, we see that the expected\n", + "test MSE can never lie below $Var(\\epsilon)$, the irreducible error.\n", + "\n", + "What do we mean by the variance and bias of a statistical learning\n", + "method? The variance refers to the amount by which our model would change if we\n", + "estimated it using a different training data set. Since the training\n", + "data are used to fit the statistical learning method, different\n", + "training data sets will result in a different estimate. But ideally the\n", + "estimate for our model should not vary too much between training\n", + "sets. However, if a method has high variance then small changes in\n", + "the training data can result in large changes in the model. In general, more\n", + "flexible statistical methods have higher variance.\n", + "\n", + "You may also find this recent [article](https://www.pnas.org/content/116/32/15849) of interest." + ] + }, + { + "cell_type": "markdown", + "id": "f931f0f2", + "metadata": { + "editable": true + }, + "source": [ + "## Another Example from Scikit-Learn's Repository\n", + "\n", + "This example demonstrates the problems of underfitting and overfitting and\n", + "how we can use linear regression with polynomial features to approximate\n", + "nonlinear functions. The plot shows the function that we want to approximate,\n", + "which is a part of the cosine function. In addition, the samples from the\n", + "real function and the approximations of different models are displayed. The\n", + "models have polynomial features of different degrees. We can see that a\n", + "linear function (polynomial with degree 1) is not sufficient to fit the\n", + "training samples. This is called **underfitting**. A polynomial of degree 4\n", + "approximates the true function almost perfectly. However, for higher degrees\n", + "the model will **overfit** the training data, i.e. it learns the noise of the\n", + "training data.\n", + "We evaluate quantitatively overfitting and underfitting by using\n", + "cross-validation. We calculate the mean squared error (MSE) on the validation\n", + "set, the higher, the less likely the model generalizes correctly from the\n", + "training data." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "58daa28d", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "\n", + "\n", + "#print(__doc__)\n", + "\n", + "import numpy as np\n", + "import matplotlib.pyplot as plt\n", + "from sklearn.pipeline import Pipeline\n", + "from sklearn.preprocessing import PolynomialFeatures\n", + "from sklearn.linear_model import LinearRegression\n", + "from sklearn.model_selection import cross_val_score\n", + "\n", + "\n", + "def true_fun(X):\n", + " return np.cos(1.5 * np.pi * X)\n", + "\n", + "np.random.seed(0)\n", + "\n", + "n_samples = 30\n", + "degrees = [1, 4, 15]\n", + "\n", + "X = np.sort(np.random.rand(n_samples))\n", + "y = true_fun(X) + np.random.randn(n_samples) * 0.1\n", + "\n", + "plt.figure(figsize=(14, 5))\n", + "for i in range(len(degrees)):\n", + " ax = plt.subplot(1, len(degrees), i + 1)\n", + " plt.setp(ax, xticks=(), yticks=())\n", + "\n", + " polynomial_features = PolynomialFeatures(degree=degrees[i],\n", + " include_bias=False)\n", + " linear_regression = LinearRegression()\n", + " pipeline = Pipeline([(\"polynomial_features\", polynomial_features),\n", + " (\"linear_regression\", linear_regression)])\n", + " pipeline.fit(X[:, np.newaxis], y)\n", + "\n", + " # Evaluate the models using crossvalidation\n", + " scores = cross_val_score(pipeline, X[:, np.newaxis], y,\n", + " scoring=\"neg_mean_squared_error\", cv=10)\n", + "\n", + " X_test = np.linspace(0, 1, 100)\n", + " plt.plot(X_test, pipeline.predict(X_test[:, np.newaxis]), label=\"Model\")\n", + " plt.plot(X_test, true_fun(X_test), label=\"True function\")\n", + " plt.scatter(X, y, edgecolor='b', s=20, label=\"Samples\")\n", + " plt.xlabel(\"x\")\n", + " plt.ylabel(\"y\")\n", + " plt.xlim((0, 1))\n", + " plt.ylim((-2, 2))\n", + " plt.legend(loc=\"best\")\n", + " plt.title(\"Degree {}\\nMSE = {:.2e}(+/- {:.2e})\".format(\n", + " degrees[i], -scores.mean(), scores.std()))\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "3bbcf741", + "metadata": { + "editable": true + }, + "source": [ + "## Various steps in cross-validation\n", + "\n", + "When the repetitive splitting of the data set is done randomly,\n", + "samples may accidently end up in a fast majority of the splits in\n", + "either training or test set. Such samples may have an unbalanced\n", + "influence on either model building or prediction evaluation. To avoid\n", + "this $k$-fold cross-validation structures the data splitting. The\n", + "samples are divided into $k$ more or less equally sized exhaustive and\n", + "mutually exclusive subsets. In turn (at each split) one of these\n", + "subsets plays the role of the test set while the union of the\n", + "remaining subsets constitutes the training set. Such a splitting\n", + "warrants a balanced representation of each sample in both training and\n", + "test set over the splits. Still the division into the $k$ subsets\n", + "involves a degree of randomness. This may be fully excluded when\n", + "choosing $k=n$. This particular case is referred to as leave-one-out\n", + "cross-validation (LOOCV)." + ] + }, + { + "cell_type": "markdown", + "id": "4b0ffe06", + "metadata": { + "editable": true + }, + "source": [ + "## Cross-validation in brief\n", + "\n", + "For the various values of $k$\n", + "\n", + "1. shuffle the dataset randomly.\n", + "\n", + "2. Split the dataset into $k$ groups.\n", + "\n", + "3. For each unique group:\n", + "\n", + "a. 
Decide which group to use as set for test data\n", + "\n", + "b. Take the remaining groups as a training data set\n", + "\n", + "c. Fit a model on the training set and evaluate it on the test set\n", + "\n", + "d. Retain the evaluation score and discard the model\n", + "\n", + "5. Summarize the model using the sample of model evaluation scores" + ] + }, + { + "cell_type": "markdown", + "id": "b11baed6", + "metadata": { + "editable": true + }, + "source": [ + "## Code Example for Cross-validation and $k$-fold Cross-validation\n", + "\n", + "The code here uses Ridge regression with cross-validation (CV) resampling and $k$-fold CV in order to fit a specific polynomial." + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "39e76d49", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import numpy as np\n", + "import matplotlib.pyplot as plt\n", + "from sklearn.model_selection import KFold\n", + "from sklearn.linear_model import Ridge\n", + "from sklearn.model_selection import cross_val_score\n", + "from sklearn.preprocessing import PolynomialFeatures\n", + "\n", + "# A seed just to ensure that the random numbers are the same for every run.\n", + "# Useful for eventual debugging.\n", + "np.random.seed(3155)\n", + "\n", + "# Generate the data.\n", + "nsamples = 100\n", + "x = np.random.randn(nsamples)\n", + "y = 3*x**2 + np.random.randn(nsamples)\n", + "\n", + "## Cross-validation on Ridge regression using KFold only\n", + "\n", + "# Decide degree on polynomial to fit\n", + "poly = PolynomialFeatures(degree = 6)\n", + "\n", + "# Decide which values of lambda to use\n", + "nlambdas = 500\n", + "lambdas = np.logspace(-3, 5, nlambdas)\n", + "\n", + "# Initialize a KFold instance\n", + "k = 5\n", + "kfold = KFold(n_splits = k)\n", + "\n", + "# Perform the cross-validation to estimate MSE\n", + "scores_KFold = np.zeros((nlambdas, k))\n", + "\n", + "i = 0\n", + "for lmb in lambdas:\n", + " ridge = Ridge(alpha = lmb)\n", + " j = 0\n", + " for train_inds, test_inds in kfold.split(x):\n", + " xtrain = x[train_inds]\n", + " ytrain = y[train_inds]\n", + "\n", + " xtest = x[test_inds]\n", + " ytest = y[test_inds]\n", + "\n", + " Xtrain = poly.fit_transform(xtrain[:, np.newaxis])\n", + " ridge.fit(Xtrain, ytrain[:, np.newaxis])\n", + "\n", + " Xtest = poly.fit_transform(xtest[:, np.newaxis])\n", + " ypred = ridge.predict(Xtest)\n", + "\n", + " scores_KFold[i,j] = np.sum((ypred - ytest[:, np.newaxis])**2)/np.size(ypred)\n", + "\n", + " j += 1\n", + " i += 1\n", + "\n", + "\n", + "estimated_mse_KFold = np.mean(scores_KFold, axis = 1)\n", + "\n", + "## Cross-validation using cross_val_score from sklearn along with KFold\n", + "\n", + "# kfold is an instance initialized above as:\n", + "# kfold = KFold(n_splits = k)\n", + "\n", + "estimated_mse_sklearn = np.zeros(nlambdas)\n", + "i = 0\n", + "for lmb in lambdas:\n", + " ridge = Ridge(alpha = lmb)\n", + "\n", + " X = poly.fit_transform(x[:, np.newaxis])\n", + " estimated_mse_folds = cross_val_score(ridge, X, y[:, np.newaxis], scoring='neg_mean_squared_error', cv=kfold)\n", + "\n", + " # cross_val_score return an array containing the estimated negative mse for every fold.\n", + " # we have to the the mean of every array in order to get an estimate of the mse of the model\n", + " estimated_mse_sklearn[i] = np.mean(-estimated_mse_folds)\n", + "\n", + " i += 1\n", + "\n", + "## Plot and compare the slightly different ways to perform cross-validation\n", + "\n", + "plt.figure()\n", + "\n", + 
"plt.plot(np.log10(lambdas), estimated_mse_sklearn, label = 'cross_val_score')\n", + "plt.plot(np.log10(lambdas), estimated_mse_KFold, 'r--', label = 'KFold')\n", + "\n", + "plt.xlabel('log10(lambda)')\n", + "plt.ylabel('mse')\n", + "\n", + "plt.legend()\n", + "\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "e7d12ef0", + "metadata": { + "editable": true + }, + "source": [ + "## More examples on bootstrap and cross-validation and errors" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "47f6ae18", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# Common imports\n", + "import os\n", + "import numpy as np\n", + "import pandas as pd\n", + "import matplotlib.pyplot as plt\n", + "from sklearn.linear_model import LinearRegression, Ridge, Lasso\n", + "from sklearn.model_selection import train_test_split\n", + "from sklearn.utils import resample\n", + "from sklearn.metrics import mean_squared_error\n", + "# Where to save the figures and data files\n", + "PROJECT_ROOT_DIR = \"Results\"\n", + "FIGURE_ID = \"Results/FigureFiles\"\n", + "DATA_ID = \"DataFiles/\"\n", + "\n", + "if not os.path.exists(PROJECT_ROOT_DIR):\n", + " os.mkdir(PROJECT_ROOT_DIR)\n", + "\n", + "if not os.path.exists(FIGURE_ID):\n", + " os.makedirs(FIGURE_ID)\n", + "\n", + "if not os.path.exists(DATA_ID):\n", + " os.makedirs(DATA_ID)\n", + "\n", + "def image_path(fig_id):\n", + " return os.path.join(FIGURE_ID, fig_id)\n", + "\n", + "def data_path(dat_id):\n", + " return os.path.join(DATA_ID, dat_id)\n", + "\n", + "def save_fig(fig_id):\n", + " plt.savefig(image_path(fig_id) + \".png\", format='png')\n", + "\n", + "infile = open(data_path(\"EoS.csv\"),'r')\n", + "\n", + "# Read the EoS data as csv file and organize the data into two arrays with density and energies\n", + "EoS = pd.read_csv(infile, names=('Density', 'Energy'))\n", + "EoS['Energy'] = pd.to_numeric(EoS['Energy'], errors='coerce')\n", + "EoS = EoS.dropna()\n", + "Energies = EoS['Energy']\n", + "Density = EoS['Density']\n", + "# The design matrix now as function of various polytrops\n", + "\n", + "Maxpolydegree = 30\n", + "X = np.zeros((len(Density),Maxpolydegree))\n", + "X[:,0] = 1.0\n", + "testerror = np.zeros(Maxpolydegree)\n", + "trainingerror = np.zeros(Maxpolydegree)\n", + "polynomial = np.zeros(Maxpolydegree)\n", + "\n", + "trials = 100\n", + "for polydegree in range(1, Maxpolydegree):\n", + " polynomial[polydegree] = polydegree\n", + " for degree in range(polydegree):\n", + " X[:,degree] = Density**(degree/3.0)\n", + "\n", + "# loop over trials in order to estimate the expectation value of the MSE\n", + " testerror[polydegree] = 0.0\n", + " trainingerror[polydegree] = 0.0\n", + " for samples in range(trials):\n", + " x_train, x_test, y_train, y_test = train_test_split(X, Energies, test_size=0.2)\n", + " model = LinearRegression(fit_intercept=False).fit(x_train, y_train)\n", + " ypred = model.predict(x_train)\n", + " ytilde = model.predict(x_test)\n", + " testerror[polydegree] += mean_squared_error(y_test, ytilde)\n", + " trainingerror[polydegree] += mean_squared_error(y_train, ypred) \n", + "\n", + " testerror[polydegree] /= trials\n", + " trainingerror[polydegree] /= trials\n", + " print(\"Degree of polynomial: %3d\"% polynomial[polydegree])\n", + " print(\"Mean squared error on training data: %.8f\" % trainingerror[polydegree])\n", + " print(\"Mean squared error on test data: %.8f\" % testerror[polydegree])\n", + "\n", + "plt.plot(polynomial, np.log10(trainingerror), 
label='Training Error')\n", + "plt.plot(polynomial, np.log10(testerror), label='Test Error')\n", + "plt.xlabel('Polynomial degree')\n", + "plt.ylabel('log10[MSE]')\n", + "plt.legend()\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "9c1d4754", + "metadata": { + "editable": true + }, + "source": [ + "Note that we kept the intercept column in the fitting here. This means that we need to set the **intercept** in the call to the **Scikit-Learn** function as **False**. Alternatively, we could have set up the design matrix $X$ without the first column of ones." + ] + }, + { + "cell_type": "markdown", + "id": "b698ac66", + "metadata": { + "editable": true + }, + "source": [ + "## The same example but now with cross-validation\n", + "\n", + "In this example we keep the intercept column again but add cross-validation in order to estimate the best possible value of the means squared error." + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "0a2409b0", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# Common imports\n", + "import os\n", + "import numpy as np\n", + "import pandas as pd\n", + "import matplotlib.pyplot as plt\n", + "from sklearn.linear_model import LinearRegression, Ridge, Lasso\n", + "from sklearn.metrics import mean_squared_error\n", + "from sklearn.model_selection import KFold\n", + "from sklearn.model_selection import cross_val_score\n", + "\n", + "\n", + "# Where to save the figures and data files\n", + "PROJECT_ROOT_DIR = \"Results\"\n", + "FIGURE_ID = \"Results/FigureFiles\"\n", + "DATA_ID = \"DataFiles/\"\n", + "\n", + "if not os.path.exists(PROJECT_ROOT_DIR):\n", + " os.mkdir(PROJECT_ROOT_DIR)\n", + "\n", + "if not os.path.exists(FIGURE_ID):\n", + " os.makedirs(FIGURE_ID)\n", + "\n", + "if not os.path.exists(DATA_ID):\n", + " os.makedirs(DATA_ID)\n", + "\n", + "def image_path(fig_id):\n", + " return os.path.join(FIGURE_ID, fig_id)\n", + "\n", + "def data_path(dat_id):\n", + " return os.path.join(DATA_ID, dat_id)\n", + "\n", + "def save_fig(fig_id):\n", + " plt.savefig(image_path(fig_id) + \".png\", format='png')\n", + "\n", + "infile = open(data_path(\"EoS.csv\"),'r')\n", + "\n", + "# Read the EoS data as csv file and organize the data into two arrays with density and energies\n", + "EoS = pd.read_csv(infile, names=('Density', 'Energy'))\n", + "EoS['Energy'] = pd.to_numeric(EoS['Energy'], errors='coerce')\n", + "EoS = EoS.dropna()\n", + "Energies = EoS['Energy']\n", + "Density = EoS['Density']\n", + "# The design matrix now as function of various polytrops\n", + "\n", + "Maxpolydegree = 30\n", + "X = np.zeros((len(Density),Maxpolydegree))\n", + "X[:,0] = 1.0\n", + "estimated_mse_sklearn = np.zeros(Maxpolydegree)\n", + "polynomial = np.zeros(Maxpolydegree)\n", + "k =5\n", + "kfold = KFold(n_splits = k)\n", + "\n", + "for polydegree in range(1, Maxpolydegree):\n", + " polynomial[polydegree] = polydegree\n", + " for degree in range(polydegree):\n", + " X[:,degree] = Density**(degree/3.0)\n", + " OLS = LinearRegression(fit_intercept=False)\n", + "# loop over trials in order to estimate the expectation value of the MSE\n", + " estimated_mse_folds = cross_val_score(OLS, X, Energies, scoring='neg_mean_squared_error', cv=kfold)\n", + "#[:, np.newaxis]\n", + " estimated_mse_sklearn[polydegree] = np.mean(-estimated_mse_folds)\n", + "\n", + "plt.plot(polynomial, np.log10(estimated_mse_sklearn), label='Test Error')\n", + "plt.xlabel('Polynomial degree')\n", + "plt.ylabel('log10[MSE]')\n", + 
"plt.legend()\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "56f130b5", + "metadata": { + "editable": true + }, + "source": [ + "## Material for the lab sessions\n", + "\n", + "This week we will discuss during the first hour of each lab session\n", + "some technicalities related to the project and methods for updating\n", + "the learning like ADAgrad, RMSprop and ADAM. As teaching material, see\n", + "the jupyter-notebook from week 37 (September 12-16).\n", + "\n", + "For the lab session, the following video on cross validation (from 2024), could be helpful, see \n", + "\n", + "See also video on ADAgrad, RMSprop and ADAM (material from last week not covered during lecture) at " + ] + } + ], + "metadata": {}, + "nbformat": 4, + "nbformat_minor": 5 +} \ No newline at end of file diff --git a/doc/LectureNotes/_build/jupyter_execute/week39.ipynb b/doc/LectureNotes/_build/jupyter_execute/week39.ipynb new file mode 100644 index 000000000..c1b12b458 --- /dev/null +++ b/doc/LectureNotes/_build/jupyter_execute/week39.ipynb @@ -0,0 +1,2430 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "3a65fcc4", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "284ac98b", + "metadata": { + "editable": true + }, + "source": [ + "# Week 39: Resampling methods and logistic regression\n", + "**Morten Hjorth-Jensen**, Department of Physics, University of Oslo\n", + "\n", + "Date: **Week 39**" + ] + }, + { + "cell_type": "markdown", + "id": "582e0b32", + "metadata": { + "editable": true + }, + "source": [ + "## Plan for week 39, September 22-26, 2025\n", + "\n", + "**Material for the lecture on Monday September 22.**\n", + "\n", + "1. Resampling techniques, Bootstrap and cross validation and bias-variance tradeoff\n", + "\n", + "2. Logistic regression, our first classification encounter and a stepping stone towards neural networks\n", + "\n", + "3. [Video of lecture](https://youtu.be/OVouJyhoksY)\n", + "\n", + "4. [Whiteboard notes](https://github.com/CompPhysics/MachineLearning/blob/master/doc/HandWrittenNotes/2024/FYSSTKweek39.pdf)" + ] + }, + { + "cell_type": "markdown", + "id": "08ea52de", + "metadata": { + "editable": true + }, + "source": [ + "## Readings and Videos, resampling methods\n", + "1. Raschka et al, pages 175-192\n", + "\n", + "2. Hastie et al Chapter 7, here we recommend 7.1-7.5 and 7.10 (cross-validation) and 7.11 (bootstrap). See .\n", + "\n", + "3. [Video on bias-variance tradeoff](https://www.youtube.com/watch?v=EuBBz3bI-aA)\n", + "\n", + "4. [Video on Bootstrapping](https://www.youtube.com/watch?v=Xz0x-8-cgaQ)\n", + "\n", + "5. [Video on cross validation](https://www.youtube.com/watch?v=fSytzGwwBVw)" + ] + }, + { + "cell_type": "markdown", + "id": "a8d5878f", + "metadata": { + "editable": true + }, + "source": [ + "## Readings and Videos, logistic regression\n", + "1. Hastie et al 4.1, 4.2 and 4.3 on logistic regression\n", + "\n", + "2. Raschka et al, pages 53-76 on Logistic regression and pages 37-52 on gradient optimization\n", + "\n", + "3. [Video on Logistic regression](https://www.youtube.com/watch?v=C5268D9t9Ak)\n", + "\n", + "4. [Yet another video on logistic regression](https://www.youtube.com/watch?v=yIYKR4sgzI8)" + ] + }, + { + "cell_type": "markdown", + "id": "e93210f9", + "metadata": { + "editable": true + }, + "source": [ + "## Lab sessions week 39\n", + "\n", + "**Material for the lab sessions on Tuesday and Wednesday.**\n", + "\n", + "1. 
Discussions on how to structure your report for the first project\n", + "\n", + "2. Exercise for week 39 on how to write the abstract and the introduction of the report and how to include references. \n", + "\n", + "3. Work on project 1, in particular resampling methods like cross-validation and bootstrap. **For more discussions of project 1, chapter 5 of Goodfellow et al is a good read, in particular sections 5.1-5.5 and 5.7-5.11**.\n", + "\n", + "4. [Video on how to write scientific reports recorded during one of the lab sessions](https://youtu.be/tVW1ZDmZnwM)\n", + "\n", + "5. A general guideline can be found at ." + ] + }, + { + "cell_type": "markdown", + "id": "c319a504", + "metadata": { + "editable": true + }, + "source": [ + "## Lecture material" + ] + }, + { + "cell_type": "markdown", + "id": "5f29284a", + "metadata": { + "editable": true + }, + "source": [ + "## Resampling methods\n", + "Resampling methods are an indispensable tool in modern\n", + "statistics. They involve repeatedly drawing samples from a training\n", + "set and refitting a model of interest on each sample in order to\n", + "obtain additional information about the fitted model. For example, in\n", + "order to estimate the variability of a linear regression fit, we can\n", + "repeatedly draw different samples from the training data, fit a linear\n", + "regression to each new sample, and then examine the extent to which\n", + "the resulting fits differ. Such an approach may allow us to obtain\n", + "information that would not be available from fitting the model only\n", + "once using the original training sample.\n", + "\n", + "Two resampling methods are often used in Machine Learning analyses,\n", + "1. The **bootstrap method**\n", + "\n", + "2. and **Cross-Validation**\n", + "\n", + "In addition there are several other methods such as the Jackknife and the Blocking methods. This week will repeat some of the elements of the bootstrap method and focus more on cross-validation." + ] + }, + { + "cell_type": "markdown", + "id": "4a774608", + "metadata": { + "editable": true + }, + "source": [ + "## Resampling approaches can be computationally expensive\n", + "\n", + "Resampling approaches can be computationally expensive, because they\n", + "involve fitting the same statistical method multiple times using\n", + "different subsets of the training data. However, due to recent\n", + "advances in computing power, the computational requirements of\n", + "resampling methods generally are not prohibitive. In this chapter, we\n", + "discuss two of the most commonly used resampling methods,\n", + "cross-validation and the bootstrap. Both methods are important tools\n", + "in the practical application of many statistical learning\n", + "procedures. For example, cross-validation can be used to estimate the\n", + "test error associated with a given statistical learning method in\n", + "order to evaluate its performance, or to select the appropriate level\n", + "of flexibility. The process of evaluating a model’s performance is\n", + "known as model assessment, whereas the process of selecting the proper\n", + "level of flexibility for a model is known as model selection. The\n", + "bootstrap is widely used." + ] + }, + { + "cell_type": "markdown", + "id": "5e62c381", + "metadata": { + "editable": true + }, + "source": [ + "## Why resampling methods ?\n", + "**Statistical analysis.**\n", + "\n", + "* Our simulations can be treated as *computer experiments*. 
This is particularly the case for Monte Carlo methods which are widely used in statistical analyses.\n", + "\n", + "* The results can be analysed with the same statistical tools as we would use when analysing experimental data.\n", + "\n", + "* As in all experiments, we are looking for expectation values and an estimate of how accurate they are, i.e., possible sources for errors." + ] + }, + { + "cell_type": "markdown", + "id": "96896342", + "metadata": { + "editable": true + }, + "source": [ + "## Statistical analysis\n", + "\n", + "* As in other experiments, many numerical experiments have two classes of errors:\n", + "\n", + " * Statistical errors\n", + "\n", + " * Systematical errors\n", + "\n", + "* Statistical errors can be estimated using standard tools from statistics\n", + "\n", + "* Systematical errors are method specific and must be treated differently from case to case." + ] + }, + { + "cell_type": "markdown", + "id": "d5318be7", + "metadata": { + "editable": true + }, + "source": [ + "## Resampling methods\n", + "\n", + "With all these analytical equations for both the OLS and Ridge\n", + "regression, we will now outline how to assess a given model. This will\n", + "lead to a discussion of the so-called bias-variance tradeoff (see\n", + "below) and so-called resampling methods.\n", + "\n", + "One of the quantities we have discussed as a way to measure errors is\n", + "the mean-squared error (MSE), mainly used for fitting of continuous\n", + "functions. Another choice is the absolute error.\n", + "\n", + "In the discussions below we will focus on the MSE and in particular since we will split the data into test and training data,\n", + "we discuss the\n", + "1. prediction error or simply the **test error** $\\mathrm{Err_{Test}}$, where we have a fixed training set and the test error is the MSE arising from the data reserved for testing. We discuss also the \n", + "\n", + "2. training error $\\mathrm{Err_{Train}}$, which is the average loss over the training data.\n", + "\n", + "As our model becomes more and more complex, more of the training data tends to used. The training may thence adapt to more complicated structures in the data. This may lead to a decrease in the bias (see below for code example) and a slight increase of the variance for the test error.\n", + "For a certain level of complexity the test error will reach minimum, before starting to increase again. The\n", + "training error reaches a saturation." + ] + }, + { + "cell_type": "markdown", + "id": "7597015e", + "metadata": { + "editable": true + }, + "source": [ + "## Resampling methods: Bootstrap\n", + "Bootstrapping is a [non-parametric approach](https://en.wikipedia.org/wiki/Nonparametric_statistics) to statistical inference\n", + "that substitutes computation for more traditional distributional\n", + "assumptions and asymptotic results. Bootstrapping offers a number of\n", + "advantages: \n", + "1. The bootstrap is quite general, although there are some cases in which it fails. \n", + "\n", + "2. Because it does not require distributional assumptions (such as normally distributed errors), the bootstrap can provide more accurate inferences when the data are not well behaved or when the sample size is small. \n", + "\n", + "3. It is possible to apply the bootstrap to statistics with sampling distributions that are difficult to derive, even asymptotically. \n", + "\n", + "4. 
It is relatively simple to apply the bootstrap to complex data-collection plans (such as stratified and clustered samples).\n", + "\n", + "The textbook by [Davison on the Bootstrap Methods and their Applications](https://www.cambridge.org/core/books/bootstrap-methods-and-their-application/ED2FD043579F27952363566DC09CBD6A) provides many more insights and proofs. In this course we will take a more practical approach and use the results and theorems provided in the literature. For those interested in reading more about the bootstrap methods, we recommend the above text and the one by [Efron and Tibshirani](https://www.routledge.com/An-Introduction-to-the-Bootstrap/Efron-Tibshirani/p/book/9780412042317)." + ] + }, + { + "cell_type": "markdown", + "id": "fbf69230", + "metadata": { + "editable": true + }, + "source": [ + "## The bias-variance tradeoff\n", + "\n", + "We will discuss the bias-variance tradeoff in the context of\n", + "continuous predictions such as regression. However, many of the\n", + "intuitions and ideas discussed here also carry over to classification\n", + "tasks. Consider a dataset $\\mathcal{D}$ consisting of the data\n", + "$\\mathbf{X}_\\mathcal{D}=\\{(y_j, \\boldsymbol{x}_j), j=0\\ldots n-1\\}$. \n", + "\n", + "Let us assume that the true data is generated from a noisy model" + ] + }, + { + "cell_type": "markdown", + "id": "358f7872", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{y}=f(\\boldsymbol{x}) + \\boldsymbol{\\epsilon}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "6a4aceef", + "metadata": { + "editable": true + }, + "source": [ + "where $\\epsilon$ is normally distributed with mean zero and standard deviation $\\sigma^2$.\n", + "\n", + "In our derivation of the ordinary least squares method we defined then\n", + "an approximation to the function $f$ in terms of the parameters\n", + "$\\boldsymbol{\\theta}$ and the design matrix $\\boldsymbol{X}$ which embody our model,\n", + "that is $\\boldsymbol{\\tilde{y}}=\\boldsymbol{X}\\boldsymbol{\\theta}$. \n", + "\n", + "Thereafter we found the parameters $\\boldsymbol{\\theta}$ by optimizing the means squared error via the so-called cost function" + ] + }, + { + "cell_type": "markdown", + "id": "84416669", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C(\\boldsymbol{X},\\boldsymbol{\\theta}) =\\frac{1}{n}\\sum_{i=0}^{n-1}(y_i-\\tilde{y}_i)^2=\\mathbb{E}\\left[(\\boldsymbol{y}-\\boldsymbol{\\tilde{y}})^2\\right].\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "0036358e", + "metadata": { + "editable": true + }, + "source": [ + "We can rewrite this as" + ] + }, + { + "cell_type": "markdown", + "id": "d712d2d7", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathbb{E}\\left[(\\boldsymbol{y}-\\boldsymbol{\\tilde{y}})^2\\right]=\\frac{1}{n}\\sum_i(f_i-\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right])^2+\\frac{1}{n}\\sum_i(\\tilde{y}_i-\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right])^2+\\sigma^2.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b71e48ac", + "metadata": { + "editable": true + }, + "source": [ + "The three terms represent the square of the bias of the learning\n", + "method, which can be thought of as the error caused by the simplifying\n", + "assumptions built into the method. 
The second term represents the\n", + "variance of the chosen model and finally the last terms is variance of\n", + "the error $\\boldsymbol{\\epsilon}$.\n", + "\n", + "To derive this equation, we need to recall that the variance of $\\boldsymbol{y}$ and $\\boldsymbol{\\epsilon}$ are both equal to $\\sigma^2$. The mean value of $\\boldsymbol{\\epsilon}$ is by definition equal to zero. Furthermore, the function $f$ is not a stochastics variable, idem for $\\boldsymbol{\\tilde{y}}$.\n", + "We use a more compact notation in terms of the expectation value" + ] + }, + { + "cell_type": "markdown", + "id": "c78ceafe", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathbb{E}\\left[(\\boldsymbol{y}-\\boldsymbol{\\tilde{y}})^2\\right]=\\mathbb{E}\\left[(\\boldsymbol{f}+\\boldsymbol{\\epsilon}-\\boldsymbol{\\tilde{y}})^2\\right],\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "74aae5bc", + "metadata": { + "editable": true + }, + "source": [ + "and adding and subtracting $\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right]$ we get" + ] + }, + { + "cell_type": "markdown", + "id": "1f2313f1", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathbb{E}\\left[(\\boldsymbol{y}-\\boldsymbol{\\tilde{y}})^2\\right]=\\mathbb{E}\\left[(\\boldsymbol{f}+\\boldsymbol{\\epsilon}-\\boldsymbol{\\tilde{y}}+\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right]-\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right])^2\\right],\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "a29b174f", + "metadata": { + "editable": true + }, + "source": [ + "which, using the abovementioned expectation values can be rewritten as" + ] + }, + { + "cell_type": "markdown", + "id": "3bc08002", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathbb{E}\\left[(\\boldsymbol{y}-\\boldsymbol{\\tilde{y}})^2\\right]=\\mathbb{E}\\left[(\\boldsymbol{y}-\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right])^2\\right]+\\mathrm{Var}\\left[\\boldsymbol{\\tilde{y}}\\right]+\\sigma^2,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "7b7d24e8", + "metadata": { + "editable": true + }, + "source": [ + "that is the rewriting in terms of the so-called bias, the variance of the model $\\boldsymbol{\\tilde{y}}$ and the variance of $\\boldsymbol{\\epsilon}$.\n", + "\n", + "**Note that in order to derive these equations we have assumed we can replace the unknown function $\\boldsymbol{f}$ with the target/output data $\\boldsymbol{y}$.**" + ] + }, + { + "cell_type": "markdown", + "id": "f2118d82", + "metadata": { + "editable": true + }, + "source": [ + "## A way to Read the Bias-Variance Tradeoff\n", + "\n", + "\n", + "\n", + "\n", + "

    Figure 1: A way to read the bias-variance tradeoff.

    \n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "baf08f8a", + "metadata": { + "editable": true + }, + "source": [ + "## Understanding what happens" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "1bd7ac4e", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "%matplotlib inline\n", + "\n", + "import matplotlib.pyplot as plt\n", + "import numpy as np\n", + "from sklearn.linear_model import LinearRegression, Ridge, Lasso\n", + "from sklearn.preprocessing import PolynomialFeatures\n", + "from sklearn.model_selection import train_test_split\n", + "from sklearn.pipeline import make_pipeline\n", + "from sklearn.utils import resample\n", + "\n", + "np.random.seed(2018)\n", + "\n", + "n = 40\n", + "n_boostraps = 100\n", + "maxdegree = 14\n", + "\n", + "\n", + "# Make data set.\n", + "x = np.linspace(-3, 3, n).reshape(-1, 1)\n", + "y = np.exp(-x**2) + 1.5 * np.exp(-(x-2)**2)+ np.random.normal(0, 0.1, x.shape)\n", + "error = np.zeros(maxdegree)\n", + "bias = np.zeros(maxdegree)\n", + "variance = np.zeros(maxdegree)\n", + "polydegree = np.zeros(maxdegree)\n", + "x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)\n", + "\n", + "for degree in range(maxdegree):\n", + " model = make_pipeline(PolynomialFeatures(degree=degree), LinearRegression(fit_intercept=False))\n", + " y_pred = np.empty((y_test.shape[0], n_boostraps))\n", + " for i in range(n_boostraps):\n", + " x_, y_ = resample(x_train, y_train)\n", + " y_pred[:, i] = model.fit(x_, y_).predict(x_test).ravel()\n", + "\n", + " polydegree[degree] = degree\n", + " error[degree] = np.mean( np.mean((y_test - y_pred)**2, axis=1, keepdims=True) )\n", + " bias[degree] = np.mean( (y_test - np.mean(y_pred, axis=1, keepdims=True))**2 )\n", + " variance[degree] = np.mean( np.var(y_pred, axis=1, keepdims=True) )\n", + " print('Polynomial degree:', degree)\n", + " print('Error:', error[degree])\n", + " print('Bias^2:', bias[degree])\n", + " print('Var:', variance[degree])\n", + " print('{} >= {} + {} = {}'.format(error[degree], bias[degree], variance[degree], bias[degree]+variance[degree]))\n", + "\n", + "plt.plot(polydegree, error, label='Error')\n", + "plt.plot(polydegree, bias, label='bias')\n", + "plt.plot(polydegree, variance, label='Variance')\n", + "plt.legend()\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "3edb75ab", + "metadata": { + "editable": true + }, + "source": [ + "## Summing up\n", + "\n", + "The bias-variance tradeoff summarizes the fundamental tension in\n", + "machine learning, particularly supervised learning, between the\n", + "complexity of a model and the amount of training data needed to train\n", + "it. Since data is often limited, in practice it is often useful to\n", + "use a less-complex model with higher bias, that is a model whose asymptotic\n", + "performance is worse than another model because it is easier to\n", + "train and less sensitive to sampling noise arising from having a\n", + "finite-sized training dataset (smaller variance). \n", + "\n", + "The above equations tell us that in\n", + "order to minimize the expected test error, we need to select a\n", + "statistical learning method that simultaneously achieves low variance\n", + "and low bias. Note that variance is inherently a nonnegative quantity,\n", + "and squared bias is also nonnegative. 
Hence, we see that the expected\n", + "test MSE can never lie below $Var(\\epsilon)$, the irreducible error.\n", + "\n", + "What do we mean by the variance and bias of a statistical learning\n", + "method? The variance refers to the amount by which our model would change if we\n", + "estimated it using a different training data set. Since the training\n", + "data are used to fit the statistical learning method, different\n", + "training data sets will result in a different estimate. But ideally the\n", + "estimate for our model should not vary too much between training\n", + "sets. However, if a method has high variance then small changes in\n", + "the training data can result in large changes in the model. In general, more\n", + "flexible statistical methods have higher variance.\n", + "\n", + "You may also find this recent [article](https://www.pnas.org/content/116/32/15849) of interest." + ] + }, + { + "cell_type": "markdown", + "id": "88ce8a48", + "metadata": { + "editable": true + }, + "source": [ + "## Another Example from Scikit-Learn's Repository\n", + "\n", + "This example demonstrates the problems of underfitting and overfitting and\n", + "how we can use linear regression with polynomial features to approximate\n", + "nonlinear functions. The plot shows the function that we want to approximate,\n", + "which is a part of the cosine function. In addition, the samples from the\n", + "real function and the approximations of different models are displayed. The\n", + "models have polynomial features of different degrees. We can see that a\n", + "linear function (polynomial with degree 1) is not sufficient to fit the\n", + "training samples. This is called **underfitting**. A polynomial of degree 4\n", + "approximates the true function almost perfectly. However, for higher degrees\n", + "the model will **overfit** the training data, i.e. it learns the noise of the\n", + "training data.\n", + "We evaluate quantitatively overfitting and underfitting by using\n", + "cross-validation. We calculate the mean squared error (MSE) on the validation\n", + "set, the higher, the less likely the model generalizes correctly from the\n", + "training data." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "40385eb8", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "\n", + "\n", + "#print(__doc__)\n", + "\n", + "import numpy as np\n", + "import matplotlib.pyplot as plt\n", + "from sklearn.pipeline import Pipeline\n", + "from sklearn.preprocessing import PolynomialFeatures\n", + "from sklearn.linear_model import LinearRegression\n", + "from sklearn.model_selection import cross_val_score\n", + "\n", + "\n", + "def true_fun(X):\n", + " return np.cos(1.5 * np.pi * X)\n", + "\n", + "np.random.seed(0)\n", + "\n", + "n_samples = 30\n", + "degrees = [1, 4, 15]\n", + "\n", + "X = np.sort(np.random.rand(n_samples))\n", + "y = true_fun(X) + np.random.randn(n_samples) * 0.1\n", + "\n", + "plt.figure(figsize=(14, 5))\n", + "for i in range(len(degrees)):\n", + " ax = plt.subplot(1, len(degrees), i + 1)\n", + " plt.setp(ax, xticks=(), yticks=())\n", + "\n", + " polynomial_features = PolynomialFeatures(degree=degrees[i],\n", + " include_bias=False)\n", + " linear_regression = LinearRegression()\n", + " pipeline = Pipeline([(\"polynomial_features\", polynomial_features),\n", + " (\"linear_regression\", linear_regression)])\n", + " pipeline.fit(X[:, np.newaxis], y)\n", + "\n", + " # Evaluate the models using crossvalidation\n", + " scores = cross_val_score(pipeline, X[:, np.newaxis], y,\n", + " scoring=\"neg_mean_squared_error\", cv=10)\n", + "\n", + " X_test = np.linspace(0, 1, 100)\n", + " plt.plot(X_test, pipeline.predict(X_test[:, np.newaxis]), label=\"Model\")\n", + " plt.plot(X_test, true_fun(X_test), label=\"True function\")\n", + " plt.scatter(X, y, edgecolor='b', s=20, label=\"Samples\")\n", + " plt.xlabel(\"x\")\n", + " plt.ylabel(\"y\")\n", + " plt.xlim((0, 1))\n", + " plt.ylim((-2, 2))\n", + " plt.legend(loc=\"best\")\n", + " plt.title(\"Degree {}\\nMSE = {:.2e}(+/- {:.2e})\".format(\n", + " degrees[i], -scores.mean(), scores.std()))\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "a0c0d4df", + "metadata": { + "editable": true + }, + "source": [ + "## Various steps in cross-validation\n", + "\n", + "When the repetitive splitting of the data set is done randomly,\n", + "samples may accidently end up in a fast majority of the splits in\n", + "either training or test set. Such samples may have an unbalanced\n", + "influence on either model building or prediction evaluation. To avoid\n", + "this $k$-fold cross-validation structures the data splitting. The\n", + "samples are divided into $k$ more or less equally sized exhaustive and\n", + "mutually exclusive subsets. In turn (at each split) one of these\n", + "subsets plays the role of the test set while the union of the\n", + "remaining subsets constitutes the training set. Such a splitting\n", + "warrants a balanced representation of each sample in both training and\n", + "test set over the splits. Still the division into the $k$ subsets\n", + "involves a degree of randomness. This may be fully excluded when\n", + "choosing $k=n$. This particular case is referred to as leave-one-out\n", + "cross-validation (LOOCV)." + ] + }, + { + "cell_type": "markdown", + "id": "68d3e653", + "metadata": { + "editable": true + }, + "source": [ + "## Cross-validation in brief\n", + "\n", + "For the various values of $k$\n", + "\n", + "1. shuffle the dataset randomly.\n", + "\n", + "2. Split the dataset into $k$ groups.\n", + "\n", + "3. For each unique group:\n", + "\n", + "a. 
Decide which group to use as set for test data\n", + "\n", + "b. Take the remaining groups as a training data set\n", + "\n", + "c. Fit a model on the training set and evaluate it on the test set\n", + "\n", + "d. Retain the evaluation score and discard the model\n", + "\n", + "5. Summarize the model using the sample of model evaluation scores" + ] + }, + { + "cell_type": "markdown", + "id": "7f7a6350", + "metadata": { + "editable": true + }, + "source": [ + "## Code Example for Cross-validation and $k$-fold Cross-validation\n", + "\n", + "The code here uses Ridge regression with cross-validation (CV) resampling and $k$-fold CV in order to fit a specific polynomial." + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "23eef50b", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import numpy as np\n", + "import matplotlib.pyplot as plt\n", + "from sklearn.model_selection import KFold\n", + "from sklearn.linear_model import Ridge\n", + "from sklearn.model_selection import cross_val_score\n", + "from sklearn.preprocessing import PolynomialFeatures\n", + "\n", + "# A seed just to ensure that the random numbers are the same for every run.\n", + "# Useful for eventual debugging.\n", + "np.random.seed(3155)\n", + "\n", + "# Generate the data.\n", + "nsamples = 100\n", + "x = np.random.randn(nsamples)\n", + "y = 3*x**2 + np.random.randn(nsamples)\n", + "\n", + "## Cross-validation on Ridge regression using KFold only\n", + "\n", + "# Decide degree on polynomial to fit\n", + "poly = PolynomialFeatures(degree = 6)\n", + "\n", + "# Decide which values of lambda to use\n", + "nlambdas = 500\n", + "lambdas = np.logspace(-3, 5, nlambdas)\n", + "\n", + "# Initialize a KFold instance\n", + "k = 5\n", + "kfold = KFold(n_splits = k)\n", + "\n", + "# Perform the cross-validation to estimate MSE\n", + "scores_KFold = np.zeros((nlambdas, k))\n", + "\n", + "i = 0\n", + "for lmb in lambdas:\n", + " ridge = Ridge(alpha = lmb)\n", + " j = 0\n", + " for train_inds, test_inds in kfold.split(x):\n", + " xtrain = x[train_inds]\n", + " ytrain = y[train_inds]\n", + "\n", + " xtest = x[test_inds]\n", + " ytest = y[test_inds]\n", + "\n", + " Xtrain = poly.fit_transform(xtrain[:, np.newaxis])\n", + " ridge.fit(Xtrain, ytrain[:, np.newaxis])\n", + "\n", + " Xtest = poly.fit_transform(xtest[:, np.newaxis])\n", + " ypred = ridge.predict(Xtest)\n", + "\n", + " scores_KFold[i,j] = np.sum((ypred - ytest[:, np.newaxis])**2)/np.size(ypred)\n", + "\n", + " j += 1\n", + " i += 1\n", + "\n", + "\n", + "estimated_mse_KFold = np.mean(scores_KFold, axis = 1)\n", + "\n", + "## Cross-validation using cross_val_score from sklearn along with KFold\n", + "\n", + "# kfold is an instance initialized above as:\n", + "# kfold = KFold(n_splits = k)\n", + "\n", + "estimated_mse_sklearn = np.zeros(nlambdas)\n", + "i = 0\n", + "for lmb in lambdas:\n", + " ridge = Ridge(alpha = lmb)\n", + "\n", + " X = poly.fit_transform(x[:, np.newaxis])\n", + " estimated_mse_folds = cross_val_score(ridge, X, y[:, np.newaxis], scoring='neg_mean_squared_error', cv=kfold)\n", + "\n", + " # cross_val_score return an array containing the estimated negative mse for every fold.\n", + " # we have to the the mean of every array in order to get an estimate of the mse of the model\n", + " estimated_mse_sklearn[i] = np.mean(-estimated_mse_folds)\n", + "\n", + " i += 1\n", + "\n", + "## Plot and compare the slightly different ways to perform cross-validation\n", + "\n", + "plt.figure()\n", + "\n", + 
"plt.plot(np.log10(lambdas), estimated_mse_sklearn, label = 'cross_val_score')\n", + "#plt.plot(np.log10(lambdas), estimated_mse_KFold, 'r--', label = 'KFold')\n", + "\n", + "plt.xlabel('log10(lambda)')\n", + "plt.ylabel('mse')\n", + "\n", + "plt.legend()\n", + "\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "76662787", + "metadata": { + "editable": true + }, + "source": [ + "## More examples on bootstrap and cross-validation and errors" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "166cd085", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# Common imports\n", + "import os\n", + "import numpy as np\n", + "import pandas as pd\n", + "import matplotlib.pyplot as plt\n", + "from sklearn.linear_model import LinearRegression, Ridge, Lasso\n", + "from sklearn.model_selection import train_test_split\n", + "from sklearn.utils import resample\n", + "from sklearn.metrics import mean_squared_error\n", + "# Where to save the figures and data files\n", + "PROJECT_ROOT_DIR = \"Results\"\n", + "FIGURE_ID = \"Results/FigureFiles\"\n", + "DATA_ID = \"DataFiles/\"\n", + "\n", + "if not os.path.exists(PROJECT_ROOT_DIR):\n", + " os.mkdir(PROJECT_ROOT_DIR)\n", + "\n", + "if not os.path.exists(FIGURE_ID):\n", + " os.makedirs(FIGURE_ID)\n", + "\n", + "if not os.path.exists(DATA_ID):\n", + " os.makedirs(DATA_ID)\n", + "\n", + "def image_path(fig_id):\n", + " return os.path.join(FIGURE_ID, fig_id)\n", + "\n", + "def data_path(dat_id):\n", + " return os.path.join(DATA_ID, dat_id)\n", + "\n", + "def save_fig(fig_id):\n", + " plt.savefig(image_path(fig_id) + \".png\", format='png')\n", + "\n", + "infile = open(data_path(\"EoS.csv\"),'r')\n", + "\n", + "# Read the EoS data as csv file and organize the data into two arrays with density and energies\n", + "EoS = pd.read_csv(infile, names=('Density', 'Energy'))\n", + "EoS['Energy'] = pd.to_numeric(EoS['Energy'], errors='coerce')\n", + "EoS = EoS.dropna()\n", + "Energies = EoS['Energy']\n", + "Density = EoS['Density']\n", + "# The design matrix now as function of various polytrops\n", + "\n", + "Maxpolydegree = 30\n", + "X = np.zeros((len(Density),Maxpolydegree))\n", + "X[:,0] = 1.0\n", + "testerror = np.zeros(Maxpolydegree)\n", + "trainingerror = np.zeros(Maxpolydegree)\n", + "polynomial = np.zeros(Maxpolydegree)\n", + "\n", + "trials = 100\n", + "for polydegree in range(1, Maxpolydegree):\n", + " polynomial[polydegree] = polydegree\n", + " for degree in range(polydegree):\n", + " X[:,degree] = Density**(degree/3.0)\n", + "\n", + "# loop over trials in order to estimate the expectation value of the MSE\n", + " testerror[polydegree] = 0.0\n", + " trainingerror[polydegree] = 0.0\n", + " for samples in range(trials):\n", + " x_train, x_test, y_train, y_test = train_test_split(X, Energies, test_size=0.2)\n", + " model = LinearRegression(fit_intercept=False).fit(x_train, y_train)\n", + " ypred = model.predict(x_train)\n", + " ytilde = model.predict(x_test)\n", + " testerror[polydegree] += mean_squared_error(y_test, ytilde)\n", + " trainingerror[polydegree] += mean_squared_error(y_train, ypred) \n", + "\n", + " testerror[polydegree] /= trials\n", + " trainingerror[polydegree] /= trials\n", + " print(\"Degree of polynomial: %3d\"% polynomial[polydegree])\n", + " print(\"Mean squared error on training data: %.8f\" % trainingerror[polydegree])\n", + " print(\"Mean squared error on test data: %.8f\" % testerror[polydegree])\n", + "\n", + "plt.plot(polynomial, np.log10(trainingerror), 
label='Training Error')\n", + "plt.plot(polynomial, np.log10(testerror), label='Test Error')\n", + "plt.xlabel('Polynomial degree')\n", + "plt.ylabel('log10[MSE]')\n", + "plt.legend()\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "53dc97b8", + "metadata": { + "editable": true + }, + "source": [ + "Note that we kept the intercept column in the fitting here. This means that we need to set the **intercept** in the call to the **Scikit-Learn** function as **False**. Alternatively, we could have set up the design matrix $X$ without the first column of ones." + ] + }, + { + "cell_type": "markdown", + "id": "660084ab", + "metadata": { + "editable": true + }, + "source": [ + "## The same example but now with cross-validation\n", + "\n", + "In this example we keep the intercept column again but add cross-validation in order to estimate the best possible value of the means squared error." + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "5dd5aec2", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# Common imports\n", + "import os\n", + "import numpy as np\n", + "import pandas as pd\n", + "import matplotlib.pyplot as plt\n", + "from sklearn.linear_model import LinearRegression, Ridge, Lasso\n", + "from sklearn.metrics import mean_squared_error\n", + "from sklearn.model_selection import KFold\n", + "from sklearn.model_selection import cross_val_score\n", + "\n", + "\n", + "# Where to save the figures and data files\n", + "PROJECT_ROOT_DIR = \"Results\"\n", + "FIGURE_ID = \"Results/FigureFiles\"\n", + "DATA_ID = \"DataFiles/\"\n", + "\n", + "if not os.path.exists(PROJECT_ROOT_DIR):\n", + " os.mkdir(PROJECT_ROOT_DIR)\n", + "\n", + "if not os.path.exists(FIGURE_ID):\n", + " os.makedirs(FIGURE_ID)\n", + "\n", + "if not os.path.exists(DATA_ID):\n", + " os.makedirs(DATA_ID)\n", + "\n", + "def image_path(fig_id):\n", + " return os.path.join(FIGURE_ID, fig_id)\n", + "\n", + "def data_path(dat_id):\n", + " return os.path.join(DATA_ID, dat_id)\n", + "\n", + "def save_fig(fig_id):\n", + " plt.savefig(image_path(fig_id) + \".png\", format='png')\n", + "\n", + "infile = open(data_path(\"EoS.csv\"),'r')\n", + "\n", + "# Read the EoS data as csv file and organize the data into two arrays with density and energies\n", + "EoS = pd.read_csv(infile, names=('Density', 'Energy'))\n", + "EoS['Energy'] = pd.to_numeric(EoS['Energy'], errors='coerce')\n", + "EoS = EoS.dropna()\n", + "Energies = EoS['Energy']\n", + "Density = EoS['Density']\n", + "# The design matrix now as function of various polytrops\n", + "\n", + "Maxpolydegree = 30\n", + "X = np.zeros((len(Density),Maxpolydegree))\n", + "X[:,0] = 1.0\n", + "estimated_mse_sklearn = np.zeros(Maxpolydegree)\n", + "polynomial = np.zeros(Maxpolydegree)\n", + "k =5\n", + "kfold = KFold(n_splits = k)\n", + "\n", + "for polydegree in range(1, Maxpolydegree):\n", + " polynomial[polydegree] = polydegree\n", + " for degree in range(polydegree):\n", + " X[:,degree] = Density**(degree/3.0)\n", + " OLS = LinearRegression(fit_intercept=False)\n", + "# loop over trials in order to estimate the expectation value of the MSE\n", + " estimated_mse_folds = cross_val_score(OLS, X, Energies, scoring='neg_mean_squared_error', cv=kfold)\n", + "#[:, np.newaxis]\n", + " estimated_mse_sklearn[polydegree] = np.mean(-estimated_mse_folds)\n", + "\n", + "plt.plot(polynomial, np.log10(estimated_mse_sklearn), label='Test Error')\n", + "plt.xlabel('Polynomial degree')\n", + "plt.ylabel('log10[MSE]')\n", + 
"plt.legend()\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "2c1f6d4b", + "metadata": { + "editable": true + }, + "source": [ + "## Logistic Regression\n", + "\n", + "In linear regression our main interest was centered on learning the\n", + "coefficients of a functional fit (say a polynomial) in order to be\n", + "able to predict the response of a continuous variable on some unseen\n", + "data. The fit to the continuous variable $y_i$ is based on some\n", + "independent variables $\\boldsymbol{x}_i$. Linear regression resulted in\n", + "analytical expressions for standard ordinary Least Squares or Ridge\n", + "regression (in terms of matrices to invert) for several quantities,\n", + "ranging from the variance and thereby the confidence intervals of the\n", + "parameters $\\boldsymbol{\\theta}$ to the mean squared error. If we can invert\n", + "the product of the design matrices, linear regression gives then a\n", + "simple recipe for fitting our data." + ] + }, + { + "cell_type": "markdown", + "id": "149e92ec", + "metadata": { + "editable": true + }, + "source": [ + "## Classification problems\n", + "\n", + "Classification problems, however, are concerned with outcomes taking\n", + "the form of discrete variables (i.e. categories). We may for example,\n", + "on the basis of DNA sequencing for a number of patients, like to find\n", + "out which mutations are important for a certain disease; or based on\n", + "scans of various patients' brains, figure out if there is a tumor or\n", + "not; or given a specific physical system, we'd like to identify its\n", + "state, say whether it is an ordered or disordered system (typical\n", + "situation in solid state physics); or classify the status of a\n", + "patient, whether she/he has a stroke or not and many other similar\n", + "situations.\n", + "\n", + "The most common situation we encounter when we apply logistic\n", + "regression is that of two possible outcomes, normally denoted as a\n", + "binary outcome, true or false, positive or negative, success or\n", + "failure etc." + ] + }, + { + "cell_type": "markdown", + "id": "ce85cd3a", + "metadata": { + "editable": true + }, + "source": [ + "## Optimization and Deep learning\n", + "\n", + "Logistic regression will also serve as our stepping stone towards\n", + "neural network algorithms and supervised deep learning. For logistic\n", + "learning, the minimization of the cost function leads to a non-linear\n", + "equation in the parameters $\\boldsymbol{\\theta}$. The optimization of the\n", + "problem calls therefore for minimization algorithms. This forms the\n", + "bottle neck of all machine learning algorithms, namely how to find\n", + "reliable minima of a multi-variable function. This leads us to the\n", + "family of gradient descent methods. The latter are the working horses\n", + "of basically all modern machine learning algorithms.\n", + "\n", + "We note also that many of the topics discussed here on logistic \n", + "regression are also commonly used in modern supervised Deep Learning\n", + "models, as we will see later." + ] + }, + { + "cell_type": "markdown", + "id": "2eb9e687", + "metadata": { + "editable": true + }, + "source": [ + "## Basics\n", + "\n", + "We consider the case where the outputs/targets, also called the\n", + "responses or the outcomes, $y_i$ are discrete and only take values\n", + "from $k=0,\\dots,K-1$ (i.e. 
$K$ classes).\n", + "\n", + "The goal is to predict the\n", + "output classes from the design matrix $\\boldsymbol{X}\\in\\mathbb{R}^{n\\times p}$\n", + "made of $n$ samples, each of which carries $p$ features or predictors. The\n", + "primary goal is to identify the classes to which new unseen samples\n", + "belong.\n", + "\n", + "Let us specialize to the case of two classes only, with outputs\n", + "$y_i=0$ and $y_i=1$. Our outcomes could represent the status of a\n", + "credit card user that could default or not on her/his credit card\n", + "debt. That is" + ] + }, + { + "cell_type": "markdown", + "id": "9b8b7d05", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y_i = \\begin{bmatrix} 0 & \\mathrm{no}\\\\ 1 & \\mathrm{yes} \\end{bmatrix}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "7db50d1a", + "metadata": { + "editable": true + }, + "source": [ + "## Linear classifier\n", + "\n", + "Before moving to the logistic model, let us try to use our linear\n", + "regression model to classify these two outcomes. We could for example\n", + "fit a linear model to the default case if $y_i > 0.5$ and the no\n", + "default case $y_i \\leq 0.5$.\n", + "\n", + "We would then have our \n", + "weighted linear combination, namely" + ] + }, + { + "cell_type": "markdown", + "id": "a78fc346", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation}\n", + "\\boldsymbol{y} = \\boldsymbol{X}^T\\boldsymbol{\\theta} + \\boldsymbol{\\epsilon},\n", + "\\label{_auto1} \\tag{1}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "661d8faf", + "metadata": { + "editable": true + }, + "source": [ + "where $\\boldsymbol{y}$ is a vector representing the possible outcomes, $\\boldsymbol{X}$ is our\n", + "$n\\times p$ design matrix and $\\boldsymbol{\\theta}$ represents our estimators/predictors." + ] + }, + { + "cell_type": "markdown", + "id": "8620ba1b", + "metadata": { + "editable": true + }, + "source": [ + "## Some selected properties\n", + "\n", + "The main problem with our function is that it takes values on the\n", + "entire real axis. In the case of logistic regression, however, the\n", + "labels $y_i$ are discrete variables. A typical example is the credit\n", + "card data discussed below here, where we can set the state of\n", + "defaulting the debt to $y_i=1$ and not to $y_i=0$ for one the persons\n", + "in the data set (see the full example below).\n", + "\n", + "One simple way to get a discrete output is to have sign\n", + "functions that map the output of a linear regressor to values $\\{0,1\\}$,\n", + "$f(s_i)=sign(s_i)=1$ if $s_i\\ge 0$ and 0 if otherwise. \n", + "We will encounter this model in our first demonstration of neural networks.\n", + "\n", + "Historically it is called the **perceptron** model in the machine learning\n", + "literature. This model is extremely simple. However, in many cases it is more\n", + "favorable to use a ``soft\" classifier that outputs\n", + "the probability of a given category. This leads us to the logistic function." + ] + }, + { + "cell_type": "markdown", + "id": "8fdbebd2", + "metadata": { + "editable": true + }, + "source": [ + "## Simple example\n", + "\n", + "The following example on data for coronary heart disease (CHD) as function of age may serve as an illustration. In the code here we read and plot whether a person has had CHD (output = 1) or not (output = 0). This ouput is plotted the person's against age. Clearly, the figure shows that attempting to make a standard linear regression fit may not be very meaningful." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "8dc64aeb", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# Common imports\n", + "import os\n", + "import numpy as np\n", + "import pandas as pd\n", + "import matplotlib.pyplot as plt\n", + "from sklearn.linear_model import LinearRegression, Ridge, Lasso\n", + "from sklearn.model_selection import train_test_split\n", + "from sklearn.utils import resample\n", + "from sklearn.metrics import mean_squared_error\n", + "from IPython.display import display\n", + "from pylab import plt, mpl\n", + "mpl.rcParams['font.family'] = 'serif'\n", + "\n", + "# Where to save the figures and data files\n", + "PROJECT_ROOT_DIR = \"Results\"\n", + "FIGURE_ID = \"Results/FigureFiles\"\n", + "DATA_ID = \"DataFiles/\"\n", + "\n", + "if not os.path.exists(PROJECT_ROOT_DIR):\n", + " os.mkdir(PROJECT_ROOT_DIR)\n", + "\n", + "if not os.path.exists(FIGURE_ID):\n", + " os.makedirs(FIGURE_ID)\n", + "\n", + "if not os.path.exists(DATA_ID):\n", + " os.makedirs(DATA_ID)\n", + "\n", + "def image_path(fig_id):\n", + " return os.path.join(FIGURE_ID, fig_id)\n", + "\n", + "def data_path(dat_id):\n", + " return os.path.join(DATA_ID, dat_id)\n", + "\n", + "def save_fig(fig_id):\n", + " plt.savefig(image_path(fig_id) + \".png\", format='png')\n", + "\n", + "infile = open(data_path(\"chddata.csv\"),'r')\n", + "\n", + "# Read the chd data as csv file and organize the data into arrays with age group, age, and chd\n", + "chd = pd.read_csv(infile, names=('ID', 'Age', 'Agegroup', 'CHD'))\n", + "chd.columns = ['ID', 'Age', 'Agegroup', 'CHD']\n", + "output = chd['CHD']\n", + "age = chd['Age']\n", + "agegroup = chd['Agegroup']\n", + "numberID = chd['ID'] \n", + "display(chd)\n", + "\n", + "plt.scatter(age, output, marker='o')\n", + "plt.axis([18,70.0,-0.1, 1.2])\n", + "plt.xlabel(r'Age')\n", + "plt.ylabel(r'CHD')\n", + "plt.title(r'Age distribution and Coronary heart disease')\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "40385068", + "metadata": { + "editable": true + }, + "source": [ + "## Plotting the mean value for each group\n", + "\n", + "What we could attempt however is to plot the mean value for each group." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "a473659b", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "agegroupmean = np.array([0.1, 0.133, 0.250, 0.333, 0.462, 0.625, 0.765, 0.800])\n", + "group = np.array([1, 2, 3, 4, 5, 6, 7, 8])\n", + "plt.plot(group, agegroupmean, \"r-\")\n", + "plt.axis([0,9,0, 1.0])\n", + "plt.xlabel(r'Age group')\n", + "plt.ylabel(r'CHD mean values')\n", + "plt.title(r'Mean values for each age group')\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "3e2ab512", + "metadata": { + "editable": true + }, + "source": [ + "We are now trying to find a function $f(y\\vert x)$, that is a function which gives us an expected value for the output $y$ with a given input $x$.\n", + "In standard linear regression with a linear dependence on $x$, we would write this in terms of our model" + ] + }, + { + "cell_type": "markdown", + "id": "40361f1b", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "f(y_i\\vert x_i)=\\theta_0+\\theta_1 x_i.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "a1b379fb", + "metadata": { + "editable": true + }, + "source": [ + "This expression implies however that $f(y_i\\vert x_i)$ could take any\n", + "value from minus infinity to plus infinity. If we however let\n", + "$f(y\\vert y)$ be represented by the mean value, the above example\n", + "shows us that we can constrain the function to take values between\n", + "zero and one, that is we have $0 \\le f(y_i\\vert x_i) \\le 1$. Looking\n", + "at our last curve we see also that it has an S-shaped form. This leads\n", + "us to a very popular model for the function $f$, namely the so-called\n", + "Sigmoid function or logistic model. We will consider this function as\n", + "representing the probability for finding a value of $y_i$ with a given\n", + "$x_i$." + ] + }, + { + "cell_type": "markdown", + "id": "bcbf3d2b", + "metadata": { + "editable": true + }, + "source": [ + "## The logistic function\n", + "\n", + "Another widely studied model, is the so-called \n", + "perceptron model, which is an example of a \"hard classification\" model. We\n", + "will encounter this model when we discuss neural networks as\n", + "well. Each datapoint is deterministically assigned to a category (i.e\n", + "$y_i=0$ or $y_i=1$). In many cases, and the coronary heart disease data forms one of many such examples, it is favorable to have a \"soft\"\n", + "classifier that outputs the probability of a given category rather\n", + "than a single value. For example, given $x_i$, the classifier\n", + "outputs the probability of being in a category $k$. Logistic regression\n", + "is the most common example of a so-called soft classifier. In logistic\n", + "regression, the probability that a data point $x_i$\n", + "belongs to a category $y_i=\\{0,1\\}$ is given by the so-called logit function (or Sigmoid) which is meant to represent the likelihood for a given event," + ] + }, + { + "cell_type": "markdown", + "id": "38918f44", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "p(t) = \\frac{1}{1+\\mathrm \\exp{-t}}=\\frac{\\exp{t}}{1+\\mathrm \\exp{t}}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "fd225d0f", + "metadata": { + "editable": true + }, + "source": [ + "Note that $1-p(t)= p(-t)$." 
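+ "\n",
+ "As a quick numerical sanity check of this identity, the minimal sketch below evaluates both sides on a grid of $t$-values.\n",
+ "\n",
+ "```python\n",
+ "import numpy as np\n",
+ "\n",
+ "def p(t):\n",
+ "    # Logistic (sigmoid) function\n",
+ "    return 1.0 / (1.0 + np.exp(-t))\n",
+ "\n",
+ "t = np.linspace(-5, 5, 11)\n",
+ "print(np.allclose(1.0 - p(t), p(-t)))  # True: 1 - p(t) = p(-t)\n",
+ "```\n"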
+ ] + }, + { + "cell_type": "markdown", + "id": "d340b5c1", + "metadata": { + "editable": true + }, + "source": [ + "## Examples of likelihood functions used in logistic regression and nueral networks\n", + "\n", + "The following code plots the logistic function, the step function and other functions we will encounter from here and on." + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "357d6f03", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "\"\"\"The sigmoid function (or the logistic curve) is a\n", + "function that takes any real number, z, and outputs a number (0,1).\n", + "It is useful in neural networks for assigning weights on a relative scale.\n", + "The value z is the weighted sum of parameters involved in the learning algorithm.\"\"\"\n", + "\n", + "import numpy\n", + "import matplotlib.pyplot as plt\n", + "import math as mt\n", + "\n", + "z = numpy.arange(-5, 5, .1)\n", + "sigma_fn = numpy.vectorize(lambda z: 1/(1+numpy.exp(-z)))\n", + "sigma = sigma_fn(z)\n", + "\n", + "fig = plt.figure()\n", + "ax = fig.add_subplot(111)\n", + "ax.plot(z, sigma)\n", + "ax.set_ylim([-0.1, 1.1])\n", + "ax.set_xlim([-5,5])\n", + "ax.grid(True)\n", + "ax.set_xlabel('z')\n", + "ax.set_title('sigmoid function')\n", + "\n", + "plt.show()\n", + "\n", + "\"\"\"Step Function\"\"\"\n", + "z = numpy.arange(-5, 5, .02)\n", + "step_fn = numpy.vectorize(lambda z: 1.0 if z >= 0.0 else 0.0)\n", + "step = step_fn(z)\n", + "\n", + "fig = plt.figure()\n", + "ax = fig.add_subplot(111)\n", + "ax.plot(z, step)\n", + "ax.set_ylim([-0.5, 1.5])\n", + "ax.set_xlim([-5,5])\n", + "ax.grid(True)\n", + "ax.set_xlabel('z')\n", + "ax.set_title('step function')\n", + "\n", + "plt.show()\n", + "\n", + "\"\"\"tanh Function\"\"\"\n", + "z = numpy.arange(-2*mt.pi, 2*mt.pi, 0.1)\n", + "t = numpy.tanh(z)\n", + "\n", + "fig = plt.figure()\n", + "ax = fig.add_subplot(111)\n", + "ax.plot(z, t)\n", + "ax.set_ylim([-1.0, 1.0])\n", + "ax.set_xlim([-2*mt.pi,2*mt.pi])\n", + "ax.grid(True)\n", + "ax.set_xlabel('z')\n", + "ax.set_title('tanh function')\n", + "\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "8be63821", + "metadata": { + "editable": true + }, + "source": [ + "## Two parameters\n", + "\n", + "We assume now that we have two classes with $y_i$ either $0$ or $1$. Furthermore we assume also that we have only two parameters $\\theta$ in our fitting of the Sigmoid function, that is we define probabilities" + ] + }, + { + "cell_type": "markdown", + "id": "f79d930e", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{align*}\n", + "p(y_i=1|x_i,\\boldsymbol{\\theta}) &= \\frac{\\exp{(\\theta_0+\\theta_1x_i)}}{1+\\exp{(\\theta_0+\\theta_1x_i)}},\\nonumber\\\\\n", + "p(y_i=0|x_i,\\boldsymbol{\\theta}) &= 1 - p(y_i=1|x_i,\\boldsymbol{\\theta}),\n", + "\\end{align*}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "8a758aae", + "metadata": { + "editable": true + }, + "source": [ + "where $\\boldsymbol{\\theta}$ are the weights we wish to extract from data, in our case $\\theta_0$ and $\\theta_1$. 
\n", + "\n", + "Note that we used" + ] + }, + { + "cell_type": "markdown", + "id": "88159170", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "p(y_i=0\\vert x_i, \\boldsymbol{\\theta}) = 1-p(y_i=1\\vert x_i, \\boldsymbol{\\theta}).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "f9972402", + "metadata": { + "editable": true + }, + "source": [ + "## Maximum likelihood\n", + "\n", + "In order to define the total likelihood for all possible outcomes from a \n", + "dataset $\\mathcal{D}=\\{(y_i,x_i)\\}$, with the binary labels\n", + "$y_i\\in\\{0,1\\}$ and where the data points are drawn independently, we use the so-called [Maximum Likelihood Estimation](https://en.wikipedia.org/wiki/Maximum_likelihood_estimation) (MLE) principle. \n", + "We aim thus at maximizing \n", + "the probability of seeing the observed data. We can then approximate the \n", + "likelihood in terms of the product of the individual probabilities of a specific outcome $y_i$, that is" + ] + }, + { + "cell_type": "markdown", + "id": "949524d2", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{align*}\n", + "P(\\mathcal{D}|\\boldsymbol{\\theta})& = \\prod_{i=1}^n \\left[p(y_i=1|x_i,\\boldsymbol{\\theta})\\right]^{y_i}\\left[1-p(y_i=1|x_i,\\boldsymbol{\\theta}))\\right]^{1-y_i}\\nonumber \\\\\n", + "\\end{align*}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "d9a7fded", + "metadata": { + "editable": true + }, + "source": [ + "from which we obtain the log-likelihood and our **cost/loss** function" + ] + }, + { + "cell_type": "markdown", + "id": "4c5f78fb", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathcal{C}(\\boldsymbol{\\theta}) = \\sum_{i=1}^n \\left( y_i\\log{p(y_i=1|x_i,\\boldsymbol{\\theta})} + (1-y_i)\\log\\left[1-p(y_i=1|x_i,\\boldsymbol{\\theta}))\\right]\\right).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "5ccce506", + "metadata": { + "editable": true + }, + "source": [ + "## The cost function rewritten\n", + "\n", + "Reordering the logarithms, we can rewrite the **cost/loss** function as" + ] + }, + { + "cell_type": "markdown", + "id": "bf58bb76", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathcal{C}(\\boldsymbol{\\theta}) = \\sum_{i=1}^n \\left(y_i(\\theta_0+\\theta_1x_i) -\\log{(1+\\exp{(\\theta_0+\\theta_1x_i)})}\\right).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "41543ca6", + "metadata": { + "editable": true + }, + "source": [ + "The maximum likelihood estimator is defined as the set of parameters that maximize the log-likelihood where we maximize with respect to $\\theta$.\n", + "Since the cost (error) function is just the negative log-likelihood, for logistic regression we have that" + ] + }, + { + "cell_type": "markdown", + "id": "e664b57a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathcal{C}(\\boldsymbol{\\theta})=-\\sum_{i=1}^n \\left(y_i(\\theta_0+\\theta_1x_i) -\\log{(1+\\exp{(\\theta_0+\\theta_1x_i)})}\\right).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "eb357503", + "metadata": { + "editable": true + }, + "source": [ + "This equation is known in statistics as the **cross entropy**. Finally, we note that just as in linear regression, \n", + "in practice we often supplement the cross-entropy with additional regularization terms, usually $L_1$ and $L_2$ regularization as we did for Ridge and Lasso regression." 
+ ] + }, + { + "cell_type": "markdown", + "id": "e388ad02", + "metadata": { + "editable": true + }, + "source": [ + "## Minimizing the cross entropy\n", + "\n", + "The cross entropy is a convex function of the weights $\\boldsymbol{\\theta}$ and,\n", + "therefore, any local minimizer is a global minimizer. \n", + "\n", + "Minimizing this\n", + "cost function with respect to the two parameters $\\theta_0$ and $\\theta_1$ we obtain" + ] + }, + { + "cell_type": "markdown", + "id": "1d4f2850", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial \\mathcal{C}(\\boldsymbol{\\theta})}{\\partial \\theta_0} = -\\sum_{i=1}^n \\left(y_i -\\frac{\\exp{(\\theta_0+\\theta_1x_i)}}{1+\\exp{(\\theta_0+\\theta_1x_i)}}\\right),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "68a0c133", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "c942a72b", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial \\mathcal{C}(\\boldsymbol{\\theta})}{\\partial \\theta_1} = -\\sum_{i=1}^n \\left(y_ix_i -x_i\\frac{\\exp{(\\theta_0+\\theta_1x_i)}}{1+\\exp{(\\theta_0+\\theta_1x_i)}}\\right).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "42caf6db", + "metadata": { + "editable": true + }, + "source": [ + "## A more compact expression\n", + "\n", + "Let us now define a vector $\\boldsymbol{y}$ with $n$ elements $y_i$, an\n", + "$n\\times p$ matrix $\\boldsymbol{X}$ which contains the $x_i$ values and a\n", + "vector $\\boldsymbol{p}$ of fitted probabilities $p(y_i\\vert x_i,\\boldsymbol{\\theta})$. We can rewrite in a more compact form the first\n", + "derivative of the cost function as" + ] + }, + { + "cell_type": "markdown", + "id": "22cd94c9", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial \\mathcal{C}(\\boldsymbol{\\theta})}{\\partial \\boldsymbol{\\theta}} = -\\boldsymbol{X}^T\\left(\\boldsymbol{y}-\\boldsymbol{p}\\right).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "d9428067", + "metadata": { + "editable": true + }, + "source": [ + "If we in addition define a diagonal matrix $\\boldsymbol{W}$ with elements \n", + "$p(y_i\\vert x_i,\\boldsymbol{\\theta})(1-p(y_i\\vert x_i,\\boldsymbol{\\theta})$, we can obtain a compact expression of the second derivative as" + ] + }, + { + "cell_type": "markdown", + "id": "29178d5a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial^2 \\mathcal{C}(\\boldsymbol{\\theta})}{\\partial \\boldsymbol{\\theta}\\partial \\boldsymbol{\\theta}^T} = \\boldsymbol{X}^T\\boldsymbol{W}\\boldsymbol{X}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "6b7671ad", + "metadata": { + "editable": true + }, + "source": [ + "## Extending to more predictors\n", + "\n", + "Within a binary classification problem, we can easily expand our model to include multiple predictors. 
Our ratio between likelihoods is then with $p$ predictors" + ] + }, + { + "cell_type": "markdown", + "id": "500b6574", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\log{ \\frac{p(\\boldsymbol{\\theta}\\boldsymbol{x})}{1-p(\\boldsymbol{\\theta}\\boldsymbol{x})}} = \\theta_0+\\theta_1x_1+\\theta_2x_2+\\dots+\\theta_px_p.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "cf0b50ce", + "metadata": { + "editable": true + }, + "source": [ + "Here we defined $\\boldsymbol{x}=[1,x_1,x_2,\\dots,x_p]$ and $\\boldsymbol{\\theta}=[\\theta_0, \\theta_1, \\dots, \\theta_p]$ leading to" + ] + }, + { + "cell_type": "markdown", + "id": "537486ee", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "p(\\boldsymbol{\\theta}\\boldsymbol{x})=\\frac{ \\exp{(\\theta_0+\\theta_1x_1+\\theta_2x_2+\\dots+\\theta_px_p)}}{1+\\exp{(\\theta_0+\\theta_1x_1+\\theta_2x_2+\\dots+\\theta_px_p)}}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "534fb571", + "metadata": { + "editable": true + }, + "source": [ + "## Including more classes\n", + "\n", + "Till now we have mainly focused on two classes, the so-called binary\n", + "system. Suppose we wish to extend to $K$ classes. Let us for the sake\n", + "of simplicity assume we have only two predictors. We have then following model" + ] + }, + { + "cell_type": "markdown", + "id": "fa7ca275", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\log{\\frac{p(C=1\\vert x)}{p(K\\vert x)}} = \\theta_{10}+\\theta_{11}x_1,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "cc765c0e", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "2c43387d", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\log{\\frac{p(C=2\\vert x)}{p(K\\vert x)}} = \\theta_{20}+\\theta_{21}x_1,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "e063f183", + "metadata": { + "editable": true + }, + "source": [ + "and so on till the class $C=K-1$ class" + ] + }, + { + "cell_type": "markdown", + "id": "060fa00c", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\log{\\frac{p(C=K-1\\vert x)}{p(K\\vert x)}} = \\theta_{(K-1)0}+\\theta_{(K-1)1}x_1,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b9034492", + "metadata": { + "editable": true + }, + "source": [ + "and the model is specified in term of $K-1$ so-called log-odds or\n", + "**logit** transformations." + ] + }, + { + "cell_type": "markdown", + "id": "b7fba1fc", + "metadata": { + "editable": true + }, + "source": [ + "## More classes\n", + "\n", + "In our discussion of neural networks we will encounter the above again\n", + "in terms of a slightly modified function, the so-called **Softmax** function.\n", + "\n", + "The softmax function is used in various multiclass classification\n", + "methods, such as multinomial logistic regression (also known as\n", + "softmax regression), multiclass linear discriminant analysis, naive\n", + "Bayes classifiers, and artificial neural networks. 
Specifically, in\n", + "multinomial logistic regression and linear discriminant analysis, the\n", + "input to the function is the result of $K$ distinct linear functions,\n", + "and the predicted probability for the $k$-th class given a sample\n", + "vector $\\boldsymbol{x}$ and a weighting vector $\\boldsymbol{\\theta}$ is (with two\n", + "predictors):" + ] + }, + { + "cell_type": "markdown", + "id": "a8346f86", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "p(C=k\\vert \\mathbf {x} )=\\frac{\\exp{(\\theta_{k0}+\\theta_{k1}x_1)}}{1+\\sum_{l=1}^{K-1}\\exp{(\\theta_{l0}+\\theta_{l1}x_1)}}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b05e18eb", + "metadata": { + "editable": true + }, + "source": [ + "It is easy to extend to more predictors. The final class is" + ] + }, + { + "cell_type": "markdown", + "id": "3bff89b1", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "p(C=K\\vert \\mathbf {x} )=\\frac{1}{1+\\sum_{l=1}^{K-1}\\exp{(\\theta_{l0}+\\theta_{l1}x_1)}},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "e89e832c", + "metadata": { + "editable": true + }, + "source": [ + "and they sum to one. Our earlier discussions were all specialized to\n", + "the case with two classes only. It is easy to see from the above that\n", + "what we derived earlier is compatible with these equations.\n", + "\n", + "To find the optimal parameters we would typically use a gradient\n", + "descent method. Newton's method and gradient descent methods are\n", + "discussed in the material on [optimization\n", + "methods](https://compphysics.github.io/MachineLearning/doc/pub/Splines/html/Splines-bs.html)." + ] + }, + { + "cell_type": "markdown", + "id": "464d4933", + "metadata": { + "editable": true + }, + "source": [ + "## Optimization, the central part of any Machine Learning algortithm\n", + "\n", + "Almost every problem in machine learning and data science starts with\n", + "a dataset $X$, a model $g(\\theta)$, which is a function of the\n", + "parameters $\\theta$ and a cost function $C(X, g(\\theta))$ that allows\n", + "us to judge how well the model $g(\\theta)$ explains the observations\n", + "$X$. The model is fit by finding the values of $\\theta$ that minimize\n", + "the cost function. Ideally we would be able to solve for $\\theta$\n", + "analytically, however this is not possible in general and we must use\n", + "some approximative/numerical method to compute the minimum." + ] + }, + { + "cell_type": "markdown", + "id": "c707d4a0", + "metadata": { + "editable": true + }, + "source": [ + "## Revisiting our Logistic Regression case\n", + "\n", + "In our discussion on Logistic Regression we studied the \n", + "case of\n", + "two classes, with $y_i$ either\n", + "$0$ or $1$. Furthermore we assumed also that we have only two\n", + "parameters $\\theta$ in our fitting, that is we\n", + "defined probabilities" + ] + }, + { + "cell_type": "markdown", + "id": "3f00d244", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{align*}\n", + "p(y_i=1|x_i,\\boldsymbol{\\theta}) &= \\frac{\\exp{(\\theta_0+\\theta_1x_i)}}{1+\\exp{(\\theta_0+\\theta_1x_i)}},\\nonumber\\\\\n", + "p(y_i=0|x_i,\\boldsymbol{\\theta}) &= 1 - p(y_i=1|x_i,\\boldsymbol{\\theta}),\n", + "\\end{align*}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "2d239661", + "metadata": { + "editable": true + }, + "source": [ + "where $\\boldsymbol{\\theta}$ are the weights we wish to extract from data, in our case $\\theta_0$ and $\\theta_1$." 
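+ "\n",
+ "As a brief numerical aside on the softmax probabilities defined above, the sketch below (with made-up parameter values $\theta_{k0}$ and $\theta_{k1}$ for $K=4$ classes and one predictor) checks that the $K-1$ class probabilities together with the final class sum to one.\n",
+ "\n",
+ "```python\n",
+ "import numpy as np\n",
+ "\n",
+ "# Made-up parameters theta_{k0}, theta_{k1} for the first K-1 = 3 classes (one predictor x1)\n",
+ "theta = np.array([[ 0.5, -1.0],\n",
+ "                  [ 0.2,  0.3],\n",
+ "                  [-0.7,  0.8]])\n",
+ "x1 = 1.5\n",
+ "\n",
+ "z = theta[:, 0] + theta[:, 1]*x1      # the K-1 linear functions\n",
+ "denom = 1.0 + np.sum(np.exp(z))\n",
+ "p = np.exp(z)/denom                   # classes k = 1, ..., K-1\n",
+ "p_K = 1.0/denom                       # the final class K\n",
+ "print(p, p_K, p.sum() + p_K)          # the probabilities sum to one\n",
+ "```\n"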
+ ] + }, + { + "cell_type": "markdown", + "id": "4243778f", + "metadata": { + "editable": true + }, + "source": [ + "## The equations to solve\n", + "\n", + "Our compact equations used a definition of a vector $\\boldsymbol{y}$ with $n$\n", + "elements $y_i$, an $n\\times p$ matrix $\\boldsymbol{X}$ which contains the\n", + "$x_i$ values and a vector $\\boldsymbol{p}$ of fitted probabilities\n", + "$p(y_i\\vert x_i,\\boldsymbol{\\theta})$. We rewrote in a more compact form\n", + "the first derivative of the cost function as" + ] + }, + { + "cell_type": "markdown", + "id": "21ce04bb", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial \\mathcal{C}(\\boldsymbol{\\theta})}{\\partial \\boldsymbol{\\theta}} = -\\boldsymbol{X}^T\\left(\\boldsymbol{y}-\\boldsymbol{p}\\right).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b854153c", + "metadata": { + "editable": true + }, + "source": [ + "If we in addition define a diagonal matrix $\\boldsymbol{W}$ with elements \n", + "$p(y_i\\vert x_i,\\boldsymbol{\\theta})(1-p(y_i\\vert x_i,\\boldsymbol{\\theta})$, we can obtain a compact expression of the second derivative as" + ] + }, + { + "cell_type": "markdown", + "id": "235c9b1d", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial^2 \\mathcal{C}(\\boldsymbol{\\theta})}{\\partial \\boldsymbol{\\theta}\\partial \\boldsymbol{\\theta}^T} = \\boldsymbol{X}^T\\boldsymbol{W}\\boldsymbol{X}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "1651fe82", + "metadata": { + "editable": true + }, + "source": [ + "This defines what is called the Hessian matrix." + ] + }, + { + "cell_type": "markdown", + "id": "f36a8c94", + "metadata": { + "editable": true + }, + "source": [ + "## Solving using Newton-Raphson's method\n", + "\n", + "If we can set up these equations, Newton-Raphson's iterative method is normally the method of choice. It requires however that we can compute in an efficient way the matrices that define the first and second derivatives. \n", + "\n", + "Our iterative scheme is then given by" + ] + }, + { + "cell_type": "markdown", + "id": "438b5efe", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{\\theta}^{\\mathrm{new}} = \\boldsymbol{\\theta}^{\\mathrm{old}}-\\left(\\frac{\\partial^2 \\mathcal{C}(\\boldsymbol{\\theta})}{\\partial \\boldsymbol{\\theta}\\partial \\boldsymbol{\\theta}^T}\\right)^{-1}_{\\boldsymbol{\\theta}^{\\mathrm{old}}}\\times \\left(\\frac{\\partial \\mathcal{C}(\\boldsymbol{\\theta})}{\\partial \\boldsymbol{\\theta}}\\right)_{\\boldsymbol{\\theta}^{\\mathrm{old}}},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "f3ae8207", + "metadata": { + "editable": true + }, + "source": [ + "or in matrix form as" + ] + }, + { + "cell_type": "markdown", + "id": "702a38c4", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{\\theta}^{\\mathrm{new}} = \\boldsymbol{\\theta}^{\\mathrm{old}}-\\left(\\boldsymbol{X}^T\\boldsymbol{W}\\boldsymbol{X} \\right)^{-1}\\times \\left(-\\boldsymbol{X}^T(\\boldsymbol{y}-\\boldsymbol{p}) \\right)_{\\boldsymbol{\\theta}^{\\mathrm{old}}}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "43b5a9ab", + "metadata": { + "editable": true + }, + "source": [ + "The right-hand side is computed with the old values of $\\theta$. \n", + "\n", + "If we can compute these matrices, in particular the Hessian, the above is often the easiest method to implement." 
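+ "\n",
+ "Below is a minimal from-scratch sketch of this Newton-Raphson scheme for the two-parameter logistic model, using synthetic data generated for illustration; it complements the gradient-descent based class in the next code example.\n",
+ "\n",
+ "```python\n",
+ "import numpy as np\n",
+ "\n",
+ "# Synthetic data for p(y=1|x) with t = theta0 + theta1*x\n",
+ "rng = np.random.default_rng(3155)\n",
+ "n = 200\n",
+ "x = rng.standard_normal(n)\n",
+ "theta_true = np.array([0.5, 2.0])\n",
+ "p_true = 1.0 / (1.0 + np.exp(-(theta_true[0] + theta_true[1]*x)))\n",
+ "y = (rng.uniform(size=n) < p_true).astype(float)\n",
+ "\n",
+ "X = np.column_stack((np.ones(n), x))   # design matrix with intercept column\n",
+ "theta = np.zeros(2)\n",
+ "\n",
+ "for iteration in range(10):\n",
+ "    p = 1.0 / (1.0 + np.exp(-X @ theta))\n",
+ "    gradient = -X.T @ (y - p)          # first derivative of the cost function\n",
+ "    W = np.diag(p*(1.0 - p))\n",
+ "    hessian = X.T @ W @ X              # second derivative (Hessian matrix)\n",
+ "    theta = theta - np.linalg.solve(hessian, gradient)\n",
+ "\n",
+ "print('Estimated theta:', theta)\n",
+ "print('True theta     :', theta_true)\n",
+ "```\n",
+ "\n",
+ "Note that the Newton step solves the linear system with the Hessian instead of inverting it explicitly; if the classes are (almost) linearly separable the Hessian becomes ill-conditioned and gradient descent methods are often the safer choice.\n"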
+ ] + }, + { + "cell_type": "markdown", + "id": "5b579d10", + "metadata": { + "editable": true + }, + "source": [ + "## Example code for Logistic Regression\n", + "\n", + "Here we make a class for Logistic regression. The code uses a simple data set and includes both a binary case and a multiclass case." + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "id": "a59b8c77", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import numpy as np\n", + "\n", + "class LogisticRegression:\n", + " \"\"\"\n", + " Logistic Regression for binary and multiclass classification.\n", + " \"\"\"\n", + " def __init__(self, lr=0.01, epochs=1000, fit_intercept=True, verbose=False):\n", + " self.lr = lr # Learning rate for gradient descent\n", + " self.epochs = epochs # Number of iterations\n", + " self.fit_intercept = fit_intercept # Whether to add intercept (bias)\n", + " self.verbose = verbose # Print loss during training if True\n", + " self.weights = None\n", + " self.multi_class = False # Will be determined at fit time\n", + "\n", + " def _add_intercept(self, X):\n", + " \"\"\"Add intercept term (column of ones) to feature matrix.\"\"\"\n", + " intercept = np.ones((X.shape[0], 1))\n", + " return np.concatenate((intercept, X), axis=1)\n", + "\n", + " def _sigmoid(self, z):\n", + " \"\"\"Sigmoid function for binary logistic.\"\"\"\n", + " return 1 / (1 + np.exp(-z))\n", + "\n", + " def _softmax(self, Z):\n", + " \"\"\"Softmax function for multiclass logistic.\"\"\"\n", + " exp_Z = np.exp(Z - np.max(Z, axis=1, keepdims=True))\n", + " return exp_Z / np.sum(exp_Z, axis=1, keepdims=True)\n", + "\n", + " def fit(self, X, y):\n", + " \"\"\"\n", + " Train the logistic regression model using gradient descent.\n", + " Supports binary (sigmoid) and multiclass (softmax) based on y.\n", + " \"\"\"\n", + " X = np.array(X)\n", + " y = np.array(y)\n", + " n_samples, n_features = X.shape\n", + "\n", + " # Add intercept if needed\n", + " if self.fit_intercept:\n", + " X = self._add_intercept(X)\n", + " n_features += 1\n", + "\n", + " # Determine classes and mode (binary vs multiclass)\n", + " unique_classes = np.unique(y)\n", + " if len(unique_classes) > 2:\n", + " self.multi_class = True\n", + " else:\n", + " self.multi_class = False\n", + "\n", + " # ----- Multiclass case -----\n", + " if self.multi_class:\n", + " n_classes = len(unique_classes)\n", + " # Map original labels to 0...n_classes-1\n", + " class_to_index = {c: idx for idx, c in enumerate(unique_classes)}\n", + " y_indices = np.array([class_to_index[c] for c in y])\n", + " # Initialize weight matrix (features x classes)\n", + " self.weights = np.zeros((n_features, n_classes))\n", + "\n", + " # One-hot encode y\n", + " Y_onehot = np.zeros((n_samples, n_classes))\n", + " Y_onehot[np.arange(n_samples), y_indices] = 1\n", + "\n", + " # Gradient descent\n", + " for epoch in range(self.epochs):\n", + " scores = X.dot(self.weights) # Linear scores (n_samples x n_classes)\n", + " probs = self._softmax(scores) # Probabilities (n_samples x n_classes)\n", + " # Compute gradient (features x classes)\n", + " gradient = (1 / n_samples) * X.T.dot(probs - Y_onehot)\n", + " # Update weights\n", + " self.weights -= self.lr * gradient\n", + "\n", + " if self.verbose and epoch % 100 == 0:\n", + " # Compute current loss (categorical cross-entropy)\n", + " loss = -np.sum(Y_onehot * np.log(probs + 1e-15)) / n_samples\n", + " print(f\"[Epoch {epoch}] Multiclass loss: {loss:.4f}\")\n", + "\n", + " # ----- Binary case -----\n", + 
" else:\n", + " # Convert y to 0/1 if not already\n", + " if not np.array_equal(unique_classes, [0, 1]):\n", + " # Map the two classes to 0 and 1\n", + " class0, class1 = unique_classes\n", + " y_binary = np.where(y == class1, 1, 0)\n", + " else:\n", + " y_binary = y.copy().astype(int)\n", + "\n", + " # Initialize weights vector (features,)\n", + " self.weights = np.zeros(n_features)\n", + "\n", + " # Gradient descent\n", + " for epoch in range(self.epochs):\n", + " linear_model = X.dot(self.weights) # (n_samples,)\n", + " probs = self._sigmoid(linear_model) # (n_samples,)\n", + " # Gradient for binary cross-entropy\n", + " gradient = (1 / n_samples) * X.T.dot(probs - y_binary)\n", + " self.weights -= self.lr * gradient\n", + "\n", + " if self.verbose and epoch % 100 == 0:\n", + " # Compute binary cross-entropy loss\n", + " loss = -np.mean(\n", + " y_binary * np.log(probs + 1e-15) + \n", + " (1 - y_binary) * np.log(1 - probs + 1e-15)\n", + " )\n", + " print(f\"[Epoch {epoch}] Binary loss: {loss:.4f}\")\n", + "\n", + " def predict_prob(self, X):\n", + " \"\"\"\n", + " Compute probability estimates. Returns a 1D array for binary or\n", + " a 2D array (n_samples x n_classes) for multiclass.\n", + " \"\"\"\n", + " X = np.array(X)\n", + " # Add intercept if the model used it\n", + " if self.fit_intercept:\n", + " X = self._add_intercept(X)\n", + " scores = X.dot(self.weights)\n", + " if self.multi_class:\n", + " return self._softmax(scores)\n", + " else:\n", + " return self._sigmoid(scores)\n", + "\n", + " def predict(self, X):\n", + " \"\"\"\n", + " Predict class labels for samples in X.\n", + " Returns integer class labels (0,1 for binary, or 0...C-1 for multiclass).\n", + " \"\"\"\n", + " probs = self.predict_prob(X)\n", + " if self.multi_class:\n", + " # Choose class with highest probability\n", + " return np.argmax(probs, axis=1)\n", + " else:\n", + " # Threshold at 0.5 for binary\n", + " return (probs >= 0.5).astype(int)" + ] + }, + { + "cell_type": "markdown", + "id": "d7401376", + "metadata": { + "editable": true + }, + "source": [ + "The class implements the sigmoid and softmax internally. During fit(),\n", + "we check the number of classes: if more than 2, we set\n", + "self.multi_class=True and perform multinomial logistic regression. We\n", + "one-hot encode the target vector and update a weight matrix with\n", + "softmax probabilities. Otherwise, we do standard binary logistic\n", + "regression, converting labels to 0/1 if needed and updating a weight\n", + "vector. In both cases we use batch gradient descent on the\n", + "cross-entropy loss (we add a small epsilon 1e-15 to logs for numerical\n", + "stability). Progress (loss) can be printed if verbose=True." + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "id": "8609fd64", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# Evaluation Metrics\n", + "#We define helper functions for accuracy and cross-entropy loss. Accuracy is the fraction of correct predictions . 
For loss, we compute the appropriate cross-entropy:\n", + "\n", + "def accuracy_score(y_true, y_pred):\n", + " \"\"\"Accuracy = (# correct predictions) / (total samples).\"\"\"\n", + " y_true = np.array(y_true)\n", + " y_pred = np.array(y_pred)\n", + " return np.mean(y_true == y_pred)\n", + "\n", + "def binary_cross_entropy(y_true, y_prob):\n", + " \"\"\"\n", + " Binary cross-entropy loss.\n", + " y_true: true binary labels (0 or 1), y_prob: predicted probabilities for class 1.\n", + " \"\"\"\n", + " y_true = np.array(y_true)\n", + " y_prob = np.clip(np.array(y_prob), 1e-15, 1-1e-15)\n", + " return -np.mean(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))\n", + "\n", + "def categorical_cross_entropy(y_true, y_prob):\n", + " \"\"\"\n", + " Categorical cross-entropy loss for multiclass.\n", + " y_true: true labels (0...C-1), y_prob: array of predicted probabilities (n_samples x C).\n", + " \"\"\"\n", + " y_true = np.array(y_true, dtype=int)\n", + " y_prob = np.clip(np.array(y_prob), 1e-15, 1-1e-15)\n", + " # One-hot encode true labels\n", + " n_samples, n_classes = y_prob.shape\n", + " one_hot = np.zeros_like(y_prob)\n", + " one_hot[np.arange(n_samples), y_true] = 1\n", + " # Compute cross-entropy\n", + " loss_vec = -np.sum(one_hot * np.log(y_prob), axis=1)\n", + " return np.mean(loss_vec)" + ] + }, + { + "cell_type": "markdown", + "id": "1879aba2", + "metadata": { + "editable": true + }, + "source": [ + "### Synthetic data generation\n", + "\n", + "Binary classification data: Create two Gaussian clusters in 2D. For example, class 0 around mean [-2,-2] and class 1 around [2,2].\n", + "Multiclass data: Create several Gaussian clusters (one per class) spread out in feature space." + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "id": "6083d844", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import numpy as np\n", + "\n", + "def generate_binary_data(n_samples=100, n_features=2, random_state=None):\n", + " \"\"\"\n", + " Generate synthetic binary classification data.\n", + " Returns (X, y) where X is (n_samples x n_features), y in {0,1}.\n", + " \"\"\"\n", + " rng = np.random.RandomState(random_state)\n", + " # Half samples for class 0, half for class 1\n", + " n0 = n_samples // 2\n", + " n1 = n_samples - n0\n", + " # Class 0 around mean -2, class 1 around +2\n", + " mean0 = -2 * np.ones(n_features)\n", + " mean1 = 2 * np.ones(n_features)\n", + " X0 = rng.randn(n0, n_features) + mean0\n", + " X1 = rng.randn(n1, n_features) + mean1\n", + " X = np.vstack((X0, X1))\n", + " y = np.array([0]*n0 + [1]*n1)\n", + " return X, y\n", + "\n", + "def generate_multiclass_data(n_samples=150, n_features=2, n_classes=3, random_state=None):\n", + " \"\"\"\n", + " Generate synthetic multiclass data with n_classes Gaussian clusters.\n", + " \"\"\"\n", + " rng = np.random.RandomState(random_state)\n", + " X = []\n", + " y = []\n", + " samples_per_class = n_samples // n_classes\n", + " for cls in range(n_classes):\n", + " # Random cluster center for each class\n", + " center = rng.uniform(-5, 5, size=n_features)\n", + " Xi = rng.randn(samples_per_class, n_features) + center\n", + " yi = [cls] * samples_per_class\n", + " X.append(Xi)\n", + " y.extend(yi)\n", + " X = np.vstack(X)\n", + " y = np.array(y)\n", + " return X, y\n", + "\n", + "\n", + "# Generate and test on binary data\n", + "X_bin, y_bin = generate_binary_data(n_samples=200, n_features=2, random_state=42)\n", + "model_bin = LogisticRegression(lr=0.1, epochs=1000)\n", + 
"model_bin.fit(X_bin, y_bin)\n", + "y_prob_bin = model_bin.predict_prob(X_bin) # probabilities for class 1\n", + "y_pred_bin = model_bin.predict(X_bin) # predicted classes 0 or 1\n", + "\n", + "acc_bin = accuracy_score(y_bin, y_pred_bin)\n", + "loss_bin = binary_cross_entropy(y_bin, y_prob_bin)\n", + "print(f\"Binary Classification - Accuracy: {acc_bin:.2f}, Cross-Entropy Loss: {loss_bin:.2f}\")\n", + "#For multiclass:\n", + "# Generate and test on multiclass data\n", + "X_multi, y_multi = generate_multiclass_data(n_samples=300, n_features=2, n_classes=3, random_state=1)\n", + "model_multi = LogisticRegression(lr=0.1, epochs=1000)\n", + "model_multi.fit(X_multi, y_multi)\n", + "y_prob_multi = model_multi.predict_prob(X_multi) # (n_samples x 3) probabilities\n", + "y_pred_multi = model_multi.predict(X_multi) # predicted labels 0,1,2\n", + "\n", + "acc_multi = accuracy_score(y_multi, y_pred_multi)\n", + "loss_multi = categorical_cross_entropy(y_multi, y_prob_multi)\n", + "print(f\"Multiclass Classification - Accuracy: {acc_multi:.2f}, Cross-Entropy Loss: {loss_multi:.2f}\")\n", + "\n", + "# CSV Export\n", + "import csv\n", + "\n", + "# Export binary results\n", + "with open('binary_results.csv', mode='w', newline='') as f:\n", + " writer = csv.writer(f)\n", + " writer.writerow([\"TrueLabel\", \"PredictedLabel\"])\n", + " for true, pred in zip(y_bin, y_pred_bin):\n", + " writer.writerow([true, pred])\n", + "\n", + "# Export multiclass results\n", + "with open('multiclass_results.csv', mode='w', newline='') as f:\n", + " writer = csv.writer(f)\n", + " writer.writerow([\"TrueLabel\", \"PredictedLabel\"])\n", + " for true, pred in zip(y_multi, y_pred_multi):\n", + " writer.writerow([true, pred])" + ] + } + ], + "metadata": {}, + "nbformat": 4, + "nbformat_minor": 5 +} \ No newline at end of file diff --git a/doc/LectureNotes/_build/jupyter_execute/week40.ipynb b/doc/LectureNotes/_build/jupyter_execute/week40.ipynb new file mode 100644 index 000000000..5475b7668 --- /dev/null +++ b/doc/LectureNotes/_build/jupyter_execute/week40.ipynb @@ -0,0 +1,2459 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "2303c986", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "75c3b33e", + "metadata": { + "editable": true + }, + "source": [ + "# Week 40: Gradient descent methods (continued) and start Neural networks\n", + "**Morten Hjorth-Jensen**, Department of Physics, University of Oslo, Norway\n", + "\n", + "Date: **September 29-October 3, 2025**" + ] + }, + { + "cell_type": "markdown", + "id": "4ba50982", + "metadata": { + "editable": true + }, + "source": [ + "## Lecture Monday September 29, 2025\n", + "1. Logistic regression and gradient descent, examples on how to code\n", + "\n", + "\n", + "2. Start with the basics of Neural Networks, setting up the basic steps, from the simple perceptron model to the multi-layer perceptron model\n", + "\n", + "3. Video of lecture at \n", + "\n", + "4. Whiteboard notes at " + ] + }, + { + "cell_type": "markdown", + "id": "1d527020", + "metadata": { + "editable": true + }, + "source": [ + "## Suggested readings and videos\n", + "**Readings and Videos:**\n", + "\n", + "1. The lecture notes for week 40 (these notes)\n", + "\n", + "\n", + "2. For neural networks we recommend Goodfellow et al chapter 6 and Raschka et al chapter 2 (contains also material about gradient descent) and chapter 11 (we will use this next week)\n", + "\n", + "\n", + "\n", + "3. Neural Networks demystified at \n", + "\n", + "4. 
Building Neural Networks from scratch at URL:https://www.youtube.com/watch?v=Wo5dMEP_BbI&list=PLQVvvaa0QuDcjD5BAw2DxE6OF2tius3V3&ab_channel=sentdex\"" + ] + }, + { + "cell_type": "markdown", + "id": "63a4d497", + "metadata": { + "editable": true + }, + "source": [ + "## Lab sessions Tuesday and Wednesday\n", + "**Material for the active learning sessions on Tuesday and Wednesday.**\n", + "\n", + " * Work on project 1 and discussions on how to structure your report\n", + "\n", + " * No weekly exercises for week 40, project work only\n", + "\n", + " * Video on how to write scientific reports recorded during one of the lab sessions at \n", + "\n", + " * A general guideline can be found at ." + ] + }, + { + "cell_type": "markdown", + "id": "73621d6b", + "metadata": { + "editable": true + }, + "source": [ + "## Logistic Regression, from last week\n", + "\n", + "In linear regression our main interest was centered on learning the\n", + "coefficients of a functional fit (say a polynomial) in order to be\n", + "able to predict the response of a continuous variable on some unseen\n", + "data. The fit to the continuous variable $y_i$ is based on some\n", + "independent variables $\\boldsymbol{x}_i$. Linear regression resulted in\n", + "analytical expressions for standard ordinary Least Squares or Ridge\n", + "regression (in terms of matrices to invert) for several quantities,\n", + "ranging from the variance and thereby the confidence intervals of the\n", + "parameters $\\boldsymbol{\\theta}$ to the mean squared error. If we can invert\n", + "the product of the design matrices, linear regression gives then a\n", + "simple recipe for fitting our data." + ] + }, + { + "cell_type": "markdown", + "id": "fc1df17b", + "metadata": { + "editable": true + }, + "source": [ + "## Classification problems\n", + "\n", + "Classification problems, however, are concerned with outcomes taking\n", + "the form of discrete variables (i.e. categories). We may for example,\n", + "on the basis of DNA sequencing for a number of patients, like to find\n", + "out which mutations are important for a certain disease; or based on\n", + "scans of various patients' brains, figure out if there is a tumor or\n", + "not; or given a specific physical system, we'd like to identify its\n", + "state, say whether it is an ordered or disordered system (typical\n", + "situation in solid state physics); or classify the status of a\n", + "patient, whether she/he has a stroke or not and many other similar\n", + "situations.\n", + "\n", + "The most common situation we encounter when we apply logistic\n", + "regression is that of two possible outcomes, normally denoted as a\n", + "binary outcome, true or false, positive or negative, success or\n", + "failure etc." + ] + }, + { + "cell_type": "markdown", + "id": "a3d311e6", + "metadata": { + "editable": true + }, + "source": [ + "## Optimization and Deep learning\n", + "\n", + "Logistic regression will also serve as our stepping stone towards\n", + "neural network algorithms and supervised deep learning. For logistic\n", + "learning, the minimization of the cost function leads to a non-linear\n", + "equation in the parameters $\\boldsymbol{\\theta}$. The optimization of the\n", + "problem calls therefore for minimization algorithms.\n", + "\n", + "As we have discussed earlier, this forms the\n", + "bottle neck of all machine learning algorithms, namely how to find\n", + "reliable minima of a multi-variable function. This leads us to the\n", + "family of gradient descent methods. 
The latter are the working horses\n", + "of basically all modern machine learning algorithms.\n", + "\n", + "We note also that many of the topics discussed here on logistic \n", + "regression are also commonly used in modern supervised Deep Learning\n", + "models, as we will see later." + ] + }, + { + "cell_type": "markdown", + "id": "4120d6f9", + "metadata": { + "editable": true + }, + "source": [ + "## Basics\n", + "\n", + "We consider the case where the outputs/targets, also called the\n", + "responses or the outcomes, $y_i$ are discrete and only take values\n", + "from $k=0,\\dots,K-1$ (i.e. $K$ classes).\n", + "\n", + "The goal is to predict the\n", + "output classes from the design matrix $\\boldsymbol{X}\\in\\mathbb{R}^{n\\times p}$\n", + "made of $n$ samples, each of which carries $p$ features or predictors. The\n", + "primary goal is to identify the classes to which new unseen samples\n", + "belong.\n", + "\n", + "Last week we specialized to the case of two classes only, with outputs\n", + "$y_i=0$ and $y_i=1$. Our outcomes could represent the status of a\n", + "credit card user that could default or not on her/his credit card\n", + "debt. That is" + ] + }, + { + "cell_type": "markdown", + "id": "9e85d1e4", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y_i = \\begin{bmatrix} 0 & \\mathrm{no}\\\\ 1 & \\mathrm{yes} \\end{bmatrix}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "a0d8c838", + "metadata": { + "editable": true + }, + "source": [ + "## Two parameters\n", + "\n", + "We assume now that we have two classes with $y_i$ either $0$ or $1$. Furthermore we assume also that we have only two parameters $\\theta$ in our fitting of the Sigmoid function, that is we define probabilities" + ] + }, + { + "cell_type": "markdown", + "id": "7cea7945", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{align*}\n", + "p(y_i=1|x_i,\\boldsymbol{\\theta}) &= \\frac{\\exp{(\\theta_0+\\theta_1x_i)}}{1+\\exp{(\\theta_0+\\theta_1x_i)}},\\nonumber\\\\\n", + "p(y_i=0|x_i,\\boldsymbol{\\theta}) &= 1 - p(y_i=1|x_i,\\boldsymbol{\\theta}),\n", + "\\end{align*}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "6adc5106", + "metadata": { + "editable": true + }, + "source": [ + "where $\\boldsymbol{\\theta}$ are the weights we wish to extract from data, in our case $\\theta_0$ and $\\theta_1$. \n", + "\n", + "Note that we used" + ] + }, + { + "cell_type": "markdown", + "id": "f976068e", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "p(y_i=0\\vert x_i, \\boldsymbol{\\theta}) = 1-p(y_i=1\\vert x_i, \\boldsymbol{\\theta}).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "dedf9f0e", + "metadata": { + "editable": true + }, + "source": [ + "## Maximum likelihood\n", + "\n", + "In order to define the total likelihood for all possible outcomes from a \n", + "dataset $\\mathcal{D}=\\{(y_i,x_i)\\}$, with the binary labels\n", + "$y_i\\in\\{0,1\\}$ and where the data points are drawn independently, we use the so-called [Maximum Likelihood Estimation](https://en.wikipedia.org/wiki/Maximum_likelihood_estimation) (MLE) principle. \n", + "We aim thus at maximizing \n", + "the probability of seeing the observed data. 
We can then approximate the \n", + "likelihood in terms of the product of the individual probabilities of a specific outcome $y_i$, that is" + ] + }, + { + "cell_type": "markdown", + "id": "bd8b54ab", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{align*}\n", + "P(\\mathcal{D}|\\boldsymbol{\\theta})& = \\prod_{i=1}^n \\left[p(y_i=1|x_i,\\boldsymbol{\\theta})\\right]^{y_i}\\left[1-p(y_i=1|x_i,\\boldsymbol{\\theta}))\\right]^{1-y_i}\\nonumber \\\\\n", + "\\end{align*}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "57bfb17f", + "metadata": { + "editable": true + }, + "source": [ + "from which we obtain the log-likelihood and our **cost/loss** function" + ] + }, + { + "cell_type": "markdown", + "id": "00aee268", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathcal{C}(\\boldsymbol{\\theta}) = \\sum_{i=1}^n \\left( y_i\\log{p(y_i=1|x_i,\\boldsymbol{\\theta})} + (1-y_i)\\log\\left[1-p(y_i=1|x_i,\\boldsymbol{\\theta}))\\right]\\right).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "e12940f3", + "metadata": { + "editable": true + }, + "source": [ + "## The cost function rewritten\n", + "\n", + "Reordering the logarithms, we can rewrite the **cost/loss** function as" + ] + }, + { + "cell_type": "markdown", + "id": "e5b2b29e", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathcal{C}(\\boldsymbol{\\theta}) = \\sum_{i=1}^n \\left(y_i(\\theta_0+\\theta_1x_i) -\\log{(1+\\exp{(\\theta_0+\\theta_1x_i)})}\\right).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "c6c0ba4c", + "metadata": { + "editable": true + }, + "source": [ + "The maximum likelihood estimator is defined as the set of parameters that maximize the log-likelihood where we maximize with respect to $\\theta$.\n", + "Since the cost (error) function is just the negative log-likelihood, for logistic regression we have that" + ] + }, + { + "cell_type": "markdown", + "id": "46ee2ea8", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathcal{C}(\\boldsymbol{\\theta})=-\\sum_{i=1}^n \\left(y_i(\\theta_0+\\theta_1x_i) -\\log{(1+\\exp{(\\theta_0+\\theta_1x_i)})}\\right).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "9a05709b", + "metadata": { + "editable": true + }, + "source": [ + "This equation is known in statistics as the **cross entropy**. Finally, we note that just as in linear regression, \n", + "in practice we often supplement the cross-entropy with additional regularization terms, usually $L_1$ and $L_2$ regularization as we did for Ridge and Lasso regression." + ] + }, + { + "cell_type": "markdown", + "id": "ae1362c9", + "metadata": { + "editable": true + }, + "source": [ + "## Minimizing the cross entropy\n", + "\n", + "The cross entropy is a convex function of the weights $\\boldsymbol{\\theta}$ and,\n", + "therefore, any local minimizer is a global minimizer. 
\n", + "\n", + "Minimizing this\n", + "cost function with respect to the two parameters $\\theta_0$ and $\\theta_1$ we obtain" + ] + }, + { + "cell_type": "markdown", + "id": "57f4670b", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial \\mathcal{C}(\\boldsymbol{\\theta})}{\\partial \\theta_0} = -\\sum_{i=1}^n \\left(y_i -\\frac{\\exp{(\\theta_0+\\theta_1x_i)}}{1+\\exp{(\\theta_0+\\theta_1x_i)}}\\right),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "1dc19f59", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "4e96dc87", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial \\mathcal{C}(\\boldsymbol{\\theta})}{\\partial \\theta_1} = -\\sum_{i=1}^n \\left(y_ix_i -x_i\\frac{\\exp{(\\theta_0+\\theta_1x_i)}}{1+\\exp{(\\theta_0+\\theta_1x_i)}}\\right).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "fa77bec9", + "metadata": { + "editable": true + }, + "source": [ + "## A more compact expression\n", + "\n", + "Let us now define a vector $\\boldsymbol{y}$ with $n$ elements $y_i$, an\n", + "$n\\times p$ matrix $\\boldsymbol{X}$ which contains the $x_i$ values and a\n", + "vector $\\boldsymbol{p}$ of fitted probabilities $p(y_i\\vert x_i,\\boldsymbol{\\theta})$. We can rewrite in a more compact form the first\n", + "derivative of the cost function as" + ] + }, + { + "cell_type": "markdown", + "id": "1b013fd2", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial \\mathcal{C}(\\boldsymbol{\\theta})}{\\partial \\boldsymbol{\\theta}} = -\\boldsymbol{X}^T\\left(\\boldsymbol{y}-\\boldsymbol{p}\\right).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "910f36dd", + "metadata": { + "editable": true + }, + "source": [ + "If we in addition define a diagonal matrix $\\boldsymbol{W}$ with elements \n", + "$p(y_i\\vert x_i,\\boldsymbol{\\theta})(1-p(y_i\\vert x_i,\\boldsymbol{\\theta})$, we can obtain a compact expression of the second derivative as" + ] + }, + { + "cell_type": "markdown", + "id": "8212d0ed", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial^2 \\mathcal{C}(\\boldsymbol{\\theta})}{\\partial \\boldsymbol{\\theta}\\partial \\boldsymbol{\\theta}^T} = \\boldsymbol{X}^T\\boldsymbol{W}\\boldsymbol{X}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "7ae7078b", + "metadata": { + "editable": true + }, + "source": [ + "## Extending to more predictors\n", + "\n", + "Within a binary classification problem, we can easily expand our model to include multiple predictors. 
Our ratio between likelihoods is then with $p$ predictors" + ] + }, + { + "cell_type": "markdown", + "id": "59e57d7c", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\log{ \\frac{p(\\boldsymbol{\\theta}\\boldsymbol{x})}{1-p(\\boldsymbol{\\theta}\\boldsymbol{x})}} = \\theta_0+\\theta_1x_1+\\theta_2x_2+\\dots+\\theta_px_p.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "6ffe0955", + "metadata": { + "editable": true + }, + "source": [ + "Here we defined $\\boldsymbol{x}=[1,x_1,x_2,\\dots,x_p]$ and $\\boldsymbol{\\theta}=[\\theta_0, \\theta_1, \\dots, \\theta_p]$ leading to" + ] + }, + { + "cell_type": "markdown", + "id": "56e9bd82", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "p(\\boldsymbol{\\theta}\\boldsymbol{x})=\\frac{ \\exp{(\\theta_0+\\theta_1x_1+\\theta_2x_2+\\dots+\\theta_px_p)}}{1+\\exp{(\\theta_0+\\theta_1x_1+\\theta_2x_2+\\dots+\\theta_px_p)}}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "86b12946", + "metadata": { + "editable": true + }, + "source": [ + "## Including more classes\n", + "\n", + "Till now we have mainly focused on two classes, the so-called binary\n", + "system. Suppose we wish to extend to $K$ classes. Let us for the sake\n", + "of simplicity assume we have only two predictors. We have then following model" + ] + }, + { + "cell_type": "markdown", + "id": "d55394df", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\log{\\frac{p(C=1\\vert x)}{p(K\\vert x)}} = \\theta_{10}+\\theta_{11}x_1,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "ee01378a", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "c7fadfbb", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\log{\\frac{p(C=2\\vert x)}{p(K\\vert x)}} = \\theta_{20}+\\theta_{21}x_1,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "e8310f63", + "metadata": { + "editable": true + }, + "source": [ + "and so on till the class $C=K-1$ class" + ] + }, + { + "cell_type": "markdown", + "id": "be651647", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\log{\\frac{p(C=K-1\\vert x)}{p(K\\vert x)}} = \\theta_{(K-1)0}+\\theta_{(K-1)1}x_1,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "e277c601", + "metadata": { + "editable": true + }, + "source": [ + "and the model is specified in term of $K-1$ so-called log-odds or\n", + "**logit** transformations." + ] + }, + { + "cell_type": "markdown", + "id": "aea3a410", + "metadata": { + "editable": true + }, + "source": [ + "## More classes\n", + "\n", + "In our discussion of neural networks we will encounter the above again\n", + "in terms of a slightly modified function, the so-called **Softmax** function.\n", + "\n", + "The softmax function is used in various multiclass classification\n", + "methods, such as multinomial logistic regression (also known as\n", + "softmax regression), multiclass linear discriminant analysis, naive\n", + "Bayes classifiers, and artificial neural networks. 
Specifically, in\n", + "multinomial logistic regression and linear discriminant analysis, the\n", + "input to the function is the result of $K$ distinct linear functions,\n", + "and the predicted probability for the $k$-th class given a sample\n", + "vector $\\boldsymbol{x}$ and a weighting vector $\\boldsymbol{\\theta}$ is (with two\n", + "predictors):" + ] + }, + { + "cell_type": "markdown", + "id": "bfa7221f", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "p(C=k\\vert \\mathbf {x} )=\\frac{\\exp{(\\theta_{k0}+\\theta_{k1}x_1)}}{1+\\sum_{l=1}^{K-1}\\exp{(\\theta_{l0}+\\theta_{l1}x_1)}}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "3d749c39", + "metadata": { + "editable": true + }, + "source": [ + "It is easy to extend to more predictors. The final class is" + ] + }, + { + "cell_type": "markdown", + "id": "dc061a39", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "p(C=K\\vert \\mathbf {x} )=\\frac{1}{1+\\sum_{l=1}^{K-1}\\exp{(\\theta_{l0}+\\theta_{l1}x_1)}},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "8ea10488", + "metadata": { + "editable": true + }, + "source": [ + "and they sum to one. Our earlier discussions were all specialized to\n", + "the case with two classes only. It is easy to see from the above that\n", + "what we derived earlier is compatible with these equations.\n", + "\n", + "To find the optimal parameters we would typically use a gradient\n", + "descent method. Newton's method and gradient descent methods are\n", + "discussed in the material on [optimization\n", + "methods](https://compphysics.github.io/MachineLearning/doc/pub/Splines/html/Splines-bs.html)." + ] + }, + { + "cell_type": "markdown", + "id": "9cb3baf8", + "metadata": { + "editable": true + }, + "source": [ + "## Optimization, the central part of any Machine Learning algortithm\n", + "\n", + "Almost every problem in machine learning and data science starts with\n", + "a dataset $X$, a model $g(\\theta)$, which is a function of the\n", + "parameters $\\theta$ and a cost function $C(X, g(\\theta))$ that allows\n", + "us to judge how well the model $g(\\theta)$ explains the observations\n", + "$X$. The model is fit by finding the values of $\\theta$ that minimize\n", + "the cost function. Ideally we would be able to solve for $\\theta$\n", + "analytically, however this is not possible in general and we must use\n", + "some approximative/numerical method to compute the minimum." + ] + }, + { + "cell_type": "markdown", + "id": "387393d7", + "metadata": { + "editable": true + }, + "source": [ + "## Revisiting our Logistic Regression case\n", + "\n", + "In our discussion on Logistic Regression we studied the \n", + "case of\n", + "two classes, with $y_i$ either\n", + "$0$ or $1$. Furthermore we assumed also that we have only two\n", + "parameters $\\theta$ in our fitting, that is we\n", + "defined probabilities" + ] + }, + { + "cell_type": "markdown", + "id": "30f64659", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{align*}\n", + "p(y_i=1|x_i,\\boldsymbol{\\theta}) &= \\frac{\\exp{(\\theta_0+\\theta_1x_i)}}{1+\\exp{(\\theta_0+\\theta_1x_i)}},\\nonumber\\\\\n", + "p(y_i=0|x_i,\\boldsymbol{\\theta}) &= 1 - p(y_i=1|x_i,\\boldsymbol{\\theta}),\n", + "\\end{align*}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "3ba65422", + "metadata": { + "editable": true + }, + "source": [ + "where $\\boldsymbol{\\theta}$ are the weights we wish to extract from data, in our case $\\theta_0$ and $\\theta_1$." 
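Before returning to the compact two-class equations, here is a minimal numerical check of the softmax probabilities defined above, using the reference-class parameterization with $K-1$ linear functions (the coefficient values and the input are made up for illustration):

```python
import numpy as np

K = 3                                   # number of classes
x1 = 0.7                                # a single input value, arbitrary
# Coefficients theta_{k0}, theta_{k1} for classes k = 1, ..., K-1 (made up)
theta = np.array([[0.2, 1.0],
                  [-0.5, 2.0]])

scores = theta[:, 0] + theta[:, 1] * x1           # the K-1 linear functions
denom = 1.0 + np.sum(np.exp(scores))
p = np.append(np.exp(scores) / denom,             # classes 1, ..., K-1
              1.0 / denom)                        # reference class K
print(p, p.sum())                                 # probabilities sum to one
```

This parameterization effectively fixes the coefficients of class $K$ to zero, which is what makes the $K-1$ log-odds defined earlier well defined.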
+ ] + }, + { + "cell_type": "markdown", + "id": "005f46d7", + "metadata": { + "editable": true + }, + "source": [ + "## The equations to solve\n", + "\n", + "Our compact equations used a definition of a vector $\\boldsymbol{y}$ with $n$\n", + "elements $y_i$, an $n\\times p$ matrix $\\boldsymbol{X}$ which contains the\n", + "$x_i$ values and a vector $\\boldsymbol{p}$ of fitted probabilities\n", + "$p(y_i\\vert x_i,\\boldsymbol{\\theta})$. We rewrote in a more compact form\n", + "the first derivative of the cost function as" + ] + }, + { + "cell_type": "markdown", + "id": "61a638bc", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial \\mathcal{C}(\\boldsymbol{\\theta})}{\\partial \\boldsymbol{\\theta}} = -\\boldsymbol{X}^T\\left(\\boldsymbol{y}-\\boldsymbol{p}\\right).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "469c0042", + "metadata": { + "editable": true + }, + "source": [ + "If we in addition define a diagonal matrix $\\boldsymbol{W}$ with elements \n", + "$p(y_i\\vert x_i,\\boldsymbol{\\theta})(1-p(y_i\\vert x_i,\\boldsymbol{\\theta})$, we can obtain a compact expression of the second derivative as" + ] + }, + { + "cell_type": "markdown", + "id": "0af5449a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial^2 \\mathcal{C}(\\boldsymbol{\\theta})}{\\partial \\boldsymbol{\\theta}\\partial \\boldsymbol{\\theta}^T} = \\boldsymbol{X}^T\\boldsymbol{W}\\boldsymbol{X}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "f4c16b4f", + "metadata": { + "editable": true + }, + "source": [ + "This defines what is called the Hessian matrix." + ] + }, + { + "cell_type": "markdown", + "id": "ddbe7f50", + "metadata": { + "editable": true + }, + "source": [ + "## Solving using Newton-Raphson's method\n", + "\n", + "If we can set up these equations, Newton-Raphson's iterative method is normally the method of choice. It requires however that we can compute in an efficient way the matrices that define the first and second derivatives. \n", + "\n", + "Our iterative scheme is then given by" + ] + }, + { + "cell_type": "markdown", + "id": "52830f96", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{\\theta}^{\\mathrm{new}} = \\boldsymbol{\\theta}^{\\mathrm{old}}-\\left(\\frac{\\partial^2 \\mathcal{C}(\\boldsymbol{\\theta})}{\\partial \\boldsymbol{\\theta}\\partial \\boldsymbol{\\theta}^T}\\right)^{-1}_{\\boldsymbol{\\theta}^{\\mathrm{old}}}\\times \\left(\\frac{\\partial \\mathcal{C}(\\boldsymbol{\\theta})}{\\partial \\boldsymbol{\\theta}}\\right)_{\\boldsymbol{\\theta}^{\\mathrm{old}}},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "1b8a1c14", + "metadata": { + "editable": true + }, + "source": [ + "or in matrix form as" + ] + }, + { + "cell_type": "markdown", + "id": "8ad73cea", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{\\theta}^{\\mathrm{new}} = \\boldsymbol{\\theta}^{\\mathrm{old}}-\\left(\\boldsymbol{X}^T\\boldsymbol{W}\\boldsymbol{X} \\right)^{-1}\\times \\left(-\\boldsymbol{X}^T(\\boldsymbol{y}-\\boldsymbol{p}) \\right)_{\\boldsymbol{\\theta}^{\\mathrm{old}}}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "6d47dd0b", + "metadata": { + "editable": true + }, + "source": [ + "The right-hand side is computed with the old values of $\\theta$. \n", + "\n", + "If we can compute these matrices, in particular the Hessian, the above is often the easiest method to implement." 
+ ] + }, + { + "cell_type": "markdown", + "id": "f399c2f4", + "metadata": { + "editable": true + }, + "source": [ + "## Example code for Logistic Regression\n", + "\n", + "Here we make a class for Logistic regression. The code uses a simple data set and includes both a binary case and a multiclass case." + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "79f6b6fc", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import numpy as np\n", + "\n", + "class LogisticRegression:\n", + " \"\"\"\n", + " Logistic Regression for binary and multiclass classification.\n", + " \"\"\"\n", + " def __init__(self, lr=0.01, epochs=1000, fit_intercept=True, verbose=False):\n", + " self.lr = lr # Learning rate for gradient descent\n", + " self.epochs = epochs # Number of iterations\n", + " self.fit_intercept = fit_intercept # Whether to add intercept (bias)\n", + " self.verbose = verbose # Print loss during training if True\n", + " self.weights = None\n", + " self.multi_class = False # Will be determined at fit time\n", + "\n", + " def _add_intercept(self, X):\n", + " \"\"\"Add intercept term (column of ones) to feature matrix.\"\"\"\n", + " intercept = np.ones((X.shape[0], 1))\n", + " return np.concatenate((intercept, X), axis=1)\n", + "\n", + " def _sigmoid(self, z):\n", + " \"\"\"Sigmoid function for binary logistic.\"\"\"\n", + " return 1 / (1 + np.exp(-z))\n", + "\n", + " def _softmax(self, Z):\n", + " \"\"\"Softmax function for multiclass logistic.\"\"\"\n", + " exp_Z = np.exp(Z - np.max(Z, axis=1, keepdims=True))\n", + " return exp_Z / np.sum(exp_Z, axis=1, keepdims=True)\n", + "\n", + " def fit(self, X, y):\n", + " \"\"\"\n", + " Train the logistic regression model using gradient descent.\n", + " Supports binary (sigmoid) and multiclass (softmax) based on y.\n", + " \"\"\"\n", + " X = np.array(X)\n", + " y = np.array(y)\n", + " n_samples, n_features = X.shape\n", + "\n", + " # Add intercept if needed\n", + " if self.fit_intercept:\n", + " X = self._add_intercept(X)\n", + " n_features += 1\n", + "\n", + " # Determine classes and mode (binary vs multiclass)\n", + " unique_classes = np.unique(y)\n", + " if len(unique_classes) > 2:\n", + " self.multi_class = True\n", + " else:\n", + " self.multi_class = False\n", + "\n", + " # ----- Multiclass case -----\n", + " if self.multi_class:\n", + " n_classes = len(unique_classes)\n", + " # Map original labels to 0...n_classes-1\n", + " class_to_index = {c: idx for idx, c in enumerate(unique_classes)}\n", + " y_indices = np.array([class_to_index[c] for c in y])\n", + " # Initialize weight matrix (features x classes)\n", + " self.weights = np.zeros((n_features, n_classes))\n", + "\n", + " # One-hot encode y\n", + " Y_onehot = np.zeros((n_samples, n_classes))\n", + " Y_onehot[np.arange(n_samples), y_indices] = 1\n", + "\n", + " # Gradient descent\n", + " for epoch in range(self.epochs):\n", + " scores = X.dot(self.weights) # Linear scores (n_samples x n_classes)\n", + " probs = self._softmax(scores) # Probabilities (n_samples x n_classes)\n", + " # Compute gradient (features x classes)\n", + " gradient = (1 / n_samples) * X.T.dot(probs - Y_onehot)\n", + " # Update weights\n", + " self.weights -= self.lr * gradient\n", + "\n", + " if self.verbose and epoch % 100 == 0:\n", + " # Compute current loss (categorical cross-entropy)\n", + " loss = -np.sum(Y_onehot * np.log(probs + 1e-15)) / n_samples\n", + " print(f\"[Epoch {epoch}] Multiclass loss: {loss:.4f}\")\n", + "\n", + " # ----- Binary case -----\n", + 
" else:\n", + " # Convert y to 0/1 if not already\n", + " if not np.array_equal(unique_classes, [0, 1]):\n", + " # Map the two classes to 0 and 1\n", + " class0, class1 = unique_classes\n", + " y_binary = np.where(y == class1, 1, 0)\n", + " else:\n", + " y_binary = y.copy().astype(int)\n", + "\n", + " # Initialize weights vector (features,)\n", + " self.weights = np.zeros(n_features)\n", + "\n", + " # Gradient descent\n", + " for epoch in range(self.epochs):\n", + " linear_model = X.dot(self.weights) # (n_samples,)\n", + " probs = self._sigmoid(linear_model) # (n_samples,)\n", + " # Gradient for binary cross-entropy\n", + " gradient = (1 / n_samples) * X.T.dot(probs - y_binary)\n", + " self.weights -= self.lr * gradient\n", + "\n", + " if self.verbose and epoch % 100 == 0:\n", + " # Compute binary cross-entropy loss\n", + " loss = -np.mean(\n", + " y_binary * np.log(probs + 1e-15) + \n", + " (1 - y_binary) * np.log(1 - probs + 1e-15)\n", + " )\n", + " print(f\"[Epoch {epoch}] Binary loss: {loss:.4f}\")\n", + "\n", + " def predict_prob(self, X):\n", + " \"\"\"\n", + " Compute probability estimates. Returns a 1D array for binary or\n", + " a 2D array (n_samples x n_classes) for multiclass.\n", + " \"\"\"\n", + " X = np.array(X)\n", + " # Add intercept if the model used it\n", + " if self.fit_intercept:\n", + " X = self._add_intercept(X)\n", + " scores = X.dot(self.weights)\n", + " if self.multi_class:\n", + " return self._softmax(scores)\n", + " else:\n", + " return self._sigmoid(scores)\n", + "\n", + " def predict(self, X):\n", + " \"\"\"\n", + " Predict class labels for samples in X.\n", + " Returns integer class labels (0,1 for binary, or 0...C-1 for multiclass).\n", + " \"\"\"\n", + " probs = self.predict_prob(X)\n", + " if self.multi_class:\n", + " # Choose class with highest probability\n", + " return np.argmax(probs, axis=1)\n", + " else:\n", + " # Threshold at 0.5 for binary\n", + " return (probs >= 0.5).astype(int)" + ] + }, + { + "cell_type": "markdown", + "id": "24e84b29", + "metadata": { + "editable": true + }, + "source": [ + "The class implements the sigmoid and softmax internally. During fit(),\n", + "we check the number of classes: if more than 2, we set\n", + "self.multi_class=True and perform multinomial logistic regression. We\n", + "one-hot encode the target vector and update a weight matrix with\n", + "softmax probabilities. Otherwise, we do standard binary logistic\n", + "regression, converting labels to 0/1 if needed and updating a weight\n", + "vector. In both cases we use batch gradient descent on the\n", + "cross-entropy loss (we add a small epsilon 1e-15 to logs for numerical\n", + "stability). Progress (loss) can be printed if verbose=True." + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "7a73eca4", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# Evaluation Metrics\n", + "#We define helper functions for accuracy and cross-entropy loss. Accuracy is the fraction of correct predictions . 
For loss, we compute the appropriate cross-entropy:\n", + "\n", + "def accuracy_score(y_true, y_pred):\n", + " \"\"\"Accuracy = (# correct predictions) / (total samples).\"\"\"\n", + " y_true = np.array(y_true)\n", + " y_pred = np.array(y_pred)\n", + " return np.mean(y_true == y_pred)\n", + "\n", + "def binary_cross_entropy(y_true, y_prob):\n", + " \"\"\"\n", + " Binary cross-entropy loss.\n", + " y_true: true binary labels (0 or 1), y_prob: predicted probabilities for class 1.\n", + " \"\"\"\n", + " y_true = np.array(y_true)\n", + " y_prob = np.clip(np.array(y_prob), 1e-15, 1-1e-15)\n", + " return -np.mean(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))\n", + "\n", + "def categorical_cross_entropy(y_true, y_prob):\n", + " \"\"\"\n", + " Categorical cross-entropy loss for multiclass.\n", + " y_true: true labels (0...C-1), y_prob: array of predicted probabilities (n_samples x C).\n", + " \"\"\"\n", + " y_true = np.array(y_true, dtype=int)\n", + " y_prob = np.clip(np.array(y_prob), 1e-15, 1-1e-15)\n", + " # One-hot encode true labels\n", + " n_samples, n_classes = y_prob.shape\n", + " one_hot = np.zeros_like(y_prob)\n", + " one_hot[np.arange(n_samples), y_true] = 1\n", + " # Compute cross-entropy\n", + " loss_vec = -np.sum(one_hot * np.log(y_prob), axis=1)\n", + " return np.mean(loss_vec)" + ] + }, + { + "cell_type": "markdown", + "id": "40d4b30f", + "metadata": { + "editable": true + }, + "source": [ + "### Synthetic data generation\n", + "\n", + "Binary classification data: Create two Gaussian clusters in 2D. For example, class 0 around mean [-2,-2] and class 1 around [2,2].\n", + "Multiclass data: Create several Gaussian clusters (one per class) spread out in feature space." + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "ac0089bf", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import numpy as np\n", + "\n", + "def generate_binary_data(n_samples=100, n_features=2, random_state=None):\n", + " \"\"\"\n", + " Generate synthetic binary classification data.\n", + " Returns (X, y) where X is (n_samples x n_features), y in {0,1}.\n", + " \"\"\"\n", + " rng = np.random.RandomState(random_state)\n", + " # Half samples for class 0, half for class 1\n", + " n0 = n_samples // 2\n", + " n1 = n_samples - n0\n", + " # Class 0 around mean -2, class 1 around +2\n", + " mean0 = -2 * np.ones(n_features)\n", + " mean1 = 2 * np.ones(n_features)\n", + " X0 = rng.randn(n0, n_features) + mean0\n", + " X1 = rng.randn(n1, n_features) + mean1\n", + " X = np.vstack((X0, X1))\n", + " y = np.array([0]*n0 + [1]*n1)\n", + " return X, y\n", + "\n", + "def generate_multiclass_data(n_samples=150, n_features=2, n_classes=3, random_state=None):\n", + " \"\"\"\n", + " Generate synthetic multiclass data with n_classes Gaussian clusters.\n", + " \"\"\"\n", + " rng = np.random.RandomState(random_state)\n", + " X = []\n", + " y = []\n", + " samples_per_class = n_samples // n_classes\n", + " for cls in range(n_classes):\n", + " # Random cluster center for each class\n", + " center = rng.uniform(-5, 5, size=n_features)\n", + " Xi = rng.randn(samples_per_class, n_features) + center\n", + " yi = [cls] * samples_per_class\n", + " X.append(Xi)\n", + " y.extend(yi)\n", + " X = np.vstack(X)\n", + " y = np.array(y)\n", + " return X, y\n", + "\n", + "\n", + "# Generate and test on binary data\n", + "X_bin, y_bin = generate_binary_data(n_samples=200, n_features=2, random_state=42)\n", + "model_bin = LogisticRegression(lr=0.1, epochs=1000)\n", + 
"model_bin.fit(X_bin, y_bin)\n", + "y_prob_bin = model_bin.predict_prob(X_bin) # probabilities for class 1\n", + "y_pred_bin = model_bin.predict(X_bin) # predicted classes 0 or 1\n", + "\n", + "acc_bin = accuracy_score(y_bin, y_pred_bin)\n", + "loss_bin = binary_cross_entropy(y_bin, y_prob_bin)\n", + "print(f\"Binary Classification - Accuracy: {acc_bin:.2f}, Cross-Entropy Loss: {loss_bin:.2f}\")\n", + "#For multiclass:\n", + "# Generate and test on multiclass data\n", + "X_multi, y_multi = generate_multiclass_data(n_samples=300, n_features=2, n_classes=3, random_state=1)\n", + "model_multi = LogisticRegression(lr=0.1, epochs=1000)\n", + "model_multi.fit(X_multi, y_multi)\n", + "y_prob_multi = model_multi.predict_prob(X_multi) # (n_samples x 3) probabilities\n", + "y_pred_multi = model_multi.predict(X_multi) # predicted labels 0,1,2\n", + "\n", + "acc_multi = accuracy_score(y_multi, y_pred_multi)\n", + "loss_multi = categorical_cross_entropy(y_multi, y_prob_multi)\n", + "print(f\"Multiclass Classification - Accuracy: {acc_multi:.2f}, Cross-Entropy Loss: {loss_multi:.2f}\")\n", + "\n", + "# CSV Export\n", + "import csv\n", + "\n", + "# Export binary results\n", + "with open('binary_results.csv', mode='w', newline='') as f:\n", + " writer = csv.writer(f)\n", + " writer.writerow([\"TrueLabel\", \"PredictedLabel\"])\n", + " for true, pred in zip(y_bin, y_pred_bin):\n", + " writer.writerow([true, pred])\n", + "\n", + "# Export multiclass results\n", + "with open('multiclass_results.csv', mode='w', newline='') as f:\n", + " writer = csv.writer(f)\n", + " writer.writerow([\"TrueLabel\", \"PredictedLabel\"])\n", + " for true, pred in zip(y_multi, y_pred_multi):\n", + " writer.writerow([true, pred])" + ] + }, + { + "cell_type": "markdown", + "id": "1e9acef3", + "metadata": { + "editable": true + }, + "source": [ + "## Using **Scikit-learn**\n", + "\n", + "We show here how we can use a logistic regression case on a data set\n", + "included in _scikit_learn_, the so-called Wisconsin breast cancer data\n", + "using Logistic regression as our algorithm for classification. This is\n", + "a widely studied data set and can easily be included in demonstrations\n", + "of classification problems." + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "9153234a", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "%matplotlib inline\n", + "\n", + "import matplotlib.pyplot as plt\n", + "import numpy as np\n", + "from sklearn.model_selection import train_test_split \n", + "from sklearn.datasets import load_breast_cancer\n", + "from sklearn.linear_model import LogisticRegression\n", + "\n", + "# Load the data\n", + "cancer = load_breast_cancer()\n", + "\n", + "X_train, X_test, y_train, y_test = train_test_split(cancer.data,cancer.target,random_state=0)\n", + "print(X_train.shape)\n", + "print(X_test.shape)\n", + "# Logistic Regression\n", + "logreg = LogisticRegression(solver='lbfgs')\n", + "logreg.fit(X_train, y_train)\n", + "print(\"Test set accuracy with Logistic Regression: {:.2f}\".format(logreg.score(X_test,y_test)))" + ] + }, + { + "cell_type": "markdown", + "id": "908d547b", + "metadata": { + "editable": true + }, + "source": [ + "## Using the correlation matrix\n", + "\n", + "In addition to the above scores, we could also study the covariance (and the correlation matrix).\n", + "We use **Pandas** to compute the correlation matrix." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "8a46f4f3", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import matplotlib.pyplot as plt\n", + "import numpy as np\n", + "from sklearn.model_selection import train_test_split \n", + "from sklearn.datasets import load_breast_cancer\n", + "from sklearn.linear_model import LogisticRegression\n", + "cancer = load_breast_cancer()\n", + "import pandas as pd\n", + "# Making a data frame\n", + "cancerpd = pd.DataFrame(cancer.data, columns=cancer.feature_names)\n", + "\n", + "fig, axes = plt.subplots(15,2,figsize=(10,20))\n", + "malignant = cancer.data[cancer.target == 0]\n", + "benign = cancer.data[cancer.target == 1]\n", + "ax = axes.ravel()\n", + "\n", + "for i in range(30):\n", + " _, bins = np.histogram(cancer.data[:,i], bins =50)\n", + " ax[i].hist(malignant[:,i], bins = bins, alpha = 0.5)\n", + " ax[i].hist(benign[:,i], bins = bins, alpha = 0.5)\n", + " ax[i].set_title(cancer.feature_names[i])\n", + " ax[i].set_yticks(())\n", + "ax[0].set_xlabel(\"Feature magnitude\")\n", + "ax[0].set_ylabel(\"Frequency\")\n", + "ax[0].legend([\"Malignant\", \"Benign\"], loc =\"best\")\n", + "fig.tight_layout()\n", + "plt.show()\n", + "\n", + "import seaborn as sns\n", + "correlation_matrix = cancerpd.corr().round(1)\n", + "# use the heatmap function from seaborn to plot the correlation matrix\n", + "# annot = True to print the values inside the square\n", + "plt.figure(figsize=(15,8))\n", + "sns.heatmap(data=correlation_matrix, annot=True)\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "ba0275a7", + "metadata": { + "editable": true + }, + "source": [ + "## Discussing the correlation data\n", + "\n", + "In the above example we note two things. In the first plot we display\n", + "the overlap of benign and malignant tumors as functions of the various\n", + "features in the Wisconsin data set. We see that for\n", + "some of the features we can distinguish clearly the benign and\n", + "malignant cases while for other features we cannot. This can point to\n", + "us which features may be of greater interest when we wish to classify\n", + "a benign or not benign tumour.\n", + "\n", + "In the second figure we have computed the so-called correlation\n", + "matrix, which in our case with thirty features becomes a $30\\times 30$\n", + "matrix.\n", + "\n", + "We constructed this matrix using **pandas** via the statements" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "1af34f8e", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "cancerpd = pd.DataFrame(cancer.data, columns=cancer.feature_names)" + ] + }, + { + "cell_type": "markdown", + "id": "1eac30d3", + "metadata": { + "editable": true + }, + "source": [ + "and then" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "a0cdd9c9", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "correlation_matrix = cancerpd.corr().round(1)" + ] + }, + { + "cell_type": "markdown", + "id": "013777ad", + "metadata": { + "editable": true + }, + "source": [ + "Diagonalizing this matrix we can in turn say something about which\n", + "features are of relevance and which are not. This leads us to\n", + "the classical Principal Component Analysis (PCA) theorem with\n", + "applications. This will be discussed later this semester." 
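As a small preview of that diagonalization step (a sketch only, which reuses the `cancerpd` data frame constructed above), we can compute the eigenvalues of the correlation matrix and see how much of the total variance the leading directions carry:

```python
import numpy as np

# Eigendecomposition of the (symmetric) correlation matrix
corr = cancerpd.corr().to_numpy()
eigenvalues, eigenvectors = np.linalg.eigh(corr)

# Sort from largest to smallest eigenvalue
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]

# Fraction of the total variance carried by each principal direction
explained = eigenvalues / eigenvalues.sum()
print(explained[:5])   # the first few directions dominate
```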
+ ] + }, + { + "cell_type": "markdown", + "id": "410f90ac", + "metadata": { + "editable": true + }, + "source": [ + "## Other measures in classification studies" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "fa16a459", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import matplotlib.pyplot as plt\n", + "import numpy as np\n", + "from sklearn.model_selection import train_test_split \n", + "from sklearn.datasets import load_breast_cancer\n", + "from sklearn.linear_model import LogisticRegression\n", + "\n", + "# Load the data\n", + "cancer = load_breast_cancer()\n", + "\n", + "X_train, X_test, y_train, y_test = train_test_split(cancer.data,cancer.target,random_state=0)\n", + "print(X_train.shape)\n", + "print(X_test.shape)\n", + "# Logistic Regression\n", + "logreg = LogisticRegression(solver='lbfgs')\n", + "logreg.fit(X_train, y_train)\n", + "\n", + "from sklearn.preprocessing import LabelEncoder\n", + "from sklearn.model_selection import cross_validate\n", + "#Cross validation\n", + "accuracy = cross_validate(logreg,X_test,y_test,cv=10)['test_score']\n", + "print(accuracy)\n", + "print(\"Test set accuracy with Logistic Regression: {:.2f}\".format(logreg.score(X_test,y_test)))\n", + "\n", + "import scikitplot as skplt\n", + "y_pred = logreg.predict(X_test)\n", + "skplt.metrics.plot_confusion_matrix(y_test, y_pred, normalize=True)\n", + "plt.show()\n", + "y_probas = logreg.predict_proba(X_test)\n", + "skplt.metrics.plot_roc(y_test, y_probas)\n", + "plt.show()\n", + "skplt.metrics.plot_cumulative_gain(y_test, y_probas)\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "a721de53", + "metadata": { + "editable": true + }, + "source": [ + "## Introduction to Neural networks\n", + "\n", + "Artificial neural networks are computational systems that can learn to\n", + "perform tasks by considering examples, generally without being\n", + "programmed with any task-specific rules. It is supposed to mimic a\n", + "biological system, wherein neurons interact by sending signals in the\n", + "form of mathematical functions between layers. All layers can contain\n", + "an arbitrary number of neurons, and each connection is represented by\n", + "a weight variable." + ] + }, + { + "cell_type": "markdown", + "id": "68de5052", + "metadata": { + "editable": true + }, + "source": [ + "## Artificial neurons\n", + "\n", + "The field of artificial neural networks has a long history of\n", + "development, and is closely connected with the advancement of computer\n", + "science and computers in general. A model of artificial neurons was\n", + "first developed by McCulloch and Pitts in 1943 to study signal\n", + "processing in the brain and has later been refined by others. The\n", + "general idea is to mimic neural networks in the human brain, which is\n", + "composed of billions of neurons that communicate with each other by\n", + "sending electrical signals. Each neuron accumulates its incoming\n", + "signals, which must exceed an activation threshold to yield an\n", + "output. If the threshold is not overcome, the neuron remains inactive,\n", + "i.e. has zero output.\n", + "\n", + "This behaviour has inspired a simple mathematical model for an artificial neuron." + ] + }, + { + "cell_type": "markdown", + "id": "7685af02", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation}\n", + " y = f\\left(\\sum_{i=1}^n w_ix_i\\right) = f(u)\n", + "\\label{artificialNeuron} \\tag{1}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "3dfcfcb0", + "metadata": { + "editable": true + }, + "source": [ + "Here, the output $y$ of the neuron is the value of its activation function, which have as input\n", + "a weighted sum of signals $x_i, \\dots ,x_n$ received by $n$ other neurons.\n", + "\n", + "Conceptually, it is helpful to divide neural networks into four\n", + "categories:\n", + "1. general purpose neural networks for supervised learning,\n", + "\n", + "2. neural networks designed specifically for image processing, the most prominent example of this class being Convolutional Neural Networks (CNNs),\n", + "\n", + "3. neural networks for sequential data such as Recurrent Neural Networks (RNNs), and\n", + "\n", + "4. neural networks for unsupervised learning such as Deep Boltzmann Machines.\n", + "\n", + "In natural science, DNNs and CNNs have already found numerous\n", + "applications. In statistical physics, they have been applied to detect\n", + "phase transitions in 2D Ising and Potts models, lattice gauge\n", + "theories, and different phases of polymers, or solving the\n", + "Navier-Stokes equation in weather forecasting. Deep learning has also\n", + "found interesting applications in quantum physics. Various quantum\n", + "phase transitions can be detected and studied using DNNs and CNNs,\n", + "topological phases, and even non-equilibrium many-body\n", + "localization. Representing quantum states as DNNs quantum state\n", + "tomography are among some of the impressive achievements to reveal the\n", + "potential of DNNs to facilitate the study of quantum systems.\n", + "\n", + "In quantum information theory, it has been shown that one can perform\n", + "gate decompositions with the help of neural. \n", + "\n", + "The applications are not limited to the natural sciences. There is a\n", + "plethora of applications in essentially all disciplines, from the\n", + "humanities to life science and medicine." + ] + }, + { + "cell_type": "markdown", + "id": "0d037ca7", + "metadata": { + "editable": true + }, + "source": [ + "## Neural network types\n", + "\n", + "An artificial neural network (ANN), is a computational model that\n", + "consists of layers of connected neurons, or nodes or units. We will\n", + "refer to these interchangeably as units or nodes, and sometimes as\n", + "neurons.\n", + "\n", + "It is supposed to mimic a biological nervous system by letting each\n", + "neuron interact with other neurons by sending signals in the form of\n", + "mathematical functions between layers. A wide variety of different\n", + "ANNs have been developed, but most of them consist of an input layer,\n", + "an output layer and eventual layers in-between, called *hidden\n", + "layers*. All layers can contain an arbitrary number of nodes, and each\n", + "connection between two nodes is associated with a weight variable.\n", + "\n", + "Neural networks (also called neural nets) are neural-inspired\n", + "nonlinear models for supervised learning. As we will see, neural nets\n", + "can be viewed as natural, more powerful extensions of supervised\n", + "learning methods such as linear and logistic regression and soft-max\n", + "methods we discussed earlier." 
+ ] + }, + { + "cell_type": "markdown", + "id": "7bcf7188", + "metadata": { + "editable": true + }, + "source": [ + "## Feed-forward neural networks\n", + "\n", + "The feed-forward neural network (FFNN) was the first and simplest type\n", + "of ANNs that were devised. In this network, the information moves in\n", + "only one direction: forward through the layers.\n", + "\n", + "Nodes are represented by circles, while the arrows display the\n", + "connections between the nodes, including the direction of information\n", + "flow. Additionally, each arrow corresponds to a weight variable\n", + "(figure to come). We observe that each node in a layer is connected\n", + "to *all* nodes in the subsequent layer, making this a so-called\n", + "*fully-connected* FFNN." + ] + }, + { + "cell_type": "markdown", + "id": "cd094e20", + "metadata": { + "editable": true + }, + "source": [ + "## Convolutional Neural Network\n", + "\n", + "A different variant of FFNNs are *convolutional neural networks*\n", + "(CNNs), which have a connectivity pattern inspired by the animal\n", + "visual cortex. Individual neurons in the visual cortex only respond to\n", + "stimuli from small sub-regions of the visual field, called a receptive\n", + "field. This makes the neurons well-suited to exploit the strong\n", + "spatially local correlation present in natural images. The response of\n", + "each neuron can be approximated mathematically as a convolution\n", + "operation. (figure to come)\n", + "\n", + "Convolutional neural networks emulate the behaviour of neurons in the\n", + "visual cortex by enforcing a *local* connectivity pattern between\n", + "nodes of adjacent layers: Each node in a convolutional layer is\n", + "connected only to a subset of the nodes in the previous layer, in\n", + "contrast to the fully-connected FFNN. Often, CNNs consist of several\n", + "convolutional layers that learn local features of the input, with a\n", + "fully-connected layer at the end, which gathers all the local data and\n", + "produces the outputs. They have wide applications in image and video\n", + "recognition." + ] + }, + { + "cell_type": "markdown", + "id": "ea99157e", + "metadata": { + "editable": true + }, + "source": [ + "## Recurrent neural networks\n", + "\n", + "So far we have only mentioned ANNs where information flows in one\n", + "direction: forward. *Recurrent neural networks* on the other hand,\n", + "have connections between nodes that form directed *cycles*. This\n", + "creates a form of internal memory which are able to capture\n", + "information on what has been calculated before; the output is\n", + "dependent on the previous computations. Recurrent NNs make use of\n", + "sequential information by performing the same task for every element\n", + "in a sequence, where each element depends on previous elements. An\n", + "example of such information is sentences, making recurrent NNs\n", + "especially well-suited for handwriting and speech recognition." + ] + }, + { + "cell_type": "markdown", + "id": "b73754c2", + "metadata": { + "editable": true + }, + "source": [ + "## Other types of networks\n", + "\n", + "There are many other kinds of ANNs that have been developed. One type\n", + "that is specifically designed for interpolation in multidimensional\n", + "space is the radial basis function (RBF) network. 
RBFs are typically\n", + "made up of three layers: an input layer, a hidden layer with\n", + "non-linear radial symmetric activation functions and a linear output\n", + "layer (''linear'' here means that each node in the output layer has a\n", + "linear activation function). The layers are normally fully-connected\n", + "and there are no cycles, thus RBFs can be viewed as a type of\n", + "fully-connected FFNN. They are however usually treated as a separate\n", + "type of NN due the unusual activation functions." + ] + }, + { + "cell_type": "markdown", + "id": "aa97c83d", + "metadata": { + "editable": true + }, + "source": [ + "## Multilayer perceptrons\n", + "\n", + "One uses often so-called fully-connected feed-forward neural networks\n", + "with three or more layers (an input layer, one or more hidden layers\n", + "and an output layer) consisting of neurons that have non-linear\n", + "activation functions.\n", + "\n", + "Such networks are often called *multilayer perceptrons* (MLPs)." + ] + }, + { + "cell_type": "markdown", + "id": "abe84919", + "metadata": { + "editable": true + }, + "source": [ + "## Why multilayer perceptrons?\n", + "\n", + "According to the *Universal approximation theorem*, a feed-forward\n", + "neural network with just a single hidden layer containing a finite\n", + "number of neurons can approximate a continuous multidimensional\n", + "function to arbitrary accuracy, assuming the activation function for\n", + "the hidden layer is a **non-constant, bounded and\n", + "monotonically-increasing continuous function**.\n", + "\n", + "Note that the requirements on the activation function only applies to\n", + "the hidden layer, the output nodes are always assumed to be linear, so\n", + "as to not restrict the range of output values." + ] + }, + { + "cell_type": "markdown", + "id": "d3ff207b", + "metadata": { + "editable": true + }, + "source": [ + "## Illustration of a single perceptron model and a multi-perceptron model\n", + "\n", + "\n", + "\n", + "\n", + "

    Figure 1: In a) we show a single perceptron model while in b) we display a network with two hidden layers, an input layer and an output layer.

    \n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "f982c11f", + "metadata": { + "editable": true + }, + "source": [ + "## Examples of XOR, OR and AND gates\n", + "\n", + "Let us first try to fit various gates using standard linear\n", + "regression. The gates we are thinking of are the classical XOR, OR and\n", + "AND gates, well-known elements in computer science. The tables here\n", + "show how we can set up the inputs $x_1$ and $x_2$ in order to yield a\n", + "specific target $y_i$." + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "id": "04a3e090", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "\"\"\"\n", + "Simple code that tests XOR, OR and AND gates with linear regression\n", + "\"\"\"\n", + "\n", + "import numpy as np\n", + "# Design matrix\n", + "X = np.array([ [1, 0, 0], [1, 0, 1], [1, 1, 0],[1, 1, 1]],dtype=np.float64)\n", + "print(f\"The X.TX matrix:{X.T @ X}\")\n", + "Xinv = np.linalg.pinv(X.T @ X)\n", + "print(f\"The invers of X.TX matrix:{Xinv}\")\n", + "\n", + "# The XOR gate \n", + "yXOR = np.array( [ 0, 1 ,1, 0])\n", + "ThetaXOR = Xinv @ X.T @ yXOR\n", + "print(f\"The values of theta for the XOR gate:{ThetaXOR}\")\n", + "print(f\"The linear regression prediction for the XOR gate:{X @ ThetaXOR}\")\n", + "\n", + "\n", + "# The OR gate \n", + "yOR = np.array( [ 0, 1 ,1, 1])\n", + "ThetaOR = Xinv @ X.T @ yOR\n", + "print(f\"The values of theta for the OR gate:{ThetaOR}\")\n", + "print(f\"The linear regression prediction for the OR gate:{X @ ThetaOR}\")\n", + "\n", + "\n", + "# The OR gate \n", + "yAND = np.array( [ 0, 0 ,0, 1])\n", + "ThetaAND = Xinv @ X.T @ yAND\n", + "print(f\"The values of theta for the AND gate:{ThetaAND}\")\n", + "print(f\"The linear regression prediction for the AND gate:{X @ ThetaAND}\")" + ] + }, + { + "cell_type": "markdown", + "id": "95b1f5a5", + "metadata": { + "editable": true + }, + "source": [ + "What is happening here?" + ] + }, + { + "cell_type": "markdown", + "id": "0d200eff", + "metadata": { + "editable": true + }, + "source": [ + "## Does Logistic Regression do a better Job?" 
+ ] + }, + { + "cell_type": "code", + "execution_count": 10, + "id": "040a69d0", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "\"\"\"\n", + "Simple code that tests XOR and OR gates with linear regression\n", + "and logistic regression\n", + "\"\"\"\n", + "\n", + "import matplotlib.pyplot as plt\n", + "from sklearn.linear_model import LogisticRegression\n", + "import numpy as np\n", + "\n", + "# Design matrix\n", + "X = np.array([ [1, 0, 0], [1, 0, 1], [1, 1, 0],[1, 1, 1]],dtype=np.float64)\n", + "print(f\"The X.TX matrix:{X.T @ X}\")\n", + "Xinv = np.linalg.pinv(X.T @ X)\n", + "print(f\"The invers of X.TX matrix:{Xinv}\")\n", + "\n", + "# The XOR gate \n", + "yXOR = np.array( [ 0, 1 ,1, 0])\n", + "ThetaXOR = Xinv @ X.T @ yXOR\n", + "print(f\"The values of theta for the XOR gate:{ThetaXOR}\")\n", + "print(f\"The linear regression prediction for the XOR gate:{X @ ThetaXOR}\")\n", + "\n", + "\n", + "# The OR gate \n", + "yOR = np.array( [ 0, 1 ,1, 1])\n", + "ThetaOR = Xinv @ X.T @ yOR\n", + "print(f\"The values of theta for the OR gate:{ThetaOR}\")\n", + "print(f\"The linear regression prediction for the OR gate:{X @ ThetaOR}\")\n", + "\n", + "\n", + "# The OR gate \n", + "yAND = np.array( [ 0, 0 ,0, 1])\n", + "ThetaAND = Xinv @ X.T @ yAND\n", + "print(f\"The values of theta for the AND gate:{ThetaAND}\")\n", + "print(f\"The linear regression prediction for the AND gate:{X @ ThetaAND}\")\n", + "\n", + "# Now we change to logistic regression\n", + "\n", + "\n", + "# Logistic Regression\n", + "logreg = LogisticRegression()\n", + "logreg.fit(X, yOR)\n", + "print(\"Test set accuracy with Logistic Regression for OR gate: {:.2f}\".format(logreg.score(X,yOR)))\n", + "\n", + "logreg.fit(X, yXOR)\n", + "print(\"Test set accuracy with Logistic Regression for XOR gate: {:.2f}\".format(logreg.score(X,yXOR)))\n", + "\n", + "\n", + "logreg.fit(X, yAND)\n", + "print(\"Test set accuracy with Logistic Regression for AND gate: {:.2f}\".format(logreg.score(X,yAND)))" + ] + }, + { + "cell_type": "markdown", + "id": "49f17f65", + "metadata": { + "editable": true + }, + "source": [ + "Not exactly impressive, but somewhat better." 
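The XOR gate resists both methods because it is not linearly separable in the inputs $x_1$ and $x_2$ alone. As a small sketch (not part of the code above, and with the extra feature chosen by hand), adding the product $x_1x_2$ to the design matrix is enough for logistic regression to reproduce XOR, which hints at why the nonlinear transformations performed by a neural network help.

```python
# Illustrative sketch: the XOR gate becomes linearly separable once we add
# the nonlinear product feature x1*x2 to the design matrix.
import numpy as np
from sklearn.linear_model import LogisticRegression

x1 = np.array([0, 0, 1, 1])
x2 = np.array([0, 1, 0, 1])
yXOR = np.array([0, 1, 1, 0])

# Augmented design matrix [x1, x2, x1*x2]; the intercept is handled by scikit-learn
X_aug = np.column_stack([x1, x2, x1 * x2])

# Weak regularization (large C) so the four points can be fitted exactly
logreg = LogisticRegression(C=1e5)
logreg.fit(X_aug, yXOR)
print("Predictions for XOR with the extra feature:", logreg.predict(X_aug))
print("Accuracy: {:.2f}".format(logreg.score(X_aug, yXOR)))
```

A hidden layer with a nonlinear activation learns such feature combinations automatically, which is what the next section explores.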
+ ] + }, + { + "cell_type": "markdown", + "id": "714e0891", + "metadata": { + "editable": true + }, + "source": [ + "## Adding Neural Networks" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "id": "28bde670", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "\n", + "# and now neural networks with Scikit-Learn and the XOR\n", + "\n", + "from sklearn.neural_network import MLPClassifier\n", + "from sklearn.datasets import make_classification\n", + "X, yXOR = make_classification(n_samples=100, random_state=1)\n", + "FFNN = MLPClassifier(random_state=1, max_iter=300).fit(X, yXOR)\n", + "FFNN.predict_proba(X)\n", + "print(f\"Test set accuracy with Feed Forward Neural Network for XOR gate:{FFNN.score(X, yXOR)}\")" + ] + }, + { + "cell_type": "markdown", + "id": "4440856f", + "metadata": { + "editable": true + }, + "source": [ + "## Mathematical model\n", + "\n", + "The output $y$ is produced via the activation function $f$" + ] + }, + { + "cell_type": "markdown", + "id": "6199da92", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y = f\\left(\\sum_{i=1}^n w_ix_i + b_i\\right) = f(z),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "62c964e3", + "metadata": { + "editable": true + }, + "source": [ + "This function receives $x_i$ as inputs.\n", + "Here the activation $z=(\\sum_{i=1}^n w_ix_i+b_i)$. \n", + "In an FFNN of such neurons, the *inputs* $x_i$ are the *outputs* of\n", + "the neurons in the preceding layer. Furthermore, an MLP is\n", + "fully-connected, which means that each neuron receives a weighted sum\n", + "of the outputs of *all* neurons in the previous layer." + ] + }, + { + "cell_type": "markdown", + "id": "64ba4c70", + "metadata": { + "editable": true + }, + "source": [ + "## Mathematical model\n", + "\n", + "First, for each node $i$ in the first hidden layer, we calculate a weighted sum $z_i^1$ of the input coordinates $x_j$," + ] + }, + { + "cell_type": "markdown", + "id": "66c11135", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation} z_i^1 = \\sum_{j=1}^{M} w_{ij}^1 x_j + b_i^1\n", + "\\label{_auto1} \\tag{2}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "0f47b20a", + "metadata": { + "editable": true + }, + "source": [ + "Here $b_i$ is the so-called bias which is normally needed in\n", + "case of zero activation weights or inputs. How to fix the biases and\n", + "the weights will be discussed below. The value of $z_i^1$ is the\n", + "argument to the activation function $f_i$ of each node $i$, The\n", + "variable $M$ stands for all possible inputs to a given node $i$ in the\n", + "first layer. We define the output $y_i^1$ of all neurons in layer 1 as" + ] + }, + { + "cell_type": "markdown", + "id": "bda56156", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation}\n", + " y_i^1 = f(z_i^1) = f\\left(\\sum_{j=1}^M w_{ij}^1 x_j + b_i^1\\right)\n", + "\\label{outputLayer1} \\tag{3}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "1330fab9", + "metadata": { + "editable": true + }, + "source": [ + "where we assume that all nodes in the same layer have identical\n", + "activation functions, hence the notation $f$. In general, we could assume in the more general case that different layers have different activation functions.\n", + "In this case we would identify these functions with a superscript $l$ for the $l$-th layer," + ] + }, + { + "cell_type": "markdown", + "id": "ae474dfb", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation}\n", + " y_i^l = f^l(u_i^l) = f^l\\left(\\sum_{j=1}^{N_{l-1}} w_{ij}^l y_j^{l-1} + b_i^l\\right)\n", + "\\label{generalLayer} \\tag{4}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b6cb6fed", + "metadata": { + "editable": true + }, + "source": [ + "where $N_l$ is the number of nodes in layer $l$. When the output of\n", + "all the nodes in the first hidden layer are computed, the values of\n", + "the subsequent layer can be calculated and so forth until the output\n", + "is obtained." + ] + }, + { + "cell_type": "markdown", + "id": "2f8f9b4e", + "metadata": { + "editable": true + }, + "source": [ + "## Mathematical model\n", + "\n", + "The output of neuron $i$ in layer 2 is thus," + ] + }, + { + "cell_type": "markdown", + "id": "18e74238", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation}\n", + " y_i^2 = f^2\\left(\\sum_{j=1}^N w_{ij}^2 y_j^1 + b_i^2\\right) \n", + "\\label{_auto2} \\tag{5}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "d10df3e7", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation} \n", + " = f^2\\left[\\sum_{j=1}^N w_{ij}^2f^1\\left(\\sum_{k=1}^M w_{jk}^1 x_k + b_j^1\\right) + b_i^2\\right]\n", + "\\label{outputLayer2} \\tag{6}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "da21a316", + "metadata": { + "editable": true + }, + "source": [ + "where we have substituted $y_k^1$ with the inputs $x_k$. Finally, the ANN output reads" + ] + }, + { + "cell_type": "markdown", + "id": "76938a28", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation}\n", + " y_i^3 = f^3\\left(\\sum_{j=1}^N w_{ij}^3 y_j^2 + b_i^3\\right) \n", + "\\label{_auto3} \\tag{7}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "65434967", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation} \n", + " = f_3\\left[\\sum_{j} w_{ij}^3 f^2\\left(\\sum_{k} w_{jk}^2 f^1\\left(\\sum_{m} w_{km}^1 x_m + b_k^1\\right) + b_j^2\\right)\n", + " + b_1^3\\right]\n", + "\\label{_auto4} \\tag{8}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "31d4f5aa", + "metadata": { + "editable": true + }, + "source": [ + "## Mathematical model\n", + "\n", + "We can generalize this expression to an MLP with $l$ hidden\n", + "layers. The complete functional form is," + ] + }, + { + "cell_type": "markdown", + "id": "114030e5", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation}\n", + "y^{l+1}_i = f^{l+1}\\left[\\!\\sum_{j=1}^{N_l} w_{ij}^3 f^l\\left(\\sum_{k=1}^{N_{l-1}}w_{jk}^{l-1}\\left(\\dots f^1\\left(\\sum_{n=1}^{N_0} w_{mn}^1 x_n+ b_m^1\\right)\\dots\\right)+b_k^2\\right)+b_1^3\\right] \n", + "\\label{completeNN} \\tag{9}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "a93aec4e", + "metadata": { + "editable": true + }, + "source": [ + "which illustrates a basic property of MLPs: The only independent\n", + "variables are the input values $x_n$." + ] + }, + { + "cell_type": "markdown", + "id": "7c85562d", + "metadata": { + "editable": true + }, + "source": [ + "## Mathematical model\n", + "\n", + "This confirms that an MLP, despite its quite convoluted mathematical\n", + "form, is nothing more than an analytic function, specifically a\n", + "mapping of real-valued vectors $\\hat{x} \\in \\mathbb{R}^n \\rightarrow\n", + "\\hat{y} \\in \\mathbb{R}^m$.\n", + "\n", + "Furthermore, the flexibility and universality of an MLP can be\n", + "illustrated by realizing that the expression is essentially a nested\n", + "sum of scaled activation functions of the form" + ] + }, + { + "cell_type": "markdown", + "id": "1152ea5e", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation}\n", + " f(x) = c_1 f(c_2 x + c_3) + c_4\n", + "\\label{_auto5} \\tag{10}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "4f3d4b33", + "metadata": { + "editable": true + }, + "source": [ + "where the parameters $c_i$ are weights and biases. By adjusting these\n", + "parameters, the activation functions can be shifted up and down or\n", + "left and right, change slope or be rescaled which is the key to the\n", + "flexibility of a neural network." + ] + }, + { + "cell_type": "markdown", + "id": "4c1ac54e", + "metadata": { + "editable": true + }, + "source": [ + "### Matrix-vector notation\n", + "\n", + "We can introduce a more convenient notation for the activations in an A NN. \n", + "\n", + "Additionally, we can represent the biases and activations\n", + "as layer-wise column vectors $\\hat{b}_l$ and $\\hat{y}_l$, so that the $i$-th element of each vector \n", + "is the bias $b_i^l$ and activation $y_i^l$ of node $i$ in layer $l$ respectively. \n", + "\n", + "We have that $\\mathrm{W}_l$ is an $N_{l-1} \\times N_l$ matrix, while $\\hat{b}_l$ and $\\hat{y}_l$ are $N_l \\times 1$ column vectors. \n", + "With this notation, the sum becomes a matrix-vector multiplication, and we can write\n", + "the equation for the activations of hidden layer 2 (assuming three nodes for simplicity) as" + ] + }, + { + "cell_type": "markdown", + "id": "5c4a861f", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation}\n", + " \\hat{y}_2 = f_2(\\mathrm{W}_2 \\hat{y}_{1} + \\hat{b}_{2}) = \n", + " f_2\\left(\\left[\\begin{array}{ccc}\n", + " w^2_{11} &w^2_{12} &w^2_{13} \\\\\n", + " w^2_{21} &w^2_{22} &w^2_{23} \\\\\n", + " w^2_{31} &w^2_{32} &w^2_{33} \\\\\n", + " \\end{array} \\right] \\cdot\n", + " \\left[\\begin{array}{c}\n", + " y^1_1 \\\\\n", + " y^1_2 \\\\\n", + " y^1_3 \\\\\n", + " \\end{array}\\right] + \n", + " \\left[\\begin{array}{c}\n", + " b^2_1 \\\\\n", + " b^2_2 \\\\\n", + " b^2_3 \\\\\n", + " \\end{array}\\right]\\right).\n", + "\\label{_auto6} \\tag{11}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "276b271b", + "metadata": { + "editable": true + }, + "source": [ + "### Matrix-vector notation and activation\n", + "\n", + "The activation of node $i$ in layer 2 is" + ] + }, + { + "cell_type": "markdown", + "id": "63a5b8f1", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation}\n", + " y^2_i = f_2\\Bigr(w^2_{i1}y^1_1 + w^2_{i2}y^1_2 + w^2_{i3}y^1_3 + b^2_i\\Bigr) = \n", + " f_2\\left(\\sum_{j=1}^3 w^2_{ij} y_j^1 + b^2_i\\right).\n", + "\\label{_auto7} \\tag{12}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "316b8c32", + "metadata": { + "editable": true + }, + "source": [ + "This is not just a convenient and compact notation, but also a useful\n", + "and intuitive way to think about MLPs: The output is calculated by a\n", + "series of matrix-vector multiplications and vector additions that are\n", + "used as input to the activation functions. For each operation\n", + "$\\mathrm{W}_l \\hat{y}_{l-1}$ we move forward one layer." + ] + }, + { + "cell_type": "markdown", + "id": "34ba90c8", + "metadata": { + "editable": true + }, + "source": [ + "### Activation functions\n", + "\n", + "A property that characterizes a neural network, other than its\n", + "connectivity, is the choice of activation function(s). As described\n", + "in, the following restrictions are imposed on an activation function\n", + "for a FFNN to fulfill the universal approximation theorem\n", + "\n", + " * Non-constant\n", + "\n", + " * Bounded\n", + "\n", + " * Monotonically-increasing\n", + "\n", + " * Continuous" + ] + }, + { + "cell_type": "markdown", + "id": "3019fcaf", + "metadata": { + "editable": true + }, + "source": [ + "### Activation functions, Logistic and Hyperbolic ones\n", + "\n", + "The second requirement excludes all linear functions. Furthermore, in\n", + "a MLP with only linear activation functions, each layer simply\n", + "performs a linear transformation of its inputs.\n", + "\n", + "Regardless of the number of layers, the output of the NN will be\n", + "nothing but a linear function of the inputs. Thus we need to introduce\n", + "some kind of non-linearity to the NN to be able to fit non-linear\n", + "functions Typical examples are the logistic *Sigmoid*" + ] + }, + { + "cell_type": "markdown", + "id": "389ff36b", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "f(x) = \\frac{1}{1 + e^{-x}},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "ee9b399a", + "metadata": { + "editable": true + }, + "source": [ + "and the *hyperbolic tangent* function" + ] + }, + { + "cell_type": "markdown", + "id": "36f98b26", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "f(x) = \\tanh(x)\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "cb7b8839", + "metadata": { + "editable": true + }, + "source": [ + "### Relevance\n", + "\n", + "The *sigmoid* function are more biologically plausible because the\n", + "output of inactive neurons are zero. Such activation function are\n", + "called *one-sided*. However, it has been shown that the hyperbolic\n", + "tangent performs better than the sigmoid for training MLPs. 
has\n", + "become the most popular for *deep neural networks*" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "id": "db8d28b5", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "\"\"\"The sigmoid function (or the logistic curve) is a \n", + "function that takes any real number, z, and outputs a number (0,1).\n", + "It is useful in neural networks for assigning weights on a relative scale.\n", + "The value z is the weighted sum of parameters involved in the learning algorithm.\"\"\"\n", + "\n", + "import numpy\n", + "import matplotlib.pyplot as plt\n", + "import math as mt\n", + "\n", + "z = numpy.arange(-5, 5, .1)\n", + "sigma_fn = numpy.vectorize(lambda z: 1/(1+numpy.exp(-z)))\n", + "sigma = sigma_fn(z)\n", + "\n", + "fig = plt.figure()\n", + "ax = fig.add_subplot(111)\n", + "ax.plot(z, sigma)\n", + "ax.set_ylim([-0.1, 1.1])\n", + "ax.set_xlim([-5,5])\n", + "ax.grid(True)\n", + "ax.set_xlabel('z')\n", + "ax.set_title('sigmoid function')\n", + "\n", + "plt.show()\n", + "\n", + "\"\"\"Step Function\"\"\"\n", + "z = numpy.arange(-5, 5, .02)\n", + "step_fn = numpy.vectorize(lambda z: 1.0 if z >= 0.0 else 0.0)\n", + "step = step_fn(z)\n", + "\n", + "fig = plt.figure()\n", + "ax = fig.add_subplot(111)\n", + "ax.plot(z, step)\n", + "ax.set_ylim([-0.5, 1.5])\n", + "ax.set_xlim([-5,5])\n", + "ax.grid(True)\n", + "ax.set_xlabel('z')\n", + "ax.set_title('step function')\n", + "\n", + "plt.show()\n", + "\n", + "\"\"\"Sine Function\"\"\"\n", + "z = numpy.arange(-2*mt.pi, 2*mt.pi, 0.1)\n", + "t = numpy.sin(z)\n", + "\n", + "fig = plt.figure()\n", + "ax = fig.add_subplot(111)\n", + "ax.plot(z, t)\n", + "ax.set_ylim([-1.0, 1.0])\n", + "ax.set_xlim([-2*mt.pi,2*mt.pi])\n", + "ax.grid(True)\n", + "ax.set_xlabel('z')\n", + "ax.set_title('sine function')\n", + "\n", + "plt.show()\n", + "\n", + "\"\"\"Plots a graph of the squashing function used by a rectified linear\n", + "unit\"\"\"\n", + "z = numpy.arange(-2, 2, .1)\n", + "zero = numpy.zeros(len(z))\n", + "y = numpy.max([zero, z], axis=0)\n", + "\n", + "fig = plt.figure()\n", + "ax = fig.add_subplot(111)\n", + "ax.plot(z, y)\n", + "ax.set_ylim([-2.0, 2.0])\n", + "ax.set_xlim([-2.0, 2.0])\n", + "ax.grid(True)\n", + "ax.set_xlabel('z')\n", + "ax.set_title('Rectified linear unit')\n", + "\n", + "plt.show()" + ] + } + ], + "metadata": {}, + "nbformat": 4, + "nbformat_minor": 5 +} \ No newline at end of file diff --git a/doc/LectureNotes/_build/jupyter_execute/week41.ipynb b/doc/LectureNotes/_build/jupyter_execute/week41.ipynb new file mode 100644 index 000000000..00bfd22e1 --- /dev/null +++ b/doc/LectureNotes/_build/jupyter_execute/week41.ipynb @@ -0,0 +1,3820 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "b625bb28", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "679109d4", + "metadata": { + "editable": true + }, + "source": [ + "# Week 41 Neural networks and constructing a neural network code\n", + "**Morten Hjorth-Jensen**, Department of Physics, University of Oslo, Norway\n", + "\n", + "Date: **Week 41**" + ] + }, + { + "cell_type": "markdown", + "id": "d7401ab9", + "metadata": { + "editable": true + }, + "source": [ + "## Plan for week 41, October 6-10" + ] + }, + { + "cell_type": "markdown", + "id": "f47e1c5c", + "metadata": { + "editable": true + }, + "source": [ + "## Material for the lecture on Monday October 6, 2025\n", + "1. 
Neural Networks, setting up the basic steps, from the simple perceptron model to the multi-layer perceptron model.\n", + "\n", + "2. Building our own Feed-forward Neural Network, getting started\n", + "\n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "af0a9895", + "metadata": { + "editable": true + }, + "source": [ + "## Readings and Videos:\n", + "1. These lecture notes\n", + "\n", + "2. For neural networks we recommend Goodfellow et al chapters 6 and 7.\n", + "\n", + "3. Rashkca et al., chapter 11, jupyter-notebook sent separately, from [GitHub](https://github.com/rasbt/machine-learning-book)\n", + "\n", + "4. Neural Networks demystified at \n", + "\n", + "5. Building Neural Networks from scratch at \n", + "\n", + "6. Video on Neural Networks at \n", + "\n", + "7. Video on the back propagation algorithm at \n", + "\n", + "8. We also recommend Michael Nielsen's intuitive approach to the neural networks and the universal approximation theorem, see the slides at ." + ] + }, + { + "cell_type": "markdown", + "id": "be1e5c03", + "metadata": { + "editable": true + }, + "source": [ + "## Mathematics of deep learning\n", + "\n", + "**Two recent books online.**\n", + "\n", + "1. [The Modern Mathematics of Deep Learning, by Julius Berner, Philipp Grohs, Gitta Kutyniok, Philipp Petersen](https://arxiv.org/abs/2105.04026), published as [Mathematical Aspects of Deep Learning, pp. 1-111. Cambridge University Press, 2022](https://doi.org/10.1017/9781009025096.002)\n", + "\n", + "2. [Mathematical Introduction to Deep Learning: Methods, Implementations, and Theory, Arnulf Jentzen, Benno Kuckuck, Philippe von Wurstemberger](https://doi.org/10.48550/arXiv.2310.20360)" + ] + }, + { + "cell_type": "markdown", + "id": "52520e8f", + "metadata": { + "editable": true + }, + "source": [ + "## Reminder on books with hands-on material and codes\n", + "[Sebastian Rashcka et al, Machine learning with Sickit-Learn and PyTorch](https://sebastianraschka.com/blog/2022/ml-pytorch-book.html)" + ] + }, + { + "cell_type": "markdown", + "id": "408a0487", + "metadata": { + "editable": true + }, + "source": [ + "## Lab sessions on Tuesday and Wednesday\n", + "\n", + "Aim: Getting started with coding neural network. The exercises this\n", + "week aim at setting up the feed-forward part of a neural network." + ] + }, + { + "cell_type": "markdown", + "id": "23056baf", + "metadata": { + "editable": true + }, + "source": [ + "## Lecture Monday October 6" + ] + }, + { + "cell_type": "markdown", + "id": "56a2f2f2", + "metadata": { + "editable": true + }, + "source": [ + "## Introduction to Neural networks\n", + "\n", + "Artificial neural networks are computational systems that can learn to\n", + "perform tasks by considering examples, generally without being\n", + "programmed with any task-specific rules. It is supposed to mimic a\n", + "biological system, wherein neurons interact by sending signals in the\n", + "form of mathematical functions between layers. All layers can contain\n", + "an arbitrary number of neurons, and each connection is represented by\n", + "a weight variable." + ] + }, + { + "cell_type": "markdown", + "id": "2e3fa93d", + "metadata": { + "editable": true + }, + "source": [ + "## Artificial neurons\n", + "\n", + "The field of artificial neural networks has a long history of\n", + "development, and is closely connected with the advancement of computer\n", + "science and computers in general. 
A model of artificial neurons was\n", + "first developed by McCulloch and Pitts in 1943 to study signal\n", + "processing in the brain and has later been refined by others. The\n", + "general idea is to mimic neural networks in the human brain, which is\n", + "composed of billions of neurons that communicate with each other by\n", + "sending electrical signals. Each neuron accumulates its incoming\n", + "signals, which must exceed an activation threshold to yield an\n", + "output. If the threshold is not overcome, the neuron remains inactive,\n", + "i.e. has zero output.\n", + "\n", + "This behaviour has inspired a simple mathematical model for an artificial neuron." + ] + }, + { + "cell_type": "markdown", + "id": "0afafe3e", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation}\n", + " y = f\\left(\\sum_{i=1}^n w_ix_i\\right) = f(u)\n", + "\\label{artificialNeuron} \\tag{1}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "bc113056", + "metadata": { + "editable": true + }, + "source": [ + "Here, the output $y$ of the neuron is the value of its activation function, which have as input\n", + "a weighted sum of signals $x_i, \\dots ,x_n$ received by $n$ other neurons.\n", + "\n", + "Conceptually, it is helpful to divide neural networks into four\n", + "categories:\n", + "1. general purpose neural networks for supervised learning,\n", + "\n", + "2. neural networks designed specifically for image processing, the most prominent example of this class being Convolutional Neural Networks (CNNs),\n", + "\n", + "3. neural networks for sequential data such as Recurrent Neural Networks (RNNs), and\n", + "\n", + "4. neural networks for unsupervised learning such as Deep Boltzmann Machines.\n", + "\n", + "In natural science, DNNs and CNNs have already found numerous\n", + "applications. In statistical physics, they have been applied to detect\n", + "phase transitions in 2D Ising and Potts models, lattice gauge\n", + "theories, and different phases of polymers, or solving the\n", + "Navier-Stokes equation in weather forecasting. Deep learning has also\n", + "found interesting applications in quantum physics. Various quantum\n", + "phase transitions can be detected and studied using DNNs and CNNs,\n", + "topological phases, and even non-equilibrium many-body\n", + "localization. Representing quantum states as DNNs quantum state\n", + "tomography are among some of the impressive achievements to reveal the\n", + "potential of DNNs to facilitate the study of quantum systems.\n", + "\n", + "In quantum information theory, it has been shown that one can perform\n", + "gate decompositions with the help of neural. \n", + "\n", + "The applications are not limited to the natural sciences. There is a\n", + "plethora of applications in essentially all disciplines, from the\n", + "humanities to life science and medicine." + ] + }, + { + "cell_type": "markdown", + "id": "872c3321", + "metadata": { + "editable": true + }, + "source": [ + "## Neural network types\n", + "\n", + "An artificial neural network (ANN), is a computational model that\n", + "consists of layers of connected neurons, or nodes or units. We will\n", + "refer to these interchangeably as units or nodes, and sometimes as\n", + "neurons.\n", + "\n", + "It is supposed to mimic a biological nervous system by letting each\n", + "neuron interact with other neurons by sending signals in the form of\n", + "mathematical functions between layers. A wide variety of different\n", + "ANNs have been developed, but most of them consist of an input layer,\n", + "an output layer and eventual layers in-between, called *hidden\n", + "layers*. All layers can contain an arbitrary number of nodes, and each\n", + "connection between two nodes is associated with a weight variable.\n", + "\n", + "Neural networks (also called neural nets) are neural-inspired\n", + "nonlinear models for supervised learning. As we will see, neural nets\n", + "can be viewed as natural, more powerful extensions of supervised\n", + "learning methods such as linear and logistic regression and soft-max\n", + "methods we discussed earlier." 
+ ] + }, + { + "cell_type": "markdown", + "id": "53edae74", + "metadata": { + "editable": true + }, + "source": [ + "## Feed-forward neural networks\n", + "\n", + "The feed-forward neural network (FFNN) was the first and simplest type\n", + "of ANNs that were devised. In this network, the information moves in\n", + "only one direction: forward through the layers.\n", + "\n", + "Nodes are represented by circles, while the arrows display the\n", + "connections between the nodes, including the direction of information\n", + "flow. Additionally, each arrow corresponds to a weight variable\n", + "(figure to come). We observe that each node in a layer is connected\n", + "to *all* nodes in the subsequent layer, making this a so-called\n", + "*fully-connected* FFNN." + ] + }, + { + "cell_type": "markdown", + "id": "0eef36d6", + "metadata": { + "editable": true + }, + "source": [ + "## Convolutional Neural Network\n", + "\n", + "A different variant of FFNNs are *convolutional neural networks*\n", + "(CNNs), which have a connectivity pattern inspired by the animal\n", + "visual cortex. Individual neurons in the visual cortex only respond to\n", + "stimuli from small sub-regions of the visual field, called a receptive\n", + "field. This makes the neurons well-suited to exploit the strong\n", + "spatially local correlation present in natural images. The response of\n", + "each neuron can be approximated mathematically as a convolution\n", + "operation. (figure to come)\n", + "\n", + "Convolutional neural networks emulate the behaviour of neurons in the\n", + "visual cortex by enforcing a *local* connectivity pattern between\n", + "nodes of adjacent layers: Each node in a convolutional layer is\n", + "connected only to a subset of the nodes in the previous layer, in\n", + "contrast to the fully-connected FFNN. Often, CNNs consist of several\n", + "convolutional layers that learn local features of the input, with a\n", + "fully-connected layer at the end, which gathers all the local data and\n", + "produces the outputs. They have wide applications in image and video\n", + "recognition." + ] + }, + { + "cell_type": "markdown", + "id": "bf602451", + "metadata": { + "editable": true + }, + "source": [ + "## Recurrent neural networks\n", + "\n", + "So far we have only mentioned ANNs where information flows in one\n", + "direction: forward. *Recurrent neural networks* on the other hand,\n", + "have connections between nodes that form directed *cycles*. This\n", + "creates a form of internal memory which are able to capture\n", + "information on what has been calculated before; the output is\n", + "dependent on the previous computations. Recurrent NNs make use of\n", + "sequential information by performing the same task for every element\n", + "in a sequence, where each element depends on previous elements. An\n", + "example of such information is sentences, making recurrent NNs\n", + "especially well-suited for handwriting and speech recognition." + ] + }, + { + "cell_type": "markdown", + "id": "0afbe2d0", + "metadata": { + "editable": true + }, + "source": [ + "## Other types of networks\n", + "\n", + "There are many other kinds of ANNs that have been developed. One type\n", + "that is specifically designed for interpolation in multidimensional\n", + "space is the radial basis function (RBF) network. 
RBFs are typically\n", + "made up of three layers: an input layer, a hidden layer with\n", + "non-linear radial symmetric activation functions and a linear output\n", + "layer (''linear'' here means that each node in the output layer has a\n", + "linear activation function). The layers are normally fully-connected\n", + "and there are no cycles, thus RBFs can be viewed as a type of\n", + "fully-connected FFNN. They are however usually treated as a separate\n", + "type of NN due the unusual activation functions." + ] + }, + { + "cell_type": "markdown", + "id": "d957cfe8", + "metadata": { + "editable": true + }, + "source": [ + "## Multilayer perceptrons\n", + "\n", + "One uses often so-called fully-connected feed-forward neural networks\n", + "with three or more layers (an input layer, one or more hidden layers\n", + "and an output layer) consisting of neurons that have non-linear\n", + "activation functions.\n", + "\n", + "Such networks are often called *multilayer perceptrons* (MLPs)." + ] + }, + { + "cell_type": "markdown", + "id": "57b218ab", + "metadata": { + "editable": true + }, + "source": [ + "## Why multilayer perceptrons?\n", + "\n", + "According to the *Universal approximation theorem*, a feed-forward\n", + "neural network with just a single hidden layer containing a finite\n", + "number of neurons can approximate a continuous multidimensional\n", + "function to arbitrary accuracy, assuming the activation function for\n", + "the hidden layer is a **non-constant, bounded and\n", + "monotonically-increasing continuous function**.\n", + "\n", + "Note that the requirements on the activation function only applies to\n", + "the hidden layer, the output nodes are always assumed to be linear, so\n", + "as to not restrict the range of output values." + ] + }, + { + "cell_type": "markdown", + "id": "6bda8dda", + "metadata": { + "editable": true + }, + "source": [ + "## Illustration of a single perceptron model and a multi-perceptron model\n", + "\n", + "\n", + "\n", + "\n", + "

    Figure 1: In a) we show a single perceptron model while in b) we display a network with two hidden layers, an input layer and an output layer.

    \n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "f7d514be", + "metadata": { + "editable": true + }, + "source": [ + "## Mathematics of deep learning and neural networks\n", + "\n", + "Neural networks, in its so-called feed-forward form, where each\n", + "iterations contains a feed-forward stage and a back-propgagation\n", + "stage, consist of series of affine matrix-matrix and matrix-vector\n", + "multiplications. The unknown parameters (the so-called biases and\n", + "weights which deternine the architecture of a neural network), are\n", + "uptaded iteratively using the so-called back-propagation algorithm.\n", + "This algorithm corresponds to the so-called reverse mode of \n", + "automatic differentation." + ] + }, + { + "cell_type": "markdown", + "id": "02ed299b", + "metadata": { + "editable": true + }, + "source": [ + "## Basics of an NN\n", + "\n", + "A neural network consists of a series of hidden layers, in addition to\n", + "the input and output layers. Each layer $l$ has a set of parameters\n", + "$\\boldsymbol{\\Theta}^{(l)}=(\\boldsymbol{W}^{(l)},\\boldsymbol{b}^{(l)})$ which are related to the\n", + "parameters in other layers through a series of affine transformations,\n", + "for a standard NN these are matrix-matrix and matrix-vector\n", + "multiplications. For all layers we will simply use a collective variable $\\boldsymbol{\\Theta}$.\n", + "\n", + "It consist of two basic steps:\n", + "1. a feed forward stage which takes a given input and produces a final output which is compared with the target values through our cost/loss function.\n", + "\n", + "2. a back-propagation state where the unknown parameters $\\boldsymbol{\\Theta}$ are updated through the optimization of the their gradients. The expressions for the gradients are obtained via the chain rule, starting from the derivative of the cost/function.\n", + "\n", + "These two steps make up one iteration. This iterative process is continued till we reach an eventual stopping criterion." + ] + }, + { + "cell_type": "markdown", + "id": "96b8c13c", + "metadata": { + "editable": true + }, + "source": [ + "## Overarching view of a neural network\n", + "\n", + "The architecture of a neural network defines our model. This model\n", + "aims at describing some function $f(\\boldsymbol{x}$ which represents\n", + "some final result (outputs or tagrget values) given a specific inpput\n", + "$\\boldsymbol{x}$. Note that here $\\boldsymbol{y}$ and $\\boldsymbol{x}$ are not limited to be\n", + "vectors.\n", + "\n", + "The architecture consists of\n", + "1. An input and an output layer where the input layer is defined by the inputs $\\boldsymbol{x}$. The output layer produces the model ouput $\\boldsymbol{\\tilde{y}}$ which is compared with the target value $\\boldsymbol{y}$\n", + "\n", + "2. A given number of hidden layers and neurons/nodes/units for each layer (this may vary)\n", + "\n", + "3. A given activation function $\\sigma(\\boldsymbol{z})$ with arguments $\\boldsymbol{z}$ to be defined below. The activation functions may differ from layer to layer.\n", + "\n", + "4. The last layer, normally called **output** layer has normally an activation function tailored to the specific problem\n", + "\n", + "5. Finally we define a so-called cost or loss function which is used to gauge the quality of our model." 
+ ] + }, + { + "cell_type": "markdown", + "id": "089704bf", + "metadata": { + "editable": true + }, + "source": [ + "## The optimization problem\n", + "\n", + "The cost function is a function of the unknown parameters\n", + "$\\boldsymbol{\\Theta}$ where the latter is a container for all possible\n", + "parameters needed to define a neural network\n", + "\n", + "If we are dealing with a regression task a typical cost/loss function\n", + "is the mean squared error" + ] + }, + { + "cell_type": "markdown", + "id": "91ef7170", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C(\\boldsymbol{\\Theta})=\\frac{1}{n}\\left\\{\\left(\\boldsymbol{y}-\\boldsymbol{X}\\boldsymbol{\\theta}\\right)^T\\left(\\boldsymbol{y}-\\boldsymbol{X}\\boldsymbol{\\theta}\\right)\\right\\}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "c9402737", + "metadata": { + "editable": true + }, + "source": [ + "This function represents one of many possible ways to define\n", + "the so-called cost function. Note that here we have assumed a linear dependence in terms of the paramters $\\boldsymbol{\\Theta}$. This is in general not the case." + ] + }, + { + "cell_type": "markdown", + "id": "09940e05", + "metadata": { + "editable": true + }, + "source": [ + "## Parameters of neural networks\n", + "For neural networks the parameters\n", + "$\\boldsymbol{\\Theta}$ are given by the so-called weights and biases (to be\n", + "defined below).\n", + "\n", + "The weights are given by matrix elements $w_{ij}^{(l)}$ where the\n", + "superscript indicates the layer number. The biases are typically given\n", + "by vector elements representing each single node of a given layer,\n", + "that is $b_j^{(l)}$." + ] + }, + { + "cell_type": "markdown", + "id": "2bd7b3ff", + "metadata": { + "editable": true + }, + "source": [ + "## Other ingredients of a neural network\n", + "\n", + "Having defined the architecture of a neural network, the optimization\n", + "of the cost function with respect to the parameters $\\boldsymbol{\\Theta}$,\n", + "involves the calculations of gradients and their optimization. The\n", + "gradients represent the derivatives of a multidimensional object and\n", + "are often approximated by various gradient methods, including\n", + "1. various quasi-Newton methods,\n", + "\n", + "2. plain gradient descent (GD) with a constant learning rate $\\eta$,\n", + "\n", + "3. GD with momentum and other approximations to the learning rates such as\n", + "\n", + " * Adapative gradient (ADAgrad)\n", + "\n", + " * Root mean-square propagation (RMSprop)\n", + "\n", + " * Adaptive gradient with momentum (ADAM) and many other\n", + "\n", + "4. Stochastic gradient descent and various families of learning rate approximations" + ] + }, + { + "cell_type": "markdown", + "id": "1a771f02", + "metadata": { + "editable": true + }, + "source": [ + "## Other parameters\n", + "\n", + "In addition to the above, there are often additional hyperparamaters\n", + "which are included in the setup of a neural network. These will be\n", + "discussed below." + ] + }, + { + "cell_type": "markdown", + "id": "3291a232", + "metadata": { + "editable": true + }, + "source": [ + "## Universal approximation theorem\n", + "\n", + "The universal approximation theorem plays a central role in deep\n", + "learning. 
[Cybenko (1989)](https://link.springer.com/article/10.1007/BF02551274) showed\n", + "the following:\n", + "\n", + "Let $\\sigma$ be any continuous sigmoidal function such that" + ] + }, + { + "cell_type": "markdown", + "id": "74cc209d", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\sigma(z) = \\left\\{\\begin{array}{cc} 1 & z\\rightarrow \\infty\\\\ 0 & z \\rightarrow -\\infty \\end{array}\\right.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "fe210f2f", + "metadata": { + "editable": true + }, + "source": [ + "Given a continuous and deterministic function $F(\\boldsymbol{x})$ on the unit\n", + "cube in $d$-dimensions $F\\in [0,1]^d$, $x\\in [0,1]^d$ and a parameter\n", + "$\\epsilon >0$, there is a one-layer (hidden) neural network\n", + "$f(\\boldsymbol{x};\\boldsymbol{\\Theta})$ with $\\boldsymbol{\\Theta}=(\\boldsymbol{W},\\boldsymbol{b})$ and $\\boldsymbol{W}\\in\n", + "\\mathbb{R}^{m\\times n}$ and $\\boldsymbol{b}\\in \\mathbb{R}^{n}$, for which" + ] + }, + { + "cell_type": "markdown", + "id": "4dfec9c9", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\vert F(\\boldsymbol{x})-f(\\boldsymbol{x};\\boldsymbol{\\Theta})\\vert < \\epsilon \\hspace{0.1cm} \\forall \\boldsymbol{x}\\in[0,1]^d.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "a65f0cd5", + "metadata": { + "editable": true + }, + "source": [ + "## Some parallels from real analysis\n", + "\n", + "For those of you familiar with for example the [Stone-Weierstrass\n", + "theorem](https://en.wikipedia.org/wiki/Stone%E2%80%93Weierstrass_theorem)\n", + "for polynomial approximations or the convergence criterion for Fourier\n", + "series, there are similarities in the derivation of the proof for\n", + "neural networks." + ] + }, + { + "cell_type": "markdown", + "id": "d006386b", + "metadata": { + "editable": true + }, + "source": [ + "## The approximation theorem in words\n", + "\n", + "**Any continuous function $y=F(\\boldsymbol{x})$ supported on the unit cube in\n", + "$d$-dimensions can be approximated by a one-layer sigmoidal network to\n", + "arbitrary accuracy.**\n", + "\n", + "[Hornik (1991)](https://www.sciencedirect.com/science/article/abs/pii/089360809190009T) extended the theorem by letting any non-constant, bounded activation function to be included using that the expectation value" + ] + }, + { + "cell_type": "markdown", + "id": "0b094d43", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathbb{E}[\\vert F(\\boldsymbol{x})\\vert^2] =\\int_{\\boldsymbol{x}\\in D} \\vert F(\\boldsymbol{x})\\vert^2p(\\boldsymbol{x})d\\boldsymbol{x} < \\infty.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "f2b9ca56", + "metadata": { + "editable": true + }, + "source": [ + "Then we have" + ] + }, + { + "cell_type": "markdown", + "id": "db4817b0", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathbb{E}[\\vert F(\\boldsymbol{x})-f(\\boldsymbol{x};\\boldsymbol{\\Theta})\\vert^2] =\\int_{\\boldsymbol{x}\\in D} \\vert F(\\boldsymbol{x})-f(\\boldsymbol{x};\\boldsymbol{\\Theta})\\vert^2p(\\boldsymbol{x})d\\boldsymbol{x} < \\epsilon.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "43216143", + "metadata": { + "editable": true + }, + "source": [ + "## More on the general approximation theorem\n", + "\n", + "None of the proofs give any insight into the relation between the\n", + "number of of hidden layers and nodes and the approximation error\n", + "$\\epsilon$, nor the magnitudes of $\\boldsymbol{W}$ and 
$\\boldsymbol{b}$.\n", + "\n", + "Neural networks (NNs) have what we may call a kind of universality no matter what function we want to compute.\n", + "\n", + "It does not mean that an NN can be used to exactly compute any function. Rather, we get an approximation that is as good as we want." + ] + }, + { + "cell_type": "markdown", + "id": "ef48ad88", + "metadata": { + "editable": true + }, + "source": [ + "## Class of functions we can approximate\n", + "\n", + "The class of functions that can be approximated are the continuous ones.\n", + "If the function $F(\\boldsymbol{x})$ is discontinuous, it won't in general be possible to approximate it. However, an NN may still give an approximation even if we fail in some points." + ] + }, + { + "cell_type": "markdown", + "id": "7c4fed36", + "metadata": { + "editable": true + }, + "source": [ + "## Setting up the equations for a neural network\n", + "\n", + "The questions we want to ask are how do changes in the biases and the\n", + "weights in our network change the cost function and how can we use the\n", + "final output to modify the weights and biases?\n", + "\n", + "To derive these equations let us start with a plain regression problem\n", + "and define our cost function as" + ] + }, + { + "cell_type": "markdown", + "id": "c4cf04e8", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "{\\cal C}(\\boldsymbol{\\Theta}) = \\frac{1}{2}\\sum_{i=1}^n\\left(y_i - \\tilde{y}_i\\right)^2,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "ecc9a1bd", + "metadata": { + "editable": true + }, + "source": [ + "where the $y_i$s are our $n$ targets (the values we want to\n", + "reproduce), while the outputs of the network after having propagated\n", + "all inputs $\\boldsymbol{x}$ are given by $\\boldsymbol{\\tilde{y}}_i$." + ] + }, + { + "cell_type": "markdown", + "id": "91e6e150", + "metadata": { + "editable": true + }, + "source": [ + "## Layout of a neural network with three hidden layers\n", + "\n", + "\n", + "\n", + "\n", + "

    Figure 1: Layout of a neural network with three hidden layers.

    \n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "4fabe3cc", + "metadata": { + "editable": true + }, + "source": [ + "## Definitions\n", + "\n", + "With our definition of the targets $\\boldsymbol{y}$, the outputs of the\n", + "network $\\boldsymbol{\\tilde{y}}$ and the inputs $\\boldsymbol{x}$ we\n", + "define now the activation $z_j^l$ of node/neuron/unit $j$ of the\n", + "$l$-th layer as a function of the bias, the weights which add up from\n", + "the previous layer $l-1$ and the forward passes/outputs\n", + "$\\hat{a}^{l-1}$ from the previous layer as" + ] + }, + { + "cell_type": "markdown", + "id": "8c25e4cf", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "z_j^l = \\sum_{i=1}^{M_{l-1}}w_{ij}^la_i^{l-1}+b_j^l,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "ae861380", + "metadata": { + "editable": true + }, + "source": [ + "where $b_k^l$ are the biases from layer $l$. Here $M_{l-1}$\n", + "represents the total number of nodes/neurons/units of layer $l-1$. The\n", + "figure in the whiteboard notes illustrates this equation. We can rewrite this in a more\n", + "compact form as the matrix-vector products we discussed earlier," + ] + }, + { + "cell_type": "markdown", + "id": "2b7f7b74", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\hat{z}^l = \\left(\\hat{W}^l\\right)^T\\hat{a}^{l-1}+\\hat{b}^l.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "e76386ca", + "metadata": { + "editable": true + }, + "source": [ + "## Inputs to the activation function\n", + "\n", + "With the activation values $\\boldsymbol{z}^l$ we can in turn define the\n", + "output of layer $l$ as $\\boldsymbol{a}^l = f(\\boldsymbol{z}^l)$ where $f$ is our\n", + "activation function. In the examples here we will use the sigmoid\n", + "function discussed in our logistic regression lectures. We will also use the same activation function $f$ for all layers\n", + "and their nodes. 
It means we have" + ] + }, + { + "cell_type": "markdown", + "id": "12a9fb38", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "a_j^l = \\sigma(z_j^l) = \\frac{1}{1+\\exp{-(z_j^l)}}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "08bbe824", + "metadata": { + "editable": true + }, + "source": [ + "## Derivatives and the chain rule\n", + "\n", + "From the definition of the activation $z_j^l$ we have" + ] + }, + { + "cell_type": "markdown", + "id": "3783fe53", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial z_j^l}{\\partial w_{ij}^l} = a_i^{l-1},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "2b70d213", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "209db1b2", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial z_j^l}{\\partial a_i^{l-1}} = w_{ji}^l.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "6e42f02f", + "metadata": { + "editable": true + }, + "source": [ + "With our definition of the activation function we have that (note that this function depends only on $z_j^l$)" + ] + }, + { + "cell_type": "markdown", + "id": "78422fdc", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial a_j^l}{\\partial z_j^{l}} = a_j^l(1-a_j^l)=\\sigma(z_j^l)(1-\\sigma(z_j^l)).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "8c8491cf", + "metadata": { + "editable": true + }, + "source": [ + "## Derivative of the cost function\n", + "\n", + "With these definitions we can now compute the derivative of the cost function in terms of the weights.\n", + "\n", + "Let us specialize to the output layer $l=L$. Our cost function is" + ] + }, + { + "cell_type": "markdown", + "id": "82fb3ded", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "{\\cal C}(\\boldsymbol{\\Theta}^L) = \\frac{1}{2}\\sum_{i=1}^n\\left(y_i - \\tilde{y}_i\\right)^2=\\frac{1}{2}\\sum_{i=1}^n\\left(a_i^L - y_i\\right)^2,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "88fe7049", + "metadata": { + "editable": true + }, + "source": [ + "The derivative of this function with respect to the weights is" + ] + }, + { + "cell_type": "markdown", + "id": "af856571", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial{\\cal C}(\\boldsymbol{\\Theta}^L)}{\\partial w_{jk}^L} = \\left(a_j^L - y_j\\right)\\frac{\\partial a_j^L}{\\partial w_{jk}^{L}},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "d684ab45", + "metadata": { + "editable": true + }, + "source": [ + "The last partial derivative can easily be computed and reads (by applying the chain rule)" + ] + }, + { + "cell_type": "markdown", + "id": "ac371b5c", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial a_j^L}{\\partial w_{jk}^{L}} = \\frac{\\partial a_j^L}{\\partial z_{j}^{L}}\\frac{\\partial z_j^L}{\\partial w_{jk}^{L}}=a_j^L(1-a_j^L)a_k^{L-1}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "8dbfe230", + "metadata": { + "editable": true + }, + "source": [ + "## Simpler examples first, and automatic differentiation\n", + "\n", + "In order to understand the back propagation algorithm and its\n", + "derivation (an implementation of the chain rule), let us first digress\n", + "with some simple examples. These examples are also meant to motivate\n", + "the link with back propagation and [automatic differentiation](https://en.wikipedia.org/wiki/Automatic_differentiation). 
We will discuss these topics next week (week 42)." + ] + }, + { + "cell_type": "markdown", + "id": "7244f7f3", + "metadata": { + "editable": true + }, + "source": [ + "## Reminder on the chain rule and gradients\n", + "\n", + "If we have a multivariate function $f(x,y)$ where $x=x(t)$ and $y=y(t)$ are functions of a variable $t$, we have that the gradient of $f$ with respect to $t$ (without the explicit unit vector components)" + ] + }, + { + "cell_type": "markdown", + "id": "ffb80d86", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{df}{dt} = \\begin{bmatrix}\\frac{\\partial f}{\\partial x} & \\frac{\\partial f}{\\partial y} \\end{bmatrix} \\begin{bmatrix}\\frac{\\partial x}{\\partial t} \\\\ \\frac{\\partial y}{\\partial t} \\end{bmatrix}=\\frac{\\partial f}{\\partial x} \\frac{\\partial x}{\\partial t} +\\frac{\\partial f}{\\partial y} \\frac{\\partial y}{\\partial t}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "6f15ef23", + "metadata": { + "editable": true + }, + "source": [ + "## Multivariable functions\n", + "\n", + "If we have a multivariate function $f(x,y)$ where $x=x(t,s)$ and $y=y(t,s)$ are functions of the variables $t$ and $s$, we have that the partial derivatives" + ] + }, + { + "cell_type": "markdown", + "id": "1734d532", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial f}{\\partial s}=\\frac{\\partial f}{\\partial x}\\frac{\\partial x}{\\partial s}+\\frac{\\partial f}{\\partial y}\\frac{\\partial y}{\\partial s},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "8c013e25", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "f416e200", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial f}{\\partial t}=\\frac{\\partial f}{\\partial x}\\frac{\\partial x}{\\partial t}+\\frac{\\partial f}{\\partial y}\\frac{\\partial y}{\\partial t}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "943d440c", + "metadata": { + "editable": true + }, + "source": [ + "the gradient of $f$ with respect to $t$ and $s$ (without the explicit unit vector components)" + ] + }, + { + "cell_type": "markdown", + "id": "9a88f9e3", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{df}{d(s,t)} = \\begin{bmatrix}\\frac{\\partial f}{\\partial x} & \\frac{\\partial f}{\\partial y} \\end{bmatrix} \\begin{bmatrix}\\frac{\\partial x}{\\partial s} &\\frac{\\partial x}{\\partial t} \\\\ \\frac{\\partial y}{\\partial s} & \\frac{\\partial y}{\\partial t} \\end{bmatrix}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "6bc993bf", + "metadata": { + "editable": true + }, + "source": [ + "## Automatic differentiation through examples\n", + "\n", + "A great introduction to automatic differentiation is given by Baydin et al., see .\n", + "See also the video at .\n", + "\n", + "Automatic differentiation is a represented by a repeated application\n", + "of the chain rule on well-known functions and allows for the\n", + "calculation of derivatives to numerical precision. It is not the same\n", + "as the calculation of symbolic derivatives via for example SymPy, nor\n", + "does it use approximative formulae based on Taylor-expansions of a\n", + "function around a given value. The latter are error prone due to\n", + "truncation errors and values of the step size $\\Delta$." 
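To make this distinction concrete, here is a minimal sketch (assuming the **autograd** package, one of the tools we point to later for the exercises, is installed) which differentiates a simple function in three ways. Automatic differentiation reproduces the analytical derivative to machine precision, while the finite-difference estimate carries a step-size dependent error.

```python
import autograd.numpy as np   # thinly wrapped numpy
from autograd import grad     # reverse-mode automatic differentiation

def f(x):
    return np.sin(x**2)

df_ad = grad(f)                     # derivative via automatic differentiation

def df_exact(x):                    # analytical derivative for comparison
    return 2*x*np.cos(x**2)

def df_fd(x, delta=1e-5):           # central finite difference, truncation/round-off errors
    return (f(x + delta) - f(x - delta))/(2*delta)

x0 = 1.5
print(df_ad(x0), df_exact(x0), df_fd(x0))
```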
+ ] + }, + { + "cell_type": "markdown", + "id": "0685fdd2", + "metadata": { + "editable": true + }, + "source": [ + "## Simple example\n", + "\n", + "Our first example is rather simple," + ] + }, + { + "cell_type": "markdown", + "id": "9a2b16de", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "f(x) =\\exp{x^2},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "ba5c3f8a", + "metadata": { + "editable": true + }, + "source": [ + "with derivative" + ] + }, + { + "cell_type": "markdown", + "id": "d0c973a9", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "f'(x) =2x\\exp{x^2}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "34c21223", + "metadata": { + "editable": true + }, + "source": [ + "We can use SymPy to extract the pertinent lines of Python code through the following simple example" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "72fa0f44", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "from __future__ import division\n", + "from sympy import *\n", + "x = symbols('x')\n", + "expr = exp(x*x)\n", + "simplify(expr)\n", + "derivative = diff(expr,x)\n", + "print(python(expr))\n", + "print(python(derivative))" + ] + }, + { + "cell_type": "markdown", + "id": "78884bc6", + "metadata": { + "editable": true + }, + "source": [ + "## Smarter way of evaluating the above function\n", + "If we study this function, we note that we can reduce the number of operations by introducing an intermediate variable" + ] + }, + { + "cell_type": "markdown", + "id": "f13d7286", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "a = x^2,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "443739d9", + "metadata": { + "editable": true + }, + "source": [ + "leading to" + ] + }, + { + "cell_type": "markdown", + "id": "48b45da1", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "f(x) = f(a(x)) = b= \\exp{a}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "81e7fd8f", + "metadata": { + "editable": true + }, + "source": [ + "We now assume that all operations can be counted in terms of equal\n", + "floating point operations. This means that in order to calculate\n", + "$f(x)$ we need first to square $x$ and then compute the exponential. We\n", + "have thus two floating point operations only." + ] + }, + { + "cell_type": "markdown", + "id": "824bbfa1", + "metadata": { + "editable": true + }, + "source": [ + "## Reducing the number of operations\n", + "\n", + "With the introduction of a precalculated quantity $a$ and thereby $f(x)$ we have that the derivative can be written as" + ] + }, + { + "cell_type": "markdown", + "id": "42d2716e", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "f'(x) = 2xb,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "f27855c1", + "metadata": { + "editable": true + }, + "source": [ + "which reduces the number of operations from four in the orginal\n", + "expression to two. This means that if we need to compute $f(x)$ and\n", + "its derivative (a common task in optimizations), we have reduced the\n", + "number of operations from six to four in total.\n", + "\n", + "**Note** that the usage of a symbolic software like SymPy does not\n", + "include such simplifications and the calculations of the function and\n", + "the derivatives yield in general more floating point operations." 
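In code this bookkeeping is trivial; the following sketch is only meant to make the operation count explicit by storing the intermediate results and reusing them for the derivative.

```python
import numpy as np

def f_and_derivative(x):
    # forward sweep: compute and store the intermediate quantities once
    a = x*x           # one operation
    b = np.exp(a)     # one operation, and b is already f(x)
    # the derivative reuses a and b instead of recomputing exp(x**2)
    df = 2*x*b        # two operations
    return b, df

x0 = 1.5
value, derivative = f_and_derivative(x0)
print(value, derivative)
print(np.exp(x0**2), 2*x0*np.exp(x0**2))   # closed-form check
```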
+ ] + }, + { + "cell_type": "markdown", + "id": "d4fe531f", + "metadata": { + "editable": true + }, + "source": [ + "## Chain rule, forward and reverse modes\n", + "\n", + "In the above example we have introduced the variables $a$ and $b$, and our function is" + ] + }, + { + "cell_type": "markdown", + "id": "aba8f666", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "f(x) = f(a(x)) = b= \\exp{a},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "404c698a", + "metadata": { + "editable": true + }, + "source": [ + "with $a=x^2$. We can decompose the derivative of $f$ with respect to $x$ as" + ] + }, + { + "cell_type": "markdown", + "id": "2c73032a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{df}{dx}=\\frac{df}{db}\\frac{db}{da}\\frac{da}{dx}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "95a71a82", + "metadata": { + "editable": true + }, + "source": [ + "We note that since $b=f(x)$ that" + ] + }, + { + "cell_type": "markdown", + "id": "c71b8e66", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{df}{db}=1,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "23998633", + "metadata": { + "editable": true + }, + "source": [ + "leading to" + ] + }, + { + "cell_type": "markdown", + "id": "0708e562", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{df}{dx}=\\frac{db}{da}\\frac{da}{dx}=2x\\exp{x^2},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "ee8c4ade", + "metadata": { + "editable": true + }, + "source": [ + "as before." + ] + }, + { + "cell_type": "markdown", + "id": "860d410c", + "metadata": { + "editable": true + }, + "source": [ + "## Forward and reverse modes\n", + "\n", + "We have that" + ] + }, + { + "cell_type": "markdown", + "id": "064e5852", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{df}{dx}=\\frac{df}{db}\\frac{db}{da}\\frac{da}{dx},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "983c3afe", + "metadata": { + "editable": true + }, + "source": [ + "which we can rewrite either as" + ] + }, + { + "cell_type": "markdown", + "id": "a1f9638f", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{df}{dx}=\\left[\\frac{df}{db}\\frac{db}{da}\\right]\\frac{da}{dx},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "84a07e04", + "metadata": { + "editable": true + }, + "source": [ + "or" + ] + }, + { + "cell_type": "markdown", + "id": "4383650d", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{df}{dx}=\\frac{df}{db}\\left[\\frac{db}{da}\\frac{da}{dx}\\right].\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "36a2d607", + "metadata": { + "editable": true + }, + "source": [ + "The first expression is called reverse mode (or back propagation)\n", + "since we start by evaluating the derivatives at the end point and then\n", + "propagate backwards. This is the standard way of evaluating\n", + "derivatives (gradients) when optimizing the parameters of a neural\n", + "network. In the context of deep learning this is computationally\n", + "more efficient since the output of a neural network consists of either\n", + "one or some few other output variables.\n", + "\n", + "The second equation defines the so-called **forward mode**." 
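For the scalar example above the two groupings can be mimicked in a few lines of Python (a sketch only). With a single input and a single output both orderings give the same result at essentially the same cost; the practical difference between the two modes first shows up when the number of inputs and outputs differ.

```python
import numpy as np

x = 1.5
# forward sweep for f(x) = exp(x**2): store the intermediates
a = x*x
b = np.exp(a)

# local derivatives of the elementary operations
da_dx = 2*x         # a = x**2
db_da = b           # b = exp(a)
df_db = 1.0         # f = b

# reverse mode: seed at the output and work backwards
grad_reverse = (df_db*db_da)*da_dx

# forward mode: seed at the input and work forwards
grad_forward = df_db*(db_da*da_dx)

print(grad_reverse, grad_forward, 2*x*np.exp(x**2))   # all three agree
```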
+ ] + }, + { + "cell_type": "markdown", + "id": "ab0a9ca8", + "metadata": { + "editable": true + }, + "source": [ + "## More complicated function\n", + "\n", + "We increase our ambitions and introduce a slightly more complicated function" + ] + }, + { + "cell_type": "markdown", + "id": "e85a7d29", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "f(x) =\\sqrt{x^2+exp{x^2}},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "91c151e1", + "metadata": { + "editable": true + }, + "source": [ + "with derivative" + ] + }, + { + "cell_type": "markdown", + "id": "037a60e4", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "f'(x) =\\frac{x(1+\\exp{x^2})}{\\sqrt{x^2+exp{x^2}}}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "9f198b96", + "metadata": { + "editable": true + }, + "source": [ + "The corresponding SymPy code reads" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "620b6c3e", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "from __future__ import division\n", + "from sympy import *\n", + "x = symbols('x')\n", + "expr = sqrt(x*x+exp(x*x))\n", + "simplify(expr)\n", + "derivative = diff(expr,x)\n", + "print(python(expr))\n", + "print(python(derivative))" + ] + }, + { + "cell_type": "markdown", + "id": "d1fe5ce8", + "metadata": { + "editable": true + }, + "source": [ + "## Counting the number of floating point operations\n", + "\n", + "A simple count of operations shows that we need five operations for\n", + "the function itself and ten for the derivative. Fifteen operations in total if we wish to proceed with the above codes.\n", + "\n", + "Can we reduce this to\n", + "say half the number of operations?" + ] + }, + { + "cell_type": "markdown", + "id": "746e84de", + "metadata": { + "editable": true + }, + "source": [ + "## Defining intermediate operations\n", + "\n", + "We can indeed reduce the number of operation to half of those listed in the brute force approach above.\n", + "We define the following quantities" + ] + }, + { + "cell_type": "markdown", + "id": "cbb4abde", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "a = x^2,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "640a0037", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "e3b8b12d", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "b = \\exp{x^2} = \\exp{a},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "5b2087bf", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "5c397a99", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "c= a+b,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "c4884822", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "c1834aef", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "d=f(x)=\\sqrt{c}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "aeee8fc4", + "metadata": { + "editable": true + }, + "source": [ + "## New expression for the derivative\n", + "\n", + "With these definitions we obtain the following partial derivatives" + ] + }, + { + "cell_type": "markdown", + "id": "df71e889", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial a}{\\partial x} = 2x,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "358a49a2", + "metadata": { + "editable": true 
+ }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "95138b08", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial b}{\\partial a} = \\exp{a},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "0a0e2f81", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "7fa7f3b5", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial c}{\\partial a} = 1,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "c74442e2", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "2e9ebae8", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial c}{\\partial b} = 1,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "db89516c", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "0bc2735a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial d}{\\partial c} = \\frac{1}{2\\sqrt{c}},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "42e0cb08", + "metadata": { + "editable": true + }, + "source": [ + "and finally" + ] + }, + { + "cell_type": "markdown", + "id": "56ccf1d5", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial f}{\\partial d} = 1.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "557f2482", + "metadata": { + "editable": true + }, + "source": [ + "## Final derivatives\n", + "Our final derivatives are thus" + ] + }, + { + "cell_type": "markdown", + "id": "90eeebe1", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial f}{\\partial c} = \\frac{\\partial f}{\\partial d} \\frac{\\partial d}{\\partial c} = \\frac{1}{2\\sqrt{c}},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "6c2abeb4", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial f}{\\partial b} = \\frac{\\partial f}{\\partial c} \\frac{\\partial c}{\\partial b} = \\frac{1}{2\\sqrt{c}},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "3f5af305", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial f}{\\partial a} = \\frac{\\partial f}{\\partial c} \\frac{\\partial c}{\\partial a}+\n", + "\\frac{\\partial f}{\\partial b} \\frac{\\partial b}{\\partial a} = \\frac{1+\\exp{a}}{2\\sqrt{c}},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b78e9f43", + "metadata": { + "editable": true + }, + "source": [ + "and finally" + ] + }, + { + "cell_type": "markdown", + "id": "d197d721", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial f}{\\partial x} = \\frac{\\partial f}{\\partial a} \\frac{\\partial a}{\\partial x} = \\frac{x(1+\\exp{a})}{\\sqrt{c}},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "17334528", + "metadata": { + "editable": true + }, + "source": [ + "which is just" + ] + }, + { + "cell_type": "markdown", + "id": "f69ca3fd", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial f}{\\partial x} = \\frac{x(1+b)}{d},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "e937d622", + "metadata": { + "editable": true + }, + "source": [ + "and requires only three operations if we can reuse all intermediate variables." 
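Written out as a forward sweep followed by a reverse sweep, the calculation looks as follows. This is only a sketch meant to mirror the list of partial derivatives above, and it reproduces the analytical derivative.

```python
import numpy as np

def f_and_grad(x):
    # forward sweep: build the intermediate variables
    a = x*x
    b = np.exp(a)
    c = a + b
    d = np.sqrt(c)                 # d = f(x)
    # reverse sweep: accumulate the partial derivatives listed above
    df_dd = 1.0
    df_dc = df_dd/(2.0*np.sqrt(c))
    df_db = df_dc                  # dc/db = 1
    df_da = df_dc + df_db*b        # two parents: c (dc/da = 1) and b (db/da = exp(a) = b)
    df_dx = df_da*2*x
    return d, df_dx

x0 = 1.5
value, derivative = f_and_grad(x0)
print(value, derivative)
# compare with the analytical expressions
print(np.sqrt(x0**2 + np.exp(x0**2)),
      x0*(1 + np.exp(x0**2))/np.sqrt(x0**2 + np.exp(x0**2)))
```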
+ ] + }, + { + "cell_type": "markdown", + "id": "8ab7ba6b", + "metadata": { + "editable": true + }, + "source": [ + "## In general not this simple\n", + "\n", + "In general, see the generalization below, unless we can obtain simple\n", + "analytical expressions which we can simplify further, the final\n", + "implementation of automatic differentiation involves repeated\n", + "calculations (and thereby operations) of derivatives of elementary\n", + "functions." + ] + }, + { + "cell_type": "markdown", + "id": "02665ba6", + "metadata": { + "editable": true + }, + "source": [ + "## Automatic differentiation\n", + "\n", + "We can make this example more formal. Automatic differentiation is a\n", + "formalization of the previous example (see graph).\n", + "\n", + "We define $\\boldsymbol{x}\\in x_1,\\dots, x_l$ input variables to a given function $f(\\boldsymbol{x})$ and $x_{l+1},\\dots, x_L$ intermediate variables.\n", + "\n", + "In the above example we have only one input variable, $l=1$ and four intermediate variables, that is" + ] + }, + { + "cell_type": "markdown", + "id": "c473a49a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{bmatrix} x_1=x & x_2 = x^2=a & x_3 =\\exp{a}= b & x_4=c=a+b & x_5 = \\sqrt{c}=d \\end{bmatrix}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "6beeffc2", + "metadata": { + "editable": true + }, + "source": [ + "Furthemore, for $i=l+1, \\dots, L$ (here $i=2,3,4,5$ and $f=x_L=d$), we\n", + "define the elementary functions $g_i(x_{Pa(x_i)})$ where $x_{Pa(x_i)}$ are the parent nodes of the variable $x_i$.\n", + "\n", + "In our case, we have for example for $x_3=g_3(x_{Pa(x_i)})=\\exp{a}$, that $g_3=\\exp{()}$ and $x_{Pa(x_3)}=a$." + ] + }, + { + "cell_type": "markdown", + "id": "814918db", + "metadata": { + "editable": true + }, + "source": [ + "## Chain rule\n", + "\n", + "We can now compute the gradients by back-propagating the derivatives using the chain rule.\n", + "We have defined" + ] + }, + { + "cell_type": "markdown", + "id": "a7a72e3b", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial f}{\\partial x_L} = 1,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "041df7ab", + "metadata": { + "editable": true + }, + "source": [ + "which allows us to find the derivatives of the various variables $x_i$ as" + ] + }, + { + "cell_type": "markdown", + "id": "b687bc51", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial f}{\\partial x_i} = \\sum_{x_j:x_i\\in Pa(x_j)}\\frac{\\partial f}{\\partial x_j} \\frac{\\partial x_j}{\\partial x_i}=\\sum_{x_j:x_i\\in Pa(x_j)}\\frac{\\partial f}{\\partial x_j} \\frac{\\partial g_j}{\\partial x_i}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "5c87f3af", + "metadata": { + "editable": true + }, + "source": [ + "Whenever we have a function which can be expressed as a computation\n", + "graph and the various functions can be expressed in terms of\n", + "elementary functions that are differentiable, then automatic\n", + "differentiation works. The functions may not need to be elementary\n", + "functions, they could also be computer programs, although not all\n", + "programs can be automatically differentiated." + ] + }, + { + "cell_type": "markdown", + "id": "02df0535", + "metadata": { + "editable": true + }, + "source": [ + "## First network example, simple percepetron with one input\n", + "\n", + "As yet another example we define now a simple perceptron model with\n", + "all quantities given by scalars. 
We consider only one input variable\n", + "$x$ and one target value $y$. We define an activation function\n", + "$\\sigma_1$ which takes as input" + ] + }, + { + "cell_type": "markdown", + "id": "dc45fa01", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "z_1 = w_1x+b_1,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "5568395b", + "metadata": { + "editable": true + }, + "source": [ + "where $w_1$ is the weight and $b_1$ is the bias. These are the\n", + "parameters we want to optimize. The output is $a_1=\\sigma(z_1)$ (see\n", + "graph from whiteboard notes). This output is then fed into the\n", + "**cost/loss** function, which we here for the sake of simplicity just\n", + "define as the squared error" + ] + }, + { + "cell_type": "markdown", + "id": "e6ae6f18", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C(x;w_1,b_1)=\\frac{1}{2}(a_1-y)^2.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "7d6abd22", + "metadata": { + "editable": true + }, + "source": [ + "## Layout of a simple neural network with no hidden layer\n", + "\n", + "\n", + "\n", + "\n", + "

    Figure 1: A simple neural network with no hidden layer.

    \n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "1e466108", + "metadata": { + "editable": true + }, + "source": [ + "## Optimizing the parameters\n", + "\n", + "In setting up the feed forward and back propagation parts of the\n", + "algorithm, we need now the derivative of the various variables we want\n", + "to train.\n", + "\n", + "We need" + ] + }, + { + "cell_type": "markdown", + "id": "3b6fd059", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial w_1} \\hspace{0.1cm}\\mathrm{and}\\hspace{0.1cm}\\frac{\\partial C}{\\partial b_1}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "cfad60fc", + "metadata": { + "editable": true + }, + "source": [ + "Using the chain rule we find" + ] + }, + { + "cell_type": "markdown", + "id": "5c5014b3", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial w_1}=\\frac{\\partial C}{\\partial a_1}\\frac{\\partial a_1}{\\partial z_1}\\frac{\\partial z_1}{\\partial w_1}=(a_1-y)\\sigma_1'x,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "1c677323", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "93362833", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial b_1}=\\frac{\\partial C}{\\partial a_1}\\frac{\\partial a_1}{\\partial z_1}\\frac{\\partial z_1}{\\partial b_1}=(a_1-y)\\sigma_1',\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "c857a902", + "metadata": { + "editable": true + }, + "source": [ + "which we later will just define as" + ] + }, + { + "cell_type": "markdown", + "id": "b7b95721", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial a_1}\\frac{\\partial a_1}{\\partial z_1}=\\delta_1.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "e2574534", + "metadata": { + "editable": true + }, + "source": [ + "## Adding a hidden layer\n", + "\n", + "We change our simple model to (see graph)\n", + "a network with just one hidden layer but with scalar variables only.\n", + "\n", + "Our output variable changes to $a_2$ and $a_1$ is now the output from the hidden node and $a_0=x$.\n", + "We have then" + ] + }, + { + "cell_type": "markdown", + "id": "ae7a5afa", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "z_1 = w_1a_0+b_1 \\hspace{0.1cm} \\wedge a_1 = \\sigma_1(z_1),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "7962e138", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "z_2 = w_2a_1+b_2 \\hspace{0.1cm} \\wedge a_2 = \\sigma_2(z_2),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "0add2cb1", + "metadata": { + "editable": true + }, + "source": [ + "and the cost function" + ] + }, + { + "cell_type": "markdown", + "id": "2ea986fc", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C(x;\\boldsymbol{\\Theta})=\\frac{1}{2}(a_2-y)^2,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "683c4849", + "metadata": { + "editable": true + }, + "source": [ + "with $\\boldsymbol{\\Theta}=[w_1,w_2,b_1,b_2]$." + ] + }, + { + "cell_type": "markdown", + "id": "f345670c", + "metadata": { + "editable": true + }, + "source": [ + "## Layout of a simple neural network with one hidden layer\n", + "\n", + "\n", + "\n", + "\n", + "

    Figure 1: A simple neural network with one hidden layer.

    \n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "bb15a76b", + "metadata": { + "editable": true + }, + "source": [ + "## The derivatives\n", + "\n", + "The derivatives are now, using the chain rule again" + ] + }, + { + "cell_type": "markdown", + "id": "d0882362", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial w_2}=\\frac{\\partial C}{\\partial a_2}\\frac{\\partial a_2}{\\partial z_2}\\frac{\\partial z_2}{\\partial w_2}=(a_2-y)\\sigma_2'a_1=\\delta_2a_1,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "3e16d45d", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial b_2}=\\frac{\\partial C}{\\partial a_2}\\frac{\\partial a_2}{\\partial z_2}\\frac{\\partial z_2}{\\partial b_2}=(a_2-y)\\sigma_2'=\\delta_2,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b2a0a41b", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial w_1}=\\frac{\\partial C}{\\partial a_2}\\frac{\\partial a_2}{\\partial z_2}\\frac{\\partial z_2}{\\partial a_1}\\frac{\\partial a_1}{\\partial z_1}\\frac{\\partial z_1}{\\partial w_1}=(a_2-y)\\sigma_2'a_1\\sigma_1'a_0,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "e8f61358", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial b_1}=\\frac{\\partial C}{\\partial a_2}\\frac{\\partial a_2}{\\partial z_2}\\frac{\\partial z_2}{\\partial a_1}\\frac{\\partial a_1}{\\partial z_1}\\frac{\\partial z_1}{\\partial b_1}=(a_2-y)\\sigma_2'\\sigma_1'=\\delta_1.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "5a8258cb", + "metadata": { + "editable": true + }, + "source": [ + "Can you generalize this to more than one hidden layer?" + ] + }, + { + "cell_type": "markdown", + "id": "bb720314", + "metadata": { + "editable": true + }, + "source": [ + "## Important observations\n", + "\n", + "From the above equations we see that the derivatives of the activation\n", + "functions play a central role. If they vanish, the training may\n", + "stop. This is called the vanishing gradient problem, see discussions below. If they become\n", + "large, the parameters $w_i$ and $b_i$ may simply go to infinity. This\n", + "is referenced as the exploding gradient problem." + ] + }, + { + "cell_type": "markdown", + "id": "52217a26", + "metadata": { + "editable": true + }, + "source": [ + "## The training\n", + "\n", + "The training of the parameters is done through various gradient descent approximations with" + ] + }, + { + "cell_type": "markdown", + "id": "eb647e50", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "w_{i}\\leftarrow w_{i}- \\eta \\delta_i a_{i-1},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "cda95964", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "130a2766", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "b_i \\leftarrow b_i-\\eta \\delta_i,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "ac7cc3bc", + "metadata": { + "editable": true + }, + "source": [ + "with $\\eta$ is the learning rate.\n", + "\n", + "One iteration consists of one feed forward step and one back-propagation step. Each back-propagation step does one update of the parameters $\\boldsymbol{\\Theta}$.\n", + "\n", + "For the first hidden layer $a_{i-1}=a_0=x$ for this simple model." 
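Before we turn to the code example below, a small numerical aside on the observation above about vanishing and exploding gradients. The derivative of the sigmoid never exceeds $1/4$, so a product of many factors of the type $w\sigma'(z)$ (which is what appears when the $\delta$'s are propagated through many layers) tends to shrink rapidly when the weights are of order one, while much larger weights can make the same product grow. A rough sketch:

```python
import numpy as np

def sigmoid(z):
    return 1.0/(1.0 + np.exp(-z))

def sigmoid_derivative(z):
    s = sigmoid(z)
    return s*(1 - s)          # never larger than 0.25

rng = np.random.default_rng(2025)
n_layers = 30
z = rng.normal(size=n_layers)

w_small = rng.normal(size=n_layers)   # weights of order one
w_large = 100*w_small                 # the same weights scaled up

# product of 30 factors |w*sigmoid'(z)|, one per layer
print(np.prod(np.abs(w_small*sigmoid_derivative(z))))   # shrinks towards zero
print(np.prod(np.abs(w_large*sigmoid_derivative(z))))   # grows instead
```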
+ ] + }, + { + "cell_type": "markdown", + "id": "cde60cd2", + "metadata": { + "editable": true + }, + "source": [ + "## Code example\n", + "\n", + "The code here implements the above model with one hidden layer and\n", + "scalar variables for the same function we studied in the previous\n", + "example. The code is however set up so that we can add multiple\n", + "inputs $x$ and target values $y$. Note also that we have the\n", + "possibility of defining a feature matrix $\\boldsymbol{X}$ with more than just\n", + "one column for the input values. This will turn useful in our next example. We have also defined matrices and vectors for all of our operations although it is not necessary here." + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "3616dd69", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import numpy as np\n", + "# We use the Sigmoid function as activation function\n", + "def sigmoid(z):\n", + " return 1.0/(1.0+np.exp(-z))\n", + "\n", + "def forwardpropagation(x):\n", + " # weighted sum of inputs to the hidden layer\n", + " z_1 = np.matmul(x, w_1) + b_1\n", + " # activation in the hidden layer\n", + " a_1 = sigmoid(z_1)\n", + " # weighted sum of inputs to the output layer\n", + " z_2 = np.matmul(a_1, w_2) + b_2\n", + " a_2 = z_2\n", + " return a_1, a_2\n", + "\n", + "def backpropagation(x, y):\n", + " a_1, a_2 = forwardpropagation(x)\n", + " # parameter delta for the output layer, note that a_2=z_2 and its derivative wrt z_2 is just 1\n", + " delta_2 = a_2 - y\n", + " print(0.5*((a_2-y)**2))\n", + " # delta for the hidden layer\n", + " delta_1 = np.matmul(delta_2, w_2.T) * a_1 * (1 - a_1)\n", + " # gradients for the output layer\n", + " output_weights_gradient = np.matmul(a_1.T, delta_2)\n", + " output_bias_gradient = np.sum(delta_2, axis=0)\n", + " # gradient for the hidden layer\n", + " hidden_weights_gradient = np.matmul(x.T, delta_1)\n", + " hidden_bias_gradient = np.sum(delta_1, axis=0)\n", + " return output_weights_gradient, output_bias_gradient, hidden_weights_gradient, hidden_bias_gradient\n", + "\n", + "\n", + "# ensure the same random numbers appear every time\n", + "np.random.seed(0)\n", + "# Input variable\n", + "x = np.array([4.0],dtype=np.float64)\n", + "# Target values\n", + "y = 2*x+1.0 \n", + "\n", + "# Defining the neural network, only scalars here\n", + "n_inputs = x.shape\n", + "n_features = 1\n", + "n_hidden_neurons = 1\n", + "n_outputs = 1\n", + "\n", + "# Initialize the network\n", + "# weights and bias in the hidden layer\n", + "w_1 = np.random.randn(n_features, n_hidden_neurons)\n", + "b_1 = np.zeros(n_hidden_neurons) + 0.01\n", + "\n", + "# weights and bias in the output layer\n", + "w_2 = np.random.randn(n_hidden_neurons, n_outputs)\n", + "b_2 = np.zeros(n_outputs) + 0.01\n", + "\n", + "eta = 0.1\n", + "for i in range(50):\n", + " # calculate gradients\n", + " derivW2, derivB2, derivW1, derivB1 = backpropagation(x, y)\n", + " # update weights and biases\n", + " w_2 -= eta * derivW2\n", + " b_2 -= eta * derivB2\n", + " w_1 -= eta * derivW1\n", + " b_1 -= eta * derivB1" + ] + }, + { + "cell_type": "markdown", + "id": "3348a149", + "metadata": { + "editable": true + }, + "source": [ + "We see that after some few iterations (the results do depend on the learning rate however), we get an error which is rather small." 
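To back up the remark about the learning rate, the sketch below repeats the training above (same model, same single data point) for a few values of $\eta$ and prints the final cost. The helper function `train` is introduced here only for illustration; it simply wraps the code above. This also serves as a small warm-up for the exercise below.

```python
import numpy as np

def sigmoid(z):
    return 1.0/(1.0 + np.exp(-z))

def train(eta, n_iter=50, seed=0):
    """Rerun the scalar model above for a given learning rate and return the final cost."""
    rng = np.random.RandomState(seed)
    x = np.array([[4.0]])              # single input
    y = 2*x + 1.0                      # single target
    w_1 = rng.randn(1, 1); b_1 = np.zeros(1) + 0.01
    w_2 = rng.randn(1, 1); b_2 = np.zeros(1) + 0.01
    for _ in range(n_iter):
        # feed forward
        a_1 = sigmoid(x @ w_1 + b_1)
        a_2 = a_1 @ w_2 + b_2          # linear output as in the code above
        # back propagation
        delta_2 = a_2 - y
        delta_1 = (delta_2 @ w_2.T)*a_1*(1 - a_1)
        w_2 -= eta*(a_1.T @ delta_2); b_2 -= eta*delta_2.sum(axis=0)
        w_1 -= eta*(x.T @ delta_1);   b_1 -= eta*delta_1.sum(axis=0)
    a_1 = sigmoid(x @ w_1 + b_1)
    return 0.5*((a_1 @ w_2 + b_2 - y)**2).item()

for eta in [0.001, 0.01, 0.1, 0.5]:
    print(eta, train(eta))
```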
+ ] + }, + { + "cell_type": "markdown", + "id": "b9b47543", + "metadata": { + "editable": true + }, + "source": [ + "## Exercise 1: Including more data\n", + "\n", + "Try to increase the amount of input and\n", + "target/output data. Try also to perform calculations for more values\n", + "of the learning rates. Feel free to add either hyperparameters with an\n", + "$l_1$ norm or an $l_2$ norm and discuss your results.\n", + "Discuss your results as functions of the amount of training data and various learning rates.\n", + "\n", + "**Challenge:** Try to change the activation functions and replace the hard-coded analytical expressions with automatic derivation via either **autograd** or **JAX**." + ] + }, + { + "cell_type": "markdown", + "id": "3d2a82c9", + "metadata": { + "editable": true + }, + "source": [ + "## Simple neural network and the back propagation equations\n", + "\n", + "Let us now try to increase our level of ambition and attempt at setting \n", + "up the equations for a neural network with two input nodes, one hidden\n", + "layer with two hidden nodes and one output layer with one output node/neuron only (see graph)..\n", + "\n", + "We need to define the following parameters and variables with the input layer (layer $(0)$) \n", + "where we label the nodes $x_0$ and $x_1$" + ] + }, + { + "cell_type": "markdown", + "id": "e2bda122", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "x_0 = a_0^{(0)} \\wedge x_1 = a_1^{(0)}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "d4324d91", + "metadata": { + "editable": true + }, + "source": [ + "The hidden layer (layer $(1)$) has nodes which yield the outputs $a_0^{(1)}$ and $a_1^{(1)}$) with weight $\\boldsymbol{w}$ and bias $\\boldsymbol{b}$ parameters" + ] + }, + { + "cell_type": "markdown", + "id": "b3c0b344", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "w_{ij}^{(1)}=\\left\\{w_{00}^{(1)},w_{01}^{(1)},w_{10}^{(1)},w_{11}^{(1)}\\right\\} \\wedge b^{(1)}=\\left\\{b_0^{(1)},b_1^{(1)}\\right\\}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "fb200d12", + "metadata": { + "editable": true + }, + "source": [ + "## Layout of a simple neural network with two input nodes, one hidden layer and one output node\n", + "\n", + "\n", + "\n", + "\n", + "

    Figure 1: A simple neural network with two input nodes, one hidden layer and one output node.

    \n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "5a7e37cd", + "metadata": { + "editable": true + }, + "source": [ + "## The ouput layer\n", + "\n", + "Finally, we have the ouput layer given by layer label $(2)$ with output $a^{(2)}$ and weights and biases to be determined given by the variables" + ] + }, + { + "cell_type": "markdown", + "id": "11f25dfa", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "w_{i}^{(2)}=\\left\\{w_{0}^{(2)},w_{1}^{(2)}\\right\\} \\wedge b^{(2)}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "8755dbae", + "metadata": { + "editable": true + }, + "source": [ + "Our output is $\\tilde{y}=a^{(2)}$ and we define a generic cost function $C(a^{(2)},y;\\boldsymbol{\\Theta})$ where $y$ is the target value (a scalar here).\n", + "The parameters we need to optimize are given by" + ] + }, + { + "cell_type": "markdown", + "id": "51983594", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{\\Theta}=\\left\\{w_{00}^{(1)},w_{01}^{(1)},w_{10}^{(1)},w_{11}^{(1)},w_{0}^{(2)},w_{1}^{(2)},b_0^{(1)},b_1^{(1)},b^{(2)}\\right\\}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "20a70d90", + "metadata": { + "editable": true + }, + "source": [ + "## Compact expressions\n", + "\n", + "We can define the inputs to the activation functions for the various layers in terms of various matrix-vector multiplications and vector additions.\n", + "The inputs to the first hidden layer are" + ] + }, + { + "cell_type": "markdown", + "id": "76e186dc", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{bmatrix}z_0^{(1)} \\\\ z_1^{(1)} \\end{bmatrix}=\\begin{bmatrix}w_{00}^{(1)} & w_{01}^{(1)}\\\\ w_{10}^{(1)} &w_{11}^{(1)} \\end{bmatrix}\\begin{bmatrix}a_0^{(0)} \\\\ a_1^{(0)} \\end{bmatrix}+\\begin{bmatrix}b_0^{(1)} \\\\ b_1^{(1)} \\end{bmatrix},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "3396d1b9", + "metadata": { + "editable": true + }, + "source": [ + "with outputs" + ] + }, + { + "cell_type": "markdown", + "id": "2f4d2eed", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{bmatrix}a_0^{(1)} \\\\ a_1^{(1)} \\end{bmatrix}=\\begin{bmatrix}\\sigma^{(1)}(z_0^{(1)}) \\\\ \\sigma^{(1)}(z_1^{(1)}) \\end{bmatrix}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "6863edaa", + "metadata": { + "editable": true + }, + "source": [ + "## Output layer\n", + "\n", + "For the final output layer we have the inputs to the final activation function" + ] + }, + { + "cell_type": "markdown", + "id": "569b5a62", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "z^{(2)} = w_{0}^{(2)}a_0^{(1)} +w_{1}^{(2)}a_1^{(1)}+b^{(2)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "88775a53", + "metadata": { + "editable": true + }, + "source": [ + "resulting in the output" + ] + }, + { + "cell_type": "markdown", + "id": "11852c41", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "a^{(2)}=\\sigma^{(2)}(z^{(2)}).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "4e2e26a9", + "metadata": { + "editable": true + }, + "source": [ + "## Explicit derivatives\n", + "\n", + "In total we have nine parameters which we need to train. Using the\n", + "chain rule (or just the back-propagation algorithm) we can find all\n", + "derivatives. 
Since we will use automatic differentiation in reverse\n", + "mode, we start with the derivatives of the cost function with respect\n", + "to the parameters of the output layer, namely" + ] + }, + { + "cell_type": "markdown", + "id": "25da37b5", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial w_{i}^{(2)}}=\\frac{\\partial C}{\\partial a^{(2)}}\\frac{\\partial a^{(2)}}{\\partial z^{(2)}}\\frac{\\partial z^{(2)}}{\\partial w_{i}^{(2)}}=\\delta^{(2)}a_i^{(1)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "4094b188", + "metadata": { + "editable": true + }, + "source": [ + "with" + ] + }, + { + "cell_type": "markdown", + "id": "99f40072", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta^{(2)}=\\frac{\\partial C}{\\partial a^{(2)}}\\frac{\\partial a^{(2)}}{\\partial z^{(2)}}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "a93180cb", + "metadata": { + "editable": true + }, + "source": [ + "and finally" + ] + }, + { + "cell_type": "markdown", + "id": "312c8e22", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial b^{(2)}}=\\frac{\\partial C}{\\partial a^{(2)}}\\frac{\\partial a^{(2)}}{\\partial z^{(2)}}\\frac{\\partial z^{(2)}}{\\partial b^{(2)}}=\\delta^{(2)}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "4db8065c", + "metadata": { + "editable": true + }, + "source": [ + "## Derivatives of the hidden layer\n", + "\n", + "Using the chain rule we have the following expressions for say one of the weight parameters (it is easy to generalize to the other weight parameters)" + ] + }, + { + "cell_type": "markdown", + "id": "316b7cc7", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial w_{00}^{(1)}}=\\frac{\\partial C}{\\partial a^{(2)}}\\frac{\\partial a^{(2)}}{\\partial z^{(2)}}\n", + "\\frac{\\partial z^{(2)}}{\\partial z_0^{(1)}}\\frac{\\partial z_0^{(1)}}{\\partial w_{00}^{(1)}}= \\delta^{(2)}\\frac{\\partial z^{(2)}}{\\partial z_0^{(1)}}\\frac{\\partial z_0^{(1)}}{\\partial w_{00}^{(1)}},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "8ef16e76", + "metadata": { + "editable": true + }, + "source": [ + "which, noting that" + ] + }, + { + "cell_type": "markdown", + "id": "85a0f70d", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "z^{(2)} =w_0^{(2)}a_0^{(1)}+w_1^{(2)}a_1^{(1)}+b^{(2)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "108db06e", + "metadata": { + "editable": true + }, + "source": [ + "allows us to rewrite" + ] + }, + { + "cell_type": "markdown", + "id": "2922e5c6", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial z^{(2)}}{\\partial z_0^{(1)}}\\frac{\\partial z_0^{(1)}}{\\partial w_{00}^{(1)}}=w_0^{(2)}\\frac{\\partial a_0^{(1)}}{\\partial z_0^{(1)}}a_0^{(1)}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "cb6f6fe5", + "metadata": { + "editable": true + }, + "source": [ + "## Final expression\n", + "Defining" + ] + }, + { + "cell_type": "markdown", + "id": "3a0d272d", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_0^{(1)}=w_0^{(2)}\\frac{\\partial a_0^{(1)}}{\\partial z_0^{(1)}}\\delta^{(2)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "70a6cf5c", + "metadata": { + "editable": true + }, + "source": [ + "we have" + ] + }, + { + "cell_type": "markdown", + "id": "a862fb73", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial 
C}{\\partial w_{00}^{(1)}}=\\delta_0^{(1)}a_0^{(1)}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "703fa2c1", + "metadata": { + "editable": true + }, + "source": [ + "Similarly, we obtain" + ] + }, + { + "cell_type": "markdown", + "id": "2032458a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial w_{01}^{(1)}}=\\delta_0^{(1)}a_1^{(1)}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "97d8acd7", + "metadata": { + "editable": true + }, + "source": [ + "## Completing the list\n", + "\n", + "Similarly, we find" + ] + }, + { + "cell_type": "markdown", + "id": "972e5301", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial w_{10}^{(1)}}=\\delta_1^{(1)}a_0^{(1)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "ba8f5955", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "3ac41463", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial w_{11}^{(1)}}=\\delta_1^{(1)}a_1^{(1)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "ab92a69c", + "metadata": { + "editable": true + }, + "source": [ + "where we have defined" + ] + }, + { + "cell_type": "markdown", + "id": "8224b6f2", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_1^{(1)}=w_1^{(2)}\\frac{\\partial a_1^{(1)}}{\\partial z_1^{(1)}}\\delta^{(2)}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b55a566b", + "metadata": { + "editable": true + }, + "source": [ + "## Final expressions for the biases of the hidden layer\n", + "\n", + "For the sake of completeness, we list the derivatives of the biases, which are" + ] + }, + { + "cell_type": "markdown", + "id": "cb5f687e", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial b_{0}^{(1)}}=\\delta_0^{(1)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "6d8361e8", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "ccfb7fa8", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial b_{1}^{(1)}}=\\delta_1^{(1)}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "20fd0aa3", + "metadata": { + "editable": true + }, + "source": [ + "As we will see below, these expressions can be generalized in a more compact form." 
+ ] + }, + { + "cell_type": "markdown", + "id": "6bca7f99", + "metadata": { + "editable": true + }, + "source": [ + "## Gradient expressions\n", + "\n", + "For this specific model, with just one output node and two hidden\n", + "nodes, the gradient descent equations take the following form for output layer" + ] + }, + { + "cell_type": "markdown", + "id": "430e26d8", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "w_{i}^{(2)}\\leftarrow w_{i}^{(2)}- \\eta \\delta^{(2)} a_{i}^{(1)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "ced71f83", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "ec12ee1a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "b^{(2)} \\leftarrow b^{(2)}-\\eta \\delta^{(2)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "f46fe24d", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "af8f924d", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "w_{ij}^{(1)}\\leftarrow w_{ij}^{(1)}- \\eta \\delta_{i}^{(1)} a_{j}^{(0)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "4aeb6140", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "0bc2f26c", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "b_{i}^{(1)} \\leftarrow b_{i}^{(1)}-\\eta \\delta_{i}^{(1)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "7eafd358", + "metadata": { + "editable": true + }, + "source": [ + "where $\\eta$ is the learning rate." + ] + }, + { + "cell_type": "markdown", + "id": "548f58f6", + "metadata": { + "editable": true + }, + "source": [ + "## Exercise 2: Extended program\n", + "\n", + "We extend our simple code to a function which depends on two variable $x_0$ and $x_1$, that is" + ] + }, + { + "cell_type": "markdown", + "id": "4c38514a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y=f(x_0,x_1)=x_0^2+3x_0x_1+x_1^2+5.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "06303245", + "metadata": { + "editable": true + }, + "source": [ + "We feed our network with $n=100$ entries $x_0$ and $x_1$. We have thus two features represented by these variable and an input matrix/design matrix $\\boldsymbol{X}\\in \\mathbf{R}^{n\\times 2}$" + ] + }, + { + "cell_type": "markdown", + "id": "ed0c0029", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{X}=\\begin{bmatrix} x_{00} & x_{01} \\\\ x_{00} & x_{01} \\\\ x_{10} & x_{11} \\\\ x_{20} & x_{21} \\\\ \\dots & \\dots \\\\ \\dots & \\dots \\\\ x_{n-20} & x_{n-21} \\\\ x_{n-10} & x_{n-11} \\end{bmatrix}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "93df389e", + "metadata": { + "editable": true + }, + "source": [ + "Write a code, based on the previous code examples, which takes as input these data and fit the above function.\n", + "You can extend your code to include automatic differentiation.\n", + "\n", + "With these examples, we are now ready to embark upon the writing of more a general code for neural networks." + ] + }, + { + "cell_type": "markdown", + "id": "5df18704", + "metadata": { + "editable": true + }, + "source": [ + "## Getting serious, the back propagation equations for a neural network\n", + "\n", + "Now it is time to move away from one node in each layer only. 
Our inputs are also represented either by several inputs.\n", + "\n", + "We have thus" + ] + }, + { + "cell_type": "markdown", + "id": "ae3765be", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial{\\cal C}((\\boldsymbol{\\Theta}^L)}{\\partial w_{jk}^L} = \\left(a_j^L - y_j\\right)a_j^L(1-a_j^L)a_k^{L-1},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "dd8f7882", + "metadata": { + "editable": true + }, + "source": [ + "Defining" + ] + }, + { + "cell_type": "markdown", + "id": "f204fdd7", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_j^L = a_j^L(1-a_j^L)\\left(a_j^L - y_j\\right) = \\sigma'(z_j^L)\\frac{\\partial {\\cal C}}{\\partial (a_j^L)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "c28e8401", + "metadata": { + "editable": true + }, + "source": [ + "and using the Hadamard product of two vectors we can write this as" + ] + }, + { + "cell_type": "markdown", + "id": "910c4eb1", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{\\delta}^L = \\sigma'(\\hat{z}^L)\\circ\\frac{\\partial {\\cal C}}{\\partial (\\boldsymbol{a}^L)}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "efd2f948", + "metadata": { + "editable": true + }, + "source": [ + "## Analyzing the last results\n", + "\n", + "This is an important expression. The second term on the right handside\n", + "measures how fast the cost function is changing as a function of the $j$th\n", + "output activation. If, for example, the cost function doesn't depend\n", + "much on a particular output node $j$, then $\\delta_j^L$ will be small,\n", + "which is what we would expect. The first term on the right, measures\n", + "how fast the activation function $f$ is changing at a given activation\n", + "value $z_j^L$." + ] + }, + { + "cell_type": "markdown", + "id": "e1eeeba2", + "metadata": { + "editable": true + }, + "source": [ + "## More considerations\n", + "\n", + "Notice that everything in the above equations is easily computed. In\n", + "particular, we compute $z_j^L$ while computing the behaviour of the\n", + "network, and it is only a small additional overhead to compute\n", + "$\\sigma'(z^L_j)$. 
The exact form of the derivative with respect to the\n", + "output depends on the form of the cost function.\n", + "However, provided the cost function is known there should be little\n", + "trouble in calculating" + ] + }, + { + "cell_type": "markdown", + "id": "b5e74c11", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial {\\cal C}}{\\partial (a_j^L)}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "e129fe72", + "metadata": { + "editable": true + }, + "source": [ + "With the definition of $\\delta_j^L$ we have a more compact definition of the derivative of the cost function in terms of the weights, namely" + ] + }, + { + "cell_type": "markdown", + "id": "3879d293", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial{\\cal C}}{\\partial w_{jk}^L} = \\delta_j^La_k^{L-1}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "1ea1da9d", + "metadata": { + "editable": true + }, + "source": [ + "## Derivatives in terms of $z_j^L$\n", + "\n", + "It is also easy to see that our previous equation can be written as" + ] + }, + { + "cell_type": "markdown", + "id": "c7156e16", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_j^L =\\frac{\\partial {\\cal C}}{\\partial z_j^L}= \\frac{\\partial {\\cal C}}{\\partial a_j^L}\\frac{\\partial a_j^L}{\\partial z_j^L},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "8311b4aa", + "metadata": { + "editable": true + }, + "source": [ + "which can also be interpreted as the partial derivative of the cost function with respect to the biases $b_j^L$, namely" + ] + }, + { + "cell_type": "markdown", + "id": "7bb3d820", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_j^L = \\frac{\\partial {\\cal C}}{\\partial b_j^L}\\frac{\\partial b_j^L}{\\partial z_j^L}=\\frac{\\partial {\\cal C}}{\\partial b_j^L},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "1eeb0c00", + "metadata": { + "editable": true + }, + "source": [ + "That is, the error $\\delta_j^L$ is exactly equal to the rate of change of the cost function as a function of the bias." + ] + }, + { + "cell_type": "markdown", + "id": "bc7d3757", + "metadata": { + "editable": true + }, + "source": [ + "## Bringing it together\n", + "\n", + "We have now three equations that are essential for the computations of the derivatives of the cost function at the output layer. These equations are needed to start the algorithm and they are" + ] + }, + { + "cell_type": "markdown", + "id": "9f018cff", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation}\n", + "\\frac{\\partial{\\cal C}(\\hat{W^L})}{\\partial w_{jk}^L} = \\delta_j^La_k^{L-1},\n", + "\\label{_auto1} \\tag{2}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "ebde7551", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "f96aa8f7", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation}\n", + "\\delta_j^L = \\sigma'(z_j^L)\\frac{\\partial {\\cal C}}{\\partial (a_j^L)},\n", + "\\label{_auto2} \\tag{3}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "1215d118", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "c5f6885e", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation}\n", + "\\delta_j^L = \\frac{\\partial {\\cal C}}{\\partial b_j^L},\n", + "\\label{_auto3} \\tag{4}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "1dedde99", + "metadata": { + "editable": true + }, + "source": [ + "## Final back propagating equation\n", + "\n", + "We have that (replacing $L$ with a general layer $l$)" + ] + }, + { + "cell_type": "markdown", + "id": "a182b912", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_j^l =\\frac{\\partial {\\cal C}}{\\partial z_j^l}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "9fcc3201", + "metadata": { + "editable": true + }, + "source": [ + "We want to express this in terms of the equations for layer $l+1$." + ] + }, + { + "cell_type": "markdown", + "id": "54237463", + "metadata": { + "editable": true + }, + "source": [ + "## Using the chain rule and summing over all $k$ entries\n", + "\n", + "We obtain" + ] + }, + { + "cell_type": "markdown", + "id": "dc069f5a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_j^l =\\sum_k \\frac{\\partial {\\cal C}}{\\partial z_k^{l+1}}\\frac{\\partial z_k^{l+1}}{\\partial z_j^{l}}=\\sum_k \\delta_k^{l+1}\\frac{\\partial z_k^{l+1}}{\\partial z_j^{l}},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "71ba0435", + "metadata": { + "editable": true + }, + "source": [ + "and recalling that" + ] + }, + { + "cell_type": "markdown", + "id": "bd00cbe9", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "z_j^{l+1} = \\sum_{i=1}^{M_{l}}w_{ij}^{l+1}a_i^{l}+b_j^{l+1},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "1e7e0241", + "metadata": { + "editable": true + }, + "source": [ + "with $M_l$ being the number of nodes in layer $l$, we obtain" + ] + }, + { + "cell_type": "markdown", + "id": "e8e3697e", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_j^l =\\sum_k \\delta_k^{l+1}w_{kj}^{l+1}\\sigma'(z_j^l),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "7d86a02b", + "metadata": { + "editable": true + }, + "source": [ + "This is our final equation.\n", + "\n", + "We are now ready to set up the algorithm for back propagation and learning the weights and biases." + ] + }, + { + "cell_type": "markdown", + "id": "ff1dc46f", + "metadata": { + "editable": true + }, + "source": [ + "## Setting up the back propagation algorithm\n", + "\n", + "The four equations provide us with a way of computing the gradient of the cost function. Let us write this out in the form of an algorithm.\n", + "\n", + "**First**, we set up the input data $\\hat{x}$ and the activations\n", + "$\\hat{z}_1$ of the input layer and compute the activation function and\n", + "the pertinent outputs $\\hat{a}^1$.\n", + "\n", + "**Secondly**, we perform then the feed forward till we reach the output\n", + "layer and compute all $\\hat{z}_l$ of the input layer and compute the\n", + "activation function and the pertinent outputs $\\hat{a}^l$ for\n", + "$l=1,2,3,\\dots,L$.\n", + "\n", + "**Notation**: The first hidden layer has $l=1$ as label and the final output layer has $l=L$." 
+ ] + }, + { + "cell_type": "markdown", + "id": "1313e6dc", + "metadata": { + "editable": true + }, + "source": [ + "## Setting up the back propagation algorithm, part 2\n", + "\n", + "Thereafter we compute the ouput error $\\hat{\\delta}^L$ by computing all" + ] + }, + { + "cell_type": "markdown", + "id": "74378773", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_j^L = \\sigma'(z_j^L)\\frac{\\partial {\\cal C}}{\\partial (a_j^L)}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "70450254", + "metadata": { + "editable": true + }, + "source": [ + "Then we compute the back propagate error for each $l=L-1,L-2,\\dots,1$ as" + ] + }, + { + "cell_type": "markdown", + "id": "81a28b23", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_j^l = \\sum_k \\delta_k^{l+1}w_{kj}^{l+1}\\sigma'(z_j^l).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "9a733356", + "metadata": { + "editable": true + }, + "source": [ + "## Setting up the Back propagation algorithm, part 3\n", + "\n", + "Finally, we update the weights and the biases using gradient descent\n", + "for each $l=L-1,L-2,\\dots,1$ and update the weights and biases\n", + "according to the rules" + ] + }, + { + "cell_type": "markdown", + "id": "f469f486", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "w_{jk}^l\\leftarrow = w_{jk}^l- \\eta \\delta_j^la_k^{l-1},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "7461e5e6", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "b_j^l \\leftarrow b_j^l-\\eta \\frac{\\partial {\\cal C}}{\\partial b_j^l}=b_j^l-\\eta \\delta_j^l,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "50a1b605", + "metadata": { + "editable": true + }, + "source": [ + "with $\\eta$ being the learning rate." 
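The three steps above translate almost line by line into code. The following is a minimal sketch (not a reference implementation) for a fully connected network with sigmoid activations in all layers and the squared-error cost, acting on a single input vector; extending it to several samples amounts to summing the gradients over a mini-batch.

```python
import numpy as np

def sigmoid(z):
    return 1.0/(1.0 + np.exp(-z))

def backpropagation(x, y, weights, biases, eta=0.1):
    """One feed-forward plus one back-propagation step for a single input x.

    weights[l] has shape (n_l, n_{l-1}) so that z^l = W^l a^{l-1} + b^l,
    matching w_{jk}^l multiplying a_k^{l-1}. Sigmoid activations and the
    squared-error cost 0.5*(a^L - y)^2 are assumed.
    """
    L = len(weights)
    # feed forward, storing all z^l and a^l
    a = [x]; zs = []
    for W, b in zip(weights, biases):
        z = W @ a[-1] + b
        zs.append(z)
        a.append(sigmoid(z))
    # output error: delta^L = sigma'(z^L) * dC/da^L
    s = sigmoid(zs[-1])
    delta = s*(1 - s)*(a[-1] - y)
    deltas = [delta]
    # back propagate: delta^l = (W^{l+1}.T delta^{l+1}) * sigma'(z^l)
    for l in range(L - 2, -1, -1):
        s = sigmoid(zs[l])
        delta = (weights[l + 1].T @ delta)*s*(1 - s)
        deltas.insert(0, delta)
    # gradient-descent update of all weights and biases
    for l in range(L):
        weights[l] -= eta*np.outer(deltas[l], a[l])   # a[l] is a^{l-1}
        biases[l]  -= eta*deltas[l]
    return 0.5*np.sum((a[-1] - y)**2)

# tiny example: 2 inputs, one hidden layer with 3 nodes, 1 output
rng = np.random.default_rng(0)
sizes = [2, 3, 1]
weights = [rng.normal(size=(m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
biases  = [np.zeros(m) + 0.01 for m in sizes[1:]]
x = np.array([1.0, 2.0]); y = np.array([0.5])

for i in range(100):
    cost = backpropagation(x, y, weights, biases)
print(cost)   # the cost should have decreased substantially
```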
+ ] + }, + { + "cell_type": "markdown", + "id": "0cebce43", + "metadata": { + "editable": true + }, + "source": [ + "## Updating the gradients\n", + "\n", + "With the back propagate error for each $l=L-1,L-2,\\dots,1$ as" + ] + }, + { + "cell_type": "markdown", + "id": "2e4405bd", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_j^l = \\sum_k \\delta_k^{l+1}w_{kj}^{l+1}sigma'(z_j^l),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "2920aa4e", + "metadata": { + "editable": true + }, + "source": [ + "we update the weights and the biases using gradient descent for each $l=L-1,L-2,\\dots,1$ and update the weights and biases according to the rules" + ] + }, + { + "cell_type": "markdown", + "id": "bc4357b0", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "w_{jk}^l\\leftarrow = w_{jk}^l- \\eta \\delta_j^la_k^{l-1},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "d9b66569", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "b_j^l \\leftarrow b_j^l-\\eta \\frac{\\partial {\\cal C}}{\\partial b_j^l}=b_j^l-\\eta \\delta_j^l,\n", + "$$" + ] + } + ], + "metadata": {}, + "nbformat": 4, + "nbformat_minor": 5 +} \ No newline at end of file diff --git a/doc/LectureNotes/_build/jupyter_execute/week42.ipynb b/doc/LectureNotes/_build/jupyter_execute/week42.ipynb new file mode 100644 index 000000000..c3af30dea --- /dev/null +++ b/doc/LectureNotes/_build/jupyter_execute/week42.ipynb @@ -0,0 +1,5952 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "d231eeee", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "5e782cb1", + "metadata": { + "editable": true + }, + "source": [ + "# Week 42 Constructing a Neural Network code with examples\n", + "**Morten Hjorth-Jensen**, Department of Physics, University of Oslo, Norway\n", + "\n", + "Date: **October 13-17, 2025**" + ] + }, + { + "cell_type": "markdown", + "id": "53309290", + "metadata": { + "editable": true + }, + "source": [ + "## Lecture October 13, 2025\n", + "1. Building our own Feed-forward Neural Network and discussion of project 2\n", + "\n", + "2. Project 2 is available at " + ] + }, + { + "cell_type": "markdown", + "id": "71367514", + "metadata": { + "editable": true + }, + "source": [ + "## Readings and videos\n", + "1. These lecture notes\n", + "\n", + "2. Video of lecture at \n", + "\n", + "3. Whiteboard notes at \n", + "\n", + "4. For a more in depth discussion on neural networks we recommend Goodfellow et al chapters 6 and 7. For the optimization part, see chapter 8. \n", + "\n", + "5. Neural Networks demystified at \n", + "\n", + "6. Building Neural Networks from scratch at \n", + "\n", + "7. Video on Neural Networks at \n", + "\n", + "8. Video on the back propagation algorithm at \n", + "\n", + "I also recommend Michael Nielsen's intuitive approach to the neural networks and the universal approximation theorem, see the slides at ." + ] + }, + { + "cell_type": "markdown", + "id": "c7be87be", + "metadata": { + "editable": true + }, + "source": [ + "## Material for the lab sessions on Tuesday and Wednesday\n", + "1. Exercises on writing a code for neural networks, back propagation part, see exercises for week 42 at \n", + "\n", + "2. 
Discussion of project 2" + ] + }, + { + "cell_type": "markdown", + "id": "8e0567a2", + "metadata": { + "editable": true + }, + "source": [ + "## Lecture material: Writing a code which implements a feed-forward neural network\n", + "\n", + "Last week we discussed the basics of neural networks and deep learning\n", + "and the basics of automatic differentiation. We looked also at\n", + "examples on how compute the parameters of a simple network with scalar\n", + "inputs and ouputs and no or just one hidden layers.\n", + "\n", + "We ended our discussions with the derivation of the equations for a\n", + "neural network with one hidden layers and two input variables and two\n", + "hidden nodes but only one output node. We did almost finish the derivation of the back propagation algorithm." + ] + }, + { + "cell_type": "markdown", + "id": "549dcc05", + "metadata": { + "editable": true + }, + "source": [ + "## Mathematics of deep learning\n", + "\n", + "**Two recent books online.**\n", + "\n", + "1. [The Modern Mathematics of Deep Learning, by Julius Berner, Philipp Grohs, Gitta Kutyniok, Philipp Petersen](https://arxiv.org/abs/2105.04026), published as [Mathematical Aspects of Deep Learning, pp. 1-111. Cambridge University Press, 2022](https://doi.org/10.1017/9781009025096.002)\n", + "\n", + "2. [Mathematical Introduction to Deep Learning: Methods, Implementations, and Theory, Arnulf Jentzen, Benno Kuckuck, Philippe von Wurstemberger](https://doi.org/10.48550/arXiv.2310.20360)" + ] + }, + { + "cell_type": "markdown", + "id": "21203bae", + "metadata": { + "editable": true + }, + "source": [ + "## Reminder on books with hands-on material and codes\n", + "* [Sebastian Rashcka et al, Machine learning with Sickit-Learn and PyTorch](https://sebastianraschka.com/blog/2022/ml-pytorch-book.html)" + ] + }, + { + "cell_type": "markdown", + "id": "1c102a30", + "metadata": { + "editable": true + }, + "source": [ + "## Reading recommendations\n", + "\n", + "1. Rashkca et al., chapter 11, jupyter-notebook sent separately, from [GitHub](https://github.com/rasbt/machine-learning-book)\n", + "\n", + "2. Goodfellow et al, chapter 6 and 7 contain most of the neural network background." + ] + }, + { + "cell_type": "markdown", + "id": "53f11afe", + "metadata": { + "editable": true + }, + "source": [ + "## Reminder from last week: First network example, simple percepetron with one input\n", + "\n", + "As yet another example we define now a simple perceptron model with\n", + "all quantities given by scalars. We consider only one input variable\n", + "$x$ and one target value $y$. We define an activation function\n", + "$\\sigma_1$ which takes as input" + ] + }, + { + "cell_type": "markdown", + "id": "afa8c42a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "z_1 = w_1x+b_1,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "cb5c959f", + "metadata": { + "editable": true + }, + "source": [ + "where $w_1$ is the weight and $b_1$ is the bias. These are the\n", + "parameters we want to optimize. The output is $a_1=\\sigma(z_1)$ (see\n", + "graph from whiteboard notes). 
This output is then fed into the\n", + "**cost/loss** function, which we here for the sake of simplicity just\n", + "define as the squared error" + ] + }, + { + "cell_type": "markdown", + "id": "0083ae15", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C(x;w_1,b_1)=\\frac{1}{2}(a_1-y)^2.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "f4931203", + "metadata": { + "editable": true + }, + "source": [ + "## Layout of a simple neural network with no hidden layer\n", + "\n", + "\n", + "\n", + "\n", + "

Figure 1: A simple neural network with no hidden layer.

    \n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "d3a3754d", + "metadata": { + "editable": true + }, + "source": [ + "## Optimizing the parameters\n", + "\n", + "In setting up the feed forward and back propagation parts of the\n", + "algorithm, we need now the derivative of the various variables we want\n", + "to train.\n", + "\n", + "We need" + ] + }, + { + "cell_type": "markdown", + "id": "bcd5dbab", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial w_1} \\hspace{0.1cm}\\mathrm{and}\\hspace{0.1cm}\\frac{\\partial C}{\\partial b_1}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "2cbc30f1", + "metadata": { + "editable": true + }, + "source": [ + "Using the chain rule we find" + ] + }, + { + "cell_type": "markdown", + "id": "1a1d803d", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial w_1}=\\frac{\\partial C}{\\partial a_1}\\frac{\\partial a_1}{\\partial z_1}\\frac{\\partial z_1}{\\partial w_1}=(a_1-y)\\sigma_1'x,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "776735c7", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "c1a2e5af", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial b_1}=\\frac{\\partial C}{\\partial a_1}\\frac{\\partial a_1}{\\partial z_1}\\frac{\\partial z_1}{\\partial b_1}=(a_1-y)\\sigma_1',\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "9e603df9", + "metadata": { + "editable": true + }, + "source": [ + "which we later will just define as" + ] + }, + { + "cell_type": "markdown", + "id": "533212cd", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial a_1}\\frac{\\partial a_1}{\\partial z_1}=\\delta_1.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "09d91067", + "metadata": { + "editable": true + }, + "source": [ + "## Adding a hidden layer\n", + "\n", + "We change our simple model to (see graph)\n", + "a network with just one hidden layer but with scalar variables only.\n", + "\n", + "Our output variable changes to $a_2$ and $a_1$ is now the output from the hidden node and $a_0=x$.\n", + "We have then" + ] + }, + { + "cell_type": "markdown", + "id": "f767afe7", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "z_1 = w_1a_0+b_1 \\hspace{0.1cm} \\wedge a_1 = \\sigma_1(z_1),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "f38ded54", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "z_2 = w_2a_1+b_2 \\hspace{0.1cm} \\wedge a_2 = \\sigma_2(z_2),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "f3f03bc3", + "metadata": { + "editable": true + }, + "source": [ + "and the cost function" + ] + }, + { + "cell_type": "markdown", + "id": "9062730e", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C(x;\\boldsymbol{\\Theta})=\\frac{1}{2}(a_2-y)^2,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "75bbc32c", + "metadata": { + "editable": true + }, + "source": [ + "with $\\boldsymbol{\\Theta}=[w_1,w_2,b_1,b_2]$." + ] + }, + { + "cell_type": "markdown", + "id": "fcf02dbf", + "metadata": { + "editable": true + }, + "source": [ + "## Layout of a simple neural network with one hidden layer\n", + "\n", + "\n", + "\n", + "\n", + "

Figure 1: A simple neural network with one hidden layer.

    \n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "aa97678f", + "metadata": { + "editable": true + }, + "source": [ + "## The derivatives\n", + "\n", + "The derivatives are now, using the chain rule again" + ] + }, + { + "cell_type": "markdown", + "id": "98f68e27", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial w_2}=\\frac{\\partial C}{\\partial a_2}\\frac{\\partial a_2}{\\partial z_2}\\frac{\\partial z_2}{\\partial w_2}=(a_2-y)\\sigma_2'a_1=\\delta_2a_1,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "c4528178", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial b_2}=\\frac{\\partial C}{\\partial a_2}\\frac{\\partial a_2}{\\partial z_2}\\frac{\\partial z_2}{\\partial b_2}=(a_2-y)\\sigma_2'=\\delta_2,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "d6304298", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial w_1}=\\frac{\\partial C}{\\partial a_2}\\frac{\\partial a_2}{\\partial z_2}\\frac{\\partial z_2}{\\partial a_1}\\frac{\\partial a_1}{\\partial z_1}\\frac{\\partial z_1}{\\partial w_1}=(a_2-y)\\sigma_2'a_1\\sigma_1'a_0,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "dfc47ba6", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial b_1}=\\frac{\\partial C}{\\partial a_2}\\frac{\\partial a_2}{\\partial z_2}\\frac{\\partial z_2}{\\partial a_1}\\frac{\\partial a_1}{\\partial z_1}\\frac{\\partial z_1}{\\partial b_1}=(a_2-y)\\sigma_2'\\sigma_1'=\\delta_1.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "8834c3dc", + "metadata": { + "editable": true + }, + "source": [ + "Can you generalize this to more than one hidden layer?" + ] + }, + { + "cell_type": "markdown", + "id": "40956770", + "metadata": { + "editable": true + }, + "source": [ + "## Important observations\n", + "\n", + "From the above equations we see that the derivatives of the activation\n", + "functions play a central role. If they vanish, the training may\n", + "stop. This is called the vanishing gradient problem, see discussions below. If they become\n", + "large, the parameters $w_i$ and $b_i$ may simply go to infinity. This\n", + "is referenced as the exploding gradient problem." + ] + }, + { + "cell_type": "markdown", + "id": "69e7fdcf", + "metadata": { + "editable": true + }, + "source": [ + "## The training\n", + "\n", + "The training of the parameters is done through various gradient descent approximations with" + ] + }, + { + "cell_type": "markdown", + "id": "726d4c90", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "w_{i}\\leftarrow w_{i}- \\eta \\delta_i a_{i-1},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "0ee83d1c", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "f5b3b5a5", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "b_i \\leftarrow b_i-\\eta \\delta_i,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b2746792", + "metadata": { + "editable": true + }, + "source": [ + "with $\\eta$ is the learning rate.\n", + "\n", + "One iteration consists of one feed forward step and one back-propagation step. Each back-propagation step does one update of the parameters $\\boldsymbol{\\Theta}$.\n", + "\n", + "For the first hidden layer $a_{i-1}=a_0=x$ for this simple model." 
+ ] + }, + { + "cell_type": "markdown", + "id": "76e2e41a", + "metadata": { + "editable": true + }, + "source": [ + "## Code example\n", + "\n", + "The code here implements the above model with one hidden layer and\n", + "scalar variables for the same function we studied in the previous\n", + "example. The code is however set up so that we can add multiple\n", + "inputs $x$ and target values $y$. Note also that we have the\n", + "possibility of defining a feature matrix $\\boldsymbol{X}$ with more than just\n", + "one column for the input values. This will turn useful in our next example. We have also defined matrices and vectors for all of our operations although it is not necessary here." + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "1c4719c1", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import numpy as np\n", + "# We use the Sigmoid function as activation function\n", + "def sigmoid(z):\n", + " return 1.0/(1.0+np.exp(-z))\n", + "\n", + "def forwardpropagation(x):\n", + " # weighted sum of inputs to the hidden layer\n", + " z_1 = np.matmul(x, w_1) + b_1\n", + " # activation in the hidden layer\n", + " a_1 = sigmoid(z_1)\n", + " # weighted sum of inputs to the output layer\n", + " z_2 = np.matmul(a_1, w_2) + b_2\n", + " a_2 = z_2\n", + " return a_1, a_2\n", + "\n", + "def backpropagation(x, y):\n", + " a_1, a_2 = forwardpropagation(x)\n", + " # parameter delta for the output layer, note that a_2=z_2 and its derivative wrt z_2 is just 1\n", + " delta_2 = a_2 - y\n", + " print(0.5*((a_2-y)**2))\n", + " # delta for the hidden layer\n", + " delta_1 = np.matmul(delta_2, w_2.T) * a_1 * (1 - a_1)\n", + " # gradients for the output layer\n", + " output_weights_gradient = np.matmul(a_1.T, delta_2)\n", + " output_bias_gradient = np.sum(delta_2, axis=0)\n", + " # gradient for the hidden layer\n", + " hidden_weights_gradient = np.matmul(x.T, delta_1)\n", + " hidden_bias_gradient = np.sum(delta_1, axis=0)\n", + " return output_weights_gradient, output_bias_gradient, hidden_weights_gradient, hidden_bias_gradient\n", + "\n", + "\n", + "# ensure the same random numbers appear every time\n", + "np.random.seed(0)\n", + "# Input variable\n", + "x = np.array([4.0],dtype=np.float64)\n", + "# Target values\n", + "y = 2*x+1.0 \n", + "\n", + "# Defining the neural network, only scalars here\n", + "n_inputs = x.shape\n", + "n_features = 1\n", + "n_hidden_neurons = 1\n", + "n_outputs = 1\n", + "\n", + "# Initialize the network\n", + "# weights and bias in the hidden layer\n", + "w_1 = np.random.randn(n_features, n_hidden_neurons)\n", + "b_1 = np.zeros(n_hidden_neurons) + 0.01\n", + "\n", + "# weights and bias in the output layer\n", + "w_2 = np.random.randn(n_hidden_neurons, n_outputs)\n", + "b_2 = np.zeros(n_outputs) + 0.01\n", + "\n", + "eta = 0.1\n", + "for i in range(50):\n", + " # calculate gradients\n", + " derivW2, derivB2, derivW1, derivB1 = backpropagation(x, y)\n", + " # update weights and biases\n", + " w_2 -= eta * derivW2\n", + " b_2 -= eta * derivB2\n", + " w_1 -= eta * derivW1\n", + " b_1 -= eta * derivB1" + ] + }, + { + "cell_type": "markdown", + "id": "debaaadc", + "metadata": { + "editable": true + }, + "source": [ + "We see that after some few iterations (the results do depend on the learning rate however), we get an error which is rather small." 
+ ] + }, + { + "cell_type": "markdown", + "id": "7d576f19", + "metadata": { + "editable": true + }, + "source": [ + "## Simple neural network and the back propagation equations\n", + "\n", + "Let us now try to increase our level of ambition and attempt at setting \n", + "up the equations for a neural network with two input nodes, one hidden\n", + "layer with two hidden nodes and one output layer with one output node/neuron only (see graph)..\n", + "\n", + "We need to define the following parameters and variables with the input layer (layer $(0)$) \n", + "where we label the nodes $x_1$ and $x_2$" + ] + }, + { + "cell_type": "markdown", + "id": "582b3b43", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "x_1 = a_1^{(0)} \\wedge x_2 = a_2^{(0)}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "c8eace47", + "metadata": { + "editable": true + }, + "source": [ + "The hidden layer (layer $(1)$) has nodes which yield the outputs $a_1^{(1)}$ and $a_2^{(1)}$) with weight $\\boldsymbol{w}$ and bias $\\boldsymbol{b}$ parameters" + ] + }, + { + "cell_type": "markdown", + "id": "81ec9945", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "w_{ij}^{(1)}=\\left\\{w_{11}^{(1)},w_{12}^{(1)},w_{21}^{(1)},w_{22}^{(1)}\\right\\} \\wedge b^{(1)}=\\left\\{b_1^{(1)},b_2^{(1)}\\right\\}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "c35e1f69", + "metadata": { + "editable": true + }, + "source": [ + "## Layout of a simple neural network with two input nodes, one hidden layer with two hidden noeds and one output node\n", + "\n", + "\n", + "\n", + "\n", + "

Figure 1: A simple neural network with two input nodes, one hidden layer with two hidden nodes and one output node.

    \n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "05b8eea9", + "metadata": { + "editable": true + }, + "source": [ + "## The ouput layer\n", + "\n", + "We have the ouput layer given by layer label $(2)$ with output $a^{(2)}$ and weights and biases to be determined given by the variables" + ] + }, + { + "cell_type": "markdown", + "id": "7ef9cb55", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "w_{i}^{(2)}=\\left\\{w_{1}^{(2)},w_{2}^{(2)}\\right\\} \\wedge b^{(2)}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "1eb5c5ac", + "metadata": { + "editable": true + }, + "source": [ + "Our output is $\\tilde{y}=a^{(2)}$ and we define a generic cost function $C(a^{(2)},y;\\boldsymbol{\\Theta})$ where $y$ is the target value (a scalar here).\n", + "The parameters we need to optimize are given by" + ] + }, + { + "cell_type": "markdown", + "id": "00492358", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{\\Theta}=\\left\\{w_{11}^{(1)},w_{12}^{(1)},w_{21}^{(1)},w_{22}^{(1)},w_{1}^{(2)},w_{2}^{(2)},b_1^{(1)},b_2^{(1)},b^{(2)}\\right\\}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "45cca5aa", + "metadata": { + "editable": true + }, + "source": [ + "## Compact expressions\n", + "\n", + "We can define the inputs to the activation functions for the various layers in terms of various matrix-vector multiplications and vector additions.\n", + "The inputs to the first hidden layer are" + ] + }, + { + "cell_type": "markdown", + "id": "22cfb40b", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{bmatrix}z_1^{(1)} \\\\ z_2^{(1)} \\end{bmatrix}=\\left(\\begin{bmatrix}w_{11}^{(1)} & w_{12}^{(1)}\\\\ w_{21}^{(1)} &w_{22}^{(1)} \\end{bmatrix}\\right)^{T}\\begin{bmatrix}a_1^{(0)} \\\\ a_2^{(0)} \\end{bmatrix}+\\begin{bmatrix}b_1^{(1)} \\\\ b_2^{(1)} \\end{bmatrix},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "45b30d06", + "metadata": { + "editable": true + }, + "source": [ + "with outputs" + ] + }, + { + "cell_type": "markdown", + "id": "ebd6a7a5", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{bmatrix}a_1^{(1)} \\\\ a_2^{(1)} \\end{bmatrix}=\\begin{bmatrix}\\sigma^{(1)}(z_1^{(1)}) \\\\ \\sigma^{(1)}(z_2^{(1)}) \\end{bmatrix}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "659dd686", + "metadata": { + "editable": true + }, + "source": [ + "## Output layer\n", + "\n", + "For the final output layer we have the inputs to the final activation function" + ] + }, + { + "cell_type": "markdown", + "id": "34a1d4ca", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "z^{(2)} = w_{1}^{(2)}a_1^{(1)} +w_{2}^{(2)}a_2^{(1)}+b^{(2)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "34471712", + "metadata": { + "editable": true + }, + "source": [ + "resulting in the output" + ] + }, + { + "cell_type": "markdown", + "id": "0b3a74fd", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "a^{(2)}=\\sigma^{(2)}(z^{(2)}).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "1a5bdab3", + "metadata": { + "editable": true + }, + "source": [ + "## Explicit derivatives\n", + "\n", + "In total we have nine parameters which we need to train. Using the\n", + "chain rule (or just the back-propagation algorithm) we can find all\n", + "derivatives. 
Since we will use automatic differentiation in reverse\n", + "mode, we start with the derivatives of the cost function with respect\n", + "to the parameters of the output layer, namely" + ] + }, + { + "cell_type": "markdown", + "id": "37f19e78", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial w_{i}^{(2)}}=\\frac{\\partial C}{\\partial a^{(2)}}\\frac{\\partial a^{(2)}}{\\partial z^{(2)}}\\frac{\\partial z^{(2)}}{\\partial w_{i}^{(2)}}=\\delta^{(2)}a_i^{(1)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "5505aab8", + "metadata": { + "editable": true + }, + "source": [ + "with" + ] + }, + { + "cell_type": "markdown", + "id": "d55d045c", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta^{(2)}=\\frac{\\partial C}{\\partial a^{(2)}}\\frac{\\partial a^{(2)}}{\\partial z^{(2)}}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "04f101e7", + "metadata": { + "editable": true + }, + "source": [ + "and finally" + ] + }, + { + "cell_type": "markdown", + "id": "bfab2e91", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial b^{(2)}}=\\frac{\\partial C}{\\partial a^{(2)}}\\frac{\\partial a^{(2)}}{\\partial z^{(2)}}\\frac{\\partial z^{(2)}}{\\partial b^{(2)}}=\\delta^{(2)}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "77f35b7e", + "metadata": { + "editable": true + }, + "source": [ + "## Derivatives of the hidden layer\n", + "\n", + "Using the chain rule we have the following expressions for say one of the weight parameters (it is easy to generalize to the other weight parameters)" + ] + }, + { + "cell_type": "markdown", + "id": "8cf4a606", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial w_{11}^{(1)}}=\\frac{\\partial C}{\\partial a^{(2)}}\\frac{\\partial a^{(2)}}{\\partial z^{(2)}}\n", + "\\frac{\\partial z^{(2)}}{\\partial z_1^{(1)}}\\frac{\\partial z_1^{(1)}}{\\partial w_{11}^{(1)}}= \\delta^{(2)}\\frac{\\partial z^{(2)}}{\\partial z_1^{(1)}}\\frac{\\partial z_1^{(1)}}{\\partial w_{11}^{(1)}},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "86951351", + "metadata": { + "editable": true + }, + "source": [ + "which, noting that" + ] + }, + { + "cell_type": "markdown", + "id": "73414e65", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "z^{(2)} =w_1^{(2)}a_1^{(1)}+w_2^{(2)}a_2^{(1)}+b^{(2)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "8f0aaa15", + "metadata": { + "editable": true + }, + "source": [ + "allows us to rewrite" + ] + }, + { + "cell_type": "markdown", + "id": "730c5415", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial z^{(2)}}{\\partial z_1^{(1)}}\\frac{\\partial z_1^{(1)}}{\\partial w_{11}^{(1)}}=w_1^{(2)}\\frac{\\partial a_1^{(1)}}{\\partial z_1^{(1)}}a_1^{(1)}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "1afcb5a1", + "metadata": { + "editable": true + }, + "source": [ + "## Final expression\n", + "Defining" + ] + }, + { + "cell_type": "markdown", + "id": "7f30cb44", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_1^{(1)}=w_1^{(2)}\\frac{\\partial a_1^{(1)}}{\\partial z_1^{(1)}}\\delta^{(2)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "14c045ce", + "metadata": { + "editable": true + }, + "source": [ + "we have" + ] + }, + { + "cell_type": "markdown", + "id": "0c1a2c68", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial 
C}{\\partial w_{11}^{(1)}}=\\delta_1^{(1)}a_1^{(1)}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "a3385222", + "metadata": { + "editable": true + }, + "source": [ + "Similarly, we obtain" + ] + }, + { + "cell_type": "markdown", + "id": "18ee3804", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial w_{12}^{(1)}}=\\delta_1^{(1)}a_2^{(1)}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "ad741d56", + "metadata": { + "editable": true + }, + "source": [ + "## Completing the list\n", + "\n", + "Similarly, we find" + ] + }, + { + "cell_type": "markdown", + "id": "65870a70", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial w_{21}^{(1)}}=\\delta_2^{(1)}a_1^{(1)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "f7807fdc", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "9af4a759", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial w_{22}^{(1)}}=\\delta_2^{(1)}a_2^{(1)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "dc548cb7", + "metadata": { + "editable": true + }, + "source": [ + "where we have defined" + ] + }, + { + "cell_type": "markdown", + "id": "83b75e94", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_2^{(1)}=w_2^{(2)}\\frac{\\partial a_2^{(1)}}{\\partial z_2^{(1)}}\\delta^{(2)}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "1c2be559", + "metadata": { + "editable": true + }, + "source": [ + "## Final expressions for the biases of the hidden layer\n", + "\n", + "For the sake of completeness, we list the derivatives of the biases, which are" + ] + }, + { + "cell_type": "markdown", + "id": "18b85f86", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial b_{1}^{(1)}}=\\delta_1^{(1)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "63e39eb4", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "a55371c1", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial b_{2}^{(1)}}=\\delta_2^{(1)}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "fa31a9b3", + "metadata": { + "editable": true + }, + "source": [ + "As we will see below, these expressions can be generalized in a more compact form." 
+ ] + }, + { + "cell_type": "markdown", + "id": "580df891", + "metadata": { + "editable": true + }, + "source": [ + "## Gradient expressions\n", + "\n", + "For this specific model, with just one output node and two hidden\n", + "nodes, the gradient descent equations take the following form for output layer" + ] + }, + { + "cell_type": "markdown", + "id": "c10bf2ce", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "w_{i}^{(2)}\\leftarrow w_{i}^{(2)}- \\eta \\delta^{(2)} a_{i}^{(1)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "0bae11f8", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "ed4a8b93", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "b^{(2)} \\leftarrow b^{(2)}-\\eta \\delta^{(2)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "2d582987", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "5fa760a1", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "w_{ij}^{(1)}\\leftarrow w_{ij}^{(1)}- \\eta \\delta_{i}^{(1)} a_{j}^{(0)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "bc9de8bf", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "f00e3ace", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "b_{i}^{(1)} \\leftarrow b_{i}^{(1)}-\\eta \\delta_{i}^{(1)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "7ac96362", + "metadata": { + "editable": true + }, + "source": [ + "where $\\eta$ is the learning rate." + ] + }, + { + "cell_type": "markdown", + "id": "9c46f966", + "metadata": { + "editable": true + }, + "source": [ + "## Setting up the equations for a neural network\n", + "\n", + "The questions we want to ask are how do changes in the biases and the\n", + "weights in our network change the cost function and how can we use the\n", + "final output to modify the weights and biases?\n", + "\n", + "To derive these equations let us start with a plain regression problem\n", + "and define our cost function as" + ] + }, + { + "cell_type": "markdown", + "id": "ea509b11", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "{\\cal C}(\\boldsymbol{\\Theta}) = \\frac{1}{2}\\sum_{i=1}^n\\left(y_i - \\tilde{y}_i\\right)^2,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "e08ff771", + "metadata": { + "editable": true + }, + "source": [ + "where the $y_i$s are our $n$ targets (the values we want to\n", + "reproduce), while the outputs of the network after having propagated\n", + "all inputs $\\boldsymbol{x}$ are given by $\\boldsymbol{\\tilde{y}}_i$." + ] + }, + { + "cell_type": "markdown", + "id": "6f476983", + "metadata": { + "editable": true + }, + "source": [ + "## Layout of a neural network with three hidden layers (last layer = $l=L=4$, first layer $l=0$)\n", + "\n", + "\n", + "\n", + "\n", + "

Figure 1: A neural network with three hidden layers (output layer $l=L=4$, input layer $l=0$).

    \n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "0535d087", + "metadata": { + "editable": true + }, + "source": [ + "## Definitions\n", + "\n", + "With our definition of the targets $\\boldsymbol{y}$, the outputs of the\n", + "network $\\boldsymbol{\\tilde{y}}$ and the inputs $\\boldsymbol{x}$ we\n", + "define now the activation $z_j^l$ of node/neuron/unit $j$ of the\n", + "$l$-th layer as a function of the bias, the weights which add up from\n", + "the previous layer $l-1$ and the forward passes/outputs\n", + "$\\boldsymbol{a}^{l-1}$ from the previous layer as" + ] + }, + { + "cell_type": "markdown", + "id": "5e024ec1", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "z_j^l = \\sum_{i=1}^{M_{l-1}}w_{ij}^la_i^{l-1}+b_j^l,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "239fb4c6", + "metadata": { + "editable": true + }, + "source": [ + "where $b_k^l$ are the biases from layer $l$. Here $M_{l-1}$\n", + "represents the total number of nodes/neurons/units of layer $l-1$. The\n", + "figure in the whiteboard notes illustrates this equation. We can rewrite this in a more\n", + "compact form as the matrix-vector products we discussed earlier," + ] + }, + { + "cell_type": "markdown", + "id": "7e4fa6c5", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{z}^l = \\left(\\boldsymbol{W}^l\\right)^T\\boldsymbol{a}^{l-1}+\\boldsymbol{b}^l.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "c47cc3c6", + "metadata": { + "editable": true + }, + "source": [ + "## Inputs to the activation function\n", + "\n", + "With the activation values $\\boldsymbol{z}^l$ we can in turn define the\n", + "output of layer $l$ as $\\boldsymbol{a}^l = \\sigma(\\boldsymbol{z}^l)$ where $\\sigma$ is our\n", + "activation function. In the examples here we will use the sigmoid\n", + "function discussed in our logistic regression lectures. We will also use the same activation function $\\sigma$ for all layers\n", + "and their nodes. It means we have" + ] + }, + { + "cell_type": "markdown", + "id": "4eb89f11", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "a_j^l = \\sigma(z_j^l) = \\frac{1}{1+\\exp{-(z_j^l)}}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "92744a90", + "metadata": { + "editable": true + }, + "source": [ + "## Layout of input to first hidden layer $l=1$ from input layer $l=0$\n", + "\n", + "\n", + "\n", + "\n", + "

Figure 1: Input to the first hidden layer $l=1$ from the input layer $l=0$.

    \n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "35424d45", + "metadata": { + "editable": true + }, + "source": [ + "## Derivatives and the chain rule\n", + "\n", + "From the definition of the input variable to the activation function, that is $z_j^l$ we have" + ] + }, + { + "cell_type": "markdown", + "id": "b8502930", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial z_j^l}{\\partial w_{ij}^l} = a_i^{l-1},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "81ad45a5", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "11bb8afb", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial z_j^l}{\\partial a_i^{l-1}} = w_{ji}^l.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b53ec752", + "metadata": { + "editable": true + }, + "source": [ + "With our definition of the activation function we have that (note that this function depends only on $z_j^l$)" + ] + }, + { + "cell_type": "markdown", + "id": "b7519a84", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial a_j^l}{\\partial z_j^{l}} = a_j^l(1-a_j^l)=\\sigma(z_j^l)(1-\\sigma(z_j^l)).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "c57689db", + "metadata": { + "editable": true + }, + "source": [ + "## Derivative of the cost function\n", + "\n", + "With these definitions we can now compute the derivative of the cost function in terms of the weights.\n", + "\n", + "Let us specialize to the output layer $l=L$. Our cost function is" + ] + }, + { + "cell_type": "markdown", + "id": "a9f83b15", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "{\\cal C}(\\boldsymbol{\\Theta}^L) = \\frac{1}{2}\\sum_{i=1}^n\\left(y_i - \\tilde{y}_i\\right)^2=\\frac{1}{2}\\sum_{i=1}^n\\left(a_i^L - y_i\\right)^2,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "067c2583", + "metadata": { + "editable": true + }, + "source": [ + "The derivative of this function with respect to the weights is" + ] + }, + { + "cell_type": "markdown", + "id": "43545710", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial{\\cal C}(\\boldsymbol{\\Theta}^L)}{\\partial w_{ij}^L} = \\left(a_j^L - y_j\\right)\\frac{\\partial a_j^L}{\\partial w_{ij}^{L}},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "1eb33717", + "metadata": { + "editable": true + }, + "source": [ + "The last partial derivative can easily be computed and reads (by applying the chain rule)" + ] + }, + { + "cell_type": "markdown", + "id": "e09a8734", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial a_j^L}{\\partial w_{ij}^{L}} = \\frac{\\partial a_j^L}{\\partial z_{j}^{L}}\\frac{\\partial z_j^L}{\\partial w_{ij}^{L}}=a_j^L(1-a_j^L)a_i^{L-1}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "3dc0f5a3", + "metadata": { + "editable": true + }, + "source": [ + "## The back propagation equations for a neural network\n", + "\n", + "We have thus" + ] + }, + { + "cell_type": "markdown", + "id": "bb58784b", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial{\\cal C}((\\boldsymbol{\\Theta}^L)}{\\partial w_{ij}^L} = \\left(a_j^L - y_j\\right)a_j^L(1-a_j^L)a_i^{L-1},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "10aea094", + "metadata": { + "editable": true + }, + "source": [ + "Defining" + ] + }, + { + "cell_type": "markdown", + "id": "b7cc2db8", + "metadata": { + "editable": true + 
}, + "source": [ + "$$\n", + "\\delta_j^L = a_j^L(1-a_j^L)\\left(a_j^L - y_j\\right) = \\sigma'(z_j^L)\\frac{\\partial {\\cal C}}{\\partial (a_j^L)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "6cce9a62", + "metadata": { + "editable": true + }, + "source": [ + "and using the Hadamard product of two vectors we can write this as" + ] + }, + { + "cell_type": "markdown", + "id": "43e5a84b", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{\\delta}^L = \\sigma'(\\boldsymbol{z}^L)\\circ\\frac{\\partial {\\cal C}}{\\partial (\\boldsymbol{a}^L)}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "d5c607a7", + "metadata": { + "editable": true + }, + "source": [ + "## Analyzing the last results\n", + "\n", + "This is an important expression. The second term on the right handside\n", + "measures how fast the cost function is changing as a function of the $j$th\n", + "output activation. If, for example, the cost function doesn't depend\n", + "much on a particular output node $j$, then $\\delta_j^L$ will be small,\n", + "which is what we would expect. The first term on the right, measures\n", + "how fast the activation function $f$ is changing at a given activation\n", + "value $z_j^L$." + ] + }, + { + "cell_type": "markdown", + "id": "a51b3b58", + "metadata": { + "editable": true + }, + "source": [ + "## More considerations\n", + "\n", + "Notice that everything in the above equations is easily computed. In\n", + "particular, we compute $z_j^L$ while computing the behaviour of the\n", + "network, and it is only a small additional overhead to compute\n", + "$\\sigma'(z^L_j)$. The exact form of the derivative with respect to the\n", + "output depends on the form of the cost function.\n", + "However, provided the cost function is known there should be little\n", + "trouble in calculating" + ] + }, + { + "cell_type": "markdown", + "id": "4cd9d058", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial {\\cal C}}{\\partial (a_j^L)}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "c80b630d", + "metadata": { + "editable": true + }, + "source": [ + "With the definition of $\\delta_j^L$ we have a more compact definition of the derivative of the cost function in terms of the weights, namely" + ] + }, + { + "cell_type": "markdown", + "id": "dc0c1a06", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial{\\cal C}}{\\partial w_{ij}^L} = \\delta_j^La_i^{L-1}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "8f2065b7", + "metadata": { + "editable": true + }, + "source": [ + "## Derivatives in terms of $z_j^L$\n", + "\n", + "It is also easy to see that our previous equation can be written as" + ] + }, + { + "cell_type": "markdown", + "id": "7f89b9d8", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_j^L =\\frac{\\partial {\\cal C}}{\\partial z_j^L}= \\frac{\\partial {\\cal C}}{\\partial a_j^L}\\frac{\\partial a_j^L}{\\partial z_j^L},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "49c2cd3f", + "metadata": { + "editable": true + }, + "source": [ + "which can also be interpreted as the partial derivative of the cost function with respect to the biases $b_j^L$, namely" + ] + }, + { + "cell_type": "markdown", + "id": "517b1a37", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_j^L = \\frac{\\partial {\\cal C}}{\\partial b_j^L}\\frac{\\partial b_j^L}{\\partial z_j^L}=\\frac{\\partial {\\cal C}}{\\partial b_j^L},\n", + "$$" 
+ ] + }, + { + "cell_type": "markdown", + "id": "65c8107f", + "metadata": { + "editable": true + }, + "source": [ + "That is, the error $\\delta_j^L$ is exactly equal to the rate of change of the cost function as a function of the bias." + ] + }, + { + "cell_type": "markdown", + "id": "2a10f902", + "metadata": { + "editable": true + }, + "source": [ + "## Bringing it together\n", + "\n", + "We have now three equations that are essential for the computations of the derivatives of the cost function at the output layer. These equations are needed to start the algorithm and they are" + ] + }, + { + "cell_type": "markdown", + "id": "b2ebf9c2", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation}\n", + "\\frac{\\partial{\\cal C}(\\boldsymbol{W^L})}{\\partial w_{ij}^L} = \\delta_j^La_i^{L-1},\n", + "\\label{_auto1} \\tag{1}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "90336322", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "f25ff166", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation}\n", + "\\delta_j^L = \\sigma'(z_j^L)\\frac{\\partial {\\cal C}}{\\partial (a_j^L)},\n", + "\\label{_auto2} \\tag{2}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "4cf11d5e", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "2670748d", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation}\n", + "\\delta_j^L = \\frac{\\partial {\\cal C}}{\\partial b_j^L},\n", + "\\label{_auto3} \\tag{3}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "18c29f71", + "metadata": { + "editable": true + }, + "source": [ + "## Final back propagating equation\n", + "\n", + "We have that (replacing $L$ with a general layer $l$)" + ] + }, + { + "cell_type": "markdown", + "id": "c593470c", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_j^l =\\frac{\\partial {\\cal C}}{\\partial z_j^l}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "28e8caef", + "metadata": { + "editable": true + }, + "source": [ + "We want to express this in terms of the equations for layer $l+1$." + ] + }, + { + "cell_type": "markdown", + "id": "516de9d7", + "metadata": { + "editable": true + }, + "source": [ + "## Using the chain rule and summing over all $k$ entries\n", + "\n", + "We obtain" + ] + }, + { + "cell_type": "markdown", + "id": "004c0bf4", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_j^l =\\sum_k \\frac{\\partial {\\cal C}}{\\partial z_k^{l+1}}\\frac{\\partial z_k^{l+1}}{\\partial z_j^{l}}=\\sum_k \\delta_k^{l+1}\\frac{\\partial z_k^{l+1}}{\\partial z_j^{l}},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "d62a3b1f", + "metadata": { + "editable": true + }, + "source": [ + "and recalling that" + ] + }, + { + "cell_type": "markdown", + "id": "e9af770e", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "z_j^{l+1} = \\sum_{i=1}^{M_{l}}w_{ij}^{l+1}a_i^{l}+b_j^{l+1},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "eca56f17", + "metadata": { + "editable": true + }, + "source": [ + "with $M_l$ being the number of nodes in layer $l$, we obtain" + ] + }, + { + "cell_type": "markdown", + "id": "bb0e4414", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_j^l =\\sum_k \\delta_k^{l+1}w_{kj}^{l+1}\\sigma'(z_j^l),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "a4b190fc", + "metadata": { + "editable": true + }, + "source": [ + "This is our final equation.\n", + "\n", + "We are now ready to set up the algorithm for back propagation and learning the weights and biases." + ] + }, + { + "cell_type": "markdown", + "id": "ec0f87c0", + "metadata": { + "editable": true + }, + "source": [ + "## Setting up the back propagation algorithm and algorithm for a feed forward NN, initalizations\n", + "\n", + "**The architecture (our model).**\n", + "\n", + "1. Set up your inputs and outputs (scalars, vectors, matrices or higher-order arrays)\n", + "\n", + "2. Define the number of hidden layers and hidden nodes\n", + "\n", + "3. Define activation functions for hidden layers and output layers\n", + "\n", + "4. Define optimizer (plan learning rate, momentum, ADAgrad, RMSprop, ADAM etc) and array of initial learning rates\n", + "\n", + "5. Define cost function and possible regularization terms with hyperparameters\n", + "\n", + "6. Initialize weights and biases\n", + "\n", + "7. Fix number of iterations for the feed forward part and back propagation part" + ] + }, + { + "cell_type": "markdown", + "id": "2fb45155", + "metadata": { + "editable": true + }, + "source": [ + "## Setting up the back propagation algorithm, part 1\n", + "\n", + "The four equations provide us with a way of computing the gradients of the cost function. 
Let us write this out in the form of an algorithm.\n", + "\n", + "**First**, we set up the input data $\\boldsymbol{x}$ and the activations\n", + "$\\boldsymbol{z}_1$ of the input layer and compute the activation function and\n", + "the pertinent outputs $\\boldsymbol{a}^1$.\n", + "\n", + "**Secondly**, we perform then the feed forward till we reach the output\n", + "layer and compute all $\\boldsymbol{z}_l$ of the input layer and compute the\n", + "activation function and the pertinent outputs $\\boldsymbol{a}^l$ for\n", + "$l=1,2,3,\\dots,L$.\n", + "\n", + "**Notation**: The first hidden layer has $l=1$ as label and the final output layer has $l=L$." + ] + }, + { + "cell_type": "markdown", + "id": "3d5c2a0e", + "metadata": { + "editable": true + }, + "source": [ + "## Setting up the back propagation algorithm, part 2\n", + "\n", + "Thereafter we compute the ouput error $\\boldsymbol{\\delta}^L$ by computing all" + ] + }, + { + "cell_type": "markdown", + "id": "9183bbd0", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_j^L = \\sigma'(z_j^L)\\frac{\\partial {\\cal C}}{\\partial (a_j^L)}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "32ece956", + "metadata": { + "editable": true + }, + "source": [ + "Then we compute the back propagate error for each $l=L-1,L-2,\\dots,1$ as" + ] + }, + { + "cell_type": "markdown", + "id": "466d6bda", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_j^l = \\sum_k \\delta_k^{l+1}w_{kj}^{l+1}\\sigma'(z_j^l).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "9f31b228", + "metadata": { + "editable": true + }, + "source": [ + "## Setting up the Back propagation algorithm, part 3\n", + "\n", + "Finally, we update the weights and the biases using gradient descent\n", + "for each $l=L-1,L-2,\\dots,1$ (the first hidden layer) and update the weights and biases\n", + "according to the rules" + ] + }, + { + "cell_type": "markdown", + "id": "fbeac005", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "w_{ij}^l\\leftarrow = w_{ij}^l- \\eta \\delta_j^la_i^{l-1},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "bc6ae984", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "b_j^l \\leftarrow b_j^l-\\eta \\frac{\\partial {\\cal C}}{\\partial b_j^l}=b_j^l-\\eta \\delta_j^l,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "65f3133d", + "metadata": { + "editable": true + }, + "source": [ + "with $\\eta$ being the learning rate." 
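Before wiring these steps into a full training loop, it can be useful to verify the back-propagated gradients numerically. The following minimal sketch (an addition to the notes, not part of the lecture code) compares one analytical first-layer gradient with a centered finite difference of the cost, for a small network with a sigmoid hidden layer and a linear output (so that $\sigma'=1$ in the output layer).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(W1, b1, W2, b2, X, Y):
    # Feed forward: sigmoid hidden layer, linear output, C = 0.5*sum((a2 - y)^2)
    a1 = sigmoid(X @ W1 + b1)
    a2 = a1 @ W2 + b2
    return 0.5 * np.sum((a2 - Y) ** 2)

rng = np.random.default_rng(1)
X = rng.normal(size=(4, 2))
Y = rng.normal(size=(4, 1))
W1 = rng.normal(size=(2, 3))
b1 = np.zeros(3)
W2 = rng.normal(size=(3, 1))
b2 = np.zeros(1)

# Analytical gradient of the first-layer weights from the back propagation equations
a1 = sigmoid(X @ W1 + b1)
a2 = a1 @ W2 + b2
delta2 = a2 - Y                            # linear output, so sigma' = 1 here
delta1 = (delta2 @ W2.T) * a1 * (1.0 - a1)
grad_W1 = X.T @ delta1

# Centered finite-difference estimate of one of these entries
eps, i, j = 1e-6, 0, 1
Wp, Wm = W1.copy(), W1.copy()
Wp[i, j] += eps
Wm[i, j] -= eps
fd = (cost(Wp, b1, W2, b2, X, Y) - cost(Wm, b1, W2, b2, X, Y)) / (2.0 * eps)
print(grad_W1[i, j], fd)  # the two numbers should agree to several digits
```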
+ ] + }, + { + "cell_type": "markdown", + "id": "5d27bbe1", + "metadata": { + "editable": true + }, + "source": [ + "## Updating the gradients\n", + "\n", + "With the back propagate error for each $l=L-1,L-2,\\dots,1$ as" + ] + }, + { + "cell_type": "markdown", + "id": "5e5d0aa0", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_j^l = \\sum_k \\delta_k^{l+1}w_{kj}^{l+1}\\sigma'(z_j^l),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "ea32e5bb", + "metadata": { + "editable": true + }, + "source": [ + "we update the weights and the biases using gradient descent for each $l=L-1,L-2,\\dots,1$ and update the weights and biases according to the rules" + ] + }, + { + "cell_type": "markdown", + "id": "3a9bb5a6", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "w_{ij}^l\\leftarrow = w_{ij}^l- \\eta \\delta_j^la_i^{l-1},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "9008dcf8", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "b_j^l \\leftarrow b_j^l-\\eta \\frac{\\partial {\\cal C}}{\\partial b_j^l}=b_j^l-\\eta \\delta_j^l,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "89aba7d6", + "metadata": { + "editable": true + }, + "source": [ + "## Activation functions\n", + "\n", + "A property that characterizes a neural network, other than its\n", + "connectivity, is the choice of activation function(s). The following\n", + "restrictions are imposed on an activation function for an FFNN to\n", + "fulfill the universal approximation theorem\n", + "\n", + " * Non-constant\n", + "\n", + " * Bounded\n", + "\n", + " * Monotonically-increasing\n", + "\n", + " * Continuous" + ] + }, + { + "cell_type": "markdown", + "id": "ea0cdce2", + "metadata": { + "editable": true + }, + "source": [ + "### Activation functions, Logistic and Hyperbolic ones\n", + "\n", + "The second requirement excludes all linear functions. Furthermore, in\n", + "a MLP with only linear activation functions, each layer simply\n", + "performs a linear transformation of its inputs.\n", + "\n", + "Regardless of the number of layers, the output of the NN will be\n", + "nothing but a linear function of the inputs. Thus we need to introduce\n", + "some kind of non-linearity to the NN to be able to fit non-linear\n", + "functions Typical examples are the logistic *Sigmoid*" + ] + }, + { + "cell_type": "markdown", + "id": "91342c80", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\sigma(x) = \\frac{1}{1 + e^{-x}},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "bd6eb22a", + "metadata": { + "editable": true + }, + "source": [ + "and the *hyperbolic tangent* function" + ] + }, + { + "cell_type": "markdown", + "id": "4e75b2ab", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\sigma(x) = \\tanh(x)\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "1626d9b7", + "metadata": { + "editable": true + }, + "source": [ + "## Relevance\n", + "\n", + "The *sigmoid* function are more biologically plausible because the\n", + "output of inactive neurons are zero. Such activation function are\n", + "called *one-sided*. However, it has been shown that the hyperbolic\n", + "tangent performs better than the sigmoid for training MLPs. 
The rectified linear unit (ReLU), discussed below,
has\n", + "become the most popular for *deep neural networks*" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "4ac7c23c", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "%matplotlib inline\n", + "\n", + "\"\"\"The sigmoid function (or the logistic curve) is a \n", + "function that takes any real number, z, and outputs a number (0,1).\n", + "It is useful in neural networks for assigning weights on a relative scale.\n", + "The value z is the weighted sum of parameters involved in the learning algorithm.\"\"\"\n", + "\n", + "import numpy\n", + "import matplotlib.pyplot as plt\n", + "import math as mt\n", + "\n", + "z = numpy.arange(-5, 5, .1)\n", + "sigma_fn = numpy.vectorize(lambda z: 1/(1+numpy.exp(-z)))\n", + "sigma = sigma_fn(z)\n", + "\n", + "fig = plt.figure()\n", + "ax = fig.add_subplot(111)\n", + "ax.plot(z, sigma)\n", + "ax.set_ylim([-0.1, 1.1])\n", + "ax.set_xlim([-5,5])\n", + "ax.grid(True)\n", + "ax.set_xlabel('z')\n", + "ax.set_title('sigmoid function')\n", + "\n", + "plt.show()\n", + "\n", + "\"\"\"Step Function\"\"\"\n", + "z = numpy.arange(-5, 5, .02)\n", + "step_fn = numpy.vectorize(lambda z: 1.0 if z >= 0.0 else 0.0)\n", + "step = step_fn(z)\n", + "\n", + "fig = plt.figure()\n", + "ax = fig.add_subplot(111)\n", + "ax.plot(z, step)\n", + "ax.set_ylim([-0.5, 1.5])\n", + "ax.set_xlim([-5,5])\n", + "ax.grid(True)\n", + "ax.set_xlabel('z')\n", + "ax.set_title('step function')\n", + "\n", + "plt.show()\n", + "\n", + "\"\"\"Sine Function\"\"\"\n", + "z = numpy.arange(-2*mt.pi, 2*mt.pi, 0.1)\n", + "t = numpy.sin(z)\n", + "\n", + "fig = plt.figure()\n", + "ax = fig.add_subplot(111)\n", + "ax.plot(z, t)\n", + "ax.set_ylim([-1.0, 1.0])\n", + "ax.set_xlim([-2*mt.pi,2*mt.pi])\n", + "ax.grid(True)\n", + "ax.set_xlabel('z')\n", + "ax.set_title('sine function')\n", + "\n", + "plt.show()\n", + "\n", + "\"\"\"Plots a graph of the squashing function used by a rectified linear\n", + "unit\"\"\"\n", + "z = numpy.arange(-2, 2, .1)\n", + "zero = numpy.zeros(len(z))\n", + "y = numpy.max([zero, z], axis=0)\n", + "\n", + "fig = plt.figure()\n", + "ax = fig.add_subplot(111)\n", + "ax.plot(z, y)\n", + "ax.set_ylim([-2.0, 2.0])\n", + "ax.set_xlim([-2.0, 2.0])\n", + "ax.grid(True)\n", + "ax.set_xlabel('z')\n", + "ax.set_title('Rectified linear unit')\n", + "\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "6aeb0ee4", + "metadata": { + "editable": true + }, + "source": [ + "## Vanishing gradients\n", + "\n", + "The Back propagation algorithm we derived above works by going from\n", + "the output layer to the input layer, propagating the error gradient on\n", + "the way. Once the algorithm has computed the gradient of the cost\n", + "function with regards to each parameter in the network, it uses these\n", + "gradients to update each parameter with a Gradient Descent (GD) step.\n", + "\n", + "Unfortunately for us, the gradients often get smaller and smaller as\n", + "the algorithm progresses down to the first hidden layers. As a result,\n", + "the GD update leaves the lower layer connection weights virtually\n", + "unchanged, and training never converges to a good solution. This is\n", + "known in the literature as **the vanishing gradients problem**." + ] + }, + { + "cell_type": "markdown", + "id": "ea47d1d6", + "metadata": { + "editable": true + }, + "source": [ + "## Exploding gradients\n", + "\n", + "In other cases, the opposite can happen, namely the the gradients can\n", + "grow bigger and bigger. 
The result is that many of the layers get\n", + "large updates of the weights the algorithm diverges. This is the\n", + "**exploding gradients problem**, which is mostly encountered in\n", + "recurrent neural networks. More generally, deep neural networks suffer\n", + "from unstable gradients, different layers may learn at widely\n", + "different speeds" + ] + }, + { + "cell_type": "markdown", + "id": "1947aa95", + "metadata": { + "editable": true + }, + "source": [ + "## Is the Logistic activation function (Sigmoid) our choice?\n", + "\n", + "Although this unfortunate behavior has been empirically observed for\n", + "quite a while (it was one of the reasons why deep neural networks were\n", + "mostly abandoned for a long time), it is only around 2010 that\n", + "significant progress was made in understanding it.\n", + "\n", + "A paper titled [Understanding the Difficulty of Training Deep\n", + "Feedforward Neural Networks by Xavier Glorot and Yoshua Bengio](http://proceedings.mlr.press/v9/glorot10a.html) found that\n", + "the problems with the popular logistic\n", + "sigmoid activation function and the weight initialization technique\n", + "that was most popular at the time, namely random initialization using\n", + "a normal distribution with a mean of 0 and a standard deviation of\n", + "1." + ] + }, + { + "cell_type": "markdown", + "id": "d024119f", + "metadata": { + "editable": true + }, + "source": [ + "## Logistic function as the root of problems\n", + "\n", + "They showed that with this activation function and this\n", + "initialization scheme, the variance of the outputs of each layer is\n", + "much greater than the variance of its inputs. Going forward in the\n", + "network, the variance keeps increasing after each layer until the\n", + "activation function saturates at the top layers. This is actually made\n", + "worse by the fact that the logistic function has a mean of 0.5, not 0\n", + "(the hyperbolic tangent function has a mean of 0 and behaves slightly\n", + "better than the logistic function in deep networks)." + ] + }, + { + "cell_type": "markdown", + "id": "c9178132", + "metadata": { + "editable": true + }, + "source": [ + "## The derivative of the Logistic funtion\n", + "\n", + "Looking at the logistic activation function, when inputs become large\n", + "(negative or positive), the function saturates at 0 or 1, with a\n", + "derivative extremely close to 0. Thus when backpropagation kicks in,\n", + "it has virtually no gradient to propagate back through the network,\n", + "and what little gradient exists keeps getting diluted as\n", + "backpropagation progresses down through the top layers, so there is\n", + "really nothing left for the lower layers.\n", + "\n", + "In their paper, Glorot and Bengio propose a way to significantly\n", + "alleviate this problem. We need the signal to flow properly in both\n", + "directions: in the forward direction when making predictions, and in\n", + "the reverse direction when backpropagating gradients. We don’t want\n", + "the signal to die out, nor do we want it to explode and saturate. For\n", + "the signal to flow properly, the authors argue that we need the\n", + "variance of the outputs of each layer to be equal to the variance of\n", + "its inputs, and we also need the gradients to have equal variance\n", + "before and after flowing through a layer in the reverse direction." 
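The compromise Glorot and Bengio proposed is the weight initialization now commonly called Xavier (or Glorot) initialization, which scales the random initial weights by the fan-in and fan-out of each layer so that these variances stay balanced. A minimal sketch (the function name `glorot_uniform` is just an illustrative choice):

```python
import numpy as np

def glorot_uniform(n_in, n_out, rng=None):
    """Glorot/Xavier 'normalized' initialization: draw weights uniformly in
    [-limit, limit] with limit = sqrt(6/(n_in + n_out)), which keeps the
    variance of activations and of back-propagated gradients roughly
    constant from layer to layer."""
    if rng is None:
        rng = np.random.default_rng()
    limit = np.sqrt(6.0 / (n_in + n_out))
    return rng.uniform(-limit, limit, size=(n_in, n_out))

# Example: weights for a 2 -> 50 -> 1 network
W1 = glorot_uniform(2, 50)
W2 = glorot_uniform(50, 1)
```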
+ ] + }, + { + "cell_type": "markdown", + "id": "756185f5", + "metadata": { + "editable": true + }, + "source": [ + "## Insights from the paper by Glorot and Bengio\n", + "\n", + "One of the insights in the 2010 paper by Glorot and Bengio was that\n", + "the vanishing/exploding gradients problems were in part due to a poor\n", + "choice of activation function. Until then most people had assumed that\n", + "if Nature had chosen to use roughly sigmoid activation functions in\n", + "biological neurons, they must be an excellent choice. But it turns out\n", + "that other activation functions behave much better in deep neural\n", + "networks, in particular the ReLU activation function, mostly because\n", + "it does not saturate for positive values (and also because it is quite\n", + "fast to compute)." + ] + }, + { + "cell_type": "markdown", + "id": "3d92cad4", + "metadata": { + "editable": true + }, + "source": [ + "## The RELU function family\n", + "\n", + "The ReLU activation function suffers from a problem known as the dying\n", + "ReLUs: during training, some neurons effectively die, meaning they\n", + "stop outputting anything other than 0.\n", + "\n", + "In some cases, you may find that half of your network’s neurons are\n", + "dead, especially if you used a large learning rate. During training,\n", + "if a neuron’s weights get updated such that the weighted sum of the\n", + "neuron’s inputs is negative, it will start outputting 0. When this\n", + "happen, the neuron is unlikely to come back to life since the gradient\n", + "of the ReLU function is 0 when its input is negative." + ] + }, + { + "cell_type": "markdown", + "id": "cbc6f721", + "metadata": { + "editable": true + }, + "source": [ + "## ELU function\n", + "\n", + "To solve this problem, nowadays practitioners use a variant of the\n", + "ReLU function, such as the leaky ReLU discussed above or the so-called\n", + "exponential linear unit (ELU) function" + ] + }, + { + "cell_type": "markdown", + "id": "9249dc7b", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "ELU(z) = \\left\\{\\begin{array}{cc} \\alpha\\left( \\exp{(z)}-1\\right) & z < 0,\\\\ z & z \\ge 0.\\end{array}\\right.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "e59de3af", + "metadata": { + "editable": true + }, + "source": [ + "## Which activation function should we use?\n", + "\n", + "In general it seems that the ELU activation function is better than\n", + "the leaky ReLU function (and its variants), which is better than\n", + "ReLU. ReLU performs better than $\\tanh$ which in turn performs better\n", + "than the logistic function.\n", + "\n", + "If runtime performance is an issue, then you may opt for the leaky\n", + "ReLU function over the ELU function If you don’t want to tweak yet\n", + "another hyperparameter, you may just use the default $\\alpha$ of\n", + "$0.01$ for the leaky ReLU, and $1$ for ELU. If you have spare time and\n", + "computing power, you can use cross-validation or bootstrap to evaluate\n", + "other activation functions." 
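+ "\n",
+ "A small sketch (added for illustration, not part of the original text) of the leaky ReLU and the ELU with the default $\\alpha$ values mentioned above:\n",
+ "\n",
+ "```python\n",
+ "import numpy as np\n",
+ "\n",
+ "def leaky_relu(z, alpha=0.01):\n",
+ "    return np.where(z < 0, alpha * z, z)\n",
+ "\n",
+ "def elu(z, alpha=1.0):\n",
+ "    return np.where(z < 0, alpha * (np.exp(z) - 1.0), z)\n",
+ "\n",
+ "z = np.array([-2.0, -0.5, 0.0, 1.5])\n",
+ "print(leaky_relu(z))  # -0.02, -0.005, 0.0, 1.5\n",
+ "print(elu(z))         # approximately -0.865, -0.393, 0.0, 1.5\n",
+ "```"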
+ ] + }, + { + "cell_type": "markdown", + "id": "e2da998c", + "metadata": { + "editable": true + }, + "source": [ + "## More on activation functions, output layers\n", + "\n", + "In most cases you can use the ReLU activation function in the hidden\n", + "layers (or one of its variants).\n", + "\n", + "It is a bit faster to compute than other activation functions, and the\n", + "gradient descent optimization does in general not get stuck.\n", + "\n", + "**For the output layer:**\n", + "\n", + "* For classification the softmax activation function is generally a good choice for classification tasks (when the classes are mutually exclusive).\n", + "\n", + "* For regression tasks, you can simply use no activation function at all." + ] + }, + { + "cell_type": "markdown", + "id": "e1abf01e", + "metadata": { + "editable": true + }, + "source": [ + "## Fine-tuning neural network hyperparameters\n", + "\n", + "The flexibility of neural networks is also one of their main\n", + "drawbacks: there are many hyperparameters to tweak. Not only can you\n", + "use any imaginable network topology (how neurons/nodes are\n", + "interconnected), but even in a simple FFNN you can change the number\n", + "of layers, the number of neurons per layer, the type of activation\n", + "function to use in each layer, the weight initialization logic, the\n", + "stochastic gradient optmized and much more. How do you know what\n", + "combination of hyperparameters is the best for your task?\n", + "\n", + "* You can use grid search with cross-validation to find the right hyperparameters.\n", + "\n", + "However,since there are many hyperparameters to tune, and since\n", + "training a neural network on a large dataset takes a lot of time, you\n", + "will only be able to explore a tiny part of the hyperparameter space.\n", + "\n", + "* You can use randomized search.\n", + "\n", + "* Or use tools like [Oscar](http://oscar.calldesk.ai/), which implements more complex algorithms to help you find a good set of hyperparameters quickly." + ] + }, + { + "cell_type": "markdown", + "id": "a8ded7cd", + "metadata": { + "editable": true + }, + "source": [ + "## Hidden layers\n", + "\n", + "For many problems you can start with just one or two hidden layers and\n", + "it will work just fine. For the MNIST data set discussed below you can easily get a\n", + "high accuracy using just one hidden layer with a few hundred neurons.\n", + "You can reach for this data set above 98% accuracy using two hidden\n", + "layers with the same total amount of neurons, in roughly the same\n", + "amount of training time.\n", + "\n", + "For more complex problems, you can gradually ramp up the number of\n", + "hidden layers, until you start overfitting the training set. Very\n", + "complex tasks, such as large image classification or speech\n", + "recognition, typically require networks with dozens of layers and they\n", + "need a huge amount of training data. However, you will rarely have to\n", + "train such networks from scratch: it is much more common to reuse\n", + "parts of a pretrained state-of-the-art network that performs a similar\n", + "task." 
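+ "\n",
+ "To make the grid/randomized search discussed above concrete, here is a small sketch using scikit-learn's RandomizedSearchCV on the digits data (the parameter ranges below are illustrative choices, not recommendations from the text):\n",
+ "\n",
+ "```python\n",
+ "from sklearn.datasets import load_digits\n",
+ "from sklearn.model_selection import RandomizedSearchCV\n",
+ "from sklearn.neural_network import MLPClassifier\n",
+ "\n",
+ "X, y = load_digits(return_X_y=True)\n",
+ "\n",
+ "param_distributions = {\n",
+ "    'hidden_layer_sizes': [(30,), (50,), (50, 50)],\n",
+ "    'learning_rate_init': [1e-4, 1e-3, 1e-2, 1e-1],\n",
+ "    'alpha': [1e-5, 1e-4, 1e-3, 1e-2, 1e-1],  # L2 regularization strength\n",
+ "}\n",
+ "\n",
+ "search = RandomizedSearchCV(MLPClassifier(max_iter=200), param_distributions,\n",
+ "                            n_iter=10, cv=3, random_state=0)\n",
+ "search.fit(X, y)\n",
+ "print(search.best_params_, search.best_score_)\n",
+ "```"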
+ ] + }, + { + "cell_type": "markdown", + "id": "96da4f48", + "metadata": { + "editable": true + }, + "source": [ + "## Batch Normalization\n", + "\n", + "Batch Normalization aims to address the vanishing/exploding gradients\n", + "problems, and more generally the problem that the distribution of each\n", + "layer’s inputs changes during training, as the parameters of the\n", + "previous layers change.\n", + "\n", + "The technique consists of adding an operation in the model just before\n", + "the activation function of each layer, simply zero-centering and\n", + "normalizing the inputs, then scaling and shifting the result using two\n", + "new parameters per layer (one for scaling, the other for shifting). In\n", + "other words, this operation lets the model learn the optimal scale and\n", + "mean of the inputs for each layer. In order to zero-center and\n", + "normalize the inputs, the algorithm needs to estimate the inputs’ mean\n", + "and standard deviation. It does so by evaluating the mean and standard\n", + "deviation of the inputs over the current mini-batch, from this the\n", + "name batch normalization." + ] + }, + { + "cell_type": "markdown", + "id": "395346a7", + "metadata": { + "editable": true + }, + "source": [ + "## Dropout\n", + "\n", + "It is a fairly simple algorithm: at every training step, every neuron\n", + "(including the input neurons but excluding the output neurons) has a\n", + "probability $p$ of being temporarily dropped out, meaning it will be\n", + "entirely ignored during this training step, but it may be active\n", + "during the next step.\n", + "\n", + "The hyperparameter $p$ is called the dropout rate, and it is typically\n", + "set to 50%. After training, the neurons are not dropped anymore. It\n", + "is viewed as one of the most popular regularization techniques." + ] + }, + { + "cell_type": "markdown", + "id": "9c712bbb", + "metadata": { + "editable": true + }, + "source": [ + "## Gradient Clipping\n", + "\n", + "A popular technique to lessen the exploding gradients problem is to\n", + "simply clip the gradients during backpropagation so that they never\n", + "exceed some threshold (this is mostly useful for recurrent neural\n", + "networks).\n", + "\n", + "This technique is called Gradient Clipping.\n", + "\n", + "In general however, Batch\n", + "Normalization is preferred." + ] + }, + { + "cell_type": "markdown", + "id": "2b66ea72", + "metadata": { + "editable": true + }, + "source": [ + "## A top-down perspective on Neural networks\n", + "\n", + "The first thing we would like to do is divide the data into two or\n", + "three parts. A training set, a validation or dev (development) set,\n", + "and a test set. The test set is the data on which we want to make\n", + "predictions. The dev set is a subset of the training data we use to\n", + "check how well we are doing out-of-sample, after training the model on\n", + "the training dataset. We use the validation error as a proxy for the\n", + "test error in order to make tweaks to our model. It is crucial that we\n", + "do not use any of the test data to train the algorithm. This is a\n", + "cardinal sin in ML. Then:\n", + "\n", + "1. Estimate optimal error rate\n", + "\n", + "2. Minimize underfitting (bias) on training data set.\n", + "\n", + "3. Make sure you are not overfitting." 
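+ "\n",
+ "A minimal sketch of such a three-way split (the arrays below are random placeholders used only for illustration):\n",
+ "\n",
+ "```python\n",
+ "import numpy as np\n",
+ "from sklearn.model_selection import train_test_split\n",
+ "\n",
+ "X = np.random.randn(1000, 64)        # placeholder inputs\n",
+ "y = np.random.randint(0, 10, 1000)   # placeholder labels\n",
+ "\n",
+ "# first split off the test set, then carve a dev/validation set out of the rest\n",
+ "X_trainval, X_test, y_trainval, y_test = train_test_split(X, y, test_size=0.2)\n",
+ "X_train, X_dev, y_train, y_dev = train_test_split(X_trainval, y_trainval, test_size=0.25)\n",
+ "\n",
+ "print(len(X_train), len(X_dev), len(X_test))  # 600 200 200\n",
+ "```"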
+ ] + }, + { + "cell_type": "markdown", + "id": "5acbc082", + "metadata": { + "editable": true + }, + "source": [ + "## More top-down perspectives\n", + "\n", + "If the validation and test sets are drawn from the same distributions,\n", + "then a good performance on the validation set should lead to similarly\n", + "good performance on the test set. \n", + "\n", + "However, sometimes\n", + "the training data and test data differ in subtle ways because, for\n", + "example, they are collected using slightly different methods, or\n", + "because it is cheaper to collect data in one way versus another. In\n", + "this case, there can be a mismatch between the training and test\n", + "data. This can lead to the neural network overfitting these small\n", + "differences between the test and training sets, and a poor performance\n", + "on the test set despite having a good performance on the validation\n", + "set. To rectify this, Andrew Ng suggests making two validation or dev\n", + "sets, one constructed from the training data and one constructed from\n", + "the test data. The difference between the performance of the algorithm\n", + "on these two validation sets quantifies the train-test mismatch. This\n", + "can serve as another important diagnostic when using DNNs for\n", + "supervised learning." + ] + }, + { + "cell_type": "markdown", + "id": "31825b65", + "metadata": { + "editable": true + }, + "source": [ + "## Limitations of supervised learning with deep networks\n", + "\n", + "Like all statistical methods, supervised learning using neural\n", + "networks has important limitations. This is especially important when\n", + "one seeks to apply these methods, especially to physics problems. Like\n", + "all tools, DNNs are not a universal solution. Often, the same or\n", + "better performance on a task can be achieved by using a few\n", + "hand-engineered features (or even a collection of random\n", + "features)." + ] + }, + { + "cell_type": "markdown", + "id": "c76d9af9", + "metadata": { + "editable": true + }, + "source": [ + "## Limitations of NNs\n", + "\n", + "Here we list some of the important limitations of supervised neural network based models. \n", + "\n", + "* **Need labeled data**. All supervised learning methods, DNNs for supervised learning require labeled data. Often, labeled data is harder to acquire than unlabeled data (e.g. one must pay for human experts to label images).\n", + "\n", + "* **Supervised neural networks are extremely data intensive.** DNNs are data hungry. They perform best when data is plentiful. This is doubly so for supervised methods where the data must also be labeled. The utility of DNNs is extremely limited if data is hard to acquire or the datasets are small (hundreds to a few thousand samples). In this case, the performance of other methods that utilize hand-engineered features can exceed that of DNNs." + ] + }, + { + "cell_type": "markdown", + "id": "bdc93363", + "metadata": { + "editable": true + }, + "source": [ + "## Homogeneous data\n", + "\n", + "* **Homogeneous data.** Almost all DNNs deal with homogeneous data of one type. It is very hard to design architectures that mix and match data types (i.e. some continuous variables, some discrete variables, some time series). In applications beyond images, video, and language, this is often what is required. In contrast, ensemble models like random forests or gradient-boosted trees have no difficulty handling mixed data types." 
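+ "\n",
+ "As a sketch of the two-dev-set diagnostic described above (the helper below is hypothetical; `model` stands for any fitted classifier):\n",
+ "\n",
+ "```python\n",
+ "from sklearn.metrics import accuracy_score\n",
+ "\n",
+ "def mismatch_gap(model, X_train_dev, y_train_dev, X_test_dev, y_test_dev):\n",
+ "    # one dev set drawn from the training distribution, one from the test distribution;\n",
+ "    # a large gap signals train/test mismatch rather than ordinary overfitting\n",
+ "    acc_train_dev = accuracy_score(y_train_dev, model.predict(X_train_dev))\n",
+ "    acc_test_dev = accuracy_score(y_test_dev, model.predict(X_test_dev))\n",
+ "    return acc_train_dev - acc_test_dev\n",
+ "```"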
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "a1d6ff64",
+ "metadata": {
+ "editable": true
+ },
+ "source": [
+ "## More limitations\n",
+ "\n",
+ "* **Many problems are not about prediction.** In natural science we are often interested in learning something about the underlying distribution that generates the data. In this case, it is often difficult to cast these ideas in a supervised learning setting. While the problems are related, it is possible to make good predictions with a *wrong* model. The model might or might not be useful for understanding the underlying science.\n",
+ "\n",
+ "Some of these remarks are particular to DNNs, others are shared by all supervised learning methods. This motivates the use of unsupervised methods which in part circumvent these problems."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "0c2e5742",
+ "metadata": {
+ "editable": true
+ },
+ "source": [
+ "## Setting up a Multi-layer perceptron model for classification\n",
+ "\n",
+ "We are now going to develop an example based on the MNIST\n",
+ "database. This is a classification problem and we need to use the\n",
+ "cross-entropy function we discussed in connection with logistic\n",
+ "regression. The cross-entropy defines our cost function for\n",
+ "classification problems with neural networks.\n",
+ "\n",
+ "In binary classification with two classes $(0, 1)$ we define the\n",
+ "logistic/sigmoid function as the probability that a particular input\n",
+ "is in class $0$ or $1$. This is possible because the logistic\n",
+ "function takes any input from the real numbers and outputs a number\n",
+ "between 0 and 1, and can therefore be interpreted as a probability. It\n",
+ "also has other nice properties, such as a derivative that is simple to\n",
+ "calculate.\n",
+ "\n",
+ "For an input $\\boldsymbol{a}$ from the hidden layer, the probability that the input $\\boldsymbol{x}$\n",
+ "is in class 0 or 1 is given below. We let $\\boldsymbol{\\theta}$ represent the unknown weights and biases to be adjusted by our equations. The variable $x$\n",
+ "represents our activation values $z$. We have"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "d4da3f02",
+ "metadata": {
+ "editable": true
+ },
+ "source": [
+ "$$\n",
+ "P(y = 0 \\mid \\boldsymbol{x}, \\boldsymbol{\\theta}) = \\frac{1}{1 + \\exp{(-\\boldsymbol{x})}} ,\n",
+ "$$"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "01ea2e0b",
+ "metadata": {
+ "editable": true
+ },
+ "source": [
+ "and"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "9c1c7bec",
+ "metadata": {
+ "editable": true
+ },
+ "source": [
+ "$$\n",
+ "P(y = 1 \\mid \\boldsymbol{x}, \\boldsymbol{\\theta}) = 1 - P(y = 0 \\mid \\boldsymbol{x}, \\boldsymbol{\\theta}) ,\n",
+ "$$"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "9238ff2d",
+ "metadata": {
+ "editable": true
+ },
+ "source": [
+ "where $y \\in \\{0, 1\\}$ and $\\boldsymbol{\\theta}$ represents the weights and biases\n",
+ "of our network.\n",
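+ "\n",
+ "A tiny numerical sketch of these two class probabilities (illustration only):\n",
+ "\n",
+ "```python\n",
+ "import numpy as np\n",
+ "\n",
+ "def sigmoid(x):\n",
+ "    return 1.0 / (1.0 + np.exp(-x))\n",
+ "\n",
+ "x = np.array([-3.0, 0.0, 2.0])   # example activations\n",
+ "p_class0 = sigmoid(x)\n",
+ "p_class1 = 1.0 - p_class0\n",
+ "print(p_class0 + p_class1)       # always 1, as probabilities must sum to one\n",
+ "```"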
+ ] + }, + { + "cell_type": "markdown", + "id": "3be74bd1", + "metadata": { + "editable": true + }, + "source": [ + "## Defining the cost function\n", + "\n", + "Our cost function is given as (see the Logistic regression lectures)" + ] + }, + { + "cell_type": "markdown", + "id": "2e2fd39c", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathcal{C}(\\boldsymbol{\\theta}) = - \\ln P(\\mathcal{D} \\mid \\boldsymbol{\\theta}) = - \\sum_{i=1}^n\n", + "y_i \\ln[P(y_i = 0)] + (1 - y_i) \\ln [1 - P(y_i = 0)] = \\sum_{i=1}^n \\mathcal{L}_i(\\boldsymbol{\\theta}) .\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "42b1d26b", + "metadata": { + "editable": true + }, + "source": [ + "This last equality means that we can interpret our *cost* function as a sum over the *loss* function\n", + "for each point in the dataset $\\mathcal{L}_i(\\boldsymbol{\\theta})$. \n", + "The negative sign is just so that we can think about our algorithm as minimizing a positive number, rather\n", + "than maximizing a negative number. \n", + "\n", + "In *multiclass* classification it is common to treat each integer label as a so called *one-hot* vector: \n", + "\n", + "$y = 5 \\quad \\rightarrow \\quad \\boldsymbol{y} = (0, 0, 0, 0, 0, 1, 0, 0, 0, 0) ,$ and\n", + "\n", + "$y = 1 \\quad \\rightarrow \\quad \\boldsymbol{y} = (0, 1, 0, 0, 0, 0, 0, 0, 0, 0) ,$ \n", + "\n", + "i.e. a binary bit string of length $C$, where $C = 10$ is the number of classes in the MNIST dataset (numbers from $0$ to $9$).. \n", + "\n", + "If $\\boldsymbol{x}_i$ is the $i$-th input (image), $y_{ic}$ refers to the $c$-th component of the $i$-th\n", + "output vector $\\boldsymbol{y}_i$. \n", + "The probability of $\\boldsymbol{x}_i$ being in class $c$ will be given by the softmax function:" + ] + }, + { + "cell_type": "markdown", + "id": "f740a484", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "P(y_{ic} = 1 \\mid \\boldsymbol{x}_i, \\boldsymbol{\\theta}) = \\frac{\\exp{((\\boldsymbol{a}_i^{hidden})^T \\boldsymbol{w}_c)}}\n", + "{\\sum_{c'=0}^{C-1} \\exp{((\\boldsymbol{a}_i^{hidden})^T \\boldsymbol{w}_{c'})}} ,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "19189bfc", + "metadata": { + "editable": true + }, + "source": [ + "which reduces to the logistic function in the binary case. \n", + "The likelihood of this $C$-class classifier\n", + "is now given as:" + ] + }, + { + "cell_type": "markdown", + "id": "aeb3ef60", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "P(\\mathcal{D} \\mid \\boldsymbol{\\theta}) = \\prod_{i=1}^n \\prod_{c=0}^{C-1} [P(y_{ic} = 1)]^{y_{ic}} .\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "dbf419a1", + "metadata": { + "editable": true + }, + "source": [ + "Again we take the negative log-likelihood to define our cost function:" + ] + }, + { + "cell_type": "markdown", + "id": "9e345753", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathcal{C}(\\boldsymbol{\\theta}) = - \\log{P(\\mathcal{D} \\mid \\boldsymbol{\\theta})}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "3b13095e", + "metadata": { + "editable": true + }, + "source": [ + "See the logistic regression lectures for a full definition of the cost function.\n", + "\n", + "The back propagation equations need now only a small change, namely the definition of a new cost function. We are thus ready to use the same equations as before!" 
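+ "\n",
+ "A small sketch (added for illustration) of the one-hot encoding and the corresponding negative log-likelihood cost:\n",
+ "\n",
+ "```python\n",
+ "import numpy as np\n",
+ "\n",
+ "def to_one_hot(y, n_classes=10):\n",
+ "    onehot = np.zeros((len(y), n_classes))\n",
+ "    onehot[np.arange(len(y)), y] = 1\n",
+ "    return onehot\n",
+ "\n",
+ "def cross_entropy(probabilities, y_onehot):\n",
+ "    # only the probability assigned to the correct class contributes\n",
+ "    return -np.sum(y_onehot * np.log(probabilities))\n",
+ "\n",
+ "y = np.array([5, 1])\n",
+ "probs = np.full((2, 10), 0.1)   # a classifier that assigns 0.1 to every class\n",
+ "print(cross_entropy(probs, to_one_hot(y)))  # 2*ln(10), approximately 4.61\n",
+ "```"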
+ ] + }, + { + "cell_type": "markdown", + "id": "96501a91", + "metadata": { + "editable": true + }, + "source": [ + "## Example: binary classification problem\n", + "\n", + "As an example of the above, relevant for project 2 as well, let us consider a binary class. As discussed in our logistic regression lectures, we defined a cost function in terms of the parameters $\\beta$ as" + ] + }, + { + "cell_type": "markdown", + "id": "48cf79fe", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathcal{C}(\\boldsymbol{\\beta}) = - \\sum_{i=1}^n \\left(y_i\\log{p(y_i \\vert x_i,\\boldsymbol{\\beta})}+(1-y_i)\\log{1-p(y_i \\vert x_i,\\boldsymbol{\\beta})}\\right),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "3243c0b1", + "metadata": { + "editable": true + }, + "source": [ + "where we had defined the logistic (sigmoid) function" + ] + }, + { + "cell_type": "markdown", + "id": "bb312a09", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "p(y_i =1\\vert x_i,\\boldsymbol{\\beta})=\\frac{\\exp{(\\beta_0+\\beta_1 x_i)}}{1+\\exp{(\\beta_0+\\beta_1 x_i)}},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "484cf2b4", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "2b9c5483", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "p(y_i =0\\vert x_i,\\boldsymbol{\\beta})=1-p(y_i =1\\vert x_i,\\boldsymbol{\\beta}).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "5ca21f09", + "metadata": { + "editable": true + }, + "source": [ + "The parameters $\\boldsymbol{\\beta}$ were defined using a minimization method like gradient descent or Newton-Raphson's method. \n", + "\n", + "Now we replace $x_i$ with the activation $z_i^l$ for a given layer $l$ and the outputs as $y_i=a_i^l=f(z_i^l)$, with $z_i^l$ now being a function of the weights $w_{ij}^l$ and biases $b_i^l$. \n", + "We have then" + ] + }, + { + "cell_type": "markdown", + "id": "4852e4d2", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "a_i^l = y_i = \\frac{\\exp{(z_i^l)}}{1+\\exp{(z_i^l)}},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "e3b7cbef", + "metadata": { + "editable": true + }, + "source": [ + "with" + ] + }, + { + "cell_type": "markdown", + "id": "0c1e69a1", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "z_i^l = \\sum_{j}w_{ij}^l a_j^{l-1}+b_i^l,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "e71df7f4", + "metadata": { + "editable": true + }, + "source": [ + "where the superscript $l-1$ indicates that these are the outputs from layer $l-1$.\n", + "Our cost function at the final layer $l=L$ is now" + ] + }, + { + "cell_type": "markdown", + "id": "50d6fecc", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathcal{C}(\\boldsymbol{W}) = - \\sum_{i=1}^n \\left(t_i\\log{a_i^L}+(1-t_i)\\log{(1-a_i^L)}\\right),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "e145e461", + "metadata": { + "editable": true + }, + "source": [ + "where we have defined the targets $t_i$. 
The derivatives of the cost function with respect to the output $a_i^L$ are then easily calculated and we get" + ] + }, + { + "cell_type": "markdown", + "id": "97f13260", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial \\mathcal{C}(\\boldsymbol{W})}{\\partial a_i^L} = \\frac{a_i^L-t_i}{a_i^L(1-a_i^L)}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "4361ce3b", + "metadata": { + "editable": true + }, + "source": [ + "In case we use another activation function than the logistic one, we need to evaluate other derivatives." + ] + }, + { + "cell_type": "markdown", + "id": "52a16654", + "metadata": { + "editable": true + }, + "source": [ + "## The Softmax function\n", + "In case we employ the more general case given by the Softmax equation, we need to evaluate the derivative of the activation function with respect to the activation $z_i^l$, that is we need" + ] + }, + { + "cell_type": "markdown", + "id": "3bfb321e", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial f(z_i^l)}{\\partial w_{jk}^l} =\n", + "\\frac{\\partial f(z_i^l)}{\\partial z_j^l} \\frac{\\partial z_j^l}{\\partial w_{jk}^l}= \\frac{\\partial f(z_i^l)}{\\partial z_j^l}a_k^{l-1}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "eccac6c9", + "metadata": { + "editable": true + }, + "source": [ + "For the Softmax function we have" + ] + }, + { + "cell_type": "markdown", + "id": "23634198", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "f(z_i^l) = \\frac{\\exp{(z_i^l)}}{\\sum_{m=1}^K\\exp{(z_m^l)}}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "7a2e75ba", + "metadata": { + "editable": true + }, + "source": [ + "Its derivative with respect to $z_j^l$ gives" + ] + }, + { + "cell_type": "markdown", + "id": "2dad2d14", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial f(z_i^l)}{\\partial z_j^l}= f(z_i^l)\\left(\\delta_{ij}-f(z_j^l)\\right),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "46415917", + "metadata": { + "editable": true + }, + "source": [ + "which in case of the simply binary model reduces to having $i=j$." + ] + }, + { + "cell_type": "markdown", + "id": "6adc7c1e", + "metadata": { + "editable": true + }, + "source": [ + "## Developing a code for doing neural networks with back propagation\n", + "\n", + "One can identify a set of key steps when using neural networks to solve supervised learning problems: \n", + "\n", + "1. Collect and pre-process data \n", + "\n", + "2. Define model and architecture \n", + "\n", + "3. Choose cost function and optimizer \n", + "\n", + "4. Train the model \n", + "\n", + "5. Evaluate model performance on test data \n", + "\n", + "6. Adjust hyperparameters (if necessary, network architecture)" + ] + }, + { + "cell_type": "markdown", + "id": "4110d83e", + "metadata": { + "editable": true + }, + "source": [ + "## Collect and pre-process data\n", + "\n", + "Here we will be using the MNIST dataset, which is readily available through the **scikit-learn**\n", + "package. You may also find it for example [here](http://yann.lecun.com/exdb/mnist/). \n", + "The *MNIST* (Modified National Institute of Standards and Technology) database is a large database\n", + "of handwritten digits that is commonly used for training various image processing systems. \n", + "The MNIST dataset consists of 70 000 images of size $28\\times 28$ pixels, each labeled from 0 to 9. 
\n", + "The scikit-learn dataset we will use consists of a selection of 1797 images of size $8\\times 8$ collected and processed from this database. \n", + "\n", + "To feed data into a feed-forward neural network we need to represent\n", + "the inputs as a design/feature matrix $X = (n_{inputs}, n_{features})$. Each\n", + "row represents an *input*, in this case a handwritten digit, and\n", + "each column represents a *feature*, in this case a pixel. The\n", + "correct answers, also known as *labels* or *targets* are\n", + "represented as a 1D array of integers \n", + "$Y = (n_{inputs}) = (5, 3, 1, 8,...)$.\n", + "\n", + "As an example, say we want to build a neural network using supervised learning to predict Body-Mass Index (BMI) from\n", + "measurements of height (in m) \n", + "and weight (in kg). If we have measurements of 5 people the design/feature matrix could be for example: \n", + "\n", + "$$ X = \\begin{bmatrix}\n", + "1.85 & 81\\\\\n", + "1.71 & 65\\\\\n", + "1.95 & 103\\\\\n", + "1.55 & 42\\\\\n", + "1.63 & 56\n", + "\\end{bmatrix} ,$$ \n", + "\n", + "and the targets would be: \n", + "\n", + "$$ Y = (23.7, 22.2, 27.1, 17.5, 21.1) $$ \n", + "\n", + "Since each input image is a 2D matrix, we need to flatten the image\n", + "(i.e. \"unravel\" the 2D matrix into a 1D array) to turn the data into a\n", + "design/feature matrix. This means we lose all spatial information in the\n", + "image, such as locality and translational invariance. More complicated\n", + "architectures such as Convolutional Neural Networks can take advantage\n", + "of such information, and are most commonly applied when analyzing\n", + "images." + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "070c610d", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# import necessary packages\n", + "import numpy as np\n", + "import matplotlib.pyplot as plt\n", + "from sklearn import datasets\n", + "\n", + "\n", + "# ensure the same random numbers appear every time\n", + "np.random.seed(0)\n", + "\n", + "# display images in notebook\n", + "%matplotlib inline\n", + "plt.rcParams['figure.figsize'] = (12,12)\n", + "\n", + "\n", + "# download MNIST dataset\n", + "digits = datasets.load_digits()\n", + "\n", + "# define inputs and labels\n", + "inputs = digits.images\n", + "labels = digits.target\n", + "\n", + "print(\"inputs = (n_inputs, pixel_width, pixel_height) = \" + str(inputs.shape))\n", + "print(\"labels = (n_inputs) = \" + str(labels.shape))\n", + "\n", + "\n", + "# flatten the image\n", + "# the value -1 means dimension is inferred from the remaining dimensions: 8x8 = 64\n", + "n_inputs = len(inputs)\n", + "inputs = inputs.reshape(n_inputs, -1)\n", + "print(\"X = (n_inputs, n_features) = \" + str(inputs.shape))\n", + "\n", + "\n", + "# choose some random images to display\n", + "indices = np.arange(n_inputs)\n", + "random_indices = np.random.choice(indices, size=5)\n", + "\n", + "for i, image in enumerate(digits.images[random_indices]):\n", + " plt.subplot(1, 5, i+1)\n", + " plt.axis('off')\n", + " plt.imshow(image, cmap=plt.cm.gray_r, interpolation='nearest')\n", + " plt.title(\"Label: %d\" % digits.target[random_indices[i]])\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "28bb6085", + "metadata": { + "editable": true + }, + "source": [ + "## Train and test datasets\n", + "\n", + "Performing analysis before partitioning the dataset is a major error, that can lead to incorrect conclusions. 
\n", + "\n", + "We will reserve $80 \\%$ of our dataset for training and $20 \\%$ for testing. \n", + "\n", + "It is important that the train and test datasets are drawn randomly from our dataset, to ensure\n", + "no bias in the sampling. \n", + "Say you are taking measurements of weather data to predict the weather in the coming 5 days.\n", + "You don't want to train your model on measurements taken from the hours 00.00 to 12.00, and then test it on data\n", + "collected from 12.00 to 24.00." + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "5a6ae0b0", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "from sklearn.model_selection import train_test_split\n", + "\n", + "# one-liner from scikit-learn library\n", + "train_size = 0.8\n", + "test_size = 1 - train_size\n", + "X_train, X_test, Y_train, Y_test = train_test_split(inputs, labels, train_size=train_size,\n", + " test_size=test_size)\n", + "\n", + "# equivalently in numpy\n", + "def train_test_split_numpy(inputs, labels, train_size, test_size):\n", + " n_inputs = len(inputs)\n", + " inputs_shuffled = inputs.copy()\n", + " labels_shuffled = labels.copy()\n", + " \n", + " np.random.shuffle(inputs_shuffled)\n", + " np.random.shuffle(labels_shuffled)\n", + " \n", + " train_end = int(n_inputs*train_size)\n", + " X_train, X_test = inputs_shuffled[:train_end], inputs_shuffled[train_end:]\n", + " Y_train, Y_test = labels_shuffled[:train_end], labels_shuffled[train_end:]\n", + " \n", + " return X_train, X_test, Y_train, Y_test\n", + "\n", + "#X_train, X_test, Y_train, Y_test = train_test_split_numpy(inputs, labels, train_size, test_size)\n", + "\n", + "print(\"Number of training images: \" + str(len(X_train)))\n", + "print(\"Number of test images: \" + str(len(X_test)))" + ] + }, + { + "cell_type": "markdown", + "id": "c26d604d", + "metadata": { + "editable": true + }, + "source": [ + "## Define model and architecture\n", + "\n", + "Our simple feed-forward neural network will consist of an *input* layer, a single *hidden* layer and an *output* layer. The activation $y$ of each neuron is a weighted sum of inputs, passed through an activation function. In case of the simple perceptron model we have \n", + "\n", + "$$ z = \\sum_{i=1}^n w_i a_i ,$$\n", + "\n", + "$$ y = f(z) ,$$\n", + "\n", + "where $f$ is the activation function, $a_i$ represents input from neuron $i$ in the preceding layer\n", + "and $w_i$ is the weight to input $i$. \n", + "The activation of the neurons in the input layer is just the features (e.g. a pixel value). \n", + "\n", + "The simplest activation function for a neuron is the *Heaviside* function:\n", + "\n", + "$$ f(z) = \n", + "\\begin{cases}\n", + "1, & z > 0\\\\\n", + "0, & \\text{otherwise}\n", + "\\end{cases}\n", + "$$\n", + "\n", + "A feed-forward neural network with this activation is known as a *perceptron*. \n", + "For a binary classifier (i.e. two classes, 0 or 1, dog or not-dog) we can also use this in our output layer. \n", + "This activation can be generalized to $k$ classes (using e.g. the *one-against-all* strategy), \n", + "and we call these architectures *multiclass perceptrons*. \n", + "\n", + "However, it is now common to use the terms Single Layer Perceptron (SLP) (1 hidden layer) and \n", + "Multilayer Perceptron (MLP) (2 or more hidden layers) to refer to feed-forward neural networks with any activation function. 
\n", + "\n", + "Typical choices for activation functions include the sigmoid function, hyperbolic tangent, and Rectified Linear Unit (ReLU). \n", + "We will be using the sigmoid function $\\sigma(x)$: \n", + "\n", + "$$ f(x) = \\sigma(x) = \\frac{1}{1 + e^{-x}} ,$$\n", + "\n", + "which is inspired by probability theory (see logistic regression) and was most commonly used until about 2011. See the discussion below concerning other activation functions." + ] + }, + { + "cell_type": "markdown", + "id": "2775283b", + "metadata": { + "editable": true + }, + "source": [ + "## Layers\n", + "\n", + "* Input \n", + "\n", + "Since each input image has 8x8 = 64 pixels or features, we have an input layer of 64 neurons. \n", + "\n", + "* Hidden layer\n", + "\n", + "We will use 50 neurons in the hidden layer receiving input from the neurons in the input layer. \n", + "Since each neuron in the hidden layer is connected to the 64 inputs we have 64x50 = 3200 weights to the hidden layer. \n", + "\n", + "* Output\n", + "\n", + "If we were building a binary classifier, it would be sufficient with a single neuron in the output layer,\n", + "which could output 0 or 1 according to the Heaviside function. This would be an example of a *hard* classifier, meaning it outputs the class of the input directly. However, if we are dealing with noisy data it is often beneficial to use a *soft* classifier, which outputs the probability of being in class 0 or 1. \n", + "\n", + "For a soft binary classifier, we could use a single neuron and interpret the output as either being the probability of being in class 0 or the probability of being in class 1. Alternatively we could use 2 neurons, and interpret each neuron as the probability of being in each class. \n", + "\n", + "Since we are doing multiclass classification, with 10 categories, it is natural to use 10 neurons in the output layer. We number the neurons $j = 0,1,...,9$. The activation of each output neuron $j$ will be according to the *softmax* function: \n", + "\n", + "$$ P(\\text{class $j$} \\mid \\text{input $\\boldsymbol{a}$}) = \\frac{\\exp{(\\boldsymbol{a}^T \\boldsymbol{w}_j)}}\n", + "{\\sum_{c=0}^{9} \\exp{(\\boldsymbol{a}^T \\boldsymbol{w}_c)}} ,$$ \n", + "\n", + "i.e. each neuron $j$ outputs the probability of being in class $j$ given an input from the hidden layer $\\boldsymbol{a}$, with $\\boldsymbol{w}_j$ the weights of neuron $j$ to the inputs. \n", + "The denominator is a normalization factor to ensure the outputs (probabilities) sum up to 1. \n", + "The exponent is just the weighted sum of inputs as before: \n", + "\n", + "$$ z_j = \\sum_{i=1}^n w_ {ij} a_i+b_j.$$ \n", + "\n", + "Since each neuron in the output layer is connected to the 50 inputs from the hidden layer we have 50x10 = 500\n", + "weights to the output layer." + ] + }, + { + "cell_type": "markdown", + "id": "f7455c00", + "metadata": { + "editable": true + }, + "source": [ + "## Weights and biases\n", + "\n", + "Typically weights are initialized with small values distributed around zero, drawn from a uniform\n", + "or normal distribution. Setting all weights to zero means all neurons give the same output, making the network useless. \n", + "\n", + "Adding a bias value to the weighted sum of inputs allows the neural network to represent a greater range\n", + "of values. Without it, any input with the value 0 will be mapped to zero (before being passed through the activation). 
The bias unit has an output of 1, and a weight to each neuron $j$, $b_j$: \n", + "\n", + "$$ z_j = \\sum_{i=1}^n w_ {ij} a_i + b_j.$$ \n", + "\n", + "The bias weights $\\boldsymbol{b}$ are often initialized to zero, but a small value like $0.01$ ensures all neurons have some output which can be backpropagated in the first training cycle." + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "20b3c8c0", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# building our neural network\n", + "\n", + "n_inputs, n_features = X_train.shape\n", + "n_hidden_neurons = 50\n", + "n_categories = 10\n", + "\n", + "# we make the weights normally distributed using numpy.random.randn\n", + "\n", + "# weights and bias in the hidden layer\n", + "hidden_weights = np.random.randn(n_features, n_hidden_neurons)\n", + "hidden_bias = np.zeros(n_hidden_neurons) + 0.01\n", + "\n", + "# weights and bias in the output layer\n", + "output_weights = np.random.randn(n_hidden_neurons, n_categories)\n", + "output_bias = np.zeros(n_categories) + 0.01" + ] + }, + { + "cell_type": "markdown", + "id": "a41d9acd", + "metadata": { + "editable": true + }, + "source": [ + "## Feed-forward pass\n", + "\n", + "Denote $F$ the number of features, $H$ the number of hidden neurons and $C$ the number of categories. \n", + "For each input image we calculate a weighted sum of input features (pixel values) to each neuron $j$ in the hidden layer $l$: \n", + "\n", + "$$ z_{j}^{l} = \\sum_{i=1}^{F} w_{ij}^{l} x_i + b_{j}^{l},$$\n", + "\n", + "this is then passed through our activation function \n", + "\n", + "$$ a_{j}^{l} = f(z_{j}^{l}) .$$ \n", + "\n", + "We calculate a weighted sum of inputs (activations in the hidden layer) to each neuron $j$ in the output layer: \n", + "\n", + "$$ z_{j}^{L} = \\sum_{i=1}^{H} w_{ij}^{L} a_{i}^{l} + b_{j}^{L}.$$ \n", + "\n", + "Finally we calculate the output of neuron $j$ in the output layer using the softmax function: \n", + "\n", + "$$ a_{j}^{L} = \\frac{\\exp{(z_j^{L})}}\n", + "{\\sum_{c=0}^{C-1} \\exp{(z_c^{L})}} .$$" + ] + }, + { + "cell_type": "markdown", + "id": "b2f64238", + "metadata": { + "editable": true + }, + "source": [ + "## Matrix multiplications\n", + "\n", + "Since our data has the dimensions $X = (n_{inputs}, n_{features})$ and our weights to the hidden\n", + "layer have the dimensions \n", + "$W_{hidden} = (n_{features}, n_{hidden})$,\n", + "we can easily feed the network all our training data in one go by taking the matrix product \n", + "\n", + "$$ X W^{h} = (n_{inputs}, n_{hidden}),$$ \n", + "\n", + "and obtain a matrix that holds the weighted sum of inputs to the hidden layer\n", + "for each input image and each hidden neuron. \n", + "We also add the bias to obtain a matrix of weighted sums to the hidden layer $Z^{h}$: \n", + "\n", + "$$ \\boldsymbol{z}^{l} = \\boldsymbol{X} \\boldsymbol{W}^{l} + \\boldsymbol{b}^{l} ,$$\n", + "\n", + "meaning the same bias (1D array with size equal number of hidden neurons) is added to each input image. 
\n", + "This is then passed through the activation: \n", + "\n", + "$$ \\boldsymbol{a}^{l} = f(\\boldsymbol{z}^l) .$$ \n", + "\n", + "This is fed to the output layer: \n", + "\n", + "$$ \\boldsymbol{z}^{L} = \\boldsymbol{a}^{L} \\boldsymbol{W}^{L} + \\boldsymbol{b}^{L} .$$\n", + "\n", + "Finally we receive our output values for each image and each category by passing it through the softmax function: \n", + "\n", + "$$ output = softmax (\\boldsymbol{z}^{L}) = (n_{inputs}, n_{categories}) .$$" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "1f5589af", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# setup the feed-forward pass, subscript h = hidden layer\n", + "\n", + "def sigmoid(x):\n", + " return 1/(1 + np.exp(-x))\n", + "\n", + "def feed_forward(X):\n", + " # weighted sum of inputs to the hidden layer\n", + " z_h = np.matmul(X, hidden_weights) + hidden_bias\n", + " # activation in the hidden layer\n", + " a_h = sigmoid(z_h)\n", + " \n", + " # weighted sum of inputs to the output layer\n", + " z_o = np.matmul(a_h, output_weights) + output_bias\n", + " # softmax output\n", + " # axis 0 holds each input and axis 1 the probabilities of each category\n", + " exp_term = np.exp(z_o)\n", + " probabilities = exp_term / np.sum(exp_term, axis=1, keepdims=True)\n", + " \n", + " return probabilities\n", + "\n", + "probabilities = feed_forward(X_train)\n", + "print(\"probabilities = (n_inputs, n_categories) = \" + str(probabilities.shape))\n", + "print(\"probability that image 0 is in category 0,1,2,...,9 = \\n\" + str(probabilities[0]))\n", + "print(\"probabilities sum up to: \" + str(probabilities[0].sum()))\n", + "print()\n", + "\n", + "# we obtain a prediction by taking the class with the highest likelihood\n", + "def predict(X):\n", + " probabilities = feed_forward(X)\n", + " return np.argmax(probabilities, axis=1)\n", + "\n", + "predictions = predict(X_train)\n", + "print(\"predictions = (n_inputs) = \" + str(predictions.shape))\n", + "print(\"prediction for image 0: \" + str(predictions[0]))\n", + "print(\"correct label for image 0: \" + str(Y_train[0]))" + ] + }, + { + "cell_type": "markdown", + "id": "4518e911", + "metadata": { + "editable": true + }, + "source": [ + "## Choose cost function and optimizer\n", + "\n", + "To measure how well our neural network is doing we need to introduce a cost function. \n", + "We will call the function that gives the error of a single sample output the *loss* function, and the function\n", + "that gives the total error of our network across all samples the *cost* function.\n", + "A typical choice for multiclass classification is the *cross-entropy* loss, also known as the negative log likelihood. \n", + "\n", + "In *multiclass* classification it is common to treat each integer label as a so called *one-hot* vector: \n", + "\n", + "$$ y = 5 \\quad \\rightarrow \\quad \\boldsymbol{y} = (0, 0, 0, 0, 0, 1, 0, 0, 0, 0) ,$$ \n", + "\n", + "$$ y = 1 \\quad \\rightarrow \\quad \\boldsymbol{y} = (0, 1, 0, 0, 0, 0, 0, 0, 0, 0) ,$$ \n", + "\n", + "i.e. a binary bit string of length $C$, where $C = 10$ is the number of classes in the MNIST dataset. \n", + "\n", + "Let $y_{ic}$ denote the $c$-th component of the $i$-th one-hot vector. 
\n", + "We define the cost function $\\mathcal{C}$ as a sum over the cross-entropy loss for each point $\\boldsymbol{x}_i$ in the dataset.\n", + "\n", + "In the one-hot representation only one of the terms in the loss function is non-zero, namely the\n", + "probability of the correct category $c'$ \n", + "(i.e. the category $c'$ such that $y_{ic'} = 1$). This means that the cross entropy loss only punishes you for how wrong\n", + "you got the correct label. The probability of category $c$ is given by the softmax function. The vector $\\boldsymbol{\\theta}$ represents the parameters of our network, i.e. all the weights and biases." + ] + }, + { + "cell_type": "markdown", + "id": "d519516b", + "metadata": { + "editable": true + }, + "source": [ + "## Optimizing the cost function\n", + "\n", + "The network is trained by finding the weights and biases that minimize the cost function. One of the most widely used classes of methods is *gradient descent* and its generalizations. The idea behind gradient descent\n", + "is simply to adjust the weights in the direction where the gradient of the cost function is large and negative. This ensures we flow toward a *local* minimum of the cost function. \n", + "Each parameter $\\theta$ is iteratively adjusted according to the rule \n", + "\n", + "$$ \\theta_{i+1} = \\theta_i - \\eta \\nabla \\mathcal{C}(\\theta_i) ,$$\n", + "\n", + "where $\\eta$ is known as the *learning rate*, which controls how big a step we take towards the minimum. \n", + "This update can be repeated for any number of iterations, or until we are satisfied with the result. \n", + "\n", + "A simple and effective improvement is a variant called *Batch Gradient Descent*. \n", + "Instead of calculating the gradient on the whole dataset, we calculate an approximation of the gradient\n", + "on a subset of the data called a *minibatch*. \n", + "If there are $N$ data points and we have a minibatch size of $M$, the total number of batches\n", + "is $N/M$. \n", + "We denote each minibatch $B_k$, with $k = 1, 2,...,N/M$. The gradient then becomes: \n", + "\n", + "$$ \\nabla \\mathcal{C}(\\theta) = \\frac{1}{N} \\sum_{i=1}^N \\nabla \\mathcal{L}_i(\\theta) \\quad \\rightarrow \\quad\n", + "\\frac{1}{M} \\sum_{i \\in B_k} \\nabla \\mathcal{L}_i(\\theta) ,$$\n", + "\n", + "i.e. instead of averaging the loss over the entire dataset, we average over a minibatch. \n", + "\n", + "This has two important benefits: \n", + "1. Introducing stochasticity decreases the chance that the algorithm becomes stuck in a local minima. \n", + "\n", + "2. It significantly speeds up the calculation, since we do not have to use the entire dataset to calculate the gradient. \n", + "\n", + "The various optmization methods, with codes and algorithms, are discussed in our lectures on [Gradient descent approaches](https://compphysics.github.io/MachineLearning/doc/pub/Splines/html/Splines-bs.html)." + ] + }, + { + "cell_type": "markdown", + "id": "46b71202", + "metadata": { + "editable": true + }, + "source": [ + "## Regularization\n", + "\n", + "It is common to add an extra term to the cost function, proportional\n", + "to the size of the weights. 
This is equivalent to constraining the\n", + "size of the weights, so that they do not grow out of control.\n", + "Constraining the size of the weights means that the weights cannot\n", + "grow arbitrarily large to fit the training data, and in this way\n", + "reduces *overfitting*.\n", + "\n", + "We will measure the size of the weights using the so called *L2-norm*, meaning our cost function becomes: \n", + "\n", + "$$ \\mathcal{C}(\\theta) = \\frac{1}{N} \\sum_{i=1}^N \\mathcal{L}_i(\\theta) \\quad \\rightarrow \\quad\n", + "\\frac{1}{N} \\sum_{i=1}^N \\mathcal{L}_i(\\theta) + \\lambda \\lvert \\lvert \\boldsymbol{w} \\rvert \\rvert_2^2 \n", + "= \\frac{1}{N} \\sum_{i=1}^N \\mathcal{L}(\\theta) + \\lambda \\sum_{ij} w_{ij}^2,$$ \n", + "\n", + "i.e. we sum up all the weights squared. The factor $\\lambda$ is known as a regularization parameter.\n", + "\n", + "In order to train the model, we need to calculate the derivative of\n", + "the cost function with respect to every bias and weight in the\n", + "network. In total our network has $(64 + 1)\\times 50=3250$ weights in\n", + "the hidden layer and $(50 + 1)\\times 10=510$ weights to the output\n", + "layer ($+1$ for the bias), and the gradient must be calculated for\n", + "every parameter. We use the *backpropagation* algorithm discussed\n", + "above. This is a clever use of the chain rule that allows us to\n", + "calculate the gradient efficently." + ] + }, + { + "cell_type": "markdown", + "id": "129c39d3", + "metadata": { + "editable": true + }, + "source": [ + "## Matrix multiplication\n", + "\n", + "To more efficently train our network these equations are implemented using matrix operations. \n", + "The error in the output layer is calculated simply as, with $\\boldsymbol{t}$ being our targets, \n", + "\n", + "$$ \\delta_L = \\boldsymbol{t} - \\boldsymbol{y} = (n_{inputs}, n_{categories}) .$$ \n", + "\n", + "The gradient for the output weights is calculated as \n", + "\n", + "$$ \\nabla W_{L} = \\boldsymbol{a}^T \\delta_L = (n_{hidden}, n_{categories}) ,$$\n", + "\n", + "where $\\boldsymbol{a} = (n_{inputs}, n_{hidden})$. This simply means that we are summing up the gradients for each input. \n", + "Since we are going backwards we have to transpose the activation matrix. \n", + "\n", + "The gradient with respect to the output bias is then \n", + "\n", + "$$ \\nabla \\boldsymbol{b}_{L} = \\sum_{i=1}^{n_{inputs}} \\delta_L = (n_{categories}) .$$ \n", + "\n", + "The error in the hidden layer is \n", + "\n", + "$$ \\Delta_h = \\delta_L W_{L}^T \\circ f'(z_{h}) = \\delta_L W_{L}^T \\circ a_{h} \\circ (1 - a_{h}) = (n_{inputs}, n_{hidden}) ,$$ \n", + "\n", + "where $f'(a_{h})$ is the derivative of the activation in the hidden layer. The matrix products mean\n", + "that we are summing up the products for each neuron in the output layer. The symbol $\\circ$ denotes\n", + "the *Hadamard product*, meaning element-wise multiplication. 
\n", + "\n", + "This again gives us the gradients in the hidden layer: \n", + "\n", + "$$ \\nabla W_{h} = X^T \\delta_h = (n_{features}, n_{hidden}) ,$$ \n", + "\n", + "$$ \\nabla b_{h} = \\sum_{i=1}^{n_{inputs}} \\delta_h = (n_{hidden}) .$$" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "8abafb44", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# to categorical turns our integer vector into a onehot representation\n", + "from sklearn.metrics import accuracy_score\n", + "\n", + "# one-hot in numpy\n", + "def to_categorical_numpy(integer_vector):\n", + " n_inputs = len(integer_vector)\n", + " n_categories = np.max(integer_vector) + 1\n", + " onehot_vector = np.zeros((n_inputs, n_categories))\n", + " onehot_vector[range(n_inputs), integer_vector] = 1\n", + " \n", + " return onehot_vector\n", + "\n", + "#Y_train_onehot, Y_test_onehot = to_categorical(Y_train), to_categorical(Y_test)\n", + "Y_train_onehot, Y_test_onehot = to_categorical_numpy(Y_train), to_categorical_numpy(Y_test)\n", + "\n", + "def feed_forward_train(X):\n", + " # weighted sum of inputs to the hidden layer\n", + " z_h = np.matmul(X, hidden_weights) + hidden_bias\n", + " # activation in the hidden layer\n", + " a_h = sigmoid(z_h)\n", + " \n", + " # weighted sum of inputs to the output layer\n", + " z_o = np.matmul(a_h, output_weights) + output_bias\n", + " # softmax output\n", + " # axis 0 holds each input and axis 1 the probabilities of each category\n", + " exp_term = np.exp(z_o)\n", + " probabilities = exp_term / np.sum(exp_term, axis=1, keepdims=True)\n", + " \n", + " # for backpropagation need activations in hidden and output layers\n", + " return a_h, probabilities\n", + "\n", + "def backpropagation(X, Y):\n", + " a_h, probabilities = feed_forward_train(X)\n", + " \n", + " # error in the output layer\n", + " error_output = probabilities - Y\n", + " # error in the hidden layer\n", + " error_hidden = np.matmul(error_output, output_weights.T) * a_h * (1 - a_h)\n", + " \n", + " # gradients for the output layer\n", + " output_weights_gradient = np.matmul(a_h.T, error_output)\n", + " output_bias_gradient = np.sum(error_output, axis=0)\n", + " \n", + " # gradient for the hidden layer\n", + " hidden_weights_gradient = np.matmul(X.T, error_hidden)\n", + " hidden_bias_gradient = np.sum(error_hidden, axis=0)\n", + "\n", + " return output_weights_gradient, output_bias_gradient, hidden_weights_gradient, hidden_bias_gradient\n", + "\n", + "print(\"Old accuracy on training data: \" + str(accuracy_score(predict(X_train), Y_train)))\n", + "\n", + "eta = 0.01\n", + "lmbd = 0.01\n", + "for i in range(1000):\n", + " # calculate gradients\n", + " dWo, dBo, dWh, dBh = backpropagation(X_train, Y_train_onehot)\n", + " \n", + " # regularization term gradients\n", + " dWo += lmbd * output_weights\n", + " dWh += lmbd * hidden_weights\n", + " \n", + " # update weights and biases\n", + " output_weights -= eta * dWo\n", + " output_bias -= eta * dBo\n", + " hidden_weights -= eta * dWh\n", + " hidden_bias -= eta * dBh\n", + "\n", + "print(\"New accuracy on training data: \" + str(accuracy_score(predict(X_train), Y_train)))" + ] + }, + { + "cell_type": "markdown", + "id": "e95c7166", + "metadata": { + "editable": true + }, + "source": [ + "## Improving performance\n", + "\n", + "As we can see the network does not seem to be learning at all. It seems to be just guessing the label for each image. 
\n", + "In order to obtain a network that does something useful, we will have to do a bit more work. \n", + "\n", + "The choice of *hyperparameters* such as learning rate and regularization parameter is hugely influential for the performance of the network. Typically a *grid-search* is performed, wherein we test different hyperparameters separated by orders of magnitude. For example we could test the learning rates $\\eta = 10^{-6}, 10^{-5},...,10^{-1}$ with different regularization parameters $\\lambda = 10^{-6},...,10^{-0}$. \n", + "\n", + "Next, we haven't implemented minibatching yet, which introduces stochasticity and is though to act as an important regularizer on the weights. We call a feed-forward + backward pass with a minibatch an *iteration*, and a full training period\n", + "going through the entire dataset ($n/M$ batches) an *epoch*.\n", + "\n", + "If this does not improve network performance, you may want to consider altering the network architecture, adding more neurons or hidden layers. \n", + "Andrew Ng goes through some of these considerations in this [video](https://youtu.be/F1ka6a13S9I). You can find a summary of the video [here](https://kevinzakka.github.io/2016/09/26/applying-deep-learning/)." + ] + }, + { + "cell_type": "markdown", + "id": "b4365471", + "metadata": { + "editable": true + }, + "source": [ + "## Full object-oriented implementation\n", + "\n", + "It is very natural to think of the network as an object, with specific instances of the network\n", + "being realizations of this object with different hyperparameters. An implementation using Python classes provides a clean structure and interface, and the full implementation of our neural network is given below." + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "5a0357b2", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "class NeuralNetwork:\n", + " def __init__(\n", + " self,\n", + " X_data,\n", + " Y_data,\n", + " n_hidden_neurons=50,\n", + " n_categories=10,\n", + " epochs=10,\n", + " batch_size=100,\n", + " eta=0.1,\n", + " lmbd=0.0):\n", + "\n", + " self.X_data_full = X_data\n", + " self.Y_data_full = Y_data\n", + "\n", + " self.n_inputs = X_data.shape[0]\n", + " self.n_features = X_data.shape[1]\n", + " self.n_hidden_neurons = n_hidden_neurons\n", + " self.n_categories = n_categories\n", + "\n", + " self.epochs = epochs\n", + " self.batch_size = batch_size\n", + " self.iterations = self.n_inputs // self.batch_size\n", + " self.eta = eta\n", + " self.lmbd = lmbd\n", + "\n", + " self.create_biases_and_weights()\n", + "\n", + " def create_biases_and_weights(self):\n", + " self.hidden_weights = np.random.randn(self.n_features, self.n_hidden_neurons)\n", + " self.hidden_bias = np.zeros(self.n_hidden_neurons) + 0.01\n", + "\n", + " self.output_weights = np.random.randn(self.n_hidden_neurons, self.n_categories)\n", + " self.output_bias = np.zeros(self.n_categories) + 0.01\n", + "\n", + " def feed_forward(self):\n", + " # feed-forward for training\n", + " self.z_h = np.matmul(self.X_data, self.hidden_weights) + self.hidden_bias\n", + " self.a_h = sigmoid(self.z_h)\n", + "\n", + " self.z_o = np.matmul(self.a_h, self.output_weights) + self.output_bias\n", + "\n", + " exp_term = np.exp(self.z_o)\n", + " self.probabilities = exp_term / np.sum(exp_term, axis=1, keepdims=True)\n", + "\n", + " def feed_forward_out(self, X):\n", + " # feed-forward for output\n", + " z_h = np.matmul(X, self.hidden_weights) + self.hidden_bias\n", + " a_h = sigmoid(z_h)\n", 
+ "\n", + " z_o = np.matmul(a_h, self.output_weights) + self.output_bias\n", + " \n", + " exp_term = np.exp(z_o)\n", + " probabilities = exp_term / np.sum(exp_term, axis=1, keepdims=True)\n", + " return probabilities\n", + "\n", + " def backpropagation(self):\n", + " error_output = self.probabilities - self.Y_data\n", + " error_hidden = np.matmul(error_output, self.output_weights.T) * self.a_h * (1 - self.a_h)\n", + "\n", + " self.output_weights_gradient = np.matmul(self.a_h.T, error_output)\n", + " self.output_bias_gradient = np.sum(error_output, axis=0)\n", + "\n", + " self.hidden_weights_gradient = np.matmul(self.X_data.T, error_hidden)\n", + " self.hidden_bias_gradient = np.sum(error_hidden, axis=0)\n", + "\n", + " if self.lmbd > 0.0:\n", + " self.output_weights_gradient += self.lmbd * self.output_weights\n", + " self.hidden_weights_gradient += self.lmbd * self.hidden_weights\n", + "\n", + " self.output_weights -= self.eta * self.output_weights_gradient\n", + " self.output_bias -= self.eta * self.output_bias_gradient\n", + " self.hidden_weights -= self.eta * self.hidden_weights_gradient\n", + " self.hidden_bias -= self.eta * self.hidden_bias_gradient\n", + "\n", + " def predict(self, X):\n", + " probabilities = self.feed_forward_out(X)\n", + " return np.argmax(probabilities, axis=1)\n", + "\n", + " def predict_probabilities(self, X):\n", + " probabilities = self.feed_forward_out(X)\n", + " return probabilities\n", + "\n", + " def train(self):\n", + " data_indices = np.arange(self.n_inputs)\n", + "\n", + " for i in range(self.epochs):\n", + " for j in range(self.iterations):\n", + " # pick datapoints with replacement\n", + " chosen_datapoints = np.random.choice(\n", + " data_indices, size=self.batch_size, replace=False\n", + " )\n", + "\n", + " # minibatch training data\n", + " self.X_data = self.X_data_full[chosen_datapoints]\n", + " self.Y_data = self.Y_data_full[chosen_datapoints]\n", + "\n", + " self.feed_forward()\n", + " self.backpropagation()" + ] + }, + { + "cell_type": "markdown", + "id": "a417307d", + "metadata": { + "editable": true + }, + "source": [ + "## Evaluate model performance on test data\n", + "\n", + "To measure the performance of our network we evaluate how well it does it data it has never seen before, i.e. the test data. \n", + "We measure the performance of the network using the *accuracy* score. \n", + "The accuracy is as you would expect just the number of images correctly labeled divided by the total number of images. A perfect classifier will have an accuracy score of $1$. \n", + "\n", + "$$ \\text{Accuracy} = \\frac{\\sum_{i=1}^n I(\\tilde{y}_i = y_i)}{n} ,$$ \n", + "\n", + "where $I$ is the indicator function, $1$ if $\\tilde{y}_i = y_i$ and $0$ otherwise." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 9, + "id": "8ee4b306", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "epochs = 100\n", + "batch_size = 100\n", + "\n", + "dnn = NeuralNetwork(X_train, Y_train_onehot, eta=eta, lmbd=lmbd, epochs=epochs, batch_size=batch_size,\n", + " n_hidden_neurons=n_hidden_neurons, n_categories=n_categories)\n", + "dnn.train()\n", + "test_predict = dnn.predict(X_test)\n", + "\n", + "# accuracy score from scikit library\n", + "print(\"Accuracy score on test set: \", accuracy_score(Y_test, test_predict))\n", + "\n", + "# equivalent in numpy\n", + "def accuracy_score_numpy(Y_test, Y_pred):\n", + " return np.sum(Y_test == Y_pred) / len(Y_test)\n", + "\n", + "#print(\"Accuracy score on test set: \", accuracy_score_numpy(Y_test, test_predict))" + ] + }, + { + "cell_type": "markdown", + "id": "efcbd954", + "metadata": { + "editable": true + }, + "source": [ + "## Adjust hyperparameters\n", + "\n", + "We now perform a grid search to find the optimal hyperparameters for the network. \n", + "Note that we are only using 1 layer with 50 neurons, and human performance is estimated to be around $98\\%$ ($2\\%$ error rate)." + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "id": "bb527e6e", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "eta_vals = np.logspace(-5, 1, 7)\n", + "lmbd_vals = np.logspace(-5, 1, 7)\n", + "# store the models for later use\n", + "DNN_numpy = np.zeros((len(eta_vals), len(lmbd_vals)), dtype=object)\n", + "\n", + "# grid search\n", + "for i, eta in enumerate(eta_vals):\n", + " for j, lmbd in enumerate(lmbd_vals):\n", + " dnn = NeuralNetwork(X_train, Y_train_onehot, eta=eta, lmbd=lmbd, epochs=epochs, batch_size=batch_size,\n", + " n_hidden_neurons=n_hidden_neurons, n_categories=n_categories)\n", + " dnn.train()\n", + " \n", + " DNN_numpy[i][j] = dnn\n", + " \n", + " test_predict = dnn.predict(X_test)\n", + " \n", + " print(\"Learning rate = \", eta)\n", + " print(\"Lambda = \", lmbd)\n", + " print(\"Accuracy score on test set: \", accuracy_score(Y_test, test_predict))\n", + " print()" + ] + }, + { + "cell_type": "markdown", + "id": "d282951d", + "metadata": { + "editable": true + }, + "source": [ + "## Visualization" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "id": "69d3d9c8", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# visual representation of grid search\n", + "# uses seaborn heatmap, you can also do this with matplotlib imshow\n", + "import seaborn as sns\n", + "\n", + "sns.set()\n", + "\n", + "train_accuracy = np.zeros((len(eta_vals), len(lmbd_vals)))\n", + "test_accuracy = np.zeros((len(eta_vals), len(lmbd_vals)))\n", + "\n", + "for i in range(len(eta_vals)):\n", + " for j in range(len(lmbd_vals)):\n", + " dnn = DNN_numpy[i][j]\n", + " \n", + " train_pred = dnn.predict(X_train) \n", + " test_pred = dnn.predict(X_test)\n", + "\n", + " train_accuracy[i][j] = accuracy_score(Y_train, train_pred)\n", + " test_accuracy[i][j] = accuracy_score(Y_test, test_pred)\n", + "\n", + " \n", + "fig, ax = plt.subplots(figsize = (10, 10))\n", + "sns.heatmap(train_accuracy, annot=True, ax=ax, cmap=\"viridis\")\n", + "ax.set_title(\"Training Accuracy\")\n", + "ax.set_ylabel(\"$\\eta$\")\n", + "ax.set_xlabel(\"$\\lambda$\")\n", + "plt.show()\n", + "\n", + "fig, ax = plt.subplots(figsize = (10, 10))\n", + "sns.heatmap(test_accuracy, annot=True, ax=ax, cmap=\"viridis\")\n", + 
"ax.set_title(\"Test Accuracy\")\n", + "ax.set_ylabel(\"$\\eta$\")\n", + "ax.set_xlabel(\"$\\lambda$\")\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "99f5058c", + "metadata": { + "editable": true + }, + "source": [ + "## scikit-learn implementation\n", + "\n", + "**scikit-learn** focuses more\n", + "on traditional machine learning methods, such as regression,\n", + "clustering, decision trees, etc. As such, it has only two types of\n", + "neural networks: Multi Layer Perceptron outputting continuous values,\n", + "*MPLRegressor*, and Multi Layer Perceptron outputting labels,\n", + "*MLPClassifier*. We will see how simple it is to use these classes.\n", + "\n", + "**scikit-learn** implements a few improvements from our neural network,\n", + "such as early stopping, a varying learning rate, different\n", + "optimization methods, etc. We would therefore expect a better\n", + "performance overall." + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "id": "7898d99f", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "from sklearn.neural_network import MLPClassifier\n", + "# store models for later use\n", + "DNN_scikit = np.zeros((len(eta_vals), len(lmbd_vals)), dtype=object)\n", + "\n", + "for i, eta in enumerate(eta_vals):\n", + " for j, lmbd in enumerate(lmbd_vals):\n", + " dnn = MLPClassifier(hidden_layer_sizes=(n_hidden_neurons), activation='logistic',\n", + " alpha=lmbd, learning_rate_init=eta, max_iter=epochs)\n", + " dnn.fit(X_train, Y_train)\n", + " \n", + " DNN_scikit[i][j] = dnn\n", + " \n", + " print(\"Learning rate = \", eta)\n", + " print(\"Lambda = \", lmbd)\n", + " print(\"Accuracy score on test set: \", dnn.score(X_test, Y_test))\n", + " print()" + ] + }, + { + "cell_type": "markdown", + "id": "7ceec918", + "metadata": { + "editable": true + }, + "source": [ + "## Visualization" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "id": "98abf229", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# optional\n", + "# visual representation of grid search\n", + "# uses seaborn heatmap, could probably do this in matplotlib\n", + "import seaborn as sns\n", + "\n", + "sns.set()\n", + "\n", + "train_accuracy = np.zeros((len(eta_vals), len(lmbd_vals)))\n", + "test_accuracy = np.zeros((len(eta_vals), len(lmbd_vals)))\n", + "\n", + "for i in range(len(eta_vals)):\n", + " for j in range(len(lmbd_vals)):\n", + " dnn = DNN_scikit[i][j]\n", + " \n", + " train_pred = dnn.predict(X_train) \n", + " test_pred = dnn.predict(X_test)\n", + "\n", + " train_accuracy[i][j] = accuracy_score(Y_train, train_pred)\n", + " test_accuracy[i][j] = accuracy_score(Y_test, test_pred)\n", + "\n", + " \n", + "fig, ax = plt.subplots(figsize = (10, 10))\n", + "sns.heatmap(train_accuracy, annot=True, ax=ax, cmap=\"viridis\")\n", + "ax.set_title(\"Training Accuracy\")\n", + "ax.set_ylabel(\"$\\eta$\")\n", + "ax.set_xlabel(\"$\\lambda$\")\n", + "plt.show()\n", + "\n", + "fig, ax = plt.subplots(figsize = (10, 10))\n", + "sns.heatmap(test_accuracy, annot=True, ax=ax, cmap=\"viridis\")\n", + "ax.set_title(\"Test Accuracy\")\n", + "ax.set_ylabel(\"$\\eta$\")\n", + "ax.set_xlabel(\"$\\lambda$\")\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "ba07c374", + "metadata": { + "editable": true + }, + "source": [ + "## Building neural networks in Tensorflow and Keras\n", + "\n", + "Now we want to build on the experience gained from our neural network implementation in NumPy and 
scikit-learn\n", + "and use it to construct a neural network in Tensorflow. Once we have constructed a neural network in NumPy\n", + "and Tensorflow, building one in Keras is really quite trivial, though the performance may suffer. \n", + "\n", + "In our previous example we used only one hidden layer, and in this we will use two. From this it should be quite\n", + "clear how to build one using an arbitrary number of hidden layers, using data structures such as Python lists or\n", + "NumPy arrays." + ] + }, + { + "cell_type": "markdown", + "id": "1cf09819", + "metadata": { + "editable": true + }, + "source": [ + "## Tensorflow\n", + "\n", + "Tensorflow is an open source library machine learning library\n", + "developed by the Google Brain team for internal use. It was released\n", + "under the Apache 2.0 open source license in November 9, 2015.\n", + "\n", + "Tensorflow is a computational framework that allows you to construct\n", + "machine learning models at different levels of abstraction, from\n", + "high-level, object-oriented APIs like Keras, down to the C++ kernels\n", + "that Tensorflow is built upon. The higher levels of abstraction are\n", + "simpler to use, but less flexible, and our choice of implementation\n", + "should reflect the problems we are trying to solve.\n", + "\n", + "[Tensorflow uses](https://www.tensorflow.org/guide/graphs) so-called graphs to represent your computation\n", + "in terms of the dependencies between individual operations, such that you first build a Tensorflow *graph*\n", + "to represent your model, and then create a Tensorflow *session* to run the graph.\n", + "\n", + "In this guide we will analyze the same data as we did in our NumPy and\n", + "scikit-learn tutorial, gathered from the MNIST database of images. 
We\n", + "will give an introduction to the lower level Python Application\n", + "Program Interfaces (APIs), and see how we use them to build our graph.\n", + "Then we will build (effectively) the same graph in Keras, to see just\n", + "how simple solving a machine learning problem can be.\n", + "\n", + "To install tensorflow on Unix/Linux systems, use pip as" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "id": "2c2c3ec5", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "pip3 install tensorflow" + ] + }, + { + "cell_type": "markdown", + "id": "39d013b1", + "metadata": { + "editable": true + }, + "source": [ + "and/or if you use **anaconda**, just write (or install from the graphical user interface)\n", + "(current release of CPU-only TensorFlow)" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "id": "fbf36c26", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "conda create -n tf tensorflow\n", + "conda activate tf" + ] + }, + { + "cell_type": "markdown", + "id": "94e66380", + "metadata": { + "editable": true + }, + "source": [ + "To install the current release of GPU TensorFlow" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "id": "5e72b1d2", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "conda create -n tf-gpu tensorflow-gpu\n", + "conda activate tf-gpu" + ] + }, + { + "cell_type": "markdown", + "id": "40470dbd", + "metadata": { + "editable": true + }, + "source": [ + "## Using Keras\n", + "\n", + "Keras is a high level [neural network](https://en.wikipedia.org/wiki/Application_programming_interface)\n", + "that supports Tensorflow, CTNK and Theano as backends. \n", + "If you have Anaconda installed you may run the following command" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "id": "f2cd4f41", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "conda install keras" + ] + }, + { + "cell_type": "markdown", + "id": "636940c6", + "metadata": { + "editable": true + }, + "source": [ + "You can look up the [instructions here](https://keras.io/) for more information.\n", + "\n", + "We will to a large extent use **keras** in this course." + ] + }, + { + "cell_type": "markdown", + "id": "d9f47b57", + "metadata": { + "editable": true + }, + "source": [ + "## Collect and pre-process data\n", + "\n", + "Let us look again at the MINST data set." 
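+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "1a2b3c4d",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "Before loading the data, it can be useful to verify that the installation above actually works. The cell below is a minimal sanity check assuming a TensorFlow 2.x installation; the version string and the list of GPU devices it prints will of course depend on your own setup."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "5e6f7a8b",
+   "metadata": {
+    "collapsed": false,
+    "editable": true
+   },
+   "outputs": [],
+   "source": [
+    "# quick sanity check of the TensorFlow installation\n",
+    "import tensorflow as tf\n",
+    "\n",
+    "print(\"TensorFlow version:\", tf.__version__)\n",
+    "print(\"GPU devices:\", tf.config.list_physical_devices('GPU'))"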
+ ] + }, + { + "cell_type": "code", + "execution_count": 18, + "id": "1489b5d5", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# import necessary packages\n", + "import numpy as np\n", + "import matplotlib.pyplot as plt\n", + "import tensorflow as tf\n", + "from sklearn import datasets\n", + "\n", + "\n", + "# ensure the same random numbers appear every time\n", + "np.random.seed(0)\n", + "\n", + "# display images in notebook\n", + "%matplotlib inline\n", + "plt.rcParams['figure.figsize'] = (12,12)\n", + "\n", + "\n", + "# download MNIST dataset\n", + "digits = datasets.load_digits()\n", + "\n", + "# define inputs and labels\n", + "inputs = digits.images\n", + "labels = digits.target\n", + "\n", + "print(\"inputs = (n_inputs, pixel_width, pixel_height) = \" + str(inputs.shape))\n", + "print(\"labels = (n_inputs) = \" + str(labels.shape))\n", + "\n", + "\n", + "# flatten the image\n", + "# the value -1 means dimension is inferred from the remaining dimensions: 8x8 = 64\n", + "n_inputs = len(inputs)\n", + "inputs = inputs.reshape(n_inputs, -1)\n", + "print(\"X = (n_inputs, n_features) = \" + str(inputs.shape))\n", + "\n", + "\n", + "# choose some random images to display\n", + "indices = np.arange(n_inputs)\n", + "random_indices = np.random.choice(indices, size=5)\n", + "\n", + "for i, image in enumerate(digits.images[random_indices]):\n", + " plt.subplot(1, 5, i+1)\n", + " plt.axis('off')\n", + " plt.imshow(image, cmap=plt.cm.gray_r, interpolation='nearest')\n", + " plt.title(\"Label: %d\" % digits.target[random_indices[i]])\n", + "plt.show()" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "id": "672dc5a2", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "from tensorflow.keras.layers import Input\n", + "from tensorflow.keras.models import Sequential #This allows appending layers to existing models\n", + "from tensorflow.keras.layers import Dense #This allows defining the characteristics of a particular layer\n", + "from tensorflow.keras import optimizers #This allows using whichever optimiser we want (sgd,adam,RMSprop)\n", + "from tensorflow.keras import regularizers #This allows using whichever regularizer we want (l1,l2,l1_l2)\n", + "from tensorflow.keras.utils import to_categorical #This allows using categorical cross entropy as the cost function\n", + "\n", + "from sklearn.model_selection import train_test_split\n", + "\n", + "# one-hot representation of labels\n", + "labels = to_categorical(labels)\n", + "\n", + "# split into train and test data\n", + "train_size = 0.8\n", + "test_size = 1 - train_size\n", + "X_train, X_test, Y_train, Y_test = train_test_split(inputs, labels, train_size=train_size,\n", + " test_size=test_size)" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "id": "0513084f", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "\n", + "epochs = 100\n", + "batch_size = 100\n", + "n_neurons_layer1 = 100\n", + "n_neurons_layer2 = 50\n", + "n_categories = 10\n", + "eta_vals = np.logspace(-5, 1, 7)\n", + "lmbd_vals = np.logspace(-5, 1, 7)\n", + "def create_neural_network_keras(n_neurons_layer1, n_neurons_layer2, n_categories, eta, lmbd):\n", + " model = Sequential()\n", + " model.add(Dense(n_neurons_layer1, activation='sigmoid', kernel_regularizer=regularizers.l2(lmbd)))\n", + " model.add(Dense(n_neurons_layer2, activation='sigmoid', kernel_regularizer=regularizers.l2(lmbd)))\n", + " 
model.add(Dense(n_categories, activation='softmax'))\n", + " \n", + " sgd = optimizers.SGD(lr=eta)\n", + " model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])\n", + " \n", + " return model" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "id": "02a34777", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "DNN_keras = np.zeros((len(eta_vals), len(lmbd_vals)), dtype=object)\n", + " \n", + "for i, eta in enumerate(eta_vals):\n", + " for j, lmbd in enumerate(lmbd_vals):\n", + " DNN = create_neural_network_keras(n_neurons_layer1, n_neurons_layer2, n_categories,\n", + " eta=eta, lmbd=lmbd)\n", + " DNN.fit(X_train, Y_train, epochs=epochs, batch_size=batch_size, verbose=0)\n", + " scores = DNN.evaluate(X_test, Y_test)\n", + " \n", + " DNN_keras[i][j] = DNN\n", + " \n", + " print(\"Learning rate = \", eta)\n", + " print(\"Lambda = \", lmbd)\n", + " print(\"Test accuracy: %.3f\" % scores[1])\n", + " print()" + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "id": "52c1d6e2", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# optional\n", + "# visual representation of grid search\n", + "# uses seaborn heatmap, could probably do this in matplotlib\n", + "import seaborn as sns\n", + "\n", + "sns.set()\n", + "\n", + "train_accuracy = np.zeros((len(eta_vals), len(lmbd_vals)))\n", + "test_accuracy = np.zeros((len(eta_vals), len(lmbd_vals)))\n", + "\n", + "for i in range(len(eta_vals)):\n", + " for j in range(len(lmbd_vals)):\n", + " DNN = DNN_keras[i][j]\n", + "\n", + " train_accuracy[i][j] = DNN.evaluate(X_train, Y_train)[1]\n", + " test_accuracy[i][j] = DNN.evaluate(X_test, Y_test)[1]\n", + "\n", + " \n", + "fig, ax = plt.subplots(figsize = (10, 10))\n", + "sns.heatmap(train_accuracy, annot=True, ax=ax, cmap=\"viridis\")\n", + "ax.set_title(\"Training Accuracy\")\n", + "ax.set_ylabel(\"$\\eta$\")\n", + "ax.set_xlabel(\"$\\lambda$\")\n", + "plt.show()\n", + "\n", + "fig, ax = plt.subplots(figsize = (10, 10))\n", + "sns.heatmap(test_accuracy, annot=True, ax=ax, cmap=\"viridis\")\n", + "ax.set_title(\"Test Accuracy\")\n", + "ax.set_ylabel(\"$\\eta$\")\n", + "ax.set_xlabel(\"$\\lambda$\")\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "53f9be79", + "metadata": { + "editable": true + }, + "source": [ + "## Building a neural network code\n", + "\n", + "Here we present a flexible object oriented codebase\n", + "for a feed forward neural network, along with a demonstration of how\n", + "to use it. Before we get into the details of the neural network, we\n", + "will first present some implementations of various schedulers, cost\n", + "functions and activation functions that can be used together with the\n", + "neural network.\n", + "\n", + "The codes here were developed by Eric Reber and Gregor Kajda during spring 2023." + ] + }, + { + "cell_type": "markdown", + "id": "39bd1718", + "metadata": { + "editable": true + }, + "source": [ + "### Learning rate methods\n", + "\n", + "The code below shows object oriented implementations of the Constant,\n", + "Momentum, Adagrad, AdagradMomentum, RMS prop and Adam schedulers. All\n", + "of the classes belong to the shared abstract Scheduler class, and\n", + "share the update_change() and reset() methods allowing for any of the\n", + "schedulers to be seamlessly used during the training stage, as will\n", + "later be shown in the fit() method of the neural\n", + "network. 
Update_change() only has one parameter, the gradient\n", + "($δ^l_ja^{l−1}_k$), and returns the change which will be subtracted\n", + "from the weights. The reset() function takes no parameters, and resets\n", + "the desired variables. For Constant and Momentum, reset does nothing." + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "id": "4c1f42f1", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import autograd.numpy as np\n", + "\n", + "class Scheduler:\n", + " \"\"\"\n", + " Abstract class for Schedulers\n", + " \"\"\"\n", + "\n", + " def __init__(self, eta):\n", + " self.eta = eta\n", + "\n", + " # should be overwritten\n", + " def update_change(self, gradient):\n", + " raise NotImplementedError\n", + "\n", + " # overwritten if needed\n", + " def reset(self):\n", + " pass\n", + "\n", + "\n", + "class Constant(Scheduler):\n", + " def __init__(self, eta):\n", + " super().__init__(eta)\n", + "\n", + " def update_change(self, gradient):\n", + " return self.eta * gradient\n", + " \n", + " def reset(self):\n", + " pass\n", + "\n", + "\n", + "class Momentum(Scheduler):\n", + " def __init__(self, eta: float, momentum: float):\n", + " super().__init__(eta)\n", + " self.momentum = momentum\n", + " self.change = 0\n", + "\n", + " def update_change(self, gradient):\n", + " self.change = self.momentum * self.change + self.eta * gradient\n", + " return self.change\n", + "\n", + " def reset(self):\n", + " pass\n", + "\n", + "\n", + "class Adagrad(Scheduler):\n", + " def __init__(self, eta):\n", + " super().__init__(eta)\n", + " self.G_t = None\n", + "\n", + " def update_change(self, gradient):\n", + " delta = 1e-8 # avoid division ny zero\n", + "\n", + " if self.G_t is None:\n", + " self.G_t = np.zeros((gradient.shape[0], gradient.shape[0]))\n", + "\n", + " self.G_t += gradient @ gradient.T\n", + "\n", + " G_t_inverse = 1 / (\n", + " delta + np.sqrt(np.reshape(np.diagonal(self.G_t), (self.G_t.shape[0], 1)))\n", + " )\n", + " return self.eta * gradient * G_t_inverse\n", + "\n", + " def reset(self):\n", + " self.G_t = None\n", + "\n", + "\n", + "class AdagradMomentum(Scheduler):\n", + " def __init__(self, eta, momentum):\n", + " super().__init__(eta)\n", + " self.G_t = None\n", + " self.momentum = momentum\n", + " self.change = 0\n", + "\n", + " def update_change(self, gradient):\n", + " delta = 1e-8 # avoid division ny zero\n", + "\n", + " if self.G_t is None:\n", + " self.G_t = np.zeros((gradient.shape[0], gradient.shape[0]))\n", + "\n", + " self.G_t += gradient @ gradient.T\n", + "\n", + " G_t_inverse = 1 / (\n", + " delta + np.sqrt(np.reshape(np.diagonal(self.G_t), (self.G_t.shape[0], 1)))\n", + " )\n", + " self.change = self.change * self.momentum + self.eta * gradient * G_t_inverse\n", + " return self.change\n", + "\n", + " def reset(self):\n", + " self.G_t = None\n", + "\n", + "\n", + "class RMS_prop(Scheduler):\n", + " def __init__(self, eta, rho):\n", + " super().__init__(eta)\n", + " self.rho = rho\n", + " self.second = 0.0\n", + "\n", + " def update_change(self, gradient):\n", + " delta = 1e-8 # avoid division ny zero\n", + " self.second = self.rho * self.second + (1 - self.rho) * gradient * gradient\n", + " return self.eta * gradient / (np.sqrt(self.second + delta))\n", + "\n", + " def reset(self):\n", + " self.second = 0.0\n", + "\n", + "\n", + "class Adam(Scheduler):\n", + " def __init__(self, eta, rho, rho2):\n", + " super().__init__(eta)\n", + " self.rho = rho\n", + " self.rho2 = rho2\n", + " self.moment = 0\n", + " 
self.second = 0\n", + " self.n_epochs = 1\n", + "\n", + " def update_change(self, gradient):\n", + " delta = 1e-8 # avoid division ny zero\n", + "\n", + " self.moment = self.rho * self.moment + (1 - self.rho) * gradient\n", + " self.second = self.rho2 * self.second + (1 - self.rho2) * gradient * gradient\n", + "\n", + " moment_corrected = self.moment / (1 - self.rho**self.n_epochs)\n", + " second_corrected = self.second / (1 - self.rho2**self.n_epochs)\n", + "\n", + " return self.eta * moment_corrected / (np.sqrt(second_corrected + delta))\n", + "\n", + " def reset(self):\n", + " self.n_epochs += 1\n", + " self.moment = 0\n", + " self.second = 0" + ] + }, + { + "cell_type": "markdown", + "id": "532aecc2", + "metadata": { + "editable": true + }, + "source": [ + "### Usage of the above learning rate schedulers\n", + "\n", + "To initalize a scheduler, simply create the object and pass in the\n", + "necessary parameters such as the learning rate and the momentum as\n", + "shown below. As the Scheduler class is an abstract class it should not\n", + "called directly, and will raise an error upon usage." + ] + }, + { + "cell_type": "code", + "execution_count": 24, + "id": "b24b4414", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "momentum_scheduler = Momentum(eta=1e-3, momentum=0.9)\n", + "adam_scheduler = Adam(eta=1e-3, rho=0.9, rho2=0.999)" + ] + }, + { + "cell_type": "markdown", + "id": "32a25c0b", + "metadata": { + "editable": true + }, + "source": [ + "Here is a small example for how a segment of code using schedulers\n", + "could look. Switching out the schedulers is simple." + ] + }, + { + "cell_type": "code", + "execution_count": 25, + "id": "7a7d273f", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "weights = np.ones((3,3))\n", + "print(f\"Before scheduler:\\n{weights=}\")\n", + "\n", + "epochs = 10\n", + "for e in range(epochs):\n", + " gradient = np.random.rand(3, 3)\n", + " change = adam_scheduler.update_change(gradient)\n", + " weights = weights - change\n", + " adam_scheduler.reset()\n", + "\n", + "print(f\"\\nAfter scheduler:\\n{weights=}\")" + ] + }, + { + "cell_type": "markdown", + "id": "d34cd45c", + "metadata": { + "editable": true + }, + "source": [ + "### Cost functions\n", + "\n", + "Here we discuss cost functions that can be used when creating the\n", + "neural network. Every cost function takes the target vector as its\n", + "parameter, and returns a function valued only at $x$ such that it may\n", + "easily be differentiated." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 26, + "id": "9ad6425d", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import autograd.numpy as np\n", + "\n", + "def CostOLS(target):\n", + " \n", + " def func(X):\n", + " return (1.0 / target.shape[0]) * np.sum((target - X) ** 2)\n", + "\n", + " return func\n", + "\n", + "\n", + "def CostLogReg(target):\n", + "\n", + " def func(X):\n", + " \n", + " return -(1.0 / target.shape[0]) * np.sum(\n", + " (target * np.log(X + 10e-10)) + ((1 - target) * np.log(1 - X + 10e-10))\n", + " )\n", + "\n", + " return func\n", + "\n", + "\n", + "def CostCrossEntropy(target):\n", + " \n", + " def func(X):\n", + " return -(1.0 / target.size) * np.sum(target * np.log(X + 10e-10))\n", + "\n", + " return func" + ] + }, + { + "cell_type": "markdown", + "id": "baaaff79", + "metadata": { + "editable": true + }, + "source": [ + "Below we give a short example of how these cost function may be used\n", + "to obtain results if you wish to test them out on your own using\n", + "AutoGrad's automatics differentiation." + ] + }, + { + "cell_type": "code", + "execution_count": 27, + "id": "78f11b83", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "from autograd import grad\n", + "\n", + "target = np.array([[1, 2, 3]]).T\n", + "a = np.array([[4, 5, 6]]).T\n", + "\n", + "cost_func = CostCrossEntropy\n", + "cost_func_derivative = grad(cost_func(target))\n", + "\n", + "valued_at_a = cost_func_derivative(a)\n", + "print(f\"Derivative of cost function {cost_func.__name__} valued at a:\\n{valued_at_a}\")" + ] + }, + { + "cell_type": "markdown", + "id": "05285af5", + "metadata": { + "editable": true + }, + "source": [ + "### Activation functions\n", + "\n", + "Finally, before we look at the neural network, we will look at the\n", + "activation functions which can be specified between the hidden layers\n", + "and as the output function. Each function can be valued for any given\n", + "vector or matrix X, and can be differentiated via derivate()." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 28, + "id": "7ac52c84", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import autograd.numpy as np\n", + "from autograd import elementwise_grad\n", + "\n", + "def identity(X):\n", + " return X\n", + "\n", + "\n", + "def sigmoid(X):\n", + " try:\n", + " return 1.0 / (1 + np.exp(-X))\n", + " except FloatingPointError:\n", + " return np.where(X > np.zeros(X.shape), np.ones(X.shape), np.zeros(X.shape))\n", + "\n", + "\n", + "def softmax(X):\n", + " X = X - np.max(X, axis=-1, keepdims=True)\n", + " delta = 10e-10\n", + " return np.exp(X) / (np.sum(np.exp(X), axis=-1, keepdims=True) + delta)\n", + "\n", + "\n", + "def RELU(X):\n", + " return np.where(X > np.zeros(X.shape), X, np.zeros(X.shape))\n", + "\n", + "\n", + "def LRELU(X):\n", + " delta = 10e-4\n", + " return np.where(X > np.zeros(X.shape), X, delta * X)\n", + "\n", + "\n", + "def derivate(func):\n", + " if func.__name__ == \"RELU\":\n", + "\n", + " def func(X):\n", + " return np.where(X > 0, 1, 0)\n", + "\n", + " return func\n", + "\n", + " elif func.__name__ == \"LRELU\":\n", + "\n", + " def func(X):\n", + " delta = 10e-4\n", + " return np.where(X > 0, 1, delta)\n", + "\n", + " return func\n", + "\n", + " else:\n", + " return elementwise_grad(func)" + ] + }, + { + "cell_type": "markdown", + "id": "873e7caa", + "metadata": { + "editable": true + }, + "source": [ + "Below follows a short demonstration of how to use an activation\n", + "function. The derivative of the activation function will be important\n", + "when calculating the output delta term during backpropagation. Note\n", + "that derivate() can also be used for cost functions for a more\n", + "generalized approach." + ] + }, + { + "cell_type": "code", + "execution_count": 29, + "id": "bd43ac18", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "z = np.array([[4, 5, 6]]).T\n", + "print(f\"Input to activation function:\\n{z}\")\n", + "\n", + "act_func = sigmoid\n", + "a = act_func(z)\n", + "print(f\"\\nOutput from {act_func.__name__} activation function:\\n{a}\")\n", + "\n", + "act_func_derivative = derivate(act_func)\n", + "valued_at_z = act_func_derivative(a)\n", + "print(f\"\\nDerivative of {act_func.__name__} activation function valued at z:\\n{valued_at_z}\")" + ] + }, + { + "cell_type": "markdown", + "id": "3dc2175e", + "metadata": { + "editable": true + }, + "source": [ + "### The Neural Network\n", + "\n", + "Now that we have gotten a good understanding of the implementation of\n", + "some important components, we can take a look at an object oriented\n", + "implementation of a feed forward neural network. The feed forward\n", + "neural network has been implemented as a class named FFNN, which can\n", + "be initiated as a regressor or classifier dependant on the choice of\n", + "cost function. The FFNN can have any number of input nodes, hidden\n", + "layers with any amount of hidden nodes, and any amount of output nodes\n", + "meaning it can perform multiclass classification as well as binary\n", + "classification and regression problems. Although there is a lot of\n", + "code present, it makes for an easy to use and generalizeable interface\n", + "for creating many types of neural networks as will be demonstrated\n", + "below." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 30, + "id": "5b4b161c", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import math\n", + "import autograd.numpy as np\n", + "import sys\n", + "import warnings\n", + "from autograd import grad, elementwise_grad\n", + "from random import random, seed\n", + "from copy import deepcopy, copy\n", + "from typing import Tuple, Callable\n", + "from sklearn.utils import resample\n", + "\n", + "warnings.simplefilter(\"error\")\n", + "\n", + "\n", + "class FFNN:\n", + " \"\"\"\n", + " Description:\n", + " ------------\n", + " Feed Forward Neural Network with interface enabling flexible design of a\n", + " nerual networks architecture and the specification of activation function\n", + " in the hidden layers and output layer respectively. This model can be used\n", + " for both regression and classification problems, depending on the output function.\n", + "\n", + " Attributes:\n", + " ------------\n", + " I dimensions (tuple[int]): A list of positive integers, which specifies the\n", + " number of nodes in each of the networks layers. The first integer in the array\n", + " defines the number of nodes in the input layer, the second integer defines number\n", + " of nodes in the first hidden layer and so on until the last number, which\n", + " specifies the number of nodes in the output layer.\n", + " II hidden_func (Callable): The activation function for the hidden layers\n", + " III output_func (Callable): The activation function for the output layer\n", + " IV cost_func (Callable): Our cost function\n", + " V seed (int): Sets random seed, makes results reproducible\n", + " \"\"\"\n", + "\n", + " def __init__(\n", + " self,\n", + " dimensions: tuple[int],\n", + " hidden_func: Callable = sigmoid,\n", + " output_func: Callable = lambda x: x,\n", + " cost_func: Callable = CostOLS,\n", + " seed: int = None,\n", + " ):\n", + " self.dimensions = dimensions\n", + " self.hidden_func = hidden_func\n", + " self.output_func = output_func\n", + " self.cost_func = cost_func\n", + " self.seed = seed\n", + " self.weights = list()\n", + " self.schedulers_weight = list()\n", + " self.schedulers_bias = list()\n", + " self.a_matrices = list()\n", + " self.z_matrices = list()\n", + " self.classification = None\n", + "\n", + " self.reset_weights()\n", + " self._set_classification()\n", + "\n", + " def fit(\n", + " self,\n", + " X: np.ndarray,\n", + " t: np.ndarray,\n", + " scheduler: Scheduler,\n", + " batches: int = 1,\n", + " epochs: int = 100,\n", + " lam: float = 0,\n", + " X_val: np.ndarray = None,\n", + " t_val: np.ndarray = None,\n", + " ):\n", + " \"\"\"\n", + " Description:\n", + " ------------\n", + " This function performs the training the neural network by performing the feedforward and backpropagation\n", + " algorithm to update the networks weights.\n", + "\n", + " Parameters:\n", + " ------------\n", + " I X (np.ndarray) : training data\n", + " II t (np.ndarray) : target data\n", + " III scheduler (Scheduler) : specified scheduler (algorithm for optimization of gradient descent)\n", + " IV scheduler_args (list[int]) : list of all arguments necessary for scheduler\n", + "\n", + " Optional Parameters:\n", + " ------------\n", + " V batches (int) : number of batches the datasets are split into, default equal to 1\n", + " VI epochs (int) : number of iterations used to train the network, default equal to 100\n", + " VII lam (float) : regularization hyperparameter lambda\n", + " VIII X_val (np.ndarray) : 
validation set\n", + " IX t_val (np.ndarray) : validation target set\n", + "\n", + " Returns:\n", + " ------------\n", + " I scores (dict) : A dictionary containing the performance metrics of the model.\n", + " The number of the metrics depends on the parameters passed to the fit-function.\n", + "\n", + " \"\"\"\n", + "\n", + " # setup \n", + " if self.seed is not None:\n", + " np.random.seed(self.seed)\n", + "\n", + " val_set = False\n", + " if X_val is not None and t_val is not None:\n", + " val_set = True\n", + "\n", + " # creating arrays for score metrics\n", + " train_errors = np.empty(epochs)\n", + " train_errors.fill(np.nan)\n", + " val_errors = np.empty(epochs)\n", + " val_errors.fill(np.nan)\n", + "\n", + " train_accs = np.empty(epochs)\n", + " train_accs.fill(np.nan)\n", + " val_accs = np.empty(epochs)\n", + " val_accs.fill(np.nan)\n", + "\n", + " self.schedulers_weight = list()\n", + " self.schedulers_bias = list()\n", + "\n", + " batch_size = X.shape[0] // batches\n", + "\n", + " X, t = resample(X, t)\n", + "\n", + " # this function returns a function valued only at X\n", + " cost_function_train = self.cost_func(t)\n", + " if val_set:\n", + " cost_function_val = self.cost_func(t_val)\n", + "\n", + " # create schedulers for each weight matrix\n", + " for i in range(len(self.weights)):\n", + " self.schedulers_weight.append(copy(scheduler))\n", + " self.schedulers_bias.append(copy(scheduler))\n", + "\n", + " print(f\"{scheduler.__class__.__name__}: Eta={scheduler.eta}, Lambda={lam}\")\n", + "\n", + " try:\n", + " for e in range(epochs):\n", + " for i in range(batches):\n", + " # allows for minibatch gradient descent\n", + " if i == batches - 1:\n", + " # If the for loop has reached the last batch, take all thats left\n", + " X_batch = X[i * batch_size :, :]\n", + " t_batch = t[i * batch_size :, :]\n", + " else:\n", + " X_batch = X[i * batch_size : (i + 1) * batch_size, :]\n", + " t_batch = t[i * batch_size : (i + 1) * batch_size, :]\n", + "\n", + " self._feedforward(X_batch)\n", + " self._backpropagate(X_batch, t_batch, lam)\n", + "\n", + " # reset schedulers for each epoch (some schedulers pass in this call)\n", + " for scheduler in self.schedulers_weight:\n", + " scheduler.reset()\n", + "\n", + " for scheduler in self.schedulers_bias:\n", + " scheduler.reset()\n", + "\n", + " # computing performance metrics\n", + " pred_train = self.predict(X)\n", + " train_error = cost_function_train(pred_train)\n", + "\n", + " train_errors[e] = train_error\n", + " if val_set:\n", + " \n", + " pred_val = self.predict(X_val)\n", + " val_error = cost_function_val(pred_val)\n", + " val_errors[e] = val_error\n", + "\n", + " if self.classification:\n", + " train_acc = self._accuracy(self.predict(X), t)\n", + " train_accs[e] = train_acc\n", + " if val_set:\n", + " val_acc = self._accuracy(pred_val, t_val)\n", + " val_accs[e] = val_acc\n", + "\n", + " # printing progress bar\n", + " progression = e / epochs\n", + " print_length = self._progress_bar(\n", + " progression,\n", + " train_error=train_errors[e],\n", + " train_acc=train_accs[e],\n", + " val_error=val_errors[e],\n", + " val_acc=val_accs[e],\n", + " )\n", + " except KeyboardInterrupt:\n", + " # allows for stopping training at any point and seeing the result\n", + " pass\n", + "\n", + " # visualization of training progression (similiar to tensorflow progression bar)\n", + " sys.stdout.write(\"\\r\" + \" \" * print_length)\n", + " sys.stdout.flush()\n", + " self._progress_bar(\n", + " 1,\n", + " train_error=train_errors[e],\n", + " 
train_acc=train_accs[e],\n", + " val_error=val_errors[e],\n", + " val_acc=val_accs[e],\n", + " )\n", + " sys.stdout.write(\"\")\n", + "\n", + " # return performance metrics for the entire run\n", + " scores = dict()\n", + "\n", + " scores[\"train_errors\"] = train_errors\n", + "\n", + " if val_set:\n", + " scores[\"val_errors\"] = val_errors\n", + "\n", + " if self.classification:\n", + " scores[\"train_accs\"] = train_accs\n", + "\n", + " if val_set:\n", + " scores[\"val_accs\"] = val_accs\n", + "\n", + " return scores\n", + "\n", + " def predict(self, X: np.ndarray, *, threshold=0.5):\n", + " \"\"\"\n", + " Description:\n", + " ------------\n", + " Performs prediction after training of the network has been finished.\n", + "\n", + " Parameters:\n", + " ------------\n", + " I X (np.ndarray): The design matrix, with n rows of p features each\n", + "\n", + " Optional Parameters:\n", + " ------------\n", + " II threshold (float) : sets minimal value for a prediction to be predicted as the positive class\n", + " in classification problems\n", + "\n", + " Returns:\n", + " ------------\n", + " I z (np.ndarray): A prediction vector (row) for each row in our design matrix\n", + " This vector is thresholded if regression=False, meaning that classification results\n", + " in a vector of 1s and 0s, while regressions in an array of decimal numbers\n", + "\n", + " \"\"\"\n", + "\n", + " predict = self._feedforward(X)\n", + "\n", + " if self.classification:\n", + " return np.where(predict > threshold, 1, 0)\n", + " else:\n", + " return predict\n", + "\n", + " def reset_weights(self):\n", + " \"\"\"\n", + " Description:\n", + " ------------\n", + " Resets/Reinitializes the weights in order to train the network for a new problem.\n", + "\n", + " \"\"\"\n", + " if self.seed is not None:\n", + " np.random.seed(self.seed)\n", + "\n", + " self.weights = list()\n", + " for i in range(len(self.dimensions) - 1):\n", + " weight_array = np.random.randn(\n", + " self.dimensions[i] + 1, self.dimensions[i + 1]\n", + " )\n", + " weight_array[0, :] = np.random.randn(self.dimensions[i + 1]) * 0.01\n", + "\n", + " self.weights.append(weight_array)\n", + "\n", + " def _feedforward(self, X: np.ndarray):\n", + " \"\"\"\n", + " Description:\n", + " ------------\n", + " Calculates the activation of each layer starting at the input and ending at the output.\n", + " Each following activation is calculated from a weighted sum of each of the preceeding\n", + " activations (except in the case of the input layer).\n", + "\n", + " Parameters:\n", + " ------------\n", + " I X (np.ndarray): The design matrix, with n rows of p features each\n", + "\n", + " Returns:\n", + " ------------\n", + " I z (np.ndarray): A prediction vector (row) for each row in our design matrix\n", + " \"\"\"\n", + "\n", + " # reset matrices\n", + " self.a_matrices = list()\n", + " self.z_matrices = list()\n", + "\n", + " # if X is just a vector, make it into a matrix\n", + " if len(X.shape) == 1:\n", + " X = X.reshape((1, X.shape[0]))\n", + "\n", + " # Add a coloumn of zeros as the first coloumn of the design matrix, in order\n", + " # to add bias to our data\n", + " bias = np.ones((X.shape[0], 1)) * 0.01\n", + " X = np.hstack([bias, X])\n", + "\n", + " # a^0, the nodes in the input layer (one a^0 for each row in X - where the\n", + " # exponent indicates layer number).\n", + " a = X\n", + " self.a_matrices.append(a)\n", + " self.z_matrices.append(a)\n", + "\n", + " # The feed forward algorithm\n", + " for i in range(len(self.weights)):\n", + " if i < 
len(self.weights) - 1:\n", + " z = a @ self.weights[i]\n", + " self.z_matrices.append(z)\n", + " a = self.hidden_func(z)\n", + " # bias column again added to the data here\n", + " bias = np.ones((a.shape[0], 1)) * 0.01\n", + " a = np.hstack([bias, a])\n", + " self.a_matrices.append(a)\n", + " else:\n", + " try:\n", + " # a^L, the nodes in our output layers\n", + " z = a @ self.weights[i]\n", + " a = self.output_func(z)\n", + " self.a_matrices.append(a)\n", + " self.z_matrices.append(z)\n", + " except Exception as OverflowError:\n", + " print(\n", + " \"OverflowError in fit() in FFNN\\nHOW TO DEBUG ERROR: Consider lowering your learning rate or scheduler specific parameters such as momentum, or check if your input values need scaling\"\n", + " )\n", + "\n", + " # this will be a^L\n", + " return a\n", + "\n", + " def _backpropagate(self, X, t, lam):\n", + " \"\"\"\n", + " Description:\n", + " ------------\n", + " Performs the backpropagation algorithm. In other words, this method\n", + " calculates the gradient of all the layers starting at the\n", + " output layer, and moving from right to left accumulates the gradient until\n", + " the input layer is reached. Each layers respective weights are updated while\n", + " the algorithm propagates backwards from the output layer (auto-differentation in reverse mode).\n", + "\n", + " Parameters:\n", + " ------------\n", + " I X (np.ndarray): The design matrix, with n rows of p features each.\n", + " II t (np.ndarray): The target vector, with n rows of p targets.\n", + " III lam (float32): regularization parameter used to punish the weights in case of overfitting\n", + "\n", + " Returns:\n", + " ------------\n", + " No return value.\n", + "\n", + " \"\"\"\n", + " out_derivative = derivate(self.output_func)\n", + " hidden_derivative = derivate(self.hidden_func)\n", + "\n", + " for i in range(len(self.weights) - 1, -1, -1):\n", + " # delta terms for output\n", + " if i == len(self.weights) - 1:\n", + " # for multi-class classification\n", + " if (\n", + " self.output_func.__name__ == \"softmax\"\n", + " ):\n", + " delta_matrix = self.a_matrices[i + 1] - t\n", + " # for single class classification\n", + " else:\n", + " cost_func_derivative = grad(self.cost_func(t))\n", + " delta_matrix = out_derivative(\n", + " self.z_matrices[i + 1]\n", + " ) * cost_func_derivative(self.a_matrices[i + 1])\n", + "\n", + " # delta terms for hidden layer\n", + " else:\n", + " delta_matrix = (\n", + " self.weights[i + 1][1:, :] @ delta_matrix.T\n", + " ).T * hidden_derivative(self.z_matrices[i + 1])\n", + "\n", + " # calculate gradient\n", + " gradient_weights = self.a_matrices[i][:, 1:].T @ delta_matrix\n", + " gradient_bias = np.sum(delta_matrix, axis=0).reshape(\n", + " 1, delta_matrix.shape[1]\n", + " )\n", + "\n", + " # regularization term\n", + " gradient_weights += self.weights[i][1:, :] * lam\n", + "\n", + " # use scheduler\n", + " update_matrix = np.vstack(\n", + " [\n", + " self.schedulers_bias[i].update_change(gradient_bias),\n", + " self.schedulers_weight[i].update_change(gradient_weights),\n", + " ]\n", + " )\n", + "\n", + " # update weights and bias\n", + " self.weights[i] -= update_matrix\n", + "\n", + " def _accuracy(self, prediction: np.ndarray, target: np.ndarray):\n", + " \"\"\"\n", + " Description:\n", + " ------------\n", + " Calculates accuracy of given prediction to target\n", + "\n", + " Parameters:\n", + " ------------\n", + " I prediction (np.ndarray): vector of predicitons output network\n", + " (1s and 0s in case of classification, and real 
numbers in case of regression)\n", + " II target (np.ndarray): vector of true values (What the network ideally should predict)\n", + "\n", + " Returns:\n", + " ------------\n", + " A floating point number representing the percentage of correctly classified instances.\n", + " \"\"\"\n", + " assert prediction.size == target.size\n", + " return np.average((target == prediction))\n", + " def _set_classification(self):\n", + " \"\"\"\n", + " Description:\n", + " ------------\n", + " Decides if FFNN acts as classifier (True) og regressor (False),\n", + " sets self.classification during init()\n", + " \"\"\"\n", + " self.classification = False\n", + " if (\n", + " self.cost_func.__name__ == \"CostLogReg\"\n", + " or self.cost_func.__name__ == \"CostCrossEntropy\"\n", + " ):\n", + " self.classification = True\n", + "\n", + " def _progress_bar(self, progression, **kwargs):\n", + " \"\"\"\n", + " Description:\n", + " ------------\n", + " Displays progress of training\n", + " \"\"\"\n", + " print_length = 40\n", + " num_equals = int(progression * print_length)\n", + " num_not = print_length - num_equals\n", + " arrow = \">\" if num_equals > 0 else \"\"\n", + " bar = \"[\" + \"=\" * (num_equals - 1) + arrow + \"-\" * num_not + \"]\"\n", + " perc_print = self._format(progression * 100, decimals=5)\n", + " line = f\" {bar} {perc_print}% \"\n", + "\n", + " for key in kwargs:\n", + " if not np.isnan(kwargs[key]):\n", + " value = self._format(kwargs[key], decimals=4)\n", + " line += f\"| {key}: {value} \"\n", + " sys.stdout.write(\"\\r\" + line)\n", + " sys.stdout.flush()\n", + " return len(line)\n", + "\n", + " def _format(self, value, decimals=4):\n", + " \"\"\"\n", + " Description:\n", + " ------------\n", + " Formats decimal numbers for progress bar\n", + " \"\"\"\n", + " if value > 0:\n", + " v = value\n", + " elif value < 0:\n", + " v = -10 * value\n", + " else:\n", + " v = 1\n", + " n = 1 + math.floor(math.log10(v))\n", + " if n >= decimals - 1:\n", + " return str(round(value))\n", + " return f\"{value:.{decimals-n-1}f}\"" + ] + }, + { + "cell_type": "markdown", + "id": "9596ae53", + "metadata": { + "editable": true + }, + "source": [ + "Before we make a model, we will quickly generate a dataset we can use\n", + "for our linear regression problem as shown below" + ] + }, + { + "cell_type": "code", + "execution_count": 31, + "id": "a11f680f", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import autograd.numpy as np\n", + "from sklearn.model_selection import train_test_split\n", + "\n", + "def SkrankeFunction(x, y):\n", + " return np.ravel(0 + 1*x + 2*y + 3*x**2 + 4*x*y + 5*y**2)\n", + "\n", + "def create_X(x, y, n):\n", + " if len(x.shape) > 1:\n", + " x = np.ravel(x)\n", + " y = np.ravel(y)\n", + "\n", + " N = len(x)\n", + " l = int((n + 1) * (n + 2) / 2) # Number of elements in beta\n", + " X = np.ones((N, l))\n", + "\n", + " for i in range(1, n + 1):\n", + " q = int((i) * (i + 1) / 2)\n", + " for k in range(i + 1):\n", + " X[:, q + k] = (x ** (i - k)) * (y**k)\n", + "\n", + " return X\n", + "\n", + "step=0.5\n", + "x = np.arange(0, 1, step)\n", + "y = np.arange(0, 1, step)\n", + "x, y = np.meshgrid(x, y)\n", + "target = SkrankeFunction(x, y)\n", + "target = target.reshape(target.shape[0], 1)\n", + "\n", + "poly_degree=3\n", + "X = create_X(x, y, poly_degree)\n", + "\n", + "X_train, X_test, t_train, t_test = train_test_split(X, target)" + ] + }, + { + "cell_type": "markdown", + "id": "0fc39e40", + "metadata": { + "editable": true + }, + "source": [ 
+ "Now that we have our dataset ready for the regression, we can create\n", + "our regressor. Note that with the seed parameter, we can make sure our\n", + "results stay the same every time we run the neural network. For\n", + "inititialization, we simply specify the dimensions (we wish the amount\n", + "of input nodes to be equal to the datapoints, and the output to\n", + "predict one value)." + ] + }, + { + "cell_type": "code", + "execution_count": 32, + "id": "a67ab3a0", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "input_nodes = X_train.shape[1]\n", + "output_nodes = 1\n", + "\n", + "linear_regression = FFNN((input_nodes, output_nodes), output_func=identity, cost_func=CostOLS, seed=2023)" + ] + }, + { + "cell_type": "markdown", + "id": "3add8665", + "metadata": { + "editable": true + }, + "source": [ + "We then fit our model with our training data using the scheduler of our choice." + ] + }, + { + "cell_type": "code", + "execution_count": 33, + "id": "4a4fbc7a", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "linear_regression.reset_weights() # reset weights such that previous runs or reruns don't affect the weights\n", + "\n", + "scheduler = Constant(eta=1e-3)\n", + "scores = linear_regression.fit(X_train, t_train, scheduler)" + ] + }, + { + "cell_type": "markdown", + "id": "4dff1871", + "metadata": { + "editable": true + }, + "source": [ + "Due to the progress bar we can see the MSE (train_error) throughout\n", + "the FFNN's training. Note that the fit() function has some optional\n", + "parameters with defualt arguments. For example, the regularization\n", + "hyperparameter can be left ignored if not needed, and equally the FFNN\n", + "will by default run for 100 epochs. These can easily be changed, such\n", + "as for example:" + ] + }, + { + "cell_type": "code", + "execution_count": 34, + "id": "ad40e38c", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "linear_regression.reset_weights() # reset weights such that previous runs or reruns don't affect the weights\n", + "\n", + "scores = linear_regression.fit(X_train, t_train, scheduler, lam=1e-4, epochs=1000)" + ] + }, + { + "cell_type": "markdown", + "id": "43cd1e22", + "metadata": { + "editable": true + }, + "source": [ + "We see that given more epochs to train on, the regressor reaches a lower MSE.\n", + "\n", + "Let us then switch to a binary classification. We use a binary\n", + "classification dataset, and follow a similar setup to the regression\n", + "case." 
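+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "9d8c7b6a",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "Before loading the classification data, it can be instructive to also evaluate the regressor trained above on the held-out test split. The short sketch below simply reuses the X_test and t_test arrays and the CostOLS cost function defined earlier; the exact MSE value will depend on the scheduler and the number of epochs used in the last call to fit()."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "0f1e2d3c",
+   "metadata": {
+    "collapsed": false,
+    "editable": true
+   },
+   "outputs": [],
+   "source": [
+    "# evaluate the trained regressor on the held-out test data\n",
+    "pred_test = linear_regression.predict(X_test)\n",
+    "test_mse = CostOLS(t_test)(pred_test)\n",
+    "print(f\"Test MSE: {test_mse:.4f}\")"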
+ ] + }, + { + "cell_type": "code", + "execution_count": 35, + "id": "cde36b38", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "from sklearn.datasets import load_breast_cancer\n", + "from sklearn.preprocessing import MinMaxScaler\n", + "\n", + "wisconsin = load_breast_cancer()\n", + "X = wisconsin.data\n", + "target = wisconsin.target\n", + "target = target.reshape(target.shape[0], 1)\n", + "\n", + "X_train, X_val, t_train, t_val = train_test_split(X, target)\n", + "\n", + "scaler = MinMaxScaler()\n", + "scaler.fit(X_train)\n", + "X_train = scaler.transform(X_train)\n", + "X_val = scaler.transform(X_val)" + ] + }, + { + "cell_type": "code", + "execution_count": 36, + "id": "2bc572a4", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "input_nodes = X_train.shape[1]\n", + "output_nodes = 1\n", + "\n", + "logistic_regression = FFNN((input_nodes, output_nodes), output_func=sigmoid, cost_func=CostLogReg, seed=2023)" + ] + }, + { + "cell_type": "markdown", + "id": "e3e6fa31", + "metadata": { + "editable": true + }, + "source": [ + "We will now make use of our validation data by passing it into our fit function as a keyword argument" + ] + }, + { + "cell_type": "code", + "execution_count": 37, + "id": "575ceb29", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "logistic_regression.reset_weights() # reset weights such that previous runs or reruns don't affect the weights\n", + "\n", + "scheduler = Adam(eta=1e-3, rho=0.9, rho2=0.999)\n", + "scores = logistic_regression.fit(X_train, t_train, scheduler, epochs=1000, X_val=X_val, t_val=t_val)" + ] + }, + { + "cell_type": "markdown", + "id": "622015f0", + "metadata": { + "editable": true + }, + "source": [ + "Finally, we will create a neural network with 2 hidden layers with activation functions." + ] + }, + { + "cell_type": "code", + "execution_count": 38, + "id": "9c075b36", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "input_nodes = X_train.shape[1]\n", + "hidden_nodes1 = 100\n", + "hidden_nodes2 = 30\n", + "output_nodes = 1\n", + "\n", + "dims = (input_nodes, hidden_nodes1, hidden_nodes2, output_nodes)\n", + "\n", + "neural_network = FFNN(dims, hidden_func=RELU, output_func=sigmoid, cost_func=CostLogReg, seed=2023)" + ] + }, + { + "cell_type": "code", + "execution_count": 39, + "id": "44ded771", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "neural_network.reset_weights() # reset weights such that previous runs or reruns don't affect the weights\n", + "\n", + "scheduler = Adam(eta=1e-4, rho=0.9, rho2=0.999)\n", + "scores = neural_network.fit(X_train, t_train, scheduler, epochs=1000, X_val=X_val, t_val=t_val)" + ] + }, + { + "cell_type": "markdown", + "id": "317e6e5c", + "metadata": { + "editable": true + }, + "source": [ + "### Multiclass classification\n", + "\n", + "Finally, we will demonstrate the use case of multiclass classification\n", + "using our FFNN with the famous MNIST dataset, which contain images of\n", + "digits between the range of 0 to 9." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 40, + "id": "8911de9d", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "from sklearn.datasets import load_digits\n", + "\n", + "def onehot(target: np.ndarray):\n", + " onehot = np.zeros((target.size, target.max() + 1))\n", + " onehot[np.arange(target.size), target] = 1\n", + " return onehot\n", + "\n", + "digits = load_digits()\n", + "\n", + "X = digits.data\n", + "target = digits.target\n", + "target = onehot(target)\n", + "\n", + "input_nodes = 64\n", + "hidden_nodes1 = 100\n", + "hidden_nodes2 = 30\n", + "output_nodes = 10\n", + "\n", + "dims = (input_nodes, hidden_nodes1, hidden_nodes2, output_nodes)\n", + "\n", + "multiclass = FFNN(dims, hidden_func=LRELU, output_func=softmax, cost_func=CostCrossEntropy)\n", + "\n", + "multiclass.reset_weights() # reset weights such that previous runs or reruns don't affect the weights\n", + "\n", + "scheduler = Adam(eta=1e-4, rho=0.9, rho2=0.999)\n", + "scores = multiclass.fit(X, target, scheduler, epochs=1000)" + ] + }, + { + "cell_type": "markdown", + "id": "82d61377", + "metadata": { + "editable": true + }, + "source": [ + "## Testing the XOR gate and other gates\n", + "\n", + "Let us now use our code to test the XOR gate." + ] + }, + { + "cell_type": "code", + "execution_count": 41, + "id": "2a72a374", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "X = np.array([ [0, 0], [0, 1], [1, 0],[1, 1]],dtype=np.float64)\n", + "\n", + "# The XOR gate\n", + "yXOR = np.array( [[ 0], [1] ,[1], [0]])\n", + "\n", + "input_nodes = X.shape[1]\n", + "output_nodes = 1\n", + "\n", + "logistic_regression = FFNN((input_nodes, output_nodes), output_func=sigmoid, cost_func=CostLogReg, seed=2023)\n", + "logistic_regression.reset_weights() # reset weights such that previous runs or reruns don't affect the weights\n", + "scheduler = Adam(eta=1e-1, rho=0.9, rho2=0.999)\n", + "scores = logistic_regression.fit(X, yXOR, scheduler, epochs=1000)" + ] + }, + { + "cell_type": "markdown", + "id": "2d892009", + "metadata": { + "editable": true + }, + "source": [ + "Not bad, but the results depend strongly on the learning reate. Try different learning rates." + ] + } + ], + "metadata": {}, + "nbformat": 4, + "nbformat_minor": 5 +} \ No newline at end of file diff --git a/doc/LectureNotes/_build/jupyter_execute/week43.ipynb b/doc/LectureNotes/_build/jupyter_execute/week43.ipynb new file mode 100644 index 000000000..f51a39c64 --- /dev/null +++ b/doc/LectureNotes/_build/jupyter_execute/week43.ipynb @@ -0,0 +1,5950 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "5e07edf2", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "44b465a0", + "metadata": { + "editable": true + }, + "source": [ + "# Week 43: Deep Learning: Constructing a Neural Network code and solving differential equations\n", + "**Morten Hjorth-Jensen**, Department of Physics, University of Oslo, Norway\n", + "\n", + "Date: **October 20, 2025**" + ] + }, + { + "cell_type": "markdown", + "id": "9d7bd8c9", + "metadata": { + "editable": true + }, + "source": [ + "## Plans for week 43\n", + "\n", + "**Material for the lecture on Monday October 20, 2025.**\n", + "\n", + "1. Reminder from last week, see also lecture notes from week 42 at as well as those from week 41, see see . \n", + "\n", + "2. Building our own Feed-forward Neural Network.\n", + "\n", + "3. 
Coding examples using Tensorflow/Keras and Pytorch examples. The Pytorch examples are adapted from Rashcka's text, see chapters 11-13.. \n", + "\n", + "4. Start discussions on how to use neural networks for solving differential equations (ordinary and partial ones). This topic continues next week as well.\n", + "\n", + "5. Video of lecture at \n", + "\n", + "6. Whiteboard notes at " + ] + }, + { + "cell_type": "markdown", + "id": "c50cff0f", + "metadata": { + "editable": true + }, + "source": [ + "## Exercises and lab session week 43\n", + "**Lab sessions on Tuesday and Wednesday.**\n", + "\n", + "1. Work on writing your own neural network code and discussions of project 2. If you didn't get time to do the exercises from the two last weeks, we recommend doing so as these exercises give you the basic elements of a neural network code.\n", + "\n", + "2. The exercises this week are tailored to the optional part of project 2, and deal with studying ways to display results from classification problems" + ] + }, + { + "cell_type": "markdown", + "id": "fe8d32ed", + "metadata": { + "editable": true + }, + "source": [ + "## Using Automatic differentiation\n", + "\n", + "In our discussions of ordinary differential equations and neural network codes\n", + "we will also study the usage of Autograd, see for example in computing gradients for deep learning. For the documentation of Autograd and examples see the Autograd documentation at and the lecture slides from week 41, see ." + ] + }, + { + "cell_type": "markdown", + "id": "99999ab4", + "metadata": { + "editable": true + }, + "source": [ + "## Back propagation and automatic differentiation\n", + "\n", + "For more details on the back propagation algorithm and automatic differentiation see\n", + "1. \n", + "\n", + "2. \n", + "\n", + "3. Slides 12-44 at " + ] + }, + { + "cell_type": "markdown", + "id": "b4489372", + "metadata": { + "editable": true + }, + "source": [ + "## Lecture Monday October 20" + ] + }, + { + "cell_type": "markdown", + "id": "f7435e4a", + "metadata": { + "editable": true + }, + "source": [ + "## Setting up the back propagation algorithm and algorithm for a feed forward NN, initalizations\n", + "This is a reminder from last week.\n", + "\n", + "**The architecture (our model).**\n", + "\n", + "1. Set up your inputs and outputs (scalars, vectors, matrices or higher-order arrays)\n", + "\n", + "2. Define the number of hidden layers and hidden nodes\n", + "\n", + "3. Define activation functions for hidden layers and output layers\n", + "\n", + "4. Define optimizer (plan learning rate, momentum, ADAgrad, RMSprop, ADAM etc) and array of initial learning rates\n", + "\n", + "5. Define cost function and possible regularization terms with hyperparameters\n", + "\n", + "6. Initialize weights and biases\n", + "\n", + "7. 
Fix number of iterations for the feed forward part and back propagation part" + ] + }, + { + "cell_type": "markdown", + "id": "e2561576", + "metadata": { + "editable": true + }, + "source": [ + "## Setting up the back propagation algorithm, part 1\n", + "\n", + "Let us write this out in the form of an algorithm.\n", + "\n", + "**First**, we set up the input data $\\boldsymbol{x}$ and the activations\n", + "$\\boldsymbol{z}_1$ of the input layer and compute the activation function and\n", + "the pertinent outputs $\\boldsymbol{a}^1$.\n", + "\n", + "**Secondly**, we perform then the feed forward till we reach the output\n", + "layer and compute all $\\boldsymbol{z}_l$ of the input layer and compute the\n", + "activation function and the pertinent outputs $\\boldsymbol{a}^l$ for\n", + "$l=1,2,3,\\dots,L$.\n", + "\n", + "**Notation**: The first hidden layer has $l=1$ as label and the final output layer has $l=L$." + ] + }, + { + "cell_type": "markdown", + "id": "39ed46ed", + "metadata": { + "editable": true + }, + "source": [ + "## Setting up the back propagation algorithm, part 2\n", + "\n", + "Thereafter we compute the ouput error $\\boldsymbol{\\delta}^L$ by computing all" + ] + }, + { + "cell_type": "markdown", + "id": "776b50ac", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_j^L = \\sigma'(z_j^L)\\frac{\\partial {\\cal C}}{\\partial (a_j^L)}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b0ad385d", + "metadata": { + "editable": true + }, + "source": [ + "Then we compute the back propagate error for each $l=L-1,L-2,\\dots,1$ as" + ] + }, + { + "cell_type": "markdown", + "id": "bb592830", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_j^l = \\sum_k \\delta_k^{l+1}w_{kj}^{l+1}\\sigma'(z_j^l).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "41259526", + "metadata": { + "editable": true + }, + "source": [ + "## Setting up the Back propagation algorithm, part 3\n", + "\n", + "Finally, we update the weights and the biases using gradient descent\n", + "for each $l=L-1,L-2,\\dots,1$ (the first hidden layer) and update the weights and biases\n", + "according to the rules" + ] + }, + { + "cell_type": "markdown", + "id": "47eaff91", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "w_{ij}^l\\leftarrow = w_{ij}^l- \\eta \\delta_j^la_i^{l-1},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "05b74533", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "b_j^l \\leftarrow b_j^l-\\eta \\frac{\\partial {\\cal C}}{\\partial b_j^l}=b_j^l-\\eta \\delta_j^l,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "6edb8648", + "metadata": { + "editable": true + }, + "source": [ + "with $\\eta$ being the learning rate." 
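To make the three steps of the algorithm above concrete, here is a minimal NumPy sketch of a single backpropagation pass for a network with one hidden layer, assuming sigmoid activations in both layers and a squared-error cost. The array names and sizes are illustrative only and are not part of the FFNN class developed later in these notes.

```python
# Minimal sketch of one backpropagation pass, assuming sigmoid activations
# and the squared-error cost C = (a^L - t)^2. Names and sizes are illustrative.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)

rng = np.random.default_rng(2023)
x = rng.normal(size=(1, 3))          # one sample with three features
t = np.array([[1.0]])                # target
W1, b1 = rng.normal(size=(3, 4)), np.zeros((1, 4))   # input -> hidden
W2, b2 = rng.normal(size=(4, 1)), np.zeros((1, 1))   # hidden -> output
eta = 0.1                            # learning rate

# feed forward
z1 = x @ W1 + b1; a1 = sigmoid(z1)
z2 = a1 @ W2 + b2; a2 = sigmoid(z2)

# output error: delta^L = sigma'(z^L) * dC/da^L
delta2 = sigmoid_prime(z2) * 2.0 * (a2 - t)
# back propagated error: delta^l = (delta^{l+1} (W^{l+1})^T) * sigma'(z^l)
delta1 = (delta2 @ W2.T) * sigmoid_prime(z1)

# gradient descent updates: W <- W - eta * a_prev^T delta, b <- b - eta * sum(delta)
W2 -= eta * a1.T @ delta2; b2 -= eta * delta2.sum(axis=0)
W1 -= eta * x.T @ delta1;  b1 -= eta * delta1.sum(axis=0)
```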
+ ] + }, + { + "cell_type": "markdown", + "id": "a663fc08", + "metadata": { + "editable": true + }, + "source": [ + "## Updating the gradients\n", + "\n", + "With the back propagate error for each $l=L-1,L-2,\\dots,1$ as" + ] + }, + { + "cell_type": "markdown", + "id": "479150e0", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_j^l = \\sum_k \\delta_k^{l+1}w_{kj}^{l+1}\\sigma'(z_j^l),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "41b9b1ea", + "metadata": { + "editable": true + }, + "source": [ + "we update the weights and the biases using gradient descent for each $l=L-1,L-2,\\dots,1$ and update the weights and biases according to the rules" + ] + }, + { + "cell_type": "markdown", + "id": "590c403a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "w_{ij}^l\\leftarrow = w_{ij}^l- \\eta \\delta_j^la_i^{l-1},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "3db8cbb4", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "b_j^l \\leftarrow b_j^l-\\eta \\frac{\\partial {\\cal C}}{\\partial b_j^l}=b_j^l-\\eta \\delta_j^l,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "a204182a", + "metadata": { + "editable": true + }, + "source": [ + "## Activation functions\n", + "\n", + "A property that characterizes a neural network, other than its\n", + "connectivity, is the choice of activation function(s). The following\n", + "restrictions are imposed on an activation function for an FFNN to\n", + "fulfill the universal approximation theorem\n", + "\n", + " * Non-constant\n", + "\n", + " * Bounded\n", + "\n", + " * Monotonically-increasing\n", + "\n", + " * Continuous" + ] + }, + { + "cell_type": "markdown", + "id": "4fe58cce", + "metadata": { + "editable": true + }, + "source": [ + "### Activation functions, examples\n", + "\n", + "Typical examples are the logistic *Sigmoid*" + ] + }, + { + "cell_type": "markdown", + "id": "a14f6d08", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\sigma(x) = \\frac{1}{1 + e^{-x}},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "4c290410", + "metadata": { + "editable": true + }, + "source": [ + "and the *hyperbolic tangent* function" + ] + }, + { + "cell_type": "markdown", + "id": "ca1ac514", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\sigma(x) = \\tanh(x)\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b9bcfab3", + "metadata": { + "editable": true + }, + "source": [ + "## The RELU function family\n", + "\n", + "The ReLU activation function suffers from a problem known as the dying\n", + "ReLUs: during training, some neurons effectively die, meaning they\n", + "stop outputting anything other than 0.\n", + "\n", + "In some cases, you may find that half of your network’s neurons are\n", + "dead, especially if you used a large learning rate. During training,\n", + "if a neuron’s weights get updated such that the weighted sum of the\n", + "neuron’s inputs is negative, it will start outputting 0. When this\n", + "happen, the neuron is unlikely to come back to life since the gradient\n", + "of the ReLU function is 0 when its input is negative." 
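As a small illustration of the dying-ReLU problem, the sketch below compares plain ReLU with the leaky ReLU referred to in the next section; the slope alpha = 0.01 is just an assumed default for the example, not a prescribed value.

```python
# Compare ReLU and leaky ReLU on negative inputs: ReLU clamps them to zero
# (and has zero gradient there), while leaky ReLU keeps a small slope alpha.
import numpy as np

def relu(z):
    return np.where(z > 0, z, 0.0)

def leaky_relu(z, alpha=0.01):
    return np.where(z > 0, z, alpha * z)

def relu_grad(z):
    return np.where(z > 0, 1.0, 0.0)

def leaky_relu_grad(z, alpha=0.01):
    return np.where(z > 0, 1.0, alpha)

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print("ReLU:            ", relu(z))
print("ReLU gradient:   ", relu_grad(z))        # zero for negative z, so a dead unit cannot recover
print("Leaky ReLU:      ", leaky_relu(z))
print("Leaky ReLU grad: ", leaky_relu_grad(z))  # small but nonzero for negative z
```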
+ ] + }, + { + "cell_type": "markdown", + "id": "2fdf56f7", + "metadata": { + "editable": true + }, + "source": [ + "## ELU function\n", + "\n", + "To solve this problem, nowadays practitioners use a variant of the\n", + "ReLU function, such as the leaky ReLU discussed above or the so-called\n", + "exponential linear unit (ELU) function" + ] + }, + { + "cell_type": "markdown", + "id": "14bf193c", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "ELU(z) = \\left\\{\\begin{array}{cc} \\alpha\\left( \\exp{(z)}-1\\right) & z < 0,\\\\ z & z \\ge 0.\\end{array}\\right.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "df29068f", + "metadata": { + "editable": true + }, + "source": [ + "## Which activation function should we use?\n", + "\n", + "In general it seems that the ELU activation function is better than\n", + "the leaky ReLU function (and its variants), which is better than\n", + "ReLU. ReLU performs better than $\\tanh$ which in turn performs better\n", + "than the logistic function.\n", + "\n", + "If runtime performance is an issue, then you may opt for the leaky\n", + "ReLU function over the ELU function If you don’t want to tweak yet\n", + "another hyperparameter, you may just use the default $\\alpha$ of\n", + "$0.01$ for the leaky ReLU, and $1$ for ELU. If you have spare time and\n", + "computing power, you can use cross-validation or bootstrap to evaluate\n", + "other activation functions." + ] + }, + { + "cell_type": "markdown", + "id": "2fb5a29e", + "metadata": { + "editable": true + }, + "source": [ + "## More on activation functions, output layers\n", + "\n", + "In most cases you can use the ReLU activation function in the hidden\n", + "layers (or one of its variants).\n", + "\n", + "It is a bit faster to compute than other activation functions, and the\n", + "gradient descent optimization does in general not get stuck.\n", + "\n", + "**For the output layer:**\n", + "\n", + "* For classification the softmax activation function is generally a good choice for classification tasks (when the classes are mutually exclusive).\n", + "\n", + "* For regression tasks, you can simply use no activation function at all." + ] + }, + { + "cell_type": "markdown", + "id": "bab79791", + "metadata": { + "editable": true + }, + "source": [ + "## Building neural networks in Tensorflow and Keras\n", + "\n", + "Now we want to build on the experience gained from our neural network implementation in NumPy and scikit-learn\n", + "and use it to construct a neural network in Tensorflow. Once we have constructed a neural network in NumPy\n", + "and Tensorflow, building one in Keras is really quite trivial, though the performance may suffer. \n", + "\n", + "In our previous example we used only one hidden layer, and in this we will use two. From this it should be quite\n", + "clear how to build one using an arbitrary number of hidden layers, using data structures such as Python lists or\n", + "NumPy arrays." + ] + }, + { + "cell_type": "markdown", + "id": "cc32bc9d", + "metadata": { + "editable": true + }, + "source": [ + "## Tensorflow\n", + "\n", + "Tensorflow is an open source library machine learning library\n", + "developed by the Google Brain team for internal use. 
It was released\n", + "under the Apache 2.0 open source license in November 9, 2015.\n", + "\n", + "Tensorflow is a computational framework that allows you to construct\n", + "machine learning models at different levels of abstraction, from\n", + "high-level, object-oriented APIs like Keras, down to the C++ kernels\n", + "that Tensorflow is built upon. The higher levels of abstraction are\n", + "simpler to use, but less flexible, and our choice of implementation\n", + "should reflect the problems we are trying to solve.\n", + "\n", + "[Tensorflow uses](https://www.tensorflow.org/guide/graphs) so-called graphs to represent your computation\n", + "in terms of the dependencies between individual operations, such that you first build a Tensorflow *graph*\n", + "to represent your model, and then create a Tensorflow *session* to run the graph.\n", + "\n", + "In this guide we will analyze the same data as we did in our NumPy and\n", + "scikit-learn tutorial, gathered from the MNIST database of images. We\n", + "will give an introduction to the lower level Python Application\n", + "Program Interfaces (APIs), and see how we use them to build our graph.\n", + "Then we will build (effectively) the same graph in Keras, to see just\n", + "how simple solving a machine learning problem can be.\n", + "\n", + "To install tensorflow on Unix/Linux systems, use pip as" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "deb81088", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "pip3 install tensorflow" + ] + }, + { + "cell_type": "markdown", + "id": "979148b0", + "metadata": { + "editable": true + }, + "source": [ + "and/or if you use **anaconda**, just write (or install from the graphical user interface)\n", + "(current release of CPU-only TensorFlow)" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "ad63b8d9", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "conda create -n tf tensorflow\n", + "conda activate tf" + ] + }, + { + "cell_type": "markdown", + "id": "1417a40e", + "metadata": { + "editable": true + }, + "source": [ + "To install the current release of GPU TensorFlow" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "d56acb3a", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "conda create -n tf-gpu tensorflow-gpu\n", + "conda activate tf-gpu" + ] + }, + { + "cell_type": "markdown", + "id": "6a163d27", + "metadata": { + "editable": true + }, + "source": [ + "## Using Keras\n", + "\n", + "Keras is a high level [neural network](https://en.wikipedia.org/wiki/Application_programming_interface)\n", + "that supports Tensorflow, CTNK and Theano as backends. \n", + "If you have Anaconda installed you may run the following command" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "9ee390a8", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "conda install keras" + ] + }, + { + "cell_type": "markdown", + "id": "528ea3d5", + "metadata": { + "editable": true + }, + "source": [ + "You can look up the [instructions here](https://keras.io/) for more information.\n", + "\n", + "We will to a large extent use **keras** in this course." + ] + }, + { + "cell_type": "markdown", + "id": "32178225", + "metadata": { + "editable": true + }, + "source": [ + "## Collect and pre-process data\n", + "\n", + "Let us look again at the MINST data set." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "e37f86e4", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "%matplotlib inline\n", + "\n", + "# import necessary packages\n", + "import numpy as np\n", + "import matplotlib.pyplot as plt\n", + "import tensorflow as tf\n", + "from sklearn import datasets\n", + "\n", + "\n", + "# ensure the same random numbers appear every time\n", + "np.random.seed(0)\n", + "\n", + "# display images in notebook\n", + "%matplotlib inline\n", + "plt.rcParams['figure.figsize'] = (12,12)\n", + "\n", + "\n", + "# download MNIST dataset\n", + "digits = datasets.load_digits()\n", + "\n", + "# define inputs and labels\n", + "inputs = digits.images\n", + "labels = digits.target\n", + "\n", + "print(\"inputs = (n_inputs, pixel_width, pixel_height) = \" + str(inputs.shape))\n", + "print(\"labels = (n_inputs) = \" + str(labels.shape))\n", + "\n", + "\n", + "# flatten the image\n", + "# the value -1 means dimension is inferred from the remaining dimensions: 8x8 = 64\n", + "n_inputs = len(inputs)\n", + "inputs = inputs.reshape(n_inputs, -1)\n", + "print(\"X = (n_inputs, n_features) = \" + str(inputs.shape))\n", + "\n", + "\n", + "# choose some random images to display\n", + "indices = np.arange(n_inputs)\n", + "random_indices = np.random.choice(indices, size=5)\n", + "\n", + "for i, image in enumerate(digits.images[random_indices]):\n", + " plt.subplot(1, 5, i+1)\n", + " plt.axis('off')\n", + " plt.imshow(image, cmap=plt.cm.gray_r, interpolation='nearest')\n", + " plt.title(\"Label: %d\" % digits.target[random_indices[i]])\n", + "plt.show()" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "06a7c3bd", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "from tensorflow.keras.layers import Input\n", + "from tensorflow.keras.models import Sequential #This allows appending layers to existing models\n", + "from tensorflow.keras.layers import Dense #This allows defining the characteristics of a particular layer\n", + "from tensorflow.keras import optimizers #This allows using whichever optimiser we want (sgd,adam,RMSprop)\n", + "from tensorflow.keras import regularizers #This allows using whichever regularizer we want (l1,l2,l1_l2)\n", + "from tensorflow.keras.utils import to_categorical #This allows using categorical cross entropy as the cost function\n", + "\n", + "from sklearn.model_selection import train_test_split\n", + "\n", + "# one-hot representation of labels\n", + "labels = to_categorical(labels)\n", + "\n", + "# split into train and test data\n", + "train_size = 0.8\n", + "test_size = 1 - train_size\n", + "X_train, X_test, Y_train, Y_test = train_test_split(inputs, labels, train_size=train_size,\n", + " test_size=test_size)" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "358b46c5", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "\n", + "epochs = 100\n", + "batch_size = 100\n", + "n_neurons_layer1 = 100\n", + "n_neurons_layer2 = 50\n", + "n_categories = 10\n", + "eta_vals = np.logspace(-5, 1, 7)\n", + "lmbd_vals = np.logspace(-5, 1, 7)\n", + "def create_neural_network_keras(n_neurons_layer1, n_neurons_layer2, n_categories, eta, lmbd):\n", + " model = Sequential()\n", + " model.add(Dense(n_neurons_layer1, activation='sigmoid', kernel_regularizer=regularizers.l2(lmbd)))\n", + " model.add(Dense(n_neurons_layer2, activation='sigmoid', 
kernel_regularizer=regularizers.l2(lmbd)))\n", + " model.add(Dense(n_categories, activation='softmax'))\n", + " \n", + " sgd = optimizers.SGD(learning_rate=eta)\n", + " model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])\n", + " \n", + " return model" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "5a0445fb", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "DNN_keras = np.zeros((len(eta_vals), len(lmbd_vals)), dtype=object)\n", + " \n", + "for i, eta in enumerate(eta_vals):\n", + " for j, lmbd in enumerate(lmbd_vals):\n", + " DNN = create_neural_network_keras(n_neurons_layer1, n_neurons_layer2, n_categories,\n", + " eta=eta, lmbd=lmbd)\n", + " DNN.fit(X_train, Y_train, epochs=epochs, batch_size=batch_size, verbose=0)\n", + " scores = DNN.evaluate(X_test, Y_test)\n", + " \n", + " DNN_keras[i][j] = DNN\n", + " \n", + " print(\"Learning rate = \", eta)\n", + " print(\"Lambda = \", lmbd)\n", + " print(\"Test accuracy: %.3f\" % scores[1])\n", + " print()" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "id": "f301c7cf", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# optional\n", + "# visual representation of grid search\n", + "# uses seaborn heatmap, could probably do this in matplotlib\n", + "import seaborn as sns\n", + "\n", + "sns.set()\n", + "\n", + "train_accuracy = np.zeros((len(eta_vals), len(lmbd_vals)))\n", + "test_accuracy = np.zeros((len(eta_vals), len(lmbd_vals)))\n", + "\n", + "for i in range(len(eta_vals)):\n", + " for j in range(len(lmbd_vals)):\n", + " DNN = DNN_keras[i][j]\n", + "\n", + " train_accuracy[i][j] = DNN.evaluate(X_train, Y_train)[1]\n", + " test_accuracy[i][j] = DNN.evaluate(X_test, Y_test)[1]\n", + "\n", + " \n", + "fig, ax = plt.subplots(figsize = (10, 10))\n", + "sns.heatmap(train_accuracy, annot=True, ax=ax, cmap=\"viridis\")\n", + "ax.set_title(\"Training Accuracy\")\n", + "ax.set_ylabel(\"$\\eta$\")\n", + "ax.set_xlabel(\"$\\lambda$\")\n", + "plt.show()\n", + "\n", + "fig, ax = plt.subplots(figsize = (10, 10))\n", + "sns.heatmap(test_accuracy, annot=True, ax=ax, cmap=\"viridis\")\n", + "ax.set_title(\"Test Accuracy\")\n", + "ax.set_ylabel(\"$\\eta$\")\n", + "ax.set_xlabel(\"$\\lambda$\")\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "610c95e1", + "metadata": { + "editable": true + }, + "source": [ + "## Using Pytorch with the full MNIST data set" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "id": "d0f3ad9a", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import torch\n", + "import torch.nn as nn\n", + "import torch.optim as optim\n", + "import torchvision\n", + "import torchvision.transforms as transforms\n", + "\n", + "# Device configuration: use GPU if available\n", + "device = torch.device(\"cuda\" if torch.cuda.is_available() else \"cpu\")\n", + "\n", + "# MNIST dataset (downloads if not already present)\n", + "transform = transforms.Compose([\n", + " transforms.ToTensor(),\n", + " transforms.Normalize((0.5,), (0.5,)) # normalize to mean=0.5, std=0.5 (approx. 
[-1,1] pixel range)\n", + "])\n", + "train_dataset = torchvision.datasets.MNIST(root='./data', train=True, download=True, transform=transform)\n", + "test_dataset = torchvision.datasets.MNIST(root='./data', train=False, download=True, transform=transform)\n", + "\n", + "train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=64, shuffle=True)\n", + "test_loader = torch.utils.data.DataLoader(test_dataset, batch_size=64, shuffle=False)\n", + "\n", + "\n", + "class NeuralNet(nn.Module):\n", + " def __init__(self):\n", + " super(NeuralNet, self).__init__()\n", + " self.fc1 = nn.Linear(28*28, 100) # first hidden layer (784 -> 100)\n", + " self.fc2 = nn.Linear(100, 100) # second hidden layer (100 -> 100)\n", + " self.fc3 = nn.Linear(100, 10) # output layer (100 -> 10 classes)\n", + " def forward(self, x):\n", + " x = x.view(x.size(0), -1) # flatten images into vectors of size 784\n", + " x = torch.relu(self.fc1(x)) # hidden layer 1 + ReLU activation\n", + " x = torch.relu(self.fc2(x)) # hidden layer 2 + ReLU activation\n", + " x = self.fc3(x) # output layer (logits for 10 classes)\n", + " return x\n", + "\n", + "model = NeuralNet().to(device)\n", + "\n", + "\n", + "criterion = nn.CrossEntropyLoss()\n", + "optimizer = optim.SGD(model.parameters(), lr=0.01, weight_decay=1e-4)\n", + "\n", + "num_epochs = 10\n", + "for epoch in range(num_epochs):\n", + " model.train() # set model to training mode\n", + " running_loss = 0.0\n", + " for images, labels in train_loader:\n", + " # Move data to device (GPU if available, else CPU)\n", + " images, labels = images.to(device), labels.to(device)\n", + "\n", + " optimizer.zero_grad() # reset gradients to zero\n", + " outputs = model(images) # forward pass: compute predictions\n", + " loss = criterion(outputs, labels) # compute cross-entropy loss\n", + " loss.backward() # backpropagate to compute gradients\n", + " optimizer.step() # update weights using SGD step \n", + "\n", + " running_loss += loss.item()\n", + " # Compute average loss over all batches in this epoch\n", + " avg_loss = running_loss / len(train_loader)\n", + " print(f\"Epoch {epoch+1}/{num_epochs}, Loss: {avg_loss:.4f}\")\n", + "\n", + "#Evaluation on the Test Set\n", + "\n", + "\n", + "\n", + "model.eval() # set model to evaluation mode \n", + "correct = 0\n", + "total = 0\n", + "with torch.no_grad(): # disable gradient calculation for evaluation \n", + " for images, labels in test_loader:\n", + " images, labels = images.to(device), labels.to(device)\n", + " outputs = model(images)\n", + " _, predicted = torch.max(outputs, dim=1) # class with highest score\n", + " total += labels.size(0)\n", + " correct += (predicted == labels).sum().item()\n", + "\n", + "accuracy = 100 * correct / total\n", + "print(f\"Test Accuracy: {accuracy:.2f}%\")" + ] + }, + { + "cell_type": "markdown", + "id": "aad687aa", + "metadata": { + "editable": true + }, + "source": [ + "## And a similar example using Tensorflow with Keras" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "id": "b6c4fad4", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "\n", + "import tensorflow as tf\n", + "from tensorflow import keras\n", + "from tensorflow.keras import layers, regularizers\n", + "\n", + "# Check for GPU (TensorFlow will use it automatically if available)\n", + "gpus = tf.config.list_physical_devices('GPU')\n", + "print(f\"GPUs available: {gpus}\")\n", + "\n", + "# 1) Load and preprocess MNIST\n", + "(x_train, y_train), (x_test, y_test) = 
keras.datasets.mnist.load_data()\n", + "# Normalize to [0, 1]\n", + "x_train = (x_train.astype(\"float32\") / 255.0)\n", + "x_test = (x_test.astype(\"float32\") / 255.0)\n", + "\n", + "# 2) Build the model: 784 -> 100 -> 100 -> 10\n", + "l2_reg = 1e-4 # L2 regularization strength\n", + "\n", + "model = keras.Sequential([\n", + " layers.Input(shape=(28, 28)),\n", + " layers.Flatten(),\n", + " layers.Dense(100, activation=\"relu\",\n", + " kernel_regularizer=regularizers.l2(l2_reg)),\n", + " layers.Dense(100, activation=\"relu\",\n", + " kernel_regularizer=regularizers.l2(l2_reg)),\n", + " layers.Dense(10, activation=\"softmax\") # output probabilities for 10 classes\n", + "])\n", + "\n", + "# 3) Compile with SGD + weight decay via L2 regularizers\n", + "model.compile(\n", + " optimizer=keras.optimizers.SGD(learning_rate=0.01),\n", + " loss=\"sparse_categorical_crossentropy\",\n", + " metrics=[\"accuracy\"],\n", + ")\n", + "\n", + "model.summary()\n", + "\n", + "# 4) Train\n", + "history = model.fit(\n", + " x_train, y_train,\n", + " epochs=10,\n", + " batch_size=64,\n", + " validation_split=0.1, # optional: monitor validation during training\n", + " verbose=1\n", + ")\n", + "\n", + "# 5) Evaluate on test set\n", + "test_loss, test_acc = model.evaluate(x_test, y_test, verbose=0)\n", + "print(f\"Test accuracy: {test_acc:.4f}, Test loss: {test_loss:.4f}\")" + ] + }, + { + "cell_type": "markdown", + "id": "73162fbb", + "metadata": { + "editable": true + }, + "source": [ + "## Building our own neural network code\n", + "\n", + "Here we present a flexible object oriented codebase\n", + "for a feed forward neural network, along with a demonstration of how\n", + "to use it. Before we get into the details of the neural network, we\n", + "will first present some implementations of various schedulers, cost\n", + "functions and activation functions that can be used together with the\n", + "neural network.\n", + "\n", + "The codes here were developed by Eric Reber and Gregor Kajda during spring 2023." + ] + }, + { + "cell_type": "markdown", + "id": "86f36041", + "metadata": { + "editable": true + }, + "source": [ + "### Learning rate methods\n", + "\n", + "The code below shows object oriented implementations of the Constant,\n", + "Momentum, Adagrad, AdagradMomentum, RMS prop and Adam schedulers. All\n", + "of the classes belong to the shared abstract Scheduler class, and\n", + "share the update_change() and reset() methods allowing for any of the\n", + "schedulers to be seamlessly used during the training stage, as will\n", + "later be shown in the fit() method of the neural\n", + "network. Update_change() only has one parameter, the gradient\n", + "($δ^l_ja^{l−1}_k$), and returns the change which will be subtracted\n", + "from the weights. The reset() function takes no parameters, and resets\n", + "the desired variables. For Constant and Momentum, reset does nothing." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 12, + "id": "bcbec449", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import autograd.numpy as np\n", + "\n", + "class Scheduler:\n", + " \"\"\"\n", + " Abstract class for Schedulers\n", + " \"\"\"\n", + "\n", + " def __init__(self, eta):\n", + " self.eta = eta\n", + "\n", + " # should be overwritten\n", + " def update_change(self, gradient):\n", + " raise NotImplementedError\n", + "\n", + " # overwritten if needed\n", + " def reset(self):\n", + " pass\n", + "\n", + "\n", + "class Constant(Scheduler):\n", + " def __init__(self, eta):\n", + " super().__init__(eta)\n", + "\n", + " def update_change(self, gradient):\n", + " return self.eta * gradient\n", + " \n", + " def reset(self):\n", + " pass\n", + "\n", + "\n", + "class Momentum(Scheduler):\n", + " def __init__(self, eta: float, momentum: float):\n", + " super().__init__(eta)\n", + " self.momentum = momentum\n", + " self.change = 0\n", + "\n", + " def update_change(self, gradient):\n", + " self.change = self.momentum * self.change + self.eta * gradient\n", + " return self.change\n", + "\n", + " def reset(self):\n", + " pass\n", + "\n", + "\n", + "class Adagrad(Scheduler):\n", + " def __init__(self, eta):\n", + " super().__init__(eta)\n", + " self.G_t = None\n", + "\n", + " def update_change(self, gradient):\n", + " delta = 1e-8 # avoid division ny zero\n", + "\n", + " if self.G_t is None:\n", + " self.G_t = np.zeros((gradient.shape[0], gradient.shape[0]))\n", + "\n", + " self.G_t += gradient @ gradient.T\n", + "\n", + " G_t_inverse = 1 / (\n", + " delta + np.sqrt(np.reshape(np.diagonal(self.G_t), (self.G_t.shape[0], 1)))\n", + " )\n", + " return self.eta * gradient * G_t_inverse\n", + "\n", + " def reset(self):\n", + " self.G_t = None\n", + "\n", + "\n", + "class AdagradMomentum(Scheduler):\n", + " def __init__(self, eta, momentum):\n", + " super().__init__(eta)\n", + " self.G_t = None\n", + " self.momentum = momentum\n", + " self.change = 0\n", + "\n", + " def update_change(self, gradient):\n", + " delta = 1e-8 # avoid division ny zero\n", + "\n", + " if self.G_t is None:\n", + " self.G_t = np.zeros((gradient.shape[0], gradient.shape[0]))\n", + "\n", + " self.G_t += gradient @ gradient.T\n", + "\n", + " G_t_inverse = 1 / (\n", + " delta + np.sqrt(np.reshape(np.diagonal(self.G_t), (self.G_t.shape[0], 1)))\n", + " )\n", + " self.change = self.change * self.momentum + self.eta * gradient * G_t_inverse\n", + " return self.change\n", + "\n", + " def reset(self):\n", + " self.G_t = None\n", + "\n", + "\n", + "class RMS_prop(Scheduler):\n", + " def __init__(self, eta, rho):\n", + " super().__init__(eta)\n", + " self.rho = rho\n", + " self.second = 0.0\n", + "\n", + " def update_change(self, gradient):\n", + " delta = 1e-8 # avoid division ny zero\n", + " self.second = self.rho * self.second + (1 - self.rho) * gradient * gradient\n", + " return self.eta * gradient / (np.sqrt(self.second + delta))\n", + "\n", + " def reset(self):\n", + " self.second = 0.0\n", + "\n", + "\n", + "class Adam(Scheduler):\n", + " def __init__(self, eta, rho, rho2):\n", + " super().__init__(eta)\n", + " self.rho = rho\n", + " self.rho2 = rho2\n", + " self.moment = 0\n", + " self.second = 0\n", + " self.n_epochs = 1\n", + "\n", + " def update_change(self, gradient):\n", + " delta = 1e-8 # avoid division ny zero\n", + "\n", + " self.moment = self.rho * self.moment + (1 - self.rho) * gradient\n", + " self.second = self.rho2 * self.second + (1 - self.rho2) * gradient 
* gradient\n", + "\n", + " moment_corrected = self.moment / (1 - self.rho**self.n_epochs)\n", + " second_corrected = self.second / (1 - self.rho2**self.n_epochs)\n", + "\n", + " return self.eta * moment_corrected / (np.sqrt(second_corrected + delta))\n", + "\n", + " def reset(self):\n", + " self.n_epochs += 1\n", + " self.moment = 0\n", + " self.second = 0" + ] + }, + { + "cell_type": "markdown", + "id": "961989d9", + "metadata": { + "editable": true + }, + "source": [ + "### Usage of the above learning rate schedulers\n", + "\n", + "To initalize a scheduler, simply create the object and pass in the\n", + "necessary parameters such as the learning rate and the momentum as\n", + "shown below. As the Scheduler class is an abstract class it should not\n", + "called directly, and will raise an error upon usage." + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "id": "1e9fbe0f", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "momentum_scheduler = Momentum(eta=1e-3, momentum=0.9)\n", + "adam_scheduler = Adam(eta=1e-3, rho=0.9, rho2=0.999)" + ] + }, + { + "cell_type": "markdown", + "id": "b5adb1b4", + "metadata": { + "editable": true + }, + "source": [ + "Here is a small example for how a segment of code using schedulers\n", + "could look. Switching out the schedulers is simple." + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "id": "dc4f4d28", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "weights = np.ones((3,3))\n", + "print(f\"Before scheduler:\\n{weights=}\")\n", + "\n", + "epochs = 10\n", + "for e in range(epochs):\n", + " gradient = np.random.rand(3, 3)\n", + " change = adam_scheduler.update_change(gradient)\n", + " weights = weights - change\n", + " adam_scheduler.reset()\n", + "\n", + "print(f\"\\nAfter scheduler:\\n{weights=}\")" + ] + }, + { + "cell_type": "markdown", + "id": "8964d118", + "metadata": { + "editable": true + }, + "source": [ + "### Cost functions\n", + "\n", + "Here we discuss cost functions that can be used when creating the\n", + "neural network. Every cost function takes the target vector as its\n", + "parameter, and returns a function valued only at $x$ such that it may\n", + "easily be differentiated." + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "id": "3a8470bd", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import autograd.numpy as np\n", + "\n", + "def CostOLS(target):\n", + " \n", + " def func(X):\n", + " return (1.0 / target.shape[0]) * np.sum((target - X) ** 2)\n", + "\n", + " return func\n", + "\n", + "\n", + "def CostLogReg(target):\n", + "\n", + " def func(X):\n", + " \n", + " return -(1.0 / target.shape[0]) * np.sum(\n", + " (target * np.log(X + 10e-10)) + ((1 - target) * np.log(1 - X + 10e-10))\n", + " )\n", + "\n", + " return func\n", + "\n", + "\n", + "def CostCrossEntropy(target):\n", + " \n", + " def func(X):\n", + " return -(1.0 / target.size) * np.sum(target * np.log(X + 10e-10))\n", + "\n", + " return func" + ] + }, + { + "cell_type": "markdown", + "id": "ab4daf8f", + "metadata": { + "editable": true + }, + "source": [ + "Below we give a short example of how these cost function may be used\n", + "to obtain results if you wish to test them out on your own using\n", + "AutoGrad's automatics differentiation." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 16, + "id": "cf8922ac", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "from autograd import grad\n", + "\n", + "target = np.array([[1, 2, 3]]).T\n", + "a = np.array([[4, 5, 6]]).T\n", + "\n", + "cost_func = CostCrossEntropy\n", + "cost_func_derivative = grad(cost_func(target))\n", + "\n", + "valued_at_a = cost_func_derivative(a)\n", + "print(f\"Derivative of cost function {cost_func.__name__} valued at a:\\n{valued_at_a}\")" + ] + }, + { + "cell_type": "markdown", + "id": "fab332c4", + "metadata": { + "editable": true + }, + "source": [ + "### Activation functions\n", + "\n", + "Finally, before we look at the neural network, we will look at the\n", + "activation functions which can be specified between the hidden layers\n", + "and as the output function. Each function can be valued for any given\n", + "vector or matrix X, and can be differentiated via derivate()." + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "id": "5ab56013", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import autograd.numpy as np\n", + "from autograd import elementwise_grad\n", + "\n", + "def identity(X):\n", + " return X\n", + "\n", + "\n", + "def sigmoid(X):\n", + " try:\n", + " return 1.0 / (1 + np.exp(-X))\n", + " except FloatingPointError:\n", + " return np.where(X > np.zeros(X.shape), np.ones(X.shape), np.zeros(X.shape))\n", + "\n", + "\n", + "def softmax(X):\n", + " X = X - np.max(X, axis=-1, keepdims=True)\n", + " delta = 10e-10\n", + " return np.exp(X) / (np.sum(np.exp(X), axis=-1, keepdims=True) + delta)\n", + "\n", + "\n", + "def RELU(X):\n", + " return np.where(X > np.zeros(X.shape), X, np.zeros(X.shape))\n", + "\n", + "\n", + "def LRELU(X):\n", + " delta = 10e-4\n", + " return np.where(X > np.zeros(X.shape), X, delta * X)\n", + "\n", + "\n", + "def derivate(func):\n", + " if func.__name__ == \"RELU\":\n", + "\n", + " def func(X):\n", + " return np.where(X > 0, 1, 0)\n", + "\n", + " return func\n", + "\n", + " elif func.__name__ == \"LRELU\":\n", + "\n", + " def func(X):\n", + " delta = 10e-4\n", + " return np.where(X > 0, 1, delta)\n", + "\n", + " return func\n", + "\n", + " else:\n", + " return elementwise_grad(func)" + ] + }, + { + "cell_type": "markdown", + "id": "969612c3", + "metadata": { + "editable": true + }, + "source": [ + "Below follows a short demonstration of how to use an activation\n", + "function. The derivative of the activation function will be important\n", + "when calculating the output delta term during backpropagation. Note\n", + "that derivate() can also be used for cost functions for a more\n", + "generalized approach." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 18, + "id": "313878c6", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "z = np.array([[4, 5, 6]]).T\n", + "print(f\"Input to activation function:\\n{z}\")\n", + "\n", + "act_func = sigmoid\n", + "a = act_func(z)\n", + "print(f\"\\nOutput from {act_func.__name__} activation function:\\n{a}\")\n", + "\n", + "act_func_derivative = derivate(act_func)\n", + "valued_at_z = act_func_derivative(a)\n", + "print(f\"\\nDerivative of {act_func.__name__} activation function valued at z:\\n{valued_at_z}\")" + ] + }, + { + "cell_type": "markdown", + "id": "095347a2", + "metadata": { + "editable": true + }, + "source": [ + "### The Neural Network\n", + "\n", + "Now that we have gotten a good understanding of the implementation of\n", + "some important components, we can take a look at an object oriented\n", + "implementation of a feed forward neural network. The feed forward\n", + "neural network has been implemented as a class named FFNN, which can\n", + "be initiated as a regressor or classifier dependant on the choice of\n", + "cost function. The FFNN can have any number of input nodes, hidden\n", + "layers with any amount of hidden nodes, and any amount of output nodes\n", + "meaning it can perform multiclass classification as well as binary\n", + "classification and regression problems. Although there is a lot of\n", + "code present, it makes for an easy to use and generalizeable interface\n", + "for creating many types of neural networks as will be demonstrated\n", + "below." + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "id": "9ea2b0b7", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import math\n", + "import autograd.numpy as np\n", + "import sys\n", + "import warnings\n", + "from autograd import grad, elementwise_grad\n", + "from random import random, seed\n", + "from copy import deepcopy, copy\n", + "from typing import Tuple, Callable\n", + "from sklearn.utils import resample\n", + "\n", + "warnings.simplefilter(\"error\")\n", + "\n", + "\n", + "class FFNN:\n", + " \"\"\"\n", + " Description:\n", + " ------------\n", + " Feed Forward Neural Network with interface enabling flexible design of a\n", + " nerual networks architecture and the specification of activation function\n", + " in the hidden layers and output layer respectively. This model can be used\n", + " for both regression and classification problems, depending on the output function.\n", + "\n", + " Attributes:\n", + " ------------\n", + " I dimensions (tuple[int]): A list of positive integers, which specifies the\n", + " number of nodes in each of the networks layers. 
The first integer in the array\n", + " defines the number of nodes in the input layer, the second integer defines number\n", + " of nodes in the first hidden layer and so on until the last number, which\n", + " specifies the number of nodes in the output layer.\n", + " II hidden_func (Callable): The activation function for the hidden layers\n", + " III output_func (Callable): The activation function for the output layer\n", + " IV cost_func (Callable): Our cost function\n", + " V seed (int): Sets random seed, makes results reproducible\n", + " \"\"\"\n", + "\n", + " def __init__(\n", + " self,\n", + " dimensions: tuple[int],\n", + " hidden_func: Callable = sigmoid,\n", + " output_func: Callable = lambda x: x,\n", + " cost_func: Callable = CostOLS,\n", + " seed: int = None,\n", + " ):\n", + " self.dimensions = dimensions\n", + " self.hidden_func = hidden_func\n", + " self.output_func = output_func\n", + " self.cost_func = cost_func\n", + " self.seed = seed\n", + " self.weights = list()\n", + " self.schedulers_weight = list()\n", + " self.schedulers_bias = list()\n", + " self.a_matrices = list()\n", + " self.z_matrices = list()\n", + " self.classification = None\n", + "\n", + " self.reset_weights()\n", + " self._set_classification()\n", + "\n", + " def fit(\n", + " self,\n", + " X: np.ndarray,\n", + " t: np.ndarray,\n", + " scheduler: Scheduler,\n", + " batches: int = 1,\n", + " epochs: int = 100,\n", + " lam: float = 0,\n", + " X_val: np.ndarray = None,\n", + " t_val: np.ndarray = None,\n", + " ):\n", + " \"\"\"\n", + " Description:\n", + " ------------\n", + " This function performs the training the neural network by performing the feedforward and backpropagation\n", + " algorithm to update the networks weights.\n", + "\n", + " Parameters:\n", + " ------------\n", + " I X (np.ndarray) : training data\n", + " II t (np.ndarray) : target data\n", + " III scheduler (Scheduler) : specified scheduler (algorithm for optimization of gradient descent)\n", + " IV scheduler_args (list[int]) : list of all arguments necessary for scheduler\n", + "\n", + " Optional Parameters:\n", + " ------------\n", + " V batches (int) : number of batches the datasets are split into, default equal to 1\n", + " VI epochs (int) : number of iterations used to train the network, default equal to 100\n", + " VII lam (float) : regularization hyperparameter lambda\n", + " VIII X_val (np.ndarray) : validation set\n", + " IX t_val (np.ndarray) : validation target set\n", + "\n", + " Returns:\n", + " ------------\n", + " I scores (dict) : A dictionary containing the performance metrics of the model.\n", + " The number of the metrics depends on the parameters passed to the fit-function.\n", + "\n", + " \"\"\"\n", + "\n", + " # setup \n", + " if self.seed is not None:\n", + " np.random.seed(self.seed)\n", + "\n", + " val_set = False\n", + " if X_val is not None and t_val is not None:\n", + " val_set = True\n", + "\n", + " # creating arrays for score metrics\n", + " train_errors = np.empty(epochs)\n", + " train_errors.fill(np.nan)\n", + " val_errors = np.empty(epochs)\n", + " val_errors.fill(np.nan)\n", + "\n", + " train_accs = np.empty(epochs)\n", + " train_accs.fill(np.nan)\n", + " val_accs = np.empty(epochs)\n", + " val_accs.fill(np.nan)\n", + "\n", + " self.schedulers_weight = list()\n", + " self.schedulers_bias = list()\n", + "\n", + " batch_size = X.shape[0] // batches\n", + "\n", + " X, t = resample(X, t)\n", + "\n", + " # this function returns a function valued only at X\n", + " cost_function_train = self.cost_func(t)\n", 
+ " if val_set:\n", + " cost_function_val = self.cost_func(t_val)\n", + "\n", + " # create schedulers for each weight matrix\n", + " for i in range(len(self.weights)):\n", + " self.schedulers_weight.append(copy(scheduler))\n", + " self.schedulers_bias.append(copy(scheduler))\n", + "\n", + " print(f\"{scheduler.__class__.__name__}: Eta={scheduler.eta}, Lambda={lam}\")\n", + "\n", + " try:\n", + " for e in range(epochs):\n", + " for i in range(batches):\n", + " # allows for minibatch gradient descent\n", + " if i == batches - 1:\n", + " # If the for loop has reached the last batch, take all thats left\n", + " X_batch = X[i * batch_size :, :]\n", + " t_batch = t[i * batch_size :, :]\n", + " else:\n", + " X_batch = X[i * batch_size : (i + 1) * batch_size, :]\n", + " t_batch = t[i * batch_size : (i + 1) * batch_size, :]\n", + "\n", + " self._feedforward(X_batch)\n", + " self._backpropagate(X_batch, t_batch, lam)\n", + "\n", + " # reset schedulers for each epoch (some schedulers pass in this call)\n", + " for scheduler in self.schedulers_weight:\n", + " scheduler.reset()\n", + "\n", + " for scheduler in self.schedulers_bias:\n", + " scheduler.reset()\n", + "\n", + " # computing performance metrics\n", + " pred_train = self.predict(X)\n", + " train_error = cost_function_train(pred_train)\n", + "\n", + " train_errors[e] = train_error\n", + " if val_set:\n", + " \n", + " pred_val = self.predict(X_val)\n", + " val_error = cost_function_val(pred_val)\n", + " val_errors[e] = val_error\n", + "\n", + " if self.classification:\n", + " train_acc = self._accuracy(self.predict(X), t)\n", + " train_accs[e] = train_acc\n", + " if val_set:\n", + " val_acc = self._accuracy(pred_val, t_val)\n", + " val_accs[e] = val_acc\n", + "\n", + " # printing progress bar\n", + " progression = e / epochs\n", + " print_length = self._progress_bar(\n", + " progression,\n", + " train_error=train_errors[e],\n", + " train_acc=train_accs[e],\n", + " val_error=val_errors[e],\n", + " val_acc=val_accs[e],\n", + " )\n", + " except KeyboardInterrupt:\n", + " # allows for stopping training at any point and seeing the result\n", + " pass\n", + "\n", + " # visualization of training progression (similiar to tensorflow progression bar)\n", + " sys.stdout.write(\"\\r\" + \" \" * print_length)\n", + " sys.stdout.flush()\n", + " self._progress_bar(\n", + " 1,\n", + " train_error=train_errors[e],\n", + " train_acc=train_accs[e],\n", + " val_error=val_errors[e],\n", + " val_acc=val_accs[e],\n", + " )\n", + " sys.stdout.write(\"\")\n", + "\n", + " # return performance metrics for the entire run\n", + " scores = dict()\n", + "\n", + " scores[\"train_errors\"] = train_errors\n", + "\n", + " if val_set:\n", + " scores[\"val_errors\"] = val_errors\n", + "\n", + " if self.classification:\n", + " scores[\"train_accs\"] = train_accs\n", + "\n", + " if val_set:\n", + " scores[\"val_accs\"] = val_accs\n", + "\n", + " return scores\n", + "\n", + " def predict(self, X: np.ndarray, *, threshold=0.5):\n", + " \"\"\"\n", + " Description:\n", + " ------------\n", + " Performs prediction after training of the network has been finished.\n", + "\n", + " Parameters:\n", + " ------------\n", + " I X (np.ndarray): The design matrix, with n rows of p features each\n", + "\n", + " Optional Parameters:\n", + " ------------\n", + " II threshold (float) : sets minimal value for a prediction to be predicted as the positive class\n", + " in classification problems\n", + "\n", + " Returns:\n", + " ------------\n", + " I z (np.ndarray): A prediction vector (row) for each row 
in our design matrix\n", + " This vector is thresholded if regression=False, meaning that classification results\n", + " in a vector of 1s and 0s, while regressions in an array of decimal numbers\n", + "\n", + " \"\"\"\n", + "\n", + " predict = self._feedforward(X)\n", + "\n", + " if self.classification:\n", + " return np.where(predict > threshold, 1, 0)\n", + " else:\n", + " return predict\n", + "\n", + " def reset_weights(self):\n", + " \"\"\"\n", + " Description:\n", + " ------------\n", + " Resets/Reinitializes the weights in order to train the network for a new problem.\n", + "\n", + " \"\"\"\n", + " if self.seed is not None:\n", + " np.random.seed(self.seed)\n", + "\n", + " self.weights = list()\n", + " for i in range(len(self.dimensions) - 1):\n", + " weight_array = np.random.randn(\n", + " self.dimensions[i] + 1, self.dimensions[i + 1]\n", + " )\n", + " weight_array[0, :] = np.random.randn(self.dimensions[i + 1]) * 0.01\n", + "\n", + " self.weights.append(weight_array)\n", + "\n", + " def _feedforward(self, X: np.ndarray):\n", + " \"\"\"\n", + " Description:\n", + " ------------\n", + " Calculates the activation of each layer starting at the input and ending at the output.\n", + " Each following activation is calculated from a weighted sum of each of the preceeding\n", + " activations (except in the case of the input layer).\n", + "\n", + " Parameters:\n", + " ------------\n", + " I X (np.ndarray): The design matrix, with n rows of p features each\n", + "\n", + " Returns:\n", + " ------------\n", + " I z (np.ndarray): A prediction vector (row) for each row in our design matrix\n", + " \"\"\"\n", + "\n", + " # reset matrices\n", + " self.a_matrices = list()\n", + " self.z_matrices = list()\n", + "\n", + " # if X is just a vector, make it into a matrix\n", + " if len(X.shape) == 1:\n", + " X = X.reshape((1, X.shape[0]))\n", + "\n", + " # Add a coloumn of zeros as the first coloumn of the design matrix, in order\n", + " # to add bias to our data\n", + " bias = np.ones((X.shape[0], 1)) * 0.01\n", + " X = np.hstack([bias, X])\n", + "\n", + " # a^0, the nodes in the input layer (one a^0 for each row in X - where the\n", + " # exponent indicates layer number).\n", + " a = X\n", + " self.a_matrices.append(a)\n", + " self.z_matrices.append(a)\n", + "\n", + " # The feed forward algorithm\n", + " for i in range(len(self.weights)):\n", + " if i < len(self.weights) - 1:\n", + " z = a @ self.weights[i]\n", + " self.z_matrices.append(z)\n", + " a = self.hidden_func(z)\n", + " # bias column again added to the data here\n", + " bias = np.ones((a.shape[0], 1)) * 0.01\n", + " a = np.hstack([bias, a])\n", + " self.a_matrices.append(a)\n", + " else:\n", + " try:\n", + " # a^L, the nodes in our output layers\n", + " z = a @ self.weights[i]\n", + " a = self.output_func(z)\n", + " self.a_matrices.append(a)\n", + " self.z_matrices.append(z)\n", + " except Exception as OverflowError:\n", + " print(\n", + " \"OverflowError in fit() in FFNN\\nHOW TO DEBUG ERROR: Consider lowering your learning rate or scheduler specific parameters such as momentum, or check if your input values need scaling\"\n", + " )\n", + "\n", + " # this will be a^L\n", + " return a\n", + "\n", + " def _backpropagate(self, X, t, lam):\n", + " \"\"\"\n", + " Description:\n", + " ------------\n", + " Performs the backpropagation algorithm. 
In other words, this method\n", + " calculates the gradient of all the layers starting at the\n", + " output layer, and moving from right to left accumulates the gradient until\n", + " the input layer is reached. Each layers respective weights are updated while\n", + " the algorithm propagates backwards from the output layer (auto-differentation in reverse mode).\n", + "\n", + " Parameters:\n", + " ------------\n", + " I X (np.ndarray): The design matrix, with n rows of p features each.\n", + " II t (np.ndarray): The target vector, with n rows of p targets.\n", + " III lam (float32): regularization parameter used to punish the weights in case of overfitting\n", + "\n", + " Returns:\n", + " ------------\n", + " No return value.\n", + "\n", + " \"\"\"\n", + " out_derivative = derivate(self.output_func)\n", + " hidden_derivative = derivate(self.hidden_func)\n", + "\n", + " for i in range(len(self.weights) - 1, -1, -1):\n", + " # delta terms for output\n", + " if i == len(self.weights) - 1:\n", + " # for multi-class classification\n", + " if (\n", + " self.output_func.__name__ == \"softmax\"\n", + " ):\n", + " delta_matrix = self.a_matrices[i + 1] - t\n", + " # for single class classification\n", + " else:\n", + " cost_func_derivative = grad(self.cost_func(t))\n", + " delta_matrix = out_derivative(\n", + " self.z_matrices[i + 1]\n", + " ) * cost_func_derivative(self.a_matrices[i + 1])\n", + "\n", + " # delta terms for hidden layer\n", + " else:\n", + " delta_matrix = (\n", + " self.weights[i + 1][1:, :] @ delta_matrix.T\n", + " ).T * hidden_derivative(self.z_matrices[i + 1])\n", + "\n", + " # calculate gradient\n", + " gradient_weights = self.a_matrices[i][:, 1:].T @ delta_matrix\n", + " gradient_bias = np.sum(delta_matrix, axis=0).reshape(\n", + " 1, delta_matrix.shape[1]\n", + " )\n", + "\n", + " # regularization term\n", + " gradient_weights += self.weights[i][1:, :] * lam\n", + "\n", + " # use scheduler\n", + " update_matrix = np.vstack(\n", + " [\n", + " self.schedulers_bias[i].update_change(gradient_bias),\n", + " self.schedulers_weight[i].update_change(gradient_weights),\n", + " ]\n", + " )\n", + "\n", + " # update weights and bias\n", + " self.weights[i] -= update_matrix\n", + "\n", + " def _accuracy(self, prediction: np.ndarray, target: np.ndarray):\n", + " \"\"\"\n", + " Description:\n", + " ------------\n", + " Calculates accuracy of given prediction to target\n", + "\n", + " Parameters:\n", + " ------------\n", + " I prediction (np.ndarray): vector of predicitons output network\n", + " (1s and 0s in case of classification, and real numbers in case of regression)\n", + " II target (np.ndarray): vector of true values (What the network ideally should predict)\n", + "\n", + " Returns:\n", + " ------------\n", + " A floating point number representing the percentage of correctly classified instances.\n", + " \"\"\"\n", + " assert prediction.size == target.size\n", + " return np.average((target == prediction))\n", + " def _set_classification(self):\n", + " \"\"\"\n", + " Description:\n", + " ------------\n", + " Decides if FFNN acts as classifier (True) og regressor (False),\n", + " sets self.classification during init()\n", + " \"\"\"\n", + " self.classification = False\n", + " if (\n", + " self.cost_func.__name__ == \"CostLogReg\"\n", + " or self.cost_func.__name__ == \"CostCrossEntropy\"\n", + " ):\n", + " self.classification = True\n", + "\n", + " def _progress_bar(self, progression, **kwargs):\n", + " \"\"\"\n", + " Description:\n", + " ------------\n", + " Displays progress of 
training\n", + " \"\"\"\n", + " print_length = 40\n", + " num_equals = int(progression * print_length)\n", + " num_not = print_length - num_equals\n", + " arrow = \">\" if num_equals > 0 else \"\"\n", + " bar = \"[\" + \"=\" * (num_equals - 1) + arrow + \"-\" * num_not + \"]\"\n", + " perc_print = self._format(progression * 100, decimals=5)\n", + " line = f\" {bar} {perc_print}% \"\n", + "\n", + " for key in kwargs:\n", + " if not np.isnan(kwargs[key]):\n", + " value = self._format(kwargs[key], decimals=4)\n", + " line += f\"| {key}: {value} \"\n", + " sys.stdout.write(\"\\r\" + line)\n", + " sys.stdout.flush()\n", + " return len(line)\n", + "\n", + " def _format(self, value, decimals=4):\n", + " \"\"\"\n", + " Description:\n", + " ------------\n", + " Formats decimal numbers for progress bar\n", + " \"\"\"\n", + " if value > 0:\n", + " v = value\n", + " elif value < 0:\n", + " v = -10 * value\n", + " else:\n", + " v = 1\n", + " n = 1 + math.floor(math.log10(v))\n", + " if n >= decimals - 1:\n", + " return str(round(value))\n", + " return f\"{value:.{decimals-n-1}f}\"" + ] + }, + { + "cell_type": "markdown", + "id": "0f29bccd", + "metadata": { + "editable": true + }, + "source": [ + "Before we make a model, we will quickly generate a dataset we can use\n", + "for our linear regression problem as shown below" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "id": "dc37b403", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import autograd.numpy as np\n", + "from sklearn.model_selection import train_test_split\n", + "\n", + "def SkrankeFunction(x, y):\n", + " return np.ravel(0 + 1*x + 2*y + 3*x**2 + 4*x*y + 5*y**2)\n", + "\n", + "def create_X(x, y, n):\n", + " if len(x.shape) > 1:\n", + " x = np.ravel(x)\n", + " y = np.ravel(y)\n", + "\n", + " N = len(x)\n", + " l = int((n + 1) * (n + 2) / 2) # Number of elements in beta\n", + " X = np.ones((N, l))\n", + "\n", + " for i in range(1, n + 1):\n", + " q = int((i) * (i + 1) / 2)\n", + " for k in range(i + 1):\n", + " X[:, q + k] = (x ** (i - k)) * (y**k)\n", + "\n", + " return X\n", + "\n", + "step=0.5\n", + "x = np.arange(0, 1, step)\n", + "y = np.arange(0, 1, step)\n", + "x, y = np.meshgrid(x, y)\n", + "target = SkrankeFunction(x, y)\n", + "target = target.reshape(target.shape[0], 1)\n", + "\n", + "poly_degree=3\n", + "X = create_X(x, y, poly_degree)\n", + "\n", + "X_train, X_test, t_train, t_test = train_test_split(X, target)" + ] + }, + { + "cell_type": "markdown", + "id": "91790369", + "metadata": { + "editable": true + }, + "source": [ + "Now that we have our dataset ready for the regression, we can create\n", + "our regressor. Note that with the seed parameter, we can make sure our\n", + "results stay the same every time we run the neural network. For\n", + "inititialization, we simply specify the dimensions (we wish the amount\n", + "of input nodes to be equal to the datapoints, and the output to\n", + "predict one value)." + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "id": "62585c7a", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "input_nodes = X_train.shape[1]\n", + "output_nodes = 1\n", + "\n", + "linear_regression = FFNN((input_nodes, output_nodes), output_func=identity, cost_func=CostOLS, seed=2023)" + ] + }, + { + "cell_type": "markdown", + "id": "69cdc171", + "metadata": { + "editable": true + }, + "source": [ + "We then fit our model with our training data using the scheduler of our choice." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 22, + "id": "d0713298", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "linear_regression.reset_weights() # reset weights such that previous runs or reruns don't affect the weights\n", + "\n", + "scheduler = Constant(eta=1e-3)\n", + "scores = linear_regression.fit(X_train, t_train, scheduler)" + ] + }, + { + "cell_type": "markdown", + "id": "310f805d", + "metadata": { + "editable": true + }, + "source": [ + "Due to the progress bar we can see the MSE (train_error) throughout\n", + "the FFNN's training. Note that the fit() function has some optional\n", + "parameters with defualt arguments. For example, the regularization\n", + "hyperparameter can be left ignored if not needed, and equally the FFNN\n", + "will by default run for 100 epochs. These can easily be changed, such\n", + "as for example:" + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "id": "216d1c44", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "linear_regression.reset_weights() # reset weights such that previous runs or reruns don't affect the weights\n", + "\n", + "scores = linear_regression.fit(X_train, t_train, scheduler, lam=1e-4, epochs=1000)" + ] + }, + { + "cell_type": "markdown", + "id": "ba2e5a39", + "metadata": { + "editable": true + }, + "source": [ + "We see that given more epochs to train on, the regressor reaches a lower MSE.\n", + "\n", + "Let us then switch to a binary classification. We use a binary\n", + "classification dataset, and follow a similar setup to the regression\n", + "case." + ] + }, + { + "cell_type": "code", + "execution_count": 24, + "id": "8c5b291e", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "from sklearn.datasets import load_breast_cancer\n", + "from sklearn.preprocessing import MinMaxScaler\n", + "\n", + "wisconsin = load_breast_cancer()\n", + "X = wisconsin.data\n", + "target = wisconsin.target\n", + "target = target.reshape(target.shape[0], 1)\n", + "\n", + "X_train, X_val, t_train, t_val = train_test_split(X, target)\n", + "\n", + "scaler = MinMaxScaler()\n", + "scaler.fit(X_train)\n", + "X_train = scaler.transform(X_train)\n", + "X_val = scaler.transform(X_val)" + ] + }, + { + "cell_type": "code", + "execution_count": 25, + "id": "4f6aa682", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "input_nodes = X_train.shape[1]\n", + "output_nodes = 1\n", + "\n", + "logistic_regression = FFNN((input_nodes, output_nodes), output_func=sigmoid, cost_func=CostLogReg, seed=2023)" + ] + }, + { + "cell_type": "markdown", + "id": "3ff7c54a", + "metadata": { + "editable": true + }, + "source": [ + "We will now make use of our validation data by passing it into our fit function as a keyword argument" + ] + }, + { + "cell_type": "code", + "execution_count": 26, + "id": "4bbcaedd", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "logistic_regression.reset_weights() # reset weights such that previous runs or reruns don't affect the weights\n", + "\n", + "scheduler = Adam(eta=1e-3, rho=0.9, rho2=0.999)\n", + "scores = logistic_regression.fit(X_train, t_train, scheduler, epochs=1000, X_val=X_val, t_val=t_val)" + ] + }, + { + "cell_type": "markdown", + "id": "aa4f54fe", + "metadata": { + "editable": true + }, + "source": [ + "Finally, we will create a neural network with 2 hidden layers with 
activation functions." + ] + }, + { + "cell_type": "code", + "execution_count": 27, + "id": "c11be1f5", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "input_nodes = X_train.shape[1]\n", + "hidden_nodes1 = 100\n", + "hidden_nodes2 = 30\n", + "output_nodes = 1\n", + "\n", + "dims = (input_nodes, hidden_nodes1, hidden_nodes2, output_nodes)\n", + "\n", + "neural_network = FFNN(dims, hidden_func=RELU, output_func=sigmoid, cost_func=CostLogReg, seed=2023)" + ] + }, + { + "cell_type": "code", + "execution_count": 28, + "id": "78482f24", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "neural_network.reset_weights() # reset weights such that previous runs or reruns don't affect the weights\n", + "\n", + "scheduler = Adam(eta=1e-4, rho=0.9, rho2=0.999)\n", + "scores = neural_network.fit(X_train, t_train, scheduler, epochs=1000, X_val=X_val, t_val=t_val)" + ] + }, + { + "cell_type": "markdown", + "id": "678b88e7", + "metadata": { + "editable": true + }, + "source": [ + "### Multiclass classification\n", + "\n", + "Finally, we will demonstrate the use case of multiclass classification\n", + "using our FFNN with the famous MNIST dataset, which contain images of\n", + "digits between the range of 0 to 9." + ] + }, + { + "cell_type": "code", + "execution_count": 29, + "id": "833a7321", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "from sklearn.datasets import load_digits\n", + "\n", + "def onehot(target: np.ndarray):\n", + " onehot = np.zeros((target.size, target.max() + 1))\n", + " onehot[np.arange(target.size), target] = 1\n", + " return onehot\n", + "\n", + "digits = load_digits()\n", + "\n", + "X = digits.data\n", + "target = digits.target\n", + "target = onehot(target)\n", + "\n", + "input_nodes = 64\n", + "hidden_nodes1 = 100\n", + "hidden_nodes2 = 30\n", + "output_nodes = 10\n", + "\n", + "dims = (input_nodes, hidden_nodes1, hidden_nodes2, output_nodes)\n", + "\n", + "multiclass = FFNN(dims, hidden_func=LRELU, output_func=softmax, cost_func=CostCrossEntropy)\n", + "\n", + "multiclass.reset_weights() # reset weights such that previous runs or reruns don't affect the weights\n", + "\n", + "scheduler = Adam(eta=1e-4, rho=0.9, rho2=0.999)\n", + "scores = multiclass.fit(X, target, scheduler, epochs=1000)" + ] + }, + { + "cell_type": "markdown", + "id": "1af2ad7b", + "metadata": { + "editable": true + }, + "source": [ + "## Testing the XOR gate and other gates\n", + "\n", + "Let us now use our code to test the XOR gate." + ] + }, + { + "cell_type": "code", + "execution_count": 30, + "id": "752c6403", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "X = np.array([ [0, 0], [0, 1], [1, 0],[1, 1]],dtype=np.float64)\n", + "\n", + "# The XOR gate\n", + "yXOR = np.array( [[ 0], [1] ,[1], [0]])\n", + "\n", + "input_nodes = X.shape[1]\n", + "output_nodes = 1\n", + "\n", + "logistic_regression = FFNN((input_nodes, output_nodes), output_func=sigmoid, cost_func=CostLogReg, seed=2023)\n", + "logistic_regression.reset_weights() # reset weights such that previous runs or reruns don't affect the weights\n", + "scheduler = Adam(eta=1e-1, rho=0.9, rho2=0.999)\n", + "scores = logistic_regression.fit(X, yXOR, scheduler, epochs=1000)" + ] + }, + { + "cell_type": "markdown", + "id": "0a7c91e3", + "metadata": { + "editable": true + }, + "source": [ + "Not bad, but the results depend strongly on the learning reate. 
Try different learning rates."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "40ffa1fb",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "## Solving differential equations with Deep Learning\n",
+    "\n",
+    "The Universal Approximation Theorem states that a neural network with a\n",
+    "single hidden layer, together with an input and an output layer, can\n",
+    "approximate any continuous function to any given precision.\n",
+    "\n",
+    "**Book on solving differential equations with ML methods.**\n",
+    "\n",
+    "[An Introduction to Neural Network Methods for Differential Equations](https://www.springer.com/gp/book/9789401798150), by Yadav and Kumar.\n",
+    "\n",
+    "**Physics informed neural networks.**\n",
+    "\n",
+    "[Scientific Machine Learning Through Physics–Informed Neural Networks: Where we are and What’s Next](https://link.springer.com/article/10.1007/s10915-022-01939-z), by Cuomo et al.\n",
+    "\n",
+    "**Thanks to Kristine Baluka Hein.**\n",
+    "\n",
+    "The lectures on differential equations were developed by Kristine Baluka Hein, now a PhD student at IFI.\n",
+    "A great thanks to Kristine."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "191ba3eb",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "## Ordinary Differential Equations first\n",
+    "\n",
+    "An ordinary differential equation (ODE) is an equation involving a function of one variable and its derivatives.\n",
+    "\n",
+    "In general, an ordinary differential equation looks like"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "a0be312a",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "\n",
+    "$$\n",
+    "\begin{equation} \label{ode} \tag{1}\n",
+    "f\left(x, \, g(x), \, g'(x), \, g''(x), \, \dots \, , \, g^{(n)}(x)\right) = 0\n",
+    "\end{equation}\n",
+    "$$"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "000663cf",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "where $g(x)$ is the function to find, and $g^{(n)}(x)$ is the $n$-th derivative of $g(x)$.\n",
+    "\n",
+    "The expression $f\left(x, g(x), g'(x), g''(x), \, \dots \, , g^{(n)}(x)\right)$ is just a way to write that there is an expression involving $x$ and $g(x), \ g'(x), \ g''(x), \, \dots \, , \text{ and } g^{(n)}(x)$ on the left side of the equality sign in ([1](#ode)).\n",
+    "The highest order of derivative, that is the value of $n$, determines the order of the equation.\n",
+    "The equation is referred to as an $n$-th order ODE.\n",
+    "Along with ([1](#ode)), some additional conditions on the function $g(x)$ are typically given\n",
+    "for the solution to be unique."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "f5b87995",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "## The trial solution\n",
+    "\n",
+    "Let the trial solution $g_t(x)$ be"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "a166c0b6",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "\n",
+    "
    \n", + "\n", + "$$\n", + "\\begin{equation}\n", + "\tg_t(x) = h_1(x) + h_2(x,N(x,P))\n", + "\\label{_auto1} \\tag{2}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "f1e49a2c", + "metadata": { + "editable": true + }, + "source": [ + "where $h_1(x)$ is a function that makes $g_t(x)$ satisfy a given set\n", + "of conditions, $N(x,P)$ a neural network with weights and biases\n", + "described by $P$ and $h_2(x, N(x,P))$ some expression involving the\n", + "neural network. The role of the function $h_2(x, N(x,P))$, is to\n", + "ensure that the output from $N(x,P)$ is zero when $g_t(x)$ is\n", + "evaluated at the values of $x$ where the given conditions must be\n", + "satisfied. The function $h_1(x)$ should alone make $g_t(x)$ satisfy\n", + "the conditions.\n", + "\n", + "But what about the network $N(x,P)$?\n", + "\n", + "As described previously, an optimization method could be used to minimize the parameters of a neural network, that being its weights and biases, through backward propagation." + ] + }, + { + "cell_type": "markdown", + "id": "207d1a97", + "metadata": { + "editable": true + }, + "source": [ + "## Minimization process\n", + "\n", + "For the minimization to be defined, we need to have a cost function at hand to minimize.\n", + "\n", + "It is given that $f\\left(x, \\, g(x), \\, g'(x), \\, g''(x), \\, \\dots \\, , \\, g^{(n)}(x)\\right)$ should be equal to zero in ([1](#ode)).\n", + "We can choose to consider the mean squared error as the cost function for an input $x$.\n", + "Since we are looking at one input, the cost function is just $f$ squared.\n", + "The cost function $c\\left(x, P \\right)$ can therefore be expressed as" + ] + }, + { + "cell_type": "markdown", + "id": "94a061a1", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C\\left(x, P\\right) = \\big(f\\left(x, \\, g(x), \\, g'(x), \\, g''(x), \\, \\dots \\, , \\, g^{(n)}(x)\\right)\\big)^2\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "93244d03", + "metadata": { + "editable": true + }, + "source": [ + "If $N$ inputs are given as a vector $\\boldsymbol{x}$ with elements $x_i$ for $i = 1,\\dots,N$,\n", + "the cost function becomes" + ] + }, + { + "cell_type": "markdown", + "id": "6dc16fd4", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation} \\label{cost} \\tag{3}\n", + "\tC\\left(\\boldsymbol{x}, P\\right) = \\frac{1}{N} \\sum_{i=1}^N \\big(f\\left(x_i, \\, g(x_i), \\, g'(x_i), \\, g''(x_i), \\, \\dots \\, , \\, g^{(n)}(x_i)\\right)\\big)^2\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "01f4c14a", + "metadata": { + "editable": true + }, + "source": [ + "The neural net should then find the parameters $P$ that minimizes the cost function in\n", + "([3](#cost)) for a set of $N$ training samples $x_i$." + ] + }, + { + "cell_type": "markdown", + "id": "1784066c", + "metadata": { + "editable": true + }, + "source": [ + "## Minimizing the cost function using gradient descent and automatic differentiation\n", + "\n", + "To perform the minimization using gradient descent, the gradient of $C\\left(\\boldsymbol{x}, P\\right)$ is needed.\n", + "It might happen so that finding an analytical expression of the gradient of $C(\\boldsymbol{x}, P)$ from ([3](#cost)) gets too messy, depending on which cost function one desires to use.\n", + "\n", + "Luckily, there exists libraries that makes the job for us through automatic differentiation.\n", + "Automatic differentiation is a method of finding the derivatives numerically with very high precision." + ] + }, + { + "cell_type": "markdown", + "id": "43e1b7bf", + "metadata": { + "editable": true + }, + "source": [ + "## Example: Exponential decay\n", + "\n", + "An exponential decay of a quantity $g(x)$ is described by the equation" + ] + }, + { + "cell_type": "markdown", + "id": "5c28e60a", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation} \\label{solve_expdec} \\tag{4}\n", + " g'(x) = -\\gamma g(x)\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "cfd2e420", + "metadata": { + "editable": true + }, + "source": [ + "with $g(0) = g_0$ for some chosen initial value $g_0$.\n", + "\n", + "The analytical solution of ([4](#solve_expdec)) is" + ] + }, + { + "cell_type": "markdown", + "id": "b93aa0f8", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation}\n", + " g(x) = g_0 \\exp\\left(-\\gamma x\\right)\n", + "\\label{_auto2} \\tag{5}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "093952f0", + "metadata": { + "editable": true + }, + "source": [ + "Having an analytical solution at hand, it is possible to use it to compare how well a neural network finds a solution of ([4](#solve_expdec))." + ] + }, + { + "cell_type": "markdown", + "id": "8f82fa61", + "metadata": { + "editable": true + }, + "source": [ + "## The function to solve for\n", + "\n", + "The program will use a neural network to solve" + ] + }, + { + "cell_type": "markdown", + "id": "027d9c52", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation} \\label{solveode} \\tag{6}\n", + "g'(x) = -\\gamma g(x)\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "c18c4ee8", + "metadata": { + "editable": true + }, + "source": [ + "where $g(0) = g_0$ with $\\gamma$ and $g_0$ being some chosen values.\n", + "\n", + "In this example, $\\gamma = 2$ and $g_0 = 10$." + ] + }, + { + "cell_type": "markdown", + "id": "a0d7fc0a", + "metadata": { + "editable": true + }, + "source": [ + "## The trial solution\n", + "To begin with, a trial solution $g_t(t)$ must be chosen. A general trial solution for ordinary differential equations could be" + ] + }, + { + "cell_type": "markdown", + "id": "73cd72f4", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "g_t(x, P) = h_1(x) + h_2(x, N(x, P))\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "a4d0850f", + "metadata": { + "editable": true + }, + "source": [ + "with $h_1(x)$ ensuring that $g_t(x)$ satisfies some conditions and $h_2(x,N(x, P))$ an expression involving $x$ and the output from the neural network $N(x,P)$ with $P $ being the collection of the weights and biases for each layer. For now, it is assumed that the network consists of one input layer, one hidden layer, and one output layer." + ] + }, + { + "cell_type": "markdown", + "id": "62f3b94f", + "metadata": { + "editable": true + }, + "source": [ + "## Setup of Network\n", + "\n", + "In this network, there are no weights and bias at the input layer, so $P = \\{ P_{\\text{hidden}}, P_{\\text{output}} \\}$.\n", + "If there are $N_{\\text{hidden} }$ neurons in the hidden layer, then $P_{\\text{hidden}}$ is a $N_{\\text{hidden} } \\times (1 + N_{\\text{input}})$ matrix, given that there are $N_{\\text{input}}$ neurons in the input layer.\n", + "\n", + "The first column in $P_{\\text{hidden} }$ represents the bias for each neuron in the hidden layer and the second column represents the weights for each neuron in the hidden layer from the input layer.\n", + "If there are $N_{\\text{output} }$ neurons in the output layer, then $P_{\\text{output}} $ is a $N_{\\text{output} } \\times (1 + N_{\\text{hidden} })$ matrix.\n", + "\n", + "Its first column represents the bias of each neuron and the remaining columns represents the weights to each neuron.\n", + "\n", + "It is given that $g(0) = g_0$. The trial solution must fulfill this condition to be a proper solution of ([6](#solveode)). A possible way to ensure that $g_t(0, P) = g_0$, is to let $F(N(x,P)) = x \\cdot N(x,P)$ and $A(x) = g_0$. This gives the following trial solution:" + ] + }, + { + "cell_type": "markdown", + "id": "f5144858", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation} \\label{trial} \\tag{7}\n", + "g_t(x, P) = g_0 + x \\cdot N(x, P)\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "6b441362", + "metadata": { + "editable": true + }, + "source": [ + "## Reformulating the problem\n", + "\n", + "We wish that our neural network manages to minimize a given cost function.\n", + "\n", + "A reformulation of out equation, ([6](#solveode)), must therefore be done,\n", + "such that it describes the problem a neural network can solve for.\n", + "\n", + "The neural network must find the set of weights and biases $P$ such that the trial solution in ([7](#trial)) satisfies ([6](#solveode)).\n", + "\n", + "The trial solution" + ] + }, + { + "cell_type": "markdown", + "id": "abfe2d6d", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "g_t(x, P) = g_0 + x \\cdot N(x, P)\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "aabb6c7b", + "metadata": { + "editable": true + }, + "source": [ + "has been chosen such that it already solves the condition $g(0) = g_0$. What remains, is to find $P$ such that" + ] + }, + { + "cell_type": "markdown", + "id": "11fc8b1b", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation} \\label{nnmin} \\tag{8}\n", + "g_t'(x, P) = - \\gamma g_t(x, P)\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "604c92b4", + "metadata": { + "editable": true + }, + "source": [ + "is fulfilled as *best as possible*." + ] + }, + { + "cell_type": "markdown", + "id": "e2cd7572", + "metadata": { + "editable": true + }, + "source": [ + "## More technicalities\n", + "\n", + "The left hand side and right hand side of ([8](#nnmin)) must be computed separately, and then the neural network must choose weights and biases, contained in $P$, such that the sides are equal as best as possible.\n", + "This means that the absolute or squared difference between the sides must be as close to zero, ideally equal to zero.\n", + "In this case, the difference squared shows to be an appropriate measurement of how erroneous the trial solution is with respect to $P$ of the neural network.\n", + "\n", + "This gives the following cost function our neural network must solve for:" + ] + }, + { + "cell_type": "markdown", + "id": "d916a5f6", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\min_{P}\\Big\\{ \\big(g_t'(x, P) - ( -\\gamma g_t(x, P) \\big)^2 \\Big\\}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "d746e69c", + "metadata": { + "editable": true + }, + "source": [ + "(the notation $\\min_{P}\\{ f(x, P) \\}$ means that we desire to find $P$ that yields the minimum of $f(x, P)$)\n", + "\n", + "or, in terms of weights and biases for the hidden and output layer in our network:" + ] + }, + { + "cell_type": "markdown", + "id": "4c34c242", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\min_{P_{\\text{hidden} }, \\ P_{\\text{output} }}\\Big\\{ \\big(g_t'(x, \\{ P_{\\text{hidden} }, P_{\\text{output} }\\}) - ( -\\gamma g_t(x, \\{ P_{\\text{hidden} }, P_{\\text{output} }\\}) \\big)^2 \\Big\\}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "f55f3047", + "metadata": { + "editable": true + }, + "source": [ + "for an input value $x$." + ] + }, + { + "cell_type": "markdown", + "id": "485e4671", + "metadata": { + "editable": true + }, + "source": [ + "## More details\n", + "\n", + "If the neural network evaluates $g_t(x, P)$ at more values for $x$, say $N$ values $x_i$ for $i = 1, \\dots, N$, then the *total* error to minimize becomes" + ] + }, + { + "cell_type": "markdown", + "id": "5628ca35", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation} \\label{min} \\tag{9}\n", + "\\min_{P}\\Big\\{\\frac{1}{N} \\sum_{i=1}^N \\big(g_t'(x_i, P) - ( -\\gamma g_t(x_i, P) \\big)^2 \\Big\\}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "da2c90ea", + "metadata": { + "editable": true + }, + "source": [ + "Letting $\\boldsymbol{x}$ be a vector with elements $x_i$ and $C(\\boldsymbol{x}, P) = \\frac{1}{N} \\sum_i \\big(g_t'(x_i, P) - ( -\\gamma g_t(x_i, P) \\big)^2$ denote the cost function, the minimization problem that our network must solve, becomes" + ] + }, + { + "cell_type": "markdown", + "id": "d386a466", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\min_{P} C(\\boldsymbol{x}, P)\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "ec3d975a", + "metadata": { + "editable": true + }, + "source": [ + "In terms of $P_{\\text{hidden} }$ and $P_{\\text{output} }$, this could also be expressed as\n", + "\n", + "$$\n", + "\\min_{P_{\\text{hidden} }, \\ P_{\\text{output} }} C(\\boldsymbol{x}, \\{P_{\\text{hidden} }, P_{\\text{output} }\\})\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "4f0f47e7", + "metadata": { + "editable": true + }, + "source": [ + "## A possible implementation of a neural network\n", + "\n", + "For simplicity, it is assumed that the input is an array $\\boldsymbol{x} = (x_1, \\dots, x_N)$ with $N$ elements. It is at these points the neural network should find $P$ such that it fulfills ([9](#min)).\n", + "\n", + "First, the neural network must feed forward the inputs.\n", + "This means that $\\boldsymbol{x}s$ must be passed through an input layer, a hidden layer and a output layer. The input layer in this case, does not need to process the data any further.\n", + "The input layer will consist of $N_{\\text{input} }$ neurons, passing its element to each neuron in the hidden layer. The number of neurons in the hidden layer will be $N_{\\text{hidden} }$." 
+ ] + }, + { + "cell_type": "markdown", + "id": "a757d9cf", + "metadata": { + "editable": true + }, + "source": [ + "## Technicalities\n", + "\n", + "For the $i$-th in the hidden layer with weight $w_i^{\\text{hidden} }$ and bias $b_i^{\\text{hidden} }$, the weighting from the $j$-th neuron at the input layer is:" + ] + }, + { + "cell_type": "markdown", + "id": "ee093dd9", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{aligned}\n", + "z_{i,j}^{\\text{hidden}} &= b_i^{\\text{hidden}} + w_i^{\\text{hidden}}x_j \\\\\n", + "&=\n", + "\\begin{pmatrix}\n", + "b_i^{\\text{hidden}} & w_i^{\\text{hidden}}\n", + "\\end{pmatrix}\n", + "\\begin{pmatrix}\n", + "1 \\\\\n", + "x_j\n", + "\\end{pmatrix}\n", + "\\end{aligned}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "4d3954bf", + "metadata": { + "editable": true + }, + "source": [ + "## Final technicalities I\n", + "\n", + "The result after weighting the inputs at the $i$-th hidden neuron can be written as a vector:" + ] + }, + { + "cell_type": "markdown", + "id": "b4b36b8c", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{aligned}\n", + "\\boldsymbol{z}_{i}^{\\text{hidden}} &= \\Big( b_i^{\\text{hidden}} + w_i^{\\text{hidden}}x_1 , \\ b_i^{\\text{hidden}} + w_i^{\\text{hidden}} x_2, \\ \\dots \\, , \\ b_i^{\\text{hidden}} + w_i^{\\text{hidden}} x_N\\Big) \\\\\n", + "&=\n", + "\\begin{pmatrix}\n", + " b_i^{\\text{hidden}} & w_i^{\\text{hidden}}\n", + "\\end{pmatrix}\n", + "\\begin{pmatrix}\n", + "1 & 1 & \\dots & 1 \\\\\n", + "x_1 & x_2 & \\dots & x_N\n", + "\\end{pmatrix} \\\\\n", + "&= \\boldsymbol{p}_{i, \\text{hidden}}^T X\n", + "\\end{aligned}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "36e8a1dd", + "metadata": { + "editable": true + }, + "source": [ + "## Final technicalities II\n", + "\n", + "The vector $\\boldsymbol{p}_{i, \\text{hidden}}^T$ constitutes each row in $P_{\\text{hidden} }$, which contains the weights for the neural network to minimize according to ([9](#min)).\n", + "\n", + "After having found $\\boldsymbol{z}_{i}^{\\text{hidden}} $ for every $i$-th neuron within the hidden layer, the vector will be sent to an activation function $a_i(\\boldsymbol{z})$.\n", + "\n", + "In this example, the sigmoid function has been chosen to be the activation function for each hidden neuron:" + ] + }, + { + "cell_type": "markdown", + "id": "af2e68be", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "f(z) = \\frac{1}{1 + \\exp{(-z)}}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "7b8922c6", + "metadata": { + "editable": true + }, + "source": [ + "It is possible to use other activations functions for the hidden layer also.\n", + "\n", + "The output $\\boldsymbol{x}_i^{\\text{hidden}}$ from each $i$-th hidden neuron is:\n", + "\n", + "$$\n", + "\\boldsymbol{x}_i^{\\text{hidden} } = f\\big( \\boldsymbol{z}_{i}^{\\text{hidden}} \\big)\n", + "$$\n", + "\n", + "The outputs $\\boldsymbol{x}_i^{\\text{hidden} } $ are then sent to the output layer.\n", + "\n", + "The output layer consists of one neuron in this case, and combines the\n", + "output from each of the neurons in the hidden layers. The output layer\n", + "combines the results from the hidden layer using some weights $w_i^{\\text{output}}$\n", + "and biases $b_i^{\\text{output}}$. In this case,\n", + "it is assumes that the number of neurons in the output layer is one." 
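+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "hidden-layer-sketch-note",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "As a concrete illustration of the hidden-layer step just described, the sketch below builds the matrix $X$ with a row of ones, applies $P_{\text{hidden}}$ and then the sigmoid. This is a minimal NumPy sketch with made-up sizes ($N = 5$ inputs, $N_{\text{hidden}} = 3$ neurons), not the solver developed later."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "hidden-layer-sketch-code",
+   "metadata": {
+    "collapsed": false,
+    "editable": true
+   },
+   "outputs": [],
+   "source": [
+    "import numpy as np\n",
+    "\n",
+    "x = np.linspace(0, 1, 5)                     # N = 5 input points\n",
+    "X = np.vstack((np.ones_like(x), x))          # shape (2, N); the row of ones multiplies the biases\n",
+    "P_hidden = np.random.randn(3, 2)             # row i holds (b_i^hidden, w_i^hidden)\n",
+    "\n",
+    "z_hidden = P_hidden @ X                      # z_i^hidden for every neuron i and input x_j, shape (3, N)\n",
+    "x_hidden = 1.0 / (1.0 + np.exp(-z_hidden))   # sigmoid activation of each hidden neuron"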
+ ] + }, + { + "cell_type": "markdown", + "id": "2aa977d9", + "metadata": { + "editable": true + }, + "source": [ + "## Final technicalities III\n", + "\n", + "The procedure of weighting the output neuron $j$ in the hidden layer to the $i$-th neuron in the output layer is similar as for the hidden layer described previously." + ] + }, + { + "cell_type": "markdown", + "id": "48eccfa6", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{aligned}\n", + "z_{1,j}^{\\text{output}} & =\n", + "\\begin{pmatrix}\n", + "b_1^{\\text{output}} & \\boldsymbol{w}_1^{\\text{output}}\n", + "\\end{pmatrix}\n", + "\\begin{pmatrix}\n", + "1 \\\\\n", + "\\boldsymbol{x}_j^{\\text{hidden}}\n", + "\\end{pmatrix}\n", + "\\end{aligned}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "d4c2cdbf", + "metadata": { + "editable": true + }, + "source": [ + "## Final technicalities IV\n", + "\n", + "Expressing $z_{1,j}^{\\text{output}}$ as a vector gives the following way of weighting the inputs from the hidden layer:" + ] + }, + { + "cell_type": "markdown", + "id": "be26d9c9", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{z}_{1}^{\\text{output}} =\n", + "\\begin{pmatrix}\n", + "b_1^{\\text{output}} & \\boldsymbol{w}_1^{\\text{output}}\n", + "\\end{pmatrix}\n", + "\\begin{pmatrix}\n", + "1 & 1 & \\dots & 1 \\\\\n", + "\\boldsymbol{x}_1^{\\text{hidden}} & \\boldsymbol{x}_2^{\\text{hidden}} & \\dots & \\boldsymbol{x}_N^{\\text{hidden}}\n", + "\\end{pmatrix}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "f3703c9a", + "metadata": { + "editable": true + }, + "source": [ + "In this case we seek a continuous range of values since we are approximating a function. This means that after computing $\\boldsymbol{z}_{1}^{\\text{output}}$ the neural network has finished its feed forward step, and $\\boldsymbol{z}_{1}^{\\text{output}}$ is the final output of the network." + ] + }, + { + "cell_type": "markdown", + "id": "9859680c", + "metadata": { + "editable": true + }, + "source": [ + "## Back propagation\n", + "\n", + "The next step is to decide how the parameters should be changed such that they minimize the cost function.\n", + "\n", + "The chosen cost function for this problem is" + ] + }, + { + "cell_type": "markdown", + "id": "c3df269d", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C(\\boldsymbol{x}, P) = \\frac{1}{N} \\sum_i \\big(g_t'(x_i, P) - ( -\\gamma g_t(x_i, P) \\big)^2\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "dc69023a", + "metadata": { + "editable": true + }, + "source": [ + "In order to minimize the cost function, an optimization method must be chosen.\n", + "\n", + "Here, gradient descent with a constant step size has been chosen." 
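+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "autograd-tools-note",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "Before spelling out the update rule, note the two autograd tools the program below relies on: `grad`, which returns the gradient of a scalar cost with respect to a chosen argument (here the list of parameter arrays $P$), and `elementwise_grad`, which differentiates element by element with respect to an input array (used for $g_t'(x, P)$). The cell below is only a tiny, self-contained sketch with a toy cost and toy data, not the ODE cost itself."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "autograd-tools-code",
+   "metadata": {
+    "collapsed": false,
+    "editable": true
+   },
+   "outputs": [],
+   "source": [
+    "import autograd.numpy as np\n",
+    "from autograd import grad, elementwise_grad\n",
+    "\n",
+    "def toy_cost(P, x):\n",
+    "    # A toy least-squares cost with two scalar parameters collected in a list\n",
+    "    w, b = P\n",
+    "    return np.mean((w * x + b - np.sin(x)) ** 2)\n",
+    "\n",
+    "x = np.linspace(0, 1, 10)\n",
+    "P = [np.array(1.0), np.array(0.0)]\n",
+    "\n",
+    "dC_dP = grad(toy_cost, 0)(P, x)          # list with dC/dw and dC/db\n",
+    "dsin_dx = elementwise_grad(np.sin)(x)    # elementwise derivative w.r.t. the input array"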
+ ] + }, + { + "cell_type": "markdown", + "id": "d4bed3bd", + "metadata": { + "editable": true + }, + "source": [ + "## Gradient descent\n", + "\n", + "The idea of the gradient descent algorithm is to update parameters in\n", + "a direction where the cost function decreases goes to a minimum.\n", + "\n", + "In general, the update of some parameters $\\boldsymbol{\\omega}$ given a cost\n", + "function defined by some weights $\\boldsymbol{\\omega}$, $C(\\boldsymbol{x},\n", + "\\boldsymbol{\\omega})$, goes as follows:" + ] + }, + { + "cell_type": "markdown", + "id": "ed2a4f9a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{\\omega}_{\\text{new} } = \\boldsymbol{\\omega} - \\lambda \\nabla_{\\boldsymbol{\\omega}} C(\\boldsymbol{x}, \\boldsymbol{\\omega})\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b9a4f604", + "metadata": { + "editable": true + }, + "source": [ + "for a number of iterations or until $ \\big|\\big| \\boldsymbol{\\omega}_{\\text{new} } - \\boldsymbol{\\omega} \\big|\\big|$ becomes smaller than some given tolerance.\n", + "\n", + "The value of $\\lambda$ decides how large steps the algorithm must take\n", + "in the direction of $ \\nabla_{\\boldsymbol{\\omega}} C(\\boldsymbol{x}, \\boldsymbol{\\omega})$.\n", + "The notation $\\nabla_{\\boldsymbol{\\omega}}$ express the gradient with respect\n", + "to the elements in $\\boldsymbol{\\omega}$.\n", + "\n", + "In our case, we have to minimize the cost function $C(\\boldsymbol{x}, P)$ with\n", + "respect to the two sets of weights and biases, that is for the hidden\n", + "layer $P_{\\text{hidden} }$ and for the output layer $P_{\\text{output}\n", + "}$ .\n", + "\n", + "This means that $P_{\\text{hidden} }$ and $P_{\\text{output} }$ is updated by" + ] + }, + { + "cell_type": "markdown", + "id": "e48d507f", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{aligned}\n", + "P_{\\text{hidden},\\text{new}} &= P_{\\text{hidden}} - \\lambda \\nabla_{P_{\\text{hidden}}} C(\\boldsymbol{x}, P) \\\\\n", + "P_{\\text{output},\\text{new}} &= P_{\\text{output}} - \\lambda \\nabla_{P_{\\text{output}}} C(\\boldsymbol{x}, P)\n", + "\\end{aligned}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b84c5cf5", + "metadata": { + "editable": true + }, + "source": [ + "## The code for solving the ODE" + ] + }, + { + "cell_type": "code", + "execution_count": 31, + "id": "293d0f7d", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import autograd.numpy as np\n", + "from autograd import grad, elementwise_grad\n", + "import autograd.numpy.random as npr\n", + "from matplotlib import pyplot as plt\n", + "\n", + "def sigmoid(z):\n", + " return 1/(1 + np.exp(-z))\n", + "\n", + "# Assuming one input, hidden, and output layer\n", + "def neural_network(params, x):\n", + "\n", + " # Find the weights (including and biases) for the hidden and output layer.\n", + " # Assume that params is a list of parameters for each layer.\n", + " # The biases are the first element for each array in params,\n", + " # and the weights are the remaning elements in each array in params.\n", + "\n", + " w_hidden = params[0]\n", + " w_output = params[1]\n", + "\n", + " # Assumes input x being an one-dimensional array\n", + " num_values = np.size(x)\n", + " x = x.reshape(-1, num_values)\n", + "\n", + " # Assume that the input layer does nothing to the input x\n", + " x_input = x\n", + "\n", + " ## Hidden layer:\n", + "\n", + " # Add a row of ones to include bias\n", 
+ " x_input = np.concatenate((np.ones((1,num_values)), x_input ), axis = 0)\n", + "\n", + " z_hidden = np.matmul(w_hidden, x_input)\n", + " x_hidden = sigmoid(z_hidden)\n", + "\n", + " ## Output layer:\n", + "\n", + " # Include bias:\n", + " x_hidden = np.concatenate((np.ones((1,num_values)), x_hidden ), axis = 0)\n", + "\n", + " z_output = np.matmul(w_output, x_hidden)\n", + " x_output = z_output\n", + "\n", + " return x_output\n", + "\n", + "# The trial solution using the deep neural network:\n", + "def g_trial(x,params, g0 = 10):\n", + " return g0 + x*neural_network(params,x)\n", + "\n", + "# The right side of the ODE:\n", + "def g(x, g_trial, gamma = 2):\n", + " return -gamma*g_trial\n", + "\n", + "# The cost function:\n", + "def cost_function(P, x):\n", + "\n", + " # Evaluate the trial function with the current parameters P\n", + " g_t = g_trial(x,P)\n", + "\n", + " # Find the derivative w.r.t x of the neural network\n", + " d_net_out = elementwise_grad(neural_network,1)(P,x)\n", + "\n", + " # Find the derivative w.r.t x of the trial function\n", + " d_g_t = elementwise_grad(g_trial,0)(x,P)\n", + "\n", + " # The right side of the ODE\n", + " func = g(x, g_t)\n", + "\n", + " err_sqr = (d_g_t - func)**2\n", + " cost_sum = np.sum(err_sqr)\n", + "\n", + " return cost_sum / np.size(err_sqr)\n", + "\n", + "# Solve the exponential decay ODE using neural network with one input, hidden, and output layer\n", + "def solve_ode_neural_network(x, num_neurons_hidden, num_iter, lmb):\n", + " ## Set up initial weights and biases\n", + "\n", + " # For the hidden layer\n", + " p0 = npr.randn(num_neurons_hidden, 2 )\n", + "\n", + " # For the output layer\n", + " p1 = npr.randn(1, num_neurons_hidden + 1 ) # +1 since bias is included\n", + "\n", + " P = [p0, p1]\n", + "\n", + " print('Initial cost: %g'%cost_function(P, x))\n", + "\n", + " ## Start finding the optimal weights using gradient descent\n", + "\n", + " # Find the Python function that represents the gradient of the cost function\n", + " # w.r.t the 0-th input argument -- that is the weights and biases in the hidden and output layer\n", + " cost_function_grad = grad(cost_function,0)\n", + "\n", + " # Let the update be done num_iter times\n", + " for i in range(num_iter):\n", + " # Evaluate the gradient at the current weights and biases in P.\n", + " # The cost_grad consist now of two arrays;\n", + " # one for the gradient w.r.t P_hidden and\n", + " # one for the gradient w.r.t P_output\n", + " cost_grad = cost_function_grad(P, x)\n", + "\n", + " P[0] = P[0] - lmb * cost_grad[0]\n", + " P[1] = P[1] - lmb * cost_grad[1]\n", + "\n", + " print('Final cost: %g'%cost_function(P, x))\n", + "\n", + " return P\n", + "\n", + "def g_analytic(x, gamma = 2, g0 = 10):\n", + " return g0*np.exp(-gamma*x)\n", + "\n", + "# Solve the given problem\n", + "if __name__ == '__main__':\n", + " # Set seed such that the weight are initialized\n", + " # with same weights and biases for every run.\n", + " npr.seed(15)\n", + "\n", + " ## Decide the vales of arguments to the function to solve\n", + " N = 10\n", + " x = np.linspace(0, 1, N)\n", + "\n", + " ## Set up the initial parameters\n", + " num_hidden_neurons = 10\n", + " num_iter = 10000\n", + " lmb = 0.001\n", + "\n", + " # Use the network\n", + " P = solve_ode_neural_network(x, num_hidden_neurons, num_iter, lmb)\n", + "\n", + " # Print the deviation from the trial solution and true solution\n", + " res = g_trial(x,P)\n", + " res_analytical = g_analytic(x)\n", + "\n", + " print('Max absolute difference: 
%g'%np.max(np.abs(res - res_analytical)))\n", + "\n", + " # Plot the results\n", + " plt.figure(figsize=(10,10))\n", + "\n", + " plt.title('Performance of neural network solving an ODE compared to the analytical solution')\n", + " plt.plot(x, res_analytical)\n", + " plt.plot(x, res[0,:])\n", + " plt.legend(['analytical','nn'])\n", + " plt.xlabel('x')\n", + " plt.ylabel('g(x)')\n", + " plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "54c070e1", + "metadata": { + "editable": true + }, + "source": [ + "## The network with one input layer, specified number of hidden layers, and one output layer\n", + "\n", + "It is also possible to extend the construction of our network into a more general one, allowing the network to contain more than one hidden layers.\n", + "\n", + "The number of neurons within each hidden layer are given as a list of integers in the program below." + ] + }, + { + "cell_type": "code", + "execution_count": 32, + "id": "4ab2467e", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import autograd.numpy as np\n", + "from autograd import grad, elementwise_grad\n", + "import autograd.numpy.random as npr\n", + "from matplotlib import pyplot as plt\n", + "\n", + "def sigmoid(z):\n", + " return 1/(1 + np.exp(-z))\n", + "\n", + "# The neural network with one input layer and one output layer,\n", + "# but with number of hidden layers specified by the user.\n", + "def deep_neural_network(deep_params, x):\n", + " # N_hidden is the number of hidden layers\n", + " # deep_params is a list, len() should be used\n", + " N_hidden = len(deep_params) - 1 # -1 since params consists of\n", + " # parameters to all the hidden\n", + " # layers AND the output layer.\n", + "\n", + " # Assumes input x being an one-dimensional array\n", + " num_values = np.size(x)\n", + " x = x.reshape(-1, num_values)\n", + "\n", + " # Assume that the input layer does nothing to the input x\n", + " x_input = x\n", + "\n", + " # Due to multiple hidden layers, define a variable referencing to the\n", + " # output of the previous layer:\n", + " x_prev = x_input\n", + "\n", + " ## Hidden layers:\n", + "\n", + " for l in range(N_hidden):\n", + " # From the list of parameters P; find the correct weigths and bias for this layer\n", + " w_hidden = deep_params[l]\n", + "\n", + " # Add a row of ones to include bias\n", + " x_prev = np.concatenate((np.ones((1,num_values)), x_prev ), axis = 0)\n", + "\n", + " z_hidden = np.matmul(w_hidden, x_prev)\n", + " x_hidden = sigmoid(z_hidden)\n", + "\n", + " # Update x_prev such that next layer can use the output from this layer\n", + " x_prev = x_hidden\n", + "\n", + " ## Output layer:\n", + "\n", + " # Get the weights and bias for this layer\n", + " w_output = deep_params[-1]\n", + "\n", + " # Include bias:\n", + " x_prev = np.concatenate((np.ones((1,num_values)), x_prev), axis = 0)\n", + "\n", + " z_output = np.matmul(w_output, x_prev)\n", + " x_output = z_output\n", + "\n", + " return x_output\n", + "\n", + "# The trial solution using the deep neural network:\n", + "def g_trial_deep(x,params, g0 = 10):\n", + " return g0 + x*deep_neural_network(params, x)\n", + "\n", + "# The right side of the ODE:\n", + "def g(x, g_trial, gamma = 2):\n", + " return -gamma*g_trial\n", + "\n", + "# The same cost function as before, but calls deep_neural_network instead.\n", + "def cost_function_deep(P, x):\n", + "\n", + " # Evaluate the trial function with the current parameters P\n", + " g_t = g_trial_deep(x,P)\n", + "\n", + " # Find the 
derivative w.r.t x of the neural network\n", + " d_net_out = elementwise_grad(deep_neural_network,1)(P,x)\n", + "\n", + " # Find the derivative w.r.t x of the trial function\n", + " d_g_t = elementwise_grad(g_trial_deep,0)(x,P)\n", + "\n", + " # The right side of the ODE\n", + " func = g(x, g_t)\n", + "\n", + " err_sqr = (d_g_t - func)**2\n", + " cost_sum = np.sum(err_sqr)\n", + "\n", + " return cost_sum / np.size(err_sqr)\n", + "\n", + "# Solve the exponential decay ODE using neural network with one input and one output layer,\n", + "# but with specified number of hidden layers from the user.\n", + "def solve_ode_deep_neural_network(x, num_neurons, num_iter, lmb):\n", + " # num_hidden_neurons is now a list of number of neurons within each hidden layer\n", + "\n", + " # The number of elements in the list num_hidden_neurons thus represents\n", + " # the number of hidden layers.\n", + "\n", + " # Find the number of hidden layers:\n", + " N_hidden = np.size(num_neurons)\n", + "\n", + " ## Set up initial weights and biases\n", + "\n", + " # Initialize the list of parameters:\n", + " P = [None]*(N_hidden + 1) # + 1 to include the output layer\n", + "\n", + " P[0] = npr.randn(num_neurons[0], 2 )\n", + " for l in range(1,N_hidden):\n", + " P[l] = npr.randn(num_neurons[l], num_neurons[l-1] + 1) # +1 to include bias\n", + "\n", + " # For the output layer\n", + " P[-1] = npr.randn(1, num_neurons[-1] + 1 ) # +1 since bias is included\n", + "\n", + " print('Initial cost: %g'%cost_function_deep(P, x))\n", + "\n", + " ## Start finding the optimal weights using gradient descent\n", + "\n", + " # Find the Python function that represents the gradient of the cost function\n", + " # w.r.t the 0-th input argument -- that is the weights and biases in the hidden and output layer\n", + " cost_function_deep_grad = grad(cost_function_deep,0)\n", + "\n", + " # Let the update be done num_iter times\n", + " for i in range(num_iter):\n", + " # Evaluate the gradient at the current weights and biases in P.\n", + " # The cost_grad consist now of N_hidden + 1 arrays; the gradient w.r.t the weights and biases\n", + " # in the hidden layers and output layers evaluated at x.\n", + " cost_deep_grad = cost_function_deep_grad(P, x)\n", + "\n", + " for l in range(N_hidden+1):\n", + " P[l] = P[l] - lmb * cost_deep_grad[l]\n", + "\n", + " print('Final cost: %g'%cost_function_deep(P, x))\n", + "\n", + " return P\n", + "\n", + "def g_analytic(x, gamma = 2, g0 = 10):\n", + " return g0*np.exp(-gamma*x)\n", + "\n", + "# Solve the given problem\n", + "if __name__ == '__main__':\n", + " npr.seed(15)\n", + "\n", + " ## Decide the vales of arguments to the function to solve\n", + " N = 10\n", + " x = np.linspace(0, 1, N)\n", + "\n", + " ## Set up the initial parameters\n", + " num_hidden_neurons = np.array([10,10])\n", + " num_iter = 10000\n", + " lmb = 0.001\n", + "\n", + " P = solve_ode_deep_neural_network(x, num_hidden_neurons, num_iter, lmb)\n", + "\n", + " res = g_trial_deep(x,P)\n", + " res_analytical = g_analytic(x)\n", + "\n", + " plt.figure(figsize=(10,10))\n", + "\n", + " plt.title('Performance of a deep neural network solving an ODE compared to the analytical solution')\n", + " plt.plot(x, res_analytical)\n", + " plt.plot(x, res[0,:])\n", + " plt.legend(['analytical','dnn'])\n", + " plt.ylabel('g(x)')\n", + " plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "05126a03", + "metadata": { + "editable": true + }, + "source": [ + "## Example: Population growth\n", + "\n", + "A logistic model of population growth assumes 
that a population converges toward an equilibrium.\n", + "The population growth can be modeled by" + ] + }, + { + "cell_type": "markdown", + "id": "7b4e9871", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation} \\label{log} \\tag{10}\n", + "\tg'(t) = \\alpha g(t)(A - g(t))\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "20266e3a", + "metadata": { + "editable": true + }, + "source": [ + "where $g(t)$ is the population density at time $t$, $\\alpha > 0$ the growth rate and $A > 0$ is the maximum population number in the environment.\n", + "Also, at $t = 0$ the population has the size $g(0) = g_0$, where $g_0$ is some chosen constant.\n", + "\n", + "In this example, similar network as for the exponential decay using Autograd has been used to solve the equation. However, as the implementation might suffer from e.g numerical instability\n", + "and high execution time (this might be more apparent in the examples solving PDEs),\n", + "using a library like TensorFlow is recommended.\n", + "Here, we stay with a more simple approach and implement for comparison, the simple forward Euler method." + ] + }, + { + "cell_type": "markdown", + "id": "8a3f1b3d", + "metadata": { + "editable": true + }, + "source": [ + "## Setting up the problem\n", + "\n", + "Here, we will model a population $g(t)$ in an environment having carrying capacity $A$.\n", + "The population follows the model" + ] + }, + { + "cell_type": "markdown", + "id": "14dfc04b", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation} \\label{solveode_population} \\tag{11}\n", + "g'(t) = \\alpha g(t)(A - g(t))\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b125d1d3", + "metadata": { + "editable": true + }, + "source": [ + "where $g(0) = g_0$.\n", + "\n", + "In this example, we let $\\alpha = 2$, $A = 1$, and $g_0 = 1.2$." + ] + }, + { + "cell_type": "markdown", + "id": "226a3528", + "metadata": { + "editable": true + }, + "source": [ + "## The trial solution\n", + "\n", + "We will get a slightly different trial solution, as the boundary conditions are different\n", + "compared to the case for exponential decay.\n", + "\n", + "A possible trial solution satisfying the condition $g(0) = g_0$ could be\n", + "\n", + "$$\n", + "h_1(t) = g_0 + t \\cdot N(t,P)\n", + "$$\n", + "\n", + "with $N(t,P)$ being the output from the neural network with weights and biases for each layer collected in the set $P$.\n", + "\n", + "The analytical solution is\n", + "\n", + "$$\n", + "g(t) = \\frac{Ag_0}{g_0 + (A - g_0)\\exp(-\\alpha A t)}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "adeeb731", + "metadata": { + "editable": true + }, + "source": [ + "## The program using Autograd\n", + "\n", + "The network will be the similar as for the exponential decay example, but with some small modifications for our problem." + ] + }, + { + "cell_type": "code", + "execution_count": 33, + "id": "eb3ed6d1", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import autograd.numpy as np\n", + "from autograd import grad, elementwise_grad\n", + "import autograd.numpy.random as npr\n", + "from matplotlib import pyplot as plt\n", + "\n", + "def sigmoid(z):\n", + " return 1/(1 + np.exp(-z))\n", + "\n", + "# Function to get the parameters.\n", + "# Done such that one can easily change the paramaters after one's liking.\n", + "def get_parameters():\n", + " alpha = 2\n", + " A = 1\n", + " g0 = 1.2\n", + " return alpha, A, g0\n", + "\n", + "def deep_neural_network(deep_params, x):\n", + " # N_hidden is the number of hidden layers\n", + " # deep_params is a list, len() should be used\n", + " N_hidden = len(deep_params) - 1 # -1 since params consists of\n", + " # parameters to all the hidden\n", + " # layers AND the output layer.\n", + "\n", + " # Assumes input x being an one-dimensional array\n", + " num_values = np.size(x)\n", + " x = x.reshape(-1, num_values)\n", + "\n", + " # Assume that the input layer does nothing to the input x\n", + " x_input = x\n", + "\n", + " # Due to multiple hidden layers, define a variable referencing to the\n", + " # output of the previous layer:\n", + " x_prev = x_input\n", + "\n", + " ## Hidden layers:\n", + "\n", + " for l in range(N_hidden):\n", + " # From the list of parameters P; find the correct weigths and bias for this layer\n", + " w_hidden = deep_params[l]\n", + "\n", + " # Add a row of ones to include bias\n", + " x_prev = np.concatenate((np.ones((1,num_values)), x_prev ), axis = 0)\n", + "\n", + " z_hidden = np.matmul(w_hidden, x_prev)\n", + " x_hidden = sigmoid(z_hidden)\n", + "\n", + " # Update x_prev such that next layer can use the output from this layer\n", + " x_prev = x_hidden\n", + "\n", + " ## Output layer:\n", + "\n", + " # Get the weights and bias for this layer\n", + " w_output = deep_params[-1]\n", + "\n", + " # Include bias:\n", + " x_prev = np.concatenate((np.ones((1,num_values)), x_prev), axis = 0)\n", + "\n", + " z_output = np.matmul(w_output, x_prev)\n", + " x_output 
= z_output\n", + "\n", + " return x_output\n", + "\n", + "\n", + "\n", + "\n", + "def cost_function_deep(P, x):\n", + "\n", + " # Evaluate the trial function with the current parameters P\n", + " g_t = g_trial_deep(x,P)\n", + "\n", + " # Find the derivative w.r.t x of the trial function\n", + " d_g_t = elementwise_grad(g_trial_deep,0)(x,P)\n", + "\n", + " # The right side of the ODE\n", + " func = f(x, g_t)\n", + "\n", + " err_sqr = (d_g_t - func)**2\n", + " cost_sum = np.sum(err_sqr)\n", + "\n", + " return cost_sum / np.size(err_sqr)\n", + "\n", + "# The right side of the ODE:\n", + "def f(x, g_trial):\n", + " alpha,A, g0 = get_parameters()\n", + " return alpha*g_trial*(A - g_trial)\n", + "\n", + "# The trial solution using the deep neural network:\n", + "def g_trial_deep(x, params):\n", + " alpha,A, g0 = get_parameters()\n", + " return g0 + x*deep_neural_network(params,x)\n", + "\n", + "# The analytical solution:\n", + "def g_analytic(t):\n", + " alpha,A, g0 = get_parameters()\n", + " return A*g0/(g0 + (A - g0)*np.exp(-alpha*A*t))\n", + "\n", + "def solve_ode_deep_neural_network(x, num_neurons, num_iter, lmb):\n", + " # num_hidden_neurons is now a list of number of neurons within each hidden layer\n", + "\n", + " # Find the number of hidden layers:\n", + " N_hidden = np.size(num_neurons)\n", + "\n", + " ## Set up initial weigths and biases\n", + "\n", + " # Initialize the list of parameters:\n", + " P = [None]*(N_hidden + 1) # + 1 to include the output layer\n", + "\n", + " P[0] = npr.randn(num_neurons[0], 2 )\n", + " for l in range(1,N_hidden):\n", + " P[l] = npr.randn(num_neurons[l], num_neurons[l-1] + 1) # +1 to include bias\n", + "\n", + " # For the output layer\n", + " P[-1] = npr.randn(1, num_neurons[-1] + 1 ) # +1 since bias is included\n", + "\n", + " print('Initial cost: %g'%cost_function_deep(P, x))\n", + "\n", + " ## Start finding the optimal weigths using gradient descent\n", + "\n", + " # Find the Python function that represents the gradient of the cost function\n", + " # w.r.t the 0-th input argument -- that is the weights and biases in the hidden and output layer\n", + " cost_function_deep_grad = grad(cost_function_deep,0)\n", + "\n", + " # Let the update be done num_iter times\n", + " for i in range(num_iter):\n", + " # Evaluate the gradient at the current weights and biases in P.\n", + " # The cost_grad consist now of N_hidden + 1 arrays; the gradient w.r.t the weights and biases\n", + " # in the hidden layers and output layers evaluated at x.\n", + " cost_deep_grad = cost_function_deep_grad(P, x)\n", + "\n", + " for l in range(N_hidden+1):\n", + " P[l] = P[l] - lmb * cost_deep_grad[l]\n", + "\n", + " print('Final cost: %g'%cost_function_deep(P, x))\n", + "\n", + " return P\n", + "\n", + "if __name__ == '__main__':\n", + " npr.seed(4155)\n", + "\n", + " ## Decide the vales of arguments to the function to solve\n", + " Nt = 10\n", + " T = 1\n", + " t = np.linspace(0,T, Nt)\n", + "\n", + " ## Set up the initial parameters\n", + " num_hidden_neurons = [100, 50, 25]\n", + " num_iter = 1000\n", + " lmb = 1e-3\n", + "\n", + " P = solve_ode_deep_neural_network(t, num_hidden_neurons, num_iter, lmb)\n", + "\n", + " g_dnn_ag = g_trial_deep(t,P)\n", + " g_analytical = g_analytic(t)\n", + "\n", + " # Find the maximum absolute difference between the solutons:\n", + " diff_ag = np.max(np.abs(g_dnn_ag - g_analytical))\n", + " print(\"The max absolute difference between the solutions is: %g\"%diff_ag)\n", + "\n", + " plt.figure(figsize=(10,10))\n", + "\n", + " plt.title('Performance of 
neural network solving an ODE compared to the analytical solution')\n", + " plt.plot(t, g_analytical)\n", + " plt.plot(t, g_dnn_ag[0,:])\n", + " plt.legend(['analytical','nn'])\n", + " plt.xlabel('t')\n", + " plt.ylabel('g(t)')\n", + "\n", + " plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "2407df1c", + "metadata": { + "editable": true + }, + "source": [ + "## Using forward Euler to solve the ODE\n", + "\n", + "A straightforward way of solving an ODE numerically, is to use Euler's method.\n", + "\n", + "Euler's method uses Taylor series to approximate the value at a function $f$ at a step $\\Delta x$ from $x$:\n", + "\n", + "$$\n", + "f(x + \\Delta x) \\approx f(x) + \\Delta x f'(x)\n", + "$$\n", + "\n", + "In our case, using Euler's method to approximate the value of $g$ at a step $\\Delta t$ from $t$ yields" + ] + }, + { + "cell_type": "markdown", + "id": "e30d9840", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{aligned}\n", + " g(t + \\Delta t) &\\approx g(t) + \\Delta t g'(t) \\\\\n", + " &= g(t) + \\Delta t \\big(\\alpha g(t)(A - g(t))\\big)\n", + "\\end{aligned}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "4af6e338", + "metadata": { + "editable": true + }, + "source": [ + "along with the condition that $g(0) = g_0$.\n", + "\n", + "Let $t_i = i \\cdot \\Delta t$ where $\\Delta t = \\frac{T}{N_t-1}$ where $T$ is the final time our solver must solve for and $N_t$ the number of values for $t \\in [0, T]$ for $i = 0, \\dots, N_t-1$.\n", + "\n", + "For $i \\geq 1$, we have that" + ] + }, + { + "cell_type": "markdown", + "id": "606cf0d3", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{aligned}\n", + "t_i &= i\\Delta t \\\\\n", + "&= (i - 1)\\Delta t + \\Delta t \\\\\n", + "&= t_{i-1} + \\Delta t\n", + "\\end{aligned}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "3275ea67", + "metadata": { + "editable": true + }, + "source": [ + "Now, if $g_i = g(t_i)$ then" + ] + }, + { + "cell_type": "markdown", + "id": "8c36efec", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation}\n", + " \\begin{aligned}\n", + " g_i &= g(t_i) \\\\\n", + " &= g(t_{i-1} + \\Delta t) \\\\\n", + " &\\approx g(t_{i-1}) + \\Delta t \\big(\\alpha g(t_{i-1})(A - g(t_{i-1}))\\big) \\\\\n", + " &= g_{i-1} + \\Delta t \\big(\\alpha g_{i-1}(A - g_{i-1})\\big)\n", + " \\end{aligned}\n", + "\\end{equation} \\label{odenum} \\tag{12}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "5290cde6", + "metadata": { + "editable": true + }, + "source": [ + "for $i \\geq 1$ and $g_0 = g(t_0) = g(0) = g_0$.\n", + "\n", + "Equation ([12](#odenum)) could be implemented in the following way,\n", + "extending the program that uses the network using Autograd:" + ] + }, + { + "cell_type": "code", + "execution_count": 34, + "id": "d5488516", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# Assume that all function definitions from the example program using Autograd\n", + "# are located here.\n", + "\n", + "if __name__ == '__main__':\n", + " npr.seed(4155)\n", + "\n", + " ## Decide the vales of arguments to the function to solve\n", + " Nt = 10\n", + " T = 1\n", + " t = np.linspace(0,T, Nt)\n", + "\n", + " ## Set up the initial parameters\n", + " num_hidden_neurons = [100,50,25]\n", + " num_iter = 1000\n", + " lmb = 1e-3\n", + "\n", + " P = solve_ode_deep_neural_network(t, num_hidden_neurons, num_iter, lmb)\n", + "\n", + " g_dnn_ag = g_trial_deep(t,P)\n", + " g_analytical = g_analytic(t)\n", + "\n", + " # Find the maximum absolute difference between the solutons:\n", + " diff_ag = np.max(np.abs(g_dnn_ag - g_analytical))\n", + " print(\"The max absolute difference between the solutions is: %g\"%diff_ag)\n", + "\n", + " plt.figure(figsize=(10,10))\n", + "\n", + " plt.title('Performance of neural network solving an ODE compared to the analytical solution')\n", + " plt.plot(t, g_analytical)\n", + " plt.plot(t, g_dnn_ag[0,:])\n", + " plt.legend(['analytical','nn'])\n", + " plt.xlabel('t')\n", + " plt.ylabel('g(t)')\n", + "\n", + " ## Find an approximation to the funtion using forward Euler\n", + "\n", + " alpha, A, g0 = get_parameters()\n", + " dt = T/(Nt - 1)\n", + "\n", + " # Perform forward Euler to solve the ODE\n", + " g_euler = np.zeros(Nt)\n", + " g_euler[0] = g0\n", + "\n", + " for i in range(1,Nt):\n", + " g_euler[i] = g_euler[i-1] + dt*(alpha*g_euler[i-1]*(A - g_euler[i-1]))\n", + "\n", + " # Print the errors done by each method\n", + " diff1 = np.max(np.abs(g_euler - g_analytical))\n", + " diff2 = np.max(np.abs(g_dnn_ag[0,:] - g_analytical))\n", + "\n", + " print('Max absolute difference between Euler method and analytical: %g'%diff1)\n", + " print('Max absolute difference between deep neural network and analytical: %g'%diff2)\n", + "\n", + " # Plot results\n", + " plt.figure(figsize=(10,10))\n", + "\n", + " plt.plot(t,g_euler)\n", + " plt.plot(t,g_analytical)\n", + " plt.plot(t,g_dnn_ag[0,:])\n", + "\n", + " plt.legend(['euler','analytical','dnn'])\n", + " plt.xlabel('Time t')\n", + " plt.ylabel('g(t)')\n", + "\n", + " plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "d631641d", + "metadata": { + "editable": true + }, + "source": [ + "## Example: Solving the one dimensional Poisson equation\n", + "\n", + "The Poisson equation for $g(x)$ in one dimension is" + ] + }, + { + "cell_type": "markdown", + "id": "3bd8043b", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation} \\label{poisson} \\tag{13}\n", + " -g''(x) = f(x)\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "818ac1d8", + "metadata": { + "editable": true + }, + "source": [ + "where $f(x)$ is a given function for $x \\in (0,1)$.\n", + "\n", + "The conditions that $g(x)$ is chosen to fulfill, are" + ] + }, + { + "cell_type": "markdown", + "id": "894be116", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{align*}\n", + " g(0) &= 0 \\\\\n", + " g(1) &= 0\n", + "\\end{align*}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "c2fce07f", + "metadata": { + "editable": true + }, + "source": [ + "This equation can be solved numerically using programs where e.g Autograd and TensorFlow are used.\n", + "The results from the networks can then be compared to the analytical solution.\n", + "In addition, it could be interesting to see how a typical method for numerically solving second order ODEs compares to the neural networks." + ] + }, + { + "cell_type": "markdown", + "id": "1e2ffb5e", + "metadata": { + "editable": true + }, + "source": [ + "## The specific equation to solve for\n", + "\n", + "Here, the function $g(x)$ to solve for follows the equation" + ] + }, + { + "cell_type": "markdown", + "id": "5677eb07", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "-g''(x) = f(x),\\qquad x \\in (0,1)\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "89173815", + "metadata": { + "editable": true + }, + "source": [ + "where $f(x)$ is a given function, along with the chosen conditions" + ] + }, + { + "cell_type": "markdown", + "id": "f6e81c01", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{aligned}\n", + "g(0) = g(1) = 0\n", + "\\end{aligned}\\label{cond} \\tag{14}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "82b4c100", + "metadata": { + "editable": true + }, + "source": [ + "In this example, we consider the case when $f(x) = (3x + x^2)\\exp(x)$.\n", + "\n", + "For this case, a possible trial solution satisfying the conditions could be" + ] + }, + { + "cell_type": "markdown", + "id": "05574f7f", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "g_t(x) = x \\cdot (1-x) \\cdot N(P,x)\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "5c17a08c", + "metadata": { + "editable": true + }, + "source": [ + "The analytical solution for this problem is" + ] + }, + { + "cell_type": "markdown", + "id": "a0ce240a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "g(x) = x(1 - x)\\exp(x)\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "d90da9be", + "metadata": { + "editable": true + }, + "source": [ + "## Solving the equation using Autograd" + ] + }, + { + "cell_type": "code", + "execution_count": 35, + "id": "ffd8b552", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import autograd.numpy as np\n", + "from autograd import grad, elementwise_grad\n", + "import autograd.numpy.random as npr\n", + "from matplotlib import pyplot as plt\n", + "\n", + "def sigmoid(z):\n", + " return 1/(1 + np.exp(-z))\n", + "\n", + "def deep_neural_network(deep_params, x):\n", + " # N_hidden is the number of hidden layers\n", + " # deep_params is a list, len() should be used\n", + " N_hidden = len(deep_params) - 1 # -1 since params consists of\n", + " # parameters to all the hidden\n", + " # layers AND the output layer.\n", + "\n", + " # Assumes input x being an one-dimensional array\n", + " num_values = np.size(x)\n", + " x = x.reshape(-1, num_values)\n", + "\n", + " # Assume that the input layer does nothing to the input x\n", + " x_input = x\n", + "\n", + " # Due to multiple hidden layers, define a variable referencing to the\n", + " # output of the previous layer:\n", + " x_prev = x_input\n", + "\n", + " ## Hidden layers:\n", + "\n", + " for l in range(N_hidden):\n", + " # From the list of parameters P; find the correct weigths and bias for this layer\n", + " w_hidden = deep_params[l]\n", + "\n", + " # Add a row of ones to include bias\n", + " x_prev = np.concatenate((np.ones((1,num_values)), x_prev ), axis = 0)\n", + "\n", + " z_hidden = np.matmul(w_hidden, x_prev)\n", + " x_hidden = sigmoid(z_hidden)\n", + "\n", + " # Update x_prev such that next layer can use the output from this layer\n", + " x_prev = x_hidden\n", + "\n", + " ## Output layer:\n", + "\n", + " # Get the weights and bias for this layer\n", + " w_output = deep_params[-1]\n", + "\n", + " # Include bias:\n", + " x_prev = np.concatenate((np.ones((1,num_values)), x_prev), axis = 0)\n", + "\n", + " z_output = np.matmul(w_output, x_prev)\n", + " x_output = z_output\n", + "\n", + " return x_output\n", + "\n", + "\n", + "def solve_ode_deep_neural_network(x, num_neurons, num_iter, lmb):\n", + " # num_hidden_neurons is now a list of number of neurons within each hidden layer\n", + "\n", + " # Find the number of hidden layers:\n", + " N_hidden = np.size(num_neurons)\n", + "\n", + " ## Set up initial weigths and biases\n", + "\n", + " # Initialize the list of parameters:\n", + " P = [None]*(N_hidden + 1) # + 1 to include the output layer\n", + "\n", + " P[0] = npr.randn(num_neurons[0], 2 )\n", + " for 
l in range(1,N_hidden):\n", + " P[l] = npr.randn(num_neurons[l], num_neurons[l-1] + 1) # +1 to include bias\n", + "\n", + " # For the output layer\n", + " P[-1] = npr.randn(1, num_neurons[-1] + 1 ) # +1 since bias is included\n", + "\n", + " print('Initial cost: %g'%cost_function_deep(P, x))\n", + "\n", + " ## Start finding the optimal weigths using gradient descent\n", + "\n", + " # Find the Python function that represents the gradient of the cost function\n", + " # w.r.t the 0-th input argument -- that is the weights and biases in the hidden and output layer\n", + " cost_function_deep_grad = grad(cost_function_deep,0)\n", + "\n", + " # Let the update be done num_iter times\n", + " for i in range(num_iter):\n", + " # Evaluate the gradient at the current weights and biases in P.\n", + " # The cost_grad consist now of N_hidden + 1 arrays; the gradient w.r.t the weights and biases\n", + " # in the hidden layers and output layers evaluated at x.\n", + " cost_deep_grad = cost_function_deep_grad(P, x)\n", + "\n", + " for l in range(N_hidden+1):\n", + " P[l] = P[l] - lmb * cost_deep_grad[l]\n", + "\n", + " print('Final cost: %g'%cost_function_deep(P, x))\n", + "\n", + " return P\n", + "\n", + "## Set up the cost function specified for this Poisson equation:\n", + "\n", + "# The right side of the ODE\n", + "def f(x):\n", + " return (3*x + x**2)*np.exp(x)\n", + "\n", + "def cost_function_deep(P, x):\n", + "\n", + " # Evaluate the trial function with the current parameters P\n", + " g_t = g_trial_deep(x,P)\n", + "\n", + " # Find the derivative w.r.t x of the trial function\n", + " d2_g_t = elementwise_grad(elementwise_grad(g_trial_deep,0))(x,P)\n", + "\n", + " right_side = f(x)\n", + "\n", + " err_sqr = (-d2_g_t - right_side)**2\n", + " cost_sum = np.sum(err_sqr)\n", + "\n", + " return cost_sum/np.size(err_sqr)\n", + "\n", + "# The trial solution:\n", + "def g_trial_deep(x,P):\n", + " return x*(1-x)*deep_neural_network(P,x)\n", + "\n", + "# The analytic solution;\n", + "def g_analytic(x):\n", + " return x*(1-x)*np.exp(x)\n", + "\n", + "if __name__ == '__main__':\n", + " npr.seed(4155)\n", + "\n", + " ## Decide the vales of arguments to the function to solve\n", + " Nx = 10\n", + " x = np.linspace(0,1, Nx)\n", + "\n", + " ## Set up the initial parameters\n", + " num_hidden_neurons = [200,100]\n", + " num_iter = 1000\n", + " lmb = 1e-3\n", + "\n", + " P = solve_ode_deep_neural_network(x, num_hidden_neurons, num_iter, lmb)\n", + "\n", + " g_dnn_ag = g_trial_deep(x,P)\n", + " g_analytical = g_analytic(x)\n", + "\n", + " # Find the maximum absolute difference between the solutons:\n", + " max_diff = np.max(np.abs(g_dnn_ag - g_analytical))\n", + " print(\"The max absolute difference between the solutions is: %g\"%max_diff)\n", + "\n", + " plt.figure(figsize=(10,10))\n", + "\n", + " plt.title('Performance of neural network solving an ODE compared to the analytical solution')\n", + " plt.plot(x, g_analytical)\n", + " plt.plot(x, g_dnn_ag[0,:])\n", + " plt.legend(['analytical','nn'])\n", + " plt.xlabel('x')\n", + " plt.ylabel('g(x)')\n", + " plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "2cde42e7", + "metadata": { + "editable": true + }, + "source": [ + "## Comparing with a numerical scheme\n", + "\n", + "The Poisson equation is possible to solve using Taylor series to approximate the second derivative.\n", + "\n", + "Using Taylor series, the second derivative can be expressed as\n", + "\n", + "$$\n", + "g''(x) = \\frac{g(x + \\Delta x) - 2g(x) + g(x-\\Delta x)}{\\Delta x^2} + E_{\\Delta 
x}(x)\n", + "$$\n", + "\n", + "where $\\Delta x$ is a small step size and $E_{\\Delta x}(x)$ being the error term.\n", + "\n", + "Looking away from the error terms gives an approximation to the second derivative:" + ] + }, + { + "cell_type": "markdown", + "id": "e24a46af", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation} \\label{approx} \\tag{15}\n", + "g''(x) \\approx \\frac{g(x + \\Delta x) - 2g(x) + g(x-\\Delta x)}{\\Delta x^2}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "2417ec7c", + "metadata": { + "editable": true + }, + "source": [ + "If $x_i = i \\Delta x = x_{i-1} + \\Delta x$ and $g_i = g(x_i)$ for $i = 1,\\dots N_x - 2$ with $N_x$ being the number of values for $x$, ([15](#approx)) becomes" + ] + }, + { + "cell_type": "markdown", + "id": "012a9c2b", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{aligned}\n", + "g''(x_i) &\\approx \\frac{g(x_i + \\Delta x) - 2g(x_i) + g(x_i -\\Delta x)}{\\Delta x^2} \\\\\n", + "&= \\frac{g_{i+1} - 2g_i + g_{i-1}}{\\Delta x^2}\n", + "\\end{aligned}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "101bccb8", + "metadata": { + "editable": true + }, + "source": [ + "Since we know from our problem that" + ] + }, + { + "cell_type": "markdown", + "id": "280cdc54", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{aligned}\n", + "-g''(x) &= f(x) \\\\\n", + "&= (3x + x^2)\\exp(x)\n", + "\\end{aligned}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "38bc9035", + "metadata": { + "editable": true + }, + "source": [ + "along with the conditions $g(0) = g(1) = 0$,\n", + "the following scheme can be used to find an approximate solution for $g(x)$ numerically:" + ] + }, + { + "cell_type": "markdown", + "id": "3925a117", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation}\n", + " \\begin{aligned}\n", + " -\\Big( \\frac{g_{i+1} - 2g_i + g_{i-1}}{\\Delta x^2} \\Big) &= f(x_i) \\\\\n", + " -g_{i+1} + 2g_i - g_{i-1} &= \\Delta x^2 f(x_i)\n", + " \\end{aligned}\n", + "\\end{equation} \\label{odesys} \\tag{16}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "6f86e85b", + "metadata": { + "editable": true + }, + "source": [ + "for $i = 1, \\dots, N_x - 2$ where $g_0 = g_{N_x - 1} = 0$ and $f(x_i) = (3x_i + x_i^2)\\exp(x_i)$, which is given for our specific problem.\n", + "\n", + "The equation can be rewritten into a matrix equation:" + ] + }, + { + "cell_type": "markdown", + "id": "394b14bc", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{aligned}\n", + "\\begin{pmatrix}\n", + "2 & -1 & 0 & \\dots & 0 \\\\\n", + "-1 & 2 & -1 & \\dots & 0 \\\\\n", + "\\vdots & & \\ddots & & \\vdots \\\\\n", + "0 & \\dots & -1 & 2 & -1 \\\\\n", + "0 & \\dots & 0 & -1 & 2\\\\\n", + "\\end{pmatrix}\n", + "\\begin{pmatrix}\n", + "g_1 \\\\\n", + "g_2 \\\\\n", + "\\vdots \\\\\n", + "g_{N_x - 3} \\\\\n", + "g_{N_x - 2}\n", + "\\end{pmatrix}\n", + "&=\n", + "\\Delta x^2\n", + "\\begin{pmatrix}\n", + "f(x_1) \\\\\n", + "f(x_2) \\\\\n", + "\\vdots \\\\\n", + "f(x_{N_x - 3}) \\\\\n", + "f(x_{N_x - 2})\n", + "\\end{pmatrix} \\\\\n", + "\\boldsymbol{A}\\boldsymbol{g} &= \\boldsymbol{f},\n", + "\\end{aligned}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "5ab07ae1", + "metadata": { + "editable": true + }, + "source": [ + "which makes it possible to solve for the vector $\\boldsymbol{g}$." + ] + }, + { + "cell_type": "markdown", + "id": "8134c34f", + "metadata": { + "editable": true + }, + "source": [ + "## Setting up the code\n", + "\n", + "We can then compare the result from this numerical scheme with the output from our network using Autograd:" + ] + }, + { + "cell_type": "code", + "execution_count": 36, + "id": "4362f9a9", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import autograd.numpy as np\n", + "from autograd import grad, elementwise_grad\n", + "import autograd.numpy.random as npr\n", + "from matplotlib import pyplot as plt\n", + "\n", + "def sigmoid(z):\n", + " return 1/(1 + np.exp(-z))\n", + "\n", + "def deep_neural_network(deep_params, x):\n", + " # N_hidden is the number of hidden layers\n", + " # deep_params is a list, len() should be used\n", + " N_hidden = len(deep_params) - 1 # -1 since params consists of\n", + " # parameters to all the hidden\n", + " # layers AND the output layer.\n", + "\n", + " # Assumes input x being an one-dimensional array\n", + " num_values = np.size(x)\n", + " x = x.reshape(-1, num_values)\n", + "\n", + " # Assume that the input layer does nothing to the input x\n", + " x_input = x\n", + "\n", + " # Due to multiple hidden layers, define a variable referencing to the\n", + " # output of the previous layer:\n", + " x_prev = x_input\n", + "\n", + " ## Hidden layers:\n", + "\n", + " for l in range(N_hidden):\n", + " # From the list of parameters P; find the correct weigths and bias for this layer\n", + " w_hidden = deep_params[l]\n", + "\n", + " # Add a row of ones to include bias\n", + " x_prev = np.concatenate((np.ones((1,num_values)), x_prev ), axis = 0)\n", + "\n", + " z_hidden = np.matmul(w_hidden, x_prev)\n", + " x_hidden = sigmoid(z_hidden)\n", + "\n", + " # Update x_prev such that next layer can use the output from this layer\n", + " x_prev = x_hidden\n", + "\n", + " ## Output layer:\n", + "\n", + " # 
Get the weights and bias for this layer\n", + " w_output = deep_params[-1]\n", + "\n", + " # Include bias:\n", + " x_prev = np.concatenate((np.ones((1,num_values)), x_prev), axis = 0)\n", + "\n", + " z_output = np.matmul(w_output, x_prev)\n", + " x_output = z_output\n", + "\n", + " return x_output\n", + "\n", + "\n", + "def solve_ode_deep_neural_network(x, num_neurons, num_iter, lmb):\n", + " # num_hidden_neurons is now a list of number of neurons within each hidden layer\n", + "\n", + " # Find the number of hidden layers:\n", + " N_hidden = np.size(num_neurons)\n", + "\n", + " ## Set up initial weigths and biases\n", + "\n", + " # Initialize the list of parameters:\n", + " P = [None]*(N_hidden + 1) # + 1 to include the output layer\n", + "\n", + " P[0] = npr.randn(num_neurons[0], 2 )\n", + " for l in range(1,N_hidden):\n", + " P[l] = npr.randn(num_neurons[l], num_neurons[l-1] + 1) # +1 to include bias\n", + "\n", + " # For the output layer\n", + " P[-1] = npr.randn(1, num_neurons[-1] + 1 ) # +1 since bias is included\n", + "\n", + " print('Initial cost: %g'%cost_function_deep(P, x))\n", + "\n", + " ## Start finding the optimal weigths using gradient descent\n", + "\n", + " # Find the Python function that represents the gradient of the cost function\n", + " # w.r.t the 0-th input argument -- that is the weights and biases in the hidden and output layer\n", + " cost_function_deep_grad = grad(cost_function_deep,0)\n", + "\n", + " # Let the update be done num_iter times\n", + " for i in range(num_iter):\n", + " # Evaluate the gradient at the current weights and biases in P.\n", + " # The cost_grad consist now of N_hidden + 1 arrays; the gradient w.r.t the weights and biases\n", + " # in the hidden layers and output layers evaluated at x.\n", + " cost_deep_grad = cost_function_deep_grad(P, x)\n", + "\n", + " for l in range(N_hidden+1):\n", + " P[l] = P[l] - lmb * cost_deep_grad[l]\n", + "\n", + " print('Final cost: %g'%cost_function_deep(P, x))\n", + "\n", + " return P\n", + "\n", + "## Set up the cost function specified for this Poisson equation:\n", + "\n", + "# The right side of the ODE\n", + "def f(x):\n", + " return (3*x + x**2)*np.exp(x)\n", + "\n", + "def cost_function_deep(P, x):\n", + "\n", + " # Evaluate the trial function with the current parameters P\n", + " g_t = g_trial_deep(x,P)\n", + "\n", + " # Find the derivative w.r.t x of the trial function\n", + " d2_g_t = elementwise_grad(elementwise_grad(g_trial_deep,0))(x,P)\n", + "\n", + " right_side = f(x)\n", + "\n", + " err_sqr = (-d2_g_t - right_side)**2\n", + " cost_sum = np.sum(err_sqr)\n", + "\n", + " return cost_sum/np.size(err_sqr)\n", + "\n", + "# The trial solution:\n", + "def g_trial_deep(x,P):\n", + " return x*(1-x)*deep_neural_network(P,x)\n", + "\n", + "# The analytic solution;\n", + "def g_analytic(x):\n", + " return x*(1-x)*np.exp(x)\n", + "\n", + "if __name__ == '__main__':\n", + " npr.seed(4155)\n", + "\n", + " ## Decide the vales of arguments to the function to solve\n", + " Nx = 10\n", + " x = np.linspace(0,1, Nx)\n", + "\n", + " ## Set up the initial parameters\n", + " num_hidden_neurons = [200,100]\n", + " num_iter = 1000\n", + " lmb = 1e-3\n", + "\n", + " P = solve_ode_deep_neural_network(x, num_hidden_neurons, num_iter, lmb)\n", + "\n", + " g_dnn_ag = g_trial_deep(x,P)\n", + " g_analytical = g_analytic(x)\n", + "\n", + " # Find the maximum absolute difference between the solutons:\n", + "\n", + " plt.figure(figsize=(10,10))\n", + "\n", + " plt.title('Performance of neural network solving an ODE compared to the 
analytical solution')\n", + " plt.plot(x, g_analytical)\n", + " plt.plot(x, g_dnn_ag[0,:])\n", + " plt.legend(['analytical','nn'])\n", + " plt.xlabel('x')\n", + " plt.ylabel('g(x)')\n", + "\n", + " ## Perform the computation using the numerical scheme\n", + "\n", + " dx = 1/(Nx - 1)\n", + "\n", + " # Set up the matrix A\n", + " A = np.zeros((Nx-2,Nx-2))\n", + "\n", + " A[0,0] = 2\n", + " A[0,1] = -1\n", + "\n", + " for i in range(1,Nx-3):\n", + " A[i,i-1] = -1\n", + " A[i,i] = 2\n", + " A[i,i+1] = -1\n", + "\n", + " A[Nx - 3, Nx - 4] = -1\n", + " A[Nx - 3, Nx - 3] = 2\n", + "\n", + " # Set up the vector f\n", + " f_vec = dx**2 * f(x[1:-1])\n", + "\n", + " # Solve the equation\n", + " g_res = np.linalg.solve(A,f_vec)\n", + "\n", + " g_vec = np.zeros(Nx)\n", + " g_vec[1:-1] = g_res\n", + "\n", + " # Print the differences between each method\n", + " max_diff1 = np.max(np.abs(g_dnn_ag - g_analytical))\n", + " max_diff2 = np.max(np.abs(g_vec - g_analytical))\n", + " print(\"The max absolute difference between the analytical solution and DNN Autograd: %g\"%max_diff1)\n", + " print(\"The max absolute difference between the analytical solution and numerical scheme: %g\"%max_diff2)\n", + "\n", + " # Plot the results\n", + " plt.figure(figsize=(10,10))\n", + "\n", + " plt.plot(x,g_vec)\n", + " plt.plot(x,g_analytical)\n", + " plt.plot(x,g_dnn_ag[0,:])\n", + "\n", + " plt.legend(['numerical scheme','analytical','dnn'])\n", + " plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "c66dc85a", + "metadata": { + "editable": true + }, + "source": [ + "## Partial Differential Equations\n", + "\n", + "A partial differential equation (PDE) has a solution here the function\n", + "is defined by multiple variables. The equation may involve all kinds\n", + "of combinations of which variables the function is differentiated with\n", + "respect to.\n", + "\n", + "In general, a partial differential equation for a function $g(x_1,\\dots,x_N)$ with $N$ variables may be expressed as" + ] + }, + { + "cell_type": "markdown", + "id": "cf60d1fc", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation} \\label{PDE} \\tag{17}\n", + " f\\left(x_1, \\, \\dots \\, , x_N, \\frac{\\partial g(x_1,\\dots,x_N) }{\\partial x_1}, \\dots , \\frac{\\partial g(x_1,\\dots,x_N) }{\\partial x_N}, \\frac{\\partial g(x_1,\\dots,x_N) }{\\partial x_1\\partial x_2}, \\, \\dots \\, , \\frac{\\partial^n g(x_1,\\dots,x_N) }{\\partial x_N^n} \\right) = 0\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "bff85f6e", + "metadata": { + "editable": true + }, + "source": [ + "where $f$ is an expression involving all kinds of possible mixed derivatives of $g(x_1,\\dots,x_N)$ up to an order $n$. In order for the solution to be unique, some additional conditions must also be given." + ] + }, + { + "cell_type": "markdown", + "id": "64289867", + "metadata": { + "editable": true + }, + "source": [ + "## Type of problem\n", + "\n", + "The problem our network must solve for, is similar to the ODE case.\n", + "We must have a trial solution $g_t$ at hand.\n", + "\n", + "For instance, the trial solution could be expressed as" + ] + }, + { + "cell_type": "markdown", + "id": "75d3a4d2", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{align*}\n", + " g_t(x_1,\\dots,x_N) = h_1(x_1,\\dots,x_N) + h_2(x_1,\\dots,x_N,N(x_1,\\dots,x_N,P))\n", + "\\end{align*}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "6f3e695d", + "metadata": { + "editable": true + }, + "source": [ + "where $h_1(x_1,\\dots,x_N)$ is a function that ensures $g_t(x_1,\\dots,x_N)$ satisfies some given conditions.\n", + "The neural network $N(x_1,\\dots,x_N,P)$ has weights and biases described by $P$ and $h_2(x_1,\\dots,x_N,N(x_1,\\dots,x_N,P))$ is an expression using the output from the neural network in some way.\n", + "\n", + "The role of the function $h_2(x_1,\\dots,x_N,N(x_1,\\dots,x_N,P))$, is to ensure that the output of $N(x_1,\\dots,x_N,P)$ is zero when $g_t(x_1,\\dots,x_N)$ is evaluated at the values of $x_1,\\dots,x_N$ where the given conditions must be satisfied. The function $h_1(x_1,\\dots,x_N)$ should alone make $g_t(x_1,\\dots,x_N)$ satisfy the conditions." + ] + }, + { + "cell_type": "markdown", + "id": "da1ba3cf", + "metadata": { + "editable": true + }, + "source": [ + "## Network requirements\n", + "\n", + "The network tries then the minimize the cost function following the\n", + "same ideas as described for the ODE case, but now with more than one\n", + "variables to consider. The concept still remains the same; find a set\n", + "of parameters $P$ such that the expression $f$ in ([17](#PDE)) is as\n", + "close to zero as possible.\n", + "\n", + "As for the ODE case, the cost function is the mean squared error that\n", + "the network must try to minimize. 
The cost function for the network to\n", + "minimize is" + ] + }, + { + "cell_type": "markdown", + "id": "373065ff", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C\\left(x_1, \\dots, x_N, P\\right) = \\left( f\\left(x_1, \\, \\dots \\, , x_N, \\frac{\\partial g(x_1,\\dots,x_N) }{\\partial x_1}, \\dots , \\frac{\\partial g(x_1,\\dots,x_N) }{\\partial x_N}, \\frac{\\partial g(x_1,\\dots,x_N) }{\\partial x_1\\partial x_2}, \\, \\dots \\, , \\frac{\\partial^n g(x_1,\\dots,x_N) }{\\partial x_N^n} \\right) \\right)^2\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "2281eade", + "metadata": { + "editable": true + }, + "source": [ + "## More details\n", + "\n", + "If we let $\\boldsymbol{x} = \\big( x_1, \\dots, x_N \\big)$ be an array containing the values for $x_1, \\dots, x_N$ respectively, the cost function can be reformulated into the following:" + ] + }, + { + "cell_type": "markdown", + "id": "989a8905", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C\\left(\\boldsymbol{x}, P\\right) = f\\left( \\left( \\boldsymbol{x}, \\frac{\\partial g(\\boldsymbol{x}) }{\\partial x_1}, \\dots , \\frac{\\partial g(\\boldsymbol{x}) }{\\partial x_N}, \\frac{\\partial g(\\boldsymbol{x}) }{\\partial x_1\\partial x_2}, \\, \\dots \\, , \\frac{\\partial^n g(\\boldsymbol{x}) }{\\partial x_N^n} \\right) \\right)^2\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b36367a0", + "metadata": { + "editable": true + }, + "source": [ + "If we also have $M$ different sets of values for $x_1, \\dots, x_N$, that is $\\boldsymbol{x}_i = \\big(x_1^{(i)}, \\dots, x_N^{(i)}\\big)$ for $i = 1,\\dots,M$ being the rows in matrix $X$, the cost function can be generalized into" + ] + }, + { + "cell_type": "markdown", + "id": "6f6f51dd", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C\\left(X, P \\right) = \\sum_{i=1}^M f\\left( \\left( \\boldsymbol{x}_i, \\frac{\\partial g(\\boldsymbol{x}_i) }{\\partial x_1}, \\dots , \\frac{\\partial g(\\boldsymbol{x}_i) }{\\partial x_N}, \\frac{\\partial g(\\boldsymbol{x}_i) }{\\partial x_1\\partial x_2}, \\, \\dots \\, , \\frac{\\partial^n g(\\boldsymbol{x}_i) }{\\partial x_N^n} \\right) \\right)^2.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "35bd1e4a", + "metadata": { + "editable": true + }, + "source": [ + "## Example: The diffusion equation\n", + "\n", + "In one spatial dimension, the equation reads" + ] + }, + { + "cell_type": "markdown", + "id": "2b804c0a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial g(x,t)}{\\partial t} = \\frac{\\partial^2 g(x,t)}{\\partial x^2}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "07f20557", + "metadata": { + "editable": true + }, + "source": [ + "where a possible choice of conditions are" + ] + }, + { + "cell_type": "markdown", + "id": "0e14c702", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{align*}\n", + "g(0,t) &= 0 ,\\qquad t \\geq 0 \\\\\n", + "g(1,t) &= 0, \\qquad t \\geq 0 \\\\\n", + "g(x,0) &= u(x),\\qquad x\\in [0,1]\n", + "\\end{align*}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "a19c5cae", + "metadata": { + "editable": true + }, + "source": [ + "with $u(x)$ being some given function." 
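+    ,
+    "\n",
+    "\n",
+    "As a quick sanity check (a small sketch added here, not part of the original program; plain NumPy and central finite differences are assumed), one can verify that the analytical solution quoted later for the choice $u(x) = \\sin(\\pi x)$, namely $g(x,t) = \\exp(-\\pi^2 t)\\sin(\\pi x)$, indeed satisfies the diffusion equation and the boundary conditions:\n",
+    "\n",
+    "```python\n",
+    "import numpy as np\n",
+    "\n",
+    "def g(x, t):\n",
+    "    # Analytical solution for the choice u(x) = sin(pi x), quoted later in these notes\n",
+    "    return np.exp(-np.pi**2*t)*np.sin(np.pi*x)\n",
+    "\n",
+    "x0, t0, h = 0.3, 0.2, 1e-4\n",
+    "dg_dt = (g(x0, t0 + h) - g(x0, t0 - h))/(2*h)\n",
+    "d2g_dx2 = (g(x0 + h, t0) - 2*g(x0, t0) + g(x0 - h, t0))/h**2\n",
+    "\n",
+    "print(dg_dt, d2g_dx2)          # the two numbers should agree to several digits\n",
+    "print(g(0.0, t0), g(1.0, t0))  # boundary values: both (numerically) zero\n",
+    "```"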
+ ] + }, + { + "cell_type": "markdown", + "id": "de041a40", + "metadata": { + "editable": true + }, + "source": [ + "## Defining the problem\n", + "\n", + "For this case, we want to find $g(x,t)$ such that" + ] + }, + { + "cell_type": "markdown", + "id": "519bb7a7", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation}\n", + " \\frac{\\partial g(x,t)}{\\partial t} = \\frac{\\partial^2 g(x,t)}{\\partial x^2}\n", + "\\end{equation} \\label{diffonedim} \\tag{18}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "129322ea", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "ddc7b725", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{align*}\n", + "g(0,t) &= 0 ,\\qquad t \\geq 0 \\\\\n", + "g(1,t) &= 0, \\qquad t \\geq 0 \\\\\n", + "g(x,0) &= u(x),\\qquad x\\in [0,1]\n", + "\\end{align*}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "5497b34b", + "metadata": { + "editable": true + }, + "source": [ + "with $u(x) = \\sin(\\pi x)$.\n", + "\n", + "First, let us set up the deep neural network.\n", + "The deep neural network will follow the same structure as discussed in the examples solving the ODEs.\n", + "First, we will look into how Autograd could be used in a network tailored to solve for bivariate functions." + ] + }, + { + "cell_type": "markdown", + "id": "0b9040e4", + "metadata": { + "editable": true + }, + "source": [ + "## Setting up the network using Autograd\n", + "\n", + "The only change to do here, is to extend our network such that\n", + "functions of multiple parameters are correctly handled. In this case\n", + "we have two variables in our function to solve for, that is time $t$\n", + "and position $x$. The variables will be represented by a\n", + "one-dimensional array in the program. The program will evaluate the\n", + "network at each possible pair $(x,t)$, given an array for the desired\n", + "$x$-values and $t$-values to approximate the solution at." + ] + }, + { + "cell_type": "code", + "execution_count": 37, + "id": "17097802", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "def sigmoid(z):\n", + " return 1/(1 + np.exp(-z))\n", + "\n", + "def deep_neural_network(deep_params, x):\n", + " # x is now a point and a 1D numpy array; make it a column vector\n", + " num_coordinates = np.size(x,0)\n", + " x = x.reshape(num_coordinates,-1)\n", + "\n", + " num_points = np.size(x,1)\n", + "\n", + " # N_hidden is the number of hidden layers\n", + " N_hidden = len(deep_params) - 1 # -1 since params consist of parameters to all the hidden layers AND the output layer\n", + "\n", + " # Assume that the input layer does nothing to the input x\n", + " x_input = x\n", + " x_prev = x_input\n", + "\n", + " ## Hidden layers:\n", + "\n", + " for l in range(N_hidden):\n", + " # From the list of parameters P; find the correct weigths and bias for this layer\n", + " w_hidden = deep_params[l]\n", + "\n", + " # Add a row of ones to include bias\n", + " x_prev = np.concatenate((np.ones((1,num_points)), x_prev ), axis = 0)\n", + "\n", + " z_hidden = np.matmul(w_hidden, x_prev)\n", + " x_hidden = sigmoid(z_hidden)\n", + "\n", + " # Update x_prev such that next layer can use the output from this layer\n", + " x_prev = x_hidden\n", + "\n", + " ## Output layer:\n", + "\n", + " # Get the weights and bias for this layer\n", + " w_output = deep_params[-1]\n", + "\n", + " # Include bias:\n", + " x_prev = np.concatenate((np.ones((1,num_points)), x_prev), axis = 0)\n", + "\n", + " z_output = np.matmul(w_output, x_prev)\n", + " x_output = z_output\n", + "\n", + " return x_output[0][0]" + ] + }, + { + "cell_type": "markdown", + "id": "a2178b56", + "metadata": { + "editable": true + }, + "source": [ + "## Setting up the network using 
Autograd; The trial solution\n",
+    "\n",
+    "The cost function must then iterate through the given arrays\n",
+    "containing values for $x$ and $t$, define a point $(x,t)$ at which the deep\n",
+    "neural network and the trial solution are evaluated, and then find\n",
+    "the Jacobian of the trial solution.\n",
+    "\n",
+    "A possible trial solution for this PDE is\n",
+    "\n",
+    "$$\n",
+    "g_t(x,t) = h_1(x,t) + x(1-x)tN(x,t,P)\n",
+    "$$\n",
+    "\n",
+    "with $h_1(x,t)$ being a function ensuring that $g_t(x,t)$ satisfies our given conditions, and $N(x,t,P)$ being the output from the deep neural network using weights and biases for each layer from $P$.\n",
+    "\n",
+    "To fulfill the conditions, $h_1(x,t)$ could be chosen as\n",
+    "\n",
+    "$$\n",
+    "h_1(x,t) = (1-t)\\Big(u(x) - \\big((1-x)u(0) + x u(1)\\big)\\Big) = (1-t)u(x) = (1-t)\\sin(\\pi x)\n",
+    "$$\n",
+    "since $u(0) = u(1) = 0$ and $u(x) = \\sin(\\pi x)$."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "533f4e84",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "## Why the Jacobian?\n",
+    "\n",
+    "The Jacobian is used because the program must find the derivative of\n",
+    "the trial solution with respect to $x$ and $t$.\n",
+    "\n",
+    "This makes it necessary to compute the Jacobian matrix, as we want\n",
+    "to evaluate the gradient with respect to $x$ and $t$ (note that the\n",
+    "Jacobian of a scalar-valued multivariate function is simply its\n",
+    "gradient).\n",
+    "\n",
+    "In Autograd, the differentiation is by default done with respect to\n",
+    "the first input argument of your Python function. Since the point is\n",
+    "an array representing $x$ and $t$, the Jacobian is calculated using\n",
+    "the values of $x$ and $t$.\n",
+    "\n",
+    "To find the second derivatives with respect to $x$ and $t$, the\n",
+    "Jacobian can be computed a second time. The result is a Hessian\n",
+    "matrix, which contains all the possible second order\n",
+    "mixed derivatives of $g(x,t)$."
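+    ,
+    "\n",
+    "\n",
+    "As a tiny illustration (a sketch using a made-up test function, not part of the original program), Autograd's `jacobian` and `hessian` applied to a scalar function of a point array $(x,t)$ return its gradient and its matrix of second derivatives, which is exactly what the cost function below relies on:\n",
+    "\n",
+    "```python\n",
+    "import autograd.numpy as np\n",
+    "from autograd import jacobian, hessian\n",
+    "\n",
+    "def g_simple(point):\n",
+    "    # A made-up scalar function of the point (x, t), used only for illustration\n",
+    "    x, t = point\n",
+    "    return x**2*t + np.sin(np.pi*x)\n",
+    "\n",
+    "point = np.array([0.5, 1.0])\n",
+    "grad_g = jacobian(g_simple)(point)   # [dg/dx, dg/dt] = [2xt + pi*cos(pi*x), x**2]\n",
+    "hess_g = hessian(g_simple)(point)    # 2 x 2 matrix of second derivatives\n",
+    "print(grad_g)\n",
+    "print(hess_g[0][0])                  # d2g/dx2 = 2t - pi**2*sin(pi*x)\n",
+    "```"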
+ ] + }, + { + "cell_type": "code", + "execution_count": 38, + "id": "7b494481", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# Set up the trial function:\n", + "def u(x):\n", + " return np.sin(np.pi*x)\n", + "\n", + "def g_trial(point,P):\n", + " x,t = point\n", + " return (1-t)*u(x) + x*(1-x)*t*deep_neural_network(P,point)\n", + "\n", + "# The right side of the ODE:\n", + "def f(point):\n", + " return 0.\n", + "\n", + "# The cost function:\n", + "def cost_function(P, x, t):\n", + " cost_sum = 0\n", + "\n", + " g_t_jacobian_func = jacobian(g_trial)\n", + " g_t_hessian_func = hessian(g_trial)\n", + "\n", + " for x_ in x:\n", + " for t_ in t:\n", + " point = np.array([x_,t_])\n", + "\n", + " g_t = g_trial(point,P)\n", + " g_t_jacobian = g_t_jacobian_func(point,P)\n", + " g_t_hessian = g_t_hessian_func(point,P)\n", + "\n", + " g_t_dt = g_t_jacobian[1]\n", + " g_t_d2x = g_t_hessian[0][0]\n", + "\n", + " func = f(point)\n", + "\n", + " err_sqr = ( (g_t_dt - g_t_d2x) - func)**2\n", + " cost_sum += err_sqr\n", + "\n", + " return cost_sum" + ] + }, + { + "cell_type": "markdown", + "id": "9f4b4939", + "metadata": { + "editable": true + }, + "source": [ + "## Setting up the network using Autograd; The full program\n", + "\n", + "Having set up the network, along with the trial solution and cost function, we can now see how the deep neural network performs by comparing the results to the analytical solution.\n", + "\n", + "The analytical solution of our problem is\n", + "\n", + "$$\n", + "g(x,t) = \\exp(-\\pi^2 t)\\sin(\\pi x)\n", + "$$\n", + "\n", + "A possible way to implement a neural network solving the PDE, is given below.\n", + "Be aware, though, that it is fairly slow for the parameters used.\n", + "A better result is possible, but requires more iterations, and thus longer time to complete.\n", + "\n", + "Indeed, the program below is not optimal in its implementation, but rather serves as an example on how to implement and use a neural network to solve a PDE.\n", + "Using TensorFlow results in a much better execution time. Try it!" 
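+    ,
+    "\n",
+    "\n",
+    "For the curious, one possible TensorFlow 2 variant of the same idea is sketched below. This is an assumption of how one might do it, not the course's reference implementation; the layer sizes, optimizer and number of iterations are arbitrary choices, and nested `GradientTape` contexts supply the first and second derivatives of the trial solution:\n",
+    "\n",
+    "```python\n",
+    "import numpy as np\n",
+    "import tensorflow as tf\n",
+    "\n",
+    "tf.random.set_seed(15)\n",
+    "\n",
+    "# Collocation points (x, t) on a 10 x 10 grid covering [0,1] x [0,1]\n",
+    "x_np, t_np = np.meshgrid(np.linspace(0, 1, 10), np.linspace(0, 1, 10))\n",
+    "x = tf.constant(x_np.reshape(-1, 1), dtype=tf.float32)\n",
+    "t = tf.constant(t_np.reshape(-1, 1), dtype=tf.float32)\n",
+    "\n",
+    "# A small dense network playing the role of N(x,t,P)\n",
+    "model = tf.keras.Sequential([\n",
+    "    tf.keras.layers.Dense(100, activation='sigmoid'),\n",
+    "    tf.keras.layers.Dense(25, activation='sigmoid'),\n",
+    "    tf.keras.layers.Dense(1)\n",
+    "])\n",
+    "\n",
+    "def g_trial_tf(x, t):\n",
+    "    # Same trial solution as in the Autograd program\n",
+    "    return (1 - t)*tf.sin(np.pi*x) + x*(1 - x)*t*model(tf.concat([x, t], axis=1))\n",
+    "\n",
+    "def cost():\n",
+    "    # Nested tapes: the inner one gives g_t and g_x, the outer one g_xx\n",
+    "    with tf.GradientTape() as outer:\n",
+    "        outer.watch(x)\n",
+    "        with tf.GradientTape(persistent=True) as inner:\n",
+    "            inner.watch(x)\n",
+    "            inner.watch(t)\n",
+    "            g = g_trial_tf(x, t)\n",
+    "        g_x = inner.gradient(g, x)\n",
+    "        g_t = inner.gradient(g, t)\n",
+    "    g_xx = outer.gradient(g_x, x)\n",
+    "    return tf.reduce_mean((g_t - g_xx)**2)\n",
+    "\n",
+    "optimizer = tf.keras.optimizers.Adam(learning_rate=0.01)\n",
+    "for i in range(250):\n",
+    "    with tf.GradientTape() as tape:\n",
+    "        loss = cost()\n",
+    "    grads = tape.gradient(loss, model.trainable_variables)\n",
+    "    optimizer.apply_gradients(zip(grads, model.trainable_variables))\n",
+    "\n",
+    "print('Final cost:', float(loss))\n",
+    "```"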
+ ] + }, + { + "cell_type": "code", + "execution_count": 39, + "id": "83d6eb7d", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import autograd.numpy as np\n", + "from autograd import jacobian,hessian,grad\n", + "import autograd.numpy.random as npr\n", + "from matplotlib import cm\n", + "from matplotlib import pyplot as plt\n", + "from mpl_toolkits.mplot3d import axes3d\n", + "\n", + "## Set up the network\n", + "\n", + "def sigmoid(z):\n", + " return 1/(1 + np.exp(-z))\n", + "\n", + "def deep_neural_network(deep_params, x):\n", + " # x is now a point and a 1D numpy array; make it a column vector\n", + " num_coordinates = np.size(x,0)\n", + " x = x.reshape(num_coordinates,-1)\n", + "\n", + " num_points = np.size(x,1)\n", + "\n", + " # N_hidden is the number of hidden layers\n", + " N_hidden = len(deep_params) - 1 # -1 since params consist of parameters to all the hidden layers AND the output layer\n", + "\n", + " # Assume that the input layer does nothing to the input x\n", + " x_input = x\n", + " x_prev = x_input\n", + "\n", + " ## Hidden layers:\n", + "\n", + " for l in range(N_hidden):\n", + " # From the list of parameters P; find the correct weigths and bias for this layer\n", + " w_hidden = deep_params[l]\n", + "\n", + " # Add a row of ones to include bias\n", + " x_prev = np.concatenate((np.ones((1,num_points)), x_prev ), axis = 0)\n", + "\n", + " z_hidden = np.matmul(w_hidden, x_prev)\n", + " x_hidden = sigmoid(z_hidden)\n", + "\n", + " # Update x_prev such that next layer can use the output from this layer\n", + " x_prev = x_hidden\n", + "\n", + " ## Output layer:\n", + "\n", + " # Get the weights and bias for this layer\n", + " w_output = deep_params[-1]\n", + "\n", + " # Include bias:\n", + " x_prev = np.concatenate((np.ones((1,num_points)), x_prev), axis = 0)\n", + "\n", + " z_output = np.matmul(w_output, x_prev)\n", + " x_output = z_output\n", + "\n", + " return x_output[0][0]\n", + "\n", + "## Define the trial solution and cost function\n", + "def u(x):\n", + " return np.sin(np.pi*x)\n", + "\n", + "def g_trial(point,P):\n", + " x,t = point\n", + " return (1-t)*u(x) + x*(1-x)*t*deep_neural_network(P,point)\n", + "\n", + "# The right side of the ODE:\n", + "def f(point):\n", + " return 0.\n", + "\n", + "# The cost function:\n", + "def cost_function(P, x, t):\n", + " cost_sum = 0\n", + "\n", + " g_t_jacobian_func = jacobian(g_trial)\n", + " g_t_hessian_func = hessian(g_trial)\n", + "\n", + " for x_ in x:\n", + " for t_ in t:\n", + " point = np.array([x_,t_])\n", + "\n", + " g_t = g_trial(point,P)\n", + " g_t_jacobian = g_t_jacobian_func(point,P)\n", + " g_t_hessian = g_t_hessian_func(point,P)\n", + "\n", + " g_t_dt = g_t_jacobian[1]\n", + " g_t_d2x = g_t_hessian[0][0]\n", + "\n", + " func = f(point)\n", + "\n", + " err_sqr = ( (g_t_dt - g_t_d2x) - func)**2\n", + " cost_sum += err_sqr\n", + "\n", + " return cost_sum /( np.size(x)*np.size(t) )\n", + "\n", + "## For comparison, define the analytical solution\n", + "def g_analytic(point):\n", + " x,t = point\n", + " return np.exp(-np.pi**2*t)*np.sin(np.pi*x)\n", + "\n", + "## Set up a function for training the network to solve for the equation\n", + "def solve_pde_deep_neural_network(x,t, num_neurons, num_iter, lmb):\n", + " ## Set up initial weigths and biases\n", + " N_hidden = np.size(num_neurons)\n", + "\n", + " ## Set up initial weigths and biases\n", + "\n", + " # Initialize the list of parameters:\n", + " P = [None]*(N_hidden + 1) # + 1 to include the output layer\n", + "\n", + 
" P[0] = npr.randn(num_neurons[0], 2 + 1 ) # 2 since we have two points, +1 to include bias\n", + " for l in range(1,N_hidden):\n", + " P[l] = npr.randn(num_neurons[l], num_neurons[l-1] + 1) # +1 to include bias\n", + "\n", + " # For the output layer\n", + " P[-1] = npr.randn(1, num_neurons[-1] + 1 ) # +1 since bias is included\n", + "\n", + " print('Initial cost: ',cost_function(P, x, t))\n", + "\n", + " cost_function_grad = grad(cost_function,0)\n", + "\n", + " # Let the update be done num_iter times\n", + " for i in range(num_iter):\n", + " cost_grad = cost_function_grad(P, x , t)\n", + "\n", + " for l in range(N_hidden+1):\n", + " P[l] = P[l] - lmb * cost_grad[l]\n", + "\n", + " print('Final cost: ',cost_function(P, x, t))\n", + "\n", + " return P\n", + "\n", + "if __name__ == '__main__':\n", + " ### Use the neural network:\n", + " npr.seed(15)\n", + "\n", + " ## Decide the vales of arguments to the function to solve\n", + " Nx = 10; Nt = 10\n", + " x = np.linspace(0, 1, Nx)\n", + " t = np.linspace(0,1,Nt)\n", + "\n", + " ## Set up the parameters for the network\n", + " num_hidden_neurons = [100, 25]\n", + " num_iter = 250\n", + " lmb = 0.01\n", + "\n", + " P = solve_pde_deep_neural_network(x,t, num_hidden_neurons, num_iter, lmb)\n", + "\n", + " ## Store the results\n", + " g_dnn_ag = np.zeros((Nx, Nt))\n", + " G_analytical = np.zeros((Nx, Nt))\n", + " for i,x_ in enumerate(x):\n", + " for j, t_ in enumerate(t):\n", + " point = np.array([x_, t_])\n", + " g_dnn_ag[i,j] = g_trial(point,P)\n", + "\n", + " G_analytical[i,j] = g_analytic(point)\n", + "\n", + " # Find the map difference between the analytical and the computed solution\n", + " diff_ag = np.abs(g_dnn_ag - G_analytical)\n", + " print('Max absolute difference between the analytical solution and the network: %g'%np.max(diff_ag))\n", + "\n", + " ## Plot the solutions in two dimensions, that being in position and time\n", + "\n", + " T,X = np.meshgrid(t,x)\n", + "\n", + " fig = plt.figure(figsize=(10,10))\n", + " ax = fig.add_suplot(projection='3d')\n", + " ax.set_title('Solution from the deep neural network w/ %d layer'%len(num_hidden_neurons))\n", + " s = ax.plot_surface(T,X,g_dnn_ag,linewidth=0,antialiased=False,cmap=cm.viridis)\n", + " ax.set_xlabel('Time $t$')\n", + " ax.set_ylabel('Position $x$');\n", + "\n", + "\n", + " fig = plt.figure(figsize=(10,10))\n", + " ax = fig.add_suplot(projection='3d')\n", + " ax.set_title('Analytical solution')\n", + " s = ax.plot_surface(T,X,G_analytical,linewidth=0,antialiased=False,cmap=cm.viridis)\n", + " ax.set_xlabel('Time $t$')\n", + " ax.set_ylabel('Position $x$');\n", + "\n", + " fig = plt.figure(figsize=(10,10))\n", + " ax = fig.add_suplot(projection='3d')\n", + " ax.set_title('Difference')\n", + " s = ax.plot_surface(T,X,diff_ag,linewidth=0,antialiased=False,cmap=cm.viridis)\n", + " ax.set_xlabel('Time $t$')\n", + " ax.set_ylabel('Position $x$');\n", + "\n", + " ## Take some slices of the 3D plots just to see the solutions at particular times\n", + " indx1 = 0\n", + " indx2 = int(Nt/2)\n", + " indx3 = Nt-1\n", + "\n", + " t1 = t[indx1]\n", + " t2 = t[indx2]\n", + " t3 = t[indx3]\n", + "\n", + " # Slice the results from the DNN\n", + " res1 = g_dnn_ag[:,indx1]\n", + " res2 = g_dnn_ag[:,indx2]\n", + " res3 = g_dnn_ag[:,indx3]\n", + "\n", + " # Slice the analytical results\n", + " res_analytical1 = G_analytical[:,indx1]\n", + " res_analytical2 = G_analytical[:,indx2]\n", + " res_analytical3 = G_analytical[:,indx3]\n", + "\n", + " # Plot the slices\n", + " plt.figure(figsize=(10,10))\n", 
+ " plt.title(\"Computed solutions at time = %g\"%t1)\n", + " plt.plot(x, res1)\n", + " plt.plot(x,res_analytical1)\n", + " plt.legend(['dnn','analytical'])\n", + "\n", + " plt.figure(figsize=(10,10))\n", + " plt.title(\"Computed solutions at time = %g\"%t2)\n", + " plt.plot(x, res2)\n", + " plt.plot(x,res_analytical2)\n", + " plt.legend(['dnn','analytical'])\n", + "\n", + " plt.figure(figsize=(10,10))\n", + " plt.title(\"Computed solutions at time = %g\"%t3)\n", + " plt.plot(x, res3)\n", + " plt.plot(x,res_analytical3)\n", + " plt.legend(['dnn','analytical'])\n", + "\n", + " plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "ada13a48", + "metadata": { + "editable": true + }, + "source": [ + "## Example: Solving the wave equation with Neural Networks\n", + "\n", + "The wave equation is" + ] + }, + { + "cell_type": "markdown", + "id": "e4727d73", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial^2 g(x,t)}{\\partial t^2} = c^2\\frac{\\partial^2 g(x,t)}{\\partial x^2}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "0b86d555", + "metadata": { + "editable": true + }, + "source": [ + "with $c$ being the specified wave speed.\n", + "\n", + "Here, the chosen conditions are" + ] + }, + { + "cell_type": "markdown", + "id": "216948d5", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{align*}\n", + "\tg(0,t) &= 0 \\\\\n", + "\tg(1,t) &= 0 \\\\\n", + "\tg(x,0) &= u(x) \\\\\n", + "\t\\frac{\\partial g(x,t)}{\\partial t} \\Big |_{t = 0} &= v(x)\n", + "\\end{align*}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "44c25fdc", + "metadata": { + "editable": true + }, + "source": [ + "where $\\frac{\\partial g(x,t)}{\\partial t} \\Big |_{t = 0}$ means the derivative of $g(x,t)$ with respect to $t$ is evaluated at $t = 0$, and $u(x)$ and $v(x)$ being given functions." + ] + }, + { + "cell_type": "markdown", + "id": "98f919eb", + "metadata": { + "editable": true + }, + "source": [ + "## The problem to solve for\n", + "\n", + "The wave equation to solve for, is" + ] + }, + { + "cell_type": "markdown", + "id": "01299767", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation} \\label{wave} \\tag{19}\n", + "\\frac{\\partial^2 g(x,t)}{\\partial t^2} = c^2 \\frac{\\partial^2 g(x,t)}{\\partial x^2}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "556587c5", + "metadata": { + "editable": true + }, + "source": [ + "where $c$ is the given wave speed.\n", + "The chosen conditions for this equation are" + ] + }, + { + "cell_type": "markdown", + "id": "c9eb4f3a", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{aligned}\n", + "g(0,t) &= 0, &t \\geq 0 \\\\\n", + "g(1,t) &= 0, &t \\geq 0 \\\\\n", + "g(x,0) &= u(x), &x\\in[0,1] \\\\\n", + "\\frac{\\partial g(x,t)}{\\partial t}\\Big |_{t = 0} &= v(x), &x \\in [0,1]\n", + "\\end{aligned} \\label{condwave} \\tag{20}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "63128ef6", + "metadata": { + "editable": true + }, + "source": [ + "In this example, let $c = 1$ and $u(x) = \\sin(\\pi x)$ and $v(x) = -\\pi\\sin(\\pi x)$." + ] + }, + { + "cell_type": "markdown", + "id": "ff568c81", + "metadata": { + "editable": true + }, + "source": [ + "## The trial solution\n", + "Setting up the network is done in similar matter as for the example of solving the diffusion equation.\n", + "The only things we have to change, is the trial solution such that it satisfies the conditions from ([20](#condwave)) and the cost function.\n", + "\n", + "The trial solution becomes slightly different since we have other conditions than in the example of solving the diffusion equation. Here, a possible trial solution $g_t(x,t)$ is\n", + "\n", + "$$\n", + "g_t(x,t) = h_1(x,t) + x(1-x)t^2N(x,t,P)\n", + "$$\n", + "\n", + "where\n", + "\n", + "$$\n", + "h_1(x,t) = (1-t^2)u(x) + tv(x)\n", + "$$\n", + "\n", + "Note that this trial solution satisfies the conditions only if $u(0) = v(0) = u(1) = v(1) = 0$, which is the case in this example." + ] + }, + { + "cell_type": "markdown", + "id": "7b32c8dd", + "metadata": { + "editable": true + }, + "source": [ + "## The analytical solution\n", + "\n", + "The analytical solution for our specific problem, is\n", + "\n", + "$$\n", + "g(x,t) = \\sin(\\pi x)\\cos(\\pi t) - \\sin(\\pi x)\\sin(\\pi t)\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "fc33e683", + "metadata": { + "editable": true + }, + "source": [ + "## Solving the wave equation - the full program using Autograd" + ] + }, + { + "cell_type": "code", + "execution_count": 40, + "id": "2f923958", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import autograd.numpy as np\n", + "from autograd import hessian,grad\n", + "import autograd.numpy.random as npr\n", + "from matplotlib import cm\n", + "from matplotlib import pyplot as plt\n", + "from mpl_toolkits.mplot3d import axes3d\n", + "\n", + "## Set up the trial function:\n", + "def u(x):\n", + " return np.sin(np.pi*x)\n", + "\n", + "def v(x):\n", + " return -np.pi*np.sin(np.pi*x)\n", + "\n", + "def h1(point):\n", + " x,t = point\n", + " return (1 - t**2)*u(x) + t*v(x)\n", + "\n", + "def g_trial(point,P):\n", + " x,t = point\n", + " return h1(point) + x*(1-x)*t**2*deep_neural_network(P,point)\n", + "\n", + "## Define the cost function\n", + "def cost_function(P, x, t):\n", + " cost_sum = 0\n", + "\n", + " g_t_hessian_func = hessian(g_trial)\n", + "\n", + " for x_ in x:\n", + " for t_ in t:\n", + " point = np.array([x_,t_])\n", + "\n", + " g_t_hessian = g_t_hessian_func(point,P)\n", + "\n", + " g_t_d2x = g_t_hessian[0][0]\n", + " g_t_d2t = g_t_hessian[1][1]\n", + "\n", + " err_sqr = ( (g_t_d2t - g_t_d2x) )**2\n", + " cost_sum += err_sqr\n", + "\n", + " return cost_sum / (np.size(t) * np.size(x))\n", + "\n", + "## The neural network\n", + "def sigmoid(z):\n", + " return 1/(1 + np.exp(-z))\n", + "\n", + "def deep_neural_network(deep_params, x):\n", + " # x is now a point and a 1D numpy array; make it a column vector\n", + " num_coordinates = np.size(x,0)\n", + " x = x.reshape(num_coordinates,-1)\n", + "\n", + " num_points = np.size(x,1)\n", + 
"\n", + " # N_hidden is the number of hidden layers\n", + " N_hidden = len(deep_params) - 1 # -1 since params consist of parameters to all the hidden layers AND the output layer\n", + "\n", + " # Assume that the input layer does nothing to the input x\n", + " x_input = x\n", + " x_prev = x_input\n", + "\n", + " ## Hidden layers:\n", + "\n", + " for l in range(N_hidden):\n", + " # From the list of parameters P; find the correct weigths and bias for this layer\n", + " w_hidden = deep_params[l]\n", + "\n", + " # Add a row of ones to include bias\n", + " x_prev = np.concatenate((np.ones((1,num_points)), x_prev ), axis = 0)\n", + "\n", + " z_hidden = np.matmul(w_hidden, x_prev)\n", + " x_hidden = sigmoid(z_hidden)\n", + "\n", + " # Update x_prev such that next layer can use the output from this layer\n", + " x_prev = x_hidden\n", + "\n", + " ## Output layer:\n", + "\n", + " # Get the weights and bias for this layer\n", + " w_output = deep_params[-1]\n", + "\n", + " # Include bias:\n", + " x_prev = np.concatenate((np.ones((1,num_points)), x_prev), axis = 0)\n", + "\n", + " z_output = np.matmul(w_output, x_prev)\n", + " x_output = z_output\n", + "\n", + " return x_output[0][0]\n", + "\n", + "## The analytical solution\n", + "def g_analytic(point):\n", + " x,t = point\n", + " return np.sin(np.pi*x)*np.cos(np.pi*t) - np.sin(np.pi*x)*np.sin(np.pi*t)\n", + "\n", + "def solve_pde_deep_neural_network(x,t, num_neurons, num_iter, lmb):\n", + " ## Set up initial weigths and biases\n", + " N_hidden = np.size(num_neurons)\n", + "\n", + " ## Set up initial weigths and biases\n", + "\n", + " # Initialize the list of parameters:\n", + " P = [None]*(N_hidden + 1) # + 1 to include the output layer\n", + "\n", + " P[0] = npr.randn(num_neurons[0], 2 + 1 ) # 2 since we have two points, +1 to include bias\n", + " for l in range(1,N_hidden):\n", + " P[l] = npr.randn(num_neurons[l], num_neurons[l-1] + 1) # +1 to include bias\n", + "\n", + " # For the output layer\n", + " P[-1] = npr.randn(1, num_neurons[-1] + 1 ) # +1 since bias is included\n", + "\n", + " print('Initial cost: ',cost_function(P, x, t))\n", + "\n", + " cost_function_grad = grad(cost_function,0)\n", + "\n", + " # Let the update be done num_iter times\n", + " for i in range(num_iter):\n", + " cost_grad = cost_function_grad(P, x , t)\n", + "\n", + " for l in range(N_hidden+1):\n", + " P[l] = P[l] - lmb * cost_grad[l]\n", + "\n", + "\n", + " print('Final cost: ',cost_function(P, x, t))\n", + "\n", + " return P\n", + "\n", + "if __name__ == '__main__':\n", + " ### Use the neural network:\n", + " npr.seed(15)\n", + "\n", + " ## Decide the vales of arguments to the function to solve\n", + " Nx = 10; Nt = 10\n", + " x = np.linspace(0, 1, Nx)\n", + " t = np.linspace(0,1,Nt)\n", + "\n", + " ## Set up the parameters for the network\n", + " num_hidden_neurons = [50,20]\n", + " num_iter = 1000\n", + " lmb = 0.01\n", + "\n", + " P = solve_pde_deep_neural_network(x,t, num_hidden_neurons, num_iter, lmb)\n", + "\n", + " ## Store the results\n", + " res = np.zeros((Nx, Nt))\n", + " res_analytical = np.zeros((Nx, Nt))\n", + " for i,x_ in enumerate(x):\n", + " for j, t_ in enumerate(t):\n", + " point = np.array([x_, t_])\n", + " res[i,j] = g_trial(point,P)\n", + "\n", + " res_analytical[i,j] = g_analytic(point)\n", + "\n", + " diff = np.abs(res - res_analytical)\n", + " print(\"Max difference between analytical and solution from nn: %g\"%np.max(diff))\n", + "\n", + " ## Plot the solutions in two dimensions, that being in position and time\n", + "\n", + " T,X = 
np.meshgrid(t,x)\n", + "\n", + " fig = plt.figure(figsize=(10,10))\n", + " ax = fig.add_suplot(projection='3d')\n", + " ax.set_title('Solution from the deep neural network w/ %d layer'%len(num_hidden_neurons))\n", + " s = ax.plot_surface(T,X,res,linewidth=0,antialiased=False,cmap=cm.viridis)\n", + " ax.set_xlabel('Time $t$')\n", + " ax.set_ylabel('Position $x$');\n", + "\n", + "\n", + " fig = plt.figure(figsize=(10,10))\n", + " ax = fig.add_suplot(projection='3d')\n", + " ax.set_title('Analytical solution')\n", + " s = ax.plot_surface(T,X,res_analytical,linewidth=0,antialiased=False,cmap=cm.viridis)\n", + " ax.set_xlabel('Time $t$')\n", + " ax.set_ylabel('Position $x$');\n", + "\n", + "\n", + " fig = plt.figure(figsize=(10,10))\n", + " ax = fig.add_suplot(projection='3d')\n", + " ax.set_title('Difference')\n", + " s = ax.plot_surface(T,X,diff,linewidth=0,antialiased=False,cmap=cm.viridis)\n", + " ax.set_xlabel('Time $t$')\n", + " ax.set_ylabel('Position $x$');\n", + "\n", + " ## Take some slices of the 3D plots just to see the solutions at particular times\n", + " indx1 = 0\n", + " indx2 = int(Nt/2)\n", + " indx3 = Nt-1\n", + "\n", + " t1 = t[indx1]\n", + " t2 = t[indx2]\n", + " t3 = t[indx3]\n", + "\n", + " # Slice the results from the DNN\n", + " res1 = res[:,indx1]\n", + " res2 = res[:,indx2]\n", + " res3 = res[:,indx3]\n", + "\n", + " # Slice the analytical results\n", + " res_analytical1 = res_analytical[:,indx1]\n", + " res_analytical2 = res_analytical[:,indx2]\n", + " res_analytical3 = res_analytical[:,indx3]\n", + "\n", + " # Plot the slices\n", + " plt.figure(figsize=(10,10))\n", + " plt.title(\"Computed solutions at time = %g\"%t1)\n", + " plt.plot(x, res1)\n", + " plt.plot(x,res_analytical1)\n", + " plt.legend(['dnn','analytical'])\n", + "\n", + " plt.figure(figsize=(10,10))\n", + " plt.title(\"Computed solutions at time = %g\"%t2)\n", + " plt.plot(x, res2)\n", + " plt.plot(x,res_analytical2)\n", + " plt.legend(['dnn','analytical'])\n", + "\n", + " plt.figure(figsize=(10,10))\n", + " plt.title(\"Computed solutions at time = %g\"%t3)\n", + " plt.plot(x, res3)\n", + " plt.plot(x,res_analytical3)\n", + " plt.legend(['dnn','analytical'])\n", + "\n", + " plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "95dea76f", + "metadata": { + "editable": true + }, + "source": [ + "## Resources on differential equations and deep learning\n", + "\n", + "1. [Artificial neural networks for solving ordinary and partial differential equations by I.E. Lagaris et al](https://pdfs.semanticscholar.org/d061/df393e0e8fbfd0ea24976458b7d42419040d.pdf)\n", + "\n", + "2. [Neural networks for solving differential equations by A. Honchar](https://becominghuman.ai/neural-networks-for-solving-differential-equations-fa230ac5e04c)\n", + "\n", + "3. [Solving differential equations using neural networks by M.M Chiaramonte and M. Kiener](http://cs229.stanford.edu/proj2013/ChiaramonteKiener-SolvingDifferentialEquationsUsingNeuralNetworks.pdf)\n", + "\n", + "4. [Introduction to Partial Differential Equations by A. Tveito, R. 
Winther](https://www.springer.com/us/book/9783540225515)" + ] + } + ], + "metadata": {}, + "nbformat": 4, + "nbformat_minor": 5 +} \ No newline at end of file diff --git a/doc/LectureNotes/_build/jupyter_execute/week44.ipynb b/doc/LectureNotes/_build/jupyter_execute/week44.ipynb new file mode 100644 index 000000000..9e0c9b8bd --- /dev/null +++ b/doc/LectureNotes/_build/jupyter_execute/week44.ipynb @@ -0,0 +1,4983 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "67995f17", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "d31bb6a0", + "metadata": { + "editable": true + }, + "source": [ + "# Week 44, Solving differential equations with neural networks and start Convolutional Neural Networks (CNN)\n", + "**Morten Hjorth-Jensen**, Department of Physics, University of Oslo, Norway\n", + "\n", + "Date: **Week 44**" + ] + }, + { + "cell_type": "markdown", + "id": "846f5bd7", + "metadata": { + "editable": true + }, + "source": [ + "## Plan for week 44\n", + "\n", + "**Material for the lecture Monday October 27, 2025.**\n", + "\n", + "1. Solving differential equations, continuation from last week, first lecture\n", + "\n", + "2. Convolutional Neural Networks, second lecture\n", + "\n", + "3. Readings and Videos:\n", + "\n", + " * These lecture notes at \n", + "\n", + " * For a more in depth discussion on neural networks we recommend Goodfellow et al chapter 9. See also chapter 11 and 12 on practicalities and applications\n", + "\n", + " * Reading suggestions for implementation of CNNs see Rashcka et al.'s chapter 14 at . \n", + "\n", + " * Video on Deep Learning at \n", + "\n", + " * Video on Convolutional Neural Networks from MIT at \n", + "\n", + " * Video on CNNs from Stanford at \n", + "\n", + " * Video of lecture October 27 at \n", + "\n", + " * Whiteboard notes at " + ] + }, + { + "cell_type": "markdown", + "id": "855f98ab", + "metadata": { + "editable": true + }, + "source": [ + "## Lab sessions on Tuesday and Wednesday\n", + "\n", + "* Main focus is discussion of and work on project 2\n", + "\n", + "* If you did not get time to finish the exercises from weeks 41-42, you can also keep working on them and hand in this coming Friday" + ] + }, + { + "cell_type": "markdown", + "id": "12675cc5", + "metadata": { + "editable": true + }, + "source": [ + "## Material for Lecture Monday October 27" + ] + }, + { + "cell_type": "markdown", + "id": "f714320f", + "metadata": { + "editable": true + }, + "source": [ + "## Solving differential equations with Deep Learning\n", + "\n", + "The Universal Approximation Theorem states that a neural network can\n", + "approximate any function at a single hidden layer along with one input\n", + "and output layer to any given precision.\n", + "\n", + "**Book on solving differential equations with ML methods.**\n", + "\n", + "[An Introduction to Neural Network Methods for Differential Equations](https://www.springer.com/gp/book/9789401798150), by Yadav and Kumar.\n", + "\n", + "**Physics informed neural networks.**\n", + "\n", + "[Scientific Machine Learning Through Physics–Informed Neural Networks: Where we are and What’s Next](https://link.springer.com/article/10.1007/s10915-022-01939-z), by Cuomo et al\n", + "\n", + "**Thanks to Kristine Baluka Hein.**\n", + "\n", + "The lectures on differential equations were developed by Kristine Baluka Hein, now PhD student at IFI.\n", + "A great thanks to Kristine." 
+ ] + }, + { + "cell_type": "markdown", + "id": "ebe354b6", + "metadata": { + "editable": true + }, + "source": [ + "## Ordinary Differential Equations first\n", + "\n", + "An ordinary differential equation (ODE) is an equation involving functions having one variable.\n", + "\n", + "In general, an ordinary differential equation looks like" + ] + }, + { + "cell_type": "markdown", + "id": "f16621c0", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation} \\label{ode} \\tag{1}\n", + "f\\left(x, \\, g(x), \\, g'(x), \\, g''(x), \\, \\dots \\, , \\, g^{(n)}(x)\\right) = 0\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "2b272a0d", + "metadata": { + "editable": true + }, + "source": [ + "where $g(x)$ is the function to find, and $g^{(n)}(x)$ is the $n$-th derivative of $g(x)$.\n", + "\n", + "The $f\\left(x, g(x), g'(x), g''(x), \\, \\dots \\, , g^{(n)}(x)\\right)$ is just a way to write that there is an expression involving $x$ and $g(x), \\ g'(x), \\ g''(x), \\, \\dots \\, , \\text{ and } g^{(n)}(x)$ on the left side of the equality sign in ([1](#ode)).\n", + "The highest order of derivative, that is the value of $n$, determines to the order of the equation.\n", + "The equation is referred to as a $n$-th order ODE.\n", + "Along with ([1](#ode)), some additional conditions of the function $g(x)$ are typically given\n", + "for the solution to be unique." + ] + }, + { + "cell_type": "markdown", + "id": "611b2399", + "metadata": { + "editable": true + }, + "source": [ + "## The trial solution\n", + "\n", + "Let the trial solution $g_t(x)$ be" + ] + }, + { + "cell_type": "markdown", + "id": "cab2d9fb", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation}\n", + "\tg_t(x) = h_1(x) + h_2(x,N(x,P))\n", + "\\label{_auto1} \\tag{2}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "fbd68a84", + "metadata": { + "editable": true + }, + "source": [ + "where $h_1(x)$ is a function that makes $g_t(x)$ satisfy a given set\n", + "of conditions, $N(x,P)$ a neural network with weights and biases\n", + "described by $P$ and $h_2(x, N(x,P))$ some expression involving the\n", + "neural network. The role of the function $h_2(x, N(x,P))$, is to\n", + "ensure that the output from $N(x,P)$ is zero when $g_t(x)$ is\n", + "evaluated at the values of $x$ where the given conditions must be\n", + "satisfied. The function $h_1(x)$ should alone make $g_t(x)$ satisfy\n", + "the conditions.\n", + "\n", + "But what about the network $N(x,P)$?\n", + "\n", + "As described previously, an optimization method could be used to minimize the parameters of a neural network, that being its weights and biases, through backward propagation." + ] + }, + { + "cell_type": "markdown", + "id": "24929e78", + "metadata": { + "editable": true + }, + "source": [ + "## Minimization process\n", + "\n", + "For the minimization to be defined, we need to have a cost function at hand to minimize.\n", + "\n", + "It is given that $f\\left(x, \\, g(x), \\, g'(x), \\, g''(x), \\, \\dots \\, , \\, g^{(n)}(x)\\right)$ should be equal to zero in ([1](#ode)).\n", + "We can choose to consider the mean squared error as the cost function for an input $x$.\n", + "Since we are looking at one input, the cost function is just $f$ squared.\n", + "The cost function $c\\left(x, P \\right)$ can therefore be expressed as" + ] + }, + { + "cell_type": "markdown", + "id": "8da0a4d4", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C\\left(x, P\\right) = \\big(f\\left(x, \\, g(x), \\, g'(x), \\, g''(x), \\, \\dots \\, , \\, g^{(n)}(x)\\right)\\big)^2\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "3de8b89e", + "metadata": { + "editable": true + }, + "source": [ + "If $N$ inputs are given as a vector $\\boldsymbol{x}$ with elements $x_i$ for $i = 1,\\dots,N$,\n", + "the cost function becomes" + ] + }, + { + "cell_type": "markdown", + "id": "1275ce7a", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation} \\label{cost} \\tag{3}\n", + "\tC\\left(\\boldsymbol{x}, P\\right) = \\frac{1}{N} \\sum_{i=1}^N \\big(f\\left(x_i, \\, g(x_i), \\, g'(x_i), \\, g''(x_i), \\, \\dots \\, , \\, g^{(n)}(x_i)\\right)\\big)^2\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "a522e0fa", + "metadata": { + "editable": true + }, + "source": [ + "The neural net should then find the parameters $P$ that minimizes the cost function in\n", + "([3](#cost)) for a set of $N$ training samples $x_i$." + ] + }, + { + "cell_type": "markdown", + "id": "8a18955b", + "metadata": { + "editable": true + }, + "source": [ + "## Minimizing the cost function using gradient descent and automatic differentiation\n", + "\n", + "To perform the minimization using gradient descent, the gradient of $C\\left(\\boldsymbol{x}, P\\right)$ is needed.\n", + "It might happen so that finding an analytical expression of the gradient of $C(\\boldsymbol{x}, P)$ from ([3](#cost)) gets too messy, depending on which cost function one desires to use.\n", + "\n", + "Luckily, there exists libraries that makes the job for us through automatic differentiation.\n", + "Automatic differentiation is a method of finding the derivatives numerically with very high precision." + ] + }, + { + "cell_type": "markdown", + "id": "888808f7", + "metadata": { + "editable": true + }, + "source": [ + "## Example: Exponential decay\n", + "\n", + "An exponential decay of a quantity $g(x)$ is described by the equation" + ] + }, + { + "cell_type": "markdown", + "id": "fcefd7fb", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation} \\label{solve_expdec} \\tag{4}\n", + " g'(x) = -\\gamma g(x)\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "02cb2ce9", + "metadata": { + "editable": true + }, + "source": [ + "with $g(0) = g_0$ for some chosen initial value $g_0$.\n", + "\n", + "The analytical solution of ([4](#solve_expdec)) is" + ] + }, + { + "cell_type": "markdown", + "id": "bdd9ef4d", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation}\n", + " g(x) = g_0 \\exp\\left(-\\gamma x\\right)\n", + "\\label{_auto2} \\tag{5}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "867cbb56", + "metadata": { + "editable": true + }, + "source": [ + "Having an analytical solution at hand, it is possible to use it to compare how well a neural network finds a solution of ([4](#solve_expdec))." + ] + }, + { + "cell_type": "markdown", + "id": "2f9ac7ae", + "metadata": { + "editable": true + }, + "source": [ + "## The function to solve for\n", + "\n", + "The program will use a neural network to solve" + ] + }, + { + "cell_type": "markdown", + "id": "49a68337", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation} \\label{solveode} \\tag{6}\n", + "g'(x) = -\\gamma g(x)\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "a6a70316", + "metadata": { + "editable": true + }, + "source": [ + "where $g(0) = g_0$ with $\\gamma$ and $g_0$ being some chosen values.\n", + "\n", + "In this example, $\\gamma = 2$ and $g_0 = 10$." + ] + }, + { + "cell_type": "markdown", + "id": "15622597", + "metadata": { + "editable": true + }, + "source": [ + "## The trial solution\n", + "To begin with, a trial solution $g_t(t)$ must be chosen. A general trial solution for ordinary differential equations could be" + ] + }, + { + "cell_type": "markdown", + "id": "3661d5fe", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "g_t(x, P) = h_1(x) + h_2(x, N(x, P))\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "245327b3", + "metadata": { + "editable": true + }, + "source": [ + "with $h_1(x)$ ensuring that $g_t(x)$ satisfies some conditions and $h_2(x,N(x, P))$ an expression involving $x$ and the output from the neural network $N(x,P)$ with $P $ being the collection of the weights and biases for each layer. For now, it is assumed that the network consists of one input layer, one hidden layer, and one output layer." + ] + }, + { + "cell_type": "markdown", + "id": "57ae96b2", + "metadata": { + "editable": true + }, + "source": [ + "## Setup of Network\n", + "\n", + "In this network, there are no weights and bias at the input layer, so $P = \\{ P_{\\text{hidden}}, P_{\\text{output}} \\}$.\n", + "If there are $N_{\\text{hidden} }$ neurons in the hidden layer, then $P_{\\text{hidden}}$ is a $N_{\\text{hidden} } \\times (1 + N_{\\text{input}})$ matrix, given that there are $N_{\\text{input}}$ neurons in the input layer.\n", + "\n", + "The first column in $P_{\\text{hidden} }$ represents the bias for each neuron in the hidden layer and the second column represents the weights for each neuron in the hidden layer from the input layer.\n", + "If there are $N_{\\text{output} }$ neurons in the output layer, then $P_{\\text{output}} $ is a $N_{\\text{output} } \\times (1 + N_{\\text{hidden} })$ matrix.\n", + "\n", + "Its first column represents the bias of each neuron and the remaining columns represents the weights to each neuron.\n", + "\n", + "It is given that $g(0) = g_0$. The trial solution must fulfill this condition to be a proper solution of ([6](#solveode)). A possible way to ensure that $g_t(0, P) = g_0$, is to let $F(N(x,P)) = x \\cdot N(x,P)$ and $h_1(x) = g_0$. This gives the following trial solution:" + ] + }, + { + "cell_type": "markdown", + "id": "6e7ea73f", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation} \\label{trial} \\tag{7}\n", + "g_t(x, P) = g_0 + x \\cdot N(x, P)\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "3ef84086", + "metadata": { + "editable": true + }, + "source": [ + "## Reformulating the problem\n", + "\n", + "We wish that our neural network manages to minimize a given cost function.\n", + "\n", + "A reformulation of out equation, ([6](#solveode)), must therefore be done,\n", + "such that it describes the problem a neural network can solve for.\n", + "\n", + "The neural network must find the set of weights and biases $P$ such that the trial solution in ([7](#trial)) satisfies ([6](#solveode)).\n", + "\n", + "The trial solution" + ] + }, + { + "cell_type": "markdown", + "id": "03980965", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "g_t(x, P) = g_0 + x \\cdot N(x, P)\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "f838bf7c", + "metadata": { + "editable": true + }, + "source": [ + "has been chosen such that it already solves the condition $g(0) = g_0$. What remains, is to find $P$ such that" + ] + }, + { + "cell_type": "markdown", + "id": "3e1ebb62", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation} \\label{nnmin} \\tag{8}\n", + "g_t'(x, P) = - \\gamma g_t(x, P)\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "a85dcbea", + "metadata": { + "editable": true + }, + "source": [ + "is fulfilled as *best as possible*." + ] + }, + { + "cell_type": "markdown", + "id": "dc4a2fc0", + "metadata": { + "editable": true + }, + "source": [ + "## More technicalities\n", + "\n", + "The left hand side and right hand side of ([8](#nnmin)) must be computed separately, and then the neural network must choose weights and biases, contained in $P$, such that the sides are equal as best as possible.\n", + "This means that the absolute or squared difference between the sides must be as close to zero, ideally equal to zero.\n", + "In this case, the difference squared shows to be an appropriate measurement of how erroneous the trial solution is with respect to $P$ of the neural network.\n", + "\n", + "This gives the following cost function our neural network must solve for:" + ] + }, + { + "cell_type": "markdown", + "id": "20921b20", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\min_{P}\\Big\\{ \\big(g_t'(x, P) - ( -\\gamma g_t(x, P) \\big)^2 \\Big\\}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "06e89d99", + "metadata": { + "editable": true + }, + "source": [ + "(the notation $\\min_{P}\\{ f(x, P) \\}$ means that we desire to find $P$ that yields the minimum of $f(x, P)$)\n", + "\n", + "or, in terms of weights and biases for the hidden and output layer in our network:" + ] + }, + { + "cell_type": "markdown", + "id": "fb4b7d00", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\min_{P_{\\text{hidden} }, \\ P_{\\text{output} }}\\Big\\{ \\big(g_t'(x, \\{ P_{\\text{hidden} }, P_{\\text{output} }\\}) - ( -\\gamma g_t(x, \\{ P_{\\text{hidden} }, P_{\\text{output} }\\}) \\big)^2 \\Big\\}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "925d8872", + "metadata": { + "editable": true + }, + "source": [ + "for an input value $x$." + ] + }, + { + "cell_type": "markdown", + "id": "46f38d69", + "metadata": { + "editable": true + }, + "source": [ + "## More details\n", + "\n", + "If the neural network evaluates $g_t(x, P)$ at more values for $x$, say $N$ values $x_i$ for $i = 1, \\dots, N$, then the *total* error to minimize becomes" + ] + }, + { + "cell_type": "markdown", + "id": "adca56df", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation} \\label{min} \\tag{9}\n", + "\\min_{P}\\Big\\{\\frac{1}{N} \\sum_{i=1}^N \\big(g_t'(x_i, P) - ( -\\gamma g_t(x_i, P) \\big)^2 \\Big\\}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "9e260216", + "metadata": { + "editable": true + }, + "source": [ + "Letting $\\boldsymbol{x}$ be a vector with elements $x_i$ and $C(\\boldsymbol{x}, P) = \\frac{1}{N} \\sum_i \\big(g_t'(x_i, P) - ( -\\gamma g_t(x_i, P) \\big)^2$ denote the cost function, the minimization problem that our network must solve, becomes" + ] + }, + { + "cell_type": "markdown", + "id": "7d5e7f63", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\min_{P} C(\\boldsymbol{x}, P)\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "d7442d44", + "metadata": { + "editable": true + }, + "source": [ + "In terms of $P_{\\text{hidden} }$ and $P_{\\text{output} }$, this could also be expressed as\n", + "\n", + "$$\n", + "\\min_{P_{\\text{hidden} }, \\ P_{\\text{output} }} C(\\boldsymbol{x}, \\{P_{\\text{hidden} }, P_{\\text{output} }\\})\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "af21673a", + "metadata": { + "editable": true + }, + "source": [ + "## A possible implementation of a neural network\n", + "\n", + "For simplicity, it is assumed that the input is an array $\\boldsymbol{x} = (x_1, \\dots, x_N)$ with $N$ elements. It is at these points the neural network should find $P$ such that it fulfills ([9](#min)).\n", + "\n", + "First, the neural network must feed forward the inputs.\n", + "This means that $\\boldsymbol{x}s$ must be passed through an input layer, a hidden layer and a output layer. The input layer in this case, does not need to process the data any further.\n", + "The input layer will consist of $N_{\\text{input} }$ neurons, passing its element to each neuron in the hidden layer. The number of neurons in the hidden layer will be $N_{\\text{hidden} }$." 
+ ] + }, + { + "cell_type": "markdown", + "id": "6687f370", + "metadata": { + "editable": true + }, + "source": [ + "## Technicalities\n", + "\n", + "For the $i$-th in the hidden layer with weight $w_i^{\\text{hidden} }$ and bias $b_i^{\\text{hidden} }$, the weighting from the $j$-th neuron at the input layer is:" + ] + }, + { + "cell_type": "markdown", + "id": "7c07e210", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{aligned}\n", + "z_{i,j}^{\\text{hidden}} &= b_i^{\\text{hidden}} + w_i^{\\text{hidden}}x_j \\\\\n", + "&=\n", + "\\begin{pmatrix}\n", + "b_i^{\\text{hidden}} & w_i^{\\text{hidden}}\n", + "\\end{pmatrix}\n", + "\\begin{pmatrix}\n", + "1 \\\\\n", + "x_j\n", + "\\end{pmatrix}\n", + "\\end{aligned}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "7747386f", + "metadata": { + "editable": true + }, + "source": [ + "## Final technicalities I\n", + "\n", + "The result after weighting the inputs at the $i$-th hidden neuron can be written as a vector:" + ] + }, + { + "cell_type": "markdown", + "id": "981c5e4b", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{aligned}\n", + "\\boldsymbol{z}_{i}^{\\text{hidden}} &= \\Big( b_i^{\\text{hidden}} + w_i^{\\text{hidden}}x_1 , \\ b_i^{\\text{hidden}} + w_i^{\\text{hidden}} x_2, \\ \\dots \\, , \\ b_i^{\\text{hidden}} + w_i^{\\text{hidden}} x_N\\Big) \\\\\n", + "&=\n", + "\\begin{pmatrix}\n", + " b_i^{\\text{hidden}} & w_i^{\\text{hidden}}\n", + "\\end{pmatrix}\n", + "\\begin{pmatrix}\n", + "1 & 1 & \\dots & 1 \\\\\n", + "x_1 & x_2 & \\dots & x_N\n", + "\\end{pmatrix} \\\\\n", + "&= \\boldsymbol{p}_{i, \\text{hidden}}^T X\n", + "\\end{aligned}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "7eedb1ed", + "metadata": { + "editable": true + }, + "source": [ + "## Final technicalities II\n", + "\n", + "The vector $\\boldsymbol{p}_{i, \\text{hidden}}^T$ constitutes each row in $P_{\\text{hidden} }$, which contains the weights for the neural network to minimize according to ([9](#min)).\n", + "\n", + "After having found $\\boldsymbol{z}_{i}^{\\text{hidden}} $ for every $i$-th neuron within the hidden layer, the vector will be sent to an activation function $a_i(\\boldsymbol{z})$.\n", + "\n", + "In this example, the sigmoid function has been chosen to be the activation function for each hidden neuron:" + ] + }, + { + "cell_type": "markdown", + "id": "8507388c", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "f(z) = \\frac{1}{1 + \\exp{(-z)}}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "32c6ce19", + "metadata": { + "editable": true + }, + "source": [ + "It is possible to use other activations functions for the hidden layer also.\n", + "\n", + "The output $\\boldsymbol{x}_i^{\\text{hidden}}$ from each $i$-th hidden neuron is:\n", + "\n", + "$$\n", + "\\boldsymbol{x}_i^{\\text{hidden} } = f\\big( \\boldsymbol{z}_{i}^{\\text{hidden}} \\big)\n", + "$$\n", + "\n", + "The outputs $\\boldsymbol{x}_i^{\\text{hidden} } $ are then sent to the output layer.\n", + "\n", + "The output layer consists of one neuron in this case, and combines the\n", + "output from each of the neurons in the hidden layers. The output layer\n", + "combines the results from the hidden layer using some weights $w_i^{\\text{output}}$\n", + "and biases $b_i^{\\text{output}}$. In this case,\n", + "it is assumes that the number of neurons in the output layer is one." 
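The matrix expressions above are easy to get wrong in code, so here is a minimal shape check (an illustration added to these notes, with arbitrary sizes) of the hidden-layer computation $\boldsymbol{z}_{i}^{\text{hidden}} = \boldsymbol{p}_{i, \text{hidden}}^T X$ using the bias-row trick.

```python
# Shape check (illustration only) of the hidden-layer computation described above.
# One input feature per sample; the sizes N and N_hidden are arbitrary.
import numpy as np

def sigmoid(z):
    return 1/(1 + np.exp(-z))

N = 10          # number of input values x_j
N_hidden = 5    # number of neurons in the hidden layer

x = np.linspace(0, 1, N)

# Stack a row of ones on top of x so that each column is (1, x_j)^T
X = np.concatenate((np.ones((1, N)), x.reshape(1, N)), axis=0)   # shape (2, N)

# Row i of P_hidden holds (b_i^hidden, w_i^hidden)
P_hidden = np.random.randn(N_hidden, 2)                          # shape (N_hidden, 2)

Z_hidden = P_hidden @ X          # shape (N_hidden, N); row i is z_i^hidden
X_hidden = sigmoid(Z_hidden)     # hidden-layer output, same shape

print(X.shape, P_hidden.shape, Z_hidden.shape, X_hidden.shape)
```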
+ ] + }, + { + "cell_type": "markdown", + "id": "d3adb503", + "metadata": { + "editable": true + }, + "source": [ + "## Final technicalities III\n", + "\n", + "The procedure of weighting the output neuron $j$ in the hidden layer to the $i$-th neuron in the output layer is similar as for the hidden layer described previously." + ] + }, + { + "cell_type": "markdown", + "id": "41fb7d85", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{aligned}\n", + "z_{1,j}^{\\text{output}} & =\n", + "\\begin{pmatrix}\n", + "b_1^{\\text{output}} & \\boldsymbol{w}_1^{\\text{output}}\n", + "\\end{pmatrix}\n", + "\\begin{pmatrix}\n", + "1 \\\\\n", + "\\boldsymbol{x}_j^{\\text{hidden}}\n", + "\\end{pmatrix}\n", + "\\end{aligned}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "6af6c5f6", + "metadata": { + "editable": true + }, + "source": [ + "## Final technicalities IV\n", + "\n", + "Expressing $z_{1,j}^{\\text{output}}$ as a vector gives the following way of weighting the inputs from the hidden layer:" + ] + }, + { + "cell_type": "markdown", + "id": "bfdfcfe5", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{z}_{1}^{\\text{output}} =\n", + "\\begin{pmatrix}\n", + "b_1^{\\text{output}} & \\boldsymbol{w}_1^{\\text{output}}\n", + "\\end{pmatrix}\n", + "\\begin{pmatrix}\n", + "1 & 1 & \\dots & 1 \\\\\n", + "\\boldsymbol{x}_1^{\\text{hidden}} & \\boldsymbol{x}_2^{\\text{hidden}} & \\dots & \\boldsymbol{x}_N^{\\text{hidden}}\n", + "\\end{pmatrix}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "224fb7a0", + "metadata": { + "editable": true + }, + "source": [ + "In this case we seek a continuous range of values since we are approximating a function. This means that after computing $\\boldsymbol{z}_{1}^{\\text{output}}$ the neural network has finished its feed forward step, and $\\boldsymbol{z}_{1}^{\\text{output}}$ is the final output of the network." + ] + }, + { + "cell_type": "markdown", + "id": "03c8c39e", + "metadata": { + "editable": true + }, + "source": [ + "## Back propagation\n", + "\n", + "The next step is to decide how the parameters should be changed such that they minimize the cost function.\n", + "\n", + "The chosen cost function for this problem is" + ] + }, + { + "cell_type": "markdown", + "id": "f467feb4", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C(\\boldsymbol{x}, P) = \\frac{1}{N} \\sum_i \\big(g_t'(x_i, P) - ( -\\gamma g_t(x_i, P) \\big)^2\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "287a0aed", + "metadata": { + "editable": true + }, + "source": [ + "In order to minimize the cost function, an optimization method must be chosen.\n", + "\n", + "Here, gradient descent with a constant step size has been chosen." 
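The update rule for gradient descent with a constant step size is written out in the next section. As a warm-up, the sketch below (an added illustration, not from the original notes) applies exactly that rule to a toy one-parameter cost using autograd; the cost function, step size and number of iterations are arbitrary choices.

```python
# Illustration only: gradient descent with a constant step size on a toy cost,
# isolating the update rule that the ODE solver below applies to the full parameter set P.
from autograd import grad

def cost(w):
    return (w - 3.0)**2          # a simple cost with its minimum at w = 3

cost_grad = grad(cost)           # function returning dC/dw

w = 0.0                          # initial guess
lmb = 0.1                        # constant step size (learning rate)

for _ in range(100):
    w = w - lmb*cost_grad(w)     # the gradient descent update

print('w after gradient descent: %g' % w)   # prints a value close to 3
```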
+ ] + }, + { + "cell_type": "markdown", + "id": "a49835f1", + "metadata": { + "editable": true + }, + "source": [ + "## Gradient descent\n", + "\n", + "The idea of the gradient descent algorithm is to update parameters in\n", + "a direction where the cost function decreases goes to a minimum.\n", + "\n", + "In general, the update of some parameters $\\boldsymbol{\\omega}$ given a cost\n", + "function defined by some weights $\\boldsymbol{\\omega}$, $C(\\boldsymbol{x},\n", + "\\boldsymbol{\\omega})$, goes as follows:" + ] + }, + { + "cell_type": "markdown", + "id": "62d6f51d", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{\\omega}_{\\text{new} } = \\boldsymbol{\\omega} - \\lambda \\nabla_{\\boldsymbol{\\omega}} C(\\boldsymbol{x}, \\boldsymbol{\\omega})\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "3ca20573", + "metadata": { + "editable": true + }, + "source": [ + "for a number of iterations or until $ \\big|\\big| \\boldsymbol{\\omega}_{\\text{new} } - \\boldsymbol{\\omega} \\big|\\big|$ becomes smaller than some given tolerance.\n", + "\n", + "The value of $\\lambda$ decides how large steps the algorithm must take\n", + "in the direction of $ \\nabla_{\\boldsymbol{\\omega}} C(\\boldsymbol{x}, \\boldsymbol{\\omega})$.\n", + "The notation $\\nabla_{\\boldsymbol{\\omega}}$ express the gradient with respect\n", + "to the elements in $\\boldsymbol{\\omega}$.\n", + "\n", + "In our case, we have to minimize the cost function $C(\\boldsymbol{x}, P)$ with\n", + "respect to the two sets of weights and biases, that is for the hidden\n", + "layer $P_{\\text{hidden} }$ and for the output layer $P_{\\text{output}\n", + "}$ .\n", + "\n", + "This means that $P_{\\text{hidden} }$ and $P_{\\text{output} }$ is updated by" + ] + }, + { + "cell_type": "markdown", + "id": "8b16bc94", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{aligned}\n", + "P_{\\text{hidden},\\text{new}} &= P_{\\text{hidden}} - \\lambda \\nabla_{P_{\\text{hidden}}} C(\\boldsymbol{x}, P) \\\\\n", + "P_{\\text{output},\\text{new}} &= P_{\\text{output}} - \\lambda \\nabla_{P_{\\text{output}}} C(\\boldsymbol{x}, P)\n", + "\\end{aligned}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "a339b3a7", + "metadata": { + "editable": true + }, + "source": [ + "## The code for solving the ODE" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "a63e587a", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "%matplotlib inline\n", + "\n", + "import autograd.numpy as np\n", + "from autograd import grad, elementwise_grad\n", + "import autograd.numpy.random as npr\n", + "from matplotlib import pyplot as plt\n", + "\n", + "def sigmoid(z):\n", + " return 1/(1 + np.exp(-z))\n", + "\n", + "# Assuming one input, hidden, and output layer\n", + "def neural_network(params, x):\n", + "\n", + " # Find the weights (including and biases) for the hidden and output layer.\n", + " # Assume that params is a list of parameters for each layer.\n", + " # The biases are the first element for each array in params,\n", + " # and the weights are the remaning elements in each array in params.\n", + "\n", + " w_hidden = params[0]\n", + " w_output = params[1]\n", + "\n", + " # Assumes input x being an one-dimensional array\n", + " num_values = np.size(x)\n", + " x = x.reshape(-1, num_values)\n", + "\n", + " # Assume that the input layer does nothing to the input x\n", + " x_input = x\n", + "\n", + " ## Hidden layer:\n", + "\n", + " # Add 
a row of ones to include bias\n", + " x_input = np.concatenate((np.ones((1,num_values)), x_input ), axis = 0)\n", + "\n", + " z_hidden = np.matmul(w_hidden, x_input)\n", + " x_hidden = sigmoid(z_hidden)\n", + "\n", + " ## Output layer:\n", + "\n", + " # Include bias:\n", + " x_hidden = np.concatenate((np.ones((1,num_values)), x_hidden ), axis = 0)\n", + "\n", + " z_output = np.matmul(w_output, x_hidden)\n", + " x_output = z_output\n", + "\n", + " return x_output\n", + "\n", + "# The trial solution using the deep neural network:\n", + "def g_trial(x,params, g0 = 10):\n", + " return g0 + x*neural_network(params,x)\n", + "\n", + "# The right side of the ODE:\n", + "def g(x, g_trial, gamma = 2):\n", + " return -gamma*g_trial\n", + "\n", + "# The cost function:\n", + "def cost_function(P, x):\n", + "\n", + " # Evaluate the trial function with the current parameters P\n", + " g_t = g_trial(x,P)\n", + "\n", + " # Find the derivative w.r.t x of the neural network\n", + " d_net_out = elementwise_grad(neural_network,1)(P,x)\n", + "\n", + " # Find the derivative w.r.t x of the trial function\n", + " d_g_t = elementwise_grad(g_trial,0)(x,P)\n", + "\n", + " # The right side of the ODE\n", + " func = g(x, g_t)\n", + "\n", + " err_sqr = (d_g_t - func)**2\n", + " cost_sum = np.sum(err_sqr)\n", + "\n", + " return cost_sum / np.size(err_sqr)\n", + "\n", + "# Solve the exponential decay ODE using neural network with one input, hidden, and output layer\n", + "def solve_ode_neural_network(x, num_neurons_hidden, num_iter, lmb):\n", + " ## Set up initial weights and biases\n", + "\n", + " # For the hidden layer\n", + " p0 = npr.randn(num_neurons_hidden, 2 )\n", + "\n", + " # For the output layer\n", + " p1 = npr.randn(1, num_neurons_hidden + 1 ) # +1 since bias is included\n", + "\n", + " P = [p0, p1]\n", + "\n", + " print('Initial cost: %g'%cost_function(P, x))\n", + "\n", + " ## Start finding the optimal weights using gradient descent\n", + "\n", + " # Find the Python function that represents the gradient of the cost function\n", + " # w.r.t the 0-th input argument -- that is the weights and biases in the hidden and output layer\n", + " cost_function_grad = grad(cost_function,0)\n", + "\n", + " # Let the update be done num_iter times\n", + " for i in range(num_iter):\n", + " # Evaluate the gradient at the current weights and biases in P.\n", + " # The cost_grad consist now of two arrays;\n", + " # one for the gradient w.r.t P_hidden and\n", + " # one for the gradient w.r.t P_output\n", + " cost_grad = cost_function_grad(P, x)\n", + "\n", + " P[0] = P[0] - lmb * cost_grad[0]\n", + " P[1] = P[1] - lmb * cost_grad[1]\n", + "\n", + " print('Final cost: %g'%cost_function(P, x))\n", + "\n", + " return P\n", + "\n", + "def g_analytic(x, gamma = 2, g0 = 10):\n", + " return g0*np.exp(-gamma*x)\n", + "\n", + "# Solve the given problem\n", + "if __name__ == '__main__':\n", + " # Set seed such that the weight are initialized\n", + " # with same weights and biases for every run.\n", + " npr.seed(15)\n", + "\n", + " ## Decide the vales of arguments to the function to solve\n", + " N = 10\n", + " x = np.linspace(0, 1, N)\n", + "\n", + " ## Set up the initial parameters\n", + " num_hidden_neurons = 10\n", + " num_iter = 10000\n", + " lmb = 0.001\n", + "\n", + " # Use the network\n", + " P = solve_ode_neural_network(x, num_hidden_neurons, num_iter, lmb)\n", + "\n", + " # Print the deviation from the trial solution and true solution\n", + " res = g_trial(x,P)\n", + " res_analytical = g_analytic(x)\n", + "\n", + " print('Max 
absolute difference: %g'%np.max(np.abs(res - res_analytical)))\n", + "\n", + " # Plot the results\n", + " plt.figure(figsize=(10,10))\n", + "\n", + " plt.title('Performance of neural network solving an ODE compared to the analytical solution')\n", + " plt.plot(x, res_analytical)\n", + " plt.plot(x, res[0,:])\n", + " plt.legend(['analytical','nn'])\n", + " plt.xlabel('x')\n", + " plt.ylabel('g(x)')\n", + " plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "85985bda", + "metadata": { + "editable": true + }, + "source": [ + "## The network with one input layer, specified number of hidden layers, and one output layer\n", + "\n", + "It is also possible to extend the construction of our network into a more general one, allowing the network to contain more than one hidden layers.\n", + "\n", + "The number of neurons within each hidden layer are given as a list of integers in the program below." + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "91831f8e", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import autograd.numpy as np\n", + "from autograd import grad, elementwise_grad\n", + "import autograd.numpy.random as npr\n", + "from matplotlib import pyplot as plt\n", + "\n", + "def sigmoid(z):\n", + " return 1/(1 + np.exp(-z))\n", + "\n", + "# The neural network with one input layer and one output layer,\n", + "# but with number of hidden layers specified by the user.\n", + "def deep_neural_network(deep_params, x):\n", + " # N_hidden is the number of hidden layers\n", + " # deep_params is a list, len() should be used\n", + " N_hidden = len(deep_params) - 1 # -1 since params consists of\n", + " # parameters to all the hidden\n", + " # layers AND the output layer.\n", + "\n", + " # Assumes input x being an one-dimensional array\n", + " num_values = np.size(x)\n", + " x = x.reshape(-1, num_values)\n", + "\n", + " # Assume that the input layer does nothing to the input x\n", + " x_input = x\n", + "\n", + " # Due to multiple hidden layers, define a variable referencing to the\n", + " # output of the previous layer:\n", + " x_prev = x_input\n", + "\n", + " ## Hidden layers:\n", + "\n", + " for l in range(N_hidden):\n", + " # From the list of parameters P; find the correct weigths and bias for this layer\n", + " w_hidden = deep_params[l]\n", + "\n", + " # Add a row of ones to include bias\n", + " x_prev = np.concatenate((np.ones((1,num_values)), x_prev ), axis = 0)\n", + "\n", + " z_hidden = np.matmul(w_hidden, x_prev)\n", + " x_hidden = sigmoid(z_hidden)\n", + "\n", + " # Update x_prev such that next layer can use the output from this layer\n", + " x_prev = x_hidden\n", + "\n", + " ## Output layer:\n", + "\n", + " # Get the weights and bias for this layer\n", + " w_output = deep_params[-1]\n", + "\n", + " # Include bias:\n", + " x_prev = np.concatenate((np.ones((1,num_values)), x_prev), axis = 0)\n", + "\n", + " z_output = np.matmul(w_output, x_prev)\n", + " x_output = z_output\n", + "\n", + " return x_output\n", + "\n", + "# The trial solution using the deep neural network:\n", + "def g_trial_deep(x,params, g0 = 10):\n", + " return g0 + x*deep_neural_network(params, x)\n", + "\n", + "# The right side of the ODE:\n", + "def g(x, g_trial, gamma = 2):\n", + " return -gamma*g_trial\n", + "\n", + "# The same cost function as before, but calls deep_neural_network instead.\n", + "def cost_function_deep(P, x):\n", + "\n", + " # Evaluate the trial function with the current parameters P\n", + " g_t = g_trial_deep(x,P)\n", + "\n", + " 
# Find the derivative w.r.t x of the neural network\n", + " d_net_out = elementwise_grad(deep_neural_network,1)(P,x)\n", + "\n", + " # Find the derivative w.r.t x of the trial function\n", + " d_g_t = elementwise_grad(g_trial_deep,0)(x,P)\n", + "\n", + " # The right side of the ODE\n", + " func = g(x, g_t)\n", + "\n", + " err_sqr = (d_g_t - func)**2\n", + " cost_sum = np.sum(err_sqr)\n", + "\n", + " return cost_sum / np.size(err_sqr)\n", + "\n", + "# Solve the exponential decay ODE using neural network with one input and one output layer,\n", + "# but with specified number of hidden layers from the user.\n", + "def solve_ode_deep_neural_network(x, num_neurons, num_iter, lmb):\n", + " # num_hidden_neurons is now a list of number of neurons within each hidden layer\n", + "\n", + " # The number of elements in the list num_hidden_neurons thus represents\n", + " # the number of hidden layers.\n", + "\n", + " # Find the number of hidden layers:\n", + " N_hidden = np.size(num_neurons)\n", + "\n", + " ## Set up initial weights and biases\n", + "\n", + " # Initialize the list of parameters:\n", + " P = [None]*(N_hidden + 1) # + 1 to include the output layer\n", + "\n", + " P[0] = npr.randn(num_neurons[0], 2 )\n", + " for l in range(1,N_hidden):\n", + " P[l] = npr.randn(num_neurons[l], num_neurons[l-1] + 1) # +1 to include bias\n", + "\n", + " # For the output layer\n", + " P[-1] = npr.randn(1, num_neurons[-1] + 1 ) # +1 since bias is included\n", + "\n", + " print('Initial cost: %g'%cost_function_deep(P, x))\n", + "\n", + " ## Start finding the optimal weights using gradient descent\n", + "\n", + " # Find the Python function that represents the gradient of the cost function\n", + " # w.r.t the 0-th input argument -- that is the weights and biases in the hidden and output layer\n", + " cost_function_deep_grad = grad(cost_function_deep,0)\n", + "\n", + " # Let the update be done num_iter times\n", + " for i in range(num_iter):\n", + " # Evaluate the gradient at the current weights and biases in P.\n", + " # The cost_grad consist now of N_hidden + 1 arrays; the gradient w.r.t the weights and biases\n", + " # in the hidden layers and output layers evaluated at x.\n", + " cost_deep_grad = cost_function_deep_grad(P, x)\n", + "\n", + " for l in range(N_hidden+1):\n", + " P[l] = P[l] - lmb * cost_deep_grad[l]\n", + "\n", + " print('Final cost: %g'%cost_function_deep(P, x))\n", + "\n", + " return P\n", + "\n", + "def g_analytic(x, gamma = 2, g0 = 10):\n", + " return g0*np.exp(-gamma*x)\n", + "\n", + "# Solve the given problem\n", + "if __name__ == '__main__':\n", + " npr.seed(15)\n", + "\n", + " ## Decide the vales of arguments to the function to solve\n", + " N = 10\n", + " x = np.linspace(0, 1, N)\n", + "\n", + " ## Set up the initial parameters\n", + " num_hidden_neurons = np.array([10,10])\n", + " num_iter = 10000\n", + " lmb = 0.001\n", + "\n", + " P = solve_ode_deep_neural_network(x, num_hidden_neurons, num_iter, lmb)\n", + "\n", + " res = g_trial_deep(x,P)\n", + " res_analytical = g_analytic(x)\n", + "\n", + " plt.figure(figsize=(10,10))\n", + "\n", + " plt.title('Performance of a deep neural network solving an ODE compared to the analytical solution')\n", + " plt.plot(x, res_analytical)\n", + " plt.plot(x, res[0,:])\n", + " plt.legend(['analytical','dnn'])\n", + " plt.ylabel('g(x)')\n", + " plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "e6de1553", + "metadata": { + "editable": true + }, + "source": [ + "## Example: Population growth\n", + "\n", + "A logistic model of population growth 
assumes that a population converges toward an equilibrium.\n", + "The population growth can be modeled by" + ] + }, + { + "cell_type": "markdown", + "id": "6e4c5e3a", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation} \\label{log} \\tag{10}\n", + "\tg'(t) = \\alpha g(t)(A - g(t))\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "64a97256", + "metadata": { + "editable": true + }, + "source": [ + "where $g(t)$ is the population density at time $t$, $\\alpha > 0$ the growth rate and $A > 0$ is the maximum population number in the environment.\n", + "Also, at $t = 0$ the population has the size $g(0) = g_0$, where $g_0$ is some chosen constant.\n", + "\n", + "In this example, similar network as for the exponential decay using Autograd has been used to solve the equation. However, as the implementation might suffer from e.g numerical instability\n", + "and high execution time (this might be more apparent in the examples solving PDEs),\n", + "using a library like TensorFlow is recommended.\n", + "Here, we stay with a more simple approach and implement for comparison, the simple forward Euler method." + ] + }, + { + "cell_type": "markdown", + "id": "94bb8aaa", + "metadata": { + "editable": true + }, + "source": [ + "## Setting up the problem\n", + "\n", + "Here, we will model a population $g(t)$ in an environment having carrying capacity $A$.\n", + "The population follows the model" + ] + }, + { + "cell_type": "markdown", + "id": "29ead54b", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation} \\label{solveode_population} \\tag{11}\n", + "g'(t) = \\alpha g(t)(A - g(t))\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "5685f6e2", + "metadata": { + "editable": true + }, + "source": [ + "where $g(0) = g_0$.\n", + "\n", + "In this example, we let $\\alpha = 2$, $A = 1$, and $g_0 = 1.2$." + ] + }, + { + "cell_type": "markdown", + "id": "adaea719", + "metadata": { + "editable": true + }, + "source": [ + "## The trial solution\n", + "\n", + "We will get a slightly different trial solution, as the boundary conditions are different\n", + "compared to the case for exponential decay.\n", + "\n", + "A possible trial solution satisfying the condition $g(0) = g_0$ could be\n", + "\n", + "$$\n", + "h_1(t) = g_0 + t \\cdot N(t,P)\n", + "$$\n", + "\n", + "with $N(t,P)$ being the output from the neural network with weights and biases for each layer collected in the set $P$.\n", + "\n", + "The analytical solution is\n", + "\n", + "$$\n", + "g(t) = \\frac{Ag_0}{g_0 + (A - g_0)\\exp(-\\alpha A t)}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "4ee7e543", + "metadata": { + "editable": true + }, + "source": [ + "## The program using Autograd\n", + "\n", + "The network will be the similar as for the exponential decay example, but with some small modifications for our problem." + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "e50f4369", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import autograd.numpy as np\n", + "from autograd import grad, elementwise_grad\n", + "import autograd.numpy.random as npr\n", + "from matplotlib import pyplot as plt\n", + "\n", + "def sigmoid(z):\n", + " return 1/(1 + np.exp(-z))\n", + "\n", + "# Function to get the parameters.\n", + "# Done such that one can easily change the paramaters after one's liking.\n", + "def get_parameters():\n", + " alpha = 2\n", + " A = 1\n", + " g0 = 1.2\n", + " return alpha, A, g0\n", + "\n", + "def deep_neural_network(deep_params, x):\n", + " # N_hidden is the number of hidden layers\n", + " # deep_params is a list, len() should be used\n", + " N_hidden = len(deep_params) - 1 # -1 since params consists of\n", + " # parameters to all the hidden\n", + " # layers AND the output layer.\n", + "\n", + " # Assumes input x being an one-dimensional array\n", + " num_values = np.size(x)\n", + " x = x.reshape(-1, num_values)\n", + "\n", + " # Assume that the input layer does nothing to the input x\n", + " x_input = x\n", + "\n", + " # Due to multiple hidden layers, define a variable referencing to the\n", + " # output of the previous layer:\n", + " x_prev = x_input\n", + "\n", + " ## Hidden layers:\n", + "\n", + " for l in range(N_hidden):\n", + " # From the list of parameters P; find the correct weigths and bias for this layer\n", + " w_hidden = deep_params[l]\n", + "\n", + " # Add a row of ones to include bias\n", + " x_prev = np.concatenate((np.ones((1,num_values)), x_prev ), axis = 0)\n", + "\n", + " z_hidden = np.matmul(w_hidden, x_prev)\n", + " x_hidden = sigmoid(z_hidden)\n", + "\n", + " # Update x_prev such that next layer can use the output from this layer\n", + " x_prev = x_hidden\n", + "\n", + " ## Output layer:\n", + "\n", + " # Get the weights and bias for this layer\n", + " w_output = deep_params[-1]\n", + "\n", + " # Include bias:\n", + " x_prev = np.concatenate((np.ones((1,num_values)), x_prev), axis = 0)\n", + "\n", + " z_output = np.matmul(w_output, x_prev)\n", + " x_output = 
z_output\n", + "\n", + " return x_output\n", + "\n", + "\n", + "\n", + "\n", + "def cost_function_deep(P, x):\n", + "\n", + " # Evaluate the trial function with the current parameters P\n", + " g_t = g_trial_deep(x,P)\n", + "\n", + " # Find the derivative w.r.t x of the trial function\n", + " d_g_t = elementwise_grad(g_trial_deep,0)(x,P)\n", + "\n", + " # The right side of the ODE\n", + " func = f(x, g_t)\n", + "\n", + " err_sqr = (d_g_t - func)**2\n", + " cost_sum = np.sum(err_sqr)\n", + "\n", + " return cost_sum / np.size(err_sqr)\n", + "\n", + "# The right side of the ODE:\n", + "def f(x, g_trial):\n", + " alpha,A, g0 = get_parameters()\n", + " return alpha*g_trial*(A - g_trial)\n", + "\n", + "# The trial solution using the deep neural network:\n", + "def g_trial_deep(x, params):\n", + " alpha,A, g0 = get_parameters()\n", + " return g0 + x*deep_neural_network(params,x)\n", + "\n", + "# The analytical solution:\n", + "def g_analytic(t):\n", + " alpha,A, g0 = get_parameters()\n", + " return A*g0/(g0 + (A - g0)*np.exp(-alpha*A*t))\n", + "\n", + "def solve_ode_deep_neural_network(x, num_neurons, num_iter, lmb):\n", + " # num_hidden_neurons is now a list of number of neurons within each hidden layer\n", + "\n", + " # Find the number of hidden layers:\n", + " N_hidden = np.size(num_neurons)\n", + "\n", + " ## Set up initial weigths and biases\n", + "\n", + " # Initialize the list of parameters:\n", + " P = [None]*(N_hidden + 1) # + 1 to include the output layer\n", + "\n", + " P[0] = npr.randn(num_neurons[0], 2 )\n", + " for l in range(1,N_hidden):\n", + " P[l] = npr.randn(num_neurons[l], num_neurons[l-1] + 1) # +1 to include bias\n", + "\n", + " # For the output layer\n", + " P[-1] = npr.randn(1, num_neurons[-1] + 1 ) # +1 since bias is included\n", + "\n", + " print('Initial cost: %g'%cost_function_deep(P, x))\n", + "\n", + " ## Start finding the optimal weigths using gradient descent\n", + "\n", + " # Find the Python function that represents the gradient of the cost function\n", + " # w.r.t the 0-th input argument -- that is the weights and biases in the hidden and output layer\n", + " cost_function_deep_grad = grad(cost_function_deep,0)\n", + "\n", + " # Let the update be done num_iter times\n", + " for i in range(num_iter):\n", + " # Evaluate the gradient at the current weights and biases in P.\n", + " # The cost_grad consist now of N_hidden + 1 arrays; the gradient w.r.t the weights and biases\n", + " # in the hidden layers and output layers evaluated at x.\n", + " cost_deep_grad = cost_function_deep_grad(P, x)\n", + "\n", + " for l in range(N_hidden+1):\n", + " P[l] = P[l] - lmb * cost_deep_grad[l]\n", + "\n", + " print('Final cost: %g'%cost_function_deep(P, x))\n", + "\n", + " return P\n", + "\n", + "if __name__ == '__main__':\n", + " npr.seed(4155)\n", + "\n", + " ## Decide the vales of arguments to the function to solve\n", + " Nt = 10\n", + " T = 1\n", + " t = np.linspace(0,T, Nt)\n", + "\n", + " ## Set up the initial parameters\n", + " num_hidden_neurons = [100, 50, 25]\n", + " num_iter = 1000\n", + " lmb = 1e-3\n", + "\n", + " P = solve_ode_deep_neural_network(t, num_hidden_neurons, num_iter, lmb)\n", + "\n", + " g_dnn_ag = g_trial_deep(t,P)\n", + " g_analytical = g_analytic(t)\n", + "\n", + " # Find the maximum absolute difference between the solutons:\n", + " diff_ag = np.max(np.abs(g_dnn_ag - g_analytical))\n", + " print(\"The max absolute difference between the solutions is: %g\"%diff_ag)\n", + "\n", + " plt.figure(figsize=(10,10))\n", + "\n", + " plt.title('Performance of 
neural network solving an ODE compared to the analytical solution')\n", + " plt.plot(t, g_analytical)\n", + " plt.plot(t, g_dnn_ag[0,:])\n", + " plt.legend(['analytical','nn'])\n", + " plt.xlabel('t')\n", + " plt.ylabel('g(t)')\n", + "\n", + " plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "cf212644", + "metadata": { + "editable": true + }, + "source": [ + "## Using forward Euler to solve the ODE\n", + "\n", + "A straightforward way of solving an ODE numerically, is to use Euler's method.\n", + "\n", + "Euler's method uses Taylor series to approximate the value at a function $f$ at a step $\\Delta x$ from $x$:\n", + "\n", + "$$\n", + "f(x + \\Delta x) \\approx f(x) + \\Delta x f'(x)\n", + "$$\n", + "\n", + "In our case, using Euler's method to approximate the value of $g$ at a step $\\Delta t$ from $t$ yields" + ] + }, + { + "cell_type": "markdown", + "id": "46f2fb77", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{aligned}\n", + " g(t + \\Delta t) &\\approx g(t) + \\Delta t g'(t) \\\\\n", + " &= g(t) + \\Delta t \\big(\\alpha g(t)(A - g(t))\\big)\n", + "\\end{aligned}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "aab2dfa5", + "metadata": { + "editable": true + }, + "source": [ + "along with the condition that $g(0) = g_0$.\n", + "\n", + "Let $t_i = i \\cdot \\Delta t$ where $\\Delta t = \\frac{T}{N_t-1}$ where $T$ is the final time our solver must solve for and $N_t$ the number of values for $t \\in [0, T]$ for $i = 0, \\dots, N_t-1$.\n", + "\n", + "For $i \\geq 1$, we have that" + ] + }, + { + "cell_type": "markdown", + "id": "8eea575e", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{aligned}\n", + "t_i &= i\\Delta t \\\\\n", + "&= (i - 1)\\Delta t + \\Delta t \\\\\n", + "&= t_{i-1} + \\Delta t\n", + "\\end{aligned}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b91b116d", + "metadata": { + "editable": true + }, + "source": [ + "Now, if $g_i = g(t_i)$ then" + ] + }, + { + "cell_type": "markdown", + "id": "b438159d", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation}\n", + " \\begin{aligned}\n", + " g_i &= g(t_i) \\\\\n", + " &= g(t_{i-1} + \\Delta t) \\\\\n", + " &\\approx g(t_{i-1}) + \\Delta t \\big(\\alpha g(t_{i-1})(A - g(t_{i-1}))\\big) \\\\\n", + " &= g_{i-1} + \\Delta t \\big(\\alpha g_{i-1}(A - g_{i-1})\\big)\n", + " \\end{aligned}\n", + "\\end{equation} \\label{odenum} \\tag{12}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "c4fcc89b", + "metadata": { + "editable": true + }, + "source": [ + "for $i \\geq 1$ and $g_0 = g(t_0) = g(0) = g_0$.\n", + "\n", + "Equation ([12](#odenum)) could be implemented in the following way,\n", + "extending the program that uses the network using Autograd:" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "98f55b29", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# Assume that all function definitions from the example program using Autograd\n", + "# are located here.\n", + "\n", + "if __name__ == '__main__':\n", + " npr.seed(4155)\n", + "\n", + " ## Decide the vales of arguments to the function to solve\n", + " Nt = 10\n", + " T = 1\n", + " t = np.linspace(0,T, Nt)\n", + "\n", + " ## Set up the initial parameters\n", + " num_hidden_neurons = [100,50,25]\n", + " num_iter = 1000\n", + " lmb = 1e-3\n", + "\n", + " P = solve_ode_deep_neural_network(t, num_hidden_neurons, num_iter, lmb)\n", + "\n", + " g_dnn_ag = g_trial_deep(t,P)\n", + " g_analytical = g_analytic(t)\n", + "\n", + " # Find the maximum absolute difference between the solutons:\n", + " diff_ag = np.max(np.abs(g_dnn_ag - g_analytical))\n", + " print(\"The max absolute difference between the solutions is: %g\"%diff_ag)\n", + "\n", + " plt.figure(figsize=(10,10))\n", + "\n", + " plt.title('Performance of neural network solving an ODE compared to the analytical solution')\n", + " plt.plot(t, g_analytical)\n", + " plt.plot(t, g_dnn_ag[0,:])\n", + " plt.legend(['analytical','nn'])\n", + " plt.xlabel('t')\n", + " plt.ylabel('g(t)')\n", + "\n", + " ## Find an approximation to the funtion using forward Euler\n", + "\n", + " alpha, A, g0 = get_parameters()\n", + " dt = T/(Nt - 1)\n", + "\n", + " # Perform forward Euler to solve the ODE\n", + " g_euler = np.zeros(Nt)\n", + " g_euler[0] = g0\n", + "\n", + " for i in range(1,Nt):\n", + " g_euler[i] = g_euler[i-1] + dt*(alpha*g_euler[i-1]*(A - g_euler[i-1]))\n", + "\n", + " # Print the errors done by each method\n", + " diff1 = np.max(np.abs(g_euler - g_analytical))\n", + " diff2 = np.max(np.abs(g_dnn_ag[0,:] - g_analytical))\n", + "\n", + " print('Max absolute difference between Euler method and analytical: %g'%diff1)\n", + " print('Max absolute difference between deep neural network and analytical: %g'%diff2)\n", + "\n", + " # Plot results\n", + " plt.figure(figsize=(10,10))\n", + "\n", + " plt.plot(t,g_euler)\n", + " plt.plot(t,g_analytical)\n", + " plt.plot(t,g_dnn_ag[0,:])\n", + "\n", + " plt.legend(['euler','analytical','dnn'])\n", + " plt.xlabel('Time t')\n", + " plt.ylabel('g(t)')\n", + "\n", + " plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "a6e8888e", + "metadata": { + "editable": true + }, + "source": [ + "## Example: Solving the one dimensional Poisson equation\n", + "\n", + "The Poisson equation for $g(x)$ in one dimension is" + ] + }, + { + "cell_type": "markdown", + "id": "ac2720d4", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation} \\label{poisson} \\tag{13}\n", + " -g''(x) = f(x)\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "65554b02", + "metadata": { + "editable": true + }, + "source": [ + "where $f(x)$ is a given function for $x \\in (0,1)$.\n", + "\n", + "The conditions that $g(x)$ is chosen to fulfill, are" + ] + }, + { + "cell_type": "markdown", + "id": "0cdf0586", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{align*}\n", + " g(0) &= 0 \\\\\n", + " g(1) &= 0\n", + "\\end{align*}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "f7e65a6a", + "metadata": { + "editable": true + }, + "source": [ + "This equation can be solved numerically using programs where e.g Autograd and TensorFlow are used.\n", + "The results from the networks can then be compared to the analytical solution.\n", + "In addition, it could be interesting to see how a typical method for numerically solving second order ODEs compares to the neural networks." + ] + }, + { + "cell_type": "markdown", + "id": "cd827e12", + "metadata": { + "editable": true + }, + "source": [ + "## The specific equation to solve for\n", + "\n", + "Here, the function $g(x)$ to solve for follows the equation" + ] + }, + { + "cell_type": "markdown", + "id": "a6100e41", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "-g''(x) = f(x),\\qquad x \\in (0,1)\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "15c06751", + "metadata": { + "editable": true + }, + "source": [ + "where $f(x)$ is a given function, along with the chosen conditions" + ] + }, + { + "cell_type": "markdown", + "id": "b2b4dd2f", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{aligned}\n", + "g(0) = g(1) = 0\n", + "\\end{aligned}\\label{cond} \\tag{14}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "2133aeed", + "metadata": { + "editable": true + }, + "source": [ + "In this example, we consider the case when $f(x) = (3x + x^2)\\exp(x)$.\n", + "\n", + "For this case, a possible trial solution satisfying the conditions could be" + ] + }, + { + "cell_type": "markdown", + "id": "5baf9b4b", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "g_t(x) = x \\cdot (1-x) \\cdot N(P,x)\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "ed82aba2", + "metadata": { + "editable": true + }, + "source": [ + "The analytical solution for this problem is" + ] + }, + { + "cell_type": "markdown", + "id": "c9bce69c", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "g(x) = x(1 - x)\\exp(x)\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "ce42c4a8", + "metadata": { + "editable": true + }, + "source": [ + "## Solving the equation using Autograd" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "2fcb9045", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import autograd.numpy as np\n", + "from autograd import grad, elementwise_grad\n", + "import autograd.numpy.random as npr\n", + "from matplotlib import pyplot as plt\n", + "\n", + "def sigmoid(z):\n", + " return 1/(1 + np.exp(-z))\n", + "\n", + "def deep_neural_network(deep_params, x):\n", + " # N_hidden is the number of hidden layers\n", + " # deep_params is a list, len() should be used\n", + " N_hidden = len(deep_params) - 1 # -1 since params consists of\n", + " # parameters to all the hidden\n", + " # layers AND the output layer.\n", + "\n", + " # Assumes input x being an one-dimensional array\n", + " num_values = np.size(x)\n", + " x = x.reshape(-1, num_values)\n", + "\n", + " # Assume that the input layer does nothing to the input x\n", + " x_input = x\n", + "\n", + " # Due to multiple hidden layers, define a variable referencing to the\n", + " # output of the previous layer:\n", + " x_prev = x_input\n", + "\n", + " ## Hidden layers:\n", + "\n", + " for l in range(N_hidden):\n", + " # From the list of parameters P; find the correct weigths and bias for this layer\n", + " w_hidden = deep_params[l]\n", + "\n", + " # Add a row of ones to include bias\n", + " x_prev = np.concatenate((np.ones((1,num_values)), x_prev ), axis = 0)\n", + "\n", + " z_hidden = np.matmul(w_hidden, x_prev)\n", + " x_hidden = sigmoid(z_hidden)\n", + "\n", + " # Update x_prev such that next layer can use the output from this layer\n", + " x_prev = x_hidden\n", + "\n", + " ## Output layer:\n", + "\n", + " # Get the weights and bias for this layer\n", + " w_output = deep_params[-1]\n", + "\n", + " # Include bias:\n", + " x_prev = np.concatenate((np.ones((1,num_values)), x_prev), axis = 0)\n", + "\n", + " z_output = np.matmul(w_output, x_prev)\n", + " x_output = z_output\n", + "\n", + " return x_output\n", + "\n", + "\n", + "def solve_ode_deep_neural_network(x, num_neurons, num_iter, lmb):\n", + " # num_hidden_neurons is now a list of number of neurons within each hidden layer\n", + "\n", + " # Find the number of hidden layers:\n", + " N_hidden = np.size(num_neurons)\n", + "\n", + " ## Set up initial weigths and biases\n", + "\n", + " # Initialize the list of parameters:\n", + " P = [None]*(N_hidden + 1) # + 1 to include the output layer\n", + "\n", + " P[0] = npr.randn(num_neurons[0], 2 )\n", + " for 
l in range(1,N_hidden):\n", + " P[l] = npr.randn(num_neurons[l], num_neurons[l-1] + 1) # +1 to include bias\n", + "\n", + " # For the output layer\n", + " P[-1] = npr.randn(1, num_neurons[-1] + 1 ) # +1 since bias is included\n", + "\n", + " print('Initial cost: %g'%cost_function_deep(P, x))\n", + "\n", + " ## Start finding the optimal weigths using gradient descent\n", + "\n", + " # Find the Python function that represents the gradient of the cost function\n", + " # w.r.t the 0-th input argument -- that is the weights and biases in the hidden and output layer\n", + " cost_function_deep_grad = grad(cost_function_deep,0)\n", + "\n", + " # Let the update be done num_iter times\n", + " for i in range(num_iter):\n", + " # Evaluate the gradient at the current weights and biases in P.\n", + " # The cost_grad consist now of N_hidden + 1 arrays; the gradient w.r.t the weights and biases\n", + " # in the hidden layers and output layers evaluated at x.\n", + " cost_deep_grad = cost_function_deep_grad(P, x)\n", + "\n", + " for l in range(N_hidden+1):\n", + " P[l] = P[l] - lmb * cost_deep_grad[l]\n", + "\n", + " print('Final cost: %g'%cost_function_deep(P, x))\n", + "\n", + " return P\n", + "\n", + "## Set up the cost function specified for this Poisson equation:\n", + "\n", + "# The right side of the ODE\n", + "def f(x):\n", + " return (3*x + x**2)*np.exp(x)\n", + "\n", + "def cost_function_deep(P, x):\n", + "\n", + " # Evaluate the trial function with the current parameters P\n", + " g_t = g_trial_deep(x,P)\n", + "\n", + " # Find the derivative w.r.t x of the trial function\n", + " d2_g_t = elementwise_grad(elementwise_grad(g_trial_deep,0))(x,P)\n", + "\n", + " right_side = f(x)\n", + "\n", + " err_sqr = (-d2_g_t - right_side)**2\n", + " cost_sum = np.sum(err_sqr)\n", + "\n", + " return cost_sum/np.size(err_sqr)\n", + "\n", + "# The trial solution:\n", + "def g_trial_deep(x,P):\n", + " return x*(1-x)*deep_neural_network(P,x)\n", + "\n", + "# The analytic solution;\n", + "def g_analytic(x):\n", + " return x*(1-x)*np.exp(x)\n", + "\n", + "if __name__ == '__main__':\n", + " npr.seed(4155)\n", + "\n", + " ## Decide the vales of arguments to the function to solve\n", + " Nx = 10\n", + " x = np.linspace(0,1, Nx)\n", + "\n", + " ## Set up the initial parameters\n", + " num_hidden_neurons = [200,100]\n", + " num_iter = 1000\n", + " lmb = 1e-3\n", + "\n", + " P = solve_ode_deep_neural_network(x, num_hidden_neurons, num_iter, lmb)\n", + "\n", + " g_dnn_ag = g_trial_deep(x,P)\n", + " g_analytical = g_analytic(x)\n", + "\n", + " # Find the maximum absolute difference between the solutons:\n", + " max_diff = np.max(np.abs(g_dnn_ag - g_analytical))\n", + " print(\"The max absolute difference between the solutions is: %g\"%max_diff)\n", + "\n", + " plt.figure(figsize=(10,10))\n", + "\n", + " plt.title('Performance of neural network solving an ODE compared to the analytical solution')\n", + " plt.plot(x, g_analytical)\n", + " plt.plot(x, g_dnn_ag[0,:])\n", + " plt.legend(['analytical','nn'])\n", + " plt.xlabel('x')\n", + " plt.ylabel('g(x)')\n", + " plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "9db2e30e", + "metadata": { + "editable": true + }, + "source": [ + "## Comparing with a numerical scheme\n", + "\n", + "The Poisson equation is possible to solve using Taylor series to approximate the second derivative.\n", + "\n", + "Using Taylor series, the second derivative can be expressed as\n", + "\n", + "$$\n", + "g''(x) = \\frac{g(x + \\Delta x) - 2g(x) + g(x-\\Delta x)}{\\Delta x^2} + E_{\\Delta 
x}(x)\n",
+    "$$\n",
+    "\n",
+    "where $\Delta x$ is a small step size and $E_{\Delta x}(x)$ is the error term.\n",
+    "\n",
+    "Neglecting the error term gives an approximation to the second derivative:"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "2cea098e",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "\n",
+    "
    \n", + "\n", + "$$\n", + "\\begin{equation} \\label{approx} \\tag{15}\n", + "g''(x) \\approx \\frac{g(x + \\Delta x) - 2g(x) + g(x-\\Delta x)}{\\Delta x^2}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "4606d139", + "metadata": { + "editable": true + }, + "source": [ + "If $x_i = i \\Delta x = x_{i-1} + \\Delta x$ and $g_i = g(x_i)$ for $i = 1,\\dots N_x - 2$ with $N_x$ being the number of values for $x$, ([15](#approx)) becomes" + ] + }, + { + "cell_type": "markdown", + "id": "bf52b218", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{aligned}\n", + "g''(x_i) &\\approx \\frac{g(x_i + \\Delta x) - 2g(x_i) + g(x_i -\\Delta x)}{\\Delta x^2} \\\\\n", + "&= \\frac{g_{i+1} - 2g_i + g_{i-1}}{\\Delta x^2}\n", + "\\end{aligned}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "5649b303", + "metadata": { + "editable": true + }, + "source": [ + "Since we know from our problem that" + ] + }, + { + "cell_type": "markdown", + "id": "cabbaeeb", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{aligned}\n", + "-g''(x) &= f(x) \\\\\n", + "&= (3x + x^2)\\exp(x)\n", + "\\end{aligned}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "9116da9a", + "metadata": { + "editable": true + }, + "source": [ + "along with the conditions $g(0) = g(1) = 0$,\n", + "the following scheme can be used to find an approximate solution for $g(x)$ numerically:" + ] + }, + { + "cell_type": "markdown", + "id": "fa0313ed", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation}\n", + " \\begin{aligned}\n", + " -\\Big( \\frac{g_{i+1} - 2g_i + g_{i-1}}{\\Delta x^2} \\Big) &= f(x_i) \\\\\n", + " -g_{i+1} + 2g_i - g_{i-1} &= \\Delta x^2 f(x_i)\n", + " \\end{aligned}\n", + "\\end{equation} \\label{odesys} \\tag{16}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "d4bff256", + "metadata": { + "editable": true + }, + "source": [ + "for $i = 1, \\dots, N_x - 2$ where $g_0 = g_{N_x - 1} = 0$ and $f(x_i) = (3x_i + x_i^2)\\exp(x_i)$, which is given for our specific problem.\n", + "\n", + "The equation can be rewritten into a matrix equation:" + ] + }, + { + "cell_type": "markdown", + "id": "2817b619", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{aligned}\n", + "\\begin{pmatrix}\n", + "2 & -1 & 0 & \\dots & 0 \\\\\n", + "-1 & 2 & -1 & \\dots & 0 \\\\\n", + "\\vdots & & \\ddots & & \\vdots \\\\\n", + "0 & \\dots & -1 & 2 & -1 \\\\\n", + "0 & \\dots & 0 & -1 & 2\\\\\n", + "\\end{pmatrix}\n", + "\\begin{pmatrix}\n", + "g_1 \\\\\n", + "g_2 \\\\\n", + "\\vdots \\\\\n", + "g_{N_x - 3} \\\\\n", + "g_{N_x - 2}\n", + "\\end{pmatrix}\n", + "&=\n", + "\\Delta x^2\n", + "\\begin{pmatrix}\n", + "f(x_1) \\\\\n", + "f(x_2) \\\\\n", + "\\vdots \\\\\n", + "f(x_{N_x - 3}) \\\\\n", + "f(x_{N_x - 2})\n", + "\\end{pmatrix} \\\\\n", + "\\boldsymbol{A}\\boldsymbol{g} &= \\boldsymbol{f},\n", + "\\end{aligned}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "5130b233", + "metadata": { + "editable": true + }, + "source": [ + "which makes it possible to solve for the vector $\\boldsymbol{g}$." + ] + }, + { + "cell_type": "markdown", + "id": "18a4fdda", + "metadata": { + "editable": true + }, + "source": [ + "## Setting up the code\n", + "\n", + "We can then compare the result from this numerical scheme with the output from our network using Autograd:" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "3cff184d", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import autograd.numpy as np\n", + "from autograd import grad, elementwise_grad\n", + "import autograd.numpy.random as npr\n", + "from matplotlib import pyplot as plt\n", + "\n", + "def sigmoid(z):\n", + " return 1/(1 + np.exp(-z))\n", + "\n", + "def deep_neural_network(deep_params, x):\n", + " # N_hidden is the number of hidden layers\n", + " # deep_params is a list, len() should be used\n", + " N_hidden = len(deep_params) - 1 # -1 since params consists of\n", + " # parameters to all the hidden\n", + " # layers AND the output layer.\n", + "\n", + " # Assumes input x being an one-dimensional array\n", + " num_values = np.size(x)\n", + " x = x.reshape(-1, num_values)\n", + "\n", + " # Assume that the input layer does nothing to the input x\n", + " x_input = x\n", + "\n", + " # Due to multiple hidden layers, define a variable referencing to the\n", + " # output of the previous layer:\n", + " x_prev = x_input\n", + "\n", + " ## Hidden layers:\n", + "\n", + " for l in range(N_hidden):\n", + " # From the list of parameters P; find the correct weigths and bias for this layer\n", + " w_hidden = deep_params[l]\n", + "\n", + " # Add a row of ones to include bias\n", + " x_prev = np.concatenate((np.ones((1,num_values)), x_prev ), axis = 0)\n", + "\n", + " z_hidden = np.matmul(w_hidden, x_prev)\n", + " x_hidden = sigmoid(z_hidden)\n", + "\n", + " # Update x_prev such that next layer can use the output from this layer\n", + " x_prev = x_hidden\n", + "\n", + " ## Output layer:\n", + "\n", + " # 
Get the weights and bias for this layer\n", + " w_output = deep_params[-1]\n", + "\n", + " # Include bias:\n", + " x_prev = np.concatenate((np.ones((1,num_values)), x_prev), axis = 0)\n", + "\n", + " z_output = np.matmul(w_output, x_prev)\n", + " x_output = z_output\n", + "\n", + " return x_output\n", + "\n", + "\n", + "def solve_ode_deep_neural_network(x, num_neurons, num_iter, lmb):\n", + " # num_hidden_neurons is now a list of number of neurons within each hidden layer\n", + "\n", + " # Find the number of hidden layers:\n", + " N_hidden = np.size(num_neurons)\n", + "\n", + " ## Set up initial weigths and biases\n", + "\n", + " # Initialize the list of parameters:\n", + " P = [None]*(N_hidden + 1) # + 1 to include the output layer\n", + "\n", + " P[0] = npr.randn(num_neurons[0], 2 )\n", + " for l in range(1,N_hidden):\n", + " P[l] = npr.randn(num_neurons[l], num_neurons[l-1] + 1) # +1 to include bias\n", + "\n", + " # For the output layer\n", + " P[-1] = npr.randn(1, num_neurons[-1] + 1 ) # +1 since bias is included\n", + "\n", + " print('Initial cost: %g'%cost_function_deep(P, x))\n", + "\n", + " ## Start finding the optimal weigths using gradient descent\n", + "\n", + " # Find the Python function that represents the gradient of the cost function\n", + " # w.r.t the 0-th input argument -- that is the weights and biases in the hidden and output layer\n", + " cost_function_deep_grad = grad(cost_function_deep,0)\n", + "\n", + " # Let the update be done num_iter times\n", + " for i in range(num_iter):\n", + " # Evaluate the gradient at the current weights and biases in P.\n", + " # The cost_grad consist now of N_hidden + 1 arrays; the gradient w.r.t the weights and biases\n", + " # in the hidden layers and output layers evaluated at x.\n", + " cost_deep_grad = cost_function_deep_grad(P, x)\n", + "\n", + " for l in range(N_hidden+1):\n", + " P[l] = P[l] - lmb * cost_deep_grad[l]\n", + "\n", + " print('Final cost: %g'%cost_function_deep(P, x))\n", + "\n", + " return P\n", + "\n", + "## Set up the cost function specified for this Poisson equation:\n", + "\n", + "# The right side of the ODE\n", + "def f(x):\n", + " return (3*x + x**2)*np.exp(x)\n", + "\n", + "def cost_function_deep(P, x):\n", + "\n", + " # Evaluate the trial function with the current parameters P\n", + " g_t = g_trial_deep(x,P)\n", + "\n", + " # Find the derivative w.r.t x of the trial function\n", + " d2_g_t = elementwise_grad(elementwise_grad(g_trial_deep,0))(x,P)\n", + "\n", + " right_side = f(x)\n", + "\n", + " err_sqr = (-d2_g_t - right_side)**2\n", + " cost_sum = np.sum(err_sqr)\n", + "\n", + " return cost_sum/np.size(err_sqr)\n", + "\n", + "# The trial solution:\n", + "def g_trial_deep(x,P):\n", + " return x*(1-x)*deep_neural_network(P,x)\n", + "\n", + "# The analytic solution;\n", + "def g_analytic(x):\n", + " return x*(1-x)*np.exp(x)\n", + "\n", + "if __name__ == '__main__':\n", + " npr.seed(4155)\n", + "\n", + " ## Decide the vales of arguments to the function to solve\n", + " Nx = 10\n", + " x = np.linspace(0,1, Nx)\n", + "\n", + " ## Set up the initial parameters\n", + " num_hidden_neurons = [200,100]\n", + " num_iter = 1000\n", + " lmb = 1e-3\n", + "\n", + " P = solve_ode_deep_neural_network(x, num_hidden_neurons, num_iter, lmb)\n", + "\n", + " g_dnn_ag = g_trial_deep(x,P)\n", + " g_analytical = g_analytic(x)\n", + "\n", + " # Find the maximum absolute difference between the solutons:\n", + "\n", + " plt.figure(figsize=(10,10))\n", + "\n", + " plt.title('Performance of neural network solving an ODE compared to the 
analytical solution')\n", + " plt.plot(x, g_analytical)\n", + " plt.plot(x, g_dnn_ag[0,:])\n", + " plt.legend(['analytical','nn'])\n", + " plt.xlabel('x')\n", + " plt.ylabel('g(x)')\n", + "\n", + " ## Perform the computation using the numerical scheme\n", + "\n", + " dx = 1/(Nx - 1)\n", + "\n", + " # Set up the matrix A\n", + " A = np.zeros((Nx-2,Nx-2))\n", + "\n", + " A[0,0] = 2\n", + " A[0,1] = -1\n", + "\n", + " for i in range(1,Nx-3):\n", + " A[i,i-1] = -1\n", + " A[i,i] = 2\n", + " A[i,i+1] = -1\n", + "\n", + " A[Nx - 3, Nx - 4] = -1\n", + " A[Nx - 3, Nx - 3] = 2\n", + "\n", + " # Set up the vector f\n", + " f_vec = dx**2 * f(x[1:-1])\n", + "\n", + " # Solve the equation\n", + " g_res = np.linalg.solve(A,f_vec)\n", + "\n", + " g_vec = np.zeros(Nx)\n", + " g_vec[1:-1] = g_res\n", + "\n", + " # Print the differences between each method\n", + " max_diff1 = np.max(np.abs(g_dnn_ag - g_analytical))\n", + " max_diff2 = np.max(np.abs(g_vec - g_analytical))\n", + " print(\"The max absolute difference between the analytical solution and DNN Autograd: %g\"%max_diff1)\n", + " print(\"The max absolute difference between the analytical solution and numerical scheme: %g\"%max_diff2)\n", + "\n", + " # Plot the results\n", + " plt.figure(figsize=(10,10))\n", + "\n", + " plt.plot(x,g_vec)\n", + " plt.plot(x,g_analytical)\n", + " plt.plot(x,g_dnn_ag[0,:])\n", + "\n", + " plt.legend(['numerical scheme','analytical','dnn'])\n", + " plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "89115be0", + "metadata": { + "editable": true + }, + "source": [ + "## Partial Differential Equations\n", + "\n", + "A partial differential equation (PDE) has a solution here the function\n", + "is defined by multiple variables. The equation may involve all kinds\n", + "of combinations of which variables the function is differentiated with\n", + "respect to.\n", + "\n", + "In general, a partial differential equation for a function $g(x_1,\\dots,x_N)$ with $N$ variables may be expressed as" + ] + }, + { + "cell_type": "markdown", + "id": "c43a6341", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation} \\label{PDE} \\tag{17}\n", + " f\\left(x_1, \\, \\dots \\, , x_N, \\frac{\\partial g(x_1,\\dots,x_N) }{\\partial x_1}, \\dots , \\frac{\\partial g(x_1,\\dots,x_N) }{\\partial x_N}, \\frac{\\partial g(x_1,\\dots,x_N) }{\\partial x_1\\partial x_2}, \\, \\dots \\, , \\frac{\\partial^n g(x_1,\\dots,x_N) }{\\partial x_N^n} \\right) = 0\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "218a7a68", + "metadata": { + "editable": true + }, + "source": [ + "where $f$ is an expression involving all kinds of possible mixed derivatives of $g(x_1,\\dots,x_N)$ up to an order $n$. In order for the solution to be unique, some additional conditions must also be given." + ] + }, + { + "cell_type": "markdown", + "id": "902f8f61", + "metadata": { + "editable": true + }, + "source": [ + "## Type of problem\n", + "\n", + "The problem our network must solve for, is similar to the ODE case.\n", + "We must have a trial solution $g_t$ at hand.\n", + "\n", + "For instance, the trial solution could be expressed as" + ] + }, + { + "cell_type": "markdown", + "id": "1c2bbcbd", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{align*}\n", + " g_t(x_1,\\dots,x_N) = h_1(x_1,\\dots,x_N) + h_2(x_1,\\dots,x_N,N(x_1,\\dots,x_N,P))\n", + "\\end{align*}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "73f5bf7b", + "metadata": { + "editable": true + }, + "source": [ + "where $h_1(x_1,\\dots,x_N)$ is a function that ensures $g_t(x_1,\\dots,x_N)$ satisfies some given conditions.\n", + "The neural network $N(x_1,\\dots,x_N,P)$ has weights and biases described by $P$ and $h_2(x_1,\\dots,x_N,N(x_1,\\dots,x_N,P))$ is an expression using the output from the neural network in some way.\n", + "\n", + "The role of the function $h_2(x_1,\\dots,x_N,N(x_1,\\dots,x_N,P))$, is to ensure that the output of $N(x_1,\\dots,x_N,P)$ is zero when $g_t(x_1,\\dots,x_N)$ is evaluated at the values of $x_1,\\dots,x_N$ where the given conditions must be satisfied. The function $h_1(x_1,\\dots,x_N)$ should alone make $g_t(x_1,\\dots,x_N)$ satisfy the conditions." + ] + }, + { + "cell_type": "markdown", + "id": "dbb4ece5", + "metadata": { + "editable": true + }, + "source": [ + "## Network requirements\n", + "\n", + "The network tries then the minimize the cost function following the\n", + "same ideas as described for the ODE case, but now with more than one\n", + "variables to consider. The concept still remains the same; find a set\n", + "of parameters $P$ such that the expression $f$ in ([17](#PDE)) is as\n", + "close to zero as possible.\n", + "\n", + "As for the ODE case, the cost function is the mean squared error that\n", + "the network must try to minimize. 
The cost function for the network to\n", + "minimize is" + ] + }, + { + "cell_type": "markdown", + "id": "d01d3943", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C\\left(x_1, \\dots, x_N, P\\right) = \\left( f\\left(x_1, \\, \\dots \\, , x_N, \\frac{\\partial g(x_1,\\dots,x_N) }{\\partial x_1}, \\dots , \\frac{\\partial g(x_1,\\dots,x_N) }{\\partial x_N}, \\frac{\\partial g(x_1,\\dots,x_N) }{\\partial x_1\\partial x_2}, \\, \\dots \\, , \\frac{\\partial^n g(x_1,\\dots,x_N) }{\\partial x_N^n} \\right) \\right)^2\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "6514db22", + "metadata": { + "editable": true + }, + "source": [ + "## More details\n", + "\n", + "If we let $\\boldsymbol{x} = \\big( x_1, \\dots, x_N \\big)$ be an array containing the values for $x_1, \\dots, x_N$ respectively, the cost function can be reformulated into the following:" + ] + }, + { + "cell_type": "markdown", + "id": "5a0ed10c", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C\\left(\\boldsymbol{x}, P\\right) = f\\left( \\left( \\boldsymbol{x}, \\frac{\\partial g(\\boldsymbol{x}) }{\\partial x_1}, \\dots , \\frac{\\partial g(\\boldsymbol{x}) }{\\partial x_N}, \\frac{\\partial g(\\boldsymbol{x}) }{\\partial x_1\\partial x_2}, \\, \\dots \\, , \\frac{\\partial^n g(\\boldsymbol{x}) }{\\partial x_N^n} \\right) \\right)^2\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "200fc78c", + "metadata": { + "editable": true + }, + "source": [ + "If we also have $M$ different sets of values for $x_1, \\dots, x_N$, that is $\\boldsymbol{x}_i = \\big(x_1^{(i)}, \\dots, x_N^{(i)}\\big)$ for $i = 1,\\dots,M$ being the rows in matrix $X$, the cost function can be generalized into" + ] + }, + { + "cell_type": "markdown", + "id": "0c87647d", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C\\left(X, P \\right) = \\sum_{i=1}^M f\\left( \\left( \\boldsymbol{x}_i, \\frac{\\partial g(\\boldsymbol{x}_i) }{\\partial x_1}, \\dots , \\frac{\\partial g(\\boldsymbol{x}_i) }{\\partial x_N}, \\frac{\\partial g(\\boldsymbol{x}_i) }{\\partial x_1\\partial x_2}, \\, \\dots \\, , \\frac{\\partial^n g(\\boldsymbol{x}_i) }{\\partial x_N^n} \\right) \\right)^2.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "6484a267", + "metadata": { + "editable": true + }, + "source": [ + "## Example: The diffusion equation\n", + "\n", + "In one spatial dimension, the equation reads" + ] + }, + { + "cell_type": "markdown", + "id": "2c2a2467", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial g(x,t)}{\\partial t} = \\frac{\\partial^2 g(x,t)}{\\partial x^2}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "6df58357", + "metadata": { + "editable": true + }, + "source": [ + "where a possible choice of conditions are" + ] + }, + { + "cell_type": "markdown", + "id": "13d9c7f6", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{align*}\n", + "g(0,t) &= 0 ,\\qquad t \\geq 0 \\\\\n", + "g(1,t) &= 0, \\qquad t \\geq 0 \\\\\n", + "g(x,0) &= u(x),\\qquad x\\in [0,1]\n", + "\\end{align*}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "627708ec", + "metadata": { + "editable": true + }, + "source": [ + "with $u(x)$ being some given function." 
+ ] + }, + { + "cell_type": "markdown", + "id": "43cdd945", + "metadata": { + "editable": true + }, + "source": [ + "## Defining the problem\n", + "\n", + "For this case, we want to find $g(x,t)$ such that" + ] + }, + { + "cell_type": "markdown", + "id": "ccdcb67e", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation}\n", + " \\frac{\\partial g(x,t)}{\\partial t} = \\frac{\\partial^2 g(x,t)}{\\partial x^2}\n", + "\\end{equation} \\label{diffonedim} \\tag{18}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "ebe711f8", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "2174f30f", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{align*}\n", + "g(0,t) &= 0 ,\\qquad t \\geq 0 \\\\\n", + "g(1,t) &= 0, \\qquad t \\geq 0 \\\\\n", + "g(x,0) &= u(x),\\qquad x\\in [0,1]\n", + "\\end{align*}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "083ed2ff", + "metadata": { + "editable": true + }, + "source": [ + "with $u(x) = \\sin(\\pi x)$.\n", + "\n", + "First, let us set up the deep neural network.\n", + "The deep neural network will follow the same structure as discussed in the examples solving the ODEs.\n", + "First, we will look into how Autograd could be used in a network tailored to solve for bivariate functions." + ] + }, + { + "cell_type": "markdown", + "id": "cf5e3f46", + "metadata": { + "editable": true + }, + "source": [ + "## Setting up the network using Autograd\n", + "\n", + "The only change to do here, is to extend our network such that\n", + "functions of multiple parameters are correctly handled. In this case\n", + "we have two variables in our function to solve for, that is time $t$\n", + "and position $x$. The variables will be represented by a\n", + "one-dimensional array in the program. The program will evaluate the\n", + "network at each possible pair $(x,t)$, given an array for the desired\n", + "$x$-values and $t$-values to approximate the solution at." + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "4fee106b", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "def sigmoid(z):\n", + " return 1/(1 + np.exp(-z))\n", + "\n", + "def deep_neural_network(deep_params, x):\n", + " # x is now a point and a 1D numpy array; make it a column vector\n", + " num_coordinates = np.size(x,0)\n", + " x = x.reshape(num_coordinates,-1)\n", + "\n", + " num_points = np.size(x,1)\n", + "\n", + " # N_hidden is the number of hidden layers\n", + " N_hidden = len(deep_params) - 1 # -1 since params consist of parameters to all the hidden layers AND the output layer\n", + "\n", + " # Assume that the input layer does nothing to the input x\n", + " x_input = x\n", + " x_prev = x_input\n", + "\n", + " ## Hidden layers:\n", + "\n", + " for l in range(N_hidden):\n", + " # From the list of parameters P; find the correct weigths and bias for this layer\n", + " w_hidden = deep_params[l]\n", + "\n", + " # Add a row of ones to include bias\n", + " x_prev = np.concatenate((np.ones((1,num_points)), x_prev ), axis = 0)\n", + "\n", + " z_hidden = np.matmul(w_hidden, x_prev)\n", + " x_hidden = sigmoid(z_hidden)\n", + "\n", + " # Update x_prev such that next layer can use the output from this layer\n", + " x_prev = x_hidden\n", + "\n", + " ## Output layer:\n", + "\n", + " # Get the weights and bias for this layer\n", + " w_output = deep_params[-1]\n", + "\n", + " # Include bias:\n", + " x_prev = np.concatenate((np.ones((1,num_points)), x_prev), axis = 0)\n", + "\n", + " z_output = np.matmul(w_output, x_prev)\n", + " x_output = z_output\n", + "\n", + " return x_output[0][0]" + ] + }, + { + "cell_type": "markdown", + "id": "63e5fb7e", + "metadata": { + "editable": true + }, + "source": [ + "## Setting up the network using 
Autograd; The trial solution\n",
+    "\n",
+    "The cost function must then iterate through the given arrays\n",
+    "containing values for $x$ and $t$, define the point $(x,t)$ at which the deep\n",
+    "neural network and the trial solution are evaluated, and then find\n",
+    "the Jacobian of the trial solution.\n",
+    "\n",
+    "A possible trial solution for this PDE is\n",
+    "\n",
+    "$$\n",
+    "g_t(x,t) = h_1(x,t) + x(1-x)tN(x,t,P)\n",
+    "$$\n",
+    "\n",
+    "with $h_1(x,t)$ being a function ensuring that $g_t(x,t)$ satisfies our given conditions, and $N(x,t,P)$ being the output from the deep neural network using weights and biases for each layer from $P$.\n",
+    "\n",
+    "To fulfill the conditions, $h_1(x,t)$ could be:\n",
+    "\n",
+    "$$\n",
+    "h_1(x,t) = (1-t)\Big(u(x) - \big((1-x)u(0) + x u(1)\big)\Big) = (1-t)u(x) = (1-t)\sin(\pi x)\n",
+    "$$\n",
+    "\n",
+    "since $u(0) = u(1) = 0$ and $u(x) = \sin(\pi x)$."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "50cfea81",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "## Why the Jacobian?\n",
+    "\n",
+    "The Jacobian is used because the program must find the derivatives of\n",
+    "the trial solution with respect to $x$ and $t$.\n",
+    "\n",
+    "This makes it necessary to compute the Jacobian matrix, as we want\n",
+    "to evaluate the gradient with respect to $x$ and $t$ (note that the\n",
+    "Jacobian of a scalar-valued multivariate function is simply its\n",
+    "gradient).\n",
+    "\n",
+    "In Autograd, the differentiation is by default done with respect to\n",
+    "the first input argument of your Python function. Since the point is\n",
+    "an array representing $x$ and $t$, the Jacobian is calculated using\n",
+    "the values of $x$ and $t$.\n",
+    "\n",
+    "To find the second derivatives with respect to $x$ and $t$, the\n",
+    "Jacobian can be computed a second time. The result is the Hessian\n",
+    "matrix, which contains all the possible second order\n",
+    "mixed derivatives of $g(x,t)$.\n",
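+    "\n",
+    "As a small self-contained illustration (an addition to these notes, not part of the original program), we can check how `jacobian` and `hessian` from Autograd behave on a simple, hypothetical bivariate test function $q(x,t) = x^2\sin(\pi t)$:\n",
+    "\n",
+    "```python\n",
+    "import autograd.numpy as np\n",
+    "from autograd import jacobian, hessian\n",
+    "\n",
+    "def q(point):\n",
+    "    # point is an array [x, t], mirroring how g_trial is called below\n",
+    "    x, t = point\n",
+    "    return x**2*np.sin(np.pi*t)\n",
+    "\n",
+    "point = np.array([0.5, 0.25])\n",
+    "print(jacobian(q)(point))  # the gradient [dq/dx, dq/dt] at (0.5, 0.25)\n",
+    "print(hessian(q)(point))   # the 2x2 matrix of second derivatives\n",
+    "```\n",
+    "\n",
+    "The element `[0][0]` of the Hessian is the second derivative with respect to $x$, which is precisely the quantity needed in the cost function below."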
+ ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "309808f6", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# Set up the trial function:\n", + "def u(x):\n", + " return np.sin(np.pi*x)\n", + "\n", + "def g_trial(point,P):\n", + " x,t = point\n", + " return (1-t)*u(x) + x*(1-x)*t*deep_neural_network(P,point)\n", + "\n", + "# The right side of the ODE:\n", + "def f(point):\n", + " return 0.\n", + "\n", + "# The cost function:\n", + "def cost_function(P, x, t):\n", + " cost_sum = 0\n", + "\n", + " g_t_jacobian_func = jacobian(g_trial)\n", + " g_t_hessian_func = hessian(g_trial)\n", + "\n", + " for x_ in x:\n", + " for t_ in t:\n", + " point = np.array([x_,t_])\n", + "\n", + " g_t = g_trial(point,P)\n", + " g_t_jacobian = g_t_jacobian_func(point,P)\n", + " g_t_hessian = g_t_hessian_func(point,P)\n", + "\n", + " g_t_dt = g_t_jacobian[1]\n", + " g_t_d2x = g_t_hessian[0][0]\n", + "\n", + " func = f(point)\n", + "\n", + " err_sqr = ( (g_t_dt - g_t_d2x) - func)**2\n", + " cost_sum += err_sqr\n", + "\n", + " return cost_sum" + ] + }, + { + "cell_type": "markdown", + "id": "9880d94c", + "metadata": { + "editable": true + }, + "source": [ + "## Setting up the network using Autograd; The full program\n", + "\n", + "Having set up the network, along with the trial solution and cost function, we can now see how the deep neural network performs by comparing the results to the analytical solution.\n", + "\n", + "The analytical solution of our problem is\n", + "\n", + "$$\n", + "g(x,t) = \\exp(-\\pi^2 t)\\sin(\\pi x)\n", + "$$\n", + "\n", + "A possible way to implement a neural network solving the PDE, is given below.\n", + "Be aware, though, that it is fairly slow for the parameters used.\n", + "A better result is possible, but requires more iterations, and thus longer time to complete.\n", + "\n", + "Indeed, the program below is not optimal in its implementation, but rather serves as an example on how to implement and use a neural network to solve a PDE.\n", + "Using TensorFlow results in a much better execution time. Try it!" 
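+    "\n",
+    "As a quick sanity check (an addition to these notes), Autograd can also be used to verify numerically that the stated analytical solution satisfies the PDE, by evaluating the residual $\partial g/\partial t - \partial^2 g/\partial x^2$ at a few points:\n",
+    "\n",
+    "```python\n",
+    "import autograd.numpy as np\n",
+    "from autograd import elementwise_grad\n",
+    "\n",
+    "def g(x, t):\n",
+    "    return np.exp(-np.pi**2*t)*np.sin(np.pi*x)\n",
+    "\n",
+    "dg_dt = elementwise_grad(g, 1)\n",
+    "d2g_dx2 = elementwise_grad(elementwise_grad(g, 0), 0)\n",
+    "\n",
+    "x = np.linspace(0.1, 0.9, 5)\n",
+    "t = np.linspace(0.1, 0.9, 5)\n",
+    "residual = dg_dt(x, t) - d2g_dx2(x, t)\n",
+    "print(np.max(np.abs(residual)))  # should be close to machine precision\n",
+    "```"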
+ ] + }, + { + "cell_type": "code", + "execution_count": 9, + "id": "fcd284e3", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import autograd.numpy as np\n", + "from autograd import jacobian,hessian,grad\n", + "import autograd.numpy.random as npr\n", + "from matplotlib import cm\n", + "from matplotlib import pyplot as plt\n", + "from mpl_toolkits.mplot3d import axes3d\n", + "\n", + "## Set up the network\n", + "\n", + "def sigmoid(z):\n", + " return 1/(1 + np.exp(-z))\n", + "\n", + "def deep_neural_network(deep_params, x):\n", + " # x is now a point and a 1D numpy array; make it a column vector\n", + " num_coordinates = np.size(x,0)\n", + " x = x.reshape(num_coordinates,-1)\n", + "\n", + " num_points = np.size(x,1)\n", + "\n", + " # N_hidden is the number of hidden layers\n", + " N_hidden = len(deep_params) - 1 # -1 since params consist of parameters to all the hidden layers AND the output layer\n", + "\n", + " # Assume that the input layer does nothing to the input x\n", + " x_input = x\n", + " x_prev = x_input\n", + "\n", + " ## Hidden layers:\n", + "\n", + " for l in range(N_hidden):\n", + " # From the list of parameters P; find the correct weigths and bias for this layer\n", + " w_hidden = deep_params[l]\n", + "\n", + " # Add a row of ones to include bias\n", + " x_prev = np.concatenate((np.ones((1,num_points)), x_prev ), axis = 0)\n", + "\n", + " z_hidden = np.matmul(w_hidden, x_prev)\n", + " x_hidden = sigmoid(z_hidden)\n", + "\n", + " # Update x_prev such that next layer can use the output from this layer\n", + " x_prev = x_hidden\n", + "\n", + " ## Output layer:\n", + "\n", + " # Get the weights and bias for this layer\n", + " w_output = deep_params[-1]\n", + "\n", + " # Include bias:\n", + " x_prev = np.concatenate((np.ones((1,num_points)), x_prev), axis = 0)\n", + "\n", + " z_output = np.matmul(w_output, x_prev)\n", + " x_output = z_output\n", + "\n", + " return x_output[0][0]\n", + "\n", + "## Define the trial solution and cost function\n", + "def u(x):\n", + " return np.sin(np.pi*x)\n", + "\n", + "def g_trial(point,P):\n", + " x,t = point\n", + " return (1-t)*u(x) + x*(1-x)*t*deep_neural_network(P,point)\n", + "\n", + "# The right side of the ODE:\n", + "def f(point):\n", + " return 0.\n", + "\n", + "# The cost function:\n", + "def cost_function(P, x, t):\n", + " cost_sum = 0\n", + "\n", + " g_t_jacobian_func = jacobian(g_trial)\n", + " g_t_hessian_func = hessian(g_trial)\n", + "\n", + " for x_ in x:\n", + " for t_ in t:\n", + " point = np.array([x_,t_])\n", + "\n", + " g_t = g_trial(point,P)\n", + " g_t_jacobian = g_t_jacobian_func(point,P)\n", + " g_t_hessian = g_t_hessian_func(point,P)\n", + "\n", + " g_t_dt = g_t_jacobian[1]\n", + " g_t_d2x = g_t_hessian[0][0]\n", + "\n", + " func = f(point)\n", + "\n", + " err_sqr = ( (g_t_dt - g_t_d2x) - func)**2\n", + " cost_sum += err_sqr\n", + "\n", + " return cost_sum /( np.size(x)*np.size(t) )\n", + "\n", + "## For comparison, define the analytical solution\n", + "def g_analytic(point):\n", + " x,t = point\n", + " return np.exp(-np.pi**2*t)*np.sin(np.pi*x)\n", + "\n", + "## Set up a function for training the network to solve for the equation\n", + "def solve_pde_deep_neural_network(x,t, num_neurons, num_iter, lmb):\n", + " ## Set up initial weigths and biases\n", + " N_hidden = np.size(num_neurons)\n", + "\n", + " ## Set up initial weigths and biases\n", + "\n", + " # Initialize the list of parameters:\n", + " P = [None]*(N_hidden + 1) # + 1 to include the output layer\n", + "\n", + " 
P[0] = npr.randn(num_neurons[0], 2 + 1 ) # 2 since we have two points, +1 to include bias\n",
+    "    for l in range(1,N_hidden):\n",
+    "        P[l] = npr.randn(num_neurons[l], num_neurons[l-1] + 1) # +1 to include bias\n",
+    "\n",
+    "    # For the output layer\n",
+    "    P[-1] = npr.randn(1, num_neurons[-1] + 1 ) # +1 since bias is included\n",
+    "\n",
+    "    print('Initial cost: ',cost_function(P, x, t))\n",
+    "\n",
+    "    cost_function_grad = grad(cost_function,0)\n",
+    "\n",
+    "    # Let the update be done num_iter times\n",
+    "    for i in range(num_iter):\n",
+    "        cost_grad = cost_function_grad(P, x , t)\n",
+    "\n",
+    "        for l in range(N_hidden+1):\n",
+    "            P[l] = P[l] - lmb * cost_grad[l]\n",
+    "\n",
+    "    print('Final cost: ',cost_function(P, x, t))\n",
+    "\n",
+    "    return P\n",
+    "\n",
+    "if __name__ == '__main__':\n",
+    "    ### Use the neural network:\n",
+    "    npr.seed(15)\n",
+    "\n",
+    "    ## Decide the values of arguments to the function to solve\n",
+    "    Nx = 10; Nt = 10\n",
+    "    x = np.linspace(0, 1, Nx)\n",
+    "    t = np.linspace(0,1,Nt)\n",
+    "\n",
+    "    ## Set up the parameters for the network\n",
+    "    num_hidden_neurons = [100, 25]\n",
+    "    num_iter = 250\n",
+    "    lmb = 0.01\n",
+    "\n",
+    "    P = solve_pde_deep_neural_network(x,t, num_hidden_neurons, num_iter, lmb)\n",
+    "\n",
+    "    ## Store the results\n",
+    "    g_dnn_ag = np.zeros((Nx, Nt))\n",
+    "    G_analytical = np.zeros((Nx, Nt))\n",
+    "    for i,x_ in enumerate(x):\n",
+    "        for j, t_ in enumerate(t):\n",
+    "            point = np.array([x_, t_])\n",
+    "            g_dnn_ag[i,j] = g_trial(point,P)\n",
+    "\n",
+    "            G_analytical[i,j] = g_analytic(point)\n",
+    "\n",
+    "    # Find the max absolute difference between the analytical and the computed solution\n",
+    "    diff_ag = np.abs(g_dnn_ag - G_analytical)\n",
+    "    print('Max absolute difference between the analytical solution and the network: %g'%np.max(diff_ag))\n",
+    "\n",
+    "    ## Plot the solutions in two dimensions, that being in position and time\n",
+    "\n",
+    "    T,X = np.meshgrid(t,x)\n",
+    "\n",
+    "    fig = plt.figure(figsize=(10,10))\n",
+    "    ax = fig.add_subplot(projection='3d')\n",
+    "    ax.set_title('Solution from the deep neural network w/ %d hidden layers'%len(num_hidden_neurons))\n",
+    "    s = ax.plot_surface(T,X,g_dnn_ag,linewidth=0,antialiased=False,cmap=cm.viridis)\n",
+    "    ax.set_xlabel('Time $t$')\n",
+    "    ax.set_ylabel('Position $x$');\n",
+    "\n",
+    "\n",
+    "    fig = plt.figure(figsize=(10,10))\n",
+    "    ax = fig.add_subplot(projection='3d')\n",
+    "    ax.set_title('Analytical solution')\n",
+    "    s = ax.plot_surface(T,X,G_analytical,linewidth=0,antialiased=False,cmap=cm.viridis)\n",
+    "    ax.set_xlabel('Time $t$')\n",
+    "    ax.set_ylabel('Position $x$');\n",
+    "\n",
+    "    fig = plt.figure(figsize=(10,10))\n",
+    "    ax = fig.add_subplot(projection='3d')\n",
+    "    ax.set_title('Difference')\n",
+    "    s = ax.plot_surface(T,X,diff_ag,linewidth=0,antialiased=False,cmap=cm.viridis)\n",
+    "    ax.set_xlabel('Time $t$')\n",
+    "    ax.set_ylabel('Position $x$');\n",
+    "\n",
+    "    ## Take some slices of the 3D plots just to see the solutions at particular times\n",
+    "    indx1 = 0\n",
+    "    indx2 = int(Nt/2)\n",
+    "    indx3 = Nt-1\n",
+    "\n",
+    "    t1 = t[indx1]\n",
+    "    t2 = t[indx2]\n",
+    "    t3 = t[indx3]\n",
+    "\n",
+    "    # Slice the results from the DNN\n",
+    "    res1 = g_dnn_ag[:,indx1]\n",
+    "    res2 = g_dnn_ag[:,indx2]\n",
+    "    res3 = g_dnn_ag[:,indx3]\n",
+    "\n",
+    "    # Slice the analytical results\n",
+    "    res_analytical1 = G_analytical[:,indx1]\n",
+    "    res_analytical2 = G_analytical[:,indx2]\n",
+    "    res_analytical3 = G_analytical[:,indx3]\n",
+    "\n",
+    "    # Plot the slices\n",
+    "    plt.figure(figsize=(10,10))\n",
+
" plt.title(\"Computed solutions at time = %g\"%t1)\n", + " plt.plot(x, res1)\n", + " plt.plot(x,res_analytical1)\n", + " plt.legend(['dnn','analytical'])\n", + "\n", + " plt.figure(figsize=(10,10))\n", + " plt.title(\"Computed solutions at time = %g\"%t2)\n", + " plt.plot(x, res2)\n", + " plt.plot(x,res_analytical2)\n", + " plt.legend(['dnn','analytical'])\n", + "\n", + " plt.figure(figsize=(10,10))\n", + " plt.title(\"Computed solutions at time = %g\"%t3)\n", + " plt.plot(x, res3)\n", + " plt.plot(x,res_analytical3)\n", + " plt.legend(['dnn','analytical'])\n", + "\n", + " plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "51ff4964", + "metadata": { + "editable": true + }, + "source": [ + "## Resources on differential equations and deep learning\n", + "\n", + "1. [Artificial neural networks for solving ordinary and partial differential equations by I.E. Lagaris et al](https://pdfs.semanticscholar.org/d061/df393e0e8fbfd0ea24976458b7d42419040d.pdf)\n", + "\n", + "2. [Neural networks for solving differential equations by A. Honchar](https://becominghuman.ai/neural-networks-for-solving-differential-equations-fa230ac5e04c)\n", + "\n", + "3. [Solving differential equations using neural networks by M.M Chiaramonte and M. Kiener](http://cs229.stanford.edu/proj2013/ChiaramonteKiener-SolvingDifferentialEquationsUsingNeuralNetworks.pdf)\n", + "\n", + "4. [Introduction to Partial Differential Equations by A. Tveito, R. Winther](https://www.springer.com/us/book/9783540225515)" + ] + }, + { + "cell_type": "markdown", + "id": "f7c3b9fc", + "metadata": { + "editable": true + }, + "source": [ + "## Convolutional Neural Networks (recognizing images)\n", + "\n", + "Convolutional neural networks (CNNs) were developed during the last\n", + "decade of the previous century, with a focus on character recognition\n", + "tasks. Nowadays, CNNs are a central element in the spectacular success\n", + "of deep learning methods. The success in for example image\n", + "classifications have made them a central tool for most machine\n", + "learning practitioners.\n", + "\n", + "CNNs are very similar to ordinary Neural Networks.\n", + "They are made up of neurons that have learnable weights and\n", + "biases. Each neuron receives some inputs, performs a dot product and\n", + "optionally follows it with a non-linearity. The whole network still\n", + "expresses a single differentiable score function: from the raw image\n", + "pixels on one end to class scores at the other. And they still have a\n", + "loss function (for example Softmax) on the last (fully-connected) layer\n", + "and all the tips/tricks we developed for learning regular Neural\n", + "Networks still apply (back propagation, gradient descent etc etc)." + ] + }, + { + "cell_type": "markdown", + "id": "5d3a5ee8", + "metadata": { + "editable": true + }, + "source": [ + "## What is the Difference\n", + "\n", + "**CNN architectures make the explicit assumption that\n", + "the inputs are images, which allows us to encode certain properties\n", + "into the architecture. 
These then make the forward function more\n", + "efficient to implement and vastly reduce the amount of parameters in\n", + "the network.**" + ] + }, + { + "cell_type": "markdown", + "id": "e8618fc8", + "metadata": { + "editable": true + }, + "source": [ + "## Neural Networks vs CNNs\n", + "\n", + "Neural networks are defined as **affine transformations**, that is \n", + "a vector is received as input and is multiplied with a matrix of so-called weights (our unknown paramters) to produce an\n", + "output (to which a bias vector is usually added before passing the result\n", + "through a nonlinear activation function). This is applicable to any type of input, be it an\n", + "image, a sound clip or an unordered collection of features: whatever their\n", + "dimensionality, their representation can always be flattened into a vector\n", + "before the transformation." + ] + }, + { + "cell_type": "markdown", + "id": "b41b4781", + "metadata": { + "editable": true + }, + "source": [ + "## Why CNNS for images, sound files, medical images from CT scans etc?\n", + "\n", + "However, when we consider images, sound clips and many other similar kinds of data, these data have an intrinsic\n", + "structure. More formally, they share these important properties:\n", + "* They are stored as multi-dimensional arrays (think of the pixels of a figure) .\n", + "\n", + "* They feature one or more axes for which ordering matters (e.g., width and height axes for an image, time axis for a sound clip).\n", + "\n", + "* One axis, called the channel axis, is used to access different views of the data (e.g., the red, green and blue channels of a color image, or the left and right channels of a stereo audio track).\n", + "\n", + "These properties are not exploited when an affine transformation is applied; in\n", + "fact, all the axes are treated in the same way and the topological information\n", + "is not taken into account. Still, taking advantage of the implicit structure of\n", + "the data may prove very handy in solving some tasks, like computer vision and\n", + "speech recognition, and in these cases it would be best to preserve it. This is\n", + "where discrete convolutions come into play.\n", + "\n", + "A discrete convolution is a linear transformation that preserves this notion of\n", + "ordering. It is sparse (only a few input units contribute to a given output\n", + "unit) and reuses parameters (the same weights are applied to multiple locations\n", + "in the input)." + ] + }, + { + "cell_type": "markdown", + "id": "33bf8922", + "metadata": { + "editable": true + }, + "source": [ + "## Regular NNs don’t scale well to full images\n", + "\n", + "As an example, consider\n", + "an image of size $32\\times 32\\times 3$ (32 wide, 32 high, 3 color channels), so a\n", + "single fully-connected neuron in a first hidden layer of a regular\n", + "Neural Network would have $32\\times 32\\times 3 = 3072$ weights. This amount still\n", + "seems manageable, but clearly this fully-connected structure does not\n", + "scale to larger images. For example, an image of more respectable\n", + "size, say $200\\times 200\\times 3$, would lead to neurons that have \n", + "$200\\times 200\\times 3 = 120,000$ weights. \n", + "\n", + "We could have\n", + "several such neurons, and the parameters would add up quickly! Clearly,\n", + "this full connectivity is wasteful and the huge number of parameters\n", + "would quickly lead to possible overfitting.\n", + "\n", + "\n", + "\n", + "\n", + "

Figure 1: A regular 3-layer Neural Network.\n",
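+    "\n",
+    "The counting above is easy to reproduce. The following snippet (an illustrative addition, not from the original notes) contrasts the number of weights of a single fully-connected neuron with the handful of weights a small convolutional filter reuses across the whole image:\n",
+    "\n",
+    "```python\n",
+    "# Weights needed by ONE fully-connected neuron reading the whole image\n",
+    "for shape in [(32, 32, 3), (200, 200, 3)]:\n",
+    "    h, w, c = shape\n",
+    "    print(f\"{shape}: {h*w*c} weights per fully-connected neuron\")\n",
+    "\n",
+    "# A 3x3 filter over 3 input channels reuses the same weights at every location,\n",
+    "# independently of the image size\n",
+    "print(\"3x3x3 convolutional filter:\", 3*3*3, \"weights\")\n",
+    "```\n",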
    \n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "95c20234", + "metadata": { + "editable": true + }, + "source": [ + "## 3D volumes of neurons\n", + "\n", + "Convolutional Neural Networks take advantage of the fact that the\n", + "input consists of images and they constrain the architecture in a more\n", + "sensible way. \n", + "\n", + "In particular, unlike a regular Neural Network, the\n", + "layers of a CNN have neurons arranged in 3 dimensions: width,\n", + "height, depth. (Note that the word depth here refers to the third\n", + "dimension of an activation volume, not to the depth of a full Neural\n", + "Network, which can refer to the total number of layers in a network.)\n", + "\n", + "To understand it better, the above example of an image \n", + "with an input volume of\n", + "activations has dimensions $32\\times 32\\times 3$ (width, height,\n", + "depth respectively). \n", + "\n", + "The neurons in a layer will\n", + "only be connected to a small region of the layer before it, instead of\n", + "all of the neurons in a fully-connected manner. Moreover, the final\n", + "output layer could for this specific image have dimensions $1\\times 1 \\times 10$, \n", + "because by the\n", + "end of the CNN architecture we will reduce the full image into a\n", + "single vector of class scores, arranged along the depth\n", + "dimension. \n", + "\n", + "\n", + "\n", + "\n", + "

Figure 1: A CNN arranges its neurons in three dimensions (width, height, depth), as visualized in one of the layers. Every layer of a CNN transforms the 3D input volume to a 3D output volume of neuron activations. In this example, the red input layer holds the image, so its width and height would be the dimensions of the image, and the depth would be 3 (Red, Green, Blue channels).\n",
    \n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "2b7ba652", + "metadata": { + "editable": true + }, + "source": [ + "## More on Dimensionalities\n", + "\n", + "In fields like signal processing (and imaging as well), one designs\n", + "so-called filters. These filters are defined by the convolutions and\n", + "are often hand-crafted. One may specify filters for smoothing, edge\n", + "detection, frequency reshaping, and similar operations. However with\n", + "neural networks the idea is to automatically learn the filters and use\n", + "many of them in conjunction with non-linear operations (activation\n", + "functions).\n", + "\n", + "As an example consider a neural network operating on sound sequence\n", + "data. Assume that we an input vector $\\boldsymbol{x}$ of length $d=10^6$. We\n", + "construct then a neural network with onle hidden layer only with\n", + "$10^4$ nodes. This means that we will have a weight matrix with\n", + "$10^4\\times 10^6=10^{10}$ weights to be determined, together with $10^4$ biases.\n", + "\n", + "Assume furthermore that we have an output layer which is meant to train whether the sound sequence represents a human voice (true) or something else (false).\n", + "It means that we have only one output node. But since this output node connects to $10^4$ nodes in the hidden layer, there are in total $10^4$ weights to be determined for the output layer, plus one bias. In total we have" + ] + }, + { + "cell_type": "markdown", + "id": "b6a7ae46", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathrm{NumberParameters}=10^{10}+10^4+10^4+1 \\approx 10^{10},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "0d56b05e", + "metadata": { + "editable": true + }, + "source": [ + "that is ten billion parameters to determine." + ] + }, + { + "cell_type": "markdown", + "id": "35c90423", + "metadata": { + "editable": true + }, + "source": [ + "## Further remarks\n", + "\n", + "The main principles that justify convolutions is locality of\n", + "information and repetion of patterns within the signal. Sound samples\n", + "of the input in adjacent spots are much more likely to affect each\n", + "other than those that are very far away. Similarly, sounds are\n", + "repeated in multiple times in the signal. While slightly simplistic,\n", + "reasoning about such a sound example demonstrates this. The same\n", + "principles then apply to images and other similar data." + ] + }, + { + "cell_type": "markdown", + "id": "d08d4fb6", + "metadata": { + "editable": true + }, + "source": [ + "## Layers used to build CNNs\n", + "\n", + "A simple CNN is a sequence of layers, and every layer of a CNN\n", + "transforms one volume of activations to another through a\n", + "differentiable function. We use three main types of layers to build\n", + "CNN architectures: Convolutional Layer, Pooling Layer, and\n", + "Fully-Connected Layer (exactly as seen in regular Neural Networks). We\n", + "will stack these layers to form a full CNN architecture.\n", + "\n", + "A simple CNN for image classification could have the architecture:\n", + "\n", + "* **INPUT** ($32\\times 32 \\times 3$) will hold the raw pixel values of the image, in this case an image of width 32, height 32, and with three color channels R,G,B.\n", + "\n", + "* **CONV** (convolutional )layer will compute the output of neurons that are connected to local regions in the input, each computing a dot product between their weights and a small region they are connected to in the input volume. 
This may result in volume such as $[32\\times 32\\times 12]$ if we decided to use 12 filters.\n", + "\n", + "* **RELU** layer will apply an elementwise activation function, such as the $max(0,x)$ thresholding at zero. This leaves the size of the volume unchanged ($[32\\times 32\\times 12]$).\n", + "\n", + "* **POOL** (pooling) layer will perform a downsampling operation along the spatial dimensions (width, height), resulting in volume such as $[16\\times 16\\times 12]$.\n", + "\n", + "* **FC** (i.e. fully-connected) layer will compute the class scores, resulting in volume of size $[1\\times 1\\times 10]$, where each of the 10 numbers correspond to a class score, such as among the 10 categories of the MNIST images we considered above . As with ordinary Neural Networks and as the name implies, each neuron in this layer will be connected to all the numbers in the previous volume." + ] + }, + { + "cell_type": "markdown", + "id": "dd95dcc6", + "metadata": { + "editable": true + }, + "source": [ + "## Transforming images\n", + "\n", + "CNNs transform the original image layer by layer from the original\n", + "pixel values to the final class scores. \n", + "\n", + "Observe that some layers contain\n", + "parameters and other don’t. In particular, the CNN layers perform\n", + "transformations that are a function of not only the activations in the\n", + "input volume, but also of the parameters (the weights and biases of\n", + "the neurons). On the other hand, the RELU/POOL layers will implement a\n", + "fixed function. The parameters in the CONV/FC layers will be trained\n", + "with gradient descent so that the class scores that the CNN computes\n", + "are consistent with the labels in the training set for each image." + ] + }, + { + "cell_type": "markdown", + "id": "5fdbdbfd", + "metadata": { + "editable": true + }, + "source": [ + "## CNNs in brief\n", + "\n", + "In summary:\n", + "\n", + "* A CNN architecture is in the simplest case a list of Layers that transform the image volume into an output volume (e.g. holding the class scores)\n", + "\n", + "* There are a few distinct types of Layers (e.g. CONV/FC/RELU/POOL are by far the most popular)\n", + "\n", + "* Each Layer accepts an input 3D volume and transforms it to an output 3D volume through a differentiable function\n", + "\n", + "* Each Layer may or may not have parameters (e.g. CONV/FC do, RELU/POOL don’t)\n", + "\n", + "* Each Layer may or may not have additional hyperparameters (e.g. CONV/FC/POOL do, RELU doesn’t)" + ] + }, + { + "cell_type": "markdown", + "id": "c0cbb6b0", + "metadata": { + "editable": true + }, + "source": [ + "## A deep CNN model ([From Raschka et al](https://github.com/rasbt/machine-learning-book))\n", + "\n", + "\n", + "\n", + "\n", + "

Figure 1: A deep CNN\n",
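+    "\n",
+    "As an illustrative sketch (not part of the original notes), the simple INPUT-CONV-RELU-POOL-FC stack listed earlier could be written, for instance, with the Keras API; the sizes below follow the $32\times 32\times 3$ example with 12 filters:\n",
+    "\n",
+    "```python\n",
+    "import tensorflow as tf\n",
+    "from tensorflow.keras import layers\n",
+    "\n",
+    "model = tf.keras.Sequential([\n",
+    "    # CONV + RELU: 12 filters of size 3x3 give a 32x32x12 volume\n",
+    "    layers.Conv2D(12, 3, padding='same', activation='relu', input_shape=(32, 32, 3)),\n",
+    "    # POOL: downsample the spatial dimensions to 16x16x12\n",
+    "    layers.MaxPooling2D(pool_size=2),\n",
+    "    layers.Flatten(),\n",
+    "    # FC: one score per class\n",
+    "    layers.Dense(10)\n",
+    "])\n",
+    "model.summary()\n",
+    "```\n",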
    \n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "caf2418d", + "metadata": { + "editable": true + }, + "source": [ + "## Key Idea\n", + "\n", + "A dense neural network is representd by an affine operation (like\n", + "matrix-matrix multiplication) where all parameters are included.\n", + "\n", + "The key idea in CNNs for say imaging is that in images neighbor pixels tend to be related! So we connect\n", + "only neighboring neurons in the input instead of connecting all with the first hidden layer.\n", + "\n", + "We say we perform a filtering (convolution is the mathematical operation)." + ] + }, + { + "cell_type": "markdown", + "id": "7d5552d8", + "metadata": { + "editable": true + }, + "source": [ + "## How to do image compression before the era of deep learning\n", + "\n", + "The singular-value decomposition (SVD) algorithm has been for decades one of the standard ways of compressing images.\n", + "The [lectures on the SVD](https://compphysics.github.io/MachineLearning/doc/LectureNotes/_build/html/chapter2.html#the-singular-value-decomposition) give many of the essential details concerning the SVD.\n", + "\n", + "The orthogonal vectors which are obtained from the SVD, can be used to\n", + "project down the dimensionality of a given image. In the example here\n", + "we gray-scale an image and downsize it.\n", + "\n", + "This recipe relies on us being able to actually perform the SVD. For\n", + "large images, and in particular with many images to reconstruct, using the SVD \n", + "may quickly become an overwhelming task. With the advent of efficient deep\n", + "learning methods like CNNs and later generative methods, these methods\n", + "have become in the last years the premier way of performing image\n", + "analysis. In particular for classification problems with labelled images." 
+ ] + }, + { + "cell_type": "markdown", + "id": "d0bc0489", + "metadata": { + "editable": true + }, + "source": [ + "## The SVD example" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "id": "cec697e6", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "from matplotlib.image import imread\n", + "import matplotlib.pyplot as plt\n", + "import scipy.linalg as ln\n", + "import numpy as np\n", + "import os\n", + "from PIL import Image\n", + "from math import log10, sqrt \n", + "plt.rcParams['figure.figsize'] = [16, 8]\n", + "# Import image\n", + "A = imread(os.path.join(\"figslides/photo1.jpg\"))\n", + "X = A.dot([0.299, 0.5870, 0.114]) # Convert RGB to grayscale\n", + "img = plt.imshow(X)\n", + "# convert to gray\n", + "img.set_cmap('gray')\n", + "plt.axis('off')\n", + "plt.show()\n", + "# Call image size\n", + "print(': %s'%str(X.shape))\n", + "\n", + "\n", + "# split the matrix into U, S, VT\n", + "U, S, VT = np.linalg.svd(X,full_matrices=False)\n", + "S = np.diag(S)\n", + "m = 800 # Image's width\n", + "n = 1200 # Image's height\n", + "j = 0\n", + "# Try compression with different k vectors (these represent projections):\n", + "for k in (5,10, 20, 100,200,400,500):\n", + " # Original size of the image\n", + " originalSize = m * n \n", + " # Size after compressed\n", + " compressedSize = k * (1 + m + n) \n", + " # The projection of the original image\n", + " Xapprox = U[:,:k] @ S[0:k,:k] @ VT[:k,:]\n", + " plt.figure(j+1)\n", + " j += 1\n", + " img = plt.imshow(Xapprox)\n", + " img.set_cmap('gray')\n", + " \n", + " plt.axis('off')\n", + " plt.title('k = ' + str(k))\n", + " plt.show() \n", + " print('Original size of image:')\n", + " print(originalSize)\n", + " print('Compression rate as Compressed image / Original size:')\n", + " ratio = compressedSize * 1.0 / originalSize\n", + " print(ratio)\n", + " print('Compression rate is ' + str( round(ratio * 100 ,2)) + '%' ) \n", + " # Estimate MQA\n", + " x= X.astype(\"float\")\n", + " y=Xapprox.astype(\"float\")\n", + " err = np.sum((x - y) ** 2)\n", + " err /= float(X.shape[0] * Xapprox.shape[1])\n", + " print('The mean-square deviation '+ str(round( err)))\n", + " max_pixel = 255.0\n", + " # Estimate Signal Noise Ratio\n", + " srv = 20 * (log10(max_pixel / sqrt(err)))\n", + " print('Signa to noise ratio '+ str(round(srv)) +'dB')" + ] + }, + { + "cell_type": "markdown", + "id": "6a578704", + "metadata": { + "editable": true + }, + "source": [ + "## Mathematics of CNNs\n", + "\n", + "The mathematics of CNNs is based on the mathematical operation of\n", + "**convolution**. 
In mathematics (in particular in functional analysis),\n", + "convolution is represented by mathematical operations (integration,\n", + "summation etc) on two functions in order to produce a third function\n", + "that expresses how the shape of one gets modified by the other.\n", + "Convolution has a plethora of applications in a variety of\n", + "disciplines, spanning from statistics to signal processing, computer\n", + "vision, solutions of differential equations,linear algebra,\n", + "engineering, and yes, machine learning.\n", + "\n", + "Mathematically, convolution is defined as follows (one-dimensional example):\n", + "Let us define a continuous function $y(t)$ given by" + ] + }, + { + "cell_type": "markdown", + "id": "5c858d52", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y(t) = \\int x(a) w(t-a) da,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "a96333c3", + "metadata": { + "editable": true + }, + "source": [ + "where $x(a)$ represents a so-called input and $w(t-a)$ is normally called the weight function or kernel.\n", + "\n", + "The above integral is written in a more compact form as" + ] + }, + { + "cell_type": "markdown", + "id": "9834d45e", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y(t) = \\left(x * w\\right)(t).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "13e15c5f", + "metadata": { + "editable": true + }, + "source": [ + "The discretized version reads" + ] + }, + { + "cell_type": "markdown", + "id": "0a496b2f", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y(t) = \\sum_{a=-\\infty}^{a=\\infty}x(a)w(t-a).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "48c5ecd3", + "metadata": { + "editable": true + }, + "source": [ + "Computing the inverse of the above convolution operations is known as deconvolution and the process is commutative.\n", + "\n", + "How can we use this? And what does it mean? Let us study some familiar examples first." + ] + }, + { + "cell_type": "markdown", + "id": "7cab11e7", + "metadata": { + "editable": true + }, + "source": [ + "## Convolution Examples: Polynomial multiplication\n", + "\n", + "Our first example is that of a multiplication between two polynomials,\n", + "which we will rewrite in terms of the mathematics of convolution. 
In\n", + "the final stage, since the problem here is a discrete one, we will\n", + "recast the final expression in terms of a matrix-vector\n", + "multiplication, where the matrix is a so-called [Toeplitz matrix\n", + "](https://link.springer.com/book/10.1007/978-93-86279-04-0).\n", + "\n", + "Let us look a the following polynomials to second and third order, respectively:" + ] + }, + { + "cell_type": "markdown", + "id": "c90333f8", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "p(t) = \\alpha_0+\\alpha_1 t+\\alpha_2 t^2,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "7c1b0c9b", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "9c8df6e8", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "s(t) = \\beta_0+\\beta_1 t+\\beta_2 t^2+\\beta_3 t^3.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "50667dfa", + "metadata": { + "editable": true + }, + "source": [ + "The polynomial multiplication gives us a new polynomial of degree $5$" + ] + }, + { + "cell_type": "markdown", + "id": "11f2ea4b", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "z(t) = \\delta_0+\\delta_1 t+\\delta_2 t^2+\\delta_3 t^3+\\delta_4 t^4+\\delta_5 t^5.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "4abea758", + "metadata": { + "editable": true + }, + "source": [ + "## Efficient Polynomial Multiplication\n", + "\n", + "Computing polynomial products can be implemented efficiently if we rewrite the more brute force multiplications using convolution.\n", + "We note first that the new coefficients are given as" + ] + }, + { + "cell_type": "markdown", + "id": "ad22b2d2", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{split}\n", + "\\delta_0=&\\alpha_0\\beta_0\\\\\n", + "\\delta_1=&\\alpha_1\\beta_0+\\alpha_0\\beta_1\\\\\n", + "\\delta_2=&\\alpha_0\\beta_2+\\alpha_1\\beta_1+\\alpha_2\\beta_0\\\\\n", + "\\delta_3=&\\alpha_1\\beta_2+\\alpha_2\\beta_1+\\alpha_0\\beta_3\\\\\n", + "\\delta_4=&\\alpha_2\\beta_2+\\alpha_1\\beta_3\\\\\n", + "\\delta_5=&\\alpha_2\\beta_3.\\\\\n", + "\\end{split}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "6a3ee064", + "metadata": { + "editable": true + }, + "source": [ + "We note that $\\alpha_i=0$ except for $i\\in \\left\\{0,1,2\\right\\}$ and $\\beta_i=0$ except for $i\\in\\left\\{0,1,2,3\\right\\}$.\n", + "\n", + "We can then rewrite the coefficients $\\delta_j$ using a discrete convolution as" + ] + }, + { + "cell_type": "markdown", + "id": "3aca65d8", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_j = \\sum_{i=-\\infty}^{i=\\infty}\\alpha_i\\beta_{j-i}=(\\alpha * \\beta)_j,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "0e04ce27", + "metadata": { + "editable": true + }, + "source": [ + "or as a double sum with restriction $l=i+j$" + ] + }, + { + "cell_type": "markdown", + "id": "173eda29", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_l = \\sum_{ij}\\alpha_i\\beta_{j}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "a196c2cd", + "metadata": { + "editable": true + }, + "source": [ + "## Further simplification\n", + "\n", + "Although we may have redundant operations with some few zeros for $\\beta_i$, we can rewrite the above sum in a more compact way as" + ] + }, + { + "cell_type": "markdown", + "id": "56018bb8", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_i = 
\\sum_{k=0}^{k=m-1}\\alpha_k\\beta_{i-k},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "ba91ab7b", + "metadata": { + "editable": true + }, + "source": [ + "where $m=3$ in our case, the maximum length of\n", + "the vector $\\alpha$. Note that the vector $\\boldsymbol{\\beta}$ has length $n=4$. Below we will find an even more efficient representation." + ] + }, + { + "cell_type": "markdown", + "id": "1b25324b", + "metadata": { + "editable": true + }, + "source": [ + "## A more efficient way of coding the above Convolution\n", + "\n", + "Since we only have a finite number of $\\alpha$ and $\\beta$ values\n", + "which are non-zero, we can rewrite the above convolution expressions\n", + "as a matrix-vector multiplication" + ] + }, + { + "cell_type": "markdown", + "id": "dd6d9155", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{\\delta}=\\begin{bmatrix}\\alpha_0 & 0 & 0 & 0 \\\\\n", + " \\alpha_1 & \\alpha_0 & 0 & 0 \\\\\n", + "\t\t\t \\alpha_2 & \\alpha_1 & \\alpha_0 & 0 \\\\\n", + "\t\t\t 0 & \\alpha_2 & \\alpha_1 & \\alpha_0 \\\\\n", + "\t\t\t 0 & 0 & \\alpha_2 & \\alpha_1 \\\\\n", + "\t\t\t 0 & 0 & 0 & \\alpha_2\n", + "\t\t\t \\end{bmatrix}\\begin{bmatrix} \\beta_0 \\\\ \\beta_1 \\\\ \\beta_2 \\\\ \\beta_3\\end{bmatrix}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "28050537", + "metadata": { + "editable": true + }, + "source": [ + "## Commutative process\n", + "\n", + "The process is commutative and we can easily see that we can rewrite the multiplication in terms of a matrix holding $\\beta$ and a vector holding $\\alpha$.\n", + "In this case we have" + ] + }, + { + "cell_type": "markdown", + "id": "f8278af4", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{\\delta}=\\begin{bmatrix}\\beta_0 & 0 & 0 \\\\\n", + " \\beta_1 & \\beta_0 & 0 \\\\\n", + "\t\t\t \\beta_2 & \\beta_1 & \\beta_0 \\\\\n", + "\t\t\t \\beta_3 & \\beta_2 & \\beta_1 \\\\\n", + "\t\t\t 0 & \\beta_3 & \\beta_2 \\\\\n", + "\t\t\t 0 & 0 & \\beta_3\n", + "\t\t\t \\end{bmatrix}\\begin{bmatrix} \\alpha_0 \\\\ \\alpha_1 \\\\ \\alpha_2\\end{bmatrix}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "cfa8bf9e", + "metadata": { + "editable": true + }, + "source": [ + "Note that the use of these matrices is for mathematical purposes only\n", + "and not implementation purposes. When implementing the above equation\n", + "we do not encode (and allocate memory) the matrices explicitely. We\n", + "rather code the convolutions in the minimal memory footprint that they\n", + "require." + ] + }, + { + "cell_type": "markdown", + "id": "4ad971ca", + "metadata": { + "editable": true + }, + "source": [ + "## Toeplitz matrices\n", + "\n", + "The above matrices are examples of so-called [Toeplitz\n", + "matrices](https://link.springer.com/book/10.1007/978-93-86279-04-0). A\n", + "Toeplitz matrix is a matrix in which each descending diagonal from\n", + "left to right is constant. 
For instance the last matrix, which we\n", + "rewrite as" + ] + }, + { + "cell_type": "markdown", + "id": "ff12250a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{A}=\\begin{bmatrix}a_0 & 0 & 0 \\\\\n", + " a_1 & a_0 & 0 \\\\\n", + "\t\t\t a_2 & a_1 & a_0 \\\\\n", + "\t\t\t a_3 & a_2 & a_1 \\\\\n", + "\t\t\t 0 & a_3 & a_2 \\\\\n", + "\t\t\t 0 & 0 & a_3\n", + "\t\t\t \\end{bmatrix},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "d7ebe2e9", + "metadata": { + "editable": true + }, + "source": [ + "with elements $a_{ii}=a_{i+1,j+1}=a_{i-j}$ is an example of a Toeplitz\n", + "matrix. Such a matrix does not need to be a square matrix. Toeplitz\n", + "matrices are also closely connected with Fourier series, because the multiplication operator by a trigonometric\n", + "polynomial, compressed to a finite-dimensional space, can be\n", + "represented by such a matrix. The example above shows that we can\n", + "represent linear convolution as multiplication of a Toeplitz matrix by\n", + "a vector." + ] + }, + { + "cell_type": "markdown", + "id": "5bfa6cd4", + "metadata": { + "editable": true + }, + "source": [ + "## Fourier series and Toeplitz matrices\n", + "\n", + "This is an active and ogoing research area concerning CNNs. The following articles may be of interest\n", + "1. [Read more about the convolution theorem and Fouriers series](https://www.sciencedirect.com/topics/engineering/convolution-theorem#:~:text=The%20convolution%20theorem%20(together%20with,k%20)%20G%20(%20k%20)%20.)\n", + "\n", + "2. [Fourier Transform Layer](https://www.sciencedirect.com/science/article/pii/S1568494623006257)" + ] + }, + { + "cell_type": "markdown", + "id": "4cb64d8c", + "metadata": { + "editable": true + }, + "source": [ + "## Generalizing the above one-dimensional case\n", + "\n", + "In order to align the above simple case with the more general\n", + "convolution cases, we rename $\\boldsymbol{\\alpha}$, whose length is $m=3$,\n", + "with $\\boldsymbol{w}$. We will interpret $\\boldsymbol{w}$ as a weight/filter function\n", + "with which we want to perform the convolution with an input variable\n", + "$\\boldsymbol{x}$ of length $n$. We will assume always that the filter\n", + "$\\boldsymbol{w}$ has dimensionality $m \\le n$.\n", + "\n", + "We replace thus $\\boldsymbol{\\beta}$ with $\\boldsymbol{x}$ and $\\boldsymbol{\\delta}$ with $\\boldsymbol{y}$ and have" + ] + }, + { + "cell_type": "markdown", + "id": "b05f94fc", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y(i)= \\left(x*w\\right)(i)= \\sum_{k=0}^{k=m-1}w(k)x(i-k),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "e95bb8b8", + "metadata": { + "editable": true + }, + "source": [ + "where $m=3$ in our case, the maximum length of the vector $\\boldsymbol{w}$.\n", + "Here the symbol $*$ represents the mathematical operation of convolution." + ] + }, + { + "cell_type": "markdown", + "id": "490b28d9", + "metadata": { + "editable": true + }, + "source": [ + "## Memory considerations\n", + "\n", + "This expression leaves us however with some terms with negative\n", + "indices, for example $x(-1)$ and $x(-2)$ which may not be defined. 
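Before handling these boundary terms with padding, it is worth checking numerically that the polynomial product above really is a discrete convolution, and that it agrees with the Toeplitz matrix-vector product. A minimal NumPy sketch, with invented coefficient values:

```python
import numpy as np

alpha = np.array([1.0, 2.0, 3.0])       # coefficients of p(t), invented values
beta = np.array([4.0, 5.0, 6.0, 7.0])   # coefficients of s(t), invented values

# The six coefficients delta_0, ..., delta_5 of the product polynomial
delta = np.convolve(alpha, beta)

# The same numbers from the Toeplitz matrix (shifted copies of alpha) times beta
m, n = alpha.size, beta.size
A = np.zeros((m + n - 1, n))
for j in range(n):
    A[j:j + m, j] = alpha
print(np.allclose(delta, A @ beta))                                       # True
print(np.allclose(delta, np.polynomial.polynomial.polymul(alpha, beta)))  # True
```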
Our\n", + "vector $\\boldsymbol{x}$ has components $x(0)$, $x(1)$, $x(2)$ and $x(3)$.\n", + "\n", + "The index $j$ for $\\boldsymbol{x}$ runs from $j=0$ to $j=3$ since $\\boldsymbol{x}$ is meant to\n", + "represent a third-order polynomial.\n", + "\n", + "Furthermore, the index $i$ runs from $i=0$ to $i=5$ since $\\boldsymbol{y}$\n", + "contains the coefficients of a fifth-order polynomial. When $i=5$ we\n", + "may also have values of $x(4)$ and $x(5)$ which are not defined." + ] + }, + { + "cell_type": "markdown", + "id": "73dba37b", + "metadata": { + "editable": true + }, + "source": [ + "## Padding\n", + "\n", + "The solution to this is what is called **padding**! We simply define a\n", + "new vector $x$ with two added elements set to zero before $x(0)$ and\n", + "two new elements after $x(3)$ set to zero. That is, we augment the\n", + "length of $\\boldsymbol{x}$ from $n=4$ to $n+2P=8$, where $P=2$ is the padding\n", + "constant (a new hyperparameter), see discussions below as well." + ] + }, + { + "cell_type": "markdown", + "id": "a4ef9cfb", + "metadata": { + "editable": true + }, + "source": [ + "## New vector\n", + "\n", + "We have a new vector defined as $x(0)=0$, $x(1)=0$,\n", + "$x(2)=\\beta_0$, $x(3)=\\beta_1$, $x(4)=\\beta_2$, $x(5)=\\beta_3$,\n", + "$x(6)=0$, and $x(7)=0$.\n", + "\n", + "We have added four new elements, which\n", + "are all zero. The benefit is that we can rewrite the equation for\n", + "$\\boldsymbol{y}$, with $i=0,1,\\dots,5$," + ] + }, + { + "cell_type": "markdown", + "id": "a3df037d", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y(i) = \\sum_{k=0}^{k=m-1}w(k)x(i+(m-1)-k).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "a10c95fd", + "metadata": { + "editable": true + }, + "source": [ + "As an example, we have" + ] + }, + { + "cell_type": "markdown", + "id": "be674b8a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y(4)=x(6)w(0)+x(5)w(1)+x(4)w(2)=0\\times \\alpha_0+\\beta_3\\alpha_1+\\beta_2\\alpha_2,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "c903130e", + "metadata": { + "editable": true + }, + "source": [ + "as before except that we have an additional term $x(6)w(0)$, which is zero.\n", + "\n", + "Similarly, for the fifth-order term we have" + ] + }, + { + "cell_type": "markdown", + "id": "369fb648", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y(5)=x(7)w(0)+x(6)w(1)+x(5)w(2)=0\\times \\alpha_0+0\\times\\alpha_1+\\beta_3\\alpha_2.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "9eae3982", + "metadata": { + "editable": true + }, + "source": [ + "The zeroth-order term is" + ] + }, + { + "cell_type": "markdown", + "id": "52147ec0", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y(0)=x(2)w(0)+x(1)w(1)+x(0)w(2)=\\beta_0 \\alpha_0+0\\times\\alpha_1+0\\times\\alpha_2=\\alpha_0\\beta_0.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "f26b1f24", + "metadata": { + "editable": true + }, + "source": [ + "## Rewriting as dot products\n", + "\n", + "If we now flip the filter/weight vector, with the following term as a typical example" + ] + }, + { + "cell_type": "markdown", + "id": "1cda7b7e", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y(0)=x(2)w(0)+x(1)w(1)+x(0)w(2)=x(2)\\tilde{w}(2)+x(1)\\tilde{w}(1)+x(0)\\tilde{w}(0),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "de80daa7", + "metadata": { + "editable": true + }, + "source": [ + "with $\\tilde{w}(0)=w(2)$, $\\tilde{w}(1)=w(1)$, and 
$\\tilde{w}(2)=w(0)$, we can then rewrite the above sum as a dot product of\n", + "$x(i:i+(m-1))\\tilde{w}$ for element $y(i)$, where $x(i:i+(m-1))$ is simply a patch of $\\boldsymbol{x}$ of size $m-1$.\n", + "\n", + "The padding $P$ we have introduced for the convolution stage is just\n", + "another hyperparameter which is introduced as part of the\n", + "architecture. Similarly, below we will also introduce another\n", + "hyperparameter called **Stride** $S$." + ] + }, + { + "cell_type": "markdown", + "id": "bdb16a64", + "metadata": { + "editable": true + }, + "source": [ + "## Cross correlation\n", + "\n", + "In essentially all applications one uses what is called cross correlation instead of the standard convolution described above.\n", + "This means that multiplication is performed in the same direction and instead of the general expression we have discussed above (with infinite sums)" + ] + }, + { + "cell_type": "markdown", + "id": "a88a1043", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y(i) = \\sum_{k=-\\infty}^{k=\\infty}w(k)x(i-k),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "03659d77", + "metadata": { + "editable": true + }, + "source": [ + "we have now" + ] + }, + { + "cell_type": "markdown", + "id": "532e84de", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y(i) = \\sum_{k=-\\infty}^{k=\\infty}w(k)x(i+k).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "0487f1f5", + "metadata": { + "editable": true + }, + "source": [ + "Both TensorFlow and PyTorch (as well as our own code example below),\n", + "implement the last equation, although it is normally referred to as\n", + "convolution. The same padding rules and stride rules discussed below\n", + "apply to this expression as well.\n", + "\n", + "We leave it as an exercise for you to convince yourself that the example we have discussed till now, gives the same final expression using the last expression." + ] + }, + { + "cell_type": "markdown", + "id": "98475dfa", + "metadata": { + "editable": true + }, + "source": [ + "## Two-dimensional objects\n", + "\n", + "We are now ready to start studying the discrete convolutions relevant for convolutional neural networks.\n", + "We often use convolutions over more than one dimension at a time. If\n", + "we have a two-dimensional image $X$ as input, we can have a **filter**\n", + "defined by a two-dimensional **kernel/weight/filter** $W$. 
This leads to an output $Y$" + ] + }, + { + "cell_type": "markdown", + "id": "1cb3be71", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "Y(i,j)=(X * W)(i,j) = \\sum_m\\sum_n X(m,n)W(i-m,j-n).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "bd9cd9fb", + "metadata": { + "editable": true + }, + "source": [ + "Convolution is a commutative process, which means we can rewrite this equation as" + ] + }, + { + "cell_type": "markdown", + "id": "1ba314a8", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "Y(i,j)=(X * W)(i,j) = \\sum_m\\sum_n X(i-m,j-n)W(m,n).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b9fb3fef", + "metadata": { + "editable": true + }, + "source": [ + "Normally the latter is more straightforward to implement in a machine\n", + "larning library since there is less variation in the range of values\n", + "of $m$ and $n$.\n", + "\n", + "As mentioned above, most deep learning libraries implement\n", + "cross-correlation instead of convolution (although it is referred to as\n", + "convolution)" + ] + }, + { + "cell_type": "markdown", + "id": "2d48086b", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "Y(i,j)=(X * W)(i,j) = \\sum_m\\sum_n X(i+m,j+n)W(m,n).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "2a62fbae", + "metadata": { + "editable": true + }, + "source": [ + "## CNNs in more detail, simple example\n", + "\n", + "Let assume we have an input matrix $X$ of dimensionality $3\\times 3$\n", + "and a $2\\times 2$ filter $W$ given by the following matrices" + ] + }, + { + "cell_type": "markdown", + "id": "0176ecc6", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{X}=\\begin{bmatrix}x_{00} & x_{01} & x_{02} \\\\\n", + " x_{10} & x_{11} & x_{12} \\\\\n", + "\t x_{20} & x_{21} & x_{22} \\end{bmatrix},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "f87b6051", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "164502cc", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{W}=\\begin{bmatrix}w_{00} & w_{01} \\\\\n", + "\t w_{10} & w_{11}\\end{bmatrix}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "1d4e61fe", + "metadata": { + "editable": true + }, + "source": [ + "We introduce now the hyperparameter $S$ **stride**. Stride represents how the filter $W$ moves the convolution process on the matrix $X$.\n", + "We strongly recommend the repository on [Arithmetic of deep learning by Dumoulin and Visin](https://github.com/vdumoulin/conv_arithmetic) \n", + "\n", + "Here we set the stride equal to $S=1$, which means that, starting with the element $x_{00}$, the filter will act on $2\\times 2$ submatrices each time, starting with the upper corner and moving according to the stride value column by column. 
\n", + "\n", + "Here we perform the operation" + ] + }, + { + "cell_type": "markdown", + "id": "7aae890d", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "Y_(i,j)=(X * W)(i,j) = \\sum_m\\sum_n X(i-m,j-n)W(m,n),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "352ba109", + "metadata": { + "editable": true + }, + "source": [ + "and obtain" + ] + }, + { + "cell_type": "markdown", + "id": "4660c16f", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{Y}=\\begin{bmatrix}x_{00}w_{00}+x_{01}w_{01}+x_{10}w_{10}+x_{11}w_{11} & x_{01}w_{00}+x_{02}w_{01}+x_{11}w_{10}+x_{12}w_{11} \\\\\n", + "\t x_{10}w_{00}+x_{11}w_{01}+x_{20}w_{10}+x_{21}w_{11} & x_{11}w_{00}+x_{12}w_{01}+x_{21}w_{10}+x_{22}w_{11}\\end{bmatrix}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "edb9d39b", + "metadata": { + "editable": true + }, + "source": [ + "We can rewrite this operation in terms of a matrix-vector multiplication by defining a new vector where we flatten out the inputs as a vector $\\boldsymbol{X}'$ of length $9$ and\n", + "a matrix $\\boldsymbol{W}'$ with dimension $4\\times 9$ as" + ] + }, + { + "cell_type": "markdown", + "id": "11470079", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{X}'=\\begin{bmatrix}x_{00} \\\\ x_{01} \\\\ x_{02} \\\\ x_{10} \\\\ x_{11} \\\\ x_{12} \\\\ x_{20} \\\\ x_{21} \\\\ x_{22} \\end{bmatrix},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "9b505f16", + "metadata": { + "editable": true + }, + "source": [ + "and the new matrix" + ] + }, + { + "cell_type": "markdown", + "id": "30c903b3", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{W}'=\\begin{bmatrix} w_{00} & w_{01} & 0 & w_{10} & w_{11} & 0 & 0 & 0 & 0 \\\\\n", + " 0 & w_{00} & w_{01} & 0 & w_{10} & w_{11} & 0 & 0 & 0 \\\\\n", + "\t\t\t0 & 0 & 0 & w_{00} & w_{01} & 0 & w_{10} & w_{11} & 0 \\\\\n", + " 0 & 0 & 0 & 0 & w_{00} & w_{01} & 0 & w_{10} & w_{11}\\end{bmatrix}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "057a5e31", + "metadata": { + "editable": true + }, + "source": [ + "We see easily that performing the matrix-vector multiplication $\\boldsymbol{W}'\\boldsymbol{X}'$ is the same as the above convolution with stride $S=1$, that is" + ] + }, + { + "cell_type": "markdown", + "id": "e5f35917", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "Y=(\\boldsymbol{W}*\\boldsymbol{X}),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "7c7cca5e", + "metadata": { + "editable": true + }, + "source": [ + "is now given by $\\boldsymbol{W}'\\boldsymbol{X}'$ which is a vector of length $4$ instead of the originally resulting $2\\times 2$ output matrix." + ] + }, + { + "cell_type": "markdown", + "id": "ed8782fc", + "metadata": { + "editable": true + }, + "source": [ + "## The convolution stage\n", + "\n", + "The convolution stage, where we apply different filters $\\boldsymbol{W}$ in\n", + "order to reduce the dimensionality of an image, adds, in addition to\n", + "the weights and biases (to be trained by the back propagation\n", + "algorithm) that define the filters, two new hyperparameters, the so-called\n", + "**padding** $P$ and the stride $S$." + ] + }, + { + "cell_type": "markdown", + "id": "3582873f", + "metadata": { + "editable": true + }, + "source": [ + "## Finding the number of parameters\n", + "\n", + "In the above example we have an input matrix of dimension $3\\times\n", + "3$. 
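Before counting parameters, the $2\times 2$ output derived above is easy to reproduce numerically. The explicit matrix corresponds to the cross-correlation form $Y(i,j)=\sum_{m,n}X(i+m,j+n)W(m,n)$ discussed earlier, which is what the small loop below computes; the input and filter values are invented, and `scipy.signal.correlate2d` with `mode="valid"` should give the same result for stride $S=1$.

```python
import numpy as np

def cross_correlate2d(X, W, stride=1):
    """Valid cross-correlation: Y[i, j] = sum_{m, n} X[i*S + m, j*S + n] * W[m, n]."""
    H1, W1 = X.shape
    F1, F2 = W.shape
    H2 = (H1 - F1) // stride + 1
    W2 = (W1 - F2) // stride + 1
    Y = np.zeros((H2, W2))
    for i in range(H2):
        for j in range(W2):
            patch = X[i * stride:i * stride + F1, j * stride:j * stride + F2]
            Y[i, j] = np.sum(patch * W)
    return Y

# Invented numbers for the 3x3 input and the 2x2 filter of the example
X = np.arange(1.0, 10.0).reshape(3, 3)
W = np.array([[1.0, 2.0],
              [3.0, 4.0]])
print(cross_correlate2d(X, W, stride=1))   # 2x2 output, matching the expression above
```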
In general we call the input an input volume. It is defined\n", + "by its width $W_1$, height $H_1$ and depth $D_1$. If we have the\n", + "standard three color channels, $D_1=3$.\n", + "\n", + "The above example has $W_1=H_1=3$ and $D_1=1$.\n", + "\n", + "When we introduce the filter we have the following additional hyperparameters:\n", + "1. $K$, the number of filters. It is common to perform the convolution of the input several times, since experience shows that shrinking the input too fast does not work well.\n", + "\n", + "2. $F$ as the filter's spatial extent\n", + "\n", + "3. $S$ as the stride parameter\n", + "\n", + "4. $P$ as the padding parameter\n", + "\n", + "These parameters are defined by the architecture of the network and are not included in the training." + ] + }, + { + "cell_type": "markdown", + "id": "c06b2b85", + "metadata": { + "editable": true + }, + "source": [ + "## New image (or volume)\n", + "\n", + "Acting with the filter on the input volume produces an output volume\n", + "which is defined by its width $W_2$, its height $H_2$ and its depth\n", + "$D_2$.\n", + "\n", + "These are defined by the following relations" + ] + }, + { + "cell_type": "markdown", + "id": "aa9ff748", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "W_2 = \\frac{(W_1-F+2P)}{S}+1,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "0508533e", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "H_2 = \\frac{(H_1-F+2P)}{S}+1,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "2b59d0d6", + "metadata": { + "editable": true + }, + "source": [ + "and $D_2=K$." + ] + }, + { + "cell_type": "markdown", + "id": "e283f13b", + "metadata": { + "editable": true + }, + "source": [ + "## Parameters to train, common settings\n", + "\n", + "With parameter sharing, the convolution thus involves, for each filter, $F\\times F\\times D_1$ weights plus one bias parameter.\n", + "\n", + "In total we have" + ] + }, + { + "cell_type": "markdown", + "id": "59617fcb", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\left(F\\times F\\times D_1\\right) \\times K+K,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "f406e197", + "metadata": { + "editable": true + }, + "source": [ + "parameters to train by back propagation, where the last $K$ parameters are the biases.\n", + "\n", + "It is common to let $K$ come in powers of $2$, that is $32$, $64$, $128$ etc.\n", + "\n", + "**Common settings.**\n", + "\n", + "1. $F=3$, $S=1$, $P=1$\n", + "\n", + "2. $F=5$, $S=1$, $P=2$\n", + "\n", + "3. $F=5$, $S=2$, $P$ open\n", + "\n", + "4. $F=1$, $S=1$, $P=0$" + ] + }, + { + "cell_type": "markdown", + "id": "82febfb4", + "metadata": { + "editable": true + }, + "source": [ + "## Examples of CNN setups\n", + "\n", + "Let us assume we have an input volume $V$ given by an image of dimensionality\n", + "$32\\times 32 \\times 3$, that is three color channels and $32\\times 32$ pixels.\n", + "\n", + "We apply a filter of dimension $5\\times 5$ ten times with stride $S=1$ and padding $P=0$.\n", + "\n", + "The output width and height are given by $(32-5)/1+1=28$, resulting in ten feature maps\n", + "of dimensionality $28\\times 28$ each, that is an output volume of dimensionality $28\\times 28\\times 10$.\n", + "\n", + "The total number of parameters to train for each filter is then\n", + "$5\\times 5\\times 3+1$, where the last parameter is the bias. 
This\n", + "gives us $76$ parameters for each filter, leading to a total of $760$\n", + "parameters for the ten filters.\n", + "\n", + "How many parameters will a filter of dimensionality $3\\times 3$\n", + "(adding color channels) result in if we produce $32$ new images? Use $S=1$ and $P=0$.\n", + "\n", + "Note that strides constitute a form of **subsampling**. As an alternative to\n", + "being interpreted as a measure of how much the kernel/filter is translated, strides\n", + "can also be viewed as how much of the output is retained. For instance, moving\n", + "the kernel by hops of two is equivalent to moving the kernel by hops of one but\n", + "retaining only odd output elements." + ] + }, + { + "cell_type": "markdown", + "id": "638e063c", + "metadata": { + "editable": true + }, + "source": [ + "## Summarizing: Performing a general discrete convolution ([From Raschka et al](https://github.com/rasbt/machine-learning-book))\n", + "\n", + "\n", + "\n", + "\n", + "

Figure 1: A deep CNN
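The output-size relations $W_2=(W_1-F+2P)/S+1$, $H_2=(H_1-F+2P)/S+1$, $D_2=K$ and the parameter count $(F\times F\times D_1)\times K + K$ are convenient to wrap in a small helper. A sketch (the function name is made up; the example numbers are the $32\times 32\times 3$ input with ten $5\times 5$ filters discussed above):

```python
def conv_layer_shape_and_params(W1, H1, D1, K, F, S, P):
    """Output volume (W2, H2, D2) and number of trainable parameters of one conv layer."""
    W2 = (W1 - F + 2 * P) // S + 1
    H2 = (H1 - F + 2 * P) // S + 1
    params = (F * F * D1) * K + K        # weights of the K filters + K biases
    return (W2, H2, K), params

# The example above: 32x32x3 input, ten 5x5 filters, stride 1, no padding
print(conv_layer_shape_and_params(32, 32, 3, K=10, F=5, S=1, P=0))
# ((28, 28, 10), 760)
```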
    \n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "d182de4b", + "metadata": { + "editable": true + }, + "source": [ + "## Pooling\n", + "\n", + "In addition to discrete convolutions themselves, **pooling** operations\n", + "make up another important building block in CNNs. Pooling operations reduce\n", + "the size of feature maps by using some function to summarize subregions, such\n", + "as taking the average or the maximum value.\n", + "\n", + "Pooling works by sliding a window across the input and feeding the content of\n", + "the window to a **pooling function**. In some sense, pooling works very much\n", + "like a discrete convolution, but replaces the linear combination described by\n", + "the kernel with some other function." + ] + }, + { + "cell_type": "markdown", + "id": "1159bffe", + "metadata": { + "editable": true + }, + "source": [ + "## Pooling arithmetic\n", + "\n", + "In a neural network, pooling layers provide invariance to small translations of\n", + "the input. The most common kind of pooling is **max pooling**, which\n", + "consists in splitting the input in (usually non-overlapping) patches and\n", + "outputting the maximum value of each patch. Other kinds of pooling exist, e.g.,\n", + "mean or average pooling, which all share the same idea of aggregating the input\n", + "locally by applying a non-linearity to the content of some patches." + ] + }, + { + "cell_type": "markdown", + "id": "138b6d6a", + "metadata": { + "editable": true + }, + "source": [ + "## Pooling types ([From Raschka et al](https://github.com/rasbt/machine-learning-book))\n", + "\n", + "\n", + "\n", + "\n", + "

Figure 1: A deep CNN
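Max pooling itself takes only a few lines of NumPy. The sketch below splits the input into non-overlapping $2\times 2$ patches and keeps the maximum of each; the input values are invented, and the input dimensions are assumed divisible by the window size.

```python
import numpy as np

def max_pool2d(X, size=2):
    """Non-overlapping max pooling with a size x size window."""
    H, W = X.shape                       # assumes H and W are divisible by size
    blocks = X.reshape(H // size, size, W // size, size)
    return blocks.max(axis=(1, 3))

X = np.array([[ 1.0,  2.0,  5.0,  6.0],
              [ 3.0,  4.0,  7.0,  8.0],
              [ 9.0, 10.0, 13.0, 14.0],
              [11.0, 12.0, 15.0, 16.0]])
print(max_pool2d(X))
# [[ 4.  8.]
#  [12. 16.]]
```

Average pooling would simply replace the maximum by the mean over the same patches.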
    \n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "97123878", + "metadata": { + "editable": true + }, + "source": [ + "## Building convolutional neural networks in Tensorflow/Keras and PyTorch\n", + "\n", + "As discussed above, CNNs are neural networks built from the assumption\n", + "that the inputs to the network are 2D images. This is important\n", + "because the number of features or pixels in images grows very fast\n", + "with the image size, and an enormous number of weights and biases are\n", + "needed in order to build an accurate network. Next week we will\n", + "discuss in more detail how we can build a CNN using either TensorFlow\n", + "with Keras and PyTorch." + ] + } + ], + "metadata": {}, + "nbformat": 4, + "nbformat_minor": 5 +} \ No newline at end of file diff --git a/doc/LectureNotes/_build/jupyter_execute/week45.ipynb b/doc/LectureNotes/_build/jupyter_execute/week45.ipynb new file mode 100644 index 000000000..54d61a576 --- /dev/null +++ b/doc/LectureNotes/_build/jupyter_execute/week45.ipynb @@ -0,0 +1,2335 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "9686648f", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "45892517", + "metadata": { + "editable": true + }, + "source": [ + "# Week 45, Convolutional Neural Networks (CCNs)\n", + "**Morten Hjorth-Jensen**, Department of Physics, University of Oslo\n", + "\n", + "Date: **November 3-7, 2025**" + ] + }, + { + "cell_type": "markdown", + "id": "8449fbfd", + "metadata": { + "editable": true + }, + "source": [ + "## Plans for week 45\n", + "\n", + "**Material for the lecture on Monday November 3, 2025.**\n", + "\n", + "1. Convolutional Neural Networks, codes and examples (TensorFlow and Pytorch implementations)\n", + "\n", + "2. Readings and Videos:\n", + "\n", + "3. These lecture notes at \n", + "\n", + "4. Video of lecture at \n", + "\n", + "5. Whiteboard notes at \n", + "\n", + "6. For a more in depth discussion on CNNs we recommend Goodfellow et al chapters 9. See also chapter 11 and 12 on practicalities and applications \n", + "\n", + "7. Reading suggestions for implementation of CNNs, see Raschka et al chapters 14-15 at .\n", + "\n", + "\n", + "a. Video on Deep Learning at " + ] + }, + { + "cell_type": "markdown", + "id": "4ad8a4b2", + "metadata": { + "editable": true + }, + "source": [ + "## Material for the lab sessions\n", + "\n", + "Discussion of and work on project 2, no exercises this week, only project work" + ] + }, + { + "cell_type": "markdown", + "id": "48e99fbe", + "metadata": { + "editable": true + }, + "source": [ + "## Material for Lecture Monday November 3" + ] + }, + { + "cell_type": "markdown", + "id": "661e183c", + "metadata": { + "editable": true + }, + "source": [ + "## Convolutional Neural Networks (recognizing images), reminder from last week\n", + "\n", + "Convolutional neural networks (CNNs) were developed during the last\n", + "decade of the previous century, with a focus on character recognition\n", + "tasks. Nowadays, CNNs are a central element in the spectacular success\n", + "of deep learning methods. The success in for example image\n", + "classifications have made them a central tool for most machine\n", + "learning practitioners.\n", + "\n", + "CNNs are very similar to ordinary Neural Networks.\n", + "They are made up of neurons that have learnable weights and\n", + "biases. Each neuron receives some inputs, performs a dot product and\n", + "optionally follows it with a non-linearity. 
The whole network still\n", + "expresses a single differentiable score function: from the raw image\n", + "pixels on one end to class scores at the other. And they still have a\n", + "loss function (for example Softmax) on the last (fully-connected) layer\n", + "and all the tips/tricks we developed for learning regular Neural\n", + "Networks still apply (back propagation, gradient descent etc etc)." + ] + }, + { + "cell_type": "markdown", + "id": "96a38398", + "metadata": { + "editable": true + }, + "source": [ + "## What is the Difference\n", + "\n", + "**CNN architectures make the explicit assumption that\n", + "the inputs are images, which allows us to encode certain properties\n", + "into the architecture. These then make the forward function more\n", + "efficient to implement and vastly reduce the amount of parameters in\n", + "the network.**" + ] + }, + { + "cell_type": "markdown", + "id": "3ca522fb", + "metadata": { + "editable": true + }, + "source": [ + "## Neural Networks vs CNNs\n", + "\n", + "Neural networks are defined as **affine transformations**, that is \n", + "a vector is received as input and is multiplied with a matrix of so-called weights (our unknown paramters) to produce an\n", + "output (to which a bias vector is usually added before passing the result\n", + "through a nonlinear activation function). This is applicable to any type of input, be it an\n", + "image, a sound clip or an unordered collection of features: whatever their\n", + "dimensionality, their representation can always be flattened into a vector\n", + "before the transformation." + ] + }, + { + "cell_type": "markdown", + "id": "609aa156", + "metadata": { + "editable": true + }, + "source": [ + "## Why CNNS for images, sound files, medical images from CT scans etc?\n", + "\n", + "However, when we consider images, sound clips and many other similar kinds of data, these data have an intrinsic\n", + "structure. More formally, they share these important properties:\n", + "* They are stored as multi-dimensional arrays (think of the pixels of a figure) .\n", + "\n", + "* They feature one or more axes for which ordering matters (e.g., width and height axes for an image, time axis for a sound clip).\n", + "\n", + "* One axis, called the channel axis, is used to access different views of the data (e.g., the red, green and blue channels of a color image, or the left and right channels of a stereo audio track).\n", + "\n", + "These properties are not exploited when an affine transformation is applied; in\n", + "fact, all the axes are treated in the same way and the topological information\n", + "is not taken into account. Still, taking advantage of the implicit structure of\n", + "the data may prove very handy in solving some tasks, like computer vision and\n", + "speech recognition, and in these cases it would be best to preserve it. This is\n", + "where discrete convolutions come into play.\n", + "\n", + "A discrete convolution is a linear transformation that preserves this notion of\n", + "ordering. It is sparse (only a few input units contribute to a given output\n", + "unit) and reuses parameters (the same weights are applied to multiple locations\n", + "in the input)." 
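To make "sparse" and "reuses parameters" concrete, compare the number of trainable parameters in a fully connected layer with that of a single one-dimensional convolution filter. The sizes are the ones used in the sound-clip example further down; the filter length of three is chosen only for illustration.

```python
# Fully connected layer versus a single 1D convolution filter on a long input
n = 10**6        # input length (the sound-clip example below)
hidden = 10**4   # nodes in a dense hidden layer
kernel = 3       # length of one convolution filter (illustrative choice)

dense_params = n * hidden + hidden   # one weight per (input, node) pair + biases
conv_params = kernel + 1             # the same 3 weights reused everywhere + 1 bias
print(f"{dense_params:.1e} versus {conv_params}")   # 1.0e+10 versus 4
```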
+ ] + }, + { + "cell_type": "markdown", + "id": "c280e4de", + "metadata": { + "editable": true + }, + "source": [ + "## Regular NNs don’t scale well to full images\n", + "\n", + "As an example, consider\n", + "an image of size $32\\times 32\\times 3$ (32 wide, 32 high, 3 color channels), so a\n", + "single fully-connected neuron in a first hidden layer of a regular\n", + "Neural Network would have $32\\times 32\\times 3 = 3072$ weights. This amount still\n", + "seems manageable, but clearly this fully-connected structure does not\n", + "scale to larger images. For example, an image of more respectable\n", + "size, say $200\\times 200\\times 3$, would lead to neurons that have \n", + "$200\\times 200\\times 3 = 120,000$ weights. \n", + "\n", + "We could have\n", + "several such neurons, and the parameters would add up quickly! Clearly,\n", + "this full connectivity is wasteful and the huge number of parameters\n", + "would quickly lead to possible overfitting.\n", + "\n", + "\n", + "\n", + "\n", + "

Figure 1: A regular 3-layer Neural Network.
    \n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "0d86d50e", + "metadata": { + "editable": true + }, + "source": [ + "## 3D volumes of neurons\n", + "\n", + "Convolutional Neural Networks take advantage of the fact that the\n", + "input consists of images and they constrain the architecture in a more\n", + "sensible way. \n", + "\n", + "In particular, unlike a regular Neural Network, the\n", + "layers of a CNN have neurons arranged in 3 dimensions: width,\n", + "height, depth. (Note that the word depth here refers to the third\n", + "dimension of an activation volume, not to the depth of a full Neural\n", + "Network, which can refer to the total number of layers in a network.)\n", + "\n", + "To understand it better, the above example of an image \n", + "with an input volume of\n", + "activations has dimensions $32\\times 32\\times 3$ (width, height,\n", + "depth respectively). \n", + "\n", + "The neurons in a layer will\n", + "only be connected to a small region of the layer before it, instead of\n", + "all of the neurons in a fully-connected manner. Moreover, the final\n", + "output layer could for this specific image have dimensions $1\\times 1 \\times 10$, \n", + "because by the\n", + "end of the CNN architecture we will reduce the full image into a\n", + "single vector of class scores, arranged along the depth\n", + "dimension. \n", + "\n", + "\n", + "\n", + "\n", + "

Figure 1: A CNN arranges its neurons in three dimensions (width, height, depth), as visualized in one of the layers. Every layer of a CNN transforms the 3D input volume to a 3D output volume of neuron activations. In this example, the red input layer holds the image, so its width and height would be the dimensions of the image, and the depth would be 3 (Red, Green, Blue channels).
    \n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "93102a35", + "metadata": { + "editable": true + }, + "source": [ + "## More on Dimensionalities\n", + "\n", + "In fields like signal processing (and imaging as well), one designs\n", + "so-called filters. These filters are defined by the convolutions and\n", + "are often hand-crafted. One may specify filters for smoothing, edge\n", + "detection, frequency reshaping, and similar operations. However, with\n", + "neural networks the idea is to automatically learn the filters and use\n", + "many of them in conjunction with non-linear operations (activation\n", + "functions).\n", + "\n", + "As an example, consider a neural network operating on sound sequence\n", + "data. Assume that we have an input vector $\\boldsymbol{x}$ of length $d=10^6$. We\n", + "then construct a neural network with one hidden layer only, with\n", + "$10^4$ nodes. This means that we will have a weight matrix with\n", + "$10^4\\times 10^6=10^{10}$ weights to be determined, together with $10^4$ biases.\n", + "\n", + "Assume furthermore that we have an output layer which is meant to predict whether the sound sequence represents a human voice (true) or something else (false).\n", + "This means that we have only one output node. But since this output node connects to $10^4$ nodes in the hidden layer, there are in total $10^4$ weights to be determined for the output layer, plus one bias. In total we have" + ] + }, + { + "cell_type": "markdown", + "id": "b0e6ea33", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathrm{NumberParameters}=10^{10}+10^4+10^4+1 \\approx 10^{10},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "3fbba997", + "metadata": { + "editable": true + }, + "source": [ + "that is ten billion parameters to determine." + ] + }, + { + "cell_type": "markdown", + "id": "4be9d3e0", + "metadata": { + "editable": true + }, + "source": [ + "## Further remarks\n", + "\n", + "The main principles that justify convolutions are locality of\n", + "information and repetition of patterns within the signal. Sound samples\n", + "of the input in adjacent spots are much more likely to affect each\n", + "other than those that are very far away. Similarly, sounds are\n", + "repeated multiple times in the signal. While slightly simplistic,\n", + "reasoning about such a sound example demonstrates this. The same\n", + "principles then apply to images and other similar data." + ] + }, + { + "cell_type": "markdown", + "id": "b93711ab", + "metadata": { + "editable": true + }, + "source": [ + "## Layers used to build CNNs\n", + "\n", + "A simple CNN is a sequence of layers, and every layer of a CNN\n", + "transforms one volume of activations to another through a\n", + "differentiable function. We use three main types of layers to build\n", + "CNN architectures: Convolutional Layer, Pooling Layer, and\n", + "Fully-Connected Layer (exactly as seen in regular Neural Networks). We\n", + "will stack these layers to form a full CNN architecture.\n", + "\n", + "A simple CNN for image classification could have the architecture:\n", + "\n", + "* **INPUT** ($32\\times 32 \\times 3$) will hold the raw pixel values of the image, in this case an image of width 32, height 32, and with three color channels R,G,B.\n", + "\n", + "* **CONV** (convolutional) layer will compute the output of neurons that are connected to local regions in the input, each computing a dot product between their weights and a small region they are connected to in the input volume. 
This may result in volume such as $[32\\times 32\\times 12]$ if we decided to use 12 filters.\n", + "\n", + "* **RELU** layer will apply an elementwise activation function, such as the $max(0,x)$ thresholding at zero. This leaves the size of the volume unchanged ($[32\\times 32\\times 12]$).\n", + "\n", + "* **POOL** (pooling) layer will perform a downsampling operation along the spatial dimensions (width, height), resulting in volume such as $[16\\times 16\\times 12]$.\n", + "\n", + "* **FC** (i.e. fully-connected) layer will compute the class scores, resulting in volume of size $[1\\times 1\\times 10]$, where each of the 10 numbers correspond to a class score, such as among the 10 categories of the MNIST images we considered above . As with ordinary Neural Networks and as the name implies, each neuron in this layer will be connected to all the numbers in the previous volume." + ] + }, + { + "cell_type": "markdown", + "id": "df93de2c", + "metadata": { + "editable": true + }, + "source": [ + "## Transforming images\n", + "\n", + "CNNs transform the original image layer by layer from the original\n", + "pixel values to the final class scores. \n", + "\n", + "Observe that some layers contain\n", + "parameters and other don’t. In particular, the CNN layers perform\n", + "transformations that are a function of not only the activations in the\n", + "input volume, but also of the parameters (the weights and biases of\n", + "the neurons). On the other hand, the RELU/POOL layers will implement a\n", + "fixed function. The parameters in the CONV/FC layers will be trained\n", + "with gradient descent so that the class scores that the CNN computes\n", + "are consistent with the labels in the training set for each image." + ] + }, + { + "cell_type": "markdown", + "id": "35b469f8", + "metadata": { + "editable": true + }, + "source": [ + "## CNNs in brief\n", + "\n", + "In summary:\n", + "\n", + "* A CNN architecture is in the simplest case a list of Layers that transform the image volume into an output volume (e.g. holding the class scores)\n", + "\n", + "* There are a few distinct types of Layers (e.g. CONV/FC/RELU/POOL are by far the most popular)\n", + "\n", + "* Each Layer accepts an input 3D volume and transforms it to an output 3D volume through a differentiable function\n", + "\n", + "* Each Layer may or may not have parameters (e.g. CONV/FC do, RELU/POOL don’t)\n", + "\n", + "* Each Layer may or may not have additional hyperparameters (e.g. CONV/FC/POOL do, RELU doesn’t)" + ] + }, + { + "cell_type": "markdown", + "id": "f2bc243c", + "metadata": { + "editable": true + }, + "source": [ + "## A deep CNN model ([From Raschka et al](https://github.com/rasbt/machine-learning-book))\n", + "\n", + "\n", + "\n", + "\n", + "

Figure 1: A deep CNN
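The INPUT, CONV, RELU, POOL and FC stages listed above can be written down in a few lines of Keras. This is only a sketch of one possible architecture, with the layer sizes chosen to mirror the numbers in the list (12 filters, a $2\times 2$ pooling window, ten output classes; the $3\times 3$ kernel is an arbitrary choice); next week's material discusses such implementations, in both TensorFlow/Keras and PyTorch, in more detail.

```python
import tensorflow as tf
from tensorflow.keras import layers

# A minimal CONV -> RELU -> POOL -> FC stack for 32x32x3 images and 10 classes
model = tf.keras.Sequential([
    tf.keras.Input(shape=(32, 32, 3)),
    layers.Conv2D(12, kernel_size=3, padding="same", activation="relu"),  # CONV + RELU
    layers.MaxPooling2D(pool_size=2),                                     # POOL
    layers.Flatten(),
    layers.Dense(10, activation="softmax"),                               # FC class scores
])
model.summary()   # prints the output shapes and the number of trainable parameters
```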
    \n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "92956a26", + "metadata": { + "editable": true + }, + "source": [ + "## Key Idea\n", + "\n", + "A dense neural network is representd by an affine operation (like matrix-matrix multiplication) where all parameters are included.\n", + "\n", + "The key idea in CNNs for say imaging is that in images neighbor pixels tend to be related! So we connect\n", + "only neighboring neurons in the input instead of connecting all with the first hidden layer.\n", + "\n", + "We say we perform a filtering (convolution is the mathematical operation)." + ] + }, + { + "cell_type": "markdown", + "id": "b758f4ee", + "metadata": { + "editable": true + }, + "source": [ + "## Mathematics of CNNs\n", + "\n", + "The mathematics of CNNs is based on the mathematical operation of\n", + "**convolution**. In mathematics (in particular in functional analysis),\n", + "convolution is represented by mathematical operations (integration,\n", + "summation etc) on two functions in order to produce a third function\n", + "that expresses how the shape of one gets modified by the other.\n", + "Convolution has a plethora of applications in a variety of\n", + "disciplines, spanning from statistics to signal processing, computer\n", + "vision, solutions of differential equations,linear algebra,\n", + "engineering, and yes, machine learning.\n", + "\n", + "Mathematically, convolution is defined as follows (one-dimensional example):\n", + "Let us define a continuous function $y(t)$ given by" + ] + }, + { + "cell_type": "markdown", + "id": "9fa911b3", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y(t) = \\int x(a) w(t-a) da,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "918817a5", + "metadata": { + "editable": true + }, + "source": [ + "where $x(a)$ represents a so-called input and $w(t-a)$ is normally called the weight function or kernel.\n", + "\n", + "The above integral is written in a more compact form as" + ] + }, + { + "cell_type": "markdown", + "id": "d5538df6", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y(t) = \\left(x * w\\right)(t).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "d4a4e2bc", + "metadata": { + "editable": true + }, + "source": [ + "The discretized version reads" + ] + }, + { + "cell_type": "markdown", + "id": "68268e68", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y(t) = \\sum_{a=-\\infty}^{a=\\infty}x(a)w(t-a).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "198bcce9", + "metadata": { + "editable": true + }, + "source": [ + "Computing the inverse of the above convolution operations is known as deconvolution and the process is commutative.\n", + "\n", + "How can we use this? And what does it mean? Let us study some familiar examples first." + ] + }, + { + "cell_type": "markdown", + "id": "43b535c4", + "metadata": { + "editable": true + }, + "source": [ + "## Convolution Examples: Polynomial multiplication\n", + "\n", + "Our first example is that of a multiplication between two polynomials,\n", + "which we will rewrite in terms of the mathematics of convolution. 
In\n", + "the final stage, since the problem here is a discrete one, we will\n", + "recast the final expression in terms of a matrix-vector\n", + "multiplication, where the matrix is a so-called [Toeplitz matrix\n", + "](https://link.springer.com/book/10.1007/978-93-86279-04-0).\n", + "\n", + "Let us look a the following polynomials to second and third order, respectively:" + ] + }, + { + "cell_type": "markdown", + "id": "45bc8ffc", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "p(t) = \\alpha_0+\\alpha_1 t+\\alpha_2 t^2,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "2c42df04", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "08c139bf", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "s(t) = \\beta_0+\\beta_1 t+\\beta_2 t^2+\\beta_3 t^3.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "bf189420", + "metadata": { + "editable": true + }, + "source": [ + "The polynomial multiplication gives us a new polynomial of degree $5$" + ] + }, + { + "cell_type": "markdown", + "id": "7f5d7607", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "z(t) = \\delta_0+\\delta_1 t+\\delta_2 t^2+\\delta_3 t^3+\\delta_4 t^4+\\delta_5 t^5.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "a2f47e64", + "metadata": { + "editable": true + }, + "source": [ + "## Efficient Polynomial Multiplication\n", + "\n", + "Computing polynomial products can be implemented efficiently if we rewrite the more brute force multiplications using convolution.\n", + "We note first that the new coefficients are given as" + ] + }, + { + "cell_type": "markdown", + "id": "7890aee8", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{split}\n", + "\\delta_0=&\\alpha_0\\beta_0\\\\\n", + "\\delta_1=&\\alpha_1\\beta_0+\\alpha_0\\beta_1\\\\\n", + "\\delta_2=&\\alpha_0\\beta_2+\\alpha_1\\beta_1+\\alpha_2\\beta_0\\\\\n", + "\\delta_3=&\\alpha_1\\beta_2+\\alpha_2\\beta_1+\\alpha_0\\beta_3\\\\\n", + "\\delta_4=&\\alpha_2\\beta_2+\\alpha_1\\beta_3\\\\\n", + "\\delta_5=&\\alpha_2\\beta_3.\\\\\n", + "\\end{split}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "6a03a3eb", + "metadata": { + "editable": true + }, + "source": [ + "We note that $\\alpha_i=0$ except for $i\\in \\left\\{0,1,2\\right\\}$ and $\\beta_i=0$ except for $i\\in\\left\\{0,1,2,3\\right\\}$.\n", + "\n", + "We can then rewrite the coefficients $\\delta_j$ using a discrete convolution as" + ] + }, + { + "cell_type": "markdown", + "id": "b49e404f", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_j = \\sum_{i=-\\infty}^{i=\\infty}\\alpha_i\\beta_{j-i}=(\\alpha * \\beta)_j,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "4ef5b061", + "metadata": { + "editable": true + }, + "source": [ + "or as a double sum with restriction $l=i+j$" + ] + }, + { + "cell_type": "markdown", + "id": "61685a6c", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_l = \\sum_{ij}\\alpha_i\\beta_{j}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "7ced5341", + "metadata": { + "editable": true + }, + "source": [ + "## Further simplification\n", + "\n", + "Although we may have redundant operations with some few zeros for $\\beta_i$, we can rewrite the above sum in a more compact way as" + ] + }, + { + "cell_type": "markdown", + "id": "3d00697e", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_i = 
\\sum_{k=0}^{k=m-1}\\alpha_k\\beta_{i-k},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "22837be3", + "metadata": { + "editable": true + }, + "source": [ + "where $m=3$ in our case, the maximum length of\n", + "the vector $\\alpha$. Note that the vector $\\boldsymbol{\\beta}$ has length $n=4$. Below we will find an even more efficient representation." + ] + }, + { + "cell_type": "markdown", + "id": "1603a086", + "metadata": { + "editable": true + }, + "source": [ + "## A more efficient way of coding the above Convolution\n", + "\n", + "Since we only have a finite number of $\\alpha$ and $\\beta$ values\n", + "which are non-zero, we can rewrite the above convolution expressions\n", + "as a matrix-vector multiplication" + ] + }, + { + "cell_type": "markdown", + "id": "340acf5c", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{\\delta}=\\begin{bmatrix}\\alpha_0 & 0 & 0 & 0 \\\\\n", + " \\alpha_1 & \\alpha_0 & 0 & 0 \\\\\n", + "\t\t\t \\alpha_2 & \\alpha_1 & \\alpha_0 & 0 \\\\\n", + "\t\t\t 0 & \\alpha_2 & \\alpha_1 & \\alpha_0 \\\\\n", + "\t\t\t 0 & 0 & \\alpha_2 & \\alpha_1 \\\\\n", + "\t\t\t 0 & 0 & 0 & \\alpha_2\n", + "\t\t\t \\end{bmatrix}\\begin{bmatrix} \\beta_0 \\\\ \\beta_1 \\\\ \\beta_2 \\\\ \\beta_3\\end{bmatrix}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "cdc8d513", + "metadata": { + "editable": true + }, + "source": [ + "## Commutative process\n", + "\n", + "The process is commutative and we can easily see that we can rewrite the multiplication in terms of a matrix holding $\\beta$ and a vector holding $\\alpha$.\n", + "In this case we have" + ] + }, + { + "cell_type": "markdown", + "id": "51e1f3d8", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{\\delta}=\\begin{bmatrix}\\beta_0 & 0 & 0 \\\\\n", + " \\beta_1 & \\beta_0 & 0 \\\\\n", + "\t\t\t \\beta_2 & \\beta_1 & \\beta_0 \\\\\n", + "\t\t\t \\beta_3 & \\beta_2 & \\beta_1 \\\\\n", + "\t\t\t 0 & \\beta_3 & \\beta_2 \\\\\n", + "\t\t\t 0 & 0 & \\beta_3\n", + "\t\t\t \\end{bmatrix}\\begin{bmatrix} \\alpha_0 \\\\ \\alpha_1 \\\\ \\alpha_2\\end{bmatrix}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "ce936f65", + "metadata": { + "editable": true + }, + "source": [ + "Note that the use of these matrices is for mathematical purposes only\n", + "and not implementation purposes. When implementing the above equation\n", + "we do not encode (and allocate memory) the matrices explicitely. We\n", + "rather code the convolutions in the minimal memory footprint that they\n", + "require." + ] + }, + { + "cell_type": "markdown", + "id": "c93a683f", + "metadata": { + "editable": true + }, + "source": [ + "## Toeplitz matrices\n", + "\n", + "The above matrices are examples of so-called [Toeplitz\n", + "matrices](https://link.springer.com/book/10.1007/978-93-86279-04-0). A\n", + "Toeplitz matrix is a matrix in which each descending diagonal from\n", + "left to right is constant. 
For instance the last matrix, which we\n", + "rewrite as" + ] + }, + { + "cell_type": "markdown", + "id": "1e3cffca", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{A}=\\begin{bmatrix}a_0 & 0 & 0 \\\\\n", + " a_1 & a_0 & 0 \\\\\n", + "\t\t\t a_2 & a_1 & a_0 \\\\\n", + "\t\t\t a_3 & a_2 & a_1 \\\\\n", + "\t\t\t 0 & a_3 & a_2 \\\\\n", + "\t\t\t 0 & 0 & a_3\n", + "\t\t\t \\end{bmatrix},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "e27270d9", + "metadata": { + "editable": true + }, + "source": [ + "with elements $a_{ii}=a_{i+1,j+1}=a_{i-j}$ is an example of a Toeplitz\n", + "matrix. Such a matrix does not need to be a square matrix. Toeplitz\n", + "matrices are also closely connected with Fourier series, because the multiplication operator by a trigonometric\n", + "polynomial, compressed to a finite-dimensional space, can be\n", + "represented by such a matrix. The example above shows that we can\n", + "represent linear convolution as multiplication of a Toeplitz matrix by\n", + "a vector." + ] + }, + { + "cell_type": "markdown", + "id": "125ef645", + "metadata": { + "editable": true + }, + "source": [ + "## Fourier series and Toeplitz matrices\n", + "\n", + "This is an active and ogoing research area concerning CNNs. The following articles may be of interest\n", + "1. [Read more about the convolution theorem and Fouriers series](https://www.sciencedirect.com/topics/engineering/convolution-theorem#:~:text=The%20convolution%20theorem%20(together%20with,k%20)%20G%20(%20k%20)%20.)\n", + "\n", + "2. [Fourier Transform Layer](https://www.sciencedirect.com/science/article/pii/S1568494623006257)" + ] + }, + { + "cell_type": "markdown", + "id": "d13ab1e4", + "metadata": { + "editable": true + }, + "source": [ + "## Generalizing the above one-dimensional case\n", + "\n", + "In order to align the above simple case with the more general\n", + "convolution cases, we rename $\\boldsymbol{\\alpha}$, whose length is $m=3$,\n", + "with $\\boldsymbol{w}$. We will interpret $\\boldsymbol{w}$ as a weight/filter function\n", + "with which we want to perform the convolution with an input variable\n", + "$\\boldsymbol{x}$ of length $n$. We will assume always that the filter\n", + "$\\boldsymbol{w}$ has dimensionality $m \\le n$.\n", + "\n", + "We replace thus $\\boldsymbol{\\beta}$ with $\\boldsymbol{x}$ and $\\boldsymbol{\\delta}$ with $\\boldsymbol{y}$ and have" + ] + }, + { + "cell_type": "markdown", + "id": "b9eb4b1e", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y(i)= \\left(x*w\\right)(i)= \\sum_{k=0}^{k=m-1}w(k)x(i-k),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "bdf0893f", + "metadata": { + "editable": true + }, + "source": [ + "where $m=3$ in our case, the maximum length of the vector $\\boldsymbol{w}$.\n", + "Here the symbol $*$ represents the mathematical operation of convolution." + ] + }, + { + "cell_type": "markdown", + "id": "64cd5dbb", + "metadata": { + "editable": true + }, + "source": [ + "## Memory considerations\n", + "\n", + "This expression leaves us however with some terms with negative\n", + "indices, for example $x(-1)$ and $x(-2)$ which may not be defined. 
Our\n", + "vector $\\boldsymbol{x}$ has components $x(0)$, $x(1)$, $x(2)$ and $x(3)$.\n", + "\n", + "The index $j$ for $\\boldsymbol{x}$ runs from $j=0$ to $j=3$ since $\\boldsymbol{x}$ is meant to\n", + "represent a third-order polynomial.\n", + "\n", + "Furthermore, the index $i$ runs from $i=0$ to $i=5$ since $\\boldsymbol{y}$\n", + "contains the coefficients of a fifth-order polynomial. When $i=5$ we\n", + "may also have values of $x(4)$ and $x(5)$ which are not defined." + ] + }, + { + "cell_type": "markdown", + "id": "20fa0219", + "metadata": { + "editable": true + }, + "source": [ + "## Padding\n", + "\n", + "The solution to this is what is called **padding**! We simply define a\n", + "new vector $x$ with two added elements set to zero before $x(0)$ and\n", + "two new elements after $x(3)$ set to zero. That is, we augment the\n", + "length of $\\boldsymbol{x}$ from $n=4$ to $n+2P=8$, where $P=2$ is the padding\n", + "constant (a new hyperparameter), see discussions below as well." + ] + }, + { + "cell_type": "markdown", + "id": "d24c7e69", + "metadata": { + "editable": true + }, + "source": [ + "## New vector\n", + "\n", + "We have a new vector defined as $x(0)=0$, $x(1)=0$,\n", + "$x(2)=\\beta_0$, $x(3)=\\beta_1$, $x(4)=\\beta_2$, $x(5)=\\beta_3$,\n", + "$x(6)=0$, and $x(7)=0$.\n", + "\n", + "We have added four new elements, which\n", + "are all zero. The benefit is that we can rewrite the equation for\n", + "$\\boldsymbol{y}$, with $i=0,1,\\dots,5$," + ] + }, + { + "cell_type": "markdown", + "id": "c00151a8", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y(i) = \\sum_{k=0}^{k=m-1}w(k)x(i+(m-1)-k).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "c9b39bfd", + "metadata": { + "editable": true + }, + "source": [ + "As an example, we have" + ] + }, + { + "cell_type": "markdown", + "id": "53de5ac4", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y(4)=x(6)w(0)+x(5)w(1)+x(4)w(2)=0\\times \\alpha_0+\\beta_3\\alpha_1+\\beta_2\\alpha_2,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "e1025d77", + "metadata": { + "editable": true + }, + "source": [ + "as before except that we have an additional term $x(6)w(0)$, which is zero.\n", + "\n", + "Similarly, for the fifth-order term we have" + ] + }, + { + "cell_type": "markdown", + "id": "34a5a413", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y(5)=x(7)w(0)+x(6)w(1)+x(5)w(2)=0\\times \\alpha_0+0\\times\\alpha_1+\\beta_3\\alpha_2.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "5ef38242", + "metadata": { + "editable": true + }, + "source": [ + "The zeroth-order term is" + ] + }, + { + "cell_type": "markdown", + "id": "42a8bd2e", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y(0)=x(2)w(0)+x(1)w(1)+x(0)w(2)=\\beta_0 \\alpha_0+0\\times\\alpha_1+0\\times\\alpha_2=\\alpha_0\\beta_0.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "2580d624", + "metadata": { + "editable": true + }, + "source": [ + "## Rewriting as dot products\n", + "\n", + "If we now flip the filter/weight vector, with the following term as a typical example" + ] + }, + { + "cell_type": "markdown", + "id": "76157e3c", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y(0)=x(2)w(0)+x(1)w(1)+x(0)w(2)=x(2)\\tilde{w}(2)+x(1)\\tilde{w}(1)+x(0)\\tilde{w}(0),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "a47c0bbf", + "metadata": { + "editable": true + }, + "source": [ + "with $\\tilde{w}(0)=w(2)$, $\\tilde{w}(1)=w(1)$, and 
$\\tilde{w}(2)=w(0)$, we can then rewrite the above sum as a dot product of\n", + "$x(i:i+(m-1))\\tilde{w}$ for element $y(i)$, where $x(i:i+(m-1))$ is simply a patch of $\\boldsymbol{x}$ of size $m-1$.\n", + "\n", + "The padding $P$ we have introduced for the convolution stage is just\n", + "another hyperparameter which is introduced as part of the\n", + "architecture. Similarly, below we will also introduce another\n", + "hyperparameter called **Stride** $S$." + ] + }, + { + "cell_type": "markdown", + "id": "4de2c235", + "metadata": { + "editable": true + }, + "source": [ + "## Cross correlation\n", + "\n", + "In essentially all applications one uses what is called cross correlation instead of the standard convolution described above.\n", + "This means that multiplication is performed in the same direction and instead of the general expression we have discussed above (with infinite sums)" + ] + }, + { + "cell_type": "markdown", + "id": "33319954", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y(i) = \\sum_{k=-\\infty}^{k=\\infty}w(k)x(i-k),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "d46fb216", + "metadata": { + "editable": true + }, + "source": [ + "we have now" + ] + }, + { + "cell_type": "markdown", + "id": "1125a773", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y(i) = \\sum_{k=-\\infty}^{k=\\infty}w(k)x(i+k).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "4e9ea645", + "metadata": { + "editable": true + }, + "source": [ + "Both TensorFlow and PyTorch (as well as our own code example below),\n", + "implement the last equation, although it is normally referred to as\n", + "convolution. The same padding rules and stride rules discussed below\n", + "apply to this expression as well.\n", + "\n", + "We leave it as an exercise for you to convince yourself that the example we have discussed till now, gives the same final expression using the last expression." + ] + }, + { + "cell_type": "markdown", + "id": "711fc589", + "metadata": { + "editable": true + }, + "source": [ + "## Two-dimensional objects\n", + "\n", + "We are now ready to start studying the discrete convolutions relevant for convolutional neural networks.\n", + "We often use convolutions over more than one dimension at a time. If\n", + "we have a two-dimensional image $X$ as input, we can have a **filter**\n", + "defined by a two-dimensional **kernel/weight/filter** $W$. 
This leads to an output $Y$" + ] + }, + { + "cell_type": "markdown", + "id": "ea93186d", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "Y(i,j)=(X * W)(i,j) = \\sum_m\\sum_n X(m,n)W(i-m,j-n).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "2ce72e4f", + "metadata": { + "editable": true + }, + "source": [ + "Convolution is a commutative process, which means we can rewrite this equation as" + ] + }, + { + "cell_type": "markdown", + "id": "7c891889", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "Y(i,j)=(X * W)(i,j) = \\sum_m\\sum_n X(i-m,j-n)W(m,n).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "337f70a6", + "metadata": { + "editable": true + }, + "source": [ + "Normally the latter is more straightforward to implement in a machine\n", + "larning library since there is less variation in the range of values\n", + "of $m$ and $n$.\n", + "\n", + "As mentioned above, most deep learning libraries implement\n", + "cross-correlation instead of convolution (although it is referred to as\n", + "convolution)" + ] + }, + { + "cell_type": "markdown", + "id": "aa0e3c87", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "Y(i,j)=(X * W)(i,j) = \\sum_m\\sum_n X(i+m,j+n)W(m,n).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "77113b34", + "metadata": { + "editable": true + }, + "source": [ + "## CNNs in more detail, simple example\n", + "\n", + "Let assume we have an input matrix $X$ of dimensionality $3\\times 3$\n", + "and a $2\\times 2$ filter $W$ given by the following matrices" + ] + }, + { + "cell_type": "markdown", + "id": "d54278c7", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{X}=\\begin{bmatrix}x_{00} & x_{01} & x_{02} \\\\\n", + " x_{10} & x_{11} & x_{12} \\\\\n", + "\t x_{20} & x_{21} & x_{22} \\end{bmatrix},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "597d1ef3", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "c544ba40", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{W}=\\begin{bmatrix}w_{00} & w_{01} \\\\\n", + "\t w_{10} & w_{11}\\end{bmatrix}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b6c1b40b", + "metadata": { + "editable": true + }, + "source": [ + "We introduce now the hyperparameter $S$ **stride**. Stride represents how the filter $W$ moves the convolution process on the matrix $X$.\n", + "We strongly recommend the repository on [Arithmetic of deep learning by Dumoulin and Visin](https://github.com/vdumoulin/conv_arithmetic) \n", + "\n", + "Here we set the stride equal to $S=1$, which means that, starting with the element $x_{00}$, the filter will act on $2\\times 2$ submatrices each time, starting with the upper corner and moving according to the stride value column by column. 
\n", + "\n", + "Here we perform the operation" + ] + }, + { + "cell_type": "markdown", + "id": "d8ee5cf0", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "Y_(i,j)=(X * W)(i,j) = \\sum_m\\sum_n X(i-m,j-n)W(m,n),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "5df35204", + "metadata": { + "editable": true + }, + "source": [ + "and obtain" + ] + }, + { + "cell_type": "markdown", + "id": "afe8a3ab", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{Y}=\\begin{bmatrix}x_{00}w_{00}+x_{01}w_{01}+x_{10}w_{10}+x_{11}w_{11} & x_{01}w_{00}+x_{02}w_{01}+x_{11}w_{10}+x_{12}w_{11} \\\\\n", + "\t x_{10}w_{00}+x_{11}w_{01}+x_{20}w_{10}+x_{21}w_{11} & x_{11}w_{00}+x_{12}w_{01}+x_{21}w_{10}+x_{22}w_{11}\\end{bmatrix}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "9a1c6848", + "metadata": { + "editable": true + }, + "source": [ + "We can rewrite this operation in terms of a matrix-vector multiplication by defining a new vector where we flatten out the inputs as a vector $\\boldsymbol{X}'$ of length $9$ and\n", + "a matrix $\\boldsymbol{W}'$ with dimension $4\\times 9$ as" + ] + }, + { + "cell_type": "markdown", + "id": "4506234a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{X}'=\\begin{bmatrix}x_{00} \\\\ x_{01} \\\\ x_{02} \\\\ x_{10} \\\\ x_{11} \\\\ x_{12} \\\\ x_{20} \\\\ x_{21} \\\\ x_{22} \\end{bmatrix},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "f1b2fef4", + "metadata": { + "editable": true + }, + "source": [ + "and the new matrix" + ] + }, + { + "cell_type": "markdown", + "id": "6c372fa6", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{W}'=\\begin{bmatrix} w_{00} & w_{01} & 0 & w_{10} & w_{11} & 0 & 0 & 0 & 0 \\\\\n", + " 0 & w_{00} & w_{01} & 0 & w_{10} & w_{11} & 0 & 0 & 0 \\\\\n", + "\t\t\t0 & 0 & 0 & w_{00} & w_{01} & 0 & w_{10} & w_{11} & 0 \\\\\n", + " 0 & 0 & 0 & 0 & w_{00} & w_{01} & 0 & w_{10} & w_{11}\\end{bmatrix}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "61ad1cf3", + "metadata": { + "editable": true + }, + "source": [ + "We see easily that performing the matrix-vector multiplication $\\boldsymbol{W}'\\boldsymbol{X}'$ is the same as the above convolution with stride $S=1$, that is" + ] + }, + { + "cell_type": "markdown", + "id": "a18a70a2", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "Y=(\\boldsymbol{W}*\\boldsymbol{X}),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b63a1613", + "metadata": { + "editable": true + }, + "source": [ + "is now given by $\\boldsymbol{W}'\\boldsymbol{X}'$ which is a vector of length $4$ instead of the originally resulting $2\\times 2$ output matrix." + ] + }, + { + "cell_type": "markdown", + "id": "8fa9fe57", + "metadata": { + "editable": true + }, + "source": [ + "## The convolution stage\n", + "\n", + "The convolution stage, where we apply different filters $\\boldsymbol{W}$ in\n", + "order to reduce the dimensionality of an image, adds, in addition to\n", + "the weights and biases (to be trained by the back propagation\n", + "algorithm) that define the filters, two new hyperparameters, the so-called\n", + "**padding** $P$ and the stride $S$." + ] + }, + { + "cell_type": "markdown", + "id": "a30b6ced", + "metadata": { + "editable": true + }, + "source": [ + "## Finding the number of parameters\n", + "\n", + "In the above example we have an input matrix of dimension $3\\times\n", + "3$. 
In general we call the input for an input volume and it is defined\n", + "by its width $H_1$, height $H_1$ and depth $D_1$. If we have the\n", + "standard three color channels $D_1=3$.\n", + "\n", + "The above example has $W_1=H_1=3$ and $D_1=1$.\n", + "\n", + "When we introduce the filter we have the following additional hyperparameters\n", + "1. $K$ the number of filters. It is common to perform the convolution of the input several times since by experience shrinking the input too fast does not work well\n", + "\n", + "2. $F$ as the filter's spatial extent\n", + "\n", + "3. $S$ as the stride parameter\n", + "\n", + "4. $P$ as the padding parameter\n", + "\n", + "These parameters are defined by the architecture of the network and are not included in the training." + ] + }, + { + "cell_type": "markdown", + "id": "b38d040f", + "metadata": { + "editable": true + }, + "source": [ + "## New image (or volume)\n", + "\n", + "Acting with the filter on the input volume produces an output volume\n", + "which is defined by its width $W_2$, its height $H_2$ and its depth\n", + "$D_2$.\n", + "\n", + "These are defined by the following relations" + ] + }, + { + "cell_type": "markdown", + "id": "3b090ce0", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "W_2 = \\frac{(W_1-F+2P)}{S}+1,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "52fa4212", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "H_2 = \\frac{(H_1-F+2P)}{S}+1,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "dfa9a926", + "metadata": { + "editable": true + }, + "source": [ + "and $D_2=K$." + ] + }, + { + "cell_type": "markdown", + "id": "9bb02c26", + "metadata": { + "editable": true + }, + "source": [ + "## Parameters to train, common settings\n", + "\n", + "With parameter sharing, the convolution involves thus for each filter $F\\times F\\times D_1$ weights plus one bias parameter.\n", + "\n", + "In total we have" + ] + }, + { + "cell_type": "markdown", + "id": "d98e6808", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\left(F\\times F\\times D_1)\\right) \\times K+(K\\mathrm{--biases}),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "601ecd16", + "metadata": { + "editable": true + }, + "source": [ + "parameters to train by back propagation.\n", + "\n", + "It is common to let $K$ come in powers of $2$, that is $32$, $64$, $128$ etc.\n", + "\n", + "**Common settings.**\n", + "\n", + "1. $\\begin{array}{c} F=3 & S=1 & P=1 \\end{array}$\n", + "\n", + "2. $\\begin{array}{c} F=5 & S=1 & P=2 \\end{array}$\n", + "\n", + "3. $\\begin{array}{c} F=5 & S=2 & P=\\mathrm{open} \\end{array}$\n", + "\n", + "4. $\\begin{array}{c} F=1 & S=1 & P=0 \\end{array}$" + ] + }, + { + "cell_type": "markdown", + "id": "3f87e148", + "metadata": { + "editable": true + }, + "source": [ + "## Examples of CNN setups\n", + "\n", + "Let us assume we have an input volume $V$ given by an image of dimensionality\n", + "$32\\times 32 \\times 3$, that is three color channels and $32\\times 32$ pixels.\n", + "\n", + "We apply a filter of dimension $5\\times 5$ ten times with stride $S=1$ and padding $P=0$.\n", + "\n", + "The output volume is given by $(32-5)/1+1=28$, resulting in ten images\n", + "of dimensionality $28\\times 28\\times 3$.\n", + "\n", + "The total number of parameters to train for each filter is then\n", + "$5\\times 5\\times 3+1$, where the last parameter is the bias. 
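+    "\n",
+    "These relations are easy to verify with a small helper (a sketch; the function names are our own).\n",
+    "\n",
+    "```python\n",
+    "def output_width(W1, F, P, S):\n",
+    "    # W2 = (W1 - F + 2P)/S + 1, assuming the division is exact\n",
+    "    return (W1 - F + 2 * P) // S + 1\n",
+    "\n",
+    "def conv_parameters(F, D1, K):\n",
+    "    # F*F*D1 weights plus one bias per filter, K filters in total\n",
+    "    return (F * F * D1 + 1) * K\n",
+    "\n",
+    "print(output_width(32, 5, 0, 1))    # 28\n",
+    "print(conv_parameters(5, 3, 10))    # 760\n",
+    "```\n",
+    "\n",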
This\n", + "gives us $76$ parameters for each filter, leading to a total of $760$\n", + "parameters for the ten filters.\n", + "\n", + "How many parameters will a filter of dimensionality $3\\times 3$\n", + "(adding color channels) result in if we produce $32$ new images? Use $S=1$ and $P=0$.\n", + "\n", + "Note that strides constitute a form of **subsampling**. As an alternative to\n", + "being interpreted as a measure of how much the kernel/filter is translated, strides\n", + "can also be viewed as how much of the output is retained. For instance, moving\n", + "the kernel by hops of two is equivalent to moving the kernel by hops of one but\n", + "retaining only odd output elements." + ] + }, + { + "cell_type": "markdown", + "id": "45526eae", + "metadata": { + "editable": true + }, + "source": [ + "## Summarizing: Performing a general discrete convolution ([From Raschka et al](https://github.com/rasbt/machine-learning-book))\n", + "\n", + "\n", + "\n", + "\n", + "

    Figure 1: A deep CNN

    \n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "963177d2", + "metadata": { + "editable": true + }, + "source": [ + "## Pooling\n", + "\n", + "In addition to discrete convolutions themselves, **pooling** operations\n", + "make up another important building block in CNNs. Pooling operations reduce\n", + "the size of feature maps by using some function to summarize subregions, such\n", + "as taking the average or the maximum value.\n", + "\n", + "Pooling works by sliding a window across the input and feeding the content of\n", + "the window to a **pooling function**. In some sense, pooling works very much\n", + "like a discrete convolution, but replaces the linear combination described by\n", + "the kernel with some other function." + ] + }, + { + "cell_type": "markdown", + "id": "f657465b", + "metadata": { + "editable": true + }, + "source": [ + "## Pooling arithmetic\n", + "\n", + "In a neural network, pooling layers provide invariance to small translations of\n", + "the input. The most common kind of pooling is **max pooling**, which\n", + "consists in splitting the input in (usually non-overlapping) patches and\n", + "outputting the maximum value of each patch. Other kinds of pooling exist, e.g.,\n", + "mean or average pooling, which all share the same idea of aggregating the input\n", + "locally by applying a non-linearity to the content of some patches." + ] + }, + { + "cell_type": "markdown", + "id": "33142d01", + "metadata": { + "editable": true + }, + "source": [ + "## Pooling types ([From Raschka et al](https://github.com/rasbt/machine-learning-book))\n", + "\n", + "\n", + "\n", + "\n", + "

    Figure 2: Pooling types

    \n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "7e8ee265", + "metadata": { + "editable": true + }, + "source": [ + "## Building convolutional neural networks using Tensorflow and Keras\n", + "\n", + "As discussed above, CNNs are neural networks built from the assumption that the inputs\n", + "to the network are 2D images. This is important because the number of features or pixels in images\n", + "grows very fast with the image size, and an enormous number of weights and biases are needed in order to build an accurate network. \n", + "\n", + "As before, we still have our input, a hidden layer and an output. What's novel about convolutional networks\n", + "are the **convolutional** and **pooling** layers stacked in pairs between the input and the hidden layer.\n", + "In addition, the data is no longer represented as a 2D feature matrix, instead each input is a number of 2D\n", + "matrices, typically 1 for each color dimension (Red, Green, Blue)." + ] + }, + { + "cell_type": "markdown", + "id": "c4e2bc6f", + "metadata": { + "editable": true + }, + "source": [ + "## Setting it up\n", + "\n", + "It means that to represent the entire\n", + "dataset of images, we require a 4D matrix or **tensor**. This tensor has the dimensions:" + ] + }, + { + "cell_type": "markdown", + "id": "f8d6e5be", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "(n_{inputs},\\, n_{pixels, width},\\, n_{pixels, height},\\, depth) .\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "bd170ded", + "metadata": { + "editable": true + }, + "source": [ + "## The MNIST dataset again\n", + "\n", + "The MNIST dataset consists of grayscale images with a pixel size of\n", + "$28\\times 28$, meaning we require $28 \\times 28 = 724$ weights to each\n", + "neuron in the first hidden layer.\n", + "\n", + "If we were to analyze images of size $128\\times 128$ we would require\n", + "$128 \\times 128 = 16384$ weights to each neuron. Even worse if we were\n", + "dealing with color images, as most images are, we have an image matrix\n", + "of size $128\\times 128$ for each color dimension (Red, Green, Blue),\n", + "meaning 3 times the number of weights $= 49152$ are required for every\n", + "single neuron in the first hidden layer." + ] + }, + { + "cell_type": "markdown", + "id": "5f8a4322", + "metadata": { + "editable": true + }, + "source": [ + "## Strong correlations\n", + "\n", + "Images typically have strong local correlations, meaning that a small\n", + "part of the image varies little from its neighboring regions. If for\n", + "example we have an image of a blue car, we can roughly assume that a\n", + "small blue part of the image is surrounded by other blue regions.\n", + "\n", + "Therefore, instead of connecting every single pixel to a neuron in the\n", + "first hidden layer, as we have previously done with deep neural\n", + "networks, we can instead connect each neuron to a small part of the\n", + "image (in all 3 RGB depth dimensions). The size of each small area is\n", + "fixed, and known as a [receptive](https://en.wikipedia.org/wiki/Receptive_field)." + ] + }, + { + "cell_type": "markdown", + "id": "bad994c1", + "metadata": { + "editable": true + }, + "source": [ + "## Layers of a CNN\n", + "\n", + "The layers of a convolutional neural network arrange neurons in 3D: width, height and depth. \n", + "The input image is typically a square matrix of depth 3. \n", + "\n", + "A **convolution** is performed on the image which outputs\n", + "a 3D volume of neurons. 
The weights to the input are arranged in a number of 2D matrices, known as **filters**.\n", + "\n", + "Each filter slides along the input image, taking the dot product\n", + "between each small part of the image and the filter, in all depth\n", + "dimensions. This is then passed through a non-linear function,\n", + "typically the **Rectified Linear (ReLu)** function, which serves as the\n", + "activation of the neurons in the first convolutional layer. This is\n", + "further passed through a **pooling layer**, which reduces the size of the\n", + "convolutional layer, e.g. by taking the maximum or average across some\n", + "small regions, and this serves as input to the next convolutional\n", + "layer." + ] + }, + { + "cell_type": "markdown", + "id": "3f9bf131", + "metadata": { + "editable": true + }, + "source": [ + "## Systematic reduction\n", + "\n", + "By systematically reducing the size of the input volume, through\n", + "convolution and pooling, the network should create representations of\n", + "small parts of the input, and then from them assemble representations\n", + "of larger areas. The final pooling layer is flattened to serve as\n", + "input to a hidden layer, such that each neuron in the final pooling\n", + "layer is connected to every single neuron in the hidden layer. This\n", + "then serves as input to the output layer, e.g. a softmax output for\n", + "classification." + ] + }, + { + "cell_type": "markdown", + "id": "625ace40", + "metadata": { + "editable": true + }, + "source": [ + "## Prerequisites: Collect and pre-process data" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "a3f06a64", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "%matplotlib inline\n", + "\n", + "# import necessary packages\n", + "import numpy as np\n", + "import matplotlib.pyplot as plt\n", + "from sklearn import datasets\n", + "\n", + "\n", + "# ensure the same random numbers appear every time\n", + "np.random.seed(0)\n", + "\n", + "# display images in notebook\n", + "%matplotlib inline\n", + "plt.rcParams['figure.figsize'] = (12,12)\n", + "\n", + "\n", + "# download MNIST dataset\n", + "digits = datasets.load_digits()\n", + "\n", + "# define inputs and labels\n", + "inputs = digits.images\n", + "labels = digits.target\n", + "\n", + "# RGB images have a depth of 3\n", + "# our images are grayscale so they should have a depth of 1\n", + "inputs = inputs[:,:,:,np.newaxis]\n", + "\n", + "print(\"inputs = (n_inputs, pixel_width, pixel_height, depth) = \" + str(inputs.shape))\n", + "print(\"labels = (n_inputs) = \" + str(labels.shape))\n", + "\n", + "\n", + "# choose some random images to display\n", + "n_inputs = len(inputs)\n", + "indices = np.arange(n_inputs)\n", + "random_indices = np.random.choice(indices, size=5)\n", + "\n", + "for i, image in enumerate(digits.images[random_indices]):\n", + " plt.subplot(1, 5, i+1)\n", + " plt.axis('off')\n", + " plt.imshow(image, cmap=plt.cm.gray_r, interpolation='nearest')\n", + " plt.title(\"Label: %d\" % digits.target[random_indices[i]])\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "764e7143", + "metadata": { + "editable": true + }, + "source": [ + "## Importing Keras and Tensorflow" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "1b8fd15a", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "from tensorflow.keras import datasets, layers, models\n", + "from tensorflow.keras.layers import Input\n", + "from 
tensorflow.keras.models import Sequential #This allows appending layers to existing models\n", + "from tensorflow.keras.layers import Dense #This allows defining the characteristics of a particular layer\n", + "from tensorflow.keras import optimizers #This allows using whichever optimiser we want (sgd,adam,RMSprop)\n", + "from tensorflow.keras import regularizers #This allows using whichever regularizer we want (l1,l2,l1_l2)\n", + "from tensorflow.keras.utils import to_categorical #This allows using categorical cross entropy as the cost function\n", + "#from tensorflow.keras import Conv2D\n", + "#from tensorflow.keras import MaxPooling2D\n", + "#from tensorflow.keras import Flatten\n", + "\n", + "from sklearn.model_selection import train_test_split\n", + "\n", + "# representation of labels\n", + "labels = to_categorical(labels)\n", + "\n", + "# split into train and test data\n", + "# one-liner from scikit-learn library\n", + "train_size = 0.8\n", + "test_size = 1 - train_size\n", + "X_train, X_test, Y_train, Y_test = train_test_split(inputs, labels, train_size=train_size,\n", + " test_size=test_size)" + ] + }, + { + "cell_type": "markdown", + "id": "bf68c3f4", + "metadata": { + "editable": true + }, + "source": [ + "## Running with Keras" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "d5a91d0e", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "def create_convolutional_neural_network_keras(input_shape, receptive_field,\n", + " n_filters, n_neurons_connected, n_categories,\n", + " eta, lmbd):\n", + " model = Sequential()\n", + " model.add(layers.Conv2D(n_filters, (receptive_field, receptive_field), input_shape=input_shape, padding='same',\n", + " activation='relu', kernel_regularizer=regularizers.l2(lmbd)))\n", + " model.add(layers.MaxPooling2D(pool_size=(2, 2)))\n", + " model.add(layers.Flatten())\n", + " model.add(layers.Dense(n_neurons_connected, activation='relu', kernel_regularizer=regularizers.l2(lmbd)))\n", + " model.add(layers.Dense(n_categories, activation='softmax', kernel_regularizer=regularizers.l2(lmbd)))\n", + " \n", + " sgd = optimizers.SGD(lr=eta)\n", + " model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])\n", + " \n", + " return model\n", + "\n", + "epochs = 100\n", + "batch_size = 100\n", + "input_shape = X_train.shape[1:4]\n", + "receptive_field = 3\n", + "n_filters = 10\n", + "n_neurons_connected = 50\n", + "n_categories = 10\n", + "\n", + "eta_vals = np.logspace(-5, 1, 7)\n", + "lmbd_vals = np.logspace(-5, 1, 7)" + ] + }, + { + "cell_type": "markdown", + "id": "8ff4d34b", + "metadata": { + "editable": true + }, + "source": [ + "## Final part" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "c1035646", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "CNN_keras = np.zeros((len(eta_vals), len(lmbd_vals)), dtype=object)\n", + " \n", + "for i, eta in enumerate(eta_vals):\n", + " for j, lmbd in enumerate(lmbd_vals):\n", + " CNN = create_convolutional_neural_network_keras(input_shape, receptive_field,\n", + " n_filters, n_neurons_connected, n_categories,\n", + " eta, lmbd)\n", + " CNN.fit(X_train, Y_train, epochs=epochs, batch_size=batch_size, verbose=0)\n", + " scores = CNN.evaluate(X_test, Y_test)\n", + " \n", + " CNN_keras[i][j] = CNN\n", + " \n", + " print(\"Learning rate = \", eta)\n", + " print(\"Lambda = \", lmbd)\n", + " print(\"Test accuracy: %.3f\" % scores[1])\n", + " print()" + ] + }, + { + 
"cell_type": "markdown", + "id": "dcdee4b4", + "metadata": { + "editable": true + }, + "source": [ + "## Final visualization" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "c34c4218", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# visual representation of grid search\n", + "# uses seaborn heatmap, could probably do this in matplotlib\n", + "import seaborn as sns\n", + "\n", + "sns.set()\n", + "\n", + "train_accuracy = np.zeros((len(eta_vals), len(lmbd_vals)))\n", + "test_accuracy = np.zeros((len(eta_vals), len(lmbd_vals)))\n", + "\n", + "for i in range(len(eta_vals)):\n", + " for j in range(len(lmbd_vals)):\n", + " CNN = CNN_keras[i][j]\n", + "\n", + " train_accuracy[i][j] = CNN.evaluate(X_train, Y_train)[1]\n", + " test_accuracy[i][j] = CNN.evaluate(X_test, Y_test)[1]\n", + "\n", + " \n", + "fig, ax = plt.subplots(figsize = (10, 10))\n", + "sns.heatmap(train_accuracy, annot=True, ax=ax, cmap=\"viridis\")\n", + "ax.set_title(\"Training Accuracy\")\n", + "ax.set_ylabel(\"$\\eta$\")\n", + "ax.set_xlabel(\"$\\lambda$\")\n", + "plt.show()\n", + "\n", + "fig, ax = plt.subplots(figsize = (10, 10))\n", + "sns.heatmap(test_accuracy, annot=True, ax=ax, cmap=\"viridis\")\n", + "ax.set_title(\"Test Accuracy\")\n", + "ax.set_ylabel(\"$\\eta$\")\n", + "ax.set_xlabel(\"$\\lambda$\")\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "9848777f", + "metadata": { + "editable": true + }, + "source": [ + "## The CIFAR01 data set\n", + "\n", + "The CIFAR10 dataset contains 60,000 color images in 10 classes, with\n", + "6,000 images in each class. The dataset is divided into 50,000\n", + "training images and 10,000 testing images. The classes are mutually\n", + "exclusive and there is no overlap between them." + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "e3c34685", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import tensorflow as tf\n", + "\n", + "from tensorflow.keras import datasets, layers, models\n", + "import matplotlib.pyplot as plt\n", + "\n", + "# We import the data set\n", + "(train_images, train_labels), (test_images, test_labels) = datasets.cifar10.load_data()\n", + "\n", + "# Normalize pixel values to be between 0 and 1 by dividing by 255. \n", + "train_images, test_images = train_images / 255.0, test_images / 255.0" + ] + }, + { + "cell_type": "markdown", + "id": "376a2959", + "metadata": { + "editable": true + }, + "source": [ + "## Verifying the data set\n", + "\n", + "To verify that the dataset looks correct, let's plot the first 25 images from the training set and display the class name below each image." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "fa4b303c", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer',\n", + " 'dog', 'frog', 'horse', 'ship', 'truck']\n", + "plt.figure(figsize=(10,10))\n", + "for i in range(25):\n", + " plt.subplot(5,5,i+1)\n", + " plt.xticks([])\n", + " plt.yticks([])\n", + " plt.grid(False)\n", + " plt.imshow(train_images[i], cmap=plt.cm.binary)\n", + " # The CIFAR labels happen to be arrays, \n", + " # which is why you need the extra index\n", + " plt.xlabel(class_names[train_labels[i][0]])\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "8f717ab7", + "metadata": { + "editable": true + }, + "source": [ + "## Set up the model\n", + "\n", + "The 6 lines of code below define the convolutional base using a common pattern: a stack of Conv2D and MaxPooling2D layers.\n", + "\n", + "As input, a CNN takes tensors of shape (image_height, image_width, color_channels), ignoring the batch size. If you are new to these dimensions, color_channels refers to (R,G,B). In this example, you will configure our CNN to process inputs of shape (32, 32, 3), which is the format of CIFAR images. You can do this by passing the argument input_shape to our first layer." + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "91013222", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "model = models.Sequential()\n", + "model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)))\n", + "model.add(layers.MaxPooling2D((2, 2)))\n", + "model.add(layers.Conv2D(64, (3, 3), activation='relu'))\n", + "model.add(layers.MaxPooling2D((2, 2)))\n", + "model.add(layers.Conv2D(64, (3, 3), activation='relu'))\n", + "\n", + "# Let's display the architecture of our model so far.\n", + "\n", + "model.summary()" + ] + }, + { + "cell_type": "markdown", + "id": "64f3581b", + "metadata": { + "editable": true + }, + "source": [ + "You can see that the output of every Conv2D and MaxPooling2D layer is a 3D tensor of shape (height, width, channels). The width and height dimensions tend to shrink as you go deeper in the network. The number of output channels for each Conv2D layer is controlled by the first argument (e.g., 32 or 64). Typically, as the width and height shrink, you can afford (computationally) to add more output channels in each Conv2D layer." + ] + }, + { + "cell_type": "markdown", + "id": "07774fd6", + "metadata": { + "editable": true + }, + "source": [ + "## Add Dense layers on top\n", + "\n", + "To complete our model, you will feed the last output tensor from the\n", + "convolutional base (of shape (4, 4, 64)) into one or more Dense layers\n", + "to perform classification. Dense layers take vectors as input (which\n", + "are 1D), while the current output is a 3D tensor. First, you will\n", + "flatten (or unroll) the 3D output to 1D, then add one or more Dense\n", + "layers on top. CIFAR has 10 output classes, so you use a final Dense\n", + "layer with 10 outputs and a softmax activation." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 9, + "id": "a6dc1206", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "model.add(layers.Flatten())\n", + "model.add(layers.Dense(64, activation='relu'))\n", + "model.add(layers.Dense(10))\n", + "Here's the complete architecture of our model.\n", + "\n", + "model.summary()" + ] + }, + { + "cell_type": "markdown", + "id": "71ef5715", + "metadata": { + "editable": true + }, + "source": [ + "As you can see, our (4, 4, 64) outputs were flattened into vectors of shape (1024) before going through two Dense layers." + ] + }, + { + "cell_type": "markdown", + "id": "596eaf51", + "metadata": { + "editable": true + }, + "source": [ + "## Compile and train the model" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "id": "1c8159af", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "model.compile(optimizer='adam',\n", + " loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),\n", + " metrics=['accuracy'])\n", + "\n", + "history = model.fit(train_images, train_labels, epochs=10, \n", + " validation_data=(test_images, test_labels))" + ] + }, + { + "cell_type": "markdown", + "id": "23913f02", + "metadata": { + "editable": true + }, + "source": [ + "## Finally, evaluate the model" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "id": "942cf136", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "plt.plot(history.history['accuracy'], label='accuracy')\n", + "plt.plot(history.history['val_accuracy'], label = 'val_accuracy')\n", + "plt.xlabel('Epoch')\n", + "plt.ylabel('Accuracy')\n", + "plt.ylim([0.5, 1])\n", + "plt.legend(loc='lower right')\n", + "\n", + "test_loss, test_acc = model.evaluate(test_images, test_labels, verbose=2)\n", + "\n", + "print(test_acc)" + ] + }, + { + "cell_type": "markdown", + "id": "9cf8f35b", + "metadata": { + "editable": true + }, + "source": [ + "## Building code using Pytorch\n", + "\n", + "This code loads and normalizes the MNIST dataset. Thereafter it defines a CNN architecture with:\n", + "1. Two convolutional layers\n", + "\n", + "2. Max pooling\n", + "\n", + "3. Dropout for regularization\n", + "\n", + "4. Two fully connected layers\n", + "\n", + "It uses the Adam optimizer and for cost function it employs the\n", + "Cross-Entropy function. It trains for 10 epochs.\n", + "You can modify the architecture (number of layers, channels, dropout\n", + "rate) or training parameters (learning rate, batch size, epochs) to\n", + "experiment with different configurations." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 12, + "id": "3f08edcf", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import torch\n", + "import torch.nn as nn\n", + "import torch.nn.functional as F\n", + "import torch.optim as optim\n", + "from torchvision import datasets, transforms\n", + "\n", + "# Set device\n", + "device = torch.device(\"cuda\" if torch.cuda.is_available() else \"cpu\")\n", + "\n", + "# Define transforms\n", + "transform = transforms.Compose([\n", + " transforms.ToTensor(),\n", + " transforms.Normalize((0.1307,), (0.3081,))\n", + "])\n", + "\n", + "# Load datasets\n", + "train_dataset = datasets.MNIST(root='./data', train=True, download=True, transform=transform)\n", + "test_dataset = datasets.MNIST(root='./data', train=False, download=True, transform=transform)\n", + "\n", + "# Create data loaders\n", + "train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=64, shuffle=True)\n", + "test_loader = torch.utils.data.DataLoader(test_dataset, batch_size=64, shuffle=False)\n", + "\n", + "# Define CNN model\n", + "class CNN(nn.Module):\n", + " def __init__(self):\n", + " super(CNN, self).__init__()\n", + " self.conv1 = nn.Conv2d(1, 32, 3, padding=1)\n", + " self.conv2 = nn.Conv2d(32, 64, 3, padding=1)\n", + " self.pool = nn.MaxPool2d(2, 2)\n", + " self.fc1 = nn.Linear(64*7*7, 1024)\n", + " self.fc2 = nn.Linear(1024, 10)\n", + " self.dropout = nn.Dropout(0.5)\n", + "\n", + " def forward(self, x):\n", + " x = self.pool(F.relu(self.conv1(x)))\n", + " x = self.pool(F.relu(self.conv2(x)))\n", + " x = x.view(-1, 64*7*7)\n", + " x = self.dropout(F.relu(self.fc1(x)))\n", + " x = self.fc2(x)\n", + " return x\n", + "\n", + "# Initialize model, loss function, and optimizer\n", + "model = CNN().to(device)\n", + "criterion = nn.CrossEntropyLoss()\n", + "optimizer = optim.Adam(model.parameters(), lr=0.001)\n", + "\n", + "# Training loop\n", + "num_epochs = 10\n", + "for epoch in range(num_epochs):\n", + " model.train()\n", + " running_loss = 0.0\n", + " for batch_idx, (data, target) in enumerate(train_loader):\n", + " data, target = data.to(device), target.to(device)\n", + " optimizer.zero_grad()\n", + " outputs = model(data)\n", + " loss = criterion(outputs, target)\n", + " loss.backward()\n", + " optimizer.step()\n", + " running_loss += loss.item()\n", + "\n", + " print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {running_loss/len(train_loader):.4f}')\n", + "\n", + "# Testing the model\n", + "model.eval()\n", + "correct = 0\n", + "total = 0\n", + "with torch.no_grad():\n", + " for data, target in test_loader:\n", + " data, target = data.to(device), target.to(device)\n", + " outputs = model(data)\n", + " _, predicted = torch.max(outputs.data, 1)\n", + " total += target.size(0)\n", + " correct += (predicted == target).sum().item()\n", + "\n", + "print(f'Test Accuracy: {100 * correct / total:.2f}%')" + ] + } + ], + "metadata": {}, + "nbformat": 4, + "nbformat_minor": 5 +} \ No newline at end of file diff --git a/doc/LectureNotes/_toc.yml b/doc/LectureNotes/_toc.yml index d07149d0e..39ed2f64f 100644 --- a/doc/LectureNotes/_toc.yml +++ b/doc/LectureNotes/_toc.yml @@ -48,7 +48,23 @@ parts: - file: exercisesweek36.ipynb - file: week36.ipynb - file: exercisesweek37.ipynb + - file: week37.ipynb + - file: exercisesweek38.ipynb + - file: week38.ipynb + - file: exercisesweek39.ipynb + - file: week39.ipynb + - file: week40.ipynb + - file: week41.ipynb + - file: exercisesweek41.ipynb + - file: week42.ipynb + - file: 
exercisesweek42.ipynb + - file: week43.ipynb + - file: exercisesweek43.ipynb + - file: week44.ipynb + - file: exercisesweek44.ipynb + - file: week45.ipynb - caption: Projects numbered: false chapters: - file: project1.ipynb + - file: project2.ipynb diff --git a/doc/LectureNotes/data/FYS_STK_Template.zip b/doc/LectureNotes/data/FYS_STK_Template.zip new file mode 100644 index 000000000..9d2eea71b Binary files /dev/null and b/doc/LectureNotes/data/FYS_STK_Template.zip differ diff --git a/doc/LectureNotes/exercisesweek35.ipynb b/doc/LectureNotes/exercisesweek35.ipynb index 886db99ef..403eab1f3 100644 --- a/doc/LectureNotes/exercisesweek35.ipynb +++ b/doc/LectureNotes/exercisesweek35.ipynb @@ -323,7 +323,7 @@ "source": [ "n = 100\n", "x = np.linspace(-3, 3, n)\n", - "y = np.exp(-x**2) + 1.5 * np.exp(-(x-2)**2) + np.random.normal(0, 1.0)" + "y = np.exp(-x**2) + 1.5 * np.exp(-(x-2)**2) + np.random.normal(n)" ] }, { diff --git a/doc/LectureNotes/exercisesweek36.ipynb b/doc/LectureNotes/exercisesweek36.ipynb index ddf3e11e5..3dd1ad167 100644 --- a/doc/LectureNotes/exercisesweek36.ipynb +++ b/doc/LectureNotes/exercisesweek36.ipynb @@ -172,7 +172,7 @@ "source": [ "n = 100\n", "x = np.linspace(-3, 3, n)\n", - "y = np.exp(-x**2) + 1.5 * np.exp(-(x-2)**2) + np.random.normal(0, 0.1)" + "y = np.exp(-x**2) + 1.5 * np.exp(-(x-2)**2) + np.random.normal(n)" ] }, { diff --git a/doc/LectureNotes/exercisesweek37.ipynb b/doc/LectureNotes/exercisesweek37.ipynb index 25296c4e0..bb6ba7a35 100644 --- a/doc/LectureNotes/exercisesweek37.ipynb +++ b/doc/LectureNotes/exercisesweek37.ipynb @@ -2,32 +2,33 @@ "cells": [ { "cell_type": "markdown", - "id": "1b941c35", + "id": "8e6632a0", "metadata": { "editable": true }, "source": [ "\n", - "" + "\n" ] }, { "cell_type": "markdown", - "id": "dc05b096", + "id": "82705c4f", "metadata": { "editable": true }, "source": [ "# Exercises week 37\n", + "\n", "**Implementing gradient descent for Ridge and ordinary Least Squares Regression**\n", "\n", - "Date: **September 8-12, 2025**" + "Date: **September 8-12, 2025**\n" ] }, { "cell_type": "markdown", - "id": "2cf07405", + "id": "921bf331", "metadata": { "editable": true }, @@ -35,55 +36,56 @@ "## Learning goals\n", "\n", "After having completed these exercises you will have:\n", + "\n", "1. Your own code for the implementation of the simplest gradient descent approach applied to ordinary least squares (OLS) and Ridge regression\n", "\n", "2. Be able to compare the analytical expressions for OLS and Ridge regression with the gradient descent approach\n", "\n", "3. Explore the role of the learning rate in the gradient descent approach and the hyperparameter $\\lambda$ in Ridge regression\n", "\n", - "4. Scale the data properly" + "4. Scale the data properly\n" ] }, { "cell_type": "markdown", - "id": "3c139edb", + "id": "adff65d5", "metadata": { "editable": true }, "source": [ "## Simple one-dimensional second-order polynomial\n", "\n", - "We start with a very simple function" + "We start with a very simple function\n" ] }, { "cell_type": "markdown", - "id": "aad4cfac", + "id": "70418b3d", "metadata": { "editable": true }, "source": [ "$$\n", "f(x)= 2-x+5x^2,\n", - "$$" + "$$\n" ] }, { "cell_type": "markdown", - "id": "6682282f", + "id": "11a3cf73", "metadata": { "editable": true }, "source": [ - "defined for $x\\in [-2,2]$. You can add noise if you wish. \n", + "defined for $x\\in [-2,2]$. You can add noise if you wish.\n", "\n", "We are going to fit this function with a polynomial ansatz. 
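+    "\n",
+    "A minimal data setup could look like the following sketch (the noise level, if you add any, is an arbitrary choice).\n",
+    "\n",
+    "```python\n",
+    "import numpy as np\n",
+    "\n",
+    "n = 100\n",
+    "x = np.linspace(-2, 2, n)\n",
+    "y = 2 - x + 5 * x**2\n",
+    "# optional noise, for example:\n",
+    "# y = y + np.random.normal(0, 0.5, n)\n",
+    "```\n",
+    "\n",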
The easiest thing is to set up a second-order polynomial and see if you can fit the above function.\n", - "Feel free to play around with higher-order polynomials." + "Feel free to play around with higher-order polynomials.\n" ] }, { "cell_type": "markdown", - "id": "89e2f4c4", + "id": "04a06b51", "metadata": { "editable": true }, @@ -94,12 +96,12 @@ "standardize the features. This ensures all features are on a\n", "comparable scale, which is especially important when using\n", "regularization. Here we will perform standardization, scaling each\n", - "feature to have mean 0 and standard deviation 1." + "feature to have mean 0 and standard deviation 1.\n" ] }, { "cell_type": "markdown", - "id": "b06d4e53", + "id": "408db3d9", "metadata": { "editable": true }, @@ -114,13 +116,13 @@ "term, the data is shifted such that the intercept is effectively 0\n", ". (In practice, one could include an intercept in the model and not\n", "penalize it, but here we simplify by centering.)\n", - "Choose $n=100$ data points and set up $\\boldsymbol{x}, $\\boldsymbol{y}$ and the design matrix $\\boldsymbol{X}$." + "Choose $n=100$ data points and set up $\\boldsymbol{x}$, $\\boldsymbol{y}$ and the design matrix $\\boldsymbol{X}$.\n" ] }, { "cell_type": "code", "execution_count": 1, - "id": "63796480", + "id": "37fb732c", "metadata": { "collapsed": false, "editable": true @@ -140,46 +142,46 @@ }, { "cell_type": "markdown", - "id": "80748600", + "id": "d861e1e3", "metadata": { "editable": true }, "source": [ - "Fill in the necessary details.\n", + "Fill in the necessary details. Do we need to center the $y$-values?\n", "\n", "After this preprocessing, each column of $\\boldsymbol{X}_{\\mathrm{norm}}$ has mean zero and standard deviation $1$\n", "and $\\boldsymbol{y}_{\\mathrm{centered}}$ has mean 0. This makes the optimization landscape\n", "nicer and ensures the regularization penalty $\\lambda \\sum_j\n", "\\theta_j^2$ in Ridge regression treats each coefficient fairly (since features are on the\n", - "same scale)." + "same scale).\n" ] }, { "cell_type": "markdown", - "id": "92751e5f", + "id": "b3e774d0", "metadata": { "editable": true }, "source": [ "## Exercise 2, calculate the gradients\n", "\n", - "Find the gradients for OLS and Ridge regression using the mean-squared error as cost/loss function." + "Find the gradients for OLS and Ridge regression using the mean-squared error as cost/loss function.\n" ] }, { "cell_type": "markdown", - "id": "aedfbd7a", + "id": "d5dc7708", "metadata": { "editable": true }, "source": [ - "## Exercise 3, using the analytical formulae for OLS and Ridge regression to find the optimal paramters $\\boldsymbol{\\theta}$" + "## Exercise 3, using the analytical formulae for OLS and Ridge regression to find the optimal paramters $\\boldsymbol{\\theta}$\n" ] }, { "cell_type": "code", "execution_count": 2, - "id": "5d1288fa", + "id": "4c9c86ac", "metadata": { "collapsed": false, "editable": true @@ -187,7 +189,9 @@ "outputs": [], "source": [ "# Set regularization parameter, either a single value or a vector of values\n", - "lambda = ?\n", + "# Note that lambda is a python keyword. The lambda keyword is used to create small, single-expression functions without a formal name. 
These are often called \"anonymous functions\" or \"lambda functions.\"\n", + "lam = ?\n", + "\n", "\n", "# Analytical form for OLS and Ridge solution: theta_Ridge = (X^T X + lambda * I)^{-1} X^T y and theta_OLS = (X^T X)^{-1} X^T y\n", "I = np.eye(n_features)\n", @@ -200,7 +204,7 @@ }, { "cell_type": "markdown", - "id": "628f5e89", + "id": "eeae00fd", "metadata": { "editable": true }, @@ -208,37 +212,37 @@ "This computes the Ridge and OLS regression coefficients directly. The identity\n", "matrix $I$ has the same size as $X^T X$. It adds $\\lambda$ to the diagonal of $X^T X$ for Ridge regression. We\n", "then invert this matrix and multiply by $X^T y$. The result\n", - "for $\\boldsymbol{\\theta}$ is a NumPy array of shape (n$\\_$features,) containing the\n", - "fitted parameters $\\boldsymbol{\\theta}$." + "for $\\boldsymbol{\\theta}$ is a NumPy array of shape (n$\\_$features,) containing the\n", + "fitted parameters $\\boldsymbol{\\theta}$.\n" ] }, { "cell_type": "markdown", - "id": "f115ba4e", + "id": "e1c215d5", "metadata": { "editable": true }, "source": [ "### 3a)\n", "\n", - "Finalize, in the above code, the OLS and Ridge regression determination of the optimal parameters $\\boldsymbol{\\theta}$." + "Finalize, in the above code, the OLS and Ridge regression determination of the optimal parameters $\\boldsymbol{\\theta}$.\n" ] }, { "cell_type": "markdown", - "id": "a9b5189c", + "id": "587dd3dc", "metadata": { "editable": true }, "source": [ "### 3b)\n", "\n", - "Explore the results as function of different values of the hyperparameter $\\lambda$. See for example exercise 4 from week 36." + "Explore the results as function of different values of the hyperparameter $\\lambda$. See for example exercise 4 from week 36.\n" ] }, { "cell_type": "markdown", - "id": "a3969ff6", + "id": "bfa34697", "metadata": { "editable": true }, @@ -250,15 +254,15 @@ "necessary if $n$ and $p$ are so large that the closed-form might be\n", "too slow or memory-intensive. We derive the gradients from the cost\n", "functions defined above. 
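+    "\n",
+    "As a reminder, with the mean-squared error as cost function (and the convention used here that the penalty is $\lambda\sum_j\theta_j^2$, not divided by $n$), the gradients take the standard form\n",
+    "\n",
+    "$$\n",
+    "\nabla_{\boldsymbol{\theta}} C_{\mathrm{OLS}} = \frac{2}{n}\boldsymbol{X}^T\left(\boldsymbol{X}\boldsymbol{\theta}-\boldsymbol{y}\right),\qquad\n",
+    "\nabla_{\boldsymbol{\theta}} C_{\mathrm{Ridge}} = \frac{2}{n}\boldsymbol{X}^T\left(\boldsymbol{X}\boldsymbol{\theta}-\boldsymbol{y}\right)+2\lambda\boldsymbol{\theta}.\n",
+    "$$\n",
+    "\n",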
Use the gradients of the Ridge and OLS cost functions with respect to\n", - "the parameters $\\boldsymbol{\\theta}$ and set up (using the template below) your own gradient descent code for OLS and Ridge regression.\n", + "the parameters $\\boldsymbol{\\theta}$ and set up (using the template below) your own gradient descent code for OLS and Ridge regression.\n", "\n", - "Below is a template code for gradient descent implementation of ridge:" + "Below is a template code for gradient descent implementation of ridge:\n" ] }, { "cell_type": "code", "execution_count": 3, - "id": "34d87303", + "id": "49245f55", "metadata": { "collapsed": false, "editable": true @@ -273,19 +277,8 @@ "# Initialize weights for gradient descent\n", "theta = np.zeros(n_features)\n", "\n", - "# Arrays to store history for plotting\n", - "cost_history = np.zeros(num_iters)\n", - "\n", "# Gradient descent loop\n", - "m = n_samples # number of data points\n", "for t in range(num_iters):\n", - " # Compute prediction error\n", - " error = X_norm.dot(theta) - y_centered \n", - " # Compute cost for OLS and Ridge (MSE + regularization for Ridge) for monitoring\n", - " cost_OLS = ?\n", - " cost_Ridge = ?\n", - " # You could add a history for both methods (optional)\n", - " cost_history[t] = ?\n", " # Compute gradients for OSL and Ridge\n", " grad_OLS = ?\n", " grad_Ridge = ?\n", @@ -302,31 +295,33 @@ }, { "cell_type": "markdown", - "id": "989f70bb", + "id": "f3f43f2c", "metadata": { "editable": true }, "source": [ "### 4a)\n", "\n", - "Discuss the results as function of the learning rate parameters and the number of iterations." + "Write first a gradient descent code for OLS only using the above template.\n", + "Discuss the results as function of the learning rate parameters and the number of iterations\n" ] }, { "cell_type": "markdown", - "id": "370b2dad", + "id": "9ba303be", "metadata": { "editable": true }, "source": [ "### 4b)\n", "\n", - "Try to add a stopping parameter as function of the number iterations and the difference between the new and old $\\theta$ values. How would you define a stopping criterion?" + "Write then a similar code for Ridge regression using the above template.\n", + "Try to add a stopping parameter as function of the number iterations and the difference between the new and old $\\theta$ values. How would you define a stopping criterion?\n" ] }, { "cell_type": "markdown", - "id": "ef197cd7", + "id": "78362c6c", "metadata": { "editable": true }, @@ -346,13 +341,13 @@ "Then we sample feature values for $\\boldsymbol{X}$ randomly (e.g. from a normal distribution). 
We use a normal distribution so features are roughly centered around 0.\n", "Then we compute the target values $y$ using the linear combination $\\boldsymbol{X}\\hat{\\boldsymbol{\\theta}}$ and add some noise (to simulate measurement error or unexplained variance).\n", "\n", - "Below is the code to generate the dataset:" + "Below is the code to generate the dataset:\n" ] }, { "cell_type": "code", - "execution_count": 4, - "id": "4ccc2f65", + "execution_count": null, + "id": "8be1cebe", "metadata": { "collapsed": false, "editable": true @@ -375,13 +370,13 @@ "X = np.random.randn(n_samples, n_features) # standard normal distribution\n", "\n", "# Generate target values y with a linear combination of X and theta_true, plus noise\n", - "noise = 0.5 * np.random.randn(n_samples) # Gaussian noise\n", + "noise = 0.5 * np.random.randn(n_samples) # Gaussian noise\n", "y = X.dot @ theta_true + noise" ] }, { "cell_type": "markdown", - "id": "00e279ef", + "id": "e2693666", "metadata": { "editable": true }, @@ -390,29 +385,29 @@ "significantly influence $\\boldsymbol{y}$. The rest of the features have zero true\n", "coefficient. For example, feature 0 has\n", "a true weight of 5.0, feature 1 has -3.0, and feature 6 has 2.0, so\n", - "the expected relationship is:" + "the expected relationship is:\n" ] }, { "cell_type": "markdown", - "id": "c910b3f4", + "id": "bc954d12", "metadata": { "editable": true }, "source": [ "$$\n", "y \\approx 5 \\times x_0 \\;-\\; 3 \\times x_1 \\;+\\; 2 \\times x_6 \\;+\\; \\text{noise}.\n", - "$$" + "$$\n" ] }, { "cell_type": "markdown", - "id": "89e6e040", + "id": "6534b610", "metadata": { "editable": true }, "source": [ - "You can remove the noise if you wish to. \n", + "You can remove the noise if you wish to.\n", "\n", "Try to fit the above data set using OLS and Ridge regression with the analytical expressions and your own gradient descent codes.\n", "\n", @@ -420,11 +415,15 @@ "close to the true values [5.0, -3.0, 0.0, …, 2.0, …] that we used to\n", "generate the data. Keep in mind that due to regularization and noise,\n", "the learned values will not exactly equal the true ones, but they\n", - "should be in the same ballpark. Which method (OLS or Ridge) gives the best results?" + "should be in the same ballpark. Which method (OLS or Ridge) gives the best results?\n" ] } ], - "metadata": {}, + "metadata": { + "language_info": { + "name": "python" + } + }, "nbformat": 4, "nbformat_minor": 5 } diff --git a/doc/LectureNotes/exercisesweek38.ipynb b/doc/LectureNotes/exercisesweek38.ipynb new file mode 100644 index 000000000..c100028a5 --- /dev/null +++ b/doc/LectureNotes/exercisesweek38.ipynb @@ -0,0 +1,485 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "1da77599", + "metadata": {}, + "source": [ + "# Exercises week 38\n", + "\n", + "## September 15-19\n", + "\n", + "## Resampling and the Bias-Variance Trade-off\n" + ] + }, + { + "cell_type": "markdown", + "id": "e9f27b0e", + "metadata": {}, + "source": [ + "### Learning goals\n", + "\n", + "After completing these exercises, you will know how to\n", + "\n", + "- Derive expectation and variances values related to linear regression\n", + "- Compute expectation and variances values related to linear regression\n", + "- Compute and evaluate the trade-off between bias and variance of a model\n", + "\n", + "### Deliverables\n", + "\n", + "Complete the following exercises while working in a jupyter notebook. 
Then, in canvas, include\n", + "\n", + "- The jupyter notebook with the exercises completed\n", + "- An exported PDF of the notebook (https://code.visualstudio.com/docs/datascience/jupyter-notebooks#_export-your-jupyter-notebook)\n" + ] + }, + { + "cell_type": "markdown", + "id": "984af8e3", + "metadata": {}, + "source": [ + "## Use the books!\n", + "\n", + "This week deals with various mean values and variances in linear regression methods (here it may be useful to look up chapter 3, equation (3.8) of [Trevor Hastie, Robert Tibshirani, Jerome H. Friedman, The Elements of Statistical Learning, Springer](https://www.springer.com/gp/book/9780387848570)).\n", + "\n", + "For more discussions on Ridge regression and calculation of expectation values, [Wessel van Wieringen's](https://arxiv.org/abs/1509.09169) article is highly recommended.\n", + "\n", + "The exercises this week are also a part of project 1 and can be reused in the theory part of the project.\n", + "\n", + "### Definitions\n", + "\n", + "We assume that there exists a continuous function $f(\\boldsymbol{x})$ and a normal distributed error $\\boldsymbol{\\varepsilon}\\sim N(0, \\sigma^2)$ which describes our data\n" + ] + }, + { + "cell_type": "markdown", + "id": "c16f7d0e", + "metadata": {}, + "source": [ + "$$\n", + "\\boldsymbol{y} = f(\\boldsymbol{x})+\\boldsymbol{\\varepsilon}\n", + "$$\n" + ] + }, + { + "cell_type": "markdown", + "id": "9fcf981a", + "metadata": { + "vscode": { + "languageId": "plaintext" + } + }, + "source": [ + "We further assume that this continous function can be modeled with a linear model $\\mathbf{\\tilde{y}}$ of some features $\\mathbf{X}$.\n" + ] + }, + { + "cell_type": "markdown", + "id": "d4189366", + "metadata": {}, + "source": [ + "$$\n", + "\\boldsymbol{y} = \\boldsymbol{\\tilde{y}} + \\boldsymbol{\\varepsilon} = \\boldsymbol{X}\\boldsymbol{\\beta} +\\boldsymbol{\\varepsilon}\n", + "$$\n" + ] + }, + { + "cell_type": "markdown", + "id": "f4fca21b", + "metadata": {}, + "source": [ + "We therefore get that our data $\\boldsymbol{y}$ has an expectation value $\\boldsymbol{X}\\boldsymbol{\\beta}$ and variance $\\sigma^2$, that is $\\boldsymbol{y}$ follows a normal distribution with mean value $\\boldsymbol{X}\\boldsymbol{\\beta}$ and variance $\\sigma^2$.\n" + ] + }, + { + "cell_type": "markdown", + "id": "5de0c7e6", + "metadata": {}, + "source": [ + "## Exercise 1: Expectation values for ordinary least squares expressions\n" + ] + }, + { + "cell_type": "markdown", + "id": "d878c699", + "metadata": {}, + "source": [ + "**a)** With the expressions for the optimal parameters $\\boldsymbol{\\hat{\\beta}_{OLS}}$ show that\n" + ] + }, + { + "cell_type": "markdown", + "id": "08b7007d", + "metadata": {}, + "source": [ + "$$\n", + "\\mathbb{E}(\\boldsymbol{\\hat{\\beta}_{OLS}}) = \\boldsymbol{\\beta}.\n", + "$$\n" + ] + }, + { + "cell_type": "markdown", + "id": "46e93394", + "metadata": {}, + "source": [ + "**b)** Show that the variance of $\\boldsymbol{\\hat{\\beta}_{OLS}}$ is\n" + ] + }, + { + "cell_type": "markdown", + "id": "be1b65be", + "metadata": {}, + "source": [ + "$$\n", + "\\mathbf{Var}(\\boldsymbol{\\hat{\\beta}_{OLS}}) = \\sigma^2 \\, (\\mathbf{X}^{T} \\mathbf{X})^{-1}.\n", + "$$\n" + ] + }, + { + "cell_type": "markdown", + "id": "d2143684", + "metadata": {}, + "source": [ + "We can use the last expression when we define a [confidence interval](https://en.wikipedia.org/wiki/Confidence_interval) for the parameters $\\boldsymbol{\\hat{\\beta}_{OLS}}$.\n", + "A given parameter 
${\\boldsymbol{\\hat{\\beta}_{OLS}}}_j$ is given by the diagonal matrix element of the above matrix.\n" + ] + }, + { + "cell_type": "markdown", + "id": "f5c2dc22", + "metadata": {}, + "source": [ + "## Exercise 2: Expectation values for Ridge regression\n" + ] + }, + { + "cell_type": "markdown", + "id": "3893e3e7", + "metadata": {}, + "source": [ + "**a)** With the expressions for the optimal parameters $\\boldsymbol{\\hat{\\beta}_{Ridge}}$ show that\n" + ] + }, + { + "cell_type": "markdown", + "id": "79dc571f", + "metadata": {}, + "source": [ + "$$\n", + "\\mathbb{E} \\big[ \\hat{\\boldsymbol{\\beta}}^{\\mathrm{Ridge}} \\big]=(\\mathbf{X}^{T} \\mathbf{X} + \\lambda \\mathbf{I}_{pp})^{-1} (\\mathbf{X}^{\\top} \\mathbf{X})\\boldsymbol{\\beta}\n", + "$$\n" + ] + }, + { + "cell_type": "markdown", + "id": "028209a1", + "metadata": {}, + "source": [ + "We see that $\\mathbb{E} \\big[ \\hat{\\boldsymbol{\\beta}}^{\\mathrm{Ridge}} \\big] \\not= \\mathbb{E} \\big[\\hat{\\boldsymbol{\\beta}}^{\\mathrm{OLS}}\\big ]$ for any $\\lambda > 0$.\n" + ] + }, + { + "cell_type": "markdown", + "id": "b4e721fc", + "metadata": {}, + "source": [ + "**b)** Show that the variance is\n" + ] + }, + { + "cell_type": "markdown", + "id": "090eb1e1", + "metadata": {}, + "source": [ + "$$\n", + "\\mathbf{Var}[\\hat{\\boldsymbol{\\beta}}^{\\mathrm{Ridge}}]=\\sigma^2[ \\mathbf{X}^{T} \\mathbf{X} + \\lambda \\mathbf{I} ]^{-1} \\mathbf{X}^{T}\\mathbf{X} \\{ [ \\mathbf{X}^{\\top} \\mathbf{X} + \\lambda \\mathbf{I} ]^{-1}\\}^{T}\n", + "$$\n" + ] + }, + { + "cell_type": "markdown", + "id": "6b8e8697", + "metadata": {}, + "source": [ + "We see that if the parameter $\\lambda$ goes to infinity then the variance of the Ridge parameters $\\boldsymbol{\\beta}$ goes to zero.\n" + ] + }, + { + "cell_type": "markdown", + "id": "74bc300b", + "metadata": {}, + "source": [ + "## Exercise 3: Deriving the expression for the Bias-Variance Trade-off\n" + ] + }, + { + "cell_type": "markdown", + "id": "eeb86010", + "metadata": {}, + "source": [ + "The aim of this exercise is to derive the equations for the bias-variance tradeoff to be used in project 1.\n", + "\n", + "The parameters $\\boldsymbol{\\hat{\\beta}_{OLS}}$ are found by optimizing the mean squared error via the so-called cost function\n" + ] + }, + { + "cell_type": "markdown", + "id": "522a0d1d", + "metadata": {}, + "source": [ + "$$\n", + "C(\\boldsymbol{X},\\boldsymbol{\\beta}) =\\frac{1}{n}\\sum_{i=0}^{n-1}(y_i-\\tilde{y}_i)^2=\\mathbb{E}\\left[(\\boldsymbol{y}-\\boldsymbol{\\tilde{y}})^2\\right]\n", + "$$\n" + ] + }, + { + "cell_type": "markdown", + "id": "831db06c", + "metadata": {}, + "source": [ + "**a)** Show that you can rewrite this into an expression which contains\n", + "\n", + "- the variance of the model (the variance term)\n", + "- the expected deviation of the mean of the model from the true data (the bias term)\n", + "- the variance of the noise\n", + "\n", + "In other words, show that:\n" + ] + }, + { + "cell_type": "markdown", + "id": "8cc52b3c", + "metadata": {}, + "source": [ + "$$\n", + "\\mathbb{E}\\left[(\\boldsymbol{y}-\\boldsymbol{\\tilde{y}})^2\\right]=\\mathrm{Bias}[\\tilde{y}]+\\mathrm{var}[\\tilde{y}]+\\sigma^2,\n", + "$$\n" + ] + }, + { + "cell_type": "markdown", + "id": "8cb50416", + "metadata": {}, + "source": [ + "with\n" + ] + }, + { + "cell_type": "markdown", + "id": "e49bdbb4", + "metadata": {}, + "source": [ + "$$\n", + 
"\\mathrm{Bias}[\\tilde{y}]=\\mathbb{E}\\left[\\left(\\boldsymbol{y}-\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right]\\right)^2\\right],\n", + "$$\n" + ] + }, + { + "cell_type": "markdown", + "id": "eca5554a", + "metadata": {}, + "source": [ + "and\n" + ] + }, + { + "cell_type": "markdown", + "id": "b1054343", + "metadata": {}, + "source": [ + "$$\n", + "\\mathrm{var}[\\tilde{y}]=\\mathbb{E}\\left[\\left(\\tilde{\\boldsymbol{y}}-\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right]\\right)^2\\right]=\\frac{1}{n}\\sum_i(\\tilde{y}_i-\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right])^2.\n", + "$$\n", + "\n", + "In order to arrive at the equation for the bias, we have to approximate the unknown function $f$ with the output/target values $y$.\n" + ] + }, + { + "cell_type": "markdown", + "id": "70fbfcd7", + "metadata": {}, + "source": [ + "**b)** Explain what the terms mean and discuss their interpretations.\n" + ] + }, + { + "cell_type": "markdown", + "id": "b8f8b9d1", + "metadata": {}, + "source": [ + "## Exercise 4: Computing the Bias and Variance\n" + ] + }, + { + "cell_type": "markdown", + "id": "9e012430", + "metadata": {}, + "source": [ + "Before you compute the bias and variance of a real model for different complexities, let's for now assume that you have sampled predictions and targets for a single model complexity using bootstrap resampling.\n", + "\n", + "**a)** Using the expression above, compute the mean squared error, bias and variance of the given data. Check that the sum of the bias and variance correctly gives (approximately) the mean squared error.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "b5bf581c", + "metadata": {}, + "outputs": [], + "source": [ + "import numpy as np\n", + "\n", + "n = 100\n", + "bootstraps = 1000\n", + "\n", + "predictions = np.random.rand(bootstraps, n) * 10 + 10\n", + "# The definition of targets has been updated, and was wrong earlier in the week.\n", + "targets = np.random.rand(1, n)\n", + "\n", + "mse = ...\n", + "bias = ...\n", + "variance = ..." 
+ ] + }, + { + "cell_type": "markdown", + "id": "7b1dc621", + "metadata": {}, + "source": [ + "**b)** Change the prediction values in some way to increase the bias while decreasing the variance.\n", + "\n", + "**c)** Change the prediction values in some way to increase the variance while decreasing the bias.\n" + ] + }, + { + "cell_type": "markdown", + "id": "8da63362", + "metadata": {}, + "source": [ + "**d)** Perform a bias-variance analysis of a polynomial OLS model fit to a one-dimensional function by computing and plotting the bias and variances values as a function of the polynomial degree of your model.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "dd5855e4", + "metadata": {}, + "outputs": [], + "source": [ + "import numpy as np\n", + "import matplotlib.pyplot as plt\n", + "from sklearn.preprocessing import (\n", + " PolynomialFeatures,\n", + ") # use the fit_transform method of the created object!\n", + "from sklearn.linear_model import LinearRegression\n", + "from sklearn.metrics import mean_squared_error\n", + "from sklearn.model_selection import train_test_split\n", + "from sklearn.utils import resample" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "7e35fa37", + "metadata": {}, + "outputs": [], + "source": [ + "n = 100\n", + "bootstraps = 1000\n", + "\n", + "x = np.linspace(-3, 3, n)\n", + "y = np.exp(-(x**2)) + 1.5 * np.exp(-((x - 2) ** 2)) + np.random.normal(0, 0.1)\n", + "\n", + "biases = []\n", + "variances = []\n", + "mses = []\n", + "\n", + "# for p in range(1, 5):\n", + "# predictions = ...\n", + "# targets = ...\n", + "#\n", + "# X = ...\n", + "# X_train, X_test, y_train, y_test = ...\n", + "# for b in range(bootstraps):\n", + "# X_train_re, y_train_re = ...\n", + "#\n", + "# # fit your model on the sampled data\n", + "#\n", + "# # make predictions on the test data\n", + "# predictions[b, :] =\n", + "# targets[b, :] =\n", + "#\n", + "# biases.append(...)\n", + "# variances.append(...)\n", + "# mses.append(...)" + ] + }, + { + "cell_type": "markdown", + "id": "253b8461", + "metadata": {}, + "source": [ + "**e)** Discuss the bias-variance trade-off as function of your model complexity (the degree of the polynomial).\n", + "\n", + "**f)** Compute and discuss the bias and variance as function of the number of data points (choose a suitable polynomial degree to show something interesting).\n" + ] + }, + { + "cell_type": "markdown", + "id": "46250fbc", + "metadata": {}, + "source": [ + "## Exercise 5: Interpretation of scaling and metrics\n" + ] + }, + { + "cell_type": "markdown", + "id": "5af53055", + "metadata": {}, + "source": [ + "In this course, we often ask you to scale data and compute various metrics. Although these practices are \"standard\" in the field, we will require you to demonstrate an understanding of _why_ you need to scale data and use these metrics. Both so that you can make better arguements about your results, and so that you will hopefully make fewer mistakes.\n", + "\n", + "First, a few reminders: In this course you should always scale the columns of the feature matrix, and sometimes scale the target data, when it is worth the effort. By scaling, we mean subtracting the mean and dividing by the standard deviation, though there are many other ways to scale data. 
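For concreteness, a minimal sketch of this column-wise standardization could look as follows (the matrix `X` here is just placeholder random data standing in for a real design matrix):

```python
import numpy as np

X = np.random.rand(100, 3)  # placeholder feature matrix with 3 columns

X_mean = X.mean(axis=0)
X_std = X.std(axis=0)
X_scaled = (X - X_mean) / X_std  # every column now has mean ~0 and std ~1

# Equivalently, with scikit-learn:
# from sklearn.preprocessing import StandardScaler
# X_scaled = StandardScaler().fit_transform(X)
```

In practice you would compute the mean and standard deviation on the training split only and reuse those values when scaling the test split.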
When scaling either the feature matrix or the target data, the intercept becomes a bit harder to implement and understand, so take care.\n", + "\n", + "Briefly answer the following:\n", + "\n", + "**a)** Why do we scale data?\n", + "\n", + "**b)** Why does the OLS method give practically equivelent models on scaled and unscaled data?\n", + "\n", + "**c)** Why does the Ridge method **not** give practically equivelent models on scaled and unscaled data? Why do we only consider the model on scaled data correct?\n", + "\n", + "**d)** Why do we say that the Ridge method gives a biased model?\n", + "\n", + "**e)** Is the MSE of the OLS method affected by scaling of the feature matrix? Is it affected by scaling of the target data?\n", + "\n", + "**f)** Read about the R2 score, a metric we will ask you to use a lot later in the course. Is the R2 score of the OLS method affected by scaling of the feature matrix? Is it affected by scaling of the target data?\n", + "\n", + "**g)** Give interpretations of the following R2 scores: 0, 0.5, 1.\n", + "\n", + "**h)** What is an advantage of the R2 score over the MSE?\n" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.15" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/doc/LectureNotes/exercisesweek39.ipynb b/doc/LectureNotes/exercisesweek39.ipynb new file mode 100644 index 000000000..22a86cb56 --- /dev/null +++ b/doc/LectureNotes/exercisesweek39.ipynb @@ -0,0 +1,185 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "433db993", + "metadata": {}, + "source": [ + "# Exercises week 39\n", + "\n", + "## Getting started with project 1\n" + ] + }, + { + "cell_type": "markdown", + "id": "6b931365", + "metadata": {}, + "source": [ + "The aim of the exercises this week is to aid you in getting started with writing the report. This will be discussed during the lab sessions as well.\n", + "\n", + "A short feedback to the this exercise will be available before the project deadline. And you can reuse these elements in your final report.\n" + ] + }, + { + "cell_type": "markdown", + "id": "2a63bae1", + "metadata": {}, + "source": [ + "### Learning goals\n", + "\n", + "After completing these exercises, you will know how to\n", + "\n", + "- Create a properly formatted report in Overleaf\n", + "- Select and present graphs for a scientific report\n", + "- Write an abstract and introduction for a scientific report\n", + "\n", + "### Deliverables\n", + "\n", + "Complete the following exercises while working in an Overleaf project. Then, in canvas, include\n", + "\n", + "- An exported PDF of the report draft you have been working on.\n", + "- A comment linking to the github repository used in exercise 4.\n" + ] + }, + { + "cell_type": "markdown", + "id": "e0f2d99d", + "metadata": {}, + "source": [ + "## Exercise 1: Creating the report document\n" + ] + }, + { + "cell_type": "markdown", + "id": "d06bfb29", + "metadata": {}, + "source": [ + "We require all projects to be formatted as proper scientific reports, and this includes using LaTeX for typesetting. 
We strongly recommend that you use the online LaTeX editor Overleaf, as it is much easier to start using, and has excellent support for collaboration.\n", + "\n", + "**a)** Create an account on Overleaf.com, or log in using SSO with your UiO email.\n", + "\n", + "**b)** Download [this](https://github.com/CompPhysics/MachineLearning/blob/master/doc/LectureNotes/data/FYS_STK_Template.zip) template project.\n", + "\n", + "**c)** Create a new Overleaf project with the correct formatting by uploading the template project.\n", + "\n", + "**d)** Read the general guideline for writing a report, which can be found at .\n", + "\n", + "**e)** Look at the provided example of an earlier project, found at \n" + ] + }, + { + "cell_type": "markdown", + "id": "ec36f4c3", + "metadata": {}, + "source": [ + "## Exercise 2: Adding good figures\n" + ] + }, + { + "cell_type": "markdown", + "id": "f50723f8", + "metadata": {}, + "source": [ + "**a)** Using what you have learned so far in this course, create a plot illustrating the Bias-Variance trade-off. Make sure the lines and axes are labeled, with font size being the same as in the text.\n", + "\n", + "**b)** Add this figure to the results section of your document, with a caption that describes it. A reader should be able to understand the figure with only its contents and caption.\n", + "\n", + "**c)** Refer to the figure in your text using \\ref.\n", + "\n", + "**d)** Create a heatmap showing the MSE of a Ridge regression model for various polynomial degrees and lambda values. Make sure the axes are labeled, and that the title or colorbar describes what is plotted.\n", + "\n", + "**e)** Add this second figure to your document with a caption and reference in the text. All figures in the final report must be captioned and be referenced and used in the text.\n" + ] + }, + { + "cell_type": "markdown", + "id": "276c214e", + "metadata": {}, + "source": [ + "## Exercise 3: Writing an abstract and introduction\n" + ] + }, + { + "cell_type": "markdown", + "id": "f4134eb5", + "metadata": {}, + "source": [ + "Although much of your project 1 results are not done yet, we want you to write an abstract and introduction to get you started on writing the report. It is generally a good idea to write a lot of a report before finishing all of the results, as you get a better understanding of your methods and inquiry from doing so, along with saving a lot of time. Where you would typically describe results in the abstract, instead make something up, just this once.\n", + "\n", + "**a)** Read the guidelines on abstract and introduction before you start.\n", + "\n", + "**b)** Write an abstract for project 1 in your report.\n", + "\n", + "**c)** Write an introduction for project 1 in your report.\n" + ] + }, + { + "cell_type": "markdown", + "id": "2f512b59", + "metadata": {}, + "source": [ + "## Exercise 4: Making the code available and presentable\n" + ] + }, + { + "cell_type": "markdown", + "id": "77fe1fec", + "metadata": {}, + "source": [ + "A central part of the report is the code you write to implement the methods and generate the results. To get points for the code-part of the project, you need to make your code avaliable and presentable.\n", + "\n", + "**a)** Create a github repository for project 1, or create a dedicated folder for project 1 in a github repository. 
Only one person in your group needs to do this.\n", + "\n", + "**b)** Add a PDF of the report to this repository, after completing exercises 1-3\n", + "\n", + "**c)** Add a folder named Code, where you can put python files for your functions and notebooks for reproducing your results.\n", + "\n", + "**d)** Add python files for functions, and a notebook to produce the figures in exercise 2, to the Code folder. Remember to use a seed for generating random data and for train-test splits.\n", + "\n", + "**e)** Create a README file in the repository or project folder with\n", + "\n", + "- the name of the group members\n", + "- a short description of the project\n", + "- a description of how to install the required packages to run your code from a requirements.txt file\n", + "- names and descriptions of the various notebooks in the Code folder and the results they produce\n" + ] + }, + { + "cell_type": "markdown", + "id": "f1d72c56", + "metadata": {}, + "source": [ + "## Exercise 5: Referencing\n", + "\n", + "**a)** Add a reference to Hastie et al. using your preferred referencing style. See https://www.sokogskriv.no/referansestiler/ for an overview of styles.\n", + "\n", + "**b)** Add a reference to sklearn like this: https://scikit-learn.org/stable/about.html#citing-scikit-learn\n", + "\n", + "**c)** Make a prompt to your LLM of choice, and upload the exported conversation to your GitHub repository for the project.\n", + "\n", + "**d)** At the end of the methods section of the report, write a one paragraph declaration on how and for what you have used the LLM. Link to the log on GitHub.\n" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.15" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/doc/LectureNotes/exercisesweek41.ipynb b/doc/LectureNotes/exercisesweek41.ipynb new file mode 100644 index 000000000..190c0b96a --- /dev/null +++ b/doc/LectureNotes/exercisesweek41.ipynb @@ -0,0 +1,804 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "4b4c06bc", + "metadata": {}, + "source": [ + "\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "id": "bcb25e64", + "metadata": {}, + "source": [ + "# Exercises week 41\n", + "\n", + "**October 6-10, 2025**\n", + "\n", + "Date: **Deadline is Friday October 10 at midnight**\n" + ] + }, + { + "cell_type": "markdown", + "id": "bb01f126", + "metadata": {}, + "source": [ + "# Overarching aims of the exercises this week\n", + "\n", + "This week, you will implement the entire feed-forward pass of a neural network! Next week you will compute the gradient of the network by implementing back-propagation manually, and by using autograd which does back-propagation for you (much easier!). Next week, you will also use the gradient to optimize the network with a gradient method! 
However, there is an optional exercise this week to get started on training the network and getting good results!\n", + "\n", + "We recommend that you do the exercises this week by editing and running this notebook file, as it includes some checks along the way that you have implemented the pieces of the feed-forward pass correctly, and running small parts of the code at a time will be important for understanding the methods.\n", + "\n", + "If you have trouble running a notebook, you can run this notebook in google colab instead (https://colab.research.google.com/drive/1zKibVQf-iAYaAn2-GlKfgRjHtLnPlBX4#offline=true&sandboxMode=true), an updated link will be provided on the course discord (you can also send an email to k.h.fredly@fys.uio.no if you encounter any trouble), though we recommend that you set up VSCode and your python environment to run code like this locally.\n", + "\n", + "First, here are some functions you are going to need, don't change this cell. If you are unable to import autograd, just swap in normal numpy until you want to do the final optional exercise.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c6f61b09", + "metadata": {}, + "outputs": [], + "source": [ + "import autograd.numpy as np # We need to use this numpy wrapper to make automatic differentiation work later\n", + "from sklearn import datasets\n", + "import matplotlib.pyplot as plt\n", + "from sklearn.metrics import accuracy_score\n", + "\n", + "\n", + "# Defining some activation functions\n", + "def ReLU(z):\n", + " return np.where(z > 0, z, 0)\n", + "\n", + "\n", + "def sigmoid(z):\n", + " return 1 / (1 + np.exp(-z))\n", + "\n", + "\n", + "def softmax(z):\n", + " \"\"\"Compute softmax values for each set of scores in the rows of the matrix z.\n", + " Used with batched input data.\"\"\"\n", + " e_z = np.exp(z - np.max(z, axis=0))\n", + " return e_z / np.sum(e_z, axis=1)[:, np.newaxis]\n", + "\n", + "\n", + "def softmax_vec(z):\n", + " \"\"\"Compute softmax values for each set of scores in the vector z.\n", + " Use this function when you use the activation function on one vector at a time\"\"\"\n", + " e_z = np.exp(z - np.max(z))\n", + " return e_z / np.sum(e_z)" + ] + }, + { + "cell_type": "markdown", + "id": "6248ec53", + "metadata": {}, + "source": [ + "# Exercise 1\n", + "\n", + "In this exercise you will compute the activation of the first layer. You only need to change the code in the cells right below an exercise, the rest works out of the box. Feel free to make changes and see how stuff works though!\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "37f30740", + "metadata": {}, + "outputs": [], + "source": [ + "np.random.seed(2024)\n", + "\n", + "x = np.random.randn(2) # network input. This is a single input with two features\n", + "W1 = np.random.randn(4, 2) # first layer weights" + ] + }, + { + "cell_type": "markdown", + "id": "4ed2cf3d", + "metadata": {}, + "source": [ + "**a)** Given the shape of the first layer weight matrix, what is the input shape of the neural network? What is the output shape of the first layer?\n" + ] + }, + { + "cell_type": "markdown", + "id": "edf7217b", + "metadata": {}, + "source": [ + "**b)** Define the bias of the first layer, `b1`with the correct shape. (Run the next cell right after the previous to get the random generated values to line up with the test solution below)\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "2129c19f", + "metadata": {}, + "outputs": [], + "source": [ + "b1 = ..." 
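# (Illustrative hint, not part of the original skeleton: W1 has shape (4, 2), so the
#  layer maps the 2 input features to 4 outputs, and b1 must therefore be a vector
#  of length 4 -- e.g. b1 = np.random.randn(4), drawn right after W1 so the random
#  state lines up with the check further down.)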
+ ] + }, + { + "cell_type": "markdown", + "id": "09e8d453", + "metadata": {}, + "source": [ + "**c)** Compute the intermediary `z1` for the first layer\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "6837119b", + "metadata": {}, + "outputs": [], + "source": [ + "z1 = ..." + ] + }, + { + "cell_type": "markdown", + "id": "6f71374e", + "metadata": {}, + "source": [ + "**d)** Compute the activation `a1` for the first layer using the ReLU activation function defined earlier.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "8d41ed19", + "metadata": {}, + "outputs": [], + "source": [ + "a1 = ..." + ] + }, + { + "cell_type": "markdown", + "id": "088710c0", + "metadata": {}, + "source": [ + "Confirm that you got the correct activation with the test below. Make sure that you define `b1` with the randn function right after you define `W1`.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "4d2f54b4", + "metadata": {}, + "outputs": [], + "source": [ + "sol1 = np.array([0.60610368, 4.0076268, 0.0, 0.56469864])\n", + "\n", + "print(np.allclose(a1, sol1))" + ] + }, + { + "cell_type": "markdown", + "id": "7fb0cf46", + "metadata": {}, + "source": [ + "# Exercise 2\n", + "\n", + "Now we will add a layer to the network with an output of length 8 and ReLU activation.\n", + "\n", + "**a)** What is the input of the second layer? What is its shape?\n", + "\n", + "**b)** Define the weight and bias of the second layer with the right shapes.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "00063acf", + "metadata": {}, + "outputs": [], + "source": [ + "W2 = ...\n", + "b2 = ..." + ] + }, + { + "cell_type": "markdown", + "id": "5bd7d84b", + "metadata": {}, + "source": [ + "**c)** Compute the intermediary `z2` and activation `a2` for the second layer.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "2fd0383d", + "metadata": {}, + "outputs": [], + "source": [ + "z2 = ...\n", + "a2 = ..." + ] + }, + { + "cell_type": "markdown", + "id": "1b5daae5", + "metadata": {}, + "source": [ + "Confirm that you got the correct activation shape with the test below.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f7f2f8a1", + "metadata": {}, + "outputs": [], + "source": [ + "print(\n", + " np.allclose(np.exp(len(a2)), 2980.9579870417283)\n", + ") # This should evaluate to True if a2 has the correct shape :)" + ] + }, + { + "cell_type": "markdown", + "id": "3759620d", + "metadata": {}, + "source": [ + "# Exercise 3\n", + "\n", + "We often want our neural networks to have many layers of varying sizes. 
To avoid writing very long and error-prone code where we explicitly define and evaluate each layer we should keep all our layers in a single variable which is easy to create and use.\n", + "\n", + "**a)** Complete the function below so that it returns a list `layers` of weight and bias tuples `(W, b)` for each layer, in order, with the correct shapes that we can use later as our network parameters.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c58f10f9", + "metadata": {}, + "outputs": [], + "source": [ + "def create_layers(network_input_size, layer_output_sizes):\n", + " layers = []\n", + "\n", + " i_size = network_input_size\n", + " for layer_output_size in layer_output_sizes:\n", + " W = ...\n", + " b = ...\n", + " layers.append((W, b))\n", + "\n", + " i_size = layer_output_size\n", + " return layers" + ] + }, + { + "cell_type": "markdown", + "id": "bdc0cda2", + "metadata": {}, + "source": [ + "**b)** Comple the function below so that it evaluates the intermediary `z` and activation `a` for each layer, with ReLU actication, and returns the final activation `a`. This is the complete feed-forward pass, a full neural network!\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "5262df05", + "metadata": {}, + "outputs": [], + "source": [ + "def feed_forward_all_relu(layers, input):\n", + " a = input\n", + " for W, b in layers:\n", + " z = ...\n", + " a = ...\n", + " return a" + ] + }, + { + "cell_type": "markdown", + "id": "245adbcb", + "metadata": {}, + "source": [ + "**c)** Create a network with input size 8 and layers with output sizes 10, 16, 6, 2. Evaluate it and make sure that you get the correct size vectors along the way.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "89a8f70d", + "metadata": {}, + "outputs": [], + "source": [ + "input_size = ...\n", + "layer_output_sizes = [...]\n", + "\n", + "x = np.random.rand(input_size)\n", + "layers = ...\n", + "predict = ...\n", + "print(predict)" + ] + }, + { + "cell_type": "markdown", + "id": "0da7fd52", + "metadata": {}, + "source": [ + "**d)** Why is a neural network with no activation functions mathematically equivelent to(can be reduced to) a neural network with only one layer?\n" + ] + }, + { + "cell_type": "markdown", + "id": "306d8b7c", + "metadata": {}, + "source": [ + "# Exercise 4 - Custom activation for each layer\n" + ] + }, + { + "cell_type": "markdown", + "id": "221c7b6c", + "metadata": {}, + "source": [ + "So far, every layer has used the same activation, ReLU. We often want to use other types of activation however, so we need to update our code to support multiple types of activation functions. Make sure that you have completed every previous exercise before trying this one.\n" + ] + }, + { + "cell_type": "markdown", + "id": "10896d06", + "metadata": {}, + "source": [ + "**a)** Complete the `feed_forward` function which accepts a list of activation functions as an argument, and which evaluates these activation functions at each layer.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "de062369", + "metadata": {}, + "outputs": [], + "source": [ + "def feed_forward(input, layers, activation_funcs):\n", + " a = input\n", + " for (W, b), activation_func in zip(layers, activation_funcs):\n", + " z = ...\n", + " a = ...\n", + " return a" + ] + }, + { + "cell_type": "markdown", + "id": "8f7df363", + "metadata": {}, + "source": [ + "**b)** You are now given a list with three activation functions, two ReLU and one sigmoid. 
(Don't call them yet! you can make a list with function names as elements, and then call these elements of the list later. If you add other functions than the ones defined at the start of the notebook, make sure everything is defined using autograd's numpy wrapper, like above, since we want to use automatic differentiation on all of these functions later.)\n", + "\n", + "Evaluate a network with three layers and these activation functions.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "301b46dc", + "metadata": {}, + "outputs": [], + "source": [ + "network_input_size = ...\n", + "layer_output_sizes = [...]\n", + "activation_funcs = [ReLU, ReLU, sigmoid]\n", + "layers = ...\n", + "\n", + "x = np.random.randn(network_input_size)\n", + "feed_forward(x, layers, activation_funcs)" + ] + }, + { + "cell_type": "markdown", + "id": "9c914fd0", + "metadata": {}, + "source": [ + "**c)** How does the output of the network change if you use sigmoid in the hidden layers and ReLU in the output layer?\n" + ] + }, + { + "cell_type": "markdown", + "id": "a8d6c425", + "metadata": {}, + "source": [ + "# Exercise 5 - Processing multiple inputs at once\n" + ] + }, + { + "cell_type": "markdown", + "id": "0f4330a4", + "metadata": {}, + "source": [ + "So far, the feed forward function has taken one input vector as an input. This vector then undergoes a linear transformation and then an element-wise non-linear operation for each layer. This approach of sending one vector in at a time is great for interpreting how the network transforms data with its linear and non-linear operations, but not the best for numerical efficiency. Now, we want to be able to send many inputs through the network at once. This will make the code a bit harder to understand, but it will make it faster, and more compact. It will be worth the trouble.\n", + "\n", + "To process multiple inputs at once, while still performing the same operations, you will only need to flip a couple things around.\n" + ] + }, + { + "cell_type": "markdown", + "id": "17023bb7", + "metadata": {}, + "source": [ + "**a)** Complete the function `create_layers_batch` so that the weight matrix is the transpose of what it was when you only sent in one input at a time.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a241fd79", + "metadata": {}, + "outputs": [], + "source": [ + "def create_layers_batch(network_input_size, layer_output_sizes):\n", + " layers = []\n", + "\n", + " i_size = network_input_size\n", + " for layer_output_size in layer_output_sizes:\n", + " W = ...\n", + " b = ...\n", + " layers.append((W, b))\n", + "\n", + " i_size = layer_output_size\n", + " return layers" + ] + }, + { + "cell_type": "markdown", + "id": "a6349db6", + "metadata": {}, + "source": [ + "**b)** Make a matrix of inputs with the shape (number of inputs, number of features), you choose the number of inputs and features per input. Then complete the function `feed_forward_batch` so that you can process this matrix of inputs with only one matrix multiplication and one broadcasted vector addition per layer. 
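To make the shapes concrete before you fill in the function, here is a small sketch of the convention (the array names `X_batch`, `W`, `b` are only illustrative and not part of the exercise skeleton):

```python
import numpy as np

X_batch = np.random.rand(5, 4)  # (n_inputs, n_features): one input per row
W = np.random.randn(4, 12)      # (n_features, n_outputs): transposed vs. the single-input case
b = np.random.randn(12)         # (n_outputs,)

Z = X_batch @ W + b             # one matrix multiplication plus one broadcasted addition
print(Z.shape)                  # -> (5, 12)
```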
(Hint: You will only need to swap two variable around from your previous implementation, but remember to test that you get the same results for equivelent inputs!)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "425f3bcc", + "metadata": {}, + "outputs": [], + "source": [ + "inputs = np.random.rand(1000, 4)\n", + "\n", + "\n", + "def feed_forward_batch(inputs, layers, activation_funcs):\n", + " a = inputs\n", + " for (W, b), activation_func in zip(layers, activation_funcs):\n", + " z = ...\n", + " a = ...\n", + " return a" + ] + }, + { + "cell_type": "markdown", + "id": "efd07b4e", + "metadata": {}, + "source": [ + "**c)** Create and evaluate a neural network with 4 input features, and layers with output sizes 12, 10, 3 and activations ReLU, ReLU, softmax.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ce6fcc2f", + "metadata": {}, + "outputs": [], + "source": [ + "network_input_size = ...\n", + "layer_output_sizes = [...]\n", + "activation_funcs = [...]\n", + "layers = create_layers_batch(network_input_size, layer_output_sizes)\n", + "\n", + "x = np.random.randn(network_input_size)\n", + "feed_forward_batch(inputs, layers, activation_funcs)" + ] + }, + { + "cell_type": "markdown", + "id": "87999271", + "metadata": {}, + "source": [ + "You should use this batched approach moving forward, as it will lead to much more compact code. However, remember that each input is still treated separately, and that you will need to keep in mind the transposed weight matrix and other details when implementing backpropagation.\n" + ] + }, + { + "cell_type": "markdown", + "id": "237eb782", + "metadata": {}, + "source": [ + "# Exercise 6 - Predicting on real data\n" + ] + }, + { + "cell_type": "markdown", + "id": "54d5fde7", + "metadata": {}, + "source": [ + "You will now evaluate your neural network on the iris data set (https://scikit-learn.org/1.5/auto_examples/datasets/plot_iris_dataset.html).\n", + "\n", + "This dataset contains data on 150 flowers of 3 different types which can be separated pretty well using the four features given for each flower, which includes the width and length of their leaves. 
You are will later train your network to actually make good predictions.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "6bd4c148", + "metadata": {}, + "outputs": [], + "source": [ + "iris = datasets.load_iris()\n", + "\n", + "_, ax = plt.subplots()\n", + "scatter = ax.scatter(iris.data[:, 0], iris.data[:, 1], c=iris.target)\n", + "ax.set(xlabel=iris.feature_names[0], ylabel=iris.feature_names[1])\n", + "_ = ax.legend(\n", + " scatter.legend_elements()[0], iris.target_names, loc=\"lower right\", title=\"Classes\"\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ed3e2fc9", + "metadata": {}, + "outputs": [], + "source": [ + "inputs = iris.data\n", + "\n", + "# Since each prediction is a vector with a score for each of the three types of flowers,\n", + "# we need to make each target a vector with a 1 for the correct flower and a 0 for the others.\n", + "targets = np.zeros((len(iris.data), 3))\n", + "for i, t in enumerate(iris.target):\n", + " targets[i, t] = 1\n", + "\n", + "\n", + "def accuracy(predictions, targets):\n", + " one_hot_predictions = np.zeros(predictions.shape)\n", + "\n", + " for i, prediction in enumerate(predictions):\n", + " one_hot_predictions[i, np.argmax(prediction)] = 1\n", + " return accuracy_score(one_hot_predictions, targets)" + ] + }, + { + "cell_type": "markdown", + "id": "0362c4a9", + "metadata": {}, + "source": [ + "**a)** What should the input size for the network be with this dataset? What should the output size of the last layer be?\n" + ] + }, + { + "cell_type": "markdown", + "id": "bf62607e", + "metadata": {}, + "source": [ + "**b)** Create a network with two hidden layers, the first with sigmoid activation and the last with softmax, the first layer should have 8 \"nodes\", the second has the number of nodes you found in exercise a). Softmax returns a \"probability distribution\", in the sense that the numbers in the output are positive and add up to 1 and, their magnitude are in some sense relative to their magnitude before going through the softmax function. Remember to use the batched version of the create_layers and feed forward functions.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "5366d4ae", + "metadata": {}, + "outputs": [], + "source": [ + "...\n", + "layers = ..." + ] + }, + { + "cell_type": "markdown", + "id": "c528846f", + "metadata": {}, + "source": [ + "**c)** Evaluate your model on the entire iris dataset! For later purposes, we will split the data into train and test sets, and compute gradients on smaller batches of the training data. But for now, evaluate the network on the whole thing at once.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "6c783105", + "metadata": {}, + "outputs": [], + "source": [ + "predictions = feed_forward_batch(inputs, layers, activation_funcs)" + ] + }, + { + "cell_type": "markdown", + "id": "01a3caa8", + "metadata": {}, + "source": [ + "**d)** Compute the accuracy of your model using the accuracy function defined above. 
Recreate your model a couple times and see how the accuracy changes.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a2612b82", + "metadata": {}, + "outputs": [], + "source": [ + "print(accuracy(predictions, targets))" + ] + }, + { + "cell_type": "markdown", + "id": "334560b6", + "metadata": {}, + "source": [ + "# Exercise 7 - Training on real data (Optional)\n", + "\n", + "To be able to actually do anything useful with your neural network, you need to train it. For this, we need a cost function and a way to take the gradient of the cost function wrt. the network parameters. The following exercises guide you through taking the gradient using autograd, and updating the network parameters using the gradient. Feel free to implement gradient methods like ADAM if you finish everything.\n" + ] + }, + { + "cell_type": "markdown", + "id": "700cabe4", + "metadata": {}, + "source": [ + "Since we are doing a classification task with multiple output classes, we use the cross-entropy loss function, which can evaluate performance on classification tasks. It sees if your prediction is \"most certain\" on the correct target.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f30e6e2c", + "metadata": {}, + "outputs": [], + "source": [ + "def cross_entropy(predict, target):\n", + " return np.sum(-target * np.log(predict))\n", + "\n", + "\n", + "def cost(input, layers, activation_funcs, target):\n", + " predict = feed_forward_batch(input, layers, activation_funcs)\n", + " return cross_entropy(predict, target)" + ] + }, + { + "cell_type": "markdown", + "id": "7ea9c1a4", + "metadata": {}, + "source": [ + "To improve our network on whatever prediction task we have given it, we need to use a sensible cost function, take the gradient of that cost function with respect to our network parameters, the weights and biases, and then update the weights and biases using these gradients. To clarify, we need to find and use these\n", + "\n", + "$$\n", + "\\frac{\\partial C}{\\partial W}, \\frac{\\partial C}{\\partial b}\n", + "$$\n" + ] + }, + { + "cell_type": "markdown", + "id": "6c753e3b", + "metadata": {}, + "source": [ + "Now we need to compute these gradients. This is pretty hard to do for a neural network, we will use most of next week to do this, but we can also use autograd to just do it for us, which is what we always do in practice. With the code cell below, we create a function which takes all of these gradients for us.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "56bef776", + "metadata": {}, + "outputs": [], + "source": [ + "from autograd import grad\n", + "\n", + "\n", + "gradient_func = grad(\n", + " cost, 1\n", + ") # Taking the gradient wrt. the second input to the cost function, i.e. the layers" + ] + }, + { + "cell_type": "markdown", + "id": "7b1b74bc", + "metadata": {}, + "source": [ + "**a)** What shape should the gradient of the cost function wrt. weights and biases be?\n", + "\n", + "**b)** Use the `gradient_func` function to take the gradient of the cross entropy wrt. the weights and biases of the network. Check the shapes of what's inside. 
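A minimal way to inspect the result, once `layers_grad` has been computed in the cell below (autograd returns gradients with the same nested structure as the `layers` argument):

```python
# Each entry mirrors one layer: a (weight gradient, bias gradient) pair
for W_g, b_g in layers_grad:
    print(W_g.shape, b_g.shape)
```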
What does the `grad` func from autograd actually do?\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "841c9e87", + "metadata": {}, + "outputs": [], + "source": [ + "layers_grad = gradient_func(\n", + " inputs, layers, activation_funcs, targets\n", + ") # Don't change this" + ] + }, + { + "cell_type": "markdown", + "id": "adc9e9be", + "metadata": {}, + "source": [ + "**c)** Finish the `train_network` function.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "6e4d38d3", + "metadata": {}, + "outputs": [], + "source": [ + "def train_network(\n", + " inputs, layers, activation_funcs, targets, learning_rate=0.001, epochs=100\n", + "):\n", + " for i in range(epochs):\n", + " layers_grad = gradient_func(inputs, layers, activation_funcs, targets)\n", + " for (W, b), (W_g, b_g) in zip(layers, layers_grad):\n", + " W -= ...\n", + " b -= ..." + ] + }, + { + "cell_type": "markdown", + "id": "2f65d663", + "metadata": {}, + "source": [ + "**e)** What do we call the gradient method used above?\n" + ] + }, + { + "cell_type": "markdown", + "id": "7059dd8c", + "metadata": {}, + "source": [ + "**d)** Train your network and see how the accuracy changes! Make a plot if you want.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "5027c7a5", + "metadata": {}, + "outputs": [], + "source": [ + "..." + ] + }, + { + "cell_type": "markdown", + "id": "3bc77016", + "metadata": {}, + "source": [ + "**e)** How high of an accuracy is it possible to acheive with a neural network on this dataset, if we use the whole thing as training data?\n" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.15" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/doc/LectureNotes/exercisesweek42.ipynb b/doc/LectureNotes/exercisesweek42.ipynb new file mode 100644 index 000000000..9925836a4 --- /dev/null +++ b/doc/LectureNotes/exercisesweek42.ipynb @@ -0,0 +1,719 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Exercises week 42\n", + "\n", + "**October 13-17, 2025**\n", + "\n", + "Date: **Deadline is Friday October 17 at midnight**\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Overarching aims of the exercises this week\n", + "\n", + "The aim of the exercises this week is to train the neural network you implemented last week.\n", + "\n", + "To train neural networks, we use gradient descent, since there is no analytical expression for the optimal parameters. This means you will need to compute the gradient of the cost function wrt. the network parameters. And then you will need to implement some gradient method.\n", + "\n", + "You will begin by computing gradients for a network with one layer, then two layers, then any number of layers. Keeping track of the shapes and doing things step by step will be very important this week.\n", + "\n", + "We recommend that you do the exercises this week by editing and running this notebook file, as it includes some checks along the way that you have implemented the neural network correctly, and running small parts of the code at a time will be important for understanding the methods. 
If you have trouble running a notebook, you can run this notebook in google colab instead(https://colab.research.google.com/drive/1FfvbN0XlhV-lATRPyGRTtTBnJr3zNuHL#offline=true&sandboxMode=true), though we recommend that you set up VSCode and your python environment to run code like this locally.\n", + "\n", + "First, some setup code that you will need.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [], + "source": [ + "import autograd.numpy as np # We need to use this numpy wrapper to make automatic differentiation work later\n", + "from autograd import grad, elementwise_grad\n", + "from sklearn import datasets\n", + "import matplotlib.pyplot as plt\n", + "from sklearn.metrics import accuracy_score\n", + "\n", + "\n", + "# Defining some activation functions\n", + "def ReLU(z):\n", + " return np.where(z > 0, z, 0)\n", + "\n", + "\n", + "# Derivative of the ReLU function\n", + "def ReLU_der(z):\n", + " return np.where(z > 0, 1, 0)\n", + "\n", + "\n", + "def sigmoid(z):\n", + " return 1 / (1 + np.exp(-z))\n", + "\n", + "\n", + "def mse(predict, target):\n", + " return np.mean((predict - target) ** 2)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Exercise 1 - Understand the feed forward pass\n", + "\n", + "**a)** Complete last weeks' exercises if you haven't already (recommended).\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Exercise 2 - Gradient with one layer using autograd\n", + "\n", + "For the first few exercises, we will not use batched inputs. Only a single input vector is passed through the layer at a time.\n", + "\n", + "In this exercise you will compute the gradient of a single layer. You only need to change the code in the cells right below an exercise, the rest works out of the box. Feel free to make changes and see how stuff works though!\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**a)** If the weights and bias of a layer has shapes (10, 4) and (10), what will the shapes of the gradients of the cost function wrt. these weights and this bias be?\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**b)** Complete the feed_forward_one_layer function. It should use the sigmoid activation function. Also define the weigth and bias with the correct shapes.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 41, + "metadata": {}, + "outputs": [], + "source": [ + "def feed_forward_one_layer(W, b, x):\n", + " z = ...\n", + " a = ...\n", + " return a\n", + "\n", + "\n", + "def cost_one_layer(W, b, x, target):\n", + " predict = feed_forward_one_layer(W, b, x)\n", + " return mse(predict, target)\n", + "\n", + "\n", + "x = np.random.rand(2)\n", + "target = np.random.rand(3)\n", + "\n", + "W = ...\n", + "b = ..." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**c)** Compute the gradient of the cost function wrt. the weigth and bias by running the cell below. You will not need to change anything, just make sure it runs by defining things correctly in the cell above. 
This code uses the autograd package which uses backprogagation to compute the gradient!\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "autograd_one_layer = grad(cost_one_layer, [0, 1])\n", + "W_g, b_g = autograd_one_layer(W, b, x, target)\n", + "print(W_g, b_g)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Exercise 3 - Gradient with one layer writing backpropagation by hand\n", + "\n", + "Before you use the gradient you found using autograd, you will have to find the gradient \"manually\", to better understand how the backpropagation computation works. To do backpropagation \"manually\", you will need to write out expressions for many derivatives along the computation.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We want to find the gradient of the cost function wrt. the weight and bias. This is quite hard to do directly, so we instead use the chain rule to combine multiple derivatives which are easier to compute.\n", + "\n", + "$$\n", + "\\frac{dC}{dW} = \\frac{dC}{da}\\frac{da}{dz}\\frac{dz}{dW}\n", + "$$\n", + "\n", + "$$\n", + "\\frac{dC}{db} = \\frac{dC}{da}\\frac{da}{dz}\\frac{dz}{db}\n", + "$$\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**a)** Which intermediary results can be reused between the two expressions?\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**b)** What is the derivative of the cost wrt. the final activation? You can use the autograd calculation to make sure you get the correct result. Remember that we compute the mean in mse.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "z = W @ x + b\n", + "a = sigmoid(z)\n", + "\n", + "predict = a\n", + "\n", + "\n", + "def mse_der(predict, target):\n", + " return ...\n", + "\n", + "\n", + "print(mse_der(predict, target))\n", + "\n", + "cost_autograd = grad(mse, 0)\n", + "print(cost_autograd(predict, target))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**c)** What is the expression for the derivative of the sigmoid activation function? You can use the autograd calculation to make sure you get the correct result.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def sigmoid_der(z):\n", + " return ...\n", + "\n", + "\n", + "print(sigmoid_der(z))\n", + "\n", + "sigmoid_autograd = elementwise_grad(sigmoid, 0)\n", + "print(sigmoid_autograd(z))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**d)** Using the two derivatives you just computed, compute this intermetidary gradient you will use later:\n", + "\n", + "$$\n", + "\\frac{dC}{dz} = \\frac{dC}{da}\\frac{da}{dz}\n", + "$$\n" + ] + }, + { + "cell_type": "code", + "execution_count": 54, + "metadata": {}, + "outputs": [], + "source": [ + "dC_da = ...\n", + "dC_dz = ..." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**e)** What is the derivative of the intermediary z wrt. the weight and bias? What should the shapes be? The one for the weights is a little tricky, it can be easier to play around in the next exercise first. You can also try computing it with autograd to get a hint.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**f)** Now combine the expressions you have worked with so far to compute the gradients! 
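For reference, one way the combined computation can look for this single layer, assuming you have filled in `mse_der` and `sigmoid_der` above (a sketch, not the only valid form; the outer product is what gives the weight gradient the same shape as `W`):

```python
dC_da = mse_der(a, target)       # dC/da
dC_dz = dC_da * sigmoid_der(z)   # dC/dz = dC/da * da/dz (elementwise)
dC_dW = np.outer(dC_dz, x)       # dC/dW, same shape as W, since dz_i/dW_ij = x_j
dC_db = dC_dz                    # dC/db, since dz/db is the identity
```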
Note that you always need to do a feed forward pass while saving the zs and as before you do backpropagation, as they are used in the derivative expressions\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "dC_da = ...\n", + "dC_dz = ...\n", + "dC_dW = ...\n", + "dC_db = ...\n", + "\n", + "print(dC_dW, dC_db)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "You should get the same results as with autograd.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "W_g, b_g = autograd_one_layer(W, b, x, target)\n", + "print(W_g, b_g)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Exercise 4 - Gradient with two layers writing backpropagation by hand\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now that you have implemented backpropagation for one layer, you have found most of the expressions you will need for more layers. Let's move up to two layers.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 59, + "metadata": {}, + "outputs": [], + "source": [ + "x = np.random.rand(2)\n", + "target = np.random.rand(4)\n", + "\n", + "W1 = np.random.rand(3, 2)\n", + "b1 = np.random.rand(3)\n", + "\n", + "W2 = np.random.rand(4, 3)\n", + "b2 = np.random.rand(4)\n", + "\n", + "layers = [(W1, b1), (W2, b2)]" + ] + }, + { + "cell_type": "code", + "execution_count": 60, + "metadata": {}, + "outputs": [], + "source": [ + "z1 = W1 @ x + b1\n", + "a1 = sigmoid(z1)\n", + "z2 = W2 @ a1 + b2\n", + "a2 = sigmoid(z2)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We begin by computing the gradients of the last layer, as the gradients must be propagated backwards from the end.\n", + "\n", + "**a)** Compute the gradients of the last layer, just like you did the single layer in the previous exercise.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 61, + "metadata": {}, + "outputs": [], + "source": [ + "dC_da2 = ...\n", + "dC_dz2 = ...\n", + "dC_dW2 = ...\n", + "dC_db2 = ..." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "To find the derivative of the cost wrt. the activation of the first layer, we need a new expression, the one furthest to the right in the following.\n", + "\n", + "$$\n", + "\\frac{dC}{da_1} = \\frac{dC}{dz_2}\\frac{dz_2}{da_1}\n", + "$$\n", + "\n", + "**b)** What is the derivative of the second layer intermetiate wrt. the first layer activation? (First recall how you compute $z_2$)\n", + "\n", + "$$\n", + "\\frac{dz_2}{da_1}\n", + "$$\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**c)** Use this expression, together with expressions which are equivelent to ones for the last layer to compute all the derivatives of the first layer.\n", + "\n", + "$$\n", + "\\frac{dC}{dW_1} = \\frac{dC}{da_1}\\frac{da_1}{dz_1}\\frac{dz_1}{dW_1}\n", + "$$\n", + "\n", + "$$\n", + "\\frac{dC}{db_1} = \\frac{dC}{da_1}\\frac{da_1}{dz_1}\\frac{dz_1}{db_1}\n", + "$$\n" + ] + }, + { + "cell_type": "code", + "execution_count": 63, + "metadata": {}, + "outputs": [], + "source": [ + "dC_da1 = ...\n", + "dC_dz1 = ...\n", + "dC_dW1 = ...\n", + "dC_db1 = ..." 
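# One possible set of expressions (an illustrative sketch; it assumes dC_dz2 was
# computed in a) above and that sigmoid_der from exercise 3 is available):
#
#   dC_da1 = W2.T @ dC_dz2            # propagate through z2 = W2 @ a1 + b2
#   dC_dz1 = dC_da1 * sigmoid_der(z1)
#   dC_dW1 = np.outer(dC_dz1, x)
#   dC_db1 = dC_dz1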
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "print(dC_dW1, dC_db1)\n", + "print(dC_dW2, dC_db2)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**d)** Make sure you got the same gradient as the following code which uses autograd to do backpropagation.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 67, + "metadata": {}, + "outputs": [], + "source": [ + "def feed_forward_two_layers(layers, x):\n", + " W1, b1 = layers[0]\n", + " z1 = W1 @ x + b1\n", + " a1 = sigmoid(z1)\n", + "\n", + " W2, b2 = layers[1]\n", + " z2 = W2 @ a1 + b2\n", + " a2 = sigmoid(z2)\n", + "\n", + " return a2" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def cost_two_layers(layers, x, target):\n", + " predict = feed_forward_two_layers(layers, x)\n", + " return mse(predict, target)\n", + "\n", + "\n", + "grad_two_layers = grad(cost_two_layers, 0)\n", + "grad_two_layers(layers, x, target)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**e)** How would you use the gradient from this layer to compute the gradient of an even earlier layer? Would the expressions be any different?\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Exercise 5 - Gradient with any number of layers writing backpropagation by hand\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Well done on getting this far! Now it's time to compute the gradient with any number of layers.\n", + "\n", + "First, some code from the general neural network code from last week. Note that we are still sending in one input vector at a time. We will change it to use batched inputs later.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [], + "source": [ + "def create_layers(network_input_size, layer_output_sizes):\n", + " layers = []\n", + "\n", + " i_size = network_input_size\n", + " for layer_output_size in layer_output_sizes:\n", + " W = np.random.randn(layer_output_size, i_size)\n", + " b = np.random.randn(layer_output_size)\n", + " layers.append((W, b))\n", + "\n", + " i_size = layer_output_size\n", + " return layers\n", + "\n", + "\n", + "def feed_forward(input, layers, activation_funcs):\n", + " a = input\n", + " for (W, b), activation_func in zip(layers, activation_funcs):\n", + " z = W @ a + b\n", + " a = activation_func(z)\n", + " return a\n", + "\n", + "\n", + "def cost(layers, input, activation_funcs, target):\n", + " predict = feed_forward(input, layers, activation_funcs)\n", + " return mse(predict, target)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "You might have already have noticed a very important detail in backpropagation: You need the values from the forward pass to compute all the gradients! 
The feed forward method above is great for efficiency and for using autograd, as it only cares about computing the final output, but now we need to also save the results along the way.\n", + "\n", + "Here is a function which does that for you.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [], + "source": [ + "def feed_forward_saver(input, layers, activation_funcs):\n", + " layer_inputs = []\n", + " zs = []\n", + " a = input\n", + " for (W, b), activation_func in zip(layers, activation_funcs):\n", + " layer_inputs.append(a)\n", + " z = W @ a + b\n", + " a = activation_func(z)\n", + "\n", + " zs.append(z)\n", + "\n", + " return layer_inputs, zs, a" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**a)** Now, complete the backpropagation function so that it returns the gradient of the cost function wrt. all the weigths and biases. Use the autograd calculation below to make sure you get the correct answer.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def backpropagation(\n", + " input, layers, activation_funcs, target, activation_ders, cost_der=mse_der\n", + "):\n", + " layer_inputs, zs, predict = feed_forward_saver(input, layers, activation_funcs)\n", + "\n", + " layer_grads = [() for layer in layers]\n", + "\n", + " # We loop over the layers, from the last to the first\n", + " for i in reversed(range(len(layers))):\n", + " layer_input, z, activation_der = layer_inputs[i], zs[i], activation_ders[i]\n", + "\n", + " if i == len(layers) - 1:\n", + " # For last layer we use cost derivative as dC_da(L) can be computed directly\n", + " dC_da = ...\n", + " else:\n", + " # For other layers we build on previous z derivative, as dC_da(i) = dC_dz(i+1) * dz(i+1)_da(i)\n", + " (W, b) = layers[i + 1]\n", + " dC_da = ...\n", + "\n", + " dC_dz = ...\n", + " dC_dW = ...\n", + " dC_db = ...\n", + "\n", + " layer_grads[i] = (dC_dW, dC_db)\n", + "\n", + " return layer_grads" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "network_input_size = 2\n", + "layer_output_sizes = [3, 4]\n", + "activation_funcs = [sigmoid, ReLU]\n", + "activation_ders = [sigmoid_der, ReLU_der]\n", + "\n", + "layers = create_layers(network_input_size, layer_output_sizes)\n", + "\n", + "x = np.random.rand(network_input_size)\n", + "target = np.random.rand(4)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "layer_grads = backpropagation(x, layers, activation_funcs, target, activation_ders)\n", + "print(layer_grads)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "cost_grad = grad(cost, 0)\n", + "cost_grad(layers, x, [sigmoid, ReLU], target)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Exercise 6 - Batched inputs\n", + "\n", + "Make new versions of all the functions in exercise 5 which now take batched inputs instead. See last weeks exercise 5 for details on how to batch inputs to neural networks. 
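As a reminder of the batched layout from last week, one possible version of the forward-pass functions could look like this (a sketch, assuming inputs are stacked as rows of shape `(n_inputs, n_features)`):

```python
def create_layers_batch(network_input_size, layer_output_sizes):
    # Same as create_layers, but with W transposed so that a batch of row vectors
    # can be right-multiplied: (n_inputs, n_in) @ (n_in, n_out) -> (n_inputs, n_out)
    layers = []
    i_size = network_input_size
    for layer_output_size in layer_output_sizes:
        W = np.random.randn(i_size, layer_output_size)
        b = np.random.randn(layer_output_size)
        layers.append((W, b))
        i_size = layer_output_size
    return layers


def feed_forward_batch(inputs, layers, activation_funcs):
    a = inputs
    for (W, b), activation_func in zip(layers, activation_funcs):
        z = a @ W + b  # one matrix multiplication plus a broadcasted bias per layer
        a = activation_func(z)
    return a
```

With this layout, the per-layer weight gradient ends up involving `layer_input.T @ dC_dz` (which sums over the batch) rather than an outer product.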
You will also need to update the backpropogation function.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Exercise 7 - Training\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**a)** Complete exercise 6 and 7 from last week, but use your own backpropogation implementation to compute the gradient.\n", + "- IMPORTANT: Do not implement the derivative terms for softmax and cross-entropy separately, it will be very hard!\n", + "- Instead, use the fact that the derivatives multiplied together simplify to **prediction - target** (see [source1](https://medium.com/data-science/derivative-of-the-softmax-function-and-the-categorical-cross-entropy-loss-ffceefc081d1), [source2](https://shivammehta25.github.io/posts/deriving-categorical-cross-entropy-and-softmax/))\n", + "\n", + "**b)** Use stochastic gradient descent with momentum when you train your network.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Exercise 8 (Optional) - Object orientation\n", + "\n", + "Passing in the layers, activations functions, activation derivatives and cost derivatives into the functions each time leads to code which is easy to understand in isoloation, but messier when used in a larger context with data splitting, data scaling, gradient methods and so forth. Creating an object which stores these values can lead to code which is much easier to use.\n", + "\n", + "**a)** Write a neural network class. You are free to implement it how you see fit, though we strongly recommend to not save any input or output values as class attributes, nor let the neural network class handle gradient methods internally. Gradient methods should be handled outside, by performing general operations on the layer_grads list using functions or classes separate to the neural network.\n", + "\n", + "We provide here a skeleton structure which should get you started.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "class NeuralNetwork:\n", + " def __init__(\n", + " self,\n", + " network_input_size,\n", + " layer_output_sizes,\n", + " activation_funcs,\n", + " activation_ders,\n", + " cost_fun,\n", + " cost_der,\n", + " ):\n", + " pass\n", + "\n", + " def predict(self, inputs):\n", + " # Simple feed forward pass\n", + " pass\n", + "\n", + " def cost(self, inputs, targets):\n", + " pass\n", + "\n", + " def _feed_forward_saver(self, inputs):\n", + " pass\n", + "\n", + " def compute_gradient(self, inputs, targets):\n", + " pass\n", + "\n", + " def update_weights(self, layer_grads):\n", + " pass\n", + "\n", + " # These last two methods are not needed in the project, but they can be nice to have! 
The first one has a layers parameter so that you can use autograd on it\n", + " def autograd_compliant_predict(self, layers, inputs):\n", + " pass\n", + "\n", + " def autograd_gradient(self, inputs, targets):\n", + " pass" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.15" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/doc/LectureNotes/exercisesweek43.ipynb b/doc/LectureNotes/exercisesweek43.ipynb new file mode 100644 index 000000000..f80e8787a --- /dev/null +++ b/doc/LectureNotes/exercisesweek43.ipynb @@ -0,0 +1,647 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "860d70d8", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "119c0988", + "metadata": { + "editable": true + }, + "source": [ + "# Exercises week 43 \n", + "**October 20-24, 2025**\n", + "\n", + "Date: **Deadline Friday October 24 at midnight**" + ] + }, + { + "cell_type": "markdown", + "id": "909887eb", + "metadata": { + "editable": true + }, + "source": [ + "# Overarching aims of the exercises for week 43\n", + "\n", + "The aim of the exercises this week is to gain some confidence with\n", + "ways to visualize the results of a classification problem. We will\n", + "target three ways of setting up the analysis. The first and simplest\n", + "one is the\n", + "1. so-called confusion matrix. The next one is the so-called\n", + "\n", + "2. ROC curve. Finally we have the\n", + "\n", + "3. Cumulative gain curve.\n", + "\n", + "We will use Logistic Regression as method for the classification in\n", + "this exercise. You can compare these results with those obtained with\n", + "your neural network code from project 2 without a hidden layer.\n", + "\n", + "In these exercises we will use binary and multi-class data sets\n", + "(the Iris data set from week 41).\n", + "\n", + "The underlying mathematics is described here." + ] + }, + { + "cell_type": "markdown", + "id": "1e1cb4fb", + "metadata": { + "editable": true + }, + "source": [ + "### Confusion Matrix\n", + "\n", + "A **confusion matrix** summarizes a classifier’s performance by\n", + "tabulating predictions versus true labels. For binary classification,\n", + "it is a $2\\times2$ table whose entries are counts of outcomes:" + ] + }, + { + "cell_type": "markdown", + "id": "7b090385", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{array}{l|cc} & \\text{Predicted Positive} & \\text{Predicted Negative} \\\\ \\hline \\text{Actual Positive} & TP & FN \\\\ \\text{Actual Negative} & FP & TN \\end{array}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "1e14904b", + "metadata": { + "editable": true + }, + "source": [ + "Here TP (true positives) is the number of cases correctly predicted as\n", + "positive, FP (false positives) is the number incorrectly predicted as\n", + "positive, TN (true negatives) is correctly predicted negative, and FN\n", + "(false negatives) is incorrectly predicted negative . In other words,\n", + "“positive” means class 1 and “negative” means class 0; for example, TP\n", + "occurs when the prediction and actual are both positive. 
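+    "\n",
+    "As a purely illustrative example (the small arrays below are toy placeholders), scikit-learn can tabulate these counts for you:\n",
+    "\n",
+    "```python\n",
+    "import numpy as np\n",
+    "from sklearn.metrics import confusion_matrix\n",
+    "\n",
+    "# toy binary example: 1 = positive, 0 = negative\n",
+    "y_true = np.array([1, 1, 1, 0, 0, 0])\n",
+    "y_pred = np.array([1, 1, 0, 0, 0, 1])\n",
+    "\n",
+    "# rows are actual classes, columns are predicted classes;\n",
+    "# scikit-learn sorts the labels, so with labels [0, 1] the top-left entry is TN\n",
+    "print(confusion_matrix(y_true, y_pred))\n",
+    "```\n",
+    "\n",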
Formally:" + ] + }, + { + "cell_type": "markdown", + "id": "e93ea290", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\text{TPR} = \\frac{\\text{TP}}{\\text{TP} + \\text{FN}}, \\quad \\text{FPR} = \\frac{\\text{FP}}{\\text{FP} + \\text{TN}},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "c80bea5b", + "metadata": { + "editable": true + }, + "source": [ + "where TPR and FPR are the true and false positive rates defined below.\n", + "\n", + "In multiclass classification with $K$ classes, the confusion matrix\n", + "generalizes to a $K\\times K$ table. Entry $N_{ij}$ in the table is\n", + "the count of instances whose true class is $i$ and whose predicted\n", + "class is $j$. For example, a three-class confusion matrix can be written\n", + "as:" + ] + }, + { + "cell_type": "markdown", + "id": "a0f68f5f", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{array}{c|ccc} & \\text{Pred Class 1} & \\text{Pred Class 2} & \\text{Pred Class 3} \\\\ \\hline \\text{Act Class 1} & N_{11} & N_{12} & N_{13} \\\\ \\text{Act Class 2} & N_{21} & N_{22} & N_{23} \\\\ \\text{Act Class 3} & N_{31} & N_{32} & N_{33} \\end{array}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "869669b2", + "metadata": { + "editable": true + }, + "source": [ + "Here the diagonal entries $N_{ii}$ are the true positives for each\n", + "class, and off-diagonal entries are misclassifications. This matrix\n", + "allows computation of per-class metrics: e.g. for class $i$,\n", + "$\\mathrm{TP}_i=N_{ii}$, $\\mathrm{FN}_i=\\sum_{j\\neq i}N_{ij}$,\n", + "$\\mathrm{FP}_i=\\sum_{j\\neq i}N_{ji}$, and $\\mathrm{TN}_i$ is the sum of\n", + "all remaining entries.\n", + "\n", + "As defined above, TPR and FPR come from the binary case. In binary\n", + "terms with $P$ actual positives and $N$ actual negatives, one has" + ] + }, + { + "cell_type": "markdown", + "id": "2abd82a7", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\text{TPR} = \\frac{TP}{P} = \\frac{TP}{TP+FN}, \\quad \\text{FPR} =\n", + "\\frac{FP}{N} = \\frac{FP}{FP+TN},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "2f79325c", + "metadata": { + "editable": true + }, + "source": [ + "as used in standard confusion-matrix\n", + "formulations. These rates will be used in constructing ROC curves." + ] + }, + { + "cell_type": "markdown", + "id": "0ce65a47", + "metadata": { + "editable": true + }, + "source": [ + "### ROC Curve\n", + "\n", + "The Receiver Operating Characteristic (ROC) curve plots the trade-off\n", + "between true positives and false positives as a discrimination\n", + "threshold varies. Specifically, for a binary classifier that outputs\n", + "a score or probability, one varies the threshold $t$ for declaring\n", + "**positive**, and computes at each $t$ the true positive rate\n", + "$\\mathrm{TPR}(t)$ and false positive rate $\\mathrm{FPR}(t)$ using the\n", + "confusion matrix at that threshold. The ROC curve is then the graph\n", + "of TPR versus FPR. By definition," + ] + }, + { + "cell_type": "markdown", + "id": "d750fdff", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathrm{TPR} = \\frac{TP}{TP+FN}, \\qquad \\mathrm{FPR} = \\frac{FP}{FP+TN},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "561bfb2c", + "metadata": { + "editable": true + }, + "source": [ + "where $TP,FP,TN,FN$ are counts determined by threshold $t$. 
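+    "\n",
+    "To make this concrete, a small NumPy sketch (illustrative only, with placeholder array names) that computes the two rates at a single threshold could look like:\n",
+    "\n",
+    "```python\n",
+    "import numpy as np\n",
+    "\n",
+    "def tpr_fpr(y_true, y_prob, threshold):\n",
+    "    # predict positive when the estimated probability exceeds the threshold\n",
+    "    y_pred = (y_prob >= threshold).astype(int)\n",
+    "    tp = np.sum((y_pred == 1) & (y_true == 1))\n",
+    "    fp = np.sum((y_pred == 1) & (y_true == 0))\n",
+    "    fn = np.sum((y_pred == 0) & (y_true == 1))\n",
+    "    tn = np.sum((y_pred == 0) & (y_true == 0))\n",
+    "    return tp / (tp + fn), fp / (fp + tn)\n",
+    "```\n",
+    "\n",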
A perfect\n", + "classifier would reach the point (FPR=0, TPR=1) at some threshold.\n", + "\n", + "Formally, the ROC curve is obtained by plotting\n", + "$(\\mathrm{FPR}(t),\\mathrm{TPR}(t))$ for all $t\\in[0,1]$ (or as $t$\n", + "sweeps through the sorted scores). The Area Under the ROC Curve (AUC)\n", + "quantifies the average performance over all thresholds. It can be\n", + "interpreted probabilistically: $\\mathrm{AUC} =\n", + "\\Pr\\bigl(s(X^+)>s(X^-)\\bigr)$, the probability that a random positive\n", + "instance $X^+$ receives a higher score $s$ than a random negative\n", + "instance $X^-$ . Equivalently, the AUC is the integral under the ROC\n", + "curve:" + ] + }, + { + "cell_type": "markdown", + "id": "5ca722fe", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathrm{AUC} \\;=\\; \\int_{0}^{1} \\mathrm{TPR}(f)\\,df,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "30080a86", + "metadata": { + "editable": true + }, + "source": [ + "where $f$ ranges over FPR (or fraction of negatives). A model that guesses at random yields a diagonal ROC (AUC=0.5), whereas a perfect model yields AUC=1.0." + ] + }, + { + "cell_type": "markdown", + "id": "9e627156", + "metadata": { + "editable": true + }, + "source": [ + "### Cumulative Gain\n", + "\n", + "The cumulative gain curve (or gains chart) evaluates how many\n", + "positives are captured as one targets an increasing fraction of the\n", + "population, sorted by model confidence. To construct it, sort all\n", + "instances by decreasing predicted probability of the positive class.\n", + "Then, for the top $\\alpha$ fraction of instances, compute the fraction\n", + "of all actual positives that fall in this subset. In formula form, if\n", + "$P$ is the total number of positive instances and $P(\\alpha)$ is the\n", + "number of positives among the top $\\alpha$ of the data, the cumulative\n", + "gain at level $\\alpha$ is" + ] + }, + { + "cell_type": "markdown", + "id": "3e9132ef", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathrm{Gain}(\\alpha) \\;=\\; \\frac{P(\\alpha)}{P}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "75be6f5c", + "metadata": { + "editable": true + }, + "source": [ + "For example, cutting off at the top 10% of predictions yields a gain\n", + "equal to (positives in top 10%) divided by (total positives) .\n", + "Plotting $\\mathrm{Gain}(\\alpha)$ versus $\\alpha$ (often in percent)\n", + "gives the gain curve. The baseline (random) curve is the diagonal\n", + "$\\mathrm{Gain}(\\alpha)=\\alpha$, while an ideal model has a steep climb\n", + "toward 1.\n", + "\n", + "A related measure is the {\\em lift}, often called the gain ratio. It is the ratio of the model’s capture rate to that of random selection. Equivalently," + ] + }, + { + "cell_type": "markdown", + "id": "e5525570", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathrm{Lift}(\\alpha) \\;=\\; \\frac{\\mathrm{Gain}(\\alpha)}{\\alpha}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "18ff8dc2", + "metadata": { + "editable": true + }, + "source": [ + "A lift $>1$ indicates better-than-random targeting. In practice, gain\n", + "and lift charts (used e.g.\\ in marketing or imbalanced classification)\n", + "show how many positives can be “gained” by focusing on a fraction of\n", + "the population ." 
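+    "\n",
+    "The scikit-plot functions used in the example code below compute this for you, but as a purely illustrative sketch (with placeholder array names) the gain curve can also be obtained directly:\n",
+    "\n",
+    "```python\n",
+    "import numpy as np\n",
+    "\n",
+    "def cumulative_gain(y_true, y_prob):\n",
+    "    # sort the true labels by decreasing predicted probability of the positive class\n",
+    "    order = np.argsort(y_prob)[::-1]\n",
+    "    y_sorted = y_true[order]\n",
+    "    # fraction of all positives captured among the top k predictions, k = 1, ..., n\n",
+    "    gain = np.cumsum(y_sorted) / np.sum(y_true)\n",
+    "    alpha = np.arange(1, len(y_true) + 1) / len(y_true)\n",
+    "    return alpha, gain\n",
+    "```\n"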
+ ] + }, + { + "cell_type": "markdown", + "id": "c3d3fde8", + "metadata": { + "editable": true + }, + "source": [ + "### Other measures: Precision, Recall, and the F$_1$ Measure\n", + "\n", + "Precision and recall (sensitivity) quantify binary classification\n", + "accuracy in terms of positive predictions. They are defined from the\n", + "confusion matrix as:" + ] + }, + { + "cell_type": "markdown", + "id": "f1f14c8e", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\text{Precision} = \\frac{TP}{TP + FP}, \\qquad \\text{Recall} = \\frac{TP}{TP + FN}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "422cc743", + "metadata": { + "editable": true + }, + "source": [ + "Precision is the fraction of predicted positives that are correct, and\n", + "recall is the fraction of actual positives that are correctly\n", + "identified . A high-precision classifier makes few false-positive\n", + "errors, while a high-recall classifier makes few false-negative\n", + "errors.\n", + "\n", + "The F$_1$ score (balanced F-measure) combines precision and recall into a single metric via their harmonic mean. The usual formula is:" + ] + }, + { + "cell_type": "markdown", + "id": "621a2e8b", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "F_1 =2\\frac{\\text{Precision}\\times\\text{Recall}}{\\text{Precision} + \\text{Recall}}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "62eee54a", + "metadata": { + "editable": true + }, + "source": [ + "This can be shown to equal" + ] + }, + { + "cell_type": "markdown", + "id": "7a6a2e7a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{2\\,TP}{2\\,TP + FP + FN}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b96c9ff4", + "metadata": { + "editable": true + }, + "source": [ + "The F$_1$ score ranges from 0 (worst) to 1 (best), and balances the\n", + "trade-off between precision and recall.\n", + "\n", + "For multi-class classification, one computes per-class\n", + "precision/recall/F$_1$ (treating each class as “positive” in a\n", + "one-vs-rest manner) and then averages. Common averaging methods are:\n", + "\n", + "Micro-averaging: Sum all true positives, false positives, and false negatives across classes, then compute precision/recall/F$_1$ from these totals.\n", + "Macro-averaging: Compute the F$1$ score $F{1,i}$ for each class $i$ separately, then take the unweighted mean: $F_{1,\\mathrm{macro}} = \\frac{1}{K}\\sum_{i=1}^K F_{1,i}$ . This treats all classes equally regardless of size.\n", + "Weighted-averaging: Like macro-average, but weight each class’s $F_{1,i}$ by its support $n_i$ (true count): $F_{1,\\mathrm{weighted}} = \\frac{1}{N}\\sum_{i=1}^K n_i F_{1,i}$, where $N=\\sum_i n_i$. This accounts for class imbalance by giving more weight to larger classes .\n", + "\n", + "Each of these averages has different use-cases. Micro-average is\n", + "dominated by common classes, macro-average highlights performance on\n", + "rare classes, and weighted-average is a compromise. These formulas\n", + "and concepts allow rigorous evaluation of classifier performance in\n", + "both binary and multi-class settings." 
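+    "\n",
+    "If you want to check these averaging conventions numerically, scikit-learn implements them directly; the small arrays below are only a toy illustration:\n",
+    "\n",
+    "```python\n",
+    "import numpy as np\n",
+    "from sklearn.metrics import f1_score\n",
+    "\n",
+    "# tiny synthetic three-class example\n",
+    "y_true = np.array([0, 0, 1, 1, 2, 2])\n",
+    "y_pred = np.array([0, 1, 1, 1, 2, 0])\n",
+    "\n",
+    "print(f1_score(y_true, y_pred, average=None))        # one F1 score per class\n",
+    "print(f1_score(y_true, y_pred, average='micro'))     # from global TP, FP, FN counts\n",
+    "print(f1_score(y_true, y_pred, average='macro'))     # unweighted mean over classes\n",
+    "print(f1_score(y_true, y_pred, average='weighted'))  # weighted by class support\n",
+    "```\n"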
+ ] + }, + { + "cell_type": "markdown", + "id": "9274bf3f", + "metadata": { + "editable": true + }, + "source": [ + "## Exercises\n", + "\n", + "Here is a simple code example which uses the Logistic regression machinery from **scikit-learn**.\n", + "At the end it sets up the confusion matrix and the ROC and cumulative gain curves.\n", + "Feel free to use these functionalities (we don't expect you to write your own code for say the confusion matrix)." + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "be9ff0b9", + "metadata": { + "collapsed": false, + "editable": true, + "jupyter": { + "outputs_hidden": false + } + }, + "outputs": [], + "source": [ + "%matplotlib inline\n", + "\n", + "import matplotlib.pyplot as plt\n", + "import numpy as np\n", + "from sklearn.model_selection import train_test_split \n", + "# from sklearn.datasets import fill in the data set\n", + "from sklearn.linear_model import LogisticRegression\n", + "\n", + "# Load the data, fill inn\n", + "mydata.data = ?\n", + "\n", + "X_train, X_test, y_train, y_test = train_test_split(mydata.data,cancer.target,random_state=0)\n", + "print(X_train.shape)\n", + "print(X_test.shape)\n", + "# Logistic Regression\n", + "# define which type of problem, binary or multiclass\n", + "logreg = LogisticRegression(solver='lbfgs')\n", + "logreg.fit(X_train, y_train)\n", + "\n", + "from sklearn.preprocessing import LabelEncoder\n", + "from sklearn.model_selection import cross_validate\n", + "#Cross validation\n", + "accuracy = cross_validate(logreg,X_test,y_test,cv=10)['test_score']\n", + "print(accuracy)\n", + "print(\"Test set accuracy with Logistic Regression: {:.2f}\".format(logreg.score(X_test,y_test)))\n", + "\n", + "import scikitplot as skplt\n", + "y_pred = logreg.predict(X_test)\n", + "skplt.metrics.plot_confusion_matrix(y_test, y_pred, normalize=True)\n", + "plt.show()\n", + "y_probas = logreg.predict_proba(X_test)\n", + "skplt.metrics.plot_roc(y_test, y_probas)\n", + "plt.show()\n", + "skplt.metrics.plot_cumulative_gain(y_test, y_probas)\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "51760b3e", + "metadata": { + "editable": true + }, + "source": [ + "### Exercise a)\n", + "\n", + "Convince yourself about the mathematics for the confusion matrix, the ROC and the cumlative gain curves for both a binary and a multiclass classification problem." + ] + }, + { + "cell_type": "markdown", + "id": "c1d42f5f", + "metadata": { + "editable": true + }, + "source": [ + "### Exercise b)\n", + "\n", + "Use a binary classification data available from **scikit-learn**. As an example you can use\n", + "the MNIST data set and just specialize to two numbers. To do so you can use the following code lines" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "d20bb8be", + "metadata": { + "collapsed": false, + "editable": true, + "jupyter": { + "outputs_hidden": false + } + }, + "outputs": [], + "source": [ + "from sklearn.datasets import load_digits\n", + "digits = load_digits(n_class=2) # Load only two classes, e.g., 0 and 1\n", + "X, y = digits.data, digits.target" + ] + }, + { + "cell_type": "markdown", + "id": "828ea1cd", + "metadata": { + "editable": true + }, + "source": [ + "Alternatively, you can use the _make$\\_$classification_\n", + "functionality. This function generates a random $n$-class classification\n", + "dataset, which can be configured for binary classification by setting\n", + "n_classes=2. 
You can also control the number of samples, features,\n", + "informative features, redundant features, and more." + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "d271f0ba", + "metadata": { + "collapsed": false, + "editable": true, + "jupyter": { + "outputs_hidden": false + } + }, + "outputs": [], + "source": [ + "from sklearn.datasets import make_classification\n", + "X, y = make_classification(n_samples=1000, n_features=20, n_informative=10, n_redundant=5, n_classes=2, random_state=42)" + ] + }, + { + "cell_type": "markdown", + "id": "0068b032", + "metadata": { + "editable": true + }, + "source": [ + "You can use this option for the multiclass case as well, see the next exercise.\n", + "If you prefer to study other binary classification datasets, feel free\n", + "to replace the above suggestions with your own dataset.\n", + "\n", + "Make plots of the confusion matrix, the ROC curve and the cumulative gain curve." + ] + }, + { + "cell_type": "markdown", + "id": "c45f5b41", + "metadata": { + "editable": true + }, + "source": [ + "### Exercise c) week 43\n", + "\n", + "As a multiclass problem, we will use the Iris data set discussed in\n", + "the exercises from weeks 41 and 42. This is a three-class data set and\n", + "you can set it up using **scikit-learn**," + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "3b045d56", + "metadata": { + "collapsed": false, + "editable": true, + "jupyter": { + "outputs_hidden": false + } + }, + "outputs": [], + "source": [ + "from sklearn.datasets import load_iris\n", + "iris = load_iris()\n", + "X = iris.data # Features\n", + "y = iris.target # Target labels" + ] + }, + { + "cell_type": "markdown", + "id": "14cc859c", + "metadata": { + "editable": true + }, + "source": [ + "Make plots of the confusion matrix, the ROC curve and the cumulative\n", + "gain curve for this (or other) multiclass data set." + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.15" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/doc/LectureNotes/exercisesweek44.ipynb b/doc/LectureNotes/exercisesweek44.ipynb new file mode 100644 index 000000000..32aa0e723 --- /dev/null +++ b/doc/LectureNotes/exercisesweek44.ipynb @@ -0,0 +1,182 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "55f7cd56", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "id": "37c83276", + "metadata": { + "editable": true + }, + "source": [ + "# Exercises week 44\n", + "\n", + "**October 27-31, 2025**\n", + "\n", + "Date: **Deadline is Friday October 31 at midnight**\n" + ] + }, + { + "cell_type": "markdown", + "id": "58a26983", + "metadata": { + "editable": true + }, + "source": [ + "# Overarching aims of the exercises this week\n", + "\n", + "The exercise set this week has two parts.\n", + "\n", + "1. The first is a version of the exercises from week 39, where you got started with the report and github repository for project 1, only this time for project 2. This part is required, and a short feedback to this exercise will be available before the project deadline. And you can reuse these elements in your final report.\n", + "\n", + "2. 
The second is a list of questions meant as a summary of many of the central elements we have discussed in connection with projects 1 and 2, with a slight bias towards deep learning methods and their training. The hope is that these exercises can be of use in your discussions about the neural network results in project 2. **You don't need to answer all the questions, but you should be able to answer them by the end of working on project 2.**\n" + ] + }, + { + "cell_type": "markdown", + "id": "350c58e2", + "metadata": { + "vscode": { + "languageId": "plaintext" + } + }, + "source": [ + "### Deliverables\n", + "\n", + "First, join a group in canvas with your group partners. Pick an avaliable group for Project 2 in the “People” page. If you don't have a group, you should really consider joining one!\n", + "\n", + "Complete exercise 1 while working in an Overleaf project. Then, in canvas, include\n", + "\n", + "- An exported PDF of the report draft you have been working on.\n", + "- A comment linking to the github repository used in exercise **1d)**\n" + ] + }, + { + "cell_type": "markdown", + "id": "00f65f6e", + "metadata": {}, + "source": [ + "## Exercise 1:\n", + "\n", + "Following the same directions as in the weekly exercises for week 39:\n", + "\n", + "**a)** Create a report document in Overleaf, and write a suitable abstract and introduction for project 2.\n", + "\n", + "**b)** Add a figure in your report of a heatmap showing the test accuracy of a neural network with [0, 1, 2, 3] hidden layers and [5, 10, 25, 50] nodes per hidden layer.\n", + "\n", + "**c)** Add a figure in your report which meets as few requirements as possible of what we consider a good figure in this course, while still including some results, a title, figure text, and axis labels. Describe in the text of the report the different ways in which the figure is lacking. (This should not be included in the final report for project 2.)\n", + "\n", + "**d)** Create a github repository or folder in a repository with all the elements described in exercise 4 of the weekly exercises of week 39.\n", + "\n", + "**e)** If applicable, add references to your report for the source of your data for regression and classification, the source of claims you make about your data, and for the sources of the gradient optimizers you use and your general claims about these.\n" + ] + }, + { + "cell_type": "markdown", + "id": "6dff53b8", + "metadata": { + "editable": true + }, + "source": [ + "## Exercise 2:\n", + "\n", + "**a)** Linear and logistic regression methods\n", + "\n", + "1. What is the main difference between ordinary least squares and Ridge regression?\n", + "\n", + "2. Which kind of data set would you use logistic regression for?\n", + "\n", + "3. In linear regression you assume that your output is described by a continuous non-stochastic function $f(x)$. Which is the equivalent function in logistic regression?\n", + "\n", + "4. Can you find an analytic solution to a logistic regression type of problem?\n", + "\n", + "5. What kind of cost function would you use in logistic regression?\n" + ] + }, + { + "cell_type": "markdown", + "id": "21a056a4", + "metadata": { + "editable": true + }, + "source": [ + "**b)** Deep learning\n", + "\n", + "1. What is an activation function and discuss the use of an activation function? Explain three different types of activation functions?\n", + "\n", + "2. Describe the architecture of a typical feed forward Neural Network (NN).\n", + "\n", + "3. 
You are using a deep neural network for a prediction task. After training your model, you notice that it is strongly overfitting the training set and that the performance on the test isn’t good. What can you do to reduce overfitting?\n", + "\n", + "4. How would you know if your model is suffering from the problem of exploding gradients?\n", + "\n", + "5. Can you name and explain a few hyperparameters used for training a neural network?\n", + "\n", + "6. Describe the architecture of a typical Convolutional Neural Network (CNN)\n", + "\n", + "7. What is the vanishing gradient problem in Neural Networks and how to fix it?\n", + "\n", + "8. When it comes to training an artificial neural network, what could the reason be for why the cost/loss doesn't decrease in a few epochs?\n", + "\n", + "9. How does L1/L2 regularization affect a neural network?\n", + "\n", + "10. What is(are) the advantage(s) of deep learning over traditional methods like linear regression or logistic regression?\n" + ] + }, + { + "cell_type": "markdown", + "id": "7c48bc09", + "metadata": { + "editable": true + }, + "source": [ + "**c)** Optimization part\n", + "\n", + "1. Which is the basic mathematical root-finding method behind essentially all gradient descent approaches(stochastic and non-stochastic)?\n", + "\n", + "2. And why don't we use it? Or stated differently, why do we introduce the learning rate as a parameter?\n", + "\n", + "3. What might happen if you set the momentum hyperparameter too close to 1 (e.g., 0.9999) when using an optimizer for the learning rate?\n", + "\n", + "4. Why should we use stochastic gradient descent instead of plain gradient descent?\n", + "\n", + "5. Which parameters would you need to tune when use a stochastic gradient descent approach?\n" + ] + }, + { + "cell_type": "markdown", + "id": "56b0b5f6", + "metadata": { + "editable": true + }, + "source": [ + "**d)** Analysis of results\n", + "\n", + "1. How do you assess overfitting and underfitting?\n", + "\n", + "2. Why do we divide the data in test and train and/or eventually validation sets?\n", + "\n", + "3. Why would you use resampling methods in the data analysis? Mention some widely popular resampling methods.\n", + "\n", + "4. Why might a model that does not overfit the data (maybe because there is a lot of data) perform worse when we add regularization?\n" + ] + } + ], + "metadata": { + "language_info": { + "name": "python" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/doc/LectureNotes/project1.ipynb b/doc/LectureNotes/project1.ipynb index aba42cd41..5170af951 100644 --- a/doc/LectureNotes/project1.ipynb +++ b/doc/LectureNotes/project1.ipynb @@ -9,7 +9,7 @@ "source": [ "\n", - "" + "\n" ] }, { @@ -20,9 +20,34 @@ }, "source": [ "# Project 1 on Machine Learning, deadline October 6 (midnight), 2025\n", + "\n", "**Data Analysis and Machine Learning FYS-STK3155/FYS4155**, University of Oslo, Norway\n", "\n", - "Date: **September 2**" + "Date: **September 2**\n" + ] + }, + { + "cell_type": "markdown", + "id": "beb333e3", + "metadata": {}, + "source": [ + "### Deliverables\n", + "\n", + "First, join a group in canvas with your group partners. Pick an avaliable group for Project 1 in the \"People\" page.\n", + "\n", + "In canvas, deliver as a group and include:\n", + "\n", + "- A PDF of your report which follows the guidelines covered below and in the week 39 exercises. Additional requirements include:\n", + " - It should be around 5000 words, use the word counter in Overleaf for this. 
This often corresponds to 10-12 pages. References and appendices are excluded from the word count\n", + " - It should include around 10-15 figures. You can include more figures in appendices and/or as supplemental material in your repository.\n", + "- A comment linking to your github repository (or folder in one of your github repositories) for this project. The repository must include\n", + " - A PDF file of the report\n", + " - A folder named Code, where you put python files for your functions and notebooks for reproducing your results. Remember to use a seed for generating random data and for train-test splits when generating final results.\n", + " - A README file with\n", + " - the name of the group members\n", + " - a short description of the project\n", + " - a description of how to install the required packages to run your code from a requirements.txt file or similar (such as a plain text description)\n", + " - names and descriptions of the various notebooks in the Code folder and the results they produce\n" ] }, { @@ -35,7 +60,7 @@ "## Preamble: Note on writing reports, using reference material, AI and other tools\n", "\n", "We want you to answer the three different projects by handing in\n", - "reports written like a standard scientific/technical report. The\n", + "reports written like a standard scientific/technical report. The\n", "links at\n", "\n", "contain more information. There you can find examples of previous\n", @@ -63,14 +88,14 @@ "been studied in the scientific literature. This makes it easier for\n", "you to compare and analyze your results. Comparing with existing\n", "results from the scientific literature is also an essential element of\n", - "the scientific discussion. The University of California at Irvine\n", + "the scientific discussion. The University of California at Irvine\n", "with its Machine Learning repository at\n", " is an excellent site to\n", "look up for examples and\n", "inspiration. [Kaggle.com](https://www.kaggle.com/) is an equally\n", "interesting site. Feel free to explore these sites. When selecting\n", "other data sets, make sure these are sets used for regression problems\n", - "(not classification)." + "(not classification).\n" ] }, { @@ -90,7 +115,7 @@ "We will study how to fit polynomials to specific\n", "one-dimensional functions (feel free to replace the suggested function with more complicated ones).\n", "\n", - "We will use Runge's function (see for a discussion). The one-dimensional function we will study is" + "We will use Runge's function (see for a discussion). The one-dimensional function we will study is\n" ] }, { @@ -102,7 +127,7 @@ "source": [ "$$\n", "f(x) = \\frac{1}{1+25x^2}.\n", - "$$" + "$$\n" ] }, { @@ -114,14 +139,14 @@ "source": [ "Our first step will be to perform an OLS regression analysis of this\n", "function, trying out a polynomial fit with an $x$ dependence of the\n", - "form $[x,x^2,\\dots]$. You can use a uniform distribution to set up the\n", + "form $[x,x^2,\\dots]$. You can use a uniform distribution to set up the\n", "arrays of values for $x \\in [-1,1]$, or alternatively use a fixed step size.\n", "Thereafter we will repeat many of the same steps when using the Ridge and Lasso regression methods,\n", - "introducing thereby a dependence on the hyperparameter (penalty) $\\lambda$.\n", + "introducing thereby a dependence on the hyperparameter (penalty) $\\lambda$.\n", "\n", "We will also include bootstrap as a resampling technique in order to\n", - "study the so-called **bias-variance tradeoff**. 
After that we will\n", - "include the so-called cross-validation technique." + "study the so-called **bias-variance tradeoff**. After that we will\n", + "include the so-called cross-validation technique.\n" ] }, { @@ -133,15 +158,15 @@ "source": [ "### Part a : Ordinary Least Square (OLS) for the Runge function\n", "\n", - "We will generate our own dataset for abovementioned function\n", + "We will generate our own dataset for abovementioned function\n", "$\\mathrm{Runge}(x)$ function with $x\\in [-1,1]$. You should explore also the addition\n", "of an added stochastic noise to this function using the normal\n", "distribution $N(0,1)$.\n", "\n", - "*Write your own code* (using for example the pseudoinverse function **pinv** from **Numpy** ) and perform a standard **ordinary least square regression**\n", - "analysis using polynomials in $x$ up to order $15$ or higher. Explore the dependence on the number of data points and the polynomial degree.\n", + "_Write your own code_ (using for example the pseudoinverse function **pinv** from **Numpy** ) and perform a standard **ordinary least square regression**\n", + "analysis using polynomials in $x$ up to order $15$ or higher. Explore the dependence on the number of data points and the polynomial degree.\n", "\n", - "Evaluate the mean Squared error (MSE)" + "Evaluate the mean Squared error (MSE)\n" ] }, { @@ -154,7 +179,7 @@ "$$\n", "MSE(\\boldsymbol{y},\\tilde{\\boldsymbol{y}}) = \\frac{1}{n}\n", "\\sum_{i=0}^{n-1}(y_i-\\tilde{y}_i)^2,\n", - "$$" + "$$\n" ] }, { @@ -164,9 +189,9 @@ "editable": true }, "source": [ - "and the $R^2$ score function. If $\\tilde{\\boldsymbol{y}}_i$ is the predicted\n", + "and the $R^2$ score function. If $\\tilde{\\boldsymbol{y}}_i$ is the predicted\n", "value of the $i-th$ sample and $y_i$ is the corresponding true value,\n", - "then the score $R^2$ is defined as" + "then the score $R^2$ is defined as\n" ] }, { @@ -178,7 +203,7 @@ "source": [ "$$\n", "R^2(\\boldsymbol{y}, \\tilde{\\boldsymbol{y}}) = 1 - \\frac{\\sum_{i=0}^{n - 1} (y_i - \\tilde{y}_i)^2}{\\sum_{i=0}^{n - 1} (y_i - \\bar{y})^2},\n", - "$$" + "$$\n" ] }, { @@ -188,7 +213,7 @@ "editable": true }, "source": [ - "where we have defined the mean value of $\\boldsymbol{y}$ as" + "where we have defined the mean value of $\\boldsymbol{y}$ as\n" ] }, { @@ -200,7 +225,7 @@ "source": [ "$$\n", "\\bar{y} = \\frac{1}{n} \\sum_{i=0}^{n - 1} y_i.\n", - "$$" + "$$\n" ] }, { @@ -215,23 +240,23 @@ "\n", "Your code has to include a scaling/centering of the data (for example by\n", "subtracting the mean value), and\n", - "a split of the data in training and test data. For the scaling you can\n", + "a split of the data in training and test data. For the scaling you can\n", "either write your own code or use for example the function for\n", "splitting training data provided by the library **Scikit-Learn** (make\n", - "sure you have installed it). This function is called\n", - "$train\\_test\\_split$. **You should present a critical discussion of why and how you have scaled or not scaled the data**.\n", + "sure you have installed it). This function is called\n", + "$train\\_test\\_split$. **You should present a critical discussion of why and how you have scaled or not scaled the data**.\n", "\n", "It is normal in essentially all Machine Learning studies to split the\n", - "data in a training set and a test set (eventually also an additional\n", - "validation set). There\n", + "data in a training set and a test set (eventually also an additional\n", + "validation set). 
There\n", "is no explicit recipe for how much data should be included as training\n", - "data and say test data. An accepted rule of thumb is to use\n", + "data and say test data. An accepted rule of thumb is to use\n", "approximately $2/3$ to $4/5$ of the data as training data.\n", "\n", "You can easily reuse the solutions to your exercises from week 35.\n", "See also the lecture slides from week 35 and week 36.\n", "\n", - "On scaling, we recommend reading the following section from the scikit-learn software description, see ." + "On scaling, we recommend reading the following section from the scikit-learn software description, see .\n" ] }, { @@ -241,14 +266,14 @@ "editable": true }, "source": [ - "### Part b: Adding Ridge regression for the Runge function\n", + "### Part b: Adding Ridge regression for the Runge function\n", "\n", "Write your own code for the Ridge method as done in the previous\n", - "exercise. The lecture notes from week 35 and 36 contain more information. Furthermore, the results from the exercise set from week 36 is something you can reuse here.\n", + "exercise. The lecture notes from week 35 and 36 contain more information. Furthermore, the results from the exercise set from week 36 is something you can reuse here.\n", "\n", "Perform the same analysis as you did in the previous exercise but now for different values of $\\lambda$. Compare and\n", - "analyze your results with those obtained in part a) with the OLS method. Study the\n", - "dependence on $\\lambda$." + "analyze your results with those obtained in part a) with the OLS method. Study the\n", + "dependence on $\\lambda$.\n" ] }, { @@ -267,7 +292,7 @@ "from week 36).\n", "\n", "Study and compare your results from parts a) and b) with your gradient\n", - "descent approch. Discuss in particular the role of the learning rate." + "descent approch. Discuss in particular the role of the learning rate.\n" ] }, { @@ -283,7 +308,7 @@ "the gradient descent method by including **momentum**, **ADAgrad**,\n", "**RMSprop** and **ADAM** as methods fro iteratively updating your learning\n", "rate. Discuss the results and compare the different methods applied to\n", - "the one-dimensional Runge function. The lecture notes from week 37 contain several examples on how to implement these methods." + "the one-dimensional Runge function. The lecture notes from week 37 contain several examples on how to implement these methods.\n" ] }, { @@ -299,12 +324,12 @@ "represents our first encounter with a machine learning method which\n", "cannot be solved through analytical expressions (as in OLS and Ridge regression). Use the gradient\n", "descent methods you developed in parts c) and d) to solve the LASSO\n", - "optimization problem. You can compare your results with \n", + "optimization problem. You can compare your results with\n", "the functionalities of **Scikit-Learn**.\n", "\n", "Discuss (critically) your results for the Runge function from OLS,\n", "Ridge and LASSO regression using the various gradient descent\n", - "approaches." + "approaches.\n" ] }, { @@ -319,7 +344,7 @@ "Our last gradient step is to include stochastic gradient descent using\n", "the same methods to update the learning rates as in parts c-e).\n", "Compare and discuss your results with and without stochastic gradient\n", - "and give a critical assessment of the various methods." 
+ "and give a critical assessment of the various methods.\n" ] }, { @@ -332,14 +357,14 @@ "### Part g: Bias-variance trade-off and resampling techniques\n", "\n", "Our aim here is to study the bias-variance trade-off by implementing\n", - "the **bootstrap** resampling technique. **We will only use the simpler\n", + "the **bootstrap** resampling technique. **We will only use the simpler\n", "ordinary least squares here**.\n", "\n", - "With a code which does OLS and includes resampling techniques, \n", + "With a code which does OLS and includes resampling techniques,\n", "we will now discuss the bias-variance trade-off in the context of\n", "continuous predictions such as regression. However, many of the\n", "intuitions and ideas discussed here also carry over to classification\n", - "tasks and basically all Machine Learning algorithms. \n", + "tasks and basically all Machine Learning algorithms.\n", "\n", "Before you perform an analysis of the bias-variance trade-off on your\n", "test data, make first a figure similar to Fig. 2.11 of Hastie,\n", @@ -356,7 +381,7 @@ "dataset $\\mathcal{L}$ consisting of the data\n", "$\\mathbf{X}_\\mathcal{L}=\\{(y_j, \\boldsymbol{x}_j), j=0\\ldots n-1\\}$.\n", "\n", - "We assume that the true data is generated from a noisy model" + "We assume that the true data is generated from a noisy model\n" ] }, { @@ -368,7 +393,7 @@ "source": [ "$$\n", "\\boldsymbol{y}=f(\\boldsymbol{x}) + \\boldsymbol{\\epsilon}.\n", - "$$" + "$$\n" ] }, { @@ -387,7 +412,7 @@ "that is $\\boldsymbol{\\tilde{y}}=\\boldsymbol{X}\\boldsymbol{\\theta}$.\n", "\n", "The parameters $\\boldsymbol{\\theta}$ are in turn found by optimizing the mean\n", - "squared error via the so-called cost function" + "squared error via the so-called cost function\n" ] }, { @@ -399,7 +424,7 @@ "source": [ "$$\n", "C(\\boldsymbol{X},\\boldsymbol{\\theta}) =\\frac{1}{n}\\sum_{i=0}^{n-1}(y_i-\\tilde{y}_i)^2=\\mathbb{E}\\left[(\\boldsymbol{y}-\\boldsymbol{\\tilde{y}})^2\\right].\n", - "$$" + "$$\n" ] }, { @@ -409,14 +434,14 @@ "editable": true }, "source": [ - "Here the expected value $\\mathbb{E}$ is the sample value. 
\n", + "Here the expected value $\\mathbb{E}$ is the sample value.\n", "\n", "Show that you can rewrite this in terms of a term which contains the\n", "variance of the model itself (the so-called variance term), a term\n", "which measures the deviation from the true data and the mean value of\n", "the model (the bias term) and finally the variance of the noise.\n", "\n", - "That is, show that" + "That is, show that\n" ] }, { @@ -428,7 +453,7 @@ "source": [ "$$\n", "\\mathbb{E}\\left[(\\boldsymbol{y}-\\boldsymbol{\\tilde{y}})^2\\right]=\\mathrm{Bias}[\\tilde{y}]+\\mathrm{var}[\\tilde{y}]+\\sigma^2,\n", - "$$" + "$$\n" ] }, { @@ -438,7 +463,7 @@ "editable": true }, "source": [ - "with (we approximate $f(\\boldsymbol{x})\\approx \\boldsymbol{y}$)" + "with (we approximate $f(\\boldsymbol{x})\\approx \\boldsymbol{y}$)\n" ] }, { @@ -450,7 +475,7 @@ "source": [ "$$\n", "\\mathrm{Bias}[\\tilde{y}]=\\mathbb{E}\\left[\\left(\\boldsymbol{y}-\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right]\\right)^2\\right],\n", - "$$" + "$$\n" ] }, { @@ -460,7 +485,7 @@ "editable": true }, "source": [ - "and" + "and\n" ] }, { @@ -472,7 +497,7 @@ "source": [ "$$\n", "\\mathrm{var}[\\tilde{y}]=\\mathbb{E}\\left[\\left(\\tilde{\\boldsymbol{y}}-\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right]\\right)^2\\right]=\\frac{1}{n}\\sum_i(\\tilde{y}_i-\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right])^2.\n", - "$$" + "$$\n" ] }, { @@ -482,11 +507,11 @@ "editable": true }, "source": [ - "**Important note**: Since the function $f(x)$ is unknown, in order to be able to evalute the bias, we replace $f(\\boldsymbol{x})$ in the expression for the bias with $\\boldsymbol{y}$. \n", + "**Important note**: Since the function $f(x)$ is unknown, in order to be able to evalute the bias, we replace $f(\\boldsymbol{x})$ in the expression for the bias with $\\boldsymbol{y}$.\n", "\n", "The answer to this exercise should be included in the theory part of\n", - "the report. This exercise is also part of the weekly exercises of\n", - "week 38. Explain what the terms mean and discuss their\n", + "the report. This exercise is also part of the weekly exercises of\n", + "week 38. Explain what the terms mean and discuss their\n", "interpretations.\n", "\n", "Perform then a bias-variance analysis of the Runge function by\n", @@ -495,7 +520,7 @@ "Discuss the bias and variance trade-off as function\n", "of your model complexity (the degree of the polynomial) and the number\n", "of data points, and possibly also your training and test data using the **bootstrap** resampling method.\n", - "You can follow the code example in the jupyter-book at ." + "You can follow the code example in the jupyter-book at .\n" ] }, { @@ -505,20 +530,20 @@ "editable": true }, "source": [ - "### Part h): Cross-validation as resampling techniques, adding more complexity\n", + "### Part h): Cross-validation as resampling techniques, adding more complexity\n", "\n", "The aim here is to implement another widely popular\n", - "resampling technique, the so-called cross-validation method. \n", + "resampling technique, the so-called cross-validation method.\n", "\n", "Implement the $k$-fold cross-validation algorithm (feel free to use\n", "the functionality of **Scikit-Learn** or write your own code) and\n", "evaluate again the MSE function resulting from the test folds.\n", "\n", "Compare the MSE you get from your cross-validation code with the one\n", - "you got from your **bootstrap** code from the previous exercise. Comment and interpret your results. 
\n", + "you got from your **bootstrap** code from the previous exercise. Comment and interpret your results.\n", "\n", "In addition to using the ordinary least squares method, you should\n", - "include both Ridge and Lasso regression in the final analysis." + "include both Ridge and Lasso regression in the final analysis.\n" ] }, { @@ -532,7 +557,7 @@ "\n", "1. For a discussion and derivation of the variances and mean squared errors using linear regression, see the [Lecture notes on ridge regression by Wessel N. van Wieringen](https://arxiv.org/abs/1509.09169)\n", "\n", - "2. The textbook of [Trevor Hastie, Robert Tibshirani, Jerome H. Friedman, The Elements of Statistical Learning, Springer](https://www.springer.com/gp/book/9780387848570), chapters 3 and 7 are the most relevant ones for the analysis of parts g) and h)." + "2. The textbook of [Trevor Hastie, Robert Tibshirani, Jerome H. Friedman, The Elements of Statistical Learning, Springer](https://www.springer.com/gp/book/9780387848570), chapters 3 and 7 are the most relevant ones for the analysis of parts g) and h).\n" ] }, { @@ -544,25 +569,25 @@ "source": [ "## Introduction to numerical projects\n", "\n", - "Here follows a brief recipe and recommendation on how to answer the various questions when preparing your answers. \n", + "Here follows a brief recipe and recommendation on how to answer the various questions when preparing your answers.\n", "\n", - " * Give a short description of the nature of the problem and the eventual numerical methods you have used.\n", + "- Give a short description of the nature of the problem and the eventual numerical methods you have used.\n", "\n", - " * Describe the algorithm you have used and/or developed. Here you may find it convenient to use pseudocoding. In many cases you can describe the algorithm in the program itself.\n", + "- Describe the algorithm you have used and/or developed. Here you may find it convenient to use pseudocoding. In many cases you can describe the algorithm in the program itself.\n", "\n", - " * Include the source code of your program. Comment your program properly. You should have the code at your GitHub/GitLab link. You can also place the code in an appendix of your report.\n", + "- Include the source code of your program. Comment your program properly. You should have the code at your GitHub/GitLab link. You can also place the code in an appendix of your report.\n", "\n", - " * If possible, try to find analytic solutions, or known limits in order to test your program when developing the code.\n", + "- If possible, try to find analytic solutions, or known limits in order to test your program when developing the code.\n", "\n", - " * Include your results either in figure form or in a table. Remember to label your results. All tables and figures should have relevant captions and labels on the axes.\n", + "- Include your results either in figure form or in a table. Remember to label your results. All tables and figures should have relevant captions and labels on the axes.\n", "\n", - " * Try to evaluate the reliabilty and numerical stability/precision of your results. If possible, include a qualitative and/or quantitative discussion of the numerical stability, eventual loss of precision etc.\n", + "- Try to evaluate the reliabilty and numerical stability/precision of your results. 
If possible, include a qualitative and/or quantitative discussion of the numerical stability, eventual loss of precision etc.\n", "\n", - " * Try to give an interpretation of you results in your answers to the problems.\n", + "- Try to give an interpretation of you results in your answers to the problems.\n", "\n", - " * Critique: if possible include your comments and reflections about the exercise, whether you felt you learnt something, ideas for improvements and other thoughts you've made when solving the exercise. We wish to keep this course at the interactive level and your comments can help us improve it.\n", + "- Critique: if possible include your comments and reflections about the exercise, whether you felt you learnt something, ideas for improvements and other thoughts you've made when solving the exercise. We wish to keep this course at the interactive level and your comments can help us improve it.\n", "\n", - " * Try to establish a practice where you log your work at the computerlab. You may find such a logbook very handy at later stages in your work, especially when you don't properly remember what a previous test version of your program did. Here you could also record the time spent on solving the exercise, various algorithms you may have tested or other topics which you feel worthy of mentioning." + "- Try to establish a practice where you log your work at the computerlab. You may find such a logbook very handy at later stages in your work, especially when you don't properly remember what a previous test version of your program did. Here you could also record the time spent on solving the exercise, various algorithms you may have tested or other topics which you feel worthy of mentioning.\n" ] }, { @@ -574,17 +599,17 @@ "source": [ "## Format for electronic delivery of report and programs\n", "\n", - "The preferred format for the report is a PDF file. You can also use DOC or postscript formats or as an ipython notebook file. As programming language we prefer that you choose between C/C++, Fortran2008, Julia or Python. The following prescription should be followed when preparing the report:\n", + "The preferred format for the report is a PDF file. You can also use DOC or postscript formats or as an ipython notebook file. As programming language we prefer that you choose between C/C++, Fortran2008, Julia or Python. The following prescription should be followed when preparing the report:\n", "\n", - " * Use Canvas to hand in your projects, log in at with your normal UiO username and password.\n", + "- Use Canvas to hand in your projects, log in at with your normal UiO username and password.\n", "\n", - " * Upload **only** the report file or the link to your GitHub/GitLab or similar typo of repos! For the source code file(s) you have developed please provide us with your link to your GitHub/GitLab or similar domain. The report file should include all of your discussions and a list of the codes you have developed. Do not include library files which are available at the course homepage, unless you have made specific changes to them.\n", + "- Upload **only** the report file or the link to your GitHub/GitLab or similar typo of repos! For the source code file(s) you have developed please provide us with your link to your GitHub/GitLab or similar domain. The report file should include all of your discussions and a list of the codes you have developed. 
Do not include library files which are available at the course homepage, unless you have made specific changes to them.\n", "\n", - " * In your GitHub/GitLab or similar repository, please include a folder which contains selected results. These can be in the form of output from your code for a selected set of runs and input parameters.\n", + "- In your GitHub/GitLab or similar repository, please include a folder which contains selected results. These can be in the form of output from your code for a selected set of runs and input parameters.\n", "\n", - "Finally, \n", - "we encourage you to collaborate. Optimal working groups consist of \n", - "2-3 students. You can then hand in a common report." + "Finally,\n", + "we encourage you to collaborate. Optimal working groups consist of\n", + "2-3 students. You can then hand in a common report.\n" ] }, { @@ -596,42 +621,46 @@ "source": [ "## Software and needed installations\n", "\n", - "If you have Python installed (we recommend Python3) and you feel pretty familiar with installing different packages, \n", + "If you have Python installed (we recommend Python3) and you feel pretty familiar with installing different packages,\n", "we recommend that you install the following Python packages via **pip** as\n", + "\n", "1. pip install numpy scipy matplotlib ipython scikit-learn tensorflow sympy pandas pillow\n", "\n", "For Python3, replace **pip** with **pip3**.\n", "\n", - "See below for a discussion of **tensorflow** and **scikit-learn**. \n", + "See below for a discussion of **tensorflow** and **scikit-learn**.\n", "\n", - "For OSX users we recommend also, after having installed Xcode, to install **brew**. Brew allows \n", + "For OSX users we recommend also, after having installed Xcode, to install **brew**. Brew allows\n", "for a seamless installation of additional software via for example\n", + "\n", "1. brew install python3\n", "\n", "For Linux users, with its variety of distributions like for example the widely popular Ubuntu distribution\n", - "you can use **pip** as well and simply install Python as \n", - "1. sudo apt-get install python3 (or python for python2.7)\n", + "you can use **pip** as well and simply install Python as\n", + "\n", + "1. sudo apt-get install python3 (or python for python2.7)\n", + "\n", + "etc etc.\n", "\n", - "etc etc. \n", + "If you don't want to install various Python packages with their dependencies separately, we recommend two widely used distrubutions which set up all relevant dependencies for Python, namely\n", "\n", - "If you don't want to install various Python packages with their dependencies separately, we recommend two widely used distrubutions which set up all relevant dependencies for Python, namely\n", "1. [Anaconda](https://docs.anaconda.com/) Anaconda is an open source distribution of the Python and R programming languages for large-scale data processing, predictive analytics, and scientific computing, that aims to simplify package management and deployment. Package versions are managed by the package management system **conda**\n", "\n", - "2. [Enthought canopy](https://www.enthought.com/product/canopy/) is a Python distribution for scientific and analytic computing distribution and analysis environment, available for free and under a commercial license.\n", + "2. 
[Enthought canopy](https://www.enthought.com/product/canopy/) is a Python distribution for scientific and analytic computing distribution and analysis environment, available for free and under a commercial license.\n", "\n", "Popular software packages written in Python for ML are\n", "\n", - "* [Scikit-learn](http://scikit-learn.org/stable/), \n", + "- [Scikit-learn](http://scikit-learn.org/stable/),\n", "\n", - "* [Tensorflow](https://www.tensorflow.org/),\n", + "- [Tensorflow](https://www.tensorflow.org/),\n", "\n", - "* [PyTorch](http://pytorch.org/) and \n", + "- [PyTorch](http://pytorch.org/) and\n", "\n", - "* [Keras](https://keras.io/).\n", + "- [Keras](https://keras.io/).\n", "\n", - "These are all freely available at their respective GitHub sites. They \n", + "These are all freely available at their respective GitHub sites. They\n", "encompass communities of developers in the thousands or more. And the number\n", - "of code developers and contributors keeps increasing." + "of code developers and contributors keeps increasing.\n" ] } ], diff --git a/doc/LectureNotes/project2.ipynb b/doc/LectureNotes/project2.ipynb new file mode 100644 index 000000000..faf4aee16 --- /dev/null +++ b/doc/LectureNotes/project2.ipynb @@ -0,0 +1,635 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "96e577ca", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "067c02b9", + "metadata": { + "editable": true + }, + "source": [ + "# Project 2 on Machine Learning, deadline November 10 (Midnight)\n", + "**[Data Analysis and Machine Learning FYS-STK3155/FYS4155](http://www.uio.no/studier/emner/matnat/fys/FYS3155/index-eng.html)**, University of Oslo, Norway\n", + "\n", + "Date: **October 14, 2025**" + ] + }, + { + "cell_type": "markdown", + "id": "01f9fedd", + "metadata": { + "editable": true + }, + "source": [ + "## Deliverables\n", + "\n", + "First, join a group in canvas with your group partners. Pick an avaliable group for Project 2 in the **People** page.\n", + "\n", + "In canvas, deliver as a group and include:\n", + "\n", + "* A PDF of your report which follows the guidelines covered below and in the week 39 exercises. Additional requirements include:\n", + "\n", + " * It should be around 5000 words, use the word counter in Overleaf for this. This often corresponds to 10-12 pages. References and appendices are excluded from the word count\n", + "\n", + " * It should include around 10-15 figures. You can include more figures in appendices and/or as supplemental material in your repository.\n", + "\n", + "* A comment linking to your github repository (or folder in one of your github repositories) for this project. The repository must include\n", + "\n", + "A PDF file of the report\n", + " * A folder named Code, where you put python files for your functions and notebooks for reproducing your results. 
Remember to use a seed for generating random data and for train-test splits when generating final results.\n", + "\n", + " * A README file with the name of the group members\n", + "\n", + " * a short description of the project\n", + "\n", + " * a description of how to install the required packages to run your code from a requirements.txt file or similar (such as a plain text description) names and descriptions of the various notebooks in the Code folder and the results they produce" + ] + }, + { + "cell_type": "markdown", + "id": "9f8e4871", + "metadata": { + "editable": true + }, + "source": [ + "### Preamble: Note on writing reports, using reference material, AI and other tools\n", + "\n", + "We want you to answer the three different projects by handing in\n", + "reports written like a standard scientific/technical report. The links\n", + "at\n", + "https://github.com/CompPhysics/MachineLearning/tree/master/doc/Projects/n", + "contain more information. There you can find examples of previous\n", + "reports, the projects themselves, how we grade reports etc. How to\n", + "write reports will also be discussed during the various lab\n", + "sessions. Please do ask us if you are in doubt.\n", + "\n", + "When using codes and material from other sources, you should refer to\n", + "these in the bibliography of your report, indicating wherefrom you for\n", + "example got the code, whether this is from the lecture notes,\n", + "softwares like Scikit-Learn, TensorFlow, PyTorch or other\n", + "sources. These sources should always be cited correctly. How to cite\n", + "some of the libraries is often indicated from their corresponding\n", + "GitHub sites or websites, see for example how to cite Scikit-Learn at\n", + "https://scikit-learn.org/dev/about.html./n", + "\n", + "We enocurage you to use tools like ChatGPT or similar in writing the\n", + "report. If you use for example ChatGPT, please do cite it properly and\n", + "include (if possible) your questions and answers as an addition to the\n", + "report. This can be uploaded to for example your website,\n", + "GitHub/GitLab or similar as supplemental material.\n", + "\n", + "If you would like to study other data sets, feel free to propose other\n", + "sets. What we have proposed here are mere suggestions from our\n", + "side. If you opt for another data set, consider using a set which has\n", + "been studied in the scientific literature. This makes it easier for\n", + "you to compare and analyze your results. Comparing with existing\n", + "results from the scientific literature is also an essential element of\n", + "the scientific discussion. The University of California at Irvine with\n", + "its Machine Learning repository at\n", + "https://archive.ics.uci.edu/ml/index.php%20is%20an%20excellent%20site%20to%20look/n", + "up for examples and inspiration. Kaggle.com is an equally interesting\n", + "site. Feel free to explore these sites." + ] + }, + { + "cell_type": "markdown", + "id": "460cc6ea", + "metadata": { + "editable": true + }, + "source": [ + "## Classification and Regression, writing our own neural network code\n", + "\n", + "The main aim of this project is to study both classification and\n", + "regression problems by developing our own \n", + "feed-forward neural network (FFNN) code. 
The exercises from week 41 and 42 (see and ) as well as the lecture material from the same weeks (see and ) should contain enough information for you to get started with writing your own code.\n", + "\n", + "We will also reuse our codes on gradient descent methods from project 1.\n", + "\n", + "The data sets that we propose here are (the default sets)\n", + "\n", + "* Regression (fitting a continuous function). In this part you will need to bring back your results from project 1 and compare these with what you get from your Neural Network code to be developed here. The data sets could be\n", + "\n", + " * The simple one-dimensional function Runge function from project 1, that is $f(x) = \\frac{1}{1+25x^2}$. We recommend using a simpler function when developing your neural network code for regression problems. Feel however free to discuss and study other functions, such as the two-dimensional Runge function $f(x,y)=\\left[(10x - 5)^2 + (10y - 5)^2 + 1 \\right]^{-1}$, or even more complicated two-dimensional functions (see the supplementary material of for an extensive list of two-dimensional functions). \n", + "\n", + "* Classification.\n", + "\n", + " * We will consider a multiclass classification problem given by the full MNIST data set. The full data set is at .\n", + "\n", + "We will start with a regression problem and we will reuse our codes on gradient descent methods from project 1." + ] + }, + { + "cell_type": "markdown", + "id": "d62a07ef", + "metadata": { + "editable": true + }, + "source": [ + "### Part a): Analytical warm-up\n", + "\n", + "When using our gradient machinery from project 1, we will need the expressions for the cost/loss functions and their respective\n", + "gradients. The functions whose gradients we need are:\n", + "1. The mean-squared error (MSE) with and without the $L_1$ and $L_2$ norms (regression problems)\n", + "\n", + "2. The binary cross entropy (aka log loss) for binary classification problems with and without $L_1$ and $L_2$ norms\n", + "\n", + "3. The multiclass cross entropy cost/loss function (aka Softmax cross entropy or just Softmax loss function)\n", + "\n", + "Set up these three cost/loss functions and their respective derivatives and explain the various terms. In this project you will however only use the MSE and the Softmax cross entropy.\n", + "\n", + "We will test three activation functions for our neural network setup, these are the \n", + "1. The Sigmoid (aka **logit**) function,\n", + "\n", + "2. the RELU function and\n", + "\n", + "3. the Leaky RELU function\n", + "\n", + "Set up their expressions and their first derivatives.\n", + "You may consult the lecture notes (with codes and more) from week 42 at ." + ] + }, + { + "cell_type": "markdown", + "id": "9cd8b8ac", + "metadata": { + "editable": true + }, + "source": [ + "### Reminder about the gradient machinery from project 1\n", + "\n", + "In the setup of a neural network code you will need your gradient descent codes from\n", + "project 1. For neural networks we will recommend using stochastic\n", + "gradient descent with either the RMSprop or the ADAM algorithms for\n", + "updating the learning rates. But you should feel free to try plain gradient descent as well.\n", + "\n", + "We recommend reading chapter 8 on optimization from the textbook of\n", + "Goodfellow, Bengio and Courville at\n", + ". This chapter contains many\n", + "useful insights and discussions on the optimization part of machine\n", + "learning. 
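Since the project reuses the gradient machinery from project 1 with RMSprop or ADAM for the learning-rate updates, a bare-bones sketch of the two update rules is included here as a reminder. The function names and the default values of `rho`, `beta1`, `beta2` and `eps` are common choices on our part, not values prescribed by the project text.

```python
import numpy as np

def rmsprop_update(theta, grad, s, eta=0.001, rho=0.9, eps=1e-8):
    # s accumulates a running average of the squared gradient
    s = rho * s + (1 - rho) * grad**2
    theta = theta - eta * grad / (np.sqrt(s) + eps)
    return theta, s

def adam_update(theta, grad, m, v, t, eta=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    # m and v are running averages of the gradient and the squared gradient
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad**2
    m_hat = m / (1 - beta1**t)      # bias correction, t = 1, 2, ...
    v_hat = v / (1 - beta2**t)
    theta = theta - eta * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v
```

In practice these updates would be called once per minibatch inside the stochastic gradient descent loop, with `s`, `m` and `v` initialized to zero arrays of the same shape as `theta`.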
A useful reference on the back progagation algorithm is\n", + "Nielsen's book at . \n", + "\n", + "You will find the Python [Seaborn\n", + "package](https://seaborn.pydata.org/generated/seaborn.heatmap.html)\n", + "useful when plotting the results as function of the learning rate\n", + "$\\eta$ and the hyper-parameter $\\lambda$ ." + ] + }, + { + "cell_type": "markdown", + "id": "5931b155", + "metadata": { + "editable": true + }, + "source": [ + "### Part b): Writing your own Neural Network code\n", + "\n", + "Your aim now, and this is the central part of this project, is to\n", + "write your own FFNN code implementing the back\n", + "propagation algorithm discussed in the lecture slides from week 41 at and week 42 at .\n", + "\n", + "We will focus on a regression problem first, using the one-dimensional Runge function" + ] + }, + { + "cell_type": "markdown", + "id": "b273fc8a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "f(x) = \\frac{1}{1+25x^2},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "e13db1ec", + "metadata": { + "editable": true + }, + "source": [ + "from project 1.\n", + "\n", + "Use only the mean-squared error as cost function (no regularization terms) and \n", + "write an FFNN code for a regression problem with a flexible number of hidden\n", + "layers and nodes using only the Sigmoid function as activation function for\n", + "the hidden layers. Initialize the weights using a normal\n", + "distribution. How would you initialize the biases? And which\n", + "activation function would you select for the final output layer?\n", + "And how would you set up your design/feature matrix? Hint: does it have to represent a polynomial approximation as you did in project 1? \n", + "\n", + "Train your network and compare the results with those from your OLS\n", + "regression code from project 1 using the one-dimensional Runge\n", + "function. When comparing your neural network code with the OLS\n", + "results from project 1, use the same data sets which gave you the best\n", + "MSE score. Moreover, use the polynomial order from project 1 that gave you the\n", + "best result. Compare these results with your neural network with one\n", + "and two hidden layers using $50$ and $100$ hidden nodes, respectively.\n", + "\n", + "Comment your results and give a critical discussion of the results\n", + "obtained with the OLS code from project 1 and your own neural network\n", + "code. Make an analysis of the learning rates employed to find the\n", + "optimal MSE score. Test both stochastic gradient descent\n", + "with RMSprop and ADAM and plain gradient descent with different\n", + "learning rates.\n", + "\n", + "You should, as you did in project 1, scale your data." + ] + }, + { + "cell_type": "markdown", + "id": "4f864e31", + "metadata": { + "editable": true + }, + "source": [ + "### Part c): Testing against other software libraries\n", + "\n", + "You should test your results against a similar code using **Scikit-Learn** (see the examples in the above lecture notes from weeks 41 and 42) or **tensorflow/keras** or **Pytorch** (for Pytorch, see Raschka et al.'s text chapters 12 and 13). \n", + "\n", + "Furthermore, you should also test that your derivatives are correctly\n", + "calculated using automatic differentiation, using for example the\n", + "**Autograd** library or the **JAX** library. It is optional to implement\n", + "these libraries for the present project. In this project they serve as\n", + "useful tests of our derivatives." 
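To make the task in part b) concrete, the following is a minimal sketch of a feed-forward network with a single hidden layer, sigmoid activation in the hidden layer, a linear output and the plain mean-squared-error cost, trained with full-batch gradient descent on the one-dimensional Runge function. The number of hidden nodes, the learning rate, the number of epochs and the scaled normal initialization of the output weights are our own illustrative choices; a full solution should of course support several layers and the other activation functions.

```python
import numpy as np

rng = np.random.default_rng(2025)

# Data: the one-dimensional Runge function on [-1, 1]
n = 200
x = np.linspace(-1, 1, n).reshape(-1, 1)
y = 1.0 / (1.0 + 25.0 * x**2)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One hidden layer with sigmoid activation, linear output layer
n_hidden = 50
W1 = rng.normal(0.0, 1.0, (1, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(0.0, 1.0 / np.sqrt(n_hidden), (n_hidden, 1))
b2 = np.zeros(1)

eta = 0.01
for epoch in range(10000):
    # forward pass
    a1 = sigmoid(x @ W1 + b1)
    y_pred = a1 @ W2 + b2                       # linear output for regression

    # backward pass for the cost C = mean((y_pred - y)**2)
    delta2 = 2.0 * (y_pred - y) / n             # dC/dy_pred
    grad_W2 = a1.T @ delta2
    grad_b2 = delta2.sum(axis=0)
    delta1 = (delta2 @ W2.T) * a1 * (1.0 - a1)  # chain rule through the sigmoid
    grad_W1 = x.T @ delta1
    grad_b1 = delta1.sum(axis=0)

    # plain gradient descent update
    W1 -= eta * grad_W1
    b1 -= eta * grad_b1
    W2 -= eta * grad_W2
    b2 -= eta * grad_b2

print("final training MSE:", np.mean((y_pred - y) ** 2))
```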
+ ] + }, + { + "cell_type": "markdown", + "id": "c9faeafd", + "metadata": { + "editable": true + }, + "source": [ + "### Part d): Testing different activation functions and depths of the neural network\n", + "\n", + "You should also test different activation functions for the hidden\n", + "layers. Try out the Sigmoid, the RELU and the Leaky RELU functions and\n", + "discuss your results. Test your results as functions of the number of hidden layers and nodes. Do you see signs of overfitting?\n", + "It is optional in this project to perform a bias-variance trade-off analysis." + ] + }, + { + "cell_type": "markdown", + "id": "d865c22b", + "metadata": { + "editable": true + }, + "source": [ + "### Part e): Testing different norms\n", + "\n", + "Finally, still using the one-dimensional Runge function, add now the\n", + "hyperparameters $\\lambda$ with the $L_2$ and $L_1$ norms. Find the\n", + "optimal results for the hyperparameters $\\lambda$ and the learning\n", + "rates $\\eta$ and neural network architecture and compare the $L_2$ results with Ridge regression from\n", + "project 1 and the $L_1$ results with the Lasso calculations of project 1.\n", + "Use again the same data sets and the best results from project 1 in your comparisons." + ] + }, + { + "cell_type": "markdown", + "id": "5270af8f", + "metadata": { + "editable": true + }, + "source": [ + "### Part f): Classification analysis using neural networks\n", + "\n", + "With a well-written code it should now be easy to change the\n", + "activation function for the output layer.\n", + "\n", + "Here we will change the cost function for our neural network code\n", + "developed in parts b), d) and e) in order to perform a classification\n", + "analysis. The classification problem we will study is the multiclass\n", + "MNIST problem, see the description of the full data set at\n", + ". We will use the Softmax cross entropy function discussed in a). \n", + "The MNIST data set discussed in the lecture notes from week 42 is a downscaled variant of the full dataset. \n", + "\n", + "Feel free to suggest other data sets. If you find the classic MNIST data set somewhat limited, feel free to try the \n", + "MNIST-Fashion data set at for example .\n", + "\n", + "To set up the data set, the following python programs may be useful" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "4e0e1fea", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "from sklearn.datasets import fetch_openml\n", + "\n", + "# Fetch the MNIST dataset\n", + "mnist = fetch_openml('mnist_784', version=1, as_frame=False, parser='auto')\n", + "\n", + "# Extract data (features) and target (labels)\n", + "X = mnist.data\n", + "y = mnist.target" + ] + }, + { + "cell_type": "markdown", + "id": "8fe85677", + "metadata": { + "editable": true + }, + "source": [ + "You should consider scaling the data. The Pixel values in MNIST range from 0 to 255. Scaling them to a 0-1 range can improve the performance of some models. 
That is, you could implement the following scaling" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "b28318b2", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "X = X / 255.0" + ] + }, + { + "cell_type": "markdown", + "id": "97e02c71", + "metadata": { + "editable": true + }, + "source": [ + "And then perform the standard train-test splitting" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "88af355c", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "from sklearn.model_selection import train_test_split\n", + "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)" + ] + }, + { + "cell_type": "markdown", + "id": "d1f8f0ed", + "metadata": { + "editable": true + }, + "source": [ + "To measure the performance of our classification problem we will use the\n", + "so-called *accuracy* score. The accuracy is as you would expect just\n", + "the number of correctly guessed targets $t_i$ divided by the total\n", + "number of targets, that is" + ] + }, + { + "cell_type": "markdown", + "id": "554b3a48", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\text{Accuracy} = \\frac{\\sum_{i=1}^n I(t_i = y_i)}{n} ,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "77bfdd5c", + "metadata": { + "editable": true + }, + "source": [ + "where $I$ is the indicator function, $1$ if $t_i = y_i$ and $0$\n", + "otherwise if we have a binary classification problem. Here $t_i$\n", + "represents the target and $y_i$ the outputs of your FFNN code and $n$ is simply the number of targets $t_i$.\n", + "\n", + "Discuss your results and give a critical analysis of the various parameters, including hyper-parameters like the learning rates and the regularization parameter $\\lambda$, various activation functions, number of hidden layers and nodes and activation functions. \n", + "\n", + "Again, we strongly recommend that you compare your own neural Network\n", + "code for classification and pertinent results against a similar code using **Scikit-Learn** or **tensorflow/keras** or **pytorch**.\n", + "\n", + "If you have time, you can use the functionality of **scikit-learn** and compare your neural network results with those from Logistic regression. This is optional.\n", + "The weblink here compares logistic regression and FFNN using the so-called MNIST data set. You may find several useful hints and ideas from this article. Your neural network code can implement the equivalent of logistic regression by simply setting the number of hidden layers to zero and keeping just the input and the output layers. 
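For the multiclass case it may also be convenient to have small helper functions for one-hot encoding the labels, for the softmax output and for the accuracy score defined above. The sketch below assumes class labels as returned by fetch_openml (strings '0' to '9') and network outputs given as a matrix of class probabilities; the helper names are our own.

```python
import numpy as np

def one_hot(labels, n_classes=10):
    # labels come as strings from fetch_openml, so cast to integers first
    labels = np.asarray(labels, dtype=int)
    onehot = np.zeros((labels.size, n_classes))
    onehot[np.arange(labels.size), labels] = 1.0
    return onehot

def softmax(z):
    z = z - np.max(z, axis=1, keepdims=True)   # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(probabilities, onehot, eps=1e-12):
    # multiclass (Softmax) cross entropy averaged over the samples
    return -np.mean(np.sum(onehot * np.log(probabilities + eps), axis=1))

def accuracy(targets, outputs):
    # targets: class labels, outputs: network probabilities of shape (n, n_classes)
    predictions = np.argmax(outputs, axis=1)
    return np.mean(predictions == np.asarray(targets, dtype=int))
```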
\n", + "\n", + "If you wish to compare with say Logisti Regression from **scikit-learn**, the following code uses the above data set" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "eaa9e72e", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "from sklearn.linear_model import LogisticRegression\n", + "# Initialize the model\n", + "model = LogisticRegression(solver='saga', multi_class='multinomial', max_iter=1000, random_state=42)\n", + "# Train the model\n", + "model.fit(X_train, y_train)\n", + "from sklearn.metrics import accuracy_score\n", + "# Make predictions on the test set\n", + "y_pred = model.predict(X_test)\n", + "# Calculate accuracy\n", + "accuracy = accuracy_score(y_test, y_pred)\n", + "print(f\"Model Accuracy: {accuracy:.4f}\")" + ] + }, + { + "cell_type": "markdown", + "id": "c7ba883e", + "metadata": { + "editable": true + }, + "source": [ + "### Part g) Critical evaluation of the various algorithms\n", + "\n", + "After all these glorious calculations, you should now summarize the\n", + "various algorithms and come with a critical evaluation of their pros\n", + "and cons. Which algorithm works best for the regression case and which\n", + "is best for the classification case. These codes can also be part of\n", + "your final project 3, but now applied to other data sets." + ] + }, + { + "cell_type": "markdown", + "id": "595be693", + "metadata": { + "editable": true + }, + "source": [ + "## Summary of methods to implement and analyze\n", + "\n", + "**Required Implementation:**\n", + "1. Reuse the regression code and results from project 1, these will act as a benchmark for seeing how suited a neural network is for this regression task.\n", + "\n", + "2. Implement a neural network with\n", + "\n", + " * A flexible number of layers\n", + "\n", + " * A flexible number of nodes in each layer\n", + "\n", + " * A changeable activation function in each layer (Sigmoid, ReLU, LeakyReLU, as well as Linear and Softmax)\n", + "\n", + " * A changeable cost function, which will be set to MSE for regression and cross-entropy for multiple-classification\n", + "\n", + " * An optional L1 or L2 norm of the weights and biases in the cost function (only used for computing gradients, not interpretable metrics)\n", + "\n", + "3. Implement the back-propagation algorithm to compute the gradient of your neural network\n", + "\n", + "4. Reuse the implementation of Plain and Stochastic Gradient Descent from Project 1 (and adapt the code to work with the your neural network)\n", + "\n", + " * With no optimization algorithm\n", + "\n", + " * With RMS Prop\n", + "\n", + " * With ADAM\n", + "\n", + "5. Implement scaling and train-test splitting of your data, preferably using sklearn\n", + "\n", + "6. Implement and compute metrics like the MSE and Accuracy" + ] + }, + { + "cell_type": "markdown", + "id": "35138b41", + "metadata": { + "editable": true + }, + "source": [ + "### Required Analysis:\n", + "\n", + "1. Briefly show and argue for the advantages and disadvantages of the methods from Project 1.\n", + "\n", + "2. Explore and show the impact of changing the number of layers, nodes per layer, choice of activation function, and inclusion of L1 and L2 norms. Present only the most interesting results from this exploration. 2D Heatmaps will be good for this: Start with finding a well performing set of hyper-parameters, then change two at a time in a range that shows good and bad performance.\n", + "\n", + "3. 
Show and argue for the advantages and disadvantages of using a neural network for regression on your data\n", + "\n", + "4. Show and argue for the advantages and disadvantages of using a neural network for classification on your data\n", + "\n", + "5. Show and argue for the advantages and disadvantages of the different gradient methods and learning rates when training the neural network" + ] + }, + { + "cell_type": "markdown", + "id": "b18bea03", + "metadata": { + "editable": true + }, + "source": [ + "### Optional (Note that you should include at least two of these in the report):\n", + "\n", + "1. Implement Logistic Regression as simple classification model case (equivalent to a Neural Network with just the output layer)\n", + "\n", + "2. Compute the gradient of the neural network with autograd, to show that it gives the same result as your hand-written backpropagation.\n", + "\n", + "3. Compare your results with results from using a machine-learning library like pytorch (https://docs.pytorch.org/tutorials/beginner/basics/buildmodel_tutorial.html)\n", + "\n", + "4. Use a more complex classification dataset instead, like the fashion MNIST (see )\n", + "\n", + "5. Use a more complex regression dataset instead, like the two-dimensional Runge function $f(x,y)=\\left[(10x - 5)^2 + (10y - 5)^2 + 1 \\right]^{-1}$, or even more complicated two-dimensional functions (see the supplementary material of for an extensive list of two-dimensional functions). \n", + "\n", + "6. Compute and interpret a confusion matrix of your best classification model (see )" + ] + }, + { + "cell_type": "markdown", + "id": "580d8424", + "metadata": { + "editable": true + }, + "source": [ + "## Background literature\n", + "\n", + "1. The text of Michael Nielsen is highly recommended, see Nielsen's book at . It is an excellent read.\n", + "\n", + "2. Goodfellow, Bengio and Courville, Deep Learning at . Here we recommend chapters 6, 7 and 8\n", + "\n", + "3. Raschka et al. at . Here we recommend chapters 11, 12 and 13." + ] + }, + { + "cell_type": "markdown", + "id": "96f5c67e", + "metadata": { + "editable": true + }, + "source": [ + "## Introduction to numerical projects\n", + "\n", + "Here follows a brief recipe and recommendation on how to write a report for each\n", + "project.\n", + "\n", + " * Give a short description of the nature of the problem and the eventual numerical methods you have used.\n", + "\n", + " * Describe the algorithm you have used and/or developed. Here you may find it convenient to use pseudocoding. In many cases you can describe the algorithm in the program itself.\n", + "\n", + " * Include the source code of your program. Comment your program properly.\n", + "\n", + " * If possible, try to find analytic solutions, or known limits in order to test your program when developing the code.\n", + "\n", + " * Include your results either in figure form or in a table. Remember to label your results. All tables and figures should have relevant captions and labels on the axes.\n", + "\n", + " * Try to evaluate the reliabilty and numerical stability/precision of your results. If possible, include a qualitative and/or quantitative discussion of the numerical stability, eventual loss of precision etc.\n", + "\n", + " * Try to give an interpretation of you results in your answers to the problems.\n", + "\n", + " * Critique: if possible include your comments and reflections about the exercise, whether you felt you learnt something, ideas for improvements and other thoughts you've made when solving the exercise. 
We wish to keep this course at the interactive level and your comments can help us improve it.\n", + "\n", + " * Try to establish a practice where you log your work at the computerlab. You may find such a logbook very handy at later stages in your work, especially when you don't properly remember what a previous test version of your program did. Here you could also record the time spent on solving the exercise, various algorithms you may have tested or other topics which you feel worthy of mentioning." + ] + }, + { + "cell_type": "markdown", + "id": "d1bc28ba", + "metadata": { + "editable": true + }, + "source": [ + "## Format for electronic delivery of report and programs\n", + "\n", + "The preferred format for the report is a PDF file. You can also use DOC or postscript formats or as an ipython notebook file. As programming language we prefer that you choose between C/C++, Fortran2008 or Python. The following prescription should be followed when preparing the report:\n", + "\n", + " * Use Canvas to hand in your projects, log in at with your normal UiO username and password.\n", + "\n", + " * Upload **only** the report file or the link to your GitHub/GitLab or similar typo of repos! For the source code file(s) you have developed please provide us with your link to your GitHub/GitLab or similar domain. The report file should include all of your discussions and a list of the codes you have developed. Do not include library files which are available at the course homepage, unless you have made specific changes to them.\n", + "\n", + " * In your GitHub/GitLab or similar repository, please include a folder which contains selected results. These can be in the form of output from your code for a selected set of runs and input parameters.\n", + "\n", + "Finally, \n", + "we encourage you to collaborate. Optimal working groups consist of \n", + "2-3 students. You can then hand in a common report." 
+ ] + } + ], + "metadata": {}, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/doc/LectureNotes/requirements.txt b/doc/LectureNotes/requirements.txt new file mode 100644 index 000000000..54b882503 --- /dev/null +++ b/doc/LectureNotes/requirements.txt @@ -0,0 +1,93 @@ +accessible-pygments==0.0.5 +alabaster==0.7.16 +appnope==0.1.4 +asttokens==3.0.0 +attrs==25.3.0 +babel==2.17.0 +beautifulsoup4==4.13.5 +certifi==2025.8.3 +charset-normalizer==3.4.3 +click==8.2.1 +comm==0.2.3 +debugpy==1.8.16 +decorator==5.2.1 +docutils==0.21.2 +executing==2.2.1 +fastjsonschema==2.21.2 +idna==3.10 +imagesize==1.4.1 +importlib_metadata==8.7.0 +ipykernel==6.30.1 +ipython==9.5.0 +ipython_pygments_lexers==1.1.1 +jedi==0.19.2 +Jinja2==3.1.6 +jsonschema==4.25.1 +jsonschema-specifications==2025.9.1 +jupyter-book==1.0.4.post1 +jupyter-cache==1.0.1 +jupyter_client==8.6.3 +jupyter_core==5.8.1 +latexcodec==3.0.1 +linkify-it-py==2.0.3 +markdown-it-py==3.0.0 +MarkupSafe==3.0.2 +matplotlib-inline==0.1.7 +mdit-py-plugins==0.5.0 +mdurl==0.1.2 +myst-nb==1.3.0 +myst-parser==3.0.1 +nbclient==0.10.2 +nbformat==5.10.4 +nest-asyncio==1.6.0 +numpy==2.3.3 +packaging==25.0 +parso==0.8.5 +pexpect==4.9.0 +platformdirs==4.4.0 +prompt_toolkit==3.0.52 +psutil==7.0.0 +ptyprocess==0.7.0 +pure_eval==0.2.3 +pybtex==0.25.1 +pybtex-docutils==1.0.3 +pydata-sphinx-theme==0.15.4 +Pygments==2.19.2 +python-dateutil==2.9.0.post0 +PyYAML==6.0.2 +pyzmq==27.0.2 +referencing==0.36.2 +requests==2.32.5 +rpds-py==0.27.1 +setuptools==80.9.0 +six==1.17.0 +snowballstemmer==3.0.1 +soupsieve==2.8 +Sphinx==7.4.7 +sphinx-book-theme==1.1.4 +sphinx-comments==0.0.3 +sphinx-copybutton==0.5.2 +sphinx-jupyterbook-latex==1.0.0 +sphinx-multitoc-numbering==0.1.3 +sphinx-thebe==0.3.1 +sphinx-togglebutton==0.3.2 +sphinx_design==0.6.1 +sphinx_external_toc==1.0.1 +sphinxcontrib-applehelp==2.0.0 +sphinxcontrib-bibtex==2.6.5 +sphinxcontrib-devhelp==2.0.0 +sphinxcontrib-htmlhelp==2.1.0 +sphinxcontrib-jsmath==1.0.1 +sphinxcontrib-qthelp==2.0.0 +sphinxcontrib-serializinghtml==2.0.0 +SQLAlchemy==2.0.43 +stack-data==0.6.3 +tabulate==0.9.0 +tornado==6.5.2 +traitlets==5.14.3 +typing_extensions==4.15.0 +uc-micro-py==1.0.3 +urllib3==2.5.0 +wcwidth==0.2.13 +wheel==0.45.1 +zipp==3.23.0 diff --git a/doc/LectureNotes/week37.ipynb b/doc/LectureNotes/week37.ipynb new file mode 100644 index 000000000..fe89adb05 --- /dev/null +++ b/doc/LectureNotes/week37.ipynb @@ -0,0 +1,3856 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "d842e7e1", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "0cd52479", + "metadata": { + "editable": true + }, + "source": [ + "# Week 37: Gradient descent methods\n", + "**Morten Hjorth-Jensen**, Department of Physics, University of Oslo, Norway\n", + "\n", + "Date: **September 8-12, 2025**\n", + "\n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "699b6141", + "metadata": { + "editable": true + }, + "source": [ + "## Plans for week 37, lecture Monday\n", + "\n", + "**Plans and material for the lecture on Monday September 8.**\n", + "\n", + "The family of gradient descent methods\n", + "1. Plain gradient descent (constant learning rate), reminder from last week with examples using OLS and Ridge\n", + "\n", + "2. Improving gradient descent with momentum\n", + "\n", + "3. Introducing stochastic gradient descent\n", + "\n", + "4. More advanced updates of the learning rate: ADAgrad, RMSprop and ADAM\n", + "\n", + "5. [Video of Lecture](https://youtu.be/SuxK68tj-V8)\n", + "\n", + "6. 
[Whiteboard notes](https://github.com/CompPhysics/MachineLearning/blob/master/doc/HandWrittenNotes/2025/FYSSTKweek37.pdf)" + ] + }, + { + "cell_type": "markdown", + "id": "dd264b1c", + "metadata": { + "editable": true + }, + "source": [ + "## Readings and Videos:\n", + "1. Recommended: Goodfellow et al, Deep Learning, introduction to gradient descent, see sections 4.3-4.5 at and chapter 8.3-8.5 at \n", + "\n", + "2. Rashcka et al, pages 37-44 and pages 278-283 with focus on linear regression.\n", + "\n", + "3. Video on gradient descent at \n", + "\n", + "4. Video on Stochastic gradient descent at " + ] + }, + { + "cell_type": "markdown", + "id": "608927bc", + "metadata": { + "editable": true + }, + "source": [ + "## Material for lecture Monday September 8" + ] + }, + { + "cell_type": "markdown", + "id": "60640670", + "metadata": { + "editable": true + }, + "source": [ + "## Gradient descent and revisiting Ordinary Least Squares from last week\n", + "\n", + "Last week we started with linear regression as a case study for the gradient descent\n", + "methods. Linear regression is a great test case for the gradient\n", + "descent methods discussed in the lectures since it has several\n", + "desirable properties such as:\n", + "\n", + "1. An analytical solution (recall homework sets for week 35).\n", + "\n", + "2. The gradient can be computed analytically.\n", + "\n", + "3. The cost function is convex which guarantees that gradient descent converges for small enough learning rates\n", + "\n", + "We revisit an example similar to what we had in the first homework set. We have a function of the type" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "947b67ee", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import numpy as np\n", + "x = 2*np.random.rand(m,1)\n", + "y = 4+3*x+np.random.randn(m,1)" + ] + }, + { + "cell_type": "markdown", + "id": "0a787eca", + "metadata": { + "editable": true + }, + "source": [ + "with $x_i \\in [0,1] $ is chosen randomly using a uniform distribution. Additionally we have a stochastic noise chosen according to a normal distribution $\\cal {N}(0,1)$. 
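The snippet above is lifted from last week's discussion and uses a sample-size variable `m` that is not defined at this point; note also that with the factor of 2 the inputs are actually drawn uniformly from $[0,2]$. A self-contained version, with the number of data points fixed to the 100 used below, could read:

```python
import numpy as np

n = 100                                    # number of data points
x = 2 * np.random.rand(n, 1)               # uniform inputs on [0, 2]
y = 4 + 3 * x + np.random.randn(n, 1)      # linear signal plus N(0,1) noise
```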
\n", + "The linear regression model is given by" + ] + }, + { + "cell_type": "markdown", + "id": "d7e84ac7", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "h_\\theta(x) = \\boldsymbol{y} = \\theta_0 + \\theta_1 x,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "f34c217e", + "metadata": { + "editable": true + }, + "source": [ + "such that" + ] + }, + { + "cell_type": "markdown", + "id": "b145d4eb", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{y}_i = \\theta_0 + \\theta_1 x_i.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "2df6d60d", + "metadata": { + "editable": true + }, + "source": [ + "## Gradient descent example\n", + "\n", + "Let $\\mathbf{y} = (y_1,\\cdots,y_n)^T$, $\\mathbf{\\boldsymbol{y}} = (\\boldsymbol{y}_1,\\cdots,\\boldsymbol{y}_n)^T$ and $\\theta = (\\theta_0, \\theta_1)^T$\n", + "\n", + "It is convenient to write $\\mathbf{\\boldsymbol{y}} = X\\theta$ where $X \\in \\mathbb{R}^{100 \\times 2} $ is the design matrix given by (we keep the intercept here)" + ] + }, + { + "cell_type": "markdown", + "id": "1deafba0", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "X \\equiv \\begin{bmatrix}\n", + "1 & x_1 \\\\\n", + "\\vdots & \\vdots \\\\\n", + "1 & x_{100} & \\\\\n", + "\\end{bmatrix}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "520ac423", + "metadata": { + "editable": true + }, + "source": [ + "The cost/loss/risk function is given by" + ] + }, + { + "cell_type": "markdown", + "id": "48e7232b", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C(\\theta) = \\frac{1}{n}||X\\theta-\\mathbf{y}||_{2}^{2} = \\frac{1}{n}\\sum_{i=1}^{100}\\left[ (\\theta_0 + \\theta_1 x_i)^2 - 2 y_i (\\theta_0 + \\theta_1 x_i) + y_i^2\\right]\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "0194af20", + "metadata": { + "editable": true + }, + "source": [ + "and we want to find $\\theta$ such that $C(\\theta)$ is minimized." + ] + }, + { + "cell_type": "markdown", + "id": "9f58d823", + "metadata": { + "editable": true + }, + "source": [ + "## The derivative of the cost/loss function\n", + "\n", + "Computing $\\partial C(\\theta) / \\partial \\theta_0$ and $\\partial C(\\theta) / \\partial \\theta_1$ we can show that the gradient can be written as" + ] + }, + { + "cell_type": "markdown", + "id": "10129d02", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\nabla_{\\theta} C(\\theta) = \\frac{2}{n}\\begin{bmatrix} \\sum_{i=1}^{100} \\left(\\theta_0+\\theta_1x_i-y_i\\right) \\\\\n", + "\\sum_{i=1}^{100}\\left( x_i (\\theta_0+\\theta_1x_i)-y_ix_i\\right) \\\\\n", + "\\end{bmatrix} = \\frac{2}{n}X^T(X\\theta - \\mathbf{y}),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "4cd07523", + "metadata": { + "editable": true + }, + "source": [ + "where $X$ is the design matrix defined above." 
+ ] + }, + { + "cell_type": "markdown", + "id": "1bda7e01", + "metadata": { + "editable": true + }, + "source": [ + "## The Hessian matrix\n", + "The Hessian matrix of $C(\\theta)$ is given by" + ] + }, + { + "cell_type": "markdown", + "id": "aa64bdd1", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{H} \\equiv \\begin{bmatrix}\n", + "\\frac{\\partial^2 C(\\theta)}{\\partial \\theta_0^2} & \\frac{\\partial^2 C(\\theta)}{\\partial \\theta_0 \\partial \\theta_1} \\\\\n", + "\\frac{\\partial^2 C(\\theta)}{\\partial \\theta_0 \\partial \\theta_1} & \\frac{\\partial^2 C(\\theta)}{\\partial \\theta_1^2} & \\\\\n", + "\\end{bmatrix} = \\frac{2}{n}X^T X.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "3e7f4c5d", + "metadata": { + "editable": true + }, + "source": [ + "This result implies that $C(\\theta)$ is a convex function since the matrix $X^T X$ always is positive semi-definite." + ] + }, + { + "cell_type": "markdown", + "id": "79ed73a8", + "metadata": { + "editable": true + }, + "source": [ + "## Simple program\n", + "\n", + "We can now write a program that minimizes $C(\\theta)$ using the gradient descent method with a constant learning rate $\\eta$ according to" + ] + }, + { + "cell_type": "markdown", + "id": "1b70ad9b", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\theta_{k+1} = \\theta_k - \\eta \\nabla_\\theta C(\\theta_k), \\ k=0,1,\\cdots\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "2fbef92d", + "metadata": { + "editable": true + }, + "source": [ + "We can use the expression we computed for the gradient and let use a\n", + "$\\theta_0$ be chosen randomly and let $\\eta = 0.001$. Stop iterating\n", + "when $||\\nabla_\\theta C(\\theta_k) || \\leq \\epsilon = 10^{-8}$. **Note that the code below does not include the latter stop criterion**.\n", + "\n", + "And finally we can compare our solution for $\\theta$ with the analytic result given by \n", + "$\\theta= (X^TX)^{-1} X^T \\mathbf{y}$." 
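The full example follows below; since it leaves out the stopping criterion, here is a small sketch of how the iteration could be terminated once $||\nabla_\theta C(\theta_k)|| \leq \epsilon$. The cap on the number of iterations is our own safeguard.

```python
import numpy as np

n = 100
x = 2 * np.random.rand(n, 1)
y = 4 + 3 * x + np.random.randn(n, 1)
X = np.c_[np.ones((n, 1)), x]

theta = np.random.randn(2, 1)
eta = 0.001
eps = 1e-8
max_iter = 1_000_000                       # safeguard against an endless loop

for k in range(max_iter):
    gradient = (2.0 / n) * X.T @ (X @ theta - y)
    if np.linalg.norm(gradient) <= eps:    # stop when the gradient is (almost) zero
        print(f"Converged after {k} iterations")
        break
    theta -= eta * gradient

print("gradient descent:", theta.ravel())
print("analytic solution:", (np.linalg.inv(X.T @ X) @ X.T @ y).ravel())
```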
+ ] + }, + { + "cell_type": "markdown", + "id": "0728a369", + "metadata": { + "editable": true + }, + "source": [ + "## Gradient Descent Example\n", + "\n", + "Here our simple example" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "a48d43f0", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "%matplotlib inline\n", + "\n", + "\n", + "# Importing various packages\n", + "from random import random, seed\n", + "import numpy as np\n", + "import matplotlib.pyplot as plt\n", + "from mpl_toolkits.mplot3d import Axes3D\n", + "from matplotlib import cm\n", + "from matplotlib.ticker import LinearLocator, FormatStrFormatter\n", + "import sys\n", + "\n", + "# the number of datapoints\n", + "n = 100\n", + "x = 2*np.random.rand(n,1)\n", + "y = 4+3*x+np.random.randn(n,1)\n", + "\n", + "X = np.c_[np.ones((n,1)), x]\n", + "# Hessian matrix\n", + "H = (2.0/n)* X.T @ X\n", + "# Get the eigenvalues\n", + "EigValues, EigVectors = np.linalg.eig(H)\n", + "print(f\"Eigenvalues of Hessian Matrix:{EigValues}\")\n", + "\n", + "theta_linreg = np.linalg.inv(X.T @ X) @ X.T @ y\n", + "print(theta_linreg)\n", + "theta = np.random.randn(2,1)\n", + "\n", + "eta = 1.0/np.max(EigValues)\n", + "Niterations = 1000\n", + "\n", + "for iter in range(Niterations):\n", + " gradient = (2.0/n)*X.T @ (X @ theta-y)\n", + " theta -= eta*gradient\n", + "\n", + "print(theta)\n", + "xnew = np.array([[0],[2]])\n", + "xbnew = np.c_[np.ones((2,1)), xnew]\n", + "ypredict = xbnew.dot(theta)\n", + "ypredict2 = xbnew.dot(theta_linreg)\n", + "plt.plot(xnew, ypredict, \"r-\")\n", + "plt.plot(xnew, ypredict2, \"b-\")\n", + "plt.plot(x, y ,'ro')\n", + "plt.axis([0,2.0,0, 15.0])\n", + "plt.xlabel(r'$x$')\n", + "plt.ylabel(r'$y$')\n", + "plt.title(r'Gradient descent example')\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "6c1c6ed1", + "metadata": { + "editable": true + }, + "source": [ + "## Gradient descent and Ridge\n", + "\n", + "We have also discussed Ridge regression where the loss function contains a regularized term given by the $L_2$ norm of $\\theta$," + ] + }, + { + "cell_type": "markdown", + "id": "a82ce6e3", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C_{\\text{ridge}}(\\theta) = \\frac{1}{n}||X\\theta -\\mathbf{y}||^2 + \\lambda ||\\theta||^2, \\ \\lambda \\geq 0.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "cb0de7c2", + "metadata": { + "editable": true + }, + "source": [ + "In order to minimize $C_{\\text{ridge}}(\\theta)$ using GD we adjust the gradient as follows" + ] + }, + { + "cell_type": "markdown", + "id": "b76c0dea", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\nabla_\\theta C_{\\text{ridge}}(\\theta) = \\frac{2}{n}\\begin{bmatrix} \\sum_{i=1}^{100} \\left(\\theta_0+\\theta_1x_i-y_i\\right) \\\\\n", + "\\sum_{i=1}^{100}\\left( x_i (\\theta_0+\\theta_1x_i)-y_ix_i\\right) \\\\\n", + "\\end{bmatrix} + 2\\lambda\\begin{bmatrix} \\theta_0 \\\\ \\theta_1\\end{bmatrix} = 2 (\\frac{1}{n}X^T(X\\theta - \\mathbf{y})+\\lambda \\theta).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "4eeb07f6", + "metadata": { + "editable": true + }, + "source": [ + "We can easily extend our program to minimize $C_{\\text{ridge}}(\\theta)$ using gradient descent and compare with the analytical solution given by" + ] + }, + { + "cell_type": "markdown", + "id": "cc7d6c64", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\theta_{\\text{ridge}} = \\left(X^T X + n\\lambda I_{2 \\times 
2} \\right)^{-1} X^T \\mathbf{y}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "08bd65db", + "metadata": { + "editable": true + }, + "source": [ + "## The Hessian matrix for Ridge Regression\n", + "The Hessian matrix of Ridge Regression for our simple example is given by" + ] + }, + { + "cell_type": "markdown", + "id": "a1c5a4d1", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{H} \\equiv \\begin{bmatrix}\n", + "\\frac{\\partial^2 C(\\theta)}{\\partial \\theta_0^2} & \\frac{\\partial^2 C(\\theta)}{\\partial \\theta_0 \\partial \\theta_1} \\\\\n", + "\\frac{\\partial^2 C(\\theta)}{\\partial \\theta_0 \\partial \\theta_1} & \\frac{\\partial^2 C(\\theta)}{\\partial \\theta_1^2} & \\\\\n", + "\\end{bmatrix} = \\frac{2}{n}X^T X+2\\lambda\\boldsymbol{I}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "f178c97e", + "metadata": { + "editable": true + }, + "source": [ + "This implies that the Hessian matrix is positive definite, hence the stationary point is a\n", + "minimum.\n", + "Note that the Ridge cost function is convex being a sum of two convex\n", + "functions. Therefore, the stationary point is a global\n", + "minimum of this function." + ] + }, + { + "cell_type": "markdown", + "id": "3853aec7", + "metadata": { + "editable": true + }, + "source": [ + "## Program example for gradient descent with Ridge Regression" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "81740e7b", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "from random import random, seed\n", + "import numpy as np\n", + "import matplotlib.pyplot as plt\n", + "from mpl_toolkits.mplot3d import Axes3D\n", + "from matplotlib import cm\n", + "from matplotlib.ticker import LinearLocator, FormatStrFormatter\n", + "import sys\n", + "\n", + "# the number of datapoints\n", + "n = 100\n", + "x = 2*np.random.rand(n,1)\n", + "y = 4+3*x+np.random.randn(n,1)\n", + "\n", + "X = np.c_[np.ones((n,1)), x]\n", + "XT_X = X.T @ X\n", + "\n", + "#Ridge parameter lambda\n", + "lmbda = 0.001\n", + "Id = n*lmbda* np.eye(XT_X.shape[0])\n", + "\n", + "# Hessian matrix\n", + "H = (2.0/n)* XT_X+2*lmbda* np.eye(XT_X.shape[0])\n", + "# Get the eigenvalues\n", + "EigValues, EigVectors = np.linalg.eig(H)\n", + "print(f\"Eigenvalues of Hessian Matrix:{EigValues}\")\n", + "\n", + "\n", + "theta_linreg = np.linalg.inv(XT_X+Id) @ X.T @ y\n", + "print(theta_linreg)\n", + "# Start plain gradient descent\n", + "theta = np.random.randn(2,1)\n", + "\n", + "eta = 1.0/np.max(EigValues)\n", + "Niterations = 100\n", + "\n", + "for iter in range(Niterations):\n", + " gradients = 2.0/n*X.T @ (X @ (theta)-y)+2*lmbda*theta\n", + " theta -= eta*gradients\n", + "\n", + "print(theta)\n", + "ypredict = X @ theta\n", + "ypredict2 = X @ theta_linreg\n", + "plt.plot(x, ypredict, \"r-\")\n", + "plt.plot(x, ypredict2, \"b-\")\n", + "plt.plot(x, y ,'ro')\n", + "plt.axis([0,2.0,0, 15.0])\n", + "plt.xlabel(r'$x$')\n", + "plt.ylabel(r'$y$')\n", + "plt.title(r'Gradient descent example for Ridge')\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "aa1b6e08", + "metadata": { + "editable": true + }, + "source": [ + "## Using gradient descent methods, limitations\n", + "\n", + "* **Gradient descent (GD) finds local minima of our function**. Since the GD algorithm is deterministic, if it converges, it will converge to a local minimum of our cost/loss/risk function. 
Because in ML we are often dealing with extremely rugged landscapes with many local minima, this can lead to poor performance.\n", + "\n", + "* **GD is sensitive to initial conditions**. One consequence of the local nature of GD is that initial conditions matter. Depending on where one starts, one will end up at a different local minima. Therefore, it is very important to think about how one initializes the training process. This is true for GD as well as more complicated variants of GD.\n", + "\n", + "* **Gradients are computationally expensive to calculate for large datasets**. In many cases in statistics and ML, the cost/loss/risk function is a sum of terms, with one term for each data point. For example, in linear regression, $E \\propto \\sum_{i=1}^n (y_i - \\mathbf{w}^T\\cdot\\mathbf{x}_i)^2$; for logistic regression, the square error is replaced by the cross entropy. To calculate the gradient we have to sum over *all* $n$ data points. Doing this at every GD step becomes extremely computationally expensive. An ingenious solution to this, is to calculate the gradients using small subsets of the data called \"mini batches\". This has the added benefit of introducing stochasticity into our algorithm.\n", + "\n", + "* **GD is very sensitive to choices of learning rates**. GD is extremely sensitive to the choice of learning rates. If the learning rate is very small, the training process take an extremely long time. For larger learning rates, GD can diverge and give poor results. Furthermore, depending on what the local landscape looks like, we have to modify the learning rates to ensure convergence. Ideally, we would *adaptively* choose the learning rates to match the landscape.\n", + "\n", + "* **GD treats all directions in parameter space uniformly.** Another major drawback of GD is that unlike Newton's method, the learning rate for GD is the same in all directions in parameter space. For this reason, the maximum learning rate is set by the behavior of the steepest direction and this can significantly slow down training. Ideally, we would like to take large steps in flat directions and small steps in steep directions. Since we are exploring rugged landscapes where curvatures change, this requires us to keep track of not only the gradient but second derivatives. The ideal scenario would be to calculate the Hessian but this proves to be too computationally expensive. \n", + "\n", + "* GD can take exponential time to escape saddle points, even with random initialization. As we mentioned, GD is extremely sensitive to initial condition since it determines the particular local minimum GD would eventually reach. However, even with a good initialization scheme, through the introduction of randomness, GD can still take exponential time to escape saddle points." + ] + }, + { + "cell_type": "markdown", + "id": "d1b9be1a", + "metadata": { + "editable": true + }, + "source": [ + "## Momentum based GD\n", + "\n", + "We discuss here some simple examples where we introduce what is called\n", + "'memory'about previous steps, or what is normally called momentum\n", + "gradient descent.\n", + "For the mathematical details, see whiteboad notes from lecture on September 8, 2025." 
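For completeness, one common way of writing the momentum update, in our notation with a momentum parameter $\gamma \in [0,1)$ and learning rate $\eta$, is

$$
\mathbf{v}_{t+1} = \gamma \mathbf{v}_t + \eta \nabla_\theta C(\theta_t), \qquad \theta_{t+1} = \theta_t - \mathbf{v}_{t+1}.
$$

Setting $\gamma=0$ recovers plain gradient descent. The code example in the next section implements exactly this rule, with the variable `change` playing the role of $\mathbf{v}_t$ and `momentum` that of $\gamma$.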
+ ] + }, + { + "cell_type": "markdown", + "id": "2e1267e6", + "metadata": { + "editable": true + }, + "source": [ + "## Improving gradient descent with momentum" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "494e82a7", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "from numpy import asarray\n", + "from numpy import arange\n", + "from numpy.random import rand\n", + "from numpy.random import seed\n", + "from matplotlib import pyplot\n", + " \n", + "# objective function\n", + "def objective(x):\n", + "\treturn x**2.0\n", + " \n", + "# derivative of objective function\n", + "def derivative(x):\n", + "\treturn x * 2.0\n", + " \n", + "# gradient descent algorithm\n", + "def gradient_descent(objective, derivative, bounds, n_iter, step_size):\n", + "\t# track all solutions\n", + "\tsolutions, scores = list(), list()\n", + "\t# generate an initial point\n", + "\tsolution = bounds[:, 0] + rand(len(bounds)) * (bounds[:, 1] - bounds[:, 0])\n", + "\t# run the gradient descent\n", + "\tfor i in range(n_iter):\n", + "\t\t# calculate gradient\n", + "\t\tgradient = derivative(solution)\n", + "\t\t# take a step\n", + "\t\tsolution = solution - step_size * gradient\n", + "\t\t# evaluate candidate point\n", + "\t\tsolution_eval = objective(solution)\n", + "\t\t# store solution\n", + "\t\tsolutions.append(solution)\n", + "\t\tscores.append(solution_eval)\n", + "\t\t# report progress\n", + "\t\tprint('>%d f(%s) = %.5f' % (i, solution, solution_eval))\n", + "\treturn [solutions, scores]\n", + " \n", + "# seed the pseudo random number generator\n", + "seed(4)\n", + "# define range for input\n", + "bounds = asarray([[-1.0, 1.0]])\n", + "# define the total iterations\n", + "n_iter = 30\n", + "# define the step size\n", + "step_size = 0.1\n", + "# perform the gradient descent search\n", + "solutions, scores = gradient_descent(objective, derivative, bounds, n_iter, step_size)\n", + "# sample input range uniformly at 0.1 increments\n", + "inputs = arange(bounds[0,0], bounds[0,1]+0.1, 0.1)\n", + "# compute targets\n", + "results = objective(inputs)\n", + "# create a line plot of input vs result\n", + "pyplot.plot(inputs, results)\n", + "# plot the solutions found\n", + "pyplot.plot(solutions, scores, '.-', color='red')\n", + "# show the plot\n", + "pyplot.show()" + ] + }, + { + "cell_type": "markdown", + "id": "46858c7c", + "metadata": { + "editable": true + }, + "source": [ + "## Same code but now with momentum gradient descent" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "6a917123", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "from numpy import asarray\n", + "from numpy import arange\n", + "from numpy.random import rand\n", + "from numpy.random import seed\n", + "from matplotlib import pyplot\n", + " \n", + "# objective function\n", + "def objective(x):\n", + "\treturn x**2.0\n", + " \n", + "# derivative of objective function\n", + "def derivative(x):\n", + "\treturn x * 2.0\n", + " \n", + "# gradient descent algorithm\n", + "def gradient_descent(objective, derivative, bounds, n_iter, step_size, momentum):\n", + "\t# track all solutions\n", + "\tsolutions, scores = list(), list()\n", + "\t# generate an initial point\n", + "\tsolution = bounds[:, 0] + rand(len(bounds)) * (bounds[:, 1] - bounds[:, 0])\n", + "\t# keep track of the change\n", + "\tchange = 0.0\n", + "\t# run the gradient descent\n", + "\tfor i in range(n_iter):\n", + "\t\t# calculate gradient\n", + 
"\t\tgradient = derivative(solution)\n", + "\t\t# calculate update\n", + "\t\tnew_change = step_size * gradient + momentum * change\n", + "\t\t# take a step\n", + "\t\tsolution = solution - new_change\n", + "\t\t# save the change\n", + "\t\tchange = new_change\n", + "\t\t# evaluate candidate point\n", + "\t\tsolution_eval = objective(solution)\n", + "\t\t# store solution\n", + "\t\tsolutions.append(solution)\n", + "\t\tscores.append(solution_eval)\n", + "\t\t# report progress\n", + "\t\tprint('>%d f(%s) = %.5f' % (i, solution, solution_eval))\n", + "\treturn [solutions, scores]\n", + " \n", + "# seed the pseudo random number generator\n", + "seed(4)\n", + "# define range for input\n", + "bounds = asarray([[-1.0, 1.0]])\n", + "# define the total iterations\n", + "n_iter = 30\n", + "# define the step size\n", + "step_size = 0.1\n", + "# define momentum\n", + "momentum = 0.3\n", + "# perform the gradient descent search with momentum\n", + "solutions, scores = gradient_descent(objective, derivative, bounds, n_iter, step_size, momentum)\n", + "# sample input range uniformly at 0.1 increments\n", + "inputs = arange(bounds[0,0], bounds[0,1]+0.1, 0.1)\n", + "# compute targets\n", + "results = objective(inputs)\n", + "# create a line plot of input vs result\n", + "pyplot.plot(inputs, results)\n", + "# plot the solutions found\n", + "pyplot.plot(solutions, scores, '.-', color='red')\n", + "# show the plot\n", + "pyplot.show()" + ] + }, + { + "cell_type": "markdown", + "id": "361b2aa8", + "metadata": { + "editable": true + }, + "source": [ + "## Overview video on Stochastic Gradient Descent (SGD)\n", + "\n", + "[What is Stochastic Gradient Descent](https://www.youtube.com/watch?v=vMh0zPT0tLI&ab_channel=StatQuestwithJoshStarmer)\n", + "There are several reasons for using stochastic gradient descent. Some of these are:\n", + "\n", + "1. Efficiency: Updates weights more frequently using a single or a small batch of samples, which speeds up convergence.\n", + "\n", + "2. Hopefully avoid Local Minima\n", + "\n", + "3. Memory Usage: Requires less memory compared to computing gradients for the entire dataset." + ] + }, + { + "cell_type": "markdown", + "id": "2dacb8ef", + "metadata": { + "editable": true + }, + "source": [ + "## Batches and mini-batches\n", + "\n", + "In gradient descent we compute the cost function and its gradient for all data points we have.\n", + "\n", + "In large-scale applications such as the [ILSVRC challenge](https://www.image-net.org/challenges/LSVRC/), the\n", + "training data can have on order of millions of examples. Hence, it\n", + "seems wasteful to compute the full cost function over the entire\n", + "training set in order to perform only a single parameter update. A\n", + "very common approach to addressing this challenge is to compute the\n", + "gradient over batches of the training data. For example, a typical batch could contain some thousand examples from\n", + "an entire training set of several millions. This batch is then used to\n", + "perform a parameter update." + ] + }, + { + "cell_type": "markdown", + "id": "59c9add4", + "metadata": { + "editable": true + }, + "source": [ + "## Pros and cons\n", + "\n", + "1. Speed: SGD is faster than gradient descent because it uses only one training example per iteration, whereas gradient descent requires the entire dataset. This speed advantage becomes more significant as the size of the dataset increases.\n", + "\n", + "2. 
Convergence: Gradient descent has a more predictable convergence behaviour because it uses the average gradient of the entire dataset. In contrast, SGD’s convergence behaviour can be more erratic due to its random sampling of individual training examples.\n", + "\n", + "3. Memory: Gradient descent requires more memory than SGD because it must store the entire dataset for each iteration. SGD only needs to store the current training example, making it more memory-efficient." + ] + }, + { + "cell_type": "markdown", + "id": "a5168cc9", + "metadata": { + "editable": true + }, + "source": [ + "## Convergence rates\n", + "\n", + "1. Stochastic Gradient Descent has a faster convergence rate due to the use of single training examples in each iteration.\n", + "\n", + "2. Gradient Descent as a slower convergence rate, as it uses the entire dataset for each iteration." + ] + }, + { + "cell_type": "markdown", + "id": "47321307", + "metadata": { + "editable": true + }, + "source": [ + "## Accuracy\n", + "\n", + "In general, stochastic Gradient Descent is Less accurate than gradient\n", + "descent, as it calculates the gradient on single examples, which may\n", + "not accurately represent the overall dataset. Gradient Descent is\n", + "more accurate because it uses the average gradient calculated over the\n", + "entire dataset.\n", + "\n", + "There are other disadvantages to using SGD. The main drawback is that\n", + "its convergence behaviour can be more erratic due to the random\n", + "sampling of individual training examples. This can lead to less\n", + "accurate results, as the algorithm may not converge to the true\n", + "minimum of the cost function. Additionally, the learning rate, which\n", + "determines the step size of each update to the model’s parameters,\n", + "must be carefully chosen to ensure convergence.\n", + "\n", + "It is however the method of choice in deep learning algorithms where\n", + "SGD is often used in combination with other optimization techniques,\n", + "such as momentum or adaptive learning rates" + ] + }, + { + "cell_type": "markdown", + "id": "96f44d6b", + "metadata": { + "editable": true + }, + "source": [ + "## Stochastic Gradient Descent (SGD)\n", + "\n", + "In stochastic gradient descent, the extreme case is the case where we\n", + "have only one batch, that is we include the whole data set.\n", + "\n", + "This process is called Stochastic Gradient\n", + "Descent (SGD) (or also sometimes on-line gradient descent). This is\n", + "relatively less common to see because in practice due to vectorized\n", + "code optimizations it can be computationally much more efficient to\n", + "evaluate the gradient for 100 examples, than the gradient for one\n", + "example 100 times. Even though SGD technically refers to using a\n", + "single example at a time to evaluate the gradient, you will hear\n", + "people use the term SGD even when referring to mini-batch gradient\n", + "descent (i.e. mentions of MGD for “Minibatch Gradient Descent”, or BGD\n", + "for “Batch gradient descent” are rare to see), where it is usually\n", + "assumed that mini-batches are used. The size of the mini-batch is a\n", + "hyperparameter but it is not very common to cross-validate or bootstrap it. It is\n", + "usually based on memory constraints (if any), or set to some value,\n", + "e.g. 32, 64 or 128. 
We use powers of 2 in practice because many\n", + "vectorized operation implementations work faster when their inputs are\n", + "sized in powers of 2.\n", + "\n", + "In our notes with SGD we mean stochastic gradient descent with mini-batches." + ] + }, + { + "cell_type": "markdown", + "id": "898ef421", + "metadata": { + "editable": true + }, + "source": [ + "## Stochastic Gradient Descent\n", + "\n", + "Stochastic gradient descent (SGD) and variants thereof address some of\n", + "the shortcomings of the Gradient descent method discussed above.\n", + "\n", + "The underlying idea of SGD comes from the observation that the cost\n", + "function, which we want to minimize, can almost always be written as a\n", + "sum over $n$ data points $\\{\\mathbf{x}_i\\}_{i=1}^n$," + ] + }, + { + "cell_type": "markdown", + "id": "4e827950", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C(\\mathbf{\\theta}) = \\sum_{i=1}^n c_i(\\mathbf{x}_i,\n", + "\\mathbf{\\theta}).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "05e99546", + "metadata": { + "editable": true + }, + "source": [ + "## Computation of gradients\n", + "\n", + "This in turn means that the gradient can be\n", + "computed as a sum over $i$-gradients" + ] + }, + { + "cell_type": "markdown", + "id": "b92afe6c", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\nabla_\\theta C(\\mathbf{\\theta}) = \\sum_i^n \\nabla_\\theta c_i(\\mathbf{x}_i,\n", + "\\mathbf{\\theta}).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b20a4aca", + "metadata": { + "editable": true + }, + "source": [ + "Stochasticity/randomness is introduced by only taking the\n", + "gradient on a subset of the data called minibatches. If there are $n$\n", + "data points and the size of each minibatch is $M$, there will be $n/M$\n", + "minibatches. We denote these minibatches by $B_k$ where\n", + "$k=1,\\cdots,n/M$." + ] + }, + { + "cell_type": "markdown", + "id": "7884cc0d", + "metadata": { + "editable": true + }, + "source": [ + "## SGD example\n", + "As an example, suppose we have $10$ data points $(\\mathbf{x}_1,\\cdots, \\mathbf{x}_{10})$ \n", + "and we choose to have $M=5$ minibathces,\n", + "then each minibatch contains two data points. In particular we have\n", + "$B_1 = (\\mathbf{x}_1,\\mathbf{x}_2), \\cdots, B_5 =\n", + "(\\mathbf{x}_9,\\mathbf{x}_{10})$. 
Note that if you choose $M=1$ you\n", + "have only a single batch with all data points and on the other extreme,\n", + "you may choose $M=n$ resulting in a minibatch for each datapoint, i.e\n", + "$B_k = \\mathbf{x}_k$.\n", + "\n", + "The idea is now to approximate the gradient by replacing the sum over\n", + "all data points with a sum over the data points in one the minibatches\n", + "picked at random in each gradient descent step" + ] + }, + { + "cell_type": "markdown", + "id": "392aeed0", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\nabla_{\\theta}\n", + "C(\\mathbf{\\theta}) = \\sum_{i=1}^n \\nabla_\\theta c_i(\\mathbf{x}_i,\n", + "\\mathbf{\\theta}) \\rightarrow \\sum_{i \\in B_k}^n \\nabla_\\theta\n", + "c_i(\\mathbf{x}_i, \\mathbf{\\theta}).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "04581249", + "metadata": { + "editable": true + }, + "source": [ + "## The gradient step\n", + "\n", + "Thus a gradient descent step now looks like" + ] + }, + { + "cell_type": "markdown", + "id": "d21077a4", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\theta_{j+1} = \\theta_j - \\eta_j \\sum_{i \\in B_k}^n \\nabla_\\theta c_i(\\mathbf{x}_i,\n", + "\\mathbf{\\theta})\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b4bed668", + "metadata": { + "editable": true + }, + "source": [ + "where $k$ is picked at random with equal\n", + "probability from $[1,n/M]$. An iteration over the number of\n", + "minibathces (n/M) is commonly referred to as an epoch. Thus it is\n", + "typical to choose a number of epochs and for each epoch iterate over\n", + "the number of minibatches, as exemplified in the code below." + ] + }, + { + "cell_type": "markdown", + "id": "9c15b282", + "metadata": { + "editable": true + }, + "source": [ + "## Simple example code" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "602bda4c", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import numpy as np \n", + "\n", + "n = 100 #100 datapoints \n", + "M = 5 #size of each minibatch\n", + "m = int(n/M) #number of minibatches\n", + "n_epochs = 10 #number of epochs\n", + "\n", + "j = 0\n", + "for epoch in range(1,n_epochs+1):\n", + " for i in range(m):\n", + " k = np.random.randint(m) #Pick the k-th minibatch at random\n", + " #Compute the gradient using the data in minibatch Bk\n", + " #Compute new suggestion for \n", + " j += 1" + ] + }, + { + "cell_type": "markdown", + "id": "332831a7", + "metadata": { + "editable": true + }, + "source": [ + "Taking the gradient only on a subset of the data has two important\n", + "benefits. First, it introduces randomness which decreases the chance\n", + "that our opmization scheme gets stuck in a local minima. Second, if\n", + "the size of the minibatches are small relative to the number of\n", + "datapoints ($M < n$), the computation of the gradient is much\n", + "cheaper since we sum over the datapoints in the $k-th$ minibatch and not\n", + "all $n$ datapoints." + ] + }, + { + "cell_type": "markdown", + "id": "187eb27c", + "metadata": { + "editable": true + }, + "source": [ + "## When do we stop?\n", + "\n", + "A natural question is when do we stop the search for a new minimum?\n", + "One possibility is to compute the full gradient after a given number\n", + "of epochs and check if the norm of the gradient is smaller than some\n", + "threshold and stop if true. 
However, the condition that the gradient\n", + "is zero is valid also for local minima, so this would only tell us\n", + "that we are close to a local/global minimum. However, we could also\n", + "evaluate the cost function at this point, store the result and\n", + "continue the search. If the test kicks in at a later stage we can\n", + "compare the values of the cost function and keep the $\\theta$ that\n", + "gave the lowest value." + ] + }, + { + "cell_type": "markdown", + "id": "8ddbdbb5", + "metadata": { + "editable": true + }, + "source": [ + "## Slightly different approach\n", + "\n", + "Another approach is to let the step length $\\eta_j$ depend on the\n", + "number of epochs in such a way that it becomes very small after a\n", + "reasonable time such that we do not move at all. Such approaches are\n", + "also called scaling. There are many such ways to [scale the learning\n", + "rate](https://towardsdatascience.com/gradient-descent-the-learning-rate-and-the-importance-of-feature-scaling-6c0b416596e1)\n", + "and [discussions here](https://www.jmlr.org/papers/volume23/20-1258/20-1258.pdf). See\n", + "also\n", + "\n", + "for a discussion of different scaling functions for the learning rate." + ] + }, + { + "cell_type": "markdown", + "id": "35ea8e21", + "metadata": { + "editable": true + }, + "source": [ + "## Time decay rate\n", + "\n", + "As an example, let $e = 0,1,2,3,\\cdots$ denote the current epoch and let $t_0, t_1 > 0$ be two fixed numbers. Furthermore, let $t = e \\cdot m + i$ where $m$ is the number of minibatches and $i=0,\\cdots,m-1$. Then the function $$\\eta_j(t; t_0, t_1) = \\frac{t_0}{t+t_1} $$ goes to zero as the number of epochs gets large. I.e. we start with a step length $\\eta_j (0; t_0, t_1) = t_0/t_1$ which decays in *time* $t$.\n", + "\n", + "In this way we can fix the number of epochs, compute $\\theta$ and\n", + "evaluate the cost function at the end. Repeating the computation will\n", + "give a different result since the scheme is random by design. Then we\n", + "pick the final $\\theta$ that gives the lowest value of the cost\n", + "function." + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "77a60fcd", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import numpy as np \n", + "\n", + "def step_length(t,t0,t1):\n", + " return t0/(t+t1)\n", + "\n", + "n = 100 #100 datapoints \n", + "M = 5 #size of each minibatch\n", + "m = int(n/M) #number of minibatches\n", + "n_epochs = 500 #number of epochs\n", + "t0 = 1.0\n", + "t1 = 10\n", + "\n", + "eta_j = t0/t1\n", + "j = 0\n", + "for epoch in range(1,n_epochs+1):\n", + " for i in range(m):\n", + " k = np.random.randint(m) #Pick the k-th minibatch at random\n", + " #Compute the gradient using the data in minibatch Bk\n", + " #Compute new suggestion for theta\n", + " t = epoch*m+i\n", + " eta_j = step_length(t,t0,t1)\n", + " j += 1\n", + "\n", + "print(\"eta_j after %d epochs: %g\" % (n_epochs,eta_j))" + ] + }, + { + "cell_type": "markdown", + "id": "b030b80c", + "metadata": { + "editable": true + }, + "source": [ + "## Code with a Number of Minibatches which varies\n", + "\n", + "In the code here we vary the number of mini-batches." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "9bdf875b", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# Importing various packages\n", + "from math import exp, sqrt\n", + "from random import random, seed\n", + "import numpy as np\n", + "import matplotlib.pyplot as plt\n", + "\n", + "n = 100\n", + "x = 2*np.random.rand(n,1)\n", + "y = 4+3*x+np.random.randn(n,1)\n", + "\n", + "X = np.c_[np.ones((n,1)), x]\n", + "XT_X = X.T @ X\n", + "theta_linreg = np.linalg.inv(X.T @ X) @ (X.T @ y)\n", + "print(\"Own inversion\")\n", + "print(theta_linreg)\n", + "# Hessian matrix\n", + "H = (2.0/n)* XT_X\n", + "EigValues, EigVectors = np.linalg.eig(H)\n", + "print(f\"Eigenvalues of Hessian Matrix:{EigValues}\")\n", + "\n", + "theta = np.random.randn(2,1)\n", + "eta = 1.0/np.max(EigValues)\n", + "Niterations = 1000\n", + "\n", + "\n", + "for iter in range(Niterations):\n", + " gradients = 2.0/n*X.T @ ((X @ theta)-y)\n", + " theta -= eta*gradients\n", + "print(\"theta from own gd\")\n", + "print(theta)\n", + "\n", + "xnew = np.array([[0],[2]])\n", + "Xnew = np.c_[np.ones((2,1)), xnew]\n", + "ypredict = Xnew.dot(theta)\n", + "ypredict2 = Xnew.dot(theta_linreg)\n", + "\n", + "n_epochs = 50\n", + "M = 5 #size of each minibatch\n", + "m = int(n/M) #number of minibatches\n", + "t0, t1 = 5, 50\n", + "\n", + "def learning_schedule(t):\n", + " return t0/(t+t1)\n", + "\n", + "theta = np.random.randn(2,1)\n", + "\n", + "for epoch in range(n_epochs):\n", + "# Can you figure out a better way of setting up the contributions to each batch?\n", + " for i in range(m):\n", + " random_index = M*np.random.randint(m)\n", + " xi = X[random_index:random_index+M]\n", + " yi = y[random_index:random_index+M]\n", + " gradients = (2.0/M)* xi.T @ ((xi @ theta)-yi)\n", + " eta = learning_schedule(epoch*m+i)\n", + " theta = theta - eta*gradients\n", + "print(\"theta from own sdg\")\n", + "print(theta)\n", + "\n", + "plt.plot(xnew, ypredict, \"r-\")\n", + "plt.plot(xnew, ypredict2, \"b-\")\n", + "plt.plot(x, y ,'ro')\n", + "plt.axis([0,2.0,0, 15.0])\n", + "plt.xlabel(r'$x$')\n", + "plt.ylabel(r'$y$')\n", + "plt.title(r'Random numbers ')\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "365cebd9", + "metadata": { + "editable": true + }, + "source": [ + "## Replace or not\n", + "\n", + "In the above code, we have use replacement in setting up the\n", + "mini-batches. The discussion\n", + "[here](https://sebastianraschka.com/faq/docs/sgd-methods.html) may be\n", + "useful." + ] + }, + { + "cell_type": "markdown", + "id": "e7c9011a", + "metadata": { + "editable": true + }, + "source": [ + "## SGD vs Full-Batch GD: Convergence Speed and Memory Comparison" + ] + }, + { + "cell_type": "markdown", + "id": "f1c85da0", + "metadata": { + "editable": true + }, + "source": [ + "### Theoretical Convergence Speed and convex optimization\n", + "\n", + "Consider minimizing an empirical cost function" + ] + }, + { + "cell_type": "markdown", + "id": "66df0f80", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C(\\theta) =\\frac{1}{N}\\sum_{i=1}^N l_i(\\theta),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "9f02b845", + "metadata": { + "editable": true + }, + "source": [ + "where each $l_i(\\theta)$ is a\n", + "differentiable loss term. 
Gradient Descent (GD) updates parameters\n", + "using the full gradient $\\nabla C(\\theta)$, while Stochastic Gradient\n", + "Descent (SGD) uses a single sample (or mini-batch) gradient $\\nabla\n", + "l_i(\\theta)$ selected at random. In equation form, one GD step is:" + ] + }, + { + "cell_type": "markdown", + "id": "21997f1a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\theta_{t+1} = \\theta_t-\\eta \\nabla C(\\theta_t) =\\theta_t -\\eta \\frac{1}{N}\\sum_{i=1}^N \\nabla l_i(\\theta_t),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "cdefe165", + "metadata": { + "editable": true + }, + "source": [ + "whereas one SGD step is:" + ] + }, + { + "cell_type": "markdown", + "id": "ac200d56", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\theta_{t+1} = \\theta_t -\\eta \\nabla l_{i_t}(\\theta_t),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "eb3edfb3", + "metadata": { + "editable": true + }, + "source": [ + "with $i_t$ randomly chosen. On smooth convex problems, GD and SGD both\n", + "converge to the global minimum, but their rates differ. GD can take\n", + "larger, more stable steps since it uses the exact gradient, achieving\n", + "an error that decreases on the order of $O(1/t)$ per iteration for\n", + "convex objectives (and even exponentially fast for strongly convex\n", + "cases). In contrast, plain SGD has more variance in each step, leading\n", + "to sublinear convergence in expectation – typically $O(1/\\sqrt{t})$\n", + "for general convex objectives (\\thetaith appropriate diminishing step\n", + "sizes) . Intuitively, GD’s trajectory is smoother and more\n", + "predictable, while SGD’s path oscillates due to noise but costs far\n", + "less per iteration, enabling many more updates in the same time." + ] + }, + { + "cell_type": "markdown", + "id": "7fe05c0d", + "metadata": { + "editable": true + }, + "source": [ + "### Strongly Convex Case\n", + "\n", + "If $C(\\theta)$ is strongly convex and $L$-smooth (so GD enjoys linear\n", + "convergence), the gap $C(\\theta_t)-C(\\theta^*)$ for GD shrinks as" + ] + }, + { + "cell_type": "markdown", + "id": "2ae403f1", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C(\\theta_t) - C(\\theta^* ) \\le \\Big(1 - \\frac{\\mu}{L}\\Big)^t [C(\\theta_0)-C(\\theta^*)],\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "44272171", + "metadata": { + "editable": true + }, + "source": [ + "a geometric (linear) convergence per iteration . Achieving an\n", + "$\\epsilon$-accurate solution thus takes on the order of\n", + "$\\log(1/\\epsilon)$ iterations for GD. However, each GD iteration costs\n", + "$O(N)$ gradient evaluations. SGD cannot exploit strong convexity to\n", + "obtain a linear rate – instead, with a properly decaying step size\n", + "(e.g. $\\eta_t = \\frac{1}{\\mu t}$) or iterate averaging, SGD attains an\n", + "$O(1/t)$ convergence rate in expectation . For example, one result\n", + "of Moulines and Bach 2011, see shows that with $\\eta_t = \\Theta(1/t)$," + ] + }, + { + "cell_type": "markdown", + "id": "9cde29ef", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathbb{E}[C(\\theta_t) - C(\\theta^*)] = O(1/t),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "9b77f20e", + "metadata": { + "editable": true + }, + "source": [ + "for strongly convex, smooth $F$ . This $1/t$ rate is slower per\n", + "iteration than GD’s exponential decay, but each SGD iteration is $N$\n", + "times cheaper. 
In fact, to reach error $\\epsilon$, plain SGD needs on\n", + "the order of $T=O(1/\\epsilon)$ iterations (sub-linear convergence),\n", + "while GD needs $O(\\log(1/\\epsilon))$ iterations. When accounting for\n", + "cost-per-iteration, GD requires $O(N \\log(1/\\epsilon))$ total gradient\n", + "computations versus SGD’s $O(1/\\epsilon)$ single-sample\n", + "computations. In large-scale regimes (huge $N$), SGD can be\n", + "faster in wall-clock time because $N \\log(1/\\epsilon)$ may far exceed\n", + "$1/\\epsilon$ for reasonable accuracy levels. In other words,\n", + "with millions of data points, one epoch of GD (one full gradient) is\n", + "extremely costly, whereas SGD can make $N$ cheap updates in the time\n", + "GD makes one – often yielding a good solution faster in practice, even\n", + "though SGD’s asymptotic error decays more slowly. As one lecture\n", + "succinctly puts it: “SGD can be super effective in terms of iteration\n", + "cost and memory, but SGD is slow to converge and can’t adapt to strong\n", + "convexity” . Thus, the break-even point depends on $N$ and the desired\n", + "accuracy: for moderate accuracy on very large $N$, SGD’s cheaper\n", + "updates win; for extremely high precision (very small $\\epsilon$) on a\n", + "modest $N$, GD’s fast convergence per step can be advantageous." + ] + }, + { + "cell_type": "markdown", + "id": "4479bd97", + "metadata": { + "editable": true + }, + "source": [ + "### Non-Convex Problems\n", + "\n", + "In non-convex optimization (e.g. deep neural networks), neither GD nor\n", + "SGD guarantees global minima, but SGD often displays faster progress\n", + "in finding useful minima. Theoretical results here are weaker, usually\n", + "showing convergence to a stationary point $\\theta$ ($|\\nabla C|$ is\n", + "small) in expectation. For example, GD might require $O(1/\\epsilon^2)$\n", + "iterations to ensure $|\\nabla C(\\theta)| < \\epsilon$, and SGD typically has\n", + "similar polynomial complexity (often worse due to gradient\n", + "noise). However, a noteworthy difference is that SGD’s stochasticity\n", + "can help escape saddle points or poor local minima. Random gradient\n", + "fluctuations act like implicit noise, helping the iterate “jump” out\n", + "of flat saddle regions where full-batch GD could stagnate . In fact,\n", + "research has shown that adding noise to GD can guarantee escaping\n", + "saddle points in polynomial time, and the inherent noise in SGD often\n", + "serves this role. Empirically, this means SGD can sometimes find a\n", + "lower loss basin faster, whereas full-batch GD might get “stuck” near\n", + "saddle points or need a very small learning rate to navigate complex\n", + "error surfaces . Overall, in modern high-dimensional machine learning,\n", + "SGD (or mini-batch SGD) is the workhorse for large non-convex problems\n", + "because it converges to good solutions much faster in practice,\n", + "despite the lack of a linear convergence guarantee. Full-batch GD is\n", + "rarely used on large neural networks, as it would require tiny steps\n", + "to avoid divergence and is extremely slow per iteration ." + ] + }, + { + "cell_type": "markdown", + "id": "31ea65c9", + "metadata": { + "editable": true + }, + "source": [ + "## Memory Usage and Scalability\n", + "\n", + "A major advantage of SGD is its memory efficiency in handling large\n", + "datasets. 
Full-batch GD requires access to the entire training set for\n", + "each iteration, which often means the whole dataset (or a large\n", + "subset) must reside in memory to compute $\\nabla C(\\theta)$ . This results\n", + "in memory usage that scales linearly with the dataset size $N$. For\n", + "instance, if each training sample is large (e.g. high-dimensional\n", + "features), computing a full gradient may require storing a substantial\n", + "portion of the data or all intermediate gradients until they are\n", + "aggregated. In contrast, SGD needs only a single (or a small\n", + "mini-batch of) training example(s) in memory at any time . The\n", + "algorithm processes one sample (or mini-batch) at a time and\n", + "immediately updates the model, discarding that sample before moving to\n", + "the next. This streaming approach means that memory footprint is\n", + "essentially independent of $N$ (apart from storing the model\n", + "parameters themselves). As one source notes, gradient descent\n", + "“requires more memory than SGD” because it “must store the entire\n", + "dataset for each iteration,” whereas SGD “only needs to store the\n", + "current training example” . In practical terms, if you have a dataset\n", + "of size, say, 1 million examples, full-batch GD would need memory for\n", + "all million every step, while SGD could be implemented to load just\n", + "one example at a time – a crucial benefit if data are too large to fit\n", + "in RAM or GPU memory. This scalability makes SGD suitable for\n", + "large-scale learning: as long as you can stream data from disk, SGD\n", + "can handle arbitrarily large datasets with fixed memory. In fact, SGD\n", + "“does not need to remember which examples were visited” in the past,\n", + "allowing it to run in an online fashion on infinite data streams\n", + ". Full-batch GD, on the other hand, would require multiple passes\n", + "through a giant dataset per update (or a complex distributed memory\n", + "system), which is often infeasible.\n", + "\n", + "There is also a secondary memory effect: computing a full-batch\n", + "gradient in deep learning requires storing all intermediate\n", + "activations for backpropagation across the entire batch. A very large\n", + "batch (approaching the full dataset) might exhaust GPU memory due to\n", + "the need to hold activation gradients for thousands or millions of\n", + "examples simultaneously. SGD/minibatches mitigate this by splitting\n", + "the workload – e.g. with a mini-batch of size 32 or 256, memory use\n", + "stays bounded, whereas a full-batch (size = $N$) forward/backward pass\n", + "could not even be executed if $N$ is huge. Techniques like gradient\n", + "accumulation exist to simulate large-batch GD by summing many\n", + "small-batch gradients – but these still process data in manageable\n", + "chunks to avoid memory overflow. In summary, memory complexity for GD\n", + "grows with $N$, while for SGD it remains $O(1)$ w.r.t. dataset size\n", + "(only the model and perhaps a mini-batch reside in memory) . This is a\n", + "key reason why batch GD “does not scale” to very large data and why\n", + "virtually all large-scale machine learning algorithms rely on\n", + "stochastic or mini-batch methods." + ] + }, + { + "cell_type": "markdown", + "id": "3f3fe4c4", + "metadata": { + "editable": true + }, + "source": [ + "## Empirical Evidence: Convergence Time and Memory in Practice\n", + "\n", + "Empirical studies strongly support the theoretical trade-offs\n", + "above. 
In large-scale machine learning tasks, SGD often converges to a\n", + "good solution much faster in wall-clock time than full-batch GD, and\n", + "it uses far less memory. For example, Bottou & Bousquet (2008)\n", + "analyzed learning time under a fixed computational budget and\n", + "concluded that when data is abundant, it’s better to use a faster\n", + "(even if less precise) optimization method to process more examples in\n", + "the same time . This analysis showed that for large-scale problems,\n", + "processing more data with SGD yields lower error than spending the\n", + "time to do exact (batch) optimization on fewer data . In other words,\n", + "if you have a time budget, it’s often optimal to accept slightly\n", + "slower convergence per step (as with SGD) in exchange for being able\n", + "to use many more training samples in that time. This phenomenon is\n", + "borne out by experiments:" + ] + }, + { + "cell_type": "markdown", + "id": "69d08c69", + "metadata": { + "editable": true + }, + "source": [ + "### Deep Neural Networks\n", + "\n", + "In modern deep learning, full-batch GD is so slow that it is rarely\n", + "attempted; instead, mini-batch SGD is standard. A recent study\n", + "demonstrated that it is possible to train a ResNet-50 on ImageNet\n", + "using full-batch gradient descent, but it required careful tuning\n", + "(e.g. gradient clipping, tiny learning rates) and vast computational\n", + "resources – and even then, each full-batch update was extremely\n", + "expensive.\n", + "\n", + "Using a huge batch\n", + "(closer to full GD) tends to slow down convergence if the learning\n", + "rate is not scaled up, and often encounters optimization difficulties\n", + "(plateaus) that small batches avoid.\n", + "Empirically, small or medium\n", + "batch SGD finds minima in fewer clock hours because it can rapidly\n", + "loop over the data with gradient noise aiding exploration." + ] + }, + { + "cell_type": "markdown", + "id": "4e2b549d", + "metadata": { + "editable": true + }, + "source": [ + "### Memory constraints\n", + "\n", + "From a memory standpoint, practitioners note that batch GD becomes\n", + "infeasible on large data. For example, if one tried to do full-batch\n", + "training on a dataset that doesn’t fit in RAM or GPU memory, the\n", + "program would resort to heavy disk I/O or simply crash. SGD\n", + "circumvents this by processing mini-batches. Even in cases where data\n", + "does fit in memory, using a full batch can spike memory usage due to\n", + "storing all gradients. One empirical observation is that mini-batch\n", + "training has a “lower, fluctuating usage pattern” of memory, whereas\n", + "full-batch loading “quickly consumes memory (often exceeding limits)”\n", + ". This is especially relevant for graph neural networks or other\n", + "models where a “batch” may include a huge chunk of a graph: full-batch\n", + "gradient computation can exhaust GPU memory, whereas mini-batch\n", + "methods keep memory usage manageable .\n", + "\n", + "In summary, SGD converges faster than full-batch GD in terms of actual\n", + "training time for large-scale problems, provided we measure\n", + "convergence as reaching a good-enough solution. Theoretical bounds\n", + "show SGD needs more iterations, but because it performs many more\n", + "updates per unit time (and requires far less memory), it often\n", + "achieves lower loss in a given time frame than GD. 
Full-batch GD might\n", + "take slightly fewer iterations in theory, but each iteration is so\n", + "costly that it is “slower… especially for large datasets” . Meanwhile,\n", + "memory scaling strongly favors SGD: GD’s memory cost grows with\n", + "dataset size, making it impractical beyond a point, whereas SGD’s\n", + "memory use is modest and mostly constant w.r.t. $N$ . These\n", + "differences have made SGD (and mini-batch variants) the de facto\n", + "choice for training large machine learning models, from logistic\n", + "regression on millions of examples to deep neural networks with\n", + "billions of parameters. The consensus in both research and practice is\n", + "that for large-scale or high-dimensional tasks, SGD-type methods\n", + "converge quicker per unit of computation and handle memory constraints\n", + "better than standard full-batch gradient descent ." + ] + }, + { + "cell_type": "markdown", + "id": "48c2661e", + "metadata": { + "editable": true + }, + "source": [ + "## Second moment of the gradient\n", + "\n", + "In stochastic gradient descent, with and without momentum, we still\n", + "have to specify a schedule for tuning the learning rates $\\eta_t$\n", + "as a function of time. As discussed in the context of Newton's\n", + "method, this presents a number of dilemmas. The learning rate is\n", + "limited by the steepest direction which can change depending on the\n", + "current position in the landscape. To circumvent this problem, ideally\n", + "our algorithm would keep track of curvature and take large steps in\n", + "shallow, flat directions and small steps in steep, narrow directions.\n", + "Second-order methods accomplish this by calculating or approximating\n", + "the Hessian and normalizing the learning rate by the\n", + "curvature. However, this is very computationally expensive for\n", + "extremely large models. Ideally, we would like to be able to\n", + "adaptively change the step size to match the landscape without paying\n", + "the steep computational price of calculating or approximating\n", + "Hessians.\n", + "\n", + "During the last decade a number of methods have been introduced that accomplish\n", + "this by tracking not only the gradient, but also the second moment of\n", + "the gradient. These methods include AdaGrad, AdaDelta, Root Mean Squared Propagation (RMS-Prop), and\n", + "[ADAM](https://arxiv.org/abs/1412.6980)." + ] + }, + { + "cell_type": "markdown", + "id": "a2106298", + "metadata": { + "editable": true + }, + "source": [ + "## Challenge: Choosing a Fixed Learning Rate\n", + "A fixed $\\eta$ is hard to get right:\n", + "1. If $\\eta$ is too large, the updates can overshoot the minimum, causing oscillations or divergence\n", + "\n", + "2. If $\\eta$ is too small, convergence is very slow (many iterations to make progress)\n", + "\n", + "In practice, one often uses trial-and-error or schedules (decaying $\\eta$ over time) to find a workable balance.\n", + "For a function with steep directions and flat directions, a single global $\\eta$ may be inappropriate:\n", + "1. Steep coordinates require a smaller step size to avoid oscillation.\n", + "\n", + "2. Flat/shallow coordinates could use a larger step to speed up progress.\n", + "\n", + "3. This issue is pronounced in high-dimensional problems with **sparse or varying-scale features** – we need a method to adjust step sizesper feature." 
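+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "ad0c1e2f",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "To make the trade-off concrete, the small sketch below runs plain gradient\n",
+    "descent with a single fixed $\eta$ on a hypothetical, deliberately\n",
+    "ill-conditioned quadratic cost $C(\theta)=\frac{1}{2}\left(100\theta_1^2+\theta_2^2\right)$\n",
+    "(a toy example chosen only for illustration). The steep $\theta_1$-direction\n",
+    "forces $\eta$ below $2/100$, and with such a small $\eta$ the flat\n",
+    "$\theta_2$-direction barely moves."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "ad0c1e30",
+   "metadata": {
+    "collapsed": false,
+    "editable": true
+   },
+   "outputs": [],
+   "source": [
+    "import numpy as np\n",
+    "\n",
+    "# Hypothetical toy cost C(theta) = 0.5*(100*theta_1^2 + theta_2^2)\n",
+    "# Its gradient is (100*theta_1, theta_2)\n",
+    "def gradient(theta):\n",
+    "    return np.array([100.0*theta[0], theta[1]])\n",
+    "\n",
+    "def fixed_eta_gd(eta, n_iter=100):\n",
+    "    theta = np.array([1.0, 1.0])\n",
+    "    for _ in range(n_iter):\n",
+    "        theta = theta - eta*gradient(theta)\n",
+    "    return theta\n",
+    "\n",
+    "# eta above 2/100: the steep theta_1-direction oscillates and diverges\n",
+    "print(\"eta = 0.021:\", fixed_eta_gd(0.021))\n",
+    "# eta safely below 2/100: stable, but theta_2 has only moved from 1.0 to about 0.6\n",
+    "print(\"eta = 0.005:\", fixed_eta_gd(0.005))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "ad0c1e31",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "No single fixed $\eta$ works well for both directions at once, which is\n",
+    "precisely the motivation for the per-parameter, history-dependent step\n",
+    "sizes derived next."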
+ ] + }, + { + "cell_type": "markdown", + "id": "477a053c", + "metadata": { + "editable": true + }, + "source": [ + "## Motivation for Adaptive Step Sizes\n", + "\n", + "1. Instead of a fixed global $\\eta$, use an **adaptive learning rate** for each parameter that depends on the history of gradients.\n", + "\n", + "2. Parameters that have large accumulated gradient magnitude should get smaller steps (they've been changing a lot), whereas parameters with small or infrequent gradients can have larger relative steps.\n", + "\n", + "3. This is especially useful for sparse features: Rarely active features accumulate little gradient, so their learning rate remains comparatively high, ensuring they are not neglected\n", + "\n", + "4. Conversely, frequently active features accumulate large gradient sums, and their learning rate automatically decreases, preventing too-large updates\n", + "\n", + "5. Several algorithms implement this idea (AdaGrad, RMSProp, AdaDelta, Adam, etc.). We will derive **AdaGrad**, one of the first adaptive methods." + ] + }, + { + "cell_type": "markdown", + "id": "f0924df8", + "metadata": { + "editable": true + }, + "source": [ + "## AdaGrad algorithm, taken from [Goodfellow et al](https://www.deeplearningbook.org/contents/optimization.html)\n", + "\n", + "\n", + "\n", + "\n", + "

    Figure 1: The AdaGrad algorithm (Goodfellow et al., Deep Learning, chapter 8).

    \n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "7743f26d", + "metadata": { + "editable": true + }, + "source": [ + "## Derivation of the AdaGrad Algorithm\n", + "\n", + "**Accumulating Gradient History.**\n", + "\n", + "1. AdaGrad maintains a running sum of squared gradients for each parameter (coordinate)\n", + "\n", + "2. Let $g_t = \\nabla C_{i_t}(x_t)$ be the gradient at step $t$ (or a subgradient for nondifferentiable cases).\n", + "\n", + "3. Initialize $r_0 = 0$ (an all-zero vector in $\\mathbb{R}^d$).\n", + "\n", + "4. At each iteration $t$, update the accumulation:" + ] + }, + { + "cell_type": "markdown", + "id": "ef4b5d6a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "r_t = r_{t-1} + g_t \\circ g_t,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "927e2738", + "metadata": { + "editable": true + }, + "source": [ + "1. Here $g_t \\circ g_t$ denotes element-wise square of the gradient vector. $g_t^{(j)} = g_{t-1}^{(j)} + (g_{t,j})^2$ for each parameter $j$.\n", + "\n", + "2. We can view $H_t = \\mathrm{diag}(r_t)$ as a diagonal matrix of past squared gradients. Initially $H_0 = 0$." + ] + }, + { + "cell_type": "markdown", + "id": "1753de13", + "metadata": { + "editable": true + }, + "source": [ + "## AdaGrad Update Rule Derivation\n", + "\n", + "We scale the gradient by the inverse square root of the accumulated matrix $H_t$. The AdaGrad update at step $t$ is:" + ] + }, + { + "cell_type": "markdown", + "id": "0db67ba3", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\theta_{t+1} =\\theta_t - \\eta H_t^{-1/2} g_t,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "7831e978", + "metadata": { + "editable": true + }, + "source": [ + "where $H_t^{-1/2}$ is the diagonal matrix with entries $(r_{t}^{(1)})^{-1/2}, \\dots, (r_{t}^{(d)})^{-1/2}$\n", + "In coordinates, this means each parameter $j$ has an individual step size:" + ] + }, + { + "cell_type": "markdown", + "id": "92a7758a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\theta_{t+1,j} =\\theta_{t,j} -\\frac{\\eta}{\\sqrt{r_{t,j}}}g_{t,j}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "df62a4ff", + "metadata": { + "editable": true + }, + "source": [ + "In practice we add a small constant $\\epsilon$ in the denominator for numerical stability to avoid division by zero:" + ] + }, + { + "cell_type": "markdown", + "id": "c8a2b948", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\theta_{t+1,j}= \\theta_{t,j}-\\frac{\\eta}{\\sqrt{\\epsilon + r_{t,j}}}g_{t,j}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "3f269e80", + "metadata": { + "editable": true + }, + "source": [ + "Equivalently, the effective learning rate for parameter $j$ at time $t$ is $\\displaystyle \\alpha_{t,j} = \\frac{\\eta}{\\sqrt{\\epsilon + r_{t,j}}}$. This decreases over time as $r_{t,j}$ grows." + ] + }, + { + "cell_type": "markdown", + "id": "f4ec584c", + "metadata": { + "editable": true + }, + "source": [ + "## AdaGrad Properties\n", + "\n", + "1. AdaGrad automatically tunes the step size for each parameter. Parameters with more *volatile or large gradients* get smaller steps, and those with *small or infrequent gradients* get relatively larger steps\n", + "\n", + "2. No manual schedule needed: The accumulation $r_t$ keeps increasing (or stays the same if gradient is zero), so step sizes $\\eta/\\sqrt{r_t}$ are non-increasing. 
This has a similar effect to a learning rate schedule, but individualized per coordinate.\n", + "\n", + "3. Sparse data benefit: For very sparse features, $r_{t,j}$ grows slowly, so that feature’s parameter retains a higher learning rate for longer, allowing it to make significant updates when it does get a gradient signal\n", + "\n", + "4. Convergence: In convex optimization, AdaGrad can be shown to achieve a sub-linear convergence rate comparable to the best fixed learning rate tuned for the problem\n", + "\n", + "It effectively reduces the need to tune $\\eta$ by hand.\n", + "1. Limitations: Because $r_t$ accumulates without bound, AdaGrad’s learning rates can become extremely small over long training, potentially slowing progress. (Later variants like RMSProp, AdaDelta, Adam address this by modifying the accumulation rule.)" + ] + }, + { + "cell_type": "markdown", + "id": "4b741016", + "metadata": { + "editable": true + }, + "source": [ + "## RMSProp: Adaptive Learning Rates\n", + "\n", + "Addresses AdaGrad’s diminishing learning rate issue.\n", + "Uses a decaying average of squared gradients (instead of a cumulative sum):" + ] + }, + { + "cell_type": "markdown", + "id": "76108e75", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "v_t = \\rho v_{t-1} + (1-\\rho)(\\nabla C(\\theta_t))^2,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "4c6a3353", + "metadata": { + "editable": true + }, + "source": [ + "with $\\rho$ typically $0.9$ (or $0.99$).\n", + "1. Update: $\\theta_{t+1} = \\theta_t - \\frac{\\eta}{\\sqrt{v_t + \\epsilon}} \\nabla C(\\theta_t)$.\n", + "\n", + "2. Recent gradients have more weight, so $v_t$ adapts to the current landscape.\n", + "\n", + "3. Avoids AdaGrad’s “infinite memory” problem – learning rate does not continuously decay to zero.\n", + "\n", + "RMSProp was first proposed in lecture notes by Geoff Hinton, 2012 - unpublished.)" + ] + }, + { + "cell_type": "markdown", + "id": "3e0a76ae", + "metadata": { + "editable": true + }, + "source": [ + "## RMSProp algorithm, taken from [Goodfellow et al](https://www.deeplearningbook.org/contents/optimization.html)\n", + "\n", + "\n", + "\n", + "\n", + "

    Figure 1: The RMSProp algorithm (Goodfellow et al., Deep Learning, chapter 8).

    \n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "fa5fd82e", + "metadata": { + "editable": true + }, + "source": [ + "## Adam Optimizer\n", + "\n", + "Why combine Momentum and RMSProp? Motivation for Adam: Adaptive Moment Estimation (Adam) was introduced by Kingma an Ba (2014) to combine the benefits of momentum and RMSProp.\n", + "\n", + "1. Fast convergence by smoothing gradients (accelerates in long-term gradient direction).\n", + "\n", + "2. Adaptive rates (RMSProp): Per-dimension learning rate scaling for stability (handles different feature scales, sparse gradients).\n", + "\n", + "3. Adam uses both: maintains moving averages of both first moment (gradients) and second moment (squared gradients)\n", + "\n", + "4. Additionally, includes a mechanism to correct the bias in these moving averages (crucial in early iterations)\n", + "\n", + "**Result**: Adam is robust, achieves faster convergence with less tuning, and often outperforms SGD (with momentum) in practice." + ] + }, + { + "cell_type": "markdown", + "id": "89cda2f6", + "metadata": { + "editable": true + }, + "source": [ + "## [ADAM optimizer](https://arxiv.org/abs/1412.6980)\n", + "\n", + "In [ADAM](https://arxiv.org/abs/1412.6980), we keep a running average of\n", + "both the first and second moment of the gradient and use this\n", + "information to adaptively change the learning rate for different\n", + "parameters. The method is efficient when working with large\n", + "problems involving lots data and/or parameters. It is a combination of the\n", + "gradient descent with momentum algorithm and the RMSprop algorithm\n", + "discussed above." + ] + }, + { + "cell_type": "markdown", + "id": "69310c2b", + "metadata": { + "editable": true + }, + "source": [ + "## Why Combine Momentum and RMSProp?\n", + "\n", + "1. Momentum: Fast convergence by smoothing gradients (accelerates in long-term gradient direction).\n", + "\n", + "2. Adaptive rates (RMSProp): Per-dimension learning rate scaling for stability (handles different feature scales, sparse gradients).\n", + "\n", + "3. Adam uses both: maintains moving averages of both first moment (gradients) and second moment (squared gradients)\n", + "\n", + "4. Additionally, includes a mechanism to correct the bias in these moving averages (crucial in early iterations)\n", + "\n", + "Result: Adam is robust, achieves faster convergence with less tuning, and often outperforms SGD (with momentum) in practice" + ] + }, + { + "cell_type": "markdown", + "id": "7d6b8734", + "metadata": { + "editable": true + }, + "source": [ + "## Adam: Exponential Moving Averages (Moments)\n", + "Adam maintains two moving averages at each time step $t$ for each parameter $w$:\n", + "**First moment (mean) $m_t$.**\n", + "\n", + "The Momentum term" + ] + }, + { + "cell_type": "markdown", + "id": "106ce6bf", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "m_t = \\beta_1m_{t-1} + (1-\\beta_1)\\, \\nabla C(\\theta_t),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "3ba64fd6", + "metadata": { + "editable": true + }, + "source": [ + "**Second moment (uncentered variance) $v_t$.**\n", + "\n", + "The RMS term" + ] + }, + { + "cell_type": "markdown", + "id": "d2e1a9ee", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "v_t = \\beta_2v_{t-1} + (1-\\beta_2)(\\nabla C(\\theta_t))^2,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "00aae51f", + "metadata": { + "editable": true + }, + "source": [ + "with typical $\\beta_1 = 0.9$, $\\beta_2 = 0.999$. 
Initialize $m_0 = 0$, $v_0 = 0$.\n", + "\n", + " These are **biased** estimators of the true first and second moment of the gradients, especially at the start (since $m_0,v_0$ are zero)" + ] + }, + { + "cell_type": "markdown", + "id": "38adfadd", + "metadata": { + "editable": true + }, + "source": [ + "## Adam: Bias Correction\n", + "To counteract initialization bias in $m_t, v_t$, Adam computes bias-corrected estimates" + ] + }, + { + "cell_type": "markdown", + "id": "484156fb", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\hat{m}_t = \\frac{m_t}{1 - \\beta_1^t}, \\qquad \\hat{v}_t = \\frac{v_t}{1 - \\beta_2^t}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "45d1d0c2", + "metadata": { + "editable": true + }, + "source": [ + "* When $t$ is small, $1-\\beta_i^t \\approx 0$, so $\\hat{m}_t, \\hat{v}_t$ significantly larger than raw $m_t, v_t$, compensating for the initial zero bias.\n", + "\n", + "* As $t$ increases, $1-\\beta_i^t \\to 1$, and $\\hat{m}_t, \\hat{v}_t$ converge to $m_t, v_t$.\n", + "\n", + "* Bias correction is important for Adam’s stability in early iterations" + ] + }, + { + "cell_type": "markdown", + "id": "e62d5568", + "metadata": { + "editable": true + }, + "source": [ + "## Adam: Update Rule Derivation\n", + "Finally, Adam updates parameters using the bias-corrected moments:" + ] + }, + { + "cell_type": "markdown", + "id": "3eb873c1", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\theta_{t+1} =\\theta_t -\\frac{\\alpha}{\\sqrt{\\hat{v}_t} + \\epsilon}\\hat{m}_t,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "fc1129f6", + "metadata": { + "editable": true + }, + "source": [ + "where $\\epsilon$ is a small constant (e.g. $10^{-8}$) to prevent division by zero.\n", + "Breaking it down:\n", + "1. Compute gradient $\\nabla C(\\theta_t)$.\n", + "\n", + "2. Update first moment $m_t$ and second moment $v_t$ (exponential moving averages).\n", + "\n", + "3. Bias-correct: $\\hat{m}_t = m_t/(1-\\beta_1^t)$, $\\; \\hat{v}_t = v_t/(1-\\beta_2^t)$.\n", + "\n", + "4. Compute step: $\\Delta \\theta_t = \\frac{\\hat{m}_t}{\\sqrt{\\hat{v}_t} + \\epsilon}$.\n", + "\n", + "5. Update parameters: $\\theta_{t+1} = \\theta_t - \\alpha\\, \\Delta \\theta_t$.\n", + "\n", + "This is the Adam update rule as given in the original paper." + ] + }, + { + "cell_type": "markdown", + "id": "6f15ce48", + "metadata": { + "editable": true + }, + "source": [ + "## Adam vs. AdaGrad and RMSProp\n", + "\n", + "1. AdaGrad: Uses per-coordinate scaling like Adam, but no momentum. Tends to slow down too much due to cumulative history (no forgetting)\n", + "\n", + "2. RMSProp: Uses moving average of squared gradients (like Adam’s $v_t$) to maintain adaptive learning rates, but does not include momentum or bias-correction.\n", + "\n", + "3. Adam: Effectively RMSProp + Momentum + Bias-correction\n", + "\n", + " * Momentum ($m_t$) provides acceleration and smoother convergence.\n", + "\n", + " * Adaptive $v_t$ scaling moderates the step size per dimension.\n", + "\n", + " * Bias correction (absent in AdaGrad/RMSProp) ensures robust estimates early on.\n", + "\n", + "In practice, Adam often yields faster convergence and better tuning stability than RMSProp or AdaGrad alone" + ] + }, + { + "cell_type": "markdown", + "id": "44cb65e2", + "metadata": { + "editable": true + }, + "source": [ + "## Adaptivity Across Dimensions\n", + "\n", + "1. 
Adam adapts the step size \\emph{per coordinate}: parameters with larger gradient variance get smaller effective steps, those with smaller or sparse gradients get larger steps.\n", + "\n", + "2. This per-dimension adaptivity is inherited from AdaGrad/RMSProp and helps handle ill-conditioned or sparse problems.\n", + "\n", + "3. Meanwhile, momentum (first moment) allows Adam to continue making progress even if gradients become small or noisy, by leveraging accumulated direction." + ] + }, + { + "cell_type": "markdown", + "id": "e3862c40", + "metadata": { + "editable": true + }, + "source": [ + "## ADAM algorithm, taken from [Goodfellow et al](https://www.deeplearningbook.org/contents/optimization.html)\n", + "\n", + "\n", + "\n", + "\n", + "

    Figure 1: The ADAM algorithm (Goodfellow et al., Deep Learning, chapter 8).

    \n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "c4aa2b35", + "metadata": { + "editable": true + }, + "source": [ + "## Algorithms and codes for Adagrad, RMSprop and Adam\n", + "\n", + "The algorithms we have implemented are well described in the text by [Goodfellow, Bengio and Courville, chapter 8](https://www.deeplearningbook.org/contents/optimization.html).\n", + "\n", + "The codes which implement these algorithms are discussed below here." + ] + }, + { + "cell_type": "markdown", + "id": "01de27d3", + "metadata": { + "editable": true + }, + "source": [ + "## Practical tips\n", + "\n", + "* **Randomize the data when making mini-batches**. It is always important to randomly shuffle the data when forming mini-batches. Otherwise, the gradient descent method can fit spurious correlations resulting from the order in which data is presented.\n", + "\n", + "* **Transform your inputs**. Learning becomes difficult when our landscape has a mixture of steep and flat directions. One simple trick for minimizing these situations is to standardize the data by subtracting the mean and normalizing the variance of input variables. Whenever possible, also decorrelate the inputs. To understand why this is helpful, consider the case of linear regression. It is easy to show that for the squared error cost function, the Hessian of the cost function is just the correlation matrix between the inputs. Thus, by standardizing the inputs, we are ensuring that the landscape looks homogeneous in all directions in parameter space. Since most deep networks can be viewed as linear transformations followed by a non-linearity at each layer, we expect this intuition to hold beyond the linear case.\n", + "\n", + "* **Monitor the out-of-sample performance.** Always monitor the performance of your model on a validation set (a small portion of the training data that is held out of the training process to serve as a proxy for the test set. If the validation error starts increasing, then the model is beginning to overfit. Terminate the learning process. This *early stopping* significantly improves performance in many settings.\n", + "\n", + "* **Adaptive optimization methods don't always have good generalization.** Recent studies have shown that adaptive methods such as ADAM, RMSPorp, and AdaGrad tend to have poor generalization compared to SGD or SGD with momentum, particularly in the high-dimensional limit (i.e. the number of parameters exceeds the number of data points). Although it is not clear at this stage why these methods perform so well in training deep neural networks, simpler procedures like properly-tuned SGD may work as well or better in these applications." + ] + }, + { + "cell_type": "markdown", + "id": "78a1a601", + "metadata": { + "editable": true + }, + "source": [ + "## Sneaking in automatic differentiation using Autograd\n", + "\n", + "In the examples here we take the liberty of sneaking in automatic\n", + "differentiation (without having discussed the mathematics). In\n", + "project 1 you will write the gradients as discussed above, that is\n", + "hard-coding the gradients. By introducing automatic differentiation\n", + "via the library **autograd**, which is now replaced by **JAX**, we have\n", + "more flexibility in setting up alternative cost functions.\n", + "\n", + "The\n", + "first example shows results with ordinary leats squares." 
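+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "be10aa01",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "As a minimal sketch of what automatic differentiation gives us (assuming\n",
+    "the **autograd** package is installed; with **JAX** the corresponding call\n",
+    "is `jax.grad`), `grad` takes a Python function that returns a scalar and\n",
+    "returns a new function that evaluates its derivative:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "be10aa02",
+   "metadata": {
+    "collapsed": false,
+    "editable": true
+   },
+   "outputs": [],
+   "source": [
+    "import autograd.numpy as anp\n",
+    "from autograd import grad\n",
+    "\n",
+    "# A simple scalar function; autograd builds its derivative for us\n",
+    "def f(x):\n",
+    "    return anp.sin(x)**2 + x**3\n",
+    "\n",
+    "df = grad(f)\n",
+    "\n",
+    "x0 = 1.5\n",
+    "print(\"autograd :\", df(x0))\n",
+    "print(\"analytic :\", 2*anp.sin(x0)*anp.cos(x0) + 3*x0**2)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "be10aa03",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "The same mechanism is used in the cell below, where `grad(CostOLS)` returns\n",
+    "the gradient of the OLS cost function with respect to $\theta$."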
+ ] + }, + { + "cell_type": "code", + "execution_count": 9, + "id": "c721352d", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# Using Autograd to calculate gradients for OLS\n", + "from random import random, seed\n", + "import numpy as np\n", + "import autograd.numpy as np\n", + "import matplotlib.pyplot as plt\n", + "from autograd import grad\n", + "\n", + "def CostOLS(theta):\n", + " return (1.0/n)*np.sum((y-X @ theta)**2)\n", + "\n", + "n = 100\n", + "x = 2*np.random.rand(n,1)\n", + "y = 4+3*x+np.random.randn(n,1)\n", + "\n", + "X = np.c_[np.ones((n,1)), x]\n", + "XT_X = X.T @ X\n", + "theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)\n", + "print(\"Own inversion\")\n", + "print(theta_linreg)\n", + "# Hessian matrix\n", + "H = (2.0/n)* XT_X\n", + "EigValues, EigVectors = np.linalg.eig(H)\n", + "print(f\"Eigenvalues of Hessian Matrix:{EigValues}\")\n", + "\n", + "theta = np.random.randn(2,1)\n", + "eta = 1.0/np.max(EigValues)\n", + "Niterations = 1000\n", + "# define the gradient\n", + "training_gradient = grad(CostOLS)\n", + "\n", + "for iter in range(Niterations):\n", + " gradients = training_gradient(theta)\n", + " theta -= eta*gradients\n", + "print(\"theta from own gd\")\n", + "print(theta)\n", + "\n", + "xnew = np.array([[0],[2]])\n", + "Xnew = np.c_[np.ones((2,1)), xnew]\n", + "ypredict = Xnew.dot(theta)\n", + "ypredict2 = Xnew.dot(theta_linreg)\n", + "\n", + "plt.plot(xnew, ypredict, \"r-\")\n", + "plt.plot(xnew, ypredict2, \"b-\")\n", + "plt.plot(x, y ,'ro')\n", + "plt.axis([0,2.0,0, 15.0])\n", + "plt.xlabel(r'$x$')\n", + "plt.ylabel(r'$y$')\n", + "plt.title(r'Random numbers ')\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "e36cec47", + "metadata": { + "editable": true + }, + "source": [ + "## Same code but now with momentum gradient descent" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "id": "fc5df7eb", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# Using Autograd to calculate gradients for OLS\n", + "from random import random, seed\n", + "import numpy as np\n", + "import autograd.numpy as np\n", + "import matplotlib.pyplot as plt\n", + "from autograd import grad\n", + "\n", + "def CostOLS(theta):\n", + " return (1.0/n)*np.sum((y-X @ theta)**2)\n", + "\n", + "n = 100\n", + "x = 2*np.random.rand(n,1)\n", + "y = 4+3*x#+np.random.randn(n,1)\n", + "\n", + "X = np.c_[np.ones((n,1)), x]\n", + "XT_X = X.T @ X\n", + "theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)\n", + "print(\"Own inversion\")\n", + "print(theta_linreg)\n", + "# Hessian matrix\n", + "H = (2.0/n)* XT_X\n", + "EigValues, EigVectors = np.linalg.eig(H)\n", + "print(f\"Eigenvalues of Hessian Matrix:{EigValues}\")\n", + "\n", + "theta = np.random.randn(2,1)\n", + "eta = 1.0/np.max(EigValues)\n", + "Niterations = 30\n", + "\n", + "# define the gradient\n", + "training_gradient = grad(CostOLS)\n", + "\n", + "for iter in range(Niterations):\n", + " gradients = training_gradient(theta)\n", + " theta -= eta*gradients\n", + " print(iter,gradients[0],gradients[1])\n", + "print(\"theta from own gd\")\n", + "print(theta)\n", + "\n", + "# Now improve with momentum gradient descent\n", + "change = 0.0\n", + "delta_momentum = 0.3\n", + "for iter in range(Niterations):\n", + " # calculate gradient\n", + " gradients = training_gradient(theta)\n", + " # calculate update\n", + " new_change = eta*gradients+delta_momentum*change\n", + " # take a step\n", + " theta -= new_change\n", + " # save the 
change\n", + " change = new_change\n", + " print(iter,gradients[0],gradients[1])\n", + "print(\"theta from own gd wth momentum\")\n", + "print(theta)" + ] + }, + { + "cell_type": "markdown", + "id": "0b27af70", + "metadata": { + "editable": true + }, + "source": [ + "## Including Stochastic Gradient Descent with Autograd\n", + "\n", + "In this code we include the stochastic gradient descent approach\n", + "discussed above. Note here that we specify which argument we are\n", + "taking the derivative with respect to when using **autograd**." + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "id": "adef9763", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# Using Autograd to calculate gradients using SGD\n", + "# OLS example\n", + "from random import random, seed\n", + "import numpy as np\n", + "import autograd.numpy as np\n", + "import matplotlib.pyplot as plt\n", + "from autograd import grad\n", + "\n", + "# Note change from previous example\n", + "def CostOLS(y,X,theta):\n", + " return np.sum((y-X @ theta)**2)\n", + "\n", + "n = 100\n", + "x = 2*np.random.rand(n,1)\n", + "y = 4+3*x+np.random.randn(n,1)\n", + "\n", + "X = np.c_[np.ones((n,1)), x]\n", + "XT_X = X.T @ X\n", + "theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)\n", + "print(\"Own inversion\")\n", + "print(theta_linreg)\n", + "# Hessian matrix\n", + "H = (2.0/n)* XT_X\n", + "EigValues, EigVectors = np.linalg.eig(H)\n", + "print(f\"Eigenvalues of Hessian Matrix:{EigValues}\")\n", + "\n", + "theta = np.random.randn(2,1)\n", + "eta = 1.0/np.max(EigValues)\n", + "Niterations = 1000\n", + "\n", + "# Note that we request the derivative wrt third argument (theta, 2 here)\n", + "training_gradient = grad(CostOLS,2)\n", + "\n", + "for iter in range(Niterations):\n", + " gradients = (1.0/n)*training_gradient(y, X, theta)\n", + " theta -= eta*gradients\n", + "print(\"theta from own gd\")\n", + "print(theta)\n", + "\n", + "xnew = np.array([[0],[2]])\n", + "Xnew = np.c_[np.ones((2,1)), xnew]\n", + "ypredict = Xnew.dot(theta)\n", + "ypredict2 = Xnew.dot(theta_linreg)\n", + "\n", + "plt.plot(xnew, ypredict, \"r-\")\n", + "plt.plot(xnew, ypredict2, \"b-\")\n", + "plt.plot(x, y ,'ro')\n", + "plt.axis([0,2.0,0, 15.0])\n", + "plt.xlabel(r'$x$')\n", + "plt.ylabel(r'$y$')\n", + "plt.title(r'Random numbers ')\n", + "plt.show()\n", + "\n", + "n_epochs = 50\n", + "M = 5 #size of each minibatch\n", + "m = int(n/M) #number of minibatches\n", + "t0, t1 = 5, 50\n", + "def learning_schedule(t):\n", + " return t0/(t+t1)\n", + "\n", + "theta = np.random.randn(2,1)\n", + "\n", + "for epoch in range(n_epochs):\n", + "# Can you figure out a better way of setting up the contributions to each batch?\n", + " for i in range(m):\n", + " random_index = M*np.random.randint(m)\n", + " xi = X[random_index:random_index+M]\n", + " yi = y[random_index:random_index+M]\n", + " gradients = (1.0/M)*training_gradient(yi, xi, theta)\n", + " eta = learning_schedule(epoch*m+i)\n", + " theta = theta - eta*gradients\n", + "print(\"theta from own sdg\")\n", + "print(theta)" + ] + }, + { + "cell_type": "markdown", + "id": "310fe5b2", + "metadata": { + "editable": true + }, + "source": [ + "## Same code but now with momentum gradient descent" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "id": "bcf65acf", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# Using Autograd to calculate gradients using SGD\n", + "# OLS example\n", + "from random import random, seed\n", 
+ "import numpy as np\n", + "import autograd.numpy as np\n", + "import matplotlib.pyplot as plt\n", + "from autograd import grad\n", + "\n", + "# Note change from previous example\n", + "def CostOLS(y,X,theta):\n", + " return np.sum((y-X @ theta)**2)\n", + "\n", + "n = 100\n", + "x = 2*np.random.rand(n,1)\n", + "y = 4+3*x+np.random.randn(n,1)\n", + "\n", + "X = np.c_[np.ones((n,1)), x]\n", + "XT_X = X.T @ X\n", + "theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)\n", + "print(\"Own inversion\")\n", + "print(theta_linreg)\n", + "# Hessian matrix\n", + "H = (2.0/n)* XT_X\n", + "EigValues, EigVectors = np.linalg.eig(H)\n", + "print(f\"Eigenvalues of Hessian Matrix:{EigValues}\")\n", + "\n", + "theta = np.random.randn(2,1)\n", + "eta = 1.0/np.max(EigValues)\n", + "Niterations = 100\n", + "\n", + "# Note that we request the derivative wrt third argument (theta, 2 here)\n", + "training_gradient = grad(CostOLS,2)\n", + "\n", + "for iter in range(Niterations):\n", + " gradients = (1.0/n)*training_gradient(y, X, theta)\n", + " theta -= eta*gradients\n", + "print(\"theta from own gd\")\n", + "print(theta)\n", + "\n", + "\n", + "n_epochs = 50\n", + "M = 5 #size of each minibatch\n", + "m = int(n/M) #number of minibatches\n", + "t0, t1 = 5, 50\n", + "def learning_schedule(t):\n", + " return t0/(t+t1)\n", + "\n", + "theta = np.random.randn(2,1)\n", + "\n", + "change = 0.0\n", + "delta_momentum = 0.3\n", + "\n", + "for epoch in range(n_epochs):\n", + " for i in range(m):\n", + " random_index = M*np.random.randint(m)\n", + " xi = X[random_index:random_index+M]\n", + " yi = y[random_index:random_index+M]\n", + " gradients = (1.0/M)*training_gradient(yi, xi, theta)\n", + " eta = learning_schedule(epoch*m+i)\n", + " # calculate update\n", + " new_change = eta*gradients+delta_momentum*change\n", + " # take a step\n", + " theta -= new_change\n", + " # save the change\n", + " change = new_change\n", + "print(\"theta from own sdg with momentum\")\n", + "print(theta)" + ] + }, + { + "cell_type": "markdown", + "id": "f5e2c550", + "metadata": { + "editable": true + }, + "source": [ + "## But none of these can compete with Newton's method\n", + "\n", + "Note that we here have introduced automatic differentiation" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "id": "300a02a4", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# Using Newton's method\n", + "from random import random, seed\n", + "import numpy as np\n", + "import autograd.numpy as np\n", + "from autograd import grad\n", + "\n", + "def CostOLS(theta):\n", + " return (1.0/n)*np.sum((y-X @ theta)**2)\n", + "\n", + "n = 100\n", + "x = 2*np.random.rand(n,1)\n", + "y = 4+3*x+5*x*x\n", + "\n", + "X = np.c_[np.ones((n,1)), x, x*x]\n", + "XT_X = X.T @ X\n", + "theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)\n", + "print(\"Own inversion\")\n", + "print(theta_linreg)\n", + "# Hessian matrix\n", + "H = (2.0/n)* XT_X\n", + "# Note that here the Hessian does not depend on the parameters theta\n", + "invH = np.linalg.pinv(H)\n", + "theta = np.random.randn(3,1)\n", + "Niterations = 5\n", + "# define the gradient\n", + "training_gradient = grad(CostOLS)\n", + "\n", + "for iter in range(Niterations):\n", + " gradients = training_gradient(theta)\n", + " theta -= invH @ gradients\n", + " print(iter,gradients[0],gradients[1])\n", + "print(\"theta from own Newton code\")\n", + "print(theta)" + ] + }, + { + "cell_type": "markdown", + "id": "5cb5fd26", + "metadata": { + "editable": true + }, + "source": [ + "## 
Similar (second order function now) problem but now with AdaGrad" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "id": "030efc5d", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# Using Autograd to calculate gradients using AdaGrad and Stochastic Gradient descent\n", + "# OLS example\n", + "from random import random, seed\n", + "import numpy as np\n", + "import autograd.numpy as np\n", + "import matplotlib.pyplot as plt\n", + "from autograd import grad\n", + "\n", + "# Note change from previous example\n", + "def CostOLS(y,X,theta):\n", + " return np.sum((y-X @ theta)**2)\n", + "\n", + "n = 1000\n", + "x = np.random.rand(n,1)\n", + "y = 2.0+3*x +4*x*x\n", + "\n", + "X = np.c_[np.ones((n,1)), x, x*x]\n", + "XT_X = X.T @ X\n", + "theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)\n", + "print(\"Own inversion\")\n", + "print(theta_linreg)\n", + "\n", + "\n", + "# Note that we request the derivative wrt third argument (theta, 2 here)\n", + "training_gradient = grad(CostOLS,2)\n", + "# Define parameters for Stochastic Gradient Descent\n", + "n_epochs = 50\n", + "M = 5 #size of each minibatch\n", + "m = int(n/M) #number of minibatches\n", + "# Guess for unknown parameters theta\n", + "theta = np.random.randn(3,1)\n", + "\n", + "# Value for learning rate\n", + "eta = 0.01\n", + "# Including AdaGrad parameter to avoid possible division by zero\n", + "delta = 1e-8\n", + "for epoch in range(n_epochs):\n", + " Giter = 0.0\n", + " for i in range(m):\n", + " random_index = M*np.random.randint(m)\n", + " xi = X[random_index:random_index+M]\n", + " yi = y[random_index:random_index+M]\n", + " gradients = (1.0/M)*training_gradient(yi, xi, theta)\n", + " Giter += gradients*gradients\n", + " update = gradients*eta/(delta+np.sqrt(Giter))\n", + " theta -= update\n", + "print(\"theta from own AdaGrad\")\n", + "print(theta)" + ] + }, + { + "cell_type": "markdown", + "id": "66850bb7", + "metadata": { + "editable": true + }, + "source": [ + "Running this code we note an almost perfect agreement with the results from matrix inversion." 
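+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "cd20bb01",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "Since the accumulated sum `Giter` can only grow, AdaGrad's effective step\n",
+    "size $\eta/(\delta+\sqrt{r_t})$ can only shrink. The short sketch below tracks\n",
+    "this decay for a single parameter under the artificial assumption of a\n",
+    "constant unit gradient."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "cd20bb02",
+   "metadata": {
+    "collapsed": false,
+    "editable": true
+   },
+   "outputs": [],
+   "source": [
+    "import numpy as np\n",
+    "\n",
+    "eta = 0.01    # same base learning rate as in the AdaGrad example above\n",
+    "delta = 1e-8  # small constant guarding against division by zero\n",
+    "r = 0.0       # accumulated squared gradient for one parameter\n",
+    "\n",
+    "for t in range(1, 1001):\n",
+    "    g = 1.0   # artificial assumption: constant unit gradient\n",
+    "    r += g*g\n",
+    "    if t in (1, 10, 100, 1000):\n",
+    "        print(f\"step {t:4d}: effective learning rate {eta/(delta + np.sqrt(r)):.2e}\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "cd20bb03",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "With a constant gradient the effective learning rate falls off like\n",
+    "$1/\sqrt{t}$; RMSprop counteracts exactly this behaviour by replacing the\n",
+    "cumulative sum with an exponentially decaying average, as shown next."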
+ ] + }, + { + "cell_type": "markdown", + "id": "e1608bcf", + "metadata": { + "editable": true + }, + "source": [ + "## RMSprop for adaptive learning rate with Stochastic Gradient Descent" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "id": "0ba7d8f7", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# Using Autograd to calculate gradients using RMSprop and Stochastic Gradient descent\n", + "# OLS example\n", + "from random import random, seed\n", + "import numpy as np\n", + "import autograd.numpy as np\n", + "import matplotlib.pyplot as plt\n", + "from autograd import grad\n", + "\n", + "# Note change from previous example\n", + "def CostOLS(y,X,theta):\n", + " return np.sum((y-X @ theta)**2)\n", + "\n", + "n = 1000\n", + "x = np.random.rand(n,1)\n", + "y = 2.0+3*x +4*x*x# +np.random.randn(n,1)\n", + "\n", + "X = np.c_[np.ones((n,1)), x, x*x]\n", + "XT_X = X.T @ X\n", + "theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)\n", + "print(\"Own inversion\")\n", + "print(theta_linreg)\n", + "\n", + "\n", + "# Note that we request the derivative wrt third argument (theta, 2 here)\n", + "training_gradient = grad(CostOLS,2)\n", + "# Define parameters for Stochastic Gradient Descent\n", + "n_epochs = 50\n", + "M = 5 #size of each minibatch\n", + "m = int(n/M) #number of minibatches\n", + "# Guess for unknown parameters theta\n", + "theta = np.random.randn(3,1)\n", + "\n", + "# Value for learning rate\n", + "eta = 0.01\n", + "# Value for parameter rho\n", + "rho = 0.99\n", + "# Including AdaGrad parameter to avoid possible division by zero\n", + "delta = 1e-8\n", + "for epoch in range(n_epochs):\n", + " Giter = 0.0\n", + " for i in range(m):\n", + " random_index = M*np.random.randint(m)\n", + " xi = X[random_index:random_index+M]\n", + " yi = y[random_index:random_index+M]\n", + " gradients = (1.0/M)*training_gradient(yi, xi, theta)\n", + "\t# Accumulated gradient\n", + "\t# Scaling with rho the new and the previous results\n", + " Giter = (rho*Giter+(1-rho)*gradients*gradients)\n", + "\t# Taking the diagonal only and inverting\n", + " update = gradients*eta/(delta+np.sqrt(Giter))\n", + "\t# Hadamard product\n", + " theta -= update\n", + "print(\"theta from own RMSprop\")\n", + "print(theta)" + ] + }, + { + "cell_type": "markdown", + "id": "0503f74b", + "metadata": { + "editable": true + }, + "source": [ + "## And finally [ADAM](https://arxiv.org/pdf/1412.6980.pdf)" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "id": "c2a2732a", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# Using Autograd to calculate gradients using RMSprop and Stochastic Gradient descent\n", + "# OLS example\n", + "from random import random, seed\n", + "import numpy as np\n", + "import autograd.numpy as np\n", + "import matplotlib.pyplot as plt\n", + "from autograd import grad\n", + "\n", + "# Note change from previous example\n", + "def CostOLS(y,X,theta):\n", + " return np.sum((y-X @ theta)**2)\n", + "\n", + "n = 1000\n", + "x = np.random.rand(n,1)\n", + "y = 2.0+3*x +4*x*x# +np.random.randn(n,1)\n", + "\n", + "X = np.c_[np.ones((n,1)), x, x*x]\n", + "XT_X = X.T @ X\n", + "theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)\n", + "print(\"Own inversion\")\n", + "print(theta_linreg)\n", + "\n", + "\n", + "# Note that we request the derivative wrt third argument (theta, 2 here)\n", + "training_gradient = grad(CostOLS,2)\n", + "# Define parameters for Stochastic Gradient Descent\n", + "n_epochs = 50\n", + "M 
= 5 #size of each minibatch\n",
+    "m = int(n/M) #number of minibatches\n",
+    "# Guess for unknown parameters theta\n",
+    "theta = np.random.randn(3,1)\n",
+    "\n",
+    "# Value for learning rate\n",
+    "eta = 0.01\n",
+    "# Values for the moment-decay parameters theta1 and theta2 (called beta1 and beta2 in https://arxiv.org/abs/1412.6980)\n",
+    "theta1 = 0.9\n",
+    "theta2 = 0.999\n",
+    "# Small ADAM parameter to avoid possible division by zero\n",
+    "delta = 1e-7\n",
+    "iter = 0\n",
+    "for epoch in range(n_epochs):\n",
+    "    first_moment = 0.0\n",
+    "    second_moment = 0.0\n",
+    "    iter += 1\n",
+    "    for i in range(m):\n",
+    "        random_index = M*np.random.randint(m)\n",
+    "        xi = X[random_index:random_index+M]\n",
+    "        yi = y[random_index:random_index+M]\n",
+    "        gradients = (1.0/M)*training_gradient(yi, xi, theta)\n",
+    "        # Computing the first and second moments of the gradient\n",
+    "        first_moment = theta1*first_moment + (1-theta1)*gradients\n",
+    "        second_moment = theta2*second_moment+(1-theta2)*gradients*gradients\n",
+    "        # Bias corrections of the two moments\n",
+    "        first_term = first_moment/(1.0-theta1**iter)\n",
+    "        second_term = second_moment/(1.0-theta2**iter)\n",
+    "        # Parameter update, element-wise division by the RMS of the gradient\n",
+    "        update = eta*first_term/(np.sqrt(second_term)+delta)\n",
+    "        theta -= update\n",
+    "print(\"theta from own ADAM\")\n",
+    "print(theta)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "b8475863",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "## Material for the lab sessions\n",
+    "\n",
+    "1. Exercise set for week 37 and reminder on scaling (from lab sessions of week 35)\n",
+    "\n",
+    "2. Work on project 1\n",
+    "\n",
+    "\n",
+    "For more discussions of Ridge regression and calculation of averages, [Wessel van Wieringen's](https://arxiv.org/abs/1509.09169) article is highly recommended."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "4d4d0717",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "## Reminder on different scaling methods\n",
+    "\n",
+    "Before fitting a regression model, it is good practice to normalize or\n",
+    "standardize the features. This ensures all features are on a\n",
+    "comparable scale, which is especially important when using\n",
+    "regularization. In the exercises this week we will perform standardization, scaling each\n",
+    "feature to have mean 0 and standard deviation 1.\n",
+    "\n",
+    "Here we compute the mean and standard deviation of each column (feature) in our design/feature matrix $\\boldsymbol{X}$.\n",
+    "Then we subtract the mean and divide by the standard deviation for each feature.\n",
+    "\n",
+    "In the example here we\n",
+    "will also center the target $\\boldsymbol{y}$ to mean $0$. Centering $\\boldsymbol{y}$\n",
+    "(and each feature) means the model does not require a separate intercept\n",
+    "term; the data is shifted such that the intercept is effectively 0.\n",
+    "(In practice, one could include an intercept in the model and not\n",
+    "penalize it, but here we simplify by centering.)\n",
+    "Choose $n=100$ data points and set up $\\boldsymbol{x}$, $\\boldsymbol{y}$ and the design matrix $\\boldsymbol{X}$."
+ ] + }, + { + "cell_type": "code", + "execution_count": 17, + "id": "46375144", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# Standardize features (zero mean, unit variance for each feature)\n", + "X_mean = X.mean(axis=0)\n", + "X_std = X.std(axis=0)\n", + "X_std[X_std == 0] = 1 # safeguard to avoid division by zero for constant features\n", + "X_norm = (X - X_mean) / X_std\n", + "\n", + "# Center the target to zero mean (optional, to simplify intercept handling)\n", + "y_mean = ?\n", + "y_centered = ?" + ] + }, + { + "cell_type": "markdown", + "id": "39426ccf", + "metadata": { + "editable": true + }, + "source": [ + "Do we need to center the values of $y$?\n", + "\n", + "After this preprocessing, each column of $\\boldsymbol{X}_{\\mathrm{norm}}$ has mean zero and standard deviation $1$\n", + "and $\\boldsymbol{y}_{\\mathrm{centered}}$ has mean 0. This can make the optimization landscape\n", + "nicer and ensures the regularization penalty $\\lambda \\sum_j\n", + "\\theta_j^2$ in Ridge regression treats each coefficient fairly (since features are on the\n", + "same scale)." + ] + }, + { + "cell_type": "markdown", + "id": "df7fe27f", + "metadata": { + "editable": true + }, + "source": [ + "## Functionality in Scikit-Learn\n", + "\n", + "**Scikit-Learn** has several functions which allow us to rescale the\n", + "data, normally resulting in much better results in terms of various\n", + "accuracy scores. The **StandardScaler** function in **Scikit-Learn**\n", + "ensures that for each feature/predictor we study the mean value is\n", + "zero and the variance is one (every column in the design/feature\n", + "matrix). This scaling has the drawback that it does not ensure that\n", + "we have a particular maximum or minimum in our data set. Another\n", + "function included in **Scikit-Learn** is the **MinMaxScaler** which\n", + "ensures that all features are exactly between $0$ and $1$. The" + ] + }, + { + "cell_type": "markdown", + "id": "8fd48e39", + "metadata": { + "editable": true + }, + "source": [ + "## More preprocessing\n", + "\n", + "The **Normalizer** scales each data\n", + "point such that the feature vector has a euclidean length of one. In other words, it\n", + "projects a data point on the circle (or sphere in the case of higher dimensions) with a\n", + "radius of 1. This means every data point is scaled by a different number (by the\n", + "inverse of it’s length).\n", + "This normalization is often used when only the direction (or angle) of the data matters,\n", + "not the length of the feature vector.\n", + "\n", + "The **RobustScaler** works similarly to the StandardScaler in that it\n", + "ensures statistical properties for each feature that guarantee that\n", + "they are on the same scale. However, the RobustScaler uses the median\n", + "and quartiles, instead of mean and variance. This makes the\n", + "RobustScaler ignore data points that are very different from the rest\n", + "(like measurement errors). These odd data points are also called\n", + "outliers, and might often lead to trouble for other scaling\n", + "techniques." + ] + }, + { + "cell_type": "markdown", + "id": "d6c60a0a", + "metadata": { + "editable": true + }, + "source": [ + "## Frequently used scaling functions\n", + "\n", + "Many features are often scaled using standardization to improve performance. In **Scikit-Learn** this is given by the **StandardScaler** function as discussed above. It is easy however to write your own. 
\n", + "Mathematically, this involves subtracting the mean and divide by the standard deviation over the data set, for each feature:" + ] + }, + { + "cell_type": "markdown", + "id": "1bb6eaa0", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "x_j^{(i)} \\rightarrow \\frac{x_j^{(i)} - \\overline{x}_j}{\\sigma(x_j)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "25135896", + "metadata": { + "editable": true + }, + "source": [ + "where $\\overline{x}_j$ and $\\sigma(x_j)$ are the mean and standard deviation, respectively, of the feature $x_j$.\n", + "This ensures that each feature has zero mean and unit standard deviation. For data sets where we do not have the standard deviation or don't wish to calculate it, it is then common to simply set it to one.\n", + "\n", + "Keep in mind that when you transform your data set before training a model, the same transformation needs to be done\n", + "on your eventual new data set before making a prediction. If we translate this into a Python code, it would could be implemented as" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "id": "469ca11e", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "\"\"\"\n", + "#Model training, we compute the mean value of y and X\n", + "y_train_mean = np.mean(y_train)\n", + "X_train_mean = np.mean(X_train,axis=0)\n", + "X_train = X_train - X_train_mean\n", + "y_train = y_train - y_train_mean\n", + "\n", + "# The we fit our model with the training data\n", + "trained_model = some_model.fit(X_train,y_train)\n", + "\n", + "\n", + "#Model prediction, we need also to transform our data set used for the prediction.\n", + "X_test = X_test - X_train_mean #Use mean from training data\n", + "y_pred = trained_model(X_test)\n", + "y_pred = y_pred + y_train_mean\n", + "\"\"\"" + ] + }, + { + "cell_type": "markdown", + "id": "33722029", + "metadata": { + "editable": true + }, + "source": [ + "Let us try to understand what this may imply mathematically when we\n", + "subtract the mean values, also known as *zero centering*. For\n", + "simplicity, we will focus on ordinary regression, as done in the above example.\n", + "\n", + "The cost/loss function for regression is" + ] + }, + { + "cell_type": "markdown", + "id": "fe27291e", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C(\\theta_0, \\theta_1, ... , \\theta_{p-1}) = \\frac{1}{n}\\sum_{i=0}^{n} \\left(y_i - \\theta_0 - \\sum_{j=1}^{p-1} X_{ij}\\theta_j\\right)^2,.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "ead1167d", + "metadata": { + "editable": true + }, + "source": [ + "Recall also that we use the squared value. This expression can lead to an\n", + "increased penalty for higher differences between predicted and\n", + "output/target values.\n", + "\n", + "What we have done is to single out the $\\theta_0$ term in the\n", + "definition of the mean squared error (MSE). The design matrix $X$\n", + "does in this case not contain any intercept column. When we take the\n", + "derivative with respect to $\\theta_0$, we want the derivative to obey" + ] + }, + { + "cell_type": "markdown", + "id": "b2efb706", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial \\theta_j} = 0,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "65333100", + "metadata": { + "editable": true + }, + "source": [ + "for all $j$. 
For $\\theta_0$ we have" + ] + }, + { + "cell_type": "markdown", + "id": "1fde497c", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial \\theta_0} = -\\frac{2}{n}\\sum_{i=0}^{n-1} \\left(y_i - \\theta_0 - \\sum_{j=1}^{p-1} X_{ij} \\theta_j\\right).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "264ce562", + "metadata": { + "editable": true + }, + "source": [ + "Multiplying away the constant $2/n$, we obtain" + ] + }, + { + "cell_type": "markdown", + "id": "0f63a6f8", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\sum_{i=0}^{n-1} \\theta_0 = \\sum_{i=0}^{n-1}y_i - \\sum_{i=0}^{n-1} \\sum_{j=1}^{p-1} X_{ij} \\theta_j.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "2ba0a6e4", + "metadata": { + "editable": true + }, + "source": [ + "Let us specialize first to the case where we have only two parameters $\\theta_0$ and $\\theta_1$.\n", + "Our result for $\\theta_0$ simplifies then to" + ] + }, + { + "cell_type": "markdown", + "id": "3b377f93", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "n\\theta_0 = \\sum_{i=0}^{n-1}y_i - \\sum_{i=0}^{n-1} X_{i1} \\theta_1.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "f05e9d08", + "metadata": { + "editable": true + }, + "source": [ + "We obtain then" + ] + }, + { + "cell_type": "markdown", + "id": "84784b8e", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\theta_0 = \\frac{1}{n}\\sum_{i=0}^{n-1}y_i - \\theta_1\\frac{1}{n}\\sum_{i=0}^{n-1} X_{i1}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b62c6e5a", + "metadata": { + "editable": true + }, + "source": [ + "If we define" + ] + }, + { + "cell_type": "markdown", + "id": "ecce9763", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mu_{\\boldsymbol{x}_1}=\\frac{1}{n}\\sum_{i=0}^{n-1} X_{i1},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "c9e1842a", + "metadata": { + "editable": true + }, + "source": [ + "and the mean value of the outputs as" + ] + }, + { + "cell_type": "markdown", + "id": "be12163e", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mu_y=\\frac{1}{n}\\sum_{i=0}^{n-1}y_i,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "a097e9ab", + "metadata": { + "editable": true + }, + "source": [ + "we have" + ] + }, + { + "cell_type": "markdown", + "id": "239422b0", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\theta_0 = \\mu_y - \\theta_1\\mu_{\\boldsymbol{x}_1}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "ed9778bb", + "metadata": { + "editable": true + }, + "source": [ + "In the general case with more parameters than $\\theta_0$ and $\\theta_1$, we have" + ] + }, + { + "cell_type": "markdown", + "id": "7179b77b", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\theta_0 = \\frac{1}{n}\\sum_{i=0}^{n-1}y_i - \\frac{1}{n}\\sum_{i=0}^{n-1}\\sum_{j=1}^{p-1} X_{ij}\\theta_j.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "aad2f56e", + "metadata": { + "editable": true + }, + "source": [ + "We can rewrite the latter equation as" + ] + }, + { + "cell_type": "markdown", + "id": "26aa9739", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\theta_0 = \\frac{1}{n}\\sum_{i=0}^{n-1}y_i - \\sum_{j=1}^{p-1} \\mu_{\\boldsymbol{x}_j}\\theta_j,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "d270cb13", + "metadata": { + "editable": true + }, + "source": [ + "where we have defined" + ] + }, + { + 
"cell_type": "markdown", + "id": "5a52457b", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mu_{\\boldsymbol{x}_j}=\\frac{1}{n}\\sum_{i=0}^{n-1} X_{ij},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "8c98105d", + "metadata": { + "editable": true + }, + "source": [ + "the mean value for all elements of the column vector $\\boldsymbol{x}_j$.\n", + "\n", + "Replacing $y_i$ with $y_i - y_i - \\overline{\\boldsymbol{y}}$ and centering also our design matrix results in a cost function (in vector-matrix disguise)" + ] + }, + { + "cell_type": "markdown", + "id": "4d82302f", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C(\\boldsymbol{\\theta}) = (\\boldsymbol{\\tilde{y}} - \\tilde{X}\\boldsymbol{\\theta})^T(\\boldsymbol{\\tilde{y}} - \\tilde{X}\\boldsymbol{\\theta}).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "a3a07a10", + "metadata": { + "editable": true + }, + "source": [ + "If we minimize with respect to $\\boldsymbol{\\theta}$ we have then" + ] + }, + { + "cell_type": "markdown", + "id": "ea19374e", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\hat{\\boldsymbol{\\theta}} = (\\tilde{X}^T\\tilde{X})^{-1}\\tilde{X}^T\\boldsymbol{\\tilde{y}},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "11dd1361", + "metadata": { + "editable": true + }, + "source": [ + "where $\\boldsymbol{\\tilde{y}} = \\boldsymbol{y} - \\overline{\\boldsymbol{y}}$\n", + "and $\\tilde{X}_{ij} = X_{ij} - \\frac{1}{n}\\sum_{k=0}^{n-1}X_{kj}$.\n", + "\n", + "For Ridge regression we need to add $\\lambda \\boldsymbol{\\theta}^T\\boldsymbol{\\theta}$ to the cost function and get then" + ] + }, + { + "cell_type": "markdown", + "id": "f6a52f34", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\hat{\\boldsymbol{\\theta}} = (\\tilde{X}^T\\tilde{X} + \\lambda I)^{-1}\\tilde{X}^T\\boldsymbol{\\tilde{y}}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "9d6807dc", + "metadata": { + "editable": true + }, + "source": [ + "What does this mean? And why do we insist on all this? Let us look at some examples.\n", + "\n", + "This code shows a simple first-order fit to a data set using the above transformed data, where we consider the role of the intercept first, by either excluding it or including it (*code example thanks to Øyvind Sigmundson Schøyen*). Here our scaling of the data is done by subtracting the mean values only.\n", + "Note also that we do not split the data into training and test." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 19, + "id": "2ed0cafc", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import numpy as np\n", + "import matplotlib.pyplot as plt\n", + "\n", + "from sklearn.linear_model import LinearRegression\n", + "\n", + "\n", + "np.random.seed(2021)\n", + "\n", + "def MSE(y_data,y_model):\n", + " n = np.size(y_model)\n", + " return np.sum((y_data-y_model)**2)/n\n", + "\n", + "\n", + "def fit_theta(X, y):\n", + " return np.linalg.pinv(X.T @ X) @ X.T @ y\n", + "\n", + "\n", + "true_theta = [2, 0.5, 3.7]\n", + "\n", + "x = np.linspace(0, 1, 11)\n", + "y = np.sum(\n", + " np.asarray([x ** p * b for p, b in enumerate(true_theta)]), axis=0\n", + ") + 0.1 * np.random.normal(size=len(x))\n", + "\n", + "degree = 3\n", + "X = np.zeros((len(x), degree))\n", + "\n", + "# Include the intercept in the design matrix\n", + "for p in range(degree):\n", + " X[:, p] = x ** p\n", + "\n", + "theta = fit_theta(X, y)\n", + "\n", + "# Intercept is included in the design matrix\n", + "skl = LinearRegression(fit_intercept=False).fit(X, y)\n", + "\n", + "print(f\"True theta: {true_theta}\")\n", + "print(f\"Fitted theta: {theta}\")\n", + "print(f\"Sklearn fitted theta: {skl.coef_}\")\n", + "ypredictOwn = X @ theta\n", + "ypredictSKL = skl.predict(X)\n", + "print(f\"MSE with intercept column\")\n", + "print(MSE(y,ypredictOwn))\n", + "print(f\"MSE with intercept column from SKL\")\n", + "print(MSE(y,ypredictSKL))\n", + "\n", + "\n", + "plt.figure()\n", + "plt.scatter(x, y, label=\"Data\")\n", + "plt.plot(x, X @ theta, label=\"Fit\")\n", + "plt.plot(x, skl.predict(X), label=\"Sklearn (fit_intercept=False)\")\n", + "\n", + "\n", + "# Do not include the intercept in the design matrix\n", + "X = np.zeros((len(x), degree - 1))\n", + "\n", + "for p in range(degree - 1):\n", + " X[:, p] = x ** (p + 1)\n", + "\n", + "# Intercept is not included in the design matrix\n", + "skl = LinearRegression(fit_intercept=True).fit(X, y)\n", + "\n", + "# Use centered values for X and y when computing coefficients\n", + "y_offset = np.average(y, axis=0)\n", + "X_offset = np.average(X, axis=0)\n", + "\n", + "theta = fit_theta(X - X_offset, y - y_offset)\n", + "intercept = np.mean(y_offset - X_offset @ theta)\n", + "\n", + "print(f\"Manual intercept: {intercept}\")\n", + "print(f\"Fitted theta (without intercept): {theta}\")\n", + "print(f\"Sklearn intercept: {skl.intercept_}\")\n", + "print(f\"Sklearn fitted theta (without intercept): {skl.coef_}\")\n", + "ypredictOwn = X @ theta\n", + "ypredictSKL = skl.predict(X)\n", + "print(f\"MSE with Manual intercept\")\n", + "print(MSE(y,ypredictOwn+intercept))\n", + "print(f\"MSE with Sklearn intercept\")\n", + "print(MSE(y,ypredictSKL))\n", + "\n", + "plt.plot(x, X @ theta + intercept, \"--\", label=\"Fit (manual intercept)\")\n", + "plt.plot(x, skl.predict(X), \"--\", label=\"Sklearn (fit_intercept=True)\")\n", + "plt.grid()\n", + "plt.legend()\n", + "\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "f72dbb49", + "metadata": { + "editable": true + }, + "source": [ + "The intercept is the value of our output/target variable\n", + "when all our features are zero and our function crosses the $y$-axis (for a one-dimensional case). \n", + "\n", + "Printing the MSE, we see first that both methods give the same MSE, as\n", + "they should. 
However, when we move to for example Ridge regression,\n", + "the way we treat the intercept may give a larger or smaller MSE,\n", + "meaning that the MSE can be penalized by the value of the\n", + "intercept. Not including the intercept in the fit, means that the\n", + "regularization term does not include $\\theta_0$. For different values\n", + "of $\\lambda$, this may lead to different MSE values. \n", + "\n", + "To remind the reader, the regularization term, with the intercept in Ridge regression, is given by" + ] + }, + { + "cell_type": "markdown", + "id": "b7759b1f", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\lambda \\vert\\vert \\boldsymbol{\\theta} \\vert\\vert_2^2 = \\lambda \\sum_{j=0}^{p-1}\\theta_j^2,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "ba0ecd6e", + "metadata": { + "editable": true + }, + "source": [ + "but when we take out the intercept, this equation becomes" + ] + }, + { + "cell_type": "markdown", + "id": "ae897f1e", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\lambda \\vert\\vert \\boldsymbol{\\theta} \\vert\\vert_2^2 = \\lambda \\sum_{j=1}^{p-1}\\theta_j^2.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "f9c41f7f", + "metadata": { + "editable": true + }, + "source": [ + "For Lasso regression we have" + ] + }, + { + "cell_type": "markdown", + "id": "fa013cc4", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\lambda \\vert\\vert \\boldsymbol{\\theta} \\vert\\vert_1 = \\lambda \\sum_{j=1}^{p-1}\\vert\\theta_j\\vert.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "0c9b24be", + "metadata": { + "editable": true + }, + "source": [ + "It means that, when scaling the design matrix and the outputs/targets,\n", + "by subtracting the mean values, we have an optimization problem which\n", + "is not penalized by the intercept. The MSE value can then be smaller\n", + "since it focuses only on the remaining quantities. If we however bring\n", + "back the intercept, we will get a MSE which then contains the\n", + "intercept.\n", + "\n", + "Armed with this wisdom, we attempt first to simply set the intercept equal to **False** in our implementation of Ridge regression for our well-known vanilla data set." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 20, + "id": "4f9b1fa0", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import numpy as np\n", + "import pandas as pd\n", + "import matplotlib.pyplot as plt\n", + "from sklearn.model_selection import train_test_split\n", + "from sklearn import linear_model\n", + "\n", + "def MSE(y_data,y_model):\n", + " n = np.size(y_model)\n", + " return np.sum((y_data-y_model)**2)/n\n", + "\n", + "\n", + "# A seed just to ensure that the random numbers are the same for every run.\n", + "# Useful for eventual debugging.\n", + "np.random.seed(3155)\n", + "\n", + "n = 100\n", + "x = np.random.rand(n)\n", + "y = np.exp(-x**2) + 1.5 * np.exp(-(x-2)**2)\n", + "\n", + "Maxpolydegree = 20\n", + "X = np.zeros((n,Maxpolydegree))\n", + "#We include explicitely the intercept column\n", + "for degree in range(Maxpolydegree):\n", + " X[:,degree] = x**degree\n", + "# We split the data in test and training data\n", + "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)\n", + "\n", + "p = Maxpolydegree\n", + "I = np.eye(p,p)\n", + "# Decide which values of lambda to use\n", + "nlambdas = 6\n", + "MSEOwnRidgePredict = np.zeros(nlambdas)\n", + "MSERidgePredict = np.zeros(nlambdas)\n", + "lambdas = np.logspace(-4, 2, nlambdas)\n", + "for i in range(nlambdas):\n", + " lmb = lambdas[i]\n", + " OwnRidgeTheta = np.linalg.pinv(X_train.T @ X_train+lmb*I) @ X_train.T @ y_train\n", + " # Note: we include the intercept column and no scaling\n", + " RegRidge = linear_model.Ridge(lmb,fit_intercept=False)\n", + " RegRidge.fit(X_train,y_train)\n", + " # and then make the prediction\n", + " ytildeOwnRidge = X_train @ OwnRidgeTheta\n", + " ypredictOwnRidge = X_test @ OwnRidgeTheta\n", + " ytildeRidge = RegRidge.predict(X_train)\n", + " ypredictRidge = RegRidge.predict(X_test)\n", + " MSEOwnRidgePredict[i] = MSE(y_test,ypredictOwnRidge)\n", + " MSERidgePredict[i] = MSE(y_test,ypredictRidge)\n", + " print(\"Theta values for own Ridge implementation\")\n", + " print(OwnRidgeTheta)\n", + " print(\"Theta values for Scikit-Learn Ridge implementation\")\n", + " print(RegRidge.coef_)\n", + " print(\"MSE values for own Ridge implementation\")\n", + " print(MSEOwnRidgePredict[i])\n", + " print(\"MSE values for Scikit-Learn Ridge implementation\")\n", + " print(MSERidgePredict[i])\n", + "\n", + "# Now plot the results\n", + "plt.figure()\n", + "plt.plot(np.log10(lambdas), MSEOwnRidgePredict, 'r', label = 'MSE own Ridge Test')\n", + "plt.plot(np.log10(lambdas), MSERidgePredict, 'g', label = 'MSE Ridge Test')\n", + "\n", + "plt.xlabel('log10(lambda)')\n", + "plt.ylabel('MSE')\n", + "plt.legend()\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "1aa5ca37", + "metadata": { + "editable": true + }, + "source": [ + "The results here agree when we force **Scikit-Learn**'s Ridge function to include the first column in our design matrix.\n", + "We see that the results agree very well. Here we have thus explicitely included the intercept column in the design matrix.\n", + "What happens if we do not include the intercept in our fit?\n", + "Let us see how we can change this code by zero centering." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 21, + "id": "a731e32c", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import numpy as np\n", + "import pandas as pd\n", + "import matplotlib.pyplot as plt\n", + "from sklearn.model_selection import train_test_split\n", + "from sklearn import linear_model\n", + "from sklearn.preprocessing import StandardScaler\n", + "\n", + "def MSE(y_data,y_model):\n", + " n = np.size(y_model)\n", + " return np.sum((y_data-y_model)**2)/n\n", + "# A seed just to ensure that the random numbers are the same for every run.\n", + "# Useful for eventual debugging.\n", + "np.random.seed(315)\n", + "\n", + "n = 100\n", + "x = np.random.rand(n)\n", + "y = np.exp(-x**2) + 1.5 * np.exp(-(x-2)**2)\n", + "\n", + "Maxpolydegree = 20\n", + "X = np.zeros((n,Maxpolydegree-1))\n", + "\n", + "for degree in range(1,Maxpolydegree): #No intercept column\n", + " X[:,degree-1] = x**(degree)\n", + "\n", + "# We split the data in test and training data\n", + "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)\n", + "\n", + "#For our own implementation, we will need to deal with the intercept by centering the design matrix and the target variable\n", + "X_train_mean = np.mean(X_train,axis=0)\n", + "#Center by removing mean from each feature\n", + "X_train_scaled = X_train - X_train_mean \n", + "X_test_scaled = X_test - X_train_mean\n", + "#The model intercept (called y_scaler) is given by the mean of the target variable (IF X is centered)\n", + "#Remove the intercept from the training data.\n", + "y_scaler = np.mean(y_train) \n", + "y_train_scaled = y_train - y_scaler \n", + "\n", + "p = Maxpolydegree-1\n", + "I = np.eye(p,p)\n", + "# Decide which values of lambda to use\n", + "nlambdas = 6\n", + "MSEOwnRidgePredict = np.zeros(nlambdas)\n", + "MSERidgePredict = np.zeros(nlambdas)\n", + "\n", + "lambdas = np.logspace(-4, 2, nlambdas)\n", + "for i in range(nlambdas):\n", + " lmb = lambdas[i]\n", + " OwnRidgeTheta = np.linalg.pinv(X_train_scaled.T @ X_train_scaled+lmb*I) @ X_train_scaled.T @ (y_train_scaled)\n", + " intercept_ = y_scaler - X_train_mean@OwnRidgeTheta #The intercept can be shifted so the model can predict on uncentered data\n", + " #Add intercept to prediction\n", + " ypredictOwnRidge = X_test_scaled @ OwnRidgeTheta + y_scaler \n", + " RegRidge = linear_model.Ridge(lmb)\n", + " RegRidge.fit(X_train,y_train)\n", + " ypredictRidge = RegRidge.predict(X_test)\n", + " MSEOwnRidgePredict[i] = MSE(y_test,ypredictOwnRidge)\n", + " MSERidgePredict[i] = MSE(y_test,ypredictRidge)\n", + " print(\"Theta values for own Ridge implementation\")\n", + " print(OwnRidgeTheta) #Intercept is given by mean of target variable\n", + " print(\"Theta values for Scikit-Learn Ridge implementation\")\n", + " print(RegRidge.coef_)\n", + " print('Intercept from own implementation:')\n", + " print(intercept_)\n", + " print('Intercept from Scikit-Learn Ridge implementation')\n", + " print(RegRidge.intercept_)\n", + " print(\"MSE values for own Ridge implementation\")\n", + " print(MSEOwnRidgePredict[i])\n", + " print(\"MSE values for Scikit-Learn Ridge implementation\")\n", + " print(MSERidgePredict[i])\n", + "\n", + "\n", + "# Now plot the results\n", + "plt.figure()\n", + "plt.plot(np.log10(lambdas), MSEOwnRidgePredict, 'b--', label = 'MSE own Ridge Test')\n", + "plt.plot(np.log10(lambdas), MSERidgePredict, 'g--', label = 'MSE SL Ridge Test')\n", + "plt.xlabel('log10(lambda)')\n", + "plt.ylabel('MSE')\n", + "plt.legend()\n", 
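+    "# Note (added comment): Scikit-Learn's Ridge is used here with its default\n",
+    "# fit_intercept=True, so it also leaves the intercept out of the penalty;\n",
+    "# the two MSE curves should therefore lie close to each other\n",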
+ "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "6ea197d8", + "metadata": { + "editable": true + }, + "source": [ + "We see here, when compared to the code which includes explicitely the\n", + "intercept column, that our MSE value is actually smaller. This is\n", + "because the regularization term does not include the intercept value\n", + "$\\theta_0$ in the fitting. This applies to Lasso regularization as\n", + "well. It means that our optimization is now done only with the\n", + "centered matrix and/or vector that enter the fitting procedure." + ] + } + ], + "metadata": {}, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/doc/LectureNotes/week38.ipynb b/doc/LectureNotes/week38.ipynb new file mode 100644 index 000000000..1d25f9941 --- /dev/null +++ b/doc/LectureNotes/week38.ipynb @@ -0,0 +1,2283 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "8f27372d", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "fff8ca30", + "metadata": { + "editable": true + }, + "source": [ + "# Week 38: Statistical analysis, bias-variance tradeoff and resampling methods\n", + "**Morten Hjorth-Jensen**, Department of Physics and Center for Computing in Science Education, University of Oslo, Norway\n", + "\n", + "Date: **September 15-19, 2025**" + ] + }, + { + "cell_type": "markdown", + "id": "7ee7e714", + "metadata": { + "editable": true + }, + "source": [ + "## Plans for week 38, lecture Monday September 15\n", + "\n", + "**Material for the lecture on Monday September 15.**\n", + "\n", + "1. Statistical interpretation of OLS and various expectation values\n", + "\n", + "2. Resampling techniques, Bootstrap and cross validation and bias-variance tradeoff\n", + "\n", + "3. The material we did not cover last week, that is on more advanced methods for updating the learning rate, are covered by its own video. We will briefly discuss these topics at the beginning of the lecture and during the lab sessions. See video on ADAgrad, RMSprop and ADAM (material from last week not covered during lecture) at \n", + "\n", + "4. [Video of Lecture](https://youtu.be/4Fo7ITVA7V4)\n", + "\n", + "5. [Whiteboard notes](https://github.com/CompPhysics/MachineLearning/blob/master/doc/HandWrittenNotes/2025/FYSSTKweek38.pdf)" + ] + }, + { + "cell_type": "markdown", + "id": "3b5ac440", + "metadata": { + "editable": true + }, + "source": [ + "## Readings and Videos\n", + "1. Raschka et al, pages 175-192\n", + "\n", + "2. Hastie et al Chapter 7, here we recommend 7.1-7.5 and 7.10 (cross-validation) and 7.11 (bootstrap). See .\n", + "\n", + "3. [Video on bias-variance tradeoff](https://www.youtube.com/watch?v=EuBBz3bI-aA)\n", + "\n", + "4. [Video on Bootstrapping](https://www.youtube.com/watch?v=Xz0x-8-cgaQ)\n", + "\n", + "5. [Video on cross validation](https://www.youtube.com/watch?v=fSytzGwwBVw)\n", + "\n", + "For the lab session, the following video on cross validation (from 2024), could be helpful, see " + ] + }, + { + "cell_type": "markdown", + "id": "6d5dba52", + "metadata": { + "editable": true + }, + "source": [ + "## Linking the regression analysis with a statistical interpretation\n", + "\n", + "We will now couple the discussions of ordinary least squares, Ridge\n", + "and Lasso regression with a statistical interpretation, that is we\n", + "move from a linear algebra analysis to a statistical analysis. In\n", + "particular, we will focus on what the regularization terms can result\n", + "in. 
We will amongst other things show that the regularization\n",
+    "parameter can reduce considerably the variance of the parameters\n",
+    "$\\theta$.\n",
+    "\n",
+    "One of the advantages of doing linear regression is that we actually end up with\n",
+    "analytical expressions for several statistical quantities. \n",
+    "Standard least squares and Ridge regression allow us to\n",
+    "derive quantities like the variance and other expectation values in a\n",
+    "rather straightforward way.\n",
+    "\n",
+    "It is assumed that $\\varepsilon_i\n",
+    "\\sim \\mathcal{N}(0, \\sigma^2)$ and the $\\varepsilon_{i}$ are\n",
+    "independent, i.e.:"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "bfc2983a",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "$$\n",
+    "\\begin{align*} \n",
+    "\\mbox{Cov}(\\varepsilon_{i_1},\n",
+    "\\varepsilon_{i_2}) & = \\left\\{ \\begin{array}{lcc} \\sigma^2 & \\mbox{if}\n",
+    "& i_1 = i_2, \\\\ 0 & \\mbox{if} & i_1 \\not= i_2. \\end{array} \\right.\n",
+    "\\end{align*}\n",
+    "$$"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "2b5f5980",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "The randomness of $\\varepsilon_i$ implies that\n",
+    "$\\mathbf{y}_i$ is also a random variable. In particular,\n",
+    "$\\mathbf{y}_i$ is normally distributed, because $\\varepsilon_i \\sim\n",
+    "\\mathcal{N}(0, \\sigma^2)$ and $\\mathbf{X}_{i,\\ast} \\, \\boldsymbol{\\theta}$ is a\n",
+    "non-random scalar. To specify the parameters of the distribution of\n",
+    "$\\mathbf{y}_i$ we need to calculate its first two moments. \n",
+    "\n",
+    "Recall that $\\boldsymbol{X}$ is a matrix of dimensionality $n\\times p$. The\n",
+    "notation above $\\mathbf{X}_{i,\\ast}$ means that we are looking at the\n",
+    "row number $i$ and sum over all the $p$ columns."
+ ] + }, + { + "cell_type": "markdown", + "id": "3464c7e8", + "metadata": { + "editable": true + }, + "source": [ + "## Assumptions made\n", + "\n", + "The assumption we have made here can be summarized as (and this is going to be useful when we discuss the bias-variance trade off)\n", + "that there exists a function $f(\\boldsymbol{x})$ and a normal distributed error $\\boldsymbol{\\varepsilon}\\sim \\mathcal{N}(0, \\sigma^2)$\n", + "which describe our data" + ] + }, + { + "cell_type": "markdown", + "id": "ed0fd2df", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{y} = f(\\boldsymbol{x})+\\boldsymbol{\\varepsilon}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "feb9d4c2", + "metadata": { + "editable": true + }, + "source": [ + "We approximate this function with our model from the solution of the linear regression equations, that is our\n", + "function $f$ is approximated by $\\boldsymbol{\\tilde{y}}$ where we want to minimize $(\\boldsymbol{y}-\\boldsymbol{\\tilde{y}})^2$, our MSE, with" + ] + }, + { + "cell_type": "markdown", + "id": "eb6d71f8", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{\\tilde{y}} = \\boldsymbol{X}\\boldsymbol{\\theta}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "566399f6", + "metadata": { + "editable": true + }, + "source": [ + "## Expectation value and variance\n", + "\n", + "We can calculate the expectation value of $\\boldsymbol{y}$ for a given element $i$" + ] + }, + { + "cell_type": "markdown", + "id": "6b33f497", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{align*} \n", + "\\mathbb{E}(y_i) & =\n", + "\\mathbb{E}(\\mathbf{X}_{i, \\ast} \\, \\boldsymbol{\\theta}) + \\mathbb{E}(\\varepsilon_i)\n", + "\\, \\, \\, = \\, \\, \\, \\mathbf{X}_{i, \\ast} \\, \\theta, \n", + "\\end{align*}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "5f2f79f2", + "metadata": { + "editable": true + }, + "source": [ + "while\n", + "its variance is" + ] + }, + { + "cell_type": "markdown", + "id": "199121b0", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{align*} \\mbox{Var}(y_i) & = \\mathbb{E} \\{ [y_i\n", + "- \\mathbb{E}(y_i)]^2 \\} \\, \\, \\, = \\, \\, \\, \\mathbb{E} ( y_i^2 ) -\n", + "[\\mathbb{E}(y_i)]^2 \\\\ & = \\mathbb{E} [ ( \\mathbf{X}_{i, \\ast} \\,\n", + "\\theta + \\varepsilon_i )^2] - ( \\mathbf{X}_{i, \\ast} \\, \\boldsymbol{\\theta})^2 \\\\ &\n", + "= \\mathbb{E} [ ( \\mathbf{X}_{i, \\ast} \\, \\boldsymbol{\\theta})^2 + 2 \\varepsilon_i\n", + "\\mathbf{X}_{i, \\ast} \\, \\boldsymbol{\\theta} + \\varepsilon_i^2 ] - ( \\mathbf{X}_{i,\n", + "\\ast} \\, \\theta)^2 \\\\ & = ( \\mathbf{X}_{i, \\ast} \\, \\boldsymbol{\\theta})^2 + 2\n", + "\\mathbb{E}(\\varepsilon_i) \\mathbf{X}_{i, \\ast} \\, \\boldsymbol{\\theta} +\n", + "\\mathbb{E}(\\varepsilon_i^2 ) - ( \\mathbf{X}_{i, \\ast} \\, \\boldsymbol{\\theta})^2 \n", + "\\\\ & = \\mathbb{E}(\\varepsilon_i^2 ) \\, \\, \\, = \\, \\, \\,\n", + "\\mbox{Var}(\\varepsilon_i) \\, \\, \\, = \\, \\, \\, \\sigma^2. \n", + "\\end{align*}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "9a1cc529", + "metadata": { + "editable": true + }, + "source": [ + "Hence, $y_i \\sim \\mathcal{N}( \\mathbf{X}_{i, \\ast} \\, \\boldsymbol{\\theta}, \\sigma^2)$, that is $\\boldsymbol{y}$ follows a normal distribution with \n", + "mean value $\\boldsymbol{X}\\boldsymbol{\\theta}$ and variance $\\sigma^2$ (not be confused with the singular values of the SVD)." 
+ ] + }, + { + "cell_type": "markdown", + "id": "149e63be", + "metadata": { + "editable": true + }, + "source": [ + "## Expectation value and variance for $\\boldsymbol{\\theta}$\n", + "\n", + "With the OLS expressions for the optimal parameters $\\boldsymbol{\\hat{\\theta}}$ we can evaluate the expectation value" + ] + }, + { + "cell_type": "markdown", + "id": "6a6fb04a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathbb{E}(\\boldsymbol{\\hat{\\theta}}) = \\mathbb{E}[ (\\mathbf{X}^{\\top} \\mathbf{X})^{-1}\\mathbf{X}^{T} \\mathbf{Y}]=(\\mathbf{X}^{T} \\mathbf{X})^{-1}\\mathbf{X}^{T} \\mathbb{E}[ \\mathbf{Y}]=(\\mathbf{X}^{T} \\mathbf{X})^{-1} \\mathbf{X}^{T}\\mathbf{X}\\boldsymbol{\\theta}=\\boldsymbol{\\theta}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "79420d06", + "metadata": { + "editable": true + }, + "source": [ + "This means that the estimator of the regression parameters is unbiased.\n", + "\n", + "We can also calculate the variance\n", + "\n", + "The variance of the optimal value $\\boldsymbol{\\hat{\\theta}}$ is" + ] + }, + { + "cell_type": "markdown", + "id": "0e3de992", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{eqnarray*}\n", + "\\mbox{Var}(\\boldsymbol{\\hat{\\theta}}) & = & \\mathbb{E} \\{ [\\boldsymbol{\\theta} - \\mathbb{E}(\\boldsymbol{\\theta})] [\\boldsymbol{\\theta} - \\mathbb{E}(\\boldsymbol{\\theta})]^{T} \\}\n", + "\\\\\n", + "& = & \\mathbb{E} \\{ [(\\mathbf{X}^{T} \\mathbf{X})^{-1} \\, \\mathbf{X}^{T} \\mathbf{Y} - \\boldsymbol{\\theta}] \\, [(\\mathbf{X}^{T} \\mathbf{X})^{-1} \\, \\mathbf{X}^{T} \\mathbf{Y} - \\boldsymbol{\\theta}]^{T} \\}\n", + "\\\\\n", + "% & = & \\mathbb{E} \\{ [(\\mathbf{X}^{T} \\mathbf{X})^{-1} \\, \\mathbf{X}^{T} \\mathbf{Y}] \\, [(\\mathbf{X}^{T} \\mathbf{X})^{-1} \\, \\mathbf{X}^{T} \\mathbf{Y}]^{T} \\} - \\boldsymbol{\\theta} \\, \\boldsymbol{\\theta}^{T}\n", + "% \\\\\n", + "% & = & \\mathbb{E} \\{ (\\mathbf{X}^{T} \\mathbf{X})^{-1} \\, \\mathbf{X}^{T} \\mathbf{Y} \\, \\mathbf{Y}^{T} \\, \\mathbf{X} \\, (\\mathbf{X}^{T} \\mathbf{X})^{-1} \\} - \\boldsymbol{\\theta} \\, \\boldsymbol{\\theta}^{T}\n", + "% \\\\\n", + "& = & (\\mathbf{X}^{T} \\mathbf{X})^{-1} \\, \\mathbf{X}^{T} \\, \\mathbb{E} \\{ \\mathbf{Y} \\, \\mathbf{Y}^{T} \\} \\, \\mathbf{X} \\, (\\mathbf{X}^{T} \\mathbf{X})^{-1} - \\boldsymbol{\\theta} \\, \\boldsymbol{\\theta}^{T}\n", + "\\\\\n", + "& = & (\\mathbf{X}^{T} \\mathbf{X})^{-1} \\, \\mathbf{X}^{T} \\, \\{ \\mathbf{X} \\, \\boldsymbol{\\theta} \\, \\boldsymbol{\\theta}^{T} \\, \\mathbf{X}^{T} + \\sigma^2 \\} \\, \\mathbf{X} \\, (\\mathbf{X}^{T} \\mathbf{X})^{-1} - \\boldsymbol{\\theta} \\, \\boldsymbol{\\theta}^{T}\n", + "% \\\\\n", + "% & = & (\\mathbf{X}^T \\mathbf{X})^{-1} \\, \\mathbf{X}^T \\, \\mathbf{X} \\, \\boldsymbol{\\theta} \\, \\boldsymbol{\\theta}^T \\, \\mathbf{X}^T \\, \\mathbf{X} \\, (\\mathbf{X}^T % \\mathbf{X})^{-1}\n", + "% \\\\\n", + "% & & + \\, \\, \\sigma^2 \\, (\\mathbf{X}^T \\mathbf{X})^{-1} \\, \\mathbf{X}^T \\, \\mathbf{X} \\, (\\mathbf{X}^T \\mathbf{X})^{-1} - \\boldsymbol{\\theta} \\boldsymbol{\\theta}^T\n", + "\\\\\n", + "& = & \\boldsymbol{\\theta} \\, \\boldsymbol{\\theta}^{T} + \\sigma^2 \\, (\\mathbf{X}^{T} \\mathbf{X})^{-1} - \\boldsymbol{\\theta} \\, \\boldsymbol{\\theta}^{T}\n", + "\\, \\, \\, = \\, \\, \\, \\sigma^2 \\, (\\mathbf{X}^{T} \\mathbf{X})^{-1},\n", + "\\end{eqnarray*}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "d3ea2897", + "metadata": { + "editable": true + }, + "source": [ + 
"where we have used that $\\mathbb{E} (\\mathbf{Y} \\mathbf{Y}^{T}) =\n", + "\\mathbf{X} \\, \\boldsymbol{\\theta} \\, \\boldsymbol{\\theta}^{T} \\, \\mathbf{X}^{T} +\n", + "\\sigma^2 \\, \\mathbf{I}_{nn}$. From $\\mbox{Var}(\\boldsymbol{\\theta}) = \\sigma^2\n", + "\\, (\\mathbf{X}^{T} \\mathbf{X})^{-1}$, one obtains an estimate of the\n", + "variance of the estimate of the $j$-th regression coefficient:\n", + "$\\boldsymbol{\\sigma}^2 (\\boldsymbol{\\theta}_j ) = \\boldsymbol{\\sigma}^2 [(\\mathbf{X}^{T} \\mathbf{X})^{-1}]_{jj} $. This may be used to\n", + "construct a confidence interval for the estimates.\n", + "\n", + "In a similar way, we can obtain analytical expressions for say the\n", + "expectation values of the parameters $\\boldsymbol{\\theta}$ and their variance\n", + "when we employ Ridge regression, allowing us again to define a confidence interval. \n", + "\n", + "It is rather straightforward to show that" + ] + }, + { + "cell_type": "markdown", + "id": "da5e3927", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathbb{E} \\big[ \\boldsymbol{\\theta}^{\\mathrm{Ridge}} \\big]=(\\mathbf{X}^{T} \\mathbf{X} + \\lambda \\mathbf{I}_{pp})^{-1} (\\mathbf{X}^{\\top} \\mathbf{X})\\boldsymbol{\\theta}^{\\mathrm{OLS}}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "7ab5488b", + "metadata": { + "editable": true + }, + "source": [ + "We see clearly that \n", + "$\\mathbb{E} \\big[ \\boldsymbol{\\theta}^{\\mathrm{Ridge}} \\big] \\not= \\boldsymbol{\\theta}^{\\mathrm{OLS}}$ for any $\\lambda > 0$. We say then that the ridge estimator is biased.\n", + "\n", + "We can also compute the variance as" + ] + }, + { + "cell_type": "markdown", + "id": "f904a739", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mbox{Var}[\\boldsymbol{\\theta}^{\\mathrm{Ridge}}]=\\sigma^2[ \\mathbf{X}^{T} \\mathbf{X} + \\lambda \\mathbf{I} ]^{-1} \\mathbf{X}^{T} \\mathbf{X} \\{ [ \\mathbf{X}^{\\top} \\mathbf{X} + \\lambda \\mathbf{I} ]^{-1}\\}^{T},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "10fd648b", + "metadata": { + "editable": true + }, + "source": [ + "and it is easy to see that if the parameter $\\lambda$ goes to infinity then the variance of Ridge parameters $\\boldsymbol{\\theta}$ goes to zero. \n", + "\n", + "With this, we can compute the difference" + ] + }, + { + "cell_type": "markdown", + "id": "4812c2a4", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mbox{Var}[\\boldsymbol{\\theta}^{\\mathrm{OLS}}]-\\mbox{Var}(\\boldsymbol{\\theta}^{\\mathrm{Ridge}})=\\sigma^2 [ \\mathbf{X}^{T} \\mathbf{X} + \\lambda \\mathbf{I} ]^{-1}[ 2\\lambda\\mathbf{I} + \\lambda^2 (\\mathbf{X}^{T} \\mathbf{X})^{-1} ] \\{ [ \\mathbf{X}^{T} \\mathbf{X} + \\lambda \\mathbf{I} ]^{-1}\\}^{T}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "199d8531", + "metadata": { + "editable": true + }, + "source": [ + "The difference is non-negative definite since each component of the\n", + "matrix product is non-negative definite. \n", + "This means the variance we obtain with the standard OLS will always for $\\lambda > 0$ be larger than the variance of $\\boldsymbol{\\theta}$ obtained with the Ridge estimator. This has interesting consequences when we discuss the so-called bias-variance trade-off below." 
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "96c16676",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "## Deriving OLS from a probability distribution\n",
+    "\n",
+    "Our basic assumption when we derived the OLS equations was that our\n",
+    "output is determined by a given continuous function\n",
+    "$f(\\boldsymbol{x})$ and a random noise $\\boldsymbol{\\epsilon}$ given by the normal\n",
+    "distribution with zero mean value and an undetermined variance\n",
+    "$\\sigma^2$.\n",
+    "\n",
+    "We found above that the outputs $\\boldsymbol{y}$ have a mean value given by\n",
+    "$\\boldsymbol{X}\\hat{\\boldsymbol{\\theta}}$ and variance $\\sigma^2$. Since the entries to\n",
+    "the design matrix are not stochastic variables, we can assume that the\n",
+    "probability distribution of our targets is also a normal distribution\n",
+    "but now with mean value $\\boldsymbol{X}\\hat{\\boldsymbol{\\theta}}$. This means that a\n",
+    "single output $y_i$ is given by the Gaussian distribution"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "a2a1a004",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "$$\n",
+    "y_i\\sim \\mathcal{N}(\\boldsymbol{X}_{i,*}\\boldsymbol{\\theta}, \\sigma^2)=\\frac{1}{\\sqrt{2\\pi\\sigma^2}}\\exp{\\left[-\\frac{(y_i-\\boldsymbol{X}_{i,*}\\boldsymbol{\\theta})^2}{2\\sigma^2}\\right]}.\n",
+    "$$"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "5aad445b",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "## Independent and Identically Distributed (iid)\n",
+    "\n",
+    "We assume now that the various $y_i$ values are stochastically distributed according to the above Gaussian distribution. \n",
+    "We define this distribution as"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "d197c8bb",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "$$\n",
+    "p(y_i, \\boldsymbol{X}\\vert\\boldsymbol{\\theta})=\\frac{1}{\\sqrt{2\\pi\\sigma^2}}\\exp{\\left[-\\frac{(y_i-\\boldsymbol{X}_{i,*}\\boldsymbol{\\theta})^2}{2\\sigma^2}\\right]},\n",
+    "$$"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "e2e7462f",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "which reads as finding the likelihood of an event $y_i$ with the input variables $\\boldsymbol{X}$ given the parameters (to be determined) $\\boldsymbol{\\theta}$.\n",
+    "\n",
+    "Since these events are assumed to be independent and identically distributed we can build the probability distribution function (PDF) for all possible events $\\boldsymbol{y}$ as the product of the single events, that is we have"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "eb635d3d",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "$$\n",
+    "p(\\boldsymbol{y},\\boldsymbol{X}\\vert\\boldsymbol{\\theta})=\\prod_{i=0}^{n-1}\\frac{1}{\\sqrt{2\\pi\\sigma^2}}\\exp{\\left[-\\frac{(y_i-\\boldsymbol{X}_{i,*}\\boldsymbol{\\theta})^2}{2\\sigma^2}\\right]}=\\prod_{i=0}^{n-1}p(y_i,\\boldsymbol{X}\\vert\\boldsymbol{\\theta}).\n",
+    "$$"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "445ed13e",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "We will write this in a more compact form reserving $\\boldsymbol{D}$ for the domain of events, including the outputs (targets) and the inputs. 
That is\n", + "in case we have a simple one-dimensional input and output case" + ] + }, + { + "cell_type": "markdown", + "id": "319bfc6c", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{D}=[(x_0,y_0), (x_1,y_1),\\dots, (x_{n-1},y_{n-1})].\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "90abf35a", + "metadata": { + "editable": true + }, + "source": [ + "In the more general case the various inputs should be replaced by the possible features represented by the input data set $\\boldsymbol{X}$. \n", + "We can now rewrite the above probability as" + ] + }, + { + "cell_type": "markdown", + "id": "04b66fbd", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "p(\\boldsymbol{D}\\vert\\boldsymbol{\\theta})=\\prod_{i=0}^{n-1}\\frac{1}{\\sqrt{2\\pi\\sigma^2}}\\exp{\\left[-\\frac{(y_i-\\boldsymbol{X}_{i,*}\\boldsymbol{\\theta})^2}{2\\sigma^2}\\right]}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "4a27b5a7", + "metadata": { + "editable": true + }, + "source": [ + "It is a conditional probability (see below) and reads as the likelihood of a domain of events $\\boldsymbol{D}$ given a set of parameters $\\boldsymbol{\\theta}$." + ] + }, + { + "cell_type": "markdown", + "id": "8d12543f", + "metadata": { + "editable": true + }, + "source": [ + "## Maximum Likelihood Estimation (MLE)\n", + "\n", + "In statistics, maximum likelihood estimation (MLE) is a method of\n", + "estimating the parameters of an assumed probability distribution,\n", + "given some observed data. This is achieved by maximizing a likelihood\n", + "function so that, under the assumed statistical model, the observed\n", + "data is the most probable. \n", + "\n", + "We will assume here that our events are given by the above Gaussian\n", + "distribution and we will determine the optimal parameters $\\theta$ by\n", + "maximizing the above PDF. However, computing the derivatives of a\n", + "product function is cumbersome and can easily lead to overflow and/or\n", + "underflowproblems, with potentials for loss of numerical precision.\n", + "\n", + "In practice, it is more convenient to maximize the logarithm of the\n", + "PDF because it is a monotonically increasing function of the argument.\n", + "Alternatively, and this will be our option, we will minimize the\n", + "negative of the logarithm since this is a monotonically decreasing\n", + "function.\n", + "\n", + "Note also that maximization/minimization of the logarithm of the PDF\n", + "is equivalent to the maximization/minimization of the function itself." 
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "2e5cd118",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "## A new Cost Function\n",
+    "\n",
+    "We could now define a new cost function to minimize, namely the negative logarithm of the above PDF"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "c71a5edf",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "$$\n",
+    "C(\\boldsymbol{\\theta})=-\\log{\\prod_{i=0}^{n-1}p(y_i,\\boldsymbol{X}\\vert\\boldsymbol{\\theta})}=-\\sum_{i=0}^{n-1}\\log{p(y_i,\\boldsymbol{X}\\vert\\boldsymbol{\\theta})},\n",
+    "$$"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "e663bf2e",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "which becomes"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "c4bc4873",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "$$\n",
+    "C(\\boldsymbol{\\theta})=\\frac{n}{2}\\log{2\\pi\\sigma^2}+\\frac{\\vert\\vert (\\boldsymbol{y}-\\boldsymbol{X}\\boldsymbol{\\theta})\\vert\\vert_2^2}{2\\sigma^2}.\n",
+    "$$"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "f5bc59b8",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "Taking the derivative of the *new* cost function with respect to the parameters $\\theta$ we recognize our familiar OLS equation, namely"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "4f6ddf4a",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "$$\n",
+    "\\boldsymbol{X}^T\\left(\\boldsymbol{y}-\\boldsymbol{X}\\boldsymbol{\\theta}\\right) =0,\n",
+    "$$"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "afda0a6b",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "which leads to the well-known OLS equation for the optimal parameters $\\theta$"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "b5335dc0",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "$$\n",
+    "\\hat{\\boldsymbol{\\theta}}^{\\mathrm{OLS}}=\\left(\\boldsymbol{X}^T\\boldsymbol{X}\\right)^{-1}\\boldsymbol{X}^T\\boldsymbol{y}.\n",
+    "$$"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "4f86a52d",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "Next week we will make a similar analysis for Ridge and Lasso regression."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "5cdb1767",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "## Why resampling methods\n",
+    "\n",
+    "Before we proceed, we need to rethink what we have been doing. In our\n",
+    "eagerness to fit the data, we have omitted several important elements in\n",
+    "our regression analysis. In what follows we will\n",
+    "1. look at statistical properties, including a discussion of mean values, variance and the so-called bias-variance tradeoff\n",
+    "\n",
+    "2. introduce resampling techniques like cross-validation, bootstrapping and jackknife and more\n",
+    "\n",
+    "and discuss how to select a given model (one of the difficult parts in machine learning)."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "69435d77",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "## Resampling methods\n",
+    "Resampling methods are an indispensable tool in modern\n",
+    "statistics. They involve repeatedly drawing samples from a training\n",
+    "set and refitting a model of interest on each sample in order to\n",
+    "obtain additional information about the fitted model. For example, in\n",
+    "order to estimate the variability of a linear regression fit, we can\n",
+    "repeatedly draw different samples from the training data, fit a linear\n",
+    "regression to each new sample, and then examine the extent to which\n",
+    "the resulting fits differ. 
Such an approach may allow us to obtain\n", + "information that would not be available from fitting the model only\n", + "once using the original training sample.\n", + "\n", + "Two resampling methods are often used in Machine Learning analyses,\n", + "1. The **bootstrap method**\n", + "\n", + "2. and **Cross-Validation**\n", + "\n", + "In addition there are several other methods such as the Jackknife and the Blocking methods. We will discuss in particular\n", + "cross-validation and the bootstrap method." + ] + }, + { + "cell_type": "markdown", + "id": "cefbb559", + "metadata": { + "editable": true + }, + "source": [ + "## Resampling approaches can be computationally expensive\n", + "\n", + "Resampling approaches can be computationally expensive, because they\n", + "involve fitting the same statistical method multiple times using\n", + "different subsets of the training data. However, due to recent\n", + "advances in computing power, the computational requirements of\n", + "resampling methods generally are not prohibitive. In this chapter, we\n", + "discuss two of the most commonly used resampling methods,\n", + "cross-validation and the bootstrap. Both methods are important tools\n", + "in the practical application of many statistical learning\n", + "procedures. For example, cross-validation can be used to estimate the\n", + "test error associated with a given statistical learning method in\n", + "order to evaluate its performance, or to select the appropriate level\n", + "of flexibility. The process of evaluating a model’s performance is\n", + "known as model assessment, whereas the process of selecting the proper\n", + "level of flexibility for a model is known as model selection. The\n", + "bootstrap is widely used." + ] + }, + { + "cell_type": "markdown", + "id": "2659401a", + "metadata": { + "editable": true + }, + "source": [ + "## Why resampling methods ?\n", + "**Statistical analysis.**\n", + "\n", + "* Our simulations can be treated as *computer experiments*. This is particularly the case for Monte Carlo methods which are widely used in statistical analyses.\n", + "\n", + "* The results can be analysed with the same statistical tools as we would use when analysing experimental data.\n", + "\n", + "* As in all experiments, we are looking for expectation values and an estimate of how accurate they are, i.e., possible sources for errors." + ] + }, + { + "cell_type": "markdown", + "id": "4d5d7748", + "metadata": { + "editable": true + }, + "source": [ + "## Statistical analysis\n", + "\n", + "* As in other experiments, many numerical experiments have two classes of errors:\n", + "\n", + " * Statistical errors\n", + "\n", + " * Systematical errors\n", + "\n", + "* Statistical errors can be estimated using standard tools from statistics\n", + "\n", + "* Systematical errors are method specific and must be treated differently from case to case." + ] + }, + { + "cell_type": "markdown", + "id": "54df92b3", + "metadata": { + "editable": true + }, + "source": [ + "## Resampling methods\n", + "\n", + "With all these analytical equations for both the OLS and Ridge\n", + "regression, we will now outline how to assess a given model. This will\n", + "lead to a discussion of the so-called bias-variance tradeoff (see\n", + "below) and so-called resampling methods.\n", + "\n", + "One of the quantities we have discussed as a way to measure errors is\n", + "the mean-squared error (MSE), mainly used for fitting of continuous\n", + "functions. 
Another choice is the absolute error.\n", + "\n", + "In the discussions below we will focus on the MSE and in particular since we will split the data into test and training data,\n", + "we discuss the\n", + "1. prediction error or simply the **test error** $\\mathrm{Err_{Test}}$, where we have a fixed training set and the test error is the MSE arising from the data reserved for testing. We discuss also the \n", + "\n", + "2. training error $\\mathrm{Err_{Train}}$, which is the average loss over the training data.\n", + "\n", + "As our model becomes more and more complex, more of the training data tends to used. The training may thence adapt to more complicated structures in the data. This may lead to a decrease in the bias (see below for code example) and a slight increase of the variance for the test error.\n", + "For a certain level of complexity the test error will reach minimum, before starting to increase again. The\n", + "training error reaches a saturation." + ] + }, + { + "cell_type": "markdown", + "id": "5b1a1390", + "metadata": { + "editable": true + }, + "source": [ + "## Resampling methods: Bootstrap\n", + "Bootstrapping is a [non-parametric approach](https://en.wikipedia.org/wiki/Nonparametric_statistics) to statistical inference\n", + "that substitutes computation for more traditional distributional\n", + "assumptions and asymptotic results. Bootstrapping offers a number of\n", + "advantages: \n", + "1. The bootstrap is quite general, although there are some cases in which it fails. \n", + "\n", + "2. Because it does not require distributional assumptions (such as normally distributed errors), the bootstrap can provide more accurate inferences when the data are not well behaved or when the sample size is small. \n", + "\n", + "3. It is possible to apply the bootstrap to statistics with sampling distributions that are difficult to derive, even asymptotically. \n", + "\n", + "4. It is relatively simple to apply the bootstrap to complex data-collection plans (such as stratified and clustered samples).\n", + "\n", + "The textbook by [Davison on the Bootstrap Methods and their Applications](https://www.cambridge.org/core/books/bootstrap-methods-and-their-application/ED2FD043579F27952363566DC09CBD6A) provides many more insights and proofs. In this course we will take a more practical approach and use the results and theorems provided in the literature. For those interested in reading more about the bootstrap methods, we recommend the above text and the one by [Efron and Tibshirani](https://www.routledge.com/An-Introduction-to-the-Bootstrap/Efron-Tibshirani/p/book/9780412042317).\n", + "\n", + "Before we proceed however, we need to remind ourselves about a central theorem in statistics, namely the so-called **central limit theorem**." + ] + }, + { + "cell_type": "markdown", + "id": "39f233e4", + "metadata": { + "editable": true + }, + "source": [ + "## The Central Limit Theorem\n", + "\n", + "Suppose we have a PDF $p(x)$ from which we generate a series $N$\n", + "of averages $\\mathbb{E}[x_i]$. Each mean value $\\mathbb{E}[x_i]$\n", + "is viewed as the average of a specific measurement, e.g., throwing \n", + "dice 100 times and then taking the average value, or producing a certain\n", + "amount of random numbers. \n", + "For notational ease, we set $\\mathbb{E}[x_i]=x_i$ in the discussion\n", + "which follows. 
We do the same for $\\mathbb{E}[z]=z$.\n", + "\n", + "If we compute the mean $z$ of $m$ such mean values $x_i$" + ] + }, + { + "cell_type": "markdown", + "id": "361320d8", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "z=\\frac{x_1+x_2+\\dots+x_m}{m},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "a363db1e", + "metadata": { + "editable": true + }, + "source": [ + "the question we pose is which is the PDF of the new variable $z$." + ] + }, + { + "cell_type": "markdown", + "id": "92967efc", + "metadata": { + "editable": true + }, + "source": [ + "## Finding the Limit\n", + "\n", + "The probability of obtaining an average value $z$ is the product of the \n", + "probabilities of obtaining arbitrary individual mean values $x_i$,\n", + "but with the constraint that the average is $z$. We can express this through\n", + "the following expression" + ] + }, + { + "cell_type": "markdown", + "id": "1bffca97", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\tilde{p}(z)=\\int dx_1p(x_1)\\int dx_2p(x_2)\\dots\\int dx_mp(x_m)\n", + " \\delta(z-\\frac{x_1+x_2+\\dots+x_m}{m}),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "0dacb6fc", + "metadata": { + "editable": true + }, + "source": [ + "where the $\\delta$-function enbodies the constraint that the mean is $z$.\n", + "All measurements that lead to each individual $x_i$ are expected to\n", + "be independent, which in turn means that we can express $\\tilde{p}$ as the \n", + "product of individual $p(x_i)$. The independence assumption is important in the derivation of the central limit theorem." + ] + }, + { + "cell_type": "markdown", + "id": "baeedf81", + "metadata": { + "editable": true + }, + "source": [ + "## Rewriting the $\\delta$-function\n", + "\n", + "If we use the integral expression for the $\\delta$-function" + ] + }, + { + "cell_type": "markdown", + "id": "20cc7770", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta(z-\\frac{x_1+x_2+\\dots+x_m}{m})=\\frac{1}{2\\pi}\\int_{-\\infty}^{\\infty}\n", + " dq\\exp{\\left(iq(z-\\frac{x_1+x_2+\\dots+x_m}{m})\\right)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "f67d3b94", + "metadata": { + "editable": true + }, + "source": [ + "and inserting $e^{i\\mu q-i\\mu q}$ where $\\mu$ is the mean value\n", + "we arrive at" + ] + }, + { + "cell_type": "markdown", + "id": "17f59fb6", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\tilde{p}(z)=\\frac{1}{2\\pi}\\int_{-\\infty}^{\\infty}\n", + " dq\\exp{\\left(iq(z-\\mu)\\right)}\\left[\\int_{-\\infty}^{\\infty}\n", + " dxp(x)\\exp{\\left(iq(\\mu-x)/m\\right)}\\right]^m,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "5f899fbe", + "metadata": { + "editable": true + }, + "source": [ + "with the integral over $x$ resulting in" + ] + }, + { + "cell_type": "markdown", + "id": "19a1f5bb", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\int_{-\\infty}^{\\infty}dxp(x)\\exp{\\left(iq(\\mu-x)/m\\right)}=\n", + " \\int_{-\\infty}^{\\infty}dxp(x)\n", + " \\left[1+\\frac{iq(\\mu-x)}{m}-\\frac{q^2(\\mu-x)^2}{2m^2}+\\dots\\right].\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "1db8fcf2", + "metadata": { + "editable": true + }, + "source": [ + "## Identifying Terms\n", + "\n", + "The second term on the rhs disappears since this is just the mean and \n", + "employing the definition of $\\sigma^2$ we have" + ] + }, + { + "cell_type": "markdown", + "id": "bfadf7e5", + "metadata": { + "editable": true + }, + 
"source": [ + "$$\n", + "\\int_{-\\infty}^{\\infty}dxp(x)e^{\\left(iq(\\mu-x)/m\\right)}=\n", + " 1-\\frac{q^2\\sigma^2}{2m^2}+\\dots,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "7c65ce24", + "metadata": { + "editable": true + }, + "source": [ + "resulting in" + ] + }, + { + "cell_type": "markdown", + "id": "8cd5650a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\left[\\int_{-\\infty}^{\\infty}dxp(x)\\exp{\\left(iq(\\mu-x)/m\\right)}\\right]^m\\approx\n", + " \\left[1-\\frac{q^2\\sigma^2}{2m^2}+\\dots \\right]^m,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "11fdc936", + "metadata": { + "editable": true + }, + "source": [ + "and in the limit $m\\rightarrow \\infty$ we obtain" + ] + }, + { + "cell_type": "markdown", + "id": "ed88642e", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\tilde{p}(z)=\\frac{1}{\\sqrt{2\\pi}(\\sigma/\\sqrt{m})}\n", + " \\exp{\\left(-\\frac{(z-\\mu)^2}{2(\\sigma/\\sqrt{m})^2}\\right)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "82c61b81", + "metadata": { + "editable": true + }, + "source": [ + "which is the normal distribution with variance\n", + "$\\sigma^2_m=\\sigma^2/m$, where $\\sigma$ is the variance of the PDF $p(x)$\n", + "and $\\mu$ is also the mean of the PDF $p(x)$." + ] + }, + { + "cell_type": "markdown", + "id": "bc43db46", + "metadata": { + "editable": true + }, + "source": [ + "## Wrapping it up\n", + "\n", + "Thus, the central limit theorem states that the PDF $\\tilde{p}(z)$ of\n", + "the average of $m$ random values corresponding to a PDF $p(x)$ \n", + "is a normal distribution whose mean is the \n", + "mean value of the PDF $p(x)$ and whose variance is the variance\n", + "of the PDF $p(x)$ divided by $m$, the number of values used to compute $z$.\n", + "\n", + "The central limit theorem leads to the well-known expression for the\n", + "standard deviation, given by" + ] + }, + { + "cell_type": "markdown", + "id": "25418113", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\sigma_m=\n", + "\\frac{\\sigma}{\\sqrt{m}}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "e5d3c3eb", + "metadata": { + "editable": true + }, + "source": [ + "The latter is true only if the average value is known exactly. This is obtained in the limit\n", + "$m\\rightarrow \\infty$ only. Because the mean and the variance are measured quantities we obtain \n", + "the familiar expression in statistics (the so-called Bessel correction)" + ] + }, + { + "cell_type": "markdown", + "id": "c504cba4", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\sigma_m\\approx \n", + "\\frac{\\sigma}{\\sqrt{m-1}}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "079ded2a", + "metadata": { + "editable": true + }, + "source": [ + "In many cases however the above estimate for the standard deviation,\n", + "in particular if correlations are strong, may be too simplistic. Keep\n", + "in mind that we have assumed that the variables $x$ are independent\n", + "and identically distributed. This is obviously not always the\n", + "case. For example, the random numbers (or better pseudorandom numbers)\n", + "we generate in various calculations do always exhibit some\n", + "correlations.\n", + "\n", + "The theorem is satisfied by a large class of PDFs. Note however that for a\n", + "finite $m$, it is not always possible to find a closed form /analytic expression for\n", + "$\\tilde{p}(x)$." 
+ ] + }, + { + "cell_type": "markdown", + "id": "e8534a50", + "metadata": { + "editable": true + }, + "source": [ + "## Confidence Intervals\n", + "\n", + "Confidence intervals are used in statistics and represent a type of estimate\n", + "computed from the observed data. This gives a range of values for an\n", + "unknown parameter such as the parameters $\\boldsymbol{\\theta}$ from linear regression.\n", + "\n", + "With the OLS expressions for the parameters $\\boldsymbol{\\theta}$ we found \n", + "$\\mathbb{E}(\\boldsymbol{\\theta}) = \\boldsymbol{\\theta}$, which means that the estimator of the regression parameters is unbiased.\n", + "\n", + "In the exercises this week we show that the variance of the estimate of the $j$-th regression coefficient is\n", + "$\\boldsymbol{\\sigma}^2 (\\boldsymbol{\\theta}_j ) = \\boldsymbol{\\sigma}^2 [(\\mathbf{X}^{T} \\mathbf{X})^{-1}]_{jj} $.\n", + "\n", + "This quantity can be used to\n", + "construct a confidence interval for the estimates." + ] + }, + { + "cell_type": "markdown", + "id": "2fc73431", + "metadata": { + "editable": true + }, + "source": [ + "## Standard Approach based on the Normal Distribution\n", + "\n", + "We will assume that the parameters $\\theta$ follow a normal\n", + "distribution. We can then define the confidence interval. Here we will be using as\n", + "shorthands $\\mu_{\\theta}$ for the above mean value and $\\sigma_{\\theta}$\n", + "for the standard deviation. We have then a confidence interval" + ] + }, + { + "cell_type": "markdown", + "id": "0f8b0845", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\left(\\mu_{\\theta}\\pm \\frac{z\\sigma_{\\theta}}{\\sqrt{n}}\\right),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "25105753", + "metadata": { + "editable": true + }, + "source": [ + "where $z$ defines the level of certainty (or confidence). For a normal\n", + "distribution typical parameters are $z=2.576$ which corresponds to a\n", + "confidence of $99\\%$ while $z=1.96$ corresponds to a confidence of\n", + "$95\\%$. A confidence level of $95\\%$ is commonly used and it is\n", + "normally referred to as a *two-sigmas* confidence level, that is we\n", + "approximate $z\\approx 2$.\n", + "\n", + "For more discussions of confidence intervals (and in particular linked with a discussion of the bootstrap method), see chapter 5 of the textbook by [Davison on the Bootstrap Methods and their Applications](https://www.cambridge.org/core/books/bootstrap-methods-and-their-application/ED2FD043579F27952363566DC09CBD6A)\n", + "\n", + "In this text you will also find an in-depth discussion of the\n", + "Bootstrap method, why it works and various theorems related to it." + ] + }, + { + "cell_type": "markdown", + "id": "89be6eea", + "metadata": { + "editable": true + }, + "source": [ + "## Resampling methods: Bootstrap background\n", + "\n", + "Since $\\widehat{\\theta} = \\widehat{\\theta}(\\boldsymbol{X})$ is a function of random variables,\n", + "$\\widehat{\\theta}$ itself must be a random variable. Thus it has\n", + "a pdf, call this function $p(\\boldsymbol{t})$. The aim of the bootstrap is to\n", + "estimate $p(\\boldsymbol{t})$ by the relative frequency of\n", + "$\\widehat{\\theta}$. You can think of this as using a histogram\n", + "in the place of $p(\\boldsymbol{t})$. If the relative frequency closely\n", + "resembles $p(\\vec{t})$, then using numerics, it is straight forward to\n", + "estimate all the interesting parameters of $p(\\boldsymbol{t})$ using point\n", + "estimators." 
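Returning briefly to the confidence-interval expressions above, the following sketch (added for illustration, with synthetic data and the noise level $\sigma$ taken as known) turns the variance $\sigma^2(\theta_j)=\sigma^2[(\boldsymbol{X}^T\boldsymbol{X})^{-1}]_{jj}$ into approximate $95\%$ intervals for the OLS parameters.

```python
import numpy as np

rng = np.random.default_rng(3155)

# Synthetic data y = X theta + eps, with a known noise level sigma
n, sigma = 200, 0.5
x = rng.uniform(-1, 1, n)
X = np.column_stack([np.ones(n), x, x**2])   # design matrix with intercept
theta_true = np.array([1.0, -2.0, 3.0])
y = X @ theta_true + sigma * rng.standard_normal(n)

# OLS estimate and its covariance sigma^2 (X^T X)^{-1}
XtX_inv = np.linalg.inv(X.T @ X)
theta_hat = XtX_inv @ X.T @ y
std_err = sigma * np.sqrt(np.diag(XtX_inv))

# Approximate 95% confidence intervals (z = 1.96)
for j, (t, s) in enumerate(zip(theta_hat, std_err)):
    print(f"theta_{j}: {t:6.3f} +- {1.96*s:5.3f}   (true value {theta_true[j]})")
```

In practice $\sigma$ is not known and is replaced by an estimate from the residuals; the sketch keeps it fixed only to stay close to the expressions quoted above.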
+ ] + }, + { + "cell_type": "markdown", + "id": "6c240b38", + "metadata": { + "editable": true + }, + "source": [ + "## Resampling methods: More Bootstrap background\n", + "\n", + "In the case that $\\widehat{\\theta}$ has\n", + "more than one component, and the components are independent, we use the\n", + "same estimator on each component separately. If the probability\n", + "density function of $X_i$, $p(x)$, had been known, then it would have\n", + "been straightforward to do this by: \n", + "1. Drawing lots of numbers from $p(x)$, suppose we call one such set of numbers $(X_1^*, X_2^*, \\cdots, X_n^*)$. \n", + "\n", + "2. Then using these numbers, we could compute a replica of $\\widehat{\\theta}$ called $\\widehat{\\theta}^*$. \n", + "\n", + "By repeated use of the above two points, many\n", + "estimates of $\\widehat{\\theta}$ can be obtained. The\n", + "idea is to use the relative frequency of $\\widehat{\\theta}^*$\n", + "(think of a histogram) as an estimate of $p(\\boldsymbol{t})$." + ] + }, + { + "cell_type": "markdown", + "id": "fbd95a5c", + "metadata": { + "editable": true + }, + "source": [ + "## Resampling methods: Bootstrap approach\n", + "\n", + "But\n", + "unless there is enough information available about the process that\n", + "generated $X_1,X_2,\\cdots,X_n$, $p(x)$ is in general\n", + "unknown. Therefore, [Efron in 1979](https://projecteuclid.org/euclid.aos/1176344552) asked the\n", + "question: What if we replace $p(x)$ by the relative frequency\n", + "of the observation $X_i$?\n", + "\n", + "If we draw observations in accordance with\n", + "the relative frequency of the observations, will we obtain the same\n", + "result in some asymptotic sense? The answer is yes." + ] + }, + { + "cell_type": "markdown", + "id": "dc50d43a", + "metadata": { + "editable": true + }, + "source": [ + "## Resampling methods: Bootstrap steps\n", + "\n", + "The independent bootstrap works like this: \n", + "\n", + "1. Draw with replacement $n$ numbers for the observed variables $\\boldsymbol{x} = (x_1,x_2,\\cdots,x_n)$. \n", + "\n", + "2. Define a vector $\\boldsymbol{x}^*$ containing the values which were drawn from $\\boldsymbol{x}$. \n", + "\n", + "3. Using the vector $\\boldsymbol{x}^*$ compute $\\widehat{\\theta}^*$ by evaluating $\\widehat \\theta$ under the observations $\\boldsymbol{x}^*$. \n", + "\n", + "4. Repeat this process $k$ times. \n", + "\n", + "When you are done, you can draw a histogram of the relative frequency\n", + "of $\\widehat \\theta^*$. This is your estimate of the probability\n", + "distribution $p(t)$. Using this probability distribution you can\n", + "estimate any statistics thereof. In principle you never draw the\n", + "histogram of the relative frequency of $\\widehat{\\theta}^*$. Instead\n", + "you use the estimators corresponding to the statistic of interest. For\n", + "example, if you are interested in estimating the variance of $\\widehat\n", + "\\theta$, apply the etsimator $\\widehat \\sigma^2$ to the values\n", + "$\\widehat \\theta^*$." + ] + }, + { + "cell_type": "markdown", + "id": "283068cc", + "metadata": { + "editable": true + }, + "source": [ + "## Code example for the Bootstrap method\n", + "\n", + "The following code starts with a Gaussian distribution with mean value\n", + "$\\mu =100$ and variance $\\sigma=15$. We use this to generate the data\n", + "used in the bootstrap analysis. The bootstrap analysis returns a data\n", + "set after a given number of bootstrap operations (as many as we have\n", + "data points). 
This data set consists of estimated mean values for each\n", + "bootstrap operation. The histogram generated by the bootstrap method\n", + "shows that the distribution for these mean values is also a Gaussian,\n", + "centered around the mean value $\\mu=100$ but with standard deviation\n", + "$\\sigma/\\sqrt{n}$, where $n$ is the number of bootstrap samples (in\n", + "this case the same as the number of original data points). The value\n", + "of the standard deviation is what we expect from the central limit\n", + "theorem." + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "ff4790ba", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "%matplotlib inline\n", + "\n", + "import numpy as np\n", + "from time import time\n", + "from scipy.stats import norm\n", + "import matplotlib.pyplot as plt\n", + "\n", + "# Returns mean of bootstrap samples \n", + "# Bootstrap algorithm\n", + "def bootstrap(data, datapoints):\n", + " t = np.zeros(datapoints)\n", + " n = len(data)\n", + " # non-parametric bootstrap \n", + " for i in range(datapoints):\n", + " t[i] = np.mean(data[np.random.randint(0,n,n)])\n", + " # analysis \n", + " print(\"Bootstrap Statistics :\")\n", + " print(\"original bias std. error\")\n", + " print(\"%8g %8g %14g %15g\" % (np.mean(data), np.std(data),np.mean(t),np.std(t)))\n", + " return t\n", + "\n", + "# We set the mean value to 100 and the standard deviation to 15\n", + "mu, sigma = 100, 15\n", + "datapoints = 10000\n", + "# We generate random numbers according to the normal distribution\n", + "x = mu + sigma*np.random.randn(datapoints)\n", + "# bootstrap returns the data sample \n", + "t = bootstrap(x, datapoints)" + ] + }, + { + "cell_type": "markdown", + "id": "3e6adc2f", + "metadata": { + "editable": true + }, + "source": [ + "We see that our new variance and from that the standard deviation, agrees with the central limit theorem." + ] + }, + { + "cell_type": "markdown", + "id": "6ec8223c", + "metadata": { + "editable": true + }, + "source": [ + "## Plotting the Histogram" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "3cf4144d", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# the histogram of the bootstrapped data (normalized data if density = True)\n", + "n, binsboot, patches = plt.hist(t, 50, density=True, facecolor='red', alpha=0.75)\n", + "# add a 'best fit' line \n", + "y = norm.pdf(binsboot, np.mean(t), np.std(t))\n", + "lt = plt.plot(binsboot, y, 'b', linewidth=1)\n", + "plt.xlabel('x')\n", + "plt.ylabel('Probability')\n", + "plt.grid(True)\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "db5a8f91", + "metadata": { + "editable": true + }, + "source": [ + "## The bias-variance tradeoff\n", + "\n", + "We will discuss the bias-variance tradeoff in the context of\n", + "continuous predictions such as regression. However, many of the\n", + "intuitions and ideas discussed here also carry over to classification\n", + "tasks. Consider a dataset $\\mathcal{D}$ consisting of the data\n", + "$\\mathbf{X}_\\mathcal{D}=\\{(y_j, \\boldsymbol{x}_j), j=0\\ldots n-1\\}$. 
\n", + "\n", + "Let us assume that the true data is generated from a noisy model" + ] + }, + { + "cell_type": "markdown", + "id": "327bce6a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{y}=f(\\boldsymbol{x}) + \\boldsymbol{\\epsilon}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "1c671d4e", + "metadata": { + "editable": true + }, + "source": [ + "where $\\epsilon$ is normally distributed with mean zero and standard deviation $\\sigma^2$.\n", + "\n", + "In our derivation of the ordinary least squares method we defined then\n", + "an approximation to the function $f$ in terms of the parameters\n", + "$\\boldsymbol{\\theta}$ and the design matrix $\\boldsymbol{X}$ which embody our model,\n", + "that is $\\boldsymbol{\\tilde{y}}=\\boldsymbol{X}\\boldsymbol{\\theta}$. \n", + "\n", + "Thereafter we found the parameters $\\boldsymbol{\\theta}$ by optimizing the means squared error via the so-called cost function" + ] + }, + { + "cell_type": "markdown", + "id": "6e05fc43", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C(\\boldsymbol{X},\\boldsymbol{\\theta}) =\\frac{1}{n}\\sum_{i=0}^{n-1}(y_i-\\tilde{y}_i)^2=\\mathbb{E}\\left[(\\boldsymbol{y}-\\boldsymbol{\\tilde{y}})^2\\right].\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "c45e0752", + "metadata": { + "editable": true + }, + "source": [ + "We can rewrite this as" + ] + }, + { + "cell_type": "markdown", + "id": "bafa4ab6", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathbb{E}\\left[(\\boldsymbol{y}-\\boldsymbol{\\tilde{y}})^2\\right]=\\frac{1}{n}\\sum_i(f_i-\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right])^2+\\frac{1}{n}\\sum_i(\\tilde{y}_i-\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right])^2+\\sigma^2.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "ea0bc471", + "metadata": { + "editable": true + }, + "source": [ + "The three terms represent the square of the bias of the learning\n", + "method, which can be thought of as the error caused by the simplifying\n", + "assumptions built into the method. The second term represents the\n", + "variance of the chosen model and finally the last terms is variance of\n", + "the error $\\boldsymbol{\\epsilon}$.\n", + "\n", + "To derive this equation, we need to recall that the variance of $\\boldsymbol{y}$ and $\\boldsymbol{\\epsilon}$ are both equal to $\\sigma^2$. The mean value of $\\boldsymbol{\\epsilon}$ is by definition equal to zero. 
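For completeness, here is the intermediate step that is implicit in the rewriting carried out in the next few cells. Splitting

$$
\boldsymbol{y}-\boldsymbol{\tilde{y}}=\left(\boldsymbol{f}-\mathbb{E}\left[\boldsymbol{\tilde{y}}\right]\right)+\left(\mathbb{E}\left[\boldsymbol{\tilde{y}}\right]-\boldsymbol{\tilde{y}}\right)+\boldsymbol{\epsilon},
$$

and expanding the square, all three cross terms have expectation value zero: $\boldsymbol{f}-\mathbb{E}[\boldsymbol{\tilde{y}}]$ is not stochastic, $\mathbb{E}[\mathbb{E}[\boldsymbol{\tilde{y}}]-\boldsymbol{\tilde{y}}]=0$, $\mathbb{E}[\boldsymbol{\epsilon}]=0$, and $\boldsymbol{\epsilon}$ is assumed independent of $\boldsymbol{\tilde{y}}$. What remains is

$$
\mathbb{E}\left[(\boldsymbol{y}-\boldsymbol{\tilde{y}})^2\right]=\mathbb{E}\left[(\boldsymbol{f}-\mathbb{E}\left[\boldsymbol{\tilde{y}}\right])^2\right]+\mathrm{Var}\left[\boldsymbol{\tilde{y}}\right]+\sigma^2,
$$

which is the decomposition quoted above.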
Furthermore, the function $f$ is not a stochastics variable, idem for $\\boldsymbol{\\tilde{y}}$.\n", + "We use a more compact notation in terms of the expectation value" + ] + }, + { + "cell_type": "markdown", + "id": "08b603f3", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathbb{E}\\left[(\\boldsymbol{y}-\\boldsymbol{\\tilde{y}})^2\\right]=\\mathbb{E}\\left[(\\boldsymbol{f}+\\boldsymbol{\\epsilon}-\\boldsymbol{\\tilde{y}})^2\\right],\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "4114d10e", + "metadata": { + "editable": true + }, + "source": [ + "and adding and subtracting $\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right]$ we get" + ] + }, + { + "cell_type": "markdown", + "id": "8890c666", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathbb{E}\\left[(\\boldsymbol{y}-\\boldsymbol{\\tilde{y}})^2\\right]=\\mathbb{E}\\left[(\\boldsymbol{f}+\\boldsymbol{\\epsilon}-\\boldsymbol{\\tilde{y}}+\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right]-\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right])^2\\right],\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "7d5b7ce4", + "metadata": { + "editable": true + }, + "source": [ + "which, using the abovementioned expectation values can be rewritten as" + ] + }, + { + "cell_type": "markdown", + "id": "3913c5b9", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathbb{E}\\left[(\\boldsymbol{y}-\\boldsymbol{\\tilde{y}})^2\\right]=\\mathbb{E}\\left[(\\boldsymbol{y}-\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right])^2\\right]+\\mathrm{Var}\\left[\\boldsymbol{\\tilde{y}}\\right]+\\sigma^2,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "5e0067b1", + "metadata": { + "editable": true + }, + "source": [ + "that is the rewriting in terms of the so-called bias, the variance of the model $\\boldsymbol{\\tilde{y}}$ and the variance of $\\boldsymbol{\\epsilon}$." + ] + }, + { + "cell_type": "markdown", + "id": "326bc8f1", + "metadata": { + "editable": true + }, + "source": [ + "## A way to Read the Bias-Variance Tradeoff\n", + "\n", + "\n", + "\n", + "\n", + "

Figure 1: Illustration of the bias-variance tradeoff.
    \n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "d3713eca", + "metadata": { + "editable": true + }, + "source": [ + "## Example code for Bias-Variance tradeoff" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "01c3b507", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import matplotlib.pyplot as plt\n", + "import numpy as np\n", + "from sklearn.linear_model import LinearRegression, Ridge, Lasso\n", + "from sklearn.preprocessing import PolynomialFeatures\n", + "from sklearn.model_selection import train_test_split\n", + "from sklearn.pipeline import make_pipeline\n", + "from sklearn.utils import resample\n", + "\n", + "np.random.seed(2018)\n", + "\n", + "n = 500\n", + "n_boostraps = 100\n", + "degree = 18 # A quite high value, just to show.\n", + "noise = 0.1\n", + "\n", + "# Make data set.\n", + "x = np.linspace(-1, 3, n).reshape(-1, 1)\n", + "y = np.exp(-x**2) + 1.5 * np.exp(-(x-2)**2) + np.random.normal(0, 0.1, x.shape)\n", + "\n", + "# Hold out some test data that is never used in training.\n", + "x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)\n", + "\n", + "# Combine x transformation and model into one operation.\n", + "# Not neccesary, but convenient.\n", + "model = make_pipeline(PolynomialFeatures(degree=degree), LinearRegression(fit_intercept=False))\n", + "\n", + "# The following (m x n_bootstraps) matrix holds the column vectors y_pred\n", + "# for each bootstrap iteration.\n", + "y_pred = np.empty((y_test.shape[0], n_boostraps))\n", + "for i in range(n_boostraps):\n", + " x_, y_ = resample(x_train, y_train)\n", + "\n", + " # Evaluate the new model on the same test data each time.\n", + " y_pred[:, i] = model.fit(x_, y_).predict(x_test).ravel()\n", + "\n", + "# Note: Expectations and variances taken w.r.t. different training\n", + "# data sets, hence the axis=1. Subsequent means are taken across the test data\n", + "# set in order to obtain a total value, but before this we have error/bias/variance\n", + "# calculated per data point in the test set.\n", + "# Note 2: The use of keepdims=True is important in the calculation of bias as this \n", + "# maintains the column vector form. 
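# (Concretely: y_test has shape (n_test, 1), while np.mean(y_pred, axis=1) without keepdims has
# shape (n_test,), so the subtraction would broadcast to an (n_test, n_test) matrix.)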
Dropping this yields very unexpected results.\n", + "error = np.mean( np.mean((y_test - y_pred)**2, axis=1, keepdims=True) )\n", + "bias = np.mean( (y_test - np.mean(y_pred, axis=1, keepdims=True))**2 )\n", + "variance = np.mean( np.var(y_pred, axis=1, keepdims=True) )\n", + "print('Error:', error)\n", + "print('Bias^2:', bias)\n", + "print('Var:', variance)\n", + "print('{} >= {} + {} = {}'.format(error, bias, variance, bias+variance))\n", + "\n", + "plt.plot(x[::5, :], y[::5, :], label='f(x)')\n", + "plt.scatter(x_test, y_test, label='Data points')\n", + "plt.scatter(x_test, np.mean(y_pred, axis=1), label='Pred')\n", + "plt.legend()\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "949e3a5e", + "metadata": { + "editable": true + }, + "source": [ + "## Understanding what happens" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "7e7f4926", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import matplotlib.pyplot as plt\n", + "import numpy as np\n", + "from sklearn.linear_model import LinearRegression, Ridge, Lasso\n", + "from sklearn.preprocessing import PolynomialFeatures\n", + "from sklearn.model_selection import train_test_split\n", + "from sklearn.pipeline import make_pipeline\n", + "from sklearn.utils import resample\n", + "\n", + "np.random.seed(2018)\n", + "\n", + "n = 40\n", + "n_boostraps = 100\n", + "maxdegree = 14\n", + "\n", + "\n", + "# Make data set.\n", + "x = np.linspace(-3, 3, n).reshape(-1, 1)\n", + "y = np.exp(-x**2) + 1.5 * np.exp(-(x-2)**2)+ np.random.normal(0, 0.1, x.shape)\n", + "error = np.zeros(maxdegree)\n", + "bias = np.zeros(maxdegree)\n", + "variance = np.zeros(maxdegree)\n", + "polydegree = np.zeros(maxdegree)\n", + "x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)\n", + "\n", + "for degree in range(maxdegree):\n", + " model = make_pipeline(PolynomialFeatures(degree=degree), LinearRegression(fit_intercept=False))\n", + " y_pred = np.empty((y_test.shape[0], n_boostraps))\n", + " for i in range(n_boostraps):\n", + " x_, y_ = resample(x_train, y_train)\n", + " y_pred[:, i] = model.fit(x_, y_).predict(x_test).ravel()\n", + "\n", + " polydegree[degree] = degree\n", + " error[degree] = np.mean( np.mean((y_test - y_pred)**2, axis=1, keepdims=True) )\n", + " bias[degree] = np.mean( (y_test - np.mean(y_pred, axis=1, keepdims=True))**2 )\n", + " variance[degree] = np.mean( np.var(y_pred, axis=1, keepdims=True) )\n", + " print('Polynomial degree:', degree)\n", + " print('Error:', error[degree])\n", + " print('Bias^2:', bias[degree])\n", + " print('Var:', variance[degree])\n", + " print('{} >= {} + {} = {}'.format(error[degree], bias[degree], variance[degree], bias[degree]+variance[degree]))\n", + "\n", + "plt.plot(polydegree, error, label='Error')\n", + "plt.plot(polydegree, bias, label='bias')\n", + "plt.plot(polydegree, variance, label='Variance')\n", + "plt.legend()\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "33c5cae5", + "metadata": { + "editable": true + }, + "source": [ + "## Summing up\n", + "\n", + "The bias-variance tradeoff summarizes the fundamental tension in\n", + "machine learning, particularly supervised learning, between the\n", + "complexity of a model and the amount of training data needed to train\n", + "it. 
Since data is often limited, in practice it is often useful to\n", + "use a less-complex model with higher bias, that is a model whose asymptotic\n", + "performance is worse than another model because it is easier to\n", + "train and less sensitive to sampling noise arising from having a\n", + "finite-sized training dataset (smaller variance). \n", + "\n", + "The above equations tell us that in\n", + "order to minimize the expected test error, we need to select a\n", + "statistical learning method that simultaneously achieves low variance\n", + "and low bias. Note that variance is inherently a nonnegative quantity,\n", + "and squared bias is also nonnegative. Hence, we see that the expected\n", + "test MSE can never lie below $Var(\\epsilon)$, the irreducible error.\n", + "\n", + "What do we mean by the variance and bias of a statistical learning\n", + "method? The variance refers to the amount by which our model would change if we\n", + "estimated it using a different training data set. Since the training\n", + "data are used to fit the statistical learning method, different\n", + "training data sets will result in a different estimate. But ideally the\n", + "estimate for our model should not vary too much between training\n", + "sets. However, if a method has high variance then small changes in\n", + "the training data can result in large changes in the model. In general, more\n", + "flexible statistical methods have higher variance.\n", + "\n", + "You may also find this recent [article](https://www.pnas.org/content/116/32/15849) of interest." + ] + }, + { + "cell_type": "markdown", + "id": "f931f0f2", + "metadata": { + "editable": true + }, + "source": [ + "## Another Example from Scikit-Learn's Repository\n", + "\n", + "This example demonstrates the problems of underfitting and overfitting and\n", + "how we can use linear regression with polynomial features to approximate\n", + "nonlinear functions. The plot shows the function that we want to approximate,\n", + "which is a part of the cosine function. In addition, the samples from the\n", + "real function and the approximations of different models are displayed. The\n", + "models have polynomial features of different degrees. We can see that a\n", + "linear function (polynomial with degree 1) is not sufficient to fit the\n", + "training samples. This is called **underfitting**. A polynomial of degree 4\n", + "approximates the true function almost perfectly. However, for higher degrees\n", + "the model will **overfit** the training data, i.e. it learns the noise of the\n", + "training data.\n", + "We evaluate quantitatively overfitting and underfitting by using\n", + "cross-validation. We calculate the mean squared error (MSE) on the validation\n", + "set, the higher, the less likely the model generalizes correctly from the\n", + "training data." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "58daa28d", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "\n", + "\n", + "#print(__doc__)\n", + "\n", + "import numpy as np\n", + "import matplotlib.pyplot as plt\n", + "from sklearn.pipeline import Pipeline\n", + "from sklearn.preprocessing import PolynomialFeatures\n", + "from sklearn.linear_model import LinearRegression\n", + "from sklearn.model_selection import cross_val_score\n", + "\n", + "\n", + "def true_fun(X):\n", + " return np.cos(1.5 * np.pi * X)\n", + "\n", + "np.random.seed(0)\n", + "\n", + "n_samples = 30\n", + "degrees = [1, 4, 15]\n", + "\n", + "X = np.sort(np.random.rand(n_samples))\n", + "y = true_fun(X) + np.random.randn(n_samples) * 0.1\n", + "\n", + "plt.figure(figsize=(14, 5))\n", + "for i in range(len(degrees)):\n", + " ax = plt.subplot(1, len(degrees), i + 1)\n", + " plt.setp(ax, xticks=(), yticks=())\n", + "\n", + " polynomial_features = PolynomialFeatures(degree=degrees[i],\n", + " include_bias=False)\n", + " linear_regression = LinearRegression()\n", + " pipeline = Pipeline([(\"polynomial_features\", polynomial_features),\n", + " (\"linear_regression\", linear_regression)])\n", + " pipeline.fit(X[:, np.newaxis], y)\n", + "\n", + " # Evaluate the models using crossvalidation\n", + " scores = cross_val_score(pipeline, X[:, np.newaxis], y,\n", + " scoring=\"neg_mean_squared_error\", cv=10)\n", + "\n", + " X_test = np.linspace(0, 1, 100)\n", + " plt.plot(X_test, pipeline.predict(X_test[:, np.newaxis]), label=\"Model\")\n", + " plt.plot(X_test, true_fun(X_test), label=\"True function\")\n", + " plt.scatter(X, y, edgecolor='b', s=20, label=\"Samples\")\n", + " plt.xlabel(\"x\")\n", + " plt.ylabel(\"y\")\n", + " plt.xlim((0, 1))\n", + " plt.ylim((-2, 2))\n", + " plt.legend(loc=\"best\")\n", + " plt.title(\"Degree {}\\nMSE = {:.2e}(+/- {:.2e})\".format(\n", + " degrees[i], -scores.mean(), scores.std()))\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "3bbcf741", + "metadata": { + "editable": true + }, + "source": [ + "## Various steps in cross-validation\n", + "\n", + "When the repetitive splitting of the data set is done randomly,\n", + "samples may accidently end up in a fast majority of the splits in\n", + "either training or test set. Such samples may have an unbalanced\n", + "influence on either model building or prediction evaluation. To avoid\n", + "this $k$-fold cross-validation structures the data splitting. The\n", + "samples are divided into $k$ more or less equally sized exhaustive and\n", + "mutually exclusive subsets. In turn (at each split) one of these\n", + "subsets plays the role of the test set while the union of the\n", + "remaining subsets constitutes the training set. Such a splitting\n", + "warrants a balanced representation of each sample in both training and\n", + "test set over the splits. Still the division into the $k$ subsets\n", + "involves a degree of randomness. This may be fully excluded when\n", + "choosing $k=n$. This particular case is referred to as leave-one-out\n", + "cross-validation (LOOCV)." + ] + }, + { + "cell_type": "markdown", + "id": "4b0ffe06", + "metadata": { + "editable": true + }, + "source": [ + "## Cross-validation in brief\n", + "\n", + "For the various values of $k$\n", + "\n", + "1. shuffle the dataset randomly.\n", + "\n", + "2. Split the dataset into $k$ groups.\n", + "\n", + "3. For each unique group:\n", + "\n", + "a. 
Decide which group to use as set for test data\n", + "\n", + "b. Take the remaining groups as a training data set\n", + "\n", + "c. Fit a model on the training set and evaluate it on the test set\n", + "\n", + "d. Retain the evaluation score and discard the model\n", + "\n", + "5. Summarize the model using the sample of model evaluation scores" + ] + }, + { + "cell_type": "markdown", + "id": "b11baed6", + "metadata": { + "editable": true + }, + "source": [ + "## Code Example for Cross-validation and $k$-fold Cross-validation\n", + "\n", + "The code here uses Ridge regression with cross-validation (CV) resampling and $k$-fold CV in order to fit a specific polynomial." + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "39e76d49", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import numpy as np\n", + "import matplotlib.pyplot as plt\n", + "from sklearn.model_selection import KFold\n", + "from sklearn.linear_model import Ridge\n", + "from sklearn.model_selection import cross_val_score\n", + "from sklearn.preprocessing import PolynomialFeatures\n", + "\n", + "# A seed just to ensure that the random numbers are the same for every run.\n", + "# Useful for eventual debugging.\n", + "np.random.seed(3155)\n", + "\n", + "# Generate the data.\n", + "nsamples = 100\n", + "x = np.random.randn(nsamples)\n", + "y = 3*x**2 + np.random.randn(nsamples)\n", + "\n", + "## Cross-validation on Ridge regression using KFold only\n", + "\n", + "# Decide degree on polynomial to fit\n", + "poly = PolynomialFeatures(degree = 6)\n", + "\n", + "# Decide which values of lambda to use\n", + "nlambdas = 500\n", + "lambdas = np.logspace(-3, 5, nlambdas)\n", + "\n", + "# Initialize a KFold instance\n", + "k = 5\n", + "kfold = KFold(n_splits = k)\n", + "\n", + "# Perform the cross-validation to estimate MSE\n", + "scores_KFold = np.zeros((nlambdas, k))\n", + "\n", + "i = 0\n", + "for lmb in lambdas:\n", + " ridge = Ridge(alpha = lmb)\n", + " j = 0\n", + " for train_inds, test_inds in kfold.split(x):\n", + " xtrain = x[train_inds]\n", + " ytrain = y[train_inds]\n", + "\n", + " xtest = x[test_inds]\n", + " ytest = y[test_inds]\n", + "\n", + " Xtrain = poly.fit_transform(xtrain[:, np.newaxis])\n", + " ridge.fit(Xtrain, ytrain[:, np.newaxis])\n", + "\n", + " Xtest = poly.fit_transform(xtest[:, np.newaxis])\n", + " ypred = ridge.predict(Xtest)\n", + "\n", + " scores_KFold[i,j] = np.sum((ypred - ytest[:, np.newaxis])**2)/np.size(ypred)\n", + "\n", + " j += 1\n", + " i += 1\n", + "\n", + "\n", + "estimated_mse_KFold = np.mean(scores_KFold, axis = 1)\n", + "\n", + "## Cross-validation using cross_val_score from sklearn along with KFold\n", + "\n", + "# kfold is an instance initialized above as:\n", + "# kfold = KFold(n_splits = k)\n", + "\n", + "estimated_mse_sklearn = np.zeros(nlambdas)\n", + "i = 0\n", + "for lmb in lambdas:\n", + " ridge = Ridge(alpha = lmb)\n", + "\n", + " X = poly.fit_transform(x[:, np.newaxis])\n", + " estimated_mse_folds = cross_val_score(ridge, X, y[:, np.newaxis], scoring='neg_mean_squared_error', cv=kfold)\n", + "\n", + " # cross_val_score return an array containing the estimated negative mse for every fold.\n", + " # we have to the the mean of every array in order to get an estimate of the mse of the model\n", + " estimated_mse_sklearn[i] = np.mean(-estimated_mse_folds)\n", + "\n", + " i += 1\n", + "\n", + "## Plot and compare the slightly different ways to perform cross-validation\n", + "\n", + "plt.figure()\n", + "\n", + 
"plt.plot(np.log10(lambdas), estimated_mse_sklearn, label = 'cross_val_score')\n", + "plt.plot(np.log10(lambdas), estimated_mse_KFold, 'r--', label = 'KFold')\n", + "\n", + "plt.xlabel('log10(lambda)')\n", + "plt.ylabel('mse')\n", + "\n", + "plt.legend()\n", + "\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "e7d12ef0", + "metadata": { + "editable": true + }, + "source": [ + "## More examples on bootstrap and cross-validation and errors" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "47f6ae18", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# Common imports\n", + "import os\n", + "import numpy as np\n", + "import pandas as pd\n", + "import matplotlib.pyplot as plt\n", + "from sklearn.linear_model import LinearRegression, Ridge, Lasso\n", + "from sklearn.model_selection import train_test_split\n", + "from sklearn.utils import resample\n", + "from sklearn.metrics import mean_squared_error\n", + "# Where to save the figures and data files\n", + "PROJECT_ROOT_DIR = \"Results\"\n", + "FIGURE_ID = \"Results/FigureFiles\"\n", + "DATA_ID = \"DataFiles/\"\n", + "\n", + "if not os.path.exists(PROJECT_ROOT_DIR):\n", + " os.mkdir(PROJECT_ROOT_DIR)\n", + "\n", + "if not os.path.exists(FIGURE_ID):\n", + " os.makedirs(FIGURE_ID)\n", + "\n", + "if not os.path.exists(DATA_ID):\n", + " os.makedirs(DATA_ID)\n", + "\n", + "def image_path(fig_id):\n", + " return os.path.join(FIGURE_ID, fig_id)\n", + "\n", + "def data_path(dat_id):\n", + " return os.path.join(DATA_ID, dat_id)\n", + "\n", + "def save_fig(fig_id):\n", + " plt.savefig(image_path(fig_id) + \".png\", format='png')\n", + "\n", + "infile = open(data_path(\"EoS.csv\"),'r')\n", + "\n", + "# Read the EoS data as csv file and organize the data into two arrays with density and energies\n", + "EoS = pd.read_csv(infile, names=('Density', 'Energy'))\n", + "EoS['Energy'] = pd.to_numeric(EoS['Energy'], errors='coerce')\n", + "EoS = EoS.dropna()\n", + "Energies = EoS['Energy']\n", + "Density = EoS['Density']\n", + "# The design matrix now as function of various polytrops\n", + "\n", + "Maxpolydegree = 30\n", + "X = np.zeros((len(Density),Maxpolydegree))\n", + "X[:,0] = 1.0\n", + "testerror = np.zeros(Maxpolydegree)\n", + "trainingerror = np.zeros(Maxpolydegree)\n", + "polynomial = np.zeros(Maxpolydegree)\n", + "\n", + "trials = 100\n", + "for polydegree in range(1, Maxpolydegree):\n", + " polynomial[polydegree] = polydegree\n", + " for degree in range(polydegree):\n", + " X[:,degree] = Density**(degree/3.0)\n", + "\n", + "# loop over trials in order to estimate the expectation value of the MSE\n", + " testerror[polydegree] = 0.0\n", + " trainingerror[polydegree] = 0.0\n", + " for samples in range(trials):\n", + " x_train, x_test, y_train, y_test = train_test_split(X, Energies, test_size=0.2)\n", + " model = LinearRegression(fit_intercept=False).fit(x_train, y_train)\n", + " ypred = model.predict(x_train)\n", + " ytilde = model.predict(x_test)\n", + " testerror[polydegree] += mean_squared_error(y_test, ytilde)\n", + " trainingerror[polydegree] += mean_squared_error(y_train, ypred) \n", + "\n", + " testerror[polydegree] /= trials\n", + " trainingerror[polydegree] /= trials\n", + " print(\"Degree of polynomial: %3d\"% polynomial[polydegree])\n", + " print(\"Mean squared error on training data: %.8f\" % trainingerror[polydegree])\n", + " print(\"Mean squared error on test data: %.8f\" % testerror[polydegree])\n", + "\n", + "plt.plot(polynomial, np.log10(trainingerror), 
label='Training Error')\n", + "plt.plot(polynomial, np.log10(testerror), label='Test Error')\n", + "plt.xlabel('Polynomial degree')\n", + "plt.ylabel('log10[MSE]')\n", + "plt.legend()\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "9c1d4754", + "metadata": { + "editable": true + }, + "source": [ + "Note that we kept the intercept column in the fitting here. This means that we need to set the **intercept** in the call to the **Scikit-Learn** function as **False**. Alternatively, we could have set up the design matrix $X$ without the first column of ones." + ] + }, + { + "cell_type": "markdown", + "id": "b698ac66", + "metadata": { + "editable": true + }, + "source": [ + "## The same example but now with cross-validation\n", + "\n", + "In this example we keep the intercept column again but add cross-validation in order to estimate the best possible value of the means squared error." + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "0a2409b0", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# Common imports\n", + "import os\n", + "import numpy as np\n", + "import pandas as pd\n", + "import matplotlib.pyplot as plt\n", + "from sklearn.linear_model import LinearRegression, Ridge, Lasso\n", + "from sklearn.metrics import mean_squared_error\n", + "from sklearn.model_selection import KFold\n", + "from sklearn.model_selection import cross_val_score\n", + "\n", + "\n", + "# Where to save the figures and data files\n", + "PROJECT_ROOT_DIR = \"Results\"\n", + "FIGURE_ID = \"Results/FigureFiles\"\n", + "DATA_ID = \"DataFiles/\"\n", + "\n", + "if not os.path.exists(PROJECT_ROOT_DIR):\n", + " os.mkdir(PROJECT_ROOT_DIR)\n", + "\n", + "if not os.path.exists(FIGURE_ID):\n", + " os.makedirs(FIGURE_ID)\n", + "\n", + "if not os.path.exists(DATA_ID):\n", + " os.makedirs(DATA_ID)\n", + "\n", + "def image_path(fig_id):\n", + " return os.path.join(FIGURE_ID, fig_id)\n", + "\n", + "def data_path(dat_id):\n", + " return os.path.join(DATA_ID, dat_id)\n", + "\n", + "def save_fig(fig_id):\n", + " plt.savefig(image_path(fig_id) + \".png\", format='png')\n", + "\n", + "infile = open(data_path(\"EoS.csv\"),'r')\n", + "\n", + "# Read the EoS data as csv file and organize the data into two arrays with density and energies\n", + "EoS = pd.read_csv(infile, names=('Density', 'Energy'))\n", + "EoS['Energy'] = pd.to_numeric(EoS['Energy'], errors='coerce')\n", + "EoS = EoS.dropna()\n", + "Energies = EoS['Energy']\n", + "Density = EoS['Density']\n", + "# The design matrix now as function of various polytrops\n", + "\n", + "Maxpolydegree = 30\n", + "X = np.zeros((len(Density),Maxpolydegree))\n", + "X[:,0] = 1.0\n", + "estimated_mse_sklearn = np.zeros(Maxpolydegree)\n", + "polynomial = np.zeros(Maxpolydegree)\n", + "k =5\n", + "kfold = KFold(n_splits = k)\n", + "\n", + "for polydegree in range(1, Maxpolydegree):\n", + " polynomial[polydegree] = polydegree\n", + " for degree in range(polydegree):\n", + " X[:,degree] = Density**(degree/3.0)\n", + " OLS = LinearRegression(fit_intercept=False)\n", + "# loop over trials in order to estimate the expectation value of the MSE\n", + " estimated_mse_folds = cross_val_score(OLS, X, Energies, scoring='neg_mean_squared_error', cv=kfold)\n", + "#[:, np.newaxis]\n", + " estimated_mse_sklearn[polydegree] = np.mean(-estimated_mse_folds)\n", + "\n", + "plt.plot(polynomial, np.log10(estimated_mse_sklearn), label='Test Error')\n", + "plt.xlabel('Polynomial degree')\n", + "plt.ylabel('log10[MSE]')\n", + 
"plt.legend()\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "56f130b5", + "metadata": { + "editable": true + }, + "source": [ + "## Material for the lab sessions\n", + "\n", + "This week we will discuss during the first hour of each lab session\n", + "some technicalities related to the project and methods for updating\n", + "the learning like ADAgrad, RMSprop and ADAM. As teaching material, see\n", + "the jupyter-notebook from week 37 (September 12-16).\n", + "\n", + "For the lab session, the following video on cross validation (from 2024), could be helpful, see \n", + "\n", + "See also video on ADAgrad, RMSprop and ADAM (material from last week not covered during lecture) at " + ] + } + ], + "metadata": {}, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/doc/LectureNotes/week39.ipynb b/doc/LectureNotes/week39.ipynb new file mode 100644 index 000000000..1f411fe62 --- /dev/null +++ b/doc/LectureNotes/week39.ipynb @@ -0,0 +1,2430 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "3a65fcc4", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "284ac98b", + "metadata": { + "editable": true + }, + "source": [ + "# Week 39: Resampling methods and logistic regression\n", + "**Morten Hjorth-Jensen**, Department of Physics, University of Oslo\n", + "\n", + "Date: **Week 39**" + ] + }, + { + "cell_type": "markdown", + "id": "582e0b32", + "metadata": { + "editable": true + }, + "source": [ + "## Plan for week 39, September 22-26, 2025\n", + "\n", + "**Material for the lecture on Monday September 22.**\n", + "\n", + "1. Resampling techniques, Bootstrap and cross validation and bias-variance tradeoff\n", + "\n", + "2. Logistic regression, our first classification encounter and a stepping stone towards neural networks\n", + "\n", + "3. [Video of lecture](https://youtu.be/OVouJyhoksY)\n", + "\n", + "4. [Whiteboard notes](https://github.com/CompPhysics/MachineLearning/blob/master/doc/HandWrittenNotes/2024/FYSSTKweek39.pdf)" + ] + }, + { + "cell_type": "markdown", + "id": "08ea52de", + "metadata": { + "editable": true + }, + "source": [ + "## Readings and Videos, resampling methods\n", + "1. Raschka et al, pages 175-192\n", + "\n", + "2. Hastie et al Chapter 7, here we recommend 7.1-7.5 and 7.10 (cross-validation) and 7.11 (bootstrap). See .\n", + "\n", + "3. [Video on bias-variance tradeoff](https://www.youtube.com/watch?v=EuBBz3bI-aA)\n", + "\n", + "4. [Video on Bootstrapping](https://www.youtube.com/watch?v=Xz0x-8-cgaQ)\n", + "\n", + "5. [Video on cross validation](https://www.youtube.com/watch?v=fSytzGwwBVw)" + ] + }, + { + "cell_type": "markdown", + "id": "a8d5878f", + "metadata": { + "editable": true + }, + "source": [ + "## Readings and Videos, logistic regression\n", + "1. Hastie et al 4.1, 4.2 and 4.3 on logistic regression\n", + "\n", + "2. Raschka et al, pages 53-76 on Logistic regression and pages 37-52 on gradient optimization\n", + "\n", + "3. [Video on Logistic regression](https://www.youtube.com/watch?v=C5268D9t9Ak)\n", + "\n", + "4. [Yet another video on logistic regression](https://www.youtube.com/watch?v=yIYKR4sgzI8)" + ] + }, + { + "cell_type": "markdown", + "id": "e93210f9", + "metadata": { + "editable": true + }, + "source": [ + "## Lab sessions week 39\n", + "\n", + "**Material for the lab sessions on Tuesday and Wednesday.**\n", + "\n", + "1. Discussions on how to structure your report for the first project\n", + "\n", + "2. 
Exercise for week 39 on how to write the abstract and the introduction of the report and how to include references. \n", + "\n", + "3. Work on project 1, in particular resampling methods like cross-validation and bootstrap. **For more discussions of project 1, chapter 5 of Goodfellow et al is a good read, in particular sections 5.1-5.5 and 5.7-5.11**.\n", + "\n", + "4. [Video on how to write scientific reports recorded during one of the lab sessions](https://youtu.be/tVW1ZDmZnwM)\n", + "\n", + "5. A general guideline can be found at ." + ] + }, + { + "cell_type": "markdown", + "id": "c319a504", + "metadata": { + "editable": true + }, + "source": [ + "## Lecture material" + ] + }, + { + "cell_type": "markdown", + "id": "5f29284a", + "metadata": { + "editable": true + }, + "source": [ + "## Resampling methods\n", + "Resampling methods are an indispensable tool in modern\n", + "statistics. They involve repeatedly drawing samples from a training\n", + "set and refitting a model of interest on each sample in order to\n", + "obtain additional information about the fitted model. For example, in\n", + "order to estimate the variability of a linear regression fit, we can\n", + "repeatedly draw different samples from the training data, fit a linear\n", + "regression to each new sample, and then examine the extent to which\n", + "the resulting fits differ. Such an approach may allow us to obtain\n", + "information that would not be available from fitting the model only\n", + "once using the original training sample.\n", + "\n", + "Two resampling methods are often used in Machine Learning analyses,\n", + "1. The **bootstrap method**\n", + "\n", + "2. and **Cross-Validation**\n", + "\n", + "In addition there are several other methods such as the Jackknife and the Blocking methods. This week will repeat some of the elements of the bootstrap method and focus more on cross-validation." + ] + }, + { + "cell_type": "markdown", + "id": "4a774608", + "metadata": { + "editable": true + }, + "source": [ + "## Resampling approaches can be computationally expensive\n", + "\n", + "Resampling approaches can be computationally expensive, because they\n", + "involve fitting the same statistical method multiple times using\n", + "different subsets of the training data. However, due to recent\n", + "advances in computing power, the computational requirements of\n", + "resampling methods generally are not prohibitive. In this chapter, we\n", + "discuss two of the most commonly used resampling methods,\n", + "cross-validation and the bootstrap. Both methods are important tools\n", + "in the practical application of many statistical learning\n", + "procedures. For example, cross-validation can be used to estimate the\n", + "test error associated with a given statistical learning method in\n", + "order to evaluate its performance, or to select the appropriate level\n", + "of flexibility. The process of evaluating a model’s performance is\n", + "known as model assessment, whereas the process of selecting the proper\n", + "level of flexibility for a model is known as model selection. The\n", + "bootstrap is widely used." + ] + }, + { + "cell_type": "markdown", + "id": "5e62c381", + "metadata": { + "editable": true + }, + "source": [ + "## Why resampling methods ?\n", + "**Statistical analysis.**\n", + "\n", + "* Our simulations can be treated as *computer experiments*. 
This is particularly the case for Monte Carlo methods which are widely used in statistical analyses.\n", + "\n", + "* The results can be analysed with the same statistical tools as we would use when analysing experimental data.\n", + "\n", + "* As in all experiments, we are looking for expectation values and an estimate of how accurate they are, i.e., possible sources for errors." + ] + }, + { + "cell_type": "markdown", + "id": "96896342", + "metadata": { + "editable": true + }, + "source": [ + "## Statistical analysis\n", + "\n", + "* As in other experiments, many numerical experiments have two classes of errors:\n", + "\n", + " * Statistical errors\n", + "\n", + " * Systematical errors\n", + "\n", + "* Statistical errors can be estimated using standard tools from statistics\n", + "\n", + "* Systematical errors are method specific and must be treated differently from case to case." + ] + }, + { + "cell_type": "markdown", + "id": "d5318be7", + "metadata": { + "editable": true + }, + "source": [ + "## Resampling methods\n", + "\n", + "With all these analytical equations for both the OLS and Ridge\n", + "regression, we will now outline how to assess a given model. This will\n", + "lead to a discussion of the so-called bias-variance tradeoff (see\n", + "below) and so-called resampling methods.\n", + "\n", + "One of the quantities we have discussed as a way to measure errors is\n", + "the mean-squared error (MSE), mainly used for fitting of continuous\n", + "functions. Another choice is the absolute error.\n", + "\n", + "In the discussions below we will focus on the MSE and in particular since we will split the data into test and training data,\n", + "we discuss the\n", + "1. prediction error or simply the **test error** $\\mathrm{Err_{Test}}$, where we have a fixed training set and the test error is the MSE arising from the data reserved for testing. We discuss also the \n", + "\n", + "2. training error $\\mathrm{Err_{Train}}$, which is the average loss over the training data.\n", + "\n", + "As our model becomes more and more complex, more of the training data tends to used. The training may thence adapt to more complicated structures in the data. This may lead to a decrease in the bias (see below for code example) and a slight increase of the variance for the test error.\n", + "For a certain level of complexity the test error will reach minimum, before starting to increase again. The\n", + "training error reaches a saturation." + ] + }, + { + "cell_type": "markdown", + "id": "7597015e", + "metadata": { + "editable": true + }, + "source": [ + "## Resampling methods: Bootstrap\n", + "Bootstrapping is a [non-parametric approach](https://en.wikipedia.org/wiki/Nonparametric_statistics) to statistical inference\n", + "that substitutes computation for more traditional distributional\n", + "assumptions and asymptotic results. Bootstrapping offers a number of\n", + "advantages: \n", + "1. The bootstrap is quite general, although there are some cases in which it fails. \n", + "\n", + "2. Because it does not require distributional assumptions (such as normally distributed errors), the bootstrap can provide more accurate inferences when the data are not well behaved or when the sample size is small. \n", + "\n", + "3. It is possible to apply the bootstrap to statistics with sampling distributions that are difficult to derive, even asymptotically. \n", + "\n", + "4. 
It is relatively simple to apply the bootstrap to complex data-collection plans (such as stratified and clustered samples).\n", + "\n", + "The textbook by [Davison on the Bootstrap Methods and their Applications](https://www.cambridge.org/core/books/bootstrap-methods-and-their-application/ED2FD043579F27952363566DC09CBD6A) provides many more insights and proofs. In this course we will take a more practical approach and use the results and theorems provided in the literature. For those interested in reading more about the bootstrap methods, we recommend the above text and the one by [Efron and Tibshirani](https://www.routledge.com/An-Introduction-to-the-Bootstrap/Efron-Tibshirani/p/book/9780412042317)." + ] + }, + { + "cell_type": "markdown", + "id": "fbf69230", + "metadata": { + "editable": true + }, + "source": [ + "## The bias-variance tradeoff\n", + "\n", + "We will discuss the bias-variance tradeoff in the context of\n", + "continuous predictions such as regression. However, many of the\n", + "intuitions and ideas discussed here also carry over to classification\n", + "tasks. Consider a dataset $\\mathcal{D}$ consisting of the data\n", + "$\\mathbf{X}_\\mathcal{D}=\\{(y_j, \\boldsymbol{x}_j), j=0\\ldots n-1\\}$. \n", + "\n", + "Let us assume that the true data is generated from a noisy model" + ] + }, + { + "cell_type": "markdown", + "id": "358f7872", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{y}=f(\\boldsymbol{x}) + \\boldsymbol{\\epsilon}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "6a4aceef", + "metadata": { + "editable": true + }, + "source": [ + "where $\\epsilon$ is normally distributed with mean zero and standard deviation $\\sigma^2$.\n", + "\n", + "In our derivation of the ordinary least squares method we defined then\n", + "an approximation to the function $f$ in terms of the parameters\n", + "$\\boldsymbol{\\theta}$ and the design matrix $\\boldsymbol{X}$ which embody our model,\n", + "that is $\\boldsymbol{\\tilde{y}}=\\boldsymbol{X}\\boldsymbol{\\theta}$. \n", + "\n", + "Thereafter we found the parameters $\\boldsymbol{\\theta}$ by optimizing the means squared error via the so-called cost function" + ] + }, + { + "cell_type": "markdown", + "id": "84416669", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C(\\boldsymbol{X},\\boldsymbol{\\theta}) =\\frac{1}{n}\\sum_{i=0}^{n-1}(y_i-\\tilde{y}_i)^2=\\mathbb{E}\\left[(\\boldsymbol{y}-\\boldsymbol{\\tilde{y}})^2\\right].\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "0036358e", + "metadata": { + "editable": true + }, + "source": [ + "We can rewrite this as" + ] + }, + { + "cell_type": "markdown", + "id": "d712d2d7", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathbb{E}\\left[(\\boldsymbol{y}-\\boldsymbol{\\tilde{y}})^2\\right]=\\frac{1}{n}\\sum_i(f_i-\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right])^2+\\frac{1}{n}\\sum_i(\\tilde{y}_i-\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right])^2+\\sigma^2.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b71e48ac", + "metadata": { + "editable": true + }, + "source": [ + "The three terms represent the square of the bias of the learning\n", + "method, which can be thought of as the error caused by the simplifying\n", + "assumptions built into the method. 
The second term represents the\n", + "variance of the chosen model and finally the last terms is variance of\n", + "the error $\\boldsymbol{\\epsilon}$.\n", + "\n", + "To derive this equation, we need to recall that the variance of $\\boldsymbol{y}$ and $\\boldsymbol{\\epsilon}$ are both equal to $\\sigma^2$. The mean value of $\\boldsymbol{\\epsilon}$ is by definition equal to zero. Furthermore, the function $f$ is not a stochastics variable, idem for $\\boldsymbol{\\tilde{y}}$.\n", + "We use a more compact notation in terms of the expectation value" + ] + }, + { + "cell_type": "markdown", + "id": "c78ceafe", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathbb{E}\\left[(\\boldsymbol{y}-\\boldsymbol{\\tilde{y}})^2\\right]=\\mathbb{E}\\left[(\\boldsymbol{f}+\\boldsymbol{\\epsilon}-\\boldsymbol{\\tilde{y}})^2\\right],\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "74aae5bc", + "metadata": { + "editable": true + }, + "source": [ + "and adding and subtracting $\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right]$ we get" + ] + }, + { + "cell_type": "markdown", + "id": "1f2313f1", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathbb{E}\\left[(\\boldsymbol{y}-\\boldsymbol{\\tilde{y}})^2\\right]=\\mathbb{E}\\left[(\\boldsymbol{f}+\\boldsymbol{\\epsilon}-\\boldsymbol{\\tilde{y}}+\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right]-\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right])^2\\right],\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "a29b174f", + "metadata": { + "editable": true + }, + "source": [ + "which, using the abovementioned expectation values can be rewritten as" + ] + }, + { + "cell_type": "markdown", + "id": "3bc08002", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathbb{E}\\left[(\\boldsymbol{y}-\\boldsymbol{\\tilde{y}})^2\\right]=\\mathbb{E}\\left[(\\boldsymbol{y}-\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right])^2\\right]+\\mathrm{Var}\\left[\\boldsymbol{\\tilde{y}}\\right]+\\sigma^2,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "7b7d24e8", + "metadata": { + "editable": true + }, + "source": [ + "that is the rewriting in terms of the so-called bias, the variance of the model $\\boldsymbol{\\tilde{y}}$ and the variance of $\\boldsymbol{\\epsilon}$.\n", + "\n", + "**Note that in order to derive these equations we have assumed we can replace the unknown function $\\boldsymbol{f}$ with the target/output data $\\boldsymbol{y}$.**" + ] + }, + { + "cell_type": "markdown", + "id": "f2118d82", + "metadata": { + "editable": true + }, + "source": [ + "## A way to Read the Bias-Variance Tradeoff\n", + "\n", + "\n", + "\n", + "\n", + "

    Figure 1: A way to read the bias-variance tradeoff.

    \n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "baf08f8a", + "metadata": { + "editable": true + }, + "source": [ + "## Understanding what happens" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "1bd7ac4e", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "%matplotlib inline\n", + "\n", + "import matplotlib.pyplot as plt\n", + "import numpy as np\n", + "from sklearn.linear_model import LinearRegression, Ridge, Lasso\n", + "from sklearn.preprocessing import PolynomialFeatures\n", + "from sklearn.model_selection import train_test_split\n", + "from sklearn.pipeline import make_pipeline\n", + "from sklearn.utils import resample\n", + "\n", + "np.random.seed(2018)\n", + "\n", + "n = 40\n", + "n_boostraps = 100\n", + "maxdegree = 14\n", + "\n", + "\n", + "# Make data set.\n", + "x = np.linspace(-3, 3, n).reshape(-1, 1)\n", + "y = np.exp(-x**2) + 1.5 * np.exp(-(x-2)**2)+ np.random.normal(0, 0.1, x.shape)\n", + "error = np.zeros(maxdegree)\n", + "bias = np.zeros(maxdegree)\n", + "variance = np.zeros(maxdegree)\n", + "polydegree = np.zeros(maxdegree)\n", + "x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)\n", + "\n", + "for degree in range(maxdegree):\n", + " model = make_pipeline(PolynomialFeatures(degree=degree), LinearRegression(fit_intercept=False))\n", + " y_pred = np.empty((y_test.shape[0], n_boostraps))\n", + " for i in range(n_boostraps):\n", + " x_, y_ = resample(x_train, y_train)\n", + " y_pred[:, i] = model.fit(x_, y_).predict(x_test).ravel()\n", + "\n", + " polydegree[degree] = degree\n", + " error[degree] = np.mean( np.mean((y_test - y_pred)**2, axis=1, keepdims=True) )\n", + " bias[degree] = np.mean( (y_test - np.mean(y_pred, axis=1, keepdims=True))**2 )\n", + " variance[degree] = np.mean( np.var(y_pred, axis=1, keepdims=True) )\n", + " print('Polynomial degree:', degree)\n", + " print('Error:', error[degree])\n", + " print('Bias^2:', bias[degree])\n", + " print('Var:', variance[degree])\n", + " print('{} >= {} + {} = {}'.format(error[degree], bias[degree], variance[degree], bias[degree]+variance[degree]))\n", + "\n", + "plt.plot(polydegree, error, label='Error')\n", + "plt.plot(polydegree, bias, label='bias')\n", + "plt.plot(polydegree, variance, label='Variance')\n", + "plt.legend()\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "3edb75ab", + "metadata": { + "editable": true + }, + "source": [ + "## Summing up\n", + "\n", + "The bias-variance tradeoff summarizes the fundamental tension in\n", + "machine learning, particularly supervised learning, between the\n", + "complexity of a model and the amount of training data needed to train\n", + "it. Since data is often limited, in practice it is often useful to\n", + "use a less-complex model with higher bias, that is a model whose asymptotic\n", + "performance is worse than another model because it is easier to\n", + "train and less sensitive to sampling noise arising from having a\n", + "finite-sized training dataset (smaller variance). \n", + "\n", + "The above equations tell us that in\n", + "order to minimize the expected test error, we need to select a\n", + "statistical learning method that simultaneously achieves low variance\n", + "and low bias. Note that variance is inherently a nonnegative quantity,\n", + "and squared bias is also nonnegative. 
Hence, we see that the expected\n", + "test MSE can never lie below $Var(\\epsilon)$, the irreducible error.\n", + "\n", + "What do we mean by the variance and bias of a statistical learning\n", + "method? The variance refers to the amount by which our model would change if we\n", + "estimated it using a different training data set. Since the training\n", + "data are used to fit the statistical learning method, different\n", + "training data sets will result in a different estimate. But ideally the\n", + "estimate for our model should not vary too much between training\n", + "sets. However, if a method has high variance then small changes in\n", + "the training data can result in large changes in the model. In general, more\n", + "flexible statistical methods have higher variance.\n", + "\n", + "You may also find this recent [article](https://www.pnas.org/content/116/32/15849) of interest." + ] + }, + { + "cell_type": "markdown", + "id": "88ce8a48", + "metadata": { + "editable": true + }, + "source": [ + "## Another Example from Scikit-Learn's Repository\n", + "\n", + "This example demonstrates the problems of underfitting and overfitting and\n", + "how we can use linear regression with polynomial features to approximate\n", + "nonlinear functions. The plot shows the function that we want to approximate,\n", + "which is a part of the cosine function. In addition, the samples from the\n", + "real function and the approximations of different models are displayed. The\n", + "models have polynomial features of different degrees. We can see that a\n", + "linear function (polynomial with degree 1) is not sufficient to fit the\n", + "training samples. This is called **underfitting**. A polynomial of degree 4\n", + "approximates the true function almost perfectly. However, for higher degrees\n", + "the model will **overfit** the training data, i.e. it learns the noise of the\n", + "training data.\n", + "We evaluate quantitatively overfitting and underfitting by using\n", + "cross-validation. We calculate the mean squared error (MSE) on the validation\n", + "set, the higher, the less likely the model generalizes correctly from the\n", + "training data." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "40385eb8", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "\n", + "\n", + "#print(__doc__)\n", + "\n", + "import numpy as np\n", + "import matplotlib.pyplot as plt\n", + "from sklearn.pipeline import Pipeline\n", + "from sklearn.preprocessing import PolynomialFeatures\n", + "from sklearn.linear_model import LinearRegression\n", + "from sklearn.model_selection import cross_val_score\n", + "\n", + "\n", + "def true_fun(X):\n", + " return np.cos(1.5 * np.pi * X)\n", + "\n", + "np.random.seed(0)\n", + "\n", + "n_samples = 30\n", + "degrees = [1, 4, 15]\n", + "\n", + "X = np.sort(np.random.rand(n_samples))\n", + "y = true_fun(X) + np.random.randn(n_samples) * 0.1\n", + "\n", + "plt.figure(figsize=(14, 5))\n", + "for i in range(len(degrees)):\n", + " ax = plt.subplot(1, len(degrees), i + 1)\n", + " plt.setp(ax, xticks=(), yticks=())\n", + "\n", + " polynomial_features = PolynomialFeatures(degree=degrees[i],\n", + " include_bias=False)\n", + " linear_regression = LinearRegression()\n", + " pipeline = Pipeline([(\"polynomial_features\", polynomial_features),\n", + " (\"linear_regression\", linear_regression)])\n", + " pipeline.fit(X[:, np.newaxis], y)\n", + "\n", + " # Evaluate the models using crossvalidation\n", + " scores = cross_val_score(pipeline, X[:, np.newaxis], y,\n", + " scoring=\"neg_mean_squared_error\", cv=10)\n", + "\n", + " X_test = np.linspace(0, 1, 100)\n", + " plt.plot(X_test, pipeline.predict(X_test[:, np.newaxis]), label=\"Model\")\n", + " plt.plot(X_test, true_fun(X_test), label=\"True function\")\n", + " plt.scatter(X, y, edgecolor='b', s=20, label=\"Samples\")\n", + " plt.xlabel(\"x\")\n", + " plt.ylabel(\"y\")\n", + " plt.xlim((0, 1))\n", + " plt.ylim((-2, 2))\n", + " plt.legend(loc=\"best\")\n", + " plt.title(\"Degree {}\\nMSE = {:.2e}(+/- {:.2e})\".format(\n", + " degrees[i], -scores.mean(), scores.std()))\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "a0c0d4df", + "metadata": { + "editable": true + }, + "source": [ + "## Various steps in cross-validation\n", + "\n", + "When the repetitive splitting of the data set is done randomly,\n", + "samples may accidently end up in a fast majority of the splits in\n", + "either training or test set. Such samples may have an unbalanced\n", + "influence on either model building or prediction evaluation. To avoid\n", + "this $k$-fold cross-validation structures the data splitting. The\n", + "samples are divided into $k$ more or less equally sized exhaustive and\n", + "mutually exclusive subsets. In turn (at each split) one of these\n", + "subsets plays the role of the test set while the union of the\n", + "remaining subsets constitutes the training set. Such a splitting\n", + "warrants a balanced representation of each sample in both training and\n", + "test set over the splits. Still the division into the $k$ subsets\n", + "involves a degree of randomness. This may be fully excluded when\n", + "choosing $k=n$. This particular case is referred to as leave-one-out\n", + "cross-validation (LOOCV)." + ] + }, + { + "cell_type": "markdown", + "id": "68d3e653", + "metadata": { + "editable": true + }, + "source": [ + "## Cross-validation in brief\n", + "\n", + "For the various values of $k$\n", + "\n", + "1. shuffle the dataset randomly.\n", + "\n", + "2. Split the dataset into $k$ groups.\n", + "\n", + "3. For each unique group:\n", + "\n", + "a. 
Decide which group to use as set for test data\n", + "\n", + "b. Take the remaining groups as a training data set\n", + "\n", + "c. Fit a model on the training set and evaluate it on the test set\n", + "\n", + "d. Retain the evaluation score and discard the model\n", + "\n", + "5. Summarize the model using the sample of model evaluation scores" + ] + }, + { + "cell_type": "markdown", + "id": "7f7a6350", + "metadata": { + "editable": true + }, + "source": [ + "## Code Example for Cross-validation and $k$-fold Cross-validation\n", + "\n", + "The code here uses Ridge regression with cross-validation (CV) resampling and $k$-fold CV in order to fit a specific polynomial." + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "23eef50b", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import numpy as np\n", + "import matplotlib.pyplot as plt\n", + "from sklearn.model_selection import KFold\n", + "from sklearn.linear_model import Ridge\n", + "from sklearn.model_selection import cross_val_score\n", + "from sklearn.preprocessing import PolynomialFeatures\n", + "\n", + "# A seed just to ensure that the random numbers are the same for every run.\n", + "# Useful for eventual debugging.\n", + "np.random.seed(3155)\n", + "\n", + "# Generate the data.\n", + "nsamples = 100\n", + "x = np.random.randn(nsamples)\n", + "y = 3*x**2 + np.random.randn(nsamples)\n", + "\n", + "## Cross-validation on Ridge regression using KFold only\n", + "\n", + "# Decide degree on polynomial to fit\n", + "poly = PolynomialFeatures(degree = 6)\n", + "\n", + "# Decide which values of lambda to use\n", + "nlambdas = 500\n", + "lambdas = np.logspace(-3, 5, nlambdas)\n", + "\n", + "# Initialize a KFold instance\n", + "k = 5\n", + "kfold = KFold(n_splits = k)\n", + "\n", + "# Perform the cross-validation to estimate MSE\n", + "scores_KFold = np.zeros((nlambdas, k))\n", + "\n", + "i = 0\n", + "for lmb in lambdas:\n", + " ridge = Ridge(alpha = lmb)\n", + " j = 0\n", + " for train_inds, test_inds in kfold.split(x):\n", + " xtrain = x[train_inds]\n", + " ytrain = y[train_inds]\n", + "\n", + " xtest = x[test_inds]\n", + " ytest = y[test_inds]\n", + "\n", + " Xtrain = poly.fit_transform(xtrain[:, np.newaxis])\n", + " ridge.fit(Xtrain, ytrain[:, np.newaxis])\n", + "\n", + " Xtest = poly.fit_transform(xtest[:, np.newaxis])\n", + " ypred = ridge.predict(Xtest)\n", + "\n", + " scores_KFold[i,j] = np.sum((ypred - ytest[:, np.newaxis])**2)/np.size(ypred)\n", + "\n", + " j += 1\n", + " i += 1\n", + "\n", + "\n", + "estimated_mse_KFold = np.mean(scores_KFold, axis = 1)\n", + "\n", + "## Cross-validation using cross_val_score from sklearn along with KFold\n", + "\n", + "# kfold is an instance initialized above as:\n", + "# kfold = KFold(n_splits = k)\n", + "\n", + "estimated_mse_sklearn = np.zeros(nlambdas)\n", + "i = 0\n", + "for lmb in lambdas:\n", + " ridge = Ridge(alpha = lmb)\n", + "\n", + " X = poly.fit_transform(x[:, np.newaxis])\n", + " estimated_mse_folds = cross_val_score(ridge, X, y[:, np.newaxis], scoring='neg_mean_squared_error', cv=kfold)\n", + "\n", + " # cross_val_score return an array containing the estimated negative mse for every fold.\n", + " # we have to the the mean of every array in order to get an estimate of the mse of the model\n", + " estimated_mse_sklearn[i] = np.mean(-estimated_mse_folds)\n", + "\n", + " i += 1\n", + "\n", + "## Plot and compare the slightly different ways to perform cross-validation\n", + "\n", + "plt.figure()\n", + "\n", + 
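# The next plot shows the cross_val_score estimate of the MSE as a function of lambda.
# To compare with the manual KFold loop above, uncomment the line plotting
# estimated_mse_KFold just below; both use the same folds and the same grid of
# lambda values, so the two curves should be very similar.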
"plt.plot(np.log10(lambdas), estimated_mse_sklearn, label = 'cross_val_score')\n", + "#plt.plot(np.log10(lambdas), estimated_mse_KFold, 'r--', label = 'KFold')\n", + "\n", + "plt.xlabel('log10(lambda)')\n", + "plt.ylabel('mse')\n", + "\n", + "plt.legend()\n", + "\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "76662787", + "metadata": { + "editable": true + }, + "source": [ + "## More examples on bootstrap and cross-validation and errors" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "166cd085", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# Common imports\n", + "import os\n", + "import numpy as np\n", + "import pandas as pd\n", + "import matplotlib.pyplot as plt\n", + "from sklearn.linear_model import LinearRegression, Ridge, Lasso\n", + "from sklearn.model_selection import train_test_split\n", + "from sklearn.utils import resample\n", + "from sklearn.metrics import mean_squared_error\n", + "# Where to save the figures and data files\n", + "PROJECT_ROOT_DIR = \"Results\"\n", + "FIGURE_ID = \"Results/FigureFiles\"\n", + "DATA_ID = \"DataFiles/\"\n", + "\n", + "if not os.path.exists(PROJECT_ROOT_DIR):\n", + " os.mkdir(PROJECT_ROOT_DIR)\n", + "\n", + "if not os.path.exists(FIGURE_ID):\n", + " os.makedirs(FIGURE_ID)\n", + "\n", + "if not os.path.exists(DATA_ID):\n", + " os.makedirs(DATA_ID)\n", + "\n", + "def image_path(fig_id):\n", + " return os.path.join(FIGURE_ID, fig_id)\n", + "\n", + "def data_path(dat_id):\n", + " return os.path.join(DATA_ID, dat_id)\n", + "\n", + "def save_fig(fig_id):\n", + " plt.savefig(image_path(fig_id) + \".png\", format='png')\n", + "\n", + "infile = open(data_path(\"EoS.csv\"),'r')\n", + "\n", + "# Read the EoS data as csv file and organize the data into two arrays with density and energies\n", + "EoS = pd.read_csv(infile, names=('Density', 'Energy'))\n", + "EoS['Energy'] = pd.to_numeric(EoS['Energy'], errors='coerce')\n", + "EoS = EoS.dropna()\n", + "Energies = EoS['Energy']\n", + "Density = EoS['Density']\n", + "# The design matrix now as function of various polytrops\n", + "\n", + "Maxpolydegree = 30\n", + "X = np.zeros((len(Density),Maxpolydegree))\n", + "X[:,0] = 1.0\n", + "testerror = np.zeros(Maxpolydegree)\n", + "trainingerror = np.zeros(Maxpolydegree)\n", + "polynomial = np.zeros(Maxpolydegree)\n", + "\n", + "trials = 100\n", + "for polydegree in range(1, Maxpolydegree):\n", + " polynomial[polydegree] = polydegree\n", + " for degree in range(polydegree):\n", + " X[:,degree] = Density**(degree/3.0)\n", + "\n", + "# loop over trials in order to estimate the expectation value of the MSE\n", + " testerror[polydegree] = 0.0\n", + " trainingerror[polydegree] = 0.0\n", + " for samples in range(trials):\n", + " x_train, x_test, y_train, y_test = train_test_split(X, Energies, test_size=0.2)\n", + " model = LinearRegression(fit_intercept=False).fit(x_train, y_train)\n", + " ypred = model.predict(x_train)\n", + " ytilde = model.predict(x_test)\n", + " testerror[polydegree] += mean_squared_error(y_test, ytilde)\n", + " trainingerror[polydegree] += mean_squared_error(y_train, ypred) \n", + "\n", + " testerror[polydegree] /= trials\n", + " trainingerror[polydegree] /= trials\n", + " print(\"Degree of polynomial: %3d\"% polynomial[polydegree])\n", + " print(\"Mean squared error on training data: %.8f\" % trainingerror[polydegree])\n", + " print(\"Mean squared error on test data: %.8f\" % testerror[polydegree])\n", + "\n", + "plt.plot(polynomial, np.log10(trainingerror), 
label='Training Error')\n", + "plt.plot(polynomial, np.log10(testerror), label='Test Error')\n", + "plt.xlabel('Polynomial degree')\n", + "plt.ylabel('log10[MSE]')\n", + "plt.legend()\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "53dc97b8", + "metadata": { + "editable": true + }, + "source": [ + "Note that we kept the intercept column in the fitting here. This means that we need to set the **intercept** in the call to the **Scikit-Learn** function as **False**. Alternatively, we could have set up the design matrix $X$ without the first column of ones." + ] + }, + { + "cell_type": "markdown", + "id": "660084ab", + "metadata": { + "editable": true + }, + "source": [ + "## The same example but now with cross-validation\n", + "\n", + "In this example we keep the intercept column again but add cross-validation in order to estimate the best possible value of the means squared error." + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "5dd5aec2", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# Common imports\n", + "import os\n", + "import numpy as np\n", + "import pandas as pd\n", + "import matplotlib.pyplot as plt\n", + "from sklearn.linear_model import LinearRegression, Ridge, Lasso\n", + "from sklearn.metrics import mean_squared_error\n", + "from sklearn.model_selection import KFold\n", + "from sklearn.model_selection import cross_val_score\n", + "\n", + "\n", + "# Where to save the figures and data files\n", + "PROJECT_ROOT_DIR = \"Results\"\n", + "FIGURE_ID = \"Results/FigureFiles\"\n", + "DATA_ID = \"DataFiles/\"\n", + "\n", + "if not os.path.exists(PROJECT_ROOT_DIR):\n", + " os.mkdir(PROJECT_ROOT_DIR)\n", + "\n", + "if not os.path.exists(FIGURE_ID):\n", + " os.makedirs(FIGURE_ID)\n", + "\n", + "if not os.path.exists(DATA_ID):\n", + " os.makedirs(DATA_ID)\n", + "\n", + "def image_path(fig_id):\n", + " return os.path.join(FIGURE_ID, fig_id)\n", + "\n", + "def data_path(dat_id):\n", + " return os.path.join(DATA_ID, dat_id)\n", + "\n", + "def save_fig(fig_id):\n", + " plt.savefig(image_path(fig_id) + \".png\", format='png')\n", + "\n", + "infile = open(data_path(\"EoS.csv\"),'r')\n", + "\n", + "# Read the EoS data as csv file and organize the data into two arrays with density and energies\n", + "EoS = pd.read_csv(infile, names=('Density', 'Energy'))\n", + "EoS['Energy'] = pd.to_numeric(EoS['Energy'], errors='coerce')\n", + "EoS = EoS.dropna()\n", + "Energies = EoS['Energy']\n", + "Density = EoS['Density']\n", + "# The design matrix now as function of various polytrops\n", + "\n", + "Maxpolydegree = 30\n", + "X = np.zeros((len(Density),Maxpolydegree))\n", + "X[:,0] = 1.0\n", + "estimated_mse_sklearn = np.zeros(Maxpolydegree)\n", + "polynomial = np.zeros(Maxpolydegree)\n", + "k =5\n", + "kfold = KFold(n_splits = k)\n", + "\n", + "for polydegree in range(1, Maxpolydegree):\n", + " polynomial[polydegree] = polydegree\n", + " for degree in range(polydegree):\n", + " X[:,degree] = Density**(degree/3.0)\n", + " OLS = LinearRegression(fit_intercept=False)\n", + "# loop over trials in order to estimate the expectation value of the MSE\n", + " estimated_mse_folds = cross_val_score(OLS, X, Energies, scoring='neg_mean_squared_error', cv=kfold)\n", + "#[:, np.newaxis]\n", + " estimated_mse_sklearn[polydegree] = np.mean(-estimated_mse_folds)\n", + "\n", + "plt.plot(polynomial, np.log10(estimated_mse_sklearn), label='Test Error')\n", + "plt.xlabel('Polynomial degree')\n", + "plt.ylabel('log10[MSE]')\n", + 
"plt.legend()\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "2c1f6d4b", + "metadata": { + "editable": true + }, + "source": [ + "## Logistic Regression\n", + "\n", + "In linear regression our main interest was centered on learning the\n", + "coefficients of a functional fit (say a polynomial) in order to be\n", + "able to predict the response of a continuous variable on some unseen\n", + "data. The fit to the continuous variable $y_i$ is based on some\n", + "independent variables $\\boldsymbol{x}_i$. Linear regression resulted in\n", + "analytical expressions for standard ordinary Least Squares or Ridge\n", + "regression (in terms of matrices to invert) for several quantities,\n", + "ranging from the variance and thereby the confidence intervals of the\n", + "parameters $\\boldsymbol{\\theta}$ to the mean squared error. If we can invert\n", + "the product of the design matrices, linear regression gives then a\n", + "simple recipe for fitting our data." + ] + }, + { + "cell_type": "markdown", + "id": "149e92ec", + "metadata": { + "editable": true + }, + "source": [ + "## Classification problems\n", + "\n", + "Classification problems, however, are concerned with outcomes taking\n", + "the form of discrete variables (i.e. categories). We may for example,\n", + "on the basis of DNA sequencing for a number of patients, like to find\n", + "out which mutations are important for a certain disease; or based on\n", + "scans of various patients' brains, figure out if there is a tumor or\n", + "not; or given a specific physical system, we'd like to identify its\n", + "state, say whether it is an ordered or disordered system (typical\n", + "situation in solid state physics); or classify the status of a\n", + "patient, whether she/he has a stroke or not and many other similar\n", + "situations.\n", + "\n", + "The most common situation we encounter when we apply logistic\n", + "regression is that of two possible outcomes, normally denoted as a\n", + "binary outcome, true or false, positive or negative, success or\n", + "failure etc." + ] + }, + { + "cell_type": "markdown", + "id": "ce85cd3a", + "metadata": { + "editable": true + }, + "source": [ + "## Optimization and Deep learning\n", + "\n", + "Logistic regression will also serve as our stepping stone towards\n", + "neural network algorithms and supervised deep learning. For logistic\n", + "learning, the minimization of the cost function leads to a non-linear\n", + "equation in the parameters $\\boldsymbol{\\theta}$. The optimization of the\n", + "problem calls therefore for minimization algorithms. This forms the\n", + "bottle neck of all machine learning algorithms, namely how to find\n", + "reliable minima of a multi-variable function. This leads us to the\n", + "family of gradient descent methods. The latter are the working horses\n", + "of basically all modern machine learning algorithms.\n", + "\n", + "We note also that many of the topics discussed here on logistic \n", + "regression are also commonly used in modern supervised Deep Learning\n", + "models, as we will see later." + ] + }, + { + "cell_type": "markdown", + "id": "2eb9e687", + "metadata": { + "editable": true + }, + "source": [ + "## Basics\n", + "\n", + "We consider the case where the outputs/targets, also called the\n", + "responses or the outcomes, $y_i$ are discrete and only take values\n", + "from $k=0,\\dots,K-1$ (i.e. 
$K$ classes).\n", + "\n", + "The goal is to predict the\n", + "output classes from the design matrix $\\boldsymbol{X}\\in\\mathbb{R}^{n\\times p}$\n", + "made of $n$ samples, each of which carries $p$ features or predictors. The\n", + "primary goal is to identify the classes to which new unseen samples\n", + "belong.\n", + "\n", + "Let us specialize to the case of two classes only, with outputs\n", + "$y_i=0$ and $y_i=1$. Our outcomes could represent the status of a\n", + "credit card user that could default or not on her/his credit card\n", + "debt. That is" + ] + }, + { + "cell_type": "markdown", + "id": "9b8b7d05", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y_i = \\begin{bmatrix} 0 & \\mathrm{no}\\\\ 1 & \\mathrm{yes} \\end{bmatrix}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "7db50d1a", + "metadata": { + "editable": true + }, + "source": [ + "## Linear classifier\n", + "\n", + "Before moving to the logistic model, let us try to use our linear\n", + "regression model to classify these two outcomes. We could for example\n", + "fit a linear model to the default case if $y_i > 0.5$ and the no\n", + "default case $y_i \\leq 0.5$.\n", + "\n", + "We would then have our \n", + "weighted linear combination, namely" + ] + }, + { + "cell_type": "markdown", + "id": "a78fc346", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation}\n", + "\\boldsymbol{y} = \\boldsymbol{X}^T\\boldsymbol{\\theta} + \\boldsymbol{\\epsilon},\n", + "\\label{_auto1} \\tag{1}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "661d8faf", + "metadata": { + "editable": true + }, + "source": [ + "where $\\boldsymbol{y}$ is a vector representing the possible outcomes, $\\boldsymbol{X}$ is our\n", + "$n\\times p$ design matrix and $\\boldsymbol{\\theta}$ represents our estimators/predictors." + ] + }, + { + "cell_type": "markdown", + "id": "8620ba1b", + "metadata": { + "editable": true + }, + "source": [ + "## Some selected properties\n", + "\n", + "The main problem with our function is that it takes values on the\n", + "entire real axis. In the case of logistic regression, however, the\n", + "labels $y_i$ are discrete variables. A typical example is the credit\n", + "card data discussed below here, where we can set the state of\n", + "defaulting the debt to $y_i=1$ and not to $y_i=0$ for one the persons\n", + "in the data set (see the full example below).\n", + "\n", + "One simple way to get a discrete output is to have sign\n", + "functions that map the output of a linear regressor to values $\\{0,1\\}$,\n", + "$f(s_i)=sign(s_i)=1$ if $s_i\\ge 0$ and 0 if otherwise. \n", + "We will encounter this model in our first demonstration of neural networks.\n", + "\n", + "Historically it is called the **perceptron** model in the machine learning\n", + "literature. This model is extremely simple. However, in many cases it is more\n", + "favorable to use a ``soft\" classifier that outputs\n", + "the probability of a given category. This leads us to the logistic function." + ] + }, + { + "cell_type": "markdown", + "id": "8fdbebd2", + "metadata": { + "editable": true + }, + "source": [ + "## Simple example\n", + "\n", + "The following example on data for coronary heart disease (CHD) as function of age may serve as an illustration. In the code here we read and plot whether a person has had CHD (output = 1) or not (output = 0). This ouput is plotted the person's against age. Clearly, the figure shows that attempting to make a standard linear regression fit may not be very meaningful." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "8dc64aeb", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# Common imports\n", + "import os\n", + "import numpy as np\n", + "import pandas as pd\n", + "import matplotlib.pyplot as plt\n", + "from sklearn.linear_model import LinearRegression, Ridge, Lasso\n", + "from sklearn.model_selection import train_test_split\n", + "from sklearn.utils import resample\n", + "from sklearn.metrics import mean_squared_error\n", + "from IPython.display import display\n", + "from pylab import plt, mpl\n", + "mpl.rcParams['font.family'] = 'serif'\n", + "\n", + "# Where to save the figures and data files\n", + "PROJECT_ROOT_DIR = \"Results\"\n", + "FIGURE_ID = \"Results/FigureFiles\"\n", + "DATA_ID = \"DataFiles/\"\n", + "\n", + "if not os.path.exists(PROJECT_ROOT_DIR):\n", + " os.mkdir(PROJECT_ROOT_DIR)\n", + "\n", + "if not os.path.exists(FIGURE_ID):\n", + " os.makedirs(FIGURE_ID)\n", + "\n", + "if not os.path.exists(DATA_ID):\n", + " os.makedirs(DATA_ID)\n", + "\n", + "def image_path(fig_id):\n", + " return os.path.join(FIGURE_ID, fig_id)\n", + "\n", + "def data_path(dat_id):\n", + " return os.path.join(DATA_ID, dat_id)\n", + "\n", + "def save_fig(fig_id):\n", + " plt.savefig(image_path(fig_id) + \".png\", format='png')\n", + "\n", + "infile = open(data_path(\"chddata.csv\"),'r')\n", + "\n", + "# Read the chd data as csv file and organize the data into arrays with age group, age, and chd\n", + "chd = pd.read_csv(infile, names=('ID', 'Age', 'Agegroup', 'CHD'))\n", + "chd.columns = ['ID', 'Age', 'Agegroup', 'CHD']\n", + "output = chd['CHD']\n", + "age = chd['Age']\n", + "agegroup = chd['Agegroup']\n", + "numberID = chd['ID'] \n", + "display(chd)\n", + "\n", + "plt.scatter(age, output, marker='o')\n", + "plt.axis([18,70.0,-0.1, 1.2])\n", + "plt.xlabel(r'Age')\n", + "plt.ylabel(r'CHD')\n", + "plt.title(r'Age distribution and Coronary heart disease')\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "40385068", + "metadata": { + "editable": true + }, + "source": [ + "## Plotting the mean value for each group\n", + "\n", + "What we could attempt however is to plot the mean value for each group." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "a473659b", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "agegroupmean = np.array([0.1, 0.133, 0.250, 0.333, 0.462, 0.625, 0.765, 0.800])\n", + "group = np.array([1, 2, 3, 4, 5, 6, 7, 8])\n", + "plt.plot(group, agegroupmean, \"r-\")\n", + "plt.axis([0,9,0, 1.0])\n", + "plt.xlabel(r'Age group')\n", + "plt.ylabel(r'CHD mean values')\n", + "plt.title(r'Mean values for each age group')\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "3e2ab512", + "metadata": { + "editable": true + }, + "source": [ + "We are now trying to find a function $f(y\\vert x)$, that is a function which gives us an expected value for the output $y$ with a given input $x$.\n", + "In standard linear regression with a linear dependence on $x$, we would write this in terms of our model" + ] + }, + { + "cell_type": "markdown", + "id": "40361f1b", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "f(y_i\\vert x_i)=\\theta_0+\\theta_1 x_i.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "a1b379fb", + "metadata": { + "editable": true + }, + "source": [ + "This expression implies however that $f(y_i\\vert x_i)$ could take any\n", + "value from minus infinity to plus infinity. If we however let\n", + "$f(y\\vert y)$ be represented by the mean value, the above example\n", + "shows us that we can constrain the function to take values between\n", + "zero and one, that is we have $0 \\le f(y_i\\vert x_i) \\le 1$. Looking\n", + "at our last curve we see also that it has an S-shaped form. This leads\n", + "us to a very popular model for the function $f$, namely the so-called\n", + "Sigmoid function or logistic model. We will consider this function as\n", + "representing the probability for finding a value of $y_i$ with a given\n", + "$x_i$." + ] + }, + { + "cell_type": "markdown", + "id": "bcbf3d2b", + "metadata": { + "editable": true + }, + "source": [ + "## The logistic function\n", + "\n", + "Another widely studied model, is the so-called \n", + "perceptron model, which is an example of a \"hard classification\" model. We\n", + "will encounter this model when we discuss neural networks as\n", + "well. Each datapoint is deterministically assigned to a category (i.e\n", + "$y_i=0$ or $y_i=1$). In many cases, and the coronary heart disease data forms one of many such examples, it is favorable to have a \"soft\"\n", + "classifier that outputs the probability of a given category rather\n", + "than a single value. For example, given $x_i$, the classifier\n", + "outputs the probability of being in a category $k$. Logistic regression\n", + "is the most common example of a so-called soft classifier. In logistic\n", + "regression, the probability that a data point $x_i$\n", + "belongs to a category $y_i=\\{0,1\\}$ is given by the so-called logit function (or Sigmoid) which is meant to represent the likelihood for a given event," + ] + }, + { + "cell_type": "markdown", + "id": "38918f44", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "p(t) = \\frac{1}{1+\\mathrm \\exp{-t}}=\\frac{\\exp{t}}{1+\\mathrm \\exp{t}}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "fd225d0f", + "metadata": { + "editable": true + }, + "source": [ + "Note that $1-p(t)= p(-t)$." 
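As a quick numerical check (a small illustration added here; the function name and the grid of points are arbitrary), we can verify the identity $1-p(t)=p(-t)$ directly with NumPy:

import numpy as np

def logistic(t):
    # the Sigmoid/logistic function p(t) = 1/(1+exp(-t))
    return 1.0 / (1.0 + np.exp(-t))

# arbitrary grid of points for the check
t = np.linspace(-5.0, 5.0, 11)
print(np.allclose(1.0 - logistic(t), logistic(-t)))  # prints True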
+ ] + }, + { + "cell_type": "markdown", + "id": "d340b5c1", + "metadata": { + "editable": true + }, + "source": [ + "## Examples of likelihood functions used in logistic regression and nueral networks\n", + "\n", + "The following code plots the logistic function, the step function and other functions we will encounter from here and on." + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "357d6f03", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "\"\"\"The sigmoid function (or the logistic curve) is a\n", + "function that takes any real number, z, and outputs a number (0,1).\n", + "It is useful in neural networks for assigning weights on a relative scale.\n", + "The value z is the weighted sum of parameters involved in the learning algorithm.\"\"\"\n", + "\n", + "import numpy\n", + "import matplotlib.pyplot as plt\n", + "import math as mt\n", + "\n", + "z = numpy.arange(-5, 5, .1)\n", + "sigma_fn = numpy.vectorize(lambda z: 1/(1+numpy.exp(-z)))\n", + "sigma = sigma_fn(z)\n", + "\n", + "fig = plt.figure()\n", + "ax = fig.add_subplot(111)\n", + "ax.plot(z, sigma)\n", + "ax.set_ylim([-0.1, 1.1])\n", + "ax.set_xlim([-5,5])\n", + "ax.grid(True)\n", + "ax.set_xlabel('z')\n", + "ax.set_title('sigmoid function')\n", + "\n", + "plt.show()\n", + "\n", + "\"\"\"Step Function\"\"\"\n", + "z = numpy.arange(-5, 5, .02)\n", + "step_fn = numpy.vectorize(lambda z: 1.0 if z >= 0.0 else 0.0)\n", + "step = step_fn(z)\n", + "\n", + "fig = plt.figure()\n", + "ax = fig.add_subplot(111)\n", + "ax.plot(z, step)\n", + "ax.set_ylim([-0.5, 1.5])\n", + "ax.set_xlim([-5,5])\n", + "ax.grid(True)\n", + "ax.set_xlabel('z')\n", + "ax.set_title('step function')\n", + "\n", + "plt.show()\n", + "\n", + "\"\"\"tanh Function\"\"\"\n", + "z = numpy.arange(-2*mt.pi, 2*mt.pi, 0.1)\n", + "t = numpy.tanh(z)\n", + "\n", + "fig = plt.figure()\n", + "ax = fig.add_subplot(111)\n", + "ax.plot(z, t)\n", + "ax.set_ylim([-1.0, 1.0])\n", + "ax.set_xlim([-2*mt.pi,2*mt.pi])\n", + "ax.grid(True)\n", + "ax.set_xlabel('z')\n", + "ax.set_title('tanh function')\n", + "\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "8be63821", + "metadata": { + "editable": true + }, + "source": [ + "## Two parameters\n", + "\n", + "We assume now that we have two classes with $y_i$ either $0$ or $1$. Furthermore we assume also that we have only two parameters $\\theta$ in our fitting of the Sigmoid function, that is we define probabilities" + ] + }, + { + "cell_type": "markdown", + "id": "f79d930e", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{align*}\n", + "p(y_i=1|x_i,\\boldsymbol{\\theta}) &= \\frac{\\exp{(\\theta_0+\\theta_1x_i)}}{1+\\exp{(\\theta_0+\\theta_1x_i)}},\\nonumber\\\\\n", + "p(y_i=0|x_i,\\boldsymbol{\\theta}) &= 1 - p(y_i=1|x_i,\\boldsymbol{\\theta}),\n", + "\\end{align*}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "8a758aae", + "metadata": { + "editable": true + }, + "source": [ + "where $\\boldsymbol{\\theta}$ are the weights we wish to extract from data, in our case $\\theta_0$ and $\\theta_1$. 
\n", + "\n", + "Note that we used" + ] + }, + { + "cell_type": "markdown", + "id": "88159170", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "p(y_i=0\\vert x_i, \\boldsymbol{\\theta}) = 1-p(y_i=1\\vert x_i, \\boldsymbol{\\theta}).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "f9972402", + "metadata": { + "editable": true + }, + "source": [ + "## Maximum likelihood\n", + "\n", + "In order to define the total likelihood for all possible outcomes from a \n", + "dataset $\\mathcal{D}=\\{(y_i,x_i)\\}$, with the binary labels\n", + "$y_i\\in\\{0,1\\}$ and where the data points are drawn independently, we use the so-called [Maximum Likelihood Estimation](https://en.wikipedia.org/wiki/Maximum_likelihood_estimation) (MLE) principle. \n", + "We aim thus at maximizing \n", + "the probability of seeing the observed data. We can then approximate the \n", + "likelihood in terms of the product of the individual probabilities of a specific outcome $y_i$, that is" + ] + }, + { + "cell_type": "markdown", + "id": "949524d2", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{align*}\n", + "P(\\mathcal{D}|\\boldsymbol{\\theta})& = \\prod_{i=1}^n \\left[p(y_i=1|x_i,\\boldsymbol{\\theta})\\right]^{y_i}\\left[1-p(y_i=1|x_i,\\boldsymbol{\\theta}))\\right]^{1-y_i}\\nonumber \\\\\n", + "\\end{align*}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "d9a7fded", + "metadata": { + "editable": true + }, + "source": [ + "from which we obtain the log-likelihood and our **cost/loss** function" + ] + }, + { + "cell_type": "markdown", + "id": "4c5f78fb", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathcal{C}(\\boldsymbol{\\theta}) = \\sum_{i=1}^n \\left( y_i\\log{p(y_i=1|x_i,\\boldsymbol{\\theta})} + (1-y_i)\\log\\left[1-p(y_i=1|x_i,\\boldsymbol{\\theta}))\\right]\\right).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "5ccce506", + "metadata": { + "editable": true + }, + "source": [ + "## The cost function rewritten\n", + "\n", + "Reordering the logarithms, we can rewrite the **cost/loss** function as" + ] + }, + { + "cell_type": "markdown", + "id": "bf58bb76", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathcal{C}(\\boldsymbol{\\theta}) = \\sum_{i=1}^n \\left(y_i(\\theta_0+\\theta_1x_i) -\\log{(1+\\exp{(\\theta_0+\\theta_1x_i)})}\\right).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "41543ca6", + "metadata": { + "editable": true + }, + "source": [ + "The maximum likelihood estimator is defined as the set of parameters that maximize the log-likelihood where we maximize with respect to $\\theta$.\n", + "Since the cost (error) function is just the negative log-likelihood, for logistic regression we have that" + ] + }, + { + "cell_type": "markdown", + "id": "e664b57a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathcal{C}(\\boldsymbol{\\theta})=-\\sum_{i=1}^n \\left(y_i(\\theta_0+\\theta_1x_i) -\\log{(1+\\exp{(\\theta_0+\\theta_1x_i)})}\\right).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "eb357503", + "metadata": { + "editable": true + }, + "source": [ + "This equation is known in statistics as the **cross entropy**. Finally, we note that just as in linear regression, \n", + "in practice we often supplement the cross-entropy with additional regularization terms, usually $L_1$ and $L_2$ regularization as we did for Ridge and Lasso regression." 
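As a small sanity check (a minimal sketch with made-up numbers, not part of the original derivation), we can evaluate the log-likelihood in both forms above for a two-parameter model and confirm that they agree:

import numpy as np

# toy data and arbitrary parameter values, used only to check the algebra
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0, 3.0])
y = np.array([0.0, 0.0, 1.0, 0.0, 1.0, 1.0])
theta0, theta1 = -0.5, 1.2

t = theta0 + theta1 * x
p = np.exp(t) / (1.0 + np.exp(t))            # p(y_i = 1 | x_i, theta)

# log-likelihood written with the probabilities directly
loglik_direct = np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))
# the reordered form given above
loglik_reordered = np.sum(y * t - np.log(1.0 + np.exp(t)))

print(np.allclose(loglik_direct, loglik_reordered))   # prints True
print("Cross entropy (cost):", -loglik_direct)        # the cost is the negative log-likelihood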
+ ] + }, + { + "cell_type": "markdown", + "id": "e388ad02", + "metadata": { + "editable": true + }, + "source": [ + "## Minimizing the cross entropy\n", + "\n", + "The cross entropy is a convex function of the weights $\\boldsymbol{\\theta}$ and,\n", + "therefore, any local minimizer is a global minimizer. \n", + "\n", + "Minimizing this\n", + "cost function with respect to the two parameters $\\theta_0$ and $\\theta_1$ we obtain" + ] + }, + { + "cell_type": "markdown", + "id": "1d4f2850", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial \\mathcal{C}(\\boldsymbol{\\theta})}{\\partial \\theta_0} = -\\sum_{i=1}^n \\left(y_i -\\frac{\\exp{(\\theta_0+\\theta_1x_i)}}{1+\\exp{(\\theta_0+\\theta_1x_i)}}\\right),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "68a0c133", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "c942a72b", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial \\mathcal{C}(\\boldsymbol{\\theta})}{\\partial \\theta_1} = -\\sum_{i=1}^n \\left(y_ix_i -x_i\\frac{\\exp{(\\theta_0+\\theta_1x_i)}}{1+\\exp{(\\theta_0+\\theta_1x_i)}}\\right).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "42caf6db", + "metadata": { + "editable": true + }, + "source": [ + "## A more compact expression\n", + "\n", + "Let us now define a vector $\\boldsymbol{y}$ with $n$ elements $y_i$, an\n", + "$n\\times p$ matrix $\\boldsymbol{X}$ which contains the $x_i$ values and a\n", + "vector $\\boldsymbol{p}$ of fitted probabilities $p(y_i\\vert x_i,\\boldsymbol{\\theta})$. We can rewrite in a more compact form the first\n", + "derivative of the cost function as" + ] + }, + { + "cell_type": "markdown", + "id": "22cd94c9", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial \\mathcal{C}(\\boldsymbol{\\theta})}{\\partial \\boldsymbol{\\theta}} = -\\boldsymbol{X}^T\\left(\\boldsymbol{y}-\\boldsymbol{p}\\right).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "d9428067", + "metadata": { + "editable": true + }, + "source": [ + "If we in addition define a diagonal matrix $\\boldsymbol{W}$ with elements \n", + "$p(y_i\\vert x_i,\\boldsymbol{\\theta})(1-p(y_i\\vert x_i,\\boldsymbol{\\theta})$, we can obtain a compact expression of the second derivative as" + ] + }, + { + "cell_type": "markdown", + "id": "29178d5a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial^2 \\mathcal{C}(\\boldsymbol{\\theta})}{\\partial \\boldsymbol{\\theta}\\partial \\boldsymbol{\\theta}^T} = \\boldsymbol{X}^T\\boldsymbol{W}\\boldsymbol{X}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "6b7671ad", + "metadata": { + "editable": true + }, + "source": [ + "## Extending to more predictors\n", + "\n", + "Within a binary classification problem, we can easily expand our model to include multiple predictors. 
Our ratio between likelihoods is then with $p$ predictors" + ] + }, + { + "cell_type": "markdown", + "id": "500b6574", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\log{ \\frac{p(\\boldsymbol{\\theta}\\boldsymbol{x})}{1-p(\\boldsymbol{\\theta}\\boldsymbol{x})}} = \\theta_0+\\theta_1x_1+\\theta_2x_2+\\dots+\\theta_px_p.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "cf0b50ce", + "metadata": { + "editable": true + }, + "source": [ + "Here we defined $\\boldsymbol{x}=[1,x_1,x_2,\\dots,x_p]$ and $\\boldsymbol{\\theta}=[\\theta_0, \\theta_1, \\dots, \\theta_p]$ leading to" + ] + }, + { + "cell_type": "markdown", + "id": "537486ee", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "p(\\boldsymbol{\\theta}\\boldsymbol{x})=\\frac{ \\exp{(\\theta_0+\\theta_1x_1+\\theta_2x_2+\\dots+\\theta_px_p)}}{1+\\exp{(\\theta_0+\\theta_1x_1+\\theta_2x_2+\\dots+\\theta_px_p)}}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "534fb571", + "metadata": { + "editable": true + }, + "source": [ + "## Including more classes\n", + "\n", + "Till now we have mainly focused on two classes, the so-called binary\n", + "system. Suppose we wish to extend to $K$ classes. Let us for the sake\n", + "of simplicity assume we have only two predictors. We have then following model" + ] + }, + { + "cell_type": "markdown", + "id": "fa7ca275", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\log{\\frac{p(C=1\\vert x)}{p(K\\vert x)}} = \\theta_{10}+\\theta_{11}x_1,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "cc765c0e", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "2c43387d", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\log{\\frac{p(C=2\\vert x)}{p(K\\vert x)}} = \\theta_{20}+\\theta_{21}x_1,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "e063f183", + "metadata": { + "editable": true + }, + "source": [ + "and so on till the class $C=K-1$ class" + ] + }, + { + "cell_type": "markdown", + "id": "060fa00c", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\log{\\frac{p(C=K-1\\vert x)}{p(K\\vert x)}} = \\theta_{(K-1)0}+\\theta_{(K-1)1}x_1,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b9034492", + "metadata": { + "editable": true + }, + "source": [ + "and the model is specified in term of $K-1$ so-called log-odds or\n", + "**logit** transformations." + ] + }, + { + "cell_type": "markdown", + "id": "b7fba1fc", + "metadata": { + "editable": true + }, + "source": [ + "## More classes\n", + "\n", + "In our discussion of neural networks we will encounter the above again\n", + "in terms of a slightly modified function, the so-called **Softmax** function.\n", + "\n", + "The softmax function is used in various multiclass classification\n", + "methods, such as multinomial logistic regression (also known as\n", + "softmax regression), multiclass linear discriminant analysis, naive\n", + "Bayes classifiers, and artificial neural networks. 
Specifically, in\n", + "multinomial logistic regression and linear discriminant analysis, the\n", + "input to the function is the result of $K$ distinct linear functions,\n", + "and the predicted probability for the $k$-th class given a sample\n", + "vector $\\boldsymbol{x}$ and a weighting vector $\\boldsymbol{\\theta}$ is (with two\n", + "predictors):" + ] + }, + { + "cell_type": "markdown", + "id": "a8346f86", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "p(C=k\\vert \\mathbf {x} )=\\frac{\\exp{(\\theta_{k0}+\\theta_{k1}x_1)}}{1+\\sum_{l=1}^{K-1}\\exp{(\\theta_{l0}+\\theta_{l1}x_1)}}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b05e18eb", + "metadata": { + "editable": true + }, + "source": [ + "It is easy to extend to more predictors. The final class is" + ] + }, + { + "cell_type": "markdown", + "id": "3bff89b1", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "p(C=K\\vert \\mathbf {x} )=\\frac{1}{1+\\sum_{l=1}^{K-1}\\exp{(\\theta_{l0}+\\theta_{l1}x_1)}},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "e89e832c", + "metadata": { + "editable": true + }, + "source": [ + "and they sum to one. Our earlier discussions were all specialized to\n", + "the case with two classes only. It is easy to see from the above that\n", + "what we derived earlier is compatible with these equations.\n", + "\n", + "To find the optimal parameters we would typically use a gradient\n", + "descent method. Newton's method and gradient descent methods are\n", + "discussed in the material on [optimization\n", + "methods](https://compphysics.github.io/MachineLearning/doc/pub/Splines/html/Splines-bs.html)." + ] + }, + { + "cell_type": "markdown", + "id": "464d4933", + "metadata": { + "editable": true + }, + "source": [ + "## Optimization, the central part of any Machine Learning algortithm\n", + "\n", + "Almost every problem in machine learning and data science starts with\n", + "a dataset $X$, a model $g(\\theta)$, which is a function of the\n", + "parameters $\\theta$ and a cost function $C(X, g(\\theta))$ that allows\n", + "us to judge how well the model $g(\\theta)$ explains the observations\n", + "$X$. The model is fit by finding the values of $\\theta$ that minimize\n", + "the cost function. Ideally we would be able to solve for $\\theta$\n", + "analytically, however this is not possible in general and we must use\n", + "some approximative/numerical method to compute the minimum." + ] + }, + { + "cell_type": "markdown", + "id": "c707d4a0", + "metadata": { + "editable": true + }, + "source": [ + "## Revisiting our Logistic Regression case\n", + "\n", + "In our discussion on Logistic Regression we studied the \n", + "case of\n", + "two classes, with $y_i$ either\n", + "$0$ or $1$. Furthermore we assumed also that we have only two\n", + "parameters $\\theta$ in our fitting, that is we\n", + "defined probabilities" + ] + }, + { + "cell_type": "markdown", + "id": "3f00d244", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{align*}\n", + "p(y_i=1|x_i,\\boldsymbol{\\theta}) &= \\frac{\\exp{(\\theta_0+\\theta_1x_i)}}{1+\\exp{(\\theta_0+\\theta_1x_i)}},\\nonumber\\\\\n", + "p(y_i=0|x_i,\\boldsymbol{\\theta}) &= 1 - p(y_i=1|x_i,\\boldsymbol{\\theta}),\n", + "\\end{align*}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "2d239661", + "metadata": { + "editable": true + }, + "source": [ + "where $\\boldsymbol{\\theta}$ are the weights we wish to extract from data, in our case $\\theta_0$ and $\\theta_1$." 
+ ] + }, + { + "cell_type": "markdown", + "id": "4243778f", + "metadata": { + "editable": true + }, + "source": [ + "## The equations to solve\n", + "\n", + "Our compact equations used a definition of a vector $\\boldsymbol{y}$ with $n$\n", + "elements $y_i$, an $n\\times p$ matrix $\\boldsymbol{X}$ which contains the\n", + "$x_i$ values and a vector $\\boldsymbol{p}$ of fitted probabilities\n", + "$p(y_i\\vert x_i,\\boldsymbol{\\theta})$. We rewrote in a more compact form\n", + "the first derivative of the cost function as" + ] + }, + { + "cell_type": "markdown", + "id": "21ce04bb", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial \\mathcal{C}(\\boldsymbol{\\theta})}{\\partial \\boldsymbol{\\theta}} = -\\boldsymbol{X}^T\\left(\\boldsymbol{y}-\\boldsymbol{p}\\right).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b854153c", + "metadata": { + "editable": true + }, + "source": [ + "If we in addition define a diagonal matrix $\\boldsymbol{W}$ with elements \n", + "$p(y_i\\vert x_i,\\boldsymbol{\\theta})(1-p(y_i\\vert x_i,\\boldsymbol{\\theta})$, we can obtain a compact expression of the second derivative as" + ] + }, + { + "cell_type": "markdown", + "id": "235c9b1d", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial^2 \\mathcal{C}(\\boldsymbol{\\theta})}{\\partial \\boldsymbol{\\theta}\\partial \\boldsymbol{\\theta}^T} = \\boldsymbol{X}^T\\boldsymbol{W}\\boldsymbol{X}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "1651fe82", + "metadata": { + "editable": true + }, + "source": [ + "This defines what is called the Hessian matrix." + ] + }, + { + "cell_type": "markdown", + "id": "f36a8c94", + "metadata": { + "editable": true + }, + "source": [ + "## Solving using Newton-Raphson's method\n", + "\n", + "If we can set up these equations, Newton-Raphson's iterative method is normally the method of choice. It requires however that we can compute in an efficient way the matrices that define the first and second derivatives. \n", + "\n", + "Our iterative scheme is then given by" + ] + }, + { + "cell_type": "markdown", + "id": "438b5efe", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{\\theta}^{\\mathrm{new}} = \\boldsymbol{\\theta}^{\\mathrm{old}}-\\left(\\frac{\\partial^2 \\mathcal{C}(\\boldsymbol{\\theta})}{\\partial \\boldsymbol{\\theta}\\partial \\boldsymbol{\\theta}^T}\\right)^{-1}_{\\boldsymbol{\\theta}^{\\mathrm{old}}}\\times \\left(\\frac{\\partial \\mathcal{C}(\\boldsymbol{\\theta})}{\\partial \\boldsymbol{\\theta}}\\right)_{\\boldsymbol{\\theta}^{\\mathrm{old}}},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "f3ae8207", + "metadata": { + "editable": true + }, + "source": [ + "or in matrix form as" + ] + }, + { + "cell_type": "markdown", + "id": "702a38c4", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{\\theta}^{\\mathrm{new}} = \\boldsymbol{\\theta}^{\\mathrm{old}}-\\left(\\boldsymbol{X}^T\\boldsymbol{W}\\boldsymbol{X} \\right)^{-1}\\times \\left(-\\boldsymbol{X}^T(\\boldsymbol{y}-\\boldsymbol{p}) \\right)_{\\boldsymbol{\\theta}^{\\mathrm{old}}}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "43b5a9ab", + "metadata": { + "editable": true + }, + "source": [ + "The right-hand side is computed with the old values of $\\theta$. \n", + "\n", + "If we can compute these matrices, in particular the Hessian, the above is often the easiest method to implement." 
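To make the iterative scheme concrete, here is a minimal sketch of the Newton-Raphson update for the two-parameter case. The synthetic data and variable names below are our own illustration, not a reference implementation; the class in the next cell instead uses plain gradient descent.

import numpy as np

np.random.seed(3155)

# synthetic binary data with one predictor plus an intercept column (illustration only)
n = 200
x = np.random.randn(n)
theta_true = np.array([-0.5, 2.0])
p_true = 1.0 / (1.0 + np.exp(-(theta_true[0] + theta_true[1] * x)))
y = (np.random.rand(n) < p_true).astype(float)

X = np.column_stack((np.ones(n), x))   # n x 2 design matrix
theta = np.zeros(2)                    # starting guess

for iteration in range(10):
    p = 1.0 / (1.0 + np.exp(-X @ theta))   # fitted probabilities
    gradient = -X.T @ (y - p)              # first derivative of the cost function
    W = np.diag(p * (1.0 - p))             # diagonal weight matrix
    hessian = X.T @ W @ X                  # second derivative (the Hessian matrix)
    theta = theta - np.linalg.solve(hessian, gradient)

print("Estimated theta:", theta)
print("True theta     :", theta_true)

In practice one would stop the iterations when the norm of the gradient falls below a chosen tolerance rather than after a fixed number of steps.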
+ ] + }, + { + "cell_type": "markdown", + "id": "5b579d10", + "metadata": { + "editable": true + }, + "source": [ + "## Example code for Logistic Regression\n", + "\n", + "Here we make a class for Logistic regression. The code uses a simple data set and includes both a binary case and a multiclass case." + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "id": "a59b8c77", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import numpy as np\n", + "\n", + "class LogisticRegression:\n", + " \"\"\"\n", + " Logistic Regression for binary and multiclass classification.\n", + " \"\"\"\n", + " def __init__(self, lr=0.01, epochs=1000, fit_intercept=True, verbose=False):\n", + " self.lr = lr # Learning rate for gradient descent\n", + " self.epochs = epochs # Number of iterations\n", + " self.fit_intercept = fit_intercept # Whether to add intercept (bias)\n", + " self.verbose = verbose # Print loss during training if True\n", + " self.weights = None\n", + " self.multi_class = False # Will be determined at fit time\n", + "\n", + " def _add_intercept(self, X):\n", + " \"\"\"Add intercept term (column of ones) to feature matrix.\"\"\"\n", + " intercept = np.ones((X.shape[0], 1))\n", + " return np.concatenate((intercept, X), axis=1)\n", + "\n", + " def _sigmoid(self, z):\n", + " \"\"\"Sigmoid function for binary logistic.\"\"\"\n", + " return 1 / (1 + np.exp(-z))\n", + "\n", + " def _softmax(self, Z):\n", + " \"\"\"Softmax function for multiclass logistic.\"\"\"\n", + " exp_Z = np.exp(Z - np.max(Z, axis=1, keepdims=True))\n", + " return exp_Z / np.sum(exp_Z, axis=1, keepdims=True)\n", + "\n", + " def fit(self, X, y):\n", + " \"\"\"\n", + " Train the logistic regression model using gradient descent.\n", + " Supports binary (sigmoid) and multiclass (softmax) based on y.\n", + " \"\"\"\n", + " X = np.array(X)\n", + " y = np.array(y)\n", + " n_samples, n_features = X.shape\n", + "\n", + " # Add intercept if needed\n", + " if self.fit_intercept:\n", + " X = self._add_intercept(X)\n", + " n_features += 1\n", + "\n", + " # Determine classes and mode (binary vs multiclass)\n", + " unique_classes = np.unique(y)\n", + " if len(unique_classes) > 2:\n", + " self.multi_class = True\n", + " else:\n", + " self.multi_class = False\n", + "\n", + " # ----- Multiclass case -----\n", + " if self.multi_class:\n", + " n_classes = len(unique_classes)\n", + " # Map original labels to 0...n_classes-1\n", + " class_to_index = {c: idx for idx, c in enumerate(unique_classes)}\n", + " y_indices = np.array([class_to_index[c] for c in y])\n", + " # Initialize weight matrix (features x classes)\n", + " self.weights = np.zeros((n_features, n_classes))\n", + "\n", + " # One-hot encode y\n", + " Y_onehot = np.zeros((n_samples, n_classes))\n", + " Y_onehot[np.arange(n_samples), y_indices] = 1\n", + "\n", + " # Gradient descent\n", + " for epoch in range(self.epochs):\n", + " scores = X.dot(self.weights) # Linear scores (n_samples x n_classes)\n", + " probs = self._softmax(scores) # Probabilities (n_samples x n_classes)\n", + " # Compute gradient (features x classes)\n", + " gradient = (1 / n_samples) * X.T.dot(probs - Y_onehot)\n", + " # Update weights\n", + " self.weights -= self.lr * gradient\n", + "\n", + " if self.verbose and epoch % 100 == 0:\n", + " # Compute current loss (categorical cross-entropy)\n", + " loss = -np.sum(Y_onehot * np.log(probs + 1e-15)) / n_samples\n", + " print(f\"[Epoch {epoch}] Multiclass loss: {loss:.4f}\")\n", + "\n", + " # ----- Binary case -----\n", + 
" else:\n", + " # Convert y to 0/1 if not already\n", + " if not np.array_equal(unique_classes, [0, 1]):\n", + " # Map the two classes to 0 and 1\n", + " class0, class1 = unique_classes\n", + " y_binary = np.where(y == class1, 1, 0)\n", + " else:\n", + " y_binary = y.copy().astype(int)\n", + "\n", + " # Initialize weights vector (features,)\n", + " self.weights = np.zeros(n_features)\n", + "\n", + " # Gradient descent\n", + " for epoch in range(self.epochs):\n", + " linear_model = X.dot(self.weights) # (n_samples,)\n", + " probs = self._sigmoid(linear_model) # (n_samples,)\n", + " # Gradient for binary cross-entropy\n", + " gradient = (1 / n_samples) * X.T.dot(probs - y_binary)\n", + " self.weights -= self.lr * gradient\n", + "\n", + " if self.verbose and epoch % 100 == 0:\n", + " # Compute binary cross-entropy loss\n", + " loss = -np.mean(\n", + " y_binary * np.log(probs + 1e-15) + \n", + " (1 - y_binary) * np.log(1 - probs + 1e-15)\n", + " )\n", + " print(f\"[Epoch {epoch}] Binary loss: {loss:.4f}\")\n", + "\n", + " def predict_prob(self, X):\n", + " \"\"\"\n", + " Compute probability estimates. Returns a 1D array for binary or\n", + " a 2D array (n_samples x n_classes) for multiclass.\n", + " \"\"\"\n", + " X = np.array(X)\n", + " # Add intercept if the model used it\n", + " if self.fit_intercept:\n", + " X = self._add_intercept(X)\n", + " scores = X.dot(self.weights)\n", + " if self.multi_class:\n", + " return self._softmax(scores)\n", + " else:\n", + " return self._sigmoid(scores)\n", + "\n", + " def predict(self, X):\n", + " \"\"\"\n", + " Predict class labels for samples in X.\n", + " Returns integer class labels (0,1 for binary, or 0...C-1 for multiclass).\n", + " \"\"\"\n", + " probs = self.predict_prob(X)\n", + " if self.multi_class:\n", + " # Choose class with highest probability\n", + " return np.argmax(probs, axis=1)\n", + " else:\n", + " # Threshold at 0.5 for binary\n", + " return (probs >= 0.5).astype(int)" + ] + }, + { + "cell_type": "markdown", + "id": "d7401376", + "metadata": { + "editable": true + }, + "source": [ + "The class implements the sigmoid and softmax internally. During fit(),\n", + "we check the number of classes: if more than 2, we set\n", + "self.multi_class=True and perform multinomial logistic regression. We\n", + "one-hot encode the target vector and update a weight matrix with\n", + "softmax probabilities. Otherwise, we do standard binary logistic\n", + "regression, converting labels to 0/1 if needed and updating a weight\n", + "vector. In both cases we use batch gradient descent on the\n", + "cross-entropy loss (we add a small epsilon 1e-15 to logs for numerical\n", + "stability). Progress (loss) can be printed if verbose=True." + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "id": "8609fd64", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# Evaluation Metrics\n", + "#We define helper functions for accuracy and cross-entropy loss. Accuracy is the fraction of correct predictions . 
For loss, we compute the appropriate cross-entropy:\n", + "\n", + "def accuracy_score(y_true, y_pred):\n", + " \"\"\"Accuracy = (# correct predictions) / (total samples).\"\"\"\n", + " y_true = np.array(y_true)\n", + " y_pred = np.array(y_pred)\n", + " return np.mean(y_true == y_pred)\n", + "\n", + "def binary_cross_entropy(y_true, y_prob):\n", + " \"\"\"\n", + " Binary cross-entropy loss.\n", + " y_true: true binary labels (0 or 1), y_prob: predicted probabilities for class 1.\n", + " \"\"\"\n", + " y_true = np.array(y_true)\n", + " y_prob = np.clip(np.array(y_prob), 1e-15, 1-1e-15)\n", + " return -np.mean(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))\n", + "\n", + "def categorical_cross_entropy(y_true, y_prob):\n", + " \"\"\"\n", + " Categorical cross-entropy loss for multiclass.\n", + " y_true: true labels (0...C-1), y_prob: array of predicted probabilities (n_samples x C).\n", + " \"\"\"\n", + " y_true = np.array(y_true, dtype=int)\n", + " y_prob = np.clip(np.array(y_prob), 1e-15, 1-1e-15)\n", + " # One-hot encode true labels\n", + " n_samples, n_classes = y_prob.shape\n", + " one_hot = np.zeros_like(y_prob)\n", + " one_hot[np.arange(n_samples), y_true] = 1\n", + " # Compute cross-entropy\n", + " loss_vec = -np.sum(one_hot * np.log(y_prob), axis=1)\n", + " return np.mean(loss_vec)" + ] + }, + { + "cell_type": "markdown", + "id": "1879aba2", + "metadata": { + "editable": true + }, + "source": [ + "### Synthetic data generation\n", + "\n", + "Binary classification data: Create two Gaussian clusters in 2D. For example, class 0 around mean [-2,-2] and class 1 around [2,2].\n", + "Multiclass data: Create several Gaussian clusters (one per class) spread out in feature space." + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "id": "6083d844", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import numpy as np\n", + "\n", + "def generate_binary_data(n_samples=100, n_features=2, random_state=None):\n", + " \"\"\"\n", + " Generate synthetic binary classification data.\n", + " Returns (X, y) where X is (n_samples x n_features), y in {0,1}.\n", + " \"\"\"\n", + " rng = np.random.RandomState(random_state)\n", + " # Half samples for class 0, half for class 1\n", + " n0 = n_samples // 2\n", + " n1 = n_samples - n0\n", + " # Class 0 around mean -2, class 1 around +2\n", + " mean0 = -2 * np.ones(n_features)\n", + " mean1 = 2 * np.ones(n_features)\n", + " X0 = rng.randn(n0, n_features) + mean0\n", + " X1 = rng.randn(n1, n_features) + mean1\n", + " X = np.vstack((X0, X1))\n", + " y = np.array([0]*n0 + [1]*n1)\n", + " return X, y\n", + "\n", + "def generate_multiclass_data(n_samples=150, n_features=2, n_classes=3, random_state=None):\n", + " \"\"\"\n", + " Generate synthetic multiclass data with n_classes Gaussian clusters.\n", + " \"\"\"\n", + " rng = np.random.RandomState(random_state)\n", + " X = []\n", + " y = []\n", + " samples_per_class = n_samples // n_classes\n", + " for cls in range(n_classes):\n", + " # Random cluster center for each class\n", + " center = rng.uniform(-5, 5, size=n_features)\n", + " Xi = rng.randn(samples_per_class, n_features) + center\n", + " yi = [cls] * samples_per_class\n", + " X.append(Xi)\n", + " y.extend(yi)\n", + " X = np.vstack(X)\n", + " y = np.array(y)\n", + " return X, y\n", + "\n", + "\n", + "# Generate and test on binary data\n", + "X_bin, y_bin = generate_binary_data(n_samples=200, n_features=2, random_state=42)\n", + "model_bin = LogisticRegression(lr=0.1, epochs=1000)\n", + 
"model_bin.fit(X_bin, y_bin)\n", + "y_prob_bin = model_bin.predict_prob(X_bin) # probabilities for class 1\n", + "y_pred_bin = model_bin.predict(X_bin) # predicted classes 0 or 1\n", + "\n", + "acc_bin = accuracy_score(y_bin, y_pred_bin)\n", + "loss_bin = binary_cross_entropy(y_bin, y_prob_bin)\n", + "print(f\"Binary Classification - Accuracy: {acc_bin:.2f}, Cross-Entropy Loss: {loss_bin:.2f}\")\n", + "#For multiclass:\n", + "# Generate and test on multiclass data\n", + "X_multi, y_multi = generate_multiclass_data(n_samples=300, n_features=2, n_classes=3, random_state=1)\n", + "model_multi = LogisticRegression(lr=0.1, epochs=1000)\n", + "model_multi.fit(X_multi, y_multi)\n", + "y_prob_multi = model_multi.predict_prob(X_multi) # (n_samples x 3) probabilities\n", + "y_pred_multi = model_multi.predict(X_multi) # predicted labels 0,1,2\n", + "\n", + "acc_multi = accuracy_score(y_multi, y_pred_multi)\n", + "loss_multi = categorical_cross_entropy(y_multi, y_prob_multi)\n", + "print(f\"Multiclass Classification - Accuracy: {acc_multi:.2f}, Cross-Entropy Loss: {loss_multi:.2f}\")\n", + "\n", + "# CSV Export\n", + "import csv\n", + "\n", + "# Export binary results\n", + "with open('binary_results.csv', mode='w', newline='') as f:\n", + " writer = csv.writer(f)\n", + " writer.writerow([\"TrueLabel\", \"PredictedLabel\"])\n", + " for true, pred in zip(y_bin, y_pred_bin):\n", + " writer.writerow([true, pred])\n", + "\n", + "# Export multiclass results\n", + "with open('multiclass_results.csv', mode='w', newline='') as f:\n", + " writer = csv.writer(f)\n", + " writer.writerow([\"TrueLabel\", \"PredictedLabel\"])\n", + " for true, pred in zip(y_multi, y_pred_multi):\n", + " writer.writerow([true, pred])" + ] + } + ], + "metadata": {}, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/doc/LectureNotes/week40.ipynb b/doc/LectureNotes/week40.ipynb new file mode 100644 index 000000000..aa3733b88 --- /dev/null +++ b/doc/LectureNotes/week40.ipynb @@ -0,0 +1,2459 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "2303c986", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "75c3b33e", + "metadata": { + "editable": true + }, + "source": [ + "# Week 40: Gradient descent methods (continued) and start Neural networks\n", + "**Morten Hjorth-Jensen**, Department of Physics, University of Oslo, Norway\n", + "\n", + "Date: **September 29-October 3, 2025**" + ] + }, + { + "cell_type": "markdown", + "id": "4ba50982", + "metadata": { + "editable": true + }, + "source": [ + "## Lecture Monday September 29, 2025\n", + "1. Logistic regression and gradient descent, examples on how to code\n", + "\n", + "\n", + "2. Start with the basics of Neural Networks, setting up the basic steps, from the simple perceptron model to the multi-layer perceptron model\n", + "\n", + "3. Video of lecture at \n", + "\n", + "4. Whiteboard notes at " + ] + }, + { + "cell_type": "markdown", + "id": "1d527020", + "metadata": { + "editable": true + }, + "source": [ + "## Suggested readings and videos\n", + "**Readings and Videos:**\n", + "\n", + "1. The lecture notes for week 40 (these notes)\n", + "\n", + "\n", + "2. For neural networks we recommend Goodfellow et al chapter 6 and Raschka et al chapter 2 (contains also material about gradient descent) and chapter 11 (we will use this next week)\n", + "\n", + "\n", + "\n", + "3. Neural Networks demystified at \n", + "\n", + "4. 
Building Neural Networks from scratch at URL:https://www.youtube.com/watch?v=Wo5dMEP_BbI&list=PLQVvvaa0QuDcjD5BAw2DxE6OF2tius3V3&ab_channel=sentdex\"" + ] + }, + { + "cell_type": "markdown", + "id": "63a4d497", + "metadata": { + "editable": true + }, + "source": [ + "## Lab sessions Tuesday and Wednesday\n", + "**Material for the active learning sessions on Tuesday and Wednesday.**\n", + "\n", + " * Work on project 1 and discussions on how to structure your report\n", + "\n", + " * No weekly exercises for week 40, project work only\n", + "\n", + " * Video on how to write scientific reports recorded during one of the lab sessions at \n", + "\n", + " * A general guideline can be found at ." + ] + }, + { + "cell_type": "markdown", + "id": "73621d6b", + "metadata": { + "editable": true + }, + "source": [ + "## Logistic Regression, from last week\n", + "\n", + "In linear regression our main interest was centered on learning the\n", + "coefficients of a functional fit (say a polynomial) in order to be\n", + "able to predict the response of a continuous variable on some unseen\n", + "data. The fit to the continuous variable $y_i$ is based on some\n", + "independent variables $\\boldsymbol{x}_i$. Linear regression resulted in\n", + "analytical expressions for standard ordinary Least Squares or Ridge\n", + "regression (in terms of matrices to invert) for several quantities,\n", + "ranging from the variance and thereby the confidence intervals of the\n", + "parameters $\\boldsymbol{\\theta}$ to the mean squared error. If we can invert\n", + "the product of the design matrices, linear regression gives then a\n", + "simple recipe for fitting our data." + ] + }, + { + "cell_type": "markdown", + "id": "fc1df17b", + "metadata": { + "editable": true + }, + "source": [ + "## Classification problems\n", + "\n", + "Classification problems, however, are concerned with outcomes taking\n", + "the form of discrete variables (i.e. categories). We may for example,\n", + "on the basis of DNA sequencing for a number of patients, like to find\n", + "out which mutations are important for a certain disease; or based on\n", + "scans of various patients' brains, figure out if there is a tumor or\n", + "not; or given a specific physical system, we'd like to identify its\n", + "state, say whether it is an ordered or disordered system (typical\n", + "situation in solid state physics); or classify the status of a\n", + "patient, whether she/he has a stroke or not and many other similar\n", + "situations.\n", + "\n", + "The most common situation we encounter when we apply logistic\n", + "regression is that of two possible outcomes, normally denoted as a\n", + "binary outcome, true or false, positive or negative, success or\n", + "failure etc." + ] + }, + { + "cell_type": "markdown", + "id": "a3d311e6", + "metadata": { + "editable": true + }, + "source": [ + "## Optimization and Deep learning\n", + "\n", + "Logistic regression will also serve as our stepping stone towards\n", + "neural network algorithms and supervised deep learning. For logistic\n", + "learning, the minimization of the cost function leads to a non-linear\n", + "equation in the parameters $\\boldsymbol{\\theta}$. The optimization of the\n", + "problem calls therefore for minimization algorithms.\n", + "\n", + "As we have discussed earlier, this forms the\n", + "bottle neck of all machine learning algorithms, namely how to find\n", + "reliable minima of a multi-variable function. This leads us to the\n", + "family of gradient descent methods. 
The latter are the working horses\n", + "of basically all modern machine learning algorithms.\n", + "\n", + "We note also that many of the topics discussed here on logistic \n", + "regression are also commonly used in modern supervised Deep Learning\n", + "models, as we will see later." + ] + }, + { + "cell_type": "markdown", + "id": "4120d6f9", + "metadata": { + "editable": true + }, + "source": [ + "## Basics\n", + "\n", + "We consider the case where the outputs/targets, also called the\n", + "responses or the outcomes, $y_i$ are discrete and only take values\n", + "from $k=0,\\dots,K-1$ (i.e. $K$ classes).\n", + "\n", + "The goal is to predict the\n", + "output classes from the design matrix $\\boldsymbol{X}\\in\\mathbb{R}^{n\\times p}$\n", + "made of $n$ samples, each of which carries $p$ features or predictors. The\n", + "primary goal is to identify the classes to which new unseen samples\n", + "belong.\n", + "\n", + "Last week we specialized to the case of two classes only, with outputs\n", + "$y_i=0$ and $y_i=1$. Our outcomes could represent the status of a\n", + "credit card user that could default or not on her/his credit card\n", + "debt. That is" + ] + }, + { + "cell_type": "markdown", + "id": "9e85d1e4", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y_i = \\begin{bmatrix} 0 & \\mathrm{no}\\\\ 1 & \\mathrm{yes} \\end{bmatrix}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "a0d8c838", + "metadata": { + "editable": true + }, + "source": [ + "## Two parameters\n", + "\n", + "We assume now that we have two classes with $y_i$ either $0$ or $1$. Furthermore we assume also that we have only two parameters $\\theta$ in our fitting of the Sigmoid function, that is we define probabilities" + ] + }, + { + "cell_type": "markdown", + "id": "7cea7945", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{align*}\n", + "p(y_i=1|x_i,\\boldsymbol{\\theta}) &= \\frac{\\exp{(\\theta_0+\\theta_1x_i)}}{1+\\exp{(\\theta_0+\\theta_1x_i)}},\\nonumber\\\\\n", + "p(y_i=0|x_i,\\boldsymbol{\\theta}) &= 1 - p(y_i=1|x_i,\\boldsymbol{\\theta}),\n", + "\\end{align*}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "6adc5106", + "metadata": { + "editable": true + }, + "source": [ + "where $\\boldsymbol{\\theta}$ are the weights we wish to extract from data, in our case $\\theta_0$ and $\\theta_1$. \n", + "\n", + "Note that we used" + ] + }, + { + "cell_type": "markdown", + "id": "f976068e", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "p(y_i=0\\vert x_i, \\boldsymbol{\\theta}) = 1-p(y_i=1\\vert x_i, \\boldsymbol{\\theta}).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "dedf9f0e", + "metadata": { + "editable": true + }, + "source": [ + "## Maximum likelihood\n", + "\n", + "In order to define the total likelihood for all possible outcomes from a \n", + "dataset $\\mathcal{D}=\\{(y_i,x_i)\\}$, with the binary labels\n", + "$y_i\\in\\{0,1\\}$ and where the data points are drawn independently, we use the so-called [Maximum Likelihood Estimation](https://en.wikipedia.org/wiki/Maximum_likelihood_estimation) (MLE) principle. \n", + "We aim thus at maximizing \n", + "the probability of seeing the observed data. 
We can then approximate the \n", + "likelihood in terms of the product of the individual probabilities of a specific outcome $y_i$, that is" + ] + }, + { + "cell_type": "markdown", + "id": "bd8b54ab", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{align*}\n", + "P(\\mathcal{D}|\\boldsymbol{\\theta})& = \\prod_{i=1}^n \\left[p(y_i=1|x_i,\\boldsymbol{\\theta})\\right]^{y_i}\\left[1-p(y_i=1|x_i,\\boldsymbol{\\theta}))\\right]^{1-y_i}\\nonumber \\\\\n", + "\\end{align*}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "57bfb17f", + "metadata": { + "editable": true + }, + "source": [ + "from which we obtain the log-likelihood and our **cost/loss** function" + ] + }, + { + "cell_type": "markdown", + "id": "00aee268", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathcal{C}(\\boldsymbol{\\theta}) = \\sum_{i=1}^n \\left( y_i\\log{p(y_i=1|x_i,\\boldsymbol{\\theta})} + (1-y_i)\\log\\left[1-p(y_i=1|x_i,\\boldsymbol{\\theta}))\\right]\\right).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "e12940f3", + "metadata": { + "editable": true + }, + "source": [ + "## The cost function rewritten\n", + "\n", + "Reordering the logarithms, we can rewrite the **cost/loss** function as" + ] + }, + { + "cell_type": "markdown", + "id": "e5b2b29e", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathcal{C}(\\boldsymbol{\\theta}) = \\sum_{i=1}^n \\left(y_i(\\theta_0+\\theta_1x_i) -\\log{(1+\\exp{(\\theta_0+\\theta_1x_i)})}\\right).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "c6c0ba4c", + "metadata": { + "editable": true + }, + "source": [ + "The maximum likelihood estimator is defined as the set of parameters that maximize the log-likelihood where we maximize with respect to $\\theta$.\n", + "Since the cost (error) function is just the negative log-likelihood, for logistic regression we have that" + ] + }, + { + "cell_type": "markdown", + "id": "46ee2ea8", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathcal{C}(\\boldsymbol{\\theta})=-\\sum_{i=1}^n \\left(y_i(\\theta_0+\\theta_1x_i) -\\log{(1+\\exp{(\\theta_0+\\theta_1x_i)})}\\right).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "9a05709b", + "metadata": { + "editable": true + }, + "source": [ + "This equation is known in statistics as the **cross entropy**. Finally, we note that just as in linear regression, \n", + "in practice we often supplement the cross-entropy with additional regularization terms, usually $L_1$ and $L_2$ regularization as we did for Ridge and Lasso regression." + ] + }, + { + "cell_type": "markdown", + "id": "ae1362c9", + "metadata": { + "editable": true + }, + "source": [ + "## Minimizing the cross entropy\n", + "\n", + "The cross entropy is a convex function of the weights $\\boldsymbol{\\theta}$ and,\n", + "therefore, any local minimizer is a global minimizer. 
\n", + "\n", + "Minimizing this\n", + "cost function with respect to the two parameters $\\theta_0$ and $\\theta_1$ we obtain" + ] + }, + { + "cell_type": "markdown", + "id": "57f4670b", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial \\mathcal{C}(\\boldsymbol{\\theta})}{\\partial \\theta_0} = -\\sum_{i=1}^n \\left(y_i -\\frac{\\exp{(\\theta_0+\\theta_1x_i)}}{1+\\exp{(\\theta_0+\\theta_1x_i)}}\\right),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "1dc19f59", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "4e96dc87", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial \\mathcal{C}(\\boldsymbol{\\theta})}{\\partial \\theta_1} = -\\sum_{i=1}^n \\left(y_ix_i -x_i\\frac{\\exp{(\\theta_0+\\theta_1x_i)}}{1+\\exp{(\\theta_0+\\theta_1x_i)}}\\right).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "fa77bec9", + "metadata": { + "editable": true + }, + "source": [ + "## A more compact expression\n", + "\n", + "Let us now define a vector $\\boldsymbol{y}$ with $n$ elements $y_i$, an\n", + "$n\\times p$ matrix $\\boldsymbol{X}$ which contains the $x_i$ values and a\n", + "vector $\\boldsymbol{p}$ of fitted probabilities $p(y_i\\vert x_i,\\boldsymbol{\\theta})$. We can rewrite in a more compact form the first\n", + "derivative of the cost function as" + ] + }, + { + "cell_type": "markdown", + "id": "1b013fd2", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial \\mathcal{C}(\\boldsymbol{\\theta})}{\\partial \\boldsymbol{\\theta}} = -\\boldsymbol{X}^T\\left(\\boldsymbol{y}-\\boldsymbol{p}\\right).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "910f36dd", + "metadata": { + "editable": true + }, + "source": [ + "If we in addition define a diagonal matrix $\\boldsymbol{W}$ with elements \n", + "$p(y_i\\vert x_i,\\boldsymbol{\\theta})(1-p(y_i\\vert x_i,\\boldsymbol{\\theta})$, we can obtain a compact expression of the second derivative as" + ] + }, + { + "cell_type": "markdown", + "id": "8212d0ed", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial^2 \\mathcal{C}(\\boldsymbol{\\theta})}{\\partial \\boldsymbol{\\theta}\\partial \\boldsymbol{\\theta}^T} = \\boldsymbol{X}^T\\boldsymbol{W}\\boldsymbol{X}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "7ae7078b", + "metadata": { + "editable": true + }, + "source": [ + "## Extending to more predictors\n", + "\n", + "Within a binary classification problem, we can easily expand our model to include multiple predictors. 
Our ratio between likelihoods is then with $p$ predictors" + ] + }, + { + "cell_type": "markdown", + "id": "59e57d7c", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\log{ \\frac{p(\\boldsymbol{\\theta}\\boldsymbol{x})}{1-p(\\boldsymbol{\\theta}\\boldsymbol{x})}} = \\theta_0+\\theta_1x_1+\\theta_2x_2+\\dots+\\theta_px_p.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "6ffe0955", + "metadata": { + "editable": true + }, + "source": [ + "Here we defined $\\boldsymbol{x}=[1,x_1,x_2,\\dots,x_p]$ and $\\boldsymbol{\\theta}=[\\theta_0, \\theta_1, \\dots, \\theta_p]$ leading to" + ] + }, + { + "cell_type": "markdown", + "id": "56e9bd82", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "p(\\boldsymbol{\\theta}\\boldsymbol{x})=\\frac{ \\exp{(\\theta_0+\\theta_1x_1+\\theta_2x_2+\\dots+\\theta_px_p)}}{1+\\exp{(\\theta_0+\\theta_1x_1+\\theta_2x_2+\\dots+\\theta_px_p)}}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "86b12946", + "metadata": { + "editable": true + }, + "source": [ + "## Including more classes\n", + "\n", + "Till now we have mainly focused on two classes, the so-called binary\n", + "system. Suppose we wish to extend to $K$ classes. Let us for the sake\n", + "of simplicity assume we have only two predictors. We have then following model" + ] + }, + { + "cell_type": "markdown", + "id": "d55394df", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\log{\\frac{p(C=1\\vert x)}{p(K\\vert x)}} = \\theta_{10}+\\theta_{11}x_1,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "ee01378a", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "c7fadfbb", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\log{\\frac{p(C=2\\vert x)}{p(K\\vert x)}} = \\theta_{20}+\\theta_{21}x_1,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "e8310f63", + "metadata": { + "editable": true + }, + "source": [ + "and so on till the class $C=K-1$ class" + ] + }, + { + "cell_type": "markdown", + "id": "be651647", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\log{\\frac{p(C=K-1\\vert x)}{p(K\\vert x)}} = \\theta_{(K-1)0}+\\theta_{(K-1)1}x_1,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "e277c601", + "metadata": { + "editable": true + }, + "source": [ + "and the model is specified in term of $K-1$ so-called log-odds or\n", + "**logit** transformations." + ] + }, + { + "cell_type": "markdown", + "id": "aea3a410", + "metadata": { + "editable": true + }, + "source": [ + "## More classes\n", + "\n", + "In our discussion of neural networks we will encounter the above again\n", + "in terms of a slightly modified function, the so-called **Softmax** function.\n", + "\n", + "The softmax function is used in various multiclass classification\n", + "methods, such as multinomial logistic regression (also known as\n", + "softmax regression), multiclass linear discriminant analysis, naive\n", + "Bayes classifiers, and artificial neural networks. 
Specifically, in\n", + "multinomial logistic regression and linear discriminant analysis, the\n", + "input to the function is the result of $K$ distinct linear functions,\n", + "and the predicted probability for the $k$-th class given a sample\n", + "vector $\\boldsymbol{x}$ and a weighting vector $\\boldsymbol{\\theta}$ is (with two\n", + "predictors):" + ] + }, + { + "cell_type": "markdown", + "id": "bfa7221f", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "p(C=k\\vert \\mathbf {x} )=\\frac{\\exp{(\\theta_{k0}+\\theta_{k1}x_1)}}{1+\\sum_{l=1}^{K-1}\\exp{(\\theta_{l0}+\\theta_{l1}x_1)}}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "3d749c39", + "metadata": { + "editable": true + }, + "source": [ + "It is easy to extend to more predictors. The final class is" + ] + }, + { + "cell_type": "markdown", + "id": "dc061a39", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "p(C=K\\vert \\mathbf {x} )=\\frac{1}{1+\\sum_{l=1}^{K-1}\\exp{(\\theta_{l0}+\\theta_{l1}x_1)}},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "8ea10488", + "metadata": { + "editable": true + }, + "source": [ + "and they sum to one. Our earlier discussions were all specialized to\n", + "the case with two classes only. It is easy to see from the above that\n", + "what we derived earlier is compatible with these equations.\n", + "\n", + "To find the optimal parameters we would typically use a gradient\n", + "descent method. Newton's method and gradient descent methods are\n", + "discussed in the material on [optimization\n", + "methods](https://compphysics.github.io/MachineLearning/doc/pub/Splines/html/Splines-bs.html)." + ] + }, + { + "cell_type": "markdown", + "id": "9cb3baf8", + "metadata": { + "editable": true + }, + "source": [ + "## Optimization, the central part of any Machine Learning algortithm\n", + "\n", + "Almost every problem in machine learning and data science starts with\n", + "a dataset $X$, a model $g(\\theta)$, which is a function of the\n", + "parameters $\\theta$ and a cost function $C(X, g(\\theta))$ that allows\n", + "us to judge how well the model $g(\\theta)$ explains the observations\n", + "$X$. The model is fit by finding the values of $\\theta$ that minimize\n", + "the cost function. Ideally we would be able to solve for $\\theta$\n", + "analytically, however this is not possible in general and we must use\n", + "some approximative/numerical method to compute the minimum." + ] + }, + { + "cell_type": "markdown", + "id": "387393d7", + "metadata": { + "editable": true + }, + "source": [ + "## Revisiting our Logistic Regression case\n", + "\n", + "In our discussion on Logistic Regression we studied the \n", + "case of\n", + "two classes, with $y_i$ either\n", + "$0$ or $1$. Furthermore we assumed also that we have only two\n", + "parameters $\\theta$ in our fitting, that is we\n", + "defined probabilities" + ] + }, + { + "cell_type": "markdown", + "id": "30f64659", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{align*}\n", + "p(y_i=1|x_i,\\boldsymbol{\\theta}) &= \\frac{\\exp{(\\theta_0+\\theta_1x_i)}}{1+\\exp{(\\theta_0+\\theta_1x_i)}},\\nonumber\\\\\n", + "p(y_i=0|x_i,\\boldsymbol{\\theta}) &= 1 - p(y_i=1|x_i,\\boldsymbol{\\theta}),\n", + "\\end{align*}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "3ba65422", + "metadata": { + "editable": true + }, + "source": [ + "where $\\boldsymbol{\\theta}$ are the weights we wish to extract from data, in our case $\\theta_0$ and $\\theta_1$." 
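+    ,
+    "\n",
+    "Before turning to Newton's method below, a minimal sketch of plain gradient\n",
+    "descent on this cross entropy for the two-parameter case may be useful. The\n",
+    "synthetic data, the learning rate and the number of epochs are arbitrary,\n",
+    "illustrative choices of ours:\n",
+    "\n",
+    "```python\n",
+    "import numpy as np\n",
+    "\n",
+    "# Synthetic labels from a two-parameter logistic model (made-up parameters)\n",
+    "rng = np.random.default_rng(42)\n",
+    "n = 200\n",
+    "x = rng.normal(size=n)\n",
+    "y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(1.0 - 1.5 * x))))\n",
+    "\n",
+    "X = np.column_stack((np.ones(n), x))      # columns: intercept, x\n",
+    "theta = np.zeros(2)\n",
+    "eta = 0.1                                 # learning rate, chosen by hand\n",
+    "for epoch in range(2000):\n",
+    "    p = 1.0 / (1.0 + np.exp(-X @ theta))  # p(y=1|x, theta)\n",
+    "    grad = -X.T @ (y - p) / n             # gradient of the averaged cross entropy\n",
+    "    theta -= eta * grad\n",
+    "print(theta)\n",
+    "```"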
+ ] + }, + { + "cell_type": "markdown", + "id": "005f46d7", + "metadata": { + "editable": true + }, + "source": [ + "## The equations to solve\n", + "\n", + "Our compact equations used a definition of a vector $\\boldsymbol{y}$ with $n$\n", + "elements $y_i$, an $n\\times p$ matrix $\\boldsymbol{X}$ which contains the\n", + "$x_i$ values and a vector $\\boldsymbol{p}$ of fitted probabilities\n", + "$p(y_i\\vert x_i,\\boldsymbol{\\theta})$. We rewrote in a more compact form\n", + "the first derivative of the cost function as" + ] + }, + { + "cell_type": "markdown", + "id": "61a638bc", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial \\mathcal{C}(\\boldsymbol{\\theta})}{\\partial \\boldsymbol{\\theta}} = -\\boldsymbol{X}^T\\left(\\boldsymbol{y}-\\boldsymbol{p}\\right).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "469c0042", + "metadata": { + "editable": true + }, + "source": [ + "If we in addition define a diagonal matrix $\\boldsymbol{W}$ with elements \n", + "$p(y_i\\vert x_i,\\boldsymbol{\\theta})(1-p(y_i\\vert x_i,\\boldsymbol{\\theta})$, we can obtain a compact expression of the second derivative as" + ] + }, + { + "cell_type": "markdown", + "id": "0af5449a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial^2 \\mathcal{C}(\\boldsymbol{\\theta})}{\\partial \\boldsymbol{\\theta}\\partial \\boldsymbol{\\theta}^T} = \\boldsymbol{X}^T\\boldsymbol{W}\\boldsymbol{X}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "f4c16b4f", + "metadata": { + "editable": true + }, + "source": [ + "This defines what is called the Hessian matrix." + ] + }, + { + "cell_type": "markdown", + "id": "ddbe7f50", + "metadata": { + "editable": true + }, + "source": [ + "## Solving using Newton-Raphson's method\n", + "\n", + "If we can set up these equations, Newton-Raphson's iterative method is normally the method of choice. It requires however that we can compute in an efficient way the matrices that define the first and second derivatives. \n", + "\n", + "Our iterative scheme is then given by" + ] + }, + { + "cell_type": "markdown", + "id": "52830f96", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{\\theta}^{\\mathrm{new}} = \\boldsymbol{\\theta}^{\\mathrm{old}}-\\left(\\frac{\\partial^2 \\mathcal{C}(\\boldsymbol{\\theta})}{\\partial \\boldsymbol{\\theta}\\partial \\boldsymbol{\\theta}^T}\\right)^{-1}_{\\boldsymbol{\\theta}^{\\mathrm{old}}}\\times \\left(\\frac{\\partial \\mathcal{C}(\\boldsymbol{\\theta})}{\\partial \\boldsymbol{\\theta}}\\right)_{\\boldsymbol{\\theta}^{\\mathrm{old}}},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "1b8a1c14", + "metadata": { + "editable": true + }, + "source": [ + "or in matrix form as" + ] + }, + { + "cell_type": "markdown", + "id": "8ad73cea", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{\\theta}^{\\mathrm{new}} = \\boldsymbol{\\theta}^{\\mathrm{old}}-\\left(\\boldsymbol{X}^T\\boldsymbol{W}\\boldsymbol{X} \\right)^{-1}\\times \\left(-\\boldsymbol{X}^T(\\boldsymbol{y}-\\boldsymbol{p}) \\right)_{\\boldsymbol{\\theta}^{\\mathrm{old}}}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "6d47dd0b", + "metadata": { + "editable": true + }, + "source": [ + "The right-hand side is computed with the old values of $\\theta$. \n", + "\n", + "If we can compute these matrices, in particular the Hessian, the above is often the easiest method to implement." 
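+    ,
+    "\n",
+    "As a hedged sketch (our own illustration, not taken from the lecture code),\n",
+    "the same Newton-Raphson update can be written as an iteratively reweighted\n",
+    "least-squares step, replacing the explicit matrix inverse by a linear solve.\n",
+    "The synthetic data and the number of iterations below are arbitrary:\n",
+    "\n",
+    "```python\n",
+    "import numpy as np\n",
+    "\n",
+    "# Synthetic two-parameter logistic data (made-up parameters)\n",
+    "rng = np.random.default_rng(0)\n",
+    "n = 150\n",
+    "x = rng.normal(size=n)\n",
+    "y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(0.3 + 1.2 * x))))\n",
+    "\n",
+    "X = np.column_stack((np.ones(n), x))\n",
+    "theta = np.zeros(2)\n",
+    "for _ in range(8):\n",
+    "    p = 1.0 / (1.0 + np.exp(-X @ theta))\n",
+    "    w = p * (1.0 - p)                    # diagonal of W\n",
+    "    z = X @ theta + (y - p) / w          # working response\n",
+    "    # Solve (X^T W X) theta = X^T W z instead of forming the inverse explicitly\n",
+    "    theta = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * z))\n",
+    "print(theta)\n",
+    "```"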
+ ] + }, + { + "cell_type": "markdown", + "id": "f399c2f4", + "metadata": { + "editable": true + }, + "source": [ + "## Example code for Logistic Regression\n", + "\n", + "Here we make a class for Logistic regression. The code uses a simple data set and includes both a binary case and a multiclass case." + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "79f6b6fc", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import numpy as np\n", + "\n", + "class LogisticRegression:\n", + " \"\"\"\n", + " Logistic Regression for binary and multiclass classification.\n", + " \"\"\"\n", + " def __init__(self, lr=0.01, epochs=1000, fit_intercept=True, verbose=False):\n", + " self.lr = lr # Learning rate for gradient descent\n", + " self.epochs = epochs # Number of iterations\n", + " self.fit_intercept = fit_intercept # Whether to add intercept (bias)\n", + " self.verbose = verbose # Print loss during training if True\n", + " self.weights = None\n", + " self.multi_class = False # Will be determined at fit time\n", + "\n", + " def _add_intercept(self, X):\n", + " \"\"\"Add intercept term (column of ones) to feature matrix.\"\"\"\n", + " intercept = np.ones((X.shape[0], 1))\n", + " return np.concatenate((intercept, X), axis=1)\n", + "\n", + " def _sigmoid(self, z):\n", + " \"\"\"Sigmoid function for binary logistic.\"\"\"\n", + " return 1 / (1 + np.exp(-z))\n", + "\n", + " def _softmax(self, Z):\n", + " \"\"\"Softmax function for multiclass logistic.\"\"\"\n", + " exp_Z = np.exp(Z - np.max(Z, axis=1, keepdims=True))\n", + " return exp_Z / np.sum(exp_Z, axis=1, keepdims=True)\n", + "\n", + " def fit(self, X, y):\n", + " \"\"\"\n", + " Train the logistic regression model using gradient descent.\n", + " Supports binary (sigmoid) and multiclass (softmax) based on y.\n", + " \"\"\"\n", + " X = np.array(X)\n", + " y = np.array(y)\n", + " n_samples, n_features = X.shape\n", + "\n", + " # Add intercept if needed\n", + " if self.fit_intercept:\n", + " X = self._add_intercept(X)\n", + " n_features += 1\n", + "\n", + " # Determine classes and mode (binary vs multiclass)\n", + " unique_classes = np.unique(y)\n", + " if len(unique_classes) > 2:\n", + " self.multi_class = True\n", + " else:\n", + " self.multi_class = False\n", + "\n", + " # ----- Multiclass case -----\n", + " if self.multi_class:\n", + " n_classes = len(unique_classes)\n", + " # Map original labels to 0...n_classes-1\n", + " class_to_index = {c: idx for idx, c in enumerate(unique_classes)}\n", + " y_indices = np.array([class_to_index[c] for c in y])\n", + " # Initialize weight matrix (features x classes)\n", + " self.weights = np.zeros((n_features, n_classes))\n", + "\n", + " # One-hot encode y\n", + " Y_onehot = np.zeros((n_samples, n_classes))\n", + " Y_onehot[np.arange(n_samples), y_indices] = 1\n", + "\n", + " # Gradient descent\n", + " for epoch in range(self.epochs):\n", + " scores = X.dot(self.weights) # Linear scores (n_samples x n_classes)\n", + " probs = self._softmax(scores) # Probabilities (n_samples x n_classes)\n", + " # Compute gradient (features x classes)\n", + " gradient = (1 / n_samples) * X.T.dot(probs - Y_onehot)\n", + " # Update weights\n", + " self.weights -= self.lr * gradient\n", + "\n", + " if self.verbose and epoch % 100 == 0:\n", + " # Compute current loss (categorical cross-entropy)\n", + " loss = -np.sum(Y_onehot * np.log(probs + 1e-15)) / n_samples\n", + " print(f\"[Epoch {epoch}] Multiclass loss: {loss:.4f}\")\n", + "\n", + " # ----- Binary case -----\n", + 
" else:\n", + " # Convert y to 0/1 if not already\n", + " if not np.array_equal(unique_classes, [0, 1]):\n", + " # Map the two classes to 0 and 1\n", + " class0, class1 = unique_classes\n", + " y_binary = np.where(y == class1, 1, 0)\n", + " else:\n", + " y_binary = y.copy().astype(int)\n", + "\n", + " # Initialize weights vector (features,)\n", + " self.weights = np.zeros(n_features)\n", + "\n", + " # Gradient descent\n", + " for epoch in range(self.epochs):\n", + " linear_model = X.dot(self.weights) # (n_samples,)\n", + " probs = self._sigmoid(linear_model) # (n_samples,)\n", + " # Gradient for binary cross-entropy\n", + " gradient = (1 / n_samples) * X.T.dot(probs - y_binary)\n", + " self.weights -= self.lr * gradient\n", + "\n", + " if self.verbose and epoch % 100 == 0:\n", + " # Compute binary cross-entropy loss\n", + " loss = -np.mean(\n", + " y_binary * np.log(probs + 1e-15) + \n", + " (1 - y_binary) * np.log(1 - probs + 1e-15)\n", + " )\n", + " print(f\"[Epoch {epoch}] Binary loss: {loss:.4f}\")\n", + "\n", + " def predict_prob(self, X):\n", + " \"\"\"\n", + " Compute probability estimates. Returns a 1D array for binary or\n", + " a 2D array (n_samples x n_classes) for multiclass.\n", + " \"\"\"\n", + " X = np.array(X)\n", + " # Add intercept if the model used it\n", + " if self.fit_intercept:\n", + " X = self._add_intercept(X)\n", + " scores = X.dot(self.weights)\n", + " if self.multi_class:\n", + " return self._softmax(scores)\n", + " else:\n", + " return self._sigmoid(scores)\n", + "\n", + " def predict(self, X):\n", + " \"\"\"\n", + " Predict class labels for samples in X.\n", + " Returns integer class labels (0,1 for binary, or 0...C-1 for multiclass).\n", + " \"\"\"\n", + " probs = self.predict_prob(X)\n", + " if self.multi_class:\n", + " # Choose class with highest probability\n", + " return np.argmax(probs, axis=1)\n", + " else:\n", + " # Threshold at 0.5 for binary\n", + " return (probs >= 0.5).astype(int)" + ] + }, + { + "cell_type": "markdown", + "id": "24e84b29", + "metadata": { + "editable": true + }, + "source": [ + "The class implements the sigmoid and softmax internally. During fit(),\n", + "we check the number of classes: if more than 2, we set\n", + "self.multi_class=True and perform multinomial logistic regression. We\n", + "one-hot encode the target vector and update a weight matrix with\n", + "softmax probabilities. Otherwise, we do standard binary logistic\n", + "regression, converting labels to 0/1 if needed and updating a weight\n", + "vector. In both cases we use batch gradient descent on the\n", + "cross-entropy loss (we add a small epsilon 1e-15 to logs for numerical\n", + "stability). Progress (loss) can be printed if verbose=True." + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "7a73eca4", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# Evaluation Metrics\n", + "#We define helper functions for accuracy and cross-entropy loss. Accuracy is the fraction of correct predictions . 
For loss, we compute the appropriate cross-entropy:\n", + "\n", + "def accuracy_score(y_true, y_pred):\n", + " \"\"\"Accuracy = (# correct predictions) / (total samples).\"\"\"\n", + " y_true = np.array(y_true)\n", + " y_pred = np.array(y_pred)\n", + " return np.mean(y_true == y_pred)\n", + "\n", + "def binary_cross_entropy(y_true, y_prob):\n", + " \"\"\"\n", + " Binary cross-entropy loss.\n", + " y_true: true binary labels (0 or 1), y_prob: predicted probabilities for class 1.\n", + " \"\"\"\n", + " y_true = np.array(y_true)\n", + " y_prob = np.clip(np.array(y_prob), 1e-15, 1-1e-15)\n", + " return -np.mean(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))\n", + "\n", + "def categorical_cross_entropy(y_true, y_prob):\n", + " \"\"\"\n", + " Categorical cross-entropy loss for multiclass.\n", + " y_true: true labels (0...C-1), y_prob: array of predicted probabilities (n_samples x C).\n", + " \"\"\"\n", + " y_true = np.array(y_true, dtype=int)\n", + " y_prob = np.clip(np.array(y_prob), 1e-15, 1-1e-15)\n", + " # One-hot encode true labels\n", + " n_samples, n_classes = y_prob.shape\n", + " one_hot = np.zeros_like(y_prob)\n", + " one_hot[np.arange(n_samples), y_true] = 1\n", + " # Compute cross-entropy\n", + " loss_vec = -np.sum(one_hot * np.log(y_prob), axis=1)\n", + " return np.mean(loss_vec)" + ] + }, + { + "cell_type": "markdown", + "id": "40d4b30f", + "metadata": { + "editable": true + }, + "source": [ + "### Synthetic data generation\n", + "\n", + "Binary classification data: Create two Gaussian clusters in 2D. For example, class 0 around mean [-2,-2] and class 1 around [2,2].\n", + "Multiclass data: Create several Gaussian clusters (one per class) spread out in feature space." + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "ac0089bf", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import numpy as np\n", + "\n", + "def generate_binary_data(n_samples=100, n_features=2, random_state=None):\n", + " \"\"\"\n", + " Generate synthetic binary classification data.\n", + " Returns (X, y) where X is (n_samples x n_features), y in {0,1}.\n", + " \"\"\"\n", + " rng = np.random.RandomState(random_state)\n", + " # Half samples for class 0, half for class 1\n", + " n0 = n_samples // 2\n", + " n1 = n_samples - n0\n", + " # Class 0 around mean -2, class 1 around +2\n", + " mean0 = -2 * np.ones(n_features)\n", + " mean1 = 2 * np.ones(n_features)\n", + " X0 = rng.randn(n0, n_features) + mean0\n", + " X1 = rng.randn(n1, n_features) + mean1\n", + " X = np.vstack((X0, X1))\n", + " y = np.array([0]*n0 + [1]*n1)\n", + " return X, y\n", + "\n", + "def generate_multiclass_data(n_samples=150, n_features=2, n_classes=3, random_state=None):\n", + " \"\"\"\n", + " Generate synthetic multiclass data with n_classes Gaussian clusters.\n", + " \"\"\"\n", + " rng = np.random.RandomState(random_state)\n", + " X = []\n", + " y = []\n", + " samples_per_class = n_samples // n_classes\n", + " for cls in range(n_classes):\n", + " # Random cluster center for each class\n", + " center = rng.uniform(-5, 5, size=n_features)\n", + " Xi = rng.randn(samples_per_class, n_features) + center\n", + " yi = [cls] * samples_per_class\n", + " X.append(Xi)\n", + " y.extend(yi)\n", + " X = np.vstack(X)\n", + " y = np.array(y)\n", + " return X, y\n", + "\n", + "\n", + "# Generate and test on binary data\n", + "X_bin, y_bin = generate_binary_data(n_samples=200, n_features=2, random_state=42)\n", + "model_bin = LogisticRegression(lr=0.1, epochs=1000)\n", + 
"model_bin.fit(X_bin, y_bin)\n", + "y_prob_bin = model_bin.predict_prob(X_bin) # probabilities for class 1\n", + "y_pred_bin = model_bin.predict(X_bin) # predicted classes 0 or 1\n", + "\n", + "acc_bin = accuracy_score(y_bin, y_pred_bin)\n", + "loss_bin = binary_cross_entropy(y_bin, y_prob_bin)\n", + "print(f\"Binary Classification - Accuracy: {acc_bin:.2f}, Cross-Entropy Loss: {loss_bin:.2f}\")\n", + "#For multiclass:\n", + "# Generate and test on multiclass data\n", + "X_multi, y_multi = generate_multiclass_data(n_samples=300, n_features=2, n_classes=3, random_state=1)\n", + "model_multi = LogisticRegression(lr=0.1, epochs=1000)\n", + "model_multi.fit(X_multi, y_multi)\n", + "y_prob_multi = model_multi.predict_prob(X_multi) # (n_samples x 3) probabilities\n", + "y_pred_multi = model_multi.predict(X_multi) # predicted labels 0,1,2\n", + "\n", + "acc_multi = accuracy_score(y_multi, y_pred_multi)\n", + "loss_multi = categorical_cross_entropy(y_multi, y_prob_multi)\n", + "print(f\"Multiclass Classification - Accuracy: {acc_multi:.2f}, Cross-Entropy Loss: {loss_multi:.2f}\")\n", + "\n", + "# CSV Export\n", + "import csv\n", + "\n", + "# Export binary results\n", + "with open('binary_results.csv', mode='w', newline='') as f:\n", + " writer = csv.writer(f)\n", + " writer.writerow([\"TrueLabel\", \"PredictedLabel\"])\n", + " for true, pred in zip(y_bin, y_pred_bin):\n", + " writer.writerow([true, pred])\n", + "\n", + "# Export multiclass results\n", + "with open('multiclass_results.csv', mode='w', newline='') as f:\n", + " writer = csv.writer(f)\n", + " writer.writerow([\"TrueLabel\", \"PredictedLabel\"])\n", + " for true, pred in zip(y_multi, y_pred_multi):\n", + " writer.writerow([true, pred])" + ] + }, + { + "cell_type": "markdown", + "id": "1e9acef3", + "metadata": { + "editable": true + }, + "source": [ + "## Using **Scikit-learn**\n", + "\n", + "We show here how we can use a logistic regression case on a data set\n", + "included in _scikit_learn_, the so-called Wisconsin breast cancer data\n", + "using Logistic regression as our algorithm for classification. This is\n", + "a widely studied data set and can easily be included in demonstrations\n", + "of classification problems." + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "9153234a", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "%matplotlib inline\n", + "\n", + "import matplotlib.pyplot as plt\n", + "import numpy as np\n", + "from sklearn.model_selection import train_test_split \n", + "from sklearn.datasets import load_breast_cancer\n", + "from sklearn.linear_model import LogisticRegression\n", + "\n", + "# Load the data\n", + "cancer = load_breast_cancer()\n", + "\n", + "X_train, X_test, y_train, y_test = train_test_split(cancer.data,cancer.target,random_state=0)\n", + "print(X_train.shape)\n", + "print(X_test.shape)\n", + "# Logistic Regression\n", + "logreg = LogisticRegression(solver='lbfgs')\n", + "logreg.fit(X_train, y_train)\n", + "print(\"Test set accuracy with Logistic Regression: {:.2f}\".format(logreg.score(X_test,y_test)))" + ] + }, + { + "cell_type": "markdown", + "id": "908d547b", + "metadata": { + "editable": true + }, + "source": [ + "## Using the correlation matrix\n", + "\n", + "In addition to the above scores, we could also study the covariance (and the correlation matrix).\n", + "We use **Pandas** to compute the correlation matrix." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "8a46f4f3", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import matplotlib.pyplot as plt\n", + "import numpy as np\n", + "from sklearn.model_selection import train_test_split \n", + "from sklearn.datasets import load_breast_cancer\n", + "from sklearn.linear_model import LogisticRegression\n", + "cancer = load_breast_cancer()\n", + "import pandas as pd\n", + "# Making a data frame\n", + "cancerpd = pd.DataFrame(cancer.data, columns=cancer.feature_names)\n", + "\n", + "fig, axes = plt.subplots(15,2,figsize=(10,20))\n", + "malignant = cancer.data[cancer.target == 0]\n", + "benign = cancer.data[cancer.target == 1]\n", + "ax = axes.ravel()\n", + "\n", + "for i in range(30):\n", + " _, bins = np.histogram(cancer.data[:,i], bins =50)\n", + " ax[i].hist(malignant[:,i], bins = bins, alpha = 0.5)\n", + " ax[i].hist(benign[:,i], bins = bins, alpha = 0.5)\n", + " ax[i].set_title(cancer.feature_names[i])\n", + " ax[i].set_yticks(())\n", + "ax[0].set_xlabel(\"Feature magnitude\")\n", + "ax[0].set_ylabel(\"Frequency\")\n", + "ax[0].legend([\"Malignant\", \"Benign\"], loc =\"best\")\n", + "fig.tight_layout()\n", + "plt.show()\n", + "\n", + "import seaborn as sns\n", + "correlation_matrix = cancerpd.corr().round(1)\n", + "# use the heatmap function from seaborn to plot the correlation matrix\n", + "# annot = True to print the values inside the square\n", + "plt.figure(figsize=(15,8))\n", + "sns.heatmap(data=correlation_matrix, annot=True)\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "ba0275a7", + "metadata": { + "editable": true + }, + "source": [ + "## Discussing the correlation data\n", + "\n", + "In the above example we note two things. In the first plot we display\n", + "the overlap of benign and malignant tumors as functions of the various\n", + "features in the Wisconsin data set. We see that for\n", + "some of the features we can distinguish clearly the benign and\n", + "malignant cases while for other features we cannot. This can point to\n", + "us which features may be of greater interest when we wish to classify\n", + "a benign or not benign tumour.\n", + "\n", + "In the second figure we have computed the so-called correlation\n", + "matrix, which in our case with thirty features becomes a $30\\times 30$\n", + "matrix.\n", + "\n", + "We constructed this matrix using **pandas** via the statements" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "1af34f8e", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "cancerpd = pd.DataFrame(cancer.data, columns=cancer.feature_names)" + ] + }, + { + "cell_type": "markdown", + "id": "1eac30d3", + "metadata": { + "editable": true + }, + "source": [ + "and then" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "a0cdd9c9", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "correlation_matrix = cancerpd.corr().round(1)" + ] + }, + { + "cell_type": "markdown", + "id": "013777ad", + "metadata": { + "editable": true + }, + "source": [ + "Diagonalizing this matrix we can in turn say something about which\n", + "features are of relevance and which are not. This leads us to\n", + "the classical Principal Component Analysis (PCA) theorem with\n", + "applications. This will be discussed later this semester." 
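+    ,
+    "\n",
+    "A hedged sketch of this diagonalization step (anticipating the PCA discussion)\n",
+    "could look as follows; printing the five leading eigenvalues is an arbitrary\n",
+    "choice of ours:\n",
+    "\n",
+    "```python\n",
+    "import numpy as np\n",
+    "import pandas as pd\n",
+    "from sklearn.datasets import load_breast_cancer\n",
+    "\n",
+    "cancer = load_breast_cancer()\n",
+    "cancerpd = pd.DataFrame(cancer.data, columns=cancer.feature_names)\n",
+    "corr = cancerpd.corr().to_numpy()\n",
+    "\n",
+    "eigvals, eigvecs = np.linalg.eigh(corr)   # symmetric matrix, use eigh\n",
+    "eigvals = eigvals[::-1]                   # sort from largest to smallest\n",
+    "explained = eigvals / eigvals.sum()       # fraction of total (standardized) variance\n",
+    "print(np.round(explained[:5], 3))\n",
+    "```"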
+ ] + }, + { + "cell_type": "markdown", + "id": "410f90ac", + "metadata": { + "editable": true + }, + "source": [ + "## Other measures in classification studies" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "fa16a459", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import matplotlib.pyplot as plt\n", + "import numpy as np\n", + "from sklearn.model_selection import train_test_split \n", + "from sklearn.datasets import load_breast_cancer\n", + "from sklearn.linear_model import LogisticRegression\n", + "\n", + "# Load the data\n", + "cancer = load_breast_cancer()\n", + "\n", + "X_train, X_test, y_train, y_test = train_test_split(cancer.data,cancer.target,random_state=0)\n", + "print(X_train.shape)\n", + "print(X_test.shape)\n", + "# Logistic Regression\n", + "logreg = LogisticRegression(solver='lbfgs')\n", + "logreg.fit(X_train, y_train)\n", + "\n", + "from sklearn.preprocessing import LabelEncoder\n", + "from sklearn.model_selection import cross_validate\n", + "#Cross validation\n", + "accuracy = cross_validate(logreg,X_test,y_test,cv=10)['test_score']\n", + "print(accuracy)\n", + "print(\"Test set accuracy with Logistic Regression: {:.2f}\".format(logreg.score(X_test,y_test)))\n", + "\n", + "import scikitplot as skplt\n", + "y_pred = logreg.predict(X_test)\n", + "skplt.metrics.plot_confusion_matrix(y_test, y_pred, normalize=True)\n", + "plt.show()\n", + "y_probas = logreg.predict_proba(X_test)\n", + "skplt.metrics.plot_roc(y_test, y_probas)\n", + "plt.show()\n", + "skplt.metrics.plot_cumulative_gain(y_test, y_probas)\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "a721de53", + "metadata": { + "editable": true + }, + "source": [ + "## Introduction to Neural networks\n", + "\n", + "Artificial neural networks are computational systems that can learn to\n", + "perform tasks by considering examples, generally without being\n", + "programmed with any task-specific rules. It is supposed to mimic a\n", + "biological system, wherein neurons interact by sending signals in the\n", + "form of mathematical functions between layers. All layers can contain\n", + "an arbitrary number of neurons, and each connection is represented by\n", + "a weight variable." + ] + }, + { + "cell_type": "markdown", + "id": "68de5052", + "metadata": { + "editable": true + }, + "source": [ + "## Artificial neurons\n", + "\n", + "The field of artificial neural networks has a long history of\n", + "development, and is closely connected with the advancement of computer\n", + "science and computers in general. A model of artificial neurons was\n", + "first developed by McCulloch and Pitts in 1943 to study signal\n", + "processing in the brain and has later been refined by others. The\n", + "general idea is to mimic neural networks in the human brain, which is\n", + "composed of billions of neurons that communicate with each other by\n", + "sending electrical signals. Each neuron accumulates its incoming\n", + "signals, which must exceed an activation threshold to yield an\n", + "output. If the threshold is not overcome, the neuron remains inactive,\n", + "i.e. has zero output.\n", + "\n", + "This behaviour has inspired a simple mathematical model for an artificial neuron." + ] + }, + { + "cell_type": "markdown", + "id": "7685af02", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\begin{equation}\n", + " y = f\left(\sum_{i=1}^n w_ix_i\right) = f(u)\n", + "\label{artificialNeuron} \tag{1}\n", + "\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "3dfcfcb0", + "metadata": { + "editable": true + }, + "source": [ + "Here, the output $y$ of the neuron is the value of its activation function, which has as input\n", + "a weighted sum of the signals $x_1, \dots ,x_n$ received from $n$ other neurons.\n", + "\n", + "Conceptually, it is helpful to divide neural networks into four\n", + "categories:\n", + "1. general purpose neural networks for supervised learning,\n", + "\n", + "2. neural networks designed specifically for image processing, the most prominent example of this class being Convolutional Neural Networks (CNNs),\n", + "\n", + "3. neural networks for sequential data such as Recurrent Neural Networks (RNNs), and\n", + "\n", + "4. neural networks for unsupervised learning such as Deep Boltzmann Machines.\n", + "\n", + "In natural science, DNNs and CNNs have already found numerous\n", + "applications. In statistical physics, they have been applied to detect\n", + "phase transitions in 2D Ising and Potts models, lattice gauge\n", + "theories, and different phases of polymers, or solving the\n", + "Navier-Stokes equation in weather forecasting. Deep learning has also\n", + "found interesting applications in quantum physics. Various quantum\n", + "phase transitions can be detected and studied using DNNs and CNNs,\n", + "topological phases, and even non-equilibrium many-body\n", + "localization. Representing quantum states with DNNs for quantum state\n", + "tomography is among the impressive achievements that reveal the\n", + "potential of DNNs to facilitate the study of quantum systems.\n", + "\n", + "In quantum information theory, it has been shown that one can perform\n", + "gate decompositions with the help of neural networks.\n", + "\n", + "The applications are not limited to the natural sciences. There is a\n", + "plethora of applications in essentially all disciplines, from the\n", + "humanities to life science and medicine." + ] + }, + { + "cell_type": "markdown", + "id": "0d037ca7", + "metadata": { + "editable": true + }, + "source": [ + "## Neural network types\n", + "\n", + "An artificial neural network (ANN) is a computational model that\n", + "consists of layers of connected neurons, or nodes or units. We will\n", + "refer to these interchangeably as units or nodes, and sometimes as\n", + "neurons.\n", + "\n", + "It is supposed to mimic a biological nervous system by letting each\n", + "neuron interact with other neurons by sending signals in the form of\n", + "mathematical functions between layers. A wide variety of different\n", + "ANNs have been developed, but most of them consist of an input layer,\n", + "an output layer and possibly layers in-between, called *hidden\n", + "layers*. All layers can contain an arbitrary number of nodes, and each\n", + "connection between two nodes is associated with a weight variable.\n", + "\n", + "Neural networks (also called neural nets) are neural-inspired\n", + "nonlinear models for supervised learning. As we will see, neural nets\n", + "can be viewed as natural, more powerful extensions of supervised\n", + "learning methods such as linear and logistic regression and soft-max\n", + "methods we discussed earlier."
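+    ,
+    "\n",
+    "As a minimal sketch of the single artificial neuron in Eq. (1) above, on which\n",
+    "these networks are built (the sigmoid activation and the numbers below are our\n",
+    "own arbitrary choices):\n",
+    "\n",
+    "```python\n",
+    "import numpy as np\n",
+    "\n",
+    "def neuron(x, w, f=lambda u: 1.0 / (1.0 + np.exp(-u))):\n",
+    "    u = np.dot(w, x)    # weighted sum of the incoming signals\n",
+    "    return f(u)         # output y = f(u)\n",
+    "\n",
+    "x = np.array([0.5, -1.0, 2.0])    # example input signals\n",
+    "w = np.array([0.2, 0.4, -0.1])    # example weights\n",
+    "print(neuron(x, w))\n",
+    "```"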
+ ] + }, + { + "cell_type": "markdown", + "id": "7bcf7188", + "metadata": { + "editable": true + }, + "source": [ + "## Feed-forward neural networks\n", + "\n", + "The feed-forward neural network (FFNN) was the first and simplest type\n", + "of ANNs that were devised. In this network, the information moves in\n", + "only one direction: forward through the layers.\n", + "\n", + "Nodes are represented by circles, while the arrows display the\n", + "connections between the nodes, including the direction of information\n", + "flow. Additionally, each arrow corresponds to a weight variable\n", + "(figure to come). We observe that each node in a layer is connected\n", + "to *all* nodes in the subsequent layer, making this a so-called\n", + "*fully-connected* FFNN." + ] + }, + { + "cell_type": "markdown", + "id": "cd094e20", + "metadata": { + "editable": true + }, + "source": [ + "## Convolutional Neural Network\n", + "\n", + "A different variant of FFNNs are *convolutional neural networks*\n", + "(CNNs), which have a connectivity pattern inspired by the animal\n", + "visual cortex. Individual neurons in the visual cortex only respond to\n", + "stimuli from small sub-regions of the visual field, called a receptive\n", + "field. This makes the neurons well-suited to exploit the strong\n", + "spatially local correlation present in natural images. The response of\n", + "each neuron can be approximated mathematically as a convolution\n", + "operation. (figure to come)\n", + "\n", + "Convolutional neural networks emulate the behaviour of neurons in the\n", + "visual cortex by enforcing a *local* connectivity pattern between\n", + "nodes of adjacent layers: Each node in a convolutional layer is\n", + "connected only to a subset of the nodes in the previous layer, in\n", + "contrast to the fully-connected FFNN. Often, CNNs consist of several\n", + "convolutional layers that learn local features of the input, with a\n", + "fully-connected layer at the end, which gathers all the local data and\n", + "produces the outputs. They have wide applications in image and video\n", + "recognition." + ] + }, + { + "cell_type": "markdown", + "id": "ea99157e", + "metadata": { + "editable": true + }, + "source": [ + "## Recurrent neural networks\n", + "\n", + "So far we have only mentioned ANNs where information flows in one\n", + "direction: forward. *Recurrent neural networks* on the other hand,\n", + "have connections between nodes that form directed *cycles*. This\n", + "creates a form of internal memory which are able to capture\n", + "information on what has been calculated before; the output is\n", + "dependent on the previous computations. Recurrent NNs make use of\n", + "sequential information by performing the same task for every element\n", + "in a sequence, where each element depends on previous elements. An\n", + "example of such information is sentences, making recurrent NNs\n", + "especially well-suited for handwriting and speech recognition." + ] + }, + { + "cell_type": "markdown", + "id": "b73754c2", + "metadata": { + "editable": true + }, + "source": [ + "## Other types of networks\n", + "\n", + "There are many other kinds of ANNs that have been developed. One type\n", + "that is specifically designed for interpolation in multidimensional\n", + "space is the radial basis function (RBF) network. 
RBFs are typically\n", + "made up of three layers: an input layer, a hidden layer with\n", + "non-linear radial symmetric activation functions and a linear output\n", + "layer (''linear'' here means that each node in the output layer has a\n", + "linear activation function). The layers are normally fully-connected\n", + "and there are no cycles, thus RBFs can be viewed as a type of\n", + "fully-connected FFNN. They are however usually treated as a separate\n", + "type of NN due the unusual activation functions." + ] + }, + { + "cell_type": "markdown", + "id": "aa97c83d", + "metadata": { + "editable": true + }, + "source": [ + "## Multilayer perceptrons\n", + "\n", + "One uses often so-called fully-connected feed-forward neural networks\n", + "with three or more layers (an input layer, one or more hidden layers\n", + "and an output layer) consisting of neurons that have non-linear\n", + "activation functions.\n", + "\n", + "Such networks are often called *multilayer perceptrons* (MLPs)." + ] + }, + { + "cell_type": "markdown", + "id": "abe84919", + "metadata": { + "editable": true + }, + "source": [ + "## Why multilayer perceptrons?\n", + "\n", + "According to the *Universal approximation theorem*, a feed-forward\n", + "neural network with just a single hidden layer containing a finite\n", + "number of neurons can approximate a continuous multidimensional\n", + "function to arbitrary accuracy, assuming the activation function for\n", + "the hidden layer is a **non-constant, bounded and\n", + "monotonically-increasing continuous function**.\n", + "\n", + "Note that the requirements on the activation function only applies to\n", + "the hidden layer, the output nodes are always assumed to be linear, so\n", + "as to not restrict the range of output values." + ] + }, + { + "cell_type": "markdown", + "id": "d3ff207b", + "metadata": { + "editable": true + }, + "source": [ + "## Illustration of a single perceptron model and a multi-perceptron model\n", + "\n", + "\n", + "\n", + "\n", + "

    Figure 1: In a) we show a single perceptron model while in b) we display a network with two hidden layers, an input layer and an output layer.
    Figure 1: In a) we show a single perceptron model while in b) we display a network with two hidden layers, an input layer and an output layer.

    \n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "f982c11f", + "metadata": { + "editable": true + }, + "source": [ + "## Examples of XOR, OR and AND gates\n", + "\n", + "Let us first try to fit various gates using standard linear\n", + "regression. The gates we are thinking of are the classical XOR, OR and\n", + "AND gates, well-known elements in computer science. The tables here\n", + "show how we can set up the inputs $x_1$ and $x_2$ in order to yield a\n", + "specific target $y_i$." + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "id": "04a3e090", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "\"\"\"\n", + "Simple code that tests XOR, OR and AND gates with linear regression\n", + "\"\"\"\n", + "\n", + "import numpy as np\n", + "# Design matrix\n", + "X = np.array([ [1, 0, 0], [1, 0, 1], [1, 1, 0],[1, 1, 1]],dtype=np.float64)\n", + "print(f\"The X.TX matrix:{X.T @ X}\")\n", + "Xinv = np.linalg.pinv(X.T @ X)\n", + "print(f\"The invers of X.TX matrix:{Xinv}\")\n", + "\n", + "# The XOR gate \n", + "yXOR = np.array( [ 0, 1 ,1, 0])\n", + "ThetaXOR = Xinv @ X.T @ yXOR\n", + "print(f\"The values of theta for the XOR gate:{ThetaXOR}\")\n", + "print(f\"The linear regression prediction for the XOR gate:{X @ ThetaXOR}\")\n", + "\n", + "\n", + "# The OR gate \n", + "yOR = np.array( [ 0, 1 ,1, 1])\n", + "ThetaOR = Xinv @ X.T @ yOR\n", + "print(f\"The values of theta for the OR gate:{ThetaOR}\")\n", + "print(f\"The linear regression prediction for the OR gate:{X @ ThetaOR}\")\n", + "\n", + "\n", + "# The OR gate \n", + "yAND = np.array( [ 0, 0 ,0, 1])\n", + "ThetaAND = Xinv @ X.T @ yAND\n", + "print(f\"The values of theta for the AND gate:{ThetaAND}\")\n", + "print(f\"The linear regression prediction for the AND gate:{X @ ThetaAND}\")" + ] + }, + { + "cell_type": "markdown", + "id": "95b1f5a5", + "metadata": { + "editable": true + }, + "source": [ + "What is happening here?" + ] + }, + { + "cell_type": "markdown", + "id": "0d200eff", + "metadata": { + "editable": true + }, + "source": [ + "## Does Logistic Regression do a better Job?" 
+ ] + }, + { + "cell_type": "code", + "execution_count": 10, + "id": "040a69d0", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "\"\"\"\n", + "Simple code that tests XOR and OR gates with linear regression\n", + "and logistic regression\n", + "\"\"\"\n", + "\n", + "import matplotlib.pyplot as plt\n", + "from sklearn.linear_model import LogisticRegression\n", + "import numpy as np\n", + "\n", + "# Design matrix\n", + "X = np.array([ [1, 0, 0], [1, 0, 1], [1, 1, 0],[1, 1, 1]],dtype=np.float64)\n", + "print(f\"The X.TX matrix:{X.T @ X}\")\n", + "Xinv = np.linalg.pinv(X.T @ X)\n", + "print(f\"The invers of X.TX matrix:{Xinv}\")\n", + "\n", + "# The XOR gate \n", + "yXOR = np.array( [ 0, 1 ,1, 0])\n", + "ThetaXOR = Xinv @ X.T @ yXOR\n", + "print(f\"The values of theta for the XOR gate:{ThetaXOR}\")\n", + "print(f\"The linear regression prediction for the XOR gate:{X @ ThetaXOR}\")\n", + "\n", + "\n", + "# The OR gate \n", + "yOR = np.array( [ 0, 1 ,1, 1])\n", + "ThetaOR = Xinv @ X.T @ yOR\n", + "print(f\"The values of theta for the OR gate:{ThetaOR}\")\n", + "print(f\"The linear regression prediction for the OR gate:{X @ ThetaOR}\")\n", + "\n", + "\n", + "# The OR gate \n", + "yAND = np.array( [ 0, 0 ,0, 1])\n", + "ThetaAND = Xinv @ X.T @ yAND\n", + "print(f\"The values of theta for the AND gate:{ThetaAND}\")\n", + "print(f\"The linear regression prediction for the AND gate:{X @ ThetaAND}\")\n", + "\n", + "# Now we change to logistic regression\n", + "\n", + "\n", + "# Logistic Regression\n", + "logreg = LogisticRegression()\n", + "logreg.fit(X, yOR)\n", + "print(\"Test set accuracy with Logistic Regression for OR gate: {:.2f}\".format(logreg.score(X,yOR)))\n", + "\n", + "logreg.fit(X, yXOR)\n", + "print(\"Test set accuracy with Logistic Regression for XOR gate: {:.2f}\".format(logreg.score(X,yXOR)))\n", + "\n", + "\n", + "logreg.fit(X, yAND)\n", + "print(\"Test set accuracy with Logistic Regression for AND gate: {:.2f}\".format(logreg.score(X,yAND)))" + ] + }, + { + "cell_type": "markdown", + "id": "49f17f65", + "metadata": { + "editable": true + }, + "source": [ + "Not exactly impressive, but somewhat better." 
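Before turning to the scikit-learn neural-network example in the next section, the following sketch shows what we are after: a small multilayer perceptron fitted directly to the four rows of the XOR truth table. The hidden-layer size, activation, solver and random seed below are illustrative choices, not prescribed by the notes, and the outcome may depend on the random initialization.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# The four XOR input patterns; MLPClassifier handles the intercept internally,
# so no bias column is needed in the design matrix
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=np.float64)
yXOR = np.array([0, 1, 1, 0])

# One small hidden layer with a non-linear activation (illustrative hyperparameters)
ffnn = MLPClassifier(hidden_layer_sizes=(4,), activation='tanh',
                     solver='lbfgs', max_iter=10000, random_state=0)
ffnn.fit(X, yXOR)
print(f"Predictions on the XOR truth table: {ffnn.predict(X)}")
print(f"Accuracy on the XOR truth table: {ffnn.score(X, yXOR)}")
```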
+ ] + }, + { + "cell_type": "markdown", + "id": "714e0891", + "metadata": { + "editable": true + }, + "source": [ + "## Adding Neural Networks" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "id": "28bde670", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "\n", + "# and now neural networks with Scikit-Learn and the XOR\n", + "\n", + "from sklearn.neural_network import MLPClassifier\n", + "from sklearn.datasets import make_classification\n", + "X, yXOR = make_classification(n_samples=100, random_state=1)\n", + "FFNN = MLPClassifier(random_state=1, max_iter=300).fit(X, yXOR)\n", + "FFNN.predict_proba(X)\n", + "print(f\"Test set accuracy with Feed Forward Neural Network for XOR gate:{FFNN.score(X, yXOR)}\")" + ] + }, + { + "cell_type": "markdown", + "id": "4440856f", + "metadata": { + "editable": true + }, + "source": [ + "## Mathematical model\n", + "\n", + "The output $y$ is produced via the activation function $f$" + ] + }, + { + "cell_type": "markdown", + "id": "6199da92", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y = f\\left(\\sum_{i=1}^n w_ix_i + b_i\\right) = f(z),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "62c964e3", + "metadata": { + "editable": true + }, + "source": [ + "This function receives $x_i$ as inputs.\n", + "Here the activation $z=(\\sum_{i=1}^n w_ix_i+b_i)$. \n", + "In an FFNN of such neurons, the *inputs* $x_i$ are the *outputs* of\n", + "the neurons in the preceding layer. Furthermore, an MLP is\n", + "fully-connected, which means that each neuron receives a weighted sum\n", + "of the outputs of *all* neurons in the previous layer." + ] + }, + { + "cell_type": "markdown", + "id": "64ba4c70", + "metadata": { + "editable": true + }, + "source": [ + "## Mathematical model\n", + "\n", + "First, for each node $i$ in the first hidden layer, we calculate a weighted sum $z_i^1$ of the input coordinates $x_j$," + ] + }, + { + "cell_type": "markdown", + "id": "66c11135", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation} z_i^1 = \\sum_{j=1}^{M} w_{ij}^1 x_j + b_i^1\n", + "\\label{_auto1} \\tag{2}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "0f47b20a", + "metadata": { + "editable": true + }, + "source": [ + "Here $b_i$ is the so-called bias which is normally needed in\n", + "case of zero activation weights or inputs. How to fix the biases and\n", + "the weights will be discussed below. The value of $z_i^1$ is the\n", + "argument to the activation function $f_i$ of each node $i$, The\n", + "variable $M$ stands for all possible inputs to a given node $i$ in the\n", + "first layer. We define the output $y_i^1$ of all neurons in layer 1 as" + ] + }, + { + "cell_type": "markdown", + "id": "bda56156", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation}\n", + " y_i^1 = f(z_i^1) = f\\left(\\sum_{j=1}^M w_{ij}^1 x_j + b_i^1\\right)\n", + "\\label{outputLayer1} \\tag{3}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "1330fab9", + "metadata": { + "editable": true + }, + "source": [ + "where we assume that all nodes in the same layer have identical\n", + "activation functions, hence the notation $f$. In general, we could assume in the more general case that different layers have different activation functions.\n", + "In this case we would identify these functions with a superscript $l$ for the $l$-th layer," + ] + }, + { + "cell_type": "markdown", + "id": "ae474dfb", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation}\n", + " y_i^l = f^l(u_i^l) = f^l\\left(\\sum_{j=1}^{N_{l-1}} w_{ij}^l y_j^{l-1} + b_i^l\\right)\n", + "\\label{generalLayer} \\tag{4}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b6cb6fed", + "metadata": { + "editable": true + }, + "source": [ + "where $N_l$ is the number of nodes in layer $l$. When the output of\n", + "all the nodes in the first hidden layer are computed, the values of\n", + "the subsequent layer can be calculated and so forth until the output\n", + "is obtained." + ] + }, + { + "cell_type": "markdown", + "id": "2f8f9b4e", + "metadata": { + "editable": true + }, + "source": [ + "## Mathematical model\n", + "\n", + "The output of neuron $i$ in layer 2 is thus," + ] + }, + { + "cell_type": "markdown", + "id": "18e74238", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation}\n", + " y_i^2 = f^2\\left(\\sum_{j=1}^N w_{ij}^2 y_j^1 + b_i^2\\right) \n", + "\\label{_auto2} \\tag{5}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "d10df3e7", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation} \n", + " = f^2\\left[\\sum_{j=1}^N w_{ij}^2f^1\\left(\\sum_{k=1}^M w_{jk}^1 x_k + b_j^1\\right) + b_i^2\\right]\n", + "\\label{outputLayer2} \\tag{6}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "da21a316", + "metadata": { + "editable": true + }, + "source": [ + "where we have substituted $y_k^1$ with the inputs $x_k$. Finally, the ANN output reads" + ] + }, + { + "cell_type": "markdown", + "id": "76938a28", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation}\n", + " y_i^3 = f^3\\left(\\sum_{j=1}^N w_{ij}^3 y_j^2 + b_i^3\\right) \n", + "\\label{_auto3} \\tag{7}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "65434967", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation} \n", + " = f_3\\left[\\sum_{j} w_{ij}^3 f^2\\left(\\sum_{k} w_{jk}^2 f^1\\left(\\sum_{m} w_{km}^1 x_m + b_k^1\\right) + b_j^2\\right)\n", + " + b_1^3\\right]\n", + "\\label{_auto4} \\tag{8}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "31d4f5aa", + "metadata": { + "editable": true + }, + "source": [ + "## Mathematical model\n", + "\n", + "We can generalize this expression to an MLP with $l$ hidden\n", + "layers. The complete functional form is," + ] + }, + { + "cell_type": "markdown", + "id": "114030e5", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation}\n", + "y^{l+1}_i = f^{l+1}\\left[\\!\\sum_{j=1}^{N_l} w_{ij}^3 f^l\\left(\\sum_{k=1}^{N_{l-1}}w_{jk}^{l-1}\\left(\\dots f^1\\left(\\sum_{n=1}^{N_0} w_{mn}^1 x_n+ b_m^1\\right)\\dots\\right)+b_k^2\\right)+b_1^3\\right] \n", + "\\label{completeNN} \\tag{9}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "a93aec4e", + "metadata": { + "editable": true + }, + "source": [ + "which illustrates a basic property of MLPs: The only independent\n", + "variables are the input values $x_n$." + ] + }, + { + "cell_type": "markdown", + "id": "7c85562d", + "metadata": { + "editable": true + }, + "source": [ + "## Mathematical model\n", + "\n", + "This confirms that an MLP, despite its quite convoluted mathematical\n", + "form, is nothing more than an analytic function, specifically a\n", + "mapping of real-valued vectors $\\hat{x} \\in \\mathbb{R}^n \\rightarrow\n", + "\\hat{y} \\in \\mathbb{R}^m$.\n", + "\n", + "Furthermore, the flexibility and universality of an MLP can be\n", + "illustrated by realizing that the expression is essentially a nested\n", + "sum of scaled activation functions of the form" + ] + }, + { + "cell_type": "markdown", + "id": "1152ea5e", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation}\n", + " f(x) = c_1 f(c_2 x + c_3) + c_4\n", + "\\label{_auto5} \\tag{10}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "4f3d4b33", + "metadata": { + "editable": true + }, + "source": [ + "where the parameters $c_i$ are weights and biases. By adjusting these\n", + "parameters, the activation functions can be shifted up and down or\n", + "left and right, change slope or be rescaled which is the key to the\n", + "flexibility of a neural network." + ] + }, + { + "cell_type": "markdown", + "id": "4c1ac54e", + "metadata": { + "editable": true + }, + "source": [ + "### Matrix-vector notation\n", + "\n", + "We can introduce a more convenient notation for the activations in an A NN. \n", + "\n", + "Additionally, we can represent the biases and activations\n", + "as layer-wise column vectors $\\hat{b}_l$ and $\\hat{y}_l$, so that the $i$-th element of each vector \n", + "is the bias $b_i^l$ and activation $y_i^l$ of node $i$ in layer $l$ respectively. \n", + "\n", + "We have that $\\mathrm{W}_l$ is an $N_{l-1} \\times N_l$ matrix, while $\\hat{b}_l$ and $\\hat{y}_l$ are $N_l \\times 1$ column vectors. \n", + "With this notation, the sum becomes a matrix-vector multiplication, and we can write\n", + "the equation for the activations of hidden layer 2 (assuming three nodes for simplicity) as" + ] + }, + { + "cell_type": "markdown", + "id": "5c4a861f", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation}\n", + " \\hat{y}_2 = f_2(\\mathrm{W}_2 \\hat{y}_{1} + \\hat{b}_{2}) = \n", + " f_2\\left(\\left[\\begin{array}{ccc}\n", + " w^2_{11} &w^2_{12} &w^2_{13} \\\\\n", + " w^2_{21} &w^2_{22} &w^2_{23} \\\\\n", + " w^2_{31} &w^2_{32} &w^2_{33} \\\\\n", + " \\end{array} \\right] \\cdot\n", + " \\left[\\begin{array}{c}\n", + " y^1_1 \\\\\n", + " y^1_2 \\\\\n", + " y^1_3 \\\\\n", + " \\end{array}\\right] + \n", + " \\left[\\begin{array}{c}\n", + " b^2_1 \\\\\n", + " b^2_2 \\\\\n", + " b^2_3 \\\\\n", + " \\end{array}\\right]\\right).\n", + "\\label{_auto6} \\tag{11}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "276b271b", + "metadata": { + "editable": true + }, + "source": [ + "### Matrix-vector notation and activation\n", + "\n", + "The activation of node $i$ in layer 2 is" + ] + }, + { + "cell_type": "markdown", + "id": "63a5b8f1", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation}\n", + " y^2_i = f_2\\Bigr(w^2_{i1}y^1_1 + w^2_{i2}y^1_2 + w^2_{i3}y^1_3 + b^2_i\\Bigr) = \n", + " f_2\\left(\\sum_{j=1}^3 w^2_{ij} y_j^1 + b^2_i\\right).\n", + "\\label{_auto7} \\tag{12}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "316b8c32", + "metadata": { + "editable": true + }, + "source": [ + "This is not just a convenient and compact notation, but also a useful\n", + "and intuitive way to think about MLPs: The output is calculated by a\n", + "series of matrix-vector multiplications and vector additions that are\n", + "used as input to the activation functions. For each operation\n", + "$\\mathrm{W}_l \\hat{y}_{l-1}$ we move forward one layer." + ] + }, + { + "cell_type": "markdown", + "id": "34ba90c8", + "metadata": { + "editable": true + }, + "source": [ + "### Activation functions\n", + "\n", + "A property that characterizes a neural network, other than its\n", + "connectivity, is the choice of activation function(s). As described\n", + "in, the following restrictions are imposed on an activation function\n", + "for a FFNN to fulfill the universal approximation theorem\n", + "\n", + " * Non-constant\n", + "\n", + " * Bounded\n", + "\n", + " * Monotonically-increasing\n", + "\n", + " * Continuous" + ] + }, + { + "cell_type": "markdown", + "id": "3019fcaf", + "metadata": { + "editable": true + }, + "source": [ + "### Activation functions, Logistic and Hyperbolic ones\n", + "\n", + "The second requirement excludes all linear functions. Furthermore, in\n", + "a MLP with only linear activation functions, each layer simply\n", + "performs a linear transformation of its inputs.\n", + "\n", + "Regardless of the number of layers, the output of the NN will be\n", + "nothing but a linear function of the inputs. Thus we need to introduce\n", + "some kind of non-linearity to the NN to be able to fit non-linear\n", + "functions Typical examples are the logistic *Sigmoid*" + ] + }, + { + "cell_type": "markdown", + "id": "389ff36b", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "f(x) = \\frac{1}{1 + e^{-x}},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "ee9b399a", + "metadata": { + "editable": true + }, + "source": [ + "and the *hyperbolic tangent* function" + ] + }, + { + "cell_type": "markdown", + "id": "36f98b26", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "f(x) = \\tanh(x)\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "cb7b8839", + "metadata": { + "editable": true + }, + "source": [ + "### Relevance\n", + "\n", + "The *sigmoid* function are more biologically plausible because the\n", + "output of inactive neurons are zero. Such activation function are\n", + "called *one-sided*. However, it has been shown that the hyperbolic\n", + "tangent performs better than the sigmoid for training MLPs. 
has\n", + "become the most popular for *deep neural networks*" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "id": "db8d28b5", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "\"\"\"The sigmoid function (or the logistic curve) is a \n", + "function that takes any real number, z, and outputs a number (0,1).\n", + "It is useful in neural networks for assigning weights on a relative scale.\n", + "The value z is the weighted sum of parameters involved in the learning algorithm.\"\"\"\n", + "\n", + "import numpy\n", + "import matplotlib.pyplot as plt\n", + "import math as mt\n", + "\n", + "z = numpy.arange(-5, 5, .1)\n", + "sigma_fn = numpy.vectorize(lambda z: 1/(1+numpy.exp(-z)))\n", + "sigma = sigma_fn(z)\n", + "\n", + "fig = plt.figure()\n", + "ax = fig.add_subplot(111)\n", + "ax.plot(z, sigma)\n", + "ax.set_ylim([-0.1, 1.1])\n", + "ax.set_xlim([-5,5])\n", + "ax.grid(True)\n", + "ax.set_xlabel('z')\n", + "ax.set_title('sigmoid function')\n", + "\n", + "plt.show()\n", + "\n", + "\"\"\"Step Function\"\"\"\n", + "z = numpy.arange(-5, 5, .02)\n", + "step_fn = numpy.vectorize(lambda z: 1.0 if z >= 0.0 else 0.0)\n", + "step = step_fn(z)\n", + "\n", + "fig = plt.figure()\n", + "ax = fig.add_subplot(111)\n", + "ax.plot(z, step)\n", + "ax.set_ylim([-0.5, 1.5])\n", + "ax.set_xlim([-5,5])\n", + "ax.grid(True)\n", + "ax.set_xlabel('z')\n", + "ax.set_title('step function')\n", + "\n", + "plt.show()\n", + "\n", + "\"\"\"Sine Function\"\"\"\n", + "z = numpy.arange(-2*mt.pi, 2*mt.pi, 0.1)\n", + "t = numpy.sin(z)\n", + "\n", + "fig = plt.figure()\n", + "ax = fig.add_subplot(111)\n", + "ax.plot(z, t)\n", + "ax.set_ylim([-1.0, 1.0])\n", + "ax.set_xlim([-2*mt.pi,2*mt.pi])\n", + "ax.grid(True)\n", + "ax.set_xlabel('z')\n", + "ax.set_title('sine function')\n", + "\n", + "plt.show()\n", + "\n", + "\"\"\"Plots a graph of the squashing function used by a rectified linear\n", + "unit\"\"\"\n", + "z = numpy.arange(-2, 2, .1)\n", + "zero = numpy.zeros(len(z))\n", + "y = numpy.max([zero, z], axis=0)\n", + "\n", + "fig = plt.figure()\n", + "ax = fig.add_subplot(111)\n", + "ax.plot(z, y)\n", + "ax.set_ylim([-2.0, 2.0])\n", + "ax.set_xlim([-2.0, 2.0])\n", + "ax.grid(True)\n", + "ax.set_xlabel('z')\n", + "ax.set_title('Rectified linear unit')\n", + "\n", + "plt.show()" + ] + } + ], + "metadata": {}, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/doc/LectureNotes/week41.ipynb b/doc/LectureNotes/week41.ipynb new file mode 100644 index 000000000..c9b1adcdd --- /dev/null +++ b/doc/LectureNotes/week41.ipynb @@ -0,0 +1,3820 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "b625bb28", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "679109d4", + "metadata": { + "editable": true + }, + "source": [ + "# Week 41 Neural networks and constructing a neural network code\n", + "**Morten Hjorth-Jensen**, Department of Physics, University of Oslo, Norway\n", + "\n", + "Date: **Week 41**" + ] + }, + { + "cell_type": "markdown", + "id": "d7401ab9", + "metadata": { + "editable": true + }, + "source": [ + "## Plan for week 41, October 6-10" + ] + }, + { + "cell_type": "markdown", + "id": "f47e1c5c", + "metadata": { + "editable": true + }, + "source": [ + "## Material for the lecture on Monday October 6, 2025\n", + "1. Neural Networks, setting up the basic steps, from the simple perceptron model to the multi-layer perceptron model.\n", + "\n", + "2. 
Building our own Feed-forward Neural Network, getting started\n", + "\n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "af0a9895", + "metadata": { + "editable": true + }, + "source": [ + "## Readings and Videos:\n", + "1. These lecture notes\n", + "\n", + "2. For neural networks we recommend Goodfellow et al chapters 6 and 7.\n", + "\n", + "3. Rashkca et al., chapter 11, jupyter-notebook sent separately, from [GitHub](https://github.com/rasbt/machine-learning-book)\n", + "\n", + "4. Neural Networks demystified at \n", + "\n", + "5. Building Neural Networks from scratch at \n", + "\n", + "6. Video on Neural Networks at \n", + "\n", + "7. Video on the back propagation algorithm at \n", + "\n", + "8. We also recommend Michael Nielsen's intuitive approach to the neural networks and the universal approximation theorem, see the slides at ." + ] + }, + { + "cell_type": "markdown", + "id": "be1e5c03", + "metadata": { + "editable": true + }, + "source": [ + "## Mathematics of deep learning\n", + "\n", + "**Two recent books online.**\n", + "\n", + "1. [The Modern Mathematics of Deep Learning, by Julius Berner, Philipp Grohs, Gitta Kutyniok, Philipp Petersen](https://arxiv.org/abs/2105.04026), published as [Mathematical Aspects of Deep Learning, pp. 1-111. Cambridge University Press, 2022](https://doi.org/10.1017/9781009025096.002)\n", + "\n", + "2. [Mathematical Introduction to Deep Learning: Methods, Implementations, and Theory, Arnulf Jentzen, Benno Kuckuck, Philippe von Wurstemberger](https://doi.org/10.48550/arXiv.2310.20360)" + ] + }, + { + "cell_type": "markdown", + "id": "52520e8f", + "metadata": { + "editable": true + }, + "source": [ + "## Reminder on books with hands-on material and codes\n", + "[Sebastian Rashcka et al, Machine learning with Sickit-Learn and PyTorch](https://sebastianraschka.com/blog/2022/ml-pytorch-book.html)" + ] + }, + { + "cell_type": "markdown", + "id": "408a0487", + "metadata": { + "editable": true + }, + "source": [ + "## Lab sessions on Tuesday and Wednesday\n", + "\n", + "Aim: Getting started with coding neural network. The exercises this\n", + "week aim at setting up the feed-forward part of a neural network." + ] + }, + { + "cell_type": "markdown", + "id": "23056baf", + "metadata": { + "editable": true + }, + "source": [ + "## Lecture Monday October 6" + ] + }, + { + "cell_type": "markdown", + "id": "56a2f2f2", + "metadata": { + "editable": true + }, + "source": [ + "## Introduction to Neural networks\n", + "\n", + "Artificial neural networks are computational systems that can learn to\n", + "perform tasks by considering examples, generally without being\n", + "programmed with any task-specific rules. It is supposed to mimic a\n", + "biological system, wherein neurons interact by sending signals in the\n", + "form of mathematical functions between layers. All layers can contain\n", + "an arbitrary number of neurons, and each connection is represented by\n", + "a weight variable." + ] + }, + { + "cell_type": "markdown", + "id": "2e3fa93d", + "metadata": { + "editable": true + }, + "source": [ + "## Artificial neurons\n", + "\n", + "The field of artificial neural networks has a long history of\n", + "development, and is closely connected with the advancement of computer\n", + "science and computers in general. A model of artificial neurons was\n", + "first developed by McCulloch and Pitts in 1943 to study signal\n", + "processing in the brain and has later been refined by others. 
The\n", + "general idea is to mimic neural networks in the human brain, which is\n", + "composed of billions of neurons that communicate with each other by\n", + "sending electrical signals. Each neuron accumulates its incoming\n", + "signals, which must exceed an activation threshold to yield an\n", + "output. If the threshold is not overcome, the neuron remains inactive,\n", + "i.e. has zero output.\n", + "\n", + "This behaviour has inspired a simple mathematical model for an artificial neuron." + ] + }, + { + "cell_type": "markdown", + "id": "0afafe3e", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation}\n", + " y = f\\left(\\sum_{i=1}^n w_ix_i\\right) = f(u)\n", + "\\label{artificialNeuron} \\tag{1}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "bc113056", + "metadata": { + "editable": true + }, + "source": [ + "Here, the output $y$ of the neuron is the value of its activation function, which have as input\n", + "a weighted sum of signals $x_i, \\dots ,x_n$ received by $n$ other neurons.\n", + "\n", + "Conceptually, it is helpful to divide neural networks into four\n", + "categories:\n", + "1. general purpose neural networks for supervised learning,\n", + "\n", + "2. neural networks designed specifically for image processing, the most prominent example of this class being Convolutional Neural Networks (CNNs),\n", + "\n", + "3. neural networks for sequential data such as Recurrent Neural Networks (RNNs), and\n", + "\n", + "4. neural networks for unsupervised learning such as Deep Boltzmann Machines.\n", + "\n", + "In natural science, DNNs and CNNs have already found numerous\n", + "applications. In statistical physics, they have been applied to detect\n", + "phase transitions in 2D Ising and Potts models, lattice gauge\n", + "theories, and different phases of polymers, or solving the\n", + "Navier-Stokes equation in weather forecasting. Deep learning has also\n", + "found interesting applications in quantum physics. Various quantum\n", + "phase transitions can be detected and studied using DNNs and CNNs,\n", + "topological phases, and even non-equilibrium many-body\n", + "localization. Representing quantum states as DNNs quantum state\n", + "tomography are among some of the impressive achievements to reveal the\n", + "potential of DNNs to facilitate the study of quantum systems.\n", + "\n", + "In quantum information theory, it has been shown that one can perform\n", + "gate decompositions with the help of neural. \n", + "\n", + "The applications are not limited to the natural sciences. There is a\n", + "plethora of applications in essentially all disciplines, from the\n", + "humanities to life science and medicine." + ] + }, + { + "cell_type": "markdown", + "id": "872c3321", + "metadata": { + "editable": true + }, + "source": [ + "## Neural network types\n", + "\n", + "An artificial neural network (ANN), is a computational model that\n", + "consists of layers of connected neurons, or nodes or units. We will\n", + "refer to these interchangeably as units or nodes, and sometimes as\n", + "neurons.\n", + "\n", + "It is supposed to mimic a biological nervous system by letting each\n", + "neuron interact with other neurons by sending signals in the form of\n", + "mathematical functions between layers. A wide variety of different\n", + "ANNs have been developed, but most of them consist of an input layer,\n", + "an output layer and eventual layers in-between, called *hidden\n", + "layers*. All layers can contain an arbitrary number of nodes, and each\n", + "connection between two nodes is associated with a weight variable.\n", + "\n", + "Neural networks (also called neural nets) are neural-inspired\n", + "nonlinear models for supervised learning. As we will see, neural nets\n", + "can be viewed as natural, more powerful extensions of supervised\n", + "learning methods such as linear and logistic regression and soft-max\n", + "methods we discussed earlier." 
+ ] + }, + { + "cell_type": "markdown", + "id": "53edae74", + "metadata": { + "editable": true + }, + "source": [ + "## Feed-forward neural networks\n", + "\n", + "The feed-forward neural network (FFNN) was the first and simplest type\n", + "of ANNs that were devised. In this network, the information moves in\n", + "only one direction: forward through the layers.\n", + "\n", + "Nodes are represented by circles, while the arrows display the\n", + "connections between the nodes, including the direction of information\n", + "flow. Additionally, each arrow corresponds to a weight variable\n", + "(figure to come). We observe that each node in a layer is connected\n", + "to *all* nodes in the subsequent layer, making this a so-called\n", + "*fully-connected* FFNN." + ] + }, + { + "cell_type": "markdown", + "id": "0eef36d6", + "metadata": { + "editable": true + }, + "source": [ + "## Convolutional Neural Network\n", + "\n", + "A different variant of FFNNs are *convolutional neural networks*\n", + "(CNNs), which have a connectivity pattern inspired by the animal\n", + "visual cortex. Individual neurons in the visual cortex only respond to\n", + "stimuli from small sub-regions of the visual field, called a receptive\n", + "field. This makes the neurons well-suited to exploit the strong\n", + "spatially local correlation present in natural images. The response of\n", + "each neuron can be approximated mathematically as a convolution\n", + "operation. (figure to come)\n", + "\n", + "Convolutional neural networks emulate the behaviour of neurons in the\n", + "visual cortex by enforcing a *local* connectivity pattern between\n", + "nodes of adjacent layers: Each node in a convolutional layer is\n", + "connected only to a subset of the nodes in the previous layer, in\n", + "contrast to the fully-connected FFNN. Often, CNNs consist of several\n", + "convolutional layers that learn local features of the input, with a\n", + "fully-connected layer at the end, which gathers all the local data and\n", + "produces the outputs. They have wide applications in image and video\n", + "recognition." + ] + }, + { + "cell_type": "markdown", + "id": "bf602451", + "metadata": { + "editable": true + }, + "source": [ + "## Recurrent neural networks\n", + "\n", + "So far we have only mentioned ANNs where information flows in one\n", + "direction: forward. *Recurrent neural networks* on the other hand,\n", + "have connections between nodes that form directed *cycles*. This\n", + "creates a form of internal memory which are able to capture\n", + "information on what has been calculated before; the output is\n", + "dependent on the previous computations. Recurrent NNs make use of\n", + "sequential information by performing the same task for every element\n", + "in a sequence, where each element depends on previous elements. An\n", + "example of such information is sentences, making recurrent NNs\n", + "especially well-suited for handwriting and speech recognition." + ] + }, + { + "cell_type": "markdown", + "id": "0afbe2d0", + "metadata": { + "editable": true + }, + "source": [ + "## Other types of networks\n", + "\n", + "There are many other kinds of ANNs that have been developed. One type\n", + "that is specifically designed for interpolation in multidimensional\n", + "space is the radial basis function (RBF) network. 
RBFs are typically\n", + "made up of three layers: an input layer, a hidden layer with\n", + "non-linear radial symmetric activation functions and a linear output\n", + "layer (''linear'' here means that each node in the output layer has a\n", + "linear activation function). The layers are normally fully-connected\n", + "and there are no cycles, thus RBFs can be viewed as a type of\n", + "fully-connected FFNN. They are however usually treated as a separate\n", + "type of NN due the unusual activation functions." + ] + }, + { + "cell_type": "markdown", + "id": "d957cfe8", + "metadata": { + "editable": true + }, + "source": [ + "## Multilayer perceptrons\n", + "\n", + "One uses often so-called fully-connected feed-forward neural networks\n", + "with three or more layers (an input layer, one or more hidden layers\n", + "and an output layer) consisting of neurons that have non-linear\n", + "activation functions.\n", + "\n", + "Such networks are often called *multilayer perceptrons* (MLPs)." + ] + }, + { + "cell_type": "markdown", + "id": "57b218ab", + "metadata": { + "editable": true + }, + "source": [ + "## Why multilayer perceptrons?\n", + "\n", + "According to the *Universal approximation theorem*, a feed-forward\n", + "neural network with just a single hidden layer containing a finite\n", + "number of neurons can approximate a continuous multidimensional\n", + "function to arbitrary accuracy, assuming the activation function for\n", + "the hidden layer is a **non-constant, bounded and\n", + "monotonically-increasing continuous function**.\n", + "\n", + "Note that the requirements on the activation function only applies to\n", + "the hidden layer, the output nodes are always assumed to be linear, so\n", + "as to not restrict the range of output values." + ] + }, + { + "cell_type": "markdown", + "id": "6bda8dda", + "metadata": { + "editable": true + }, + "source": [ + "## Illustration of a single perceptron model and a multi-perceptron model\n", + "\n", + "\n", + "\n", + "\n", + "

    Figure 1: In a) we show a single perceptron model while in b) we display a network with two hidden layers, an input layer and an output layer.

    \n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "f7d514be", + "metadata": { + "editable": true + }, + "source": [ + "## Mathematics of deep learning and neural networks\n", + "\n", + "Neural networks, in its so-called feed-forward form, where each\n", + "iterations contains a feed-forward stage and a back-propgagation\n", + "stage, consist of series of affine matrix-matrix and matrix-vector\n", + "multiplications. The unknown parameters (the so-called biases and\n", + "weights which deternine the architecture of a neural network), are\n", + "uptaded iteratively using the so-called back-propagation algorithm.\n", + "This algorithm corresponds to the so-called reverse mode of \n", + "automatic differentation." + ] + }, + { + "cell_type": "markdown", + "id": "02ed299b", + "metadata": { + "editable": true + }, + "source": [ + "## Basics of an NN\n", + "\n", + "A neural network consists of a series of hidden layers, in addition to\n", + "the input and output layers. Each layer $l$ has a set of parameters\n", + "$\\boldsymbol{\\Theta}^{(l)}=(\\boldsymbol{W}^{(l)},\\boldsymbol{b}^{(l)})$ which are related to the\n", + "parameters in other layers through a series of affine transformations,\n", + "for a standard NN these are matrix-matrix and matrix-vector\n", + "multiplications. For all layers we will simply use a collective variable $\\boldsymbol{\\Theta}$.\n", + "\n", + "It consist of two basic steps:\n", + "1. a feed forward stage which takes a given input and produces a final output which is compared with the target values through our cost/loss function.\n", + "\n", + "2. a back-propagation state where the unknown parameters $\\boldsymbol{\\Theta}$ are updated through the optimization of the their gradients. The expressions for the gradients are obtained via the chain rule, starting from the derivative of the cost/function.\n", + "\n", + "These two steps make up one iteration. This iterative process is continued till we reach an eventual stopping criterion." + ] + }, + { + "cell_type": "markdown", + "id": "96b8c13c", + "metadata": { + "editable": true + }, + "source": [ + "## Overarching view of a neural network\n", + "\n", + "The architecture of a neural network defines our model. This model\n", + "aims at describing some function $f(\\boldsymbol{x}$ which represents\n", + "some final result (outputs or tagrget values) given a specific inpput\n", + "$\\boldsymbol{x}$. Note that here $\\boldsymbol{y}$ and $\\boldsymbol{x}$ are not limited to be\n", + "vectors.\n", + "\n", + "The architecture consists of\n", + "1. An input and an output layer where the input layer is defined by the inputs $\\boldsymbol{x}$. The output layer produces the model ouput $\\boldsymbol{\\tilde{y}}$ which is compared with the target value $\\boldsymbol{y}$\n", + "\n", + "2. A given number of hidden layers and neurons/nodes/units for each layer (this may vary)\n", + "\n", + "3. A given activation function $\\sigma(\\boldsymbol{z})$ with arguments $\\boldsymbol{z}$ to be defined below. The activation functions may differ from layer to layer.\n", + "\n", + "4. The last layer, normally called **output** layer has normally an activation function tailored to the specific problem\n", + "\n", + "5. Finally we define a so-called cost or loss function which is used to gauge the quality of our model." 
+ ] + }, + { + "cell_type": "markdown", + "id": "089704bf", + "metadata": { + "editable": true + }, + "source": [ + "## The optimization problem\n", + "\n", + "The cost function is a function of the unknown parameters\n", + "$\\boldsymbol{\\Theta}$ where the latter is a container for all possible\n", + "parameters needed to define a neural network\n", + "\n", + "If we are dealing with a regression task a typical cost/loss function\n", + "is the mean squared error" + ] + }, + { + "cell_type": "markdown", + "id": "91ef7170", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C(\\boldsymbol{\\Theta})=\\frac{1}{n}\\left\\{\\left(\\boldsymbol{y}-\\boldsymbol{X}\\boldsymbol{\\theta}\\right)^T\\left(\\boldsymbol{y}-\\boldsymbol{X}\\boldsymbol{\\theta}\\right)\\right\\}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "c9402737", + "metadata": { + "editable": true + }, + "source": [ + "This function represents one of many possible ways to define\n", + "the so-called cost function. Note that here we have assumed a linear dependence in terms of the paramters $\\boldsymbol{\\Theta}$. This is in general not the case." + ] + }, + { + "cell_type": "markdown", + "id": "09940e05", + "metadata": { + "editable": true + }, + "source": [ + "## Parameters of neural networks\n", + "For neural networks the parameters\n", + "$\\boldsymbol{\\Theta}$ are given by the so-called weights and biases (to be\n", + "defined below).\n", + "\n", + "The weights are given by matrix elements $w_{ij}^{(l)}$ where the\n", + "superscript indicates the layer number. The biases are typically given\n", + "by vector elements representing each single node of a given layer,\n", + "that is $b_j^{(l)}$." + ] + }, + { + "cell_type": "markdown", + "id": "2bd7b3ff", + "metadata": { + "editable": true + }, + "source": [ + "## Other ingredients of a neural network\n", + "\n", + "Having defined the architecture of a neural network, the optimization\n", + "of the cost function with respect to the parameters $\\boldsymbol{\\Theta}$,\n", + "involves the calculations of gradients and their optimization. The\n", + "gradients represent the derivatives of a multidimensional object and\n", + "are often approximated by various gradient methods, including\n", + "1. various quasi-Newton methods,\n", + "\n", + "2. plain gradient descent (GD) with a constant learning rate $\\eta$,\n", + "\n", + "3. GD with momentum and other approximations to the learning rates such as\n", + "\n", + " * Adapative gradient (ADAgrad)\n", + "\n", + " * Root mean-square propagation (RMSprop)\n", + "\n", + " * Adaptive gradient with momentum (ADAM) and many other\n", + "\n", + "4. Stochastic gradient descent and various families of learning rate approximations" + ] + }, + { + "cell_type": "markdown", + "id": "1a771f02", + "metadata": { + "editable": true + }, + "source": [ + "## Other parameters\n", + "\n", + "In addition to the above, there are often additional hyperparamaters\n", + "which are included in the setup of a neural network. These will be\n", + "discussed below." + ] + }, + { + "cell_type": "markdown", + "id": "3291a232", + "metadata": { + "editable": true + }, + "source": [ + "## Universal approximation theorem\n", + "\n", + "The universal approximation theorem plays a central role in deep\n", + "learning. 
[Cybenko (1989)](https://link.springer.com/article/10.1007/BF02551274) showed\n", + "the following:\n", + "\n", + "Let $\\sigma$ be any continuous sigmoidal function such that" + ] + }, + { + "cell_type": "markdown", + "id": "74cc209d", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\sigma(z) = \\left\\{\\begin{array}{cc} 1 & z\\rightarrow \\infty\\\\ 0 & z \\rightarrow -\\infty \\end{array}\\right.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "fe210f2f", + "metadata": { + "editable": true + }, + "source": [ + "Given a continuous and deterministic function $F(\\boldsymbol{x})$ on the unit\n", + "cube in $d$-dimensions $F\\in [0,1]^d$, $x\\in [0,1]^d$ and a parameter\n", + "$\\epsilon >0$, there is a one-layer (hidden) neural network\n", + "$f(\\boldsymbol{x};\\boldsymbol{\\Theta})$ with $\\boldsymbol{\\Theta}=(\\boldsymbol{W},\\boldsymbol{b})$ and $\\boldsymbol{W}\\in\n", + "\\mathbb{R}^{m\\times n}$ and $\\boldsymbol{b}\\in \\mathbb{R}^{n}$, for which" + ] + }, + { + "cell_type": "markdown", + "id": "4dfec9c9", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\vert F(\\boldsymbol{x})-f(\\boldsymbol{x};\\boldsymbol{\\Theta})\\vert < \\epsilon \\hspace{0.1cm} \\forall \\boldsymbol{x}\\in[0,1]^d.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "a65f0cd5", + "metadata": { + "editable": true + }, + "source": [ + "## Some parallels from real analysis\n", + "\n", + "For those of you familiar with for example the [Stone-Weierstrass\n", + "theorem](https://en.wikipedia.org/wiki/Stone%E2%80%93Weierstrass_theorem)\n", + "for polynomial approximations or the convergence criterion for Fourier\n", + "series, there are similarities in the derivation of the proof for\n", + "neural networks." + ] + }, + { + "cell_type": "markdown", + "id": "d006386b", + "metadata": { + "editable": true + }, + "source": [ + "## The approximation theorem in words\n", + "\n", + "**Any continuous function $y=F(\\boldsymbol{x})$ supported on the unit cube in\n", + "$d$-dimensions can be approximated by a one-layer sigmoidal network to\n", + "arbitrary accuracy.**\n", + "\n", + "[Hornik (1991)](https://www.sciencedirect.com/science/article/abs/pii/089360809190009T) extended the theorem by letting any non-constant, bounded activation function to be included using that the expectation value" + ] + }, + { + "cell_type": "markdown", + "id": "0b094d43", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathbb{E}[\\vert F(\\boldsymbol{x})\\vert^2] =\\int_{\\boldsymbol{x}\\in D} \\vert F(\\boldsymbol{x})\\vert^2p(\\boldsymbol{x})d\\boldsymbol{x} < \\infty.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "f2b9ca56", + "metadata": { + "editable": true + }, + "source": [ + "Then we have" + ] + }, + { + "cell_type": "markdown", + "id": "db4817b0", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathbb{E}[\\vert F(\\boldsymbol{x})-f(\\boldsymbol{x};\\boldsymbol{\\Theta})\\vert^2] =\\int_{\\boldsymbol{x}\\in D} \\vert F(\\boldsymbol{x})-f(\\boldsymbol{x};\\boldsymbol{\\Theta})\\vert^2p(\\boldsymbol{x})d\\boldsymbol{x} < \\epsilon.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "43216143", + "metadata": { + "editable": true + }, + "source": [ + "## More on the general approximation theorem\n", + "\n", + "None of the proofs give any insight into the relation between the\n", + "number of of hidden layers and nodes and the approximation error\n", + "$\\epsilon$, nor the magnitudes of $\\boldsymbol{W}$ and 
$\\boldsymbol{b}$.\n", + "\n", + "Neural networks (NNs) have what we may call a kind of universality no matter what function we want to compute.\n", + "\n", + "It does not mean that an NN can be used to exactly compute any function. Rather, we get an approximation that is as good as we want." + ] + }, + { + "cell_type": "markdown", + "id": "ef48ad88", + "metadata": { + "editable": true + }, + "source": [ + "## Class of functions we can approximate\n", + "\n", + "The class of functions that can be approximated are the continuous ones.\n", + "If the function $F(\\boldsymbol{x})$ is discontinuous, it won't in general be possible to approximate it. However, an NN may still give an approximation even if we fail in some points." + ] + }, + { + "cell_type": "markdown", + "id": "7c4fed36", + "metadata": { + "editable": true + }, + "source": [ + "## Setting up the equations for a neural network\n", + "\n", + "The questions we want to ask are how do changes in the biases and the\n", + "weights in our network change the cost function and how can we use the\n", + "final output to modify the weights and biases?\n", + "\n", + "To derive these equations let us start with a plain regression problem\n", + "and define our cost function as" + ] + }, + { + "cell_type": "markdown", + "id": "c4cf04e8", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "{\\cal C}(\\boldsymbol{\\Theta}) = \\frac{1}{2}\\sum_{i=1}^n\\left(y_i - \\tilde{y}_i\\right)^2,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "ecc9a1bd", + "metadata": { + "editable": true + }, + "source": [ + "where the $y_i$s are our $n$ targets (the values we want to\n", + "reproduce), while the outputs of the network after having propagated\n", + "all inputs $\\boldsymbol{x}$ are given by $\\boldsymbol{\\tilde{y}}_i$." + ] + }, + { + "cell_type": "markdown", + "id": "91e6e150", + "metadata": { + "editable": true + }, + "source": [ + "## Layout of a neural network with three hidden layers\n", + "\n", + "\n", + "\n", + "\n", + "

    Figure 1: Layout of a neural network with three hidden layers.

    \n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "4fabe3cc", + "metadata": { + "editable": true + }, + "source": [ + "## Definitions\n", + "\n", + "With our definition of the targets $\\boldsymbol{y}$, the outputs of the\n", + "network $\\boldsymbol{\\tilde{y}}$ and the inputs $\\boldsymbol{x}$ we\n", + "define now the activation $z_j^l$ of node/neuron/unit $j$ of the\n", + "$l$-th layer as a function of the bias, the weights which add up from\n", + "the previous layer $l-1$ and the forward passes/outputs\n", + "$\\hat{a}^{l-1}$ from the previous layer as" + ] + }, + { + "cell_type": "markdown", + "id": "8c25e4cf", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "z_j^l = \\sum_{i=1}^{M_{l-1}}w_{ij}^la_i^{l-1}+b_j^l,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "ae861380", + "metadata": { + "editable": true + }, + "source": [ + "where $b_k^l$ are the biases from layer $l$. Here $M_{l-1}$\n", + "represents the total number of nodes/neurons/units of layer $l-1$. The\n", + "figure in the whiteboard notes illustrates this equation. We can rewrite this in a more\n", + "compact form as the matrix-vector products we discussed earlier," + ] + }, + { + "cell_type": "markdown", + "id": "2b7f7b74", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\hat{z}^l = \\left(\\hat{W}^l\\right)^T\\hat{a}^{l-1}+\\hat{b}^l.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "e76386ca", + "metadata": { + "editable": true + }, + "source": [ + "## Inputs to the activation function\n", + "\n", + "With the activation values $\\boldsymbol{z}^l$ we can in turn define the\n", + "output of layer $l$ as $\\boldsymbol{a}^l = f(\\boldsymbol{z}^l)$ where $f$ is our\n", + "activation function. In the examples here we will use the sigmoid\n", + "function discussed in our logistic regression lectures. We will also use the same activation function $f$ for all layers\n", + "and their nodes. 
It means we have" + ] + }, + { + "cell_type": "markdown", + "id": "12a9fb38", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "a_j^l = \\sigma(z_j^l) = \\frac{1}{1+\\exp{-(z_j^l)}}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "08bbe824", + "metadata": { + "editable": true + }, + "source": [ + "## Derivatives and the chain rule\n", + "\n", + "From the definition of the activation $z_j^l$ we have" + ] + }, + { + "cell_type": "markdown", + "id": "3783fe53", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial z_j^l}{\\partial w_{ij}^l} = a_i^{l-1},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "2b70d213", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "209db1b2", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial z_j^l}{\\partial a_i^{l-1}} = w_{ji}^l.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "6e42f02f", + "metadata": { + "editable": true + }, + "source": [ + "With our definition of the activation function we have that (note that this function depends only on $z_j^l$)" + ] + }, + { + "cell_type": "markdown", + "id": "78422fdc", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial a_j^l}{\\partial z_j^{l}} = a_j^l(1-a_j^l)=\\sigma(z_j^l)(1-\\sigma(z_j^l)).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "8c8491cf", + "metadata": { + "editable": true + }, + "source": [ + "## Derivative of the cost function\n", + "\n", + "With these definitions we can now compute the derivative of the cost function in terms of the weights.\n", + "\n", + "Let us specialize to the output layer $l=L$. Our cost function is" + ] + }, + { + "cell_type": "markdown", + "id": "82fb3ded", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "{\\cal C}(\\boldsymbol{\\Theta}^L) = \\frac{1}{2}\\sum_{i=1}^n\\left(y_i - \\tilde{y}_i\\right)^2=\\frac{1}{2}\\sum_{i=1}^n\\left(a_i^L - y_i\\right)^2,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "88fe7049", + "metadata": { + "editable": true + }, + "source": [ + "The derivative of this function with respect to the weights is" + ] + }, + { + "cell_type": "markdown", + "id": "af856571", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial{\\cal C}(\\boldsymbol{\\Theta}^L)}{\\partial w_{jk}^L} = \\left(a_j^L - y_j\\right)\\frac{\\partial a_j^L}{\\partial w_{jk}^{L}},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "d684ab45", + "metadata": { + "editable": true + }, + "source": [ + "The last partial derivative can easily be computed and reads (by applying the chain rule)" + ] + }, + { + "cell_type": "markdown", + "id": "ac371b5c", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial a_j^L}{\\partial w_{jk}^{L}} = \\frac{\\partial a_j^L}{\\partial z_{j}^{L}}\\frac{\\partial z_j^L}{\\partial w_{jk}^{L}}=a_j^L(1-a_j^L)a_k^{L-1}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "8dbfe230", + "metadata": { + "editable": true + }, + "source": [ + "## Simpler examples first, and automatic differentiation\n", + "\n", + "In order to understand the back propagation algorithm and its\n", + "derivation (an implementation of the chain rule), let us first digress\n", + "with some simple examples. These examples are also meant to motivate\n", + "the link with back propagation and [automatic differentiation](https://en.wikipedia.org/wiki/Automatic_differentiation). 
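As a first simple check of this kind, we can compare the analytical expression for $\partial {\cal C}/\partial w_{jk}^L$ derived above with a finite-difference approximation. The sketch below does this for a single output node fed by three nodes in layer $L-1$; the numerical values of the activations, weights, bias and target are illustrative assumptions only.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)

# One output node j fed by three nodes in layer L-1 (illustrative numbers)
a_prev = rng.uniform(size=3)   # a_k^{L-1}
w = rng.normal(size=3)         # w_{jk}^L
b = 0.1                        # b_j^L
y = 0.7                        # target value for this output node

def cost(w):
    """Contribution of this single output node to the cost, C = (a^L - y)^2 / 2."""
    a_L = sigmoid(a_prev @ w + b)
    return 0.5 * (a_L - y)**2

# Analytical gradient: (a^L - y) * a^L (1 - a^L) * a_k^{L-1}
a_L = sigmoid(a_prev @ w + b)
grad_analytic = (a_L - y) * a_L * (1.0 - a_L) * a_prev

# Central finite-difference check
eps = 1.0e-6
grad_numeric = np.zeros_like(w)
for k in range(len(w)):
    w_plus, w_minus = w.copy(), w.copy()
    w_plus[k] += eps
    w_minus[k] -= eps
    grad_numeric[k] = (cost(w_plus) - cost(w_minus)) / (2.0 * eps)

print(grad_analytic)
print(grad_numeric)   # should agree with the analytical gradient to high accuracy
```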
We will discuss these topics next week (week 42)." + ] + }, + { + "cell_type": "markdown", + "id": "7244f7f3", + "metadata": { + "editable": true + }, + "source": [ + "## Reminder on the chain rule and gradients\n", + "\n", + "If we have a multivariate function $f(x,y)$ where $x=x(t)$ and $y=y(t)$ are functions of a variable $t$, we have that the gradient of $f$ with respect to $t$ (without the explicit unit vector components)" + ] + }, + { + "cell_type": "markdown", + "id": "ffb80d86", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{df}{dt} = \\begin{bmatrix}\\frac{\\partial f}{\\partial x} & \\frac{\\partial f}{\\partial y} \\end{bmatrix} \\begin{bmatrix}\\frac{\\partial x}{\\partial t} \\\\ \\frac{\\partial y}{\\partial t} \\end{bmatrix}=\\frac{\\partial f}{\\partial x} \\frac{\\partial x}{\\partial t} +\\frac{\\partial f}{\\partial y} \\frac{\\partial y}{\\partial t}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "6f15ef23", + "metadata": { + "editable": true + }, + "source": [ + "## Multivariable functions\n", + "\n", + "If we have a multivariate function $f(x,y)$ where $x=x(t,s)$ and $y=y(t,s)$ are functions of the variables $t$ and $s$, we have that the partial derivatives" + ] + }, + { + "cell_type": "markdown", + "id": "1734d532", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial f}{\\partial s}=\\frac{\\partial f}{\\partial x}\\frac{\\partial x}{\\partial s}+\\frac{\\partial f}{\\partial y}\\frac{\\partial y}{\\partial s},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "8c013e25", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "f416e200", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial f}{\\partial t}=\\frac{\\partial f}{\\partial x}\\frac{\\partial x}{\\partial t}+\\frac{\\partial f}{\\partial y}\\frac{\\partial y}{\\partial t}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "943d440c", + "metadata": { + "editable": true + }, + "source": [ + "the gradient of $f$ with respect to $t$ and $s$ (without the explicit unit vector components)" + ] + }, + { + "cell_type": "markdown", + "id": "9a88f9e3", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{df}{d(s,t)} = \\begin{bmatrix}\\frac{\\partial f}{\\partial x} & \\frac{\\partial f}{\\partial y} \\end{bmatrix} \\begin{bmatrix}\\frac{\\partial x}{\\partial s} &\\frac{\\partial x}{\\partial t} \\\\ \\frac{\\partial y}{\\partial s} & \\frac{\\partial y}{\\partial t} \\end{bmatrix}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "6bc993bf", + "metadata": { + "editable": true + }, + "source": [ + "## Automatic differentiation through examples\n", + "\n", + "A great introduction to automatic differentiation is given by Baydin et al., see .\n", + "See also the video at .\n", + "\n", + "Automatic differentiation is a represented by a repeated application\n", + "of the chain rule on well-known functions and allows for the\n", + "calculation of derivatives to numerical precision. It is not the same\n", + "as the calculation of symbolic derivatives via for example SymPy, nor\n", + "does it use approximative formulae based on Taylor-expansions of a\n", + "function around a given value. The latter are error prone due to\n", + "truncation errors and values of the step size $\\Delta$." 
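+ {
+ "cell_type": "markdown",
+ "id": "added-ad-vs-fd",
+ "metadata": {
+ "editable": true
+ },
+ "source": [
+ "To make this difference concrete, here is a minimal sketch (assuming the **autograd** package is installed)\n",
+ "which compares the derivative of $f(x)=x^3$ computed by automatic differentiation with a forward-difference\n",
+ "approximation. The former reproduces the analytical result $3x^2$ to machine precision, while the error of\n",
+ "the latter depends on the step size $\\Delta$."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "added-ad-vs-fd-code",
+ "metadata": {
+ "collapsed": false,
+ "editable": true
+ },
+ "outputs": [],
+ "source": [
+ "from autograd import grad\n",
+ "\n",
+ "def f(x):\n",
+ "    return x**3\n",
+ "\n",
+ "def df_fd(x, delta):\n",
+ "    # forward-difference approximation, error depends on delta\n",
+ "    return (f(x + delta) - f(x))/delta\n",
+ "\n",
+ "df_ad = grad(f)   # automatic differentiation (reverse mode)\n",
+ "\n",
+ "x0 = 1.0\n",
+ "print(df_ad(x0) - 3*x0**2)   # essentially machine precision\n",
+ "for delta in [1e-2, 1e-5, 1e-8, 1e-12]:\n",
+ "    print(delta, df_fd(x0, delta) - 3*x0**2)"
+ ]
+ },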
+ ] + }, + { + "cell_type": "markdown", + "id": "0685fdd2", + "metadata": { + "editable": true + }, + "source": [ + "## Simple example\n", + "\n", + "Our first example is rather simple," + ] + }, + { + "cell_type": "markdown", + "id": "9a2b16de", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "f(x) =\\exp{x^2},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "ba5c3f8a", + "metadata": { + "editable": true + }, + "source": [ + "with derivative" + ] + }, + { + "cell_type": "markdown", + "id": "d0c973a9", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "f'(x) =2x\\exp{x^2}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "34c21223", + "metadata": { + "editable": true + }, + "source": [ + "We can use SymPy to extract the pertinent lines of Python code through the following simple example" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "72fa0f44", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "from __future__ import division\n", + "from sympy import *\n", + "x = symbols('x')\n", + "expr = exp(x*x)\n", + "simplify(expr)\n", + "derivative = diff(expr,x)\n", + "print(python(expr))\n", + "print(python(derivative))" + ] + }, + { + "cell_type": "markdown", + "id": "78884bc6", + "metadata": { + "editable": true + }, + "source": [ + "## Smarter way of evaluating the above function\n", + "If we study this function, we note that we can reduce the number of operations by introducing an intermediate variable" + ] + }, + { + "cell_type": "markdown", + "id": "f13d7286", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "a = x^2,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "443739d9", + "metadata": { + "editable": true + }, + "source": [ + "leading to" + ] + }, + { + "cell_type": "markdown", + "id": "48b45da1", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "f(x) = f(a(x)) = b= \\exp{a}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "81e7fd8f", + "metadata": { + "editable": true + }, + "source": [ + "We now assume that all operations can be counted in terms of equal\n", + "floating point operations. This means that in order to calculate\n", + "$f(x)$ we need first to square $x$ and then compute the exponential. We\n", + "have thus two floating point operations only." + ] + }, + { + "cell_type": "markdown", + "id": "824bbfa1", + "metadata": { + "editable": true + }, + "source": [ + "## Reducing the number of operations\n", + "\n", + "With the introduction of a precalculated quantity $a$ and thereby $f(x)$ we have that the derivative can be written as" + ] + }, + { + "cell_type": "markdown", + "id": "42d2716e", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "f'(x) = 2xb,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "f27855c1", + "metadata": { + "editable": true + }, + "source": [ + "which reduces the number of operations from four in the orginal\n", + "expression to two. This means that if we need to compute $f(x)$ and\n", + "its derivative (a common task in optimizations), we have reduced the\n", + "number of operations from six to four in total.\n", + "\n", + "**Note** that the usage of a symbolic software like SymPy does not\n", + "include such simplifications and the calculations of the function and\n", + "the derivatives yield in general more floating point operations." 
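+ {
+ "cell_type": "markdown",
+ "id": "added-reuse-exp",
+ "metadata": {
+ "editable": true
+ },
+ "source": [
+ "The reduction in the number of operations is easy to see in code. The small sketch below (with an arbitrary\n",
+ "input value) evaluates $f(x)=\\exp{x^2}$ and its derivative together, reusing the intermediate quantities\n",
+ "$a=x^2$ and $b=\\exp{a}$."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "added-reuse-exp-code",
+ "metadata": {
+ "collapsed": false,
+ "editable": true
+ },
+ "outputs": [],
+ "source": [
+ "import numpy as np\n",
+ "\n",
+ "def f_and_df(x):\n",
+ "    a = x*x          # one multiplication\n",
+ "    b = np.exp(a)    # one exponential, b equals f(x)\n",
+ "    df = 2*x*b       # two multiplications, reusing b\n",
+ "    return b, df\n",
+ "\n",
+ "x0 = 0.5\n",
+ "value, derivative = f_and_df(x0)\n",
+ "print(value, derivative)\n",
+ "print(np.exp(x0**2), 2*x0*np.exp(x0**2))   # brute-force check"
+ ]
+ },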
+ ] + }, + { + "cell_type": "markdown", + "id": "d4fe531f", + "metadata": { + "editable": true + }, + "source": [ + "## Chain rule, forward and reverse modes\n", + "\n", + "In the above example we have introduced the variables $a$ and $b$, and our function is" + ] + }, + { + "cell_type": "markdown", + "id": "aba8f666", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "f(x) = f(a(x)) = b= \\exp{a},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "404c698a", + "metadata": { + "editable": true + }, + "source": [ + "with $a=x^2$. We can decompose the derivative of $f$ with respect to $x$ as" + ] + }, + { + "cell_type": "markdown", + "id": "2c73032a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{df}{dx}=\\frac{df}{db}\\frac{db}{da}\\frac{da}{dx}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "95a71a82", + "metadata": { + "editable": true + }, + "source": [ + "We note that since $b=f(x)$ that" + ] + }, + { + "cell_type": "markdown", + "id": "c71b8e66", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{df}{db}=1,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "23998633", + "metadata": { + "editable": true + }, + "source": [ + "leading to" + ] + }, + { + "cell_type": "markdown", + "id": "0708e562", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{df}{dx}=\\frac{db}{da}\\frac{da}{dx}=2x\\exp{x^2},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "ee8c4ade", + "metadata": { + "editable": true + }, + "source": [ + "as before." + ] + }, + { + "cell_type": "markdown", + "id": "860d410c", + "metadata": { + "editable": true + }, + "source": [ + "## Forward and reverse modes\n", + "\n", + "We have that" + ] + }, + { + "cell_type": "markdown", + "id": "064e5852", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{df}{dx}=\\frac{df}{db}\\frac{db}{da}\\frac{da}{dx},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "983c3afe", + "metadata": { + "editable": true + }, + "source": [ + "which we can rewrite either as" + ] + }, + { + "cell_type": "markdown", + "id": "a1f9638f", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{df}{dx}=\\left[\\frac{df}{db}\\frac{db}{da}\\right]\\frac{da}{dx},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "84a07e04", + "metadata": { + "editable": true + }, + "source": [ + "or" + ] + }, + { + "cell_type": "markdown", + "id": "4383650d", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{df}{dx}=\\frac{df}{db}\\left[\\frac{db}{da}\\frac{da}{dx}\\right].\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "36a2d607", + "metadata": { + "editable": true + }, + "source": [ + "The first expression is called reverse mode (or back propagation)\n", + "since we start by evaluating the derivatives at the end point and then\n", + "propagate backwards. This is the standard way of evaluating\n", + "derivatives (gradients) when optimizing the parameters of a neural\n", + "network. In the context of deep learning this is computationally\n", + "more efficient since the output of a neural network consists of either\n", + "one or some few other output variables.\n", + "\n", + "The second equation defines the so-called **forward mode**." 
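+ {
+ "cell_type": "markdown",
+ "id": "added-fwd-rev",
+ "metadata": {
+ "editable": true
+ },
+ "source": [
+ "Libraries such as JAX implement both modes. As a small sketch (assuming JAX is installed), we can\n",
+ "differentiate the function above with **jacrev** (reverse mode, the ordering used in back propagation) and\n",
+ "**jacfwd** (forward mode) and check that both reproduce the analytical derivative."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "added-fwd-rev-code",
+ "metadata": {
+ "collapsed": false,
+ "editable": true
+ },
+ "outputs": [],
+ "source": [
+ "import jax.numpy as jnp\n",
+ "from jax import jacfwd, jacrev\n",
+ "\n",
+ "def f(x):\n",
+ "    return jnp.exp(x**2)\n",
+ "\n",
+ "x0 = 1.5\n",
+ "print(jacrev(f)(x0))          # reverse mode\n",
+ "print(jacfwd(f)(x0))          # forward mode\n",
+ "print(2*x0*jnp.exp(x0**2))    # analytical derivative"
+ ]
+ },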
+ ] + }, + { + "cell_type": "markdown", + "id": "ab0a9ca8", + "metadata": { + "editable": true + }, + "source": [ + "## More complicated function\n", + "\n", + "We increase our ambitions and introduce a slightly more complicated function" + ] + }, + { + "cell_type": "markdown", + "id": "e85a7d29", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "f(x) =\\sqrt{x^2+exp{x^2}},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "91c151e1", + "metadata": { + "editable": true + }, + "source": [ + "with derivative" + ] + }, + { + "cell_type": "markdown", + "id": "037a60e4", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "f'(x) =\\frac{x(1+\\exp{x^2})}{\\sqrt{x^2+exp{x^2}}}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "9f198b96", + "metadata": { + "editable": true + }, + "source": [ + "The corresponding SymPy code reads" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "620b6c3e", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "from __future__ import division\n", + "from sympy import *\n", + "x = symbols('x')\n", + "expr = sqrt(x*x+exp(x*x))\n", + "simplify(expr)\n", + "derivative = diff(expr,x)\n", + "print(python(expr))\n", + "print(python(derivative))" + ] + }, + { + "cell_type": "markdown", + "id": "d1fe5ce8", + "metadata": { + "editable": true + }, + "source": [ + "## Counting the number of floating point operations\n", + "\n", + "A simple count of operations shows that we need five operations for\n", + "the function itself and ten for the derivative. Fifteen operations in total if we wish to proceed with the above codes.\n", + "\n", + "Can we reduce this to\n", + "say half the number of operations?" + ] + }, + { + "cell_type": "markdown", + "id": "746e84de", + "metadata": { + "editable": true + }, + "source": [ + "## Defining intermediate operations\n", + "\n", + "We can indeed reduce the number of operation to half of those listed in the brute force approach above.\n", + "We define the following quantities" + ] + }, + { + "cell_type": "markdown", + "id": "cbb4abde", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "a = x^2,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "640a0037", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "e3b8b12d", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "b = \\exp{x^2} = \\exp{a},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "5b2087bf", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "5c397a99", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "c= a+b,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "c4884822", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "c1834aef", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "d=f(x)=\\sqrt{c}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "aeee8fc4", + "metadata": { + "editable": true + }, + "source": [ + "## New expression for the derivative\n", + "\n", + "With these definitions we obtain the following partial derivatives" + ] + }, + { + "cell_type": "markdown", + "id": "df71e889", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial a}{\\partial x} = 2x,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "358a49a2", + "metadata": { + "editable": true 
+ }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "95138b08", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial b}{\\partial a} = \\exp{a},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "0a0e2f81", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "7fa7f3b5", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial c}{\\partial a} = 1,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "c74442e2", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "2e9ebae8", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial c}{\\partial b} = 1,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "db89516c", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "0bc2735a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial d}{\\partial c} = \\frac{1}{2\\sqrt{c}},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "42e0cb08", + "metadata": { + "editable": true + }, + "source": [ + "and finally" + ] + }, + { + "cell_type": "markdown", + "id": "56ccf1d5", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial f}{\\partial d} = 1.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "557f2482", + "metadata": { + "editable": true + }, + "source": [ + "## Final derivatives\n", + "Our final derivatives are thus" + ] + }, + { + "cell_type": "markdown", + "id": "90eeebe1", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial f}{\\partial c} = \\frac{\\partial f}{\\partial d} \\frac{\\partial d}{\\partial c} = \\frac{1}{2\\sqrt{c}},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "6c2abeb4", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial f}{\\partial b} = \\frac{\\partial f}{\\partial c} \\frac{\\partial c}{\\partial b} = \\frac{1}{2\\sqrt{c}},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "3f5af305", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial f}{\\partial a} = \\frac{\\partial f}{\\partial c} \\frac{\\partial c}{\\partial a}+\n", + "\\frac{\\partial f}{\\partial b} \\frac{\\partial b}{\\partial a} = \\frac{1+\\exp{a}}{2\\sqrt{c}},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b78e9f43", + "metadata": { + "editable": true + }, + "source": [ + "and finally" + ] + }, + { + "cell_type": "markdown", + "id": "d197d721", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial f}{\\partial x} = \\frac{\\partial f}{\\partial a} \\frac{\\partial a}{\\partial x} = \\frac{x(1+\\exp{a})}{\\sqrt{c}},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "17334528", + "metadata": { + "editable": true + }, + "source": [ + "which is just" + ] + }, + { + "cell_type": "markdown", + "id": "f69ca3fd", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial f}{\\partial x} = \\frac{x(1+b)}{d},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "e937d622", + "metadata": { + "editable": true + }, + "source": [ + "and requires only three operations if we can reuse all intermediate variables." 
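+ {
+ "cell_type": "markdown",
+ "id": "added-intermediates",
+ "metadata": {
+ "editable": true
+ },
+ "source": [
+ "In code, the forward sweep stores the intermediate variables $a$, $b$, $c$ and $d$, and the derivative then\n",
+ "reuses them, as in this small sketch (the input value is arbitrary)."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "added-intermediates-code",
+ "metadata": {
+ "collapsed": false,
+ "editable": true
+ },
+ "outputs": [],
+ "source": [
+ "import numpy as np\n",
+ "\n",
+ "def f_and_df(x):\n",
+ "    # forward sweep\n",
+ "    a = x*x\n",
+ "    b = np.exp(a)\n",
+ "    c = a + b\n",
+ "    d = np.sqrt(c)    # d = f(x)\n",
+ "    # derivative, reusing the stored intermediates\n",
+ "    df = x*(1.0 + b)/d\n",
+ "    return d, df\n",
+ "\n",
+ "x0 = 1.0\n",
+ "value, derivative = f_and_df(x0)\n",
+ "print(value, derivative)\n",
+ "print(np.sqrt(x0**2 + np.exp(x0**2)))                          # f(x0)\n",
+ "print(x0*(1 + np.exp(x0**2))/np.sqrt(x0**2 + np.exp(x0**2)))   # analytical f'(x0)"
+ ]
+ },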
+ ] + }, + { + "cell_type": "markdown", + "id": "8ab7ba6b", + "metadata": { + "editable": true + }, + "source": [ + "## In general not this simple\n", + "\n", + "In general, see the generalization below, unless we can obtain simple\n", + "analytical expressions which we can simplify further, the final\n", + "implementation of automatic differentiation involves repeated\n", + "calculations (and thereby operations) of derivatives of elementary\n", + "functions." + ] + }, + { + "cell_type": "markdown", + "id": "02665ba6", + "metadata": { + "editable": true + }, + "source": [ + "## Automatic differentiation\n", + "\n", + "We can make this example more formal. Automatic differentiation is a\n", + "formalization of the previous example (see graph).\n", + "\n", + "We define $\\boldsymbol{x}\\in x_1,\\dots, x_l$ input variables to a given function $f(\\boldsymbol{x})$ and $x_{l+1},\\dots, x_L$ intermediate variables.\n", + "\n", + "In the above example we have only one input variable, $l=1$ and four intermediate variables, that is" + ] + }, + { + "cell_type": "markdown", + "id": "c473a49a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{bmatrix} x_1=x & x_2 = x^2=a & x_3 =\\exp{a}= b & x_4=c=a+b & x_5 = \\sqrt{c}=d \\end{bmatrix}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "6beeffc2", + "metadata": { + "editable": true + }, + "source": [ + "Furthemore, for $i=l+1, \\dots, L$ (here $i=2,3,4,5$ and $f=x_L=d$), we\n", + "define the elementary functions $g_i(x_{Pa(x_i)})$ where $x_{Pa(x_i)}$ are the parent nodes of the variable $x_i$.\n", + "\n", + "In our case, we have for example for $x_3=g_3(x_{Pa(x_i)})=\\exp{a}$, that $g_3=\\exp{()}$ and $x_{Pa(x_3)}=a$." + ] + }, + { + "cell_type": "markdown", + "id": "814918db", + "metadata": { + "editable": true + }, + "source": [ + "## Chain rule\n", + "\n", + "We can now compute the gradients by back-propagating the derivatives using the chain rule.\n", + "We have defined" + ] + }, + { + "cell_type": "markdown", + "id": "a7a72e3b", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial f}{\\partial x_L} = 1,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "041df7ab", + "metadata": { + "editable": true + }, + "source": [ + "which allows us to find the derivatives of the various variables $x_i$ as" + ] + }, + { + "cell_type": "markdown", + "id": "b687bc51", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial f}{\\partial x_i} = \\sum_{x_j:x_i\\in Pa(x_j)}\\frac{\\partial f}{\\partial x_j} \\frac{\\partial x_j}{\\partial x_i}=\\sum_{x_j:x_i\\in Pa(x_j)}\\frac{\\partial f}{\\partial x_j} \\frac{\\partial g_j}{\\partial x_i}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "5c87f3af", + "metadata": { + "editable": true + }, + "source": [ + "Whenever we have a function which can be expressed as a computation\n", + "graph and the various functions can be expressed in terms of\n", + "elementary functions that are differentiable, then automatic\n", + "differentiation works. The functions may not need to be elementary\n", + "functions, they could also be computer programs, although not all\n", + "programs can be automatically differentiated." + ] + }, + { + "cell_type": "markdown", + "id": "02df0535", + "metadata": { + "editable": true + }, + "source": [ + "## First network example, simple percepetron with one input\n", + "\n", + "As yet another example we define now a simple perceptron model with\n", + "all quantities given by scalars. 
We consider only one input variable\n", + "$x$ and one target value $y$. We define an activation function\n", + "$\\sigma_1$ which takes as input" + ] + }, + { + "cell_type": "markdown", + "id": "dc45fa01", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "z_1 = w_1x+b_1,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "5568395b", + "metadata": { + "editable": true + }, + "source": [ + "where $w_1$ is the weight and $b_1$ is the bias. These are the\n", + "parameters we want to optimize. The output is $a_1=\\sigma(z_1)$ (see\n", + "graph from whiteboard notes). This output is then fed into the\n", + "**cost/loss** function, which we here for the sake of simplicity just\n", + "define as the squared error" + ] + }, + { + "cell_type": "markdown", + "id": "e6ae6f18", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C(x;w_1,b_1)=\\frac{1}{2}(a_1-y)^2.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "7d6abd22", + "metadata": { + "editable": true + }, + "source": [ + "## Layout of a simple neural network with no hidden layer\n", + "\n", + "\n", + "\n", + "\n", + "

    Figure 1: A simple neural network with no hidden layer: one input node and one output node.
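+ {
+ "cell_type": "markdown",
+ "id": "added-perceptron-eval",
+ "metadata": {
+ "editable": true
+ },
+ "source": [
+ "A few lines of Python are enough to evaluate this model and its cost for a given set of parameters\n",
+ "(the numbers below are arbitrary and only meant as an illustration)."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "added-perceptron-eval-code",
+ "metadata": {
+ "collapsed": false,
+ "editable": true
+ },
+ "outputs": [],
+ "source": [
+ "import numpy as np\n",
+ "\n",
+ "def sigmoid(z):\n",
+ "    return 1.0/(1.0 + np.exp(-z))\n",
+ "\n",
+ "# arbitrary input, target and parameters\n",
+ "x, y = 1.0, 0.5\n",
+ "w_1, b_1 = 0.2, 0.1\n",
+ "\n",
+ "z_1 = w_1*x + b_1\n",
+ "a_1 = sigmoid(z_1)\n",
+ "cost = 0.5*(a_1 - y)**2\n",
+ "print(a_1, cost)"
+ ]
+ },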

    \n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "1e466108", + "metadata": { + "editable": true + }, + "source": [ + "## Optimizing the parameters\n", + "\n", + "In setting up the feed forward and back propagation parts of the\n", + "algorithm, we need now the derivative of the various variables we want\n", + "to train.\n", + "\n", + "We need" + ] + }, + { + "cell_type": "markdown", + "id": "3b6fd059", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial w_1} \\hspace{0.1cm}\\mathrm{and}\\hspace{0.1cm}\\frac{\\partial C}{\\partial b_1}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "cfad60fc", + "metadata": { + "editable": true + }, + "source": [ + "Using the chain rule we find" + ] + }, + { + "cell_type": "markdown", + "id": "5c5014b3", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial w_1}=\\frac{\\partial C}{\\partial a_1}\\frac{\\partial a_1}{\\partial z_1}\\frac{\\partial z_1}{\\partial w_1}=(a_1-y)\\sigma_1'x,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "1c677323", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "93362833", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial b_1}=\\frac{\\partial C}{\\partial a_1}\\frac{\\partial a_1}{\\partial z_1}\\frac{\\partial z_1}{\\partial b_1}=(a_1-y)\\sigma_1',\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "c857a902", + "metadata": { + "editable": true + }, + "source": [ + "which we later will just define as" + ] + }, + { + "cell_type": "markdown", + "id": "b7b95721", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial a_1}\\frac{\\partial a_1}{\\partial z_1}=\\delta_1.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "e2574534", + "metadata": { + "editable": true + }, + "source": [ + "## Adding a hidden layer\n", + "\n", + "We change our simple model to (see graph)\n", + "a network with just one hidden layer but with scalar variables only.\n", + "\n", + "Our output variable changes to $a_2$ and $a_1$ is now the output from the hidden node and $a_0=x$.\n", + "We have then" + ] + }, + { + "cell_type": "markdown", + "id": "ae7a5afa", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "z_1 = w_1a_0+b_1 \\hspace{0.1cm} \\wedge a_1 = \\sigma_1(z_1),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "7962e138", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "z_2 = w_2a_1+b_2 \\hspace{0.1cm} \\wedge a_2 = \\sigma_2(z_2),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "0add2cb1", + "metadata": { + "editable": true + }, + "source": [ + "and the cost function" + ] + }, + { + "cell_type": "markdown", + "id": "2ea986fc", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C(x;\\boldsymbol{\\Theta})=\\frac{1}{2}(a_2-y)^2,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "683c4849", + "metadata": { + "editable": true + }, + "source": [ + "with $\\boldsymbol{\\Theta}=[w_1,w_2,b_1,b_2]$." + ] + }, + { + "cell_type": "markdown", + "id": "f345670c", + "metadata": { + "editable": true + }, + "source": [ + "## Layout of a simple neural network with one hidden layer\n", + "\n", + "\n", + "\n", + "\n", + "

    Figure 1: A simple neural network with one hidden layer: scalar input, one hidden node and one output node.

    \n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "bb15a76b", + "metadata": { + "editable": true + }, + "source": [ + "## The derivatives\n", + "\n", + "The derivatives are now, using the chain rule again" + ] + }, + { + "cell_type": "markdown", + "id": "d0882362", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial w_2}=\\frac{\\partial C}{\\partial a_2}\\frac{\\partial a_2}{\\partial z_2}\\frac{\\partial z_2}{\\partial w_2}=(a_2-y)\\sigma_2'a_1=\\delta_2a_1,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "3e16d45d", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial b_2}=\\frac{\\partial C}{\\partial a_2}\\frac{\\partial a_2}{\\partial z_2}\\frac{\\partial z_2}{\\partial b_2}=(a_2-y)\\sigma_2'=\\delta_2,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b2a0a41b", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial w_1}=\\frac{\\partial C}{\\partial a_2}\\frac{\\partial a_2}{\\partial z_2}\\frac{\\partial z_2}{\\partial a_1}\\frac{\\partial a_1}{\\partial z_1}\\frac{\\partial z_1}{\\partial w_1}=(a_2-y)\\sigma_2'a_1\\sigma_1'a_0,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "e8f61358", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial b_1}=\\frac{\\partial C}{\\partial a_2}\\frac{\\partial a_2}{\\partial z_2}\\frac{\\partial z_2}{\\partial a_1}\\frac{\\partial a_1}{\\partial z_1}\\frac{\\partial z_1}{\\partial b_1}=(a_2-y)\\sigma_2'\\sigma_1'=\\delta_1.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "5a8258cb", + "metadata": { + "editable": true + }, + "source": [ + "Can you generalize this to more than one hidden layer?" + ] + }, + { + "cell_type": "markdown", + "id": "bb720314", + "metadata": { + "editable": true + }, + "source": [ + "## Important observations\n", + "\n", + "From the above equations we see that the derivatives of the activation\n", + "functions play a central role. If they vanish, the training may\n", + "stop. This is called the vanishing gradient problem, see discussions below. If they become\n", + "large, the parameters $w_i$ and $b_i$ may simply go to infinity. This\n", + "is referenced as the exploding gradient problem." + ] + }, + { + "cell_type": "markdown", + "id": "52217a26", + "metadata": { + "editable": true + }, + "source": [ + "## The training\n", + "\n", + "The training of the parameters is done through various gradient descent approximations with" + ] + }, + { + "cell_type": "markdown", + "id": "eb647e50", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "w_{i}\\leftarrow w_{i}- \\eta \\delta_i a_{i-1},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "cda95964", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "130a2766", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "b_i \\leftarrow b_i-\\eta \\delta_i,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "ac7cc3bc", + "metadata": { + "editable": true + }, + "source": [ + "with $\\eta$ is the learning rate.\n", + "\n", + "One iteration consists of one feed forward step and one back-propagation step. Each back-propagation step does one update of the parameters $\\boldsymbol{\\Theta}$.\n", + "\n", + "For the first hidden layer $a_{i-1}=a_0=x$ for this simple model." 
+ ] + }, + { + "cell_type": "markdown", + "id": "cde60cd2", + "metadata": { + "editable": true + }, + "source": [ + "## Code example\n", + "\n", + "The code here implements the above model with one hidden layer and\n", + "scalar variables for the same function we studied in the previous\n", + "example. The code is however set up so that we can add multiple\n", + "inputs $x$ and target values $y$. Note also that we have the\n", + "possibility of defining a feature matrix $\\boldsymbol{X}$ with more than just\n", + "one column for the input values. This will turn useful in our next example. We have also defined matrices and vectors for all of our operations although it is not necessary here." + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "3616dd69", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import numpy as np\n", + "# We use the Sigmoid function as activation function\n", + "def sigmoid(z):\n", + " return 1.0/(1.0+np.exp(-z))\n", + "\n", + "def forwardpropagation(x):\n", + " # weighted sum of inputs to the hidden layer\n", + " z_1 = np.matmul(x, w_1) + b_1\n", + " # activation in the hidden layer\n", + " a_1 = sigmoid(z_1)\n", + " # weighted sum of inputs to the output layer\n", + " z_2 = np.matmul(a_1, w_2) + b_2\n", + " a_2 = z_2\n", + " return a_1, a_2\n", + "\n", + "def backpropagation(x, y):\n", + " a_1, a_2 = forwardpropagation(x)\n", + " # parameter delta for the output layer, note that a_2=z_2 and its derivative wrt z_2 is just 1\n", + " delta_2 = a_2 - y\n", + " print(0.5*((a_2-y)**2))\n", + " # delta for the hidden layer\n", + " delta_1 = np.matmul(delta_2, w_2.T) * a_1 * (1 - a_1)\n", + " # gradients for the output layer\n", + " output_weights_gradient = np.matmul(a_1.T, delta_2)\n", + " output_bias_gradient = np.sum(delta_2, axis=0)\n", + " # gradient for the hidden layer\n", + " hidden_weights_gradient = np.matmul(x.T, delta_1)\n", + " hidden_bias_gradient = np.sum(delta_1, axis=0)\n", + " return output_weights_gradient, output_bias_gradient, hidden_weights_gradient, hidden_bias_gradient\n", + "\n", + "\n", + "# ensure the same random numbers appear every time\n", + "np.random.seed(0)\n", + "# Input variable\n", + "x = np.array([4.0],dtype=np.float64)\n", + "# Target values\n", + "y = 2*x+1.0 \n", + "\n", + "# Defining the neural network, only scalars here\n", + "n_inputs = x.shape\n", + "n_features = 1\n", + "n_hidden_neurons = 1\n", + "n_outputs = 1\n", + "\n", + "# Initialize the network\n", + "# weights and bias in the hidden layer\n", + "w_1 = np.random.randn(n_features, n_hidden_neurons)\n", + "b_1 = np.zeros(n_hidden_neurons) + 0.01\n", + "\n", + "# weights and bias in the output layer\n", + "w_2 = np.random.randn(n_hidden_neurons, n_outputs)\n", + "b_2 = np.zeros(n_outputs) + 0.01\n", + "\n", + "eta = 0.1\n", + "for i in range(50):\n", + " # calculate gradients\n", + " derivW2, derivB2, derivW1, derivB1 = backpropagation(x, y)\n", + " # update weights and biases\n", + " w_2 -= eta * derivW2\n", + " b_2 -= eta * derivB2\n", + " w_1 -= eta * derivW1\n", + " b_1 -= eta * derivB1" + ] + }, + { + "cell_type": "markdown", + "id": "3348a149", + "metadata": { + "editable": true + }, + "source": [ + "We see that after some few iterations (the results do depend on the learning rate however), we get an error which is rather small." 
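+ {
+ "cell_type": "markdown",
+ "id": "added-gradcheck-scalar",
+ "metadata": {
+ "editable": true
+ },
+ "source": [
+ "As a quick sanity check of the hand-derived gradients, and as a first taste of the **autograd** package\n",
+ "mentioned in the exercise below, the following sketch compares the chain-rule expressions with automatic\n",
+ "differentiation for one arbitrary set of parameters. As in the code above, the output activation is linear,\n",
+ "$a_2=z_2$, so that $\\sigma_2'=1$."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "added-gradcheck-scalar-code",
+ "metadata": {
+ "collapsed": false,
+ "editable": true
+ },
+ "outputs": [],
+ "source": [
+ "import autograd.numpy as anp\n",
+ "from autograd import grad\n",
+ "\n",
+ "def sigmoid(z):\n",
+ "    return 1.0/(1.0 + anp.exp(-z))\n",
+ "\n",
+ "x, y = 4.0, 9.0   # the single data point used above, y = 2x+1\n",
+ "\n",
+ "def cost(params):\n",
+ "    w1, b1, w2, b2 = params[0], params[1], params[2], params[3]\n",
+ "    a1 = sigmoid(w1*x + b1)\n",
+ "    a2 = w2*a1 + b2   # linear output activation\n",
+ "    return 0.5*(a2 - y)**2\n",
+ "\n",
+ "params = anp.array([0.5, 0.01, -0.3, 0.01])\n",
+ "w1, b1, w2, b2 = params\n",
+ "a1 = sigmoid(w1*x + b1)\n",
+ "a2 = w2*a1 + b2\n",
+ "delta2 = a2 - y\n",
+ "delta1 = delta2*w2*a1*(1 - a1)\n",
+ "# chain-rule gradients: dC/dw1, dC/db1, dC/dw2, dC/db2\n",
+ "print(anp.array([delta1*x, delta1, delta2*a1, delta2]))\n",
+ "print(grad(cost)(params))   # should agree to machine precision"
+ ]
+ },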
+ ] + }, + { + "cell_type": "markdown", + "id": "b9b47543", + "metadata": { + "editable": true + }, + "source": [ + "## Exercise 1: Including more data\n", + "\n", + "Try to increase the amount of input and\n", + "target/output data. Try also to perform calculations for more values\n", + "of the learning rates. Feel free to add either hyperparameters with an\n", + "$l_1$ norm or an $l_2$ norm and discuss your results.\n", + "Discuss your results as functions of the amount of training data and various learning rates.\n", + "\n", + "**Challenge:** Try to change the activation functions and replace the hard-coded analytical expressions with automatic derivation via either **autograd** or **JAX**." + ] + }, + { + "cell_type": "markdown", + "id": "3d2a82c9", + "metadata": { + "editable": true + }, + "source": [ + "## Simple neural network and the back propagation equations\n", + "\n", + "Let us now try to increase our level of ambition and attempt at setting \n", + "up the equations for a neural network with two input nodes, one hidden\n", + "layer with two hidden nodes and one output layer with one output node/neuron only (see graph)..\n", + "\n", + "We need to define the following parameters and variables with the input layer (layer $(0)$) \n", + "where we label the nodes $x_0$ and $x_1$" + ] + }, + { + "cell_type": "markdown", + "id": "e2bda122", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "x_0 = a_0^{(0)} \\wedge x_1 = a_1^{(0)}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "d4324d91", + "metadata": { + "editable": true + }, + "source": [ + "The hidden layer (layer $(1)$) has nodes which yield the outputs $a_0^{(1)}$ and $a_1^{(1)}$) with weight $\\boldsymbol{w}$ and bias $\\boldsymbol{b}$ parameters" + ] + }, + { + "cell_type": "markdown", + "id": "b3c0b344", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "w_{ij}^{(1)}=\\left\\{w_{00}^{(1)},w_{01}^{(1)},w_{10}^{(1)},w_{11}^{(1)}\\right\\} \\wedge b^{(1)}=\\left\\{b_0^{(1)},b_1^{(1)}\\right\\}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "fb200d12", + "metadata": { + "editable": true + }, + "source": [ + "## Layout of a simple neural network with two input nodes, one hidden layer and one output node\n", + "\n", + "\n", + "\n", + "\n", + "

    Figure 1: A neural network with two input nodes, one hidden layer with two nodes and one output node.

    \n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "5a7e37cd", + "metadata": { + "editable": true + }, + "source": [ + "## The ouput layer\n", + "\n", + "Finally, we have the ouput layer given by layer label $(2)$ with output $a^{(2)}$ and weights and biases to be determined given by the variables" + ] + }, + { + "cell_type": "markdown", + "id": "11f25dfa", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "w_{i}^{(2)}=\\left\\{w_{0}^{(2)},w_{1}^{(2)}\\right\\} \\wedge b^{(2)}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "8755dbae", + "metadata": { + "editable": true + }, + "source": [ + "Our output is $\\tilde{y}=a^{(2)}$ and we define a generic cost function $C(a^{(2)},y;\\boldsymbol{\\Theta})$ where $y$ is the target value (a scalar here).\n", + "The parameters we need to optimize are given by" + ] + }, + { + "cell_type": "markdown", + "id": "51983594", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{\\Theta}=\\left\\{w_{00}^{(1)},w_{01}^{(1)},w_{10}^{(1)},w_{11}^{(1)},w_{0}^{(2)},w_{1}^{(2)},b_0^{(1)},b_1^{(1)},b^{(2)}\\right\\}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "20a70d90", + "metadata": { + "editable": true + }, + "source": [ + "## Compact expressions\n", + "\n", + "We can define the inputs to the activation functions for the various layers in terms of various matrix-vector multiplications and vector additions.\n", + "The inputs to the first hidden layer are" + ] + }, + { + "cell_type": "markdown", + "id": "76e186dc", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{bmatrix}z_0^{(1)} \\\\ z_1^{(1)} \\end{bmatrix}=\\begin{bmatrix}w_{00}^{(1)} & w_{01}^{(1)}\\\\ w_{10}^{(1)} &w_{11}^{(1)} \\end{bmatrix}\\begin{bmatrix}a_0^{(0)} \\\\ a_1^{(0)} \\end{bmatrix}+\\begin{bmatrix}b_0^{(1)} \\\\ b_1^{(1)} \\end{bmatrix},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "3396d1b9", + "metadata": { + "editable": true + }, + "source": [ + "with outputs" + ] + }, + { + "cell_type": "markdown", + "id": "2f4d2eed", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{bmatrix}a_0^{(1)} \\\\ a_1^{(1)} \\end{bmatrix}=\\begin{bmatrix}\\sigma^{(1)}(z_0^{(1)}) \\\\ \\sigma^{(1)}(z_1^{(1)}) \\end{bmatrix}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "6863edaa", + "metadata": { + "editable": true + }, + "source": [ + "## Output layer\n", + "\n", + "For the final output layer we have the inputs to the final activation function" + ] + }, + { + "cell_type": "markdown", + "id": "569b5a62", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "z^{(2)} = w_{0}^{(2)}a_0^{(1)} +w_{1}^{(2)}a_1^{(1)}+b^{(2)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "88775a53", + "metadata": { + "editable": true + }, + "source": [ + "resulting in the output" + ] + }, + { + "cell_type": "markdown", + "id": "11852c41", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "a^{(2)}=\\sigma^{(2)}(z^{(2)}).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "4e2e26a9", + "metadata": { + "editable": true + }, + "source": [ + "## Explicit derivatives\n", + "\n", + "In total we have nine parameters which we need to train. Using the\n", + "chain rule (or just the back-propagation algorithm) we can find all\n", + "derivatives. 
Since we will use automatic differentiation in reverse\n", + "mode, we start with the derivatives of the cost function with respect\n", + "to the parameters of the output layer, namely" + ] + }, + { + "cell_type": "markdown", + "id": "25da37b5", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial w_{i}^{(2)}}=\\frac{\\partial C}{\\partial a^{(2)}}\\frac{\\partial a^{(2)}}{\\partial z^{(2)}}\\frac{\\partial z^{(2)}}{\\partial w_{i}^{(2)}}=\\delta^{(2)}a_i^{(1)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "4094b188", + "metadata": { + "editable": true + }, + "source": [ + "with" + ] + }, + { + "cell_type": "markdown", + "id": "99f40072", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta^{(2)}=\\frac{\\partial C}{\\partial a^{(2)}}\\frac{\\partial a^{(2)}}{\\partial z^{(2)}}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "a93180cb", + "metadata": { + "editable": true + }, + "source": [ + "and finally" + ] + }, + { + "cell_type": "markdown", + "id": "312c8e22", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial b^{(2)}}=\\frac{\\partial C}{\\partial a^{(2)}}\\frac{\\partial a^{(2)}}{\\partial z^{(2)}}\\frac{\\partial z^{(2)}}{\\partial b^{(2)}}=\\delta^{(2)}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "4db8065c", + "metadata": { + "editable": true + }, + "source": [ + "## Derivatives of the hidden layer\n", + "\n", + "Using the chain rule we have the following expressions for say one of the weight parameters (it is easy to generalize to the other weight parameters)" + ] + }, + { + "cell_type": "markdown", + "id": "316b7cc7", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial w_{00}^{(1)}}=\\frac{\\partial C}{\\partial a^{(2)}}\\frac{\\partial a^{(2)}}{\\partial z^{(2)}}\n", + "\\frac{\\partial z^{(2)}}{\\partial z_0^{(1)}}\\frac{\\partial z_0^{(1)}}{\\partial w_{00}^{(1)}}= \\delta^{(2)}\\frac{\\partial z^{(2)}}{\\partial z_0^{(1)}}\\frac{\\partial z_0^{(1)}}{\\partial w_{00}^{(1)}},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "8ef16e76", + "metadata": { + "editable": true + }, + "source": [ + "which, noting that" + ] + }, + { + "cell_type": "markdown", + "id": "85a0f70d", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "z^{(2)} =w_0^{(2)}a_0^{(1)}+w_1^{(2)}a_1^{(1)}+b^{(2)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "108db06e", + "metadata": { + "editable": true + }, + "source": [ + "allows us to rewrite" + ] + }, + { + "cell_type": "markdown", + "id": "2922e5c6", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial z^{(2)}}{\\partial z_0^{(1)}}\\frac{\\partial z_0^{(1)}}{\\partial w_{00}^{(1)}}=w_0^{(2)}\\frac{\\partial a_0^{(1)}}{\\partial z_0^{(1)}}a_0^{(1)}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "cb6f6fe5", + "metadata": { + "editable": true + }, + "source": [ + "## Final expression\n", + "Defining" + ] + }, + { + "cell_type": "markdown", + "id": "3a0d272d", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_0^{(1)}=w_0^{(2)}\\frac{\\partial a_0^{(1)}}{\\partial z_0^{(1)}}\\delta^{(2)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "70a6cf5c", + "metadata": { + "editable": true + }, + "source": [ + "we have" + ] + }, + { + "cell_type": "markdown", + "id": "a862fb73", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial 
C}{\\partial w_{00}^{(1)}}=\\delta_0^{(1)}a_0^{(1)}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "703fa2c1", + "metadata": { + "editable": true + }, + "source": [ + "Similarly, we obtain" + ] + }, + { + "cell_type": "markdown", + "id": "2032458a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial w_{01}^{(1)}}=\\delta_0^{(1)}a_1^{(1)}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "97d8acd7", + "metadata": { + "editable": true + }, + "source": [ + "## Completing the list\n", + "\n", + "Similarly, we find" + ] + }, + { + "cell_type": "markdown", + "id": "972e5301", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial w_{10}^{(1)}}=\\delta_1^{(1)}a_0^{(1)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "ba8f5955", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "3ac41463", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial w_{11}^{(1)}}=\\delta_1^{(1)}a_1^{(1)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "ab92a69c", + "metadata": { + "editable": true + }, + "source": [ + "where we have defined" + ] + }, + { + "cell_type": "markdown", + "id": "8224b6f2", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_1^{(1)}=w_1^{(2)}\\frac{\\partial a_1^{(1)}}{\\partial z_1^{(1)}}\\delta^{(2)}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b55a566b", + "metadata": { + "editable": true + }, + "source": [ + "## Final expressions for the biases of the hidden layer\n", + "\n", + "For the sake of completeness, we list the derivatives of the biases, which are" + ] + }, + { + "cell_type": "markdown", + "id": "cb5f687e", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial b_{0}^{(1)}}=\\delta_0^{(1)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "6d8361e8", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "ccfb7fa8", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial b_{1}^{(1)}}=\\delta_1^{(1)}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "20fd0aa3", + "metadata": { + "editable": true + }, + "source": [ + "As we will see below, these expressions can be generalized in a more compact form." 
+ ] + }, + { + "cell_type": "markdown", + "id": "6bca7f99", + "metadata": { + "editable": true + }, + "source": [ + "## Gradient expressions\n", + "\n", + "For this specific model, with just one output node and two hidden\n", + "nodes, the gradient descent equations take the following form for output layer" + ] + }, + { + "cell_type": "markdown", + "id": "430e26d8", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "w_{i}^{(2)}\\leftarrow w_{i}^{(2)}- \\eta \\delta^{(2)} a_{i}^{(1)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "ced71f83", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "ec12ee1a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "b^{(2)} \\leftarrow b^{(2)}-\\eta \\delta^{(2)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "f46fe24d", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "af8f924d", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "w_{ij}^{(1)}\\leftarrow w_{ij}^{(1)}- \\eta \\delta_{i}^{(1)} a_{j}^{(0)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "4aeb6140", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "0bc2f26c", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "b_{i}^{(1)} \\leftarrow b_{i}^{(1)}-\\eta \\delta_{i}^{(1)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "7eafd358", + "metadata": { + "editable": true + }, + "source": [ + "where $\\eta$ is the learning rate." + ] + }, + { + "cell_type": "markdown", + "id": "548f58f6", + "metadata": { + "editable": true + }, + "source": [ + "## Exercise 2: Extended program\n", + "\n", + "We extend our simple code to a function which depends on two variable $x_0$ and $x_1$, that is" + ] + }, + { + "cell_type": "markdown", + "id": "4c38514a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y=f(x_0,x_1)=x_0^2+3x_0x_1+x_1^2+5.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "06303245", + "metadata": { + "editable": true + }, + "source": [ + "We feed our network with $n=100$ entries $x_0$ and $x_1$. We have thus two features represented by these variable and an input matrix/design matrix $\\boldsymbol{X}\\in \\mathbf{R}^{n\\times 2}$" + ] + }, + { + "cell_type": "markdown", + "id": "ed0c0029", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{X}=\\begin{bmatrix} x_{00} & x_{01} \\\\ x_{00} & x_{01} \\\\ x_{10} & x_{11} \\\\ x_{20} & x_{21} \\\\ \\dots & \\dots \\\\ \\dots & \\dots \\\\ x_{n-20} & x_{n-21} \\\\ x_{n-10} & x_{n-11} \\end{bmatrix}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "93df389e", + "metadata": { + "editable": true + }, + "source": [ + "Write a code, based on the previous code examples, which takes as input these data and fit the above function.\n", + "You can extend your code to include automatic differentiation.\n", + "\n", + "With these examples, we are now ready to embark upon the writing of more a general code for neural networks." + ] + }, + { + "cell_type": "markdown", + "id": "5df18704", + "metadata": { + "editable": true + }, + "source": [ + "## Getting serious, the back propagation equations for a neural network\n", + "\n", + "Now it is time to move away from one node in each layer only. 
Our inputs are also represented either by several inputs.\n", + "\n", + "We have thus" + ] + }, + { + "cell_type": "markdown", + "id": "ae3765be", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial{\\cal C}((\\boldsymbol{\\Theta}^L)}{\\partial w_{jk}^L} = \\left(a_j^L - y_j\\right)a_j^L(1-a_j^L)a_k^{L-1},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "dd8f7882", + "metadata": { + "editable": true + }, + "source": [ + "Defining" + ] + }, + { + "cell_type": "markdown", + "id": "f204fdd7", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_j^L = a_j^L(1-a_j^L)\\left(a_j^L - y_j\\right) = \\sigma'(z_j^L)\\frac{\\partial {\\cal C}}{\\partial (a_j^L)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "c28e8401", + "metadata": { + "editable": true + }, + "source": [ + "and using the Hadamard product of two vectors we can write this as" + ] + }, + { + "cell_type": "markdown", + "id": "910c4eb1", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{\\delta}^L = \\sigma'(\\hat{z}^L)\\circ\\frac{\\partial {\\cal C}}{\\partial (\\boldsymbol{a}^L)}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "efd2f948", + "metadata": { + "editable": true + }, + "source": [ + "## Analyzing the last results\n", + "\n", + "This is an important expression. The second term on the right handside\n", + "measures how fast the cost function is changing as a function of the $j$th\n", + "output activation. If, for example, the cost function doesn't depend\n", + "much on a particular output node $j$, then $\\delta_j^L$ will be small,\n", + "which is what we would expect. The first term on the right, measures\n", + "how fast the activation function $f$ is changing at a given activation\n", + "value $z_j^L$." + ] + }, + { + "cell_type": "markdown", + "id": "e1eeeba2", + "metadata": { + "editable": true + }, + "source": [ + "## More considerations\n", + "\n", + "Notice that everything in the above equations is easily computed. In\n", + "particular, we compute $z_j^L$ while computing the behaviour of the\n", + "network, and it is only a small additional overhead to compute\n", + "$\\sigma'(z^L_j)$. 
The exact form of the derivative with respect to the\n", + "output depends on the form of the cost function.\n", + "However, provided the cost function is known there should be little\n", + "trouble in calculating" + ] + }, + { + "cell_type": "markdown", + "id": "b5e74c11", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial {\\cal C}}{\\partial (a_j^L)}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "e129fe72", + "metadata": { + "editable": true + }, + "source": [ + "With the definition of $\\delta_j^L$ we have a more compact definition of the derivative of the cost function in terms of the weights, namely" + ] + }, + { + "cell_type": "markdown", + "id": "3879d293", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial{\\cal C}}{\\partial w_{jk}^L} = \\delta_j^La_k^{L-1}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "1ea1da9d", + "metadata": { + "editable": true + }, + "source": [ + "## Derivatives in terms of $z_j^L$\n", + "\n", + "It is also easy to see that our previous equation can be written as" + ] + }, + { + "cell_type": "markdown", + "id": "c7156e16", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_j^L =\\frac{\\partial {\\cal C}}{\\partial z_j^L}= \\frac{\\partial {\\cal C}}{\\partial a_j^L}\\frac{\\partial a_j^L}{\\partial z_j^L},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "8311b4aa", + "metadata": { + "editable": true + }, + "source": [ + "which can also be interpreted as the partial derivative of the cost function with respect to the biases $b_j^L$, namely" + ] + }, + { + "cell_type": "markdown", + "id": "7bb3d820", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_j^L = \\frac{\\partial {\\cal C}}{\\partial b_j^L}\\frac{\\partial b_j^L}{\\partial z_j^L}=\\frac{\\partial {\\cal C}}{\\partial b_j^L},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "1eeb0c00", + "metadata": { + "editable": true + }, + "source": [ + "That is, the error $\\delta_j^L$ is exactly equal to the rate of change of the cost function as a function of the bias." + ] + }, + { + "cell_type": "markdown", + "id": "bc7d3757", + "metadata": { + "editable": true + }, + "source": [ + "## Bringing it together\n", + "\n", + "We have now three equations that are essential for the computations of the derivatives of the cost function at the output layer. These equations are needed to start the algorithm and they are" + ] + }, + { + "cell_type": "markdown", + "id": "9f018cff", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation}\n", + "\\frac{\\partial{\\cal C}(\\hat{W^L})}{\\partial w_{jk}^L} = \\delta_j^La_k^{L-1},\n", + "\\label{_auto1} \\tag{2}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "ebde7551", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "f96aa8f7", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation}\n", + "\\delta_j^L = \\sigma'(z_j^L)\\frac{\\partial {\\cal C}}{\\partial (a_j^L)},\n", + "\\label{_auto2} \\tag{3}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "1215d118", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "c5f6885e", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation}\n", + "\\delta_j^L = \\frac{\\partial {\\cal C}}{\\partial b_j^L},\n", + "\\label{_auto3} \\tag{4}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "1dedde99", + "metadata": { + "editable": true + }, + "source": [ + "## Final back propagating equation\n", + "\n", + "We have that (replacing $L$ with a general layer $l$)" + ] + }, + { + "cell_type": "markdown", + "id": "a182b912", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_j^l =\\frac{\\partial {\\cal C}}{\\partial z_j^l}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "9fcc3201", + "metadata": { + "editable": true + }, + "source": [ + "We want to express this in terms of the equations for layer $l+1$." + ] + }, + { + "cell_type": "markdown", + "id": "54237463", + "metadata": { + "editable": true + }, + "source": [ + "## Using the chain rule and summing over all $k$ entries\n", + "\n", + "We obtain" + ] + }, + { + "cell_type": "markdown", + "id": "dc069f5a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_j^l =\\sum_k \\frac{\\partial {\\cal C}}{\\partial z_k^{l+1}}\\frac{\\partial z_k^{l+1}}{\\partial z_j^{l}}=\\sum_k \\delta_k^{l+1}\\frac{\\partial z_k^{l+1}}{\\partial z_j^{l}},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "71ba0435", + "metadata": { + "editable": true + }, + "source": [ + "and recalling that" + ] + }, + { + "cell_type": "markdown", + "id": "bd00cbe9", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "z_j^{l+1} = \\sum_{i=1}^{M_{l}}w_{ij}^{l+1}a_i^{l}+b_j^{l+1},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "1e7e0241", + "metadata": { + "editable": true + }, + "source": [ + "with $M_l$ being the number of nodes in layer $l$, we obtain" + ] + }, + { + "cell_type": "markdown", + "id": "e8e3697e", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_j^l =\\sum_k \\delta_k^{l+1}w_{kj}^{l+1}\\sigma'(z_j^l),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "7d86a02b", + "metadata": { + "editable": true + }, + "source": [ + "This is our final equation.\n", + "\n", + "We are now ready to set up the algorithm for back propagation and learning the weights and biases." + ] + }, + { + "cell_type": "markdown", + "id": "ff1dc46f", + "metadata": { + "editable": true + }, + "source": [ + "## Setting up the back propagation algorithm\n", + "\n", + "The four equations provide us with a way of computing the gradient of the cost function. Let us write this out in the form of an algorithm.\n", + "\n", + "**First**, we set up the input data $\\hat{x}$ and the activations\n", + "$\\hat{z}_1$ of the input layer and compute the activation function and\n", + "the pertinent outputs $\\hat{a}^1$.\n", + "\n", + "**Secondly**, we perform then the feed forward till we reach the output\n", + "layer and compute all $\\hat{z}_l$ of the input layer and compute the\n", + "activation function and the pertinent outputs $\\hat{a}^l$ for\n", + "$l=1,2,3,\\dots,L$.\n", + "\n", + "**Notation**: The first hidden layer has $l=1$ as label and the final output layer has $l=L$." 
+ ] + }, + { + "cell_type": "markdown", + "id": "1313e6dc", + "metadata": { + "editable": true + }, + "source": [ + "## Setting up the back propagation algorithm, part 2\n", + "\n", + "Thereafter we compute the ouput error $\\hat{\\delta}^L$ by computing all" + ] + }, + { + "cell_type": "markdown", + "id": "74378773", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_j^L = \\sigma'(z_j^L)\\frac{\\partial {\\cal C}}{\\partial (a_j^L)}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "70450254", + "metadata": { + "editable": true + }, + "source": [ + "Then we compute the back propagate error for each $l=L-1,L-2,\\dots,1$ as" + ] + }, + { + "cell_type": "markdown", + "id": "81a28b23", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_j^l = \\sum_k \\delta_k^{l+1}w_{kj}^{l+1}\\sigma'(z_j^l).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "9a733356", + "metadata": { + "editable": true + }, + "source": [ + "## Setting up the Back propagation algorithm, part 3\n", + "\n", + "Finally, we update the weights and the biases using gradient descent\n", + "for each $l=L-1,L-2,\\dots,1$ and update the weights and biases\n", + "according to the rules" + ] + }, + { + "cell_type": "markdown", + "id": "f469f486", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "w_{jk}^l\\leftarrow = w_{jk}^l- \\eta \\delta_j^la_k^{l-1},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "7461e5e6", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "b_j^l \\leftarrow b_j^l-\\eta \\frac{\\partial {\\cal C}}{\\partial b_j^l}=b_j^l-\\eta \\delta_j^l,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "50a1b605", + "metadata": { + "editable": true + }, + "source": [ + "with $\\eta$ being the learning rate." 
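+ {
+ "cell_type": "markdown",
+ "id": "added-backprop-sketch",
+ "metadata": {
+ "editable": true
+ },
+ "source": [
+ "The three steps above translate almost line by line into code. The sketch below is one possible bare-bones\n",
+ "implementation, assuming sigmoid activations in every layer, the squared-error cost, a small randomly\n",
+ "initialized network and a single arbitrary training point; a full feed-forward neural network code is the\n",
+ "topic of the week 42 material."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "added-backprop-sketch-code",
+ "metadata": {
+ "collapsed": false,
+ "editable": true
+ },
+ "outputs": [],
+ "source": [
+ "import numpy as np\n",
+ "\n",
+ "def sigmoid(z):\n",
+ "    return 1.0/(1.0 + np.exp(-z))\n",
+ "\n",
+ "def feed_forward(x, weights, biases):\n",
+ "    a = x\n",
+ "    activations = [a]   # a^0, a^1, ..., a^L\n",
+ "    for W, b in zip(weights, biases):\n",
+ "        a = sigmoid(W @ a + b)\n",
+ "        activations.append(a)\n",
+ "    return activations\n",
+ "\n",
+ "def back_propagation(x, y, weights, biases, eta=0.1):\n",
+ "    activations = feed_forward(x, weights, biases)\n",
+ "    # output error: delta^L = sigma'(z^L)*(a^L - y), with sigma'(z) = a(1-a)\n",
+ "    aL = activations[-1]\n",
+ "    delta = aL*(1 - aL)*(aL - y)\n",
+ "    for l in range(len(weights) - 1, -1, -1):\n",
+ "        grad_W = np.outer(delta, activations[l])   # dC/dw_jk = delta_j a_k\n",
+ "        grad_b = delta                             # dC/db_j  = delta_j\n",
+ "        if l > 0:\n",
+ "            # back propagate: delta^l = (W^(l+1))^T delta^(l+1) * sigma'(z^l)\n",
+ "            al = activations[l]\n",
+ "            delta = (weights[l].T @ delta)*al*(1 - al)\n",
+ "        weights[l] -= eta*grad_W\n",
+ "        biases[l] -= eta*grad_b\n",
+ "\n",
+ "# two inputs, one hidden layer with three nodes, one output\n",
+ "rng = np.random.default_rng(2025)\n",
+ "weights = [rng.standard_normal((3, 2)), rng.standard_normal((1, 3))]\n",
+ "biases = [np.zeros(3), np.zeros(1)]\n",
+ "x = np.array([0.5, -0.2])\n",
+ "y = np.array([0.7])\n",
+ "for _ in range(1000):\n",
+ "    back_propagation(x, y, weights, biases)\n",
+ "print(feed_forward(x, weights, biases)[-1])   # should be close to y"
+ ]
+ },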
+ ] + }, + { + "cell_type": "markdown", + "id": "0cebce43", + "metadata": { + "editable": true + }, + "source": [ + "## Updating the gradients\n", + "\n", + "With the back propagate error for each $l=L-1,L-2,\\dots,1$ as" + ] + }, + { + "cell_type": "markdown", + "id": "2e4405bd", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_j^l = \\sum_k \\delta_k^{l+1}w_{kj}^{l+1}sigma'(z_j^l),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "2920aa4e", + "metadata": { + "editable": true + }, + "source": [ + "we update the weights and the biases using gradient descent for each $l=L-1,L-2,\\dots,1$ and update the weights and biases according to the rules" + ] + }, + { + "cell_type": "markdown", + "id": "bc4357b0", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "w_{jk}^l\\leftarrow = w_{jk}^l- \\eta \\delta_j^la_k^{l-1},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "d9b66569", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "b_j^l \\leftarrow b_j^l-\\eta \\frac{\\partial {\\cal C}}{\\partial b_j^l}=b_j^l-\\eta \\delta_j^l,\n", + "$$" + ] + } + ], + "metadata": {}, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/doc/LectureNotes/week42.ipynb b/doc/LectureNotes/week42.ipynb new file mode 100644 index 000000000..45a126e79 --- /dev/null +++ b/doc/LectureNotes/week42.ipynb @@ -0,0 +1,5952 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "d231eeee", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "5e782cb1", + "metadata": { + "editable": true + }, + "source": [ + "# Week 42 Constructing a Neural Network code with examples\n", + "**Morten Hjorth-Jensen**, Department of Physics, University of Oslo, Norway\n", + "\n", + "Date: **October 13-17, 2025**" + ] + }, + { + "cell_type": "markdown", + "id": "53309290", + "metadata": { + "editable": true + }, + "source": [ + "## Lecture October 13, 2025\n", + "1. Building our own Feed-forward Neural Network and discussion of project 2\n", + "\n", + "2. Project 2 is available at " + ] + }, + { + "cell_type": "markdown", + "id": "71367514", + "metadata": { + "editable": true + }, + "source": [ + "## Readings and videos\n", + "1. These lecture notes\n", + "\n", + "2. Video of lecture at \n", + "\n", + "3. Whiteboard notes at \n", + "\n", + "4. For a more in depth discussion on neural networks we recommend Goodfellow et al chapters 6 and 7. For the optimization part, see chapter 8. \n", + "\n", + "5. Neural Networks demystified at \n", + "\n", + "6. Building Neural Networks from scratch at \n", + "\n", + "7. Video on Neural Networks at \n", + "\n", + "8. Video on the back propagation algorithm at \n", + "\n", + "I also recommend Michael Nielsen's intuitive approach to the neural networks and the universal approximation theorem, see the slides at ." + ] + }, + { + "cell_type": "markdown", + "id": "c7be87be", + "metadata": { + "editable": true + }, + "source": [ + "## Material for the lab sessions on Tuesday and Wednesday\n", + "1. Exercises on writing a code for neural networks, back propagation part, see exercises for week 42 at \n", + "\n", + "2. Discussion of project 2" + ] + }, + { + "cell_type": "markdown", + "id": "8e0567a2", + "metadata": { + "editable": true + }, + "source": [ + "## Lecture material: Writing a code which implements a feed-forward neural network\n", + "\n", + "Last week we discussed the basics of neural networks and deep learning\n", + "and the basics of automatic differentiation. 
We looked also at\n", + "examples on how compute the parameters of a simple network with scalar\n", + "inputs and ouputs and no or just one hidden layers.\n", + "\n", + "We ended our discussions with the derivation of the equations for a\n", + "neural network with one hidden layers and two input variables and two\n", + "hidden nodes but only one output node. We did almost finish the derivation of the back propagation algorithm." + ] + }, + { + "cell_type": "markdown", + "id": "549dcc05", + "metadata": { + "editable": true + }, + "source": [ + "## Mathematics of deep learning\n", + "\n", + "**Two recent books online.**\n", + "\n", + "1. [The Modern Mathematics of Deep Learning, by Julius Berner, Philipp Grohs, Gitta Kutyniok, Philipp Petersen](https://arxiv.org/abs/2105.04026), published as [Mathematical Aspects of Deep Learning, pp. 1-111. Cambridge University Press, 2022](https://doi.org/10.1017/9781009025096.002)\n", + "\n", + "2. [Mathematical Introduction to Deep Learning: Methods, Implementations, and Theory, Arnulf Jentzen, Benno Kuckuck, Philippe von Wurstemberger](https://doi.org/10.48550/arXiv.2310.20360)" + ] + }, + { + "cell_type": "markdown", + "id": "21203bae", + "metadata": { + "editable": true + }, + "source": [ + "## Reminder on books with hands-on material and codes\n", + "* [Sebastian Rashcka et al, Machine learning with Sickit-Learn and PyTorch](https://sebastianraschka.com/blog/2022/ml-pytorch-book.html)" + ] + }, + { + "cell_type": "markdown", + "id": "1c102a30", + "metadata": { + "editable": true + }, + "source": [ + "## Reading recommendations\n", + "\n", + "1. Rashkca et al., chapter 11, jupyter-notebook sent separately, from [GitHub](https://github.com/rasbt/machine-learning-book)\n", + "\n", + "2. Goodfellow et al, chapter 6 and 7 contain most of the neural network background." + ] + }, + { + "cell_type": "markdown", + "id": "53f11afe", + "metadata": { + "editable": true + }, + "source": [ + "## Reminder from last week: First network example, simple percepetron with one input\n", + "\n", + "As yet another example we define now a simple perceptron model with\n", + "all quantities given by scalars. We consider only one input variable\n", + "$x$ and one target value $y$. We define an activation function\n", + "$\\sigma_1$ which takes as input" + ] + }, + { + "cell_type": "markdown", + "id": "afa8c42a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "z_1 = w_1x+b_1,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "cb5c959f", + "metadata": { + "editable": true + }, + "source": [ + "where $w_1$ is the weight and $b_1$ is the bias. These are the\n", + "parameters we want to optimize. The output is $a_1=\\sigma(z_1)$ (see\n", + "graph from whiteboard notes). This output is then fed into the\n", + "**cost/loss** function, which we here for the sake of simplicity just\n", + "define as the squared error" + ] + }, + { + "cell_type": "markdown", + "id": "0083ae15", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C(x;w_1,b_1)=\\frac{1}{2}(a_1-y)^2.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "f4931203", + "metadata": { + "editable": true + }, + "source": [ + "## Layout of a simple neural network with no hidden layer\n", + "\n", + "\n", + "\n", + "\n", + "

    Figure 1: A simple neural network with no hidden layer.
    \n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "d3a3754d", + "metadata": { + "editable": true + }, + "source": [ + "## Optimizing the parameters\n", + "\n", + "In setting up the feed forward and back propagation parts of the\n", + "algorithm, we need now the derivative of the various variables we want\n", + "to train.\n", + "\n", + "We need" + ] + }, + { + "cell_type": "markdown", + "id": "bcd5dbab", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial w_1} \\hspace{0.1cm}\\mathrm{and}\\hspace{0.1cm}\\frac{\\partial C}{\\partial b_1}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "2cbc30f1", + "metadata": { + "editable": true + }, + "source": [ + "Using the chain rule we find" + ] + }, + { + "cell_type": "markdown", + "id": "1a1d803d", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial w_1}=\\frac{\\partial C}{\\partial a_1}\\frac{\\partial a_1}{\\partial z_1}\\frac{\\partial z_1}{\\partial w_1}=(a_1-y)\\sigma_1'x,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "776735c7", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "c1a2e5af", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial b_1}=\\frac{\\partial C}{\\partial a_1}\\frac{\\partial a_1}{\\partial z_1}\\frac{\\partial z_1}{\\partial b_1}=(a_1-y)\\sigma_1',\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "9e603df9", + "metadata": { + "editable": true + }, + "source": [ + "which we later will just define as" + ] + }, + { + "cell_type": "markdown", + "id": "533212cd", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial a_1}\\frac{\\partial a_1}{\\partial z_1}=\\delta_1.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "09d91067", + "metadata": { + "editable": true + }, + "source": [ + "## Adding a hidden layer\n", + "\n", + "We change our simple model to (see graph)\n", + "a network with just one hidden layer but with scalar variables only.\n", + "\n", + "Our output variable changes to $a_2$ and $a_1$ is now the output from the hidden node and $a_0=x$.\n", + "We have then" + ] + }, + { + "cell_type": "markdown", + "id": "f767afe7", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "z_1 = w_1a_0+b_1 \\hspace{0.1cm} \\wedge a_1 = \\sigma_1(z_1),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "f38ded54", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "z_2 = w_2a_1+b_2 \\hspace{0.1cm} \\wedge a_2 = \\sigma_2(z_2),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "f3f03bc3", + "metadata": { + "editable": true + }, + "source": [ + "and the cost function" + ] + }, + { + "cell_type": "markdown", + "id": "9062730e", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C(x;\\boldsymbol{\\Theta})=\\frac{1}{2}(a_2-y)^2,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "75bbc32c", + "metadata": { + "editable": true + }, + "source": [ + "with $\\boldsymbol{\\Theta}=[w_1,w_2,b_1,b_2]$." + ] + }, + { + "cell_type": "markdown", + "id": "fcf02dbf", + "metadata": { + "editable": true + }, + "source": [ + "## Layout of a simple neural network with one hidden layer\n", + "\n", + "\n", + "\n", + "\n", + "

    Figure 1: A simple neural network with one hidden layer.
    \n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "aa97678f", + "metadata": { + "editable": true + }, + "source": [ + "## The derivatives\n", + "\n", + "The derivatives are now, using the chain rule again" + ] + }, + { + "cell_type": "markdown", + "id": "98f68e27", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial w_2}=\\frac{\\partial C}{\\partial a_2}\\frac{\\partial a_2}{\\partial z_2}\\frac{\\partial z_2}{\\partial w_2}=(a_2-y)\\sigma_2'a_1=\\delta_2a_1,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "c4528178", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial b_2}=\\frac{\\partial C}{\\partial a_2}\\frac{\\partial a_2}{\\partial z_2}\\frac{\\partial z_2}{\\partial b_2}=(a_2-y)\\sigma_2'=\\delta_2,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "d6304298", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial w_1}=\\frac{\\partial C}{\\partial a_2}\\frac{\\partial a_2}{\\partial z_2}\\frac{\\partial z_2}{\\partial a_1}\\frac{\\partial a_1}{\\partial z_1}\\frac{\\partial z_1}{\\partial w_1}=(a_2-y)\\sigma_2'a_1\\sigma_1'a_0,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "dfc47ba6", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial b_1}=\\frac{\\partial C}{\\partial a_2}\\frac{\\partial a_2}{\\partial z_2}\\frac{\\partial z_2}{\\partial a_1}\\frac{\\partial a_1}{\\partial z_1}\\frac{\\partial z_1}{\\partial b_1}=(a_2-y)\\sigma_2'\\sigma_1'=\\delta_1.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "8834c3dc", + "metadata": { + "editable": true + }, + "source": [ + "Can you generalize this to more than one hidden layer?" + ] + }, + { + "cell_type": "markdown", + "id": "40956770", + "metadata": { + "editable": true + }, + "source": [ + "## Important observations\n", + "\n", + "From the above equations we see that the derivatives of the activation\n", + "functions play a central role. If they vanish, the training may\n", + "stop. This is called the vanishing gradient problem, see discussions below. If they become\n", + "large, the parameters $w_i$ and $b_i$ may simply go to infinity. This\n", + "is referenced as the exploding gradient problem." + ] + }, + { + "cell_type": "markdown", + "id": "69e7fdcf", + "metadata": { + "editable": true + }, + "source": [ + "## The training\n", + "\n", + "The training of the parameters is done through various gradient descent approximations with" + ] + }, + { + "cell_type": "markdown", + "id": "726d4c90", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "w_{i}\\leftarrow w_{i}- \\eta \\delta_i a_{i-1},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "0ee83d1c", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "f5b3b5a5", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "b_i \\leftarrow b_i-\\eta \\delta_i,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b2746792", + "metadata": { + "editable": true + }, + "source": [ + "with $\\eta$ is the learning rate.\n", + "\n", + "One iteration consists of one feed forward step and one back-propagation step. Each back-propagation step does one update of the parameters $\\boldsymbol{\\Theta}$.\n", + "\n", + "For the first hidden layer $a_{i-1}=a_0=x$ for this simple model." 
+ ] + }, + { + "cell_type": "markdown", + "id": "76e2e41a", + "metadata": { + "editable": true + }, + "source": [ + "## Code example\n", + "\n", + "The code here implements the above model with one hidden layer and\n", + "scalar variables for the same function we studied in the previous\n", + "example. The code is however set up so that we can add multiple\n", + "inputs $x$ and target values $y$. Note also that we have the\n", + "possibility of defining a feature matrix $\\boldsymbol{X}$ with more than just\n", + "one column for the input values. This will turn useful in our next example. We have also defined matrices and vectors for all of our operations although it is not necessary here." + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "1c4719c1", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import numpy as np\n", + "# We use the Sigmoid function as activation function\n", + "def sigmoid(z):\n", + " return 1.0/(1.0+np.exp(-z))\n", + "\n", + "def forwardpropagation(x):\n", + " # weighted sum of inputs to the hidden layer\n", + " z_1 = np.matmul(x, w_1) + b_1\n", + " # activation in the hidden layer\n", + " a_1 = sigmoid(z_1)\n", + " # weighted sum of inputs to the output layer\n", + " z_2 = np.matmul(a_1, w_2) + b_2\n", + " a_2 = z_2\n", + " return a_1, a_2\n", + "\n", + "def backpropagation(x, y):\n", + " a_1, a_2 = forwardpropagation(x)\n", + " # parameter delta for the output layer, note that a_2=z_2 and its derivative wrt z_2 is just 1\n", + " delta_2 = a_2 - y\n", + " print(0.5*((a_2-y)**2))\n", + " # delta for the hidden layer\n", + " delta_1 = np.matmul(delta_2, w_2.T) * a_1 * (1 - a_1)\n", + " # gradients for the output layer\n", + " output_weights_gradient = np.matmul(a_1.T, delta_2)\n", + " output_bias_gradient = np.sum(delta_2, axis=0)\n", + " # gradient for the hidden layer\n", + " hidden_weights_gradient = np.matmul(x.T, delta_1)\n", + " hidden_bias_gradient = np.sum(delta_1, axis=0)\n", + " return output_weights_gradient, output_bias_gradient, hidden_weights_gradient, hidden_bias_gradient\n", + "\n", + "\n", + "# ensure the same random numbers appear every time\n", + "np.random.seed(0)\n", + "# Input variable\n", + "x = np.array([4.0],dtype=np.float64)\n", + "# Target values\n", + "y = 2*x+1.0 \n", + "\n", + "# Defining the neural network, only scalars here\n", + "n_inputs = x.shape\n", + "n_features = 1\n", + "n_hidden_neurons = 1\n", + "n_outputs = 1\n", + "\n", + "# Initialize the network\n", + "# weights and bias in the hidden layer\n", + "w_1 = np.random.randn(n_features, n_hidden_neurons)\n", + "b_1 = np.zeros(n_hidden_neurons) + 0.01\n", + "\n", + "# weights and bias in the output layer\n", + "w_2 = np.random.randn(n_hidden_neurons, n_outputs)\n", + "b_2 = np.zeros(n_outputs) + 0.01\n", + "\n", + "eta = 0.1\n", + "for i in range(50):\n", + " # calculate gradients\n", + " derivW2, derivB2, derivW1, derivB1 = backpropagation(x, y)\n", + " # update weights and biases\n", + " w_2 -= eta * derivW2\n", + " b_2 -= eta * derivB2\n", + " w_1 -= eta * derivW1\n", + " b_1 -= eta * derivB1" + ] + }, + { + "cell_type": "markdown", + "id": "debaaadc", + "metadata": { + "editable": true + }, + "source": [ + "We see that after some few iterations (the results do depend on the learning rate however), we get an error which is rather small." 
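To see this explicitly, you can for instance print the final prediction after the training loop has finished, using the function and variables defined in the code above:

```python
# after the training loop above has run
_, prediction = forwardpropagation(x)
print(f"prediction = {prediction}, target = {y}, squared error = {0.5*(prediction - y)**2}")
```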
+ ] + }, + { + "cell_type": "markdown", + "id": "7d576f19", + "metadata": { + "editable": true + }, + "source": [ + "## Simple neural network and the back propagation equations\n", + "\n", + "Let us now try to increase our level of ambition and attempt at setting \n", + "up the equations for a neural network with two input nodes, one hidden\n", + "layer with two hidden nodes and one output layer with one output node/neuron only (see graph)..\n", + "\n", + "We need to define the following parameters and variables with the input layer (layer $(0)$) \n", + "where we label the nodes $x_1$ and $x_2$" + ] + }, + { + "cell_type": "markdown", + "id": "582b3b43", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "x_1 = a_1^{(0)} \\wedge x_2 = a_2^{(0)}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "c8eace47", + "metadata": { + "editable": true + }, + "source": [ + "The hidden layer (layer $(1)$) has nodes which yield the outputs $a_1^{(1)}$ and $a_2^{(1)}$) with weight $\\boldsymbol{w}$ and bias $\\boldsymbol{b}$ parameters" + ] + }, + { + "cell_type": "markdown", + "id": "81ec9945", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "w_{ij}^{(1)}=\\left\\{w_{11}^{(1)},w_{12}^{(1)},w_{21}^{(1)},w_{22}^{(1)}\\right\\} \\wedge b^{(1)}=\\left\\{b_1^{(1)},b_2^{(1)}\\right\\}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "c35e1f69", + "metadata": { + "editable": true + }, + "source": [ + "## Layout of a simple neural network with two input nodes, one hidden layer with two hidden noeds and one output node\n", + "\n", + "\n", + "\n", + "\n", + "

    Figure 1: A neural network with two input nodes, one hidden layer with two hidden nodes and one output node.
    \n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "05b8eea9", + "metadata": { + "editable": true + }, + "source": [ + "## The ouput layer\n", + "\n", + "We have the ouput layer given by layer label $(2)$ with output $a^{(2)}$ and weights and biases to be determined given by the variables" + ] + }, + { + "cell_type": "markdown", + "id": "7ef9cb55", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "w_{i}^{(2)}=\\left\\{w_{1}^{(2)},w_{2}^{(2)}\\right\\} \\wedge b^{(2)}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "1eb5c5ac", + "metadata": { + "editable": true + }, + "source": [ + "Our output is $\\tilde{y}=a^{(2)}$ and we define a generic cost function $C(a^{(2)},y;\\boldsymbol{\\Theta})$ where $y$ is the target value (a scalar here).\n", + "The parameters we need to optimize are given by" + ] + }, + { + "cell_type": "markdown", + "id": "00492358", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{\\Theta}=\\left\\{w_{11}^{(1)},w_{12}^{(1)},w_{21}^{(1)},w_{22}^{(1)},w_{1}^{(2)},w_{2}^{(2)},b_1^{(1)},b_2^{(1)},b^{(2)}\\right\\}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "45cca5aa", + "metadata": { + "editable": true + }, + "source": [ + "## Compact expressions\n", + "\n", + "We can define the inputs to the activation functions for the various layers in terms of various matrix-vector multiplications and vector additions.\n", + "The inputs to the first hidden layer are" + ] + }, + { + "cell_type": "markdown", + "id": "22cfb40b", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{bmatrix}z_1^{(1)} \\\\ z_2^{(1)} \\end{bmatrix}=\\left(\\begin{bmatrix}w_{11}^{(1)} & w_{12}^{(1)}\\\\ w_{21}^{(1)} &w_{22}^{(1)} \\end{bmatrix}\\right)^{T}\\begin{bmatrix}a_1^{(0)} \\\\ a_2^{(0)} \\end{bmatrix}+\\begin{bmatrix}b_1^{(1)} \\\\ b_2^{(1)} \\end{bmatrix},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "45b30d06", + "metadata": { + "editable": true + }, + "source": [ + "with outputs" + ] + }, + { + "cell_type": "markdown", + "id": "ebd6a7a5", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{bmatrix}a_1^{(1)} \\\\ a_2^{(1)} \\end{bmatrix}=\\begin{bmatrix}\\sigma^{(1)}(z_1^{(1)}) \\\\ \\sigma^{(1)}(z_2^{(1)}) \\end{bmatrix}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "659dd686", + "metadata": { + "editable": true + }, + "source": [ + "## Output layer\n", + "\n", + "For the final output layer we have the inputs to the final activation function" + ] + }, + { + "cell_type": "markdown", + "id": "34a1d4ca", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "z^{(2)} = w_{1}^{(2)}a_1^{(1)} +w_{2}^{(2)}a_2^{(1)}+b^{(2)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "34471712", + "metadata": { + "editable": true + }, + "source": [ + "resulting in the output" + ] + }, + { + "cell_type": "markdown", + "id": "0b3a74fd", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "a^{(2)}=\\sigma^{(2)}(z^{(2)}).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "1a5bdab3", + "metadata": { + "editable": true + }, + "source": [ + "## Explicit derivatives\n", + "\n", + "In total we have nine parameters which we need to train. Using the\n", + "chain rule (or just the back-propagation algorithm) we can find all\n", + "derivatives. 
Since we will use automatic differentiation in reverse\n", + "mode, we start with the derivatives of the cost function with respect\n", + "to the parameters of the output layer, namely" + ] + }, + { + "cell_type": "markdown", + "id": "37f19e78", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial w_{i}^{(2)}}=\\frac{\\partial C}{\\partial a^{(2)}}\\frac{\\partial a^{(2)}}{\\partial z^{(2)}}\\frac{\\partial z^{(2)}}{\\partial w_{i}^{(2)}}=\\delta^{(2)}a_i^{(1)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "5505aab8", + "metadata": { + "editable": true + }, + "source": [ + "with" + ] + }, + { + "cell_type": "markdown", + "id": "d55d045c", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta^{(2)}=\\frac{\\partial C}{\\partial a^{(2)}}\\frac{\\partial a^{(2)}}{\\partial z^{(2)}}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "04f101e7", + "metadata": { + "editable": true + }, + "source": [ + "and finally" + ] + }, + { + "cell_type": "markdown", + "id": "bfab2e91", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial b^{(2)}}=\\frac{\\partial C}{\\partial a^{(2)}}\\frac{\\partial a^{(2)}}{\\partial z^{(2)}}\\frac{\\partial z^{(2)}}{\\partial b^{(2)}}=\\delta^{(2)}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "77f35b7e", + "metadata": { + "editable": true + }, + "source": [ + "## Derivatives of the hidden layer\n", + "\n", + "Using the chain rule we have the following expressions for say one of the weight parameters (it is easy to generalize to the other weight parameters)" + ] + }, + { + "cell_type": "markdown", + "id": "8cf4a606", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial w_{11}^{(1)}}=\\frac{\\partial C}{\\partial a^{(2)}}\\frac{\\partial a^{(2)}}{\\partial z^{(2)}}\n", + "\\frac{\\partial z^{(2)}}{\\partial z_1^{(1)}}\\frac{\\partial z_1^{(1)}}{\\partial w_{11}^{(1)}}= \\delta^{(2)}\\frac{\\partial z^{(2)}}{\\partial z_1^{(1)}}\\frac{\\partial z_1^{(1)}}{\\partial w_{11}^{(1)}},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "86951351", + "metadata": { + "editable": true + }, + "source": [ + "which, noting that" + ] + }, + { + "cell_type": "markdown", + "id": "73414e65", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "z^{(2)} =w_1^{(2)}a_1^{(1)}+w_2^{(2)}a_2^{(1)}+b^{(2)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "8f0aaa15", + "metadata": { + "editable": true + }, + "source": [ + "allows us to rewrite" + ] + }, + { + "cell_type": "markdown", + "id": "730c5415", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial z^{(2)}}{\\partial z_1^{(1)}}\\frac{\\partial z_1^{(1)}}{\\partial w_{11}^{(1)}}=w_1^{(2)}\\frac{\\partial a_1^{(1)}}{\\partial z_1^{(1)}}a_1^{(1)}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "1afcb5a1", + "metadata": { + "editable": true + }, + "source": [ + "## Final expression\n", + "Defining" + ] + }, + { + "cell_type": "markdown", + "id": "7f30cb44", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_1^{(1)}=w_1^{(2)}\\frac{\\partial a_1^{(1)}}{\\partial z_1^{(1)}}\\delta^{(2)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "14c045ce", + "metadata": { + "editable": true + }, + "source": [ + "we have" + ] + }, + { + "cell_type": "markdown", + "id": "0c1a2c68", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial 
C}{\\partial w_{11}^{(1)}}=\\delta_1^{(1)}a_1^{(1)}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "a3385222", + "metadata": { + "editable": true + }, + "source": [ + "Similarly, we obtain" + ] + }, + { + "cell_type": "markdown", + "id": "18ee3804", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial w_{12}^{(1)}}=\\delta_1^{(1)}a_2^{(1)}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "ad741d56", + "metadata": { + "editable": true + }, + "source": [ + "## Completing the list\n", + "\n", + "Similarly, we find" + ] + }, + { + "cell_type": "markdown", + "id": "65870a70", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial w_{21}^{(1)}}=\\delta_2^{(1)}a_1^{(1)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "f7807fdc", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "9af4a759", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial w_{22}^{(1)}}=\\delta_2^{(1)}a_2^{(1)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "dc548cb7", + "metadata": { + "editable": true + }, + "source": [ + "where we have defined" + ] + }, + { + "cell_type": "markdown", + "id": "83b75e94", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_2^{(1)}=w_2^{(2)}\\frac{\\partial a_2^{(1)}}{\\partial z_2^{(1)}}\\delta^{(2)}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "1c2be559", + "metadata": { + "editable": true + }, + "source": [ + "## Final expressions for the biases of the hidden layer\n", + "\n", + "For the sake of completeness, we list the derivatives of the biases, which are" + ] + }, + { + "cell_type": "markdown", + "id": "18b85f86", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial b_{1}^{(1)}}=\\delta_1^{(1)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "63e39eb4", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "a55371c1", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial C}{\\partial b_{2}^{(1)}}=\\delta_2^{(1)}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "fa31a9b3", + "metadata": { + "editable": true + }, + "source": [ + "As we will see below, these expressions can be generalized in a more compact form." 
+ ] + }, + { + "cell_type": "markdown", + "id": "580df891", + "metadata": { + "editable": true + }, + "source": [ + "## Gradient expressions\n", + "\n", + "For this specific model, with just one output node and two hidden\n", + "nodes, the gradient descent equations take the following form for output layer" + ] + }, + { + "cell_type": "markdown", + "id": "c10bf2ce", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "w_{i}^{(2)}\\leftarrow w_{i}^{(2)}- \\eta \\delta^{(2)} a_{i}^{(1)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "0bae11f8", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "ed4a8b93", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "b^{(2)} \\leftarrow b^{(2)}-\\eta \\delta^{(2)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "2d582987", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "5fa760a1", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "w_{ij}^{(1)}\\leftarrow w_{ij}^{(1)}- \\eta \\delta_{i}^{(1)} a_{j}^{(0)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "bc9de8bf", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "f00e3ace", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "b_{i}^{(1)} \\leftarrow b_{i}^{(1)}-\\eta \\delta_{i}^{(1)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "7ac96362", + "metadata": { + "editable": true + }, + "source": [ + "where $\\eta$ is the learning rate." + ] + }, + { + "cell_type": "markdown", + "id": "9c46f966", + "metadata": { + "editable": true + }, + "source": [ + "## Setting up the equations for a neural network\n", + "\n", + "The questions we want to ask are how do changes in the biases and the\n", + "weights in our network change the cost function and how can we use the\n", + "final output to modify the weights and biases?\n", + "\n", + "To derive these equations let us start with a plain regression problem\n", + "and define our cost function as" + ] + }, + { + "cell_type": "markdown", + "id": "ea509b11", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "{\\cal C}(\\boldsymbol{\\Theta}) = \\frac{1}{2}\\sum_{i=1}^n\\left(y_i - \\tilde{y}_i\\right)^2,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "e08ff771", + "metadata": { + "editable": true + }, + "source": [ + "where the $y_i$s are our $n$ targets (the values we want to\n", + "reproduce), while the outputs of the network after having propagated\n", + "all inputs $\\boldsymbol{x}$ are given by $\\boldsymbol{\\tilde{y}}_i$." + ] + }, + { + "cell_type": "markdown", + "id": "6f476983", + "metadata": { + "editable": true + }, + "source": [ + "## Layout of a neural network with three hidden layers (last layer = $l=L=4$, first layer $l=0$)\n", + "\n", + "\n", + "\n", + "\n", + "

    Figure 1: A neural network with three hidden layers; the first layer is $l=0$ and the last layer is $l=L=4$.
    \n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "0535d087", + "metadata": { + "editable": true + }, + "source": [ + "## Definitions\n", + "\n", + "With our definition of the targets $\\boldsymbol{y}$, the outputs of the\n", + "network $\\boldsymbol{\\tilde{y}}$ and the inputs $\\boldsymbol{x}$ we\n", + "define now the activation $z_j^l$ of node/neuron/unit $j$ of the\n", + "$l$-th layer as a function of the bias, the weights which add up from\n", + "the previous layer $l-1$ and the forward passes/outputs\n", + "$\\boldsymbol{a}^{l-1}$ from the previous layer as" + ] + }, + { + "cell_type": "markdown", + "id": "5e024ec1", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "z_j^l = \\sum_{i=1}^{M_{l-1}}w_{ij}^la_i^{l-1}+b_j^l,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "239fb4c6", + "metadata": { + "editable": true + }, + "source": [ + "where $b_k^l$ are the biases from layer $l$. Here $M_{l-1}$\n", + "represents the total number of nodes/neurons/units of layer $l-1$. The\n", + "figure in the whiteboard notes illustrates this equation. We can rewrite this in a more\n", + "compact form as the matrix-vector products we discussed earlier," + ] + }, + { + "cell_type": "markdown", + "id": "7e4fa6c5", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{z}^l = \\left(\\boldsymbol{W}^l\\right)^T\\boldsymbol{a}^{l-1}+\\boldsymbol{b}^l.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "c47cc3c6", + "metadata": { + "editable": true + }, + "source": [ + "## Inputs to the activation function\n", + "\n", + "With the activation values $\\boldsymbol{z}^l$ we can in turn define the\n", + "output of layer $l$ as $\\boldsymbol{a}^l = \\sigma(\\boldsymbol{z}^l)$ where $\\sigma$ is our\n", + "activation function. In the examples here we will use the sigmoid\n", + "function discussed in our logistic regression lectures. We will also use the same activation function $\\sigma$ for all layers\n", + "and their nodes. It means we have" + ] + }, + { + "cell_type": "markdown", + "id": "4eb89f11", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "a_j^l = \\sigma(z_j^l) = \\frac{1}{1+\\exp{-(z_j^l)}}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "92744a90", + "metadata": { + "editable": true + }, + "source": [ + "## Layout of input to first hidden layer $l=1$ from input layer $l=0$\n", + "\n", + "\n", + "\n", + "\n", + "

    Figure 1: Input to the first hidden layer $l=1$ from the input layer $l=0$.
    \n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "35424d45", + "metadata": { + "editable": true + }, + "source": [ + "## Derivatives and the chain rule\n", + "\n", + "From the definition of the input variable to the activation function, that is $z_j^l$ we have" + ] + }, + { + "cell_type": "markdown", + "id": "b8502930", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial z_j^l}{\\partial w_{ij}^l} = a_i^{l-1},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "81ad45a5", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "11bb8afb", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial z_j^l}{\\partial a_i^{l-1}} = w_{ji}^l.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b53ec752", + "metadata": { + "editable": true + }, + "source": [ + "With our definition of the activation function we have that (note that this function depends only on $z_j^l$)" + ] + }, + { + "cell_type": "markdown", + "id": "b7519a84", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial a_j^l}{\\partial z_j^{l}} = a_j^l(1-a_j^l)=\\sigma(z_j^l)(1-\\sigma(z_j^l)).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "c57689db", + "metadata": { + "editable": true + }, + "source": [ + "## Derivative of the cost function\n", + "\n", + "With these definitions we can now compute the derivative of the cost function in terms of the weights.\n", + "\n", + "Let us specialize to the output layer $l=L$. Our cost function is" + ] + }, + { + "cell_type": "markdown", + "id": "a9f83b15", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "{\\cal C}(\\boldsymbol{\\Theta}^L) = \\frac{1}{2}\\sum_{i=1}^n\\left(y_i - \\tilde{y}_i\\right)^2=\\frac{1}{2}\\sum_{i=1}^n\\left(a_i^L - y_i\\right)^2,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "067c2583", + "metadata": { + "editable": true + }, + "source": [ + "The derivative of this function with respect to the weights is" + ] + }, + { + "cell_type": "markdown", + "id": "43545710", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial{\\cal C}(\\boldsymbol{\\Theta}^L)}{\\partial w_{ij}^L} = \\left(a_j^L - y_j\\right)\\frac{\\partial a_j^L}{\\partial w_{ij}^{L}},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "1eb33717", + "metadata": { + "editable": true + }, + "source": [ + "The last partial derivative can easily be computed and reads (by applying the chain rule)" + ] + }, + { + "cell_type": "markdown", + "id": "e09a8734", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial a_j^L}{\\partial w_{ij}^{L}} = \\frac{\\partial a_j^L}{\\partial z_{j}^{L}}\\frac{\\partial z_j^L}{\\partial w_{ij}^{L}}=a_j^L(1-a_j^L)a_i^{L-1}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "3dc0f5a3", + "metadata": { + "editable": true + }, + "source": [ + "## The back propagation equations for a neural network\n", + "\n", + "We have thus" + ] + }, + { + "cell_type": "markdown", + "id": "bb58784b", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial{\\cal C}((\\boldsymbol{\\Theta}^L)}{\\partial w_{ij}^L} = \\left(a_j^L - y_j\\right)a_j^L(1-a_j^L)a_i^{L-1},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "10aea094", + "metadata": { + "editable": true + }, + "source": [ + "Defining" + ] + }, + { + "cell_type": "markdown", + "id": "b7cc2db8", + "metadata": { + "editable": true + 
}, + "source": [ + "$$\n", + "\\delta_j^L = a_j^L(1-a_j^L)\\left(a_j^L - y_j\\right) = \\sigma'(z_j^L)\\frac{\\partial {\\cal C}}{\\partial (a_j^L)},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "6cce9a62", + "metadata": { + "editable": true + }, + "source": [ + "and using the Hadamard product of two vectors we can write this as" + ] + }, + { + "cell_type": "markdown", + "id": "43e5a84b", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{\\delta}^L = \\sigma'(\\boldsymbol{z}^L)\\circ\\frac{\\partial {\\cal C}}{\\partial (\\boldsymbol{a}^L)}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "d5c607a7", + "metadata": { + "editable": true + }, + "source": [ + "## Analyzing the last results\n", + "\n", + "This is an important expression. The second term on the right handside\n", + "measures how fast the cost function is changing as a function of the $j$th\n", + "output activation. If, for example, the cost function doesn't depend\n", + "much on a particular output node $j$, then $\\delta_j^L$ will be small,\n", + "which is what we would expect. The first term on the right, measures\n", + "how fast the activation function $f$ is changing at a given activation\n", + "value $z_j^L$." + ] + }, + { + "cell_type": "markdown", + "id": "a51b3b58", + "metadata": { + "editable": true + }, + "source": [ + "## More considerations\n", + "\n", + "Notice that everything in the above equations is easily computed. In\n", + "particular, we compute $z_j^L$ while computing the behaviour of the\n", + "network, and it is only a small additional overhead to compute\n", + "$\\sigma'(z^L_j)$. The exact form of the derivative with respect to the\n", + "output depends on the form of the cost function.\n", + "However, provided the cost function is known there should be little\n", + "trouble in calculating" + ] + }, + { + "cell_type": "markdown", + "id": "4cd9d058", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial {\\cal C}}{\\partial (a_j^L)}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "c80b630d", + "metadata": { + "editable": true + }, + "source": [ + "With the definition of $\\delta_j^L$ we have a more compact definition of the derivative of the cost function in terms of the weights, namely" + ] + }, + { + "cell_type": "markdown", + "id": "dc0c1a06", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial{\\cal C}}{\\partial w_{ij}^L} = \\delta_j^La_i^{L-1}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "8f2065b7", + "metadata": { + "editable": true + }, + "source": [ + "## Derivatives in terms of $z_j^L$\n", + "\n", + "It is also easy to see that our previous equation can be written as" + ] + }, + { + "cell_type": "markdown", + "id": "7f89b9d8", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_j^L =\\frac{\\partial {\\cal C}}{\\partial z_j^L}= \\frac{\\partial {\\cal C}}{\\partial a_j^L}\\frac{\\partial a_j^L}{\\partial z_j^L},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "49c2cd3f", + "metadata": { + "editable": true + }, + "source": [ + "which can also be interpreted as the partial derivative of the cost function with respect to the biases $b_j^L$, namely" + ] + }, + { + "cell_type": "markdown", + "id": "517b1a37", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_j^L = \\frac{\\partial {\\cal C}}{\\partial b_j^L}\\frac{\\partial b_j^L}{\\partial z_j^L}=\\frac{\\partial {\\cal C}}{\\partial b_j^L},\n", + "$$" 
+ ] + }, + { + "cell_type": "markdown", + "id": "65c8107f", + "metadata": { + "editable": true + }, + "source": [ + "That is, the error $\\delta_j^L$ is exactly equal to the rate of change of the cost function as a function of the bias." + ] + }, + { + "cell_type": "markdown", + "id": "2a10f902", + "metadata": { + "editable": true + }, + "source": [ + "## Bringing it together\n", + "\n", + "We have now three equations that are essential for the computations of the derivatives of the cost function at the output layer. These equations are needed to start the algorithm and they are" + ] + }, + { + "cell_type": "markdown", + "id": "b2ebf9c2", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation}\n", + "\\frac{\\partial{\\cal C}(\\boldsymbol{W^L})}{\\partial w_{ij}^L} = \\delta_j^La_i^{L-1},\n", + "\\label{_auto1} \\tag{1}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "90336322", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "f25ff166", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation}\n", + "\\delta_j^L = \\sigma'(z_j^L)\\frac{\\partial {\\cal C}}{\\partial (a_j^L)},\n", + "\\label{_auto2} \\tag{2}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "4cf11d5e", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "2670748d", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation}\n", + "\\delta_j^L = \\frac{\\partial {\\cal C}}{\\partial b_j^L},\n", + "\\label{_auto3} \\tag{3}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "18c29f71", + "metadata": { + "editable": true + }, + "source": [ + "## Final back propagating equation\n", + "\n", + "We have that (replacing $L$ with a general layer $l$)" + ] + }, + { + "cell_type": "markdown", + "id": "c593470c", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_j^l =\\frac{\\partial {\\cal C}}{\\partial z_j^l}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "28e8caef", + "metadata": { + "editable": true + }, + "source": [ + "We want to express this in terms of the equations for layer $l+1$." + ] + }, + { + "cell_type": "markdown", + "id": "516de9d7", + "metadata": { + "editable": true + }, + "source": [ + "## Using the chain rule and summing over all $k$ entries\n", + "\n", + "We obtain" + ] + }, + { + "cell_type": "markdown", + "id": "004c0bf4", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_j^l =\\sum_k \\frac{\\partial {\\cal C}}{\\partial z_k^{l+1}}\\frac{\\partial z_k^{l+1}}{\\partial z_j^{l}}=\\sum_k \\delta_k^{l+1}\\frac{\\partial z_k^{l+1}}{\\partial z_j^{l}},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "d62a3b1f", + "metadata": { + "editable": true + }, + "source": [ + "and recalling that" + ] + }, + { + "cell_type": "markdown", + "id": "e9af770e", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "z_j^{l+1} = \\sum_{i=1}^{M_{l}}w_{ij}^{l+1}a_i^{l}+b_j^{l+1},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "eca56f17", + "metadata": { + "editable": true + }, + "source": [ + "with $M_l$ being the number of nodes in layer $l$, we obtain" + ] + }, + { + "cell_type": "markdown", + "id": "bb0e4414", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_j^l =\\sum_k \\delta_k^{l+1}w_{kj}^{l+1}\\sigma'(z_j^l),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "a4b190fc", + "metadata": { + "editable": true + }, + "source": [ + "This is our final equation.\n", + "\n", + "We are now ready to set up the algorithm for back propagation and learning the weights and biases." + ] + }, + { + "cell_type": "markdown", + "id": "ec0f87c0", + "metadata": { + "editable": true + }, + "source": [ + "## Setting up the back propagation algorithm and algorithm for a feed forward NN, initalizations\n", + "\n", + "**The architecture (our model).**\n", + "\n", + "1. Set up your inputs and outputs (scalars, vectors, matrices or higher-order arrays)\n", + "\n", + "2. Define the number of hidden layers and hidden nodes\n", + "\n", + "3. Define activation functions for hidden layers and output layers\n", + "\n", + "4. Define optimizer (plan learning rate, momentum, ADAgrad, RMSprop, ADAM etc) and array of initial learning rates\n", + "\n", + "5. Define cost function and possible regularization terms with hyperparameters\n", + "\n", + "6. Initialize weights and biases\n", + "\n", + "7. Fix number of iterations for the feed forward part and back propagation part" + ] + }, + { + "cell_type": "markdown", + "id": "2fb45155", + "metadata": { + "editable": true + }, + "source": [ + "## Setting up the back propagation algorithm, part 1\n", + "\n", + "The four equations provide us with a way of computing the gradients of the cost function. 
Let us write this out in the form of an algorithm.\n", + "\n", + "**First**, we set up the input data $\\boldsymbol{x}$, compute the activations\n", + "$\\boldsymbol{z}^1$ of the first hidden layer and apply the activation function to obtain\n", + "the pertinent outputs $\\boldsymbol{a}^1$.\n", + "\n", + "**Secondly**, we perform the feed forward until we reach the output\n", + "layer, computing all $\\boldsymbol{z}^l$ together with the\n", + "activation function and the pertinent outputs $\\boldsymbol{a}^l$ for\n", + "$l=1,2,3,\\dots,L$.\n", + "\n", + "**Notation**: The first hidden layer has $l=1$ as label and the final output layer has $l=L$." + ] + }, + { + "cell_type": "markdown", + "id": "3d5c2a0e", + "metadata": { + "editable": true + }, + "source": [ + "## Setting up the back propagation algorithm, part 2\n", + "\n", + "Thereafter we compute the output error $\\boldsymbol{\\delta}^L$ by computing all" + ] + }, + { + "cell_type": "markdown", + "id": "9183bbd0", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_j^L = \\sigma'(z_j^L)\\frac{\\partial {\\cal C}}{\\partial (a_j^L)}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "32ece956", + "metadata": { + "editable": true + }, + "source": [ + "Then we compute the back-propagated error for each $l=L-1,L-2,\\dots,1$ as" + ] + }, + { + "cell_type": "markdown", + "id": "466d6bda", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_j^l = \\sum_k \\delta_k^{l+1}w_{kj}^{l+1}\\sigma'(z_j^l).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "9f31b228", + "metadata": { + "editable": true + }, + "source": [ + "## Setting up the back propagation algorithm, part 3\n", + "\n", + "Finally, for each $l=L-1,L-2,\\dots,1$ (down to the first hidden layer) we update\n", + "the weights and the biases using gradient descent according to the rules" + ] + }, + { + "cell_type": "markdown", + "id": "fbeac005", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "w_{ij}^l \\leftarrow w_{ij}^l- \\eta \\delta_j^la_i^{l-1},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "bc6ae984", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "b_j^l \\leftarrow b_j^l-\\eta \\frac{\\partial {\\cal C}}{\\partial b_j^l}=b_j^l-\\eta \\delta_j^l,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "65f3133d", + "metadata": { + "editable": true + }, + "source": [ + "with $\\eta$ being the learning rate."
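Collecting parts 1-3, one full-batch training pass over a design matrix $\boldsymbol{X}$ (one row per sample) could be sketched as follows. This is only a compact illustration of the equations above, assuming sigmoid activations in every layer and the squared-error cost; it is not the complete course code:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_epoch(X, Y, Ws, bs, eta=0.1):
    """One feed-forward plus back-propagation pass over all samples (full-batch gradient descent)."""
    # feed forward: store z^l and a^l for every layer, rows = samples
    a, activations, zs = X, [X], []
    for W, b in zip(Ws, bs):
        z = a @ W + b
        zs.append(z)
        a = sigmoid(z)
        activations.append(a)
    # output error delta^L = sigma'(z^L) * dC/da^L
    delta = sigmoid(zs[-1]) * (1 - sigmoid(zs[-1])) * (activations[-1] - Y)
    # back propagate and update for l = L-1, ..., 1 (Python indices L-1, ..., 0)
    for l in range(len(Ws) - 1, -1, -1):
        grad_W = activations[l].T @ delta        # sums the per-sample gradients
        grad_b = delta.sum(axis=0)
        if l > 0:
            delta = (delta @ Ws[l].T) * sigmoid(zs[l - 1]) * (1 - sigmoid(zs[l - 1]))
        Ws[l] -= eta * grad_W
        bs[l] -= eta * grad_b
    return Ws, bs
```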
+ ] + }, + { + "cell_type": "markdown", + "id": "5d27bbe1", + "metadata": { + "editable": true + }, + "source": [ + "## Updating the gradients\n", + "\n", + "With the back propagate error for each $l=L-1,L-2,\\dots,1$ as" + ] + }, + { + "cell_type": "markdown", + "id": "5e5d0aa0", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_j^l = \\sum_k \\delta_k^{l+1}w_{kj}^{l+1}\\sigma'(z_j^l),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "ea32e5bb", + "metadata": { + "editable": true + }, + "source": [ + "we update the weights and the biases using gradient descent for each $l=L-1,L-2,\\dots,1$ and update the weights and biases according to the rules" + ] + }, + { + "cell_type": "markdown", + "id": "3a9bb5a6", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "w_{ij}^l\\leftarrow = w_{ij}^l- \\eta \\delta_j^la_i^{l-1},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "9008dcf8", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "b_j^l \\leftarrow b_j^l-\\eta \\frac{\\partial {\\cal C}}{\\partial b_j^l}=b_j^l-\\eta \\delta_j^l,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "89aba7d6", + "metadata": { + "editable": true + }, + "source": [ + "## Activation functions\n", + "\n", + "A property that characterizes a neural network, other than its\n", + "connectivity, is the choice of activation function(s). The following\n", + "restrictions are imposed on an activation function for an FFNN to\n", + "fulfill the universal approximation theorem\n", + "\n", + " * Non-constant\n", + "\n", + " * Bounded\n", + "\n", + " * Monotonically-increasing\n", + "\n", + " * Continuous" + ] + }, + { + "cell_type": "markdown", + "id": "ea0cdce2", + "metadata": { + "editable": true + }, + "source": [ + "### Activation functions, Logistic and Hyperbolic ones\n", + "\n", + "The second requirement excludes all linear functions. Furthermore, in\n", + "a MLP with only linear activation functions, each layer simply\n", + "performs a linear transformation of its inputs.\n", + "\n", + "Regardless of the number of layers, the output of the NN will be\n", + "nothing but a linear function of the inputs. Thus we need to introduce\n", + "some kind of non-linearity to the NN to be able to fit non-linear\n", + "functions Typical examples are the logistic *Sigmoid*" + ] + }, + { + "cell_type": "markdown", + "id": "91342c80", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\sigma(x) = \\frac{1}{1 + e^{-x}},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "bd6eb22a", + "metadata": { + "editable": true + }, + "source": [ + "and the *hyperbolic tangent* function" + ] + }, + { + "cell_type": "markdown", + "id": "4e75b2ab", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\sigma(x) = \\tanh(x)\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "1626d9b7", + "metadata": { + "editable": true + }, + "source": [ + "## Relevance\n", + "\n", + "The *sigmoid* function are more biologically plausible because the\n", + "output of inactive neurons are zero. Such activation function are\n", + "called *one-sided*. However, it has been shown that the hyperbolic\n", + "tangent performs better than the sigmoid for training MLPs. 
has\n", + "become the most popular for *deep neural networks*" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "4ac7c23c", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "%matplotlib inline\n", + "\n", + "\"\"\"The sigmoid function (or the logistic curve) is a \n", + "function that takes any real number, z, and outputs a number (0,1).\n", + "It is useful in neural networks for assigning weights on a relative scale.\n", + "The value z is the weighted sum of parameters involved in the learning algorithm.\"\"\"\n", + "\n", + "import numpy\n", + "import matplotlib.pyplot as plt\n", + "import math as mt\n", + "\n", + "z = numpy.arange(-5, 5, .1)\n", + "sigma_fn = numpy.vectorize(lambda z: 1/(1+numpy.exp(-z)))\n", + "sigma = sigma_fn(z)\n", + "\n", + "fig = plt.figure()\n", + "ax = fig.add_subplot(111)\n", + "ax.plot(z, sigma)\n", + "ax.set_ylim([-0.1, 1.1])\n", + "ax.set_xlim([-5,5])\n", + "ax.grid(True)\n", + "ax.set_xlabel('z')\n", + "ax.set_title('sigmoid function')\n", + "\n", + "plt.show()\n", + "\n", + "\"\"\"Step Function\"\"\"\n", + "z = numpy.arange(-5, 5, .02)\n", + "step_fn = numpy.vectorize(lambda z: 1.0 if z >= 0.0 else 0.0)\n", + "step = step_fn(z)\n", + "\n", + "fig = plt.figure()\n", + "ax = fig.add_subplot(111)\n", + "ax.plot(z, step)\n", + "ax.set_ylim([-0.5, 1.5])\n", + "ax.set_xlim([-5,5])\n", + "ax.grid(True)\n", + "ax.set_xlabel('z')\n", + "ax.set_title('step function')\n", + "\n", + "plt.show()\n", + "\n", + "\"\"\"Sine Function\"\"\"\n", + "z = numpy.arange(-2*mt.pi, 2*mt.pi, 0.1)\n", + "t = numpy.sin(z)\n", + "\n", + "fig = plt.figure()\n", + "ax = fig.add_subplot(111)\n", + "ax.plot(z, t)\n", + "ax.set_ylim([-1.0, 1.0])\n", + "ax.set_xlim([-2*mt.pi,2*mt.pi])\n", + "ax.grid(True)\n", + "ax.set_xlabel('z')\n", + "ax.set_title('sine function')\n", + "\n", + "plt.show()\n", + "\n", + "\"\"\"Plots a graph of the squashing function used by a rectified linear\n", + "unit\"\"\"\n", + "z = numpy.arange(-2, 2, .1)\n", + "zero = numpy.zeros(len(z))\n", + "y = numpy.max([zero, z], axis=0)\n", + "\n", + "fig = plt.figure()\n", + "ax = fig.add_subplot(111)\n", + "ax.plot(z, y)\n", + "ax.set_ylim([-2.0, 2.0])\n", + "ax.set_xlim([-2.0, 2.0])\n", + "ax.grid(True)\n", + "ax.set_xlabel('z')\n", + "ax.set_title('Rectified linear unit')\n", + "\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "6aeb0ee4", + "metadata": { + "editable": true + }, + "source": [ + "## Vanishing gradients\n", + "\n", + "The Back propagation algorithm we derived above works by going from\n", + "the output layer to the input layer, propagating the error gradient on\n", + "the way. Once the algorithm has computed the gradient of the cost\n", + "function with regards to each parameter in the network, it uses these\n", + "gradients to update each parameter with a Gradient Descent (GD) step.\n", + "\n", + "Unfortunately for us, the gradients often get smaller and smaller as\n", + "the algorithm progresses down to the first hidden layers. As a result,\n", + "the GD update leaves the lower layer connection weights virtually\n", + "unchanged, and training never converges to a good solution. This is\n", + "known in the literature as **the vanishing gradients problem**." + ] + }, + { + "cell_type": "markdown", + "id": "ea47d1d6", + "metadata": { + "editable": true + }, + "source": [ + "## Exploding gradients\n", + "\n", + "In other cases, the opposite can happen, namely the the gradients can\n", + "grow bigger and bigger. 
The result is that many of the layers get\n", + "large updates of the weights the algorithm diverges. This is the\n", + "**exploding gradients problem**, which is mostly encountered in\n", + "recurrent neural networks. More generally, deep neural networks suffer\n", + "from unstable gradients, different layers may learn at widely\n", + "different speeds" + ] + }, + { + "cell_type": "markdown", + "id": "1947aa95", + "metadata": { + "editable": true + }, + "source": [ + "## Is the Logistic activation function (Sigmoid) our choice?\n", + "\n", + "Although this unfortunate behavior has been empirically observed for\n", + "quite a while (it was one of the reasons why deep neural networks were\n", + "mostly abandoned for a long time), it is only around 2010 that\n", + "significant progress was made in understanding it.\n", + "\n", + "A paper titled [Understanding the Difficulty of Training Deep\n", + "Feedforward Neural Networks by Xavier Glorot and Yoshua Bengio](http://proceedings.mlr.press/v9/glorot10a.html) found that\n", + "the problems with the popular logistic\n", + "sigmoid activation function and the weight initialization technique\n", + "that was most popular at the time, namely random initialization using\n", + "a normal distribution with a mean of 0 and a standard deviation of\n", + "1." + ] + }, + { + "cell_type": "markdown", + "id": "d024119f", + "metadata": { + "editable": true + }, + "source": [ + "## Logistic function as the root of problems\n", + "\n", + "They showed that with this activation function and this\n", + "initialization scheme, the variance of the outputs of each layer is\n", + "much greater than the variance of its inputs. Going forward in the\n", + "network, the variance keeps increasing after each layer until the\n", + "activation function saturates at the top layers. This is actually made\n", + "worse by the fact that the logistic function has a mean of 0.5, not 0\n", + "(the hyperbolic tangent function has a mean of 0 and behaves slightly\n", + "better than the logistic function in deep networks)." + ] + }, + { + "cell_type": "markdown", + "id": "c9178132", + "metadata": { + "editable": true + }, + "source": [ + "## The derivative of the Logistic funtion\n", + "\n", + "Looking at the logistic activation function, when inputs become large\n", + "(negative or positive), the function saturates at 0 or 1, with a\n", + "derivative extremely close to 0. Thus when backpropagation kicks in,\n", + "it has virtually no gradient to propagate back through the network,\n", + "and what little gradient exists keeps getting diluted as\n", + "backpropagation progresses down through the top layers, so there is\n", + "really nothing left for the lower layers.\n", + "\n", + "In their paper, Glorot and Bengio propose a way to significantly\n", + "alleviate this problem. We need the signal to flow properly in both\n", + "directions: in the forward direction when making predictions, and in\n", + "the reverse direction when backpropagating gradients. We don’t want\n", + "the signal to die out, nor do we want it to explode and saturate. For\n", + "the signal to flow properly, the authors argue that we need the\n", + "variance of the outputs of each layer to be equal to the variance of\n", + "its inputs, and we also need the gradients to have equal variance\n", + "before and after flowing through a layer in the reverse direction." 
+ ] + }, + { + "cell_type": "markdown", + "id": "756185f5", + "metadata": { + "editable": true + }, + "source": [ + "## Insights from the paper by Glorot and Bengio\n", + "\n", + "One of the insights in the 2010 paper by Glorot and Bengio was that\n", + "the vanishing/exploding gradients problems were in part due to a poor\n", + "choice of activation function. Until then most people had assumed that\n", + "if Nature had chosen to use roughly sigmoid activation functions in\n", + "biological neurons, they must be an excellent choice. But it turns out\n", + "that other activation functions behave much better in deep neural\n", + "networks, in particular the ReLU activation function, mostly because\n", + "it does not saturate for positive values (and also because it is quite\n", + "fast to compute)." + ] + }, + { + "cell_type": "markdown", + "id": "3d92cad4", + "metadata": { + "editable": true + }, + "source": [ + "## The RELU function family\n", + "\n", + "The ReLU activation function suffers from a problem known as the dying\n", + "ReLUs: during training, some neurons effectively die, meaning they\n", + "stop outputting anything other than 0.\n", + "\n", + "In some cases, you may find that half of your network’s neurons are\n", + "dead, especially if you used a large learning rate. During training,\n", + "if a neuron’s weights get updated such that the weighted sum of the\n", + "neuron’s inputs is negative, it will start outputting 0. When this\n", + "happen, the neuron is unlikely to come back to life since the gradient\n", + "of the ReLU function is 0 when its input is negative." + ] + }, + { + "cell_type": "markdown", + "id": "cbc6f721", + "metadata": { + "editable": true + }, + "source": [ + "## ELU function\n", + "\n", + "To solve this problem, nowadays practitioners use a variant of the\n", + "ReLU function, such as the leaky ReLU discussed above or the so-called\n", + "exponential linear unit (ELU) function" + ] + }, + { + "cell_type": "markdown", + "id": "9249dc7b", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "ELU(z) = \\left\\{\\begin{array}{cc} \\alpha\\left( \\exp{(z)}-1\\right) & z < 0,\\\\ z & z \\ge 0.\\end{array}\\right.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "e59de3af", + "metadata": { + "editable": true + }, + "source": [ + "## Which activation function should we use?\n", + "\n", + "In general it seems that the ELU activation function is better than\n", + "the leaky ReLU function (and its variants), which is better than\n", + "ReLU. ReLU performs better than $\\tanh$ which in turn performs better\n", + "than the logistic function.\n", + "\n", + "If runtime performance is an issue, then you may opt for the leaky\n", + "ReLU function over the ELU function If you don’t want to tweak yet\n", + "another hyperparameter, you may just use the default $\\alpha$ of\n", + "$0.01$ for the leaky ReLU, and $1$ for ELU. If you have spare time and\n", + "computing power, you can use cross-validation or bootstrap to evaluate\n", + "other activation functions." 
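+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "c4e1f7d2",
+ "metadata": {
+ "editable": true
+ },
+ "source": [
+ "As a concrete reference, here is a minimal NumPy sketch of the two variants mentioned above,\n",
+ "using the default values $\\alpha=0.01$ for the leaky ReLU and $\\alpha=1$ for the ELU."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "d8b3a6c5",
+ "metadata": {
+ "collapsed": false,
+ "editable": true
+ },
+ "outputs": [],
+ "source": [
+ "import numpy as np\n",
+ "\n",
+ "def leaky_relu(z, alpha=0.01):\n",
+ "    # z for z >= 0, alpha*z for z < 0\n",
+ "    return np.where(z >= 0, z, alpha*z)\n",
+ "\n",
+ "def elu(z, alpha=1.0):\n",
+ "    # z for z >= 0, alpha*(exp(z) - 1) for z < 0, as defined above\n",
+ "    return np.where(z >= 0, z, alpha*(np.exp(z) - 1))\n",
+ "\n",
+ "z = np.linspace(-3, 3, 7)\n",
+ "print(leaky_relu(z))\n",
+ "print(elu(z))"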
+ ] + }, + { + "cell_type": "markdown", + "id": "e2da998c", + "metadata": { + "editable": true + }, + "source": [ + "## More on activation functions, output layers\n", + "\n", + "In most cases you can use the ReLU activation function in the hidden\n", + "layers (or one of its variants).\n", + "\n", + "It is a bit faster to compute than other activation functions, and the\n", + "gradient descent optimization does in general not get stuck.\n", + "\n", + "**For the output layer:**\n", + "\n", + "* For classification the softmax activation function is generally a good choice for classification tasks (when the classes are mutually exclusive).\n", + "\n", + "* For regression tasks, you can simply use no activation function at all." + ] + }, + { + "cell_type": "markdown", + "id": "e1abf01e", + "metadata": { + "editable": true + }, + "source": [ + "## Fine-tuning neural network hyperparameters\n", + "\n", + "The flexibility of neural networks is also one of their main\n", + "drawbacks: there are many hyperparameters to tweak. Not only can you\n", + "use any imaginable network topology (how neurons/nodes are\n", + "interconnected), but even in a simple FFNN you can change the number\n", + "of layers, the number of neurons per layer, the type of activation\n", + "function to use in each layer, the weight initialization logic, the\n", + "stochastic gradient optmized and much more. How do you know what\n", + "combination of hyperparameters is the best for your task?\n", + "\n", + "* You can use grid search with cross-validation to find the right hyperparameters.\n", + "\n", + "However,since there are many hyperparameters to tune, and since\n", + "training a neural network on a large dataset takes a lot of time, you\n", + "will only be able to explore a tiny part of the hyperparameter space.\n", + "\n", + "* You can use randomized search.\n", + "\n", + "* Or use tools like [Oscar](http://oscar.calldesk.ai/), which implements more complex algorithms to help you find a good set of hyperparameters quickly." + ] + }, + { + "cell_type": "markdown", + "id": "a8ded7cd", + "metadata": { + "editable": true + }, + "source": [ + "## Hidden layers\n", + "\n", + "For many problems you can start with just one or two hidden layers and\n", + "it will work just fine. For the MNIST data set discussed below you can easily get a\n", + "high accuracy using just one hidden layer with a few hundred neurons.\n", + "You can reach for this data set above 98% accuracy using two hidden\n", + "layers with the same total amount of neurons, in roughly the same\n", + "amount of training time.\n", + "\n", + "For more complex problems, you can gradually ramp up the number of\n", + "hidden layers, until you start overfitting the training set. Very\n", + "complex tasks, such as large image classification or speech\n", + "recognition, typically require networks with dozens of layers and they\n", + "need a huge amount of training data. However, you will rarely have to\n", + "train such networks from scratch: it is much more common to reuse\n", + "parts of a pretrained state-of-the-art network that performs a similar\n", + "task." 
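+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "e5f0a2b9",
+ "metadata": {
+ "editable": true
+ },
+ "source": [
+ "The grid and randomized searches mentioned above are readily available in **scikit-learn**. Below is a minimal sketch of a\n",
+ "randomized search over the learning rate, the regularization parameter and the number of hidden neurons/layers;\n",
+ "the parameter ranges and the number of sampled configurations are illustrative choices only."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "a9c7d3e8",
+ "metadata": {
+ "collapsed": false,
+ "editable": true
+ },
+ "outputs": [],
+ "source": [
+ "# Sketch of a randomized hyperparameter search; the ranges and n_iter are illustrative choices\n",
+ "from scipy.stats import loguniform\n",
+ "from sklearn.datasets import load_digits\n",
+ "from sklearn.model_selection import RandomizedSearchCV\n",
+ "from sklearn.neural_network import MLPClassifier\n",
+ "\n",
+ "X, y = load_digits(return_X_y=True)\n",
+ "\n",
+ "param_distributions = {\n",
+ "    \"hidden_layer_sizes\": [(50,), (100,), (50, 50)],\n",
+ "    \"alpha\": loguniform(1e-6, 1e-1),              # L2 regularization strength\n",
+ "    \"learning_rate_init\": loguniform(1e-4, 1e-1),\n",
+ "}\n",
+ "\n",
+ "search = RandomizedSearchCV(MLPClassifier(max_iter=200), param_distributions,\n",
+ "                            n_iter=10, cv=3, random_state=0)\n",
+ "search.fit(X, y)\n",
+ "print(search.best_params_)\n",
+ "print(search.best_score_)"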
+ ] + }, + { + "cell_type": "markdown", + "id": "96da4f48", + "metadata": { + "editable": true + }, + "source": [ + "## Batch Normalization\n", + "\n", + "Batch Normalization aims to address the vanishing/exploding gradients\n", + "problems, and more generally the problem that the distribution of each\n", + "layer’s inputs changes during training, as the parameters of the\n", + "previous layers change.\n", + "\n", + "The technique consists of adding an operation in the model just before\n", + "the activation function of each layer, simply zero-centering and\n", + "normalizing the inputs, then scaling and shifting the result using two\n", + "new parameters per layer (one for scaling, the other for shifting). In\n", + "other words, this operation lets the model learn the optimal scale and\n", + "mean of the inputs for each layer. In order to zero-center and\n", + "normalize the inputs, the algorithm needs to estimate the inputs’ mean\n", + "and standard deviation. It does so by evaluating the mean and standard\n", + "deviation of the inputs over the current mini-batch, from this the\n", + "name batch normalization." + ] + }, + { + "cell_type": "markdown", + "id": "395346a7", + "metadata": { + "editable": true + }, + "source": [ + "## Dropout\n", + "\n", + "It is a fairly simple algorithm: at every training step, every neuron\n", + "(including the input neurons but excluding the output neurons) has a\n", + "probability $p$ of being temporarily dropped out, meaning it will be\n", + "entirely ignored during this training step, but it may be active\n", + "during the next step.\n", + "\n", + "The hyperparameter $p$ is called the dropout rate, and it is typically\n", + "set to 50%. After training, the neurons are not dropped anymore. It\n", + "is viewed as one of the most popular regularization techniques." + ] + }, + { + "cell_type": "markdown", + "id": "9c712bbb", + "metadata": { + "editable": true + }, + "source": [ + "## Gradient Clipping\n", + "\n", + "A popular technique to lessen the exploding gradients problem is to\n", + "simply clip the gradients during backpropagation so that they never\n", + "exceed some threshold (this is mostly useful for recurrent neural\n", + "networks).\n", + "\n", + "This technique is called Gradient Clipping.\n", + "\n", + "In general however, Batch\n", + "Normalization is preferred." + ] + }, + { + "cell_type": "markdown", + "id": "2b66ea72", + "metadata": { + "editable": true + }, + "source": [ + "## A top-down perspective on Neural networks\n", + "\n", + "The first thing we would like to do is divide the data into two or\n", + "three parts. A training set, a validation or dev (development) set,\n", + "and a test set. The test set is the data on which we want to make\n", + "predictions. The dev set is a subset of the training data we use to\n", + "check how well we are doing out-of-sample, after training the model on\n", + "the training dataset. We use the validation error as a proxy for the\n", + "test error in order to make tweaks to our model. It is crucial that we\n", + "do not use any of the test data to train the algorithm. This is a\n", + "cardinal sin in ML. Then:\n", + "\n", + "1. Estimate optimal error rate\n", + "\n", + "2. Minimize underfitting (bias) on training data set.\n", + "\n", + "3. Make sure you are not overfitting." 
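+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "b1e4c8f6",
+ "metadata": {
+ "editable": true
+ },
+ "source": [
+ "The three-way split described above can be obtained by applying scikit-learn's train_test_split twice;\n",
+ "a minimal sketch (the 60/20/20 proportions are an arbitrary illustrative choice):"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "f7a5d9c3",
+ "metadata": {
+ "collapsed": false,
+ "editable": true
+ },
+ "outputs": [],
+ "source": [
+ "# Sketch: split the data into train / validation (dev) / test sets\n",
+ "from sklearn.datasets import load_digits\n",
+ "from sklearn.model_selection import train_test_split\n",
+ "\n",
+ "X, y = load_digits(return_X_y=True)\n",
+ "\n",
+ "# hold out the test set first, then carve a validation set out of the remainder\n",
+ "X_trainval, X_test, y_trainval, y_test = train_test_split(X, y, test_size=0.2, random_state=0)\n",
+ "X_train, X_val, y_train, y_val = train_test_split(X_trainval, y_trainval, test_size=0.25, random_state=0)\n",
+ "\n",
+ "print(len(X_train), len(X_val), len(X_test))   # roughly 60%, 20% and 20% of the data"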
+ ] + }, + { + "cell_type": "markdown", + "id": "5acbc082", + "metadata": { + "editable": true + }, + "source": [ + "## More top-down perspectives\n", + "\n", + "If the validation and test sets are drawn from the same distributions,\n", + "then a good performance on the validation set should lead to similarly\n", + "good performance on the test set. \n", + "\n", + "However, sometimes\n", + "the training data and test data differ in subtle ways because, for\n", + "example, they are collected using slightly different methods, or\n", + "because it is cheaper to collect data in one way versus another. In\n", + "this case, there can be a mismatch between the training and test\n", + "data. This can lead to the neural network overfitting these small\n", + "differences between the test and training sets, and a poor performance\n", + "on the test set despite having a good performance on the validation\n", + "set. To rectify this, Andrew Ng suggests making two validation or dev\n", + "sets, one constructed from the training data and one constructed from\n", + "the test data. The difference between the performance of the algorithm\n", + "on these two validation sets quantifies the train-test mismatch. This\n", + "can serve as another important diagnostic when using DNNs for\n", + "supervised learning." + ] + }, + { + "cell_type": "markdown", + "id": "31825b65", + "metadata": { + "editable": true + }, + "source": [ + "## Limitations of supervised learning with deep networks\n", + "\n", + "Like all statistical methods, supervised learning using neural\n", + "networks has important limitations. This is especially important when\n", + "one seeks to apply these methods, especially to physics problems. Like\n", + "all tools, DNNs are not a universal solution. Often, the same or\n", + "better performance on a task can be achieved by using a few\n", + "hand-engineered features (or even a collection of random\n", + "features)." + ] + }, + { + "cell_type": "markdown", + "id": "c76d9af9", + "metadata": { + "editable": true + }, + "source": [ + "## Limitations of NNs\n", + "\n", + "Here we list some of the important limitations of supervised neural network based models. \n", + "\n", + "* **Need labeled data**. All supervised learning methods, DNNs for supervised learning require labeled data. Often, labeled data is harder to acquire than unlabeled data (e.g. one must pay for human experts to label images).\n", + "\n", + "* **Supervised neural networks are extremely data intensive.** DNNs are data hungry. They perform best when data is plentiful. This is doubly so for supervised methods where the data must also be labeled. The utility of DNNs is extremely limited if data is hard to acquire or the datasets are small (hundreds to a few thousand samples). In this case, the performance of other methods that utilize hand-engineered features can exceed that of DNNs." + ] + }, + { + "cell_type": "markdown", + "id": "bdc93363", + "metadata": { + "editable": true + }, + "source": [ + "## Homogeneous data\n", + "\n", + "* **Homogeneous data.** Almost all DNNs deal with homogeneous data of one type. It is very hard to design architectures that mix and match data types (i.e. some continuous variables, some discrete variables, some time series). In applications beyond images, video, and language, this is often what is required. In contrast, ensemble models like random forests or gradient-boosted trees have no difficulty handling mixed data types." 
+ ] + }, + { + "cell_type": "markdown", + "id": "a1d6ff64", + "metadata": { + "editable": true + }, + "source": [ + "## More limitations\n", + "\n", + "* **Many problems are not about prediction.** In natural science we are often interested in learning something about the underlying distribution that generates the data. In this case, it is often difficult to cast these ideas in a supervised learning setting. While the problems are related, it is possible to make good predictions with a *wrong* model. The model might or might not be useful for understanding the underlying science.\n", + "\n", + "Some of these remarks are particular to DNNs, others are shared by all supervised learning methods. This motivates the use of unsupervised methods which in part circumvent these problems." + ] + }, + { + "cell_type": "markdown", + "id": "0c2e5742", + "metadata": { + "editable": true + }, + "source": [ + "## Setting up a Multi-layer perceptron model for classification\n", + "\n", + "We are now gong to develop an example based on the MNIST data\n", + "base. This is a classification problem and we need to use our\n", + "cross-entropy function we discussed in connection with logistic\n", + "regression. The cross-entropy defines our cost function for the\n", + "classificaton problems with neural networks.\n", + "\n", + "In binary classification with two classes $(0, 1)$ we define the\n", + "logistic/sigmoid function as the probability that a particular input\n", + "is in class $0$ or $1$. This is possible because the logistic\n", + "function takes any input from the real numbers and inputs a number\n", + "between 0 and 1, and can therefore be interpreted as a probability. It\n", + "also has other nice properties, such as a derivative that is simple to\n", + "calculate.\n", + "\n", + "For an input $\\boldsymbol{a}$ from the hidden layer, the probability that the input $\\boldsymbol{x}$\n", + "is in class 0 or 1 is just. We let $\\theta$ represent the unknown weights and biases to be adjusted by our equations). The variable $x$\n", + "represents our activation values $z$. We have" + ] + }, + { + "cell_type": "markdown", + "id": "d4da3f02", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "P(y = 0 \\mid \\boldsymbol{x}, \\boldsymbol{\\theta}) = \\frac{1}{1 + \\exp{(- \\boldsymbol{x}})} ,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "01ea2e0b", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "9c1c7bec", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "P(y = 1 \\mid \\boldsymbol{x}, \\boldsymbol{\\theta}) = 1 - P(y = 0 \\mid \\boldsymbol{x}, \\boldsymbol{\\theta}) ,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "9238ff2d", + "metadata": { + "editable": true + }, + "source": [ + "where $y \\in \\{0, 1\\}$ and $\\boldsymbol{\\theta}$ represents the weights and biases\n", + "of our network." 
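+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "c2d6e0a4",
+ "metadata": {
+ "editable": true
+ },
+ "source": [
+ "A short numerical check of the two expressions above, with made-up activation values standing in for the input:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "d3f8b2e7",
+ "metadata": {
+ "collapsed": false,
+ "editable": true
+ },
+ "outputs": [],
+ "source": [
+ "import numpy as np\n",
+ "\n",
+ "def sigmoid(z):\n",
+ "    return 1.0/(1.0 + np.exp(-z))\n",
+ "\n",
+ "z = np.array([-2.0, 0.0, 2.0])   # made-up activation values\n",
+ "p0 = sigmoid(z)                  # P(y = 0 | x, theta) in the notation above\n",
+ "p1 = 1.0 - p0                    # P(y = 1 | x, theta)\n",
+ "print(p0, p1, p0 + p1)           # the two probabilities sum to one"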
+ ] + }, + { + "cell_type": "markdown", + "id": "3be74bd1", + "metadata": { + "editable": true + }, + "source": [ + "## Defining the cost function\n", + "\n", + "Our cost function is given as (see the Logistic regression lectures)" + ] + }, + { + "cell_type": "markdown", + "id": "2e2fd39c", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathcal{C}(\\boldsymbol{\\theta}) = - \\ln P(\\mathcal{D} \\mid \\boldsymbol{\\theta}) = - \\sum_{i=1}^n\n", + "y_i \\ln[P(y_i = 0)] + (1 - y_i) \\ln [1 - P(y_i = 0)] = \\sum_{i=1}^n \\mathcal{L}_i(\\boldsymbol{\\theta}) .\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "42b1d26b", + "metadata": { + "editable": true + }, + "source": [ + "This last equality means that we can interpret our *cost* function as a sum over the *loss* function\n", + "for each point in the dataset $\\mathcal{L}_i(\\boldsymbol{\\theta})$. \n", + "The negative sign is just so that we can think about our algorithm as minimizing a positive number, rather\n", + "than maximizing a negative number. \n", + "\n", + "In *multiclass* classification it is common to treat each integer label as a so called *one-hot* vector: \n", + "\n", + "$y = 5 \\quad \\rightarrow \\quad \\boldsymbol{y} = (0, 0, 0, 0, 0, 1, 0, 0, 0, 0) ,$ and\n", + "\n", + "$y = 1 \\quad \\rightarrow \\quad \\boldsymbol{y} = (0, 1, 0, 0, 0, 0, 0, 0, 0, 0) ,$ \n", + "\n", + "i.e. a binary bit string of length $C$, where $C = 10$ is the number of classes in the MNIST dataset (numbers from $0$ to $9$).. \n", + "\n", + "If $\\boldsymbol{x}_i$ is the $i$-th input (image), $y_{ic}$ refers to the $c$-th component of the $i$-th\n", + "output vector $\\boldsymbol{y}_i$. \n", + "The probability of $\\boldsymbol{x}_i$ being in class $c$ will be given by the softmax function:" + ] + }, + { + "cell_type": "markdown", + "id": "f740a484", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "P(y_{ic} = 1 \\mid \\boldsymbol{x}_i, \\boldsymbol{\\theta}) = \\frac{\\exp{((\\boldsymbol{a}_i^{hidden})^T \\boldsymbol{w}_c)}}\n", + "{\\sum_{c'=0}^{C-1} \\exp{((\\boldsymbol{a}_i^{hidden})^T \\boldsymbol{w}_{c'})}} ,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "19189bfc", + "metadata": { + "editable": true + }, + "source": [ + "which reduces to the logistic function in the binary case. \n", + "The likelihood of this $C$-class classifier\n", + "is now given as:" + ] + }, + { + "cell_type": "markdown", + "id": "aeb3ef60", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "P(\\mathcal{D} \\mid \\boldsymbol{\\theta}) = \\prod_{i=1}^n \\prod_{c=0}^{C-1} [P(y_{ic} = 1)]^{y_{ic}} .\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "dbf419a1", + "metadata": { + "editable": true + }, + "source": [ + "Again we take the negative log-likelihood to define our cost function:" + ] + }, + { + "cell_type": "markdown", + "id": "9e345753", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathcal{C}(\\boldsymbol{\\theta}) = - \\log{P(\\mathcal{D} \\mid \\boldsymbol{\\theta})}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "3b13095e", + "metadata": { + "editable": true + }, + "source": [ + "See the logistic regression lectures for a full definition of the cost function.\n", + "\n", + "The back propagation equations need now only a small change, namely the definition of a new cost function. We are thus ready to use the same equations as before!" 
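+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "e9a1c5f0",
+ "metadata": {
+ "editable": true
+ },
+ "source": [
+ "To make the multiclass cost function concrete, the sketch below one-hot encodes a few integer labels,\n",
+ "forms softmax probabilities from made-up network outputs, and evaluates the negative log-likelihood\n",
+ "$\\mathcal{C}(\\boldsymbol{\\theta}) = -\\sum_i\\sum_c y_{ic}\\ln P(y_{ic}=1)$:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "f0b4d8a2",
+ "metadata": {
+ "collapsed": false,
+ "editable": true
+ },
+ "outputs": [],
+ "source": [
+ "import numpy as np\n",
+ "\n",
+ "rng = np.random.default_rng(0)\n",
+ "\n",
+ "y = np.array([5, 1, 3])                      # integer labels\n",
+ "C = 10                                       # number of classes\n",
+ "Y = np.zeros((len(y), C))\n",
+ "Y[np.arange(len(y)), y] = 1                  # one-hot encoding\n",
+ "\n",
+ "scores = rng.standard_normal((len(y), C))    # made-up network outputs\n",
+ "exp_scores = np.exp(scores)\n",
+ "P = exp_scores/exp_scores.sum(axis=1, keepdims=True)   # softmax probabilities\n",
+ "\n",
+ "cost = -np.sum(Y*np.log(P))                  # cross-entropy / negative log-likelihood\n",
+ "print(cost)"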
+ ] + }, + { + "cell_type": "markdown", + "id": "96501a91", + "metadata": { + "editable": true + }, + "source": [ + "## Example: binary classification problem\n", + "\n", + "As an example of the above, relevant for project 2 as well, let us consider a binary class. As discussed in our logistic regression lectures, we defined a cost function in terms of the parameters $\\beta$ as" + ] + }, + { + "cell_type": "markdown", + "id": "48cf79fe", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathcal{C}(\\boldsymbol{\\beta}) = - \\sum_{i=1}^n \\left(y_i\\log{p(y_i \\vert x_i,\\boldsymbol{\\beta})}+(1-y_i)\\log{1-p(y_i \\vert x_i,\\boldsymbol{\\beta})}\\right),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "3243c0b1", + "metadata": { + "editable": true + }, + "source": [ + "where we had defined the logistic (sigmoid) function" + ] + }, + { + "cell_type": "markdown", + "id": "bb312a09", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "p(y_i =1\\vert x_i,\\boldsymbol{\\beta})=\\frac{\\exp{(\\beta_0+\\beta_1 x_i)}}{1+\\exp{(\\beta_0+\\beta_1 x_i)}},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "484cf2b4", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "2b9c5483", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "p(y_i =0\\vert x_i,\\boldsymbol{\\beta})=1-p(y_i =1\\vert x_i,\\boldsymbol{\\beta}).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "5ca21f09", + "metadata": { + "editable": true + }, + "source": [ + "The parameters $\\boldsymbol{\\beta}$ were defined using a minimization method like gradient descent or Newton-Raphson's method. \n", + "\n", + "Now we replace $x_i$ with the activation $z_i^l$ for a given layer $l$ and the outputs as $y_i=a_i^l=f(z_i^l)$, with $z_i^l$ now being a function of the weights $w_{ij}^l$ and biases $b_i^l$. \n", + "We have then" + ] + }, + { + "cell_type": "markdown", + "id": "4852e4d2", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "a_i^l = y_i = \\frac{\\exp{(z_i^l)}}{1+\\exp{(z_i^l)}},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "e3b7cbef", + "metadata": { + "editable": true + }, + "source": [ + "with" + ] + }, + { + "cell_type": "markdown", + "id": "0c1e69a1", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "z_i^l = \\sum_{j}w_{ij}^l a_j^{l-1}+b_i^l,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "e71df7f4", + "metadata": { + "editable": true + }, + "source": [ + "where the superscript $l-1$ indicates that these are the outputs from layer $l-1$.\n", + "Our cost function at the final layer $l=L$ is now" + ] + }, + { + "cell_type": "markdown", + "id": "50d6fecc", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathcal{C}(\\boldsymbol{W}) = - \\sum_{i=1}^n \\left(t_i\\log{a_i^L}+(1-t_i)\\log{(1-a_i^L)}\\right),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "e145e461", + "metadata": { + "editable": true + }, + "source": [ + "where we have defined the targets $t_i$. 
The derivatives of the cost function with respect to the output $a_i^L$ are then easily calculated and we get" + ] + }, + { + "cell_type": "markdown", + "id": "97f13260", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial \\mathcal{C}(\\boldsymbol{W})}{\\partial a_i^L} = \\frac{a_i^L-t_i}{a_i^L(1-a_i^L)}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "4361ce3b", + "metadata": { + "editable": true + }, + "source": [ + "In case we use another activation function than the logistic one, we need to evaluate other derivatives." + ] + }, + { + "cell_type": "markdown", + "id": "52a16654", + "metadata": { + "editable": true + }, + "source": [ + "## The Softmax function\n", + "In case we employ the more general case given by the Softmax equation, we need to evaluate the derivative of the activation function with respect to the activation $z_i^l$, that is we need" + ] + }, + { + "cell_type": "markdown", + "id": "3bfb321e", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial f(z_i^l)}{\\partial w_{jk}^l} =\n", + "\\frac{\\partial f(z_i^l)}{\\partial z_j^l} \\frac{\\partial z_j^l}{\\partial w_{jk}^l}= \\frac{\\partial f(z_i^l)}{\\partial z_j^l}a_k^{l-1}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "eccac6c9", + "metadata": { + "editable": true + }, + "source": [ + "For the Softmax function we have" + ] + }, + { + "cell_type": "markdown", + "id": "23634198", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "f(z_i^l) = \\frac{\\exp{(z_i^l)}}{\\sum_{m=1}^K\\exp{(z_m^l)}}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "7a2e75ba", + "metadata": { + "editable": true + }, + "source": [ + "Its derivative with respect to $z_j^l$ gives" + ] + }, + { + "cell_type": "markdown", + "id": "2dad2d14", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial f(z_i^l)}{\\partial z_j^l}= f(z_i^l)\\left(\\delta_{ij}-f(z_j^l)\\right),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "46415917", + "metadata": { + "editable": true + }, + "source": [ + "which in case of the simply binary model reduces to having $i=j$." + ] + }, + { + "cell_type": "markdown", + "id": "6adc7c1e", + "metadata": { + "editable": true + }, + "source": [ + "## Developing a code for doing neural networks with back propagation\n", + "\n", + "One can identify a set of key steps when using neural networks to solve supervised learning problems: \n", + "\n", + "1. Collect and pre-process data \n", + "\n", + "2. Define model and architecture \n", + "\n", + "3. Choose cost function and optimizer \n", + "\n", + "4. Train the model \n", + "\n", + "5. Evaluate model performance on test data \n", + "\n", + "6. Adjust hyperparameters (if necessary, network architecture)" + ] + }, + { + "cell_type": "markdown", + "id": "4110d83e", + "metadata": { + "editable": true + }, + "source": [ + "## Collect and pre-process data\n", + "\n", + "Here we will be using the MNIST dataset, which is readily available through the **scikit-learn**\n", + "package. You may also find it for example [here](http://yann.lecun.com/exdb/mnist/). \n", + "The *MNIST* (Modified National Institute of Standards and Technology) database is a large database\n", + "of handwritten digits that is commonly used for training various image processing systems. \n", + "The MNIST dataset consists of 70 000 images of size $28\\times 28$ pixels, each labeled from 0 to 9. 
\n", + "The scikit-learn dataset we will use consists of a selection of 1797 images of size $8\\times 8$ collected and processed from this database. \n", + "\n", + "To feed data into a feed-forward neural network we need to represent\n", + "the inputs as a design/feature matrix $X = (n_{inputs}, n_{features})$. Each\n", + "row represents an *input*, in this case a handwritten digit, and\n", + "each column represents a *feature*, in this case a pixel. The\n", + "correct answers, also known as *labels* or *targets* are\n", + "represented as a 1D array of integers \n", + "$Y = (n_{inputs}) = (5, 3, 1, 8,...)$.\n", + "\n", + "As an example, say we want to build a neural network using supervised learning to predict Body-Mass Index (BMI) from\n", + "measurements of height (in m) \n", + "and weight (in kg). If we have measurements of 5 people the design/feature matrix could be for example: \n", + "\n", + "$$ X = \\begin{bmatrix}\n", + "1.85 & 81\\\\\n", + "1.71 & 65\\\\\n", + "1.95 & 103\\\\\n", + "1.55 & 42\\\\\n", + "1.63 & 56\n", + "\\end{bmatrix} ,$$ \n", + "\n", + "and the targets would be: \n", + "\n", + "$$ Y = (23.7, 22.2, 27.1, 17.5, 21.1) $$ \n", + "\n", + "Since each input image is a 2D matrix, we need to flatten the image\n", + "(i.e. \"unravel\" the 2D matrix into a 1D array) to turn the data into a\n", + "design/feature matrix. This means we lose all spatial information in the\n", + "image, such as locality and translational invariance. More complicated\n", + "architectures such as Convolutional Neural Networks can take advantage\n", + "of such information, and are most commonly applied when analyzing\n", + "images." + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "070c610d", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# import necessary packages\n", + "import numpy as np\n", + "import matplotlib.pyplot as plt\n", + "from sklearn import datasets\n", + "\n", + "\n", + "# ensure the same random numbers appear every time\n", + "np.random.seed(0)\n", + "\n", + "# display images in notebook\n", + "%matplotlib inline\n", + "plt.rcParams['figure.figsize'] = (12,12)\n", + "\n", + "\n", + "# download MNIST dataset\n", + "digits = datasets.load_digits()\n", + "\n", + "# define inputs and labels\n", + "inputs = digits.images\n", + "labels = digits.target\n", + "\n", + "print(\"inputs = (n_inputs, pixel_width, pixel_height) = \" + str(inputs.shape))\n", + "print(\"labels = (n_inputs) = \" + str(labels.shape))\n", + "\n", + "\n", + "# flatten the image\n", + "# the value -1 means dimension is inferred from the remaining dimensions: 8x8 = 64\n", + "n_inputs = len(inputs)\n", + "inputs = inputs.reshape(n_inputs, -1)\n", + "print(\"X = (n_inputs, n_features) = \" + str(inputs.shape))\n", + "\n", + "\n", + "# choose some random images to display\n", + "indices = np.arange(n_inputs)\n", + "random_indices = np.random.choice(indices, size=5)\n", + "\n", + "for i, image in enumerate(digits.images[random_indices]):\n", + " plt.subplot(1, 5, i+1)\n", + " plt.axis('off')\n", + " plt.imshow(image, cmap=plt.cm.gray_r, interpolation='nearest')\n", + " plt.title(\"Label: %d\" % digits.target[random_indices[i]])\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "28bb6085", + "metadata": { + "editable": true + }, + "source": [ + "## Train and test datasets\n", + "\n", + "Performing analysis before partitioning the dataset is a major error, that can lead to incorrect conclusions. 
\n", + "\n", + "We will reserve $80 \\%$ of our dataset for training and $20 \\%$ for testing. \n", + "\n", + "It is important that the train and test datasets are drawn randomly from our dataset, to ensure\n", + "no bias in the sampling. \n", + "Say you are taking measurements of weather data to predict the weather in the coming 5 days.\n", + "You don't want to train your model on measurements taken from the hours 00.00 to 12.00, and then test it on data\n", + "collected from 12.00 to 24.00." + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "5a6ae0b0", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "from sklearn.model_selection import train_test_split\n", + "\n", + "# one-liner from scikit-learn library\n", + "train_size = 0.8\n", + "test_size = 1 - train_size\n", + "X_train, X_test, Y_train, Y_test = train_test_split(inputs, labels, train_size=train_size,\n", + " test_size=test_size)\n", + "\n", + "# equivalently in numpy\n", + "def train_test_split_numpy(inputs, labels, train_size, test_size):\n", + " n_inputs = len(inputs)\n", + " inputs_shuffled = inputs.copy()\n", + " labels_shuffled = labels.copy()\n", + " \n", + " np.random.shuffle(inputs_shuffled)\n", + " np.random.shuffle(labels_shuffled)\n", + " \n", + " train_end = int(n_inputs*train_size)\n", + " X_train, X_test = inputs_shuffled[:train_end], inputs_shuffled[train_end:]\n", + " Y_train, Y_test = labels_shuffled[:train_end], labels_shuffled[train_end:]\n", + " \n", + " return X_train, X_test, Y_train, Y_test\n", + "\n", + "#X_train, X_test, Y_train, Y_test = train_test_split_numpy(inputs, labels, train_size, test_size)\n", + "\n", + "print(\"Number of training images: \" + str(len(X_train)))\n", + "print(\"Number of test images: \" + str(len(X_test)))" + ] + }, + { + "cell_type": "markdown", + "id": "c26d604d", + "metadata": { + "editable": true + }, + "source": [ + "## Define model and architecture\n", + "\n", + "Our simple feed-forward neural network will consist of an *input* layer, a single *hidden* layer and an *output* layer. The activation $y$ of each neuron is a weighted sum of inputs, passed through an activation function. In case of the simple perceptron model we have \n", + "\n", + "$$ z = \\sum_{i=1}^n w_i a_i ,$$\n", + "\n", + "$$ y = f(z) ,$$\n", + "\n", + "where $f$ is the activation function, $a_i$ represents input from neuron $i$ in the preceding layer\n", + "and $w_i$ is the weight to input $i$. \n", + "The activation of the neurons in the input layer is just the features (e.g. a pixel value). \n", + "\n", + "The simplest activation function for a neuron is the *Heaviside* function:\n", + "\n", + "$$ f(z) = \n", + "\\begin{cases}\n", + "1, & z > 0\\\\\n", + "0, & \\text{otherwise}\n", + "\\end{cases}\n", + "$$\n", + "\n", + "A feed-forward neural network with this activation is known as a *perceptron*. \n", + "For a binary classifier (i.e. two classes, 0 or 1, dog or not-dog) we can also use this in our output layer. \n", + "This activation can be generalized to $k$ classes (using e.g. the *one-against-all* strategy), \n", + "and we call these architectures *multiclass perceptrons*. \n", + "\n", + "However, it is now common to use the terms Single Layer Perceptron (SLP) (1 hidden layer) and \n", + "Multilayer Perceptron (MLP) (2 or more hidden layers) to refer to feed-forward neural networks with any activation function. 
\n", + "\n", + "Typical choices for activation functions include the sigmoid function, hyperbolic tangent, and Rectified Linear Unit (ReLU). \n", + "We will be using the sigmoid function $\\sigma(x)$: \n", + "\n", + "$$ f(x) = \\sigma(x) = \\frac{1}{1 + e^{-x}} ,$$\n", + "\n", + "which is inspired by probability theory (see logistic regression) and was most commonly used until about 2011. See the discussion below concerning other activation functions." + ] + }, + { + "cell_type": "markdown", + "id": "2775283b", + "metadata": { + "editable": true + }, + "source": [ + "## Layers\n", + "\n", + "* Input \n", + "\n", + "Since each input image has 8x8 = 64 pixels or features, we have an input layer of 64 neurons. \n", + "\n", + "* Hidden layer\n", + "\n", + "We will use 50 neurons in the hidden layer receiving input from the neurons in the input layer. \n", + "Since each neuron in the hidden layer is connected to the 64 inputs we have 64x50 = 3200 weights to the hidden layer. \n", + "\n", + "* Output\n", + "\n", + "If we were building a binary classifier, it would be sufficient with a single neuron in the output layer,\n", + "which could output 0 or 1 according to the Heaviside function. This would be an example of a *hard* classifier, meaning it outputs the class of the input directly. However, if we are dealing with noisy data it is often beneficial to use a *soft* classifier, which outputs the probability of being in class 0 or 1. \n", + "\n", + "For a soft binary classifier, we could use a single neuron and interpret the output as either being the probability of being in class 0 or the probability of being in class 1. Alternatively we could use 2 neurons, and interpret each neuron as the probability of being in each class. \n", + "\n", + "Since we are doing multiclass classification, with 10 categories, it is natural to use 10 neurons in the output layer. We number the neurons $j = 0,1,...,9$. The activation of each output neuron $j$ will be according to the *softmax* function: \n", + "\n", + "$$ P(\\text{class $j$} \\mid \\text{input $\\boldsymbol{a}$}) = \\frac{\\exp{(\\boldsymbol{a}^T \\boldsymbol{w}_j)}}\n", + "{\\sum_{c=0}^{9} \\exp{(\\boldsymbol{a}^T \\boldsymbol{w}_c)}} ,$$ \n", + "\n", + "i.e. each neuron $j$ outputs the probability of being in class $j$ given an input from the hidden layer $\\boldsymbol{a}$, with $\\boldsymbol{w}_j$ the weights of neuron $j$ to the inputs. \n", + "The denominator is a normalization factor to ensure the outputs (probabilities) sum up to 1. \n", + "The exponent is just the weighted sum of inputs as before: \n", + "\n", + "$$ z_j = \\sum_{i=1}^n w_ {ij} a_i+b_j.$$ \n", + "\n", + "Since each neuron in the output layer is connected to the 50 inputs from the hidden layer we have 50x10 = 500\n", + "weights to the output layer." + ] + }, + { + "cell_type": "markdown", + "id": "f7455c00", + "metadata": { + "editable": true + }, + "source": [ + "## Weights and biases\n", + "\n", + "Typically weights are initialized with small values distributed around zero, drawn from a uniform\n", + "or normal distribution. Setting all weights to zero means all neurons give the same output, making the network useless. \n", + "\n", + "Adding a bias value to the weighted sum of inputs allows the neural network to represent a greater range\n", + "of values. Without it, any input with the value 0 will be mapped to zero (before being passed through the activation). 
The bias unit has an output of 1, and a weight to each neuron $j$, $b_j$: \n", + "\n", + "$$ z_j = \\sum_{i=1}^n w_ {ij} a_i + b_j.$$ \n", + "\n", + "The bias weights $\\boldsymbol{b}$ are often initialized to zero, but a small value like $0.01$ ensures all neurons have some output which can be backpropagated in the first training cycle." + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "20b3c8c0", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# building our neural network\n", + "\n", + "n_inputs, n_features = X_train.shape\n", + "n_hidden_neurons = 50\n", + "n_categories = 10\n", + "\n", + "# we make the weights normally distributed using numpy.random.randn\n", + "\n", + "# weights and bias in the hidden layer\n", + "hidden_weights = np.random.randn(n_features, n_hidden_neurons)\n", + "hidden_bias = np.zeros(n_hidden_neurons) + 0.01\n", + "\n", + "# weights and bias in the output layer\n", + "output_weights = np.random.randn(n_hidden_neurons, n_categories)\n", + "output_bias = np.zeros(n_categories) + 0.01" + ] + }, + { + "cell_type": "markdown", + "id": "a41d9acd", + "metadata": { + "editable": true + }, + "source": [ + "## Feed-forward pass\n", + "\n", + "Denote $F$ the number of features, $H$ the number of hidden neurons and $C$ the number of categories. \n", + "For each input image we calculate a weighted sum of input features (pixel values) to each neuron $j$ in the hidden layer $l$: \n", + "\n", + "$$ z_{j}^{l} = \\sum_{i=1}^{F} w_{ij}^{l} x_i + b_{j}^{l},$$\n", + "\n", + "this is then passed through our activation function \n", + "\n", + "$$ a_{j}^{l} = f(z_{j}^{l}) .$$ \n", + "\n", + "We calculate a weighted sum of inputs (activations in the hidden layer) to each neuron $j$ in the output layer: \n", + "\n", + "$$ z_{j}^{L} = \\sum_{i=1}^{H} w_{ij}^{L} a_{i}^{l} + b_{j}^{L}.$$ \n", + "\n", + "Finally we calculate the output of neuron $j$ in the output layer using the softmax function: \n", + "\n", + "$$ a_{j}^{L} = \\frac{\\exp{(z_j^{L})}}\n", + "{\\sum_{c=0}^{C-1} \\exp{(z_c^{L})}} .$$" + ] + }, + { + "cell_type": "markdown", + "id": "b2f64238", + "metadata": { + "editable": true + }, + "source": [ + "## Matrix multiplications\n", + "\n", + "Since our data has the dimensions $X = (n_{inputs}, n_{features})$ and our weights to the hidden\n", + "layer have the dimensions \n", + "$W_{hidden} = (n_{features}, n_{hidden})$,\n", + "we can easily feed the network all our training data in one go by taking the matrix product \n", + "\n", + "$$ X W^{h} = (n_{inputs}, n_{hidden}),$$ \n", + "\n", + "and obtain a matrix that holds the weighted sum of inputs to the hidden layer\n", + "for each input image and each hidden neuron. \n", + "We also add the bias to obtain a matrix of weighted sums to the hidden layer $Z^{h}$: \n", + "\n", + "$$ \\boldsymbol{z}^{l} = \\boldsymbol{X} \\boldsymbol{W}^{l} + \\boldsymbol{b}^{l} ,$$\n", + "\n", + "meaning the same bias (1D array with size equal number of hidden neurons) is added to each input image. 
\n", + "This is then passed through the activation: \n", + "\n", + "$$ \\boldsymbol{a}^{l} = f(\\boldsymbol{z}^l) .$$ \n", + "\n", + "This is fed to the output layer: \n", + "\n", + "$$ \\boldsymbol{z}^{L} = \\boldsymbol{a}^{L} \\boldsymbol{W}^{L} + \\boldsymbol{b}^{L} .$$\n", + "\n", + "Finally we receive our output values for each image and each category by passing it through the softmax function: \n", + "\n", + "$$ output = softmax (\\boldsymbol{z}^{L}) = (n_{inputs}, n_{categories}) .$$" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "1f5589af", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# setup the feed-forward pass, subscript h = hidden layer\n", + "\n", + "def sigmoid(x):\n", + " return 1/(1 + np.exp(-x))\n", + "\n", + "def feed_forward(X):\n", + " # weighted sum of inputs to the hidden layer\n", + " z_h = np.matmul(X, hidden_weights) + hidden_bias\n", + " # activation in the hidden layer\n", + " a_h = sigmoid(z_h)\n", + " \n", + " # weighted sum of inputs to the output layer\n", + " z_o = np.matmul(a_h, output_weights) + output_bias\n", + " # softmax output\n", + " # axis 0 holds each input and axis 1 the probabilities of each category\n", + " exp_term = np.exp(z_o)\n", + " probabilities = exp_term / np.sum(exp_term, axis=1, keepdims=True)\n", + " \n", + " return probabilities\n", + "\n", + "probabilities = feed_forward(X_train)\n", + "print(\"probabilities = (n_inputs, n_categories) = \" + str(probabilities.shape))\n", + "print(\"probability that image 0 is in category 0,1,2,...,9 = \\n\" + str(probabilities[0]))\n", + "print(\"probabilities sum up to: \" + str(probabilities[0].sum()))\n", + "print()\n", + "\n", + "# we obtain a prediction by taking the class with the highest likelihood\n", + "def predict(X):\n", + " probabilities = feed_forward(X)\n", + " return np.argmax(probabilities, axis=1)\n", + "\n", + "predictions = predict(X_train)\n", + "print(\"predictions = (n_inputs) = \" + str(predictions.shape))\n", + "print(\"prediction for image 0: \" + str(predictions[0]))\n", + "print(\"correct label for image 0: \" + str(Y_train[0]))" + ] + }, + { + "cell_type": "markdown", + "id": "4518e911", + "metadata": { + "editable": true + }, + "source": [ + "## Choose cost function and optimizer\n", + "\n", + "To measure how well our neural network is doing we need to introduce a cost function. \n", + "We will call the function that gives the error of a single sample output the *loss* function, and the function\n", + "that gives the total error of our network across all samples the *cost* function.\n", + "A typical choice for multiclass classification is the *cross-entropy* loss, also known as the negative log likelihood. \n", + "\n", + "In *multiclass* classification it is common to treat each integer label as a so called *one-hot* vector: \n", + "\n", + "$$ y = 5 \\quad \\rightarrow \\quad \\boldsymbol{y} = (0, 0, 0, 0, 0, 1, 0, 0, 0, 0) ,$$ \n", + "\n", + "$$ y = 1 \\quad \\rightarrow \\quad \\boldsymbol{y} = (0, 1, 0, 0, 0, 0, 0, 0, 0, 0) ,$$ \n", + "\n", + "i.e. a binary bit string of length $C$, where $C = 10$ is the number of classes in the MNIST dataset. \n", + "\n", + "Let $y_{ic}$ denote the $c$-th component of the $i$-th one-hot vector. 
\n", + "We define the cost function $\\mathcal{C}$ as a sum over the cross-entropy loss for each point $\\boldsymbol{x}_i$ in the dataset.\n", + "\n", + "In the one-hot representation only one of the terms in the loss function is non-zero, namely the\n", + "probability of the correct category $c'$ \n", + "(i.e. the category $c'$ such that $y_{ic'} = 1$). This means that the cross entropy loss only punishes you for how wrong\n", + "you got the correct label. The probability of category $c$ is given by the softmax function. The vector $\\boldsymbol{\\theta}$ represents the parameters of our network, i.e. all the weights and biases." + ] + }, + { + "cell_type": "markdown", + "id": "d519516b", + "metadata": { + "editable": true + }, + "source": [ + "## Optimizing the cost function\n", + "\n", + "The network is trained by finding the weights and biases that minimize the cost function. One of the most widely used classes of methods is *gradient descent* and its generalizations. The idea behind gradient descent\n", + "is simply to adjust the weights in the direction where the gradient of the cost function is large and negative. This ensures we flow toward a *local* minimum of the cost function. \n", + "Each parameter $\\theta$ is iteratively adjusted according to the rule \n", + "\n", + "$$ \\theta_{i+1} = \\theta_i - \\eta \\nabla \\mathcal{C}(\\theta_i) ,$$\n", + "\n", + "where $\\eta$ is known as the *learning rate*, which controls how big a step we take towards the minimum. \n", + "This update can be repeated for any number of iterations, or until we are satisfied with the result. \n", + "\n", + "A simple and effective improvement is a variant called *Batch Gradient Descent*. \n", + "Instead of calculating the gradient on the whole dataset, we calculate an approximation of the gradient\n", + "on a subset of the data called a *minibatch*. \n", + "If there are $N$ data points and we have a minibatch size of $M$, the total number of batches\n", + "is $N/M$. \n", + "We denote each minibatch $B_k$, with $k = 1, 2,...,N/M$. The gradient then becomes: \n", + "\n", + "$$ \\nabla \\mathcal{C}(\\theta) = \\frac{1}{N} \\sum_{i=1}^N \\nabla \\mathcal{L}_i(\\theta) \\quad \\rightarrow \\quad\n", + "\\frac{1}{M} \\sum_{i \\in B_k} \\nabla \\mathcal{L}_i(\\theta) ,$$\n", + "\n", + "i.e. instead of averaging the loss over the entire dataset, we average over a minibatch. \n", + "\n", + "This has two important benefits: \n", + "1. Introducing stochasticity decreases the chance that the algorithm becomes stuck in a local minima. \n", + "\n", + "2. It significantly speeds up the calculation, since we do not have to use the entire dataset to calculate the gradient. \n", + "\n", + "The various optmization methods, with codes and algorithms, are discussed in our lectures on [Gradient descent approaches](https://compphysics.github.io/MachineLearning/doc/pub/Splines/html/Splines-bs.html)." + ] + }, + { + "cell_type": "markdown", + "id": "46b71202", + "metadata": { + "editable": true + }, + "source": [ + "## Regularization\n", + "\n", + "It is common to add an extra term to the cost function, proportional\n", + "to the size of the weights. 
This is equivalent to constraining the\n", + "size of the weights, so that they do not grow out of control.\n", + "Constraining the size of the weights means that the weights cannot\n", + "grow arbitrarily large to fit the training data, and in this way\n", + "reduces *overfitting*.\n", + "\n", + "We will measure the size of the weights using the so called *L2-norm*, meaning our cost function becomes: \n", + "\n", + "$$ \\mathcal{C}(\\theta) = \\frac{1}{N} \\sum_{i=1}^N \\mathcal{L}_i(\\theta) \\quad \\rightarrow \\quad\n", + "\\frac{1}{N} \\sum_{i=1}^N \\mathcal{L}_i(\\theta) + \\lambda \\lvert \\lvert \\boldsymbol{w} \\rvert \\rvert_2^2 \n", + "= \\frac{1}{N} \\sum_{i=1}^N \\mathcal{L}(\\theta) + \\lambda \\sum_{ij} w_{ij}^2,$$ \n", + "\n", + "i.e. we sum up all the weights squared. The factor $\\lambda$ is known as a regularization parameter.\n", + "\n", + "In order to train the model, we need to calculate the derivative of\n", + "the cost function with respect to every bias and weight in the\n", + "network. In total our network has $(64 + 1)\\times 50=3250$ weights in\n", + "the hidden layer and $(50 + 1)\\times 10=510$ weights to the output\n", + "layer ($+1$ for the bias), and the gradient must be calculated for\n", + "every parameter. We use the *backpropagation* algorithm discussed\n", + "above. This is a clever use of the chain rule that allows us to\n", + "calculate the gradient efficently." + ] + }, + { + "cell_type": "markdown", + "id": "129c39d3", + "metadata": { + "editable": true + }, + "source": [ + "## Matrix multiplication\n", + "\n", + "To more efficently train our network these equations are implemented using matrix operations. \n", + "The error in the output layer is calculated simply as, with $\\boldsymbol{t}$ being our targets, \n", + "\n", + "$$ \\delta_L = \\boldsymbol{t} - \\boldsymbol{y} = (n_{inputs}, n_{categories}) .$$ \n", + "\n", + "The gradient for the output weights is calculated as \n", + "\n", + "$$ \\nabla W_{L} = \\boldsymbol{a}^T \\delta_L = (n_{hidden}, n_{categories}) ,$$\n", + "\n", + "where $\\boldsymbol{a} = (n_{inputs}, n_{hidden})$. This simply means that we are summing up the gradients for each input. \n", + "Since we are going backwards we have to transpose the activation matrix. \n", + "\n", + "The gradient with respect to the output bias is then \n", + "\n", + "$$ \\nabla \\boldsymbol{b}_{L} = \\sum_{i=1}^{n_{inputs}} \\delta_L = (n_{categories}) .$$ \n", + "\n", + "The error in the hidden layer is \n", + "\n", + "$$ \\Delta_h = \\delta_L W_{L}^T \\circ f'(z_{h}) = \\delta_L W_{L}^T \\circ a_{h} \\circ (1 - a_{h}) = (n_{inputs}, n_{hidden}) ,$$ \n", + "\n", + "where $f'(a_{h})$ is the derivative of the activation in the hidden layer. The matrix products mean\n", + "that we are summing up the products for each neuron in the output layer. The symbol $\\circ$ denotes\n", + "the *Hadamard product*, meaning element-wise multiplication. 
\n", + "\n", + "This again gives us the gradients in the hidden layer: \n", + "\n", + "$$ \\nabla W_{h} = X^T \\delta_h = (n_{features}, n_{hidden}) ,$$ \n", + "\n", + "$$ \\nabla b_{h} = \\sum_{i=1}^{n_{inputs}} \\delta_h = (n_{hidden}) .$$" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "8abafb44", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# to categorical turns our integer vector into a onehot representation\n", + "from sklearn.metrics import accuracy_score\n", + "\n", + "# one-hot in numpy\n", + "def to_categorical_numpy(integer_vector):\n", + " n_inputs = len(integer_vector)\n", + " n_categories = np.max(integer_vector) + 1\n", + " onehot_vector = np.zeros((n_inputs, n_categories))\n", + " onehot_vector[range(n_inputs), integer_vector] = 1\n", + " \n", + " return onehot_vector\n", + "\n", + "#Y_train_onehot, Y_test_onehot = to_categorical(Y_train), to_categorical(Y_test)\n", + "Y_train_onehot, Y_test_onehot = to_categorical_numpy(Y_train), to_categorical_numpy(Y_test)\n", + "\n", + "def feed_forward_train(X):\n", + " # weighted sum of inputs to the hidden layer\n", + " z_h = np.matmul(X, hidden_weights) + hidden_bias\n", + " # activation in the hidden layer\n", + " a_h = sigmoid(z_h)\n", + " \n", + " # weighted sum of inputs to the output layer\n", + " z_o = np.matmul(a_h, output_weights) + output_bias\n", + " # softmax output\n", + " # axis 0 holds each input and axis 1 the probabilities of each category\n", + " exp_term = np.exp(z_o)\n", + " probabilities = exp_term / np.sum(exp_term, axis=1, keepdims=True)\n", + " \n", + " # for backpropagation need activations in hidden and output layers\n", + " return a_h, probabilities\n", + "\n", + "def backpropagation(X, Y):\n", + " a_h, probabilities = feed_forward_train(X)\n", + " \n", + " # error in the output layer\n", + " error_output = probabilities - Y\n", + " # error in the hidden layer\n", + " error_hidden = np.matmul(error_output, output_weights.T) * a_h * (1 - a_h)\n", + " \n", + " # gradients for the output layer\n", + " output_weights_gradient = np.matmul(a_h.T, error_output)\n", + " output_bias_gradient = np.sum(error_output, axis=0)\n", + " \n", + " # gradient for the hidden layer\n", + " hidden_weights_gradient = np.matmul(X.T, error_hidden)\n", + " hidden_bias_gradient = np.sum(error_hidden, axis=0)\n", + "\n", + " return output_weights_gradient, output_bias_gradient, hidden_weights_gradient, hidden_bias_gradient\n", + "\n", + "print(\"Old accuracy on training data: \" + str(accuracy_score(predict(X_train), Y_train)))\n", + "\n", + "eta = 0.01\n", + "lmbd = 0.01\n", + "for i in range(1000):\n", + " # calculate gradients\n", + " dWo, dBo, dWh, dBh = backpropagation(X_train, Y_train_onehot)\n", + " \n", + " # regularization term gradients\n", + " dWo += lmbd * output_weights\n", + " dWh += lmbd * hidden_weights\n", + " \n", + " # update weights and biases\n", + " output_weights -= eta * dWo\n", + " output_bias -= eta * dBo\n", + " hidden_weights -= eta * dWh\n", + " hidden_bias -= eta * dBh\n", + "\n", + "print(\"New accuracy on training data: \" + str(accuracy_score(predict(X_train), Y_train)))" + ] + }, + { + "cell_type": "markdown", + "id": "e95c7166", + "metadata": { + "editable": true + }, + "source": [ + "## Improving performance\n", + "\n", + "As we can see the network does not seem to be learning at all. It seems to be just guessing the label for each image. 
\n", + "In order to obtain a network that does something useful, we will have to do a bit more work. \n", + "\n", + "The choice of *hyperparameters* such as learning rate and regularization parameter is hugely influential for the performance of the network. Typically a *grid-search* is performed, wherein we test different hyperparameters separated by orders of magnitude. For example we could test the learning rates $\\eta = 10^{-6}, 10^{-5},...,10^{-1}$ with different regularization parameters $\\lambda = 10^{-6},...,10^{-0}$. \n", + "\n", + "Next, we haven't implemented minibatching yet, which introduces stochasticity and is though to act as an important regularizer on the weights. We call a feed-forward + backward pass with a minibatch an *iteration*, and a full training period\n", + "going through the entire dataset ($n/M$ batches) an *epoch*.\n", + "\n", + "If this does not improve network performance, you may want to consider altering the network architecture, adding more neurons or hidden layers. \n", + "Andrew Ng goes through some of these considerations in this [video](https://youtu.be/F1ka6a13S9I). You can find a summary of the video [here](https://kevinzakka.github.io/2016/09/26/applying-deep-learning/)." + ] + }, + { + "cell_type": "markdown", + "id": "b4365471", + "metadata": { + "editable": true + }, + "source": [ + "## Full object-oriented implementation\n", + "\n", + "It is very natural to think of the network as an object, with specific instances of the network\n", + "being realizations of this object with different hyperparameters. An implementation using Python classes provides a clean structure and interface, and the full implementation of our neural network is given below." + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "5a0357b2", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "class NeuralNetwork:\n", + " def __init__(\n", + " self,\n", + " X_data,\n", + " Y_data,\n", + " n_hidden_neurons=50,\n", + " n_categories=10,\n", + " epochs=10,\n", + " batch_size=100,\n", + " eta=0.1,\n", + " lmbd=0.0):\n", + "\n", + " self.X_data_full = X_data\n", + " self.Y_data_full = Y_data\n", + "\n", + " self.n_inputs = X_data.shape[0]\n", + " self.n_features = X_data.shape[1]\n", + " self.n_hidden_neurons = n_hidden_neurons\n", + " self.n_categories = n_categories\n", + "\n", + " self.epochs = epochs\n", + " self.batch_size = batch_size\n", + " self.iterations = self.n_inputs // self.batch_size\n", + " self.eta = eta\n", + " self.lmbd = lmbd\n", + "\n", + " self.create_biases_and_weights()\n", + "\n", + " def create_biases_and_weights(self):\n", + " self.hidden_weights = np.random.randn(self.n_features, self.n_hidden_neurons)\n", + " self.hidden_bias = np.zeros(self.n_hidden_neurons) + 0.01\n", + "\n", + " self.output_weights = np.random.randn(self.n_hidden_neurons, self.n_categories)\n", + " self.output_bias = np.zeros(self.n_categories) + 0.01\n", + "\n", + " def feed_forward(self):\n", + " # feed-forward for training\n", + " self.z_h = np.matmul(self.X_data, self.hidden_weights) + self.hidden_bias\n", + " self.a_h = sigmoid(self.z_h)\n", + "\n", + " self.z_o = np.matmul(self.a_h, self.output_weights) + self.output_bias\n", + "\n", + " exp_term = np.exp(self.z_o)\n", + " self.probabilities = exp_term / np.sum(exp_term, axis=1, keepdims=True)\n", + "\n", + " def feed_forward_out(self, X):\n", + " # feed-forward for output\n", + " z_h = np.matmul(X, self.hidden_weights) + self.hidden_bias\n", + " a_h = sigmoid(z_h)\n", 
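+ "        # unlike feed_forward above, nothing is stored on self here; this pass is only used for predictions\n",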
+ "\n", + " z_o = np.matmul(a_h, self.output_weights) + self.output_bias\n", + " \n", + " exp_term = np.exp(z_o)\n", + " probabilities = exp_term / np.sum(exp_term, axis=1, keepdims=True)\n", + " return probabilities\n", + "\n", + " def backpropagation(self):\n", + " error_output = self.probabilities - self.Y_data\n", + " error_hidden = np.matmul(error_output, self.output_weights.T) * self.a_h * (1 - self.a_h)\n", + "\n", + " self.output_weights_gradient = np.matmul(self.a_h.T, error_output)\n", + " self.output_bias_gradient = np.sum(error_output, axis=0)\n", + "\n", + " self.hidden_weights_gradient = np.matmul(self.X_data.T, error_hidden)\n", + " self.hidden_bias_gradient = np.sum(error_hidden, axis=0)\n", + "\n", + " if self.lmbd > 0.0:\n", + " self.output_weights_gradient += self.lmbd * self.output_weights\n", + " self.hidden_weights_gradient += self.lmbd * self.hidden_weights\n", + "\n", + " self.output_weights -= self.eta * self.output_weights_gradient\n", + " self.output_bias -= self.eta * self.output_bias_gradient\n", + " self.hidden_weights -= self.eta * self.hidden_weights_gradient\n", + " self.hidden_bias -= self.eta * self.hidden_bias_gradient\n", + "\n", + " def predict(self, X):\n", + " probabilities = self.feed_forward_out(X)\n", + " return np.argmax(probabilities, axis=1)\n", + "\n", + " def predict_probabilities(self, X):\n", + " probabilities = self.feed_forward_out(X)\n", + " return probabilities\n", + "\n", + " def train(self):\n", + " data_indices = np.arange(self.n_inputs)\n", + "\n", + " for i in range(self.epochs):\n", + " for j in range(self.iterations):\n", + " # pick datapoints with replacement\n", + " chosen_datapoints = np.random.choice(\n", + " data_indices, size=self.batch_size, replace=False\n", + " )\n", + "\n", + " # minibatch training data\n", + " self.X_data = self.X_data_full[chosen_datapoints]\n", + " self.Y_data = self.Y_data_full[chosen_datapoints]\n", + "\n", + " self.feed_forward()\n", + " self.backpropagation()" + ] + }, + { + "cell_type": "markdown", + "id": "a417307d", + "metadata": { + "editable": true + }, + "source": [ + "## Evaluate model performance on test data\n", + "\n", + "To measure the performance of our network we evaluate how well it does it data it has never seen before, i.e. the test data. \n", + "We measure the performance of the network using the *accuracy* score. \n", + "The accuracy is as you would expect just the number of images correctly labeled divided by the total number of images. A perfect classifier will have an accuracy score of $1$. \n", + "\n", + "$$ \\text{Accuracy} = \\frac{\\sum_{i=1}^n I(\\tilde{y}_i = y_i)}{n} ,$$ \n", + "\n", + "where $I$ is the indicator function, $1$ if $\\tilde{y}_i = y_i$ and $0$ otherwise." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 9, + "id": "8ee4b306", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "epochs = 100\n", + "batch_size = 100\n", + "\n", + "dnn = NeuralNetwork(X_train, Y_train_onehot, eta=eta, lmbd=lmbd, epochs=epochs, batch_size=batch_size,\n", + " n_hidden_neurons=n_hidden_neurons, n_categories=n_categories)\n", + "dnn.train()\n", + "test_predict = dnn.predict(X_test)\n", + "\n", + "# accuracy score from scikit library\n", + "print(\"Accuracy score on test set: \", accuracy_score(Y_test, test_predict))\n", + "\n", + "# equivalent in numpy\n", + "def accuracy_score_numpy(Y_test, Y_pred):\n", + " return np.sum(Y_test == Y_pred) / len(Y_test)\n", + "\n", + "#print(\"Accuracy score on test set: \", accuracy_score_numpy(Y_test, test_predict))" + ] + }, + { + "cell_type": "markdown", + "id": "efcbd954", + "metadata": { + "editable": true + }, + "source": [ + "## Adjust hyperparameters\n", + "\n", + "We now perform a grid search to find the optimal hyperparameters for the network. \n", + "Note that we are only using 1 layer with 50 neurons, and human performance is estimated to be around $98\\%$ ($2\\%$ error rate)." + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "id": "bb527e6e", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "eta_vals = np.logspace(-5, 1, 7)\n", + "lmbd_vals = np.logspace(-5, 1, 7)\n", + "# store the models for later use\n", + "DNN_numpy = np.zeros((len(eta_vals), len(lmbd_vals)), dtype=object)\n", + "\n", + "# grid search\n", + "for i, eta in enumerate(eta_vals):\n", + " for j, lmbd in enumerate(lmbd_vals):\n", + " dnn = NeuralNetwork(X_train, Y_train_onehot, eta=eta, lmbd=lmbd, epochs=epochs, batch_size=batch_size,\n", + " n_hidden_neurons=n_hidden_neurons, n_categories=n_categories)\n", + " dnn.train()\n", + " \n", + " DNN_numpy[i][j] = dnn\n", + " \n", + " test_predict = dnn.predict(X_test)\n", + " \n", + " print(\"Learning rate = \", eta)\n", + " print(\"Lambda = \", lmbd)\n", + " print(\"Accuracy score on test set: \", accuracy_score(Y_test, test_predict))\n", + " print()" + ] + }, + { + "cell_type": "markdown", + "id": "d282951d", + "metadata": { + "editable": true + }, + "source": [ + "## Visualization" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "id": "69d3d9c8", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# visual representation of grid search\n", + "# uses seaborn heatmap, you can also do this with matplotlib imshow\n", + "import seaborn as sns\n", + "\n", + "sns.set()\n", + "\n", + "train_accuracy = np.zeros((len(eta_vals), len(lmbd_vals)))\n", + "test_accuracy = np.zeros((len(eta_vals), len(lmbd_vals)))\n", + "\n", + "for i in range(len(eta_vals)):\n", + " for j in range(len(lmbd_vals)):\n", + " dnn = DNN_numpy[i][j]\n", + " \n", + " train_pred = dnn.predict(X_train) \n", + " test_pred = dnn.predict(X_test)\n", + "\n", + " train_accuracy[i][j] = accuracy_score(Y_train, train_pred)\n", + " test_accuracy[i][j] = accuracy_score(Y_test, test_pred)\n", + "\n", + " \n", + "fig, ax = plt.subplots(figsize = (10, 10))\n", + "sns.heatmap(train_accuracy, annot=True, ax=ax, cmap=\"viridis\")\n", + "ax.set_title(\"Training Accuracy\")\n", + "ax.set_ylabel(\"$\\eta$\")\n", + "ax.set_xlabel(\"$\\lambda$\")\n", + "plt.show()\n", + "\n", + "fig, ax = plt.subplots(figsize = (10, 10))\n", + "sns.heatmap(test_accuracy, annot=True, ax=ax, cmap=\"viridis\")\n", + 
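+ "# in both heatmaps the rows follow eta_vals (learning rate) and the columns follow lmbd_vals (regularization)\n",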
"ax.set_title(\"Test Accuracy\")\n", + "ax.set_ylabel(\"$\\eta$\")\n", + "ax.set_xlabel(\"$\\lambda$\")\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "99f5058c", + "metadata": { + "editable": true + }, + "source": [ + "## scikit-learn implementation\n", + "\n", + "**scikit-learn** focuses more\n", + "on traditional machine learning methods, such as regression,\n", + "clustering, decision trees, etc. As such, it has only two types of\n", + "neural networks: Multi Layer Perceptron outputting continuous values,\n", + "*MPLRegressor*, and Multi Layer Perceptron outputting labels,\n", + "*MLPClassifier*. We will see how simple it is to use these classes.\n", + "\n", + "**scikit-learn** implements a few improvements from our neural network,\n", + "such as early stopping, a varying learning rate, different\n", + "optimization methods, etc. We would therefore expect a better\n", + "performance overall." + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "id": "7898d99f", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "from sklearn.neural_network import MLPClassifier\n", + "# store models for later use\n", + "DNN_scikit = np.zeros((len(eta_vals), len(lmbd_vals)), dtype=object)\n", + "\n", + "for i, eta in enumerate(eta_vals):\n", + " for j, lmbd in enumerate(lmbd_vals):\n", + " dnn = MLPClassifier(hidden_layer_sizes=(n_hidden_neurons), activation='logistic',\n", + " alpha=lmbd, learning_rate_init=eta, max_iter=epochs)\n", + " dnn.fit(X_train, Y_train)\n", + " \n", + " DNN_scikit[i][j] = dnn\n", + " \n", + " print(\"Learning rate = \", eta)\n", + " print(\"Lambda = \", lmbd)\n", + " print(\"Accuracy score on test set: \", dnn.score(X_test, Y_test))\n", + " print()" + ] + }, + { + "cell_type": "markdown", + "id": "7ceec918", + "metadata": { + "editable": true + }, + "source": [ + "## Visualization" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "id": "98abf229", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# optional\n", + "# visual representation of grid search\n", + "# uses seaborn heatmap, could probably do this in matplotlib\n", + "import seaborn as sns\n", + "\n", + "sns.set()\n", + "\n", + "train_accuracy = np.zeros((len(eta_vals), len(lmbd_vals)))\n", + "test_accuracy = np.zeros((len(eta_vals), len(lmbd_vals)))\n", + "\n", + "for i in range(len(eta_vals)):\n", + " for j in range(len(lmbd_vals)):\n", + " dnn = DNN_scikit[i][j]\n", + " \n", + " train_pred = dnn.predict(X_train) \n", + " test_pred = dnn.predict(X_test)\n", + "\n", + " train_accuracy[i][j] = accuracy_score(Y_train, train_pred)\n", + " test_accuracy[i][j] = accuracy_score(Y_test, test_pred)\n", + "\n", + " \n", + "fig, ax = plt.subplots(figsize = (10, 10))\n", + "sns.heatmap(train_accuracy, annot=True, ax=ax, cmap=\"viridis\")\n", + "ax.set_title(\"Training Accuracy\")\n", + "ax.set_ylabel(\"$\\eta$\")\n", + "ax.set_xlabel(\"$\\lambda$\")\n", + "plt.show()\n", + "\n", + "fig, ax = plt.subplots(figsize = (10, 10))\n", + "sns.heatmap(test_accuracy, annot=True, ax=ax, cmap=\"viridis\")\n", + "ax.set_title(\"Test Accuracy\")\n", + "ax.set_ylabel(\"$\\eta$\")\n", + "ax.set_xlabel(\"$\\lambda$\")\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "ba07c374", + "metadata": { + "editable": true + }, + "source": [ + "## Building neural networks in Tensorflow and Keras\n", + "\n", + "Now we want to build on the experience gained from our neural network implementation in NumPy and 
scikit-learn\n", + "and use it to construct a neural network in Tensorflow. Once we have constructed a neural network in NumPy\n", + "and Tensorflow, building one in Keras is really quite trivial, though the performance may suffer. \n", + "\n", + "In our previous example we used only one hidden layer, and in this we will use two. From this it should be quite\n", + "clear how to build one using an arbitrary number of hidden layers, using data structures such as Python lists or\n", + "NumPy arrays." + ] + }, + { + "cell_type": "markdown", + "id": "1cf09819", + "metadata": { + "editable": true + }, + "source": [ + "## Tensorflow\n", + "\n", + "Tensorflow is an open source library machine learning library\n", + "developed by the Google Brain team for internal use. It was released\n", + "under the Apache 2.0 open source license in November 9, 2015.\n", + "\n", + "Tensorflow is a computational framework that allows you to construct\n", + "machine learning models at different levels of abstraction, from\n", + "high-level, object-oriented APIs like Keras, down to the C++ kernels\n", + "that Tensorflow is built upon. The higher levels of abstraction are\n", + "simpler to use, but less flexible, and our choice of implementation\n", + "should reflect the problems we are trying to solve.\n", + "\n", + "[Tensorflow uses](https://www.tensorflow.org/guide/graphs) so-called graphs to represent your computation\n", + "in terms of the dependencies between individual operations, such that you first build a Tensorflow *graph*\n", + "to represent your model, and then create a Tensorflow *session* to run the graph.\n", + "\n", + "In this guide we will analyze the same data as we did in our NumPy and\n", + "scikit-learn tutorial, gathered from the MNIST database of images. 
We\n", + "will give an introduction to the lower level Python Application\n", + "Program Interfaces (APIs), and see how we use them to build our graph.\n", + "Then we will build (effectively) the same graph in Keras, to see just\n", + "how simple solving a machine learning problem can be.\n", + "\n", + "To install tensorflow on Unix/Linux systems, use pip as" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "id": "2c2c3ec5", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "pip3 install tensorflow" + ] + }, + { + "cell_type": "markdown", + "id": "39d013b1", + "metadata": { + "editable": true + }, + "source": [ + "and/or if you use **anaconda**, just write (or install from the graphical user interface)\n", + "(current release of CPU-only TensorFlow)" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "id": "fbf36c26", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "conda create -n tf tensorflow\n", + "conda activate tf" + ] + }, + { + "cell_type": "markdown", + "id": "94e66380", + "metadata": { + "editable": true + }, + "source": [ + "To install the current release of GPU TensorFlow" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "id": "5e72b1d2", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "conda create -n tf-gpu tensorflow-gpu\n", + "conda activate tf-gpu" + ] + }, + { + "cell_type": "markdown", + "id": "40470dbd", + "metadata": { + "editable": true + }, + "source": [ + "## Using Keras\n", + "\n", + "Keras is a high level [neural network](https://en.wikipedia.org/wiki/Application_programming_interface)\n", + "that supports Tensorflow, CTNK and Theano as backends. \n", + "If you have Anaconda installed you may run the following command" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "id": "f2cd4f41", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "conda install keras" + ] + }, + { + "cell_type": "markdown", + "id": "636940c6", + "metadata": { + "editable": true + }, + "source": [ + "You can look up the [instructions here](https://keras.io/) for more information.\n", + "\n", + "We will to a large extent use **keras** in this course." + ] + }, + { + "cell_type": "markdown", + "id": "d9f47b57", + "metadata": { + "editable": true + }, + "source": [ + "## Collect and pre-process data\n", + "\n", + "Let us look again at the MINST data set." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 18, + "id": "1489b5d5", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# import necessary packages\n", + "import numpy as np\n", + "import matplotlib.pyplot as plt\n", + "import tensorflow as tf\n", + "from sklearn import datasets\n", + "\n", + "\n", + "# ensure the same random numbers appear every time\n", + "np.random.seed(0)\n", + "\n", + "# display images in notebook\n", + "%matplotlib inline\n", + "plt.rcParams['figure.figsize'] = (12,12)\n", + "\n", + "\n", + "# download MNIST dataset\n", + "digits = datasets.load_digits()\n", + "\n", + "# define inputs and labels\n", + "inputs = digits.images\n", + "labels = digits.target\n", + "\n", + "print(\"inputs = (n_inputs, pixel_width, pixel_height) = \" + str(inputs.shape))\n", + "print(\"labels = (n_inputs) = \" + str(labels.shape))\n", + "\n", + "\n", + "# flatten the image\n", + "# the value -1 means dimension is inferred from the remaining dimensions: 8x8 = 64\n", + "n_inputs = len(inputs)\n", + "inputs = inputs.reshape(n_inputs, -1)\n", + "print(\"X = (n_inputs, n_features) = \" + str(inputs.shape))\n", + "\n", + "\n", + "# choose some random images to display\n", + "indices = np.arange(n_inputs)\n", + "random_indices = np.random.choice(indices, size=5)\n", + "\n", + "for i, image in enumerate(digits.images[random_indices]):\n", + " plt.subplot(1, 5, i+1)\n", + " plt.axis('off')\n", + " plt.imshow(image, cmap=plt.cm.gray_r, interpolation='nearest')\n", + " plt.title(\"Label: %d\" % digits.target[random_indices[i]])\n", + "plt.show()" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "id": "672dc5a2", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "from tensorflow.keras.layers import Input\n", + "from tensorflow.keras.models import Sequential #This allows appending layers to existing models\n", + "from tensorflow.keras.layers import Dense #This allows defining the characteristics of a particular layer\n", + "from tensorflow.keras import optimizers #This allows using whichever optimiser we want (sgd,adam,RMSprop)\n", + "from tensorflow.keras import regularizers #This allows using whichever regularizer we want (l1,l2,l1_l2)\n", + "from tensorflow.keras.utils import to_categorical #This allows using categorical cross entropy as the cost function\n", + "\n", + "from sklearn.model_selection import train_test_split\n", + "\n", + "# one-hot representation of labels\n", + "labels = to_categorical(labels)\n", + "\n", + "# split into train and test data\n", + "train_size = 0.8\n", + "test_size = 1 - train_size\n", + "X_train, X_test, Y_train, Y_test = train_test_split(inputs, labels, train_size=train_size,\n", + " test_size=test_size)" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "id": "0513084f", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "\n", + "epochs = 100\n", + "batch_size = 100\n", + "n_neurons_layer1 = 100\n", + "n_neurons_layer2 = 50\n", + "n_categories = 10\n", + "eta_vals = np.logspace(-5, 1, 7)\n", + "lmbd_vals = np.logspace(-5, 1, 7)\n", + "def create_neural_network_keras(n_neurons_layer1, n_neurons_layer2, n_categories, eta, lmbd):\n", + " model = Sequential()\n", + " model.add(Dense(n_neurons_layer1, activation='sigmoid', kernel_regularizer=regularizers.l2(lmbd)))\n", + " model.add(Dense(n_neurons_layer2, activation='sigmoid', kernel_regularizer=regularizers.l2(lmbd)))\n", + " 
model.add(Dense(n_categories, activation='softmax'))\n", + " \n", + " sgd = optimizers.SGD(lr=eta)\n", + " model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])\n", + " \n", + " return model" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "id": "02a34777", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "DNN_keras = np.zeros((len(eta_vals), len(lmbd_vals)), dtype=object)\n", + " \n", + "for i, eta in enumerate(eta_vals):\n", + " for j, lmbd in enumerate(lmbd_vals):\n", + " DNN = create_neural_network_keras(n_neurons_layer1, n_neurons_layer2, n_categories,\n", + " eta=eta, lmbd=lmbd)\n", + " DNN.fit(X_train, Y_train, epochs=epochs, batch_size=batch_size, verbose=0)\n", + " scores = DNN.evaluate(X_test, Y_test)\n", + " \n", + " DNN_keras[i][j] = DNN\n", + " \n", + " print(\"Learning rate = \", eta)\n", + " print(\"Lambda = \", lmbd)\n", + " print(\"Test accuracy: %.3f\" % scores[1])\n", + " print()" + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "id": "52c1d6e2", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# optional\n", + "# visual representation of grid search\n", + "# uses seaborn heatmap, could probably do this in matplotlib\n", + "import seaborn as sns\n", + "\n", + "sns.set()\n", + "\n", + "train_accuracy = np.zeros((len(eta_vals), len(lmbd_vals)))\n", + "test_accuracy = np.zeros((len(eta_vals), len(lmbd_vals)))\n", + "\n", + "for i in range(len(eta_vals)):\n", + " for j in range(len(lmbd_vals)):\n", + " DNN = DNN_keras[i][j]\n", + "\n", + " train_accuracy[i][j] = DNN.evaluate(X_train, Y_train)[1]\n", + " test_accuracy[i][j] = DNN.evaluate(X_test, Y_test)[1]\n", + "\n", + " \n", + "fig, ax = plt.subplots(figsize = (10, 10))\n", + "sns.heatmap(train_accuracy, annot=True, ax=ax, cmap=\"viridis\")\n", + "ax.set_title(\"Training Accuracy\")\n", + "ax.set_ylabel(\"$\\eta$\")\n", + "ax.set_xlabel(\"$\\lambda$\")\n", + "plt.show()\n", + "\n", + "fig, ax = plt.subplots(figsize = (10, 10))\n", + "sns.heatmap(test_accuracy, annot=True, ax=ax, cmap=\"viridis\")\n", + "ax.set_title(\"Test Accuracy\")\n", + "ax.set_ylabel(\"$\\eta$\")\n", + "ax.set_xlabel(\"$\\lambda$\")\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "53f9be79", + "metadata": { + "editable": true + }, + "source": [ + "## Building a neural network code\n", + "\n", + "Here we present a flexible object oriented codebase\n", + "for a feed forward neural network, along with a demonstration of how\n", + "to use it. Before we get into the details of the neural network, we\n", + "will first present some implementations of various schedulers, cost\n", + "functions and activation functions that can be used together with the\n", + "neural network.\n", + "\n", + "The codes here were developed by Eric Reber and Gregor Kajda during spring 2023." + ] + }, + { + "cell_type": "markdown", + "id": "39bd1718", + "metadata": { + "editable": true + }, + "source": [ + "### Learning rate methods\n", + "\n", + "The code below shows object oriented implementations of the Constant,\n", + "Momentum, Adagrad, AdagradMomentum, RMS prop and Adam schedulers. All\n", + "of the classes belong to the shared abstract Scheduler class, and\n", + "share the update_change() and reset() methods allowing for any of the\n", + "schedulers to be seamlessly used during the training stage, as will\n", + "later be shown in the fit() method of the neural\n", + "network. 
Update_change() only has one parameter, the gradient\n", + "($δ^l_ja^{l−1}_k$), and returns the change which will be subtracted\n", + "from the weights. The reset() function takes no parameters, and resets\n", + "the desired variables. For Constant and Momentum, reset does nothing." + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "id": "4c1f42f1", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import autograd.numpy as np\n", + "\n", + "class Scheduler:\n", + " \"\"\"\n", + " Abstract class for Schedulers\n", + " \"\"\"\n", + "\n", + " def __init__(self, eta):\n", + " self.eta = eta\n", + "\n", + " # should be overwritten\n", + " def update_change(self, gradient):\n", + " raise NotImplementedError\n", + "\n", + " # overwritten if needed\n", + " def reset(self):\n", + " pass\n", + "\n", + "\n", + "class Constant(Scheduler):\n", + " def __init__(self, eta):\n", + " super().__init__(eta)\n", + "\n", + " def update_change(self, gradient):\n", + " return self.eta * gradient\n", + " \n", + " def reset(self):\n", + " pass\n", + "\n", + "\n", + "class Momentum(Scheduler):\n", + " def __init__(self, eta: float, momentum: float):\n", + " super().__init__(eta)\n", + " self.momentum = momentum\n", + " self.change = 0\n", + "\n", + " def update_change(self, gradient):\n", + " self.change = self.momentum * self.change + self.eta * gradient\n", + " return self.change\n", + "\n", + " def reset(self):\n", + " pass\n", + "\n", + "\n", + "class Adagrad(Scheduler):\n", + " def __init__(self, eta):\n", + " super().__init__(eta)\n", + " self.G_t = None\n", + "\n", + " def update_change(self, gradient):\n", + " delta = 1e-8 # avoid division ny zero\n", + "\n", + " if self.G_t is None:\n", + " self.G_t = np.zeros((gradient.shape[0], gradient.shape[0]))\n", + "\n", + " self.G_t += gradient @ gradient.T\n", + "\n", + " G_t_inverse = 1 / (\n", + " delta + np.sqrt(np.reshape(np.diagonal(self.G_t), (self.G_t.shape[0], 1)))\n", + " )\n", + " return self.eta * gradient * G_t_inverse\n", + "\n", + " def reset(self):\n", + " self.G_t = None\n", + "\n", + "\n", + "class AdagradMomentum(Scheduler):\n", + " def __init__(self, eta, momentum):\n", + " super().__init__(eta)\n", + " self.G_t = None\n", + " self.momentum = momentum\n", + " self.change = 0\n", + "\n", + " def update_change(self, gradient):\n", + " delta = 1e-8 # avoid division ny zero\n", + "\n", + " if self.G_t is None:\n", + " self.G_t = np.zeros((gradient.shape[0], gradient.shape[0]))\n", + "\n", + " self.G_t += gradient @ gradient.T\n", + "\n", + " G_t_inverse = 1 / (\n", + " delta + np.sqrt(np.reshape(np.diagonal(self.G_t), (self.G_t.shape[0], 1)))\n", + " )\n", + " self.change = self.change * self.momentum + self.eta * gradient * G_t_inverse\n", + " return self.change\n", + "\n", + " def reset(self):\n", + " self.G_t = None\n", + "\n", + "\n", + "class RMS_prop(Scheduler):\n", + " def __init__(self, eta, rho):\n", + " super().__init__(eta)\n", + " self.rho = rho\n", + " self.second = 0.0\n", + "\n", + " def update_change(self, gradient):\n", + " delta = 1e-8 # avoid division ny zero\n", + " self.second = self.rho * self.second + (1 - self.rho) * gradient * gradient\n", + " return self.eta * gradient / (np.sqrt(self.second + delta))\n", + "\n", + " def reset(self):\n", + " self.second = 0.0\n", + "\n", + "\n", + "class Adam(Scheduler):\n", + " def __init__(self, eta, rho, rho2):\n", + " super().__init__(eta)\n", + " self.rho = rho\n", + " self.rho2 = rho2\n", + " self.moment = 0\n", + " 
self.second = 0\n", + " self.n_epochs = 1\n", + "\n", + " def update_change(self, gradient):\n", + " delta = 1e-8 # avoid division ny zero\n", + "\n", + " self.moment = self.rho * self.moment + (1 - self.rho) * gradient\n", + " self.second = self.rho2 * self.second + (1 - self.rho2) * gradient * gradient\n", + "\n", + " moment_corrected = self.moment / (1 - self.rho**self.n_epochs)\n", + " second_corrected = self.second / (1 - self.rho2**self.n_epochs)\n", + "\n", + " return self.eta * moment_corrected / (np.sqrt(second_corrected + delta))\n", + "\n", + " def reset(self):\n", + " self.n_epochs += 1\n", + " self.moment = 0\n", + " self.second = 0" + ] + }, + { + "cell_type": "markdown", + "id": "532aecc2", + "metadata": { + "editable": true + }, + "source": [ + "### Usage of the above learning rate schedulers\n", + "\n", + "To initalize a scheduler, simply create the object and pass in the\n", + "necessary parameters such as the learning rate and the momentum as\n", + "shown below. As the Scheduler class is an abstract class it should not\n", + "called directly, and will raise an error upon usage." + ] + }, + { + "cell_type": "code", + "execution_count": 24, + "id": "b24b4414", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "momentum_scheduler = Momentum(eta=1e-3, momentum=0.9)\n", + "adam_scheduler = Adam(eta=1e-3, rho=0.9, rho2=0.999)" + ] + }, + { + "cell_type": "markdown", + "id": "32a25c0b", + "metadata": { + "editable": true + }, + "source": [ + "Here is a small example for how a segment of code using schedulers\n", + "could look. Switching out the schedulers is simple." + ] + }, + { + "cell_type": "code", + "execution_count": 25, + "id": "7a7d273f", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "weights = np.ones((3,3))\n", + "print(f\"Before scheduler:\\n{weights=}\")\n", + "\n", + "epochs = 10\n", + "for e in range(epochs):\n", + " gradient = np.random.rand(3, 3)\n", + " change = adam_scheduler.update_change(gradient)\n", + " weights = weights - change\n", + " adam_scheduler.reset()\n", + "\n", + "print(f\"\\nAfter scheduler:\\n{weights=}\")" + ] + }, + { + "cell_type": "markdown", + "id": "d34cd45c", + "metadata": { + "editable": true + }, + "source": [ + "### Cost functions\n", + "\n", + "Here we discuss cost functions that can be used when creating the\n", + "neural network. Every cost function takes the target vector as its\n", + "parameter, and returns a function valued only at $x$ such that it may\n", + "easily be differentiated." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 26, + "id": "9ad6425d", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import autograd.numpy as np\n", + "\n", + "def CostOLS(target):\n", + " \n", + " def func(X):\n", + " return (1.0 / target.shape[0]) * np.sum((target - X) ** 2)\n", + "\n", + " return func\n", + "\n", + "\n", + "def CostLogReg(target):\n", + "\n", + " def func(X):\n", + " \n", + " return -(1.0 / target.shape[0]) * np.sum(\n", + " (target * np.log(X + 10e-10)) + ((1 - target) * np.log(1 - X + 10e-10))\n", + " )\n", + "\n", + " return func\n", + "\n", + "\n", + "def CostCrossEntropy(target):\n", + " \n", + " def func(X):\n", + " return -(1.0 / target.size) * np.sum(target * np.log(X + 10e-10))\n", + "\n", + " return func" + ] + }, + { + "cell_type": "markdown", + "id": "baaaff79", + "metadata": { + "editable": true + }, + "source": [ + "Below we give a short example of how these cost function may be used\n", + "to obtain results if you wish to test them out on your own using\n", + "AutoGrad's automatics differentiation." + ] + }, + { + "cell_type": "code", + "execution_count": 27, + "id": "78f11b83", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "from autograd import grad\n", + "\n", + "target = np.array([[1, 2, 3]]).T\n", + "a = np.array([[4, 5, 6]]).T\n", + "\n", + "cost_func = CostCrossEntropy\n", + "cost_func_derivative = grad(cost_func(target))\n", + "\n", + "valued_at_a = cost_func_derivative(a)\n", + "print(f\"Derivative of cost function {cost_func.__name__} valued at a:\\n{valued_at_a}\")" + ] + }, + { + "cell_type": "markdown", + "id": "05285af5", + "metadata": { + "editable": true + }, + "source": [ + "### Activation functions\n", + "\n", + "Finally, before we look at the neural network, we will look at the\n", + "activation functions which can be specified between the hidden layers\n", + "and as the output function. Each function can be valued for any given\n", + "vector or matrix X, and can be differentiated via derivate()." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 28, + "id": "7ac52c84", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import autograd.numpy as np\n", + "from autograd import elementwise_grad\n", + "\n", + "def identity(X):\n", + " return X\n", + "\n", + "\n", + "def sigmoid(X):\n", + " try:\n", + " return 1.0 / (1 + np.exp(-X))\n", + " except FloatingPointError:\n", + " return np.where(X > np.zeros(X.shape), np.ones(X.shape), np.zeros(X.shape))\n", + "\n", + "\n", + "def softmax(X):\n", + " X = X - np.max(X, axis=-1, keepdims=True)\n", + " delta = 10e-10\n", + " return np.exp(X) / (np.sum(np.exp(X), axis=-1, keepdims=True) + delta)\n", + "\n", + "\n", + "def RELU(X):\n", + " return np.where(X > np.zeros(X.shape), X, np.zeros(X.shape))\n", + "\n", + "\n", + "def LRELU(X):\n", + " delta = 10e-4\n", + " return np.where(X > np.zeros(X.shape), X, delta * X)\n", + "\n", + "\n", + "def derivate(func):\n", + " if func.__name__ == \"RELU\":\n", + "\n", + " def func(X):\n", + " return np.where(X > 0, 1, 0)\n", + "\n", + " return func\n", + "\n", + " elif func.__name__ == \"LRELU\":\n", + "\n", + " def func(X):\n", + " delta = 10e-4\n", + " return np.where(X > 0, 1, delta)\n", + "\n", + " return func\n", + "\n", + " else:\n", + " return elementwise_grad(func)" + ] + }, + { + "cell_type": "markdown", + "id": "873e7caa", + "metadata": { + "editable": true + }, + "source": [ + "Below follows a short demonstration of how to use an activation\n", + "function. The derivative of the activation function will be important\n", + "when calculating the output delta term during backpropagation. Note\n", + "that derivate() can also be used for cost functions for a more\n", + "generalized approach." + ] + }, + { + "cell_type": "code", + "execution_count": 29, + "id": "bd43ac18", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "z = np.array([[4, 5, 6]]).T\n", + "print(f\"Input to activation function:\\n{z}\")\n", + "\n", + "act_func = sigmoid\n", + "a = act_func(z)\n", + "print(f\"\\nOutput from {act_func.__name__} activation function:\\n{a}\")\n", + "\n", + "act_func_derivative = derivate(act_func)\n", + "valued_at_z = act_func_derivative(a)\n", + "print(f\"\\nDerivative of {act_func.__name__} activation function valued at z:\\n{valued_at_z}\")" + ] + }, + { + "cell_type": "markdown", + "id": "3dc2175e", + "metadata": { + "editable": true + }, + "source": [ + "### The Neural Network\n", + "\n", + "Now that we have gotten a good understanding of the implementation of\n", + "some important components, we can take a look at an object oriented\n", + "implementation of a feed forward neural network. The feed forward\n", + "neural network has been implemented as a class named FFNN, which can\n", + "be initiated as a regressor or classifier dependant on the choice of\n", + "cost function. The FFNN can have any number of input nodes, hidden\n", + "layers with any amount of hidden nodes, and any amount of output nodes\n", + "meaning it can perform multiclass classification as well as binary\n", + "classification and regression problems. Although there is a lot of\n", + "code present, it makes for an easy to use and generalizeable interface\n", + "for creating many types of neural networks as will be demonstrated\n", + "below." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 30, + "id": "5b4b161c", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import math\n", + "import autograd.numpy as np\n", + "import sys\n", + "import warnings\n", + "from autograd import grad, elementwise_grad\n", + "from random import random, seed\n", + "from copy import deepcopy, copy\n", + "from typing import Tuple, Callable\n", + "from sklearn.utils import resample\n", + "\n", + "warnings.simplefilter(\"error\")\n", + "\n", + "\n", + "class FFNN:\n", + " \"\"\"\n", + " Description:\n", + " ------------\n", + " Feed Forward Neural Network with interface enabling flexible design of a\n", + " nerual networks architecture and the specification of activation function\n", + " in the hidden layers and output layer respectively. This model can be used\n", + " for both regression and classification problems, depending on the output function.\n", + "\n", + " Attributes:\n", + " ------------\n", + " I dimensions (tuple[int]): A list of positive integers, which specifies the\n", + " number of nodes in each of the networks layers. The first integer in the array\n", + " defines the number of nodes in the input layer, the second integer defines number\n", + " of nodes in the first hidden layer and so on until the last number, which\n", + " specifies the number of nodes in the output layer.\n", + " II hidden_func (Callable): The activation function for the hidden layers\n", + " III output_func (Callable): The activation function for the output layer\n", + " IV cost_func (Callable): Our cost function\n", + " V seed (int): Sets random seed, makes results reproducible\n", + " \"\"\"\n", + "\n", + " def __init__(\n", + " self,\n", + " dimensions: tuple[int],\n", + " hidden_func: Callable = sigmoid,\n", + " output_func: Callable = lambda x: x,\n", + " cost_func: Callable = CostOLS,\n", + " seed: int = None,\n", + " ):\n", + " self.dimensions = dimensions\n", + " self.hidden_func = hidden_func\n", + " self.output_func = output_func\n", + " self.cost_func = cost_func\n", + " self.seed = seed\n", + " self.weights = list()\n", + " self.schedulers_weight = list()\n", + " self.schedulers_bias = list()\n", + " self.a_matrices = list()\n", + " self.z_matrices = list()\n", + " self.classification = None\n", + "\n", + " self.reset_weights()\n", + " self._set_classification()\n", + "\n", + " def fit(\n", + " self,\n", + " X: np.ndarray,\n", + " t: np.ndarray,\n", + " scheduler: Scheduler,\n", + " batches: int = 1,\n", + " epochs: int = 100,\n", + " lam: float = 0,\n", + " X_val: np.ndarray = None,\n", + " t_val: np.ndarray = None,\n", + " ):\n", + " \"\"\"\n", + " Description:\n", + " ------------\n", + " This function performs the training the neural network by performing the feedforward and backpropagation\n", + " algorithm to update the networks weights.\n", + "\n", + " Parameters:\n", + " ------------\n", + " I X (np.ndarray) : training data\n", + " II t (np.ndarray) : target data\n", + " III scheduler (Scheduler) : specified scheduler (algorithm for optimization of gradient descent)\n", + " IV scheduler_args (list[int]) : list of all arguments necessary for scheduler\n", + "\n", + " Optional Parameters:\n", + " ------------\n", + " V batches (int) : number of batches the datasets are split into, default equal to 1\n", + " VI epochs (int) : number of iterations used to train the network, default equal to 100\n", + " VII lam (float) : regularization hyperparameter lambda\n", + " VIII X_val (np.ndarray) : 
validation set\n", + " IX t_val (np.ndarray) : validation target set\n", + "\n", + " Returns:\n", + " ------------\n", + " I scores (dict) : A dictionary containing the performance metrics of the model.\n", + " The number of the metrics depends on the parameters passed to the fit-function.\n", + "\n", + " \"\"\"\n", + "\n", + " # setup \n", + " if self.seed is not None:\n", + " np.random.seed(self.seed)\n", + "\n", + " val_set = False\n", + " if X_val is not None and t_val is not None:\n", + " val_set = True\n", + "\n", + " # creating arrays for score metrics\n", + " train_errors = np.empty(epochs)\n", + " train_errors.fill(np.nan)\n", + " val_errors = np.empty(epochs)\n", + " val_errors.fill(np.nan)\n", + "\n", + " train_accs = np.empty(epochs)\n", + " train_accs.fill(np.nan)\n", + " val_accs = np.empty(epochs)\n", + " val_accs.fill(np.nan)\n", + "\n", + " self.schedulers_weight = list()\n", + " self.schedulers_bias = list()\n", + "\n", + " batch_size = X.shape[0] // batches\n", + "\n", + " X, t = resample(X, t)\n", + "\n", + " # this function returns a function valued only at X\n", + " cost_function_train = self.cost_func(t)\n", + " if val_set:\n", + " cost_function_val = self.cost_func(t_val)\n", + "\n", + " # create schedulers for each weight matrix\n", + " for i in range(len(self.weights)):\n", + " self.schedulers_weight.append(copy(scheduler))\n", + " self.schedulers_bias.append(copy(scheduler))\n", + "\n", + " print(f\"{scheduler.__class__.__name__}: Eta={scheduler.eta}, Lambda={lam}\")\n", + "\n", + " try:\n", + " for e in range(epochs):\n", + " for i in range(batches):\n", + " # allows for minibatch gradient descent\n", + " if i == batches - 1:\n", + " # If the for loop has reached the last batch, take all thats left\n", + " X_batch = X[i * batch_size :, :]\n", + " t_batch = t[i * batch_size :, :]\n", + " else:\n", + " X_batch = X[i * batch_size : (i + 1) * batch_size, :]\n", + " t_batch = t[i * batch_size : (i + 1) * batch_size, :]\n", + "\n", + " self._feedforward(X_batch)\n", + " self._backpropagate(X_batch, t_batch, lam)\n", + "\n", + " # reset schedulers for each epoch (some schedulers pass in this call)\n", + " for scheduler in self.schedulers_weight:\n", + " scheduler.reset()\n", + "\n", + " for scheduler in self.schedulers_bias:\n", + " scheduler.reset()\n", + "\n", + " # computing performance metrics\n", + " pred_train = self.predict(X)\n", + " train_error = cost_function_train(pred_train)\n", + "\n", + " train_errors[e] = train_error\n", + " if val_set:\n", + " \n", + " pred_val = self.predict(X_val)\n", + " val_error = cost_function_val(pred_val)\n", + " val_errors[e] = val_error\n", + "\n", + " if self.classification:\n", + " train_acc = self._accuracy(self.predict(X), t)\n", + " train_accs[e] = train_acc\n", + " if val_set:\n", + " val_acc = self._accuracy(pred_val, t_val)\n", + " val_accs[e] = val_acc\n", + "\n", + " # printing progress bar\n", + " progression = e / epochs\n", + " print_length = self._progress_bar(\n", + " progression,\n", + " train_error=train_errors[e],\n", + " train_acc=train_accs[e],\n", + " val_error=val_errors[e],\n", + " val_acc=val_accs[e],\n", + " )\n", + " except KeyboardInterrupt:\n", + " # allows for stopping training at any point and seeing the result\n", + " pass\n", + "\n", + " # visualization of training progression (similiar to tensorflow progression bar)\n", + " sys.stdout.write(\"\\r\" + \" \" * print_length)\n", + " sys.stdout.flush()\n", + " self._progress_bar(\n", + " 1,\n", + " train_error=train_errors[e],\n", + " 
train_acc=train_accs[e],\n", + " val_error=val_errors[e],\n", + " val_acc=val_accs[e],\n", + " )\n", + " sys.stdout.write(\"\")\n", + "\n", + " # return performance metrics for the entire run\n", + " scores = dict()\n", + "\n", + " scores[\"train_errors\"] = train_errors\n", + "\n", + " if val_set:\n", + " scores[\"val_errors\"] = val_errors\n", + "\n", + " if self.classification:\n", + " scores[\"train_accs\"] = train_accs\n", + "\n", + " if val_set:\n", + " scores[\"val_accs\"] = val_accs\n", + "\n", + " return scores\n", + "\n", + " def predict(self, X: np.ndarray, *, threshold=0.5):\n", + " \"\"\"\n", + " Description:\n", + " ------------\n", + " Performs prediction after training of the network has been finished.\n", + "\n", + " Parameters:\n", + " ------------\n", + " I X (np.ndarray): The design matrix, with n rows of p features each\n", + "\n", + " Optional Parameters:\n", + " ------------\n", + " II threshold (float) : sets minimal value for a prediction to be predicted as the positive class\n", + " in classification problems\n", + "\n", + " Returns:\n", + " ------------\n", + " I z (np.ndarray): A prediction vector (row) for each row in our design matrix\n", + " This vector is thresholded if regression=False, meaning that classification results\n", + " in a vector of 1s and 0s, while regressions in an array of decimal numbers\n", + "\n", + " \"\"\"\n", + "\n", + " predict = self._feedforward(X)\n", + "\n", + " if self.classification:\n", + " return np.where(predict > threshold, 1, 0)\n", + " else:\n", + " return predict\n", + "\n", + " def reset_weights(self):\n", + " \"\"\"\n", + " Description:\n", + " ------------\n", + " Resets/Reinitializes the weights in order to train the network for a new problem.\n", + "\n", + " \"\"\"\n", + " if self.seed is not None:\n", + " np.random.seed(self.seed)\n", + "\n", + " self.weights = list()\n", + " for i in range(len(self.dimensions) - 1):\n", + " weight_array = np.random.randn(\n", + " self.dimensions[i] + 1, self.dimensions[i + 1]\n", + " )\n", + " weight_array[0, :] = np.random.randn(self.dimensions[i + 1]) * 0.01\n", + "\n", + " self.weights.append(weight_array)\n", + "\n", + " def _feedforward(self, X: np.ndarray):\n", + " \"\"\"\n", + " Description:\n", + " ------------\n", + " Calculates the activation of each layer starting at the input and ending at the output.\n", + " Each following activation is calculated from a weighted sum of each of the preceeding\n", + " activations (except in the case of the input layer).\n", + "\n", + " Parameters:\n", + " ------------\n", + " I X (np.ndarray): The design matrix, with n rows of p features each\n", + "\n", + " Returns:\n", + " ------------\n", + " I z (np.ndarray): A prediction vector (row) for each row in our design matrix\n", + " \"\"\"\n", + "\n", + " # reset matrices\n", + " self.a_matrices = list()\n", + " self.z_matrices = list()\n", + "\n", + " # if X is just a vector, make it into a matrix\n", + " if len(X.shape) == 1:\n", + " X = X.reshape((1, X.shape[0]))\n", + "\n", + " # Add a coloumn of zeros as the first coloumn of the design matrix, in order\n", + " # to add bias to our data\n", + " bias = np.ones((X.shape[0], 1)) * 0.01\n", + " X = np.hstack([bias, X])\n", + "\n", + " # a^0, the nodes in the input layer (one a^0 for each row in X - where the\n", + " # exponent indicates layer number).\n", + " a = X\n", + " self.a_matrices.append(a)\n", + " self.z_matrices.append(a)\n", + "\n", + " # The feed forward algorithm\n", + " for i in range(len(self.weights)):\n", + " if i < 
len(self.weights) - 1:\n", + " z = a @ self.weights[i]\n", + " self.z_matrices.append(z)\n", + " a = self.hidden_func(z)\n", + " # bias column again added to the data here\n", + " bias = np.ones((a.shape[0], 1)) * 0.01\n", + " a = np.hstack([bias, a])\n", + " self.a_matrices.append(a)\n", + " else:\n", + " try:\n", + " # a^L, the nodes in our output layers\n", + " z = a @ self.weights[i]\n", + " a = self.output_func(z)\n", + " self.a_matrices.append(a)\n", + " self.z_matrices.append(z)\n", + " except Exception as OverflowError:\n", + " print(\n", + " \"OverflowError in fit() in FFNN\\nHOW TO DEBUG ERROR: Consider lowering your learning rate or scheduler specific parameters such as momentum, or check if your input values need scaling\"\n", + " )\n", + "\n", + " # this will be a^L\n", + " return a\n", + "\n", + " def _backpropagate(self, X, t, lam):\n", + " \"\"\"\n", + " Description:\n", + " ------------\n", + " Performs the backpropagation algorithm. In other words, this method\n", + " calculates the gradient of all the layers starting at the\n", + " output layer, and moving from right to left accumulates the gradient until\n", + " the input layer is reached. Each layers respective weights are updated while\n", + " the algorithm propagates backwards from the output layer (auto-differentation in reverse mode).\n", + "\n", + " Parameters:\n", + " ------------\n", + " I X (np.ndarray): The design matrix, with n rows of p features each.\n", + " II t (np.ndarray): The target vector, with n rows of p targets.\n", + " III lam (float32): regularization parameter used to punish the weights in case of overfitting\n", + "\n", + " Returns:\n", + " ------------\n", + " No return value.\n", + "\n", + " \"\"\"\n", + " out_derivative = derivate(self.output_func)\n", + " hidden_derivative = derivate(self.hidden_func)\n", + "\n", + " for i in range(len(self.weights) - 1, -1, -1):\n", + " # delta terms for output\n", + " if i == len(self.weights) - 1:\n", + " # for multi-class classification\n", + " if (\n", + " self.output_func.__name__ == \"softmax\"\n", + " ):\n", + " delta_matrix = self.a_matrices[i + 1] - t\n", + " # for single class classification\n", + " else:\n", + " cost_func_derivative = grad(self.cost_func(t))\n", + " delta_matrix = out_derivative(\n", + " self.z_matrices[i + 1]\n", + " ) * cost_func_derivative(self.a_matrices[i + 1])\n", + "\n", + " # delta terms for hidden layer\n", + " else:\n", + " delta_matrix = (\n", + " self.weights[i + 1][1:, :] @ delta_matrix.T\n", + " ).T * hidden_derivative(self.z_matrices[i + 1])\n", + "\n", + " # calculate gradient\n", + " gradient_weights = self.a_matrices[i][:, 1:].T @ delta_matrix\n", + " gradient_bias = np.sum(delta_matrix, axis=0).reshape(\n", + " 1, delta_matrix.shape[1]\n", + " )\n", + "\n", + " # regularization term\n", + " gradient_weights += self.weights[i][1:, :] * lam\n", + "\n", + " # use scheduler\n", + " update_matrix = np.vstack(\n", + " [\n", + " self.schedulers_bias[i].update_change(gradient_bias),\n", + " self.schedulers_weight[i].update_change(gradient_weights),\n", + " ]\n", + " )\n", + "\n", + " # update weights and bias\n", + " self.weights[i] -= update_matrix\n", + "\n", + " def _accuracy(self, prediction: np.ndarray, target: np.ndarray):\n", + " \"\"\"\n", + " Description:\n", + " ------------\n", + " Calculates accuracy of given prediction to target\n", + "\n", + " Parameters:\n", + " ------------\n", + " I prediction (np.ndarray): vector of predicitons output network\n", + " (1s and 0s in case of classification, and real 
numbers in case of regression)\n", + " II target (np.ndarray): vector of true values (What the network ideally should predict)\n", + "\n", + " Returns:\n", + " ------------\n", + " A floating point number representing the percentage of correctly classified instances.\n", + " \"\"\"\n", + " assert prediction.size == target.size\n", + " return np.average((target == prediction))\n", + " def _set_classification(self):\n", + " \"\"\"\n", + " Description:\n", + " ------------\n", + " Decides if FFNN acts as classifier (True) og regressor (False),\n", + " sets self.classification during init()\n", + " \"\"\"\n", + " self.classification = False\n", + " if (\n", + " self.cost_func.__name__ == \"CostLogReg\"\n", + " or self.cost_func.__name__ == \"CostCrossEntropy\"\n", + " ):\n", + " self.classification = True\n", + "\n", + " def _progress_bar(self, progression, **kwargs):\n", + " \"\"\"\n", + " Description:\n", + " ------------\n", + " Displays progress of training\n", + " \"\"\"\n", + " print_length = 40\n", + " num_equals = int(progression * print_length)\n", + " num_not = print_length - num_equals\n", + " arrow = \">\" if num_equals > 0 else \"\"\n", + " bar = \"[\" + \"=\" * (num_equals - 1) + arrow + \"-\" * num_not + \"]\"\n", + " perc_print = self._format(progression * 100, decimals=5)\n", + " line = f\" {bar} {perc_print}% \"\n", + "\n", + " for key in kwargs:\n", + " if not np.isnan(kwargs[key]):\n", + " value = self._format(kwargs[key], decimals=4)\n", + " line += f\"| {key}: {value} \"\n", + " sys.stdout.write(\"\\r\" + line)\n", + " sys.stdout.flush()\n", + " return len(line)\n", + "\n", + " def _format(self, value, decimals=4):\n", + " \"\"\"\n", + " Description:\n", + " ------------\n", + " Formats decimal numbers for progress bar\n", + " \"\"\"\n", + " if value > 0:\n", + " v = value\n", + " elif value < 0:\n", + " v = -10 * value\n", + " else:\n", + " v = 1\n", + " n = 1 + math.floor(math.log10(v))\n", + " if n >= decimals - 1:\n", + " return str(round(value))\n", + " return f\"{value:.{decimals-n-1}f}\"" + ] + }, + { + "cell_type": "markdown", + "id": "9596ae53", + "metadata": { + "editable": true + }, + "source": [ + "Before we make a model, we will quickly generate a dataset we can use\n", + "for our linear regression problem as shown below" + ] + }, + { + "cell_type": "code", + "execution_count": 31, + "id": "a11f680f", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import autograd.numpy as np\n", + "from sklearn.model_selection import train_test_split\n", + "\n", + "def SkrankeFunction(x, y):\n", + " return np.ravel(0 + 1*x + 2*y + 3*x**2 + 4*x*y + 5*y**2)\n", + "\n", + "def create_X(x, y, n):\n", + " if len(x.shape) > 1:\n", + " x = np.ravel(x)\n", + " y = np.ravel(y)\n", + "\n", + " N = len(x)\n", + " l = int((n + 1) * (n + 2) / 2) # Number of elements in beta\n", + " X = np.ones((N, l))\n", + "\n", + " for i in range(1, n + 1):\n", + " q = int((i) * (i + 1) / 2)\n", + " for k in range(i + 1):\n", + " X[:, q + k] = (x ** (i - k)) * (y**k)\n", + "\n", + " return X\n", + "\n", + "step=0.5\n", + "x = np.arange(0, 1, step)\n", + "y = np.arange(0, 1, step)\n", + "x, y = np.meshgrid(x, y)\n", + "target = SkrankeFunction(x, y)\n", + "target = target.reshape(target.shape[0], 1)\n", + "\n", + "poly_degree=3\n", + "X = create_X(x, y, poly_degree)\n", + "\n", + "X_train, X_test, t_train, t_test = train_test_split(X, target)" + ] + }, + { + "cell_type": "markdown", + "id": "0fc39e40", + "metadata": { + "editable": true + }, + "source": [ 
+ "Now that we have our dataset ready for the regression, we can create\n", + "our regressor. Note that with the seed parameter, we can make sure our\n", + "results stay the same every time we run the neural network. For\n", + "inititialization, we simply specify the dimensions (we wish the amount\n", + "of input nodes to be equal to the datapoints, and the output to\n", + "predict one value)." + ] + }, + { + "cell_type": "code", + "execution_count": 32, + "id": "a67ab3a0", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "input_nodes = X_train.shape[1]\n", + "output_nodes = 1\n", + "\n", + "linear_regression = FFNN((input_nodes, output_nodes), output_func=identity, cost_func=CostOLS, seed=2023)" + ] + }, + { + "cell_type": "markdown", + "id": "3add8665", + "metadata": { + "editable": true + }, + "source": [ + "We then fit our model with our training data using the scheduler of our choice." + ] + }, + { + "cell_type": "code", + "execution_count": 33, + "id": "4a4fbc7a", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "linear_regression.reset_weights() # reset weights such that previous runs or reruns don't affect the weights\n", + "\n", + "scheduler = Constant(eta=1e-3)\n", + "scores = linear_regression.fit(X_train, t_train, scheduler)" + ] + }, + { + "cell_type": "markdown", + "id": "4dff1871", + "metadata": { + "editable": true + }, + "source": [ + "Due to the progress bar we can see the MSE (train_error) throughout\n", + "the FFNN's training. Note that the fit() function has some optional\n", + "parameters with defualt arguments. For example, the regularization\n", + "hyperparameter can be left ignored if not needed, and equally the FFNN\n", + "will by default run for 100 epochs. These can easily be changed, such\n", + "as for example:" + ] + }, + { + "cell_type": "code", + "execution_count": 34, + "id": "ad40e38c", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "linear_regression.reset_weights() # reset weights such that previous runs or reruns don't affect the weights\n", + "\n", + "scores = linear_regression.fit(X_train, t_train, scheduler, lam=1e-4, epochs=1000)" + ] + }, + { + "cell_type": "markdown", + "id": "43cd1e22", + "metadata": { + "editable": true + }, + "source": [ + "We see that given more epochs to train on, the regressor reaches a lower MSE.\n", + "\n", + "Let us then switch to a binary classification. We use a binary\n", + "classification dataset, and follow a similar setup to the regression\n", + "case." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 35, + "id": "cde36b38", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "from sklearn.datasets import load_breast_cancer\n", + "from sklearn.preprocessing import MinMaxScaler\n", + "\n", + "wisconsin = load_breast_cancer()\n", + "X = wisconsin.data\n", + "target = wisconsin.target\n", + "target = target.reshape(target.shape[0], 1)\n", + "\n", + "X_train, X_val, t_train, t_val = train_test_split(X, target)\n", + "\n", + "scaler = MinMaxScaler()\n", + "scaler.fit(X_train)\n", + "X_train = scaler.transform(X_train)\n", + "X_val = scaler.transform(X_val)" + ] + }, + { + "cell_type": "code", + "execution_count": 36, + "id": "2bc572a4", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "input_nodes = X_train.shape[1]\n", + "output_nodes = 1\n", + "\n", + "logistic_regression = FFNN((input_nodes, output_nodes), output_func=sigmoid, cost_func=CostLogReg, seed=2023)" + ] + }, + { + "cell_type": "markdown", + "id": "e3e6fa31", + "metadata": { + "editable": true + }, + "source": [ + "We will now make use of our validation data by passing it into our fit function as a keyword argument" + ] + }, + { + "cell_type": "code", + "execution_count": 37, + "id": "575ceb29", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "logistic_regression.reset_weights() # reset weights such that previous runs or reruns don't affect the weights\n", + "\n", + "scheduler = Adam(eta=1e-3, rho=0.9, rho2=0.999)\n", + "scores = logistic_regression.fit(X_train, t_train, scheduler, epochs=1000, X_val=X_val, t_val=t_val)" + ] + }, + { + "cell_type": "markdown", + "id": "622015f0", + "metadata": { + "editable": true + }, + "source": [ + "Finally, we will create a neural network with 2 hidden layers with activation functions." + ] + }, + { + "cell_type": "code", + "execution_count": 38, + "id": "9c075b36", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "input_nodes = X_train.shape[1]\n", + "hidden_nodes1 = 100\n", + "hidden_nodes2 = 30\n", + "output_nodes = 1\n", + "\n", + "dims = (input_nodes, hidden_nodes1, hidden_nodes2, output_nodes)\n", + "\n", + "neural_network = FFNN(dims, hidden_func=RELU, output_func=sigmoid, cost_func=CostLogReg, seed=2023)" + ] + }, + { + "cell_type": "code", + "execution_count": 39, + "id": "44ded771", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "neural_network.reset_weights() # reset weights such that previous runs or reruns don't affect the weights\n", + "\n", + "scheduler = Adam(eta=1e-4, rho=0.9, rho2=0.999)\n", + "scores = neural_network.fit(X_train, t_train, scheduler, epochs=1000, X_val=X_val, t_val=t_val)" + ] + }, + { + "cell_type": "markdown", + "id": "317e6e5c", + "metadata": { + "editable": true + }, + "source": [ + "### Multiclass classification\n", + "\n", + "Finally, we will demonstrate the use case of multiclass classification\n", + "using our FFNN with the famous MNIST dataset, which contain images of\n", + "digits between the range of 0 to 9." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 40, + "id": "8911de9d", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "from sklearn.datasets import load_digits\n", + "\n", + "def onehot(target: np.ndarray):\n", + " onehot = np.zeros((target.size, target.max() + 1))\n", + " onehot[np.arange(target.size), target] = 1\n", + " return onehot\n", + "\n", + "digits = load_digits()\n", + "\n", + "X = digits.data\n", + "target = digits.target\n", + "target = onehot(target)\n", + "\n", + "input_nodes = 64\n", + "hidden_nodes1 = 100\n", + "hidden_nodes2 = 30\n", + "output_nodes = 10\n", + "\n", + "dims = (input_nodes, hidden_nodes1, hidden_nodes2, output_nodes)\n", + "\n", + "multiclass = FFNN(dims, hidden_func=LRELU, output_func=softmax, cost_func=CostCrossEntropy)\n", + "\n", + "multiclass.reset_weights() # reset weights such that previous runs or reruns don't affect the weights\n", + "\n", + "scheduler = Adam(eta=1e-4, rho=0.9, rho2=0.999)\n", + "scores = multiclass.fit(X, target, scheduler, epochs=1000)" + ] + }, + { + "cell_type": "markdown", + "id": "82d61377", + "metadata": { + "editable": true + }, + "source": [ + "## Testing the XOR gate and other gates\n", + "\n", + "Let us now use our code to test the XOR gate." + ] + }, + { + "cell_type": "code", + "execution_count": 41, + "id": "2a72a374", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "X = np.array([ [0, 0], [0, 1], [1, 0],[1, 1]],dtype=np.float64)\n", + "\n", + "# The XOR gate\n", + "yXOR = np.array( [[ 0], [1] ,[1], [0]])\n", + "\n", + "input_nodes = X.shape[1]\n", + "output_nodes = 1\n", + "\n", + "logistic_regression = FFNN((input_nodes, output_nodes), output_func=sigmoid, cost_func=CostLogReg, seed=2023)\n", + "logistic_regression.reset_weights() # reset weights such that previous runs or reruns don't affect the weights\n", + "scheduler = Adam(eta=1e-1, rho=0.9, rho2=0.999)\n", + "scores = logistic_regression.fit(X, yXOR, scheduler, epochs=1000)" + ] + }, + { + "cell_type": "markdown", + "id": "2d892009", + "metadata": { + "editable": true + }, + "source": [ + "Not bad, but the results depend strongly on the learning reate. Try different learning rates." + ] + } + ], + "metadata": {}, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/doc/LectureNotes/week43.ipynb b/doc/LectureNotes/week43.ipynb new file mode 100644 index 000000000..b190102b6 --- /dev/null +++ b/doc/LectureNotes/week43.ipynb @@ -0,0 +1,5950 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "5e07edf2", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "44b465a0", + "metadata": { + "editable": true + }, + "source": [ + "# Week 43: Deep Learning: Constructing a Neural Network code and solving differential equations\n", + "**Morten Hjorth-Jensen**, Department of Physics, University of Oslo, Norway\n", + "\n", + "Date: **October 20, 2025**" + ] + }, + { + "cell_type": "markdown", + "id": "9d7bd8c9", + "metadata": { + "editable": true + }, + "source": [ + "## Plans for week 43\n", + "\n", + "**Material for the lecture on Monday October 20, 2025.**\n", + "\n", + "1. Reminder from last week, see also lecture notes from week 42 at as well as those from week 41, see see . \n", + "\n", + "2. Building our own Feed-forward Neural Network.\n", + "\n", + "3. Coding examples using Tensorflow/Keras and Pytorch examples. The Pytorch examples are adapted from Rashcka's text, see chapters 11-13.. 
\n", + "\n", + "4. Start discussions on how to use neural networks for solving differential equations (ordinary and partial ones). This topic continues next week as well.\n", + "\n", + "5. Video of lecture at \n", + "\n", + "6. Whiteboard notes at " + ] + }, + { + "cell_type": "markdown", + "id": "c50cff0f", + "metadata": { + "editable": true + }, + "source": [ + "## Exercises and lab session week 43\n", + "**Lab sessions on Tuesday and Wednesday.**\n", + "\n", + "1. Work on writing your own neural network code and discussions of project 2. If you didn't get time to do the exercises from the two last weeks, we recommend doing so as these exercises give you the basic elements of a neural network code.\n", + "\n", + "2. The exercises this week are tailored to the optional part of project 2, and deal with studying ways to display results from classification problems" + ] + }, + { + "cell_type": "markdown", + "id": "fe8d32ed", + "metadata": { + "editable": true + }, + "source": [ + "## Using Automatic differentiation\n", + "\n", + "In our discussions of ordinary differential equations and neural network codes\n", + "we will also study the usage of Autograd, see for example in computing gradients for deep learning. For the documentation of Autograd and examples see the Autograd documentation at and the lecture slides from week 41, see ." + ] + }, + { + "cell_type": "markdown", + "id": "99999ab4", + "metadata": { + "editable": true + }, + "source": [ + "## Back propagation and automatic differentiation\n", + "\n", + "For more details on the back propagation algorithm and automatic differentiation see\n", + "1. \n", + "\n", + "2. \n", + "\n", + "3. Slides 12-44 at " + ] + }, + { + "cell_type": "markdown", + "id": "b4489372", + "metadata": { + "editable": true + }, + "source": [ + "## Lecture Monday October 20" + ] + }, + { + "cell_type": "markdown", + "id": "f7435e4a", + "metadata": { + "editable": true + }, + "source": [ + "## Setting up the back propagation algorithm and algorithm for a feed forward NN, initalizations\n", + "This is a reminder from last week.\n", + "\n", + "**The architecture (our model).**\n", + "\n", + "1. Set up your inputs and outputs (scalars, vectors, matrices or higher-order arrays)\n", + "\n", + "2. Define the number of hidden layers and hidden nodes\n", + "\n", + "3. Define activation functions for hidden layers and output layers\n", + "\n", + "4. Define optimizer (plan learning rate, momentum, ADAgrad, RMSprop, ADAM etc) and array of initial learning rates\n", + "\n", + "5. Define cost function and possible regularization terms with hyperparameters\n", + "\n", + "6. Initialize weights and biases\n", + "\n", + "7. 
Fix number of iterations for the feed forward part and back propagation part" + ] + }, + { + "cell_type": "markdown", + "id": "e2561576", + "metadata": { + "editable": true + }, + "source": [ + "## Setting up the back propagation algorithm, part 1\n", + "\n", + "Let us write this out in the form of an algorithm.\n", + "\n", + "**First**, we set up the input data $\\boldsymbol{x}$ and the activations\n", + "$\\boldsymbol{z}_1$ of the input layer and compute the activation function and\n", + "the pertinent outputs $\\boldsymbol{a}^1$.\n", + "\n", + "**Secondly**, we perform then the feed forward till we reach the output\n", + "layer and compute all $\\boldsymbol{z}_l$ of the input layer and compute the\n", + "activation function and the pertinent outputs $\\boldsymbol{a}^l$ for\n", + "$l=1,2,3,\\dots,L$.\n", + "\n", + "**Notation**: The first hidden layer has $l=1$ as label and the final output layer has $l=L$." + ] + }, + { + "cell_type": "markdown", + "id": "39ed46ed", + "metadata": { + "editable": true + }, + "source": [ + "## Setting up the back propagation algorithm, part 2\n", + "\n", + "Thereafter we compute the ouput error $\\boldsymbol{\\delta}^L$ by computing all" + ] + }, + { + "cell_type": "markdown", + "id": "776b50ac", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_j^L = \\sigma'(z_j^L)\\frac{\\partial {\\cal C}}{\\partial (a_j^L)}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b0ad385d", + "metadata": { + "editable": true + }, + "source": [ + "Then we compute the back propagate error for each $l=L-1,L-2,\\dots,1$ as" + ] + }, + { + "cell_type": "markdown", + "id": "bb592830", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_j^l = \\sum_k \\delta_k^{l+1}w_{kj}^{l+1}\\sigma'(z_j^l).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "41259526", + "metadata": { + "editable": true + }, + "source": [ + "## Setting up the Back propagation algorithm, part 3\n", + "\n", + "Finally, we update the weights and the biases using gradient descent\n", + "for each $l=L-1,L-2,\\dots,1$ (the first hidden layer) and update the weights and biases\n", + "according to the rules" + ] + }, + { + "cell_type": "markdown", + "id": "47eaff91", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "w_{ij}^l\\leftarrow = w_{ij}^l- \\eta \\delta_j^la_i^{l-1},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "05b74533", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "b_j^l \\leftarrow b_j^l-\\eta \\frac{\\partial {\\cal C}}{\\partial b_j^l}=b_j^l-\\eta \\delta_j^l,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "6edb8648", + "metadata": { + "editable": true + }, + "source": [ + "with $\\eta$ being the learning rate." 
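To make the algorithm above concrete, here is a minimal NumPy sketch of one feed-forward and one back-propagation pass for a network with a single hidden layer, assuming a sigmoid activation in both layers and the mean squared error as cost function (all array names and sizes are illustrative only):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# illustrative sizes: n samples, p inputs, m hidden nodes, one output node
n, p, m = 10, 3, 4
rng = np.random.default_rng(2023)
X = rng.normal(size=(n, p))
t = rng.normal(size=(n, 1))

W1, b1 = rng.normal(size=(p, m)), np.zeros((1, m))
W2, b2 = rng.normal(size=(m, 1)), np.zeros((1, 1))
eta = 0.1

# feed forward
z1 = X @ W1 + b1
a1 = sigmoid(z1)
z2 = a1 @ W2 + b2
a2 = sigmoid(z2)

# output error: delta^L = sigma'(z^L) * dC/da^L, with C the mean squared error
delta2 = a2 * (1 - a2) * (2.0 / n) * (a2 - t)
# back-propagated error for the hidden layer
delta1 = (delta2 @ W2.T) * a1 * (1 - a1)

# gradient-descent updates: w^l <- w^l - eta * (a^{l-1})^T delta^l
W2 -= eta * a1.T @ delta2
b2 -= eta * delta2.sum(axis=0)
W1 -= eta * X.T @ delta1
b1 -= eta * delta1.sum(axis=0)
```

Repeating these updates over many iterations and mini-batches is what the fit() method of the neural network code further below automates.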
+ ] + }, + { + "cell_type": "markdown", + "id": "a663fc08", + "metadata": { + "editable": true + }, + "source": [ + "## Updating the gradients\n", + "\n", + "With the back propagate error for each $l=L-1,L-2,\\dots,1$ as" + ] + }, + { + "cell_type": "markdown", + "id": "479150e0", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_j^l = \\sum_k \\delta_k^{l+1}w_{kj}^{l+1}\\sigma'(z_j^l),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "41b9b1ea", + "metadata": { + "editable": true + }, + "source": [ + "we update the weights and the biases using gradient descent for each $l=L-1,L-2,\\dots,1$ and update the weights and biases according to the rules" + ] + }, + { + "cell_type": "markdown", + "id": "590c403a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "w_{ij}^l\\leftarrow = w_{ij}^l- \\eta \\delta_j^la_i^{l-1},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "3db8cbb4", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "b_j^l \\leftarrow b_j^l-\\eta \\frac{\\partial {\\cal C}}{\\partial b_j^l}=b_j^l-\\eta \\delta_j^l,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "a204182a", + "metadata": { + "editable": true + }, + "source": [ + "## Activation functions\n", + "\n", + "A property that characterizes a neural network, other than its\n", + "connectivity, is the choice of activation function(s). The following\n", + "restrictions are imposed on an activation function for an FFNN to\n", + "fulfill the universal approximation theorem\n", + "\n", + " * Non-constant\n", + "\n", + " * Bounded\n", + "\n", + " * Monotonically-increasing\n", + "\n", + " * Continuous" + ] + }, + { + "cell_type": "markdown", + "id": "4fe58cce", + "metadata": { + "editable": true + }, + "source": [ + "### Activation functions, examples\n", + "\n", + "Typical examples are the logistic *Sigmoid*" + ] + }, + { + "cell_type": "markdown", + "id": "a14f6d08", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\sigma(x) = \\frac{1}{1 + e^{-x}},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "4c290410", + "metadata": { + "editable": true + }, + "source": [ + "and the *hyperbolic tangent* function" + ] + }, + { + "cell_type": "markdown", + "id": "ca1ac514", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\sigma(x) = \\tanh(x)\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b9bcfab3", + "metadata": { + "editable": true + }, + "source": [ + "## The RELU function family\n", + "\n", + "The ReLU activation function suffers from a problem known as the dying\n", + "ReLUs: during training, some neurons effectively die, meaning they\n", + "stop outputting anything other than 0.\n", + "\n", + "In some cases, you may find that half of your network’s neurons are\n", + "dead, especially if you used a large learning rate. During training,\n", + "if a neuron’s weights get updated such that the weighted sum of the\n", + "neuron’s inputs is negative, it will start outputting 0. When this\n", + "happen, the neuron is unlikely to come back to life since the gradient\n", + "of the ReLU function is 0 when its input is negative." 
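The difference is easy to see numerically. Here is a small sketch comparing ReLU with a leaky ReLU (negative-side slope $\alpha=0.01$, a common default) on a few input values:

```python
import numpy as np

z = np.array([-2.0, -0.5, 0.0, 1.5])

relu  = np.where(z > 0, z, 0.0)
lrelu = np.where(z > 0, z, 0.01 * z)   # leaky ReLU keeps a small slope for negative inputs

# gradients: zero for z < 0 with ReLU (the "dead" region), 0.01 with leaky ReLU
d_relu  = np.where(z > 0, 1.0, 0.0)
d_lrelu = np.where(z > 0, 1.0, 0.01)
print(d_relu, d_lrelu)
```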
+ ] + }, + { + "cell_type": "markdown", + "id": "2fdf56f7", + "metadata": { + "editable": true + }, + "source": [ + "## ELU function\n", + "\n", + "To solve this problem, nowadays practitioners use a variant of the\n", + "ReLU function, such as the leaky ReLU discussed above or the so-called\n", + "exponential linear unit (ELU) function" + ] + }, + { + "cell_type": "markdown", + "id": "14bf193c", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "ELU(z) = \\left\\{\\begin{array}{cc} \\alpha\\left( \\exp{(z)}-1\\right) & z < 0,\\\\ z & z \\ge 0.\\end{array}\\right.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "df29068f", + "metadata": { + "editable": true + }, + "source": [ + "## Which activation function should we use?\n", + "\n", + "In general it seems that the ELU activation function is better than\n", + "the leaky ReLU function (and its variants), which is better than\n", + "ReLU. ReLU performs better than $\\tanh$ which in turn performs better\n", + "than the logistic function.\n", + "\n", + "If runtime performance is an issue, then you may opt for the leaky\n", + "ReLU function over the ELU function If you don’t want to tweak yet\n", + "another hyperparameter, you may just use the default $\\alpha$ of\n", + "$0.01$ for the leaky ReLU, and $1$ for ELU. If you have spare time and\n", + "computing power, you can use cross-validation or bootstrap to evaluate\n", + "other activation functions." + ] + }, + { + "cell_type": "markdown", + "id": "2fb5a29e", + "metadata": { + "editable": true + }, + "source": [ + "## More on activation functions, output layers\n", + "\n", + "In most cases you can use the ReLU activation function in the hidden\n", + "layers (or one of its variants).\n", + "\n", + "It is a bit faster to compute than other activation functions, and the\n", + "gradient descent optimization does in general not get stuck.\n", + "\n", + "**For the output layer:**\n", + "\n", + "* For classification the softmax activation function is generally a good choice for classification tasks (when the classes are mutually exclusive).\n", + "\n", + "* For regression tasks, you can simply use no activation function at all." + ] + }, + { + "cell_type": "markdown", + "id": "bab79791", + "metadata": { + "editable": true + }, + "source": [ + "## Building neural networks in Tensorflow and Keras\n", + "\n", + "Now we want to build on the experience gained from our neural network implementation in NumPy and scikit-learn\n", + "and use it to construct a neural network in Tensorflow. Once we have constructed a neural network in NumPy\n", + "and Tensorflow, building one in Keras is really quite trivial, though the performance may suffer. \n", + "\n", + "In our previous example we used only one hidden layer, and in this we will use two. From this it should be quite\n", + "clear how to build one using an arbitrary number of hidden layers, using data structures such as Python lists or\n", + "NumPy arrays." + ] + }, + { + "cell_type": "markdown", + "id": "cc32bc9d", + "metadata": { + "editable": true + }, + "source": [ + "## Tensorflow\n", + "\n", + "Tensorflow is an open source library machine learning library\n", + "developed by the Google Brain team for internal use. 
It was released\n", + "under the Apache 2.0 open source license in November 9, 2015.\n", + "\n", + "Tensorflow is a computational framework that allows you to construct\n", + "machine learning models at different levels of abstraction, from\n", + "high-level, object-oriented APIs like Keras, down to the C++ kernels\n", + "that Tensorflow is built upon. The higher levels of abstraction are\n", + "simpler to use, but less flexible, and our choice of implementation\n", + "should reflect the problems we are trying to solve.\n", + "\n", + "[Tensorflow uses](https://www.tensorflow.org/guide/graphs) so-called graphs to represent your computation\n", + "in terms of the dependencies between individual operations, such that you first build a Tensorflow *graph*\n", + "to represent your model, and then create a Tensorflow *session* to run the graph.\n", + "\n", + "In this guide we will analyze the same data as we did in our NumPy and\n", + "scikit-learn tutorial, gathered from the MNIST database of images. We\n", + "will give an introduction to the lower level Python Application\n", + "Program Interfaces (APIs), and see how we use them to build our graph.\n", + "Then we will build (effectively) the same graph in Keras, to see just\n", + "how simple solving a machine learning problem can be.\n", + "\n", + "To install tensorflow on Unix/Linux systems, use pip as" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "deb81088", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "pip3 install tensorflow" + ] + }, + { + "cell_type": "markdown", + "id": "979148b0", + "metadata": { + "editable": true + }, + "source": [ + "and/or if you use **anaconda**, just write (or install from the graphical user interface)\n", + "(current release of CPU-only TensorFlow)" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "ad63b8d9", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "conda create -n tf tensorflow\n", + "conda activate tf" + ] + }, + { + "cell_type": "markdown", + "id": "1417a40e", + "metadata": { + "editable": true + }, + "source": [ + "To install the current release of GPU TensorFlow" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "d56acb3a", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "conda create -n tf-gpu tensorflow-gpu\n", + "conda activate tf-gpu" + ] + }, + { + "cell_type": "markdown", + "id": "6a163d27", + "metadata": { + "editable": true + }, + "source": [ + "## Using Keras\n", + "\n", + "Keras is a high level [neural network](https://en.wikipedia.org/wiki/Application_programming_interface)\n", + "that supports Tensorflow, CTNK and Theano as backends. \n", + "If you have Anaconda installed you may run the following command" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "9ee390a8", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "conda install keras" + ] + }, + { + "cell_type": "markdown", + "id": "528ea3d5", + "metadata": { + "editable": true + }, + "source": [ + "You can look up the [instructions here](https://keras.io/) for more information.\n", + "\n", + "We will to a large extent use **keras** in this course." + ] + }, + { + "cell_type": "markdown", + "id": "32178225", + "metadata": { + "editable": true + }, + "source": [ + "## Collect and pre-process data\n", + "\n", + "Let us look again at the MINST data set." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "e37f86e4", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "%matplotlib inline\n", + "\n", + "# import necessary packages\n", + "import numpy as np\n", + "import matplotlib.pyplot as plt\n", + "import tensorflow as tf\n", + "from sklearn import datasets\n", + "\n", + "\n", + "# ensure the same random numbers appear every time\n", + "np.random.seed(0)\n", + "\n", + "# display images in notebook\n", + "%matplotlib inline\n", + "plt.rcParams['figure.figsize'] = (12,12)\n", + "\n", + "\n", + "# download MNIST dataset\n", + "digits = datasets.load_digits()\n", + "\n", + "# define inputs and labels\n", + "inputs = digits.images\n", + "labels = digits.target\n", + "\n", + "print(\"inputs = (n_inputs, pixel_width, pixel_height) = \" + str(inputs.shape))\n", + "print(\"labels = (n_inputs) = \" + str(labels.shape))\n", + "\n", + "\n", + "# flatten the image\n", + "# the value -1 means dimension is inferred from the remaining dimensions: 8x8 = 64\n", + "n_inputs = len(inputs)\n", + "inputs = inputs.reshape(n_inputs, -1)\n", + "print(\"X = (n_inputs, n_features) = \" + str(inputs.shape))\n", + "\n", + "\n", + "# choose some random images to display\n", + "indices = np.arange(n_inputs)\n", + "random_indices = np.random.choice(indices, size=5)\n", + "\n", + "for i, image in enumerate(digits.images[random_indices]):\n", + " plt.subplot(1, 5, i+1)\n", + " plt.axis('off')\n", + " plt.imshow(image, cmap=plt.cm.gray_r, interpolation='nearest')\n", + " plt.title(\"Label: %d\" % digits.target[random_indices[i]])\n", + "plt.show()" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "06a7c3bd", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "from tensorflow.keras.layers import Input\n", + "from tensorflow.keras.models import Sequential #This allows appending layers to existing models\n", + "from tensorflow.keras.layers import Dense #This allows defining the characteristics of a particular layer\n", + "from tensorflow.keras import optimizers #This allows using whichever optimiser we want (sgd,adam,RMSprop)\n", + "from tensorflow.keras import regularizers #This allows using whichever regularizer we want (l1,l2,l1_l2)\n", + "from tensorflow.keras.utils import to_categorical #This allows using categorical cross entropy as the cost function\n", + "\n", + "from sklearn.model_selection import train_test_split\n", + "\n", + "# one-hot representation of labels\n", + "labels = to_categorical(labels)\n", + "\n", + "# split into train and test data\n", + "train_size = 0.8\n", + "test_size = 1 - train_size\n", + "X_train, X_test, Y_train, Y_test = train_test_split(inputs, labels, train_size=train_size,\n", + " test_size=test_size)" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "358b46c5", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "\n", + "epochs = 100\n", + "batch_size = 100\n", + "n_neurons_layer1 = 100\n", + "n_neurons_layer2 = 50\n", + "n_categories = 10\n", + "eta_vals = np.logspace(-5, 1, 7)\n", + "lmbd_vals = np.logspace(-5, 1, 7)\n", + "def create_neural_network_keras(n_neurons_layer1, n_neurons_layer2, n_categories, eta, lmbd):\n", + " model = Sequential()\n", + " model.add(Dense(n_neurons_layer1, activation='sigmoid', kernel_regularizer=regularizers.l2(lmbd)))\n", + " model.add(Dense(n_neurons_layer2, activation='sigmoid', 
kernel_regularizer=regularizers.l2(lmbd)))\n", + " model.add(Dense(n_categories, activation='softmax'))\n", + " \n", + " sgd = optimizers.SGD(learning_rate=eta)\n", + " model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])\n", + " \n", + " return model" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "5a0445fb", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "DNN_keras = np.zeros((len(eta_vals), len(lmbd_vals)), dtype=object)\n", + " \n", + "for i, eta in enumerate(eta_vals):\n", + " for j, lmbd in enumerate(lmbd_vals):\n", + " DNN = create_neural_network_keras(n_neurons_layer1, n_neurons_layer2, n_categories,\n", + " eta=eta, lmbd=lmbd)\n", + " DNN.fit(X_train, Y_train, epochs=epochs, batch_size=batch_size, verbose=0)\n", + " scores = DNN.evaluate(X_test, Y_test)\n", + " \n", + " DNN_keras[i][j] = DNN\n", + " \n", + " print(\"Learning rate = \", eta)\n", + " print(\"Lambda = \", lmbd)\n", + " print(\"Test accuracy: %.3f\" % scores[1])\n", + " print()" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "id": "f301c7cf", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# optional\n", + "# visual representation of grid search\n", + "# uses seaborn heatmap, could probably do this in matplotlib\n", + "import seaborn as sns\n", + "\n", + "sns.set()\n", + "\n", + "train_accuracy = np.zeros((len(eta_vals), len(lmbd_vals)))\n", + "test_accuracy = np.zeros((len(eta_vals), len(lmbd_vals)))\n", + "\n", + "for i in range(len(eta_vals)):\n", + " for j in range(len(lmbd_vals)):\n", + " DNN = DNN_keras[i][j]\n", + "\n", + " train_accuracy[i][j] = DNN.evaluate(X_train, Y_train)[1]\n", + " test_accuracy[i][j] = DNN.evaluate(X_test, Y_test)[1]\n", + "\n", + " \n", + "fig, ax = plt.subplots(figsize = (10, 10))\n", + "sns.heatmap(train_accuracy, annot=True, ax=ax, cmap=\"viridis\")\n", + "ax.set_title(\"Training Accuracy\")\n", + "ax.set_ylabel(\"$\\eta$\")\n", + "ax.set_xlabel(\"$\\lambda$\")\n", + "plt.show()\n", + "\n", + "fig, ax = plt.subplots(figsize = (10, 10))\n", + "sns.heatmap(test_accuracy, annot=True, ax=ax, cmap=\"viridis\")\n", + "ax.set_title(\"Test Accuracy\")\n", + "ax.set_ylabel(\"$\\eta$\")\n", + "ax.set_xlabel(\"$\\lambda$\")\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "610c95e1", + "metadata": { + "editable": true + }, + "source": [ + "## Using Pytorch with the full MNIST data set" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "id": "d0f3ad9a", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import torch\n", + "import torch.nn as nn\n", + "import torch.optim as optim\n", + "import torchvision\n", + "import torchvision.transforms as transforms\n", + "\n", + "# Device configuration: use GPU if available\n", + "device = torch.device(\"cuda\" if torch.cuda.is_available() else \"cpu\")\n", + "\n", + "# MNIST dataset (downloads if not already present)\n", + "transform = transforms.Compose([\n", + " transforms.ToTensor(),\n", + " transforms.Normalize((0.5,), (0.5,)) # normalize to mean=0.5, std=0.5 (approx. 
[-1,1] pixel range)\n", + "])\n", + "train_dataset = torchvision.datasets.MNIST(root='./data', train=True, download=True, transform=transform)\n", + "test_dataset = torchvision.datasets.MNIST(root='./data', train=False, download=True, transform=transform)\n", + "\n", + "train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=64, shuffle=True)\n", + "test_loader = torch.utils.data.DataLoader(test_dataset, batch_size=64, shuffle=False)\n", + "\n", + "\n", + "class NeuralNet(nn.Module):\n", + " def __init__(self):\n", + " super(NeuralNet, self).__init__()\n", + " self.fc1 = nn.Linear(28*28, 100) # first hidden layer (784 -> 100)\n", + " self.fc2 = nn.Linear(100, 100) # second hidden layer (100 -> 100)\n", + " self.fc3 = nn.Linear(100, 10) # output layer (100 -> 10 classes)\n", + " def forward(self, x):\n", + " x = x.view(x.size(0), -1) # flatten images into vectors of size 784\n", + " x = torch.relu(self.fc1(x)) # hidden layer 1 + ReLU activation\n", + " x = torch.relu(self.fc2(x)) # hidden layer 2 + ReLU activation\n", + " x = self.fc3(x) # output layer (logits for 10 classes)\n", + " return x\n", + "\n", + "model = NeuralNet().to(device)\n", + "\n", + "\n", + "criterion = nn.CrossEntropyLoss()\n", + "optimizer = optim.SGD(model.parameters(), lr=0.01, weight_decay=1e-4)\n", + "\n", + "num_epochs = 10\n", + "for epoch in range(num_epochs):\n", + " model.train() # set model to training mode\n", + " running_loss = 0.0\n", + " for images, labels in train_loader:\n", + " # Move data to device (GPU if available, else CPU)\n", + " images, labels = images.to(device), labels.to(device)\n", + "\n", + " optimizer.zero_grad() # reset gradients to zero\n", + " outputs = model(images) # forward pass: compute predictions\n", + " loss = criterion(outputs, labels) # compute cross-entropy loss\n", + " loss.backward() # backpropagate to compute gradients\n", + " optimizer.step() # update weights using SGD step \n", + "\n", + " running_loss += loss.item()\n", + " # Compute average loss over all batches in this epoch\n", + " avg_loss = running_loss / len(train_loader)\n", + " print(f\"Epoch {epoch+1}/{num_epochs}, Loss: {avg_loss:.4f}\")\n", + "\n", + "#Evaluation on the Test Set\n", + "\n", + "\n", + "\n", + "model.eval() # set model to evaluation mode \n", + "correct = 0\n", + "total = 0\n", + "with torch.no_grad(): # disable gradient calculation for evaluation \n", + " for images, labels in test_loader:\n", + " images, labels = images.to(device), labels.to(device)\n", + " outputs = model(images)\n", + " _, predicted = torch.max(outputs, dim=1) # class with highest score\n", + " total += labels.size(0)\n", + " correct += (predicted == labels).sum().item()\n", + "\n", + "accuracy = 100 * correct / total\n", + "print(f\"Test Accuracy: {accuracy:.2f}%\")" + ] + }, + { + "cell_type": "markdown", + "id": "aad687aa", + "metadata": { + "editable": true + }, + "source": [ + "## And a similar example using Tensorflow with Keras" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "id": "b6c4fad4", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "\n", + "import tensorflow as tf\n", + "from tensorflow import keras\n", + "from tensorflow.keras import layers, regularizers\n", + "\n", + "# Check for GPU (TensorFlow will use it automatically if available)\n", + "gpus = tf.config.list_physical_devices('GPU')\n", + "print(f\"GPUs available: {gpus}\")\n", + "\n", + "# 1) Load and preprocess MNIST\n", + "(x_train, y_train), (x_test, y_test) = 
keras.datasets.mnist.load_data()\n", + "# Normalize to [0, 1]\n", + "x_train = (x_train.astype(\"float32\") / 255.0)\n", + "x_test = (x_test.astype(\"float32\") / 255.0)\n", + "\n", + "# 2) Build the model: 784 -> 100 -> 100 -> 10\n", + "l2_reg = 1e-4 # L2 regularization strength\n", + "\n", + "model = keras.Sequential([\n", + " layers.Input(shape=(28, 28)),\n", + " layers.Flatten(),\n", + " layers.Dense(100, activation=\"relu\",\n", + " kernel_regularizer=regularizers.l2(l2_reg)),\n", + " layers.Dense(100, activation=\"relu\",\n", + " kernel_regularizer=regularizers.l2(l2_reg)),\n", + " layers.Dense(10, activation=\"softmax\") # output probabilities for 10 classes\n", + "])\n", + "\n", + "# 3) Compile with SGD + weight decay via L2 regularizers\n", + "model.compile(\n", + " optimizer=keras.optimizers.SGD(learning_rate=0.01),\n", + " loss=\"sparse_categorical_crossentropy\",\n", + " metrics=[\"accuracy\"],\n", + ")\n", + "\n", + "model.summary()\n", + "\n", + "# 4) Train\n", + "history = model.fit(\n", + " x_train, y_train,\n", + " epochs=10,\n", + " batch_size=64,\n", + " validation_split=0.1, # optional: monitor validation during training\n", + " verbose=1\n", + ")\n", + "\n", + "# 5) Evaluate on test set\n", + "test_loss, test_acc = model.evaluate(x_test, y_test, verbose=0)\n", + "print(f\"Test accuracy: {test_acc:.4f}, Test loss: {test_loss:.4f}\")" + ] + }, + { + "cell_type": "markdown", + "id": "73162fbb", + "metadata": { + "editable": true + }, + "source": [ + "## Building our own neural network code\n", + "\n", + "Here we present a flexible object oriented codebase\n", + "for a feed forward neural network, along with a demonstration of how\n", + "to use it. Before we get into the details of the neural network, we\n", + "will first present some implementations of various schedulers, cost\n", + "functions and activation functions that can be used together with the\n", + "neural network.\n", + "\n", + "The codes here were developed by Eric Reber and Gregor Kajda during spring 2023." + ] + }, + { + "cell_type": "markdown", + "id": "86f36041", + "metadata": { + "editable": true + }, + "source": [ + "### Learning rate methods\n", + "\n", + "The code below shows object oriented implementations of the Constant,\n", + "Momentum, Adagrad, AdagradMomentum, RMS prop and Adam schedulers. All\n", + "of the classes belong to the shared abstract Scheduler class, and\n", + "share the update_change() and reset() methods allowing for any of the\n", + "schedulers to be seamlessly used during the training stage, as will\n", + "later be shown in the fit() method of the neural\n", + "network. Update_change() only has one parameter, the gradient\n", + "($δ^l_ja^{l−1}_k$), and returns the change which will be subtracted\n", + "from the weights. The reset() function takes no parameters, and resets\n", + "the desired variables. For Constant and Momentum, reset does nothing." 
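As a reference for the Adam scheduler implemented below, the change returned by its update_change() corresponds to the standard Adam update (a sketch; $g_t$ is the gradient and $\delta$ a small constant to avoid division by zero),

$$
m_t = \rho_1 m_{t-1} + (1-\rho_1) g_t, \qquad s_t = \rho_2 s_{t-1} + (1-\rho_2) g_t^2,
$$

$$
\hat{m}_t = \frac{m_t}{1-\rho_1^t}, \qquad \hat{s}_t = \frac{s_t}{1-\rho_2^t}, \qquad \Delta = \frac{\eta\, \hat{m}_t}{\sqrt{\hat{s}_t + \delta}},
$$

with $\Delta$ being the change subtracted from the weights. Note that the implementation below uses the epoch counter, incremented in reset(), as the exponent $t$ in the bias corrections.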
+ ] + }, + { + "cell_type": "code", + "execution_count": 12, + "id": "bcbec449", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import autograd.numpy as np\n", + "\n", + "class Scheduler:\n", + " \"\"\"\n", + " Abstract class for Schedulers\n", + " \"\"\"\n", + "\n", + " def __init__(self, eta):\n", + " self.eta = eta\n", + "\n", + " # should be overwritten\n", + " def update_change(self, gradient):\n", + " raise NotImplementedError\n", + "\n", + " # overwritten if needed\n", + " def reset(self):\n", + " pass\n", + "\n", + "\n", + "class Constant(Scheduler):\n", + " def __init__(self, eta):\n", + " super().__init__(eta)\n", + "\n", + " def update_change(self, gradient):\n", + " return self.eta * gradient\n", + " \n", + " def reset(self):\n", + " pass\n", + "\n", + "\n", + "class Momentum(Scheduler):\n", + " def __init__(self, eta: float, momentum: float):\n", + " super().__init__(eta)\n", + " self.momentum = momentum\n", + " self.change = 0\n", + "\n", + " def update_change(self, gradient):\n", + " self.change = self.momentum * self.change + self.eta * gradient\n", + " return self.change\n", + "\n", + " def reset(self):\n", + " pass\n", + "\n", + "\n", + "class Adagrad(Scheduler):\n", + " def __init__(self, eta):\n", + " super().__init__(eta)\n", + " self.G_t = None\n", + "\n", + " def update_change(self, gradient):\n", + " delta = 1e-8 # avoid division ny zero\n", + "\n", + " if self.G_t is None:\n", + " self.G_t = np.zeros((gradient.shape[0], gradient.shape[0]))\n", + "\n", + " self.G_t += gradient @ gradient.T\n", + "\n", + " G_t_inverse = 1 / (\n", + " delta + np.sqrt(np.reshape(np.diagonal(self.G_t), (self.G_t.shape[0], 1)))\n", + " )\n", + " return self.eta * gradient * G_t_inverse\n", + "\n", + " def reset(self):\n", + " self.G_t = None\n", + "\n", + "\n", + "class AdagradMomentum(Scheduler):\n", + " def __init__(self, eta, momentum):\n", + " super().__init__(eta)\n", + " self.G_t = None\n", + " self.momentum = momentum\n", + " self.change = 0\n", + "\n", + " def update_change(self, gradient):\n", + " delta = 1e-8 # avoid division ny zero\n", + "\n", + " if self.G_t is None:\n", + " self.G_t = np.zeros((gradient.shape[0], gradient.shape[0]))\n", + "\n", + " self.G_t += gradient @ gradient.T\n", + "\n", + " G_t_inverse = 1 / (\n", + " delta + np.sqrt(np.reshape(np.diagonal(self.G_t), (self.G_t.shape[0], 1)))\n", + " )\n", + " self.change = self.change * self.momentum + self.eta * gradient * G_t_inverse\n", + " return self.change\n", + "\n", + " def reset(self):\n", + " self.G_t = None\n", + "\n", + "\n", + "class RMS_prop(Scheduler):\n", + " def __init__(self, eta, rho):\n", + " super().__init__(eta)\n", + " self.rho = rho\n", + " self.second = 0.0\n", + "\n", + " def update_change(self, gradient):\n", + " delta = 1e-8 # avoid division ny zero\n", + " self.second = self.rho * self.second + (1 - self.rho) * gradient * gradient\n", + " return self.eta * gradient / (np.sqrt(self.second + delta))\n", + "\n", + " def reset(self):\n", + " self.second = 0.0\n", + "\n", + "\n", + "class Adam(Scheduler):\n", + " def __init__(self, eta, rho, rho2):\n", + " super().__init__(eta)\n", + " self.rho = rho\n", + " self.rho2 = rho2\n", + " self.moment = 0\n", + " self.second = 0\n", + " self.n_epochs = 1\n", + "\n", + " def update_change(self, gradient):\n", + " delta = 1e-8 # avoid division ny zero\n", + "\n", + " self.moment = self.rho * self.moment + (1 - self.rho) * gradient\n", + " self.second = self.rho2 * self.second + (1 - self.rho2) * gradient 
* gradient\n", + "\n", + " moment_corrected = self.moment / (1 - self.rho**self.n_epochs)\n", + " second_corrected = self.second / (1 - self.rho2**self.n_epochs)\n", + "\n", + " return self.eta * moment_corrected / (np.sqrt(second_corrected + delta))\n", + "\n", + " def reset(self):\n", + " self.n_epochs += 1\n", + " self.moment = 0\n", + " self.second = 0" + ] + }, + { + "cell_type": "markdown", + "id": "961989d9", + "metadata": { + "editable": true + }, + "source": [ + "### Usage of the above learning rate schedulers\n", + "\n", + "To initalize a scheduler, simply create the object and pass in the\n", + "necessary parameters such as the learning rate and the momentum as\n", + "shown below. As the Scheduler class is an abstract class it should not\n", + "called directly, and will raise an error upon usage." + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "id": "1e9fbe0f", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "momentum_scheduler = Momentum(eta=1e-3, momentum=0.9)\n", + "adam_scheduler = Adam(eta=1e-3, rho=0.9, rho2=0.999)" + ] + }, + { + "cell_type": "markdown", + "id": "b5adb1b4", + "metadata": { + "editable": true + }, + "source": [ + "Here is a small example for how a segment of code using schedulers\n", + "could look. Switching out the schedulers is simple." + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "id": "dc4f4d28", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "weights = np.ones((3,3))\n", + "print(f\"Before scheduler:\\n{weights=}\")\n", + "\n", + "epochs = 10\n", + "for e in range(epochs):\n", + " gradient = np.random.rand(3, 3)\n", + " change = adam_scheduler.update_change(gradient)\n", + " weights = weights - change\n", + " adam_scheduler.reset()\n", + "\n", + "print(f\"\\nAfter scheduler:\\n{weights=}\")" + ] + }, + { + "cell_type": "markdown", + "id": "8964d118", + "metadata": { + "editable": true + }, + "source": [ + "### Cost functions\n", + "\n", + "Here we discuss cost functions that can be used when creating the\n", + "neural network. Every cost function takes the target vector as its\n", + "parameter, and returns a function valued only at $x$ such that it may\n", + "easily be differentiated." + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "id": "3a8470bd", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import autograd.numpy as np\n", + "\n", + "def CostOLS(target):\n", + " \n", + " def func(X):\n", + " return (1.0 / target.shape[0]) * np.sum((target - X) ** 2)\n", + "\n", + " return func\n", + "\n", + "\n", + "def CostLogReg(target):\n", + "\n", + " def func(X):\n", + " \n", + " return -(1.0 / target.shape[0]) * np.sum(\n", + " (target * np.log(X + 10e-10)) + ((1 - target) * np.log(1 - X + 10e-10))\n", + " )\n", + "\n", + " return func\n", + "\n", + "\n", + "def CostCrossEntropy(target):\n", + " \n", + " def func(X):\n", + " return -(1.0 / target.size) * np.sum(target * np.log(X + 10e-10))\n", + "\n", + " return func" + ] + }, + { + "cell_type": "markdown", + "id": "ab4daf8f", + "metadata": { + "editable": true + }, + "source": [ + "Below we give a short example of how these cost function may be used\n", + "to obtain results if you wish to test them out on your own using\n", + "AutoGrad's automatics differentiation." 
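As a check on the automatic derivative computed in the next cell, the analytical gradient of the cross-entropy cost defined above is (ignoring the small constant added for numerical stability)

$$
\frac{\partial}{\partial X_i}\left(-\frac{1}{N}\sum_j t_j \ln X_j\right) = -\frac{t_i}{N X_i},
$$

with $N$ the number of elements in the target vector.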
+ ] + }, + { + "cell_type": "code", + "execution_count": 16, + "id": "cf8922ac", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "from autograd import grad\n", + "\n", + "target = np.array([[1, 2, 3]]).T\n", + "a = np.array([[4, 5, 6]]).T\n", + "\n", + "cost_func = CostCrossEntropy\n", + "cost_func_derivative = grad(cost_func(target))\n", + "\n", + "valued_at_a = cost_func_derivative(a)\n", + "print(f\"Derivative of cost function {cost_func.__name__} valued at a:\\n{valued_at_a}\")" + ] + }, + { + "cell_type": "markdown", + "id": "fab332c4", + "metadata": { + "editable": true + }, + "source": [ + "### Activation functions\n", + "\n", + "Finally, before we look at the neural network, we will look at the\n", + "activation functions which can be specified between the hidden layers\n", + "and as the output function. Each function can be valued for any given\n", + "vector or matrix X, and can be differentiated via derivate()." + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "id": "5ab56013", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import autograd.numpy as np\n", + "from autograd import elementwise_grad\n", + "\n", + "def identity(X):\n", + " return X\n", + "\n", + "\n", + "def sigmoid(X):\n", + " try:\n", + " return 1.0 / (1 + np.exp(-X))\n", + " except FloatingPointError:\n", + " return np.where(X > np.zeros(X.shape), np.ones(X.shape), np.zeros(X.shape))\n", + "\n", + "\n", + "def softmax(X):\n", + " X = X - np.max(X, axis=-1, keepdims=True)\n", + " delta = 10e-10\n", + " return np.exp(X) / (np.sum(np.exp(X), axis=-1, keepdims=True) + delta)\n", + "\n", + "\n", + "def RELU(X):\n", + " return np.where(X > np.zeros(X.shape), X, np.zeros(X.shape))\n", + "\n", + "\n", + "def LRELU(X):\n", + " delta = 10e-4\n", + " return np.where(X > np.zeros(X.shape), X, delta * X)\n", + "\n", + "\n", + "def derivate(func):\n", + " if func.__name__ == \"RELU\":\n", + "\n", + " def func(X):\n", + " return np.where(X > 0, 1, 0)\n", + "\n", + " return func\n", + "\n", + " elif func.__name__ == \"LRELU\":\n", + "\n", + " def func(X):\n", + " delta = 10e-4\n", + " return np.where(X > 0, 1, delta)\n", + "\n", + " return func\n", + "\n", + " else:\n", + " return elementwise_grad(func)" + ] + }, + { + "cell_type": "markdown", + "id": "969612c3", + "metadata": { + "editable": true + }, + "source": [ + "Below follows a short demonstration of how to use an activation\n", + "function. The derivative of the activation function will be important\n", + "when calculating the output delta term during backpropagation. Note\n", + "that derivate() can also be used for cost functions for a more\n", + "generalized approach." 
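For the sigmoid used in the demonstration below, the derivative also has the simple closed form $\sigma'(x) = \sigma(x)\bigl(1-\sigma(x)\bigr)$, which is handy for checking the output of derivate() by hand.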
+ ] + }, + { + "cell_type": "code", + "execution_count": 18, + "id": "313878c6", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "z = np.array([[4, 5, 6]]).T\n", + "print(f\"Input to activation function:\\n{z}\")\n", + "\n", + "act_func = sigmoid\n", + "a = act_func(z)\n", + "print(f\"\\nOutput from {act_func.__name__} activation function:\\n{a}\")\n", + "\n", + "act_func_derivative = derivate(act_func)\n", + "valued_at_z = act_func_derivative(a)\n", + "print(f\"\\nDerivative of {act_func.__name__} activation function valued at z:\\n{valued_at_z}\")" + ] + }, + { + "cell_type": "markdown", + "id": "095347a2", + "metadata": { + "editable": true + }, + "source": [ + "### The Neural Network\n", + "\n", + "Now that we have gotten a good understanding of the implementation of\n", + "some important components, we can take a look at an object oriented\n", + "implementation of a feed forward neural network. The feed forward\n", + "neural network has been implemented as a class named FFNN, which can\n", + "be initiated as a regressor or classifier dependant on the choice of\n", + "cost function. The FFNN can have any number of input nodes, hidden\n", + "layers with any amount of hidden nodes, and any amount of output nodes\n", + "meaning it can perform multiclass classification as well as binary\n", + "classification and regression problems. Although there is a lot of\n", + "code present, it makes for an easy to use and generalizeable interface\n", + "for creating many types of neural networks as will be demonstrated\n", + "below." + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "id": "9ea2b0b7", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import math\n", + "import autograd.numpy as np\n", + "import sys\n", + "import warnings\n", + "from autograd import grad, elementwise_grad\n", + "from random import random, seed\n", + "from copy import deepcopy, copy\n", + "from typing import Tuple, Callable\n", + "from sklearn.utils import resample\n", + "\n", + "warnings.simplefilter(\"error\")\n", + "\n", + "\n", + "class FFNN:\n", + " \"\"\"\n", + " Description:\n", + " ------------\n", + " Feed Forward Neural Network with interface enabling flexible design of a\n", + " nerual networks architecture and the specification of activation function\n", + " in the hidden layers and output layer respectively. This model can be used\n", + " for both regression and classification problems, depending on the output function.\n", + "\n", + " Attributes:\n", + " ------------\n", + " I dimensions (tuple[int]): A list of positive integers, which specifies the\n", + " number of nodes in each of the networks layers. 
The first integer in the array\n", + " defines the number of nodes in the input layer, the second integer defines number\n", + " of nodes in the first hidden layer and so on until the last number, which\n", + " specifies the number of nodes in the output layer.\n", + " II hidden_func (Callable): The activation function for the hidden layers\n", + " III output_func (Callable): The activation function for the output layer\n", + " IV cost_func (Callable): Our cost function\n", + " V seed (int): Sets random seed, makes results reproducible\n", + " \"\"\"\n", + "\n", + " def __init__(\n", + " self,\n", + " dimensions: tuple[int],\n", + " hidden_func: Callable = sigmoid,\n", + " output_func: Callable = lambda x: x,\n", + " cost_func: Callable = CostOLS,\n", + " seed: int = None,\n", + " ):\n", + " self.dimensions = dimensions\n", + " self.hidden_func = hidden_func\n", + " self.output_func = output_func\n", + " self.cost_func = cost_func\n", + " self.seed = seed\n", + " self.weights = list()\n", + " self.schedulers_weight = list()\n", + " self.schedulers_bias = list()\n", + " self.a_matrices = list()\n", + " self.z_matrices = list()\n", + " self.classification = None\n", + "\n", + " self.reset_weights()\n", + " self._set_classification()\n", + "\n", + " def fit(\n", + " self,\n", + " X: np.ndarray,\n", + " t: np.ndarray,\n", + " scheduler: Scheduler,\n", + " batches: int = 1,\n", + " epochs: int = 100,\n", + " lam: float = 0,\n", + " X_val: np.ndarray = None,\n", + " t_val: np.ndarray = None,\n", + " ):\n", + " \"\"\"\n", + " Description:\n", + " ------------\n", + " This function performs the training the neural network by performing the feedforward and backpropagation\n", + " algorithm to update the networks weights.\n", + "\n", + " Parameters:\n", + " ------------\n", + " I X (np.ndarray) : training data\n", + " II t (np.ndarray) : target data\n", + " III scheduler (Scheduler) : specified scheduler (algorithm for optimization of gradient descent)\n", + " IV scheduler_args (list[int]) : list of all arguments necessary for scheduler\n", + "\n", + " Optional Parameters:\n", + " ------------\n", + " V batches (int) : number of batches the datasets are split into, default equal to 1\n", + " VI epochs (int) : number of iterations used to train the network, default equal to 100\n", + " VII lam (float) : regularization hyperparameter lambda\n", + " VIII X_val (np.ndarray) : validation set\n", + " IX t_val (np.ndarray) : validation target set\n", + "\n", + " Returns:\n", + " ------------\n", + " I scores (dict) : A dictionary containing the performance metrics of the model.\n", + " The number of the metrics depends on the parameters passed to the fit-function.\n", + "\n", + " \"\"\"\n", + "\n", + " # setup \n", + " if self.seed is not None:\n", + " np.random.seed(self.seed)\n", + "\n", + " val_set = False\n", + " if X_val is not None and t_val is not None:\n", + " val_set = True\n", + "\n", + " # creating arrays for score metrics\n", + " train_errors = np.empty(epochs)\n", + " train_errors.fill(np.nan)\n", + " val_errors = np.empty(epochs)\n", + " val_errors.fill(np.nan)\n", + "\n", + " train_accs = np.empty(epochs)\n", + " train_accs.fill(np.nan)\n", + " val_accs = np.empty(epochs)\n", + " val_accs.fill(np.nan)\n", + "\n", + " self.schedulers_weight = list()\n", + " self.schedulers_bias = list()\n", + "\n", + " batch_size = X.shape[0] // batches\n", + "\n", + " X, t = resample(X, t)\n", + "\n", + " # this function returns a function valued only at X\n", + " cost_function_train = self.cost_func(t)\n", 
+ " if val_set:\n", + " cost_function_val = self.cost_func(t_val)\n", + "\n", + " # create schedulers for each weight matrix\n", + " for i in range(len(self.weights)):\n", + " self.schedulers_weight.append(copy(scheduler))\n", + " self.schedulers_bias.append(copy(scheduler))\n", + "\n", + " print(f\"{scheduler.__class__.__name__}: Eta={scheduler.eta}, Lambda={lam}\")\n", + "\n", + " try:\n", + " for e in range(epochs):\n", + " for i in range(batches):\n", + " # allows for minibatch gradient descent\n", + " if i == batches - 1:\n", + " # If the for loop has reached the last batch, take all thats left\n", + " X_batch = X[i * batch_size :, :]\n", + " t_batch = t[i * batch_size :, :]\n", + " else:\n", + " X_batch = X[i * batch_size : (i + 1) * batch_size, :]\n", + " t_batch = t[i * batch_size : (i + 1) * batch_size, :]\n", + "\n", + " self._feedforward(X_batch)\n", + " self._backpropagate(X_batch, t_batch, lam)\n", + "\n", + " # reset schedulers for each epoch (some schedulers pass in this call)\n", + " for scheduler in self.schedulers_weight:\n", + " scheduler.reset()\n", + "\n", + " for scheduler in self.schedulers_bias:\n", + " scheduler.reset()\n", + "\n", + " # computing performance metrics\n", + " pred_train = self.predict(X)\n", + " train_error = cost_function_train(pred_train)\n", + "\n", + " train_errors[e] = train_error\n", + " if val_set:\n", + " \n", + " pred_val = self.predict(X_val)\n", + " val_error = cost_function_val(pred_val)\n", + " val_errors[e] = val_error\n", + "\n", + " if self.classification:\n", + " train_acc = self._accuracy(self.predict(X), t)\n", + " train_accs[e] = train_acc\n", + " if val_set:\n", + " val_acc = self._accuracy(pred_val, t_val)\n", + " val_accs[e] = val_acc\n", + "\n", + " # printing progress bar\n", + " progression = e / epochs\n", + " print_length = self._progress_bar(\n", + " progression,\n", + " train_error=train_errors[e],\n", + " train_acc=train_accs[e],\n", + " val_error=val_errors[e],\n", + " val_acc=val_accs[e],\n", + " )\n", + " except KeyboardInterrupt:\n", + " # allows for stopping training at any point and seeing the result\n", + " pass\n", + "\n", + " # visualization of training progression (similiar to tensorflow progression bar)\n", + " sys.stdout.write(\"\\r\" + \" \" * print_length)\n", + " sys.stdout.flush()\n", + " self._progress_bar(\n", + " 1,\n", + " train_error=train_errors[e],\n", + " train_acc=train_accs[e],\n", + " val_error=val_errors[e],\n", + " val_acc=val_accs[e],\n", + " )\n", + " sys.stdout.write(\"\")\n", + "\n", + " # return performance metrics for the entire run\n", + " scores = dict()\n", + "\n", + " scores[\"train_errors\"] = train_errors\n", + "\n", + " if val_set:\n", + " scores[\"val_errors\"] = val_errors\n", + "\n", + " if self.classification:\n", + " scores[\"train_accs\"] = train_accs\n", + "\n", + " if val_set:\n", + " scores[\"val_accs\"] = val_accs\n", + "\n", + " return scores\n", + "\n", + " def predict(self, X: np.ndarray, *, threshold=0.5):\n", + " \"\"\"\n", + " Description:\n", + " ------------\n", + " Performs prediction after training of the network has been finished.\n", + "\n", + " Parameters:\n", + " ------------\n", + " I X (np.ndarray): The design matrix, with n rows of p features each\n", + "\n", + " Optional Parameters:\n", + " ------------\n", + " II threshold (float) : sets minimal value for a prediction to be predicted as the positive class\n", + " in classification problems\n", + "\n", + " Returns:\n", + " ------------\n", + " I z (np.ndarray): A prediction vector (row) for each row 
in our design matrix\n", + " This vector is thresholded if regression=False, meaning that classification results\n", + " in a vector of 1s and 0s, while regressions in an array of decimal numbers\n", + "\n", + " \"\"\"\n", + "\n", + " predict = self._feedforward(X)\n", + "\n", + " if self.classification:\n", + " return np.where(predict > threshold, 1, 0)\n", + " else:\n", + " return predict\n", + "\n", + " def reset_weights(self):\n", + " \"\"\"\n", + " Description:\n", + " ------------\n", + " Resets/Reinitializes the weights in order to train the network for a new problem.\n", + "\n", + " \"\"\"\n", + " if self.seed is not None:\n", + " np.random.seed(self.seed)\n", + "\n", + " self.weights = list()\n", + " for i in range(len(self.dimensions) - 1):\n", + " weight_array = np.random.randn(\n", + " self.dimensions[i] + 1, self.dimensions[i + 1]\n", + " )\n", + " weight_array[0, :] = np.random.randn(self.dimensions[i + 1]) * 0.01\n", + "\n", + " self.weights.append(weight_array)\n", + "\n", + " def _feedforward(self, X: np.ndarray):\n", + " \"\"\"\n", + " Description:\n", + " ------------\n", + " Calculates the activation of each layer starting at the input and ending at the output.\n", + " Each following activation is calculated from a weighted sum of each of the preceeding\n", + " activations (except in the case of the input layer).\n", + "\n", + " Parameters:\n", + " ------------\n", + " I X (np.ndarray): The design matrix, with n rows of p features each\n", + "\n", + " Returns:\n", + " ------------\n", + " I z (np.ndarray): A prediction vector (row) for each row in our design matrix\n", + " \"\"\"\n", + "\n", + " # reset matrices\n", + " self.a_matrices = list()\n", + " self.z_matrices = list()\n", + "\n", + " # if X is just a vector, make it into a matrix\n", + " if len(X.shape) == 1:\n", + " X = X.reshape((1, X.shape[0]))\n", + "\n", + " # Add a coloumn of zeros as the first coloumn of the design matrix, in order\n", + " # to add bias to our data\n", + " bias = np.ones((X.shape[0], 1)) * 0.01\n", + " X = np.hstack([bias, X])\n", + "\n", + " # a^0, the nodes in the input layer (one a^0 for each row in X - where the\n", + " # exponent indicates layer number).\n", + " a = X\n", + " self.a_matrices.append(a)\n", + " self.z_matrices.append(a)\n", + "\n", + " # The feed forward algorithm\n", + " for i in range(len(self.weights)):\n", + " if i < len(self.weights) - 1:\n", + " z = a @ self.weights[i]\n", + " self.z_matrices.append(z)\n", + " a = self.hidden_func(z)\n", + " # bias column again added to the data here\n", + " bias = np.ones((a.shape[0], 1)) * 0.01\n", + " a = np.hstack([bias, a])\n", + " self.a_matrices.append(a)\n", + " else:\n", + " try:\n", + " # a^L, the nodes in our output layers\n", + " z = a @ self.weights[i]\n", + " a = self.output_func(z)\n", + " self.a_matrices.append(a)\n", + " self.z_matrices.append(z)\n", + " except Exception as OverflowError:\n", + " print(\n", + " \"OverflowError in fit() in FFNN\\nHOW TO DEBUG ERROR: Consider lowering your learning rate or scheduler specific parameters such as momentum, or check if your input values need scaling\"\n", + " )\n", + "\n", + " # this will be a^L\n", + " return a\n", + "\n", + " def _backpropagate(self, X, t, lam):\n", + " \"\"\"\n", + " Description:\n", + " ------------\n", + " Performs the backpropagation algorithm. 
In other words, this method\n", + " calculates the gradient of all the layers starting at the\n", + " output layer, and moving from right to left accumulates the gradient until\n", + " the input layer is reached. Each layers respective weights are updated while\n", + " the algorithm propagates backwards from the output layer (auto-differentation in reverse mode).\n", + "\n", + " Parameters:\n", + " ------------\n", + " I X (np.ndarray): The design matrix, with n rows of p features each.\n", + " II t (np.ndarray): The target vector, with n rows of p targets.\n", + " III lam (float32): regularization parameter used to punish the weights in case of overfitting\n", + "\n", + " Returns:\n", + " ------------\n", + " No return value.\n", + "\n", + " \"\"\"\n", + " out_derivative = derivate(self.output_func)\n", + " hidden_derivative = derivate(self.hidden_func)\n", + "\n", + " for i in range(len(self.weights) - 1, -1, -1):\n", + " # delta terms for output\n", + " if i == len(self.weights) - 1:\n", + " # for multi-class classification\n", + " if (\n", + " self.output_func.__name__ == \"softmax\"\n", + " ):\n", + " delta_matrix = self.a_matrices[i + 1] - t\n", + " # for single class classification\n", + " else:\n", + " cost_func_derivative = grad(self.cost_func(t))\n", + " delta_matrix = out_derivative(\n", + " self.z_matrices[i + 1]\n", + " ) * cost_func_derivative(self.a_matrices[i + 1])\n", + "\n", + " # delta terms for hidden layer\n", + " else:\n", + " delta_matrix = (\n", + " self.weights[i + 1][1:, :] @ delta_matrix.T\n", + " ).T * hidden_derivative(self.z_matrices[i + 1])\n", + "\n", + " # calculate gradient\n", + " gradient_weights = self.a_matrices[i][:, 1:].T @ delta_matrix\n", + " gradient_bias = np.sum(delta_matrix, axis=0).reshape(\n", + " 1, delta_matrix.shape[1]\n", + " )\n", + "\n", + " # regularization term\n", + " gradient_weights += self.weights[i][1:, :] * lam\n", + "\n", + " # use scheduler\n", + " update_matrix = np.vstack(\n", + " [\n", + " self.schedulers_bias[i].update_change(gradient_bias),\n", + " self.schedulers_weight[i].update_change(gradient_weights),\n", + " ]\n", + " )\n", + "\n", + " # update weights and bias\n", + " self.weights[i] -= update_matrix\n", + "\n", + " def _accuracy(self, prediction: np.ndarray, target: np.ndarray):\n", + " \"\"\"\n", + " Description:\n", + " ------------\n", + " Calculates accuracy of given prediction to target\n", + "\n", + " Parameters:\n", + " ------------\n", + " I prediction (np.ndarray): vector of predicitons output network\n", + " (1s and 0s in case of classification, and real numbers in case of regression)\n", + " II target (np.ndarray): vector of true values (What the network ideally should predict)\n", + "\n", + " Returns:\n", + " ------------\n", + " A floating point number representing the percentage of correctly classified instances.\n", + " \"\"\"\n", + " assert prediction.size == target.size\n", + " return np.average((target == prediction))\n", + " def _set_classification(self):\n", + " \"\"\"\n", + " Description:\n", + " ------------\n", + " Decides if FFNN acts as classifier (True) og regressor (False),\n", + " sets self.classification during init()\n", + " \"\"\"\n", + " self.classification = False\n", + " if (\n", + " self.cost_func.__name__ == \"CostLogReg\"\n", + " or self.cost_func.__name__ == \"CostCrossEntropy\"\n", + " ):\n", + " self.classification = True\n", + "\n", + " def _progress_bar(self, progression, **kwargs):\n", + " \"\"\"\n", + " Description:\n", + " ------------\n", + " Displays progress of 
training\n", + " \"\"\"\n", + " print_length = 40\n", + " num_equals = int(progression * print_length)\n", + " num_not = print_length - num_equals\n", + " arrow = \">\" if num_equals > 0 else \"\"\n", + " bar = \"[\" + \"=\" * (num_equals - 1) + arrow + \"-\" * num_not + \"]\"\n", + " perc_print = self._format(progression * 100, decimals=5)\n", + " line = f\" {bar} {perc_print}% \"\n", + "\n", + " for key in kwargs:\n", + " if not np.isnan(kwargs[key]):\n", + " value = self._format(kwargs[key], decimals=4)\n", + " line += f\"| {key}: {value} \"\n", + " sys.stdout.write(\"\\r\" + line)\n", + " sys.stdout.flush()\n", + " return len(line)\n", + "\n", + " def _format(self, value, decimals=4):\n", + " \"\"\"\n", + " Description:\n", + " ------------\n", + " Formats decimal numbers for progress bar\n", + " \"\"\"\n", + " if value > 0:\n", + " v = value\n", + " elif value < 0:\n", + " v = -10 * value\n", + " else:\n", + " v = 1\n", + " n = 1 + math.floor(math.log10(v))\n", + " if n >= decimals - 1:\n", + " return str(round(value))\n", + " return f\"{value:.{decimals-n-1}f}\"" + ] + }, + { + "cell_type": "markdown", + "id": "0f29bccd", + "metadata": { + "editable": true + }, + "source": [ + "Before we make a model, we will quickly generate a dataset we can use\n", + "for our linear regression problem as shown below" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "id": "dc37b403", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import autograd.numpy as np\n", + "from sklearn.model_selection import train_test_split\n", + "\n", + "def SkrankeFunction(x, y):\n", + " return np.ravel(0 + 1*x + 2*y + 3*x**2 + 4*x*y + 5*y**2)\n", + "\n", + "def create_X(x, y, n):\n", + " if len(x.shape) > 1:\n", + " x = np.ravel(x)\n", + " y = np.ravel(y)\n", + "\n", + " N = len(x)\n", + " l = int((n + 1) * (n + 2) / 2) # Number of elements in beta\n", + " X = np.ones((N, l))\n", + "\n", + " for i in range(1, n + 1):\n", + " q = int((i) * (i + 1) / 2)\n", + " for k in range(i + 1):\n", + " X[:, q + k] = (x ** (i - k)) * (y**k)\n", + "\n", + " return X\n", + "\n", + "step=0.5\n", + "x = np.arange(0, 1, step)\n", + "y = np.arange(0, 1, step)\n", + "x, y = np.meshgrid(x, y)\n", + "target = SkrankeFunction(x, y)\n", + "target = target.reshape(target.shape[0], 1)\n", + "\n", + "poly_degree=3\n", + "X = create_X(x, y, poly_degree)\n", + "\n", + "X_train, X_test, t_train, t_test = train_test_split(X, target)" + ] + }, + { + "cell_type": "markdown", + "id": "91790369", + "metadata": { + "editable": true + }, + "source": [ + "Now that we have our dataset ready for the regression, we can create\n", + "our regressor. Note that with the seed parameter, we can make sure our\n", + "results stay the same every time we run the neural network. For\n", + "inititialization, we simply specify the dimensions (we wish the amount\n", + "of input nodes to be equal to the datapoints, and the output to\n", + "predict one value)." + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "id": "62585c7a", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "input_nodes = X_train.shape[1]\n", + "output_nodes = 1\n", + "\n", + "linear_regression = FFNN((input_nodes, output_nodes), output_func=identity, cost_func=CostOLS, seed=2023)" + ] + }, + { + "cell_type": "markdown", + "id": "69cdc171", + "metadata": { + "editable": true + }, + "source": [ + "We then fit our model with our training data using the scheduler of our choice." 
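Once the fit call in the next cell has finished, it is natural to also check the error on the held-out data from the split above. A minimal sketch, reusing the CostOLS function and the predict() method (run after fitting):

```python
# evaluate the trained regressor on the held-out split (sketch; run after the fit call below)
pred_test = linear_regression.predict(X_test)
test_mse = CostOLS(t_test)(pred_test)
print(f"Test MSE: {test_mse:.4f}")
```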
+ ] + }, + { + "cell_type": "code", + "execution_count": 22, + "id": "d0713298", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "linear_regression.reset_weights() # reset weights such that previous runs or reruns don't affect the weights\n", + "\n", + "scheduler = Constant(eta=1e-3)\n", + "scores = linear_regression.fit(X_train, t_train, scheduler)" + ] + }, + { + "cell_type": "markdown", + "id": "310f805d", + "metadata": { + "editable": true + }, + "source": [ + "Due to the progress bar we can see the MSE (train_error) throughout\n", + "the FFNN's training. Note that the fit() function has some optional\n", + "parameters with defualt arguments. For example, the regularization\n", + "hyperparameter can be left ignored if not needed, and equally the FFNN\n", + "will by default run for 100 epochs. These can easily be changed, such\n", + "as for example:" + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "id": "216d1c44", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "linear_regression.reset_weights() # reset weights such that previous runs or reruns don't affect the weights\n", + "\n", + "scores = linear_regression.fit(X_train, t_train, scheduler, lam=1e-4, epochs=1000)" + ] + }, + { + "cell_type": "markdown", + "id": "ba2e5a39", + "metadata": { + "editable": true + }, + "source": [ + "We see that given more epochs to train on, the regressor reaches a lower MSE.\n", + "\n", + "Let us then switch to a binary classification. We use a binary\n", + "classification dataset, and follow a similar setup to the regression\n", + "case." + ] + }, + { + "cell_type": "code", + "execution_count": 24, + "id": "8c5b291e", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "from sklearn.datasets import load_breast_cancer\n", + "from sklearn.preprocessing import MinMaxScaler\n", + "\n", + "wisconsin = load_breast_cancer()\n", + "X = wisconsin.data\n", + "target = wisconsin.target\n", + "target = target.reshape(target.shape[0], 1)\n", + "\n", + "X_train, X_val, t_train, t_val = train_test_split(X, target)\n", + "\n", + "scaler = MinMaxScaler()\n", + "scaler.fit(X_train)\n", + "X_train = scaler.transform(X_train)\n", + "X_val = scaler.transform(X_val)" + ] + }, + { + "cell_type": "code", + "execution_count": 25, + "id": "4f6aa682", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "input_nodes = X_train.shape[1]\n", + "output_nodes = 1\n", + "\n", + "logistic_regression = FFNN((input_nodes, output_nodes), output_func=sigmoid, cost_func=CostLogReg, seed=2023)" + ] + }, + { + "cell_type": "markdown", + "id": "3ff7c54a", + "metadata": { + "editable": true + }, + "source": [ + "We will now make use of our validation data by passing it into our fit function as a keyword argument" + ] + }, + { + "cell_type": "code", + "execution_count": 26, + "id": "4bbcaedd", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "logistic_regression.reset_weights() # reset weights such that previous runs or reruns don't affect the weights\n", + "\n", + "scheduler = Adam(eta=1e-3, rho=0.9, rho2=0.999)\n", + "scores = logistic_regression.fit(X_train, t_train, scheduler, epochs=1000, X_val=X_val, t_val=t_val)" + ] + }, + { + "cell_type": "markdown", + "id": "aa4f54fe", + "metadata": { + "editable": true + }, + "source": [ + "Finally, we will create a neural network with 2 hidden layers with 
activation functions." + ] + }, + { + "cell_type": "code", + "execution_count": 27, + "id": "c11be1f5", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "input_nodes = X_train.shape[1]\n", + "hidden_nodes1 = 100\n", + "hidden_nodes2 = 30\n", + "output_nodes = 1\n", + "\n", + "dims = (input_nodes, hidden_nodes1, hidden_nodes2, output_nodes)\n", + "\n", + "neural_network = FFNN(dims, hidden_func=RELU, output_func=sigmoid, cost_func=CostLogReg, seed=2023)" + ] + }, + { + "cell_type": "code", + "execution_count": 28, + "id": "78482f24", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "neural_network.reset_weights() # reset weights such that previous runs or reruns don't affect the weights\n", + "\n", + "scheduler = Adam(eta=1e-4, rho=0.9, rho2=0.999)\n", + "scores = neural_network.fit(X_train, t_train, scheduler, epochs=1000, X_val=X_val, t_val=t_val)" + ] + }, + { + "cell_type": "markdown", + "id": "678b88e7", + "metadata": { + "editable": true + }, + "source": [ + "### Multiclass classification\n", + "\n", + "Finally, we will demonstrate the use case of multiclass classification\n", + "using our FFNN with the famous MNIST dataset, which contain images of\n", + "digits between the range of 0 to 9." + ] + }, + { + "cell_type": "code", + "execution_count": 29, + "id": "833a7321", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "from sklearn.datasets import load_digits\n", + "\n", + "def onehot(target: np.ndarray):\n", + " onehot = np.zeros((target.size, target.max() + 1))\n", + " onehot[np.arange(target.size), target] = 1\n", + " return onehot\n", + "\n", + "digits = load_digits()\n", + "\n", + "X = digits.data\n", + "target = digits.target\n", + "target = onehot(target)\n", + "\n", + "input_nodes = 64\n", + "hidden_nodes1 = 100\n", + "hidden_nodes2 = 30\n", + "output_nodes = 10\n", + "\n", + "dims = (input_nodes, hidden_nodes1, hidden_nodes2, output_nodes)\n", + "\n", + "multiclass = FFNN(dims, hidden_func=LRELU, output_func=softmax, cost_func=CostCrossEntropy)\n", + "\n", + "multiclass.reset_weights() # reset weights such that previous runs or reruns don't affect the weights\n", + "\n", + "scheduler = Adam(eta=1e-4, rho=0.9, rho2=0.999)\n", + "scores = multiclass.fit(X, target, scheduler, epochs=1000)" + ] + }, + { + "cell_type": "markdown", + "id": "1af2ad7b", + "metadata": { + "editable": true + }, + "source": [ + "## Testing the XOR gate and other gates\n", + "\n", + "Let us now use our code to test the XOR gate." + ] + }, + { + "cell_type": "code", + "execution_count": 30, + "id": "752c6403", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "X = np.array([ [0, 0], [0, 1], [1, 0],[1, 1]],dtype=np.float64)\n", + "\n", + "# The XOR gate\n", + "yXOR = np.array( [[ 0], [1] ,[1], [0]])\n", + "\n", + "input_nodes = X.shape[1]\n", + "output_nodes = 1\n", + "\n", + "logistic_regression = FFNN((input_nodes, output_nodes), output_func=sigmoid, cost_func=CostLogReg, seed=2023)\n", + "logistic_regression.reset_weights() # reset weights such that previous runs or reruns don't affect the weights\n", + "scheduler = Adam(eta=1e-1, rho=0.9, rho2=0.999)\n", + "scores = logistic_regression.fit(X, yXOR, scheduler, epochs=1000)" + ] + }, + { + "cell_type": "markdown", + "id": "0a7c91e3", + "metadata": { + "editable": true + }, + "source": [ + "Not bad, but the results depend strongly on the learning reate. 
Try different learning rates." + ] + }, + { + "cell_type": "markdown", + "id": "40ffa1fb", + "metadata": { + "editable": true + }, + "source": [ + "## Solving differential equations with Deep Learning\n", + "\n", + "The Universal Approximation Theorem states that a neural network can\n", + "approximate any function at a single hidden layer along with one input\n", + "and output layer to any given precision.\n", + "\n", + "**Book on solving differential equations with ML methods.**\n", + "\n", + "[An Introduction to Neural Network Methods for Differential Equations](https://www.springer.com/gp/book/9789401798150), by Yadav and Kumar.\n", + "\n", + "**Physics informed neural networks.**\n", + "\n", + "[Scientific Machine Learning Through Physics–Informed Neural Networks: Where we are and What’s Next](https://link.springer.com/article/10.1007/s10915-022-01939-z), by Cuomo et al\n", + "\n", + "**Thanks to Kristine Baluka Hein.**\n", + "\n", + "The lectures on differential equations were developed by Kristine Baluka Hein, now PhD student at IFI.\n", + "A great thanks to Kristine." + ] + }, + { + "cell_type": "markdown", + "id": "191ba3eb", + "metadata": { + "editable": true + }, + "source": [ + "## Ordinary Differential Equations first\n", + "\n", + "An ordinary differential equation (ODE) is an equation involving functions having one variable.\n", + "\n", + "In general, an ordinary differential equation looks like" + ] + }, + { + "cell_type": "markdown", + "id": "a0be312a", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation} \\label{ode} \\tag{1}\n", + "f\\left(x, \\, g(x), \\, g'(x), \\, g''(x), \\, \\dots \\, , \\, g^{(n)}(x)\\right) = 0\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "000663cf", + "metadata": { + "editable": true + }, + "source": [ + "where $g(x)$ is the function to find, and $g^{(n)}(x)$ is the $n$-th derivative of $g(x)$.\n", + "\n", + "The $f\\left(x, g(x), g'(x), g''(x), \\, \\dots \\, , g^{(n)}(x)\\right)$ is just a way to write that there is an expression involving $x$ and $g(x), \\ g'(x), \\ g''(x), \\, \\dots \\, , \\text{ and } g^{(n)}(x)$ on the left side of the equality sign in ([1](#ode)).\n", + "The highest order of derivative, that is the value of $n$, determines to the order of the equation.\n", + "The equation is referred to as a $n$-th order ODE.\n", + "Along with ([1](#ode)), some additional conditions of the function $g(x)$ are typically given\n", + "for the solution to be unique." + ] + }, + { + "cell_type": "markdown", + "id": "f5b87995", + "metadata": { + "editable": true + }, + "source": [ + "## The trial solution\n", + "\n", + "Let the trial solution $g_t(x)$ be" + ] + }, + { + "cell_type": "markdown", + "id": "a166c0b6", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation}\n", + "\tg_t(x) = h_1(x) + h_2(x,N(x,P))\n", + "\\label{_auto1} \\tag{2}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "f1e49a2c", + "metadata": { + "editable": true + }, + "source": [ + "where $h_1(x)$ is a function that makes $g_t(x)$ satisfy a given set\n", + "of conditions, $N(x,P)$ a neural network with weights and biases\n", + "described by $P$ and $h_2(x, N(x,P))$ some expression involving the\n", + "neural network. The role of the function $h_2(x, N(x,P))$, is to\n", + "ensure that the output from $N(x,P)$ is zero when $g_t(x)$ is\n", + "evaluated at the values of $x$ where the given conditions must be\n", + "satisfied. The function $h_1(x)$ should alone make $g_t(x)$ satisfy\n", + "the conditions.\n", + "\n", + "But what about the network $N(x,P)$?\n", + "\n", + "As described previously, an optimization method could be used to minimize the parameters of a neural network, that being its weights and biases, through backward propagation." + ] + }, + { + "cell_type": "markdown", + "id": "207d1a97", + "metadata": { + "editable": true + }, + "source": [ + "## Minimization process\n", + "\n", + "For the minimization to be defined, we need to have a cost function at hand to minimize.\n", + "\n", + "It is given that $f\\left(x, \\, g(x), \\, g'(x), \\, g''(x), \\, \\dots \\, , \\, g^{(n)}(x)\\right)$ should be equal to zero in ([1](#ode)).\n", + "We can choose to consider the mean squared error as the cost function for an input $x$.\n", + "Since we are looking at one input, the cost function is just $f$ squared.\n", + "The cost function $c\\left(x, P \\right)$ can therefore be expressed as" + ] + }, + { + "cell_type": "markdown", + "id": "94a061a1", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C\\left(x, P\\right) = \\big(f\\left(x, \\, g(x), \\, g'(x), \\, g''(x), \\, \\dots \\, , \\, g^{(n)}(x)\\right)\\big)^2\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "93244d03", + "metadata": { + "editable": true + }, + "source": [ + "If $N$ inputs are given as a vector $\\boldsymbol{x}$ with elements $x_i$ for $i = 1,\\dots,N$,\n", + "the cost function becomes" + ] + }, + { + "cell_type": "markdown", + "id": "6dc16fd4", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation} \\label{cost} \\tag{3}\n", + "\tC\\left(\\boldsymbol{x}, P\\right) = \\frac{1}{N} \\sum_{i=1}^N \\big(f\\left(x_i, \\, g(x_i), \\, g'(x_i), \\, g''(x_i), \\, \\dots \\, , \\, g^{(n)}(x_i)\\right)\\big)^2\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "01f4c14a", + "metadata": { + "editable": true + }, + "source": [ + "The neural net should then find the parameters $P$ that minimizes the cost function in\n", + "([3](#cost)) for a set of $N$ training samples $x_i$." + ] + }, + { + "cell_type": "markdown", + "id": "1784066c", + "metadata": { + "editable": true + }, + "source": [ + "## Minimizing the cost function using gradient descent and automatic differentiation\n", + "\n", + "To perform the minimization using gradient descent, the gradient of $C\\left(\\boldsymbol{x}, P\\right)$ is needed.\n", + "It might happen so that finding an analytical expression of the gradient of $C(\\boldsymbol{x}, P)$ from ([3](#cost)) gets too messy, depending on which cost function one desires to use.\n", + "\n", + "Luckily, there exists libraries that makes the job for us through automatic differentiation.\n", + "Automatic differentiation is a method of finding the derivatives numerically with very high precision." + ] + }, + { + "cell_type": "markdown", + "id": "43e1b7bf", + "metadata": { + "editable": true + }, + "source": [ + "## Example: Exponential decay\n", + "\n", + "An exponential decay of a quantity $g(x)$ is described by the equation" + ] + }, + { + "cell_type": "markdown", + "id": "5c28e60a", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation} \\label{solve_expdec} \\tag{4}\n", + " g'(x) = -\\gamma g(x)\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "cfd2e420", + "metadata": { + "editable": true + }, + "source": [ + "with $g(0) = g_0$ for some chosen initial value $g_0$.\n", + "\n", + "The analytical solution of ([4](#solve_expdec)) is" + ] + }, + { + "cell_type": "markdown", + "id": "b93aa0f8", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation}\n", + " g(x) = g_0 \\exp\\left(-\\gamma x\\right)\n", + "\\label{_auto2} \\tag{5}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "093952f0", + "metadata": { + "editable": true + }, + "source": [ + "Having an analytical solution at hand, it is possible to use it to compare how well a neural network finds a solution of ([4](#solve_expdec))." + ] + }, + { + "cell_type": "markdown", + "id": "8f82fa61", + "metadata": { + "editable": true + }, + "source": [ + "## The function to solve for\n", + "\n", + "The program will use a neural network to solve" + ] + }, + { + "cell_type": "markdown", + "id": "027d9c52", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation} \\label{solveode} \\tag{6}\n", + "g'(x) = -\\gamma g(x)\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "c18c4ee8", + "metadata": { + "editable": true + }, + "source": [ + "where $g(0) = g_0$ with $\\gamma$ and $g_0$ being some chosen values.\n", + "\n", + "In this example, $\\gamma = 2$ and $g_0 = 10$." + ] + }, + { + "cell_type": "markdown", + "id": "a0d7fc0a", + "metadata": { + "editable": true + }, + "source": [ + "## The trial solution\n", + "To begin with, a trial solution $g_t(t)$ must be chosen. A general trial solution for ordinary differential equations could be" + ] + }, + { + "cell_type": "markdown", + "id": "73cd72f4", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "g_t(x, P) = h_1(x) + h_2(x, N(x, P))\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "a4d0850f", + "metadata": { + "editable": true + }, + "source": [ + "with $h_1(x)$ ensuring that $g_t(x)$ satisfies some conditions and $h_2(x,N(x, P))$ an expression involving $x$ and the output from the neural network $N(x,P)$ with $P $ being the collection of the weights and biases for each layer. For now, it is assumed that the network consists of one input layer, one hidden layer, and one output layer." + ] + }, + { + "cell_type": "markdown", + "id": "62f3b94f", + "metadata": { + "editable": true + }, + "source": [ + "## Setup of Network\n", + "\n", + "In this network, there are no weights and bias at the input layer, so $P = \\{ P_{\\text{hidden}}, P_{\\text{output}} \\}$.\n", + "If there are $N_{\\text{hidden} }$ neurons in the hidden layer, then $P_{\\text{hidden}}$ is a $N_{\\text{hidden} } \\times (1 + N_{\\text{input}})$ matrix, given that there are $N_{\\text{input}}$ neurons in the input layer.\n", + "\n", + "The first column in $P_{\\text{hidden} }$ represents the bias for each neuron in the hidden layer and the second column represents the weights for each neuron in the hidden layer from the input layer.\n", + "If there are $N_{\\text{output} }$ neurons in the output layer, then $P_{\\text{output}} $ is a $N_{\\text{output} } \\times (1 + N_{\\text{hidden} })$ matrix.\n", + "\n", + "Its first column represents the bias of each neuron and the remaining columns represents the weights to each neuron.\n", + "\n", + "It is given that $g(0) = g_0$. The trial solution must fulfill this condition to be a proper solution of ([6](#solveode)). A possible way to ensure that $g_t(0, P) = g_0$, is to let $F(N(x,P)) = x \\cdot N(x,P)$ and $A(x) = g_0$. This gives the following trial solution:" + ] + }, + { + "cell_type": "markdown", + "id": "f5144858", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation} \\label{trial} \\tag{7}\n", + "g_t(x, P) = g_0 + x \\cdot N(x, P)\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "6b441362", + "metadata": { + "editable": true + }, + "source": [ + "## Reformulating the problem\n", + "\n", + "We wish that our neural network manages to minimize a given cost function.\n", + "\n", + "A reformulation of out equation, ([6](#solveode)), must therefore be done,\n", + "such that it describes the problem a neural network can solve for.\n", + "\n", + "The neural network must find the set of weights and biases $P$ such that the trial solution in ([7](#trial)) satisfies ([6](#solveode)).\n", + "\n", + "The trial solution" + ] + }, + { + "cell_type": "markdown", + "id": "abfe2d6d", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "g_t(x, P) = g_0 + x \\cdot N(x, P)\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "aabb6c7b", + "metadata": { + "editable": true + }, + "source": [ + "has been chosen such that it already solves the condition $g(0) = g_0$. What remains, is to find $P$ such that" + ] + }, + { + "cell_type": "markdown", + "id": "11fc8b1b", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation} \\label{nnmin} \\tag{8}\n", + "g_t'(x, P) = - \\gamma g_t(x, P)\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "604c92b4", + "metadata": { + "editable": true + }, + "source": [ + "is fulfilled as *best as possible*." + ] + }, + { + "cell_type": "markdown", + "id": "e2cd7572", + "metadata": { + "editable": true + }, + "source": [ + "## More technicalities\n", + "\n", + "The left hand side and right hand side of ([8](#nnmin)) must be computed separately, and then the neural network must choose weights and biases, contained in $P$, such that the sides are equal as best as possible.\n", + "This means that the absolute or squared difference between the sides must be as close to zero, ideally equal to zero.\n", + "In this case, the difference squared shows to be an appropriate measurement of how erroneous the trial solution is with respect to $P$ of the neural network.\n", + "\n", + "This gives the following cost function our neural network must solve for:" + ] + }, + { + "cell_type": "markdown", + "id": "d916a5f6", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\min_{P}\\Big\\{ \\big(g_t'(x, P) - ( -\\gamma g_t(x, P) \\big)^2 \\Big\\}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "d746e69c", + "metadata": { + "editable": true + }, + "source": [ + "(the notation $\\min_{P}\\{ f(x, P) \\}$ means that we desire to find $P$ that yields the minimum of $f(x, P)$)\n", + "\n", + "or, in terms of weights and biases for the hidden and output layer in our network:" + ] + }, + { + "cell_type": "markdown", + "id": "4c34c242", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\min_{P_{\\text{hidden} }, \\ P_{\\text{output} }}\\Big\\{ \\big(g_t'(x, \\{ P_{\\text{hidden} }, P_{\\text{output} }\\}) - ( -\\gamma g_t(x, \\{ P_{\\text{hidden} }, P_{\\text{output} }\\}) \\big)^2 \\Big\\}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "f55f3047", + "metadata": { + "editable": true + }, + "source": [ + "for an input value $x$." + ] + }, + { + "cell_type": "markdown", + "id": "485e4671", + "metadata": { + "editable": true + }, + "source": [ + "## More details\n", + "\n", + "If the neural network evaluates $g_t(x, P)$ at more values for $x$, say $N$ values $x_i$ for $i = 1, \\dots, N$, then the *total* error to minimize becomes" + ] + }, + { + "cell_type": "markdown", + "id": "5628ca35", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation} \\label{min} \\tag{9}\n", + "\\min_{P}\\Big\\{\\frac{1}{N} \\sum_{i=1}^N \\big(g_t'(x_i, P) - ( -\\gamma g_t(x_i, P) \\big)^2 \\Big\\}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "da2c90ea", + "metadata": { + "editable": true + }, + "source": [ + "Letting $\\boldsymbol{x}$ be a vector with elements $x_i$ and $C(\\boldsymbol{x}, P) = \\frac{1}{N} \\sum_i \\big(g_t'(x_i, P) - ( -\\gamma g_t(x_i, P) \\big)^2$ denote the cost function, the minimization problem that our network must solve, becomes" + ] + }, + { + "cell_type": "markdown", + "id": "d386a466", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\min_{P} C(\\boldsymbol{x}, P)\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "ec3d975a", + "metadata": { + "editable": true + }, + "source": [ + "In terms of $P_{\\text{hidden} }$ and $P_{\\text{output} }$, this could also be expressed as\n", + "\n", + "$$\n", + "\\min_{P_{\\text{hidden} }, \\ P_{\\text{output} }} C(\\boldsymbol{x}, \\{P_{\\text{hidden} }, P_{\\text{output} }\\})\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "4f0f47e7", + "metadata": { + "editable": true + }, + "source": [ + "## A possible implementation of a neural network\n", + "\n", + "For simplicity, it is assumed that the input is an array $\\boldsymbol{x} = (x_1, \\dots, x_N)$ with $N$ elements. It is at these points the neural network should find $P$ such that it fulfills ([9](#min)).\n", + "\n", + "First, the neural network must feed forward the inputs.\n", + "This means that $\\boldsymbol{x}s$ must be passed through an input layer, a hidden layer and a output layer. The input layer in this case, does not need to process the data any further.\n", + "The input layer will consist of $N_{\\text{input} }$ neurons, passing its element to each neuron in the hidden layer. The number of neurons in the hidden layer will be $N_{\\text{hidden} }$." 
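+    ,
+    "\n",
+    "\n",
+    "In code, this amounts to storing one parameter matrix per layer, with the biases in the first column and the weights in the remaining columns, exactly as the program further down does. A small illustrative sketch (the layer sizes here are arbitrary choices):\n",
+    "\n",
+    "```python\n",
+    "import autograd.numpy.random as npr\n",
+    "\n",
+    "N_input, N_hidden, N_output = 1, 10, 1\n",
+    "\n",
+    "# One parameter matrix per layer: column 0 holds the biases,\n",
+    "# the remaining columns hold the weights.\n",
+    "P_hidden = npr.randn(N_hidden, 1 + N_input)\n",
+    "P_output = npr.randn(N_output, 1 + N_hidden)\n",
+    "P = [P_hidden, P_output]\n",
+    "```"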
+ ] + }, + { + "cell_type": "markdown", + "id": "a757d9cf", + "metadata": { + "editable": true + }, + "source": [ + "## Technicalities\n", + "\n", + "For the $i$-th in the hidden layer with weight $w_i^{\\text{hidden} }$ and bias $b_i^{\\text{hidden} }$, the weighting from the $j$-th neuron at the input layer is:" + ] + }, + { + "cell_type": "markdown", + "id": "ee093dd9", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{aligned}\n", + "z_{i,j}^{\\text{hidden}} &= b_i^{\\text{hidden}} + w_i^{\\text{hidden}}x_j \\\\\n", + "&=\n", + "\\begin{pmatrix}\n", + "b_i^{\\text{hidden}} & w_i^{\\text{hidden}}\n", + "\\end{pmatrix}\n", + "\\begin{pmatrix}\n", + "1 \\\\\n", + "x_j\n", + "\\end{pmatrix}\n", + "\\end{aligned}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "4d3954bf", + "metadata": { + "editable": true + }, + "source": [ + "## Final technicalities I\n", + "\n", + "The result after weighting the inputs at the $i$-th hidden neuron can be written as a vector:" + ] + }, + { + "cell_type": "markdown", + "id": "b4b36b8c", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{aligned}\n", + "\\boldsymbol{z}_{i}^{\\text{hidden}} &= \\Big( b_i^{\\text{hidden}} + w_i^{\\text{hidden}}x_1 , \\ b_i^{\\text{hidden}} + w_i^{\\text{hidden}} x_2, \\ \\dots \\, , \\ b_i^{\\text{hidden}} + w_i^{\\text{hidden}} x_N\\Big) \\\\\n", + "&=\n", + "\\begin{pmatrix}\n", + " b_i^{\\text{hidden}} & w_i^{\\text{hidden}}\n", + "\\end{pmatrix}\n", + "\\begin{pmatrix}\n", + "1 & 1 & \\dots & 1 \\\\\n", + "x_1 & x_2 & \\dots & x_N\n", + "\\end{pmatrix} \\\\\n", + "&= \\boldsymbol{p}_{i, \\text{hidden}}^T X\n", + "\\end{aligned}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "36e8a1dd", + "metadata": { + "editable": true + }, + "source": [ + "## Final technicalities II\n", + "\n", + "The vector $\\boldsymbol{p}_{i, \\text{hidden}}^T$ constitutes each row in $P_{\\text{hidden} }$, which contains the weights for the neural network to minimize according to ([9](#min)).\n", + "\n", + "After having found $\\boldsymbol{z}_{i}^{\\text{hidden}} $ for every $i$-th neuron within the hidden layer, the vector will be sent to an activation function $a_i(\\boldsymbol{z})$.\n", + "\n", + "In this example, the sigmoid function has been chosen to be the activation function for each hidden neuron:" + ] + }, + { + "cell_type": "markdown", + "id": "af2e68be", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "f(z) = \\frac{1}{1 + \\exp{(-z)}}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "7b8922c6", + "metadata": { + "editable": true + }, + "source": [ + "It is possible to use other activations functions for the hidden layer also.\n", + "\n", + "The output $\\boldsymbol{x}_i^{\\text{hidden}}$ from each $i$-th hidden neuron is:\n", + "\n", + "$$\n", + "\\boldsymbol{x}_i^{\\text{hidden} } = f\\big( \\boldsymbol{z}_{i}^{\\text{hidden}} \\big)\n", + "$$\n", + "\n", + "The outputs $\\boldsymbol{x}_i^{\\text{hidden} } $ are then sent to the output layer.\n", + "\n", + "The output layer consists of one neuron in this case, and combines the\n", + "output from each of the neurons in the hidden layers. The output layer\n", + "combines the results from the hidden layer using some weights $w_i^{\\text{output}}$\n", + "and biases $b_i^{\\text{output}}$. In this case,\n", + "it is assumes that the number of neurons in the output layer is one." 
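+    ,
+    "\n",
+    "\n",
+    "A minimal sketch of this hidden-layer step, with illustrative sizes and the same bias-row trick as in the full program below (prepend a row of ones so that the first column of the parameter matrix acts as the bias):\n",
+    "\n",
+    "```python\n",
+    "import autograd.numpy as np\n",
+    "import autograd.numpy.random as npr\n",
+    "\n",
+    "def sigmoid(z):\n",
+    "    return 1 / (1 + np.exp(-z))\n",
+    "\n",
+    "N, N_hidden = 10, 10                     # illustrative sizes\n",
+    "x = np.linspace(0, 1, N).reshape(1, N)   # the N input values as a row vector\n",
+    "\n",
+    "X = np.concatenate((np.ones((1, N)), x), axis=0)   # bias row on top of x\n",
+    "P_hidden = npr.randn(N_hidden, 2)                  # column 0: biases, column 1: weights\n",
+    "\n",
+    "z_hidden = np.matmul(P_hidden, X)   # row i holds z_i^hidden at every x_j\n",
+    "x_hidden = sigmoid(z_hidden)        # hidden-layer output, shape (N_hidden, N)\n",
+    "```"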
+ ] + }, + { + "cell_type": "markdown", + "id": "2aa977d9", + "metadata": { + "editable": true + }, + "source": [ + "## Final technicalities III\n", + "\n", + "The procedure of weighting the output neuron $j$ in the hidden layer to the $i$-th neuron in the output layer is similar as for the hidden layer described previously." + ] + }, + { + "cell_type": "markdown", + "id": "48eccfa6", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{aligned}\n", + "z_{1,j}^{\\text{output}} & =\n", + "\\begin{pmatrix}\n", + "b_1^{\\text{output}} & \\boldsymbol{w}_1^{\\text{output}}\n", + "\\end{pmatrix}\n", + "\\begin{pmatrix}\n", + "1 \\\\\n", + "\\boldsymbol{x}_j^{\\text{hidden}}\n", + "\\end{pmatrix}\n", + "\\end{aligned}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "d4c2cdbf", + "metadata": { + "editable": true + }, + "source": [ + "## Final technicalities IV\n", + "\n", + "Expressing $z_{1,j}^{\\text{output}}$ as a vector gives the following way of weighting the inputs from the hidden layer:" + ] + }, + { + "cell_type": "markdown", + "id": "be26d9c9", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{z}_{1}^{\\text{output}} =\n", + "\\begin{pmatrix}\n", + "b_1^{\\text{output}} & \\boldsymbol{w}_1^{\\text{output}}\n", + "\\end{pmatrix}\n", + "\\begin{pmatrix}\n", + "1 & 1 & \\dots & 1 \\\\\n", + "\\boldsymbol{x}_1^{\\text{hidden}} & \\boldsymbol{x}_2^{\\text{hidden}} & \\dots & \\boldsymbol{x}_N^{\\text{hidden}}\n", + "\\end{pmatrix}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "f3703c9a", + "metadata": { + "editable": true + }, + "source": [ + "In this case we seek a continuous range of values since we are approximating a function. This means that after computing $\\boldsymbol{z}_{1}^{\\text{output}}$ the neural network has finished its feed forward step, and $\\boldsymbol{z}_{1}^{\\text{output}}$ is the final output of the network." + ] + }, + { + "cell_type": "markdown", + "id": "9859680c", + "metadata": { + "editable": true + }, + "source": [ + "## Back propagation\n", + "\n", + "The next step is to decide how the parameters should be changed such that they minimize the cost function.\n", + "\n", + "The chosen cost function for this problem is" + ] + }, + { + "cell_type": "markdown", + "id": "c3df269d", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C(\\boldsymbol{x}, P) = \\frac{1}{N} \\sum_i \\big(g_t'(x_i, P) - ( -\\gamma g_t(x_i, P) \\big)^2\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "dc69023a", + "metadata": { + "editable": true + }, + "source": [ + "In order to minimize the cost function, an optimization method must be chosen.\n", + "\n", + "Here, gradient descent with a constant step size has been chosen." 
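+    ,
+    "\n",
+    "\n",
+    "Stripped of the surrounding bookkeeping, the update loop in `solve_ode_neural_network` further down boils down to the sketch below (a hypothetical helper `gradient_descent`): Autograd supplies the gradient of the cost function with respect to the parameters `P`, one array per layer, and each array is moved a constant step `lmb` against its gradient.\n",
+    "\n",
+    "```python\n",
+    "from autograd import grad\n",
+    "\n",
+    "def gradient_descent(cost_function, P, x, num_iter, lmb):\n",
+    "    # Differentiate the cost function w.r.t. its first argument, the parameters P\n",
+    "    cost_function_grad = grad(cost_function, 0)\n",
+    "    for _ in range(num_iter):\n",
+    "        cost_grad = cost_function_grad(P, x)              # one gradient array per layer\n",
+    "        P = [p - lmb * g for p, g in zip(P, cost_grad)]   # constant step size\n",
+    "    return P\n",
+    "```"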
+ ] + }, + { + "cell_type": "markdown", + "id": "d4bed3bd", + "metadata": { + "editable": true + }, + "source": [ + "## Gradient descent\n", + "\n", + "The idea of the gradient descent algorithm is to update parameters in\n", + "a direction where the cost function decreases goes to a minimum.\n", + "\n", + "In general, the update of some parameters $\\boldsymbol{\\omega}$ given a cost\n", + "function defined by some weights $\\boldsymbol{\\omega}$, $C(\\boldsymbol{x},\n", + "\\boldsymbol{\\omega})$, goes as follows:" + ] + }, + { + "cell_type": "markdown", + "id": "ed2a4f9a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{\\omega}_{\\text{new} } = \\boldsymbol{\\omega} - \\lambda \\nabla_{\\boldsymbol{\\omega}} C(\\boldsymbol{x}, \\boldsymbol{\\omega})\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b9a4f604", + "metadata": { + "editable": true + }, + "source": [ + "for a number of iterations or until $ \\big|\\big| \\boldsymbol{\\omega}_{\\text{new} } - \\boldsymbol{\\omega} \\big|\\big|$ becomes smaller than some given tolerance.\n", + "\n", + "The value of $\\lambda$ decides how large steps the algorithm must take\n", + "in the direction of $ \\nabla_{\\boldsymbol{\\omega}} C(\\boldsymbol{x}, \\boldsymbol{\\omega})$.\n", + "The notation $\\nabla_{\\boldsymbol{\\omega}}$ express the gradient with respect\n", + "to the elements in $\\boldsymbol{\\omega}$.\n", + "\n", + "In our case, we have to minimize the cost function $C(\\boldsymbol{x}, P)$ with\n", + "respect to the two sets of weights and biases, that is for the hidden\n", + "layer $P_{\\text{hidden} }$ and for the output layer $P_{\\text{output}\n", + "}$ .\n", + "\n", + "This means that $P_{\\text{hidden} }$ and $P_{\\text{output} }$ is updated by" + ] + }, + { + "cell_type": "markdown", + "id": "e48d507f", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{aligned}\n", + "P_{\\text{hidden},\\text{new}} &= P_{\\text{hidden}} - \\lambda \\nabla_{P_{\\text{hidden}}} C(\\boldsymbol{x}, P) \\\\\n", + "P_{\\text{output},\\text{new}} &= P_{\\text{output}} - \\lambda \\nabla_{P_{\\text{output}}} C(\\boldsymbol{x}, P)\n", + "\\end{aligned}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b84c5cf5", + "metadata": { + "editable": true + }, + "source": [ + "## The code for solving the ODE" + ] + }, + { + "cell_type": "code", + "execution_count": 31, + "id": "293d0f7d", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import autograd.numpy as np\n", + "from autograd import grad, elementwise_grad\n", + "import autograd.numpy.random as npr\n", + "from matplotlib import pyplot as plt\n", + "\n", + "def sigmoid(z):\n", + " return 1/(1 + np.exp(-z))\n", + "\n", + "# Assuming one input, hidden, and output layer\n", + "def neural_network(params, x):\n", + "\n", + " # Find the weights (including and biases) for the hidden and output layer.\n", + " # Assume that params is a list of parameters for each layer.\n", + " # The biases are the first element for each array in params,\n", + " # and the weights are the remaning elements in each array in params.\n", + "\n", + " w_hidden = params[0]\n", + " w_output = params[1]\n", + "\n", + " # Assumes input x being an one-dimensional array\n", + " num_values = np.size(x)\n", + " x = x.reshape(-1, num_values)\n", + "\n", + " # Assume that the input layer does nothing to the input x\n", + " x_input = x\n", + "\n", + " ## Hidden layer:\n", + "\n", + " # Add a row of ones to include bias\n", 
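+    "    # The concatenation below gives x_input shape (2, num_values): a row of ones\n",
+    "    # stacked on top of the row of x-values. The first column of w_hidden then\n",
+    "    # multiplies the ones and acts as the bias, while the second column acts as\n",
+    "    # the weight, so a single matmul gives b_i + w_i*x_j for every hidden neuron i.\n",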
+ " x_input = np.concatenate((np.ones((1,num_values)), x_input ), axis = 0)\n", + "\n", + " z_hidden = np.matmul(w_hidden, x_input)\n", + " x_hidden = sigmoid(z_hidden)\n", + "\n", + " ## Output layer:\n", + "\n", + " # Include bias:\n", + " x_hidden = np.concatenate((np.ones((1,num_values)), x_hidden ), axis = 0)\n", + "\n", + " z_output = np.matmul(w_output, x_hidden)\n", + " x_output = z_output\n", + "\n", + " return x_output\n", + "\n", + "# The trial solution using the deep neural network:\n", + "def g_trial(x,params, g0 = 10):\n", + " return g0 + x*neural_network(params,x)\n", + "\n", + "# The right side of the ODE:\n", + "def g(x, g_trial, gamma = 2):\n", + " return -gamma*g_trial\n", + "\n", + "# The cost function:\n", + "def cost_function(P, x):\n", + "\n", + " # Evaluate the trial function with the current parameters P\n", + " g_t = g_trial(x,P)\n", + "\n", + " # Find the derivative w.r.t x of the neural network\n", + " d_net_out = elementwise_grad(neural_network,1)(P,x)\n", + "\n", + " # Find the derivative w.r.t x of the trial function\n", + " d_g_t = elementwise_grad(g_trial,0)(x,P)\n", + "\n", + " # The right side of the ODE\n", + " func = g(x, g_t)\n", + "\n", + " err_sqr = (d_g_t - func)**2\n", + " cost_sum = np.sum(err_sqr)\n", + "\n", + " return cost_sum / np.size(err_sqr)\n", + "\n", + "# Solve the exponential decay ODE using neural network with one input, hidden, and output layer\n", + "def solve_ode_neural_network(x, num_neurons_hidden, num_iter, lmb):\n", + " ## Set up initial weights and biases\n", + "\n", + " # For the hidden layer\n", + " p0 = npr.randn(num_neurons_hidden, 2 )\n", + "\n", + " # For the output layer\n", + " p1 = npr.randn(1, num_neurons_hidden + 1 ) # +1 since bias is included\n", + "\n", + " P = [p0, p1]\n", + "\n", + " print('Initial cost: %g'%cost_function(P, x))\n", + "\n", + " ## Start finding the optimal weights using gradient descent\n", + "\n", + " # Find the Python function that represents the gradient of the cost function\n", + " # w.r.t the 0-th input argument -- that is the weights and biases in the hidden and output layer\n", + " cost_function_grad = grad(cost_function,0)\n", + "\n", + " # Let the update be done num_iter times\n", + " for i in range(num_iter):\n", + " # Evaluate the gradient at the current weights and biases in P.\n", + " # The cost_grad consist now of two arrays;\n", + " # one for the gradient w.r.t P_hidden and\n", + " # one for the gradient w.r.t P_output\n", + " cost_grad = cost_function_grad(P, x)\n", + "\n", + " P[0] = P[0] - lmb * cost_grad[0]\n", + " P[1] = P[1] - lmb * cost_grad[1]\n", + "\n", + " print('Final cost: %g'%cost_function(P, x))\n", + "\n", + " return P\n", + "\n", + "def g_analytic(x, gamma = 2, g0 = 10):\n", + " return g0*np.exp(-gamma*x)\n", + "\n", + "# Solve the given problem\n", + "if __name__ == '__main__':\n", + " # Set seed such that the weight are initialized\n", + " # with same weights and biases for every run.\n", + " npr.seed(15)\n", + "\n", + " ## Decide the vales of arguments to the function to solve\n", + " N = 10\n", + " x = np.linspace(0, 1, N)\n", + "\n", + " ## Set up the initial parameters\n", + " num_hidden_neurons = 10\n", + " num_iter = 10000\n", + " lmb = 0.001\n", + "\n", + " # Use the network\n", + " P = solve_ode_neural_network(x, num_hidden_neurons, num_iter, lmb)\n", + "\n", + " # Print the deviation from the trial solution and true solution\n", + " res = g_trial(x,P)\n", + " res_analytical = g_analytic(x)\n", + "\n", + " print('Max absolute difference: 
%g'%np.max(np.abs(res - res_analytical)))\n", + "\n", + " # Plot the results\n", + " plt.figure(figsize=(10,10))\n", + "\n", + " plt.title('Performance of neural network solving an ODE compared to the analytical solution')\n", + " plt.plot(x, res_analytical)\n", + " plt.plot(x, res[0,:])\n", + " plt.legend(['analytical','nn'])\n", + " plt.xlabel('x')\n", + " plt.ylabel('g(x)')\n", + " plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "54c070e1", + "metadata": { + "editable": true + }, + "source": [ + "## The network with one input layer, specified number of hidden layers, and one output layer\n", + "\n", + "It is also possible to extend the construction of our network into a more general one, allowing the network to contain more than one hidden layers.\n", + "\n", + "The number of neurons within each hidden layer are given as a list of integers in the program below." + ] + }, + { + "cell_type": "code", + "execution_count": 32, + "id": "4ab2467e", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import autograd.numpy as np\n", + "from autograd import grad, elementwise_grad\n", + "import autograd.numpy.random as npr\n", + "from matplotlib import pyplot as plt\n", + "\n", + "def sigmoid(z):\n", + " return 1/(1 + np.exp(-z))\n", + "\n", + "# The neural network with one input layer and one output layer,\n", + "# but with number of hidden layers specified by the user.\n", + "def deep_neural_network(deep_params, x):\n", + " # N_hidden is the number of hidden layers\n", + " # deep_params is a list, len() should be used\n", + " N_hidden = len(deep_params) - 1 # -1 since params consists of\n", + " # parameters to all the hidden\n", + " # layers AND the output layer.\n", + "\n", + " # Assumes input x being an one-dimensional array\n", + " num_values = np.size(x)\n", + " x = x.reshape(-1, num_values)\n", + "\n", + " # Assume that the input layer does nothing to the input x\n", + " x_input = x\n", + "\n", + " # Due to multiple hidden layers, define a variable referencing to the\n", + " # output of the previous layer:\n", + " x_prev = x_input\n", + "\n", + " ## Hidden layers:\n", + "\n", + " for l in range(N_hidden):\n", + " # From the list of parameters P; find the correct weigths and bias for this layer\n", + " w_hidden = deep_params[l]\n", + "\n", + " # Add a row of ones to include bias\n", + " x_prev = np.concatenate((np.ones((1,num_values)), x_prev ), axis = 0)\n", + "\n", + " z_hidden = np.matmul(w_hidden, x_prev)\n", + " x_hidden = sigmoid(z_hidden)\n", + "\n", + " # Update x_prev such that next layer can use the output from this layer\n", + " x_prev = x_hidden\n", + "\n", + " ## Output layer:\n", + "\n", + " # Get the weights and bias for this layer\n", + " w_output = deep_params[-1]\n", + "\n", + " # Include bias:\n", + " x_prev = np.concatenate((np.ones((1,num_values)), x_prev), axis = 0)\n", + "\n", + " z_output = np.matmul(w_output, x_prev)\n", + " x_output = z_output\n", + "\n", + " return x_output\n", + "\n", + "# The trial solution using the deep neural network:\n", + "def g_trial_deep(x,params, g0 = 10):\n", + " return g0 + x*deep_neural_network(params, x)\n", + "\n", + "# The right side of the ODE:\n", + "def g(x, g_trial, gamma = 2):\n", + " return -gamma*g_trial\n", + "\n", + "# The same cost function as before, but calls deep_neural_network instead.\n", + "def cost_function_deep(P, x):\n", + "\n", + " # Evaluate the trial function with the current parameters P\n", + " g_t = g_trial_deep(x,P)\n", + "\n", + " # Find the 
derivative w.r.t x of the neural network\n", + " d_net_out = elementwise_grad(deep_neural_network,1)(P,x)\n", + "\n", + " # Find the derivative w.r.t x of the trial function\n", + " d_g_t = elementwise_grad(g_trial_deep,0)(x,P)\n", + "\n", + " # The right side of the ODE\n", + " func = g(x, g_t)\n", + "\n", + " err_sqr = (d_g_t - func)**2\n", + " cost_sum = np.sum(err_sqr)\n", + "\n", + " return cost_sum / np.size(err_sqr)\n", + "\n", + "# Solve the exponential decay ODE using neural network with one input and one output layer,\n", + "# but with specified number of hidden layers from the user.\n", + "def solve_ode_deep_neural_network(x, num_neurons, num_iter, lmb):\n", + " # num_hidden_neurons is now a list of number of neurons within each hidden layer\n", + "\n", + " # The number of elements in the list num_hidden_neurons thus represents\n", + " # the number of hidden layers.\n", + "\n", + " # Find the number of hidden layers:\n", + " N_hidden = np.size(num_neurons)\n", + "\n", + " ## Set up initial weights and biases\n", + "\n", + " # Initialize the list of parameters:\n", + " P = [None]*(N_hidden + 1) # + 1 to include the output layer\n", + "\n", + " P[0] = npr.randn(num_neurons[0], 2 )\n", + " for l in range(1,N_hidden):\n", + " P[l] = npr.randn(num_neurons[l], num_neurons[l-1] + 1) # +1 to include bias\n", + "\n", + " # For the output layer\n", + " P[-1] = npr.randn(1, num_neurons[-1] + 1 ) # +1 since bias is included\n", + "\n", + " print('Initial cost: %g'%cost_function_deep(P, x))\n", + "\n", + " ## Start finding the optimal weights using gradient descent\n", + "\n", + " # Find the Python function that represents the gradient of the cost function\n", + " # w.r.t the 0-th input argument -- that is the weights and biases in the hidden and output layer\n", + " cost_function_deep_grad = grad(cost_function_deep,0)\n", + "\n", + " # Let the update be done num_iter times\n", + " for i in range(num_iter):\n", + " # Evaluate the gradient at the current weights and biases in P.\n", + " # The cost_grad consist now of N_hidden + 1 arrays; the gradient w.r.t the weights and biases\n", + " # in the hidden layers and output layers evaluated at x.\n", + " cost_deep_grad = cost_function_deep_grad(P, x)\n", + "\n", + " for l in range(N_hidden+1):\n", + " P[l] = P[l] - lmb * cost_deep_grad[l]\n", + "\n", + " print('Final cost: %g'%cost_function_deep(P, x))\n", + "\n", + " return P\n", + "\n", + "def g_analytic(x, gamma = 2, g0 = 10):\n", + " return g0*np.exp(-gamma*x)\n", + "\n", + "# Solve the given problem\n", + "if __name__ == '__main__':\n", + " npr.seed(15)\n", + "\n", + " ## Decide the vales of arguments to the function to solve\n", + " N = 10\n", + " x = np.linspace(0, 1, N)\n", + "\n", + " ## Set up the initial parameters\n", + " num_hidden_neurons = np.array([10,10])\n", + " num_iter = 10000\n", + " lmb = 0.001\n", + "\n", + " P = solve_ode_deep_neural_network(x, num_hidden_neurons, num_iter, lmb)\n", + "\n", + " res = g_trial_deep(x,P)\n", + " res_analytical = g_analytic(x)\n", + "\n", + " plt.figure(figsize=(10,10))\n", + "\n", + " plt.title('Performance of a deep neural network solving an ODE compared to the analytical solution')\n", + " plt.plot(x, res_analytical)\n", + " plt.plot(x, res[0,:])\n", + " plt.legend(['analytical','dnn'])\n", + " plt.ylabel('g(x)')\n", + " plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "05126a03", + "metadata": { + "editable": true + }, + "source": [ + "## Example: Population growth\n", + "\n", + "A logistic model of population growth assumes 
that a population converges toward an equilibrium.\n", + "The population growth can be modeled by" + ] + }, + { + "cell_type": "markdown", + "id": "7b4e9871", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation} \\label{log} \\tag{10}\n", + "\tg'(t) = \\alpha g(t)(A - g(t))\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "20266e3a", + "metadata": { + "editable": true + }, + "source": [ + "where $g(t)$ is the population density at time $t$, $\\alpha > 0$ the growth rate and $A > 0$ is the maximum population number in the environment.\n", + "Also, at $t = 0$ the population has the size $g(0) = g_0$, where $g_0$ is some chosen constant.\n", + "\n", + "In this example, similar network as for the exponential decay using Autograd has been used to solve the equation. However, as the implementation might suffer from e.g numerical instability\n", + "and high execution time (this might be more apparent in the examples solving PDEs),\n", + "using a library like TensorFlow is recommended.\n", + "Here, we stay with a more simple approach and implement for comparison, the simple forward Euler method." + ] + }, + { + "cell_type": "markdown", + "id": "8a3f1b3d", + "metadata": { + "editable": true + }, + "source": [ + "## Setting up the problem\n", + "\n", + "Here, we will model a population $g(t)$ in an environment having carrying capacity $A$.\n", + "The population follows the model" + ] + }, + { + "cell_type": "markdown", + "id": "14dfc04b", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation} \\label{solveode_population} \\tag{11}\n", + "g'(t) = \\alpha g(t)(A - g(t))\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b125d1d3", + "metadata": { + "editable": true + }, + "source": [ + "where $g(0) = g_0$.\n", + "\n", + "In this example, we let $\\alpha = 2$, $A = 1$, and $g_0 = 1.2$." + ] + }, + { + "cell_type": "markdown", + "id": "226a3528", + "metadata": { + "editable": true + }, + "source": [ + "## The trial solution\n", + "\n", + "We will get a slightly different trial solution, as the boundary conditions are different\n", + "compared to the case for exponential decay.\n", + "\n", + "A possible trial solution satisfying the condition $g(0) = g_0$ could be\n", + "\n", + "$$\n", + "h_1(t) = g_0 + t \\cdot N(t,P)\n", + "$$\n", + "\n", + "with $N(t,P)$ being the output from the neural network with weights and biases for each layer collected in the set $P$.\n", + "\n", + "The analytical solution is\n", + "\n", + "$$\n", + "g(t) = \\frac{Ag_0}{g_0 + (A - g_0)\\exp(-\\alpha A t)}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "adeeb731", + "metadata": { + "editable": true + }, + "source": [ + "## The program using Autograd\n", + "\n", + "The network will be the similar as for the exponential decay example, but with some small modifications for our problem." + ] + }, + { + "cell_type": "code", + "execution_count": 33, + "id": "eb3ed6d1", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import autograd.numpy as np\n", + "from autograd import grad, elementwise_grad\n", + "import autograd.numpy.random as npr\n", + "from matplotlib import pyplot as plt\n", + "\n", + "def sigmoid(z):\n", + " return 1/(1 + np.exp(-z))\n", + "\n", + "# Function to get the parameters.\n", + "# Done such that one can easily change the paramaters after one's liking.\n", + "def get_parameters():\n", + " alpha = 2\n", + " A = 1\n", + " g0 = 1.2\n", + " return alpha, A, g0\n", + "\n", + "def deep_neural_network(deep_params, x):\n", + " # N_hidden is the number of hidden layers\n", + " # deep_params is a list, len() should be used\n", + " N_hidden = len(deep_params) - 1 # -1 since params consists of\n", + " # parameters to all the hidden\n", + " # layers AND the output layer.\n", + "\n", + " # Assumes input x being an one-dimensional array\n", + " num_values = np.size(x)\n", + " x = x.reshape(-1, num_values)\n", + "\n", + " # Assume that the input layer does nothing to the input x\n", + " x_input = x\n", + "\n", + " # Due to multiple hidden layers, define a variable referencing to the\n", + " # output of the previous layer:\n", + " x_prev = x_input\n", + "\n", + " ## Hidden layers:\n", + "\n", + " for l in range(N_hidden):\n", + " # From the list of parameters P; find the correct weigths and bias for this layer\n", + " w_hidden = deep_params[l]\n", + "\n", + " # Add a row of ones to include bias\n", + " x_prev = np.concatenate((np.ones((1,num_values)), x_prev ), axis = 0)\n", + "\n", + " z_hidden = np.matmul(w_hidden, x_prev)\n", + " x_hidden = sigmoid(z_hidden)\n", + "\n", + " # Update x_prev such that next layer can use the output from this layer\n", + " x_prev = x_hidden\n", + "\n", + " ## Output layer:\n", + "\n", + " # Get the weights and bias for this layer\n", + " w_output = deep_params[-1]\n", + "\n", + " # Include bias:\n", + " x_prev = np.concatenate((np.ones((1,num_values)), x_prev), axis = 0)\n", + "\n", + " z_output = np.matmul(w_output, x_prev)\n", + " x_output 
= z_output\n", + "\n", + " return x_output\n", + "\n", + "\n", + "\n", + "\n", + "def cost_function_deep(P, x):\n", + "\n", + " # Evaluate the trial function with the current parameters P\n", + " g_t = g_trial_deep(x,P)\n", + "\n", + " # Find the derivative w.r.t x of the trial function\n", + " d_g_t = elementwise_grad(g_trial_deep,0)(x,P)\n", + "\n", + " # The right side of the ODE\n", + " func = f(x, g_t)\n", + "\n", + " err_sqr = (d_g_t - func)**2\n", + " cost_sum = np.sum(err_sqr)\n", + "\n", + " return cost_sum / np.size(err_sqr)\n", + "\n", + "# The right side of the ODE:\n", + "def f(x, g_trial):\n", + " alpha,A, g0 = get_parameters()\n", + " return alpha*g_trial*(A - g_trial)\n", + "\n", + "# The trial solution using the deep neural network:\n", + "def g_trial_deep(x, params):\n", + " alpha,A, g0 = get_parameters()\n", + " return g0 + x*deep_neural_network(params,x)\n", + "\n", + "# The analytical solution:\n", + "def g_analytic(t):\n", + " alpha,A, g0 = get_parameters()\n", + " return A*g0/(g0 + (A - g0)*np.exp(-alpha*A*t))\n", + "\n", + "def solve_ode_deep_neural_network(x, num_neurons, num_iter, lmb):\n", + " # num_hidden_neurons is now a list of number of neurons within each hidden layer\n", + "\n", + " # Find the number of hidden layers:\n", + " N_hidden = np.size(num_neurons)\n", + "\n", + " ## Set up initial weigths and biases\n", + "\n", + " # Initialize the list of parameters:\n", + " P = [None]*(N_hidden + 1) # + 1 to include the output layer\n", + "\n", + " P[0] = npr.randn(num_neurons[0], 2 )\n", + " for l in range(1,N_hidden):\n", + " P[l] = npr.randn(num_neurons[l], num_neurons[l-1] + 1) # +1 to include bias\n", + "\n", + " # For the output layer\n", + " P[-1] = npr.randn(1, num_neurons[-1] + 1 ) # +1 since bias is included\n", + "\n", + " print('Initial cost: %g'%cost_function_deep(P, x))\n", + "\n", + " ## Start finding the optimal weigths using gradient descent\n", + "\n", + " # Find the Python function that represents the gradient of the cost function\n", + " # w.r.t the 0-th input argument -- that is the weights and biases in the hidden and output layer\n", + " cost_function_deep_grad = grad(cost_function_deep,0)\n", + "\n", + " # Let the update be done num_iter times\n", + " for i in range(num_iter):\n", + " # Evaluate the gradient at the current weights and biases in P.\n", + " # The cost_grad consist now of N_hidden + 1 arrays; the gradient w.r.t the weights and biases\n", + " # in the hidden layers and output layers evaluated at x.\n", + " cost_deep_grad = cost_function_deep_grad(P, x)\n", + "\n", + " for l in range(N_hidden+1):\n", + " P[l] = P[l] - lmb * cost_deep_grad[l]\n", + "\n", + " print('Final cost: %g'%cost_function_deep(P, x))\n", + "\n", + " return P\n", + "\n", + "if __name__ == '__main__':\n", + " npr.seed(4155)\n", + "\n", + " ## Decide the vales of arguments to the function to solve\n", + " Nt = 10\n", + " T = 1\n", + " t = np.linspace(0,T, Nt)\n", + "\n", + " ## Set up the initial parameters\n", + " num_hidden_neurons = [100, 50, 25]\n", + " num_iter = 1000\n", + " lmb = 1e-3\n", + "\n", + " P = solve_ode_deep_neural_network(t, num_hidden_neurons, num_iter, lmb)\n", + "\n", + " g_dnn_ag = g_trial_deep(t,P)\n", + " g_analytical = g_analytic(t)\n", + "\n", + " # Find the maximum absolute difference between the solutons:\n", + " diff_ag = np.max(np.abs(g_dnn_ag - g_analytical))\n", + " print(\"The max absolute difference between the solutions is: %g\"%diff_ag)\n", + "\n", + " plt.figure(figsize=(10,10))\n", + "\n", + " plt.title('Performance of 
neural network solving an ODE compared to the analytical solution')\n", + " plt.plot(t, g_analytical)\n", + " plt.plot(t, g_dnn_ag[0,:])\n", + " plt.legend(['analytical','nn'])\n", + " plt.xlabel('t')\n", + " plt.ylabel('g(t)')\n", + "\n", + " plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "2407df1c", + "metadata": { + "editable": true + }, + "source": [ + "## Using forward Euler to solve the ODE\n", + "\n", + "A straightforward way of solving an ODE numerically, is to use Euler's method.\n", + "\n", + "Euler's method uses Taylor series to approximate the value at a function $f$ at a step $\\Delta x$ from $x$:\n", + "\n", + "$$\n", + "f(x + \\Delta x) \\approx f(x) + \\Delta x f'(x)\n", + "$$\n", + "\n", + "In our case, using Euler's method to approximate the value of $g$ at a step $\\Delta t$ from $t$ yields" + ] + }, + { + "cell_type": "markdown", + "id": "e30d9840", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{aligned}\n", + " g(t + \\Delta t) &\\approx g(t) + \\Delta t g'(t) \\\\\n", + " &= g(t) + \\Delta t \\big(\\alpha g(t)(A - g(t))\\big)\n", + "\\end{aligned}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "4af6e338", + "metadata": { + "editable": true + }, + "source": [ + "along with the condition that $g(0) = g_0$.\n", + "\n", + "Let $t_i = i \\cdot \\Delta t$ where $\\Delta t = \\frac{T}{N_t-1}$ where $T$ is the final time our solver must solve for and $N_t$ the number of values for $t \\in [0, T]$ for $i = 0, \\dots, N_t-1$.\n", + "\n", + "For $i \\geq 1$, we have that" + ] + }, + { + "cell_type": "markdown", + "id": "606cf0d3", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{aligned}\n", + "t_i &= i\\Delta t \\\\\n", + "&= (i - 1)\\Delta t + \\Delta t \\\\\n", + "&= t_{i-1} + \\Delta t\n", + "\\end{aligned}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "3275ea67", + "metadata": { + "editable": true + }, + "source": [ + "Now, if $g_i = g(t_i)$ then" + ] + }, + { + "cell_type": "markdown", + "id": "8c36efec", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation}\n", + " \\begin{aligned}\n", + " g_i &= g(t_i) \\\\\n", + " &= g(t_{i-1} + \\Delta t) \\\\\n", + " &\\approx g(t_{i-1}) + \\Delta t \\big(\\alpha g(t_{i-1})(A - g(t_{i-1}))\\big) \\\\\n", + " &= g_{i-1} + \\Delta t \\big(\\alpha g_{i-1}(A - g_{i-1})\\big)\n", + " \\end{aligned}\n", + "\\end{equation} \\label{odenum} \\tag{12}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "5290cde6", + "metadata": { + "editable": true + }, + "source": [ + "for $i \\geq 1$ and $g_0 = g(t_0) = g(0) = g_0$.\n", + "\n", + "Equation ([12](#odenum)) could be implemented in the following way,\n", + "extending the program that uses the network using Autograd:" + ] + }, + { + "cell_type": "code", + "execution_count": 34, + "id": "d5488516", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# Assume that all function definitions from the example program using Autograd\n", + "# are located here.\n", + "\n", + "if __name__ == '__main__':\n", + " npr.seed(4155)\n", + "\n", + " ## Decide the vales of arguments to the function to solve\n", + " Nt = 10\n", + " T = 1\n", + " t = np.linspace(0,T, Nt)\n", + "\n", + " ## Set up the initial parameters\n", + " num_hidden_neurons = [100,50,25]\n", + " num_iter = 1000\n", + " lmb = 1e-3\n", + "\n", + " P = solve_ode_deep_neural_network(t, num_hidden_neurons, num_iter, lmb)\n", + "\n", + " g_dnn_ag = g_trial_deep(t,P)\n", + " g_analytical = g_analytic(t)\n", + "\n", + " # Find the maximum absolute difference between the solutons:\n", + " diff_ag = np.max(np.abs(g_dnn_ag - g_analytical))\n", + " print(\"The max absolute difference between the solutions is: %g\"%diff_ag)\n", + "\n", + " plt.figure(figsize=(10,10))\n", + "\n", + " plt.title('Performance of neural network solving an ODE compared to the analytical solution')\n", + " plt.plot(t, g_analytical)\n", + " plt.plot(t, g_dnn_ag[0,:])\n", + " plt.legend(['analytical','nn'])\n", + " plt.xlabel('t')\n", + " plt.ylabel('g(t)')\n", + "\n", + " ## Find an approximation to the funtion using forward Euler\n", + "\n", + " alpha, A, g0 = get_parameters()\n", + " dt = T/(Nt - 1)\n", + "\n", + " # Perform forward Euler to solve the ODE\n", + " g_euler = np.zeros(Nt)\n", + " g_euler[0] = g0\n", + "\n", + " for i in range(1,Nt):\n", + " g_euler[i] = g_euler[i-1] + dt*(alpha*g_euler[i-1]*(A - g_euler[i-1]))\n", + "\n", + " # Print the errors done by each method\n", + " diff1 = np.max(np.abs(g_euler - g_analytical))\n", + " diff2 = np.max(np.abs(g_dnn_ag[0,:] - g_analytical))\n", + "\n", + " print('Max absolute difference between Euler method and analytical: %g'%diff1)\n", + " print('Max absolute difference between deep neural network and analytical: %g'%diff2)\n", + "\n", + " # Plot results\n", + " plt.figure(figsize=(10,10))\n", + "\n", + " plt.plot(t,g_euler)\n", + " plt.plot(t,g_analytical)\n", + " plt.plot(t,g_dnn_ag[0,:])\n", + "\n", + " plt.legend(['euler','analytical','dnn'])\n", + " plt.xlabel('Time t')\n", + " plt.ylabel('g(t)')\n", + "\n", + " plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "d631641d", + "metadata": { + "editable": true + }, + "source": [ + "## Example: Solving the one dimensional Poisson equation\n", + "\n", + "The Poisson equation for $g(x)$ in one dimension is" + ] + }, + { + "cell_type": "markdown", + "id": "3bd8043b", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation} \\label{poisson} \\tag{13}\n", + " -g''(x) = f(x)\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "818ac1d8", + "metadata": { + "editable": true + }, + "source": [ + "where $f(x)$ is a given function for $x \\in (0,1)$.\n", + "\n", + "The conditions that $g(x)$ is chosen to fulfill, are" + ] + }, + { + "cell_type": "markdown", + "id": "894be116", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{align*}\n", + " g(0) &= 0 \\\\\n", + " g(1) &= 0\n", + "\\end{align*}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "c2fce07f", + "metadata": { + "editable": true + }, + "source": [ + "This equation can be solved numerically using programs where e.g Autograd and TensorFlow are used.\n", + "The results from the networks can then be compared to the analytical solution.\n", + "In addition, it could be interesting to see how a typical method for numerically solving second order ODEs compares to the neural networks." + ] + }, + { + "cell_type": "markdown", + "id": "1e2ffb5e", + "metadata": { + "editable": true + }, + "source": [ + "## The specific equation to solve for\n", + "\n", + "Here, the function $g(x)$ to solve for follows the equation" + ] + }, + { + "cell_type": "markdown", + "id": "5677eb07", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "-g''(x) = f(x),\\qquad x \\in (0,1)\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "89173815", + "metadata": { + "editable": true + }, + "source": [ + "where $f(x)$ is a given function, along with the chosen conditions" + ] + }, + { + "cell_type": "markdown", + "id": "f6e81c01", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{aligned}\n", + "g(0) = g(1) = 0\n", + "\\end{aligned}\\label{cond} \\tag{14}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "82b4c100", + "metadata": { + "editable": true + }, + "source": [ + "In this example, we consider the case when $f(x) = (3x + x^2)\\exp(x)$.\n", + "\n", + "For this case, a possible trial solution satisfying the conditions could be" + ] + }, + { + "cell_type": "markdown", + "id": "05574f7f", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "g_t(x) = x \\cdot (1-x) \\cdot N(P,x)\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "5c17a08c", + "metadata": { + "editable": true + }, + "source": [ + "The analytical solution for this problem is" + ] + }, + { + "cell_type": "markdown", + "id": "a0ce240a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "g(x) = x(1 - x)\\exp(x)\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "d90da9be", + "metadata": { + "editable": true + }, + "source": [ + "## Solving the equation using Autograd" + ] + }, + { + "cell_type": "code", + "execution_count": 35, + "id": "ffd8b552", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import autograd.numpy as np\n", + "from autograd import grad, elementwise_grad\n", + "import autograd.numpy.random as npr\n", + "from matplotlib import pyplot as plt\n", + "\n", + "def sigmoid(z):\n", + " return 1/(1 + np.exp(-z))\n", + "\n", + "def deep_neural_network(deep_params, x):\n", + " # N_hidden is the number of hidden layers\n", + " # deep_params is a list, len() should be used\n", + " N_hidden = len(deep_params) - 1 # -1 since params consists of\n", + " # parameters to all the hidden\n", + " # layers AND the output layer.\n", + "\n", + " # Assumes input x being an one-dimensional array\n", + " num_values = np.size(x)\n", + " x = x.reshape(-1, num_values)\n", + "\n", + " # Assume that the input layer does nothing to the input x\n", + " x_input = x\n", + "\n", + " # Due to multiple hidden layers, define a variable referencing to the\n", + " # output of the previous layer:\n", + " x_prev = x_input\n", + "\n", + " ## Hidden layers:\n", + "\n", + " for l in range(N_hidden):\n", + " # From the list of parameters P; find the correct weigths and bias for this layer\n", + " w_hidden = deep_params[l]\n", + "\n", + " # Add a row of ones to include bias\n", + " x_prev = np.concatenate((np.ones((1,num_values)), x_prev ), axis = 0)\n", + "\n", + " z_hidden = np.matmul(w_hidden, x_prev)\n", + " x_hidden = sigmoid(z_hidden)\n", + "\n", + " # Update x_prev such that next layer can use the output from this layer\n", + " x_prev = x_hidden\n", + "\n", + " ## Output layer:\n", + "\n", + " # Get the weights and bias for this layer\n", + " w_output = deep_params[-1]\n", + "\n", + " # Include bias:\n", + " x_prev = np.concatenate((np.ones((1,num_values)), x_prev), axis = 0)\n", + "\n", + " z_output = np.matmul(w_output, x_prev)\n", + " x_output = z_output\n", + "\n", + " return x_output\n", + "\n", + "\n", + "def solve_ode_deep_neural_network(x, num_neurons, num_iter, lmb):\n", + " # num_hidden_neurons is now a list of number of neurons within each hidden layer\n", + "\n", + " # Find the number of hidden layers:\n", + " N_hidden = np.size(num_neurons)\n", + "\n", + " ## Set up initial weigths and biases\n", + "\n", + " # Initialize the list of parameters:\n", + " P = [None]*(N_hidden + 1) # + 1 to include the output layer\n", + "\n", + " P[0] = npr.randn(num_neurons[0], 2 )\n", + " for 
l in range(1,N_hidden):\n", + " P[l] = npr.randn(num_neurons[l], num_neurons[l-1] + 1) # +1 to include bias\n", + "\n", + " # For the output layer\n", + " P[-1] = npr.randn(1, num_neurons[-1] + 1 ) # +1 since bias is included\n", + "\n", + " print('Initial cost: %g'%cost_function_deep(P, x))\n", + "\n", + " ## Start finding the optimal weigths using gradient descent\n", + "\n", + " # Find the Python function that represents the gradient of the cost function\n", + " # w.r.t the 0-th input argument -- that is the weights and biases in the hidden and output layer\n", + " cost_function_deep_grad = grad(cost_function_deep,0)\n", + "\n", + " # Let the update be done num_iter times\n", + " for i in range(num_iter):\n", + " # Evaluate the gradient at the current weights and biases in P.\n", + " # The cost_grad consist now of N_hidden + 1 arrays; the gradient w.r.t the weights and biases\n", + " # in the hidden layers and output layers evaluated at x.\n", + " cost_deep_grad = cost_function_deep_grad(P, x)\n", + "\n", + " for l in range(N_hidden+1):\n", + " P[l] = P[l] - lmb * cost_deep_grad[l]\n", + "\n", + " print('Final cost: %g'%cost_function_deep(P, x))\n", + "\n", + " return P\n", + "\n", + "## Set up the cost function specified for this Poisson equation:\n", + "\n", + "# The right side of the ODE\n", + "def f(x):\n", + " return (3*x + x**2)*np.exp(x)\n", + "\n", + "def cost_function_deep(P, x):\n", + "\n", + " # Evaluate the trial function with the current parameters P\n", + " g_t = g_trial_deep(x,P)\n", + "\n", + " # Find the derivative w.r.t x of the trial function\n", + " d2_g_t = elementwise_grad(elementwise_grad(g_trial_deep,0))(x,P)\n", + "\n", + " right_side = f(x)\n", + "\n", + " err_sqr = (-d2_g_t - right_side)**2\n", + " cost_sum = np.sum(err_sqr)\n", + "\n", + " return cost_sum/np.size(err_sqr)\n", + "\n", + "# The trial solution:\n", + "def g_trial_deep(x,P):\n", + " return x*(1-x)*deep_neural_network(P,x)\n", + "\n", + "# The analytic solution;\n", + "def g_analytic(x):\n", + " return x*(1-x)*np.exp(x)\n", + "\n", + "if __name__ == '__main__':\n", + " npr.seed(4155)\n", + "\n", + " ## Decide the vales of arguments to the function to solve\n", + " Nx = 10\n", + " x = np.linspace(0,1, Nx)\n", + "\n", + " ## Set up the initial parameters\n", + " num_hidden_neurons = [200,100]\n", + " num_iter = 1000\n", + " lmb = 1e-3\n", + "\n", + " P = solve_ode_deep_neural_network(x, num_hidden_neurons, num_iter, lmb)\n", + "\n", + " g_dnn_ag = g_trial_deep(x,P)\n", + " g_analytical = g_analytic(x)\n", + "\n", + " # Find the maximum absolute difference between the solutons:\n", + " max_diff = np.max(np.abs(g_dnn_ag - g_analytical))\n", + " print(\"The max absolute difference between the solutions is: %g\"%max_diff)\n", + "\n", + " plt.figure(figsize=(10,10))\n", + "\n", + " plt.title('Performance of neural network solving an ODE compared to the analytical solution')\n", + " plt.plot(x, g_analytical)\n", + " plt.plot(x, g_dnn_ag[0,:])\n", + " plt.legend(['analytical','nn'])\n", + " plt.xlabel('x')\n", + " plt.ylabel('g(x)')\n", + " plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "2cde42e7", + "metadata": { + "editable": true + }, + "source": [ + "## Comparing with a numerical scheme\n", + "\n", + "The Poisson equation is possible to solve using Taylor series to approximate the second derivative.\n", + "\n", + "Using Taylor series, the second derivative can be expressed as\n", + "\n", + "$$\n", + "g''(x) = \\frac{g(x + \\Delta x) - 2g(x) + g(x-\\Delta x)}{\\Delta x^2} + E_{\\Delta 
x}(x)\n", + "$$\n", + "\n", + "where $\Delta x$ is a small step size and $E_{\Delta x}(x)$ is the error term.\n", + "\n", + "Neglecting the error term gives an approximation to the second derivative:" + ] + }, + { + "cell_type": "markdown", + "id": "e24a46af", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation} \\label{approx} \\tag{15}\n", + "g''(x) \\approx \\frac{g(x + \\Delta x) - 2g(x) + g(x-\\Delta x)}{\\Delta x^2}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "2417ec7c", + "metadata": { + "editable": true + }, + "source": [ + "If $x_i = i \\Delta x = x_{i-1} + \\Delta x$ and $g_i = g(x_i)$ for $i = 1,\\dots N_x - 2$ with $N_x$ being the number of values for $x$, ([15](#approx)) becomes" + ] + }, + { + "cell_type": "markdown", + "id": "012a9c2b", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{aligned}\n", + "g''(x_i) &\\approx \\frac{g(x_i + \\Delta x) - 2g(x_i) + g(x_i -\\Delta x)}{\\Delta x^2} \\\\\n", + "&= \\frac{g_{i+1} - 2g_i + g_{i-1}}{\\Delta x^2}\n", + "\\end{aligned}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "101bccb8", + "metadata": { + "editable": true + }, + "source": [ + "Since we know from our problem that" + ] + }, + { + "cell_type": "markdown", + "id": "280cdc54", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{aligned}\n", + "-g''(x) &= f(x) \\\\\n", + "&= (3x + x^2)\\exp(x)\n", + "\\end{aligned}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "38bc9035", + "metadata": { + "editable": true + }, + "source": [ + "along with the conditions $g(0) = g(1) = 0$,\n", + "the following scheme can be used to find an approximate solution for $g(x)$ numerically:" + ] + }, + { + "cell_type": "markdown", + "id": "3925a117", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation}\n", + " \\begin{aligned}\n", + " -\\Big( \\frac{g_{i+1} - 2g_i + g_{i-1}}{\\Delta x^2} \\Big) &= f(x_i) \\\\\n", + " -g_{i+1} + 2g_i - g_{i-1} &= \\Delta x^2 f(x_i)\n", + " \\end{aligned}\n", + "\\end{equation} \\label{odesys} \\tag{16}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "6f86e85b", + "metadata": { + "editable": true + }, + "source": [ + "for $i = 1, \\dots, N_x - 2$ where $g_0 = g_{N_x - 1} = 0$ and $f(x_i) = (3x_i + x_i^2)\\exp(x_i)$, which is given for our specific problem.\n", + "\n", + "The equation can be rewritten into a matrix equation:" + ] + }, + { + "cell_type": "markdown", + "id": "394b14bc", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{aligned}\n", + "\\begin{pmatrix}\n", + "2 & -1 & 0 & \\dots & 0 \\\\\n", + "-1 & 2 & -1 & \\dots & 0 \\\\\n", + "\\vdots & & \\ddots & & \\vdots \\\\\n", + "0 & \\dots & -1 & 2 & -1 \\\\\n", + "0 & \\dots & 0 & -1 & 2\\\\\n", + "\\end{pmatrix}\n", + "\\begin{pmatrix}\n", + "g_1 \\\\\n", + "g_2 \\\\\n", + "\\vdots \\\\\n", + "g_{N_x - 3} \\\\\n", + "g_{N_x - 2}\n", + "\\end{pmatrix}\n", + "&=\n", + "\\Delta x^2\n", + "\\begin{pmatrix}\n", + "f(x_1) \\\\\n", + "f(x_2) \\\\\n", + "\\vdots \\\\\n", + "f(x_{N_x - 3}) \\\\\n", + "f(x_{N_x - 2})\n", + "\\end{pmatrix} \\\\\n", + "\\boldsymbol{A}\\boldsymbol{g} &= \\boldsymbol{f},\n", + "\\end{aligned}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "5ab07ae1", + "metadata": { + "editable": true + }, + "source": [ + "which makes it possible to solve for the vector $\\boldsymbol{g}$." + ] + }, + { + "cell_type": "markdown", + "id": "8134c34f", + "metadata": { + "editable": true + }, + "source": [ + "## Setting up the code\n", + "\n", + "We can then compare the result from this numerical scheme with the output from our network using Autograd:" + ] + }, + { + "cell_type": "code", + "execution_count": 36, + "id": "4362f9a9", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import autograd.numpy as np\n", + "from autograd import grad, elementwise_grad\n", + "import autograd.numpy.random as npr\n", + "from matplotlib import pyplot as plt\n", + "\n", + "def sigmoid(z):\n", + " return 1/(1 + np.exp(-z))\n", + "\n", + "def deep_neural_network(deep_params, x):\n", + " # N_hidden is the number of hidden layers\n", + " # deep_params is a list, len() should be used\n", + " N_hidden = len(deep_params) - 1 # -1 since params consists of\n", + " # parameters to all the hidden\n", + " # layers AND the output layer.\n", + "\n", + " # Assumes input x being an one-dimensional array\n", + " num_values = np.size(x)\n", + " x = x.reshape(-1, num_values)\n", + "\n", + " # Assume that the input layer does nothing to the input x\n", + " x_input = x\n", + "\n", + " # Due to multiple hidden layers, define a variable referencing to the\n", + " # output of the previous layer:\n", + " x_prev = x_input\n", + "\n", + " ## Hidden layers:\n", + "\n", + " for l in range(N_hidden):\n", + " # From the list of parameters P; find the correct weigths and bias for this layer\n", + " w_hidden = deep_params[l]\n", + "\n", + " # Add a row of ones to include bias\n", + " x_prev = np.concatenate((np.ones((1,num_values)), x_prev ), axis = 0)\n", + "\n", + " z_hidden = np.matmul(w_hidden, x_prev)\n", + " x_hidden = sigmoid(z_hidden)\n", + "\n", + " # Update x_prev such that next layer can use the output from this layer\n", + " x_prev = x_hidden\n", + "\n", + " ## Output layer:\n", + "\n", + " # 
Get the weights and bias for this layer\n", + " w_output = deep_params[-1]\n", + "\n", + " # Include bias:\n", + " x_prev = np.concatenate((np.ones((1,num_values)), x_prev), axis = 0)\n", + "\n", + " z_output = np.matmul(w_output, x_prev)\n", + " x_output = z_output\n", + "\n", + " return x_output\n", + "\n", + "\n", + "def solve_ode_deep_neural_network(x, num_neurons, num_iter, lmb):\n", + " # num_hidden_neurons is now a list of number of neurons within each hidden layer\n", + "\n", + " # Find the number of hidden layers:\n", + " N_hidden = np.size(num_neurons)\n", + "\n", + " ## Set up initial weigths and biases\n", + "\n", + " # Initialize the list of parameters:\n", + " P = [None]*(N_hidden + 1) # + 1 to include the output layer\n", + "\n", + " P[0] = npr.randn(num_neurons[0], 2 )\n", + " for l in range(1,N_hidden):\n", + " P[l] = npr.randn(num_neurons[l], num_neurons[l-1] + 1) # +1 to include bias\n", + "\n", + " # For the output layer\n", + " P[-1] = npr.randn(1, num_neurons[-1] + 1 ) # +1 since bias is included\n", + "\n", + " print('Initial cost: %g'%cost_function_deep(P, x))\n", + "\n", + " ## Start finding the optimal weigths using gradient descent\n", + "\n", + " # Find the Python function that represents the gradient of the cost function\n", + " # w.r.t the 0-th input argument -- that is the weights and biases in the hidden and output layer\n", + " cost_function_deep_grad = grad(cost_function_deep,0)\n", + "\n", + " # Let the update be done num_iter times\n", + " for i in range(num_iter):\n", + " # Evaluate the gradient at the current weights and biases in P.\n", + " # The cost_grad consist now of N_hidden + 1 arrays; the gradient w.r.t the weights and biases\n", + " # in the hidden layers and output layers evaluated at x.\n", + " cost_deep_grad = cost_function_deep_grad(P, x)\n", + "\n", + " for l in range(N_hidden+1):\n", + " P[l] = P[l] - lmb * cost_deep_grad[l]\n", + "\n", + " print('Final cost: %g'%cost_function_deep(P, x))\n", + "\n", + " return P\n", + "\n", + "## Set up the cost function specified for this Poisson equation:\n", + "\n", + "# The right side of the ODE\n", + "def f(x):\n", + " return (3*x + x**2)*np.exp(x)\n", + "\n", + "def cost_function_deep(P, x):\n", + "\n", + " # Evaluate the trial function with the current parameters P\n", + " g_t = g_trial_deep(x,P)\n", + "\n", + " # Find the derivative w.r.t x of the trial function\n", + " d2_g_t = elementwise_grad(elementwise_grad(g_trial_deep,0))(x,P)\n", + "\n", + " right_side = f(x)\n", + "\n", + " err_sqr = (-d2_g_t - right_side)**2\n", + " cost_sum = np.sum(err_sqr)\n", + "\n", + " return cost_sum/np.size(err_sqr)\n", + "\n", + "# The trial solution:\n", + "def g_trial_deep(x,P):\n", + " return x*(1-x)*deep_neural_network(P,x)\n", + "\n", + "# The analytic solution;\n", + "def g_analytic(x):\n", + " return x*(1-x)*np.exp(x)\n", + "\n", + "if __name__ == '__main__':\n", + " npr.seed(4155)\n", + "\n", + " ## Decide the vales of arguments to the function to solve\n", + " Nx = 10\n", + " x = np.linspace(0,1, Nx)\n", + "\n", + " ## Set up the initial parameters\n", + " num_hidden_neurons = [200,100]\n", + " num_iter = 1000\n", + " lmb = 1e-3\n", + "\n", + " P = solve_ode_deep_neural_network(x, num_hidden_neurons, num_iter, lmb)\n", + "\n", + " g_dnn_ag = g_trial_deep(x,P)\n", + " g_analytical = g_analytic(x)\n", + "\n", + " # Find the maximum absolute difference between the solutons:\n", + "\n", + " plt.figure(figsize=(10,10))\n", + "\n", + " plt.title('Performance of neural network solving an ODE compared to the 
analytical solution')\n", + " plt.plot(x, g_analytical)\n", + " plt.plot(x, g_dnn_ag[0,:])\n", + " plt.legend(['analytical','nn'])\n", + " plt.xlabel('x')\n", + " plt.ylabel('g(x)')\n", + "\n", + " ## Perform the computation using the numerical scheme\n", + "\n", + " dx = 1/(Nx - 1)\n", + "\n", + " # Set up the matrix A\n", + " A = np.zeros((Nx-2,Nx-2))\n", + "\n", + " A[0,0] = 2\n", + " A[0,1] = -1\n", + "\n", + " for i in range(1,Nx-3):\n", + " A[i,i-1] = -1\n", + " A[i,i] = 2\n", + " A[i,i+1] = -1\n", + "\n", + " A[Nx - 3, Nx - 4] = -1\n", + " A[Nx - 3, Nx - 3] = 2\n", + "\n", + " # Set up the vector f\n", + " f_vec = dx**2 * f(x[1:-1])\n", + "\n", + " # Solve the equation\n", + " g_res = np.linalg.solve(A,f_vec)\n", + "\n", + " g_vec = np.zeros(Nx)\n", + " g_vec[1:-1] = g_res\n", + "\n", + " # Print the differences between each method\n", + " max_diff1 = np.max(np.abs(g_dnn_ag - g_analytical))\n", + " max_diff2 = np.max(np.abs(g_vec - g_analytical))\n", + " print(\"The max absolute difference between the analytical solution and DNN Autograd: %g\"%max_diff1)\n", + " print(\"The max absolute difference between the analytical solution and numerical scheme: %g\"%max_diff2)\n", + "\n", + " # Plot the results\n", + " plt.figure(figsize=(10,10))\n", + "\n", + " plt.plot(x,g_vec)\n", + " plt.plot(x,g_analytical)\n", + " plt.plot(x,g_dnn_ag[0,:])\n", + "\n", + " plt.legend(['numerical scheme','analytical','dnn'])\n", + " plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "c66dc85a", + "metadata": { + "editable": true + }, + "source": [ + "## Partial Differential Equations\n", + "\n", + "A partial differential equation (PDE) has a solution here the function\n", + "is defined by multiple variables. The equation may involve all kinds\n", + "of combinations of which variables the function is differentiated with\n", + "respect to.\n", + "\n", + "In general, a partial differential equation for a function $g(x_1,\\dots,x_N)$ with $N$ variables may be expressed as" + ] + }, + { + "cell_type": "markdown", + "id": "cf60d1fc", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation} \\label{PDE} \\tag{17}\n", + " f\\left(x_1, \\, \\dots \\, , x_N, \\frac{\\partial g(x_1,\\dots,x_N) }{\\partial x_1}, \\dots , \\frac{\\partial g(x_1,\\dots,x_N) }{\\partial x_N}, \\frac{\\partial g(x_1,\\dots,x_N) }{\\partial x_1\\partial x_2}, \\, \\dots \\, , \\frac{\\partial^n g(x_1,\\dots,x_N) }{\\partial x_N^n} \\right) = 0\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "bff85f6e", + "metadata": { + "editable": true + }, + "source": [ + "where $f$ is an expression involving all kinds of possible mixed derivatives of $g(x_1,\\dots,x_N)$ up to an order $n$. In order for the solution to be unique, some additional conditions must also be given." + ] + }, + { + "cell_type": "markdown", + "id": "64289867", + "metadata": { + "editable": true + }, + "source": [ + "## Type of problem\n", + "\n", + "The problem our network must solve for, is similar to the ODE case.\n", + "We must have a trial solution $g_t$ at hand.\n", + "\n", + "For instance, the trial solution could be expressed as" + ] + }, + { + "cell_type": "markdown", + "id": "75d3a4d2", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{align*}\n", + " g_t(x_1,\\dots,x_N) = h_1(x_1,\\dots,x_N) + h_2(x_1,\\dots,x_N,N(x_1,\\dots,x_N,P))\n", + "\\end{align*}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "6f3e695d", + "metadata": { + "editable": true + }, + "source": [ + "where $h_1(x_1,\\dots,x_N)$ is a function that ensures $g_t(x_1,\\dots,x_N)$ satisfies some given conditions.\n", + "The neural network $N(x_1,\\dots,x_N,P)$ has weights and biases described by $P$ and $h_2(x_1,\\dots,x_N,N(x_1,\\dots,x_N,P))$ is an expression using the output from the neural network in some way.\n", + "\n", + "The role of the function $h_2(x_1,\\dots,x_N,N(x_1,\\dots,x_N,P))$, is to ensure that the output of $N(x_1,\\dots,x_N,P)$ is zero when $g_t(x_1,\\dots,x_N)$ is evaluated at the values of $x_1,\\dots,x_N$ where the given conditions must be satisfied. The function $h_1(x_1,\\dots,x_N)$ should alone make $g_t(x_1,\\dots,x_N)$ satisfy the conditions." + ] + }, + { + "cell_type": "markdown", + "id": "da1ba3cf", + "metadata": { + "editable": true + }, + "source": [ + "## Network requirements\n", + "\n", + "The network tries then the minimize the cost function following the\n", + "same ideas as described for the ODE case, but now with more than one\n", + "variables to consider. The concept still remains the same; find a set\n", + "of parameters $P$ such that the expression $f$ in ([17](#PDE)) is as\n", + "close to zero as possible.\n", + "\n", + "As for the ODE case, the cost function is the mean squared error that\n", + "the network must try to minimize. 
The cost function for the network to\n", + "minimize is" + ] + }, + { + "cell_type": "markdown", + "id": "373065ff", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C\\left(x_1, \\dots, x_N, P\\right) = \\left( f\\left(x_1, \\, \\dots \\, , x_N, \\frac{\\partial g(x_1,\\dots,x_N) }{\\partial x_1}, \\dots , \\frac{\\partial g(x_1,\\dots,x_N) }{\\partial x_N}, \\frac{\\partial g(x_1,\\dots,x_N) }{\\partial x_1\\partial x_2}, \\, \\dots \\, , \\frac{\\partial^n g(x_1,\\dots,x_N) }{\\partial x_N^n} \\right) \\right)^2\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "2281eade", + "metadata": { + "editable": true + }, + "source": [ + "## More details\n", + "\n", + "If we let $\\boldsymbol{x} = \\big( x_1, \\dots, x_N \\big)$ be an array containing the values for $x_1, \\dots, x_N$ respectively, the cost function can be reformulated into the following:" + ] + }, + { + "cell_type": "markdown", + "id": "989a8905", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C\\left(\\boldsymbol{x}, P\\right) = f\\left( \\left( \\boldsymbol{x}, \\frac{\\partial g(\\boldsymbol{x}) }{\\partial x_1}, \\dots , \\frac{\\partial g(\\boldsymbol{x}) }{\\partial x_N}, \\frac{\\partial g(\\boldsymbol{x}) }{\\partial x_1\\partial x_2}, \\, \\dots \\, , \\frac{\\partial^n g(\\boldsymbol{x}) }{\\partial x_N^n} \\right) \\right)^2\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b36367a0", + "metadata": { + "editable": true + }, + "source": [ + "If we also have $M$ different sets of values for $x_1, \\dots, x_N$, that is $\\boldsymbol{x}_i = \\big(x_1^{(i)}, \\dots, x_N^{(i)}\\big)$ for $i = 1,\\dots,M$ being the rows in matrix $X$, the cost function can be generalized into" + ] + }, + { + "cell_type": "markdown", + "id": "6f6f51dd", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C\\left(X, P \\right) = \\sum_{i=1}^M f\\left( \\left( \\boldsymbol{x}_i, \\frac{\\partial g(\\boldsymbol{x}_i) }{\\partial x_1}, \\dots , \\frac{\\partial g(\\boldsymbol{x}_i) }{\\partial x_N}, \\frac{\\partial g(\\boldsymbol{x}_i) }{\\partial x_1\\partial x_2}, \\, \\dots \\, , \\frac{\\partial^n g(\\boldsymbol{x}_i) }{\\partial x_N^n} \\right) \\right)^2.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "35bd1e4a", + "metadata": { + "editable": true + }, + "source": [ + "## Example: The diffusion equation\n", + "\n", + "In one spatial dimension, the equation reads" + ] + }, + { + "cell_type": "markdown", + "id": "2b804c0a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial g(x,t)}{\\partial t} = \\frac{\\partial^2 g(x,t)}{\\partial x^2}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "07f20557", + "metadata": { + "editable": true + }, + "source": [ + "where a possible choice of conditions are" + ] + }, + { + "cell_type": "markdown", + "id": "0e14c702", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{align*}\n", + "g(0,t) &= 0 ,\\qquad t \\geq 0 \\\\\n", + "g(1,t) &= 0, \\qquad t \\geq 0 \\\\\n", + "g(x,0) &= u(x),\\qquad x\\in [0,1]\n", + "\\end{align*}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "a19c5cae", + "metadata": { + "editable": true + }, + "source": [ + "with $u(x)$ being some given function." 
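+ ] + }, + { + "cell_type": "markdown", + "id": "ad10c301", + "metadata": { + "editable": true + }, + "source": [ + "As for the ODE and the Poisson equation above, it can be instructive to compare the network with a standard numerical scheme. The sketch below (an aside, not part of the original program) solves the same problem with the explicit forward-Euler/FTCS finite-difference scheme. The initial condition $u(x) = \sin(\pi x)$ and the grid and time-step sizes are choices made here for illustration; the time step respects the stability requirement $\Delta t \le \Delta x^2/2$ for this explicit scheme." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ad10c302", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import numpy as np\n", + "\n", + "# Explicit (FTCS) finite-difference sketch for dg/dt = d2g/dx2 on x in [0,1]\n", + "# with g(0,t) = g(1,t) = 0 and g(x,0) = u(x).\n", + "# Assumption for illustration: u(x) = sin(pi x), the choice used later in this example.\n", + "def u(x):\n", + "    return np.sin(np.pi*x)\n", + "\n", + "Nx = 11                 # number of spatial grid points\n", + "dx = 1/(Nx - 1)\n", + "dt = 0.5*dx**2          # largest stable step for this explicit scheme\n", + "T = 0.1                 # final time, arbitrary choice\n", + "Nt = int(T/dt)\n", + "\n", + "x = np.linspace(0, 1, Nx)\n", + "g = u(x)                # initial condition\n", + "g[0] = g[-1] = 0.0      # boundary conditions\n", + "\n", + "for n in range(Nt):\n", + "    # Update the interior points; the boundary values stay at zero\n", + "    g[1:-1] = g[1:-1] + dt/dx**2*(g[2:] - 2*g[1:-1] + g[:-2])\n", + "\n", + "# For u(x) = sin(pi x) the exact solution is exp(-pi^2 t)sin(pi x)\n", + "g_exact = np.exp(-np.pi**2*Nt*dt)*np.sin(np.pi*x)\n", + "print('Max absolute deviation from the exact solution: %g'%np.max(np.abs(g - g_exact)))" + ] + }, + { + "cell_type": "markdown", + "id": "ad10c303", + "metadata": { + "editable": true + }, + "source": [ + "This classical scheme can later be compared with the deep neural network solution of the same problem."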
+ ] + }, + { + "cell_type": "markdown", + "id": "de041a40", + "metadata": { + "editable": true + }, + "source": [ + "## Defining the problem\n", + "\n", + "For this case, we want to find $g(x,t)$ such that" + ] + }, + { + "cell_type": "markdown", + "id": "519bb7a7", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation}\n", + " \\frac{\\partial g(x,t)}{\\partial t} = \\frac{\\partial^2 g(x,t)}{\\partial x^2}\n", + "\\end{equation} \\label{diffonedim} \\tag{18}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "129322ea", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "ddc7b725", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{align*}\n", + "g(0,t) &= 0 ,\\qquad t \\geq 0 \\\\\n", + "g(1,t) &= 0, \\qquad t \\geq 0 \\\\\n", + "g(x,0) &= u(x),\\qquad x\\in [0,1]\n", + "\\end{align*}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "5497b34b", + "metadata": { + "editable": true + }, + "source": [ + "with $u(x) = \\sin(\\pi x)$.\n", + "\n", + "First, let us set up the deep neural network.\n", + "The deep neural network will follow the same structure as discussed in the examples solving the ODEs.\n", + "First, we will look into how Autograd could be used in a network tailored to solve for bivariate functions." + ] + }, + { + "cell_type": "markdown", + "id": "0b9040e4", + "metadata": { + "editable": true + }, + "source": [ + "## Setting up the network using Autograd\n", + "\n", + "The only change to do here, is to extend our network such that\n", + "functions of multiple parameters are correctly handled. In this case\n", + "we have two variables in our function to solve for, that is time $t$\n", + "and position $x$. The variables will be represented by a\n", + "one-dimensional array in the program. The program will evaluate the\n", + "network at each possible pair $(x,t)$, given an array for the desired\n", + "$x$-values and $t$-values to approximate the solution at." + ] + }, + { + "cell_type": "code", + "execution_count": 37, + "id": "17097802", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "def sigmoid(z):\n", + " return 1/(1 + np.exp(-z))\n", + "\n", + "def deep_neural_network(deep_params, x):\n", + " # x is now a point and a 1D numpy array; make it a column vector\n", + " num_coordinates = np.size(x,0)\n", + " x = x.reshape(num_coordinates,-1)\n", + "\n", + " num_points = np.size(x,1)\n", + "\n", + " # N_hidden is the number of hidden layers\n", + " N_hidden = len(deep_params) - 1 # -1 since params consist of parameters to all the hidden layers AND the output layer\n", + "\n", + " # Assume that the input layer does nothing to the input x\n", + " x_input = x\n", + " x_prev = x_input\n", + "\n", + " ## Hidden layers:\n", + "\n", + " for l in range(N_hidden):\n", + " # From the list of parameters P; find the correct weigths and bias for this layer\n", + " w_hidden = deep_params[l]\n", + "\n", + " # Add a row of ones to include bias\n", + " x_prev = np.concatenate((np.ones((1,num_points)), x_prev ), axis = 0)\n", + "\n", + " z_hidden = np.matmul(w_hidden, x_prev)\n", + " x_hidden = sigmoid(z_hidden)\n", + "\n", + " # Update x_prev such that next layer can use the output from this layer\n", + " x_prev = x_hidden\n", + "\n", + " ## Output layer:\n", + "\n", + " # Get the weights and bias for this layer\n", + " w_output = deep_params[-1]\n", + "\n", + " # Include bias:\n", + " x_prev = np.concatenate((np.ones((1,num_points)), x_prev), axis = 0)\n", + "\n", + " z_output = np.matmul(w_output, x_prev)\n", + " x_output = z_output\n", + "\n", + " return x_output[0][0]" + ] + }, + { + "cell_type": "markdown", + "id": "a2178b56", + "metadata": { + "editable": true + }, + "source": [ + "## Setting up the network using 
Autograd; The trial solution\n", + "\n", + "The cost function must then iterate through the given arrays\n", + "containing values for $x$ and $t$, define the point $(x,t)$ at which the deep\n", + "neural network and the trial solution are evaluated, and then find\n", + "the Jacobian of the trial solution.\n", + "\n", + "A possible trial solution for this PDE is\n", + "\n", + "$$\n", + "g_t(x,t) = h_1(x,t) + x(1-x)tN(x,t,P)\n", + "$$\n", + "\n", + "with $h_1(x,t)$ being a function ensuring that $g_t(x,t)$ satisfies our given conditions, and $N(x,t,P)$ being the output from the deep neural network using weights and biases for each layer from $P$.\n", + "\n", + "To fulfill the conditions, $h_1(x,t)$ could be:\n", + "\n", + "$$\n", + "h_1(x,t) = (1-t)\Big(u(x) - \big((1-x)u(0) + x u(1)\big)\Big) = (1-t)u(x) = (1-t)\sin(\pi x)\n", + "$$\n", + "since $u(0) = u(1) = 0$ and $u(x) = \sin(\pi x)$." + ] + }, + { + "cell_type": "markdown", + "id": "533f4e84", + "metadata": { + "editable": true + }, + "source": [ + "## Why the Jacobian?\n", + "\n", + "The Jacobian is used because the program must find the derivative of\n", + "the trial solution with respect to $x$ and $t$.\n", + "\n", + "This gives the necessity of computing the Jacobian matrix, as we want\n", + "to evaluate the gradient with respect to $x$ and $t$ (note that the\n", + "Jacobian of a scalar-valued multivariate function is simply its\n", + "gradient).\n", + "\n", + "In Autograd, the differentiation is by default done with respect to\n", + "the first input argument of your Python function. Since the point is\n", + "an array representing $x$ and $t$, the Jacobian is calculated using\n", + "the values of $x$ and $t$.\n", + "\n", + "To find the second derivative with respect to $x$ and $t$, the\n", + "Jacobian can be computed a second time. The result is a Hessian\n", + "matrix, which is the matrix containing all the possible second order\n", + "mixed derivatives of $g(x,t)$."
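+ ] + }, + { + "cell_type": "markdown", + "id": "ad10c311", + "metadata": { + "editable": true + }, + "source": [ + "To make this concrete, the small sketch below (an illustration added here, not part of the original program) applies Autograd's jacobian and hessian to a simple closed-form function of a point $(x,t)$, namely $q(x,t) = \sin(\pi x)e^{-t}$. The first component of the Jacobian is $\partial q/\partial x$ and the second is $\partial q/\partial t$, while the $(0,0)$ and $(1,1)$ entries of the Hessian are $\partial^2 q/\partial x^2$ and $\partial^2 q/\partial t^2$; this is exactly how the cost function below extracts $\partial g_t/\partial t$ and $\partial^2 g_t/\partial x^2$. The test point $(0.3, 0.5)$ is an arbitrary choice." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ad10c312", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import autograd.numpy as np\n", + "from autograd import jacobian, hessian\n", + "\n", + "# A scalar function taking a point (x,t), mimicking how g_trial takes a point\n", + "def q(point):\n", + "    x,t = point\n", + "    return np.sin(np.pi*x)*np.exp(-t)\n", + "\n", + "point = np.array([0.3, 0.5])\n", + "\n", + "q_jac = jacobian(q)(point)    # the gradient [dq/dx, dq/dt]\n", + "q_hess = hessian(q)(point)    # the 2 x 2 matrix of second derivatives\n", + "\n", + "# Compare with the derivatives computed by hand\n", + "x_, t_ = point\n", + "print('dq/dx   : %g  (exact %g)'%(q_jac[0], np.pi*np.cos(np.pi*x_)*np.exp(-t_)))\n", + "print('dq/dt   : %g  (exact %g)'%(q_jac[1], -np.sin(np.pi*x_)*np.exp(-t_)))\n", + "print('d2q/dx2 : %g  (exact %g)'%(q_hess[0][0], -np.pi**2*np.sin(np.pi*x_)*np.exp(-t_)))\n", + "print('d2q/dt2 : %g  (exact %g)'%(q_hess[1][1], np.sin(np.pi*x_)*np.exp(-t_)))" + ] + }, + { + "cell_type": "markdown", + "id": "ad10c313", + "metadata": { + "editable": true + }, + "source": [ + "With this in mind, the trial solution and the cost function for the diffusion equation can be set up as in the following cell."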
+ ] + }, + { + "cell_type": "code", + "execution_count": 38, + "id": "7b494481", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# Set up the trial function:\n", + "def u(x):\n", + " return np.sin(np.pi*x)\n", + "\n", + "def g_trial(point,P):\n", + " x,t = point\n", + " return (1-t)*u(x) + x*(1-x)*t*deep_neural_network(P,point)\n", + "\n", + "# The right side of the ODE:\n", + "def f(point):\n", + " return 0.\n", + "\n", + "# The cost function:\n", + "def cost_function(P, x, t):\n", + " cost_sum = 0\n", + "\n", + " g_t_jacobian_func = jacobian(g_trial)\n", + " g_t_hessian_func = hessian(g_trial)\n", + "\n", + " for x_ in x:\n", + " for t_ in t:\n", + " point = np.array([x_,t_])\n", + "\n", + " g_t = g_trial(point,P)\n", + " g_t_jacobian = g_t_jacobian_func(point,P)\n", + " g_t_hessian = g_t_hessian_func(point,P)\n", + "\n", + " g_t_dt = g_t_jacobian[1]\n", + " g_t_d2x = g_t_hessian[0][0]\n", + "\n", + " func = f(point)\n", + "\n", + " err_sqr = ( (g_t_dt - g_t_d2x) - func)**2\n", + " cost_sum += err_sqr\n", + "\n", + " return cost_sum" + ] + }, + { + "cell_type": "markdown", + "id": "9f4b4939", + "metadata": { + "editable": true + }, + "source": [ + "## Setting up the network using Autograd; The full program\n", + "\n", + "Having set up the network, along with the trial solution and cost function, we can now see how the deep neural network performs by comparing the results to the analytical solution.\n", + "\n", + "The analytical solution of our problem is\n", + "\n", + "$$\n", + "g(x,t) = \\exp(-\\pi^2 t)\\sin(\\pi x)\n", + "$$\n", + "\n", + "A possible way to implement a neural network solving the PDE, is given below.\n", + "Be aware, though, that it is fairly slow for the parameters used.\n", + "A better result is possible, but requires more iterations, and thus longer time to complete.\n", + "\n", + "Indeed, the program below is not optimal in its implementation, but rather serves as an example on how to implement and use a neural network to solve a PDE.\n", + "Using TensorFlow results in a much better execution time. Try it!" 
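+ ] + }, + { + "cell_type": "markdown", + "id": "ad10c321", + "metadata": { + "editable": true + }, + "source": [ + "Before running the full program, a quick symbolic check (an aside using SymPy, which is not used elsewhere in these notes) confirms that the stated analytical solution satisfies both the PDE and the boundary and initial conditions." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ad10c322", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import sympy as sp\n", + "\n", + "x, t = sp.symbols('x t')\n", + "g = sp.exp(-sp.pi**2*t)*sp.sin(sp.pi*x)\n", + "\n", + "# The residual dg/dt - d2g/dx2 should simplify to zero\n", + "print('PDE residual :', sp.simplify(sp.diff(g, t) - sp.diff(g, x, 2)))\n", + "\n", + "# Boundary and initial conditions\n", + "print('g(0,t)       :', sp.simplify(g.subs(x, 0)))\n", + "print('g(1,t)       :', sp.simplify(g.subs(x, 1)))\n", + "print('g(x,0) - u(x):', sp.simplify(g.subs(t, 0) - sp.sin(sp.pi*x)))" + ] + }, + { + "cell_type": "markdown", + "id": "ad10c323", + "metadata": { + "editable": true + }, + "source": [ + "With the analytical solution confirmed, the full Autograd implementation follows."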
+ ] + }, + { + "cell_type": "code", + "execution_count": 39, + "id": "83d6eb7d", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import autograd.numpy as np\n", + "from autograd import jacobian,hessian,grad\n", + "import autograd.numpy.random as npr\n", + "from matplotlib import cm\n", + "from matplotlib import pyplot as plt\n", + "from mpl_toolkits.mplot3d import axes3d\n", + "\n", + "## Set up the network\n", + "\n", + "def sigmoid(z):\n", + " return 1/(1 + np.exp(-z))\n", + "\n", + "def deep_neural_network(deep_params, x):\n", + " # x is now a point and a 1D numpy array; make it a column vector\n", + " num_coordinates = np.size(x,0)\n", + " x = x.reshape(num_coordinates,-1)\n", + "\n", + " num_points = np.size(x,1)\n", + "\n", + " # N_hidden is the number of hidden layers\n", + " N_hidden = len(deep_params) - 1 # -1 since params consist of parameters to all the hidden layers AND the output layer\n", + "\n", + " # Assume that the input layer does nothing to the input x\n", + " x_input = x\n", + " x_prev = x_input\n", + "\n", + " ## Hidden layers:\n", + "\n", + " for l in range(N_hidden):\n", + " # From the list of parameters P; find the correct weigths and bias for this layer\n", + " w_hidden = deep_params[l]\n", + "\n", + " # Add a row of ones to include bias\n", + " x_prev = np.concatenate((np.ones((1,num_points)), x_prev ), axis = 0)\n", + "\n", + " z_hidden = np.matmul(w_hidden, x_prev)\n", + " x_hidden = sigmoid(z_hidden)\n", + "\n", + " # Update x_prev such that next layer can use the output from this layer\n", + " x_prev = x_hidden\n", + "\n", + " ## Output layer:\n", + "\n", + " # Get the weights and bias for this layer\n", + " w_output = deep_params[-1]\n", + "\n", + " # Include bias:\n", + " x_prev = np.concatenate((np.ones((1,num_points)), x_prev), axis = 0)\n", + "\n", + " z_output = np.matmul(w_output, x_prev)\n", + " x_output = z_output\n", + "\n", + " return x_output[0][0]\n", + "\n", + "## Define the trial solution and cost function\n", + "def u(x):\n", + " return np.sin(np.pi*x)\n", + "\n", + "def g_trial(point,P):\n", + " x,t = point\n", + " return (1-t)*u(x) + x*(1-x)*t*deep_neural_network(P,point)\n", + "\n", + "# The right side of the ODE:\n", + "def f(point):\n", + " return 0.\n", + "\n", + "# The cost function:\n", + "def cost_function(P, x, t):\n", + " cost_sum = 0\n", + "\n", + " g_t_jacobian_func = jacobian(g_trial)\n", + " g_t_hessian_func = hessian(g_trial)\n", + "\n", + " for x_ in x:\n", + " for t_ in t:\n", + " point = np.array([x_,t_])\n", + "\n", + " g_t = g_trial(point,P)\n", + " g_t_jacobian = g_t_jacobian_func(point,P)\n", + " g_t_hessian = g_t_hessian_func(point,P)\n", + "\n", + " g_t_dt = g_t_jacobian[1]\n", + " g_t_d2x = g_t_hessian[0][0]\n", + "\n", + " func = f(point)\n", + "\n", + " err_sqr = ( (g_t_dt - g_t_d2x) - func)**2\n", + " cost_sum += err_sqr\n", + "\n", + " return cost_sum /( np.size(x)*np.size(t) )\n", + "\n", + "## For comparison, define the analytical solution\n", + "def g_analytic(point):\n", + " x,t = point\n", + " return np.exp(-np.pi**2*t)*np.sin(np.pi*x)\n", + "\n", + "## Set up a function for training the network to solve for the equation\n", + "def solve_pde_deep_neural_network(x,t, num_neurons, num_iter, lmb):\n", + " ## Set up initial weigths and biases\n", + " N_hidden = np.size(num_neurons)\n", + "\n", + " ## Set up initial weigths and biases\n", + "\n", + " # Initialize the list of parameters:\n", + " P = [None]*(N_hidden + 1) # + 1 to include the output layer\n", + "\n", + 
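"    # The input to the first layer is the pair (x,t); the extra column in every weight matrix multiplies the row of ones concatenated to the input and acts as the bias\n", + 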
" P[0] = npr.randn(num_neurons[0], 2 + 1 ) # 2 since we have two points, +1 to include bias\n", + " for l in range(1,N_hidden):\n", + " P[l] = npr.randn(num_neurons[l], num_neurons[l-1] + 1) # +1 to include bias\n", + "\n", + " # For the output layer\n", + " P[-1] = npr.randn(1, num_neurons[-1] + 1 ) # +1 since bias is included\n", + "\n", + " print('Initial cost: ',cost_function(P, x, t))\n", + "\n", + " cost_function_grad = grad(cost_function,0)\n", + "\n", + " # Let the update be done num_iter times\n", + " for i in range(num_iter):\n", + " cost_grad = cost_function_grad(P, x , t)\n", + "\n", + " for l in range(N_hidden+1):\n", + " P[l] = P[l] - lmb * cost_grad[l]\n", + "\n", + " print('Final cost: ',cost_function(P, x, t))\n", + "\n", + " return P\n", + "\n", + "if __name__ == '__main__':\n", + " ### Use the neural network:\n", + " npr.seed(15)\n", + "\n", + " ## Decide the vales of arguments to the function to solve\n", + " Nx = 10; Nt = 10\n", + " x = np.linspace(0, 1, Nx)\n", + " t = np.linspace(0,1,Nt)\n", + "\n", + " ## Set up the parameters for the network\n", + " num_hidden_neurons = [100, 25]\n", + " num_iter = 250\n", + " lmb = 0.01\n", + "\n", + " P = solve_pde_deep_neural_network(x,t, num_hidden_neurons, num_iter, lmb)\n", + "\n", + " ## Store the results\n", + " g_dnn_ag = np.zeros((Nx, Nt))\n", + " G_analytical = np.zeros((Nx, Nt))\n", + " for i,x_ in enumerate(x):\n", + " for j, t_ in enumerate(t):\n", + " point = np.array([x_, t_])\n", + " g_dnn_ag[i,j] = g_trial(point,P)\n", + "\n", + " G_analytical[i,j] = g_analytic(point)\n", + "\n", + " # Find the map difference between the analytical and the computed solution\n", + " diff_ag = np.abs(g_dnn_ag - G_analytical)\n", + " print('Max absolute difference between the analytical solution and the network: %g'%np.max(diff_ag))\n", + "\n", + " ## Plot the solutions in two dimensions, that being in position and time\n", + "\n", + " T,X = np.meshgrid(t,x)\n", + "\n", + " fig = plt.figure(figsize=(10,10))\n", + " ax = fig.add_suplot(projection='3d')\n", + " ax.set_title('Solution from the deep neural network w/ %d layer'%len(num_hidden_neurons))\n", + " s = ax.plot_surface(T,X,g_dnn_ag,linewidth=0,antialiased=False,cmap=cm.viridis)\n", + " ax.set_xlabel('Time $t$')\n", + " ax.set_ylabel('Position $x$');\n", + "\n", + "\n", + " fig = plt.figure(figsize=(10,10))\n", + " ax = fig.add_suplot(projection='3d')\n", + " ax.set_title('Analytical solution')\n", + " s = ax.plot_surface(T,X,G_analytical,linewidth=0,antialiased=False,cmap=cm.viridis)\n", + " ax.set_xlabel('Time $t$')\n", + " ax.set_ylabel('Position $x$');\n", + "\n", + " fig = plt.figure(figsize=(10,10))\n", + " ax = fig.add_suplot(projection='3d')\n", + " ax.set_title('Difference')\n", + " s = ax.plot_surface(T,X,diff_ag,linewidth=0,antialiased=False,cmap=cm.viridis)\n", + " ax.set_xlabel('Time $t$')\n", + " ax.set_ylabel('Position $x$');\n", + "\n", + " ## Take some slices of the 3D plots just to see the solutions at particular times\n", + " indx1 = 0\n", + " indx2 = int(Nt/2)\n", + " indx3 = Nt-1\n", + "\n", + " t1 = t[indx1]\n", + " t2 = t[indx2]\n", + " t3 = t[indx3]\n", + "\n", + " # Slice the results from the DNN\n", + " res1 = g_dnn_ag[:,indx1]\n", + " res2 = g_dnn_ag[:,indx2]\n", + " res3 = g_dnn_ag[:,indx3]\n", + "\n", + " # Slice the analytical results\n", + " res_analytical1 = G_analytical[:,indx1]\n", + " res_analytical2 = G_analytical[:,indx2]\n", + " res_analytical3 = G_analytical[:,indx3]\n", + "\n", + " # Plot the slices\n", + " plt.figure(figsize=(10,10))\n", 
+ " plt.title(\"Computed solutions at time = %g\"%t1)\n", + " plt.plot(x, res1)\n", + " plt.plot(x,res_analytical1)\n", + " plt.legend(['dnn','analytical'])\n", + "\n", + " plt.figure(figsize=(10,10))\n", + " plt.title(\"Computed solutions at time = %g\"%t2)\n", + " plt.plot(x, res2)\n", + " plt.plot(x,res_analytical2)\n", + " plt.legend(['dnn','analytical'])\n", + "\n", + " plt.figure(figsize=(10,10))\n", + " plt.title(\"Computed solutions at time = %g\"%t3)\n", + " plt.plot(x, res3)\n", + " plt.plot(x,res_analytical3)\n", + " plt.legend(['dnn','analytical'])\n", + "\n", + " plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "ada13a48", + "metadata": { + "editable": true + }, + "source": [ + "## Example: Solving the wave equation with Neural Networks\n", + "\n", + "The wave equation is" + ] + }, + { + "cell_type": "markdown", + "id": "e4727d73", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial^2 g(x,t)}{\\partial t^2} = c^2\\frac{\\partial^2 g(x,t)}{\\partial x^2}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "0b86d555", + "metadata": { + "editable": true + }, + "source": [ + "with $c$ being the specified wave speed.\n", + "\n", + "Here, the chosen conditions are" + ] + }, + { + "cell_type": "markdown", + "id": "216948d5", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{align*}\n", + "\tg(0,t) &= 0 \\\\\n", + "\tg(1,t) &= 0 \\\\\n", + "\tg(x,0) &= u(x) \\\\\n", + "\t\\frac{\\partial g(x,t)}{\\partial t} \\Big |_{t = 0} &= v(x)\n", + "\\end{align*}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "44c25fdc", + "metadata": { + "editable": true + }, + "source": [ + "where $\\frac{\\partial g(x,t)}{\\partial t} \\Big |_{t = 0}$ means the derivative of $g(x,t)$ with respect to $t$ is evaluated at $t = 0$, and $u(x)$ and $v(x)$ being given functions." + ] + }, + { + "cell_type": "markdown", + "id": "98f919eb", + "metadata": { + "editable": true + }, + "source": [ + "## The problem to solve for\n", + "\n", + "The wave equation to solve for, is" + ] + }, + { + "cell_type": "markdown", + "id": "01299767", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation} \\label{wave} \\tag{19}\n", + "\\frac{\\partial^2 g(x,t)}{\\partial t^2} = c^2 \\frac{\\partial^2 g(x,t)}{\\partial x^2}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "556587c5", + "metadata": { + "editable": true + }, + "source": [ + "where $c$ is the given wave speed.\n", + "The chosen conditions for this equation are" + ] + }, + { + "cell_type": "markdown", + "id": "c9eb4f3a", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{aligned}\n", + "g(0,t) &= 0, &t \\geq 0 \\\\\n", + "g(1,t) &= 0, &t \\geq 0 \\\\\n", + "g(x,0) &= u(x), &x\\in[0,1] \\\\\n", + "\\frac{\\partial g(x,t)}{\\partial t}\\Big |_{t = 0} &= v(x), &x \\in [0,1]\n", + "\\end{aligned} \\label{condwave} \\tag{20}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "63128ef6", + "metadata": { + "editable": true + }, + "source": [ + "In this example, let $c = 1$ and $u(x) = \\sin(\\pi x)$ and $v(x) = -\\pi\\sin(\\pi x)$." + ] + }, + { + "cell_type": "markdown", + "id": "ff568c81", + "metadata": { + "editable": true + }, + "source": [ + "## The trial solution\n", + "Setting up the network is done in similar matter as for the example of solving the diffusion equation.\n", + "The only things we have to change, is the trial solution such that it satisfies the conditions from ([20](#condwave)) and the cost function.\n", + "\n", + "The trial solution becomes slightly different since we have other conditions than in the example of solving the diffusion equation. Here, a possible trial solution $g_t(x,t)$ is\n", + "\n", + "$$\n", + "g_t(x,t) = h_1(x,t) + x(1-x)t^2N(x,t,P)\n", + "$$\n", + "\n", + "where\n", + "\n", + "$$\n", + "h_1(x,t) = (1-t^2)u(x) + tv(x)\n", + "$$\n", + "\n", + "Note that this trial solution satisfies the conditions only if $u(0) = v(0) = u(1) = v(1) = 0$, which is the case in this example." + ] + }, + { + "cell_type": "markdown", + "id": "7b32c8dd", + "metadata": { + "editable": true + }, + "source": [ + "## The analytical solution\n", + "\n", + "The analytical solution for our specific problem, is\n", + "\n", + "$$\n", + "g(x,t) = \\sin(\\pi x)\\cos(\\pi t) - \\sin(\\pi x)\\sin(\\pi t)\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "fc33e683", + "metadata": { + "editable": true + }, + "source": [ + "## Solving the wave equation - the full program using Autograd" + ] + }, + { + "cell_type": "code", + "execution_count": 40, + "id": "2f923958", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import autograd.numpy as np\n", + "from autograd import hessian,grad\n", + "import autograd.numpy.random as npr\n", + "from matplotlib import cm\n", + "from matplotlib import pyplot as plt\n", + "from mpl_toolkits.mplot3d import axes3d\n", + "\n", + "## Set up the trial function:\n", + "def u(x):\n", + " return np.sin(np.pi*x)\n", + "\n", + "def v(x):\n", + " return -np.pi*np.sin(np.pi*x)\n", + "\n", + "def h1(point):\n", + " x,t = point\n", + " return (1 - t**2)*u(x) + t*v(x)\n", + "\n", + "def g_trial(point,P):\n", + " x,t = point\n", + " return h1(point) + x*(1-x)*t**2*deep_neural_network(P,point)\n", + "\n", + "## Define the cost function\n", + "def cost_function(P, x, t):\n", + " cost_sum = 0\n", + "\n", + " g_t_hessian_func = hessian(g_trial)\n", + "\n", + " for x_ in x:\n", + " for t_ in t:\n", + " point = np.array([x_,t_])\n", + "\n", + " g_t_hessian = g_t_hessian_func(point,P)\n", + "\n", + " g_t_d2x = g_t_hessian[0][0]\n", + " g_t_d2t = g_t_hessian[1][1]\n", + "\n", + " err_sqr = ( (g_t_d2t - g_t_d2x) )**2\n", + " cost_sum += err_sqr\n", + "\n", + " return cost_sum / (np.size(t) * np.size(x))\n", + "\n", + "## The neural network\n", + "def sigmoid(z):\n", + " return 1/(1 + np.exp(-z))\n", + "\n", + "def deep_neural_network(deep_params, x):\n", + " # x is now a point and a 1D numpy array; make it a column vector\n", + " num_coordinates = np.size(x,0)\n", + " x = x.reshape(num_coordinates,-1)\n", + "\n", + " num_points = np.size(x,1)\n", + 
"\n", + " # N_hidden is the number of hidden layers\n", + " N_hidden = len(deep_params) - 1 # -1 since params consist of parameters to all the hidden layers AND the output layer\n", + "\n", + " # Assume that the input layer does nothing to the input x\n", + " x_input = x\n", + " x_prev = x_input\n", + "\n", + " ## Hidden layers:\n", + "\n", + " for l in range(N_hidden):\n", + " # From the list of parameters P; find the correct weigths and bias for this layer\n", + " w_hidden = deep_params[l]\n", + "\n", + " # Add a row of ones to include bias\n", + " x_prev = np.concatenate((np.ones((1,num_points)), x_prev ), axis = 0)\n", + "\n", + " z_hidden = np.matmul(w_hidden, x_prev)\n", + " x_hidden = sigmoid(z_hidden)\n", + "\n", + " # Update x_prev such that next layer can use the output from this layer\n", + " x_prev = x_hidden\n", + "\n", + " ## Output layer:\n", + "\n", + " # Get the weights and bias for this layer\n", + " w_output = deep_params[-1]\n", + "\n", + " # Include bias:\n", + " x_prev = np.concatenate((np.ones((1,num_points)), x_prev), axis = 0)\n", + "\n", + " z_output = np.matmul(w_output, x_prev)\n", + " x_output = z_output\n", + "\n", + " return x_output[0][0]\n", + "\n", + "## The analytical solution\n", + "def g_analytic(point):\n", + " x,t = point\n", + " return np.sin(np.pi*x)*np.cos(np.pi*t) - np.sin(np.pi*x)*np.sin(np.pi*t)\n", + "\n", + "def solve_pde_deep_neural_network(x,t, num_neurons, num_iter, lmb):\n", + " ## Set up initial weigths and biases\n", + " N_hidden = np.size(num_neurons)\n", + "\n", + " ## Set up initial weigths and biases\n", + "\n", + " # Initialize the list of parameters:\n", + " P = [None]*(N_hidden + 1) # + 1 to include the output layer\n", + "\n", + " P[0] = npr.randn(num_neurons[0], 2 + 1 ) # 2 since we have two points, +1 to include bias\n", + " for l in range(1,N_hidden):\n", + " P[l] = npr.randn(num_neurons[l], num_neurons[l-1] + 1) # +1 to include bias\n", + "\n", + " # For the output layer\n", + " P[-1] = npr.randn(1, num_neurons[-1] + 1 ) # +1 since bias is included\n", + "\n", + " print('Initial cost: ',cost_function(P, x, t))\n", + "\n", + " cost_function_grad = grad(cost_function,0)\n", + "\n", + " # Let the update be done num_iter times\n", + " for i in range(num_iter):\n", + " cost_grad = cost_function_grad(P, x , t)\n", + "\n", + " for l in range(N_hidden+1):\n", + " P[l] = P[l] - lmb * cost_grad[l]\n", + "\n", + "\n", + " print('Final cost: ',cost_function(P, x, t))\n", + "\n", + " return P\n", + "\n", + "if __name__ == '__main__':\n", + " ### Use the neural network:\n", + " npr.seed(15)\n", + "\n", + " ## Decide the vales of arguments to the function to solve\n", + " Nx = 10; Nt = 10\n", + " x = np.linspace(0, 1, Nx)\n", + " t = np.linspace(0,1,Nt)\n", + "\n", + " ## Set up the parameters for the network\n", + " num_hidden_neurons = [50,20]\n", + " num_iter = 1000\n", + " lmb = 0.01\n", + "\n", + " P = solve_pde_deep_neural_network(x,t, num_hidden_neurons, num_iter, lmb)\n", + "\n", + " ## Store the results\n", + " res = np.zeros((Nx, Nt))\n", + " res_analytical = np.zeros((Nx, Nt))\n", + " for i,x_ in enumerate(x):\n", + " for j, t_ in enumerate(t):\n", + " point = np.array([x_, t_])\n", + " res[i,j] = g_trial(point,P)\n", + "\n", + " res_analytical[i,j] = g_analytic(point)\n", + "\n", + " diff = np.abs(res - res_analytical)\n", + " print(\"Max difference between analytical and solution from nn: %g\"%np.max(diff))\n", + "\n", + " ## Plot the solutions in two dimensions, that being in position and time\n", + "\n", + " T,X = 
np.meshgrid(t,x)\n", + "\n", + " fig = plt.figure(figsize=(10,10))\n", + " ax = fig.add_suplot(projection='3d')\n", + " ax.set_title('Solution from the deep neural network w/ %d layer'%len(num_hidden_neurons))\n", + " s = ax.plot_surface(T,X,res,linewidth=0,antialiased=False,cmap=cm.viridis)\n", + " ax.set_xlabel('Time $t$')\n", + " ax.set_ylabel('Position $x$');\n", + "\n", + "\n", + " fig = plt.figure(figsize=(10,10))\n", + " ax = fig.add_suplot(projection='3d')\n", + " ax.set_title('Analytical solution')\n", + " s = ax.plot_surface(T,X,res_analytical,linewidth=0,antialiased=False,cmap=cm.viridis)\n", + " ax.set_xlabel('Time $t$')\n", + " ax.set_ylabel('Position $x$');\n", + "\n", + "\n", + " fig = plt.figure(figsize=(10,10))\n", + " ax = fig.add_suplot(projection='3d')\n", + " ax.set_title('Difference')\n", + " s = ax.plot_surface(T,X,diff,linewidth=0,antialiased=False,cmap=cm.viridis)\n", + " ax.set_xlabel('Time $t$')\n", + " ax.set_ylabel('Position $x$');\n", + "\n", + " ## Take some slices of the 3D plots just to see the solutions at particular times\n", + " indx1 = 0\n", + " indx2 = int(Nt/2)\n", + " indx3 = Nt-1\n", + "\n", + " t1 = t[indx1]\n", + " t2 = t[indx2]\n", + " t3 = t[indx3]\n", + "\n", + " # Slice the results from the DNN\n", + " res1 = res[:,indx1]\n", + " res2 = res[:,indx2]\n", + " res3 = res[:,indx3]\n", + "\n", + " # Slice the analytical results\n", + " res_analytical1 = res_analytical[:,indx1]\n", + " res_analytical2 = res_analytical[:,indx2]\n", + " res_analytical3 = res_analytical[:,indx3]\n", + "\n", + " # Plot the slices\n", + " plt.figure(figsize=(10,10))\n", + " plt.title(\"Computed solutions at time = %g\"%t1)\n", + " plt.plot(x, res1)\n", + " plt.plot(x,res_analytical1)\n", + " plt.legend(['dnn','analytical'])\n", + "\n", + " plt.figure(figsize=(10,10))\n", + " plt.title(\"Computed solutions at time = %g\"%t2)\n", + " plt.plot(x, res2)\n", + " plt.plot(x,res_analytical2)\n", + " plt.legend(['dnn','analytical'])\n", + "\n", + " plt.figure(figsize=(10,10))\n", + " plt.title(\"Computed solutions at time = %g\"%t3)\n", + " plt.plot(x, res3)\n", + " plt.plot(x,res_analytical3)\n", + " plt.legend(['dnn','analytical'])\n", + "\n", + " plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "95dea76f", + "metadata": { + "editable": true + }, + "source": [ + "## Resources on differential equations and deep learning\n", + "\n", + "1. [Artificial neural networks for solving ordinary and partial differential equations by I.E. Lagaris et al](https://pdfs.semanticscholar.org/d061/df393e0e8fbfd0ea24976458b7d42419040d.pdf)\n", + "\n", + "2. [Neural networks for solving differential equations by A. Honchar](https://becominghuman.ai/neural-networks-for-solving-differential-equations-fa230ac5e04c)\n", + "\n", + "3. [Solving differential equations using neural networks by M.M Chiaramonte and M. Kiener](http://cs229.stanford.edu/proj2013/ChiaramonteKiener-SolvingDifferentialEquationsUsingNeuralNetworks.pdf)\n", + "\n", + "4. [Introduction to Partial Differential Equations by A. Tveito, R. 
Winther](https://www.springer.com/us/book/9783540225515)" + ] + } + ], + "metadata": {}, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/doc/LectureNotes/week44.ipynb b/doc/LectureNotes/week44.ipynb new file mode 100644 index 000000000..6193b11ee --- /dev/null +++ b/doc/LectureNotes/week44.ipynb @@ -0,0 +1,4983 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "67995f17", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "d31bb6a0", + "metadata": { + "editable": true + }, + "source": [ + "# Week 44, Solving differential equations with neural networks and start Convolutional Neural Networks (CNN)\n", + "**Morten Hjorth-Jensen**, Department of Physics, University of Oslo, Norway\n", + "\n", + "Date: **Week 44**" + ] + }, + { + "cell_type": "markdown", + "id": "846f5bd7", + "metadata": { + "editable": true + }, + "source": [ + "## Plan for week 44\n", + "\n", + "**Material for the lecture Monday October 27, 2025.**\n", + "\n", + "1. Solving differential equations, continuation from last week, first lecture\n", + "\n", + "2. Convolutional Neural Networks, second lecture\n", + "\n", + "3. Readings and Videos:\n", + "\n", + " * These lecture notes at \n", + "\n", + " * For a more in depth discussion on neural networks we recommend Goodfellow et al chapter 9. See also chapter 11 and 12 on practicalities and applications\n", + "\n", + " * Reading suggestions for implementation of CNNs see Rashcka et al.'s chapter 14 at . \n", + "\n", + " * Video on Deep Learning at \n", + "\n", + " * Video on Convolutional Neural Networks from MIT at \n", + "\n", + " * Video on CNNs from Stanford at \n", + "\n", + " * Video of lecture October 27 at \n", + "\n", + " * Whiteboard notes at " + ] + }, + { + "cell_type": "markdown", + "id": "855f98ab", + "metadata": { + "editable": true + }, + "source": [ + "## Lab sessions on Tuesday and Wednesday\n", + "\n", + "* Main focus is discussion of and work on project 2\n", + "\n", + "* If you did not get time to finish the exercises from weeks 41-42, you can also keep working on them and hand in this coming Friday" + ] + }, + { + "cell_type": "markdown", + "id": "12675cc5", + "metadata": { + "editable": true + }, + "source": [ + "## Material for Lecture Monday October 27" + ] + }, + { + "cell_type": "markdown", + "id": "f714320f", + "metadata": { + "editable": true + }, + "source": [ + "## Solving differential equations with Deep Learning\n", + "\n", + "The Universal Approximation Theorem states that a neural network can\n", + "approximate any function at a single hidden layer along with one input\n", + "and output layer to any given precision.\n", + "\n", + "**Book on solving differential equations with ML methods.**\n", + "\n", + "[An Introduction to Neural Network Methods for Differential Equations](https://www.springer.com/gp/book/9789401798150), by Yadav and Kumar.\n", + "\n", + "**Physics informed neural networks.**\n", + "\n", + "[Scientific Machine Learning Through Physics–Informed Neural Networks: Where we are and What’s Next](https://link.springer.com/article/10.1007/s10915-022-01939-z), by Cuomo et al\n", + "\n", + "**Thanks to Kristine Baluka Hein.**\n", + "\n", + "The lectures on differential equations were developed by Kristine Baluka Hein, now PhD student at IFI.\n", + "A great thanks to Kristine." 
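+ ] + }, + { + "cell_type": "markdown", + "id": "ad10c331", + "metadata": { + "editable": true + }, + "source": [ + "As a small illustration of the approximation property stated above (an aside, not part of the original notes), the sketch below fits a network with a single hidden layer to $\sin(\pi x)$ using scikit-learn's MLPRegressor. The number of hidden neurons and the other hyperparameters are arbitrary choices for this demonstration." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ad10c332", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import numpy as np\n", + "from sklearn.neural_network import MLPRegressor\n", + "\n", + "# Target function sampled on [0,1]\n", + "x = np.linspace(0, 1, 200).reshape(-1, 1)\n", + "y = np.sin(np.pi*x).ravel()\n", + "\n", + "# One hidden layer with 50 neurons; all settings are arbitrary choices for this illustration\n", + "nn = MLPRegressor(hidden_layer_sizes=(50,), activation='tanh', solver='lbfgs',\n", + "                  max_iter=5000, random_state=4155)\n", + "nn.fit(x, y)\n", + "\n", + "print('Max absolute deviation from sin(pi x): %g'%np.max(np.abs(nn.predict(x) - y)))" + ] + }, + { + "cell_type": "markdown", + "id": "ad10c333", + "metadata": { + "editable": true + }, + "source": [ + "We now look at how such networks can be used to solve differential equations, starting with ordinary differential equations."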
+ ] + }, + { + "cell_type": "markdown", + "id": "ebe354b6", + "metadata": { + "editable": true + }, + "source": [ + "## Ordinary Differential Equations first\n", + "\n", + "An ordinary differential equation (ODE) is an equation involving functions having one variable.\n", + "\n", + "In general, an ordinary differential equation looks like" + ] + }, + { + "cell_type": "markdown", + "id": "f16621c0", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation} \\label{ode} \\tag{1}\n", + "f\\left(x, \\, g(x), \\, g'(x), \\, g''(x), \\, \\dots \\, , \\, g^{(n)}(x)\\right) = 0\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "2b272a0d", + "metadata": { + "editable": true + }, + "source": [ + "where $g(x)$ is the function to find, and $g^{(n)}(x)$ is the $n$-th derivative of $g(x)$.\n", + "\n", + "The $f\\left(x, g(x), g'(x), g''(x), \\, \\dots \\, , g^{(n)}(x)\\right)$ is just a way to write that there is an expression involving $x$ and $g(x), \\ g'(x), \\ g''(x), \\, \\dots \\, , \\text{ and } g^{(n)}(x)$ on the left side of the equality sign in ([1](#ode)).\n", + "The highest order of derivative, that is the value of $n$, determines to the order of the equation.\n", + "The equation is referred to as a $n$-th order ODE.\n", + "Along with ([1](#ode)), some additional conditions of the function $g(x)$ are typically given\n", + "for the solution to be unique." + ] + }, + { + "cell_type": "markdown", + "id": "611b2399", + "metadata": { + "editable": true + }, + "source": [ + "## The trial solution\n", + "\n", + "Let the trial solution $g_t(x)$ be" + ] + }, + { + "cell_type": "markdown", + "id": "cab2d9fb", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation}\n", + "\tg_t(x) = h_1(x) + h_2(x,N(x,P))\n", + "\\label{_auto1} \\tag{2}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "fbd68a84", + "metadata": { + "editable": true + }, + "source": [ + "where $h_1(x)$ is a function that makes $g_t(x)$ satisfy a given set\n", + "of conditions, $N(x,P)$ a neural network with weights and biases\n", + "described by $P$ and $h_2(x, N(x,P))$ some expression involving the\n", + "neural network. The role of the function $h_2(x, N(x,P))$, is to\n", + "ensure that the output from $N(x,P)$ is zero when $g_t(x)$ is\n", + "evaluated at the values of $x$ where the given conditions must be\n", + "satisfied. The function $h_1(x)$ should alone make $g_t(x)$ satisfy\n", + "the conditions.\n", + "\n", + "But what about the network $N(x,P)$?\n", + "\n", + "As described previously, an optimization method could be used to minimize the parameters of a neural network, that being its weights and biases, through backward propagation." + ] + }, + { + "cell_type": "markdown", + "id": "24929e78", + "metadata": { + "editable": true + }, + "source": [ + "## Minimization process\n", + "\n", + "For the minimization to be defined, we need to have a cost function at hand to minimize.\n", + "\n", + "It is given that $f\\left(x, \\, g(x), \\, g'(x), \\, g''(x), \\, \\dots \\, , \\, g^{(n)}(x)\\right)$ should be equal to zero in ([1](#ode)).\n", + "We can choose to consider the mean squared error as the cost function for an input $x$.\n", + "Since we are looking at one input, the cost function is just $f$ squared.\n", + "The cost function $c\\left(x, P \\right)$ can therefore be expressed as" + ] + }, + { + "cell_type": "markdown", + "id": "8da0a4d4", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C\\left(x, P\\right) = \\big(f\\left(x, \\, g(x), \\, g'(x), \\, g''(x), \\, \\dots \\, , \\, g^{(n)}(x)\\right)\\big)^2\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "3de8b89e", + "metadata": { + "editable": true + }, + "source": [ + "If $N$ inputs are given as a vector $\\boldsymbol{x}$ with elements $x_i$ for $i = 1,\\dots,N$,\n", + "the cost function becomes" + ] + }, + { + "cell_type": "markdown", + "id": "1275ce7a", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation} \\label{cost} \\tag{3}\n", + "\tC\\left(\\boldsymbol{x}, P\\right) = \\frac{1}{N} \\sum_{i=1}^N \\big(f\\left(x_i, \\, g(x_i), \\, g'(x_i), \\, g''(x_i), \\, \\dots \\, , \\, g^{(n)}(x_i)\\right)\\big)^2\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "a522e0fa", + "metadata": { + "editable": true + }, + "source": [ + "The neural net should then find the parameters $P$ that minimizes the cost function in\n", + "([3](#cost)) for a set of $N$ training samples $x_i$." + ] + }, + { + "cell_type": "markdown", + "id": "8a18955b", + "metadata": { + "editable": true + }, + "source": [ + "## Minimizing the cost function using gradient descent and automatic differentiation\n", + "\n", + "To perform the minimization using gradient descent, the gradient of $C\\left(\\boldsymbol{x}, P\\right)$ is needed.\n", + "It might happen so that finding an analytical expression of the gradient of $C(\\boldsymbol{x}, P)$ from ([3](#cost)) gets too messy, depending on which cost function one desires to use.\n", + "\n", + "Luckily, there exists libraries that makes the job for us through automatic differentiation.\n", + "Automatic differentiation is a method of finding the derivatives numerically with very high precision." + ] + }, + { + "cell_type": "markdown", + "id": "888808f7", + "metadata": { + "editable": true + }, + "source": [ + "## Example: Exponential decay\n", + "\n", + "An exponential decay of a quantity $g(x)$ is described by the equation" + ] + }, + { + "cell_type": "markdown", + "id": "fcefd7fb", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation} \\label{solve_expdec} \\tag{4}\n", + " g'(x) = -\\gamma g(x)\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "02cb2ce9", + "metadata": { + "editable": true + }, + "source": [ + "with $g(0) = g_0$ for some chosen initial value $g_0$.\n", + "\n", + "The analytical solution of ([4](#solve_expdec)) is" + ] + }, + { + "cell_type": "markdown", + "id": "bdd9ef4d", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation}\n", + " g(x) = g_0 \\exp\\left(-\\gamma x\\right)\n", + "\\label{_auto2} \\tag{5}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "867cbb56", + "metadata": { + "editable": true + }, + "source": [ + "Having an analytical solution at hand, it is possible to use it to compare how well a neural network finds a solution of ([4](#solve_expdec))." + ] + }, + { + "cell_type": "markdown", + "id": "2f9ac7ae", + "metadata": { + "editable": true + }, + "source": [ + "## The function to solve for\n", + "\n", + "The program will use a neural network to solve" + ] + }, + { + "cell_type": "markdown", + "id": "49a68337", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation} \\label{solveode} \\tag{6}\n", + "g'(x) = -\\gamma g(x)\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "a6a70316", + "metadata": { + "editable": true + }, + "source": [ + "where $g(0) = g_0$ with $\\gamma$ and $g_0$ being some chosen values.\n", + "\n", + "In this example, $\\gamma = 2$ and $g_0 = 10$." + ] + }, + { + "cell_type": "markdown", + "id": "15622597", + "metadata": { + "editable": true + }, + "source": [ + "## The trial solution\n", + "To begin with, a trial solution $g_t(t)$ must be chosen. A general trial solution for ordinary differential equations could be" + ] + }, + { + "cell_type": "markdown", + "id": "3661d5fe", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "g_t(x, P) = h_1(x) + h_2(x, N(x, P))\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "245327b3", + "metadata": { + "editable": true + }, + "source": [ + "with $h_1(x)$ ensuring that $g_t(x)$ satisfies some conditions and $h_2(x,N(x, P))$ an expression involving $x$ and the output from the neural network $N(x,P)$ with $P $ being the collection of the weights and biases for each layer. For now, it is assumed that the network consists of one input layer, one hidden layer, and one output layer." + ] + }, + { + "cell_type": "markdown", + "id": "57ae96b2", + "metadata": { + "editable": true + }, + "source": [ + "## Setup of Network\n", + "\n", + "In this network, there are no weights and bias at the input layer, so $P = \\{ P_{\\text{hidden}}, P_{\\text{output}} \\}$.\n", + "If there are $N_{\\text{hidden} }$ neurons in the hidden layer, then $P_{\\text{hidden}}$ is a $N_{\\text{hidden} } \\times (1 + N_{\\text{input}})$ matrix, given that there are $N_{\\text{input}}$ neurons in the input layer.\n", + "\n", + "The first column in $P_{\\text{hidden} }$ represents the bias for each neuron in the hidden layer and the second column represents the weights for each neuron in the hidden layer from the input layer.\n", + "If there are $N_{\\text{output} }$ neurons in the output layer, then $P_{\\text{output}} $ is a $N_{\\text{output} } \\times (1 + N_{\\text{hidden} })$ matrix.\n", + "\n", + "Its first column represents the bias of each neuron and the remaining columns represents the weights to each neuron.\n", + "\n", + "It is given that $g(0) = g_0$. The trial solution must fulfill this condition to be a proper solution of ([6](#solveode)). A possible way to ensure that $g_t(0, P) = g_0$, is to let $F(N(x,P)) = x \\cdot N(x,P)$ and $h_1(x) = g_0$. This gives the following trial solution:" + ] + }, + { + "cell_type": "markdown", + "id": "6e7ea73f", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation} \\label{trial} \\tag{7}\n", + "g_t(x, P) = g_0 + x \\cdot N(x, P)\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "3ef84086", + "metadata": { + "editable": true + }, + "source": [ + "## Reformulating the problem\n", + "\n", + "We wish that our neural network manages to minimize a given cost function.\n", + "\n", + "A reformulation of out equation, ([6](#solveode)), must therefore be done,\n", + "such that it describes the problem a neural network can solve for.\n", + "\n", + "The neural network must find the set of weights and biases $P$ such that the trial solution in ([7](#trial)) satisfies ([6](#solveode)).\n", + "\n", + "The trial solution" + ] + }, + { + "cell_type": "markdown", + "id": "03980965", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "g_t(x, P) = g_0 + x \\cdot N(x, P)\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "f838bf7c", + "metadata": { + "editable": true + }, + "source": [ + "has been chosen such that it already solves the condition $g(0) = g_0$. What remains, is to find $P$ such that" + ] + }, + { + "cell_type": "markdown", + "id": "3e1ebb62", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation} \\label{nnmin} \\tag{8}\n", + "g_t'(x, P) = - \\gamma g_t(x, P)\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "a85dcbea", + "metadata": { + "editable": true + }, + "source": [ + "is fulfilled as *best as possible*." + ] + }, + { + "cell_type": "markdown", + "id": "dc4a2fc0", + "metadata": { + "editable": true + }, + "source": [ + "## More technicalities\n", + "\n", + "The left hand side and right hand side of ([8](#nnmin)) must be computed separately, and then the neural network must choose weights and biases, contained in $P$, such that the sides are equal as best as possible.\n", + "This means that the absolute or squared difference between the sides must be as close to zero, ideally equal to zero.\n", + "In this case, the difference squared shows to be an appropriate measurement of how erroneous the trial solution is with respect to $P$ of the neural network.\n", + "\n", + "This gives the following cost function our neural network must solve for:" + ] + }, + { + "cell_type": "markdown", + "id": "20921b20", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\min_{P}\\Big\\{ \\big(g_t'(x, P) - ( -\\gamma g_t(x, P) \\big)^2 \\Big\\}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "06e89d99", + "metadata": { + "editable": true + }, + "source": [ + "(the notation $\\min_{P}\\{ f(x, P) \\}$ means that we desire to find $P$ that yields the minimum of $f(x, P)$)\n", + "\n", + "or, in terms of weights and biases for the hidden and output layer in our network:" + ] + }, + { + "cell_type": "markdown", + "id": "fb4b7d00", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\min_{P_{\\text{hidden} }, \\ P_{\\text{output} }}\\Big\\{ \\big(g_t'(x, \\{ P_{\\text{hidden} }, P_{\\text{output} }\\}) - ( -\\gamma g_t(x, \\{ P_{\\text{hidden} }, P_{\\text{output} }\\}) \\big)^2 \\Big\\}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "925d8872", + "metadata": { + "editable": true + }, + "source": [ + "for an input value $x$." + ] + }, + { + "cell_type": "markdown", + "id": "46f38d69", + "metadata": { + "editable": true + }, + "source": [ + "## More details\n", + "\n", + "If the neural network evaluates $g_t(x, P)$ at more values for $x$, say $N$ values $x_i$ for $i = 1, \\dots, N$, then the *total* error to minimize becomes" + ] + }, + { + "cell_type": "markdown", + "id": "adca56df", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation} \\label{min} \\tag{9}\n", + "\\min_{P}\\Big\\{\\frac{1}{N} \\sum_{i=1}^N \\big(g_t'(x_i, P) - ( -\\gamma g_t(x_i, P) \\big)^2 \\Big\\}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "9e260216", + "metadata": { + "editable": true + }, + "source": [ + "Letting $\\boldsymbol{x}$ be a vector with elements $x_i$ and $C(\\boldsymbol{x}, P) = \\frac{1}{N} \\sum_i \\big(g_t'(x_i, P) - ( -\\gamma g_t(x_i, P) \\big)^2$ denote the cost function, the minimization problem that our network must solve, becomes" + ] + }, + { + "cell_type": "markdown", + "id": "7d5e7f63", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\min_{P} C(\\boldsymbol{x}, P)\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "d7442d44", + "metadata": { + "editable": true + }, + "source": [ + "In terms of $P_{\\text{hidden} }$ and $P_{\\text{output} }$, this could also be expressed as\n", + "\n", + "$$\n", + "\\min_{P_{\\text{hidden} }, \\ P_{\\text{output} }} C(\\boldsymbol{x}, \\{P_{\\text{hidden} }, P_{\\text{output} }\\})\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "af21673a", + "metadata": { + "editable": true + }, + "source": [ + "## A possible implementation of a neural network\n", + "\n", + "For simplicity, it is assumed that the input is an array $\\boldsymbol{x} = (x_1, \\dots, x_N)$ with $N$ elements. It is at these points the neural network should find $P$ such that it fulfills ([9](#min)).\n", + "\n", + "First, the neural network must feed forward the inputs.\n", + "This means that $\\boldsymbol{x}s$ must be passed through an input layer, a hidden layer and a output layer. The input layer in this case, does not need to process the data any further.\n", + "The input layer will consist of $N_{\\text{input} }$ neurons, passing its element to each neuron in the hidden layer. The number of neurons in the hidden layer will be $N_{\\text{hidden} }$." 
+ ] + }, + { + "cell_type": "markdown", + "id": "6687f370", + "metadata": { + "editable": true + }, + "source": [ + "## Technicalities\n", + "\n", + "For the $i$-th in the hidden layer with weight $w_i^{\\text{hidden} }$ and bias $b_i^{\\text{hidden} }$, the weighting from the $j$-th neuron at the input layer is:" + ] + }, + { + "cell_type": "markdown", + "id": "7c07e210", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{aligned}\n", + "z_{i,j}^{\\text{hidden}} &= b_i^{\\text{hidden}} + w_i^{\\text{hidden}}x_j \\\\\n", + "&=\n", + "\\begin{pmatrix}\n", + "b_i^{\\text{hidden}} & w_i^{\\text{hidden}}\n", + "\\end{pmatrix}\n", + "\\begin{pmatrix}\n", + "1 \\\\\n", + "x_j\n", + "\\end{pmatrix}\n", + "\\end{aligned}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "7747386f", + "metadata": { + "editable": true + }, + "source": [ + "## Final technicalities I\n", + "\n", + "The result after weighting the inputs at the $i$-th hidden neuron can be written as a vector:" + ] + }, + { + "cell_type": "markdown", + "id": "981c5e4b", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{aligned}\n", + "\\boldsymbol{z}_{i}^{\\text{hidden}} &= \\Big( b_i^{\\text{hidden}} + w_i^{\\text{hidden}}x_1 , \\ b_i^{\\text{hidden}} + w_i^{\\text{hidden}} x_2, \\ \\dots \\, , \\ b_i^{\\text{hidden}} + w_i^{\\text{hidden}} x_N\\Big) \\\\\n", + "&=\n", + "\\begin{pmatrix}\n", + " b_i^{\\text{hidden}} & w_i^{\\text{hidden}}\n", + "\\end{pmatrix}\n", + "\\begin{pmatrix}\n", + "1 & 1 & \\dots & 1 \\\\\n", + "x_1 & x_2 & \\dots & x_N\n", + "\\end{pmatrix} \\\\\n", + "&= \\boldsymbol{p}_{i, \\text{hidden}}^T X\n", + "\\end{aligned}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "7eedb1ed", + "metadata": { + "editable": true + }, + "source": [ + "## Final technicalities II\n", + "\n", + "The vector $\\boldsymbol{p}_{i, \\text{hidden}}^T$ constitutes each row in $P_{\\text{hidden} }$, which contains the weights for the neural network to minimize according to ([9](#min)).\n", + "\n", + "After having found $\\boldsymbol{z}_{i}^{\\text{hidden}} $ for every $i$-th neuron within the hidden layer, the vector will be sent to an activation function $a_i(\\boldsymbol{z})$.\n", + "\n", + "In this example, the sigmoid function has been chosen to be the activation function for each hidden neuron:" + ] + }, + { + "cell_type": "markdown", + "id": "8507388c", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "f(z) = \\frac{1}{1 + \\exp{(-z)}}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "32c6ce19", + "metadata": { + "editable": true + }, + "source": [ + "It is possible to use other activations functions for the hidden layer also.\n", + "\n", + "The output $\\boldsymbol{x}_i^{\\text{hidden}}$ from each $i$-th hidden neuron is:\n", + "\n", + "$$\n", + "\\boldsymbol{x}_i^{\\text{hidden} } = f\\big( \\boldsymbol{z}_{i}^{\\text{hidden}} \\big)\n", + "$$\n", + "\n", + "The outputs $\\boldsymbol{x}_i^{\\text{hidden} } $ are then sent to the output layer.\n", + "\n", + "The output layer consists of one neuron in this case, and combines the\n", + "output from each of the neurons in the hidden layers. The output layer\n", + "combines the results from the hidden layer using some weights $w_i^{\\text{output}}$\n", + "and biases $b_i^{\\text{output}}$. In this case,\n", + "it is assumes that the number of neurons in the output layer is one." 
+ ] + }, + { + "cell_type": "markdown", + "id": "d3adb503", + "metadata": { + "editable": true + }, + "source": [ + "## Final technicalities III\n", + "\n", + "The procedure of weighting the output neuron $j$ in the hidden layer to the $i$-th neuron in the output layer is similar as for the hidden layer described previously." + ] + }, + { + "cell_type": "markdown", + "id": "41fb7d85", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{aligned}\n", + "z_{1,j}^{\\text{output}} & =\n", + "\\begin{pmatrix}\n", + "b_1^{\\text{output}} & \\boldsymbol{w}_1^{\\text{output}}\n", + "\\end{pmatrix}\n", + "\\begin{pmatrix}\n", + "1 \\\\\n", + "\\boldsymbol{x}_j^{\\text{hidden}}\n", + "\\end{pmatrix}\n", + "\\end{aligned}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "6af6c5f6", + "metadata": { + "editable": true + }, + "source": [ + "## Final technicalities IV\n", + "\n", + "Expressing $z_{1,j}^{\\text{output}}$ as a vector gives the following way of weighting the inputs from the hidden layer:" + ] + }, + { + "cell_type": "markdown", + "id": "bfdfcfe5", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{z}_{1}^{\\text{output}} =\n", + "\\begin{pmatrix}\n", + "b_1^{\\text{output}} & \\boldsymbol{w}_1^{\\text{output}}\n", + "\\end{pmatrix}\n", + "\\begin{pmatrix}\n", + "1 & 1 & \\dots & 1 \\\\\n", + "\\boldsymbol{x}_1^{\\text{hidden}} & \\boldsymbol{x}_2^{\\text{hidden}} & \\dots & \\boldsymbol{x}_N^{\\text{hidden}}\n", + "\\end{pmatrix}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "224fb7a0", + "metadata": { + "editable": true + }, + "source": [ + "In this case we seek a continuous range of values since we are approximating a function. This means that after computing $\\boldsymbol{z}_{1}^{\\text{output}}$ the neural network has finished its feed forward step, and $\\boldsymbol{z}_{1}^{\\text{output}}$ is the final output of the network." + ] + }, + { + "cell_type": "markdown", + "id": "03c8c39e", + "metadata": { + "editable": true + }, + "source": [ + "## Back propagation\n", + "\n", + "The next step is to decide how the parameters should be changed such that they minimize the cost function.\n", + "\n", + "The chosen cost function for this problem is" + ] + }, + { + "cell_type": "markdown", + "id": "f467feb4", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C(\\boldsymbol{x}, P) = \\frac{1}{N} \\sum_i \\big(g_t'(x_i, P) - ( -\\gamma g_t(x_i, P) \\big)^2\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "287a0aed", + "metadata": { + "editable": true + }, + "source": [ + "In order to minimize the cost function, an optimization method must be chosen.\n", + "\n", + "Here, gradient descent with a constant step size has been chosen." 
+ ] + }, + { + "cell_type": "markdown", + "id": "a49835f1", + "metadata": { + "editable": true + }, + "source": [ + "## Gradient descent\n", + "\n", + "The idea of the gradient descent algorithm is to update parameters in\n", + "a direction where the cost function decreases goes to a minimum.\n", + "\n", + "In general, the update of some parameters $\\boldsymbol{\\omega}$ given a cost\n", + "function defined by some weights $\\boldsymbol{\\omega}$, $C(\\boldsymbol{x},\n", + "\\boldsymbol{\\omega})$, goes as follows:" + ] + }, + { + "cell_type": "markdown", + "id": "62d6f51d", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{\\omega}_{\\text{new} } = \\boldsymbol{\\omega} - \\lambda \\nabla_{\\boldsymbol{\\omega}} C(\\boldsymbol{x}, \\boldsymbol{\\omega})\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "3ca20573", + "metadata": { + "editable": true + }, + "source": [ + "for a number of iterations or until $ \\big|\\big| \\boldsymbol{\\omega}_{\\text{new} } - \\boldsymbol{\\omega} \\big|\\big|$ becomes smaller than some given tolerance.\n", + "\n", + "The value of $\\lambda$ decides how large steps the algorithm must take\n", + "in the direction of $ \\nabla_{\\boldsymbol{\\omega}} C(\\boldsymbol{x}, \\boldsymbol{\\omega})$.\n", + "The notation $\\nabla_{\\boldsymbol{\\omega}}$ express the gradient with respect\n", + "to the elements in $\\boldsymbol{\\omega}$.\n", + "\n", + "In our case, we have to minimize the cost function $C(\\boldsymbol{x}, P)$ with\n", + "respect to the two sets of weights and biases, that is for the hidden\n", + "layer $P_{\\text{hidden} }$ and for the output layer $P_{\\text{output}\n", + "}$ .\n", + "\n", + "This means that $P_{\\text{hidden} }$ and $P_{\\text{output} }$ is updated by" + ] + }, + { + "cell_type": "markdown", + "id": "8b16bc94", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{aligned}\n", + "P_{\\text{hidden},\\text{new}} &= P_{\\text{hidden}} - \\lambda \\nabla_{P_{\\text{hidden}}} C(\\boldsymbol{x}, P) \\\\\n", + "P_{\\text{output},\\text{new}} &= P_{\\text{output}} - \\lambda \\nabla_{P_{\\text{output}}} C(\\boldsymbol{x}, P)\n", + "\\end{aligned}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "a339b3a7", + "metadata": { + "editable": true + }, + "source": [ + "## The code for solving the ODE" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "a63e587a", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "%matplotlib inline\n", + "\n", + "import autograd.numpy as np\n", + "from autograd import grad, elementwise_grad\n", + "import autograd.numpy.random as npr\n", + "from matplotlib import pyplot as plt\n", + "\n", + "def sigmoid(z):\n", + " return 1/(1 + np.exp(-z))\n", + "\n", + "# Assuming one input, hidden, and output layer\n", + "def neural_network(params, x):\n", + "\n", + " # Find the weights (including and biases) for the hidden and output layer.\n", + " # Assume that params is a list of parameters for each layer.\n", + " # The biases are the first element for each array in params,\n", + " # and the weights are the remaning elements in each array in params.\n", + "\n", + " w_hidden = params[0]\n", + " w_output = params[1]\n", + "\n", + " # Assumes input x being an one-dimensional array\n", + " num_values = np.size(x)\n", + " x = x.reshape(-1, num_values)\n", + "\n", + " # Assume that the input layer does nothing to the input x\n", + " x_input = x\n", + "\n", + " ## Hidden layer:\n", + "\n", + " # Add 
a row of ones to include bias\n", + " x_input = np.concatenate((np.ones((1,num_values)), x_input ), axis = 0)\n", + "\n", + " z_hidden = np.matmul(w_hidden, x_input)\n", + " x_hidden = sigmoid(z_hidden)\n", + "\n", + " ## Output layer:\n", + "\n", + " # Include bias:\n", + " x_hidden = np.concatenate((np.ones((1,num_values)), x_hidden ), axis = 0)\n", + "\n", + " z_output = np.matmul(w_output, x_hidden)\n", + " x_output = z_output\n", + "\n", + " return x_output\n", + "\n", + "# The trial solution using the deep neural network:\n", + "def g_trial(x,params, g0 = 10):\n", + " return g0 + x*neural_network(params,x)\n", + "\n", + "# The right side of the ODE:\n", + "def g(x, g_trial, gamma = 2):\n", + " return -gamma*g_trial\n", + "\n", + "# The cost function:\n", + "def cost_function(P, x):\n", + "\n", + " # Evaluate the trial function with the current parameters P\n", + " g_t = g_trial(x,P)\n", + "\n", + " # Find the derivative w.r.t x of the neural network\n", + " d_net_out = elementwise_grad(neural_network,1)(P,x)\n", + "\n", + " # Find the derivative w.r.t x of the trial function\n", + " d_g_t = elementwise_grad(g_trial,0)(x,P)\n", + "\n", + " # The right side of the ODE\n", + " func = g(x, g_t)\n", + "\n", + " err_sqr = (d_g_t - func)**2\n", + " cost_sum = np.sum(err_sqr)\n", + "\n", + " return cost_sum / np.size(err_sqr)\n", + "\n", + "# Solve the exponential decay ODE using neural network with one input, hidden, and output layer\n", + "def solve_ode_neural_network(x, num_neurons_hidden, num_iter, lmb):\n", + " ## Set up initial weights and biases\n", + "\n", + " # For the hidden layer\n", + " p0 = npr.randn(num_neurons_hidden, 2 )\n", + "\n", + " # For the output layer\n", + " p1 = npr.randn(1, num_neurons_hidden + 1 ) # +1 since bias is included\n", + "\n", + " P = [p0, p1]\n", + "\n", + " print('Initial cost: %g'%cost_function(P, x))\n", + "\n", + " ## Start finding the optimal weights using gradient descent\n", + "\n", + " # Find the Python function that represents the gradient of the cost function\n", + " # w.r.t the 0-th input argument -- that is the weights and biases in the hidden and output layer\n", + " cost_function_grad = grad(cost_function,0)\n", + "\n", + " # Let the update be done num_iter times\n", + " for i in range(num_iter):\n", + " # Evaluate the gradient at the current weights and biases in P.\n", + " # The cost_grad consist now of two arrays;\n", + " # one for the gradient w.r.t P_hidden and\n", + " # one for the gradient w.r.t P_output\n", + " cost_grad = cost_function_grad(P, x)\n", + "\n", + " P[0] = P[0] - lmb * cost_grad[0]\n", + " P[1] = P[1] - lmb * cost_grad[1]\n", + "\n", + " print('Final cost: %g'%cost_function(P, x))\n", + "\n", + " return P\n", + "\n", + "def g_analytic(x, gamma = 2, g0 = 10):\n", + " return g0*np.exp(-gamma*x)\n", + "\n", + "# Solve the given problem\n", + "if __name__ == '__main__':\n", + " # Set seed such that the weight are initialized\n", + " # with same weights and biases for every run.\n", + " npr.seed(15)\n", + "\n", + " ## Decide the vales of arguments to the function to solve\n", + " N = 10\n", + " x = np.linspace(0, 1, N)\n", + "\n", + " ## Set up the initial parameters\n", + " num_hidden_neurons = 10\n", + " num_iter = 10000\n", + " lmb = 0.001\n", + "\n", + " # Use the network\n", + " P = solve_ode_neural_network(x, num_hidden_neurons, num_iter, lmb)\n", + "\n", + " # Print the deviation from the trial solution and true solution\n", + " res = g_trial(x,P)\n", + " res_analytical = g_analytic(x)\n", + "\n", + " print('Max 
absolute difference: %g'%np.max(np.abs(res - res_analytical)))\n", + "\n", + " # Plot the results\n", + " plt.figure(figsize=(10,10))\n", + "\n", + " plt.title('Performance of neural network solving an ODE compared to the analytical solution')\n", + " plt.plot(x, res_analytical)\n", + " plt.plot(x, res[0,:])\n", + " plt.legend(['analytical','nn'])\n", + " plt.xlabel('x')\n", + " plt.ylabel('g(x)')\n", + " plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "85985bda", + "metadata": { + "editable": true + }, + "source": [ + "## The network with one input layer, specified number of hidden layers, and one output layer\n", + "\n", + "It is also possible to extend the construction of our network into a more general one, allowing the network to contain more than one hidden layers.\n", + "\n", + "The number of neurons within each hidden layer are given as a list of integers in the program below." + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "91831f8e", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import autograd.numpy as np\n", + "from autograd import grad, elementwise_grad\n", + "import autograd.numpy.random as npr\n", + "from matplotlib import pyplot as plt\n", + "\n", + "def sigmoid(z):\n", + " return 1/(1 + np.exp(-z))\n", + "\n", + "# The neural network with one input layer and one output layer,\n", + "# but with number of hidden layers specified by the user.\n", + "def deep_neural_network(deep_params, x):\n", + " # N_hidden is the number of hidden layers\n", + " # deep_params is a list, len() should be used\n", + " N_hidden = len(deep_params) - 1 # -1 since params consists of\n", + " # parameters to all the hidden\n", + " # layers AND the output layer.\n", + "\n", + " # Assumes input x being an one-dimensional array\n", + " num_values = np.size(x)\n", + " x = x.reshape(-1, num_values)\n", + "\n", + " # Assume that the input layer does nothing to the input x\n", + " x_input = x\n", + "\n", + " # Due to multiple hidden layers, define a variable referencing to the\n", + " # output of the previous layer:\n", + " x_prev = x_input\n", + "\n", + " ## Hidden layers:\n", + "\n", + " for l in range(N_hidden):\n", + " # From the list of parameters P; find the correct weigths and bias for this layer\n", + " w_hidden = deep_params[l]\n", + "\n", + " # Add a row of ones to include bias\n", + " x_prev = np.concatenate((np.ones((1,num_values)), x_prev ), axis = 0)\n", + "\n", + " z_hidden = np.matmul(w_hidden, x_prev)\n", + " x_hidden = sigmoid(z_hidden)\n", + "\n", + " # Update x_prev such that next layer can use the output from this layer\n", + " x_prev = x_hidden\n", + "\n", + " ## Output layer:\n", + "\n", + " # Get the weights and bias for this layer\n", + " w_output = deep_params[-1]\n", + "\n", + " # Include bias:\n", + " x_prev = np.concatenate((np.ones((1,num_values)), x_prev), axis = 0)\n", + "\n", + " z_output = np.matmul(w_output, x_prev)\n", + " x_output = z_output\n", + "\n", + " return x_output\n", + "\n", + "# The trial solution using the deep neural network:\n", + "def g_trial_deep(x,params, g0 = 10):\n", + " return g0 + x*deep_neural_network(params, x)\n", + "\n", + "# The right side of the ODE:\n", + "def g(x, g_trial, gamma = 2):\n", + " return -gamma*g_trial\n", + "\n", + "# The same cost function as before, but calls deep_neural_network instead.\n", + "def cost_function_deep(P, x):\n", + "\n", + " # Evaluate the trial function with the current parameters P\n", + " g_t = g_trial_deep(x,P)\n", + "\n", + " 
# Find the derivative w.r.t x of the neural network\n", + " d_net_out = elementwise_grad(deep_neural_network,1)(P,x)\n", + "\n", + " # Find the derivative w.r.t x of the trial function\n", + " d_g_t = elementwise_grad(g_trial_deep,0)(x,P)\n", + "\n", + " # The right side of the ODE\n", + " func = g(x, g_t)\n", + "\n", + " err_sqr = (d_g_t - func)**2\n", + " cost_sum = np.sum(err_sqr)\n", + "\n", + " return cost_sum / np.size(err_sqr)\n", + "\n", + "# Solve the exponential decay ODE using neural network with one input and one output layer,\n", + "# but with specified number of hidden layers from the user.\n", + "def solve_ode_deep_neural_network(x, num_neurons, num_iter, lmb):\n", + " # num_hidden_neurons is now a list of number of neurons within each hidden layer\n", + "\n", + " # The number of elements in the list num_hidden_neurons thus represents\n", + " # the number of hidden layers.\n", + "\n", + " # Find the number of hidden layers:\n", + " N_hidden = np.size(num_neurons)\n", + "\n", + " ## Set up initial weights and biases\n", + "\n", + " # Initialize the list of parameters:\n", + " P = [None]*(N_hidden + 1) # + 1 to include the output layer\n", + "\n", + " P[0] = npr.randn(num_neurons[0], 2 )\n", + " for l in range(1,N_hidden):\n", + " P[l] = npr.randn(num_neurons[l], num_neurons[l-1] + 1) # +1 to include bias\n", + "\n", + " # For the output layer\n", + " P[-1] = npr.randn(1, num_neurons[-1] + 1 ) # +1 since bias is included\n", + "\n", + " print('Initial cost: %g'%cost_function_deep(P, x))\n", + "\n", + " ## Start finding the optimal weights using gradient descent\n", + "\n", + " # Find the Python function that represents the gradient of the cost function\n", + " # w.r.t the 0-th input argument -- that is the weights and biases in the hidden and output layer\n", + " cost_function_deep_grad = grad(cost_function_deep,0)\n", + "\n", + " # Let the update be done num_iter times\n", + " for i in range(num_iter):\n", + " # Evaluate the gradient at the current weights and biases in P.\n", + " # The cost_grad consist now of N_hidden + 1 arrays; the gradient w.r.t the weights and biases\n", + " # in the hidden layers and output layers evaluated at x.\n", + " cost_deep_grad = cost_function_deep_grad(P, x)\n", + "\n", + " for l in range(N_hidden+1):\n", + " P[l] = P[l] - lmb * cost_deep_grad[l]\n", + "\n", + " print('Final cost: %g'%cost_function_deep(P, x))\n", + "\n", + " return P\n", + "\n", + "def g_analytic(x, gamma = 2, g0 = 10):\n", + " return g0*np.exp(-gamma*x)\n", + "\n", + "# Solve the given problem\n", + "if __name__ == '__main__':\n", + " npr.seed(15)\n", + "\n", + " ## Decide the vales of arguments to the function to solve\n", + " N = 10\n", + " x = np.linspace(0, 1, N)\n", + "\n", + " ## Set up the initial parameters\n", + " num_hidden_neurons = np.array([10,10])\n", + " num_iter = 10000\n", + " lmb = 0.001\n", + "\n", + " P = solve_ode_deep_neural_network(x, num_hidden_neurons, num_iter, lmb)\n", + "\n", + " res = g_trial_deep(x,P)\n", + " res_analytical = g_analytic(x)\n", + "\n", + " plt.figure(figsize=(10,10))\n", + "\n", + " plt.title('Performance of a deep neural network solving an ODE compared to the analytical solution')\n", + " plt.plot(x, res_analytical)\n", + " plt.plot(x, res[0,:])\n", + " plt.legend(['analytical','dnn'])\n", + " plt.ylabel('g(x)')\n", + " plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "e6de1553", + "metadata": { + "editable": true + }, + "source": [ + "## Example: Population growth\n", + "\n", + "A logistic model of population growth 
assumes that a population converges toward an equilibrium.\n", + "The population growth can be modeled by" + ] + }, + { + "cell_type": "markdown", + "id": "6e4c5e3a", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation} \\label{log} \\tag{10}\n", + "\tg'(t) = \\alpha g(t)(A - g(t))\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "64a97256", + "metadata": { + "editable": true + }, + "source": [ + "where $g(t)$ is the population density at time $t$, $\\alpha > 0$ the growth rate and $A > 0$ is the maximum population number in the environment.\n", + "Also, at $t = 0$ the population has the size $g(0) = g_0$, where $g_0$ is some chosen constant.\n", + "\n", + "In this example, similar network as for the exponential decay using Autograd has been used to solve the equation. However, as the implementation might suffer from e.g numerical instability\n", + "and high execution time (this might be more apparent in the examples solving PDEs),\n", + "using a library like TensorFlow is recommended.\n", + "Here, we stay with a more simple approach and implement for comparison, the simple forward Euler method." + ] + }, + { + "cell_type": "markdown", + "id": "94bb8aaa", + "metadata": { + "editable": true + }, + "source": [ + "## Setting up the problem\n", + "\n", + "Here, we will model a population $g(t)$ in an environment having carrying capacity $A$.\n", + "The population follows the model" + ] + }, + { + "cell_type": "markdown", + "id": "29ead54b", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation} \\label{solveode_population} \\tag{11}\n", + "g'(t) = \\alpha g(t)(A - g(t))\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "5685f6e2", + "metadata": { + "editable": true + }, + "source": [ + "where $g(0) = g_0$.\n", + "\n", + "In this example, we let $\\alpha = 2$, $A = 1$, and $g_0 = 1.2$." + ] + }, + { + "cell_type": "markdown", + "id": "adaea719", + "metadata": { + "editable": true + }, + "source": [ + "## The trial solution\n", + "\n", + "We will get a slightly different trial solution, as the boundary conditions are different\n", + "compared to the case for exponential decay.\n", + "\n", + "A possible trial solution satisfying the condition $g(0) = g_0$ could be\n", + "\n", + "$$\n", + "h_1(t) = g_0 + t \\cdot N(t,P)\n", + "$$\n", + "\n", + "with $N(t,P)$ being the output from the neural network with weights and biases for each layer collected in the set $P$.\n", + "\n", + "The analytical solution is\n", + "\n", + "$$\n", + "g(t) = \\frac{Ag_0}{g_0 + (A - g_0)\\exp(-\\alpha A t)}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "4ee7e543", + "metadata": { + "editable": true + }, + "source": [ + "## The program using Autograd\n", + "\n", + "The network will be the similar as for the exponential decay example, but with some small modifications for our problem." + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "e50f4369", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import autograd.numpy as np\n", + "from autograd import grad, elementwise_grad\n", + "import autograd.numpy.random as npr\n", + "from matplotlib import pyplot as plt\n", + "\n", + "def sigmoid(z):\n", + " return 1/(1 + np.exp(-z))\n", + "\n", + "# Function to get the parameters.\n", + "# Done such that one can easily change the paramaters after one's liking.\n", + "def get_parameters():\n", + " alpha = 2\n", + " A = 1\n", + " g0 = 1.2\n", + " return alpha, A, g0\n", + "\n", + "def deep_neural_network(deep_params, x):\n", + " # N_hidden is the number of hidden layers\n", + " # deep_params is a list, len() should be used\n", + " N_hidden = len(deep_params) - 1 # -1 since params consists of\n", + " # parameters to all the hidden\n", + " # layers AND the output layer.\n", + "\n", + " # Assumes input x being an one-dimensional array\n", + " num_values = np.size(x)\n", + " x = x.reshape(-1, num_values)\n", + "\n", + " # Assume that the input layer does nothing to the input x\n", + " x_input = x\n", + "\n", + " # Due to multiple hidden layers, define a variable referencing to the\n", + " # output of the previous layer:\n", + " x_prev = x_input\n", + "\n", + " ## Hidden layers:\n", + "\n", + " for l in range(N_hidden):\n", + " # From the list of parameters P; find the correct weigths and bias for this layer\n", + " w_hidden = deep_params[l]\n", + "\n", + " # Add a row of ones to include bias\n", + " x_prev = np.concatenate((np.ones((1,num_values)), x_prev ), axis = 0)\n", + "\n", + " z_hidden = np.matmul(w_hidden, x_prev)\n", + " x_hidden = sigmoid(z_hidden)\n", + "\n", + " # Update x_prev such that next layer can use the output from this layer\n", + " x_prev = x_hidden\n", + "\n", + " ## Output layer:\n", + "\n", + " # Get the weights and bias for this layer\n", + " w_output = deep_params[-1]\n", + "\n", + " # Include bias:\n", + " x_prev = np.concatenate((np.ones((1,num_values)), x_prev), axis = 0)\n", + "\n", + " z_output = np.matmul(w_output, x_prev)\n", + " x_output = 
z_output\n", + "\n", + " return x_output\n", + "\n", + "\n", + "\n", + "\n", + "def cost_function_deep(P, x):\n", + "\n", + " # Evaluate the trial function with the current parameters P\n", + " g_t = g_trial_deep(x,P)\n", + "\n", + " # Find the derivative w.r.t x of the trial function\n", + " d_g_t = elementwise_grad(g_trial_deep,0)(x,P)\n", + "\n", + " # The right side of the ODE\n", + " func = f(x, g_t)\n", + "\n", + " err_sqr = (d_g_t - func)**2\n", + " cost_sum = np.sum(err_sqr)\n", + "\n", + " return cost_sum / np.size(err_sqr)\n", + "\n", + "# The right side of the ODE:\n", + "def f(x, g_trial):\n", + " alpha,A, g0 = get_parameters()\n", + " return alpha*g_trial*(A - g_trial)\n", + "\n", + "# The trial solution using the deep neural network:\n", + "def g_trial_deep(x, params):\n", + " alpha,A, g0 = get_parameters()\n", + " return g0 + x*deep_neural_network(params,x)\n", + "\n", + "# The analytical solution:\n", + "def g_analytic(t):\n", + " alpha,A, g0 = get_parameters()\n", + " return A*g0/(g0 + (A - g0)*np.exp(-alpha*A*t))\n", + "\n", + "def solve_ode_deep_neural_network(x, num_neurons, num_iter, lmb):\n", + " # num_hidden_neurons is now a list of number of neurons within each hidden layer\n", + "\n", + " # Find the number of hidden layers:\n", + " N_hidden = np.size(num_neurons)\n", + "\n", + " ## Set up initial weigths and biases\n", + "\n", + " # Initialize the list of parameters:\n", + " P = [None]*(N_hidden + 1) # + 1 to include the output layer\n", + "\n", + " P[0] = npr.randn(num_neurons[0], 2 )\n", + " for l in range(1,N_hidden):\n", + " P[l] = npr.randn(num_neurons[l], num_neurons[l-1] + 1) # +1 to include bias\n", + "\n", + " # For the output layer\n", + " P[-1] = npr.randn(1, num_neurons[-1] + 1 ) # +1 since bias is included\n", + "\n", + " print('Initial cost: %g'%cost_function_deep(P, x))\n", + "\n", + " ## Start finding the optimal weigths using gradient descent\n", + "\n", + " # Find the Python function that represents the gradient of the cost function\n", + " # w.r.t the 0-th input argument -- that is the weights and biases in the hidden and output layer\n", + " cost_function_deep_grad = grad(cost_function_deep,0)\n", + "\n", + " # Let the update be done num_iter times\n", + " for i in range(num_iter):\n", + " # Evaluate the gradient at the current weights and biases in P.\n", + " # The cost_grad consist now of N_hidden + 1 arrays; the gradient w.r.t the weights and biases\n", + " # in the hidden layers and output layers evaluated at x.\n", + " cost_deep_grad = cost_function_deep_grad(P, x)\n", + "\n", + " for l in range(N_hidden+1):\n", + " P[l] = P[l] - lmb * cost_deep_grad[l]\n", + "\n", + " print('Final cost: %g'%cost_function_deep(P, x))\n", + "\n", + " return P\n", + "\n", + "if __name__ == '__main__':\n", + " npr.seed(4155)\n", + "\n", + " ## Decide the vales of arguments to the function to solve\n", + " Nt = 10\n", + " T = 1\n", + " t = np.linspace(0,T, Nt)\n", + "\n", + " ## Set up the initial parameters\n", + " num_hidden_neurons = [100, 50, 25]\n", + " num_iter = 1000\n", + " lmb = 1e-3\n", + "\n", + " P = solve_ode_deep_neural_network(t, num_hidden_neurons, num_iter, lmb)\n", + "\n", + " g_dnn_ag = g_trial_deep(t,P)\n", + " g_analytical = g_analytic(t)\n", + "\n", + " # Find the maximum absolute difference between the solutons:\n", + " diff_ag = np.max(np.abs(g_dnn_ag - g_analytical))\n", + " print(\"The max absolute difference between the solutions is: %g\"%diff_ag)\n", + "\n", + " plt.figure(figsize=(10,10))\n", + "\n", + " plt.title('Performance of 
neural network solving an ODE compared to the analytical solution')\n", + " plt.plot(t, g_analytical)\n", + " plt.plot(t, g_dnn_ag[0,:])\n", + " plt.legend(['analytical','nn'])\n", + " plt.xlabel('t')\n", + " plt.ylabel('g(t)')\n", + "\n", + " plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "cf212644", + "metadata": { + "editable": true + }, + "source": [ + "## Using forward Euler to solve the ODE\n", + "\n", + "A straightforward way of solving an ODE numerically, is to use Euler's method.\n", + "\n", + "Euler's method uses Taylor series to approximate the value at a function $f$ at a step $\\Delta x$ from $x$:\n", + "\n", + "$$\n", + "f(x + \\Delta x) \\approx f(x) + \\Delta x f'(x)\n", + "$$\n", + "\n", + "In our case, using Euler's method to approximate the value of $g$ at a step $\\Delta t$ from $t$ yields" + ] + }, + { + "cell_type": "markdown", + "id": "46f2fb77", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{aligned}\n", + " g(t + \\Delta t) &\\approx g(t) + \\Delta t g'(t) \\\\\n", + " &= g(t) + \\Delta t \\big(\\alpha g(t)(A - g(t))\\big)\n", + "\\end{aligned}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "aab2dfa5", + "metadata": { + "editable": true + }, + "source": [ + "along with the condition that $g(0) = g_0$.\n", + "\n", + "Let $t_i = i \\cdot \\Delta t$ where $\\Delta t = \\frac{T}{N_t-1}$ where $T$ is the final time our solver must solve for and $N_t$ the number of values for $t \\in [0, T]$ for $i = 0, \\dots, N_t-1$.\n", + "\n", + "For $i \\geq 1$, we have that" + ] + }, + { + "cell_type": "markdown", + "id": "8eea575e", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{aligned}\n", + "t_i &= i\\Delta t \\\\\n", + "&= (i - 1)\\Delta t + \\Delta t \\\\\n", + "&= t_{i-1} + \\Delta t\n", + "\\end{aligned}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b91b116d", + "metadata": { + "editable": true + }, + "source": [ + "Now, if $g_i = g(t_i)$ then" + ] + }, + { + "cell_type": "markdown", + "id": "b438159d", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation}\n", + " \\begin{aligned}\n", + " g_i &= g(t_i) \\\\\n", + " &= g(t_{i-1} + \\Delta t) \\\\\n", + " &\\approx g(t_{i-1}) + \\Delta t \\big(\\alpha g(t_{i-1})(A - g(t_{i-1}))\\big) \\\\\n", + " &= g_{i-1} + \\Delta t \\big(\\alpha g_{i-1}(A - g_{i-1})\\big)\n", + " \\end{aligned}\n", + "\\end{equation} \\label{odenum} \\tag{12}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "c4fcc89b", + "metadata": { + "editable": true + }, + "source": [ + "for $i \\geq 1$ and $g_0 = g(t_0) = g(0) = g_0$.\n", + "\n", + "Equation ([12](#odenum)) could be implemented in the following way,\n", + "extending the program that uses the network using Autograd:" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "98f55b29", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# Assume that all function definitions from the example program using Autograd\n", + "# are located here.\n", + "\n", + "if __name__ == '__main__':\n", + " npr.seed(4155)\n", + "\n", + " ## Decide the vales of arguments to the function to solve\n", + " Nt = 10\n", + " T = 1\n", + " t = np.linspace(0,T, Nt)\n", + "\n", + " ## Set up the initial parameters\n", + " num_hidden_neurons = [100,50,25]\n", + " num_iter = 1000\n", + " lmb = 1e-3\n", + "\n", + " P = solve_ode_deep_neural_network(t, num_hidden_neurons, num_iter, lmb)\n", + "\n", + " g_dnn_ag = g_trial_deep(t,P)\n", + " g_analytical = g_analytic(t)\n", + "\n", + " # Find the maximum absolute difference between the solutons:\n", + " diff_ag = np.max(np.abs(g_dnn_ag - g_analytical))\n", + " print(\"The max absolute difference between the solutions is: %g\"%diff_ag)\n", + "\n", + " plt.figure(figsize=(10,10))\n", + "\n", + " plt.title('Performance of neural network solving an ODE compared to the analytical solution')\n", + " plt.plot(t, g_analytical)\n", + " plt.plot(t, g_dnn_ag[0,:])\n", + " plt.legend(['analytical','nn'])\n", + " plt.xlabel('t')\n", + " plt.ylabel('g(t)')\n", + "\n", + " ## Find an approximation to the funtion using forward Euler\n", + "\n", + " alpha, A, g0 = get_parameters()\n", + " dt = T/(Nt - 1)\n", + "\n", + " # Perform forward Euler to solve the ODE\n", + " g_euler = np.zeros(Nt)\n", + " g_euler[0] = g0\n", + "\n", + " for i in range(1,Nt):\n", + " g_euler[i] = g_euler[i-1] + dt*(alpha*g_euler[i-1]*(A - g_euler[i-1]))\n", + "\n", + " # Print the errors done by each method\n", + " diff1 = np.max(np.abs(g_euler - g_analytical))\n", + " diff2 = np.max(np.abs(g_dnn_ag[0,:] - g_analytical))\n", + "\n", + " print('Max absolute difference between Euler method and analytical: %g'%diff1)\n", + " print('Max absolute difference between deep neural network and analytical: %g'%diff2)\n", + "\n", + " # Plot results\n", + " plt.figure(figsize=(10,10))\n", + "\n", + " plt.plot(t,g_euler)\n", + " plt.plot(t,g_analytical)\n", + " plt.plot(t,g_dnn_ag[0,:])\n", + "\n", + " plt.legend(['euler','analytical','dnn'])\n", + " plt.xlabel('Time t')\n", + " plt.ylabel('g(t)')\n", + "\n", + " plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "a6e8888e", + "metadata": { + "editable": true + }, + "source": [ + "## Example: Solving the one dimensional Poisson equation\n", + "\n", + "The Poisson equation for $g(x)$ in one dimension is" + ] + }, + { + "cell_type": "markdown", + "id": "ac2720d4", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation} \\label{poisson} \\tag{13}\n", + " -g''(x) = f(x)\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "65554b02", + "metadata": { + "editable": true + }, + "source": [ + "where $f(x)$ is a given function for $x \\in (0,1)$.\n", + "\n", + "The conditions that $g(x)$ is chosen to fulfill, are" + ] + }, + { + "cell_type": "markdown", + "id": "0cdf0586", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{align*}\n", + " g(0) &= 0 \\\\\n", + " g(1) &= 0\n", + "\\end{align*}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "f7e65a6a", + "metadata": { + "editable": true + }, + "source": [ + "This equation can be solved numerically using programs where e.g Autograd and TensorFlow are used.\n", + "The results from the networks can then be compared to the analytical solution.\n", + "In addition, it could be interesting to see how a typical method for numerically solving second order ODEs compares to the neural networks." + ] + }, + { + "cell_type": "markdown", + "id": "cd827e12", + "metadata": { + "editable": true + }, + "source": [ + "## The specific equation to solve for\n", + "\n", + "Here, the function $g(x)$ to solve for follows the equation" + ] + }, + { + "cell_type": "markdown", + "id": "a6100e41", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "-g''(x) = f(x),\\qquad x \\in (0,1)\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "15c06751", + "metadata": { + "editable": true + }, + "source": [ + "where $f(x)$ is a given function, along with the chosen conditions" + ] + }, + { + "cell_type": "markdown", + "id": "b2b4dd2f", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{aligned}\n", + "g(0) = g(1) = 0\n", + "\\end{aligned}\\label{cond} \\tag{14}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "2133aeed", + "metadata": { + "editable": true + }, + "source": [ + "In this example, we consider the case when $f(x) = (3x + x^2)\\exp(x)$.\n", + "\n", + "For this case, a possible trial solution satisfying the conditions could be" + ] + }, + { + "cell_type": "markdown", + "id": "5baf9b4b", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "g_t(x) = x \\cdot (1-x) \\cdot N(P,x)\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "ed82aba2", + "metadata": { + "editable": true + }, + "source": [ + "The analytical solution for this problem is" + ] + }, + { + "cell_type": "markdown", + "id": "c9bce69c", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "g(x) = x(1 - x)\\exp(x)\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "ce42c4a8", + "metadata": { + "editable": true + }, + "source": [ + "## Solving the equation using Autograd" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "2fcb9045", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import autograd.numpy as np\n", + "from autograd import grad, elementwise_grad\n", + "import autograd.numpy.random as npr\n", + "from matplotlib import pyplot as plt\n", + "\n", + "def sigmoid(z):\n", + " return 1/(1 + np.exp(-z))\n", + "\n", + "def deep_neural_network(deep_params, x):\n", + " # N_hidden is the number of hidden layers\n", + " # deep_params is a list, len() should be used\n", + " N_hidden = len(deep_params) - 1 # -1 since params consists of\n", + " # parameters to all the hidden\n", + " # layers AND the output layer.\n", + "\n", + " # Assumes input x being an one-dimensional array\n", + " num_values = np.size(x)\n", + " x = x.reshape(-1, num_values)\n", + "\n", + " # Assume that the input layer does nothing to the input x\n", + " x_input = x\n", + "\n", + " # Due to multiple hidden layers, define a variable referencing to the\n", + " # output of the previous layer:\n", + " x_prev = x_input\n", + "\n", + " ## Hidden layers:\n", + "\n", + " for l in range(N_hidden):\n", + " # From the list of parameters P; find the correct weigths and bias for this layer\n", + " w_hidden = deep_params[l]\n", + "\n", + " # Add a row of ones to include bias\n", + " x_prev = np.concatenate((np.ones((1,num_values)), x_prev ), axis = 0)\n", + "\n", + " z_hidden = np.matmul(w_hidden, x_prev)\n", + " x_hidden = sigmoid(z_hidden)\n", + "\n", + " # Update x_prev such that next layer can use the output from this layer\n", + " x_prev = x_hidden\n", + "\n", + " ## Output layer:\n", + "\n", + " # Get the weights and bias for this layer\n", + " w_output = deep_params[-1]\n", + "\n", + " # Include bias:\n", + " x_prev = np.concatenate((np.ones((1,num_values)), x_prev), axis = 0)\n", + "\n", + " z_output = np.matmul(w_output, x_prev)\n", + " x_output = z_output\n", + "\n", + " return x_output\n", + "\n", + "\n", + "def solve_ode_deep_neural_network(x, num_neurons, num_iter, lmb):\n", + " # num_hidden_neurons is now a list of number of neurons within each hidden layer\n", + "\n", + " # Find the number of hidden layers:\n", + " N_hidden = np.size(num_neurons)\n", + "\n", + " ## Set up initial weigths and biases\n", + "\n", + " # Initialize the list of parameters:\n", + " P = [None]*(N_hidden + 1) # + 1 to include the output layer\n", + "\n", + " P[0] = npr.randn(num_neurons[0], 2 )\n", + " for 
l in range(1,N_hidden):\n", + " P[l] = npr.randn(num_neurons[l], num_neurons[l-1] + 1) # +1 to include bias\n", + "\n", + " # For the output layer\n", + " P[-1] = npr.randn(1, num_neurons[-1] + 1 ) # +1 since bias is included\n", + "\n", + " print('Initial cost: %g'%cost_function_deep(P, x))\n", + "\n", + " ## Start finding the optimal weigths using gradient descent\n", + "\n", + " # Find the Python function that represents the gradient of the cost function\n", + " # w.r.t the 0-th input argument -- that is the weights and biases in the hidden and output layer\n", + " cost_function_deep_grad = grad(cost_function_deep,0)\n", + "\n", + " # Let the update be done num_iter times\n", + " for i in range(num_iter):\n", + " # Evaluate the gradient at the current weights and biases in P.\n", + " # The cost_grad consist now of N_hidden + 1 arrays; the gradient w.r.t the weights and biases\n", + " # in the hidden layers and output layers evaluated at x.\n", + " cost_deep_grad = cost_function_deep_grad(P, x)\n", + "\n", + " for l in range(N_hidden+1):\n", + " P[l] = P[l] - lmb * cost_deep_grad[l]\n", + "\n", + " print('Final cost: %g'%cost_function_deep(P, x))\n", + "\n", + " return P\n", + "\n", + "## Set up the cost function specified for this Poisson equation:\n", + "\n", + "# The right side of the ODE\n", + "def f(x):\n", + " return (3*x + x**2)*np.exp(x)\n", + "\n", + "def cost_function_deep(P, x):\n", + "\n", + " # Evaluate the trial function with the current parameters P\n", + " g_t = g_trial_deep(x,P)\n", + "\n", + " # Find the derivative w.r.t x of the trial function\n", + " d2_g_t = elementwise_grad(elementwise_grad(g_trial_deep,0))(x,P)\n", + "\n", + " right_side = f(x)\n", + "\n", + " err_sqr = (-d2_g_t - right_side)**2\n", + " cost_sum = np.sum(err_sqr)\n", + "\n", + " return cost_sum/np.size(err_sqr)\n", + "\n", + "# The trial solution:\n", + "def g_trial_deep(x,P):\n", + " return x*(1-x)*deep_neural_network(P,x)\n", + "\n", + "# The analytic solution;\n", + "def g_analytic(x):\n", + " return x*(1-x)*np.exp(x)\n", + "\n", + "if __name__ == '__main__':\n", + " npr.seed(4155)\n", + "\n", + " ## Decide the vales of arguments to the function to solve\n", + " Nx = 10\n", + " x = np.linspace(0,1, Nx)\n", + "\n", + " ## Set up the initial parameters\n", + " num_hidden_neurons = [200,100]\n", + " num_iter = 1000\n", + " lmb = 1e-3\n", + "\n", + " P = solve_ode_deep_neural_network(x, num_hidden_neurons, num_iter, lmb)\n", + "\n", + " g_dnn_ag = g_trial_deep(x,P)\n", + " g_analytical = g_analytic(x)\n", + "\n", + " # Find the maximum absolute difference between the solutons:\n", + " max_diff = np.max(np.abs(g_dnn_ag - g_analytical))\n", + " print(\"The max absolute difference between the solutions is: %g\"%max_diff)\n", + "\n", + " plt.figure(figsize=(10,10))\n", + "\n", + " plt.title('Performance of neural network solving an ODE compared to the analytical solution')\n", + " plt.plot(x, g_analytical)\n", + " plt.plot(x, g_dnn_ag[0,:])\n", + " plt.legend(['analytical','nn'])\n", + " plt.xlabel('x')\n", + " plt.ylabel('g(x)')\n", + " plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "9db2e30e", + "metadata": { + "editable": true + }, + "source": [ + "## Comparing with a numerical scheme\n", + "\n", + "The Poisson equation is possible to solve using Taylor series to approximate the second derivative.\n", + "\n", + "Using Taylor series, the second derivative can be expressed as\n", + "\n", + "$$\n", + "g''(x) = \\frac{g(x + \\Delta x) - 2g(x) + g(x-\\Delta x)}{\\Delta x^2} + E_{\\Delta 
x}(x)\n", + "$$\n", + "\n", + "where $\\Delta x$ is a small step size and $E_{\\Delta x}(x)$ being the error term.\n", + "\n", + "Looking away from the error terms gives an approximation to the second derivative:" + ] + }, + { + "cell_type": "markdown", + "id": "2cea098e", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation} \\label{approx} \\tag{15}\n", + "g''(x) \\approx \\frac{g(x + \\Delta x) - 2g(x) + g(x-\\Delta x)}{\\Delta x^2}\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "4606d139", + "metadata": { + "editable": true + }, + "source": [ + "If $x_i = i \\Delta x = x_{i-1} + \\Delta x$ and $g_i = g(x_i)$ for $i = 1,\\dots N_x - 2$ with $N_x$ being the number of values for $x$, ([15](#approx)) becomes" + ] + }, + { + "cell_type": "markdown", + "id": "bf52b218", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{aligned}\n", + "g''(x_i) &\\approx \\frac{g(x_i + \\Delta x) - 2g(x_i) + g(x_i -\\Delta x)}{\\Delta x^2} \\\\\n", + "&= \\frac{g_{i+1} - 2g_i + g_{i-1}}{\\Delta x^2}\n", + "\\end{aligned}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "5649b303", + "metadata": { + "editable": true + }, + "source": [ + "Since we know from our problem that" + ] + }, + { + "cell_type": "markdown", + "id": "cabbaeeb", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{aligned}\n", + "-g''(x) &= f(x) \\\\\n", + "&= (3x + x^2)\\exp(x)\n", + "\\end{aligned}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "9116da9a", + "metadata": { + "editable": true + }, + "source": [ + "along with the conditions $g(0) = g(1) = 0$,\n", + "the following scheme can be used to find an approximate solution for $g(x)$ numerically:" + ] + }, + { + "cell_type": "markdown", + "id": "fa0313ed", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation}\n", + " \\begin{aligned}\n", + " -\\Big( \\frac{g_{i+1} - 2g_i + g_{i-1}}{\\Delta x^2} \\Big) &= f(x_i) \\\\\n", + " -g_{i+1} + 2g_i - g_{i-1} &= \\Delta x^2 f(x_i)\n", + " \\end{aligned}\n", + "\\end{equation} \\label{odesys} \\tag{16}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "d4bff256", + "metadata": { + "editable": true + }, + "source": [ + "for $i = 1, \\dots, N_x - 2$ where $g_0 = g_{N_x - 1} = 0$ and $f(x_i) = (3x_i + x_i^2)\\exp(x_i)$, which is given for our specific problem.\n", + "\n", + "The equation can be rewritten into a matrix equation:" + ] + }, + { + "cell_type": "markdown", + "id": "2817b619", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{aligned}\n", + "\\begin{pmatrix}\n", + "2 & -1 & 0 & \\dots & 0 \\\\\n", + "-1 & 2 & -1 & \\dots & 0 \\\\\n", + "\\vdots & & \\ddots & & \\vdots \\\\\n", + "0 & \\dots & -1 & 2 & -1 \\\\\n", + "0 & \\dots & 0 & -1 & 2\\\\\n", + "\\end{pmatrix}\n", + "\\begin{pmatrix}\n", + "g_1 \\\\\n", + "g_2 \\\\\n", + "\\vdots \\\\\n", + "g_{N_x - 3} \\\\\n", + "g_{N_x - 2}\n", + "\\end{pmatrix}\n", + "&=\n", + "\\Delta x^2\n", + "\\begin{pmatrix}\n", + "f(x_1) \\\\\n", + "f(x_2) \\\\\n", + "\\vdots \\\\\n", + "f(x_{N_x - 3}) \\\\\n", + "f(x_{N_x - 2})\n", + "\\end{pmatrix} \\\\\n", + "\\boldsymbol{A}\\boldsymbol{g} &= \\boldsymbol{f},\n", + "\\end{aligned}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "5130b233", + "metadata": { + "editable": true + }, + "source": [ + "which makes it possible to solve for the vector $\\boldsymbol{g}$." + ] + }, + { + "cell_type": "markdown", + "id": "18a4fdda", + "metadata": { + "editable": true + }, + "source": [ + "## Setting up the code\n", + "\n", + "We can then compare the result from this numerical scheme with the output from our network using Autograd:" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "3cff184d", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import autograd.numpy as np\n", + "from autograd import grad, elementwise_grad\n", + "import autograd.numpy.random as npr\n", + "from matplotlib import pyplot as plt\n", + "\n", + "def sigmoid(z):\n", + " return 1/(1 + np.exp(-z))\n", + "\n", + "def deep_neural_network(deep_params, x):\n", + " # N_hidden is the number of hidden layers\n", + " # deep_params is a list, len() should be used\n", + " N_hidden = len(deep_params) - 1 # -1 since params consists of\n", + " # parameters to all the hidden\n", + " # layers AND the output layer.\n", + "\n", + " # Assumes input x being an one-dimensional array\n", + " num_values = np.size(x)\n", + " x = x.reshape(-1, num_values)\n", + "\n", + " # Assume that the input layer does nothing to the input x\n", + " x_input = x\n", + "\n", + " # Due to multiple hidden layers, define a variable referencing to the\n", + " # output of the previous layer:\n", + " x_prev = x_input\n", + "\n", + " ## Hidden layers:\n", + "\n", + " for l in range(N_hidden):\n", + " # From the list of parameters P; find the correct weigths and bias for this layer\n", + " w_hidden = deep_params[l]\n", + "\n", + " # Add a row of ones to include bias\n", + " x_prev = np.concatenate((np.ones((1,num_values)), x_prev ), axis = 0)\n", + "\n", + " z_hidden = np.matmul(w_hidden, x_prev)\n", + " x_hidden = sigmoid(z_hidden)\n", + "\n", + " # Update x_prev such that next layer can use the output from this layer\n", + " x_prev = x_hidden\n", + "\n", + " ## Output layer:\n", + "\n", + " # 
Get the weights and bias for this layer\n", + " w_output = deep_params[-1]\n", + "\n", + " # Include bias:\n", + " x_prev = np.concatenate((np.ones((1,num_values)), x_prev), axis = 0)\n", + "\n", + " z_output = np.matmul(w_output, x_prev)\n", + " x_output = z_output\n", + "\n", + " return x_output\n", + "\n", + "\n", + "def solve_ode_deep_neural_network(x, num_neurons, num_iter, lmb):\n", + " # num_hidden_neurons is now a list of number of neurons within each hidden layer\n", + "\n", + " # Find the number of hidden layers:\n", + " N_hidden = np.size(num_neurons)\n", + "\n", + " ## Set up initial weigths and biases\n", + "\n", + " # Initialize the list of parameters:\n", + " P = [None]*(N_hidden + 1) # + 1 to include the output layer\n", + "\n", + " P[0] = npr.randn(num_neurons[0], 2 )\n", + " for l in range(1,N_hidden):\n", + " P[l] = npr.randn(num_neurons[l], num_neurons[l-1] + 1) # +1 to include bias\n", + "\n", + " # For the output layer\n", + " P[-1] = npr.randn(1, num_neurons[-1] + 1 ) # +1 since bias is included\n", + "\n", + " print('Initial cost: %g'%cost_function_deep(P, x))\n", + "\n", + " ## Start finding the optimal weigths using gradient descent\n", + "\n", + " # Find the Python function that represents the gradient of the cost function\n", + " # w.r.t the 0-th input argument -- that is the weights and biases in the hidden and output layer\n", + " cost_function_deep_grad = grad(cost_function_deep,0)\n", + "\n", + " # Let the update be done num_iter times\n", + " for i in range(num_iter):\n", + " # Evaluate the gradient at the current weights and biases in P.\n", + " # The cost_grad consist now of N_hidden + 1 arrays; the gradient w.r.t the weights and biases\n", + " # in the hidden layers and output layers evaluated at x.\n", + " cost_deep_grad = cost_function_deep_grad(P, x)\n", + "\n", + " for l in range(N_hidden+1):\n", + " P[l] = P[l] - lmb * cost_deep_grad[l]\n", + "\n", + " print('Final cost: %g'%cost_function_deep(P, x))\n", + "\n", + " return P\n", + "\n", + "## Set up the cost function specified for this Poisson equation:\n", + "\n", + "# The right side of the ODE\n", + "def f(x):\n", + " return (3*x + x**2)*np.exp(x)\n", + "\n", + "def cost_function_deep(P, x):\n", + "\n", + " # Evaluate the trial function with the current parameters P\n", + " g_t = g_trial_deep(x,P)\n", + "\n", + " # Find the derivative w.r.t x of the trial function\n", + " d2_g_t = elementwise_grad(elementwise_grad(g_trial_deep,0))(x,P)\n", + "\n", + " right_side = f(x)\n", + "\n", + " err_sqr = (-d2_g_t - right_side)**2\n", + " cost_sum = np.sum(err_sqr)\n", + "\n", + " return cost_sum/np.size(err_sqr)\n", + "\n", + "# The trial solution:\n", + "def g_trial_deep(x,P):\n", + " return x*(1-x)*deep_neural_network(P,x)\n", + "\n", + "# The analytic solution;\n", + "def g_analytic(x):\n", + " return x*(1-x)*np.exp(x)\n", + "\n", + "if __name__ == '__main__':\n", + " npr.seed(4155)\n", + "\n", + " ## Decide the vales of arguments to the function to solve\n", + " Nx = 10\n", + " x = np.linspace(0,1, Nx)\n", + "\n", + " ## Set up the initial parameters\n", + " num_hidden_neurons = [200,100]\n", + " num_iter = 1000\n", + " lmb = 1e-3\n", + "\n", + " P = solve_ode_deep_neural_network(x, num_hidden_neurons, num_iter, lmb)\n", + "\n", + " g_dnn_ag = g_trial_deep(x,P)\n", + " g_analytical = g_analytic(x)\n", + "\n", + " # Find the maximum absolute difference between the solutons:\n", + "\n", + " plt.figure(figsize=(10,10))\n", + "\n", + " plt.title('Performance of neural network solving an ODE compared to the 
analytical solution')\n", + " plt.plot(x, g_analytical)\n", + " plt.plot(x, g_dnn_ag[0,:])\n", + " plt.legend(['analytical','nn'])\n", + " plt.xlabel('x')\n", + " plt.ylabel('g(x)')\n", + "\n", + " ## Perform the computation using the numerical scheme\n", + "\n", + " dx = 1/(Nx - 1)\n", + "\n", + " # Set up the matrix A\n", + " A = np.zeros((Nx-2,Nx-2))\n", + "\n", + " A[0,0] = 2\n", + " A[0,1] = -1\n", + "\n", + " for i in range(1,Nx-3):\n", + " A[i,i-1] = -1\n", + " A[i,i] = 2\n", + " A[i,i+1] = -1\n", + "\n", + " A[Nx - 3, Nx - 4] = -1\n", + " A[Nx - 3, Nx - 3] = 2\n", + "\n", + " # Set up the vector f\n", + " f_vec = dx**2 * f(x[1:-1])\n", + "\n", + " # Solve the equation\n", + " g_res = np.linalg.solve(A,f_vec)\n", + "\n", + " g_vec = np.zeros(Nx)\n", + " g_vec[1:-1] = g_res\n", + "\n", + " # Print the differences between each method\n", + " max_diff1 = np.max(np.abs(g_dnn_ag - g_analytical))\n", + " max_diff2 = np.max(np.abs(g_vec - g_analytical))\n", + " print(\"The max absolute difference between the analytical solution and DNN Autograd: %g\"%max_diff1)\n", + " print(\"The max absolute difference between the analytical solution and numerical scheme: %g\"%max_diff2)\n", + "\n", + " # Plot the results\n", + " plt.figure(figsize=(10,10))\n", + "\n", + " plt.plot(x,g_vec)\n", + " plt.plot(x,g_analytical)\n", + " plt.plot(x,g_dnn_ag[0,:])\n", + "\n", + " plt.legend(['numerical scheme','analytical','dnn'])\n", + " plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "89115be0", + "metadata": { + "editable": true + }, + "source": [ + "## Partial Differential Equations\n", + "\n", + "A partial differential equation (PDE) has a solution here the function\n", + "is defined by multiple variables. The equation may involve all kinds\n", + "of combinations of which variables the function is differentiated with\n", + "respect to.\n", + "\n", + "In general, a partial differential equation for a function $g(x_1,\\dots,x_N)$ with $N$ variables may be expressed as" + ] + }, + { + "cell_type": "markdown", + "id": "c43a6341", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation} \\label{PDE} \\tag{17}\n", + " f\\left(x_1, \\, \\dots \\, , x_N, \\frac{\\partial g(x_1,\\dots,x_N) }{\\partial x_1}, \\dots , \\frac{\\partial g(x_1,\\dots,x_N) }{\\partial x_N}, \\frac{\\partial g(x_1,\\dots,x_N) }{\\partial x_1\\partial x_2}, \\, \\dots \\, , \\frac{\\partial^n g(x_1,\\dots,x_N) }{\\partial x_N^n} \\right) = 0\n", + "\\end{equation}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "218a7a68", + "metadata": { + "editable": true + }, + "source": [ + "where $f$ is an expression involving all kinds of possible mixed derivatives of $g(x_1,\\dots,x_N)$ up to an order $n$. In order for the solution to be unique, some additional conditions must also be given." + ] + }, + { + "cell_type": "markdown", + "id": "902f8f61", + "metadata": { + "editable": true + }, + "source": [ + "## Type of problem\n", + "\n", + "The problem our network must solve for, is similar to the ODE case.\n", + "We must have a trial solution $g_t$ at hand.\n", + "\n", + "For instance, the trial solution could be expressed as" + ] + }, + { + "cell_type": "markdown", + "id": "1c2bbcbd", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{align*}\n", + " g_t(x_1,\\dots,x_N) = h_1(x_1,\\dots,x_N) + h_2(x_1,\\dots,x_N,N(x_1,\\dots,x_N,P))\n", + "\\end{align*}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "73f5bf7b", + "metadata": { + "editable": true + }, + "source": [ + "where $h_1(x_1,\\dots,x_N)$ is a function that ensures $g_t(x_1,\\dots,x_N)$ satisfies some given conditions.\n", + "The neural network $N(x_1,\\dots,x_N,P)$ has weights and biases described by $P$ and $h_2(x_1,\\dots,x_N,N(x_1,\\dots,x_N,P))$ is an expression using the output from the neural network in some way.\n", + "\n", + "The role of the function $h_2(x_1,\\dots,x_N,N(x_1,\\dots,x_N,P))$, is to ensure that the output of $N(x_1,\\dots,x_N,P)$ is zero when $g_t(x_1,\\dots,x_N)$ is evaluated at the values of $x_1,\\dots,x_N$ where the given conditions must be satisfied. The function $h_1(x_1,\\dots,x_N)$ should alone make $g_t(x_1,\\dots,x_N)$ satisfy the conditions." + ] + }, + { + "cell_type": "markdown", + "id": "dbb4ece5", + "metadata": { + "editable": true + }, + "source": [ + "## Network requirements\n", + "\n", + "The network tries then the minimize the cost function following the\n", + "same ideas as described for the ODE case, but now with more than one\n", + "variables to consider. The concept still remains the same; find a set\n", + "of parameters $P$ such that the expression $f$ in ([17](#PDE)) is as\n", + "close to zero as possible.\n", + "\n", + "As for the ODE case, the cost function is the mean squared error that\n", + "the network must try to minimize. 
The cost function for the network to\n", + "minimize is" + ] + }, + { + "cell_type": "markdown", + "id": "d01d3943", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C\\left(x_1, \\dots, x_N, P\\right) = \\left( f\\left(x_1, \\, \\dots \\, , x_N, \\frac{\\partial g(x_1,\\dots,x_N) }{\\partial x_1}, \\dots , \\frac{\\partial g(x_1,\\dots,x_N) }{\\partial x_N}, \\frac{\\partial g(x_1,\\dots,x_N) }{\\partial x_1\\partial x_2}, \\, \\dots \\, , \\frac{\\partial^n g(x_1,\\dots,x_N) }{\\partial x_N^n} \\right) \\right)^2\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "6514db22", + "metadata": { + "editable": true + }, + "source": [ + "## More details\n", + "\n", + "If we let $\\boldsymbol{x} = \\big( x_1, \\dots, x_N \\big)$ be an array containing the values for $x_1, \\dots, x_N$ respectively, the cost function can be reformulated into the following:" + ] + }, + { + "cell_type": "markdown", + "id": "5a0ed10c", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C\\left(\\boldsymbol{x}, P\\right) = f\\left( \\left( \\boldsymbol{x}, \\frac{\\partial g(\\boldsymbol{x}) }{\\partial x_1}, \\dots , \\frac{\\partial g(\\boldsymbol{x}) }{\\partial x_N}, \\frac{\\partial g(\\boldsymbol{x}) }{\\partial x_1\\partial x_2}, \\, \\dots \\, , \\frac{\\partial^n g(\\boldsymbol{x}) }{\\partial x_N^n} \\right) \\right)^2\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "200fc78c", + "metadata": { + "editable": true + }, + "source": [ + "If we also have $M$ different sets of values for $x_1, \\dots, x_N$, that is $\\boldsymbol{x}_i = \\big(x_1^{(i)}, \\dots, x_N^{(i)}\\big)$ for $i = 1,\\dots,M$ being the rows in matrix $X$, the cost function can be generalized into" + ] + }, + { + "cell_type": "markdown", + "id": "0c87647d", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "C\\left(X, P \\right) = \\sum_{i=1}^M f\\left( \\left( \\boldsymbol{x}_i, \\frac{\\partial g(\\boldsymbol{x}_i) }{\\partial x_1}, \\dots , \\frac{\\partial g(\\boldsymbol{x}_i) }{\\partial x_N}, \\frac{\\partial g(\\boldsymbol{x}_i) }{\\partial x_1\\partial x_2}, \\, \\dots \\, , \\frac{\\partial^n g(\\boldsymbol{x}_i) }{\\partial x_N^n} \\right) \\right)^2.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "6484a267", + "metadata": { + "editable": true + }, + "source": [ + "## Example: The diffusion equation\n", + "\n", + "In one spatial dimension, the equation reads" + ] + }, + { + "cell_type": "markdown", + "id": "2c2a2467", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\frac{\\partial g(x,t)}{\\partial t} = \\frac{\\partial^2 g(x,t)}{\\partial x^2}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "6df58357", + "metadata": { + "editable": true + }, + "source": [ + "where a possible choice of conditions are" + ] + }, + { + "cell_type": "markdown", + "id": "13d9c7f6", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{align*}\n", + "g(0,t) &= 0 ,\\qquad t \\geq 0 \\\\\n", + "g(1,t) &= 0, \\qquad t \\geq 0 \\\\\n", + "g(x,0) &= u(x),\\qquad x\\in [0,1]\n", + "\\end{align*}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "627708ec", + "metadata": { + "editable": true + }, + "source": [ + "with $u(x)$ being some given function." 
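+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "For the diffusion equation above, the $M$ different input configurations entering the cost function are simply all pairs $(x_i,t_j)$ on a space-time grid.\n",
+    "A minimal sketch (plain NumPy, grid sizes chosen arbitrarily for illustration) of how these points can be collected as the rows of the matrix $X$:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": false,
+    "editable": true
+   },
+   "outputs": [],
+   "source": [
+    "import numpy as np\n",
+    "\n",
+    "# Grid sizes chosen arbitrarily for illustration\n",
+    "Nx, Nt = 4, 3\n",
+    "x = np.linspace(0, 1, Nx)\n",
+    "t = np.linspace(0, 1, Nt)\n",
+    "\n",
+    "# All pairs (x_i, t_j) as rows of X, one row per input configuration\n",
+    "X = np.array([[x_, t_] for x_ in x for t_ in t])\n",
+    "print(X.shape)  # (M, 2) with M = Nx*Nt = 12"
+   ]
+  },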
+ ] + }, + { + "cell_type": "markdown", + "id": "43cdd945", + "metadata": { + "editable": true + }, + "source": [ + "## Defining the problem\n", + "\n", + "For this case, we want to find $g(x,t)$ such that" + ] + }, + { + "cell_type": "markdown", + "id": "ccdcb67e", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "
    \n", + "\n", + "$$\n", + "\\begin{equation}\n", + " \\frac{\\partial g(x,t)}{\\partial t} = \\frac{\\partial^2 g(x,t)}{\\partial x^2}\n", + "\\end{equation} \\label{diffonedim} \\tag{18}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "ebe711f8", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "2174f30f", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{align*}\n", + "g(0,t) &= 0 ,\\qquad t \\geq 0 \\\\\n", + "g(1,t) &= 0, \\qquad t \\geq 0 \\\\\n", + "g(x,0) &= u(x),\\qquad x\\in [0,1]\n", + "\\end{align*}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "083ed2ff", + "metadata": { + "editable": true + }, + "source": [ + "with $u(x) = \\sin(\\pi x)$.\n", + "\n", + "First, let us set up the deep neural network.\n", + "The deep neural network will follow the same structure as discussed in the examples solving the ODEs.\n", + "First, we will look into how Autograd could be used in a network tailored to solve for bivariate functions." + ] + }, + { + "cell_type": "markdown", + "id": "cf5e3f46", + "metadata": { + "editable": true + }, + "source": [ + "## Setting up the network using Autograd\n", + "\n", + "The only change to do here, is to extend our network such that\n", + "functions of multiple parameters are correctly handled. In this case\n", + "we have two variables in our function to solve for, that is time $t$\n", + "and position $x$. The variables will be represented by a\n", + "one-dimensional array in the program. The program will evaluate the\n", + "network at each possible pair $(x,t)$, given an array for the desired\n", + "$x$-values and $t$-values to approximate the solution at." + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "4fee106b", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "def sigmoid(z):\n", + " return 1/(1 + np.exp(-z))\n", + "\n", + "def deep_neural_network(deep_params, x):\n", + " # x is now a point and a 1D numpy array; make it a column vector\n", + " num_coordinates = np.size(x,0)\n", + " x = x.reshape(num_coordinates,-1)\n", + "\n", + " num_points = np.size(x,1)\n", + "\n", + " # N_hidden is the number of hidden layers\n", + " N_hidden = len(deep_params) - 1 # -1 since params consist of parameters to all the hidden layers AND the output layer\n", + "\n", + " # Assume that the input layer does nothing to the input x\n", + " x_input = x\n", + " x_prev = x_input\n", + "\n", + " ## Hidden layers:\n", + "\n", + " for l in range(N_hidden):\n", + " # From the list of parameters P; find the correct weigths and bias for this layer\n", + " w_hidden = deep_params[l]\n", + "\n", + " # Add a row of ones to include bias\n", + " x_prev = np.concatenate((np.ones((1,num_points)), x_prev ), axis = 0)\n", + "\n", + " z_hidden = np.matmul(w_hidden, x_prev)\n", + " x_hidden = sigmoid(z_hidden)\n", + "\n", + " # Update x_prev such that next layer can use the output from this layer\n", + " x_prev = x_hidden\n", + "\n", + " ## Output layer:\n", + "\n", + " # Get the weights and bias for this layer\n", + " w_output = deep_params[-1]\n", + "\n", + " # Include bias:\n", + " x_prev = np.concatenate((np.ones((1,num_points)), x_prev), axis = 0)\n", + "\n", + " z_output = np.matmul(w_output, x_prev)\n", + " x_output = z_output\n", + "\n", + " return x_output[0][0]" + ] + }, + { + "cell_type": "markdown", + "id": "63e5fb7e", + "metadata": { + "editable": true + }, + "source": [ + "## Setting up the network using 
Autograd; The trial solution\n", + "\n", + "The cost function must then iterate through the given arrays\n", + "containing values for $x$ and $t$, define the point $(x,t)$ at which the deep\n", + "neural network and the trial solution are evaluated, and then find\n", + "the Jacobian of the trial solution.\n", + "\n", + "A possible trial solution for this PDE is\n", + "\n", + "$$\n", + "g_t(x,t) = h_1(x,t) + x(1-x)tN(x,t,P)\n", + "$$\n", + "\n", + "with $h_1(x,t)$ being a function ensuring that $g_t(x,t)$ satisfies our given conditions, and $N(x,t,P)$ being the output from the deep neural network using weights and biases for each layer from $P$.\n", + "\n", + "To fulfill the conditions, $h_1(x,t)$ could be:\n", + "\n", + "$$\n", + "h_1(x,t) = (1-t)\Big(u(x) - \big((1-x)u(0) + x u(1)\big)\Big) = (1-t)u(x) = (1-t)\sin(\pi x)\n", + "$$\n", + "since $u(0) = u(1) = 0$ and $u(x) = \sin(\pi x)$." + ] + }, + { + "cell_type": "markdown", + "id": "50cfea81", + "metadata": { + "editable": true + }, + "source": [ + "## Why the Jacobian?\n", + "\n", + "The Jacobian is used because the program must find the derivative of\n", + "the trial solution with respect to $x$ and $t$.\n", + "\n", + "This makes it necessary to compute the Jacobian matrix, as we want\n", + "to evaluate the gradient with respect to $x$ and $t$ (note that the\n", + "Jacobian of a scalar-valued multivariate function is simply its\n", + "gradient).\n", + "\n", + "In Autograd, the differentiation is by default done with respect to\n", + "the first input argument of your Python function. Since the point is\n", + "an array representing $x$ and $t$, the Jacobian is calculated using\n", + "the values of $x$ and $t$.\n", + "\n", + "To find the second derivative with respect to $x$ and $t$, the\n", + "Jacobian can be computed a second time. The result is a Hessian\n", + "matrix, which is the matrix containing all the possible second order\n", + "mixed derivatives of $g(x,t)$."
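+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "Before assembling the cost function, here is a minimal sketch (using Autograd on a simple, hypothetical test function $h(x,t)=\sin(\pi x)e^{-t}$) of how the Jacobian and the Hessian are indexed to extract $\partial h/\partial t$ and $\partial^2 h/\partial x^2$ at a point $(x,t)$:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": false,
+    "editable": true
+   },
+   "outputs": [],
+   "source": [
+    "import autograd.numpy as np\n",
+    "from autograd import jacobian, hessian\n",
+    "\n",
+    "# A simple, hypothetical test function of a point (x,t)\n",
+    "def h(point):\n",
+    "    x, t = point\n",
+    "    return np.sin(np.pi*x)*np.exp(-t)\n",
+    "\n",
+    "point = np.array([0.3, 0.5])\n",
+    "\n",
+    "dh = jacobian(h)(point)    # gradient of h: [dh/dx, dh/dt]\n",
+    "d2h = hessian(h)(point)    # all second derivatives, shape (2,2)\n",
+    "\n",
+    "print('dh/dt from jacobian[1]    :', dh[1])\n",
+    "print('analytical dh/dt          :', -np.sin(np.pi*0.3)*np.exp(-0.5))\n",
+    "print('d2h/dx2 from hessian[0][0]:', d2h[0][0])\n",
+    "print('analytical d2h/dx2        :', -np.pi**2*np.sin(np.pi*0.3)*np.exp(-0.5))"
+   ]
+  },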
+ ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "309808f6", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# Set up the trial function:\n", + "def u(x):\n", + " return np.sin(np.pi*x)\n", + "\n", + "def g_trial(point,P):\n", + " x,t = point\n", + " return (1-t)*u(x) + x*(1-x)*t*deep_neural_network(P,point)\n", + "\n", + "# The right side of the ODE:\n", + "def f(point):\n", + " return 0.\n", + "\n", + "# The cost function:\n", + "def cost_function(P, x, t):\n", + " cost_sum = 0\n", + "\n", + " g_t_jacobian_func = jacobian(g_trial)\n", + " g_t_hessian_func = hessian(g_trial)\n", + "\n", + " for x_ in x:\n", + " for t_ in t:\n", + " point = np.array([x_,t_])\n", + "\n", + " g_t = g_trial(point,P)\n", + " g_t_jacobian = g_t_jacobian_func(point,P)\n", + " g_t_hessian = g_t_hessian_func(point,P)\n", + "\n", + " g_t_dt = g_t_jacobian[1]\n", + " g_t_d2x = g_t_hessian[0][0]\n", + "\n", + " func = f(point)\n", + "\n", + " err_sqr = ( (g_t_dt - g_t_d2x) - func)**2\n", + " cost_sum += err_sqr\n", + "\n", + " return cost_sum" + ] + }, + { + "cell_type": "markdown", + "id": "9880d94c", + "metadata": { + "editable": true + }, + "source": [ + "## Setting up the network using Autograd; The full program\n", + "\n", + "Having set up the network, along with the trial solution and cost function, we can now see how the deep neural network performs by comparing the results to the analytical solution.\n", + "\n", + "The analytical solution of our problem is\n", + "\n", + "$$\n", + "g(x,t) = \\exp(-\\pi^2 t)\\sin(\\pi x)\n", + "$$\n", + "\n", + "A possible way to implement a neural network solving the PDE, is given below.\n", + "Be aware, though, that it is fairly slow for the parameters used.\n", + "A better result is possible, but requires more iterations, and thus longer time to complete.\n", + "\n", + "Indeed, the program below is not optimal in its implementation, but rather serves as an example on how to implement and use a neural network to solve a PDE.\n", + "Using TensorFlow results in a much better execution time. Try it!" 
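+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "Before running the full program, a quick sanity check (a minimal Autograd sketch) that the analytical solution quoted above indeed satisfies the diffusion equation and the given conditions:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": false,
+    "editable": true
+   },
+   "outputs": [],
+   "source": [
+    "import autograd.numpy as np\n",
+    "from autograd import grad\n",
+    "\n",
+    "# The analytical solution quoted above\n",
+    "def g(x, t):\n",
+    "    return np.exp(-np.pi**2*t)*np.sin(np.pi*x)\n",
+    "\n",
+    "dg_dt = grad(g, 1)             # derivative w.r.t. t\n",
+    "d2g_dx2 = grad(grad(g, 0), 0)  # second derivative w.r.t. x\n",
+    "\n",
+    "x, t = 0.3, 0.2\n",
+    "print('dg/dt - d2g/dx2   :', dg_dt(x, t) - d2g_dx2(x, t))  # should vanish\n",
+    "print('g(0,t), g(1,t)    :', g(0.0, t), g(1.0, t))         # zero up to round-off\n",
+    "print('g(x,0) - sin(pi x):', g(x, 0.0) - np.sin(np.pi*x))  # initial condition"
+   ]
+  },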
+ ] + }, + { + "cell_type": "code", + "execution_count": 9, + "id": "fcd284e3", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import autograd.numpy as np\n", + "from autograd import jacobian,hessian,grad\n", + "import autograd.numpy.random as npr\n", + "from matplotlib import cm\n", + "from matplotlib import pyplot as plt\n", + "from mpl_toolkits.mplot3d import axes3d\n", + "\n", + "## Set up the network\n", + "\n", + "def sigmoid(z):\n", + " return 1/(1 + np.exp(-z))\n", + "\n", + "def deep_neural_network(deep_params, x):\n", + " # x is now a point and a 1D numpy array; make it a column vector\n", + " num_coordinates = np.size(x,0)\n", + " x = x.reshape(num_coordinates,-1)\n", + "\n", + " num_points = np.size(x,1)\n", + "\n", + " # N_hidden is the number of hidden layers\n", + " N_hidden = len(deep_params) - 1 # -1 since params consist of parameters to all the hidden layers AND the output layer\n", + "\n", + " # Assume that the input layer does nothing to the input x\n", + " x_input = x\n", + " x_prev = x_input\n", + "\n", + " ## Hidden layers:\n", + "\n", + " for l in range(N_hidden):\n", + " # From the list of parameters P; find the correct weigths and bias for this layer\n", + " w_hidden = deep_params[l]\n", + "\n", + " # Add a row of ones to include bias\n", + " x_prev = np.concatenate((np.ones((1,num_points)), x_prev ), axis = 0)\n", + "\n", + " z_hidden = np.matmul(w_hidden, x_prev)\n", + " x_hidden = sigmoid(z_hidden)\n", + "\n", + " # Update x_prev such that next layer can use the output from this layer\n", + " x_prev = x_hidden\n", + "\n", + " ## Output layer:\n", + "\n", + " # Get the weights and bias for this layer\n", + " w_output = deep_params[-1]\n", + "\n", + " # Include bias:\n", + " x_prev = np.concatenate((np.ones((1,num_points)), x_prev), axis = 0)\n", + "\n", + " z_output = np.matmul(w_output, x_prev)\n", + " x_output = z_output\n", + "\n", + " return x_output[0][0]\n", + "\n", + "## Define the trial solution and cost function\n", + "def u(x):\n", + " return np.sin(np.pi*x)\n", + "\n", + "def g_trial(point,P):\n", + " x,t = point\n", + " return (1-t)*u(x) + x*(1-x)*t*deep_neural_network(P,point)\n", + "\n", + "# The right side of the ODE:\n", + "def f(point):\n", + " return 0.\n", + "\n", + "# The cost function:\n", + "def cost_function(P, x, t):\n", + " cost_sum = 0\n", + "\n", + " g_t_jacobian_func = jacobian(g_trial)\n", + " g_t_hessian_func = hessian(g_trial)\n", + "\n", + " for x_ in x:\n", + " for t_ in t:\n", + " point = np.array([x_,t_])\n", + "\n", + " g_t = g_trial(point,P)\n", + " g_t_jacobian = g_t_jacobian_func(point,P)\n", + " g_t_hessian = g_t_hessian_func(point,P)\n", + "\n", + " g_t_dt = g_t_jacobian[1]\n", + " g_t_d2x = g_t_hessian[0][0]\n", + "\n", + " func = f(point)\n", + "\n", + " err_sqr = ( (g_t_dt - g_t_d2x) - func)**2\n", + " cost_sum += err_sqr\n", + "\n", + " return cost_sum /( np.size(x)*np.size(t) )\n", + "\n", + "## For comparison, define the analytical solution\n", + "def g_analytic(point):\n", + " x,t = point\n", + " return np.exp(-np.pi**2*t)*np.sin(np.pi*x)\n", + "\n", + "## Set up a function for training the network to solve for the equation\n", + "def solve_pde_deep_neural_network(x,t, num_neurons, num_iter, lmb):\n", + " ## Set up initial weigths and biases\n", + " N_hidden = np.size(num_neurons)\n", + "\n", + " ## Set up initial weigths and biases\n", + "\n", + " # Initialize the list of parameters:\n", + " P = [None]*(N_hidden + 1) # + 1 to include the output layer\n", + "\n", + " 
P[0] = npr.randn(num_neurons[0], 2 + 1 ) # 2 since we have two coordinates (x,t), +1 to include bias\n", + " for l in range(1,N_hidden):\n", + " P[l] = npr.randn(num_neurons[l], num_neurons[l-1] + 1) # +1 to include bias\n", + "\n", + " # For the output layer\n", + " P[-1] = npr.randn(1, num_neurons[-1] + 1 ) # +1 since bias is included\n", + "\n", + " print('Initial cost: ',cost_function(P, x, t))\n", + "\n", + " cost_function_grad = grad(cost_function,0)\n", + "\n", + " # Let the update be done num_iter times\n", + " for i in range(num_iter):\n", + " cost_grad = cost_function_grad(P, x , t)\n", + "\n", + " for l in range(N_hidden+1):\n", + " P[l] = P[l] - lmb * cost_grad[l]\n", + "\n", + " print('Final cost: ',cost_function(P, x, t))\n", + "\n", + " return P\n", + "\n", + "if __name__ == '__main__':\n", + " ### Use the neural network:\n", + " npr.seed(15)\n", + "\n", + " ## Decide the values of arguments to the function to solve\n", + " Nx = 10; Nt = 10\n", + " x = np.linspace(0, 1, Nx)\n", + " t = np.linspace(0,1,Nt)\n", + "\n", + " ## Set up the parameters for the network\n", + " num_hidden_neurons = [100, 25]\n", + " num_iter = 250\n", + " lmb = 0.01\n", + "\n", + " P = solve_pde_deep_neural_network(x,t, num_hidden_neurons, num_iter, lmb)\n", + "\n", + " ## Store the results\n", + " g_dnn_ag = np.zeros((Nx, Nt))\n", + " G_analytical = np.zeros((Nx, Nt))\n", + " for i,x_ in enumerate(x):\n", + " for j, t_ in enumerate(t):\n", + " point = np.array([x_, t_])\n", + " g_dnn_ag[i,j] = g_trial(point,P)\n", + "\n", + " G_analytical[i,j] = g_analytic(point)\n", + "\n", + " # Find the max difference between the analytical and the computed solution\n", + " diff_ag = np.abs(g_dnn_ag - G_analytical)\n", + " print('Max absolute difference between the analytical solution and the network: %g'%np.max(diff_ag))\n", + "\n", + " ## Plot the solutions in two dimensions, that being in position and time\n", + "\n", + " T,X = np.meshgrid(t,x)\n", + "\n", + " fig = plt.figure(figsize=(10,10))\n", + " ax = fig.add_subplot(projection='3d')\n", + " ax.set_title('Solution from the deep neural network w/ %d hidden layers'%len(num_hidden_neurons))\n", + " s = ax.plot_surface(T,X,g_dnn_ag,linewidth=0,antialiased=False,cmap=cm.viridis)\n", + " ax.set_xlabel('Time $t$')\n", + " ax.set_ylabel('Position $x$');\n", + "\n", + "\n", + " fig = plt.figure(figsize=(10,10))\n", + " ax = fig.add_subplot(projection='3d')\n", + " ax.set_title('Analytical solution')\n", + " s = ax.plot_surface(T,X,G_analytical,linewidth=0,antialiased=False,cmap=cm.viridis)\n", + " ax.set_xlabel('Time $t$')\n", + " ax.set_ylabel('Position $x$');\n", + "\n", + " fig = plt.figure(figsize=(10,10))\n", + " ax = fig.add_subplot(projection='3d')\n", + " ax.set_title('Difference')\n", + " s = ax.plot_surface(T,X,diff_ag,linewidth=0,antialiased=False,cmap=cm.viridis)\n", + " ax.set_xlabel('Time $t$')\n", + " ax.set_ylabel('Position $x$');\n", + "\n", + " ## Take some slices of the 3D plots just to see the solutions at particular times\n", + " indx1 = 0\n", + " indx2 = int(Nt/2)\n", + " indx3 = Nt-1\n", + "\n", + " t1 = t[indx1]\n", + " t2 = t[indx2]\n", + " t3 = t[indx3]\n", + "\n", + " # Slice the results from the DNN\n", + " res1 = g_dnn_ag[:,indx1]\n", + " res2 = g_dnn_ag[:,indx2]\n", + " res3 = g_dnn_ag[:,indx3]\n", + "\n", + " # Slice the analytical results\n", + " res_analytical1 = G_analytical[:,indx1]\n", + " res_analytical2 = G_analytical[:,indx2]\n", + " res_analytical3 = G_analytical[:,indx3]\n", + "\n", + " # Plot the slices\n", + " plt.figure(figsize=(10,10))\n", +
" plt.title(\"Computed solutions at time = %g\"%t1)\n", + " plt.plot(x, res1)\n", + " plt.plot(x,res_analytical1)\n", + " plt.legend(['dnn','analytical'])\n", + "\n", + " plt.figure(figsize=(10,10))\n", + " plt.title(\"Computed solutions at time = %g\"%t2)\n", + " plt.plot(x, res2)\n", + " plt.plot(x,res_analytical2)\n", + " plt.legend(['dnn','analytical'])\n", + "\n", + " plt.figure(figsize=(10,10))\n", + " plt.title(\"Computed solutions at time = %g\"%t3)\n", + " plt.plot(x, res3)\n", + " plt.plot(x,res_analytical3)\n", + " plt.legend(['dnn','analytical'])\n", + "\n", + " plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "51ff4964", + "metadata": { + "editable": true + }, + "source": [ + "## Resources on differential equations and deep learning\n", + "\n", + "1. [Artificial neural networks for solving ordinary and partial differential equations by I.E. Lagaris et al](https://pdfs.semanticscholar.org/d061/df393e0e8fbfd0ea24976458b7d42419040d.pdf)\n", + "\n", + "2. [Neural networks for solving differential equations by A. Honchar](https://becominghuman.ai/neural-networks-for-solving-differential-equations-fa230ac5e04c)\n", + "\n", + "3. [Solving differential equations using neural networks by M.M Chiaramonte and M. Kiener](http://cs229.stanford.edu/proj2013/ChiaramonteKiener-SolvingDifferentialEquationsUsingNeuralNetworks.pdf)\n", + "\n", + "4. [Introduction to Partial Differential Equations by A. Tveito, R. Winther](https://www.springer.com/us/book/9783540225515)" + ] + }, + { + "cell_type": "markdown", + "id": "f7c3b9fc", + "metadata": { + "editable": true + }, + "source": [ + "## Convolutional Neural Networks (recognizing images)\n", + "\n", + "Convolutional neural networks (CNNs) were developed during the last\n", + "decade of the previous century, with a focus on character recognition\n", + "tasks. Nowadays, CNNs are a central element in the spectacular success\n", + "of deep learning methods. The success in for example image\n", + "classifications have made them a central tool for most machine\n", + "learning practitioners.\n", + "\n", + "CNNs are very similar to ordinary Neural Networks.\n", + "They are made up of neurons that have learnable weights and\n", + "biases. Each neuron receives some inputs, performs a dot product and\n", + "optionally follows it with a non-linearity. The whole network still\n", + "expresses a single differentiable score function: from the raw image\n", + "pixels on one end to class scores at the other. And they still have a\n", + "loss function (for example Softmax) on the last (fully-connected) layer\n", + "and all the tips/tricks we developed for learning regular Neural\n", + "Networks still apply (back propagation, gradient descent etc etc)." + ] + }, + { + "cell_type": "markdown", + "id": "5d3a5ee8", + "metadata": { + "editable": true + }, + "source": [ + "## What is the Difference\n", + "\n", + "**CNN architectures make the explicit assumption that\n", + "the inputs are images, which allows us to encode certain properties\n", + "into the architecture. 
These then make the forward function more\n", + "efficient to implement and vastly reduce the amount of parameters in\n", + "the network.**" + ] + }, + { + "cell_type": "markdown", + "id": "e8618fc8", + "metadata": { + "editable": true + }, + "source": [ + "## Neural Networks vs CNNs\n", + "\n", + "Neural networks are defined as **affine transformations**, that is \n", + "a vector is received as input and is multiplied with a matrix of so-called weights (our unknown paramters) to produce an\n", + "output (to which a bias vector is usually added before passing the result\n", + "through a nonlinear activation function). This is applicable to any type of input, be it an\n", + "image, a sound clip or an unordered collection of features: whatever their\n", + "dimensionality, their representation can always be flattened into a vector\n", + "before the transformation." + ] + }, + { + "cell_type": "markdown", + "id": "b41b4781", + "metadata": { + "editable": true + }, + "source": [ + "## Why CNNS for images, sound files, medical images from CT scans etc?\n", + "\n", + "However, when we consider images, sound clips and many other similar kinds of data, these data have an intrinsic\n", + "structure. More formally, they share these important properties:\n", + "* They are stored as multi-dimensional arrays (think of the pixels of a figure) .\n", + "\n", + "* They feature one or more axes for which ordering matters (e.g., width and height axes for an image, time axis for a sound clip).\n", + "\n", + "* One axis, called the channel axis, is used to access different views of the data (e.g., the red, green and blue channels of a color image, or the left and right channels of a stereo audio track).\n", + "\n", + "These properties are not exploited when an affine transformation is applied; in\n", + "fact, all the axes are treated in the same way and the topological information\n", + "is not taken into account. Still, taking advantage of the implicit structure of\n", + "the data may prove very handy in solving some tasks, like computer vision and\n", + "speech recognition, and in these cases it would be best to preserve it. This is\n", + "where discrete convolutions come into play.\n", + "\n", + "A discrete convolution is a linear transformation that preserves this notion of\n", + "ordering. It is sparse (only a few input units contribute to a given output\n", + "unit) and reuses parameters (the same weights are applied to multiple locations\n", + "in the input)." + ] + }, + { + "cell_type": "markdown", + "id": "33bf8922", + "metadata": { + "editable": true + }, + "source": [ + "## Regular NNs don’t scale well to full images\n", + "\n", + "As an example, consider\n", + "an image of size $32\\times 32\\times 3$ (32 wide, 32 high, 3 color channels), so a\n", + "single fully-connected neuron in a first hidden layer of a regular\n", + "Neural Network would have $32\\times 32\\times 3 = 3072$ weights. This amount still\n", + "seems manageable, but clearly this fully-connected structure does not\n", + "scale to larger images. For example, an image of more respectable\n", + "size, say $200\\times 200\\times 3$, would lead to neurons that have \n", + "$200\\times 200\\times 3 = 120,000$ weights. \n", + "\n", + "We could have\n", + "several such neurons, and the parameters would add up quickly! Clearly,\n", + "this full connectivity is wasteful and the huge number of parameters\n", + "would quickly lead to possible overfitting.\n", + "\n", + "\n", + "\n", + "\n", + "

    Figure 1: A regular 3-layer Neural Network.
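+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "The weight counts quoted above are easy to reproduce. A minimal sketch (the 1000-neuron hidden layer is a hypothetical choice, used only for illustration):"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": false,
+    "editable": true
+   },
+   "outputs": [],
+   "source": [
+    "# Weights feeding a single fully-connected neuron for two image sizes\n",
+    "for shape in [(32, 32, 3), (200, 200, 3)]:\n",
+    "    n_inputs = shape[0]*shape[1]*shape[2]\n",
+    "    print(shape, '->', n_inputs, 'weights per neuron (plus one bias)')\n",
+    "\n",
+    "# A full dense hidden layer multiplies this by the number of neurons\n",
+    "n_hidden = 1000  # hypothetical layer width\n",
+    "print('dense layer on a (200,200,3) image with', n_hidden, 'neurons:',\n",
+    "      200*200*3*n_hidden + n_hidden, 'parameters')"
+   ]
+  },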

    \n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "95c20234", + "metadata": { + "editable": true + }, + "source": [ + "## 3D volumes of neurons\n", + "\n", + "Convolutional Neural Networks take advantage of the fact that the\n", + "input consists of images and they constrain the architecture in a more\n", + "sensible way. \n", + "\n", + "In particular, unlike a regular Neural Network, the\n", + "layers of a CNN have neurons arranged in 3 dimensions: width,\n", + "height, depth. (Note that the word depth here refers to the third\n", + "dimension of an activation volume, not to the depth of a full Neural\n", + "Network, which can refer to the total number of layers in a network.)\n", + "\n", + "To understand it better, the above example of an image \n", + "with an input volume of\n", + "activations has dimensions $32\\times 32\\times 3$ (width, height,\n", + "depth respectively). \n", + "\n", + "The neurons in a layer will\n", + "only be connected to a small region of the layer before it, instead of\n", + "all of the neurons in a fully-connected manner. Moreover, the final\n", + "output layer could for this specific image have dimensions $1\\times 1 \\times 10$, \n", + "because by the\n", + "end of the CNN architecture we will reduce the full image into a\n", + "single vector of class scores, arranged along the depth\n", + "dimension. \n", + "\n", + "\n", + "\n", + "\n", + "

    Figure 1: A CNN arranges its neurons in three dimensions (width, height, depth), as visualized in one of the layers. Every layer of a CNN transforms the 3D input volume to a 3D output volume of neuron activations. In this example, the red input layer holds the image, so its width and height would be the dimensions of the image, and the depth would be 3 (Red, Green, Blue channels).

    \n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "2b7ba652", + "metadata": { + "editable": true + }, + "source": [ + "## More on Dimensionalities\n", + "\n", + "In fields like signal processing (and imaging as well), one designs\n", + "so-called filters. These filters are defined by the convolutions and\n", + "are often hand-crafted. One may specify filters for smoothing, edge\n", + "detection, frequency reshaping, and similar operations. However with\n", + "neural networks the idea is to automatically learn the filters and use\n", + "many of them in conjunction with non-linear operations (activation\n", + "functions).\n", + "\n", + "As an example consider a neural network operating on sound sequence\n", + "data. Assume that we an input vector $\\boldsymbol{x}$ of length $d=10^6$. We\n", + "construct then a neural network with onle hidden layer only with\n", + "$10^4$ nodes. This means that we will have a weight matrix with\n", + "$10^4\\times 10^6=10^{10}$ weights to be determined, together with $10^4$ biases.\n", + "\n", + "Assume furthermore that we have an output layer which is meant to train whether the sound sequence represents a human voice (true) or something else (false).\n", + "It means that we have only one output node. But since this output node connects to $10^4$ nodes in the hidden layer, there are in total $10^4$ weights to be determined for the output layer, plus one bias. In total we have" + ] + }, + { + "cell_type": "markdown", + "id": "b6a7ae46", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathrm{NumberParameters}=10^{10}+10^4+10^4+1 \\approx 10^{10},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "0d56b05e", + "metadata": { + "editable": true + }, + "source": [ + "that is ten billion parameters to determine." + ] + }, + { + "cell_type": "markdown", + "id": "35c90423", + "metadata": { + "editable": true + }, + "source": [ + "## Further remarks\n", + "\n", + "The main principles that justify convolutions is locality of\n", + "information and repetion of patterns within the signal. Sound samples\n", + "of the input in adjacent spots are much more likely to affect each\n", + "other than those that are very far away. Similarly, sounds are\n", + "repeated in multiple times in the signal. While slightly simplistic,\n", + "reasoning about such a sound example demonstrates this. The same\n", + "principles then apply to images and other similar data." + ] + }, + { + "cell_type": "markdown", + "id": "d08d4fb6", + "metadata": { + "editable": true + }, + "source": [ + "## Layers used to build CNNs\n", + "\n", + "A simple CNN is a sequence of layers, and every layer of a CNN\n", + "transforms one volume of activations to another through a\n", + "differentiable function. We use three main types of layers to build\n", + "CNN architectures: Convolutional Layer, Pooling Layer, and\n", + "Fully-Connected Layer (exactly as seen in regular Neural Networks). We\n", + "will stack these layers to form a full CNN architecture.\n", + "\n", + "A simple CNN for image classification could have the architecture:\n", + "\n", + "* **INPUT** ($32\\times 32 \\times 3$) will hold the raw pixel values of the image, in this case an image of width 32, height 32, and with three color channels R,G,B.\n", + "\n", + "* **CONV** (convolutional )layer will compute the output of neurons that are connected to local regions in the input, each computing a dot product between their weights and a small region they are connected to in the input volume. 
This may result in volume such as $[32\\times 32\\times 12]$ if we decided to use 12 filters.\n", + "\n", + "* **RELU** layer will apply an elementwise activation function, such as the $max(0,x)$ thresholding at zero. This leaves the size of the volume unchanged ($[32\\times 32\\times 12]$).\n", + "\n", + "* **POOL** (pooling) layer will perform a downsampling operation along the spatial dimensions (width, height), resulting in volume such as $[16\\times 16\\times 12]$.\n", + "\n", + "* **FC** (i.e. fully-connected) layer will compute the class scores, resulting in volume of size $[1\\times 1\\times 10]$, where each of the 10 numbers correspond to a class score, such as among the 10 categories of the MNIST images we considered above . As with ordinary Neural Networks and as the name implies, each neuron in this layer will be connected to all the numbers in the previous volume." + ] + }, + { + "cell_type": "markdown", + "id": "dd95dcc6", + "metadata": { + "editable": true + }, + "source": [ + "## Transforming images\n", + "\n", + "CNNs transform the original image layer by layer from the original\n", + "pixel values to the final class scores. \n", + "\n", + "Observe that some layers contain\n", + "parameters and other don’t. In particular, the CNN layers perform\n", + "transformations that are a function of not only the activations in the\n", + "input volume, but also of the parameters (the weights and biases of\n", + "the neurons). On the other hand, the RELU/POOL layers will implement a\n", + "fixed function. The parameters in the CONV/FC layers will be trained\n", + "with gradient descent so that the class scores that the CNN computes\n", + "are consistent with the labels in the training set for each image." + ] + }, + { + "cell_type": "markdown", + "id": "5fdbdbfd", + "metadata": { + "editable": true + }, + "source": [ + "## CNNs in brief\n", + "\n", + "In summary:\n", + "\n", + "* A CNN architecture is in the simplest case a list of Layers that transform the image volume into an output volume (e.g. holding the class scores)\n", + "\n", + "* There are a few distinct types of Layers (e.g. CONV/FC/RELU/POOL are by far the most popular)\n", + "\n", + "* Each Layer accepts an input 3D volume and transforms it to an output 3D volume through a differentiable function\n", + "\n", + "* Each Layer may or may not have parameters (e.g. CONV/FC do, RELU/POOL don’t)\n", + "\n", + "* Each Layer may or may not have additional hyperparameters (e.g. CONV/FC/POOL do, RELU doesn’t)" + ] + }, + { + "cell_type": "markdown", + "id": "c0cbb6b0", + "metadata": { + "editable": true + }, + "source": [ + "## A deep CNN model ([From Raschka et al](https://github.com/rasbt/machine-learning-book))\n", + "\n", + "\n", + "\n", + "\n", + "

    Figure 1: A deep CNN
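+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "editable": true
+   },
+   "source": [
+    "The INPUT-CONV-RELU-POOL-FC pipeline listed above can be written down in a few lines.\n",
+    "Below is a minimal sketch, assuming TensorFlow/Keras is installed, with the layer sizes following the $32\times 32\times 3$ example with 12 filters and 10 class scores:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": false,
+    "editable": true
+   },
+   "outputs": [],
+   "source": [
+    "from tensorflow.keras import layers, models\n",
+    "\n",
+    "# INPUT (32x32x3) -> CONV + RELU (12 filters) -> POOL -> FC class scores\n",
+    "model = models.Sequential([\n",
+    "    layers.Input(shape=(32, 32, 3)),\n",
+    "    layers.Conv2D(12, kernel_size=3, padding='same', activation='relu'),\n",
+    "    layers.MaxPooling2D(pool_size=2),  # 32x32x12 -> 16x16x12\n",
+    "    layers.Flatten(),\n",
+    "    layers.Dense(10, activation='softmax')  # 10 class scores\n",
+    "])\n",
+    "model.summary()"
+   ]
+  },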

    \n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "caf2418d", + "metadata": { + "editable": true + }, + "source": [ + "## Key Idea\n", + "\n", + "A dense neural network is representd by an affine operation (like\n", + "matrix-matrix multiplication) where all parameters are included.\n", + "\n", + "The key idea in CNNs for say imaging is that in images neighbor pixels tend to be related! So we connect\n", + "only neighboring neurons in the input instead of connecting all with the first hidden layer.\n", + "\n", + "We say we perform a filtering (convolution is the mathematical operation)." + ] + }, + { + "cell_type": "markdown", + "id": "7d5552d8", + "metadata": { + "editable": true + }, + "source": [ + "## How to do image compression before the era of deep learning\n", + "\n", + "The singular-value decomposition (SVD) algorithm has been for decades one of the standard ways of compressing images.\n", + "The [lectures on the SVD](https://compphysics.github.io/MachineLearning/doc/LectureNotes/_build/html/chapter2.html#the-singular-value-decomposition) give many of the essential details concerning the SVD.\n", + "\n", + "The orthogonal vectors which are obtained from the SVD, can be used to\n", + "project down the dimensionality of a given image. In the example here\n", + "we gray-scale an image and downsize it.\n", + "\n", + "This recipe relies on us being able to actually perform the SVD. For\n", + "large images, and in particular with many images to reconstruct, using the SVD \n", + "may quickly become an overwhelming task. With the advent of efficient deep\n", + "learning methods like CNNs and later generative methods, these methods\n", + "have become in the last years the premier way of performing image\n", + "analysis. In particular for classification problems with labelled images." 
+ ] + }, + { + "cell_type": "markdown", + "id": "d0bc0489", + "metadata": { + "editable": true + }, + "source": [ + "## The SVD example" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "id": "cec697e6", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "from matplotlib.image import imread\n", + "import matplotlib.pyplot as plt\n", + "import scipy.linalg as ln\n", + "import numpy as np\n", + "import os\n", + "from PIL import Image\n", + "from math import log10, sqrt \n", + "plt.rcParams['figure.figsize'] = [16, 8]\n", + "# Import image\n", + "A = imread(os.path.join(\"figslides/photo1.jpg\"))\n", + "X = A.dot([0.299, 0.5870, 0.114]) # Convert RGB to grayscale\n", + "img = plt.imshow(X)\n", + "# convert to gray\n", + "img.set_cmap('gray')\n", + "plt.axis('off')\n", + "plt.show()\n", + "# Call image size\n", + "print(': %s'%str(X.shape))\n", + "\n", + "\n", + "# split the matrix into U, S, VT\n", + "U, S, VT = np.linalg.svd(X,full_matrices=False)\n", + "S = np.diag(S)\n", + "m = 800 # Image's width\n", + "n = 1200 # Image's height\n", + "j = 0\n", + "# Try compression with different k vectors (these represent projections):\n", + "for k in (5,10, 20, 100,200,400,500):\n", + " # Original size of the image\n", + " originalSize = m * n \n", + " # Size after compressed\n", + " compressedSize = k * (1 + m + n) \n", + " # The projection of the original image\n", + " Xapprox = U[:,:k] @ S[0:k,:k] @ VT[:k,:]\n", + " plt.figure(j+1)\n", + " j += 1\n", + " img = plt.imshow(Xapprox)\n", + " img.set_cmap('gray')\n", + " \n", + " plt.axis('off')\n", + " plt.title('k = ' + str(k))\n", + " plt.show() \n", + " print('Original size of image:')\n", + " print(originalSize)\n", + " print('Compression rate as Compressed image / Original size:')\n", + " ratio = compressedSize * 1.0 / originalSize\n", + " print(ratio)\n", + " print('Compression rate is ' + str( round(ratio * 100 ,2)) + '%' ) \n", + " # Estimate MQA\n", + " x= X.astype(\"float\")\n", + " y=Xapprox.astype(\"float\")\n", + " err = np.sum((x - y) ** 2)\n", + " err /= float(X.shape[0] * Xapprox.shape[1])\n", + " print('The mean-square deviation '+ str(round( err)))\n", + " max_pixel = 255.0\n", + " # Estimate Signal Noise Ratio\n", + " srv = 20 * (log10(max_pixel / sqrt(err)))\n", + " print('Signa to noise ratio '+ str(round(srv)) +'dB')" + ] + }, + { + "cell_type": "markdown", + "id": "6a578704", + "metadata": { + "editable": true + }, + "source": [ + "## Mathematics of CNNs\n", + "\n", + "The mathematics of CNNs is based on the mathematical operation of\n", + "**convolution**. 
In mathematics (in particular in functional analysis),\n", + "convolution is represented by mathematical operations (integration,\n", + "summation etc) on two functions in order to produce a third function\n", + "that expresses how the shape of one gets modified by the other.\n", + "Convolution has a plethora of applications in a variety of\n", + "disciplines, spanning from statistics to signal processing, computer\n", + "vision, solutions of differential equations,linear algebra,\n", + "engineering, and yes, machine learning.\n", + "\n", + "Mathematically, convolution is defined as follows (one-dimensional example):\n", + "Let us define a continuous function $y(t)$ given by" + ] + }, + { + "cell_type": "markdown", + "id": "5c858d52", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y(t) = \\int x(a) w(t-a) da,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "a96333c3", + "metadata": { + "editable": true + }, + "source": [ + "where $x(a)$ represents a so-called input and $w(t-a)$ is normally called the weight function or kernel.\n", + "\n", + "The above integral is written in a more compact form as" + ] + }, + { + "cell_type": "markdown", + "id": "9834d45e", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y(t) = \\left(x * w\\right)(t).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "13e15c5f", + "metadata": { + "editable": true + }, + "source": [ + "The discretized version reads" + ] + }, + { + "cell_type": "markdown", + "id": "0a496b2f", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y(t) = \\sum_{a=-\\infty}^{a=\\infty}x(a)w(t-a).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "48c5ecd3", + "metadata": { + "editable": true + }, + "source": [ + "Computing the inverse of the above convolution operations is known as deconvolution and the process is commutative.\n", + "\n", + "How can we use this? And what does it mean? Let us study some familiar examples first." + ] + }, + { + "cell_type": "markdown", + "id": "7cab11e7", + "metadata": { + "editable": true + }, + "source": [ + "## Convolution Examples: Polynomial multiplication\n", + "\n", + "Our first example is that of a multiplication between two polynomials,\n", + "which we will rewrite in terms of the mathematics of convolution. 
In\n", + "the final stage, since the problem here is a discrete one, we will\n", + "recast the final expression in terms of a matrix-vector\n", + "multiplication, where the matrix is a so-called [Toeplitz matrix\n", + "](https://link.springer.com/book/10.1007/978-93-86279-04-0).\n", + "\n", + "Let us look a the following polynomials to second and third order, respectively:" + ] + }, + { + "cell_type": "markdown", + "id": "c90333f8", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "p(t) = \\alpha_0+\\alpha_1 t+\\alpha_2 t^2,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "7c1b0c9b", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "9c8df6e8", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "s(t) = \\beta_0+\\beta_1 t+\\beta_2 t^2+\\beta_3 t^3.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "50667dfa", + "metadata": { + "editable": true + }, + "source": [ + "The polynomial multiplication gives us a new polynomial of degree $5$" + ] + }, + { + "cell_type": "markdown", + "id": "11f2ea4b", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "z(t) = \\delta_0+\\delta_1 t+\\delta_2 t^2+\\delta_3 t^3+\\delta_4 t^4+\\delta_5 t^5.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "4abea758", + "metadata": { + "editable": true + }, + "source": [ + "## Efficient Polynomial Multiplication\n", + "\n", + "Computing polynomial products can be implemented efficiently if we rewrite the more brute force multiplications using convolution.\n", + "We note first that the new coefficients are given as" + ] + }, + { + "cell_type": "markdown", + "id": "ad22b2d2", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{split}\n", + "\\delta_0=&\\alpha_0\\beta_0\\\\\n", + "\\delta_1=&\\alpha_1\\beta_0+\\alpha_0\\beta_1\\\\\n", + "\\delta_2=&\\alpha_0\\beta_2+\\alpha_1\\beta_1+\\alpha_2\\beta_0\\\\\n", + "\\delta_3=&\\alpha_1\\beta_2+\\alpha_2\\beta_1+\\alpha_0\\beta_3\\\\\n", + "\\delta_4=&\\alpha_2\\beta_2+\\alpha_1\\beta_3\\\\\n", + "\\delta_5=&\\alpha_2\\beta_3.\\\\\n", + "\\end{split}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "6a3ee064", + "metadata": { + "editable": true + }, + "source": [ + "We note that $\\alpha_i=0$ except for $i\\in \\left\\{0,1,2\\right\\}$ and $\\beta_i=0$ except for $i\\in\\left\\{0,1,2,3\\right\\}$.\n", + "\n", + "We can then rewrite the coefficients $\\delta_j$ using a discrete convolution as" + ] + }, + { + "cell_type": "markdown", + "id": "3aca65d8", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_j = \\sum_{i=-\\infty}^{i=\\infty}\\alpha_i\\beta_{j-i}=(\\alpha * \\beta)_j,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "0e04ce27", + "metadata": { + "editable": true + }, + "source": [ + "or as a double sum with restriction $l=i+j$" + ] + }, + { + "cell_type": "markdown", + "id": "173eda29", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_l = \\sum_{ij}\\alpha_i\\beta_{j}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "a196c2cd", + "metadata": { + "editable": true + }, + "source": [ + "## Further simplification\n", + "\n", + "Although we may have redundant operations with some few zeros for $\\beta_i$, we can rewrite the above sum in a more compact way as" + ] + }, + { + "cell_type": "markdown", + "id": "56018bb8", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_i = 
\\sum_{k=0}^{k=m-1}\\alpha_k\\beta_{i-k},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "ba91ab7b", + "metadata": { + "editable": true + }, + "source": [ + "where $m=3$ in our case, the maximum length of\n", + "the vector $\\alpha$. Note that the vector $\\boldsymbol{\\beta}$ has length $n=4$. Below we will find an even more efficient representation." + ] + }, + { + "cell_type": "markdown", + "id": "1b25324b", + "metadata": { + "editable": true + }, + "source": [ + "## A more efficient way of coding the above Convolution\n", + "\n", + "Since we only have a finite number of $\\alpha$ and $\\beta$ values\n", + "which are non-zero, we can rewrite the above convolution expressions\n", + "as a matrix-vector multiplication" + ] + }, + { + "cell_type": "markdown", + "id": "dd6d9155", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{\\delta}=\\begin{bmatrix}\\alpha_0 & 0 & 0 & 0 \\\\\n", + " \\alpha_1 & \\alpha_0 & 0 & 0 \\\\\n", + "\t\t\t \\alpha_2 & \\alpha_1 & \\alpha_0 & 0 \\\\\n", + "\t\t\t 0 & \\alpha_2 & \\alpha_1 & \\alpha_0 \\\\\n", + "\t\t\t 0 & 0 & \\alpha_2 & \\alpha_1 \\\\\n", + "\t\t\t 0 & 0 & 0 & \\alpha_2\n", + "\t\t\t \\end{bmatrix}\\begin{bmatrix} \\beta_0 \\\\ \\beta_1 \\\\ \\beta_2 \\\\ \\beta_3\\end{bmatrix}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "28050537", + "metadata": { + "editable": true + }, + "source": [ + "## Commutative process\n", + "\n", + "The process is commutative and we can easily see that we can rewrite the multiplication in terms of a matrix holding $\\beta$ and a vector holding $\\alpha$.\n", + "In this case we have" + ] + }, + { + "cell_type": "markdown", + "id": "f8278af4", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{\\delta}=\\begin{bmatrix}\\beta_0 & 0 & 0 \\\\\n", + " \\beta_1 & \\beta_0 & 0 \\\\\n", + "\t\t\t \\beta_2 & \\beta_1 & \\beta_0 \\\\\n", + "\t\t\t \\beta_3 & \\beta_2 & \\beta_1 \\\\\n", + "\t\t\t 0 & \\beta_3 & \\beta_2 \\\\\n", + "\t\t\t 0 & 0 & \\beta_3\n", + "\t\t\t \\end{bmatrix}\\begin{bmatrix} \\alpha_0 \\\\ \\alpha_1 \\\\ \\alpha_2\\end{bmatrix}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "cfa8bf9e", + "metadata": { + "editable": true + }, + "source": [ + "Note that the use of these matrices is for mathematical purposes only\n", + "and not implementation purposes. When implementing the above equation\n", + "we do not encode (and allocate memory) the matrices explicitely. We\n", + "rather code the convolutions in the minimal memory footprint that they\n", + "require." + ] + }, + { + "cell_type": "markdown", + "id": "4ad971ca", + "metadata": { + "editable": true + }, + "source": [ + "## Toeplitz matrices\n", + "\n", + "The above matrices are examples of so-called [Toeplitz\n", + "matrices](https://link.springer.com/book/10.1007/978-93-86279-04-0). A\n", + "Toeplitz matrix is a matrix in which each descending diagonal from\n", + "left to right is constant. 
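As a quick numerical check of the polynomial example above, the short sketch below (our own addition, with arbitrary coefficient values) verifies that NumPy's `np.convolve` reproduces the coefficients $\delta_j$, and that the Toeplitz-style matrix-vector product gives the same numbers.

```python
import numpy as np

# Coefficients of p(t) = alpha_0 + alpha_1*t + alpha_2*t^2 (arbitrary values)
alpha = np.array([1.0, 2.0, 3.0])
# Coefficients of s(t) = beta_0 + beta_1*t + beta_2*t^2 + beta_3*t^3
beta = np.array([4.0, 5.0, 6.0, 7.0])

# Discrete convolution gives the coefficients delta_j of the product polynomial
delta = np.convolve(alpha, beta)     # length m + n - 1 = 6
print(delta)                         # [ 4. 13. 28. 34. 32. 21.]

# The same numbers from the Toeplitz-style matrix-vector product shown above:
# column j of A contains alpha shifted down by j rows
m, n = len(alpha), len(beta)
A = np.zeros((m + n - 1, n))
for j in range(n):
    A[j:j + m, j] = alpha
print(A @ beta)                      # identical to delta
```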
For instance the last matrix, which we\n", + "rewrite as" + ] + }, + { + "cell_type": "markdown", + "id": "ff12250a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{A}=\\begin{bmatrix}a_0 & 0 & 0 \\\\\n", + " a_1 & a_0 & 0 \\\\\n", + "\t\t\t a_2 & a_1 & a_0 \\\\\n", + "\t\t\t a_3 & a_2 & a_1 \\\\\n", + "\t\t\t 0 & a_3 & a_2 \\\\\n", + "\t\t\t 0 & 0 & a_3\n", + "\t\t\t \\end{bmatrix},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "d7ebe2e9", + "metadata": { + "editable": true + }, + "source": [ + "with elements $a_{ii}=a_{i+1,j+1}=a_{i-j}$ is an example of a Toeplitz\n", + "matrix. Such a matrix does not need to be a square matrix. Toeplitz\n", + "matrices are also closely connected with Fourier series, because the multiplication operator by a trigonometric\n", + "polynomial, compressed to a finite-dimensional space, can be\n", + "represented by such a matrix. The example above shows that we can\n", + "represent linear convolution as multiplication of a Toeplitz matrix by\n", + "a vector." + ] + }, + { + "cell_type": "markdown", + "id": "5bfa6cd4", + "metadata": { + "editable": true + }, + "source": [ + "## Fourier series and Toeplitz matrices\n", + "\n", + "This is an active and ogoing research area concerning CNNs. The following articles may be of interest\n", + "1. [Read more about the convolution theorem and Fouriers series](https://www.sciencedirect.com/topics/engineering/convolution-theorem#:~:text=The%20convolution%20theorem%20(together%20with,k%20)%20G%20(%20k%20)%20.)\n", + "\n", + "2. [Fourier Transform Layer](https://www.sciencedirect.com/science/article/pii/S1568494623006257)" + ] + }, + { + "cell_type": "markdown", + "id": "4cb64d8c", + "metadata": { + "editable": true + }, + "source": [ + "## Generalizing the above one-dimensional case\n", + "\n", + "In order to align the above simple case with the more general\n", + "convolution cases, we rename $\\boldsymbol{\\alpha}$, whose length is $m=3$,\n", + "with $\\boldsymbol{w}$. We will interpret $\\boldsymbol{w}$ as a weight/filter function\n", + "with which we want to perform the convolution with an input variable\n", + "$\\boldsymbol{x}$ of length $n$. We will assume always that the filter\n", + "$\\boldsymbol{w}$ has dimensionality $m \\le n$.\n", + "\n", + "We replace thus $\\boldsymbol{\\beta}$ with $\\boldsymbol{x}$ and $\\boldsymbol{\\delta}$ with $\\boldsymbol{y}$ and have" + ] + }, + { + "cell_type": "markdown", + "id": "b05f94fc", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y(i)= \\left(x*w\\right)(i)= \\sum_{k=0}^{k=m-1}w(k)x(i-k),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "e95bb8b8", + "metadata": { + "editable": true + }, + "source": [ + "where $m=3$ in our case, the maximum length of the vector $\\boldsymbol{w}$.\n", + "Here the symbol $*$ represents the mathematical operation of convolution." + ] + }, + { + "cell_type": "markdown", + "id": "490b28d9", + "metadata": { + "editable": true + }, + "source": [ + "## Memory considerations\n", + "\n", + "This expression leaves us however with some terms with negative\n", + "indices, for example $x(-1)$ and $x(-2)$ which may not be defined. 
Our\n", + "vector $\\boldsymbol{x}$ has components $x(0)$, $x(1)$, $x(2)$ and $x(3)$.\n", + "\n", + "The index $j$ for $\\boldsymbol{x}$ runs from $j=0$ to $j=3$ since $\\boldsymbol{x}$ is meant to\n", + "represent a third-order polynomial.\n", + "\n", + "Furthermore, the index $i$ runs from $i=0$ to $i=5$ since $\\boldsymbol{y}$\n", + "contains the coefficients of a fifth-order polynomial. When $i=5$ we\n", + "may also have values of $x(4)$ and $x(5)$ which are not defined." + ] + }, + { + "cell_type": "markdown", + "id": "73dba37b", + "metadata": { + "editable": true + }, + "source": [ + "## Padding\n", + "\n", + "The solution to this is what is called **padding**! We simply define a\n", + "new vector $x$ with two added elements set to zero before $x(0)$ and\n", + "two new elements after $x(3)$ set to zero. That is, we augment the\n", + "length of $\\boldsymbol{x}$ from $n=4$ to $n+2P=8$, where $P=2$ is the padding\n", + "constant (a new hyperparameter), see discussions below as well." + ] + }, + { + "cell_type": "markdown", + "id": "a4ef9cfb", + "metadata": { + "editable": true + }, + "source": [ + "## New vector\n", + "\n", + "We have a new vector defined as $x(0)=0$, $x(1)=0$,\n", + "$x(2)=\\beta_0$, $x(3)=\\beta_1$, $x(4)=\\beta_2$, $x(5)=\\beta_3$,\n", + "$x(6)=0$, and $x(7)=0$.\n", + "\n", + "We have added four new elements, which\n", + "are all zero. The benefit is that we can rewrite the equation for\n", + "$\\boldsymbol{y}$, with $i=0,1,\\dots,5$," + ] + }, + { + "cell_type": "markdown", + "id": "a3df037d", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y(i) = \\sum_{k=0}^{k=m-1}w(k)x(i+(m-1)-k).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "a10c95fd", + "metadata": { + "editable": true + }, + "source": [ + "As an example, we have" + ] + }, + { + "cell_type": "markdown", + "id": "be674b8a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y(4)=x(6)w(0)+x(5)w(1)+x(4)w(2)=0\\times \\alpha_0+\\beta_3\\alpha_1+\\beta_2\\alpha_2,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "c903130e", + "metadata": { + "editable": true + }, + "source": [ + "as before except that we have an additional term $x(6)w(0)$, which is zero.\n", + "\n", + "Similarly, for the fifth-order term we have" + ] + }, + { + "cell_type": "markdown", + "id": "369fb648", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y(5)=x(7)w(0)+x(6)w(1)+x(5)w(2)=0\\times \\alpha_0+0\\times\\alpha_1+\\beta_3\\alpha_2.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "9eae3982", + "metadata": { + "editable": true + }, + "source": [ + "The zeroth-order term is" + ] + }, + { + "cell_type": "markdown", + "id": "52147ec0", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y(0)=x(2)w(0)+x(1)w(1)+x(0)w(2)=\\beta_0 \\alpha_0+0\\times\\alpha_1+0\\times\\alpha_2=\\alpha_0\\beta_0.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "f26b1f24", + "metadata": { + "editable": true + }, + "source": [ + "## Rewriting as dot products\n", + "\n", + "If we now flip the filter/weight vector, with the following term as a typical example" + ] + }, + { + "cell_type": "markdown", + "id": "1cda7b7e", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y(0)=x(2)w(0)+x(1)w(1)+x(0)w(2)=x(2)\\tilde{w}(2)+x(1)\\tilde{w}(1)+x(0)\\tilde{w}(0),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "de80daa7", + "metadata": { + "editable": true + }, + "source": [ + "with $\\tilde{w}(0)=w(2)$, $\\tilde{w}(1)=w(1)$, and 
$\\tilde{w}(2)=w(0)$, we can then rewrite the above sum as a dot product of\n", + "$x(i:i+(m-1))\\tilde{w}$ for element $y(i)$, where $x(i:i+(m-1))$ is simply a patch of $\\boldsymbol{x}$ of size $m-1$.\n", + "\n", + "The padding $P$ we have introduced for the convolution stage is just\n", + "another hyperparameter which is introduced as part of the\n", + "architecture. Similarly, below we will also introduce another\n", + "hyperparameter called **Stride** $S$." + ] + }, + { + "cell_type": "markdown", + "id": "bdb16a64", + "metadata": { + "editable": true + }, + "source": [ + "## Cross correlation\n", + "\n", + "In essentially all applications one uses what is called cross correlation instead of the standard convolution described above.\n", + "This means that multiplication is performed in the same direction and instead of the general expression we have discussed above (with infinite sums)" + ] + }, + { + "cell_type": "markdown", + "id": "a88a1043", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y(i) = \\sum_{k=-\\infty}^{k=\\infty}w(k)x(i-k),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "03659d77", + "metadata": { + "editable": true + }, + "source": [ + "we have now" + ] + }, + { + "cell_type": "markdown", + "id": "532e84de", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y(i) = \\sum_{k=-\\infty}^{k=\\infty}w(k)x(i+k).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "0487f1f5", + "metadata": { + "editable": true + }, + "source": [ + "Both TensorFlow and PyTorch (as well as our own code example below),\n", + "implement the last equation, although it is normally referred to as\n", + "convolution. The same padding rules and stride rules discussed below\n", + "apply to this expression as well.\n", + "\n", + "We leave it as an exercise for you to convince yourself that the example we have discussed till now, gives the same final expression using the last expression." + ] + }, + { + "cell_type": "markdown", + "id": "98475dfa", + "metadata": { + "editable": true + }, + "source": [ + "## Two-dimensional objects\n", + "\n", + "We are now ready to start studying the discrete convolutions relevant for convolutional neural networks.\n", + "We often use convolutions over more than one dimension at a time. If\n", + "we have a two-dimensional image $X$ as input, we can have a **filter**\n", + "defined by a two-dimensional **kernel/weight/filter** $W$. 
This leads to an output $Y$" + ] + }, + { + "cell_type": "markdown", + "id": "1cb3be71", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "Y(i,j)=(X * W)(i,j) = \\sum_m\\sum_n X(m,n)W(i-m,j-n).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "bd9cd9fb", + "metadata": { + "editable": true + }, + "source": [ + "Convolution is a commutative process, which means we can rewrite this equation as" + ] + }, + { + "cell_type": "markdown", + "id": "1ba314a8", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "Y(i,j)=(X * W)(i,j) = \\sum_m\\sum_n X(i-m,j-n)W(m,n).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b9fb3fef", + "metadata": { + "editable": true + }, + "source": [ + "Normally the latter is more straightforward to implement in a machine\n", + "larning library since there is less variation in the range of values\n", + "of $m$ and $n$.\n", + "\n", + "As mentioned above, most deep learning libraries implement\n", + "cross-correlation instead of convolution (although it is referred to as\n", + "convolution)" + ] + }, + { + "cell_type": "markdown", + "id": "2d48086b", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "Y(i,j)=(X * W)(i,j) = \\sum_m\\sum_n X(i+m,j+n)W(m,n).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "2a62fbae", + "metadata": { + "editable": true + }, + "source": [ + "## CNNs in more detail, simple example\n", + "\n", + "Let assume we have an input matrix $X$ of dimensionality $3\\times 3$\n", + "and a $2\\times 2$ filter $W$ given by the following matrices" + ] + }, + { + "cell_type": "markdown", + "id": "0176ecc6", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{X}=\\begin{bmatrix}x_{00} & x_{01} & x_{02} \\\\\n", + " x_{10} & x_{11} & x_{12} \\\\\n", + "\t x_{20} & x_{21} & x_{22} \\end{bmatrix},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "f87b6051", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "164502cc", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{W}=\\begin{bmatrix}w_{00} & w_{01} \\\\\n", + "\t w_{10} & w_{11}\\end{bmatrix}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "1d4e61fe", + "metadata": { + "editable": true + }, + "source": [ + "We introduce now the hyperparameter $S$ **stride**. Stride represents how the filter $W$ moves the convolution process on the matrix $X$.\n", + "We strongly recommend the repository on [Arithmetic of deep learning by Dumoulin and Visin](https://github.com/vdumoulin/conv_arithmetic) \n", + "\n", + "Here we set the stride equal to $S=1$, which means that, starting with the element $x_{00}$, the filter will act on $2\\times 2$ submatrices each time, starting with the upper corner and moving according to the stride value column by column. 
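Before writing out the algebra, the following minimal sketch (our own, with arbitrary entries for $X$ and $W$) slides the $2\times 2$ filter over the $3\times 3$ input with stride $S=1$ and sums the elementwise products of each patch; this is exactly the operation spelled out below.

```python
import numpy as np

X = np.array([[1., 2., 3.],
              [4., 5., 6.],
              [7., 8., 9.]])
W = np.array([[1., 2.],
              [3., 4.]])

S = 1
size = (X.shape[0] - W.shape[0]) // S + 1      # (3 - 2)/1 + 1 = 2
Y = np.zeros((size, size))
for i in range(size):
    for j in range(size):
        patch = X[i*S:i*S + W.shape[0], j*S:j*S + W.shape[1]]
        Y[i, j] = np.sum(patch * W)            # elementwise product, then sum
print(Y)                                       # [[37. 47.] [67. 77.]]
```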
\n", + "\n", + "Here we perform the operation" + ] + }, + { + "cell_type": "markdown", + "id": "7aae890d", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "Y_(i,j)=(X * W)(i,j) = \\sum_m\\sum_n X(i-m,j-n)W(m,n),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "352ba109", + "metadata": { + "editable": true + }, + "source": [ + "and obtain" + ] + }, + { + "cell_type": "markdown", + "id": "4660c16f", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{Y}=\\begin{bmatrix}x_{00}w_{00}+x_{01}w_{01}+x_{10}w_{10}+x_{11}w_{11} & x_{01}w_{00}+x_{02}w_{01}+x_{11}w_{10}+x_{12}w_{11} \\\\\n", + "\t x_{10}w_{00}+x_{11}w_{01}+x_{20}w_{10}+x_{21}w_{11} & x_{11}w_{00}+x_{12}w_{01}+x_{21}w_{10}+x_{22}w_{11}\\end{bmatrix}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "edb9d39b", + "metadata": { + "editable": true + }, + "source": [ + "We can rewrite this operation in terms of a matrix-vector multiplication by defining a new vector where we flatten out the inputs as a vector $\\boldsymbol{X}'$ of length $9$ and\n", + "a matrix $\\boldsymbol{W}'$ with dimension $4\\times 9$ as" + ] + }, + { + "cell_type": "markdown", + "id": "11470079", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{X}'=\\begin{bmatrix}x_{00} \\\\ x_{01} \\\\ x_{02} \\\\ x_{10} \\\\ x_{11} \\\\ x_{12} \\\\ x_{20} \\\\ x_{21} \\\\ x_{22} \\end{bmatrix},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "9b505f16", + "metadata": { + "editable": true + }, + "source": [ + "and the new matrix" + ] + }, + { + "cell_type": "markdown", + "id": "30c903b3", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{W}'=\\begin{bmatrix} w_{00} & w_{01} & 0 & w_{10} & w_{11} & 0 & 0 & 0 & 0 \\\\\n", + " 0 & w_{00} & w_{01} & 0 & w_{10} & w_{11} & 0 & 0 & 0 \\\\\n", + "\t\t\t0 & 0 & 0 & w_{00} & w_{01} & 0 & w_{10} & w_{11} & 0 \\\\\n", + " 0 & 0 & 0 & 0 & w_{00} & w_{01} & 0 & w_{10} & w_{11}\\end{bmatrix}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "057a5e31", + "metadata": { + "editable": true + }, + "source": [ + "We see easily that performing the matrix-vector multiplication $\\boldsymbol{W}'\\boldsymbol{X}'$ is the same as the above convolution with stride $S=1$, that is" + ] + }, + { + "cell_type": "markdown", + "id": "e5f35917", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "Y=(\\boldsymbol{W}*\\boldsymbol{X}),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "7c7cca5e", + "metadata": { + "editable": true + }, + "source": [ + "is now given by $\\boldsymbol{W}'\\boldsymbol{X}'$ which is a vector of length $4$ instead of the originally resulting $2\\times 2$ output matrix." + ] + }, + { + "cell_type": "markdown", + "id": "ed8782fc", + "metadata": { + "editable": true + }, + "source": [ + "## The convolution stage\n", + "\n", + "The convolution stage, where we apply different filters $\\boldsymbol{W}$ in\n", + "order to reduce the dimensionality of an image, adds, in addition to\n", + "the weights and biases (to be trained by the back propagation\n", + "algorithm) that define the filters, two new hyperparameters, the so-called\n", + "**padding** $P$ and the stride $S$." + ] + }, + { + "cell_type": "markdown", + "id": "3582873f", + "metadata": { + "editable": true + }, + "source": [ + "## Finding the number of parameters\n", + "\n", + "In the above example we have an input matrix of dimension $3\\times\n", + "3$. 
In general we call the input an input volume, and it is defined\n", + "by its width $W_1$, height $H_1$ and depth $D_1$. If we have the\n", + "standard three color channels $D_1=3$.\n", + "\n", + "The above example has $W_1=H_1=3$ and $D_1=1$.\n", + "\n", + "When we introduce the filter we have the following additional hyperparameters\n", + "1. $K$ the number of filters. It is common to perform the convolution of the input several times, since experience shows that shrinking the input too fast does not work well\n", + "\n", + "2. $F$ as the filter's spatial extent\n", + "\n", + "3. $S$ as the stride parameter\n", + "\n", + "4. $P$ as the padding parameter\n", + "\n", + "These parameters are defined by the architecture of the network and are not included in the training." + ] + }, + { + "cell_type": "markdown", + "id": "c06b2b85", + "metadata": { + "editable": true + }, + "source": [ + "## New image (or volume)\n", + "\n", + "Acting with the filter on the input volume produces an output volume\n", + "which is defined by its width $W_2$, its height $H_2$ and its depth\n", + "$D_2$.\n", + "\n", + "These are defined by the following relations" + ] + }, + { + "cell_type": "markdown", + "id": "aa9ff748", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "W_2 = \\frac{(W_1-F+2P)}{S}+1,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "0508533e", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "H_2 = \\frac{(H_1-F+2P)}{S}+1,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "2b59d0d6", + "metadata": { + "editable": true + }, + "source": [ + "and $D_2=K$." + ] + }, + { + "cell_type": "markdown", + "id": "e283f13b", + "metadata": { + "editable": true + }, + "source": [ + "## Parameters to train, common settings\n", + "\n", + "With parameter sharing, the convolution thus involves for each filter $F\\times F\\times D_1$ weights plus one bias parameter.\n", + "\n", + "In total we have" + ] + }, + { + "cell_type": "markdown", + "id": "59617fcb", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\left(F\\times F\\times D_1\\right) \\times K+(K\\,\\,\\mathrm{biases}),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "f406e197", + "metadata": { + "editable": true + }, + "source": [ + "parameters to train by back propagation.\n", + "\n", + "It is common to let $K$ come in powers of $2$, that is $32$, $64$, $128$ etc.\n", + "\n", + "**Common settings.**\n", + "\n", + "1. $\\begin{array}{c} F=3 & S=1 & P=1 \\end{array}$\n", + "\n", + "2. $\\begin{array}{c} F=5 & S=1 & P=2 \\end{array}$\n", + "\n", + "3. $\\begin{array}{c} F=5 & S=2 & P=\\mathrm{open} \\end{array}$\n", + "\n", + "4. $\\begin{array}{c} F=1 & S=1 & P=0 \\end{array}$" + ] + }, + { + "cell_type": "markdown", + "id": "82febfb4", + "metadata": { + "editable": true + }, + "source": [ + "## Examples of CNN setups\n", + "\n", + "Let us assume we have an input volume $V$ given by an image of dimensionality\n", + "$32\\times 32 \\times 3$, that is three color channels and $32\\times 32$ pixels.\n", + "\n", + "We apply a filter of dimension $5\\times 5$ ten times with stride $S=1$ and padding $P=0$.\n", + "\n", + "The output volume is given by $(32-5)/1+1=28$, resulting in ten feature maps,\n", + "each of dimensionality $28\\times 28$, that is, an output volume of dimension $28\\times 28\\times 10$.\n", + "\n", + "The total number of parameters to train for each filter is then\n", + "$5\\times 5\\times 3+1$, where the last parameter is the bias. 
This\n", + "gives us $76$ parameters for each filter, leading to a total of $760$\n", + "parameters for the ten filters.\n", + "\n", + "How many parameters will a filter of dimensionality $3\\times 3$\n", + "(adding color channels) result in if we produce $32$ new images? Use $S=1$ and $P=0$.\n", + "\n", + "Note that strides constitute a form of **subsampling**. As an alternative to\n", + "being interpreted as a measure of how much the kernel/filter is translated, strides\n", + "can also be viewed as how much of the output is retained. For instance, moving\n", + "the kernel by hops of two is equivalent to moving the kernel by hops of one but\n", + "retaining only odd output elements." + ] + }, + { + "cell_type": "markdown", + "id": "638e063c", + "metadata": { + "editable": true + }, + "source": [ + "## Summarizing: Performing a general discrete convolution ([From Raschka et al](https://github.com/rasbt/machine-learning-book))\n", + "\n", + "\n", + "\n", + "\n", + "

    Figure 1: A deep CNN
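The output-size and parameter-count relations above are easy to check numerically. The helpers below are a small sketch (the function names are ours), applied to the $32\times 32\times 3$ example with $K=10$ filters of spatial extent $F=5$, stride $S=1$ and padding $P=0$; the same helpers can be used for the $3\times 3$ filter question above.

```python
def conv_output_size(W1, F, P, S):
    """Width (or height) of the output volume, W2 = (W1 - F + 2P)/S + 1."""
    return (W1 - F + 2 * P) // S + 1

def conv_parameters(F, D1, K):
    """F*F*D1 weights plus one bias per filter, for K filters."""
    return (F * F * D1 + 1) * K

W1, D1 = 32, 3            # 32x32 RGB input volume
F, K, S, P = 5, 10, 1, 0
print(conv_output_size(W1, F, P, S))   # 28, so the output volume is 28 x 28 x 10
print(conv_parameters(F, D1, K))       # 760 trainable parameters in total
```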

    \n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "d182de4b", + "metadata": { + "editable": true + }, + "source": [ + "## Pooling\n", + "\n", + "In addition to discrete convolutions themselves, **pooling** operations\n", + "make up another important building block in CNNs. Pooling operations reduce\n", + "the size of feature maps by using some function to summarize subregions, such\n", + "as taking the average or the maximum value.\n", + "\n", + "Pooling works by sliding a window across the input and feeding the content of\n", + "the window to a **pooling function**. In some sense, pooling works very much\n", + "like a discrete convolution, but replaces the linear combination described by\n", + "the kernel with some other function." + ] + }, + { + "cell_type": "markdown", + "id": "1159bffe", + "metadata": { + "editable": true + }, + "source": [ + "## Pooling arithmetic\n", + "\n", + "In a neural network, pooling layers provide invariance to small translations of\n", + "the input. The most common kind of pooling is **max pooling**, which\n", + "consists in splitting the input in (usually non-overlapping) patches and\n", + "outputting the maximum value of each patch. Other kinds of pooling exist, e.g.,\n", + "mean or average pooling, which all share the same idea of aggregating the input\n", + "locally by applying a non-linearity to the content of some patches." + ] + }, + { + "cell_type": "markdown", + "id": "138b6d6a", + "metadata": { + "editable": true + }, + "source": [ + "## Pooling types ([From Raschka et al](https://github.com/rasbt/machine-learning-book))\n", + "\n", + "\n", + "\n", + "\n", + "

    Figure 1: A deep CNN
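As a concrete illustration of the pooling operations just described, here is a minimal sketch (our own, with arbitrary feature-map values) of $2\times 2$ max pooling with stride $2$; replacing `max` by `mean` gives average pooling.

```python
import numpy as np

X = np.array([[1., 3., 2., 0.],
              [4., 6., 5., 1.],
              [7., 2., 9., 8.],
              [0., 1., 3., 4.]])

pool, stride = 2, 2
size = X.shape[0] // stride
Y = np.zeros((size, size))
for i in range(size):
    for j in range(size):
        patch = X[i*stride:i*stride + pool, j*stride:j*stride + pool]
        Y[i, j] = patch.max()      # keep the maximum of each 2x2 patch
print(Y)                           # [[6. 5.] [7. 9.]]
```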

    \n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "97123878", + "metadata": { + "editable": true + }, + "source": [ + "## Building convolutional neural networks in Tensorflow/Keras and PyTorch\n", + "\n", + "As discussed above, CNNs are neural networks built from the assumption\n", + "that the inputs to the network are 2D images. This is important\n", + "because the number of features or pixels in images grows very fast\n", + "with the image size, and an enormous number of weights and biases are\n", + "needed in order to build an accurate network. Next week we will\n", + "discuss in more detail how we can build a CNN using either TensorFlow\n", + "with Keras and PyTorch." + ] + } + ], + "metadata": {}, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/doc/LectureNotes/week45.ipynb b/doc/LectureNotes/week45.ipynb new file mode 100644 index 000000000..c5336e2ab --- /dev/null +++ b/doc/LectureNotes/week45.ipynb @@ -0,0 +1,2335 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "9686648f", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "45892517", + "metadata": { + "editable": true + }, + "source": [ + "# Week 45, Convolutional Neural Networks (CCNs)\n", + "**Morten Hjorth-Jensen**, Department of Physics, University of Oslo\n", + "\n", + "Date: **November 3-7, 2025**" + ] + }, + { + "cell_type": "markdown", + "id": "8449fbfd", + "metadata": { + "editable": true + }, + "source": [ + "## Plans for week 45\n", + "\n", + "**Material for the lecture on Monday November 3, 2025.**\n", + "\n", + "1. Convolutional Neural Networks, codes and examples (TensorFlow and Pytorch implementations)\n", + "\n", + "2. Readings and Videos:\n", + "\n", + "3. These lecture notes at \n", + "\n", + "4. Video of lecture at \n", + "\n", + "5. Whiteboard notes at \n", + "\n", + "6. For a more in depth discussion on CNNs we recommend Goodfellow et al chapters 9. See also chapter 11 and 12 on practicalities and applications \n", + "\n", + "7. Reading suggestions for implementation of CNNs, see Raschka et al chapters 14-15 at .\n", + "\n", + "\n", + "a. Video on Deep Learning at " + ] + }, + { + "cell_type": "markdown", + "id": "4ad8a4b2", + "metadata": { + "editable": true + }, + "source": [ + "## Material for the lab sessions\n", + "\n", + "Discussion of and work on project 2, no exercises this week, only project work" + ] + }, + { + "cell_type": "markdown", + "id": "48e99fbe", + "metadata": { + "editable": true + }, + "source": [ + "## Material for Lecture Monday November 3" + ] + }, + { + "cell_type": "markdown", + "id": "661e183c", + "metadata": { + "editable": true + }, + "source": [ + "## Convolutional Neural Networks (recognizing images), reminder from last week\n", + "\n", + "Convolutional neural networks (CNNs) were developed during the last\n", + "decade of the previous century, with a focus on character recognition\n", + "tasks. Nowadays, CNNs are a central element in the spectacular success\n", + "of deep learning methods. The success in for example image\n", + "classifications have made them a central tool for most machine\n", + "learning practitioners.\n", + "\n", + "CNNs are very similar to ordinary Neural Networks.\n", + "They are made up of neurons that have learnable weights and\n", + "biases. Each neuron receives some inputs, performs a dot product and\n", + "optionally follows it with a non-linearity. 
The whole network still\n", + "expresses a single differentiable score function: from the raw image\n", + "pixels on one end to class scores at the other. And they still have a\n", + "loss function (for example Softmax) on the last (fully-connected) layer\n", + "and all the tips/tricks we developed for learning regular Neural\n", + "Networks still apply (back propagation, gradient descent etc etc)." + ] + }, + { + "cell_type": "markdown", + "id": "96a38398", + "metadata": { + "editable": true + }, + "source": [ + "## What is the Difference\n", + "\n", + "**CNN architectures make the explicit assumption that\n", + "the inputs are images, which allows us to encode certain properties\n", + "into the architecture. These then make the forward function more\n", + "efficient to implement and vastly reduce the amount of parameters in\n", + "the network.**" + ] + }, + { + "cell_type": "markdown", + "id": "3ca522fb", + "metadata": { + "editable": true + }, + "source": [ + "## Neural Networks vs CNNs\n", + "\n", + "Neural networks are defined as **affine transformations**, that is \n", + "a vector is received as input and is multiplied with a matrix of so-called weights (our unknown paramters) to produce an\n", + "output (to which a bias vector is usually added before passing the result\n", + "through a nonlinear activation function). This is applicable to any type of input, be it an\n", + "image, a sound clip or an unordered collection of features: whatever their\n", + "dimensionality, their representation can always be flattened into a vector\n", + "before the transformation." + ] + }, + { + "cell_type": "markdown", + "id": "609aa156", + "metadata": { + "editable": true + }, + "source": [ + "## Why CNNS for images, sound files, medical images from CT scans etc?\n", + "\n", + "However, when we consider images, sound clips and many other similar kinds of data, these data have an intrinsic\n", + "structure. More formally, they share these important properties:\n", + "* They are stored as multi-dimensional arrays (think of the pixels of a figure) .\n", + "\n", + "* They feature one or more axes for which ordering matters (e.g., width and height axes for an image, time axis for a sound clip).\n", + "\n", + "* One axis, called the channel axis, is used to access different views of the data (e.g., the red, green and blue channels of a color image, or the left and right channels of a stereo audio track).\n", + "\n", + "These properties are not exploited when an affine transformation is applied; in\n", + "fact, all the axes are treated in the same way and the topological information\n", + "is not taken into account. Still, taking advantage of the implicit structure of\n", + "the data may prove very handy in solving some tasks, like computer vision and\n", + "speech recognition, and in these cases it would be best to preserve it. This is\n", + "where discrete convolutions come into play.\n", + "\n", + "A discrete convolution is a linear transformation that preserves this notion of\n", + "ordering. It is sparse (only a few input units contribute to a given output\n", + "unit) and reuses parameters (the same weights are applied to multiple locations\n", + "in the input)." 
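Both properties are easy to see if we write a small one-dimensional convolution as a matrix acting on the input. The sketch below is our own (input length, kernel length and weights are arbitrary): the resulting matrix is mostly zeros, and every row reuses the same $m$ weights, whereas a dense layer of the same shape would have an independent weight for every entry.

```python
import numpy as np

n, m = 8, 3                      # input length and kernel length
w = np.array([1.0, -2.0, 0.5])   # the m shared weights

# "Valid" convolution written as an (n-m+1) x n matrix
C = np.zeros((n - m + 1, n))
for i in range(n - m + 1):
    C[i, i:i + m] = w[::-1]      # flipped kernel, so that C @ x = np.convolve(x, w, 'valid')

x = np.arange(1.0, n + 1)        # an arbitrary input signal
print(np.allclose(C @ x, np.convolve(x, w, mode="valid")))    # True

print("independent parameters, convolution:", m)
print("independent parameters, dense layer:", C.size)
print("fraction of zero entries in C:", np.mean(C == 0))
```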
+ ] + }, + { + "cell_type": "markdown", + "id": "c280e4de", + "metadata": { + "editable": true + }, + "source": [ + "## Regular NNs don’t scale well to full images\n", + "\n", + "As an example, consider\n", + "an image of size $32\\times 32\\times 3$ (32 wide, 32 high, 3 color channels), so a\n", + "single fully-connected neuron in a first hidden layer of a regular\n", + "Neural Network would have $32\\times 32\\times 3 = 3072$ weights. This amount still\n", + "seems manageable, but clearly this fully-connected structure does not\n", + "scale to larger images. For example, an image of more respectable\n", + "size, say $200\\times 200\\times 3$, would lead to neurons that have \n", + "$200\\times 200\\times 3 = 120,000$ weights. \n", + "\n", + "We could have\n", + "several such neurons, and the parameters would add up quickly! Clearly,\n", + "this full connectivity is wasteful and the huge number of parameters\n", + "would quickly lead to possible overfitting.\n", + "\n", + "\n", + "\n", + "\n", + "

    Figure 1: A regular 3-layer Neural Network.
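The weight counts quoted above are quick to verify. The snippet below also adds a hypothetical hidden layer of $100$ such neurons (that number is ours, purely for illustration) to show how quickly the parameters add up.

```python
print(32 * 32 * 3)       # 3072 weights for one neuron connected to a 32x32x3 image
print(200 * 200 * 3)     # 120000 weights for one neuron and a 200x200x3 image

n_hidden = 100           # hypothetical hidden-layer size
print(n_hidden * 200 * 200 * 3 + n_hidden)   # weights plus biases for the whole layer
```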

    \n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "0d86d50e", + "metadata": { + "editable": true + }, + "source": [ + "## 3D volumes of neurons\n", + "\n", + "Convolutional Neural Networks take advantage of the fact that the\n", + "input consists of images and they constrain the architecture in a more\n", + "sensible way. \n", + "\n", + "In particular, unlike a regular Neural Network, the\n", + "layers of a CNN have neurons arranged in 3 dimensions: width,\n", + "height, depth. (Note that the word depth here refers to the third\n", + "dimension of an activation volume, not to the depth of a full Neural\n", + "Network, which can refer to the total number of layers in a network.)\n", + "\n", + "To understand it better, the above example of an image \n", + "with an input volume of\n", + "activations has dimensions $32\\times 32\\times 3$ (width, height,\n", + "depth respectively). \n", + "\n", + "The neurons in a layer will\n", + "only be connected to a small region of the layer before it, instead of\n", + "all of the neurons in a fully-connected manner. Moreover, the final\n", + "output layer could for this specific image have dimensions $1\\times 1 \\times 10$, \n", + "because by the\n", + "end of the CNN architecture we will reduce the full image into a\n", + "single vector of class scores, arranged along the depth\n", + "dimension. \n", + "\n", + "\n", + "\n", + "\n", + "

    Figure 1: A CNN arranges its neurons in three dimensions (width, height, depth), as visualized in one of the layers. Every layer of a CNN transforms the 3D input volume to a 3D output volume of neuron activations. In this example, the red input layer holds the image, so its width and height would be the dimensions of the image, and the depth would be 3 (Red, Green, Blue channels).

    \n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "93102a35", + "metadata": { + "editable": true + }, + "source": [ + "## More on Dimensionalities\n", + "\n", + "In fields like signal processing (and imaging as well), one designs\n", + "so-called filters. These filters are defined by the convolutions and\n", + "are often hand-crafted. One may specify filters for smoothing, edge\n", + "detection, frequency reshaping, and similar operations. However with\n", + "neural networks the idea is to automatically learn the filters and use\n", + "many of them in conjunction with non-linear operations (activation\n", + "functions).\n", + "\n", + "As an example consider a neural network operating on sound sequence\n", + "data. Assume that we an input vector $\\boldsymbol{x}$ of length $d=10^6$. We\n", + "construct then a neural network with onle hidden layer only with\n", + "$10^4$ nodes. This means that we will have a weight matrix with\n", + "$10^4\\times 10^6=10^{10}$ weights to be determined, together with $10^4$ biases.\n", + "\n", + "Assume furthermore that we have an output layer which is meant to train whether the sound sequence represents a human voice (true) or something else (false).\n", + "It means that we have only one output node. But since this output node connects to $10^4$ nodes in the hidden layer, there are in total $10^4$ weights to be determined for the output layer, plus one bias. In total we have" + ] + }, + { + "cell_type": "markdown", + "id": "b0e6ea33", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\mathrm{NumberParameters}=10^{10}+10^4+10^4+1 \\approx 10^{10},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "3fbba997", + "metadata": { + "editable": true + }, + "source": [ + "that is ten billion parameters to determine." + ] + }, + { + "cell_type": "markdown", + "id": "4be9d3e0", + "metadata": { + "editable": true + }, + "source": [ + "## Further remarks\n", + "\n", + "The main principles that justify convolutions is locality of\n", + "information and repetion of patterns within the signal. Sound samples\n", + "of the input in adjacent spots are much more likely to affect each\n", + "other than those that are very far away. Similarly, sounds are\n", + "repeated in multiple times in the signal. While slightly simplistic,\n", + "reasoning about such a sound example demonstrates this. The same\n", + "principles then apply to images and other similar data." + ] + }, + { + "cell_type": "markdown", + "id": "b93711ab", + "metadata": { + "editable": true + }, + "source": [ + "## Layers used to build CNNs\n", + "\n", + "A simple CNN is a sequence of layers, and every layer of a CNN\n", + "transforms one volume of activations to another through a\n", + "differentiable function. We use three main types of layers to build\n", + "CNN architectures: Convolutional Layer, Pooling Layer, and\n", + "Fully-Connected Layer (exactly as seen in regular Neural Networks). We\n", + "will stack these layers to form a full CNN architecture.\n", + "\n", + "A simple CNN for image classification could have the architecture:\n", + "\n", + "* **INPUT** ($32\\times 32 \\times 3$) will hold the raw pixel values of the image, in this case an image of width 32, height 32, and with three color channels R,G,B.\n", + "\n", + "* **CONV** (convolutional )layer will compute the output of neurons that are connected to local regions in the input, each computing a dot product between their weights and a small region they are connected to in the input volume. 
This may result in volume such as $[32\\times 32\\times 12]$ if we decided to use 12 filters.\n", + "\n", + "* **RELU** layer will apply an elementwise activation function, such as the $max(0,x)$ thresholding at zero. This leaves the size of the volume unchanged ($[32\\times 32\\times 12]$).\n", + "\n", + "* **POOL** (pooling) layer will perform a downsampling operation along the spatial dimensions (width, height), resulting in volume such as $[16\\times 16\\times 12]$.\n", + "\n", + "* **FC** (i.e. fully-connected) layer will compute the class scores, resulting in volume of size $[1\\times 1\\times 10]$, where each of the 10 numbers correspond to a class score, such as among the 10 categories of the MNIST images we considered above . As with ordinary Neural Networks and as the name implies, each neuron in this layer will be connected to all the numbers in the previous volume." + ] + }, + { + "cell_type": "markdown", + "id": "df93de2c", + "metadata": { + "editable": true + }, + "source": [ + "## Transforming images\n", + "\n", + "CNNs transform the original image layer by layer from the original\n", + "pixel values to the final class scores. \n", + "\n", + "Observe that some layers contain\n", + "parameters and other don’t. In particular, the CNN layers perform\n", + "transformations that are a function of not only the activations in the\n", + "input volume, but also of the parameters (the weights and biases of\n", + "the neurons). On the other hand, the RELU/POOL layers will implement a\n", + "fixed function. The parameters in the CONV/FC layers will be trained\n", + "with gradient descent so that the class scores that the CNN computes\n", + "are consistent with the labels in the training set for each image." + ] + }, + { + "cell_type": "markdown", + "id": "35b469f8", + "metadata": { + "editable": true + }, + "source": [ + "## CNNs in brief\n", + "\n", + "In summary:\n", + "\n", + "* A CNN architecture is in the simplest case a list of Layers that transform the image volume into an output volume (e.g. holding the class scores)\n", + "\n", + "* There are a few distinct types of Layers (e.g. CONV/FC/RELU/POOL are by far the most popular)\n", + "\n", + "* Each Layer accepts an input 3D volume and transforms it to an output 3D volume through a differentiable function\n", + "\n", + "* Each Layer may or may not have parameters (e.g. CONV/FC do, RELU/POOL don’t)\n", + "\n", + "* Each Layer may or may not have additional hyperparameters (e.g. CONV/FC/POOL do, RELU doesn’t)" + ] + }, + { + "cell_type": "markdown", + "id": "f2bc243c", + "metadata": { + "editable": true + }, + "source": [ + "## A deep CNN model ([From Raschka et al](https://github.com/rasbt/machine-learning-book))\n", + "\n", + "\n", + "\n", + "\n", + "

    Figure 1: A deep CNN
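A layer stack of the kind sketched in the figure can be written down in a few lines. The following is a minimal PyTorch sketch (assuming `torch` is installed; the layer sizes are arbitrary and only meant to mirror the INPUT, CONV, RELU, POOL and FC stages listed above), mainly to show how the shapes propagate from a $32\times 32$ RGB input to ten class scores.

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(in_channels=3, out_channels=12, kernel_size=3, padding=1),  # CONV
    nn.ReLU(),                                                            # RELU
    nn.MaxPool2d(kernel_size=2),                                          # POOL
    nn.Flatten(),
    nn.Linear(12 * 16 * 16, 10),                                          # FC, ten class scores
)

x = torch.randn(1, 3, 32, 32)    # one 32x32 RGB image
print(model(x).shape)            # torch.Size([1, 10])
```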

    \n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "92956a26", + "metadata": { + "editable": true + }, + "source": [ + "## Key Idea\n", + "\n", + "A dense neural network is representd by an affine operation (like matrix-matrix multiplication) where all parameters are included.\n", + "\n", + "The key idea in CNNs for say imaging is that in images neighbor pixels tend to be related! So we connect\n", + "only neighboring neurons in the input instead of connecting all with the first hidden layer.\n", + "\n", + "We say we perform a filtering (convolution is the mathematical operation)." + ] + }, + { + "cell_type": "markdown", + "id": "b758f4ee", + "metadata": { + "editable": true + }, + "source": [ + "## Mathematics of CNNs\n", + "\n", + "The mathematics of CNNs is based on the mathematical operation of\n", + "**convolution**. In mathematics (in particular in functional analysis),\n", + "convolution is represented by mathematical operations (integration,\n", + "summation etc) on two functions in order to produce a third function\n", + "that expresses how the shape of one gets modified by the other.\n", + "Convolution has a plethora of applications in a variety of\n", + "disciplines, spanning from statistics to signal processing, computer\n", + "vision, solutions of differential equations,linear algebra,\n", + "engineering, and yes, machine learning.\n", + "\n", + "Mathematically, convolution is defined as follows (one-dimensional example):\n", + "Let us define a continuous function $y(t)$ given by" + ] + }, + { + "cell_type": "markdown", + "id": "9fa911b3", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y(t) = \\int x(a) w(t-a) da,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "918817a5", + "metadata": { + "editable": true + }, + "source": [ + "where $x(a)$ represents a so-called input and $w(t-a)$ is normally called the weight function or kernel.\n", + "\n", + "The above integral is written in a more compact form as" + ] + }, + { + "cell_type": "markdown", + "id": "d5538df6", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y(t) = \\left(x * w\\right)(t).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "d4a4e2bc", + "metadata": { + "editable": true + }, + "source": [ + "The discretized version reads" + ] + }, + { + "cell_type": "markdown", + "id": "68268e68", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y(t) = \\sum_{a=-\\infty}^{a=\\infty}x(a)w(t-a).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "198bcce9", + "metadata": { + "editable": true + }, + "source": [ + "Computing the inverse of the above convolution operations is known as deconvolution and the process is commutative.\n", + "\n", + "How can we use this? And what does it mean? Let us study some familiar examples first." + ] + }, + { + "cell_type": "markdown", + "id": "43b535c4", + "metadata": { + "editable": true + }, + "source": [ + "## Convolution Examples: Polynomial multiplication\n", + "\n", + "Our first example is that of a multiplication between two polynomials,\n", + "which we will rewrite in terms of the mathematics of convolution. 
In\n", + "the final stage, since the problem here is a discrete one, we will\n", + "recast the final expression in terms of a matrix-vector\n", + "multiplication, where the matrix is a so-called [Toeplitz matrix\n", + "](https://link.springer.com/book/10.1007/978-93-86279-04-0).\n", + "\n", + "Let us look a the following polynomials to second and third order, respectively:" + ] + }, + { + "cell_type": "markdown", + "id": "45bc8ffc", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "p(t) = \\alpha_0+\\alpha_1 t+\\alpha_2 t^2,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "2c42df04", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "08c139bf", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "s(t) = \\beta_0+\\beta_1 t+\\beta_2 t^2+\\beta_3 t^3.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "bf189420", + "metadata": { + "editable": true + }, + "source": [ + "The polynomial multiplication gives us a new polynomial of degree $5$" + ] + }, + { + "cell_type": "markdown", + "id": "7f5d7607", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "z(t) = \\delta_0+\\delta_1 t+\\delta_2 t^2+\\delta_3 t^3+\\delta_4 t^4+\\delta_5 t^5.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "a2f47e64", + "metadata": { + "editable": true + }, + "source": [ + "## Efficient Polynomial Multiplication\n", + "\n", + "Computing polynomial products can be implemented efficiently if we rewrite the more brute force multiplications using convolution.\n", + "We note first that the new coefficients are given as" + ] + }, + { + "cell_type": "markdown", + "id": "7890aee8", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\begin{split}\n", + "\\delta_0=&\\alpha_0\\beta_0\\\\\n", + "\\delta_1=&\\alpha_1\\beta_0+\\alpha_0\\beta_1\\\\\n", + "\\delta_2=&\\alpha_0\\beta_2+\\alpha_1\\beta_1+\\alpha_2\\beta_0\\\\\n", + "\\delta_3=&\\alpha_1\\beta_2+\\alpha_2\\beta_1+\\alpha_0\\beta_3\\\\\n", + "\\delta_4=&\\alpha_2\\beta_2+\\alpha_1\\beta_3\\\\\n", + "\\delta_5=&\\alpha_2\\beta_3.\\\\\n", + "\\end{split}\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "6a03a3eb", + "metadata": { + "editable": true + }, + "source": [ + "We note that $\\alpha_i=0$ except for $i\\in \\left\\{0,1,2\\right\\}$ and $\\beta_i=0$ except for $i\\in\\left\\{0,1,2,3\\right\\}$.\n", + "\n", + "We can then rewrite the coefficients $\\delta_j$ using a discrete convolution as" + ] + }, + { + "cell_type": "markdown", + "id": "b49e404f", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_j = \\sum_{i=-\\infty}^{i=\\infty}\\alpha_i\\beta_{j-i}=(\\alpha * \\beta)_j,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "4ef5b061", + "metadata": { + "editable": true + }, + "source": [ + "or as a double sum with restriction $l=i+j$" + ] + }, + { + "cell_type": "markdown", + "id": "61685a6c", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_l = \\sum_{ij}\\alpha_i\\beta_{j}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "7ced5341", + "metadata": { + "editable": true + }, + "source": [ + "## Further simplification\n", + "\n", + "Although we may have redundant operations with some few zeros for $\\beta_i$, we can rewrite the above sum in a more compact way as" + ] + }, + { + "cell_type": "markdown", + "id": "3d00697e", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\delta_i = 
\\sum_{k=0}^{k=m-1}\\alpha_k\\beta_{i-k},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "22837be3", + "metadata": { + "editable": true + }, + "source": [ + "where $m=3$ in our case, the maximum length of\n", + "the vector $\\alpha$. Note that the vector $\\boldsymbol{\\beta}$ has length $n=4$. Below we will find an even more efficient representation." + ] + }, + { + "cell_type": "markdown", + "id": "1603a086", + "metadata": { + "editable": true + }, + "source": [ + "## A more efficient way of coding the above Convolution\n", + "\n", + "Since we only have a finite number of $\\alpha$ and $\\beta$ values\n", + "which are non-zero, we can rewrite the above convolution expressions\n", + "as a matrix-vector multiplication" + ] + }, + { + "cell_type": "markdown", + "id": "340acf5c", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{\\delta}=\\begin{bmatrix}\\alpha_0 & 0 & 0 & 0 \\\\\n", + " \\alpha_1 & \\alpha_0 & 0 & 0 \\\\\n", + "\t\t\t \\alpha_2 & \\alpha_1 & \\alpha_0 & 0 \\\\\n", + "\t\t\t 0 & \\alpha_2 & \\alpha_1 & \\alpha_0 \\\\\n", + "\t\t\t 0 & 0 & \\alpha_2 & \\alpha_1 \\\\\n", + "\t\t\t 0 & 0 & 0 & \\alpha_2\n", + "\t\t\t \\end{bmatrix}\\begin{bmatrix} \\beta_0 \\\\ \\beta_1 \\\\ \\beta_2 \\\\ \\beta_3\\end{bmatrix}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "cdc8d513", + "metadata": { + "editable": true + }, + "source": [ + "## Commutative process\n", + "\n", + "The process is commutative and we can easily see that we can rewrite the multiplication in terms of a matrix holding $\\beta$ and a vector holding $\\alpha$.\n", + "In this case we have" + ] + }, + { + "cell_type": "markdown", + "id": "51e1f3d8", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{\\delta}=\\begin{bmatrix}\\beta_0 & 0 & 0 \\\\\n", + " \\beta_1 & \\beta_0 & 0 \\\\\n", + "\t\t\t \\beta_2 & \\beta_1 & \\beta_0 \\\\\n", + "\t\t\t \\beta_3 & \\beta_2 & \\beta_1 \\\\\n", + "\t\t\t 0 & \\beta_3 & \\beta_2 \\\\\n", + "\t\t\t 0 & 0 & \\beta_3\n", + "\t\t\t \\end{bmatrix}\\begin{bmatrix} \\alpha_0 \\\\ \\alpha_1 \\\\ \\alpha_2\\end{bmatrix}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "ce936f65", + "metadata": { + "editable": true + }, + "source": [ + "Note that the use of these matrices is for mathematical purposes only\n", + "and not implementation purposes. When implementing the above equation\n", + "we do not encode (and allocate memory) the matrices explicitely. We\n", + "rather code the convolutions in the minimal memory footprint that they\n", + "require." + ] + }, + { + "cell_type": "markdown", + "id": "c93a683f", + "metadata": { + "editable": true + }, + "source": [ + "## Toeplitz matrices\n", + "\n", + "The above matrices are examples of so-called [Toeplitz\n", + "matrices](https://link.springer.com/book/10.1007/978-93-86279-04-0). A\n", + "Toeplitz matrix is a matrix in which each descending diagonal from\n", + "left to right is constant. 
For instance the last matrix, which we\n", + "rewrite as" + ] + }, + { + "cell_type": "markdown", + "id": "1e3cffca", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{A}=\\begin{bmatrix}a_0 & 0 & 0 \\\\\n", + " a_1 & a_0 & 0 \\\\\n", + "\t\t\t a_2 & a_1 & a_0 \\\\\n", + "\t\t\t a_3 & a_2 & a_1 \\\\\n", + "\t\t\t 0 & a_3 & a_2 \\\\\n", + "\t\t\t 0 & 0 & a_3\n", + "\t\t\t \\end{bmatrix},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "e27270d9", + "metadata": { + "editable": true + }, + "source": [ + "with elements $a_{ii}=a_{i+1,j+1}=a_{i-j}$ is an example of a Toeplitz\n", + "matrix. Such a matrix does not need to be a square matrix. Toeplitz\n", + "matrices are also closely connected with Fourier series, because the multiplication operator by a trigonometric\n", + "polynomial, compressed to a finite-dimensional space, can be\n", + "represented by such a matrix. The example above shows that we can\n", + "represent linear convolution as multiplication of a Toeplitz matrix by\n", + "a vector." + ] + }, + { + "cell_type": "markdown", + "id": "125ef645", + "metadata": { + "editable": true + }, + "source": [ + "## Fourier series and Toeplitz matrices\n", + "\n", + "This is an active and ogoing research area concerning CNNs. The following articles may be of interest\n", + "1. [Read more about the convolution theorem and Fouriers series](https://www.sciencedirect.com/topics/engineering/convolution-theorem#:~:text=The%20convolution%20theorem%20(together%20with,k%20)%20G%20(%20k%20)%20.)\n", + "\n", + "2. [Fourier Transform Layer](https://www.sciencedirect.com/science/article/pii/S1568494623006257)" + ] + }, + { + "cell_type": "markdown", + "id": "d13ab1e4", + "metadata": { + "editable": true + }, + "source": [ + "## Generalizing the above one-dimensional case\n", + "\n", + "In order to align the above simple case with the more general\n", + "convolution cases, we rename $\\boldsymbol{\\alpha}$, whose length is $m=3$,\n", + "with $\\boldsymbol{w}$. We will interpret $\\boldsymbol{w}$ as a weight/filter function\n", + "with which we want to perform the convolution with an input variable\n", + "$\\boldsymbol{x}$ of length $n$. We will assume always that the filter\n", + "$\\boldsymbol{w}$ has dimensionality $m \\le n$.\n", + "\n", + "We replace thus $\\boldsymbol{\\beta}$ with $\\boldsymbol{x}$ and $\\boldsymbol{\\delta}$ with $\\boldsymbol{y}$ and have" + ] + }, + { + "cell_type": "markdown", + "id": "b9eb4b1e", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y(i)= \\left(x*w\\right)(i)= \\sum_{k=0}^{k=m-1}w(k)x(i-k),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "bdf0893f", + "metadata": { + "editable": true + }, + "source": [ + "where $m=3$ in our case, the maximum length of the vector $\\boldsymbol{w}$.\n", + "Here the symbol $*$ represents the mathematical operation of convolution." + ] + }, + { + "cell_type": "markdown", + "id": "64cd5dbb", + "metadata": { + "editable": true + }, + "source": [ + "## Memory considerations\n", + "\n", + "This expression leaves us however with some terms with negative\n", + "indices, for example $x(-1)$ and $x(-2)$ which may not be defined. 
Our\n", + "vector $\\boldsymbol{x}$ has components $x(0)$, $x(1)$, $x(2)$ and $x(3)$.\n", + "\n", + "The index $j$ for $\\boldsymbol{x}$ runs from $j=0$ to $j=3$ since $\\boldsymbol{x}$ is meant to\n", + "represent a third-order polynomial.\n", + "\n", + "Furthermore, the index $i$ runs from $i=0$ to $i=5$ since $\\boldsymbol{y}$\n", + "contains the coefficients of a fifth-order polynomial. When $i=5$ we\n", + "may also have values of $x(4)$ and $x(5)$ which are not defined." + ] + }, + { + "cell_type": "markdown", + "id": "20fa0219", + "metadata": { + "editable": true + }, + "source": [ + "## Padding\n", + "\n", + "The solution to this is what is called **padding**! We simply define a\n", + "new vector $x$ with two added elements set to zero before $x(0)$ and\n", + "two new elements after $x(3)$ set to zero. That is, we augment the\n", + "length of $\\boldsymbol{x}$ from $n=4$ to $n+2P=8$, where $P=2$ is the padding\n", + "constant (a new hyperparameter), see discussions below as well." + ] + }, + { + "cell_type": "markdown", + "id": "d24c7e69", + "metadata": { + "editable": true + }, + "source": [ + "## New vector\n", + "\n", + "We have a new vector defined as $x(0)=0$, $x(1)=0$,\n", + "$x(2)=\\beta_0$, $x(3)=\\beta_1$, $x(4)=\\beta_2$, $x(5)=\\beta_3$,\n", + "$x(6)=0$, and $x(7)=0$.\n", + "\n", + "We have added four new elements, which\n", + "are all zero. The benefit is that we can rewrite the equation for\n", + "$\\boldsymbol{y}$, with $i=0,1,\\dots,5$," + ] + }, + { + "cell_type": "markdown", + "id": "c00151a8", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y(i) = \\sum_{k=0}^{k=m-1}w(k)x(i+(m-1)-k).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "c9b39bfd", + "metadata": { + "editable": true + }, + "source": [ + "As an example, we have" + ] + }, + { + "cell_type": "markdown", + "id": "53de5ac4", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y(4)=x(6)w(0)+x(5)w(1)+x(4)w(2)=0\\times \\alpha_0+\\beta_3\\alpha_1+\\beta_2\\alpha_2,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "e1025d77", + "metadata": { + "editable": true + }, + "source": [ + "as before except that we have an additional term $x(6)w(0)$, which is zero.\n", + "\n", + "Similarly, for the fifth-order term we have" + ] + }, + { + "cell_type": "markdown", + "id": "34a5a413", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y(5)=x(7)w(0)+x(6)w(1)+x(5)w(2)=0\\times \\alpha_0+0\\times\\alpha_1+\\beta_3\\alpha_2.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "5ef38242", + "metadata": { + "editable": true + }, + "source": [ + "The zeroth-order term is" + ] + }, + { + "cell_type": "markdown", + "id": "42a8bd2e", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y(0)=x(2)w(0)+x(1)w(1)+x(0)w(2)=\\beta_0 \\alpha_0+0\\times\\alpha_1+0\\times\\alpha_2=\\alpha_0\\beta_0.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "2580d624", + "metadata": { + "editable": true + }, + "source": [ + "## Rewriting as dot products\n", + "\n", + "If we now flip the filter/weight vector, with the following term as a typical example" + ] + }, + { + "cell_type": "markdown", + "id": "76157e3c", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y(0)=x(2)w(0)+x(1)w(1)+x(0)w(2)=x(2)\\tilde{w}(2)+x(1)\\tilde{w}(1)+x(0)\\tilde{w}(0),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "a47c0bbf", + "metadata": { + "editable": true + }, + "source": [ + "with $\\tilde{w}(0)=w(2)$, $\\tilde{w}(1)=w(1)$, and 
$\\tilde{w}(2)=w(0)$, we can then rewrite the above sum as a dot product of\n", + "$x(i:i+(m-1))\\tilde{w}$ for element $y(i)$, where $x(i:i+(m-1))$ is simply a patch of $\\boldsymbol{x}$ of size $m-1$.\n", + "\n", + "The padding $P$ we have introduced for the convolution stage is just\n", + "another hyperparameter which is introduced as part of the\n", + "architecture. Similarly, below we will also introduce another\n", + "hyperparameter called **Stride** $S$." + ] + }, + { + "cell_type": "markdown", + "id": "4de2c235", + "metadata": { + "editable": true + }, + "source": [ + "## Cross correlation\n", + "\n", + "In essentially all applications one uses what is called cross correlation instead of the standard convolution described above.\n", + "This means that multiplication is performed in the same direction and instead of the general expression we have discussed above (with infinite sums)" + ] + }, + { + "cell_type": "markdown", + "id": "33319954", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y(i) = \\sum_{k=-\\infty}^{k=\\infty}w(k)x(i-k),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "d46fb216", + "metadata": { + "editable": true + }, + "source": [ + "we have now" + ] + }, + { + "cell_type": "markdown", + "id": "1125a773", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "y(i) = \\sum_{k=-\\infty}^{k=\\infty}w(k)x(i+k).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "4e9ea645", + "metadata": { + "editable": true + }, + "source": [ + "Both TensorFlow and PyTorch (as well as our own code example below),\n", + "implement the last equation, although it is normally referred to as\n", + "convolution. The same padding rules and stride rules discussed below\n", + "apply to this expression as well.\n", + "\n", + "We leave it as an exercise for you to convince yourself that the example we have discussed till now, gives the same final expression using the last expression." + ] + }, + { + "cell_type": "markdown", + "id": "711fc589", + "metadata": { + "editable": true + }, + "source": [ + "## Two-dimensional objects\n", + "\n", + "We are now ready to start studying the discrete convolutions relevant for convolutional neural networks.\n", + "We often use convolutions over more than one dimension at a time. If\n", + "we have a two-dimensional image $X$ as input, we can have a **filter**\n", + "defined by a two-dimensional **kernel/weight/filter** $W$. 
This leads to an output $Y$" + ] + }, + { + "cell_type": "markdown", + "id": "ea93186d", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "Y(i,j)=(X * W)(i,j) = \\sum_m\\sum_n X(m,n)W(i-m,j-n).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "2ce72e4f", + "metadata": { + "editable": true + }, + "source": [ + "Convolution is a commutative process, which means we can rewrite this equation as" + ] + }, + { + "cell_type": "markdown", + "id": "7c891889", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "Y(i,j)=(X * W)(i,j) = \\sum_m\\sum_n X(i-m,j-n)W(m,n).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "337f70a6", + "metadata": { + "editable": true + }, + "source": [ + "Normally the latter is more straightforward to implement in a machine\n", + "larning library since there is less variation in the range of values\n", + "of $m$ and $n$.\n", + "\n", + "As mentioned above, most deep learning libraries implement\n", + "cross-correlation instead of convolution (although it is referred to as\n", + "convolution)" + ] + }, + { + "cell_type": "markdown", + "id": "aa0e3c87", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "Y(i,j)=(X * W)(i,j) = \\sum_m\\sum_n X(i+m,j+n)W(m,n).\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "77113b34", + "metadata": { + "editable": true + }, + "source": [ + "## CNNs in more detail, simple example\n", + "\n", + "Let assume we have an input matrix $X$ of dimensionality $3\\times 3$\n", + "and a $2\\times 2$ filter $W$ given by the following matrices" + ] + }, + { + "cell_type": "markdown", + "id": "d54278c7", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{X}=\\begin{bmatrix}x_{00} & x_{01} & x_{02} \\\\\n", + " x_{10} & x_{11} & x_{12} \\\\\n", + "\t x_{20} & x_{21} & x_{22} \\end{bmatrix},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "597d1ef3", + "metadata": { + "editable": true + }, + "source": [ + "and" + ] + }, + { + "cell_type": "markdown", + "id": "c544ba40", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{W}=\\begin{bmatrix}w_{00} & w_{01} \\\\\n", + "\t w_{10} & w_{11}\\end{bmatrix}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b6c1b40b", + "metadata": { + "editable": true + }, + "source": [ + "We introduce now the hyperparameter $S$ **stride**. Stride represents how the filter $W$ moves the convolution process on the matrix $X$.\n", + "We strongly recommend the repository on [Arithmetic of deep learning by Dumoulin and Visin](https://github.com/vdumoulin/conv_arithmetic) \n", + "\n", + "Here we set the stride equal to $S=1$, which means that, starting with the element $x_{00}$, the filter will act on $2\\times 2$ submatrices each time, starting with the upper corner and moving according to the stride value column by column. 
\n", + "\n", + "Here we perform the operation" + ] + }, + { + "cell_type": "markdown", + "id": "d8ee5cf0", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "Y_(i,j)=(X * W)(i,j) = \\sum_m\\sum_n X(i-m,j-n)W(m,n),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "5df35204", + "metadata": { + "editable": true + }, + "source": [ + "and obtain" + ] + }, + { + "cell_type": "markdown", + "id": "afe8a3ab", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{Y}=\\begin{bmatrix}x_{00}w_{00}+x_{01}w_{01}+x_{10}w_{10}+x_{11}w_{11} & x_{01}w_{00}+x_{02}w_{01}+x_{11}w_{10}+x_{12}w_{11} \\\\\n", + "\t x_{10}w_{00}+x_{11}w_{01}+x_{20}w_{10}+x_{21}w_{11} & x_{11}w_{00}+x_{12}w_{01}+x_{21}w_{10}+x_{22}w_{11}\\end{bmatrix}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "9a1c6848", + "metadata": { + "editable": true + }, + "source": [ + "We can rewrite this operation in terms of a matrix-vector multiplication by defining a new vector where we flatten out the inputs as a vector $\\boldsymbol{X}'$ of length $9$ and\n", + "a matrix $\\boldsymbol{W}'$ with dimension $4\\times 9$ as" + ] + }, + { + "cell_type": "markdown", + "id": "4506234a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{X}'=\\begin{bmatrix}x_{00} \\\\ x_{01} \\\\ x_{02} \\\\ x_{10} \\\\ x_{11} \\\\ x_{12} \\\\ x_{20} \\\\ x_{21} \\\\ x_{22} \\end{bmatrix},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "f1b2fef4", + "metadata": { + "editable": true + }, + "source": [ + "and the new matrix" + ] + }, + { + "cell_type": "markdown", + "id": "6c372fa6", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\boldsymbol{W}'=\\begin{bmatrix} w_{00} & w_{01} & 0 & w_{10} & w_{11} & 0 & 0 & 0 & 0 \\\\\n", + " 0 & w_{00} & w_{01} & 0 & w_{10} & w_{11} & 0 & 0 & 0 \\\\\n", + "\t\t\t0 & 0 & 0 & w_{00} & w_{01} & 0 & w_{10} & w_{11} & 0 \\\\\n", + " 0 & 0 & 0 & 0 & w_{00} & w_{01} & 0 & w_{10} & w_{11}\\end{bmatrix}.\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "61ad1cf3", + "metadata": { + "editable": true + }, + "source": [ + "We see easily that performing the matrix-vector multiplication $\\boldsymbol{W}'\\boldsymbol{X}'$ is the same as the above convolution with stride $S=1$, that is" + ] + }, + { + "cell_type": "markdown", + "id": "a18a70a2", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "Y=(\\boldsymbol{W}*\\boldsymbol{X}),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "b63a1613", + "metadata": { + "editable": true + }, + "source": [ + "is now given by $\\boldsymbol{W}'\\boldsymbol{X}'$ which is a vector of length $4$ instead of the originally resulting $2\\times 2$ output matrix." + ] + }, + { + "cell_type": "markdown", + "id": "8fa9fe57", + "metadata": { + "editable": true + }, + "source": [ + "## The convolution stage\n", + "\n", + "The convolution stage, where we apply different filters $\\boldsymbol{W}$ in\n", + "order to reduce the dimensionality of an image, adds, in addition to\n", + "the weights and biases (to be trained by the back propagation\n", + "algorithm) that define the filters, two new hyperparameters, the so-called\n", + "**padding** $P$ and the stride $S$." + ] + }, + { + "cell_type": "markdown", + "id": "a30b6ced", + "metadata": { + "editable": true + }, + "source": [ + "## Finding the number of parameters\n", + "\n", + "In the above example we have an input matrix of dimension $3\\times\n", + "3$. 
In general we call the input for an input volume and it is defined\n", + "by its width $H_1$, height $H_1$ and depth $D_1$. If we have the\n", + "standard three color channels $D_1=3$.\n", + "\n", + "The above example has $W_1=H_1=3$ and $D_1=1$.\n", + "\n", + "When we introduce the filter we have the following additional hyperparameters\n", + "1. $K$ the number of filters. It is common to perform the convolution of the input several times since by experience shrinking the input too fast does not work well\n", + "\n", + "2. $F$ as the filter's spatial extent\n", + "\n", + "3. $S$ as the stride parameter\n", + "\n", + "4. $P$ as the padding parameter\n", + "\n", + "These parameters are defined by the architecture of the network and are not included in the training." + ] + }, + { + "cell_type": "markdown", + "id": "b38d040f", + "metadata": { + "editable": true + }, + "source": [ + "## New image (or volume)\n", + "\n", + "Acting with the filter on the input volume produces an output volume\n", + "which is defined by its width $W_2$, its height $H_2$ and its depth\n", + "$D_2$.\n", + "\n", + "These are defined by the following relations" + ] + }, + { + "cell_type": "markdown", + "id": "3b090ce0", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "W_2 = \\frac{(W_1-F+2P)}{S}+1,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "52fa4212", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "H_2 = \\frac{(H_1-F+2P)}{S}+1,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "dfa9a926", + "metadata": { + "editable": true + }, + "source": [ + "and $D_2=K$." + ] + }, + { + "cell_type": "markdown", + "id": "9bb02c26", + "metadata": { + "editable": true + }, + "source": [ + "## Parameters to train, common settings\n", + "\n", + "With parameter sharing, the convolution involves thus for each filter $F\\times F\\times D_1$ weights plus one bias parameter.\n", + "\n", + "In total we have" + ] + }, + { + "cell_type": "markdown", + "id": "d98e6808", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\left(F\\times F\\times D_1)\\right) \\times K+(K\\mathrm{--biases}),\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "601ecd16", + "metadata": { + "editable": true + }, + "source": [ + "parameters to train by back propagation.\n", + "\n", + "It is common to let $K$ come in powers of $2$, that is $32$, $64$, $128$ etc.\n", + "\n", + "**Common settings.**\n", + "\n", + "1. $\\begin{array}{c} F=3 & S=1 & P=1 \\end{array}$\n", + "\n", + "2. $\\begin{array}{c} F=5 & S=1 & P=2 \\end{array}$\n", + "\n", + "3. $\\begin{array}{c} F=5 & S=2 & P=\\mathrm{open} \\end{array}$\n", + "\n", + "4. $\\begin{array}{c} F=1 & S=1 & P=0 \\end{array}$" + ] + }, + { + "cell_type": "markdown", + "id": "3f87e148", + "metadata": { + "editable": true + }, + "source": [ + "## Examples of CNN setups\n", + "\n", + "Let us assume we have an input volume $V$ given by an image of dimensionality\n", + "$32\\times 32 \\times 3$, that is three color channels and $32\\times 32$ pixels.\n", + "\n", + "We apply a filter of dimension $5\\times 5$ ten times with stride $S=1$ and padding $P=0$.\n", + "\n", + "The output volume is given by $(32-5)/1+1=28$, resulting in ten images\n", + "of dimensionality $28\\times 28\\times 3$.\n", + "\n", + "The total number of parameters to train for each filter is then\n", + "$5\\times 5\\times 3+1$, where the last parameter is the bias. 
This\n", + "gives us $76$ parameters for each filter, leading to a total of $760$\n", + "parameters for the ten filters.\n", + "\n", + "How many parameters will a filter of dimensionality $3\\times 3$\n", + "(adding color channels) result in if we produce $32$ new images? Use $S=1$ and $P=0$.\n", + "\n", + "Note that strides constitute a form of **subsampling**. As an alternative to\n", + "being interpreted as a measure of how much the kernel/filter is translated, strides\n", + "can also be viewed as how much of the output is retained. For instance, moving\n", + "the kernel by hops of two is equivalent to moving the kernel by hops of one but\n", + "retaining only odd output elements." + ] + }, + { + "cell_type": "markdown", + "id": "45526eae", + "metadata": { + "editable": true + }, + "source": [ + "## Summarizing: Performing a general discrete convolution ([From Raschka et al](https://github.com/rasbt/machine-learning-book))\n", + "\n", + "\n", + "\n", + "\n", + "

    Figure 1: A deep CNN

    \n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "963177d2", + "metadata": { + "editable": true + }, + "source": [ + "## Pooling\n", + "\n", + "In addition to discrete convolutions themselves, **pooling** operations\n", + "make up another important building block in CNNs. Pooling operations reduce\n", + "the size of feature maps by using some function to summarize subregions, such\n", + "as taking the average or the maximum value.\n", + "\n", + "Pooling works by sliding a window across the input and feeding the content of\n", + "the window to a **pooling function**. In some sense, pooling works very much\n", + "like a discrete convolution, but replaces the linear combination described by\n", + "the kernel with some other function." + ] + }, + { + "cell_type": "markdown", + "id": "f657465b", + "metadata": { + "editable": true + }, + "source": [ + "## Pooling arithmetic\n", + "\n", + "In a neural network, pooling layers provide invariance to small translations of\n", + "the input. The most common kind of pooling is **max pooling**, which\n", + "consists in splitting the input in (usually non-overlapping) patches and\n", + "outputting the maximum value of each patch. Other kinds of pooling exist, e.g.,\n", + "mean or average pooling, which all share the same idea of aggregating the input\n", + "locally by applying a non-linearity to the content of some patches." + ] + }, + { + "cell_type": "markdown", + "id": "33142d01", + "metadata": { + "editable": true + }, + "source": [ + "## Pooling types ([From Raschka et al](https://github.com/rasbt/machine-learning-book))\n", + "\n", + "\n", + "\n", + "\n", + "

    Figure 1: A deep CNN

    \n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "7e8ee265", + "metadata": { + "editable": true + }, + "source": [ + "## Building convolutional neural networks using Tensorflow and Keras\n", + "\n", + "As discussed above, CNNs are neural networks built from the assumption that the inputs\n", + "to the network are 2D images. This is important because the number of features or pixels in images\n", + "grows very fast with the image size, and an enormous number of weights and biases are needed in order to build an accurate network. \n", + "\n", + "As before, we still have our input, a hidden layer and an output. What's novel about convolutional networks\n", + "are the **convolutional** and **pooling** layers stacked in pairs between the input and the hidden layer.\n", + "In addition, the data is no longer represented as a 2D feature matrix, instead each input is a number of 2D\n", + "matrices, typically 1 for each color dimension (Red, Green, Blue)." + ] + }, + { + "cell_type": "markdown", + "id": "c4e2bc6f", + "metadata": { + "editable": true + }, + "source": [ + "## Setting it up\n", + "\n", + "It means that to represent the entire\n", + "dataset of images, we require a 4D matrix or **tensor**. This tensor has the dimensions:" + ] + }, + { + "cell_type": "markdown", + "id": "f8d6e5be", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "(n_{inputs},\\, n_{pixels, width},\\, n_{pixels, height},\\, depth) .\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "bd170ded", + "metadata": { + "editable": true + }, + "source": [ + "## The MNIST dataset again\n", + "\n", + "The MNIST dataset consists of grayscale images with a pixel size of\n", + "$28\\times 28$, meaning we require $28 \\times 28 = 724$ weights to each\n", + "neuron in the first hidden layer.\n", + "\n", + "If we were to analyze images of size $128\\times 128$ we would require\n", + "$128 \\times 128 = 16384$ weights to each neuron. Even worse if we were\n", + "dealing with color images, as most images are, we have an image matrix\n", + "of size $128\\times 128$ for each color dimension (Red, Green, Blue),\n", + "meaning 3 times the number of weights $= 49152$ are required for every\n", + "single neuron in the first hidden layer." + ] + }, + { + "cell_type": "markdown", + "id": "5f8a4322", + "metadata": { + "editable": true + }, + "source": [ + "## Strong correlations\n", + "\n", + "Images typically have strong local correlations, meaning that a small\n", + "part of the image varies little from its neighboring regions. If for\n", + "example we have an image of a blue car, we can roughly assume that a\n", + "small blue part of the image is surrounded by other blue regions.\n", + "\n", + "Therefore, instead of connecting every single pixel to a neuron in the\n", + "first hidden layer, as we have previously done with deep neural\n", + "networks, we can instead connect each neuron to a small part of the\n", + "image (in all 3 RGB depth dimensions). The size of each small area is\n", + "fixed, and known as a [receptive](https://en.wikipedia.org/wiki/Receptive_field)." + ] + }, + { + "cell_type": "markdown", + "id": "bad994c1", + "metadata": { + "editable": true + }, + "source": [ + "## Layers of a CNN\n", + "\n", + "The layers of a convolutional neural network arrange neurons in 3D: width, height and depth. \n", + "The input image is typically a square matrix of depth 3. \n", + "\n", + "A **convolution** is performed on the image which outputs\n", + "a 3D volume of neurons. 
The weights to the input are arranged in a number of 2D matrices, known as **filters**.\n", + "\n", + "Each filter slides along the input image, taking the dot product\n", + "between each small part of the image and the filter, in all depth\n", + "dimensions. This is then passed through a non-linear function,\n", + "typically the **Rectified Linear (ReLu)** function, which serves as the\n", + "activation of the neurons in the first convolutional layer. This is\n", + "further passed through a **pooling layer**, which reduces the size of the\n", + "convolutional layer, e.g. by taking the maximum or average across some\n", + "small regions, and this serves as input to the next convolutional\n", + "layer." + ] + }, + { + "cell_type": "markdown", + "id": "3f9bf131", + "metadata": { + "editable": true + }, + "source": [ + "## Systematic reduction\n", + "\n", + "By systematically reducing the size of the input volume, through\n", + "convolution and pooling, the network should create representations of\n", + "small parts of the input, and then from them assemble representations\n", + "of larger areas. The final pooling layer is flattened to serve as\n", + "input to a hidden layer, such that each neuron in the final pooling\n", + "layer is connected to every single neuron in the hidden layer. This\n", + "then serves as input to the output layer, e.g. a softmax output for\n", + "classification." + ] + }, + { + "cell_type": "markdown", + "id": "625ace40", + "metadata": { + "editable": true + }, + "source": [ + "## Prerequisites: Collect and pre-process data" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "a3f06a64", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "%matplotlib inline\n", + "\n", + "# import necessary packages\n", + "import numpy as np\n", + "import matplotlib.pyplot as plt\n", + "from sklearn import datasets\n", + "\n", + "\n", + "# ensure the same random numbers appear every time\n", + "np.random.seed(0)\n", + "\n", + "# display images in notebook\n", + "%matplotlib inline\n", + "plt.rcParams['figure.figsize'] = (12,12)\n", + "\n", + "\n", + "# download MNIST dataset\n", + "digits = datasets.load_digits()\n", + "\n", + "# define inputs and labels\n", + "inputs = digits.images\n", + "labels = digits.target\n", + "\n", + "# RGB images have a depth of 3\n", + "# our images are grayscale so they should have a depth of 1\n", + "inputs = inputs[:,:,:,np.newaxis]\n", + "\n", + "print(\"inputs = (n_inputs, pixel_width, pixel_height, depth) = \" + str(inputs.shape))\n", + "print(\"labels = (n_inputs) = \" + str(labels.shape))\n", + "\n", + "\n", + "# choose some random images to display\n", + "n_inputs = len(inputs)\n", + "indices = np.arange(n_inputs)\n", + "random_indices = np.random.choice(indices, size=5)\n", + "\n", + "for i, image in enumerate(digits.images[random_indices]):\n", + " plt.subplot(1, 5, i+1)\n", + " plt.axis('off')\n", + " plt.imshow(image, cmap=plt.cm.gray_r, interpolation='nearest')\n", + " plt.title(\"Label: %d\" % digits.target[random_indices[i]])\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "764e7143", + "metadata": { + "editable": true + }, + "source": [ + "## Importing Keras and Tensorflow" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "1b8fd15a", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "from tensorflow.keras import datasets, layers, models\n", + "from tensorflow.keras.layers import Input\n", + "from 
tensorflow.keras.models import Sequential #This allows appending layers to existing models\n", + "from tensorflow.keras.layers import Dense #This allows defining the characteristics of a particular layer\n", + "from tensorflow.keras import optimizers #This allows using whichever optimiser we want (sgd,adam,RMSprop)\n", + "from tensorflow.keras import regularizers #This allows using whichever regularizer we want (l1,l2,l1_l2)\n", + "from tensorflow.keras.utils import to_categorical #This allows using categorical cross entropy as the cost function\n", + "#from tensorflow.keras import Conv2D\n", + "#from tensorflow.keras import MaxPooling2D\n", + "#from tensorflow.keras import Flatten\n", + "\n", + "from sklearn.model_selection import train_test_split\n", + "\n", + "# representation of labels\n", + "labels = to_categorical(labels)\n", + "\n", + "# split into train and test data\n", + "# one-liner from scikit-learn library\n", + "train_size = 0.8\n", + "test_size = 1 - train_size\n", + "X_train, X_test, Y_train, Y_test = train_test_split(inputs, labels, train_size=train_size,\n", + " test_size=test_size)" + ] + }, + { + "cell_type": "markdown", + "id": "bf68c3f4", + "metadata": { + "editable": true + }, + "source": [ + "## Running with Keras" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "d5a91d0e", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "def create_convolutional_neural_network_keras(input_shape, receptive_field,\n", + " n_filters, n_neurons_connected, n_categories,\n", + " eta, lmbd):\n", + " model = Sequential()\n", + " model.add(layers.Conv2D(n_filters, (receptive_field, receptive_field), input_shape=input_shape, padding='same',\n", + " activation='relu', kernel_regularizer=regularizers.l2(lmbd)))\n", + " model.add(layers.MaxPooling2D(pool_size=(2, 2)))\n", + " model.add(layers.Flatten())\n", + " model.add(layers.Dense(n_neurons_connected, activation='relu', kernel_regularizer=regularizers.l2(lmbd)))\n", + " model.add(layers.Dense(n_categories, activation='softmax', kernel_regularizer=regularizers.l2(lmbd)))\n", + " \n", + " sgd = optimizers.SGD(lr=eta)\n", + " model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])\n", + " \n", + " return model\n", + "\n", + "epochs = 100\n", + "batch_size = 100\n", + "input_shape = X_train.shape[1:4]\n", + "receptive_field = 3\n", + "n_filters = 10\n", + "n_neurons_connected = 50\n", + "n_categories = 10\n", + "\n", + "eta_vals = np.logspace(-5, 1, 7)\n", + "lmbd_vals = np.logspace(-5, 1, 7)" + ] + }, + { + "cell_type": "markdown", + "id": "8ff4d34b", + "metadata": { + "editable": true + }, + "source": [ + "## Final part" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "c1035646", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "CNN_keras = np.zeros((len(eta_vals), len(lmbd_vals)), dtype=object)\n", + " \n", + "for i, eta in enumerate(eta_vals):\n", + " for j, lmbd in enumerate(lmbd_vals):\n", + " CNN = create_convolutional_neural_network_keras(input_shape, receptive_field,\n", + " n_filters, n_neurons_connected, n_categories,\n", + " eta, lmbd)\n", + " CNN.fit(X_train, Y_train, epochs=epochs, batch_size=batch_size, verbose=0)\n", + " scores = CNN.evaluate(X_test, Y_test)\n", + " \n", + " CNN_keras[i][j] = CNN\n", + " \n", + " print(\"Learning rate = \", eta)\n", + " print(\"Lambda = \", lmbd)\n", + " print(\"Test accuracy: %.3f\" % scores[1])\n", + " print()" + ] + }, + { + 
"cell_type": "markdown", + "id": "dcdee4b4", + "metadata": { + "editable": true + }, + "source": [ + "## Final visualization" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "c34c4218", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# visual representation of grid search\n", + "# uses seaborn heatmap, could probably do this in matplotlib\n", + "import seaborn as sns\n", + "\n", + "sns.set()\n", + "\n", + "train_accuracy = np.zeros((len(eta_vals), len(lmbd_vals)))\n", + "test_accuracy = np.zeros((len(eta_vals), len(lmbd_vals)))\n", + "\n", + "for i in range(len(eta_vals)):\n", + " for j in range(len(lmbd_vals)):\n", + " CNN = CNN_keras[i][j]\n", + "\n", + " train_accuracy[i][j] = CNN.evaluate(X_train, Y_train)[1]\n", + " test_accuracy[i][j] = CNN.evaluate(X_test, Y_test)[1]\n", + "\n", + " \n", + "fig, ax = plt.subplots(figsize = (10, 10))\n", + "sns.heatmap(train_accuracy, annot=True, ax=ax, cmap=\"viridis\")\n", + "ax.set_title(\"Training Accuracy\")\n", + "ax.set_ylabel(\"$\\eta$\")\n", + "ax.set_xlabel(\"$\\lambda$\")\n", + "plt.show()\n", + "\n", + "fig, ax = plt.subplots(figsize = (10, 10))\n", + "sns.heatmap(test_accuracy, annot=True, ax=ax, cmap=\"viridis\")\n", + "ax.set_title(\"Test Accuracy\")\n", + "ax.set_ylabel(\"$\\eta$\")\n", + "ax.set_xlabel(\"$\\lambda$\")\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "9848777f", + "metadata": { + "editable": true + }, + "source": [ + "## The CIFAR01 data set\n", + "\n", + "The CIFAR10 dataset contains 60,000 color images in 10 classes, with\n", + "6,000 images in each class. The dataset is divided into 50,000\n", + "training images and 10,000 testing images. The classes are mutually\n", + "exclusive and there is no overlap between them." + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "e3c34685", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import tensorflow as tf\n", + "\n", + "from tensorflow.keras import datasets, layers, models\n", + "import matplotlib.pyplot as plt\n", + "\n", + "# We import the data set\n", + "(train_images, train_labels), (test_images, test_labels) = datasets.cifar10.load_data()\n", + "\n", + "# Normalize pixel values to be between 0 and 1 by dividing by 255. \n", + "train_images, test_images = train_images / 255.0, test_images / 255.0" + ] + }, + { + "cell_type": "markdown", + "id": "376a2959", + "metadata": { + "editable": true + }, + "source": [ + "## Verifying the data set\n", + "\n", + "To verify that the dataset looks correct, let's plot the first 25 images from the training set and display the class name below each image." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "fa4b303c", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer',\n", + " 'dog', 'frog', 'horse', 'ship', 'truck']\n", + "plt.figure(figsize=(10,10))\n", + "for i in range(25):\n", + " plt.subplot(5,5,i+1)\n", + " plt.xticks([])\n", + " plt.yticks([])\n", + " plt.grid(False)\n", + " plt.imshow(train_images[i], cmap=plt.cm.binary)\n", + " # The CIFAR labels happen to be arrays, \n", + " # which is why you need the extra index\n", + " plt.xlabel(class_names[train_labels[i][0]])\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "8f717ab7", + "metadata": { + "editable": true + }, + "source": [ + "## Set up the model\n", + "\n", + "The 6 lines of code below define the convolutional base using a common pattern: a stack of Conv2D and MaxPooling2D layers.\n", + "\n", + "As input, a CNN takes tensors of shape (image_height, image_width, color_channels), ignoring the batch size. If you are new to these dimensions, color_channels refers to (R,G,B). In this example, you will configure our CNN to process inputs of shape (32, 32, 3), which is the format of CIFAR images. You can do this by passing the argument input_shape to our first layer." + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "91013222", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "model = models.Sequential()\n", + "model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)))\n", + "model.add(layers.MaxPooling2D((2, 2)))\n", + "model.add(layers.Conv2D(64, (3, 3), activation='relu'))\n", + "model.add(layers.MaxPooling2D((2, 2)))\n", + "model.add(layers.Conv2D(64, (3, 3), activation='relu'))\n", + "\n", + "# Let's display the architecture of our model so far.\n", + "\n", + "model.summary()" + ] + }, + { + "cell_type": "markdown", + "id": "64f3581b", + "metadata": { + "editable": true + }, + "source": [ + "You can see that the output of every Conv2D and MaxPooling2D layer is a 3D tensor of shape (height, width, channels). The width and height dimensions tend to shrink as you go deeper in the network. The number of output channels for each Conv2D layer is controlled by the first argument (e.g., 32 or 64). Typically, as the width and height shrink, you can afford (computationally) to add more output channels in each Conv2D layer." + ] + }, + { + "cell_type": "markdown", + "id": "07774fd6", + "metadata": { + "editable": true + }, + "source": [ + "## Add Dense layers on top\n", + "\n", + "To complete our model, you will feed the last output tensor from the\n", + "convolutional base (of shape (4, 4, 64)) into one or more Dense layers\n", + "to perform classification. Dense layers take vectors as input (which\n", + "are 1D), while the current output is a 3D tensor. First, you will\n", + "flatten (or unroll) the 3D output to 1D, then add one or more Dense\n", + "layers on top. CIFAR has 10 output classes, so you use a final Dense\n", + "layer with 10 outputs and a softmax activation." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 9, + "id": "a6dc1206", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "model.add(layers.Flatten())\n", + "model.add(layers.Dense(64, activation='relu'))\n", + "model.add(layers.Dense(10))\n", + "Here's the complete architecture of our model.\n", + "\n", + "model.summary()" + ] + }, + { + "cell_type": "markdown", + "id": "71ef5715", + "metadata": { + "editable": true + }, + "source": [ + "As you can see, our (4, 4, 64) outputs were flattened into vectors of shape (1024) before going through two Dense layers." + ] + }, + { + "cell_type": "markdown", + "id": "596eaf51", + "metadata": { + "editable": true + }, + "source": [ + "## Compile and train the model" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "id": "1c8159af", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "model.compile(optimizer='adam',\n", + " loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),\n", + " metrics=['accuracy'])\n", + "\n", + "history = model.fit(train_images, train_labels, epochs=10, \n", + " validation_data=(test_images, test_labels))" + ] + }, + { + "cell_type": "markdown", + "id": "23913f02", + "metadata": { + "editable": true + }, + "source": [ + "## Finally, evaluate the model" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "id": "942cf136", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "plt.plot(history.history['accuracy'], label='accuracy')\n", + "plt.plot(history.history['val_accuracy'], label = 'val_accuracy')\n", + "plt.xlabel('Epoch')\n", + "plt.ylabel('Accuracy')\n", + "plt.ylim([0.5, 1])\n", + "plt.legend(loc='lower right')\n", + "\n", + "test_loss, test_acc = model.evaluate(test_images, test_labels, verbose=2)\n", + "\n", + "print(test_acc)" + ] + }, + { + "cell_type": "markdown", + "id": "9cf8f35b", + "metadata": { + "editable": true + }, + "source": [ + "## Building code using Pytorch\n", + "\n", + "This code loads and normalizes the MNIST dataset. Thereafter it defines a CNN architecture with:\n", + "1. Two convolutional layers\n", + "\n", + "2. Max pooling\n", + "\n", + "3. Dropout for regularization\n", + "\n", + "4. Two fully connected layers\n", + "\n", + "It uses the Adam optimizer and for cost function it employs the\n", + "Cross-Entropy function. It trains for 10 epochs.\n", + "You can modify the architecture (number of layers, channels, dropout\n", + "rate) or training parameters (learning rate, batch size, epochs) to\n", + "experiment with different configurations." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 12, + "id": "3f08edcf", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "import torch\n", + "import torch.nn as nn\n", + "import torch.nn.functional as F\n", + "import torch.optim as optim\n", + "from torchvision import datasets, transforms\n", + "\n", + "# Set device\n", + "device = torch.device(\"cuda\" if torch.cuda.is_available() else \"cpu\")\n", + "\n", + "# Define transforms\n", + "transform = transforms.Compose([\n", + " transforms.ToTensor(),\n", + " transforms.Normalize((0.1307,), (0.3081,))\n", + "])\n", + "\n", + "# Load datasets\n", + "train_dataset = datasets.MNIST(root='./data', train=True, download=True, transform=transform)\n", + "test_dataset = datasets.MNIST(root='./data', train=False, download=True, transform=transform)\n", + "\n", + "# Create data loaders\n", + "train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=64, shuffle=True)\n", + "test_loader = torch.utils.data.DataLoader(test_dataset, batch_size=64, shuffle=False)\n", + "\n", + "# Define CNN model\n", + "class CNN(nn.Module):\n", + " def __init__(self):\n", + " super(CNN, self).__init__()\n", + " self.conv1 = nn.Conv2d(1, 32, 3, padding=1)\n", + " self.conv2 = nn.Conv2d(32, 64, 3, padding=1)\n", + " self.pool = nn.MaxPool2d(2, 2)\n", + " self.fc1 = nn.Linear(64*7*7, 1024)\n", + " self.fc2 = nn.Linear(1024, 10)\n", + " self.dropout = nn.Dropout(0.5)\n", + "\n", + " def forward(self, x):\n", + " x = self.pool(F.relu(self.conv1(x)))\n", + " x = self.pool(F.relu(self.conv2(x)))\n", + " x = x.view(-1, 64*7*7)\n", + " x = self.dropout(F.relu(self.fc1(x)))\n", + " x = self.fc2(x)\n", + " return x\n", + "\n", + "# Initialize model, loss function, and optimizer\n", + "model = CNN().to(device)\n", + "criterion = nn.CrossEntropyLoss()\n", + "optimizer = optim.Adam(model.parameters(), lr=0.001)\n", + "\n", + "# Training loop\n", + "num_epochs = 10\n", + "for epoch in range(num_epochs):\n", + " model.train()\n", + " running_loss = 0.0\n", + " for batch_idx, (data, target) in enumerate(train_loader):\n", + " data, target = data.to(device), target.to(device)\n", + " optimizer.zero_grad()\n", + " outputs = model(data)\n", + " loss = criterion(outputs, target)\n", + " loss.backward()\n", + " optimizer.step()\n", + " running_loss += loss.item()\n", + "\n", + " print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {running_loss/len(train_loader):.4f}')\n", + "\n", + "# Testing the model\n", + "model.eval()\n", + "correct = 0\n", + "total = 0\n", + "with torch.no_grad():\n", + " for data, target in test_loader:\n", + " data, target = data.to(device), target.to(device)\n", + " outputs = model(data)\n", + " _, predicted = torch.max(outputs.data, 1)\n", + " total += target.size(0)\n", + " correct += (predicted == target).sum().item()\n", + "\n", + "print(f'Test Accuracy: {100 * correct / total:.2f}%')" + ] + } + ], + "metadata": {}, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/doc/Programs/Regression/binary_results.csv b/doc/Programs/Regression/binary_results.csv new file mode 100644 index 000000000..1a5f8e043 --- /dev/null +++ b/doc/Programs/Regression/binary_results.csv @@ -0,0 +1,201 @@ +TrueLabel,PredictedLabel +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 
+0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,0 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 diff --git a/doc/Programs/Regression/multiclass_results.csv b/doc/Programs/Regression/multiclass_results.csv new file mode 100644 index 000000000..9ffbe5203 --- /dev/null +++ b/doc/Programs/Regression/multiclass_results.csv @@ -0,0 +1,301 @@ +TrueLabel,PredictedLabel +0,0 +0,1 +0,0 +0,0 +0,1 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,1 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,1 +0,0 +0,2 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +0,0 +1,1 +1,1 +1,0 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,0 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,0 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,1 +1,0 +1,1 +1,1 +1,0 +1,1 +1,1 +1,1 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 +2,2 diff --git a/doc/Projects/2025/Project2/html/._Project2-bs000.html b/doc/Projects/2025/Project2/html/._Project2-bs000.html new file mode 100644 index 000000000..f84073612 --- /dev/null +++ b/doc/Projects/2025/Project2/html/._Project2-bs000.html @@ -0,0 +1,640 @@ + + + + + + + +Project 2 on Machine Learning, deadline November 10 (Midnight) + + + + + + + + + + + + + + + + + + + + +
    Project 2 on Machine Learning, deadline November 10 (Midnight)

    +
    + + +
    +Data Analysis and Machine Learning FYS-STK3155/FYS4155 +
    + +
    +University of Oslo, Norway +
    +
    +
    +

    October 14, 2025

    +
    +
    + + +
    +

    Deliverables

    + +

First, join a group in canvas with your group partners. Pick an available group for Project 2 on the People page.

    + +

    In canvas, deliver as a group and include:

    + +
      +
    • A PDF of your report which follows the guidelines covered below and in the week 39 exercises. Additional requirements include:
    • +
        +
• It should be around 5000 words (use the word counter in Overleaf for this), which often corresponds to 10-12 pages. References and appendices are excluded from the word count
      • +
      • It should include around 10-15 figures. You can include more figures in appendices and/or as supplemental material in your repository.
      • +
      +
    • A comment linking to your github repository (or folder in one of your github repositories) for this project. The repository must include
    • +
    +

• A PDF file of the report

    +
      +
    • A folder named Code, where you put python files for your functions and notebooks for reproducing your results. Remember to use a seed for generating random data and for train-test splits when generating final results.
    • +
• A README file with the names of the group members
    • +
    • a short description of the project
    • +
• a description of how to install the packages required to run your code, from a requirements.txt file or similar (such as a plain text description), and names and descriptions of the various notebooks in the Code folder together with the results they produce
    • +
    +

    Preamble: Note on writing reports, using reference material, AI and other tools

    + +

    We want you to answer the three different projects by handing in +reports written like a standard scientific/technical report. The links +at +https://github.com/CompPhysics/MachineLearning/tree/master/doc/Projects +contain more information. There you can find examples of previous +reports, the projects themselves, how we grade reports etc. How to +write reports will also be discussed during the various lab +sessions. Please do ask us if you are in doubt. +

    + +

When using codes and material from other sources, you should refer to these in the bibliography of your report, indicating where you got the code from, whether this is from the lecture notes, software libraries like Scikit-Learn, TensorFlow, PyTorch or other sources. These sources should always be cited correctly. How to cite some of the libraries is often indicated on their corresponding GitHub sites or websites; see for example how to cite Scikit-Learn at https://scikit-learn.org/dev/about.html.

    + +

We encourage you to use tools like ChatGPT or similar in writing the report. If you use for example ChatGPT, please do cite it properly and include (if possible) your questions and answers as an addition to the report. This can be uploaded to, for example, your website, GitHub/GitLab or similar as supplemental material.

    + +

    If you would like to study other data sets, feel free to propose other +sets. What we have proposed here are mere suggestions from our +side. If you opt for another data set, consider using a set which has +been studied in the scientific literature. This makes it easier for +you to compare and analyze your results. Comparing with existing +results from the scientific literature is also an essential element of +the scientific discussion. The University of California at Irvine with +its Machine Learning repository at +https://archive.ics.uci.edu/ml/index.php is an excellent site to look +up for examples and inspiration. Kaggle.com is an equally interesting +site. Feel free to explore these sites. +

    +

    Classification and Regression, writing our own neural network code

    + +

    The main aim of this project is to study both classification and +regression problems by developing our own +feed-forward neural network (FFNN) code. The exercises from week 41 and 42 (see https://compphysics.github.io/MachineLearning/doc/LectureNotes/_build/html/exercisesweek41.html and https://compphysics.github.io/MachineLearning/doc/LectureNotes/_build/html/exercisesweek42.html) as well as the lecture material from the same weeks (see https://compphysics.github.io/MachineLearning/doc/LectureNotes/_build/html/week41.html and https://compphysics.github.io/MachineLearning/doc/LectureNotes/_build/html/week42.html) should contain enough information for you to get started with writing your own code. +

    + +

    We will also reuse our codes on gradient descent methods from project 1.

    + +

    The data sets that we propose here are (the default sets)

    + +
      +
    • Regression (fitting a continuous function). In this part you will need to bring back your results from project 1 and compare these with what you get from your Neural Network code to be developed here. The data sets could be
    • +
        +
• The simple one-dimensional Runge function from project 1, that is \( f(x) = \frac{1}{1+25x^2} \). We recommend using a simpler function when developing your neural network code for regression problems. Feel free, however, to discuss and study other functions, such as the two-dimensional Runge function \( f(x,y)=\left[(10x - 5)^2 + (10y - 5)^2 + 1 \right]^{-1} \), or even more complicated two-dimensional functions (see the supplementary material of https://www.nature.com/articles/s41467-025-61362-4 for an extensive list of two-dimensional functions).
      • +
      +
    • Classification.
    • + +
    +

    We will start with a regression problem and we will reuse our codes on gradient descent methods from project 1.

    +

    Part a): Analytical warm-up

    + +

    When using our gradient machinery from project 1, we will need the expressions for the cost/loss functions and their respective +gradients. The functions whose gradients we need are: +

    +
      +
    1. The mean-squared error (MSE) with and without the \( L_1 \) and \( L_2 \) norms (regression problems)
    2. +
    3. The binary cross entropy (aka log loss) for binary classification problems with and without \( L_1 \) and \( L_2 \) norms
    4. +
    5. The multiclass cross entropy cost/loss function (aka Softmax cross entropy or just Softmax loss function)
    6. +
    +

    Set up these three cost/loss functions and their respective derivatives and explain the various terms. In this project you will however only use the MSE and the Softmax cross entropy.
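To make this concrete, here is a minimal NumPy sketch of the MSE and the Softmax cross entropy together with their gradients with respect to the model outputs. The function names and the 1/n normalization are our own choices for illustration, not a required interface.

import numpy as np

def mse(y_pred, y_true):
    # mean-squared error, C = (1/n) sum_i (y_pred_i - t_i)^2
    return np.mean((y_pred - y_true) ** 2)

def mse_gradient(y_pred, y_true):
    # dC/dy_pred = 2 (y_pred - t) / n
    return 2.0 * (y_pred - y_true) / y_true.shape[0]

def softmax(z):
    # z has shape (n_samples, n_classes); subtract the max for numerical stability
    e = np.exp(z - np.max(z, axis=1, keepdims=True))
    return e / np.sum(e, axis=1, keepdims=True)

def softmax_cross_entropy(z, onehot_targets):
    p = softmax(z)
    return -np.mean(np.sum(onehot_targets * np.log(p + 1e-12), axis=1))

def softmax_cross_entropy_gradient(z, onehot_targets):
    # the gradient with respect to the logits z simplifies to (softmax(z) - targets) / n
    return (softmax(z) - onehot_targets) / z.shape[0]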

    + +

    We will test three activation functions for our neural network setup, these are the

    +
      +
    1. The Sigmoid (aka logit) function,
    2. +
    3. the RELU function and
    4. +
    5. the Leaky RELU function
    6. +
    +

    Set up their expressions and their first derivatives. +You may consult the lecture notes (with codes and more) from week 42 at https://compphysics.github.io/MachineLearning/doc/LectureNotes/_build/html/week42.html. +
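As a small hint, the three activation functions and their first derivatives can be written compactly in NumPy. This is only a sketch, and the leak parameter alpha=0.01 below is an arbitrary choice that you are free to tune.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_derivative(z):
    s = sigmoid(z)
    return s * (1.0 - s)

def relu(z):
    return np.where(z > 0, z, 0.0)

def relu_derivative(z):
    # the derivative at z = 0 is set to 0 by convention
    return np.where(z > 0, 1.0, 0.0)

def leaky_relu(z, alpha=0.01):
    return np.where(z > 0, z, alpha * z)

def leaky_relu_derivative(z, alpha=0.01):
    return np.where(z > 0, 1.0, alpha)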

    +

    Reminder about the gradient machinery from project 1

    + +

    In the setup of a neural network code you will need your gradient descent codes from +project 1. For neural networks we will recommend using stochastic +gradient descent with either the RMSprop or the ADAM algorithms for +updating the learning rates. But you should feel free to try plain gradient descent as well. +
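If you did not already organize your project 1 code this way, a single ADAM update can be sketched as below. This is only an illustration with our own parameter names; t is the update counter and starts at 1.

import numpy as np

def adam_update(theta, grad, m, v, t, eta=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    # m and v are running estimates of the first and second moments of the gradient
    m = beta1 * m + (1.0 - beta1) * grad
    v = beta2 * v + (1.0 - beta2) * grad**2
    m_hat = m / (1.0 - beta1**t)      # bias correction
    v_hat = v / (1.0 - beta2**t)
    theta = theta - eta * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v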

    + +

We recommend reading chapter 8 on optimization from the textbook of Goodfellow, Bengio and Courville at https://www.deeplearningbook.org/. This chapter contains many useful insights and discussions on the optimization part of machine learning. A useful reference on the back propagation algorithm is Nielsen's book at http://neuralnetworksanddeeplearning.com/.

    + +

You will find the Python Seaborn package useful when plotting the results as functions of the learning rate \( \eta \) and the hyper-parameter \( \lambda \).

    +

    Part b): Writing your own Neural Network code

    + +

    Your aim now, and this is the central part of this project, is to +write your own FFNN code implementing the back +propagation algorithm discussed in the lecture slides from week 41 at https://compphysics.github.io/MachineLearning/doc/LectureNotes/_build/html/week41.html and week 42 at https://compphysics.github.io/MachineLearning/doc/LectureNotes/_build/html/week42.html. +

    + +

    We will focus on a regression problem first, using the one-dimensional Runge function

    +$$ +f(x) = \frac{1}{1+25x^2}, +$$ + +

    from project 1.

    + +

    Use only the mean-squared error as cost function (no regularization terms) and +write an FFNN code for a regression problem with a flexible number of hidden +layers and nodes using only the Sigmoid function as activation function for +the hidden layers. Initialize the weights using a normal +distribution. How would you initialize the biases? And which +activation function would you select for the final output layer? +And how would you set up your design/feature matrix? Hint: does it have to represent a polynomial approximation as you did in project 1? +
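A minimal sketch of how such a network could be initialized and evaluated is shown below. It assumes an input X of shape (n_samples, n_features); the zero biases and the linear output layer are common choices for regression, but you should motivate (or replace) them yourself.

import numpy as np
rng = np.random.default_rng(2025)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def initialize_layers(sizes):
    # sizes = [n_features, n_hidden_1, ..., n_outputs]
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        W = rng.normal(0.0, 1.0, size=(n_in, n_out))   # normally distributed weights
        b = np.zeros(n_out)                            # one common choice for the biases
        layers.append((W, b))
    return layers

def feed_forward(X, layers):
    a = X
    for W, b in layers[:-1]:
        a = sigmoid(a @ W + b)        # Sigmoid activation in the hidden layers
    W, b = layers[-1]
    return a @ W + b                  # linear output layer for regression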

    + +

    Train your network and compare the results with those from your OLS +regression code from project 1 using the one-dimensional Runge +function. When comparing your neural network code with the OLS +results from project 1, use the same data sets which gave you the best +MSE score. Moreover, use the polynomial order from project 1 that gave you the +best result. Compare these results with your neural network with one +and two hidden layers using \( 50 \) and \( 100 \) hidden nodes, respectively. +

    + +

    Comment your results and give a critical discussion of the results +obtained with the OLS code from project 1 and your own neural network +code. Make an analysis of the learning rates employed to find the +optimal MSE score. Test both stochastic gradient descent +with RMSprop and ADAM and plain gradient descent with different +learning rates. +

    + +

    You should, as you did in project 1, scale your data.

    +

    Part c): Testing against other software libraries

    + +

    You should test your results against a similar code using Scikit-Learn (see the examples in the above lecture notes from weeks 41 and 42) or tensorflow/keras or Pytorch (for Pytorch, see Raschka et al.'s text chapters 12 and 13).

    + +

    Furthermore, you should also test that your derivatives are correctly +calculated using automatic differentiation, using for example the +Autograd library or the JAX library. It is optional to implement +these libraries for the present project. In this project they serve as +useful tests of our derivatives. +
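As an illustration of such a test, here is a small check with Autograd on a plain linear model rather than your full network; the variable names are ours, and the same idea carries over to the gradients of your network.

import autograd.numpy as np
from autograd import grad

def mse_loss(w, X, y):
    residual = np.dot(X, w) - y
    return np.mean(residual ** 2)

# let Autograd differentiate the loss with respect to the first argument (the weights)
auto_gradient = grad(mse_loss, 0)

# compare with the analytical gradient 2 X^T (X w - y) / n on random data
X = np.random.randn(10, 3)
y = np.random.randn(10)
w = np.random.randn(3)
print(np.allclose(auto_gradient(w, X, y), 2 * np.dot(X.T, np.dot(X, w) - y) / len(y)))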

    +

    Part d): Testing different activation functions and depths of the neural network

    + +

    You should also test different activation functions for the hidden +layers. Try out the Sigmoid, the RELU and the Leaky RELU functions and +discuss your results. Test your results as functions of the number of hidden layers and nodes. Do you see signs of overfitting? +It is optional in this project to perform a bias-variance trade-off analysis. +

    +

    Part e): Testing different norms

    + +

    Finally, still using the one-dimensional Runge function, add now the +hyperparameters \( \lambda \) with the \( L_2 \) and \( L_1 \) norms. Find the +optimal results for the hyperparameters \( \lambda \) and the learning +rates \( \eta \) and neural network architecture and compare the \( L_2 \) results with Ridge regression from +project 1 and the \( L_1 \) results with the Lasso calculations of project 1. +Use again the same data sets and the best results from project 1 in your comparisons. +
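Note that the penalty terms only modify the gradients in a simple way. With the convention that the penalty is \( \lambda \) times the squared \( L_2 \) norm (or the \( L_1 \) norm) of the weights, the extra contributions can be sketched as below; conventions with an additional factor 1/2 are also common, so state yours clearly.

import numpy as np

def l2_penalty_gradient(W, lmbd):
    # gradient contribution from the penalty lmbd * ||W||_2^2
    return 2.0 * lmbd * W

def l1_penalty_gradient(W, lmbd):
    # (sub)gradient contribution from the penalty lmbd * ||W||_1
    return lmbd * np.sign(W)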

    +

    Part f): Classification analysis using neural networks

    + +

    With a well-written code it should now be easy to change the +activation function for the output layer. +

    + +

    Here we will change the cost function for our neural network code +developed in parts b), d) and e) in order to perform a classification +analysis. The classification problem we will study is the multiclass +MNIST problem, see the description of the full data set at +https://www.kaggle.com/datasets/hojjatk/mnist-dataset. We will use the Softmax cross entropy function discussed in a). +The MNIST data set discussed in the lecture notes from week 42 is a downscaled variant of the full dataset. +

    + +

    Feel free to suggest other data sets. If you find the classic MNIST data set somewhat limited, feel free to try the +MNIST-Fashion data set at for example https://www.kaggle.com/datasets/zalando-research/fashionmnist. +

    + +

    To set up the data set, the following python programs may be useful

    + + +
    from sklearn.datasets import fetch_openml
    +
    +# Fetch the MNIST dataset
    +mnist = fetch_openml('mnist_784', version=1, as_frame=False, parser='auto')
    +
    +# Extract data (features) and target (labels)
    +X = mnist.data
    +y = mnist.target
    +

You should consider scaling the data. The pixel values in MNIST range from 0 to 255; scaling them to a 0-1 range can improve the performance of some models. That is, you could implement the following scaling

    + + +
    X = X / 255.0
    +

    And then perform the standard train-test splitting

    + + +
    from sklearn.model_selection import train_test_split
    +X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
    +

To measure the performance of our classification problem we will use the so-called accuracy score. The accuracy is, as you would expect, just the number of correctly guessed targets \( t_i \) divided by the total number of targets, that is

    + +$$ +\text{Accuracy} = \frac{\sum_{i=1}^n I(t_i = y_i)}{n} , +$$ + +

where \( I \) is the indicator function, equal to \( 1 \) if \( t_i = y_i \) and \( 0 \) otherwise. Here \( t_i \) represents the target, \( y_i \) is the output of your FFNN code, and \( n \) is simply the number of targets \( t_i \).
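In code the accuracy is a one-liner; a small sketch, assuming integer class labels for both targets and predictions:

import numpy as np

def accuracy(targets, predictions):
    # fraction of correctly predicted labels
    return np.mean(targets == predictions)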

    + +

Discuss your results and give a critical analysis of the various parameters, including hyper-parameters like the learning rates and the regularization parameter \( \lambda \), the various activation functions, and the number of hidden layers and nodes.

    + +

    Again, we strongly recommend that you compare your own neural Network +code for classification and pertinent results against a similar code using Scikit-Learn or tensorflow/keras or pytorch. +

    + +

If you have time, you can use the functionality of scikit-learn and compare your neural network results with those from Logistic regression. This is optional. The weblink https://medium.com/ai-in-plain-english/comparison-between-logistic-regression-and-neural-networks-in-classifying-digits-dc5e85cd93c3 compares logistic regression and FFNNs using the so-called MNIST data set. You may find several useful hints and ideas in this article. Your neural network code can implement the equivalent of logistic regression by simply setting the number of hidden layers to zero and keeping just the input and the output layers.

    + +

If you wish to compare with, say, Logistic Regression from scikit-learn, the following code uses the above data set

    + + +
    from sklearn.linear_model import LogisticRegression
    +# Initialize the model
    +model = LogisticRegression(solver='saga', multi_class='multinomial', max_iter=1000, random_state=42)
    +# Train the model
    +model.fit(X_train, y_train)
    +from sklearn.metrics import accuracy_score
    +# Make predictions on the test set
    +y_pred = model.predict(X_test)
    +# Calculate accuracy
    +accuracy = accuracy_score(y_test, y_pred)
    +print(f"Model Accuracy: {accuracy:.4f}")
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +

    Part g) Critical evaluation of the various algorithms

    + +

After all these glorious calculations, you should now summarize the various algorithms and give a critical evaluation of their pros and cons. Which algorithm works best for the regression case and which is best for the classification case? These codes can also be part of your final project 3, but then applied to other data sets.

    +

    Summary of methods to implement and analyze

Required Implementation:

1. Reuse the regression code and results from project 1; these will act as a benchmark for seeing how suited a neural network is for this regression task.
2. Implement a neural network with
    • A flexible number of layers
    • A flexible number of nodes in each layer
    • A changeable activation function in each layer (Sigmoid, ReLU, LeakyReLU, as well as Linear and Softmax)
    • A changeable cost function, which will be set to MSE for regression and cross-entropy for multi-class classification
    • An optional L1 or L2 norm of the weights and biases in the cost function (only used for computing gradients, not as an interpretable metric); a small sketch of how such a penalty enters the gradients follows after this list
3. Implement the back-propagation algorithm to compute the gradient of your neural network
4. Reuse the implementation of plain and stochastic gradient descent from project 1 (and adapt the code to work with your neural network)
    • With no optimization algorithm
    • With RMSProp
    • With ADAM
5. Implement scaling and train-test splitting of your data, preferably using sklearn
6. Implement and compute metrics like the MSE and accuracy
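As a small illustration of the regularization point above, here is a sketch (with hypothetical names, not required code) of how an optional L2 penalty \( \lambda \sum_j w_j^2 \) only adds an extra term to the gradients already produced by back-propagation:

import numpy as np

def add_l2_penalty_gradient(grad_w, grad_b, weights, biases, lam):
    """Add the gradient of an L2 penalty lam * sum(w**2) (and similarly for b)
    to the gradients already computed by back-propagation."""
    grad_w = [gw + 2.0 * lam * W for gw, W in zip(grad_w, weights)]
    grad_b = [gb + 2.0 * lam * b for gb, b in zip(grad_b, biases)]
    # For an L1 penalty, the extra term would instead be lam * np.sign(W)
    return grad_w, grad_b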

    Required Analysis:

1. Briefly show and argue for the advantages and disadvantages of the methods from Project 1.
2. Explore and show the impact of changing the number of layers, nodes per layer, choice of activation function, and inclusion of L1 and L2 norms. Present only the most interesting results from this exploration. 2D heatmaps are well suited for this: start by finding a well-performing set of hyper-parameters, then change two at a time over a range that shows both good and bad performance.
3. Show and argue for the advantages and disadvantages of using a neural network for regression on your data.
4. Show and argue for the advantages and disadvantages of using a neural network for classification on your data.
5. Show and argue for the advantages and disadvantages of the different gradient methods and learning rates when training the neural network.

    Optional (Note that you should include at least two of these in the report):

1. Implement logistic regression as a simple classification model (equivalent to a neural network with just the output layer).
2. Compute the gradient of the neural network with autograd, to show that it gives the same result as your hand-written backpropagation.
3. Compare your results with results from a machine-learning library like PyTorch (https://docs.pytorch.org/tutorials/beginner/basics/buildmodel_tutorial.html).
4. Use a more complex classification dataset instead, like Fashion-MNIST (see https://www.kaggle.com/datasets/zalando-research/fashionmnist).
5. Use a more complex regression dataset instead, like the two-dimensional Runge function \( f(x,y)=\left[(10x - 5)^2 + (10y - 5)^2 + 1 \right]^{-1} \), or even more complicated two-dimensional functions (see the supplementary material of https://www.nature.com/articles/s41467-025-61362-4 for an extensive list of two-dimensional functions).
6. Compute and interpret a confusion matrix of your best classification model (see https://www.researchgate.net/figure/Confusion-matrix-of-MNIST-and-F-MNIST-embeddings_fig5_349758607).

    Background literature

1. The text of Michael Nielsen is highly recommended, see Nielsen's book at http://neuralnetworksanddeeplearning.com/. It is an excellent read.
2. Goodfellow, Bengio and Courville, Deep Learning, at https://www.deeplearningbook.org/. Here we recommend chapters 6, 7 and 8.
3. Raschka et al. at https://sebastianraschka.com/blog/2022/ml-pytorch-book.html. Here we recommend chapters 11, 12 and 13.

    Introduction to numerical projects

    + +

Here follows a brief recipe and recommendation on how to write the report for each project.

• Give a short description of the nature of the problem and the numerical methods you have used.
• Describe the algorithm you have used and/or developed. Here you may find it convenient to use pseudocode. In many cases you can describe the algorithm in the program itself.
• Include the source code of your program. Comment your program properly.
• If possible, try to find analytic solutions, or known limits, in order to test your program when developing the code.
• Include your results either in figure form or in a table. Remember to label your results. All tables and figures should have relevant captions and labels on the axes.
• Try to evaluate the reliability and numerical stability/precision of your results. If possible, include a qualitative and/or quantitative discussion of the numerical stability, possible loss of precision etc.
• Try to give an interpretation of your results in your answers to the problems.
• Critique: if possible, include your comments and reflections about the exercise, whether you felt you learnt something, ideas for improvements and other thoughts you've made when solving the exercise. We wish to keep this course at the interactive level and your comments can help us improve it.
• Try to establish a practice where you log your work at the computer lab. You may find such a logbook very handy at later stages in your work, especially when you don't properly remember what a previous test version of your program did. Here you could also record the time spent on solving the exercise, various algorithms you may have tested or other topics which you feel worthy of mentioning.

    Format for electronic delivery of report and programs

    + +

The preferred format for the report is a PDF file. You can also use DOC or PostScript formats, or an IPython/Jupyter notebook file. As programming language we prefer that you choose between C/C++, Fortran2008 or Python. The following prescription should be followed when preparing the report:

• Use Canvas to hand in your projects, log in at https://www.uio.no/english/services/it/education/canvas/ with your normal UiO username and password.
• Upload only the report file or the link to your GitHub/GitLab or similar type of repository! For the source code file(s) you have developed, please provide us with the link to your GitHub/GitLab or similar domain. The report file should include all of your discussions and a list of the codes you have developed. Do not include library files which are available at the course homepage, unless you have made specific changes to them.
• In your GitHub/GitLab or similar repository, please include a folder which contains selected results. These can be in the form of output from your code for a selected set of runs and input parameters.

Finally, we encourage you to collaborate. Optimal working groups consist of 2-3 students. You can then hand in a common report.

    + © 1999-2025, "Data Analysis and Machine Learning FYS-STK3155/FYS4155":"http://www.uio.no/studier/emner/matnat/fys/FYS3155/index-eng.html". Released under CC Attribution-NonCommercial 4.0 license +
    + + + diff --git a/doc/Projects/2025/Project2/html/Project2-bs.html b/doc/Projects/2025/Project2/html/Project2-bs.html new file mode 100644 index 000000000..f84073612 --- /dev/null +++ b/doc/Projects/2025/Project2/html/Project2-bs.html @@ -0,0 +1,640 @@ + + + + + + + +Project 2 on Machine Learning, deadline November 10 (Midnight) + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +
    +
    +

    Project 2 on Machine Learning, deadline November 10 (Midnight)

    +
    + + +
    +Data Analysis and Machine Learning FYS-STK3155/FYS4155 +
    + +
    +University of Oslo, Norway +
    +
    +
    +

    October 14, 2025

    +
    +
    + + +
    +

    Deliverables

    + +

First, join a group in Canvas with your group partners. Pick an available group for Project 2 on the People page.

    + +

In Canvas, deliver as a group and include:

• A PDF of your report which follows the guidelines covered below and in the week 39 exercises. Additional requirements include:
    • It should be around 5000 words; use the word counter in Overleaf for this. This often corresponds to 10-12 pages. References and appendices are excluded from the word count.
    • It should include around 10-15 figures. You can include more figures in appendices and/or as supplemental material in your repository.
• A comment linking to your GitHub repository (or a folder in one of your GitHub repositories) for this project. The repository must include
    • A PDF file of the report
    • A folder named Code, where you put Python files for your functions and notebooks for reproducing your results. Remember to use a seed for generating random data and for train-test splits when generating final results.
    • A README file with the names of the group members
    • a short description of the project
    • a description of how to install the required packages to run your code, from a requirements.txt file or similar (such as a plain text description), and names and descriptions of the various notebooks in the Code folder and the results they produce

    Preamble: Note on writing reports, using reference material, AI and other tools

    + +

    We want you to answer the three different projects by handing in +reports written like a standard scientific/technical report. The links +at +https://github.com/CompPhysics/MachineLearning/tree/master/doc/Projects +contain more information. There you can find examples of previous +reports, the projects themselves, how we grade reports etc. How to +write reports will also be discussed during the various lab +sessions. Please do ask us if you are in doubt. +

    + +

    When using codes and material from other sources, you should refer to +these in the bibliography of your report, indicating wherefrom you for +example got the code, whether this is from the lecture notes, +softwares like Scikit-Learn, TensorFlow, PyTorch or other +sources. These sources should always be cited correctly. How to cite +some of the libraries is often indicated from their corresponding +GitHub sites or websites, see for example how to cite Scikit-Learn at +https://scikit-learn.org/dev/about.html. +

    + +

We encourage you to use tools like ChatGPT or similar in writing the report. If you use for example ChatGPT, please do cite it properly and include (if possible) your questions and answers as an addition to the report. This can be uploaded to for example your website, GitHub/GitLab or similar as supplemental material.

    + +

    If you would like to study other data sets, feel free to propose other +sets. What we have proposed here are mere suggestions from our +side. If you opt for another data set, consider using a set which has +been studied in the scientific literature. This makes it easier for +you to compare and analyze your results. Comparing with existing +results from the scientific literature is also an essential element of +the scientific discussion. The University of California at Irvine with +its Machine Learning repository at +https://archive.ics.uci.edu/ml/index.php is an excellent site to look +up for examples and inspiration. Kaggle.com is an equally interesting +site. Feel free to explore these sites. +

    +

    Classification and Regression, writing our own neural network code

    + +

    The main aim of this project is to study both classification and +regression problems by developing our own +feed-forward neural network (FFNN) code. The exercises from week 41 and 42 (see https://compphysics.github.io/MachineLearning/doc/LectureNotes/_build/html/exercisesweek41.html and https://compphysics.github.io/MachineLearning/doc/LectureNotes/_build/html/exercisesweek42.html) as well as the lecture material from the same weeks (see https://compphysics.github.io/MachineLearning/doc/LectureNotes/_build/html/week41.html and https://compphysics.github.io/MachineLearning/doc/LectureNotes/_build/html/week42.html) should contain enough information for you to get started with writing your own code. +

    + +

    We will also reuse our codes on gradient descent methods from project 1.

    + +

    The data sets that we propose here are (the default sets)

    + +
      +
    • Regression (fitting a continuous function). In this part you will need to bring back your results from project 1 and compare these with what you get from your Neural Network code to be developed here. The data sets could be
    • +
        +
• The simple one-dimensional Runge function from project 1, that is \( f(x) = \frac{1}{1+25x^2} \). We recommend using a simpler function when developing your neural network code for regression problems. Feel however free to discuss and study other functions, such as the two-dimensional Runge function \( f(x,y)=\left[(10x - 5)^2 + (10y - 5)^2 + 1 \right]^{-1} \), or even more complicated two-dimensional functions (see the supplementary material of https://www.nature.com/articles/s41467-025-61362-4 for an extensive list of two-dimensional functions).
      • +
      +
    • Classification.
    • + +
    +

    We will start with a regression problem and we will reuse our codes on gradient descent methods from project 1.

    +

    Part a): Analytical warm-up

    + +

    When using our gradient machinery from project 1, we will need the expressions for the cost/loss functions and their respective +gradients. The functions whose gradients we need are: +

1. The mean-squared error (MSE), with and without the \( L_1 \) and \( L_2 \) norms (regression problems)
2. The binary cross entropy (aka log loss) for binary classification problems, with and without the \( L_1 \) and \( L_2 \) norms
3. The multiclass cross entropy cost/loss function (aka Softmax cross entropy or just Softmax loss function)

    Set up these three cost/loss functions and their respective derivatives and explain the various terms. In this project you will however only use the MSE and the Softmax cross entropy.
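For reference, a minimal NumPy sketch of the two cost functions you will actually use is given below; this is our own illustration (with one-hot encoded targets T and logits Z as assumed inputs), not a required implementation:

import numpy as np

def mse(y_pred, y_true):
    return np.mean((y_pred - y_true) ** 2)

def softmax(Z):
    # Subtract the row-wise maximum for numerical stability
    expZ = np.exp(Z - np.max(Z, axis=1, keepdims=True))
    return expZ / np.sum(expZ, axis=1, keepdims=True)

def softmax_cross_entropy(Z, T):
    """Mean cross entropy between one-hot targets T and softmax(Z)."""
    P = softmax(Z)
    return -np.mean(np.sum(T * np.log(P + 1e-12), axis=1))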

    + +

We will test three activation functions for our neural network setup; these are

1. the Sigmoid (aka logistic) function,
2. the ReLU function, and
3. the Leaky ReLU function.

    Set up their expressions and their first derivatives. +You may consult the lecture notes (with codes and more) from week 42 at https://compphysics.github.io/MachineLearning/doc/LectureNotes/_build/html/week42.html. +
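A minimal sketch of these activation functions and their first derivatives, written as simple NumPy functions, could look as follows (an illustration only; your own implementation may differ):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)

def relu(z):
    return np.where(z > 0, z, 0.0)

def relu_prime(z):
    return np.where(z > 0, 1.0, 0.0)

def leaky_relu(z, alpha=0.01):
    return np.where(z > 0, z, alpha * z)

def leaky_relu_prime(z, alpha=0.01):
    return np.where(z > 0, 1.0, alpha)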

    +

    Reminder about the gradient machinery from project 1

    + +

    In the setup of a neural network code you will need your gradient descent codes from +project 1. For neural networks we will recommend using stochastic +gradient descent with either the RMSprop or the ADAM algorithms for +updating the learning rates. But you should feel free to try plain gradient descent as well. +
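As an illustration of what such an update could look like, here is a minimal sketch of a single ADAM step for one parameter array (hypothetical function and variable names; see the lecture notes and Goodfellow et al. for the full algorithm):

import numpy as np

def adam_update(theta, grad, state, eta=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One ADAM step for a parameter array theta given its gradient."""
    m, v, t = state
    t += 1
    m = beta1 * m + (1 - beta1) * grad          # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad**2       # second-moment estimate
    m_hat = m / (1 - beta1**t)                  # bias corrections
    v_hat = v / (1 - beta2**t)
    theta = theta - eta * m_hat / (np.sqrt(v_hat) + eps)
    return theta, (m, v, t)

# Usage sketch: the state starts as (np.zeros_like(theta), np.zeros_like(theta), 0)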

    + +

We recommend reading chapter 8 on optimization from the textbook of Goodfellow, Bengio and Courville at https://www.deeplearningbook.org/. This chapter contains many useful insights and discussions on the optimization part of machine learning. A useful reference on the back propagation algorithm is Nielsen's book at http://neuralnetworksanddeeplearning.com/.

    + +

You will find the Python Seaborn package useful when plotting the results as functions of the learning rate \( \eta \) and the hyper-parameter \( \lambda \).
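A minimal sketch of such a plot, assuming you have filled a 2D array of test MSE values over hypothetical grids of \( \eta \) and \( \lambda \), could be:

import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

# Hypothetical grids of learning rates and regularization parameters
etas = [0.001, 0.01, 0.1]
lambdas = [0.0001, 0.001, 0.01, 0.1]
# mse_scores[i, j] would hold the test MSE for etas[i] and lambdas[j]
mse_scores = np.random.rand(len(etas), len(lambdas))  # placeholder values only

sns.heatmap(mse_scores, annot=True, xticklabels=lambdas, yticklabels=etas, cmap="viridis")
plt.xlabel(r"$\lambda$")
plt.ylabel(r"$\eta$")
plt.title("Test MSE")
plt.show()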

    +

    Part b): Writing your own Neural Network code

    + +

    Your aim now, and this is the central part of this project, is to +write your own FFNN code implementing the back +propagation algorithm discussed in the lecture slides from week 41 at https://compphysics.github.io/MachineLearning/doc/LectureNotes/_build/html/week41.html and week 42 at https://compphysics.github.io/MachineLearning/doc/LectureNotes/_build/html/week42.html. +

    + +

    We will focus on a regression problem first, using the one-dimensional Runge function

$$
f(x) = \frac{1}{1+25x^2},
$$

    from project 1.

    + +

    Use only the mean-squared error as cost function (no regularization terms) and +write an FFNN code for a regression problem with a flexible number of hidden +layers and nodes using only the Sigmoid function as activation function for +the hidden layers. Initialize the weights using a normal +distribution. How would you initialize the biases? And which +activation function would you select for the final output layer? +And how would you set up your design/feature matrix? Hint: does it have to represent a polynomial approximation as you did in project 1? +
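As an illustration only (a sketch under our own assumptions, not the required design), initializing normally distributed weights and performing a forward pass with Sigmoid hidden layers and a linear output could look like this:

import numpy as np

def init_params(layer_sizes, rng=np.random.default_rng(42)):
    """layer_sizes, e.g. [1, 50, 1]: input dim, hidden nodes, output dim."""
    # Normally distributed weights, scaled down; zero biases as one possible choice
    weights = [rng.normal(0, 1, size=(n_in, n_out)) * 0.1
               for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
    biases = [np.zeros(n_out) for n_out in layer_sizes[1:]]
    return weights, biases

def forward(X, weights, biases):
    """Sigmoid activations in the hidden layers, linear output for regression."""
    a = X
    for W, b in zip(weights[:-1], biases[:-1]):
        a = 1.0 / (1.0 + np.exp(-(a @ W + b)))
    return a @ weights[-1] + biases[-1]

# Example: one input feature, one hidden layer with 50 nodes, one output
W, b = init_params([1, 50, 1])
x = np.linspace(-1, 1, 100).reshape(-1, 1)
y_pred = forward(x, W, b)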

    + +

    Train your network and compare the results with those from your OLS +regression code from project 1 using the one-dimensional Runge +function. When comparing your neural network code with the OLS +results from project 1, use the same data sets which gave you the best +MSE score. Moreover, use the polynomial order from project 1 that gave you the +best result. Compare these results with your neural network with one +and two hidden layers using \( 50 \) and \( 100 \) hidden nodes, respectively. +

    + +

    Comment your results and give a critical discussion of the results +obtained with the OLS code from project 1 and your own neural network +code. Make an analysis of the learning rates employed to find the +optimal MSE score. Test both stochastic gradient descent +with RMSprop and ADAM and plain gradient descent with different +learning rates. +

    + +

    You should, as you did in project 1, scale your data.

    +

    Part c): Testing against other software libraries

    + +

    You should test your results against a similar code using Scikit-Learn (see the examples in the above lecture notes from weeks 41 and 42) or tensorflow/keras or Pytorch (for Pytorch, see Raschka et al.'s text chapters 12 and 13).

    + +

    Furthermore, you should also test that your derivatives are correctly +calculated using automatic differentiation, using for example the +Autograd library or the JAX library. It is optional to implement +these libraries for the present project. In this project they serve as +useful tests of our derivatives. +
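As an example of such a check (a sketch with hypothetical names; your own cost functions and gradients will differ), one could compare a hand-written gradient of a simple MSE cost against Autograd:

import autograd.numpy as anp
import numpy as np
from autograd import grad

def mse(w, X, y):
    return anp.mean((anp.dot(X, w) - y) ** 2)

def mse_grad_analytic(w, X, y):
    # Hand-written gradient of the MSE with respect to w
    n = X.shape[0]
    return 2.0 / n * X.T @ (X @ w - y)

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 3))
y = rng.standard_normal(20)
w = rng.standard_normal(3)

auto = grad(mse)(w, X, y)   # Autograd differentiates with respect to the first argument
print(np.allclose(auto, mse_grad_analytic(w, X, y)))  # expect True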

    +

    Part d): Testing different activation functions and depths of the neural network

    + +

    You should also test different activation functions for the hidden +layers. Try out the Sigmoid, the RELU and the Leaky RELU functions and +discuss your results. Test your results as functions of the number of hidden layers and nodes. Do you see signs of overfitting? +It is optional in this project to perform a bias-variance trade-off analysis. +

    +

    Part e): Testing different norms

    + +

    Finally, still using the one-dimensional Runge function, add now the +hyperparameters \( \lambda \) with the \( L_2 \) and \( L_1 \) norms. Find the +optimal results for the hyperparameters \( \lambda \) and the learning +rates \( \eta \) and neural network architecture and compare the \( L_2 \) results with Ridge regression from +project 1 and the \( L_1 \) results with the Lasso calculations of project 1. +Use again the same data sets and the best results from project 1 in your comparisons. +

    +

    Part f): Classification analysis using neural networks

    + +

    With a well-written code it should now be easy to change the +activation function for the output layer. +

    + +

    Here we will change the cost function for our neural network code +developed in parts b), d) and e) in order to perform a classification +analysis. The classification problem we will study is the multiclass +MNIST problem, see the description of the full data set at +https://www.kaggle.com/datasets/hojjatk/mnist-dataset. We will use the Softmax cross entropy function discussed in a). +The MNIST data set discussed in the lecture notes from week 42 is a downscaled variant of the full dataset. +
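Note that fetch_openml (used in the code below) returns the MNIST labels as strings; for the Softmax cross entropy you will typically want integer class labels and a one-hot encoded target matrix. A minimal sketch, assuming y holds the fetched labels:

import numpy as np

y_int = y.astype(int)                        # labels '0'-'9' converted to integers
onehot = np.zeros((y_int.size, 10))
onehot[np.arange(y_int.size), y_int] = 1.0   # one-hot encoded targets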

    + +

Feel free to suggest other data sets. If you find the classic MNIST data set somewhat limited, you can try the Fashion-MNIST data set, available at for example https://www.kaggle.com/datasets/zalando-research/fashionmnist.

To set up the data set, the following Python program may be useful:

from sklearn.datasets import fetch_openml

# Fetch the MNIST dataset
mnist = fetch_openml('mnist_784', version=1, as_frame=False, parser='auto')

# Extract data (features) and target (labels)
X = mnist.data
y = mnist.target

You should consider scaling the data. The pixel values in MNIST range from 0 to 255. Scaling them to the range 0-1 can improve the performance of some models. That is, you could implement the following scaling:

X = X / 255.0

    And then perform the standard train-test splitting

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

To measure the performance of our classification model we will use the so-called accuracy score. The accuracy is, as you would expect, just the number of correctly guessed targets \( t_i \) divided by the total number of targets, that is

$$
\text{Accuracy} = \frac{\sum_{i=1}^n I(t_i = y_i)}{n},
$$

where \( I \) is the indicator function, equal to \( 1 \) if \( t_i = y_i \) and \( 0 \) otherwise. Here \( t_i \) represents the target, \( y_i \) the output of your FFNN code, and \( n \) is simply the number of targets \( t_i \).

    + +

Discuss your results and give a critical analysis of the various parameters, including hyper-parameters like the learning rate and the regularization parameter \( \lambda \), the various activation functions, and the number of hidden layers and nodes.

Again, we strongly recommend that you compare your own neural network code for classification, and its pertinent results, against a similar code using Scikit-Learn, TensorFlow/Keras or PyTorch.

    + +

If you have time, you can use the functionality of scikit-learn and compare your neural network results with those from logistic regression. This is optional. The weblink https://medium.com/ai-in-plain-english/comparison-between-logistic-regression-and-neural-networks-in-classifying-digits-dc5e85cd93c3 compares logistic regression and an FFNN on the MNIST data set. You may find several useful hints and ideas in this article. Your neural network code can implement the equivalent of logistic regression by simply setting the number of hidden layers to zero and keeping just the input and output layers.

    + +

If you wish to compare with, say, logistic regression from scikit-learn, the following code uses the above data set:

from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Initialize the model
model = LogisticRegression(solver='saga', multi_class='multinomial', max_iter=1000, random_state=42)
# Train the model
model.fit(X_train, y_train)
# Make predictions on the test set
y_pred = model.predict(X_test)
# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Model Accuracy: {accuracy:.4f}")

    Part g) Critical evaluation of the various algorithms

    + +

After all these glorious calculations, you should now summarize the various algorithms and give a critical evaluation of their pros and cons. Which algorithm works best for the regression case and which is best for the classification case? These codes can also be part of your final project 3, but then applied to other data sets.

    +

    Summary of methods to implement and analyze

    + +Required Implementation: +
      +
    1. Reuse the regression code and results from project 1, these will act as a benchmark for seeing how suited a neural network is for this regression task.
    2. +
    3. Implement a neural network with
    4. +
        +
      • A flexible number of layers
      • +
      • A flexible number of nodes in each layer
      • +
      • A changeable activation function in each layer (Sigmoid, ReLU, LeakyReLU, as well as Linear and Softmax)
      • +
      • A changeable cost function, which will be set to MSE for regression and cross-entropy for multiple-classification
      • +
      • An optional L1 or L2 norm of the weights and biases in the cost function (only used for computing gradients, not interpretable metrics)
      • +
      +
    5. Implement the back-propagation algorithm to compute the gradient of your neural network
    6. +
7. Reuse the implementation of Plain and Stochastic Gradient Descent from Project 1 (and adapt the code to work with your neural network)
    8. +
        +
      • With no optimization algorithm
      • +
      • With RMS Prop
      • +
      • With ADAM
      • +
      +
    9. Implement scaling and train-test splitting of your data, preferably using sklearn
    10. +
    11. Implement and compute metrics like the MSE and Accuracy
    12. +
    +

    Required Analysis:

    +
      +
    1. Briefly show and argue for the advantages and disadvantages of the methods from Project 1.
    2. +
    3. Explore and show the impact of changing the number of layers, nodes per layer, choice of activation function, and inclusion of L1 and L2 norms. Present only the most interesting results from this exploration. 2D Heatmaps will be good for this: Start with finding a well performing set of hyper-parameters, then change two at a time in a range that shows good and bad performance.
    4. +
    5. Show and argue for the advantages and disadvantages of using a neural network for regression on your data
    6. +
    7. Show and argue for the advantages and disadvantages of using a neural network for classification on your data
    8. +
    9. Show and argue for the advantages and disadvantages of the different gradient methods and learning rates when training the neural network
    10. +
    +

    Optional (Note that you should include at least two of these in the report):

    +
      +
    1. Implement Logistic Regression as simple classification model case (equivalent to a Neural Network with just the output layer)
    2. +
    3. Compute the gradient of the neural network with autograd, to show that it gives the same result as your hand-written backpropagation.
    4. +
    5. Compare your results with results from using a machine-learning library like pytorch (https://docs.pytorch.org/tutorials/beginner/basics/buildmodel_tutorial.html)
    6. +
    7. Use a more complex classification dataset instead, like the fashion MNIST (see https://www.kaggle.com/datasets/zalando-research/fashionmnist)
    8. +
    9. Use a more complex regression dataset instead, like the two-dimensional Runge function \( f(x,y)=\left[(10x - 5)^2 + (10y - 5)^2 + 1 \right]^{-1} \), or even more complicated two-dimensional functions (see the supplementary material of https://www.nature.com/articles/s41467-025-61362-4 for an extensive list of two-dimensional functions).
    10. +
    11. Compute and interpret a confusion matrix of your best classification model (see https://www.researchgate.net/figure/Confusion-matrix-of-MNIST-and-F-MNIST-embeddings_fig5_349758607)
    12. +
    +

    Background literature

    + +
      +
    1. The text of Michael Nielsen is highly recommended, see Nielsen's book at http://neuralnetworksanddeeplearning.com/. It is an excellent read.
    2. +
    3. Goodfellow, Bengio and Courville, Deep Learning at https://www.deeplearningbook.org/. Here we recommend chapters 6, 7 and 8
    4. +
    5. Raschka et al. at https://sebastianraschka.com/blog/2022/ml-pytorch-book.html. Here we recommend chapters 11, 12 and 13.
    6. +
    +

    Introduction to numerical projects

    + +

    Here follows a brief recipe and recommendation on how to write a report for each +project. +

    + +
      +
    • Give a short description of the nature of the problem and the eventual numerical methods you have used.
    • +
    • Describe the algorithm you have used and/or developed. Here you may find it convenient to use pseudocoding. In many cases you can describe the algorithm in the program itself.
    • +
    • Include the source code of your program. Comment your program properly.
    • +
    • If possible, try to find analytic solutions, or known limits in order to test your program when developing the code.
    • +
    • Include your results either in figure form or in a table. Remember to label your results. All tables and figures should have relevant captions and labels on the axes.
    • +
• Try to evaluate the reliability and numerical stability/precision of your results. If possible, include a qualitative and/or quantitative discussion of the numerical stability, possible loss of precision etc.
    • +
• Try to give an interpretation of your results in your answers to the problems.
    • +
    • Critique: if possible include your comments and reflections about the exercise, whether you felt you learnt something, ideas for improvements and other thoughts you've made when solving the exercise. We wish to keep this course at the interactive level and your comments can help us improve it.
    • +
    • Try to establish a practice where you log your work at the computerlab. You may find such a logbook very handy at later stages in your work, especially when you don't properly remember what a previous test version of your program did. Here you could also record the time spent on solving the exercise, various algorithms you may have tested or other topics which you feel worthy of mentioning.
    • +
    +

    Format for electronic delivery of report and programs

    + +

    The preferred format for the report is a PDF file. You can also use DOC or postscript formats or as an ipython notebook file. As programming language we prefer that you choose between C/C++, Fortran2008 or Python. The following prescription should be followed when preparing the report:

    + +
      +
    • Use Canvas to hand in your projects, log in at https://www.uio.no/english/services/it/education/canvas/ with your normal UiO username and password.
    • +
• Upload only the report file or the link to your GitHub/GitLab or similar type of repository! For the source code file(s) you have developed, please provide us with the link to your GitHub/GitLab or similar domain. The report file should include all of your discussions and a list of the codes you have developed. Do not include library files which are available at the course homepage, unless you have made specific changes to them.
    • +
    • In your GitHub/GitLab or similar repository, please include a folder which contains selected results. These can be in the form of output from your code for a selected set of runs and input parameters.
    • +
    +

    Finally, +we encourage you to collaborate. Optimal working groups consist of +2-3 students. You can then hand in a common report. +

    + © 1999-2025, "Data Analysis and Machine Learning FYS-STK3155/FYS4155":"http://www.uio.no/studier/emner/matnat/fys/FYS3155/index-eng.html". Released under CC Attribution-NonCommercial 4.0 license +
    + + + diff --git a/doc/Projects/2025/Project2/html/Project2.html b/doc/Projects/2025/Project2/html/Project2.html new file mode 100644 index 000000000..71e58c711 --- /dev/null +++ b/doc/Projects/2025/Project2/html/Project2.html @@ -0,0 +1,658 @@ + + + + + + + +Project 2 on Machine Learning, deadline November 10 (Midnight) + + + + + + + + + + + + + + +
    +

    Project 2 on Machine Learning, deadline November 10 (Midnight)

    +
    + + +
    +Data Analysis and Machine Learning FYS-STK3155/FYS4155 +
    + +
    +University of Oslo, Norway +
    +
    +
    +

    October 14, 2025

    +
    +
    +

    Deliverables

    + +

First, join a group in Canvas with your group partners. Pick an available group for Project 2 on the People page.

    + +

    In canvas, deliver as a group and include:

    + +
      +
    • A PDF of your report which follows the guidelines covered below and in the week 39 exercises. Additional requirements include:
    • +
        +
      • It should be around 5000 words, use the word counter in Overleaf for this. This often corresponds to 10-12 pages. References and appendices are excluded from the word count
      • +
      • It should include around 10-15 figures. You can include more figures in appendices and/or as supplemental material in your repository.
      • +
      +
    • A comment linking to your github repository (or folder in one of your github repositories) for this project. The repository must include
    • +
    +

    A PDF file of the report

    +
      +
    • A folder named Code, where you put python files for your functions and notebooks for reproducing your results. Remember to use a seed for generating random data and for train-test splits when generating final results.
    • +
    • A README file with the name of the group members
    • +
    • a short description of the project
    • +
    • a description of how to install the required packages to run your code from a requirements.txt file or similar (such as a plain text description) names and descriptions of the various notebooks in the Code folder and the results they produce
    • +
    +

    Preamble: Note on writing reports, using reference material, AI and other tools

    + +

    We want you to answer the three different projects by handing in +reports written like a standard scientific/technical report. The links +at +https://github.com/CompPhysics/MachineLearning/tree/master/doc/Projects +contain more information. There you can find examples of previous +reports, the projects themselves, how we grade reports etc. How to +write reports will also be discussed during the various lab +sessions. Please do ask us if you are in doubt. +

    + +

    When using codes and material from other sources, you should refer to +these in the bibliography of your report, indicating wherefrom you for +example got the code, whether this is from the lecture notes, +softwares like Scikit-Learn, TensorFlow, PyTorch or other +sources. These sources should always be cited correctly. How to cite +some of the libraries is often indicated from their corresponding +GitHub sites or websites, see for example how to cite Scikit-Learn at +https://scikit-learn.org/dev/about.html. +

    + +

We encourage you to use tools like ChatGPT or similar in writing the report. If you use for example ChatGPT, please do cite it properly and include (if possible) your questions and answers as an addition to the report. This can be uploaded to for example your website, GitHub/GitLab or similar as supplemental material.

    + +

    If you would like to study other data sets, feel free to propose other +sets. What we have proposed here are mere suggestions from our +side. If you opt for another data set, consider using a set which has +been studied in the scientific literature. This makes it easier for +you to compare and analyze your results. Comparing with existing +results from the scientific literature is also an essential element of +the scientific discussion. The University of California at Irvine with +its Machine Learning repository at +https://archive.ics.uci.edu/ml/index.php is an excellent site to look +up for examples and inspiration. Kaggle.com is an equally interesting +site. Feel free to explore these sites. +

    +

    Classification and Regression, writing our own neural network code

    + +

    The main aim of this project is to study both classification and +regression problems by developing our own +feed-forward neural network (FFNN) code. The exercises from week 41 and 42 (see https://compphysics.github.io/MachineLearning/doc/LectureNotes/_build/html/exercisesweek41.html and https://compphysics.github.io/MachineLearning/doc/LectureNotes/_build/html/exercisesweek42.html) as well as the lecture material from the same weeks (see https://compphysics.github.io/MachineLearning/doc/LectureNotes/_build/html/week41.html and https://compphysics.github.io/MachineLearning/doc/LectureNotes/_build/html/week42.html) should contain enough information for you to get started with writing your own code. +

    + +

    We will also reuse our codes on gradient descent methods from project 1.

    + +

    The data sets that we propose here are (the default sets)

    + +
      +
    • Regression (fitting a continuous function). In this part you will need to bring back your results from project 1 and compare these with what you get from your Neural Network code to be developed here. The data sets could be
    • +
        +
• The simple one-dimensional Runge function from project 1, that is \( f(x) = \frac{1}{1+25x^2} \). We recommend using a simpler function when developing your neural network code for regression problems. Feel however free to discuss and study other functions, such as the two-dimensional Runge function \( f(x,y)=\left[(10x - 5)^2 + (10y - 5)^2 + 1 \right]^{-1} \), or even more complicated two-dimensional functions (see the supplementary material of https://www.nature.com/articles/s41467-025-61362-4 for an extensive list of two-dimensional functions).
      • +
      +
    • Classification.
    • + +
    +

    We will start with a regression problem and we will reuse our codes on gradient descent methods from project 1.

    +

    Part a): Analytical warm-up

    + +

    When using our gradient machinery from project 1, we will need the expressions for the cost/loss functions and their respective +gradients. The functions whose gradients we need are: +

    +
      +
    1. The mean-squared error (MSE) with and without the \( L_1 \) and \( L_2 \) norms (regression problems)
    2. +
    3. The binary cross entropy (aka log loss) for binary classification problems with and without \( L_1 \) and \( L_2 \) norms
    4. +
    5. The multiclass cross entropy cost/loss function (aka Softmax cross entropy or just Softmax loss function)
    6. +
    +

    Set up these three cost/loss functions and their respective derivatives and explain the various terms. In this project you will however only use the MSE and the Softmax cross entropy.

    + +

    We will test three activation functions for our neural network setup, these are the

    +
      +
    1. The Sigmoid (aka logit) function,
    2. +
    3. the RELU function and
    4. +
    5. the Leaky RELU function
    6. +
    +

    Set up their expressions and their first derivatives. +You may consult the lecture notes (with codes and more) from week 42 at https://compphysics.github.io/MachineLearning/doc/LectureNotes/_build/html/week42.html. +

    +

    Reminder about the gradient machinery from project 1

    + +

    In the setup of a neural network code you will need your gradient descent codes from +project 1. For neural networks we will recommend using stochastic +gradient descent with either the RMSprop or the ADAM algorithms for +updating the learning rates. But you should feel free to try plain gradient descent as well. +

    + +

We recommend reading chapter 8 on optimization from the textbook of Goodfellow, Bengio and Courville at https://www.deeplearningbook.org/. This chapter contains many useful insights and discussions on the optimization part of machine learning. A useful reference on the back propagation algorithm is Nielsen's book at http://neuralnetworksanddeeplearning.com/.

    + +

    You will find the Python Seaborn +package +useful when plotting the results as function of the learning rate +\( \eta \) and the hyper-parameter \( \lambda \) . +

    +

    Part b): Writing your own Neural Network code

    + +

    Your aim now, and this is the central part of this project, is to +write your own FFNN code implementing the back +propagation algorithm discussed in the lecture slides from week 41 at https://compphysics.github.io/MachineLearning/doc/LectureNotes/_build/html/week41.html and week 42 at https://compphysics.github.io/MachineLearning/doc/LectureNotes/_build/html/week42.html. +

    + +

    We will focus on a regression problem first, using the one-dimensional Runge function

$$
f(x) = \frac{1}{1+25x^2},
$$

    from project 1.

    + +

    Use only the mean-squared error as cost function (no regularization terms) and +write an FFNN code for a regression problem with a flexible number of hidden +layers and nodes using only the Sigmoid function as activation function for +the hidden layers. Initialize the weights using a normal +distribution. How would you initialize the biases? And which +activation function would you select for the final output layer? +And how would you set up your design/feature matrix? Hint: does it have to represent a polynomial approximation as you did in project 1? +

    + +

    Train your network and compare the results with those from your OLS +regression code from project 1 using the one-dimensional Runge +function. When comparing your neural network code with the OLS +results from project 1, use the same data sets which gave you the best +MSE score. Moreover, use the polynomial order from project 1 that gave you the +best result. Compare these results with your neural network with one +and two hidden layers using \( 50 \) and \( 100 \) hidden nodes, respectively. +

    + +

    Comment your results and give a critical discussion of the results +obtained with the OLS code from project 1 and your own neural network +code. Make an analysis of the learning rates employed to find the +optimal MSE score. Test both stochastic gradient descent +with RMSprop and ADAM and plain gradient descent with different +learning rates. +

    + +

    You should, as you did in project 1, scale your data.

    +

    Part c): Testing against other software libraries

    + +

    You should test your results against a similar code using Scikit-Learn (see the examples in the above lecture notes from weeks 41 and 42) or tensorflow/keras or Pytorch (for Pytorch, see Raschka et al.'s text chapters 12 and 13).

    + +

    Furthermore, you should also test that your derivatives are correctly +calculated using automatic differentiation, using for example the +Autograd library or the JAX library. It is optional to implement +these libraries for the present project. In this project they serve as +useful tests of our derivatives. +

    +

    Part d): Testing different activation functions and depths of the neural network

    + +

    You should also test different activation functions for the hidden +layers. Try out the Sigmoid, the RELU and the Leaky RELU functions and +discuss your results. Test your results as functions of the number of hidden layers and nodes. Do you see signs of overfitting? +It is optional in this project to perform a bias-variance trade-off analysis. +

    +

    Part e): Testing different norms

    + +

    Finally, still using the one-dimensional Runge function, add now the +hyperparameters \( \lambda \) with the \( L_2 \) and \( L_1 \) norms. Find the +optimal results for the hyperparameters \( \lambda \) and the learning +rates \( \eta \) and neural network architecture and compare the \( L_2 \) results with Ridge regression from +project 1 and the \( L_1 \) results with the Lasso calculations of project 1. +Use again the same data sets and the best results from project 1 in your comparisons. +

    +

    Part f): Classification analysis using neural networks

    + +

    With a well-written code it should now be easy to change the +activation function for the output layer. +

    + +

    Here we will change the cost function for our neural network code +developed in parts b), d) and e) in order to perform a classification +analysis. The classification problem we will study is the multiclass +MNIST problem, see the description of the full data set at +https://www.kaggle.com/datasets/hojjatk/mnist-dataset. We will use the Softmax cross entropy function discussed in a). +The MNIST data set discussed in the lecture notes from week 42 is a downscaled variant of the full dataset. +

    + +

Feel free to suggest other data sets. If you find the classic MNIST data set somewhat limited, you can try the Fashion-MNIST data set, available at for example https://www.kaggle.com/datasets/zalando-research/fashionmnist.

To set up the data set, the following Python program may be useful:

from sklearn.datasets import fetch_openml

# Fetch the MNIST dataset
mnist = fetch_openml('mnist_784', version=1, as_frame=False, parser='auto')

# Extract data (features) and target (labels)
X = mnist.data
y = mnist.target

You should consider scaling the data. The pixel values in MNIST range from 0 to 255. Scaling them to the range 0-1 can improve the performance of some models. That is, you could implement the following scaling:

X = X / 255.0

    And then perform the standard train-test splitting

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

To measure the performance of our classification model we will use the so-called accuracy score. The accuracy is, as you would expect, just the number of correctly guessed targets \( t_i \) divided by the total number of targets, that is

$$
\text{Accuracy} = \frac{\sum_{i=1}^n I(t_i = y_i)}{n},
$$

where \( I \) is the indicator function, equal to \( 1 \) if \( t_i = y_i \) and \( 0 \) otherwise. Here \( t_i \) represents the target, \( y_i \) the output of your FFNN code, and \( n \) is simply the number of targets \( t_i \).

    + +

Discuss your results and give a critical analysis of the various parameters, including hyper-parameters like the learning rate and the regularization parameter \( \lambda \), the various activation functions, and the number of hidden layers and nodes.

Again, we strongly recommend that you compare your own neural network code for classification, and its pertinent results, against a similar code using Scikit-Learn, TensorFlow/Keras or PyTorch.

    + +

If you have time, you can use the functionality of scikit-learn and compare your neural network results with those from logistic regression. This is optional. The weblink https://medium.com/ai-in-plain-english/comparison-between-logistic-regression-and-neural-networks-in-classifying-digits-dc5e85cd93c3 compares logistic regression and an FFNN on the MNIST data set. You may find several useful hints and ideas in this article. Your neural network code can implement the equivalent of logistic regression by simply setting the number of hidden layers to zero and keeping just the input and output layers.

    + +

If you wish to compare with, say, logistic regression from scikit-learn, the following code uses the above data set:

from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Initialize the model
model = LogisticRegression(solver='saga', multi_class='multinomial', max_iter=1000, random_state=42)
# Train the model
model.fit(X_train, y_train)
# Make predictions on the test set
y_pred = model.predict(X_test)
# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Model Accuracy: {accuracy:.4f}")

    Part g) Critical evaluation of the various algorithms

    + +

After all these glorious calculations, you should now summarize the various algorithms and give a critical evaluation of their pros and cons. Which algorithm works best for the regression case and which is best for the classification case? These codes can also be part of your final project 3, but then applied to other data sets.

    +

    Summary of methods to implement and analyze

    + +Required Implementation: +
      +
    1. Reuse the regression code and results from project 1, these will act as a benchmark for seeing how suited a neural network is for this regression task.
    2. +
    3. Implement a neural network with
    4. +
        +
      • A flexible number of layers
      • +
      • A flexible number of nodes in each layer
      • +
      • A changeable activation function in each layer (Sigmoid, ReLU, LeakyReLU, as well as Linear and Softmax)
      • +
      • A changeable cost function, which will be set to MSE for regression and cross-entropy for multiple-classification
      • +
      • An optional L1 or L2 norm of the weights and biases in the cost function (only used for computing gradients, not interpretable metrics)
      • +
      +
    5. Implement the back-propagation algorithm to compute the gradient of your neural network
    6. +
7. Reuse the implementation of Plain and Stochastic Gradient Descent from Project 1 (and adapt the code to work with your neural network)
    8. +
        +
      • With no optimization algorithm
      • +
      • With RMS Prop
      • +
      • With ADAM
      • +
      +
    9. Implement scaling and train-test splitting of your data, preferably using sklearn
    10. +
    11. Implement and compute metrics like the MSE and Accuracy
    12. +
    +

    Required Analysis:

    +
      +
    1. Briefly show and argue for the advantages and disadvantages of the methods from Project 1.
    2. +
    3. Explore and show the impact of changing the number of layers, nodes per layer, choice of activation function, and inclusion of L1 and L2 norms. Present only the most interesting results from this exploration. 2D Heatmaps will be good for this: Start with finding a well performing set of hyper-parameters, then change two at a time in a range that shows good and bad performance.
    4. +
    5. Show and argue for the advantages and disadvantages of using a neural network for regression on your data
    6. +
    7. Show and argue for the advantages and disadvantages of using a neural network for classification on your data
    8. +
    9. Show and argue for the advantages and disadvantages of the different gradient methods and learning rates when training the neural network
    10. +
    +

    Optional (Note that you should include at least two of these in the report):

    +
      +
    1. Implement Logistic Regression as simple classification model case (equivalent to a Neural Network with just the output layer)
    2. +
    3. Compute the gradient of the neural network with autograd, to show that it gives the same result as your hand-written backpropagation.
    4. +
    5. Compare your results with results from using a machine-learning library like pytorch (https://docs.pytorch.org/tutorials/beginner/basics/buildmodel_tutorial.html)
    6. +
    7. Use a more complex classification dataset instead, like the fashion MNIST (see https://www.kaggle.com/datasets/zalando-research/fashionmnist)
    8. +
    9. Use a more complex regression dataset instead, like the two-dimensional Runge function \( f(x,y)=\left[(10x - 5)^2 + (10y - 5)^2 + 1 \right]^{-1} \), or even more complicated two-dimensional functions (see the supplementary material of https://www.nature.com/articles/s41467-025-61362-4 for an extensive list of two-dimensional functions).
    10. +
    11. Compute and interpret a confusion matrix of your best classification model (see https://www.researchgate.net/figure/Confusion-matrix-of-MNIST-and-F-MNIST-embeddings_fig5_349758607)
    12. +
    +

    Background literature

    + +
      +
    1. The text of Michael Nielsen is highly recommended, see Nielsen's book at http://neuralnetworksanddeeplearning.com/. It is an excellent read.
    2. +
    3. Goodfellow, Bengio and Courville, Deep Learning at https://www.deeplearningbook.org/. Here we recommend chapters 6, 7 and 8
    4. +
    5. Raschka et al. at https://sebastianraschka.com/blog/2022/ml-pytorch-book.html. Here we recommend chapters 11, 12 and 13.
    6. +
    +

    Introduction to numerical projects

    + +

    Here follows a brief recipe and recommendation on how to write a report for each +project. +

    + +
      +
    • Give a short description of the nature of the problem and the eventual numerical methods you have used.
    • +
    • Describe the algorithm you have used and/or developed. Here you may find it convenient to use pseudocoding. In many cases you can describe the algorithm in the program itself.
    • +
    • Include the source code of your program. Comment your program properly.
    • +
    • If possible, try to find analytic solutions, or known limits in order to test your program when developing the code.
    • +
    • Include your results either in figure form or in a table. Remember to label your results. All tables and figures should have relevant captions and labels on the axes.
    • +
• Try to evaluate the reliability and numerical stability/precision of your results. If possible, include a qualitative and/or quantitative discussion of the numerical stability, possible loss of precision etc.
    • +
• Try to give an interpretation of your results in your answers to the problems.
    • +
    • Critique: if possible include your comments and reflections about the exercise, whether you felt you learnt something, ideas for improvements and other thoughts you've made when solving the exercise. We wish to keep this course at the interactive level and your comments can help us improve it.
    • +
    • Try to establish a practice where you log your work at the computerlab. You may find such a logbook very handy at later stages in your work, especially when you don't properly remember what a previous test version of your program did. Here you could also record the time spent on solving the exercise, various algorithms you may have tested or other topics which you feel worthy of mentioning.
    • +
    +

    Format for electronic delivery of report and programs

    + +

    The preferred format for the report is a PDF file. You can also use DOC or postscript formats or as an ipython notebook file. As programming language we prefer that you choose between C/C++, Fortran2008 or Python. The following prescription should be followed when preparing the report:

    • Use Canvas to hand in your projects, log in at https://www.uio.no/english/services/it/education/canvas/ with your normal UiO username and password.
    • Upload only the report file or the link to your GitHub/GitLab or similar type of repository! For the source code file(s) you have developed, please provide us with the link to your GitHub/GitLab or similar domain. The report file should include all of your discussions and a list of the codes you have developed. Do not include library files which are available at the course homepage, unless you have made specific changes to them.
    • In your GitHub/GitLab or similar repository, please include a folder which contains selected results. These can be in the form of output from your code for a selected set of runs and input parameters.

    Finally, we encourage you to collaborate. Optimal working groups consist of 2-3 students. You can then hand in a common report.

    © 1999-2025, Data Analysis and Machine Learning FYS-STK3155/FYS4155 (http://www.uio.no/studier/emner/matnat/fys/FYS3155/index-eng.html). Released under CC Attribution-NonCommercial 4.0 license
    + + + diff --git a/doc/Projects/2025/Project2/ipynb/.ipynb_checkpoints/Project2-checkpoint.ipynb b/doc/Projects/2025/Project2/ipynb/.ipynb_checkpoints/Project2-checkpoint.ipynb new file mode 100644 index 000000000..90ca0ae29 --- /dev/null +++ b/doc/Projects/2025/Project2/ipynb/.ipynb_checkpoints/Project2-checkpoint.ipynb @@ -0,0 +1,635 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "d724df6f", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "8c1bfdba", + "metadata": { + "editable": true + }, + "source": [ + "# Project 2 on Machine Learning, deadline November 10 (Midnight)\n", + "**[Data Analysis and Machine Learning FYS-STK3155/FYS4155](http://www.uio.no/studier/emner/matnat/fys/FYS3155/index-eng.html)**, University of Oslo, Norway\n", + "\n", + "Date: **October 14, 2025**" + ] + }, + { + "cell_type": "markdown", + "id": "42f6cef9", + "metadata": { + "editable": true + }, + "source": [ + "## Deliverables\n", + "\n", + "First, join a group in canvas with your group partners. Pick an avaliable group for Project 2 in the **People** page.\n", + "\n", + "In canvas, deliver as a group and include:\n", + "\n", + "* A PDF of your report which follows the guidelines covered below and in the week 39 exercises. Additional requirements include:\n", + "\n", + " * It should be around 5000 words, use the word counter in Overleaf for this. This often corresponds to 10-12 pages. References and appendices are excluded from the word count\n", + "\n", + " * It should include around 10-15 figures. You can include more figures in appendices and/or as supplemental material in your repository.\n", + "\n", + "* A comment linking to your github repository (or folder in one of your github repositories) for this project. The repository must include\n", + "\n", + "A PDF file of the report\n", + " * A folder named Code, where you put python files for your functions and notebooks for reproducing your results. Remember to use a seed for generating random data and for train-test splits when generating final results.\n", + "\n", + " * A README file with the name of the group members\n", + "\n", + " * a short description of the project\n", + "\n", + " * a description of how to install the required packages to run your code from a requirements.txt file or similar (such as a plain text description) names and descriptions of the various notebooks in the Code folder and the results they produce" + ] + }, + { + "cell_type": "markdown", + "id": "6b088eeb", + "metadata": { + "editable": true + }, + "source": [ + "### Preamble: Note on writing reports, using reference material, AI and other tools\n", + "\n", + "We want you to answer the three different projects by handing in\n", + "reports written like a standard scientific/technical report. The links\n", + "at\n", + "https://github.com/CompPhysics/MachineLearning/tree/master/doc/Projects/n", + "contain more information. There you can find examples of previous\n", + "reports, the projects themselves, how we grade reports etc. How to\n", + "write reports will also be discussed during the various lab\n", + "sessions. Please do ask us if you are in doubt.\n", + "\n", + "When using codes and material from other sources, you should refer to\n", + "these in the bibliography of your report, indicating wherefrom you for\n", + "example got the code, whether this is from the lecture notes,\n", + "softwares like Scikit-Learn, TensorFlow, PyTorch or other\n", + "sources. 
These sources should always be cited correctly. How to cite\n", + "some of the libraries is often indicated from their corresponding\n", + "GitHub sites or websites, see for example how to cite Scikit-Learn at\n", + "https://scikit-learn.org/dev/about.html./n", + "\n", + "We enocurage you to use tools like ChatGPT or similar in writing the\n", + "report. If you use for example ChatGPT, please do cite it properly and\n", + "include (if possible) your questions and answers as an addition to the\n", + "report. This can be uploaded to for example your website,\n", + "GitHub/GitLab or similar as supplemental material.\n", + "\n", + "If you would like to study other data sets, feel free to propose other\n", + "sets. What we have proposed here are mere suggestions from our\n", + "side. If you opt for another data set, consider using a set which has\n", + "been studied in the scientific literature. This makes it easier for\n", + "you to compare and analyze your results. Comparing with existing\n", + "results from the scientific literature is also an essential element of\n", + "the scientific discussion. The University of California at Irvine with\n", + "its Machine Learning repository at\n", + "https://archive.ics.uci.edu/ml/index.php%20is%20an%20excellent%20site%20to%20look/n", + "up for examples and inspiration. Kaggle.com is an equally interesting\n", + "site. Feel free to explore these sites." + ] + }, + { + "cell_type": "markdown", + "id": "1f51c6be", + "metadata": { + "editable": true + }, + "source": [ + "## Classification and Regression, writing our own neural network code\n", + "\n", + "The main aim of this project is to study both classification and\n", + "regression problems by developing our own \n", + "feed-forward neural network (FFNN) code. The exercises from week 41 and 42 (see and ) as well as the lecture material from the same weeks (see and ) should contain enough information for you to get started with writing your own code.\n", + "\n", + "We will also reuse our codes on gradient descent methods from project 1.\n", + "\n", + "The data sets that we propose here are (the default sets)\n", + "\n", + "* Regression (fitting a continuous function). In this part you will need to bring back your results from project 1 and compare these with what you get from your Neural Network code to be developed here. The data sets could be\n", + "\n", + " * The simple one-dimensional function Runge function from project 1, that is $f(x) = \\frac{1}{1+25x^2}$. We recommend using a simpler function when developing your neural network code for regression problems. Feel however free to discuss and study other functions, such as the two-dimensional Runge function $f(x,y)=\\left[(10x - 5)^2 + (10y - 5)^2 + 1 \\right]^{-1}$, or even more complicated two-dimensional functions (see the supplementary material of for an extensive list of two-dimensional functions). \n", + "\n", + "* Classification.\n", + "\n", + " * We will consider a multiclass classification problem given by the full MNIST data set. The full data set is at .\n", + "\n", + "We will start with a regression problem and we will reuse our codes on gradient descent methods from project 1." + ] + }, + { + "cell_type": "markdown", + "id": "5428a6da", + "metadata": { + "editable": true + }, + "source": [ + "### Part a): Analytical warm-up\n", + "\n", + "When using our gradient machinery from project 1, we will need the expressions for the cost/loss functions and their respective\n", + "gradients. 
The functions whose gradients we need are:\n", + "1. The mean-squared error (MSE) with and without the $L_1$ and $L_2$ norms (regression problems)\n", + "\n", + "2. The binary cross entropy (aka log loss) for binary classification problems with and without $L_1$ and $L_2$ norms\n", + "\n", + "3. The multiclass cross entropy cost/loss function (aka Softmax cross entropy or just Softmax loss function)\n", + "\n", + "Set up these three cost/loss functions and their respective derivatives and explain the various terms. In this project you will however only use the MSE and the Softmax cross entropy.\n", + "\n", + "We will test three activation functions for our neural network setup, these are the \n", + "1. The Sigmoid (aka **logit**) function,\n", + "\n", + "2. the RELU function and\n", + "\n", + "3. the Leaky RELU function\n", + "\n", + "Set up their expressions and their first derivatives.\n", + "You may consult the lecture notes (with codes and more) from week 42 at ." + ] + }, + { + "cell_type": "markdown", + "id": "d56ea8d6", + "metadata": { + "editable": true + }, + "source": [ + "### Reminder about the gradient machinery from project 1\n", + "\n", + "In the setup of a neural network code you will need your gradient descent codes from\n", + "project 1. For neural networks we will recommend using stochastic\n", + "gradient descent with either the RMSprop or the ADAM algorithms for\n", + "updating the learning rates. But you should feel free to try plain gradient descent as well.\n", + "\n", + "We recommend reading chapter 8 on optimization from the textbook of\n", + "Goodfellow, Bengio and Courville at\n", + ". This chapter contains many\n", + "useful insights and discussions on the optimization part of machine\n", + "learning. A useful reference on the back progagation algorithm is\n", + "Nielsen's book at . \n", + "\n", + "You will find the Python [Seaborn\n", + "package](https://seaborn.pydata.org/generated/seaborn.heatmap.html)\n", + "useful when plotting the results as function of the learning rate\n", + "$\\eta$ and the hyper-parameter $\\lambda$ ." + ] + }, + { + "cell_type": "markdown", + "id": "87464bce", + "metadata": { + "editable": true + }, + "source": [ + "### Part b): Writing your own Neural Network code\n", + "\n", + "Your aim now, and this is the central part of this project, is to\n", + "write your own FFNN code implementing the back\n", + "propagation algorithm discussed in the lecture slides from week 41 at and week 42 at .\n", + "\n", + "We will focus on a regression problem first, using the one-dimensional Runge function" + ] + }, + { + "cell_type": "markdown", + "id": "fc102ae5", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "f(x) = \\frac{1}{1+25x^2},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "cec503de", + "metadata": { + "editable": true + }, + "source": [ + "from project 1.\n", + "\n", + "Use only the mean-squared error as cost function (no regularization terms) and \n", + "write an FFNN code for a regression problem with a flexible number of hidden\n", + "layers and nodes using only the Sigmoid function as activation function for\n", + "the hidden layers. Initialize the weights using a normal\n", + "distribution. How would you initialize the biases? And which\n", + "activation function would you select for the final output layer?\n", + "And how would you set up your design/feature matrix? Hint: does it have to represent a polynomial approximation as you did in project 1? 
\n", + "\n", + "Train your network and compare the results with those from your OLS\n", + "regression code from project 1 using the one-dimensional Runge\n", + "function. When comparing your neural network code with the OLS\n", + "results from project 1, use the same data sets which gave you the best\n", + "MSE score. Moreover, use the polynomial order from project 1 that gave you the\n", + "best result. Compare these results with your neural network with one\n", + "and two hidden layers using $50$ and $100$ hidden nodes, respectively.\n", + "\n", + "Comment your results and give a critical discussion of the results\n", + "obtained with the OLS code from project 1 and your own neural network\n", + "code. Make an analysis of the learning rates employed to find the\n", + "optimal MSE score. Test both stochastic gradient descent\n", + "with RMSprop and ADAM and plain gradient descent with different\n", + "learning rates.\n", + "\n", + "You should, as you did in project 1, scale your data." + ] + }, + { + "cell_type": "markdown", + "id": "bbf4879f", + "metadata": { + "editable": true + }, + "source": [ + "### Part c): Testing against other software libraries\n", + "\n", + "You should test your results against a similar code using **Scikit-Learn** (see the examples in the above lecture notes from weeks 41 and 42) or **tensorflow/keras** or **Pytorch** (for Pytorch, see Raschka et al.'s text chapters 12 and 13). \n", + "\n", + "Furthermore, you should also test that your derivatives are correctly\n", + "calculated using automatic differentiation, using for example the\n", + "**Autograd** library or the **JAX** library. It is optional to implement\n", + "these libraries for the present project. In this project they serve as\n", + "useful tests of our derivatives." + ] + }, + { + "cell_type": "markdown", + "id": "307035d6", + "metadata": { + "editable": true + }, + "source": [ + "### Part d): Testing different activation functions and depths of the neural network\n", + "\n", + "You should also test different activation functions for the hidden\n", + "layers. Try out the Sigmoid, the RELU and the Leaky RELU functions and\n", + "discuss your results. Test your results as functions of the number of hidden layers and nodes. Do you see signs of overfitting?\n", + "It is optional in this project to perform a bias-variance trade-off analysis." + ] + }, + { + "cell_type": "markdown", + "id": "a6d69596", + "metadata": { + "editable": true + }, + "source": [ + "### Part e): Testing different norms\n", + "\n", + "Finally, still using the one-dimensional Runge function, add now the\n", + "hyperparameters $\\lambda$ with the $L_2$ and $L_1$ norms. Find the\n", + "optimal results for the hyperparameters $\\lambda$ and the learning\n", + "rates $\\eta$ and neural network architecture and compare the $L_2$ results with Ridge regression from\n", + "project 1 and the $L_1$ results with the Lasso calculations of project 1.\n", + "Use again the same data sets and the best results from project 1 in your comparisons." + ] + }, + { + "cell_type": "markdown", + "id": "b4073806", + "metadata": { + "editable": true + }, + "source": [ + "### Part f): Classification analysis using neural networks\n", + "\n", + "With a well-written code it should now be easy to change the\n", + "activation function for the output layer.\n", + "\n", + "Here we will change the cost function for our neural network code\n", + "developed in parts b), d) and e) in order to perform a classification\n", + "analysis. 
The classification problem we will study is the multiclass\n", + "MNIST problem, see the description of the full data set at\n", + ". We will use the Softmax cross entropy function discussed in a). \n", + "The MNIST data set discussed in the lecture notes from week 42 is a downscaled variant of the full dataset. \n", + "\n", + "Feel free to suggest other data sets. If you find the classic MNIST data set somewhat limited, feel free to try the \n", + "MNIST-Fashion data set at for example .\n", + "\n", + "To set up the data set, the following python programs may be useful" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "97f27c66", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "from sklearn.datasets import fetch_openml\n", + "\n", + "# Fetch the MNIST dataset\n", + "mnist = fetch_openml('mnist_784', version=1, as_frame=False, parser='auto')\n", + "\n", + "# Extract data (features) and target (labels)\n", + "X = mnist.data\n", + "y = mnist.target" + ] + }, + { + "cell_type": "markdown", + "id": "9525e347", + "metadata": { + "editable": true + }, + "source": [ + "You should consider scaling the data. The Pixel values in MNIST range from 0 to 255. Scaling them to a 0-1 range can improve the performance of some models. That is, you could implement the following scaling" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "a9919b5f", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "X = X / 255.0" + ] + }, + { + "cell_type": "markdown", + "id": "c794dffb", + "metadata": { + "editable": true + }, + "source": [ + "And then perform the standard train-test splitting" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "ea0aa772", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "from sklearn.model_selection import train_test_split\n", + "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)" + ] + }, + { + "cell_type": "markdown", + "id": "b960fb33", + "metadata": { + "editable": true + }, + "source": [ + "To measure the performance of our classification problem we will use the\n", + "so-called *accuracy* score. The accuracy is as you would expect just\n", + "the number of correctly guessed targets $t_i$ divided by the total\n", + "number of targets, that is" + ] + }, + { + "cell_type": "markdown", + "id": "47b8fa51", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\text{Accuracy} = \\frac{\\sum_{i=1}^n I(t_i = y_i)}{n} ,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "5e5a1100", + "metadata": { + "editable": true + }, + "source": [ + "where $I$ is the indicator function, $1$ if $t_i = y_i$ and $0$\n", + "otherwise if we have a binary classification problem. Here $t_i$\n", + "represents the target and $y_i$ the outputs of your FFNN code and $n$ is simply the number of targets $t_i$.\n", + "\n", + "Discuss your results and give a critical analysis of the various parameters, including hyper-parameters like the learning rates and the regularization parameter $\\lambda$, various activation functions, number of hidden layers and nodes and activation functions. 
\n", + "\n", + "Again, we strongly recommend that you compare your own neural Network\n", + "code for classification and pertinent results against a similar code using **Scikit-Learn** or **tensorflow/keras** or **pytorch**.\n", + "\n", + "If you have time, you can use the functionality of **scikit-learn** and compare your neural network results with those from Logistic regression. This is optional.\n", + "The weblink here compares logistic regression and FFNN using the so-called MNIST data set. You may find several useful hints and ideas from this article. Your neural network code can implement the equivalent of logistic regression by simply setting the number of hidden layers to zero. \n", + "\n", + "If you wish to compare with say Logisti Regression from **scikit-learn**, the following code uses the above data set" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "94699ffc", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "from sklearn.linear_model import LogisticRegression\n", + "# Initialize the model\n", + "model = LogisticRegression(solver='saga', multi_class='multinomial', max_iter=1000, random_state=42)\n", + "# Train the model\n", + "model.fit(X_train, y_train)\n", + "from sklearn.metrics import accuracy_score\n", + "# Make predictions on the test set\n", + "y_pred = model.predict(X_test)\n", + "# Calculate accuracy\n", + "accuracy = accuracy_score(y_test, y_pred)\n", + "print(f\"Model Accuracy: {accuracy:.4f}\")" + ] + }, + { + "cell_type": "markdown", + "id": "5a842d68", + "metadata": { + "editable": true + }, + "source": [ + "### Part g) Critical evaluation of the various algorithms\n", + "\n", + "After all these glorious calculations, you should now summarize the\n", + "various algorithms and come with a critical evaluation of their pros\n", + "and cons. Which algorithm works best for the regression case and which\n", + "is best for the classification case. These codes can also be part of\n", + "your final project 3, but now applied to other data sets." + ] + }, + { + "cell_type": "markdown", + "id": "b57aadc2", + "metadata": { + "editable": true + }, + "source": [ + "## Summary of methods to implement and analyze\n", + "\n", + "**Required Implementation:**\n", + "1. Reuse the regression code and results from project 1, these will act as a benchmark for seeing how suited a neural network is for this regression task.\n", + "\n", + "2. Implement a neural network with\n", + "\n", + " * A flexible number of layers\n", + "\n", + " * A flexible number of nodes in each layer\n", + "\n", + " * A changeable activation function in each layer (Sigmoid, ReLU, LeakyReLU, as well as Linear and Softmax)\n", + "\n", + " * A changeable cost function, which will be set to MSE for regression and cross-entropy for multiple-classification\n", + "\n", + " * An optional L1 or L2 norm of the weights and biases in the cost function (only used for computing gradients, not interpretable metrics)\n", + "\n", + "3. Implement the back-propagation algorithm to compute the gradient of your neural network\n", + "\n", + "4. Reuse the implementation of Plain and Stochastic Gradient Descent from Project 1 (and adapt the code to work with the your neural network)\n", + "\n", + " * With no optimization algorithm\n", + "\n", + " * With RMS Prop\n", + "\n", + " * With ADAM\n", + "\n", + "5. Implement scaling and train-test splitting of your data, preferably using sklearn\n", + "\n", + "6. 
Implement and compute metrics like the MSE and Accuracy" + ] + }, + { + "cell_type": "markdown", + "id": "ae2d8c77", + "metadata": { + "editable": true + }, + "source": [ + "### Required Analysis:\n", + "\n", + "1. Briefly show and argue for the advantages and disadvantages of the methods from Project 1.\n", + "\n", + "2. Explore and show the impact of changing the number of layers, nodes per layer, choice of activation function, and inclusion of L1 and L2 norms. Present only the most interesting results from this exploration. 2D Heatmaps will be good for this: Start with finding a well performing set of hyper-parameters, then change two at a time in a range that shows good and bad performance.\n", + "\n", + "3. Show and argue for the advantages and disadvantages of using a neural network for regression on your data\n", + "\n", + "4. Show and argue for the advantages and disadvantages of using a neural network for classification on your data\n", + "\n", + "5. Show and argue for the advantages and disadvantages of the different gradient methods and learning rates when training the neural network" + ] + }, + { + "cell_type": "markdown", + "id": "97736190", + "metadata": { + "editable": true + }, + "source": [ + "### Optional (Note that you should include at least two of these in the report):\n", + "\n", + "1. Implement Logistic Regression as simple classification model case (equivalent to a Neural Network with one layer)\n", + "\n", + "2. Compute the gradient of the neural network with autograd, to show that it gives the same result as your hand-written backpropagation.\n", + "\n", + "3. Compare your results with results from using a machine-learning library like pytorch (https://docs.pytorch.org/tutorials/beginner/basics/buildmodel_tutorial.html)\n", + "\n", + "4. Use a more complex classification dataset instead, like the fashion MNIST (see )\n", + "\n", + "5. Use a more complex regression dataset instead, like the two-dimensional Runge function $f(x,y)=\\left[(10x - 5)^2 + (10y - 5)^2 + 1 \\right]^{-1}$, or even more complicated two-dimensional functions (see the supplementary material of for an extensive list of two-dimensional functions). \n", + "\n", + "6. Compute and interpret a confusion matrix of your best classification model (see )" + ] + }, + { + "cell_type": "markdown", + "id": "8f4d4afc", + "metadata": { + "editable": true + }, + "source": [ + "## Background literature\n", + "\n", + "1. The text of Michael Nielsen is highly recommended, see Nielsen's book at . It is an excellent read.\n", + "\n", + "2. Goodfellow, Bengio and Courville, Deep Learning at . Here we recommend chapters 6, 7 and 8\n", + "\n", + "3. Raschka et al. at . Here we recommend chapters 11, 12 and 13." + ] + }, + { + "cell_type": "markdown", + "id": "404319bc", + "metadata": { + "editable": true + }, + "source": [ + "## Introduction to numerical projects\n", + "\n", + "Here follows a brief recipe and recommendation on how to write a report for each\n", + "project.\n", + "\n", + " * Give a short description of the nature of the problem and the eventual numerical methods you have used.\n", + "\n", + " * Describe the algorithm you have used and/or developed. Here you may find it convenient to use pseudocoding. In many cases you can describe the algorithm in the program itself.\n", + "\n", + " * Include the source code of your program. 
Comment your program properly.\n", + "\n", + " * If possible, try to find analytic solutions, or known limits in order to test your program when developing the code.\n", + "\n", + " * Include your results either in figure form or in a table. Remember to label your results. All tables and figures should have relevant captions and labels on the axes.\n", + "\n", + " * Try to evaluate the reliabilty and numerical stability/precision of your results. If possible, include a qualitative and/or quantitative discussion of the numerical stability, eventual loss of precision etc.\n", + "\n", + " * Try to give an interpretation of you results in your answers to the problems.\n", + "\n", + " * Critique: if possible include your comments and reflections about the exercise, whether you felt you learnt something, ideas for improvements and other thoughts you've made when solving the exercise. We wish to keep this course at the interactive level and your comments can help us improve it.\n", + "\n", + " * Try to establish a practice where you log your work at the computerlab. You may find such a logbook very handy at later stages in your work, especially when you don't properly remember what a previous test version of your program did. Here you could also record the time spent on solving the exercise, various algorithms you may have tested or other topics which you feel worthy of mentioning." + ] + }, + { + "cell_type": "markdown", + "id": "a23505fa", + "metadata": { + "editable": true + }, + "source": [ + "## Format for electronic delivery of report and programs\n", + "\n", + "The preferred format for the report is a PDF file. You can also use DOC or postscript formats or as an ipython notebook file. As programming language we prefer that you choose between C/C++, Fortran2008 or Python. The following prescription should be followed when preparing the report:\n", + "\n", + " * Use Canvas to hand in your projects, log in at with your normal UiO username and password.\n", + "\n", + " * Upload **only** the report file or the link to your GitHub/GitLab or similar typo of repos! For the source code file(s) you have developed please provide us with your link to your GitHub/GitLab or similar domain. The report file should include all of your discussions and a list of the codes you have developed. Do not include library files which are available at the course homepage, unless you have made specific changes to them.\n", + "\n", + " * In your GitHub/GitLab or similar repository, please include a folder which contains selected results. These can be in the form of output from your code for a selected set of runs and input parameters.\n", + "\n", + "Finally, \n", + "we encourage you to collaborate. Optimal working groups consist of \n", + "2-3 students. You can then hand in a common report." 
+ ] + } + ], + "metadata": {}, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/doc/Projects/2025/Project2/ipynb/Project2.ipynb b/doc/Projects/2025/Project2/ipynb/Project2.ipynb new file mode 100644 index 000000000..faf4aee16 --- /dev/null +++ b/doc/Projects/2025/Project2/ipynb/Project2.ipynb @@ -0,0 +1,635 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "96e577ca", + "metadata": { + "editable": true + }, + "source": [ + "\n", + "" + ] + }, + { + "cell_type": "markdown", + "id": "067c02b9", + "metadata": { + "editable": true + }, + "source": [ + "# Project 2 on Machine Learning, deadline November 10 (Midnight)\n", + "**[Data Analysis and Machine Learning FYS-STK3155/FYS4155](http://www.uio.no/studier/emner/matnat/fys/FYS3155/index-eng.html)**, University of Oslo, Norway\n", + "\n", + "Date: **October 14, 2025**" + ] + }, + { + "cell_type": "markdown", + "id": "01f9fedd", + "metadata": { + "editable": true + }, + "source": [ + "## Deliverables\n", + "\n", + "First, join a group in canvas with your group partners. Pick an avaliable group for Project 2 in the **People** page.\n", + "\n", + "In canvas, deliver as a group and include:\n", + "\n", + "* A PDF of your report which follows the guidelines covered below and in the week 39 exercises. Additional requirements include:\n", + "\n", + " * It should be around 5000 words, use the word counter in Overleaf for this. This often corresponds to 10-12 pages. References and appendices are excluded from the word count\n", + "\n", + " * It should include around 10-15 figures. You can include more figures in appendices and/or as supplemental material in your repository.\n", + "\n", + "* A comment linking to your github repository (or folder in one of your github repositories) for this project. The repository must include\n", + "\n", + "A PDF file of the report\n", + " * A folder named Code, where you put python files for your functions and notebooks for reproducing your results. Remember to use a seed for generating random data and for train-test splits when generating final results.\n", + "\n", + " * A README file with the name of the group members\n", + "\n", + " * a short description of the project\n", + "\n", + " * a description of how to install the required packages to run your code from a requirements.txt file or similar (such as a plain text description) names and descriptions of the various notebooks in the Code folder and the results they produce" + ] + }, + { + "cell_type": "markdown", + "id": "9f8e4871", + "metadata": { + "editable": true + }, + "source": [ + "### Preamble: Note on writing reports, using reference material, AI and other tools\n", + "\n", + "We want you to answer the three different projects by handing in\n", + "reports written like a standard scientific/technical report. The links\n", + "at\n", + "https://github.com/CompPhysics/MachineLearning/tree/master/doc/Projects/n", + "contain more information. There you can find examples of previous\n", + "reports, the projects themselves, how we grade reports etc. How to\n", + "write reports will also be discussed during the various lab\n", + "sessions. Please do ask us if you are in doubt.\n", + "\n", + "When using codes and material from other sources, you should refer to\n", + "these in the bibliography of your report, indicating wherefrom you for\n", + "example got the code, whether this is from the lecture notes,\n", + "softwares like Scikit-Learn, TensorFlow, PyTorch or other\n", + "sources. These sources should always be cited correctly. 
How to cite\n", + "some of the libraries is often indicated from their corresponding\n", + "GitHub sites or websites, see for example how to cite Scikit-Learn at\n", + "https://scikit-learn.org/dev/about.html./n", + "\n", + "We enocurage you to use tools like ChatGPT or similar in writing the\n", + "report. If you use for example ChatGPT, please do cite it properly and\n", + "include (if possible) your questions and answers as an addition to the\n", + "report. This can be uploaded to for example your website,\n", + "GitHub/GitLab or similar as supplemental material.\n", + "\n", + "If you would like to study other data sets, feel free to propose other\n", + "sets. What we have proposed here are mere suggestions from our\n", + "side. If you opt for another data set, consider using a set which has\n", + "been studied in the scientific literature. This makes it easier for\n", + "you to compare and analyze your results. Comparing with existing\n", + "results from the scientific literature is also an essential element of\n", + "the scientific discussion. The University of California at Irvine with\n", + "its Machine Learning repository at\n", + "https://archive.ics.uci.edu/ml/index.php%20is%20an%20excellent%20site%20to%20look/n", + "up for examples and inspiration. Kaggle.com is an equally interesting\n", + "site. Feel free to explore these sites." + ] + }, + { + "cell_type": "markdown", + "id": "460cc6ea", + "metadata": { + "editable": true + }, + "source": [ + "## Classification and Regression, writing our own neural network code\n", + "\n", + "The main aim of this project is to study both classification and\n", + "regression problems by developing our own \n", + "feed-forward neural network (FFNN) code. The exercises from week 41 and 42 (see and ) as well as the lecture material from the same weeks (see and ) should contain enough information for you to get started with writing your own code.\n", + "\n", + "We will also reuse our codes on gradient descent methods from project 1.\n", + "\n", + "The data sets that we propose here are (the default sets)\n", + "\n", + "* Regression (fitting a continuous function). In this part you will need to bring back your results from project 1 and compare these with what you get from your Neural Network code to be developed here. The data sets could be\n", + "\n", + " * The simple one-dimensional function Runge function from project 1, that is $f(x) = \\frac{1}{1+25x^2}$. We recommend using a simpler function when developing your neural network code for regression problems. Feel however free to discuss and study other functions, such as the two-dimensional Runge function $f(x,y)=\\left[(10x - 5)^2 + (10y - 5)^2 + 1 \\right]^{-1}$, or even more complicated two-dimensional functions (see the supplementary material of for an extensive list of two-dimensional functions). \n", + "\n", + "* Classification.\n", + "\n", + " * We will consider a multiclass classification problem given by the full MNIST data set. The full data set is at .\n", + "\n", + "We will start with a regression problem and we will reuse our codes on gradient descent methods from project 1." + ] + }, + { + "cell_type": "markdown", + "id": "d62a07ef", + "metadata": { + "editable": true + }, + "source": [ + "### Part a): Analytical warm-up\n", + "\n", + "When using our gradient machinery from project 1, we will need the expressions for the cost/loss functions and their respective\n", + "gradients. The functions whose gradients we need are:\n", + "1. 
The mean-squared error (MSE) with and without the $L_1$ and $L_2$ norms (regression problems)\n", + "\n", + "2. The binary cross entropy (aka log loss) for binary classification problems with and without $L_1$ and $L_2$ norms\n", + "\n", + "3. The multiclass cross entropy cost/loss function (aka Softmax cross entropy or just Softmax loss function)\n", + "\n", + "Set up these three cost/loss functions and their respective derivatives and explain the various terms. In this project you will however only use the MSE and the Softmax cross entropy.\n", + "\n", + "We will test three activation functions for our neural network setup, these are the \n", + "1. The Sigmoid (aka **logit**) function,\n", + "\n", + "2. the RELU function and\n", + "\n", + "3. the Leaky RELU function\n", + "\n", + "Set up their expressions and their first derivatives.\n", + "You may consult the lecture notes (with codes and more) from week 42 at ." + ] + }, + { + "cell_type": "markdown", + "id": "9cd8b8ac", + "metadata": { + "editable": true + }, + "source": [ + "### Reminder about the gradient machinery from project 1\n", + "\n", + "In the setup of a neural network code you will need your gradient descent codes from\n", + "project 1. For neural networks we will recommend using stochastic\n", + "gradient descent with either the RMSprop or the ADAM algorithms for\n", + "updating the learning rates. But you should feel free to try plain gradient descent as well.\n", + "\n", + "We recommend reading chapter 8 on optimization from the textbook of\n", + "Goodfellow, Bengio and Courville at\n", + ". This chapter contains many\n", + "useful insights and discussions on the optimization part of machine\n", + "learning. A useful reference on the back progagation algorithm is\n", + "Nielsen's book at . \n", + "\n", + "You will find the Python [Seaborn\n", + "package](https://seaborn.pydata.org/generated/seaborn.heatmap.html)\n", + "useful when plotting the results as function of the learning rate\n", + "$\\eta$ and the hyper-parameter $\\lambda$ ." + ] + }, + { + "cell_type": "markdown", + "id": "5931b155", + "metadata": { + "editable": true + }, + "source": [ + "### Part b): Writing your own Neural Network code\n", + "\n", + "Your aim now, and this is the central part of this project, is to\n", + "write your own FFNN code implementing the back\n", + "propagation algorithm discussed in the lecture slides from week 41 at and week 42 at .\n", + "\n", + "We will focus on a regression problem first, using the one-dimensional Runge function" + ] + }, + { + "cell_type": "markdown", + "id": "b273fc8a", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "f(x) = \\frac{1}{1+25x^2},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "e13db1ec", + "metadata": { + "editable": true + }, + "source": [ + "from project 1.\n", + "\n", + "Use only the mean-squared error as cost function (no regularization terms) and \n", + "write an FFNN code for a regression problem with a flexible number of hidden\n", + "layers and nodes using only the Sigmoid function as activation function for\n", + "the hidden layers. Initialize the weights using a normal\n", + "distribution. How would you initialize the biases? And which\n", + "activation function would you select for the final output layer?\n", + "And how would you set up your design/feature matrix? Hint: does it have to represent a polynomial approximation as you did in project 1? 
\n", + "\n", + "Train your network and compare the results with those from your OLS\n", + "regression code from project 1 using the one-dimensional Runge\n", + "function. When comparing your neural network code with the OLS\n", + "results from project 1, use the same data sets which gave you the best\n", + "MSE score. Moreover, use the polynomial order from project 1 that gave you the\n", + "best result. Compare these results with your neural network with one\n", + "and two hidden layers using $50$ and $100$ hidden nodes, respectively.\n", + "\n", + "Comment your results and give a critical discussion of the results\n", + "obtained with the OLS code from project 1 and your own neural network\n", + "code. Make an analysis of the learning rates employed to find the\n", + "optimal MSE score. Test both stochastic gradient descent\n", + "with RMSprop and ADAM and plain gradient descent with different\n", + "learning rates.\n", + "\n", + "You should, as you did in project 1, scale your data." + ] + }, + { + "cell_type": "markdown", + "id": "4f864e31", + "metadata": { + "editable": true + }, + "source": [ + "### Part c): Testing against other software libraries\n", + "\n", + "You should test your results against a similar code using **Scikit-Learn** (see the examples in the above lecture notes from weeks 41 and 42) or **tensorflow/keras** or **Pytorch** (for Pytorch, see Raschka et al.'s text chapters 12 and 13). \n", + "\n", + "Furthermore, you should also test that your derivatives are correctly\n", + "calculated using automatic differentiation, using for example the\n", + "**Autograd** library or the **JAX** library. It is optional to implement\n", + "these libraries for the present project. In this project they serve as\n", + "useful tests of our derivatives." + ] + }, + { + "cell_type": "markdown", + "id": "c9faeafd", + "metadata": { + "editable": true + }, + "source": [ + "### Part d): Testing different activation functions and depths of the neural network\n", + "\n", + "You should also test different activation functions for the hidden\n", + "layers. Try out the Sigmoid, the RELU and the Leaky RELU functions and\n", + "discuss your results. Test your results as functions of the number of hidden layers and nodes. Do you see signs of overfitting?\n", + "It is optional in this project to perform a bias-variance trade-off analysis." + ] + }, + { + "cell_type": "markdown", + "id": "d865c22b", + "metadata": { + "editable": true + }, + "source": [ + "### Part e): Testing different norms\n", + "\n", + "Finally, still using the one-dimensional Runge function, add now the\n", + "hyperparameters $\\lambda$ with the $L_2$ and $L_1$ norms. Find the\n", + "optimal results for the hyperparameters $\\lambda$ and the learning\n", + "rates $\\eta$ and neural network architecture and compare the $L_2$ results with Ridge regression from\n", + "project 1 and the $L_1$ results with the Lasso calculations of project 1.\n", + "Use again the same data sets and the best results from project 1 in your comparisons." + ] + }, + { + "cell_type": "markdown", + "id": "5270af8f", + "metadata": { + "editable": true + }, + "source": [ + "### Part f): Classification analysis using neural networks\n", + "\n", + "With a well-written code it should now be easy to change the\n", + "activation function for the output layer.\n", + "\n", + "Here we will change the cost function for our neural network code\n", + "developed in parts b), d) and e) in order to perform a classification\n", + "analysis. 
The classification problem we will study is the multiclass\n", + "MNIST problem, see the description of the full data set at\n", + ". We will use the Softmax cross entropy function discussed in a). \n", + "The MNIST data set discussed in the lecture notes from week 42 is a downscaled variant of the full dataset. \n", + "\n", + "Feel free to suggest other data sets. If you find the classic MNIST data set somewhat limited, feel free to try the \n", + "MNIST-Fashion data set at for example .\n", + "\n", + "To set up the data set, the following python programs may be useful" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "4e0e1fea", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "from sklearn.datasets import fetch_openml\n", + "\n", + "# Fetch the MNIST dataset\n", + "mnist = fetch_openml('mnist_784', version=1, as_frame=False, parser='auto')\n", + "\n", + "# Extract data (features) and target (labels)\n", + "X = mnist.data\n", + "y = mnist.target" + ] + }, + { + "cell_type": "markdown", + "id": "8fe85677", + "metadata": { + "editable": true + }, + "source": [ + "You should consider scaling the data. The Pixel values in MNIST range from 0 to 255. Scaling them to a 0-1 range can improve the performance of some models. That is, you could implement the following scaling" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "b28318b2", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "X = X / 255.0" + ] + }, + { + "cell_type": "markdown", + "id": "97e02c71", + "metadata": { + "editable": true + }, + "source": [ + "And then perform the standard train-test splitting" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "88af355c", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "from sklearn.model_selection import train_test_split\n", + "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)" + ] + }, + { + "cell_type": "markdown", + "id": "d1f8f0ed", + "metadata": { + "editable": true + }, + "source": [ + "To measure the performance of our classification problem we will use the\n", + "so-called *accuracy* score. The accuracy is as you would expect just\n", + "the number of correctly guessed targets $t_i$ divided by the total\n", + "number of targets, that is" + ] + }, + { + "cell_type": "markdown", + "id": "554b3a48", + "metadata": { + "editable": true + }, + "source": [ + "$$\n", + "\\text{Accuracy} = \\frac{\\sum_{i=1}^n I(t_i = y_i)}{n} ,\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "77bfdd5c", + "metadata": { + "editable": true + }, + "source": [ + "where $I$ is the indicator function, $1$ if $t_i = y_i$ and $0$\n", + "otherwise if we have a binary classification problem. Here $t_i$\n", + "represents the target and $y_i$ the outputs of your FFNN code and $n$ is simply the number of targets $t_i$.\n", + "\n", + "Discuss your results and give a critical analysis of the various parameters, including hyper-parameters like the learning rates and the regularization parameter $\\lambda$, various activation functions, number of hidden layers and nodes and activation functions. 
\n", + "\n", + "Again, we strongly recommend that you compare your own neural Network\n", + "code for classification and pertinent results against a similar code using **Scikit-Learn** or **tensorflow/keras** or **pytorch**.\n", + "\n", + "If you have time, you can use the functionality of **scikit-learn** and compare your neural network results with those from Logistic regression. This is optional.\n", + "The weblink here compares logistic regression and FFNN using the so-called MNIST data set. You may find several useful hints and ideas from this article. Your neural network code can implement the equivalent of logistic regression by simply setting the number of hidden layers to zero and keeping just the input and the output layers. \n", + "\n", + "If you wish to compare with say Logisti Regression from **scikit-learn**, the following code uses the above data set" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "eaa9e72e", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "from sklearn.linear_model import LogisticRegression\n", + "# Initialize the model\n", + "model = LogisticRegression(solver='saga', multi_class='multinomial', max_iter=1000, random_state=42)\n", + "# Train the model\n", + "model.fit(X_train, y_train)\n", + "from sklearn.metrics import accuracy_score\n", + "# Make predictions on the test set\n", + "y_pred = model.predict(X_test)\n", + "# Calculate accuracy\n", + "accuracy = accuracy_score(y_test, y_pred)\n", + "print(f\"Model Accuracy: {accuracy:.4f}\")" + ] + }, + { + "cell_type": "markdown", + "id": "c7ba883e", + "metadata": { + "editable": true + }, + "source": [ + "### Part g) Critical evaluation of the various algorithms\n", + "\n", + "After all these glorious calculations, you should now summarize the\n", + "various algorithms and come with a critical evaluation of their pros\n", + "and cons. Which algorithm works best for the regression case and which\n", + "is best for the classification case. These codes can also be part of\n", + "your final project 3, but now applied to other data sets." + ] + }, + { + "cell_type": "markdown", + "id": "595be693", + "metadata": { + "editable": true + }, + "source": [ + "## Summary of methods to implement and analyze\n", + "\n", + "**Required Implementation:**\n", + "1. Reuse the regression code and results from project 1, these will act as a benchmark for seeing how suited a neural network is for this regression task.\n", + "\n", + "2. Implement a neural network with\n", + "\n", + " * A flexible number of layers\n", + "\n", + " * A flexible number of nodes in each layer\n", + "\n", + " * A changeable activation function in each layer (Sigmoid, ReLU, LeakyReLU, as well as Linear and Softmax)\n", + "\n", + " * A changeable cost function, which will be set to MSE for regression and cross-entropy for multiple-classification\n", + "\n", + " * An optional L1 or L2 norm of the weights and biases in the cost function (only used for computing gradients, not interpretable metrics)\n", + "\n", + "3. Implement the back-propagation algorithm to compute the gradient of your neural network\n", + "\n", + "4. Reuse the implementation of Plain and Stochastic Gradient Descent from Project 1 (and adapt the code to work with the your neural network)\n", + "\n", + " * With no optimization algorithm\n", + "\n", + " * With RMS Prop\n", + "\n", + " * With ADAM\n", + "\n", + "5. Implement scaling and train-test splitting of your data, preferably using sklearn\n", + "\n", + "6. 
Implement and compute metrics like the MSE and Accuracy" + ] + }, + { + "cell_type": "markdown", + "id": "35138b41", + "metadata": { + "editable": true + }, + "source": [ + "### Required Analysis:\n", + "\n", + "1. Briefly show and argue for the advantages and disadvantages of the methods from Project 1.\n", + "\n", + "2. Explore and show the impact of changing the number of layers, nodes per layer, choice of activation function, and inclusion of L1 and L2 norms. Present only the most interesting results from this exploration. 2D Heatmaps will be good for this: Start with finding a well performing set of hyper-parameters, then change two at a time in a range that shows good and bad performance.\n", + "\n", + "3. Show and argue for the advantages and disadvantages of using a neural network for regression on your data\n", + "\n", + "4. Show and argue for the advantages and disadvantages of using a neural network for classification on your data\n", + "\n", + "5. Show and argue for the advantages and disadvantages of the different gradient methods and learning rates when training the neural network" + ] + }, + { + "cell_type": "markdown", + "id": "b18bea03", + "metadata": { + "editable": true + }, + "source": [ + "### Optional (Note that you should include at least two of these in the report):\n", + "\n", + "1. Implement Logistic Regression as simple classification model case (equivalent to a Neural Network with just the output layer)\n", + "\n", + "2. Compute the gradient of the neural network with autograd, to show that it gives the same result as your hand-written backpropagation.\n", + "\n", + "3. Compare your results with results from using a machine-learning library like pytorch (https://docs.pytorch.org/tutorials/beginner/basics/buildmodel_tutorial.html)\n", + "\n", + "4. Use a more complex classification dataset instead, like the fashion MNIST (see )\n", + "\n", + "5. Use a more complex regression dataset instead, like the two-dimensional Runge function $f(x,y)=\\left[(10x - 5)^2 + (10y - 5)^2 + 1 \\right]^{-1}$, or even more complicated two-dimensional functions (see the supplementary material of for an extensive list of two-dimensional functions). \n", + "\n", + "6. Compute and interpret a confusion matrix of your best classification model (see )" + ] + }, + { + "cell_type": "markdown", + "id": "580d8424", + "metadata": { + "editable": true + }, + "source": [ + "## Background literature\n", + "\n", + "1. The text of Michael Nielsen is highly recommended, see Nielsen's book at . It is an excellent read.\n", + "\n", + "2. Goodfellow, Bengio and Courville, Deep Learning at . Here we recommend chapters 6, 7 and 8\n", + "\n", + "3. Raschka et al. at . Here we recommend chapters 11, 12 and 13." + ] + }, + { + "cell_type": "markdown", + "id": "96f5c67e", + "metadata": { + "editable": true + }, + "source": [ + "## Introduction to numerical projects\n", + "\n", + "Here follows a brief recipe and recommendation on how to write a report for each\n", + "project.\n", + "\n", + " * Give a short description of the nature of the problem and the eventual numerical methods you have used.\n", + "\n", + " * Describe the algorithm you have used and/or developed. Here you may find it convenient to use pseudocoding. In many cases you can describe the algorithm in the program itself.\n", + "\n", + " * Include the source code of your program. 
Comment your program properly.\n", + "\n", + " * If possible, try to find analytic solutions, or known limits in order to test your program when developing the code.\n", + "\n", + " * Include your results either in figure form or in a table. Remember to label your results. All tables and figures should have relevant captions and labels on the axes.\n", + "\n", + " * Try to evaluate the reliabilty and numerical stability/precision of your results. If possible, include a qualitative and/or quantitative discussion of the numerical stability, eventual loss of precision etc.\n", + "\n", + " * Try to give an interpretation of you results in your answers to the problems.\n", + "\n", + " * Critique: if possible include your comments and reflections about the exercise, whether you felt you learnt something, ideas for improvements and other thoughts you've made when solving the exercise. We wish to keep this course at the interactive level and your comments can help us improve it.\n", + "\n", + " * Try to establish a practice where you log your work at the computerlab. You may find such a logbook very handy at later stages in your work, especially when you don't properly remember what a previous test version of your program did. Here you could also record the time spent on solving the exercise, various algorithms you may have tested or other topics which you feel worthy of mentioning." + ] + }, + { + "cell_type": "markdown", + "id": "d1bc28ba", + "metadata": { + "editable": true + }, + "source": [ + "## Format for electronic delivery of report and programs\n", + "\n", + "The preferred format for the report is a PDF file. You can also use DOC or postscript formats or as an ipython notebook file. As programming language we prefer that you choose between C/C++, Fortran2008 or Python. The following prescription should be followed when preparing the report:\n", + "\n", + " * Use Canvas to hand in your projects, log in at with your normal UiO username and password.\n", + "\n", + " * Upload **only** the report file or the link to your GitHub/GitLab or similar typo of repos! For the source code file(s) you have developed please provide us with your link to your GitHub/GitLab or similar domain. The report file should include all of your discussions and a list of the codes you have developed. Do not include library files which are available at the course homepage, unless you have made specific changes to them.\n", + "\n", + " * In your GitHub/GitLab or similar repository, please include a folder which contains selected results. These can be in the form of output from your code for a selected set of runs and input parameters.\n", + "\n", + "Finally, \n", + "we encourage you to collaborate. Optimal working groups consist of \n", + "2-3 students. You can then hand in a common report." 
+ ] + } + ], + "metadata": {}, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/doc/Projects/2025/Project2/ipynb/ipynb-Project2-src.tar.gz b/doc/Projects/2025/Project2/ipynb/ipynb-Project2-src.tar.gz new file mode 100644 index 000000000..d9ea3457e Binary files /dev/null and b/doc/Projects/2025/Project2/ipynb/ipynb-Project2-src.tar.gz differ diff --git a/doc/Projects/2025/Project2/pdf/Project2.p.tex b/doc/Projects/2025/Project2/pdf/Project2.p.tex new file mode 100644 index 000000000..b2d52b1bb --- /dev/null +++ b/doc/Projects/2025/Project2/pdf/Project2.p.tex @@ -0,0 +1,614 @@ +%% +%% Automatically generated file from DocOnce source +%% (https://github.com/doconce/doconce/) +%% doconce format latex Project2.do.txt --print_latex_style=trac --latex_admon=paragraph +%% +% #ifdef PTEX2TEX_EXPLANATION +%% +%% The file follows the ptex2tex extended LaTeX format, see +%% ptex2tex: https://code.google.com/p/ptex2tex/ +%% +%% Run +%% ptex2tex myfile +%% or +%% doconce ptex2tex myfile +%% +%% to turn myfile.p.tex into an ordinary LaTeX file myfile.tex. +%% (The ptex2tex program: https://code.google.com/p/ptex2tex) +%% Many preprocess options can be added to ptex2tex or doconce ptex2tex +%% +%% ptex2tex -DMINTED myfile +%% doconce ptex2tex myfile envir=minted +%% +%% ptex2tex will typeset code environments according to a global or local +%% .ptex2tex.cfg configure file. doconce ptex2tex will typeset code +%% according to options on the command line (just type doconce ptex2tex to +%% see examples). If doconce ptex2tex has envir=minted, it enables the +%% minted style without needing -DMINTED. +% #endif + +% #define PREAMBLE + +% #ifdef PREAMBLE +%-------------------- begin preamble ---------------------- + +\documentclass[% +oneside, % oneside: electronic viewing, twoside: printing +final, % draft: marks overfull hboxes, figures with paths +10pt]{article} + +\listfiles % print all files needed to compile this document + +\usepackage{relsize,makeidx,color,setspace,amsmath,amsfonts,amssymb} +\usepackage[table]{xcolor} +\usepackage{bm,ltablex,microtype} + +\usepackage[pdftex]{graphicx} + +\usepackage{ptex2tex} +% #ifdef MINTED +\usepackage{minted} +\usemintedstyle{default} +% #endif + +\usepackage[T1]{fontenc} +%\usepackage[latin1]{inputenc} +\usepackage{ucs} +\usepackage[utf8x]{inputenc} + +\usepackage{lmodern} % Latin Modern fonts derived from Computer Modern + +% Hyperlinks in PDF: +\definecolor{linkcolor}{rgb}{0,0,0.4} +\usepackage{hyperref} +\hypersetup{ + breaklinks=true, + colorlinks=true, + linkcolor=linkcolor, + urlcolor=linkcolor, + citecolor=black, + filecolor=black, + %filecolor=blue, + pdfmenubar=true, + pdftoolbar=true, + bookmarksdepth=3 % Uncomment (and tweak) for PDF bookmarks with more levels than the TOC + } +%\hyperbaseurl{} % hyperlinks are relative to this root + +\setcounter{tocdepth}{2} % levels in table of contents + +% --- fancyhdr package for fancy headers --- +\usepackage{fancyhdr} +\fancyhf{} % sets both header and footer to nothing +\renewcommand{\headrulewidth}{0pt} +\fancyfoot[LE,RO]{\thepage} +% Ensure copyright on titlepage (article style) and chapter pages (book style) +\fancypagestyle{plain}{ + \fancyhf{} + \fancyfoot[C]{{\footnotesize \copyright\ 1999-2025, "Data Analysis and Machine Learning FYS-STK3155/FYS4155":"http://www.uio.no/studier/emner/matnat/fys/FYS3155/index-eng.html". 
Released under CC Attribution-NonCommercial 4.0 license}} +% \renewcommand{\footrulewidth}{0mm} + \renewcommand{\headrulewidth}{0mm} +} +% Ensure copyright on titlepages with \thispagestyle{empty} +\fancypagestyle{empty}{ + \fancyhf{} + \fancyfoot[C]{{\footnotesize \copyright\ 1999-2025, "Data Analysis and Machine Learning FYS-STK3155/FYS4155":"http://www.uio.no/studier/emner/matnat/fys/FYS3155/index-eng.html". Released under CC Attribution-NonCommercial 4.0 license}} + \renewcommand{\footrulewidth}{0mm} + \renewcommand{\headrulewidth}{0mm} +} + +\pagestyle{fancy} + + +% prevent orhpans and widows +\clubpenalty = 10000 +\widowpenalty = 10000 + +% --- end of standard preamble for documents --- + + +% insert custom LaTeX commands... + +\raggedbottom +\makeindex +\usepackage[totoc]{idxlayout} % for index in the toc +\usepackage[nottoc]{tocbibind} % for references/bibliography in the toc + +%-------------------- end preamble ---------------------- + +\begin{document} + +% matching end for #ifdef PREAMBLE +% #endif + +\newcommand{\exercisesection}[1]{\subsection*{#1}} + + +% ------------------- main content ---------------------- + + + +% ----------------- title ------------------------- + +\thispagestyle{empty} + +\begin{center} +{\LARGE\bf +\begin{spacing}{1.25} +Project 2 on Machine Learning, deadline November 10 (Midnight) +\end{spacing} +} +\end{center} + +% ----------------- author(s) ------------------------- + +\begin{center} +{\bf \href{{http://www.uio.no/studier/emner/matnat/fys/FYS3155/index-eng.html}}{Data Analysis and Machine Learning FYS-STK3155/FYS4155}} +\end{center} + + \begin{center} +% List of all institutions: +\centerline{{\small University of Oslo, Norway}} +\end{center} + +% ----------------- end author(s) ------------------------- + +% --- begin date --- +\begin{center} +October 14, 2025 +\end{center} +% --- end date --- + +\vspace{1cm} + + +\subsection{Deliverables} + +First, join a group in canvas with your group partners. Pick an avaliable group for Project 2 in the \textbf{People} page. + +In canvas, deliver as a group and include: + +\begin{itemize} +\item A PDF of your report which follows the guidelines covered below and in the week 39 exercises. Additional requirements include: +\begin{itemize} + + \item It should be around 5000 words, use the word counter in Overleaf for this. This often corresponds to 10-12 pages. References and appendices are excluded from the word count + + \item It should include around 10-15 figures. You can include more figures in appendices and/or as supplemental material in your repository. + +\end{itemize} + +\noindent +\item A comment linking to your github repository (or folder in one of your github repositories) for this project. The repository must include +\end{itemize} + +\noindent +A PDF file of the report +\begin{itemize} + \item A folder named Code, where you put python files for your functions and notebooks for reproducing your results. Remember to use a seed for generating random data and for train-test splits when generating final results. 
+ + \item A README file with the name of the group members + + \item a short description of the project + + \item a description of how to install the required packages to run your code from a requirements.txt file or similar (such as a plain text description) names and descriptions of the various notebooks in the Code folder and the results they produce +\end{itemize} + +\noindent +\paragraph{Preamble: Note on writing reports, using reference material, AI and other tools.} +We want you to answer the three different projects by handing in +reports written like a standard scientific/technical report. The links +at +https://github.com/CompPhysics/MachineLearning/tree/master/doc/Projects +contain more information. There you can find examples of previous +reports, the projects themselves, how we grade reports etc. How to +write reports will also be discussed during the various lab +sessions. Please do ask us if you are in doubt. + +When using codes and material from other sources, you should refer to +these in the bibliography of your report, indicating wherefrom you for +example got the code, whether this is from the lecture notes, +softwares like Scikit-Learn, TensorFlow, PyTorch or other +sources. These sources should always be cited correctly. How to cite +some of the libraries is often indicated from their corresponding +GitHub sites or websites, see for example how to cite Scikit-Learn at +https://scikit-learn.org/dev/about.html. + +We enocurage you to use tools like ChatGPT or similar in writing the +report. If you use for example ChatGPT, please do cite it properly and +include (if possible) your questions and answers as an addition to the +report. This can be uploaded to for example your website, +GitHub/GitLab or similar as supplemental material. + +If you would like to study other data sets, feel free to propose other +sets. What we have proposed here are mere suggestions from our +side. If you opt for another data set, consider using a set which has +been studied in the scientific literature. This makes it easier for +you to compare and analyze your results. Comparing with existing +results from the scientific literature is also an essential element of +the scientific discussion. The University of California at Irvine with +its Machine Learning repository at +https://archive.ics.uci.edu/ml/index.php is an excellent site to look +up for examples and inspiration. Kaggle.com is an equally interesting +site. Feel free to explore these sites. + +\subsection{Classification and Regression, writing our own neural network code} + +The main aim of this project is to study both classification and +regression problems by developing our own +feed-forward neural network (FFNN) code. 
The exercises from week 41 and 42 (see \href{{https://compphysics.github.io/MachineLearning/doc/LectureNotes/_build/html/exercisesweek41.html}}{\nolinkurl{https://compphysics.github.io/MachineLearning/doc/LectureNotes/_build/html/exercisesweek41.html}} and \href{{https://compphysics.github.io/MachineLearning/doc/LectureNotes/_build/html/exercisesweek42.html}}{\nolinkurl{https://compphysics.github.io/MachineLearning/doc/LectureNotes/_build/html/exercisesweek42.html}}) as well as the lecture material from the same weeks (see \href{{https://compphysics.github.io/MachineLearning/doc/LectureNotes/_build/html/week41.html}}{\nolinkurl{https://compphysics.github.io/MachineLearning/doc/LectureNotes/_build/html/week41.html}} and \href{{https://compphysics.github.io/MachineLearning/doc/LectureNotes/_build/html/week42.html}}{\nolinkurl{https://compphysics.github.io/MachineLearning/doc/LectureNotes/_build/html/week42.html}}) should contain enough information for you to get started with writing your own code. + +We will also reuse our codes on gradient descent methods from project 1. + +The data sets that we propose here are (the default sets) + +\begin{itemize} +\item Regression (fitting a continuous function). In this part you will need to bring back your results from project 1 and compare these with what you get from your Neural Network code to be developed here. The data sets could be +\begin{itemize} + + \item The simple one-dimensional function Runge function from project 1, that is $f(x) = \frac{1}{1+25x^2}$. We recommend using a simpler function when developing your neural network code for regression problems. Feel however free to discuss and study other functions, such as the two-dimensional Runge function $f(x,y)=\left[(10x - 5)^2 + (10y - 5)^2 + 1 \right]^{-1}$, or even more complicated two-dimensional functions (see the supplementary material of \href{{https://www.nature.com/articles/s41467-025-61362-4}}{\nolinkurl{https://www.nature.com/articles/s41467-025-61362-4}} for an extensive list of two-dimensional functions). + +\end{itemize} + +\noindent +\item Classification. +\begin{itemize} + + \item We will consider a multiclass classification problem given by the full MNIST data set. The full data set is at \href{{https://www.kaggle.com/datasets/hojjatk/mnist-dataset}}{\nolinkurl{https://www.kaggle.com/datasets/hojjatk/mnist-dataset}}. +\end{itemize} + +\noindent +\end{itemize} + +\noindent +We will start with a regression problem and we will reuse our codes on gradient descent methods from project 1. + +\paragraph{Part a): Analytical warm-up.} +When using our gradient machinery from project 1, we will need the expressions for the cost/loss functions and their respective +gradients. The functions whose gradients we need are: +\begin{enumerate} +\item The mean-squared error (MSE) with and without the $L_1$ and $L_2$ norms (regression problems) + +\item The binary cross entropy (aka log loss) for binary classification problems with and without $L_1$ and $L_2$ norms + +\item The multiclass cross entropy cost/loss function (aka Softmax cross entropy or just Softmax loss function) +\end{enumerate} + +\noindent +Set up these three cost/loss functions and their respective derivatives and explain the various terms. In this project you will however only use the MSE and the Softmax cross entropy. 
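+
+To make the notation concrete, the sketch below shows one possible NumPy implementation of the two cost functions actually used in this project, the MSE with an optional $L_2$ penalty and the Softmax cross entropy, together with their gradients with respect to the model outputs. The function names and the penalty parameter \texttt{lam} are illustrative choices only, not a prescribed interface for your own code.
+
+\bpycod
+import numpy as np
+
+def mse(y, y_pred, theta=None, lam=0.0):
+    # Mean-squared error, optionally with an L2 penalty on the parameters theta
+    loss = np.mean((y - y_pred) ** 2)
+    if theta is not None and lam > 0.0:
+        loss += lam * np.sum(theta ** 2)
+    return loss
+
+def mse_grad(y, y_pred):
+    # Gradient of the (unpenalized) MSE with respect to the predictions
+    return 2.0 * (y_pred - y) / y.shape[0]
+
+def softmax(z):
+    # Numerically stable softmax over the last axis
+    z = z - np.max(z, axis=-1, keepdims=True)
+    e = np.exp(z)
+    return e / np.sum(e, axis=-1, keepdims=True)
+
+def softmax_cross_entropy(logits, targets_onehot):
+    # Multiclass cross entropy (Softmax loss), averaged over the samples
+    p = softmax(logits)
+    return -np.mean(np.sum(targets_onehot * np.log(p + 1e-12), axis=1))
+
+def softmax_cross_entropy_grad(logits, targets_onehot):
+    # Gradient with respect to the logits: (softmax(z) - t) / n
+    return (softmax(logits) - targets_onehot) / logits.shape[0]
+
+\epycod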
+ +We will test three activation functions for our neural network setup, these are the +\begin{enumerate} +\item The Sigmoid (aka \textbf{logit}) function, + +\item the RELU function and + +\item the Leaky RELU function +\end{enumerate} + +\noindent +Set up their expressions and their first derivatives. +You may consult the lecture notes (with codes and more) from week 42 at \href{{https://compphysics.github.io/MachineLearning/doc/LectureNotes/_build/html/week42.html}}{\nolinkurl{https://compphysics.github.io/MachineLearning/doc/LectureNotes/_build/html/week42.html}}. + +\paragraph{Reminder about the gradient machinery from project 1.} +In the setup of a neural network code you will need your gradient descent codes from +project 1. For neural networks we will recommend using stochastic +gradient descent with either the RMSprop or the ADAM algorithms for +updating the learning rates. But you should feel free to try plain gradient descent as well. + +We recommend reading chapter 8 on optimization from the textbook of +Goodfellow, Bengio and Courville at +\href{{https://www.deeplearningbook.org/}}{\nolinkurl{https://www.deeplearningbook.org/}}. This chapter contains many +useful insights and discussions on the optimization part of machine +learning. A useful reference on the back progagation algorithm is +Nielsen's book at \href{{http://neuralnetworksanddeeplearning.com/}}{\nolinkurl{http://neuralnetworksanddeeplearning.com/}}. + +You will find the Python \href{{https://seaborn.pydata.org/generated/seaborn.heatmap.html}}{Seaborn +package} +useful when plotting the results as function of the learning rate +$\eta$ and the hyper-parameter $\lambda$ . + +\paragraph{Part b): Writing your own Neural Network code.} +Your aim now, and this is the central part of this project, is to +write your own FFNN code implementing the back +propagation algorithm discussed in the lecture slides from week 41 at \href{{https://compphysics.github.io/MachineLearning/doc/LectureNotes/_build/html/week41.html}}{\nolinkurl{https://compphysics.github.io/MachineLearning/doc/LectureNotes/_build/html/week41.html}} and week 42 at \href{{https://compphysics.github.io/MachineLearning/doc/LectureNotes/_build/html/week42.html}}{\nolinkurl{https://compphysics.github.io/MachineLearning/doc/LectureNotes/_build/html/week42.html}}. + +We will focus on a regression problem first, using the one-dimensional Runge function +\[ +f(x) = \frac{1}{1+25x^2}, +\] +from project 1. + +Use only the mean-squared error as cost function (no regularization terms) and +write an FFNN code for a regression problem with a flexible number of hidden +layers and nodes using only the Sigmoid function as activation function for +the hidden layers. Initialize the weights using a normal +distribution. How would you initialize the biases? And which +activation function would you select for the final output layer? +And how would you set up your design/feature matrix? Hint: does it have to represent a polynomial approximation as you did in project 1? + +Train your network and compare the results with those from your OLS +regression code from project 1 using the one-dimensional Runge +function. When comparing your neural network code with the OLS +results from project 1, use the same data sets which gave you the best +MSE score. Moreover, use the polynomial order from project 1 that gave you the +best result. Compare these results with your neural network with one +and two hidden layers using $50$ and $100$ hidden nodes, respectively. 
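+
+To give an idea of the structure such a code can have, here is a minimal sketch of an FFNN for regression with sigmoid hidden layers, a linear output layer and the MSE as cost function, trained with plain full-batch gradient descent on the one-dimensional Runge function. The class name \texttt{SimpleFFNN}, the zero initialization of the biases and all hyper-parameter values are illustrative choices only; your own implementation should be more flexible (activation functions, minibatches, RMSprop/ADAM) as described above.
+
+\bpycod
+import numpy as np
+
+rng = np.random.default_rng(2025)
+
+def sigmoid(z):
+    return 1.0 / (1.0 + np.exp(-z))
+
+class SimpleFFNN:
+    """Minimal FFNN for regression: sigmoid hidden layers, linear output, MSE cost."""
+    def __init__(self, layer_sizes):
+        # layer_sizes, e.g. [1, 50, 1]: input dim, hidden nodes, output dim
+        self.W = [rng.normal(0.0, 1.0, (m, n))
+                  for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]
+        self.b = [np.zeros(n) for n in layer_sizes[1:]]
+
+    def forward(self, X):
+        # Return all layer activations; the last entry is the (linear) output
+        a, activations = X, [X]
+        for W, b in zip(self.W[:-1], self.b[:-1]):
+            a = sigmoid(a @ W + b)
+            activations.append(a)
+        activations.append(a @ self.W[-1] + self.b[-1])
+        return activations
+
+    def backprop(self, X, y):
+        activations = self.forward(X)
+        n = X.shape[0]
+        delta = 2.0 * (activations[-1] - y) / n      # dMSE/d(output)
+        grads_W, grads_b = [], []
+        for l in range(len(self.W) - 1, -1, -1):
+            grads_W.insert(0, activations[l].T @ delta)
+            grads_b.insert(0, delta.sum(axis=0))
+            if l > 0:
+                a = activations[l]
+                delta = (delta @ self.W[l].T) * a * (1.0 - a)   # sigmoid derivative
+        return grads_W, grads_b
+
+    def train(self, X, y, eta=0.1, epochs=2000):
+        # Plain full-batch gradient descent; swap in your SGD/RMSprop/ADAM code here
+        for _ in range(epochs):
+            gW, gb = self.backprop(X, y)
+            self.W = [W - eta * g for W, g in zip(self.W, gW)]
+            self.b = [b - eta * g for b, g in zip(self.b, gb)]
+
+# Small usage example on the one-dimensional Runge function
+x = np.linspace(-1, 1, 200).reshape(-1, 1)
+y = 1.0 / (1.0 + 25.0 * x**2)
+net = SimpleFFNN([1, 50, 1])
+net.train(x, y)
+print("Training MSE:", np.mean((net.forward(x)[-1] - y) ** 2))
+
+\epycod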
+ +Comment your results and give a critical discussion of the results +obtained with the OLS code from project 1 and your own neural network +code. Make an analysis of the learning rates employed to find the +optimal MSE score. Test both stochastic gradient descent +with RMSprop and ADAM and plain gradient descent with different +learning rates. + +You should, as you did in project 1, scale your data. + +\paragraph{Part c): Testing against other software libraries.} +You should test your results against a similar code using \textbf{Scikit-Learn} (see the examples in the above lecture notes from weeks 41 and 42) or \textbf{tensorflow/keras} or \textbf{Pytorch} (for Pytorch, see Raschka et al.'s text chapters 12 and 13). + +Furthermore, you should also test that your derivatives are correctly +calculated using automatic differentiation, using for example the +\textbf{Autograd} library or the \textbf{JAX} library. It is optional to implement +these libraries for the present project. In this project they serve as +useful tests of our derivatives. + +\paragraph{Part d): Testing different activation functions and depths of the neural network.} +You should also test different activation functions for the hidden +layers. Try out the Sigmoid, the RELU and the Leaky RELU functions and +discuss your results. Test your results as functions of the number of hidden layers and nodes. Do you see signs of overfitting? +It is optional in this project to perform a bias-variance trade-off analysis. + +\paragraph{Part e): Testing different norms.} +Finally, still using the one-dimensional Runge function, add now the +hyperparameters $\lambda$ with the $L_2$ and $L_1$ norms. Find the +optimal results for the hyperparameters $\lambda$ and the learning +rates $\eta$ and neural network architecture and compare the $L_2$ results with Ridge regression from +project 1 and the $L_1$ results with the Lasso calculations of project 1. +Use again the same data sets and the best results from project 1 in your comparisons. + +\paragraph{Part f): Classification analysis using neural networks.} +With a well-written code it should now be easy to change the +activation function for the output layer. + +Here we will change the cost function for our neural network code +developed in parts b), d) and e) in order to perform a classification +analysis. The classification problem we will study is the multiclass +MNIST problem, see the description of the full data set at +\href{{https://www.kaggle.com/datasets/hojjatk/mnist-dataset}}{\nolinkurl{https://www.kaggle.com/datasets/hojjatk/mnist-dataset}}. We will use the Softmax cross entropy function discussed in a). +The MNIST data set discussed in the lecture notes from week 42 is a downscaled variant of the full dataset. + +Feel free to suggest other data sets. If you find the classic MNIST data set somewhat limited, feel free to try the +MNIST-Fashion data set at for example \href{{https://www.kaggle.com/datasets/zalando-research/fashionmnist}}{\nolinkurl{https://www.kaggle.com/datasets/zalando-research/fashionmnist}}. + +To set up the data set, the following python programs may be useful + + + + + + + + + +\bpycod +from sklearn.datasets import fetch_openml + +# Fetch the MNIST dataset +mnist = fetch_openml('mnist_784', version=1, as_frame=False, parser='auto') + +# Extract data (features) and target (labels) +X = mnist.data +y = mnist.target + +\epycod + +You should consider scaling the data. The Pixel values in MNIST range from 0 to 255. 
Scaling them to a 0-1 range can improve the performance of some models. That is, you could implement the following scaling + + +\bpycod +X = X / 255.0 + +\epycod + +And then perform the standard train-test splitting + + + +\bpycod +from sklearn.model_selection import train_test_split +X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42) + +\epycod + + +To measure the performance of our classification problem we will use the +so-called \emph{accuracy} score. The accuracy is as you would expect just +the number of correctly guessed targets $t_i$ divided by the total +number of targets, that is + +\[ +\text{Accuracy} = \frac{\sum_{i=1}^n I(t_i = y_i)}{n} , +\] + +where $I$ is the indicator function, $1$ if $t_i = y_i$ and $0$ +otherwise if we have a binary classification problem. Here $t_i$ +represents the target and $y_i$ the outputs of your FFNN code and $n$ is simply the number of targets $t_i$. + +Discuss your results and give a critical analysis of the various parameters, including hyper-parameters like the learning rates and the regularization parameter $\lambda$, various activation functions, number of hidden layers and nodes and activation functions. + +Again, we strongly recommend that you compare your own neural Network +code for classification and pertinent results against a similar code using \textbf{Scikit-Learn} or \textbf{tensorflow/keras} or \textbf{pytorch}. + +If you have time, you can use the functionality of \textbf{scikit-learn} and compare your neural network results with those from Logistic regression. This is optional. +The weblink here \href{{https://medium.com/ai-in-plain-english/comparison-between-logistic-regression-and-neural-networks-in-classifying-digits-dc5e85cd93c3}}{\nolinkurl{https://medium.com/ai-in-plain-english/comparison-between-logistic-regression-and-neural-networks-in-classifying-digits-dc5e85cd93c3}}compares logistic regression and FFNN using the so-called MNIST data set. You may find several useful hints and ideas from this article. Your neural network code can implement the equivalent of logistic regression by simply setting the number of hidden layers to zero and keeping just the input and the output layers. + +If you wish to compare with say Logisti Regression from \textbf{scikit-learn}, the following code uses the above data set + + + + + + + + + + + + +\bpycod +from sklearn.linear_model import LogisticRegression +# Initialize the model +model = LogisticRegression(solver='saga', multi_class='multinomial', max_iter=1000, random_state=42) +# Train the model +model.fit(X_train, y_train) +from sklearn.metrics import accuracy_score +# Make predictions on the test set +y_pred = model.predict(X_test) +# Calculate accuracy +accuracy = accuracy_score(y_test, y_pred) +print(f"Model Accuracy: {accuracy:.4f}") + +\epycod + + +\paragraph{Part g) Critical evaluation of the various algorithms.} +After all these glorious calculations, you should now summarize the +various algorithms and come with a critical evaluation of their pros +and cons. Which algorithm works best for the regression case and which +is best for the classification case. These codes can also be part of +your final project 3, but now applied to other data sets. + +\subsection{Summary of methods to implement and analyze} + +\textbf{Required Implementation:} +\begin{enumerate} +\item Reuse the regression code and results from project 1, these will act as a benchmark for seeing how suited a neural network is for this regression task. 
+ +\item Implement a neural network with +\begin{itemize} + + \item A flexible number of layers + + \item A flexible number of nodes in each layer + + \item A changeable activation function in each layer (Sigmoid, ReLU, LeakyReLU, as well as Linear and Softmax) + + \item A changeable cost function, which will be set to MSE for regression and cross-entropy for multiple-classification + + \item An optional L1 or L2 norm of the weights and biases in the cost function (only used for computing gradients, not interpretable metrics) + +\end{itemize} + +\noindent +\item Implement the back-propagation algorithm to compute the gradient of your neural network + +\item Reuse the implementation of Plain and Stochastic Gradient Descent from Project 1 (and adapt the code to work with the your neural network) +\begin{itemize} + + \item With no optimization algorithm + + \item With RMS Prop + + \item With ADAM + +\end{itemize} + +\noindent +\item Implement scaling and train-test splitting of your data, preferably using sklearn + +\item Implement and compute metrics like the MSE and Accuracy +\end{enumerate} + +\noindent +\paragraph{Required Analysis:} +\begin{enumerate} +\item Briefly show and argue for the advantages and disadvantages of the methods from Project 1. + +\item Explore and show the impact of changing the number of layers, nodes per layer, choice of activation function, and inclusion of L1 and L2 norms. Present only the most interesting results from this exploration. 2D Heatmaps will be good for this: Start with finding a well performing set of hyper-parameters, then change two at a time in a range that shows good and bad performance. + +\item Show and argue for the advantages and disadvantages of using a neural network for regression on your data + +\item Show and argue for the advantages and disadvantages of using a neural network for classification on your data + +\item Show and argue for the advantages and disadvantages of the different gradient methods and learning rates when training the neural network +\end{enumerate} + +\noindent +\paragraph{Optional (Note that you should include at least two of these in the report):} +\begin{enumerate} +\item Implement Logistic Regression as simple classification model case (equivalent to a Neural Network with just the output layer) + +\item Compute the gradient of the neural network with autograd, to show that it gives the same result as your hand-written backpropagation. + +\item Compare your results with results from using a machine-learning library like pytorch (https://docs.pytorch.org/tutorials/beginner/basics/buildmodel_tutorial.html) + +\item Use a more complex classification dataset instead, like the fashion MNIST (see \href{{https://www.kaggle.com/datasets/zalando-research/fashionmnist}}{\nolinkurl{https://www.kaggle.com/datasets/zalando-research/fashionmnist}}) + +\item Use a more complex regression dataset instead, like the two-dimensional Runge function $f(x,y)=\left[(10x - 5)^2 + (10y - 5)^2 + 1 \right]^{-1}$, or even more complicated two-dimensional functions (see the supplementary material of \href{{https://www.nature.com/articles/s41467-025-61362-4}}{\nolinkurl{https://www.nature.com/articles/s41467-025-61362-4}} for an extensive list of two-dimensional functions). 
+ +\item Compute and interpret a confusion matrix of your best classification model (see \href{{https://www.researchgate.net/figure/Confusion-matrix-of-MNIST-and-F-MNIST-embeddings_fig5_349758607}}{\nolinkurl{https://www.researchgate.net/figure/Confusion-matrix-of-MNIST-and-F-MNIST-embeddings_fig5_349758607}}) +\end{enumerate} + +\noindent +\subsection{Background literature} + +\begin{enumerate} +\item The text of Michael Nielsen is highly recommended, see Nielsen's book at \href{{http://neuralnetworksanddeeplearning.com/}}{\nolinkurl{http://neuralnetworksanddeeplearning.com/}}. It is an excellent read. + +\item Goodfellow, Bengio and Courville, Deep Learning at \href{{https://www.deeplearningbook.org/}}{\nolinkurl{https://www.deeplearningbook.org/}}. Here we recommend chapters 6, 7 and 8 + +\item Raschka et al.~at \href{{https://sebastianraschka.com/blog/2022/ml-pytorch-book.html}}{\nolinkurl{https://sebastianraschka.com/blog/2022/ml-pytorch-book.html}}. Here we recommend chapters 11, 12 and 13. +\end{enumerate} + +\noindent +\subsection{Introduction to numerical projects} + +Here follows a brief recipe and recommendation on how to write a report for each +project. + +\begin{itemize} + \item Give a short description of the nature of the problem and the eventual numerical methods you have used. + + \item Describe the algorithm you have used and/or developed. Here you may find it convenient to use pseudocoding. In many cases you can describe the algorithm in the program itself. + + \item Include the source code of your program. Comment your program properly. + + \item If possible, try to find analytic solutions, or known limits in order to test your program when developing the code. + + \item Include your results either in figure form or in a table. Remember to label your results. All tables and figures should have relevant captions and labels on the axes. + + \item Try to evaluate the reliabilty and numerical stability/precision of your results. If possible, include a qualitative and/or quantitative discussion of the numerical stability, eventual loss of precision etc. + + \item Try to give an interpretation of you results in your answers to the problems. + + \item Critique: if possible include your comments and reflections about the exercise, whether you felt you learnt something, ideas for improvements and other thoughts you've made when solving the exercise. We wish to keep this course at the interactive level and your comments can help us improve it. + + \item Try to establish a practice where you log your work at the computerlab. You may find such a logbook very handy at later stages in your work, especially when you don't properly remember what a previous test version of your program did. Here you could also record the time spent on solving the exercise, various algorithms you may have tested or other topics which you feel worthy of mentioning. +\end{itemize} + +\noindent +\subsection{Format for electronic delivery of report and programs} + +The preferred format for the report is a PDF file. You can also use DOC or postscript formats or as an ipython notebook file. As programming language we prefer that you choose between C/C++, Fortran2008 or Python. The following prescription should be followed when preparing the report: + +\begin{itemize} + \item Use Canvas to hand in your projects, log in at \href{{https://www.uio.no/english/services/it/education/canvas/}}{\nolinkurl{https://www.uio.no/english/services/it/education/canvas/}} with your normal UiO username and password. 
+ + \item Upload \textbf{only} the report file or the link to your GitHub/GitLab or similar typo of repos! For the source code file(s) you have developed please provide us with your link to your GitHub/GitLab or similar domain. The report file should include all of your discussions and a list of the codes you have developed. Do not include library files which are available at the course homepage, unless you have made specific changes to them. + + \item In your GitHub/GitLab or similar repository, please include a folder which contains selected results. These can be in the form of output from your code for a selected set of runs and input parameters. +\end{itemize} + +\noindent +Finally, +we encourage you to collaborate. Optimal working groups consist of +2-3 students. You can then hand in a common report. + + +% ------------------- end of main content --------------- + +% #ifdef PREAMBLE +\end{document} +% #endif + diff --git a/doc/Projects/2025/Project2/pdf/Project2.tex b/doc/Projects/2025/Project2/pdf/Project2.tex new file mode 100644 index 000000000..488317149 --- /dev/null +++ b/doc/Projects/2025/Project2/pdf/Project2.tex @@ -0,0 +1,582 @@ +%% +%% Automatically generated file from DocOnce source +%% (https://github.com/doconce/doconce/) +%% doconce format latex Project2.do.txt --print_latex_style=trac --latex_admon=paragraph +%% + + +%-------------------- begin preamble ---------------------- + +\documentclass[% +oneside, % oneside: electronic viewing, twoside: printing +final, % draft: marks overfull hboxes, figures with paths +10pt]{article} + +\listfiles % print all files needed to compile this document + +\usepackage{relsize,makeidx,color,setspace,amsmath,amsfonts,amssymb} +\usepackage[table]{xcolor} +\usepackage{bm,ltablex,microtype} + +\usepackage[pdftex]{graphicx} + +\usepackage{fancyvrb} % packages needed for verbatim environments + +\usepackage[T1]{fontenc} +%\usepackage[latin1]{inputenc} +\usepackage{ucs} +\usepackage[utf8x]{inputenc} + +\usepackage{lmodern} % Latin Modern fonts derived from Computer Modern + +% Hyperlinks in PDF: +\definecolor{linkcolor}{rgb}{0,0,0.4} +\usepackage{hyperref} +\hypersetup{ + breaklinks=true, + colorlinks=true, + linkcolor=linkcolor, + urlcolor=linkcolor, + citecolor=black, + filecolor=black, + %filecolor=blue, + pdfmenubar=true, + pdftoolbar=true, + bookmarksdepth=3 % Uncomment (and tweak) for PDF bookmarks with more levels than the TOC + } +%\hyperbaseurl{} % hyperlinks are relative to this root + +\setcounter{tocdepth}{2} % levels in table of contents + +% --- fancyhdr package for fancy headers --- +\usepackage{fancyhdr} +\fancyhf{} % sets both header and footer to nothing +\renewcommand{\headrulewidth}{0pt} +\fancyfoot[LE,RO]{\thepage} +% Ensure copyright on titlepage (article style) and chapter pages (book style) +\fancypagestyle{plain}{ + \fancyhf{} + \fancyfoot[C]{{\footnotesize \copyright\ 1999-2025, "Data Analysis and Machine Learning FYS-STK3155/FYS4155":"http://www.uio.no/studier/emner/matnat/fys/FYS3155/index-eng.html". Released under CC Attribution-NonCommercial 4.0 license}} +% \renewcommand{\footrulewidth}{0mm} + \renewcommand{\headrulewidth}{0mm} +} +% Ensure copyright on titlepages with \thispagestyle{empty} +\fancypagestyle{empty}{ + \fancyhf{} + \fancyfoot[C]{{\footnotesize \copyright\ 1999-2025, "Data Analysis and Machine Learning FYS-STK3155/FYS4155":"http://www.uio.no/studier/emner/matnat/fys/FYS3155/index-eng.html". 
Released under CC Attribution-NonCommercial 4.0 license}} + \renewcommand{\footrulewidth}{0mm} + \renewcommand{\headrulewidth}{0mm} +} + +\pagestyle{fancy} + + +% prevent orhpans and widows +\clubpenalty = 10000 +\widowpenalty = 10000 + +% --- end of standard preamble for documents --- + + +% insert custom LaTeX commands... + +\raggedbottom +\makeindex +\usepackage[totoc]{idxlayout} % for index in the toc +\usepackage[nottoc]{tocbibind} % for references/bibliography in the toc + +%-------------------- end preamble ---------------------- + +\begin{document} + +% matching end for #ifdef PREAMBLE + +\newcommand{\exercisesection}[1]{\subsection*{#1}} + + +% ------------------- main content ---------------------- + + + +% ----------------- title ------------------------- + +\thispagestyle{empty} + +\begin{center} +{\LARGE\bf +\begin{spacing}{1.25} +Project 2 on Machine Learning, deadline November 10 (Midnight) +\end{spacing} +} +\end{center} + +% ----------------- author(s) ------------------------- + +\begin{center} +{\bf \href{{http://www.uio.no/studier/emner/matnat/fys/FYS3155/index-eng.html}}{Data Analysis and Machine Learning FYS-STK3155/FYS4155}} +\end{center} + + \begin{center} +% List of all institutions: +\centerline{{\small University of Oslo, Norway}} +\end{center} + +% ----------------- end author(s) ------------------------- + +% --- begin date --- +\begin{center} +October 14, 2025 +\end{center} +% --- end date --- + +\vspace{1cm} + + +\subsection*{Deliverables} + +First, join a group in canvas with your group partners. Pick an avaliable group for Project 2 in the \textbf{People} page. + +In canvas, deliver as a group and include: + +\begin{itemize} +\item A PDF of your report which follows the guidelines covered below and in the week 39 exercises. Additional requirements include: +\begin{itemize} + + \item It should be around 5000 words, use the word counter in Overleaf for this. This often corresponds to 10-12 pages. References and appendices are excluded from the word count + + \item It should include around 10-15 figures. You can include more figures in appendices and/or as supplemental material in your repository. + +\end{itemize} + +\noindent +\item A comment linking to your github repository (or folder in one of your github repositories) for this project. The repository must include +\end{itemize} + +\noindent +A PDF file of the report +\begin{itemize} + \item A folder named Code, where you put python files for your functions and notebooks for reproducing your results. Remember to use a seed for generating random data and for train-test splits when generating final results. + + \item A README file with the name of the group members + + \item a short description of the project + + \item a description of how to install the required packages to run your code from a requirements.txt file or similar (such as a plain text description) names and descriptions of the various notebooks in the Code folder and the results they produce +\end{itemize} + +\noindent +\paragraph{Preamble: Note on writing reports, using reference material, AI and other tools.} +We want you to answer the three different projects by handing in +reports written like a standard scientific/technical report. The links +at +https://github.com/CompPhysics/MachineLearning/tree/master/doc/Projects +contain more information. There you can find examples of previous +reports, the projects themselves, how we grade reports etc. How to +write reports will also be discussed during the various lab +sessions. 
Please do ask us if you are in doubt. + +When using codes and material from other sources, you should refer to +these in the bibliography of your report, indicating wherefrom you for +example got the code, whether this is from the lecture notes, +softwares like Scikit-Learn, TensorFlow, PyTorch or other +sources. These sources should always be cited correctly. How to cite +some of the libraries is often indicated from their corresponding +GitHub sites or websites, see for example how to cite Scikit-Learn at +https://scikit-learn.org/dev/about.html. + +We enocurage you to use tools like ChatGPT or similar in writing the +report. If you use for example ChatGPT, please do cite it properly and +include (if possible) your questions and answers as an addition to the +report. This can be uploaded to for example your website, +GitHub/GitLab or similar as supplemental material. + +If you would like to study other data sets, feel free to propose other +sets. What we have proposed here are mere suggestions from our +side. If you opt for another data set, consider using a set which has +been studied in the scientific literature. This makes it easier for +you to compare and analyze your results. Comparing with existing +results from the scientific literature is also an essential element of +the scientific discussion. The University of California at Irvine with +its Machine Learning repository at +https://archive.ics.uci.edu/ml/index.php is an excellent site to look +up for examples and inspiration. Kaggle.com is an equally interesting +site. Feel free to explore these sites. + +\subsection*{Classification and Regression, writing our own neural network code} + +The main aim of this project is to study both classification and +regression problems by developing our own +feed-forward neural network (FFNN) code. The exercises from week 41 and 42 (see \href{{https://compphysics.github.io/MachineLearning/doc/LectureNotes/_build/html/exercisesweek41.html}}{\nolinkurl{https://compphysics.github.io/MachineLearning/doc/LectureNotes/_build/html/exercisesweek41.html}} and \href{{https://compphysics.github.io/MachineLearning/doc/LectureNotes/_build/html/exercisesweek42.html}}{\nolinkurl{https://compphysics.github.io/MachineLearning/doc/LectureNotes/_build/html/exercisesweek42.html}}) as well as the lecture material from the same weeks (see \href{{https://compphysics.github.io/MachineLearning/doc/LectureNotes/_build/html/week41.html}}{\nolinkurl{https://compphysics.github.io/MachineLearning/doc/LectureNotes/_build/html/week41.html}} and \href{{https://compphysics.github.io/MachineLearning/doc/LectureNotes/_build/html/week42.html}}{\nolinkurl{https://compphysics.github.io/MachineLearning/doc/LectureNotes/_build/html/week42.html}}) should contain enough information for you to get started with writing your own code. + +We will also reuse our codes on gradient descent methods from project 1. + +The data sets that we propose here are (the default sets) + +\begin{itemize} +\item Regression (fitting a continuous function). In this part you will need to bring back your results from project 1 and compare these with what you get from your Neural Network code to be developed here. The data sets could be +\begin{itemize} + + \item The simple one-dimensional function Runge function from project 1, that is $f(x) = \frac{1}{1+25x^2}$. We recommend using a simpler function when developing your neural network code for regression problems. 
Feel however free to discuss and study other functions, such as the two-dimensional Runge function $f(x,y)=\left[(10x - 5)^2 + (10y - 5)^2 + 1 \right]^{-1}$, or even more complicated two-dimensional functions (see the supplementary material of \href{{https://www.nature.com/articles/s41467-025-61362-4}}{\nolinkurl{https://www.nature.com/articles/s41467-025-61362-4}} for an extensive list of two-dimensional functions). + +\end{itemize} + +\noindent +\item Classification. +\begin{itemize} + + \item We will consider a multiclass classification problem given by the full MNIST data set. The full data set is at \href{{https://www.kaggle.com/datasets/hojjatk/mnist-dataset}}{\nolinkurl{https://www.kaggle.com/datasets/hojjatk/mnist-dataset}}. +\end{itemize} + +\noindent +\end{itemize} + +\noindent +We will start with a regression problem and we will reuse our codes on gradient descent methods from project 1. + +\paragraph{Part a): Analytical warm-up.} +When using our gradient machinery from project 1, we will need the expressions for the cost/loss functions and their respective +gradients. The functions whose gradients we need are: +\begin{enumerate} +\item The mean-squared error (MSE) with and without the $L_1$ and $L_2$ norms (regression problems) + +\item The binary cross entropy (aka log loss) for binary classification problems with and without $L_1$ and $L_2$ norms + +\item The multiclass cross entropy cost/loss function (aka Softmax cross entropy or just Softmax loss function) +\end{enumerate} + +\noindent +Set up these three cost/loss functions and their respective derivatives and explain the various terms. In this project you will however only use the MSE and the Softmax cross entropy. + +We will test three activation functions for our neural network setup, these are the +\begin{enumerate} +\item The Sigmoid (aka \textbf{logit}) function, + +\item the RELU function and + +\item the Leaky RELU function +\end{enumerate} + +\noindent +Set up their expressions and their first derivatives. +You may consult the lecture notes (with codes and more) from week 42 at \href{{https://compphysics.github.io/MachineLearning/doc/LectureNotes/_build/html/week42.html}}{\nolinkurl{https://compphysics.github.io/MachineLearning/doc/LectureNotes/_build/html/week42.html}}. + +\paragraph{Reminder about the gradient machinery from project 1.} +In the setup of a neural network code you will need your gradient descent codes from +project 1. For neural networks we will recommend using stochastic +gradient descent with either the RMSprop or the ADAM algorithms for +updating the learning rates. But you should feel free to try plain gradient descent as well. + +We recommend reading chapter 8 on optimization from the textbook of +Goodfellow, Bengio and Courville at +\href{{https://www.deeplearningbook.org/}}{\nolinkurl{https://www.deeplearningbook.org/}}. This chapter contains many +useful insights and discussions on the optimization part of machine +learning. A useful reference on the back progagation algorithm is +Nielsen's book at \href{{http://neuralnetworksanddeeplearning.com/}}{\nolinkurl{http://neuralnetworksanddeeplearning.com/}}. + +You will find the Python \href{{https://seaborn.pydata.org/generated/seaborn.heatmap.html}}{Seaborn +package} +useful when plotting the results as function of the learning rate +$\eta$ and the hyper-parameter $\lambda$ . 
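+
+As an illustration of this kind of grid search, the sketch below collects test-set MSE values for a grid of learning rates and $\lambda$ values and visualizes them with a Seaborn heatmap. The helper \texttt{train\_and\_score} is only a placeholder returning dummy numbers; you would replace it with your own training and evaluation routine.
+
+\begin{verbatim}
+import numpy as np
+import pandas as pd
+import seaborn as sns
+import matplotlib.pyplot as plt
+
+def train_and_score(eta, lam):
+    # Placeholder: train your network with learning rate eta and penalty lam,
+    # then return the test-set MSE.  Replaced here by a dummy value.
+    return (eta - 0.01) ** 2 + (np.log10(lam) + 3.0) ** 2
+
+etas = [1e-4, 1e-3, 1e-2, 1e-1]
+lambdas = [1e-5, 1e-4, 1e-3, 1e-2, 1e-1]
+
+scores = np.array([[train_and_score(eta, lam) for lam in lambdas] for eta in etas])
+df = pd.DataFrame(scores, index=etas, columns=lambdas)
+
+sns.heatmap(df, annot=True, fmt=".3f", cbar_kws={"label": "test MSE"})
+plt.xlabel(r"$\lambda$")
+plt.ylabel(r"$\eta$")
+plt.title("Test MSE as function of learning rate and regularization")
+plt.show()
+\end{verbatim}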
+ +\paragraph{Part b): Writing your own Neural Network code.} +Your aim now, and this is the central part of this project, is to +write your own FFNN code implementing the back +propagation algorithm discussed in the lecture slides from week 41 at \href{{https://compphysics.github.io/MachineLearning/doc/LectureNotes/_build/html/week41.html}}{\nolinkurl{https://compphysics.github.io/MachineLearning/doc/LectureNotes/_build/html/week41.html}} and week 42 at \href{{https://compphysics.github.io/MachineLearning/doc/LectureNotes/_build/html/week42.html}}{\nolinkurl{https://compphysics.github.io/MachineLearning/doc/LectureNotes/_build/html/week42.html}}. + +We will focus on a regression problem first, using the one-dimensional Runge function +\[ +f(x) = \frac{1}{1+25x^2}, +\] +from project 1. + +Use only the mean-squared error as cost function (no regularization terms) and +write an FFNN code for a regression problem with a flexible number of hidden +layers and nodes using only the Sigmoid function as activation function for +the hidden layers. Initialize the weights using a normal +distribution. How would you initialize the biases? And which +activation function would you select for the final output layer? +And how would you set up your design/feature matrix? Hint: does it have to represent a polynomial approximation as you did in project 1? + +Train your network and compare the results with those from your OLS +regression code from project 1 using the one-dimensional Runge +function. When comparing your neural network code with the OLS +results from project 1, use the same data sets which gave you the best +MSE score. Moreover, use the polynomial order from project 1 that gave you the +best result. Compare these results with your neural network with one +and two hidden layers using $50$ and $100$ hidden nodes, respectively. + +Comment your results and give a critical discussion of the results +obtained with the OLS code from project 1 and your own neural network +code. Make an analysis of the learning rates employed to find the +optimal MSE score. Test both stochastic gradient descent +with RMSprop and ADAM and plain gradient descent with different +learning rates. + +You should, as you did in project 1, scale your data. + +\paragraph{Part c): Testing against other software libraries.} +You should test your results against a similar code using \textbf{Scikit-Learn} (see the examples in the above lecture notes from weeks 41 and 42) or \textbf{tensorflow/keras} or \textbf{Pytorch} (for Pytorch, see Raschka et al.'s text chapters 12 and 13). + +Furthermore, you should also test that your derivatives are correctly +calculated using automatic differentiation, using for example the +\textbf{Autograd} library or the \textbf{JAX} library. It is optional to implement +these libraries for the present project. In this project they serve as +useful tests of our derivatives. + +\paragraph{Part d): Testing different activation functions and depths of the neural network.} +You should also test different activation functions for the hidden +layers. Try out the Sigmoid, the RELU and the Leaky RELU functions and +discuss your results. Test your results as functions of the number of hidden layers and nodes. Do you see signs of overfitting? +It is optional in this project to perform a bias-variance trade-off analysis. + +\paragraph{Part e): Testing different norms.} +Finally, still using the one-dimensional Runge function, add now the +hyperparameters $\lambda$ with the $L_2$ and $L_1$ norms. 
Find the +optimal results for the hyperparameters $\lambda$ and the learning +rates $\eta$ and neural network architecture and compare the $L_2$ results with Ridge regression from +project 1 and the $L_1$ results with the Lasso calculations of project 1. +Use again the same data sets and the best results from project 1 in your comparisons. + +\paragraph{Part f): Classification analysis using neural networks.} +With a well-written code it should now be easy to change the +activation function for the output layer. + +Here we will change the cost function for our neural network code +developed in parts b), d) and e) in order to perform a classification +analysis. The classification problem we will study is the multiclass +MNIST problem, see the description of the full data set at +\href{{https://www.kaggle.com/datasets/hojjatk/mnist-dataset}}{\nolinkurl{https://www.kaggle.com/datasets/hojjatk/mnist-dataset}}. We will use the Softmax cross entropy function discussed in a). +The MNIST data set discussed in the lecture notes from week 42 is a downscaled variant of the full dataset. + +Feel free to suggest other data sets. If you find the classic MNIST data set somewhat limited, feel free to try the +MNIST-Fashion data set at for example \href{{https://www.kaggle.com/datasets/zalando-research/fashionmnist}}{\nolinkurl{https://www.kaggle.com/datasets/zalando-research/fashionmnist}}. + +To set up the data set, the following python programs may be useful + + + + + + + + + +\begin{verbatim} +from sklearn.datasets import fetch_openml + +# Fetch the MNIST dataset +mnist = fetch_openml('mnist_784', version=1, as_frame=False, parser='auto') + +# Extract data (features) and target (labels) +X = mnist.data +y = mnist.target + +\end{verbatim} + +You should consider scaling the data. The Pixel values in MNIST range from 0 to 255. Scaling them to a 0-1 range can improve the performance of some models. That is, you could implement the following scaling + + +\begin{verbatim} +X = X / 255.0 + +\end{verbatim} + +And then perform the standard train-test splitting + + + +\begin{verbatim} +from sklearn.model_selection import train_test_split +X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42) + +\end{verbatim} + + +To measure the performance of our classification problem we will use the +so-called \emph{accuracy} score. The accuracy is as you would expect just +the number of correctly guessed targets $t_i$ divided by the total +number of targets, that is + +\[ +\text{Accuracy} = \frac{\sum_{i=1}^n I(t_i = y_i)}{n} , +\] + +where $I$ is the indicator function, $1$ if $t_i = y_i$ and $0$ +otherwise if we have a binary classification problem. Here $t_i$ +represents the target and $y_i$ the outputs of your FFNN code and $n$ is simply the number of targets $t_i$. + +Discuss your results and give a critical analysis of the various parameters, including hyper-parameters like the learning rates and the regularization parameter $\lambda$, various activation functions, number of hidden layers and nodes and activation functions. + +Again, we strongly recommend that you compare your own neural Network +code for classification and pertinent results against a similar code using \textbf{Scikit-Learn} or \textbf{tensorflow/keras} or \textbf{pytorch}. + +If you have time, you can use the functionality of \textbf{scikit-learn} and compare your neural network results with those from Logistic regression. This is optional. 
+The weblink here \href{{https://medium.com/ai-in-plain-english/comparison-between-logistic-regression-and-neural-networks-in-classifying-digits-dc5e85cd93c3}}{\nolinkurl{https://medium.com/ai-in-plain-english/comparison-between-logistic-regression-and-neural-networks-in-classifying-digits-dc5e85cd93c3}}compares logistic regression and FFNN using the so-called MNIST data set. You may find several useful hints and ideas from this article. Your neural network code can implement the equivalent of logistic regression by simply setting the number of hidden layers to zero and keeping just the input and the output layers. + +If you wish to compare with say Logisti Regression from \textbf{scikit-learn}, the following code uses the above data set + + + + + + + + + + + + +\begin{verbatim} +from sklearn.linear_model import LogisticRegression +# Initialize the model +model = LogisticRegression(solver='saga', multi_class='multinomial', max_iter=1000, random_state=42) +# Train the model +model.fit(X_train, y_train) +from sklearn.metrics import accuracy_score +# Make predictions on the test set +y_pred = model.predict(X_test) +# Calculate accuracy +accuracy = accuracy_score(y_test, y_pred) +print(f"Model Accuracy: {accuracy:.4f}") + +\end{verbatim} + + +\paragraph{Part g) Critical evaluation of the various algorithms.} +After all these glorious calculations, you should now summarize the +various algorithms and come with a critical evaluation of their pros +and cons. Which algorithm works best for the regression case and which +is best for the classification case. These codes can also be part of +your final project 3, but now applied to other data sets. + +\subsection*{Summary of methods to implement and analyze} + +\textbf{Required Implementation:} +\begin{enumerate} +\item Reuse the regression code and results from project 1, these will act as a benchmark for seeing how suited a neural network is for this regression task. + +\item Implement a neural network with +\begin{itemize} + + \item A flexible number of layers + + \item A flexible number of nodes in each layer + + \item A changeable activation function in each layer (Sigmoid, ReLU, LeakyReLU, as well as Linear and Softmax) + + \item A changeable cost function, which will be set to MSE for regression and cross-entropy for multiple-classification + + \item An optional L1 or L2 norm of the weights and biases in the cost function (only used for computing gradients, not interpretable metrics) + +\end{itemize} + +\noindent +\item Implement the back-propagation algorithm to compute the gradient of your neural network + +\item Reuse the implementation of Plain and Stochastic Gradient Descent from Project 1 (and adapt the code to work with the your neural network) +\begin{itemize} + + \item With no optimization algorithm + + \item With RMS Prop + + \item With ADAM + +\end{itemize} + +\noindent +\item Implement scaling and train-test splitting of your data, preferably using sklearn + +\item Implement and compute metrics like the MSE and Accuracy +\end{enumerate} + +\noindent +\paragraph{Required Analysis:} +\begin{enumerate} +\item Briefly show and argue for the advantages and disadvantages of the methods from Project 1. + +\item Explore and show the impact of changing the number of layers, nodes per layer, choice of activation function, and inclusion of L1 and L2 norms. Present only the most interesting results from this exploration. 
2D Heatmaps will be good for this: Start with finding a well performing set of hyper-parameters, then change two at a time in a range that shows good and bad performance. + +\item Show and argue for the advantages and disadvantages of using a neural network for regression on your data + +\item Show and argue for the advantages and disadvantages of using a neural network for classification on your data + +\item Show and argue for the advantages and disadvantages of the different gradient methods and learning rates when training the neural network +\end{enumerate} + +\noindent +\paragraph{Optional (Note that you should include at least two of these in the report):} +\begin{enumerate} +\item Implement Logistic Regression as simple classification model case (equivalent to a Neural Network with just the output layer) + +\item Compute the gradient of the neural network with autograd, to show that it gives the same result as your hand-written backpropagation. + +\item Compare your results with results from using a machine-learning library like pytorch (https://docs.pytorch.org/tutorials/beginner/basics/buildmodel_tutorial.html) + +\item Use a more complex classification dataset instead, like the fashion MNIST (see \href{{https://www.kaggle.com/datasets/zalando-research/fashionmnist}}{\nolinkurl{https://www.kaggle.com/datasets/zalando-research/fashionmnist}}) + +\item Use a more complex regression dataset instead, like the two-dimensional Runge function $f(x,y)=\left[(10x - 5)^2 + (10y - 5)^2 + 1 \right]^{-1}$, or even more complicated two-dimensional functions (see the supplementary material of \href{{https://www.nature.com/articles/s41467-025-61362-4}}{\nolinkurl{https://www.nature.com/articles/s41467-025-61362-4}} for an extensive list of two-dimensional functions). + +\item Compute and interpret a confusion matrix of your best classification model (see \href{{https://www.researchgate.net/figure/Confusion-matrix-of-MNIST-and-F-MNIST-embeddings_fig5_349758607}}{\nolinkurl{https://www.researchgate.net/figure/Confusion-matrix-of-MNIST-and-F-MNIST-embeddings_fig5_349758607}}) +\end{enumerate} + +\noindent +\subsection*{Background literature} + +\begin{enumerate} +\item The text of Michael Nielsen is highly recommended, see Nielsen's book at \href{{http://neuralnetworksanddeeplearning.com/}}{\nolinkurl{http://neuralnetworksanddeeplearning.com/}}. It is an excellent read. + +\item Goodfellow, Bengio and Courville, Deep Learning at \href{{https://www.deeplearningbook.org/}}{\nolinkurl{https://www.deeplearningbook.org/}}. Here we recommend chapters 6, 7 and 8 + +\item Raschka et al.~at \href{{https://sebastianraschka.com/blog/2022/ml-pytorch-book.html}}{\nolinkurl{https://sebastianraschka.com/blog/2022/ml-pytorch-book.html}}. Here we recommend chapters 11, 12 and 13. +\end{enumerate} + +\noindent +\subsection*{Introduction to numerical projects} + +Here follows a brief recipe and recommendation on how to write a report for each +project. + +\begin{itemize} + \item Give a short description of the nature of the problem and the eventual numerical methods you have used. + + \item Describe the algorithm you have used and/or developed. Here you may find it convenient to use pseudocoding. In many cases you can describe the algorithm in the program itself. + + \item Include the source code of your program. Comment your program properly. + + \item If possible, try to find analytic solutions, or known limits in order to test your program when developing the code. 
+ + \item Include your results either in figure form or in a table. Remember to label your results. All tables and figures should have relevant captions and labels on the axes. + + \item Try to evaluate the reliability and numerical stability/precision of your results. If possible, include a qualitative and/or quantitative discussion of the numerical stability, eventual loss of precision etc. + + \item Try to give an interpretation of your results in your answers to the problems. + + \item Critique: if possible, include your comments and reflections about the exercise: whether you felt you learnt something, ideas for improvements and other thoughts that occurred to you while solving the exercise. We wish to keep this course at the interactive level and your comments can help us improve it. + + \item Try to establish a practice where you log your work at the computer lab. You may find such a logbook very handy at later stages in your work, especially when you don't properly remember what a previous test version of your program did. Here you could also record the time spent on solving the exercise, various algorithms you may have tested or other topics which you feel are worth mentioning. +\end{itemize} + +\noindent +\subsection*{Format for electronic delivery of report and programs} + +The preferred format for the report is a PDF file. You can also use DOC or PostScript formats, or hand in a Jupyter notebook file. As programming language we prefer that you choose between C/C++, Fortran2008 or Python. The following prescription should be followed when preparing the report: + +\begin{itemize} + \item Use Canvas to hand in your projects, log in at \href{{https://www.uio.no/english/services/it/education/canvas/}}{\nolinkurl{https://www.uio.no/english/services/it/education/canvas/}} with your normal UiO username and password. + + \item Upload \textbf{only} the report file or the link to your GitHub/GitLab or similar type of repository! For the source code file(s) you have developed, please provide us with the link to your GitHub/GitLab or similar domain. The report file should include all of your discussions and a list of the codes you have developed. Do not include library files which are available at the course homepage, unless you have made specific changes to them. + + \item In your GitHub/GitLab or similar repository, please include a folder which contains selected results. These can be in the form of output from your code for a selected set of runs and input parameters. +\end{itemize} + +\noindent +Finally, +we encourage you to collaborate. Optimal working groups consist of +2-3 students. You can then hand in a common report.
+ + +% ------------------- end of main content --------------- + +\end{document} + diff --git a/doc/Textbooks/SuttonBartoIPRLBook2ndEd.pdf b/doc/Textbooks/SuttonBartoIPRLBook2ndEd.pdf new file mode 100644 index 000000000..f0a9e7792 Binary files /dev/null and b/doc/Textbooks/SuttonBartoIPRLBook2ndEd.pdf differ diff --git a/doc/pub/week37/html/._week37-bs000.html b/doc/pub/week37/html/._week37-bs000.html index 39cfcfc18..5cb7bff40 100644 --- a/doc/pub/week37/html/._week37-bs000.html +++ b/doc/pub/week37/html/._week37-bs000.html @@ -8,8 +8,8 @@ - -Week 37: Statistical interpretations and Resampling Methods + +Week 37: Gradient descent methods @@ -40,159 +40,222 @@ 2, None, 'plans-for-week-37-lecture-monday'), - ('Plans for week 37, lab sessions', + ('Readings and Videos:', 2, None, 'readings-and-videos'), + ('Material for lecture Monday September 8', 2, None, - 'plans-for-week-37-lab-sessions'), - ('Material for lecture Monday September 9', + 'material-for-lecture-monday-september-8'), + ('Gradient descent and revisiting Ordinary Least Squares from ' + 'last week', 2, None, - 'material-for-lecture-monday-september-9'), - ('Deriving OLS from a probability distribution', + 'gradient-descent-and-revisiting-ordinary-least-squares-from-last-week'), + ('Gradient descent example', 2, None, 'gradient-descent-example'), + ('The derivative of the cost/loss function', 2, None, - 'deriving-ols-from-a-probability-distribution'), - ('Independent and Identically Distrubuted (iid)', + 'the-derivative-of-the-cost-loss-function'), + ('The Hessian matrix', 2, None, 'the-hessian-matrix'), + ('Simple program', 2, None, 'simple-program'), + ('Gradient Descent Example', 2, None, 'gradient-descent-example'), + ('Gradient descent and Ridge', 2, None, - 'independent-and-identically-distrubuted-iid'), - ('Maximum Likelihood Estimation (MLE)', + 'gradient-descent-and-ridge'), + ('The Hessian matrix for Ridge Regression', 2, None, - 'maximum-likelihood-estimation-mle'), - ('A new Cost Function', 2, None, 'a-new-cost-function'), - ("More basic Statistics and Bayes' theorem", + 'the-hessian-matrix-for-ridge-regression'), + ('Program example for gradient descent with Ridge Regression', 2, None, - 'more-basic-statistics-and-bayes-theorem'), - ('Marginal Probability', 2, None, 'marginal-probability'), - ('Conditional Probability', 2, None, 'conditional-probability'), - ("Bayes' Theorem", 2, None, 'bayes-theorem'), - ("Interpretations of Bayes' Theorem", + 'program-example-for-gradient-descent-with-ridge-regression'), + ('Using gradient descent methods, limitations', 2, None, - 'interpretations-of-bayes-theorem'), - ("Example of Usage of Bayes' theorem", + 'using-gradient-descent-methods-limitations'), + ('Momentum based GD', 2, None, 'momentum-based-gd'), + ('Improving gradient descent with momentum', 2, None, - 'example-of-usage-of-bayes-theorem'), - ('Doing it correctly', 2, None, 'doing-it-correctly'), - ("Bayes' Theorem and Ridge and Lasso Regression", + 'improving-gradient-descent-with-momentum'), + ('Same code but now with momentum gradient descent', 2, None, - 'bayes-theorem-and-ridge-and-lasso-regression'), - ('Ridge and Bayes', 2, None, 'ridge-and-bayes'), - ('Lasso and Bayes', 2, None, 'lasso-and-bayes'), - ('Why resampling methods', 2, None, 'why-resampling-methods'), - ('Resampling methods', 2, None, 'resampling-methods'), - ('Resampling approaches can be computationally expensive', + 'same-code-but-now-with-momentum-gradient-descent'), + ('Overview video on Stochastic Gradient Descent (SGD)', 2, None, - 
'resampling-approaches-can-be-computationally-expensive'), - ('Why resampling methods ?', 2, None, 'why-resampling-methods'), - ('Statistical analysis', 2, None, 'statistical-analysis'), - ('Resampling methods', 2, None, 'resampling-methods'), - ('Resampling methods: Bootstrap', + 'overview-video-on-stochastic-gradient-descent-sgd'), + ('Batches and mini-batches', 2, None, 'batches-and-mini-batches'), + ('Pros and cons', 2, None, 'pros-and-cons'), + ('Convergence rates', 2, None, 'convergence-rates'), + ('Accuracy', 2, None, 'accuracy'), + ('Stochastic Gradient Descent (SGD)', 2, None, - 'resampling-methods-bootstrap'), - ('The Central Limit Theorem', + 'stochastic-gradient-descent-sgd'), + ('Stochastic Gradient Descent', 2, None, - 'the-central-limit-theorem'), - ('Finding the Limit', 2, None, 'finding-the-limit'), - ('Rewriting the $\\delta$-function', + 'stochastic-gradient-descent'), + ('Computation of gradients', 2, None, 'computation-of-gradients'), + ('SGD example', 2, None, 'sgd-example'), + ('The gradient step', 2, None, 'the-gradient-step'), + ('Simple example code', 2, None, 'simple-example-code'), + ('When do we stop?', 2, None, 'when-do-we-stop'), + ('Slightly different approach', 2, None, - 'rewriting-the-delta-function'), - ('Identifying Terms', 2, None, 'identifying-terms'), - ('Wrapping it up', 2, None, 'wrapping-it-up'), - ('Confidence Intervals', 2, None, 'confidence-intervals'), - ('Standard Approach based on the Normal Distribution', + 'slightly-different-approach'), + ('Time decay rate', 2, None, 'time-decay-rate'), + ('Code with a Number of Minibatches which varies', 2, None, - 'standard-approach-based-on-the-normal-distribution'), - ('Resampling methods: Bootstrap background', + 'code-with-a-number-of-minibatches-which-varies'), + ('Replace or not', 2, None, 'replace-or-not'), + ('SGD vs Full-Batch GD: Convergence Speed and Memory Comparison', 2, None, - 'resampling-methods-bootstrap-background'), - ('Resampling methods: More Bootstrap background', + 'sgd-vs-full-batch-gd-convergence-speed-and-memory-comparison'), + ('Theoretical Convergence Speed and convex optimization', + 3, + None, + 'theoretical-convergence-speed-and-convex-optimization'), + ('Strongly Convex Case', 3, None, 'strongly-convex-case'), + ('Non-Convex Problems', 3, None, 'non-convex-problems'), + ('Memory Usage and Scalability', + 2, + None, + 'memory-usage-and-scalability'), + ('Empirical Evidence: Convergence Time and Memory in Practice', + 2, + None, + 'empirical-evidence-convergence-time-and-memory-in-practice'), + ('Deep Neural Networks', 3, None, 'deep-neural-networks'), + ('Memory constraints', 3, None, 'memory-constraints'), + ('Second moment of the gradient', + 2, + None, + 'second-moment-of-the-gradient'), + ('Challenge: Choosing a Fixed Learning Rate', + 2, + None, + 'challenge-choosing-a-fixed-learning-rate'), + ('Motivation for Adaptive Step Sizes', + 2, + None, + 'motivation-for-adaptive-step-sizes'), + ('AdaGrad algorithm, taken from "Goodfellow et ' + 'al":"https://www.deeplearningbook.org/contents/optimization.html"', + 2, + None, + 'adagrad-algorithm-taken-from-goodfellow-et-al-https-www-deeplearningbook-org-contents-optimization-html'), + ('Derivation of the AdaGrad Algorithm', + 2, + None, + 'derivation-of-the-adagrad-algorithm'), + ('AdaGrad Update Rule Derivation', + 2, + None, + 'adagrad-update-rule-derivation'), + ('AdaGrad Properties', 2, None, 'adagrad-properties'), + ('RMSProp: Adaptive Learning Rates', + 2, + None, + 'rmsprop-adaptive-learning-rates'), + 
('RMSProp algorithm, taken from "Goodfellow et ' + 'al":"https://www.deeplearningbook.org/contents/optimization.html"', + 2, + None, + 'rmsprop-algorithm-taken-from-goodfellow-et-al-https-www-deeplearningbook-org-contents-optimization-html'), + ('Adam Optimizer', 2, None, 'adam-optimizer'), + ('"ADAM optimizer":"https://arxiv.org/abs/1412.6980"', + 2, + None, + 'adam-optimizer-https-arxiv-org-abs-1412-6980'), + ('Why Combine Momentum and RMSProp?', + 2, + None, + 'why-combine-momentum-and-rmsprop'), + ('Adam: Exponential Moving Averages (Moments)', 2, None, - 'resampling-methods-more-bootstrap-background'), - ('Resampling methods: Bootstrap approach', + 'adam-exponential-moving-averages-moments'), + ('Adam: Bias Correction', 2, None, 'adam-bias-correction'), + ('Adam: Update Rule Derivation', 2, None, - 'resampling-methods-bootstrap-approach'), - ('Resampling methods: Bootstrap steps', + 'adam-update-rule-derivation'), + ('Adam vs. AdaGrad and RMSProp', 2, None, - 'resampling-methods-bootstrap-steps'), - ('Code example for the Bootstrap method', + 'adam-vs-adagrad-and-rmsprop'), + ('Adaptivity Across Dimensions', 2, None, - 'code-example-for-the-bootstrap-method'), - ('Plotting the Histogram', 2, None, 'plotting-the-histogram'), - ('The bias-variance tradeoff', + 'adaptivity-across-dimensions'), + ('ADAM algorithm, taken from "Goodfellow et ' + 'al":"https://www.deeplearningbook.org/contents/optimization.html"', 2, None, - 'the-bias-variance-tradeoff'), - ('A way to Read the Bias-Variance Tradeoff', + 'adam-algorithm-taken-from-goodfellow-et-al-https-www-deeplearningbook-org-contents-optimization-html'), + ('Algorithms and codes for Adagrad, RMSprop and Adam', 2, None, - 'a-way-to-read-the-bias-variance-tradeoff'), - ('Example code for Bias-Variance tradeoff', + 'algorithms-and-codes-for-adagrad-rmsprop-and-adam'), + ('Practical tips', 2, None, 'practical-tips'), + ('Sneaking in automatic differentiation using Autograd', 2, None, - 'example-code-for-bias-variance-tradeoff'), - ('Understanding what happens', + 'sneaking-in-automatic-differentiation-using-autograd'), + ('Same code but now with momentum gradient descent', 2, None, - 'understanding-what-happens'), - ('Summing up', 2, None, 'summing-up'), - ("Another Example from Scikit-Learn's Repository", + 'same-code-but-now-with-momentum-gradient-descent'), + ('Including Stochastic Gradient Descent with Autograd', 2, None, - 'another-example-from-scikit-learn-s-repository'), - ('Various steps in cross-validation', + 'including-stochastic-gradient-descent-with-autograd'), + ('Same code but now with momentum gradient descent', 2, None, - 'various-steps-in-cross-validation'), - ('Cross-validation in brief', + 'same-code-but-now-with-momentum-gradient-descent'), + ("But none of these can compete with Newton's method", 2, None, - 'cross-validation-in-brief'), - ('Code Example for Cross-validation and $k$-fold ' - 'Cross-validation', + 'but-none-of-these-can-compete-with-newton-s-method'), + ('Similar (second order function now) problem but now with ' + 'AdaGrad', 2, None, - 'code-example-for-cross-validation-and-k-fold-cross-validation'), - ('More examples on bootstrap and cross-validation and errors', + 'similar-second-order-function-now-problem-but-now-with-adagrad'), + ('RMSprop for adaptive learning rate with Stochastic Gradient ' + 'Descent', 2, None, - 'more-examples-on-bootstrap-and-cross-validation-and-errors'), - ('The same example but now with cross-validation', + 
'rmsprop-for-adaptive-learning-rate-with-stochastic-gradient-descent'), + ('And finally "ADAM":"https://arxiv.org/pdf/1412.6980.pdf"', 2, None, - 'the-same-example-but-now-with-cross-validation'), + 'and-finally-adam-https-arxiv-org-pdf-1412-6980-pdf'), ('Material for the lab sessions', 2, None, 'material-for-the-lab-sessions'), - ('Linking the regression analysis with a statistical ' - 'interpretation', + ('Reminder on different scaling methods', 2, None, - 'linking-the-regression-analysis-with-a-statistical-interpretation'), - ('Assumptions made', 2, None, 'assumptions-made'), - ('Expectation value and variance', + 'reminder-on-different-scaling-methods'), + ('Functionality in Scikit-Learn', 2, None, - 'expectation-value-and-variance'), - ('Expectation value and variance for $\\boldsymbol{\\beta}$', + 'functionality-in-scikit-learn'), + ('More preprocessing', 2, None, 'more-preprocessing'), + ('Frequently used scaling functions', 2, None, - 'expectation-value-and-variance-for-boldsymbol-beta')]} + 'frequently-used-scaling-functions')]} end of tocinfo --> @@ -220,66 +283,86 @@ - Week 37: Statistical interpretations and Resampling Methods + Week 37: Gradient descent methods
    -

    Plans for week 37, lab sessions

    - +

    Readings and Videos:

    -Material for the lab sessions on Tuesday and Wednesday +

    -

      - -

    • Calculations of expectation values
    • - -

    • Discussion of resampling techniques
    • - -

    • Exercise set for week 37
    • - -

    • Work on project 1
    • - -

    • Video of exercise sessions week 37
    • - -

    • For more discussions of Ridge regression and calculation of averages, Wessel van Wieringen's article is highly recommended.
    • -
    +
      +

1. Recommended: Goodfellow et al, Deep Learning, introduction to gradient descent, see sections 4.3-4.5 at https://www.deeplearningbook.org/contents/numerical.html and sections 8.3-8.5 at https://www.deeplearningbook.org/contents/optimization.html
    2. +

3. Raschka et al, pages 37-44 and pages 278-283 with focus on linear regression.
    4. +

    5. Video on gradient descent at https://www.youtube.com/watch?v=sDv4f4s2SB8
    6. +

    7. Video on Stochastic gradient descent at https://www.youtube.com/watch?v=vMh0zPT0tLI
    8. +
    -

    Material for lecture Monday September 9

    +

    Material for lecture Monday September 8

    -

    Deriving OLS from a probability distribution

    +

    Gradient descent and revisiting Ordinary Least Squares from last week

    -

    Our basic assumption when we derived the OLS equations was to assume -that our output is determined by a given continuous function -\( f(\boldsymbol{x}) \) and a random noise \( \boldsymbol{\epsilon} \) given by the normal -distribution with zero mean value and an undetermined variance -\( \sigma^2 \). +

    Last week we started with linear regression as a case study for the gradient descent +methods. Linear regression is a great test case for the gradient +descent methods discussed in the lectures since it has several +desirable properties such as:

    -

    We found above that the outputs \( \boldsymbol{y} \) have a mean value given by -\( \boldsymbol{X}\hat{\boldsymbol{\beta}} \) and variance \( \sigma^2 \). Since the entries to -the design matrix are not stochastic variables, we can assume that the -probability distribution of our targets is also a normal distribution -but now with mean value \( \boldsymbol{X}\hat{\boldsymbol{\beta}} \). This means that a -single output \( y_i \) is given by the Gaussian distribution +

      +

    1. An analytical solution (recall homework sets for week 35).
    2. +

    3. The gradient can be computed analytically.
    4. +

    5. The cost function is convex which guarantees that gradient descent converges for small enough learning rates
    6. +
    +

    +

    We revisit an example similar to what we had in the first homework set. We have a function of the type

    + + + +
    +
    +
    +
    +
    +
import numpy as np
+m = 100  # number of data points
+x = 2*np.random.rand(m,1)
+y = 4+3*x+np.random.randn(m,1)
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    + +

where \( x_i \in [0,2] \) is chosen randomly using a uniform distribution. Additionally we have stochastic noise chosen according to a normal distribution \( \cal {N}(0,1) \). +The linear regression model is given by

    +

     
    +$$ +h_\theta(x) = \boldsymbol{y} = \theta_0 + \theta_1 x, +$$ +

     
    +

    such that

     
    $$ -y_i\sim \mathcal{N}(\boldsymbol{X}_{i,*}\boldsymbol{\beta}, \sigma^2)=\frac{1}{\sqrt{2\pi\sigma^2}}\exp{\left[-\frac{(y_i-\boldsymbol{X}_{i,*}\boldsymbol{\beta})^2}{2\sigma^2}\right]}. +\boldsymbol{y}_i = \theta_0 + \theta_1 x_i. $$

     

    -

    Independent and Identically Distrubuted (iid)

    +

    Gradient descent example

    -

    We assume now that the various \( y_i \) values are stochastically distributed according to the above Gaussian distribution. -We define this distribution as -

    +

    Let \( \mathbf{y} = (y_1,\cdots,y_n)^T \), \( \mathbf{\boldsymbol{y}} = (\boldsymbol{y}_1,\cdots,\boldsymbol{y}_n)^T \) and \( \theta = (\theta_0, \theta_1)^T \)

    + +

    It is convenient to write \( \mathbf{\boldsymbol{y}} = X\theta \) where \( X \in \mathbb{R}^{100 \times 2} \) is the design matrix given by (we keep the intercept here)

     
    $$ -p(y_i, \boldsymbol{X}\vert\boldsymbol{\beta})=\frac{1}{\sqrt{2\pi\sigma^2}}\exp{\left[-\frac{(y_i-\boldsymbol{X}_{i,*}\boldsymbol{\beta})^2}{2\sigma^2}\right]}, +X \equiv \begin{bmatrix} +1 & x_1 \\ +\vdots & \vdots \\ +1 & x_{100} & \\ +\end{bmatrix}. $$

     
    -

    which reads as finding the likelihood of an event \( y_i \) with the input variables \( \boldsymbol{X} \) given the parameters (to be determined) \( \boldsymbol{\beta} \).

    - -

    Since these events are assumed to be independent and identicall distributed we can build the probability distribution function (PDF) for all possible event \( \boldsymbol{y} \) as the product of the single events, that is we have

    - +

    The cost/loss/risk function is given by

     
    $$ -p(\boldsymbol{y},\boldsymbol{X}\vert\boldsymbol{\beta})=\prod_{i=0}^{n-1}\frac{1}{\sqrt{2\pi\sigma^2}}\exp{\left[-\frac{(y_i-\boldsymbol{X}_{i,*}\boldsymbol{\beta})^2}{2\sigma^2}\right]}=\prod_{i=0}^{n-1}p(y_i,\boldsymbol{X}\vert\boldsymbol{\beta}). +C(\theta) = \frac{1}{n}||X\theta-\mathbf{y}||_{2}^{2} = \frac{1}{n}\sum_{i=1}^{100}\left[ (\theta_0 + \theta_1 x_i)^2 - 2 y_i (\theta_0 + \theta_1 x_i) + y_i^2\right] $$

     
    -

    We will write this in a more compact form reserving \( \boldsymbol{D} \) for the domain of events, including the ouputs (targets) and the inputs. That is -in case we have a simple one-dimensional input and output case -

    +

    and we want to find \( \theta \) such that \( C(\theta) \) is minimized.

    +
    + +
    +

    The derivative of the cost/loss function

    + +

    Computing \( \partial C(\theta) / \partial \theta_0 \) and \( \partial C(\theta) / \partial \theta_1 \) we can show that the gradient can be written as

     
    $$ -\boldsymbol{D}=[(x_0,y_0), (x_1,y_1),\dots, (x_{n-1},y_{n-1})]. +\nabla_{\theta} C(\theta) = \frac{2}{n}\begin{bmatrix} \sum_{i=1}^{100} \left(\theta_0+\theta_1x_i-y_i\right) \\ +\sum_{i=1}^{100}\left( x_i (\theta_0+\theta_1x_i)-y_ix_i\right) \\ +\end{bmatrix} = \frac{2}{n}X^T(X\theta - \mathbf{y}), $$

     
    -

    In the more general case the various inputs should be replaced by the possible features represented by the input data set \( \boldsymbol{X} \). -We can now rewrite the above probability as -

    +

    where \( X \) is the design matrix defined above.
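As a quick sanity check of this expression, the analytic gradient can be compared with a finite-difference approximation of the cost function. The short sketch below assumes the same data-generating setup as in the example above; the helper names are illustrative only.

import numpy as np

np.random.seed(2025)
n = 100
x = 2*np.random.rand(n,1)
y = 4+3*x+np.random.randn(n,1)
X = np.c_[np.ones((n,1)), x]   # design matrix with intercept column

def cost(theta):
    # C(theta) = (1/n) ||X theta - y||_2^2
    residual = X @ theta - y
    return (residual.T @ residual).item()/n

def analytic_gradient(theta):
    # the expression derived above: (2/n) X^T (X theta - y)
    return (2.0/n)*X.T @ (X @ theta - y)

theta = np.random.randn(2,1)
eps = 1e-6
numerical_gradient = np.zeros_like(theta)
for j in range(theta.shape[0]):
    step = np.zeros_like(theta)
    step[j] = eps
    # central finite difference in component j
    numerical_gradient[j] = (cost(theta+step)-cost(theta-step))/(2*eps)

print(analytic_gradient(theta).ravel())
print(numerical_gradient.ravel())   # the two should agree to high precision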

    +
    + +
    +

    The Hessian matrix

    +

    The Hessian matrix of \( C(\theta) \) is given by

     
    $$ -p(\boldsymbol{D}\vert\boldsymbol{\beta})=\prod_{i=0}^{n-1}\frac{1}{\sqrt{2\pi\sigma^2}}\exp{\left[-\frac{(y_i-\boldsymbol{X}_{i,*}\boldsymbol{\beta})^2}{2\sigma^2}\right]}. +\boldsymbol{H} \equiv \begin{bmatrix} +\frac{\partial^2 C(\theta)}{\partial \theta_0^2} & \frac{\partial^2 C(\theta)}{\partial \theta_0 \partial \theta_1} \\ +\frac{\partial^2 C(\theta)}{\partial \theta_0 \partial \theta_1} & \frac{\partial^2 C(\theta)}{\partial \theta_1^2} & \\ +\end{bmatrix} = \frac{2}{n}X^T X. $$

     
    -

    It is a conditional probability (see below) and reads as the likelihood of a domain of events \( \boldsymbol{D} \) given a set of parameters \( \boldsymbol{\beta} \).

    +

This result implies that \( C(\theta) \) is a convex function since the matrix \( X^T X \) is always positive semi-definite.

    -

    Maximum Likelihood Estimation (MLE)

    - -

    In statistics, maximum likelihood estimation (MLE) is a method of -estimating the parameters of an assumed probability distribution, -given some observed data. This is achieved by maximizing a likelihood -function so that, under the assumed statistical model, the observed -data is the most probable. -

    +

    Simple program

    -

    We will assume here that our events are given by the above Gaussian -distribution and we will determine the optimal parameters \( \beta \) by -maximizing the above PDF. However, computing the derivatives of a -product function is cumbersome and can easily lead to overflow and/or -underflowproblems, with potentials for loss of numerical precision. -

    +

    We can now write a program that minimizes \( C(\theta) \) using the gradient descent method with a constant learning rate \( \eta \) according to

    +

     
    +$$ +\theta_{k+1} = \theta_k - \eta \nabla_\theta C(\theta_k), \ k=0,1,\cdots +$$ +

     
    -

    In practice, it is more convenient to maximize the logarithm of the -PDF because it is a monotonically increasing function of the argument. -Alternatively, and this will be our option, we will minimize the -negative of the logarithm since this is a monotonically decreasing -function. +

We can use the expression we computed for the gradient, let the initial +\( \theta_0 \) be chosen randomly and set, for example, \( \eta = 0.001 \). We stop iterating +when \( ||\nabla_\theta C(\theta_k) || \leq \epsilon = 10^{-8} \). Note that the code below does not include the latter stop criterion, and uses instead \( \eta = 1/\lambda_{\max} \), the inverse of the largest eigenvalue of the Hessian, as learning rate.

    -

    Note also that maximization/minimization of the logarithm of the PDF -is equivalent to the maximization/minimization of the function itself. +

    And finally we can compare our solution for \( \theta \) with the analytic result given by +\( \theta= (X^TX)^{-1} X^T \mathbf{y} \).
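Since the example code below omits the stopping test, here is a minimal sketch of the update loop with the norm-based criterion included (assuming the data and gradient expression of this section; with such a small constant learning rate the loop may need many iterations).

import numpy as np

np.random.seed(3155)
n = 100
x = 2*np.random.rand(n,1)
y = 4+3*x+np.random.randn(n,1)
X = np.c_[np.ones((n,1)), x]

eta = 0.001          # constant learning rate
epsilon = 1e-8       # tolerance for the gradient norm
max_iterations = 1000000

theta = np.random.randn(2,1)
for k in range(max_iterations):
    gradient = (2.0/n)*X.T @ (X @ theta - y)
    if np.linalg.norm(gradient) <= epsilon:
        print(f"Stopped after {k} iterations")
        break
    theta -= eta*gradient

# compare with the analytic OLS solution
theta_ols = np.linalg.inv(X.T @ X) @ X.T @ y
print(theta.ravel())
print(theta_ols.ravel())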

    -

    A new Cost Function

    +

    Gradient Descent Example

    + +

    Here our simple example

    + + +
    +
    +
    +
    +
    +
    # Importing various packages
    +from random import random, seed
    +import numpy as np
    +import matplotlib.pyplot as plt
    +from mpl_toolkits.mplot3d import Axes3D
    +from matplotlib import cm
    +from matplotlib.ticker import LinearLocator, FormatStrFormatter
    +import sys
    +
    +# the number of datapoints
    +n = 100
    +x = 2*np.random.rand(n,1)
    +y = 4+3*x+np.random.randn(n,1)
    +
    +X = np.c_[np.ones((n,1)), x]
    +# Hessian matrix
    +H = (2.0/n)* X.T @ X
    +# Get the eigenvalues
    +EigValues, EigVectors = np.linalg.eig(H)
    +print(f"Eigenvalues of Hessian Matrix:{EigValues}")
    +
    +theta_linreg = np.linalg.inv(X.T @ X) @ X.T @ y
    +print(theta_linreg)
    +theta = np.random.randn(2,1)
    +
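+# learning rate: the inverse of the largest Hessian eigenvalue; for this convex quadratic cost any eta < 2/lambda_max gives a convergent iteration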
    +eta = 1.0/np.max(EigValues)
    +Niterations = 1000
    +
    +for iter in range(Niterations):
    +    gradient = (2.0/n)*X.T @ (X @ theta-y)
    +    theta -= eta*gradient
    +
    +print(theta)
    +xnew = np.array([[0],[2]])
    +xbnew = np.c_[np.ones((2,1)), xnew]
    +ypredict = xbnew.dot(theta)
    +ypredict2 = xbnew.dot(theta_linreg)
    +plt.plot(xnew, ypredict, "r-")
    +plt.plot(xnew, ypredict2, "b-")
    +plt.plot(x, y ,'ro')
    +plt.axis([0,2.0,0, 15.0])
    +plt.xlabel(r'$x$')
    +plt.ylabel(r'$y$')
    +plt.title(r'Gradient descent example')
    +plt.show()
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    -

    We could now define a new cost function to minimize, namely the negative logarithm of the above PDF

    +
    +

    Gradient descent and Ridge

    +

    We have also discussed Ridge regression where the loss function contains a regularized term given by the \( L_2 \) norm of \( \theta \),

     
    $$ -C(\boldsymbol{\beta}=-\log{\prod_{i=0}^{n-1}p(y_i,\boldsymbol{X}\vert\boldsymbol{\beta})}=-\sum_{i=0}^{n-1}\log{p(y_i,\boldsymbol{X}\vert\boldsymbol{\beta})}, +C_{\text{ridge}}(\theta) = \frac{1}{n}||X\theta -\mathbf{y}||^2 + \lambda ||\theta||^2, \ \lambda \geq 0. $$

     
    -

    which becomes

    +

    In order to minimize \( C_{\text{ridge}}(\theta) \) using GD we adjust the gradient as follows

     
    $$ -C(\boldsymbol{\beta}=\frac{n}{2}\log{2\pi\sigma^2}+\frac{\vert\vert (\boldsymbol{y}-\boldsymbol{X}\boldsymbol{\beta})\vert\vert_2^2}{2\sigma^2}. +\nabla_\theta C_{\text{ridge}}(\theta) = \frac{2}{n}\begin{bmatrix} \sum_{i=1}^{100} \left(\theta_0+\theta_1x_i-y_i\right) \\ +\sum_{i=1}^{100}\left( x_i (\theta_0+\theta_1x_i)-y_ix_i\right) \\ +\end{bmatrix} + 2\lambda\begin{bmatrix} \theta_0 \\ \theta_1\end{bmatrix} = 2 (\frac{1}{n}X^T(X\theta - \mathbf{y})+\lambda \theta). $$

     
    -

    Taking the derivative of the new cost function with respect to the parameters \( \beta \) we recognize our familiar OLS equation, namely

    - +

    We can easily extend our program to minimize \( C_{\text{ridge}}(\theta) \) using gradient descent and compare with the analytical solution given by

     
    $$ -\boldsymbol{X}^T\left(\boldsymbol{y}-\boldsymbol{X}\boldsymbol{\beta}\right) =0, +\theta_{\text{ridge}} = \left(X^T X + n\lambda I_{2 \times 2} \right)^{-1} X^T \mathbf{y}. $$

     
    +

    -

    which leads to the well-known OLS equation for the optimal paramters \( \beta \)

    +
    +

    The Hessian matrix for Ridge Regression

    +

    The Hessian matrix of Ridge Regression for our simple example is given by

     
    $$ -\hat{\boldsymbol{\beta}}^{\mathrm{OLS}}=\left(\boldsymbol{X}^T\boldsymbol{X}\right)^{-1}\boldsymbol{X}^T\boldsymbol{y}! +\boldsymbol{H} \equiv \begin{bmatrix} +\frac{\partial^2 C(\theta)}{\partial \theta_0^2} & \frac{\partial^2 C(\theta)}{\partial \theta_0 \partial \theta_1} \\ +\frac{\partial^2 C(\theta)}{\partial \theta_0 \partial \theta_1} & \frac{\partial^2 C(\theta)}{\partial \theta_1^2} & \\ +\end{bmatrix} = \frac{2}{n}X^T X+2\lambda\boldsymbol{I}. $$

     
    -

    Before we make a similar analysis for Ridge and Lasso regression, we need a short reminder on statistics.

    +

    This implies that the Hessian matrix is positive definite, hence the stationary point is a +minimum. +Note that the Ridge cost function is convex being a sum of two convex +functions. Therefore, the stationary point is a global +minimum of this function. +

    -

    More basic Statistics and Bayes' theorem

    +

    Program example for gradient descent with Ridge Regression

    -

    A central theorem in statistics is Bayes' theorem. This theorem plays a similar role as the good old Pythagoras' theorem in geometry. -Bayes' theorem is extremely simple to derive. But to do so we need some basic axioms from statistics. -

    - -

    Assume we have two domains of events \( X=[x_0,x_1,\dots,x_{n-1}] \) and \( Y=[y_0,y_1,\dots,y_{n-1}] \).

    - -

    We define also the likelihood for \( X \) and \( Y \) as \( p(X) \) and \( p(Y) \) respectively. -The likelihood of a specific event \( x_i \) (or \( y_i \)) is then written as \( p(X=x_i) \) or just \( p(x_i)=p_i \). -

    - -
    -Union of events is given by -

    -

     
    -$$ -p(X \cup Y)= p(X)+p(Y)-p(X \cap Y). -$$ -

     
    + +

    +
    +
    +
    +
    +
    from random import random, seed
    +import numpy as np
    +import matplotlib.pyplot as plt
    +from mpl_toolkits.mplot3d import Axes3D
    +from matplotlib import cm
    +from matplotlib.ticker import LinearLocator, FormatStrFormatter
    +import sys
    +
    +# the number of datapoints
    +n = 100
    +x = 2*np.random.rand(n,1)
    +y = 4+3*x+np.random.randn(n,1)
    +
    +X = np.c_[np.ones((n,1)), x]
    +XT_X = X.T @ X
    +
    +#Ridge parameter lambda
    +lmbda  = 0.001
    +Id = n*lmbda* np.eye(XT_X.shape[0])
    +
    +# Hessian matrix
    +H = (2.0/n)* XT_X+2*lmbda* np.eye(XT_X.shape[0])
    +# Get the eigenvalues
    +EigValues, EigVectors = np.linalg.eig(H)
    +print(f"Eigenvalues of Hessian Matrix:{EigValues}")
    +
    +
    +theta_linreg = np.linalg.inv(XT_X+Id) @ X.T @ y
    +print(theta_linreg)
    +# Start plain gradient descent
    +theta = np.random.randn(2,1)
    +
    +eta = 1.0/np.max(EigValues)
    +Niterations = 100
    +
    +for iter in range(Niterations):
    +    gradients = 2.0/n*X.T @ (X @ (theta)-y)+2*lmbda*theta
    +    theta -= eta*gradients
    +
    +print(theta)
    +ypredict = X @ theta
    +ypredict2 = X @ theta_linreg
    +plt.plot(x, ypredict, "r-")
    +plt.plot(x, ypredict2, "b-")
    +plt.plot(x, y ,'ro')
    +plt.axis([0,2.0,0, 15.0])
    +plt.xlabel(r'$x$')
    +plt.ylabel(r'$y$')
    +plt.title(r'Gradient descent example for Ridge')
    +plt.show()
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +

    Using gradient descent methods, limitations

    -
    -The product rule (aka joint probability) is given by -

    -

     
    -$$ -p(X \cup Y)= p(X,Y)= p(X\vert Y)p(Y)=p(Y\vert X)p(X), -$$ -

     
    +

      +

    • Gradient descent (GD) finds local minima of our function. Since the GD algorithm is deterministic, if it converges, it will converge to a local minimum of our cost/loss/risk function. Because in ML we are often dealing with extremely rugged landscapes with many local minima, this can lead to poor performance.
    • +

    • GD is sensitive to initial conditions. One consequence of the local nature of GD is that initial conditions matter. Depending on where one starts, one will end up at a different local minima. Therefore, it is very important to think about how one initializes the training process. This is true for GD as well as more complicated variants of GD.
    • +

    • Gradients are computationally expensive to calculate for large datasets. In many cases in statistics and ML, the cost/loss/risk function is a sum of terms, with one term for each data point. For example, in linear regression, \( E \propto \sum_{i=1}^n (y_i - \mathbf{w}^T\cdot\mathbf{x}_i)^2 \); for logistic regression, the square error is replaced by the cross entropy. To calculate the gradient we have to sum over all \( n \) data points. Doing this at every GD step becomes extremely computationally expensive. An ingenious solution to this, is to calculate the gradients using small subsets of the data called "mini batches". This has the added benefit of introducing stochasticity into our algorithm.
    • +

• GD is very sensitive to choices of learning rates. GD is extremely sensitive to the choice of learning rates. If the learning rate is very small, the training process takes an extremely long time. For larger learning rates, GD can diverge and give poor results. Furthermore, depending on what the local landscape looks like, we have to modify the learning rates to ensure convergence. Ideally, we would adaptively choose the learning rates to match the landscape.
    • +

    • GD treats all directions in parameter space uniformly. Another major drawback of GD is that unlike Newton's method, the learning rate for GD is the same in all directions in parameter space. For this reason, the maximum learning rate is set by the behavior of the steepest direction and this can significantly slow down training. Ideally, we would like to take large steps in flat directions and small steps in steep directions. Since we are exploring rugged landscapes where curvatures change, this requires us to keep track of not only the gradient but second derivatives. The ideal scenario would be to calculate the Hessian but this proves to be too computationally expensive.
    • +

    • GD can take exponential time to escape saddle points, even with random initialization. As we mentioned, GD is extremely sensitive to initial condition since it determines the particular local minimum GD would eventually reach. However, even with a good initialization scheme, through the introduction of randomness, GD can still take exponential time to escape saddle points.
    • +
    +
    -

    where we read \( p(X\vert Y) \) as the likelihood of obtaining \( X \) given \( Y \).

    - +
    +

    Momentum based GD

    -

    If we have independent events then \( p(X,Y)=p(X)p(Y) \).

    +

We discuss here some simple examples where we introduce what is called +'memory' about previous steps, or what is normally called momentum +gradient descent. +For the mathematical details, see the whiteboard notes from the lecture on September 8, 2025. +
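As a brief reminder (a standard formulation; the whiteboard notes may use slightly different symbols), momentum gradient descent keeps an exponentially damped memory of earlier update directions,

$$
v_{k+1} = \gamma v_k + \eta \nabla_\theta C(\theta_k), \qquad \theta_{k+1} = \theta_k - v_{k+1},
$$

where \( 0 \leq \gamma < 1 \) is the momentum parameter and \( \gamma = 0 \) recovers plain gradient descent. This is the update implemented further below as new_change = step_size * gradient + momentum * change.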

    -

    Marginal Probability

    +

    Improving gradient descent with momentum

    -

    The marginal probability is defined in terms of only one of the set of variables \( X,Y \). For a discrete probability we have

    -
    - -

    -

     
    -$$ -p(X)=\sum_{i=0}^{n-1}p(X,Y=y_i)=\sum_{i=0}^{n-1}p(X\vert Y=y_i)p(Y=y_i)=\sum_{i=0}^{n-1}p(X\vert y_i)p(y_i). -$$ -

     
    + + +

    +
    +
    +
    +
    +
    from numpy import asarray
    +from numpy import arange
    +from numpy.random import rand
    +from numpy.random import seed
    +from matplotlib import pyplot
    + 
    +# objective function
    +def objective(x):
    +	return x**2.0
    + 
    +# derivative of objective function
    +def derivative(x):
    +	return x * 2.0
    + 
    +# gradient descent algorithm
    +def gradient_descent(objective, derivative, bounds, n_iter, step_size):
    +	# track all solutions
    +	solutions, scores = list(), list()
    +	# generate an initial point
    +	solution = bounds[:, 0] + rand(len(bounds)) * (bounds[:, 1] - bounds[:, 0])
    +	# run the gradient descent
    +	for i in range(n_iter):
    +		# calculate gradient
    +		gradient = derivative(solution)
    +		# take a step
    +		solution = solution - step_size * gradient
    +		# evaluate candidate point
    +		solution_eval = objective(solution)
    +		# store solution
    +		solutions.append(solution)
    +		scores.append(solution_eval)
    +		# report progress
    +		print('>%d f(%s) = %.5f' % (i, solution, solution_eval))
    +	return [solutions, scores]
    + 
    +# seed the pseudo random number generator
    +seed(4)
    +# define range for input
    +bounds = asarray([[-1.0, 1.0]])
    +# define the total iterations
    +n_iter = 30
    +# define the step size
    +step_size = 0.1
    +# perform the gradient descent search
    +solutions, scores = gradient_descent(objective, derivative, bounds, n_iter, step_size)
    +# sample input range uniformly at 0.1 increments
    +inputs = arange(bounds[0,0], bounds[0,1]+0.1, 0.1)
    +# compute targets
    +results = objective(inputs)
    +# create a line plot of input vs result
    +pyplot.plot(inputs, results)
    +# plot the solutions found
    +pyplot.plot(solutions, scores, '.-', color='red')
    +# show the plot
    +pyplot.show()
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    -

    Conditional Probability

    +

    Same code but now with momentum gradient descent

    -

    The conditional probability, if \( p(Y) > 0 \), is

    -
    - -

    -

     
    -$$ -p(X\vert Y)= \frac{p(X,Y)}{p(Y)}=\frac{p(X,Y)}{\sum_{i=0}^{n-1}p(Y\vert X=x_i)p(x_i)}. -$$ -

     
    + + +

    +
    +
    +
    +
    +
    from numpy import asarray
    +from numpy import arange
    +from numpy.random import rand
    +from numpy.random import seed
    +from matplotlib import pyplot
    + 
    +# objective function
    +def objective(x):
    +	return x**2.0
    + 
    +# derivative of objective function
    +def derivative(x):
    +	return x * 2.0
    + 
    +# gradient descent algorithm
    +def gradient_descent(objective, derivative, bounds, n_iter, step_size, momentum):
    +	# track all solutions
    +	solutions, scores = list(), list()
    +	# generate an initial point
    +	solution = bounds[:, 0] + rand(len(bounds)) * (bounds[:, 1] - bounds[:, 0])
    +	# keep track of the change
    +	change = 0.0
    +	# run the gradient descent
    +	for i in range(n_iter):
    +		# calculate gradient
    +		gradient = derivative(solution)
    +		# calculate update
    +		new_change = step_size * gradient + momentum * change
    +		# take a step
    +		solution = solution - new_change
    +		# save the change
    +		change = new_change
    +		# evaluate candidate point
    +		solution_eval = objective(solution)
    +		# store solution
    +		solutions.append(solution)
    +		scores.append(solution_eval)
    +		# report progress
    +		print('>%d f(%s) = %.5f' % (i, solution, solution_eval))
    +	return [solutions, scores]
    + 
    +# seed the pseudo random number generator
    +seed(4)
    +# define range for input
    +bounds = asarray([[-1.0, 1.0]])
    +# define the total iterations
    +n_iter = 30
    +# define the step size
    +step_size = 0.1
    +# define momentum
    +momentum = 0.3
    +# perform the gradient descent search with momentum
    +solutions, scores = gradient_descent(objective, derivative, bounds, n_iter, step_size, momentum)
    +# sample input range uniformly at 0.1 increments
    +inputs = arange(bounds[0,0], bounds[0,1]+0.1, 0.1)
    +# compute targets
    +results = objective(inputs)
    +# create a line plot of input vs result
    +pyplot.plot(inputs, results)
    +# plot the solutions found
    +pyplot.plot(solutions, scores, '.-', color='red')
    +# show the plot
    +pyplot.show()
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    -

    Bayes' Theorem

    - -

    If we combine the conditional probability with the marginal probability and the standard product rule, we have

    -

     
    -$$ -p(X\vert Y)= \frac{p(X,Y)}{p(Y)}, -$$ -

     
    +

    Overview video on Stochastic Gradient Descent (SGD)

    -

    which we can rewrite as

    - -

     
    -$$ -p(X\vert Y)= \frac{p(X,Y)}{\sum_{i=0}^{n-1}p(Y\vert X=x_i)p(x_i)}=\frac{p(Y\vert X)p(X)}{\sum_{i=0}^{n-1}p(Y\vert X=x_i)p(x_i)}, -$$ -

     
    +What is Stochastic Gradient Descent +

    There are several reasons for using stochastic gradient descent. Some of these are:

    -

    which is Bayes' theorem. It allows us to evaluate the uncertainty in in \( X \) after we have observed \( Y \). We can easily interchange \( X \) with \( Y \).

    +
      +

    1. Efficiency: Updates weights more frequently using a single or a small batch of samples, which speeds up convergence.
    2. +

3. Local minima: the noise from random sampling can hopefully help the method escape poor local minima
    4. +

    5. Memory Usage: Requires less memory compared to computing gradients for the entire dataset.
    6. +
    -

    Interpretations of Bayes' Theorem

    +

    Batches and mini-batches

    + +

    In gradient descent we compute the cost function and its gradient for all data points we have.

    -

    The quantity \( p(Y\vert X) \) on the right-hand side of the theorem is -evaluated for the observed data \( Y \) and can be viewed as a function of -the parameter space represented by \( X \). This function is not -necesseraly normalized and is normally called the likelihood function. +

In large-scale applications such as the ILSVRC challenge, the +training data can have on the order of millions of examples. Hence, it +seems wasteful to compute the full cost function over the entire +training set in order to perform only a single parameter update. A +very common approach to addressing this challenge is to compute the +gradient over batches of the training data. For example, a typical batch could contain a few thousand examples from +an entire training set of several millions. This batch is then used to +perform a parameter update.

    +
    -

    The function \( p(X) \) on the right hand side is called the prior while the function on the left hand side is the called the posterior probability. The denominator on the right hand side serves as a normalization factor for the posterior distribution.

    +
    +

    Pros and cons

    -

    Let us try to illustrate Bayes' theorem through an example.

    +
      +

    1. Speed: SGD is faster than gradient descent because it uses only one training example per iteration, whereas gradient descent requires the entire dataset. This speed advantage becomes more significant as the size of the dataset increases.
    2. +

    3. Convergence: Gradient descent has a more predictable convergence behaviour because it uses the average gradient of the entire dataset. In contrast, SGD’s convergence behaviour can be more erratic due to its random sampling of individual training examples.
    4. +

    5. Memory: Gradient descent requires more memory than SGD because it must store the entire dataset for each iteration. SGD only needs to store the current training example, making it more memory-efficient.
    6. +
    -

    Example of Usage of Bayes' theorem

    +

    Convergence rates

    -

    Let us suppose that you are undergoing a series of mammography scans in -order to rule out possible breast cancer cases. We define the -sensitivity for a positive event by the variable \( X \). It takes binary -values with \( X=1 \) representing a positive event and \( X=0 \) being a -negative event. We reserve \( Y \) as a classification parameter for -either a negative or a positive breast cancer confirmation. (Short note on wordings: positive here means having breast cancer, although none of us would consider this being a positive thing). -

    +
      +

    1. Stochastic Gradient Descent has a faster convergence rate due to the use of single training examples in each iteration.
    2. +

3. Gradient Descent has a slower convergence rate, as it uses the entire dataset for each iteration.
    4. +
    +
    -

    We let \( Y=1 \) represent the the case of having breast cancer and \( Y=0 \) as not.

    +
    +

    Accuracy

    -

    Let us assume that if you have breast cancer, the test will be positive with a probability of \( 0.8 \), that is we have

    +

In general, Stochastic Gradient Descent is less accurate than gradient +descent, as it calculates the gradient on single examples, which may +not accurately represent the overall dataset. Gradient Descent is +more accurate because it uses the average gradient calculated over the +entire dataset. +

    -

     
    -$$ -p(X=1\vert Y=1) =0.8. -$$ -

     
    +

    There are other disadvantages to using SGD. The main drawback is that +its convergence behaviour can be more erratic due to the random +sampling of individual training examples. This can lead to less +accurate results, as the algorithm may not converge to the true +minimum of the cost function. Additionally, the learning rate, which +determines the step size of each update to the model’s parameters, +must be carefully chosen to ensure convergence. +

    -

    This obviously sounds scary since many would conclude that if the test is positive, there is a likelihood of \( 80\% \) for having cancer. -It is however not correct, as the following Bayesian analysis shows. +

It is however the method of choice in deep learning algorithms, where +SGD is often used in combination with other optimization techniques, +such as momentum or adaptive learning rates.

    -

    Doing it correctly

    +

    Stochastic Gradient Descent (SGD)

    + +

In stochastic gradient descent, the extreme case is the one where each +minibatch contains only a single data point. +

    + +

    This process is called Stochastic Gradient +Descent (SGD) (or also sometimes on-line gradient descent). This is +relatively less common to see because in practice due to vectorized +code optimizations it can be computationally much more efficient to +evaluate the gradient for 100 examples, than the gradient for one +example 100 times. Even though SGD technically refers to using a +single example at a time to evaluate the gradient, you will hear +people use the term SGD even when referring to mini-batch gradient +descent (i.e. mentions of MGD for “Minibatch Gradient Descent”, or BGD +for “Batch gradient descent” are rare to see), where it is usually +assumed that mini-batches are used. The size of the mini-batch is a +hyperparameter but it is not very common to cross-validate or bootstrap it. It is +usually based on memory constraints (if any), or set to some value, +e.g. 32, 64 or 128. We use powers of 2 in practice because many +vectorized operation implementations work faster when their inputs are +sized in powers of 2. +

    + +

    In our notes with SGD we mean stochastic gradient descent with mini-batches.

    +
    + +
    +

    Stochastic Gradient Descent

    -

    If we look at various national surveys on breast cancer, the general likelihood of developing breast cancer is a very small number. -Let us assume that the prior probability in the population as a whole is +

    Stochastic gradient descent (SGD) and variants thereof address some of +the shortcomings of the Gradient descent method discussed above.

    +

    The underlying idea of SGD comes from the observation that the cost +function, which we want to minimize, can almost always be written as a +sum over \( n \) data points \( \{\mathbf{x}_i\}_{i=1}^n \), +

     
    $$ -p(Y=1) =0.004. +C(\mathbf{\theta}) = \sum_{i=1}^n c_i(\mathbf{x}_i, +\mathbf{\theta}). $$

     
    +

    + +
    +

    Computation of gradients

    -

    We need also to account for the fact that the test may produce a false positive result (false alarm). Let us here assume that we have

    +

    This in turn means that the gradient can be +computed as a sum over \( i \)-gradients +

     
    $$ -p(X=1\vert Y=0) =0.1. +\nabla_\theta C(\mathbf{\theta}) = \sum_i^n \nabla_\theta c_i(\mathbf{x}_i, +\mathbf{\theta}). $$

     
    -

    Using Bayes' theorem we can then find the posterior probability that the person has breast cancer in case of a positive test, that is we can compute

    +

    Stochasticity/randomness is introduced by only taking the +gradient on a subset of the data called minibatches. If there are \( n \) +data points and the size of each minibatch is \( M \), there will be \( n/M \) +minibatches. We denote these minibatches by \( B_k \) where +\( k=1,\cdots,n/M \). +

    +
    + +
    +

    SGD example

    +

As an example, suppose we have \( 10 \) data points \( (\mathbf{x}_1,\cdots, \mathbf{x}_{10}) \) +and we choose a minibatch size of \( M=2 \). This gives \( n/M=5 \) minibatches, +each containing two data points. In particular we have +\( B_1 = (\mathbf{x}_1,\mathbf{x}_2), \cdots, B_5 = +(\mathbf{x}_9,\mathbf{x}_{10}) \). Note that if you choose \( M=n \) you +have only a single batch with all data points and on the other extreme, +you may choose \( M=1 \), resulting in a minibatch for each datapoint, i.e. +\( B_k = \mathbf{x}_k \). +

    +

The idea is now to approximate the gradient by replacing the sum over +all data points with a sum over the data points in one of the minibatches +picked at random in each gradient descent step +

     
    $$ -p(Y=1\vert X=1)=\frac{p(X=1\vert Y=1)p(Y=1)}{p(X=1\vert Y=1)p(Y=1)+p(X=1\vert Y=0)p(Y=0)}=\frac{0.8\times 0.004}{0.8\times 0.004+0.1\times 0.996}=0.031. +\nabla_{\theta} +C(\mathbf{\theta}) = \sum_{i=1}^n \nabla_\theta c_i(\mathbf{x}_i, +\mathbf{\theta}) \rightarrow \sum_{i \in B_k}^n \nabla_\theta +c_i(\mathbf{x}_i, \mathbf{\theta}). $$

     
    - -

    That is, in case of a positive test, there is only a \( 3\% \) chance of having breast cancer!

    -

    Bayes' Theorem and Ridge and Lasso Regression

    +

    The gradient step

    -

    Using Bayes' theorem we can gain a better intuition about Ridge and Lasso regression.

    - -

    For ordinary least squares we postulated that the maximum likelihood for the doamin of events \( \boldsymbol{D} \) (one-dimensional case)

    +

    Thus a gradient descent step now looks like

     
    $$ -\boldsymbol{D}=[(x_0,y_0), (x_1,y_1),\dots, (x_{n-1},y_{n-1})], +\theta_{j+1} = \theta_j - \eta_j \sum_{i \in B_k}^n \nabla_\theta c_i(\mathbf{x}_i, +\mathbf{\theta}) $$

     
    -

    is given by

    -

     
    -$$ -p(\boldsymbol{D}\vert\boldsymbol{\beta})=\prod_{i=0}^{n-1}\frac{1}{\sqrt{2\pi\sigma^2}}\exp{\left[-\frac{(y_i-\boldsymbol{X}_{i,*}\boldsymbol{\beta})^2}{2\sigma^2}\right]}. -$$ -

     
    +

where \( k \) is picked at random with equal +probability from \( [1,n/M] \). An iteration over the number of +minibatches (n/M) is commonly referred to as an epoch. Thus it is +typical to choose a number of epochs and for each epoch iterate over +the number of minibatches, as exemplified in the code below. +

    +
    -

    In Bayes' theorem this function plays the role of the so-called likelihood. We could now ask the question what is the posterior probability of a parameter set \( \boldsymbol{\beta} \) given a domain of events \( \boldsymbol{D} \)? That is, how can we define the posterior probability

    +
    +

    Simple example code

    -

     
    -$$ -p(\boldsymbol{\beta}\vert\boldsymbol{D}). -$$ -

     
    -

    Bayes' theorem comes to our rescue here since (omitting the normalization constant)

    -

     
    -$$ -p(\boldsymbol{\beta}\vert\boldsymbol{D})\propto p(\boldsymbol{D}\vert\boldsymbol{\beta})p(\boldsymbol{\beta}). -$$ -

     
    + +

    +
    +
    +
    +
    +
    import numpy as np 
    +
    +n = 100 #100 datapoints 
    +M = 5   #size of each minibatch
    +m = int(n/M) #number of minibatches
    +n_epochs = 10 #number of epochs
    +
    +j = 0
    +for epoch in range(1,n_epochs+1):
    +    for i in range(m):
    +        k = np.random.randint(m) #Pick the k-th minibatch at random
    +        #Compute the gradient using the data in minibatch Bk
+        #Compute new suggestion for theta
    +        j += 1
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    -

    We have a model for \( p(\boldsymbol{D}\vert\boldsymbol{\beta}) \) but need one for the prior \( p(\boldsymbol{\beta}) \)!

    +

Taking the gradient only on a subset of the data has two important +benefits. First, it introduces randomness which decreases the chance +that our optimization scheme gets stuck in a local minimum. Second, if +the size of the minibatches is small relative to the number of +datapoints (\( M < n \)), the computation of the gradient is much +cheaper since we sum over the datapoints in the \( k \)-th minibatch and not +all \( n \) datapoints. +

    -

    Ridge and Bayes

    - -

    With the posterior probability defined by a likelihood which we have -already modeled and an unknown prior, we are now ready to make -additional models for the prior. +

    When do we stop?

    + +

A natural question is: when do we stop the search for a new minimum? +One possibility is to compute the full gradient after a given number +of epochs and check if the norm of the gradient is smaller than some +threshold and stop if true. However, the condition that the gradient +is zero is valid also for local minima, so this would only tell us +that we are close to a local/global minimum. Alternatively, we could +evaluate the cost function at this point, store the result and +continue the search. If the test kicks in at a later stage we can +compare the values of the cost function and keep the \( \theta \) that +gave the lowest value.
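A minimal sketch of this bookkeeping, using the linear regression cost from the earlier examples (the function and variable names are illustrative and not part of the course code), could look like this:

import numpy as np

def sgd_with_checkpoint(X, y, theta0, eta, n_epochs, M, tol=1e-8):
    # mini-batch SGD that remembers the best parameters seen so far
    n = X.shape[0]
    m = int(n/M)                                   # number of minibatches
    theta = theta0.copy()
    cost = lambda th: np.mean((X @ th - y)**2)
    best_theta, best_cost = theta.copy(), cost(theta)
    for epoch in range(n_epochs):
        for i in range(m):
            k = M*np.random.randint(m)             # pick a minibatch at random
            xi, yi = X[k:k+M], y[k:k+M]
            gradient = (2.0/M)*xi.T @ (xi @ theta - yi)
            theta -= eta*gradient
        # end of epoch: evaluate full gradient and cost, keep the best theta so far
        full_gradient = (2.0/n)*X.T @ (X @ theta - y)
        current_cost = cost(theta)
        if current_cost < best_cost:
            best_theta, best_cost = theta.copy(), current_cost
        if np.linalg.norm(full_gradient) <= tol:
            break
    return best_theta, best_cost

# usage with the data from the earlier examples:
# best_theta, best_cost = sgd_with_checkpoint(X, y, np.random.randn(2,1), 0.01, 50, 5)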

    +
    -

    We can, based on our discussions of the variance of \( \boldsymbol{\beta} \) and the mean value, assume that the prior for the values \( \boldsymbol{\beta} \) is given by a Gaussian with mean value zero and variance \( \tau^2 \), that is

    +
    +

    Slightly different approach

    + +

Another approach is to let the step length \( \eta_j \) depend on the +number of epochs in such a way that it becomes very small after a +reasonable time, so that we essentially stop moving. Such approaches are +also called scaling or learning rate schedules. There are many ways to scale the learning +rate; see +https://towardsdatascience.com/learning-rate-schedules-and-adaptive-learning-rate-methods-for-deep-learning-2c8f433990d1 +for a discussion of different scaling functions for the learning rate. +

    +
    -

     
    -$$ -p(\boldsymbol{\beta})=\prod_{j=0}^{p-1}\exp{\left(-\frac{\beta_j^2}{2\tau^2}\right)}. -$$ -

     
    +

    +

    Time decay rate

    -

    Our posterior probability becomes then (omitting the normalization factor which is just a constant)

    -

     
    -$$ -p(\boldsymbol{\beta\vert\boldsymbol{D})}=\prod_{i=0}^{n-1}\frac{1}{\sqrt{2\pi\sigma^2}}\exp{\left[-\frac{(y_i-\boldsymbol{X}_{i,*}\boldsymbol{\beta})^2}{2\sigma^2}\right]}\prod_{j=0}^{p-1}\exp{\left(-\frac{\beta_j^2}{2\tau^2}\right)}. -$$ -

     
    +

    As an example, let \( e = 0,1,2,3,\cdots \) denote the current epoch and let \( t_0, t_1 > 0 \) be two fixed numbers. Furthermore, let \( t = e \cdot m + i \) where \( m \) is the number of minibatches and \( i=0,\cdots,m-1 \). Then the function

     
    +$$\eta_j(t; t_0, t_1) = \frac{t_0}{t+t_1} $$ +

     
    goes to zero as the number of epochs gets large. I.e. we start with a step length \( \eta_j (0; t_0, t_1) = t_0/t_1 \) which decays in time \( t \).

    -

    We can now optimize this quantity with respect to \( \boldsymbol{\beta} \). As we -did for OLS, this is most conveniently done by taking the negative -logarithm of the posterior probability. Doing so and leaving out the -constants terms that do not depend on \( \beta \), we have +

    In this way we can fix the number of epochs, compute \( \theta \) and +evaluate the cost function at the end. Repeating the computation will +give a different result since the scheme is random by design. Then we +pick the final \( \theta \) that gives the lowest value of the cost +function.

    -

     
    -$$ -C(\boldsymbol{\beta})=\frac{\vert\vert (\boldsymbol{y}-\boldsymbol{X}\boldsymbol{\beta})\vert\vert_2^2}{2\sigma^2}+\frac{1}{2\tau^2}\vert\vert\boldsymbol{\beta}\vert\vert_2^2, -$$ -

     
    -

    and replacing \( 1/2\tau^2 \) with \( \lambda \) we have

    + +
    +
    +
    +
    +
    +
    import numpy as np 
    +
    +def step_length(t,t0,t1):
    +    return t0/(t+t1)
    +
    +n = 100 #100 datapoints 
    +M = 5   #size of each minibatch
    +m = int(n/M) #number of minibatches
    +n_epochs = 500 #number of epochs
    +t0 = 1.0
    +t1 = 10
    +
    +eta_j = t0/t1
    +j = 0
    +for epoch in range(1,n_epochs+1):
    +    for i in range(m):
    +        k = np.random.randint(m) #Pick the k-th minibatch at random
    +        #Compute the gradient using the data in minibatch Bk
    +        #Compute new suggestion for theta
    +        t = epoch*m+i
    +        eta_j = step_length(t,t0,t1)
    +        j += 1
    +
    +print("eta_j after %d epochs: %g" % (n_epochs,eta_j))
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    + +
    +

    Code with a Number of Minibatches which varies

    -

     
    -$$ -C(\boldsymbol{\beta})=\frac{\vert\vert (\boldsymbol{y}-\boldsymbol{X}\boldsymbol{\beta})\vert\vert_2^2}{2\sigma^2}+\lambda\vert\vert\boldsymbol{\beta}\vert\vert_2^2, -$$ -

     
    +

    In the code here we vary the number of mini-batches.

    -

    which is our Ridge cost function! Nice, isn't it?

    + +
    +
    +
    +
    +
    +
    # Importing various packages
    +from math import exp, sqrt
    +from random import random, seed
    +import numpy as np
    +import matplotlib.pyplot as plt
    +
    +n = 100
    +x = 2*np.random.rand(n,1)
    +y = 4+3*x+np.random.randn(n,1)
    +
    +X = np.c_[np.ones((n,1)), x]
    +XT_X = X.T @ X
    +theta_linreg = np.linalg.inv(X.T @ X) @ (X.T @ y)
    +print("Own inversion")
    +print(theta_linreg)
    +# Hessian matrix
    +H = (2.0/n)* XT_X
    +EigValues, EigVectors = np.linalg.eig(H)
    +print(f"Eigenvalues of Hessian Matrix:{EigValues}")
    +
    +theta = np.random.randn(2,1)
    +eta = 1.0/np.max(EigValues)
    +Niterations = 1000
    +
    +
    +for iter in range(Niterations):
    +    gradients = 2.0/n*X.T @ ((X @ theta)-y)
    +    theta -= eta*gradients
    +print("theta from own gd")
    +print(theta)
    +
    +xnew = np.array([[0],[2]])
    +Xnew = np.c_[np.ones((2,1)), xnew]
    +ypredict = Xnew.dot(theta)
    +ypredict2 = Xnew.dot(theta_linreg)
    +
    +n_epochs = 50
    +M = 5   #size of each minibatch
    +m = int(n/M) #number of minibatches
    +t0, t1 = 5, 50
    +
    +def learning_schedule(t):
    +    return t0/(t+t1)
    +
    +theta = np.random.randn(2,1)
    +
    +for epoch in range(n_epochs):
    +# Can you figure out a better way of setting up the contributions to each batch?
    +    for i in range(m):
    +        random_index = M*np.random.randint(m)
    +        xi = X[random_index:random_index+M]
    +        yi = y[random_index:random_index+M]
    +        gradients = (2.0/M)* xi.T @ ((xi @ theta)-yi)
    +        eta = learning_schedule(epoch*m+i)
    +        theta = theta - eta*gradients
    +print("theta from own sdg")
    +print(theta)
    +
    +plt.plot(xnew, ypredict, "r-")
    +plt.plot(xnew, ypredict2, "b-")
    +plt.plot(x, y ,'ro')
    +plt.axis([0,2.0,0, 15.0])
    +plt.xlabel(r'$x$')
    +plt.ylabel(r'$y$')
    +plt.title(r'Random numbers ')
    +plt.show()
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    -

    Lasso and Bayes

    +

    Replace or not

    + +

In the above code, we have used sampling with replacement when setting up the +mini-batches. The discussion +here may be +useful. +
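For comparison, here is a minimal sketch of setting up the mini-batches without replacement, by shuffling the data indices once per epoch so that every data point is used exactly once per epoch (variable names are illustrative, not from the course code):

import numpy as np

n = 100
M = 5                      # size of each minibatch
m = int(n/M)               # number of minibatches
n_epochs = 50
eta = 0.01

x = 2*np.random.rand(n,1)
y = 4+3*x+np.random.randn(n,1)
X = np.c_[np.ones((n,1)), x]
theta = np.random.randn(2,1)

for epoch in range(n_epochs):
    indices = np.random.permutation(n)            # shuffle once per epoch
    for i in range(m):
        batch = indices[i*M:(i+1)*M]              # without replacement within an epoch
        xi, yi = X[batch], y[batch]
        gradients = (2.0/M)*xi.T @ (xi @ theta - yi)
        theta -= eta*gradients

print(theta)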

    +
    -

    To derive the Lasso cost function, we simply replace the Gaussian prior with an exponential distribution (Laplace in this case) with zero mean value, that is

    +
    +

    SGD vs Full-Batch GD: Convergence Speed and Memory Comparison

    +

    Theoretical Convergence Speed and convex optimization

    +

    Consider minimizing an empirical cost function

     
    $$ -p(\boldsymbol{\beta})=\prod_{j=0}^{p-1}\exp{\left(-\frac{\vert\beta_j\vert}{\tau}\right)}. +C(\theta) =\frac{1}{N}\sum_{i=1}^N l_i(\theta), $$

     
    -

    Our posterior probability becomes then (omitting the normalization factor which is just a constant)

    +

    where each \( l_i(\theta) \) is a +differentiable loss term. Gradient Descent (GD) updates parameters +using the full gradient \( \nabla C(\theta) \), while Stochastic Gradient +Descent (SGD) uses a single sample (or mini-batch) gradient \( \nabla +l_i(\theta) \) selected at random. In equation form, one GD step is: +

    +

     
    $$ -p(\boldsymbol{\beta}\vert\boldsymbol{D})=\prod_{i=0}^{n-1}\frac{1}{\sqrt{2\pi\sigma^2}}\exp{\left[-\frac{(y_i-\boldsymbol{X}_{i,*}\boldsymbol{\beta})^2}{2\sigma^2}\right]}\prod_{j=0}^{p-1}\exp{\left(-\frac{\vert\beta_j\vert}{\tau}\right)}. +\theta_{t+1} = \theta_t-\eta \nabla C(\theta_t) =\theta_t -\eta \frac{1}{N}\sum_{i=1}^N \nabla l_i(\theta_t), $$

     
    -

    Taking the negative -logarithm of the posterior probability and leaving out the -constants terms that do not depend on \( \beta \), we have -

    +

    whereas one SGD step is:

     
    $$ -C(\boldsymbol{\beta})=\frac{\vert\vert (\boldsymbol{y}-\boldsymbol{X}\boldsymbol{\beta})\vert\vert_2^2}{2\sigma^2}+\frac{1}{\tau}\vert\vert\boldsymbol{\beta}\vert\vert_1, +\theta_{t+1} = \theta_t -\eta \nabla l_{i_t}(\theta_t), $$

     
    -

    and replacing \( 1/\tau \) with \( \lambda \) we have

    +

with \( i_t \) randomly chosen. On smooth convex problems, GD and SGD both converge to the global minimum, but their rates differ. GD can take larger, more stable steps since it uses the exact gradient, achieving an error that decreases on the order of \( O(1/t) \) per iteration for convex objectives (and even exponentially fast for strongly convex cases). In contrast, plain SGD has more variance in each step, leading to sublinear convergence in expectation, typically \( O(1/\sqrt{t}) \) for general convex objectives (with appropriate diminishing step sizes). Intuitively, GD's trajectory is smoother and more predictable, while SGD's path oscillates due to noise but costs far less per iteration, enabling many more updates in the same time.

    +

    Strongly Convex Case

    +

    If \( C(\theta) \) is strongly convex and \( L \)-smooth (so GD enjoys linear +convergence), the gap \( C(\theta_t)-C(\theta^*) \) for GD shrinks as +

     
    $$ -C(\boldsymbol{\beta})=\frac{\vert\vert (\boldsymbol{y}-\boldsymbol{X}\boldsymbol{\beta})\vert\vert_2^2}{2\sigma^2}+\lambda\vert\vert\boldsymbol{\beta}\vert\vert_1, +C(\theta_t) - C(\theta^* ) \le \Big(1 - \frac{\mu}{L}\Big)^t [C(\theta_0)-C(\theta^*)], $$

     
    -

    which is our Lasso cost function!

    -
    +

a geometric (linear) convergence per iteration. Achieving an \( \epsilon \)-accurate solution thus takes on the order of \( \log(1/\epsilon) \) iterations for GD. However, each GD iteration costs \( O(N) \) gradient evaluations. SGD cannot exploit strong convexity to obtain a linear rate; instead, with a properly decaying step size (e.g. \( \eta_t = \frac{1}{\mu t} \)) or iterate averaging, SGD attains an \( O(1/t) \) convergence rate in expectation. For example, a result of Moulines and Bach (2011), see https://papers.nips.cc/paper_files/paper/2011/hash/40008b9a5380fcacce3976bf7c08af5b-Abstract.html, shows that with \( \eta_t = \Theta(1/t) \),

    +

     
    +$$ +\mathbb{E}[C(\theta_t) - C(\theta^*)] = O(1/t), +$$ +

     
    -

    -

    Why resampling methods

    +

for a strongly convex, smooth \( C \). This \( 1/t \) rate is slower per iteration than GD's exponential decay, but each SGD iteration is \( N \) times cheaper. In fact, to reach error \( \epsilon \), plain SGD needs on the order of \( T=O(1/\epsilon) \) iterations (sub-linear convergence), while GD needs \( O(\log(1/\epsilon)) \) iterations. When accounting for the cost per iteration, GD requires \( O(N \log(1/\epsilon)) \) total gradient computations versus SGD's \( O(1/\epsilon) \) single-sample computations. In large-scale regimes (huge \( N \)), SGD can be faster in wall-clock time because \( N \log(1/\epsilon) \) may far exceed \( 1/\epsilon \) for reasonable accuracy levels. In other words, with millions of data points, one epoch of GD (one full gradient) is extremely costly, whereas SGD can make \( N \) cheap updates in the time GD makes one, often yielding a good solution faster in practice, even though SGD's asymptotic error decays more slowly. As one lecture succinctly puts it: "SGD can be super effective in terms of iteration cost and memory, but SGD is slow to converge and can't adapt to strong convexity". Thus, the break-even point depends on \( N \) and the desired accuracy: for moderate accuracy on very large \( N \), SGD's cheaper updates win; for extremely high precision (very small \( \epsilon \)) on a modest \( N \), GD's fast convergence per step can be advantageous.

    +

    Non-Convex Problems

    -

    Before we proceed, we need to rethink what we have been doing. In our -eager to fit the data, we have omitted several important elements in -our regression analysis. In what follows we will +

    In non-convex optimization (e.g. deep neural networks), neither GD nor +SGD guarantees global minima, but SGD often displays faster progress +in finding useful minima. Theoretical results here are weaker, usually +showing convergence to a stationary point \( \theta \) (\( |\nabla C| \) is +small) in expectation. For example, GD might require \( O(1/\epsilon^2) \) +iterations to ensure \( |\nabla C(\theta)| < \epsilon \), and SGD typically has +similar polynomial complexity (often worse due to gradient +noise). However, a noteworthy difference is that SGD’s stochasticity +can help escape saddle points or poor local minima. Random gradient +fluctuations act like implicit noise, helping the iterate “jump” out +of flat saddle regions where full-batch GD could stagnate . In fact, +research has shown that adding noise to GD can guarantee escaping +saddle points in polynomial time, and the inherent noise in SGD often +serves this role. Empirically, this means SGD can sometimes find a +lower loss basin faster, whereas full-batch GD might get “stuck” near +saddle points or need a very small learning rate to navigate complex +error surfaces . Overall, in modern high-dimensional machine learning, +SGD (or mini-batch SGD) is the workhorse for large non-convex problems +because it converges to good solutions much faster in practice, +despite the lack of a linear convergence guarantee. Full-batch GD is +rarely used on large neural networks, as it would require tiny steps +to avoid divergence and is extremely slow per iteration .

    -
      -

    1. look at statistical properties, including a discussion of mean values, variance and the so-called bias-variance tradeoff
    2. -

    3. introduce resampling techniques like cross-validation, bootstrapping and jackknife and more
    4. -
    -

    -

    and discuss how to select a given model (one of the difficult parts in machine learning).

    -

    Resampling methods

    -
    - -

    -

    Resampling methods are an indispensable tool in modern -statistics. They involve repeatedly drawing samples from a training -set and refitting a model of interest on each sample in order to -obtain additional information about the fitted model. For example, in -order to estimate the variability of a linear regression fit, we can -repeatedly draw different samples from the training data, fit a linear -regression to each new sample, and then examine the extent to which -the resulting fits differ. Such an approach may allow us to obtain -information that would not be available from fitting the model only -once using the original training sample. -

    - -

    Two resampling methods are often used in Machine Learning analyses,

    -
      -

    1. The bootstrap method
    2. -

    3. and Cross-Validation
    4. -
    -

    -

    In addition there are several other methods such as the Jackknife and the Blocking methods. We will discuss in particular -cross-validation and the bootstrap method. +

    Memory Usage and Scalability

    + +

    A major advantage of SGD is its memory efficiency in handling large +datasets. Full-batch GD requires access to the entire training set for +each iteration, which often means the whole dataset (or a large +subset) must reside in memory to compute \( \nabla C(\theta) \) . This results +in memory usage that scales linearly with the dataset size \( N \). For +instance, if each training sample is large (e.g. high-dimensional +features), computing a full gradient may require storing a substantial +portion of the data or all intermediate gradients until they are +aggregated. In contrast, SGD needs only a single (or a small +mini-batch of) training example(s) in memory at any time . The +algorithm processes one sample (or mini-batch) at a time and +immediately updates the model, discarding that sample before moving to +the next. This streaming approach means that memory footprint is +essentially independent of \( N \) (apart from storing the model +parameters themselves). As one source notes, gradient descent +“requires more memory than SGD” because it “must store the entire +dataset for each iteration,” whereas SGD “only needs to store the +current training example” . In practical terms, if you have a dataset +of size, say, 1 million examples, full-batch GD would need memory for +all million every step, while SGD could be implemented to load just +one example at a time – a crucial benefit if data are too large to fit +in RAM or GPU memory. This scalability makes SGD suitable for +large-scale learning: as long as you can stream data from disk, SGD +can handle arbitrarily large datasets with fixed memory. In fact, SGD +“does not need to remember which examples were visited” in the past, +allowing it to run in an online fashion on infinite data streams +. Full-batch GD, on the other hand, would require multiple passes +through a giant dataset per update (or a complex distributed memory +system), which is often infeasible. +

    + +

    There is also a secondary memory effect: computing a full-batch +gradient in deep learning requires storing all intermediate +activations for backpropagation across the entire batch. A very large +batch (approaching the full dataset) might exhaust GPU memory due to +the need to hold activation gradients for thousands or millions of +examples simultaneously. SGD/minibatches mitigate this by splitting +the workload – e.g. with a mini-batch of size 32 or 256, memory use +stays bounded, whereas a full-batch (size = \( N \)) forward/backward pass +could not even be executed if \( N \) is huge. Techniques like gradient +accumulation exist to simulate large-batch GD by summing many +small-batch gradients – but these still process data in manageable +chunks to avoid memory overflow. In summary, memory complexity for GD +grows with \( N \), while for SGD it remains \( O(1) \) w.r.t. dataset size +(only the model and perhaps a mini-batch reside in memory) . This is a +key reason why batch GD “does not scale” to very large data and why +virtually all large-scale machine learning algorithms rely on +stochastic or mini-batch methods.
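As a small illustration of this point (our own sketch, not from the notes; the data and the batch size are arbitrary), the mini-batch loop below only ever touches \( M \) samples at a time. If X and y were instead np.memmap arrays backed by files on disk, the same loop would run with a memory footprint that is essentially \( O(M) \), independent of the dataset size.

import numpy as np

def minibatch_stream(X, y, M, rng):
    """Yield mini-batches of size M in a shuffled order, one at a time."""
    n = X.shape[0]
    idx = rng.permutation(n)
    for start in range(0, n, M):
        batch = idx[start:start+M]
        yield X[batch], y[batch]

rng = np.random.default_rng(0)
n, p, M = 10_000, 3, 32
X = rng.standard_normal((n, p))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1*rng.standard_normal(n)

theta = np.zeros(p)
eta = 0.01
for epoch in range(5):
    for xb, yb in minibatch_stream(X, y, M, rng):
        grad = 2.0/len(yb)*xb.T @ (xb @ theta - yb)   # gradient on the current batch only
        theta -= eta*grad
print(theta)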

    -
    -

    Resampling approaches can be computationally expensive

    -
    - -

    - -

    Resampling approaches can be computationally expensive, because they -involve fitting the same statistical method multiple times using -different subsets of the training data. However, due to recent -advances in computing power, the computational requirements of -resampling methods generally are not prohibitive. In this chapter, we -discuss two of the most commonly used resampling methods, -cross-validation and the bootstrap. Both methods are important tools -in the practical application of many statistical learning -procedures. For example, cross-validation can be used to estimate the -test error associated with a given statistical learning method in -order to evaluate its performance, or to select the appropriate level -of flexibility. The process of evaluating a model’s performance is -known as model assessment, whereas the process of selecting the proper -level of flexibility for a model is known as model selection. The -bootstrap is widely used. +

    Empirical Evidence: Convergence Time and Memory in Practice

    + +

    Empirical studies strongly support the theoretical trade-offs +above. In large-scale machine learning tasks, SGD often converges to a +good solution much faster in wall-clock time than full-batch GD, and +it uses far less memory. For example, Bottou & Bousquet (2008) +analyzed learning time under a fixed computational budget and +concluded that when data is abundant, it’s better to use a faster +(even if less precise) optimization method to process more examples in +the same time . This analysis showed that for large-scale problems, +processing more data with SGD yields lower error than spending the +time to do exact (batch) optimization on fewer data . In other words, +if you have a time budget, it’s often optimal to accept slightly +slower convergence per step (as with SGD) in exchange for being able +to use many more training samples in that time. This phenomenon is +borne out by experiments: +

    +

    Deep Neural Networks

    + +

    In modern deep learning, full-batch GD is so slow that it is rarely +attempted; instead, mini-batch SGD is standard. A recent study +demonstrated that it is possible to train a ResNet-50 on ImageNet +using full-batch gradient descent, but it required careful tuning +(e.g. gradient clipping, tiny learning rates) and vast computational +resources – and even then, each full-batch update was extremely +expensive. +

    + +

    Using a huge batch +(closer to full GD) tends to slow down convergence if the learning +rate is not scaled up, and often encounters optimization difficulties +(plateaus) that small batches avoid. +Empirically, small or medium +batch SGD finds minima in fewer clock hours because it can rapidly +loop over the data with gradient noise aiding exploration. +

    +

    Memory constraints

    + +

    From a memory standpoint, practitioners note that batch GD becomes +infeasible on large data. For example, if one tried to do full-batch +training on a dataset that doesn’t fit in RAM or GPU memory, the +program would resort to heavy disk I/O or simply crash. SGD +circumvents this by processing mini-batches. Even in cases where data +does fit in memory, using a full batch can spike memory usage due to +storing all gradients. One empirical observation is that mini-batch +training has a “lower, fluctuating usage pattern” of memory, whereas +full-batch loading “quickly consumes memory (often exceeding limits)” +. This is especially relevant for graph neural networks or other +models where a “batch” may include a huge chunk of a graph: full-batch +gradient computation can exhaust GPU memory, whereas mini-batch +methods keep memory usage manageable . +

    + +

    In summary, SGD converges faster than full-batch GD in terms of actual +training time for large-scale problems, provided we measure +convergence as reaching a good-enough solution. Theoretical bounds +show SGD needs more iterations, but because it performs many more +updates per unit time (and requires far less memory), it often +achieves lower loss in a given time frame than GD. Full-batch GD might +take slightly fewer iterations in theory, but each iteration is so +costly that it is “slower… especially for large datasets” . Meanwhile, +memory scaling strongly favors SGD: GD’s memory cost grows with +dataset size, making it impractical beyond a point, whereas SGD’s +memory use is modest and mostly constant w.r.t. \( N \) . These +differences have made SGD (and mini-batch variants) the de facto +choice for training large machine learning models, from logistic +regression on millions of examples to deep neural networks with +billions of parameters. The consensus in both research and practice is +that for large-scale or high-dimensional tasks, SGD-type methods +converge quicker per unit of computation and handle memory constraints +better than standard full-batch gradient descent .

    -
    -

    Why resampling methods ?

    -
    -Statistical analysis -

    - -

      -

    • Our simulations can be treated as computer experiments. This is particularly the case for Monte Carlo methods which are widely used in statistical analyses.
    • -

    • The results can be analysed with the same statistical tools as we would use when analysing experimental data.
    • -

    • As in all experiments, we are looking for expectation values and an estimate of how accurate they are, i.e., possible sources for errors.
    • -
    -
    +

    Second moment of the gradient

    + +

    In stochastic gradient descent, with and without momentum, we still +have to specify a schedule for tuning the learning rates \( \eta_t \) +as a function of time. As discussed in the context of Newton's +method, this presents a number of dilemmas. The learning rate is +limited by the steepest direction which can change depending on the +current position in the landscape. To circumvent this problem, ideally +our algorithm would keep track of curvature and take large steps in +shallow, flat directions and small steps in steep, narrow directions. +Second-order methods accomplish this by calculating or approximating +the Hessian and normalizing the learning rate by the +curvature. However, this is very computationally expensive for +extremely large models. Ideally, we would like to be able to +adaptively change the step size to match the landscape without paying +the steep computational price of calculating or approximating +Hessians. +

    + +

    During the last decade a number of methods have been introduced that accomplish +this by tracking not only the gradient, but also the second moment of +the gradient. These methods include AdaGrad, AdaDelta, Root Mean Squared Propagation (RMS-Prop), and +ADAM. +

    -

    Statistical analysis

    -
    - -

    - -

      -

    • As in other experiments, many numerical experiments have two classes of errors:
    • -
        - -

      • Statistical errors
      • - -

      • Systematical errors
      • -
      +

      Challenge: Choosing a Fixed Learning Rate

      +

      A fixed \( \eta \) is hard to get right:

      +
        +

      1. If \( \eta \) is too large, the updates can overshoot the minimum, causing oscillations or divergence
      2. +

      3. If \( \eta \) is too small, convergence is very slow (many iterations to make progress)
      4. +

      -

    • Statistical errors can be estimated using standard tools from statistics
    • -

    • Systematical errors are method specific and must be treated differently from case to case.
    • -
    -
    +

    In practice, one often uses trial-and-error or schedules (decaying \( \eta \) over time) to find a workable balance. +For a function with steep directions and flat directions, a single global \( \eta \) may be inappropriate: +

    +
      +

    1. Steep coordinates require a smaller step size to avoid oscillation.
    2. +

    3. Flat/shallow coordinates could use a larger step to speed up progress.
    4. +

5. This issue is pronounced in high-dimensional problems with **sparse or varying-scale features**; we need a method to adjust step sizes per feature (see the short demonstration after this list).
    6. +
    -
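The dilemma is easy to demonstrate. Below is a tiny sketch of our own on the ill-conditioned quadratic \( C(\theta)=\frac{1}{2}\theta^T A \theta \) with curvatures 100 and 1 (the specific step sizes are chosen only for illustration): the steep direction forces \( \eta < 2/100 \), which leaves the flat direction crawling.

import numpy as np

A = np.diag([100.0, 1.0])          # one steep and one flat direction

def gd(eta, iters=100):
    theta = np.array([1.0, 1.0])
    for _ in range(iters):
        theta -= eta*(A @ theta)   # gradient of 0.5*theta^T A theta
    return theta

print("eta = 0.001 (safe but slow):  ", gd(0.001))   # flat direction has barely moved
print("eta = 0.019 (near the limit): ", gd(0.019))   # both components shrink
print("eta = 0.021 (too large):      ", gd(0.021))   # steep direction diverges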

    Resampling methods

    - -

    With all these analytical equations for both the OLS and Ridge -regression, we will now outline how to assess a given model. This will -lead to a discussion of the so-called bias-variance tradeoff (see -below) and so-called resampling methods. -

    - -

    One of the quantities we have discussed as a way to measure errors is -the mean-squared error (MSE), mainly used for fitting of continuous -functions. Another choice is the absolute error. -

    +

    Motivation for Adaptive Step Sizes

    -

    In the discussions below we will focus on the MSE and in particular since we will split the data into test and training data, -we discuss the -

      -

    1. prediction error or simply the test error \( \mathrm{Err_{Test}} \), where we have a fixed training set and the test error is the MSE arising from the data reserved for testing. We discuss also the
    2. -

    3. training error \( \mathrm{Err_{Train}} \), which is the average loss over the training data.
    4. +

    5. Instead of a fixed global \( \eta \), use an adaptive learning rate for each parameter that depends on the history of gradients.
    6. +

    7. Parameters that have large accumulated gradient magnitude should get smaller steps (they've been changing a lot), whereas parameters with small or infrequent gradients can have larger relative steps.
    8. +

    9. This is especially useful for sparse features: Rarely active features accumulate little gradient, so their learning rate remains comparatively high, ensuring they are not neglected
    10. +

    11. Conversely, frequently active features accumulate large gradient sums, and their learning rate automatically decreases, preventing too-large updates
    12. +

    13. Several algorithms implement this idea (AdaGrad, RMSProp, AdaDelta, Adam, etc.). We will derive **AdaGrad**, one of the first adaptive methods.

    -

    As our model becomes more and more complex, more of the training data tends to used. The training may thence adapt to more complicated structures in the data. This may lead to a decrease in the bias (see below for code example) and a slight increase of the variance for the test error. -For a certain level of complexity the test error will reach minimum, before starting to increase again. The -training error reaches a saturation. -

    +

    AdaGrad algorithm, taken from Goodfellow et al

    + +

    +
    +

    +
    +

    -

    Resampling methods: Bootstrap

    +

    Derivation of the AdaGrad Algorithm

    +
    - +Accumulating Gradient History

    -

    Bootstrapping is a non-parametric approach to statistical inference -that substitutes computation for more traditional distributional -assumptions and asymptotic results. Bootstrapping offers a number of -advantages: -

      -

    1. The bootstrap is quite general, although there are some cases in which it fails.
    2. - -

    3. Because it does not require distributional assumptions (such as normally distributed errors), the bootstrap can provide more accurate inferences when the data are not well behaved or when the sample size is small.
    4. - -

    5. It is possible to apply the bootstrap to statistics with sampling distributions that are difficult to derive, even asymptotically.
    6. -

    7. It is relatively simple to apply the bootstrap to complex data-collection plans (such as stratified and clustered samples).
    8. +

    9. AdaGrad maintains a running sum of squared gradients for each parameter (coordinate)
    10. +

11. Let \( g_t = \nabla C_{i_t}(\theta_t) \) be the gradient at step \( t \) (or a subgradient for nondifferentiable cases).
    12. +

    13. Initialize \( r_0 = 0 \) (an all-zero vector in \( \mathbb{R}^d \)).
    14. +

    15. At each iteration \( t \), update the accumulation:
    -
    - -

    The textbook by Davison on the Bootstrap Methods and their Applications provides many more insights and proofs. In this course we will take a more practical approach and use the results and theorems provided in the literature. For those interested in reading more about the bootstrap methods, we recommend the above text and the one by Efron and Tibshirani.

    - -

    Before we proceed however, we need to remind ourselves about a central theorem in statistics, namely the so-called central limit theorem.

    -
    - -
    -

    The Central Limit Theorem

    - -

    Suppose we have a PDF \( p(x) \) from which we generate a series \( N \) -of averages \( \mathbb{E}[x_i] \). Each mean value \( \mathbb{E}[x_i] \) -is viewed as the average of a specific measurement, e.g., throwing -dice 100 times and then taking the average value, or producing a certain -amount of random numbers. -For notational ease, we set \( \mathbb{E}[x_i]=x_i \) in the discussion -which follows. We do the same for \( \mathbb{E}[z]=z \). -

    - -

    If we compute the mean \( z \) of \( m \) such mean values \( x_i \)

    +

     
    $$ - z=\frac{x_1+x_2+\dots+x_m}{m}, +r_t = r_{t-1} + g_t \circ g_t, $$

     
    -

    the question we pose is which is the PDF of the new variable \( z \).

    +
      +

1. Here \( g_t \circ g_t \) denotes the element-wise square of the gradient vector, that is, \( r_t^{(j)} = r_{t-1}^{(j)} + (g_{t,j})^2 \) for each parameter \( j \).
    2. +

    3. We can view \( H_t = \mathrm{diag}(r_t) \) as a diagonal matrix of past squared gradients. Initially \( H_0 = 0 \).
    4. +
    +
    -

    Finding the Limit

    +

    AdaGrad Update Rule Derivation

    -

    The probability of obtaining an average value \( z \) is the product of the -probabilities of obtaining arbitrary individual mean values \( x_i \), -but with the constraint that the average is \( z \). We can express this through -the following expression -

    +

    We scale the gradient by the inverse square root of the accumulated matrix \( H_t \). The AdaGrad update at step \( t \) is:

     
    $$ - \tilde{p}(z)=\int dx_1p(x_1)\int dx_2p(x_2)\dots\int dx_mp(x_m) - \delta(z-\frac{x_1+x_2+\dots+x_m}{m}), +\theta_{t+1} =\theta_t - \eta H_t^{-1/2} g_t, $$

     
    -

    where the \( \delta \)-function enbodies the constraint that the mean is \( z \). -All measurements that lead to each individual \( x_i \) are expected to -be independent, which in turn means that we can express \( \tilde{p} \) as the -product of individual \( p(x_i) \). The independence assumption is important in the derivation of the central limit theorem. +

where \( H_t^{-1/2} \) is the diagonal matrix with entries \( (r_{t}^{(1)})^{-1/2}, \dots, (r_{t}^{(d)})^{-1/2} \). In coordinates, this means each parameter \( j \) has an individual step size:

    -
    - -
    -

    Rewriting the \( \delta \)-function

    - -

    If we use the integral expression for the \( \delta \)-function

    -

     
    $$ - \delta(z-\frac{x_1+x_2+\dots+x_m}{m})=\frac{1}{2\pi}\int_{-\infty}^{\infty} - dq\exp{\left(iq(z-\frac{x_1+x_2+\dots+x_m}{m})\right)}, + \theta_{t+1,j} =\theta_{t,j} -\frac{\eta}{\sqrt{r_{t,j}}}g_{t,j}. $$

     
    -

    and inserting \( e^{i\mu q-i\mu q} \) where \( \mu \) is the mean value -we arrive at -

    +

    In practice we add a small constant \( \epsilon \) in the denominator for numerical stability to avoid division by zero:

     
    $$ - \tilde{p}(z)=\frac{1}{2\pi}\int_{-\infty}^{\infty} - dq\exp{\left(iq(z-\mu)\right)}\left[\int_{-\infty}^{\infty} - dxp(x)\exp{\left(iq(\mu-x)/m\right)}\right]^m, +\theta_{t+1,j}= \theta_{t,j}-\frac{\eta}{\sqrt{\epsilon + r_{t,j}}}g_{t,j}. $$

     
    -

    with the integral over \( x \) resulting in

    +

    Equivalently, the effective learning rate for parameter \( j \) at time \( t \) is \( \displaystyle \alpha_{t,j} = \frac{\eta}{\sqrt{\epsilon + r_{t,j}}} \). This decreases over time as \( r_{t,j} \) grows.
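In code, the AdaGrad update amounts to a few lines. The following is a minimal sketch on the same toy OLS data as the earlier examples (the values of \( \eta \) and \( \epsilon \) and the number of iterations are illustrative choices, not prescribed by the notes).

import numpy as np

np.random.seed(2)
n = 100
x = 2*np.random.rand(n, 1)
y = 4 + 3*x + np.random.randn(n, 1)
X = np.c_[np.ones((n, 1)), x]

theta = np.random.randn(2, 1)
eta, eps = 0.5, 1e-8
r = np.zeros_like(theta)                  # accumulated squared gradients r_t

for t in range(1000):
    g = (2.0/n)*X.T @ (X @ theta - y)     # full gradient of the MSE
    r += g*g                              # r_t = r_{t-1} + g_t o g_t
    theta -= eta*g/np.sqrt(eps + r)       # per-coordinate step eta/sqrt(eps + r_t)
print("theta from AdaGrad")
print(theta)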

    +
    -

     
    -$$ - \int_{-\infty}^{\infty}dxp(x)\exp{\left(iq(\mu-x)/m\right)}= - \int_{-\infty}^{\infty}dxp(x) - \left[1+\frac{iq(\mu-x)}{m}-\frac{q^2(\mu-x)^2}{2m^2}+\dots\right]. -$$ -

     
    +

    +

    AdaGrad Properties

    + +
      +

    1. AdaGrad automatically tunes the step size for each parameter. Parameters with more volatile or large gradients get smaller steps, and those with small or infrequent gradients get relatively larger steps
    2. +

    3. No manual schedule needed: The accumulation \( r_t \) keeps increasing (or stays the same if gradient is zero), so step sizes \( \eta/\sqrt{r_t} \) are non-increasing. This has a similar effect to a learning rate schedule, but individualized per coordinate.
    4. +

    5. Sparse data benefit: For very sparse features, \( r_{t,j} \) grows slowly, so that feature’s parameter retains a higher learning rate for longer, allowing it to make significant updates when it does get a gradient signal
    6. +

    7. Convergence: In convex optimization, AdaGrad can be shown to achieve a sub-linear convergence rate comparable to the best fixed learning rate tuned for the problem
    8. +
    +

    +

    It effectively reduces the need to tune \( \eta \) by hand.

    +
      +

    1. Limitations: Because \( r_t \) accumulates without bound, AdaGrad’s learning rates can become extremely small over long training, potentially slowing progress. (Later variants like RMSProp, AdaDelta, Adam address this by modifying the accumulation rule.)
    2. +
    -

    Identifying Terms

    +

    RMSProp: Adaptive Learning Rates

    -

    The second term on the rhs disappears since this is just the mean and -employing the definition of \( \sigma^2 \) we have +

RMSProp addresses AdaGrad's diminishing learning rate issue. It uses a decaying average of squared gradients instead of a cumulative sum (a sketch follows after the list below):

     
    $$ - \int_{-\infty}^{\infty}dxp(x)e^{\left(iq(\mu-x)/m\right)}= - 1-\frac{q^2\sigma^2}{2m^2}+\dots, +v_t = \rho v_{t-1} + (1-\rho)(\nabla C(\theta_t))^2, $$

     
    -

    resulting in

    +

    with \( \rho \) typically \( 0.9 \) (or \( 0.99 \)).

    +
      +

    1. Update: \( \theta_{t+1} = \theta_t - \frac{\eta}{\sqrt{v_t + \epsilon}} \nabla C(\theta_t) \).
    2. +

    3. Recent gradients have more weight, so \( v_t \) adapts to the current landscape.
    4. +

    5. Avoids AdaGrad’s “infinite memory” problem – learning rate does not continuously decay to zero.
    6. +
    +

    +

RMSProp was first proposed in lecture notes by Geoff Hinton (2012) and remains unpublished.
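A minimal RMSProp sketch on the same toy OLS data as before (the values of \( \eta \), \( \rho \) and \( \epsilon \) are illustrative choices of ours, not taken from the notes); the only change from the AdaGrad sketch above is the decaying average v.

import numpy as np

np.random.seed(2)
n = 100
x = 2*np.random.rand(n, 1)
y = 4 + 3*x + np.random.randn(n, 1)
X = np.c_[np.ones((n, 1)), x]

theta = np.random.randn(2, 1)
eta, rho, eps = 0.01, 0.9, 1e-8
v = np.zeros_like(theta)                  # running average of squared gradients

for t in range(1000):
    g = (2.0/n)*X.T @ (X @ theta - y)
    v = rho*v + (1 - rho)*g*g             # decaying average, not a cumulative sum
    theta -= eta*g/np.sqrt(v + eps)       # per-coordinate adaptive step
print("theta from RMSProp")
print(theta)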

    +

    RMSProp algorithm, taken from Goodfellow et al

    -

     
    -$$ - \left[\int_{-\infty}^{\infty}dxp(x)\exp{\left(iq(\mu-x)/m\right)}\right]^m\approx - \left[1-\frac{q^2\sigma^2}{2m^2}+\dots \right]^m, -$$ -

     
    +

    +

    +

    +
    +

    +
    -

    and in the limit \( m\rightarrow \infty \) we obtain

    +
    +

    Adam Optimizer

    -

     
    -$$ - \tilde{p}(z)=\frac{1}{\sqrt{2\pi}(\sigma/\sqrt{m})} - \exp{\left(-\frac{(z-\mu)^2}{2(\sigma/\sqrt{m})^2}\right)}, -$$ -

     
    +

Why combine Momentum and RMSProp? Motivation for Adam: Adaptive Moment Estimation (Adam) was introduced by Kingma and Ba (2014) to combine the benefits of momentum and RMSProp.

    -

    which is the normal distribution with variance -\( \sigma^2_m=\sigma^2/m \), where \( \sigma \) is the variance of the PDF \( p(x) \) -and \( \mu \) is also the mean of the PDF \( p(x) \). -

    +
      +

1. Momentum: Fast convergence by smoothing gradients (accelerates in long-term gradient direction).
    2. +

    3. Adaptive rates (RMSProp): Per-dimension learning rate scaling for stability (handles different feature scales, sparse gradients).
    4. +

    5. Adam uses both: maintains moving averages of both first moment (gradients) and second moment (squared gradients)
    6. +

    7. Additionally, includes a mechanism to correct the bias in these moving averages (crucial in early iterations)
    8. +
    +

    +

    Result: Adam is robust, achieves faster convergence with less tuning, and often outperforms SGD (with momentum) in practice.

    -

    Wrapping it up

    +

    ADAM optimizer

    -

    Thus, the central limit theorem states that the PDF \( \tilde{p}(z) \) of -the average of \( m \) random values corresponding to a PDF \( p(x) \) -is a normal distribution whose mean is the -mean value of the PDF \( p(x) \) and whose variance is the variance -of the PDF \( p(x) \) divided by \( m \), the number of values used to compute \( z \). +

In ADAM, we keep a running average of both the first and second moment of the gradient and use this information to adaptively change the learning rate for different parameters. The method is efficient when working with large problems involving lots of data and/or parameters. It is a combination of the gradient descent with momentum algorithm and the RMSprop algorithm discussed above.

    +
    -

    The central limit theorem leads to the well-known expression for the -standard deviation, given by -

    +
    +

    Why Combine Momentum and RMSProp?

    + +
      +

    1. Momentum: Fast convergence by smoothing gradients (accelerates in long-term gradient direction).
    2. +

    3. Adaptive rates (RMSProp): Per-dimension learning rate scaling for stability (handles different feature scales, sparse gradients).
    4. +

    5. Adam uses both: maintains moving averages of both first moment (gradients) and second moment (squared gradients)
    6. +

    7. Additionally, includes a mechanism to correct the bias in these moving averages (crucial in early iterations)
    8. +
    +

    +

    Result: Adam is robust, achieves faster convergence with less tuning, and often outperforms SGD (with momentum) in practice

    +
    +
    +

    Adam: Exponential Moving Averages (Moments)

    +

Adam maintains two moving averages at each time step \( t \) for each parameter \( \theta \):

    +
    +First moment (mean) \( m_t \) +

    +

    The Momentum term

     
    $$ - \sigma_m= -\frac{\sigma}{\sqrt{m}}. +m_t = \beta_1m_{t-1} + (1-\beta_1)\, \nabla C(\theta_t), $$

     
    +

    -

    The latter is true only if the average value is known exactly. This is obtained in the limit -\( m\rightarrow \infty \) only. Because the mean and the variance are measured quantities we obtain -the familiar expression in statistics (the so-called Bessel correction) -

    +
    +Second moment (uncentered variance) \( v_t \) +

    +

    The RMS term

     
    $$ - \sigma_m\approx -\frac{\sigma}{\sqrt{m-1}}. +v_t = \beta_2v_{t-1} + (1-\beta_2)(\nabla C(\theta_t))^2, $$

     
    -

    In many cases however the above estimate for the standard deviation, -in particular if correlations are strong, may be too simplistic. Keep -in mind that we have assumed that the variables \( x \) are independent -and identically distributed. This is obviously not always the -case. For example, the random numbers (or better pseudorandom numbers) -we generate in various calculations do always exhibit some -correlations. -

    +

    with typical \( \beta_1 = 0.9 \), \( \beta_2 = 0.999 \). Initialize \( m_0 = 0 \), \( v_0 = 0 \).

    +
    -

    The theorem is satisfied by a large class of PDFs. Note however that for a -finite \( m \), it is not always possible to find a closed form /analytic expression for -\( \tilde{p}(x) \). -

    +

    These are biased estimators of the true first and second moment of the gradients, especially at the start (since \( m_0,v_0 \) are zero)

    -

    Confidence Intervals

    +

    Adam: Bias Correction

    +

    To counteract initialization bias in \( m_t, v_t \), Adam computes bias-corrected estimates

    +

     
    +$$ +\hat{m}_t = \frac{m_t}{1 - \beta_1^t}, \qquad \hat{v}_t = \frac{v_t}{1 - \beta_2^t}. +$$ +

     
    -

    Confidence intervals are used in statistics and represent a type of estimate -computed from the observed data. This gives a range of values for an -unknown parameter such as the parameters \( \boldsymbol{\beta} \) from linear regression. -

    - -

    With the OLS expressions for the parameters \( \boldsymbol{\beta} \) we found -\( \mathbb{E}(\boldsymbol{\beta}) = \boldsymbol{\beta} \), which means that the estimator of the regression parameters is unbiased. -

    - -

    In the exercises this week we show that the variance of the estimate of the \( j \)-th regression coefficient is -\( \boldsymbol{\sigma}^2 (\boldsymbol{\beta}_j ) = \boldsymbol{\sigma}^2 [(\mathbf{X}^{T} \mathbf{X})^{-1}]_{jj} \). -

    - -

    This quantity can be used to -construct a confidence interval for the estimates. -

    -
    +
      +

• When \( t \) is small, \( 1-\beta_i^t \approx 0 \), so \( \hat{m}_t, \hat{v}_t \) are significantly larger than the raw \( m_t, v_t \), compensating for the initial zero bias.
    • +

    • As \( t \) increases, \( 1-\beta_i^t \to 1 \), and \( \hat{m}_t, \hat{v}_t \) converge to \( m_t, v_t \).
    • +

    • Bias correction is important for Adam’s stability in early iterations
    • +
    +
    -

    Standard Approach based on the Normal Distribution

    - -

    We will assume that the parameters \( \beta \) follow a normal -distribution. We can then define the confidence interval. Here we will be using as -shorthands \( \mu_{\beta} \) for the above mean value and \( \sigma_{\beta} \) -for the standard deviation. We have then a confidence interval -

    - +

    Adam: Update Rule Derivation

    +

    Finally, Adam updates parameters using the bias-corrected moments:

     
    $$ -\left(\mu_{\beta}\pm \frac{z\sigma_{\beta}}{\sqrt{n}}\right), +\theta_{t+1} =\theta_t -\frac{\alpha}{\sqrt{\hat{v}_t} + \epsilon}\hat{m}_t, $$

     
    -

    where \( z \) defines the level of certainty (or confidence). For a normal -distribution typical parameters are \( z=2.576 \) which corresponds to a -confidence of \( 99\% \) while \( z=1.96 \) corresponds to a confidence of -\( 95\% \). A confidence level of \( 95\% \) is commonly used and it is -normally referred to as a two-sigmas confidence level, that is we -approximate \( z\approx 2 \). -

    - -

    For more discussions of confidence intervals (and in particular linked with a discussion of the bootstrap method), see chapter 5 of the textbook by Davison on the Bootstrap Methods and their Applications

    - -

    In this text you will also find an in-depth discussion of the -Bootstrap method, why it works and various theorems related to it. +

    where \( \epsilon \) is a small constant (e.g. \( 10^{-8} \)) to prevent division by zero. +Breaking it down:

    +
      +

    1. Compute gradient \( \nabla C(\theta_t) \).
    2. +

    3. Update first moment \( m_t \) and second moment \( v_t \) (exponential moving averages).
    4. +

    5. Bias-correct: \( \hat{m}_t = m_t/(1-\beta_1^t) \), \( \; \hat{v}_t = v_t/(1-\beta_2^t) \).
    6. +

    7. Compute step: \( \Delta \theta_t = \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon} \).
    8. +

    9. Update parameters: \( \theta_{t+1} = \theta_t - \alpha\, \Delta \theta_t \).
    10. +
    +

    +

    This is the Adam update rule as given in the original paper.
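A minimal sketch of these five steps on the same toy OLS data as the earlier examples (the hyperparameters are the commonly used defaults; this is our own illustration, not the reference implementation from the paper).

import numpy as np

np.random.seed(2)
n = 100
x = 2*np.random.rand(n, 1)
y = 4 + 3*x + np.random.randn(n, 1)
X = np.c_[np.ones((n, 1)), x]

theta = np.random.randn(2, 1)
alpha, beta1, beta2, eps = 0.01, 0.9, 0.999, 1e-8
m = np.zeros_like(theta)                        # first moment
v = np.zeros_like(theta)                        # second moment

for t in range(1, 2001):
    g = (2.0/n)*X.T @ (X @ theta - y)           # 1. gradient
    m = beta1*m + (1 - beta1)*g                 # 2. update first moment
    v = beta2*v + (1 - beta2)*g*g               #    and second moment
    mhat = m/(1 - beta1**t)                     # 3. bias correction
    vhat = v/(1 - beta2**t)
    theta -= alpha*mhat/(np.sqrt(vhat) + eps)   # 4.-5. compute step and update
print("theta from Adam")
print(theta)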

    -

    Resampling methods: Bootstrap background

    +

    Adam vs. AdaGrad and RMSProp

    -

    Since \( \widehat{\beta} = \widehat{\beta}(\boldsymbol{X}) \) is a function of random variables, -\( \widehat{\beta} \) itself must be a random variable. Thus it has -a pdf, call this function \( p(\boldsymbol{t}) \). The aim of the bootstrap is to -estimate \( p(\boldsymbol{t}) \) by the relative frequency of -\( \widehat{\beta} \). You can think of this as using a histogram -in the place of \( p(\boldsymbol{t}) \). If the relative frequency closely -resembles \( p(\vec{t}) \), then using numerics, it is straight forward to -estimate all the interesting parameters of \( p(\boldsymbol{t}) \) using point -estimators. -

    +
      +

    1. AdaGrad: Uses per-coordinate scaling like Adam, but no momentum. Tends to slow down too much due to cumulative history (no forgetting)
    2. +

    3. RMSProp: Uses moving average of squared gradients (like Adam’s \( v_t \)) to maintain adaptive learning rates, but does not include momentum or bias-correction.
    4. +

    5. Adam: Effectively RMSProp + Momentum + Bias-correction
    6. +
        + +

      • Momentum (\( m_t \)) provides acceleration and smoother convergence.
      • + +

      • Adaptive \( v_t \) scaling moderates the step size per dimension.
      • + +

      • Bias correction (absent in AdaGrad/RMSProp) ensures robust estimates early on.
      • +
      +

      +

    +

    +

    In practice, Adam often yields faster convergence and better tuning stability than RMSProp or AdaGrad alone

    -

    Resampling methods: More Bootstrap background

    +

    Adaptivity Across Dimensions

    -

    In the case that \( \widehat{\beta} \) has -more than one component, and the components are independent, we use the -same estimator on each component separately. If the probability -density function of \( X_i \), \( p(x) \), had been known, then it would have -been straightforward to do this by: -

      -

    1. Drawing lots of numbers from \( p(x) \), suppose we call one such set of numbers \( (X_1^*, X_2^*, \cdots, X_n^*) \).
    2. -

    3. Then using these numbers, we could compute a replica of \( \widehat{\beta} \) called \( \widehat{\beta}^* \).
    4. +

5. Adam adapts the step size per coordinate: parameters with larger gradient variance get smaller effective steps, those with smaller or sparse gradients get larger steps.
    6. +

    7. This per-dimension adaptivity is inherited from AdaGrad/RMSProp and helps handle ill-conditioned or sparse problems.
    8. +

    9. Meanwhile, momentum (first moment) allows Adam to continue making progress even if gradients become small or noisy, by leveraging accumulated direction.

    -

    By repeated use of the above two points, many -estimates of \( \widehat{\beta} \) can be obtained. The -idea is to use the relative frequency of \( \widehat{\beta}^* \) -(think of a histogram) as an estimate of \( p(\boldsymbol{t}) \). -

    +

    ADAM algorithm, taken from Goodfellow et al

    + +

    +
    +

    +
    +

    -

    Resampling methods: Bootstrap approach

    +

    Algorithms and codes for Adagrad, RMSprop and Adam

    -

    But -unless there is enough information available about the process that -generated \( X_1,X_2,\cdots,X_n \), \( p(x) \) is in general -unknown. Therefore, Efron in 1979 asked the -question: What if we replace \( p(x) \) by the relative frequency -of the observation \( X_i \)? -

    +

    The algorithms we have implemented are well described in the text by Goodfellow, Bengio and Courville, chapter 8.

    -

    If we draw observations in accordance with -the relative frequency of the observations, will we obtain the same -result in some asymptotic sense? The answer is yes. -

    +

    The codes which implement these algorithms are discussed below here.

    -

    Resampling methods: Bootstrap steps

    - -

    The independent bootstrap works like this:

    +

    Practical tips

    -
      -

    1. Draw with replacement \( n \) numbers for the observed variables \( \boldsymbol{x} = (x_1,x_2,\cdots,x_n) \).
    2. -

    3. Define a vector \( \boldsymbol{x}^* \) containing the values which were drawn from \( \boldsymbol{x} \).
    4. -

    5. Using the vector \( \boldsymbol{x}^* \) compute \( \widehat{\beta}^* \) by evaluating \( \widehat \beta \) under the observations \( \boldsymbol{x}^* \).
    6. -

    7. Repeat this process \( k \) times.
    8. -
    -

    -

    When you are done, you can draw a histogram of the relative frequency -of \( \widehat \beta^* \). This is your estimate of the probability -distribution \( p(t) \). Using this probability distribution you can -estimate any statistics thereof. In principle you never draw the -histogram of the relative frequency of \( \widehat{\beta}^* \). Instead -you use the estimators corresponding to the statistic of interest. For -example, if you are interested in estimating the variance of \( \widehat -\beta \), apply the etsimator \( \widehat \sigma^2 \) to the values -\( \widehat \beta^* \). -

    +
      +

    • Randomize the data when making mini-batches. It is always important to randomly shuffle the data when forming mini-batches. Otherwise, the gradient descent method can fit spurious correlations resulting from the order in which data is presented.
    • +

• Transform your inputs. Learning becomes difficult when our landscape has a mixture of steep and flat directions. One simple trick for minimizing these situations is to standardize the data by subtracting the mean and normalizing the variance of input variables. Whenever possible, also decorrelate the inputs. To understand why this is helpful, consider the case of linear regression. It is easy to show that for the squared error cost function, the Hessian of the cost function is just the correlation matrix between the inputs (a short numerical check of this follows after the list). Thus, by standardizing the inputs, we are ensuring that the landscape looks homogeneous in all directions in parameter space. Since most deep networks can be viewed as linear transformations followed by a non-linearity at each layer, we expect this intuition to hold beyond the linear case.
    • +

• Monitor the out-of-sample performance. Always monitor the performance of your model on a validation set (a small portion of the training data that is held out of the training process to serve as a proxy for the test set). If the validation error starts increasing, then the model is beginning to overfit. Terminate the learning process. This early stopping significantly improves performance in many settings.
    • +

• Adaptive optimization methods don't always have good generalization. Recent studies have shown that adaptive methods such as ADAM, RMSProp, and AdaGrad tend to have poor generalization compared to SGD or SGD with momentum, particularly in the high-dimensional limit (i.e. the number of parameters exceeds the number of data points). Although it is not clear at this stage why these methods perform so well in training deep neural networks, simpler procedures like properly-tuned SGD may work as well or better in these applications.
    • +
    -
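The standardization tip above can be checked directly. Here is a quick sketch of our own (made-up inputs of very different scales; the variable names are arbitrary) comparing the condition number of the OLS Hessian \( 2\boldsymbol{X}^T\boldsymbol{X}/n \) before and after standardization.

import numpy as np

rng = np.random.default_rng(0)
n = 1000
x1 = rng.normal(0.0, 1.0, n)
x2 = rng.normal(5.0, 100.0, n)        # a feature on a very different scale
X = np.c_[x1, x2]

H = 2.0/n*X.T @ X                     # Hessian of the squared-error cost
Xs = (X - X.mean(axis=0))/X.std(axis=0)
Hs = 2.0/n*Xs.T @ Xs                  # Hessian after standardizing the inputs

print("condition number, raw inputs         :", np.linalg.cond(H))
print("condition number, standardized inputs:", np.linalg.cond(Hs))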

    Code example for the Bootstrap method

    +

    Sneaking in automatic differentiation using Autograd

    + +

In the examples here we take the liberty of sneaking in automatic differentiation (without having discussed the mathematics). In project 1 you will write the gradients as discussed above, that is, hard-coding the gradients. By introducing automatic differentiation via the library autograd, which has since been superseded by JAX, we gain more flexibility in setting up alternative cost functions.

    -

    The following code starts with a Gaussian distribution with mean value -\( \mu =100 \) and variance \( \sigma=15 \). We use this to generate the data -used in the bootstrap analysis. The bootstrap analysis returns a data -set after a given number of bootstrap operations (as many as we have -data points). This data set consists of estimated mean values for each -bootstrap operation. The histogram generated by the bootstrap method -shows that the distribution for these mean values is also a Gaussian, -centered around the mean value \( \mu=100 \) but with standard deviation -\( \sigma/\sqrt{n} \), where \( n \) is the number of bootstrap samples (in -this case the same as the number of original data points). The value -of the standard deviation is what we expect from the central limit -theorem. +

The first example shows results with ordinary least squares.

    @@ -1142,32 +1783,55 @@

    Code example for the Bootstrap me
    -
    import numpy as np
    -from time import time
    -from scipy.stats import norm
    +  
    # Using Autograd to calculate gradients for OLS
    +from random import random, seed
    +import numpy as np
    +import autograd.numpy as np
     import matplotlib.pyplot as plt
    -
    -# Returns mean of bootstrap samples 
    -# Bootstrap algorithm
    -def bootstrap(data, datapoints):
    -    t = np.zeros(datapoints)
    -    n = len(data)
    -    # non-parametric bootstrap         
    -    for i in range(datapoints):
    -        t[i] = np.mean(data[np.random.randint(0,n,n)])
    -    # analysis    
    -    print("Bootstrap Statistics :")
    -    print("original           bias      std. error")
    -    print("%8g %8g %14g %15g" % (np.mean(data), np.std(data),np.mean(t),np.std(t)))
    -    return t
    -
    -# We set the mean value to 100 and the standard deviation to 15
    -mu, sigma = 100, 15
    -datapoints = 10000
    -# We generate random numbers according to the normal distribution
    -x = mu + sigma*np.random.randn(datapoints)
    -# bootstrap returns the data sample                                    
    -t = bootstrap(x, datapoints)
    +from autograd import grad
    +
    +def CostOLS(theta):
    +    return (1.0/n)*np.sum((y-X @ theta)**2)
    +
    +n = 100
    +x = 2*np.random.rand(n,1)
    +y = 4+3*x+np.random.randn(n,1)
    +
    +X = np.c_[np.ones((n,1)), x]
    +XT_X = X.T @ X
    +theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)
    +print("Own inversion")
    +print(theta_linreg)
    +# Hessian matrix
    +H = (2.0/n)* XT_X
    +EigValues, EigVectors = np.linalg.eig(H)
    +print(f"Eigenvalues of Hessian Matrix:{EigValues}")
    +
    +theta = np.random.randn(2,1)
    +eta = 1.0/np.max(EigValues)
    +Niterations = 1000
    +# define the gradient
    +training_gradient = grad(CostOLS)
    +
    +for iter in range(Niterations):
    +    gradients = training_gradient(theta)
    +    theta -= eta*gradients
    +print("theta from own gd")
    +print(theta)
    +
    +xnew = np.array([[0],[2]])
    +Xnew = np.c_[np.ones((2,1)), xnew]
    +ypredict = Xnew.dot(theta)
    +ypredict2 = Xnew.dot(theta_linreg)
    +
    +plt.plot(xnew, ypredict, "r-")
    +plt.plot(xnew, ypredict2, "b-")
    +plt.plot(x, y ,'ro')
    +plt.axis([0,2.0,0, 15.0])
    +plt.xlabel(r'$x$')
    +plt.ylabel(r'$y$')
    +plt.title(r'Random numbers ')
    +plt.show()
     
    @@ -1182,12 +1846,10 @@

    Code example for the Bootstrap me

    - -

    We see that our new variance and from that the standard deviation, agrees with the central limit theorem.

    -

    Plotting the Histogram

    +

    Same code but now with momentum gradient descent

    @@ -1195,15 +1857,59 @@

    Plotting the Histogram

    -
    # the histogram of the bootstrapped data (normalized data if density = True)
    -n, binsboot, patches = plt.hist(t, 50, density=True, facecolor='red', alpha=0.75)
    -# add a 'best fit' line  
    -y = norm.pdf(binsboot, np.mean(t), np.std(t))
    -lt = plt.plot(binsboot, y, 'b', linewidth=1)
    -plt.xlabel('x')
    -plt.ylabel('Probability')
    -plt.grid(True)
    -plt.show()
    +  
    # Using Autograd to calculate gradients for OLS
    +from random import random, seed
    +import numpy as np
    +import autograd.numpy as np
    +import matplotlib.pyplot as plt
    +from autograd import grad
    +
    +def CostOLS(theta):
    +    return (1.0/n)*np.sum((y-X @ theta)**2)
    +
    +n = 100
    +x = 2*np.random.rand(n,1)
    +y = 4+3*x#+np.random.randn(n,1)
    +
    +X = np.c_[np.ones((n,1)), x]
    +XT_X = X.T @ X
    +theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)
    +print("Own inversion")
    +print(theta_linreg)
    +# Hessian matrix
    +H = (2.0/n)* XT_X
    +EigValues, EigVectors = np.linalg.eig(H)
    +print(f"Eigenvalues of Hessian Matrix:{EigValues}")
    +
    +theta = np.random.randn(2,1)
    +eta = 1.0/np.max(EigValues)
    +Niterations = 30
    +
    +# define the gradient
    +training_gradient = grad(CostOLS)
    +
    +for iter in range(Niterations):
    +    gradients = training_gradient(theta)
    +    theta -= eta*gradients
    +    print(iter,gradients[0],gradients[1])
    +print("theta from own gd")
    +print(theta)
    +
    +# Now improve with momentum gradient descent
    +change = 0.0
    +delta_momentum = 0.3
    +for iter in range(Niterations):
    +    # calculate gradient
    +    gradients = training_gradient(theta)
    +    # calculate update
    +    new_change = eta*gradients+delta_momentum*change
    +    # take a step
    +    theta -= new_change
    +    # save the change
    +    change = new_change
    +    print(iter,gradients[0],gradients[1])
    +print("theta from own gd wth momentum")
    +print(theta)
     
    @@ -1221,90 +1927,13 @@

    Plotting the Histogram

    -

    The bias-variance tradeoff

    +

    Including Stochastic Gradient Descent with Autograd

    -

    We will discuss the bias-variance tradeoff in the context of -continuous predictions such as regression. However, many of the -intuitions and ideas discussed here also carry over to classification -tasks. Consider a dataset \( \mathcal{D} \) consisting of the data -\( \mathbf{X}_\mathcal{D}=\{(y_j, \boldsymbol{x}_j), j=0\ldots n-1\} \). +

    In this code we include the stochastic gradient descent approach +discussed above. Note here that we specify which argument we are +taking the derivative with respect to when using autograd.

    -

    Let us assume that the true data is generated from a noisy model

    - -

     
    -$$ -\boldsymbol{y}=f(\boldsymbol{x}) + \boldsymbol{\epsilon} -$$ -

     
    - -

    where \( \epsilon \) is normally distributed with mean zero and standard deviation \( \sigma^2 \).

    - -

    In our derivation of the ordinary least squares method we defined then -an approximation to the function \( f \) in terms of the parameters -\( \boldsymbol{\beta} \) and the design matrix \( \boldsymbol{X} \) which embody our model, -that is \( \boldsymbol{\tilde{y}}=\boldsymbol{X}\boldsymbol{\beta} \). -

    - -

    Thereafter we found the parameters \( \boldsymbol{\beta} \) by optimizing the means squared error via the so-called cost function

    -

     
    -$$ -C(\boldsymbol{X},\boldsymbol{\beta}) =\frac{1}{n}\sum_{i=0}^{n-1}(y_i-\tilde{y}_i)^2=\mathbb{E}\left[(\boldsymbol{y}-\boldsymbol{\tilde{y}})^2\right]. -$$ -

     
    - -

    We can rewrite this as

    -

     
    -$$ -\mathbb{E}\left[(\boldsymbol{y}-\boldsymbol{\tilde{y}})^2\right]=\frac{1}{n}\sum_i(f_i-\mathbb{E}\left[\boldsymbol{\tilde{y}}\right])^2+\frac{1}{n}\sum_i(\tilde{y}_i-\mathbb{E}\left[\boldsymbol{\tilde{y}}\right])^2+\sigma^2. -$$ -

     
    - -

    The three terms represent the square of the bias of the learning -method, which can be thought of as the error caused by the simplifying -assumptions built into the method. The second term represents the -variance of the chosen model and finally the last terms is variance of -the error \( \boldsymbol{\epsilon} \). -

    - -

    To derive this equation, we need to recall that the variance of \( \boldsymbol{y} \) and \( \boldsymbol{\epsilon} \) are both equal to \( \sigma^2 \). The mean value of \( \boldsymbol{\epsilon} \) is by definition equal to zero. Furthermore, the function \( f \) is not a stochastics variable, idem for \( \boldsymbol{\tilde{y}} \). -We use a more compact notation in terms of the expectation value -

    -

     
    -$$ -\mathbb{E}\left[(\boldsymbol{y}-\boldsymbol{\tilde{y}})^2\right]=\mathbb{E}\left[(\boldsymbol{f}+\boldsymbol{\epsilon}-\boldsymbol{\tilde{y}})^2\right], -$$ -

     
    - -

    and adding and subtracting \( \mathbb{E}\left[\boldsymbol{\tilde{y}}\right] \) we get

    -

     
    -$$ -\mathbb{E}\left[(\boldsymbol{y}-\boldsymbol{\tilde{y}})^2\right]=\mathbb{E}\left[(\boldsymbol{f}+\boldsymbol{\epsilon}-\boldsymbol{\tilde{y}}+\mathbb{E}\left[\boldsymbol{\tilde{y}}\right]-\mathbb{E}\left[\boldsymbol{\tilde{y}}\right])^2\right], -$$ -

     
    - -

    which, using the abovementioned expectation values can be rewritten as

    -

     
    -$$ -\mathbb{E}\left[(\boldsymbol{y}-\boldsymbol{\tilde{y}})^2\right]=\mathbb{E}\left[(\boldsymbol{y}-\mathbb{E}\left[\boldsymbol{\tilde{y}}\right])^2\right]+\mathrm{Var}\left[\boldsymbol{\tilde{y}}\right]+\sigma^2, -$$ -

     
    - -

    that is the rewriting in terms of the so-called bias, the variance of the model \( \boldsymbol{\tilde{y}} \) and the variance of \( \boldsymbol{\epsilon} \).

    -
    - -
    -

    A way to Read the Bias-Variance Tradeoff

    - -

    -
    -

    -
    -

    -
    - -
    -

    Example code for Bias-Variance tradeoff

    @@ -1312,60 +1941,79 @@

    Example code for Bias-Variance
    -
    import matplotlib.pyplot as plt
    +  
    # Using Autograd to calculate gradients using SGD
    +# OLS example
    +from random import random, seed
     import numpy as np
    -from sklearn.linear_model import LinearRegression, Ridge, Lasso
    -from sklearn.preprocessing import PolynomialFeatures
    -from sklearn.model_selection import train_test_split
    -from sklearn.pipeline import make_pipeline
    -from sklearn.utils import resample
    -
    -np.random.seed(2018)
    -
    -n = 500
    -n_boostraps = 100
    -degree = 18  # A quite high value, just to show.
    -noise = 0.1
    -
    -# Make data set.
    -x = np.linspace(-1, 3, n).reshape(-1, 1)
    -y = np.exp(-x**2) + 1.5 * np.exp(-(x-2)**2) + np.random.normal(0, 0.1, x.shape)
    -
    -# Hold out some test data that is never used in training.
    -x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)
    -
    -# Combine x transformation and model into one operation.
-# Not necessary, but convenient.
    -model = make_pipeline(PolynomialFeatures(degree=degree), LinearRegression(fit_intercept=False))
    -
    -# The following (m x n_bootstraps) matrix holds the column vectors y_pred
    -# for each bootstrap iteration.
    -y_pred = np.empty((y_test.shape[0], n_boostraps))
    -for i in range(n_boostraps):
    -    x_, y_ = resample(x_train, y_train)
    -
    -    # Evaluate the new model on the same test data each time.
    -    y_pred[:, i] = model.fit(x_, y_).predict(x_test).ravel()
    -
    -# Note: Expectations and variances taken w.r.t. different training
    -# data sets, hence the axis=1. Subsequent means are taken across the test data
    -# set in order to obtain a total value, but before this we have error/bias/variance
    -# calculated per data point in the test set.
    -# Note 2: The use of keepdims=True is important in the calculation of bias as this 
    -# maintains the column vector form. Dropping this yields very unexpected results.
    -error = np.mean( np.mean((y_test - y_pred)**2, axis=1, keepdims=True) )
    -bias = np.mean( (y_test - np.mean(y_pred, axis=1, keepdims=True))**2 )
    -variance = np.mean( np.var(y_pred, axis=1, keepdims=True) )
    -print('Error:', error)
    -print('Bias^2:', bias)
    -print('Var:', variance)
    -print('{} >= {} + {} = {}'.format(error, bias, variance, bias+variance))
    -
    -plt.plot(x[::5, :], y[::5, :], label='f(x)')
    -plt.scatter(x_test, y_test, label='Data points')
    -plt.scatter(x_test, np.mean(y_pred, axis=1), label='Pred')
    -plt.legend()
    +import autograd.numpy as np
    +import matplotlib.pyplot as plt
    +from autograd import grad
    +
    +# Note change from previous example
    +def CostOLS(y,X,theta):
    +    return np.sum((y-X @ theta)**2)
    +
    +n = 100
    +x = 2*np.random.rand(n,1)
    +y = 4+3*x+np.random.randn(n,1)
    +
    +X = np.c_[np.ones((n,1)), x]
    +XT_X = X.T @ X
    +theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)
    +print("Own inversion")
    +print(theta_linreg)
    +# Hessian matrix
    +H = (2.0/n)* XT_X
    +EigValues, EigVectors = np.linalg.eig(H)
    +print(f"Eigenvalues of Hessian Matrix:{EigValues}")
    +
    +theta = np.random.randn(2,1)
    +eta = 1.0/np.max(EigValues)
    +Niterations = 1000
    +
    +# Note that we request the derivative wrt third argument (theta, 2 here)
    +training_gradient = grad(CostOLS,2)
    +
    +for iter in range(Niterations):
    +    gradients = (1.0/n)*training_gradient(y, X, theta)
    +    theta -= eta*gradients
    +print("theta from own gd")
    +print(theta)
    +
    +xnew = np.array([[0],[2]])
    +Xnew = np.c_[np.ones((2,1)), xnew]
    +ypredict = Xnew.dot(theta)
    +ypredict2 = Xnew.dot(theta_linreg)
    +
    +plt.plot(xnew, ypredict, "r-")
    +plt.plot(xnew, ypredict2, "b-")
    +plt.plot(x, y ,'ro')
    +plt.axis([0,2.0,0, 15.0])
    +plt.xlabel(r'$x$')
    +plt.ylabel(r'$y$')
    +plt.title(r'Random numbers ')
     plt.show()
    +
    +n_epochs = 50
    +M = 5   #size of each minibatch
    +m = int(n/M) #number of minibatches
    +t0, t1 = 5, 50
    +def learning_schedule(t):
    +    return t0/(t+t1)
    +
    +theta = np.random.randn(2,1)
    +
    +for epoch in range(n_epochs):
    +# Can you figure out a better way of setting up the contributions to each batch?
    +    for i in range(m):
    +        random_index = M*np.random.randint(m)
    +        xi = X[random_index:random_index+M]
    +        yi = y[random_index:random_index+M]
    +        gradients = (1.0/M)*training_gradient(yi, xi, theta)
    +        eta = learning_schedule(epoch*m+i)
    +        theta = theta - eta*gradients
+print("theta from own sgd")
    +print(theta)
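One possible answer to the question posed in the comment above (a sketch only, not the method prescribed in these notes): shuffle the row indices at the start of each epoch and sweep through disjoint minibatches, so that every data point is used exactly once per epoch. The snippet below is self-contained and uses the analytic OLS gradient instead of Autograd.

import numpy as np

rng = np.random.default_rng(2025)

# Same type of data set as in the example above
n = 100
x = 2*rng.random((n, 1))
y = 4 + 3*x + rng.standard_normal((n, 1))
X = np.c_[np.ones((n, 1)), x]

def learning_schedule(t, t0=5.0, t1=50.0):
    return t0/(t + t1)

n_epochs = 50
M = 5            # size of each minibatch
m = n // M       # number of minibatches per epoch
theta = rng.standard_normal((2, 1))

for epoch in range(n_epochs):
    # Shuffle the row indices so every epoch sees the data in a new order
    indices = rng.permutation(n)
    for i in range(m):
        batch = indices[i*M:(i + 1)*M]          # disjoint minibatch of size M
        xi, yi = X[batch], y[batch]
        # Analytic OLS gradient for the minibatch, (2/M) X^T (X theta - y)
        gradients = (2.0/M)*xi.T @ (xi @ theta - yi)
        eta = learning_schedule(epoch*m + i)
        theta -= eta*gradients

print("theta from SGD with shuffled minibatches")
print(theta)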
     
    @@ -1383,7 +2031,7 @@

    Example code for Bias-Variance

    -

    Understanding what happens

    +

    Same code but now with momentum gradient descent

    @@ -1391,52 +2039,73 @@

    Understanding what happens

    -
    import matplotlib.pyplot as plt
    +  
    # Using Autograd to calculate gradients using SGD
    +# OLS example
    +from random import random, seed
     import numpy as np
    -from sklearn.linear_model import LinearRegression, Ridge, Lasso
    -from sklearn.preprocessing import PolynomialFeatures
    -from sklearn.model_selection import train_test_split
    -from sklearn.pipeline import make_pipeline
    -from sklearn.utils import resample
    -
    -np.random.seed(2018)
    -
    -n = 40
    -n_boostraps = 100
    -maxdegree = 14
    -
    -
    -# Make data set.
    -x = np.linspace(-3, 3, n).reshape(-1, 1)
    -y = np.exp(-x**2) + 1.5 * np.exp(-(x-2)**2)+ np.random.normal(0, 0.1, x.shape)
    -error = np.zeros(maxdegree)
    -bias = np.zeros(maxdegree)
    -variance = np.zeros(maxdegree)
    -polydegree = np.zeros(maxdegree)
    -x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)
    -
    -for degree in range(maxdegree):
    -    model = make_pipeline(PolynomialFeatures(degree=degree), LinearRegression(fit_intercept=False))
    -    y_pred = np.empty((y_test.shape[0], n_boostraps))
    -    for i in range(n_boostraps):
    -        x_, y_ = resample(x_train, y_train)
    -        y_pred[:, i] = model.fit(x_, y_).predict(x_test).ravel()
    -
    -    polydegree[degree] = degree
    -    error[degree] = np.mean( np.mean((y_test - y_pred)**2, axis=1, keepdims=True) )
    -    bias[degree] = np.mean( (y_test - np.mean(y_pred, axis=1, keepdims=True))**2 )
    -    variance[degree] = np.mean( np.var(y_pred, axis=1, keepdims=True) )
    -    print('Polynomial degree:', degree)
    -    print('Error:', error[degree])
    -    print('Bias^2:', bias[degree])
    -    print('Var:', variance[degree])
    -    print('{} >= {} + {} = {}'.format(error[degree], bias[degree], variance[degree], bias[degree]+variance[degree]))
    -
    -plt.plot(polydegree, error, label='Error')
    -plt.plot(polydegree, bias, label='bias')
    -plt.plot(polydegree, variance, label='Variance')
    -plt.legend()
    -plt.show()
    +import autograd.numpy as np
    +import matplotlib.pyplot as plt
    +from autograd import grad
    +
    +# Note change from previous example
    +def CostOLS(y,X,theta):
    +    return np.sum((y-X @ theta)**2)
    +
    +n = 100
    +x = 2*np.random.rand(n,1)
    +y = 4+3*x+np.random.randn(n,1)
    +
    +X = np.c_[np.ones((n,1)), x]
    +XT_X = X.T @ X
    +theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)
    +print("Own inversion")
    +print(theta_linreg)
    +# Hessian matrix
    +H = (2.0/n)* XT_X
    +EigValues, EigVectors = np.linalg.eig(H)
    +print(f"Eigenvalues of Hessian Matrix:{EigValues}")
    +
    +theta = np.random.randn(2,1)
    +eta = 1.0/np.max(EigValues)
    +Niterations = 100
    +
    +# Note that we request the derivative wrt third argument (theta, 2 here)
    +training_gradient = grad(CostOLS,2)
    +
    +for iter in range(Niterations):
    +    gradients = (1.0/n)*training_gradient(y, X, theta)
    +    theta -= eta*gradients
    +print("theta from own gd")
    +print(theta)
    +
    +
    +n_epochs = 50
    +M = 5   #size of each minibatch
    +m = int(n/M) #number of minibatches
    +t0, t1 = 5, 50
    +def learning_schedule(t):
    +    return t0/(t+t1)
    +
    +theta = np.random.randn(2,1)
    +
    +change = 0.0
    +delta_momentum = 0.3
    +
    +for epoch in range(n_epochs):
    +    for i in range(m):
    +        random_index = M*np.random.randint(m)
    +        xi = X[random_index:random_index+M]
    +        yi = y[random_index:random_index+M]
    +        gradients = (1.0/M)*training_gradient(yi, xi, theta)
    +        eta = learning_schedule(epoch*m+i)
    +        # calculate update
    +        new_change = eta*gradients+delta_momentum*change
    +        # take a step
    +        theta -= new_change
    +        # save the change
    +        change = new_change
+print("theta from own sgd with momentum")
    +print(theta)
     
    @@ -1454,60 +2123,67 @@

    Understanding what happens

    -

    Summing up

    +

    But none of these can compete with Newton's method

    -

The bias-variance tradeoff summarizes the fundamental tension in machine learning, particularly supervised learning, between the complexity of a model and the amount of training data needed to train it. Since data is often limited, in practice it is often useful to use a less-complex model with higher bias, that is, a model whose asymptotic performance is worse than another model because it is easier to train and less sensitive to sampling noise arising from having a finite-sized training dataset (smaller variance).

    +

    Note that we here have introduced automatic differentiation

    -

The above equations tell us that in order to minimize the expected test error, we need to select a statistical learning method that simultaneously achieves low variance and low bias. Note that variance is inherently a nonnegative quantity, and squared bias is also nonnegative. Hence, we see that the expected test MSE can never lie below \( Var(\epsilon) \), the irreducible error.

    - -

What do we mean by the variance and bias of a statistical learning method? The variance refers to the amount by which our model would change if we estimated it using a different training data set. Since the training data are used to fit the statistical learning method, different training data sets will result in a different estimate. But ideally the estimate for our model should not vary too much between training sets. However, if a method has high variance, then small changes in the training data can result in large changes in the model. In general, more flexible statistical methods have higher variance.

    - -

    You may also find this recent article of interest.

    + +
    +
    +
    +
    +
    +
    # Using Newton's method
    +from random import random, seed
    +import numpy as np
    +import autograd.numpy as np
    +from autograd import grad
    +
    +def CostOLS(theta):
    +    return (1.0/n)*np.sum((y-X @ theta)**2)
    +
    +n = 100
    +x = 2*np.random.rand(n,1)
    +y = 4+3*x+5*x*x
    +
    +X = np.c_[np.ones((n,1)), x, x*x]
    +XT_X = X.T @ X
    +theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)
    +print("Own inversion")
    +print(theta_linreg)
    +# Hessian matrix
    +H = (2.0/n)* XT_X
    +# Note that here the Hessian does not depend on the parameters theta
    +invH = np.linalg.pinv(H)
    +theta = np.random.randn(3,1)
    +Niterations = 5
    +# define the gradient
    +training_gradient = grad(CostOLS)
    +
    +for iter in range(Niterations):
    +    gradients = training_gradient(theta)
    +    theta -= invH @ gradients
    +    print(iter,gradients[0],gradients[1])
    +print("theta from own Newton code")
    +print(theta)
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    -

    Another Example from Scikit-Learn's Repository

    - -

This example demonstrates the problems of underfitting and overfitting, and how we can use linear regression with polynomial features to approximate nonlinear functions. The plot shows the function that we want to approximate, which is a part of the cosine function. In addition, the samples from the real function and the approximations of different models are displayed. The models have polynomial features of different degrees. We can see that a linear function (polynomial with degree 1) is not sufficient to fit the training samples. This is called underfitting. A polynomial of degree 4 approximates the true function almost perfectly. However, for higher degrees the model will overfit the training data, i.e. it learns the noise of the training data. We evaluate overfitting and underfitting quantitatively by using cross-validation: we calculate the mean squared error (MSE) on the validation set; the higher it is, the less likely the model is to generalize correctly from the training data.

    - +

    Similar (second order function now) problem but now with AdaGrad

    @@ -1515,55 +2191,54 @@

    Another Example from Sci
    -
    #print(__doc__)
    -
    +  
    # Using Autograd to calculate gradients using AdaGrad and Stochastic Gradient descent
    +# OLS example
    +from random import random, seed
     import numpy as np
    +import autograd.numpy as np
     import matplotlib.pyplot as plt
    -from sklearn.pipeline import Pipeline
    -from sklearn.preprocessing import PolynomialFeatures
    -from sklearn.linear_model import LinearRegression
    -from sklearn.model_selection import cross_val_score
    -
    -
    -def true_fun(X):
    -    return np.cos(1.5 * np.pi * X)
    -
    -np.random.seed(0)
    -
    -n_samples = 30
    -degrees = [1, 4, 15]
    -
    -X = np.sort(np.random.rand(n_samples))
    -y = true_fun(X) + np.random.randn(n_samples) * 0.1
    -
    -plt.figure(figsize=(14, 5))
    -for i in range(len(degrees)):
    -    ax = plt.subplot(1, len(degrees), i + 1)
    -    plt.setp(ax, xticks=(), yticks=())
    -
    -    polynomial_features = PolynomialFeatures(degree=degrees[i],
    -                                             include_bias=False)
    -    linear_regression = LinearRegression()
    -    pipeline = Pipeline([("polynomial_features", polynomial_features),
    -                         ("linear_regression", linear_regression)])
    -    pipeline.fit(X[:, np.newaxis], y)
    -
    -    # Evaluate the models using crossvalidation
    -    scores = cross_val_score(pipeline, X[:, np.newaxis], y,
    -                             scoring="neg_mean_squared_error", cv=10)
    -
    -    X_test = np.linspace(0, 1, 100)
    -    plt.plot(X_test, pipeline.predict(X_test[:, np.newaxis]), label="Model")
    -    plt.plot(X_test, true_fun(X_test), label="True function")
    -    plt.scatter(X, y, edgecolor='b', s=20, label="Samples")
    -    plt.xlabel("x")
    -    plt.ylabel("y")
    -    plt.xlim((0, 1))
    -    plt.ylim((-2, 2))
    -    plt.legend(loc="best")
    -    plt.title("Degree {}\nMSE = {:.2e}(+/- {:.2e})".format(
    -        degrees[i], -scores.mean(), scores.std()))
    -plt.show()
    +from autograd import grad
    +
    +# Note change from previous example
    +def CostOLS(y,X,theta):
    +    return np.sum((y-X @ theta)**2)
    +
    +n = 1000
    +x = np.random.rand(n,1)
    +y = 2.0+3*x +4*x*x
    +
    +X = np.c_[np.ones((n,1)), x, x*x]
    +XT_X = X.T @ X
    +theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)
    +print("Own inversion")
    +print(theta_linreg)
    +
    +
    +# Note that we request the derivative wrt third argument (theta, 2 here)
    +training_gradient = grad(CostOLS,2)
    +# Define parameters for Stochastic Gradient Descent
    +n_epochs = 50
    +M = 5   #size of each minibatch
    +m = int(n/M) #number of minibatches
    +# Guess for unknown parameters theta
    +theta = np.random.randn(3,1)
    +
    +# Value for learning rate
    +eta = 0.01
    +# Including AdaGrad parameter to avoid possible division by zero
    +delta  = 1e-8
    +for epoch in range(n_epochs):
    +    Giter = 0.0
    +    for i in range(m):
    +        random_index = M*np.random.randint(m)
    +        xi = X[random_index:random_index+M]
    +        yi = y[random_index:random_index+M]
    +        gradients = (1.0/M)*training_gradient(yi, xi, theta)
    +        Giter += gradients*gradients
    +        update = gradients*eta/(delta+np.sqrt(Giter))
    +        theta -= update
    +print("theta from own AdaGrad")
    +print(theta)
     
    @@ -1578,52 +2253,12 @@

    Another Example from Sci

    -
    - -
    -

    Various steps in cross-validation

    - -

When the repetitive splitting of the data set is done randomly, samples may accidentally end up in a vast majority of the splits in either the training or the test set. Such samples may have an unbalanced influence on either model building or prediction evaluation. To avoid this, \( k \)-fold cross-validation structures the data splitting. The samples are divided into \( k \) more or less equally sized, exhaustive and mutually exclusive subsets. In turn (at each split) one of these subsets plays the role of the test set while the union of the remaining subsets constitutes the training set. Such a splitting warrants a balanced representation of each sample in both training and test set over the splits. Still, the division into the \( k \) subsets involves a degree of randomness. This may be fully excluded when choosing \( k=n \). This particular case is referred to as leave-one-out cross-validation (LOOCV).

    -
    - -
    -

Cross-validation in brief

For the various values of \( k \):

1. Shuffle the dataset randomly.
2. Split the dataset into \( k \) groups.
3. For each unique group:
  a. Decide which group to use as the test data set.
  b. Take the remaining groups as the training data set.
  c. Fit a model on the training set and evaluate it on the test set.
  d. Retain the evaluation score and discard the model.
4. Summarize the model using the sample of model evaluation scores.

A minimal code sketch of these steps is given right below.
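The following is a minimal sketch of the steps listed above, written with plain NumPy instead of Scikit-Learn's KFold; the data set and the polynomial degree are arbitrary choices made only for illustration.

import numpy as np

rng = np.random.default_rng(3155)

# Simple data set: a noisy quadratic, similar in spirit to the examples above
n = 100
x = rng.standard_normal(n)
y = 3*x**2 + rng.standard_normal(n)

# Design matrix for a polynomial of degree 2 (illustrative choice)
X = np.c_[np.ones(n), x, x**2]

k = 5
indices = rng.permutation(n)          # step 1: shuffle
folds = np.array_split(indices, k)    # step 2: split into k groups

scores = np.zeros(k)
for j in range(k):                    # step 3: loop over the unique groups
    test_inds = folds[j]
    train_inds = np.concatenate([folds[i] for i in range(k) if i != j])
    X_train, y_train = X[train_inds], y[train_inds]
    X_test, y_test = X[test_inds], y[test_inds]
    # Fit OLS on the training folds
    theta = np.linalg.pinv(X_train.T @ X_train) @ X_train.T @ y_train
    # Evaluate on the held-out fold and retain the score
    scores[j] = np.mean((y_test - X_test @ theta)**2)

# step 4: summarize
print("MSE per fold:", scores)
print("Mean CV MSE:", np.mean(scores))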
    +

    Running this code we note an almost perfect agreement with the results from matrix inversion.

    -

    Code Example for Cross-validation and \( k \)-fold Cross-validation

    - -

    The code here uses Ridge regression with cross-validation (CV) resampling and \( k \)-fold CV in order to fit a specific polynomial.

    +

    RMSprop for adaptive learning rate with Stochastic Gradient Descent

    @@ -1631,95 +2266,60 @@

    Code Exam
    -
    import numpy as np
    +  
    # Using Autograd to calculate gradients using RMSprop  and Stochastic Gradient descent
    +# OLS example
    +from random import random, seed
    +import numpy as np
    +import autograd.numpy as np
     import matplotlib.pyplot as plt
    -from sklearn.model_selection import KFold
    -from sklearn.linear_model import Ridge
    -from sklearn.model_selection import cross_val_score
    -from sklearn.preprocessing import PolynomialFeatures
    -
    -# A seed just to ensure that the random numbers are the same for every run.
    -# Useful for eventual debugging.
    -np.random.seed(3155)
    -
    -# Generate the data.
    -nsamples = 100
    -x = np.random.randn(nsamples)
    -y = 3*x**2 + np.random.randn(nsamples)
    -
    -## Cross-validation on Ridge regression using KFold only
    -
    -# Decide degree on polynomial to fit
    -poly = PolynomialFeatures(degree = 6)
    -
    -# Decide which values of lambda to use
    -nlambdas = 500
    -lambdas = np.logspace(-3, 5, nlambdas)
    -
    -# Initialize a KFold instance
    -k = 5
    -kfold = KFold(n_splits = k)
    -
    -# Perform the cross-validation to estimate MSE
    -scores_KFold = np.zeros((nlambdas, k))
    -
    -i = 0
    -for lmb in lambdas:
    -    ridge = Ridge(alpha = lmb)
    -    j = 0
    -    for train_inds, test_inds in kfold.split(x):
    -        xtrain = x[train_inds]
    -        ytrain = y[train_inds]
    -
    -        xtest = x[test_inds]
    -        ytest = y[test_inds]
    -
    -        Xtrain = poly.fit_transform(xtrain[:, np.newaxis])
    -        ridge.fit(Xtrain, ytrain[:, np.newaxis])
    -
    -        Xtest = poly.fit_transform(xtest[:, np.newaxis])
    -        ypred = ridge.predict(Xtest)
    -
    -        scores_KFold[i,j] = np.sum((ypred - ytest[:, np.newaxis])**2)/np.size(ypred)
    -
    -        j += 1
    -    i += 1
    -
    -
    -estimated_mse_KFold = np.mean(scores_KFold, axis = 1)
    -
    -## Cross-validation using cross_val_score from sklearn along with KFold
    -
    -# kfold is an instance initialized above as:
    -# kfold = KFold(n_splits = k)
    -
    -estimated_mse_sklearn = np.zeros(nlambdas)
    -i = 0
    -for lmb in lambdas:
    -    ridge = Ridge(alpha = lmb)
    -
    -    X = poly.fit_transform(x[:, np.newaxis])
    -    estimated_mse_folds = cross_val_score(ridge, X, y[:, np.newaxis], scoring='neg_mean_squared_error', cv=kfold)
    -
-    # cross_val_score returns an array containing the estimated negative mse for every fold.
-    # We have to take the mean of every array in order to get an estimate of the mse of the model.
    -    estimated_mse_sklearn[i] = np.mean(-estimated_mse_folds)
    -
    -    i += 1
    -
    -## Plot and compare the slightly different ways to perform cross-validation
    -
    -plt.figure()
    -
    -plt.plot(np.log10(lambdas), estimated_mse_sklearn, label = 'cross_val_score')
    -plt.plot(np.log10(lambdas), estimated_mse_KFold, 'r--', label = 'KFold')
    -
    -plt.xlabel('log10(lambda)')
    -plt.ylabel('mse')
    -
    -plt.legend()
    -
    -plt.show()
    +from autograd import grad
    +
    +# Note change from previous example
    +def CostOLS(y,X,theta):
    +    return np.sum((y-X @ theta)**2)
    +
    +n = 1000
    +x = np.random.rand(n,1)
    +y = 2.0+3*x +4*x*x# +np.random.randn(n,1)
    +
    +X = np.c_[np.ones((n,1)), x, x*x]
    +XT_X = X.T @ X
    +theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)
    +print("Own inversion")
    +print(theta_linreg)
    +
    +
    +# Note that we request the derivative wrt third argument (theta, 2 here)
    +training_gradient = grad(CostOLS,2)
    +# Define parameters for Stochastic Gradient Descent
    +n_epochs = 50
    +M = 5   #size of each minibatch
    +m = int(n/M) #number of minibatches
    +# Guess for unknown parameters theta
    +theta = np.random.randn(3,1)
    +
    +# Value for learning rate
    +eta = 0.01
    +# Value for parameter rho
    +rho = 0.99
    +# Including AdaGrad parameter to avoid possible division by zero
    +delta  = 1e-8
    +for epoch in range(n_epochs):
    +    Giter = 0.0
    +    for i in range(m):
    +        random_index = M*np.random.randint(m)
    +        xi = X[random_index:random_index+M]
    +        yi = y[random_index:random_index+M]
    +        gradients = (1.0/M)*training_gradient(yi, xi, theta)
    +	# Accumulated gradient
    +	# Scaling with rho the new and the previous results
    +        Giter = (rho*Giter+(1-rho)*gradients*gradients)
    +	# Taking the diagonal only and inverting
    +        update = gradients*eta/(delta+np.sqrt(Giter))
    +	# Hadamard product
    +        theta -= update
    +print("theta from own RMSprop")
    +print(theta)
     
    @@ -1737,7 +2337,7 @@

    Code Exam

    -

    More examples on bootstrap and cross-validation and errors

    +

    And finally ADAM

    @@ -1746,84 +2346,65 @@

    More example
    -
    # Common imports
    -import os
    +  
# Using Autograd to calculate gradients using ADAM and Stochastic Gradient Descent
    +# OLS example
    +from random import random, seed
     import numpy as np
    -import pandas as pd
    +import autograd.numpy as np
     import matplotlib.pyplot as plt
    -from sklearn.linear_model import LinearRegression, Ridge, Lasso
    -from sklearn.model_selection import train_test_split
    -from sklearn.utils import resample
    -from sklearn.metrics import mean_squared_error
    -# Where to save the figures and data files
    -PROJECT_ROOT_DIR = "Results"
    -FIGURE_ID = "Results/FigureFiles"
    -DATA_ID = "DataFiles/"
    -
    -if not os.path.exists(PROJECT_ROOT_DIR):
    -    os.mkdir(PROJECT_ROOT_DIR)
    -
    -if not os.path.exists(FIGURE_ID):
    -    os.makedirs(FIGURE_ID)
    -
    -if not os.path.exists(DATA_ID):
    -    os.makedirs(DATA_ID)
    -
    -def image_path(fig_id):
    -    return os.path.join(FIGURE_ID, fig_id)
    -
    -def data_path(dat_id):
    -    return os.path.join(DATA_ID, dat_id)
    -
    -def save_fig(fig_id):
    -    plt.savefig(image_path(fig_id) + ".png", format='png')
    -
    -infile = open(data_path("EoS.csv"),'r')
    -
    -# Read the EoS data as  csv file and organize the data into two arrays with density and energies
    -EoS = pd.read_csv(infile, names=('Density', 'Energy'))
    -EoS['Energy'] = pd.to_numeric(EoS['Energy'], errors='coerce')
    -EoS = EoS.dropna()
    -Energies = EoS['Energy']
    -Density = EoS['Density']
    -#  The design matrix now as function of various polytrops
    -
    -Maxpolydegree = 30
    -X = np.zeros((len(Density),Maxpolydegree))
    -X[:,0] = 1.0
    -testerror = np.zeros(Maxpolydegree)
    -trainingerror = np.zeros(Maxpolydegree)
    -polynomial = np.zeros(Maxpolydegree)
    -
    -trials = 100
    -for polydegree in range(1, Maxpolydegree):
    -    polynomial[polydegree] = polydegree
    -    for degree in range(polydegree):
    -        X[:,degree] = Density**(degree/3.0)
    -
    -# loop over trials in order to estimate the expectation value of the MSE
    -    testerror[polydegree] = 0.0
    -    trainingerror[polydegree] = 0.0
    -    for samples in range(trials):
    -        x_train, x_test, y_train, y_test = train_test_split(X, Energies, test_size=0.2)
    -        model = LinearRegression(fit_intercept=False).fit(x_train, y_train)
    -        ypred = model.predict(x_train)
    -        ytilde = model.predict(x_test)
    -        testerror[polydegree] += mean_squared_error(y_test, ytilde)
    -        trainingerror[polydegree] += mean_squared_error(y_train, ypred) 
    -
    -    testerror[polydegree] /= trials
    -    trainingerror[polydegree] /= trials
    -    print("Degree of polynomial: %3d"% polynomial[polydegree])
    -    print("Mean squared error on training data: %.8f" % trainingerror[polydegree])
    -    print("Mean squared error on test data: %.8f" % testerror[polydegree])
    -
    -plt.plot(polynomial, np.log10(trainingerror), label='Training Error')
    -plt.plot(polynomial, np.log10(testerror), label='Test Error')
    -plt.xlabel('Polynomial degree')
    -plt.ylabel('log10[MSE]')
    -plt.legend()
    -plt.show()
    +from autograd import grad
    +
    +# Note change from previous example
    +def CostOLS(y,X,theta):
    +    return np.sum((y-X @ theta)**2)
    +
    +n = 1000
    +x = np.random.rand(n,1)
    +y = 2.0+3*x +4*x*x# +np.random.randn(n,1)
    +
    +X = np.c_[np.ones((n,1)), x, x*x]
    +XT_X = X.T @ X
    +theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)
    +print("Own inversion")
    +print(theta_linreg)
    +
    +
    +# Note that we request the derivative wrt third argument (theta, 2 here)
    +training_gradient = grad(CostOLS,2)
    +# Define parameters for Stochastic Gradient Descent
    +n_epochs = 50
    +M = 5   #size of each minibatch
    +m = int(n/M) #number of minibatches
    +# Guess for unknown parameters theta
    +theta = np.random.randn(3,1)
    +
    +# Value for learning rate
    +eta = 0.01
    +# Value for parameters theta1 and theta2, see https://arxiv.org/abs/1412.6980
    +theta1 = 0.9
    +theta2 = 0.999
    +# Including AdaGrad parameter to avoid possible division by zero
    +delta  = 1e-7
    +iter = 0
    +for epoch in range(n_epochs):
    +    first_moment = 0.0
    +    second_moment = 0.0
    +    iter += 1
    +    for i in range(m):
    +        random_index = M*np.random.randint(m)
    +        xi = X[random_index:random_index+M]
    +        yi = y[random_index:random_index+M]
    +        gradients = (1.0/M)*training_gradient(yi, xi, theta)
    +        # Computing moments first
    +        first_moment = theta1*first_moment + (1-theta1)*gradients
    +        second_moment = theta2*second_moment+(1-theta2)*gradients*gradients
    +        first_term = first_moment/(1.0-theta1**iter)
    +        second_term = second_moment/(1.0-theta2**iter)
    +	# Scaling with rho the new and the previous results
    +        update = eta*first_term/(np.sqrt(second_term)+delta)
    +        theta -= update
    +print("theta from own ADAM")
    +print(theta)
     
    @@ -1838,14 +2419,47 @@

    More example

    +

    + +
    +

    Material for the lab sessions

    -

Note that we kept the intercept column in the fitting here. This means that we need to set fit_intercept to False in the call to the Scikit-Learn function. Alternatively, we could have set up the design matrix \( X \) without the first column of ones.

    +
    + +

    +

      +

1. Exercise set for week 37 and reminder on scaling (from lab sessions of week 35)
2. Work on project 1
    +

    +

    For more discussions of Ridge regression and calculation of averages, Wessel van Wieringen's article is highly recommended.

    +
    -

    The same example but now with cross-validation

    +

    Reminder on different scaling methods

    + +

Before fitting a regression model, it is good practice to normalize or standardize the features. This ensures all features are on a comparable scale, which is especially important when using regularization. In the exercises this week we will perform standardization, scaling each feature to have mean 0 and standard deviation 1.

    + +

Here we compute the mean and standard deviation of each column (feature) in our design/feature matrix \( \boldsymbol{X} \). Then we subtract the mean and divide by the standard deviation for each feature.

    + +

In the example here we will also center the target \( \boldsymbol{y} \) to mean \( 0 \). Centering \( \boldsymbol{y} \) (and each feature) means the model does not require a separate intercept term; the data is shifted such that the intercept is effectively \( 0 \). (In practice, one could include an intercept in the model and not penalize it, but here we simplify by centering.) Choose \( n=100 \) data points and set up \( \boldsymbol{x} \), \( \boldsymbol{y} \) and the design matrix \( \boldsymbol{X} \). One possible setup is sketched right below.
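The following is a possible setup; the target function and polynomial degree are arbitrary illustrative choices, since the exercise text does not fix them.

import numpy as np

rng = np.random.default_rng(2025)

n = 100
x = np.linspace(-3, 3, n)
# Illustrative target: a noisy second-order polynomial (the exercise does not prescribe this choice)
y = 2.0 + 3.0*x + 4.0*x**2 + rng.standard_normal(n)

# Design matrix without an intercept column, since we will center the data instead
p = 5   # polynomial degree, an arbitrary choice
X = np.column_stack([x**j for j in range(1, p + 1)])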

    -

In this example we keep the intercept column again, but add cross-validation in order to estimate the best possible value of the mean squared error.

    @@ -1853,73 +2467,15 @@

    The same example but now
    -
    # Common imports
    -import os
    -import numpy as np
    -import pandas as pd
    -import matplotlib.pyplot as plt
    -from sklearn.linear_model import LinearRegression, Ridge, Lasso
    -from sklearn.metrics import mean_squared_error
    -from sklearn.model_selection import KFold
    -from sklearn.model_selection import cross_val_score
    -
    -
    -# Where to save the figures and data files
    -PROJECT_ROOT_DIR = "Results"
    -FIGURE_ID = "Results/FigureFiles"
    -DATA_ID = "DataFiles/"
    -
    -if not os.path.exists(PROJECT_ROOT_DIR):
    -    os.mkdir(PROJECT_ROOT_DIR)
    -
    -if not os.path.exists(FIGURE_ID):
    -    os.makedirs(FIGURE_ID)
    -
    -if not os.path.exists(DATA_ID):
    -    os.makedirs(DATA_ID)
    -
    -def image_path(fig_id):
    -    return os.path.join(FIGURE_ID, fig_id)
    -
    -def data_path(dat_id):
    -    return os.path.join(DATA_ID, dat_id)
    -
    -def save_fig(fig_id):
    -    plt.savefig(image_path(fig_id) + ".png", format='png')
    -
    -infile = open(data_path("EoS.csv"),'r')
    -
    -# Read the EoS data as  csv file and organize the data into two arrays with density and energies
    -EoS = pd.read_csv(infile, names=('Density', 'Energy'))
    -EoS['Energy'] = pd.to_numeric(EoS['Energy'], errors='coerce')
    -EoS = EoS.dropna()
    -Energies = EoS['Energy']
    -Density = EoS['Density']
    -#  The design matrix now as function of various polytrops
    -
    -Maxpolydegree = 30
    -X = np.zeros((len(Density),Maxpolydegree))
    -X[:,0] = 1.0
    -estimated_mse_sklearn = np.zeros(Maxpolydegree)
    -polynomial = np.zeros(Maxpolydegree)
    -k =5
    -kfold = KFold(n_splits = k)
    -
    -for polydegree in range(1, Maxpolydegree):
    -    polynomial[polydegree] = polydegree
    -    for degree in range(polydegree):
    -        X[:,degree] = Density**(degree/3.0)
    -        OLS = LinearRegression(fit_intercept=False)
    -# loop over trials in order to estimate the expectation value of the MSE
    -    estimated_mse_folds = cross_val_score(OLS, X, Energies, scoring='neg_mean_squared_error', cv=kfold)
    -#[:, np.newaxis]
    -    estimated_mse_sklearn[polydegree] = np.mean(-estimated_mse_folds)
    -
    -plt.plot(polynomial, np.log10(estimated_mse_sklearn), label='Test Error')
    -plt.xlabel('Polynomial degree')
    -plt.ylabel('log10[MSE]')
    -plt.legend()
    -plt.show()
    +  
    # Standardize features (zero mean, unit variance for each feature)
    +X_mean = X.mean(axis=0)
    +X_std = X.std(axis=0)
    +X_std[X_std == 0] = 1  # safeguard to avoid division by zero for constant features
    +X_norm = (X - X_mean) / X_std
    +
    +# Center the target to zero mean (optional, to simplify intercept handling)
    +y_mean = ?
    +y_centered = ?
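One possible way to fill in the two blanks above, assuming the intent is simply to shift the target to zero mean (and that numpy is imported as np and y is your target array from the setup):

y_mean = np.mean(y)            # mean of the target over the data set
y_centered = y - y_mean        # target with zero mean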
     
    @@ -1934,211 +2490,606 @@

    The same example but now

    + +

    Do we need to center the values of \( y \)?

    + +

After this preprocessing, each column of \( \boldsymbol{X}_{\mathrm{norm}} \) has mean zero and standard deviation \( 1 \), and \( \boldsymbol{y}_{\mathrm{centered}} \) has mean \( 0 \). This can make the optimization landscape nicer and ensures that the regularization penalty \( \lambda \sum_j \theta_j^2 \) in Ridge regression treats each coefficient fairly (since the features are on the same scale).
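A quick sanity check of these claims, assuming the variables X_norm and y_centered from the snippets above have been defined:

print(X_norm.mean(axis=0))   # should be (numerically) zero for every feature
print(X_norm.std(axis=0))    # should be one for every feature
print(y_centered.mean())     # should be (numerically) zero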

    -

    Material for the lab sessions

    +

    Functionality in Scikit-Learn

    + +

Scikit-Learn has several functions which allow us to rescale the data, normally resulting in much better results in terms of various accuracy scores. The StandardScaler function in Scikit-Learn ensures that for each feature/predictor we study, the mean value is zero and the variance is one (for every column in the design/feature matrix). This scaling has the drawback that it does not ensure that we have a particular maximum or minimum in our data set. Another function included in Scikit-Learn is the MinMaxScaler, which ensures that all features are exactly between \( 0 \) and \( 1 \). A short usage sketch is given below.
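A brief usage sketch (the data here is synthetic and only meant to illustrate the fit-on-training, transform-both pattern):

import numpy as np
from sklearn.preprocessing import StandardScaler, MinMaxScaler
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 3))
y = rng.standard_normal(100)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

scaler = StandardScaler()                        # or MinMaxScaler()
X_train_scaled = scaler.fit_transform(X_train)   # fit the scaler on the training data only
X_test_scaled = scaler.transform(X_test)         # reuse the training statistics on the test data

print(X_train_scaled.mean(axis=0))               # close to zero for StandardScaler
print(X_train_scaled.std(axis=0))                # close to one for StandardScaler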

    -

    Linking the regression analysis with a statistical interpretation

    +

    More preprocessing

    -

We will now couple the discussions of ordinary least squares, Ridge and Lasso regression with a statistical interpretation, that is, we move from a linear algebra analysis to a statistical analysis. In particular, we will focus on what the regularization terms can result in. We will, amongst other things, show that the regularization parameter can considerably reduce the variance of the parameters \( \beta \).

    + +

    +

The Normalizer scales each data point such that the feature vector has a Euclidean length of one. In other words, it projects a data point onto the circle (or sphere in the case of higher dimensions) with a radius of 1. This means every data point is scaled by a different number (by the inverse of its length). This normalization is often used when only the direction (or angle) of the data matters, not the length of the feature vector.

    + +

The RobustScaler works similarly to the StandardScaler in that it ensures statistical properties for each feature that guarantee they are on the same scale. However, the RobustScaler uses the median and quartiles instead of the mean and variance. This makes the RobustScaler ignore data points that are very different from the rest (like measurement errors). These odd data points are also called outliers, and they can lead to trouble for other scaling techniques. A brief illustration follows below.
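A small illustration of these two transformers on a synthetic data set with an artificial outlier:

import numpy as np
from sklearn.preprocessing import Normalizer, RobustScaler

X = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [100.0, 3.0]])    # the last row acts as an artificial outlier

# Normalizer: rescales each data point (row) to unit Euclidean length
print(Normalizer().fit_transform(X))

# RobustScaler: centers with the median and scales with the interquartile range,
# so the outlier barely affects how the other points are scaled
print(RobustScaler().fit_transform(X))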

    +
    +
    -

The advantage of doing linear regression is that we actually end up with analytical expressions for several statistical quantities. Standard least squares and Ridge regression allow us to derive quantities like the variance and other expectation values in a rather straightforward way.

    +
    +

    Frequently used scaling functions

    -

It is assumed that \( \varepsilon_i \sim \mathcal{N}(0, \sigma^2) \) and that the \( \varepsilon_{i} \) are independent, i.e.,

$$
\begin{align*}
\mbox{Cov}(\varepsilon_{i_1}, \varepsilon_{i_2}) & = \left\{ \begin{array}{lcc} \sigma^2 & \mbox{if} & i_1 = i_2, \\ 0 & \mbox{if} & i_1 \not= i_2. \end{array} \right.
\end{align*}
$$

Many features are often scaled using standardization to improve performance. In Scikit-Learn this is given by the StandardScaler function as discussed above. It is, however, easy to write your own. Mathematically, this involves subtracting the mean and dividing by the standard deviation over the data set, for each feature:

    +

     
$$
x_j^{(i)} \rightarrow \frac{x_j^{(i)} - \overline{x}_j}{\sigma(x_j)},
$$

     
    -

The randomness of \( \varepsilon_i \) implies that \( \mathbf{y}_i \) is also a random variable. In particular, \( \mathbf{y}_i \) is normally distributed, because \( \varepsilon_i \sim \mathcal{N}(0, \sigma^2) \) and \( \mathbf{X}_{i,\ast} \, \boldsymbol{\beta} \) is a non-random scalar. To specify the parameters of the distribution of \( \mathbf{y}_i \) we need to calculate its first two moments.

where \( \overline{x}_j \) and \( \sigma(x_j) \) are the mean and standard deviation, respectively, of the feature \( x_j \). This ensures that each feature has zero mean and unit standard deviation. For data sets where we do not have the standard deviation, or don't wish to calculate it, it is common to simply set it to one.

    -

Recall that \( \boldsymbol{X} \) is a matrix of dimensionality \( n\times p \). The notation \( \mathbf{X}_{i,\ast} \) above means that we are looking at row number \( i \) and perform a sum over all \( p \) values.

Keep in mind that when you transform your data set before training a model, the same transformation needs to be applied to any new data set before making a prediction. Translated into Python code, this could be implemented as

    -
    -
    -

    Assumptions made

    -

The assumption we have made here can be summarized as follows (and this is going to be useful when we discuss the bias-variance tradeoff): there exists a function \( f(\boldsymbol{x}) \) and a normally distributed error \( \boldsymbol{\varepsilon}\sim \mathcal{N}(0, \sigma^2) \) which describe our data,

$$
\boldsymbol{y} = f(\boldsymbol{x})+\boldsymbol{\varepsilon}.
$$

    +
    +
    +
    +
    +
    """
    +#Model training, we compute the mean value of y and X
    +y_train_mean = np.mean(y_train)
    +X_train_mean = np.mean(X_train,axis=0)
    +X_train = X_train - X_train_mean
    +y_train = y_train - y_train_mean
    +
+# Then we fit our model with the training data
    +trained_model = some_model.fit(X_train,y_train)
    +
    +
    +#Model prediction, we need also to transform our data set used for the prediction.
    +X_test = X_test - X_train_mean #Use mean from training data
+y_pred = trained_model.predict(X_test)
    +y_pred = y_pred + y_train_mean
    +"""
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    + +

Let us try to understand what this may imply mathematically when we subtract the mean values, also known as zero centering. For simplicity, we will focus on ordinary regression, as done in the above example.

    + +

    The cost/loss function for regression is

     
$$
C(\theta_0, \theta_1, \dots , \theta_{p-1}) = \frac{1}{n}\sum_{i=0}^{n-1} \left(y_i - \theta_0 - \sum_{j=1}^{p-1} X_{ij}\theta_j\right)^2.
$$

     
    -

We approximate this function with our model from the solution of the linear regression equations, that is, our function \( f \) is approximated by \( \boldsymbol{\tilde{y}} \), where we want to minimize \( (\boldsymbol{y}-\boldsymbol{\tilde{y}})^2 \), our MSE, with

$$
\boldsymbol{\tilde{y}} = \boldsymbol{X}\boldsymbol{\beta}.
$$

Recall also that we use the squared value. This expression can lead to an increased penalty for higher differences between predicted and output/target values.

    + +

What we have done is to single out the \( \theta_0 \) term in the definition of the mean squared error (MSE). The design matrix \( X \) does in this case not contain any intercept column. When we take the derivative with respect to \( \theta_0 \), we want the derivative to obey

    +

     
$$
\frac{\partial C}{\partial \theta_j} = 0,
$$

     
    -

    -
    -

    Expectation value and variance

    +

    for all \( j \). For \( \theta_0 \) we have

    -

We can calculate the expectation value of \( \boldsymbol{y} \) for a given element \( i \),

$$
\begin{align*}
\mathbb{E}(y_i) & = \mathbb{E}(\mathbf{X}_{i, \ast} \, \boldsymbol{\beta}) + \mathbb{E}(\varepsilon_i) = \mathbf{X}_{i, \ast} \, \boldsymbol{\beta},
\end{align*}
$$

     
$$
\frac{\partial C}{\partial \theta_0} = -\frac{2}{n}\sum_{i=0}^{n-1} \left(y_i - \theta_0 - \sum_{j=1}^{p-1} X_{ij} \theta_j\right).
$$

     
    -

while its variance is

$$
\begin{align*}
\mbox{Var}(y_i) & = \mathbb{E} \{ [y_i - \mathbb{E}(y_i)]^2 \} = \mathbb{E} ( y_i^2 ) - [\mathbb{E}(y_i)]^2 \\
& = \mathbb{E} [ ( \mathbf{X}_{i, \ast} \, \boldsymbol{\beta} + \varepsilon_i )^2] - ( \mathbf{X}_{i, \ast} \, \boldsymbol{\beta})^2 \\
& = \mathbb{E} [ ( \mathbf{X}_{i, \ast} \, \boldsymbol{\beta})^2 + 2 \varepsilon_i \mathbf{X}_{i, \ast} \, \boldsymbol{\beta} + \varepsilon_i^2 ] - ( \mathbf{X}_{i, \ast} \, \boldsymbol{\beta})^2 \\
& = ( \mathbf{X}_{i, \ast} \, \boldsymbol{\beta})^2 + 2 \mathbb{E}(\varepsilon_i) \mathbf{X}_{i, \ast} \, \boldsymbol{\beta} + \mathbb{E}(\varepsilon_i^2 ) - ( \mathbf{X}_{i, \ast} \, \boldsymbol{\beta})^2 \\
& = \mathbb{E}(\varepsilon_i^2 ) = \mbox{Var}(\varepsilon_i) = \sigma^2.
\end{align*}
$$

    +

    Multiplying away the constant \( 2/n \), we obtain

     
$$
\sum_{i=0}^{n-1} \theta_0 = \sum_{i=0}^{n-1}y_i - \sum_{i=0}^{n-1} \sum_{j=1}^{p-1} X_{ij} \theta_j.
$$

     
    -

Hence, \( y_i \sim \mathcal{N}( \mathbf{X}_{i, \ast} \, \boldsymbol{\beta}, \sigma^2) \), that is, \( \boldsymbol{y} \) follows a normal distribution with mean value \( \boldsymbol{X}\boldsymbol{\beta} \) and variance \( \sigma^2 \) (not to be confused with the singular values of the SVD).

Let us specialize first to the case where we have only two parameters, \( \theta_0 \) and \( \theta_1 \). Our result for \( \theta_0 \) then simplifies to

    -
    +

     
$$
n\theta_0 = \sum_{i=0}^{n-1}y_i - \sum_{i=0}^{n-1} X_{i1} \theta_1.
$$

     
    -

    -

    Expectation value and variance for \( \boldsymbol{\beta} \)

    +

    We obtain then

    +

     
$$
\theta_0 = \frac{1}{n}\sum_{i=0}^{n-1}y_i - \theta_1\frac{1}{n}\sum_{i=0}^{n-1} X_{i1}.
$$

     
    -

With the OLS expressions for the optimal parameters \( \boldsymbol{\hat{\beta}} \) we can evaluate the expectation value,

$$
\mathbb{E}(\boldsymbol{\hat{\beta}}) = \mathbb{E}[ (\mathbf{X}^{\top} \mathbf{X})^{-1}\mathbf{X}^{T} \mathbf{Y}]=(\mathbf{X}^{T} \mathbf{X})^{-1}\mathbf{X}^{T} \mathbb{E}[ \mathbf{Y}]=(\mathbf{X}^{T} \mathbf{X})^{-1} \mathbf{X}^{T}\mathbf{X}\boldsymbol{\beta}=\boldsymbol{\beta}.
$$

    +

    If we define

     
$$
\mu_{\boldsymbol{x}_1}=\frac{1}{n}\sum_{i=0}^{n-1} X_{i1},
$$

     
    -

    This means that the estimator of the regression parameters is unbiased.

    +

    and the mean value of the outputs as

    +

     
$$
\mu_y=\frac{1}{n}\sum_{i=0}^{n-1}y_i,
$$

     
    -

    We can also calculate the variance

    +

    we have

    +

     
$$
\theta_0 = \mu_y - \theta_1\mu_{\boldsymbol{x}_1}.
$$

     
    -

The variance of the optimal value \( \boldsymbol{\hat{\beta}} \) is

$$
\begin{eqnarray*}
\mbox{Var}(\boldsymbol{\hat{\beta}}) & = & \mathbb{E} \{ [\boldsymbol{\hat{\beta}} - \mathbb{E}(\boldsymbol{\hat{\beta}})] [\boldsymbol{\hat{\beta}} - \mathbb{E}(\boldsymbol{\hat{\beta}})]^{T} \} \\
& = & \mathbb{E} \{ [(\mathbf{X}^{T} \mathbf{X})^{-1} \, \mathbf{X}^{T} \mathbf{y} - \boldsymbol{\beta}] \, [(\mathbf{X}^{T} \mathbf{X})^{-1} \, \mathbf{X}^{T} \mathbf{y} - \boldsymbol{\beta}]^{T} \} \\
& = & (\mathbf{X}^{T} \mathbf{X})^{-1} \, \mathbf{X}^{T} \, \mathbb{E} \{ \mathbf{y} \, \mathbf{y}^{T} \} \, \mathbf{X} \, (\mathbf{X}^{T} \mathbf{X})^{-1} - \boldsymbol{\beta} \, \boldsymbol{\beta}^{T} \\
& = & (\mathbf{X}^{T} \mathbf{X})^{-1} \, \mathbf{X}^{T} \, \{ \mathbf{X} \, \boldsymbol{\beta} \, \boldsymbol{\beta}^{T} \, \mathbf{X}^{T} + \sigma^2 \mathbf{I}_{nn} \} \, \mathbf{X} \, (\mathbf{X}^{T} \mathbf{X})^{-1} - \boldsymbol{\beta} \, \boldsymbol{\beta}^{T} \\
& = & \boldsymbol{\beta} \, \boldsymbol{\beta}^{T} + \sigma^2 \, (\mathbf{X}^{T} \mathbf{X})^{-1} - \boldsymbol{\beta} \, \boldsymbol{\beta}^{T} = \sigma^2 \, (\mathbf{X}^{T} \mathbf{X})^{-1},
\end{eqnarray*}
$$

    +

    In the general case with more parameters than \( \theta_0 \) and \( \theta_1 \), we have

     
$$
\theta_0 = \frac{1}{n}\sum_{i=0}^{n-1}y_i - \frac{1}{n}\sum_{i=0}^{n-1}\sum_{j=1}^{p-1} X_{ij}\theta_j.
$$

     
    -

where we have used that \( \mathbb{E} (\mathbf{y} \mathbf{y}^{T}) = \mathbf{X} \, \boldsymbol{\beta} \, \boldsymbol{\beta}^{T} \, \mathbf{X}^{T} + \sigma^2 \, \mathbf{I}_{nn} \). From \( \mbox{Var}(\boldsymbol{\hat{\beta}}) = \sigma^2 \, (\mathbf{X}^{T} \mathbf{X})^{-1} \), one obtains an estimate of the variance of the estimate of the \( j \)-th regression coefficient: \( \sigma^2 (\hat{\beta}_j ) = \sigma^2 [(\mathbf{X}^{T} \mathbf{X})^{-1}]_{jj} \). This may be used to construct a confidence interval for the estimates.

    +

    We can rewrite the latter equation as

    +

     
$$
\theta_0 = \frac{1}{n}\sum_{i=0}^{n-1}y_i - \sum_{j=1}^{p-1} \mu_{\boldsymbol{x}_j}\theta_j,
$$

     
    + +

    where we have defined

    +

     
$$
\mu_{\boldsymbol{x}_j}=\frac{1}{n}\sum_{i=0}^{n-1} X_{ij},
$$

     
    + +

    the mean value for all elements of the column vector \( \boldsymbol{x}_j \).
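As a quick numerical sanity check of this relation (a sketch on a synthetic data set, not part of the original notes): fit OLS with an explicit intercept column and compare the fitted \( \theta_0 \) with \( \mu_y - \sum_{j} \mu_{\boldsymbol{x}_j}\theta_j \) computed from the remaining coefficients.

import numpy as np

rng = np.random.default_rng(42)
n, p = 100, 3
X = rng.standard_normal((n, p))                  # features, no intercept column
y = 1.5 + X @ np.array([2.0, -1.0, 0.5]) + 0.1*rng.standard_normal(n)

# OLS with an explicit intercept column
Xb = np.c_[np.ones(n), X]
theta = np.linalg.pinv(Xb.T @ Xb) @ Xb.T @ y
theta0, theta_rest = theta[0], theta[1:]

# The fitted intercept should match mu_y minus the weighted feature means
print(theta0)
print(np.mean(y) - np.mean(X, axis=0) @ theta_rest)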

    + +

Replacing \( y_i \) with \( y_i - \overline{\boldsymbol{y}} \) and centering also our design matrix results in a cost function (in vector-matrix disguise)

    +

     
$$
C(\boldsymbol{\theta}) = (\boldsymbol{\tilde{y}} - \tilde{X}\boldsymbol{\theta})^T(\boldsymbol{\tilde{y}} - \tilde{X}\boldsymbol{\theta}).
$$

     
    -

In a similar way, we can obtain analytical expressions for, say, the expectation values of the parameters \( \boldsymbol{\beta} \) and their variance when we employ Ridge regression, allowing us again to define a confidence interval.

    If we minimize with respect to \( \boldsymbol{\theta} \) we have then

    + +

     
$$
\hat{\boldsymbol{\theta}} = (\tilde{X}^T\tilde{X})^{-1}\tilde{X}^T\boldsymbol{\tilde{y}},
$$

     
    + +

where \( \boldsymbol{\tilde{y}} = \boldsymbol{y} - \overline{\boldsymbol{y}} \) and \( \tilde{X}_{ij} = X_{ij} - \frac{1}{n}\sum_{k=0}^{n-1}X_{kj} \).

    -

It is rather straightforward to show that

$$
\mathbb{E} \big[ \hat{\boldsymbol{\beta}}^{\mathrm{Ridge}} \big]=(\mathbf{X}^{T} \mathbf{X} + \lambda \mathbf{I}_{pp})^{-1} (\mathbf{X}^{\top} \mathbf{X})\boldsymbol{\beta}.
$$

    +

    For Ridge regression we need to add \( \lambda \boldsymbol{\theta}^T\boldsymbol{\theta} \) to the cost function and get then

     
$$
\hat{\boldsymbol{\theta}} = (\tilde{X}^T\tilde{X} + \lambda I)^{-1}\tilde{X}^T\boldsymbol{\tilde{y}}.
$$

     
    -

We see clearly that \( \mathbb{E} \big[ \hat{\boldsymbol{\beta}}^{\mathrm{Ridge}} \big] \not= \mathbb{E} \big[ \hat{\boldsymbol{\beta}}^{\mathrm{OLS}} \big] = \boldsymbol{\beta} \) for any \( \lambda > 0 \).

    What does this mean? And why do we insist on all this? Let us look at some examples.

    + +

This code shows a simple first-order fit to a data set using the above transformed data, where we consider the role of the intercept first, by either excluding it or including it (code example thanks to Øyvind Sigmundson Schøyen). Here our scaling of the data is done by subtracting the mean values only. Note also that we do not split the data into training and test.

    -

We can also compute the variance as

$$
\mbox{Var}[\hat{\boldsymbol{\beta}}^{\mathrm{Ridge}}]=\sigma^2[ \mathbf{X}^{T} \mathbf{X} + \lambda \mathbf{I} ]^{-1} \mathbf{X}^{T} \mathbf{X} \{ [ \mathbf{X}^{T} \mathbf{X} + \lambda \mathbf{I} ]^{-1}\}^{T},
$$

    + +
    +
    +
    +
    +
    +
    import numpy as np
    +import matplotlib.pyplot as plt
    +
    +from sklearn.linear_model import LinearRegression
    +
    +
    +np.random.seed(2021)
    +
    +def MSE(y_data,y_model):
    +    n = np.size(y_model)
    +    return np.sum((y_data-y_model)**2)/n
    +
    +
    +def fit_theta(X, y):
    +    return np.linalg.pinv(X.T @ X) @ X.T @ y
    +
    +
    +true_theta = [2, 0.5, 3.7]
    +
    +x = np.linspace(0, 1, 11)
    +y = np.sum(
    +    np.asarray([x ** p * b for p, b in enumerate(true_theta)]), axis=0
    +) + 0.1 * np.random.normal(size=len(x))
    +
    +degree = 3
    +X = np.zeros((len(x), degree))
    +
    +# Include the intercept in the design matrix
    +for p in range(degree):
    +    X[:, p] = x ** p
    +
    +theta = fit_theta(X, y)
    +
    +# Intercept is included in the design matrix
    +skl = LinearRegression(fit_intercept=False).fit(X, y)
    +
    +print(f"True theta: {true_theta}")
    +print(f"Fitted theta: {theta}")
    +print(f"Sklearn fitted theta: {skl.coef_}")
    +ypredictOwn = X @ theta
    +ypredictSKL = skl.predict(X)
    +print(f"MSE with intercept column")
    +print(MSE(y,ypredictOwn))
    +print(f"MSE with intercept column from SKL")
    +print(MSE(y,ypredictSKL))
    +
    +
    +plt.figure()
    +plt.scatter(x, y, label="Data")
    +plt.plot(x, X @ theta, label="Fit")
    +plt.plot(x, skl.predict(X), label="Sklearn (fit_intercept=False)")
    +
    +
    +# Do not include the intercept in the design matrix
    +X = np.zeros((len(x), degree - 1))
    +
    +for p in range(degree - 1):
    +    X[:, p] = x ** (p + 1)
    +
    +# Intercept is not included in the design matrix
    +skl = LinearRegression(fit_intercept=True).fit(X, y)
    +
    +# Use centered values for X and y when computing coefficients
    +y_offset = np.average(y, axis=0)
    +X_offset = np.average(X, axis=0)
    +
    +theta = fit_theta(X - X_offset, y - y_offset)
    +intercept = np.mean(y_offset - X_offset @ theta)
    +
    +print(f"Manual intercept: {intercept}")
    +print(f"Fitted theta (without intercept): {theta}")
    +print(f"Sklearn intercept: {skl.intercept_}")
    +print(f"Sklearn fitted theta (without intercept): {skl.coef_}")
    +ypredictOwn = X @ theta
    +ypredictSKL = skl.predict(X)
    +print(f"MSE with Manual intercept")
    +print(MSE(y,ypredictOwn+intercept))
    +print(f"MSE with Sklearn intercept")
    +print(MSE(y,ypredictSKL))
    +
    +plt.plot(x, X @ theta + intercept, "--", label="Fit (manual intercept)")
    +plt.plot(x, skl.predict(X), "--", label="Sklearn (fit_intercept=True)")
    +plt.grid()
    +plt.legend()
    +
    +plt.show()
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    + +

The intercept is the value of our output/target variable when all our features are zero and our function crosses the \( y \)-axis (for a one-dimensional case).

    + +

Printing the MSE, we see first that both methods give the same MSE, as they should. However, when we move to, for example, Ridge regression, the way we treat the intercept may give a larger or smaller MSE, meaning that the MSE can be penalized by the value of the intercept. Not including the intercept in the fit means that the regularization term does not include \( \theta_0 \). For different values of \( \lambda \), this may lead to different MSE values.

    + +

    To remind the reader, the regularization term, with the intercept in Ridge regression, is given by

     
$$
\lambda \vert\vert \boldsymbol{\theta} \vert\vert_2^2 = \lambda \sum_{j=0}^{p-1}\theta_j^2,
$$

     
    -

    and it is easy to see that if the parameter \( \lambda \) goes to infinity then the variance of Ridge parameters \( \boldsymbol{\beta} \) goes to zero.

    - -

With this, we can compute the difference

$$
\mbox{Var}[\hat{\boldsymbol{\beta}}^{\mathrm{OLS}}]-\mbox{Var}[\hat{\boldsymbol{\beta}}^{\mathrm{Ridge}}]=\sigma^2 [ \mathbf{X}^{T} \mathbf{X} + \lambda \mathbf{I} ]^{-1}[ 2\lambda\mathbf{I} + \lambda^2 (\mathbf{X}^{T} \mathbf{X})^{-1} ] \{ [ \mathbf{X}^{T} \mathbf{X} + \lambda \mathbf{I} ]^{-1}\}^{T}.
$$

    +

    but when we take out the intercept, this equation becomes

    +

     
$$
\lambda \vert\vert \boldsymbol{\theta} \vert\vert_2^2 = \lambda \sum_{j=1}^{p-1}\theta_j^2.
$$

     
    +

    For Lasso regression we have

     
$$
\lambda \vert\vert \boldsymbol{\theta} \vert\vert_1 = \lambda \sum_{j=1}^{p-1}\vert\theta_j\vert.
$$

     
    -

The difference is non-negative definite since each component of the matrix product is non-negative definite. This means that, for \( \lambda > 0 \), the variance we obtain with standard OLS will always be larger than the variance of \( \boldsymbol{\beta} \) obtained with the Ridge estimator. This has interesting consequences when we discuss the so-called bias-variance tradeoff below.

It means that, when scaling the design matrix and the outputs/targets by subtracting the mean values, we have an optimization problem which is not penalized by the intercept. The MSE value can then be smaller, since it focuses only on the remaining quantities. If we, however, bring back the intercept, we will get an MSE which then contains the intercept.

    -

    For more discussions of Ridge regression and calculation of averages, Wessel van Wieringen's article is highly recommended.

    +

    Armed with this wisdom, we attempt first to simply set the intercept equal to False in our implementation of Ridge regression for our well-known vanilla data set.

    + + + +
    +
    +
    +
    +
    +
    import numpy as np
    +import pandas as pd
    +import matplotlib.pyplot as plt
    +from sklearn.model_selection import train_test_split
    +from sklearn import linear_model
    +
    +def MSE(y_data,y_model):
    +    n = np.size(y_model)
    +    return np.sum((y_data-y_model)**2)/n
    +
    +
    +# A seed just to ensure that the random numbers are the same for every run.
    +# Useful for eventual debugging.
    +np.random.seed(3155)
    +
    +n = 100
    +x = np.random.rand(n)
    +y = np.exp(-x**2) + 1.5 * np.exp(-(x-2)**2)
    +
    +Maxpolydegree = 20
    +X = np.zeros((n,Maxpolydegree))
    +#We include explicitely the intercept column
    +for degree in range(Maxpolydegree):
    +    X[:,degree] = x**degree
    +# We split the data in test and training data
    +X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
    +
    +p = Maxpolydegree
    +I = np.eye(p,p)
    +# Decide which values of lambda to use
    +nlambdas = 6
    +MSEOwnRidgePredict = np.zeros(nlambdas)
    +MSERidgePredict = np.zeros(nlambdas)
    +lambdas = np.logspace(-4, 2, nlambdas)
    +for i in range(nlambdas):
    +    lmb = lambdas[i]
    +    OwnRidgeTheta = np.linalg.pinv(X_train.T @ X_train+lmb*I) @ X_train.T @ y_train
    +    # Note: we include the intercept column and no scaling
    +    RegRidge = linear_model.Ridge(lmb,fit_intercept=False)
    +    RegRidge.fit(X_train,y_train)
    +    # and then make the prediction
    +    ytildeOwnRidge = X_train @ OwnRidgeTheta
    +    ypredictOwnRidge = X_test @ OwnRidgeTheta
    +    ytildeRidge = RegRidge.predict(X_train)
    +    ypredictRidge = RegRidge.predict(X_test)
    +    MSEOwnRidgePredict[i] = MSE(y_test,ypredictOwnRidge)
    +    MSERidgePredict[i] = MSE(y_test,ypredictRidge)
    +    print("Theta values for own Ridge implementation")
    +    print(OwnRidgeTheta)
    +    print("Theta values for Scikit-Learn Ridge implementation")
    +    print(RegRidge.coef_)
    +    print("MSE values for own Ridge implementation")
    +    print(MSEOwnRidgePredict[i])
    +    print("MSE values for Scikit-Learn Ridge implementation")
    +    print(MSERidgePredict[i])
    +
    +# Now plot the results
    +plt.figure()
    +plt.plot(np.log10(lambdas), MSEOwnRidgePredict, 'r', label = 'MSE own Ridge Test')
    +plt.plot(np.log10(lambdas), MSERidgePredict, 'g', label = 'MSE Ridge Test')
    +
    +plt.xlabel('log10(lambda)')
    +plt.ylabel('MSE')
    +plt.legend()
    +plt.show()
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    + +

The results agree very well when we force Scikit-Learn's Ridge function to include the first column of our design matrix, that is when we explicitly include the intercept column and set fit_intercept=False. What happens if we do not include the intercept column in our fit? Let us see how we can change this code by zero centering the data.

    + + + +
    +
    +
    +
    +
    +
    import numpy as np
    +import pandas as pd
    +import matplotlib.pyplot as plt
    +from sklearn.model_selection import train_test_split
    +from sklearn import linear_model
    +from sklearn.preprocessing import StandardScaler
    +
    +def MSE(y_data,y_model):
    +    n = np.size(y_model)
    +    return np.sum((y_data-y_model)**2)/n
    +# A seed just to ensure that the random numbers are the same for every run.
    +# Useful for eventual debugging.
    +np.random.seed(315)
    +
    +n = 100
    +x = np.random.rand(n)
    +y = np.exp(-x**2) + 1.5 * np.exp(-(x-2)**2)
    +
    +Maxpolydegree = 20
    +X = np.zeros((n,Maxpolydegree-1))
    +
    +for degree in range(1,Maxpolydegree): #No intercept column
    +    X[:,degree-1] = x**(degree)
    +
    +# We split the data in test and training data
    +X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
    +
    +#For our own implementation, we will need to deal with the intercept by centering the design matrix and the target variable
    +X_train_mean = np.mean(X_train,axis=0)
    +#Center by removing mean from each feature
    +X_train_scaled = X_train - X_train_mean 
    +X_test_scaled = X_test - X_train_mean
    +#The model intercept (called y_scaler) is given by the mean of the target variable (IF X is centered)
    +#Remove the intercept from the training data.
    +y_scaler = np.mean(y_train)           
    +y_train_scaled = y_train - y_scaler   
    +
    +p = Maxpolydegree-1
    +I = np.eye(p,p)
    +# Decide which values of lambda to use
    +nlambdas = 6
    +MSEOwnRidgePredict = np.zeros(nlambdas)
    +MSERidgePredict = np.zeros(nlambdas)
    +
    +lambdas = np.logspace(-4, 2, nlambdas)
    +for i in range(nlambdas):
    +    lmb = lambdas[i]
    +    OwnRidgeTheta = np.linalg.pinv(X_train_scaled.T @ X_train_scaled+lmb*I) @ X_train_scaled.T @ (y_train_scaled)
    +    intercept_ = y_scaler - X_train_mean@OwnRidgeTheta #The intercept can be shifted so the model can predict on uncentered data
    +    #Add intercept to prediction
    +    ypredictOwnRidge = X_test_scaled @ OwnRidgeTheta + y_scaler 
    +    RegRidge = linear_model.Ridge(lmb)
    +    RegRidge.fit(X_train,y_train)
    +    ypredictRidge = RegRidge.predict(X_test)
    +    MSEOwnRidgePredict[i] = MSE(y_test,ypredictOwnRidge)
    +    MSERidgePredict[i] = MSE(y_test,ypredictRidge)
    +    print("Theta values for own Ridge implementation")
    +    print(OwnRidgeTheta) #Intercept is given by mean of target variable
    +    print("Theta values for Scikit-Learn Ridge implementation")
    +    print(RegRidge.coef_)
    +    print('Intercept from own implementation:')
    +    print(intercept_)
    +    print('Intercept from Scikit-Learn Ridge implementation')
    +    print(RegRidge.intercept_)
    +    print("MSE values for own Ridge implementation")
    +    print(MSEOwnRidgePredict[i])
    +    print("MSE values for Scikit-Learn Ridge implementation")
    +    print(MSERidgePredict[i])
    +
    +
    +# Now plot the results
    +plt.figure()
    +plt.plot(np.log10(lambdas), MSEOwnRidgePredict, 'b--', label = 'MSE own Ridge Test')
    +plt.plot(np.log10(lambdas), MSERidgePredict, 'g--', label = 'MSE SL Ridge Test')
    +plt.xlabel('log10(lambda)')
    +plt.ylabel('MSE')
    +plt.legend()
    +plt.show()
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    + +

We see here, when compared to the code which explicitly includes the intercept column, that our MSE value is actually smaller. This is because the regularization term does not include the intercept value \( \theta_0 \) in the fitting. The same applies to Lasso regularization. It means that our optimization is now done only with the centered matrix and/or vector that enter the fitting procedure.
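The same centering trick carries over to Lasso. Here is a minimal sketch (an added illustration, not part of the original notes; the polynomial degree and the penalty value are arbitrary choices) which fits Scikit-Learn's Lasso without an intercept on centered data, recovers the intercept from the means afterwards, and compares with the standard fit on uncentered data.

import numpy as np
from sklearn.linear_model import Lasso

np.random.seed(3155)
n = 100
x = np.random.rand(n)
y = np.exp(-x**2) + 1.5*np.exp(-(x-2)**2)
# Polynomial features without an intercept column (degrees 1 to 5)
X = np.column_stack([x**degree for degree in range(1, 6)])

# Center features and targets using their means
X_mean = X.mean(axis=0)
y_mean = y.mean()
X_centered = X - X_mean
y_centered = y - y_mean

lmb = 1e-3
# Lasso on centered data, no intercept in the penalized fit
lasso_centered = Lasso(alpha=lmb, fit_intercept=False, max_iter=100000).fit(X_centered, y_centered)
intercept_own = y_mean - X_mean @ lasso_centered.coef_

# Standard Lasso which handles the intercept internally
lasso_standard = Lasso(alpha=lmb, max_iter=100000).fit(X, y)

print("Own intercept:", intercept_own, " Scikit-Learn intercept:", lasso_standard.intercept_)
print("Coefficients (centered fit):", lasso_centered.coef_)
print("Coefficients (standard fit):", lasso_standard.coef_)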

    diff --git a/doc/pub/week37/html/week37-solarized.html b/doc/pub/week37/html/week37-solarized.html index 9f9f47f47..093e60695 100644 --- a/doc/pub/week37/html/week37-solarized.html +++ b/doc/pub/week37/html/week37-solarized.html @@ -8,8 +8,8 @@ - -Week 37: Statistical interpretations and Resampling Methods + +Week 37: Gradient descent methods @@ -67,159 +67,222 @@ 2, None, 'plans-for-week-37-lecture-monday'), - ('Plans for week 37, lab sessions', + ('Readings and Videos:', 2, None, 'readings-and-videos'), + ('Material for lecture Monday September 8', 2, None, - 'plans-for-week-37-lab-sessions'), - ('Material for lecture Monday September 9', + 'material-for-lecture-monday-september-8'), + ('Gradient descent and revisiting Ordinary Least Squares from ' + 'last week', 2, None, - 'material-for-lecture-monday-september-9'), - ('Deriving OLS from a probability distribution', + 'gradient-descent-and-revisiting-ordinary-least-squares-from-last-week'), + ('Gradient descent example', 2, None, 'gradient-descent-example'), + ('The derivative of the cost/loss function', 2, None, - 'deriving-ols-from-a-probability-distribution'), - ('Independent and Identically Distrubuted (iid)', + 'the-derivative-of-the-cost-loss-function'), + ('The Hessian matrix', 2, None, 'the-hessian-matrix'), + ('Simple program', 2, None, 'simple-program'), + ('Gradient Descent Example', 2, None, 'gradient-descent-example'), + ('Gradient descent and Ridge', 2, None, - 'independent-and-identically-distrubuted-iid'), - ('Maximum Likelihood Estimation (MLE)', + 'gradient-descent-and-ridge'), + ('The Hessian matrix for Ridge Regression', 2, None, - 'maximum-likelihood-estimation-mle'), - ('A new Cost Function', 2, None, 'a-new-cost-function'), - ("More basic Statistics and Bayes' theorem", + 'the-hessian-matrix-for-ridge-regression'), + ('Program example for gradient descent with Ridge Regression', 2, None, - 'more-basic-statistics-and-bayes-theorem'), - ('Marginal Probability', 2, None, 'marginal-probability'), - ('Conditional Probability', 2, None, 'conditional-probability'), - ("Bayes' Theorem", 2, None, 'bayes-theorem'), - ("Interpretations of Bayes' Theorem", + 'program-example-for-gradient-descent-with-ridge-regression'), + ('Using gradient descent methods, limitations', 2, None, - 'interpretations-of-bayes-theorem'), - ("Example of Usage of Bayes' theorem", + 'using-gradient-descent-methods-limitations'), + ('Momentum based GD', 2, None, 'momentum-based-gd'), + ('Improving gradient descent with momentum', 2, None, - 'example-of-usage-of-bayes-theorem'), - ('Doing it correctly', 2, None, 'doing-it-correctly'), - ("Bayes' Theorem and Ridge and Lasso Regression", + 'improving-gradient-descent-with-momentum'), + ('Same code but now with momentum gradient descent', 2, None, - 'bayes-theorem-and-ridge-and-lasso-regression'), - ('Ridge and Bayes', 2, None, 'ridge-and-bayes'), - ('Lasso and Bayes', 2, None, 'lasso-and-bayes'), - ('Why resampling methods', 2, None, 'why-resampling-methods'), - ('Resampling methods', 2, None, 'resampling-methods'), - ('Resampling approaches can be computationally expensive', + 'same-code-but-now-with-momentum-gradient-descent'), + ('Overview video on Stochastic Gradient Descent (SGD)', 2, None, - 'resampling-approaches-can-be-computationally-expensive'), - ('Why resampling methods ?', 2, None, 'why-resampling-methods'), - ('Statistical analysis', 2, None, 'statistical-analysis'), - ('Resampling methods', 2, None, 'resampling-methods'), - ('Resampling methods: Bootstrap', + 
'overview-video-on-stochastic-gradient-descent-sgd'), + ('Batches and mini-batches', 2, None, 'batches-and-mini-batches'), + ('Pros and cons', 2, None, 'pros-and-cons'), + ('Convergence rates', 2, None, 'convergence-rates'), + ('Accuracy', 2, None, 'accuracy'), + ('Stochastic Gradient Descent (SGD)', 2, None, - 'resampling-methods-bootstrap'), - ('The Central Limit Theorem', + 'stochastic-gradient-descent-sgd'), + ('Stochastic Gradient Descent', 2, None, - 'the-central-limit-theorem'), - ('Finding the Limit', 2, None, 'finding-the-limit'), - ('Rewriting the $\\delta$-function', + 'stochastic-gradient-descent'), + ('Computation of gradients', 2, None, 'computation-of-gradients'), + ('SGD example', 2, None, 'sgd-example'), + ('The gradient step', 2, None, 'the-gradient-step'), + ('Simple example code', 2, None, 'simple-example-code'), + ('When do we stop?', 2, None, 'when-do-we-stop'), + ('Slightly different approach', 2, None, - 'rewriting-the-delta-function'), - ('Identifying Terms', 2, None, 'identifying-terms'), - ('Wrapping it up', 2, None, 'wrapping-it-up'), - ('Confidence Intervals', 2, None, 'confidence-intervals'), - ('Standard Approach based on the Normal Distribution', + 'slightly-different-approach'), + ('Time decay rate', 2, None, 'time-decay-rate'), + ('Code with a Number of Minibatches which varies', 2, None, - 'standard-approach-based-on-the-normal-distribution'), - ('Resampling methods: Bootstrap background', + 'code-with-a-number-of-minibatches-which-varies'), + ('Replace or not', 2, None, 'replace-or-not'), + ('SGD vs Full-Batch GD: Convergence Speed and Memory Comparison', 2, None, - 'resampling-methods-bootstrap-background'), - ('Resampling methods: More Bootstrap background', + 'sgd-vs-full-batch-gd-convergence-speed-and-memory-comparison'), + ('Theoretical Convergence Speed and convex optimization', + 3, + None, + 'theoretical-convergence-speed-and-convex-optimization'), + ('Strongly Convex Case', 3, None, 'strongly-convex-case'), + ('Non-Convex Problems', 3, None, 'non-convex-problems'), + ('Memory Usage and Scalability', + 2, + None, + 'memory-usage-and-scalability'), + ('Empirical Evidence: Convergence Time and Memory in Practice', + 2, + None, + 'empirical-evidence-convergence-time-and-memory-in-practice'), + ('Deep Neural Networks', 3, None, 'deep-neural-networks'), + ('Memory constraints', 3, None, 'memory-constraints'), + ('Second moment of the gradient', + 2, + None, + 'second-moment-of-the-gradient'), + ('Challenge: Choosing a Fixed Learning Rate', + 2, + None, + 'challenge-choosing-a-fixed-learning-rate'), + ('Motivation for Adaptive Step Sizes', + 2, + None, + 'motivation-for-adaptive-step-sizes'), + ('AdaGrad algorithm, taken from "Goodfellow et ' + 'al":"https://www.deeplearningbook.org/contents/optimization.html"', + 2, + None, + 'adagrad-algorithm-taken-from-goodfellow-et-al-https-www-deeplearningbook-org-contents-optimization-html'), + ('Derivation of the AdaGrad Algorithm', + 2, + None, + 'derivation-of-the-adagrad-algorithm'), + ('AdaGrad Update Rule Derivation', + 2, + None, + 'adagrad-update-rule-derivation'), + ('AdaGrad Properties', 2, None, 'adagrad-properties'), + ('RMSProp: Adaptive Learning Rates', + 2, + None, + 'rmsprop-adaptive-learning-rates'), + ('RMSProp algorithm, taken from "Goodfellow et ' + 'al":"https://www.deeplearningbook.org/contents/optimization.html"', + 2, + None, + 'rmsprop-algorithm-taken-from-goodfellow-et-al-https-www-deeplearningbook-org-contents-optimization-html'), + ('Adam Optimizer', 2, None, 
'adam-optimizer'), + ('"ADAM optimizer":"https://arxiv.org/abs/1412.6980"', + 2, + None, + 'adam-optimizer-https-arxiv-org-abs-1412-6980'), + ('Why Combine Momentum and RMSProp?', + 2, + None, + 'why-combine-momentum-and-rmsprop'), + ('Adam: Exponential Moving Averages (Moments)', 2, None, - 'resampling-methods-more-bootstrap-background'), - ('Resampling methods: Bootstrap approach', + 'adam-exponential-moving-averages-moments'), + ('Adam: Bias Correction', 2, None, 'adam-bias-correction'), + ('Adam: Update Rule Derivation', 2, None, - 'resampling-methods-bootstrap-approach'), - ('Resampling methods: Bootstrap steps', + 'adam-update-rule-derivation'), + ('Adam vs. AdaGrad and RMSProp', 2, None, - 'resampling-methods-bootstrap-steps'), - ('Code example for the Bootstrap method', + 'adam-vs-adagrad-and-rmsprop'), + ('Adaptivity Across Dimensions', 2, None, - 'code-example-for-the-bootstrap-method'), - ('Plotting the Histogram', 2, None, 'plotting-the-histogram'), - ('The bias-variance tradeoff', + 'adaptivity-across-dimensions'), + ('ADAM algorithm, taken from "Goodfellow et ' + 'al":"https://www.deeplearningbook.org/contents/optimization.html"', 2, None, - 'the-bias-variance-tradeoff'), - ('A way to Read the Bias-Variance Tradeoff', + 'adam-algorithm-taken-from-goodfellow-et-al-https-www-deeplearningbook-org-contents-optimization-html'), + ('Algorithms and codes for Adagrad, RMSprop and Adam', 2, None, - 'a-way-to-read-the-bias-variance-tradeoff'), - ('Example code for Bias-Variance tradeoff', + 'algorithms-and-codes-for-adagrad-rmsprop-and-adam'), + ('Practical tips', 2, None, 'practical-tips'), + ('Sneaking in automatic differentiation using Autograd', 2, None, - 'example-code-for-bias-variance-tradeoff'), - ('Understanding what happens', + 'sneaking-in-automatic-differentiation-using-autograd'), + ('Same code but now with momentum gradient descent', 2, None, - 'understanding-what-happens'), - ('Summing up', 2, None, 'summing-up'), - ("Another Example from Scikit-Learn's Repository", + 'same-code-but-now-with-momentum-gradient-descent'), + ('Including Stochastic Gradient Descent with Autograd', 2, None, - 'another-example-from-scikit-learn-s-repository'), - ('Various steps in cross-validation', + 'including-stochastic-gradient-descent-with-autograd'), + ('Same code but now with momentum gradient descent', 2, None, - 'various-steps-in-cross-validation'), - ('Cross-validation in brief', + 'same-code-but-now-with-momentum-gradient-descent'), + ("But none of these can compete with Newton's method", 2, None, - 'cross-validation-in-brief'), - ('Code Example for Cross-validation and $k$-fold ' - 'Cross-validation', + 'but-none-of-these-can-compete-with-newton-s-method'), + ('Similar (second order function now) problem but now with ' + 'AdaGrad', 2, None, - 'code-example-for-cross-validation-and-k-fold-cross-validation'), - ('More examples on bootstrap and cross-validation and errors', + 'similar-second-order-function-now-problem-but-now-with-adagrad'), + ('RMSprop for adaptive learning rate with Stochastic Gradient ' + 'Descent', 2, None, - 'more-examples-on-bootstrap-and-cross-validation-and-errors'), - ('The same example but now with cross-validation', + 'rmsprop-for-adaptive-learning-rate-with-stochastic-gradient-descent'), + ('And finally "ADAM":"https://arxiv.org/pdf/1412.6980.pdf"', 2, None, - 'the-same-example-but-now-with-cross-validation'), + 'and-finally-adam-https-arxiv-org-pdf-1412-6980-pdf'), ('Material for the lab sessions', 2, None, 
'material-for-the-lab-sessions'), - ('Linking the regression analysis with a statistical ' - 'interpretation', + ('Reminder on different scaling methods', 2, None, - 'linking-the-regression-analysis-with-a-statistical-interpretation'), - ('Assumptions made', 2, None, 'assumptions-made'), - ('Expectation value and variance', + 'reminder-on-different-scaling-methods'), + ('Functionality in Scikit-Learn', 2, None, - 'expectation-value-and-variance'), - ('Expectation value and variance for $\\boldsymbol{\\beta}$', + 'functionality-in-scikit-learn'), + ('More preprocessing', 2, None, 'more-preprocessing'), + ('Frequently used scaling functions', 2, None, - 'expectation-value-and-variance-for-boldsymbol-beta')]} + 'frequently-used-scaling-functions')]} end of tocinfo --> @@ -241,7 +304,7 @@
    -

    Week 37: Statistical interpretations and Resampling Methods

    +

    Week 37: Gradient descent methods

    @@ -254,7 +317,7 @@

    Week 37: Statistical interpretations and Resampling Methods


    -

    September 9, 2024

    +

    September 8-12, 2025


    @@ -264,808 +327,1619 @@

    September 9, 2024

    Plans for week 37, lecture Monday

    -Material for the lecture on Monday September 9 +Plans and material for the lecture on Monday September 8

    -

    -
  • Statistical interpretation of Ridge and Lasso regression, see also slides from last week
  • -
  • Resampling techniques, Bootstrap and cross validation and bias-variance tradeoff (this may partly be discussed during the exercise sessions as well.
  • -
  • Readings and Videos:
  • - +

    The family of gradient descent methods

    +
      +
    1. Plain gradient descent (constant learning rate), reminder from last week with examples using OLS and Ridge
    2. +
    3. Improving gradient descent with momentum
    4. +
    5. Introducing stochastic gradient descent
    6. +
    7. More advanced updates of the learning rate: ADAgrad, RMSprop and ADAM
    8. +
    9. Video of Lecture
    10. +
    11. Whiteboard notes
    12. +










    -

    Plans for week 37, lab sessions

    - +

    Readings and Videos:

    -Material for the lab sessions on Tuesday and Wednesday +

    -

      -
    • Calculations of expectation values
    • -
    • Discussion of resampling techniques
    • -
    • Exercise set for week 37
    • -
    • Work on project 1
    • -
    • Video of exercise sessions week 37
    • -
    • For more discussions of Ridge regression and calculation of averages, Wessel van Wieringen's article is highly recommended.
    • -
    +
      +
    1. Recommended: Goodfellow et al, Deep Learning, introduction to gradient descent, see sections 4.3-4.5 at https://www.deeplearningbook.org/contents/numerical.html and chapter 8.3-8.5 at https://www.deeplearningbook.org/contents/optimization.html
    2. +
3. Raschka et al, pages 37-44 and pages 278-283, with focus on linear regression.
    4. +
    5. Video on gradient descent at https://www.youtube.com/watch?v=sDv4f4s2SB8
    6. +
    7. Video on Stochastic gradient descent at https://www.youtube.com/watch?v=vMh0zPT0tLI
    8. +
    - -









    -

    Material for lecture Monday September 9











    -

    Deriving OLS from a probability distribution

    +

    Material for lecture Monday September 8

    -

    Our basic assumption when we derived the OLS equations was to assume -that our output is determined by a given continuous function -\( f(\boldsymbol{x}) \) and a random noise \( \boldsymbol{\epsilon} \) given by the normal -distribution with zero mean value and an undetermined variance -\( \sigma^2 \). -

    + +

    Gradient descent and revisiting Ordinary Least Squares from last week

    -

    We found above that the outputs \( \boldsymbol{y} \) have a mean value given by -\( \boldsymbol{X}\hat{\boldsymbol{\beta}} \) and variance \( \sigma^2 \). Since the entries to -the design matrix are not stochastic variables, we can assume that the -probability distribution of our targets is also a normal distribution -but now with mean value \( \boldsymbol{X}\hat{\boldsymbol{\beta}} \). This means that a -single output \( y_i \) is given by the Gaussian distribution +

    Last week we started with linear regression as a case study for the gradient descent +methods. Linear regression is a great test case for the gradient +descent methods discussed in the lectures since it has several +desirable properties such as:

    -$$ -y_i\sim \mathcal{N}(\boldsymbol{X}_{i,*}\boldsymbol{\beta}, \sigma^2)=\frac{1}{\sqrt{2\pi\sigma^2}}\exp{\left[-\frac{(y_i-\boldsymbol{X}_{i,*}\boldsymbol{\beta})^2}{2\sigma^2}\right]}. -$$ +
      +
    1. An analytical solution (recall homework sets for week 35).
    2. +
    3. The gradient can be computed analytically.
    4. +
    5. The cost function is convex which guarantees that gradient descent converges for small enough learning rates
    6. +
    +

    We revisit an example similar to what we had in the first homework set. We have a function of the type

    -









    -

    Independent and Identically Distrubuted (iid)

    + +
    +
    +
    +
    +
    +
import numpy as np
+# the number of datapoints
+n = 100
+x = 2*np.random.rand(n,1)
+y = 4+3*x+np.random.randn(n,1)
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    -

    We assume now that the various \( y_i \) values are stochastically distributed according to the above Gaussian distribution. -We define this distribution as +

where \( x_i \in [0,2] \) is chosen randomly using a uniform distribution (note the factor of two in the code above). Additionally we have a stochastic noise chosen according to a normal distribution \( \cal {N}(0,1) \). The linear regression model is given by

-$$ p(y_i, \boldsymbol{X}\vert\boldsymbol{\beta})=\frac{1}{\sqrt{2\pi\sigma^2}}\exp{\left[-\frac{(y_i-\boldsymbol{X}_{i,*}\boldsymbol{\beta})^2}{2\sigma^2}\right]}, $$
+$$ h_\theta(x) = \boldsymbol{y} = \theta_0 + \theta_1 x, $$

    such that

+$$ \boldsymbol{y}_i = \theta_0 + \theta_1 x_i. $$

    which reads as finding the likelihood of an event \( y_i \) with the input variables \( \boldsymbol{X} \) given the parameters (to be determined) \( \boldsymbol{\beta} \).

    -

    Since these events are assumed to be independent and identicall distributed we can build the probability distribution function (PDF) for all possible event \( \boldsymbol{y} \) as the product of the single events, that is we have

    + +

    Gradient descent example

    + +

    Let \( \mathbf{y} = (y_1,\cdots,y_n)^T \), \( \mathbf{\boldsymbol{y}} = (\boldsymbol{y}_1,\cdots,\boldsymbol{y}_n)^T \) and \( \theta = (\theta_0, \theta_1)^T \)

    +

    It is convenient to write \( \mathbf{\boldsymbol{y}} = X\theta \) where \( X \in \mathbb{R}^{100 \times 2} \) is the design matrix given by (we keep the intercept here)

-$$ p(\boldsymbol{y},\boldsymbol{X}\vert\boldsymbol{\beta})=\prod_{i=0}^{n-1}\frac{1}{\sqrt{2\pi\sigma^2}}\exp{\left[-\frac{(y_i-\boldsymbol{X}_{i,*}\boldsymbol{\beta})^2}{2\sigma^2}\right]}=\prod_{i=0}^{n-1}p(y_i,\boldsymbol{X}\vert\boldsymbol{\beta}). $$
+$$ X \equiv \begin{bmatrix} 1 & x_1 \\ \vdots & \vdots \\ 1 & x_{100} \\ \end{bmatrix}. $$

    We will write this in a more compact form reserving \( \boldsymbol{D} \) for the domain of events, including the ouputs (targets) and the inputs. That is -in case we have a simple one-dimensional input and output case -

    +

    The cost/loss/risk function is given by

-$$ \boldsymbol{D}=[(x_0,y_0), (x_1,y_1),\dots, (x_{n-1},y_{n-1})]. $$
+$$ C(\theta) = \frac{1}{n}||X\theta-\mathbf{y}||_{2}^{2} = \frac{1}{n}\sum_{i=1}^{100}\left[ (\theta_0 + \theta_1 x_i)^2 - 2 y_i (\theta_0 + \theta_1 x_i) + y_i^2\right] $$

    In the more general case the various inputs should be replaced by the possible features represented by the input data set \( \boldsymbol{X} \). -We can now rewrite the above probability as -

    +

    and we want to find \( \theta \) such that \( C(\theta) \) is minimized.

    + +









    +

    The derivative of the cost/loss function

    + +

    Computing \( \partial C(\theta) / \partial \theta_0 \) and \( \partial C(\theta) / \partial \theta_1 \) we can show that the gradient can be written as

-$$ p(\boldsymbol{D}\vert\boldsymbol{\beta})=\prod_{i=0}^{n-1}\frac{1}{\sqrt{2\pi\sigma^2}}\exp{\left[-\frac{(y_i-\boldsymbol{X}_{i,*}\boldsymbol{\beta})^2}{2\sigma^2}\right]}. $$
+$$ \nabla_{\theta} C(\theta) = \frac{2}{n}\begin{bmatrix} \sum_{i=1}^{100} \left(\theta_0+\theta_1x_i-y_i\right) \\ \sum_{i=1}^{100}\left( x_i (\theta_0+\theta_1x_i)-y_ix_i\right) \\ \end{bmatrix} = \frac{2}{n}X^T(X\theta - \mathbf{y}), $$

    It is a conditional probability (see below) and reads as the likelihood of a domain of events \( \boldsymbol{D} \) given a set of parameters \( \boldsymbol{\beta} \).

    +

    where \( X \) is the design matrix defined above.
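As a quick sanity check (an added sketch, not part of the original notes), we can compare the analytical gradient \( \frac{2}{n}X^T(X\theta - \mathbf{y}) \) with a central finite-difference approximation of the partial derivatives of \( C(\theta) \).

import numpy as np

np.random.seed(3155)
n = 100
x = 2*np.random.rand(n,1)
y = 4 + 3*x + np.random.randn(n,1)
X = np.c_[np.ones((n,1)), x]

def cost(theta):
    return np.sum((X @ theta - y)**2)/n

def analytic_gradient(theta):
    return (2.0/n)*X.T @ (X @ theta - y)

theta = np.random.randn(2,1)
eps = 1e-6
numeric_gradient = np.zeros_like(theta)
for j in range(2):
    e = np.zeros_like(theta)
    e[j] = eps
    # central difference approximation of dC/dtheta_j
    numeric_gradient[j] = (cost(theta + e) - cost(theta - e))/(2*eps)

print(analytic_gradient(theta).ravel())
print(numeric_gradient.ravel())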











    -

    Maximum Likelihood Estimation (MLE)

    +

    The Hessian matrix

    +

    The Hessian matrix of \( C(\theta) \) is given by

+$$ \boldsymbol{H} \equiv \begin{bmatrix} \frac{\partial^2 C(\theta)}{\partial \theta_0^2} & \frac{\partial^2 C(\theta)}{\partial \theta_0 \partial \theta_1} \\ \frac{\partial^2 C(\theta)}{\partial \theta_0 \partial \theta_1} & \frac{\partial^2 C(\theta)}{\partial \theta_1^2} \\ \end{bmatrix} = \frac{2}{n}X^T X. $$

    In statistics, maximum likelihood estimation (MLE) is a method of -estimating the parameters of an assumed probability distribution, -given some observed data. This is achieved by maximizing a likelihood -function so that, under the assumed statistical model, the observed -data is the most probable. -

    +

    This result implies that \( C(\theta) \) is a convex function since the matrix \( X^T X \) always is positive semi-definite.
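A small numerical illustration of this (an added sketch; the bound \( \eta < 2/\lambda_{\mathrm{max}} \) for plain gradient descent on a quadratic cost is quoted as a standard result):

import numpy as np

np.random.seed(3155)
n = 100
x = 2*np.random.rand(n,1)
y = 4 + 3*x + np.random.randn(n,1)
X = np.c_[np.ones((n,1)), x]

H = (2.0/n)*X.T @ X
eigenvalues = np.linalg.eigvalsh(H)   # symmetric matrix, real eigenvalues
print("Eigenvalues of the Hessian:", eigenvalues)  # all non-negative for a positive semi-definite matrix
print("Plain gradient descent converges for learning rates eta <", 2.0/np.max(eigenvalues))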

    -

    We will assume here that our events are given by the above Gaussian -distribution and we will determine the optimal parameters \( \beta \) by -maximizing the above PDF. However, computing the derivatives of a -product function is cumbersome and can easily lead to overflow and/or -underflowproblems, with potentials for loss of numerical precision. -

    +









    +

    Simple program

    -

    In practice, it is more convenient to maximize the logarithm of the -PDF because it is a monotonically increasing function of the argument. -Alternatively, and this will be our option, we will minimize the -negative of the logarithm since this is a monotonically decreasing -function. +

    We can now write a program that minimizes \( C(\theta) \) using the gradient descent method with a constant learning rate \( \eta \) according to

+$$ \theta_{k+1} = \theta_k - \eta \nabla_\theta C(\theta_k), \quad k=0,1,\cdots $$

We can use the expression we computed for the gradient, let the initial guess \( \theta_0 \) be chosen randomly, and set the learning rate to \( \eta = 0.001 \). We stop iterating when \( ||\nabla_\theta C(\theta_k) || \leq \epsilon = 10^{-8} \). Note that the code below does not include the latter stop criterion.

    -

    Note also that maximization/minimization of the logarithm of the PDF -is equivalent to the maximization/minimization of the function itself. +

And finally we can compare our solution for \( \theta \) with the analytic result given by \( \theta= (X^TX)^{-1} X^T \mathbf{y} \).
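Since the code below omits the stopping criterion, here is a minimal sketch (an added illustration, not the course code) of the same iteration with the test \( ||\nabla_\theta C(\theta_k)|| \leq \epsilon \) included and with a comparison against the analytical solution.

import numpy as np

np.random.seed(3155)
n = 100
x = 2*np.random.rand(n,1)
y = 4 + 3*x + np.random.randn(n,1)
X = np.c_[np.ones((n,1)), x]

theta = np.random.randn(2,1)
eta = 0.001
eps = 1e-8
max_iterations = 1000000

for k in range(max_iterations):
    gradient = (2.0/n)*X.T @ (X @ theta - y)
    if np.linalg.norm(gradient) <= eps:
        print(f"Converged after {k} iterations")
        break
    theta -= eta*gradient

theta_analytic = np.linalg.pinv(X.T @ X) @ X.T @ y
print(theta.ravel())
print(theta_analytic.ravel())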











    -

    A new Cost Function

    +

    Gradient Descent Example

    -

    We could now define a new cost function to minimize, namely the negative logarithm of the above PDF

    +

Here is our simple example.

    + + +
    +
    +
    +
    +
    +
    # Importing various packages
    +from random import random, seed
    +import numpy as np
    +import matplotlib.pyplot as plt
    +from mpl_toolkits.mplot3d import Axes3D
    +from matplotlib import cm
    +from matplotlib.ticker import LinearLocator, FormatStrFormatter
    +import sys
    +
    +# the number of datapoints
    +n = 100
    +x = 2*np.random.rand(n,1)
    +y = 4+3*x+np.random.randn(n,1)
    +
    +X = np.c_[np.ones((n,1)), x]
    +# Hessian matrix
    +H = (2.0/n)* X.T @ X
    +# Get the eigenvalues
    +EigValues, EigVectors = np.linalg.eig(H)
    +print(f"Eigenvalues of Hessian Matrix:{EigValues}")
    +
    +theta_linreg = np.linalg.inv(X.T @ X) @ X.T @ y
    +print(theta_linreg)
    +theta = np.random.randn(2,1)
    +
    +eta = 1.0/np.max(EigValues)
    +Niterations = 1000
    +
    +for iter in range(Niterations):
    +    gradient = (2.0/n)*X.T @ (X @ theta-y)
    +    theta -= eta*gradient
    +
    +print(theta)
    +xnew = np.array([[0],[2]])
    +xbnew = np.c_[np.ones((2,1)), xnew]
    +ypredict = xbnew.dot(theta)
    +ypredict2 = xbnew.dot(theta_linreg)
    +plt.plot(xnew, ypredict, "r-")
    +plt.plot(xnew, ypredict2, "b-")
    +plt.plot(x, y ,'ro')
    +plt.axis([0,2.0,0, 15.0])
    +plt.xlabel(r'$x$')
    +plt.ylabel(r'$y$')
    +plt.title(r'Gradient descent example')
    +plt.show()
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    -$$ -C(\boldsymbol{\beta}=-\log{\prod_{i=0}^{n-1}p(y_i,\boldsymbol{X}\vert\boldsymbol{\beta})}=-\sum_{i=0}^{n-1}\log{p(y_i,\boldsymbol{X}\vert\boldsymbol{\beta})}, -$$ -

    which becomes

    + +

    Gradient descent and Ridge

    + +

    We have also discussed Ridge regression where the loss function contains a regularized term given by the \( L_2 \) norm of \( \theta \),

-$$ C(\boldsymbol{\beta})=\frac{n}{2}\log{2\pi\sigma^2}+\frac{\vert\vert (\boldsymbol{y}-\boldsymbol{X}\boldsymbol{\beta})\vert\vert_2^2}{2\sigma^2}. $$
+$$ C_{\text{ridge}}(\theta) = \frac{1}{n}||X\theta -\mathbf{y}||^2 + \lambda ||\theta||^2, \quad \lambda \geq 0. $$

    Taking the derivative of the new cost function with respect to the parameters \( \beta \) we recognize our familiar OLS equation, namely

    - +

    In order to minimize \( C_{\text{ridge}}(\theta) \) using GD we adjust the gradient as follows

-$$ \boldsymbol{X}^T\left(\boldsymbol{y}-\boldsymbol{X}\boldsymbol{\beta}\right) =0, $$
+$$ \nabla_\theta C_{\text{ridge}}(\theta) = \frac{2}{n}\begin{bmatrix} \sum_{i=1}^{100} \left(\theta_0+\theta_1x_i-y_i\right) \\ \sum_{i=1}^{100}\left( x_i (\theta_0+\theta_1x_i)-y_ix_i\right) \\ \end{bmatrix} + 2\lambda\begin{bmatrix} \theta_0 \\ \theta_1\end{bmatrix} = 2 \left(\frac{1}{n}X^T(X\theta - \mathbf{y})+\lambda \theta\right). $$

    which leads to the well-known OLS equation for the optimal paramters \( \beta \)

    +

    We can easily extend our program to minimize \( C_{\text{ridge}}(\theta) \) using gradient descent and compare with the analytical solution given by

-$$ \hat{\boldsymbol{\beta}}^{\mathrm{OLS}}=\left(\boldsymbol{X}^T\boldsymbol{X}\right)^{-1}\boldsymbol{X}^T\boldsymbol{y}. $$
+$$ \theta_{\text{ridge}} = \left(X^T X + n\lambda I_{2 \times 2} \right)^{-1} X^T \mathbf{y}. $$

    Before we make a similar analysis for Ridge and Lasso regression, we need a short reminder on statistics.











    -

    More basic Statistics and Bayes' theorem

    - -

    A central theorem in statistics is Bayes' theorem. This theorem plays a similar role as the good old Pythagoras' theorem in geometry. -Bayes' theorem is extremely simple to derive. But to do so we need some basic axioms from statistics. -

    - -

    Assume we have two domains of events \( X=[x_0,x_1,\dots,x_{n-1}] \) and \( Y=[y_0,y_1,\dots,y_{n-1}] \).

    - -

    We define also the likelihood for \( X \) and \( Y \) as \( p(X) \) and \( p(Y) \) respectively. -The likelihood of a specific event \( x_i \) (or \( y_i \)) is then written as \( p(X=x_i) \) or just \( p(x_i)=p_i \). -

    - -
    -Union of events is given by -

    +

    The Hessian matrix for Ridge Regression

    +

    The Hessian matrix of Ridge Regression for our simple example is given by

-$$ p(X \cup Y)= p(X)+p(Y)-p(X \cap Y). $$
+$$ \boldsymbol{H} \equiv \begin{bmatrix} \frac{\partial^2 C(\theta)}{\partial \theta_0^2} & \frac{\partial^2 C(\theta)}{\partial \theta_0 \partial \theta_1} \\ \frac{\partial^2 C(\theta)}{\partial \theta_0 \partial \theta_1} & \frac{\partial^2 C(\theta)}{\partial \theta_1^2} \\ \end{bmatrix} = \frac{2}{n}X^T X+2\lambda\boldsymbol{I}. $$
    +

This implies that the Hessian matrix is positive definite, hence the stationary point is a minimum. Note that the Ridge cost function is convex, being a sum of two convex functions. Therefore, the stationary point is a global minimum of this function.

    -
    -The product rule (aka joint probability) is given by -

    -$$ -p(X \cup Y)= p(X,Y)= p(X\vert Y)p(Y)=p(Y\vert X)p(X), -$$ +









    +

    Program example for gradient descent with Ridge Regression

    -

    where we read \( p(X\vert Y) \) as the likelihood of obtaining \( X \) given \( Y \).

    + +
    +
    +
    +
    +
    +
    from random import random, seed
    +import numpy as np
    +import matplotlib.pyplot as plt
    +from mpl_toolkits.mplot3d import Axes3D
    +from matplotlib import cm
    +from matplotlib.ticker import LinearLocator, FormatStrFormatter
    +import sys
    +
    +# the number of datapoints
    +n = 100
    +x = 2*np.random.rand(n,1)
    +y = 4+3*x+np.random.randn(n,1)
    +
    +X = np.c_[np.ones((n,1)), x]
    +XT_X = X.T @ X
    +
    +#Ridge parameter lambda
    +lmbda  = 0.001
    +Id = n*lmbda* np.eye(XT_X.shape[0])
    +
    +# Hessian matrix
    +H = (2.0/n)* XT_X+2*lmbda* np.eye(XT_X.shape[0])
    +# Get the eigenvalues
    +EigValues, EigVectors = np.linalg.eig(H)
    +print(f"Eigenvalues of Hessian Matrix:{EigValues}")
    +
    +
    +theta_linreg = np.linalg.inv(XT_X+Id) @ X.T @ y
    +print(theta_linreg)
    +# Start plain gradient descent
    +theta = np.random.randn(2,1)
    +
    +eta = 1.0/np.max(EigValues)
    +Niterations = 100
    +
    +for iter in range(Niterations):
    +    gradients = 2.0/n*X.T @ (X @ (theta)-y)+2*lmbda*theta
    +    theta -= eta*gradients
    +
    +print(theta)
    +ypredict = X @ theta
    +ypredict2 = X @ theta_linreg
    +plt.plot(x, ypredict, "r-")
    +plt.plot(x, ypredict2, "b-")
    +plt.plot(x, y ,'ro')
    +plt.axis([0,2.0,0, 15.0])
    +plt.xlabel(r'$x$')
    +plt.ylabel(r'$y$')
    +plt.title(r'Gradient descent example for Ridge')
    +plt.show()
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    -

    If we have independent events then \( p(X,Y)=p(X)p(Y) \).

    -









    -

    Marginal Probability

    +

    Using gradient descent methods, limitations

    -

    The marginal probability is defined in terms of only one of the set of variables \( X,Y \). For a discrete probability we have

    -
    - -

    -$$ -p(X)=\sum_{i=0}^{n-1}p(X,Y=y_i)=\sum_{i=0}^{n-1}p(X\vert Y=y_i)p(Y=y_i)=\sum_{i=0}^{n-1}p(X\vert y_i)p(y_i). -$$ -

    +
      +
    • Gradient descent (GD) finds local minima of our function. Since the GD algorithm is deterministic, if it converges, it will converge to a local minimum of our cost/loss/risk function. Because in ML we are often dealing with extremely rugged landscapes with many local minima, this can lead to poor performance.
    • +
• GD is sensitive to initial conditions. One consequence of the local nature of GD is that initial conditions matter. Depending on where one starts, one will end up at a different local minimum. Therefore, it is very important to think about how one initializes the training process. This is true for GD as well as for more complicated variants of GD.
    • +
    • Gradients are computationally expensive to calculate for large datasets. In many cases in statistics and ML, the cost/loss/risk function is a sum of terms, with one term for each data point. For example, in linear regression, \( E \propto \sum_{i=1}^n (y_i - \mathbf{w}^T\cdot\mathbf{x}_i)^2 \); for logistic regression, the square error is replaced by the cross entropy. To calculate the gradient we have to sum over all \( n \) data points. Doing this at every GD step becomes extremely computationally expensive. An ingenious solution to this, is to calculate the gradients using small subsets of the data called "mini batches". This has the added benefit of introducing stochasticity into our algorithm.
    • +
• GD is very sensitive to the choice of learning rate. If the learning rate is very small, the training process takes an extremely long time. For larger learning rates, GD can diverge and give poor results. Furthermore, depending on what the local landscape looks like, we have to modify the learning rates to ensure convergence. Ideally, we would adaptively choose the learning rates to match the landscape.
    • +
    • GD treats all directions in parameter space uniformly. Another major drawback of GD is that unlike Newton's method, the learning rate for GD is the same in all directions in parameter space. For this reason, the maximum learning rate is set by the behavior of the steepest direction and this can significantly slow down training. Ideally, we would like to take large steps in flat directions and small steps in steep directions. Since we are exploring rugged landscapes where curvatures change, this requires us to keep track of not only the gradient but second derivatives. The ideal scenario would be to calculate the Hessian but this proves to be too computationally expensive.
    • +
    • GD can take exponential time to escape saddle points, even with random initialization. As we mentioned, GD is extremely sensitive to initial condition since it determines the particular local minimum GD would eventually reach. However, even with a good initialization scheme, through the introduction of randomness, GD can still take exponential time to escape saddle points.
    • +
    +









    +

    Momentum based GD

    +

We discuss here some simple examples where we introduce what is called 'memory' about previous steps, or what is normally called momentum gradient descent. For the mathematical details, see the whiteboard notes from the lecture on September 8, 2025.
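A common way of writing the momentum update (one possible convention, added here for reference; the whiteboard notes may use slightly different symbols) is

$$ v_{t} = \gamma v_{t-1} + \eta \nabla_\theta C(\theta_{t}), \qquad \theta_{t+1} = \theta_{t} - v_{t}, $$

with momentum parameter \( 0 \leq \gamma < 1 \); setting \( \gamma = 0 \) recovers plain gradient descent. This corresponds directly to the update new_change = step_size*gradient + momentum*change used in the momentum code further below.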











    -

    Conditional Probability

    +

    Improving gradient descent with momentum

    -

    The conditional probability, if \( p(Y) > 0 \), is

    -
    - -

    -$$ -p(X\vert Y)= \frac{p(X,Y)}{p(Y)}=\frac{p(X,Y)}{\sum_{i=0}^{n-1}p(Y\vert X=x_i)p(x_i)}. -$$ + + +

    +
    +
    +
    +
    +
    from numpy import asarray
    +from numpy import arange
    +from numpy.random import rand
    +from numpy.random import seed
    +from matplotlib import pyplot
    + 
    +# objective function
    +def objective(x):
    +	return x**2.0
    + 
    +# derivative of objective function
    +def derivative(x):
    +	return x * 2.0
    + 
    +# gradient descent algorithm
    +def gradient_descent(objective, derivative, bounds, n_iter, step_size):
    +	# track all solutions
    +	solutions, scores = list(), list()
    +	# generate an initial point
    +	solution = bounds[:, 0] + rand(len(bounds)) * (bounds[:, 1] - bounds[:, 0])
    +	# run the gradient descent
    +	for i in range(n_iter):
    +		# calculate gradient
    +		gradient = derivative(solution)
    +		# take a step
    +		solution = solution - step_size * gradient
    +		# evaluate candidate point
    +		solution_eval = objective(solution)
    +		# store solution
    +		solutions.append(solution)
    +		scores.append(solution_eval)
    +		# report progress
    +		print('>%d f(%s) = %.5f' % (i, solution, solution_eval))
    +	return [solutions, scores]
    + 
    +# seed the pseudo random number generator
    +seed(4)
    +# define range for input
    +bounds = asarray([[-1.0, 1.0]])
    +# define the total iterations
    +n_iter = 30
    +# define the step size
    +step_size = 0.1
    +# perform the gradient descent search
    +solutions, scores = gradient_descent(objective, derivative, bounds, n_iter, step_size)
    +# sample input range uniformly at 0.1 increments
    +inputs = arange(bounds[0,0], bounds[0,1]+0.1, 0.1)
    +# compute targets
    +results = objective(inputs)
    +# create a line plot of input vs result
    +pyplot.plot(inputs, results)
    +# plot the solutions found
    +pyplot.plot(solutions, scores, '.-', color='red')
    +# show the plot
    +pyplot.show()
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +










    -

    Bayes' Theorem

    +

    Same code but now with momentum gradient descent

    -

    If we combine the conditional probability with the marginal probability and the standard product rule, we have

    -$$ -p(X\vert Y)= \frac{p(X,Y)}{p(Y)}, -$$ -

    which we can rewrite as

    + +
    +
    +
    +
    +
    +
    from numpy import asarray
    +from numpy import arange
    +from numpy.random import rand
    +from numpy.random import seed
    +from matplotlib import pyplot
    + 
    +# objective function
    +def objective(x):
    +	return x**2.0
    + 
    +# derivative of objective function
    +def derivative(x):
    +	return x * 2.0
    + 
    +# gradient descent algorithm
    +def gradient_descent(objective, derivative, bounds, n_iter, step_size, momentum):
    +	# track all solutions
    +	solutions, scores = list(), list()
    +	# generate an initial point
    +	solution = bounds[:, 0] + rand(len(bounds)) * (bounds[:, 1] - bounds[:, 0])
    +	# keep track of the change
    +	change = 0.0
    +	# run the gradient descent
    +	for i in range(n_iter):
    +		# calculate gradient
    +		gradient = derivative(solution)
    +		# calculate update
    +		new_change = step_size * gradient + momentum * change
    +		# take a step
    +		solution = solution - new_change
    +		# save the change
    +		change = new_change
    +		# evaluate candidate point
    +		solution_eval = objective(solution)
    +		# store solution
    +		solutions.append(solution)
    +		scores.append(solution_eval)
    +		# report progress
    +		print('>%d f(%s) = %.5f' % (i, solution, solution_eval))
    +	return [solutions, scores]
    + 
    +# seed the pseudo random number generator
    +seed(4)
    +# define range for input
    +bounds = asarray([[-1.0, 1.0]])
    +# define the total iterations
    +n_iter = 30
    +# define the step size
    +step_size = 0.1
    +# define momentum
    +momentum = 0.3
    +# perform the gradient descent search with momentum
    +solutions, scores = gradient_descent(objective, derivative, bounds, n_iter, step_size, momentum)
    +# sample input range uniformly at 0.1 increments
    +inputs = arange(bounds[0,0], bounds[0,1]+0.1, 0.1)
    +# compute targets
    +results = objective(inputs)
    +# create a line plot of input vs result
    +pyplot.plot(inputs, results)
    +# plot the solutions found
    +pyplot.plot(solutions, scores, '.-', color='red')
    +# show the plot
    +pyplot.show()
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    + -$$ -p(X\vert Y)= \frac{p(X,Y)}{\sum_{i=0}^{n-1}p(Y\vert X=x_i)p(x_i)}=\frac{p(Y\vert X)p(X)}{\sum_{i=0}^{n-1}p(Y\vert X=x_i)p(x_i)}, -$$ +









    +

    Overview video on Stochastic Gradient Descent (SGD)

    -

    which is Bayes' theorem. It allows us to evaluate the uncertainty in in \( X \) after we have observed \( Y \). We can easily interchange \( X \) with \( Y \).

    +What is Stochastic Gradient Descent +

    There are several reasons for using stochastic gradient descent. Some of these are:

    +
      +
    1. Efficiency: Updates weights more frequently using a single or a small batch of samples, which speeds up convergence.
    2. +
3. A better chance of escaping local minima, due to the noise introduced by the random sampling of the updates.
    4. +
    5. Memory Usage: Requires less memory compared to computing gradients for the entire dataset.
    6. +










    -

    Interpretations of Bayes' Theorem

    +

    Batches and mini-batches

    -

    The quantity \( p(Y\vert X) \) on the right-hand side of the theorem is -evaluated for the observed data \( Y \) and can be viewed as a function of -the parameter space represented by \( X \). This function is not -necesseraly normalized and is normally called the likelihood function. +

    In gradient descent we compute the cost function and its gradient for all data points we have.

    + +

In large-scale applications such as the ILSVRC challenge, the training data can be on the order of millions of examples. Hence, it seems wasteful to compute the full cost function over the entire training set in order to perform only a single parameter update. A very common approach to addressing this challenge is to compute the gradient over batches of the training data. For example, a typical batch could contain some thousand examples from an entire training set of several millions. This batch is then used to perform a parameter update.

    -

    The function \( p(X) \) on the right hand side is called the prior while the function on the left hand side is the called the posterior probability. The denominator on the right hand side serves as a normalization factor for the posterior distribution.

    +









    +

    Pros and cons

    -

    Let us try to illustrate Bayes' theorem through an example.

    +
      +
    1. Speed: SGD is faster than gradient descent because it uses only one training example per iteration, whereas gradient descent requires the entire dataset. This speed advantage becomes more significant as the size of the dataset increases.
    2. +
    3. Convergence: Gradient descent has a more predictable convergence behaviour because it uses the average gradient of the entire dataset. In contrast, SGD’s convergence behaviour can be more erratic due to its random sampling of individual training examples.
    4. +
    5. Memory: Gradient descent requires more memory than SGD because it must store the entire dataset for each iteration. SGD only needs to store the current training example, making it more memory-efficient.
    6. +
    +









    +

    Convergence rates

    +
      +
    1. Stochastic Gradient Descent has a faster convergence rate due to the use of single training examples in each iteration.
    2. +
3. Gradient Descent has a slower convergence rate, as it uses the entire dataset for each iteration.
    4. +










    -

    Example of Usage of Bayes' theorem

    +

    Accuracy

    -

    Let us suppose that you are undergoing a series of mammography scans in -order to rule out possible breast cancer cases. We define the -sensitivity for a positive event by the variable \( X \). It takes binary -values with \( X=1 \) representing a positive event and \( X=0 \) being a -negative event. We reserve \( Y \) as a classification parameter for -either a negative or a positive breast cancer confirmation. (Short note on wordings: positive here means having breast cancer, although none of us would consider this being a positive thing). +

In general, Stochastic Gradient Descent is less accurate than gradient descent, as it calculates the gradient on single examples, which may not accurately represent the overall dataset. Gradient Descent is more accurate because it uses the average gradient calculated over the entire dataset.

    -

    We let \( Y=1 \) represent the the case of having breast cancer and \( Y=0 \) as not.

    - -

    Let us assume that if you have breast cancer, the test will be positive with a probability of \( 0.8 \), that is we have

    - -$$ -p(X=1\vert Y=1) =0.8. -$$ +

    There are other disadvantages to using SGD. The main drawback is that +its convergence behaviour can be more erratic due to the random +sampling of individual training examples. This can lead to less +accurate results, as the algorithm may not converge to the true +minimum of the cost function. Additionally, the learning rate, which +determines the step size of each update to the model’s parameters, +must be carefully chosen to ensure convergence. +

    -

    This obviously sounds scary since many would conclude that if the test is positive, there is a likelihood of \( 80\% \) for having cancer. -It is however not correct, as the following Bayesian analysis shows. +

It is however the method of choice in deep learning algorithms, where SGD is often used in combination with other optimization techniques, such as momentum or adaptive learning rates.











    -

    Doing it correctly

    +

    Stochastic Gradient Descent (SGD)

    -

    If we look at various national surveys on breast cancer, the general likelihood of developing breast cancer is a very small number. -Let us assume that the prior probability in the population as a whole is +

In stochastic gradient descent, the extreme case is when each mini-batch contains only a single data point.

    -$$ -p(Y=1) =0.004. -$$ - -

    We need also to account for the fact that the test may produce a false positive result (false alarm). Let us here assume that we have

    -$$ -p(X=1\vert Y=0) =0.1. -$$ - -

    Using Bayes' theorem we can then find the posterior probability that the person has breast cancer in case of a positive test, that is we can compute

    - -$$ -p(Y=1\vert X=1)=\frac{p(X=1\vert Y=1)p(Y=1)}{p(X=1\vert Y=1)p(Y=1)+p(X=1\vert Y=0)p(Y=0)}=\frac{0.8\times 0.004}{0.8\times 0.004+0.1\times 0.996}=0.031. -$$ +

    This process is called Stochastic Gradient +Descent (SGD) (or also sometimes on-line gradient descent). This is +relatively less common to see because in practice due to vectorized +code optimizations it can be computationally much more efficient to +evaluate the gradient for 100 examples, than the gradient for one +example 100 times. Even though SGD technically refers to using a +single example at a time to evaluate the gradient, you will hear +people use the term SGD even when referring to mini-batch gradient +descent (i.e. mentions of MGD for “Minibatch Gradient Descent”, or BGD +for “Batch gradient descent” are rare to see), where it is usually +assumed that mini-batches are used. The size of the mini-batch is a +hyperparameter but it is not very common to cross-validate or bootstrap it. It is +usually based on memory constraints (if any), or set to some value, +e.g. 32, 64 or 128. We use powers of 2 in practice because many +vectorized operation implementations work faster when their inputs are +sized in powers of 2. +

    -

    That is, in case of a positive test, there is only a \( 3\% \) chance of having breast cancer!

    +

    In our notes with SGD we mean stochastic gradient descent with mini-batches.











    -

    Bayes' Theorem and Ridge and Lasso Regression

    +

    Stochastic Gradient Descent

    -

    Using Bayes' theorem we can gain a better intuition about Ridge and Lasso regression.

    - -

    For ordinary least squares we postulated that the maximum likelihood for the doamin of events \( \boldsymbol{D} \) (one-dimensional case)

    -$$ -\boldsymbol{D}=[(x_0,y_0), (x_1,y_1),\dots, (x_{n-1},y_{n-1})], -$$ +

    Stochastic gradient descent (SGD) and variants thereof address some of +the shortcomings of the Gradient descent method discussed above. +

    -

    is given by

    +

    The underlying idea of SGD comes from the observation that the cost +function, which we want to minimize, can almost always be written as a +sum over \( n \) data points \( \{\mathbf{x}_i\}_{i=1}^n \), +

-$$ p(\boldsymbol{D}\vert\boldsymbol{\beta})=\prod_{i=0}^{n-1}\frac{1}{\sqrt{2\pi\sigma^2}}\exp{\left[-\frac{(y_i-\boldsymbol{X}_{i,*}\boldsymbol{\beta})^2}{2\sigma^2}\right]}. $$
+$$ C(\mathbf{\theta}) = \sum_{i=1}^n c_i(\mathbf{x}_i, \mathbf{\theta}). $$

    In Bayes' theorem this function plays the role of the so-called likelihood. We could now ask the question what is the posterior probability of a parameter set \( \boldsymbol{\beta} \) given a domain of events \( \boldsymbol{D} \)? That is, how can we define the posterior probability

    -$$ -p(\boldsymbol{\beta}\vert\boldsymbol{D}). -$$ +









    +

    Computation of gradients

    -

    Bayes' theorem comes to our rescue here since (omitting the normalization constant)

    +

    This in turn means that the gradient can be +computed as a sum over \( i \)-gradients +

-$$ p(\boldsymbol{\beta}\vert\boldsymbol{D})\propto p(\boldsymbol{D}\vert\boldsymbol{\beta})p(\boldsymbol{\beta}). $$
+$$ \nabla_\theta C(\mathbf{\theta}) = \sum_{i=1}^n \nabla_\theta c_i(\mathbf{x}_i, \mathbf{\theta}). $$

    We have a model for \( p(\boldsymbol{D}\vert\boldsymbol{\beta}) \) but need one for the prior \( p(\boldsymbol{\beta}) \)!

    +

    Stochasticity/randomness is introduced by only taking the +gradient on a subset of the data called minibatches. If there are \( n \) +data points and the size of each minibatch is \( M \), there will be \( n/M \) +minibatches. We denote these minibatches by \( B_k \) where +\( k=1,\cdots,n/M \). +
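A minimal sketch (added here, using numpy only) of one way to construct such minibatches \( B_k \), by shuffling the indices of the data points and splitting them into \( n/M \) groups:

import numpy as np

n = 10   # number of data points
M = 2    # size of each minibatch
indices = np.random.permutation(n)          # shuffle the data point indices
minibatches = np.split(indices, n//M)       # n/M minibatches of size M
for k, batch in enumerate(minibatches, start=1):
    print(f"B_{k}: data points with indices {batch}")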











    -

    Ridge and Bayes

    - -

    With the posterior probability defined by a likelihood which we have -already modeled and an unknown prior, we are now ready to make -additional models for the prior. +

    SGD example

    +

As an example, suppose we have \( 10 \) data points \( (\mathbf{x}_1,\cdots, \mathbf{x}_{10}) \) and we choose a minibatch size of \( M=2 \). We then have \( n/M=5 \) minibatches, each containing two data points. In particular we have \( B_1 = (\mathbf{x}_1,\mathbf{x}_2), \cdots, B_5 = (\mathbf{x}_9,\mathbf{x}_{10}) \). Note that if you choose \( M=n \) you have only a single batch with all data points, and on the other extreme, you may choose \( M=1 \), resulting in a minibatch for each datapoint, i.e. \( B_k = \mathbf{x}_k \).

    -

    We can, based on our discussions of the variance of \( \boldsymbol{\beta} \) and the mean value, assume that the prior for the values \( \boldsymbol{\beta} \) is given by a Gaussian with mean value zero and variance \( \tau^2 \), that is

    - +

The idea is now to approximate the gradient by replacing the sum over all data points with a sum over the data points in one of the minibatches, picked at random in each gradient descent step:

-$$ p(\boldsymbol{\beta})=\prod_{j=0}^{p-1}\exp{\left(-\frac{\beta_j^2}{2\tau^2}\right)}. $$
+$$ \nabla_{\theta} C(\mathbf{\theta}) = \sum_{i=1}^n \nabla_\theta c_i(\mathbf{x}_i, \mathbf{\theta}) \rightarrow \sum_{i \in B_k} \nabla_\theta c_i(\mathbf{x}_i, \mathbf{\theta}). $$

    Our posterior probability becomes then (omitting the normalization factor which is just a constant)

    -$$ -p(\boldsymbol{\beta\vert\boldsymbol{D})}=\prod_{i=0}^{n-1}\frac{1}{\sqrt{2\pi\sigma^2}}\exp{\left[-\frac{(y_i-\boldsymbol{X}_{i,*}\boldsymbol{\beta})^2}{2\sigma^2}\right]}\prod_{j=0}^{p-1}\exp{\left(-\frac{\beta_j^2}{2\tau^2}\right)}. -$$ -

    We can now optimize this quantity with respect to \( \boldsymbol{\beta} \). As we -did for OLS, this is most conveniently done by taking the negative -logarithm of the posterior probability. Doing so and leaving out the -constants terms that do not depend on \( \beta \), we have -

    +









    +

    The gradient step

    +

    Thus a gradient descent step now looks like

-$$ C(\boldsymbol{\beta})=\frac{\vert\vert (\boldsymbol{y}-\boldsymbol{X}\boldsymbol{\beta})\vert\vert_2^2}{2\sigma^2}+\frac{1}{2\tau^2}\vert\vert\boldsymbol{\beta}\vert\vert_2^2, $$
+$$ \theta_{j+1} = \theta_j - \eta_j \sum_{i \in B_k} \nabla_\theta c_i(\mathbf{x}_i, \mathbf{\theta}), $$

    and replacing \( 1/2\tau^2 \) with \( \lambda \) we have

    - -$$ -C(\boldsymbol{\beta})=\frac{\vert\vert (\boldsymbol{y}-\boldsymbol{X}\boldsymbol{\beta})\vert\vert_2^2}{2\sigma^2}+\lambda\vert\vert\boldsymbol{\beta}\vert\vert_2^2, -$$ - -

    which is our Ridge cost function! Nice, isn't it?

    +

where \( k \) is picked at random with equal probability from \( [1,n/M] \). An iteration over the number of minibatches (\( n/M \)) is commonly referred to as an epoch. Thus it is typical to choose a number of epochs and for each epoch iterate over the number of minibatches, as exemplified in the code below.











    -

    Lasso and Bayes

    - -

    To derive the Lasso cost function, we simply replace the Gaussian prior with an exponential distribution (Laplace in this case) with zero mean value, that is

    +

    Simple example code

    -$$ -p(\boldsymbol{\beta})=\prod_{j=0}^{p-1}\exp{\left(-\frac{\vert\beta_j\vert}{\tau}\right)}. -$$ -

    Our posterior probability becomes then (omitting the normalization factor which is just a constant)

    -$$ -p(\boldsymbol{\beta}\vert\boldsymbol{D})=\prod_{i=0}^{n-1}\frac{1}{\sqrt{2\pi\sigma^2}}\exp{\left[-\frac{(y_i-\boldsymbol{X}_{i,*}\boldsymbol{\beta})^2}{2\sigma^2}\right]}\prod_{j=0}^{p-1}\exp{\left(-\frac{\vert\beta_j\vert}{\tau}\right)}. -$$ + +
    +
    +
    +
    +
    +
    import numpy as np 
    +
    +n = 100 #100 datapoints 
    +M = 5   #size of each minibatch
    +m = int(n/M) #number of minibatches
    +n_epochs = 10 #number of epochs
    +
    +j = 0
    +for epoch in range(1,n_epochs+1):
    +    for i in range(m):
    +        k = np.random.randint(m) #Pick the k-th minibatch at random
    +        #Compute the gradient using the data in minibatch Bk
+        #Compute a new suggestion for theta
    +        j += 1
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
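To make the skeleton above concrete, here is a minimal self-contained sketch (an added illustration with assumed values for the learning rate and the number of epochs, not the course solution) which fills in the gradient computation for the linear regression data used earlier.

import numpy as np

np.random.seed(3155)
n = 100
x = 2*np.random.rand(n,1)
y = 4 + 3*x + np.random.randn(n,1)
X = np.c_[np.ones((n,1)), x]

M = 5            # size of each minibatch
m = int(n/M)     # number of minibatches
n_epochs = 100
eta = 0.05       # fixed learning rate (an arbitrary but safe choice here)

theta = np.random.randn(2,1)
for epoch in range(n_epochs):
    for i in range(m):
        k = M*np.random.randint(m)          # pick the k-th minibatch at random
        Xk = X[k:k+M]
        yk = y[k:k+M]
        gradient = (2.0/M)*Xk.T @ (Xk @ theta - yk)
        theta -= eta*gradient

print("SGD estimate:       ", theta.ravel())
print("Analytical solution:", (np.linalg.pinv(X.T @ X) @ X.T @ y).ravel())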
    -

Taking the gradient only on a subset of the data has two important benefits. First, it introduces randomness, which decreases the chance that our optimization scheme gets stuck in a local minimum. Second, if the size of the minibatches is small relative to the number of datapoints (\( M \ll n \)), the computation of the gradient is much cheaper, since we sum over the datapoints in the \( k \)-th minibatch only and not over all \( n \) datapoints.










    +

    When do we stop?

    -


    +

A natural question is when do we stop the search for a new minimum? One possibility is to compute the full gradient after a given number of epochs and stop if its norm is smaller than some threshold. However, a vanishing gradient is also consistent with a local minimum, so this only tells us that we are close to a local/global minimum. Alternatively, we can evaluate the cost function at this point, store the result and continue the search; if the test kicks in at a later stage, we compare the stored values of the cost function and keep the \( \theta \) that gave the lowest value.
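To make this stopping strategy concrete, here is a minimal sketch. The helper functions full_gradient, cost and sgd_epoch are hypothetical placeholders (not defined in the course material) for the full-data gradient, the cost function and one epoch of minibatch updates.

import numpy as np

def sgd_with_stopping_check(theta, sgd_epoch, full_gradient, cost,
                            n_epochs=100, check_every=10, tol=1e-6):
    """Run SGD epochs; every check_every epochs, test the full gradient norm
    and remember the parameters with the lowest cost seen so far."""
    best_theta, best_cost = theta.copy(), cost(theta)
    for epoch in range(1, n_epochs + 1):
        theta = sgd_epoch(theta)                      # one epoch of minibatch updates
        if epoch % check_every == 0:
            if cost(theta) < best_cost:               # store the best candidate so far
                best_theta, best_cost = theta.copy(), cost(theta)
            if np.linalg.norm(full_gradient(theta)) < tol:
                break                                 # close to a stationary point
    return best_theta, best_cost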











    -


    +

    Slightly different approach

    -

Another approach is to let the step length \( \eta_j \) depend on the number of epochs, in such a way that it becomes very small after a reasonable time, so that the updates eventually become negligible. Such approaches are often called learning rate schedules (or scaling). There are many ways to schedule the learning rate; see for example https://towardsdatascience.com/learning-rate-schedules-and-adaptive-learning-rate-methods-for-deep-learning-2c8f433990d1 for a discussion of different scaling functions for the learning rate.

    -
      -
    1. look at statistical properties, including a discussion of mean values, variance and the so-called bias-variance tradeoff
    2. -
    3. introduce resampling techniques like cross-validation, bootstrapping and jackknife and more
    4. -
    -

    and discuss how to select a given model (one of the difficult parts in machine learning).











    -

    Resampling methods

    -
    - -

    -

    Resampling methods are an indispensable tool in modern -statistics. They involve repeatedly drawing samples from a training -set and refitting a model of interest on each sample in order to -obtain additional information about the fitted model. For example, in -order to estimate the variability of a linear regression fit, we can -repeatedly draw different samples from the training data, fit a linear -regression to each new sample, and then examine the extent to which -the resulting fits differ. Such an approach may allow us to obtain -information that would not be available from fitting the model only -once using the original training sample. -

    - -

    Two resampling methods are often used in Machine Learning analyses,

    -
      -
    1. The bootstrap method
    2. -
    3. and Cross-Validation
    4. -
    -

    In addition there are several other methods such as the Jackknife and the Blocking methods. We will discuss in particular -cross-validation and the bootstrap method. +

    Time decay rate

    + +

    As an example, let \( e = 0,1,2,3,\cdots \) denote the current epoch and let \( t_0, t_1 > 0 \) be two fixed numbers. Furthermore, let \( t = e \cdot m + i \) where \( m \) is the number of minibatches and \( i=0,\cdots,m-1 \). Then the function $$\eta_j(t; t_0, t_1) = \frac{t_0}{t+t_1} $$ goes to zero as the number of epochs gets large. I.e. we start with a step length \( \eta_j (0; t_0, t_1) = t_0/t_1 \) which decays in time \( t \).

    + +

    In this way we can fix the number of epochs, compute \( \theta \) and +evaluate the cost function at the end. Repeating the computation will +give a different result since the scheme is random by design. Then we +pick the final \( \theta \) that gives the lowest value of the cost +function.

    -
    -









    -

    Resampling approaches can be computationally expensive

    -
    - -

    + +

    +
    +
    +
    +
    +
    import numpy as np 
    +
    +def step_length(t,t0,t1):
    +    return t0/(t+t1)
    +
    +n = 100 #100 datapoints 
    +M = 5   #size of each minibatch
    +m = int(n/M) #number of minibatches
    +n_epochs = 500 #number of epochs
    +t0 = 1.0
    +t1 = 10
    +
    +eta_j = t0/t1
    +j = 0
    +for epoch in range(1,n_epochs+1):
    +    for i in range(m):
    +        k = np.random.randint(m) #Pick the k-th minibatch at random
    +        #Compute the gradient using the data in minibatch Bk
    +        #Compute new suggestion for theta
    +        t = epoch*m+i
    +        eta_j = step_length(t,t0,t1)
    +        j += 1
     
    -

    Resampling approaches can be computationally expensive, because they -involve fitting the same statistical method multiple times using -different subsets of the training data. However, due to recent -advances in computing power, the computational requirements of -resampling methods generally are not prohibitive. In this chapter, we -discuss two of the most commonly used resampling methods, -cross-validation and the bootstrap. Both methods are important tools -in the practical application of many statistical learning -procedures. For example, cross-validation can be used to estimate the -test error associated with a given statistical learning method in -order to evaluate its performance, or to select the appropriate level -of flexibility. The process of evaluating a model’s performance is -known as model assessment, whereas the process of selecting the proper -level of flexibility for a model is known as model selection. The -bootstrap is widely used. -

    +print("eta_j after %d epochs: %g" % (n_epochs,eta_j)) +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +










    -

    Why resampling methods ?

    -
    -Statistical analysis -

    +

    Code with a Number of Minibatches which varies

    -
      -
    • Our simulations can be treated as computer experiments. This is particularly the case for Monte Carlo methods which are widely used in statistical analyses.
    • -
    • The results can be analysed with the same statistical tools as we would use when analysing experimental data.
    • -
    • As in all experiments, we are looking for expectation values and an estimate of how accurate they are, i.e., possible sources for errors.
    • -
    +

    In the code here we vary the number of mini-batches.

    + + +
    +
    +
    +
    +
    +
    # Importing various packages
    +from math import exp, sqrt
    +from random import random, seed
    +import numpy as np
    +import matplotlib.pyplot as plt
    +
    +n = 100
    +x = 2*np.random.rand(n,1)
    +y = 4+3*x+np.random.randn(n,1)
    +
    +X = np.c_[np.ones((n,1)), x]
    +XT_X = X.T @ X
    +theta_linreg = np.linalg.inv(X.T @ X) @ (X.T @ y)
    +print("Own inversion")
    +print(theta_linreg)
    +# Hessian matrix
    +H = (2.0/n)* XT_X
    +EigValues, EigVectors = np.linalg.eig(H)
    +print(f"Eigenvalues of Hessian Matrix:{EigValues}")
    +
    +theta = np.random.randn(2,1)
    +eta = 1.0/np.max(EigValues)
    +Niterations = 1000
    +
    +
    +for iter in range(Niterations):
    +    gradients = 2.0/n*X.T @ ((X @ theta)-y)
    +    theta -= eta*gradients
    +print("theta from own gd")
    +print(theta)
    +
    +xnew = np.array([[0],[2]])
    +Xnew = np.c_[np.ones((2,1)), xnew]
    +ypredict = Xnew.dot(theta)
    +ypredict2 = Xnew.dot(theta_linreg)
    +
    +n_epochs = 50
    +M = 5   #size of each minibatch
    +m = int(n/M) #number of minibatches
    +t0, t1 = 5, 50
    +
    +def learning_schedule(t):
    +    return t0/(t+t1)
    +
    +theta = np.random.randn(2,1)
    +
    +for epoch in range(n_epochs):
    +# Can you figure out a better way of setting up the contributions to each batch?
    +    for i in range(m):
    +        random_index = M*np.random.randint(m)
    +        xi = X[random_index:random_index+M]
    +        yi = y[random_index:random_index+M]
    +        gradients = (2.0/M)* xi.T @ ((xi @ theta)-yi)
    +        eta = learning_schedule(epoch*m+i)
    +        theta = theta - eta*gradients
    +print("theta from own sdg")
    +print(theta)
    +
    +plt.plot(xnew, ypredict, "r-")
    +plt.plot(xnew, ypredict2, "b-")
    +plt.plot(x, y ,'ro')
    +plt.axis([0,2.0,0, 15.0])
    +plt.xlabel(r'$x$')
    +plt.ylabel(r'$y$')
    +plt.title(r'Random numbers ')
    +plt.show()
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    - +









    -

    Statistical analysis

    -
    - -

    +

    Replace or not

    -
      -
    • As in other experiments, many numerical experiments have two classes of errors:
    • -
        -
      • Statistical errors
      • -
      • Systematical errors
      • -
      -
    • Statistical errors can be estimated using standard tools from statistics
    • -
    • Systematical errors are method specific and must be treated differently from case to case.
    • -
    -
    - +

In the above code, we have used sampling with replacement when setting up the mini-batches. The discussion here may be useful.
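As a contrast to the code above, a minimal sketch of the common alternative (using the same toy regression setup and learning schedule as before) is to reshuffle the data at the start of each epoch and sweep through the minibatches without replacement, so that every data point is used exactly once per epoch.

import numpy as np

n = 100
x = 2*np.random.rand(n,1)
y = 4+3*x+np.random.randn(n,1)
X = np.c_[np.ones((n,1)), x]

M = 5            # size of each minibatch
m = int(n/M)     # number of minibatches
n_epochs = 50
t0, t1 = 5, 50

def learning_schedule(t):
    return t0/(t+t1)

theta = np.random.randn(2,1)
for epoch in range(n_epochs):
    indices = np.random.permutation(n)      # reshuffle once per epoch
    for i in range(m):
        batch = indices[i*M:(i+1)*M]        # each point appears exactly once per epoch
        xi, yi = X[batch], y[batch]
        gradients = (2.0/M)* xi.T @ ((xi @ theta)-yi)
        eta = learning_schedule(epoch*m+i)
        theta = theta - eta*gradients
print("theta from own sgd without replacement")
print(theta)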











    -


    +

    SGD vs Full-Batch GD: Convergence Speed and Memory Comparison

    +

    Theoretical Convergence Speed and convex optimization

    -


    Consider minimizing an empirical cost function

$$
C(\theta) = \frac{1}{N}\sum_{i=1}^N l_i(\theta),
$$

    where each \( l_i(\theta) \) is a +differentiable loss term. Gradient Descent (GD) updates parameters +using the full gradient \( \nabla C(\theta) \), while Stochastic Gradient +Descent (SGD) uses a single sample (or mini-batch) gradient \( \nabla +l_i(\theta) \) selected at random. In equation form, one GD step is:

    -

$$
\theta_{t+1} = \theta_t - \eta \nabla C(\theta_t) = \theta_t - \eta \frac{1}{N}\sum_{i=1}^N \nabla l_i(\theta_t),
$$

    whereas one SGD step is:

$$
\theta_{t+1} = \theta_t - \eta \nabla l_{i_t}(\theta_t),
$$

with \( i_t \) randomly chosen. On smooth convex problems, GD and SGD both converge to the global minimum, but their rates differ. GD can take larger, more stable steps since it uses the exact gradient, achieving an error that decreases on the order of \( O(1/t) \) per iteration for convex objectives (and even exponentially fast for strongly convex cases). In contrast, plain SGD has more variance in each step, leading to sublinear convergence in expectation, typically \( O(1/\sqrt{t}) \) for general convex objectives (with appropriate diminishing step sizes). Intuitively, GD's trajectory is smoother and more predictable, while SGD's path oscillates due to noise but costs far less per iteration, enabling many more updates in the same time.

    +

    Strongly Convex Case

    -

If \( C(\theta) \) is strongly convex and \( L \)-smooth (so GD enjoys linear convergence), the gap \( C(\theta_t)-C(\theta^*) \) for GD shrinks as

$$
C(\theta_t) - C(\theta^*) \le \Big(1 - \frac{\mu}{L}\Big)^t \left[C(\theta_0)-C(\theta^*)\right],
$$

    a geometric (linear) convergence per iteration . Achieving an +\( \epsilon \)-accurate solution thus takes on the order of +\( \log(1/\epsilon) \) iterations for GD. However, each GD iteration costs +\( O(N) \) gradient evaluations. SGD cannot exploit strong convexity to +obtain a linear rate – instead, with a properly decaying step size +(e.g. \( \eta_t = \frac{1}{\mu t} \)) or iterate averaging, SGD attains an +\( O(1/t) \) convergence rate in expectation . For example, one result +of Moulines and Bach 2011, see https://papers.nips.cc/paper_files/paper/2011/hash/40008b9a5380fcacce3976bf7c08af5b-Abstract.html shows that with \( \eta_t = \Theta(1/t) \),

$$
\mathbb{E}[C(\theta_t) - C(\theta^*)] = O(1/t),
$$









    -

    Resampling methods: Bootstrap

    -
    - -

    -

    Bootstrapping is a non-parametric approach to statistical inference -that substitutes computation for more traditional distributional -assumptions and asymptotic results. Bootstrapping offers a number of -advantages: +

for strongly convex, smooth \( C \). This \( 1/t \) rate is slower per iteration than GD's exponential decay, but each SGD iteration is \( N \) times cheaper. In fact, to reach error \( \epsilon \), plain SGD needs on the order of \( T=O(1/\epsilon) \) iterations (sub-linear convergence), while GD needs \( O(\log(1/\epsilon)) \) iterations. When accounting for cost-per-iteration, GD requires \( O(N \log(1/\epsilon)) \) total gradient computations versus SGD's \( O(1/\epsilon) \) single-sample computations. In large-scale regimes (huge \( N \)), SGD can be faster in wall-clock time because \( N \log(1/\epsilon) \) may far exceed \( 1/\epsilon \) for reasonable accuracy levels. In other words, with millions of data points, one epoch of GD (one full gradient) is extremely costly, whereas SGD can make \( N \) cheap updates in the time GD makes one, often yielding a good solution faster in practice, even though SGD's asymptotic error decays more slowly. As one lecture succinctly puts it: “SGD can be super effective in terms of iteration cost and memory, but SGD is slow to converge and can't adapt to strong convexity”. Thus, the break-even point depends on \( N \) and the desired accuracy: for moderate accuracy on very large \( N \), SGD's cheaper updates win; for extremely high precision (very small \( \epsilon \)) on a modest \( N \), GD's fast convergence per step can be advantageous.
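As a rough back-of-the-envelope illustration of this break-even argument (ignoring all constants, and with \( N \) and \( \epsilon \) chosen purely for illustration), we can compare the total number of single-sample gradient evaluations suggested by the two bounds:

import numpy as np

N = 10**6     # number of data points (illustrative choice)
eps = 1e-3    # target accuracy (illustrative choice)

# GD: O(log(1/eps)) iterations, each costing N single-sample gradients
gd_evals = N*np.log(1.0/eps)
# SGD: O(1/eps) iterations, each costing one single-sample gradient
sgd_evals = 1.0/eps

print(f"GD : about {gd_evals:.1e} single-sample gradient evaluations")
print(f"SGD: about {sgd_evals:.1e} single-sample gradient evaluations")

With these illustrative numbers the SGD estimate is several orders of magnitude cheaper, which is the regime described in the text; for very small \( \epsilon \) or small \( N \) the comparison can flip.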

    -
      -
    1. The bootstrap is quite general, although there are some cases in which it fails.
    2. -
    3. Because it does not require distributional assumptions (such as normally distributed errors), the bootstrap can provide more accurate inferences when the data are not well behaved or when the sample size is small.
    4. -
    5. It is possible to apply the bootstrap to statistics with sampling distributions that are difficult to derive, even asymptotically.
    6. -
    7. It is relatively simple to apply the bootstrap to complex data-collection plans (such as stratified and clustered samples).
    8. -
    -
    +

    Non-Convex Problems

    +

    In non-convex optimization (e.g. deep neural networks), neither GD nor +SGD guarantees global minima, but SGD often displays faster progress +in finding useful minima. Theoretical results here are weaker, usually +showing convergence to a stationary point \( \theta \) (\( |\nabla C| \) is +small) in expectation. For example, GD might require \( O(1/\epsilon^2) \) +iterations to ensure \( |\nabla C(\theta)| < \epsilon \), and SGD typically has +similar polynomial complexity (often worse due to gradient +noise). However, a noteworthy difference is that SGD’s stochasticity +can help escape saddle points or poor local minima. Random gradient +fluctuations act like implicit noise, helping the iterate “jump” out +of flat saddle regions where full-batch GD could stagnate . In fact, +research has shown that adding noise to GD can guarantee escaping +saddle points in polynomial time, and the inherent noise in SGD often +serves this role. Empirically, this means SGD can sometimes find a +lower loss basin faster, whereas full-batch GD might get “stuck” near +saddle points or need a very small learning rate to navigate complex +error surfaces . Overall, in modern high-dimensional machine learning, +SGD (or mini-batch SGD) is the workhorse for large non-convex problems +because it converges to good solutions much faster in practice, +despite the lack of a linear convergence guarantee. Full-batch GD is +rarely used on large neural networks, as it would require tiny steps +to avoid divergence and is extremely slow per iteration . +

    -

    The textbook by Davison on the Bootstrap Methods and their Applications provides many more insights and proofs. In this course we will take a more practical approach and use the results and theorems provided in the literature. For those interested in reading more about the bootstrap methods, we recommend the above text and the one by Efron and Tibshirani.

    +









    +

    Memory Usage and Scalability

    + +

    A major advantage of SGD is its memory efficiency in handling large +datasets. Full-batch GD requires access to the entire training set for +each iteration, which often means the whole dataset (or a large +subset) must reside in memory to compute \( \nabla C(\theta) \) . This results +in memory usage that scales linearly with the dataset size \( N \). For +instance, if each training sample is large (e.g. high-dimensional +features), computing a full gradient may require storing a substantial +portion of the data or all intermediate gradients until they are +aggregated. In contrast, SGD needs only a single (or a small +mini-batch of) training example(s) in memory at any time . The +algorithm processes one sample (or mini-batch) at a time and +immediately updates the model, discarding that sample before moving to +the next. This streaming approach means that memory footprint is +essentially independent of \( N \) (apart from storing the model +parameters themselves). As one source notes, gradient descent +“requires more memory than SGD” because it “must store the entire +dataset for each iteration,” whereas SGD “only needs to store the +current training example” . In practical terms, if you have a dataset +of size, say, 1 million examples, full-batch GD would need memory for +all million every step, while SGD could be implemented to load just +one example at a time – a crucial benefit if data are too large to fit +in RAM or GPU memory. This scalability makes SGD suitable for +large-scale learning: as long as you can stream data from disk, SGD +can handle arbitrarily large datasets with fixed memory. In fact, SGD +“does not need to remember which examples were visited” in the past, +allowing it to run in an online fashion on infinite data streams +. Full-batch GD, on the other hand, would require multiple passes +through a giant dataset per update (or a complex distributed memory +system), which is often infeasible. +

    + +

    There is also a secondary memory effect: computing a full-batch +gradient in deep learning requires storing all intermediate +activations for backpropagation across the entire batch. A very large +batch (approaching the full dataset) might exhaust GPU memory due to +the need to hold activation gradients for thousands or millions of +examples simultaneously. SGD/minibatches mitigate this by splitting +the workload – e.g. with a mini-batch of size 32 or 256, memory use +stays bounded, whereas a full-batch (size = \( N \)) forward/backward pass +could not even be executed if \( N \) is huge. Techniques like gradient +accumulation exist to simulate large-batch GD by summing many +small-batch gradients – but these still process data in manageable +chunks to avoid memory overflow. In summary, memory complexity for GD +grows with \( N \), while for SGD it remains \( O(1) \) w.r.t. dataset size +(only the model and perhaps a mini-batch reside in memory) . This is a +key reason why batch GD “does not scale” to very large data and why +virtually all large-scale machine learning algorithms rely on +stochastic or mini-batch methods. +

    -

    Before we proceed however, we need to remind ourselves about a central theorem in statistics, namely the so-called central limit theorem.

    +









    +

    Empirical Evidence: Convergence Time and Memory in Practice

    + +

    Empirical studies strongly support the theoretical trade-offs +above. In large-scale machine learning tasks, SGD often converges to a +good solution much faster in wall-clock time than full-batch GD, and +it uses far less memory. For example, Bottou & Bousquet (2008) +analyzed learning time under a fixed computational budget and +concluded that when data is abundant, it’s better to use a faster +(even if less precise) optimization method to process more examples in +the same time . This analysis showed that for large-scale problems, +processing more data with SGD yields lower error than spending the +time to do exact (batch) optimization on fewer data . In other words, +if you have a time budget, it’s often optimal to accept slightly +slower convergence per step (as with SGD) in exchange for being able +to use many more training samples in that time. This phenomenon is +borne out by experiments: +

    +

    Deep Neural Networks

    + +

    In modern deep learning, full-batch GD is so slow that it is rarely +attempted; instead, mini-batch SGD is standard. A recent study +demonstrated that it is possible to train a ResNet-50 on ImageNet +using full-batch gradient descent, but it required careful tuning +(e.g. gradient clipping, tiny learning rates) and vast computational +resources – and even then, each full-batch update was extremely +expensive. +

    + +

    Using a huge batch +(closer to full GD) tends to slow down convergence if the learning +rate is not scaled up, and often encounters optimization difficulties +(plateaus) that small batches avoid. +Empirically, small or medium +batch SGD finds minima in fewer clock hours because it can rapidly +loop over the data with gradient noise aiding exploration. +

    +

    Memory constraints

    + +

    From a memory standpoint, practitioners note that batch GD becomes +infeasible on large data. For example, if one tried to do full-batch +training on a dataset that doesn’t fit in RAM or GPU memory, the +program would resort to heavy disk I/O or simply crash. SGD +circumvents this by processing mini-batches. Even in cases where data +does fit in memory, using a full batch can spike memory usage due to +storing all gradients. One empirical observation is that mini-batch +training has a “lower, fluctuating usage pattern” of memory, whereas +full-batch loading “quickly consumes memory (often exceeding limits)” +. This is especially relevant for graph neural networks or other +models where a “batch” may include a huge chunk of a graph: full-batch +gradient computation can exhaust GPU memory, whereas mini-batch +methods keep memory usage manageable . +

    + +

    In summary, SGD converges faster than full-batch GD in terms of actual +training time for large-scale problems, provided we measure +convergence as reaching a good-enough solution. Theoretical bounds +show SGD needs more iterations, but because it performs many more +updates per unit time (and requires far less memory), it often +achieves lower loss in a given time frame than GD. Full-batch GD might +take slightly fewer iterations in theory, but each iteration is so +costly that it is “slower… especially for large datasets” . Meanwhile, +memory scaling strongly favors SGD: GD’s memory cost grows with +dataset size, making it impractical beyond a point, whereas SGD’s +memory use is modest and mostly constant w.r.t. \( N \) . These +differences have made SGD (and mini-batch variants) the de facto +choice for training large machine learning models, from logistic +regression on millions of examples to deep neural networks with +billions of parameters. The consensus in both research and practice is +that for large-scale or high-dimensional tasks, SGD-type methods +converge quicker per unit of computation and handle memory constraints +better than standard full-batch gradient descent . +











    -

    The Central Limit Theorem

    +

    Second moment of the gradient

    + +

    In stochastic gradient descent, with and without momentum, we still +have to specify a schedule for tuning the learning rates \( \eta_t \) +as a function of time. As discussed in the context of Newton's +method, this presents a number of dilemmas. The learning rate is +limited by the steepest direction which can change depending on the +current position in the landscape. To circumvent this problem, ideally +our algorithm would keep track of curvature and take large steps in +shallow, flat directions and small steps in steep, narrow directions. +Second-order methods accomplish this by calculating or approximating +the Hessian and normalizing the learning rate by the +curvature. However, this is very computationally expensive for +extremely large models. Ideally, we would like to be able to +adaptively change the step size to match the landscape without paying +the steep computational price of calculating or approximating +Hessians. +

    + +

    During the last decade a number of methods have been introduced that accomplish +this by tracking not only the gradient, but also the second moment of +the gradient. These methods include AdaGrad, AdaDelta, Root Mean Squared Propagation (RMS-Prop), and +ADAM. +

    -

    Suppose we have a PDF \( p(x) \) from which we generate a series \( N \) -of averages \( \mathbb{E}[x_i] \). Each mean value \( \mathbb{E}[x_i] \) -is viewed as the average of a specific measurement, e.g., throwing -dice 100 times and then taking the average value, or producing a certain -amount of random numbers. -For notational ease, we set \( \mathbb{E}[x_i]=x_i \) in the discussion -which follows. We do the same for \( \mathbb{E}[z]=z \). +









    +

    Challenge: Choosing a Fixed Learning Rate

    +

    A fixed \( \eta \) is hard to get right:

    +
      +
    1. If \( \eta \) is too large, the updates can overshoot the minimum, causing oscillations or divergence
    2. +
    3. If \( \eta \) is too small, convergence is very slow (many iterations to make progress)
    4. +
    +

    In practice, one often uses trial-and-error or schedules (decaying \( \eta \) over time) to find a workable balance. +For a function with steep directions and flat directions, a single global \( \eta \) may be inappropriate:

    +
      +
    1. Steep coordinates require a smaller step size to avoid oscillation.
    2. +
    3. Flat/shallow coordinates could use a larger step to speed up progress.
    4. +
5. This issue is pronounced in high-dimensional problems with **sparse or varying-scale features**; we need a method to adjust step sizes per feature.
    6. +
    +









    +

    Motivation for Adaptive Step Sizes

    -

    If we compute the mean \( z \) of \( m \) such mean values \( x_i \)

    -$$ - z=\frac{x_1+x_2+\dots+x_m}{m}, -$$ +
      +
    1. Instead of a fixed global \( \eta \), use an adaptive learning rate for each parameter that depends on the history of gradients.
    2. +
    3. Parameters that have large accumulated gradient magnitude should get smaller steps (they've been changing a lot), whereas parameters with small or infrequent gradients can have larger relative steps.
    4. +
    5. This is especially useful for sparse features: Rarely active features accumulate little gradient, so their learning rate remains comparatively high, ensuring they are not neglected
    6. +
    7. Conversely, frequently active features accumulate large gradient sums, and their learning rate automatically decreases, preventing too-large updates
    8. +
    9. Several algorithms implement this idea (AdaGrad, RMSProp, AdaDelta, Adam, etc.). We will derive **AdaGrad**, one of the first adaptive methods.
    10. +
    +

    AdaGrad algorithm, taken from Goodfellow et al

    -

    the question we pose is which is the PDF of the new variable \( z \).

    +

    +
    +

    +
    +











    -

    Finding the Limit

    +

    Derivation of the AdaGrad Algorithm

    -

    The probability of obtaining an average value \( z \) is the product of the -probabilities of obtaining arbitrary individual mean values \( x_i \), -but with the constraint that the average is \( z \). We can express this through -the following expression -

    +
    +Accumulating Gradient History +

    +

      +
    1. AdaGrad maintains a running sum of squared gradients for each parameter (coordinate)
    2. +
    3. Let \( g_t = \nabla C_{i_t}(x_t) \) be the gradient at step \( t \) (or a subgradient for nondifferentiable cases).
    4. +
    5. Initialize \( r_0 = 0 \) (an all-zero vector in \( \mathbb{R}^d \)).
    6. +
    7. At each iteration \( t \), update the accumulation:
    8. +
$$
r_t = r_{t-1} + g_t \circ g_t,
$$

    +
      +
1. Here \( g_t \circ g_t \) denotes the element-wise square of the gradient vector, that is \( r_t^{(j)} = r_{t-1}^{(j)} + (g_{t,j})^2 \) for each parameter \( j \).
    2. +
    3. We can view \( H_t = \mathrm{diag}(r_t) \) as a diagonal matrix of past squared gradients. Initially \( H_0 = 0 \).
    4. +
    +
    -









    -

    Rewriting the \( \delta \)-function

    -

    If we use the integral expression for the \( \delta \)-function

    +









    +

    AdaGrad Update Rule Derivation

    +

    We scale the gradient by the inverse square root of the accumulated matrix \( H_t \). The AdaGrad update at step \( t \) is:

$$
\theta_{t+1} = \theta_t - \eta H_t^{-1/2} g_t,
$$

where \( H_t^{-1/2} \) is the diagonal matrix with entries \( (r_{t}^{(1)})^{-1/2}, \dots, (r_{t}^{(d)})^{-1/2} \). In coordinates, this means each parameter \( j \) has an individual step size:

$$
\theta_{t+1,j} = \theta_{t,j} - \frac{\eta}{\sqrt{r_{t,j}}}\, g_{t,j}.
$$

In practice we add a small constant \( \epsilon \) in the denominator for numerical stability, to avoid division by zero:

$$
\theta_{t+1,j} = \theta_{t,j} - \frac{\eta}{\sqrt{\epsilon + r_{t,j}}}\, g_{t,j}.
$$

    Equivalently, the effective learning rate for parameter \( j \) at time \( t \) is \( \displaystyle \alpha_{t,j} = \frac{\eta}{\sqrt{\epsilon + r_{t,j}}} \). This decreases over time as \( r_{t,j} \) grows.
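A minimal sketch of these AdaGrad updates, applied to the same toy linear regression problem used in the earlier SGD examples (the global step size \( \eta \) and the minibatch setup are illustrative choices, not prescribed values):

import numpy as np

n = 100
x = 2*np.random.rand(n,1)
y = 4+3*x+np.random.randn(n,1)
X = np.c_[np.ones((n,1)), x]

M = 5
m = int(n/M)
n_epochs = 50
eta = 0.1      # global step size (illustrative)
eps = 1e-8     # numerical-stability constant

theta = np.random.randn(2,1)
r = np.zeros_like(theta)                       # accumulated squared gradients r_t
for epoch in range(n_epochs):
    for i in range(m):
        random_index = M*np.random.randint(m)
        xi = X[random_index:random_index+M]
        yi = y[random_index:random_index+M]
        g = (2.0/M)* xi.T @ ((xi @ theta)-yi)
        r += g*g                               # r_t = r_{t-1} + g_t o g_t (element-wise)
        theta -= eta*g/np.sqrt(eps + r)        # per-parameter step eta/sqrt(eps + r_t)
print("theta from own AdaGrad")
print(theta)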











    -


    +

    AdaGrad Properties

    + +
      +
    1. AdaGrad automatically tunes the step size for each parameter. Parameters with more volatile or large gradients get smaller steps, and those with small or infrequent gradients get relatively larger steps
    2. +
    3. No manual schedule needed: The accumulation \( r_t \) keeps increasing (or stays the same if gradient is zero), so step sizes \( \eta/\sqrt{r_t} \) are non-increasing. This has a similar effect to a learning rate schedule, but individualized per coordinate.
    4. +
    5. Sparse data benefit: For very sparse features, \( r_{t,j} \) grows slowly, so that feature’s parameter retains a higher learning rate for longer, allowing it to make significant updates when it does get a gradient signal
    6. +
    7. Convergence: In convex optimization, AdaGrad can be shown to achieve a sub-linear convergence rate comparable to the best fixed learning rate tuned for the problem
    8. +
    +

    It effectively reduces the need to tune \( \eta \) by hand.

    +
      +
    1. Limitations: Because \( r_t \) accumulates without bound, AdaGrad’s learning rates can become extremely small over long training, potentially slowing progress. (Later variants like RMSProp, AdaDelta, Adam address this by modifying the accumulation rule.)
    2. +
    +









    +

    RMSProp: Adaptive Learning Rates

    -

RMSProp addresses AdaGrad's diminishing learning rate issue by using a decaying average of squared gradients (instead of a cumulative sum):

$$
v_t = \rho v_{t-1} + (1-\rho)\,(\nabla C(\theta_t))^2,
$$

    +

    with \( \rho \) typically \( 0.9 \) (or \( 0.99 \)).

    +
      +
    1. Update: \( \theta_{t+1} = \theta_t - \frac{\eta}{\sqrt{v_t + \epsilon}} \nabla C(\theta_t) \).
    2. +
    3. Recent gradients have more weight, so \( v_t \) adapts to the current landscape.
    4. +
    5. Avoids AdaGrad’s “infinite memory” problem – learning rate does not continuously decay to zero.
    6. +
    +

(RMSProp was first proposed in unpublished lecture notes by Geoff Hinton, 2012.)
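A minimal sketch of RMSProp for the same toy regression problem (the values of \( \eta \), \( \rho \) and \( \epsilon \) are common illustrative choices, not prescribed values):

import numpy as np

n = 100
x = 2*np.random.rand(n,1)
y = 4+3*x+np.random.randn(n,1)
X = np.c_[np.ones((n,1)), x]

M = 5
m = int(n/M)
n_epochs = 50
eta, rho, eps = 0.01, 0.9, 1e-8

theta = np.random.randn(2,1)
v = np.zeros_like(theta)                       # decaying average of squared gradients
for epoch in range(n_epochs):
    for i in range(m):
        random_index = M*np.random.randint(m)
        xi = X[random_index:random_index+M]
        yi = y[random_index:random_index+M]
        g = (2.0/M)* xi.T @ ((xi @ theta)-yi)
        v = rho*v + (1-rho)*g*g                # v_t = rho*v_{t-1} + (1-rho)*g_t^2
        theta -= eta*g/np.sqrt(v + eps)        # adaptive per-parameter step
print("theta from own RMSProp")
print(theta)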

    +

    RMSProp algorithm, taken from Goodfellow et al


    +
    +

    +
    +

    -


    +









    +

    Adam Optimizer

Why combine Momentum and RMSProp? Motivation for Adam: Adaptive Moment Estimation (Adam) was introduced by Kingma and Ba (2014) to combine the benefits of momentum and RMSProp.

    +
      +
    1. Fast convergence by smoothing gradients (accelerates in long-term gradient direction).
    2. +
    3. Adaptive rates (RMSProp): Per-dimension learning rate scaling for stability (handles different feature scales, sparse gradients).
    4. +
    5. Adam uses both: maintains moving averages of both first moment (gradients) and second moment (squared gradients)
    6. +
    7. Additionally, includes a mechanism to correct the bias in these moving averages (crucial in early iterations)
    8. +
    +

    Result: Adam is robust, achieves faster convergence with less tuning, and often outperforms SGD (with momentum) in practice.











    -

ADAM optimizer

In ADAM, we keep a running average of both the first and second moments of the gradient and use this information to adaptively change the learning rate for different parameters. The method is efficient when working with large problems involving lots of data and/or parameters. It is a combination of the gradient descent with momentum algorithm and the RMSprop algorithm discussed above.

    +









    +

    Why Combine Momentum and RMSProp?

    + +
      +
    1. Momentum: Fast convergence by smoothing gradients (accelerates in long-term gradient direction).
    2. +
    3. Adaptive rates (RMSProp): Per-dimension learning rate scaling for stability (handles different feature scales, sparse gradients).
    4. +
    5. Adam uses both: maintains moving averages of both first moment (gradients) and second moment (squared gradients)
    6. +
    7. Additionally, includes a mechanism to correct the bias in these moving averages (crucial in early iterations)
    8. +
    +

    Result: Adam is robust, achieves faster convergence with less tuning, and often outperforms SGD (with momentum) in practice

    +









    +

    Adam: Exponential Moving Averages (Moments)

    +

    Adam maintains two moving averages at each time step \( t \) for each parameter \( w \):

    +
    +First moment (mean) \( m_t \) +

    +

    The Momentum term

$$
m_t = \beta_1 m_{t-1} + (1-\beta_1)\, \nabla C(\theta_t),
$$

    +
    +Second moment (uncentered variance) \( v_t \) +

    +

    The RMS term

$$
v_t = \beta_2 v_{t-1} + (1-\beta_2)\,(\nabla C(\theta_t))^2,
$$

    +

    with typical \( \beta_1 = 0.9 \), \( \beta_2 = 0.999 \). Initialize \( m_0 = 0 \), \( v_0 = 0 \).

    +
    -


    +

These are biased estimators of the true first and second moments of the gradient, especially at the start (since \( m_0, v_0 \) are zero).











    -


    +

    Adam: Bias Correction

    +

    To counteract initialization bias in \( m_t, v_t \), Adam computes bias-corrected estimates

$$
\hat{m}_t = \frac{m_t}{1 - \beta_1^t}, \qquad \hat{v}_t = \frac{v_t}{1 - \beta_2^t}.
$$

    +
      +
• When \( t \) is small, \( 1-\beta_i^t \) is close to zero, so \( \hat{m}_t, \hat{v}_t \) are significantly larger than the raw \( m_t, v_t \), compensating for the initial zero bias (a one-step check is given after this list).
    • +
    • As \( t \) increases, \( 1-\beta_i^t \to 1 \), and \( \hat{m}_t, \hat{v}_t \) converge to \( m_t, v_t \).
    • +
    • Bias correction is important for Adam’s stability in early iterations
    • +
    +
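A one-step check makes the effect of the correction explicit. At the very first update (\( t=1 \)), since \( m_0 = 0 \),

$$
m_1 = (1-\beta_1)\,\nabla C(\theta_1), \qquad
\hat{m}_1 = \frac{m_1}{1-\beta_1} = \nabla C(\theta_1),
$$

so the bias-corrected first moment equals the current gradient instead of the heavily damped raw average \( m_1 \); the same argument applies to \( \hat{v}_1 \).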









    +

    Adam: Update Rule Derivation

    +

    Finally, Adam updates parameters using the bias-corrected moments:

$$
\theta_{t+1} = \theta_t - \frac{\alpha}{\sqrt{\hat{v}_t} + \epsilon}\,\hat{m}_t,
$$

    where \( \epsilon \) is a small constant (e.g. \( 10^{-8} \)) to prevent division by zero. +Breaking it down:

    +
      +
    1. Compute gradient \( \nabla C(\theta_t) \).
    2. +
    3. Update first moment \( m_t \) and second moment \( v_t \) (exponential moving averages).
    4. +
    5. Bias-correct: \( \hat{m}_t = m_t/(1-\beta_1^t) \), \( \; \hat{v}_t = v_t/(1-\beta_2^t) \).
    6. +
    7. Compute step: \( \Delta \theta_t = \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon} \).
    8. +
    9. Update parameters: \( \theta_{t+1} = \theta_t - \alpha\, \Delta \theta_t \).
    10. +
    +

    This is the Adam update rule as given in the original paper.
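A minimal sketch of these five steps for the same toy linear regression problem used earlier (the hyperparameters \( \alpha, \beta_1, \beta_2, \epsilon \) are set to commonly quoted default values, here purely for illustration):

import numpy as np

n = 100
x = 2*np.random.rand(n,1)
y = 4+3*x+np.random.randn(n,1)
X = np.c_[np.ones((n,1)), x]

M = 5
n_batches = int(n/M)
n_epochs = 100
alpha, beta1, beta2, eps = 0.01, 0.9, 0.999, 1e-8

theta = np.random.randn(2,1)
m = np.zeros_like(theta)       # first moment
v = np.zeros_like(theta)       # second moment
t = 0
for epoch in range(n_epochs):
    for i in range(n_batches):
        random_index = M*np.random.randint(n_batches)
        xi = X[random_index:random_index+M]
        yi = y[random_index:random_index+M]
        g = (2.0/M)* xi.T @ ((xi @ theta)-yi)          # 1. gradient on the minibatch
        t += 1
        m = beta1*m + (1-beta1)*g                      # 2. update first moment
        v = beta2*v + (1-beta2)*g*g                    #    and second moment
        mhat = m/(1-beta1**t)                          # 3. bias correction
        vhat = v/(1-beta2**t)
        theta -= alpha*mhat/(np.sqrt(vhat) + eps)      # 4.-5. parameter update
print("theta from own Adam")
print(theta)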

    -


    +









    +

    Adam vs. AdaGrad and RMSProp

    -


    +
      +
    1. AdaGrad: Uses per-coordinate scaling like Adam, but no momentum. Tends to slow down too much due to cumulative history (no forgetting)
    2. +
    3. RMSProp: Uses moving average of squared gradients (like Adam’s \( v_t \)) to maintain adaptive learning rates, but does not include momentum or bias-correction.
    4. +
    5. Adam: Effectively RMSProp + Momentum + Bias-correction
    6. +
        +
      • Momentum (\( m_t \)) provides acceleration and smoother convergence.
      • +
      • Adaptive \( v_t \) scaling moderates the step size per dimension.
      • +
      • Bias correction (absent in AdaGrad/RMSProp) ensures robust estimates early on.
      • +
      +
    +

    In practice, Adam often yields faster convergence and better tuning stability than RMSProp or AdaGrad alone











    -

    Standard Approach based on the Normal Distribution

    +

    Adaptivity Across Dimensions

    -

    We will assume that the parameters \( \beta \) follow a normal -distribution. We can then define the confidence interval. Here we will be using as -shorthands \( \mu_{\beta} \) for the above mean value and \( \sigma_{\beta} \) -for the standard deviation. We have then a confidence interval -

    +
      +
    1. Adam adapts the step size \emph{per coordinate}: parameters with larger gradient variance get smaller effective steps, those with smaller or sparse gradients get larger steps.
    2. +
    3. This per-dimension adaptivity is inherited from AdaGrad/RMSProp and helps handle ill-conditioned or sparse problems.
    4. +
    5. Meanwhile, momentum (first moment) allows Adam to continue making progress even if gradients become small or noisy, by leveraging accumulated direction.
    6. +
    +

    ADAM algorithm, taken from Goodfellow et al

    -$$ -\left(\mu_{\beta}\pm \frac{z\sigma_{\beta}}{\sqrt{n}}\right), -$$ +

    +
    +

    +
    +

    -

    where \( z \) defines the level of certainty (or confidence). For a normal -distribution typical parameters are \( z=2.576 \) which corresponds to a -confidence of \( 99\% \) while \( z=1.96 \) corresponds to a confidence of -\( 95\% \). A confidence level of \( 95\% \) is commonly used and it is -normally referred to as a two-sigmas confidence level, that is we -approximate \( z\approx 2 \). -

    +









    +

    Algorithms and codes for Adagrad, RMSprop and Adam

    -


    +

    The algorithms we have implemented are well described in the text by Goodfellow, Bengio and Courville, chapter 8.

    -


    +

The codes which implement these algorithms are discussed below.











    -

    Resampling methods: Bootstrap background

    - -

    Since \( \widehat{\beta} = \widehat{\beta}(\boldsymbol{X}) \) is a function of random variables, -\( \widehat{\beta} \) itself must be a random variable. Thus it has -a pdf, call this function \( p(\boldsymbol{t}) \). The aim of the bootstrap is to -estimate \( p(\boldsymbol{t}) \) by the relative frequency of -\( \widehat{\beta} \). You can think of this as using a histogram -in the place of \( p(\boldsymbol{t}) \). If the relative frequency closely -resembles \( p(\vec{t}) \), then using numerics, it is straight forward to -estimate all the interesting parameters of \( p(\boldsymbol{t}) \) using point -estimators. -

    +

    Practical tips

    +
      +
    • Randomize the data when making mini-batches. It is always important to randomly shuffle the data when forming mini-batches. Otherwise, the gradient descent method can fit spurious correlations resulting from the order in which data is presented.
    • +
    • Transform your inputs. Learning becomes difficult when our landscape has a mixture of steep and flat directions. One simple trick for minimizing these situations is to standardize the data by subtracting the mean and normalizing the variance of input variables. Whenever possible, also decorrelate the inputs. To understand why this is helpful, consider the case of linear regression. It is easy to show that for the squared error cost function, the Hessian of the cost function is just the correlation matrix between the inputs. Thus, by standardizing the inputs, we are ensuring that the landscape looks homogeneous in all directions in parameter space. Since most deep networks can be viewed as linear transformations followed by a non-linearity at each layer, we expect this intuition to hold beyond the linear case.
    • +
• Monitor the out-of-sample performance. Always monitor the performance of your model on a validation set (a small portion of the training data that is held out of the training process to serve as a proxy for the test set). If the validation error starts increasing, the model is beginning to overfit; terminate the learning process. This early stopping significantly improves performance in many settings (a minimal sketch is given after this list).
    • +
• Adaptive optimization methods don't always have good generalization. Recent studies have shown that adaptive methods such as ADAM, RMSProp, and AdaGrad tend to have poor generalization compared to SGD or SGD with momentum, particularly in the high-dimensional limit (i.e. the number of parameters exceeds the number of data points). Although it is not clear at this stage why, simpler procedures like properly tuned SGD may work as well or better in these applications.
    • +
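As mentioned in the early-stopping tip above, here is a minimal sketch of that idea. The helper functions train_one_epoch and mse are hypothetical placeholders (not part of the course code): the former performs one epoch of training and returns the updated parameters, the latter computes the mean-squared error.

import numpy as np

def fit_with_early_stopping(theta, train_one_epoch, mse,
                            X_train, y_train, X_val, y_val,
                            n_epochs=200, patience=10):
    """Stop training once the validation error has not improved for
    `patience` consecutive epochs, and return the best parameters."""
    best_theta, best_val = theta.copy(), np.inf
    epochs_since_best = 0
    for epoch in range(n_epochs):
        theta = train_one_epoch(theta, X_train, y_train)
        val_error = mse(theta, X_val, y_val)
        if val_error < best_val:
            best_theta, best_val = theta.copy(), val_error
            epochs_since_best = 0
        else:
            epochs_since_best += 1
        if epochs_since_best >= patience:   # validation error no longer improving
            break
    return best_theta, best_val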










    -

    Resampling methods: More Bootstrap background

    +

    Sneaking in automatic differentiation using Autograd

    -

In the examples here we take the liberty of sneaking in automatic differentiation (without having discussed the mathematics). In project 1 you will write the gradients as discussed above, that is, you will hard-code them. By introducing automatic differentiation via the library autograd (which has since been superseded by JAX), we gain more flexibility in setting up alternative cost functions.

    -
      -
    1. Drawing lots of numbers from \( p(x) \), suppose we call one such set of numbers \( (X_1^*, X_2^*, \cdots, X_n^*) \).
    2. -
    3. Then using these numbers, we could compute a replica of \( \widehat{\beta} \) called \( \widehat{\beta}^* \).
    4. -
    -

    By repeated use of the above two points, many -estimates of \( \widehat{\beta} \) can be obtained. The -idea is to use the relative frequency of \( \widehat{\beta}^* \) -(think of a histogram) as an estimate of \( p(\boldsymbol{t}) \). + +

The first example shows results with ordinary least squares.

    -









    -

    Resampling methods: Bootstrap approach

    -

    But -unless there is enough information available about the process that -generated \( X_1,X_2,\cdots,X_n \), \( p(x) \) is in general -unknown. Therefore, Efron in 1979 asked the -question: What if we replace \( p(x) \) by the relative frequency -of the observation \( X_i \)? -

    + +
    +
    +
    +
    +
    +
    # Using Autograd to calculate gradients for OLS
    +from random import random, seed
    +import numpy as np
    +import autograd.numpy as np
    +import matplotlib.pyplot as plt
    +from autograd import grad
    +
    +def CostOLS(theta):
    +    return (1.0/n)*np.sum((y-X @ theta)**2)
    +
    +n = 100
    +x = 2*np.random.rand(n,1)
    +y = 4+3*x+np.random.randn(n,1)
    +
    +X = np.c_[np.ones((n,1)), x]
    +XT_X = X.T @ X
    +theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)
    +print("Own inversion")
    +print(theta_linreg)
    +# Hessian matrix
    +H = (2.0/n)* XT_X
    +EigValues, EigVectors = np.linalg.eig(H)
    +print(f"Eigenvalues of Hessian Matrix:{EigValues}")
    +
    +theta = np.random.randn(2,1)
    +eta = 1.0/np.max(EigValues)
    +Niterations = 1000
    +# define the gradient
    +training_gradient = grad(CostOLS)
    +
    +for iter in range(Niterations):
    +    gradients = training_gradient(theta)
    +    theta -= eta*gradients
    +print("theta from own gd")
    +print(theta)
    +
    +xnew = np.array([[0],[2]])
    +Xnew = np.c_[np.ones((2,1)), xnew]
    +ypredict = Xnew.dot(theta)
    +ypredict2 = Xnew.dot(theta_linreg)
    +
    +plt.plot(xnew, ypredict, "r-")
    +plt.plot(xnew, ypredict2, "b-")
    +plt.plot(x, y ,'ro')
    +plt.axis([0,2.0,0, 15.0])
    +plt.xlabel(r'$x$')
    +plt.ylabel(r'$y$')
    +plt.title(r'Random numbers ')
    +plt.show()
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    -

    If we draw observations in accordance with -the relative frequency of the observations, will we obtain the same -result in some asymptotic sense? The answer is yes. -











    -

    Resampling methods: Bootstrap steps

    +

    Same code but now with momentum gradient descent

    -

    The independent bootstrap works like this:

    + +
    +
    +
    +
    +
    +
    # Using Autograd to calculate gradients for OLS
    +from random import random, seed
    +import numpy as np
    +import autograd.numpy as np
    +import matplotlib.pyplot as plt
    +from autograd import grad
    +
    +def CostOLS(theta):
    +    return (1.0/n)*np.sum((y-X @ theta)**2)
    +
    +n = 100
    +x = 2*np.random.rand(n,1)
    +y = 4+3*x#+np.random.randn(n,1)
    +
    +X = np.c_[np.ones((n,1)), x]
    +XT_X = X.T @ X
    +theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)
    +print("Own inversion")
    +print(theta_linreg)
    +# Hessian matrix
    +H = (2.0/n)* XT_X
    +EigValues, EigVectors = np.linalg.eig(H)
    +print(f"Eigenvalues of Hessian Matrix:{EigValues}")
    +
    +theta = np.random.randn(2,1)
    +eta = 1.0/np.max(EigValues)
    +Niterations = 30
    +
    +# define the gradient
    +training_gradient = grad(CostOLS)
    +
    +for iter in range(Niterations):
    +    gradients = training_gradient(theta)
    +    theta -= eta*gradients
    +    print(iter,gradients[0],gradients[1])
    +print("theta from own gd")
    +print(theta)
    +
    +# Now improve with momentum gradient descent
    +change = 0.0
    +delta_momentum = 0.3
    +for iter in range(Niterations):
    +    # calculate gradient
    +    gradients = training_gradient(theta)
    +    # calculate update
    +    new_change = eta*gradients+delta_momentum*change
    +    # take a step
    +    theta -= new_change
    +    # save the change
    +    change = new_change
    +    print(iter,gradients[0],gradients[1])
    +print("theta from own gd wth momentum")
    +print(theta)
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    -
      -
    1. Draw with replacement \( n \) numbers for the observed variables \( \boldsymbol{x} = (x_1,x_2,\cdots,x_n) \).
    2. -
    3. Define a vector \( \boldsymbol{x}^* \) containing the values which were drawn from \( \boldsymbol{x} \).
    4. -
    5. Using the vector \( \boldsymbol{x}^* \) compute \( \widehat{\beta}^* \) by evaluating \( \widehat \beta \) under the observations \( \boldsymbol{x}^* \).
    6. -
    7. Repeat this process \( k \) times.
    8. -
    -

    When you are done, you can draw a histogram of the relative frequency -of \( \widehat \beta^* \). This is your estimate of the probability -distribution \( p(t) \). Using this probability distribution you can -estimate any statistics thereof. In principle you never draw the -histogram of the relative frequency of \( \widehat{\beta}^* \). Instead -you use the estimators corresponding to the statistic of interest. For -example, if you are interested in estimating the variance of \( \widehat -\beta \), apply the etsimator \( \widehat \sigma^2 \) to the values -\( \widehat \beta^* \). -











    -

    Code example for the Bootstrap method

    +

    Including Stochastic Gradient Descent with Autograd

    -

    The following code starts with a Gaussian distribution with mean value -\( \mu =100 \) and variance \( \sigma=15 \). We use this to generate the data -used in the bootstrap analysis. The bootstrap analysis returns a data -set after a given number of bootstrap operations (as many as we have -data points). This data set consists of estimated mean values for each -bootstrap operation. The histogram generated by the bootstrap method -shows that the distribution for these mean values is also a Gaussian, -centered around the mean value \( \mu=100 \) but with standard deviation -\( \sigma/\sqrt{n} \), where \( n \) is the number of bootstrap samples (in -this case the same as the number of original data points). The value -of the standard deviation is what we expect from the central limit -theorem. +

    In this code we include the stochastic gradient descent approach +discussed above. Note here that we specify which argument we are +taking the derivative with respect to when using autograd.

    @@ -1075,32 +1949,79 @@

    Code example for the Bootstrap me
    -
    import numpy as np
    -from time import time
    -from scipy.stats import norm
    +  
    # Using Autograd to calculate gradients using SGD
    +# OLS example
    +from random import random, seed
    +import numpy as np
    +import autograd.numpy as np
     import matplotlib.pyplot as plt
    +from autograd import grad
    +
    +# Note change from previous example
    +def CostOLS(y,X,theta):
    +    return np.sum((y-X @ theta)**2)
    +
    +n = 100
    +x = 2*np.random.rand(n,1)
    +y = 4+3*x+np.random.randn(n,1)
    +
    +X = np.c_[np.ones((n,1)), x]
    +XT_X = X.T @ X
    +theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)
    +print("Own inversion")
    +print(theta_linreg)
    +# Hessian matrix
    +H = (2.0/n)* XT_X
    +EigValues, EigVectors = np.linalg.eig(H)
    +print(f"Eigenvalues of Hessian Matrix:{EigValues}")
    +
    +theta = np.random.randn(2,1)
    +eta = 1.0/np.max(EigValues)
    +Niterations = 1000
    +
    +# Note that we request the derivative wrt third argument (theta, 2 here)
    +training_gradient = grad(CostOLS,2)
    +
    +for iter in range(Niterations):
    +    gradients = (1.0/n)*training_gradient(y, X, theta)
    +    theta -= eta*gradients
    +print("theta from own gd")
    +print(theta)
    +
    +xnew = np.array([[0],[2]])
    +Xnew = np.c_[np.ones((2,1)), xnew]
    +ypredict = Xnew.dot(theta)
    +ypredict2 = Xnew.dot(theta_linreg)
    +
    +plt.plot(xnew, ypredict, "r-")
    +plt.plot(xnew, ypredict2, "b-")
    +plt.plot(x, y ,'ro')
    +plt.axis([0,2.0,0, 15.0])
    +plt.xlabel(r'$x$')
    +plt.ylabel(r'$y$')
    +plt.title(r'Random numbers ')
    +plt.show()
     
    -# Returns mean of bootstrap samples 
    -# Bootstrap algorithm
    -def bootstrap(data, datapoints):
    -    t = np.zeros(datapoints)
    -    n = len(data)
    -    # non-parametric bootstrap         
    -    for i in range(datapoints):
    -        t[i] = np.mean(data[np.random.randint(0,n,n)])
    -    # analysis    
    -    print("Bootstrap Statistics :")
    -    print("original           bias      std. error")
    -    print("%8g %8g %14g %15g" % (np.mean(data), np.std(data),np.mean(t),np.std(t)))
    -    return t
    -
    -# We set the mean value to 100 and the standard deviation to 15
    -mu, sigma = 100, 15
    -datapoints = 10000
    -# We generate random numbers according to the normal distribution
    -x = mu + sigma*np.random.randn(datapoints)
    -# bootstrap returns the data sample                                    
    -t = bootstrap(x, datapoints)
    +n_epochs = 50
    +M = 5   #size of each minibatch
    +m = int(n/M) #number of minibatches
    +t0, t1 = 5, 50
    +def learning_schedule(t):
    +    return t0/(t+t1)
    +
    +theta = np.random.randn(2,1)
    +
    +for epoch in range(n_epochs):
    +# Can you figure out a better way of setting up the contributions to each batch?
    +    for i in range(m):
    +        random_index = M*np.random.randint(m)
    +        xi = X[random_index:random_index+M]
    +        yi = y[random_index:random_index+M]
    +        gradients = (1.0/M)*training_gradient(yi, xi, theta)
    +        eta = learning_schedule(epoch*m+i)
    +        theta = theta - eta*gradients
    +print("theta from own sdg")
    +print(theta)
     
    @@ -1116,10 +2037,9 @@

    Code example for the Bootstrap me

    -

    We see that our new variance, and from that the standard deviation, agrees with the central limit theorem.











    -

    Plotting the Histogram

    +

    Same code but now with momentum gradient descent

    @@ -1127,15 +2047,73 @@

    Plotting the Histogram

    -
    # the histogram of the bootstrapped data (normalized data if density = True)
    -n, binsboot, patches = plt.hist(t, 50, density=True, facecolor='red', alpha=0.75)
    -# add a 'best fit' line  
    -y = norm.pdf(binsboot, np.mean(t), np.std(t))
    -lt = plt.plot(binsboot, y, 'b', linewidth=1)
    -plt.xlabel('x')
    -plt.ylabel('Probability')
    -plt.grid(True)
    -plt.show()
    +  
    # Using Autograd to calculate gradients using SGD
    +# OLS example
    +from random import random, seed
    +import numpy as np
    +import autograd.numpy as np
    +import matplotlib.pyplot as plt
    +from autograd import grad
    +
    +# Note change from previous example
    +def CostOLS(y,X,theta):
    +    return np.sum((y-X @ theta)**2)
    +
    +n = 100
    +x = 2*np.random.rand(n,1)
    +y = 4+3*x+np.random.randn(n,1)
    +
    +X = np.c_[np.ones((n,1)), x]
    +XT_X = X.T @ X
    +theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)
    +print("Own inversion")
    +print(theta_linreg)
    +# Hessian matrix
    +H = (2.0/n)* XT_X
    +EigValues, EigVectors = np.linalg.eig(H)
    +print(f"Eigenvalues of Hessian Matrix:{EigValues}")
    +
    +theta = np.random.randn(2,1)
    +eta = 1.0/np.max(EigValues)
    +Niterations = 100
    +
    +# Note that we request the derivative wrt third argument (theta, 2 here)
    +training_gradient = grad(CostOLS,2)
    +
    +for iter in range(Niterations):
    +    gradients = (1.0/n)*training_gradient(y, X, theta)
    +    theta -= eta*gradients
    +print("theta from own gd")
    +print(theta)
    +
    +
    +n_epochs = 50
    +M = 5   #size of each minibatch
    +m = int(n/M) #number of minibatches
    +t0, t1 = 5, 50
    +def learning_schedule(t):
    +    return t0/(t+t1)
    +
    +theta = np.random.randn(2,1)
    +
    +change = 0.0
    +delta_momentum = 0.3
    +
    +for epoch in range(n_epochs):
    +    for i in range(m):
    +        random_index = M*np.random.randint(m)
    +        xi = X[random_index:random_index+M]
    +        yi = y[random_index:random_index+M]
    +        gradients = (1.0/M)*training_gradient(yi, xi, theta)
    +        eta = learning_schedule(epoch*m+i)
    +        # calculate update
    +        new_change = eta*gradients+delta_momentum*change
    +        # take a step
    +        theta -= new_change
    +        # save the change
    +        change = new_change
    +print("theta from own sdg with momentum")
    +print(theta)
     
    @@ -1153,76 +2131,221 @@

    Plotting the Histogram











    -

    The bias-variance tradeoff

    +

    But none of these can compete with Newton's method

    -

    We will discuss the bias-variance tradeoff in the context of continuous predictions such as regression. However, many of the intuitions and ideas discussed here also carry over to classification tasks. Consider a dataset \( \mathcal{D} \) consisting of the data \( \mathbf{X}_\mathcal{D}=\{(y_j, \boldsymbol{x}_j), j=0\ldots n-1\} \).

    +

    Note that we here have introduced automatic differentiation

    -

    Let us assume that the true data is generated from a noisy model

    $$
    \boldsymbol{y}=f(\boldsymbol{x}) + \boldsymbol{\epsilon}
    $$

    where \( \boldsymbol{\epsilon} \) is normally distributed with mean zero and variance \( \sigma^2 \).

    + +
    +
    +
    +
    +
    +
    # Using Newton's method
    +from random import random, seed
    +import numpy as np
    +import autograd.numpy as np
    +from autograd import grad
    +
    +def CostOLS(theta):
    +    return (1.0/n)*np.sum((y-X @ theta)**2)
    +
    +n = 100
    +x = 2*np.random.rand(n,1)
    +y = 4+3*x+5*x*x
    +
    +X = np.c_[np.ones((n,1)), x, x*x]
    +XT_X = X.T @ X
    +theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)
    +print("Own inversion")
    +print(theta_linreg)
    +# Hessian matrix
    +H = (2.0/n)* XT_X
    +# Note that here the Hessian does not depend on the parameters theta
    +invH = np.linalg.pinv(H)
    +theta = np.random.randn(3,1)
    +Niterations = 5
    +# define the gradient
    +training_gradient = grad(CostOLS)
    +
    +for iter in range(Niterations):
    +    gradients = training_gradient(theta)
    +    theta -= invH @ gradients
    +    print(iter,gradients[0],gradients[1])
    +print("theta from own Newton code")
    +print(theta)
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    -

    In our derivation of the ordinary least squares method we then defined an approximation to the function \( f \) in terms of the parameters \( \boldsymbol{\beta} \) and the design matrix \( \boldsymbol{X} \) which embody our model, that is \( \boldsymbol{\tilde{y}}=\boldsymbol{X}\boldsymbol{\beta} \).

    -

    Thereafter we found the parameters \( \boldsymbol{\beta} \) by optimizing the mean squared error via the so-called cost function

    $$
    C(\boldsymbol{X},\boldsymbol{\beta}) =\frac{1}{n}\sum_{i=0}^{n-1}(y_i-\tilde{y}_i)^2=\mathbb{E}\left[(\boldsymbol{y}-\boldsymbol{\tilde{y}})^2\right].
    $$

    We can rewrite this as

    $$
    \mathbb{E}\left[(\boldsymbol{y}-\boldsymbol{\tilde{y}})^2\right]=\frac{1}{n}\sum_i(f_i-\mathbb{E}\left[\boldsymbol{\tilde{y}}\right])^2+\frac{1}{n}\sum_i(\tilde{y}_i-\mathbb{E}\left[\boldsymbol{\tilde{y}}\right])^2+\sigma^2.
    $$









    +

    A similar problem (now with a second-order polynomial) but with AdaGrad

    -

    The three terms represent, in order, the square of the bias of the learning method, which can be thought of as the error caused by the simplifying assumptions built into the method; the variance of the chosen model; and finally the variance of the error \( \boldsymbol{\epsilon} \).

    + +
    +
    +
    +
    +
    +
    # Using Autograd to calculate gradients using AdaGrad and Stochastic Gradient descent
    +# OLS example
    +from random import random, seed
    +import numpy as np
    +import autograd.numpy as np
    +import matplotlib.pyplot as plt
    +from autograd import grad
    +
    +# Note change from previous example
    +def CostOLS(y,X,theta):
    +    return np.sum((y-X @ theta)**2)
    +
    +n = 1000
    +x = np.random.rand(n,1)
    +y = 2.0+3*x +4*x*x
    +
    +X = np.c_[np.ones((n,1)), x, x*x]
    +XT_X = X.T @ X
    +theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)
    +print("Own inversion")
    +print(theta_linreg)
    +
    +
    +# Note that we request the derivative wrt third argument (theta, 2 here)
    +training_gradient = grad(CostOLS,2)
    +# Define parameters for Stochastic Gradient Descent
    +n_epochs = 50
    +M = 5   #size of each minibatch
    +m = int(n/M) #number of minibatches
    +# Guess for unknown parameters theta
    +theta = np.random.randn(3,1)
    +
    +# Value for learning rate
    +eta = 0.01
    +# Including AdaGrad parameter to avoid possible division by zero
    +delta  = 1e-8
    +for epoch in range(n_epochs):
    +    Giter = 0.0
    +    for i in range(m):
    +        random_index = M*np.random.randint(m)
    +        xi = X[random_index:random_index+M]
    +        yi = y[random_index:random_index+M]
    +        gradients = (1.0/M)*training_gradient(yi, xi, theta)
    +        Giter += gradients*gradients
    +        update = gradients*eta/(delta+np.sqrt(Giter))
    +        theta -= update
    +print("theta from own AdaGrad")
    +print(theta)
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    -

    To derive this equation, we need to recall that the variances of \( \boldsymbol{y} \) and \( \boldsymbol{\epsilon} \) are both equal to \( \sigma^2 \). The mean value of \( \boldsymbol{\epsilon} \) is by definition equal to zero. Furthermore, the function \( f \) is not a stochastic variable, idem for \( \boldsymbol{\tilde{y}} \). We use a more compact notation in terms of the expectation value

    $$
    \mathbb{E}\left[(\boldsymbol{y}-\boldsymbol{\tilde{y}})^2\right]=\mathbb{E}\left[(\boldsymbol{f}+\boldsymbol{\epsilon}-\boldsymbol{\tilde{y}})^2\right],
    $$

    Running this code we note an almost perfect agreement with the results from matrix inversion.

    -

    and adding and subtracting \( \mathbb{E}\left[\boldsymbol{\tilde{y}}\right] \) we get

    $$
    \mathbb{E}\left[(\boldsymbol{y}-\boldsymbol{\tilde{y}})^2\right]=\mathbb{E}\left[(\boldsymbol{f}+\boldsymbol{\epsilon}-\boldsymbol{\tilde{y}}+\mathbb{E}\left[\boldsymbol{\tilde{y}}\right]-\mathbb{E}\left[\boldsymbol{\tilde{y}}\right])^2\right],
    $$
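    Spelling out this step (an added intermediate step, under the same assumptions as above, namely that \( \mathbb{E}[\boldsymbol{\epsilon}]=0 \), that \( \boldsymbol{f} \) and \( \mathbb{E}[\boldsymbol{\tilde{y}}] \) are non-stochastic, and that \( \boldsymbol{\epsilon} \) is independent of \( \boldsymbol{\tilde{y}} \)): expanding the square, all cross terms vanish in expectation, leaving

    $$
    \mathbb{E}\left[(\boldsymbol{f}-\mathbb{E}\left[\boldsymbol{\tilde{y}}\right]+\boldsymbol{\epsilon}+\mathbb{E}\left[\boldsymbol{\tilde{y}}\right]-\boldsymbol{\tilde{y}})^2\right]
    =\mathbb{E}\left[(\boldsymbol{f}-\mathbb{E}\left[\boldsymbol{\tilde{y}}\right])^2\right]
    +\mathbb{E}\left[\boldsymbol{\epsilon}^2\right]
    +\mathbb{E}\left[(\boldsymbol{\tilde{y}}-\mathbb{E}\left[\boldsymbol{\tilde{y}}\right])^2\right].
    $$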









    +

    RMSprop for adaptive learning rate with Stochastic Gradient Descent

    -

    which, using the abovementioned expectation values, can be rewritten as

    $$
    \mathbb{E}\left[(\boldsymbol{y}-\boldsymbol{\tilde{y}})^2\right]=\mathbb{E}\left[(\boldsymbol{f}-\mathbb{E}\left[\boldsymbol{\tilde{y}}\right])^2\right]+\mathrm{Var}\left[\boldsymbol{\tilde{y}}\right]+\sigma^2,
    $$
    +
    +
    +
    +
    +
    # Using Autograd to calculate gradients using RMSprop  and Stochastic Gradient descent
    +# OLS example
    +from random import random, seed
    +import numpy as np
    +import autograd.numpy as np
    +import matplotlib.pyplot as plt
    +from autograd import grad
    +
    +# Note change from previous example
    +def CostOLS(y,X,theta):
    +    return np.sum((y-X @ theta)**2)
    +
    +n = 1000
    +x = np.random.rand(n,1)
    +y = 2.0+3*x +4*x*x# +np.random.randn(n,1)
    +
    +X = np.c_[np.ones((n,1)), x, x*x]
    +XT_X = X.T @ X
    +theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)
    +print("Own inversion")
    +print(theta_linreg)
    +
    +
    +# Note that we request the derivative wrt third argument (theta, 2 here)
    +training_gradient = grad(CostOLS,2)
    +# Define parameters for Stochastic Gradient Descent
    +n_epochs = 50
    +M = 5   #size of each minibatch
    +m = int(n/M) #number of minibatches
    +# Guess for unknown parameters theta
    +theta = np.random.randn(3,1)
    +
    +# Value for learning rate
    +eta = 0.01
    +# Value for parameter rho
    +rho = 0.99
    +# Including AdaGrad parameter to avoid possible division by zero
    +delta  = 1e-8
    +for epoch in range(n_epochs):
    +    Giter = 0.0
    +    for i in range(m):
    +        random_index = M*np.random.randint(m)
    +        xi = X[random_index:random_index+M]
    +        yi = y[random_index:random_index+M]
    +        gradients = (1.0/M)*training_gradient(yi, xi, theta)
    +        # Accumulated squared gradients, scaling with rho the new and the previous results
    +        Giter = (rho*Giter+(1-rho)*gradients*gradients)
    +        # Element-wise update: scale the gradient by the adaptive learning rate
    +        update = gradients*eta/(delta+np.sqrt(Giter))
    +        theta -= update
    +print("theta from own RMSprop")
    +print(theta)
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    -

    that is the rewriting in terms of the so-called bias, the variance of the model \( \boldsymbol{\tilde{y}} \) and the variance of \( \boldsymbol{\epsilon} \).











    -

    A way to Read the Bias-Variance Tradeoff

    - -

    -
    -

    -
    -

    +

    And finally ADAM

    -









    -

    Example code for Bias-Variance tradeoff

    @@ -1230,60 +2353,65 @@

    Example code for Bias-Variance
    -
    import matplotlib.pyplot as plt
    +  
    # Using Autograd to calculate gradients using ADAM and Stochastic Gradient descent
    +# OLS example
    +from random import random, seed
     import numpy as np
    -from sklearn.linear_model import LinearRegression, Ridge, Lasso
    -from sklearn.preprocessing import PolynomialFeatures
    -from sklearn.model_selection import train_test_split
    -from sklearn.pipeline import make_pipeline
    -from sklearn.utils import resample
    -
    -np.random.seed(2018)
    -
    -n = 500
    -n_boostraps = 100
    -degree = 18  # A quite high value, just to show.
    -noise = 0.1
    -
    -# Make data set.
    -x = np.linspace(-1, 3, n).reshape(-1, 1)
    -y = np.exp(-x**2) + 1.5 * np.exp(-(x-2)**2) + np.random.normal(0, 0.1, x.shape)
    -
    -# Hold out some test data that is never used in training.
    -x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)
    -
    -# Combine x transformation and model into one operation.
    -# Not neccesary, but convenient.
    -model = make_pipeline(PolynomialFeatures(degree=degree), LinearRegression(fit_intercept=False))
    -
    -# The following (m x n_bootstraps) matrix holds the column vectors y_pred
    -# for each bootstrap iteration.
    -y_pred = np.empty((y_test.shape[0], n_boostraps))
    -for i in range(n_boostraps):
    -    x_, y_ = resample(x_train, y_train)
    -
    -    # Evaluate the new model on the same test data each time.
    -    y_pred[:, i] = model.fit(x_, y_).predict(x_test).ravel()
    -
    -# Note: Expectations and variances taken w.r.t. different training
    -# data sets, hence the axis=1. Subsequent means are taken across the test data
    -# set in order to obtain a total value, but before this we have error/bias/variance
    -# calculated per data point in the test set.
    -# Note 2: The use of keepdims=True is important in the calculation of bias as this 
    -# maintains the column vector form. Dropping this yields very unexpected results.
    -error = np.mean( np.mean((y_test - y_pred)**2, axis=1, keepdims=True) )
    -bias = np.mean( (y_test - np.mean(y_pred, axis=1, keepdims=True))**2 )
    -variance = np.mean( np.var(y_pred, axis=1, keepdims=True) )
    -print('Error:', error)
    -print('Bias^2:', bias)
    -print('Var:', variance)
    -print('{} >= {} + {} = {}'.format(error, bias, variance, bias+variance))
    -
    -plt.plot(x[::5, :], y[::5, :], label='f(x)')
    -plt.scatter(x_test, y_test, label='Data points')
    -plt.scatter(x_test, np.mean(y_pred, axis=1), label='Pred')
    -plt.legend()
    -plt.show()
    +import autograd.numpy as np
    +import matplotlib.pyplot as plt
    +from autograd import grad
    +
    +# Note change from previous example
    +def CostOLS(y,X,theta):
    +    return np.sum((y-X @ theta)**2)
    +
    +n = 1000
    +x = np.random.rand(n,1)
    +y = 2.0+3*x +4*x*x# +np.random.randn(n,1)
    +
    +X = np.c_[np.ones((n,1)), x, x*x]
    +XT_X = X.T @ X
    +theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)
    +print("Own inversion")
    +print(theta_linreg)
    +
    +
    +# Note that we request the derivative wrt third argument (theta, 2 here)
    +training_gradient = grad(CostOLS,2)
    +# Define parameters for Stochastic Gradient Descent
    +n_epochs = 50
    +M = 5   #size of each minibatch
    +m = int(n/M) #number of minibatches
    +# Guess for unknown parameters theta
    +theta = np.random.randn(3,1)
    +
    +# Value for learning rate
    +eta = 0.01
    +# Value for parameters theta1 and theta2, see https://arxiv.org/abs/1412.6980
    +theta1 = 0.9
    +theta2 = 0.999
    +# Including AdaGrad parameter to avoid possible division by zero
    +delta  = 1e-7
    +iter = 0
    +for epoch in range(n_epochs):
    +    first_moment = 0.0
    +    second_moment = 0.0
    +    iter += 1
    +    for i in range(m):
    +        random_index = M*np.random.randint(m)
    +        xi = X[random_index:random_index+M]
    +        yi = y[random_index:random_index+M]
    +        gradients = (1.0/M)*training_gradient(yi, xi, theta)
    +        # Computing moments first
    +        first_moment = theta1*first_moment + (1-theta1)*gradients
    +        second_moment = theta2*second_moment+(1-theta2)*gradients*gradients
    +        first_term = first_moment/(1.0-theta1**iter)
    +        second_term = second_moment/(1.0-theta2**iter)
    +        # Update with the bias-corrected first and second moments
    +        update = eta*first_term/(np.sqrt(second_term)+delta)
    +        theta -= update
    +print("theta from own ADAM")
    +print(theta)
     
    @@ -1301,7 +2429,43 @@

    Example code for Bias-Variance









    -

    Understanding what happens

    +

    Material for the lab sessions

    + +
    + +

    +

    1. Exercise set for week 37 and reminder on scaling (from lab sessions of week 35)
    2. Work on project 1

    For more discussions of Ridge regression and calculation of averages, Wessel van Wieringen's article is highly recommended.

    +
    + + +









    +

    Reminder on different scaling methods

    + +

    Before fitting a regression model, it is good practice to normalize or standardize the features. This ensures all features are on a comparable scale, which is especially important when using regularization. In the exercises this week we will perform standardization, scaling each feature to have mean 0 and standard deviation 1.

    + +

    Here we compute the mean and standard deviation of each column (feature) in our design/feature matrix \( \boldsymbol{X} \). Then we subtract the mean and divide by the standard deviation for each feature.

    + +

    In the example here we will also center the target \( \boldsymbol{y} \) to mean \( 0 \). Centering \( \boldsymbol{y} \) (and each feature) means the model does not require a separate intercept term; the data is shifted such that the intercept is effectively \( 0 \). (In practice, one could include an intercept in the model and not penalize it, but here we simplify by centering.) Choose \( n=100 \) data points and set up \( \boldsymbol{x} \), \( \boldsymbol{y} \) and the design matrix \( \boldsymbol{X} \).

    +
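    A minimal sketch of such a setup (the names x, y and X below are assumptions chosen to match the snippet that follows, not a prescribed solution):

    import numpy as np

    n = 100
    x = np.random.rand(n, 1)
    y = 2.0 + 3*x + 4*x**2 + 0.1*np.random.randn(n, 1)

    # polynomial features without an intercept column, since we will center the data
    X = np.c_[x, x**2]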
    @@ -1309,52 +2473,15 @@

    Understanding what happens

    -
    import matplotlib.pyplot as plt
    -import numpy as np
    -from sklearn.linear_model import LinearRegression, Ridge, Lasso
    -from sklearn.preprocessing import PolynomialFeatures
    -from sklearn.model_selection import train_test_split
    -from sklearn.pipeline import make_pipeline
    -from sklearn.utils import resample
    -
    -np.random.seed(2018)
    -
    -n = 40
    -n_boostraps = 100
    -maxdegree = 14
    -
    -
    -# Make data set.
    -x = np.linspace(-3, 3, n).reshape(-1, 1)
    -y = np.exp(-x**2) + 1.5 * np.exp(-(x-2)**2)+ np.random.normal(0, 0.1, x.shape)
    -error = np.zeros(maxdegree)
    -bias = np.zeros(maxdegree)
    -variance = np.zeros(maxdegree)
    -polydegree = np.zeros(maxdegree)
    -x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)
    -
    -for degree in range(maxdegree):
    -    model = make_pipeline(PolynomialFeatures(degree=degree), LinearRegression(fit_intercept=False))
    -    y_pred = np.empty((y_test.shape[0], n_boostraps))
    -    for i in range(n_boostraps):
    -        x_, y_ = resample(x_train, y_train)
    -        y_pred[:, i] = model.fit(x_, y_).predict(x_test).ravel()
    -
    -    polydegree[degree] = degree
    -    error[degree] = np.mean( np.mean((y_test - y_pred)**2, axis=1, keepdims=True) )
    -    bias[degree] = np.mean( (y_test - np.mean(y_pred, axis=1, keepdims=True))**2 )
    -    variance[degree] = np.mean( np.var(y_pred, axis=1, keepdims=True) )
    -    print('Polynomial degree:', degree)
    -    print('Error:', error[degree])
    -    print('Bias^2:', bias[degree])
    -    print('Var:', variance[degree])
    -    print('{} >= {} + {} = {}'.format(error[degree], bias[degree], variance[degree], bias[degree]+variance[degree]))
    -
    -plt.plot(polydegree, error, label='Error')
    -plt.plot(polydegree, bias, label='bias')
    -plt.plot(polydegree, variance, label='Variance')
    -plt.legend()
    -plt.show()
    +  
    # Standardize features (zero mean, unit variance for each feature)
    +X_mean = X.mean(axis=0)
    +X_std = X.std(axis=0)
    +X_std[X_std == 0] = 1  # safeguard to avoid division by zero for constant features
    +X_norm = (X - X_mean) / X_std
    +
    +# Center the target to zero mean (optional, to simplify intercept handling)
    +y_mean = np.mean(y)
    +y_centered = y - y_mean
     
    @@ -1370,59 +2497,73 @@

    Understanding what happens

    +

    Do we need to center the values of \( y \)?

    - -

    Summing up

    - -

    The bias-variance tradeoff summarizes the fundamental tension in machine learning, particularly supervised learning, between the complexity of a model and the amount of training data needed to train it. Since data is often limited, in practice it is often useful to use a less-complex model with higher bias, that is a model whose asymptotic performance is worse than another model because it is easier to train and less sensitive to sampling noise arising from having a finite-sized training dataset (smaller variance).

    After this preprocessing, each column of \( \boldsymbol{X}_{\mathrm{norm}} \) has mean zero and standard deviation \( 1 \), and \( \boldsymbol{y}_{\mathrm{centered}} \) has mean 0. This can make the optimization landscape nicer and ensures the regularization penalty \( \lambda \sum_j \theta_j^2 \) in Ridge regression treats each coefficient fairly (since features are on the same scale).
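    A quick sanity check of this claim (a sketch with stand-in data, mirroring the snippet above):

    import numpy as np

    X = np.random.rand(100, 3)
    X_norm = (X - X.mean(axis=0)) / X.std(axis=0)
    y = np.random.rand(100)
    y_centered = y - y.mean()

    print(X_norm.mean(axis=0), X_norm.std(axis=0))  # approximately 0 and 1 per column
    print(y_centered.mean())                        # approximately 0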

    -

    The above equations tell us that in order to minimize the expected test error, we need to select a statistical learning method that simultaneously achieves low variance and low bias. Note that variance is inherently a nonnegative quantity, and squared bias is also nonnegative. Hence, we see that the expected test MSE can never lie below \( Var(\epsilon) \), the irreducible error.









    +

    Functionality in Scikit-Learn

    + +

    Scikit-Learn has several functions which allow us to rescale the data, normally resulting in much better results in terms of various accuracy scores. The StandardScaler function in Scikit-Learn ensures that for each feature/predictor we study, the mean value is zero and the variance is one (for every column in the design/feature matrix). This scaling has the drawback that it does not ensure that we have a particular maximum or minimum in our data set. Another function included in Scikit-Learn is the MinMaxScaler, which ensures that all features are exactly between \( 0 \) and \( 1 \).
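    As a small usage sketch (with a made-up feature matrix, not data from these notes):

    import numpy as np
    from sklearn.preprocessing import StandardScaler, MinMaxScaler

    X = np.array([[1.0, 200.0], [2.0, 300.0], [3.0, 400.0]])

    scaler = StandardScaler().fit(X)          # learns mean and std of each column
    print(scaler.transform(X))                # columns now have mean 0 and variance 1

    minmax = MinMaxScaler().fit(X)            # learns min and max of each column
    print(minmax.transform(X))                # columns now lie between 0 and 1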

    -

    What do we mean by the variance and bias of a statistical learning method? The variance refers to the amount by which our model would change if we estimated it using a different training data set. Since the training data are used to fit the statistical learning method, different training data sets will result in a different estimate. But ideally the estimate for our model should not vary too much between training sets. However, if a method has high variance then small changes in the training data can result in large changes in the model. In general, more flexible statistical methods have higher variance.









    +

    More preprocessing

    + +
    + +

    +

    The Normalizer scales each data point such that the feature vector has a Euclidean length of one. In other words, it projects a data point on the circle (or sphere in the case of higher dimensions) with a radius of 1. This means every data point is scaled by a different number (by the inverse of its length). This normalization is often used when only the direction (or angle) of the data matters, not the length of the feature vector.

    + +

    The RobustScaler works similarly to the StandardScaler in that it ensures statistical properties for each feature that guarantee that they are on the same scale. However, the RobustScaler uses the median and quartiles, instead of mean and variance. This makes the RobustScaler ignore data points that are very different from the rest (like measurement errors). These odd data points are also called outliers, and might often lead to trouble for other scaling techniques.
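    A corresponding sketch for these two transformers (again with a made-up matrix, where the last row plays the role of an outlier):

    import numpy as np
    from sklearn.preprocessing import Normalizer, RobustScaler

    X = np.array([[3.0, 4.0], [1.0, 0.0], [100.0, 100.0]])

    print(Normalizer().fit_transform(X))      # each row rescaled to unit Euclidean length
    print(RobustScaler().fit_transform(X))    # per column: subtract median, divide by IQR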

    +
    -

    You may also find this recent article of interest.











    -

    Another Example from Scikit-Learn's Repository

    - -

    This example demonstrates the problems of underfitting and overfitting and how we can use linear regression with polynomial features to approximate nonlinear functions. The plot shows the function that we want to approximate, which is a part of the cosine function. In addition, the samples from the real function and the approximations of different models are displayed. The models have polynomial features of different degrees. We can see that a linear function (polynomial with degree 1) is not sufficient to fit the training samples. This is called underfitting. A polynomial of degree 4 approximates the true function almost perfectly. However, for higher degrees the model will overfit the training data, i.e. it learns the noise of the training data. We evaluate overfitting and underfitting quantitatively by using cross-validation. We calculate the mean squared error (MSE) on the validation set; the higher it is, the less likely the model generalizes correctly from the training data.

    Frequently used scaling functions

    + +

    Many features are often scaled using standardization to improve performance. In Scikit-Learn this is given by the StandardScaler function, as discussed above. It is easy, however, to write your own. Mathematically, this involves subtracting the mean and dividing by the standard deviation over the data set, for each feature:

    $$
    x_j^{(i)} \rightarrow \frac{x_j^{(i)} - \overline{x}_j}{\sigma(x_j)},
    $$

    where \( \overline{x}_j \) and \( \sigma(x_j) \) are the mean and standard deviation, respectively, of the feature \( x_j \). This ensures that each feature has zero mean and unit standard deviation. For data sets where we do not have the standard deviation or don't wish to calculate it, it is then common to simply set it to one.
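    A hand-written version of this transformation could look like the following sketch (the array names are assumptions; note that the mean and standard deviation are computed on the training data only and then reused on new data):

    import numpy as np

    X_train = np.random.rand(100, 3)          # some training features
    X_test = np.random.rand(20, 3)            # new data to be predicted on later

    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    std[std == 0] = 1.0                       # avoid division by zero for constant features

    X_train_scaled = (X_train - mean) / std
    X_test_scaled = (X_test - mean) / std     # reuse training mean and std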

    + +

    Keep in mind that when you transform your data set before training a model, the same transformation needs to be done on your eventual new data set before making a prediction. If we translate this into Python code, it could be implemented as

    @@ -1432,55 +2573,22 @@

    Another Example from Sci
    -
    #print(__doc__)
    -
    -import numpy as np
    -import matplotlib.pyplot as plt
    -from sklearn.pipeline import Pipeline
    -from sklearn.preprocessing import PolynomialFeatures
    -from sklearn.linear_model import LinearRegression
    -from sklearn.model_selection import cross_val_score
    -
    -
    -def true_fun(X):
    -    return np.cos(1.5 * np.pi * X)
    -
    -np.random.seed(0)
    -
    -n_samples = 30
    -degrees = [1, 4, 15]
    -
    -X = np.sort(np.random.rand(n_samples))
    -y = true_fun(X) + np.random.randn(n_samples) * 0.1
    -
    -plt.figure(figsize=(14, 5))
    -for i in range(len(degrees)):
    -    ax = plt.subplot(1, len(degrees), i + 1)
    -    plt.setp(ax, xticks=(), yticks=())
    -
    -    polynomial_features = PolynomialFeatures(degree=degrees[i],
    -                                             include_bias=False)
    -    linear_regression = LinearRegression()
    -    pipeline = Pipeline([("polynomial_features", polynomial_features),
    -                         ("linear_regression", linear_regression)])
    -    pipeline.fit(X[:, np.newaxis], y)
    -
    -    # Evaluate the models using crossvalidation
    -    scores = cross_val_score(pipeline, X[:, np.newaxis], y,
    -                             scoring="neg_mean_squared_error", cv=10)
    -
    -    X_test = np.linspace(0, 1, 100)
    -    plt.plot(X_test, pipeline.predict(X_test[:, np.newaxis]), label="Model")
    -    plt.plot(X_test, true_fun(X_test), label="True function")
    -    plt.scatter(X, y, edgecolor='b', s=20, label="Samples")
    -    plt.xlabel("x")
    -    plt.ylabel("y")
    -    plt.xlim((0, 1))
    -    plt.ylim((-2, 2))
    -    plt.legend(loc="best")
    -    plt.title("Degree {}\nMSE = {:.2e}(+/- {:.2e})".format(
    -        degrees[i], -scores.mean(), scores.std()))
    -plt.show()
    +  
    """
    +#Model training, we compute the mean value of y and X
    +y_train_mean = np.mean(y_train)
    +X_train_mean = np.mean(X_train,axis=0)
    +X_train = X_train - X_train_mean
    +y_train = y_train - y_train_mean
    +
    +# Then we fit our model with the training data
    +trained_model = some_model.fit(X_train,y_train)
    +
    +
    +#Model prediction, we need also to transform our data set used for the prediction.
    +X_test = X_test - X_train_mean #Use mean from training data
    +y_pred = trained_model.predict(X_test)
    +y_pred = y_pred + y_train_mean
    +"""
     
    @@ -1496,47 +2604,112 @@

    Another Example from Sci

    +

    Let us try to understand what this may imply mathematically when we subtract the mean values, also known as zero centering. For simplicity, we will focus on ordinary regression, as done in the above example.

    + +

    The cost/loss function for regression is

    $$
    C(\theta_0, \theta_1, \ldots, \theta_{p-1}) = \frac{1}{n}\sum_{i=0}^{n-1} \left(y_i - \theta_0 - \sum_{j=1}^{p-1} X_{ij}\theta_j\right)^2.
    $$

    Various steps in cross-validation

    - -

    When the repetitive splitting of the data set is done randomly, samples may accidentally end up in a vast majority of the splits in either the training or the test set. Such samples may have an unbalanced influence on either model building or prediction evaluation. To avoid this, \( k \)-fold cross-validation structures the data splitting. The samples are divided into \( k \) more or less equally sized, exhaustive and mutually exclusive subsets. In turn (at each split) one of these subsets plays the role of the test set while the union of the remaining subsets constitutes the training set. Such a splitting warrants a balanced representation of each sample in both training and test set over the splits. Still, the division into the \( k \) subsets involves a degree of randomness. This may be fully excluded when choosing \( k=n \). This particular case is referred to as leave-one-out cross-validation (LOOCV).

    Recall also that we use the squared value of the residuals. Squaring penalizes larger differences between the predicted and the output/target values more strongly.

    -









    -

    Cross-validation in brief

    +

    What we have done is to single out the \( \theta_0 \) term in the definition of the mean squared error (MSE). The design matrix \( X \) does in this case not contain any intercept column. When we take the derivative with respect to \( \theta_0 \), we want the derivative to obey

    -

    For the various values of \( k \)

    $$
    \frac{\partial C}{\partial \theta_j} = 0,
    $$
    1. Shuffle the dataset randomly.
    2. Split the dataset into \( k \) groups.
    3. For each unique group:
      a. Decide which group to use as set for test data
      b. Take the remaining groups as a training data set
      c. Fit a model on the training set and evaluate it on the test set
      d. Retain the evaluation score and discard the model
    4. Summarize the model using the sample of model evaluation scores









    -

    Code Example for Cross-validation and \( k \)-fold Cross-validation

    +

    for all \( j \). For \( \theta_0 \) we have

    $$
    \frac{\partial C}{\partial \theta_0} = -\frac{2}{n}\sum_{i=0}^{n-1} \left(y_i - \theta_0 - \sum_{j=1}^{p-1} X_{ij} \theta_j\right).
    $$

    Multiplying away the constant \( 2/n \), we obtain

    $$
    \sum_{i=0}^{n-1} \theta_0 = \sum_{i=0}^{n-1}y_i - \sum_{i=0}^{n-1} \sum_{j=1}^{p-1} X_{ij} \theta_j.
    $$

    Let us specialize first to the case where we have only two parameters \( \theta_0 \) and \( \theta_1 \). Our result for \( \theta_0 \) then simplifies to

    $$
    n\theta_0 = \sum_{i=0}^{n-1}y_i - \sum_{i=0}^{n-1} X_{i1} \theta_1.
    $$

    We obtain then

    $$
    \theta_0 = \frac{1}{n}\sum_{i=0}^{n-1}y_i - \theta_1\frac{1}{n}\sum_{i=0}^{n-1} X_{i1}.
    $$

    If we define

    $$
    \mu_{\boldsymbol{x}_1}=\frac{1}{n}\sum_{i=0}^{n-1} X_{i1},
    $$

    and the mean value of the outputs as

    $$
    \mu_y=\frac{1}{n}\sum_{i=0}^{n-1}y_i,
    $$

    we have

    $$
    \theta_0 = \mu_y - \theta_1\mu_{\boldsymbol{x}_1}.
    $$

    In the general case with more parameters than \( \theta_0 \) and \( \theta_1 \), we have

    $$
    \theta_0 = \frac{1}{n}\sum_{i=0}^{n-1}y_i - \frac{1}{n}\sum_{i=0}^{n-1}\sum_{j=1}^{p-1} X_{ij}\theta_j.
    $$

    We can rewrite the latter equation as

    $$
    \theta_0 = \frac{1}{n}\sum_{i=0}^{n-1}y_i - \sum_{j=1}^{p-1} \mu_{\boldsymbol{x}_j}\theta_j,
    $$

    where we have defined

    $$
    \mu_{\boldsymbol{x}_j}=\frac{1}{n}\sum_{i=0}^{n-1} X_{ij},
    $$

    the mean value for all elements of the column vector \( \boldsymbol{x}_j \).

    + +

    Replacing \( y_i \) with \( y_i - \overline{\boldsymbol{y}} \) and centering also our design matrix results in a cost function (in vector-matrix disguise)

    $$
    C(\boldsymbol{\theta}) = (\boldsymbol{\tilde{y}} - \tilde{X}\boldsymbol{\theta})^T(\boldsymbol{\tilde{y}} - \tilde{X}\boldsymbol{\theta}).
    $$

    If we minimize with respect to \( \boldsymbol{\theta} \) we have then

    $$
    \hat{\boldsymbol{\theta}} = (\tilde{X}^T\tilde{X})^{-1}\tilde{X}^T\boldsymbol{\tilde{y}},
    $$

    where \( \boldsymbol{\tilde{y}} = \boldsymbol{y} - \overline{\boldsymbol{y}} \) and \( \tilde{X}_{ij} = X_{ij} - \frac{1}{n}\sum_{k=0}^{n-1}X_{kj} \).

    + +

    For Ridge regression we need to add \( \lambda \boldsymbol{\theta}^T\boldsymbol{\theta} \) to the cost function and get then

    $$
    \hat{\boldsymbol{\theta}} = (\tilde{X}^T\tilde{X} + \lambda I)^{-1}\tilde{X}^T\boldsymbol{\tilde{y}}.
    $$

    What does this mean? And why do we insist on all this? Let us look at some examples.

    + +

    This code shows a simple polynomial fit to a data set using the above transformed data, where we consider the role of the intercept first, by either excluding it or including it (code example thanks to Øyvind Sigmundson Schøyen). Here our scaling of the data is done by subtracting the mean values only. Note also that we do not split the data into training and test.

    -

    The code here uses Ridge regression with cross-validation (CV) resampling and \( k \)-fold CV in order to fit a specific polynomial.

    @@ -1546,90 +2719,87 @@

    Code Exam
    import numpy as np
     import matplotlib.pyplot as plt
    -from sklearn.model_selection import KFold
    -from sklearn.linear_model import Ridge
    -from sklearn.model_selection import cross_val_score
    -from sklearn.preprocessing import PolynomialFeatures
    -
    -# A seed just to ensure that the random numbers are the same for every run.
    -# Useful for eventual debugging.
    -np.random.seed(3155)
    -
    -# Generate the data.
    -nsamples = 100
    -x = np.random.randn(nsamples)
    -y = 3*x**2 + np.random.randn(nsamples)
     
    -## Cross-validation on Ridge regression using KFold only
    +from sklearn.linear_model import LinearRegression
     
    -# Decide degree on polynomial to fit
    -poly = PolynomialFeatures(degree = 6)
     
    -# Decide which values of lambda to use
    -nlambdas = 500
    -lambdas = np.logspace(-3, 5, nlambdas)
    +np.random.seed(2021)
     
    -# Initialize a KFold instance
    -k = 5
    -kfold = KFold(n_splits = k)
    +def MSE(y_data,y_model):
    +    n = np.size(y_model)
    +    return np.sum((y_data-y_model)**2)/n
     
    -# Perform the cross-validation to estimate MSE
    -scores_KFold = np.zeros((nlambdas, k))
     
    -i = 0
    -for lmb in lambdas:
    -    ridge = Ridge(alpha = lmb)
    -    j = 0
    -    for train_inds, test_inds in kfold.split(x):
    -        xtrain = x[train_inds]
    -        ytrain = y[train_inds]
    +def fit_theta(X, y):
    +    return np.linalg.pinv(X.T @ X) @ X.T @ y
     
    -        xtest = x[test_inds]
    -        ytest = y[test_inds]
     
    -        Xtrain = poly.fit_transform(xtrain[:, np.newaxis])
    -        ridge.fit(Xtrain, ytrain[:, np.newaxis])
    +true_theta = [2, 0.5, 3.7]
     
    -        Xtest = poly.fit_transform(xtest[:, np.newaxis])
    -        ypred = ridge.predict(Xtest)
    +x = np.linspace(0, 1, 11)
    +y = np.sum(
    +    np.asarray([x ** p * b for p, b in enumerate(true_theta)]), axis=0
    +) + 0.1 * np.random.normal(size=len(x))
     
    -        scores_KFold[i,j] = np.sum((ypred - ytest[:, np.newaxis])**2)/np.size(ypred)
    +degree = 3
    +X = np.zeros((len(x), degree))
     
    -        j += 1
    -    i += 1
    +# Include the intercept in the design matrix
    +for p in range(degree):
    +    X[:, p] = x ** p
     
    +theta = fit_theta(X, y)
     
    -estimated_mse_KFold = np.mean(scores_KFold, axis = 1)
    +# Intercept is included in the design matrix
    +skl = LinearRegression(fit_intercept=False).fit(X, y)
     
    -## Cross-validation using cross_val_score from sklearn along with KFold
    +print(f"True theta: {true_theta}")
    +print(f"Fitted theta: {theta}")
    +print(f"Sklearn fitted theta: {skl.coef_}")
    +ypredictOwn = X @ theta
    +ypredictSKL = skl.predict(X)
    +print(f"MSE with intercept column")
    +print(MSE(y,ypredictOwn))
    +print(f"MSE with intercept column from SKL")
    +print(MSE(y,ypredictSKL))
     
    -# kfold is an instance initialized above as:
    -# kfold = KFold(n_splits = k)
     
    -estimated_mse_sklearn = np.zeros(nlambdas)
    -i = 0
    -for lmb in lambdas:
    -    ridge = Ridge(alpha = lmb)
    +plt.figure()
    +plt.scatter(x, y, label="Data")
    +plt.plot(x, X @ theta, label="Fit")
    +plt.plot(x, skl.predict(X), label="Sklearn (fit_intercept=False)")
     
    -    X = poly.fit_transform(x[:, np.newaxis])
    -    estimated_mse_folds = cross_val_score(ridge, X, y[:, np.newaxis], scoring='neg_mean_squared_error', cv=kfold)
     
    -    # cross_val_score return an array containing the estimated negative mse for every fold.
    -    # we have to the the mean of every array in order to get an estimate of the mse of the model
    -    estimated_mse_sklearn[i] = np.mean(-estimated_mse_folds)
    +# Do not include the intercept in the design matrix
    +X = np.zeros((len(x), degree - 1))
     
    -    i += 1
    +for p in range(degree - 1):
    +    X[:, p] = x ** (p + 1)
     
    -## Plot and compare the slightly different ways to perform cross-validation
    +# Intercept is not included in the design matrix
    +skl = LinearRegression(fit_intercept=True).fit(X, y)
     
    -plt.figure()
    +# Use centered values for X and y when computing coefficients
    +y_offset = np.average(y, axis=0)
    +X_offset = np.average(X, axis=0)
     
    -plt.plot(np.log10(lambdas), estimated_mse_sklearn, label = 'cross_val_score')
    -plt.plot(np.log10(lambdas), estimated_mse_KFold, 'r--', label = 'KFold')
    +theta = fit_theta(X - X_offset, y - y_offset)
    +intercept = np.mean(y_offset - X_offset @ theta)
     
    -plt.xlabel('log10(lambda)')
    -plt.ylabel('mse')
    +print(f"Manual intercept: {intercept}")
    +print(f"Fitted theta (without intercept): {theta}")
    +print(f"Sklearn intercept: {skl.intercept_}")
    +print(f"Sklearn fitted theta (without intercept): {skl.coef_}")
    +ypredictOwn = X @ theta
    +ypredictSKL = skl.predict(X)
    +print(f"MSE with Manual intercept")
    +print(MSE(y,ypredictOwn+intercept))
    +print(f"MSE with Sklearn intercept")
    +print(MSE(y,ypredictSKL))
     
    +plt.plot(x, X @ theta + intercept, "--", label="Fit (manual intercept)")
    +plt.plot(x, skl.predict(X), "--", label="Sklearn (fit_intercept=True)")
    +plt.grid()
     plt.legend()
     
     plt.show()
    @@ -1648,9 +2818,43 @@ 

    Code Exam

    +

    The intercept is the value of our output/target variable when all our features are zero, that is, where our function crosses the \( y \)-axis (for a one-dimensional case).

    -









    -

    More examples on bootstrap and cross-validation and errors

    +

    Printing the MSE, we see first that both methods give the same MSE, as they should. However, when we move to for example Ridge regression, the way we treat the intercept may give a larger or smaller MSE, meaning that the MSE can be penalized by the value of the intercept. Not including the intercept in the fit means that the regularization term does not include \( \theta_0 \). For different values of \( \lambda \), this may lead to different MSE values.

    + +

    To remind the reader, the regularization term, with the intercept in Ridge regression, is given by

    $$
    \lambda \vert\vert \boldsymbol{\theta} \vert\vert_2^2 = \lambda \sum_{j=0}^{p-1}\theta_j^2,
    $$

    but when we take out the intercept, this equation becomes

    $$
    \lambda \vert\vert \boldsymbol{\theta} \vert\vert_2^2 = \lambda \sum_{j=1}^{p-1}\theta_j^2.
    $$

    For Lasso regression we have

    $$
    \lambda \vert\vert \boldsymbol{\theta} \vert\vert_1 = \lambda \sum_{j=1}^{p-1}\vert\theta_j\vert.
    $$

    It means that, when scaling the design matrix and the outputs/targets by subtracting the mean values, we have an optimization problem which is not penalized by the intercept. The MSE value can then be smaller since it focuses only on the remaining quantities. If we however bring back the intercept, we will get an MSE which then contains the intercept.

    + +

    Armed with this wisdom, we attempt first to simply set the intercept flag (fit_intercept) equal to False in our implementation of Ridge regression for our well-known vanilla data set.

    @@ -1659,82 +2863,69 @@

    More example
    -
    # Common imports
    -import os
    -import numpy as np
    +  
    import numpy as np
     import pandas as pd
     import matplotlib.pyplot as plt
    -from sklearn.linear_model import LinearRegression, Ridge, Lasso
     from sklearn.model_selection import train_test_split
    -from sklearn.utils import resample
    -from sklearn.metrics import mean_squared_error
    -# Where to save the figures and data files
    -PROJECT_ROOT_DIR = "Results"
    -FIGURE_ID = "Results/FigureFiles"
    -DATA_ID = "DataFiles/"
    -
    -if not os.path.exists(PROJECT_ROOT_DIR):
    -    os.mkdir(PROJECT_ROOT_DIR)
    -
    -if not os.path.exists(FIGURE_ID):
    -    os.makedirs(FIGURE_ID)
    -
    -if not os.path.exists(DATA_ID):
    -    os.makedirs(DATA_ID)
    -
    -def image_path(fig_id):
    -    return os.path.join(FIGURE_ID, fig_id)
    -
    -def data_path(dat_id):
    -    return os.path.join(DATA_ID, dat_id)
    -
    -def save_fig(fig_id):
    -    plt.savefig(image_path(fig_id) + ".png", format='png')
    -
    -infile = open(data_path("EoS.csv"),'r')
    -
    -# Read the EoS data as  csv file and organize the data into two arrays with density and energies
    -EoS = pd.read_csv(infile, names=('Density', 'Energy'))
    -EoS['Energy'] = pd.to_numeric(EoS['Energy'], errors='coerce')
    -EoS = EoS.dropna()
    -Energies = EoS['Energy']
    -Density = EoS['Density']
    -#  The design matrix now as function of various polytrops
    -
    -Maxpolydegree = 30
    -X = np.zeros((len(Density),Maxpolydegree))
    -X[:,0] = 1.0
    -testerror = np.zeros(Maxpolydegree)
    -trainingerror = np.zeros(Maxpolydegree)
    -polynomial = np.zeros(Maxpolydegree)
    -
    -trials = 100
    -for polydegree in range(1, Maxpolydegree):
    -    polynomial[polydegree] = polydegree
    -    for degree in range(polydegree):
    -        X[:,degree] = Density**(degree/3.0)
    -
    -# loop over trials in order to estimate the expectation value of the MSE
    -    testerror[polydegree] = 0.0
    -    trainingerror[polydegree] = 0.0
    -    for samples in range(trials):
    -        x_train, x_test, y_train, y_test = train_test_split(X, Energies, test_size=0.2)
    -        model = LinearRegression(fit_intercept=False).fit(x_train, y_train)
    -        ypred = model.predict(x_train)
    -        ytilde = model.predict(x_test)
    -        testerror[polydegree] += mean_squared_error(y_test, ytilde)
    -        trainingerror[polydegree] += mean_squared_error(y_train, ypred) 
    -
    -    testerror[polydegree] /= trials
    -    trainingerror[polydegree] /= trials
    -    print("Degree of polynomial: %3d"% polynomial[polydegree])
    -    print("Mean squared error on training data: %.8f" % trainingerror[polydegree])
    -    print("Mean squared error on test data: %.8f" % testerror[polydegree])
    -
    -plt.plot(polynomial, np.log10(trainingerror), label='Training Error')
    -plt.plot(polynomial, np.log10(testerror), label='Test Error')
    -plt.xlabel('Polynomial degree')
    -plt.ylabel('log10[MSE]')
    +from sklearn import linear_model
    +
    +def MSE(y_data,y_model):
    +    n = np.size(y_model)
    +    return np.sum((y_data-y_model)**2)/n
    +
    +
    +# A seed just to ensure that the random numbers are the same for every run.
    +# Useful for eventual debugging.
    +np.random.seed(3155)
    +
    +n = 100
    +x = np.random.rand(n)
    +y = np.exp(-x**2) + 1.5 * np.exp(-(x-2)**2)
    +
    +Maxpolydegree = 20
    +X = np.zeros((n,Maxpolydegree))
    +#We include explicitely the intercept column
    +for degree in range(Maxpolydegree):
    +    X[:,degree] = x**degree
    +# We split the data in test and training data
    +X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
    +
    +p = Maxpolydegree
    +I = np.eye(p,p)
    +# Decide which values of lambda to use
    +nlambdas = 6
    +MSEOwnRidgePredict = np.zeros(nlambdas)
    +MSERidgePredict = np.zeros(nlambdas)
    +lambdas = np.logspace(-4, 2, nlambdas)
    +for i in range(nlambdas):
    +    lmb = lambdas[i]
    +    OwnRidgeTheta = np.linalg.pinv(X_train.T @ X_train+lmb*I) @ X_train.T @ y_train
    +    # Note: we include the intercept column and no scaling
    +    RegRidge = linear_model.Ridge(lmb,fit_intercept=False)
    +    RegRidge.fit(X_train,y_train)
    +    # and then make the prediction
    +    ytildeOwnRidge = X_train @ OwnRidgeTheta
    +    ypredictOwnRidge = X_test @ OwnRidgeTheta
    +    ytildeRidge = RegRidge.predict(X_train)
    +    ypredictRidge = RegRidge.predict(X_test)
    +    MSEOwnRidgePredict[i] = MSE(y_test,ypredictOwnRidge)
    +    MSERidgePredict[i] = MSE(y_test,ypredictRidge)
    +    print("Theta values for own Ridge implementation")
    +    print(OwnRidgeTheta)
    +    print("Theta values for Scikit-Learn Ridge implementation")
    +    print(RegRidge.coef_)
    +    print("MSE values for own Ridge implementation")
    +    print(MSEOwnRidgePredict[i])
    +    print("MSE values for Scikit-Learn Ridge implementation")
    +    print(MSERidgePredict[i])
    +
    +# Now plot the results
    +plt.figure()
    +plt.plot(np.log10(lambdas), MSEOwnRidgePredict, 'r', label = 'MSE own Ridge Test')
    +plt.plot(np.log10(lambdas), MSERidgePredict, 'g', label = 'MSE Ridge Test')
    +
    +plt.xlabel('log10(lambda)')
    +plt.ylabel('MSE')
     plt.legend()
     plt.show()
     
    @@ -1752,12 +2943,12 @@

    More example

    -

    Note that we kept the intercept column in the fitting here. This means that we need to set the intercept in the call to the Scikit-Learn function as False. Alternatively, we could have set up the design matrix \( X \) without the first column of ones.

    - - -

    The same example but now with cross-validation

    +

    The results here agree very well when we force Scikit-Learn's Ridge function to include the first column in our design matrix (by setting fit_intercept=False). Here we have thus explicitly included the intercept column in the design matrix. What happens if we do not include the intercept in our fit? Let us see how we can change this code by zero centering.

    -

    In this example we keep the intercept column again but add cross-validation in order to estimate the best possible value of the mean squared error.

    @@ -1765,71 +2956,82 @@

    The same example but now
    -
    # Common imports
    -import os
    -import numpy as np
    +  
    import numpy as np
     import pandas as pd
     import matplotlib.pyplot as plt
    -from sklearn.linear_model import LinearRegression, Ridge, Lasso
    -from sklearn.metrics import mean_squared_error
    -from sklearn.model_selection import KFold
    -from sklearn.model_selection import cross_val_score
    -
    -
    -# Where to save the figures and data files
    -PROJECT_ROOT_DIR = "Results"
    -FIGURE_ID = "Results/FigureFiles"
    -DATA_ID = "DataFiles/"
    -
    -if not os.path.exists(PROJECT_ROOT_DIR):
    -    os.mkdir(PROJECT_ROOT_DIR)
    -
    -if not os.path.exists(FIGURE_ID):
    -    os.makedirs(FIGURE_ID)
    -
    -if not os.path.exists(DATA_ID):
    -    os.makedirs(DATA_ID)
    -
    -def image_path(fig_id):
    -    return os.path.join(FIGURE_ID, fig_id)
    -
    -def data_path(dat_id):
    -    return os.path.join(DATA_ID, dat_id)
    -
    -def save_fig(fig_id):
    -    plt.savefig(image_path(fig_id) + ".png", format='png')
    -
    -infile = open(data_path("EoS.csv"),'r')
    -
    -# Read the EoS data as  csv file and organize the data into two arrays with density and energies
    -EoS = pd.read_csv(infile, names=('Density', 'Energy'))
    -EoS['Energy'] = pd.to_numeric(EoS['Energy'], errors='coerce')
    -EoS = EoS.dropna()
    -Energies = EoS['Energy']
    -Density = EoS['Density']
    -#  The design matrix now as function of various polytrops
    -
    -Maxpolydegree = 30
    -X = np.zeros((len(Density),Maxpolydegree))
    -X[:,0] = 1.0
    -estimated_mse_sklearn = np.zeros(Maxpolydegree)
    -polynomial = np.zeros(Maxpolydegree)
    -k =5
    -kfold = KFold(n_splits = k)
    -
    -for polydegree in range(1, Maxpolydegree):
    -    polynomial[polydegree] = polydegree
    -    for degree in range(polydegree):
    -        X[:,degree] = Density**(degree/3.0)
    -        OLS = LinearRegression(fit_intercept=False)
    -# loop over trials in order to estimate the expectation value of the MSE
    -    estimated_mse_folds = cross_val_score(OLS, X, Energies, scoring='neg_mean_squared_error', cv=kfold)
    -#[:, np.newaxis]
    -    estimated_mse_sklearn[polydegree] = np.mean(-estimated_mse_folds)
    -
    -plt.plot(polynomial, np.log10(estimated_mse_sklearn), label='Test Error')
    -plt.xlabel('Polynomial degree')
    -plt.ylabel('log10[MSE]')
    +from sklearn.model_selection import train_test_split
    +from sklearn import linear_model
    +from sklearn.preprocessing import StandardScaler
    +
    +def MSE(y_data,y_model):
    +    n = np.size(y_model)
    +    return np.sum((y_data-y_model)**2)/n
    +# A seed just to ensure that the random numbers are the same for every run.
    +# Useful for eventual debugging.
    +np.random.seed(315)
    +
    +n = 100
    +x = np.random.rand(n)
    +y = np.exp(-x**2) + 1.5 * np.exp(-(x-2)**2)
    +
    +Maxpolydegree = 20
    +X = np.zeros((n,Maxpolydegree-1))
    +
    +for degree in range(1,Maxpolydegree): #No intercept column
    +    X[:,degree-1] = x**(degree)
    +
    +# We split the data in test and training data
    +X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
    +
    +#For our own implementation, we will need to deal with the intercept by centering the design matrix and the target variable
    +X_train_mean = np.mean(X_train,axis=0)
    +#Center by removing mean from each feature
    +X_train_scaled = X_train - X_train_mean 
    +X_test_scaled = X_test - X_train_mean
    +#The model intercept (called y_scaler) is given by the mean of the target variable (IF X is centered)
    +#Remove the intercept from the training data.
    +y_scaler = np.mean(y_train)           
    +y_train_scaled = y_train - y_scaler   
    +
    +p = Maxpolydegree-1
    +I = np.eye(p,p)
    +# Decide which values of lambda to use
    +nlambdas = 6
    +MSEOwnRidgePredict = np.zeros(nlambdas)
    +MSERidgePredict = np.zeros(nlambdas)
    +
    +lambdas = np.logspace(-4, 2, nlambdas)
    +for i in range(nlambdas):
    +    lmb = lambdas[i]
    +    OwnRidgeTheta = np.linalg.pinv(X_train_scaled.T @ X_train_scaled+lmb*I) @ X_train_scaled.T @ (y_train_scaled)
    +    intercept_ = y_scaler - X_train_mean@OwnRidgeTheta #The intercept can be shifted so the model can predict on uncentered data
    +    #Add intercept to prediction
    +    ypredictOwnRidge = X_test_scaled @ OwnRidgeTheta + y_scaler 
    +    RegRidge = linear_model.Ridge(lmb)
    +    RegRidge.fit(X_train,y_train)
    +    ypredictRidge = RegRidge.predict(X_test)
    +    MSEOwnRidgePredict[i] = MSE(y_test,ypredictOwnRidge)
    +    MSERidgePredict[i] = MSE(y_test,ypredictRidge)
    +    print("Theta values for own Ridge implementation")
    +    print(OwnRidgeTheta) #Intercept is given by mean of target variable
    +    print("Theta values for Scikit-Learn Ridge implementation")
    +    print(RegRidge.coef_)
    +    print('Intercept from own implementation:')
    +    print(intercept_)
    +    print('Intercept from Scikit-Learn Ridge implementation')
    +    print(RegRidge.intercept_)
    +    print("MSE values for own Ridge implementation")
    +    print(MSEOwnRidgePredict[i])
    +    print("MSE values for Scikit-Learn Ridge implementation")
    +    print(MSERidgePredict[i])
    +
    +
    +# Now plot the results
    +plt.figure()
    +plt.plot(np.log10(lambdas), MSEOwnRidgePredict, 'b--', label = 'MSE own Ridge Test')
    +plt.plot(np.log10(lambdas), MSERidgePredict, 'g--', label = 'MSE SL Ridge Test')
    +plt.xlabel('log10(lambda)')
    +plt.ylabel('MSE')
     plt.legend()
     plt.show()
     
    @@ -1847,191 +3049,17 @@

    The same example but now

    - -









    -

    Material for the lab sessions

    - - -

    Linking the regression analysis with a statistical interpretation

    - -

    We will now couple the discussions of ordinary least squares, Ridge and Lasso regression with a statistical interpretation, that is we move from a linear algebra analysis to a statistical analysis. In particular, we will focus on what the regularization terms can result in. We will amongst other things show that the regularization parameter can reduce considerably the variance of the parameters \( \beta \).

    - -

    The -advantage of doing linear regression is that we actually end up with -analytical expressions for several statistical quantities. -Standard least squares and Ridge regression allow us to -derive quantities like the variance and other expectation values in a -rather straightforward way. -

    - -

    It is assumed that \( \varepsilon_i -\sim \mathcal{N}(0, \sigma^2) \) and the \( \varepsilon_{i} \) are -independent, i.e.: -

    -$$ -\begin{align*} -\mbox{Cov}(\varepsilon_{i_1}, -\varepsilon_{i_2}) & = \left\{ \begin{array}{lcc} \sigma^2 & \mbox{if} -& i_1 = i_2, \\ 0 & \mbox{if} & i_1 \not= i_2. \end{array} \right. -\end{align*} -$$ - -

    The randomness of \( \varepsilon_i \) implies that -\( \mathbf{y}_i \) is also a random variable. In particular, -\( \mathbf{y}_i \) is normally distributed, because \( \varepsilon_i \sim -\mathcal{N}(0, \sigma^2) \) and \( \mathbf{X}_{i,\ast} \, \boldsymbol{\beta} \) is a -non-random scalar. To specify the parameters of the distribution of -\( \mathbf{y}_i \) we need to calculate its first two moments. -

    - -

    Recall that \( \boldsymbol{X} \) is a matrix of dimensionality \( n\times p \). The -notation above \( \mathbf{X}_{i,\ast} \) means that we are looking at the -row number \( i \) and perform a sum over all values \( p \). -

    - -









    -

    Assumptions made

The assumption we have made here can be summarized as follows (and this is going to be useful when we discuss the bias-variance trade-off): there exists a function \( f(\boldsymbol{x}) \) and a normally distributed error \( \boldsymbol{\varepsilon}\sim \mathcal{N}(0, \sigma^2) \) which describe our data

$$
\boldsymbol{y} = f(\boldsymbol{x})+\boldsymbol{\varepsilon}.
$$

We approximate this function with our model from the solution of the linear regression equations, that is, our function \( f \) is approximated by \( \boldsymbol{\tilde{y}} \), where we want to minimize \( (\boldsymbol{y}-\boldsymbol{\tilde{y}})^2 \), our MSE, with

$$
\boldsymbol{\tilde{y}} = \boldsymbol{X}\boldsymbol{\beta}.
$$

    Expectation value and variance


We can calculate the expectation value of \( \boldsymbol{y} \) for a given element \( i \),

$$
\begin{align*}
\mathbb{E}(y_i) & = \mathbb{E}(\mathbf{X}_{i, \ast} \, \boldsymbol{\beta}) + \mathbb{E}(\varepsilon_i) = \mathbf{X}_{i, \ast} \, \boldsymbol{\beta},
\end{align*}
$$

while its variance is

$$
\begin{align*}
\mbox{Var}(y_i) & = \mathbb{E} \{ [y_i - \mathbb{E}(y_i)]^2 \} = \mathbb{E}( y_i^2 ) - [\mathbb{E}(y_i)]^2 \\
& = \mathbb{E} [ ( \mathbf{X}_{i, \ast} \, \boldsymbol{\beta} + \varepsilon_i )^2] - ( \mathbf{X}_{i, \ast} \, \boldsymbol{\beta})^2 \\
& = \mathbb{E} [ ( \mathbf{X}_{i, \ast} \, \boldsymbol{\beta})^2 + 2 \varepsilon_i \mathbf{X}_{i, \ast} \, \boldsymbol{\beta} + \varepsilon_i^2 ] - ( \mathbf{X}_{i, \ast} \, \boldsymbol{\beta})^2 \\
& = ( \mathbf{X}_{i, \ast} \, \boldsymbol{\beta})^2 + 2 \mathbb{E}(\varepsilon_i) \, \mathbf{X}_{i, \ast} \, \boldsymbol{\beta} + \mathbb{E}(\varepsilon_i^2 ) - ( \mathbf{X}_{i, \ast} \, \boldsymbol{\beta})^2 \\
& = \mathbb{E}(\varepsilon_i^2 ) = \mbox{Var}(\varepsilon_i) = \sigma^2.
\end{align*}
$$

Hence, \( y_i \sim \mathcal{N}( \mathbf{X}_{i, \ast} \, \boldsymbol{\beta}, \sigma^2) \), that is, \( \boldsymbol{y} \) follows a normal distribution with mean value \( \boldsymbol{X}\boldsymbol{\beta} \) and variance \( \sigma^2 \) (not to be confused with the singular values of the SVD).
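As a quick numerical illustration of these two moments, the short Python sketch below (not part of the original lecture code; the design row, the parameters and the noise level are assumptions made purely for illustration) draws many noise realizations for a fixed row \( \mathbf{X}_{i,\ast} \) and checks that the sample mean and the sample variance of \( y_i \) approach \( \mathbf{X}_{i,\ast} \, \boldsymbol{\beta} \) and \( \sigma^2 \).

import numpy as np

np.random.seed(315)

# Illustrative (assumed) values: a fixed, non-random design row and true parameters
X_i = np.array([1.0, 0.5, -0.3])
beta = np.array([2.0, -1.0, 0.5])
sigma = 0.1

# Draw many realizations of y_i = X_{i,*} beta + eps_i with eps_i ~ N(0, sigma^2)
n_samples = 100000
y_i = X_i @ beta + sigma*np.random.randn(n_samples)

print("Theoretical mean :", X_i @ beta)
print("Sample mean      :", np.mean(y_i))
print("Theoretical var  :", sigma**2)
print("Sample var       :", np.var(y_i))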

We see here, when compared with the code which includes the intercept column explicitly, that our MSE value is actually smaller. This is because the regularization term does not include the intercept value \( \theta_0 \) in the fitting. This applies to Lasso regularization as well. It means that our optimization is now done only with the centered matrix and/or vector that enter the fitting procedure.
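The following minimal sketch illustrates this point. It is not the lecture code itself: it reuses the same synthetic data as above, but the polynomial degree, the value of \( \lambda \) and the helper variables are assumptions made only for this example, and the exact MSE values will depend on the train-test split. We fit Ridge twice on the same data, once with an explicit intercept column that is hit by the penalty, and once on centered data with an unpenalized intercept.

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

np.random.seed(315)
n = 100
x = np.random.rand(n)
y = np.exp(-x**2) + 1.5*np.exp(-(x-2)**2)

degree = 5      # illustrative choice
lmb = 0.1       # illustrative choice

# Design matrix WITH an explicit intercept column (column of ones)
X = np.column_stack([x**d for d in range(degree+1)])
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

# (a) Naive Ridge: the penalty lambda*I also acts on the intercept theta_0
p_a = X_train.shape[1]
theta_a = np.linalg.pinv(X_train.T @ X_train + lmb*np.eye(p_a)) @ X_train.T @ y_train
mse_a = mean_squared_error(y_test, X_test @ theta_a)

# (b) Centered Ridge: drop the column of ones, center features and target,
#     and leave the intercept out of the penalty
Xc_train, Xc_test = X_train[:, 1:], X_test[:, 1:]
x_mean, y_mean = np.mean(Xc_train, axis=0), np.mean(y_train)
p_b = Xc_train.shape[1]
A = Xc_train - x_mean
theta_b = np.linalg.pinv(A.T @ A + lmb*np.eye(p_b)) @ A.T @ (y_train - y_mean)
mse_b = mean_squared_error(y_test, (Xc_test - x_mean) @ theta_b + y_mean)

print("Test MSE, penalized intercept  :", mse_a)
print("Test MSE, unpenalized intercept:", mse_b)

For most splits and moderate values of \( \lambda \) the second variant gives the smaller test MSE, in line with the argument above, but this should be read as an illustration rather than a proof.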


    Expectation value and variance for \( \boldsymbol{\beta} \)

With the OLS expressions for the optimal parameters \( \boldsymbol{\hat{\beta}} \) we can evaluate the expectation value

$$
\mathbb{E}(\boldsymbol{\hat{\beta}}) = \mathbb{E}[ (\mathbf{X}^{T} \mathbf{X})^{-1}\mathbf{X}^{T} \mathbf{Y}]=(\mathbf{X}^{T} \mathbf{X})^{-1}\mathbf{X}^{T} \mathbb{E}[ \mathbf{Y}]=(\mathbf{X}^{T} \mathbf{X})^{-1} \mathbf{X}^{T}\mathbf{X}\boldsymbol{\beta}=\boldsymbol{\beta}.
$$

This means that the OLS estimator of the regression parameters is unbiased.

We can also calculate the variance.

The variance of the optimal value \( \boldsymbol{\hat{\beta}} \) is

$$
\begin{eqnarray*}
\mbox{Var}(\boldsymbol{\hat{\beta}}) & = & \mathbb{E} \{ [\boldsymbol{\hat{\beta}} - \mathbb{E}(\boldsymbol{\hat{\beta}})] [\boldsymbol{\hat{\beta}} - \mathbb{E}(\boldsymbol{\hat{\beta}})]^{T} \} \\
& = & \mathbb{E} \{ [(\mathbf{X}^{T} \mathbf{X})^{-1} \, \mathbf{X}^{T} \mathbf{y} - \boldsymbol{\beta}] \, [(\mathbf{X}^{T} \mathbf{X})^{-1} \, \mathbf{X}^{T} \mathbf{y} - \boldsymbol{\beta}]^{T} \} \\
& = & (\mathbf{X}^{T} \mathbf{X})^{-1} \, \mathbf{X}^{T} \, \mathbb{E} \{ \mathbf{y} \, \mathbf{y}^{T} \} \, \mathbf{X} \, (\mathbf{X}^{T} \mathbf{X})^{-1} - \boldsymbol{\beta} \, \boldsymbol{\beta}^{T} \\
& = & (\mathbf{X}^{T} \mathbf{X})^{-1} \, \mathbf{X}^{T} \, \{ \mathbf{X} \, \boldsymbol{\beta} \, \boldsymbol{\beta}^{T} \, \mathbf{X}^{T} + \sigma^2 \, \mathbf{I}_{nn} \} \, \mathbf{X} \, (\mathbf{X}^{T} \mathbf{X})^{-1} - \boldsymbol{\beta} \, \boldsymbol{\beta}^{T} \\
& = & \boldsymbol{\beta} \, \boldsymbol{\beta}^{T} + \sigma^2 \, (\mathbf{X}^{T} \mathbf{X})^{-1} - \boldsymbol{\beta} \, \boldsymbol{\beta}^{T} \, = \, \sigma^2 \, (\mathbf{X}^{T} \mathbf{X})^{-1},
\end{eqnarray*}
$$

where we have used that \( \mathbb{E} (\mathbf{y} \mathbf{y}^{T}) = \mathbf{X} \, \boldsymbol{\beta} \, \boldsymbol{\beta}^{T} \, \mathbf{X}^{T} + \sigma^2 \, \mathbf{I}_{nn} \). From \( \mbox{Var}(\boldsymbol{\hat{\beta}}) = \sigma^2 \, (\mathbf{X}^{T} \mathbf{X})^{-1} \), one obtains an estimate of the variance of the estimate of the \( j \)-th regression coefficient, \( \sigma^2 (\hat{\beta}_j ) = \sigma^2 [(\mathbf{X}^{T} \mathbf{X})^{-1}]_{jj} \). This may be used to construct a confidence interval for the estimates.
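As a sanity check of these two results, one can repeat the regression experiment with fresh noise many times and compare the empirical mean and covariance of the OLS estimates with \( \boldsymbol{\beta} \) and \( \sigma^2(\mathbf{X}^{T}\mathbf{X})^{-1} \). The sketch below is an illustration only; the dimensions, the parameter values and the noise level are assumptions for this example.

import numpy as np

np.random.seed(315)

n, p = 200, 3
X = np.random.rand(n, p)
beta_true = np.array([1.0, -2.0, 0.5])
sigma = 0.1

XtX_pinv = np.linalg.pinv(X.T @ X)

# Repeat the experiment with fresh noise and collect the OLS estimates
n_experiments = 5000
betas = np.zeros((n_experiments, p))
for k in range(n_experiments):
    y = X @ beta_true + sigma*np.random.randn(n)
    betas[k] = XtX_pinv @ X.T @ y

print("Mean of the estimates:", np.mean(betas, axis=0))   # close to beta_true (unbiased)
print("Empirical covariance:\n", np.cov(betas.T))
print("sigma^2 (X^T X)^{-1}:\n", sigma**2*np.linalg.inv(X.T @ X))

The empirical covariance and \( \sigma^2(\mathbf{X}^{T}\mathbf{X})^{-1} \) agree increasingly well as the number of repetitions grows.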

In a similar way, we can obtain analytical expressions for, say, the expectation values of the parameters \( \boldsymbol{\beta} \) and their variance when we employ Ridge regression, allowing us again to define a confidence interval.

    It is rather straightforward to show that

$$
\mathbb{E} \big[ \hat{\boldsymbol{\beta}}^{\mathrm{Ridge}} \big]=(\mathbf{X}^{T} \mathbf{X} + \lambda \mathbf{I}_{pp})^{-1} (\mathbf{X}^{T} \mathbf{X})\boldsymbol{\beta}.
$$

We see clearly that \( \mathbb{E} \big[ \hat{\boldsymbol{\beta}}^{\mathrm{Ridge}} \big] \not= \mathbb{E} \big[ \hat{\boldsymbol{\beta}}^{\mathrm{OLS}} \big] = \boldsymbol{\beta} \) for any \( \lambda > 0 \); the Ridge estimator is thus biased.

We can also compute the variance as

$$
\mbox{Var}[\hat{\boldsymbol{\beta}}^{\mathrm{Ridge}}]=\sigma^2 [ \mathbf{X}^{T} \mathbf{X} + \lambda \mathbf{I} ]^{-1} \mathbf{X}^{T} \mathbf{X} \{ [ \mathbf{X}^{T} \mathbf{X} + \lambda \mathbf{I} ]^{-1}\}^{T},
$$

and it is easy to see that if the parameter \( \lambda \) goes to infinity, then the variance of the Ridge parameters \( \boldsymbol{\beta} \) goes to zero.
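A small numerical sketch of this limit, with illustrative (assumed) dimensions and noise level only: we evaluate the closed-form expression for \( \mbox{Var}[\hat{\boldsymbol{\beta}}^{\mathrm{Ridge}}] \) for increasing \( \lambda \) and watch its trace shrink towards zero.

import numpy as np

np.random.seed(315)

n, p = 200, 3
X = np.random.rand(n, p)
sigma = 0.1
XtX = X.T @ X

# Var[beta_Ridge] = sigma^2 (X^T X + lambda I)^{-1} X^T X {(X^T X + lambda I)^{-1}}^T
def ridge_variance(lmb):
    W = np.linalg.inv(XtX + lmb*np.eye(p))
    return sigma**2*W @ XtX @ W.T

for lmb in [0.0, 1.0, 10.0, 1e3, 1e6]:
    print(f"lambda = {lmb:10.1f}, trace of Var[beta_Ridge] = {np.trace(ridge_variance(lmb)):.3e}")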

With this, we can compute the difference

$$
\mbox{Var}[\hat{\boldsymbol{\beta}}^{\mathrm{OLS}}]-\mbox{Var}[\hat{\boldsymbol{\beta}}^{\mathrm{Ridge}}]=\sigma^2 [ \mathbf{X}^{T} \mathbf{X} + \lambda \mathbf{I} ]^{-1}[ 2\lambda\mathbf{I} + \lambda^2 (\mathbf{X}^{T} \mathbf{X})^{-1} ] \{ [ \mathbf{X}^{T} \mathbf{X} + \lambda \mathbf{I} ]^{-1}\}^{T}.
$$

The difference is non-negative definite since each factor in the matrix product is non-negative definite. This means that the variance we obtain with standard OLS will, for any \( \lambda > 0 \), always be larger than the variance of \( \boldsymbol{\beta} \) obtained with the Ridge estimator. This has interesting consequences when we discuss the so-called bias-variance trade-off below.
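To check this numerically (a rough illustration, not a proof; the design and the values of \( \lambda \) are assumed for this example), one can compute the difference of the two covariance matrices for a random design and verify that its eigenvalues are non-negative.

import numpy as np

np.random.seed(315)

n, p = 200, 3
X = np.random.rand(n, p)
sigma = 0.1
XtX = X.T @ X

var_ols = sigma**2*np.linalg.inv(XtX)

def var_ridge(lmb):
    W = np.linalg.inv(XtX + lmb*np.eye(p))
    return sigma**2*W @ XtX @ W.T

for lmb in [0.1, 1.0, 10.0]:
    diff = var_ols - var_ridge(lmb)
    # The difference is symmetric, so eigvalsh applies; all eigenvalues should be >= 0
    print(f"lambda = {lmb}: smallest eigenvalue of the difference = {np.linalg.eigvalsh(diff).min():.3e}")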


    For more discussions of Ridge regression and calculation of averages, Wessel van Wieringen's article is highly recommended.

    - © 1999-2024, Morten Hjorth-Jensen. Released under CC Attribution-NonCommercial 4.0 license + © 1999-2025, Morten Hjorth-Jensen. Released under CC Attribution-NonCommercial 4.0 license
    diff --git a/doc/pub/week37/html/week37.html b/doc/pub/week37/html/week37.html index 433634e84..4d7dbab99 100644 --- a/doc/pub/week37/html/week37.html +++ b/doc/pub/week37/html/week37.html @@ -8,8 +8,8 @@ - -Week 37: Statistical interpretations and Resampling Methods + +Week 37: Gradient descent methods \n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
    " - ], - "text/plain": [ - " ID Age Agegroup CHD\n", - "0 1 21 1 0\n", - "1 2 23 1 0\n", - "2 3 25 1 1\n", - "3 4 29 1 0\n", - "4 5 21 1 0\n", - ".. ... ... ... ...\n", - "95 96 61 8 1\n", - "96 97 69 8 1\n", - "97 98 65 8 1\n", - "98 99 64 8 1\n", - "99 100 63 8 0\n", - "\n", - "[100 rows x 4 columns]" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "data": { - "image/png": "iVBORw0KGgoAAAANSUhEUgAAAfgAAAFnCAYAAABKGFvpAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAgAElEQVR4nO3deWBU5dn38d9kmRCSQMIQIBEFBEHZt1ojLkCtgsW61VoX3kdc4gYuCCgiS6mKwIMLKgrS2mqxVYuAiLJoG0AWI5AAyg6yhiVMyDLZJpOc94808xBIMkPmTBIP389fTM6Z+1znOstv5p4hsRmGYQgAAFhKSH0XAAAAzEfAAwBgQQQ8AAAWRMADAGBBBDwAABZEwAMAYEEEPGq0aNEi9e3bV263O+jbSklJ0aBBgzR06FDvzyZMmKCFCxeato0NGzbo/vvvlyR98803Z23PzPEbqj/96U/q27evPvvss2rXKS0t1ezZs/WHP/xBQ4cO1e9//3uNGzdOW7ZsqcNK68aWLVt08803a+DAgfVdSpWOHz+u3//+9+rUqZP3Z4sXL9a4cePqsSr8LBhADZ544gmjR48exjfffFMn25s/f75x7733eh+7XC6jpKTE5/PuvfdeY/78+T7XKysrM3Jzc6vd3rnq2LGjcejQoWrHb6h89WvUqFHGM888YxQXFxuGYRjFxcXGc889Z/z2t7+tqxLr1Pr1640BAwbU2fb8PV8rHDp0yOjYsaP3scfjMVwuVzBKg4XwDh7VcrlcCg0N1YABA/TVV1/VSw1RUVEKCwszbTybzaaYmBjTxqvr8evCd999p+XLl2vixImy2+2SJLvdrnHjxnkfo36FhoYqKiqqvstAA2fenROW8/XXX+v6669XWFiYxowZo+LiYkVEREiS3n33XS1evFgtW7bUFVdcoRkzZujyyy/Xn//8Z9lsNr366qtKS0uTzWZTv3799Pjjj8tms521jYKCAo0fP1579+5Vq1atKk1DLly4UDNnztTll1+uV155RYWFhXruuefkdDpVWlqq7t27a+zYsZoxY4a2b9+uzMxMLViwQA888IBSUlL0xRdf6N5779WePXu0ceNGDRo0SD/++KM2b96snTt3erdjGIZefPFF/fjjjyouLtaf/vQndenSRdOmTdMnn3yi559/XrfddpsmTJigBQsWaO7cufrlL3+pBx98UJI0cuRIRURE6JVXXtHTTz9daXyXy6WXX35ZP/30k8rKynTdddfpwQcfVElJiR544AGlpqZqwoQJSklJ0U8//aRnn31Wv/71r6s8Hh9//LEWLFggu90um82m8ePHq0OHDtqyZYvGjx+vvLw83X333UpJSVFOTo5mzpypdu3aSZK2bt2qiRMnKiIiQt26dZNRwy+wXLZsmbp3737WC5Xo6Gi9//773serVq3SrFmzFBoaqkaNGmnChAlq06aNPv74Y82ePVs9evRQVFSUNm7cqObNm+vDDz/06zkxMTHaunWrmjdvrrfeeksRERHKz8/XpEmTdPjwYdlsNl1yySUaP368wsLC9NZbb+kf//iHBg0apOzsbKWlpSknJ0fh4eFq3LixxowZo0GDBunuu+/WwYMHNXXqVPXr16/KfZ87d26V/Tt48KAmTZokt9utsrIyjRo1Sr179z6n4/L1118rLS1NycnJZ52v/fv3P6uWimusVatWuvbaa70/37lzp8aMGaO8vDz9+9//liS99dZbWr16tex2uxwOh55//nm1aNFC+fn5evHFF7V//34ZhqGbb75Zd911l6Tyj5NmzpwpwzBUUlKiBx98UNddd50kadOmTZo+fbrCw8NlGIbuv/9+DRgwQFL5dfnRRx/JbrerZcuW+uMf/6jo6OhqzyfUo3qdP0CDNnLkSKOoqMgoLi42+vbta6xYscIwDMNISUkx+vXrZ5w6dcowDMOYMmVKpenDWbNmGUOHDjU8Ho/hdruNO++801i4cGGV25g6darxwAMPGKWlpUZxcbFx1113VZoynzlzpvHss88ahmEYf//7340JEyYYhlE+RXnbbbd516tqyvPee+81hg0bZng8HmPPnj3GJ598ctZU5/z5842uXbsaO3bsMAzDMD7//HNjwIABhtvtrnLcAQMGGOvXr/c+PnOK/szxx44d662/sLDQGDJkiLFgwYJKz58zZ45hGIaxZMkS4/rrr6+yT4ZhGP/4xz+8U+br16837rrrLu+y9evXG126dDG+//57wzAMY+LEicb48eMNwyifXr/mmmuMxYsXG4ZhGNu2bTO6du1a7RTxAw88YIwcObLaOgzDMA4ePGj07NnT2Ldvn2EYhrFw4ULjhhtu8H6cMnPmTOPKK680nE6n4fF4jGnTpvn1nKuuusrIzs42SktLjd/85jfemk+dOlXpHHr22WeNTz75pNLjIUOGGAUFBUZubq7x9ttvG3/5y1+MYcOGeddZunSp8a9//avK/ampfx6Pxxg0aJDx6aefGoZhGNu3bzcuv/xyIy8vzzAM/47LmjVrDMMwjFdeecUwDN9T9GdeY9OmTat0Xp3+kcLu3buNwYMHG2VlZYZhGMZLL73kPUfHjRtnjB492jAMw8jLyzMGDhzo3ceUlBRj//793mVXXXWV9+Ol22+/3UhPT/fub8U5vGHDBuPyyy83nE6nd3+ef/75avcD9YspelQpNzdXUVFRioiIkN1u1/XXX68vv/xSkrR06VJdc801io2NlSTddNNNlZ67YMEC3XrrrQoNDVV4eLgGDRqkzz//vMrtLF26VEOGDFFISIjsdnu1714lKTY2Vhs3blR6erpCQ0P197//3ed+XHvttQoNDVX79u11xx13VLlOu3btvDMHN954o06cOKH09HSfY/tSVlamxYsX6/bbb5ckNWrUSDfeeONZX267+uqrJUmdOnXSkSNHqh2vQ4cOeuSRR3T33XdrxowZ+vHHHystb9y4sfr27esd6/Dhw5Kk9PR0OZ1ODR48WJJ02WWXqW3btgHt2xdffKFu3bp53+EOGTJEGRkZSktL867Ts2dPNWvWTKGhoRo9erRfz+nRo4eaNm2qkJAQXXLJJd59aNq0qTIyMnTXXXdp6NChSk1NPWv/k5KSFBkZqZiYGD322GO66a
ablJqaquPHj0sqP9duuOGGavfp9P5deumllfp36NAh3Xzzzd5lLVu2VEpKiiTfxyUyMlJXXnmlJOnZZ5/1q79nXmM33nhjtetGRUXp5MmTWr58uUpKSjRq1Cj16dNHZWVlWrRokX73u99JKp+BGTBggPdavOSSS/TGG2/oD3/4gx599FFlZ2frp59+klTe70WLFunkyZO69NJLNXHiREnl1/bAgQPVrFkzSeXX/uLFi2ucEUL9YYoeVaqYTqz4hnlOTo4OHTqkoqIinThxQpdeeql33aZNm1Z67rFjx/T+++97gyw/P19NmjSpcjuZmZmKi4urdqzT/eY3v5HH49HLL7+s7Oxs3Xfffbr77rtr3A9/Pg8/fZuhoaGKiYlRZmamz+f5kpWVJbfb7b0ZSlKzZs28gVOhYnozIiJCJSUlVY6Vl5enhx9+WC+99JIGDRqkw4cP61e/+lWV45w5VmZmppo0aaLQ0FDv8orgqEqbNm20a9euGvft2LFjlfYrNDRUTZo00bFjx7w/O7P3/jynun1YsGCBPv74Yy1cuFCxsbF68803z3oxdOb2mjdvrqSkJH3++ee6/fbbFR4eXuNU8unL7Ha7d9sVx+v0/x3hdruVl5fn13GpzXcyfF1jp0tISNDs2bP13nvvafLkybrpppv05JNPKj8/X263W9OnT1ejRo0klb9wv+yyyySVv9jo2LGjXn31VUnSwIEDVVhYKEmaMWOG5syZo1tvvVUdO3bUqFGjdNlll+nYsWPau3ev977g8XjUvHlznTp1qtKxRcNAwKNK3377rT777DOFh4dLkkpKSpSUlKSUlBS1aNFCWVlZ3nWzs7MrPTchIUGPPvqo9x1jWVmZcnNzq9xOfHy8Tp06Ve1Yp8vKytKNN96om2++Wdu2bdOwYcN08cUX64orrqj1fp65TY/Ho7y8PMXHx0uSwsPDK/0Xwer2oyrNmjWT3W5XVlaW2rdv792Hli1bnnONP/30k1wul/fdvsfj8fu58fHxys3Nlcfj8X5hsaY+Dx48WP/617+Ul5dXKZwOHTqk6dOna+bMmUpISPC+25PK/1tdbm6uWrVqVe24tXlOhS1btqh79+7eFyb+7v8tt9yid955xzt7UhutWrVSeHi4PvzwQ+/PCgoKFBISol27dtX6uNTE1zV2usLCQnXo0EGzZs1SZmamRowYoffee0/Dhw+X3W7X+PHj1b17d0nl13FRUZGk8p6e/qLl9BeXbrdbY8aM0ciRIzV37lw99thj+s9//qOEhARdeOGF3nf0Uvk5Tbg3TEzR4yw5OTne6fUK4eHhuuaaa/TVV19p0KBBWrVqlTeYz/yG/a233qovvvhCpaWlksrffb377rtVbmvw4MFavHixysrK5Ha7tWzZsmrrmjdvnlauXClJ6tixo5o2baqysjJJ5dOUhYWF2r9/v6ZOnXpO+7t3717vl+KWLFmiFi1aqGfPnpKk1q1ba/fu3ZKk1NRU782xQuPGjVVUVKRFixZp6dKllZaFhITolltu8c5kFBUV6auvvtJtt912TvVJUmJiosLCwrz/D3316tV+P7dnz55yOBzej1i2b9+uvXv3Vrt+37599dvf/laTJ0/23vQLCgo0efJk/eIXv5BUPpvyww8/6MCBA5KkL7/8UomJierVq1e149bmORXatGmjHTt2yO12y+PxaN26dX7t+3XXXadjx47p008/1VVXXeXXc87Uo0cPJSQkaPny5ZLKQ/zxxx/X/v37a31cfJ2vZ15jX3zxRbVjbdmyRTNnzpRU/mKuXbt2Ki0t9Z5/p3889s4773h/r8RFF12kzZs3S5J27NhRadbqiSeeUGFhocLCwtS7d2/vtXzrrbdq5cqVysnJkSTt27dPjz76qF/7jLoXOmnSpEn1XQQajry8PA0dOlQHDhxQmzZtvJ/VpqSk6NNPP9UPP/ygyMhI9evXT1OmTNG///1vdenSRatXr9aIESMkld8Qd+7cqZkzZ+rzzz9Xdna2xo4dW+kFQ4VevXpp3bp1euedd5SSkqLOnTtr1apVysjIUHZ2tv72t79p3759crvd6tevn2bPnq0FCxZo3rx5GjBggO68805J5dO577zzjlauXKm7775bH330kVauXKnt27erpKREvXr1UlZWlkaMGKHjx48rNTVVTZo00Ztvvqm2bdvq8OHDmjNnjjZt2qRp06YpISFBUvlN8M9//rO+/PJLNW7cWPv27VNqaqq6deumFi1ayOVyafbs2dq1a5f+53/+R08//bR3/JtvvllXXHGFvv32W82dO1efffaZ99vcNptN999/vw4dOqTNmzdr0KBBevzxx3X8+HGlpaV5P++t0LhxYzVr1kzTp0/X2rVrZbPZtHnzZqWlpalLly4aP368jhw5omPHjsnhcOiVV17RwYMHlZ2drWuuuUZ9+vTR66+/rgULFigjI0N2u11r1qxRQkKCLr744rOOS//+/bV//35Nnz5dixcv1oIFCzR48GDdc889ksqnjLt06aKXX35ZCxYs0J49ezR9+nQ1a9ZMixcv1vvvv699+/Zp586d3u9V+PucyMhIbdu2TZ9++ql2796tZs2a6aabbtLGjRv1zjvvaMOGDYqMjFRqaqpCQkKUnp6uhQsXavfu3Tp+/Hilb8iHhYXpwIEDatu2bZXfVJekPXv21Ni/q6++WldffbXefvttffrpp/rss890880365prrvH7uKxbt04DBw70TpWfeb62bt26Uk1t27ZVSUmJpkyZoq+//lqXXHKJ1q1bp9TUVHXu3FkTJ07UkSNHtGPHDt1yyy366quvNG/ePH3yyScKCwvT6NGjFRERoV/+8pf6z3/+o/fee08LFy5UkyZN9OijjyokJESdOnXSe++9p2XLlunEiRPKyMjQxo0b9Ytf/EI2m02vvfaaFi1apFWrVmnChAm68MILlZCQoNjYWL300ktavHix1qxZo8mTJ1f6mA0Nh83g2xE4Rx6PR0VFRd7PLLds2aJHHnlEa9eurefKgLNNnz5dN9xwg3eaGjhfMEWPc3bkyBFNmDDB+/jzzz+v9fQnECwLFy5USUmJtm/fTrjjvMSX7HDO4uLi5Ha79Yc//EGGYahVq1aVvnQDNARvvPGGPvjgAw0fPry+SwHqRdCm6DMzM/X6669rx44dmj9//lnL58yZo5MnT6p58+b68ccf9cQTT3i/aQwAAAITtCn6jRs36le/+lW1vwChoKBAY8eOVXJysm644QZNnz49WKUAAHDeCVrADxo0qMY/hvDUU095fzd5WVmZGjduHKxSAAA479T7l+zcbrcWLFigp556yq/1PZ7SIFcEAMDPX71+yc7tdmvSpEl6+umnddFFF/n1nFOnCgLebnx8jDIz8wIeB5XR1+Cgr8FBX4ODvpovPr52f4K6Tt/BZ2dny+VySSr/rV4TJ07UsGHD1LVr1xp/gxkAADg3QXsHn5qaqkWLFikzM1OzZs3S/fffrzlz5ig2NlbJyckaNWqUdu/e7f2LTQUFBTX+pScAAOC/n91vsjNj6ocppOCgr8FBX4ODv
gYHfTXfz2KKHgAA1A0CHgAACyLgAQCwIAIeAAALIuABALAgAh4AAAsi4AEAsCACHgAACyLgAQCwIAIeAAALIuABALAgAh4AAAsi4AEAsCACHgAACyLgAQCwIAIeAAALIuABALAgAh4AAAsi4AEAsCACHgAACyLgAQCwIAIeAAALIuABALAgAh4AAAsi4AEAsCACHgAACyLgAQCwIAIeAAALIuABALAgAh4AAAsi4AEAsCACHgAACyLgAQCwIAIeAAALIuABALAgAh4AAAsi4AEAsCACvgErLinViVMFKi4pre9STJFX4Nb2/VnKK3DXdyk++eq9P8fGjONXXFKqoyfzaxzDjL7Wxf6a1TNf6/jTD199rata/RnDmVOotVuPyplTWOXyoyddWrr+gI6edFU7hq+emFFHxTiB9tUXs45NXVw3dTVGdcJMH/G/MjMz9frrr2vHjh2aP3/+WcuLi4s1depUtWzZUvv371dycrLatWsXrHJ+VkrLyvTxv/cobVemsnKL1axJhHp1jNedAzsoNOTn95rM7fHopQ826UimS2WGFGKTLoiP1rj/11v2sKCdgrXiq/f+HBszjl+lMfKK1Szm7DHM6Gtd7K9ZPfO1jj/98NXXuqrVnzEK3SV69p11chV6vMcrOjJMUx9NUqQ9XK4it0a+uUaeUkOS9EnKXoWF2vTqiH6KbmT36xwxow6z+hrouervOnVx3dTVGL6ETpo0aZIpI51h9erVuuSSS5Samqo777zzrOV/+ctfFBMTo+TkZF144YWaMGGCbr/9dp/jFpjw7i8qKsKUcYLln9/s1tcbDquwuPwVXWFxqfZl5Kqw2KNuFzvqubrqVdfXyX/doEMnXDL++9iQlJvv1uY9Tg3odUGd1uiLr977c2zMOH7+jGFGX+tif83qma91/OlHQ6nVnzGeeWtNpVCVJLenTKvSMzT4ijYa/toqb7hXKDOkFd8f0k392vl1jphRh1n764tZx6Yurhuzx4iKivBrzDMF7e3goEGDFBUVVe3ylJQU9erVS5LUqVMn7dixQy5X9VNM54viklKl7cqsclnarpM/u+n6vAK3jmRWfVyPZLoa1HS9r97nFbh9Hhszjp8/Y5jR17rYX7N65msdZ06hz340lFr92Y4zp/CsUK3gKvRo+37nWeFewVNq6OhJl89zxJlTGHAdzpxCU/bXFzP6XlfXjVn7Y4Z6mx91Op2VXgBER0fL6XQqOjq6xufFxTVWWFhowNuPj48JeIxgOHoyX1l5xVUuO5VXpFB7uOKbV//Cqb6d2deM3Zkqq/o+pDJDynOX6eI2DeNY+Op9nrvM57GRFPDx8+ccyMsvCbivdbG/ZvXM1zoZ2cU++9EiqlGDqNWf7WRk5FW9M/+1fkfV4VBhz7F8tW8dVmNPMrKLA64jI7tYl8VGBby/ZlwTku9jUxfXjVn7Y8Z9vt4C3uFwKD8/3/vY5XLJ4fA9tXHqVEHA246Pj1FmZs0nbn0pLSlVs5gIOXPPPvhxMY1U6i5psLVX1dcYe4hCbKryogqxlS9vKPvjq/cx9hCfx0ZSwMfPn3PAjL7Wxf6a1TNf6yTGRvjsR6m7pEHU6s92EmNrnpK94tJ4rU7PqHZ5h1ZRivZxjiTGRgRcR2JshCl9NeOakHwfm7q4bszan9PHqO0b0jr9xlZ2drZ3Gr5///5KS0uTJO3cuVOXXnqpz3fv54OI8FD16hhf5bJeHZsrIjzw2Yu6FNPYrgviqz6uF8RHK6axvY4rqp6v3sc0tvs8NmYcP3/GMKOvdbG/ZvXM1zqOppE++9FQavVnO46mkYqOrPr9V3RkmC5r61BYqK3K5WGhNiU0j/Z5jjiaRgZch6NppCn764sZfa+r68as/TFD0L5kl5qaqkWLFmn79u0qKipSt27dNGvWLO3evVt9+vRRly5dtHTpUm3btk0rV67UmDFjFBcX53Pc8+FLdp3bxqmw2KMcl1vFbo+aNWmkft1a6c6BHRRiq/qibgiq62u/bi21eY9TrgK3DJW/Um7dovxbqw3tfwX46r0/x8aM4+fPGGb0tS7216ye+VrHn340lFr9GaN/70StSs+Q21PmPV4V314PDw3VgD4XaMX3hyq9G634Fr39vx9j+uqJGXWYtb9mXBMN5boxe4zafsnOZhhGNZ9INExmTOc25Cn60xWXlCrHVaym0RE/i3fuvvqaV+DW4RMutW7RsN65V8VX7/05NmYcv+KSUoXaw1XqLql2DDP6Whf7a1bPfK3jTz989bWuavVnDGdOoXYezFani2LlaBp51vKjJ13avMepHh0cSmhe9btTXz0xo46KcQLtqy9mHZu6uG7MGqO2U/QEPExDX4ODvgYHfQ0O+mq+n8Vn8AAAoG4Q8AAAWBABDwCABRHwAABYEAEPAIAFEfAAAFgQAQ8AgAUR8AAAWBABDwCABRHwAABYEAEPAIAFEfAAAFgQAQ8AgAUR8AAAWBABDwCABRHwAABYEAEPAIAFEfAAAFgQAQ8AgAUR8AAAWBABDwCABRHwAABYEAEPAIAFEfAAAFgQAQ8AgAUR8AAAWBABDwCABRHwAABYEAEPAIAFEfAAAFgQAQ8AgAUR8AAAWBABDwCABRHwAABYEAEPAIAFEfAAAFgQAQ8AgAWFBXPwtWvXavny5XI4HLLZbBo+fHil5YcOHdK0adPUrVs3bd++XUOGDNGvfvWrYJYEAMB5IWgBX1hYqIkTJ2rJkiWy2+0aMWKE1q1bp6SkJO86c+fOVZ8+fXTfffdp27Zteuqppwh4AABMELQp+vT0dCUmJsput0uSevfurZSUlErrNG/eXFlZWZKkrKwsdenSJVjlAABwXgnaO3in06moqCjv4+joaDmdzkrrDBs2TI8//rimTJmiLVu26LHHHvM5blxcY4WFhQZcX3x8TMBj4Gz0NTjoa3DQ1+Cgrw1D0ALe4XAoPz/f+9jlcsnhcFRa57nnntMdd9yhIUOGKCsrS9dff72+/vprxcbGVjvuqVMFAdcWHx+jzMy8gMdBZfQ1OOhrcNDX4KCv5qvtC6agTdH37NlTGRkZcrvdkqRNmzapf//+ys7OlsvlkiQdPXpU8fHxkqQmTZooJCREZWVlwSoJAIDzRtDewUdGRmrSpEl68cUXFRcXp06dOikpKUnTpk1TbGyskpOTNXbsWH3wwQdKS0vT4cOH9fTTT6tZs2bBKgkAgPOGzTAMo76LOBdmTP0whRQc9DU46Gtw0NfgoK/ma3BT9AAAoP4Q8AAAWBABDwCABRHwAABYEAEPAIAFEfAAAFgQAQ8AgAUR8AAAWBABDwCABRHwAABYEAEPAIAFEfAAAFgQAQ8AgAUR8AAAWBABDwCABRHwAABYEAEPAIAFEfAAAFgQAQ8AgAUR8AAAWBABDwCABRHwAABYEAEPAIAFEfAAAFgQAQ8AgAUR8AAAWBABDwCABRHwAABYEAEPAIAFEfAAAFgQAQ8AgAUR8AAAWBABDwCABRHwAABYEAEPAIAFEfAAAFgQAQ8AgAUR8AAAWFBYMAdfu3atli9fLofDIZvNpuHDh1da
bhiGPvzwQ0nSkSNHlJubqylTpgSzJAAAzgtBC/jCwkJNnDhRS5Yskd1u14gRI7Ru3TolJSV511m0aJGaNGmiW265RZK0Y8eOYJUDAMB5JWhT9Onp6UpMTJTdbpck9e7dWykpKZXWWbx4sbKzs/XBBx/o1VdfVVRUVLDKAQDgvOLXO/iMjAxt3bpVNptNXbt2VWJios/nOJ3OSoEdHR0tp9N51rgul0vDhw/XTz/9pAcffFBffvmlQkNDqx03Lq6xwsKqX+6v+PiYgMfA2ehrcNDX4KCvwUFfGwafAf/SSy9p3rx5aty4sQzDUGFhoe655x6NGzeuxuc5HA7l5+d7H7tcLjkcjkrrREdHq0ePHpKkdu3ayeVy6ejRo2rdunW14546VeCrZJ/i42OUmZkX8DiojL4GB30NDvoaHPTVfLV9wVTjFP3HH3+svXv3asmSJdqwYYM2btyoL774Qnv37tU///nPGgfu2bOnMjIy5Ha7JUmbNm1S//79lZ2dLZfLJUlKSkrSoUOHJJW/ACgtLVV8fHytdgQAAPwfm2EYRnULH374Yc2YMUPR0dGVfu5yufTMM89o9uzZNQ6+Zs0aLVu2THFxcQoPD9fw4cM1bdo0xcbGKjk5WXl5eZo+fboSExN18OBB3XDDDbr22mtrHNOMV4a8wgwO+hoc9DU46Gtw0Ffz1fYdfI1T9NHR0WeFe8XPY2NjfQ7er18/9evXr9LPxowZ4/13TEyMJk+e7G+tAADATzVO0cfEVP+qoaZlAACgftX4Dn7hwoX6+uuvq1yWn5+vF154IShFAQCAwNQY8ElJSRo2bNhZPz/9N9ABAICGp8aAHz16tC6++OIql7Vo0b/9OXQAABVgSURBVCIoBQEAgMDV+Bn8gQMHql128OBB04sBAADmqPEd/Ntvv60ff/yxymWrVq3y+V/aAABA/agx4HNzc7Vv3z5J5b9bvmfPnpWWAQCAhqnGgE9OTtbvfvc7SdLIkSP16quvepd99tlnwa0MAADUWo2fwVeEuyTZbLZKy2677bbgVAQAAAJWY8Dv3bu32mUVU/cAAKDhqXGKfvr06Ro6dKgMw1BmZqa+/fZb77KPPvpIs2bNCnqBAADg3NUY8Kmpqdq1a5f38YQJE7z/5kt2AAA0XDUG/L333quRI0dWueyNN94ISkEAACBwNX4Gf91112nKlClKTU31/uzAgQP65z//qSeffDLoxQEAgNqpMeA/+OADxcTEqHPnzt6fORwObd68WX/729+CXhwAAKidGgPe4/Fo+PDhlf4mfHR0tKZMmaLNmzcHvTgAAFA7NQZ8kyZNql0WFxdnejEAAMAcNQZ8fn5+tcuKi4tNLwYAAJijxoDv1KmTXnvtNbndbu/PiouLNXPmTF122WVBLw4AANROjQH/0EMPKTs7W1dccYWGDBmiIUOG6Morr1Rubq7uueeeuqoRAACcoxr/H7zNZtMf//hHJScna+vWrZKk7t27KzExsU6KAwAAtVNjwFe44IILdMEFFwS7FgAAYJIap+gBAMDPEwEPAIAFEfAAAFgQAQ8AgAUR8AAAWBABDwCABRHwAABYEAEPAIAFEfAAAFgQAQ8AgAUR8AAAWBABDwCABRHwAABYEAEPAIAFEfAAAFgQAQ8AgAWFBXPwtWvXavny5XI4HLLZbBo+fHiV633++ecaPXq0Nm3apKioqGCWBADAeSFoAV9YWKiJEydqyZIlstvtGjFihNatW6ekpKRK6+3du1d79+4NVhkAAJyXgjZFn56ersTERNntdklS7969lZKSUmmdwsJCzZ07V48//niwygAA4LwUtHfwTqez0nR7dHS0nE5npXVee+01PfbYY94XAf6Ii2ussLDQgOuLj48JeAycjb4GB30NDvoaHPS1YQhawDscDuXn53sfu1wuORwO7+OjR48qNzdXX331lfdn77//vq699lp169at2nFPnSoIuLb4+BhlZuYFPA4qo6/BQV+Dg74GB301X21fMAUt4Hv27KmMjAy53W7Z7XZt2rRJd999t7KzsxUWFqaEhAS98sor3vVnzJihYcOG8SU7AABMELTP4CMjIzVp0iS9+OKLeu2119SpUyclJSVpzpw5+uijj7zrZWVladasWZKkuXPn6vjx48EqCQCA84bNMAyjvos4F2ZM/TCFFBz0NTjoa3DQ1+Cgr+ar7RQ9v+gGAAALIuABALAgAh4AAAsi4AEAsCACHgAACyLgAQCwIAIeAAALIuABALAgAh4AAAsi4AEAsCACHgAACyLgAQCwIAIeAAALIuABALAgAh4AAAsi4AEAsCACHgAACyLgAQCwIAIeAAALIuABALAgAh4AAAsi4AEAsCACHgAACyLgAQCwIAIeAAALIuABALAgAh4AAAsi4AEAsCACHgAACyLgAQCwIAIeAAALIuABALAgAh4AAAsi4AEAsCACHgAACyLgAQCwIAIeAAALIuABALCgsGAOvnbtWi1fvlwOh0M2m03Dhw+vtHzOnDk6efKkmjdvrh9//FFPPPGE2rdvH8ySAAA4LwQt4AsLCzVx4kQtWbJEdrtdI0aM0Lp165SUlORdp6CgQGPHjpXNZtOXX36p6dOn69133w1WSQAAnDeCNkWfnp6uxMRE2e12SVLv3r2VkpJSaZ2nnnpKNptNklRWVqbGjRsHqxwAAM4rQXsH73Q6FRUV5X0cHR0tp9NZ5bput1sLFizQxIkTfY4bF9dYYWGhAdcXHx8T8Bg4G30NDvoaHPQ1OOhrwxC0gHc4HMrPz/c+drlccjgcZ63ndrs1adIkPf3007rooot8jnvqVEHAtcXHxygzMy/gcVAZfQ0O+hoc9DU46Kv5avuCKWhT9D179lRGRobcbrckadOmTerfv7+ys7PlcrkkSUVFRZo4caKGDRumrl27atmyZcEqBwCA80rQ3sFHRkZq0qRJevHFFxUXF6dOnTopKSlJ06ZNU2xsrJKTkzVq1Cjt3r1bhw8fllT+pbsbbrghWCUBAHDesBmGYdR3EefCjKkfppCCg74GB30NDvoaHPTVfA1uih4AANQfAh4AAAsi4AEAsCACHgAACyLgAQCwIAIeAAALIuABALAgAh4AAAsi4AEAsCACHgAACyLgAQCwIAIeAAALIuABALAgAh4AAAsi4AEAsCACHgAACyLgAQCwIAIeAAALIuABALAgAh4AAAsi4AEAsCACHgAACyLgAQCwIAIeAAALIuABALAgAh4AAAsi4AEAsCACHgAACyLgAQCwIAIeAAALIuABALAgAh4AAAsi4AEAsCACHgAACyLgAQCwIAIeAAALIuABALAgAr4KeQVubd+fpbwCd63HKC4p1YlTBSouKQ1onUC3Y1YdvnqSV+DW5t2ZNfbMmVOotVuPyplTWKvl/tThzxj+rOOrJ/6cI2aNcfRkfkDH7+hJl5auP6CjJ121HsOMY2PGeebPOGZcV2bcA/xhRq1mMOOaqCt1VUdD2d9AhAVz8LVr12r58uVyOByy2WwaPnx4peXFxcWaOnWqWrZsqf379ys5OVnt2rULZkk1cns8eum
DTTqS6VKZIYXYpAviozXu//WWPcy/VpWWlenjf+9R2q5MZeUWq1mTCPXqGK87B3ZQaEiI3+sEuh2z6vDVE396Vugu0bPvrJOr0OOtPzoyTFMfTVKkPdzncn/q8GcMf9bx1RN/9tf0MfKK1Szm3I+fq8itkW+ukafUkCR9krJXYaE2vTqin6Ib2f0aw4xjY8Z5ZtY574sZ9wB/mFGrGcy4JupKXdXRUPbXDKGTJk2aFIyBCwsL9fDDD+vdd9/VlVdeqQ8//FBxcXG68MILvev85S9/UUxMjJKTk3XhhRdqwoQJuv3222sct8CEV9RRURFVjjP5rxt06IRLxn8fG5Jy893avMepAb0u8Gvsf36zW19vOKzC4vJXfYXFpdqXkavCYo+6Xezwe51At2NWHb564k/PnnlrTaUbiCS5PWValZ6hwVe08bncnzr8GcOfdXz1xJ/9rYsx/Fln+GurvOFeocyQVnx/SDf1a+fXGGYcGzPOM39qPdfrqqr7gBn3AH+YcQ8wgxnXxJmqu78Gqq561lCOzemioiJq9bygvRxJT09XYmKi7Pbydwq9e/dWSkpKpXVSUlLUq1cvSVKnTp20Y8cOuVzVTyMGU16BW0cyq972kUyXX1N1xSWlStuVWeWytF0nVVxS6tc6gW4nr8BtSh2+enL0pMtnz5w5hWfdQCq4Cj3ac/hUjcudOYU+6zhwLMfnGL7qcOYU+uyJM6fQ5/7WxRj+HL8Dx3LOCvcKnlJDR0+6fI5x9KQr4GPjzCkM+Dzzpyf+nPO+mHEP8IcZ9wAzmHFN1FWtdVVHQ9lfswRtit7pdCoqKsr7ODo6Wk6n0691oqOjqx03Lq6xwsJCA64vPj6m0uOM3Zkqq/p+qDJDynOX6eI2MVWv8F9HT+YrK6+4ymWn8ooU+t8pL1/rxDePqnK5v9vJc5eZUkdefkmNPdlzLN9nz5w5VW+jwsY9WTUuz8gulqNpoxq388OBXJ9j+JKRXazLYqNq7ElGdrHP/W0R1SjoY/hz/Hz1ZM+xfCXFN6lxjD3H8mscw59jk5FdHPB55k9P/Dnnq7quTr8PmHEP8Ic/9wlf9wAzbD2QXeNyf64Jf/pqhrrqWUM5NmYJWsA7HA7l5//fDcLlcsnhcJzzOmc6daog4Nri42OUmZlX6Wcx9hCF2FTlBR5iK19+5nPOVFpSqmYxEXLmnn2CxMU0Uqm7RJJ8rhPodmLsIabU4asnHVpF+exZRGzNU0t9OjTTsvUHql2eGBshe3jNdXRt00Tza9hGoo8aKtYpdZfU2JPE2Aif+1sXY/hz/Hz1pEOrKJ/b6dCq5huZP8cmMTYi4PPMn574c86feV2deR8w4x7gD3/uE2Zsxxdf14U/14Q/fTVDXfWsoRybM9X2BVPQpuh79uypjIwMud3l01qbNm1S//79lZ2d7Z2G79+/v9LS0iRJO3fu1KWXXlrju/dgimls1wXxVW/7gvhoxTS2+xwjIjxUvTrGV7msV8fmiggP9WudQLcT09huSh2+epLQPNpnzxxNIxUdWfXryOjIMHVoHVfjckfTSJ91tGnV1OcYvupwNI302RNH00if+1sXY/hz/Nq0aqqwUFuVy8NCbUpoHu1zjITm0QEfG0fTyIDPM3964s8574sZ9wB/mHEPMIMZ10Rd1VpXdTSU/TVL0L5kFx4ervbt2+v9999Xenq6WrRoodtvv10zZ87U7t271adPH3Xp0kVLly7Vtm3btHLlSo0ZM0ZxcXE1jhvML9n169ZSm/c45Spwy1D5q/bWLcq/Qevvtyc7t41TYbFHOS63it0eNWvSSP26tdKdAzsoxGbze51At2NWHb564k/P+vdO1Kr0DLk9Zd76K76pGx4a6nO5P3X4M4Y/6/jqiT/7Wxdj+LPOgD4XaMX3hyq9I634Fr09zL/9NePYmHGemXXOn66q+4AZ9wB/mHEPMIMZ18SZgvUlu7rqWUM5Nqer7ZfsbIZhVPOpU8NkxvSIrymkvAK3Dp9wqXWL2r9qLy4pVY6rWE2jI6p91efPOoFux6w6fPUkr8CtPHeZYuwh1fbMmVOonQez1emiWDmaRp7zcn/q8GcMf9bx1RN/zhGzxgi1h6vUXVLr43f0pEub9zjVo4NDCc2rfofqawwzjo0Z55k/4/h7XdV0HzDjHuAPM+4BZjDjmqgQjCn62tTxc9mOP2o7RU/AwzT0NTjoa3DQ1+Cgr+ZrcJ/BAwCA+kPAAwBgQQQ8AAAWRMADAGBBBDwAABZEwAMAYEEEPAAAFkTAAwBgQQQ8AAAWRMADAGBBBDwAABZEwAMAYEEEPAAAFkTAAwBgQQQ8AAAWRMADAGBBBDwAABZkMwzDqO8iAACAuXgHDwCABRHwAABYEAEPAIAFEfAAAFgQAQ8AgAUR8AAAWFBYfRcQbAcPHtTrr7+uzp0769ixY4qNjdXw4cOVnZ2tGTNm6MILL9T+/fs1cuRINW/evL7L/VkoKyvTI488ou7du6ukpESHDh3Syy+/rKKiInpqgqKiIt1xxx266qqr9Oyzz3KumuD3v/+9IiIiJEkhISH629/+Rl9NsG/fPi1ZskQRERH6/vvvNWLECF100UX0NQCHDx/Wfffdp4SEBEmSy+VSp06d9Nxzz517Xw2L27x5s7FixQrv48GDBxtbt241xo8fbyxZssQwDMP45ptvjFGjRtVXiT87paWlxttvv+19/MgjjxiLFi2ipyaZMmWKMWbMGOOVV14xDMOgryaYOXPmWT+jr4HxeDzGQw89ZJSWlhqGYRjHjx83nE4nfQ1QVlaWsWbNGu/jN954w/j+++9r1VfLT9F3795d1113nfdxWVmZIiMjtXLlSvXq1UuS1Lt3b61cubK+SvzZCQkJ0WOPPSZJ8ng8On78uNq1a0dPTbBw4UL17t1brVu39v6MvgZu165dmjNnjt58802lpKRIoq+B2rp1qwzD0IcffqjZs2frP//5j+Li4uhrgOLi4nTllVdKktxut3744Qf17du3Vn21/BT96VasWKGrrrpK7du3l9PpVFRUlCQpOjpaOTk58ng8Cgs7r1oSkNWrV+uvf/2r+vfvr27dutHTAO3Zs0f79u3TyJEjtXPnTu/P6WvgHnroIXXv3l2lpaW65557FBUVRV8DlJGRofT0dL366quKiYnRqFGjFB4eTl9NtHjxYv3mN7+RVLv7gOXfwVdYv369vvvuOz3//POSJIfDofz8fEnln3E0bdqUE/AcXX311frzn/+sw4cPa968efQ0QCtWrJDdbtecOXO0ceNGbdmyRX/961/pqwm6d+8uSQoNDVXfvn313Xff0dcARUVF6eKLL1ZMTIwkqU+fPkpNTaWvJlq6dKluvPFGSbXLrPOi6ykpKdqwYYPGjRunEydOKCMjQ9dee63S0tKUkJCgTZs26dprr63vMn829uzZo8OHD6t///6SpNatW+vw4cP0NECPPvqo99/FxcUqKCjQfffdp3379tHXAOzdu1ebNm3SHXfcIUk6cOCAfv3rX3O+BqhHjx7Kzs5WaWmpQkNDlZGRobZt28put9NXE6xfv1
69evVSeHi4JNXqfLX8H5v54YcfNHToUHXt2lWSVFBQoHvuuUcDBw7U//7v/yoxMVGHDh3SM888wzc9/XTw4EFNmzZNnTt3lsfj0d69e/XCCy8oPDycnppg2bJlmjdvnkpKSnTPPffoqquuoq8BOH78uCZPnqzOnTvL5XLJ4/Fo7Nixys3Npa8BWrFihdavX6+4uDgdPXpU48ePV1FREX01wciRI/XCCy+oWbNmkqTs7Oxz7qvlAx4AgPPRefMZPAAA5xMCHgAACyLgAQCwIAIeAAALIuABALAgAh6A1yOPPKIJEybUdxkATEDAA5AkZWZm6ujRo1qyZIkKCwvruxwAAeL/wQOQJM2ZM0e9e/fW6NGj9eSTT+qWW26RJH344Ydavny5LrnkEtlsNi1fvlyPPfaY7rrrLn355Zdau3atYmNjdfz4cY0ZM0bx8fH1vCcAJN7BA/ivtLQ09e3bV7fccovmz58vSdq5c6feffddvffee5owYYKio6PVtm1b3XXXXdq3b5/efvttTZ48WaNGjdLll1+u6dOn1/NeAKhwXvwuegA127Bhg3r27ClJuu222/Tuu+/qwIED+u6779S1a1c1atRIktS3b19t2rRJkrR27VoVFxdr0qRJkqT8/HyVlJTUS/0AzkbAA9DChQtVVlaml156SZIUHx+v+fPny+FwyGazVfkcwzDUtm1bTZ482fuzir92BaD+EfDAeS4/P9/7B0Iq9OjRQ1OnTtV7772nOXPmqKioSI0aNdLGjRu961x55ZV6++235XK5FB0drW3btumjjz7Siy++WB+7AeAMBDxwHisqKtIzzzyj/Px8HT9+XC1btpQk7d69WydOnNCcOXOUnJysBx98UJdddplCQkK8f76yffv2Gj9+vMaMGaOLLrpIubm5Gj16dH3uDoDT8C16ADVauXKl929Pz5s3T0eOHNGYMWPquSoAvvAOHkCNPvnkE61evVo2m005OTl64YUX6rskAH7gHTwAABbE/4MHAMCCCHgAACyIgAcAwIIIeAAALIiABwDAggh4AAAs6P8DlYAGgR9VzfYAAAAASUVORK5CYII=\n", - "text/plain": [ - "
    " - ] - }, - "metadata": {}, - "output_type": "display_data" + "id": "4b9647f3", + "metadata": { + "collapsed": false, + "editable": true, + "jupyter": { + "outputs_hidden": false } - ], + }, + "outputs": [], "source": [ - "# Common imports\n", - "import os\n", + "%matplotlib inline\n", + "\n", "import numpy as np\n", - "import pandas as pd\n", + "from time import time\n", + "from scipy.stats import norm\n", "import matplotlib.pyplot as plt\n", - "from sklearn.linear_model import LinearRegression, Ridge, Lasso\n", - "from sklearn.model_selection import train_test_split\n", - "from sklearn.utils import resample\n", - "from sklearn.metrics import mean_squared_error\n", - "from IPython.display import display\n", - "from pylab import plt, mpl\n", - "plt.style.use('seaborn')\n", - "mpl.rcParams['font.family'] = 'serif'\n", - "\n", - "# Where to save the figures and data files\n", - "PROJECT_ROOT_DIR = \"Results\"\n", - "FIGURE_ID = \"Results/FigureFiles\"\n", - "DATA_ID = \"DataFiles/\"\n", - "\n", - "if not os.path.exists(PROJECT_ROOT_DIR):\n", - " os.mkdir(PROJECT_ROOT_DIR)\n", - "\n", - "if not os.path.exists(FIGURE_ID):\n", - " os.makedirs(FIGURE_ID)\n", - "\n", - "if not os.path.exists(DATA_ID):\n", - " os.makedirs(DATA_ID)\n", - "\n", - "def image_path(fig_id):\n", - " return os.path.join(FIGURE_ID, fig_id)\n", - "\n", - "def data_path(dat_id):\n", - " return os.path.join(DATA_ID, dat_id)\n", - "\n", - "def save_fig(fig_id):\n", - " plt.savefig(image_path(fig_id) + \".png\", format='png')\n", "\n", - "infile = open(data_path(\"chddata.csv\"),'r')\n", - "\n", - "# Read the chd data as csv file and organize the data into arrays with age group, age, and chd\n", - "chd = pd.read_csv(infile, names=('ID', 'Age', 'Agegroup', 'CHD'))\n", - "chd.columns = ['ID', 'Age', 'Agegroup', 'CHD']\n", - "output = chd['CHD']\n", - "age = chd['Age']\n", - "agegroup = chd['Agegroup']\n", - "numberID = chd['ID'] \n", - "display(chd)\n", - "\n", - "plt.scatter(age, output, marker='o')\n", - "plt.axis([18,70.0,-0.1, 1.2])\n", - "plt.xlabel(r'Age')\n", - "plt.ylabel(r'CHD')\n", - "plt.title(r'Age distribution and Coronary heart disease')\n", - "plt.show()" + "# Returns mean of bootstrap samples \n", + "# Bootstrap algorithm\n", + "def bootstrap(data, datapoints):\n", + " t = np.zeros(datapoints)\n", + " n = len(data)\n", + " # non-parametric bootstrap \n", + " for i in range(datapoints):\n", + " t[i] = np.mean(data[np.random.randint(0,n,n)])\n", + " # analysis \n", + " print(\"Bootstrap Statistics :\")\n", + " print(\"original bias std. error\")\n", + " print(\"%8g %8g %14g %15g\" % (np.mean(data), np.std(data),np.mean(t),np.std(t)))\n", + " return t\n", + "\n", + "# We set the mean value to 100 and the standard deviation to 15\n", + "mu, sigma = 100, 15\n", + "datapoints = 10000\n", + "# We generate random numbers according to the normal distribution\n", + "x = mu + sigma*np.random.randn(datapoints)\n", + "# bootstrap returns the data sample \n", + "t = bootstrap(x, datapoints)" + ] + }, + { + "cell_type": "markdown", + "id": "26cf7fd4", + "metadata": { + "editable": true + }, + "source": [ + "We see that our new variance and from that the standard deviation, agrees with the central limit theorem." ] }, { "cell_type": "markdown", - "metadata": {}, + "id": "b2205188", + "metadata": { + "editable": true + }, "source": [ - "## Plotting the mean value for each group\n", - "\n", - "What we could attempt however is to plot the mean value for each group." 
+ "## Plotting the Histogram" ] }, { "cell_type": "code", "execution_count": 2, - "metadata": {}, - "outputs": [ - { - "data": { - "image/png": "iVBORw0KGgoAAAANSUhEUgAAAfQAAAFnCAYAAABQJLtnAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAgAElEQVR4nO3deVhU9eIG8HdYhm2AQSSXzDVFMw0xNVyue5laLjdXwDQVUTF3Tc20frmXGeaGmpqZ6VUUzcwll2uCGwpooZaKiCgiMOwwDPP9/YHOFRUHlOEwh/fzPPe5npnDmfd7CF7OrhBCCBAREZFZs5A6ABEREb04FjoREZEMsNCJiIhkgIVOREQkAyx0IiIiGWChExERyQALncq9Q4cOoVevXnB3d8fevXufeD8jIwPNmzdHx44dERgYKEHCkvnuu+/Qpk0bLF++vEw/959//kG/fv0wYMAADBs2rEw/u7h+//13dOvWDb6+vlJHITI7LHQq97p27YqZM2fC1tYWmzdvfuL93bt3Q6fT4f3338fHH38sQcKSCQgIQLt27cr8c9esWYPOnTtj27Zt6Nq1a5l/fnF07twZfn5+UscgMkssdDIb3bt3x6VLlxAVFWV4TQiBkydPokmTJhImMw93797FSy+9BAAYPHiwxGmIqLSx0MlsVK9eHZ07d8YPP/xgeO2PP/5AmzZtoFAoCs2bmZmJGTNmYNCgQRg4cCC2bt1qeO/cuXMYMmQIfH19MXDgQBw+fBgAoNVq4evrC3d3d2zZsgUjR45Ely5dcOjQoafmGT58ONzd3TFixAgAwN69e9GuXTuMGTMGADB37lwMHDgQvr6+mDRpEjIyMp66nGnTpqFJkyY4ffo0AGD06NFwd3dHXFycYZ7du3ejf//+8PHxweTJkw3Lun//PkaMGAFfX18MGjQIQUFBT/2MuXPnIjo6GkFBQfD19UV6ejoyMjIwc+ZMDBo0CAMGDMDatWshhHhiPYwYMaJQvkclJSUhICAA3t7ehdYlUHCoZPDgwfD19YW3tzfCw8MN7+l0Onz11VcYOHAgfHx8MH78eNy6dcvwvhACS5YsQf/+/TFw4EAkJSU9dVyZmZmYOnUqBg0ahMGDB2POnDnQ6XSG948fP4733nsPPj4++Oabb9CpUyf06tXL8EdhUev1aVavXo0ePXrgo48+QlBQENzd3eHr64v4+Hj0798f7u7uCA4OxpAhQ/Daa68hLi4O9+/fx7hx4+Dt7Y3+/ftj165dAICwsLBChxYOHTqETp064ZNPPgEAbNu2DZ06dUJAQAAmT56MAQMGYPDgwYXWEdETBJEZOHXqlAgMDBSnT58WjRs3Fvfu3RNCCDFp0iSRkZEhfHx8xNKlSw3zz5o1S0ydOlUIIUR6erro1KmTOHv2rBBCiGPHjomYmBjDe23bthVpaWmGr23QoIEICgoSQgixb98+8fbbbz81U15envDy8hLh4eGG10aNGiXy8/OFEEJs3LjR8HpgYKD45ptvDNPTp08XgYGBhumOHTuKU6dOFcpw69YtIYQQ586dEy1bthRJSUlCCCEWLlwoZs6cKYQQYtGiRWLNmjVCCCEyMzPFwIEDi1yHPj4+YufOnYbpGTNmiOnTpwshhMjOzhY9e/YUu3btKpRh+fLlQggh9uzZIy5duvTEMocNGyaWLVsmhBAiISFBtGzZ0pB79+7dIiUlRQghxK1bt0T79u0NX7dq1SoxdOhQodPphBBCfP7554ZsO3fuFB4eHiI2NlYIIcSIESPE6tWrnzqmlJQUsXv3bsP09OnTxfbt24UQQiQlJQkPDw/D9+fw4cPC3d3dsJ6ftV4fd+zYMdGmTRvDeBYsWCAaNGhgeP/WrVuiQYMGhvW3fv16kZCQID788EPD9zkpKUm0adNGnDlzxjBOHx8fwzICAwMN34+H082bNxcJCQmGdTZgwICn5iMSQghuoZNZadmyJerVq4eff/4ZsbGxcHNzg4ODQ6F59Ho9QkJC8MEHHwAAVCoVOnbsiD179gAA6tevj2+//RYDBw7E6NGjodFocOPGjULLeHiM293dHbdv335qFisrK/To0QO7d+8GAFy9ehUNGjSAhUXBj5WtrS0GDx4MHx8f7Nu3D3/++edzjXnXrl3o1KkTKlWqBAB47733sHfvXgghoFarceLECfz999+wt7fH999/X6xl6vV67N27F//+978NWbt3747g4OBC83Xp0sXwmY0bNy70XkJCAk6ePGlYzy+99BI8PT2xb98+AEDDhg0Ne0lmzJiBO3fuGLa0g4OD0atXL1haWgIARo0ahRYtWhiWXbt2bbzyyiuG5Ty6t+JRzs7OiI+Px6BBg+Dr64szZ84Y1vPx48fh6uoKT09PAAXH5+3t7Yu1Xh/322+/4V//+hfUarVh3qfp3LkzAOCjjz6CEAJhYWGGdVypUiV06NDhiXX8LC1atDAcJunVqxcuXLiA+Pj4Yn89VSxWUgcgKikfHx98++230Gg0GDJkyBPvJycnQ6vVYsmSJbC1tQUApKWloVGjRgCA6dOno0GDBli6dCkAoFOnTsjOzi60DJVKBQCwsbFBXl5ekVl69+6NoUOHYtasWQgJCUHfvn0BAKdPn8bChQuxd+9e1KhRA8HBwYbdrSV19+5dXLt2zbB7VqfToXLlykhJScHw4cNhZ2eHiRMnwtLSEv7+/nj33XeNLvPhOnpYZkBB4SQkJBSa7+F6KCoXULA+Hx7ySElJQYMGDQAUHDrw9vbG8OHDART8cfRwPd+9excuLi6GZVWpUqXIz1UqlUV+D3bt2oVt27Zh9+7dUKvVWL58ueEPsMTExEKfAcBQyA8zFLVeH10vAHDv3j00bNjQMO3s7PzUPI6Ojk+sn8fX8aVLl576tU/z6Oc8zJ6YmIjq1asXexlUcbDQyey8//77+Oqrr3D79m3UqlXrifcrVaoEpVKJ2bNno2nTpgCAvLw85OTkAACioqLw0UcfGeZ/VmEb07hxY1StWhWHDx9GTEwM6tWrZ/iMOnXqoEaNGgBQ6Lju01hbW0Or1QIo+OPjUdWqVcMrr7yCOXPmGF5LTk5GpUqVcO/ePfj6+sLX1xehoaEYNWoUGjdujJo1az7z8x6uo+TkZEPm5OTkJ4r1WapWrQoACAwMNJRWbm4udDodkpKScPv2bcOejsfXcbVq1ZCSkmKYTklJQWZmpmF9FVdUVBSaNm1qKLtH17ObmxuSk5MLza/RaAplKGq9Pu6ll14qtKxHl1OUh+snOTnZUMCPruNHv+fAk9/3xz/n4fpyc3Mz+tlUMXGXO5kdGxsbzJ8/HxMmTHjq+xYWFujdu7dhFzsArFq1yrBrvGbNmoiMjAQAXL58GYmJiS+U5/3338fChQvx1ltvGV6rVa
sWYmNjDb+E//jjj2cuo0aNGvj7778BFOwqflSfPn1w/PhxpKamAgCuX7+O0aNHAwCWLl2K6OhoAEDTpk1hbW391F3Gj3u4jh7u/s3JycH+/fsNexiKo0qVKmjbti1CQkIMr82ZMwenT5+GWq2Gk5OTYT2fOHHiiTGFhIQgPz8fAPD111/j8uXLxf7sh2rVqoXLly9Dq9VCp9MhLCzM8F779u2RnJxsOBnv999/R25ubqEMRa3Xx3Xr1g3//e9/Dd/P/fv3G832cP08XMcpKSk4duyYYRd8jRo1EBMTA61Wi9zc3KeedHjhwgXcu3cPQMEJfM2aNePWORXJcu7cuXOlDkH0LCdPnsTChQsRGRmJvLw8eHp6om7duqhcuTKAgrPEz549a/jl2Lx5c7Rq1QpHjx7F2rVrsXv3bjg5OWH06NGwsLCAu7s71q5diwMHDuDevXuIj49HeHg4WrRogSlTpuDWrVuIjIxEt27dMHbsWCQkJODChQvo1avXU/PVqFEDa9euxfz582FnZwcAqFOnDmJjY7Fs2TKcOXMGdnZ2OHv2LDQaDS5cuIB9+/YZjns3btwYL7/8MgIDA/H777+jQYMGOHz4MCIjI/Gvf/0L9evXh1qtxrx587B3716cPHkSX3zxBVxcXGBlZYVly5YhJCQE27Ztw9ChQ9G+ffsnMs6dOxehoaGIjo5GXFwc2rZti1atWuGPP/7AunXrEBwcjG7dumHw4MFQKBT46KOPDOuhRo0ahuPZj2vbti1+/vlnbN68GTt37sQbb7yBfv36wcLCAnXr1sWyZcvw3//+F0IInDt3DpGRkejatStatWqF69evY/ny5QgODkb9+vXh7e2NsLAwLF26FLGxscjJyUFGRgaCgoJw/fp1WFhYoFmzZoU+393dHeHh4Vi1ahXOnTsHOzs7nDlzBhYWFmjdujUaNWqE+fPnY//+/VCpVLhx4wa6du2Kl19+GdWqVStyvT6udu3ayMvLw4IFC3DkyBE0btwYJ06cwLhx46DRaBAQEICEhAScOXMGTZs2NWzlPyz0H3/8ESEhIRg5cqThOHu1atVw9epVrFixAlFRUahbty6OHDkCrVaLFi1a4MyZM3BwcMDZs2exdu1axMXFYfHixUXu7idSiOL8OU9EZIY0Gk2h4+bNmjXDjh07DIcZikun0yEnJ8dwbD8qKgr+/v4IDQ0t1byPeng+wMKFC032GSQv3OVORLI1duxYw272gwcPwtXV9annXRhz+/ZtfPbZZ4bpPXv2oG3btqWWk6g08KQ4IpItDw8PDB48GLa2tlAoFFi+fDmsrEr+a8/FxQVarRYDBw6EEAJVq1YtdDJdadu2bRt27dqF3NxcrFq1qshj+0SPMtku98TERCxbtgyXL1/Gzp07n3g/NzcXixYtQpUqVRATEwM/Pz/UqVPHFFGIiIhkz2S73MPDw9G5c+ciz7jdtGkTqlWrhlGjRhmu4yUiIqLnY7JC79at2xN38HrUsWPHDGesuru74/Lly8+8jzIREREVTbKT4pKSkgoVvkqlKvIBDI/iSflERERPkuykOFdXV2RmZhqmMzIy4OrqavTrFAoFEhPTTRmtTLi5OZr9OOQwBkAe45DDGACOozyRwxgAeYzDzc3R+Ewo4y10jUZj2K3eoUMHXLhwAQBw5coVNGzY8Jn3jSYiIqKimazQz5w5g5CQECQmJmLlypXIyclBUFAQfvrpJwDAkCFDEB8fj5UrV2LDhg2YN2+eqaIQERHJnlneKc7cd58A8tkNZO5jAOQxDjmMAeA4yhM5jAGQxzjK5S53IiIiMg0WOhERkQyw0ImIiGSAhU5ERCQDLHQiIiIZYKETERHJAAudiIhIBljoREREMsBCJyIikgEWOhERkQyw0ImIiGSAhU5ERCQDLHQiIiIZYKETERHJAAudiIhIBljoREREMsBCJyIikgEWOhERkQyw0ImIiGSAhU5ERCQDLHQiIiIZYKETERHJAAudiIhIBljoREREMsBCJyIikgEWOhERkQyw0ImIiGSAhU5ERCQDLHQiIiIZYKETERHJAAudiIhIBljoREREMsBCJyIikgEWOhERkQyw0ImIiGSAhU5ERCQDLHQiIiIZYKETERHJAAudiIhIBljoREREMsBCJyIikgEWOhERkQyw0ImIiGSAhU5ERCQDLHQiIiIZYKETERHJAAudiIhIBljoREREMsBCJyIikgEWOhERkQyw0ImIiGTAypQLDw0NxcGDB+Hq6gqFQoGAgIBC79+6dQuLFy9GkyZNEB0djZ49e6Jz586mjERERCRLJiv07OxszJkzB/v27YNSqcS4ceMQFhYGLy8vwzzr1q1D8+bNMXToUPz111+YMGECC52IiOg5mGyXe0REBKpXrw6lUgkA8PT0xLFjxwrNU7lyZSQnJwMAkpOT0bhxY1PFISKiikSrheWli0BqqtRJyozJttCTkpLg4OBgmFapVEhKSio0z7BhwzB27FgsWLAAUVFRGDNmjKniEBGRXAkByxvXYHU+HFYXwmF9PhxWl6KgyM0FfHyApSulTlgmTFborq6uyMzMNExnZGTA1dW10DyffPIJ+vXrh549eyI5ORlvv/02Dh8+DLVa/cxlu7k5miRzWZPDOOQwBkAe45DDGACOozwpt2NISADOnPnf/86eBVJS/ve+lRXwxhtAy5aAn1/5HUcpM1mhe3h4ID4+HlqtFkqlEufPn8fgwYOh0WhgZWUFlUqFO3fuwM3NDQDg5OQECwsL6PV6o8tOTEw3Vewy4+bmaPbjkMMYAHmMQw5jADiO8qTcjCEjA9ZREbA6Hw7rCwVb4JZxtwrNoqtbD7pOXaHzbI68Zs2ha9wEsLMDUI7G8QKK+weJyQrdzs4Oc+fOxZdffgkXFxe4u7vDy8sLixcvhlqthp+fH2bMmIEffvgBFy5cQFxcHCZOnIhKlSqZKhIREZVneXmwuvzX/3adXwiH5ZXLUDyyoaev7Ibcd96FrtmD8vZoBuHC3gAAhRBCSB2ipMz9ry1APn81mvsYAHmMQw5jADiO8sTkYxACFjE3DFvd1ufDYXUxEoqcnP/NYu+API9mBeXt2Rw6D0/oa7wCKBTF/hi5fC+Kw6TXoRMREQGAIjER1hHhhXadWzxy3FtYWkLXqDF0zZpD1/xN5DVrjvwG7oClpYSpzQsLnYiISldmJqwvRhbedR57s9As+bVqI6dDpwe7zt+ErklTwN5eosDywEInIqLnp9PBMvqvgq3uiPOwPh8Oy8t/FT7u7eqK3C5vF2x9ezZHnkdziMeueqIXx0InIqLiEQIWsTcLyvt8OKzPnys47p2d/b9Z7Oyga9Gq4IS1B2ed62vWKtFxb3o+LHQiInoqRVKS4bj3w13nFo/cIExYWCC/4WsFJ6w9OOs8v2GjguvAqcxxrRMRUSG2P2wAVn6LytevF3o9v2Yt5LRt/79d503eAB65IyhJi4VOREQFdDo4fDYD9uvWAI6O0Hbq8r9d5x7NIR7cCIzKJxY6ERFBkZYKJ79hUB45DF2j12D16z6kOvDENXNisqetE
RGRebC4GQN1z7ehPHIYuZ27QvPLQaB2baljUQmx0ImIKjCrM6fh8m4nWF2ORtZIf6Rt3gbh6CR1LHoO3OVORFRB2ezYBscJY4H8fKQvWoqcYSOkjkQvgIVORFTR6PWwXzwfDksXQ+/ohLR1m5DXsbPUqegFsdCJiCqS7Gw4fjwatiHByK9VG6k/bke+e0OpU1EpYKETEVUQioQEOH84ENbnw5HXygupG3/iLVhlhCfFERFVAJaXLsKlW0dYnw9HTv9B0OzYwzKXGRY6EZHMKQ/sh0vPt2F5Ow4Zs+YgfflqwMZG6lhUyljoRERyJQTsVn0HpyEDAaFH6vrNyB4/mQ9KkSkeQycikqO8PKg+mQy7zRuRX6Uq0jb/DJ2Hp9SpyIRY6EREMqPQpMBp+BAoTxxH3utNkfbjNuirvyx1LDIx7nInIpIRy+v/QP1uZyhPHEdutx7Q7PmNZV5BsNCJiGTC+uQJqLt1gtW1f5AVMAFpG7cAKpXUsaiMcJc7EZEM2P60Gaop4wEA6ctWIGewr8SJqKyx0ImIzJleD4f/mwP7Fd9C7+KCtO9/RF6bdlKnIgmw0ImIzFVGBpzGjITNb/ugq/cq0rZsR37dV6VORRJhoRMRmSGL+Ntw8hkA60tR0LZrj7T1P0CoXaSORRLiSXFERGbGKuI81O90hPWlKGT7DkXqz8Esc+IWOhGROVHuDYFTgB+Qk4OML+Yje9RY3vmNALDQiYjMgxCwC1wK1bzPIewdkPbDz9C+867UqagcYaETEZV3ublwnPwxbLdvRf7LNZC6eRvyX28idSoqZ1joRETlmCIpCc5DB8P6dBjyPJsjddPPEFWqSB2LyiGeFEdEVE5ZXr1S8Azz02HI6dUXml2/ssypSCx0IqJyyPro71B37wLLmzHInDQN6Wu+B+zspI5F5Rh3uRMRlTO2G9ZBNXMqYGmJtJVrkfvBAKkjkRlgoRMRlRc6HRzmzIT92tXQV66M1I1boWvZSupUZCZY6ERE5YAiPQ2OfsNg8/sh6NwbIvXH7dDXqi11LDIjLHQiIolZxN6Es09/WF2OhrZTF6QFbYBwcpY6FpkZnhRHRCQhq7On4dKtI6wuRyNrxCik/ridZU7PxWihh4aG4tChQwCAdevWYdy4cYiOjjZ5MCIiubPZuR3qvj2hSElB+oKvkDl/CWDFHaf0fIwW+vbt29GgQQNERUVh+/bt6N27N9asWVMW2YiI5EkI2C+aB6fRIyCUNkjd8h/kDPeTOhWZOaOFXqtWLdSqVQv79+/HkCFD0LlzZ1SvXr0sshERyU92NhxHDYPD14uQX7M2NL8eRl6nLlKnIhkwum8nNjYWv/32G3755ReEhIRAr9cjISGhLLIREcmKIiEBzkMHwTr8HPJavoXUjT9BVK4sdSySCaNb6L6+vggJCcHHH3+MSpUqYcmSJXj11VfLIhsRkWxY/nmp4Dau4eeQ028gNDv3ssypVBndQvf09MSqVasM09OnTzdpICIiuVEe+g2Ofh/BIjMDmTM/Q9b4yXyGOZU6o1voMTEx8PHxgbe3N7KysuDv74+4uLiyyEZEZN6EgN3q7+DkOxCKfB1S1/+ArAlTWOZkEkYLffny5Rg7dixq1qwJe3t7fPnll1i5cmVZZCMiMl95eVBNmQDVZzOhr+wGTch+aN/rLXUqkjGjhV6jRg14eXlBqVQCACpXrgxnZ970gIioSCkpcB74b9ht3oC815tCc+AodM2aS52KZM5ood+7dw85OTlQPNhFFB8fj5iYGFPnIiIySxbXrwFeXlCeOIbcbt2h2fMb9C/XkDoWVQBGT4rr3bs3unfvjtzcXJw9exZJSUlYtmxZWWQjIjIrVhHn4TygD5CSgqwxHyNz9ueApaXUsaiCMFrorVq1QnBwMCIiIiCEQLNmzaBWq8siGxGR2bAKPwvnAX2hyEgHgoKQ2Xug1JGoginWw1nUajU6dOiAjh07Qq1W4z//+Y+pcxERmQ2rM6fh3K83FBnpSF8RBIwcKXUkqoCMbqEPGTLkiddu3ryJfv36mSQQEZE5sT4VCqdBH0CRk420oA3Qvt9H6khUQRktdCcnJ0Op63Q6REdHIzc31+TBiIjKO+uTJ+Ds3Q/QapG2dhO0Pd+XOhJVYEYLfcmSJbCzszNMt27dGosWLSrWwkNDQ3Hw4EG4urpCoVAgICCg0PtCCGzevBkAcPv2baSlpWHBggUlyU9EJAnr40fhPGQgoNMhbf1maN/tIXUkquCMFvqlS5cM/9br9UhMTMSFCxeMLjg7Oxtz5szBvn37oFQqMW7cOISFhcHLy8swT0hICJycnNC7d8HNFi5fvvw8YyAiKlPWRw7DeehgQK9H2sYt0HbtJnUkIuOFPnnyZNSuXRtCCCgUCri5ueGTTz4xuuCIiAhUr17dcEMaT09PHDt2rFCh7927F+3atcMPP/yA+/fv87g8EZV7ysMH4DTUG1AokPrDVuR16ip1JCIAxSj0SZMmGbagSyIpKQkODg6GaZVKhaSkpELzxMfHIyMjAwEBAbhx4wZGjBiBX3/9FZZGrtt0c3MscZ7ySA7jkMMYAHmMQw5jAMr5OPbuBT4cXHBt+d69UHcp+jnm5XocxSSHMQDyGYcxxbqxzOPWrl2LkUYuy3B1dUVmZqZhOiMjA66uroXmUalUeOONNwAAderUQUZGBu7cuYMaNZ59V6XExHRjscs9NzdHsx+HHMYAyGMcchgDUL7Hody3F04jPwSUSqT+uB15b7QCishansdRXHIYAyCPcRT3D5IiC71Tp06G270+SgiBtLQ0o4Xu4eGB+Ph4aLVaKJVKnD9/HoMHD4ZGo4GVlRVUKhW8vLxw69YtAAWFn5+fDzc3t2IFJyIqK8o9u+A06iPAxhapW3cgz6uN1JGInlBkoXt6emLixIlPvC6EwHfffWd0wXZ2dpg7dy6+/PJLuLi4wN3dHV5eXli8eDHUajX8/PwwcuRILFmyBKtXr0ZsbCwWLVoEGxubFxsREVEpstm1A45jRkLY2SN1607oWr0ldSSip1IIIcTT3sjOzi50udqjrly5And3d5MGexZz330CyGc3kLmPAZDHOOQwBqD8jcPmPz/DcZw/hIMKqduCoXuzZbG+rryN43nIYQyAPMbxwrvcHy3zqKgoxMTEQK/XAwD27NmD77///gUjEhGVXzY/b4Hj+DEQTs5I3b6Ljz+lcs/oSXHLly/HpUuXcPv2bTRp0gTx8fFITzfvv3aIiJ7F9sdNUE3+GEKtRup/QqBr6iF1JCKjjD6cJTU1FWvWrEHr1q2xYMECbNq0Ca1atSqLbEREZc5243o4ThoH4eICzc5fWOZkNowW+sMbwzx6CdqdO3dMl4iISCK269fAcdpE6CtXhiZ4H/JfbyJ1JKJiM7rL/fr16zh06BDq16+Pvn37QqVSwdrauiyyERGVGbvV30H12Uzo3V6CJvgX5Ls3lDoSUYkYLfQVK1YAACwtLeHm5gaNRoNevXqZPBgRUVmx++5bqL6YjfwqVZEa/Avy6zeQOhJRiRnd5f7pp58a/t2jRw94e3tD
pVKZNBQRUVmx+/brgjKvVh2pIb+yzMlsGd1Cj42NxaxZs+Dm5oY+ffqgbt26ZZGLiMjk7L9aCIfF85Ff4xVodu6Fvg5/v5H5MlroX331FapVq4a7d+9i586duHHjBtq0aYM+ffqURT4iotInBOwXzYPD0sXIr1kLmuBfoK9ZS+pURC/E6C53jUYDoOBe62lpaQgLC8O+fftMHoyIyCSEgMP8LwrKvFZtaHb/yjInWTC6hT579mxYWFggKSkJffv2xY4dO1CtWrWyyEZEVLqEgMPns2G/MhC6uvWQGvwL9NVfljoVUakwWuh6vR6TJk1C69atyyIPEZFpCAGHz2bAfs1K6F6tX1DmVblxQvJhtNCXLFmCevXqlUUWIiLTEAKqmVNhtz4IOveG0OzYC1GlitSpiEqV0UJnmRORWdProZo+GXab1kPX6LWCMndzkzoVUakzWuhERGZLr4dqynjY/bgJusZNoNmxB8LVVepURCbBQiciecrPh+PEANj+vAV5TT2Q+p/dEC6VpE5FZDJGL1sDgNzcXNy9exfx8fGIj4/HjBkzTJ2LiOj56XRwHOdfUObNPJG6I4RlTrJndAt92Yx6U1oAACAASURBVLJl2LRpE9RqNRQKBQAgLS0NCxYsMHk4IqIS0+ngOHYkbHftRF7zFkjdFgzh5Cx1KiKTM1roR44cwR9//AEHBwfDaz///LNJQxERPZe8PDj5D4fN3t3Ia/kWUrfugHB0kjoVUZkwWuivvfYabGxsCr1WqxbvqkRE5YxWCye/YbD5dS+0Xm2QuuU/AB8kRRWI0ULPzMxEz5490bhxYyiVSgBAVFQUb/9KROVHbi6cRgyBzYH90Lb9F1I3bwMe2atIVBEYLfQbN25g1KhRhV67e/euyQIREZVITg6cPvKBzeGD0LbviNRNWwF7e6lTEZU5o4X+f//3f2jWrFmh1zw8PEwWiIio2LKz4fzhICiPHYG2UxekbtgC2NlJnYpIEkYvW3u8zAFwdzsRSS8rC84+A6A8dgS5Xd9B6safWOZUoRndQg8PD8fs2bNx8+ZN6PV6CCGgUCgQEBBQFvmIiJ6UkQFn3wFQnjyB3G49kLZ2I/DYybtEFY3RLfStW7fixx9/hLe3N6Kjo3HkyBEMHz68LLIRET1BkZEO9aB/F5R5j/eRtm4Ty5wIxSj06tWro1KlStDr9YbpnJwckwcjInqcIj0Nzv37wPp0GHJ69UVa0AbgwdU3RBVdsc5yv3v3LvLz87Fx40ao1WqcP3++LLIRERkoUjVwHtAH1ufDkdO3H9K/WwNY8XEURA8Z/Wnw8fHBvXv34O/vj5kzZ0Kj0WDKlCllkY2ICACgSEku2DKPvICc/oOQ/u1KwNJS6lhE5YrRQm/VqpXh3+vXrzdpGCKixymSk+D8QS9YX4pC9mBfZHwdyDInegqjx9BjYmLg4+MDb29vZGdnw9/fH3FxcWWRjYgqOMX9+1D36VlQ5r7DkLF0OcucqAhGC3358uUYO3YsatasCTs7O3z55ZdYuXJlWWQjogpMce8e1H17wCr6T2QPG4GMJd8AFsV64jNRhWT0p6NGjRrw8vIy3Me9cuXKcHbmowiJyHQsEu5C3ac7rC5HI2ukPzIWfs0yJzLC6E/IvXv3kJOTY3gWenx8PGJiYkydi4gqKIs78XDu3R1Wf19Fln8AMr9cBDz4/UNERTN6Ulzv3r3RvXt35Obm4uzZs0hKSsKyZcvKIhsRVTS3bkHd611YxtxA1riJyPx0LsucqJiKdZZ7cHAwIiIiIIRAs2bNoFaryyIbEVUgFrdigQ/eg2XMDWROmoqs6Z+yzIlKoFgHpdRqNTp06ICOHTtCrVZj06ZNps5FRBWI9alQqN/tDNy4gcypM5D1yWyWOVEJGd1CP378OFavXo379+8bHs6SlpaGDz/8sCzyEZGcCQG7NSvg8PnsgulvvkGWN58VQfQ8jBb6woULMXv2bNSsWRMKhQJCCHz33XdlkY2IZEyRkQ7VxHGwDQmG3u0lpK3dCHWvd4HEdKmjEZklo4X+6quvonXr1oVeGzNmjMkCEZH8WV69AqePfGB19QryWr6FtHWboK9aTepYRGbNaKEPGzYMc+bMQePGjQ3Xou/Zswfff/+9ycMRkfwo9+yC4/ixsMjMQNaoMcj87P8Aa2upYxGZPaOFvmrVKmRlZSE3N9dwLXpCQoLJgxGRzOTlweGLz2C/ZgWEvQPSgjYgt/e/pU5FJBtGCz0jIwNbt24t9Nrx48dNFoiI5Mci4S4cRw6F8lQodK/WR9qGLch3byh1LCJZMXrZWrt27RAbG1votceniYiKYn0qFOrO7aA8FYrc93pDc+Aoy5zIBIxuoe/YsQMrV66Ei4sLlEql4bI1X1/fsshHRObqsUvSMj6fj2z/sby+nMhEjBZ61apVsXnzZsM0L1sjImOeuCRt3SbkebWROhaRrBkt9PXr18POzq7QawsWLDBZICIyb5ZXr8BpmDes/r7KS9KIypDRY+iPlzkAw9nuRESPUu7ZBfU7HQuelDZqDDS79rHMicqI0S10IiKjHr8kbe1G5PbqK3UqogqFhU5EL6TQJWn1GyDt+x95FjuRBIwWenx8PC5evAiFQoHXX38d1atXL4tcRGQGrE+FwnHEh7C8l4Dc93oj/dsVECpHqWMRVUjPLPR58+Zhy5YtsLe3hxAC2dnZ8Pb2xqxZs8oqHxGVR0LAbvUKOHzBS9KIyosiT4rbtm0brl27hn379uHcuXMIDw/HL7/8gmvXruHnn38u1sJDQ0Mxd+5cLF++/JmXuu3Zswfu7u7IzMws+QiIqEwpMtLhOHIoVHNmQu9aGanBvyB7dADLnEhiRRb6kSNHEBgYiDp16hheq1u3LgIDA3H06FGjC87OzsacOXMwc+ZMjBs3DleuXEFYWNgT8127dg3Xrl17zvhEVJYsr16B+p2OsN2zC3mtvKD5/QSvLycqJ4osdJVKBZVK9dTX1Wq10QVHRESgevXqhie0eXp64tixY4Xmyc7Oxrp16zB27NgSxiaislb4krSx0AT/An2VqlLHIqIHijyG7uhY9Iktz3rvoaSkJDg4OBimVSoVkpKSCs3zzTffYMyYMYbSLy43N3mcdCOHcchhDIA8xmGyMeTlAdOnA998Azg4ANu2wb5/f9ib5tNk8b0A5DEOOYwBkM84jCmy0Hfv3o3Dhw8/9b3MzEx8+umnz1ywq6troWPiGRkZcHV1NUzfuXMHaWlp2L9/v+G1DRs2oH379mjSpMkzl52YmP7M982Bm5uj2Y9DDmMA5DEOU43BIuEunEZ8COvTYQWXpG3YgvwG7oCJ1pccvheAPMYhhzEA8hhHcf8gKbLQvby8MGzYsCdeF0IUurd7UTw8PBAfHw+tVgulUonz589j8ODB0Gg0sLKyQrVq1bBw4ULD/F9//TWGDRtWaKueiKTz6CVpOe/3Qcay73hJGlE5VmShT506FXXr1n3qey+99JLRBdvZ2WHu3Ln48ssv4eLiAnd3d3h5eWHx4sVQq9Xw8/MDACQnJxvOml+3bh0GDhyIKlW
qPM9YiKg08JI0IrOkEEKIp71x9OhRdOzY8alfdPz4cbRv396kwZ7F3HefAPLZDWTuYwDkMY7SGoMiIx2qCQGw3bML+S9VQfq6Tch7q3UpJCweOXwvAHmMQw5jAOQxjhfe5b5ixQr8+eefT33vv//9r6SFTkSl79GnpGnfao30tRt5FjuRGSmy0NPS0nD9+nUABZegeXh4FHqPiOTDJiQYjuPHQpGViaxRY5H52ReAtbXUsYioBIosdD8/P3zwwQcAgEmTJmHp0qWG94KDg02fjIhMLy8PDl/Mhv2alXxKGpGZK7LQH5Y58OTzz/v25Q88kbkr8pI0IjJLRd4p7lm3Y324K56IzJP1qVCoO7eD9ekw5LzfB5oDR1nmRGauyC30JUuWwNfXF0IIJCYm4o8//jC899NPP2HlypVlEpCIStHjl6R9MR/Zo3hJGpEcFFnoZ86cwdWrVw3Tn332meHfPCmOyPwoMtLhOH4sbPbuluSSNCIyrSIL3cfHB5MmTXrqe99++63JAhFR6eMlaUTyV+Qx9KLKHADGjx9vkjBEVPpsQoLh8naHgqek+QcgdedeljmRDBVZ6FFRUViwYAHOnDljeO3mzZuG27QSUTmXlweH2Z/AaeRQCIUCqes2IfOL+by+nEimitzl/sMPP6B27dp47bXXDK+5uroiMjISubm5+PDDD8skIBGVXKFL0hq4I+37H3kWO5HMFbmFrtPpEBAQAJVKZXhNpVJhwYIFiIyMLJNwRFRy1mEn4dKpbcElab36QvPbEZY5UQVQZKE7OTkV+UUuLi4mCUNEL0AI2K1cDue+PaFISUbG/y1AetAGPvKUqIIocpd7ZmZmkV+Um5trkjBE9JzS0+E04kPDJWlp636A7i0vqVMRURkqcgvd3d0d33zzDbRareG13NxcBAYGolGjRmUSjoiMsz55AmjZEjZ7d0P7Vmtofj/BMieqgIos9JEjR0Kj0eCtt95Cz5490bNnT7Ru3RppaWnw9vYuy4xE9BTWJ0/AuXd3qPv0AC5f5iVpRBVckbvcFQoFPv/8c/j5+eHixYsAgKZNm6J69eplFo6InmR98gTslyyAMrTgdsy5Xd6Gzbz/Q2Yd7jkjqsiKLPSHXn75Zbz88stlkYWInuFpRZ415RPoPN+Em5sjkJgucUIikpLRQiciaT2ryImIHmKhE5VTLHIiKgkWOlE5wyInoufBQicqJ6xPnoD9VwuhPHkCAIuciEqGhU4kMRY5EZUGFjqRRKxD/yjYtc4iJ6JSwEInKmNPFHnnrgVF3ryFxMmIyJyx0InKCIuciEyJhU5kYixyIioLLHQiE7EO/aPgZLc//guARU5EpsVCJyplLHIikgILnaiUWIedLNi1ziInIgmw0IleEIuciMoDFjrRc3q8yLWduiBzyifQvdlS4mREVBGx0IlKiEVOROURC52omFjkRFSesdCJjLAOO1lw1vqJ4wBY5ERUPrHQiYrAIicic8JCJ3qM9anQgl3rLHIiMiMsdKIHWOREZM5Y6EQnTsB51uz/FXnHzgVF3qKVxMGIiIqPhU4VlkXcLThO/hg4+juUYJETkXljoVOFZLNrB1RTJ8IiLRXo0gUpE6ezyInIrFlIHYCoLCnSUuE4egScRn0EhU6H9G++Aw4eZJkTkdnjFjpVGNZhJ+E41g+WcbeQ59kc6SvXIr/uq3BUKKSORkT0wriFTvKn1cJh3udw7t0dFvG3kTl5OjR7DyK/7qtSJyMiKjXcQidZs/z7KhzHjIR15AXk16qNtJVruXudiGSJW+gkT0LAduN6uHRpB+vIC8ge5IOUoydZ5kQkW9xCJ9lRJCbCceJY2Bz8DXq1GmnfrYH2vd5SxyIiMikWOsmK8uB+OE4IgMX9RGj/1RHpy1dBX6261LGIiEyOhU7ykJUF1dxZsNu4HkKpRMYX85HtNwaw4FElIqoYWOhk9qwiL8Bx9AhY/fM3dI1eQ9qq9ch/rbHUsYiIyhQ3X8h85efD7tuvoX63M6z++RtZo8Yi5cAxljkRVUjcQiezZHErFo5j/aA8FYr8qtWQHrgKeR06SR2LiEgyJi300NBQHDx4EK6urlAoFAgICCj0flBQEO7fv4/KlSvjzz//xMcff4x69eqZMhLJgM2ObVBNnwyL9DTk9uyF9K+WQVRylToWEZGkTFbo2dnZmDNnDvbt2welUolx48YhLCwMXl5ehnmysrIwY8YMKBQK/Prrr1iyZAlWr15tqkhk5hSpGqimT4Jt8A7oHVRIC1yF3AGDAd66lYjIdMfQIyIiUL16dSiVSgCAp6cnjh07VmieCRMmQPHgl7Fer4e9vb2p4pCZsz55Ai4dWsM2eAfy3myJlCN/IHegN8uciOgBk22hJyUlwcHBwTCtUqmQlJT01Hm1Wi127dqFOXPmFGvZbm6OpZJRanIYh8nHoNUCs2cDS5YUXIL2+eewnjkTrlal+58uvxflB8dRfshhDIB8xmGMyQrd1dUVmZmZhumMjAy4uj55nFOr1WLu3LmYOHEiatasWaxlJyaml1pOqbi5OZr9OEw9BsurV+DoPxzWl6KQX7tOwX3Y32wJpGSX6ufwe1F+cBzlhxzGAMhjHMX9g8Rku9w9PDwQHx8PrVYLADh//jw6dOgAjUaDjIwMAEBOTg7mzJmDYcOG4fXXX8eBAwdMFYfMiRCwXR9UcB/2S1HI9h6C5CMnC8qciIieymRb6HZ2dpg7dy6+/PJLuLi4wN3dHV5eXli8eDHUajX8/PwwZcoU/P3334iLiwNQcJLcO++8Y6pIZAYUCQlwnDAGNr8fgr5SJaStWg9tj/ekjkVEVO4phBBC6hAlZe67TwD57AYqzTEof/sVjhPHwiIpCdoOnZAeuAr6qtVKbflF4fei/OA4yg85jAGQxziKu8udN5Yh6WVmQvXZTNht3gBhY4OMeYuQPXwU78NORFQCLHSSlNWF8IL7sF+/Bt1rryNt9XrkN2wkdSwiIrPDTSCSRn4+7L9ZAnWPrrC6fg1ZYz5GyoGjLHMioufELXQqcxY3Y+A01g/WZ04hv1p1pH+3Bnnt2ksdi4jIrHELncqOELDZ9hNcOraB9ZlTyHm/D1KOhbLMiYhKAbfQqUwoUpKhmjYJtiHB0KsckbZ8NXL7D+KtW4mISgkLnUzO+sRxOAaMguWdeOS1aIW0lWuhr1Vb6lhERLLCXe5kOrm5cJgzC+p/vweLewnI/ORTaEL2s8yJiEyAW+hkEpaXo+HkPxxWf12Crm49pK9cC53nm1LHIiKSLW6hU+nS62G3dhVcuv4LVn9dQrbvUKQcPsEyJyIyMW6hU6mxSLgLx49HQ3n0d+hdXZEWtBHad3tIHYuIqEJgoVOpUO7bC8fJ42CRnAxtpy5I+3YVRJUqUsciIqowWOj0YjIyoJr9Cey2/ABha4v0BUuQ85EfL0cjIipjLHR6fqdPo9LAQbCMuYG815sifdU65Ls3lDoVEVGFxEKn4hECFnfiYRUZAavIC7CKigCO/g4LvR5ZAROQOX0WYGMjdUoiogqLhU5PEgIW8bcLyjvqAqwiI2AdGQ
GL+4mF53N3R+qCr5HX9l/S5CQiIgMWekUnBCzibj0o7whYRxX8v8X9+4Vmy6/xCnK7vwfdGx7Ie8MDuiYeqPxaXeQlpksUnIiIHsVCr0iEgMWt2IIt7qgHu84vRsIiKanQbPk1ayG3Z5sHxf0GdG80g3B1lSg0EREVBwtdroSAxc2YB1vdkYbj3hYpKYVmy69ZG7mt2xWUd1MP6Jq+AVGJ5U1EZG5Y6HIgBCxibjzY6n7wv4sRsNBoCs2WX6s2ctp1KCjuNx6Ut0sliUITEVFpYqGbG70eljHXYRUVaTjubRUVCYvUwuWtq1MX2g6doGvySHmrXSQKTUREpsZCL8/0eljeuPa/re6oCFhdjIJFWmqh2XR160HbqTN0TZsVlHeTphDOaolCExGRFFjo5YVeD8vr1wqOdT9a3ulphWbT1XsV2i5dC5e3k7NEoYmIqLxgoUshPx+IjobN0T8KijvyQXlnZhhmEQoF8l+tD+3b3QqK+41m0L3eBMLRScLgRERUXrHQy5hCkwKX9l7AnXg8rGahUCC/fgNoH56s9rC8VY6SZiUiIvPBQi9jQmkD3ZstYal2RIZ7Y+Q1LShvqFRSRyMiIjPGQi9r9vZIW/8D3Nwckc27rBERUSmxkDoAERERvTgWOhERkQyw0ImIiGSAhU5ERCQDLHQiIiIZYKETERHJAAudiIhIBljoREREMsBCJyIikgEWOhERkQyw0ImIiGSAhU5ERCQDLHQiIiIZYKETERHJAAudiIhIBljoREREMsBCJyIikgEWOhERkQyw0ImIiGSAhU5ERCQDLHQiIiIZYKETERHJAAudiIhIBljoREREMsBCJyIikgErUy48NDQUBw8ehKurKxQKBQICAgq9n5ubi0WLFqFKlSqIiYmBn58f6tSpY8pIREREsmSyLfTs7GzMmTMHM2fOxLhx43DlyhWEhYUVmmfTpk2oVq0aRo0ahaFDh2LWrFmmikNERCRrJiv0iIgIVK9eHUqlEgDg6emJY8eOFZrn2LFjaNasGQDA3d0dly9fRkZGhqkiERERyZbJdrknJSXBwcHBMK1SqZCUlFSseVQq1TOX7ebmWLphJSKHcchhDIA8xiGHMQAcR3kihzEA8hmHMSbbQnd1dUVmZqZhOiMjA66uriWeh4iIiIwzWaF7eHggPj4eWq0WAHD+/Hl06NABGo3GsFu9Q4cOuHDhAgDgypUraNiwodGtcyIiInqSQgghTLXwkydP4sCBA3BxcYG1tTUCAgKwePFiqNVq+Pn5IScnB4sWLYKbmxtiY2MxatQonuVORET0HExa6ERERFQ2eGMZIiIiGWChExERyYBJ7xRX2ozdec4cJCYmYtmyZbh8+TJ27twpdZznEhsbi2XLluG1117D3bt3oVarze57odfr4e/vj6ZNmyIvLw+3bt3C/PnzYWtrK3W055KTk4N+/fqhbdu2mD59utRxnkv//v1hY2MDALCwsMCmTZskTlRy169fx759+2BjY4OzZ89i3LhxaNq0qdSxSiQuLg5Dhw5FtWrVABRcfeTu7o6FCxdKnKxk1q1bh9u3b8PFxQU3b97EvHnzzPLne+PGjUhISICdnR20Wi0mT54MhULx9JmFmcjKyhJdunQRubm5QgghAgICRGhoqMSpSm7//v3i999/F3369JE6ynOLjIwUhw4dMky/++674uLFixImKrn8/HyxYsUKw7S/v78ICQmRMNGLWbBggZg2bZpYuHCh1FGeW2BgoNQRXohOpxMjR44U+fn5QgghEhISRFJSksSpSi45OVmcPHnSMP3tt9+Ks2fPSpio5O7duydatGhh+F6Y68/3X3/9Jd5//33DdEBAgDh48GCR85vNLvfi3HnOHHTr1q3QzXTMUdOmTdGlSxfDtF6vh52dnYSJSs7CwgJjxowBAOh0OiQkJJjtFRa7d++Gp6cnatSoIXWUF3L16lUEBQVh+fLlZvmzffHiRQghsHnzZqxZswZHjx6Fi4uL1LFKzMXFBa1btwYAaLVaXLp0CW+++abEqUrGzs4O1tbWhkuks7KyUL9+fYlTlVxMTIxhTwkA1KhR44lbqD/KbHa5F+fOc1T2Dh06hLZt26JevXpSR3kuJ06cwMaNG9GhQwc0adJE6jgl9s8//+D69euYNGkSrly5InWcFzJy5Eg0bdoU+fn58Pb2hoODA1q0aCF1rGKLj49HREQEli5dCkdHR0yZMgXW1tbo27ev1NGe2969e9GjRw+pY5SYSqXC1KlTMXHiRLi5uaFq1aqoWbOm1LFKrEmTJli6dClyc3OhVCpx6dKlQgX/OLPZQudd5cqfU6dO4fTp05g5c6bUUZ5bu3btsH79esTFxWHLli1SxymxQ4cOQalUIigoCOHh4YiKisLGjRuljvVcHh5rtrS0xJtvvonTp09LnKhkHBwcULduXTg6FtxmtHnz5jhz5ozEqV7Mb7/9hu7du0sdo8Sio6Oxfv16rFmzBgsXLoSLiwtWrFghdawSq1GjBr744gusXLkSmzZtQv369Z9Z6Gazhf7oneeUSiXOnz+PwYMHSx2rwjp27BjOnTuHWbNm4d69e4iPjzc8aMcc/PPPP4iLi0OHDh0AFPzgxMXFSRvqOYwePdrw79zcXGRlZWHo0KHSBXpO165dw/nz59GvXz8AwM2bN9G1a1eJU5XMG2+8AY1Gg/z8fFhaWiI+Ph61a9eWOtZzO3XqFJo1awZra2upo5RYQkIC1Go1rKwKKs7NzQ137tyRONXzUavVmDhxIgBgypQp8Pb2LnJey7lz584to1wvxNraGvXq1cOGDRsQERGBl156Cf/+97+ljlViZ86cQUhICKKjo5GTk4MmTZoY/qMzF5cuXcKoUaMghMCuXbuwe/duvPLKK2jUqJHU0YotIyMDa9euRUxMDMLCwnDt2jWMHz/ebM9vOHDgAH777TfEx8fD1tYWDRo0kDpSieTl5WHTpk2IiYnB0aNHYW9vj+HDhxd9Nm85ZGtri6pVq2Lr1q24ePEiEhMTMX78eLP7+X5o6dKlGDNmjNmdHwMAr7zyCi5duoRTp04hMjIS0dHRmDBhgln+fE+YMAGxsbEIDw/Hm2++iZYtWxY5L+8UR0REJANmcwydiIiIisZCJyIikgEWOhERkQyw0ImIiGSAhU5ERCQDLHQiGfD398dnn30mdQwikhALncjMJSYm4s6dO9i3bx+ys7OljkNEEuF16ERmLigoCJ6enpg6dSrGjx+P3r17AwA2b96MgwcPon79+lAoFDh48CDGjBmDQYMG4ddff0VoaCjUajUSEhIwbdo0uLm5PbHsopYhhMDKlSvx3nvvIS4uDmfOnMG8efPQsGFDLF26FFWrVsWdO3fwwQcfoE2bNti4cSMCAwOxZ88eWFhYYObMmahatSoWLlyIrVu3YuXKlXjnnXeg1+vx999/4+2334avr29Zr0oi82bSZ78Rkcn5+/sLIYRYtmyZ8PHxEUIIcfnyZdG6dWuRnZ0thBBi6dKlhveuXbsmunfvbni05Pbt28XUqVOfWO6zliGEENOnTxfjxo0TQ
ggRHh4u/vrrLzFgwADDo3U1Go1o3bq1SExMFEII0bFjR3Hr1i0hhBA7d+4U06dPNyzLx8dHLFu2TAhR8Kjk1q1bi8uXL5fG6iGqMLjLnciMnTt3Dh4eHgCAvn374ty5c7h58yZOnz6N119/Hba2tgBQ6PGXoaGhyM3Nxdy5c/HZZ5/h1KlTyMnJeWLZz1rGQw8fs+np6YlXXnkFFy5cgKenJwDA2dkZVapUwblz54o1lodfZ2dnh9dff93sHs5CJDXzvMkwEQEoeBa6Xq/HvHnzABQ8hGLnzp1wdXUt8j7oQgjUrl0bX3zxheG1R59k+Oh8xu6lrlQqn3jt8a95dFo8OMKn0+meudzifDYRFcYtdCIzlZmZiZycHMyfPx+zZs3CrFmzMG3aNOzatQutWrXCxYsXDVve4eHhhq9r3bo1Ll26hIyMDADAX3/9hQULFjyx/Gct42lUKhU8PT0N86WmpiIhIQHNmzcHUPDHxr179wAUPN7ycREREQCA7Oxs/Pnnn898CAURPYlb6ERmKCcnB5MnT0ZmZiYSEhJQpUoVAMDff/+Ne/fuISgoCH5+fhgxYgQaNWoECwsLw2Mw69Wrh9mzZ2PatGmoWbMm0tLSMHXq1Cc+o2HDhvD393/qMo4ePYrIyEjcvXsXzs7O6Ny5MwBg8eLF+Prrr3H27FncvXsXixYtQuXKlQEAI0eOxFdffYVmzZrBxsYGp06dwoEDB/DOO+8AALKysjBv3jxER0fD398f7u7uJl+PRHLCs9yJZOr48eNo3749AGDLli24ffs2pk2bVubLKA5fX18EBASgVatWpb5sooqCW+hEMrV9+3acOHECCoUCqamp+PTTTyVZhjFbtmzBjRs3e2oe2gAAAEVJREFUsGHDBtSqVQtVq1Yt9c8gqgi4hU5ERCQDPCmOiIhIBljoREREMsBCJyIikgEWOhERkQyw0ImIiGSAhU5ERCQD/w+esXslqsifDwAAAABJRU5ErkJggg==\n", - "text/plain": [ - "
    " - ] - }, - "metadata": {}, - "output_type": "display_data" + "id": "be6f7ced", + "metadata": { + "collapsed": false, + "editable": true, + "jupyter": { + "outputs_hidden": false } - ], - "source": [ - "agegroupmean = np.array([0.1, 0.133, 0.250, 0.333, 0.462, 0.625, 0.765, 0.800])\n", - "group = np.array([1, 2, 3, 4, 5, 6, 7, 8])\n", - "plt.plot(group, agegroupmean, \"r-\")\n", - "plt.axis([0,9,0, 1.0])\n", - "plt.xlabel(r'Age group')\n", - "plt.ylabel(r'CHD mean values')\n", - "plt.title(r'Mean values for each age group')\n", - "plt.show()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "We are now trying to find a function $f(y\\vert x)$, that is a function which gives us an expected value for the output $y$ with a given input $x$.\n", - "In standard linear regression with a linear dependence on $x$, we would write this in terms of our model" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, + }, + "outputs": [], "source": [ - "$$\n", - "f(y_i\\vert x_i)=\\beta_0+\\beta_1 x_i.\n", - "$$" + "# the histogram of the bootstrapped data (normalized data if density = True)\n", + "n, binsboot, patches = plt.hist(t, 50, density=True, facecolor='red', alpha=0.75)\n", + "# add a 'best fit' line \n", + "y = norm.pdf(binsboot, np.mean(t), np.std(t))\n", + "lt = plt.plot(binsboot, y, 'b', linewidth=1)\n", + "plt.xlabel('x')\n", + "plt.ylabel('Probability')\n", + "plt.grid(True)\n", + "plt.show()" ] }, { "cell_type": "markdown", - "metadata": {}, + "id": "69bcb406", + "metadata": { + "editable": true + }, "source": [ - "This expression implies however that $f(y_i\\vert x_i)$ could take any\n", - "value from minus infinity to plus infinity. If we however let\n", - "$f(y\\vert y)$ be represented by the mean value, the above example\n", - "shows us that we can constrain the function to take values between\n", - "zero and one, that is we have $0 \\le f(y_i\\vert x_i) \\le 1$. Looking\n", - "at our last curve we see also that it has an S-shaped form. This leads\n", - "us to a very popular model for the function $f$, namely the so-called\n", - "Sigmoid function or logistic model. We will consider this function as\n", - "representing the probability for finding a value of $y_i$ with a given\n", - "$x_i$.\n", + "## The bias-variance tradeoff\n", "\n", - "## The logistic function\n", + "We will discuss the bias-variance tradeoff in the context of\n", + "continuous predictions such as regression. However, many of the\n", + "intuitions and ideas discussed here also carry over to classification\n", + "tasks. Consider a dataset $\\mathcal{D}$ consisting of the data\n", + "$\\mathbf{X}_\\mathcal{D}=\\{(y_j, \\boldsymbol{x}_j), j=0\\ldots n-1\\}$. \n", "\n", - "Another widely studied model, is the so-called \n", - "perceptron model, which is an example of a \"hard classification\" model. We\n", - "will encounter this model when we discuss neural networks as\n", - "well. Each datapoint is deterministically assigned to a category (i.e\n", - "$y_i=0$ or $y_i=1$). In many cases, and the coronary heart disease data forms one of many such examples, it is favorable to have a \"soft\"\n", ->>>>>>> 9b0e2e75096cc1acee65bfac25f4eff818140252 - "classifier that outputs the probability of a given category rather\n", - "than a single value. For example, given $x_i$, the classifier\n", - "outputs the probability of being in a category $k$. Logistic regression\n", - "is the most common example of a so-called soft classifier. 
In logistic\n", - "regression, the probability that a data point $x_i$\n", - "belongs to a category $y_i=\\{0,1\\}$ is given by the so-called logit function (or Sigmoid) which is meant to represent the likelihood for a given event," + "Let us assume that the true data is generated from a noisy model" ] }, { "cell_type": "markdown", - "metadata": {}, + "id": "ce87dc4f", + "metadata": { + "editable": true + }, "source": [ "$$\n", - "p(t) = \\frac{1}{1+\\mathrm \\exp{-t}}=\\frac{\\exp{t}}{1+\\mathrm \\exp{t}}.\n", + "\\boldsymbol{y}=f(\\boldsymbol{x}) + \\boldsymbol{\\epsilon}\n", "$$" ] }, { "cell_type": "markdown", - "metadata": {}, - "source": [ - "Note that $1-p(t)= p(-t)$.\n", - "\n", - "## Examples of likelihood functions used in logistic regression and nueral networks\n", - "\n", - "\n", - "The following code plots the logistic function, the step function and other functions we will encounter from here and on." - ] - }, - { - "cell_type": "code", -<<<<<<< HEAD - "execution_count": 1, - "metadata": {}, - "outputs": [], - "source": [ - "%matplotlib inline\n", - "\n", -======= - "execution_count": 3, - "metadata": {}, - "outputs": [ - { - "data": { - "image/png": "iVBORw0KGgoAAAANSUhEUgAAAeMAAAFnCAYAAACVViH2AAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAgAElEQVR4nO3de1xUdd4H8M/AXBhmgBmH4SaCSoIXUEBT6WKaptlaptU+ZbWPbUWbSbvtbrlbu4/uY0+btllbm2vmblnbxTXvtxWzsPJSJuINAQUFkYswMAxzv53nD5OiVFAYzszweb9e85Izc2bOd74OfOb8zk0iCIIAIiIiEk2I2AUQERH1dgxjIiIikTGMiYiIRMYwJiIiEhnDmIiISGQMYyIiIpExjIm66H/+53+wfv36Hl/um2++iTfeeOOijy1cuBCjRo3C2rVrL/n8/fv3Y8aMGbj77rvxu9/9zldlXpRYPSPyVxIeZ0zUNRaLBQqFAlKptEeX63Q6IQgCFArFRR9/8MEHMWPGDMycOfOij8+aNQv33Xcfbr/9dnzwwQeYNWuWT+pcu3Yt1q1bh/fee6/tPrF6RuSv+JtA1EUqlUqU5crl8i49v66uDjExMQDgsyC+FLF6RuSvGMZEneD1evGnP/0JZWVlCA0NRXJyMp577jnk5+fjtddew+jRo/Hiiy8CAHbt2oW//OUviIqKwsiRI7Fp0yZERERg4cKFWL58OXbt2oW8vDwUFhbixIkT+O1vfwuHw4G1a9eiubkZr7/+Ovr37w8AOH36NBYuXAi73Q6Px4M5c+Zg3Lhx2L17NxYuXAi9Xt+2xnnkyBHMnz8fCoUCGRkZuNyg19y5c9HQ0IAXXngBer0e119/PVasWIF7770XeXl5eO211/Duu+/i2WefxcyZM/G3v/0NH374IaZMmQKTyYSSkhIMGzYMixYtanvNjRs34l//+hfCwsIAAI8//jhCQ0OxfPlyNDY24sEHH0RqaioyMjJ+1LPDhw9j8eLFEAQBEokEzzzzDIYPH46dO3fipZdeQnR0NEaMGIH9+/cjJCQEb7zxBnQ6nS/+q4nEIRBRhwoKCoSHH364bXrOnDnCmTNnBEEQhNdee02YN2+eIAiCYDAYhMzMTOHAgQOCIAjCJ598IqSlpQn79u1re+6ECROEhQsXCoIgCDt27BDGjh0r5OfnC4IgCAsXLhT++Mc/CoIgCC6XS5gyZYqwZs0aQRAEobKyUsjKyhIqKysFQRCENWvWCA888IAgCILgcDiEcePGCZs2bRIEQRCKi4uF9PT0tudezIQJE9rVNW/ePOG1115rm37ggQfaPX/evHnC9OnTBYfDIdjtdmH06NFCYWGhIAiCcODAAeG6664TDAaDIAiCsG3btraefL/OC77fM5PJJIwePbqtlv379wujR48WWlpa2p4/YsQIoaqqShAEQXjkkUeEZcuWXfJ9EQUi7sBF1AmRkZEoKyvD7t274fV6sWTJEiQkJPxovl27dkGn0yE7OxsAMHHiRISHh/9ovuuuuw4AMGjQIDQ1NSEnJwcAkJaWhurqagDAoUOHUF1djTvuuAMAkJSUhBEjRmDjxo0/er2ioiIYDAZMnToVADBkyJC2tevuNGbMGMjlcigUCiQnJ7fVunbtWowbNw59+vQBAEyaNAn33Xdfp17zs88+g1qtxpgxYwAAo0aNQlRUFD799NO2eQYMGIB+/foBaN8jomDBMCbqhKysLCxcuBBvvfUWJkyYgH/84x8XHQZuaGiAVqttd59Go/nRfBe2mYaGhgIA1Gp127TL5QIA1NfXIzIyst1OTn369EF9ff1FlxsZGdn2epdablddqBMAFApFW611dXVtQQwAUqkUI0aM6NRr/vC5wPn3WVdX1+FyiYIFw5ioE1pbWzF69Gi88847eO+997B+/fqLHpqj1+vR1NTU7j6j0XhVy4yLi4PJZILb7W67r6mpCbGxsRdd7g/nvdLlymQyOJ3OtmmTydTp58bHx7d73263GyUlJVf1XOD8+4yLi+v08okCHcOYqBN27NiBVatWATg/XBwbGwuv1/uj+W666SY0NTXhwIEDAICdO3fC4XBc1TJHjBiBpKQkbN68GQBw5swZHDp0qG3Y+vsyMzOh0+mwdetWAMDx48dRXl5+RctLTEzEiRMnAACVlZWorKzs9HNnzJiBzz//vC1Ut27d2naMs0qlgs1mAwDk5eW1+8IAABMmTIDFYsH+/fsBAIWFhWhpacHNN998RfUTBbLQBQsWLBC7CCJ/p1Ao8OGHH2L16tV4//33kZKSgsceewybNm3CypUrUVFRAafTiXHjxmHIkCF44YUXsG3bNqjVapw
6dQq33HIL+vbti2eeeQaHDh3C0aNHMWbMGDz99NOor69HcXExkpKS8OKLL6KqqgpGoxE33ngjbrzxRrz11ltYtWoVtm/fjj/84Q8YMWIEdu/ejVdeeQVVVVWor6/HzTffjJEjR+LVV1/FunXrUFNTA7lcjt27dyM+Ph4DBw5s937mzp2L4uJiHD16FBaLBdnZ2ejfvz/WrFmD1atXo7m5GYIg4IsvvkBiYiI+++wzrF+/HidOnEBCQgK2bduGTz75BMePH0dKSgpGjx6N6OhovPjii9i4cSNqa2vx7LPPQi6XQ6/XY926dVi/fj0GDhwIo9HYrmfXX389xo4diyVLlmDNmjXYt28fFi1ahKSkJOzduxdLlixBVVUV7HY7zGYzli9fjoqKCoSEhCArK0ukTwRR9+JJP4i6mdFobLe9NisrCx9//DFSUlJErIqI/BmHqYm62RNPPNE2NJ2fnw+dTofk5GSRqyIif8aTfhB1s8zMTMyaNQthYWGQSCR4/fXXedpHIrosDlMTERGJjMPUREREImMYExERiUy0DVkNDa1iLfqqabXhaG62il1GUGOPfY897hnss+8FWo/1+ohLPsY14ysglYZ2PBN1CXvse+xxz2CffS+YeswwJiIiEhnDmIiISGQMYyIiIpExjImIiETGMCYiIhIZw5iIiEhkDGMiIiKRMYyJiIhExjAmIiISGcOYiIhIZAxjIiIikTGMiYiIRMYwJiIiEhnDmIiISGQMYyIiIpExjImIiETGMCYiIhIZw5iIiEhkDGMiIiKRMYyJiIhEJu1ohoaGBrz66qsoKSnBmjVrfvS4w+HAokWLEBsbi9OnTyM3NxcDBgzwSbFERETBqMM14wMHDmDixIkQBOGij69cuRLx8fF47LHHMHv2bDz33HPdXiQREVEw6zCMb731VqhUqks+XlBQgKysLABAWloaSkpKYDabu69CIiKiINfhMHVHDAZDu7BWq9UwGAxQq9WXfZ5WGw6pNLSri+9xen2E2CUEPfbY99jjnsE++57YPfZ4vLA7PbA73bA53LA7PXB8O50cF4lojbJTr9PlMNbpdLBYLG3TZrMZOp2uw+c1N1u7uugep9dHoKGhVewyghp77Hvscc9gn32vKz0WBAFOtxcWmwtWuxtWhxsW+3c/2xxu2B2e7352emBznr/P7vw2dF0euNzeSy5jYEIk/vCzUe3qvZSrCmOj0QipVAq1Wo3x48fj4MGDGDVqFEpLSzF48OAO14qJiIi6k9vjRavVhVarE61WF0xWJ1otTrTaXDDbXDBbXWi1uWCxuWC2u2CxueH2XDpIL0YCIEwRijC5FBHhMujlYVDIzk/LZSEIk4dCIZNCIQ+BQhaKwcnaTr92h2H89ddfY8OGDWhoaMDSpUvx85//HMuXL4dGo0Fubi5+9rOfYdGiRVi6dCmqqqrwf//3f1f05oiIiC7F6xXQYnGiyWSH0exAc6sDzWYHjK1O2FweNDZb0WJxwmx14eK7GX9HAiA8TAqVUoY+EWFQKaVQhcmgCpMiPEyKcIUM4WFSKBVSKBWhUCqkCFdIESY/P62QhUIikfjkfUqES+0m7WOBOHzDYSffY499jz3uGexz57g9Xhha7GhssaOxxfbtv3YYTHY0m+wwmp3weC8dU0qFFFEqOaJUckReuIXLEBEuR8T3/lUrZVCFyRAS4psw7YxuH6YmIiLqLLfHi8YWO+oMVtQ1WVHfbMW5ZhsajDYYTHZcbJVQIgG0EQoMiI9En0gFtBEKaCPCoFHLoY1QQKNWIKW/DiZj4O1/dDEMYyIi6hYutwe1BitqGi2oMVhQ03j+5waj7aJrtxq1HIP6RkGvVUIfpYQuKgzRUWGIjlJCEyFHaMjlj75VyALviJxLYRgTEdEVM1mdqKprxZlz5rZbrcEK7w9Wc1VhUvSPj0Bcn/BvbyrE9lFCr1EGVZh2FcOYiIguy2p341StCRW1JlTWtaKyzgSDydFuHoU8FAMSItBPr0ZCtAoJ0Sr0jVYhUiX32U5PwYRhTEREbQRBQH2zDWVnjCg/24KKGhNqGi3t9lSOVMkxPEWH5NgIJMWq0S9GjWiNEiEM3avGMCYi6sUEQcDZRgtKKptRdsaIsuoWmCzOtscVslCkJWmQ0jcKA+Ij0T8uAtoIBdd2uxnDmIiolzG02FF8ugnHK5tRXNncLnw1ajlGD4lBaj8Nrukbhb56VYc7UlHXMYyJiIKc2+PFyeoWHK4w4Ei5AWcbvzuFcZRKjpxhsRicrEVakhb6qDCu9YqAYUxEFISsdjcOVzTiYFkjjp4ywObwAABk0hBkDNQhfWAfDE3WIiFaxfD1AwxjIqIg0Wp14kBZAwrLGnD8dHPbsb3RUWG4Lj0ew1N0SOungZyHFPkdhjERUQCz2t04eKIBXx8/h+LTTW0BnBSjRnaqHtmpevTVc+3X3zGMiYgCjMfrxbFTTdh9pA4HTzS2XX0oOS4CY4bEYmSaHvpOXkeX/APDmIgoQNQ0WrD7SC32HKtDi/n8HtDxunCMHRqL0UNiEdsnXOQK6WoxjImI/JjL7UVhWQM+O3gWZWeMAIBwhRQTsvvihox49I+L4BB0EGAYExH5IUOLHZ8dPIsvDteg1eoCAAztr8W4EQnIGhQNmZQ7YQUThjERkR+pqDEhf38VvilpgFcQoAqTYsrofrgpsy/iOAwdtBjGREQi8woCDpY1YPvXZ3DybAsAIFGvxuRr+2H0kBgeitQLMIyJiETi8XrxdfE5bNlXiZpvz4o1PEWHydf2w5BkLbcF9yIMYyKiHub2eLH7SC227qtEg9GOEIkE16fHYerYZCREq8Quj0TAMCYi6iFer4C9x+qw4ctTaGyxQxoqwYSsvpg6JgnRPC64V2MYExH5mCAIOFDagHVfVKDWYIU0VIJJIxMxdWwytBEKscsjP8AwJiLyobIzRqz69ARO1bYiRCLBjcPjccf1A6CLChO7NPIjDGMiIh+oM1iwbN0RHChtAABcOzgGM8YN5OFJdFEMYyKibmRzuLFp92l8cqAabo8XKX0jce/Ng5DSN0rs0siPMYyJiLqBIAj4+vg5fPTpCbSYnYjRKjFz3EBcOziGhyhRhxjGRERdVGuw4F/5ZThe2QyZNAR33jAAD04bhhajVezSKEAwjImIrpLL7cXmPaexdV8lPF4Bw1N0mHVLKmI0Sp41i64Iw5iI6CqU17Tg7a0lqGm0QBuhwKxJqchOjeaQNF0VhjER0RVwuDxY/0UF8vefgSAAE7L74u6bUqBU8M8pXT1+eoiIOqn8bAtWbC5GfbMNMVolHpo6GGlJWrHLoiDAMCYi6oDH68Wm3aexeU8lBEHA5Gv7Yca4gVBwuzB1E4YxEdFl1Ddb8damYlTUmKCLVOCRaUO5NkzdjmFMRHQJXx6uxfs7yuBweTB2WCweuCUV4WEyscuiIMQwJiL6AYfTg3/ll2L30TooFVLk3jEUY4fGiV0WBTGGMRHR95xttODv64+iptGCAfER+MX0dOh5eUPyMYYxEdG39h6tw8rtJXC6vJg4MhE/nXANZNIQsc
uiXoBhTES9ntvjxaqdJ7GzsBpKRSjm3JmOUYNjxC6LehGGMRH1aiarE39fdxSlZ4zoG63C3LsyEKvlZQ6pZzGMiajXqqxrxd/WHobB5MDIVD0enjYEYXL+WaSex08dEfVKXx+vxz+3HIfL7cWMcQMxLSeZ55Um0TCMiahXEQQBm/dWYt3nFQiThyLvruHIHBQtdlnUy3UqjPfs2YP8/HzodDpIJBLMnTu33eNnzpzB4sWLkZGRgePHj2PatGmYOHGiTwomIrpabo8XK/9Tgt1H6qCLVOCX94xAol4tdllEHYexzWbD/PnzsWXLFsjlcuTl5WHv3r3Iyclpm2fFihUYOXIkZs+ejeLiYvzqV79iGBORX7HYXXhj7RGUVBnRPy4Cv7x7OKLUCrHLIgIAdHgAXVFRERISEiCXywEA2dnZKCgoaDdPdHQ0mpqaAABNTU0YNmxY91dKRHSVDC12vPDeAZRUGZGdqse8+7MZxORXOlwzNhgMUKlUbdNqtRoGg6HdPA899BCeeOIJ/PnPf8bhw4cxZ86c7q+UiOgqnG0wY8m/D6G51YHJ1/bDTydcg5AQ7qhF/qXDMNbpdLBYLG3TZrMZOp2u3Ty/+93vcM8992DatGloamrC5MmT8cknn0Cj0VzydbXacEilgXf5Mb0+QuwSgh577Hu9pcfHTzVh0QcHYba58NC0YZg54ZoeXX5v6bOYgqXHHYZxZmYmampq4HQ6IZfLUVhYiFmzZsFoNEIqlUKtVqO2thZ6vR4AEBkZiZCQEHi93su+bnOztXveQQ/S6yPQ0NAqdhlBjT32vd7S48PljVi67ijcHgEP/2QIrk+P7dH33Vv6LKZA6/Hlvjh0GMZKpRILFizA888/D61Wi7S0NOTk5GDx4sXQaDTIzc3F73//e7z77rs4ePAgqqur8dRTT6FPnz7d+iaIiDpr37E6rNh8HNJQCfLuysCIa3joEvk3iSAIghgLDqRvMxcE2rewQMQe+16w9/jzQzVYua0ESoUUv7xnOAYlXnpzmS8Fe5/9QaD1uEtrxkREgWLngWq8v6MMaqUMv/mvTCTHBcf2RAp+DGMiCgr/+aoK//7sJCJVcjx9byb68mQeFEAYxkQU8DbtOY11n1dAG6HA0/dlIa4Pr7pEgYVhTEQB7UIQ6yLD8PSsLMRolGKXRHTFGMZEFLC27atsC+J592chOopBTIGpw9NhEhH5o/yvq7C6oPz80PQsBjEFNoYxEQWcT745g48+PQmNWo5nODRNQYBhTEQBZVfRWXzwyQlEqeR4+r4sxGq5sxYFPoYxEQWMr4/X493/lEKtlOG392UhXqfq+ElEAYBhTEQB4XC5AW9tKkaYIhS/+a9M9I1mEFPwYBgTkd8rO2PE0nVHEBIiwZN3DeeZtSjoMIyJyK9V1bfirx8fhscrYM6d6UhL0opdElG3YxgTkd8612zFklVFsDvceHjaEF59iYIWw5iI/JLJ6sSSfx+CyerC/ZNTMXZonNglEfkMw5iI/I7D6cFfVx/GuWYbfpKTjJuzE8UuicinGMZE5Fc8Xi/+vuEoTtWacF16HGaOGyh2SUQ+xzAmIr8hCALe216Kw+UGpA/og9lTB0MikYhdFpHPMYyJyG9s2nManx+qRXJcBB6/Mx3SUP6Jot6Bn3Qi8gv7iuuw/otT0EWG4Vd3D4dSwYvKUe/BMCYi0Z2oNuKfW45DqQjFr+4Zjii1QuySiHoUw5iIRHWu2YrX1xyB1wvMuTMDffVqsUsi6nEMYyISjcXuwqurD8Nsc+GBKakYNqCP2CURiYJhTESicHu8WLruKOqarLh1dBLGZ/YVuyQi0TCMiUgUH+08geOVzcgaFI27J6SIXQ6RqBjGRNTjCg6exaeFZ5GoV+HR24cihMcSUy/HMCaiHlVa1Yz3d5RBrZThybuGI0zOQ5iIGMZE1GMajDa8se4oAOCJGemI1ihFrojIPzCMiahH2J1uvL7m/J7T909O5XWJib6HYUxEPicIAv655TiqGyy4Obsv95wm+gGGMRH53LavqvBNaQNS+2lw78RBYpdD5HcYxkTkU0crDFhTUA5thIIXfyC6BP5WEJHPnDPa8ObGYwgNleCJGRmIUsnFLonILzGMicgnHE4P/rbmCCx2Nx6cnIaBCZFil0TktxjGRNTtBEHAyv+UoLrBjAlZfXHjiASxSyLyawxjIup2nxaexb7ieqT0jcR9k7jDFlFHGMZE1K3Kz7bgo50nEBEuw5w7M7jDFlEn8LeEiLqNyerE0vVH4RUE/OKOYdBGKMQuiSggMIyJqFt4vQKWbzyG5lYHZo4biCH9eW1ios5iGBNRt9jw5SkUn25G5jXRmDo2WexyiAIKw5iIuuxwuQGb9pyGXhOGh6cN4SURia4Qw5iIuqTJZMeKzcWQhoZgzp0ZUIXJxC6JKOAwjInoqrk9XizbcAxmmwv3TRqE5LgIsUsiCkiduqr3nj17kJ+fD51OB4lEgrlz57Z7XBAEvPfeewCAs2fPwmQy4c9//nP3V0tEfmXt5xU4ebYFo4fEYHwmT+xBdLU6DGObzYb58+djy5YtkMvlyMvLw969e5GTk9M2z4YNGxAZGYk777wTAFBSUuK7ionILxSdbMR/vqpCrFaJ/751MCTcTkx01Tocpi4qKkJCQgLk8vMneM/OzkZBQUG7eTZt2gSj0Yh3330XS5YsgUql8kmxROQfGlts+MfmYsikIXj8znQoFZ0aZCOiS+jwN8hgMLQLV7VaDYPB0G6empoamM1mzJ07F6dOncIjjzyCrVu3IjQ09JKvq9WGQyq99OP+Sq/nNjFfY499rys9dnu8WPThQVjsbsy9ZwRGpnN4+lL4Wfa9YOlxh2Gs0+lgsVjaps1mM3Q6Xbt51Go1RowYAQAYMGAAzGYzamtrkZiYeMnXbW62Xm3NotHrI9DQ0Cp2GUGNPfa9rvZ49WcnUVrZjLFDY5E1sA//vy6Bn2XfC7QeX+6LQ4fD1JmZmaipqYHT6QQAFBYWYvz48TAajTCbzQCAnJwcnDlzBsD5sPZ4PNDr9d1ROxH5kSMVBmz7qgoxWiUenJLG7cRE3aTDNWOlUokFCxbg+eefh1arRVpaGnJycrB48WJoNBrk5ubi0UcfxUsvvYRly5ahqqoKixYtgkLBc9ISBZPmVse3xxNL8Ph0bicm6k4SQRAEMRYcSEMLFwTakEggYo9972p67PUK+MtHB1FSZcSsSYMwaVQ/H1UXPPhZ9r1A63GXhqmJiDbvOY2SKiOyBkVj4shL7wtCRFeHYUxEl1V2xogNu09BF6nAQ7cN4XZiIh9gGBPRJZltLry58RgkkCD3jmFQK3neaSJfYBgT0UUJgoC3tx5Hc6sD028cgEGJGrFLIgpaDGMiuqiCg2dx8EQjBidp8BNen5jIpxjGRPQj1efM+HDnSaiVMjx6+zCEhHA7MZEvMYyJqB2Hy4NlG4/B7fHi57cNgTaC5wwg8jWGMRG1s2rnCdQ0WjBxZCIyB0WLXQ5Rr8AwJqI2B0rPoaCoBol6NX46IUXscoh6DYYxEQEAmkx2vLOtB
HJpCB6bPgyyALyqGlGgYhgTEbxeAW9tKobF7sa9EwehbzSvSU7UkxjGRIQt+ypResaI7FQ9bsrk9YmJehrDmKiXKz/bgg1fnII2QoHZUwfzdJdEImAYE/ViVrsbb248BkEQ8Oi0oTzdJZFIGMZEvdi/dpSiscWO23KSMThZK3Y5RL0Ww5iol9p7tA77jtVjQHwkpt8wQOxyiHo1hjFRL3TOaMN7+aVQyEPx2B1DIQ3lnwIiMfE3kKiX8Xi8eGvjMdidHjxwSypitOFil0TU6zGMiXqZD3eUorzGhDFDY3FdepzY5RARGMZEvUrZGSNWf1KG6KgwPDg5jYcxEfkJhjFRL2Gxu7B80zEAwKO3D0V4mFTkiojoAoYxUS8gCALe/U8pmkwO3HtLGgYlasQuiYi+h2FM1AvsPlKH/SXncE1iFH46KVXscojoBxjGREGuvsmK93eUQamQIvf2oQjlYUxEfoe/lURBzO3x4s2Nx+BwefCzKWmIjlKKXRIRXQTDmCiIrf/iFE7XteL69DiMGRordjlEdAkMY6Igdfx0E7btq0SMRolZt3A7MZE/YxgTBaFWqxNvbS5GSIgEuXcMg1LBw5iI/BnDmCjICIKAd7aVwGh24s4bB2BgQqTYJRFRBxjGREGm4OBZHDzRiCHJWkwdmyx2OUTUCQxjoiBS3WDGR5+ehFopwyPThiKEp7skCggMY6Ig4XR58ObGY3C5vXho6mBoIxRil0REncQwJgoS//7sJM42WDAhqy+yUvVil0NEV4BhTBQEDpY14NPCs+gbrcJ/3XyN2OUQ0RViGBMFuCaTHf/cehwyaQgemz4Mclmo2CUR0RViGBMFMK9XwFubimGxu3HvxEFI1KvFLomIrgLDmCiAbd57GqVnjMhO1WN8ZoLY5RDRVWIYEwWoE9VGbPjyFLQRCsyeOhgSHsZEFLAYxkQByGxzYfnGYwCAx+4YBrVSJnJFRNQVDGOiACMIAt7eehwGkwN3XD8Aqf00YpdERF3EMCYKMJ8Wnj/d5eAkDW6/rr/Y5RBRN2AYEwWQyrpWrPr0BNRKGR69fRhCQridmCgYdOq6anv27EF+fj50Oh0kEgnmzp170fk2btyIp59+GoWFhVCpVN1aKFFvZ3O4sWzDUbg9Ah6ZNpSnuyQKIh2Gsc1mw/z587FlyxbI5XLk5eVh7969yMnJaTdfeXk5ysvLfVYoUW8mCALeyy9FfbMNt45OwvAUndglEVE36nCYuqioCAkJCZDL5QCA7OxsFBQUtJvHZrNhxYoVeOKJJ3xSJFFv9+WRWuw7Vo+BCZGYedNAscshom7W4ZqxwWBoN+SsVqthMBjazfPKK69gzpw5bYHdGVptOKTSwDttn14fIXYJQY89bq+y1oT3d5yASinD72ePRpyu65uA2OOewT77XrD0uMMw1ul0sFgsbdNmsxk63XdDZLW1tTCZTNi2bVvbfW+//TZuuukmZGRkXPJ1m5utV1uzaPT6CDQ0tIpdRlBjj9uzO934v5XfwOnyIPf2oQj1ervcH/a4Z7DPvhdoPb7cF4cOwzgzMxM1NTVwOp2Qy+UoLCzErFmzYDQaIZVKER8fjxdffLFt/pdffhkPPfQQd+Ai6iJBEPDu9lLUGqyYfG0/ZCMbAMYAABTDSURBVPOyiERBq8NtxkqlEgsWLMDzzz+PV155BWlpacjJycHy5cvxwQcftM3X1NSEpUuXAgBWrFiB+vp631VN1At8fqimbTvx3eNTxC6HiHxIIgiCIMaCA2lo4YJAGxIJROzxeVX1rXj+3QNQyEIw/6FrER2l7LbXZo97Bvvse4HW48sNU/OkH0R+xmp3Y+n6o3B7vHh42tBuDWIi8k8MYyI/IggC/rn1OM412zB1TBIyr4kWuyQi6gEMYyI/sv3rMygsa0BaPw2PJybqRRjGRH6itKoZHxeUI0otxy+mD0NoCH89iXoL/rYT+QGj2YFlG85fn/jx6emIUvO800S9CcOYSGRujxfL1h9Fi8WJeyak8PrERL0Qw5hIZB8XlKOsugUj0/SYfG0/scshIhEwjIlEtO9YHfL3n0G8Lhw/v20IJBJen5ioN2IYE4mkqr4V72wrQZg8FHNnZkCp6NTlxYkoCDGMiURgtrnwt7VH4HR78ei0oYjvhisxEVHgYhgT9TCvV8CbG4+hscWO26/rjyxeAIKo12MYE/WwtZ9X4NipJgxP0WH6jQPELoeI/ADDmKgH7Suuw9Z9lYjRKpF7+1CEcIctIgLDmKjHnKo14e2t53fYevKu4QgPk4ldEhH5CYYxUQ9obnXg9TWH4XZ78Yvpw5AQzR22iOg7DGMiH3O5Pfjb2iMwmp24e0IKhqfwSkxE1B7DmMiHBEHAO9tKcKrWhJxhcbh1dJLYJRGRH2IYE/nQ1n2V2HusHgMTIjF7ahrPsEVEF8UwJvKR/SXnsGZXBfpEKjB3ZgZk0lCxSyIiP8UwJvKB8rMtWLG5GGHyUPzy7hHQ8JKIRHQZDGOibtZotJ3fc9rjxS+mp6NfjFrskojIzzGMibqR1e7Gqx8fhsnqwqxJqRieohO7JCIKAAxjom7i9njx9/VHUNNowaRRiZg4MlHskogoQDCMibqBIAh4e+txHDvdjMxronHvzYPELomIAgjDmKgbrNlVgb3H6pGSEInHpg9DSAgPYSKizmMYE3XRzgPV2LqvErF9wvHk3cOhkPEQJiK6Mgxjoi74puQcPthRhkiVHL/+6QhEhMvFLomIAhDDmOgqlVY1Y/mmYsjloXjqnhHQa5Ril0REAYphTHQVTtWa8NePD0MQBDwxIx3JcRFil0REAYxhTHSFahoteOXfh+BweZB7xzCkD+CxxETUNQxjoivQaLTh5VVFMNtc+O9bB+PawTFil0REQYBhTNRJLWYH/rKqCM2tDvx0wjUYNyJB7JKIKEgwjIk6odXqxMurinCu2Yaf5CTj1jG8LjERdR+GMVEHzDYXXv6oCNUNFkzMTsTMcQPFLomIggzDmOgyrHYXXl5VhKpzZozP6otZtwyCRMKzaxFR92IYE12C1e7Gy6sOobKuFeNGxOOByakMYiLyCYYx0UXYHG68sroIp2pNuD49Dj+7dTBCGMRE5CNSsQsg8jcWuwtLVh3CqVoTxg6LxUO3DWEQE5FPMYyJvsdkdWLJR+e3EV+fHnc+iHkFJiLyMYYx0bdazA689FERahotGJ+ZgAempHGNmIh6BMOYCECTyY6XPipCfZMVk0Yl4r6J3GuaiHoOw5h6vVqDBUtWFcFgcuC2scm466aBDGIi6lGdCuM9e/YgPz8fOp0OEokEc+fObff48uXL0djYiOjoaBw7dgxPPvkkUlJSfFIwUXeqqDHh1dWHYLa5MHPcQPwkJ5lBTEQ9rsMwttlsmD9/PrZs2QK5XI68vDzs3bsXOTk5bfNYrVb8/ve/h0QiwdatW/HSSy9h2bJlPi2cqKuOnWrC39YegdPtweypg3muaSISTYfHGRcVFSEhIQFyuRwAkJ2djYKCgnbz/OpXv2pbm/B6vQgPD+/+Som60dfH6/Hq6kPweAU8MSOD
QUxEoupwzdhgMEClUrVNq9VqGAyGi87rdDqxbt06zJ8/v8MFa7XhkEpDr6BU/6DX8yLyvubLHguCgHUFJ/H25mKEh0nxh5+PQUZKtM+W56/4Oe4Z7LPvBUuPOwxjnU4Hi8XSNm02m6HT/fhi6k6nEwsWLMBTTz2FpKSOr2jT3Gy9wlLFp9dHoKGhVewygpove+z2ePH+jjLsKqqBNkKBX949HHGRil73f8rPcc9gn30v0Hp8uS8OHQ5TZ2ZmoqamBk6nEwBQWFiI8ePHw2g0wmw2AwDsdjvmz5+Phx56COnp6di+fXs3lU7UPax2N/66+hB2FdUgKVaNP/xsFJJig+MbNREFvg7XjJVKJRYsWIDnn38eWq0WaWlpyMnJweLFi6HRaJCbm4vf/va3OHHiBKqrqwGc36FrypQpPi+eqDMaW2z46+rDONtowYgUHR6bPgxhch7VR0T+QyIIgiDGggNpaOGCQBsSCUTd3eOSymYsXX8UZpsLk0Ym4t6Jg3r96S35Oe4Z7LPvBVqPLzdMzdUDCkqCIGDngWp8tPMkJBLgwcmpmJCdKHZZREQXxTCmoONye/De9jJ8eaQWkeEyzJmRgdR+GrHLIiK6JIYxBZXGFhv+vv4YTtWakBwXgbyZGegTGSZ2WUREl8UwpqBRdLIR/9hcDIvdjevS4/CzKWmQywLvWHYi6n0YxhTw3B4v1u6qwH++roI0NASzpw7GjcPjeY5pIgoYDGMKaIYWO97ceAwnz7YgVqvE43em8/hhIgo4DGMKWPuK6/De9jLYHG6MHhKD/751MJQKfqSJKPDwLxcFHKvdhX/tKMO+Y/VQyEI5LE1EAY9hTAGltKoZKzYXw2ByYGBCJB69fShitbxKGBEFNoYxBQS70401BRXYWViNEIkEd1zfH7df3x+hIR2eXp2IyO8xjMnvHTvdhJXbStDYYke8Lhw/v20IUvpGiV0WEVG3YRiT37LaXfj3Zyfx+aFahEgk+ElOMu64vj9kAXgdbCKiy2EYk98RBAF7jtZh9WcnYbK60C9GjZ/fNgTJcTxkiYiCE8OY/EplrQmvrTqIsjNGyGUhuOumgZgyOgnSUG4bJqLgxTAmv2C1u7Bx92l8cqAaXq+A7FQ97ps4CLoonleaiIIfw5hE5fF6UXCwBhu+PAWzzYWYPuG49+ZrkHlNtNilERH1GIYxiUIQBBypMGDVpydRa7AiTB6Ku24aiFlTh6LFaBW7PCKiHsUwph53otqItbsqUHrGCIkEGJ+ZgOk3DkSUSs6rLBFRr8Qwph5TWdeKdV9U4HC5AQAwPEWHu29KQWKMWuTKiIjExTAmn6usa8XmvadxoLQBAJDWT4OZNw3EoESNuIUREfkJhjH5zMnqFmzee7ptTXhAfARmjkvB0P5aXtSBiOh7GMbUrbyCgKMVBvznqyqUVBkBAKn9NJh2XTKG9e/DECYiugiGMXULh8uDvUfrkL//DOqazu8NnT6gD6Zd1x+p/TgcTUR0OQxj6pIGow27imrw+aEamG0uhIZIcH16HG65th+SYnn6SiKizmAY0xXzegUcLjfgs4NncbTCAAGAWinDtOuScXN2IjRqhdglEhEFFIYxdVp9sxW7j9Rhz9FaNJkcAICUhEiMz+qLawfH8BhhIqKrxDCmy7La3fim9Bx2H6nFieoWAIBCHorxmQkYn9WXQ9FERN2AYUw/4nB5cOhkI74+fg6Hyw1we7yQABiSrMUNGfHITtVDIedaMBFRd2EYEwDA5nDj6KkmHCg9h0MnDXC4PACAhGgVxgyJQU56HKKjlCJXSUQUnBjGvViLxYlDJxtRWNaA4tPNcHu8AIAYjRKjh8Zg9JBYJOp5qkoiIl9jGPciXq+AU7UmHC434HCFAZV1rW2PJepVyE7VI2uQHkmxap6cg4ioBzGMg5ggCDjXbENxZTOKTzehpLIZFrsbABAaIsGQZC0yBuqQlRqNWG24yNUSEfVeDOMgciF8S88YUXbGiNKqZhi+PQQJAHSRCoxM0yNjYDSG9tdCqeB/PxGRP+Bf4wDmcntQWWdGeU0Lys+24ER1C1oszrbHVWFSjEzTY2j/PhjaX4sYjZLDz0REfohhHCDcHi9qGi2orGvF6fpWnK41oareDI9XaJsnSi3H6CExSO2nQWo/DRKiVQhh+BIR+T2GsR+y2t2objDjzLnvbtUNZrjc3rZ5QkMkSIqNQEpCJFL6RiElIRK6qDCu+RIRBSCGsYjMNhfqmqyoabS03c42WtDc6mg3X2iIBH2jVegfH4HkuEj0j4tAol4FmZQn3iAiCgYMYx+z2F0412w7fzPacK7ZiromK+qbbDDbXD+aXxuhwLABfdBPr0a/GDUSY9SI14VDGhoiQvVERNQTGMZd4BUEtFqcaGp1oMlkR5PJAYPJjgajDYYWOxpb7LA63D96XohEAr0mDCkJkYjThSNep0JCtAoJunCEh8lEeCdERCQmhvFFeAUBZpsLJrMTLRYnWiwOtJidcHgE1DSYYWx1wGg+f3N7hIu+hlwWgugoJa5JjEKMRokY7fmbXnP+xjVdIiK6IOjD2CsIsDvcMNvdsNhcsNhdMNtcsNjcaLU6Ybadn261utBqdcJkdcFsdcErXDxkgfNrtlFqOfrFRKBPpAJ9IsKgi1SgT2QYdFFhiI4Kg1op485URETUKX4bxl5BgMPpgcPlgcPpgd3pgd3pht3pge3bf+0OD2wO93c35/lpi90Fq/38fVaHG5fJ1XaUCikiw2WI0SoRGS5HlFqOKNX5W6RKjoH9+kBwuxEZLkdICIOWiIi6R6fCeM+ePcjPz4dOp4NEIsHcuXPbPe5wOLBo0SLExsbi9OnTyM3NxYABAy77mm+sPQKH2wOnywun63zonv/XC4fL0+4wniulkIciXCGFJkKBhGgVVGEyqJTSb/+VQRUmhVopQ4RSBnW4HGqlDGqlDDLp5YeO9foINDS0XnYeIiKiK9VhGNtsNsyfPx9btmyBXC5HXl4e9u7di5ycnLZ5Vq5cifj4eDz66KMoLS3Fc889hw8++OCyr3ugrKHtZ7ksBHJpKBSyEESEyxAtC4NCFgqFPBRh394UMikU8lAo5aEIU0jb7g9XSKFUSBGmkCL82/u5PZaIiAJJh2FcVFSEhIQEyOVyAEB2djYKCgrahXFBQQF+/etfAwDS0tJQUlICs9kMtfrSl9/765M3QC4LhVwawm2rRETUq3UYxgaDASqVqm1arVbDYDB0ap7LhXFSXw2kAXjSCr0+QuwSgh577Hvscc9gn30vWHrcYRjrdDpYLJa2abPZDJ1Od8Xz/FBzs/VKaxUdtxn7Hnvse+xxz2CffS/Qeny5Lw4dblzNzMxETU0NnM7zVwMqLCzE+PHjYTQaYTabAQDjx4/HwYMHAQClpaUYPHjwZdeKiYiI6DsSQej4wJ/du3dj+/bt0Gq1kMlkmDt3LhYvXgyNRoPc3FzY7XYsWrQIer0eVVVVeOyxxzrcmzqQvs1cEGjfwgIRe+x77HHPYJ99L9B6fLk1406FsS8EUgMvCLT/+EDEHvsee9wz2GffC7Q
ed2mYmoiIiHyLYUxERCQyhjEREZHIGMZEREQiYxgTERGJjGFMREQkMoYxERGRyBjGREREImMYExERiYxhTEREJDKGMRERkcgYxkRERCJjGBMREYmMYUxERCQyhjEREZHIGMZEREQiYxgTERGJjGFMREQkMoYxERGRyBjGREREIpMIgiCIXQQREVFvxjVjIiIikTGMiYiIRMYwJiIiEhnDmIiISGQMYyIiIpExjImIiEQmFbuAQLR06VKsXLkSX331ldilBKUXXngBSqUS4eHhKCkpwbPPPgu9Xi92WUFhz549yM/Ph06ng0Qiwdy5c8UuKahUVVXh1VdfxdChQ1FXVweNRsMe+4jdbsc999yDG264AfPmzRO7nC5jGF+hr776CiaTSewygppSqcRTTz0FAFi+fDmWLVuGP/7xjyJXFfhsNhvmz5+PLVu2QC6XIy8vD3v37kVOTo7YpQUNo9GI2267DZMmTQIA3HbbbRg/fjzS09NFriz4XPjSEyw4TH0FGhsbsWXLFjzwwANilxLULgQxAAiCgPDwcBGrCR5FRUVISEiAXC4HAGRnZ6OgoEDcooLM8OHD24IYALxeL5RKpYgVBaf169cjOzsbiYmJYpfSbbhm/AMPP/wwGhsbf3T/k08+iZ07d2LevHlobW0VobLgcrk+T5w4EQBgMpnw5Zdf4vXXX+/p8oKSwWCASqVqm1ar1TAYDCJWFNx27NiBG264ASkpKWKXElROnjyJiooK/PrXv0ZpaanY5XQbhvEP/OMf/7jo/UeOHIFUKsWqVavQ0tICh8OB5cuXY/Lkyejfv3/PFhkELtXnC1pbW/GnP/0JL7zwAjQaTQ9VFdx0Oh0sFkvbtNlshk6nE7Gi4LVv3z589dVXePbZZ8UuJejs2LEDcrkcy5cvx4EDB+ByufDOO+9g9uzZYpfWJQzjTsrIyEBGRgYAoLq6Gh9//DFyc3NFrio4NTU14YUXXsAzzzyD2NhYbN++HVOmTBG7rICXmZmJmpoaOJ1OyOVyFBYWYtasWWKXFXQKCgrwzTff4LnnnsO5c+dQU1ODrKwsscsKGo8//njbzw6HA1arNeCDGOCFIq5YZWUlPvroI3z44YfIzc3F7NmzuU2zm82YMQNut7ttjVilUmHZsmUiVxUcdu/eje3bt0Or1UImk3FP32529OhRPPjgg207bFmtVtx///2YOXOmyJUFn+3bt+P999+Hy+XC/fffj2nTpoldUpcwjImIiETGvamJiIhExjAmIiISGcOYiIhIZAxjIiIikTGMiYiIRMYwJiIiEhnDmIiISGQ8AxdRL7F27Vrs3LkTsbGxaGlpwdatW7FhwwakpqaKXRpRr8eTfhD1EgcPHoRGo8GAAQOQl5eHpKQkPP3002KXRURgGBP1OqtXr8YHH3yAVatWtV1OkYjExWFqol7k1KlTePnll/H+++8ziIn8CHfgIuolXC4XfvOb3yAvLw8pKSkoLy/HN998I3ZZRASuGRP1Gu+88w7Ky8tRXl6O//3f/0V9fT0mTpyIUaNGiV0aUa/HbcZEREQi4zA1ERGRyBjGREREImMYExERiYxhTEREJDKGMRERkcgYxkRERCJjGBMREYmMYUxERCSy/wc6Z6BeGTCN7AAAAABJRU5ErkJggg==\n", - "text/plain": [ - "
    " - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "data": { - "image/png": "iVBORw0KGgoAAAANSUhEUgAAAfIAAAFnCAYAAABdOssgAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAgAElEQVR4nO3df3RU9Z3/8dckIZgfQJJJ+JkEkG8bAY3AUiBVIPw4xePxrKYsPWyLrUUbD5ZoRVYQXZO6LgvRUA5FFoNYqcUfB8Ggiz2glBAqEZEJa8EQqEqATZEQEmDygxByv3+wuW5EyKWTmbl38nyc03Ny535m5p1PI6/5fO5nPtdlGIYhAADgSGHBLgAAAPz9CHIAAByMIAcAwMEIcgAAHIwgBwDAwQhyAAAcjCAHIEnau3evsrKy9E//9E9auHBhQN/76aefVlFRUUDfEwgVBDngEAsXLtRvf/tbv73+b37zG82ePVtvvfWW0tPT/fY+mzZt0r333tvusQULFuiuu+7y23sCoSwi2AUAsIeTJ0+qd+/ekqQf//jHAX3vmJiYgL4fEEpc7OwG2Edra6t+/etf6/DhwwoPD9fAgQP15JNPasOGDSosLFT37t01YMAA/eM//qNmzJihAwcO6D/+4z/kcrkUHh6up59+WkOGDNHKlSv1+uuvKzMzU7W1tfrqq6/kdru1ZMkSJSQkXPG+c+fO1c6dO3XjjTcqKSlJt912m1566SXNnDlTOTk5WrFihX7/+99r0aJF+uEPf2i+/rRp03Tu3DkdOnRIw4cP19KlS83XfOedd/SHP/xBN9xwgyRpzpw5Zo2nT5/W0KFD9d3vfle33HKLVqxYoTFjxmjJkiWSpE8//VT5+fkyDEMul0uPP/640tPTtX37dj333HNKTEzUrbfeqr179yosLEwvvPCC3G53YP5PAuzGAGAbxcXFxv33328eP/TQQ8bx48cNwzCMBQsWGCtWrDDPnTt3zhg7dqyxe/duwzAMY8eOHcYPfvAD49KlS2b7qVOnGufPnzcMwzCeeuopY968eVd970mTJhkfffSRefzN95s1a5axcePGdufvvvtu48KFC0ZTU5MxZswYw+PxGIZhGPv27TO+//3vGzU1NYZhGMYf//hHY8GCBYZhGMbGjRuNWbNmtXvvFStWmOfPnTtnjBkzxqxl7969xpgxY4yzZ8+az7/11luNY8eOGYZhGA888ICxevXqa3UrENK4Rg7YSM+ePXX48GF9+OGHam1t1bJly9S/f/9vbbtjxw5FR0crIyNDkpSZmanTp0/rv//7v802EydOVGxsrCTp7rvv1tatW3Xp0qVOq3fs2LGKjIxU9+7dNXDgQJ04cULS5evgEyZMMEf/U6dO1T//8z9bes0dO3YoNjZWY8eOlSSNHj1avXr10p/+9CezzeDBg5WSkiJJSktLM98X6IoIcsBGRo4cqX/7t3/TmjVrNGnSJK1du1bGVa5+nTx5UmfPntW9995r/i8hIUF1dXVmm169epk/x8XF6eLFi6qtre20ets+JEhS9+7ddfHiRbO2/zuFHxERoVtvvdXSa37zuZKUkJCgkydPdvi+QFfEYjfARs6fP68xY8Zo4sSJOnbsmB544AH16dNH06dPv6Jtv3791LdvX7366qvmY16vV5GRkebx2bNnzZ9ra2vVrVs3xcfHW6qlW7duam5uNo/PnTtn+ffo16+fzpw5Yx63tLTor3/9q2666abrfq4knTlzRn379rX8/kBXwogcsJH3339fb775piQpNTVVffr0UWtrq6TLK7sbGxvV0NCgxx57TJMmTVJdXZ0+/fRTSVJDQ4N++tOfyuv1mq/35z//2TwuKirStGnTFB4ebqmW5ORkHTlyRJJUWVmpyspKy79HVlaWSkpKzEB+7733tGnTpna/hyTl5OSopaWl3XMnTZqk+vp67d27V5Lk8Xh09uxZTZ482fL7A11JeF5eXl6wiwBwWffu3fX6669rw4YNWr9+vYYMGaIHH3xQ4eHh6tmzp1566SVt2bJF99xzj2655RaNHTtW+fn52rRpk9555x3NmTNHw4YNkyR98MEHGjx4sP7rv/5LL730klpbW/XMM88oKirqivedO3euPvvsMx04cED19fUaNWqUBg0apI0bN2rDhg2qra2VYRjatWuXkpOTtWPHDhUVFenIkSPq37+//vjHP+qDDz5QeXm5hgwZojFjxigxMVFLlizRO++8o7/97W9atGiRIiMjlZSUpLfffltFRUW68cYbVVdXp3Xr1umLL75Qc3OzbrvtNo0bN07Lli3Txo0b9dFHH2np0qVKTU1VaWmpli1bpmPHjqmpqUler1eFhYX64osvFBYWppEjRwb6/zIg6Pj6GRCiFi5cqAEDBignJyfYpQDwI6bWAQBwMBa7ASFo5cqV2rVrl7p3766+fftqxowZwS4JgJ90ytR6dXW1li9frkOHDmnjxo1XnN+0aZPeeOMNde/eXZI0ffp03XPPPZKkzZs3q7y8XGFhYUpNTdXMmTN9LQcAgC6jU0bk+/bt05QpU1ReXn7VNsuWLVNycnK7x06ePKmXX35ZRUVFcrlcmj59usaNG6dBgwZ1RlkAAIS8TgnyO+64Q3v27Llmm/Xr1ysxMVGNjY2aNWuW4uLitGvXLg0fPlwul0vS5c0wSkpKCHIAACwKyDXy733ve8rMzFRCQoJ27typRx55ROvWrdOZM2fa3fUoJiZGNTU1Hb5eS8slRURY+y4sAAChLCBB3rYnsiSNGzdOc+bM0aVLl5SQkNBuk4n6+nqlpqZ2+Hq1tQ1+qdNfkpJ6qLr6fLDLCHn0s//Rx/5HHweG0/o5KanHVc/57etndXV15o5SBQUF5u5NR48eVXJyssLDwzV+/HgdPHjQ3Eu6rKxMEyZM8FdJAACEnE4ZkX/88cfavHmzqqurtWrVKs2ePVuFhYWKi4tTdna2EhMTlZeXp+TkZB0+fFj5+fmSpL59+2r27NlavHixwsPDNWPGDK6PAwBwHRy5s5uTpkMk503hOBX97H/0sf/Rx4HhtH4OytQ6AADwP4IcAAAHI8gBAHAwghwAAAcjyAEAcDCCHAAAByPIAQBwMIIcAAAHI8gBAHAwghwAAAcjyAEAcDCCHAAAByPIAQBwMIIcAAAHI8gBAHAwghwAAAcjyAEAcDCCHAAAByPIAQBwMIIcAAAHI8gBAHAwghwAAAcjyAEAcDCCHAAAByPIAQBwMIIcAAAHI8gBAHAwghwAAAcjyAEAcLCIzniR6upqLV++XIcOHdLGjRuvOF9YWKjTp08rMTFRBw8e1MMPP6whQ4ZIkiZPnqwBAwZIknr37q2CgoLOKAkAgC6hU4J83759mjJlisrLy7/1fENDg5544gm5XC699957eu6557R69WpJUlZWlnJycjqjDAAAupxOmVq/4
447FBMTc9Xzv/rVr+RyuSRJra2tio6ONs/t3btXa9as0fLly+XxeDqjHAAAuoxOGZFb1dzcrLffflu5ubnmY/Pnz1d6eroaGxuVlZWlF198UQMHDrzm68THRysiItzf5XaqpKQewS6hS6Cf/Y8+9j/6ODBCpZ8DFuTNzc3Ky8vTo48+qtTUVPPx9PR0SVJUVJSGDh0qj8fTYZDX1jb4tdbOlpTUQ9XV54NdRsijn/2PPvY/+jgwnNbP1/rQ4bdV63V1dfJ6vZKkpqYm5ebm6uc//7luvvlmbd26VZJUWlqqkpIS8zmVlZVKSUnxV0kAAIScThmRf/zxx9q8ebOqq6u1atUqzZ49W4WFhYqLi1N2drbmz5+vI0eO6MSJE5IuL36bNm2aEhIStHLlSn322Wc6deqUpk2bptGjR3dGSQAAdAkuwzCMYBdxvZw0HSI5bwrHqehn/6OP/Y8+Dgyn9XNQptYBAID/EeQAADgYQQ4AgIMR5AAAOBhBDgCAgxHkAAA4GEEOAICDEeQAADgYQQ4AgIMR5AAAOBhBDgCAgxHkAAA4GEEOAICDEeQAADgYQQ4AgIMR5AAAOBhBDgCAgxHkAAA4GEEOAICDEeQAADgYQQ4AgIMR5AAAOBhBDgCAgxHkAAA4GEEOAICDEeQAADgYQQ4AgIMR5AAAOBhBDgCAg0V0xotUV1dr+fLlOnTokDZu3HjF+QsXLmjp0qXq06ePjh49quzsbA0ePFiStHnzZpWXlyssLEypqamaOXNmZ5QEAECX0ClBvm/fPk2ZMkXl5eXfen7dunXq16+ffvGLX6iiokJPPvmkXnvtNZ08eVIvv/yyioqK5HK5NH36dI0bN06DBg3qjLIAAAh5nRLkd9xxh/bs2XPV88XFxZo3b54kKS0tTYcOHZLX69WuXbs0fPhwuVwuSdLIkSNVUlJCkAM2VXXaqwOHq4NdRkjr+ZVX5842BruMkOekfu4ZE6mkpB5XPd8pQd6RmpoaxcTEmMexsbGqqanRmTNn2j0eExOjmpqaQJQE4O/wxAsf6sy5pmCXAXQ5745Ivuq5gAS52+1WfX29eez1euV2u5WQkKDKykrz8fr6eqWmpnb4evHx0YqICPdLrf5yrU9T6Dz0s395G5qVGBeluycMCXYpQJcR36P7Nc/7Lcjr6uoUERGh2NhYZWZmqqysTKNHj1ZFRYVuuukmxcbGavz48frDH/4gwzDkcrlUVlamWbNmdfjatbUN/irbL5KSeqi6+nywywh59LP/tRpSz+huum1Y72CXErL4Ow6MUOrnTvn62ccff6zNmzerurpaq1atUlNTkwoLC/Xaa69Jkn7605+qqqpKq1at0u9+9zv9+7//uySpb9++mj17thYvXqwlS5ZoxowZXB8HbM2QK9glAGjHZRiGEewirpfTPkWF0ic/O6Of/e+B/B26sV9PLbr3H4JdSsji7zgwnNbP17psyIYwAKwzDDEkB+yFIAdgWavBPxqA3fDfJABL2q7Cte37AMAeCHIAlrQtpiHHAXshyAFY879JzogcsBeCHIAlrc77ggvQJRDkAK5LGANywFYIcgCWmFtOMLUO2ApBDsCSVnIcsCWCHIA1bUHOjjCArRDkACxpNb9HHuRCALRDkAO4LuQ4YC8EOQBL2NkNsCeCHIAl7OwG2BNBDsASg53dAFsiyAFYYrDYDbAlghyAJeaIPLhlAPgGghyAJV9fIyfKATshyAFYwtQ6YE8EOQBLWOwG2BNBDsASc0Qe5DoAtEeQA7guDMgBeyHIAVjSttc6Y3LAXghyANb8b46HkeOArRDkACxpbfuBIAdshSAHYA03TQFsiSAHYAk7uwH2RJADsKSVETlgSwQ5gOtCjgP2QpADsKSVnd0AW4rojBfZvXu3tm3bJrfbLZfLpblz57Y7v2jRIh0/ftw8rqio0KZNm5ScnKzJkydrwIABkqTevXuroKCgM0oC0NnYax2wJZ+DvLGxUbm5udqyZYsiIyOVk5Oj0tJSZWRkmG1uv/123XnnnZIkr9erhQsXKjk5WZKUlZWlnJwcX8sA4GcsdgPsyeep9f3796t///6KjIyUJI0aNUrFxcXt2rSFuCS99dZbmj59unm8d+9erVmzRsuXL5fH4/G1HAB+wm1MAXvyeUReU1OjmJgY8zg2NlY1NTXf2ra1tVW7du3Sz372M/Ox+fPnKz09XY2NjcrKytKLL76ogQMHXvM94+OjFRER7mvpAZWU1CPYJXQJ9LP/nG++vCVMdHQk/exn9G9ghEo/+xzkbrdb9fX15rHX65Xb7f7Wttu3b9ekSZPafaJPT0+XJEVFRWno0KHyeDwdBnltbYOvZQdUUlIPVVefD3YZIY9+9q8zZy7/d97UeJF+9iP+jgPDaf18rQ8dPk+tjxgxQlVVVWpubpYkeTweZWZmqq6uTl6vt13bTZs2KSsryzwuLS1VSUmJeVxZWamUlBRfSwLgB4ZY7AbYkc8j8qioKOXl5enZZ59VfHy80tLSlJGRofz8fMXFxSk7O1uSVF5erkGDBrWbhk9ISNDKlSv12Wef6dSpU5o2bZpGjx7ta0kA/MBc7EaQA7biMgzz3oSO4aTpEMl5UzhORT/71xdV5/Ts7z/RHWNS9aPJ/y/Y5YQs/o4Dw2n97NepdQBdA1PrgD0R5AAsMefuCHLAVghyANb8b5CHMSQHbIUgB2BJq/OW0wBdAkEO4LqwsxtgLwQ5AEvavuBCjAP2QpADsITvkQP2RJADsKRtRM5iN8BeCHIAlrS2/UCOA7ZCkAOwxpxaJ8kBOyHIAVjCYjfAnghyAJaYG7uR5ICtEOQALDFH5CQ5YCsEOQBL+PoZYE8EOQBLzCDnKjlgKwQ5AEu4jSlgTwQ5AEu+HpEDsBOCHIAlBt8jB2yJIAdgicGQHLAlghzAdWGvdcBeCHIAlrS2jcgB2ApBDuC6hDEgB2yFIAdgSSs7uwG2RJADsMbcbD2oVQD4BoIcgCVtl8hZ7AbYC0EOwBIWuwH2RJADuC4MyAF7IcgBWGIuduMiOWArBDkAa7iNKWBLEZ3xIrt379a2bdvkdrvlcrk0d+7cduc3bdqkN954Q927d5ckTZ8+Xffcc48kafPmzSovL1dYWJhSU1M1c+bMzigJQCczF60T5ICt+BzkjY2Nys3N1ZYtWxQZGamcnByVlpYqIyOjXbtly5YpOTm53WMnT57Uyy+/rKKiIrlcLk2fPl3jxo3ToEGDfC0LQCcz+B45YEs+T63v379f/fv3V2RkpCRp1KhRKi4uvqLd+vXrtXbtWq1cuVJ1dXWSpF27dmn48OHmPwwjR45USUmJryUB8APumQLYk88j8pqaGsXExJjHsbGxqqmpadfme9/7njIzM5WQkKCdO3fqkUce0bp163TmzJl2z42Jibniud8mPj5aERHhvpYeUElJPYJdQpdAP/tPbOzlS2O9ekXRz35G/wZGqPSzz0HudrtVX19vHnu9Xrnd7nZtUlJSzJ/HjRunOXPm6NKlS0pISFBlZaV5rr6+XqmpqR2+Z21tg69lB1RSUg9VV58Pdhkh
j372r3PnmyRJ58830c9+xN9xYDitn6/1ocPnqfURI0aoqqpKzc3NkiSPx6PMzEzV1dXJ6/VKkgoKCtTS0iJJOnr0qJKTkxUeHq7x48fr4MGD5rW3srIyTZgwwdeSAPiBObXONXLAVnwekUdFRSkvL0/PPvus4uPjlZaWpoyMDOXn5ysuLk7Z2dlKTExUXl6ekpOTdfjwYeXn50uS+vbtq9mzZ2vx4sUKDw/XjBkzWOgG2JS52C3IdQBoz2UYztt30UnTIZLzpnCcin72r/c/Oa7XPziiX2bdrH9I6x3sckIWf8eB4bR+9uvUOoCugal1wJ4IcgDWMLUO2BJBDsCSVkbkgC0R5ACuDzkO2ApBDsCStnWxYQQ5YCsEOQBLvv56C0kO2AlBDsASRuSAPRHkACxpNe9jGtQyAHwDQQ7AGm5jCtgSQQ7AEm5jCtgTQQ7AEnNmnRE5YCsEOQBLuGkKYE8EOQBLvt5rPbh1AGiPIAdgiSEWuwF2RJADsIQROWBPBDkAS7iNKWBPBDkAS8yp9SDXAaA9ghyAJYzIAXsiyAFYwzVywJYIcgCWtJpbtAa5EADtEOQArouLq+SArRDkACxhRA7YE0EOwBqj4yYAAo8gB2BJ26r1MIbkgK0Q5AAsMcR9TAE7IsgBWML3yAF7IsgBWMJtTAF7IsgBWNK21o0BOWAvBDkAS9pG5Cx2A+yFIAdgiWEOyYNaBoBviOiMF9m9e7e2bdsmt9stl8uluXPntjtfWFio06dPKzExUQcPHtTDDz+sIUOGSJImT56sAQMGSJJ69+6tgoKCzigJQCdjsRtgTz4HeWNjo3Jzc7VlyxZFRkYqJydHpaWlysjIMNs0NDToiSeekMvl0nvvvafnnntOq1evliRlZWUpJyfH1zIA+Bm3MQXsyeep9f3796t///6KjIyUJI0aNUrFxcXt2vzqV78yP8W3trYqOjraPLd3716tWbNGy5cvl8fj8bUcAH5icPczwJZ8HpHX1NQoJibGPI6NjVVNTc23tm1ubtbbb7+t3Nxc87H58+crPT1djY2NysrK0osvvqiBAwde8z3j46MVERHua+kBlZTUI9gldAn0s/907375nwu3O1ZJ8dEdtIYv+DsOjFDpZ5+D3O12q76+3jz2er1yu91XtGtublZeXp4effRRpaammo+np6dLkqKiojR06FB5PJ4Og7y2tsHXsgMqKamHqqvPB7uMkEc/+1dj00VJUu2ZerlaLgW5mtDF33FgOK2fr/Whw+ep9REjRqiqqkrNzc2SJI/Ho8zMTNXV1cnr9UqSmpqalJubq5///Oe6+eabtXXrVklSaWmpSkpKzNeqrKxUSkqKryUB8AcWuwG25POIPCoqSnl5eXr22WcVHx+vtLQ0ZWRkKD8/X3FxccrOztb8+fN15MgRnThxQtLlxW/Tpk1TQkKCVq5cqc8++0ynTp3StGnTNHr0aJ9/KQCdj9uYAvbkMgzDcTcndNJ0iOS8KRynop/9a/XmA/q4/JR+M/c29YrtHuxyQhZ/x4HhtH7269Q6gK6hlal1wJYIcgDWGNzGFLAjghyAJW05zl7rgL0Q5AAscdxiGqCLIMgBWPL13c+CXAiAdghyAJZw0xTAnghyAJY48JuqQJdAkAOwpC3GWewG2AtBDsASc0BOjgO2QpADsITFboA9EeQALPn6CjlJDtgJQQ7AEoObpgC2RJADsOTrr58Ftw4A7RHkACz5ekROkgN2QpADuC7EOGAvBDkAS7iNKWBPBDkAawyD6+OADRHkACxpFaNxwI4IcgDWGFwfB+yIIAdgiWEYjMgBGyLIAVhiiO1ZATsiyAFYYhgGu8EANkSQA7DEMBiRA3ZEkAOwhAE5YE8EOQBLWOwG2BNBDsASQ3z9DLAjghyAJYzIAXsiyAFYYohr5IAdEeQALLm82I0kB+wmojNeZPfu3dq2bZvcbrdcLpfmzp3b7vyFCxe0dOlS9enTR0ePHlV2drYGDx4sSdq8ebPKy8sVFham1NRUzZw5szNKAtDJDG6aAtiSz0He2Nio3NxcbdmyRZGRkcrJyVFpaakyMjLMNuvWrVO/fv30i1/8QhUVFXryySf12muv6eTJk3r55ZdVVFQkl8ul6dOna9y4cRo0aJCvZQHoZIzIAXvyOcj379+v/v37KzIyUpI0atQoFRcXtwvy4uJizZs3T5KUlpamQ4cOyev1ateuXRo+fLj5j8PIkSNVUlLSYZA/vXaPr2UHVEREuFpaLgW7jJBHP/vX6bON6hEdGewyAHyDz0FeU1OjmJgY8zg2NlY1NTWW2pw5c6bd4zExMVc899vUeZt9LRvAdbohMkKjh/ZRUlKPYJcS8ujjwAiVfvY5yN1ut+rr681jr9crt9ttqU1CQoIqKyvNx+vr65Wamtrhe654ZLyvZQdUUlIPVVefD3YZIY9+9j/62P/o48BwWj9f60OHz6vWR4wYoaqqKjU3Xx4lezweZWZmqq6uTl6vV5KUmZmpsrIySVJFRYVuuukmxcbGavz48Tp48ODlmzFIKisr04QJE3wtCQCALsNltKWoDz788ENt3bpV8fHx6tatm+bOnav8/HzFxcUpOztbTU1NWrp0qZKSknTs2DE9+OCD7VatHzhwQOHh4Ro0aJClVetO+hQlOe+Tn1PRz/5HH/sffRwYTuvna43IOyXIA81JnS857w/Gqehn/6OP/Y8+Dgyn9bNfp9YBAEDwEOQAADgYQQ4AgIMR5AAAOBhBDgCAgxHkAAA4GEEOAICDEeQAADgYQQ4AgIMR5AAAOBhBDgCAgxHkAAA4GEEOAICDEeQAADgYQQ4AgIMR5AAAOBhBDgCAgxHkAAA4GEEOAICDEeQAADgYQQ4AgIMR5AAAOBhBDgCAgxHkAAA4GEEOAICDEeQAADgYQQ4AgIMR5AAAOBhBDgCAg0X48uS6ujoVFBQoJSVFR48e1bx585SYmNiuzaeffqp169Zp2LBh+vLLL5Wenq4f/ehHkqSnn35aX375pdn2qaeeUlpami8lAQDQpfgU5MuWLVNGRobuvPNO/elPf9LSpUv13HPPtWtTXV2tn/3sZ0pPT9fFixf1/e9/X1OnTlVCQoKSkpL0zDPP+PQLAADQlfkU5Dt37tScOXMkSaNGjdLChQuvaDNlypR2x+Hh4erWrZskqb6+Xv/5n/+p8PBwRUdHa+bMmYqI8KkkAAC6FJdhGMa1Gtx///06ffr0FY8//PDDeuSRR7R792717NlTLS0tGj58uA4ePHjVMH7llVckSffdd58k6eDBg0pLS1NERITy8/MVExOjX/7ylx0W3dJySRER4R22AwAg1HU4/F27du1Vz7ndbtXX16tnz57yer3q1avXVUP83XffVUNDgx566CHzseHDh5s/jxs3TmvWrLEU5LW1DR22sZOkpB6qrj4f7DJCHv3sf/Sx/9HHgeG0fk5K6nHVcz6tWp84caLKysokSR6PRxMnTpQktba2qqqqymy3YcMG1dTU6KGHHlJFRYW5wG3p0qVmm8rKSg0cONCXcgAA6HJ8uiA9b948Pf/88zp69Ki
OHz+uBQsWSJIqKir0+OOP691339UHH3ygJUuWaNiwYdq+fbvq6ur01FNPafDgwaqtrdXzzz+vG264QV9++aWeeOKJTvmlAADoKjq8Rm5HTpoOkZw3heNU9LP/0cf+Rx8HhtP62W9T6wAAILgIcgAAHIwgBwDAwQhyAAAcjCAHAMDBCHIAAByMIAcAwMEIcgAAHIwgBwDAwQhyAAAcjCAHAMDBCHIAAByMIAcAwMEIcgAAHIwgBwDAwQhyAAAcjCAHAMDBCHIAAByMIAcAwMEIcgAAHIwgBwDAwQhyAAAcjCAHAMDBCHIAAByMIAcAwMEIcgAAHIwgBwDAwQhyAAAcjCAHAMDBCHIAABwswpcn19XVqaCgQCkpKTp69KjmzZunxMTEK9pNnjxZAwYMkCT17t1bBQUFkqQTJ05o1apVGjhwoP7nf/5HCxYsUExMjC8lAQDQpfg0Il+2bJkyMjKUnZ2tqVOnaunSpd/aLisrS6+++qpeffVVM8QlKTc3VzNnztSDDz6o73znO1qzZo0v5QAA0OX4FOQ7d+7UyJEjJVVpVl4AAAV4SURBVEmjRo3Szp07v7Xd3r17tWbNGi1fvlwej0eSdPHiRe3Zs0e33HJLh88HAADfrsOp9fvvv1+nT5++4vGHH35YNTU15lR4bGyszp49q5aWFkVEtH/Z+fPnKz09XY2NjcrKytKLL76oqKgo3XDDDXK5XObza2pqLBWdlNTDUjs7cWLNTkQ/+x997H/0cWCESj93GORr16696jm32636+nr17NlTXq9XvXr1uiLEJSk9PV2SFBUVpaFDh8rj8eiuu+5SU1OTDMOQy+WS1+uV2+324VcBAKDr8WlqfeLEiSorK5MkeTweTZw4UZLU2tqqqqoqSVJpaalKSkrM51RWViolJUXdunXT2LFj9Ze//OWK5wMAAGtchmEYf++T6+rq9Pzzz6t///46fvy4HnvsMSUmJqq8vFyPP/643n33XVVUVGjlypUaPny4Tp06pT59+ujBBx+UdHnV+gsvvKCUlBT97W9/08KFC1m1DgDAdfApyAEAQHCxIQwAAA5GkAMA4GA+7eyG67dq1SqtW7dOe/bsCXYpIWfx4sWKiopSdHS0Dh06pEWLFikpKSnYZYWE3bt3a9u2bXK73XK5XJo7d26wSwo5x44d0/LlyzVs2DCdPHlScXFx9LOfNDU1acaMGbr99tu1YMGCYJfjM4I8gPbs2aNz584Fu4yQFRUVpUcffVSSVFhYqNWrV+tf//Vfg1yV8zU2Nio3N1dbtmxRZGSkcnJyVFpaqoyMjGCXFlLq6up05513aurUqZKkO++8U5mZmbr55puDXFnoafvAFCqYWg+Q06dPa8uWLZo1a1awSwlZbSEuSYZhKDo6OojVhI79+/erf//+ioyMlHR5F8bi4uLgFhWC0tPTzRCXLn+NNyoqKogVhaaioiKNGjVKycnJwS6l0zAi70TX2gVv+/btWrBggc6fPx+EykLHtfp4ypQpkqRz587pz3/+s377298GuryQ9H93cJSubxdG/H3ef/993X777RoyZEiwSwkpf/3rX/XFF19o3rx5qqioCHY5nYYg70RX2wXvL3/5iyIiIvTmm2/q7NmzunDhggoLC/WDH/xAgwYNCmyRDnetnQYl6fz58/r1r3+txYsXKy4uLkBVhba2HRzbsAujf3300Ufas2ePFi1aFOxSQs7777+vyMhIFRYWat++fbp48aJeeeUV3XfffcEuzScEeQDccsst5s1hTpw4obfeekvZ2dlBrir0nDlzRosXL9bjjz+uPn36aOvWrZo2bVqwy3K8ESNGqKqqSs3NzYqMjJTH49GPf/zjYJcVkoqLi/XJJ5/oySef1KlTp1RVVWXemAq+mzNnjvnzhQsX1NDQ4PgQl9gQJqAqKyv1xhtv6PXXX1d2drbuu+8+ruN2oqysLLW0tJgj8ZiYGK1evTrIVYWGDz/8UFu3blV8fLy6devGamo/OHDggO69915zcVtDQ4N+8pOf6Ic//GGQKws9W7du1fr163Xx4kX95Cc/0V133RXsknxCkAMA4GCsWgcAwMEIcgAAHIwgBwDAwQhyAAAcjCAHAMDBCHIAAByMIAcAwMHY2Q1AhzZt2qTt27erT58+Onv2rN577z1t3rxZ3/3ud4NdGtDlsSEMgA6VlZUpLi5OgwcPVk5OjlJTU/Uv//IvwS4LgAhyANdhw4YNeu211/Tmm2+atzUFEFxMrQOw5Msvv1RBQYHWr19PiAM2wmI3AB26ePGiHnvsMeXk5GjIkCH6/PPP9cknnwS7LABiRA7AgldeeUWff/65Pv/8cz3zzDP66quvNGXKFI0ePTrYpQFdHtfIAQBwMKbWAQBwMIIcAAAHI8gBAHAwghwAAAcjyAEAcDCCHAAAByPIAQBwMIIcAAAH+/+w2Id0pcVObAAAAABJRU5ErkJggg==\n", - "text/plain": [ - "
    " - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "data": { - "image/png": "iVBORw0KGgoAAAANSUhEUgAAAfIAAAFnCAYAAABdOssgAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAgAElEQVR4nO3deXhU5d038O+syWQmy2QyCSRkYwsQiYAgpAVBUam4Aq++PG71kUesFdDy8AiKCvVVnoJgaaUoUFQui9VaWYq0iqVlUcKasIUkbEnIQkIyySSZJbOe94/AaIQhEybJmeX7uS4vMjP3zPzmZ5Jvzn3OuY9EEAQBREREFJSkYhdAREREN45BTkREFMQY5EREREGMQU5ERBTEGORERERBjEFOREQUxBjkRCFq1qxZGDp0KA4cOODzc5YvX46pU6fi/vvvx86dO7uxuvZsNhvGjx8Pq9XaY+9JFCoY5EQB5IknnsCmTZu65LVWrVoFvV7v8/gLFy5g48aN+PTTT/Huu+9CpVJ1SR3X8uPPGRERgW3btnXrexKFKrnYBRBRYKipqYFWq4VSqURGRgYyMjJ69P1jYmJ69P2IQgWDnChArFixAkVFRairq8PmzZsxY8YMjBo1CosXL0ZlZSUkEgkGDBiA1157DXK5HKtWrcKf//xnTJo0Cc3NzSguLkZ2djaWLl3a7nWPHTuGDRs24MyZM3jqqafw2GOPXfXex48fx1tvvYW6ujo88cQTuPPOO7F7925899132LlzJxISEvDkk0/i2LFjKCkpgd1ux4wZM3Dw4EG8/vrr2LVrF0pLSzF//nzcddddAACz2Yz//d//xblz5wAAmZmZmDdvHj788MOrPue2bduwY8cO/PGPf8To0aMhCALWr1+PHTt2QCaTISMjAwsXLoRGo8HixYvx5Zdf4vHHH8f58+dRUlKCSZMmYe7cud3/P4koEAlEFDAef/xx4YsvvvDcbmxsFLZs2eK5PX/+fOEvf/lLu9sPPvigYLPZhNbWVuHWW28V8vPzPY/ffvvtwqJFiwRBEIRjx44Jw4YNExwOxzXfe//+/cLtt9/e7r6BAwcKFRUVgiAIQkVFhTBw4MCrHl+7dq0gCIKwfft24e677/Y89uqrrwoLFiwQBEEQXC6X8Oyzzwr79++/5ue8UuuVxzdv3ixMnjxZsFgsgiAIwiuvvCK8/PLL7fr0zDPPCG63W6itrRWGDBki1NTUXPNzEYU67iMnCmCxsbGorq7Gf/zHf+CJJ57AwYMHUVhY2G7M6NGjoVQqERERgfT0dFRWVrZ7fNy4cQCArKwsWCwWGAyGLq3xh69fVVUFAHC73diyZQumTp0KAJBKpViwYAH69+/v02tu3boV99xzj2ef+dSpU/G3v/0NTqfTM2bs2LGQSCRITExEXFyc572Jwg2n1okC2ObNm/HZZ59hy5YtiIuLw7vvvntVYGk0Gs/XERERcDgc13w8IiICAK563F8/fP0rr93Q0AC73Y74+HjPuM7sc6+pqWn33Pj4eDgcDhgMBiQlJbV73x+/N1G44RY5UQA7fvw4cnJyEBcXBwDttkh7gkKhgN1uBwC0tLT4/Lz4+HgolUo0NDR47qutrUVdXZ1Pz+/du3e75zY0NEChUCAhIcHnGojCBYOcKICo1WpYrVaUlZVh6dKlSE9PR3FxMex2O5xOJ/Ly8nq0npSUFJw5cwYAsHv3bp+fJ5VK8dBDD3lOMXO73Vi4cKEnyH/8OX9sypQp+Oqrr9Da2goA2LJlCx544AHIZDJ/PxJRyJEtXrx4sdhFEFGbiIgIvPfee9i9ezceffRRTJw4EUeOHMF7772Hw4cPQ6VS4eDBg5BKpTh69Ci2bNmCM2fOIDk5Gf/4xz/wz3/+E0VFRejXrx/effddHDt2DCdPnsTYsWOxcOFCnD9/HseOHcNdd92FyMhIz/seP34cb7zxBqqqqpCXl4fs7GwkJCQgMTERb7/9Nvbu3YusrCzs3r0bBw8exIMPPoj/+q//QkVFBY4dO4af/exneP7551FbW4uCggI8+OCDGD16NL799lv88Y9/xBdffIFJkyZh4sSJ1/ycv/3tb3HixAmcPHkSQ4cOxbhx42C1WrFixQps2rQJMTExWLhwIZRKJZYtW4bdu3ejqKgI2dnZWLNmDQ4fPoyTJ09i1KhR7abkicKBRBAEQewiiIiI6MZwap2IiCiIMciJiIiCWJecflZXV4eVK1eiuLgYX3zxxVWP22w2LF26FElJSSgrK8PMmTORmZkJoO180aKiIkilUqSlpWH69OldURIREVFY6JIgP3LkCCZOnIiioqJrPr5hwwb07t0bzzzzDEpKSrBw4UJ88sknqKmpwQcffIAtW7ZAIpFg2rRpGDNmTI+v8UxERBSsumRq/Wc/+xnUarXXx3ft2oXhw4cDaFv9qbi4GCaTCXv37kV2djYkEgkAYPjw4dizZ09XlERERBQWemRlN4PB0C7oNRoNDAYDGhoa2t2vVqt9Wj7S6XRBLuf5pEQUWhxONwxNVjQ0t6Kx2QZDsxVNJjuaTDY0m+1oNtthsthhsjpgsjpgs7u6rRaZVNL2n0wCqUQCqVQKmVQCqRSQSiSQSK/cL4FUAgDffy2RSCD54b9o+1p6+caV2wA8465oG9L2vCu3217d88UP/2n3XPzgyx/fbDeuE27waVe/zo+L89Hdo9Mw+qbe1x3TI0Gu0+lgNps9t00mE3Q6HeLj41FeXu6532w2Iy0trcPXa2y0+PS+en006up8X40qnLA33rE33rE33vnaG6vNiYsGC2obLKhttKCmwQJDUyvqm1vRbLLjeucDSwCoIuSIipSjlzYKUZFyRCplUEW0/RuhlCFSIUOEQgalQgalQgqlXAaFXAqlXAq5XAqFXAq5TAqFTAqZTAK5rO32j4P7RoPPn96EI196U1fXAr0+2uvj3RbkRqMRcrkcGo0GEyZMQEFBAUaOHImSkhIMGjQIGo0G48aNw5/+9CcIggCJRIKCggI8/vjj3VUSEVGParbYcb66GaXVzai4ZEJlnQn1Ta1XjZNJJdBGRyArLQ7a6EhooyMQq1EiThOBmCgFoqOUiI5SQB2pgFTadQFLoaFLgvzgwYPYunUr6urqsHr1ajz99NNYu3Yt4uLiMHPmTDz55JNYunQpVq9ejQsXLuCtt94CAPTq1QtPP/00lixZAplMhocffpgHuhFR0GpssaGovAFFZY04XWlEnbF9aMdEKTA4XYsUvRq946OQGB+FJK0K8dGRDGi6YUG5spuvUzSczvGOvfGOvfGOvWlPEASUXmxB/uk6nChtQEXt971RR8qRmRyDfsmx6Jscg7SkaMSqlSJWKx5+33jna29EmVonIgpVVXUm7D1+EYeKL6GxxQYAUCpkGNpXh8HpWgzJ0KJPoqbt4C6ibsYgJyLygc3hQ
l5hDfYeu4jSi80A2ra6f3JTL4wYqMf4UWloabKKXCWFIwY5EdF1mKwO/OtIJf55pBImqwMSCZDTT4exQ3tj2IAEyGVty3FEKuXg5DGJgUFORHQN5lYHvtxXhn8XVMHucEMdKcd9P8nA7cNToI2OELs8Ig8GORHRDzhdbvw7vwp/+64U5lYntNERmHpbGm67uTcilfyVSYGH35VERJedKmvAx1+XoLbRClWEHI/c3h8Tb+kDhZwXiqTAxSAnorDXanfi813n8O/8KkglEkwc0QcPjM1AdFR4ni5GwYVBTkRh7XSFEeu3n0KdsRXJCWrMuHcwMnvHiF0Wkc8Y5EQUlgRBwDeHKvDZv88CAO4Zk4aHxmZCwQsyUZBhkBNR2LE7XNjwVQnyCmsQq1biuYduwsDUOLHLIrohDHIiCiuNLTa8+8VxlNW0ILN3DGZNHcrTySioMciJKGzUG61Y9ucC1De14qdDe+HJSVmcSqegxyAnorBQ22DB258WoKHZhgfHZuKBn2Z06TW3icTCICeikFdVb8byTwvQZLLj4Qn9cM+YdLFLIuoyDHIiCmmXGi14+5N8NFsc+I87B+Cukalil0TUpRjkRBSyWix2/PYvx9BsceCxuwZi4i19xC6JqMtx3UEiCkkOpwvvbjqB2kYrJo9JZ4hTyGKQE1HIcQsC1n1ZhLOVTbh1cCKmju8rdklE3YZBTkQhZ8veUhwuvoQBfWIx497BkPLodAphDHIiCimFpQ3Yvq8M+rhIzJ6Ww/PEKeQxyIkoZDSZ7Vj35SlIpRL84sGboFEpxC6JqNsxyIkoJLgFAX/cVohmsx3/Z0I/XsGMwgaDnIhCwj/2l6OwrBE5/XS4exTPFafwwSAnoqBXXtOCzXtKEadRYsa9g7n0KoUVBjkRBTW3W8BHXxXDLQiYce8QREcpxS6JqEcxyIkoqO08UonymhbkZvdCdma82OUQ9TgGOREFrYbmVmzaex7qSDn+78T+YpdDJAoGOREFrY3fnIbN7sIjd/RHDKfUKUwxyIkoKOWfrkPBmXpkpcZh7NDeYpdDJJouufrZvn37sGPHDuh0OkgkEsyaNavd46+88goqKio8t0tKSrBp0yb06dMHd9xxB1JSUgAAiYmJWLFiRVeUREQhzOF049OdZyCXSfDkz7J4lDqFNb+D3Gq1YtGiRdi+fTuUSiVmz56NvLw85ObmesaMHTsWkydPBgCYTCYsWLAAffq0XYloypQpmD17tr9lEFEY2XW0CvVNrbh7VCp669Ril0MkKr+n1o8ePYrk5GQolW37p0aMGIFdu3a1G3MlxAHgr3/9K6ZNm+a5fejQIaxbtw4rV65Efn6+v+UQUYiz2pzY9l0ZVBEy3JubLnY5RKLze4vcYDBArf7+L2KNRgODwXDNsW63G3v37sXPf/5zz33z5s1DTk4OrFYrpkyZgjVr1iA9/fo/nFptFOQ+XghBr4/2aVw4Ym+8Y2+8E7s3f/qqCCarA4/fMwh903Wi1vJjYvcmkLE33vnbG7+DXKfTwWw2e26bTCbodNf+4dq5cyduv/32dvuzcnJyAAAqlQqDBw9Gfn5+h0He2GjxqTa9Php1dS0+jQ037I137I13YvemyWTD5l1nEatW4qeDkwLq/5PYvQlk7I13vvbmemHv99T6sGHDUF1dDbvdDgDIz8/HhAkTYDQaYTKZ2o3dtGkTpkyZ4rmdl5eHPXv2eG6Xl5cjNZVrJBPRtf1tXxnsDjceHJuJCCUvT0oEdMEWuUqlwuLFi/Hmm29Cq9UiKysLubm5WLZsGeLi4jBz5kwAQFFRETIyMtpNw8fHx2PVqlU4deoULl26hEmTJmHkyJH+lkREIehSowV7jlYjSavC2ByebkZ0hUQQBEHsIjrL1ykaTud4x954x954J2ZvNnxVjN1Hq/HsA9kYPSRJlBquh9833rE33gXE1DoRUXdrMtnw3YkaJMapMGpQotjlEAUUBjkRBbxvDlfC6XLjZ6PTIJVy8ReiH2KQE1FAs7Q68e+CSsSolfjp0F5il0MUcBjkRBTQdh+tgtXmwl0j+0Dh4/oRROGEQU5EAcvhdGPH4QpEKmW4fXiK2OUQBSQGOREFrLzCGjSZ7JgwLAVRkQqxyyEKSAxyIgpIgiDg64MXIJNKcNcoLhRF5A2DnIgCUskFIy4aLBg1KBHa6AixyyEKWAxyIgpIu45WAQAmcN840XUxyIko4DSZ7ThSUocUvRoD+sSKXQ5RQGOQE1HA+fZ4NVxuAROGpbS7WiIRXY1BTkQBxe0WsPtoNZQKKXKzuQAMUUcY5EQUUE6WNqC+qRVjhiQhKtLvCzQShTwGOREFlF0FPMiNqDMY5EQUMBqaW3HsXD0ye0cjo1eM2OUQBQUGOREFjO9OXIQgAOOHcWucyFcMciIKCIIgIK+wFkq5lNccJ+oEBjkRBYSymhbUNFgwbEACVBE8yI3IVwxyIgoIeYU1AMBTzog6iUFORKJzud04eKoWGpUC2ZnxYpdDFFQY5EQkusLSRjRbHBg9OAlyGX8tEXUGf2KISHT7L0+rj7kpSeRKiIIPg5yIRGW1OZF/ug6JWhX69ua540SdxSAnIlEVnKmD3elGbnYvXiCF6AYwyIlIVHmFtQCAMdmcVie6EQxyIhJNi8WOU2UNyOwdgyRtlNjlEAUlBjkRiabgTD0EAVzJjcgPDHIiEs2RkjoAwIgsvciVEAUvBjkRicLS6sSpsgakJWqQGKcSuxyioNUlCxrv27cPO3bsgE6ng0QiwaxZs9o9vmnTJnz66aeIiIgAAEybNg0PPfQQAGDr1q0oKiqCVCpFWloapk+f3hUlEVGAO3auHi63gFu4NU7kF7+D3Gq1YtGiRdi+fTuUSiVmz56NvLw85Obmthv3zjvvoE+fPu3uq6mpwQcffIAtW7ZAIpFg2rRpGDNmDDIyMvwti4gC3PfT6tw/TuQPv6fWjx49iuTkZCiVSgDAiBEjsGvXrqvGbdy4EevXr8eqVatgNBoBAHv37kV2drbn3NHhw4djz549/pZERAHOZnfh5HkDeuuikJKgFrscoqDm9xa5wWCAWv39D6JGo4HBYGg3ZtSoUZgwYQLi4+Oxe/duvPDCC9iwYQMaGhraPVetVl/1XCIKPSfOG2B3ujmtTtQF/A5ynU4Hs9nsuW0ymaDT6dqNSU1N9Xw9ZswYPPfcc3C5XIiPj0d5ebnnMbPZjLS0tA7fU6uNglwu86k+vT7ap3HhiL3xjr3xrit6c/LrEgDAxNEZIdXrUPosXY298c7f3vgd5MOGDUN1dTXsdjuUSiXy8/Px6KOPwmg0Qi6XQ6PRYMWKFXjhhRcgl8tRVlaGPn36QCaTYdy4cfjTn/4EQRAgkUhQUFCAxx9/vMP3bGy0+FSbXh+NuroWfz9iSGJvvGNvvOuK3jicbhwsrEFCbCRilNKQ6TW/b7xjb7zztTfXC3u/g1ylUmHx4sV48803odVqkZWVhdzcXCxbtgxxcXGYOXMmEhISsHjxYvTp0wenT5/GsmXLAAC9evXC008/jSVLlkAmk+Hhhx/mgW5EIe5U
WQNa7S6MH5bMtdWJuoBEEARB7CI6y9e/7PhXoHfsjXfsjXdd0ZuP/lGMPceq8fLjIzCgT1wXVSY+ft94x9541xVb5FwQhoh6jCAIOHHeAI1KgX7JsWKXQxQSGORE1GMqLpnQ2GLD0L7xkEo5rU7UFRjkRNRjjp1rO700p1+CyJUQhQ4GORH1mOPn6iGVSHBT33ixSyEKGQxyIuoRLRY7zlc1o39KDNSRCrHLIQoZDHIi6hEnzhsgAMjpz2l1oq7EICeiHnHcs39c18FIIuoMBjkRdTuX242T5xugi4ngRVKIuhiDnIi63dnKJlhsTuT0S+BqbkRdjEFORN2O0+pE3YdBTkTd7vg5AxRyKQala8UuhSjkMMiJqFs1NLeiqt6MwelaRCh8u/wwEfmOQU5E3aqwtAEAkJ3JRWCIugODnIi6VWHZ5SDPYJATdQcGORF1G7cg4FRZI7TREeitixK7HKKQxCAnom5zobYFJqsD2RnxPO2MqJswyImo21zZPz4kk0erE3UXBjkRdZtTZY0AgCHcP07UbRjkRNQtbA4XzlQakZakQUyUUuxyiEIWg5yIusXpCiOcLoGnnRF1MwY5EXULz/njnFYn6lYMciLqFoVlDVDIpRjQJ1bsUohCGoOciLqc0WRDVZ0ZWalxUMi5LCtRd2KQE1GX85x2xml1om7HICeiLnfltDMe6EbU/RjkRNSlBEFA8YVGREcpkKJXi10OUchjkBNRl7rUaEVjiw1ZaVpIuSwrUbdjkBNRlyq60DatPjgtTuRKiMIDg5yIulRxeVuQD0rn+upEPYFBTkRdpm3/uBGxGiV6xfOypUQ9Qd4VL7Jv3z7s2LEDOp0OEokEs2bNavf42rVrUV9fj4SEBBQWFmLOnDno168fAOCOO+5ASkoKACAxMRErVqzoipKISATVBguazXaMGZLEy5YS9RC/g9xqtWLRokXYvn07lEolZs+ejby8POTm5nrGWCwWvPzyy5BIJPj73/+Ot99+G++//z4AYMqUKZg9e7a/ZRBRAOC0OlHP83tq/ejRo0hOToZS2XZ1oxEjRmDXrl3txrz44ouev87dbjeior6fcjt06BDWrVuHlStXIj8/399yiEhEDHKinuf3FrnBYIBa/f25ohqNBgaD4Zpj7XY7Nm/ejEWLFnnumzdvHnJycmC1WjFlyhSsWbMG6enp131PrTYKch+XfdTro30aF47YG+/YG++89cbtFnC60gi9VoUh/fVhObXO7xvv2Bvv/O2N30Gu0+lgNps9t00mE3Q63VXj7HY7Fi9ejF/96ldIS0vz3J+TkwMAUKlUGDx4MPLz8zsM8sZGi0+16fXRqKtr8WlsuGFvvGNvvLteby7UtqDF4kBOPx3q6009XJn4+H3jHXvjna+9uV7Y+z21PmzYMFRXV8NutwMA8vPzMWHCBBiNRphMbT/Mra2tWLRoEf7zP/8TN910E77++msAQF5eHvbs2eN5rfLycqSmpvpbEhGJwDOtnsZpdaKe5PcWuUqlwuLFi/Hmm29Cq9UiKysLubm5WLZsGeLi4jBz5kzMmzcPZ86cQWVlJYC2g98mTZqE+Ph4rFq1CqdOncKlS5cwadIkjBw50u8PRUQ9r/iCEQAwmPvHiXqURBAEQewiOsvXKRpO53jH3njH3njnrTcutxtzfrcX0VFK/ObZ3Gs8M/Tx+8Y79sa7gJhaJyK6UGuC1ebitDqRCBjkROS30xVt0+pZXF+dqMcxyInIbyWX949npTLIiXoag5yI/OIWBJypNCIhNhLxMZFil0MUdhjkROSXqjozzK1OTqsTiYRBTkR+Kbl8/fGBnFYnEgWDnIj88v2BbjxinUgMDHIiumGCIOB0hRHa6AjoY7l/nEgMDHIiumE1DRY0WxzISo0Ly4ukEAUCBjkR3bArp50N5IFuRKJhkBPRDfPsH+eBbkSiYZAT0Q0RBAElFUbERCnQKz5K7HKIwhaDnIhuSJ3RisYWGwZy/ziRqBjkRHRDSnjaGVFAYJAT0Q05zfXViQICg5yIbsjpSiPUkXIk69Vil0IU1hjkRNRpjS021BlbMaBPHKTcP04kKgY5EXXaldPOBqTGilwJETHIiajTTldeXgimD/ePE4mNQU5EnXamwgilXIr0XtFil0IU9hjkRNQpJqsDVXVm9E2OgVzGXyFEYuNPIRF1ytnKJgjg9ceJAgWDnIg6xbN/nEFOFBAY5ETUKWcqjJBJJeiXzCPWiQIBg5yIfNZqd6KspgVpSdGIUMrELoeIwCAnok44faERLreAgTx/nChgMMiJyGeF5wwAeP44USBhkBORzwpL24J8AA90IwoYDHIi8onT5UZxeSOSE9TQqBRil0NEl8m74kX27duHHTt2QKfTQSKRYNasWe0et9lsWLp0KZKSklBWVoaZM2ciMzMTALB161YUFRVBKpUiLS0N06dP74qSiKiLXag1wWZ3YWAf7h8nCiR+B7nVasWiRYuwfft2KJVKzJ49G3l5ecjNzfWM2bBhA3r37o1nnnkGJSUlWLhwIT755BPU1NTggw8+wJYtWyCRSDBt2jSMGTMGGRkZ/pZFRF3s+wulcFqdKJD4PbV+9OhRJCcnQ6lUAgBGjBiBXbt2tRuza9cuDB8+HACQlZWF4uJimEwm7N27F9nZ2ZBcvgzi8OHDsWfPHn9LIqJucIYXSiEKSH5vkRsMBqjVas9tjUYDg8Hg05iGhoZ296vV6queey1abRTkct/OYdXreVEHb9gb79ib9txuAWermqHXqjCov17scgIWv2+8Y2+887c3fge5TqeD2Wz23DaZTNDpdD6NiY+PR3l5ued+s9mMtLS0Dt+zsdHiU216fTTq6lp8Ghtu2Bvv2JurVdWb0WKx45ZBfdgbL/h94x17452vvble2Ps9tT5s2DBUV1fDbrcDAPLz8zFhwgQYjUaYTCYAwIQJE1BQUAAAKCkpwaBBg6DRaDBu3DgUFhZCEAQAQEFBAW677TZ/SyKiLnbm8v7xIX11HYwkop7m9xa5SqXC4sWL8eabb0Kr1SIrKwu5ublYtmwZ4uLiMHPmTDz55JNYunQpVq9ejQsXLuCtt94CAPTq1QtPP/00lixZAplMhocffpgHuhEFoCsXSsnOjBe5EiL6MYlwZXM4iPg6RcPpHO/YG+/Ym6v9z+rvYHO48cn/uwf19SaxywlI/L7xjr3xLiCm1okotBmaWmFotmFAn1jPGSZEFDgY5ER0XVem1QfwtDOigMQgJ6LrunKg20AuBEMUkBjkRHRdpyuboFRIkZakEbsUIroGBjkReWWyOlBdb0a/5FjIZfx1QRSI+JNJRF55lmXltDpRwGKQE5FXZyqaAAADeMUzooDFICcir05XGiGTStAvmUFOFKgY5ER0Ta12J8prWpDeKxoRSt8uUkREPY9BTkTXdK66GS63wP3jRAGOQU5E13T6Ag90IwoGDHIiuqbTFUZIwAPdiAIdg5yIruJwunGuuhl9EjVQRyrELoeIroNBTkRXKb3YDKfLzWl1oiDAICeiq1xZCCaLQU4U8BjkRHSVkssXShnAICc
KeAxyImrH5XbjbGUTesVHIVatFLscIuoAg5yI2qm4ZEKr3cX940RBgkFORO1cOX+c+8eJggODnIjaubJ/nFvkRMGBQU5EHm5BwJnKJuhiIqGLjRS7HCLyAYOciDwu1pthsjq4NU4URBjkRORxZVo9K41BThQsGORE5FF8gUFOFGwY5EQEABAEASUXGqGNjkBinErscojIRwxyIgIAVNeb0WJxYFBaHCQSidjlEJGPGOREBOCH0+pakSshos5gkBMRAKDkQiMAYBD3jxMFFQY5EcEtCCi+YIQ2OgJ67h8nCioMciJC9eXzxwelabl/nCjIyP15stFoxIoVK5CamoqysjLMnTsXCQkJ7cYcP34cGzZswJAhQ1BaWoqcnBw88sgjAIDXX38dpaWlnrGvvvoqsrKy/CmJiG5AyeX945xWJwo+fgX5O++8g9zcXEyePBn/+te/sHTpUrz99tvtxtTV1eHnP/85cnJy4HA48JOf/AR33nkn4uPjodfr8cYbb/j1AYjIf8WX949npfNAN6Jg41eQ7969G8899xwAYMSIEb7Wz8AAABeNSURBVFiwYMFVYyZOnNjutkwmg0KhAACYzWa89957kMlkiIqKwvTp0yGX+1USEXWSWxBQcsGI+JgI6Lm+OlHQ6TA1Z8yYgfr6+qvunzNnDgwGA9RqNQBAo9GgqakJTqfTaxhv3LgRv/jFLxAdHQ0AuP/++5GVlQW5XI5ly5ZhzZo1eP755zssWquNglwu63AcAOj10T6NC0fsjXfh1Juyi80wWR24IzsViYkxHY4Pp950FnvjHXvjnb+96TDI169f7/UxnU4Hs9mMmJgYmEwmxMbGeg3xbdu2wWKx4Je//KXnvuzsbM/XY8aMwbp163wK8sZGS4djgLbm1NW1+DQ23LA33oVbb/KOVgIA0hPVHX7ucOtNZ7A33rE33vnam+uFvV9HrY8fPx4FBQUAgPz8fIwfPx4A4Ha7UV1d7Rn3+eefw2Aw4Je//CVKSko8B7gtXbrUM6a8vBzp6en+lENEN+D7A924f5woGPm1Q3ru3LlYvnw5ysrKUFFRgfnz5wMASkpK8NJLL2Hbtm345z//id/85jcYMmQIdu7cCaPRiFdffRWZmZlobGzE8uXLERkZidLSUrz88std8qGIyDdt5483QhcTgQTuHycKShJBEASxi+gsX6doOJ3jHXvjXTj1prymBb/+6BDGDu2Np+8d3OH4cOpNZ7E33rE33ok+tU5Ewe1UWQMAYEgGp9WJghWDnCiMXQnywRnxIldCRDeKQU4UphxOF05XNqGPXo1YtVLscojoBjHIicLU2apmOJxuDOHWOFFQY5AThSnuHycKDQxyojB1qqwRMqkEA1N5oRSiYMYgJwpD5lYHymqa0Tc5BpFKXt+AKJgxyInCUHG5EYIA7h8nCgEMcqIwdKqc+8eJQgWDnCgMnSprRIRShszeHV/tjIgCG4OcKMw0NLeitsGCQalxkMv4K4Ao2PGnmCjMFHpOO+P+caJQwCAnCjMnz18O8kwGOVEoYJAThRGX243C0gboYiKQrIsSuxwi6gIMcqIwcr66GRabE0P76iCRSMQuh4i6AIOcKIycOG8AAAztqxO5EiLqKgxyojBy4lwDZFIJBqXz/HGiUMEgJwoTTSYbymtbMDA1DqoILstKFCoY5ERh4mRp29HqnFYnCi0McqIw8f3+cZ52RhRKGOREYeDKaWfxMRFITlCLXQ4RdSEGOVEYKK1ugbmVp50RhSIGOVEYOM7TzohCFoOcKAycOG+ATCrBYJ52RhRyGOREIc5osqG8pgUD+sTytDOiEMQgJwpxR8/UAwCGD9CLXAkRdQcGOVGIK/AEeYLIlRBRd2CQE4Uwq82JovIGpCVqkBCnErscIuoGDHKiEHbivAFOl4DhAzmtThSq/DryxWg0YsWKFUhNTUVZWRnmzp2LhISrp+/uuOMOpKSkAAASExOxYsUKAEBlZSVWr16N9PR0VFVVYf78+VCruVgFUVfhtDpR6PNri/ydd95Bbm4uZs6ciTvvvBNLly695rgpU6bg448/xscff+wJcQBYtGgRpk+fjmeffRYDBgzAunXr/CmHiH7A6XLj+Ll66GIikZqoEbscIuomfgX57t27MXz4cADAiBEjsHv37muOO3ToENatW4eVK1ciPz8fAOBwOHDgwAEMHTq0w+cTUecVX2iE1ebC8IEJXM2NKIR1OLU+Y8YM1NfXX3X/nDlzYDAYPFPhGo0GTU1NcDqdkMvbv+y8efOQk5MDq9WKKVOmYM2aNVCpVIiMjPT8gtFoNDAYDD4VrdVGQS6X+TRWr4/2aVw4Ym+8C4XeFO05DwC4Y1R6l36eUOhNd2FvvGNvvPO3Nx0G+fr1670+ptPpYDabERMTA5PJhNjY2KtCHABycnIAACqVCoMHD0Z+fj7uu+8+tLa2QhAESCQSmEwm6HS+LR/Z2GjxaZxeH426uhafxoYb9sa7UOiNWxCQd7wa6kg59NGKLvs8odCb7sLeeMfeeOdrb64X9n5NrY8fPx4FBQUAgPz8fIwfPx4A4Ha7UV1dDQDIy8vDnj17PM8pLy9HamoqFAoFRo8ejRMnTlz1fCLyT3lNC4wmO4b1T4BMypNTiEKZX0etz507F8uXL0dZWRkqKiowf/58AEBJSQleeuklbNu2DfHx8Vi1ahVOnTqFS5cuYdKkSRg5ciQA4Ne//jX+8Ic/4Ntvv8XFixexYMEC/z8RESH/dB0A8LQzojAgEQRBELuIzvJ1iobTOd6xN94Fe28EQcCCNXloNjuwcs5YRCh8O57EF8Hem+7E3njH3ngn+tQ6EQWe8xebUWdsxfCBCV0a4kQUmBjkRCHmwKlaAMCtg5NEroSIegKDnCiEuN0CDhVfgjpSjpsy48Uuh4h6AIOcKISUVBjRZLLjlqxEyGX88SYKB/xJJwohV6bVRw9OFLkSIuopDHKiEOF0uXGk5BJiNUpkpWnFLoeIegiDnChEFJY2wNzqxKhBiZBKubY6UbhgkBOFiANFV6bVebQ6UThhkBOFAJvDhYIz9UiIjUTf5BixyyGiHsQgJwoBR0ouwWZ3YUx2Ei9ZShRmGOREIWDPsYsAgLE5ySJXQkQ9jUFOFORqGiw4XWHE4HQtEuNUYpdDRD2MQU4U5PYea7tk8Libe4tcCRGJgUFOFMScLje+O1kDdaQct/CSpURhiUFOFMSOnzOg2WxHbnYvKOS80hlROGKQEwWxPZ5pdR7kRhSuGOREQaqhuRUnzhuQ2TsGqYkascshIpEwyImC1HcnLkIQeJAbUbhjkBMFIafLjX8XVCFCKeOSrERhjkFOFIQOFV+C0WTHuJzeUEXIxS6HiETEICcKMoIgYMfBCkgkwJ0jU8Uuh4hExiAnCjKnK4wor23BiIF6ruRGRAxyomCz41AFAODuUdwaJyIGOVFQqW204OiZemT2jkH/lFixyyGiAMAgJwoi/zxUCQHApFtTeblSIgLAICcKGuZWB/aeqIYuJgK3ZHFddSJqwyAnChJfH6yA3eHGnSNTIZPyR5eI2vC3AV
EQaLHY8c3hCsSolZgwPEXscogogDDIiYLAVwcuwGZ34d7cdEQoeJUzIvqeX0tCGY1GrFixAqmpqSgrK8PcuXORkJDQbsyBAwfwxhtvID4+HgBgMBhwzz33YPbs2Xj99ddRWlrqGfvqq68iKyvLn5KIQk6TyYadRyqhjY7AhGG8yhkRtedXkL/zzjvIzc3F5MmT8a9//QtLly7F22+/3W5MYmIi3n77bQwZMgQA8Morr2Dq1KkAAL1ejzfeeMOfEohC3vb95bA73fi/P8ngNceJ6Cp+Bfnu3bvx3HPPAQBGjBiBBQsWXDUmMzPT83V9fT3sdjtSUtr28ZnNZrz33nuQyWSIiorC9OnTIZdz3WiiKxqaW7GroBoJsZEYl8OrnBHR1TpMzRkzZqC+vv6q++fMmQODwQC1Wg0A0Gg0aGpqgtPp9BrGn3zyCaZPn+65ff/99yMrKwtyuRzLli3DmjVr8Pzzz3dYtFYbBbmPWyZ6fbRP48IRe+NdoPTm893n4XS58eikQejdKzAWgAmU3gQi9sY79sY7f3vTYZCvX7/e62M6nQ5msxkxMTEwmUyIjY31GuJ2ux0nT57EnDlzPPdlZ2d7vh4zZgzWrVvnU5A3Nlo6HAO0NaeursWnseGGvfEuUHpTccmEr/aXISk+CkMz4gKipkDpTSBib7xjb7zztTfXC3u/jlofP348CgoKAAD5+fkYP348AMDtdqO6urrd2G3btuHee+9td9/SpUs9X5eXlyM9Pd2fcohChiAI+NOOEggC8NidA3jeOBF55dcO6blz52L58uUoKytDRUUF5s+fDwAoKSnBSy+9hG3btnnGfvXVV1i9enW75zc2NmL58uWIjIxEaWkpXn75ZX/KIQoZ+0/V4kxlE0YM1OOmvjqxyyGiACYRBEEQu4jO8nWKhtM53rE33ondG6vNiVfW7ofF5sRb/zUaCQF0qVKxexPI2Bvv2BvvRJ9aJ6Kut+27MjSZ7bh3THpAhTgRBSYGOVEAqaoz4ZvDFdDHReKeMWlil0NEQYBBThQgnC431m07BZdbwKN3DuTiL0TkEwY5UYDYsrcUFy6ZMC6nN27un9DxE4iIwCAnCginK4z4x/5y6OMiMX3iALHLIaIgwiAnEpnV5sQfvzwFSIBn7suGKoLLFBOR7xjkRCISBAEbvzmN+qZW3Jubjv59AmMZViIKHgxyIhHtPFKJfSdrkNErGg/8NLPjJxAR/QiDnEgkhWUN+HTnWcSolZg1dSjkMv44ElHn8TcHkQhqGyx4b/NJSKXArKlDER8TKXZJRBSkGOREPczS6sTvvzgOi82JJycNQv8U7hcnohvHICfqQa12J1Z+fgwXDRbcPSoVY3N6i10SEQU5BjlRD7HZXVj5+XGcrWrCrYMT8cjt/cUuiYhCAIOcqAfYHS78/ovjOF1hxC1Zejxz/xBIpRKxyyKiEMCVJ4i6WavdiT9sPomi8kYMH5CAZx/IhkzKv6GJqGswyIm6UWOLDb/7/BguXDIhp58Ov3jwJp5mRkRdikFO1E0u1Lbgd389jsYWG8YPS8Zjdw1kiBNRl2OQE3WDw8WXsH57EewOFx65vT8m3ZoKiYT7xImo6zHIibqQze7Cn3eexp5jF6GUS/HcQzdh5KBEscsiohDGICfqIuU1LXj/b4WobbAgLVGDmQ9kIzlBLXZZRBTiGOREfrLanNj6bSl2HqmEyy3g7lGpmDa+HxRy7g8nou7HICe6QYIgYH9hLf7y77NoMtuhj4vEE5OycFOmTuzSiCiMMMiJOkkQBBw7a8DfvitFWU0LFHIpHhqXiXtGp0Ehl4ldHhGFGQY5kY/cbgFHz9Zj23dlKK9tAQCMHJSIRyb0Q0KcSuTqiChcMciJOtBktuPb49XYVVANQ3MrJABuHZyI+36SgT56jdjlEVGYY5ATXYPN7sKxc/U4VHQJR8/Ww+UWoFRIcdvNybhrVCpSeDQ6EQUIBjnRZU0mG06WNqC4shiHTtXA7nADAFIS1JgwPAW52b0QFckfGSIKLPytRGGr2WzHmcomnK0yoqisERcumTyPJWlVGDU4CbcOSkSKXs1V2YgoYDHIKeQJgoAmsx2Vl0wor21BxSUTympacKnR6hkjl0mQnaFFdqYOt92SCpUMDG8iCgoMcgoJTpcbTSY76pusqG9qRZ3RijpjK2oazKhpsMBqc7UbHxUhx9C+OvTvE4sBKbHITI5BhKLt1DG9Php1dS1ifAwiok7zK8jdbjf+8pe/4He/+x02bNiAgQMHXnPc1q1bUVRUBKlUirS0NEyfPh0AUFlZidWrVyM9PR1VVVWYP38+1GoeRBTu3G4BVrsTVpsTVpsLllYHzK1OmK1t/7ZY7Gi22NFsdqDJbIOxxYYWiwPCNV5LLpMgSRuFXhlRSElQIy0pGmmJGuhiI7nFTUQhwa8gLy4uxs033wyVyvs5tDU1Nfjggw+wZcsWSCQSTJs2DWPGjEFGRgYWLVqEF154ATk5Ofj444+xbt06vPjiix2+b4vF7lN9SpPN57E/dK1AuJHBVz0kCF4fE673Oj98nnDl+Z4vIFx5vcsPCj+8/8p9QtvXVx4zOdxoaDBDEAD35TFuQfCMc7sFuK98LQhwu3/4ddtjLrcbbrcA1+X/3G4BLpcAp9sNp0uAy9X2r9Pl9vzncLphd17+1+GC/fK/NocLrXYXbPa2+3yllEsRFx2B3jo1tNERiI+JREJcJPSxKiTERSIhNhIyKZdKJaLQ5VeQDxkypMMxe/fuRXZ2tmfrZ/jw4dizZw9SUlJw4MABDB06FAAwYsQIvPrqqz4F+Qu//9afsimAyGVSRCikUCpkUEcqoIuJRIRCBlWE/PJ/MkRFyqGJVECtUkAdqUB0lALRaiViohSIUMi4ZU1EYa3DIJ8xYwbq6+uvun/OnDmYOHFih2/Q0NDQbrpcrVbDYDCgsbERkZHfT29qNBoYDAafit624kGfxhHdKL0+WuwSAhZ74x174x17452/vekwyNevX+/XG8THx6O8vNxz22w2Iy0tDVqtFq2trRAEARKJBCaTCTodLzZBRETUGd2y89DtdqO6uhoAMG7cOBQWFnr21RYUFOC2226DQqHA6NGjceLECQBAfn4+xo8f3x3lEBERhSyJIFzvMKvra2pqwsaNG/Hhhx/iwQcfxH333Ydhw4ahqKgIL730ErZt2wag7aj1kydPQiaTISMjo91R63/4wx+QmpqKixcvYsGCBTxqnYiIqBP8CnIiIiISF8/LISIiCmIMciIioiAW8ku02u12fPjhh1CpVDh79iy0Wi1+9atfiV1WQFm9ejU2bNiAAwcOiF1KwFiyZAlUKhWioqJQXFyMV155BXq9XuyyRLNv3z7s2LEDOp0OEokEs2bNErukgHHhwgWsXLkSQ4YMQU1NDeLi4tifH2htbcXDDz+MsWPHYv78+WKXE1DOnz+P7du3IyIiAocOHcLs2bORk5PT6dcJ+SBft24dbr31VowaNQpA22p09L0DBw6gublZ7DICjkql8vzBt3btWrz//vt47bXXRK5KHFarFYsWLcL27
duhVCoxe/Zs5OXlITc3V+zSAoLRaMTkyZNx5513AgAmT56MCRMm4KabbhK5ssBw5Y8cas/lcuE3v/kN3n//fUilUjz00EOQy28skkM+yL/88kskJyejsLAQRqMRTzzxhNglBYz6+nps374dM2fOxObNm8UuJ6D8cNZGEARERUWJWI24jh49iuTkZCiVSgBtqzDu2rWLQX7Zj7eg3G73dZetDidbtmzBiBEjUFJSAovFInY5AeXEiRMQBAEff/wxWltbERcXh0ceeeSGXiskgvx6q89VVVVBIpHgqaeewr59+/Diiy/i448/FqFKcVyvNzt37sT8+fPR0hKeV/ryZdXC5uZmfPvtt3j33Xd7uryAYTAY2p0W2plVGMPNN998g7Fjx6Jfv35ilyK6s2fP4vz585g7dy5KSkrELifgVFdX4+jRo3jnnXcQHR2NefPmQaFQYOrUqZ1+rZAI8uutPqfRaDx/Md9yyy04fPgwXC4XZDJZT5UnKm+9OXHiBORyOT777DM0NTXBZrNh7dq1uPvuu5GRkdGzRYqko1ULW1pa8Otf/xpLlixBXFxcD1UVeHQ6Hcxms+c2V2G8tv379+PAgQN45ZVXxC4lIHzzzTdQKpVYu3Ytjhw5AofDgY8++ghPPfWU2KUFBLVajb59+yI6um151ltuuQUHDx4M3yC/ntzcXFRUVKBv376oqqpCWlpa2IT49QwdOtRzwZrKykr89a9/xcyZM0WuKnA0NDRgyZIleOmll5CUlISvv/4akyZNErssUQwbNgzV1dWw2+1QKpXIz8/Ho48+KnZZAWXXrl04fPgwFi5ciEuXLqG6uhrDhw8XuyxRPffcc56vbTYbLBYLQ/wHbr75ZhiNRs+GZXV19Q1vRIX8gjC1tbX4/e9/j7S0NJw7dw6PP/74DR0VGKrKy8vx6aef4s9//jNmzpyJp556Kqz3B18xZcoUOJ1Oz5a4Wq3G+++/L3JV4vnuu+/w9ddfQ6vVQqFQ8KjsHzh58iSeeOIJz8FtFosFjz322A1tWYWir7/+Ghs3boTD4cBjjz2G++67T+ySAsY333yD/fv3Q6vV4uLFi3jttdcQGRnZ6dcJ+SAnIiIKZVwQhoiIKIgxyImIiIIYg5yIiCiIMciJiIiCGIOciIgoiDHIiYiIghiDnIiIKIiF/MpuROS/TZs2YefOnUhKSkJTUxP+/ve/Y+vWrRg4cKDYpRGFPS4IQ0QdKigoQFxcHDIzMzF79mykpaXhf/7nf8Qui4jAICeiTvj888/xySef4LPPPvNc1pSIxMWpdSLySWlpKVasWIGNGzcyxIkCCA92I6IOORwO/Pd//zdmz56Nfv364dy5czh8+LDYZRERuEVORD746KOPcO7cOZw7dw5vvPEGamtrMXHiRIwcOVLs0ojCHveRExERBTFOrRMREQUxBjkREVEQY5ATEREFMQY5ERFREGOQExERBTEGORERURBjkBMREQUxBjkREVEQ+/+RA+hyZdAJPAAAAABJRU5ErkJggg==\n", - "text/plain": [ - "
    " - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], + "id": "b8d1371c", + "metadata": { + "editable": true + }, "source": [ ->>>>>>> 9b0e2e75096cc1acee65bfac25f4eff818140252 - "\"\"\"The sigmoid function (or the logistic curve) is a\n", - "function that takes any real number, z, and outputs a number (0,1).\n", - "It is useful in neural networks for assigning weights on a relative scale.\n", - "The value z is the weighted sum of parameters involved in the learning algorithm.\"\"\"\n", - "\n", - "import numpy\n", - "import matplotlib.pyplot as plt\n", - "import math as mt\n", - "\n", - "z = numpy.arange(-5, 5, .1)\n", - "sigma_fn = numpy.vectorize(lambda z: 1/(1+numpy.exp(-z)))\n", - "sigma = sigma_fn(z)\n", - "\n", - "fig = plt.figure()\n", - "ax = fig.add_subplot(111)\n", - "ax.plot(z, sigma)\n", - "ax.set_ylim([-0.1, 1.1])\n", - "ax.set_xlim([-5,5])\n", - "ax.grid(True)\n", - "ax.set_xlabel('z')\n", - "ax.set_title('sigmoid function')\n", - "\n", - "plt.show()\n", - "\n", - "\"\"\"Step Function\"\"\"\n", - "z = numpy.arange(-5, 5, .02)\n", - "step_fn = numpy.vectorize(lambda z: 1.0 if z >= 0.0 else 0.0)\n", - "step = step_fn(z)\n", - "\n", - "fig = plt.figure()\n", - "ax = fig.add_subplot(111)\n", - "ax.plot(z, step)\n", - "ax.set_ylim([-0.5, 1.5])\n", - "ax.set_xlim([-5,5])\n", - "ax.grid(True)\n", - "ax.set_xlabel('z')\n", - "ax.set_title('step function')\n", - "\n", - "plt.show()\n", - "\n", - "\"\"\"tanh Function\"\"\"\n", - "z = numpy.arange(-2*mt.pi, 2*mt.pi, 0.1)\n", - "t = numpy.tanh(z)\n", - "\n", - "fig = plt.figure()\n", - "ax = fig.add_subplot(111)\n", - "ax.plot(z, t)\n", - "ax.set_ylim([-1.0, 1.0])\n", - "ax.set_xlim([-2*mt.pi,2*mt.pi])\n", - "ax.grid(True)\n", - "ax.set_xlabel('z')\n", - "ax.set_title('tanh function')\n", + "where $\\epsilon$ is normally distributed with mean zero and standard deviation $\\sigma^2$.\n", "\n", - "plt.show()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Two parameters\n", + "In our derivation of the ordinary least squares method we defined then\n", + "an approximation to the function $f$ in terms of the parameters\n", + "$\\boldsymbol{\\theta}$ and the design matrix $\\boldsymbol{X}$ which embody our model,\n", + "that is $\\boldsymbol{\\tilde{y}}=\\boldsymbol{X}\\boldsymbol{\\theta}$. \n", "\n", - "We assume now that we have two classes with $y_i$ either $0$ or $1$. Furthermore we assume also that we have only two parameters $\\beta$ in our fitting of the Sigmoid function, that is we define probabilities" + "Thereafter we found the parameters $\\boldsymbol{\\theta}$ by optimizing the means squared error via the so-called cost function" ] }, { "cell_type": "markdown", - "metadata": {}, + "id": "c95f3051", + "metadata": { + "editable": true + }, "source": [ "$$\n", - "\\begin{align*}\n", - "p(y_i=1|x_i,\\hat{\\beta}) &= \\frac{\\exp{(\\beta_0+\\beta_1x_i)}}{1+\\exp{(\\beta_0+\\beta_1x_i)}},\\nonumber\\\\\n", - "p(y_i=0|x_i,\\hat{\\beta}) &= 1 - p(y_i=1|x_i,\\hat{\\beta}),\n", - "\\end{align*}\n", + "C(\\boldsymbol{X},\\boldsymbol{\\theta}) =\\frac{1}{n}\\sum_{i=0}^{n-1}(y_i-\\tilde{y}_i)^2=\\mathbb{E}\\left[(\\boldsymbol{y}-\\boldsymbol{\\tilde{y}})^2\\right].\n", "$$" ] }, { "cell_type": "markdown", - "metadata": {}, + "id": "107fab0a", + "metadata": { + "editable": true + }, "source": [ - "where $\\hat{\\beta}$ are the weights we wish to extract from data, in our case $\\beta_0$ and $\\beta_1$. 
\n", - "\n", - "Note that we used" + "We can rewrite this as" ] }, { "cell_type": "markdown", - "metadata": {}, + "id": "d56b4bd7", + "metadata": { + "editable": true + }, "source": [ "$$\n", - "p(y_i=0\\vert x_i, \\hat{\\beta}) = 1-p(y_i=1\\vert x_i, \\hat{\\beta}).\n", + "\\mathbb{E}\\left[(\\boldsymbol{y}-\\boldsymbol{\\tilde{y}})^2\\right]=\\frac{1}{n}\\sum_i(f_i-\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right])^2+\\frac{1}{n}\\sum_i(\\tilde{y}_i-\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right])^2+\\sigma^2.\n", "$$" ] }, { "cell_type": "markdown", - "metadata": {}, + "id": "4712d813", + "metadata": { + "editable": true + }, "source": [ - "\n", - "## Maximum likelihood\n", + "The three terms represent the square of the bias of the learning\n", + "method, which can be thought of as the error caused by the simplifying\n", + "assumptions built into the method. The second term represents the\n", + "variance of the chosen model and finally the last terms is variance of\n", + "the error $\\boldsymbol{\\epsilon}$.\n", "\n", - "In order to define the total likelihood for all possible outcomes from a \n", - "dataset $\\mathcal{D}=\\{(y_i,x_i)\\}$, with the binary labels\n", - "$y_i\\in\\{0,1\\}$ and where the data points are drawn independently, we use the so-called [Maximum Likelihood Estimation](https://en.wikipedia.org/wiki/Maximum_likelihood_estimation) (MLE) principle. \n", - "We aim thus at maximizing \n", - "the probability of seeing the observed data. We can then approximate the \n", - "likelihood in terms of the product of the individual probabilities of a specific outcome $y_i$, that is" + "To derive this equation, we need to recall that the variance of $\\boldsymbol{y}$ and $\\boldsymbol{\\epsilon}$ are both equal to $\\sigma^2$. The mean value of $\\boldsymbol{\\epsilon}$ is by definition equal to zero. 
Furthermore, the function $f$ is not a stochastics variable, idem for $\\boldsymbol{\\tilde{y}}$.\n", + "We use a more compact notation in terms of the expectation value" ] }, { "cell_type": "markdown", - "metadata": {}, + "id": "43a58a59", + "metadata": { + "editable": true + }, "source": [ "$$\n", - "\\begin{align*}\n", - "P(\\mathcal{D}|\\hat{\\beta})& = \\prod_{i=1}^n \\left[p(y_i=1|x_i,\\hat{\\beta})\\right]^{y_i}\\left[1-p(y_i=1|x_i,\\hat{\\beta}))\\right]^{1-y_i}\\nonumber \\\\\n", - "\\end{align*}\n", + "\\mathbb{E}\\left[(\\boldsymbol{y}-\\boldsymbol{\\tilde{y}})^2\\right]=\\mathbb{E}\\left[(\\boldsymbol{f}+\\boldsymbol{\\epsilon}-\\boldsymbol{\\tilde{y}})^2\\right],\n", "$$" ] }, { "cell_type": "markdown", - "metadata": {}, + "id": "6333f694", + "metadata": { + "editable": true + }, "source": [ - "from which we obtain the log-likelihood and our **cost/loss** function" + "and adding and subtracting $\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right]$ we get" ] }, { "cell_type": "markdown", - "metadata": {}, + "id": "24e27c2b", + "metadata": { + "editable": true + }, "source": [ "$$\n", - "\\mathcal{C}(\\hat{\\beta}) = \\sum_{i=1}^n \\left( y_i\\log{p(y_i=1|x_i,\\hat{\\beta})} + (1-y_i)\\log\\left[1-p(y_i=1|x_i,\\hat{\\beta}))\\right]\\right).\n", + "\\mathbb{E}\\left[(\\boldsymbol{y}-\\boldsymbol{\\tilde{y}})^2\\right]=\\mathbb{E}\\left[(\\boldsymbol{f}+\\boldsymbol{\\epsilon}-\\boldsymbol{\\tilde{y}}+\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right]-\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right])^2\\right],\n", "$$" ] }, { "cell_type": "markdown", - "metadata": {}, + "id": "0462c197", + "metadata": { + "editable": true + }, "source": [ - "## The cost function rewritten\n", - "\n", - "Reordering the logarithms, we can rewrite the **cost/loss** function as" + "which, using the abovementioned expectation values can be rewritten as" ] }, { "cell_type": "markdown", - "metadata": {}, + "id": "965cd453", + "metadata": { + "editable": true + }, "source": [ "$$\n", - "\\mathcal{C}(\\hat{\\beta}) = \\sum_{i=1}^n \\left(y_i(\\beta_0+\\beta_1x_i) -\\log{(1+\\exp{(\\beta_0+\\beta_1x_i)})}\\right).\n", + "\\mathbb{E}\\left[(\\boldsymbol{y}-\\boldsymbol{\\tilde{y}})^2\\right]=\\mathbb{E}\\left[(\\boldsymbol{y}-\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right])^2\\right]+\\mathrm{Var}\\left[\\boldsymbol{\\tilde{y}}\\right]+\\sigma^2,\n", "$$" ] }, { "cell_type": "markdown", - "metadata": {}, - "source": [ - "The maximum likelihood estimator is defined as the set of parameters that maximize the log-likelihood where we maximize with respect to $\\beta$.\n", - "Since the cost (error) function is just the negative log-likelihood, for logistic regression we have that" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, + "id": "4426c74e", + "metadata": { + "editable": true + }, "source": [ - "$$\n", - "\\mathcal{C}(\\hat{\\beta})=-\\sum_{i=1}^n \\left(y_i(\\beta_0+\\beta_1x_i) -\\log{(1+\\exp{(\\beta_0+\\beta_1x_i)})}\\right).\n", - "$$" + "that is the rewriting in terms of the so-called bias, the variance of the model $\\boldsymbol{\\tilde{y}}$ and the variance of $\\boldsymbol{\\epsilon}$." ] }, { "cell_type": "markdown", - "metadata": {}, + "id": "d68ec470", + "metadata": { + "editable": true + }, "source": [ - "This equation is known in statistics as the **cross entropy**. 
Finally, we note that just as in linear regression, \n", - "in practice we often supplement the cross-entropy with additional regularization terms, usually $L_1$ and $L_2$ regularization as we did for Ridge and Lasso regression.\n", + "## A way to Read the Bias-Variance Tradeoff\n", "\n", - "## Minimizing the cross entropy\n", + "\n", + "\n", "\n", - "The cross entropy is a convex function of the weights $\\hat{\\beta}$ and,\n", - "therefore, any local minimizer is a global minimizer. \n", - "\n", - "\n", - "Minimizing this\n", - "cost function with respect to the two parameters $\\beta_0$ and $\\beta_1$ we obtain" + "

    Figure 1:\n",
Our ratio between likelihoods is then with $p$ predictors" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "$$\n", - "\\log{ \\frac{p(\\hat{\\beta}\\hat{x})}{1-p(\\hat{\\beta}\\hat{x})}} = \\beta_0+\\beta_1x_1+\\beta_2x_2+\\dots+\\beta_px_p.\n", - "$$" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Here we defined $\\hat{x}=[1,x_1,x_2,\\dots,x_p]$ and $\\hat{\\beta}=[\\beta_0, \\beta_1, \\dots, \\beta_p]$ leading to" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "$$\n", - "p(\\hat{\\beta}\\hat{x})=\\frac{ \\exp{(\\beta_0+\\beta_1x_1+\\beta_2x_2+\\dots+\\beta_px_p)}}{1+\\exp{(\\beta_0+\\beta_1x_1+\\beta_2x_2+\\dots+\\beta_px_p)}}.\n", - "$$" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Including more classes\n", + "n = 500\n", + "n_boostraps = 100\n", + "degree = 18 # A quite high value, just to show.\n", + "noise = 0.1\n", "\n", - "Till now we have mainly focused on two classes, the so-called binary\n", - "system. Suppose we wish to extend to $K$ classes. Let us for the sake\n", -<<<<<<< HEAD - "of simplicity assume we have only two predictors. We have then\n", - "following model" -======= - "of simplicity assume we have only two predictors. We have then following model" ->>>>>>> 9b0e2e75096cc1acee65bfac25f4eff818140252 - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ -<<<<<<< HEAD - "1\n", - "5\n", - " \n", - "<\n", - "<\n", - "<\n", - "!\n", - "!\n", - "M\n", - "A\n", - "T\n", - "H\n", - "_\n", - "B\n", - "L\n", - "O\n", - "C\n", - "K" -======= - "$$\n", - "\\log{\\frac{p(C=1\\vert x)}{p(K\\vert x)}} = \\beta_{10}+\\beta_{11}x_1,\n", - "$$" + "# Make data set.\n", + "x = np.linspace(-1, 3, n).reshape(-1, 1)\n", + "y = np.exp(-x**2) + 1.5 * np.exp(-(x-2)**2) + np.random.normal(0, 0.1, x.shape)\n", + "\n", + "# Hold out some test data that is never used in training.\n", + "x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)\n", + "\n", + "# Combine x transformation and model into one operation.\n", + "# Not neccesary, but convenient.\n", + "model = make_pipeline(PolynomialFeatures(degree=degree), LinearRegression(fit_intercept=False))\n", + "\n", + "# The following (m x n_bootstraps) matrix holds the column vectors y_pred\n", + "# for each bootstrap iteration.\n", + "y_pred = np.empty((y_test.shape[0], n_boostraps))\n", + "for i in range(n_boostraps):\n", + " x_, y_ = resample(x_train, y_train)\n", + "\n", + " # Evaluate the new model on the same test data each time.\n", + " y_pred[:, i] = model.fit(x_, y_).predict(x_test).ravel()\n", + "\n", + "# Note: Expectations and variances taken w.r.t. different training\n", + "# data sets, hence the axis=1. Subsequent means are taken across the test data\n", + "# set in order to obtain a total value, but before this we have error/bias/variance\n", + "# calculated per data point in the test set.\n", + "# Note 2: The use of keepdims=True is important in the calculation of bias as this \n", + "# maintains the column vector form. 
Dropping this yields very unexpected results.\n", + "error = np.mean( np.mean((y_test - y_pred)**2, axis=1, keepdims=True) )\n", + "bias = np.mean( (y_test - np.mean(y_pred, axis=1, keepdims=True))**2 )\n", + "variance = np.mean( np.var(y_pred, axis=1, keepdims=True) )\n", + "print('Error:', error)\n", + "print('Bias^2:', bias)\n", + "print('Var:', variance)\n", + "print('{} >= {} + {} = {}'.format(error, bias, variance, bias+variance))\n", + "\n", + "plt.plot(x[::5, :], y[::5, :], label='f(x)')\n", + "plt.scatter(x_test, y_test, label='Data points')\n", + "plt.scatter(x_test, np.mean(y_pred, axis=1), label='Pred')\n", + "plt.legend()\n", + "plt.show()" ] }, { "cell_type": "markdown", - "metadata": {}, + "id": "2b502d1d", + "metadata": { + "editable": true + }, "source": [ - "and" ->>>>>>> 9b0e2e75096cc1acee65bfac25f4eff818140252 + "## Understanding what happens" ] }, { - "cell_type": "markdown", - "metadata": {}, + "cell_type": "code", + "execution_count": 4, + "id": "9a5194fb", + "metadata": { + "collapsed": false, + "editable": true, + "jupyter": { + "outputs_hidden": false + } + }, + "outputs": [], "source": [ - "$$\n", - "\\log{\\frac{p(C=2\\vert x)}{p(K\\vert x)}} = \\beta_{20}+\\beta_{21}x_1,\n", - "$$" + "import matplotlib.pyplot as plt\n", + "import numpy as np\n", + "from sklearn.linear_model import LinearRegression, Ridge, Lasso\n", + "from sklearn.preprocessing import PolynomialFeatures\n", + "from sklearn.model_selection import train_test_split\n", + "from sklearn.pipeline import make_pipeline\n", + "from sklearn.utils import resample\n", + "\n", + "np.random.seed(2018)\n", + "\n", + "n = 40\n", + "n_boostraps = 100\n", + "maxdegree = 14\n", + "\n", + "\n", + "# Make data set.\n", + "x = np.linspace(-3, 3, n).reshape(-1, 1)\n", + "y = np.exp(-x**2) + 1.5 * np.exp(-(x-2)**2)+ np.random.normal(0, 0.1, x.shape)\n", + "error = np.zeros(maxdegree)\n", + "bias = np.zeros(maxdegree)\n", + "variance = np.zeros(maxdegree)\n", + "polydegree = np.zeros(maxdegree)\n", + "x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)\n", + "\n", + "for degree in range(maxdegree):\n", + " model = make_pipeline(PolynomialFeatures(degree=degree), LinearRegression(fit_intercept=False))\n", + " y_pred = np.empty((y_test.shape[0], n_boostraps))\n", + " for i in range(n_boostraps):\n", + " x_, y_ = resample(x_train, y_train)\n", + " y_pred[:, i] = model.fit(x_, y_).predict(x_test).ravel()\n", + "\n", + " polydegree[degree] = degree\n", + " error[degree] = np.mean( np.mean((y_test - y_pred)**2, axis=1, keepdims=True) )\n", + " bias[degree] = np.mean( (y_test - np.mean(y_pred, axis=1, keepdims=True))**2 )\n", + " variance[degree] = np.mean( np.var(y_pred, axis=1, keepdims=True) )\n", + " print('Polynomial degree:', degree)\n", + " print('Error:', error[degree])\n", + " print('Bias^2:', bias[degree])\n", + " print('Var:', variance[degree])\n", + " print('{} >= {} + {} = {}'.format(error[degree], bias[degree], variance[degree], bias[degree]+variance[degree]))\n", + "\n", + "plt.plot(polydegree, error, label='Error')\n", + "plt.plot(polydegree, bias, label='bias')\n", + "plt.plot(polydegree, variance, label='Variance')\n", + "plt.legend()\n", + "plt.show()" ] }, { "cell_type": "markdown", - "metadata": {}, + "id": "727c7723", + "metadata": { + "editable": true + }, "source": [ - "and so on till the class $C=K-1$ class" + "## Summing up\n", + "\n", + "The bias-variance tradeoff summarizes the fundamental tension in\n", + "machine learning, particularly supervised learning, between the\n", + 
"complexity of a model and the amount of training data needed to train\n", + "it. Since data is often limited, in practice it is often useful to\n", + "use a less-complex model with higher bias, that is a model whose asymptotic\n", + "performance is worse than another model because it is easier to\n", + "train and less sensitive to sampling noise arising from having a\n", + "finite-sized training dataset (smaller variance). \n", + "\n", + "The above equations tell us that in\n", + "order to minimize the expected test error, we need to select a\n", + "statistical learning method that simultaneously achieves low variance\n", + "and low bias. Note that variance is inherently a nonnegative quantity,\n", + "and squared bias is also nonnegative. Hence, we see that the expected\n", + "test MSE can never lie below $Var(\\epsilon)$, the irreducible error.\n", + "\n", + "What do we mean by the variance and bias of a statistical learning\n", + "method? The variance refers to the amount by which our model would change if we\n", + "estimated it using a different training data set. Since the training\n", + "data are used to fit the statistical learning method, different\n", + "training data sets will result in a different estimate. But ideally the\n", + "estimate for our model should not vary too much between training\n", + "sets. However, if a method has high variance then small changes in\n", + "the training data can result in large changes in the model. In general, more\n", + "flexible statistical methods have higher variance.\n", + "\n", + "You may also find this recent [article](https://www.pnas.org/content/116/32/15849) of interest." ] }, { "cell_type": "markdown", - "metadata": {}, + "id": "7e90566c", + "metadata": { + "editable": true + }, "source": [ - "$$\n", - "\\log{\\frac{p(C=K-1\\vert x)}{p(K\\vert x)}} = \\beta_{(K-1)0}+\\beta_{(K-1)1}x_1,\n", - "$$" + "## Another Example from Scikit-Learn's Repository\n", + "\n", + "This example demonstrates the problems of underfitting and overfitting and\n", + "how we can use linear regression with polynomial features to approximate\n", + "nonlinear functions. The plot shows the function that we want to approximate,\n", + "which is a part of the cosine function. In addition, the samples from the\n", + "real function and the approximations of different models are displayed. The\n", + "models have polynomial features of different degrees. We can see that a\n", + "linear function (polynomial with degree 1) is not sufficient to fit the\n", + "training samples. This is called **underfitting**. A polynomial of degree 4\n", + "approximates the true function almost perfectly. However, for higher degrees\n", + "the model will **overfit** the training data, i.e. it learns the noise of the\n", + "training data.\n", + "We evaluate quantitatively overfitting and underfitting by using\n", + "cross-validation. We calculate the mean squared error (MSE) on the validation\n", + "set, the higher, the less likely the model generalizes correctly from the\n", + "training data." 
] }, { - "cell_type": "markdown", - "metadata": {}, + "cell_type": "code", + "execution_count": 5, + "id": "7c760f15", + "metadata": { + "collapsed": false, + "editable": true, + "jupyter": { + "outputs_hidden": false + } + }, + "outputs": [], "source": [ - "and the model is specified in term of $K-1$ so-called log-odds or\n", - "**logit** transformations.\n", "\n", "\n", - "## More classes\n", + "#print(__doc__)\n", "\n", - "In our discussion of neural networks we will encounter the above again\n", - "in terms of a slightly modified function, the so-called **Softmax** function.\n", + "import numpy as np\n", + "import matplotlib.pyplot as plt\n", + "from sklearn.pipeline import Pipeline\n", + "from sklearn.preprocessing import PolynomialFeatures\n", + "from sklearn.linear_model import LinearRegression\n", + "from sklearn.model_selection import cross_val_score\n", "\n", - "The softmax function is used in various multiclass classification\n", - "methods, such as multinomial logistic regression (also known as\n", - "softmax regression), multiclass linear discriminant analysis, naive\n", - "Bayes classifiers, and artificial neural networks. Specifically, in\n", - "multinomial logistic regression and linear discriminant analysis, the\n", - "input to the function is the result of $K$ distinct linear functions,\n", - "and the predicted probability for the $k$-th class given a sample\n", - "vector $\\hat{x}$ and a weighting vector $\\hat{\\beta}$ is (with two\n", - "predictors):" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "$$\n", - "p(C=k\\vert \\mathbf {x} )=\\frac{\\exp{(\\beta_{k0}+\\beta_{k1}x_1)}}{1+\\sum_{l=1}^{K-1}\\exp{(\\beta_{l0}+\\beta_{l1}x_1)}}.\n", - "$$" + "\n", + "def true_fun(X):\n", + " return np.cos(1.5 * np.pi * X)\n", + "\n", + "np.random.seed(0)\n", + "\n", + "n_samples = 30\n", + "degrees = [1, 4, 15]\n", + "\n", + "X = np.sort(np.random.rand(n_samples))\n", + "y = true_fun(X) + np.random.randn(n_samples) * 0.1\n", + "\n", + "plt.figure(figsize=(14, 5))\n", + "for i in range(len(degrees)):\n", + " ax = plt.subplot(1, len(degrees), i + 1)\n", + " plt.setp(ax, xticks=(), yticks=())\n", + "\n", + " polynomial_features = PolynomialFeatures(degree=degrees[i],\n", + " include_bias=False)\n", + " linear_regression = LinearRegression()\n", + " pipeline = Pipeline([(\"polynomial_features\", polynomial_features),\n", + " (\"linear_regression\", linear_regression)])\n", + " pipeline.fit(X[:, np.newaxis], y)\n", + "\n", + " # Evaluate the models using crossvalidation\n", + " scores = cross_val_score(pipeline, X[:, np.newaxis], y,\n", + " scoring=\"neg_mean_squared_error\", cv=10)\n", + "\n", + " X_test = np.linspace(0, 1, 100)\n", + " plt.plot(X_test, pipeline.predict(X_test[:, np.newaxis]), label=\"Model\")\n", + " plt.plot(X_test, true_fun(X_test), label=\"True function\")\n", + " plt.scatter(X, y, edgecolor='b', s=20, label=\"Samples\")\n", + " plt.xlabel(\"x\")\n", + " plt.ylabel(\"y\")\n", + " plt.xlim((0, 1))\n", + " plt.ylim((-2, 2))\n", + " plt.legend(loc=\"best\")\n", + " plt.title(\"Degree {}\\nMSE = {:.2e}(+/- {:.2e})\".format(\n", + " degrees[i], -scores.mean(), scores.std()))\n", + "plt.show()" ] }, { "cell_type": "markdown", - "metadata": {}, + "id": "2619ab70", + "metadata": { + "editable": true + }, "source": [ - "It is easy to extend to more predictors. 
The final class is" + "## Various steps in cross-validation\n", + "\n", + "When the repetitive splitting of the data set is done randomly,\n", + "samples may accidently end up in a fast majority of the splits in\n", + "either training or test set. Such samples may have an unbalanced\n", + "influence on either model building or prediction evaluation. To avoid\n", + "this $k$-fold cross-validation structures the data splitting. The\n", + "samples are divided into $k$ more or less equally sized exhaustive and\n", + "mutually exclusive subsets. In turn (at each split) one of these\n", + "subsets plays the role of the test set while the union of the\n", + "remaining subsets constitutes the training set. Such a splitting\n", + "warrants a balanced representation of each sample in both training and\n", + "test set over the splits. Still the division into the $k$ subsets\n", + "involves a degree of randomness. This may be fully excluded when\n", + "choosing $k=n$. This particular case is referred to as leave-one-out\n", + "cross-validation (LOOCV)." ] }, { "cell_type": "markdown", - "metadata": {}, + "id": "3e4d0bdb", + "metadata": { + "editable": true + }, "source": [ - "$$\n", - "p(C=K\\vert \\mathbf {x} )=\\frac{1}{1+\\sum_{l=1}^{K-1}\\exp{(\\beta_{l0}+\\beta_{l1}x_1)}},\n", - "$$" + "## Cross-validation in brief\n", + "\n", + "For the various values of $k$\n", + "\n", + "1. shuffle the dataset randomly.\n", + "\n", + "2. Split the dataset into $k$ groups.\n", + "\n", + "3. For each unique group:\n", + "\n", + "a. Decide which group to use as set for test data\n", + "\n", + "b. Take the remaining groups as a training data set\n", + "\n", + "c. Fit a model on the training set and evaluate it on the test set\n", + "\n", + "d. Retain the evaluation score and discard the model\n", + "\n", + "5. Summarize the model using the sample of model evaluation scores" ] }, { "cell_type": "markdown", - "metadata": {}, + "id": "65d5f3f5", + "metadata": { + "editable": true + }, "source": [ - "and they sum to one. Our earlier discussions were all specialized to\n", - "the case with two classes only. It is easy to see from the above that\n", - "what we derived earlier is compatible with these equations.\n", - "\n", - "To find the optimal parameters we would typically use a gradient\n", - "descent method. Newton's method and gradient descent methods are\n", - "discussed in the material on [optimization\n", - "methods](https://compphysics.github.io/MachineLearning/doc/pub/Splines/html/Splines-bs.html).\n", - "\n", -<<<<<<< HEAD - "\n", - "\n", + "## Code Example for Cross-validation and $k$-fold Cross-validation\n", "\n", - "## A simple classification problem" + "The code here uses Ridge regression with cross-validation (CV) resampling and $k$-fold CV in order to fit a specific polynomial." 
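+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "As a complement, the $k$-fold recipe listed above can also be written out by hand. The sketch below uses NumPy only, with a plain least-squares polynomial fit standing in for the model; the choices $k=5$ and polynomial degree 6 simply mirror the Scikit-Learn based code that follows."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import numpy as np\n",
+    "\n",
+    "# A seed just to make the by-hand run reproducible\n",
+    "np.random.seed(3155)\n",
+    "\n",
+    "# Same kind of data as in the Scikit-Learn based code below\n",
+    "n = 100\n",
+    "x = np.random.randn(n)\n",
+    "y = 3*x**2 + np.random.randn(n)\n",
+    "\n",
+    "k = 5\n",
+    "degree = 6\n",
+    "\n",
+    "# Steps 1 and 2 of the recipe: shuffle the data set and split it into k groups\n",
+    "indices = np.random.permutation(n)\n",
+    "folds = np.array_split(indices, k)\n",
+    "\n",
+    "scores = np.zeros(k)\n",
+    "for j in range(k):\n",
+    "    # One group is the test set, the union of the remaining groups the training set\n",
+    "    test_inds = folds[j]\n",
+    "    train_inds = np.concatenate([folds[m] for m in range(k) if m != j])\n",
+    "    # Fit a model (here a plain least-squares polynomial) on the training set\n",
+    "    coeffs = np.polyfit(x[train_inds], y[train_inds], deg=degree)\n",
+    "    y_pred = np.polyval(coeffs, x[test_inds])\n",
+    "    # Retain the evaluation score (MSE on the test fold) and discard the model\n",
+    "    scores[j] = np.mean((y_pred - y[test_inds])**2)\n",
+    "\n",
+    "# Summarize with the mean of the k fold scores\n",
+    "print(np.mean(scores))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The next cell performs the same kind of $k$-fold evaluation with **Scikit-Learn**, both through an explicit loop over `KFold` splits and through `cross_val_score`, now with Ridge regression over a grid of penalty parameters."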
] }, { "cell_type": "code", - "execution_count": 2, - "metadata": {}, + "execution_count": 6, + "id": "66c55986", + "metadata": { + "collapsed": false, + "editable": true, + "jupyter": { + "outputs_hidden": false + } + }, "outputs": [], "source": [ "import numpy as np\n", - "from sklearn import datasets, linear_model\n", "import matplotlib.pyplot as plt\n", + "from sklearn.model_selection import KFold\n", + "from sklearn.linear_model import Ridge\n", + "from sklearn.model_selection import cross_val_score\n", + "from sklearn.preprocessing import PolynomialFeatures\n", "\n", + "# A seed just to ensure that the random numbers are the same for every run.\n", + "# Useful for eventual debugging.\n", + "np.random.seed(3155)\n", "\n", - "def generate_data():\n", - " np.random.seed(0)\n", - " X, y = datasets.make_moons(200, noise=0.20)\n", - " return X, y\n", + "# Generate the data.\n", + "nsamples = 100\n", + "x = np.random.randn(nsamples)\n", + "y = 3*x**2 + np.random.randn(nsamples)\n", "\n", + "## Cross-validation on Ridge regression using KFold only\n", "\n", - "def visualize(X, y, clf):\n", - " plot_decision_boundary(lambda x: clf.predict(x), X, y)\n", + "# Decide degree on polynomial to fit\n", + "poly = PolynomialFeatures(degree = 6)\n", "\n", - "def plot_decision_boundary(pred_func, X, y):\n", - " # Set min and max values and give it some padding\n", - " x_min, x_max = X[:, 0].min() - .5, X[:, 0].max() + .5\n", - " y_min, y_max = X[:, 1].min() - .5, X[:, 1].max() + .5\n", - " h = 0.01\n", - " # Generate a grid of points with distance h between them\n", - " xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))\n", - " # Predict the function value for the whole gid\n", - " Z = pred_func(np.c_[xx.ravel(), yy.ravel()])\n", - " Z = Z.reshape(xx.shape)\n", - " # Plot the contour and training examples\n", - " plt.contourf(xx, yy, Z, cmap=plt.cm.Spectral)\n", - " plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.Spectral)\n", - " plt.show()\n", + "# Decide which values of lambda to use\n", + "nlambdas = 500\n", + "lambdas = np.logspace(-3, 5, nlambdas)\n", "\n", + "# Initialize a KFold instance\n", + "k = 5\n", + "kfold = KFold(n_splits = k)\n", "\n", - "def classify(X, y):\n", - " clf = linear_model.LogisticRegressionCV()\n", - " clf.fit(X, y)\n", - " return clf\n", + "# Perform the cross-validation to estimate MSE\n", + "scores_KFold = np.zeros((nlambdas, k))\n", "\n", + "i = 0\n", + "for lmb in lambdas:\n", + " ridge = Ridge(alpha = lmb)\n", + " j = 0\n", + " for train_inds, test_inds in kfold.split(x):\n", + " xtrain = x[train_inds]\n", + " ytrain = y[train_inds]\n", "\n", - "def main():\n", - " X, y = generate_data()\n", - " # visualize(X, y)\n", - " clf = classify(X, y)\n", - " visualize(X, y, clf)\n", + " xtest = x[test_inds]\n", + " ytest = y[test_inds]\n", "\n", - "if __name__ == \"__main__\":\n", - " main()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Cancer Data again now with Decision Trees and other Methods" -======= - "This will be discussed next week. Before we develop our own codes for logistic regression, we end this lecture by studying the functionality that **Scikit-learn** offers. 
\n", + " Xtrain = poly.fit_transform(xtrain[:, np.newaxis])\n", + " ridge.fit(Xtrain, ytrain[:, np.newaxis])\n", "\n", + " Xtest = poly.fit_transform(xtest[:, np.newaxis])\n", + " ypred = ridge.predict(Xtest)\n", "\n", + " scores_KFold[i,j] = np.sum((ypred - ytest[:, np.newaxis])**2)/np.size(ypred)\n", "\n", + " j += 1\n", + " i += 1\n", "\n", "\n", - "## Wisconsin Cancer Data\n", + "estimated_mse_KFold = np.mean(scores_KFold, axis = 1)\n", "\n", - "We show here how we can use a simple regression case on the breast\n", - "cancer data using Logistic regression as our algorithm for\n", - "classification." ->>>>>>> 9b0e2e75096cc1acee65bfac25f4eff818140252 - ] - }, - { - "cell_type": "code", -<<<<<<< HEAD - "execution_count": 3, - "metadata": {}, - "outputs": [], -======= - "execution_count": 4, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "(426, 30)\n", - "(143, 30)\n", - "Test set accuracy with Logistic Regression: 0.95\n", - "Test set accuracy Logistic Regression with scaled data: 0.96\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "/Users/hjensen/opt/anaconda3/lib/python3.8/site-packages/sklearn/linear_model/_logistic.py:762: ConvergenceWarning: lbfgs failed to converge (status=1):\n", - "STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.\n", - "\n", - "Increase the number of iterations (max_iter) or scale the data as shown in:\n", - " https://scikit-learn.org/stable/modules/preprocessing.html\n", - "Please also refer to the documentation for alternative solver options:\n", - " https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression\n", - " n_iter_i = _check_optimize_result(\n" - ] - } - ], ->>>>>>> 9b0e2e75096cc1acee65bfac25f4eff818140252 - "source": [ - "import matplotlib.pyplot as plt\n", - "import numpy as np\n", - "from sklearn.model_selection import train_test_split \n", - "from sklearn.datasets import load_breast_cancer\n", - "from sklearn.linear_model import LogisticRegression\n", + "## Cross-validation using cross_val_score from sklearn along with KFold\n", + "\n", + "# kfold is an instance initialized above as:\n", + "# kfold = KFold(n_splits = k)\n", + "\n", + "estimated_mse_sklearn = np.zeros(nlambdas)\n", + "i = 0\n", + "for lmb in lambdas:\n", + " ridge = Ridge(alpha = lmb)\n", + "\n", + " X = poly.fit_transform(x[:, np.newaxis])\n", + " estimated_mse_folds = cross_val_score(ridge, X, y[:, np.newaxis], scoring='neg_mean_squared_error', cv=kfold)\n", + "\n", + " # cross_val_score return an array containing the estimated negative mse for every fold.\n", + " # we have to the the mean of every array in order to get an estimate of the mse of the model\n", + " estimated_mse_sklearn[i] = np.mean(-estimated_mse_folds)\n", + "\n", + " i += 1\n", + "\n", + "## Plot and compare the slightly different ways to perform cross-validation\n", + "\n", + "plt.figure()\n", "\n", - "# Load the data\n", - "cancer = load_breast_cancer()\n", + "plt.plot(np.log10(lambdas), estimated_mse_sklearn, label = 'cross_val_score')\n", + "plt.plot(np.log10(lambdas), estimated_mse_KFold, 'r--', label = 'KFold')\n", + "\n", + "plt.xlabel('log10(lambda)')\n", + "plt.ylabel('mse')\n", + "\n", + "plt.legend()\n", "\n", - "X_train, X_test, y_train, y_test = train_test_split(cancer.data,cancer.target,random_state=0)\n", - "print(X_train.shape)\n", - "print(X_test.shape)\n", - "# Logistic Regression\n", - "logreg = LogisticRegression(solver='lbfgs')\n", - "logreg.fit(X_train, y_train)\n", - "print(\"Test set accuracy 
with Logistic Regression: {:.2f}\".format(logreg.score(X_test,y_test)))\n", - "#now scale the data\n", - "from sklearn.preprocessing import StandardScaler\n", - "scaler = StandardScaler()\n", - "scaler.fit(X_train)\n", - "X_train_scaled = scaler.transform(X_train)\n", - "X_test_scaled = scaler.transform(X_test)\n", - "# Logistic Regression\n", - "logreg.fit(X_train_scaled, y_train)\n", - "print(\"Test set accuracy Logistic Regression with scaled data: {:.2f}\".format(logreg.score(X_test_scaled,y_test)))" + "plt.show()" ] }, { "cell_type": "markdown", - "metadata": {}, + "id": "8bd8e7a8", + "metadata": { + "editable": true + }, "source": [ -<<<<<<< HEAD - "## Other measures in classification studies: Cancer Data again" -======= - "## Using the correlation matrix\n", - "\n", - "In addition to the above scores, we could also study the covariance (and the correlation matrix).\n", - "We use **Pandas** to compute the correlation matrix." ->>>>>>> 9b0e2e75096cc1acee65bfac25f4eff818140252 + "## More examples on bootstrap and cross-validation and errors" ] }, { "cell_type": "code", -<<<<<<< HEAD - "execution_count": 4, - "metadata": {}, - "outputs": [], - "source": [ -======= - "execution_count": 5, - "metadata": {}, - "outputs": [ - { - "data": { - "image/png": "iVBORw0KGgoAAAANSUhEUgAAAscAAAWYCAYAAABaiWuCAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAgAElEQVR4nOzdeVxVdf7H8RcgCCKKCyou4JJLariUO2pmmWZlOjVqSpk2JmqOU5ZbpdakYmWGaepkZmaTlYqmllqTy2TgkmCMSCqCqIGEIigg2/f3h3h/kqBgXLjg+/l48Hjcs9zv+XzPOXzv537P955jZ4wxiIiIiIgI9qUdgIiIiIiIrVByLCIiIiKSS8mxiIiIiEguJcciIiIiIrmUHIuIiIiI5FJyLCIiIiKSS8mxyE188MEHdOvWjYULF1rmPfLII8TExJRiVCIiImINSo5FbsLf35/u3bvnmbd69Wq8vb1LKSIREduzcOFCpkyZUixl3XfffYSEhBRLWSJFpeRY5BZUqVKltEMQERERK6hQ2gFI+bFmzRqWLl1KmzZtcHNz4+eff6ZFixaMHz+e+fPnExERwYgRIxg2bBgAmZmZzJ8/n4MHD2JnZ0e3bt0YN24cdnZ2HD16lICAANLT08nMzGTQoEEMHjwYgPHjx7Nz506ef/55QkNDOXr0aJ5yr3Xo0CFeffVVUlJSePLJJ/nuu+84ePAgkZGRLFq0iF27dlGxYkVcXFx4/fXXqV27NgAnT55kypQpZGdn07hxY9LT0y1lzp07l6+++opp06Zx3333MXr0aMLCwoiMjCQ2Npbx48eTkpLCf/7zHwC+++47/vWvf+Hs7Iy9vT0TJkygXbt21j4cIlJO2WJbu2XLFtavX8/ly5fx8/Oja9eu+Pv7c/LkSWbOnElGRgY5OTlMmjSJ9u3bM3fuXD777DM8PT1Zt24dEyZM4OjRo7zwwguEhISQkJDA7NmzqVKlCpMnT+b1118vsJ09c+YMEydOJCwsjDlz5hAUFMT+/fvZtm0bLi4uzJgxg/Pnz5Odnc2zzz7L/fffX6LHS8ogI1KMAgMDTY8ePUxycrK5fPmy6dKli5k+fbrJyckx4eHhpm3btiYzM9MYY8zixYuNn5+fycrKMhkZGWbw4MEmKCjIGGNMaGioCQ0NNcYYk5GRYfr27WtOnDhh2U6vXr3MjBkzjDHGhIWF5Sn3j4KDg02rVq3Mjz/+aIwxZu7cucYYYz755BOTk5NjjDFm7dq1ZtKkSZb3PP7442bJkiXGGGPi4uJMhw4dTGBgoGX58OHDzdq1a40xxsTGxppmzZrl2V6vXr0s0507dzYJCQnGGGO2b9+epxwRkVthi21tYGCgmTx5smU6KyvL9O3b13z55ZfGGGMiIiJMx44dTUpKijHGmKVLl5pBgwaZrKwsM2fOHBMREZFnu8HBwZbpm7WzV5evX7/eGGPM8uXLTXx8vHnmmWfMggULjDHGxMfHm44dO5rY2Ngi7Gm5HWlYhRQ7Hx8f3NzccHJywtvbm+bNm2NnZ0fz5s1JTU0lMTERgPXr1zNw4EAcHBxwdHSkb9++bNy4EQBvb2+++uorhgwZwsiRI0lISODw4cN5tnN1HPAfy82Pi4sLXbt2BWDy5MkAeHp68tRTTzFs2DBWrlzJ//73PwBOnz7NoUOHePTRRwGoXbs27du3v+X9UbVqVb744guSk5MtPc0iIn+WLba11woNDSU2NpYBAwYA0KJFC2rXrs2OHTsAGDVqFAAvvvgilStXpkWLFn96n/Tu3RuAkSNHYozhxx9/5PHHHwegVq1atG/fns2bN//p7Uj5pmEVUuxcXV0trytUqGCZrlDhyumWmZkJQFxcHCtWrGDdunUAXLp0yTKWd+7cuSQnJ7N69WocHBzw8/PLM7QBoHLlygBUrFgxT7n5cXNzyzMdHR3NxIkT+eyzz/Dx8SEkJISpU6cCkJCQAEC1atUs67u7uxdlF+SxYsUKlixZQr9+/bj77rt56aWXaNCgwS2XJyICttnWXis+Ph64kqhelZGRQUpKCgAODg5MnTqVYcOGWZL1P+vatj4uLg640iFiZ2cHwPnz52nWrFmxbEvKLyXHUmo8PT3x9/enX79+AOTk5JCcnAxcGSv85JNP4uDgABS+MS6sw4cP4+rqio+PDwBZWVmWZR4eHgCcO3eOunXrApCUlES9evXyLcvR0RG40ug7OTlZGv6rHBwcmDVrFlOnTiUgIICpU6fy6aefFmt9REQKUlpt
bZ06dXB0dGTVqlWWeampqdjb//9F66+//ponnniCGTNm8Nlnn+VZdq2btbMFbR8gMDCQ6tWrA3D58uU87b1Ifgo1rOLIkSPWjkNuQwMHDmTTpk1kZ2cDVy79LVmyBAAvLy/CwsIAOHv2LJGRkcW6bW9vb5KTkzlx4gQAu3fvtiyrV68ePj4+bNiwAbjS+7F3794Cy6pRowYuLi78+uuvAOzatSvP8jFjxpCdnY2zszM+Pj6W+oqIlISSamtdXV1JS0vDGMO4ceNo06YNnp6ebNu2DbjSCTFu3Diio6MB2LlzJ40aNWLGjBmkpaXxySef5CkrPT2d4OBgVq5cedN2Nj+1a9fG19fX0pYDzJgxQ7eIk5tymDlz5sybrfTMM89Qt25dGjZsaP2IpMz6+uuvWbFiBVFRUbi4uLBz506+++47IiIiaNWqFW+++SZRUVGEhYXxwAMP0KlTJyIjIwkMDGTjxo0kJSUxdepUHB0dufPOO1mzZg3r1q0jMjKS9PR09u3bR5MmTVi4cCFhYWGEh4fj6+vL9OnT85Tr7OxsienYsWO8+uqrnD59mp9++on77rsPZ2dnatWqRVZWFm+99RbBwcE4OTlx4MABoqOjeeCBB+jcuTPLly/niy++4JdffqFu3brs3r0bZ2dnNm3axK5du4iIiKB+/fo0btyYypUrExAQwE8//USLFi344YcfOHLkCP369SMqKorFixezYcMGwsPDmTFjBjVr1izFIyUiZZkttrUA1atXZ9WqVWzcuJFu3bpxzz330L17dxYtWsSXX37JunXrGDBgAD169ODDDz8kICAAT09PWrduzZo1a/juu+84d+4cPXr0ICcnhyVLlnDgwAFGjBiBh4dHge1sly5dGD9+vKUjw8fHx9JT7Ovry+eff86qVatYu3Ytbdq04YknniiNwyZliJ0xxtxspUWLFuHt7c0PP/xAixYtePzxx/OMxxQRERERKQ8KlRxfa8+ePUyePJkuXbowbNgw2rRpY63YRERERERKVKGS49WrV3PXXXexevVqduzYQZ8+fRg0aBBhYWGcOnWKV155pSRiFRERERGxqkIlx23btsXT05MhQ4YwaNAgy61SjDGMHz+eRYsWWT1QERERERFrK9St3B5//PF8e4fPnDlDhw4dij0oEREREZHSUKie4/379xMZGWl5nvqXX37JI488ct0vVa9KSLj5/QfLumrVKnH+fGpph2GTtG/yp/2Sv9tpv3h4uN18pVtUXO1ueTke5aUeUH7qonrYltulHrfS7hbqPsdLly6lfv36lulatWrx5ptvFnlj5UmFCg6lHYLN0r7Jn/ZL/rRfbEt5OR7lpR5QfuqietgW1eMGZRZmpWbNmtGzZ0/LdM+ePQkODi72YG43m6O25Tu/f+M+JRyJiMjtQ22viNxIoZLjuLg4srKy8jyv/eoz06X4FdRwF0QNuoiIiEjxKFRy3Lt3b3r37s2dd96JnZ0dERERvPzyy9aOTUREpNQVpadZvdIiZV+hkuOHHnqI5s2b89NPPwHw0ksv0bhxY6sGJiIiIiJS0gqVHAM0adKEJk2aWKZXrVqFn5+fVYISERERESkNhUqOf/jhB5YuXUpiYiI5OTkYY0hOTlZybCN0GU9ExLapnRYpOwqVHAcEBPDqq6/i5eWFvb09xhiWLVtm7dhEpAQF7Y4q1vIe666hVyIiUvYU6j7HTZs2pVu3bjRo0IB69epRv359xowZY+3YRKQcCw39mfHjR9OrVy8yMzPzLFu8OJABA/ry9ddB+b43LCyUkSOH8fPP+wH48MMl/Pe/O60e87W++OKzEt2eFGxz1Lbr/r4I31TaYYlIGVWonmNPT0+mTp1Ku3btcHJyAmDjxo189NFHVg1O/hxdxhNb1rZte9q1u5v9+4P5+usgBg16AoDz588TEfE/atb04JFHHsv3vW3atKVJk6aW6VGjnsPOzq5E4r7qiy/+zV//+mSJblNERKyvUMnxN998g6+vLwcPHrTM032ORaQ4jBs3jtdem8HDDw/AycmJdeu+YODAJ1i9eiUAAQH/pGZND9LS0qhRoyZDhw7P8/64uDjee+8t7rijGaNGPceZM6d57723ueOOZri6uvLJJx/x979PIicnh6VLF/HXvw7lzJnTxMREM2/eu7i6Vmb9+q84ceI41avXIC7uNyZNmkpqaiozZ07HwcGeJk2a8r///cIDD/Tl0UcH8v3327l4MYXly5fi7d2Q++9/sDR2nYiIWEGhkuOxY8cydOjQPPN27NhhjXikBKhHWWxJ06ZNad3ah40b13HffQ9gb2+Pu7u7ZXnXrr50734vACNGPMmAAQOpVMnVsrxOnTp0734vv/12BrgyJKNPn3707t2HU6diWbfuS/r1exiAb77ZRNOmzRk+fATvvBPAvn0h3Htvbzw8ajFgwCDs7e1ZsOAt9u4NpmtXX4YPf5qlSxcxZsx4zp8/z9//PoZHHx1I794P8MEHgYwa9VzJ7SgRESkRhUqOhw4dSkxMDKdPn6ZTp06cPXuWe++918qhicjt4pln/saLLz5PfHw8w4Y9TVTUMcuyxMTfWbp0EZUquXLp0iUuXLiQJzn+o+joKOrXfwaAunXrXbe8QQMvANzd3UlNTQXA2dmZxYsDqVrVnRMnTtCsWYvr1q9WrZplfREo+tNMRaRsKFRyvG7dOhYvXky9evW45557mDNnDr1792bAgAHWjk9EbgONGjWmTZt2VKhQIU+v8dGjv7J69Sq+/HIDAD/+uOumZTVs2IjY2BiaN2/BmTOnr1ue39jkV16ZzMcf/5s6deqQmnrppusDljv3HD0amSeZFrElBd2FRneTESlYoZLj/fv3s23bNmbNmoWTkxOBgYHMmjVLybFIOVLSH5ZHjhwmLOwgdnbZPPXUaGbM+Cdw5Qd5W7duITHxd06cOE7Dhg2ZO/cNvLwakpBwls2bN9KhQ2eOHz/K1q1bqFu3Hj/+uIuUlBROnIjC338C7747j+PHj1G7dm1LcrtvXzDx8XFs3ryRfv0eJizsIFFRx+jSxZfHHvsL8+cH4OPThvDwQ5w8GU3nzl3ZunULx48f5ciRw0RFHefixYvs2PE9997bm65dfXn//QXk5OQoOb5FGuJVsGuTWlfXily6dBlQUitSEgqVHHt4eGBvn/eub3+cFhEpihYtWhIYuAQPDzcSElIs86tVq8bUqa9Zpvv06Wd5/eST///goY8+Wm15/eabb1lex8ae5JVXXsfd3Z24uDh++OE/AHTo0Jkvv9xoWS8wcInl9Zgx4y2vhw8fYXl9bRwtWrTkoYcesUxPnPhS4SsrIiJlRqGS43PnzvH111+TkpLCoUOH+PHHH0lKSrJ2bOWGxqWJlJy4uN9YvnwpzZo159SpWJ5//h+lHZKUErW9InIrCpUcT5o0iTfffJM9e/awZ88eevTowSuvvGLt2EREiqxDh0506NCptMMQK7B2sqtkWkSgkMlx1apVmTdvXp55GRkZVglIRERERKS0FCo5PnPmzHXzFi5cyJw5c4o9IBERkdtFwb3Vd5RoHCL
y/wqVHD/yyCO4u7tjjCErK4vExETq1Klj7djExhXUqI/w+EsJRyIiIiJSPAqVHL/00ksMGTLEMn327Fm2bt1qtaBsye2UAOq2Sre34h5vqfNGpOQUdD/joq6vW8WJFDI5vjYxBqhVqxZHjhyxSkAicns4fDicxYsDsbMztG17D+fOJXLhwgVmzPgnjo6ORSrr0qWLTJ78Au+/v8xK0YqUrOPZ+yyvHdMqkJmdBUDQ7tKKSOT2UajkeOrUqZbXxhgSEhJ0n2MR+VNatmxNu3Z3Y2eXzciRzwHw/PPPERKyB1/fnkUqy9W1MgsXLrVGmFIKyuNdI46cPG/V8q9Npq/VxKFDkcpRj7JIIZPjuLg4Hn30UeDKo1Rr1qxJp066VdLtojx+UIntycnJITn5Au7u1QgLO8jmzRtp2LARJ0/GMGbM86SkJDN79iw8PetSo0ZNwsPD8PMbSdeuvnzzzSbee+9tvv12BwCrVn1MdPRxvLwa8ssvYTg6OvL3v09i3rzZODjY06RJU/73v1944IG+PProwNKtuIiI2JRCJcdjx46lQ4f8v33GxcXpx3nXUCJZMPVISH4OHTrEqlUfc/RoJHfc0ZRmzVrw178O4F//WomHRy22bPmaTz5ZzoQJL/LII48REvIT48b9nYiI/7FixYd07epLv34Ps3z5lZ7jqKhjbN26mU8//RKA119/lXvu6UidOp4MH/40S5cuYsyY8Zw/f56//32MkmMREcmjUMnxJ598gp2dHcaY65atWLGCxYsXF3tgtu6L8E2k5j7rXqSw9AXhej4+Pvj5jQBg1aoVLFz4LsnJyXz77RYAUlIuYG/vYFnfy8sbAHf3aqSmXrquvBMnTlCvXn3LdN269fIsb9DAC7jymOrU1NRirYuItRU0fKKo6xd1uIXI7aRQyfHp06d57rnnaNq0KQBHjx6ladOmODo6EhMTY9UApez5bOsRLumLg9yCGjVqEhl5BHd3dwYMGESVKlW4cCGJ8PBfCl1Gw4aNOHUq1jJ95szpPAmynZ1dscYscjsryl0yXF0r8kD7ejdfUaSUFSo5vueee1ixYgVVq1YFIDk5mSVLlvDyyy/z+eefWzVAKf+Kqzc1v3IKKqOo2yzqbZLKYk9wSd967ciRw4SFHQRyWLlyOdnZ2URFHWPUqDFcvJjCBx8spHbt2sTHxzF48DDOnUvkxx93kZKSwqlTsWzduoX4+Dj279/LuXOJXLx4kaCgr3jsscfp06cfM2ZM4447mpKRkYGdnR0ZGRls3bqF48ePcuTIYaKijnPx4kV27Piee+/tXaJ1l7KloB/TtfCqVuh1bU1Z6VHW1TYpDYVKjpOTky2JMUCVKlU4f/5KA/DH27yVVRorXLCifDCIFFaLFi0JDFyCh4cbCQkp1y2/664218178823LK9HjXqOUaOes0z36dPP8rp9+3t4+ulRAMyePYu6devj5OTE1Kmv5dn+Qw89Uix1ERGR8qNQyXFCQgIff/wxHTp0wM7OjpCQEM6ePWvt2MTGFZQ0+1Qp4UCKUVF7iK29XfWO3JqvvvqcAwf2kZOTQ82aHrRp07a0Q5JyqKz0EheH4mob1dZJWVCo5Hj27Nm8+eabfPDBBwB07tyZ2bNnWzUwa1EPsXqCpfybNWtOaYcg11C7W34VdXhGwT8oVHIstqNQyXHt2rUJDAy0dixSyspKL0hRejBsqSfY1bViKUQiInJ7K+jLmR5xLwUp1GPuEhMTmTx5MhMnTiQtLY3XXnuNCxcuWDs2EREREZESVaie43nz5nHPPfewd+9eXFxcGDJkCG+99Rb//Oc/rR3fLdNlvLLTE1yQ0ur1LQs0bk/k5sp6GygipaPQwyqeeOIJwsPDAWjZsiVubm5WDUzKroi0YDKzs66bb2u3CCoLbGlYCCj5vt0UtZNBl6nlqvzGFjumFSrlKJR3dqwp9LpF/T2NhmFIoc7UpKQk4P9vnp+amkpsbOyN3lLsbqeTVb0dIlIWWfuKnX5MbH1l5f7HRVHQedO/gO/6RV1fyp9CJcedO3fm4YcfJiMjg9GjRxMeHs4rr7xi7dgKpSwMn1CDfkVRH3takLLcSIuIiIhtszPGmMKseOLECfbs2YMxhq5du9K4ccFfofK7oX9hlYVkF6CSa0VSC/mI5NutJ9jRsQKZmdcPq7C2/JJmW+oFcXWtaHOP1S6uJwL+mW1efQiItZ9CWFx1+jNDSzw8rDccrbja3WvbtqJ+sbd2R0BRyq/kWpGfI+KKZbulrbTa1OJ2o3oUtU0urs6Worh6nv3x87+g87KgOhXXk1sLGlry4r2D853/RwU9gKk03cqQvpvV41ba3UIlx507d2bChAk8+eSTRd6AiIiIiEhZUahbuTVr1uy6xPjcuXNWCUhEREREpLQUKjl+6KGH2LlzJ5mZmZZ5ixcvtlpQIiIiIiKloVDDKlq0aPH/b7CzwxiDnZ0dERERVg1ORERERKQk3fBuFWfOnKFChQr07t2bRYsW5Vn23nvvWTUwEREREZGSdsOe47/85S88/vjj9OzZEwBnZ2eqV69eYsGJiIiIiJSkG445bt26NUOHDmXdunU899xz7N69u6TiEhEREREpcTdMjq8+EW/8+PE0bdqUAQMGWJZlZGRYNzIRERERkRJW6AedX02Ur5o3b57NPCXP2hISEliwYAFHjhxh7dq1wJVHar/zzjs0aNCA6OhoXnjhBWrWrFnKkZa8/PbNwoUL2bt3r2WdMWPG0K1bt9IKscSdPHmSBQsW0LJlS+Li4nB3d2f8+PE6Zyh439zu50xpKS9tW3lph8pL21Fe/s9zcnIYM2YMPj4+ZGZmEhsby+zZs0lPTy9TxwMKrsu//vWvMnVMANLT03niiSfw9fVl8uTJVvn/uGFyvH37do4cOQJAdHQ0Q4YMsSw7derUbZMcHzhwgN69e+e5O8f8+fPp0qULDz30EP/5z38ICAjgrbfeKsUoS0d++wZg1apVpRRR6UtKSuKhhx7i/vvvB67cCvHee+/liy++uO3PmYL2Ddze50xpKS9tW3lph8pL21Ge/s/btm3L2LFjAfD392fbtm3s37+/TB2Pq/KrC5S9Y3L1i9dV1mizbpgcN2nShIEDB+a7bOPGjX9qw2VJ3759CQkJyTNv586d+Pv7A9C+fXumTJlSGqGVuvz2DcAHH3yAk5MT2dnZ+Pn54eLiUgrRlQ4fH5880zk5Obi4uOicoeB9A7f3OVNaykvbVl7aofLSdpSX/3N7e3tLMpmVlUV8fDyNGjXinXfeKVPHAwquS0xMTJk6JkFBQbRv357IyEhSU1MB67RZN0yOJ06cSPv27fNd1qhRoz+98bIsMTERV1dXACpXrsyFCxfIysqiQoVCj1Qpt/r27Uu9evWoVKkSq1ev5o033mD27NmlHVap2L59O76+vjRp0kTnzB9cu290ztiO8nKelvVzqry0HeXh/3z37t18/PHH3Hvvvdx1111l+nj8sS7Ozs5l5pgcO3aMqKgoXnjhBSIjIy3zrXE8bviDvIISY7jSPX87q1GjBpcuXQLg4sWLVK
1atUz8Y5SEpk2bUqlSJQA6d+5McHBwKUdUOoKDgwkJCWHatGmAzplr/XHf6JyxHeXlPC3L51R5aTvKy/959+7dWb58OadOnWL16tVl9njA9XUpS8dk+/btODk5sWzZMg4cOMChQ4f4+OOPrXI8CvX4aLlez549OXjwIAA///yz5V7QAgEBAZbXMTExeHt7l2I0pWPHjh3897//Zfr06SQkJHDw4EGdM7ny2zc6Z2xHeTlPy+o5VV7ajvLwf37s2DF27Nhhma5fvz6nTp0qk8ejoLqUpWPi7+/P+PHjGT16NHfffTc+Pj6MGDHCKsejUI+Pvt3t3buXoKAgdu/ezdChQxk5ciTp6em8/fbb1K1bl9jYWF588UWb/7WqNeS3bxYtWkRaWho1atTg119/ZcKECbfVMJzw8HD8/Pxo3bo1AKmpqQwbNoz77rvvtj9nCto3J06cuK3PmdJSXtq28tIOlZe2o7z8n588eZJ58+bRsmVLsrKyOH78OK+88gqOjo5l6nhAwXX55JNPytQxAdi6dSurV68mMzOTYcOG4evrW+zHQ8mxiIiIiEguDasQEREREcml5FhEREREJJeSYxERERGRXEqORURERERyKTkWEREREcml5FgsDhw4gJ+fH127duW1116z/A0aNIhTp07dUpl+fn75Pta1vPnHP/7Brl27AAgJCcHPz6/IZcTFxTFmzJhbeq+IiIgUj7LxSBcpEXfffTcDBw7k888/5/XXX7fMX716NY6OjqUYme2bMmUK1apV+1Nl1KlTh2eeeYb333+/mKISERGRolJyLDe0cOFCBg4cSO3atdmyZQt79uzB3d2d+Ph4Xn75ZTw8PNi8eTPfffcd9erV48yZM4wdO5Y77riDDRs2EB0dzSeffMLWrVsZMGAA7777LnXq1GHu3Ln8+9//5q233mLjxo1Ur16dUaNGER8fz6BBg9i1axcZGRkEBQXxySefEB0dTcWKFUlJSWHq1KmW56hfFRgYyKpVq3jqqac4fPgw0dHRzJw5k2+++YZDhw7RvHlz5syZA1BgvADffvstK1eupHnz5ri5ubF69WpGjx5N48aNCQgIoFOnTqSlpREREcHTTz/N0KFDCQkJ4a233qJnz574+flZ4n399ddp164diYmJBAYGsnHjRuzt7Zk2bZplHwCsWrWKrVu30rhx4+vqVdA+FxERESsxItdYu3at6dixo5k4caKZOHGi6d+/v4mNjTXHjx83Dz30kMnOzjbGGPPFF1+Yl156yRhjzI4dO0xKSooxxpiwsDAzatQoS3nDhw83wcHBecqfPHmyZbpXr14mNjbWGGNMbGysufPOO83hw4eNMcZ8+umnZs+ePebpp5+2rD9//nyzYMGCfGMfPny4ee+99yzxdevWzSQlJZns7GzTs2dPc+LEiRvG+/vvv5t77rnHxMfHW8ro1auXpfzAwEAzfPhwk5OTY6Kioky3bt3yLAsMDDTGGBMcHGyGDx+eJ7Zr63ntPjhy5Ijp2rWrSUtLs9Tv6ntvtM9FRETEOtRzLNfx9vbm3XffBSAoKAhnZ2d27NjB5cuXmTlzJgCXLl0iMzMTuPKM9jlz5uDi4sKlS5eIjo6+5W1Xr16dO++8E4Bhw4YREBDA+fPnee211wBISkq6Yc9pu3btAGjQoAH16tWjatWqlhgTEhJo2LBhgfGGhoZSu3ZtatWqBVwZZvJHbdu2xc7ODi8vL37//WfOfZwAACAASURBVPdbrudVISEhtG7dGmdnZwDuuecefv75ZwD27NlT4D4XERER61ByLDf02GOPAWCMoWHDhnnGIl+6dAmAsWPH8sILL/Dggw9y6tQpnnrqqQLLs7OzIycnxzL9x2TPyckpz7QxhrZt2zJr1izLdFpaWoHlX32/nZ1dnrKu3W5B8RpjsLOzK7Dsa8t3cHDAFPHJ61fXz8rKyjOvoG3eaJ+LiIiIdehuFVIoXbt2JTw8nIsXLwJw+PBhyxjepKQk3N3dAfjtt9/yvM/JyYmcnBzCw8OJjIykZs2aJCQkAHD27FkSExNvuN0ePXoQEhJiSSi/++47Vq5c+afqUlC87dq147fffuPs2bPAlbt33IqKFSuSnZ0NwNq1awHw8PCwlBsREWFZt1OnTvzyyy+kp6dft80b7XMRERGxDoeZV6/Zym3v4MGDlh+TnT59mk6dOlGhwpWLC9WrV8fT05PFixfzyy+/sG/fPiZNmoSLiwu1atVi0aJFHD9+nIiICA4dOoSzszNt27YlIyODL774ggMHDtCvXz/uuOMOtmzZQnBwMBcuXODYsWOcOnWKbt26MW/ePA4fPkx8fDzdunXD3t6eBg0akJWVxcqVKwkLCyM6Oprnn3/eEtdVQUFBfPPNN8TFxdGqVSs++OADIiMjqVGjBpGRkWzZsoW4uDi6dOmCt7d3vvF26dIFLy8v5s6dS0REBJmZmRw/fhw/Pz8OHjzIqlWriImJoVWrVqxbt47g4GBLb/Bnn31GTEwM3t7etGrVig0bNnDgwAHs7e3p2LEj1apVY9GiRURFReHg4EBISAgeHh507twZJycn3n33XcLDw8nMzOTAgQO4uLjQo0ePAve5iIiIWIedKeq1YZFybOfOnfTs2ROAHTt2sH79et57771SjkpERERKisYci1zjP//5D99//z0uLi6cPXuWyZMnl3ZIIiIiUoLUcywiIiIikks/yBMRERERyaXkWEREREQkl5JjEREREZFcSo5FRERERHIpORYRERERyaXkWEREREQkl5JjEREREZFcSo5FRERERHIpORYRERERyaXkWCTXI488QkxMTGmHISIiIqVIj48WyZWcnEyVKlWKvdwpU6ZQr149nn/++WIvW0RERIqXeo5FclkjMRYREZGypUJpByC2bc2aNSxdupQ2bdrg5ubGzz//TIsWLRg/fjzz588nIiKCESNGMGzYMAAyMzOZP38+Bw8exM7Ojm7dujFu3Djs7Ow4evQoAQEBpKenk5mZyaBBgxg8eDAA48ePZ+fOnTz//POEhoZy9OjRPOVe6/vvv+ett96iWrVqNG3alF9//ZWMjAzeeOMNWrVqBcDJkyeZOXMmGRkZ5OTkMGnSJNq3b295b82aNbnrrrv46aefSE5Opk+fPnz11VdMmzaNQYMG5Ynn559/5ujRo0yaNInLly+zbt06zp8/z8KFC2nYsOENt7dy5Up2795NxYoV2bt3L48++ihPPPEE4eHhzJkzBzs7OxwcHHjttddo0qRJnv3t6urKgQMHqFmzJqtWrSqZAy4i5YIttt0A27dvZ8WKFTg4OJCTk8MLL7zA3XffzZkzZ5g4cSJhYWHMmTOHoKAg9u/fz7Zt23BxcWHGjBmcP3+e7Oxsnn32We6///4blifypxiRmwgMDDQ9evQwycnJ5vLly6ZLly5m+vTpJicnx4SHh5u2bduazMxMY4wxixcvNn5+fiYrK8tkZGSYwYMHm6CgIGOMMaGhoSY0NNQYY0xGRobp27evOXHihGU7vXr1MjNmzDDGGBMWFpan3D9au3atufPOO82RI0eMM
cZs3LjR9OrVy2RkZJisrCzTt29f8+WXXxpjjImIiDAdO3Y0KSkplvf6+PiYY8eOGWOMmTt3rjHGmOHDh5u1a9fmieeNN94wxhizfft207lzZ7Nt2zZjjDFvvPGGefXVV40x5qbbmzx5sgkMDLSUm5ycbDp16mT27NljjDHmhx9+MH369DHZ2dmW/d21a1eTmJhosrKyzLx58wp7qERELGyx7Q4KCjLnz583xhgTGxtrevbsaVkWGxtrmjVrZtavX2+MMWb58uUmPj7ePPPMM2bBggXGGGPi4+NNx44dTWxs7E3LE7lVGlYhheLj44ObmxtOTk54e3vTvHlz7OzsaN68OampqSQmJgKwfv16Bg4ciIODA46OjvTt25eNGzcC4O3tzVdffcWQIUMYOXIkCQkJHD58OM92unfvDnBdufm54447aN68OQAPPfQQZ8+eJTQ0lNDQUGJjYxkwYAAALVq0oHbt2uzYscPy3kaNGtGkSRMAJk+eXOA2unbtCkDTpk05d+4cXbp0scR36tQpgEJt71o//PADlSpVspR177338vvvvxMWFmZZp23btlSvXh0HBwdeeumlAuMTEbkRW2u7W7RowdSpUxk6dChTp07lt99+u27d3r17AzBy5EiMMfz44488/vjjANSqVYv27duzefPmQpcnUlQaViGF4urqanldoUIFy3SFCldOoczMTADi4uJYsWIF69atA+DSpUuWsbxz584lOTmZ1atX4+DggJ+fH+np6Xm2U7lyZQAqVqyYp9z8VK1a1fLawcEBNzc3EhISLPNGjhxpeZ2RkUFKSopl2s3NrUj1dnBwyBOfg4ODJbb4+Pibbu9acXFxXLhwAT8/P8u86tWrk5SUVOT4RERuxNbabn9/f4YNG8aoUaOAK8l0WlpannWubf/i4uKAK50YdnZ2AJw/f55mzZoVujyRolJyLMXK09MTf39/+vXrB0BOTg7JyckAHDp0iCeffNKSaN4o8S2Ma5PJrKwsUlJS8PDwsPR8XDtONzU1FXt761woqVOnTpG25+npSZ06dfKsf/HiRZycnKwSn4jIzZRE252YmMjp06ctvcyFKadOnToABAYGUr16dQAuX75MVlbWLZUnUhgaViHFauDAgWzatIns7GzgyqW6JUuWAODl5WUZOnD27FkiIyP/1Laio6MtZWzevJlatWrRtm1b2rRpg6enJ9u2bQOuJM7jxo0jOjr6T22vIDfbnqurK2lpaaSmpvLiiy/Sq1cvkpKSOHToEHAlkX7qqae4ePGiVeITEbmZkmi73d3dqVKliqWs3bt33/Q9tWvXxtfXlw0bNljmzZgxg5CQkFsqT6QwHGbOnDmztIMQ2/X111+zYsUKoqKicHFxYefOnXz33XdERETQqlUr3nzzTaKioggLC+OBBx6gU6dOREZGEhgYyMaNG0lKSmLq1Kk4Ojpy5513smbNGtatW0dkZCTp6ens27ePJk2asHDhQsLCwggPD8fX15fp06fnKdfZ2TlPXBEREZw/f56EhAQ++OADDhw4wLx58/D09MTe3p7u3buzaNEivvzyS9atW8eAAQPo0aMHP/30E/Pnz+fkyZMEBwdbxgnPnTuXXbt2ERERQf369Vm8eLElnk6dOvHSSy8RHx/P4cOH8fLyYu7cuZw8eZKkpCS6d+9e4Pbgyi3iPvzwQzZv3sxjjz3GXXfdRadOnZg3bx7r1q1j48aN+Pv707Jlyzz7OzIykgceeKDEj7mIlH222Hbb29vTuHFjFixYwK5duzDGsH//fsu648ePJz4+nr179+Lj42PpKfb19eXzzz9n1apVrF27ljZt2vDEE0/ctLw/fm6IFJYeAiJl0rp161i/fr1ucSYiIiLFSsMqRERERERyKTmWMuf7779n2bJlRERE8MYbb5R2OCIiIlKOaFiFiIiIiEgu9RyLiIiIiORSciwiIiIikssqDwFJSMj/yWAA1apV4vz5VGtsttgoxuKhGIuHYiwethCjh4f1nnx4o3a3PLKF41laVHfV/XbzZ+p+K+1uiT8hr0IFh5LeZJGVVIybo7blO79/4z43fa/2Y/FQjMVDMcqtKO9toLWo7rcn1b3kaFiFiIiIiEiuEu85FhERuZE/06MsIvJnKTm2QUX9YNAHiYiIiEjx0LAKEREREZFc6jkWEZEy4dqrZJXiK5J66TKgq2QiUrzUcywiIiIikkvJsYiIiIhILg2ruA3pB3wiIiIi+VNyXIYUNN5ORERERIqHhlWIiIiIiORSciwiIiIikkvDKkpAQWN8RURERMS2qOdYRERERCSXkmMRERERkVwaVlGOaTiHiIiISNEoORYRkTJN924XkeKkYRUiIiIiIrnUcywiIsXC1oZyqUdZRG6Feo5FRERERHIpORYRERERyaVhFbdIl+tEREREyh8lx7muTXYrxVck9dJlQMmuiIiIyO1EybHcVHH1kgftjsp3/mPdGxc5JhERERFrUHJ8E7b262sRERERsR4lx8XsdkqmNe5aveFyeyrr7ZzaLhG5ESXHIiIiKGkWkSuUHIvNKo+9suWxTiIiIuWJkmO5qSMnz+c7v4VXtRKO5MbySzyVdIrIn6UvtSK3FyXHcssKSpr7l+HPC30IipQfZeWLvYjYFiXHYvHOjjWlHUKxKyjZvcrVtSKXcu9pXZLbFZGyrziuVukLuYjtKbfJsX5YUbCCelOKy9XGvrCJZ1ETydJIPJXsipR/BbWNTRwKX0ZxJbvWTpqt3aYpuZeyrNwmxyIi8ueU9Vu2FcTaHQSlQV/gRYqPkuNyrDx+AJRXurQqcnPXtmmOjhXIzMwqxWhKR9DuKKsNBxORK5QclwNKgsuvon4Q2tqlWxERkbKmzCTHxTWGuLxeJiwLjmfvK9L6TRw6WCkSEbmW2kXru92GPRS1vgV9If9jOVc7C4rjC7w6B6QgZSY5Lo/+TI/v7XpJUW6sLDT2ZSFGKDtxlke2djWsKF/s9aVepOyzueS4qD0YZaHHw9Yaemu7+kHimFaBzOxbT+AL+kAq6MMnv/X1QXVritrr87dBbawUSfH1QH229Ui+w1OU7Frf7dQGFtcVsqK2f2Vdcdy1yNr/y0X9wlzcdy754xA7W6uvNbdpzc+Y/NgZY0yJblFERERExEbZl3YAIiIiIiK2QsmxiIiIiEguJcciIiIiIrmUHIuIiIiI5FJyLCIiIiKSS8mxiIiIiEguJcciIiIiIrms/hCQ9PR0nnjiCXx9fZk8eTJJSUm88847NGjQgOjoaF544QVq1qxp7TBuKCoqis2bN1OxYkX27dvH888/j5eXl03F+eGHH3L69GmqVatGTEwMb775Junp6aUaY0JCAgsWLODIkSOsXbsW4IbH98MPP+TixYskJyfTrVs3evfuXSoxzp49GxcXFypVqsSRI0eYNm0aHh4eNhXjVYsXL2blypWEhIRY5tlKjBkZGaxYsQIXFxeOHTtGtWrV+Mc//mFTMYaHh7Ns2TJat27NoUOHGDVqFO3atSu1GKXw/vrXv1KxYkUA7O3tWblypc21L8WpuNrTiIgIVq9e
Tf369UlMTGTy5MlUqGBzz/vKI7+6L1y4kL1791rWGTNmDN26dQPKT91PnjzJggULaNmyJXFxcbi7uzN+/Pjb4rgXVHebOe7GyubMmWNefvllM3fuXGOMMa+++qrZvHmzMcaY77//3kyaNMnaIdxQVlaW+dvf/mays7ONMcbEx8ebxMREm4rz7NmzpkOHDpYYx4wZYzZs2FDqMX7zzTfm+++/NwMHDrTMKyim0NBQ8+yzzxpjjMnIyDAPPPCAuXDhQqnEOH/+fMvrpUuXmtdff93mYjTGmODgYDNnzhzTsWNHyzxbivH99983e/futUxHRETYXIyjRo0y27ZtM8YYs23bNjNixIhSjVEKLzAw8Lp5tta+FKfiaE9zcnJM//79zdmzZ40xVz5/v/jiixKuSdHlV/f8jr8x5avuYWFhZvv27Zbpfv36mV9++eW2OO4F1d1WjrtVh1UEBQXRvn176tevb5m3c+dOS89N+/bt2blzpzVDuKlffvkFYwyrVq1i6dKl/PDDD1SrVs2m4nRxccHR0ZGLFy8CkJqaStOmTUs9xr59++Lq6ppnXkEx/fDDD7Rt2xYAR0dHGjduzL59RXvManHFeLV3E8AYQ6VKlWwuxt9//53NmzczfPjwPPNtKcZNmzZx6tQpPv74YxYsWGDpfbelGGvWrMm5c+cAOHfuHK1atSrVGKXwfv31V5YtW8bChQvZsWMHYHvtS3EqjvY0NjaW9PR0y/9iaX92FVZ+dQf44IMPWL58OcuWLSMtLQ0oX3X38fHh/vvvt0zn5OTg4uJyWxz3guoOtnHcrZYcHzt2jKioKPr06ZNnfmJiouWfoHLlyly4cIGsrCxrhXFTZ86cITQ0lEGDBvHcc8+xb98+1q9fb1NxVq5cmZdeeol//OMfTJkyhTp16uDl5WVTMV5VUEznzp3L0/hVrlzZkrSUluTkZP773/8yatQoAJuJMScnh/nz5/Piiy9et8xWYgQ4ffo0dnZ2jBgxgo4dOzJx4kSbi3HixIkEBQUREBDA+vXr6du3r83FKPn729/+xujRoxk7dixLlixh3759Zap9KQ5Fre+161+dn5iYWOJxF4e+ffvy9NNPM2rUKFxdXXnjjTeAgv93y3rdt2/fjq+vL02aNLntjvu1dbeV4261ASnbt2/HycmJZcuWceDAATIzM/n444+pUaMGly5dokqVKly8eJGqVauW6rgYV1dXGjdujJubGwB33303e/futak4IyIiWL58OevXr6dChQrMnTuXRYsW2VSMVxUUU/Xq1bl06ZJlvYsXL1K9evVSizMlJYVZs2Yxe/Zs3N3dAWwmxv/9739UqFCBNWvWcOHCBS5fvsyyZcvo06ePzcQIVxohHx8f4Mr/zf79+8nOzrapGP39/Xnttddo164dkZGRPPPMM/z44482FaPk7+q55eDgwD333ENISEiZaV+KS1Hre3X9a+fXqFGjNEL/05o2bWp53blzZ5YvXw4U3E6X5boHBwcTEhLCtGnTgNvruP+x7rZy3K3Wc+zv78/48eMZPXo0d999Nz4+PowYMYKePXty8OBBAH7++Wd69uxprRAKpU2bNiQlJZGdnQ1c6Ulu2LChTcUZHx+Pu7u7JfH18PAgIyPDpmK8qqCYevXqRWhoKABZWVkcP36cDh06lEqM586dY9asWbz88ss0aNCArVu32lSMd911F6+//jqjR49m6NChVKxYkdGjR9OwYUObiRGgS5cuxMbGAld6kb28vHBwcLCpGH/77TfL5bar/zdgO8da8nf8+HG+/PJLy3RMTAxeXl5lon0pTkWtb4MGDXB2diYhIeG695Q1AQEBltcxMTF4e3sD5a/uO3bs4L///S/Tp08nISGBgwcP3jbHPb+628pxtzPGmD9dyg1s3bqV1atXk5mZybBhw/D19eXtt9+mbt26xMbG8uKLL5b63Sq2b99OcHAw1apV47fffuPVV18lPT3dZuLMzs7mn//8JxUrVsTNzY2jR48ybdo0nJycSjXGvXv3EhQUxO7duxk6dCgjR4684X778MMPSU5O5sKFC/To0aNEfk2eX4xDhw4lKyvL0mPs6urKkiVLbCpGZ2dnYmJi+Pzzz/n3v//N6NGjGTFiBJUqVbKZGC9cuEBgYCBeXl4cP36c4cOHW3r7bCXG3bt3880339C8eXOOHTtGnz59eOCBB0otRimc+Ph4Xn/9dVq2bMnFixfJyspi6tSpJCcn21T7UpyKqz2NiIhg1apV1K1blwsXLtj8XQsg/7ovWrSItLQ0atSowa+//sqECRNo1KgRUH7qHh4ejp+fH61btwau/J5o2LBh3HfffeX+uBdU9xMnTtjEcbd6ciwiIiIiUlboISAiIiIiIrmUHIuIiIiI5FJyLCIiIiKSS8mxiIiIiEguJcciIiIiIrmUHIuIiIiI5FJyLCIiIiKSS8mxiIiIiEguJcciIiIiIrmUHIuIiIiI5FJyLCIiIiKSS8mxiIiIiEguJcciIiIiIrmUHIuIiIiI5FJyLCIiIiKSS8mxiIiIiEguJcciIiIiIrmUHIuIiIiI5FJyLCIiIiKSS8mxiIiIiEguJcciIiIiIrmUHIuIiIiI5FJyLCIiIiKSS8mxiIiIiEguJcciIiIiIrmUHIuIiIiI5FJyLFIEa9as4b777mPKlCmlHYqIiIhYgZJjkRtYuHBhnkR48ODBDBw4sBQjEhGRa/2xnRb5s5Qci4iIiIjksjPGmNIOQkrfmjVrWLp0KW3atMHNzY2ff/6ZFi1aMH78eObPn09ERAQjRoxg2LBhAGRmZjJ//nwOHjyInZ0d3bp1Y9y4cdjZ2XH06FECAgJIT08nMzOTQYMGMXjwYADGjx/Pzp07ef755wkNDeXo0aN5ys0vrrVr1+Li4oKzszMvv/wyLi4uTJw4kbCwMObMmUNQUBCJiYksWLCADRs2sGfPHmrUqMH7779PxYoVAdi1axeLFy/GwcEBZ2dnXnvtNby9vQGIjo7mjTfeID09nezsbMaOHUuPHj3YsmULb7/9NpcvX6Zx48Z07doVf39/Fi5cSFRUFG5ubvzyyy/UrFnTsq2ZM2eyadMmhg8fTlRUFJGRkTz44IO88MILljp9+OGHbNu2jQoVKnDnnXcyefJknJycOH78OLNmzQIgKyuLxx9/nEGDBvH7778zZcoULl++TFZWFr169WL06NFWOxdExHpsta29dOkSc+bM4fjx4wA0atSISZMmUb169QLbyEOHDvHqq6+SkpLCk08+yXfffUdWVhYLFixg2bJlHDx4kJYtWxIQEEBGRgajRo1i7969vPDCC4SEhBAfH8+AAQMs7dn+/fsJDAzEGENmZibPPvss999/vyXGjz76iG+//ZaKFSvi7OzMxIkTiYmJua6drl69ep59/Md2GmD37t28//77ODk54erqyqxZs6hduzZpaWlMmTKFxMREsrOz8fHxYerUqeTk5DBr1ix+/fVXHBwc8Pb2Zvr06VSqVMlq54qUIiOSKzAw0PTo0cMkJyeby5cvmy5dupjp06ebnJwcEx4ebtq2bWs
yMzONMcYsXrzY+Pn5maysLJORkWEGDx5sgoKCjDHGhIaGmtDQUGOMMRkZGaZv377mxIkTlu306tXLzJgxwxhjTFhYWJ5yr3Xx4kXTsWNHc/nyZWOMMR9//LFZu3atMcaY2NhY06xZM7N161ZjjDH//Oc/Te/evc3p06dNTk6OefTRR82mTZuMMcacPHnStG3b1kRFRRljjAkKCjIPPvigyczMNJmZmebBBx+0lBsTE2PatWtnYmJiLPtk8uTJ1+0nX19fk5SUZLKzs83DDz9svv76a8vy4cOHm7/97W8mJyfHxMfHm5YtW5q4uDhjjDEbNmwwffv2NampqSYnJ8dMmDDBLFq0yBhjzIQJE8zmzZuNMcacPXvWjBo1yhhjTEBAgFm6dKkxxphLly6ZIUOGFPaQiogNsrW21hhjXnnlFTNlyhRjjDHZ2dnmueeeM8HBwTdtI4ODg02rVq3MwYMHjTHG+Pv7m4EDB1rq1rlzZ8syY4xp1qyZmTdvnjHGmPPnz5tu3bqZ3bt3G2OM2bFjh4mOjjbGGJOSkmJ8fX1NcnKyMcaYjRs3mv79+5vU1FRjjDEffvihCQwMtOzPm7XT/fv3t7TTVz8Tjh8/bowx5tNPPzVPP/205fVrr71mjDEmKyvLDBo0yBLb1TbZGGPGjh1rYmNj892XUvZpWIXk4ePjg5ubG05OTnh7e9O8eXPs7Oxo3rw5qampJCYmArB+/XoGDhyIg4MDjo6O9O3bl40bNwLg7e3NV199xZAhQxg5ciQJCQkcPnw4z3a6d+8OcF2513JwcAAgKCiItLQ0hg0bxsMPP5xnna5duwLQrFkzqlSpQt26dbGzs6Np06bExsYCsGnTJu666y4aNWoEwMMPP8yZM2c4ePAgYWFhnDp1ikcffRQALy8v2rRpY6lLQdq0aUPVqlWxt7enadOmnDp1Ks9yX19f7OzsqFWrFu7u7pw+fdqy3/r374+Liwt2dnY8/PDDbNiwAYCqVavy7bffcurUKTw8PFi4cCEA7u7u7N69m6NHj1KpUiU++uijG8YmIrbPltranJwcgoKCGDRoEAD29vZMmTKFO+64o1BtpKurK23btgWgadOm1KtXz1K3hg0bWtriq/r37w9cadt69OjB5s2bLe997733GDJkCP7+/iQlJXHixAkA1q1bR9++fXFxcQHgr3/9Kw8++OAN93FB7fSmTZto3bo1jRs3Bq58Jvz000+cPXsWd3d3Dhw4QGhoKA4ODnz66acAVKlShV9//ZUff/yRnJwc5s+fT926dW+4fSm7KpR2AGJbXF1dLa8rVKhgma5Q4cqpkpmZCUBcXBwrVqxg3bp1wJVLclWqVAFg7ty5JCcns3r1ahwcHPDz8yM9PT3PdipXrgxgucR1tdxrOTs78+mnn7J06VIWLFjAvffea7nM98dyHBwcrov92livfY+DgwNVqlQhLi7O8vpq/QCqV69OfHz8DffT1e0CODk5XRf/tcsrVqyYJ5avv/6akJAQAC5fvoy9/ZXvqNOmTeOjjz7i6aefplatWkyYMIEuXbowatQoXFxc+Mc//oGDgwNjxoyhX79+N4xPRGybLbW1586dIyMjI0872bBhQwBCQkJu2kYWVJer03/c5tX44UqC/OuvvwIwefJkmjVrxvz58wG47777SEtLs+yHa+Nzc3PDzc3turrkV/er9b92nx4/fhw/Pz/L8nr16pGYmEj//v3Jyspi9uzZJCUlMWLECJ588knatWvHG2+8wb/+9S+mTZvG4MGDee655264fSm7lBzLLfH09MTf39+SpOXk5JCcnAzAoUOHePLJJy09v/k1xoWRmZlJjRo1ePvtt0lJSWHKlCkEBAQQEBBQ5Fiv9j4AZGdnk5ycTJ06dXBwcCA5OZmsrCxL43/u3DlLL3Nx8/T0pGvXrjz77LOWeefOnQMgOTmZsWPH4u/vz4YNG/D392fPnj1cvHgRPz8//Pz82LNnD8899xytWrXCy8vLKjGKBgGNgwAAIABJREFUiO0oiba2evXqODk5ce7cOZo0aQJAfHw89vb21KlTp9jbyAsXLlC/fn0Azp8/j4eHh6U+I0eOtKx3bX08PT0tbSVAamoqcXFxlt7fovD09KR169YsW7YsT0yVK1fm3LlzPPTQQwwYMIDDhw/zzDPP0LhxY1q1akXHjh3p2bMnJ0+e5Nlnn6V27dr85S9/KfL2xfZpWIXckoEDB7Jp0yays7OBK5f+lixZAly57BYWFgbA2bNniYyMvKVtxMfH8+qrrwJXegnuvPNOy/aKon///oSHhxMTEwPAli1bqFu3Lu3ataNNmzZ4eXmxadMmAGJjYwkLC7NcQnR1dSUtLQ1jDOPGjbulelxr4MCBfPvtt1y+fBm40iszY8YMAKZOncrvv/+OnZ0dHTp0ICsrCzs7O8uPdODKpVhHR0eMfkcrclsoibbW3t6exx57zNI7nZOTw/Tp00lISLhpG3krtm7dClxJjHft2mUZZnFtfY4cOUJCQoLlPVfbzqs9yStXrmT37t1A0dvp/v3/j707DYjiStuHf0EDymKCIO6CSiIqERSNGPd11BjX/Ccug0bjjJoEMybRUYMKMeO4xO0BEwc0cTJqMmgQxGWikUR0VHBBUBCJCorg3kIQEJrlvB9s6hWlsaG3arh+n6SsrrrP4XDX3adOdY9CUlKStNxNqVTCz88P5eXl2LlzJ2JjYwE8Wa738ssvo7y8HD///DPCw8OlOJs1a4by8vJa9wHJmyIoKCjI1EGQ6e3btw/btm1Deno6bG1tERsbiyNHjiA1NRWenp5YsWIF0tPTkZSUhGHDhsHX1xdpaWkIDg5GdHQ0cnNzsXjxYlhbW6NTp04IDw/Hnj17kJaWhqKiIpw5cwbu7u4ICQlBUlISkpOT0bdvXwQEBFQ6bsOGDaWYrKyscPr0aWzduhWRkZG4f/8+AgICUFZWBn9/f9y9exeXLl2Cq6srVq1ahczMTBQVFeH69evYvXs3rly5AicnJ7z++uvw9PTEP/7xD0RGRuLq1av48ssv4eTkBEtLS/Tr1w9btmxBeHg4Dh06hCVLlsDb2xvAkxmV7du3Izo6Gn369EF2dnalfrp06VKlc+3duxexsbFSv4WGhuLs2bNITk7G66+/jjfeeAP5+flYs2YN9u/fj9TUVHz++eews7NDeXk5vvzyS0RHR2Pv3r1YuHAhOnfuDCsrK+mTOMLDwzF9+nQMGDDAVEOFiHQgx1wLAL6+vvjf//6HrVu3IiIiAsOHD8eQIUOqzZFXr17F0qVLkZ2djTt37qC8vBxhYWFVts3d3R1t2rTBpk2bMHz4cISEhOC7777DxIkTpbXOHh4e2LJlCw4dOoR79+7h1q1bOHfunJQ7i4qKsHbtWkRFRUGhUOCjjz6CpaVljfP066+/jk6dOuHvf/879u7di0OHDiEgIACtWrVCgwYNEBoaisjISOzcuRODBg3CxIkT0aBBA/zwww/YvXs3du7cCXd3d8yePVuatae6hR/lRkREREbh4eGBmJgYaVkFkRxxWQURERERkRqLYy
9jipahOgv8cJALp27YpZs2YBAGbOnInffvsN586d09vjVFF7AP0+RlIi5Vwl5byjD/lD6rlA6u/tsg8/ZaRUl7wYt7j5mAAAIABJREFUGyCdfuvXrx9mz55dbpm2+65OTqsYMmQILCwsyi2LiopCt27dAADdu3dHVFSULkKrsYraBACbNm3Ctm3bEBgYiLy8PB1EVjPOzs7KkwDw7FN/vXr19Po4qWoToL/HydDQUHmyKS4uRlpaGtq1a6e3x0lVewD9PUZSI+VcJeW8I/X8IfVcIPX3dmhoKLp37w47OzvlMqn0XUWxAdLoNwC4ceMGAgMDsX79ehw/fhyA9vuuThbHFZHL5cqEbWlpiSdPnqC4uFjHUalnyJAhmDhxIqZMmQILCwt8/fXXug6pRn7//Xf07dsX9vb2deY4Pd+munCcTpw4AS8vL7i7u6NLly56f5xebE9dOEZSJsX+lXLekXL+kHoukOJ7+9atW0hMTMS7775bbrkU+k5VbFLotzLTpk3D9OnTMWvWLPzwww84e/as1vvutSmObWxskJOTAwDIzs5Gw4YNYWSk37NK3njjDZibmwMA3n77bURHR+s4ouqLjo5GTEwMFi5cCKBuHKcX21QXjlO/fv2wbds2pKSkYPfu3Xp/nF5sT104RlImtf6Vct6Rev6Qei6Q4nv7999/h4mJCQIDA3H+/HnEx8fjxx9/lETfqYpNCv1WxtnZGQAgk8nQs2dPxMTEaL3vXpvi2M3NDRcvXgQAXLhwAW5ubjqOSH0BAQHKf9+7dw9t2rTRYTTVd/z4cZw8eRJ+fn549OgRLl68qPfHqaI26fNxunXrlvIyFgDY2dkhJSVFb4+Tqvbo8zHSB1LqXynnHSnnD6nnAim/t2fOnAlvb29Mnz4dPXr0gLOzMyZNmiSJvlMVmxT6DQBu376NX375pVwsrVu31nrfGQghhEa3KAGxsbEIDQ3FiRMnMHbsWEyePBn5+flYtWoVWrRogeTkZHzxxRd6cXd9mYratHHjRuTl5cHGxgY3btzAZ599ppxjJXVXrlzBhAkT0LlzZwBAbm4uxo8fj4EDB+rtcVLVpjt37ujtcUpKSsKKFSvg6OiI4uJi3L59G19++SWMjY318jipas+OHTv09hhJjZRzlZTzjtTzh9RzgT68tw8fPozdu3ejqKgI48ePR9++fSXRdxXFlpCQIIl+S0tLw9KlS+Ho6Ijs7GwUFxfD19cXWVlZWu27OlkcExERERHVxGszrYKIiIiI6FVYHBMRERERKbA4JiIiIiJSYHFMRERERKTA4piIiIiISIHFMRERERGRAotjIiIiIiIFFsdERERERAosjomIiIiIFFgcExEREREpsDgmIiIiIlJgcUxEREREpMDimIiIiIhIgcUxEREREZECi2MiIiIiIgUWx0RERERECiyOiYiIiIgUWBwTERERESmwOCYiIiIiUmBxTERERESkwOKYiIiIiEiBxTERERERkQKLYyIiIiIiBRbHREREREQKLI6JiIiIiBRYHBMRERERKbA4JgKQkpKCoUOH6joMIiIi0jEWxyQ5wcHBmDBhQq3u087ODnv37tXKtgcOHIiYmBitbJuISCp0kbuJtIHFMZFCgwYNdB0CERER6ZiBEELoOgiSpgULFiAkJAQ9e/bEjh07kJycDE9PT5w4cQIPHz7EtGnTUFhYiD179sDExATffvst7ty5g9LSUgwePBhTp05FUVERpkyZgtjYWPj7+yMyMhIxMTHYunUr7t69i6CgINSrVw9mZmaYP38+5HI5/P398fjxY3Tq1AkODg5YtGhRubg2bNiAn3/+Ge7u7sjIyEBaWhpsbGywfPlyWFtbAwBOnDiBDRs2wMTEBBYWFliyZAlsbW2Vrx0yZAgyMzNx8eJFuLi44MGDB4iOjsbRo0dhaGiIzz//HHFxcVi2bBlCQ0Mhl8uxdu1ahIWF4fTp07CxscGGDRtgampa6f58fX1x4MABtG/fHg0aNICPjw86d+6M0NBQZb/Z2tpiyZIlsLS0xOLFi3HgwAF4enri1q1bOH/+PMaNG4fZs2fX+vEnIv0k1dwNABs3bsQff/wBU1NT1KtXD0uXLoWtrS2OHj2KlStXonHjxujSpQvOnDmDrKwsHDt2DFeuXMGyZctgYGAAmUwGf39/2NvbV7o9IrUIokq4ubmJCxcuCCGE2L59u+jUqZO4fPmyEEKIwMBA5b99fX2Fj4+PEEKIvLw8MXz4cBESEqLcjoODg1i/fr0QQojw8HARGxsrXFxcREFBgRBCiB9//FEEBQUJIYQICgoSnp6elcbl4+MjBg8eLJ4+fSqEEOLLL78Uc+bMEUIIkZSUJLp27Spu374thBBi165dYuLEieVeO3z4cJGbmyuysrLExo0blTEmJycLIYRITk4WDg4O4vDhw0IIIf71r3+JQYMGifv374vS0lIxcuRIceDAgSrtb8CAASI6Olr587lz54SLi4uQy+VCCCGWL18uFi5cqPy9p6en+PTTT0VxcbG4deuW2L9/f6V9QUT0Iqnm7h07dojS0lLl+nPnzlX+LigoSDg7O4tbt24JIZ7lxqysLOHq6ipOnz4thBAiMjJSvPvuu6KkpOSV2yOqKU6roEr1798fx48fBwBcvnwZ77zzDqKiogAAf/75J5ycnFBaWoqIiAh88MEHAAAzMzMMHToUwcHB5bY1ePBgAMCIESPQpUsXAEBoaCjy8vIwfvx4DB8+vFqxubm5wdLSEgAwatQoHD58GCUlJThw4AA6d+6M9u3bAwCGDx+OM2fO4OHDh8rX9urVC/Xq1UP9+vUxa9Yslfvo3bs3AMDBwQENGjRAixYtYGBggDfeeAPJyckAUKX9PS8kJAQDBw5UjnKPGDECEREREM9dxHFzc4NMJoO9vT0+/PDDavULEZFUc3fz5s3xySefYPz48fjpp59w9erVcr9v166dclTYx8cHkZGRMDc3R69evQAA7u7uePz4MeLi4qq0PaKaMNJ1ACRt7u7uWLt2Lby8vGBubo6ePXti9+7d8PT0RP369WFgYAC5XI7CwkJlsQcA1tbWSEtLK7etskIWeJaEd+3ahc2bN2Pt2rVwd3fH3Llzy23jVRo2bKj8t5WVFYqKipCRkYHU1FTcvn273I0hLVu2hFwuR9OmTQEA9evXr9I+ymKWyWSwsLBQLjcyMkJRUREAVGl/z3tx/eLiYjRu3BgZGRnK9lc1PiKiikgxd9+9exeff/459uzZA2dnZ8TExMDX17fcOi/mvtTUVDx58qRcfrW2tkZmZmaVtkdUEyyOqVK9e/fGP//5TwQHB6NXr17o1asX/Pz8EBYWhv79+wN4lqhMTEyQnp6u/MSfnp5e6byvoqIi2NjYYNWqVXj69CkWLFiAgIAABAQEVDm2J0+eKP+dkZEBY2NjNGrUCM2bN0fnzp0RGBhYbt3nE7wmVXd/zZs3R6tWrfDVV18pl6Wnp1frgwERUWWkmLv//PNPWFhYwNnZGcCzgYFXad68OZo1a4adO3cql2VnZ8PExARHjhyp9vaIqoLTKqh
SZmZmcHFxwffff49+/fqhUaNGykKw7DKXoaEhPDw8lJfi8vPzcejQIYwZM0bldtPS0pQ3a9SvXx+dOnVCSUkJAMDCwgJ5eXkAgNmzZ6tMeCdPnkR2djaAZ5f43nvvPchkMgwbNgxxcXG4f/8+AEAul8PT0xOlpaUa6JGXvWp/FhYWyM/PR3R0NH766SeMHj0aUVFRyuI+MTERM2fO1EpsRPR6kmLubtOmDbKysnDnzh0Az25kfpUBAwYgMzMT8fHxAIDc3Fx88sknyM7OrtH2iKpCtnjx4sW6DoKk7enTp8jIyMDHH38MAHj8+DEKCgrw97//XbmOq6srTp48ia1btyI4OBhDhgzBuHHjYGBggMmTJyM5ORlxcXGws7NDq1atYGRkhNjYWGzduhUhISH4f+zdeWCM59oG8CuZrJJYEqEoitYWYiut2mI7qJ0uVHKq9mrqw2klaUrRfvatVBGctFR7tCKLpdVS0RxEbLFGqC2xRRZEZM/c3x8m7ydkZCKzxvX7y0xmud7nnfee2zPPO5OcnIzAwEC4uLjA3d0doaGhCAsLQ/369eHl5fVEpt27d6NevXrYvn071q1bB7VajdmzZ8PR0RGVKlVCkyZN8NVXXyE8PBy7du1CYGAgatWqheDgYISFheHChQtISkpChw4dAADvv/8+rl27hhMnTqBz586YPHkykpKScPbsWdSpUwfz5s1DQkICsrOzceXKFfzyyy+4cOECXF1d0bZtW63PBwBqtRqrV6/G0aNHMXLkSDRr1gyVK1fG//7v/2Lbtm3Yv38/Zs+ejSpVqmDBggXYt28f4uLikJeXh1atWhl+BxNRuWRutbtatWrIz8/HwoULER0dDTs7Oxw9ehRXrlyBs7MzlixZgoSEBERHR2PgwIEAADs7O7z22mtYsGABtm7dioiICHz44Ydo2rTpUx+vZ8+eRhtnKn/4VW5kkfz9/VGrVi1+xRkRERHpFZdVEBERERFp8IQ8sjjffPMNoqKiYG9vjxdeeIFfdUZERER6w2UVREREREQaXFZBRERERKRhkGUVycn3n+l+VapUwJ07mXpOUzbMpBtm0g0z6aa8ZnJ3N9yPuxTWXXMbO3PLA5hfJnPLA5hfJnPLA5hfJnPLA5hHpmepu2Y1c2xjozJ1hCcwk26YSTfMpBtmenbmltPc8gDml8nc8gDml8nc8gDml8nc8gDmmUkXPCFPz3Zc+v2J6/rW/4cJkhARmYfi6iLA2khE5onNsRHwjYGIiIjIMpjVsgoiIiIiIlNic0xEREREpMHmmIiIiIhIg80xEREREZEGm2MiIiIiIg02x0REREREGmyOiYiIiIg0+D3HRESkF8V9p3uFJHsTJCEienacOSYiIiIi0uDMsQnxl/OIiIiIzAubYzPEppmIiIjINNgcPyNtDSwREemGEwFEZI645piIiIiISIPNMRERERGRBpdVaGj7eG+k+1AjJ9GuMGOFJHtkPshRrudHkERERET6weaYiIhKxdDnXHAtMhGZEpdVEBERERFpsDkmIiIiItLgsooS/Hx6e5H1vURERERUfnHmmIiIiIhI47mbOeaPdxARERGRNs9dc0yGFxZ1qdjrB3Wqb+QkRERERKXDZRVERERERBqcOSYiomKZ2zI0fv8xERkDm+NygG8YRERERPrB5pie2aNri52c7PHADL/yjuufiYiIqDTYHJdj5XVG2ZANL5tpIiKi51u5bY7Nba2cIZ1LuFPK228u9voGqrbFXm+qxlBbo0pEzydtta6vif7vWlyN4n+kiSxfuW2OSX/YpBKRJdJWu8YOaWHkJERkSdgck8mZovlmw09Ufhj6eDanGsWZaSLDY3NMRmPsNzAnJ3uDPXYhvlERkS70VUNYi4gMz2Ka4+dpDTGVXz/uOlfst3rwjY3o/5X2PIqLBYeLvV7beRTajkMiIsCCmmP6/zcMW1sb5OXlmzgNAYafxeEsERkDJx/0p7SfkIVFXdLLV2Hq+5O5xzOx5tDzhM2xCWmbHWlcp4qRkzxU2tkX0k7bG5W2pR7mtga6tE05z9qnpzFVrdNW0zzR0aDPS0SWzeyaY0ufwSjtx4GGegx9Km3TzCbb8PTRvBYy9g+46Cu7oZvv4tawP3iQw6a/HIjLikZewZOfvrFGaaev/8A/S40CdKtTlvKJHT8RNH9WIiKmDkFEREREZA6sTR2AiIiIiMhcsDkmIiIiItJgc0xEREREpMHmmIiIiIhIg80xEREREZEGm2MiIiIiIg02x0REREREGib9EZB33nkH9vYPfzHM2toa33//Pe7evYvFixejdu3auHLlCqZOnYqqVasaJc+1a9cwcuRI1KhRAwCQkZGBRo0aoVatWoiJiVFuN2HCBHTo0MFgOZKTk7Fs2TKcO3cOISEhAPDUcVm3bh0yMjKQnp6ODh06oHv37kbJNGfOHDg6OqJChQo4d+4cPvvsM7i7u+PatWsYM2YM3N3dAQAeHh7w9/c3SqYVK1Zo3VemGqdx48YhKytLuU18fDyioqKQnJxs8HFKSEjAsmXL0LRpU9y6dQuVK1eGr6+vSV9P2jKZ8vWkLZOpX0+lceDAAfz+++9wc3ODlZUVfH19jfbcpa3l+h47fdXMuLg4bNq0CS+++CJSU1Ph5+cHG5tne5vUV33SVyZ91gJ9ZNLnMaevMVKr1ZgwYQI8PT2Rl5eHxMREzJkzB9nZ2SYZI2151q5da7IxAoDs7Gy8/fbb6NixI/z8/Ex+rBmEmNDy5cufuG769OmyY8cOERHZs2ePfPLJJ0bLk5aWJvv371cuf/3113L48OFicxrSr7/+Knv27JHBgwcr12kbl9jYWBkzZoyIiOTm5krPnj3l3r17Rsm0ZMkS5d9r1qyR2bNni4hIYmKihISE6D2DLpm07StTjlPhfhMRSUhIkOnTp4uIccbpxIkT8scffyiX+/TpI6dOnTLp60lbJlO+nrRlMvXrSVeZmZnSo0cPycnJERERX19fOXDggNGevzS13BBjp4+aqVarpW/fvnL79m0REZk7d678/PPPes1U2teTPjPpqxboK5O+jjl9jlFBQYGsXLlSuTxhwgQJDw832Rhpy2PKMSq8/7Rp02TevHkiYvpjzRBMuqzi/PnzCAoKwooVKxAZGQkA2LdvH1q1agUAaN26Nfbt22e0PFWqVMEbb7wBAMjNzcXp06fx6quvAgBWrVqF9evXIygoqMgsoCH07t0bTk5ORa7TNi579+5Fy5YtAQC2traoX78+Dh8u/ueb9Z1pypQpyr9FBBUqVFAu7927F+vWrcOyZcvw999/6z2PtkxA8fvKlOP05ptvKv/esGEDvL29lcuGHidPT0/06NFDuaxWq+Ho6GjS15O2TKZ8PWnLBJj29aSr2NhY1KxZE3Z2dgAe7tPCmmoMpanlhhg7fdTMxMREZGdnK59QlPX9Rx/1SZ+Z9FUL9JVJX8ecPsfI2toaEydOBADk5+cjKSkJ9erVM9kYactjyjEKCwtD69at8eKLLyrXmfpYMwSTzmGPHTsWnp6eKCgowIgRI+Dk5ITU1FSloD
g7O+PevXvIz883+nT7tm3b0LdvXwAPi1ytWrVQoUIFbNq0CV9++SXmzJlj1DzaxiUtLQ316///77E7OzsjLS3NqNnS09Px3//+FytWrAAAuLq6YtKkSXjllVeQkpKCd955B2FhYahYsaLBs2jbV+YwThkZGbh58yYaNmwIwPjj9Mcff6Bjx45o0KCB2byeHs1UyNSvp0czmfPr6VGP7s/CPKmpqUZ7/tLUcmONXWmf3xhjWNrXk6EylaUWGCJTWY45Q+SJiorCd999By8vLzRv3tzkY/R4HgcHB5OM0d9//41Lly5h6tSpiI+PV6439fgYgklnjj09PQEAKpUKr776Kg4dOgQ3Nzc8ePAAwMNmolKlSiZZh/Lbb78pM36vvPKKMov1+uuvIzo62uh5tI2Lq6urcn3h31xdXY2W6/79+5g1axbmzJmDypUrAwAqVKiAV155BQBQtWpVVK1aFefOnTNKHm37ytTjBABbtmzB0KFDlcvGHKfo6GgcOnQIn332GQDzeD09ngkw/evp8Uzm/Hp61KP7szCPm5ub0Z6/NLXcWGNX2uc3xhiW9vVkiExlrQX6zlTWY84QY9SpUyesX78e165dw6ZNm0w+Ro/nMdUY/fHHH7Czs0NQUBCOHj2KkydP4rvvvjP5+BiCyZrjixcv4pdfflEuX716FXXq1EGXLl1w/PhxAMCxY8fQpUsXo2eLjo5Gq1atYGtrCwCYP39+kZx169Y1eiZt49K1a1fExsYCePixy8WLF9G2bVujZEpLS8OsWbMwbdo01K5dG7t27QLw8GOXwv9V5uXl4datW6hVq5ZRMmnbV6YcJ+DhR4ZRUVHw8vJSrjPWOEVGRuK///0vAgMDkZycjOPHj5v89VRcJlO/norLZK6vp8e1bNkSN27cQG5uLoCH+/TR15ohlbaWG2vsSvv8tWvXhoODA5KTk5+4j76U9vWk70z6qAX6zKSPY06fef7+++8iy5FefPFFXLt2zWRjpC2Pqcboww8/hK+vL8aNG4c2bdrA09MTI0eONMtjraysRERM8cRJSUmYPXs2mjZtioyMDOTn5yMgIADp6elYtGgRatasicTERPzrX/8y2rdVFJo6dSo+//xzZTZj8eLFyMrKgpubG86fP49JkyYp634MISYmBmFhYYiKisLw4cMxatQoZGdnax2XdevWIT09Hffu3UPnzp0NctZ8cZmGDx+O/Px8ZYbPyckJq1evxsGDB7F582Y0adIEV69eRZs2bYrMmBoy08qVK7XuK1ONk4ODA3bv3o1bt24VWW9sjHE6ffo0fHx80KxZMwBAZmYmRowYgW7dupns9aQt08aNG032etKW6fLlyyZ9PZXG/v37sWvXLlSpUgW2trZG+7aKZ6nl+h47fdXMuLg4bNy4ETVr1sS9e/fKdAa9vuqTvjLpsxboI5M+jzl9jVFCQgIWLFiApk2bKs3c559/DltbW5OMkbY8GzZsMNkYAcCuXbuwadMm5OXlYcSIEejYsaNJjzVDMFlzTERERERkbvgjIEREREREGmyOiYiIiIg02BwTEREREWmwOSYiIiIi0mBzTERERESkweaYiIiIiEiDzTERERERkQabYyIiIiIiDTbHREREREQabI6JiIiIiDTYHBMRERERabA5JiIiIiLSYHNMRERERKTB5piIiIiISIPNMRERERGRBptjIiIiIiINNsdERERERBpsjomIiIiINNgcExERERFpsDkmIiIiItJgc0xEREREpMHmmIiIiIhIg80xEREREZEGm2MiIiIiIg02x0REREREGmyOiYiIiIg02BwTldLMmTPx6quvYuvWraaOQkRERHrG5phMauvWrfDx8TF1jKfy8fEp0gjPnDkTTZo0MWEiIiLdWUKdLYvHazRRWbE5JiIiIiLSsBIRMXUIMj1/f3+Ehobi1VdfxYYNG5CYmAhvb29ERUXh9u3bGDt2LHJzc/Hjjz/Czs4Oc+bMweXLl6FWq9GjRw+MGTMGeXl5GD16NGJiYjBjxgzs3bsXhw4dwrp163DlyhWEhITA0dERDg4OmDZtGlJTUzFjxgykpKSgSZMmaNiwIaZPn14kl1qtxqxZs3D+/HmoVCrUrVsXgYGBOHjwIBYuXIiqVavC09MT0dHRqFKlCr788kssXboUp06dQq9evTBlyhQAgIhg/fr1+P3336FSqfDSSy8hMDAQzs7OAIC//voL3377LVQqFRwcHDBjxgzUrVsXixcvxk8//YSqVavC3d0do0ePhpeXF3x8fNCmTRtcunQJ8fHx6NWrF6ZOnYrc3NwiYxAZGYnLly/Dz88PPXv2BAA8ePAAX331Fa5cuQIRwcCBAzF8+HAAwO7du7F27Vo4ODjA2toakyZNQqtWrXDs2DEsXLgQtra2EBGMGjUKXbt2NeIrhIjKylzrLAAkJCRg9uzZyM7ORl5eHjp27IiPP/4YgPb6uHnzZqxZswYtWrSAi4sLjh07hsaNG8PX1xdLlixBXFwcRo4ciREjRuDkyZOYPn067t+/j/79++Po0aO4d+8epk2bhk6dOgEANm/ejNDQUNjZ2cHKygrTp0/Hyy+/DOBh3Zw7dy4uXrwIAKhXrx4++eQTBAcHP1GjIyMjsX37dnh7ez9RowutW7cOv//+O2xsbNCkSRP4+fnBzs4OFy9exKxZswAA+fn5eOuttzBkyBCkpKTA398fOTk5yM/PR9euXTFu3DiDvl7IhIRIo0uXLnLs2DEREQkODpYmTZrIqVOnREQkKChI+XdAQID4+fmJiEhWVpb069dPQkNDlcdp2LChrFixQkREIiIiJCYmRtq1ayc5OTkiIvLdd99JSEiIiIiEhISIt7e31kyRkZEyevRo5fLEiRMlMTFRuW/Lli3l+vXrolarZeDAgTJ27FjJycmRlJQU8fDwkKSkJBERCQ0NlTfffFMyMzNFROSzzz6TgIAAERFJSEiQli1byqVLl0REJCwsTHr16iV5eXkiIuLt7a3kLeTt7S3jxo0TtVotSUlJ0rRpU7l161aRMQgKChIRkR07dsg//vEP5W+BgYHy6aefiojI/fv3pVu3bnL48GEREXn99dclOTlZRET++OMPWb58uYiIDB06VGJjY0VEJC4uThl/IrIs5lhn8/PzpU+fPrJ161YREUlPT5dOnTqJSMn1cfny5dK5c2dJT0+XnJwcad++vQQGBoparZbTp09Ly5YtldtGR0dLo0aNZO/evSIicvToUWnZsqWkpaWJiMhPP/2k5I+Ojpbhw4crGT///HPx9/cXEZGCggIZP368REdHi4j2Gj127Nhia3R4eLj07t1bMjMzRa1Wy6RJk2TlypUiIjJp0iTZsWOHiIjcvn1bef+ZP3++rFmzRkREHjx4IMOGDdM6nmT5uKyCFJ07d0ZkZCQA4NSpU+jZsyf27dsHADh79iw8PDygVquxbds2DB06FADg4OCAN99884n1Xj169AAA9O/fH82bNwcAhIWFISsrCyNGjEC/fv10ylSxYkWcP38e+/fvh1qtxpIlS1CzZk3l7/Xq1UPNmjVhZWWFl
19+GfXr14ednR3c3Nzg6uqKa9euAQDCw8PRp08fODo6AgCGDBmCiIgI5OfnY/v27WjevDnq1asHAOjXrx9u3LiB48ePPzVbhw4dYGVlhWrVqqFKlSq4fv16kb8XzoY0atRI+ZtarUZ4eDjeeustAICzszO6du2KiIgIAEClSpXw888/Iz09Hd26dVNmJipVqoTw8HCkpKSgcePG+OKLL3QaPyIyL+ZYZ2NjY5GQkID+/fsDAFxcXLB06VIA0Kk+enp6wsXFBXZ2dqhbty4aNWoEKysrNGrUCJmZmUhNTVVu6+TkBC8vLwBA69at4ebmpmz/yy+/jAkTJuC9997D4sWLcebMGQAP62ZYWBiGDBnjU8HRAAAgAElEQVQCALC2toa/v78yq6xNx44dlRpduXJlpQ6Hhoaib9++cHR0hJWVFfr164fw8HAAD2vtb7/9hmvXrsHd3R0rVqwAAFSuXBlRUVG4cOECKlSogH//+986jS1ZJjbHpPDy8sLevXuRmZmJChUqoFu3bti3bx/u3bsHFxcXWFlZIS0tDbm5uXB1dVXu5+rqiqSkpCKPVbhcAXhY2H/44QfExMSge/fumDFjBjIyMnTK1KpVK3z55ZdYu3YtunbtivXr10MeWQnk5OSk/NvGxuaJy3l5eQCAW7duPZE5Ly8PqampT/xNpVKhYsWKuHXr1lOzPbqNdnZ2ynM9/nd7e3vlb4Xjt3DhQvj4+MDHxweHDx9Gbm4uACA4OBhJSUno06cPJk+ejNu3bwMAFi9eDAcHBwwePBijR4/GlStXSh48IjI75lhnk5KSULFiRdjY2CjXtWnTBsCTtbO4+qitDhc+3qO1sVKlSkWeu3Llyrh9+zbu37+P8ePH45133sGPP/6IJUuWIDs7GwCKHY+XXnoJbm5uT92uR8fn0Tp869YtbNu2TanBa9euhbX1w3bos88+Q+PGjfH+++9j+PDhiI2NBQCMHj0a//jHPzBlyhQMHDhQ+Q8OlU9sjknxxhtv4OrVq9i6dSvat2+Pzp074+zZswgPD0fnzp0BPCzQdnZ2SEtLU+6XlpaG6tWra33cvLw8uLm5YdGiRdi1axfu3buH+fPn65Tp/v37aNeuHb777jts3LgRYWFhCAsLK/W21ahR44nMtra2qFq16hN/KygoQHp6Ol544YVSP09JCsdv+vTp2LhxIzZu3IgtW7YgMDAQwMM3nlmzZmHPnj1wc3NDQEAAACA3NxfTpk3D3r170bZtW0ycOFHv2YjI8Myxzr7wwgtIT09Hfn6+ct3FixeRnZ2t9/p47969Ipfv3LmDatWq4fLly8jIyFA+cXs0S3HjkZSUhOTk5GfKUKNGDbz99ttKDf7555+xadMmAEB6ejomTpyI3bt3491338WHH36ozH77+Phg+/bt8PPzw7Rp05CQkPBMz0/mj80xKRwcHNCuXTt8++236NSpE6pUqYJmzZohKCgI7du3B/Dw46xBgwYpH+9lZ2fj119/VT7uKk5SUpJyAoiLiwuaNGmCgoICAA9nHLKysgAAH3/8cZGCCAB//PEHNm/eDACoU6cOqlevDrVaXeptGzx4MH777TdlJiIsLAwDBgyASqVC3759cfr0aVy9ehUAsHPnTtSsWROtWrUqkvHKlSs6v9loUzh+hcsoAGDVqlVKwz9hwgQUFBTAwcEBnp6eyjhNmjQJWVlZsLGxQevWrZXriciymGOdbdGiBerUqYPt27cDAO7evYvJkyfrVB9LKzs7W5l1PXLkCNLS0tClSxfUrFkTNjY2OHnyJAAgKipKuc/j46FWqxEYGKg0x6Wt0YXvBzk5OQCAQ4cOKUvVAgICkJKSAisrK7Rt2xb5+fmwsrJSTjAEHi4jKTw5msonm5JvQs8TLy8v5OTkwMXFRbl8+PDhIh+b+fv7Y86cORg+fDgKCgrQr18/DBw4EAAwatQoAMDUqVMxZcoUtG/fHq6urqhUqRKGDx8Oa2tr2Nvb46uvvgIAvP7661i9ejWGDRuGZs2aFflYDwBatmyJefPm4c8//0RmZiYaNWqEgQMH4uDBgwgKCkJKSgqWL1+O6tWrIyoqCvb29mjcuDH279+P5ORkzJkzB4sXL0b//v1x+/ZtvP/++7C2tla+rQIAateujeXLl8PPz085G3v16tVKlqFDh2LRokUIDQ3FJ598ggULFiAuLg7JycmoV68ewsPDizzXnDlzlDFYv369cob0qFGj8O9//1sZv2HDhilnSn/00UcAgLZt22LEiBGwtbVFQUGBUrC7d++ODz74ALa2tsjOzi5zk05EpmNudValUmH16tWYPXs2tmzZArVajenTp8PW1vap9XHbtm0IDQ1FTk4OfvzxR6SlpRWpjevXr1dyBgUFAQCqV6+OuLg4rF+/Hnfv3sXXX3+NKlWqAAA+//xzBAYG4pVXXkHdunWVbX20bg4fPhwigv79+6Np06YASl+j+/fvj+TkZHh7e8PR0RHOzs6YPXs2gIdrqn19fWFnZ4eMjAwsWLAAjo6O6N27N7766iuoVCpkZGRg8uTJSkYqf/hVbkRERGRwhw4dQkBAAP78809TRyF6Ki6rICIiIiLSYHNMREREBnXy5EnMmTMHycnJmDRpkqnjED0Vl1UQEREREWlw5piIiIiISMMg31aRnHzfEA+rsypVKuDOnUyTZngac87HbM+G2Z6dOefTdzZ3dxe9PdbjnlZ3zXmM9YnbWX48D9sIPB/baeptfJa6Wy5njm1sVKaO8FTmnI/Zng2zPTtzzmfO2UqjvGxHSbid5cfzsI3A87GdlriN/J5j0rsdl34v9vq+9f9h5CREVJ6wthCRMbA5JiIis6KtCSYiMgY2x6TYcel3VEiyR+aDnCLXc1aGiIiInhflcs0xEREREdGz4MwxERHpBdcEE1F5wOaYiIhMgmuLicgccVkFEREREZEGm2MiIiIiIg02x0REREREGmyOiYiIiIg0eEIeEREZFE+8IyJLwuaYiIhKxdya3cI8j/+IEb9CjoieBZdVEBERERFpsDkmIiIiItLgsgoLYm6/PlXaj1b1lX9x5OZir/+X17ulehwiKt/MrWYSkWVgc0wmxzcwIiIiMhdsjssxfTWd5nbyDREREZGhsDkms/X48glbWxvk5eWbKA0RlRf8tIqInoYn5BERERERaXDmuBzQ14lxz5OwqEvFXj+oU30jJyEiIiJzwub4GenjYzl+tEdE5oz/kSai5xGbYyN4/A3m8V9xKun2ZDycUSayPOcS7gB48ryExnWqmCoSEVkwNsdUosI3nsdZwhuPtmaXiIiIqDhsjomIiCxEWNQlODnZ48Fjnz7y0y0i/WFzTOVG8ctRXjZ6DiKyTDwPhIgANsdkBrQt29DH4zRQle4xLhYc1vIXzsoQ0bMr7fkMXBJGZDpsjumZWfJaZEPjiX1E5Ychj2c2wUTmh80x6R2bZiIyZ/qqUY//iicA/Mvr3WfKVFb8DzmR/rA5JiIiegrty62IqDxic/wcep5mdp+3N7XiZo+cnOzRs3UtE6QhMq3Sns+gj/MfuEyCyPKxOdYz
/oAHPU1p3zj5kSg9j/R1kq450fYf9QaqtkZO8hCXYRBpx+aYFOcS7jzxC1P6fnxLVVJTW9z3jhLR86e0n1YZumnmTDZR6bE5LoElzARbctNZXpnbLJE2nD0iwDLqHBGRsbA5JiIisnDmNgNd1u9vLunTuOIen//ZJ3157ppjS54h4QwxGQvfZIieT/pahsHlHGTJnrvm2Jyw2bUcJa0jtM2yQV7Bs6/V1vb4iyOLv95U36VK9CxY60rvaTWnrPWGno25TRqYW57yxOyaY339tn1pZ4hL+/Vmpbn947c15ElvZN709dVyxf34gDa2WTZ4ENWqVHm0fRSr7XlL06w/fmxWSLJH5oOcUh/j+npjeNoMV3Ef7T5Pbzxsai2fvmqOvpZn6Jqn8D8AhjxPwxxmyR+tMebWZGtjyJzm0vBbiYgY9RmJiIiIiMyUtakDEBERERGZCzbHREREREQabI6JiIiIiDTYHBMRERERabA5JiIiIiLSYHNMRERERKTB5piIiIiISMPsfgSkJAcOHMDvv/8ONzc3WFlZwdfXt8jfc3JyMH/+fFSvXh1XrlzBuHHjUK9ePQDA3LlzYWNjA7VajezsbEyfPh3W1vr7/0FJ2QBg586dWLJkCQIDA9G1a1fl+vDwcMTFxcHa2hp16tTBsGHD9JarLNlOnjyJ77//Hk2bNsXly5fh6emJd955xyyyFUpNTcWgQYMwfvx4eHt76zVbWfPFxsZi//79sLa2xqFDhzB37lzUqFHDLLKZ+ngICgpCSkoKqlatijNnzmDSpElo0KABANMfD9qyGeN4KIuy1EdLUpbXlqXQ5dgGgIiICHz66ac4duwYnJycjJyy7EraThHBxo0bAQDXr19Heno65s6da4qoZVLSdiYmJmLBggVo3rw54uLi0K9fP3Tv3t1EaZ9NcnIyli1bhnPnziEkJOSJv1tU/RELkpmZKT169JCcnBwREfH19ZUDBw4Uuc2aNWskKChIRETOnTsnw4cPFxGR2NhY6d+/v3K7/v37y5EjR4yaLSEhQQ4ePCje3t7y559/KtffvHlTBgwYIGq1WkREhgwZIpcvXzaLbLt375YTJ06IiEhubq68+uqrkpqaahbZREQKCgokMDBQJkyYIBs3btRbLn3ku3//vvj6+ha53YMHD8wimzkcD0uXLlVe8zt27JDx48eLiHkcD9qyGfp4KIuy1EdLUpb9Zyl02UYRkb///luWLFkiDRs2lIyMDGPHLDNdtjM0NFRCQ0OVy3FxcUbNqA+6bOeMGTMkODhYRETOnDkjPXv2NHbMMvv1119lz549Mnjw4GL/bkn1x6KWVcTGxqJmzZqws7MDALRu3RqRkZFFbhMZGYlWrR7+VG6jRo1w7tw5ZGRkoHLlysjMzER+fj7y8/NhZWWFF1980ajZateujddff/2J+0ZFRcHDwwNWVlYAgFatWuGvv/4yi2zdu3eHp6enclmlUsHW1tYssgHA2rVr8fbbb6NSpUp6y6SvfPv27UOFChUQHByMb775BmfOnEGFChXMIps5HA+TJ09WXvNqtVoZG3M4HrRlM/TxUBZlqY+WpCz7z1Loso1ZWVlYt24dPvroIxMk1A9dtnPbtm24e/cuNmzYgCVLlljk7Lgu21m1alWkpaUBANLS0uDh4WHsmGXWu3fvp+4fS6o/FrWsIjU1tcjAOzs7IzU1Vafb1K1bF++88w7+53/+B9bW1njjjTfg6upq1GzapKWlFbmvk5OTzvc1dLZHbdq0CRMmTICLi4tZZIuOjoaDgwNatGiBn376SW+Z9JXv+vXrOHHiBL766iuoVCr885//ROXKlbU2+sbMZk7HQ25uLkJDQ/HFF18AMK/j4fFsjzLE8VAWZamPzs7ORstZVvraf+ZMl21cunQpJk6cqDRclkiX7bxx4wYyMjLg6+uLy5cvY8yYMdi5cydUKpWx4z4zXbbzgw8+wEcffYS5c+fi5MmTmDhxorFjGpwl1R+Lmjl2c3PDgwcPlMsZGRlwc3PT6TZ79uzBoUOHsHLlSqxYsQLXrl3Dzz//bNRs2ri6uha574MHD3S+r6GzFdq2bRsyMzMxcuRIveUqa7Y9e/YgJycHQUFBOH/+PPbv31/sOidT5XN2dkbTpk1ha2sLa2trtGzZEocPHzaLbOZyPOTm5mLmzJmYMmUK6tSpA8B8jofishUy1PFQFmWpj5ZEH/vP3JW0jTdv3kR6ejp+/fVXBAUFAQCCg4Nx6tQpo2ctC132pbOzM1q0aAEAqFevHjIyMnDz5k2j5iwrXbbT398fb7/9NgICArBy5UpMmTIFd+/eNXZUg7Kk+mNRzXHLli1x48YN5ObmAgCOHTsGLy8v3L17V5ma9/LywvHjxwEA8fHxaNy4MZydnXHr1i24u7srj+Xu7q48jrGyadOpUyecOXMGIgIAOH78ODp37mwW2QDgl19+QWpqKiZOnIj4+HhcvnzZLLIFBgZi3LhxGDduHBo2bIgOHTpg6NChestW1nyvvfYarl+/rly+ceMGXnrpJbPIZg7HQ3Z2Nr744gt88MEHaNasGXbt2gXAPI4HbdkAwx4PZVGW+mhJyrr/LEFJ21ijRg3MmzdPqX/Aw5nH5s2bmzJ2qemyL9u3b4/ExEQAD5upgoKCIrXLEuiynTdv3lS2q2LFirC2toZarTZZZn2x1Pqjmjlz5kxTh9CVra0tGjRogODgYMTGxqJatWoYOnQoli9fjgsXLqBNmzbw8PDAb7/9hrNnz2Lfvn2YNm0aqlSpgpdffhl79uzBmTNnEBMTg5SUFPj6+uptvaAu2UQEq1atwqFDh/DgwQM4Ojqibt26cHZ2RoUKFRASEoIDBw6gQ4cO6Nixo15ylTXb7t27MXPmTGRkZCA0NBQRERFo06aN3tanliVboS1btmDv3r1IS0uDi4uLXhvQsuRzdXVFbm4ufvvtN8TExMDe3h4ffPCBshbSlNnM4XiYPHkyzpw5gyNHjiA0NBTR0dF49913zeJ40JbN0MeDobdLW320JGXZf5ZCl20EHi5BCg4OxqFDh6BSqVCvXj2zbTaKo8t2NmvWDBEREYiPj8f27dsxYcIEi/vmEV22s0GDBvjhhx9w9epVREREYMCAAWjXrp2po5dKTEyM8k1D2dnZaN68Ob799luLrD9WUjg9Q0RERET0nLOoZRVERERERIbE5piIiIiISIPNMRERERGRBptjIiIiIiINNsdERERERBpsjomIiIiINNgcExERERFpsDkmIiIiItJgc0xEREREpMHmmIiIiIhIg80xEREREZEGm2MiIiIiIg02x0REREREGmyOiYiIiIg02BwTEREREWmwOSYiIiIi0mBzTERERESkweaYiIiIiEiDzTERERERkQabYyIiIiIiDTbHREREREQabI6JiIiIiDTYHBMRERERabA5JiIiIiLSYHNMRERERKTB5piIiIiISIPNMZERbNu2DYGBgaaOQUREj2BtpuJYiYiYOgRRcbZu3YrQ0FBs3LjR1FHKrKCgANnZ2XBycgIA+Pv7o1atWvj4449NnIyIyov
yVDON5fHaXJJu3bph7ty5eO211wycjEyJM8dERqBSqXQuvkREZByszVQczhxTifz9/REaGopXX30VGzZsQGJiIry9vREVFYXbt29j7NixyM3NxY8//gg7OzvMmTMHly9fhlqtRo8ePTBmzBjk5eVh9OjRiImJwYwZM7B3714cOnQI69atw5UrVxASEgJHR0c4ODhg2rRpSE1NxYwZM5CSkoImTZqgYcOGmD59+hPZEhISMHv2bGRnZyMvLw8dO3ZUZmP/+usvfPvtt1CpVHBwcMCMGTNQt25dbN68GWvWrEGLFi3g4uKCU6dOoWrVqvjmm29gb28PANi/fz9WrFgBW1tbqNVqeHt7o0+fPrhw4QLmz5+vPN+QIUPw7rvvYs+ePfj888/h6OiITz75BG+++SZ8fHxw6dIlTJgwAVu2bMH9+/fx559/4vvvv0dQUBDs7e1Rq1YtDBgwAGvXrsWdO3fw3nvvYcqUKZg3bx5CQ0MxduxYjBkzxqj7m4jKhjWzbDVz7ty5aN++PZYsWYLjx4/DysoKHTp0wEcffQQrK6si23Py5ElMnz4d9+/fR//+/XH06FHcu3cP06ZNQ6dOnQAAKSkpmDVrFtLS0pCXl4fhw4dj8ODBiI+Px7Rp05TavGfPHixcuBBVq1ZFixYtcPjwYVhbW2PlypVwc3NDQEAAtm/fjvr166NixYrw8/ODo6MjZs2aBQDIz8/HW2+9hSFDhhjy5UXGIEQ66NKlixw7dkxERIKDg6VJkyZy6tQpEREJCgpS/h0QECB+fn4iIpKVlSX9+vWT0NBQ5XEaNmwoK1asEBGRiIgIiYmJkXbt2klOTo6IiHz33XcSEhIiIiIhISHi7e2tNVN+fr706dNHtm7dKiIi6enp0qlTJxERSUhIkJYtW8qlS5dERCQsLEx69eoleXl5IiKyfPly6dixo9y9e1cKCgqkb9++sm3bNuW+rVq1ksuXL4uIyIkTJ5QcsbGxEhsbKyIiubm50rt3b+V233//vXzwwQdKvj/++EM2b94sIiLR0dHStWtX5W9+fn6yfPly5fLp06eldevWkpWVJSIiKSkpEhAQoHXbici8sWaWrWZ+++234uPjI/n5+ZKbmyvvvvuuhIWFFbtd0dHR0qhRI9m7d6+IiBw9elRatmwpaWlpIiLy/vvvK/U2NTVVOnToIDExMcp9H63NISEh0qJFC0lISBARkTFjxsjq1auVv3ft2lWio6OVy5MmTZIdO3aIiMjt27dl9OjRWsefLAeXVZBOOnfujMjISADAqVOn0LNnT+zbtw8AcPbsWXh4eECtVmPbtm0YOnQoAMDBwQFvvvkmtm7dWuSxevToAQDo378/mjdvDgAICwtDVlYWRowYgX79+umUKTY2FgkJCejfvz8AwMXFBUuXLgUAbN++Hc2bN0e9evUAAP369cONGzdw/Phx5f4tWrRApUqVYG1tjVdeeQXXrl1T7tusWTO89NJLAABPT09MnjwZAFC3bl1s2bIFw4YNw6hRo5CcnIyzZ88qz3H48GEkJSUBAH777Tf06dNHp23x8PBAzZo1sWfPHgAPTxLp27evTvclIvPDmlm2mhkaGorBgwdDpVLB1tYWvXv3RkREhNZtc3JygpeXFwCgdevWcHNzw759+5CUlISDBw8qY+zq6govL68nxvhR9erVQ+3atQEAjRo1UrazOJUqVcJvv/2Ga9euwd3dHStWrNB6W7IcbI5JJ15eXti7dy8yMzNRoUIFdOvWDfv27cO9e/fg4uICKysrpKWlITc3F66ursr9XF1dlcJXyNnZWfm3g4MDfvjhB8TExKB79+6YMWMGMjIydMqUlJSEihUrwsbGRrmuTZs2AIBbt24VyaFSqVCxYkXcunWr2Bz29vbIy8sr9r6PPu68efOQmpqKTZs2YePGjWjSpAmys7OVbe3QoQPCw8Nx7949qFQquLi46LQtADBgwACEhYUBAA4ePIj27dvrfF8iMi+smWWrmbdu3UJwcDB8fHzg4+ODiIgIFBQUaN22SpUqFblcuXJl3L59W8lf0hg/Stt2Fuezzz5D48aN8f7772P48OGIjY3VeluyHGyOSSdvvPEGrl69iq1bt6J9+/bo3Lkzzp49i/DwcHTu3BnAw4JjZ2eHtLQ05X5paWmoXr261sfNy8uDm5sbFi1ahF27duHevXuYP3++TpleeOEFpKenIz8/X7nu4sWLyM7ORo0aNYrkKCgoQHp6Ol544YUSH/fx+wLA6dOnATxc3/bGG29ApVIp+R81aNAghIWFYefOnTrPGhcaMGAADh48iAMHDqB+/fqwtubhSWSpWDPLVjNr1KiBDz/8EBs3bsTGjRuxZcsWLFu2TGuGe/fuFbl8584dVKtWTclfmjEujfT0dEycOBG7d+/Gu+++iw8//BCZmZl6eWwyHb77kk4cHBzQrl07fPvtt+jUqROqVKmCZs2aISgoSJnhtLa2xqBBg5SPq7Kzs/Hrr78+9eSEpKQk5aQRFxcXNGnSRJkdcHJyQlZWFgDg448/LlLQgYcf8dWpUwfbt28HANy9exeTJ0+GSqVC3759cfr0aVy9ehUAsHPnTtSsWROtWrUqcVsfv+/Ro0exatUqAECdOnVw4sQJAMDt27cRHx9f5L7dunVDSkoKfv75Z3Ts2FHrcxRuW2ZmJv71r38BAKpXr4527dph2rRpGDhwYIk5ich8sWaWrWYOHjwY27dvV7YtNDQUq1ev1pohOztbWcZy5MgRpKWloUuXLqhevTo6duyojPGdO3cQGRmpLLMoLScnJ2RnZyM6Ohrff/89AgICkJKSAisrK7Rt2xb5+flPnDRIlkc1c+bMmaYOQZbh/v37uHPnDoYNGwbg4RnAOTk5eOutt5TbvPbaa/jvf/+LdevWYevWrejduzfee+89WFlZYdSoUUhMTMSJEyfw4osvonbt2rCxsUFMTAzWrVuH0NBQJCcnIzAwEC4uLnB3d0doaCjCwsJQv359ZT1ZIWtra3Tq1Alr167Fzz//jO3bt8PPzw916tRBpUqV4OHhgTlz5iA0NBR///03Fi5cCFdXV2zbtg3BwcG4dOkSHB0dcfbsWfzyyy+4cOECXF1d0bZtW+W+4eHhiI2NxcyZM+Hs7IwmTZpg8+bN2Lp1K+Lj45GdnY3Dhw+jQYMGqF27NlQqFa5du4Y6deqgS5cuAID4+HjMmDED169fx7lz59CnTx9UrFgR69atw44dOzBo0CA0atRI2a64uDhMmjTJ8DuUiAyKNfPZaibwsJGPj4/H8uXLERERgbt37yIgIAC2trZPjPP169cRHR2NatWqYfny5di1axe+/PJLNG7cGACU5viHH35AeHg4xo4di+7duz9RmytXrowlS5YgISEB2dnZyMjIQFBQEC5dugRra2u0atUKarUaq1evxtGjRzFy5EhUrlwZCxcuREREBMLDw+Hn54emTZsa4NVExsSvciMyI/v27cOFCxf49W1ERDo6dOgQAgIC8Oeff5o6CpUTXFZBZAYKT8SLiIhQziQnIiIi42NzTGQG9u7di0GDBq
FOnTp6O1GEiKi8O3nyJObMmYPk5GQuRyO94bIKIiIiIiINzhwTEREREWnYlHyT0ktOvv/M961SpQLu3LHM7wi05OyAZee35OyAZee35OyAcfO7u+v+ozCl9bzW3cdxW8wTt8U8PQ/b8ix11+xmjm1sVKaO8MwsOTtg2fktOTtg2fktOTtg+fn1oTyNAbfFPHFbzBO3Rctj6e2RnjM7Lv3+xHUVkuzRtXqXYm5NRETPg+LeGwCgb/1/GDkJET0rs5s5JiIiIiIyFTbHREREREQabI6JiIiIiDTMbs3xz6e3I/NBzhPXG3q9FteJEREREZHZNcf6wmaXiIiIiEqLyyqIiIiIiDTYHBMRERERaZTbZRXaaFtuQURERETEmWMiIiIiIg02x0REREREGs/dsorS4jIMIiIioucHm2MiInqulHbSg18BSvR8sZjmmN9bTERERESGZjHNsTZc9kBERERE+mLxzTGVXljUpWKvHzukhZGTEBEZjqEnTzg5Q1Q+8dsqiIiIiIg02BwTEREREWlwWQUREZGF4MnpRIbH5piIiCzaow1jhSR7ZD7IMdjjE1H5x+aYiIjIwPjdykSWg80xERERGYW2b0sa1Kl+uXpOsmxsjo2Aa8SI6HnGGtH/KzwAACAASURBVEhEloTNMRERkZnZcel3g6yfJqKSsTm2IJx9ISJD+Pn09mKbMFPVFtY6MkeFyzOcnOzx4JHjhcszyh82x3pWmpMuDP0GoG2dlTY/7jpX5IAvxAOfiIiInhdsji3IuYQ7xV6fd710TTARkSXiV6qVX6U9aa6423Mih/SFzbEJaWt2+/L4JiIiC/Z48/r4UgRjPKexHqe0t2cTb/7YHJshfR3gREREhqR9Nv9lo+YoD/iVc+aDzbERaJsh1uZiwWG93L6Bqm2pHkcbff2v2BIOfEvISFRecJmE/vBHRoj0h80xmZw+1po5OdmjZ+taes1FRPS8MbdvCuEnqWQKbI5J70y17qs0j8GZYCL940xw+cV9W37xffJJbI7LMUMvtyCi8s3cZhFJO23L9xrXqWLQx9emgUovT2tQ5XVWms1u2bE51rPSFhCyDCw2RGSJSvutSHwPMzxTNeXaliQa6rEBy32PZHNM5UZ5nQUgMjf8iN3yLY7cbNDHf/yTS9ssG+QV5Gu9PT/RLD1zarIN/djGbrLZHD+HtC238ERHgz4+i1/pmUuhICIiel6wOaYSlcdm19AnDZpb82opOYksXWmXJWhbE3wu4Q5sbW2Ql6d9tpXIkJ7nT2PNrjk+9XdKscXgaQWkOKW9vT7Y2prdcJZKXFb0Uz/2elxpv4+5tI9T2ua7uMfR9hilzV7ax1kcWbptqtQwEZnF/nqUYb9I3xJ+2UkfX/X3tNtrw5PR9MfQJ4vpi74aW0M+Z3lV2veB0tRwS57IASy7SbXUiRkrERFThyAiIiIiMgfWpg5ARERERGQu2BwTEREREWmwOSYiIiIi0mBzTERERESkweaYiIiIiEiDzTERERERkYbJvpj3wIED+P333+Hm5gYrKyv4+voW+XtOTg7mz5+P6tWr48qVKxg3bhzq1atnorRFlZQ9KCgIKSkpqFq1Ks6cOYNJkyahQYMGJkr7pJLyF4qIiMCnn36KY8eOwcnJycgpi1dSdhHBxo0bAQDXr19Heno65s6da4qoTygpe2JiIhYsWIDmzZsjLi4O/fr1Q/fu3U2U9knJyclYtmwZzp07h5CQkCf+bs7HbEnZzf2YLYuy1Nrw8HDExcXB2toaderUwbBhw0yxCYqybEu3bt1Qq1YtAEC1atWwePFio+d/lC51eOfOnViyZAkCAwPRtWtX5XpL2y+A9m2xtP3ytFphafvladtiaftl586d2LNnDxo3boxTp05h0KBB6NatG4Bn3C9iApmZmdKjRw/JyckRERFfX185cOBAkdusWbNGgoKCRETk3LlzMnz4cKPnLI4u2ZcuXSpqtVpERHbs2CHjx483ek5tdMkvIvL333/LkiVLpGHDhpKRkWHsmMXSJXtoaKiEhoYql+Pi4oyaURtdss+YMUOCg4NFROTMmTPSs2dPY8d8ql9//VX27NkjgwcPLvbv5nrMipSc3ZyP2bIoS629efOmDBgwQBmXIUOGyOXLl40X/jFlfd9Yvny58cKWQJdtSUhIkIMHD4q3t7f8+eefyvWWuF+0bYuI5e0XbbXCEvfL0+qepe2XkJAQuX79uogUff981v1ikmUVsbGxqFmzJuzs7AAArVu3RmRkZJHbREZGolWrVgCARo0a4dy5c8jIyDB21Cfokn3y5MmwsrICAKjValSoUMHYMbXSJX9WVhbWrVuHjz76yAQJtdMl+7Zt23D37l1s2LABS5YsMZsZb12yV61aFWlpaQCAtLQ0eHh4GDvmU/Xu3fup42muxyxQcnZzPmbLoiy1NioqCh4eHsq4tGrVCn/99ZdR8z+qrO8bhw8fxtq1a7Fs2TIcO3bMqNkfp8u21K5dG6+//voT97XE/aJtWwDL2y/aaoUl7pen1T1L2y9DhgxBzZo1AQBXr15VZsCfdb+YZFlFampqkTcqZ2dnpKam6nQbZ2dno+Usji7ZC+Xm5iI0NBRffPGFseKVSJf8S5cuxcSJE5UXornQJfuNGzeQkZEBX19fXL58GWPGjMHOnTuhUqmMHbcIXbJ/8MEH+OijjzB37lycPHkSEydONHbMMjHXY7Y0zPGYLYuy1Nq0tLQi1zs5OWmtdcZQ1veNTz75BJ6ensjKysLgwYOxZs0a1K1b12j5dcmpC0vcL09jqfvl8VphyfuluLpnifslOzsbK1asQExMDBYtWgTg2feLSWaO3dzc8ODBA+VyRkYG3NzcSn0bU9A1V25uLmbOnIkpU6agTp06xoz4VCXlv3nzJtLT0/Hrr78iKCgIABAcHIxTp04ZPevjdBl7Z2dntGjRAgBQr149ZGRk4ObNm0bNWRxdsvv7++Ptt99GQEAAVq5ciSlTpuDu3bvGjvrMzPWY1ZW5HrNlUZZa6+rqWuT6Bw8emHR/lvV9w9PTEwDg6OiIJk2amHQ2rCzHiiXul6exxP1SXK2w1P2ire5Z4n5xcHDAp59+ikWLFuGf//wn8vLynnm/mKQ5btmyJW7cuIHc3FwAwLFjx+Dl5YW7d+8qH4F5eXnh+PHjAID4+Hg0btzYLGagdMmenZ2NL774Ah988AGaNWuGXbt2mTJyESXlr1GjBubNm4dx48Zh3LhxAB7OaDZv3tyUsQHoNvbt27dHYmIigIcHUEFBAdzd3U2WuZAu2W/evKlkrVixIqytraFWq02WWReWcMxqYynHbFmUpdZ26tQJZ86cgYgAAI4fP47OnTubZkNQtm05ePBgkY9Sr169itq1axt/IzR02RZtLHG/aGOJ+0VbrbDE/aJtWyxxv6xfv14Z+xdeeAF37txBTk7OM+8XKym8h5Ht378fu3btQpUqVWBrawtfX18sWLAAlStXxrhx45CdnY358+fD3d0dCQkJGD9+vNmc+V5Sdl9fX1y4cAHVqlUDAGRmZhZ7hryplJQfePhRx
H/+8x98/fXXmDhxIoYNG4bq1aubOHnJ2e/fv4+FCxeiZs2aSEhIQK9evdClSxdTxwZQcvYjR45gw4YNaNq0Ka5duwYPDw8MHz7c1LEVMTExCAsLQ1RUFIYPH45Ro0Zh+fLlFnHMlpTd3I/ZsihLrQ0PD8fp06ehUqnw0ksvmfzs+2fdlvj4eHzzzTfw8PDA7du3Ub16dYwfP96st0VEsGrVKmzZsgVt2rTBgAED0KlTJwCWt1+0bYsl7pen1QpL2y/atsUS98uqVauQlJSEmjVr4uLFi2jdujXeffddAM+2X0zWHBMRERERmRv+CAgRERERkQabYyIiIiIiDTbHREREREQabI6JiIiIiDTYHBMRERERabA5JiIiIiLSYHNMRERERKTB5piIiIiISIPNMRERERGRBptjIiIiIiINNsdERERERBpsjomIiIiINNgcExERERFpsDkmIiIiItJgc0xEREREpMHmmIiIiIhIg80xEREREZEGm2MiIiIiIg02x0REREREGmyOiYiIiIg02BwTEREREWmwOSYiIiIi0mBzTERERESkweaYiIiIiEiDzTERERERkQabYyIiIiIiDTbHRERE5UhmZibGjRuHYcOGYeDAgbh+/brJsnzzzTfo0KEDVqxYUeJtCwoK4OPjg0aNGuHatWsAgCNHjmDUqFGGjqmza9eu4c033zR1DDIwNsdkdrZu3QofHx9Tx9CbRws9EVFZ6FIfd+zYgfz8fPznP//B5MmTYW2t37d6f39/nZpdAPD19UWnTp10uq1KpcLGjRuLXNemTRt8/fXXpc5oKC+++CL+85//mDoGGZiNqQMQERGR/iQlJaFatWoAgK5du5o4TdlYWVnBxcXF1DGKqFixoqkjkIGxOSat/P39ERoaildffRUbNmxAYmIivL29ERUVhdu3b2Ps2LHIzc3Fjz/+CDs7O8yZMweXL1+GWq1Gjx49MGbMGOTl5WH06NGIiYnBjBkzsHfvXhw6dAjr1q3DlStXEBISAkdHRzg4OGDatGlITU1FUFAQUlJS4OPjg4YNG2L69OlFcqnVasyaNQvnz5+HSqVC3bp1ERgYiIkTJ+LgwYPw8vLCmjVrEBYWhkWLFqF169Zo2LAhfvrpJ/Tq1Qvp6ek4deoUevTogV69emH16tU4f/48/P390aNHD5w8eRLTp0/H/fv38d5772H37t3Iz8/HsmXLEBQUhOPHj6Np06aYP3++kiksLEwZh+rVq2PWrFlwdnbGmDFjAABTp06Fvb095s2bB39//yfG44UXXkBCQgJatmyJf//734iNjcWMGTNQqVIlbN261aj7nYhKZq71cfPmzdi6dStycnLg4+ODjz/+GCtWrCj2OdLT0xEcHAyVSgW1Wo2pU6eiTZs2AKDUvCNHjsDGxgZubm745JNP8OeffyIqKgr29vaIiYnBgAED8Pbbb2PmzJk4d+4cbG1t4e7ujtmzZ8PZ2VmnsQwJCUFwcDCqVq2Kvn37KtenpaVhwoQJOHHiBOLj4/Vam2fOnInt27fD29sbly5dQnx8PHr16oWpU6cCAI4dO4aFCxfC1tYWIoJRo0aha9eueP/99xEdHY09e/bgxRdfRF5eHpYsWYLjx48DAFq1aoWpU6fC1tYWvr6+2LdvHz7++GPExsbiwoULGDlyJEaMGFGm1x4ZgRA9RZcuXeTYsWMiIhIcHCxNmjSRU6dOiYhIUFCQ8u+AgADx8/MTEZGsrCzp16+fhIaGKo/TsGFDWbFihYiIRERESExMjLRr105ycnJEROS7776TkJAQEREJCQkRb29vrZkiIyNl9OjRyuWJEydKYmKi5ObmSrt27eTo0aPK38aPHy9qtVpERPz8/GTw4MGSk5Mjqamp4uHhIStXrhQRkV27dkmvXr2U+0VHR4uHh4ccP35cREQ+/PBDGTx4sKSnp0tOTo68/vrryt+OHDki7dq1k9TUVBERmTdvnnz22WdFtj0xMbHINjw+HqdPn5b+/ftLRESEcpvJkydLRkaG1nEgItMyx/ooIrJ8+XLl+bQ9x+nTpyUsLEzu3LkjIiKJiYnSpUsX5farVq2SkSNHSn5+voiIzJo1S8ng5+cny5cvL/L43333XZHnX7p0qXK5uNsXOn/+vHh6ekpCQoKIiGzatKlIzUxMTJSGDRsqt9dnbfb29paxY8eKWq2WpKQkadq0qdy6dUtERIYOHSqxsbEi/9fenUdFVf5/AH8PA4IsihC4IiKlFGpoIq4cNP3WqdRRS1tsUTtqhu16TnFKTlaYx6WTZIl6qNCfmcoaCqYpbgl6ktEQxULDUEkBl2GHeX5/CPcIMjAMs9xh3q//mLlz5/3c633ux+c+944QIjc3t9H2vDdfdHS0ePXVV0Vtba2ora0Vc+fOFdHR0dKy48ePF8uWLRNCCKFWq0VQUJCoqalpdluQfHDOMbUoNDQUBw8eBACcOXMGkyZNQkZGBgDg7NmzCAwMhFarRUpKCmbMmAEAcHJywlNPPXXfiOfEiRMBAJMnT8bgwYMB3P1ffUVFBV566SU888wzemXq0qUL8vLycPToUWi1WqxZswa9evWCg4MDnn76aSQmJkr5Bg4cCIVCIX12xIgR6NSpEzw8PODh4YGAgAAAzc8LdnFxQVBQEADgoYceQu/eveHm5oZOnTqhX79+uHz5MgAgISEBEyZMgIeHh9S+lJQUCCFabMe92yMwMBBTp06VspeUlMDR0REuLi56bRMiMj859o8tadrnBAQE4MMPP8QLL7yADz/8EFevXkVxcTGAu3Obp06dCqVSCQBYsGABgoODda7byckJL774ImbPno3U1FTk5OTolSk9PR1BQUHw8fEBAL1udjNm3zx27FgoFAp4e3vD3d1dunmxa9euSEpKwo0bNxAQEIBly5Y1myUpKQkqlQpKpRJKpRJTp069b982zLkeOHAgysvLpW1M8sXimFoUFhaGAwcOoLy8HM7OzpgwYQIyMjJw69YtuLm5QaFQoKSkBNXV1VIHBAAeHh4oKipqtK57L7E5OTlhy5YtyMrKwuOPP45PPvkEGo1Gr0xDhw7F8uXLsXHjRowfPx6bN2+WOjuVSoW0tDRUV1cjKSkJU6ZMafTZe4tNe3t76W+lUomamhq9lm34u2H5a9eu4fjx43j55Zfx8ssvY/ny5XjggQdQWlraYjuaXnKcPHkyMjMz8d9//yElJYV3RBPJnBz7x5Y07XPeeOMNDB8+HNu2bZNuhKuoqABwt1/r1q2btGz37t2lArapzMxMrFixAitXrsSWLVswf/58VFZW6pXpv//+a/Q97u7urX7GmH3zvdvE0dFR+uzq1avh5OSEadOmYd68ebh06VKzWZpup5b2raOjIwDcd64h+WFxTC0aPXo0/vnnH8THx2PUqFEIDQ3F2bNnkZSUhNDQUAB3O4NOnTqhpKRE+lxJSQm6d++uc701NTXw9PTEqlWrkJ6ejlu3bjWaJ9aSO3fuYMSIEfj+++8RFxeHxMREacR1yJAh8PT0xK+//opLly7B39+/Ha3XT8+ePREWFoa4uDjExcVh
27Zt2LlzZ6OToT68vb0REhKClJQUHD16FGPGjDFRYiIyBjn2j/oqLi5GYWGhNKrZtGDr2bNnoyKytLRU51N3Tp8+DT8/P/Tp0wfA3fnK+vL29m60bVobVGiL9vTN1dXVWLp0KQ4cOIDg4GAsWrRI53fcm7m1fUvWgcUxtcjJyQkjRozA+vXrMW7cOHTr1g2DBg1CTEwMRo0aBQCws7ODSqWSLiVVVlZiz549mD59us71FhUVSTeSuLm54eGHH0ZdXR2Au6MCDaMXixcvvq+j/fXXX7F9+3YAQN++fdG9e3dotVrp/alTpyIqKspsxeW0adOk0SIAyM/PxxtvvCG97+zsjMrKSiQlJSEtLa3FdalUKsTGxsLPz0+6nElE8iTH/lFf7u7u6NKlC9RqNQDg8OHDjd6fNm0akpKSpO9dvXo1zp071yhDeXk53n//ffj6+qKgoEAqEo8cOaJ3jieeeALZ2dnSVIhffvnFoPY0p7W+uSVvvfUWKioqYG9vj2HDhknbobnvSE5ORl1dHbRaLZKTk1vct2QdlJGRkZGWDkHydufOHZSWluL5558HANy4cQNVVVV49tlnpWVCQkJw5MgRbNq0CfHx8XjyySfx4osvQqFQYO7cubh8+TLUajX69OkDHx8f2NvbIysrC5s2bUJCQgKuX7+OiIgIuLm5wcvLCwkJCUhMTET//v0RFhbWKI+joyO2bduGHTt2YOvWrfD398eCBQukYrJ3797YuHEjvvjiCzg7OwMAYmNjkZiYiAsXLqBXr16Ii4vDyZMn8eeffyIkJARLlixBUVERTp06hcDAQHz88ccoLCzEtWvXoNVqERMTg/z8fHTu3BkZGRnYt28fcnNz4e/vjxEjRsDd3R2ff/65NOr76aefSpfaNBoNNmzYgLy8PMybNw+LFy++b3s08PX1xXfffYePPvoIXl5eptytRGQEcusft2/fji1btiA/Px+HDh3CtGnTmv0OOzs79O/fH1999RUOHToEIQROnjwJtVqNSZMmISQkBPn5+Vi3bh3i4+Px0EMPSU9Z6NKlCzZt2oTU1FSoVCo8+eSTKCgowFdffYWsrCx07twZJ06cwM2bN3Hq1CmkpqbiwoULcHZ2RmBgYKO8np6e8Pb2xvLly5Geno5+/frhyJEjUKvVGD16NN577z0UFRUhKysLgwcPNlrfvHLlSmRkZCA3NxeBgYHYsGGDdE4IDg6GQqHA2rVrkZSUhEOHDuGTTz6Bj48PXn31Vfz7779Qq9UIDQ3FmDFj8PfffyM6Ohq7du1CYGAgwsPDoVQqsXTpUqjVavz5558YO3YsIiIikJ+fL21jJycn0/7jJIMpRGt3DRFZmdu3b+Ojjz5CdHS0paO0mVarxdy5c/H9999bOgoREZFN4nOOqcPIyMjA4MGDkZaWZnU3s6nVanTp0gUFBQWca0xERGRBLI6pwygoKEBUVBT69etndaPG169fxwcffIAePXpg/fr1lo5DRERkszitgoiIiIioHp9WQURERERUzyTTKq5fv2OK1RpVt27OKC0tt3QMk7OVdgK201ZbaSfQ8drq5eVmsnWbo9+V8/5gNsMwm2GYzTCWyGZIv2uzI8f29rbxDFlbaSdgO221lXYCttVWayDn/cFshmE2wzCbYeSc7V68IU+GUvP3Nvv60/3/Z+YkRETywb6RiMyBxTEREZkUi1oisiYsjo2suZMATwBERERE1oHFsQXpGk0hIrIFHFEmIjmy2RvyiIiIiIiaYnFMRERERFSPxTERERERUT3OObYinJ9HREREZFosjs2AN94REZlOav5eOBc5orysqtHrHDggIkNwWgURERERUT0Wx0RERERE9TitwkCcKkFEZBrsX4nIkjhyTERERERUjyPHHQCfYkFERERkHCyOW8HLe0RE1okDB0RkCBbHHZiuxxvpwhMGEbUHBxOIqCNgcdyBnSsohYODPWpqahu9HtC3m4USEREREckbb8gjIiIiIqrH4piIiIiIqB6nVZAk8XB+s6+rxvU3cxIiIiIiy2BxbEXOFZQ2+zrnEBMREREZB4tjIiKSlbYOBPDmYyIyJhbHJPm77oSOdzitgoiIiGwDi2MiIrIpbf1xEP6YCJFt4dMqiIiIiIjqceRYhnTNt7PU+jlqQkRERLaCxXE9/uyp6fFRcUQdg7X0l229sc+U7WL/R2Q9WBybgakfwWbqkWYiIjlgX0dE5sDimIiImtV0JNW5yBHlZVUWSmM8xiqynzbCoC9HlInkh8WxBXEUhIjIeukqbJtycXE0cRIiMiYWxwbir9Xppu8Jg4hsA/tLIrImLI6pVbpObDWF8iqC/y/9HMqaXPLlpUkiIiJqiw5bHPPxY0RE5tXWqWLWPrWsuV8V9VcG671sS8sTkeV02OJYl4ai2VQ3llh7Z98WxurseUMKkXWxpX6urZrrFx0qrOdUy/6YqAMUx9byvE1boqto1sVYxXRbbnrhCYCodU2LYAcHe9TU1FooTcekq79cfbD5198Pm9Xs67r6NIfefzX7ek3hgwDu9ptNp6MR2TrZFcdyK1o4QkKGMNZNiSzWyRx4E631aOu+0nUO81caI43pya0mINsgu+JY96ijcQ6Eho6i6QgI75qWH0ucsNkRky1q69Uesj2rD25v9vW2XvnT1Zc21/fyEXhkKbIrjnUx9fQJjhDLjzHmNOvq0HXRtW5rGVkzVnHf1vY2t35r/4+GteenjslY93o0rMehwh41dYZPlWlrHt3n8gfb9L3NHZ+6ppBYy4347HPksw0UQghh1m8kIiIiIpIpO0sHICIiIiKSCxbHRERERET1WBwTEREREdVjcUxEREREVI/FMRERERFRPRbHRERERET1WBwTEREREdWzmh8BMcSxY8ewd+9eeHp6QqFQIDw8vNH7MTExuHHjBh544AHk5OTgrbfegr+/v4XStk9rbd29ezf279+PgIAAnDlzBiqVChMmTLBQWsO11s4GycnJWLJkCf744w+4uLiYOaVxtNbW+Ph4/PTTT3B0vPsrUjNmzIBKpbJE1HZprZ1CCMTFxQEACgsLcfv2bURFRVkiaofV2j6oqqrCl19+ie7du+PSpUuYP38+/Pz8AADZ2dk4evQo7OzskJmZiaioKPTs2VMW2aKiomBvbw+tVovKykp8/PHHsLMz3piQPv3R7t27sWbNGkRERGD8+PHS60lJScjNzYWdnR369u2L559/3mi52pPt9OnT+OGHH/DII4/g4sWLGDJkCGbOnCmLbA2Ki4uhUqmwYMECzJ49WzbZLH0stJTN0sdCS/WWqY8Fg4gOqry8XEycOFFUVVUJIYQIDw8Xx44da7TM2rVrhVarFUIIkZqaKhYsWGD2nMagT1t37dolCgsLhRBC5OTkiEmTJpk9Z3vp004hhPjrr7/EmjVrxIABA4RGozF3TKPQd59evnzZEvGMRp92JiQkiISEBOnv3Nxcs2bs6PTZBxs2bBAxMTFCCCHOnTsnXnjhBSGEEHfu3BHh4eHScgUFBaKsrEwW2bKzs8XkyZOl5SZPnixOnjxp1mwFBQX
i999/F7Nnzxa//fab9PrVq1fFlClTpPPP9OnTxcWLF2WRbd++fUKtVgshhKiurhbDhw8XxcXFssgmhBB1dXUiIiJCLFy4UMTFxRktV3uzyeFY0JVNDseCrnrL1MeCoTrstIrs7Gz06tULnTp1AgAMGzYMBw8ebLTMO++8A4VCAQDQarVwdnY2d0yj0Ket06dPR69evQAA//zzj1WOkOvTzoqKCmzatAlvvvmmBRIajz5tBYCtW7di8+bNiI6Oxs2bN82csv30aWdKSgpu3ryJH3/8EWvWrLHaKwFypc8+OHjwIIYOHQoAGDhwIM6dOweNRoOMjAw4OzsjNjYW0dHRyMnJMWo/2p5s7u7uKC8vR21tLWpra6FQKNCnTx+zZvPx8cHIkSPv++zhw4cRGBgonX+GDh2KQ4cOySLb448/jiFDhkh/K5VKODg4yCIbAGzcuBHPPfccunbtarRMxsgmh2NBVzY5HAu66i1THwuG6rDTKoqLixudRF1dXVFcXNzsstXV1UhISMCyZcvMFc+o9G1rZWUl1q1bh6ysLKxatcqcEY1Cn3auXbsWixYtkg5Sa6VPW4ODgxEWFgYPDw9kZGTg7bffxg8//GDuqO2iTzuvXLkCjUaD8PBwXLx4Ea+//jp2794NpVJp7rgdkj77QNcyhYWFUKvV+Oyzz6BUKvHKK6/A3d1dZ2Fjzmy+vr6YOXMm3n77bdjZ2WH06NHw8PAwSi59s+lSUlLS6LMuLi56f9bU2e61detWLFy4EG5ubrLIdvz4cTg5OeHRRx/Ftm3bjJbJGNnkcCzoIqdjoWm9ZepjwVAdduTY09MTZWVl0t8ajQaenp73LVddXY3IyEi8++676Nu3rzkjGo2+bXVycsKSJUuwatUqvPLKK6ipqTFnzHZrrZ1Xoy34GgAAA5hJREFUr17F7du3sWfPHsTExAAAYmNjcebMGbNnbS999qmPj4/UwY0cORInTpxAXV2dWXO2lz7tdHV1xaOPPgoA8PPzg0ajwdWrV82asyPTZx/oWsbV1RWPPPIIHBwcYGdnh6CgIJw4cUIW2fbv34/MzEx88803WLduHf7991/8/PPPZs2mi4eHR6PPlpWV6f1ZU2drkJKSgvLycrz22mtGy9XebPv370dVVRViYmKQl5eHo0ePYteuXbLIJodjQRe5HAvN1VumPhYM1WGL46CgIFy5cgXV1dUAgD/++ANhYWG4efMmNBoNgLsjqcuWLcOcOXMwaNAgpKenWzKywfRp6+bNmyGEAAD06NEDpaWlqKqqslhmQ7TWzp49e2LFihWYP38+5s+fDwCYM2cOBg8ebMnYBtFnn65evRq1tbUAgEuXLqFPnz5WN5qqTztHjRqFy5cvA7jb6dbV1cHLy8timTsaffZBWFgYTp06BQA4f/48AgIC4OrqipCQEBQWFkrrunLlCvr16yeLbNeuXWv078TLy0taj7my6TJu3Djk5ORIffKpU6cQGhoqi2wAsGPHDhQXF2PRokU4f/48Ll68KItsERERUv8+YMAAjBkzBjNmzJBFNjkcC7rI4VjQVW+Z+lgwlDIyMjLS0iFMwcHBAf7+/oiNjUV2dja8vb0xY8YMfP3117hw4QIee+wxvPPOO8jJycHJkyeRkJCA48ePY9asWZaO3mb6tDUzMxOpqanIy8tDfHw8Zs2ahWHDhlk6epvo007g7mWa2NhYZGZmQqlUws/PD66urhZO3zb6tPXChQuIj49HXl4e9u7diw8++AA9evSwdPQ20aedgwYNQnJyMs6fP49ffvkFCxcutMo583Klzz4IDAxEWloazp49i4yMDCxduhTdunWDh4cHqqurkZaWhqysLDg6OmLOnDnS/EFLZnvwwQexf/9+5OTkICsrCzdu3EB4eLjR5s/qk00IgW+//RaZmZkoKytD586d4evrC1dXVzg7O2PXrl04duwYxowZg7FjxxolV3uz7du3D5GRkdBoNEhISEBycjIee+wxo81RbU+2Bjt37sSBAwdQUlICNzc3oxWh7ckmh2NBVzY5HAu66i1THwuGUoiGcp2IiIiIyMZ12GkVRERERERtxeKYiIiIiKgei2MiIiIionosjomIiIiI6rE4JiIiIiKqx+KYiIiIiKgei2MiIiIionr/D/0jt4KC39abAAAAAElFTkSuQmCC\n", - "text/plain": [ - "
    " - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "data": { - "image/png": "iVBORw0KGgoAAAANSUhEUgAAA4oAAAJECAYAAABQGNqIAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAgAElEQVR4nOydeViU5frHPzPDNsOwgwuCgqiguGHu21HcLTUtS3+m5hJpp1XLNFHTcE87HrOOWKZldtTUTE+dNDyZ4q6o6VHADRJxgQGGGfZhfn9A4MgyqDwqp+dzXVwXL3PP97253+d9nvd+n01hNpvNSCQSiUQikUgkEolEUozyUTsgkUgkEolEIpFIJJLHC5koSiQSiUQikUgkEonEApkoSiQSiUQikUgkEonEApkoSiQSiUQikUgkEonEApkoSiQSiUQikUgkEonEApkoSiQSiUQikUgkEonEAptH7YBEIpFIJBKJRCKRSMrn9u3b/O1vf+PChQts3bq1zOe5ubksXryY2rVrc/XqVcLCwvD393/g88oeRYlEIpFIJBKJRCJ5TDlx4gS9evXCbDaX+/n69eupW7cuL7/8Mi+++CIzZ86slvPKRFEikUgkEolEIpFIHlP69++Po6NjhZ//8ssvhISEABAYGMiFCxcwGAwPfF6ZKEokEolEIpFIJBJJDSU1NdUikdRqtaSmpj6wrpyjKKmQ/JTL1a75a/CMatf8g+7nFgrTFsG61rOFaQ8N/l2IrmZAkBBdhX+AEF2AeW+cEqK79Po+Ibr6JU8J0QXoszReiO6edxoL0RVJ0LyDQnRnaVoJ0RVJN41OiO4EY44Q3QY2LkJ0159YJkQXxLV9oto907X/CtEFmPv0RiG6Y+3Sheh6+huF6IK4NvXrD8X4/B+VuFhsTNguTLs6EfFsDGDr2fCBvu/h4YHRWHp9DAYDHh4eD+qW7FGUSCQSiUQikUgkkppEenp6yfDSHj16EBMTA0BsbCxBQUFotdoHPofsUZQ8ECmpOv4e+SWxFy+z6fO/37eOW/cW1BrYnryUDDDDlWXfWnxea0gnvPq3JfNsAs6tA7ixZR8pu08+Mn9Fant3DcZ/QDuyU/VgNnPyI8u3bK1eeQq1lwvZtzPwbOHP8Q+/JeNSslVd25AnsOvSHXN6Gmazmeyv11t8bt+nPw5PDoa8PAByfvqB3KjdVnWVvkGoGoVAdiZmMxQc2WXxuV3v0ShcvUrtPX3I2Tgfs976kIjD8UlEnb2Ku6MahQIm9Wlj8XmSLpPlu44S7OtJ7PVUBrQOoEdwA6u6AV2a07x/OwzFMY5asa2MTYsnO9Bv2gh2zf2SC3tjrGoCuLm5smD+DK5cSaRRI3/CZy3i1q0UCxsvLw8+X/MR0QePUsvLE1s7W954M7zCCep/ICrOTq5OTJ7xEtcTk/Hxr8fqRZ+TlpJWxq6enzevzpqEyWQiPGyutVAILReitF1cnZkx5y0Sr17DL6A+Sz74Oym3Lb/TMiSYCZNGc+638wQ08uPUybN882XZFejuRNQ9LVJb07k1Tn06Y9JlYDabSf24/F4g50E98F42jdjWwzBnWe9BFFXeysPRRcvI6aO5mXiTOv512bRkA/qUjPvSupOa1u5Vp8+Hz8QRdfQ33F20KIBJw/tZfJ50S8enW34iwKc2l67dZPSTfyHQz9uqrqg6WVQ5hprXpoqsh+6meZeWtBvQEX1KUdy3rdh8XzqPJYWmR3Lao0ePsmPHDm7fvs0nn3zC+PHjiYyMxNXVlbCwMMaMGcPixYv55JNPSExMZP78+dVyXpko1kCOHz/O/PnzmT59Oh06dGDFihU0b96cXr16PXRfTp45R2i3jlyIv/+ueKXajqAlEzncfSrmvAJafD4Ft27NSdt/tsRG5WDHxYiN5Calom3uR4s1b95Xg1kd/orUVjnY0XXReL4NfZfCvAJ6R76Od5dgrkefK7Gx0ThweO7XADQc1IEO4SPZPW555cL29mhfn0Ja2IuQn4/TrHnYtm5D/inLGGYunEfhzRtVd9jGFrteo8j5ai6YCrB78mWUvkEU/n6hxMSUeB7Tz18VHdg5YNf3xSolA9l5BczfFs3Wqc9gZ6Ni6pdRHIm/TofGpQ8d6345Q2u/2ozu3pwLSSm8s+E/VhNFWwc7hs4fz0d9p2HKK2DUp28S0DmYSwdLY+zm44VRl0lG8r2N74/4YDpRew/w7bc7eerJPixZPJsXx71uYWNjY8OO7//N52uLHlhOHN9Dp45PcPDQ8YqFBcZ50vQJHD9wgr0799GlTydenT2JD14vO5wtOKQph/Yeof1f2loPhEB/RWq/O+sNDuw7zK7vfqJ3v78QPm8qb05+z8Kmdm0v1q7ewOmTZ7GxsSEmbh//3hVFmq78YW/C7mmB2goHe+rMfZUrAydhzi+g3sqZaDq1IuvQaQs7uwBf7BrVt+rnnQgpbxXw/LQX+O3AaY786yBterVl1MwX+fStFfet9wc1rd2rLp+zc/OI+Oxbti2bhp2tDVOWrePIb3F0aNGkxGbp+u8Y9Jd29GrfgvjEZN5b+TVblr5dqa6oOllkOa5pbarIeuhu7BzsGL9gEtP6vE5BXgFv/mMawV1acC76t3vWeiwxFz6S07Zv35727dtb/G3atGklvzs4ODBnzpxqP68celoDadu2LYGBgSXHr7/++iNJEgH69uyGRqN5IA2Xtk3IuXYbc14BAOlHY/HsHWJhk7xpH7lJRRWhxr8Oxrhrj8xfkdq1n2iM4VoKhcWxuHksnvq9WlvYnPiw9K2zQqkk35hrVde2aTCmmzchPx+A/HNnsWvfqYydw+ChqJ99HvWosSicnKzqKusGYNbrwFTkb+H1S6j8W1jYmOJKEyCb4C4UnIu2qgtwJuEWdd202NmoAGjtV4v9FxItbNy1atKK50HpjDk087E+Hr9+m8akJaVgKo5xwvE4gkIty1vatdtcPnTvc3QGDujF4cMnAIg+eIyBA0LL2CQn3yxJEh0dNWgdNSQkJlWqKzLOnXp15OyJov/1zLGzdA7tUK7d7u1RFOQXVElTpL8itUP7dufEsaI5rceOxBDat3sZmz3//oXTJ0sf5gsKCiqNi6h7WqS2OiSI/Ou3MBf/X1kn/4u2h+UDisLBHveJz5JSQQ9NRYgobxUREvoE8SdjAYg9foGQ0CceSO8Palq7B9Xj85m4q9T1csPOtqiPoXWgP7/GnLewSbiRQl1PVwDq1XInLjGZNH3lKy+KqpNFluOa1qaKrIfupvETgaQk3aag+Fxxxy8QEnr/L3wkj5Y/fY/i1q1bWb58OePGjSM2Npa0tDSGDRvGgQMHSEhIYPXq1Wi1WuLj41mzZg1NmjTh8uXLTJ48GV9fXzZu3MjFixfx8PDg+vXrzJ07F6PRyJQpU1CpVAQGBnLq1CkGDRrEc889Z3HuvXv3s
nDhQnr27ElhYSF79uzhhx9+4K233qJt27ZcuXKFQYMG0blzZwAiIiLIz8/H19eXGzeK3lBdv36diIgImjZtSlhYWMnbhEWLFrFp0yZWr17N3r170el0LFq0iICAAH7//Xeefvpp2rZ9PG5cO09nTIbSoR4mQza2nmUXK1A62OL/9nDcujTj3OSVD9PFh4ba05l8Q3bJcZ4hGw9P53JtlbYqGg/vRvTMdVZ1Fa5umLOzSo7NWUYUrpYLkOSfOUXe0UOYMzKwbdcBp5lz0U+fUrmuxglzfum1M+dlo1RX9GZWgapBMAUxUVb9BdAZstHY25YcO9rboTNYvjUd3b05U778mQ93Hubs7ymE3dXwlYfW05ncOxbZyDFk4e3hVyWfrFGrlgeZmUUPRXp9Ju7ubqhUKkymskNVnntuMJPCxvDhsk9JSqp8eI/IOLt5uJJlKCobWZlGnN2cUamUmEz3/9ZUpL8itT083TFmFsXCkGnE1c2lwusHMPalkXz80Wcl17w8RN3TIrVV7q4UGkt1Cw1ZqNwt62Svt8aQ+sk3cI/JnIjyVhHOHi7kFP8f2YYstK5OKFVKCgWc616pie2eTm/A0cG+5Firtud8hmXZDwn050x8As0a+nL2YtGiasbsXNycK54rJapOFlmOa1qbKrIeuhtnDxdy7jhXliELP48HW6jlsaLw0dcfD5M/faL4zDPP8N133xEcHMzEiRN55ZVXMBqNLFiwgIiICKKjo+nXrx/h4eG8++67tGnThiNHjrBo0SJWrVpFnTp1GDFiBEqlkoiICA4cOECPHj0ICwtj+fLlTJ06FZ1Ox9ixY8skiqGhoezevZsGDRowatQohgwZglKp5MUXX6Rz586kp6czYcIEOnfuzC+//MLVq1f57LPPAIiKKqoYvL296d27N0lJSdjb2zN06FC2by8ad/7888+zevVqAE6ePElGRgajR48mNzeX9HQxq4PdD3kpelRah5JjlVZNfjnzSApz8rkUsRG1X23abJvNwfavYy54NGPFRZGdosdWqy45ttOqyUnRl7FT2qrounAcxxdvJjPhllVdc3oaCnXp22SFxhHzXWXgzuEx+adicJ67AJTKSitFc1YmCtvSa6ewU2POzizXVhXQCtOVM1Z9/QN3rZqs3PySY2NuHu53lBOA2Zt/ZWi7QAaEBKAzZDN4ybf8a/pzuGjs75YrwZCix96xVMdBq8GYWjbGVeWliS/w9JD+GIxZ3LqVipOTlowMPc7OTuh0aRUmGZs3f8+WLTv5efdmrl27zo//3lvhOao7zkNeeIru/buSnZVNWmo6Gq0Gg96IxskRfZr+gR/aRZaL6tYeNXY4/Z4KJcuYRWqKDkcnDXp9JlonR9LTMiq8fkOeGYhGo2blsshK9UXd0yK1Tbp0lI6lukqtBpOutE62qeOJykWL04BuJX9zHzcU477j5Jwtu7Ku6PJ2J6H/15d2/TqQk5WDPjUDB0c1Wfos1FoNhvTMxyJJhJrZ7rk7azHmlPYyGbJzcXexTADfHjOYL3ft46t/7cPZUY2rk4baHpWvUlvddfIfVHc5vpOa1qaKrIfuRp+agcMd59JoNehTH3xusOTRIIeeFuPr6wuAs7Mz9esXvb1xcXEpWWo2NjaW6OhoIiMjOXLkSMkQDrVazdKlS4mMjOTixYvodKXLjPv5+QHg7u5usWTt3QQEFG0N0KJFC8xmM0eOHGHVqlVs3ryZtLSiSf7x8fElenf6W1V69OhBu3btmDBhAuHh4djYPD7vCDKOx+Hg44XCrsgn1/aBpPwcg42rI6riyqb+5NItA3KTddi6O6N0sHsk/ork5ol4tD6eKItjUbtdYxKjTmHv6lhSyascbOm6aAK/Rf5Iym9X8RvYzqpu/vlzqGrXBtuiHjrb4ObkHT2EwskJRXFZ1ox7CZRFwzxV9XwovHHD6puzwuRLKJzdQVXkr9I7ANOV38BeA3aWSZ2qWScK/nuoyrFo2aAWyWkG8oofik5dvUW3oPpkZOViyClaHOBGuhFP5yL/ndX2KBVQaGVRmMST8bjV80RVHOMGbZtwYW8MahdH7O9o3KrKms828OSgF3h+RBg//BhFx45Fw9u6dG7HDz8WJX8KhQJf36K5ld27daRd26KeT7PZTEJiEv7+lc+Pqe4479iwi6kvTCc8bC6Hog7T/IlmALRs15yDe4+U+Fzbu9a9hEKYvyK1v16/hTHDJzPpxans3f0rT7QrujbtOoSwd/evQFEsvOvVKfnOiNHD8PRyZ+WySAKbNsY/oOJ5saLuaZHa2TEXsPWuhaJ4iKGmTTMMvxxF6aJF6aim4EYKydM/Qhe5BV3kFgB0X2yv8OFadHm7k70bd7N47AesmLyUmL0naNymaJpGYNsgYvaeeGD96qImtnstm/iRfDuNvOLet1OxV+ge0pQMQxaG4gVgbun0jB3Ug9FP/oVWTfzo1DIQWyvPG9VdJ/9BdZfjO6lpbarIeuhu4k/E4lnPC5viczVpG0TM3krm4NcwzOZCIT+PK49PtvCYExQURJ8+fQgKCiIvL489e/YARfMDd+zYgbe3d8kStX+gUCiqpH2n3ZYtW7h16xYLFy4kPz+ff/7znwA0atSIw4cPl9j9/nv5++Q5OjqW+HH9+vWSv8fFxTFo0CAmTpzI119/zfr16wkPD6+Sf5VxLOYMO3+KIiVVx+p13zB25DAc7CvuzSmPwuw8Yqd9RuD8ceSl6jH8N5G0/WdpNGsU+ekGElbuQGlvS+CiCeQkpeDYuB5xs9ZhumNow8P0V6S2KSeP6Blf0HneGHJS9ejO/8716HO0nzmC3HQjp1ftpOfKV3AP9MGp/osA2KrtufrDscqFc3MxrPwIx8mvY85Ip+DyJfJPnUQzYRLmTD3ZmzdSmKZD+/oUTDeSsfFrSOaSKqyYVZBP3t6N2PZ4HrIMFKYkUfj7BWy7DsOcY6Tg+E8AKLx8MKfdgvyqz3lQ29nw3tAuLN5xCDdHBxrXdaNDY28++tdRXDT2jO/ZincGdeDrA+c4nXCTJF0mr/Vvi5ujQ6W6+Tl5fBe+lkHvj8WYqufGhUQuHTxH/+kjyc4wsO/TnQD0fPVpXOt50vKpjpgKTMT/av3NbfisRSxc8B5NGjekYcMGTHt3HgAtWzZj3RcrCGnTm5ycXKZOncypU2dxcnJEoVCwbv2myoUFxvkfiz7nlffC8G3oS70G3nw87x8ANGrWkFkrZjCm90QAuvbtTJfenagf4Mv/TX6ejZ9W4rNAf0VqL/5gBe+9/xb+AQ1o4O9LxOyiffWaBjfhb/9YSN+uw+gzoCfhH7zNuTMX6DswFDd3V2a/u4ArlxLK1RR2TwvUNufkcmPOKmrNmoRJpycn9gpZh07j9c54TBmZJQ/VKjdnXEcMBMBj4rOkb/qRgpuVL6ohpLxVwKYlGxg5Ywx1G3pTu34dvp6/7p41yqOmtXvV5bPa3o6ZE59h0RfbcXfW0qR+XTq0aMJHG3birNUw4elenI67woGYCzRr6EOGMYsZ44dZ1RVVJ4ssxzWtTRVZD91NXk4ea2euZuz7
E9Hr9CSev/q/s5DNnxCF2dqa7P/jREdHM2vWLIYOHUpoaCjh4eE0bdqUl156iTlz5uDi4sKcOXPIyMhg7dq1+Pj4kJyczODBg2nbti3Lli0jPj6eNm3asH//flxdXZkzZw7Lly/n/PnzzJs3j/j4eBYuXEhERAT9+pUuJX3mzBnmzJlD06ZNefnll2nQoAGXLl1i1qxZtGrVCldXVz777DMiIiLo27cv8+bNIy8vj7p163LgwAEaNmzIa6+9xoIFC8jIyGD27Nn4+/vz2muvERwcTL169Zg/fz5z5syhTp06bNmyhYCAABISEnj++edp2bJlpbERsamoqE2HQdzGw6JY13q2MO2hweW/SHhQRG0OrPAPEKILMO+NU0J0l17fJ0RXv+Qp60b3SZ+l1t+U3w973mls3egxI2jeQSG6szSthOiKpJtGZ93oPphgrNo2A/dKA5vKhzLeL+tPLBOiC+LaPlHtnunavS/mVVXmPn1vC8dUlbF2YqbUePqL22ReVJv69YdifP6PSlwsNiZst270GJB3TUzSa+fTwrrRI+BPnyhKKkYmimKRiWIpMlEsRSaKDweZKJYiE8UiZKJYikwUS5GJYikyUYS8309bN7oP7Hwfz7ZDzlGUSCQSiUQikUgkEokFco6iRCKRSCQSiUQikVij8H9rtX1ryB5FiUQikUgkEolEIpFYIHsUJRUiYk6FyHmEIuc/iuA/DnnixM/d2/YpVaXhKVH7bwpctl6tEiLbwStQiO6hheL2OP3cpfIVYe8XkT6LoqtTIzHCgl42X7YRuHx6lrsQ2QY2YuYzNVDc/5YJlXGp86tCdAG6/PSKEF1RPidlOAnRBYTVyaJ8TjolMBaC2tTLgmLcADH3Xo3iMd7KQgQyUZRIJBKJRCKRSCQSa1jZC/N/DTn0VCKRSCQSiUQikUgkFsgeRUmVcevegloD25OXkgFmuLLsW4vPaw3phFf/tmSeTcC5dQA3tuwjZffJ+zpXSqqOv0d+SezFy2z6/O+Pnc8PMxYAji5aRk4fzc3Em9Txr8umJRvQp2Tck4Z312D8B7QjO1UPZjMnP7JcirrVK0+h9nIh+3YGni38Of7ht2RcSraqKzIWorQDujSnef92GIpjEbViWxmbFk92oN+0Eeya+yUX9sZUyV8nVycmz3iJ64nJ+PjXY/Wiz0lLSStjV8/Pm1dnTcJkMhEeNrdK2qJioencGqc+nTHpMjCbzaR+XP7S9c6DeuC9bBqxrYdhzrK+7UFNLBfl0bxLS9oN6Ig+pSg+21ZsvmcNUfceiCvLIn2+m+qIMYiLhah75PCZOKKO/oa7ixYFMGl4P4vPk27p+HTLTwT41ObStZuMfvIvBPp5PzJ/oebVyTWxHqppMRat/bhh/pMNPZU9ig+Rbdu2odfr7+u7er2ebdvK3ngPC6XajqAlE4mbvZ4rH36Ltll93Lo1t7BROdhxMWIjiau+5+qK7TSeO+a+z3fyzDlCu3XkQXb5FOXzw44FwPPTXuC3A6fZ+ek2Tvx0hFEzX7yn76sc7Oi6aDyH5m7g5PJtuDf1xbtLsIWNjcaBw3O/5vQnu7jyw1E6hI+0qisyFqK0bR3sGDp/PLs++Iqov22lTlB9AjpbxsLNxwujLpOM5NQq+foHk6ZP4PiBE2xY9Q37f4rm1dmTyrULDmnKob1HqqwrKhYKB3vqzH2VWwsiSVn5NQ6B/mg6ld3LyS7AF7tG9R+5v6K178bOwY7xCybx1by1bP3bJuo39SO4y71tiizq3gNxZVmkz3dTHTEGcbEQdY9k5+YR8dm3vDN2CJOH9yMuMZkjv8VZ2Cxd/x092zVn3JBQxg7qQfgq6/sPivIXal6dXBProZoWY9HakkePTBQfItu3b3+gRHH79ke3GalL2ybkXLuNOa8AgPSjsXj2DrGwSd60j9ykokpA418HY9y1+z5f357d0Gg09+8w4nx+2LEACAl9gviTsQDEHr9ASOgT9/T92k80xnAthcJin28ei6d+r9YWNic+LH1rqVAqyTfmWtUVGQtR2vXbNCYtKQVTsW7C8TiCQi11067d5vKhe99wulOvjpw9UfS9M8fO0jm0Q7l2u7dHUZBfUGVdUbFQhwSRf/0W5mJfsk7+F22P9hY2Cgd73Cc+S0oFvRIP01/R2nfT+IlAUpJuU1B8rrjjFwgJbXtPGqLuPRBXlkX6fDfVEWMQFwtR98iZuKvU9XLDzrZoYFfrQH9+jTlvYZNwI4W6nq4A1KvlTlxiMml6wyPxF2penVwT66GaFmPR2o8lhYVifh5T/lRDT7du3cry5csZN24csbGxpKWlMWzYMA4cOEBCQgKrV69Gq9USHx/PmjVraNKkCZcvX2by5Mn4+vqyceNGLl68iIeHB9evX2fu3LkYjUamTJmCSqUiMDCQU6dOMWjQIJ577jmLcx84cICkpCTWr19Pw4YNGTlyJBs3buTKlSu4ubmRmZnJtGnT2Lp1K/Pnz2fevHnY2Njw1VdfMX36dH7++WeSkpJYuXIl3bp14+eff+b06dN89dVX/PDDD8yePZvjx48TExPD+++/T3BwMM7OzuzYsYOffvqJXbt2lTmXQqGocuzsPJ0xGUqHppgM2dh6upSxUzrY4v/2cNy6NOPc5JX3f7GqAVE+P4pYOHu4kGPMBiDbkIXW1QmlSkmhqWqVi9rTmXxDdslxniEbD0/ncm2VtioaD+9G9Mx1VnVFxkKUttbTmVxjqW6OIQtvD78q+WQNNw9XsgxZAGRlGnF2c0alUmKq4nWqCFGxULm7UmgsLReFhixU7pa6Xm+NIfWTb+AeEtuaWC7Kw9nDhZw77pssQxZ+Hg3vSUPUvQfiyrJIn++mOmIM4mIh6h7R6Q04OtiXHGvV9pzPsEwCQwL9OROfQLOGvpy9+DsAxuxc3Jy1D91fqHl1ck2sh2pajEVrP5b8yYae/qkSxWeeeYbvvvuO4OBgJk6cyCuvvILRaGTBggVEREQQHR1Nv379CA8P591336VNmzYcOXKERYsWsWrVKurUqcOIESNQKpVERERw4MABevToQVhYGMuXL2fq1KnodDrGjh1bJlHs2rUr9erVY+zYsfj4+HDp0qWSJE+hUDB9+nSioqJ49tlnsbGx4fvvvycgIICVK1fi4eGBu7s7MTExvPbaawB4enpy+vRpAAYOHMiHH34IQEhICL179yY7O5tp06bx9NNPk5ycXO65evfuXeXY5aXoUWlLl9ZXadXklzNHrjAnn0sRG1H71abNttkcbP865oJHszmpKJ8fVixC/68v7fp1ICcrB31qBg6OarL0Wai1GgzpmVVOEgGyU/TYakuXtbbTqslJKdu7rbRV0XXhOI4v3kxmwi2ruiJjIUrbkKLH3rFU10GrwZh6fz39AENeeIru/buSnZVNWmo6Gq0Gg96IxskRfZr+gZNEEBcLky4dpWNpuVBqNZh0pbo2dTxRuWhxGtCt5G/u44Zi3HecnLPxD91f0dp3o0/NwOGO+0aj1aBPvbe5waLuPaj+svwwfL6b6ogxiIuFqHvE3VmLMae0F9aQnYu
7i2UC+PaYwXy5ax9f/Wsfzo5qXJ001PYomzQ8DH+h5tTJov0VqV3TYixaW/Lo+VMOPfX1LdpjztnZmfr1i8bou7i4YDQW7fkUGxtLdHQ0kZGRHDlypGQIpFqtZunSpURGRnLx4kV0Ol2Jpp+fHwDu7u4lOpURFxeHUqlkzZo1REZGYmNjg8FQ9Dbx6aefxmQyoVAo8PDwuK//MSAgAICgoCAuX75c4bmqSsbxOBx8vFDYFb1bcG0fSMrPMdi4OqIqbuTrT36qxD43WYetuzNKB7v78r86EOXzw4rF3o27WTz2A1ZMXkrM3hM0blO0b19g2yBi9t7bvoM3T8Sj9fFEWexz7XaNSYw6hb2rY8kDocrBlq6LJvBb5I+k/HYVv4HtrOqKjIUo7cST8bjV80RVrNugbRMu7I1B7eKIvfbe94jasWEXU1+YTnjYXA5FHab5E80AaNmuOQeL5yEqFApqe9e6Z+0/EBWL7JgL2HrXQlE8/E3TphmGX46idNGidFRTcCOF5OkfoYvcgi5yCwC6L7ZbfaCsieWiPOJPxOJZz1d6y0oAACAASURBVAub4nM1aRtEzN7j96Qh6t6D6i/LD8Pnu6mOGIO4WIi6R1o28SP5dhp5xb16p2Kv0D2kKRmGLAzFC8vc0ukZO6gHo5/8C62a+NGpZSC2NpW/3xflL9ScOlm0vyK1a1qMRWs/lhSaxPw8pvypehSrSlBQEH369CEoKIi8vDz27NkDwOuvv86OHTvw9vYuk2hVZRinUqnEbDYTGxtLo0aNsLe3JywsDIBz585hU9wAxMTE0LNnT7Zv386pU6do3bo1KpUKc/HKLufPn6dOnTolPuTm5lokrXf706RJkwrPVVUKs/OInfYZgfPHkZeqx/DfRNL2n6XRrFHkpxtIWLkDpb0tgYsmkJOUgmPjesTNWofpjiFF98KxmDPs/CmKlFQdq9d9w9iRw3Cwt7f+xYfg88OOBcCmJRsYOWMMdRt6U7t+Hb6ev+6evm/KySN6xhd0njeGnFQ9uvO/cz36HO1njiA33cjpVTvpufIV3AN9cKr/IgC2anuu/nCsUl2RsRClnZ+Tx3fhaxn0/liMqXpuXEjk0sFz9J8+kuwMA/s+3QlAz1efxrWeJy2f6oipwET8r2es+vyPRZ/zynth+Db0pV4Dbz6e9w8AGjVryKwVMxjTeyIAXft2pkvvTtQP8OX/Jj/Pxk83PZJYmHNyuTFnFbVmTcKk05MTe4WsQ6fxemc8pozMkgdJlZszriMGAuAx8VnSN/1Iwc2KFyWoieWiPPJy8lg7czVj35+IXqcn8fxVzkX/dk8aou49EFeWRfp8N9URY5GxEHWPqO3tmDnxGRZ9sR13Zy1N6telQ4smfLRhJ85aDROe7sXpuCsciLlAs4Y+ZBizmDF+mNU4iPIXal6dXBProZoWY9HakkePwmx+kHUlaxbR0dHMmjWLoUOHEhoaSnh4OE2bNuWll15izpw5uLi4MGfOHDIyMli7di0+Pj4kJyczePBg2rZty7Jly4iPj6dNmzbs378fV1dX5syZw/Llyzl//jzz5s0jPj6ehQsXEhERQb9+lktdf/bZZyQmJpKbm8vixYvZvHkzly5dwtHRkfT0dKZOncrBgwf59NNPWbZsGTt27OC7774jPDycHj168PLLL9OoUSMaNWrE8OHDmTFjBl5eXvj6+rJixQr++te/0rFjx5L/JSwsjBYtilaPK+9cjo6OlcYrqvbz1X4Nup9bWO2af/Br8Axh2iL43CFPmHZPU+XX9n5pmJ8vRFckP6tVQnR/zb8hRHdegZcQXYB6LplCdJMynIToikTU/Sfq3rtsI25eTMMCMYOL/qOyPrrmfmigENNLMdYuXYgugN/mV4ToXn3uEyG6Iu9pUXVy7+zHt1fmYSMqxiJZePXeFld6VOSe/48QXfumPYXoPih/qkRRcm/IRFEsMlF8OMhEsRSZKJYiE8VSZKJYhEwUS5GJYs1GJori+LMlinLoqUQikUgkEolEIpFY4zHeykIEMlGUSCQSiUQikUgkEmvI7TEkkiJEDBMVOTxU5LBWEVxuPVuY9ohhYoZQqRrWE6Kr7DFIiC7A5af+KUR36e1YIbqdljQWogvQZ+ltIbp7ZvgK0RXJuHkHhehOsG0mRBdsBelCN43OutF9sO6OvdWqFUFPLgEHPxYjjLi2r/s5MT77XRO3ObqoOrlV6yQhuvaB4obh2vTqLkT38l/FLBQjcgi85PFEJooSiUQikUgkEolEYo0/2dDTP+U+ihKJRCKRSCQSiUQiqRjZoyh5IFJSdfw98ktiL15m0+d/v28dt+4tqDWwPXkpGWCGK8u+tfi81pBOePVvS+bZBJxbB3Bjyz5Sdp98ZP6K1PbuGoz/gHZkp+rBbObkR9stPm/1ylOovVzIvp2BZwt/jn/4LRmXkq3qqpq0wqZVZ8yGDDCbyfv3N2VsbLsXDQFVetRGoXYkZ+MKq7pK3yBUjUIgOxOzGQqO7LL43K73aBSupSt5Kj19yNk4H7O+8j27AA6fiSPq6G+4u2hRAJOGW245k3RLx6dbfiLApzaXrt1k9JN/IdDP26quqBi7ubmyYP4MrlxJpFEjf8JnLeLWrRQLGy8vDz5f8xHRB49Sy8sTWztb3ngzHGsLUIuKs5OrE5NnvMT1xGR8/OuxetHnpKWklbGr5+fNq7MmYTKZCA+bay0UQsuFKG0XV2dmzHmLxKvX8Auoz5IP/k7KbcvvtAwJZsKk0Zz77TwBjfw4dfIs33y5tVJdkfWbqLKs6dwapz6dMekyMJvNpH5c/oqEzoN64L1sGrGth2HOsj7UVFR5Kw9HFy0jp4/mZuJN6vjXZdOSDehTMu5L605qWrtXnT7XtDrZNuQJ7Lp0x5yehtlsJvvr9Raf2/fpj8OTgyGvaBXknJ9+IDdqt1VdENemHo5PIursVdwd1SgUMKlPG4vPk3SZLN91lGBfT2KvpzKgdQA9ghtY1RUVY4CALs1p3r8dhmLtqBXbyti0eLID/aaNYNfcL7mwN6ZKuo8jZvOfa3Vd2aP4CFmxYgVRUVHVorVt2zb0en21aN0LJ8+cI7RbRx5kkxWl2o6gJROJm72eKx9+i7ZZfdy6NbewUTnYcTFiI4mrvufqiu00njvmkfkrUlvlYEfXReM5NHcDJ5dvw72pL95dgi1sbDQOHJ77Nac/2cWVH47SIXykdWFbexye/yu529eQ9+NGlN5+qJq0stRt1xNztpH8X3eSu/0z8n7ZYV3Xxha7XqPI/3UL+Yd3ofSsh9I3yMLElHie3G+XF/18/wmma3FVSgayc/OI+Oxb3hk7hMnD+xGXmMyR3+IsbJau/46e7ZozbkgoYwf1IHyV9eW1hcUYiPhgOlF7D7Bk6Sq+//4nliwuOw/VxsaGHd//m8VLPmbqO+/TpUt7OnV8onJhgXGeNH0Cxw+cYMOqb9j/UzSvzp5Url1wSFMO7T1iVU+0vyK13531Bgf2HeaTFZ+z+197CZ83tYxN7dperF29gciP1zPz7fm89/4U3NxdK9QUWb+JKssKB3
vqzH2VWwsiSVn5NQ6B/mg6tSpjZxfgi12j+lXy9Q+ElLcKeH7aC/x24DQ7P93GiZ+OMGrmiw+k9wc1rd2rLp9rXJ1sb4/29SkYV39M1oZ12DQMwLZ1mzJmmQvnkTHtTTKmvVnlJFFUm5qdV8D8bdG8M6gjk/u2IT45jSPx1y1s1v1yhtZ+tRnfsxXjerRk2a6jVnVFtnu2DnYMnT+eXR98RdTftlInqD4BnS213Xy8MOoyyUiuQh3/uGMuFPPzmCITxUfI66+/Tq9evapFa/v27Y8kUezbsxsajeaBNFzaNiHn2m3MeQUApB+NxbN3iIVN8qZ95CYVVTAa/zoY4649Mn9Fatd+ojGGaykUFsfi5rF46vdqbWFz4sPSt84KpZJ8Y65VXZV/EIW621BQpGu6ch6b4HYWNrZte6Bw1GLbfRB2T43BnJttVVdZNwCzXgemIt3C65dQ+bewsDHFHS/53Sa4CwXnoq3qApyJu0pdLzfsbIsGPrQO9OfXmPMWNgk3UqjrWfSQXq+WO3GJyaTpDZXqiooxwMABvTh8+AQA0QePMXBAaBmb5OSbfL626OHJ0VGD1lFDQmLlizCIjHOnXh05e6Jo4Yozx87SObRDuXa7t0dRkF9QJU2R/orUDu3bnRPHTgFw7EgMoX3LLjSx59+/cPrk2ZLjgoKCSuMisn4TVZbVIUHkX7+Fufj/yjr5X7Q92lvYKBzscZ/4LCkV9DRWhIjyVhEhoU8Qf7Jo0anY4xcICbXyQqaK1LR2D6rH55pWJ9s2DcZ08yYU7/+bf+4sdu07lbFzGDwU9bPPox41FoVT1RauEdWmnkm4RV03LXY2Rfsgtvarxf4LiRY27lo1acULRemMOTTz8bCqK7Ldq9+mMWlJKZiKtROOxxEUalmW067d5vIhcQskScRRY4eebt26leXLlzNu3DhiY2NJS0tj2LBhHDhwgISEBFavXo1WqyU+Pp41a9bQpEkTLl++zOTJk/H19WXjxo1cvHgRDw8Prl+/zty5czEajUyZMgWVSkVgYCCnTp1i0KBBPPfccxbnjoqKYv78+Tz55JPY29tz9uxZXnvtNYKDg8s9n7OzM1OmTAEgKCiI/fv3M378eHbv3k3Tpk157bXXePPNN0lKSqJz587ExMTQu3dvdDod58+fp1mzZrzxxhsAbNy4kStXruDm5kZmZibTpk0jOjqapKQk1q9fT8OGDRk5cmS5dv/5z39YuHAhPXv2pLCwkD179rBv376Hfu3uxs7TGZOhdMiSyZCNradLGTulgy3+bw/HrUszzk1e+TBdfGioPZ3JN5Q2JnmGbDw8ncu1VdqqaDy8G9Ez11nVVWhdMOdmlf4hJwuF1jLGCrdaKBw05P37nyi8vNFMnodx/uRK33QpNE6Y80uvnTkvG6W6oh4GBaoGwRTEVK0XXac34OhgX3KsVdtzPsPygSMk0J8z8Qk0a+jL2Yu/A2DMzsXNWVuhrqgYA9Sq5UFmZpGPen0m7u5uqFQqTKayQ1Wee24wk8LG8OGyT0lKqnx4j8g4u3m4kmUoKhtZmUac3ZxRqZSYTPf/hlOkvyK1PTzdMWYWxcKQacTVzaXC6wcw9qWRfPzRZyXXvDxE1m+iyrLK3ZVCY6luoSELlbulz15vjSH1k2/gHpM5EeWtIpw9XMgp/j+yDVloXZ1QqpQUCjjXvVIT272aVicrXN0wZ5e2e+YsIwpXy1Wl88+cIu/oIcwZGdi264DTzLnop0+xri2oTdUZstHYl65w7Ghvh85g2Qs3untzpnz5Mx/uPMzZ31MIuyvhKw+R7Z7W05ncO1Y4zjFk4e3hV6Xv1kj+ZIvZ1NhE8ZlnnuG7774jODiYiRMn8sorr2A0GlmwYAERERFER0fTr18/wsPDeffdd2nTpg1Hjhxh0aJFrFq1ijp16jBixAiUSiUREREcOHCAHj16EBYWxvLly5k6dSo6nY6xY8eWSRR79erFunXr6NSpE507d+b06dPMnj2brVu3Vni+sLAwli5dyjvvvMOLL75IYWEhhYWFJCUV9SS8/fbbjB49mjfeeAODwUC3bt04ePAgarWa0NBQ3njjDS5dusRXX33FDz/8gEKhYPr06URFRdG7d2/q1avH2LFj8fHxqdRu9+7dNGjQgFGjRjFkyJBHcenKkJeiR6V1KDlWadXklzOPpDAnn0sRG1H71abNttkcbP865oL/rbHi2Sl6bLXqkmM7rZqclLI9xUpbFV0XjuP44s1kJtyyqms2ZKCwv+NtsoOmaF7FneRkYbpaNIzIfPs6OGhQuHli1lWsb87KRGFbeu0UdmrM2Znl2qoCWmG6UvUlu92dtRhzSt9oGrJzcXexfNh4e8xgvty1j6/+tQ9nRzWuThpqe5R92LqT6o7xSxNf4Okh/TEYs7h1KxUnJy0ZGXqcnZ3Q6dIqTDI2b/6eLVt28vPuzVy7dp0f/723wnNUd5yHvPAU3ft3JTsrm7TUdDRaDQa9EY2TI/o0/QM/tIssF9WtPWrscPo9FUqWMYvUFB2OThr0+ky0To6kp2VUeP2GPDMQjUbNymWRleqLrN9E1RcmXTpKx1JdpVaDSVfqs00dT1QuWpwGdCv5m/u4oRj3HSfnbHwZPdHl7U5C/68v7fp1ICcrB31qBg6OarL0Wai1GgzpmY9Fkgg1s92rKXXyH5jT01CoS9s9hcYRc7rl1lGFN2+U/J5/KgbnuQtAqbSaDIhqU921arJy80uOjbl5uN9RTgBmb/6Voe0CGRASgM6QzeAl3/Kv6c/horG/W64EUTEGMKTosXcs9dFBq8GY+vBHuEnEUOOHnvr6Fu3f5ezsTP36RW+VXVxcMBqNAMTGxhIdHU1kZCRHjhwpGXqhVqtZunQpkZGRXLx4EZ2udB8pPz8/ANzd3Ut0Kjt3/fr1uXjxYqXnAwgICADAy8uL2rVrl9Hz8fFBqVTi7OyMh4cHjo6OKJVKlMqiyxQXF4dSqWTNmjVERkZiY2ODwVD2TbY1uz/8aNGiRZnvPgoyjsfh4OOFwq7ovYVr+0BSfo7BxtURVXHFVn/yUyX2uck6bN2dUTrYPRJ/RXLzRDxaH0+UxbGo3a4xiVGnsHd1LKnkVQ62dF00gd8ifyTlt6v4DWxXmSQApisXULp7gU2Rrsq/KQXnjoFGCw5FugVxp1F6FpdLBzUolZj1ZReZuJPC5EsonN1BVaSr9A7AdOU3sNeAnWXjpmrWiYL/HqpyLFo28SP5dhp5xT0Wp2Kv0D2kKRmGLAzFi2bc0ukZO6gHo5/8C62a+NGpZSC2NpW//6ruGK/5bANPDnqB50eE8cOPUXQsnm/YpXM7fvixKPlTKBT4+hYt6NC9W0fatS16A2w2m0lITMLfv/J5XtUd5x0bdjH1hemEh83lUNRhmj9RtPdfy3bNOVg8L0yhUFDbu1alOg/LX5HaX6/fwpjhk5n04lT27v6VJ9oVXZt2HULYu/tXoCgW3vXqlHxnxOhheHq5s3JZJIFNG+MfUPFCEiLrN1H1RXbMBWy9a6EoHmKoadMMwy9HUbpoUTqqKbiRQvL0j9BFb
GBggGEW5v1jEJxABtE75O35fj5+Ps7MNb/nmov7vu7nfu6NnMxrWMt/R9rhFDr26XxLGiL9bWixEFXeADQdI7BkXcVucfhXfPRXdL27ONnI1O74jX2anJs84a8OUXmoaadW5GXmYCuP54XDaUT0cc71eb9d49z+W99IRVg7Iih3NrR2D8TFQlQdEZkrGlqbKipXiNaWqH+ETD39/PPPWbFiBWPGjCE1NZW8vDyGDBnCvn37uHDhAmvWrEGn05Gens7atWsJDw/n3LlzTJo0iZCQEDZt2sSZM2do1KgRWVlZLFiwgKKiIqZNm4ZCoaB169YkJSUxYMAAhg0bVuX6mzdvJiMjA19fX5KSkli2bBkAS5cuJTg4mKysLHr06EHfvn1ZtmwZO3bsYMSIEZw6dQqtVsuiRYsA2L17N/v27SMoKIikpCRmzZoFwKJFi+jYsSNpaWm88MILNGnShNdffx2z2czSpUu5fPkycXFxzJw5Ez8/v2p/43WMRiNxcXGcP3+enj17cvXqVZRKJTExMQC8++67WK1WysrKUCqVTJkyhT179hAfH8/69eu5du0ab731Fp06dUKpVHL8+HFiYmJo164dW7ZsITMzk1WrVlVo//zzzwQFBXHq1Cneecf1NKHr6I0laN0rd3n0cFehNzo/dXu+V1umrd/N21/+wqlLOYz/3U3yzVD5e2EzVk5BsBlLUPp7V7GTq5WEvf4Mvt3bkDxplWufC4x4qN0rXus07pw2OCfTjq3DOJF+gTbNQzh15hIARSWl+Hrpbq4rMBYynTf20uLKN0zFyHTOsZD5Nkam1mL+9t/IAgLRToqlKH5SjU+kRPksKsYikfn4Yi+pjLG9uAiZTysnG8uJJMwH92M3GFBGd8Vz7gIKZk2rUVfj74XFWHmzbDaW0Mi/+l0f5UoFrZ7pSeLcdbXzWeuJ3VJZR+zmEuSamz2ZlaFoFoX12J5aaf8er0bemG74HcXGYkIbNb8lDZH+NrhYCCpvAAo/H8qKKv0rMxaj8HPOFwGvjST3vX9B+U1crXwWlId0/l6UFlX+7UzGYgIbhdbar5oQ1o4Iyp0Nrd0DcbEQVUdE5oqG1qaKyhWite9KpKmnt8/QoUNp3rw5UVFRLFu2DJVKRVFREQsXLiQyMpLERMcTm5iYGIYPH87YsWMZNGgQixcvBuCee+4hJiaGl156CY1Gw759+/D29mb8+PEYDAamT5/OO++8w4YNG6pc++zZs2zYsIFZs2YxYcIEBg0ahN1uZ82aNTRr1ozx48czZ84cYmNjMRgMvPHGG+Tm5jJ8+HDeeecdTpw4QV5eHgaDgdjYWObMmcO4ceMYOXIkdrsdpVLJSy+9xLhx4xg9ejSrV6/Gz8+PN998E4PBQGBgIPfeey+PPPIIXbt2velvvI5Op2Pw4MHIZDJeeuklFixYwPnz5/nhhx/Yu3cvJ06c4JVXXuG1114jKSmJffv28fDDDxMU5Ng2vmPHjvTt2xedTsecOXMYPXo027ZtA2DYsGEEBQXx8ssv06FDB7Zt20b79u0ZN24co0aNuqW/qZ9OQ3GppeJ1UakZP53ayebNLT8xOLo1rw94gBUjH2bGxv9iKHY9YmLOKUBxg5ZCp8GSY6hiV2aycDZuE8mTVtHpP28ic6t5RNXPS0eRqfL6xpJS/LydE+nrIweSX1jMhh0/kp2jx8dTS5NGVRtrJ12BsbAbDcjcbxgxU2sda4RuxFSM7bxjKor9Whaotch8a95YWpTPomIsEnt+HjJNZYxlWg/s+flONmVXLmM3OOJuSTqG8r72IK85XZbkFKDUaSpeq3QaTDkFVezkSgU9Fo3h8JItFF6o3REN9uJCZMrKv5dMpcFeUlitraJFe2wZJ2qlWx0FuQbUN/wOrU5LQW7V+lgTIv1tcLEQVN4AbPp85B6V/sl1Wmz6Sv/c7vFH4a3D8/Ge+I1/BgC/MYNRt21VRcvJZ0F5yJhTgLtH5d9OrdNSlFu1jtQFYe2IoNzZ0No9EBcLUXVEZK5oaG2qqFwhWlui/hE69fT6yJmXlxdNmzqe4nh7e1NUVARAamoqiYmJJCQkcODAgYopZRqNhmXLlpGQkMCZM2fQ6/UVmqGhoQD4+flV6NxIWloawcHBFa8fe+wxPD09SU1NrfBHpVLh7e3NhQsXAPD398ezfBrDdd0LFy7g7e2NSqUCoGvXrjRt2hQ3Nzd27NjB6tWr+frrr8nLy6vwq0mTJhw4cICtW7fy9NNP1/gbbxYrgGbNmpGenu7k8/X3U1KqP3/IVVwAZs+ezZEjRxgyZAh79+7FfgsLve5r1pjsPCNmq2NdZNL5q/SMaIqhuBSjybG4/HJ+Ef5ejt/npXFHLoOyWlzDcDgNdXAAMpVjgNunS2tydh/DzccDRflNWtNJT1bYl2brUfp5IVeravY5PJTsa3mYy59gJaVm0KtjJAZjMcbyRdRX9QWMGtCb55/4C+3DQ+l2X2uUbjUPtIuMhS0jBblfAJT7oAiLxJp8CLQ6UDtiYU07jty/ieMLag3I5dgL8urFZ1ExFonldDKKJk2g/BxEZVRbzAf3I/P0RFZeP7VjxoHccUOmCAqm7PJll1tiXzmSji7YH3l5OW4S3YqLe5Jw9/Go6EAq1Ep6LH6RkwnfkHPyPKH9o2uSrKAs+ywyLz9QOLTlgS2wZZwEdy2onG9OFG26Yf11fy2jUZX0I6n4BwXgVv47wjtHcOz7wy6+9cf529BiIaq8AZQcS0EZ2BhZ+TQ1bac2GH84iNxbh9xDg/VyDtmzVqJP2Io+YSsA+o+/wHQqvUZdUXno4tF0fIP8UZTHs1nncFK+P4bG2wP3GzrkdUFYOyIodza0dk9kLETVEZG5oqG1qaJyhWjtuxF7WZmQf3cr9brraUREBP369SMiIgKz2cyuXbsAmDp1Ktu3bycwMBCj0XnIXSaT1agZHh5OZmblgcc7d+4kOjqaiIgILl50zB83m80YDIaKzlV1ms2aNcNgMGA2m1GpVBw4cAB/f382b96Ml5cXkyZNIiMjgxMnKp9A/e1vf+Ojjz6iWbNm+Pv71/gbf8+lS5cq/n/+/Hl69OiBUqnk4MGDTu/36dOn2u9X9xsUCkVFZ/D06dNkZ2cTFxeHxWLh+eefp2/fvkRFRVX5XnVoVG7MGdydJdv34+uhptW9vnRtFcjKHQfx1rrzwkPteWNAVzbuS+b4hStk6gt5+bHO+HqoXWqXlZhJnfEBrePHYM4twPjrRfL2nqLlvBFY8o1cWLUdubuS1otfxJSZg0erINLmrcNmrHlNjMZdxdyxQ1n88Rf4eekIb3ovXduFs/LTL/HSaXnxqYc5npbBvmMptGkejKGomNkvDKnXWGApxbTlPdyHTsBuNFCWdR5b2nHcB47BXlyIefdnmHd/hvugMaj6PYPM/15Mn64Eq6VGWVE+i4rxzTh07ARf7txDTq6eNev+xajnhqB2d3f9xRspLcW4aiUek6ZiN+RjPXcWS9JRtC9OxF5YQMmWTZTl6dFNnYbtcjZuoc0pXBrvUtZmMpM4+2MejB2JKbcA/elL
ZCUm02XucErzizj+7pc8tGoyfq2D8Ww6GgClxp3zXx9y7bPVgvn7TSh7PwvFRspyMim7lIKyxxDspiKsh3cCIAsIxp53FSy1W/tYHWaTmY/mrmHUW2Mp0Bdw8fR5khNP3pqISH8bWiwElTcAu6mUy/PfpfG8idj0BZhSMyjef5yAN17AZiisuClT+HrhM7w/AI3GPk3+5m+wXqlhww5BechiMrMt5iMGvDWKotwCLqdc5OzPyTw26zlKDEZ+XP0lAA9NeQqfIH/ue/IBbFYb6T+5HukR1o4Iyp0Nrd0TGQthdURgrmhobaqwXCFY+67kTzb1VGa/lWGlWpKYmMi8efMYPHgwffr0ISYmhsjISMaNG8f8+fPx9vZm/vz5GAwGPvroI4KDg8nOzmbgwIF07tyZ5cuXk56eTqdOndi7dy8+Pj7Mnz+fFStWcPr0aWJjY0lPT2fRokXExcXx6KPOWwdv3ryZM2fO4OvrS1lZGVOmTMFoNLJ48WICAwPJzs7mL3/5C3379mXr1q0sXbqUuLg4vL29mTNnDoMGDeKVV15h9+7d/PTTTwQFBZGfn8+0adM4fvw4K1asIDo6GrPZzM6dO4mPj6dbt26UlZXx6KOPsnz5cu677z7AMRW2ut94IwcOHGD16tU8+OCDXLx4EaVSyZtvvolMJuOf//wnpaWl2O121Go1U6ZM4YcffiA2Npb+/fszdOjQipjOmzeP//mf/6mIUWRkJBMmTKBly5a0bNmSS5cuIZPJUKvVZGVlMW/evIoR0+oo2b70ThcNfh5/5I5rXqf7zueF6Nov3PrGCrXBuucnIboAbg/3EqIra9ZGiK4iWIwugGHEGCG6XySHuDaqAyNe9xCiCzB2Ze2mut4qH7zWWIiuSETF4h8RetdGdSAnQ1y5CHpMzOSihf9b/Rrd26VvSc07fteVBxPuF6IL4tq+htbuARS9t0OIrvbxCNdGdUAW1kKILohrU88Pe0+Irkgi0r6ubxdqRVG8692H64LH3PVCdG8XIR1FiVvjwIEDfPHFF1XWL9Y3UkfRgdRRrETqKFYidRQrkTqKlUgdxUqkjmIlUkexEqmjeIO21FGsoMF0FOP+JkTXI+ZTIbq3i3Q8Rj1jNBrZvn07qampHD58a2tfJCQkJCQkJCQkJCQkRFCvaxQlHLueLly4sL7dkJCQkJCQkJCQkJCoCWmNooSEg9mhf61vF26J5taGNUA+fEi+a6O7jNLU6rcWv5vx3vixEF1RU1qPJ90jRBdgt6bmrfXriqipgOeUStdGdxk9tWKmnmYaXB8wXleCvBtWvRYVC5Hl7ZybmF0NG1q7B+LqiH9Y9Tu+3y4ip32LoulkMUsjLr53ybVRHWkwU09jRwjR9XhzoxDd20UaUZSQkJCQkJCQkJCQkHDFXXyUhQikjqKEhISEhISEhISEhIQr/mRTT6WOokStadG9LW0fi8aYWwB2O3ve+U8Vm3ZPdOXRGcP5asF6Ur4/Vu/agT2iCHs8mpJy3aMrv3D6vP3kJ9EEeFNyzYB/uzAOv/0ZhrPZ9aYLoAhvj1v7B7EbDWC3Y/72X1VslL0GACBv1ASZxgPTpnfqTVfZ8X5U3Xthz8/DbrdTsvETp8/d+z2G+omBYHYcQmza+TWle75zqSta+0ZycvX8I2E9qWfOsfnDf9zy9/8If317taNx/y6Ycwxgh4zlnzl93nhQNwIe60zhqQt4dWjB5a0/kvPdUZe6ouqeKH+h4dVr7YMd8Oz3IDa9AbvdTu4/N1Vr5zWgN4HLZ5DaYQj28sO1XSEqzqJ8boixEFUuGlq7J1JbVLkQmZMbWh2Rh0SgaNkRSgqx28F64Cunz1V9n0fmE1Bp7x+MaVM89gLXZx2KrNcS9YvUUfx/QEFBAbt372bIkLofZO4KpVrF4PgXWPnIDGxmKyNWv0qLB6M4+3NyhY1vcABF+kIM2bd2gKoobYVaRY/FL/BZn5mUma30TZhKYPcoshIrdd20an5Z4JgX3nxAV7rGPMd3Y1bUi64jGO6on32JokWTwWpF/cJsFOHtsaUdr9SOfgh7SRHWQ98DIA8MrT9dd3d0U6eRN340WCx4zotF2aETliTnm6/CRbGUXbnsWu+P0v4dR08k06fnA6Skn6u7iEB/5RoVEUvH8kuv6djNVtp9OA3fnm3J23uqwkahVnEmbhOlmbno2obSbu2rLm+CRdU9Uf5e/15DqtcytTv3LJhCRv+J2C1WglbNRdutPcX7jzvZqVqEoGrZ1OXvvxFRcRblc0OMhahy0dDaPZHawsqFwJzc4OqImxLVwyMwbVgANiuqJyYgD4mg7FJKhYnt4mlsuzeUX0CN6pHRteokiqzXdyX2P9fU04a3ClqiCgUFBXzxxReuDW+Dpp1akZeZg81sBeDC4TQi+nR0ssn77Rrn9t/62UuitJvc3wrjbzmUleteOZRO04c7ONkcebvyibNMLsdSVFpvugCKsAjK9NfA6tC2ZZzGLSrayUbZuTcyDx3KXgNQPTkSe2lJvekqI6OwXbkCFgsAluRTqLp0q2KnHjgYzdPPohkxCpln7TaiEKn9ex55qCdarbZO3/0j/PXuHI7pt2vYy8tc/sFU/Ps615HszT9Smulo1LVh91CU9ptLXVF1T5S/0PDqtaZjBJasq9gtDt3io7+i693FyUamdsdv7NPk3OQp/M0QFWdRPjfEWIgqFw2t3ROpLapciMzJDa2OyO9tgb1ADzaHblnWWRRh7ZxsbGmVR7S5RXXHmpxYrz5L3B002BHFzz//nBUrVjBmzBhSU1PJy8tjyJAh7Nu3jwsXLrBmzRp0Oh3p6emsXbuW8PBwzp07x6RJkwgJCWHTpk2cOXOGRo0akZWVxYIFCygqKmLatGkoFApat25NUlISAwYMYNiwYU7X1uv1LF68mBYtWnDp0iWeeuopdu3axXfffcfSpUtp2rQpr732GgMGDOC3335jx44d/PWvf+Xw4cNERkbi6enJyZMn0Wq1LFq0iGPHjvHWW2/RuXNnrFYrKSkpvPjiixw4cIBTp04RExNDu3btMBqNxMfHExoayuXLl+nTpw89e/Zky5YtZGZmsmrVKnr27MmuXbvYsWMHzzzzDCdOnADg/PnzdOrUiUWLFvHFF1+wefNm3n77bYKDg2sVb52/F6VFldMETMZiAhuF3pG/pShtjb8XFmNlZ8dsLKGRf/UHQMuVClo905PEuevqTRdApvPGXlpc+YapGJnO29nGtzEytRbzt/9GFhCIdlIsRfGTanzKJUzXxxd7SaWuvbgImU8rJxvLiSTMB/djNxhQRnfFc+4CCmZNqykMwrVFINJflb8XNmNlHbEZS1D6e1exk6uVhL3+DL7d25A8aZVLXVF1T5S/0PDqtcLPh7KiSt0yYzEKP+dYBLw2ktz3/gXlN1q1RVScRfncEGMhqlw0tHZPpLaociEyJze0OiLTemK3VJY3u7kEueZmo3syFM2isB7bU68+37X8ydYoNtgRxaFDh9K8eXOioqJYtmwZKpWKoqIiFi5cSGRkJImJjichMTExDB8+nLFjxzJ
o0CAWL14MwD333ENMTAwvvfQSGo2Gffv24e3tzfjx4zEYDEyfPp133nmHDRs2VLn20aNHMRgMPP/880yfPp1GjRoxc+ZMVCoVTZs2pXHjxoSHh/Pcc8/xxhtvoNfrGTFiBO+//z7//ve/eeSRR3jnnXdITk4mLy+Pjh070rdvXzw9PVmwYAGPP/44u3btYt68eYwdO5Zt27YBsGbNGpo1a8aECROYOXMmb775JlarlWHDhhEUFMTLL79Mhw4deOONN8jNzeVvf/sb7733Hq+++irjxo3D3d0dALlczuuvv17rTiKAMacAdw91xWu1TktRbkGd/35/hHZJTgFKnabitUqnwZRTVVeuVNBj0RgOL9lC4YWr9aYLYDcakLnfMLKl1jrWFN6IqRjb+TSH/bUsUGuR+frXj25+HjJNpa5M64E93/nYj7Irl7EbHNeyJB1DeV97kLtOPSK1RSDSX3NOAQpdZR1R6DRYcgxV7MpMFs7GbSJ50io6/edNZG41H4chqu6J8hcaXr226fORe1TqynVabPrKWLjd44/CW4fn4z3xG/8MAH5jBqNu26qK1u8RFWdRPjfEWIgqFw2t3ROpLapciMzJDa2O2IsLkSkry5tMpcFeUv2xOIoW7bFlnKhR74/w+W7FXlYm5N/dSoPtKF4nJMRxVoyXlxdNmzqejnh7e1NU5DhPJzU1lcTERBISEjhw4EDF9DKNRsOyZctISEjgzJkz6PWV5/qEhoYC4OfnV6FzI7179yY6OpoXX3yRmJgY3NzckMvlPPvss2zatIkff/yRXr16Vdj7+/vj4eGBXC7Hw8OjWj+Bivdv/C1eXl5Ov+XcuXMkJCSwfv16wsPDMRiqNoTXr+nt7Y1CoSAyMpIBAwbw008/UVhYyJEjR+jcufMtxfni0XR8g/xRqByD0M06h5Py/TE03h6439Bw1AVR2leOpKML9kdertskuhUX9yTh7uNR0dgp1Ep6LH6RkwnfkHPyPKH9o2uSFKoLYMtIQe4XAG4ObUVYJNbkQ6DVgdqhbU07jty/ieMLag3I5dgL8upF13I6GUWTJlB+/pgyqi3mg/uReXoiK69r2jHjQO64GVMEBVN2+XKttpcWqS0Ckf4aDqehDg5AVl7mfLq0Jmf3Mdx8PFCUl7mmk56ssC/N1qP080KuVtWoK6ruifIXGl69LjmWgjKwMTKlQ1fbqQ3GHw4i99Yh99BgvZxD9qyV6BO2ok/YCoD+4y8wnUp3qS0qzqJ8boixEFUuGlq7J1JbVLkQmZMbWh0pyz6LzMsPFA5deWALbBknwV0LKrWTraJNN6y/7ncZA9E+S9wdNNipp7UlIiKCfv36ERERgdlsZteuXQBMnTqV7du3ExgYiNFodPqOTCarUTMtLY0BAwYwduxYNm7cyCeffEJMTAxPP/00gwcP5tq1a8TFxQn5Lf7+/owcORKAbdu24ePjg9lsxm53DIWfPn2ayMjIKr/B3d2dAQMGMHfuXPr06XPL17aYzGyL+YgBb42iKLeAyykXOftzMo/Neo4Sg5EfV38JwENTnsInyJ/7nnwAm9VG+k+un0qJ0raZzCTO/pgHY0diyi1Af/oSWYnJdJk7nNL8Io6/+yUPrZqMX+tgPJuOBkCpcef814fqRdcRjFJM4ChkGAAAIABJREFUW97DfegE7EYDZVnnsaUdx33gGOzFhZh3f4Z592e4DxqDqt8zyPzvxfTpSrBa6ke3tBTjqpV4TJqK3ZCP9dxZLElH0b44EXthASVbNlGWp0c3dRq2y9m4hTancGm86ziI1v4dh46d4Mude8jJ1bNm3b8Y9dwQ1OUj8LVGoL9lJWZSZ3xA6/gxmHMLMP56kby9p2g5bwSWfCMXVm1H7q6k9eIXMWXm4NEqiLR567AZa15nKqruifIXGl69tptKuTz/XRrPm4hNX4ApNYPi/ccJeOMFbIbCihsnha8XPsP7A9Bo7NPkb/4G65WaN5IQFWdRPjfEWIgqFw2t3ROpLaxcCMzJDa6OWC2Yv9+EsvezUGykLCeTskspKHsMwW4qwnp4JwCygGDseVfBUrt1q0J9vlv5k009ldmv9zAaGImJicybN4/BgwfTp08fYmJiiIyMZNy4ccyfPx9vb2/mz5+PwWDgo48+Ijg4mOzsbAYOHEjnzp1Zvnw56enpdOrUib179+Lj48P8+fNZsWIFp0+fJjY2lvT0dBYtWkRcXByPPvpoxbUPHz7M1q1badGiBRcuXODZZ5/lvvvuA2DBggWEhoYyatQoALZu3crSpUtZuHAhAHPmzGH27NkEBgYyZ84cBgwYwFNPPVXh87Rp01ixYgUGg4EFCxawdu1aTp8+zYIFC2jevDnLli2jSZMmFBYWEhISwl//+lesVisTJkygZcuWtGzZEoClS5cyefJkxowZU+H3lStXGDZsGLt27UKlcv3kfnboX+/Y3+uPoLm1YQ2QDx+S79roLqM0tfqpKncz3hs/FqJrGDHGtVEdOJ50jxBdgN0a11M760LfEpsQ3XPlIwENiZ5avWujOpBpqNtGTbUhyLth1WtRsRBZ3s65iZnp0NDaPRBXR/zDqs4AuxPkZHgI0RVJ08khQnQvvndJiC5ARNrXwrTvJMaZYk4Y0C2pejzO3UCD7SjebZjNZlQqFcuWLWPSpEnodLr6dskJs9lMXl4en3/+OZMnT67Vd6SOolikjuIfg9RRrETqKIpH6iiKR+ooVtLQ2j2QOop/BFJHURzGNwYL0dUtE3t6QV35fz/19I8iISEBs9nMvffee9d1EktKSpg4cSLNmzfn5Zdfrm93JCQkJCQkJCQkJBoef7JzFKWO4h1iypQp9e3CTdFoNHzyySf17YaEhISEhISEhISERANB6ihKSEhISEhISEhISEi44k+2mY3UUZS4Kcuyfrzjml0DWt9xzessu5YqTFsEw+lY3y7cMqLWz4lcGzRY0FpCUWsfP7x/uhBdgIesYtbafKh2vVtpXbhgvSZEF+BDD7VrozoQ9JigNWPfiltHKHL9owi6zfYRovvojK+E6IK4tm9jibg6IoqjUX717cItIbJ+iGr7mi8SswfCgwm9hehKuObnn3/mu+++o1GjRshksiozGe12e8XZ75mZmRQUFLBo0aLbvq7UUZSQkJCQkJCQkJCQkHCBvR5GFEtKSpg/fz47duxApVLx8ssvs3//frp161Zhs337dry8vHjqqacASElJuSPXbnjbZUlISEhISEhISEhISPwJSEpKIjAwsOJou06dOvHDDz842Xz55Zfk5+ezfv16VqxYgYfHnZlBJI0oStwSvr4+LIyfTUbGRVq2DCNm3mKuXs1xsgkIaMSHa1eS+PNBGgf4o1QpeeXVGGo6icXTx5NJs8eRdTGb4LAg1iz+kLycvCp2QaGBTJk3EZvNRsz4BfXmr0htRXh73No/iN1oALsd87f/qmKj7DUAAHmjJsg0Hpg2veMyFqJ0fXu1o3H/LphzDGCHjOWfOX3eeFA3Ah7rTOGpC3h1aMHlrT+S891Rl7oAgT2iCHs8mpLcArDbObrSefvo9pOfRB
PgTck1A/7twjj89mcYzma71FV2vB9V917Y8/Ow2+2UbHTe7Mm932OonxgIZjMApp1fU7rnu1r5/HtycvX8I2E9qWfOsfnDf9RJozradr+P6McfoCDHgN1u5z/vbKmTjqgYi/RZVL7QPtgBz34PYtM7/Mv956Zq7bwG9CZw+QxSOwzBXmxyqSuq7on0WVS9Fpkv5CERKFp2hJJC7HawHnCeSqrq+zwyn4BKe/9gTJvisRfUfOh3Q2v3ALx9vJg9/zUunv+N0BZNWfr3f5Bzzfl33tcxihcnPk/yydO0aBlK0tFT/Gv95/WiKyoni8z1osqyqJwssu79kp7JnlPn8fPQIJPBxH6dnD7P1Bey4quDRIX4k5qVy+MdWtA7qlmttO866mFEMTc316njp9PpyM11rndZWVkYjUamTJlCRkYGY8eO5euvv0ahuL1jse76juL+/fv56quv8PLyonXr1hVDqnXlwIEDeHl5ERkZeVObtLQ04uLieOqppxgy5OYHa27cuJEPP/yQ77//HoAhQ4awdevW2/6j3CqzZ8/m+eefp02bNsKvFff3Wez5fh+fffYlTz7Rj6VL3mT0mKlONm5ubmz/32/58CPHDcuRw7vo9sD9/Lz/8E11J856kcP7jvD9lz/SvV83prw5kb9PrTq3OqpjJPu/P0CXv3SuV3+FaSvdUT/7EkWLJoPVivqF2SjC22NLO16pGf0Q9pIirIcc5U4eGOo6EIJ05RoVEUvH8kuv6djNVtp9OA3fnm3J23uqwkahVnEmbhOlmbno2obSbu2rtWp8FGoVPRa/wGd9ZlJmttI3YSqB3aPISkyu9Fmr5pcFGwFoPqArXWOe47sxK2oWdndHN3UaeeNHg8WC57xYlB06YUly9qlwUSxlVy679NMVR08k06fnA6Skn7ttreuo1CpeWDiRGf2mYjVbefX9GUR1b0dy4slb0hEWY4E+g5h8IVO7c8+CKWT0n4jdYiVo1Vy03dpTvP+4k52qRQiqlk1r76yoOi3QZ1H1WmS+wE2J6uERmDYsAJsV1RMTkIdEUHapcvqV7eJpbLs3lAdFjeqR0S47idDw2j2AmfNeYd+Pv/DVtp30ffQvxMRO59VJc5xsmjQJ4KM1n3L86Cnc3Nw4lvYj3361hzz9zde3CdEVlZMF5npRZVlUThZZ90rMVuL/k8jn04eiclMwff0eDqRn0bVVYIXNuh9O0CG0Cc/3aktKZg5vfPrfBtxR/OOPx2jUqBFFRZXnhBqNRho1auRko9PpaN++PQBhYWEYjUays7MJDg6+rWvf9VNPd+zYwZNPPsnMmTN54oknblvv4MGDnD59ukab8PBwoqOjXWqNGDHC6fXnn3/+h3cSARYuXPiHdBIB+j/+ML/8cgSAxJ8P0f/xPlVssrOvVDSWHh5adB5aLlzMrFG328MPcOrIrwCcOHSKB/t0rdbuuy/2YLVY691fUdqKsAjK9NfA6viNtozTuEU5l0Vl597IPHQoew1A9eRI7KWuNxIRpevdORzTb9ewmx26+QdT8e/rvElP9uYfKc103Ixpw+6hKO03l7oATe5vhfG3HMrKta8cSqfpwx2cbI68XflEVCaXYykqdamrjIzCduUKWCwAWJJPoerSrYqdeuBgNE8/i2bEKGSedd/M4JGHeqLVauv8/epodX9rcjKvYS2PTdrhFDr2qf1N5HVExVikzyAmX2g6RmDJuoq93L746K/oendxspGp3fEb+zQ5Nxm1qw5RdU+kz6Lqtch8Ib+3BfYCPdgc2mVZZ1GEtXOysaVVdtrcorpjTU6slXZDa/cA+jzSiyOHkgA4dOAYfR7pVcVm17c/cPxoZUfBarW6vI4IXVE5WWSuF1WWReVkkXXvxIWr3OurQ+XmuP/tENqYvSkXnWz8dBryihwzGfRFJtoEN6qiI3FzOnToQFZWFubyke+jR4/Su3dv8vPzMRqNAHTr1o1Lly4Bjo6kzWYjICDgppq1xeWI4ueff86KFSsYM2YMqamp5OXlMWTIEPbt28eFCxdYs2YNOp2O9PR01q5dS3h4OOfOnWPSpEmEhISwadMmzpw5Q6NGjcjKymLBggUUFRUxbdo0FAoFrVu3JikpiQEDBjBs2DCnax86dIiTJ09itVrJycnhypUrvPvuu7zyyiucPHkSk8nEuHHj+OSTT2jTpg0pKSlMnz6dwMBAjEYjS5YsISQkhJycHLy9venfvz8HDx7E09OTzMxMxo8fz+rVq7FYLCiVSkpLS5k5c2aN8bh06RLx8fG0adOGJk2aVLy/Z88e4uPjWb9+PdeuXeOtt96ic+fOWK1WUlJSePHFFzlw4ACnTp0iJiaGdu3aYTQaiY+PJzQ0lMuXL9OnTx969uzJsmXL2LFjByNGjODUqVNotVoWLVqEyWQiNjaW4OBg9Ho9nTt3JjQ0lPj4eAYPHsyQIUM4e/YsH330EaGhoZw7d46xY8fi7+9fq3jXhsaNG1FY6CiUBQWF+Pn5olAosNlsVWyHDRvIxPEjeXv5ajIza54W4dvIh2JjMQDFhUV4+XqhUMix2W7vyY0of0Vpy3Te2EuLK98wFSPTeTvb+DZGptZi/vbfyAIC0U6KpSh+Uo2HwIrSVfl7YTNWTmOzGUtQ+ntXsZOrlYS9/gy+3duQPGnVTfVuROPvhcVYecNsNpbQyN+rWlu5UkGrZ3qSOHedS12Zjy/2kspY2IuLkPm0crKxnEjCfHA/doMBZXRXPOcuoGDWtFr5/Ufg1cgb0w2xKTYWE9qo+S3riIpxddwpn0FMvlD4+VBWVOlfmbEYhZ9zWQ54bSS57/0LbuGmXVTdE+mzqHotMl/ItJ7YLZXadnMJcs3NRlFlKJpFYT22p1baDa3dA2jk70dRoUPbWFiEj6/3TX0GGDXuOf658oOK3/lH6orKySJzvaiyLConi6x7emMJWvfK3Vs93FXojc4j9c/3asu09bt5+8tfOHUph/G/6/w2KOph6qlGo+Gtt94iLi4OX19fWrduTbdu3Vi6dCk+Pj6MHz+ecePGsWzZMt5//30uXrzIkiVLcHd3v+1ru+woDh06lG3bthEVFcXYsWOZPHkyRUVFLFy4kLi4OBITE3n00UeJiYlh5syZdOrUiQMHDrB48WLeffdd7rnnHoYPH45cLicuLo59+/bRu3dvxo8fz4oVK5g+fTp6vZ5Ro0ZV6bhER0cTGRnJ4MGD6drV8aRt06ZN9OjRg9GjR3Py5ElUKhXTpk0jJCSE7777jg0bNjBz5kzWrFlD06ZNGTduHACfffYZYWFhdOnShaCgoIoppW3btqVv374ATJw4kfT0dFq1ck4kN7Js2TIGDhxI//79KzrKAA8//DDr1q0DoGPHjvTt2xebzcarr77KunXr2LVrF8uWLWPXrl1s27aNdu3asWbNGpo1a8aECRMwmUw8/vjj7Nq1izfeeIP169czfPhwPD09eeKJJ8jLy+Py5cukpKQwY8YMtFotqampRERE0KVL5RPkOXPmVHREjx8/zty5c/n3v/9dq3jfjHFj/8ZTgx7DWFTM1au5eHrqMBgK8
PLyRK/Pu2kDsWXL/7J165fs/m4Lv/2WxTfffu/0+aC/PUmvx3pQUlxCXm4+Wp0WY0ERWk8PCvIK6txYivJXtDaA3WhA5n7D6JNa61jXdCOmYmzn0xz217JArUXm649df/WmMRGla84pQKGrPGZAodNgyTFUsSszWTgbtwlNaBM6/edNfu4yFbu1+lhdpySnAKVOU/FapdNgyimoYidXKuixaAyHl2yh8MLNfb2OPT8PmaYyFjKtB/Z85ylRN05DsiQdw2vBQpDL62XKSXUU5BpQ3xAbrU5LQW7VuLtCVIxF+CwqX1zHps9H7lHpn1ynxaav9M/tHn8U3jo8H+9Z8Z7fmMEU/XgY06n0m+qKqnsifRZVr0XmC3txITJlpbZMpcFeUv2RIooW7bFlnKhRr6G1ewAjRj3Do0/2obiomNwcPR6eWgoKCtF5epCfZ7ipz4OG9ker1bBqecIfqnsdUTlZZK4XVZZF5WSRdc9Pp6G41FLxuqjUjJ/O+fihN7f8xODo1jzesQV6YwkDl37GjlnD8Nbefkfmz0L37t3p3r2703szZsyo+L+npyexsbF3/Lq1nnoaEhICgJeXF02bOp7SeXt7V8yZTU1NJTExkYSEBA4cOFAx1Uqj0bBs2TISEhI4c+YMer2+QjM0NBQAPz8/p7m3rmjRogUA7dq1Q61Ws3HjRtasWcPevXvJy8ur8KdZs8r5z08//XS1WhaLhaVLl5KQkMDVq1ed/KuOM2fOVOhej8nNuB6nG2Pm5eXlFLNz586RkJDA+vXrCQ8Px2BwVFx/f388y6dAXI9PZGQkzz33HFOnTmXSpEnI5VX/fKmpqRV+NW3a1Gl73LrGe+0Hn/LEgL/x7PDxfP3NHh544H4Auj8YzdffOBpBmUxGSIhjPnqvng8Q3dnxtMhut3PhYiZhYVWf7G7/9Cum/20WMeMXsH/PL7S93zF99r7otvz8/YEK3SaBjWvtq0h/RWsD2DJSkPsFgJvjGY4iLBJr8iHQ6kDtaDysaceR+5ePZqs1IJdjL6i6AcIfoWs4nIY6OACZyqHr06U1ObuP4ebjgaK8sWs66ckK+9JsPUo/L+RqVY26AFeOpKML9kdert0kuhUX9yTh7uNR0ZAq1Ep6LH6RkwnfkHPyPKH9XU8Zt5xORtGkCZSfX6WMaov54H5knp7IyvOWdsw4kDum0SiCgim7fPmu6SQCpB9JxT8oALfy2IR3juDY9zWvqa0OUTEW4bOofHGdkmMpKAMbI1M6/NN2aoPxh4PIvXXIPTRYL+eQPWsl+oSt6BO2AqD/+IsaO1wgru6J9FlUvRaZL8qyzyLz8gOFQ1se2AJbxklw14LK+aZV0aYb1l/316jX0No9gI2fbGXkM5OYOHo633/3E/dHO/yJ7tqR77/7qUI7MKjyPNzhzw/BP8CPVcsTaB3ZirAWVdeNidK9jqicLDLXiyrLonKyyLp3X7PGZOcZMZd3KJPOX6VnRFMMxaUYTY6pkpfzi/D3csTcS+OOXAZlLjYMvGsps4v5d5dyxzaziYiIoF+/fkRERGA2m9m1axcAU6dOZfv27RXTQW9EJpPV6Vo3fm/p0qX069ePp556in379vHVV19V+HPxomOOtN1uZ/PmzRUjm3a7nStXrqBUKpkxYwZHjhxBpVKRmur6wPaWLVty/vx5oqKiKuYC15WIiAj8/f0ZOXIkANu2bcPHx6fKb7zOpUuXaN++Pc888ww//PADq1at4v3336+iefHiRXx8fLhw4QIREREVn9U13jcSM28xixbOIbxVc5o3b8aMmY6nF/fd14Z1H79Dx059MZlKmT59EklJp/D09EAmk7Huk8016r6/+EMmzxlPSPMQgpoF8s9Yx+9q2aY5896Zzci+YwHo8ciDdO/bjaYtQvjrpGfZtLpmXVH+CtO2lGLa8h7uQydgNxooyzqPLe047gPHYC8uxLz7M8y7P8N90BhU/Z5B5n8vpk9XgtVyc02BumUlZlJnfEDr+DGYcwsw/nqRvL2naDlvBJZ8IxdWbUfurqT14hcxZebg0SqItHnrsBldr8Gymcwkzv6YB2NHYsotQH/6ElmJyXSZO5zS/CKOv/slD62ajF/rYDybjgZAqXHn/NeHahYuLcW4aiUek6ZiN+RjPXcWS9JRtC9OxF5YQMmWTZTl6dFNnYbtcjZuoc0pXBrv0t+bcejYCb7cuYecXD1r1v2LUc8NQX2b00HMJjMfzV3DqLfGUqAv4OLp83XaFEZYjAX6DGLyhd1UyuX579J43kRs+gJMqRkU7z9OwBsvYDMUVnS0FL5e+AzvD0CjsU+Tv/kbrFdq2BBFVJ0W6LOoei0yX2C1YP5+E8rez0KxkbKcTMoupaDsMQS7qQjr4Z0AyAKCseddBUvt19o2tHYPYMnf32HOW68R1qIZzcJCiHtzOQCRUeH8z/uLeKTHEPo9/hAxf3+d5BMpPNK/D75+Prw5cyEZZy/8sbqicrLAXC+qLIvKySLrnkblxpzB3VmyfT++Hmpa3etL11aBrNxxEG+tOy881J43BnRl475kjl+4Qqa+kJcf64yvh9ql9t2Iqx3x/78hs7v4xYmJicybN4/BgwfTp08fYmJiiIyMZNy4ccyfPx9vb2/mz5+PwWDgo48+Ijg4mOzsbAYOHEjnzp1Zvnw56enpdOrUib179+Lj48P8+fNZsWIFp0+fJjY2lvT0dBYtWkRcXByPPvpoxbUPHz5MfHw8kZGRjBgxgosXL/Lmm28yZswYxo0bh1KpZOfOnXz66ad07dqV7Oxsfv31V2JjYwkLC2PJkiUEBgZSUFBAr1696NatGwcPHmTdunVoNBpmzJjB8uXLsVgstG3blv/93/8lKiqKUaNGER8fj7e3NzExMU5rES9evMjf//53IiIi8PT0ZO3atcybNw8vLy9iY2Pp378/Q4cOrYjNtGnTWLFiBQaDgQULFrB27VpOnz7NggULaN68OcuWLaNJkyYUFhYSEhLCX//6V7Zu3crSpUuJi4vD29ubOXPmMGjQIAYMGMCqVauIjIwkOzubv/zlLwQGBlbYxcTEYDQa+eCDD2jWrBkZGRmMHz+ekJAQ3nrrLZfx/j1uqqA7UMSc6RrQ+o5rXufANdcd/buJvIkdXRvdZRzcfGfO5fk955RK10Z1ZHDU7T3QuRneGz8Wojvq/ulCdAEeson5+/1XUfsZCrfCBeutT6etLR8KukkJekzMHnGZ34ob0c401H2zpvqg22wfIbpeM75ybVRHRLV9l0quCdEVydFov/p24ZY4nnSPa6M6Iqrta25x/aCpLjyYcL8QXQDNoBmuje4CCibc/L75dvBas1OI7u3isqMo8edF6iiKReooViJ1FCuROoqVSB3FSqSOYiVSR7ESqaMoHqmjWInUUYSCcY8I0fVaW7ezmkVz1x+PISEhISEhISEhISEhIfHHcsfWKEpISEhISEhISEhISPy/5S7eeEYE0tRTiZtS/D8T7rjm/kX5ro3qiKjpSKIQGYv2HS67NqoD7q3FTFFTNL/z05yvIyrOH6rNQnQ/ObJciC7Aug5v
CtEdnXTnt+QWjahY9NTWvHN2XRE5PVRUvvgiueadwe82RrwuZmo2wMa3xUzPFuWzLKyFEF2A2FeShOj2Lan5GIe6EuRd/TErdwL/MDHlYnmqmDb1gr0WG0vVkU0XvhCmfScxjOkrRNf7491CdG8XaeqphISEhISEhISEhISEhBPS1FMJCQkJCQkJCQkJCQlX/MmmnkodRYlaIw+JQNGyI5QUYreD9YDzDnGqvs8j8wmotPcPxrQpHntBDWeMlePbqx2N+3fBnGMAO2Qs/8zp88aDuhHwWGcKT13Aq0MLLm/9kZzvjtabzw0xFsqO96Pq3gt7fh52u52SjZ84fe7e7zHUTwwEs2NKpWnn15Tucb0LlyK8PW7tH8RuNIDdjvnbf1W9dq8BAMgbNUGm8cC06R2XuiAuzqJiXB1tu99H9OMPUJBjwG638593ttRJ5/fk5Or5R8J6Us+cY/OH/6izTmCPKMIej6YktwDsdo6udJ7+037yk2gCvCm5ZsC/XRiH3/4Mw9nsevVZlK6oWGgf7IBnvwex6R1lIPefm6q18xrQm8DlM0jtMAR7salWPje0fCGyvInSFpWHGpq/AL+kZ7Ln1Hn8PDTIZDCxXyenzzP1haz46iBRIf6kZuXyeIcW9I5q5lK3Rfe2tH0sGmN5LPa8858qNu2e6MqjM4bz1YL1pHx/zKUmiM31ouq1qLonKsbVIardk/jjkTqKt8kPP/xAbGws69evJzg4mNmzZ/P888/Tpk2b+nbtzuKmRPXwCEwbFoDNiuqJCchDIii7lFJhYrt4GtvuDY4XKjWqR0bXquGRa1RELB3LL72mYzdbaffhNHx7tiVv76kKG4VaxZm4TZRm5qJrG0q7ta+6TuaifG6IsXB3Rzd1GnnjR4PFgue8WJQdOmFJcv5e4aJYyq7cwnolpTvqZ1+iaNFksFpRvzAbRXh7bGnHK0zcoh/CXlKE9dD3jt8YGFo7bUFxFhbjalCpVbywcCIz+k3Farby6vsziOrers4Hzd/I0RPJ9On5ACnp5+qsoVCr6LH4BT7rM5Mys5W+CVMJ7B5FVmJyhY2bVs0vCzYC0HxAV7rGPMd3Y1bUm8+idEXFQqZ2554FU8joPxG7xUrQqrlou7WneP9xJztVixBULZveks8NLV+ILG/CtAXloYbmL0CJ2Ur8fxL5fPpQVG4Kpq/fw4H0LLq2CqywWffDCTqENuH5Xm1JyczhjU//67KjqFSrGBz/AisfmYHNbGXE6ldp8WAUZ3+ujIVvcABF+kIM2a79vI7IXC+sXguqe6JiXB0i2727AnEnFd2VSGsUb5PevXsTFFS5aHjhwoX//zqJgPzeFtgL9GCzAlCWdRZFWDsnG1va4Yr/u0V1x5qcWCtt787hmH67ht3s0M4/mIp/X+czBrM3/0hppiN5acPuoSjtt3rzuSHGQhkZhe3KFSg/W8mSfApVl25V7NQDB6N5+lk0I0Yh83S9eYYiLIIy/TWwOvy1ZZzGLSra+dqdeyPz0KHsNQDVkyOxl9ZuMbyoOIuKcXW0ur81OZnXsJZfK+1wCh37dK6T1u955KGeaLXa29Jocn8rjL/lUFbu35VD6TR9uIOTzZG3K5/Ay+RyLEWldb7enfBZlK6oWGg6RmDJuord4tAtPvorut5dnGxkanf8xj5Nzk1GJG5GQ8sXIsubKG1Reaih+Qtw4sJV7vXVoXJTANAhtDF7Uy462fjpNOQVOUbN9EUm2gQ3cqnbtFMr8jJzsJXH4sLhNCL6OJfjvN+ucW7/r7Xy8zoic72oei2q7omKcXWIbPck/nikEcVyPv/8c1asWMFzzz3H5cuXSUlJYeXKlSxatIiOHTuSlpbGCy+8QGRkJGazmTlz5uDv70/jxo0xGo0ApKSkEB8fz+DBg+nZsyfz588nMjKSl19+mRU6ud3QAAAgAElEQVQrVnDs2DE2bNjAuXPnWLNmDS1atCA9PZ3JkycTFhbm5M+mTZvIyMjA19eXwsJCZsyYwX//+18WLVrEQw89RFlZGbt27WLq1KlV/N68eTNLlizBx8eHgoICwsLCePbZZ/nggw949913eeWVVzh58iQmk4l33323VvGRaT2xWyqnTNjNJcg1N3tKJkPRLArrsT210lb5e2EzVmrbjCUo/b2r2MnVSsJefwbf7m1InrSq3nxukLHw8cVeUlzpc3ERMp9WTjaWE0mYD+7HbjCgjO6K59wFFMyaVrOuzht7aaUupmJkOmd/Zb6Nkam1mL/9N7KAQLSTYimKnwT2mh/LiYqzqBhXh1cjb0zGyo5xsbGY0EbN66QlAo2/F5Yb/DMbS2jk71WtrVypoNUzPUmcu+4P8u6PRVQsFH4+lBVV6pYZi1H4OZe3gNdGkvvev6D8prO2NLR8IbK8idIWlYcamr8AemMJWvfKA+I93FXojc6jT8/3asu09bt5+8tfOHUph/G/6/xWh87fi9KiSp9NxmICG4XWyqeaEJnrRdVrUXVPVIyr425v924Xu7RG8c/J0KFD2bZtG+3atWPKlCmcPHkSpVLJSy+9RFRUFMnJyaxevZp//OMfbN26FQ8PD2bNmkVZWRnr168HICIigi5dHE+UAgIC6Nu3L5mZmQAMGzaMY8cc871/+ukn3N3dGT16NFeuXMHd3d3Jl7Nnz7Jhwwa+/vprZDIZs2bNYs+ePfTt25fvvvuOZs2aMWLECAYNGkS7du2q+L1161asViuTJ08G4Mknn6Rz586MHTuWTZs20aNHD0aPHs3Jk7WfBmAvLkSmVFe8lqk02Euq3zJa0aI9towTtdY25xSg0FVqK3QaLDmGKnZlJgtn4zahCW1Cp/+8yc9dpmK33nw7bFE+N8hY5Och01SOuMi0HtjznY+NuHEaiyXpGF4LFoJcDmU379DZjQZk7jeM5Ki1jrWKN2IqxnY+zWF/LQvUWmS+/tj1V2+qC+LiLCrG1VGQa0Ct01S81uq0FORWvVZ9UZJTgPIG/1Q6Daacgip2cqWCHovGcHjJFgov1Px3a6iIioVNn4/co1JXrtNi01eWAbd7/FF46/B8vGfFe35jBlP042FMp9Jr1G5o+UJkeROlLSoPNTR/wTFaWFxqqXhdVGrG74byB/Dmlp8YHN2axzu2QG8sYeDSz9gxaxjeWvffy1VgzCnA3aNSR63TUpRbNRa3ishcL6pei6p7omJcHXd7u3fb/Mk6itLU09/RvLnjqUe7du1wc3Njx44drF69mq+//pq8vDwA0tPTCQ0NBUAulztNPa0Nw4YNw8/PjxEjRrBq1Src3Jz762lpacjlctauXUtCQgJubm4Vo5YALVq0qPCxOr9TU1MJCak8vyo4OJi0tLQav++KsuyzyLz8QOHwVR7YAlvGSXDXgsq5oVC06Yb11/211jYcTkMdHIBM5dD26dKanN3HcPPxQFGebJpOerLCvjRbj9LPC7laVS8+N8RYWE4no2jSBJSOp8HKqLaYD+5H5umJrHzKnnbMOJA7phQpgoIpu3y5xoYHwJaRgtwvAMrLsCIsEmvyIdDqQO3w15p2HLl/E8cX1BqQy7E
X5LmMhag4i4pxdaQfScU/KAC38muFd47g2PeHXXzrj+PKkXR0wf7Iy/1rEt2Ki3uScPfxqLiJVaiV9Fj8IicTviHn5HlC+0fXJNlgERWLkmMpKAMbI1M6dLWd2mD84SBybx1yDw3Wyzlkz1qJPmEr+oStAOg//sJlJxEaXr4QWd5EaYvKQw3NX4D7mjUmO8+IubwTlXT+Kj0jmmIoLsVocmyscjm/CH8vRxnx0rgjl0GZi+O6Lx5NxzfIH0V5LJp1Difl+2NovD1wv6HDcauIzPWi6rWouicqxtVxt7d7EreGNKL4O2QyWcX/ExIS8PLyYtKkSWRkZHDihOPJW8uWLSs6XmVlZRWjhr/Hw8OjooOXnV25U9nx48cZP348r776KkuWLGH79u2MGTOm4vPw8HDc3d0ZP348AMnJyU6dyRt9rO69iIgIUlIqF67/9ttvhIeH1/h9l1gtmL/fhLL3s1BspCwnk7JLKSh7DMFuKsJ6eKdDOyAYe95VsNR+HVNZiZnUGR/QOn4M5twCjL9eJG/vKVrOG4El38iFVduRuytpvfhFTJk5eLQKIm3eOmxGF2vdRPncEGNRWopx1Uo8Jk3FbsjHeu4slqSjaF+ciL2wgJItmyjL06ObOg3b5WzcQptTuDTetcOWUkxb3sN96ATsRgNlWeexpR3HfeAY7MWFmHd/hnn3Z7gPGoOq3zPI/O/F9OlKsFpcawuKs7AYV4PZZOajuWsY9dZYCvQFXDx9/o4t6D907ARf7txDTq6eNev+xajnhqB2v/lT++qwmcwkzv6YB2NHYsotQH/6ElmJyXSZO5zS/CKOv/slD62ajF/rYDybjgZAqXHn/NeH6s1nUbqiYmE3lXJ5/rs0njcRm74AU2oGxfuPE/DGC9gMhRU3kQpfL3yG9weg0dinyd/8DdYrNW8q0dDyhcjyJkxbUB5qaP4CaFRuzBncnSXb9+ProabVvb50bRXIyh0H8da688JD7XljQFc27kvm+IUrZOoLefmxzvh6qGvUtZjMbIv5iAFvjaIot4DLKRc5+3Myj816jhKDkR9XfwnAQ1OewifIn/uefACb1Ub6TzWPhorM9cLqtaC6JyrG1SGy3bsr+JNtZiOz21086vmTkJiYyLx583j00UcZN24cfn5+HD58mBUrVhAdHY3ZbGbnzp3Ex8fTqVMn5syZg6+vL97e3nzzzTf07t2bp556iri4OLy9vYmJiUGr1fLaa6/RpUsXVCoV69evZ/78+ZSUlPDzzz8THBzMuXPneOmll5xGAAG2bNnC2bNn8fDwID8/n+nTp3P27NmKdY8TJkygWbNm1fpts9lYvHgxXl5eGAyG/2PvzMOiKts//pkZZhiGYRXUF1nFBUTc0VQ0F3JfsjLtl+ZuatpqprmUvppLaqmvlmSLVpZpapttapmpqbiiKSAouCDKOgwMDAzz+wMCJ9AB5RGmzue6uC7mzH2+z33u8yzznPMsNG7cmGHDhvH9998zb948xowZw4QJE1AqlbeJRjG5bz9d7XE+tDjTutFd0nGWqzBtEYiMRctWVVi5tArYN7U+af5uUDSs2lv5qiAqzu+rjUJ0Nx5bIUQX4KNW84Tojj65QIiuSETFoosmXYju1SwxZQ/E1Rc7zvpYN6pFPDndUZj2p8tzhOiK8lkWEChEF2DBcyeF6EYYqjY1oLI0cKl46G514BEgJl+siBHTpiaaq/7AtLJsTtxh3agWkDmsuxBd1y2/CNG9V6Q3iiV07tyZvXv3Whxr164dmzeXrVb1yiuvlP6/YkXZj7lp06aV/v/XfMW/2LBhQ+n/o0ePLv2/T58+d/Tn8ccfL3esRYsW7NhhWZAq8luhUDB79uxy5/ft25e+ffveMV0JCQkJCQkJCQkJifJIi9lISEhISEhISEhISEhIWPIvG3oqLWYjISEhISEhISEhISEhYYH0RlFCQkJCQkJCQkJCQsIK/7ahp9JiNhK3pXODHtWu+b6V1c/uhXG3bCZrC3RV1hemLWrCuZ+sepfR/guRE+RF+dyw0PYGZIhadEbUwjAJduLG+IxSiVnkaH+uuxBdW8TWFvaZZ3dTiC6Ia/tsrd2TsG387FyEadvKYjbpQx4Uouu+Y58Q3XtFeqMoISEhISEhISEhISFhjX/ZHEWpoyghISEhISEhISEhIWEFs9RRlJC4PU6uTkyeNYFrScl4BzRg/ZL3yUjNKGfXwN+LqXMnYTKZmDNxvlVdTadWOD3UCVN6FmazmbT/ba7QznlgN7xWzCCm1SOYc60PuRHlr0jtwM7Nad4nDH2aDsxm9qzaXs4mtH8Hes8YzrfzN3F+74lK+VsRzTu3IKzvA+hSi+O+fdUXVdawNX9F+uwVHkJA3zAMJbrH37IcStNyygAcPF0w3MzCIzSAqOXbyIpPrnHtW0lNS2d15CZiLiSw5f3VVT7/fvgr6v6JqodsMV+I0hUVYwC3rqHU7dceY2oWmOHiim0W39cd3BHPPu3IPpOIc6tArm/dR+pPx63q2lq7J9JnW9O1RZ9tMRYV4eii5YmZI0lJSqF+wH/YsuwTdKlZd6UlUXPY3iSbauKjjz6663OvXLnC7t27q88ZG2LSzHFE/X6MT9Z+xv4fDzB13qQK7UJaB3No7+FKacrU9tSfP5Ubb0SSuuZT1E0D0HRsWc5OFeiDqpFvjfsrUlupVjFk0Vi+/e/H7Hn7S+oH+RLYKcTCxs3bk5z0bLKS06rk799RqVWMfWMSHy/4gC/f3oJvsD8hnUOrpGFr/or0WaFWEb5kLIfmf8LxldtxD/bBq7Olrp1GzR/zP+XUum+5uOsIHeY8UePaf+f46bP06PIA9zJ7XaS/ou6fqHrIFvOFKF2Rdb3cQUXQsvHEztvIxeXb0Dbzxa1L83LXdWHhZpLWfs2lVTtoPP+pSmnbWrsnymdb1LVFn20xFhUxbMYIon8/xTfvbOfYj4d5cvboe9KrNRQJ+qul/Gs7ips2bbrrc69evfqv7Sh27PkAZ479CcDpo2fo1KNDhXY/7dhDYUFhpTQdWgdRcO0G5hL73ON/ou3W3sJGprbHffxjpN7miev99Fektm+bxmRcTcVkLLZPjIolqEdrC5uMKzdJOPRnlXytiMZtm5J69SaFJWnFRp2ndY92VdKwNX9F+lyvbWP0V1IpKtFNORqHb89WFjbHlpe94ZDJ5RTk5Ne49t/p1b0LGo3mrs79C5H+irp/ouohW8wXonRF1vUu7ZqQd+Um5hKfM4/E4BFhmS+St+wj/2rxwwNNQH1yYq9UStvW2j1RPtuiri36bIuxqIjWPdoSdzwGgJio87Tu0fae9CRqhn/l0NNdu3ah0+lYs2YNDRs2pH///qxatQqTyYRcLsfR0ZEJEyawfv161q5dy4YNGzh79iwHDx7k1VdfZceOHZw7d441a9bQr18/3nvvPQCWLFnCli1bWL9+PXv37mXv3r0sXryY7t27U1RUxM8//8y+ffsqTOvvbN68mYsXL+Lm5kZ2djYzZszgl19+Kaf37LPPsnLlSp544gmuX7/O+fPn2bJlC0uXLsXV1R
WdTkdAQADDhg1jw4YNrF27lueee47o6Gjy8vJYu3ZtlWLnVseVXH0uALnZOTi7OaNQyDGZ7v5xiMLdlaKcslUvi/S5KNwtV9byfOEp0tZ9BlWsuET4K1Jb6+FM/i2r2OXpc/Gq43+vrlaIcx0X8vRlcc/V5+Jfp2GVNGzNXxDns4OHMwW3+GfUG6jj4VyhrVypoPHQLhyY/VGNa4tApL+i7p+oesgW84UoXZF1vcrDGZO+LF+Y9AaUHuVXaJSrlQRMH4pb52acnbymUtq21u6J8tkWdW3RZ1uMRUU413EhrySPG/S5aF2dkCvkFAlI634izVH8F9CvXz+WL1/OtGnTANi/fz+nTp3igw8+AGDkyJGEh4fz9NNPU1hYyNdff41SqWTNmjWo1WqGDBkCUHr+kCFD2LGjeP7GsGHDWL9+PQA9evTgp59+ws/PjyeffJLBgwffNq3g4OBS/+Lj4/n444/ZtWsXMpmMmTNnsmfPHiIiIsrphYaGsnPnTkJDQ5k6dSrR0dFs3bqVwsJCpkyZAsCAAQNo164d48ePZ/PmzYSHhzN69Giio6MrFa/BIwbQtU84hlwDGWmZaLQa9LocNE6O6DJ091zBmNIzkTuWbWEg12owpZeNY7er74HCRYtT3y6lx9zHDCFnXxR5Z+Luq7+iY6FP1WF/yzLqaq2GnDTdPWneDl1aFmptWdw1Wg26tKrNH7A1f0Gcz4ZUHcpb/FNpHchLLa8rVyoIXzyGqKVfkJ14o8a1RSDSX1H3r7rrob+wxXwhSldUjAGMqToU2rJ8odA6UFDBfKiivALiF27Gwb8ebbbP42D7ZzEXmsrZ2Vq7J9JnW9O1RZ9tMRYV0eP/ehHWuwN5uXnFbbajA7m6XBy0GvSZ2TbfSfw38q8denorMTExGAwGIiMjiYyMpH79+qSnF+//NHnyZKKiomjUqBFq9d3tgxQYGAhAaGjoHdP6i9jYWORyOe+99x6RkZHY2dmh1+sr1PuLhg0bWqTh4+NT+p23tzexsbF3PP9OfPXJt7w0YiZzJs7n0J4/aN62GQAtwppzsGQMu0wmo55X3coF5G8YTpxH6VUXmbL4uYWmTTP0vx5B7qJF7uhA4fVUkme+RXrkVtIjtwKQ/uGO2zaWIv0VHYuk43G4NfBAoSqOhV+7JpzfewIHF0fstdW7H2DcsRg8GnhiV5JWk3ZBnNgb9Y/2V6TPKcfi0Hp7IC/RrRfWmKQ9J7F3dSz90a1QKwlfMo7oyO9Jjb6Ef7+wGtcWgUh/Rd2/6q6H/sIW84UoXVExBsiKikXt7YmsxGfX9k1J3X0CO1dHFCU++04eUGqfn5yO0t0ZuVpVoZ6ttXsifbY1XVv02RZjURF7N//E0lH/ZdXkNzmx9xiN2zQFoGm7IE7sPXbP+rWCf9kcRcXrr7/+ek07URN8/PHHjBw5knPnzuHo6EhCQgJz586lbdu2ODk54e/vj5OTEz/99BMtWrRg48aNdO3aFRcXF65fv87Zs2fp0qULSUlJyGQyDh48SL9+/bh27Rrbt29n1KhRAOzevZvg4GC8vb0ByM3NvW1afyGXy9m7dy/Lli2jbdu21KtXj3r16uHh4VFOD2DHjh1ERETg7Fw8NCglJYX4+Hi6desGwLvvvsuIESNwd3dn48aNjB49ulIx+mDlxnLHoqPO8vCIgTRqFkhou+asWxhJXm4ejUMCWRj5Ojs2fQ1AeK9O9BjQDb9AXxwcHYiOOgvAYFUFL7ELTeTHX8Z93CM4tAyi8GY6uu278Xh2BPZN/DGUjKdXuDnjPmYIjh1bQqEJ46WrFkN3vq5geM69+nsn7lXbT6Etp1lUaOLGhat0mdAfn1aNyL6RybFtvxHxwmPUb+pNYlRxh7/71Idp2LEZaq0DRoOR9MQUC50srA9VMhWauHrhCv0nDKZR6yZkpqTz29Zf7niOq0xpU/6K9NmtSGbx2VxoIjPuGi2e7kfd1oHk3sgk9ovfaPvSo7gH+ZByNJae70zDs0UA9TsE0eTxrjTo1Izzm61fQ3Vpt5rU3WpaR0+c5psf9xATl0Befj7Ng5tgZ3fnwScn37VMp7r8zZCXX1Gnuu5fK8XfVo+spnooqcCys2oL+UKUrp/SYClcTTHOzrev0Ofc2Kv4TR6Ic9vGGFMySf78Vxq+/DjaYB+yjsTgFh5CvYc7ow32xWt4N65+sgddVNlD1F/kuRXGozrakXJtn8B2r7p8/ifo2qLPtTkWrvLKvzCJPXaenk/2xjfYjyZtg9i8eBP5ubefy/zoC8MrrV2T5Gz6CMxU+5/jU2Pu63VUFpnZfC9r29kuCxcuxM7ODpPJxOzZs1m3bh0GgwGFQkF+fj7Tp09nx44dfPbZZ6xfv56lS5cSExPDnDlzaNKkCc899xz+/v706NGD8PBwpk2bRkhICA0aNGDRokW89tpr+Pr68tprrxEcHMzTTz+Nn58fQIVpKRQKC/+++OIL4uPjcXR0JDMzk5deeon4+PhyegcOHGDu3Ln07t2bCRMm4O7ujslkYsmSJTg7O5OVlUXjxo0ZNmwY33//PfPmzWPMmDFMmDABpVJZUWhK6dygR7XH/X3Hu3srWxnG5VRu2fDaQldlfWHaiWaDdaO7wE9WvW8J/0KUvyDO54aFtjcgY/TJBUJ0P2o1T4hugp24x6yjVJlCdPfnugvRtUW6aNKtG90FV7OcrBvdBfPsbgrRBXFtn621exK2jZ9d+Xm/1cXmxB3WjWoBNx96UIiu58/7hOjeK//ajqKEdaSOolikjmIZUkfx/iB1FMuQOorikTqKZUgdRYl/AlJHEW70FNNRrLundnYUbe+XjoSEhISEhISEhISEhIRQ/pWrnkpISEhISEhISEhISFQFaXsMCYkSfn65cbVrHlosZrgXwM+zfKwb1SJExqJlKzFDvjR9g4TogpihZCAuzu+rxQyX3XhshRBdEDdEVNSQVtOVP4XoAnw84HMhuqKGW4qkQR8xg4uObBFTrhOszK+/W36eXv1t3l98ujxHiO7PLzsK0RXJf9/OFqIbYSi/zUl10MBFjL8gruy98XXF+55KVANmmXWbfxDS0FMJCQkJCQkJCQkJCQkJC6Q3ihISEhISEhISEhISElaQhp5KSNwGuU8QikatwZCN2QyFh7+1+F4VMRKZq2eZvYc3eZsXYdalWdV26xpK3X7tMaZmgRkurthm8X3dwR3x7NOO7DOJOLcK5PrWfaT+dLzGfLbFWChbt0XVuSvmzAzMZjOGTy33ybR/qA/q/oPAaAQg78dd5O/5yaquyFiI0hYV44po3rkFYX0fQJeahdlsZvuqL+5K5++kpqWzOnITMRcS2PL+6rvW8QoPIaBvGIY0HZjNHH/LcuW5llMG4ODpguFmFh6hAUQt30ZWfHKN+vzH6Vj2HInG3UWLDJg0tLfF91dvpPPO1h8J9K5H/JUURvZ/kKb+XlZ1RcVC06kVTg91wpRenAfS/re5Qjvngd3wWjGDmFaPYM6t3GqWorQVTVpi17ITZn0WmM0Yf/isnI2y60AA5HXqIXNwJG/zKqu6IsueqPsnq
[... base64-encoded binary payload (embedded notebook/diff data) omitted ...]
VPXw8+XPIx2RnZ1ew8fNyZNm8qRqOR6IgF9dIWdf7UPTpi/2APjFnl5SvzvZr3cnIY2hf35bNJ6TgSk97y/nii/BWpLer8iSpvIrWb4jXyd+wcNTwxdzzX0q7Rwvc+Ni/bQF5G7i3riLpGaiIjM4tVMetJ+fMcmz+uvt3WP+2vqJzc1PKbSG0pJ0s0JqShpw3MoUOHOHXqVI2fnTp1ikOHDt229tatW8nLy7vt798JcpU1gcumcHr+p5x/+0s07bxx7tXezEZha82fCzeRtuYbLqzchv+Cp+ulXVhUzMK1XzJrwjAiRw/idNoVDp04bWbz1qfb6RfanknD+jNhaF+i11jemFSUzyJj8Xesba155s2p/Pf1T/jq3c14t/UhqGfwLesobK0JX/IMBxZsIHHFVrRtvXDvGWRmY6W25eCCjRx7fyfnd8UTFv2ERV2RsRClrbS1ZsSiZ9j5xn+JffcrWgR649fDPBbOnq4UZOWTeyWzXr7+xdS5k0n47TAb1nzG3h/2MW3+1Brtgjq15UBc/XOBqPMns7WhxYJpXH8zhozVG7EN8EXdvfqG8dZ+Xli39v7H/RWtLeL8iSxvorSb4jVSE2NmP8WJ346x44OtHP7hEOOiJt6yhqhrpDYSjyfRv9f9mG5zIpRIf0Xl5KaW30RqSzm58VOGScj/xorUUGxg4uPj62woxsfH37b2tm3b/rGGomPXNhgu3cBUXApATnwKLgM6mdlc2fwLRenlNw1q3xYUnL5UL+3jpy9wn6sz1sryB9wdA3z59Yh5DFOvZnCfS/nKfB7NtZxOu0J2nu4f8VlkLP6Of5cAMtJvUFpxrNMJyXTq3/WWddy6+KO7lEFZhc6138/g/UBHM5vDb1f1DMvkckoKLC/sIDIWorS9O/uTnZ6BsUI3NeE0gf3NdbMv3eDcgT/q5efNdH/gfk4eLv/e8d9P0qN/WI12u7fFUlpSWm9dUedP1SmQksvXMVX4ok/8A03fbmY2MlsbtFNGkVFLr/bd9Fe0tojzJ7K8idJuitdITXTq34UziSkApCQk06l/l1vWEHWN1MbAfr1Qq9W3/X2R/orKyU0tv4nUlnJy48ck6H9j5Z4aerpq1Sq2b9/OokWL6NChA4MGDSI2NpaTJ0+yZMkSoqOjsbOz45NPPsHHx4dz584xZcoUXFxcmDlzJgCBgYHs3buX6dOns3//frRaLQaDATc3N8LDw4mPj8fe3p709HQiIiKwsbEBIDMzk59++on8/HxWr17N2LFjycnJ4aOPPqJNmzacO3eOyMhIdDodCxYsQK1Ws3jxYiIjIxk9ejReXl6kp6fz6aef0qpVKzw8PHjttddYv349JSUlzJ8/nxEjRjBy5EhmzJjBxYsX6dmzJwkJCQwcOJCePXtWO5aXV/23ULB2ccCoqxraYNQVonRxrGYnt1Xi+9JonHu2Iylydb20s/J02NnaVL7XqGw4lWveCOwU4MvxM6m0a+XFyT8vAlBQWISzg+au+ywyFn/HoZkjBl1h5Xu9To9Ps1a3rKNycaDkJp1iXSHNXBxqtJUrFfiP7sW+qHUWdUXGQpS2xsWBooIqXYNOj3szn3r5ZAnnZk7odXoA9PkFODg7oFDIMRrvbFaDqPOn0DpRVlClW6bTo9Cax9j1hafJfP8zuIWbdlH+itYWcf5EljdR2k3xGqkJh2aOGCrKd6FOj8bJHrlCTtktHEvUNSIKkf6KyslNLb+J1JZyskRj455qKE6bNo1vvvmGkJAQ4uLi0Gq17N27ly5dutCnTx9CQkIYM2YM0dHRBAcHc+zYMaKiovj888+JiIjgrbfeYtasWUycOJGysjIWLlzI5s2bcXNzIzExEV9fX7p164aHhwcjR440O3azZs0YMGAA6enpPP/885X+zJkzh86dO3Po0CGWLFnCmjVrWLNmDePGjWP37t1MmjSJoUOHAuDh4cGECRPw9PSsfA9UHvcvXnrpJZ544gmef/55ioqKuHHjBnPnzq3xWPWlOCMPhaZqDzaFRkVJDXM9ygwlnF24CZWPG523zmd/t+mYSo11amsdNBQYqnqXdIVFaB3NG4AvPf0o63f+wn+//QUHOxVO9mrcmlWvoO6GzyJj8XfyMnOx1agq36s1avIyb32OTWFGHsqbdKw1KgwZ1Z9Oy5UKwhdPImHpF+SnXreoKzIWorR1GXnY3LSfoK1GTUHm7T+pH/bUI/R+KJxCfSHZmTmoNWp0eQWo7e3Iy85rkBtgUefPmJWD3K5KV65RY8yqirFVCxcUjhrsB/eq/Jt20ggKfknAcPLMXfdXhLbo89fQ5e1uaDfFa+Qv+j85kNBBYRj0hvL8aadCn6dHpVGjy8m/pUYiiLtGRCHSX1E5uanlN5HaUk5u/Nxri9ncU0NP5XI5ffv25eeffyYpKYkZM2awa9cudu/ezcCBAwFISUmpfNLm7e1NcnJy5ff9/PwAcHV1xc3NjaioKKKiohg/fjwGw61PXE9JSWHfvn3ExMRw6NChyuEmzZo141//+hcfffQRgwYNsqBSMy1btkSpVKLRaPD19a31WPUlN+E0tp6uyKzL+xacugWQ8dMRrJzsUFQkCO/IRyrti65kodQ6ILe1tqgd0saHKzeyKa7oHTuacp7endqSq9Ojq5igfT0rjwlD+zL+4T50aOND95AAlFZ193OI8llkLP7OmcMpuHi4YlVxrDZdAzkSl3DLOtcOn0Hj6YK8Qsct1J+02KPYONlVJniFrZLwJZM5EfMdGScu4DMk1KKuyFiI0k5LPIOzhwuKCt2WXduQHHcElaMdNjdVdvXl6w07efGpuURHLOBA7EHad2kHQEhoe/ZXzLGSyWS4uTe/Ze2/EHX+Co8ko3Rvjqxi2Le6czt0e+KRO2qQ26kovZrBlbnvkBWzhayYLQBk/WebxZsoUf6K0BZ9/hq6vN0N7aZ4jfxF3KbdLJ3wBisj3+JI3GH8OwcAENA1kCNxh29ZT9Q1IgqR/orKyU0tv4nUlnKyRGND8dprr732TztxN7Gzs+Pjjz+mdevWDB8+nGXLlqFSqRg1ahQAe/bsISQkBDc3N1JSUkhOTmbUqFGkp6eTnJzMgAEDAMjLy0Ov1zN16lSCg4OZN28eTz75JAkJCWg0Gpo1a0ZxcTEqVVWlmpKSQm5uLgEBAWRlZXH48GEmT57MoEGD6NSpE2VlZbRp04bi4mK2bt1K586diY2NpU+fPgB8/fXX9O/fn6tXr+Ls7MyPP/5I586dcXFxYfv27Xh4eNC2bVvy8vKIjY01e6q5Z8+eGo9VF+dvGlNuKjWiP51Oy8ihOHTxp/haDlc+30OrWY+jaetFbnwKzuFBuA3viaatN+5j+5K+IZa8BPNFabzHV5+UrbRS4OvRnPU7f+HEmTRcnR0Y3i+MD774nj8vXqVzYCt+OZzEpzv2kH49k2NnLvD82CHYWivNdC5uOG72vqF8/jsNpXvEyvLTRWOpkfQ/L/Hws8No3akNOdey+HXLzxa/52syr5hNpUZyzlwm5P8NoXknP/TXczj9xa90efExtIFeXPv9NA988DyuI
8o+yvaW5ujvPnz2PEiBGYMGECpk+fTqLb2tqqYFZGVX4i32oCoDsZB2TpwsCfJS3FxcUkur169cLYsWMByEpRqMwDt23bhmPHjkFVVVXoNUoVKLIy4ZN4MUhmNhJKcenSJTg7O5MXmwNAUFCQ8BL+7bffyNpubNu2DXZ2dsKYv/vuOxLd0NBQ2NvbY+7cuZg9eza2bNlCoguwM8qRb3Py6NEjhIWFkehmZmYiJycHNTU12L59OzIyMkh01dTU4O7ujvXr18Pd3Z0sHRmQGfvY2NiQG/t4enpixowZWLlyJVasWEG22GFlysRSu6GhAZGRkTA3N4eNjQ3ZBgIrXQCIiopCaGgorK2tMXHiRDJzGN5ld9CgQZgwYQLu3r1LohscHIy+fftCVVUVb7/9NunCjNWYu3fvjvnz5+ONN96AiYkJDA0NSXRZzosNGzZg48aNCAwMxI8//ojs7GwS3aioKCQkJKClpQV+fn5kJm7u7u64cuUKSktLUVpaStpfc/r06Rg5ciTee+89XLlyBYsXLybR3bRpE6ytrYV/VNkjCxYsEExhmpqaFLJ1xPLgwQMkJiaiW7ducHR0xMOHD0l07969K9Q9VlZWkgWKt27dQmJiIs6dO4fExETSlGRWJnwSLwbpRFFCKU6ePInw8HBERUVhzpw52Lt3L5n2zJkzsXv3bqiqqiI9PR0TJ04k0eUfjHwKEmUhOyALZhwcHEg1Q0JCcOTIEairqwu7fmIMNcrLy1FWVoaioiIhWGxrayNLy1q5ciU0NTUxZ84chIWFkS0cvvrqK6Gv1siRI0lTtAYMGKBwUkS5WD158qRwffnyZRJdlm51rLRbW1sVdNva2l5qXQAwNTWFkZGRoE3lbsnKZbdnz55wcnJCWFgYOnfuTOpuyWrM9+7dQ1NTE1RUVNDW1obHjx+T6LKcF9ra2tixY4dCiiEFO3fuxM8//wxHR0c4Ozvj4MGDJLo9evSAl5eXcE1VB8oSNTU1nD17lvwU9NSpU5g/fz4A2SbFgQMHyGpB161bJ/xsZWVFVmYwadIkTJw4EXV1ddDV1cWmTZtIdC0tLdHY2AhtbW0SPXlYmfBJvBikQFFCKXr27Ak9PT20trZCXV2ddJeytbUVhYWFaG1thZOTE9mDnOWDkRXZ2dk4f/48WXB769YtnDt3Djk5OUJ6paqqKkljZ16rqakJhoaGcHFxUaiZEgu/+GtpaSHTBIChQ4dixYoVQg0FVZrMwIEDUV9fL7gl1tTUiNYE2LrVsdJWU1PD3Llz8fTpU9y4cQP9+vV7qXUBmRlMQkICHj58iKSkJLKd/PYuu1SO0QUFBaioqICKigpqa2tJDSRYjfnf//437O3toaKigujoaLKWECznRXNzMwDZ/dzS0kJ2orhq1Srcv38fS5cuxW+//YbLly/DwsJCtO7w4cNx9OhRwXCOKlWWJe7u7ujTp4/goC12ffHLL7/g3LlzyM3NFRx2OY4jdWs/cuQItLS0MH78eOTm5kJNTY2kb+6gQYOQnJxMWlsKAPr6+rCxsYGBgYGwCU2RWg+wM+GTeDFIgaKEUly9ehX9+/dHY2MjvLy8UFJSQqb97bffYs2aNZgwYQKysrKwZMkSkjQclg9GVgwYMEDhtE+sS5ujoyMcHR1x/fqe9vhGAAAgAElEQVR1vPXWW2KH9wwBAQFwdXWFiYkJHj16hKNHj5LUBvn5+QlmHWlpaUhJSSHrpRgeHo4xY8YIpyRUbpEHDx7Etm3bFFoKUNSCsnSrY6W9cOFC4UTYwsKC7ESYlS4gS5/asGED8vLyUFlZSdZ2o73LrrybqBgmT54MZ2dnVFdX4/Dhw2QnDwC7MTs6OsLa2lq4t6lOKlnOC3V1dSQmJmLAgAEYNGiQUD8mlvr6ehw8eBD6+voYNWoUFi9eDFdXV9G6sbGx0NTUFL5bynY6TU1NQlpvZWUlWlpa8K9//Uu0bq9evRSe72I3aSwtLaGrq4tjx47ByckJgGxTkyKQ47l+/Tp8fHwAAGPGjEFwcDA8PT1F65aVleG7777D7du3YW5ujqVLl5IYw5w7dw4XLlxAp06dANC1IAGA06dPY/Dgwbh27RoA2jkn8QLgJCSUoKGhgXv69CnX0NDARUREcIWFhWTaR44cUbg+e/Ysia6rqytXU1MjXMfGxpLotufnn38WrTFz5kxu1qxZ3JQpUzg7Oztu5syZ3MyZMzkHBweCEXJcTEwMd/z4cY7jOO7YsWNcfn4+ie7u3bsVrnft2kWiu3r1aoVrb29vEt2/0r5z5w6J7nfffadwHR0dTaIrIfE/8fjx4xc9hP/T/PLLL9yDBw84juMU3idiycnJUbi+e/cuie6yZcsUrrOzs0l0OY7jQkJChJ9///13bsmSJaL0ysrKuLKyMm7Pnj1cWloaV1paypWVlXHbtm0TO1SO4ziuubn5L3+mIDw8XOGa6t335ZdfcgkJCdzNmze5EydOcG5ubiS6mzdvVri+fPkyiS7HcVxiYqLC9W+//UamLfH8kU4UJZRi7NixCA0NRf/+/TF79mxS7cmTJytY9FPtBltZWQm7ZwAUnFXFEBMTgwMHDqChoUE4qRR7cvTWW2/hk08++cvfRQGr3c/y8nK0tLRAXV0dLS0tQgNpsbSvG+TTqDIzMzFo0CBR2urq6ti2bZugSVUPs3TpUoWWAvI97iQkWEGZnibxLCEhIQgPDwcAhfeJWHR0dDB//nzo6OjA1tYWXbt2JTk5srCwQHp6usLzTWwqbm5urvAvLi4OgKwO9I8//hClO2vWLBgZGYHjOJw/f174/P79+ySO0UuWLMGIESMwefJkxMfH4+nTp5gxY4ZoXUBmNPfTTz/B1NQUd+/eJasF7du3r4LraUFBAYluWloajh8/DmNjY9Kex4DMhE9+DUd5civx/JECRQmlcHBwQP/+/YVryj45rCz6WT0YT5w4gYiICGGBRpHCwadrFhcXC0XxxcXFZC5tZmZmUFeX3f5aWlp4/fXXSXSHDRsGBwcHdO7cGdXV1UIwKpaUlBRcvHgRJiYmuHfvHrS0tFBUVETSyiIjIwOOjo6CPT1VvS2rlgISEv9XqKurI216/jwYPny4QtsRitZQALBr1y7Mnj0bV65cgaOjI/z9/UVvggHA3r17FYxV7t+/Lxi6KEtNTY3goMo/N1VVVfHpp5+K0vXx8RFabsmTlpYmSpenT58+mDx5MgDZhvT3339PogsAixYtQlBQkJDuTJWq3rFjR2F9de/ePXTv3h0AcPjwYVHtSORbhXAcR+ayC7Bt4STx/JECRQmlUFdXx+HDh2Fubk5erKyrqys0ru3Xrx/OnTtHosvqwWhlZaWwi9+nTx8SXQBISEhg4tLGavfTwcEB7777LnnNkZmZmVBbIg9Fywn5npIAnSsg31IgIiICEyZMwIYNG0h0S0pKoKOjA3V1dcTFxeGDDz4gOx0H2DQqnzp1KlavXq2wucSC9PR02NjYMP0dlNTU1JDdI+Xl5cIikiWUY168eDHWrl3LfNyU8yInJwdTp04V+vHm5eWRBIq9evWCtbU1fv
vtN2hqapK51np4eCg4ZVMEXXzbinHjxsHMzEy0Hg8fJJ46dUo4RcvJyUFycjLef/990frtjdCamppEa/L861//QnBwsHBNZX61c+dO/PDDDwAg9GjetWsX6uvrRQWK8v0kAfzl+1VZWK3hJF4MUqAooRSJiYkYPHiw0ACesliZlUV/+wfj+PHjSXTz8vIwbdo0YdeW4pSLtUsbq91PQDZOAwMD1NXVISIigiRlyMvL6y9PHigCDysrK1RXVwvzLD4+Ht98841oXVYtBbZv344FCxYgNDQU+vr6CA0NRUBAAIk2q0blffv2VfhbVVVVCY6GYsjNzcWOHTtQVVVFnj61detW2Nraory8HP7+/pg9ezbc3NxE665ZswaTJk1CdnY2wsPDMXr0aKxYsUK07pIlS7Bs2TKSU6j2sBqzlZUVTp48iYqKCnz44YcYMmQIwWiB5ORkHD58WCgHoJwXKioqCmY+VP1R8/LykJWVhcbGRuTn5wsndWJp306J78lHQWpqKp48eYI//vgDgYGBcHV1FdW+iaeoqEj42dLSEsePHxetCcg2uOfNm4cePXrg3r17pIZurJrMe3l5YdKkSc98fuLECVG69+/fx5kzZ8jHC7Bt4STx/JECRQmlWLVqlUJLBT5gpIDaoj8pKQl2dnYIDQ1V+Jzqwdja2goPDw/hmmLhwNqljdXup6enJ65fvw59fX1hgUYRKP5dehrfekIMnp6eyM7Ohp6enjBmikCxfUsBqnYFb775JgwNDVFUVITAwEDs3r2bRBf4s1F5WFgYbGxsyHo/du3aFSkpKUIGwsGDB0k2J/bt24d58+bh1KlTcHZ2RnR0NMFoZWhoaMDKygpBQUE4fvw4Dhw4QKLbvXt3WFlZwd/fHwkJCWR9+FxcXFBaWor4+HjBjVNDQ4NEm9WYv/76awCyk51ly5YJAfm4ceOE1Hhl2LlzJzw9PYXnEF9HR8GmTZsUNn3ksxHE8MUXX8Db2xt5eXlITU0V3QB9yZIlCA4OFtqPABBq6KmcWh88eIAZM2Zg9uzZCA4OxtGjR0XpRUREICIiArW1tTh27Bg4joO6uvpfpqMqw9dff82sH6+Hh4dCdgeVe/ZfBYmA+I1uDw8PjB49mny8ANsWThLPHylQlFCK9n33KHfm5C36e/fujd69e4vSu3HjBuzs7JCTk6PQDoPqwdh+4SB2vIAsTdbIyAjvvPMONDQ0yE5geOrr6xEfH4/KykoAdEEzqybzLGloaFAI7qnG3L6lAMXJHCBLG/bz88OwYcPw9OlTsiAfYNeo/PDhw0hPTxeu79+/TxIo9unTB/369cOFCxdgZmYmWPVT0NTUhKKiIhgYGEBfXx+vvfYaiW5VVRUyMjJgYmJCpgn8uaC0t7eHl5cXtmzZAhcXF0ybNg2dO3cWpc1qzN9//z20tbURExODgQMHYu3ateA4DoGBgaJqmt555x0MHDhQuJ44cSLFcAE8mxlANee6dOmCH3/8EYDs/hBbK82XLHz22WeYNWuW8DmfxkiBlpYWKioqoKWlhd69e4t+R7m6usLV1RWnT5/GmDFjiEapyPDhw4UG8FT1pcCr12S+b9++CjWllJkILFs4STx/pEBR4qUjKioKLi4uMDc3R35+PlatWiUqtW7hwoUAgFGjRuG9994T+jy99957JOPt2LEjkpOThXQLKtdMQBbkLl68GLW1tdDT08OmTZvwzjvviNZds2aN4KA2bNgwsrRIVk3mWTJw4ECUlpbC2NgYgGyRRkHHjh0VTppv3LihsHhVlnnz5iElJQVTp05FVlYW6a44q0blLOqkAJnr7fvvv4+amhps374dGRkZJLqA7ATf1dUVmzZtQlJSElljdWNjY6xfvx4BAQFISkoiO2n29PSEtrY2UlJSMH78eHh5eYHjOISEhIjuOWpiYqIwZqoU+EOHDmH69OmIjIzEG2+8AUC2WREWFiZK9+HDh1iyZIlQDkCZVseKw4cPC8Fdhw4dEBISAn9/f6X1+P/2K1euYNCgQULq91+5aStLZWUlpk2bBk9PT/z666+4efMmie6YMWOQlpaG3NxcWFpakr2rL126hODgYCFVnaq3LfDqNZkfNmwYE7dvHnNzc6GO98iRI3B2dibTlni+SIGixEtDXV0dampqUFRUJLRV0NHRIUuf2rdvn8IuH5VZgI+PD7S1tVFUVISBAweSuWYCQFxcHGJjY2FgYICHDx9iy5YtJIFinz598Omnn6KpqQlTp04lq1th1WT+3r17CAoKUrCOp9oBrampweTJk9GxY0dh8fB36T7/hOzsbBw7dkyoz6CoXQVkC0B+EWhjY4MLFy6I1uRh1aj8448/VjDJoVr4rVy5EpqampgzZw7CwsKwePFiEl1A9l3wm0zAs1kUyjJjxgzBkr979+5kur/++ivmzZsnfCcA0NzcjMLCQtHa7733nhBgaGlpkS2uP//8czg4OAhBIiAL0OVrAJWhrKxMoR0NZVoddZP5K1euCP/48gjKenQtLS2F+uDW1laoqamRaL///vtwc3ODoaEhamtr4evrS6L7/fffIysrC6ampkhLS0NmZqaQpiyGkydPIjw8HFFRUZgzZw5ZCjXAtsl8ZWUlHj16BCMjI5JyC0CW5tuvXz9yt28A2LZtG44dOwZVVVXhnSoFiq8uUqAoQQJFCsfZs2cRGxuLsrIy5OTkCPUJFC6fADtbcyMjI3z55ZcICwuDm5sbac2YqampMOY33ngDpqamJLrFxcWoq6sTUsouX75MYtTx0UcfKSzyqPo+srKOB2SGKOnp6UK6JZXpxbp16/D5558Lp7VUphezZs1ScCItLy8n7X8VGhqqkEJFASuTHBMTE2ERtXDhQrJFFKBoZhMQEIBZs2a91GY2/v7+Qvuc4uJilJaW4t///jd27twpWpuV+3JiYuIzLWNUVFREO2kGBAQoPCtHjBghSk+esLAw4btobm7Gd999h40bNyqtp6urCyMjI3Tq1EmoF1NVVcW4ceNIxvvWW2+hsLBQON3ZvXs33N3dSbRZ9ZRsbm5WeI/K19OLoWfPntDT00NrayvU1dVJgyNWvg3x8fHYuXMn+vTpg/Hjx6OgoABffvmlaF0zMzOF3slUbt8AcOvWLSQmJpK/UyVeDFKgKKEULJrMOzk5wcnJCampqaTpdDysbM0fP34MQFbH8/vvvyMzM1O0Jk9xcTF+/vlnmJiYkLaxsLe3R25uLsaNGwdvb2+y3b6lS5eitrYWJSUlMDMzI2syz8o6HgCGDBmCxsZGaGtrk2kCwIABAxRqYg0NDUl0Bw0ahKlTpwKQpclSpUQC7PqjsjLJYbWIAhTNbOLj4196M5tLly4JgaJ8MCdmXrN2Xx48eDC0tLSE69jYWBLXzA4dOmDJkiXIz8+HhYUFSSDOqsm8hYUFLCwsYGtrq9BmiYrNmzcjIiICwJ9mNlSBIqvN1/YnnqqqqqI1AeDq1avo378/Ghsb4eXlhZKSEhJdABg5ciRiYmKQm5sLCwsLsndqbm4uTp8+jbCwMDg6OuLGjRskuq+//jqOHj0qpJ5Sp
spaWloyeadKvBikQFFCKVg0medhESQC7GzNzc3Ncf78eYwYMQITJkwQ1duoPQsXLmTSxoIPYCorK0n/dufOnYOvry90dXVRW1uL1atXk6TWsbKOB2TmDt9//z0MDAyEhZR8gKcsQ4cOxYoVK4RAi6pOSj690sjICFlZWaI1eVj1R2VlksNqEQW8OmY2LIM51u7LqampiIuLExar9+/fJwkUN2/eDAcHB8yZMwclJSUIDg4W3UKGVZN5HhZBIiAztZHPEPjpp5/ItFltvqqpqcHd3V1oMk9llrd582aoqqrCysoKMTExpJkT/v7+aG5uhqmpKW7evIn8/Hx4eXmJ1uVPavlnJ1UpzqlTpzB48GChrpsyVVZfXx82Njbk71SJF4MUKEooBcsm86xgZWsuHxheuXKFRJOHb2NB7Xp68eJFrFy5Eg0NDejQoQMCAwNJXNpSU1Nx9uxZaGpqoqmpCX5+fiSBIrV1vDys0mXDw8MxZswYYc5R1UnJt3mpr69HYWEhPv/8cxJtVv1RWZnksFpEAWzNbPz8/BAYGEhiZsMymGvvvsxz79496OnpidY3MTHB1q1bhWuqe69Xr15CC4j+/fuTOAOzajLPmrlz5yrUuH344Ydk2qw2X7/66ismbSw6dOiA5ORk3L59G7179yZ149TX11fIZggJCSHRffz4MXx8fPDw4UMEBQWRaAKyViHybsCULc7OnTuHCxcuCM9nys1oieePFChKKAWLJvOsqampgaenJ7khSmlpKQIDA5kYrWRmZmLx4sWoq6uDrq4umevp4cOHERcXh9dffx0VFRXw8fEhCRS7d+8uGD1oamqie/fuojUB2SkMbx1PTfvUSqp0WQsLC7i6ugrXVHNCvs2Ljo4OickDD6s6G1YmOawWUcDzMbOxsLAQrft3wVxhYSFJMAfIXERZNBNfsGCBUJdXXFwspM6K5e7du3jy5Ak6d+6MyspK0hYyrJrMU5vk8LBMz2a1+QrINjtUVFTI0k4BwM/PDyUlJYJJTkpKimhHYJ7a2lqFa/5eEYunpyeOHDmCvLw89OzZkyyl9eTJk3j33XeFdzRlizMrKyuFmlX+Hpd4NZECRQmlYNFk/u+gqnvYuXMnZs+ejcuXL5MaorDSBdi5ng4YMACvv/46ANmpJf+S4E8YleXu3bvYt2+fUFPJu9eKJTAwEKNHj4azszN5itbx48eRl5eHESNGYOTIkWS66urqTOzH165dK/ztAKCiogIdO3YUrQs8GwyVlpaSLCBOnTqFsWPHYvjw4cjJyUFAQABWrVolWpfVIgqQ/bcHBQWhQ4cOpBtA2dnZ8Pb2hoGBAcaPHw8dHR1RaVlJSUmws7PDrl27FD6nbAnBqpn4qVOnmJjkTJo0CRMnTkR9fT06deqETZs2idbkoW4yz0NtksPDMj370aNHcHNzQ15eHiwtLeHn50dySsfK9bSlpUXBJIcqSARk5jATJkyAsbExSktLFXpXimHr1q0Kay0qrKyscPLkSVRUVODDDz/EkCFDyLTT0tJw/PhxGBsbg+M43L9/XzK0eYWRAkUJpWDRZJ6HhVEOwM4QhaXRCivX0wcPHuDIkSNCDUhDQwOuXr0quiZtxYoV2LVrF9LT08lMJADA29sb3bp1w6FDh1BfX4+PPvqIpCchIKtb+de//oXk5GR4e3ujb9+++Pjjj0U7aGZkZMDR0ZHMfpw3Sbh48aLC55T9r+zt7YU0Tv7e41P4xFBUVCT8bGlpiePHj4vWBGRp36tXr4aLiwuJnjw7d+7ErFmzyDeAoqKiEBoaioSEBEycOBF+fn6iAsUbN27Azs5O4aQZoG0JQd1MnLVJzqBBg5CcnIzKykro6+vj6dOnJLoAfZN5ViY5PCzTs/fs2QNPT0/06NEDd+7cwe7du0XXggLsXE/bG4rxm3iZmZmi7+2pU6di8ODB5E3m8/Pz4evri549e8LJyYlsU5APvJuamrBs2TL4+/tj9uzZGDduHNTVxYUGRkZG2Lx5MwDZfX3kyBHR45V4cUiBooRSsGwyz8ooh5UhCkujFVaup/x4f/31V+Gz2NhY0TVpKioqcHNzQ8eOHVFZWUnWrqBbt27o2rUrhg4dir1792Lp0qU4c+YMifbdu3ehoqKCa9euIT09HZqamtixYwe6desmpAgqw5o1axRSscT+7Q4cOAB/f38cPXoUQ4cOFT6ntHh3d3cXHFXLy8tF19xGREQgIiICtbW1OHbsmNDyxtbWlmK46Nu3r4JLK2UtL6sNIFNTUxgZGQkLd/kegsrAp8d+9tlnCqcCVL0qAfpm4qzqKvnA8OrVq8JnhYWFpI6O1E3mWZvksEzP7tWrl5BxYGVlReZmzMr1NCUlBRcvXhQ2SLW0tFBUVERSOpOcnAwA+PDDD3H+/Hl07NiRJHV4y5YtQp/moKAgaGhowMfHR7Tu999/D21tbcTExGDgwIFYu3YtOI5DYGAgvL29RWnzQSLP+PHjRelJvFikQFFCKVg2mWdllMPKEIWl0Qor11Nvb++/TDWRDxyVYenSpZg0aRJGjRqFjIwM3L59G1999ZUoTQBYtmwZampqYGhoiBkzZpD2RVu+fDk0NTUxbdo0HDt2TNiBb/+y+9/CcRxUVFTwr3/9SyH1NjY2VlQKkb+/PwAIp548BQUFSmu2hw8SAVkq4P3790Xpubq6wtXVFadPn8aYMWPEDu8ZunbtipSUFMGl9eDBg2T3CKsNoPz8fCQkJODhw4dISkoiq58LDAzErFmzMHbsWGhoaJBmNlA3E/+7ukqxLF++HHv27MH69ethaWkpfE7p6EjdZJ61SQ7L9Ow7d+7g1q1bMDY2xt27d8naTbByPTUzMxM2JuShKJ05deqU8OyxtLREWFiY6IALkLW9MTQ0RGRkJDIyMv5y/Mpw6NAhTJ8+HZGRkcJmVWtrK8LCwpTW5NPg5Q3XANo0eInnjwrHcdyLHoTEq8eOHTueaTL/xRdfkGjPmzcP1dXVr5RRjjz8rjYL+EJ8Cqqrq1FfXw9Alu77zTffiNbk5wMPP0/EMnfuXPj4+JD9t8sTGBiIFStWKDSxb2pqUqgZ+ic4OzvjyJEjsLe3F2o0AIiu0/i7es8ffvhBwXlQDPJ1g3V1deA47pmXvrLk5uaiqqoKPXv2hKGhocL3rSzDhw8XnhOA+O9Yntu3bwsbQBYWFli/fj1JOtmDBw+wYcMGhc0fipOH5ORk6Orq4tSpU+jSpQucnZ1JdIE/F4A8169fJ1m8U9dr8mRkZChshP36669kRjmTJk1CeHi4Qv9ACg4dOoT+/fuTm+QAsrl8+/ZtmJubk7U2AWSntV5eXuT3CAAm5ld1dXUKqZv8+7S+vl509svevXsxZ84c4ZpqTTRkyBBYWFjgk08+wahRo0SnhfJcunTpmawDjuOEHsjKsG3bNixcuBBff/21wn2clJRElnEm8fyRThQllIJlk3lWRjk5OTn49ttvUVhYCHNzc6xdu1YhbU1Z6uvrkZaWJgRdlGm49+/fx5kzZ8jdBj09PZGdnQ09PT2h2JwiUORTkXmePHkiWhMA
Nm7cqJBSmJ6eDhsbGxLtUaNG4e7du1BXV8f+/fsxceJEDBgwQKkgEYBQj+Ht7Q17e3vhcz41SVlmzZoFIyMj1NXVoaqqCkZGRigrK4OOjg5ZoAhA2LHW0dFROJURw549e5CcnIzu3bvDyckJkZGRWLZsmWhdDw8PhcV0WlqaaE2e3r17Kzjttp/bymJoaKhQc0V1ojhkyBDo6Ojg9ddfx9atWzFx4kRcunSJRLu9yVFOTg5JoEhdr8mjqqqKkpIShXuaClZN5lma5MTHxwvlCxMnTsS8efNItBsaGpi5UQ8fPlyohT1z5gxGjx4tWrO2thZHjhx55n1KUSJx+/Zt3Lx5Ez169MDdu3dRXFwsWhMAvvnmGzJjHHmuXLmC1157DeXl5UJ9opubm6hTbT4NnvcU4KFMg5d4/kiBooRSsGwyz8ooZ9u2bfDx8UGPHj1QXFyMrVu3ikqz4HF3d4eFhYVgRU+Zhuvh4YHRo0eTuw02NDQoBOBUtSU9e/bE+PHjYWJiQur89uDBA6xZswZVVVXkLmpxcXHw8PDAt99+CxsbG0RFRWHAgAGideWDROBZ+/R/io+PD2xtbbFv3z58+umnUFFRQVtbG7Zv3y5KV54vvvhCOBEoLi5GamoqiQtlQ0MDIiMjERYWBhsbG7L51v7Ehd+soYDVBlB5eTmTVhPLly9Ha2srysrKMH36dNEpkYBscyIyMhLvvvsudHV1hdPx+vp6EgMh6npNHlb3NMCuyTy1SQ5PYWEhEhIShOslS5aQ6AKydhMjR47EqFGjhO+DgujoaERGRioY2lEEiqzep4As66X96SoFs2bNQkNDg7DpevToUSxYsEC0roaGBqysrBAUFITjx4/jwIEDojXla4PlU/Upa4Qlnj9SoCihFCybzLMyyunfv7+wC/7OO++QuWb26NEDXl5ewjWV4QwgM+uQNzWgarsxcOBAlJaWwtjYGABE16LxsHJ+27dvH+bNm4dTp07B2dkZ0dHRJLqArAG6trY2Hj9+jBkzZii47SmDvHMoD7/YGTdunNK6vAFMRUWFoK+qqkq6McGqXUFrayuAP50X29raRGsCsvSp4OBgYQOByiEZkKXA9+nTR0gjp/qeWbWaqKiogIeHB+nu/Y4dOwAAXl5emDRpkvD5iRMnSPRZ1WtS39PysGoyT22SwyOfmg1AqG/mm86LYeXKlejVqxd+/vlnREZGolu3biSnladPn8ahQ4eEDWMqQztW71NAtnnO4nR13759iI+PR319PQwMDFBRUUESKDY1NaGoqAgGBgbQ19fHa6+9JlozICAAffv2xe+//47GxkahxlTi1UYKFCWUgmWTeVZGOc3Nzbh06ZKC41l5ebnoGq/hw4fj6NGjgtU25e7ZsGHDmPTiq6mpweTJk9GxY0dhgS2/EBSDubk56e4yIDM06tevHy5cuAAzMzOhMTUFhYWFWLRoERwdHfHgwQMUFhaK0vv888/xySefIDo6GjY2NkLK188//0wy3kePHmHt2rUwNTXFnTt3SBo7s25XoKamhrlz5+Lp06e4ceMG+vXrR6J78uRJhIeHIyoqCnPmzMHevXtJdAGZo6O8uyDVgoe61QRPcHCw8JwAZPV/YlPr+XquSZMmoaioCAUFBejbty+Zi6GHh4dQr1lZWUlmRER9T8vDqsk8tUkOT2ZmJpYvXy6895qamhAaGkpykj1gwABcvHgRly9fRmZmJlktIe+Ky0PVsJ3V+5QlDx8+RFxcnFD/T/WMU1NTg6urKzZt2oSkpCRkZ2eL1vTy8sLgwYMRHh6u8IyjqnOXeDFIgaKEUrBsMm9kZPSMUQ4FJ06ceKaW8sKFC7h//76oQDE2NhaamprCi43SYS8iIgL9+vUj68XHk5ubi/T0dOGE52VvhpuZmYn3338fNTU12L59OzIyMsi0V/GY/xwAACAASURBVKxYgczMTNjZ2SE/P1+0K+Ann3wCQHbCwy9ITE1NSQI6AFi/fj1iYmJw+/ZtvPnmm5gyZYpoTVbtCngWLlzIxJyiZ8+e0NPTQ2trK9TV1UlPV3v06IHU1FShJURcXBzJTj51qwkeTU1NbNiwQcjEoDQB41MB+brY2bNnk8y79vWaVK6Z1Pe0PKyazIeEhCA8PBzAn70PKdDQ0BBOmfkMEoDmJNvBwQEGBgZYtGgRAgMDyYxW+vTpo2AGRlVqwOp9ypIOHToA+DOtnqr2ccGCBQrPs/Z1yMrAG0a1d4im2nCUeDFIgaKEUrBsMs/KKKe9uQiPWJORLl26YMOGDcL1rVu3ROnJY2ZmBk9PT+GaKq11yJAhaGxshLa2Nokea1auXAlNTU3MmTMHYWFhWLx4MZm2vr6+YKBBZd4CyE50Tp8+DTMzMxQXFyM3N5dEV1NTU6G/I4WxD9+uwNDQUMFZtr17pBhYmFNcvXoV/fv3R2NjI7y8vMgCDQDYv3//M46qFIEidasJno0bN2L06NG4cOECRo8e/UwvOjEUFxcrpJtS1V+xMutqf0/HxsaSzWNWTeZZmeS0NxbhoUhRPnPmDBITE3H58mXk5ubC0dGRJJskOjoa27dvR6dOncBxHOLi4kRrAuzep4AslZPPdKmsrERLSwuJ6/CDBw+QmJiIbt26wdHRkcyzgSVqampwc3MTsl7kMx0kXj2kQFFCKVg2mWdllPNXQSIA0c2/LSwskJ6erpDOQpVa98YbbzBJa/3hhx/w/fffw8DAQEg9pXAbbA/VYkdVVRVNTU0wNDSEi4sLkzYZ1Hh5eeG7774TaoEoemoBstPgHTt2MDH28ff3R0hICDQ1NXHv3j34+fmRLNJYmVNs3rwZqqqqsLKyQkxMjEK6k1hYOaquWrVK2L1va2sjq0WztLTEBx98gOLiYgwbNgxZWVkkugAEo66/u1YWVuYiISEhiImJgYaGhjDfqFpNsGoyz8ok56+CRAAkm7vXr18X3qtRUVGIiopCYmKiaN23334bFhYWwrWDg4NoTYDd+xSAQjul5uZmfPfdd9i4caNo3XXr1gk/W1lZPVNz+jLi7e2N5ORkFBQUYNiwYRg5cuSLHpKECKRAUUIpWDaZZ2mUw4K9e/c+c/KgbGuF9iQkJGDw4MFCqiXV6cNHH32kkG4bExNDohsTE4MDBw4oBAQUi52AgAC4urrCxMQEjx49wtGjR0naK7DE2NgYW7duJddlaezz3nvvISgoCObm5kJfNwpYmVPwaVkAMHv2bBJNHlaOqhUVFcLPt2/fxuHDh8l6EpaVlaGqqgrx8fG4fPkyvv76a9G6gCy7w8/PD8bGxrh37x5ZiiErc5GbN2/i/PnzUFVVBUCbWs+qyTwrkxyWrFq1Curq6rC1tcWiRYvI/n5FRUWYOXMmTExMANClUbN4n+bm5gr/+E21trY2/PHHH6K1AVm7JS0tLYwfPx65ublQU1MjKQlgdQLKY2trK3oTXuLlQAoUJZSifY+xyspKMm2WRjksYNnLTf70AZDt4FKwdOlS1NbWCs11KeqNAFkdaEREhOAUSRUQWFlZ4d133wUAWFtbk56WtIfqFJQVLI19BgwYgML
CQkRGRmLevHkYMWIEiS4rcwqWUG961NXVoaamBkVFRSgvLwcgC3Q1NDRIxuvq6oqGhgZMnz4dGzZswMyZM0l0AVnNX0xMjFBjSlXzx8pcZODAgUKQCEChybpYPvvsMyZtEFiZ5LAMCJycnDB//vxnXJ7F0traKvTkA+iCZhbv05qaGpSWlqK6ulrIrFJVVVXYABHD9evXBVOtMWPGIDg4WCF9VllYnYBK/N9DChQllIJlk3mWRjnyUAUE7U8e3n//fdGaPHp6ekwaR587dw6+vr7Q1dVFbW0tVq9eTVLMbmVlJQSJgCyooaC8vBwtLS1QV1dHS0uLsNCmgNUpKCtYGvvMnTsX7u7uOHXqFM6ePYt58+aRnDazMqdgycmTJ3HgwAGhn53YTY+zZ88iNjYWZWVlQh2zuro6SfsRAHjzzTfR1taGjh07wtfXV+E+FIuqqioGDx4MPT09mJubKwRhYmBlLpKSkoKjR48qtP+hmm+smsyzMslhGRBQ1Oz+FayC5pEjRyImJga5ublkGx7W1tawtrbGuHHjRDWr/zvMzMyEE3wtLS28/vrrovRYn4BK/N9DChQllIJlk3lWRjmvWkAAsGscnZqairNnz0JTUxNNTU3w8/MjCRTz8vIwbdo0IRWXKmVo2LBhcHBwQOfOnVFdXa3QtkAsrE5B21NRUUGyk8/S2Ofrr7+Gm5sbAGDUqFGora0l0WVlTsESKysrhabnYjc9nJyc4OTkhNTUVDLXV3mWLVuGiRMnYtSoUf+vvXuPy/n+/wf+KKk+K7NqhE5UlEM0rJmaGslsObS2MJ9kN1/lLM2xmu1mLYkw2gcJw26zFgkzE66Fai6HHDtclFLJuVIqKt6/P7r1/nU5bl2vV+/r0vN+u7ndPu/+eO5583F1vZ+vw/OJ06dPIycnB9OmTWMSOyYmBnv27BFHvYwaNYrJvDxezUXMzMywatUq8ZnV0XqA35B51k1yNLkg2LJlC1xdXVFcXIzw8HBMmDBB/L2kivDwcNTW1sLKygqXLl3C5cuXlWYgqyI1NRVlZWWorq5GREQE/Pz8mNyLzcnJwZ9//gkrKysUFBSo/BnhvQP6PMePH2e2IEaaHxWKpEl4Dpnn1SinuQoClngNju7UqZN4HElXVxedOnViEvfx48cICgoSn1kdGRoyZAjeffddXLt2DVZWVkqrzapivQv6oiKI1a77V199hW+++QY9e/bEokWLVI7XmL+/P7Kzs1FaWoouXbowawDCqzkFDw1/pyUlJfjiiy/ExkmsFj3q6upw9OhRuLq6Ijk5GT169GCygNCnTx94eHgAqC/yWc4OzM3Nxf79+8Xnr776iklcXs1FGheJABAYGKhyzAa8hsyzbpIjRUHASuvWreHo6Ihly5Zh79692LZtG5O4xsbGmDp1qvi8du1aJnGB+u6k48ePx4QJExAVFYVdu3YxiRsYGIhly5aJR51VnTXKcwfU19f3mWPImnKChLwYFYqkSXgOmefVKIfXsUieeA2OLigowJYtW8QdAlZHOZ8+MsSylbcgCDAxMcGDBw+wdetWZseeWO+C7t69W7xP2RirXXc7OzulJjOlpaVKO1+qiI2NxdGjR9GpUyd4eXlh+/btTJoG8WpOwYO2tvZzj3izWvQ4cOCA+HfavXt3xMTEMOmI2zA/sUFZWZnKMRs83WnRzs4OAMSOvk3Fq1lXRkYGQkNDYWJighEjRsDAwIBZV2deQ+ZZN8nhfSSSp5qaGly9ehUmJiYwNjbGf/7zHyZxnz4hwWq2LVB/LPT27dvQ09ODra0ts9/J7du3V5o1WlhYyCQujx3Q3r17i3OEGwiCgB07dqgUl0iLCkXSJDyHzPNqlMPrWCRPvAZHL1iwABs2bMCJEydgb2+PBQsWMIlraGiIo0ePii+trHbRgoODceHCBRgbG4srlKwKRda7oKGhoc/tSseq9X+HDh1w7Ngx2NjYQEtLCz///LPKq8wNqqqqsH37dsTExGDAgAHMWv/zak7BQ3BwMAwMDHDv3j1xrt2NGzcwZswYJvG7du0qxjU1NYWpqSmTuF26dMGIESNgYWGBoqIi+Pr6MokL1N+LnT9/PiwsLFBYWIiamhpER0erPPeQV7OuuLg4REdHY//+/Rg1ahTCwsKYFYq8hszzapLD60gkT61atYKfnx9WrlyJv/76CxkZGUzidu7cGSNHjoS5uTnzz0hJSQnGjh2L4OBgnDlzhtnYm+LiYiQlJTGfNcpjB/RFi4off/yxyrGJdKhQJE3Cc8g8r0Y5vI5F8sRrGLyWlhb8/f1haGiIkpISGBgYMIm7ePFi6Ovr4+rVq3BwcGC2i1ZVVYXff/9dfGZVwADsd0EbF4m5ubniQgerXfcdO3bgxIkT4vONGzeYFYqPHz8GAPH40JMnT5jEffrvuGFnUR01fBZ27NghNgF544038OuvvzIZY5Gbm4tLly7B0tISBQUFyMvLUzkmAPj4+KBfv364cuUKunXrxqQRSoPWrVuLA9obGsQAqi9+PH0v+vLly0z+jq2srGBmZib+O27Xrp3KMRvwGjLPq0kOryORPM2cOVNpIZDF/XmA72dk4MCB8Pf3h6mpKSoqKrBkyRImcYOCgjB06FDms0Z57YACwM2bN7Ft2zZxwVgTFuXJi1GhSJqE55B5Xo1yeB6L1DRz587F6NGjmTe+MDMzw9SpUxETEwN/f39mdyodHBxQWVkpvsSXl5cziQvw2wWNjIxEXl4ebt++jc6dO+Pq1asqxwT4jmNp1aoVJk2ahIcPH+LixYsqf6Ybugs+PYSbZZdk1k6ePCn+iY6OBlB/fOrWrVtM4k+aNInLzhEA2NjYiEVLQkICs52j0NDQ5w5ubygem2rNmjXYvXs3tLW1xQZjLE5NXL58Gfv378edO3fw119/MTuuB/AbMs+rSQ7PgkDTHD16FADw0UcfITk5GYaGhsxGhaxduxabNm0CALRp04ZJTADo2bMnJk2aJD67uLgwictrBxQAVqxYgWHDhuH48eMYNmwYWrVqxSw2aX5UKJIm4TlknlejHF4FgSZydHTk0vji3r17AOrvS928eRPp6elM4v78889Ys2aNeGSPZcdaXrug+vr6WLdunVg0b968mUlcnuNYZs2ahZSUFLGIUfX+1bZt2xAeHo5du3bhvffeE3/Osksya2+++SbMzMzQpk0bcRVfW1sbnp6eTOLb2Ngo7Rw9fbewqdauXYv4+Hi0bt1aLLpYFYrPKxIBqNyROjMzEzKZTNz5Y9XwIigoCJGRkVAoFCgpKWG24w7wGzLPq0kOz4JA0/zxxx/ivwWW94OB+gKu4fsJYDd+y9DQEL/99hs6d+4MLS0tZidTeO2AAvV/t0OHDkVeXh6cnZ25zj0m/FGhSJqE564Gr0Y5vAqC5sTqy4dX4wsbGxskJydj0KBBGDlyJMaNG8ck7ieffIK5c+eKzyzb3fPaBa2trQVQv/tZV1fH7J4NT3/88Qc+/vhjuLi4ICsrC0uXLlWps2p4eDgAYPbs2bCzsxNX2q9cucIkXx7s7e1hb28PV1dXprMIGz
x58gTHjx9nvmB16dIlJCcnizMONaHLYPfu3fHo0SPo6+szjWtqaqrUAETVxjCN8Royz6tJDs+CgJeamhqxK3dJSQnq6uqY7PzZ2dlxuR8MAFlZWfDx8RF3gxUKBZPv6gMHDqBfv344e/asGJcFXjugQH0zqevXr6O0tBR79uyBXC7H9OnTmf43SPOhQpE0Cc9dDV6NcngVBDzxmv3Iq/FF48Lw5MmTTGIC9UdlKyoqcO3aNXTu3Bmff/45s9i8dkF1dHQgk8nQq1cv9O3bVyMu9Dc+Htu9e3fs3buXSdyFCxdiy5Yt4gvJ85r9qJvS0lJMmzaN+QB0XgtWDg4OYpEI1O9EqDtjY2MMGDAAJiYm4u83Fk1nbty4gYMHDzJvAALwGzLPq0kOz4KAl5iYGPGEUm1tLZYvX44VK1aoHDcnJ4fL/WCg/l5348VMVj0QeDV84rUDCgB+fn6oqqrCuHHjEBkZybRpEGl+VCgStcOrUQ6vgoAnXrMfeV3qLyoqQkREBAwMDODq6ooOHTowOZp15MgRLFmyBG+++SYqKirwzTffMGtwwGsXdOrUqeKquJOTE+rq6pjE5WHr1q3YunUrKioqsHv3bgiCIB6vY2HYsGFKDWxOnDiBAQMGMInNy+bNm5kOQG/Aa8Hq2LFj2LVrl9hshuXssrKyMqxbtw6tWrXCe++9BwsLCya/M44cOYLjx4+LBQyr329BQUEYNmwY8wYgPPFqksOzIGAtOztb/NMwj/bJkyeorq5mEp/n/eCneyA4Ojoyievm5ob4+Hjxvjerzue8dkCB+qK5YTFwxowZLfq48+uACkWidng1yuFVEPDEc/Zj48YXrKxfvx4TJkyAXC6Hu7s7wsPDmRSKqampOHToEHR1dVFTU4OwsDBmhSKvXVBeq+I8+Pn5wc/PDwcOHMDw4cOZxy8qKsKcOXPEf2+nT59W+0KxS5cuTAegN+C1YGVmZiYOmhcEATt37mQSFwCioqLQt29f5Obmok+fPli5ciWTI4yOjo5Ku1wNhZ2q7OzslAbLs7pHyBOvJjk8CwLWysvLUVRUhPv376OoqAhA/f3gxv9fquLp+8Es3b17F/7+/sxPIISHh6O2thZWVla4dOkSLl++rNTDoal47YAC9YtWDb877e3tNaLDPHkxKhSJ2uHVKIdXQcCTps1+tLa2hpOTE86fPw9dXV2Vm1006NSpk7g7p6uri06dOjGJC7DfBeW9Ks4TjyIRAO7cuaN0XFgTdnhYD0BvwGvBqqFILC0thZGREQIDA5nEBeqLZi8vL8TExOCtt95i9rlOS0vD3r17YW5uLs5HZbEL6uzsjDVr1igtNqp74zJeTXJ4FgSsOTk5wcnJCZ6enujcubPU6fwrsbGxXE4gGBsbY+rUqeLz2rVrVY4J8NkB3b17NxISElBcXCy+YwmCAD09PZVjE+lQoUjUDq9GObyORfKkabMfFQoFzp07h0ePHuHy5cviqrCqCgoKsGXLFlhYWKCgoADFxcVM4gLsd0F5r4proqVLl8LKykp8HjRokITZ/DO8BqDzWrBKT0/HnDlzUFFRgbZt22LVqlXMjr9duXIFt2/fhpaWFioqKnDz5k0mcRvvggLsmlRt3boVPXr0ED9/mtC4jFeTHF5HInlKTU1FWVkZqqurERERAT8/P2YdfHmxtrbmcgKhoqJC6bnh3q2qeOyAuru7w8nJCb/99ht8fHwA1I9cYjnHlEhAIKSFCAkJEeRyubBmzRrh0aNHwjfffCN1Sq90//59ped79+5x+e8kJSUxiXPlyhVhzJgxgqOjozB27FghNzeXSdwHDx4IUVFRgr+/v7By5UrhwYMHTOIKgiBs2rRJEARBiImJEQRBENatW8ckbl5eHpM4r4PS0lJh6dKlwrJly4Tk5GRm/y40UWZmpuDt7S04OjoK3t7eQkZGBpO4X3/9tXD37l1BEATh9u3bQnBwMJO4giAIp06dEj744AOhd+/ewocffiicPXuWSdzDhw8zifO0kJAQpeeW/FnMzc0VfyePGzdOIz57UVFRgiAIgq+vr3DlyhUhIiKCSdxHjx6J//vevXvCrVu3mMQVhPp/cxkZGcL9+/eFixcvCosWLWISNy4uThgxYoQwdepUYcSIEcJvv/3GJO6iRYuE8+fPC6WlpcLZs2eFhQsXMokrCIJQWVkpVFRUCIIgCCUlJcziEmnQjiJpMXgdi+SJ1+xHXt1UbW1tle6AlJSUqBwTqD8+5e/vD0NDQ5SUlMDAwIBJXIDfLuiBAwdgamqKTz/9FHFxcejcubPSLMGWhNcdN55yc3PFHUWWd47WrFmDxYsXw9LSEnl5eVi9ejViYmJUjmtlZSU2LWnXrp3SDq6qDA0NcezYMZSUlDAdGbJu3TqcOXMGo0ePZnr/ul27dlxGLGkiXkciedLT08Pt27ehp6cHW1tbGBkZMYnL8944rxMIvBrP8doBBYB58+Zh1KhR8PDwwKlTp5CTk4Np06Yxi0+aFxWKpMXgVRDwxKuVPq9uqpWVlUhLS0NlZSUAdoXt3LlzMXr0aHh4eOD06dNMv3gmT56M0NBQKBQKpKamMvuCLysrE++WjBkzBsuXL2+xhSKvO248bdq0icsLds+ePcUXtHfeeQcODg4qxwSAvLw8JCUlicez8/PzmcQFgNDQUPj6+jIf8bJs2TJ06tQJCQkJ2LFjBz744AOxoYsq9u/fj379+uH06dMA2I1Y0kQ8CwJeSkpKMHbsWAQHB+PMmTMqd81sjnvjVVVVXBrlHD16FADw0UcfITk5GYaGhkxmSvK6gw0Affr0gYeHBwDAw8MDubm5zGKT5keFImkxeBUEPPFqpc+rm+qUKVNgb2+Ptm3bAmB3N8jR0ZHbFw+vXdC3335b6bnxPaGWhtcdN554vWDX1tbi77//hoWFBQoLC6Gnp4fi4mL88ssvSk1H/q1Zs2Zh2bJl4o7G/PnzmeQL1M8OfPPNNxEZGQkjIyN89tlnTF5WHz9+jFatWkFXVxdnz55FcXExUlNT0a9fP5WKUl6z5zQRz4KAl4EDB8Lf3x+mpqaoqKhQ+fRBc9wbDwsLg5ubGzw8PJh2E//jjz/Ez3L37t0RExOD0NBQlePy2gEFIJ6AalBWVsYsNml+VCiSFoNXQcATr1b6vLqpWlpaKrXuZrWrwfOLh9cuaGFhITZt2gQrKysUFBTg+vXrKsfUVN7e3vjss89w//597NixAytXrpQ6pVfi9YK9b9++Zz7Hx48fx40bN1QqFNu3b4+oqCgA9cVo69atVcqzsf79+8PAwABvv/02fvjhB4waNQp///23ynHnzZuHyspKuLm5YfXq1WKnyx9++EGluE+PzsnKyhKL/paGZ0HAy9q1a7Fp0yYAUBqf0lTN0U114cKFsLa2RlJSErZv346OHTsiICBA5bh2dnbikXJTU1OYmpqqHBPgtwMK1J8gGTFiBCwsLFBUVARfX18u/x3SPKhQJC0Gr4KAJ16t9Hl1U3VxceFyN4jnFw+vXdAFCxZgw4YNiI+Ph729PRYsWMAkribq378/lztuPPF6w
Q4NDX3u8cqGI2ZNNXv2bAwaNAje3t7Yu3cvHj58iPHjx6sUs8H8+fPx+PFjXL9+HePGjWN2v7RLly4ICwuDoaGh+LOamhqUl5c3KZ6vry+2b9+Od999V9zBb7iDPWbMGCY5axqeBQEvLi4uYnEEAIcOHWJyh55nN9VevXohJSUFcrkc6enpcHZ2ZhI3JycHly5dgqWlJQoKCpCXl8ckLq8dUIDfvUoiDS1BEASpkyCkOfj6+ioVBKdOncLWrVslzkoa5eXlSkchWb3AT548Gbq6umJslnMfc3NzuXzxhISEKBUB+fn5XFadMzIy0LNnT+ZxNcH169exfPly5OTkwMbGBnPnzoWFhYXUaf0rvIrcnTt34rPPPlM5zo8//ojp06e/8FkVn3/+OYKCgvD+++8zidcgPz8fhoaG0NHRQWJiIoYOHQozM7Mmx3vw4AEMDQ2RmJiI0aNHiz/ft28fRowYwSJljTNmzBhuBQEvkyZNQkVFhZgvq++RlStXIigoCBMmTMDixYuxa9cuZgt4gwYNgomJCQIDA+Hs7AwdHTb7MI2bajUsWLH4/jt79qy4A5qRkcFsB/R5EhIS1H68CXkx2lEkLQavY5E88Zr9yKubqpGRESIjI8XnzMxMlWM2sLGx4fKiw2sX9ObNm9i+fbt4xJll0axpvv/+e3h6emLy5MnIy8tDWFgYNmzYIHVaL8XrBMKaNWuwe/duaGtri7tdLArFuro6peeamhqVYzaIiooSPx8Au0WPdevWYebMmYiOjoaxsTGio6NVahjUsDPZuEgEWvb9YF5HInnS0tJSOobN6sQLr26qAHDw4EHIZDLI5XJkZ2fD3d2dyfeVjY0Nlx1hXjugQP3CTHR0NMrKyqCnp4eHDx9SoajBqFAkLQavgoAn1sPgG/Dqpmpvb48TJ06If8cymQw9evRgEpuXhISEZ3ZBWVixYgWGDRuG48ePY9iwYWjVqhWTuJrIzs5ObE7Ss2dPXLlyReKMXo3XkeTMzEzIZDJxsPrhw4eZxNXR0UFAQAAsLS1RWFjI9E6erq4uIiMjxYUlVoseXbt2hampKa5evYqIiAiVm3X5+vo+M7BeEATcuHEDrq6uKsXWVDwLAl5WrlypVNw7Ojoyicu6m2pjFy5cEI+Ux8XFIS4uDjKZjFl81oYMGSLugEZERDDbAQXqdysPHDiA2NhYpk34iDSoUCQtBq+CgCdesx95dVPdvHmz2CAHAG7cuCHOrVJXvHZBu3fvjqFDhyIvLw/Ozs44d+4ck7iayNDQEIWFhWKnz06dOgEAduzYwezeLWu8TiB0794djx49gr6+PpN4DaZPn46UlBQoFAq4ubkxLQh4LXrk5OQgLCwMzs7OePjwIQoLC1WK17t3b3zxxRfYv38/HBwcYG5ujqKiIqSlpTHJVxPxLAh4uXv3Lvz9/ZnPMGXdTbWxRYsWQUdHB66urggMDGSyoMsTrx1QAOjQoQO0tbXFUw23bt1iEpdIQ/1/YxDCCM9jkbzwmv3Iq5tqUFCQ0hETXi9orJobAPx2QTMyMnD9+nWUlpZiz549kMvlzO6MaZr169fjl19+AVC/wwMAGzZsQGVlpdoWirxOIBgbG2PAgAEwMTERj566u7urHBeoz9nFxYVJrMZ4LXoEBATg2LFj8PHxwblz51QubufNmwegvllXw31KCwsLnDp1SuVcNRXPgoCX2NhYLjNMWXdTbczLywszZsx4ZkdbVTU1NdDV1QVQvyNaV1fHZDQNzx3QCxcuQCaTQVdXF35+fqBWKJqNCkXSYmjisUhesx95dVN9+h7CwIEDmcSNj4/Htm3bUFVVJb5csyoUee2C+vn5oaqqCuPGjUNkZCT++9//qhxTU4WEhDxzbwyov8uirnidQDhy5AiOHz8uvqju3r2bSVyeeC16dOnSRfzsDRgwQOV4Dc6fP4+LFy/CysoK+fn5TI8YahpNOxIJ8JthyqubKlA/a5SHmJgY8fuotrYWy5cvx4oVK1SOy3MHNCIiAvr6+nB1dYWNjQ3eeecdZrFJ86NCkbQYmngsktfsx8aF4cmTJ5nE5Gnfvn3YunWr2HWS5cs1r13Q2NhYTJkyBV27dkV0dDSTmJrqeUUiALXuRMnrBIKjo6PSboYqXT6bi6YtesyaNQuLFy9GTk4ObG1t8e2330qdkmQ07UgkwG+GaVZWFnx8fJS6qbIqOfkADwAAEZtJREFUFFnLzs4W/yQmJgIAnjx5gurqaibxee2AAsCwYcOwZcsWdOvWDUOGDGEenzQvKhRJi9FcxyJZ4tV5kVc3VV4cHR2VRhN069aNWWxeu6B6enpKnSEfP37cohvaaBpeJxDS0tKwd+9emJubi41WWDW04aVr16548uQJDA0NsWTJErWfhdmjRw/s3LlT6jTUAs+CgBdeM0x5dVPloby8HEVFRbh//7545URbWxsTJ05kEp/XDigAjBw5Uuk7uiWPhnod0BxFQtQYr9mPoaGhGDlyJORyOQICAhAeHq7Wq+4BAQG4f/++uCOsCaMmtm3bBmdnZ3H1ev369ZgyZYrEWZF/ysXF5ZkTCCwKujlz5ogvq4IgYOfOnQgMDFQ5Lk/Tp0/HqFGj4OHhgaSkJOTk5GDatGlSp0VeUxcvXoSDgwPzuE/PD258/09d8Zrry1NISAiMjIxgY2MDLS0tZgvcRBq0o0iIGuPVeZFXN1VeHj9+jKCgIPFZnVeCG6xatUos6hvuVVKhqDl4nUBYtWoVgPpGUkZGRmpfJAJAnz594OHhAQDw8PBAbm6uxBmR11lYWBjc3Nzg4eHBtPEOr26qPKWmpqKsrAzV1dWIiIiAn5+f2s8kzMjIgLu7O65fvw6A3WghIg0qFAlRY7w6L/LqpsrL03O1bG1tJczm5fbu3YuPP/4YM2bMwKRJk8Sf//nnnxJmRf6tfv364e7du9DR0UFiYiKzu0zp6emYM2cOKioq0LZtW6xatYrZnDheGuYnNigrK5Mok6Z5+PAh83EkhJ+FCxfC2toaSUlJ2L59Ozp27IiAgACV4/LqpsrTrVu3MH78eEyYMAFRUVHYtWuX1Cm9UHBwML777jssXrxY6SoLqwVuIg0qFAlRY7w6L/LqpsqLoaEhjh49Kr6wqvNRluzsbIwcOfKZXVorKyuJMiJN8b///Q8zZ85EdHQ0jI2NER0dzeSlMjExEQkJCTAxMcGdO3ewevVqtS8Uu3TpghEjRsDCwgJFRUXw9fWVOqXnetEYDFYLbKR59OrVCykpKZDL5UhPT2c2E5RXN1We9PT0cPv2bejp6cHW1hZGRkZSp/RCtra2aNWqFdLS0pQKxXPnzmnc8Vny/1GhSIga49V5kVc3VV4WL14MfX19XL16FQ4ODmp9lKW8vBxyuRwpKSlK867oZVWzdO3aFaamprh69SoiIiKwceNGJnGtrKzEFv3t2rXTiAUEHx8f9OvXD1euXEG3bt3U9rje0qVLYWdn98zPWS2wkeYxZMgQmJiYIDAwEBEREdDRYfOqyqubKk8lJSUYO3YsgoODcebMGbUe9XLx4kUsXLgQCoVCPHYK
1H/+XtT5mqg/KhQJUWO8Oi/y6qbKi5mZGaZOnYqYmBj4+/sze2nnwdPTE7///juysrKUGu7Qy6pmycnJQVhYGJydnfHw4UMUFhYyiZuXl4ekpCRYWFigoKBAY45l2djYiPfFEhIS1PKeVEhICPr16/fMz8+cOSNBNqSpDh48CJlMBrlcjuzsbLi7uzO5q8irmypPAwcOhL+/P0xNTVFRUYElS5ZIndILLV26FJmZmYiPj4eXl5f4c03oKUBejApFQtQYr9mPU6ZMUeqmqs47dABw7949APV3pW7evIn09HSJM3qxAQMGYMCAATh9+jT69+8v/pxeVjVLQEAAjh07Bh8fH5w7d47Z8bdZs2Zh2bJl4svq/PnzmcTlae3atYiPj0fr1q3FxkzqWCg2LhKrqqrEu5RpaWnPLSCJerpw4QIGDx4MAIiLi0NcXBxkMpnKcauqqpRO0miCtWvXYtOmTQCgNH9VHenr66Nv377o1q0bDA0NxZ/TaAzNRoUiIWqMV+dFXt1UebGxsUFycjIGDRqEkSNHYty4cVKn9EqNi0QA9KKqYbp06SIu0gwYMIBZ3Pbt2yMqKgoAUFtbi9atWzOLzculS5eQnJwMbW1tAFD7uY9btmzBnj17UFlZCRMTE9y+fZvr3DjC1qJFi6CjowNXV1cEBgYym/HLq5sqTy4uLuJRdQA4dOgQs8ZavDQuEgHAwMBAokwIC1QoEqLGeA2D59VNlZfGheHJkyclzIQQ1cyePRuDBg2Ct7c39u7di4cPH2L8+PFSp/VSDg4OYpEIPPsiqG7u3LmDxMRE8aj65s2bpU6J/AteXl6YMWMGtLS0mMbl1U2Vp6ysLPj4+IiFrUKhUPtCkbxeqFAkpAXi1U2Vl6KiIkRERMDAwACurq7o0KEDs1VmQppTt27d4O3tDQDw9vbGjz/+KHFGr3bs2DHs2rUL5ubmAOqPwKvzruIbb7wBAOId7Ly8PCnTIf8Sr91fXt1UedLS0sLcuXPFZ02471dTUwNdXV0A9c146urqlBq7Ec1ChSIhLRCvbqq8rF+/HhMmTIBcLoe7uzvCw8M1rlDUhCNDhL+6ujql55qaGoky+efMzMywatUqAIAgCNi5c6fEGb3crVu3IJPJ0LFjR7i7u6v13FXSfHh1U+Xp6RnC6j5KBwBiYmLEXgq1tbVYvnw5VqxYIXFWpKnU/1NCCGGOVzdVXqytreHk5ITz589DV1f3mRmF6ig+Ph7btm1DVVWV2ACECkWio6ODgIAAWFpaorCwUJzrps4aisTS0lIYGRkhMDBQ4oxe7rvvvhP/t6OjI81wIwD4dVPl6e7du/D394dCoUD37t0RFhamtuNpsrOzxT+JiYkAgCdPnqC6ulrizIgqqFAkpAXi1U2VF4VCgXPnzuHRo0e4fPkyioqKpE7plfbt24etW7fC2NgYALB7926JMyLqYPr06UhJSYFCoYCbm5tGHH9LT0/HnDlzUFFRgbZt22LVqlVqvbOxceNGTJ48GUD9MdSQkBCxgRBpuXh1U+UpNjYWwcHBsLS0RH5+PjZu3IilS5dKndZzlZeXo6ioCPfv3xe/o7W1tTFx4kRpEyMqoUKRkBaIVzdVXiZPnozQ0FAoFAqkpqZqxPwrR0dHsUgE6u+mEQLUN5NycXGROo1/LDExEQkJCTAxMcGdO3ewevVqtSwUi4uLcf36dVy9ehWnTp0CUL+jwbopCtFMvLqp8mRtbS2eOnB0dIRcLpc4oxdzcnKCk5MTPD09aRf/NUKFIiEtEK9uqrzY2toqzb8qKSmRMJt/RqFQYOzYseLOrUKhQEJCgsRZEfLvWVlZiS3627VrBysrK4kzer7MzEwcOXIEWVlZ4mdNW1sbH374ocSZEXXAq5sqT/n5+cjMzIS5uTkKCgpw7do1qVN6pdTUVJSVlaG6uhoRERHw8/NTy7mr5J+hQpEQovYqKyuRlpYmdjGUyWRYs2aNxFm93OPHjxEUFCQ+a0K3OkKeJy8vD0lJSbCwsEBBQYHazl11d3eHu7s7Lly4oBF3P0nz0sRZml9++SVCQkKgUChgb2+vEadpbt26hfHjx2PChAmIiorCrl27pE6JqIAKRUKI2psyZQrs7e3Rtm1bAMD9+/clzujVnu5WR50XiaaaNWsWli1bJr6szp8/X+qUXury5cu4du0aRowYgcTERPTs2RNdu3aVOi1C/rWqqiql0zSaQE9PD7dv34aenh5sbW1hZGQkdUpEBVQoEkLUnqWlJUJCQsRndd3RaMzQ0BBHjx5FaWkpAM3YBSXkedq3by82g6mtrUXr1q0lzujlLly4gMWLFwMAhg8fjqioKAQHB0ucFSH/XlhYGNzc3ODh4aH2HVoblJSUYOzYsQgODsaZM2dw6dIlqVMiKmj17bfffit1EoQQ8jI1NTW4cOECKisrUVxcjB07dojd69TV119/jeLiYpw4cQK6urq4du0avLy8pE6LkH9t9uzZqKqqQo8ePZCYmIizZ8+q9dHOvLw89OvXD0D9OJKsrCzxmRBNYmtri0GDBiElJQXx8fFQKBTo37+/1Gm9VF1dHaZNm4bevXujTZs2cHNzg76+vtRpkSaiHUVCiNpLSEiArq6ueJRToVBInNGrmZmZYerUqYiJiYG/vz82btwodUqENEm3bt3g7e0NAPD29saPP/4ocUYvl5OTgz///BNWVlZqfaeSkFfp1asXUlJSIJfLkZ6erhHjdNauXYtNmzYBANq0aSNxNkRVVCgSQtSekZERIiMjxefMzEwJs/ln7t27B6B+SPnNmzeRnp4ucUaENE1dXZ3Sc01NjUSZ/DOBgYEadaeSkBcZMmQITExMEBgYiIiICOjoqP9ru4uLi9glGQAOHTqEoUOHSpgRUYX6/4sjhLR49vb2OHHiBCwtLQHU3/fr0aOHxFm9nI2NDZKTkzFo0CCMHDkS48aNkzolQppER0cHAQEBsLS0RGFhoVofOwWU71QCQGFhoYTZENJ0Bw8ehEwmg1wuR3Z2Ntzd3dX+rmJWVhZ8fHzEPBUKBRWKGkxLEARB6iQIIeRlXFxcxHmEAHDjxg0cPnxYwowIaVlSUlLEHTp1P/5WWVmJPXv2iPNWT58+jZ9++knapAhpArlcjt69e0MmkyEuLg5FRUWQyWRSp/VS//d//wd/f3/xec+ePRox1oM8HxWKhBC1l5CQoDSwNy0tDQMHDpQwo1crKipCREQEDAwM4Orqig4dOqBv375Sp0XIa2/evHno2bMnzp49C2dnZ6SkpFDHYaKRBg8eDB0dHbi6umL48OEa8R1SXl6uNBqqpqYGurq6EmZEVKEtdQKEEPIqjYtEAGpfJALA+vXrMWHCBJibm8Pd3R179+6VOiVCWoRu3bph4sSJ6NmzJ3x8fNCrVy+pUyKkSby8vHDw4EGEhIRoRJEIAHfv3sXYsWPxzjvv4IsvvkBRUZHUKREVUKFICCEcWFtbw8nJCfr6+tDV1UWHDh2kTomQFiEvLw8PHjxAaWkpTp8+DblcLnVKhDTJzJkzoaWlJXUa/0psbCyCg4Px119/Yf78+dT
xW8NRoUgIIRwoFAqcO3cOjx49wuXLl2lVlZBmMnjwYGRnZ8PT0xPff/+92s9cJeR1Ym1tjd69e+Ott96Co6MjOnfuLHVKRAXU9ZQQQjiYPHkyQkNDoVAokJqaSpf5CWkmp06dgpeXF+zt7bF7926p0yGkRcnPz0dmZibMzc1RUFCAa9euSZ0SUQEVioQQwoGtrS1+/fVX8bmhAyMhhK9r167Bzs5O6jQIaZG+/PJLhISEiF2SaZFUs1GhSAghHFRWViItLQ2VlZUA6mc/UudFQvjr06cPKisrYWhoCAD46aefMHHiRGmTIqSFqKqqUlokJZqNxmMQQggHvr6+sLe3R9u2bQHUH4fbunWrxFkR8vr78MMPUVJSAhMTEwD1izbU0IaQ5jFmzBi4ubnBw8MDNjY2UqdDVEQ7ioQQwoGlpSVCQkLE5/z8fOmSIaQFOHnyJPr3749PPvkEc+fOFX8eHx8vYVaEtCwLFy6EtbU1kpKSsH37dnTs2BEBAQFSp0WaiLqeEkIIBy4uLti1axdOnTqFU6dOITY2VuqUCHmt7d+/H9ra2nj33XeVfv7+++9LlBEhLU+vXr2Qnp4OuVyOY8eOUcdvDUc7ioQQwkFCQgJ0dXXx5ptvAqgfl0EI4ad169YoKipCWloaunbtKv78l19+wfz58yXMjJCWY8iQITAxMUFgYCAiIiKgo0OlhiajO4qEEMLB/PnzERkZKT5nZmaiR48eEmZEyOvt999/R0JCAvLz82Fubo6G15sbN27g8OHDEmdHSMtQXV0NmUyGjIwMtG3bFu7u7nRXUYNRmU8IIRzY29vjxIkTsLS0BFDf9ZQKRUL48fT0hKenJ2QyGQYPHiz+/OjRoxJmRUjLcuHCBfHzFxcXh7i4OMhkMomzIk1FO4qEEMKBi4sLunTpIj7TrgYhhJDX3eDBg6GjowNXV1cMHz4cffv2lTologLaUSSEEA6CgoLw6aefis9paWkSZkMIIYTw5+XlhRkzZkBLS0vqVAgDtKNICCGEEEIIIUQJjccghBBCCCGEEKKECkVCCCGEEEIIIUqoUCSEEEIIIYQQooQKRUIIIYQQQgghSqhQJIQQQgghhBCi5P8BZZigRg0PMu8AAAAASUVORK5CYII=\n", - "text/plain": [ - "
    " - ] - }, - "metadata": {}, - "output_type": "display_data" + "execution_count": 7, + "id": "2b363d73", + "metadata": { + "collapsed": false, + "editable": true, + "jupyter": { + "outputs_hidden": false } - ], + }, + "outputs": [], "source": [ - "import matplotlib.pyplot as plt\n", + "# Common imports\n", + "import os\n", "import numpy as np\n", - "from sklearn.model_selection import train_test_split \n", - "from sklearn.datasets import load_breast_cancer\n", - "from sklearn.linear_model import LogisticRegression\n", - "cancer = load_breast_cancer()\n", "import pandas as pd\n", - "# Making a data frame\n", - "cancerpd = pd.DataFrame(cancer.data, columns=cancer.feature_names)\n", - "\n", - "fig, axes = plt.subplots(15,2,figsize=(10,20))\n", - "malignant = cancer.data[cancer.target == 0]\n", - "benign = cancer.data[cancer.target == 1]\n", - "ax = axes.ravel()\n", - "\n", - "for i in range(30):\n", - " _, bins = np.histogram(cancer.data[:,i], bins =50)\n", - " ax[i].hist(malignant[:,i], bins = bins, alpha = 0.5)\n", - " ax[i].hist(benign[:,i], bins = bins, alpha = 0.5)\n", - " ax[i].set_title(cancer.feature_names[i])\n", - " ax[i].set_yticks(())\n", - "ax[0].set_xlabel(\"Feature magnitude\")\n", - "ax[0].set_ylabel(\"Frequency\")\n", - "ax[0].legend([\"Malignant\", \"Benign\"], loc =\"best\")\n", - "fig.tight_layout()\n", - "plt.show()\n", - "\n", - "import seaborn as sns\n", - "correlation_matrix = cancerpd.corr().round(1)\n", - "# use the heatmap function from seaborn to plot the correlation matrix\n", - "# annot = True to print the values inside the square\n", - "plt.figure(figsize=(15,8))\n", - "sns.heatmap(data=correlation_matrix, annot=True)\n", - "plt.show()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Discussing the correlation data\n", + "import matplotlib.pyplot as plt\n", + "from sklearn.linear_model import LinearRegression, Ridge, Lasso\n", + "from sklearn.model_selection import train_test_split\n", + "from sklearn.utils import resample\n", + "from sklearn.metrics import mean_squared_error\n", + "# Where to save the figures and data files\n", + "PROJECT_ROOT_DIR = \"Results\"\n", + "FIGURE_ID = \"Results/FigureFiles\"\n", + "DATA_ID = \"DataFiles/\"\n", "\n", - "In the above example we note two things. In the first plot we display\n", - "the overlap of benign and malignant tumors as functions of the various\n", - "features in the Wisconsing breast cancer data set. We see that for\n", - "some of the features we can distinguish clearly the benign and\n", - "malignant cases while for other features we cannot. 
This can point to\n", - "us which features may be of greater interest when we wish to classify\n", - "a benign or not benign tumour.\n", + "if not os.path.exists(PROJECT_ROOT_DIR):\n", + " os.mkdir(PROJECT_ROOT_DIR)\n", "\n", - "In the second figure we have computed the so-called correlation\n", - "matrix, which in our case with thirty features becomes a $30\\times 30$\n", - "matrix.\n", + "if not os.path.exists(FIGURE_ID):\n", + " os.makedirs(FIGURE_ID)\n", "\n", - "We constructed this matrix using **pandas** via the statements" + "if not os.path.exists(DATA_ID):\n", + " os.makedirs(DATA_ID)\n", + "\n", + "def image_path(fig_id):\n", + " return os.path.join(FIGURE_ID, fig_id)\n", + "\n", + "def data_path(dat_id):\n", + " return os.path.join(DATA_ID, dat_id)\n", + "\n", + "def save_fig(fig_id):\n", + " plt.savefig(image_path(fig_id) + \".png\", format='png')\n", + "\n", + "infile = open(data_path(\"EoS.csv\"),'r')\n", + "\n", + "# Read the EoS data as csv file and organize the data into two arrays with density and energies\n", + "EoS = pd.read_csv(infile, names=('Density', 'Energy'))\n", + "EoS['Energy'] = pd.to_numeric(EoS['Energy'], errors='coerce')\n", + "EoS = EoS.dropna()\n", + "Energies = EoS['Energy']\n", + "Density = EoS['Density']\n", + "# The design matrix now as function of various polytrops\n", + "\n", + "Maxpolydegree = 30\n", + "X = np.zeros((len(Density),Maxpolydegree))\n", + "X[:,0] = 1.0\n", + "testerror = np.zeros(Maxpolydegree)\n", + "trainingerror = np.zeros(Maxpolydegree)\n", + "polynomial = np.zeros(Maxpolydegree)\n", + "\n", + "trials = 100\n", + "for polydegree in range(1, Maxpolydegree):\n", + " polynomial[polydegree] = polydegree\n", + " for degree in range(polydegree):\n", + " X[:,degree] = Density**(degree/3.0)\n", + "\n", + "# loop over trials in order to estimate the expectation value of the MSE\n", + " testerror[polydegree] = 0.0\n", + " trainingerror[polydegree] = 0.0\n", + " for samples in range(trials):\n", + " x_train, x_test, y_train, y_test = train_test_split(X, Energies, test_size=0.2)\n", + " model = LinearRegression(fit_intercept=False).fit(x_train, y_train)\n", + " ypred = model.predict(x_train)\n", + " ytilde = model.predict(x_test)\n", + " testerror[polydegree] += mean_squared_error(y_test, ytilde)\n", + " trainingerror[polydegree] += mean_squared_error(y_train, ypred) \n", + "\n", + " testerror[polydegree] /= trials\n", + " trainingerror[polydegree] /= trials\n", + " print(\"Degree of polynomial: %3d\"% polynomial[polydegree])\n", + " print(\"Mean squared error on training data: %.8f\" % trainingerror[polydegree])\n", + " print(\"Mean squared error on test data: %.8f\" % testerror[polydegree])\n", + "\n", + "plt.plot(polynomial, np.log10(trainingerror), label='Training Error')\n", + "plt.plot(polynomial, np.log10(testerror), label='Test Error')\n", + "plt.xlabel('Polynomial degree')\n", + "plt.ylabel('log10[MSE]')\n", + "plt.legend()\n", + "plt.show()" ] }, { - "cell_type": "code", - "execution_count": 27, - "metadata": {}, - "outputs": [], + "cell_type": "markdown", + "id": "de30ce89", + "metadata": { + "editable": true + }, "source": [ - "cancerpd = pd.DataFrame(cancer.data, columns=cancer.feature_names)" + "Note that we kept the intercept column in the fitting here. This means that we need to set the **intercept** in the call to the **Scikit-Learn** function as **False**. Alternatively, we could have set up the design matrix $X$ without the first column of ones." 
] }, { "cell_type": "markdown", - "metadata": {}, + "id": "936b8f0f", + "metadata": { + "editable": true + }, "source": [ - "and then" + "## The same example but now with cross-validation\n", + "\n", + "In this example we keep the intercept column again but add cross-validation in order to estimate the best possible value of the means squared error." ] }, { "cell_type": "code", - "execution_count": 28, - "metadata": {}, + "execution_count": 8, + "id": "399b09d4", + "metadata": { + "collapsed": false, + "editable": true, + "jupyter": { + "outputs_hidden": false + } + }, "outputs": [], "source": [ - "correlation_matrix = cancerpd.corr().round(1)" + "# Common imports\n", + "import os\n", + "import numpy as np\n", + "import pandas as pd\n", + "import matplotlib.pyplot as plt\n", + "from sklearn.linear_model import LinearRegression, Ridge, Lasso\n", + "from sklearn.metrics import mean_squared_error\n", + "from sklearn.model_selection import KFold\n", + "from sklearn.model_selection import cross_val_score\n", + "\n", + "\n", + "# Where to save the figures and data files\n", + "PROJECT_ROOT_DIR = \"Results\"\n", + "FIGURE_ID = \"Results/FigureFiles\"\n", + "DATA_ID = \"DataFiles/\"\n", + "\n", + "if not os.path.exists(PROJECT_ROOT_DIR):\n", + " os.mkdir(PROJECT_ROOT_DIR)\n", + "\n", + "if not os.path.exists(FIGURE_ID):\n", + " os.makedirs(FIGURE_ID)\n", + "\n", + "if not os.path.exists(DATA_ID):\n", + " os.makedirs(DATA_ID)\n", + "\n", + "def image_path(fig_id):\n", + " return os.path.join(FIGURE_ID, fig_id)\n", + "\n", + "def data_path(dat_id):\n", + " return os.path.join(DATA_ID, dat_id)\n", + "\n", + "def save_fig(fig_id):\n", + " plt.savefig(image_path(fig_id) + \".png\", format='png')\n", + "\n", + "infile = open(data_path(\"EoS.csv\"),'r')\n", + "\n", + "# Read the EoS data as csv file and organize the data into two arrays with density and energies\n", + "EoS = pd.read_csv(infile, names=('Density', 'Energy'))\n", + "EoS['Energy'] = pd.to_numeric(EoS['Energy'], errors='coerce')\n", + "EoS = EoS.dropna()\n", + "Energies = EoS['Energy']\n", + "Density = EoS['Density']\n", + "# The design matrix now as function of various polytrops\n", + "\n", + "Maxpolydegree = 30\n", + "X = np.zeros((len(Density),Maxpolydegree))\n", + "X[:,0] = 1.0\n", + "estimated_mse_sklearn = np.zeros(Maxpolydegree)\n", + "polynomial = np.zeros(Maxpolydegree)\n", + "k =5\n", + "kfold = KFold(n_splits = k)\n", + "\n", + "for polydegree in range(1, Maxpolydegree):\n", + " polynomial[polydegree] = polydegree\n", + " for degree in range(polydegree):\n", + " X[:,degree] = Density**(degree/3.0)\n", + " OLS = LinearRegression(fit_intercept=False)\n", + "# loop over trials in order to estimate the expectation value of the MSE\n", + " estimated_mse_folds = cross_val_score(OLS, X, Energies, scoring='neg_mean_squared_error', cv=kfold)\n", + "#[:, np.newaxis]\n", + " estimated_mse_sklearn[polydegree] = np.mean(-estimated_mse_folds)\n", + "\n", + "plt.plot(polynomial, np.log10(estimated_mse_sklearn), label='Test Error')\n", + "plt.xlabel('Polynomial degree')\n", + "plt.ylabel('log10[MSE]')\n", + "plt.legend()\n", + "plt.show()" ] }, { "cell_type": "markdown", - "metadata": {}, + "id": "ded3c9a0", + "metadata": { + "editable": true + }, "source": [ - "Diagonalizing this matrix we can in turn say something about which\n", - "features are of relevance and which are not. This leads us to\n", - "the classical Principal Component Analysis (PCA) theorem with\n", - "applications. 
This will be discussed later this semester ([week 43](https://compphysics.github.io/MachineLearning/doc/pub/week43/html/week43-bs.html)).\n", + "## Material for the lab sessions\n", "\n", + "This week we will discuss during the first hour of each lab session\n", + "some technicalities related to the project and methods for updating\n", + "the learning like ADAgrad, RMSprop and ADAM. As teaching material, see\n", + "the jupyter-notebook from week 37 (September 12-16).\n", "\n", + "For the lab session, the following video on cross validation (from 2024), could be helpful, see \n", "\n", - "## Other measures in classification studies: Cancer Data again" + "See also video on ADAgrad, RMSprop and ADAM (material from last week not covered during lecture) at " ] - }, - { - "cell_type": "code", - "execution_count": 7, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "(426, 30)\n", - "(143, 30)\n", - "Test set accuracy with Logistic Regression: 0.95\n", - "Test set accuracy Logistic Regression with scaled data: 0.96\n", - "[1. 1. 1. 1. 1. 1.\n", - " 1. 1. 0.92857143 0.92857143]\n", - "Test set accuracy with Logistic Regression and scaled data: 0.96\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "/Users/hjensen/opt/anaconda3/lib/python3.8/site-packages/sklearn/linear_model/_logistic.py:762: ConvergenceWarning: lbfgs failed to converge (status=1):\n", - "STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.\n", - "\n", - "Increase the number of iterations (max_iter) or scale the data as shown in:\n", - " https://scikit-learn.org/stable/modules/preprocessing.html\n", - "Please also refer to the documentation for alternative solver options:\n", - " https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression\n", - " n_iter_i = _check_optimize_result(\n" - ] - }, - { - "ename": "ModuleNotFoundError", - "evalue": "No module named 'scikitplot'", - "output_type": "error", - "traceback": [ - "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", - "\u001b[0;31mModuleNotFoundError\u001b[0m Traceback (most recent call last)", - "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[1;32m 34\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 35\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 36\u001b[0;31m \u001b[0;32mimport\u001b[0m \u001b[0mscikitplot\u001b[0m \u001b[0;32mas\u001b[0m \u001b[0mskplt\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 37\u001b[0m \u001b[0my_pred\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mlogreg\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mpredict\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mX_test_scaled\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 38\u001b[0m \u001b[0mskplt\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mmetrics\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mplot_confusion_matrix\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0my_test\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0my_pred\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mnormalize\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;32mTrue\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", - "\u001b[0;31mModuleNotFoundError\u001b[0m: No module named 'scikitplot'" - ] - } - ], - "source": [ ->>>>>>> 9b0e2e75096cc1acee65bfac25f4eff818140252 - "import matplotlib.pyplot as plt\n", - "import numpy as np\n", - "from sklearn.model_selection import train_test_split \n", - "from sklearn.datasets import 
load_breast_cancer\n", - "from sklearn.linear_model import LogisticRegression\n", - "\n", - "# Load the data\n", - "cancer = load_breast_cancer()\n", - "\n", - "X_train, X_test, y_train, y_test = train_test_split(cancer.data,cancer.target,random_state=0)\n", - "print(X_train.shape)\n", - "print(X_test.shape)\n", - "# Logistic Regression\n", - "logreg = LogisticRegression(solver='lbfgs')\n", - "logreg.fit(X_train, y_train)\n", - "print(\"Test set accuracy with Logistic Regression: {:.2f}\".format(logreg.score(X_test,y_test)))\n", - "#now scale the data\n", - "from sklearn.preprocessing import StandardScaler\n", - "scaler = StandardScaler()\n", - "scaler.fit(X_train)\n", - "X_train_scaled = scaler.transform(X_train)\n", - "X_test_scaled = scaler.transform(X_test)\n", - "# Logistic Regression\n", - "logreg.fit(X_train_scaled, y_train)\n", - "print(\"Test set accuracy Logistic Regression with scaled data: {:.2f}\".format(logreg.score(X_test_scaled,y_test)))\n", - "\n", - "\n", - "from sklearn.preprocessing import LabelEncoder\n", - "from sklearn.model_selection import cross_validate\n", - "#Cross validation\n", - "accuracy = cross_validate(logreg,X_test_scaled,y_test,cv=10)['test_score']\n", - "print(accuracy)\n", - "print(\"Test set accuracy with Logistic Regression and scaled data: {:.2f}\".format(logreg.score(X_test_scaled,y_test)))\n", - "\n", - "\n", - "import scikitplot as skplt\n", - "y_pred = logreg.predict(X_test_scaled)\n", - "skplt.metrics.plot_confusion_matrix(y_test, y_pred, normalize=True)\n", - "plt.show()\n", - "y_probas = logreg.predict_proba(X_test_scaled)\n", - "skplt.metrics.plot_roc(y_test, y_probas)\n", - "plt.show()\n", - "skplt.metrics.plot_cumulative_gain(y_test, y_probas)\n", - "plt.show()" - ] -<<<<<<< HEAD -======= - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] ->>>>>>> 9b0e2e75096cc1acee65bfac25f4eff818140252 } ], "metadata": { "kernelspec": { - "display_name": "Python 3", + "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, @@ -2759,13 +2315,9 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", -<<<<<<< HEAD - "version": "3.6.8" -======= - "version": "3.8.3" ->>>>>>> 9b0e2e75096cc1acee65bfac25f4eff818140252 + "version": "3.9.15" } }, "nbformat": 4, - "nbformat_minor": 2 + "nbformat_minor": 5 } diff --git a/doc/pub/week38/ipynb/figures/BiasVariance.png b/doc/pub/week38/ipynb/figures/BiasVariance.png new file mode 100644 index 000000000..3fb3474ac Binary files /dev/null and b/doc/pub/week38/ipynb/figures/BiasVariance.png differ diff --git a/doc/pub/week38/ipynb/figures/adagrad.png b/doc/pub/week38/ipynb/figures/adagrad.png new file mode 100644 index 000000000..97a9cf908 Binary files /dev/null and b/doc/pub/week38/ipynb/figures/adagrad.png differ diff --git a/doc/pub/week38/ipynb/figures/adam.png b/doc/pub/week38/ipynb/figures/adam.png new file mode 100644 index 000000000..a3a39f025 Binary files /dev/null and b/doc/pub/week38/ipynb/figures/adam.png differ diff --git a/doc/pub/week38/ipynb/figures/nns.png b/doc/pub/week38/ipynb/figures/nns.png new file mode 100644 index 000000000..19e31ef05 Binary files /dev/null and b/doc/pub/week38/ipynb/figures/nns.png differ diff --git a/doc/pub/week38/ipynb/figures/rmsprop.png b/doc/pub/week38/ipynb/figures/rmsprop.png new file mode 100644 index 000000000..9f336d033 Binary files /dev/null and b/doc/pub/week38/ipynb/figures/rmsprop.png differ diff --git a/doc/pub/week38/ipynb/ipynb-week38-src.tar.gz 
b/doc/pub/week38/ipynb/ipynb-week38-src.tar.gz index 076993818..1e2fa70f2 100644 Binary files a/doc/pub/week38/ipynb/ipynb-week38-src.tar.gz and b/doc/pub/week38/ipynb/ipynb-week38-src.tar.gz differ diff --git a/doc/pub/week38/ipynb/w38KHBiasVariance.ipynb b/doc/pub/week38/ipynb/w38KHBiasVariance.ipynb deleted file mode 100644 index 43efc5e41..000000000 --- a/doc/pub/week38/ipynb/w38KHBiasVariance.ipynb +++ /dev/null @@ -1,196 +0,0 @@ -{ - "cells": [ - { - "cell_type": "code", - "execution_count": 3, - "metadata": {}, - "outputs": [], - "source": [ - "import numpy as np\n", - "import matplotlib.pyplot as plt\n", - "\n", - "from sklearn.preprocessing import PolynomialFeatures\n", - "from sklearn.linear_model import LinearRegression\n", - "from sklearn.model_selection import train_test_split\n", - "from sklearn.utils import resample\n", - "from sklearn.metrics import mean_squared_error" - ] - }, - { - "cell_type": "code", - "execution_count": 4, - "metadata": {}, - "outputs": [], - "source": [ - "n = 50 # increase, var goes down\n", - "x = np.random.rand(n) * 10\n", - "y = 5 + x**2 + np.random.randn(n) * 3 # decrease,\n", - "poly = PolynomialFeatures(10) # increase, var goes up" - ] - }, - { - "cell_type": "code", - "execution_count": 5, - "metadata": {}, - "outputs": [], - "source": [ - "X = poly.fit_transform(x.reshape(n, 1))\n", - "\n", - "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)\n", - "x_test = X_test[:, 1]" - ] - }, - { - "cell_type": "code", - "execution_count": 6, - "metadata": {}, - "outputs": [], - "source": [ - "models = []\n", - "for i in range(10):\n", - " X_sample, y_sample = resample(X_train, y_train)\n", - " mdl = LinearRegression().fit(X_sample, y_sample)\n", - " models.append(mdl)" - ] - }, - { - "cell_type": "code", - "execution_count": 7, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "92.13165463148223\n", - "92.13165463148225\n" - ] - }, - { - "data": { - "text/plain": [ - "" - ] - }, - "execution_count": 7, - "metadata": {}, - "output_type": "execute_result" - }, - { - "data": { - "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAjMAAAGdCAYAAADnrPLBAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjguMCwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy81sbWrAAAACXBIWXMAAA9hAAAPYQGoP6dpAAA6QUlEQVR4nO3de3RU5b3/8c9cMnsmYRIDgQyRIBepFoOK4KGALVhuWkQtZ3kp6oEjZWkBNUWqRXsqciSpN/QUVqn0uIBKEdf5Ka3VWoja4uF4QxTLxUKt3JQMUYgzk2Rumdm/P4aMDvdoJsMO79dae83M3s/s/d3R5Xx89rOfbTNN0xQAAIBF2XNdAAAAwNdBmAEAAJZGmAEAAJZGmAEAAJZGmAEAAJZGmAEAAJZGmAEAAJZGmAEAAJbmzHUB7SGZTGrfvn3yer2y2Wy5LgcAAJwE0zQVCoVUVlYmu/3Y/S+nRZjZt2+fysvLc10GAAD4Cvbu3asePXocc/tpEWa8Xq+k1B+jsLAwx9UAAICTEQwGVV5env4dP5bTIsy0XFoqLCwkzAAAYDEnGiLCAGAAAGBphBkAAGBphBkAAGBphBkAAGBphBkAAGBphBkAAGBphBkAAGBphBkAAGBphBkAAGBphBkAAGBphBkAAGBphBkAAGBphBkAAGBphBkAAPCVNMWa1eunL6rXT19UU6w5Z3UQZgAAgKURZgAAgKURZgAAgKURZgAAgKVlNcw0NzfrZz/7mXr37i2Px6M+ffpo3rx5SiaT6TamaWru3LkqKyuTx+PRyJEjtXXr1oz9RKNR3XbbbSopKVFBQYGuvPJKffzxx9ksHQAAWERWw8yDDz6oX//611q0aJE++OADPfTQQ3r44Ye1cOHCdJuHHnpICxYs0KJFi7Rhwwb5fD6NGTNGoVAo3aayslKrV6/WqlWrtH79ejU0NOiKK65QIpHIZvkAAMACnNnc+RtvvKGrrrpK48ePlyT16tVLTz/9tN555x1JqV6Zxx9/XPfee68mTpwoSVq+fLlKS0u1cuVK3XLLLQoEAnryySf11FNPafTo0ZKkFStWqLy8XC+//LLGjRuXzVMAAACnuKz2zFxyySV65ZVXtGPHDknS+++/r/Xr1+t73/ueJGnnzp3y+/0aO3Zs+juGYWjEiBF6/fXXJUkbN25UPB7PaFNWVqaKiop0m8NFo1EFg8GMBQAAdExZ7Zm5++67FQgEdO6558rhcCiRSGj+/Pn6wQ9+IEny+/2SpNLS0ozvlZaWavfu3ek2LpdLxcXFR7Rp+f7hqqurdf/997f16QAAgFNQVntmnnnmGa1YsUIrV67Uu+++q+XLl+uRRx7R8uXLM9rZbLaMz6ZpHrHucMdrM2fOHAUCgfSyd+/er3ciAADglJXVnpmf/OQn+ulPf6rrr79ekjRgwADt3r1b1dXVmjx5snw+n6RU70v37t3T36urq0v31vh8PsViMdXX12f0ztTV1WnYsGFHPa5hGDIMI1unBQAATiFZ7ZlpamqS3Z55CIfDkb41u3fv3vL5fKqpqUlvj8ViWrduXTqoDBo0SHl5eRltamtrtWXLlmOGGQAAcPrIas/MhAkTNH/+fPXs2VPnnXee3nvvPS1YsEA333yzpNTlpcrKSlVVValfv37q16+fqqqqlJ+fr0mTJkmSioqKNHXqVN15553q0qWLOnfurNmzZ2vAgAHpu5sAAMDpK6thZuHChfqP//gPTZ8+XXV1dSorK9Mtt9yin//85+k2d911l8LhsKZPn676+noNGTJEa9euldfrTbd57LHH5HQ6de211yocDmvUqFFatmyZHA5HNssHAAAWYDNN08x1EdkWDAZVVFSkQCCgwsLCXJcDAECH0BRrVv+fr5EkbZs3Tvmutu0jOdnfb57NBAAALI0wAwAALI0wAwAALI0wAwAALI0wAwAALI0wAwAALI0wAwAALI0wAwAALI0wAwAALI0wAwAALI0wAwAALI0wAwAALI0wAwAALI0wAwAALI0wAwAALI0wAwAALI0wAwAALI0wAwAALI0wAwAALI0wAwAALI0wAwAALI0wAwAALI0wAwAALI0wAwAALI0wAwAALI0wAwAALI0wAwAALI0wAwAALI0wAwAALI0wAwAALI0wAwAALI0wAwAALI0wAwAALI0wAwAALI0wAwAALC3rYeaTTz7RjTfeqC5duig/P18XXnihNm7cmN5umqbmzp2rsrIyeTwejRw5Ulu3bs3YRzQa1W233aaSkhIVFBToyiuv1Mcff5zt0gEAgAVkNczU19dr+PDhysvL00svvaRt27bp0Ucf1RlnnJFu89BDD2nBggVatGiRNmzYIJ/PpzFjxigUCqXbVFZWavXq1Vq1apXWr1+vhoYGXXHFFUokEtksHwAAWIAzmzt/8MEHVV5erqVLl6bX9erVK/3eNE09/vjjuvfeezVx4kRJ0vLly1VaWqqVK1fqlltuUSAQ0JNPPqmnnnpKo0ePliStWLFC5eXlevnllzVu3LhsngIAADjFZbVn5vnnn9fgwYN1zTXXqFu3bho4cKB+85vfpLfv3LlTfr9fY8eOTa8zDEMjRozQ66+/LknauHGj4vF4RpuysjJVVFSk2xwuGo0qGAxmLAAAoGPKapj56KOPtHjxYvXr109r1qzRrbfeqttvv12//e1vJUl+v1+SVFpamvG90tLS9Da/3y+Xy6Xi4uJjtjlcdXW1ioqK0kt5eXlbnxoAADhFZDXMJJNJXXTRRaqqqtLAgQN1yy23aNq0aVq8eHFGO5vNlvHZNM0j1h3ueG3mzJmjQCCQXvbu3fv1TgQAAJyyshpmunfvrv79+2es++Y3v6k9e/ZIknw+nyQd0cNSV1eX7q3x+XyKxWKqr68/ZpvDGYahwsLCjAUAAHRMWQ0zw4cP1/bt2zPW7dixQ2eddZYkqXfv3vL5fKqpqUlvj8ViWrdunYYNGyZJGjRokPLy8jLa1NbWasuWLek2AADg9JXVu5l+/OMfa9iwYaqqqtK1116rt99+W0uWLNGSJUskpS4vVVZWqqqqSv369VO/fv1UVVWl/Px8TZo0SZJUVFSkqVOn6s4771SXLl3UuXNnzZ49WwMGDEjf3QQAAE5fWQ0zF198sVavXq05c+Zo3rx56t27tx5//HHdcMMN6TZ33XWXwuGwpk+frvr6eg0ZMkRr166V1+tNt3nsscfkdDp17bXXKhwOa9SoUVq2bJkcDkc2ywcAABZgM03TzHUR2RYMBlVUVKRAIMD4GQAA2khTrFn9f75GkrRt3jjlu9q2j+Rkf795NhMAALA0wgwAALA0wgwAALA0wgwAALA0wgwAALA0wgwAALA0wgwAALA0wgwAALA0wgwAALA0wgwAALA0wgwAALA0wgwAALA0wgwAALA0wgwAALA0wgwAALA0wgwAALA0wgwAALA0wgwAALA0wgwAALA0wgwAALA0wgwAALA0wgwAALA0wgwAALA0wgwAALA0wgwAALA0wgwAALA0wgwAALA0wgwAALA0wgwAALA0wgwAALA0wgwAALA0wg
wAALA0wgwAALA0wgwAALA0wgwAALA0wgwAALC0dgsz1dXVstlsqqysTK8zTVNz585VWVmZPB6PRo4cqa1bt2Z8LxqN6rbbblNJSYkKCgp05ZVX6uOPP26vsgEAwCmuXcLMhg0btGTJEp1//vkZ6x966CEtWLBAixYt0oYNG+Tz+TRmzBiFQqF0m8rKSq1evVqrVq3S+vXr1dDQoCuuuEKJRKI9SgcAAKe4rIeZhoYG3XDDDfrNb36j4uLi9HrTNPX444/r3nvv1cSJE1VRUaHly5erqalJK1eulCQFAgE9+eSTevTRRzV69GgNHDhQK1as0ObNm/Xyyy9nu3QAAGABWQ8zM2bM0Pjx4zV69OiM9Tt37pTf79fYsWPT6wzD0IgRI/T6669LkjZu3Kh4PJ7RpqysTBUVFek2RxONRhUMBjMWAADQMTmzufNVq1bp3Xff1YYNG47Y5vf7JUmlpaUZ60tLS7V79+50G5fLldGj09Km5ftHU11drfvvv//rlg8AACwgaz0ze/fu1R133KEVK1bI7XYfs53NZsv4bJrmEesOd6I2c+bMUSAQSC979+5tXfEAAMAyshZmNm7cqLq6Og0aNEhOp1NOp1Pr1q3TL3/5SzmdznSPzOE9LHV1deltPp9PsVhM9fX1x2xzNIZhqLCwMGMBAAAdU9bCzKhRo7R582Zt2rQpvQwePFg33HCDNm3apD59+sjn86mmpib9nVgspnXr1mnYsGGSpEGDBikvLy+jTW1trbZs2ZJuAwAATm9ZGzPj9XpVUVGRsa6goEBdunRJr6+srFRVVZX69eunfv36qaqqSvn5+Zo0aZIkqaioSFOnTtWdd96pLl26qHPnzpo9e7YGDBhwxIBiAABwesrqAOATueuuuxQOhzV9+nTV19dryJAhWrt2rbxeb7rNY489JqfTqWuvvVbhcFijRo3SsmXL5HA4clg5AAA4VdhM0zRzXUS2BYNBFRUVKRAIMH4GAIA20hRrVv+fr5EkbZs3Tvmutu0jOdnfb57NBAAALI0wAwAALI0wAwAALI0wAwAALI0wAwAALI0wAwAALI0wAwAALI0wAwAALI0wAwAALI0wAwAALI0wAwAALI0wAwAALI0wAwAALI0wAwAALI0wAwAALI0wAwAALI0wAwAALI0wAwAALI0wAwAALI0wAwAALI0wAwAALI0wAwAALI0wAwAALI0wAwAALI0wAwAALI0wAwAALI0wAwAALI0wAwAALI0wAwAALI0wAwAALI0wAwAALI0wAwAALI0wAwAALI0wAwAALI0wAwAALC2rYaa6uloXX3yxvF6vunXrpquvvlrbt2/PaGOapubOnauysjJ5PB6NHDlSW7duzWgTjUZ12223qaSkRAUFBbryyiv18ccfZ7N0AABgEVkNM+vWrdOMGTP05ptvqqamRs3NzRo7dqwaGxvTbR566CEtWLBAixYt0oYNG+Tz+TRmzBiFQqF0m8rKSq1evVqrVq3S+vXr1dDQoCuuuEKJRCKb5QMAAAuwmaZpttfBPv30U3Xr1k3r1q3Td77zHZmmqbKyMlVWVuruu++WlOqFKS0t1YMPPqhbbrlFgUBAXbt21VNPPaXrrrtOkrRv3z6Vl5frT3/6k8aNG3fC4waDQRUVFSkQCKiwsDCr5wgAwOmiKdas/j9fI0naNm+c8l3ONt3/yf5+t+uYmUAgIEnq3LmzJGnnzp3y+/0aO3Zsuo1hGBoxYoRef/11SdLGjRsVj8cz2pSVlamioiLd5nDRaFTBYDBjAQAAHVO7hRnTNDVr1ixdcsklqqiokCT5/X5JUmlpaUbb0tLS9Da/3y+Xy6Xi4uJjtjlcdXW1ioqK0kt5eXlbnw4AADhFtFuYmTlzpv72t7/p6aefPmKbzWbL+Gya5hHrDne8NnPmzFEgEEgve/fu/eqFAwCAU1q7hJnbbrtNzz//vP7yl7+oR48e6fU+n0+SjuhhqaurS/fW+Hw+xWIx1dfXH7PN4QzDUGFhYcYCAADaSH299NRT0o035boSSVkOM6ZpaubMmXruuef06quvqnfv3hnbe/fuLZ/Pp5qamvS6WCymdevWadiwYZKkQYMGKS8vL6NNbW2ttmzZkm4DAACyzO+XnnhCuvpq6ZZbUusWLsxpSS3adtjxYWbMmKGVK1fqD3/4g7xeb7oHpqioSB6PRzabTZWVlaqqqlK/fv3Ur18/VVVVKT8/X5MmTUq3nTp1qu6880516dJFnTt31uzZszVgwACNHj06m+UDAHB6271bWr1aevll6YwzUkFmxQqpU6fU9lhzLqtLy2qYWbx4sSRp5MiRGeuXLl2qKVOmSJLuuusuhcNhTZ8+XfX19RoyZIjWrl0rr9ebbv/YY4/J6XTq2muvVTgc1qhRo7Rs2TI5HI5slg8AwOln+3bpueekdeukM8+UJk6Ubr1VcrtzXdkxtes8M7nCPDMAAByDaUrvv58KMG++KX3jG9K//qv07W9LzuP3eZwq88xktWcGAACcgpJJ6a23pGefTQWZCy9MBZi5cyW79R7bSJgBAOB00NycunT03HPShx9K3/qWdNNN0sMPSyeYDuVUR5gBAKCjikRSg3efe06qrZVGjJBuv10655xcV9amCDMAAHQkDQ3SSy9Jv/+9FAxKo0enLh/17JnryrKGMAMAgNXV10t//GNqicel731PWrBAOsbksh0NYQYAACvy+1O9Ly+9JBmGdOWV0pIl0mHPMjwdEGYAALCKXbu+mMSuuFj6/vellSulgoJcV5ZThBkAAE5lf/97agDva69JPXqkJrGbPj3VGwNJhBkAAE4tpilt2pQKMG+9lbrzaOJE6a67TjiJ3emKvwoAALmWTKZm3332Welvf5MGDkwFmPvvt+Qkdu2NMAMAQC7E45mT2A0dKv3bv0nnn2/5SezaG2EGAID2EolINTWpQby1tdLIkVJlZep5SBaUSH7xeMe3Pjqo73yjqxz29g9ihBkAALIpFMqcxG7sWGtNYhcOS3v2SHv2yNy1W6E9n2j/vs+0/dNGbYy5pQvHS5L+fdkGdS9y674J/XVZRfd2LZGnZgMA0NYOHvxiErtEIjWJ3ZVXnnqT2Jmm9Nln0u7diu7ao7pd+7R/32fyH2zQftOl/XkF2m8Uyt+pWHXOAvnNPIWTx+55admy+MaL2iTQ8NRsAADaU21tqvflz3/+YhK7//5v6YwzcldTLKbk3r068I/d2r9rn/bXHpD/QEj7w0ntd+Zrv7NAfneh9ucVqF6GpN6Sq7fkO8q+El+8tdlSOehwplKB5v4/btOY/r52u+REmAEA4KvatSs1gPeVV6TOnaWrr26/SexMU6H9n2n/P3Zr/8592l97UP4DIdWFovLbDPmdBarLK1CdM1/NNruk4tTSSanlKFxOu3yFbpUWGiotdKu00C1foVvdCg35Ct3yKabP/99q7X5ypc79bJfyoxFdMnN5ZlmSagMRvb3zoIb27ZLlP0IKYQYAgNb44INUgPnf/5XKy1O3UM+Y0aaT2MWak6qrb9T+nR9r/y6//LWfaf+BBu1viGl/3
Kb9zgLtd+ar0eE69A2HpK6SvatUdOT+bDappFMqkJR6XSp1JlUaCcr30Qcq3fqeSj/6u3y1u1QUOCBbInH0bpdDeko6X6nQ0pDnPma7ulDka/wFWocwAwDo8BJJU2/vPKi6UETdvG79S+/OJ38JxDSl995LBZi335bOPTcVYO6+u9WT2CWTpg42xbQ/GNF+f7327/HLX3tAdQca5G+IaX9M2m/36IDTc9g381OLS6nlS7wOqdQWk6/hgLp9tk++z/bJV79f3QJ18gU/U2nDAXVtqJfTTB67MJstNZ+NYUiFhanByd/6lnTVVdLw4RlB7c1/HtAPfvPmCc+1m/fYQaetEWYAAB3an7fU6v4/blNt4IueghPedZNMSm+8kZrEbvNm6aKLUgFm3rxjTmLXGG2WPxhJBZXPw6mBtLUHVXewJajYVGdzKW5zHOXbh677fCmouBJxdWusV2nogHyhAyoNfabShoPyhQ6oW8NB+RoOqLThgPLj0cxd2e2SwyF5PFKXLtJ5A6V/+RdpwgRp8OCvPYvwv/TurO5FbvkDER2t/8YmyVeUCozthbuZAAAd1p+31OpHK9494kf3qHfdxOPSX/+a6oH56KPUJHYTJyre/zx92hCTPxhRXTAifyAi/8EG1fnrU3f9hKLaH7erwXbyIaGkJaQ0HFS3hkNhpeGgShsOqDSUCirF4aBsdruUl5cag9O1q9S3byqYXHGFdMEFOXu8Qav+rl/Dyf5+E2YAAB1SImnqkgdfzeiROVwPj01PnB1X3V/+L9Wj0vNs+bv2UF3UTPWmJJ064HDLtJ3cIwU6RZsOhZNUIMkIKo31Ko01qJtiyutWkgomQ4ZIl12WmvXXYs9d+vOWWt33/FbtD37RM9TW88wQZr6EMAMAHVdzOKKGzz5X6MDnCtYHFfq8QaFgk/6x5zP935aP5Uim7ilO2u2K251qMPJV7ynUpwXFijvzTuoYzkTzFz0nDQdTSyQoX6JJpYZU2q1YpRedp06XficVTNztN14kl0KRuAbMXStJWjrl4jafAZh5ZgAAp7ZoVM31n6eCyMGAgp+HFAo0KhQKK9QYSb02RBSKxBWMJxVK2BS05ylkcynkcCmU51YoL19NrqMFB5ukrlLvricso0vj5+rW9Ll8kaBKFVVpQZ5Ke5bKd14/dRvwDfl8ndU53yV7DqbpP9V9ObgM6dOKQdVtjDADACfpa90R00bHaI8ajss0U88XCgaV+DyghoMBBQ8EFAykekNCDRGFGqMKhWMKheOpINJsKiRnanG4FHIYCjndCrqOFkQcypgI5Sh37xyLOx6VN9YkbywsbzImj5lQQyyhqNOlRpdbQaOTmgyPkvbMAbiLKi9rt/lQkB2EGQA4CV/pjpg2PsbXqsE0paam1LOBgkElAgE1HAx+0RsSbFKoMaJgU0yhSLNC0YRC8aSCpkMhOTKDSJ5HIZdHja4v3z6cp9QEJ4cmObEpfTfxyTDiURW2BJFEVIX2pLxup7yFBfKWnCFvaYm8xV55z/DK28mjQrdTXndeqs2h9y5n5riWljEzp9JdN8gOwgwAnMCx7tzwByL60Yp32+TOjeMe46mNmnlxqVa/slmdwyH1jjTIE4/IaI4rL9msmvUv6ECPQuW7nApF4grFTQVNRyqI2JwK2fJSQcTpTl2aceV/KYi0dH0Upz7a1aoQIqWCiDfWpMJYWN7miArVLG+eXd4CQ97iQnm7nCFv1zNSr8VeFXbynDCItAWH3ab7JvTXj1a8K5uU8bdt6cu6b0L/nF0aQdshzADAcSSSpu7/47aj/p99q59DY5qpJygfPKjYpwcU+vSggp99rsCBoP76f//Q98MxOZPNciYTijryFM5zK5xnqNHl0ct1H0lGgfac4VPIfeRc9M+2vGnlLPoZQSQeljcZV6EjKa/hTIWRok7ydi1O9Yx07azCzoXyevOzHkTaymUV3bX4xouO6NHy5ejpzsgOwgwAHMfbOw8ecWuvLZmQN9okb6xJBdGwPPui+t2DH8ubjCl4IKBQsEnBaLNCzVLQnqegw1DIYSjo8ijkylfQla9oXsuMqoe6Qc4+2pP9js9ojsobaVKnWGopTMRUZEvI65C8LrsK8w15izulgsiZpSr0dU1dsunksUQQaSuXVXTXmP6+3I41QlYRZgB0GKZpKp4wFY4nFIknFI4lFGlOKBxtVjjQoEioUZGGJoUbwgo3NikSbFQkEEp9boooHIkrGk8onDAVNu0K2xwK25062+FS1OlSxOlK95QEPV4FPd70sd8PHHrjKpNKTr7mTtEmeaON8kYbVRCLKD8Wlrs5JlciLpuZVMLmUKPLrXpPkfZ7Oyvg7qS4MzUiNuo0FO1k6LNDl4j+6/oLddWFZ7bVn7NDcdhtDPLtwAgzwCks53eutBHTNBVtTqYCxqGQ0RI4IvFk+nM4nlA0llA4HFU41KhwsCEVOBqaFG4MK9wUVSTWrEizqbBsCsuhiN2piCNPYaehsNN1xJ0qx+eSdOgHznNoaQUjHpU3eqhXJB5WUSKWGi/ilLz5hgoLPakxI906q/BMn7xl3VR4hldet1OF7jx1cjvlsNv0xkk+6+ZE2vNZOMCphDADnKLa4+6ZZNJM9Vykw0Vm4Gh5H4k1p4JEU1ThcFSRSCz1Gk69hhvDijRFFYnFFW6WwrIpYmsJGakejZOdQfXo3KklT6nlJDiSCXniUbmbY/IkUos7mZDbbJbHlpTHLrmddnkMp9weQ55O+XIXeeXpfIbcXTvLU9RJbpdThtOuu5/9mw42xo84RjTPUCzPUF5Rd62++7tfOWie6Fk3kmS3pYbccFcOcCTCDI7QUXoDrMo0Tb3wt3267elNR2yrDUR062/f0a3fKtO5hU6FQ02ppTGsSENYkaawwuGYwtF4KpgkTIWTUlh2hW1ORW0OhR0uhZ2pJeY8yQk8juvQvCCOTifVs+FqjstIxORpjsmTjMtjJuS2melg4XG75O7kkdtbIE+nfHnyDXk8hgxP6tXjcsiT55D70PLFZ3vq9dDnPEfbjQVJJE39aMW7krJzR8zJ3HUz7du9teS1ndyVAxwFYeZryMaPfnsHicOPV98Y03++mN3eAKtIJE1FmxOKxpOKHHptuVQSjScUCUcVDUdTr5GYIo1hRRuaFAk2KNrQpGhTRNFITNFoTJFYIvXdpBQ1bYrIrqjNoajNoYgjT1FHniJOl6KOvC8NDD0Gu12/ftt/2MrDJvUwDi2tYDRH5WmOyZ2Iy5OIy63Eod4Lhzwel9z5bnk6eVI9GJ088njccucb8uQbcruc6YDhcTlSweQoocPttMvZhiGjvbTHHTEnc4yBPYu5Kwc4Cp7N9BVl4xJAe1xWONHxjqatn4LaWvFIVNFASJFAKBUWQk2KNqZ6IaKNEUUjUUUicUWjcUVjzYrGmw+Fh4QiidRYjWhzqpcimjAVMe0ZgSJidyhqz1PU4VT0UKCIOF2KO07yekYW2cyk3PGY3M1RuZujMprjMg4NDu1klwrybPLkOVI9GgWp3gx35zPk6XKGPEVeuQs8chupoHG8Hg2308FU
7SeBGYCBTE2xZvX/+RpJ0rZ545Tvats+Eh40+SVtHWay8ejz9nqceos17+3WXcvflOfQ3RPeWJMKYmF1ioWVHwurIBZJTcqVOPTj2RyTN8+u8ed1U+xQz0S0OaFI86GAkDAVTUoR06ZoUorKngoLsitidypqd37xeqgnIuN9S+/E4b0UTpcSrRrQmR3ORLPczTEZibjciZiMRLOMZLMMMyFDSbltpgyHTW6nXYbLIbeRJ8NtyO0xZBwKGUaRV0aRV+5OHhlGnow8hwynXe7DXv+6vU4/+/3WE9bEnSsAcu1UCTOWucz0q1/9Sg8//LBqa2t13nnn6fHHH9e3v/3tdq/j8Am0fvzaChWHA0rKdih52PTZqw6t79pJSVNqNqV40lRzUmo2TTWbNjVLh15tiiv12pCQbrHZFXG4tOziqyRJ17/3JzlMU0mbXX/Z9Ge9ludUUjYlbXYlbDaZNpsSNrsSNnv6fXq7/dB7u/1Qe/uh9zYlbA4lJPWwORR3OLSjay9J0ln1+5SUTc12h2qLukmSvJFGxZypUCFJ97bnH/soXM0xGc2xLwWLeCpYmM1ymwkZNlNuh+2LcOB2ye12ych3y+iUL6OwkwxvJ7mLOsnId8tdkAoWbqddxqFeCsP5xavhtMto50sjfbt6T9xI3LkCAC0sEWaeeeYZVVZW6le/+pWGDx+uJ554Qpdffrm2bdumnj17tmsth0+g9dh3bszasVYN/F7W9n00u4vLjlgXch99OlF3PJIKFIcGc7a8dydiMpLNciebZZhJGTZTxqG7RgwjT4Y7L9Vr4TnUa5HvkdubnwoYZ3hlFBfJ3aU4FT4O67VwOeynxaWQE93Zwp0rAJDJEmFmwYIFmjp1qn74wx9Kkh5//HGtWbNGixcvVnV1dbvWUhc6+viSkoZ65SXjyksk5Egm5EwmlJdMyGmm3n95yTMTcpim8syEnGZSSiaVjDfLbiZlyqbnKy6VJF215RU5k0m1TJre2WuoyJ0nu8Muu8Mhh9Mhm8MhR15e6r3TIUeeU3Z76kffYVP6vd1mk8Nuk12m9hxo1Jv//EwypaTNpr+e/S+SpJH/3CBbMqlmu0P/23ewJGnAJ9sVdhlqcBUokufSoqnDNbyih2yO3F/66ah4ngwAtM4pH2ZisZg2btyon/70pxnrx44dq9dff/2o34lGo4pGo+nPwWCwzeo5vGvfE4so7nDqs4IzJNsXPy5PT/vWSc82eawJs/5QMSrjc2v2eaLjVR3leH/te/ER6zafeY6kL3oDhg7oKRs/olnH82QA4OSd8mHms88+UyKRUGlpacb60tJS+f2H356aUl1drfvvvz8r9Rx+CSDsygw3X+USQHtfVjiZCboOP75Eb0B743kyAHByLDPhg82W+R9w0zSPWNdizpw5CgQC6WXv3r1tVkfLJQDpix/5dI2HXlv7o5+NfX7V4x2Nr8ids9uyT3ctz5O56sIzNbRvF4IMABzFKd8zU1JSIofDcUQvTF1d3RG9NS0Mw5BhtHLGsFbIxiWA9r6scKzjdS9y6z/Gf1PFBQa9AQAASzjlw4zL5dKgQYNUU1Oj73//++n1NTU1uuqqq3JWVzYuAbT3ZQUuYwAAOoJTPsxI0qxZs3TTTTdp8ODBGjp0qJYsWaI9e/bo1ltvzWld2XikfHs/pr69jwcAQFuzRJi57rrrdODAAc2bN0+1tbWqqKjQn/70J5111lm5Lg0AAOSYJcKMJE2fPl3Tp0/PdRkAAOAUY5m7mQAAAI6GMAMAACyNMAMAACyNMAMAACyNMAMAACyNMAMAACyNMAMAACyNMAMAACyNMAMAACyNMAMAACyNMAMAACyNMAMAACyNMAMAACyNMAMAACyNMAMAACyNMAMAACyNMAMAACyNMAMAACyNMAMAACyNMAMAACyNMAMAACyNMAMAACyNMAMAACyNMAMAACyNMAMAACyNMAMAACyNMAMAACyNMAMAACyNMAMAACyNMAMAACyNMAMAACyNMAMAACyNMAMAACyNMAMAACyNMAMAACwta2Fm165dmjp1qnr37i2Px6O+ffvqvvvuUywWy2i3Z88eTZgwQQUFBSopKdHtt99+RJvNmzdrxIgR8ng8OvPMMzVv3jyZppmt0gEAgIU4s7Xjv//970omk3riiSd09tlna8uWLZo2bZoaGxv1yCOPSJISiYTGjx+vrl27av369Tpw4IAmT54s0zS1cOFCSVIwGNSYMWN06aWXasOGDdqxY4emTJmigoIC3XnnndkqHwAAWETWwsxll12myy67LP25T58+2r59uxYvXpwOM2vXrtW2bdu0d+9elZWVSZIeffRRTZkyRfPnz1dhYaF+97vfKRKJaNmyZTIMQxUVFdqxY4cWLFigWbNmyWazZesUAACABbTrmJlAIKDOnTunP7/xxhuqqKhIBxlJGjdunKLRqDZu3JhuM2LECBmGkdFm37592rVrV7vVDgAATk3tFmb++c9/auHChbr11lvT6/x+v0pLSzPaFRcXy+Vyye/3H7NNy+eWNoeLRqMKBoMZCwAA6JhaHWbmzp0rm8123OWdd97J+M6+fft02WWX6ZprrtEPf/jDjG1Hu0xkmmbG+sPbtAz+PdYlpurqahUVFaWX8vLy1p4mAACwiFaPmZk5c6auv/7647bp1atX+v2+fft06aWXaujQoVqyZElGO5/Pp7feeitjXX19veLxeLr3xefzHdEDU1dXJ0lH9Ni0mDNnjmbNmpX+HAwGCTQAAHRQrQ4zJSUlKikpOam2n3zyiS699FINGjRIS5culd2e2RE0dOhQzZ8/X7W1terevbuk1KBgwzA0aNCgdJt77rlHsVhMLpcr3aasrCwjNH2ZYRgZY2wAAEDHlbUxM/v27dPIkSNVXl6uRx55RJ9++qn8fn9GL8vYsWPVv39/3XTTTXrvvff0yiuvaPbs2Zo2bZoKCwslSZMmTZJhGJoyZYq2bNmi1atXq6qqijuZAACApCzemr127Vp9+OGH+vDDD9WjR4+MbS1jXhwOh1588UVNnz5dw4cPl8fj0aRJk9K3bktSUVGRampqNGPGDA0ePFjFxcWaNWtWxmUkAABw+rKZp8FUusFgUEVFRQoEAukeHwAA8PU0xZrV/+drJEnb5o1Tvqtt+0hO9vebZzMBAABLI8wAAABLI8wAAABLI8wAAABLI8wAAABLI8wAAABLI8wAAABLI8wAAABLI8wAAABLI8wAAABLI8wAAABLI8wAAABLI8wAAABLI8wAAABLI8wAAABLI8wAAABLI8wAAABLI8wAAABLI8wAAABLI8wAAABLI8wAAABLI8wAAABLI8wAAABLI8wAAABLI8wAAABLI8wAAABLI8wAAABLI8wAAABLI8wAAABLI8wAAABLI8wAAABLI8wAAABLI8wAAABLI8wAAABLI8wAAABLI8wAAABLa5cwE41GdeGFF8pms2nTpk0Z2/bs2aMJEyaooKBAJSUluv322xWLxTLabN68WSNGjJDH49GZZ56pefPmyTTN9igdAACc4pztcZC77rpLZWVlev/
99zPWJxIJjR8/Xl27dtX69et14MABTZ48WaZpauHChZKkYDCoMWPG6NJLL9WGDRu0Y8cOTZkyRQUFBbrzzjvbo3wAAHAKy3qYeemll7R27Vo9++yzeumllzK2rV27Vtu2bdPevXtVVlYmSXr00Uc1ZcoUzZ8/X4WFhfrd736nSCSiZcuWyTAMVVRUaMeOHVqwYIFmzZolm82W7VMAAACnsKxeZtq/f7+mTZump556Svn5+Udsf+ONN1RRUZEOMpI0btw4RaNRbdy4Md1mxIgRMgwjo82+ffu0a9euox43Go0qGAxmLAAAoGPKWpgxTVNTpkzRrbfeqsGDBx+1jd/vV2lpaca64uJiuVwu+f3+Y7Zp+dzS5nDV1dUqKipKL+Xl5V/3dAAAwCmq1WFm7ty5stlsx13eeecdLVy4UMFgUHPmzDnu/o52mcg0zYz1h7dpGfx7rEtMc+bMUSAQSC979+5t7WkCAACLaPWYmZkzZ+r6668/bptevXrpgQce0JtvvplxeUiSBg8erBtuuEHLly+Xz+fTW2+9lbG9vr5e8Xg83fvi8/mO6IGpq6uTpCN6bFoYhnHEcQEAQMfU6jBTUlKikpKSE7b75S9/qQceeCD9ed++fRo3bpyeeeYZDRkyRJI0dOhQzZ8/X7W1terevbuk1KBgwzA0aNCgdJt77rlHsVhMLpcr3aasrEy9evVqbfkAAKCDydqYmZ49e6qioiK9fOMb35Ak9e3bVz169JAkjR07Vv3799dNN92k9957T6+88opmz56tadOmqbCwUJI0adIkGYahKVOmaMuWLVq9erWqqqq4kwkAAEjK8QzADodDL774otxut4YPH65rr71WV199tR555JF0m6KiItXU1Ojjjz/W4MGDNX36dM2aNUuzZs3KYeUAAOBU0S6T5kmpcTRHm7W3Z8+eeuGFF4773QEDBui1117LVmkAAMDCeDYTAACwNMIMAACwNMIMAACwNMIMAACwNMIMAACwNMIMAACwNMIMAACwNMIMAACwNMIMAACwNMIMAACwNMIMAACwNMIMAACwNMIMAACwNMIMAACwNMIMAACwNMIMAACwNMIMAACwNMIMAACwNMIMAACwNMIMAACwNMIMAACwNMIMAACwNMIMAACwNMIMAACwNMIMAACwNMIMAACwNMIMAACwNMIMAACwNMIMAACwNMIMAACwNMIMAACwNMIMAACwNMIMAACwNMIMAACwNMIMAACwtKyHmRdffFFDhgyRx+NRSUmJJk6cmLF9z549mjBhggoKClRSUqLbb79dsVgso83mzZs1YsQIeTwenXnmmZo3b55M08x26QAAwAKc2dz5s88+q2nTpqmqqkrf/e53ZZqmNm/enN6eSCQ0fvx4de3aVevXr9eBAwc0efJkmaaphQsXSpKCwaDGjBmjSy+9VBs2bNCOHTs0ZcoUFRQU6M4778xm+QAAwAKyFmaam5t1xx136OGHH9bUqVPT688555z0+7Vr12rbtm3au3evysrKJEmPPvqopkyZovnz56uwsFC/+93vFIlEtGzZMhmGoYqKCu3YsUMLFizQrFmzZLPZsnUKAADAArJ2mendd9/VJ598IrvdroEDB6p79+66/PLLtXXr1nSbN954QxUVFekgI0njxo1TNBrVxo0b021GjBghwzAy2uzbt0+7du066rGj0aiCwWDGAgAAOqashZmPPvpIkjR37lz97Gc/0wsvvKDi4mKNGDFCBw8elCT5/X6VlpZmfK+4uFgul0t+v/+YbVo+t7Q5XHV1tYqKitJLeXl5m54bAAA4dbQ6zMydO1c2m+24yzvvvKNkMilJuvfee/Wv//qvGjRokJYuXSqbzab/+Z//Se/vaJeJTNPMWH94m5bBv8e6xDRnzhwFAoH0snfv3taeJgAAsIhWj5mZOXOmrr/++uO26dWrl0KhkCSpf//+6fWGYahPnz7as2ePJMnn8+mtt97K+G59fb3i8Xi698Xn8x3RA1NXVydJR/TYfPk4X74sBQAAOq5Wh5mSkhKVlJScsN2gQYNkGIa2b9+uSy65RJIUj8e1a9cunXXWWZKkoUOHav78+aqtrVX37t0lpQYFG4ahQYMGpdvcc889isVicrlc6TZlZWXq1atXa8sHAABtJN/l1K5fjM91GdkbM1NYWKhbb71V9913n9auXavt27frRz/6kSTpmmuukSSNHTtW/fv310033aT33ntPr7zyimbPnq1p06apsLBQkjRp0iQZhqEpU6Zoy5YtWr16taqqqriTCQAASMryPDMPP/ywnE6nbrrpJoXDYQ0ZMkSvvvqqiouLJUkOh0Mvvviipk+fruHDh8vj8WjSpEl65JFH0vsoKipSTU2NZsyYocGDB6u4uFizZs3SrFmzslk6AACwCJt5GkylGwwGVVRUpEAgkO7xAQAAp7aT/f3m2UwAAMDSCDMAAMDSCDMAAMDSCDMAAMDSCDMAAMDSCDMAAMDSCDMAAMDSCDMAAMDSCDMAAMDSCDMAAMDSCDMAAMDSCDMAAMDSsvrU7FNFy7M0g8FgjisBAAAnq+V3+0TPxD4twkwoFJIklZeX57gSAADQWqFQSEVFRcfcbjNPFHc6gGQyqX379snr9cpms33l/QSDQZWXl2vv3r3HfRS51Z0u5ymdPud6upyndPqc6+lynhLn2hGd7HmapqlQKKSysjLZ7cceGXNa9MzY7Xb16NGjzfZXWFjYof8la3G6nKd0+pzr6XKe0ulzrqfLeUqca0d0Mud5vB6ZFgwABgAAlkaYAQAAlkaYaQXDMHTffffJMIxcl5JVp8t5SqfPuZ4u5ymdPud6upynxLl2RG19nqfFAGAAANBx0TMDAAAsjTADAAAsjTADAAAsjTADAAAsjTBzAtXV1br44ovl9XrVrVs3XX311dq+fXuuy8qKxYsX6/zzz09PYjR06FC99NJLuS4r66qrq2Wz2VRZWZnrUtrc3LlzZbPZMhafz5frsrLik08+0Y033qguXbooPz9fF154oTZu3Jjrstpcr169jvhnarPZNGPGjFyX1uaam5v1s5/9TL1795bH41GfPn00b948JZPJXJfW5kKhkCorK3XWWWfJ4/Fo2LBh2rBhQ67L+tpee+01TZgwQWVlZbLZbPr973+fsd00Tc2dO1dlZWXyeDwaOXKktm7d2urjEGZOYN26dZoxY4befPNN1dTUqLm5WWPHjlVjY2OuS2tzPXr00C9+8Qu98847euedd/Td735XV1111Vf6F8sqNmzYoCVLluj888/PdSlZc95556m2tja9bN68Odcltbn6+noNHz5ceXl5eumll7Rt2zY9+uijOuOMM3JdWpvbsGFDxj/PmpoaSdI111yT48ra3oMPPqhf//rXWrRokT744AM99NBDevjhh7Vw4cJcl9bmfvjDH6qmpkZPPfWUNm/erLFjx2r06NH65JNPcl3a19LY2KgLLrhAixYtOur2hx56SAsWLNCiRYu0YcMG+Xw+jRkzJv1MxZNmolXq6upMSea6detyXUq7KC4uNv/7v/8712VkRSgUMv
v162fW1NSYI0aMMO+4445cl9Tm7rvvPvOCCy7IdRlZd/fdd5uXXHJJrsvIiTvuuMPs27evmUwmc11Kmxs/frx58803Z6ybOHGieeONN+aoouxoamoyHQ6H+cILL2Ssv+CCC8x77703R1W1PUnm6tWr05+TyaTp8/nMX/ziF+l1kUjELCoqMn/961+3at/0zLRSIBCQJHXu3DnHlWRXIpHQqlWr1NjYqKFDh+a6nKyYMWOGxo8fr9GjR+e6lKz6xz/+obKyMvXu3VvXX3+9Pvroo1yX1Oaef/55DR48WNdcc426deumgQMH6je/+U2uy8q6WCymFStW6Oabb/5aD9E9VV1yySV65ZVXtGPHDknS+++/r/Xr1+t73/tejitrW83NzUokEnK73RnrPR6P1q9fn6Oqsm/nzp3y+/0aO3Zsep1hGBoxYoRef/31Vu3rtHjQZFsxTVOzZs3SJZdcooqKilyXkxWbN2/W0KFDFYlE1KlTJ61evVr9+/fPdVltbtWqVXr33Xc7xDXp4xkyZIh++9vf6hvf+Ib279+vBx54QMOGDdPWrVvVpUuXXJfXZj766CMtXrxYs2bN0j333KO3335bt99+uwzD0L/927/lurys+f3vf6/PP/9cU6ZMyXUpWXH33XcrEAjo3HPPlcPhUCKR0Pz58/WDH/wg16W1Ka/Xq6FDh+o///M/9c1vflOlpaV6+umn9dZbb6lfv365Li9r/H6/JKm0tDRjfWlpqXbv3t2qfRFmWmHmzJn629/+1qGT8jnnnKNNmzbp888/17PPPqvJkydr3bp1HSrQ7N27V3fccYfWrl17xP8JdTSXX355+v2AAQM0dOhQ9e3bV8uXL9esWbNyWFnbSiaTGjx4sKqqqiRJAwcO1NatW7V48eIOHWaefPJJXX755SorK8t1KVnxzDPPaMWKFVq5cqXOO+88bdq0SZWVlSorK9PkyZNzXV6beuqpp3TzzTfrzDPPlMPh0EUXXaRJkybp3XffzXVpWXd4r6Jpmq3uaSTMnKTbbrtNzz//vF577TX16NEj1+Vkjcvl0tlnny1JGjx4sDZs2KD/+q//0hNPPJHjytrOxo0bVVdXp0GDBqXXJRIJvfbaa1q0aJGi0agcDkcOK8yegoICDRgwQP/4xz9yXUqb6t69+xGB+5vf/KaeffbZHFWUfbt379bLL7+s5557LtelZM1PfvIT/fSnP9X1118vKRXId+/ererq6g4XZvr27at169apsbFRwWBQ3bt313XXXafevXvnurSsabmz0u/3q3v37un1dXV1R/TWnAhjZk7ANE3NnDlTzz33nF599dUO/S/W0ZimqWg0musy2tSoUaO0efNmbdq0Kb0MHjxYN9xwgzZt2tRhg4wkRaNRffDBBxn/4egIhg8ffsSUCTt27NBZZ52Vo4qyb+nSperWrZvGjx+f61KypqmpSXZ75s+Uw+HokLdmtygoKFD37t1VX1+vNWvW6Kqrrsp1SVnTu3dv+Xy+9B15Umoc2Lp16zRs2LBW7YuemROYMWOGVq5cqT/84Q/yer3pa3xFRUXyeDw5rq5t3XPPPbr88stVXl6uUCikVatW6a9//av+/Oc/57q0NuX1eo8Y81RQUKAuXbp0uLFQs2fP1oQJE9SzZ0/V1dXpgQceUDAY7HD/V/vjH/9Yw4YNU1VVla699lq9/fbbWrJkiZYsWZLr0rIimUxq6dKlmjx5spzOjvuf8QkTJmj+/Pnq2bOnzjvvPL333ntasGCBbr755lyX1ubWrFkj0zR1zjnn6MMPP9RPfvITnXPOOfr3f//3XJf2tTQ0NOjDDz9Mf965c6c2bdqkzp07q2fPnqqsrFRVVZX69eunfv36qaqqSvn5+Zo0aVLrDtQGd1t1aJKOuixdujTXpbW5m2++2TzrrLNMl8tldu3a1Rw1apS5du3aXJfVLjrqrdnXXXed2b17dzMvL88sKyszJ06caG7dujXXZWXFH//4R7OiosI0DMM899xzzSVLluS6pKxZs2aNKcncvn17rkvJqmAwaN5xxx1mz549Tbfbbfbp08e89957zWg0muvS2twzzzxj9unTx3S5XKbP5zNnzJhhfv7557ku62v7y1/+ctTf0MmTJ5ummbo9+7777jN9Pp9pGIb5ne98x9y8eXOrj2MzTdNsg/AFAACQE4yZAQAAlkaYAQAAlkaYAQAAlkaYAQAAlkaYAQAAlkaYAQAAlkaYAQAAlkaYAQAAlkaYAQAAlkaYAQAAlkaYAQAAlkaYAQAAlvb/Aa1yW8Mj68H3AAAAAElFTkSuQmCC", - "text/plain": [ - "
    " - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], - "source": [ - "def sort_both(x, y):\n", - " sort_inds = np.argsort(x)\n", - " return x[sort_inds], y[sort_inds]\n", - "\n", - "\n", - "preds = np.zeros((10, y_test.size))\n", - "for i in range(10):\n", - " y_pred = models[i].predict(X_test)\n", - " preds[i, :] = y_pred\n", - "\n", - "means = np.mean(preds, axis=0)\n", - "vars = np.var(preds, axis=0)\n", - "\n", - "bias = np.mean((y_test - means) ** 2)\n", - "variance = np.mean(vars)\n", - "mse = np.mean((preds - y_test) ** 2)\n", - "print(bias + variance)\n", - "print(mse)\n", - "\n", - "for i in range(10):\n", - " y_pred = models[i].predict(X_test)\n", - " plt.plot(*sort_both(x_test, y_pred), lw=0.5, color=\"red\")\n", - "plt.scatter(*sort_both(X_test[:, 1], y_test))\n", - "# plt.scatter(*sort_both(X_train[:, 1], y_train))\n", - "sort_inds = np.argsort(x_test)\n", - "plt.errorbar(*sort_both(x_test, means), yerr=vars[sort_inds])" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - }, - { - "cell_type": "code", - "execution_count": 8, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "15.94085519235225\n" - ] - } - ], - "source": [ - "print(bias)" - ] - }, - { - "cell_type": "code", - "execution_count": 9, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "76.19079943912999\n" - ] - } - ], - "source": [ - "print(variance)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3 (ipykernel)", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.9.18" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} diff --git a/doc/pub/week38/ipynb/week38.ipynb b/doc/pub/week38/ipynb/week38.ipynb index 4cc0c95c5..cd2b6ab03 100644 --- a/doc/pub/week38/ipynb/week38.ipynb +++ b/doc/pub/week38/ipynb/week38.ipynb @@ -2,2687 +2,1902 @@ "cells": [ { "cell_type": "markdown", - "id": "a811ba80", - "metadata": {}, + "id": "cd058661", + "metadata": { + "editable": true + }, "source": [ "\n", - "" + "" ] }, { "cell_type": "markdown", - "id": "e5014a9c", - "metadata": {}, + "id": "bb0e0285", + "metadata": { + "editable": true + }, "source": [ - "# Week 38: Logistic Regression and Optimization\n", - "**Morten Hjorth-Jensen**, Department of Physics and Center for Computing in Science Education, University of Oslo and Department of Physics and Astronomy and Facility for Rare Isotope Beams, Michigan State University\n", + "# Week 38: Statistical analysis, bias-variance tradeoff and resampling methods\n", + "**Morten Hjorth-Jensen**, Department of Physics and Center for Computing in Science Education, University of Oslo, Norway\n", "\n", - "Date: **September 16-20, 2024**" + "Date: **September 15-19, 2025**" ] }, { "cell_type": "markdown", - "id": "023eb6d1", - "metadata": {}, + "id": "5d0bf374", + "metadata": { + "editable": true + }, "source": [ - "## Plans for week 38, lecture Monday September 16\n", + "## Plans for week 38, lecture Monday September 15\n", + "\n", + 
"**Material for the lecture on Monday September 15.**\n", "\n", - "**Material for the lecture on Monday September 16.**\n", + "1. Statistical interpretation of OLS and various expectation values\n", "\n", - " * Logistic regression as our first encounter of classification methods. From binary cases to several categories.\n", + "2. Resampling techniques, Bootstrap and cross validation and bias-variance tradeoff\n", "\n", - " * Start gradient and optimization methods\n", + "3. The material we did not cover last week, that is on more advanced methods for updating the learning rate, are covered by its own video. We will briefly discuss these topics at the beginning of the lecture and during the lab sessions. See video on ADAgrad, RMSprop and ADAM (material from last week not covered during lecture) at \n", "\n", - " * [Video of lecture](https://youtu.be/c9DIfNHy2ks)\n", + "4. [Video of Lecture](https://youtu.be/4Fo7ITVA7V4)\n", "\n", - " * Whiteboard notes at " + "5. [Video from lab sessions on the bias-variance tradeoff](https://youtu.be/GBWc1abChKo)\n", + "\n", + "6. [Whiteboard notes](https://github.com/CompPhysics/MachineLearning/blob/master/doc/HandWrittenNotes/2025/FYSSTKweek38.pdf)" ] }, { "cell_type": "markdown", - "id": "e981c015", - "metadata": {}, + "id": "38a10c06", + "metadata": { + "editable": true + }, "source": [ - "## Suggested reading and videos\n", - " * Readings and Videos:\n", + "## Readings and Videos\n", + "1. Raschka et al, pages 175-192\n", "\n", - " * Hastie et al 4.1, 4.2 and 4.3 on logistic regression\n", + "2. Hastie et al Chapter 7, here we recommend 7.1-7.5 and 7.10 (cross-validation) and 7.11 (bootstrap). See .\n", "\n", - " * Raschka et al, pages 53-76 on Logistic regression and pages 37-52 on gradient optimization\n", + "3. [Video on bias-variance tradeoff](https://www.youtube.com/watch?v=EuBBz3bI-aA)\n", "\n", - " * For a good discussion on gradient methods, see Goodfellow et al section 4.3-4.5 and chapter 8. We will come back to the latter chapter in our discussion of Neural networks as well.\n", + "4. [Video on Bootstrapping](https://www.youtube.com/watch?v=Xz0x-8-cgaQ)\n", "\n", - " * [Video on Logistic regression](https://www.youtube.com/watch?v=C5268D9t9Ak)\n", + "5. [Video on cross validation](https://www.youtube.com/watch?v=fSytzGwwBVw)\n", "\n", - " * [Yet another video on logistic regression](https://www.youtube.com/watch?v=yIYKR4sgzI8)\n", - "\n", - " * [Video on gradient descent](https://www.youtube.com/watch?v=sDv4f4s2SB8)" + "For the lab session, the following video on cross validation (from 2024), could be helpful, see " ] }, { "cell_type": "markdown", - "id": "11590c09", - "metadata": {}, + "id": "2beeb82a", + "metadata": { + "editable": true + }, "source": [ - "## Plans for the lab sessions\n", + "## Linking the regression analysis with a statistical interpretation\n", "\n", - "**Material for the active learning sessions on Tuesday and Wednesday.**\n", + "We will now couple the discussions of ordinary least squares, Ridge\n", + "and Lasso regression with a statistical interpretation, that is we\n", + "move from a linear algebra analysis to a statistical analysis. In\n", + "particular, we will focus on what the regularization terms can result\n", + "in. 
We will amongst other things show that the regularization\n", + "parameter can reduce considerably the variance of the parameters\n", + "$\\theta$.\n", "\n", - " * Repetition from last week on the bias-variance tradeoff\n", + "On of the advantages of doing linear regression is that we actually end up with\n", + "analytical expressions for several statistical quantities. \n", + "Standard least squares and Ridge regression allow us to\n", + "derive quantities like the variance and other expectation values in a\n", + "rather straightforward way.\n", "\n", - " * Resampling techniques, cross-validation examples included here, see also the lectures from last week on the bootstrap method\n", - "\n", - " * Exercise for week 38 on the bias-variance tradeoff, see also the video from the lab session from week 37 at \n", - "\n", - " * Work on project 1, in particular resampling methods like cross-validation and bootstrap.\n", - "\n", - " * [Video on cross-validation from exercise session](https://youtu.be/T9jjWsmsd1o)" + "It is assumed that $\\varepsilon_i\n", + "\\sim \\mathcal{N}(0, \\sigma^2)$ and the $\\varepsilon_{i}$ are\n", + "independent, i.e.:" ] }, { "cell_type": "markdown", - "id": "57e011be", - "metadata": {}, + "id": "84021a7f", + "metadata": { + "editable": true + }, "source": [ - "## Material for lecture Monday September 16" + "$$\n", + "\\begin{align*} \n", + "\\mbox{Cov}(\\varepsilon_{i_1},\n", + "\\varepsilon_{i_2}) & = \\left\\{ \\begin{array}{lcc} \\sigma^2 & \\mbox{if}\n", + "& i_1 = i_2, \\\\ 0 & \\mbox{if} & i_1 \\not= i_2. \\end{array} \\right.\n", + "\\end{align*}\n", + "$$" ] }, { "cell_type": "markdown", - "id": "0896e712", - "metadata": {}, + "id": "1291c926", + "metadata": { + "editable": true + }, "source": [ - "## Logistic Regression\n", + "The randomness of $\\varepsilon_i$ implies that\n", + "$\\mathbf{y}_i$ is also a random variable. In particular,\n", + "$\\mathbf{y}_i$ is normally distributed, because $\\varepsilon_i \\sim\n", + "\\mathcal{N}(0, \\sigma^2)$ and $\\mathbf{X}_{i,\\ast} \\, \\boldsymbol{\\theta}$ is a\n", + "non-random scalar. To specify the parameters of the distribution of\n", + "$\\mathbf{y}_i$ we need to calculate its first two moments. \n", "\n", - "In linear regression our main interest was centered on learning the\n", - "coefficients of a functional fit (say a polynomial) in order to be\n", - "able to predict the response of a continuous variable on some unseen\n", - "data. The fit to the continuous variable $y_i$ is based on some\n", - "independent variables $\\boldsymbol{x}_i$. Linear regression resulted in\n", - "analytical expressions for standard ordinary Least Squares or Ridge\n", - "regression (in terms of matrices to invert) for several quantities,\n", - "ranging from the variance and thereby the confidence intervals of the\n", - "parameters $\\boldsymbol{\\beta}$ to the mean squared error. If we can invert\n", - "the product of the design matrices, linear regression gives then a\n", - "simple recipe for fitting our data." + "Recall that $\\boldsymbol{X}$ is a matrix of dimensionality $n\\times p$. The\n", + "notation above $\\mathbf{X}_{i,\\ast}$ means that we are looking at the\n", + "row number $i$ and perform a sum over all values $p$." ] }, { "cell_type": "markdown", - "id": "44bb3650", - "metadata": {}, + "id": "bf15a73d", + "metadata": { + "editable": true + }, "source": [ - "## Classification problems\n", - "\n", - "Classification problems, however, are concerned with outcomes taking\n", - "the form of discrete variables (i.e. 
categories). We may for example,\n", - "on the basis of DNA sequencing for a number of patients, like to find\n", - "out which mutations are important for a certain disease; or based on\n", - "scans of various patients' brains, figure out if there is a tumor or\n", - "not; or given a specific physical system, we'd like to identify its\n", - "state, say whether it is an ordered or disordered system (typical\n", - "situation in solid state physics); or classify the status of a\n", - "patient, whether she/he has a stroke or not and many other similar\n", - "situations.\n", + "## Assumptions made\n", "\n", - "The most common situation we encounter when we apply logistic\n", - "regression is that of two possible outcomes, normally denoted as a\n", - "binary outcome, true or false, positive or negative, success or\n", - "failure etc." + "The assumption we have made here can be summarized as (and this is going to be useful when we discuss the bias-variance trade off)\n", + "that there exists a function $f(\\boldsymbol{x})$ and a normal distributed error $\\boldsymbol{\\varepsilon}\\sim \\mathcal{N}(0, \\sigma^2)$\n", + "which describe our data" ] }, { "cell_type": "markdown", - "id": "921c6771", - "metadata": {}, + "id": "ed7830e9", + "metadata": { + "editable": true + }, "source": [ - "## Optimization and Deep learning\n", - "\n", - "Logistic regression will also serve as our stepping stone towards\n", - "neural network algorithms and supervised deep learning. For logistic\n", - "learning, the minimization of the cost function leads to a non-linear\n", - "equation in the parameters $\\boldsymbol{\\beta}$. The optimization of the\n", - "problem calls therefore for minimization algorithms. This forms the\n", - "bottle neck of all machine learning algorithms, namely how to find\n", - "reliable minima of a multi-variable function. This leads us to the\n", - "family of gradient descent methods. The latter are the working horses\n", - "of basically all modern machine learning algorithms.\n", - "\n", - "We note also that many of the topics discussed here on logistic \n", - "regression are also commonly used in modern supervised Deep Learning\n", - "models, as we will see later." + "$$\n", + "\\boldsymbol{y} = f(\\boldsymbol{x})+\\boldsymbol{\\varepsilon}\n", + "$$" ] }, { "cell_type": "markdown", - "id": "f80e9666", - "metadata": {}, + "id": "b1d75235", + "metadata": { + "editable": true + }, "source": [ - "## Basics\n", - "\n", - "We consider the case where the dependent variables, also called the\n", - "responses or the outcomes, $y_i$ are discrete and only take values\n", - "from $k=0,\\dots,K-1$ (i.e. $K$ classes).\n", - "\n", - "The goal is to predict the\n", - "output classes from the design matrix $\\boldsymbol{X}\\in\\mathbb{R}^{n\\times p}$\n", - "made of $n$ samples, each of which carries $p$ features or predictors. The\n", - "primary goal is to identify the classes to which new unseen samples\n", - "belong.\n", - "\n", - "Let us specialize to the case of two classes only, with outputs\n", - "$y_i=0$ and $y_i=1$. Our outcomes could represent the status of a\n", - "credit card user that could default or not on her/his credit card\n", - "debt. 
That is" + "We approximate this function with our model from the solution of the linear regression equations, that is our\n", + "function $f$ is approximated by $\\boldsymbol{\\tilde{y}}$ where we want to minimize $(\\boldsymbol{y}-\\boldsymbol{\\tilde{y}})^2$, our MSE, with" ] }, { "cell_type": "markdown", - "id": "952f8119", - "metadata": {}, + "id": "0255cd11", + "metadata": { + "editable": true + }, "source": [ "$$\n", - "y_i = \\begin{bmatrix} 0 & \\mathrm{no}\\\\ 1 & \\mathrm{yes} \\end{bmatrix}.\n", + "\\boldsymbol{\\tilde{y}} = \\boldsymbol{X}\\boldsymbol{\\theta}.\n", "$$" ] }, { "cell_type": "markdown", - "id": "9b587b40", - "metadata": {}, + "id": "04897143", + "metadata": { + "editable": true + }, "source": [ - "## Linear classifier\n", - "\n", - "Before moving to the logistic model, let us try to use our linear\n", - "regression model to classify these two outcomes. We could for example\n", - "fit a linear model to the default case if $y_i > 0.5$ and the no\n", - "default case $y_i \\leq 0.5$.\n", + "## Expectation value and variance\n", "\n", - "We would then have our \n", - "weighted linear combination, namely" + "We can calculate the expectation value of $\\boldsymbol{y}$ for a given element $i$" ] }, { "cell_type": "markdown", - "id": "bfb711d7", - "metadata": {}, + "id": "2a6cea60", + "metadata": { + "editable": true + }, "source": [ - "\n", - "
    \n", - "\n", "$$\n", - "\\begin{equation}\n", - "\\boldsymbol{y} = \\boldsymbol{X}^T\\boldsymbol{\\beta} + \\boldsymbol{\\epsilon},\n", - "\\label{_auto1} \\tag{1}\n", - "\\end{equation}\n", + "\\begin{align*} \n", + "\\mathbb{E}(y_i) & =\n", + "\\mathbb{E}(\\mathbf{X}_{i, \\ast} \\, \\boldsymbol{\\theta}) + \\mathbb{E}(\\varepsilon_i)\n", + "\\, \\, \\, = \\, \\, \\, \\mathbf{X}_{i, \\ast} \\, \\theta, \n", + "\\end{align*}\n", "$$" ] }, { "cell_type": "markdown", - "id": "0acaaf3c", - "metadata": {}, + "id": "08eb2262", + "metadata": { + "editable": true + }, "source": [ - "where $\\boldsymbol{y}$ is a vector representing the possible outcomes, $\\boldsymbol{X}$ is our\n", - "$n\\times p$ design matrix and $\\boldsymbol{\\beta}$ represents our estimators/predictors." + "while\n", + "its variance is" ] }, { "cell_type": "markdown", - "id": "73564ce7", - "metadata": {}, + "id": "0f36d3c2", + "metadata": { + "editable": true + }, "source": [ - "## Some selected properties\n", - "\n", - "The main problem with our function is that it takes values on the\n", - "entire real axis. In the case of logistic regression, however, the\n", - "labels $y_i$ are discrete variables. A typical example is the credit\n", - "card data discussed below here, where we can set the state of\n", - "defaulting the debt to $y_i=1$ and not to $y_i=0$ for one the persons\n", - "in the data set (see the full example below).\n", - "\n", - "One simple way to get a discrete output is to have sign\n", - "functions that map the output of a linear regressor to values $\\{0,1\\}$,\n", - "$f(s_i)=sign(s_i)=1$ if $s_i\\ge 0$ and 0 if otherwise. \n", - "We will encounter this model in our first demonstration of neural networks.\n", - "\n", - "Historically it is called the **perceptron** model in the machine learning\n", - "literature. This model is extremely simple. However, in many cases it is more\n", - "favorable to use a ``soft\" classifier that outputs\n", - "the probability of a given category. This leads us to the logistic function." + "$$\n", + "\\begin{align*} \\mbox{Var}(y_i) & = \\mathbb{E} \\{ [y_i\n", + "- \\mathbb{E}(y_i)]^2 \\} \\, \\, \\, = \\, \\, \\, \\mathbb{E} ( y_i^2 ) -\n", + "[\\mathbb{E}(y_i)]^2 \\\\ & = \\mathbb{E} [ ( \\mathbf{X}_{i, \\ast} \\,\n", + "\\theta + \\varepsilon_i )^2] - ( \\mathbf{X}_{i, \\ast} \\, \\boldsymbol{\\theta})^2 \\\\ &\n", + "= \\mathbb{E} [ ( \\mathbf{X}_{i, \\ast} \\, \\boldsymbol{\\theta})^2 + 2 \\varepsilon_i\n", + "\\mathbf{X}_{i, \\ast} \\, \\boldsymbol{\\theta} + \\varepsilon_i^2 ] - ( \\mathbf{X}_{i,\n", + "\\ast} \\, \\theta)^2 \\\\ & = ( \\mathbf{X}_{i, \\ast} \\, \\boldsymbol{\\theta})^2 + 2\n", + "\\mathbb{E}(\\varepsilon_i) \\mathbf{X}_{i, \\ast} \\, \\boldsymbol{\\theta} +\n", + "\\mathbb{E}(\\varepsilon_i^2 ) - ( \\mathbf{X}_{i, \\ast} \\, \\boldsymbol{\\theta})^2 \n", + "\\\\ & = \\mathbb{E}(\\varepsilon_i^2 ) \\, \\, \\, = \\, \\, \\,\n", + "\\mbox{Var}(\\varepsilon_i) \\, \\, \\, = \\, \\, \\, \\sigma^2. \n", + "\\end{align*}\n", + "$$" ] }, { "cell_type": "markdown", - "id": "ef6011fd", - "metadata": {}, + "id": "ea74022f", + "metadata": { + "editable": true + }, "source": [ - "## Simple example\n", - "\n", - "The following example on data for coronary heart disease (CHD) as function of age may serve as an illustration. In the code here we read and plot whether a person has had CHD (output = 1) or not (output = 0). This ouput is plotted the person's against age. 
Clearly, the figure shows that attempting to make a standard linear regression fit may not be very meaningful." + "Hence, $y_i \\sim \\mathcal{N}( \\mathbf{X}_{i, \\ast} \\, \\boldsymbol{\\theta}, \\sigma^2)$, that is $\\boldsymbol{y}$ follows a normal distribution with \n", + "mean value $\\boldsymbol{X}\\boldsymbol{\\theta}$ and variance $\\sigma^2$ (not be confused with the singular values of the SVD)." ] }, { - "cell_type": "code", - "execution_count": 1, - "id": "3444ad7b", - "metadata": {}, - "outputs": [], + "cell_type": "markdown", + "id": "d6eba03b", + "metadata": { + "editable": true + }, "source": [ - "%matplotlib inline\n", - "\n", - "# Common imports\n", - "import os\n", - "import numpy as np\n", - "import pandas as pd\n", - "import matplotlib.pyplot as plt\n", - "from sklearn.linear_model import LinearRegression, Ridge, Lasso\n", - "from sklearn.model_selection import train_test_split\n", - "from sklearn.utils import resample\n", - "from sklearn.metrics import mean_squared_error\n", - "from IPython.display import display\n", - "from pylab import plt, mpl\n", - "plt.style.use('seaborn')\n", - "mpl.rcParams['font.family'] = 'serif'\n", - "\n", - "# Where to save the figures and data files\n", - "PROJECT_ROOT_DIR = \"Results\"\n", - "FIGURE_ID = \"Results/FigureFiles\"\n", - "DATA_ID = \"DataFiles/\"\n", + "## Expectation value and variance for $\\boldsymbol{\\theta}$\n", "\n", - "if not os.path.exists(PROJECT_ROOT_DIR):\n", - " os.mkdir(PROJECT_ROOT_DIR)\n", - "\n", - "if not os.path.exists(FIGURE_ID):\n", - " os.makedirs(FIGURE_ID)\n", - "\n", - "if not os.path.exists(DATA_ID):\n", - " os.makedirs(DATA_ID)\n", - "\n", - "def image_path(fig_id):\n", - " return os.path.join(FIGURE_ID, fig_id)\n", - "\n", - "def data_path(dat_id):\n", - " return os.path.join(DATA_ID, dat_id)\n", - "\n", - "def save_fig(fig_id):\n", - " plt.savefig(image_path(fig_id) + \".png\", format='png')\n", - "\n", - "infile = open(data_path(\"chddata.csv\"),'r')\n", - "\n", - "# Read the chd data as csv file and organize the data into arrays with age group, age, and chd\n", - "chd = pd.read_csv(infile, names=('ID', 'Age', 'Agegroup', 'CHD'))\n", - "chd.columns = ['ID', 'Age', 'Agegroup', 'CHD']\n", - "output = chd['CHD']\n", - "age = chd['Age']\n", - "agegroup = chd['Agegroup']\n", - "numberID = chd['ID'] \n", - "display(chd)\n", - "\n", - "plt.scatter(age, output, marker='o')\n", - "plt.axis([18,70.0,-0.1, 1.2])\n", - "plt.xlabel(r'Age')\n", - "plt.ylabel(r'CHD')\n", - "plt.title(r'Age distribution and Coronary heart disease')\n", - "plt.show()" + "With the OLS expressions for the optimal parameters $\\boldsymbol{\\hat{\\theta}}$ we can evaluate the expectation value" ] }, { "cell_type": "markdown", - "id": "01d01242", - "metadata": {}, + "id": "b8a7314f", + "metadata": { + "editable": true + }, "source": [ - "## Plotting the mean value for each group\n", - "\n", - "What we could attempt however is to plot the mean value for each group." 
+ "$$\n", + "\\mathbb{E}(\\boldsymbol{\\hat{\\theta}}) = \\mathbb{E}[ (\\mathbf{X}^{\\top} \\mathbf{X})^{-1}\\mathbf{X}^{T} \\mathbf{Y}]=(\\mathbf{X}^{T} \\mathbf{X})^{-1}\\mathbf{X}^{T} \\mathbb{E}[ \\mathbf{Y}]=(\\mathbf{X}^{T} \\mathbf{X})^{-1} \\mathbf{X}^{T}\\mathbf{X}\\boldsymbol{\\theta}=\\boldsymbol{\\theta}.\n", + "$$" ] }, { - "cell_type": "code", - "execution_count": 2, - "id": "143c59fe", - "metadata": {}, - "outputs": [], + "cell_type": "markdown", + "id": "ed668c22", + "metadata": { + "editable": true + }, "source": [ - "agegroupmean = np.array([0.1, 0.133, 0.250, 0.333, 0.462, 0.625, 0.765, 0.800])\n", - "group = np.array([1, 2, 3, 4, 5, 6, 7, 8])\n", - "plt.plot(group, agegroupmean, \"r-\")\n", - "plt.axis([0,9,0, 1.0])\n", - "plt.xlabel(r'Age group')\n", - "plt.ylabel(r'CHD mean values')\n", - "plt.title(r'Mean values for each age group')\n", - "plt.show()" + "This means that the estimator of the regression parameters is unbiased.\n", + "\n", + "We can also calculate the variance\n", + "\n", + "The variance of the optimal value $\\boldsymbol{\\hat{\\theta}}$ is" ] }, { "cell_type": "markdown", - "id": "42136436", - "metadata": {}, + "id": "6f4ab09a", + "metadata": { + "editable": true + }, "source": [ - "We are now trying to find a function $f(y\\vert x)$, that is a function which gives us an expected value for the output $y$ with a given input $x$.\n", - "In standard linear regression with a linear dependence on $x$, we would write this in terms of our model" + "$$\n", + "\\begin{eqnarray*}\n", + "\\mbox{Var}(\\boldsymbol{\\hat{\\theta}}) & = & \\mathbb{E} \\{ [\\boldsymbol{\\theta} - \\mathbb{E}(\\boldsymbol{\\theta})] [\\boldsymbol{\\theta} - \\mathbb{E}(\\boldsymbol{\\theta})]^{T} \\}\n", + "\\\\\n", + "& = & \\mathbb{E} \\{ [(\\mathbf{X}^{T} \\mathbf{X})^{-1} \\, \\mathbf{X}^{T} \\mathbf{Y} - \\boldsymbol{\\theta}] \\, [(\\mathbf{X}^{T} \\mathbf{X})^{-1} \\, \\mathbf{X}^{T} \\mathbf{Y} - \\boldsymbol{\\theta}]^{T} \\}\n", + "\\\\\n", + "% & = & \\mathbb{E} \\{ [(\\mathbf{X}^{T} \\mathbf{X})^{-1} \\, \\mathbf{X}^{T} \\mathbf{Y}] \\, [(\\mathbf{X}^{T} \\mathbf{X})^{-1} \\, \\mathbf{X}^{T} \\mathbf{Y}]^{T} \\} - \\boldsymbol{\\theta} \\, \\boldsymbol{\\theta}^{T}\n", + "% \\\\\n", + "% & = & \\mathbb{E} \\{ (\\mathbf{X}^{T} \\mathbf{X})^{-1} \\, \\mathbf{X}^{T} \\mathbf{Y} \\, \\mathbf{Y}^{T} \\, \\mathbf{X} \\, (\\mathbf{X}^{T} \\mathbf{X})^{-1} \\} - \\boldsymbol{\\theta} \\, \\boldsymbol{\\theta}^{T}\n", + "% \\\\\n", + "& = & (\\mathbf{X}^{T} \\mathbf{X})^{-1} \\, \\mathbf{X}^{T} \\, \\mathbb{E} \\{ \\mathbf{Y} \\, \\mathbf{Y}^{T} \\} \\, \\mathbf{X} \\, (\\mathbf{X}^{T} \\mathbf{X})^{-1} - \\boldsymbol{\\theta} \\, \\boldsymbol{\\theta}^{T}\n", + "\\\\\n", + "& = & (\\mathbf{X}^{T} \\mathbf{X})^{-1} \\, \\mathbf{X}^{T} \\, \\{ \\mathbf{X} \\, \\boldsymbol{\\theta} \\, \\boldsymbol{\\theta}^{T} \\, \\mathbf{X}^{T} + \\sigma^2 \\} \\, \\mathbf{X} \\, (\\mathbf{X}^{T} \\mathbf{X})^{-1} - \\boldsymbol{\\theta} \\, \\boldsymbol{\\theta}^{T}\n", + "% \\\\\n", + "% & = & (\\mathbf{X}^T \\mathbf{X})^{-1} \\, \\mathbf{X}^T \\, \\mathbf{X} \\, \\boldsymbol{\\theta} \\, \\boldsymbol{\\theta}^T \\, \\mathbf{X}^T \\, \\mathbf{X} \\, (\\mathbf{X}^T % \\mathbf{X})^{-1}\n", + "% \\\\\n", + "% & & + \\, \\, \\sigma^2 \\, (\\mathbf{X}^T \\mathbf{X})^{-1} \\, \\mathbf{X}^T \\, \\mathbf{X} \\, (\\mathbf{X}^T \\mathbf{X})^{-1} - \\boldsymbol{\\theta} \\boldsymbol{\\theta}^T\n", + "\\\\\n", + "& = & \\boldsymbol{\\theta} \\, \\boldsymbol{\\theta}^{T} + \\sigma^2 \\, (\\mathbf{X}^{T} 
\\mathbf{X})^{-1} - \\boldsymbol{\\theta} \\, \\boldsymbol{\\theta}^{T}\n", + "\\, \\, \\, = \\, \\, \\, \\sigma^2 \\, (\\mathbf{X}^{T} \\mathbf{X})^{-1},\n", + "\\end{eqnarray*}\n", + "$$" ] }, { "cell_type": "markdown", - "id": "e8a7f059", - "metadata": {}, + "id": "7b7808c7", + "metadata": { + "editable": true + }, "source": [ - "$$\n", - "f(y_i\\vert x_i)=\\beta_0+\\beta_1 x_i.\n", - "$$" + "where we have used that $\\mathbb{E} (\\mathbf{Y} \\mathbf{Y}^{T}) =\n", + "\\mathbf{X} \\, \\boldsymbol{\\theta} \\, \\boldsymbol{\\theta}^{T} \\, \\mathbf{X}^{T} +\n", + "\\sigma^2 \\, \\mathbf{I}_{nn}$. From $\\mbox{Var}(\\boldsymbol{\\theta}) = \\sigma^2\n", + "\\, (\\mathbf{X}^{T} \\mathbf{X})^{-1}$, one obtains an estimate of the\n", + "variance of the estimate of the $j$-th regression coefficient:\n", + "$\\boldsymbol{\\sigma}^2 (\\boldsymbol{\\theta}_j ) = \\boldsymbol{\\sigma}^2 [(\\mathbf{X}^{T} \\mathbf{X})^{-1}]_{jj} $. This may be used to\n", + "construct a confidence interval for the estimates.\n", + "\n", + "In a similar way, we can obtain analytical expressions for say the\n", + "expectation values of the parameters $\\boldsymbol{\\theta}$ and their variance\n", + "when we employ Ridge regression, allowing us again to define a confidence interval. \n", + "\n", + "It is rather straightforward to show that" ] }, { "cell_type": "markdown", - "id": "f1c0bcf8", - "metadata": {}, + "id": "456afe19", + "metadata": { + "editable": true + }, "source": [ - "This expression implies however that $f(y_i\\vert x_i)$ could take any\n", - "value from minus infinity to plus infinity. If we however let\n", - "$f(y\\vert y)$ be represented by the mean value, the above example\n", - "shows us that we can constrain the function to take values between\n", - "zero and one, that is we have $0 \\le f(y_i\\vert x_i) \\le 1$. Looking\n", - "at our last curve we see also that it has an S-shaped form. This leads\n", - "us to a very popular model for the function $f$, namely the so-called\n", - "Sigmoid function or logistic model. We will consider this function as\n", - "representing the probability for finding a value of $y_i$ with a given\n", - "$x_i$." + "$$\n", + "\\mathbb{E} \\big[ \\boldsymbol{\\theta}^{\\mathrm{Ridge}} \\big]=(\\mathbf{X}^{T} \\mathbf{X} + \\lambda \\mathbf{I}_{pp})^{-1} (\\mathbf{X}^{\\top} \\mathbf{X})\\boldsymbol{\\theta}^{\\mathrm{OLS}}.\n", + "$$" ] }, { "cell_type": "markdown", - "id": "e4fd2845", - "metadata": {}, + "id": "0a38fc64", + "metadata": { + "editable": true + }, "source": [ - "## The logistic function\n", + "We see clearly that \n", + "$\\mathbb{E} \\big[ \\boldsymbol{\\theta}^{\\mathrm{Ridge}} \\big] \\not= \\boldsymbol{\\theta}^{\\mathrm{OLS}}$ for any $\\lambda > 0$. We say then that the ridge estimator is biased.\n", "\n", - "Another widely studied model, is the so-called \n", - "perceptron model, which is an example of a \"hard classification\" model. We\n", - "will encounter this model when we discuss neural networks as\n", - "well. Each datapoint is deterministically assigned to a category (i.e\n", - "$y_i=0$ or $y_i=1$). In many cases, and the coronary heart disease data forms one of many such examples, it is favorable to have a \"soft\"\n", - "classifier that outputs the probability of a given category rather\n", - "than a single value. For example, given $x_i$, the classifier\n", - "outputs the probability of being in a category $k$. Logistic regression\n", - "is the most common example of a so-called soft classifier. 
In logistic\n", - "regression, the probability that a data point $x_i$\n", - "belongs to a category $y_i=\\{0,1\\}$ is given by the so-called logit function (or Sigmoid) which is meant to represent the likelihood for a given event," + "We can also compute the variance as" ] }, { "cell_type": "markdown", - "id": "f4bb77ad", - "metadata": {}, + "id": "851bebe1", + "metadata": { + "editable": true + }, "source": [ "$$\n", - "p(t) = \\frac{1}{1+\\mathrm \\exp{-t}}=\\frac{\\exp{t}}{1+\\mathrm \\exp{t}}.\n", + "\\mbox{Var}[\\boldsymbol{\\theta}^{\\mathrm{Ridge}}]=\\sigma^2[ \\mathbf{X}^{T} \\mathbf{X} + \\lambda \\mathbf{I} ]^{-1} \\mathbf{X}^{T} \\mathbf{X} \\{ [ \\mathbf{X}^{\\top} \\mathbf{X} + \\lambda \\mathbf{I} ]^{-1}\\}^{T},\n", "$$" ] }, { "cell_type": "markdown", - "id": "47fc800d", - "metadata": {}, + "id": "fe64e9b5", + "metadata": { + "editable": true + }, "source": [ - "Note that $1-p(t)= p(-t)$." + "and it is easy to see that if the parameter $\\lambda$ goes to infinity then the variance of Ridge parameters $\\boldsymbol{\\theta}$ goes to zero. \n", + "\n", + "With this, we can compute the difference" ] }, { "cell_type": "markdown", - "id": "0fe9154b", - "metadata": {}, + "id": "496492d5", + "metadata": { + "editable": true + }, "source": [ - "## Examples of likelihood functions used in logistic regression and nueral networks\n", - "\n", - "The following code plots the logistic function, the step function and other functions we will encounter from here and on." + "$$\n", + "\\mbox{Var}[\\boldsymbol{\\theta}^{\\mathrm{OLS}}]-\\mbox{Var}(\\boldsymbol{\\theta}^{\\mathrm{Ridge}})=\\sigma^2 [ \\mathbf{X}^{T} \\mathbf{X} + \\lambda \\mathbf{I} ]^{-1}[ 2\\lambda\\mathbf{I} + \\lambda^2 (\\mathbf{X}^{T} \\mathbf{X})^{-1} ] \\{ [ \\mathbf{X}^{T} \\mathbf{X} + \\lambda \\mathbf{I} ]^{-1}\\}^{T}.\n", + "$$" ] }, { - "cell_type": "code", - "execution_count": 8, - "id": "150c4acd", - "metadata": {}, - "outputs": [ - { - "data": { - "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAiMAAAHFCAYAAAAg3/mzAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjguMCwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy81sbWrAAAACXBIWXMAAA9hAAAPYQGoP6dpAABH+klEQVR4nO3deVxVdeLG8c9lu+woCCgKglvuG6apmVpJWdO+OC0u/bQymzZarZlSp8mxbZxKrCbLbDGzdSozaSqz1Mnd3BdUVEAElVUul3vP7w+SiUADRM5dnvfrxSvv4Zx7n8vXi0/nfM85FsMwDERERERM4mN2ABEREfFuKiMiIiJiKpURERERMZXKiIiIiJhKZURERERMpTIiIiIiplIZEREREVOpjIiIiIipVEZERETEVCojIi5o3LhxJCYmmh3jd1ksFqZMmfK7682dOxeLxcLevXt/d90XX3yRDh06EBAQgMVi4dixY6eds6EWLVp00veXmJjIuHHjmjSPiKfyMzuAiNT0l7/8hXvuucfsGL9rxYoVtGnTptGeb/369dx9991MmDCBsWPH4ufnR1hYWKM9f30tWrSIWbNm1VpIPv74Y8LDw5s+lIgHUhkRcUHt27c3O0KdnHPOOY36fJs3bwbg1ltvpX///o363I2tT58+ZkcQ8Rg6TCPSxA4fPsxtt91GfHw8VquV6OhoBg8ezNdff121Tm2HaY4dO8b48eOJjIwkNDSUSy+9lIyMjBqHSqZMmYLFYmHjxo1cd911REREEBkZSWpqKhUVFWzfvp2LL76YsLAwEhMTefrpp2tkzMzM5OabbyYmJgar1UqXLl147rnncDqd1dar7TDNypUrGTx4MIGBgcTFxTF58mTsdvvv/lyGDRvGzTffDMCAAQOwWCxVh0FOdkhk2LBhDBs2rOrxd999h8ViYf78+Tz22GPExcURHh7OhRdeyPbt22tsv3jxYi644AIiIiIIDg6mS5cuTJ8+Hagcg1mzZlW9zxNfJw411ZapLj+3vXv3YrFYePbZZ3n++edJSkoiNDSUgQMHsnLlyt/9OYl4Iu0ZEWlio0ePZu3atfztb3+jU6dOHDt2jLVr15Kfn3/SbZxOJ5dddhmrV69mypQp9O3blxUrVnDxxRefdJvrr7+em2++mdtvv5309HSefvpp7HY7X3/9NZMmTeKBBx7g3Xff5eGHH6ZDhw5cffXVQGVZGjRoEOXl5fz1r38lMTGRzz//nAceeIDdu3eTlpZ20tfcsmULF1xwAYmJicydO5fg4GDS0tJ49913f/fnkpaWxvz583nyySd544036Ny5M9HR0b+7XW0effRRBg8ezGuvvUZhYSEPP/wwl112GVu3bsXX1xeAOXPmcOuttzJ06FBefvllYmJi2LFjB5s2bQIqD5WVlJTwwQcfsGLFiqrnbtWqVa2vWd+f26xZs+jcuTMzZ86ser1LLrmEPXv2EBER0aD3LeK2DBFpUqGhoca99957ynXGjh1rtG3bturxF198YQDG7Nmzq603ffp0AzCeeOKJqmVPPPGEARjPPfdctXV79+5tAMZHH31UtcxutxvR0dHG1VdfXbXskUceMQDjv//9b7Xt77jjDsNisRjbt2+vWvbb1x41apQRFBRk5OTkVC2rqKgwOnfubADGnj17Tvm+33jjDQMwVq1aVW1527ZtjbFjx9ZYf+jQocbQoUOrHn/77bcGYFxyySXV1nv//fcNwFixYoVhGIZRVFRkhIeHG+eee67hdDpPmufOO+80TvZr8reZ6vpz27NnjwEYPXr0MCoqKqrW++mnnwzAmD9//knziHgqHaYRaWL9+/dn7ty5PPnkk6xcubJOhzCWLl0KVO7t+LUbbrjhpNv84Q9/qPa4S5cuWCwWRo4cWbXMz8+PDh06sG/fvqpl33zzDV27dq0xZ2PcuHEYhsE333xz0tf89ttvueCCC4iNja1a5uvry6hRo07x7hrf5ZdfXu1xz549Aare5/LlyyksLGTSpElYLJZGec36/twuvfTSqr00tWUU8SYqIyJNbMGCBYwdO5bXXnuNgQMHEhkZyZgxY8jJyTnpNvn5+fj5+REZGVlt+a//0f+t364bEBBAcHAwgYGBNZaXlZVVe63aDkXExcVVff9UOVu2bFljeW3LzqSoqKhqj61WKwDHjx8HKg+pAI16JlB9f26/l1HEm6iMiDSxFi1aMHPmTPbu3cu+ffuYPn06H3300SmvWREVFUVFRQVHjhyptvxUBaahoqKiyM7OrrE8KysLqMx/qm1ry3S6OQMDA7HZbDWW5+XlNej5TsxFOXDgwGnl+rXT+bmJeDuVERETJSQk8Kc//YkRI0awdu3ak643dOhQoHKvyq+99957jZ7pggsuYMuWLTXyzJs3D4vFwvDhw0+67fDhw/nPf/7DoUOHqpY5HI4auesrMTGRjRs3Vlu2Y8eOWs+QqYtBgwYRERHByy+/jGEYJ12vPnsrTufnJuLtdDaNSBMqKChg+PDh3HjjjXTu3JmwsDBWrVrF4sWLq85mqc3FF1/M4MGDuf/++yksLCQ5OZkVK1Ywb948AHx8Gu//K+677z7mzZvHpZdeyrRp02jbti1ffPEFaWlp3HHHHXTq1Omk2/75z3/m3//+N+effz6PP/44wcHBzJo1i5KSktPKNHr0aG6++WYmTZrENddcw759+3j66acbfLZNaGgozz33HBMmTODCCy/k1ltvJTY2ll27drFhwwZeeuklAHr06AHAjBkzGDlyJL6+vvTs2ZOAgIAaz3k6PzcRb6cyItKEAgMDGTBgAG+99RZ79+7FbreTkJDAww8/zEMPPXTS7Xx8fPjss8+4//77+fvf/055eTmDBw/m7bff5pxzzqFZs2aNljE6Oprly5czefJkJk+eTGFhIe3atePpp58mNTX1lNt2796dr7/+mvvvv5+xY8fSvHlzRo8ezTXXXMNtt93W4Ew33ngjWVlZvPzyy7zxxht0796d2bNnM3Xq1AY/5/jx44mLi2PGjBlMmDABwzBITExk7Nix1V73xx9/JC0tjWnTpmEYBnv27Kn1Uv2n83MT8XYW41T7KEXEpb377rvcdNNN/PjjjwwaNMjsOCIiDaIyIuIm5s+fz8GDB+nRowc+Pj6sXLmSZ555hj59+lSd+isi4o50mEbETYSFhfHee+/x5JNPUlJSQqtWrRg3bhxPPvmk2dFERE6L9oyIiIiIqXRqr4iIiJhKZURERERMpTIiIiIipnKLCaxOp5OsrCzCwsIa7aZWIiIicmYZhkFRURFxcXGnvDijW5SRrKws4uPjzY4hIiIiDbB///5T3pjSLcpIWFgYUPlmwsPDTU7TMHa7nSVLlpCSkoK/v7/ZcbyexsN1aCxch8bCdXjKWBQWFhIfH1/17/jJuEUZOXFoJjw83K3LSHBwMOHh4W79F8tTaDxch8bCdWgsXIenjcXvTbHQBFYRERExlcqIiIiImEplREREREylMiIiIiKmUhkRERERU6mMiIiIiKlURkRERMRUKiMiIiJiKpURERERMZXKiIiIiJ
hKZURERERMpTIiIiIiplIZEREREVOpjIiIiIipVEZERETEVCojIiIiYiqVERERETGVyoiIiIiYSmVERERETKUyIiIiIqZSGRERERFTqYyIiIiIqVRGRERExFQqIyIiImKqepeR77//nssuu4y4uDgsFguffPLJ726zdOlSkpOTCQwMpF27drz88ssNySoiIiIeqN5lpKSkhF69evHSSy/Vaf09e/ZwySWXMGTIENatW8ejjz7K3XffzYcffljvsCIiIuJ5/Oq7wciRIxk5cmSd13/55ZdJSEhg5syZAHTp0oXVq1fz7LPPcs0119T35UVERMTD1LuM1NeKFStISUmptuyiiy5izpw52O12/P39a2xjs9mw2WxVjwsLCwGw2+3Y7fYzG/gMOZHbXfN7Go2H69BYuA6NhevwlLGoa/4zXkZycnKIjY2ttiw2NpaKigry8vJo1apVjW2mT5/O1KlTayxfsmQJwcHBZyxrU0hPTzc7gvyKxsN1aCxch8bCdbj7WJSWltZpvTNeRgAsFku1x4Zh1Lr8hMmTJ5Oamlr1uLCwkPj4eFJSUggPDz9zQc8gu91Oeno6I0aMqHVvkDQtjYfr0Fi4Do2F6zBrLMornBSV2SmyVVBU9r+vYlsFJeUOin/5c7GtgtJyB6XlDkrKKyixOSgtryAowJcPbz+n6vlOHNn4PWe8jLRs2ZKcnJxqy3Jzc/Hz8yMqKqrWbaxWK1artcZyf39/t/+AeMJ78CQaD9ehsXAdGgvX0dCxKLM7OFpazpGSco6W2DlSWs6x0so/HzteTkGpnWPH7RT88lV43E5hmZ0yu/O08oZZ/arlrWv2M15GBg4cyGeffVZt2ZIlS+jXr5/+souIiNRRmd1BbqGN3KIyDhfZOFxsI7fQRl6xjbzicvJLbOQXl5NfbKOk3HFarxVq9SMs8MSXP6FWP0ID/Qiz+hHyy1eY1Y9gqy8hAX4EB/gSYvUj1NqwWlHvrYqLi9m1a1fV4z179rB+/XoiIyNJSEhg8uTJHDx4kHnz5gEwceJEXnrpJVJTU7n11ltZsWIFc+bMYf78+Q0KLCIi4mlKyyvIOlZG1rHjZBccZ39+Cat2+fDBm2vILSonp7CMguP1m8zq52OheUgAkcEBNA/xp3lwAM2CA2gW7E+zIH+aBfsTEeRPeJA/4YH/+3Oo1Q9fn9qnUZwp9S4jq1evZvjw4VWPT8ztGDt2LHPnziU7O5vMzMyq7yclJbFo0SLuu+8+Zs2aRVxcHC+88IJO6xUREa9R4XBy8Nhx9uaXkplfwoGjxzlw9Dj7j5Zy4OhxjpSU17KVDxzOr7bE6udDTLiV6FArMWGBRIdZaRFqpUVYAFEhVlqEBtAi1ErzkADCA/1OOjfT1dS7jAwbNqxqAmpt5s6dW2PZ0KFDWbt2bX1fSkRExG0YhkFOYRm7c0vIyCsm43AJGXkl7Msv4eDR41Q4T/5vJ1TOt4hrFkSrZoG0DLdSmLOP8/r1JK55CC0jAokND3SrglEfTXI2jYiIiKcwDIOsgjJ2HCpi56EitucUszO3iF25xZSeYq6G1c+HtlHBJESGEB8ZRJvmwcQ3r/xvm8ggwgP/N4/SbrezaNFeLunb2ivmV6qMiIiInER5hZMdh4rYkl3IlqxCtmZXfhWWVdS6vp+PhYSoYNq1CKV9dAjtokNoGxVCYlQIMWFWfJp4Loa7UBkRERGhcl7Hztxifj5QwMaDx/j5QAFbs4sod9Q83dXPx0K76BA6xobRKSaMs1qG0iEmjLZRwfj71vu2b15PZURERLzS0ZJy1u0/ytp9x1iz7ygbDhyr9TBLRJA/3eLC6dKq8qtrq3A6xIQS4KfS0VhURkRExCvkFpXx34wj/LTnCP/dk8+OQ8U11gm1+tGjdQQ920TQo00EPVs3Iz4yyCMnjboSlREREfFIBcftrNidzw+7DrN8Vz4ZeSU11mkXHULfhOaVX22b0TEmrMmvsSEqIyIi4iEcToP1+4+xdHsu3+/MY+OBY/z6bFqLBbq2Cqd/UiQDkqI4O7E5UaE1bz0iTU9lRERE3FZBqZ2lOw/z7bZclu44XOPiYe2jQzi3QwvO7RhN/6RIIoI8/zRZd6QyIiIibiW3sIyvthziq005rMjIx/Gr3R/hgX6c1yma8zpFc26HFsQ1CzIxqdSVyoiIiLi87ILjfLExmy835bA28yi/vhB4x5hQzu8cw/mdY0hu2xw/nVrrdlRGRETEJR0rLWfRzzl8uv4gP+09Uq2A9I5vxsjuLbmoW0sSW4SYF1IahcqIiIi4DLvDyTfbclm4+gBLd+Rid/yvgZyd2JxLe7Tiou4taRWhwy+eRGVERERMt+NQEQtX7+ejtQfJ/9Uk1C6twrmidxyX9YqjteZ/eCyVERERMUWZ3cHnG7N5e+U+1u8/VrU8OszK1X1bc03fNnSKDTMvoDQZlREREWlS+4+U8vZ/9/H+qv0cLbUDlfd6uaBLDNf3i2dop2hNQvUyKiMiInLGGYbByowjvLYsg2+251ZNRm3dLIgbByRwfb94osN0ATJvpTIiIiJnTIXDyZebcvjXsgw2HiioWj6kYwtGn9OWC7rE6vLrojIiIiKNr8zuYP5Pmby2bA8Hjx0HwOrnw3X92nDL4CTaR4eanFBcicqIiIg0mtLyCt5Zmckr32eQV2wDICokgDEDE7n5nATdC0ZqpTIiIiKnrcRWwVsr9/Gv7zOqTs1t0zyIiUPbc21yGwL9fU1OKK5MZURERBrMVuHgnZWZvPTtrqqb1CVEBvOn4R24qm9r/HVWjNSByoiIiNSbw2nw6fqDPJ++gwNHK+eEJEYF86fzO3JF7ziVEKkXlREREamXb7fnMuPLbWzLKQIgJszKfSM6cV1yG10fRBpEZUREROpkV24xT36xhe+2HwYgLNCPO4a155ZBSQQFaE6INJzKiIiInFJhmZ0Xvt7J3OV7qXAa+PtaGDcokTuHd6BZcIDZ8cQDqIyIiEitnE6DD9Yc4OmvtpFXXDk59YLOMTx2aRfa6Toh0ohURkREpIZduUU8+tEmftp7BID20SH85Q9dGXZWjMnJxBOpjIiISBW7E2b+ZxevLtuD3WEQHODLfRd2YtzgRJ0hI2eMyoiIiACwMuMIMzb4crgsA6g8JDPtyu60bhZkcjLxdCojIiJerrS8gr9/uY15K/YBFmLCrEy9vBsXd2+JxaKb2MmZpzIiIuLFVu89wv0LN7AvvxSAQbFOXpowiMiwYJOTiTdRGRER8UJldgfPLdnOaz/swTCgVUQgT13ZjcId/yUs0N/seOJlVEZERLzMtpxC7np3HTtziwG4LrkNf7msK0G+sGiHyeHEK6mMiIh4CcMwePu/mTz5+RZsFU5ahFqZcU0PLugSC4Ddbjc5oXgrlRERES9wrLSchz/cyFebDwEw/Kxonr2uF1GhVpOTiaiMiIh4vFV7j3DP/HVkFZTh72vh4Ys783+Dk/Dx0Zky4hpURkREPJRhGMz5YQ/Tv9yGw2mQGBXMizf0pUebCLOjiVSjMiIi4oFKbBU8/OFGPt+YDcAVveP42
1U9CLXq1764Hv2tFBHxMBmHi5n49hp2HCrGz8fCny/twthBibqAmbgslREREQ+yZHMOqe9voNhWQXSYlbSb+nJ2YqTZsUROSWVERMQDGIbB7KW7eXrxdgD6J0by0o19iAkPNDmZyO9TGRERcXO2CgeTP/qZj9YeBGD0OW15/LKuusuuuA2VERERN5ZfbOP2t9awet9RfH0sPHFZV8YMTDQ7lki9qIyIiLipHYeKGP/mKvYfOU6Y1Y9ZN/XlvE7RZscSqTeVERERN7QyI59b562mqKyChMhgXh/Xjw4xYWbHEmkQlRERETfz5c/Z3LNgPeUVTvq1bc6rY/oRGRJgdiyRBlMZERFxI28u38uUzzZjGJDSNZYXbuhDoL+v2bFETovKiIiIGzAMg2eXbGfWt7sBuGlAAtOu6I6v7i8jHkBlRETExTmcBpM/2sj7qw8AcP+ITvzp/A66oqp4DJUREREXZnc4uW/Bej7fmI2PBaZf3YNRZyeYHUukUTXoijhpaWkkJSURGBhIcnIyy5YtO+X677zzDr169SI4OJhWrVpxyy23kJ+f36DAIiLeoszu4I631/D5xmz8fS3MurGvioh4pHqXkQULFnDvvffy2GOPsW7dOoYMGcLIkSPJzMysdf0ffviBMWPGMH78eDZv3szChQtZtWoVEyZMOO3wIiKeqrS8glvnrebrrblY/Xx4dXQ/RvZoZXYskTOi3mXk+eefZ/z48UyYMIEuXbowc+ZM4uPjmT17dq3rr1y5ksTERO6++26SkpI499xzuf3221m9evVphxcR8URFZXbGvv4Ty3bmERzgyxu3nM3wzjFmxxI5Y+pVRsrLy1mzZg0pKSnVlqekpLB8+fJatxk0aBAHDhxg0aJFGIbBoUOH+OCDD7j00ksbnlpExEMVltkZPecnVu09SligH2+NH8Cg9i3MjiVyRtVrAmteXh4Oh4PY2Nhqy2NjY8nJyal1m0GDBvHOO+8watQoysrKqKio4PLLL+fFF1886evYbDZsNlvV48LCQgDsdjt2u70+kV3Gidzumt/TaDxch8bif4ptFfzfm2tYv7+AZkH+zB2XTLe40Cb72WgsXIenjEVd8zfobJrfnk5mGMZJTzHbsmULd999N48//jgXXXQR2dnZPPjgg0ycOJE5c+bUus306dOZOnVqjeVLliwhODi4IZFdRnp6utkR5Fc0Hq7D28fC5oCXt/qSUWQhyNfg1o7H2bf+B/atb/os3j4WrsTdx6K0tLRO61kMwzDq+qTl5eUEBwezcOFCrrrqqqrl99xzD+vXr2fp0qU1thk9ejRlZWUsXLiwatkPP/zAkCFDyMrKolWrmhOyatszEh8fT15eHuHh4XWN61Lsdjvp6emMGDECf39/s+N4PY2H69BYVE5WnfDWuqpDM2+OS6ZH64gmz6GxcB2eMhaFhYW0aNGCgoKCU/77Xa89IwEBASQnJ5Oenl6tjKSnp3PFFVfUuk1paSl+ftVfxte38tLFJ+tBVqsVq9VaY7m/v79bDwp4xnvwJBoP1+GtY3G83MHEdzZUFhFr5RyR3vHNTM3krWPhitx9LOqavd5n06SmpvLaa6/x+uuvs3XrVu677z4yMzOZOHEiAJMnT2bMmDFV61922WV89NFHzJ49m4yMDH788Ufuvvtu+vfvT1xcXH1fXkTEY5RXOLn97TWsyMgn1OrHm+P7m15ERMxQ7zkjo0aNIj8/n2nTppGdnU337t1ZtGgRbdu2BSA7O7vaNUfGjRtHUVERL730Evfffz/NmjXj/PPPZ8aMGY33LkRE3IzDaXDfgvV8v+MwQf6+zL3lbPomNDc7logpGjSBddKkSUyaNKnW782dO7fGsrvuuou77rqrIS8lIuJxDMPgz5/8zBc/V15Z9dUxyfRLjDQ7lohpGnQ5eBERabgZi7cz/6f9+Fjgn3/sw5CO0WZHEjGVyoiISBOa/d1uXl66G6i86d0lusS7iMqIiEhTee+nTGYs3gbAo5d01k3vRH6hMiIi0gS+2XaIRz/+GYA7hrXntvPam5xIxHWojIiInGEbDxzjznfW4TTg2uQ2PHTRWWZHEnEpKiMiImdQZn4p/zd3FcftDoZ0bMH0q3uc9PYZIt5KZURE5Aw5WlLOuDd+Iq+4nK6twpl9czL+vvq1K/Jb+lSIiJwBZXYHE+atJiOvhNbNgnjjlrMJtTbo0k4iHk9lRESkkTmdBve/v4E1+44SHujH3FvOJjY80OxYIi5LZUREpJHN/HrHr66u2o+OsWFmRxJxaSojIiKN6NP1B3nhm10APHVVD85pF2VyIhHXpzIiItJI1mYe5cEPNgJw+9B2XNcv3uREIu5BZUREpBEcPHac2+atobzCyYVdYnnoos5mRxJxGyojIiKnqdhWwfi5q8grttGlVTj//GNvfH10LRGRulIZERE5DZVnzqxnW04RLUKtvDa2HyE6hVekXlRGREROw6xvd/HV5kME+Prw6phkWjcLMjuSiNtRGRERaaBvth3i+a93APDXK7vRN6G5yYlE3JPKiIhIA2QcLuae+esxDLj5nARGnZ1gdiQRt6UyIiJST0Vldm57aw1FtgrOTmzO43/oZnYkEbemMiIiUg8nLvW+K7eYluGBzLqpLwF++lUqcjr0CRIRqYe073axZEvlhNXZN/clJkz3nBE5XSojIiJ19OOuPJ5P/9+E1T6asCrSKFRGRETqIKegjLvnr8NpwPX92mjCqkgjUhkREfkddoeTO99dS35JOV1bhTPtiu5mRxLxKCojIiK/4+9fbmPNvqOEBfox++a+BPr7mh1JxKOojIiInMKXP2cz54c9ADx7XS/aRoWYnEjE86iMiIicxJ68Eh78YCMAt5/Xjou6tTQ5kYhnUhkREalFmd3Bne+spdhWQf+kSB686CyzI4l4LJUREZFaPLVoK1uyC4kMCeDFG/rg56tflyJnij5dIiK/sXhTNvNW7APg+et7ERuuC5uJnEkqIyIiv7L/SCkP/WqeyLCzYkxOJOL5VEZERH5hdzi5+711FJZV0Du+GQ9onohIk1AZERH5xbNLtrMu8xhhgX68eEMf/DVPRKRJ6JMmIgIs3XGYV5ZmAPD0NT2Jjww2OZGI91AZERGvl1ds4/73NwBw8zkJjOzRyuREIt5FZUREvJphGDz8wUbyim10ig3lz5d2NTuSiNdRGRERr/bWyn38Z1suAX4+/POPfXTfGRETqIyIiNfacaiIv32xFYBHLu5Ml1bhJicS8U4qIyLilcrsDu6evw5bhZOhnaK5ZXCi2ZFEvJbKiIh4pRmLt7Etp4gWoQE8e10vLBaL2ZFEvJbKiIh4naU7DvPGj3sBeObaXkSHWc0NJOLlVEZExKscLSnnwYWVp/GOHdiW4Z11uXcRs6mMiIjXMAyDP3+yidwiG+2jQ5h8SRezI4kIKiMi4kU+XZ/FFz9n4+dj4R+jeus0XhEXoTIiIl7h4LHj/OXTTQDcc0FHerZpZm4gEamiMiIiHs/pNHjg/Q0UlVXQJ6EZdwxrb3YkEfkVlRER8XhvLN/Liox8gvx9ef76
    [base64-encoded PNG output data omitted here: three embedded matplotlib figures produced by the code cell below, titled "sigmoid function", "step function", and "tanh function", each plotted against z]
    " - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], - "source": [ - "\"\"\"The sigmoid function (or the logistic curve) is a\n", - "function that takes any real number, z, and outputs a number (0,1).\n", - "It is useful in neural networks for assigning weights on a relative scale.\n", - "The value z is the weighted sum of parameters involved in the learning algorithm.\"\"\"\n", - "\n", - "import numpy\n", - "import matplotlib.pyplot as plt\n", - "import math as mt\n", - "\n", - "z = numpy.arange(-5, 5, .1)\n", - "sigma_fn = numpy.vectorize(lambda z: 1/(1+numpy.exp(-z)))\n", - "sigma = sigma_fn(z)\n", - "\n", - "fig = plt.figure()\n", - "ax = fig.add_subplot(111)\n", - "ax.plot(z, sigma)\n", - "ax.set_ylim([-0.1, 1.1])\n", - "ax.set_xlim([-5,5])\n", - "ax.grid(True)\n", - "ax.set_xlabel('z')\n", - "ax.set_title('sigmoid function')\n", - "\n", - "plt.show()\n", - "\n", - "\"\"\"Step Function\"\"\"\n", - "z = numpy.arange(-5, 5, .02)\n", - "step_fn = numpy.vectorize(lambda z: 1.0 if z >= 0.0 else 0.0)\n", - "step = step_fn(z)\n", - "\n", - "fig = plt.figure()\n", - "ax = fig.add_subplot(111)\n", - "ax.plot(z, step)\n", - "ax.set_ylim([-0.5, 1.5])\n", - "ax.set_xlim([-5,5])\n", - "ax.grid(True)\n", - "ax.set_xlabel('z')\n", - "ax.set_title('step function')\n", - "\n", - "plt.show()\n", - "\n", - "\"\"\"tanh Function\"\"\"\n", - "z = numpy.arange(-2*mt.pi, 2*mt.pi, 0.1)\n", - "t = numpy.tanh(z)\n", - "\n", - "fig = plt.figure()\n", - "ax = fig.add_subplot(111)\n", - "ax.plot(z, t)\n", - "ax.set_ylim([-1.0, 1.0])\n", - "ax.set_xlim([-2*mt.pi,2*mt.pi])\n", - "ax.grid(True)\n", - "ax.set_xlabel('z')\n", - "ax.set_title('tanh function')\n", - "\n", - "plt.show()" + "cell_type": "markdown", + "id": "503eb7b2", + "metadata": { + "editable": true + }, + "source": [ + "The difference is non-negative definite since each component of the\n", + "matrix product is non-negative definite. \n", + "This means the variance we obtain with the standard OLS will always for $\\lambda > 0$ be larger than the variance of $\\boldsymbol{\\theta}$ obtained with the Ridge estimator. This has interesting consequences when we discuss the so-called bias-variance trade-off below." ] }, { "cell_type": "markdown", - "id": "9c1d64b9", - "metadata": {}, + "id": "1a33763c", + "metadata": { + "editable": true + }, "source": [ - "## Two parameters\n", + "## Deriving OLS from a probability distribution\n", + "\n", + "Our basic assumption when we derived the OLS equations was to assume\n", + "that our output is determined by a given continuous function\n", + "$f(\\boldsymbol{x})$ and a random noise $\\boldsymbol{\\epsilon}$ given by the normal\n", + "distribution with zero mean value and an undetermined variance\n", + "$\\sigma^2$.\n", "\n", - "We assume now that we have two classes with $y_i$ either $0$ or $1$. Furthermore we assume also that we have only two parameters $\\beta$ in our fitting of the Sigmoid function, that is we define probabilities" + "We found above that the outputs $\\boldsymbol{y}$ have a mean value given by\n", + "$\\boldsymbol{X}\\hat{\\boldsymbol{\\theta}}$ and variance $\\sigma^2$. Since the entries to\n", + "the design matrix are not stochastic variables, we can assume that the\n", + "probability distribution of our targets is also a normal distribution\n", + "but now with mean value $\\boldsymbol{X}\\hat{\\boldsymbol{\\theta}}$. 
This means that a\n", + "single output $y_i$ is given by the Gaussian distribution" ] }, { "cell_type": "markdown", - "id": "d1929423", - "metadata": {}, + "id": "70a645e3", + "metadata": { + "editable": true + }, "source": [ "$$\n", - "\\begin{align*}\n", - "p(y_i=1|x_i,\\boldsymbol{\\beta}) &= \\frac{\\exp{(\\beta_0+\\beta_1x_i)}}{1+\\exp{(\\beta_0+\\beta_1x_i)}},\\nonumber\\\\\n", - "p(y_i=0|x_i,\\boldsymbol{\\beta}) &= 1 - p(y_i=1|x_i,\\boldsymbol{\\beta}),\n", - "\\end{align*}\n", + "y_i\\sim \\mathcal{N}(\\boldsymbol{X}_{i,*}\\boldsymbol{\\theta}, \\sigma^2)=\\frac{1}{\\sqrt{2\\pi\\sigma^2}}\\exp{\\left[-\\frac{(y_i-\\boldsymbol{X}_{i,*}\\boldsymbol{\\theta})^2}{2\\sigma^2}\\right]}.\n", "$$" ] }, { "cell_type": "markdown", - "id": "1698b9e7", - "metadata": {}, + "id": "7fced9cb", + "metadata": { + "editable": true + }, "source": [ - "where $\\boldsymbol{\\beta}$ are the weights we wish to extract from data, in our case $\\beta_0$ and $\\beta_1$. \n", + "## Independent and Identically Distributed (iid)\n", "\n", - "Note that we used" + "We assume now that the various $y_i$ values are stochastically distributed according to the above Gaussian distribution. \n", + "We define this distribution as" ] }, { "cell_type": "markdown", - "id": "eff2f862", - "metadata": {}, + "id": "313c05af", + "metadata": { + "editable": true + }, "source": [ "$$\n", - "p(y_i=0\\vert x_i, \\boldsymbol{\\beta}) = 1-p(y_i=1\\vert x_i, \\boldsymbol{\\beta}).\n", + "p(y_i, \\boldsymbol{X}\\vert\\boldsymbol{\\theta})=\\frac{1}{\\sqrt{2\\pi\\sigma^2}}\\exp{\\left[-\\frac{(y_i-\\boldsymbol{X}_{i,*}\\boldsymbol{\\theta})^2}{2\\sigma^2}\\right]},\n", "$$" ] }, { "cell_type": "markdown", - "id": "640f9f45", - "metadata": {}, + "id": "66eeeef9", + "metadata": { + "editable": true + }, "source": [ - "## Maximum likelihood\n", + "which reads as finding the likelihood of an event $y_i$ with the input variables $\\boldsymbol{X}$ given the parameters (to be determined) $\\boldsymbol{\\theta}$.\n", "\n", - "In order to define the total likelihood for all possible outcomes from a \n", - "dataset $\\mathcal{D}=\\{(y_i,x_i)\\}$, with the binary labels\n", - "$y_i\\in\\{0,1\\}$ and where the data points are drawn independently, we use the so-called [Maximum Likelihood Estimation](https://en.wikipedia.org/wiki/Maximum_likelihood_estimation) (MLE) principle. \n", - "We aim thus at maximizing \n", - "the probability of seeing the observed data. 
We can then approximate the \n", - "likelihood in terms of the product of the individual probabilities of a specific outcome $y_i$, that is" + "Since these events are assumed to be independent and identicall distributed we can build the probability distribution function (PDF) for all possible event $\\boldsymbol{y}$ as the product of the single events, that is we have" ] }, { "cell_type": "markdown", - "id": "f94fafba", - "metadata": {}, + "id": "cda5e4d2", + "metadata": { + "editable": true + }, "source": [ "$$\n", - "\\begin{align*}\n", - "P(\\mathcal{D}|\\boldsymbol{\\beta})& = \\prod_{i=1}^n \\left[p(y_i=1|x_i,\\boldsymbol{\\beta})\\right]^{y_i}\\left[1-p(y_i=1|x_i,\\boldsymbol{\\beta}))\\right]^{1-y_i}\\nonumber \\\\\n", - "\\end{align*}\n", + "p(\\boldsymbol{y},\\boldsymbol{X}\\vert\\boldsymbol{\\theta})=\\prod_{i=0}^{n-1}\\frac{1}{\\sqrt{2\\pi\\sigma^2}}\\exp{\\left[-\\frac{(y_i-\\boldsymbol{X}_{i,*}\\boldsymbol{\\theta})^2}{2\\sigma^2}\\right]}=\\prod_{i=0}^{n-1}p(y_i,\\boldsymbol{X}\\vert\\boldsymbol{\\theta}).\n", "$$" ] }, { "cell_type": "markdown", - "id": "5d457b2e", - "metadata": {}, + "id": "2e6ed5cd", + "metadata": { + "editable": true + }, "source": [ - "from which we obtain the log-likelihood and our **cost/loss** function" + "We will write this in a more compact form reserving $\\boldsymbol{D}$ for the domain of events, including the ouputs (targets) and the inputs. That is\n", + "in case we have a simple one-dimensional input and output case" ] }, { "cell_type": "markdown", - "id": "683657ba", - "metadata": {}, + "id": "ba81d29e", + "metadata": { + "editable": true + }, "source": [ "$$\n", - "\\mathcal{C}(\\boldsymbol{\\beta}) = \\sum_{i=1}^n \\left( y_i\\log{p(y_i=1|x_i,\\boldsymbol{\\beta})} + (1-y_i)\\log\\left[1-p(y_i=1|x_i,\\boldsymbol{\\beta}))\\right]\\right).\n", + "\\boldsymbol{D}=[(x_0,y_0), (x_1,y_1),\\dots, (x_{n-1},y_{n-1})].\n", "$$" ] }, { "cell_type": "markdown", - "id": "3d17d95b", - "metadata": {}, + "id": "26e2d548", + "metadata": { + "editable": true + }, "source": [ - "## The cost function rewritten\n", - "\n", - "Reordering the logarithms, we can rewrite the **cost/loss** function as" + "In the more general case the various inputs should be replaced by the possible features represented by the input data set $\\boldsymbol{X}$. \n", + "We can now rewrite the above probability as" ] }, { "cell_type": "markdown", - "id": "76cd7541", - "metadata": {}, + "id": "0d5ef8ad", + "metadata": { + "editable": true + }, "source": [ "$$\n", - "\\mathcal{C}(\\boldsymbol{\\beta}) = \\sum_{i=1}^n \\left(y_i(\\beta_0+\\beta_1x_i) -\\log{(1+\\exp{(\\beta_0+\\beta_1x_i)})}\\right).\n", + "p(\\boldsymbol{D}\\vert\\boldsymbol{\\theta})=\\prod_{i=0}^{n-1}\\frac{1}{\\sqrt{2\\pi\\sigma^2}}\\exp{\\left[-\\frac{(y_i-\\boldsymbol{X}_{i,*}\\boldsymbol{\\theta})^2}{2\\sigma^2}\\right]}.\n", "$$" ] }, { "cell_type": "markdown", - "id": "b88061e7", - "metadata": {}, + "id": "b6c5763c", + "metadata": { + "editable": true + }, "source": [ - "The maximum likelihood estimator is defined as the set of parameters that maximize the log-likelihood where we maximize with respect to $\\beta$.\n", - "Since the cost (error) function is just the negative log-likelihood, for logistic regression we have that" + "It is a conditional probability (see below) and reads as the likelihood of a domain of events $\\boldsymbol{D}$ given a set of parameters $\\boldsymbol{\\theta}$." 
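+    "\n",
+    "To make the likelihood above concrete, the small sketch below (an illustration added here; the synthetic data, the value $\sigma=1$ and all variable names are assumptions, not part of the derivation) evaluates $\log{p(\boldsymbol{D}\vert\boldsymbol{\theta})}$ for a simple linear model on simulated data.\n",
+    "\n",
+    "```python\n",
+    "import numpy as np\n",
+    "\n",
+    "rng = np.random.default_rng(2025)\n",
+    "\n",
+    "# Simulated domain D: inputs x, outputs y = 2 + 3x + Gaussian noise (assumed values)\n",
+    "n = 100\n",
+    "x = np.linspace(0, 1, n)\n",
+    "X = np.column_stack((np.ones(n), x))     # design matrix with an intercept column\n",
+    "sigma = 1.0\n",
+    "theta_true = np.array([2.0, 3.0])\n",
+    "y = X @ theta_true + rng.normal(0, sigma, n)\n",
+    "\n",
+    "def log_likelihood(theta):\n",
+    "    # sum of the logs of the single-event Gaussians p(y_i, X | theta)\n",
+    "    residuals = y - X @ theta\n",
+    "    return np.sum(-0.5*np.log(2*np.pi*sigma**2) - residuals**2/(2*sigma**2))\n",
+    "\n",
+    "print(log_likelihood(theta_true))            # parameters close to the truth\n",
+    "print(log_likelihood(np.array([0.0, 0.0])))  # a poor guess gives a much lower value\n",
+    "```"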
] }, { "cell_type": "markdown", - "id": "3c95fe37", - "metadata": {}, + "id": "e4afd86f", + "metadata": { + "editable": true + }, "source": [ - "$$\n", - "\\mathcal{C}(\\boldsymbol{\\beta})=-\\sum_{i=1}^n \\left(y_i(\\beta_0+\\beta_1x_i) -\\log{(1+\\exp{(\\beta_0+\\beta_1x_i)})}\\right).\n", - "$$" + "## Maximum Likelihood Estimation (MLE)\n", + "\n", + "In statistics, maximum likelihood estimation (MLE) is a method of\n", + "estimating the parameters of an assumed probability distribution,\n", + "given some observed data. This is achieved by maximizing a likelihood\n", + "function so that, under the assumed statistical model, the observed\n", + "data is the most probable. \n", + "\n", + "We will assume here that our events are given by the above Gaussian\n", + "distribution and we will determine the optimal parameters $\\theta$ by\n", + "maximizing the above PDF. However, computing the derivatives of a\n", + "product function is cumbersome and can easily lead to overflow and/or\n", + "underflowproblems, with potentials for loss of numerical precision.\n", + "\n", + "In practice, it is more convenient to maximize the logarithm of the\n", + "PDF because it is a monotonically increasing function of the argument.\n", + "Alternatively, and this will be our option, we will minimize the\n", + "negative of the logarithm since this is a monotonically decreasing\n", + "function.\n", + "\n", + "Note also that maximization/minimization of the logarithm of the PDF\n", + "is equivalent to the maximization/minimization of the function itself." ] }, { "cell_type": "markdown", - "id": "4f573bed", - "metadata": {}, + "id": "03d912b0", + "metadata": { + "editable": true + }, "source": [ - "This equation is known in statistics as the **cross entropy**. Finally, we note that just as in linear regression, \n", - "in practice we often supplement the cross-entropy with additional regularization terms, usually $L_1$ and $L_2$ regularization as we did for Ridge and Lasso regression." + "## A new Cost Function\n", + "\n", + "We could now define a new cost function to minimize, namely the negative logarithm of the above PDF" ] }, { "cell_type": "markdown", - "id": "08a700a8", - "metadata": {}, + "id": "fef4cb78", + "metadata": { + "editable": true + }, "source": [ - "## Minimizing the cross entropy\n", - "\n", - "The cross entropy is a convex function of the weights $\\boldsymbol{\\beta}$ and,\n", - "therefore, any local minimizer is a global minimizer. 
\n", - "\n", - "Minimizing this\n", - "cost function with respect to the two parameters $\\beta_0$ and $\\beta_1$ we obtain" + "$$\n", + "C(\\boldsymbol{\\theta})=-\\log{\\prod_{i=0}^{n-1}p(y_i,\\boldsymbol{X}\\vert\\boldsymbol{\\theta})}=-\\sum_{i=0}^{n-1}\\log{p(y_i,\\boldsymbol{X}\\vert\\boldsymbol{\\theta})},\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "id": "000125c6", + "metadata": { + "editable": true + }, + "source": [ + "which becomes" ] }, { "cell_type": "markdown", - "id": "9bd6709b", - "metadata": {}, + "id": "4f665607", + "metadata": { + "editable": true + }, "source": [ "$$\n", - "\\frac{\\partial \\mathcal{C}(\\boldsymbol{\\beta})}{\\partial \\beta_0} = -\\sum_{i=1}^n \\left(y_i -\\frac{\\exp{(\\beta_0+\\beta_1x_i)}}{1+\\exp{(\\beta_0+\\beta_1x_i)}}\\right),\n", + "C(\\boldsymbol{\\theta})=\\frac{n}{2}\\log{2\\pi\\sigma^2}+\\frac{\\vert\\vert (\\boldsymbol{y}-\\boldsymbol{X}\\boldsymbol{\\theta})\\vert\\vert_2^2}{2\\sigma^2}.\n", "$$" ] }, { "cell_type": "markdown", - "id": "98c81b67", - "metadata": {}, + "id": "5f5877fa", + "metadata": { + "editable": true + }, "source": [ - "and" + "Taking the derivative of the *new* cost function with respect to the parameters $\\theta$ we recognize our familiar OLS equation, namely" ] }, { "cell_type": "markdown", - "id": "5540b76a", - "metadata": {}, + "id": "1c342299", + "metadata": { + "editable": true + }, "source": [ "$$\n", - "\\frac{\\partial \\mathcal{C}(\\boldsymbol{\\beta})}{\\partial \\beta_1} = -\\sum_{i=1}^n \\left(y_ix_i -x_i\\frac{\\exp{(\\beta_0+\\beta_1x_i)}}{1+\\exp{(\\beta_0+\\beta_1x_i)}}\\right).\n", + "\\boldsymbol{X}^T\\left(\\boldsymbol{y}-\\boldsymbol{X}\\boldsymbol{\\theta}\\right) =0,\n", "$$" ] }, { "cell_type": "markdown", - "id": "0018d823", - "metadata": {}, + "id": "4b155a17", + "metadata": { + "editable": true + }, "source": [ - "## A more compact expression\n", - "\n", - "Let us now define a vector $\\boldsymbol{y}$ with $n$ elements $y_i$, an\n", - "$n\\times p$ matrix $\\boldsymbol{X}$ which contains the $x_i$ values and a\n", - "vector $\\boldsymbol{p}$ of fitted probabilities $p(y_i\\vert x_i,\\boldsymbol{\\beta})$. 
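+    "\n",
+    "A quick numerical check of the OLS expression derived above (a sketch added for illustration; the one-dimensional polynomial data, the noise level and the use of NumPy's generic least-squares routine are assumptions) compares the closed-form solution with a library solver.\n",
+    "\n",
+    "```python\n",
+    "import numpy as np\n",
+    "\n",
+    "rng = np.random.default_rng(2025)\n",
+    "n = 200\n",
+    "x = rng.uniform(0, 1, n)\n",
+    "X = np.column_stack((np.ones(n), x, x**2))          # design matrix\n",
+    "y = 1.0 - 2.0*x + 5.0*x**2 + rng.normal(0, 0.1, n)\n",
+    "\n",
+    "# Closed-form optimum of the negative log-likelihood (the OLS equation)\n",
+    "theta_ols = np.linalg.inv(X.T @ X) @ X.T @ y\n",
+    "# The same solution from a generic least-squares solver\n",
+    "theta_lstsq = np.linalg.lstsq(X, y, rcond=None)[0]\n",
+    "\n",
+    "print(theta_ols)\n",
+    "print(theta_lstsq)   # agrees with the closed form to numerical precision\n",
+    "```\n",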
We can rewrite in a more compact form the first\n", - "derivative of cost function as" + "which leads to the well-known OLS equation for the optimal paramters $\\theta$" ] }, { "cell_type": "markdown", - "id": "ee63f4f9", - "metadata": {}, + "id": "c23eaf84", + "metadata": { + "editable": true + }, "source": [ "$$\n", - "\\frac{\\partial \\mathcal{C}(\\boldsymbol{\\beta})}{\\partial \\boldsymbol{\\beta}} = -\\boldsymbol{X}^T\\left(\\boldsymbol{y}-\\boldsymbol{p}\\right).\n", + "\\hat{\\boldsymbol{\\theta}}^{\\mathrm{OLS}}=\\left(\\boldsymbol{X}^T\\boldsymbol{X}\\right)^{-1}\\boldsymbol{X}^T\\boldsymbol{y}!\n", "$$" ] }, { "cell_type": "markdown", - "id": "413ff641", - "metadata": {}, + "id": "7699c6f7", + "metadata": { + "editable": true + }, "source": [ - "If we in addition define a diagonal matrix $\\boldsymbol{W}$ with elements \n", - "$p(y_i\\vert x_i,\\boldsymbol{\\beta})(1-p(y_i\\vert x_i,\\boldsymbol{\\beta})$, we can obtain a compact expression of the second derivative as" + "Next week we will make a similar analysis for Ridge and Lasso regression" ] }, { "cell_type": "markdown", - "id": "337a2c56", - "metadata": {}, + "id": "84c9b69d", + "metadata": { + "editable": true + }, "source": [ - "$$\n", - "\\frac{\\partial^2 \\mathcal{C}(\\boldsymbol{\\beta})}{\\partial \\boldsymbol{\\beta}\\partial \\boldsymbol{\\beta}^T} = \\boldsymbol{X}^T\\boldsymbol{W}\\boldsymbol{X}.\n", - "$$" + "## Why resampling methods\n", + "\n", + "Before we proceed, we need to rethink what we have been doing. In our\n", + "eager to fit the data, we have omitted several important elements in\n", + "our regression analysis. In what follows we will\n", + "1. look at statistical properties, including a discussion of mean values, variance and the so-called bias-variance tradeoff\n", + "\n", + "2. introduce resampling techniques like cross-validation, bootstrapping and jackknife and more\n", + "\n", + "and discuss how to select a given model (one of the difficult parts in machine learning)." ] }, { "cell_type": "markdown", - "id": "8c3e92fe", - "metadata": {}, + "id": "59e6b611", + "metadata": { + "editable": true + }, "source": [ - "## Extending to more predictors\n", + "## Resampling methods\n", + "Resampling methods are an indispensable tool in modern\n", + "statistics. They involve repeatedly drawing samples from a training\n", + "set and refitting a model of interest on each sample in order to\n", + "obtain additional information about the fitted model. For example, in\n", + "order to estimate the variability of a linear regression fit, we can\n", + "repeatedly draw different samples from the training data, fit a linear\n", + "regression to each new sample, and then examine the extent to which\n", + "the resulting fits differ. Such an approach may allow us to obtain\n", + "information that would not be available from fitting the model only\n", + "once using the original training sample.\n", + "\n", + "Two resampling methods are often used in Machine Learning analyses,\n", + "1. The **bootstrap method**\n", "\n", - "Within a binary classification problem, we can easily expand our model to include multiple predictors. Our ratio between likelihoods is then with $p$ predictors" + "2. and **Cross-Validation**\n", + "\n", + "In addition there are several other methods such as the Jackknife and the Blocking methods. We will discuss in particular\n", + "cross-validation and the bootstrap method." 
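+    "\n",
+    "As a first, minimal illustration of such a resampling loop (added as a sketch; the quadratic toy data, the five folds and the scikit-learn helpers are choices made here, not prescriptions from the text), cross-validation can be run in a few lines:\n",
+    "\n",
+    "```python\n",
+    "import numpy as np\n",
+    "from sklearn.linear_model import LinearRegression\n",
+    "from sklearn.model_selection import KFold, cross_val_score\n",
+    "from sklearn.preprocessing import PolynomialFeatures\n",
+    "\n",
+    "rng = np.random.default_rng(2025)\n",
+    "x = rng.uniform(0, 1, 100).reshape(-1, 1)\n",
+    "y = 2.0 + 3.0*x[:, 0] - 4.0*x[:, 0]**2 + rng.normal(0, 0.1, 100)\n",
+    "\n",
+    "X = PolynomialFeatures(degree=2).fit_transform(x)\n",
+    "kfold = KFold(n_splits=5, shuffle=True, random_state=2025)\n",
+    "scores = cross_val_score(LinearRegression(), X, y, cv=kfold,\n",
+    "                         scoring='neg_mean_squared_error')\n",
+    "\n",
+    "print('MSE per fold:', -scores)\n",
+    "print('mean MSE    :', -scores.mean())\n",
+    "```\n",
+    "\n",
+    "Each fold plays the role of test data exactly once, so the spread of the fold-wise MSE values already gives a feeling for the variability of the fit."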
] }, { "cell_type": "markdown", - "id": "ba84fae7", - "metadata": {}, + "id": "3ea44242", + "metadata": { + "editable": true + }, "source": [ - "$$\n", - "\\log{ \\frac{p(\\boldsymbol{\\beta}\\boldsymbol{x})}{1-p(\\boldsymbol{\\beta}\\boldsymbol{x})}} = \\beta_0+\\beta_1x_1+\\beta_2x_2+\\dots+\\beta_px_p.\n", - "$$" + "## Resampling approaches can be computationally expensive\n", + "\n", + "Resampling approaches can be computationally expensive, because they\n", + "involve fitting the same statistical method multiple times using\n", + "different subsets of the training data. However, due to recent\n", + "advances in computing power, the computational requirements of\n", + "resampling methods generally are not prohibitive. In this chapter, we\n", + "discuss two of the most commonly used resampling methods,\n", + "cross-validation and the bootstrap. Both methods are important tools\n", + "in the practical application of many statistical learning\n", + "procedures. For example, cross-validation can be used to estimate the\n", + "test error associated with a given statistical learning method in\n", + "order to evaluate its performance, or to select the appropriate level\n", + "of flexibility. The process of evaluating a model’s performance is\n", + "known as model assessment, whereas the process of selecting the proper\n", + "level of flexibility for a model is known as model selection. The\n", + "bootstrap is widely used." ] }, { "cell_type": "markdown", - "id": "bddd73d3", - "metadata": {}, + "id": "a98de365", + "metadata": { + "editable": true + }, "source": [ - "Here we defined $\\boldsymbol{x}=[1,x_1,x_2,\\dots,x_p]$ and $\\boldsymbol{\\beta}=[\\beta_0, \\beta_1, \\dots, \\beta_p]$ leading to" + "## Why resampling methods ?\n", + "**Statistical analysis.**\n", + "\n", + "* Our simulations can be treated as *computer experiments*. This is particularly the case for Monte Carlo methods which are widely used in statistical analyses.\n", + "\n", + "* The results can be analysed with the same statistical tools as we would use when analysing experimental data.\n", + "\n", + "* As in all experiments, we are looking for expectation values and an estimate of how accurate they are, i.e., possible sources for errors." ] }, { "cell_type": "markdown", - "id": "fce6aba6", - "metadata": {}, + "id": "2fd2ca6a", + "metadata": { + "editable": true + }, "source": [ - "$$\n", - "p(\\boldsymbol{\\beta}\\boldsymbol{x})=\\frac{ \\exp{(\\beta_0+\\beta_1x_1+\\beta_2x_2+\\dots+\\beta_px_p)}}{1+\\exp{(\\beta_0+\\beta_1x_1+\\beta_2x_2+\\dots+\\beta_px_p)}}.\n", - "$$" + "## Statistical analysis\n", + "\n", + "* As in other experiments, many numerical experiments have two classes of errors:\n", + "\n", + " * Statistical errors\n", + "\n", + " * Systematical errors\n", + "\n", + "* Statistical errors can be estimated using standard tools from statistics\n", + "\n", + "* Systematical errors are method specific and must be treated differently from case to case." ] }, { "cell_type": "markdown", - "id": "63325aad", - "metadata": {}, + "id": "87ab1f2b", + "metadata": { + "editable": true + }, "source": [ - "## Including more classes\n", + "## Resampling methods\n", + "\n", + "With all these analytical equations for both the OLS and Ridge\n", + "regression, we will now outline how to assess a given model. 
This will\n", + "lead to a discussion of the so-called bias-variance tradeoff (see\n", + "below) and so-called resampling methods.\n", + "\n", + "One of the quantities we have discussed as a way to measure errors is\n", + "the mean-squared error (MSE), mainly used for fitting of continuous\n", + "functions. Another choice is the absolute error.\n", "\n", - "Till now we have mainly focused on two classes, the so-called binary\n", - "system. Suppose we wish to extend to $K$ classes. Let us for the sake\n", - "of simplicity assume we have only two predictors. We have then following model" + "In the discussions below we will focus on the MSE and in particular since we will split the data into test and training data,\n", + "we discuss the\n", + "1. prediction error or simply the **test error** $\\mathrm{Err_{Test}}$, where we have a fixed training set and the test error is the MSE arising from the data reserved for testing. We discuss also the \n", + "\n", + "2. training error $\\mathrm{Err_{Train}}$, which is the average loss over the training data.\n", + "\n", + "As our model becomes more and more complex, more of the training data tends to used. The training may thence adapt to more complicated structures in the data. This may lead to a decrease in the bias (see below for code example) and a slight increase of the variance for the test error.\n", + "For a certain level of complexity the test error will reach minimum, before starting to increase again. The\n", + "training error reaches a saturation." ] }, { "cell_type": "markdown", - "id": "1c5878f6", - "metadata": {}, + "id": "88ffab6d", + "metadata": { + "editable": true + }, "source": [ - "$$\n", - "\\log{\\frac{p(C=1\\vert x)}{p(K\\vert x)}} = \\beta_{10}+\\beta_{11}x_1,\n", - "$$" + "## Resampling methods: Bootstrap\n", + "Bootstrapping is a [non-parametric approach](https://en.wikipedia.org/wiki/Nonparametric_statistics) to statistical inference\n", + "that substitutes computation for more traditional distributional\n", + "assumptions and asymptotic results. Bootstrapping offers a number of\n", + "advantages: \n", + "1. The bootstrap is quite general, although there are some cases in which it fails. \n", + "\n", + "2. Because it does not require distributional assumptions (such as normally distributed errors), the bootstrap can provide more accurate inferences when the data are not well behaved or when the sample size is small. \n", + "\n", + "3. It is possible to apply the bootstrap to statistics with sampling distributions that are difficult to derive, even asymptotically. \n", + "\n", + "4. It is relatively simple to apply the bootstrap to complex data-collection plans (such as stratified and clustered samples).\n", + "\n", + "The textbook by [Davison on the Bootstrap Methods and their Applications](https://www.cambridge.org/core/books/bootstrap-methods-and-their-application/ED2FD043579F27952363566DC09CBD6A) provides many more insights and proofs. In this course we will take a more practical approach and use the results and theorems provided in the literature. For those interested in reading more about the bootstrap methods, we recommend the above text and the one by [Efron and Tibshirani](https://www.routledge.com/An-Introduction-to-the-Bootstrap/Efron-Tibshirani/p/book/9780412042317).\n", + "\n", + "Before we proceed however, we need to remind ourselves about a central theorem in statistics, namely the so-called **central limit theorem**." 
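+    "\n",
+    "Before turning to the central limit theorem, a bare-bones non-parametric bootstrap can be written directly with NumPy (a sketch under assumed choices: the exponential toy sample, 1000 bootstrap replicas and the sample mean as the statistic of interest):\n",
+    "\n",
+    "```python\n",
+    "import numpy as np\n",
+    "\n",
+    "rng = np.random.default_rng(2025)\n",
+    "data = rng.exponential(scale=1.0, size=500)      # a deliberately non-Gaussian sample\n",
+    "\n",
+    "n_bootstraps = 1000\n",
+    "means = np.empty(n_bootstraps)\n",
+    "for b in range(n_bootstraps):\n",
+    "    resample = rng.choice(data, size=data.size, replace=True)   # draw with replacement\n",
+    "    means[b] = resample.mean()\n",
+    "\n",
+    "print('sample mean               :', data.mean())\n",
+    "print('bootstrap std of the mean :', means.std())\n",
+    "print('analytical estimate       :', data.std(ddof=1)/np.sqrt(data.size))\n",
+    "```\n",
+    "\n",
+    "The bootstrap spread of the resampled means is close to the familiar $\sigma/\sqrt{n}$ estimate, without any distributional assumption on the data."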
] }, { "cell_type": "markdown", - "id": "2c8a1b85", - "metadata": {}, + "id": "96fabf7e", + "metadata": { + "editable": true + }, "source": [ - "and" + "## The Central Limit Theorem\n", + "\n", + "Suppose we have a PDF $p(x)$ from which we generate a series $N$\n", + "of averages $\\mathbb{E}[x_i]$. Each mean value $\\mathbb{E}[x_i]$\n", + "is viewed as the average of a specific measurement, e.g., throwing \n", + "dice 100 times and then taking the average value, or producing a certain\n", + "amount of random numbers. \n", + "For notational ease, we set $\\mathbb{E}[x_i]=x_i$ in the discussion\n", + "which follows. We do the same for $\\mathbb{E}[z]=z$.\n", + "\n", + "If we compute the mean $z$ of $m$ such mean values $x_i$" ] }, { "cell_type": "markdown", - "id": "cced4ec8", - "metadata": {}, + "id": "6e876164", + "metadata": { + "editable": true + }, "source": [ "$$\n", - "\\log{\\frac{p(C=2\\vert x)}{p(K\\vert x)}} = \\beta_{20}+\\beta_{21}x_1,\n", + "z=\\frac{x_1+x_2+\\dots+x_m}{m},\n", "$$" ] }, { "cell_type": "markdown", - "id": "6efd1ce1", - "metadata": {}, + "id": "2b00fa3c", + "metadata": { + "editable": true + }, + "source": [ + "the question we pose is which is the PDF of the new variable $z$." + ] + }, + { + "cell_type": "markdown", + "id": "75d6acad", + "metadata": { + "editable": true + }, "source": [ - "and so on till the class $C=K-1$ class" + "## Finding the Limit\n", + "\n", + "The probability of obtaining an average value $z$ is the product of the \n", + "probabilities of obtaining arbitrary individual mean values $x_i$,\n", + "but with the constraint that the average is $z$. We can express this through\n", + "the following expression" ] }, { "cell_type": "markdown", - "id": "933753b8", - "metadata": {}, + "id": "8b412a9e", + "metadata": { + "editable": true + }, "source": [ "$$\n", - "\\log{\\frac{p(C=K-1\\vert x)}{p(K\\vert x)}} = \\beta_{(K-1)0}+\\beta_{(K-1)1}x_1,\n", + "\\tilde{p}(z)=\\int dx_1p(x_1)\\int dx_2p(x_2)\\dots\\int dx_mp(x_m)\n", + " \\delta(z-\\frac{x_1+x_2+\\dots+x_m}{m}),\n", "$$" ] }, { "cell_type": "markdown", - "id": "ba94450f", - "metadata": {}, + "id": "3bdb59e7", + "metadata": { + "editable": true + }, "source": [ - "and the model is specified in term of $K-1$ so-called log-odds or\n", - "**logit** transformations." + "where the $\\delta$-function enbodies the constraint that the mean is $z$.\n", + "All measurements that lead to each individual $x_i$ are expected to\n", + "be independent, which in turn means that we can express $\\tilde{p}$ as the \n", + "product of individual $p(x_i)$. The independence assumption is important in the derivation of the central limit theorem." ] }, { "cell_type": "markdown", - "id": "8f174f5d", - "metadata": {}, + "id": "d709f4c1", + "metadata": { + "editable": true + }, "source": [ - "## More classes\n", - "\n", - "In our discussion of neural networks we will encounter the above again\n", - "in terms of a slightly modified function, the so-called **Softmax** function.\n", + "## Rewriting the $\\delta$-function\n", "\n", - "The softmax function is used in various multiclass classification\n", - "methods, such as multinomial logistic regression (also known as\n", - "softmax regression), multiclass linear discriminant analysis, naive\n", - "Bayes classifiers, and artificial neural networks. 
Specifically, in\n", - "multinomial logistic regression and linear discriminant analysis, the\n", - "input to the function is the result of $K$ distinct linear functions,\n", - "and the predicted probability for the $k$-th class given a sample\n", - "vector $\\boldsymbol{x}$ and a weighting vector $\\boldsymbol{\\beta}$ is (with two\n", - "predictors):" + "If we use the integral expression for the $\\delta$-function" ] }, { "cell_type": "markdown", - "id": "9ba36ed7", - "metadata": {}, + "id": "bf40508f", + "metadata": { + "editable": true + }, "source": [ "$$\n", - "p(C=k\\vert \\mathbf {x} )=\\frac{\\exp{(\\beta_{k0}+\\beta_{k1}x_1)}}{1+\\sum_{l=1}^{K-1}\\exp{(\\beta_{l0}+\\beta_{l1}x_1)}}.\n", + "\\delta(z-\\frac{x_1+x_2+\\dots+x_m}{m})=\\frac{1}{2\\pi}\\int_{-\\infty}^{\\infty}\n", + " dq\\exp{\\left(iq(z-\\frac{x_1+x_2+\\dots+x_m}{m})\\right)},\n", "$$" ] }, { "cell_type": "markdown", - "id": "b5b5ecc6", - "metadata": {}, + "id": "8b2b63fe", + "metadata": { + "editable": true + }, "source": [ - "It is easy to extend to more predictors. The final class is" + "and inserting $e^{i\\mu q-i\\mu q}$ where $\\mu$ is the mean value\n", + "we arrive at" ] }, { "cell_type": "markdown", - "id": "e6b33699", - "metadata": {}, + "id": "4c1720db", + "metadata": { + "editable": true + }, "source": [ "$$\n", - "p(C=K\\vert \\mathbf {x} )=\\frac{1}{1+\\sum_{l=1}^{K-1}\\exp{(\\beta_{l0}+\\beta_{l1}x_1)}},\n", + "\\tilde{p}(z)=\\frac{1}{2\\pi}\\int_{-\\infty}^{\\infty}\n", + " dq\\exp{\\left(iq(z-\\mu)\\right)}\\left[\\int_{-\\infty}^{\\infty}\n", + " dxp(x)\\exp{\\left(iq(\\mu-x)/m\\right)}\\right]^m,\n", "$$" ] }, { "cell_type": "markdown", - "id": "b49a6a23", - "metadata": {}, + "id": "5aba4a1e", + "metadata": { + "editable": true + }, "source": [ - "and they sum to one. Our earlier discussions were all specialized to\n", - "the case with two classes only. It is easy to see from the above that\n", - "what we derived earlier is compatible with these equations.\n", - "\n", - "To find the optimal parameters we would typically use a gradient\n", - "descent method. Newton's method and gradient descent methods are\n", - "discussed in the material on [optimization\n", - "methods](https://compphysics.github.io/MachineLearning/doc/pub/Splines/html/Splines-bs.html)." + "with the integral over $x$ resulting in" ] }, { "cell_type": "markdown", - "id": "e9bfd38c", - "metadata": {}, + "id": "00a5fc23", + "metadata": { + "editable": true + }, "source": [ - "## Searching for Optimal Regularization Parameters $\\lambda$\n", - "\n", - "In project 1, when using Ridge and Lasso regression, we end up\n", - "searching for the optimal parameter $\\lambda$ which minimizes our\n", - "selected scores (MSE or $R2$ values for example). The brute force\n", - "approach, as discussed in the code here for Ridge regression, consists\n", - "in evaluating the MSE as function of different $\\lambda$ values.\n", - "Based on these calculations, one tries then to determine the value of the hyperparameter $\\lambda$\n", - "which results in optimal scores (for example the smallest MSE or an $R2=1$)." 
+ "$$\n", + "\\int_{-\\infty}^{\\infty}dxp(x)\\exp{\\left(iq(\\mu-x)/m\\right)}=\n", + " \\int_{-\\infty}^{\\infty}dxp(x)\n", + " \\left[1+\\frac{iq(\\mu-x)}{m}-\\frac{q^2(\\mu-x)^2}{2m^2}+\\dots\\right].\n", + "$$" ] }, { - "cell_type": "code", - "execution_count": 2, - "id": "5dc4fd0e", - "metadata": {}, - "outputs": [ - { - "data": { - "image/png": "iVBORw0KGgoAAAANSUhEUgAAAkkAAAGwCAYAAAC99fF4AAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjguMCwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy81sbWrAAAACXBIWXMAAA9hAAAPYQGoP6dpAABe3klEQVR4nO3dd1QU198G8GepSxcFKUpVA6KIgIpgTxREgz12RI0maGKNSSQaNYkGY4omsRsVe4k1iRWMgArRWNaKHUURBCwgbWnz/uHr/twACgjMLjyfc/YcZ/bO7DMrOl/u3LkjEQRBABEREREp0RA7ABEREZEqYpFEREREVAIWSUREREQlYJFEREREVAIWSUREREQlYJFEREREVAIWSUREREQl0BI7gLoqKirCgwcPYGRkBIlEInYcIiIiKgNBEPDs2TNYW1tDQ+PVfUUskirowYMHsLGxETsGERERVcC9e/fQsGHDV7ZhkVRBRkZGAJ5/ycbGxiKnISIiorLIyMiAjY2N4jz+KiySKujFJTZjY2MWSURERGqmLENlOHCbiIiIqAQskoiIiIhKwCKJiIiIqAQck0RERK9UWFiI/Px8sWMQlYm2tjY0NTUrZV8skoiIqESCICA5ORlPnz4VOwpRudSpUweWlpZvPI8hiyQiIirRiwKpfv360NfX58S5pPIEQUB2djZSUlIAAFZWVm+0PxZJRERUTGFhoaJAqlevnthxiMpMT08PAJCSkoL69eu/0aU3UQduR0dHIyAgANbW1pBIJNizZ88r2yclJWHo0KFwcnKChoYGJk+eXKzNqlWr0KFDB5iamsLU1BRdu3bFqVOnirVbunQpHBwcIJVK4enpiWPHjlXSURERqb8XY5D09fVFTkJUfi9+bt90LJ2oRVJWVhbc3NywePHiMrWXy+UwNzfHjBkz4ObmVmKbyMhIDBkyBEePHkVsbCxsbW3h6+uLxMRERZtt27Zh8uTJmDFjBs6dO4cOHTrA398fCQkJlXJcREQ1BS+xkTqqrJ9biSAIQqXs6Q1JJBLs3r0bffr0KVP7zp07o2XLlli0aNEr2xUWFsLU1BSLFy/GiBEjAABeXl7w8PDAsmXLFO2aNm2KPn36IDQ0tEyfn5GRARMTE6Snp3PGbSKqcXJzcxEfH6/ocSdSJ6/6+S3P+bvGz5OUnZ2N/Px81K1bFwCQl5eHM2fOwNfXV6mdr68vYmJiSt2PXC5HRkaG0ouIiIhqrhpfJE2fPh0NGjRA165dAQBpaWkoLCyEhYWFUjsLCwskJyeXup/Q0FCYmJgoXjY2NlWam4iIqLwiIyMhkUheOW1DWFgY6tSpU22Z1FmNLpIWLFiALVu2YNeuXcW62/57vVIQhFdewwwJCUF6erride/evSrJTEREFTdy5EhIJBIEBwcXe2/8+PGQSCQYOXKkYl1KSgo+/PBD2NraQldXF5aWlvDz80NsbKyijb29PSQSSbHX/PnzS81x+/ZtDBkyBNbW1pBKpWjYsCF69+6N69evK9qU5Yall72cQ09PD87Ozvj+++/x8qgZHx8fJCUlwcTEpMz7rQolfV8vv17+Oygve3v71w61qSw1dgqAH374Ad9++y0iIiLQokULxXozMzNoamoW6zVKSUkp1rv0Ml1dXejq6lZZ3pclZiRi8M7BCH0nFO1t21fLZxIR1RQ2NjbYunUrFi5cqLgdPDc3F1u2bIGtra1S2/79+yM/Px/r1q2Do6MjHj58iCNHjuDx48dK7b7++muMHTtWaZ2RkVGJn5+Xl4du3brB2dkZu3btgpWVFe7fv4/9+/cjPT39jY7tRY7c3FxERERg3LhxMDY2xocffggA0NHRgaWl5Rt9RmVISkpS/Hnbtm2YNWsWrl27plj34u9F1dXInqTvv/8e33zzDQ4ePIhWrVopvaejowNPT0+Eh4crrQ8PD4ePj091xizVnMg5OJ5wHB3WdsCwXcOQmJH4+o2IiKpBVl5Wqa/cgtwyt83JzylT24rw8PCAra0tdu3apVi3a9cu2NjYwN3dXbHu6dOnOH78OL777jt06dIFdnZ2aNOmDUJCQtCzZ0+lfRoZGcHS0lLpZWBgUOLnX7lyBbdv38bSpUvRtm1b2NnZoV27dpg3bx5at25doWP6bw57e3uMGTMGLVq0wOHDhxXvl3S5LSwsDLa2ttDX10ffvn3x6NGjYvudO3cu6tevDyMjI4wZMwbTp09Hy5YtldqsXbsWTZs2hVQqhbOzM5YuXVpqzpe/JxMTE0gkEqV10dHR8PT0hFQqhaOjI7766isUFBQotp8zZ46id8/a2hoTJ04E8Pymrbt372LKlCmKXqmqJGpPUmZmJm7evKlYjo+Ph0wmQ926dWFra4uQkBAkJiZi/fr1ijYymUyxbWpqKmQyGXR0dODi4gLg+SW2L7/8Eps3b4a9vb2ix8jQ0BCGhoYAgKlTpyIwMBCtWrWCt7c3Vq5ciYSEhBK7Z8Xw7TvfQkOigVVnV2Hzxc3Ye3UvZnSYganeU6GrVT29WUREJTEMNSz1vR5NemDf0H2K5fo/1Ed2fnaJbTvZdULkyEjFsv3P9kjLTivWTphdsRuwR40ahbVr12LYsGEAgDVr1mD06NGIjPzfZ744L+zZswdt27attKsF5ubm0NDQwI4dOzB58uRKe47YywRBQFRUFOLi4tCkSZNS2508eRKjR4/Gt99+i379+uHgwYOYPXu2UptNmzZh3rx5WLp0Kdq1a4etW7fixx9/hIODg6LNqlWrMHv2bCxevBju7u44d+4cxo4dCwMDAwQFBZUr+6FDhzB8+HD88ssv6NChA27duoUPPvgAADB79mzs2LEDCxcuxNatW9GsWTMkJyfj/PnzAJ4Xu25ubvjggw+K9exVCUFER48eFQAUewUFBQmCIAhBQUFCp06dlLYpqb2dnZ3ifTs7uxLbzJ49W2k/S5YsEezs7AQdHR3Bw8NDiIqKKlf29PR0AYCQnp5egSMvmzMPzgg+q30EzIGAORAa/dxI+OvaX1X2eUREL+Tk5AhXrlwRcnJylNa/+P+opFePTT2U2urP0y+1bae1nZTami0wK7FdeQUFBQm9e/cWUlNTBV1dXSE+Pl64c+eOIJVKhdTUVKF3796Kc4wgCMKOHTsEU1NTQSqVCj4+PkJISIhw/vx5pX2+OFcYGBgovY4ePVpqjsWLFwv6+vqCkZGR
0KVLF+Hrr78Wbt26pfxdAsLu3bvLfGwv59DW1hYACFKpVDhx4oSizYvz6pMnTwRBEIQhQ4YI3bt3V9rPoEGDBBMTE8Wyl5eX8NFHHym1adeuneDm5qZYtrGxETZv3qzU5ptvvhG8vb1fm3vt2rVKn9ehQwfh22+/VWqzYcMGwcrKShAEQfjxxx+Ft956S8jLyytxf3Z2dsLChQtf+Zml/fwKQvnO36L2JHXu3FlpwNl/hYWFFVv3qvYAcOfOnTJ99vjx4zF+/PgytRWLh5UHjo86js0XN+PT8E9x68ktRN+NRs+3er5+YyKiKpAZklnqe5oayj0mKdNSSm2rIVEe7XFn0p03yvVfZmZm6NmzJ9atWwdBENCzZ0+YmZkVa9e/f3/07NkTx44dQ2xsLA4ePIgFCxbgt99+Uxpc/OmnnxYbbNygQYNSP/+jjz7CiBEjcPToUZw8eRK///47vv32W/zxxx/o1q1bhY/rRY7U1FTMmDEDb7/99iuHisTFxaFv375K67y9vXHw4EHF8rVr14qdD9u0aYO///4bAJCamop79+7h/fffV+q9KSgoqNAA8TNnzuDff//FvHnzFOsKCwuRm5uL7OxsvPfee1i0aBEcHR3RvXt39OjRAwEBAdDSqv6SpcYO3K4pJBIJhrUYhl5OvfBT7E+Y6j1V8V5CegLq6tWFoU7p3d9ERJXJQKfkcTjV2basRo8ejY8//hgAsGTJklLbSaVSdOvWDd26dcOsWbMwZswYzJ49W6koMjMzQ+PGjcv1+UZGRujVqxd69eqFuXPnws/PD3Pnzn2jIulFjsaNG2Pnzp1o3Lgx2rZtq5jm5r9e17HwQkl3fL9QVFQE4PklNy8vL6V2FbmUWFRUhK+++gr9+vUr9p5UKoWNjQ2uXbuG8PBwREREYPz48fj+++8RFRUFbW3tcn/em6iRA7drIiNdI8zuPBtGus/vpigSijBk5xC4LHHB7rjdZf6HQERUW3Tv3h15eXnIy8uDn59fmbdzcXFBVlbFBo2XRiKRwNnZuVL3a2pqigkTJmDatGmlngNcXFzwzz//KK3777KTk1OxZ5yePn1a8WcLCws0aNAAt2/fVhRoL14vj1sqKw8PD1y7dq3Yvho3bgwNjedliZ6eHnr16oVffvkFkZGRiI2NxcWLFwE8vwGrsLCw3J9bEexJUlP3M+4jMSMR9zLuod/2fujZpCd+8f8FjqaOYkcjIlIJmpqaiIuLU/z5vx49eoT33nsPo0ePRosWLWBkZITTp09jwYIF6N27t1LbZ8+eFZs6Rl9fv8THWshkMsyePRuBgYFwcXGBjo4OoqKisGbNGnz++edKbV/csPSyxo0bK240ep2PPvoI3333HXbu3IkBAwYUe3/ixInw8fHBggUL0KdPHxw+fFjpUhsATJgwAWPHjkWrVq3g4+ODbdu24cKFC3B0/N/5ZM6cOZg4cSKMjY3h7+8PuVyO06dP48mTJ5g6dep/P/aVZs2ahXfffRc2NjZ47733oKGhgQsXLuDixYuYO3cuwsLCUFhYCC8vL+jr62PDhg3Q09ODnZ0dgOfzJEVHR2Pw4MHQ1dUt8TJqpXntqCUqUXUM3H6drLws4YuILwTtr7UFzIEgnSsVvon6RsjNzxUtExHVDK8a+KrKXgzcLs3LA7dzc3OF6dOnCx4eHoKJiYmgr68vODk5CTNnzhSys7MV25R2Q9CHH35Y4mekpqYKEydOFJo3by4YGhoKRkZGgqurq/DDDz8IhYWFinYl7RNAqQPCSxuwPHbsWKFZs2ZCYWFhsYHbgiAIq1evFho2bCjo6ekJAQEBwg8//KA0kFoQBOHrr78WzMzMBENDQ2H06NHCxIkThbZt2yq12bRpk9CyZUtBR0dHMDU1FTp27Cjs2rWr1O/6hf8O3BYEQTh48KDg4+Mj6OnpCcbGxkKbNm2ElStXCoIgCLt37xa8vLwEY2NjwcDAQGjbtq0QERGh2DY2NlZo0aKFoKurK5RWxlTWwG2VecCtulGlB9xeTbuK8fvG4+idowCAt+q9hcPDD8Oujp2ouYhIffEBt7Vbt27dYGlpiQ0bNogdpUIq6wG3vNxWAzibOePIiCPYcmkLph6aCgNtAzQ0bih2LCIiUgPZ2dlYvnw5/Pz8oKmpiS1btiAiIqLYpMu1EYukGkIikWCo61D0bNITKVkpiltxc/JzsPHCRoxyHwUtDf51ExGRMolEgv3792Pu3LmQy+VwcnLCzp07S71jrjbhWbOGMZGawET6v3krvjvxHb6K+gpLTy/Fsp7L0LZhWxHTERGRqtHT00NERITYMVQSpwCo4exM7GAqNYUsWQbv1d4I/isYT3Ofih2LiNQEh62SOqqsn1sWSTXcKPdRuPrxVQS5PX+2zoozK+C82BnbLm3jf35EVKoXk/ZlZ5f87DUiVfbi5/ZNJ5/k5bZaoL5BfYT1CcPIliMR/Fcwrj26hsE7B+Nq2lXM7jz79TsgolpHU1MTderUQUrK80eL6OvrV/kT14nelCAIyM7ORkpKCurUqfPGDxdmkVSLdLbvjPPB5/Hdie+w6J9FGOE2QuxIRKTCLC0tAUBRKBGpizp16ih+ft8E50mqIFWaJ6kinsmfKR5xAgDfHvsWXey7wNvGW8RURKSKCgsLkZ+fL3YMojLR1tZ+ZQ8S50mi13q5QDp29xhm/D0DEkgQ3CoY377zLepI64gXjohUiqam5htftiBSRxy4TWhq3hSjWo6CAAHLTi9D0yVNsf3ydg7sJiKiWo1FEsFM3wxreq/B0aCjcKrnhOTMZAzaMQg9N/dE/JN4seMRERGJgkUSKbwY2D2n0xzoaOrgwM0DeHv92ygoKhA7GhERUbVjkURKdLV0MbvzbFwIvoDO9p0xt8tcPs6EiIhqJZ79qEROZk74e8TfSut+v/w7Iu9E4tt3vlV69AkREVFNxJ4kKpVEIlFMHpeTn4NJBydh6emlcF7izIHdRERU47FIojLR09bDpn6b8Fa9txQDu3tt7YX7GffFjkZERFQlWCRRmXVx6ILzwecxq+MsaGto46/rf8FliQuWn16OIqFI7HhERESVikUSlYtUS4qvunyFcx+eQ9uGbfEs7xnG7RuHU4mnxI5GRERUqThwmyqkWf1mOD7qOJb8uwQ3Ht1A24ZtxY5ERERUqdiTRBWmqaGJiV4T8WuPXxXr7qXfQ6ewTjibdFbEZERERG+ORRJVqulHpiP6bjTarGqD6RHTkZOfI3YkIiKiCmGRRJXqJ9+fMKjZIBQKhfjuxHdwW+6GqDtRYsciIiIqNxZJVKksDC2wdcBW7B28F9ZG1rjx+AY6r+uMCfsnICsvS+x4REREZcYiiapEL6deuDL+CsZ6jAUALP53MRb+s1DkVERERGXHIomqjInUBCsDVuLQ8EPo6tgVn3h/InYkIiKiMmORRFXOt5EvwgPDoaetBwAoLCrEqL2j8G/ivyInIyIiKh2LJKp2S/5dgjBZGLxXe+PLv79EXmGe2JGIiIiKYZFE1W6Y6zAMbj4YhUIh5h6bC5/VPrj+6LrYsYiIiJSwSKJ
qV0+/Hrb034LtA7ajrl5dnEk6A/cV7lh9djUEQRA7HhEREQCRi6To6GgEBATA2toaEokEe/bseWX7pKQkDB06FE5OTtDQ0MDkyZOLtbl8+TL69+8Pe3t7SCQSLFq0qFibOXPmQCKRKL0sLS0r56CozN5r9h7OB59HF/suyM7Pxpg/xyDkSIjYsYiIiACIXCRlZWXBzc0NixcvLlN7uVwOc3NzzJgxA25ubiW2yc7OhqOjI+bPn//KwqdZs2ZISkpSvC5evFihY6A309C4IcIDwzH/nfkw0jHCMNdhYkciIiICIPIDbv39/eHv71/m9vb29vj5558BAGvWrCmxTevWrdG6dWsAwPTp00vdl5aWFnuPVISmhiY+b/85xnqORV29uor1Mfdi4NXAC5oamiKmIyKi2qrWjkm6ceMGrK2t4eDggMGDB+P27duvbC+Xy5GRkaH0osr1coEUey8WncI6wXejLx5mPhQxFRER1Va1skjy8vLC+vXrcejQIaxatQrJycnw8fHBo0ePSt0mNDQUJiYmipeNjU01Jq59HmY9hK6mLv6O/xvuK9wRfTda7EhERFTL1Moiyd/fH/3794erqyu6du2Kffv2AQDWrVtX6jYhISFIT09XvO7du1ddcWulPs598O/Yf+Fi7oKkzCS8ve5tLDixAEVCkdjRiIiolqiVRdJ/GRgYwNXVFTdu3Ci1ja6uLoyNjZVeVLWamjfFqTGnMLzFcBQKhfg84nP03dYX6bnpYkcjIqJagEUSno83iouLg5WVldhR6D8MdAywvs96rHh3BXQ1dfHHtT+w6eImsWMREVEtIOrdbZmZmbh586ZiOT4+HjKZDHXr1oWtrS1CQkKQmJiI9evXK9rIZDLFtqmpqZDJZNDR0YGLiwsAIC8vD1euXFH8OTExETKZDIaGhmjcuDEAYNq0aQgICICtrS1SUlIwd+5cZGRkICgoqJqOnMpDIpHgA88P4GnliQ0XNmBcq3FiRyIiolpAIog4xXFkZCS6dOlSbH1QUBDCwsIwcuRI3LlzB5GRkYr3JBJJsfZ2dna4c+cOAODOnTtwcHAo1qZTp06K/QwePBjR0dFIS0uDubk52rZti2+++UZRaJVFRkYGTExMkJ6ezktvIsnKy8KuuF0Y3mJ4iT8XRERE/1We87eoRZI6Y5EkLkEQMHDHQOy4sgPvu7+PJT2WQFdLV+xYRESk4spz/uaYJFJbPg19oCHRwOpzq+G30Q9Pcp6IHYmIiGoQFkmkliQSCaZ4T8G+oftgpGOEqLtR6LC2AxLSE8SORkRENQSLJFJr3Rt3x7FRx2BtZI3LqZfhvdob55PPix2LiIhqABZJpPbcLN0Q+34sXMxd8ODZA/Tf3h8FRQVixyIiIjXHIolqBFsTWxwfdRw9mvTApn6boKUh6uwWRERUA/BMQjWGqZ4p9g3dp7QuLTsNZvpmIiUiIiJ1xp4kqrH+TfwXjX9pjGX/LhM7ChERqSEWSVRj7b22F+nydIzfPx4/xPwgdhwiIlIzLJKoxvqmyzf4ov0XAIBPwz/F3Oi5IiciIiJ1wiKJaiyJRIJ578zDvLfnAQC+PPolFpxYIHIqIiJSFyySqMb7osMXikLp84jP8fM/P4uciIiI1AGLJKoVvujwBWZ1nAUA2H9zPwqLCkVOREREqo5TAFCtMafzHDiaOmJQ80HQ1NAUOw4REak49iRRrSGRSBDUMghSLSkAQBAE3Hh0Q+RURESkqlgkUa1UJBThk8OfwHWZK6LuRIkdh4iIVBCLJKqVBEFA/NN4yAvl6L21Ny4+vCh2JCIiUjEskqhW0tTQxOZ+m9HBtgPS5enovqk77j69K3YsIiJSISySqNbS09bD3sF70cy8GR48ewD/Tf5Iz00XOxYREakIFklUq5nqmeLg8INoYNQAcWlxGLhjIAqKCsSORUREKoBFEtV6DY0b4o8hf0BfWx/ht8JxNP6o2JGIiEgFcJ4kIgAeVh7Y3G8zBAjo1qib2HGIiEgFsEgi+n+9nXuLHYGIiFQIL7cRleDO0zvw3eCLO0/viB2FiIhEwiKJqATj9o1D+O1w9N/eHzn5OWLHISIiEbBIIirBindXwEzfDGeTzmL8/vEQBEHsSEREVM1YJBGVwNbEFlv7b4WGRANhsjCsOLNC7EhERFTNWCQRleIdx3cw/535AICJBybizIMzIiciIqLqxCKJ6BWm+UxDX+e+yC/Kx6Adg5AhzxA7EhERVRMWSUSvIJFIsLrXatia2MJEasLHlhAR1SKcJ4noNUz1TBERGAFbE1voaumKHYeIiKoJiySiMmhSr4nScl5hHnQ0dURKQ0RE1YGX24jKoaCoALOOzoLPah/kFeaJHYeIiKoQiySicniU/QhL/12KM0ln8HXU12LHISKiKsQiiagcLAwtsPzd5QCA0OOh+Of+PyInIiKiqiJqkRQdHY2AgABYW1tDIpFgz549r2yflJSEoUOHwsnJCRoaGpg8eXKxNpcvX0b//v1hb28PiUSCRYsWlbivpUuXwsHBAVKpFJ6enjh27NibHxDVCgNcBmCY6zAUCUUYsXsEsvKyxI5ERERVQNQiKSsrC25ubli8eHGZ2svlcpibm2PGjBlwc3MrsU12djYcHR0xf/58WFpalthm27ZtmDx5MmbMmIFz586hQ4cO8Pf3R0JCQoWPhWqXX/1/RQOjBrjx+AY+j/hc7DhERFQFJIKKPJRKIpFg9+7d6NOnT5nad+7cGS1btiy1pwgA7O3tMXny5GI9Tl5eXvDw8MCyZcsU65o2bYo+ffogNDS0xH3J5XLI5XLFckZGBmxsbJCeng5jY+MyZaaaJfxWOHw3+gIAIgIj8I7jOyInIiKi18nIyICJiUmZzt+1bkxSXl4ezpw5A19fX6X1vr6+iImJKXW70NBQmJiYKF42NjZVHZVUXLdG3TC+1Xjoa+vjwbMHYschIqJKVuuKpLS0NBQWFsLCwkJpvYWFBZKTk0vdLiQkBOnp6YrXvXv3qjoqqYH5Xefj8vjLCHQLFDsKERFVslo7maREIlFaFgSh2LqX6erqQleXsy2TMiNdIxjpGokdg4iIqkCt60kyMzODpqZmsV6jlJSUYr1LROUReScS/bf3R35hvthRiIioEtS6IklHRweenp4IDw9XWh8eHg4fHx+RUpG6y8rLwnu/v4ddcbvwfcz3YschIqJKIGqRlJmZCZlMBplMBgCIj4+HTCZT3IofEhKCESNGKG3zon1mZiZSU1Mhk8lw5coVxft5eXmKNnl5eUhMTIRMJsPNmzcVbaZOnYrffvsNa9asQVxcHKZMmYKEhAQEBwdX/UFTjWSgY4CFfgsBAN9Ef4PbT26LnIiIiN6UqFMAREZGokuXLsXWBwUFISwsDCNHjsSdO3cQGRmpeK+kcUN2dna4c+cOAODOnTtwcHAo1qZTp05K+1m6dCkWLFiApKQkNG/eHAsXLkTHjh3LnL08txBS7SAIArpt6IYj8Ufw7lvv4s8hf4odiYiI/qM852+VmSdJ3bBIopJcTbuKFstaIL8oH3sH70Uvp15iRyIiop
dwniQikTibOeMT708AABMPTER2frbIiYiIqKJYJBFVspkdZ8LWxBZ30+9iy8UtYschIqIKqrXzJBFVFQMdAyzvuRzP8p7hPZf3xI5DREQVxCKJqAr4N/EXOwIREb0hXm4jqmJPc58iLjVO7BhERFROLJKIqtDxhONo8msTDNwxEAVFBWLHISKicmCRRFSFmpk3Q5FQhEspl/Db2d/EjkNEROXAIomoCpnqmWJOpzkAgFlHZyE9N13cQEREVGYskoiqWHCrYDibOSM1OxXzjs0TOw4REZURiySiKqatqY0ffX8EACz6ZxHin8SLnIiIiMqCRRJRNfBv7I9ujt2QX5SPWZGzxI5DRERlwCKJqBpIJBLM7zofmhJNaGlooUgoEjsSERG9BieTJKomHlYeiJ8UDxsTG7GjEBFRGbAniagasUAiIlIfLJKIRHD90XXM/HsmBEEQOwoREZWCl9uIqllWXhbarGqDdHk6PK080bdpX7EjERFRCdiTRFTNDHQM8HGbjwEAM4/ORGFRociJiIioJCySiEQwzWca6kjr4ErqFfx+5Xex4xARUQlYJBGJoI60Dqa0nQIA+Drqa/YmERGpIBZJRCKZ5DUJdaR1EJcWh+2Xt4sdh4iI/oNFEpFITKQm+MT7EwDA19HsTSIiUjUskohENNFrIhrXbYzhrsORX5QvdhwiInoJpwAgEpGxrjGufXwNGhL+vkJEpGr4PzORyFggERGpJv7vTKQCBEHAvuv7MGjHIBQUFYgdh4iIwCKJSCVk52cjaE8Qtl/ezjvdiIhUBIskIhVgoGOgmDdp/vH5fKYbEZEKYJFEpCI+avMRjHSMcDHlIvbd2Cd2HCKiWo9FEpGKqCOtg3GtxgEAQo+HsjeJiEhkLJKIVMgU7ynQ1dRFzL0YHEs4JnYcIqJajUUSkQqxNLTEqJajADzvTSIiIvGwSCJSMZ+2+xSeVp6KYomIiMTBGbeJVIyjqSNOf3Ba7BhERLWeqD1J0dHRCAgIgLW1NSQSCfbs2fPK9klJSRg6dCicnJygoaGByZMnl9hu586dcHFxga6uLlxcXLB7926l9+fMmQOJRKL0srS0rKSjIiIioppA1CIpKysLbm5uWLx4cZnay+VymJubY8aMGXBzcyuxTWxsLAYNGoTAwECcP38egYGBGDhwIE6ePKnUrlmzZkhKSlK8Ll68+MbHQ1SZnsmf4ceYHzHz75liRyEiqpUkgorcZyyRSLB792706dOnTO07d+6Mli1bYtGiRUrrBw0ahIyMDBw4cECxrnv37jA1NcWWLVsAPO9J2rNnD2QyWYXzZmRkwMTEBOnp6TA2Nq7wfohKcyLhBNqvbQ9dTV3cnXwXFoYWYkciIlJ75Tl/17iB27GxsfD19VVa5+fnh5iYGKV1N27cgLW1NRwcHDB48GDcvn37lfuVy+XIyMhQehFVJR8bH3g18IK8UI5lp5eJHYeIqNapcUVScnIyLCyUf+O2sLBAcnKyYtnLywvr16/HoUOHsGrVKiQnJ8PHxwePHj0qdb+hoaEwMTFRvGxsbKrsGIiA572rU72nAgCW/rsUOfk5IiciIqpdalyRBDw/ubxMEASldf7+/ujfvz9cXV3RtWtX7Nv3/BEQ69atK3WfISEhSE9PV7zu3btXNeGJXtKvaT/YmdghNTsVmy5uEjsOEVGtUuOKJEtLS6VeIwBISUkp1rv0MgMDA7i6uuLGjRulttHV1YWxsbHSi6iqaWloYZLXJADAT7E/oUgoEjkREVHtUeOKJG9vb4SHhyutO3z4MHx8fErdRi6XIy4uDlZWVlUdj6jc3vd4H0Y6RohLi8Ohm4fEjkNEVGuIOplkZmYmbt68qViOj4+HTCZD3bp1YWtri5CQECQmJmL9+vWKNi/uSMvMzERqaipkMhl0dHTg4uICAJg0aRI6duyI7777Dr1798bevXsRERGB48ePK/Yxbdo0BAQEwNbWFikpKZg7dy4yMjIQFBRUPQdOVA7GusYY12ocEjISYGPCsXBERNVF1CkAIiMj0aVLl2Lrg4KCEBYWhpEjR+LOnTuIjIxUvPff8UYAYGdnhzt37iiWd+zYgZkzZ+L27dto1KgR5s2bh379+ineHzx4MKKjo5GWlgZzc3O0bdsW33zzjaLQKgtOAUDV6b/j6oiIqGLKc/5WmXmS1A2LJCIiIvVTq+dJIqrJrqVdw/h945H0LEnsKERENR6LJCI1MubPMVh2ehlWnFkhdhQiohqPRRKRGvm49ccAgBVnViCvME/kNERENRuLJCI10q9pP1gZWiE5Mxk7ruwQOw4RUY3GIolIjWhraiO4VTAA4NdTv4qchoioZmORRKRmPvD8ANoa2vjn/j84/eC02HGIiGosFklEasbS0BIDmw0EwN4kIqKqxCKJSA1NaDMBZvpmcKzjKHYUIqIaS9THkhBRxXg19ML9Kfehq6UrdhQiohqLPUlEaooFEhFR1WKRRKTGioQiHLx5EMcTjr++MRERlQuLJCI19mPMj/Df5I+Zf88UOwoRUY3DIolIjQ1xHQINiQai7kYhLjVO7DhERDUKiyQiNdbQuCEC3goAAD7PjYiokrFIIlJzL2bgXnd+HbLzs0VOQ0RUc7BIIlJzvo18YV/HHk9zn2L75e1ixyEiqjFYJBGpOQ2JBj70/BAAsPz0cpHTEBHVHCySiGqAUS1HQVtDG1n5WXia+1TsOERENQJn3CaqASwMLXBx3EW8Ve8tSCQSseMQEdUILJKIaggnMyexIxAR1Sjluty2YMEC5OTkKJajo6Mhl8sVy8+ePcP48eMrLx0RlVtmXibnTCIiqgQSQRCEsjbW1NREUlIS6tevDwAwNjaGTCaDo+PzJ5E/fPgQ1tbWKCwsrJq0KiQjIwMmJiZIT0+HsbGx2HGIAABH44+iz7Y+cKjjgHMfnuOlNyKi/yjP+btcPUn/rafKUV8RUTVoadkSeYV5OP/wPE4mnhQ7DhGRWuPdbUQ1iKmeKQY3HwyA0wEQEb0pFklENUyw5/MZuLdd3obHOY9FTkNEpL7KfXfbb7/9BkNDQwBAQUEBwsLCYGZmBuD5wG0iElebBm3gZuGG8w/PY+OFjZjoNVHsSEREaqlcA7ft7e3LNBA0Pj7+jUKpAw7cJlW25NQSfHzgYzSv3xwXgi9wADcR0f8rz/m7XEUS/Q+LJFJlT3KewPona+QW5OLK+Ctoat5U7EhERCqhPOdvTiZJVAOZ6pliQ98N8LTyhIOpg9hxiIjUUrkGbp88eRIHDhxQWrd+/Xo4ODigfv36+OCDD5QmlyQi8QxwGcACiYjoDZSrSJozZw4uXLigWL548SLef/99dO3aFdOnT8eff/6J0NDQSg9JRG+moKhA7AhERGqnXEWSTCbDO++8o1jeunUrvLy8sGrVKkydOhW//PILtm/fXukhiahibjy6gV5beuHtdW+LHYWISO2Ua0zSkydPYGFhoViOiopC9+7dFcutW7fGvXv3Ki8dEb0RI10jHLh5AAVFBbicchnN6jcTOxIRkdooV0+ShYWF4vb+vLw8nD17Ft7e3or3n
z17Bm1t7cpNSEQVZmloiYC3AgAAv539TeQ0RETqpVxFUvfu3TF9+nQcO3YMISEh0NfXR4cOHRTvX7hwAY0aNSrz/qKjoxEQEABra2tIJBLs2bPnle2TkpIwdOhQODk5QUNDA5MnTy6x3c6dO+Hi4gJdXV24uLhg9+7dxdosXboUDg4OkEql8PT0xLFjx8qcm0idjPEYAwBYf2E95AW8sYKIqKzKVSTNnTsXmpqa6NSpE1atWoWVK1dCR0dH8f6aNWvg6+tb5v1lZWXBzc0NixcvLlN7uVwOc3NzzJgxA25ubiW2iY2NxaBBgxAYGIjz588jMDAQAwcOxMmT/3vY57Zt2zB58mTMmDED586dQ4cOHeDv74+EhIQyZydSF36N/NDAqAEe5zzGnqt7xI5DRKQ2KjSZZHp6OgwNDaGpqam0/vHjxzAyMqrQJTeJRILdu3ejT58+ZWrfuXNntGzZEosWLVJaP2jQIGRkZChNVdC9e3eYmppiy5YtAAAvLy94eHhg2bJlijZNmzZFnz59ynx3HieTJHUy6+gsfBP9Dbo6dkV4YLjYcYiIRFNlk0mOHj26TO3WrFlTnt1WqtjYWEyZMkVpnZ+fn6KYysvLw5kzZzB9+nSlNr6+voiJiSl1v3K5XGkOqIyMjMoLTVTFRruPxtzouYi4HYHbT27D0dRR7EhERCqvXEVSWFgY7Ozs4O7uDlV9mklycrLSHXjA8wHnycnJAIC0tDQUFha+sk1JQkND8dVXX1V+YKJqYF/HHhPaTICzmTPM9c3FjkNEpBbKVSQFBwdj69atuH37NkaPHo3hw4ejbt26VZWtwv77ME9BEIqtK0ubl4WEhGDq1KmK5YyMDNjY2FRCWqLq8bP/z2JHICJSK+UauL106VIkJSXh888/x59//gkbGxsMHDgQhw4dUpmeJUtLy2I9QikpKYqeIzMzM2hqar6yTUl0dXVhbGys9CIiIqKaq1xFEvC8WBgyZAjCw8Nx5coVNGvWDOPHj4ednR0yMzOrImO5eHt7IzxceWDq4cOH4ePjAwDQ0dGBp6dnsTbh4eGKNkQ1VYY8A8tPL8eMIzPEjkJEpPLKdbntvyQSCSQSCQRBQFFRUbm3z8zMxM2bNxXL8fHxkMlkqFu3LmxtbRESEoLExESsX79e0UYmkym2TU1NhUwmg46ODlxcXAAAkyZNQseOHfHdd9+hd+/e2Lt3LyIiInD8+HHFPqZOnYrAwEC0atUK3t7eWLlyJRISEhAcHFzBb4JIPSSkJ2DcvnHQ0tDCpLaTUN+gvtiRiIhUl1BOubm5wubNm4WuXbsKUqlUGDBggLBv3z6hsLCwvLsSjh49KgAo9goKChIEQRCCgoKETp06KW1TUns7OzulNr///rvg5OQkaGtrC87OzsLOnTuLffaSJUsEOzs7QUdHR/Dw8BCioqLKlT09PV0AIKSnp5drOyKxtV7ZWsAcCD/F/CR2FCKialee83e55kkaP348tm7dCltbW4waNQrDhw9HvXr1KrtuUwucJ4nU1fLTyzFu3zg0r98cF4IvvPKGBSKimqY85+9yFUkaGhqwtbWFu7v7K/9j3bVrV9nTqikWSaSunuY+hdWPVsgtyMW/Y/9FK+tWYkciIqo2VTaZ5IgRI/hbJ5GaqyOtg35N+2Hzxc1Yc24NiyQiolJU6LEkxJ4kUm9Hbh9B1w1dUUdaBw+mPoCetp7YkYiIqkV5zt/lngKAiNRfF4cucDR1RDubdkjLThM7DhGRSnqjKQCISD1pSDRwefxlSLWkYkchIlJZ7EkiqqVYIBERvRqLJKJa7u7Tu4i+Gy12DCIilcMiiagWC78VDoefHTByz0gUCeWfNZ+IqCZjkURUi7WzbQcjXSPEP41H1J0oseMQEakUFklEtZi+tj4GNxsMAFgrWytyGiIi1cIiiaiWG+0+GgCw48oOpOemi5yGiEh1sEgiquXaNGiDpmZNkVOQg22Xt4kdh4hIZbBIIqrlJBKJojeJl9yIiP6HRRIRYXiL4dCUaOJSyiU8zHwodhwiIpXAGbeJCJaGljgw7AC8bbxhqGModhwiIpXAIomIAADdGnUTOwIRkUrh5TYiUiIIAjLzMsWOQUQkOhZJRKQQfTcabsvdMGL3CLGjEBGJjpfbiEihrl5dXEy5iLi0OKRkpaC+QX2xIxERiYY9SUSk0Lx+c7Rp0AYFRQXYdGGT2HGIiETFIomIlIxqOQoAsPrcagiCIHIaIiLxsEgiIiWDmw+GVEuKy6mXcfrBabHjEBGJhkUSESmpI62Dfk37AeAM3ERUu7FIIqJiRrd8/piSzRc3Iyc/R+Q0RETi4N1tRFRMF4cu+NDzQ/R26g0dTR2x4xARiYJFEhEVoyHRwPJ3l4sdg4hIVLzcRkRERFQCFklEVKrbT24jJCIES04tETsKEVG1Y5FERKWKuReD+Sfm48fYH1EkFIkdh4ioWrFIIqJS9WvaD8a6xoh/Go/IO5FixyEiqlYskoioVPra+hjSfAiA5zNwExHVJiySiOiVxniMAQDsvLITT3KeiJyGiKj6sEgiolfytPJEC4sWkBfKsfniZrHjEBFVGxZJRPRKEokE77u/DwD47dxvIqchIqo+ohZJ0dHRCAgIgLW1NSQSCfbs2fPabaKiouDp6QmpVApHR0csX6484V1+fj6+/vprNGrUCFKpFG5ubjh48KBSmzlz5kAikSi9LC0tK/PQiGqUYa7DYGloiXY27SAvkIsdh4ioWog643ZWVhbc3NwwatQo9O/f/7Xt4+Pj0aNHD4wdOxYbN27EiRMnMH78eJibmyu2nzlzJjZu3IhVq1bB2dkZhw4dQt++fRETEwN3d3fFvpo1a4aIiAjFsqamZuUfIFENUU+/Hu5PuQ9NDf47IaLaQ9Qiyd/fH/7+/mVuv3z5ctja2mLRokUAgKZNm+L06dP44YcfFEXShg0bMGPGDPTo0QMAMG7cOBw6dAg//vgjNm7cqNiXlpZWuXqP5HI55PL//QadkZFR5m2JagIWSERU26jVmKTY2Fj4+voqrfPz88Pp06eRn58P4HkxI5VKldro6enh+PHjSutu3LgBa2trODg4YPDgwbh9+/YrPzs0NBQmJiaKl42NTSUcEZF6EQQBx+4eQ8y9GLGjEBFVObUqkpKTk2FhYaG0zsLCAgUFBUhLSwPwvGj66aefcOPGDRQVFSE8PBx79+5FUlKSYhsvLy+sX78ehw4dwqpVq5CcnAwfHx88evSo1M8OCQlBenq64nXv3r2qOUgiFfbLyV/QMawjZvw9Q+woRERVTq2KJOD5nTYvEwRBaf3PP/+MJk2awNnZGTo6Ovj4448xatQopTFH/v7+6N+/P1xdXdG1a1fs27cPALBu3bpSP1dXVxfGxsZKL6Lapl/TfpBAgsg7kbj5+KbYcYiIqpRaFUmWlpZITk5WWpeSkgItLS3Uq1cPAGBubo49e/YgKysLd+/exdWrV2FoaAgHB4dS92tgYABXV1fcuHGjSvMTqTsbExv4NfYDAKw5t0bkNEREVUutiiRvb2+Eh4crrTt8+DBatWoF
bW1tpfVSqRQNGjRAQUEBdu7cid69e5e6X7lcjri4OFhZWVVJbqKa5MWcSWGyMBQUFYichoio6ohaJGVmZkImk0EmkwF4fou/TCZDQkICgOfjgEaMGKFoHxwcjLt372Lq1KmIi4vDmjVrsHr1akybNk3R5uTJk9i1axdu376NY8eOoXv37igqKsJnn32maDNt2jRERUUhPj4eJ0+exIABA5CRkYGgoKDqOXAiNdbLqRfM9M2QlJmEgzcPvn4DIiI1JWqRdPr0abi7uyvmL5o6dSrc3d0xa9YsAEBSUpKiYAIABwcH7N+/H5GRkWjZsiW++eYb/PLLL0pzLOXm5mLmzJlwcXFB37590aBBAxw/fhx16tRRtLl//z6GDBkCJycn9OvXDzo6Ovjnn39gZ2dXPQdOpMZ0NHUwosXzX15+O8sZuImo5pIIL0Y+U7lkZGTAxMQE6enpHMRNtc6V1CtotrQZ3qr3Fi6OuwgdTR2xIxERlUl5zt+iTiZJROrJxdwFJ8ecRCvrVtCQqNXQRiKiMmORREQV0qZBG7EjEBFVKf4KSERvRF4gR2pWqtgxiIgqHYskIqqwbZe2ocFPDTAtfNrrGxMRqRkWSURUYTYmNniU8wi/X/4d6bnpYschIqpULJKIqMK8G3qjqVlT5BTkYPPFzWLHISKqVCySiKjCJBIJPvD8AACw4swKcEYRIqpJWCQR0RsZ4TYCupq6OP/wPE4lnhI7DhFRpWGRRERvpK5eXQxqPggAsPzMcpHTEBFVHhZJRPTGgj2DATy/2y0zL1PkNERElYOTSRLRG2vbsC2+fftb9HHuA0MdQ7HjEBFVChZJRPTGJBIJQjqEiB2DiKhS8XIbEVU63uVGRDUBiyQiqjRXUq9gyM4heP+P98WOQkT0xlgkEVGlycnPwdZLW7Hp4iakZaeJHYeI6I2wSCKiSuNp7QlPK0/kFeYhTBYmdhwiojfCIomIKlVwq+fTAaw4swJFQpHIaYiIKo5FEhFVqsHNB8NIxwg3H9/E0fijYschIqowFklEVKkMdQwR2CIQALDs9DKR0xARVRyLJCKqdC8uue25ugf3M+6LnIaIqGI4mSQRVTpXC1cEuQWhhUULGOsaix2HiKhCWCQRUZUI6xMmdgQiojfCy21EREREJWCRRERVRl4gx8YLG/HBnx+IHYWIqNxYJBFRlUmXp+P9P97HqrOrcCrxlNhxiIjKhUUSEVWZ+gb1MajZIADA4lOLRU5DRFQ+LJKIqEp93OZjAMC2y9uQkpUichoiorJjkUREVapNgzZo06AN8grzsOrMKrHjEBGVGacAIKIq93HrjzEicQSWnV6Gz9p9Bm1NbbEjEVWJvMI8JKQnoLCoEE5mTgCAIqEI4/eNR3Z+NvKL8lFQVIDCokJItaTQ09KDs5kzPm33qWIf0XejoaelBysjK1gZWkFTQ1Osw6n1JIIgCGKHUEcZGRkwMTFBeno6jI05WR7Rq8gL5LBZaIPU7FRsG7ANA5sNFDsS0Rt7mPkQ/z74F2eTzuJs0lmcf3geCekJKBKK8O5b7+LPIX8q2hqFGiEzL7PE/XS064iokVGKZasfrZCcmQwA0NHUgX0dezQybYRGpo3gYeWBUe6jqvbAarjynL/Zk0REVU5XSxfjWo1DdEI0rAytxI5DVCGCIEAikQAAcgtyYbfIDvJCebF2Ui0pNCTKo1m+6fIN8gvzoa2pDW0NbWhINJBbkIucghw0MGqg9Bm2JrbQlGjiYdZD5BXm4fqj67j+6DoAoINtB6Ui6f2976O+QX20btAaXg280MC4AajysCepgtiTRFQ+hUWFvGxAaic9Nx1/Xf8LO+J2ID03HX8H/a147+11byMlKwUeVh7wsPJAS8uWcKrnBEtDS0Ux9SYKiwpxL+Mebj2+hdtPbuPm45twNHXEh60+BABk5mXCONQYAv53Gm9g1ABeDb3g1cALbzu8jVbWrd44R01TnvM3i6QKYpFERFQzCYKAYwnHsPLMSuy4skOpt+j+lPuK3poXPUNiyczLxIbzG3A26SxOPTiFSymXUCQUKd4PcgtSPB6oSCjC4VuH4WPjU+ufp1ie87eod7dFR0cjICAA1tbWkEgk2LNnz2u3iYqKgqenJ6RSKRwdHbF8+XKl9/Pz8/H111+jUaNGkEqlcHNzw8GDB4vtZ+nSpXBwcIBUKoWnpyeOHTtWWYdFRK+QkpWCryK/wu0nt8WOQlTMtkvb0HRJU3QK64RNFzdBXihHU7OmmNVxFi4EX4C1kbWirdg3IBjqGGJc63FY1WsVzgefR/r0dEQGRWJB1wXo37Q/fBv5KtpeSrkE/03+MP3OFJ4rPTHl4BTsjtuNtOw0EY9A9Yk6JikrKwtubm4YNWoU+vfv/9r28fHx6NGjB8aOHYuNGzfixIkTGD9+PMzNzRXbz5w5Exs3bsSqVavg7OyMQ4cOoW/fvoiJiYG7uzsAYNu2bZg8eTKWLl2Kdu3aYcWKFfD398eVK1dga2tbpcdMVNuN+WMM/rz+Jx7nPMbP/j+LHYdISUFRAa49ugZDHUMMbT4UYz3HwtPKs1Iun1U1Qx1DdLLvhE72nYq9l5qVikamjXDryS3FQPNFJxcBAJqZN0PoO6EIcAqo5sSqT2Uut0kkEuzevRt9+vQptc3nn3+OP/74A3FxcYp1wcHBOH/+PGJjYwEA1tbWmDFjBj766CNFmz59+sDQ0BAbN24EAHh5ecHDwwPLli1TtGnatCn69OmD0NDQMuXl5Taiigm/FQ7fjb4w0DbAvSn3YKpnKnYkqqWe5j7FghMLYGdipxjnU1BUgPXn1+M9l/dgpGskcsLKl5iRiGMJxxB9NxrRd6NxOfUyAODw8MPo1qgbACDqThS2Xtr6vOCy6wQro5p1s0WNvbstNjYWvr6+Suv8/PywevVq5OfnQ1tbG3K5HFKpVKmNnp4ejh8/DgDIy8vDmTNnMH36dKU2vr6+iImJKfWz5XI55PL/XZfOyMh408MhqpW6OnZF8/rNcSnlEladXYXP2n0mdiSqZeQFciz9dynmHpuLxzmP0dC4IUa5j4KOpg60NLQw2n202BGrTAPjBhjcfDAGNx8M4HkP07GEY/Cx8VG0+fP6n1h+ZjmWn3k+nOWtem+hs11nRdFUm+6gU6sZt5OTk2FhYaG0zsLCAgUFBUhLe35d1c/PDz/99BNu3LiBoqIihIeHY+/evUhKSgIApKWlobCwsMT9JCcnl/rZoaGhMDExUbxsbGwq+eiIageJRIKpbacCAH45+QvyCvNETkS1yZ/X/oTzEmdMPTwVj3Meo6lZUyz2Xwxtjdo5wam5gTn6Ne0HAx0DxbqAtwIw2Wsy3C3dIYEE1x9dx8qzKzFs1zA0XNgQtx7fUrTNLcgVI3a1UasiCUCx68Ivrha+WP/zzz+jSZMmcHZ2ho6ODj7++GOMGjUKmpqar93Pq645h4SEID09XfG6d+9eZRwOUa001HUoLAwskPgsEb9f/l3sOFQL3H16F7239kavrb1w5+kdWBtZ47eA33Bh3AX0du6tFmOOqksn+05Y2H0hzn5
4Fo8+e4Q/Bv+BT7w/gaeVJxoaN4SjqaOibeDuQDj+7IhRe0dhnWwd7jy9AxUZxVMp1Opym6WlZbHenpSUFGhpaaFevXoAAHNzc+zZswe5ubl49OgRrK2tMX36dDg4OAAAzMzMoKmpWeJ+/tu79DJdXV3o6upW8hER1U66Wrr4uM3H+PLol/g+5nsMdR3KkxRVqfsZ9/HHtT+gpaGFad7TMLPjTKXeEyqZqZ4pApwCFIO65QVyxb9VQRAQcy8GD549QLwsHmGyMACAtZE12jZsi052nTDRa6JY0SuFWvUkeXt7Izw8XGnd4cOH0apVK2hrK3eVSqVSNGjQAAUFBdi5cyd69+4NANDR0YGnp2ex/YSHh8PHxwdEVD3GtRqHunp14dXACzkFOWLHoRooJ/9/P1ftbNvhR98fcT74PEK7hrJAqiBdrf91FkgkElz96CoODDuA6e2mo23DttDS0MKDZw+wK24Xtl7aqrTt3Oi5WH9+Pa6lXVOaz0mViXp3W2ZmJm7evAkAcHd3x08//YQuXbqgbt26sLW1RUhICBITE7F+/XoAz6cAaN68OT788EOMHTsWsbGxCA4OxpYtWxRTAJw8eRKJiYlo2bIlEhMTMWfOHMTHx+Ps2bOoU6cOgOdTAAQGBmL58uXw9vbGypUrsWrVKly+fBl2dnZlys6724jeXHZ+NvS19cWOQTWMIAhYfno5vor6CjHvxyhdHqKqlZ2fjTMPzuCf+//A3MAcI1uOBPB84kuT+SaK4shIxwhulm5wt3SHu6U7vG284WzmXC0Zy3X+FkR09OhRAUCxV1BQkCAIghAUFCR06tRJaZvIyEjB3d1d0NHREezt7YVly5YVe79p06aCrq6uUK9ePSEwMFBITEws9tlLliwR7OzsBB0dHcHDw0OIiooqV/b09HQBgJCenl6u7YiIqOo8yn4k9NrSS8AcCJgD4ZNDn4gdiQRBSM1KFT459InQfk17QTpXqvj7efEasXuEoq28QC4sjF0o/HPvnyrJUp7zt8rMk6Ru2JNEVHnOJZ3DwZsHEdIhROwopMZi78Vi8M7BSEhPgI6mDhZ0XYAJXhOKPWyWxFVQVICraVchS5bhXNI5nEs+hyHNh2Cs51gAz2cHd13mioHNBmLbgG2V/vk1dp4kIqp5kp4lofWq1igUCuHX2A8eVh5iRyI1IwgCfor9CdOPTEdBUQGa1G2C7e9tR0vLlmJHoxJoaWihef3maF6/OYa3GF7sfUEQ0Ne5Lzrbda7+cP/B8pqIRGVlZIVBzQcBAOYfny9yGlJHq8+txrTwaSgoKsDg5oNx+oPTLJDUmKuFK3YN2oVxrceJHYVFEhGJb3q75zPg77iyA9fSromchtRNYItAdLTriCU9lmBzv821/in3VHlYJBGR6FwtXNHLqRcECPjuxHdixyE1cPHhRRQWFQJ4flv60aCjGN96POfbokrFIomIVEJI++eDtjdc2ICE9ASR05Aq23hhI1qtaoUvjnyhWMfB2VQV+FNFRCqhbcO26GLfBQVFBQg9Fip2HFJBgiBg/vH5CNwdiLzCPNx8clPRm0RUFVgkEZHKmNN5DiwNLdHCooXYUUjFFBYV4qP9HyHkyPMex2ne0/D7e79DU0PzNVsSVRynACAildHRriPuTLqj9OgDouz8bAzdORR7r+2FBBIs6r5I7Z8JRuqBRRIRqRQWSPSyIqEIPTb1QNTdKOhq6mJTv03o79Jf7FhUS/ByGxGpnCKhCNsubcOcyDliRyGRaUg0MK7VOJjpmyFiRAQLJKpWfCxJBfGxJERVR5Ysg/sKd2hINBD3URzeqveW2JFIZBnyDM5/RJWiPOdv9iQRkcppadkSAW8FoEgowldRX4kdh6pZ/JN4vLP+HdzPuK9YxwKJxMAiiYhU0lednxdHWy5uwfnk8yKnoepyNe0qOqztgL/j/0bwX8Fix6FajkUSEakkdyt3DGo2CAIEfB7xudhxqBpcfHgRHdd2ROKzRLiYu2BVwCqxI1EtxyKJiFTWvLfnQVtDG4duHcKR20fEjkNV6ErqFbyz/h2kZqfCw8oDUSOjYGVkJXYsquVYJBGRympUtxGCWz2/5PJZxGfgfSY10/VH15UKpIjACJjpm4kdi4jzJBGRavuy45e4mHIRMzvM5MNLa6jx+8YjOTMZLSxa4PDwwzDVMxU7EhEATgFQYZwCgIiociQ9S8KEAxOwtOdS1DeoL3YcquHKc/5mTxIRqRV5gZyzctcAeYV50NHUAQBYGVlhx8AdIiciKo5jkohILeQV5mFe9Dw0+qURUrNSxY5DbyA5MxnuK9yx4fwGsaMQvRKLJCJSC5oSTeyM24nEZ4mY+fdMseNQBaVmpeKd9e/gSuoVzIqchZz8HLEjEZWKRRIRqQVNDU384v8LAGDV2VU4m3RW5ERUXo9zHqPbhm64knoFDYwaICIwAnraemLHIioViyQiUhvtbdtjSPMhECBg4oGJnBJAjTzNfQrfDb44//A8LAwscGTEETSq20jsWESvxCKJiNTKgm4LoK+tjxP3TmDLpS1ix6EyeCZ/Bv9N/jiTdAZm+mY4MuIInMycxI5F9FoskohIrTQ0bogv2n8BAPgs/DNk5mWKnIheZ/359fjn/j8wlZoiIjACzeo3EzsSUZmwSCIitfOJzydwNHVEanYqYu7FiB2HXmN86/GY3Wk2wgPD4WbpJnYcojLjZJIVxMkkicQVey8WdfXq8rKNipIXyAGAc1qRyuFkkkRU43nbeIsdgUqRV5iHAb8PQJFQhJ0Dd0KqJRU7ElGF8HIbEam9k/dPcmJCFZFfmI8hO4fgr+t/4e/4v3Hx4UWxIxFVGHuSiEitnbx/Et6rvSHVkqK9bXs4mDqIHanWKiwqxIg9I7Arbhd0NHWwZ9AetG7QWuxYRBXGniQiUmttGrRBZ/vOyCnIwdg/x6JIKBI7Uq1UJBRh9B+jsfXSVmhraGPnwJ3wa+wndiyiN8IiiYjUmkQiwYp3V0BPSw9H4o9g6b9LxY5U6xQJRfjgzw+w/vx6aEo0sW3ANrz71rtixyJ6YyySiEjtNanXBAu6LQDwfO6ka2nXRE5Uu9x6fAvbL2+HhkQDm/ptQt+mfcWORFQpWCQRUY0wvvV4dHPshpyCHATuDkR+Yb7YkWqNJvWa4HDgYWzsuxGDmg8SOw5RpRG1SIqOjkZAQACsra0hkUiwZ8+e124TFRUFT09PSKVSODo6Yvny5cXaLFq0CE5OTtDT04ONjQ2mTJmC3Nxcxftz5syBRCJRellaWlbmoRFRNdOQaGBN7zWoI62Dfx/8i22Xt4kdqUYTBAH30u8plts2bIshrkNETERU+US9uy0rKwtubm4YNWoU+vfv/9r28fHx6NGjB8aOHYuNGzfixIkTGD9+PMzNzRXbb9q0CdOnT8eaNWvg4+OD69evY+TIkQCAhQsXKvbVrFkzREREKJY1NTUr9+CIqNo1NG6IFe+uwNPcpxjmOkzsODWWIAj4NPxTrDm3BhEjIuBh5SF2JKIqIWqR5O/vD39//zK3X758OWxtbbFo0SIAQNOmTX
H69Gn88MMPiiIpNjYW7dq1w9ChQwEA9vb2GDJkCE6dOqW0Ly0trXL1HsnlcsjlcsVyRkZGmbclouozsNlAsSPUaIIg4IsjX+DH2B8BALJkGYskqrHUakxSbGwsfH19ldb5+fnh9OnTyM9/Pv6gffv2OHPmjKIoun37Nvbv34+ePXsqbXfjxg1YW1vDwcEBgwcPxu3bt1/52aGhoTAxMVG8bGxsKvHIiKgqPM19iu+Of8dpASqJIAiYdXQW5p+YDwBY7L8Yo91Hi5yKqOqo1WSSycnJsLCwUFpnYWGBgoICpKWlwcrKCoMHD0Zqairat28PQRBQUFCAcePGYfr06YptvLy8sH79erz11lt4+PAh5s6dCx8fH1y+fBn16tUr8bNDQkIwdepUxXJGRgYLJSIVVlBUgPZr2uNy6mVIJBJ81u4zsSOpNUEQ8HnE5/g+5nsAwCK/RfiozUcipyKqWmrVkwQ8nxPlZS+ez/tifWRkJObNm4elS5fi7Nmz2LVrF/766y988803im38/f3Rv39/uLq6omvXrti3bx8AYN26daV+rq6uLoyNjZVeRKS6tDS0MNFrIgDgiyNf4NjdYyInUl9FQhEmHJigKJB+7v4zJrWdJHIqoqqnVkWSpaUlkpOTldalpKRAS0tL0QP05ZdfIjAwEGPGjIGrqyv69u2Lb7/9FqGhoSgqKrnL3cDAAK6urrhx40aVHwMRVZ+xHmMxzHUYCoVCDNoxCPcz7osdSS3lF+bj2qNrkOD5xJ0vik+imk6tiiRvb2+Eh4crrTt8+DBatWoFbW1tAEB2djY0NJQPS1NTE4IgKHqd/ksulyMuLg5WVlZVE5yIRCGRSLD83eVoZt4MSZlJ6LWlF7LyssSOpXZ0tXSxZ9Ae/DX0L3zg+YHYcYiqjahFUmZmJmQyGWQyGYDnt/jLZDIkJCQAeD4OaMSIEYr2wcHBuHv3LqZOnYq4uDisWbMGq1evxrRp0xRtAgICsGzZMmzduhXx8fEIDw/Hl19+iV69eilu8582bRqioqIQHx+PkydPYsCAAcjIyEBQUFD1HTwRVQtDHUP8OeRPmOmb4VzyOQzfPZwDucsgrzAPGy9sVPxyaaBjgB5Neoiciqh6iTpw+/Tp0+jSpYti+cXA6KCgIISFhSEpKUlRMAGAg4MD9u/fjylTpmDJkiWwtrbGL7/8ojTH0syZMyGRSDBz5kwkJibC3NwcAQEBmDdvnqLN/fv3MWTIEKSlpcHc3Bxt27bFP//8Azs7u2o4aiKqbg6mDtgzaA/eXv82ziWdQ9KzJDQwbiB2LJWVIc9A/+39EXE7AvFP4vFlpy/FjkQkColQ2jUoeqWMjAyYmJggPT2dg7iJ1MT+G/vRyroV6hvUFzuKykrOTEaPTT1wLvkcDLQNsHvQbnRr1E3sWESVpjznb7WaAoCI6E3893JRalYqzA3MRUqjeq4/uo7uG7sj/mk86hvUx76h+9DKupXYsYhEo1YDt4mIKsvGCxth/7M9Im5HvL5xLXAq8RTarWmH+KfxaGTaCDGjY1ggUa3HIomIah1BEPDHtT+QnZ+NPlv74HjCcbEjiepxzmN0Xd8Vadlp8LTyRMz7MWhUt5HYsYhExyKJiGodiUSCDX03wLeRL7Lys9B9Y3dE3okUO5Zo6urVxfyu89GjSQ9EjozkmC2i/8eB2xXEgdtE6i8nPwd9tvXB4VuHoaelhz+G/IGujl3FjlUtsvKykJqdCvs69op1RUIRNCT83ZlqtvKcv/mvgYhqLT1tPewdvBc9mvRATkEOem7uie2Xt4sdq8pdSb2CNr+1ge8GXzzJeaJYzwKJSBn/RRBRrSbVkmLXwF3o17Qf8grzIEuWiR2pSm26sAmtV7XGldQryMzLxN30u2JHIlJZnAKAiGo9XS1dbB+wHZsvbsbwFsPFjlMlsvKyMPXQVKw8uxIA8I7DO9jUbxMsDC1ETkakutiTREQEQFNDE4FugZBIJACA7PxsfPjnh0jOTH7Nlqov5l4MWq5oiZVnV0ICCWZ1nIVDww+xQCJ6DRZJREQleNHr4rHCQ+3vfPs+5nvcfHwTDY0b4nDgYXzV5StoamiKHYtI5bFIIiIqwZS2U+Bi7oKkzCS8ve5tfBb+GeQFcrFjlVleYZ7iz8t7Lsf4VuNxcdzFWnP3HlFl4BQAFcQpAIhqvsy8TEw9NBWrzq4CALSwaIEV765A24ZtRU5Wunvp9/DJ4U8AANvfq/l36hGVF6cAICKqBIY6hlgZsBJ7B++Fmb4ZLjy8AO/V3lh7bq3Y0Yp5nPMYXxz5As5LnPH7ld+xK24Xbjy6IXYsIrXGIomI6DV6OfXClfFXMLLlSNSR1in2oFwxpeemY07kHDj87IDQ46HIzs9GO5t2OPPBGTSp10TseERqjZfbKoiX24hqp9SsVJgbmCuWh+0ahqZmTfGh54dK66vD6Qen8fa6t/Es7xmA55cDv+78NXo59VLcpUdEyspz/uY8SURE5fByIXTs7jFsvrgZADA3ei6GuA7BiBYj0NGuY5XcPZYhz8DNxzfhYeUBAHCt7wp9bX3YmNhgTqc56O/Sn7NmE1Ui9iRVEHuSiCi/MB/bL2/HopOLcPrBacV6ayNr9HPuhw9bfYjm9ZtXeP9FQhGuP7qO8Fvh2H9zPyLvPH/4bPykeEUxFP8kHnZ17FgcEZVRec7fLJIqiEUSEb0gCAJi78di7bm12BG3A09znwIADg0/BN9GvgCA4wnHcTzhOGxNbGFtZI16evWgrakNbQ1t5Bbk4q16b0FbUxsAsPbcWmy+tBn/Jv6LdHm60mc51XPCkRFH0MC4QbUeI1FNwcttRETVSCKRwMfGBz42PljcYzEO3TqEQzcPob1te0WbPVf34MfYH0vdx+2Jt+Fg6gDg+QNoI25HAHj+bDnvht7o0aQH/Bv7w8XcheONiKoJiyQiokqkq6WLXk690Mupl9J6TytPDHMdhgfPHuDBswd4kvsE+YX5yCvMg1RLipyCHEXbfk37oUm9JmjToA2amTdT9DARUfXi5bYK4uU2IiIi9cPJJImIiIjeEIskIiIiohKwSCIiIiIqAYskIiIiohKwSCIiIiIqAYskIiIiohKwSCIiIiIqAYskIiIiohKwSCIiIiIqAYskIiIiohKwSCIiIiIqAYskIiIiohKwSCIiIiIqAYskIiIiohJoiR1AXQmCAADIyMgQOQkRERGV1Yvz9ovz+KuwSKqgZ8+eAQBsbGxETkJERETl9ezZM5iYmLyyjUQoSylFxRQVFeHBgwcwMjKCRCKp1H1nZGTAxsYG9+7dg7GxcaXuu6bhd1V2/K7Kjt9V2fG7Kjt+V+VTVd+XIAh49uwZrK2toaHx6lFH7EmqIA0NDTRs2LBKP8PY2Jj/kMqI31XZ8bsqO35XZcfvquz4XZVPVXxfr+tBeoEDt4mIiIhKwCKJiIiIqAQsklSQrq4uZs+eDV1dXbGjqDx+V2XH76rs+F2VHb+rsuN3VT6q8H1x4DYRERFRCdiTRERERFQCFklEREREJWCRRERERFQCFklEREREJWCRpCbkc
jlatmwJiUQCmUwmdhyV1KtXL9ja2kIqlcLKygqBgYF48OCB2LFUzp07d/D+++/DwcEBenp6aNSoEWbPno28vDyxo6mkefPmwcfHB/r6+qhTp47YcVTO0qVL4eDgAKlUCk9PTxw7dkzsSCopOjoaAQEBsLa2hkQiwZ49e8SOpJJCQ0PRunVrGBkZoX79+ujTpw+uXbsmWh4WSWris88+g7W1tdgxVFqXLl2wfft2XLt2DTt37sStW7cwYMAAsWOpnKtXr6KoqAgrVqzA5cuXsXDhQixfvhxffPGF2NFUUl5eHt577z2MGzdO7CgqZ9u2bZg8eTJmzJiBc+fOoUOHDvD390dCQoLY0VROVlYW3NzcsHjxYrGjqLSoqCh89NFH+OeffxAeHo6CggL4+voiKytLnEACqbz9+/cLzs7OwuXLlwUAwrlz58SOpBb27t0rSCQSIS8vT+woKm/BggWCg4OD2DFU2tq1awUTExOxY6iUNm3aCMHBwUrrnJ2dhenTp4uUSD0AEHbv3i12DLWQkpIiABCioqJE+Xz2JKm4hw8fYuzYsdiwYQP09fXFjqM2Hj9+jE2bNsHHxwfa2tpix1F56enpqFu3rtgxSI3k5eXhzJkz8PX1VVrv6+uLmJgYkVJRTZOeng4Aov3/xCJJhQmCgJEjRyI4OBitWrUSO45a+Pzzz2FgYIB69eohISEBe/fuFTuSyrt16xZ+/fVXBAcHix2F1EhaWhoKCwthYWGhtN7CwgLJyckipaKaRBAETJ06Fe3bt0fz5s1FycAiSQRz5syBRCJ55ev06dP49ddfkZGRgZCQELEji6as39ULn376Kc6dO4fDhw9DU1MTI0aMgFBLJpUv73cFAA8ePED37t3x3nvvYcyYMSIlr34V+a6oZBKJRGlZEIRi64gq4uOPP8aFCxewZcsW0TLwsSQiSEtLQ1pa2ivb2NvbY/Dgwfjzzz+V/sMpLCyEpqYmhg0bhnXr1lV1VNGV9buSSqXF1t+/fx82NjaIiYmBt7d3VUVUGeX9rh48eIAuXbrAy8sLYWFh0NCoPb8zVeTnKiwsDJMnT8bTp0+rOJ16yMvLg76+Pn7//Xf07dtXsX7SpEmQyWSIiooSMZ1qk0gk2L17N/r06SN2FJU1YcIE7NmzB9HR0XBwcBAth5Zon1yLmZmZwczM7LXtfvnlF8ydO1ex/ODBA/j5+WHbtm3w8vKqyogqo6zfVUle1P9yubwyI6ms8nxXiYmJ6NKlCzw9PbF27dpaVSABb/ZzRc/p6OjA09MT4eHhSkVSeHg4evfuLWIyUmeCIGDChAnYvXs3IiMjRS2QABZJKs3W1lZp2dDQEADQqFEjNGzYUIxIKuvUqVM4deoU2rdvD1NTU9y+fRuzZs1Co0aNakUvUnk8ePAAnTt3hq2tLX744QekpqYq3rO0tBQxmWpKSEjA48ePkZCQgMLCQsU8ZY0bN1b8m6ytpk6disDAQLRq1Qre3t5YuXIlEhISOL6tBJmZmbh586ZiOT4+HjKZDHXr1i32f31t9tFHH2Hz5s3Yu3cvjIyMFOPbTExMoKenV/2BRLmnjiokPj6eUwCU4sKFC0KXLl2EunXrCrq6uoK9vb0QHBws3L9/X+xoKmft2rUCgBJfVFxQUFCJ39XRo0fFjqYSlixZItjZ2Qk6OjqCh4eHaLdqq7qjR4+W+HMUFBQkdjSVUtr/TWvXrhUlD8ckEREREZWgdg1EICIiIiojFklEREREJWCRRERERFQCFklEREREJWCRRERERFQCFklEREREJWCRRERERFQCFklEREREJWCRRESVqnPnzpg8ebLYMUr06NEj1K9fH3fu3AEAREZGQiKRVPlDayv6OWFhYahTp065tmndujV27dpVrm2IqGQskohIpSUlJWHo0KFwcnKChoZGqQXYzp074eLiAl1dXbi4uGD37t3F2oSGhiIgIAD29vZVG1pEX375JaZPn46ioiKxoxCpPRZJRKTS5HI5zM3NMWPGDLi5uZXYJjY2FoMGDUJgYCDOnz+PwMBADBw4ECdPnlS0ycnJwerVqzFmzJjqii6Knj17Ij09HYcOHRI7CpHaY5FERFXmyZMnGDFiBExNTaGvrw9/f3/cuHFDqc2qVatgY2MDfX199O3bFz/99JPSJSZ7e3v8/PPPGDFiBExMTEr8nEWLFqFbt24ICQmBs7MzQkJC8M4772DRokWKNgcOHICWlha8vb1Lzfvo0SMMGTIEDRs2hL6+PlxdXbFlyxalNp07d8aECRMwefJkmJqawsLCAitXrkRWVhZGjRoFIyMjNGrUCAcOHCi2/xMnTsDNzQ1SqRReXl64ePGi0vthYWGwtbVVfBePHj1Sev/WrVvo3bs3LCwsYGhoiNatWyMiIkKpjaamJnr06FEsNxGVH4skIqoyI0eOxOnTp/HHH38gNjYWgiCgR48eyM/PB/C8aAgODsakSZMgk8nQrVs3zJs3r9yfExsbC19fX6V1fn5+iImJUSxHR0ejVatWr9xPbm4uPD098ddff+HSpUv44IMPEBgYqNQjBQDr1q2DmZkZTp06hQkTJmDcuHF477334OPjg7Nnz8LPzw+BgYHIzs5W2u7TTz/FDz/8gH///Rf169dHr169FN/FyZMnMXr0aIwfPx4ymQxdunTB3LlzlbbPzMxEjx49EBERgXPnzsHPzw8BAQFISEhQatemTRscO3asbF8eEZVOICKqRJ06dRImTZokXL9+XQAgnDhxQvFeWlqaoKenJ2zfvl0QBEEYNGiQ0LNnT6Xthw0bJpiYmLxy3/+lra0tbNq0SWndpk2bBB0dHcVy7969hdGjRyu1OXr0qABAePLkSanH06NHD+GTTz5RytC+fXvFckFBgWBgYCAEBgYq1iUlJQkAhNjYWKXP2bp1q6LNo0ePBD09PWHbtm2CIAjCkCFDhO7duyt99qBBg0r9Ll5wcXERfv31V6V1e/fuFTQ0NITCwsJXbktEr8aeJCKqEnFxcdDS0oKXl5diXb169eDk5IS4uDgAwLVr19CmTRul7f67XFYSiURpWRAEpXU5OTmQSqWv3EdhYSHmzZuHFi1aoF69ejA0NMThw4eL9dS0aNFC8WdNTU3Uq1cPrq6uinUWFhYAgJSUFKXtXr7UV7duXaXvIi4urtilwP8uZ2Vl4bPPPoOLiwvq1KkDQ0NDXL16tVg+PT09FBUVQS6Xv/J4iejVtMQOQEQ1kyAIpa5/Ubz8t5B51XavYmlpieTkZKV1KSkpimIFAMzMzPDkyZNX7ufHH3/EwoULsWjRIri6usLAwACTJ09GXl6eUjttbW2lZYlEorTuxTGV5Q6zl7+L1/n0009x6NAh/PDDD2jcuDH09PQwYMCAYvkeP34MfX196OnpvXafRFQ69iQRUZVwcXFBQUGB0nieR48e4fr162jatCkAwNnZGadOnVLa7vTp0+X+LG9vb4SHhyutO3z4MHx8fBTL7u7uuHLlyiv3c+zYMfTu3RvDhw+Hm5sbHB0diw00fxP//POP4s9PnjzB9evX4ezsDOD59/Xy+/9t
/yLfyJEj0bdvX7i6usLS0lIx59PLLl26BA8Pj0rLTVRbsUgioirRpEkT9O7dG2PHjsXx48dx/vx5DB8+HA0aNEDv3r0BABMmTMD+/fvx008/4caNG1ixYgUOHDhQrHdJJpNBJpMhMzMTqampkMlkSgXPpEmTcPjwYXz33Xe4evUqvvvuO0RERCjNqeTn54fLly+/sjepcePGCA8PR0xMDOLi4vDhhx8W66F6E19//TWOHDmCS5cuYeTIkTAzM0OfPn0AABMnTsTBgwexYMECXL9+HYsXL8bBgweL5du1axdkMhnOnz+PoUOHlthbdezYsWID2Ymo/FgkEVGVWbt2LTw9PfHuu+/C29sbgiBg//79iktT7dq1w/Lly/HTTz/Bzc0NBw8exJQpU4qNHXJ3d4e7uzvOnDmDzZs3w93dHT169FC87+Pjg61bt2Lt2rVo0aIFwsLCsG3bNqXxUK6urmjVqhW2b99eat4vv/wSHh4e8PPzQ+fOnWFpaakoYirD/PnzMWnSJHh6eiIpKQl//PEHdHR0AABt27bFb7/9hl9//RUtW7bE4cOHMXPmTKXtFy5cCFNTU/j4+CAgIAB+fn7FeowSExMRExODUaNGVVpuotpKIlRkAAARURUZO3Ysrl69WiW3sO/fvx/Tpk3DpUuXoKFRM39H/PTTT5Geno6VK1eKHYVI7XHgNhGJ6ocffkC3bt1gYGCAAwcOYN26dVi6dGmVfFaPHj1w48YNJCYmwsbGpko+Q2z169fHtGnTxI5BVCOwJ4mIRDVw4EBERkbi2bNncHR0xIQJExAcHCx2LCIiFklEREREJamZF+WJiIiI3hCLJCIiIqISsEgiIiIiKgGLJCIiIqISsEgiIiIiKgGLJCIiIqISsEgiIiIiKgGLJCIiIqIS/B8Nj5nHHBDgHgAAAABJRU5ErkJggg==", - "text/plain": [ - "
    " - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], + "cell_type": "markdown", + "id": "354b2ab3", + "metadata": { + "editable": true + }, "source": [ - "import numpy as np\n", - "import pandas as pd\n", - "import matplotlib.pyplot as plt\n", - "from sklearn.model_selection import train_test_split\n", - "from sklearn import linear_model\n", - "\n", - "def MSE(y_data,y_model):\n", - " n = np.size(y_model)\n", - " return np.sum((y_data-y_model)**2)/n\n", - "# A seed just to ensure that the random numbers are the same for every run.\n", - "# Useful for eventual debugging.\n", - "np.random.seed(2021)\n", - "\n", - "n = 100\n", - "x = np.random.rand(n)\n", - "y = np.exp(-x**2) + 1.5 * np.exp(-(x-2)**2)+ np.random.randn(n)\n", - "\n", - "Maxpolydegree = 5\n", - "X = np.zeros((n,Maxpolydegree-1))\n", + "## Identifying Terms\n", "\n", - "for degree in range(1,Maxpolydegree): #No intercept column\n", - " X[:,degree-1] = x**(degree)\n", - "\n", - "# We split the data in test and training data\n", - "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)\n", - "\n", - "# Decide which values of lambda to use\n", - "nlambdas = 500\n", - "MSERidgePredict = np.zeros(nlambdas)\n", - "lambdas = np.logspace(-4, 2, nlambdas)\n", - "for i in range(nlambdas):\n", - " lmb = lambdas[i]\n", - " RegRidge = linear_model.Ridge(lmb)\n", - " RegRidge.fit(X_train,y_train)\n", - " ypredictRidge = RegRidge.predict(X_test)\n", - " MSERidgePredict[i] = MSE(y_test,ypredictRidge)\n", - "\n", - "# Now plot the results\n", - "plt.figure()\n", - "plt.plot(np.log10(lambdas), MSERidgePredict, 'g--', label = 'MSE SL Ridge Test')\n", - "plt.xlabel('log10(lambda)')\n", - "plt.ylabel('MSE')\n", - "plt.legend()\n", - "plt.show()" + "The second term on the rhs disappears since this is just the mean and \n", + "employing the definition of $\\sigma^2$ we have" ] }, { "cell_type": "markdown", - "id": "70944449", - "metadata": {}, + "id": "2ee3d80b", + "metadata": { + "editable": true + }, "source": [ - "Here we have performed a rather data greedy calculation as function of the regularization parameter $\\lambda$. There is no resampling here. The latter can easily be added by employing the function **RidgeCV** instead of just calling the **Ridge** function. For **RidgeCV** we need to pass the array of $\\lambda$ values.\n", - "By inspecting the figure we can in turn determine which is the optimal regularization parameter.\n", - "This becomes however less functional in the long run." + "$$\n", + "\\int_{-\\infty}^{\\infty}dxp(x)e^{\\left(iq(\\mu-x)/m\\right)}=\n", + " 1-\\frac{q^2\\sigma^2}{2m^2}+\\dots,\n", + "$$" ] }, { "cell_type": "markdown", - "id": "1538973c", - "metadata": {}, + "id": "5c6f424d", + "metadata": { + "editable": true + }, "source": [ - "## Grid Search\n", - "\n", - "An alternative is to use the so-called grid search functionality\n", - "included with the library **Scikit-Learn**, as demonstrated for the same\n", - "example here." 
- ] - }, - { - "cell_type": "code", - "execution_count": 5, - "id": "1c1fdba0", - "metadata": {}, - "outputs": [], - "source": [ - "import numpy as np\n", - "from sklearn.model_selection import train_test_split\n", - "from sklearn.linear_model import Ridge\n", - "from sklearn.model_selection import GridSearchCV\n", - "\n", - "def R2(y_data, y_model):\n", - " return 1 - np.sum((y_data - y_model) ** 2) / np.sum((y_data - np.mean(y_data)) ** 2)\n", - "\n", - "def MSE(y_data,y_model):\n", - " n = np.size(y_model)\n", - " return np.sum((y_data-y_model)**2)/n\n", - "\n", - "# A seed just to ensure that the random numbers are the same for every run.\n", - "# Useful for eventual debugging.\n", - "np.random.seed(2021)\n", - "\n", - "n = 100\n", - "x = np.random.rand(n)\n", - "y = np.exp(-x**2) + 1.5 * np.exp(-(x-2)**2)+ np.random.randn(n)\n", - "\n", - "Maxpolydegree = 5\n", - "X = np.zeros((n,Maxpolydegree-1))\n", - "\n", - "for degree in range(1,Maxpolydegree): #No intercept column\n", - " X[:,degree-1] = x**(degree)\n", - "\n", - "# We split the data in test and training data\n", - "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)\n", - "\n", - "# Decide which values of lambda to use\n", - "nlambdas = 10\n", - "lambdas = np.logspace(-4, 2, nlambdas)\n", - "# create and fit a ridge regression model, testing each alpha\n", - "model = Ridge()\n", - "gridsearch = GridSearchCV(estimator=model, param_grid=dict(alpha=lambdas))\n", - "gridsearch.fit(X_train, y_train)\n", - "print(gridsearch)\n", - "ypredictRidge = gridsearch.predict(X_test)\n", - "# summarize the results of the grid search\n", - "print(f\"Best estimated lambda-value: {gridsearch.best_estimator_.alpha}\")\n", - "print(f\"MSE score: {MSE(y_test,ypredictRidge)}\")\n", - "print(f\"R2 score: {R2(y_test,ypredictRidge)}\")" - ] - }, - { - "cell_type": "markdown", - "id": "dd6f78eb", - "metadata": {}, - "source": [ - "By default the grid search function includes cross validation with\n", - "five folds. The [Scikit-Learn\n", - "documentation](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html#sklearn.model_selection.GridSearchCV)\n", - "contains more information on how to set the different parameters.\n", - "\n", - "If we take out the random noise, running the above codes results in $\\lambda=0$ yielding the best fit." - ] - }, - { - "cell_type": "markdown", - "id": "20b7afcb", - "metadata": {}, - "source": [ - "## Randomized Grid Search\n", - "\n", - "An alternative to the above manual grid set up, is to use a random\n", - "search where the parameters are tuned from a random distribution\n", - "(uniform below) for a fixed number of iterations. A model is\n", - "constructed and evaluated for each combination of chosen parameters.\n", - "We repeat the previous example but now with a random search. Note\n", - "that values of $\\lambda$ are now limited to be within $x\\in\n", - "[0,1]$. This domain may not be the most relevant one for the specific\n", - "case under study." 
- ] - }, - { - "cell_type": "code", - "execution_count": 6, - "id": "4810b670", - "metadata": {}, - "outputs": [], - "source": [ - "import numpy as np\n", - "from sklearn.model_selection import train_test_split\n", - "from sklearn.linear_model import Ridge\n", - "from sklearn.model_selection import GridSearchCV\n", - "from scipy.stats import uniform as randuniform\n", - "from sklearn.model_selection import RandomizedSearchCV\n", - "\n", - "\n", - "def R2(y_data, y_model):\n", - " return 1 - np.sum((y_data - y_model) ** 2) / np.sum((y_data - np.mean(y_data)) ** 2)\n", - "\n", - "def MSE(y_data,y_model):\n", - " n = np.size(y_model)\n", - " return np.sum((y_data-y_model)**2)/n\n", - "\n", - "# A seed just to ensure that the random numbers are the same for every run.\n", - "# Useful for eventual debugging.\n", - "np.random.seed(2021)\n", - "\n", - "n = 100\n", - "x = np.random.rand(n)\n", - "y = np.exp(-x**2) + 1.5 * np.exp(-(x-2)**2)+ np.random.randn(n)\n", - "\n", - "Maxpolydegree = 5\n", - "X = np.zeros((n,Maxpolydegree-1))\n", - "\n", - "for degree in range(1,Maxpolydegree): #No intercept column\n", - " X[:,degree-1] = x**(degree)\n", - "\n", - "# We split the data in test and training data\n", - "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)\n", - "\n", - "param_grid = {'alpha': randuniform()}\n", - "# create and fit a ridge regression model, testing each alpha\n", - "model = Ridge()\n", - "gridsearch = RandomizedSearchCV(estimator=model, param_distributions=param_grid, n_iter=100)\n", - "gridsearch.fit(X_train, y_train)\n", - "print(gridsearch)\n", - "ypredictRidge = gridsearch.predict(X_test)\n", - "# summarize the results of the grid search\n", - "print(f\"Best estimated lambda-value: {gridsearch.best_estimator_.alpha}\")\n", - "print(f\"MSE score: {MSE(y_test,ypredictRidge)}\")\n", - "print(f\"R2 score: {R2(y_test,ypredictRidge)}\")" - ] - }, - { - "cell_type": "markdown", - "id": "0696dfc9", - "metadata": {}, - "source": [ - "## Wisconsin Cancer Data\n", - "\n", - "We show here how we can use a simple regression case on the breast\n", - "cancer data using Logistic regression as our algorithm for\n", - "classification." - ] - }, - { - "cell_type": "code", - "execution_count": 3, - "id": "c55d1159", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "(426, 30)\n", - "(143, 30)\n", - "Test set accuracy with Logistic Regression: 0.94\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "/Users/mhjensen/miniforge3/envs/myenv/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:460: ConvergenceWarning: lbfgs failed to converge (status=1):\n", - "STOP: TOTAL NO. 
of ITERATIONS REACHED LIMIT.\n", - "\n", - "Increase the number of iterations (max_iter) or scale the data as shown in:\n", - " https://scikit-learn.org/stable/modules/preprocessing.html\n", - "Please also refer to the documentation for alternative solver options:\n", - " https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression\n", - " n_iter_i = _check_optimize_result(\n" - ] - } - ], - "source": [ - "import matplotlib.pyplot as plt\n", - "import numpy as np\n", - "from sklearn.model_selection import train_test_split \n", - "from sklearn.datasets import load_breast_cancer\n", - "from sklearn.linear_model import LogisticRegression\n", - "\n", - "# Load the data\n", - "cancer = load_breast_cancer()\n", - "\n", - "X_train, X_test, y_train, y_test = train_test_split(cancer.data,cancer.target,random_state=0)\n", - "print(X_train.shape)\n", - "print(X_test.shape)\n", - "# Logistic Regression\n", - "logreg = LogisticRegression(solver='lbfgs')\n", - "logreg.fit(X_train, y_train)\n", - "print(\"Test set accuracy with Logistic Regression: {:.2f}\".format(logreg.score(X_test,y_test)))" - ] - }, - { - "cell_type": "markdown", - "id": "b83cd520", - "metadata": {}, - "source": [ - "## Using the correlation matrix\n", - "\n", - "In addition to the above scores, we could also study the covariance (and the correlation matrix).\n", - "We use **Pandas** to compute the correlation matrix." - ] - }, - { - "cell_type": "code", - "execution_count": 4, - "id": "5497a1d8", - "metadata": {}, - "outputs": [ - { - "data": { - "image/png": "iVBORw0KGgoAAAANSUhEUgAAA9wAAAfFCAYAAABjxsRdAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjguMCwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy81sbWrAAAACXBIWXMAAA9hAAAPYQGoP6dpAAEAAElEQVR4nOzdeZyN9f//8eeZfTEzGMbMmDEGZexC1qxlKaT0zZa1PqVPCJFWobKEFC2Uj1Ap9SmpJFuMFELRIqVkjaFkZ5jl/fvDb87HMcNs55qzzON+u83tNue63ue6Xu/3dc55nde1HZsxxggAAAAAADiVj6sDAAAAAADAG1FwAwAAAABgAQpuAAAAAAAsQMENAAAAAIAFKLgBAAAAALAABTcAAAAAABag4AYAAAAAwAIU3AAAAAAAWICCGwAAAAAAC1BwA7BUxYoV1b9/f/vj5ORk2Ww2JScnuywmAADcyYQJE7R48WLL17N+/XqNHTtWx48ft3xdAC6i4AZQpOrVq6cNGzaoXr16rg4FAAC3UJQF97hx4yi4gSJEwQ0gG2OMzp07Z8myw8PD1bhxY4WHh1uyfAAAULTOnTsnY4yrwwDcEgU3kE9jx46VzWbTDz/8oDvvvFMREREqXbq0HnroIaWnp+vXX39Vhw4dFBYWpooVK2ry5MnZlnHy5EmNHDlSiYmJCggIUPny5TVs2DCdOXPGod0rr7yiFi1aKCoqSqGhoapVq5YmT56stLQ0h3atWrVSzZo1tXnzZjVv3lwhISGqVKmSJk2apMzMzFz7ZLPZNHjwYM2aNUvVqlVTYGCg5s+fL0kaN26cGjVqpNKlSys8PFz16tXTnDlzsiXWtLQ0jRo1StHR0QoJCdENN9ygTZs2ZVtXTqeUt2rVSq1atcrWtn///qpYsaLDtJkzZ6pOnToqUaKEwsLClJSUpMcffzzXPgIAvIO35WGbzaYzZ85o/vz5stlsstlsDjkxJSVFAwcOVFxcnAICApSYmKhx48YpPT1d0sWd5LfccosiIyO1b98++/POnj2rGjVqqFq1ajpz5ozGjh2rhx9+WJKUmJhoX1dWPrbZbBo7dmy2+C6/NGzevHmy2WxasWKF7r77bpUtW1YhISE6f/68JOm9995TkyZNFBoaqhIlSqh9+/baunXrVccA8GZ+rg4A8FTdunVT7969NXDgQK1cudKegFetWqUHHnhAI0eO1DvvvKNHHnlEVapUUdeuXSVdTIAtW7bUgQMH9Pjjj6t27dravn27nnrqKf34449atWqVbDabJGnXrl3q1auX/QvB999/r/Hjx+uXX37RG2+84RBPSkqK7rrrLo0YMUJjxozRRx99pMcee0yxsbHq27dvrv1ZvHix1q1bp6eeekrR0dGKioqSJO3Zs0cDBw5UhQoVJEkbN27UkCFD9Oeff+qpp56yP//ee+/Vm2++qZEjR6pt27b66aef1LVrV506dcop4y1JCxcu1AMPPKAhQ4Zo6tSp8vHx0e+//66ff/7ZaesAAHgGb8nDGzZsUJs2bdS6dWuNHj1akuxngaWkpKhhw4by8fHRU089pcqVK2vDhg169tlntWfPHs2dO1c2m01vvfWW6tatq27dumndunXy9/fXAw88oN27d+ubb75RaGio/vWvf+mff/7RSy+9pEWLFikmJkaSVL169QKN/913362OHTvqrbfe0pkzZ+Tv768JEyboySef1IABA/Tkk0/qwoULmjJlipo3b65NmzYVeF2ARzMA8mXMmDFGknn++ecdptetW9dIMosWLbJPS0tLM2XLljVdu3a1T5s4caLx8fExmzdvdnj+Bx98YCSZpUuX5rjejIwMk5aWZt58803j6+tr/vnnH/u8li1bGknmm2++cXhO9erVTfv27XPtkyQTERHhsMyrxfD000+byMhIk5mZaYwxZseOHUaSGT58uEP7BQsWGEmmX79+9mlr1qwxksyaNWsc4m/ZsmW29fXr188kJCTYHw8
ePNiULFky1/4AALyXN+bh0NBQh1yZZeDAgaZEiRJm7969DtOnTp1qJJnt27fbp3311VfGz8/PDBs2zLzxxhtGkvnPf/7j8LwpU6YYSWb37t3Z1iXJjBkzJtv0hIQEh9jmzp1rJJm+ffs6tNu3b5/x8/MzQ4YMcZh+6tQpEx0dbbp163aF3gPejVPKgQLq1KmTw+Nq1arJZrPp5ptvtk/z8/NTlSpVtHfvXvu0JUuWqGbNmqpbt67S09Ptf+3bt892qvXWrVt16623KjIyUr6+vvL391ffvn2VkZGhnTt3Oqw/OjpaDRs2dJhWu3Zth3VfTZs2bVSqVKls01evXq2bbrpJERER9hieeuopHT16VEeOHJEkrVmzRpJ01113OTy3W7du8vNz3ok0DRs21PHjx9WzZ099/PHH+vvvv522bACAZ/G2PJyTJUuWqHXr1oqNjXWINauPa9eutbdt1qyZxo8frxdffFH//ve/1bt3b91zzz0FXndu7rjjDofHy5cvV3p6uvr27esQa1BQkFq2bMmvk6DY4pRyoIBKly7t8DggIEAhISEKCgrKNv3kyZP2x4cPH9bvv/8uf3//HJebVUTu27dPzZs3V9WqVTV9+nRVrFhRQUFB2rRpkwYNGpTtpmaRkZHZlhUYGJjnm59lnVp2qU2bNqldu3Zq1aqVZs+ebb9+bPHixRo/frx92UePHpV08cvGpfz8/HKMq6D69Omj9PR0zZ49W3fccYcyMzN1/fXX69lnn1Xbtm2dth4AgPvztjyck8OHD+vTTz/NNdYsd911l0aPHq3z58/br9e2yuXfGw4fPixJuv7663Ns7+PDcT4UTxTcQBErU6aMgoODs137del86eI11WfOnNGiRYuUkJBgn79t2zZL4sq6Xu1SCxculL+/v5YsWeLwBebyny7J+pKRkpKi8uXL26enp6fbi/GrCQoK0okTJ7JNz+kI9oABAzRgwACdOXNGX375pcaMGaNOnTpp586dDuMEAEBO3DUPXymW2rVra/z48TnOj42Ntf+fkZGhu+66S6VKlVJgYKDuueceff311woICMjTugIDA+03PrvUlfL45d8bssbtgw8+IB8Dl6DgBopYp06dNGHCBEVGRioxMfGK7bISWWBgoH2aMUazZ8+2PMZLY/Dz85Ovr6992rlz5/TWW285tMu6m+qCBQtUv359+/T333/ffhfVq6lYsaL++9//6vz58/b+Hj16VOvXr7/iz4eFhobq5ptv1oULF3Tbbbdp+/btJHgAQK7cMQ9f6Uh4p06dtHTpUlWuXDnHy74uNWbMGK1bt04rVqxQaGioWrRooYcffljTp093WI+kHNdVsWJF/fDDDw7TVq9erdOnT+epD+3bt5efn5927dqV7XRzoDij4AaK2LBhw/Thhx+qRYsWGj58uGrXrq3MzEzt27dPK1as0IgRI9SoUSO1bdtWAQEB6tmzp0aNGqXU1FTNnDlTx44dK7JYO3bsqGnTpqlXr1667777dPToUU2dOtXhy4d08bq53r1768UXX5S/v79uuukm/fTTT5o6dWqefm+7T58+eu2119S7d2/de++9Onr0qCZPnpztuffee6+Cg4PVrFkzxcTEKCUlRRMnTlRERMQVT2EDAOBS7piHa9WqpeTkZH366aeKiYlRWFiYqlatqqefflorV65U06ZN9eCDD6pq1apKTU3Vnj17tHTpUs2aNUtxcXFauXKlJk6cqNGjR+vGG2+UJE2cOFEjR45Uq1atdPvtt9vXI0nTp09Xv3795O/vr6pVqyosLEx9+vTR6NGj9dRTT6lly5b6+eef9fLLLysiIiJPfahYsaKefvppPfHEE/rjjz/UoUMHlSpVSocPH9amTZsUGhqqcePGOX3sAHdHwQ0UsdDQUK1bt06TJk3S66+/rt27dys4OFgVKlTQTTfdZP/d6aSkJH344Yd68skn1bVrV0VGRqpXr1566KGHHG4IY6U2bdrojTfe0HPPPafOnTurfPnyuvfeexUVFZXtRixz5sxRuXLlNG/ePM2YMUN169bVhx9+qB49euS6nmbNmmn+/PmaNGmSunTpokqVKmnMmDFaunSpw01Wmjdvrnnz5un999/XsWPHVKZMGd1www168803VbZsWWd3HwDghdwxD0+fPl2DBg1Sjx497D9blpycrJiYGG3ZskXPPPOMpkyZogMHDigsLEyJiYn2gvbQoUPq3bu3WrVq5fBznQ899JDWrl2ru+++W9ddd50qVqyoVq1a6bHHHtP8+fM1e/ZsZWZmas2aNWrVqpUefvhhnTx5UvPmzdPUqVPVsGFDvf/+++rSpUue+/HYY4+pevXqmj59ut59912dP39e0dHRuv7663X//fc7dcwAT2EzxhhXBwEAAAAAgLfhdoEAAAAAAFiAghsAAAAAAAsUqODevXu3s+MAAAAAAMCrFKjgrlKlilq3bq23335bqampzo4JAAAAAACPV6CC+/vvv9d1112nESNGKDo6WgMHDtSmTZucHRsAAAAAAB6rUHcpT09P16effqp58+bp888/1zXXXKN77rlHffr0uepP9GRmZurgwYMKCwuTzWYr6OoBAPB4xhidOnVKsbGx8vFxza1VyMsAAFzk7LzslJ8FO3/+vF599VU99thjunDhgvz9/dW9e3c999xziomJydb+wIEDio+PL+xqAQDwGvv371dcXJxL1k1eBgDAkbPycqEK7i1btuiNN97QwoULFRoaqn79+umee+7RwYMH9dRTT+nUqVM5nmp+4sQJlSxZUvv371d4eHihOgAA3igzM1Pp6emuDgNO4Ofnd9U95CdPnlR8fLyOHz+uiIiIIozsf8jLAABc5Oy87FeQJ02bNk1z587Vr7/+qltuuUVvvvmmbrnlFvsXisTERL322mtKSkrK8flZp6uFh4eT2AHgEsYYpaSk6Pjx464OBU5UsmRJRUdHX/V0bVeeyk1eBgDAkbPycoEK7pkzZ+ruu+/WgAEDFB0dnWObChUqaM6cOYUKDi6yZuLV57d+rGjiAIqhrGI7KipKISEhXE/r4YwxOnv2rI4cOSJJOV5mBeSKvAwAHqtABfdvv/2Wa5uAgAD169evIIuHu8st8eeGLwZAjjIyMuzFdmRkpKvDgZMEBwdLko4cOaKoqCj5+vq6OCIAAFBUClRwz507VyVKlNCdd97pMP2///2vzp49S6ENAAWQlpYmSQoJCXFxJHC2rG2alpZGwQ3XKOxRco6yA0CBFOg+55MmTVKZMmWyTY+KitKECRMKHRQAFGecRu592KYAABRPBSq49+7dq8TExGzTExIStG/fvkIHBQAAAACApytQwR0VFaUffvgh2/Tvv/+e6w4BAAAAAFABr+Hu0aOHHnzwQYWFhalFixaSpLVr12ro0KHq0aOHUwMEAEgvrNxZZOsa3vbaIltXTvbs2aPExERt3bpVdevWVXJyslq3bq1jx46pZMmSLo0NAAAgPwpUcD/77LPau3evbrzxRvn5XVxEZmam+vbtyzXcyF1e7nLOzVcAj9K/f3/Nnz9fAwcO1K
xZsxzmPfDAA5o5c6b69eunefPm5XvZTZs21aFDhxQREeGkaJ1n3rx5GjZsGL+bDpDbASBHBTqlPCAgQO+9955++eUXLViwQIsWLdKuXbv0xhtvKCAgwNkxAgA8QHx8vBYuXKhz587Zp6Wmpurdd99VhQoVCrzcgIAARUdHc+MxAADgcQpUcGe59tprdeedd6pTp05KSEhwVkwAAA9Ur149VahQQYsWLbJPW7RokeLj43XdddfZpy1btkw33HCDSpYsqcjISHXq1Em7du264nKTk5Nls9kcjiLPnj1b8fHxCgkJ0e23365p06Y5nG4+duxY1a1bV2+99ZYqVqyoiIgI9ejRQ6dOncpzHHv27JHNZtOiRYvUunVrhYSEqE6dOtqwYYM9rgEDBujEiROy2Wyy2WwaO3ZsIUYQXmvNxKv/AQC8VoEK7oyMDM2ZM0e9evXSTTfdpDZt2jj8AQCKpwEDBmju3Ln2x2+88YbuvvtuhzZnzpzRQw89pM2bN+uLL76Qj4+Pbr/9dmVmZuZpHV9//bXuv/9+DR06VNu2bVPbtm01fvz4bO127dqlxYsXa8mSJVqyZInWrl2rSZMm5TuOJ554QiNHjtS2bdt07bXXqmfPnkpPT1fTpk314osvKjw8XIcOHdKhQ4c0cuTI/AwXAADwcgW6hnvo0KGaN2+eOnbsqJo1a3KaH5wvtz3+XAcGuKU+ffrosccesx8d/vrrr7Vw4UIlJyfb29xxxx0Oz5kzZ46ioqL0888/q2bNmrmu46WXXtLNN99sL26vvfZarV+/XkuWLHFol5mZqXnz5iksLMwe2xdffGEvzvMax8iRI9WxY0dJ0rhx41SjRg39/vvvSkpKUkREhGw2m6Kjo/M4QgAAoDgpUMG9cOFCvf/++7rlllucHQ8AwIOVKVNGHTt21Pz582WMUceOHVWmTBmHNrt27dLo0aO1ceNG/f333/Yjyvv27ctTwf3rr7/q9ttvd5jWsGHDbAV3xYoV7cW2JMXExOjIkSP5jqN27doOy5CkI0eOKCkpKddYAQBA8VaggjsgIEBVqlRxdiwAAC9w9913a/DgwZKkV155Jdv8zp07Kz4+XrNnz1ZsbKwyMzNVs2ZNXbhwIU/LN8ZkO7PKGJOtnb+/v8Njm83mcLp4XuO4dDlZ683r6e8AAKB4K1DBPWLECE2fPl0vv/wyp5PDNTjlHHBbHTp0sBet7du3d5h39OhR7dixQ6+99pqaN28uSfrqq6/ytfykpCRt2rTJYdqWLVvytQxnxCFd3AGdkZGR7+cBboebtwGAJQpUcH/11Vdas2aNPv/8c9WoUSPbUYRL71ALAChefH19tWPHDvv/lypVqpQiIyP1+uuvKyYmRvv27dOjjz6ar+UPGTJELVq00LRp09S5c2etXr1an3/+eb52ADsjDuniaeunT5/WF198oTp16igkJEQhISH5Xg4AAPBOBSq4S5Ysme36OQCAdYa3vdbVIeRLeHh4jtN9fHy0cOFCPfjgg6pZs6aqVq2qGTNmqFWrVnledrNmzTRr1iyNGzdOTz75pNq3b6/hw4fr5ZdfzvMynBGHJDVt2lT333+/unfvrqNHj2rMmDH8NBjgxl5YufOq8z3tsxaA+7OZnC58s9jJkycVERGhEydOXPFLGVzIG04r45RyeKDU1FTt3r1biYmJCgoKcnU4HuXee+/VL7/8onXr1rk6lBxdbdu6Q050hxi8WmEvg3LGZVTukNvdIDdTcAPIjbNzYoGOcEtSenq6kpOTtWvXLvXq1UthYWE6ePCgwsPDVaJEiUIHBgDAlUydOlVt27ZVaGioPv/8c82fP1+vvvqqq8OCJ+KeIG6DYhiANypQwb1371516NBB+/bt0/nz59W2bVuFhYVp8uTJSk1N1axZs5wdJwAAdps2bdLkyZN16tQpVapUSTNmzNC//vUvV4cFAADgoEAF99ChQ9WgQQN9//33ioyMtE+//fbb+cIDALDc+++/7+oQAOdxh9O9AQCWKPBdyr/++msFBAQ4TE9ISNCff/7plMBgEZI6AAAAABSJAhXcmZmZOf7u6IEDBxQWFlbooAAAADyCO+zIdocYAAA58inIk9q2basXX3zR/thms+n06dMaM2aMbrnlFmfFBgAAAACAxyrQEe4XXnhBrVu3VvXq1ZWamqpevXrpt99+U5kyZfTuu+86O0YAAAAAADxOgQru2NhYbdu2Te+++66+++47ZWZm6p577tFdd92l4OBgZ8cIuAY/FQMAgPPkeur7HUUSBgAUpQL/DndwcLDuvvtu3X333c6MBwAAAAAAr1CggvvNN9+86vy+ffsWKBiIo6rOwjjC2xTlTZHc+P1RsWJFDRs2TMOGDXN1KADczAsrd1q+jOFtry30OgAULwX+He5LpaWl6ezZswoICFBISAgFNwAUM/3799f8+fPtj0uXLq3rr79ekydPVu3atZ22ns2bNys0NNRpywMAALBSgQruY8eOZZv222+/6d///rcefvjhQgcFAPA8HTp00Ny5cyVJKSkpevLJJ9WpUyft27fPaesoW7as05YFwL003vf6Vee/sPK+IooEAJynQD8LlpNrrrlGkyZNynb0GwBQPAQGBio6OlrR0dGqW7euHnnkEe3fv19//fWXJOnPP/9U9+7dVapUKUVGRqpLly7as2eP/fn9+/fXbbfdpqlTpyomJkaRkZEaNGiQ0tLS7G0qVqzo8LOUv/zyi2644QYFBQWpevXqWrVqlWw2mxYvXixJ2rNnj2w2mxYtWqTWrVsrJCREderU0YYNG4piSAAAQDFX4Jum5cTX11cHDx505iIBaxTl9bBAMXT69GktWLBAVapUUWRkpM6ePavWrVurefPm+vLLL+Xn56dnn31WHTp00A8//KCAgABJ0po1axQTE6M1a9bo999/V/fu3VW3bl3de++92daRmZmp2267TRUqVNA333yjU6dOacSIETnG88QTT2jq1Km65ppr9MQTT6hnz576/fff5efn1DQIb0S+yJMNfxx1dQi5HiGXpI0VCneUnGu8AeRXgb5pfPLJJw6PjTE6dOiQXn75ZTVr1swpgQEAPMuSJUtUokQJSdKZM2cUExOjJUuWyMfHRwsXLpSPj4/+85//yGazSZLmzp2rkiVLKjk5We3atZMklSpVSi+//LJ8fX2VlJSkjh076osvvsix4F6xYoV27dql5ORkRUdHS5LGjx+vtm3bZms7cuRIdezYUZI0btw41ahRQ7///ruSkpIsGQsAAACpgAX3bbfd5vDYZrOpbNmyatOmjZ5//nlnxIWCYk+822AvOIqb1q1ba+bMmZKkf/75R6+++qpuvvlmbdq0Sd9++61+//13hYWFOTwnNTVVu3btsj+uUaOGfH197Y9jYmL0448/5ri+X3/9VfHx8fZiW5IaNmyYY9tLb9wWExMjSTpy5AgFNwAAsFSBCu7MzExnx4G8oqAGsmHnhnsIDQ1VlSpV7I/r16+viIgIzZ49W5mZmapfv74WLFiQ7XmX3gjN39/fYZ7NZrtizjHG2I+W5+bS5WY9h
1wGeJa8nDJe2GUU9pRzALgcF68BBeSM3/sEvJnNZpOPj4/OnTunevXq6b333lNUVJTCw8OdsvykpCTt27dPhw8fVrly5SRd/NkwAAAAd1Gggvuhhx7Kc9tp06YVZBUAAA9z/vx5paSkSLr485Evv/yyTp8+rc6dO6thw4aaMmWKunTpoqefflpxcXHat2+fFi1apIcfflhxcXH5Xl/btm1VuXJl9evXT5MnT9apU6f0xBNPSFKej3wDQFFyxs56ztoCPEuBCu6tW7fqu+++U3p6uqpWrSpJ2rlzp3x9fVWvXj17O77wAFeWl6Rb2KTqjFOtC7uMwn654IvF/9f6MVdHkKtly5bZr48OCwtTUlKS/vvf/6pVq1aSpC+//FKPPPKIunbtqlOnTql8+fK68cYbC3zE29fXV4sXL9a//vUvXX/99apUqZKmTJmizp07KygoyFndAgAAKLACFdydO3dWWFiY5s+fr1KlSkm6eDRjwIABat68+RV/lqXY4/prt5GXny9pUimyCCIBvMO8efM0b968q7aJjo7W/Pnzr7qMy136m9uSHH63W7p4WvlXX31lf/z1119Lkv1a8ooVK8oY4/CckiVLZpsGeLvc8l5uOc8dfvarKBSHa7yLYoc/gP8pUMH9/PPPa8WKFfZiW7r4Uy7PPvus2rVrR8ENACgSH330kUqUKKFrrrlGv//+u4YOHapmzZqpcuXKrg4NAACgYAX3yZMndfjwYdWoUcNh+pEjR3Tq1CmnBAa4Wq578ysUTRzuzhNuHsddzL3XqVOnNGrUKO3fv19lypTRTTfdxM9TAgVQXI5gW60ociI5DfAsBSq4b7/9dg0YMEDPP/+8GjduLEnauHGjHn74YXXt2tWpAXoUThkvMoU9NQ6Ad+jbt6/69u3r6jAAAAByVKCCe9asWRo5cqR69+6ttLS0iwvy89M999yjKVOmODVAoCCKy576wu5J94aj0wBwVewMhxvJy2+J53adeO7LmJqPiABYrUAFd0hIiF599VVNmTJFu3btkjFGVapUUWhoqLPjA4BiJzMz09UhwMnYpgCKldx2dHnAL28AzlKggjvLoUOHdOjQIbVo0ULBwcEyxvBTYABQQAEBAfLx8dHBgwdVtmxZBQQE8Jnq4YwxunDhgv766y/5+PgoICDA1SEBAIAiVKCC++jRo+rWrZvWrFkjm82m3377TZUqVdK//vUvlSxZ0ntvWMNpaUWmuJwSnhtOp3YPRfUTKj4+PkpMTNShQ4d08ODBQi8P7iMkJEQVKlSQj4+Pq0OBGyLnAYD3KlDBPXz4cPn7+2vfvn2qVq2afXr37t01fPhw7y24AcBiAQEBqlChgtLT05WRkeHqcOAEvr6+8vPz42wFAACKoQIV3CtWrNDy5csVFxfnMP2aa67R3r17nRIY4O5yu2lJbjc9gftwtzMJbDab/P395e/vb59WVEfZgSJV2DPHuA4UXigvN1az2oY5Iwv1fKf8WgzXgcNLFKjgPnPmjEJCQrJN//vvvxUYGFjooCyRl6TOGzdPOPUNAAAAAHJXoIK7RYsWevPNN/XMM89Iung0JjMzU1OmTFHr1q2dGiAAAECO3ODeKnnZCe2Uo31wC5zdlje5vS+a5KFccMYyAHdQoIJ7ypQpatWqlbZs2aILFy5o1KhR2r59u/755x99/fXXzo6x6LhB4rYaXwyKTlGcEkZiBwAAANyXzRhjCvLElJQUzZw5U99++60yMzNVr149DRo0SDExMbk+98SJEypZsqT279+v8PDwgqw+uy+5UVtebNrzj6tDQBHaHDfgqvOvPzC30MuAcwxqU+Wq819Z/XsRRXJlucUoFT7OvKzD6hhy44wYL3Xy5EnFx8fr+PHjioiIcOqy88qSvCzlmptzy0kNK5Yu1PPzsozCckYM5GY4kzPydl6+H7haYd9XeRmnwubm3J6/6c0nco2hYd/xubYpDgo71vnh7Lyc74I7LS1N7dq102uvvaZrry3YDXoOHDig+Pj4Aj0XAABvtH///mw3Iy0q5GUAABw5Ky/n+5Ryf39//fTTT4X6eZPY2Fjt379fYWFh/ExKPmXtcXH6UQjkC9vB9dgG7oHtUHjGGJ06dUqxsbEui6Eo8jKvlewYk5wxLjljXLJjTHLGuOQsr+Pi7LxcoGu4+/btqzlz5mjSpEkFWqmPj4/L9uJ7i/DwcN5AboDt4HpsA/fAdigcV51KnqUo8zKvlewYk5wxLjljXLJjTHLGuOQsL+PizLxcoIL7woUL+s9//qOVK1eqQYMGCg0NdZg/bdo0pwQHAAAAAICnylfB/ccff6hixYr66aefVK9ePUnSzp07HdpwijgAAAAAAPksuK+55hodOnRIa9askSR1795dM2bMULly5SwJDtkFBgZqzJgxCgwMdHUoxRrbwfXYBu6B7YC84rWSHWOSM8YlZ4xLdoxJzhiXnLlqXPJ1l3IfHx+lpKQoKipK0sXz37dt26ZKlSpZFiAAAAAAAJ7IpzBPLuBPeAMAAAAA4PXyVXDbbLZs12hzzTYAAAAAANnl6xpuY4z69+9vP+89NTVV999/f7a7lC9atMh5EQIAAAAA4IHyVXD369fP4XHv3r2dGgwAAAAAAN4iXzdNAwAAAAAAeVOom6bBOl9++aU6d+6s2NhY2Ww2LV682GG+MUZjx45VbGysgoOD1apVK23fvt01wXqp3LZB//797fc1yPpr3Lixa4L1UhMnTtT111+vsLAwRUVF6bbbbtOvv/7q0Ib3gvXysh14P0Aid10J+SQ7Pt9zxudtzmbOnKnatWsrPDxc4eHhatKkiT7//HP7/OL4WsltTIrj6yQnEydOlM1m07Bhw+zTivr1QsHtps6cOaM6dero5ZdfznH+5MmTNW3aNL388svavHmzoqOj1bZtW506daqII/VeuW0DSerQoYMOHTpk/1u6dGkRRuj91q5dq0GDBmnjxo1auXKl0tPT1a5dO505c8behveC9fKyHSTeDyB3XQn5JDs+33PG523O4uLiNGnSJG3ZskVbtmxRmzZt1KVLF3uRVBxfK7mNiVT8XieX27x5s15//XXVrl3bYXqRv14M3J4k89FHH9kfZ2ZmmujoaDNp0iT7tNTUVBMREWFmzZrlggi93+XbwBhj+vXrZ7p06eKSeIqrI0eOGElm7dq1xhjeC65y+XYwhvcDsiN35Yx8kjM+33PG5+2VlSpVyvznP//htXKJrDExhtfJqVOnzDXXXGNWrlxpWrZsaYYOHWqMcc1nC0e4PdDu3buVkpKidu3a2acFBgaqZcuWWr9+vQsjK36Sk5MVFRWla6+9Vvfee6+OHDni6pC82okTJyRJpUuXlsR7wVUu3w5ZeD/gani/Xl1xf//w+Z4zPm+zy8jI0MKFC3XmzBk1adKE14qyj0mW4vw6GTRokDp27KibbrrJYborXi/5uks53ENKSookqVy5cg7Ty5Urp71797oipGLp5ptv1p133qmEhATt3r1bo0ePVps2bfTtt9/afzoPzmOM0UMPPaQbbrhBNWvWlMR7wRVy2g4S7wfk
jvfrlRX39w+f7znj89bRjz/+qCZNmig1NVUlSpTQRx99pOrVq9uLpOL4WrnSmEjF93UiSQsXLtR3332nzZs3Z5vnis8WCm4PZrPZHB4bY7JNg3W6d+9u/79mzZpq0KCBEhIS9Nlnn6lr164ujMw7DR48WD/88IO++uqrbPN4LxSdK20H3g/IK96v2RX39w+f7znj89ZR1apVtW3bNh0/flwffvih+vXrp7Vr19rnF8fXypXGpHr16sX2dbJ//34NHTpUK1asUFBQ0BXbFeXrhVPKPVB0dLSk/+2hyXLkyJFse2tQdGJiYpSQkKDffvvN1aF4nSFDhuiTTz7RmjVrFBcXZ5/Oe6FoXWk75IT3Ay7H+zXvitP7h8/3nPF5m11AQICqVKmiBg0aaOLEiapTp46mT59erF8rVxqTnBSX18m3336rI0eOqH79+vLz85Ofn5/Wrl2rGTNmyM/Pz/6aKMrXCwW3B0pMTFR0dLRWrlxpn3bhwgWtXbtWTZs2dWFkxdvRo0e1f/9+xcTEuDoUr2GM0eDBg7Vo0SKtXr1aiYmJDvN5LxSN3LZDTng/4HK8X/OuOLx/+HzPGZ+3eWeM0fnz54vtayUnWWOSk+LyOrnxxhv1448/atu2bfa/Bg0a6K677tK2bdtUqVKlIn+9cEq5mzp9+rR+//13++Pdu3dr27ZtKl26tCpUqKBhw4ZpwoQJuuaaa3TNNddowoQJCgkJUa9evVwYtXe52jYoXbq0xo4dqzvuuEMxMTHas2ePHn/8cZUpU0a33367C6P2LoMGDdI777yjjz/+WGFhYfa9kREREQoODrb/riLvBWvlth1Onz7N+wGSyF1XQj7Jjs/3nPF5m7PHH39cN998s+Lj43Xq1CktXLhQycnJWrZsWbF9rVxtTIrr60SSwsLCHO55IEmhoaGKjIy0Ty/y14sl9z5Hoa1Zs8ZIyvbXr18/Y8zFW9qPGTPGREdHm8DAQNOiRQvz448/ujZoL3O1bXD27FnTrl07U7ZsWePv728qVKhg+vXrZ/bt2+fqsL1KTuMvycydO9fehveC9XLbDrwfkIXclTPySXZ8vueMz9uc3X333SYhIcEEBASYsmXLmhtvvNGsWLHCPr84vlauNibF9XVyJZf+LJgxRf96sRljjDWlPAAAAAAAxRfXcAMAAAAAYAEKbgAAAAAALEDBDQAAAACABSi4AQAAAACwAAU3AAAAAAAWoOAGAAAAAMACFNwAAAAAAFiAghsAAAAAAAtQcANwGZvNpsWLF1uy7IoVK+rFF1+0ZNkAAABAXlBwo9jq37+/bDZbtr/ff//dKcufN2+eSpYs6ZRleatDhw7p5ptvliTt2bNHNptN27Ztc21QAAAAgJP4uToAwJU6dOiguXPnOkwrW7asi6K5srS0NPn7+7s6DKeLjo52dQgAAACAZTjCjWItMDBQ0dHRDn++vr6SpE8//VT169dXUFCQKlWqpHHjxik9Pd3+3GnTpqlWrVoKDQ1VfHy8HnjgAZ0+fVqSlJycrAEDBujEiRP2I+djx46VlPNp1CVLltS8efMk/e9I7/vvv69WrVopKChIb7/9tiRp7ty5qlatmoKCgpSUlKRXX331qv1r1aqVhgwZomHDhqlUqVIqV66cXn/9dZ05c0YDBgxQWFiYKleurM8//9z+nIyMDN1zzz1KTExUcHCwqlatqunTpzssNz09XQ8++KBKliypyMhIPfLII+rXr59uu+02h3U/+OCDGjVqlEqXLq3o6Gj7GGS5dCwSExMlSdddd51sNptatWplX86wYcMcnnfbbbepf//+9sdHjhxR586dFRwcrMTERC1YsCDbWJw4cUL33XefoqKiFB4erjZt2uj777+/6vgBAAAAhUHBDeRg+fLl6t27tx588EH9/PPPeu211zRv3jyNHz/e3sbHx0czZszQTz/9pPnz52v16tUaNWqUJKlp06Z68cUXFR4erkOHDunQoUMaOXJkvmJ45JFH9OCDD2rHjh1q3769Zs+erSeeeELjx4/Xjh07NGHCBI0ePVrz58+/6nLmz5+vMmXKaNOmTRoyZIj+/e9/684771TTpk313XffqX379urTp4/Onj0rScrMzFRcXJzef/99/fzzz3rqqaf0+OOP6/3337cv87nnntOCBQs0d+5cff311zp58mSO12LPnz9foaGh+uabbzR58mQ9/fTTWrlyZY5xbtq0SZK0atUqHTp0SIsWLcrzWPXv31979uzR6tWr9cEHH+jVV1/VkSNH7PONMerYsaNSUlK0dOlSffvtt6pXr55uvPFG/fPPP3leDwAAAJAvBiim+vXrZ3x9fU1oaKj97//+7/+MMcY0b97cTJgwwaH9W2+9ZWJiYq64vPfff99ERkbaH8+dO9dERERkayfJfPTRRw7TIiIizNy5c40xxuzevdtIMi+++KJDm/j4ePPOO+84THvmmWdMkyZNrhhTy5YtzQ033GB/nJ6ebkJDQ02fPn3s0w4dOmQkmQ0bNlxxOQ888IC544477I/LlStnpkyZ4rDcChUqmC5dulxx3cYYc/3115tHHnnE/vjSscjq99atW7P1YejQoQ7TunTpYvr162eMMebXX381kszGjRvt83fs2GEkmRdeeMEYY8wXX3xhwsPDTWpqqsNyKleubF577bUr9hsAAAAoDK7hRrHWunVrzZw50/44NDRUkvTtt99q8+bNDke0MzIylJqaqrNnzyokJERr1qzRhAkT9PPPP+vkyZNKT09Xamqqzpw5Y19OYTRo0MD+/19//aX9+/frnnvu0b333mufnp6eroiIiKsup3bt2vb/fX19FRkZqVq1atmnlStXTpIcjgjPmjVL//nPf7R3716dO3dOFy5cUN26dSVdPDX78OHDatiwocNy69evr8zMzCuuW5JiYmIc1uMMO3bskJ+fn8N4JSUlOdyw7ttvv9Xp06cVGRnp8Nxz585p165dTo0HAAAAyELBjWItNDRUVapUyTY9MzNT48aNU9euXbPNCwoK0t69e3XLLbfo/vvv1zPPPKPSpUvrq6++0j333KO0tLSrrtNms8kY4zAtp+dcWrRnFbKzZ89Wo0aNHNplXXN+JZffbM1mszlMs9lsDut4//33NXz4cD3//PNq0qSJwsLCNGXKFH3zzTfZlnOpy/t0pXVfXpTnxsfH56rjlTXv8ngulZmZqZiYGCUnJ2ebx53kAQAAYBUKbiAH9erV06+//ppjMS5JW7ZsUXp6up5//nn5+Fy8FcKl1zhLUkBAgDIyMrI9t2zZsjp06JD98W+//Wa/fvpKypUrp/Lly+uPP/7QXXfdld/u5Mu6devUtGlTPfDAA/Zplx4FjoiIULly5bRp0yY1b95c0sWj/1u3brUfBS+IgIAA+7Iudfl4ZWRk6KefflLr1q0lSdWqVVN6erq2bNliP+r+66+/6vjx4/bn1KtXTykpKfLz81PFihULHCMAAACQHxTcQA6eeuopderUSfHx8brzzjvl4+OjH374QT/++KOeffZZVa5cWenp6XrppZfUuXNnff3115o1a5b
DMipWrKjTp0/riy++UJ06dRQSEqKQkBC1adNGL7/8sho3bqzMzEw98sgjefrJr7Fjx+rBBx9UeHi4br75Zp0/f15btmzRsWPH9NBDDzmt71WqVNGbb76p5cuXKzExUW+99ZY2b95sv4u4JA0ZMkQTJ05UlSpVlJSUpJdeeknHjh276lHm3ERFRSk4OFjLli1TXFycgoKCFBERoTZt2uihhx7SZ599psqVK+uFF15wKKarVq2qDh066N5779Xrr78uPz8/DRs2TMHBwfY2N910k5o0aaLbbrtNzz33nKpWraqDBw9q6dKluu222xxORwcAAACchbuUAzlo3769lixZopUrV+r6669X48aNNW3aNCUkJEiS6tatq2nTpum5555TzZo1tWDBAk2cONFhGU2bNtX999+v7t27q2zZspo8ebIk6fnnn1d8fLxatGihXr16aeTIkQoJCck1pn/961/6z3/+o3nz5qlWrVpq2bKl5s2b51AIO8P999+vrl27qnv37mrUqJGOHj3qcLRbungH9Z49e6pv375q0qSJSpQoofbt2ysoKKjA6/Xz89OMGTP02muvKTY2Vl26dJEk3X333erXr5/69u2rli1bKjEx0X50O8vcuXMVHx+vli1bqmvXrvaf/8pis9m0dOlStWjRQnfffbeuvfZa9ejRQ3v27LFfww4AAAA4m83kdOElAORDZmamqlWrpm7duumZZ55xdTgAAACAW+CUcgD5tnfvXq1YsUItW7bU+fPn9fLLL2v37t3q1auXq0MDAAAA3AanlAPINx8fH82bN0/XX3+9mjVrph9//FGrVq1StWrVXB0aAAAA4DY4pRwAAAAAAAtwhBsAAAAAAAtQcAMAAAAAYAEKbgAAAAAALEDBDQAAAACABSi4AQAAAACwAAU3AAAAAAAWoOAGAAAAAMACFNwAAAAAAFiAghsAAAAAAAtQcAMAAAAAYAEKbgAAAAAALEDBDQAAAACABSi4AQAAAACwAAU3AAAAAAAWoOAGkG979uyRzWbTvHnzXLL+CRMmaPHixS5ZNwAAAJBXNmOMcXUQADzL+fPntXXrVlWuXFlly5Yt8vWXKFFC//d//+eygh8AAADICz9XBwDAc2RkZCg9PV2BgYFq3Lixq8Nxqkv7BgCAtzp79qxCQkJcHQZQbHBKOYqlsWPHymaz6YcfftCdd96piIgIlS5dWg899JDS09P166+/qkOHDgoLC1PFihU1efLkbMs4efKkRo4cqcTERAUEBKh8+fIaNmyYzpw549DulVdeUYsWLRQVFaXQ0FDVqlVLkydPVlpamkO7Vq1aqWbNmtq8ebOaN2+ukJAQVapUSZMmTVJmZmaufbLZbBo8eLBee+01XXvttQoMDFT16tW1cOHCbG1TUlI0cOBAxcXFKSAgQImJiRo3bpzS09PtbbJOG588ebKeffZZJSYmKjAwUGvWrMnxlPKiGlObzaYzZ85o/vz5stlsstlsatWqldP6BgDwLt6Y8/O7ni+//FJNmzZVSEiI7r77bkv6BCBnHOFGsdatWzf17t1bAwcO1MqVK+0JZNWqVXrggQc0cuRIvfPOO3rkkUdUpUoVde3aVdLFvcMtW7bUgQMH9Pjjj6t27dravn27nnrqKf34449atWqVbDabJGnXrl3q1auXPaF9//33Gj9+vH755Re98cYbDvGkpKTorrvu0ogRIzRmzBh99NFHeuyxxxQbG6u+ffvm2p9PPvlEa9as0dNPP63Q0FC9+uqr6tmzp/z8/PR///d/9nU0bNhQPj4+euqpp1S5cmVt2LBBzz77rPbs2aO5c+c6LHPGjBm69tprNXXqVIWHh+uaa65x6Zhu2LBBbdq0UevWrTV69GhJUnh4eJH0DQDgubwp5+dnPYcOHVLv3r01atQoTZgwQT4+Ppb1CUAODFAMjRkzxkgyzz//vMP0unXrGklm0aJF9mlpaWmmbNmypmvXrvZpEydOND4+Pmbz5s0Oz//ggw+MJLN06dIc15uRkWHS0tLMm2++aXx9fc0///xjn9eyZUsjyXzzzTcOz6levbpp3759rn2SZIKDg01KSop9Wnp6uklKSjJVqlSxTxs4cKApUaKE2bt3r8Pzp06daiSZ7du3G2OM2b17t5FkKleubC5cuODQNmve3Llz7dOKckxDQ0NNv379so2BM/oGAPAu3pjz87ueL774wuE5VvQJQM44pRzFWqdOnRweV6tWTTabTTfffLN9mp+fn6pUqaK9e/fapy1ZskQ1a9ZU3bp1lZ6ebv9r3769bDabkpOT7W23bt2qW2+9VZGRkfL19ZW/v7/69u2rjIwM7dy502H90dHRatiwocO02rVrO6z7am688UaVK1fO/tjX11fdu3fX77//rgMHDthjb926tWJjYx1iz+rz2rVrHZZ56623yt/fP0/rl4pmTK/E6r4BADyXN+X8/KynVKlSatOmjcM0q/oEIDtOKUexVrp0aYfHAQEBCgkJUVBQULbpJ0+etD8+fPiwfv/99ysWa3///bckad++fWrevLmqVq2q6dOnq2LFigoKCtKmTZs0aNAgnTt3zuF5kZGR2ZYVGBiYrd2VREdHX3Ha0aNHFRcXp8OHD+vTTz/NNfYsMTExeVp3FqvH9Gqs7hsAwHN5S87P73pyynVW9QlAdhTcQAGUKVNGwcHBV7x2qUyZMpKkxYsX68yZM1q0aJESEhLs87dt22ZJXCkpKVeclpXYy5Qpo9q1a2v8+PE5LiM2NtbhcdY1XFbL65jmtgx37BsAwHO5W87P73pyynXu1ifAm1FwAwXQqVMnTZgwQZGRkUpMTLxiu6wkd+lPTRljNHv2bEvi+uKLL3T48GH7aeUZGRl67733VLlyZcXFxdljX7p0qSpXrqxSpUpZEkdB5HVMpSsfAXDXvgEAPJe75XxnrMfd+gR4MwpuoACGDRumDz/8UC1atNDw4cNVu3ZtZWZmat++fVqxYoVGjBihRo0aqW3btgoICFDPnj01atQopaamaubMmTp27JglcZUpU0Zt2rTR6NGj7Xcp/+WXXxx+Guzpp5/WypUr1bRpUz344IOqWrWqUlNTtWfPHi1dulSzZs2yF+dFKa9jKkm1atVScnKyPv30U8XExCgsLExVq1Z1274BADyXu+V8Z6zH3foEeDMKbqAAQkNDtW7dOk2aNEmvv/66du/ereDgYFWoUEE33XSTKlasKElKSkrShx9+qCeffFJdu3ZVZGSkevXqpYceesjhJi3Ocuutt6pGjRp68skntW/fPlWuXFkLFixQ9+7d7W1iYmK0ZcsWPfPMM5oyZYoOHDigsLAwJSYmqkOHDi47MpzXMZWk6dOna9CgQerRo4f9p02Sk5Pdtm8AAM/lbjnfGetxtz4B3sxmjDGuDgJA4dlsNg0aNEgvv/yyq0MBAAAAIImfBQMAAAAAwAIU3AAAAAAAWIBruAEvwdUhAAAAgHvhCDcAAAAAABag4AYAAAAAwAIuOaU8MzNTBw8eVFhYmGw2mytCAADALRhjdOrUKcXGxsrHxzX7wcnLAABc5Oy87J
KC++DBg4qPj3fFqgEAcEv79+9XXFycS9ZNXgYAwJGz8rJLCu6wsDBJFzsRHh7uihAAAHALJ0+eVHx8vD03ugJ5GQCAi5ydl11ScGedrhYeHk5iBwBAcump3ORlAAAcOSsv87NgyG7NxKvPb/1Y0cQBAIDVyHkAAAtRcAMAAFwJBTkAoBD4WTAAAAAAACxAwQ0AAAAAgAU4pRz5V9jT63J7fl6WAQAAAABujoIbAACgoNiJDAC4Ck4pBwAAAADAAhTcAAAAAABYgIIbAAAAAAALUHADAAAAAGABbpoGz1TYO6UDAAAAgMUouOF8ebljKwAAAAB4OU4pBwAAAADAAhTcAAAAAABYgIIbAAAAAAALcA13ccP11QAAAABQJDjCDQAAAACABSi4AQAAAACwAAU3AAAAAAAW4BpuuCeuNQcAAADg4TjCDQAAAACABTjCDQAAYKXcztpq/VjRxAEAKHIc4QYAAAAAwAIc4QYAAN6Le4IAAFyIghsAAHgmbymmOeUcALwWp5QDAAAAAGABCm4AAAAAACxAwQ0AAAAAgAW4htvTcJ0XAAAAAHgEjnADAAAAAGABjnAXpbzcTZUj1AAAAADgFSi4UTwVwc6PF1buvOr84W2vLdTyAQAAALg3Cm534y2/KQoAAAAAxRzXcAMAAAAAYAGOcHsbjpA7D3eELxKceg/gishpF5GPAMBjUXADAAB4Mm7KCgBui4IbcGMc/c0bxgkAAADuiGu4AQAAAACwAEe4USxt+ONorm2aVIosgkisxZFfAAAAwHUouIEryK0ob9K6iAJxcxT1AOD++KwGANeg4IZX2jBnpKtD8Ai5fQFzB54QIwAURq47eL3gjCsAKK4ouAEAAIq5ojgCzlF2AMURBTfgIhy5BQA4Q17uS6IKhVtHcSnI3SE3s+MB8C4U3PmR2+9c8huXRSZPXy4s5g5J2R1iyI0nxAjARfLy+9Fwi5znCcg3ANwRBTcAAICFKJidg4IagCei4IZb4ssJnMkdTlME4J3IVwCAq/FxdQAAAAAAAHgjjnCjyHE0AO7GGacpFvYoOUfhAXgyTvcGgJwVn4I7LzdmKexNz7j5Cy7ReN/rhV7Gxgr3OSESAHAR8iKQb4XdeZHbDtq8LN/qnbzuEANQVIpPwY084wg0kH/F4Qh1cfmCVBy2JfLOW3JiYXcCswMYAAqGghsAAHit3ArmJpUiiygSz5ZbwU5BjqJW2J2jxWUnsrfw5J3hLim4jTGSpJMnTzpvoV8+X/hlLBlT+GW4uU17/nF1CF6j1q8vXXX+mSJYx+a4AVedf/2BuYV6Ppwnt8+71DOnrzp/4uLvCh3DoDZVCvX83GKUcu/nK6t/L1QMeelDbuvIbRm59dOpueuS5WXlRlewJC9L0plU5y6vICGcO3/V+au2HyyiSLxbbvlKImcVFWfki9yW4Q75pLCf1c6IITeFzUdFIS952R3iLMrc7Oy8bDMuyPAHDhxQfHx8Ua8WAAC3tX//fsXFxblk3eRlAAAcOSsvu6TgzszM1MGDBxUWFiabzZZjm5MnTyo+Pl779+9XeHh4EUdYNOijd6CP3oE+egdP7KMxRqdOnVJsbKx8fFzza515ycvIG098DXoKxtZajK91GFtrOXt8nZ2XXXJKuY+PT573FoSHh3v9C5M+egf66B3oo3fwtD5GRES4dP35ycvIG097DXoSxtZajK91GFtrOXN8nZmXXbMrHQAAAAAAL0fBDQAAAACABdy24A4MDNSYMWMUGBjo6lAsQx+9A330DvTROxSHPsK98Rq0DmNrLcbXOoyttdx9fF1y0zQAAAAAALyd2x7hBgAAAADAk1FwAwAAAABgAQpuAAAAAAAsQMENAAAAAIAFKLgBAAAAALCAWxXcEydOlM1m07Bhw+zTjDEaO3asYmNjFRwcrFatWmn79u2uC7IA/vzzT/Xu3VuRkZEKCQlR3bp19e2339rne3of09PT9eSTTyoxMVHBwcGqVKmSnn76aWVmZtrbeGIfv/zyS3Xu3FmxsbGy2WxavHixw/y89On8+fMaMmSIypQpo9DQUN166606cOBAEfbiyq7Wv7S0ND3yyCOqVauWQkNDFRsbq759++rgwYMOy3Dn/km5b8NLDRw4UDabTS+++KLDdG/o444dO3TrrbcqIiJCYWFhaty4sfbt22ef7+l9PH36tAYPHqy4uDgFBwerWrVqmjlzpkMbd+8j3NvYsWNls9kc/qKjo+3zPT0fFKWiyq3Hjh1Tnz59FBERoYiICPXp00fHjx+3uHeul9v49u/fP9truXHjxg5tGN/sJk6cqOuvv15hYWGKiorSbbfdpl9//dWhDa/dgsvL+Hrya9dtCu7Nmzfr9ddfV+3atR2mT548WdOmTdPLL7+szZs3Kzo6Wm3bttWpU6dcFGn+HDt2TM2aNZO/v78+//xz/fzzz3r++edVsmRJextP7+Nzzz2nWbNm6eWXX9aOHTs0efJkTZkyRS+99JK9jSf28cyZM6pTp45efvnlHOfnpU/Dhg3TRx99pIULF+qrr77S6dOn1alTJ2VkZBRVN67oav07e/asvvvuO40ePVrfffedFi1apJ07d+rWW291aOfO/ZNy34ZZFi9erG+++UaxsbHZ5nl6H3ft2qUbbrhBSUlJSk5O1vfff6/Ro0crKCjI3sbT+zh8+HAtW7ZMb7/9tnbs2KHhw4dryJAh+vjjj+1t3L2PcH81atTQoUOH7H8//vijfZ6n54OiVFS5tVevXtq2bZuWLVumZcuWadu2berTp4/l/XO1vOS9Dh06OLyWly5d6jCf8c1u7dq1GjRokDZu3KiVK1cqPT1d7dq105kzZ+xteO0WXF7GV/Lg165xA6dOnTLXXHONWblypWnZsqUZOnSoMcaYzMxMEx0dbSZNmmRvm5qaaiIiIsysWbNcFG3+PPLII+aGG2644nxv6GPHjh3N3Xff7TCta9eupnfv3sYY7+ijJPPRRx/ZH+elT8ePHzf+/v5m4cKF9jZ//vmn8fHxMcuWLSuy2PPi8v7lZNOmTUaS2bt3rzHGs/pnzJX7eODAAVO+fHnz008/mYSEBPPCCy/Y53lDH7t3725/L+bEG/pYo0YN8/TTTztMq1evnnnyySeNMZ7XR7ifMWPGmDp16uQ4z9vyQVGyKrf+/PPPRpLZuHGjvc2GDRuMJPPLL79Y3Cv3kdPnZb9+/UyXLl2u+BzGN2+OHDliJJm1a9caY3jtOtvl42uMZ7923eII96BBg9SxY0fddNNNDtN3796tlJQUtWvXzj4tMDBQLVu21Pr164s6zAL55JNP1KBBA915552KiorSddddp9mzZ9vne0Mfb7jhBn3xxRfauXOnJOn777/XV199pVtuuUWSd/Txcnnp07fffqu0tDSHNrGxsapZs6ZH9vvEiROy2Wz2szO8oX+ZmZnq06ePHn74YdWoUSPbfE/vY2Zmpj777DNde+21at++vaKiotSoUSOHUww9vY/Sxc+gTz75RH/++aeMMVqzZo127typ9u3bS/KOPsL1fvvtN8XGxioxMVE9evTQH
3/8Ial45gOrOGssN2zYoIiICDVq1MjepnHjxoqIiGC8JSUnJysqKkrXXnut7r33Xh05csQ+j/HNmxMnTkiSSpcuLYnXrrNdPr5ZPPW16/KCe+HChfruu+80ceLEbPNSUlIkSeXKlXOYXq5cOfs8d/fHH39o5syZuuaaa7R8+XLdf//9evDBB/Xmm29K8o4+PvLII+rZs6eSkpLk7++v6667TsOGDVPPnj0leUcfL5eXPqWkpCggIEClSpW6YhtPkZqaqkcffVS9evVSeHi4JO/o33PPPSc/Pz89+OCDOc739D4eOXJEp0+f1qRJk9ShQwetWLFCt99+u7p27aq1a9dK8vw+StKMGTNUvXp1xcXFKSAgQB06dNCrr76qG264QZJ39BGu1ahRI7355ptavny5Zs+erZSUFDVt2lRHjx4tdvnASs4ay5SUFEVFRWVbflRUVLEf75tvvlkLFizQ6tWr9fzzz2vz5s1q06aNzp8/L4nxzQtjjB566CHdcMMNqlmzpiReu86U0/hKnv3a9bNsyXmwf/9+DR06VCtWrHC4nvByNpvN4bExJts0d5WZmakGDRpowoQJkqTrrrtO27dv18yZM9W3b197O0/u43vvvae3335b77zzjmrUqKFt27Zp2LBhio2NVb9+/eztPLmPV1KQPnlav9PS0tSjRw9lZmbq1VdfzbW9p/Tv22+/1fTp0/Xdd9/lO15P6WPWjQu7dOmi4cOHS5Lq1q2r9evXa9asWWrZsuUVn+spfZQuFtwbN27UJ598ooSEBH355Zd64IEHFBMTk+3MqUt5Uh/hWjfffLP9/1q1aqlJkyaqXLmy5s+fb79pT3HIB0XFGWOZU3vGW+revbv9/5o1a6pBgwZKSEjQZ599pq5du17xeYzv/wwePFg//PCDvvrqq2zzeO0W3pXG15Nfuy49wv3tt9/qyJEjql+/vvz8/OTn56e1a9dqxowZ8vPzs+8lunyPw5EjR7LtQXJXMTExql69usO0atWq2e8QnHWXU0/u48MPP6xHH31UPXr0UK1atdSnTx8NHz7cftaCN/TxcnnpU3R0tC5cuKBjx45dsY27S0tLU7du3bR7926tXLnSfnRb8vz+rVu3TkeOHFGFChXsnz979+7ViBEjVLFiRUme38cyZcrIz88v188gT+7juXPn9Pjjj2vatGnq3LmzateurcGDB6t79+6aOnWqJM/vI9xPaGioatWqpd9++63Y5IOi4KyxjI6O1uHDh7Mt/6+//mK8LxMTE6OEhAT99ttvkhjf3AwZMkSffPKJ1qxZo7i4OPt0XrvOcaXxzYknvXZdWnDfeOON+vHHH7Vt2zb7X4MGDXTXXXdp27ZtqlSpkqKjo7Vy5Ur7cy5cuKC1a9eqadOmLow875o1a5bttvY7d+5UQkKCJCkxMdHj+3j27Fn5+Di+lHx9fe1H17yhj5fLS5/q168vf39/hzaHDh3STz/95BH9ziq2f/vtN61atUqRkZEO8z29f3369NEPP/zg8PkTGxurhx9+WMuXL5fk+X0MCAjQ9ddff9XPIE/vY1pamtLS0q76GeTpfYT7OX/+vHbs2KGYmJhikQ+KirPGskmTJjpx4oQ2bdpkb/PNN9/oxIkTjPdljh49qv379ysmJkYS43slxhgNHjxYixYt0urVq5WYmOgwn9du4eQ2vjnxqNeuZbdjK6BL71JujDGTJk0yERERZtGiRebHH380PXv2NDExMebkyZOuCzIfNm3aZPz8/Mz48ePNb7/9ZhYsWGBCQkLM22+/bW/j6X3s16+fKV++vFmyZInZvXu3WbRokSlTpowZNWqUvY0n9vHUqVNm69atZuvWrUaSmTZtmtm6dav9Lt156dP9999v4uLizKpVq8x3331n2rRpY+rUqWPS09Nd1S27q/UvLS3N3HrrrSYuLs5s27bNHDp0yP53/vx5+zLcuX/G5L4NL3f5XcqN8fw+Llq0yPj7+5vXX3/d/Pbbb+all14yvr6+Zt26dfZleHofW7ZsaWrUqGHWrFlj/vjjDzN37lwTFBRkXn31Vfsy3L2PcG8jRowwycnJ5o8//jAbN240nTp1MmFhYWbPnj3GGM/PB0WpqHJrhw4dTO3atc2GDRvMhg0bTK1atUynTp2KvL9F7Wrje+rUKTNixAizfv16s3v3brNmzRrTpEkTU758ecY3F//+979NRESESU5OdvhOdPbsWXsbXrsFl9v4evpr1+0L7szMTDNmzBgTHR1tAgMDTYsWLcyPP/7ougAL4NNPPzU1a9Y0gYGBJikpybz++usO8z29jydPnjRDhw41FSpUMEFBQaZSpUrmiSeecCjMPLGPa9asMZKy/fXr188Yk7c+nTt3zgwePNiULl3aBAcHm06dOpl9+/a5oDfZXa1/u3fvznGeJLNmzRr7Mty5f8bkvg0vl1PB7Q19nDNnjqlSpYoJCgoyderUMYsXL3ZYhqf38dChQ6Z///4mNjbWBAUFmapVq5rnn3/eZGZm2pfh7n2Ee+vevbuJiYkx/v7+JjY21nTt2tVs377dPt/T80FRKqrcevToUXPXXXeZsLAwExYWZu666y5z7NixIuql61xtfM+ePWvatWtnypYta/z9/U2FChVMv379so0d45vdlb4TzZ07196G127B5Ta+nv7atf3/TgIAAAAAACdy+c+CAQAAAADgjSi4AQAAAACwAAU3AAAAAAAWoOAGAAAAAMACFNwAAAAAAFiAghsAAAAAAAtQcAMAAAAAYAEKbgAAAAAALEDBDQAAAACABSi4AQAAAACwAAU3AAAAAAAWoOAGAAAAAMACFNwAAAAAAFiAghsAAAAAAAtQcAMAAAAAYAEKbgAAAAAALEDBDQAAAACABSi4AQAAAACwAAU3AAAAAAAWoOAGAAAAAMACFNwAAAAAAFiAghsAAAAAAAtQcAMAAAAAYAEKbgAAAAAALEDBDQAAAACABSi4AQAAAACwAAU3AAAAAAAWoOAGAAAAAMACFNwAAAAAAFiAghsAAAAAAAtQcAMAAAAAYAEKbgAAAAAALEDBDQAAAACABSi4AQAAAACwAAU3AAAAAAAWoOAGAAAAAMACFNwALPHOO+/oxRdfzDZ9z549stlsmjp1atEHBQAA8m39+vUaO3asjh8/7upQAI9DwQ3AElcquAEAgGdZv369xo0bR8ENFAAFNwAAAAAAFqDghtcZO3asbDabfvjhB915552KiIhQ6dKl9dBDDyk9PV2//vqrOnTooLCwMFWsWFGTJ0/OtoyTJ09q5MiRSkxMVEBAgMqXL69hw4bpzJkzDu1eeeUVtWjRQlFRUQoNDVWtWrU0efJkpaWlObRr1aqVatasqc2bN6t58+YKCQlRpUqVNGnSJGVmZubap//+979q1KiRIiIi7M+9++677fOTk5Nls9n0zjvv6JFHHlFMTIxKlCihzp076/Dhwzp16pTuu+8+lSlTRmXKlNGAAQN0+vRph3Wkpqbqsccec+jzoEGDsu3NzszM1OTJk5WUlKTAwEBFRUWpb9++OnDggEN/P/vsM+3du1c2
m83+d7lp06YpMTFRJUqUUJMmTbRx40aH+f3791eJEiX0+++/65ZbblGJEiUUHx+vESNG6Pz58w5tL1y4oGeffdYeV9myZTVgwAD99ddfDu1Wr16tVq1aKTIyUsHBwapQoYLuuOMOnT171t5m5syZqlOnjkqUKKGwsDAlJSXp8ccfz3U7AUBx5405ODMzUy+99JLq1q2r4OBglSxZUo0bN9Ynn3zi0Ca33HhpLBs2bFDTpk0VHBysihUrau7cuZKkzz77TPXq1VNISIhq1aqlZcuW5Ti+W7duVdeuXRUeHq6IiAj17t07W75777331K5dO8XExCg4OFjVqlXTo48+mm0cJembb75R586dFRkZqaCgIFWuXFnDhg2zr/Phhx+WJCUmJtpzenJysiSpYsWK6tSpk5YtW6Z69eopODhYSUlJeuONN7KtJyUlRQMHDlRcXJwCAgKUmJiocePGKT093aFdbnn47Nmz9tdIUFCQSpcurQYNGujdd9+92qYEXMMAXmbMmDFGkqlatap55plnzMqVK82oUaOMJDN48GCTlJRkZsyYYVauXGkGDBhgJJkPP/zQ/vwzZ86YunXrmjJlyphp06aZVatWmenTp5uIiAjTpk0bk5mZaW87fPhwM3PmTLNs2TKzevVq88ILL5gyZcqYAQMGOMTUsmVLExkZaa655hoza9Yss3LlSvPAAw8YSWb+/PlX7c/69euNzWYzPXr0MEuXLjWrV682c+fONX369LG3WbNmjZFkEhISTP/+/c2yZcvMrFmzTIkSJUzr1q1N27ZtzciRI82KFSvMc889Z3x9fc2QIUPsz8/MzDTt27c3fn5+ZvTo0WbFihVm6tSpJjQ01Fx33XUmNTXV3va+++6zj2XWesqWLWvi4+PNX3/9ZYwxZvv27aZZs2YmOjrabNiwwf5njDG7d+82kkzFihVNhw4dzOLFi83ixYtNrVq1TKlSpczx48ft6+rXr58JCAgw1apVM1OnTjWrVq0yTz31lLHZbGbcuHH2dhkZGaZDhw4mNDTUjBs3zqxcudL85z//MeXLlzfVq1c3Z8+eta87KCjItG3b1ixevNgkJyebBQsWmD59+phjx44ZY4x59913jSQzZMgQs2LFCrNq1Soza9Ys8+CDD179hQcA8LocbIwxffr0MTabzfzrX/8yH3/8sfn888/N+PHjzfTp0+1t8pIbL42latWqZs6cOWb58uWmU6dORpIZN26cqVWrlnn33XfN0qVLTePGjU1gYKD5888/s41vQkKCefjhh83y5cvNtGnT7Pn6woUL9rbPPPOMeeGFF8xnn31mkpOTzaxZs0xiYqJp3bq1Q/+WLVtm/P39Te3atc28efPM6tWrzRtvvGF69OhhjDFm//79ZsiQIUaSWbRokT2nnzhxwhhjTEJCgomLizPVq1c3b775plm+fLm58847jSSzdu1a+3oOHTpk4uPjTUJCgnnttdfMqlWrzDPPPGMCAwNN//797e3ykocHDhxoQkJCzLRp08yaNWvMkiVLzKRJk8xLL72U6/YEihoFN7xOVjJ6/vnnHabXrVvXniyypKWlmbJly5quXbvap02cONH4+PiYzZs3Ozz/gw8+MJLM0qVLc1xvRkaGSUtLM2+++abx9fU1//zzj31ey5YtjSTzzTffODynevXqpn379lftz9SpU40kh0L0clkFd+fOnR2mDxs2zEjKVizedtttpnTp0vbHy5YtM5LM5MmTHdq99957RpJ5/fXXjTHG7Nixw0gyDzzwgEO7b775xkgyjz/+uH1ax44dTUJCQrZYswruWrVqmfT0dPv0TZs2GUnm3XfftU/r16+fkWTef/99h2XccsstpmrVqvbHWcn50i9txhizefNmI8m8+uqrxpj/bcNt27ZliyvL4MGDTcmSJa84HwBwZd6Wg7/88ksjyTzxxBNXbJOf3JgVy5YtW+zTjh49anx9fU1wcLBDcb1t2zYjycyYMcM+LWt8hw8f7rCuBQsWGEnm7bffzjHGzMxMk5aWZtauXWskme+//94+r3LlyqZy5crm3LlzV+zjlClTjCSze/fubPMSEhJMUFCQ2bt3r33auXPnTOnSpc3AgQPt0wYOHGhKlCjh0M6Y/33P2b59uzEmb3m4Zs2a5rbbbrtqG8BdcEo5vFanTp0cHlerVk02m00333yzfZqfn5+qVKmivXv32qctWbJENWvWVN26dZWenm7/a9++vcMpVJK0detW3XrrrYqMjJSvr6/8/f3Vt29fZWRkaOfOnQ7rj46OVsOGDR2m1a5d22HdObn++uslSd26ddP777+vP//8M199lqSOHTtmm/7PP//YTytfvXq1pIuncF/qzjvvVGhoqL744gtJ0po1a3Js17BhQ1WrVs3eLi86duwoX19f++PatWtLUrbxsNls6ty5s8O0y8dtyZIlKlmypDp37uywzerWravo6Gj7Nqtbt64CAgJ03333af78+frjjz+yxdWwYUMdP35cPXv21Mcff6y///47z30CAFzkLTn4888/lyQNGjToim3ymxtjYmJUv359++PSpUsrKipKdevWVWxsrH16Vg7PKca77rrL4XG3bt3k5+dnj0WS/vjjD/Xq1UvR0dH28WnZsqUkaceOHZKknTt3ateuXbrnnnsUFBR0xT7mpm7duqpQoYL9cVBQkK699tps27Z169aKjY112LZZr4m1a9dKylsebtiwoT7//HM9+uijSk5O1rlz5wocO2A1Cm54rdKlSzs8DggIUEhISLaEEhAQoNTUVPvjw4cP64cffpC/v7/DX1hYmIwx9g/+ffv2qXnz5vrzzz81ffp0rVu3Tps3b9Yrr7wiSdk+/CMjI7PFGBgYmGuSaNGihRYvXqz09HT17dtXcXFxqlmzZo7XKeXU56tNz+r30aNH5efnp7Jlyzq0s9lsio6O1tGjR+3tpItfFi4XGxtrn58Xl49HYGCgpOzjltM2CwwMzLbNjh8/roCAgGzbLSUlxb7NKleurFWrVikqKkqDBg1S5cqVVblyZU2fPt2+rD59+uiNN97Q3r17dccddygqKkqNGjXSypUr89w3ACjuvCUH//XXX/L19VV0dPQV2+Q3N14+NtLFccgtV1/q8nj8/PwUGRlpX9fp06fVvHlzffPNN3r22WeVnJyszZs3a9GiRZL+Nz5Z133HxcVdsX95kZfxPXz4sD799NNs27ZGjRqSZN+2ecnDM2bM0COPPKLFixerdevWKl26tG677Tb99ttvheoHYAU/VwcAuJsyZcooODg4x5t9ZM2XpMWLF+vMmTNatGiREhIS7PO3bdvm9Ji6dOmiLl266Pz589q4caMmTpyoXr16qWLFimrSpEmhlx8ZGan09HT99ddfDkW3MUYpKSn2o+xZCfXQoUPZkvPBgwftY1PUypQpo8jIyGw3l8kSFhZm/7958+Zq3ry5MjIytGXLFr300ksaNmyYypUrpx49ekiSBgwYoAEDBujMmTP68ssvNWbMGHXq1Ek7d+502NYAAOdytxxctmxZZWRkKCUlJceCWnJNbkxJSVH
58uXtj9PT03X06FF7LKtXr9bBgweVnJxsP6otKduNULNy/uU3d7NCmTJlVLt2bY0fPz7H+Zce3c8tD4eGhmrcuHEaN26cDh8+bD/a3blzZ/3yyy+W9wXID45wA5fp1KmTdu3apcjISDVo0CDbX8WKFSXJftftrCOz0sUCdfbs2ZbFFhgYqJYtW+q5556TdPF0Ome48cYbJUlvv/22w/QPP/xQZ86csc9v06ZNju02b96sHTt22NtlxVpUp3h16tRJR48eVUZGRo7brGrVqtme4+vrq0aNGtmPhnz33XfZ2oSGhurmm2/WE088oQsXLmj79u2W9wUAijN3y8FZpzvPnDnzim3ykxudZcGCBQ6P33//faWnp6tVq1aSch4fSXrttdccHl977bWqXLmy3njjjWy//nGpK52Flh+dOnXSTz/9pMqVK+e4bS8tuLPkJQ+XK1dO/fv3V8+ePfXrr786/OoI4A44wg1cZtiwYfrwww/VokULDR8+XLVr11ZmZqb27dunFStWaMSIEWrUqJHatm2rgIAA9ezZU6NGjVJqaqpmzpypY8eOOTWep556SgcOHNCNN96ouLg4HT9+XNOnT3e4Fquw2rZtq/bt2+uRRx7RyZMn1axZM/3www8aM2aMrrvuOvXp00eSVLVqVd1333166aWX5OPjo5tvvll79uzR6NGjFR8fr+HDh9uXWatWLS1atEgzZ85U/fr15ePjowYNGjgl3sv16NFDCxYs0C233KKhQ4eqYcOG8vf314EDB7RmzRp16dJFt99+u2bNmqXVq1erY8eOqlChglJTU+1HUW666SZJ0r333qvg4GA1a9ZMMTExSklJ0cSJExUREWE/0g8AsIa75eDmzZurT58+evbZZ3X48GF16tRJgYGB2rp1q0JCQjRkyJB85UZnWbRokfz8/NS2bVtt375do0ePVp06ddStWzdJUtOmTVWqVCndf//9GjNmjPz9/bVgwQJ9//332Zb1yiuvqHPnzmrcuLGGDx+uChUqaN++fVq+fLm9sK9Vq5Ykafr06erXr5/8/f1VtWpVhzPIcvP0009r5cqVatq0qR588EFVrVpVqamp2rNnj5YuXapZs2YpLi4uT3m4UaNG6tSpk2rXrq1SpUppx44deuutt9SkSROFhIQUdngBp6LgBi4TGhqqdevWadKkSXr99de1e/du++8133TTTfa960lJSfrwww/15JNPqmvXroqMjFSvXr300EMPOdwUprAaNWqkLVu26JFHHtFff/2lkiVLqkGDBlq9erX9uqfCstlsWrx4scaOHau5c+dq/PjxKlOmjPr06aMJEyY47CGfOXOmKleurDlz5uiVV15RRESEOnTooIkTJzpcwzV06FBt375djz/+uE6cOCFz8VcRnBLv5Xx9ffXJJ59o+vTpeuuttzRx4kT5+fkpLi5OLVu2tH9RqFu3rlasWKExY8YoJSVFJUqUUM2aNfXJJ5+oXbt2ki5+uZo3b57ef/99HTt2TGXKlNENN9ygN998M9s17gAA53K3HCxJ8+bNU7169TRnzhzNmzdPwcHBql69usPvQuc1NzrLokWLNHbsWM2cOdN+c9EXX3zRft13ZGSkPvvsM40YMUK9e/dWaGiounTpovfee0/16tVzWFb79u315Zdf6umnn9aDDz6o1NRUxcXF6dZbb7W3adWqlR577DHNnz9fs2fPVmZmptasWWM/op4XMTEx2rJli5555hlNmTJFBw4cUFhYmBITE9WhQweVKlVKUt7ycJs2bfTJJ5/ohRde0NmzZ1W+fHn17dtXTzzxRCFHFnA+m7HqGzAAAAAApxk7dqzGjRunv/76y2X3TQGQP1zDDQAAAACABSi4AQAAAACwAKeUAwAAAABgAY5wAwAAAABgAQpuAAAAAAAsQMENAAAAAIAFXPI73JmZmTp48KDCwsJks9lcEQIAAG7BGKNTp04pNjZWPj6u2Q9OXgYA4CJn52WXFNwHDx5UfHy8K1YNAIBb2r9/v+Li4lyybvIyAACOnJWXXVJwh4WFSbrYifDwcFeEAACAWzh58qTi4+PtudEVyMsAAFzk7LzskoI763S18PBwEjsKZs3Eq89v/VjRxAEATuLKU7nJyxYjZwGAx3FWXuamaQAAAAAAWMAlR7jh5diTDwAAAAAU3AAAoBhjJzEAwEKcUg4AAAAAgAUouAEAAAAAsAAFNwAAAAAAFuAabrin3K6pAwAAAAA3xxFuAAAAAAAswBFuAACAK+Eu5gCAQuAINwAAAAAAFqDgBgAAAADAAhTcAAAAAABYgGu4AQAAXInrxAHAa1FwA66Sl58+40sWAAAA4LEouAEAAAoqLztPAQDFFtdwAwAAAABgAQpuAAAAAAAsQMENAAAAAIAFuIYbxZMzbliWyzI2/HH0qvObVIrMPQYAAAAAHosj3AAAAAAAWIAj3PBOzrhrLHeeBQAAAFAIFNwAAMB7sfMUAOBCnFIOAAAAAIAFOMKNouchRxu84qZnuY11bjeGAwC4njt8lrtDDADggSi4AQAAPJkzfnkDAGAJCm4AAOCePOSMKAAAroSCG/BgG+aMvOp8jzjtHQAAAPBS3DQNAAAAAAALcIQbxVJuN0RzlxiatLZ2HYVdPgAAzvLCyp1XnT+87bVFFAkAOA9HuAEAAAAAsABHuAE3lts12gAAAADcFwV3ccNPh+ASuZ2+J3EKHwDAOfKScwDA21BwAwAA4Kq4JwgAFAwFN/KP30X1Go33vZ6HVlMLtQ5uggMAAIDiioIbAADA2+W2s5zLyQDAEhTcAK7K6iPUHAEHAACAt6LgRnZecMq4O/zONgAA7iLXvPhH4X4Vwx1uiFYUNwJlJzGA/OJ3uAEAAAAAsABHuD0N12DBzbjDUQ0AgOfL/UaehbuJp8QRagBFjyPcAAAAAABYgCPccEu5/t5npcgiigS5ydtPi13Zxgr3OSkSAAAAwL1QcKPIecsNzbylHwAA9+YJO6HzsvOVHawAiiMKbgAAAFiusGdEFQV3uC8J15kD3oWCGx6Jo8vIF242CMAieclH7nAEGkWDYhnA5bhpGgAAAAAAFuAINwCPl+sRBT7pABRQUZxRxVlbxUdRnLLujKPsHKkHnIevoQAAwDVyu9wDuIQzitXcriP3hBu7ucN15gDyjoIbAADAg3GE/KKiuCkbBbtz1lFcjrI7Y6w9oZ+4OgruopSXPfmFvXkTRwvgYXL78vLCyty/vOS2jA25PH9juvVJ3Ru+OAAAACB/KLgBAIA13GAnsDsc/XWHGABnym0nsjN2phc2BnZkw114T8HtDj/744wvFhZ/OSHpI79c/buprl6/JG2YMzLXNpafRpiHz4YX0u+wNIRi8+XFHfIJgGzc4ZRxd1h+bvmmKE57d4fcXFic7n1RXsbBHfrpyTtYXFJwG2MkSSdPnnTeQs+kXn2+M9dV0BjcwJlz510dAuB2Us+cvur8vLxvcltGbnL9PMzD50tqeuFiyI1TP7PdWRHnk6xxzcqNrmBJXpbcIi+S91CUnJFPrF5HYfNVXtaRG2fEkJvcPs+cEY
M7rKOw8hKjO+T/3OJ0ZozOzss244IMf+DAAcXHxxf1agEAcFv79+9XXFycS9ZNXgYAwJGz8rJLCu7MzEwdPHhQYWFhstlsRb36Qjt58qTi4+O1f/9+hYeHuzoct8U45Q3jlHeMVd4wTnnnDmNljNGpU6cUGxsrHx8fl8Tg6Xk5P9xhm7tSce4/fafv9L14KWj/nZ2XXXJKuY+Pj8v24jtTeHh4sXzx5hfjlDeMU94xVnnDOOWdq8cqIiLCZeuWvCcv54ert7mrFef+03f6XtwU575LBeu/M/Oya3alAwAAAADg5Si4AQAAAACwAAV3AQQGBmrMmDEKDAx0dShujXHKG8Yp7xirvGGc8o6xKn6K+zYvzv2n7/S9uCnOfZfcp/8uuWkaAAAAAADejiPcAAAAAABYgIIbAAAAAAALUHADAAAAAGABCm4AAAAAACxAwS3p1VdfVWJiooKCglS/fn2tW7fuqu3Xrl2r+vXrKygoSJUqVdKsWbOytTl+/LgGDRqkmJgYBQUFqVq1alq6dKlVXSgSVozTiy++qKpVqyo4OFjx8fEaPny4UlNTrepCkcnPWB06dEi9evVS1apV5ePjo2HDhuXY7sMPP1T16tUVGBio6tWr66OPPrIo+qLj7HGaPXu2mjdvrlKlSqlUqVK66aabtGnTJgt7UHSseE1lWbhwoWw2m2677TbnBu0CVoyTN36eextn56d58+bJZrNl+3PH/FSc842z++6t233RokVq27atypYtq/DwcDVp0kTLly/P1s5Ttrvk/P5767b/6quv1KxZM0VGRio4OFhJSUl64YUXsrXzlG3v7L4X2XY3xdzChQuNv7+/mT17tvn555/N0KFDTWhoqNm7d2+O7f/44w8TEhJihg4dan7++Wcze/Zs4+/vbz744AN7m/Pnz5sGDRqYW265xXz11Vdmz549Zt26dWbbtm1F1S2ns2Kc3n77bRMYGGgWLFhgdu/ebZYvX25iYmLMsGHDiqpblsjvWO3evds8+OCDZv78+aZu3bpm6NCh2dqsX7/e+Pr6mgkTJpgdO3aYCRMmGD8/P7Nx40aLe2MdK8apV69e5pVXXjFbt241O3bsMAMGDDARERHmwIEDFvfGWlaMVZY9e/aY8uXLm+bNm5suXbpY04EiYsU4eePnubexIj/NnTvXhIeHm0OHDjn8uZvinG+s6Lu3bvehQ4ea5557zmzatMns3LnTPPbYY8bf399899139jaest2Nsab/3rrtv/vuO/POO++Yn376yezevdu89dZbJiQkxLz22mv2Np6y7a3oe1Ft92JfcDds2NDcf//9DtOSkpLMo48+mmP7UaNGmaSkJIdpAwcONI0bN7Y/njlzpqlUqZK5cOGC8wN2ESvGadCgQaZNmzYObR566CFzww03OClq18jvWF2qZcuWOX4J6Natm+nQoYPDtPbt25sePXoUKlZXsmKcLpeenm7CwsLM/PnzCxqmW7BqrNLT002zZs3Mf/7zH9OvXz+PL7itGCdv/Dz3Nlbkp7lz55qIiAinx+psxTnfWNH34rDds1SvXt2MGzfO/thTtrsx1vS/OG3722+/3fTu3dv+2FO2vRV9L6rtXqxPKb9w4YK+/fZbtWvXzmF6u3bttH79+hyfs2HDhmzt27dvry1btigtLU2S9Mknn6hJkyYaNGiQypUrp5o1a2rChAnKyMiwpiMWs2qcbrjhBn377bf2U37/+OMPLV26VB07drSgF0WjIGOVF1caz8Is05WsGqfLnT17VmlpaSpdurTTllnUrByrp59+WmXLltU999xTqOW4A6vGyds+z72NVflJkk6fPq2EhATFxcWpU6dO2rp1q/M7UAjFOd9Y+blYHLZ7ZmamTp065ZAbPWG7S9b1Xyoe237r1q1av369WrZsaZ/mCdveqr5LRbPdi3XB/ffffysjI0PlypVzmF6uXDmlpKTk+JyUlJQc26enp+vvv/+WdLFw/OCDD5SRkaGlS5fqySef1PPPP6/x48db0xGLWTVOPXr00DPPPKMbbrhB/v7+qly5slq3bq1HH33Umo4UgYKMVV5caTwLs0xXsmqcLvfoo4+qfPnyuummm5y2zKJm1Vh9/fXXmjNnjmbPnl3YEN2CVePkbZ/n3saq/JSUlKR58+bpk08+0bvvvqugoCA1a9ZMv/32mzUdKYDinG+s6ntx2e7PP/+8zpw5o27dutmnecJ2l6zrv7dv+7i4OAUGBqpBgwYaNGiQ/vWvf9nnecK2t6rvRbXd/Zy6NA9ls9kcHhtjsk3Lrf2l0zMzMxUVFaXXX39dvr6+ql+/vg4ePKgpU6boqaeecnL0RcfZ45ScnKzx48fr1VdfVaNGjfT7779r6NChiomJ0ejRo50cfdHK71i5apmuZmWfJk+erHfffVfJyckKCgpyyjJdyZljderUKfXu3VuzZ89WmTJlnBGe23D2a8pbP8+9jbPzU+PGjdW4cWP7/GbNmqlevXp66aWXNGPGDGeF7RTFOd84O87isN3fffddjR07Vh9//LGioqKcskxXcHb/vX3br1u3TqdPn9bGjRv16KOPqkqVKurZs2ehlukKzu57UW33Yl1wlylTRr6+vtn2jBw5ciTbHpQs0dHRObb38/NTZGSkJCkmJkb+/v7y9fW1t6lWrZpSUlJ04cIFBQQEOLkn1rJqnEaPHq0+ffrY9zTVqlVLZ86c0X333acnnnhCPj6edwJGQcYqL640noVZpitZNU5Zpk6dqgkTJmjVqlWqXbt2oZfnSlaM1a5du7Rnzx517tzZPi0zM1OS5Ofnp19//VWVK1cueNAuYNVryts+z72NVfnpcj4+Prr++uvd6mhXcc43VueQLN623d977z3dc889+u9//5vtzC9P2O6Sdf2/nLdt+8TEREkXv2sfPnxYY8eOtRednrDtrer75aza7p5X0ThRQECA6tevr5UrVzpMX7lypZo2bZrjc5o0aZKt/YoVK9SgQQP5+/tLurh35Pfff7d/gZWknTt3KiYmxiO/nFk1TmfPns1WVPv6+spcvJmfE3tQdAoyVnlxpfEszDJdyapxkqQpU6bomWee0bJly9SgQYNCLcsdWDFWSUlJ+vHHH7Vt2zb736233qrWrVtr27Ztio+Pd0boRcqq15S3fZ57G6vy0+WMMdq2bZtiYmKcE7gTFOd8Y2UOuZQ3bfd3331X/fv31zvvvJPjvXI8YbtL1vX/ct607S9njNH58+ftjz1h21vV95zmW7LdLb8tm5vLusX8nDlzzM8//2yGDRtmQkNDzZ49e4wxxjz66KOmT58+9vZZPycyfPhw8/PPP5s5c+Zk+zmRffv2mRIlSpjBgwebX3/91SxZssRERUWZZ599tsj75yxWjNOYMWNMWFiYeffdd80ff/xhVqxYYSpXrmy6detW5P1zpvyOlTHGbN261WzdutXUr1/f9OrVy2zdutVs377dPv/rr782vr6+ZtKkSWbHjh1m0qRJbvmTDflhxTg999xzJiAgwHzwwQcOP+9w6tSpIu2bs1kxVpfzhruUWzFO3vh57m2sy
E9jx441y5YtM7t27TJbt241AwYMMH5+fuabb74p8v5dTXHON1b03Vu3+zvvvGP8/PzMK6+84pAbjx8/bm/jKdvdGGv6763b/uWXXzaffPKJ2blzp9m5c6d54403THh4uHniiSfsbTxl21vR96La7sW+4DbGmFdeecUkJCSYgIAAU69ePbN27Vr7vH79+pmWLVs6tE9OTjbXXXedCQgIMBUrVjQzZ87Mtsz169ebRo0amcDAQFOpUiUzfvx4k56ebnVXLOXscUpLSzNjx441lStXNkFBQSY+Pt488MAD5tixY0XQG2vld6wkZftLSEhwaPPf//7XVK1a1fj7+5ukpCTz4YcfFkFPrOXscUpISMixzZgxY4qmQxay4jV1KW8ouI2xZpy88fPc2zg7Pw0bNsxUqFDBBAQEmLJly5p27dqZ9evXF0VX8q045xtn991bt3vLli1z7Hu/fv0clukp290Y5/ffW7f9jBkzTI0aNUxISIgJDw831113nXn11VdNRkaGwzI9Zds7u+9Ftd1txnjoubsAAAAAALixYn0NNwAAAAAAVqHgBgAAAADAAhTcAAAAAABYgIIbAAAAAAALUHADAAAAAGABCm4AAAAAACxAwQ0AAAAAgAUouAEAAAAAsAAFNwAAAAAAFqDgBgAAAADAAhTcAAAAAABYgIIbAAAAAAALUHADAAAAAGABCm4AAAAAACxAwQ0AAAAAgAUouAEAAAAAsAAFNwAAAAAAFqDgBgAAAADAAhTcAAAAAABYgIIbAAAAAAALUHADAAAAAGABCm4AAAAAACxAwQ0AAAAAgAUouAEAAAAAsAAFNwAAAAAAFqDgBgAAAADAAhTcAAAAAABYgIIbAAAAAAALUHADAAAAAGABCm4AAAAAACxAwQ0AAAAAgAUouAEAAAAAsAAFNwAAAAAAFqDgBgAAAADAAhTcAAAAAABYgIIbAAAAAAALUHADAAAAAGABCm4AHqlixYrq37+//fHBgwc1duxYbdu2zWUxAQCA/Onfv78qVqxYoOeuX79eY8eO1fHjx50aE+BMNmOMcXUQAJBfW7duVXh4uCpXrixJ2rJli66//nrNnTvXoRAHAADua9euXTp58qSuu+66fD936tSpevjhh7V79+4CF+2A1fxcHQAAFERBEjMAAHAvWTvOAW/FKeUotsaOHSubzaYffvhBd955pyIiIlS6dGk99NBDSk9P16+//qoOHTooLCxMFStW1OTJk7Mt4+TJkxo5cqQSExMVEBCg8uXLa9iwYTpz5oxDu1deeUUtWrRQVFSUQkNDVatWLU2ePFlpaWkO7Vq1aqWaNWtq8+bNat68uUJCQlSpUiVNmjRJmZmZufYpMzNTL730kurWravg4GCVLFlSjRs31ieffOLQZvLkyUpKSlJgYKCioqLUt29fHThwoMCxHD9+XCNGjFClSpXsy7zlllv0yy+/2NuMGzdOjRo1UunSpRUeHq569eppzpw5uvQkm9tuu00JCQk59rVRo0aqV6+e/fGlp5QnJyfr+uuvlyQNGDBANptNNptNY8eO1VtvvSWbzaYNGzZkW+bTTz8tf39/HTx4MNexBQBcRP707PxpjNGrr75q72upUqX0f//3f/rjjz9yHaesbb9161Z17dpV4eHhioiIUO/evfXXX39lG9O8jFdOp5TbbDYNHjxYb731lqpVq6aQkBDVqVNHS5YscYjl4YcfliQlJibac39ycrIkafXq1WrVqpUiIyMVHBysChUq6I477tDZs2dz7SfgVAYopsaMGWMkmapVq5pnnnnGrFy50owaNcpIMoMHDzZJSUlmxowZZuXKlWbAgAFGkvnwww/tzz9z5oypW7euKVOmjJk2bZpZtWqVmT59uomIiDBt2rQxmZmZ9rbDhw83M2fONMuWLTOrV682L7zwgilTpowZMGCAQ0wtW7Y0kZGR5pprrjGzZs0yK1euNA888ICRZObPn59rn/r06WNsNpv517/+ZT7++GPz+eefm/Hjx5vp06fb29x33332Pi5btszMmjXLlC1b1sTHx5u//vor37GcPHnS1KhRw4SGhpqnn37aLF++3Hz44Ydm6NChZvXq1fZ2/fv3N3PmzDErV640K1euNM8884wJDg4248aNs7f5+OOPjSSzcuVKh37t2LHDSDIzZsywT0tISDD9+vUzxhhz4sQJM3fuXCPJPPnkk2bDhg1mw4YNZv/+/eb8+fMmOjra3HXXXQ7LTEtLM7GxsebOO+/MdVwBAP9D/vTs/Hnvvfcaf39/M2LECLNs2TLzzjvvmKSkJFOuXDmTkpJy1XHK2vYJCQnm4YcfNsuXLzfTpk0zoaGh5rrrrjMXLlzI93j169fPJCQkOKxHkqlYsaJp2LChef/9983SpUtNq1atjJ+fn9m1a5cxxpj9+/ebIUOGGElm0aJF9tx/4sQJs3v3bhMUFGTatm1rFi9ebJKTk82CBQtMnz59zLFjx67aR8DZKLhRbGUljeeff95het26de0f3lnS0tJM2bJlTdeuXe3TJk6caHx8fMzmzZsdnv/BBx8YSWbp0qU5rjcjI8OkpaWZN9980/j6+pp//vnHPq9ly5ZGkvnmm28cnlO9enXTvn37q/bnyy+/NJLME088ccU2WYn3gQcecJj+zTffGEnm8ccfz3csTz/9dI5J/mqyxuDpp582kZGR9i9XaWlpply5cqZXr14O7UeNGmUCAgLM33//bZ92acFtjDGbN282kszcuXOzrW/MmDEmICDAHD582D7tvffeM5LM2rVr8xw3AID8eSlPy58bNmzIcdvt37/fBAcHm1GjRl11/Vnbfvjw4Q7TFyxYYCSZt99+2xiTv/G6UsFdrlw5c/LkSfu0lJQU4+PjYyZOnGifNmXKFCPJ7N692+H5Wa+lbdu2XbU/QFHglHIUe506dXJ4XK1aNdlsNt188832aX5+fqpSpYr27t1rn7ZkyRLVrFlTdevWVXp6uv2vffv2Dqc0SRdv8HXrrbcqMjJSvr6+8vf3V9++fZWRkaGdO3c6rD86OloNGzZ0mFa7dm2Hdefk888/lyQNGjToim3WrFkjSdluKtawYUNVq1ZNX3zxRb5j+fzzz3Xttdfqpptuump8q1ev1k033aSIiAj7GDz11FM6evSojhw5IuniOPfu3VuLFi3SiRMnJEkZGRl666231KVLF0VGRl51HVfy73//W5I0e/Zs+7SXX35ZtWrVUosWLQq0TAAo7sifnpc/lyxZIpvNpt69ezuMfXR0tOrUqeMw9ldz1113OTzu1q2b/Pz87OOU3/HKSevWrRUWFmZ/XK5cOUVFReW6PSWpbt26CggI0H333af58+fn6XR5wCoU3Cj2Spcu7fA4ICBAISEhCgoKyjY9NTXV/vjw4cP64Ycf5O/v7/AXFhYmY4z+/vtvSdK+ffvUvHlz/fnnn5o+fbrWrVunzZs365VXXpEknTt3zmE9ORWVgYGB2dpd7q+//pKvr6+io6Ov2Obo0aOSpJiYmGzzYmNj7fPzE8tff/2luLi4q8a2adMmtWvXTtLFovfrr7/W
5s2b9cQTT0hyHIO7775bqampWrhwoSRp+fLlOnTokAYMGHDVdVxNuXLl1L17d7322mvKyMjQDz/8oHXr1mnw4MEFXiYAFHfkz4s8KX8ePnxYxhiVK1cu2/hv3LjRPva5uXys/Pz8FBkZaR+H/I5XTgq6PaWLN2JbtWqVoqKiNGjQIFWuXFmVK1fW9OnTc30u4GzcpRwooDJlyig4OFhvvPHGFedL0uLFi3XmzBktWrRICQkJ9vnO/r3osmXLKiMjQykpKTkmOOl/yevQoUPZkvzBgwftMed3vZffAOVyCxculL+/v5YsWeLwRWzx4sXZ2lavXl0NGzbU3LlzNXDgQM2dO1exsbH2LxwFNXToUL311lv6+OOPtWzZMpUsWTLbHnoAgPXIn/9bb1HnzzJlyshms2ndunUKDAzMtoycpuUkJSVF5cuXtz9OT0/X0aNH7eNkxXjlV/PmzdW8eXNlZGRoy5YteumllzRs2DCVK1dOPXr0sHz9QBaOcAMF1KlTJ+3atUuRkZFq0KBBtr+sO27abDZJjknMGONwerMzZJ3CN3PmzCu2adOmjSTp7bffdpi+efNm7dixQzfeeGOB1rtz506tXr36im1sNpv8/Pzk6+trn3bu3Dm99dZbObYfMGCAvvnmG3311Vf69NNP1a9fP4fn5iRrfK+057t+/fpq2rSpnnvuOS1YsED9+/dXaGhobt0DADgZ+fN/6y3q/NmpUycZY/Tnn3/mOPa1atXKU+wLFixwePz+++8rPT1drVq1kmTNeOUkt9wvSb6+vmrUqJH9zIjvvvvOKesG8ooj3EABDRs2TB9++KFatGih4cOHq3bt2srMzNS+ffu0YsUKjRgxQo0aNVLbtm0VEBCgnj17atSoUUpNTdXMmTN17Ngxp8bTvHlz9enTR88++6wOHz6sTp06KTAwUFu3blVISIiGDBmiqlWr6r777tNLL70kHx8f3XzzzdqzZ49Gjx6t+Ph4DR8+vEDj8N5776lLly569NFH1bBhQ507d05r165Vp06d1Lp1a3Xs2FHTpk1Tr169dN999+no0aOaOnXqFfek9+zZUw899JB69uyp8+fPZ7sGLCeVK1dWcHCwFixYoGrVqqlEiRKKjY1VbGysvc3QoUPVvXt32Ww2PfDAA/nuKwCg8Mif/xuHos6fzZo103333acBAwZoy5YtatGihUJDQ3Xo0CF99dVXqlWrlv2+J1ezaNEi+fn5qW3bttq+fbtGjx6tOnXqqFu3bpJkyXjlJGsHwfTp09WvXz/5+/uratWqWrBggVavXq2OHTuqQoUKSk1NtZ9Rkds184DTufKObYArZd1p89KfpjDm4t0yQ0NDs7Vv2bKlqVGjhsO006dPmyeffNJUrVrVBAQEmIiICFOrVi0zfPhwh5/W+PTTT02dOnVMUFCQKV++vHn44YfN559/biSZNWvWXHUdWTFdfgfPnGRkZJgXXnjB1KxZ0x5PkyZNzKeffurQ5rnnnjPXXnut8ff3N2XKlDG9e/c2+/fvz7W/V4rl2LFjZujQoaZChQrG39/fREVFmY4dO5pffvnF3uaNN94wVatWNYGBgaZSpUpm4sSJZs6cOTneXdQYY3r16mUkmWbNmuXY18vvUm6MMe+++65JSkoy/v7+RpIZM2aMw/zz58+bwMBA06FDhxyXCQDIHfnTs/Nn1jIbNWpkQkNDTXBwsKlcubLp27ev2bJly1XHKWvbf/vtt6Zz586mRIkSJiwszPTs2dPhl0DyM15Xukv5oEGDsq0/p9z/2GOPmdjYWOPj42N/XWzYsMHcfvvtJiEhwQQGBprIyEjTsmVL88knn1y1f4AVbMYY45pSHwCK1qeffqpbb71Vn332mW655RZXhwMAgEcZO3asxo0bp7/++qtIrsMGvAGnlAPwej///LP27t2rESNGqG7dug4/WQMAAABYhZumAfB6DzzwgG699VaVKlVK7777rv1GPAAAAICVOKUcAAAAAAALcIQbAAAAAAALUHADAAAAAGABl9w0LTMzUwcPHlRYWBjXUgIAijVjjE6dOqXY2Fj5+LhmPzh5GQCAi5ydl11ScB88eFDx8fGuWDUAAG5p//79iouLc8m6ycsAADhyVl52ScEdFhYm6WInwsPDXRECAABu4eTJk4qPj7fnRlcgLwMAcJGz87JLCu6s09XCw8NJ7AAASC49lZu8DACAI2flZZcU3G5rzcTCPb/1Y86JAwAAFI3ccj+5HQBQCNylHAAAAAAAC3CE25nYSw4AAAAA+P84wg0AAAAAgAUouAEAAAAAsAAFNwAAAAAAFuAabgAA4L0K+wskAAAUAke4AQAAAACwAAU3AAAAAAAW8J5TyvlJLgAAAACAG+EINwAAAAAAFqDgBgAAAADAAt5zSjkAAICzOeMu51zWBgDFFke4AQAAAACwAAU3AAAAAAAWoOAGAAAAAMACFNwAAAAAAFig+Nw0zRk3PQEAAAAAII+KT8HtDvJS9HMnUwAAAADwChTcAADAM3H2GgDAzXENNwAAAAAAFqDgBgAAAADAAhTcAAAAAABYgIIbAAAAAAALcNM0d5PbDWC4izkAAAAAeASOcAMAAAAAYAEKbgAAAAAALMAp5QAAwD3xO9sAAA/HEW4AAAAAACzAEW5Pw03VAAAAAMAjUHADAABrsJMYHuaFlTuvOn9422uLKBIA3oKCGwAAuEZxuUabHQ8AUGxRcAMFxF5wAAAAAFfDTdMAAAAAALAABTcAAAAAABbglHIAAABX4hpvAPBaFNwocrld+yxZf/2zO8QAAAAAwLtRcBc3ebkjLHvSAQCAk3nDzUa9oQ8AihYFNwAAQDFHIQkA1qDg9jbF5TdNAQAoLrjGGwA8FgU3ssslsb+QfsdV57MXHAAAAAAouFEAjfe9ftX5L6y8r9Dr4NQ2AADyqAjuz+IJedkTbojqCTECcC4KbsCLecIXJACA6+W2M31jhcLvTC8O8lJQAyheKLiRzYY/jro6hFyR0AAAQH7x/QFAUaPgBq7A6qTsjOVzhBoAUFxQLAPwRD6uDgAAAAAAAG/EEe5ixh1OF8/tOjGJa8XyqrB7+4vi5i1cRw4AbsDinw3l6LPzFDZvukPedYcYAHdBwe1mciuIm1SKLNTzi0JeCmp4DpImABROYXN7cUHRnjeME+BZKLjzgYSJ/CAh5g0F/UWMA+CZ3GFHd14UNk5n3MWcO6F7D2/4jsO9dFBUKLjhkUja7sMbki4AeDJ3KPqdcXYbub1oeEveZkc1PAUFtxO5Q8LDReyJx6WKS1L2huv+3AHjULx4Qu72hBjdAfeI8RxFcQ8Zd+At+cRb+uEqLim4jTGSpJMnTzpvoWdSC72ITXv+cUIg1lq1/aCrQygSqWdOX3X+mXPnLV1+XtaRl2VY7foDc686f3PcAI9Yx9VMXPydpct31jpyG6eGFUtfdX5qeperzs/L52Vur8nclpHb850xToPaVCn0Mgrty+evOtsZ2yI/spaXlRtdwZK8LDklN1utsPnEGU7mMk7uEKM7qPXrS1edfyYPy8jtc87VOQ//4w4
5KzeFXYen5FWrxzovfXhl9e+FXkZeOTsv24wLMvyBAwcUHx9f1KsFAMBt7d+/X3FxcS5ZN3kZAABHzsrLLim4MzMzdfDgQYWFhclmsxV6eSdPnlR8fLz279+v8PBwJ0SI/GIbuBbj73psA9fy5PE3xujUqVOKjY2Vj4+PS2Jwdl6WPHubXI6+uCdv6ovkXf2hL+6JvuSNs/OyS04p9/HxsWQvfnh4uMe/eDwd28C1GH/XYxu4lqeOf0REhEvXb1Veljx3m+SEvrgnb+qL5F39oS/uib7kzpl52TW70gEAAAAA8HIU3AAAAAAAWMArCu7AwECNGTNGgYGBrg6l2GIbuBbj73psA9di/N2PN20T+uKevKkvknf1h764J/riGi65aRoAAAAAAN7OK45wAwAAAADgbii4AQAAAACwAAU3AAAAAAAWoOAGAAAAAMACFNwAAAAAAFjAYwruV199VYmJiQoKClL9+vW1bt26q7Zfu3at6tevr6CgIFWqVEmzZs0qoki9V362waFDh9SrVy9VrVpVPj4+GjZsWNEF6qXyM/6LFi1S27ZtVbZsWYWHh6tJkyZavnx5EUbrnfKzDb766is1a9ZMkZGRCg4OVlJSkl544YUijNb75DcPZPn666/l5+enunXrWhugl7Ei73744YeqXr26AgMDVb16dX300UeFXq8r+jJ79mw1b95cpUqVUqlSpXTTTTdp06ZNDm3Gjh0rm83m8BcdHe12fZk3b162OG02m1JTUwu1Xlf0pVWrVjn2pWPHjvY27rBd8vodyVXvFyv64ynvmbz0xVPeM3npi6e8Z/L6vdaV75mrMh5g4cKFxt/f38yePdv8/PPPZujQoSY0NNTs3bs3x/Z//PGHCQkJMUOHDjU///yzmT17tvH39zcffPBBEUfuPfK7DXbv3m0efPBBM3/+fFO3bl0zdOjQog3Yy+R3/IcOHWqee+45s2nTJrNz507z2GOPGX9/f/Pdd98VceTeI7/b4LvvvjPvvPOO+emnn8zu3bvNW2+9ZUJCQsxrr71WxJF7h/yOf5bjx4+bSpUqmXbt2pk6deoUTbBewIq8u379euPr62smTJhgduzYYSZMmGD8/PzMxo0bC7xeV/WlV69e5pVXXjFbt241O3bsMAMGDDARERHmwIED9jZjxowxNWrUMIcOHbL/HTlypMD9sKovc+fONeHh4Q5xHjp0qFDrdVVfjh496tCHn376yfj6+pq5c+fa27jDdsnLdyRXvV+s6o+nvGfy0hdPec/kpS+e8p7Jy/daV75ncuMRBXfDhg3N/fff7zAtKSnJPProozm2HzVqlElKSnKYNnDgQNO4cWPLYvR2+d0Gl2rZsiUFdyEVZvyzVK9e3YwbN87ZoRUbztgGt99+u+ndu7ezQysWCjr+3bt3N08++aQZM2YMBXc+WJF3u3XrZjp06ODQpn379qZHjx4FXm9eFMV3iPT0dBMWFmbmz59vn2bFa86KvsydO9dEREQ4db15URTb5YUXXjBhYWHm9OnT9mnusF0udaXvSK56vxR2uXn9zueu75lLXakvnvKeuVRet4snvGeyXP691pXvmdy4/SnlFy5c0Lfffqv/x959h0dV5X8c/0w6CQmGUJIQDCEiRToohGIIKEgRFV1EBAF11wIKIusGUYOogGBDpawsUiwUAVnsoBRRQEXCqsAiCghKExQJIIEk5/cHv8wypMwkmTszybxfz5PnyZx75p5yy7nf26Zr164O6V27dtX69esL/c6GDRsK5O/WrZs2bdqks2fPWlbXiqo0ywDu447+z8vLU1ZWlqpWrWpFFSs8dyyDzMxMrV+/XqmpqVZUsUIrbf/Pnj1bP/74ozIyMqyuYoVi1bhbVJ78eVox1njqGOLUqVM6e/ZsgX3szp07FR8fr6SkJPXr10+7du0qVTusbsuJEyeUmJiohIQE9erVS5mZmWUq15ttOd+sWbPUr18/RUREOKR7e7m4whvbi5XzvZCvbjOuKg/bTGmUl22msONab20zrvD5gPvIkSPKzc1VzZo1HdJr1qypgwcPFvqdgwcPFpo/JydHR44csayuFVVplgHcxx39/+yzz+rkyZPq27evFVWs8MqyDBISEhQaGqrWrVtr6NChuvPOO62saoVUmv7fuXOn0tPT9cYbbygoKMgT1awwrBp3i8qTP08rxhpPHUOkp6erVq1auuqqq+xpbdq00bx58/TRRx9p5syZOnjwoNq1a6ejR4/6VFsaNGigOXPmaPny5Zo/f77CwsLUvn177dy5s9Tleqst5/vyyy/13XffFdjn+sJycYU3thcr53shX91mXFFetpmSKk/bTGHHtd7aZlxRbo5CbDabw2djTIE0Z/kLS4frSroM4F6l7f/58+dr7Nix+ve//60aNWpYVT2/UJplsG7dOp04cUIbN25Uenq6LrnkEt1yyy1WVrPCcrX/c3Nz1b9/fz3++OO69NJLPVW9CseKcdeVeVox1lh5DDFp0iTNnz9fa9asUVhYmD29e/fu9v+bNGmilJQUJScna+7cuRo5cmSp2lFU3crSlrZt26pt27b26e3bt1fLli310ksv6cUXXyx1ua6wcrnMmjVLjRs31hVXXOGQ7ivLxV3ztOrYzMpjPl/fZpwpT9tMSZSXbaa441pvbjPF8fmAu1q1agoMDCxw5uHw4cMFzlDki42NLTR/UFCQYmJiLKtrRVWaZQD3KUv/L1y4UHfccYfeeusth7PIKJmyLIOkpCRJ5wapQ4cOaezYsQTcJVTS/s/KytKmTZuUmZmpYcOGSTp3+5kxRkFBQVqxYoU6d+7skbqXR1aNu0XlyZ+nFWON1ccQzzzzjMaPH6+PP/5YTZs2LbYuERERatKkif0qWEl56ngoICBAl19+ub2e5XG5nDp1SgsWLNC4ceOc1sUby8UV3therJxvPl/fZkrDV7eZkigv20xxx7Xe2mZc4fO3lIeEhKhVq1ZauXKlQ/rKlSvVrl27Qr+TkpJSIP+KFSvUunVrBQcHW1bXiqo0ywDuU9r+nz9/vgYPHqw333zT4ecdUHLu2gaMMcrOznZ39Sq8kvZ/VFSUvv32W23ZssX+d/fdd6t+/frasmWL2rRp46mql0tWjbtF5cmfpxVjjZXHEJMnT9YTTzyhDz/8UK1bt3Zal+zsbG3fvl1xcXGlaInnjoeMMdqyZYu9nuVtuUjSokWLlJ2drQEDBjitizeWiyu8sb1YOV+pfGwzpeGr20xJlIdtxtlxrbe2GZdY+ko2N8l/hfusWbPMtm3bzIgRI0xERITZs2ePMcaY9PR0M3DgQHv+/J+OeOCBB8y2bdvMrFmz+FmwMirpMjDGmMzMTJOZmWlatWpl+vfvbzIzM83WrVu9Uf1yr6T9/+abb5qgoCAzdepUh59xOHbsmLeaUO6VdBm8/PLLZvny5eb7778333//vXn11VdNVFSUGTNmjLeaUK6VZh90Pt5SXjJWjLuff/65CQwMNBMnTjTbt283EydOLPInW4oq11fa8vTTT5uQkBCzePFih31sVlaWPc+DDz5o1qxZY3
bt2mU2btxoevXqZSIjI32uLWPHjjUffvih+fHHH01mZqYZMmSICQoKMl988YXL5fpKW/J16NDB3HzzzYWW6wvLxRjnx0je2l6sak952WZcaUt52WZcaUs+X99mXDmu9eY240y5CLiNMWbq1KkmMTHRhISEmJYtW5q1a9fapw0aNMikpqY65F+zZo1p0aKFCQkJMXXq1DHTp0/3cI0rnpIuA0kF/hITEz1b6QqkJP2fmppaaP8PGjTI8xWvQEqyDF588UVz2WWXmfDwcBMVFWVatGhhpk2bZnJzc71Q84qhpPug8xFwl5wV4+5bb71l6tevb4KDg02DBg3MkiVLSlSur7QlMTGx0H1sRkaGPc/NN99s4uLiTHBwsImPjzd9+vRxy0lnd7dlxIgR5uKLLzYhISGmevXqpmvXrmb9+vUlKtdX2mKMMTt27DCSzIoVKwot01eWiyvHSN7aXqxoT3naZpy1pTxtM66sZ+Vhm3H1uNab20xxbMb8/xsnAAAAAACA2/j8M9wAAAAAAJRHBNwAAAAAAFiAgBsAAAAAAAsQcAMAAAAAYAECbgAAAAAALEDADQAAAACABQi4AQAAAACwAAE3AAAAAAAWIOAGAAAAAMACBNwAAAAAAFiAgBsAAAAAAAsQcAMAAAAAYAECbgAAAAAALEDADQAAAACABQi4AQAAAACwAAE3AAAAAAAWIOAGAAAAAMACBNwAAAAAAFiAgBsAAAAAAAsQcAMAAAAAYAECbgAAAAAALEDADQAAAACABQi4AQAAAACwAAE3AAAAAAAWIOAGAAAAAMACBNwAAAAAAFiAgBsAAAAAAAsQcAMAAAAAYAECbgAAAAAALEDADQAAAACABQi4AQAAAACwAAE3AAAAAAAWIOAGAAAAAMACBNwAAAAAAFiAgBsAAAAAAAsQcAMAAAAAYAECbgAAAAAALEDADaBcev/99zV27FhvVwMA4MN+++039evXTzVq1JDNZtP111/v0fKnTZumOXPmlGkederU0eDBg0v9fZvN5jBezpkzRzabTXv27ClTvXxBWfsG8IQgb1cAAErj/fff19SpUwm6AQBFeuKJJ/T222/r1VdfVXJysqpWrerR8qdNm6Zq1ar5VFDYs2dPbdiwQXFxcd6uSpm9/fbbioqK8nY1gGIRcAOo8IwxOn36tCpVquTtqgAAPOi7775TcnKybr311mLz5ebmKicnR6GhoR6qmfdUr15d1atX93Y13KJFixbergLgFLeUw++MHTtWNptN33zzjf7yl7+oSpUqqlq1qkaOHKmcnBzt2LFD11xzjSIjI1WnTh1NmjSpwDyOHz+uUaNGKSkpSSEhIapVq5ZGjBihkydPOuSbOnWqrrzyStWoUUMRERFq0qSJJk2apLNnzzrk69Spkxo3bqyvvvpKHTt2VHh4uOrWrauJEycqLy/PaZveeusttWnTRlWqVLF/9/bbb5cknThxQhdddJHuuuuuAt/bs2ePAgMDNXnyZEn/u81s1apV+utf/6qYmBhFRUXptttu08mTJ3Xw4EH17dtXF110keLi4jRq1CiHtuzZs0c2m02TJ0/W008/rTp16qhSpUrq1KmTvv/+e509e1bp6emKj49XlSpVdMMNN+jw4cMF6rVw4UKlpKQoIiJClStXVrdu3ZSZmWmfPnjwYE2dOlXSuVvl8v/yb4+z2WwaNmyYZsyYoYYNGyo0NFRz5sxRvXr11K1btwLlnThxQlWqVNHQoUOd9jUAVFQVaXzMH48+/vhjbd++3T5OrFmzxj5t0qRJevLJJ5WUlKTQ0FCtXr1ap0+f1oMPPqjmzZvb25+SkqJ///vfBcrIy8vTSy+9pObNm6tSpUq66KKL1LZtWy1fvlzSududt27dqrVr19rLr1OnjiSVqBxXHT9+3D52V65cWddcc42+//77AvkKu6U8v583bNigdu3aqVKlSqpTp45mz54tSXrvvffUsmVLhYeHq0mTJvrwww8LzHfnzp3q37+/atSoodDQUDVs2NA+Vudbs2aNbDab5s+frzFjxig+Pl5RUVG66qqrtGPHDoe8mZmZ6tWrl31+8fHx6tmzp37++Wd7nsJuKd+7d68GDBjgUI9nn33WYX3JXweeeeYZPffcc0pKSlLlypWVkpKijRs3utzngEsM4GcyMjKMJFO/fn3zxBNPmJUrV5qHHnrISDLDhg0zDRo0MC+++KJZuXKlGTJkiJFklixZYv/+yZMnTfPmzU21atXMc889Zz7++GMzZcoUU6VKFdO5c2eTl5dnz/vAAw+Y6dOnmw8//NCsWrXKPP/886ZatWpmyJAhDnVKTU01MTExpl69embGjBlm5cqV5t577zWSzNy5c4ttz/r1643NZjP9+vUz77//vlm1apWZPXu2GThwoEM9IiIizLFjxxy++/e//92EhYWZI0eOGGOMmT17tpFkkpKSzIMPPmhWrFhhnn76aRMYGGhuueUW07JlS/Pkk0+alStXmn/84x9Gknn22Wft89u9e7eRZBITE821115r3n33XfP666+bmjVrmksvvdQMHDjQ3H777eaDDz4wM2bMMJUrVzbXXnutQ52eeuopY7PZzO23327effdds3TpUpOSkmIiIiLM1q1bjTHG/PDDD+amm24yksyGDRvsf6dPnzbGGCPJ1KpVyzRt2tS8+eabZtWqVea7774zU6ZMMTabzXz//fcOZU6dOtVIss8fAPxRRRofT58+bTZs2GBatGhh6tatax8n/vjjD/tYVatWLZOWlmYWL15sVqxYYXbv3m2OHTtmBg8ebF577TWzatUq8+GHH5pRo0aZgICAAuUNHDjQ2Gw2c+edd5p///vf5oMPPjBPPfWUmTJlijHGmM2bN5u6deuaFi1a2MvfvHmzMcaUqJzExEQzaNCgYpddXl6eSUtLM6Ghoeapp54yK1asMBkZGaZu3bpGksnIyLDnzR/rd+/eXaCf69evb2bNmmU++ugj06tXLyPJPP7446ZJkyZm/vz55v333zdt27Y1oaGh5pdffrF/f+vWraZKlSqmSZMmZt68eWbFihXmwQcfNAEBAWbs2LH2fKtXrzaSTJ06dcytt95q3nvvPTN//nxz8cUXm3r16pmcnBxjjDEnTpwwMTExpnXr1mbRokVm7dq1ZuHChebuu+8227ZtK7JvDh8+bGrVqmWqV69uZsyYYT788EMzbNgwI8ncc8899nz560CdOnXMNddcY5YtW2aWLVtmmjRpYqKjowscLwFlQcANv5N/QHF+oGiMMc2bNzeSzNKlS+1pZ8+eNdWrVzd9+vSxp02YMMEEBASYr776yuH7ixcvNpLM+++/X2i5ubm55uzZs2bevHkmMDDQ/Pbbb/ZpqampRpL54osvHL7TqFEj061bt2Lb88wzzxhJxQ4OP/74owkICDDPP/+8Pe3PP/80MTExDgc3+YPwfffd5/D966+/3kgyzz33nEN68+bNTcuWLe2f8wewZs2amdzcXHv6Cy+8YCSZ3r17O3x/xIgRRpL5448/jDHG7N271wQFBRUoPysry8TGxpq+ffva0
4YOHWqKOmcoyVSpUsWhj40x5vjx4yYyMtIMHz7cIb1Ro0YmLS2t0HkBgL+oaONj/vcvu+wyh7T8sSo5OdmcOXOm2O/n5OSYs2fPmjvuuMO0aNHCnv7pp58aSWbMmDHFfv+yyy4zqampTutZVDnGuBZwf/DBB0aSPdjP99RTT7kccEsymzZtsqcdPXrUBAYGmkqVKjkE11u2bDGSzIsvvmhP69atm0lISLCP5/mGDRtmwsLC7Ms0P+Du0aOHQ75FixbZT6IbY8ymTZuMJLNs2bJi231h36Snpxe6vtxzzz3GZrOZHTt2GGP+tw40adLEHuQbY8yXX35pJJn58+cXWy5QEtxSDr/Vq1cvh88NGzaUzWZT9+7d7WlBQUG65JJL9NNPP9nT3n33XTVu3FjNmzdXTk6O/a9bt27229XyZWZmqnfv3oqJiVFgYKCCg4N12223KTc3t8BtXrGxsbriiisc0po2bepQdmEuv/xySVLfvn21aNEi/fLLLwXy1K1bV7169dK0adNkjJEkvfnmmzp69KiGDRvmUt9I5160cmF6YfXr0aOHAgICHPIV9X3p3O1fkvTRRx8pJydHt912m0PfhoWFKTU11aFvnencubOio6Md0iIjIzVkyBDNmTPHfnvjqlWrtG3btkL7AQD8UUUZH53p3bu3goODC6S/9dZbat++vSpXrqygoCAFBwdr1qxZ2r59uz3PBx98IEllehTJlXJctXr1akkq8Kx6//79XZ5HXFycWrVqZf9ctWpV1ahRQ82bN1d8fLw9PX/szu//06dP65NPPtENN9yg8PBwh2Xfo0cPnT59usBt2r1793b43LRpU4d5XnLJJYqOjtY//vEPzZgxQ9u2bXOpDatWrVKjRo0KrC+DBw+WMUarVq1ySO/Zs6cCAwOLrAfgDgTc8FsXvqk0JCRE4eHhCgsLK5B++vRp++dDhw7pm2++UXBwsMNfZGSkjDE6cuSIpHNBZMeOHfXLL79oypQpWrdunb766iv780x//vmnQzkxMTEF6hgaGlog34WuvPJKLVu2zB6oJiQkqHHjxpo/f75DvuHDh2vnzp1auXKlpHPPz6WkpKhly5Yu9U1R6ef3TWm+L8k+j0OHDkk6dxLhwv5duHChvW9dUdTbV++77z5lZWXpjTfekCS9/PLLSkhI0HXXXefyvAGgIqso46MzhY0TS5cuVd++fVWrVi29/vrr2rBhg7766ivdfvvtDm399ddfFRgYqNjY2FKV7Wo5rjp69KiCgoIK9FVJ6lfYG9xDQkKcjt1Hjx5VTk6OXnrppQLLvkePHpJUYPy+sJ75L6vLX6ZVqlTR2rVr1bx5cz388MO67LLLFB8fr4yMjALP+Z/v6NGjhS7X/BMGR48eLVE9AHfgLeVACVWrVk2VKlXSq6++WuR0SVq2bJlOnjyppUuXKjEx0T59y5Ytbq/Tddddp+uuu07Z2dnauHGjJkyYoP79+6tOnTpKSUmRdO6Kb+PGjfXyyy+rcuXK2rx5s15//XW316Us8vtu8eLFDn1WGjabrdD0Sy65RN27d9fUqVPVvXt3LV++XI8//rjDGW4AQMn54vhYnMLGiddff11JSUlauHChw/Ts7GyHfNWrV1dubq4OHjxYqp/XcrUcV8XExCgnJ0dHjx51CCIPHjxYqvmVRHR0tAIDAzVw4MAir/gnJSWVeL5NmjTRggULZIzRN998ozlz5mjcuHGqVKmS0tPTC/1OTEyMDhw4UCB9//79kv63DgKeRMANlFCvXr00fvx4xcTEFDuA5A+g5//EiDFGM2fOtKxuoaGhSk1N1UUXXaSPPvpImZmZ9oBbku6//37dfffd+uOPP1SzZk395S9/sawupdGtWzcFBQXpxx9/1I033lhs3vPPQpf0576GDx+url27atCgQQoMDNRf//rXUtcZAHCOL4+PrrLZbAoJCXEIgg8ePFjg7eHdu3fXhAkTNH36dI0bN67I+RV1Jd7VclyVlpamSZMm6Y033tD9999vT3/zzTdLNb+SCA8PV1pamjIzM9W0aVP7FXB3sdlsatasmZ5//nnNmTNHmzdvLjJvly5dNGHCBG3evNnhDr558+bJZrMpLS3NrXUDXEHADZTQiBEjtGTJEl155ZV64IEH1LRpU+Xl5Wnv3r1asWKFHnzwQbVp00ZXX321QkJCdMstt+ihhx7S6dOnNX36dP3+++9urc9jjz2mn3/+WV26dFFCQoKOHTumKVOmKDg4WKmpqQ55BwwYoNGjR+vTTz/VI4884vZBsazq1KmjcePGacyYMdq1a5euueYaRUdH69ChQ/ryyy8VERGhxx9/XNK5M9+S9PTTT6t79+4KDAx0eaC/+uqr1ahRI61evdr+0yEAgLLxtfGxNHr16qWlS5fq3nvv1U033aR9+/bpiSeeUFxcnHbu3GnP17FjRw0cOFBPPvmkDh06pF69eik0NFSZmZkKDw/XfffdJ+l/V2kXLlyounXrKiwsTE2aNHG5HFd17dpVV155pR566CGdPHlSrVu31ueff67XXnvNbX1TnClTpqhDhw7q2LGj7rnnHtWpU0dZWVn64Ycf9M477xR4dtqZd999V9OmTdP111+vunXryhijpUuX6tixY7r66quL/N4DDzygefPmqWfPnho3bpwSExP13nvvadq0abrnnnt06aWXlrWpQIkRcAMlFBERoXXr1mnixIl65ZVXtHv3blWqVEkXX3yxrrrqKvtvbDZo0EBLlizRI488oj59+igmJkb9+/fXyJEjHV48U1Zt2rTRpk2b9I9//EO//vqrLrroIrVu3VqrVq3SZZdd5pC3UqVKuvbaa/X666/r7rvvdlsd3Gn06NFq1KiRpkyZovnz5ys7O1uxsbG6/PLLHercv39/ff7555o2bZrGjRsnY4x2795t739n+vbtq7Fjx/KyNABwE18bH0tjyJAhOnz4sGbMmKFXX31VdevWVXp6un7++Wf7Cd98c+bMUcuWLTVr1izNmTNHlSpVUqNGjfTwww/b8zz++OM6cOCA/vrXvyorK0uJiYnas2dPicpxRUBAgJYvX66RI0dq0qRJOnPmjNq3b6/3339fDRo0KHO/ONOoUSNt3rxZTzzxhB555BEdPnxYF110kerVq2d/jrsk6tWrp4suukiTJk3S/v37FRISovr162vOnDkaNGhQkd+rXr261q9fr9GjR2v06NE6fvy46tatq0mTJmnkyJFlaSJQajaT/8piABXemTNnVKdOHXXo0EGLFi3ydnW8qnXr1rLZbPrqq6+8XRUAAABUUFzhBvzAr7/+qh07dmj27Nk6dOhQkS8bqeiOHz+u7777Tu+++66+/vprvf32296uEgAAACowAm7AD7z33nsaMmSI4uLiNG3atEJ/CswfbN68WWlpaYqJiVFGRoauv/56b1cJAAAAFRi3lAMAAAAAYIEAb1cAAAAAAICKiIAbAAAAAAALeOUZ7ry8PO3fv1+RkZGy2WzeqAIAAD7BGKOsrCzFx8crIMA758EZlwEAOMfd47JXAu79+/erdu3a3igaAACftG/f
PiUkJHilbMZlAAAcuWtc9krAHRkZKelcI6KiorxRBQAAfMLx48dVu3Zt+9joDYzLAACc4+5x2SsBd/7talFRUQzsAABIXr2Vm3EZAABH7hqX+R1u+KbVE4qfnjbaM/UAAMBqjHkAUGHxlnIAAAAAACxAwA0AAAAAgAW4pRwVE7fnAQAAAPAyrnADAAAAAGABAm4AAAAAACxAwA0AAAAAgAV4hhsAAKA8c/beEol3lwCAl3CFGwAAAAAACxBwAwAAAABgAQJuAAAAAAAswDPcAAAApcXz0wCAYhBwo3xy5QAHAAAAALyIgBueV16CZWf15IoFAAAAgGLwDDcAAAAAABbgCjdgFa6QAwAAAH6NK9wAAAAAAFiAK9xAaZWXZ9EBAAAAeAVXuAEAAAAAsABXuAFfxnPgAABfwHgEAKVCwI2SY9B1i+dXfu80zwNsoQAAdzzCxGNQAOAV3FIOAAAAAIAFuH4G96sgZ9E37Dpa7PSUujFlmn/bva84z1TGMgAAAAB4DwE3AAAAysaVk+08cgbAD3FLOQAAAAAAFuAKNwAAQFEqyGNSAADvIOBGQf5wcOEPbXQTZ29Tf+DqSz1UEwBAYZy9c0Qq+3tH3IJfOQHghwi4AR/m9MVtaR6qCAAAAIAS4xluAAAAAAAswBVuVEhW/6QXAAAu4zEmAPBbXOEGAAAAAMACXOEGAAAoQkW5Y8qVF6sVp7y0EwB8DQE3AADwWxtmjfJ2FZCPt5gDqIAIuOGXynqm313zKLOyPhfIwQsAAABgGQLu8oYACz6G3+kGAAAACkfADQAAUEqu3O3k7PnnivKcuNe5clGCCw8APIyA29944KdJ3HHgwMGHa+gnAAAAwHcRcAMAgArL2WMvbT1UD/gIXswGwMMIuOFxPvGyMQAAygnGTQAovwK8XQEAAAAAACoirnADfsyl35+9+G9lKoO3mAMApLJfqU+R9e+hAQB3I+AGUKy2e18pdvrGMgbkvoCTAgDg+3hRKIDyiFvKAQAAAACwAFe4gQqsIrxox9nVZ4kr0IC/cmX/APfwl6vLZb7jyQ1vQeeuK6BiIeAGAACWsDx40Y1O6+DssRi4hy+c4HWlDs5ODDhdX1aX7cSCJ96d4hb8fBrgNgTcKJd8YWAHAAAAgOIQcAMAgHKJq9coKU7YA/A0Am6UGIMV3MoNt4yW9bZV5wftzzitQ5lx+x78kNNtt4IcpTBu+g9fWNY8Aw74lgoylHmIJw6InQYfZeMLAwEqljJfYfLEi3Ys3q5cUdbAwh0vj+MgDAAAwLMIuAEAgE/iJDF8jSfWSWcnsp9fWbaXqrn04jYnUtLKOANXToRzZxcqiIoTcLvj6nNZr4K54Sqav/zsBpDP6cHLxR4owwmXri4HLXGSo/hb453Vsa1cuZPA4lvffeAAyRM/E8edAJ7DM9iA+3liu3K2n3RHHTbmlK2MlDucjImeGNM8EZ944sREWevghhjp+Zzij6N8eWz2SsBtjJEkHT9+3H0zPXm6+OmulOVsHh5w8s/sYqcfL2Mdnc0f8DWnT54o8zzKut67UofjQcVvm6dzip+HO7ZNZ/tUZ+1wuk92Zf/jzv16IVxaFmWsQ5n7qYTy55c/NnqDJeOynPclYxJQcr6wXXmiDmUtwyfGNE/EJxaPu26pgxtiLGfHUe4cv9w9LtuMF0b4n3/+WbVr1/Z0sQAA+Kx9+/YpISHBK2UzLgMA4Mhd47JXAu68vDzt379fkZGRstlsBaYfP35ctWvX1r59+xQVFeXp6rkd7fFttMe30R7fRnvKzhijrKwsxcfHKyAgwCNlXsjZuFwaFW3d8CX0rXXoW2vQr9ahb93P3eOyV24pDwgIcOlsQVRUVIVacWiPb6M9vo32+DbaUzZVqlTxWFmFcXVcLo2Ktm74EvrWOvStNehX69C37uXOcdk7p9IBAAAAAKjgCLgBAAAAALCATwbcoaGhysjIUGhoqLer4ha0x7fRHt9Ge3wb7UFR6Evr0LfWoW+tQb9ah771fV55aRoAAAAAABWdT17hBgAAAACgvCPgBgAAAADAAgTcAAAAAABYgIAbAAAAAAALEHADAAAAAGABSwLuadOmKSkpSWFhYWrVqpXWrVtXZN4DBw6of//+ql+/vgICAjRixIhC8y1ZskSNGjVSaGioGjVqpLfffrtM5XqzPTNnzlTHjh0VHR2t6OhoXXXVVfryyy8d8owdO1Y2m83hLzY21ifbM2fOnAJ1tdlsOn36dKnL9WZ7OnXqVGh7evbsac/jK8tn6dKluvrqq1W9enVFRUUpJSVFH330UYF85WX7caU95Wn7caU95Wn7caU95Wn7+eyzz9S+fXvFxMSoUqVKatCggZ5//vkC+by5/fiSkrZx7dq1atWqlcLCwlS3bl3NmDHDYbqr674/cHffStKxY8c0dOhQxcXFKSwsTA0bNtT7779vVRN8lrv71pV9nL+wYr194YUXVL9+fVWqVEm1a9fWAw88wD7BDX179uxZjRs3TsnJyQoLC1OzZs304YcfWtkEnM+42YIFC0xwcLCZOXOm2bZtmxk+fLiJiIgwP/30U6H5d+/ebe6//34zd+5c07x5czN8+PACedavX28CAwPN+PHjzfbt28348eNNUFCQ2bhxY6nL9WZ7+vfvb6ZOnWoyMzPN9u3bzZAhQ0yVKlXMzz//bM+TkZFhLrvsMnPgwAH73+HDh8vUFqvaM3v2bBMVFeVQ1wMHDpSpXG+25+jRow7t+O6770xgYKCZPXu2PY+vLJ/hw4ebp59+2nz55Zfm+++/N6NHjzbBwcFm8+bN9jzlaftxpT3laftxpT3laftxpT3lafvZvHmzefPNN813331ndu/ebV577TUTHh5u/vnPf9rzeHP78SUlbeOuXbtMeHi4GT58uNm2bZuZOXOmCQ4ONosXL7bncWXd9wdW9G12drZp3bq16dGjh/nss8/Mnj17zLp168yWLVs81SyfYEXfurKP8wdW9O3rr79uQkNDzRtvvGF2795tPvroIxMXF2dGjBjhqWb5BCv69qGHHjLx8fHmvffeMz/++KOZNm2aCQsLcxi/YR23B9xXXHGFufvuux3SGjRoYNLT051+NzU1tdAAqG/fvuaaa65xSOvWrZvp16+fW8otjhXtuVBOTo6JjIw0c+fOtadlZGSYZs2albS6TlnRntmzZ5sqVapYVq5V83V1+Tz//PMmMjLSnDhxwp7mi8snX6NGjczjjz9u/1xet598F7bnQuVl+8l3YXvK6/aTz9nyKW/bzw033GAGDBhg/+zN7ceXlLSNDz30kGnQoIFD2l133WXatm1r/+zKuu8PrOjb6dOnm7p165ozZ864v8LliBV9e6HC9nH+wIq+HTp0qOncubNDnpEjR5oOHTq4qdblgxV9GxcXZ15++WWHPNddd5259dZb3VRrFMett5SfOXNGX3/9tbp27eqQ3rVrV61fv77U892wYUOBeXbr1s0+T6vKtWq+Fzp16pTOnj2rqlWrOqTv3LlT8fHxSkpKUr9+/bRr164ylWNle06cOKHExEQlJCSoV69
eyszMtLxcTy2fWbNmqV+/foqIiHBI98Xlk5eXp6ysLId1qTxvP4W150Llafspqj3ldftxZfmUp+0nMzNT69evV2pqqj3NW9uPLylNG4vqt02bNuns2bP2tOLWfX9gVd8uX75cKSkpGjp0qGrWrKnGjRtr/Pjxys3NtaYhPsjK9fZ8Re3jKjKr+rZDhw76+uuv7Y+J7dq1S++//75f3a5vVd9mZ2crLCzMIU+lSpX02WefubH2KIpbA+4jR44oNzdXNWvWdEivWbOmDh48WOr5Hjx4sNh5WlWuVfO9UHp6umrVqqWrrrrKntamTRvNmzdPH330kWbOnKmDBw+qXbt2Onr0aKnLsao9DRo00Jw5c7R8+XLNnz9fYWFhat++vXbu3GlpuZ5YPl9++aW+++473XnnnQ7pvrp8nn32WZ08eVJ9+/a1p5Xn7aew9lyoPG0/hbWnPG8/zpZPedl+EhISFBoaqtatW2vo0KEO9fXW9uNLStPGovotJydHR44ckeR83fcHVvXtrl27tHjxYuXm5ur999/XI488omeffVZPPfWUNQ3xQVb17fmK2sdVdFb1bb9+/fTEE0+oQ4cOCg4OVnJystLS0pSenm5NQ3yQVX3brVs3Pffcc9q5c6fy8vK0cuVK/fvf/9aBAwesaQgcBFkxU5vN5vDZGFMgzYp5WlGulfOVpEmTJmn+/Plas2aNw5mn7t272/9v0qSJUlJSlJycrLlz52rkyJFlKtPd7Wnbtq3atm1r/9y+fXu1bNlSL730kl588UXLyrV6vtK5M9eNGzfWFVdc4ZDui8tn/vz5Gjt2rP7973+rRo0aJZ6nry2f4tqTrzxtP0W1p7xuP64sn/Ky/axbt04nTpzQxo0blZ6erksuuUS33HJLieZp5X7IV5S0jYXlPz/d1XXfH7i7b/Py8lSjRg298sorCgwMVKtWrbR//35NnjxZjz32mJtr79vc3bfnK2of5y/c3bdr1qzRU089pWnTpqlNmzb64YcfNHz4cMXFxenRRx91c+19m7v7dsqUKfrrX/+qBg0ayGazKTk5WUOGDNHs2bPdXHMUxq0Bd7Vq1RQYGFjgDMzhw4cLnHkpidjY2GLnaVW5Vs033zPPPKPx48fr448/VtOmTYvNGxERoSZNmpTpzL/V7ckXEBCgyy+/3F7X8rp8Tp06pQULFmjcuHFO83p7+SxcuFB33HGH3nrrLYcrvVL53H6Ka0++8rT9uNKefOVh+3GlPeVp+0lKSpJ0Lvg/dOiQxo4daw+4vbX9+JLStLGofgsKClJMTEyh37lw3fcHVvVtXFycgoODFRgYaM/TsGFDHTx4UGfOnFFISIibW+J7rF5vS7KPq2is6ttHH31UAwcOtN8x0KRJE508eVJ/+9vfNGbMGAUEVPxfM7aqb6tXr65ly5bp9OnTOnr0qOLj45Wenm4f/2Att665ISEhatWqlVauXOmQvnLlSrVr167U801JSSkwzxUrVtjnaVW5Vs1XkiZPnqwnnnhCH374oVq3bu00f3Z2trZv3664uLhSl2lle85njNGWLVvsdS2Py0eSFi1apOzsbA0YMMBpXm8un/nz52vw4MF68803C33OqbxtP87aI5Wv7ceV9pzP17cfV9tTXrafCxljlJ2dbf/sre3Hl5SmjUX1W+vWrRUcHFzody5c9/2BVX3bvn17/fDDD8rLy7Pn+f777xUXF+cXwbZk/Xpbkn1cRWNV3546dapAUB0YGChz7iXPbmyB77J6vQ0LC1OtWrWUk5OjJUuW6LrrrnNvA1A4d7+FLf9V9rNmzTLbtm0zI0aMMBEREWbPnj3GGGPS09PNwIEDHb6TmZlpMjMzTatWrUz//v1NZmam2bp1q336559/bgIDA83EiRPN9u3bzcSJE4v8WZaiyvWl9jz99NMmJCTELF682OGnJbKysux5HnzwQbNmzRqza9cus3HjRtOrVy8TGRnpk+0ZO3as+fDDD82PP/5oMjMzzZAhQ0xQUJD54osvXC7Xl9qTr0OHDubmm28utFxfWT5vvvmmCQoKMlOnTnVYl44dO2bPU562H1faU562H1faU562H1fak688bD8vv/yyWb58ufn+++/N999/b1599VUTFRVlxowZY8/jze3Hl5S0b/N/puaBBx4w27ZtM7NmzSrwMzWurPv+wIq+3bt3r6lcubIZNmyY2bFjh3n33XdNjRo1zJNPPunx9nmTFX2br7h9nD+wom8zMjJMZGSkmT9/vtm1a5dZsWKFSU5ONn379vV4+7zJir7duHGjWbJkifnxxx/Np59+ajp37mySkpLM77//7unm+SW3B9zGGDN16lSTmJhoQkJCTMuWLc3atWvt0wYNGmRSU1MdKyEV+EtMTHTI89Zbb5n69eub4OBg06BBA7NkyZISletL7UlMTCw0T0ZGhj3PzTffbOLi4kxwcLCJj483ffr0KTQo9IX2jBgxwlx88cUmJCTEVK9e3XTt2tWsX7++ROX6UnuMMWbHjh1GklmxYkWhZfrK8klNTS20PYMGDXKYZ3nZflxpT3naflxpT3naflxd38rL9vPiiy+ayy67zISHh5uoqCjTokULM23aNJObm+swT29uP76kpPvaNWvWmBYtWpiQkBBTp04dM336dIfprq77/sDdfWvMud+Qb9OmjQkNDTV169Y1Tz31lMnJybG6KT7Hir51to/zF+7u27Nnz5qxY8ea5ORkExYWZmrXrm3uvfdevwwK3d23a9asMQ0bNjShoaEmJibGDBw40Pzyyy+eaAqMMTZj/OQeDQAAAAAAPKjiv30AAAAAAAAvIOAGAAAAAMACBNwAAAAAAFiAgBsAAAAAAAsQcAMAAAAAYAECbgAAAAAALEDADQAAAACABQi4AQAAAACwAAE3AAAAAAAWIOAGAAAAAMACBNwAAAAAAFiAgBsAAAAAAAsQcAMAAAAAYAECbgAAAAAALEDADQAAAACABQi4AQAAAACwAAE3AAAAAAAWIOAGAAAAAMACBNwAAAAAAFiAgBsAAAAAAAsQcAMAAAAAYAECbgAAAAAALEDADQAAAACABQi4AQAAAACwAAE3AAAAAAAWIOAGAAAAAMACBNwAAAAAAFiAgBsAAAAAAAsQcAMAAAAAYAECbgAAAAAALEDADQAAAACABQi4AQAAAACwAAE3AAAAAAAWIOAGAAAAAMACBNwAAAAAAFiAgBsAAAAAAAsQcAM+bM6cObLZbNqzZ489rVOnTurUqZPX6gQAgC9Zv369xo4dq2PHjlle1vjx47Vs2TLLywFQcRBwA+XMtGnTNG3aNG9XAwAAn7B+/Xo9/vjjBNwAfBIBN2CBU6dOWTbvRo0aqVGjRpbN393+/PNPGWMKnVbWfsrNzVV2dnaZ5gEAgLf9+eefXi2/qPHYGFPmuhV3HAD4AwJuoIzGjh0rm82mzZs366abblJ0dLSSk5MlSZs2bVK/fv1Up04dVapUSXXq1NEtt9yin376qcB8Nm7cqPbt2yssLEzx8fEaPXq0zp49WyDfhbeUr1mzRjabTWvWrHHIt2fPHtlsNs
2ZM8eetmvXLvXr10/x8fEKDQ1VzZo11aVLF23ZssVpOzdt2qTevXuratWqCgsLU4sWLbRo0SKHPPm3wK9YsUK33367qlevrvDwcGVnZ6tTp05q3LixPv30U7Vr107h4eG6/fbbJUl79+7VgAEDVKNGDYWGhqphw4Z69tlnlZeXV6A9kyZN0pNPPqmkpCSFhoZq9erVTusOAKiYxo4dq7///e+SpKSkJNlstgJj4sKFC5WSkqKIiAhVrlxZ3bp1U2Zmpn36Z599puDgYI0aNcph3vlj2qxZsyRJNptNJ0+e1Ny5c+3l5I/H+ccCFyrs0bA6deqoV69eWrp0qVq0aKGwsDA9/vjjkqSDBw/qrrvuUkJCgkJCQpSUlKTHH39cOTk5LvWHs7ZK0uDBg1W5cmV9++236tq1qyIjI9WlSxd7G4cNG6YZM2aoYcOGCg0N1dy5c+391KVLF0VGRio8PFzt2rXTe++9V2h7CzsOAPxVkLcrAFQUffr0Ub9+/XT33Xfr5MmTks4FifXr11e/fv1UtWpVHThwQNOnT9fll1+ubdu2qVq1apKkbdu2qUuXLqpTp47mzJmj8PBwTZs2TW+++aZb69ijRw/l5uZq0qRJuvjii3XkyBGtX7/e6W14q1ev1jXXXKM2bdpoxowZqlKlihYsWKCbb75Zp06d0uDBgx3y33777erZs6dee+01nTx5UsHBwZKkAwcOaMCAAXrooYc0fvx4BQQE6Ndff1W7du105swZPfHEE6pTp47effddjRo1Sj/++GOB2+dffPFFXXrppXrmmWcUFRWlevXqubOLAADlyJ133qnffvtNL730kpYuXaq4uDhJst8JNn78eD3yyCMaMmSIHnnkEZ05c0aTJ09Wx44d9eWXX6pRo0bq0KGDnnzySaWnp+vKK69U7969tXXrVg0dOlQDBgzQHXfcIUnasGGDOnfurLS0ND366KOSpKioqFLVe/Pmzdq+fbseeeQRJSUlKSIiQgcPHtQVV1yhgIAAPfbYY0pOTtaGDRv05JNPas+ePZo9e3ax83SlrfnOnDmj3r1766677lJ6erpDQL9s2TKtW7dOjz32mGJjY1WjRg2tXbtWV199tZo2bapZs2YpNDRU06ZN07XXXqv58+fr5ptvdqhLUccBgF8yAMokIyPDSDKPPfaY07w5OTnmxIkTJiIiwkyZMsWefvPNN5tKlSqZgwcPOuRt0KCBkWR2795tT09NTTWpqan2z6tXrzaSzOrVqx3K2r17t5FkZs+ebYwx5siRI0aSeeGFF0rcxgYNGpgWLVqYs2fPOqT36tXLxMXFmdzcXGOMMbNnzzaSzG233VZgHqmpqUaS+eSTTxzS09PTjSTzxRdfOKTfc889xmazmR07dji0Jzk52Zw5c6bEbQAAVEyTJ08uMFYaY8zevXtNUFCQue+++xzSs7KyTGxsrOnbt689LS8vz/To0cNcdNFF5rvvvjONGjUyDRo0MCdOnHD4bkREhBk0aFCBOuQfC1wof1w8v26JiYkmMDDQPr7lu+uuu0zlypXNTz/95JD+zDPPGElm69atRfZBSdo6aNAgI8m8+uqrBeYjyVSpUsX89ttvDult27Y1NWrUMFlZWfa0nJwc07hxY5OQkGDy8vIc2lvYcQDgr7ilHHCTG2+8sUDaiRMn9I9//EOXXHKJgoKCFBQUpMqVK+vkyZPavn27Pd/q1avVpUsX1axZ054WGBhY4IxxWVStWlXJycmaPHmynnvuOWVmZjrcsl2UH374Qf/973916623SpJycnLsfz169NCBAwe0Y8cOh+8U1heSFB0drc6dOzukrVq1So0aNdIVV1zhkD548GAZY7Rq1SqH9N69e3OmHADg1EcffaScnBzddtttDmNXWFiYUlNTHW47t9lsmjdvniIjI9W6dWvt3r1bixYtUkREhCV1a9q0qS699FKHtHfffVdpaWmKj493qG/37t0lSWvXrnVLW/MVNVZ37txZ0dHR9s8nT57UF198oZtuukmVK1e2pwcGBmrgwIH6+eefXT4OAPwRt5QDbpJ/G9v5+vfvr08++USPPvqoLr/8ckVFRclms6lHjx4OLyE5evSoYmNjC3y/sLTSstls+uSTTzRu3DhNmjRJDz74oKpWrapbb71VTz31lCIjIwv93qFDhyRJo0aNKvB8W74jR444fC6sL4pKP3r0qOrUqVMgPT4+3j7dlXkDAHC+/PHr8ssvL3R6QIDjdaeYmBj17t1bU6dO1Q033KAmTZpYVrfCxrJDhw7pnXfeKfKk8oVj7YXflVxva3h4eJG3w19Yt99//13GmELrzFgNOEfADbjJhS9L+eOPP/Tuu+8qIyND6enp9vTs7Gz99ttvDnljYmJ08ODBAvMsLO1CYWFh9vmer7CBOTEx0f7yl++//16LFi3S2LFjdebMGc2YMaPQ+ec/Zz569Gj16dOn0Dz169d3+FzYi2OKSo+JidGBAwcKpO/fv9+hfGfzBgDgfPnjx+LFi5WYmOg0/8qVKzV9+nRdccUVevvtt7VkyRKXr9SePxaHhoba04sKkgsby6pVq6amTZvqqaeeKvQ7+cFtYUra1uLG0gunRUdHKyAggLEaKCUCbsAiNptNxhiHgVeS/vWvfyk3N9chLS0tTcuXL9ehQ4fst5Xn5uZq4cKFTsvJvzr8zTffqFu3bvb05cuXF/u9Sy+9VI888oiWLFmizZs3F5mvfv36qlevnv7zn/9o/PjxTutTUl26dNGECRO0efNmtWzZ0p4+b9482Ww2paWlub1MAEDFkT/OXvjzVd26dVNQUJB+/PFHp4Fz/ks9U1NTtXLlSvXp00d33HGHWrZsqaSkJIeyCvuZrPPH4vOvMr/zzjsut6NXr156//33lZyc7HBLtytK0taSioiIUJs2bbR06VI988wzqlSpkiQpLy9Pr7/+uhISEgrcHg/gfwi4AYtERUXpyiuv1OTJk1WtWjXVqVNHa9eu1axZs3TRRRc55H3kkUe0fPlyde7cWY899pjCw8M1depU+9vOixMbG6urrrpKEyZMUHR0tBITE/XJJ59o6dKlDvm++eYbDRs2TH/5y19Ur149hYSEaNWqVfrmm28crsAX5p///Ke6d++ubt26afDgwapVq5Z+++03bd++XZs3b9Zbb71V4v7J98ADD2jevHnq2bOnxo0bp8TERL333nuaNm2a7rnnHgZxAECx8m/9njJligYNGqTg4GDVr19fderU0bhx4zRmzBjt2rVL11xzjaKjo3Xo0CF9+eWXioiI0OOPP67c3FzdcsststlsevPNNxUYGKg5c+aoefPmuvnmm/XZZ58pJCTEXtaaNWv0zjvvKC4uTpGRkapfv7569OihqlWr6o477tC4ceMUFBSkOXPmaN++fS63Y9y4cVq5cqXatWun+++/X/Xr19fp06e1Z88evf/++5oxY4YSEhIK/a6rbS2tCRMm6Oqrr1ZaWppGjRqlkJAQTZs2Td99953mz5/PFW2gOF5+aRtQ7uW/mfTXX38tMO3nn382N954o4mOjjaRkZHmmmuuM
d99951JTEws8JbTzz//3LRt29aEhoaa2NhY8/e//9288sorTt9SbowxBw4cMDfddJOpWrWqqVKlihkwYIDZtGmTw1vKDx06ZAYPHmwaNGhgIiIiTOXKlU3Tpk3N888/b3Jycpy28z//+Y/p27evqVGjhgkODjaxsbGmc+fOZsaMGfY8+W8n/eqrrwp8PzU11Vx22WWFzvunn34y/fv3NzExMSY4ONjUr1/fTJ482f72c2P+95byyZMnO60rAMC/jB492sTHx5uAgIACv9yxbNkyk5aWZqKiokxoaKhJTEw0N910k/n444+NMcaMGTPGBAQEFPgVjfXr15ugoCAzfPhwe9qWLVtM+/btTXh4uJHkMB5/+eWXpl27diYiIsLUqlXLZGRkmH/961+FvqW8Z8+ehbbj119/Nffff79JSkoywcHBpmrVqqZVq1ZmzJgxBd6YXhhnbTXm3FvKIyIiCv2+JDN06NBCp61bt8507tzZREREmEqVKpm2bduad955xyFPcccBgL+yGWOMt4J9AAAAAAAqKn4WDAAAAAAACxBwAwAAAABgAQJuAAAAAAAsQMANAAAAAIAFCLgBAAAAALAAATcAAAAAABYI8kaheXl52r9/vyIjI2Wz2bxRBQAAfIIxRllZWYqPj1dAgHfOgzMuAwBwjrvHZa8E3Pv371ft2rW9UTQAAD5p3759SkhI8ErZjMsAADhy17jslYA7MjJS0rlGREVFeaMKAAD4hOPHj6t27dr2sdEbGJcBADjH3eOyVwLu/NvVoqKifGtgXz2h+Olpoz1TDwCA3/Hmrdw+Oy6XB86OHSSOHwCgHHLXuMxL0wAAAAAAsIBXrnADAABUBBt2HXWaJyXNAxUBAPgk/wm4XbnlCwAAAAAAN+GWcgAAAAAALEDADQAAAACABQi4AQAAAACwgP88ww0AAPxPWX/yk3fAAADKgCvcAAAAAABYgIAbAAAAAAALcEs5AAAon7jdGwDg47jCDQAAAACABQi4AQAAAACwAAE3AAAAAAAW4BluAADgv3zhOfCy/nQZAMBncYUbAAAAAAALEHADAAAAAGABbikHAADwZdxyDgDlFle4AQAAAACwAFe4AQCA39qw66j1hfBiNgDwWwTcJcFgBQAAfI0vBPQAgEJxSzkAAAAAABbgCjcAAIAXObutPaVujIdqAgBwNwJud+KWcwAAAADA/yPgBgAA3sGJagBABUfADQAA4MO45RwAyi9emgYAAAAAgAW4wg0AAHwTP3flX3jEAEAFxBVuAAAAAAAsQMANAAAAAIAFCLgBAAAAALBAxXmGm+e8AADABZy94dtfOH3TeZqHKgIAfqbiBNwAAMCvEEyf45F+4IVmAFAq3FIOAAAAAIAFuMINAACAsnHl0T6uggPwQ1zhBgAAAADAAlzh9qDnV37vNM8DV1/qgZoAAACch5fPAoAlCLg9qO3eV1zI9Yzl9QAAwCMI4soNp28xrxvjoZoAQMXCLeUAAAAAAFiAK9wAAMAnVZSf/aoo7QAAlBxXuAEAAAAAsABXuEuA55sAAABKx9nLYx/gqBRABcSurYJxOpjxFnQAAFAeOXsJH7/zDcAHEXD7GAJmAAAAAKgYCLjLGVd+y7us3yeoBwAAfomr6ADcjID7PGV9i6gn3kLq7Le8N178N8vrwFV4AAAAAHCOgBsFlPUqOgAAqFjKw4tjuYsPgC8i4AYAAECZuHSX38Vlm4dbgnpnt4z7Am5rByoUvwm4PXG7tzs4u2W8InDHGWhuawcAAADg6/wm4Ibn+MIt6Z4IyAn6AaBsysvJcLiHL1xUsPoq+oZZoyydv1u4cpWfq+iA2xBwwydZHbTznBcAAAAAqxFwVzC+8BZzuIagH0BFxxVseJI7rqA7vQIuH3gG3MkVandsdylpxU8vD3f5Oa1j0BLnMynrlX6ex3efctyXXgm4jTGSpOPHj7tvpidPFz/5z2z3lVWONdnxkrer4JKvEoZ4uwqasGxzmb7vbP0+ffJEmeswtPMlJarThaau+sFpnrKWAZzP2TrnC+ubp+uYv6/IHxu9wZJxWYy9cK/jPnCsV9Y6OPu+O7ijH8p6DOPufUlpOK1jkAvLoqztcLa8faCfyg0P9qW7x2Wb8cII//PPP6t27dqeLhYAAJ+1b98+JSQkeKVsxmUAABy5a1z2SsCdl5en/fv3KzIyUjabTcePH1ft2rW1b98+RUVFebo65Qb95Dr6yjX0k+voK9fRV67J76e9e/fKZrMpPj5eAQEBXqnLheMyXMf67jn0tefQ155DX3uOq31tjFFWVpbbxmWv3FIeEBBQ6NmCqKgoVjQX0E+uo69cQz+5jr5yHX3lmipVqni9n4oal+E61nfPoa89h772HPrac1zp6ypVqritPO+cSgcAAAAAoIIj4AYAAAAAwAI+EXCHhoYqIyNDoaGh3q6KT6OfXEdfuYZ+ch195Tr6yjX0U8XAcvQc+tpz6GvPoa89x1t97ZWXpgEAAAAAUNH5xBVuAAAAAAAqGgJuAAAAAAAsQMANAAAAAIAFCLgBAAAAALCARwLuadOmKSkpSWFhYWrVqpXWrVtXbP61a9eqVatWCgsLU926dTVjxgxPVNMnlKSv1qxZI5vNVuDvv//9rwdr7Hmffvqprr32WsXHx8tms2nZsmVOv+Ov61RJ+8pf16kJEybo8ssvV2RkpGrUqKHrr79eO3bscPo9f1yvStNX/rheTZ8+XU2bNlVUVJSioqKUkpKiDz74oNjv+OP6VJ6VZixC6ZR2H42SK82+C2U3YcIE2Ww2jRgxwttVqZDGjh1b4BgkNjbWY+VbHnAvXLhQI0aM0JgxY5SZmamOHTuqe/fu2rt3b6H5d+/erR49eqhjx47KzMzUww8/rPvvv19LliyxuqpeV9K+yrdjxw4dOHDA/levXj0P1dg7Tp48qWbNmunll192Kb8/r1Ml7at8/rZOrV27VkOHDtXGjRu1cuVK5eTkqGvXrjp58mSR3/HX9ao0fZXPn9arhIQETZw4UZs2bdKmTZvUuXNnXXfdddq6dWuh+f11fSrPSrt/RcmVZb+Dkinpvgtl99VXX+mVV15R06ZNvV2VCu2yyy5zOAb59ttvPVe4sdgVV1xh7r77boe0Bg0amPT09ELzP/TQQ6ZBgwYOaXfddZdp27atZXX0FSXtq9WrVxtJ5vfff/dA7XyTJPP2228Xm8ef16nzudJXrFPnHD582Egya9euLTIP69U5rvQV69U50dHR5l//+leh01ifyjdX9q9wH1f2O3Cf4vZdKJusrCxTr149s3LlSpOammqGDx/u7SpVSBkZGaZZs2ZeK9/SK9xnzpzR119/ra5duzqkd+3aVevXry/0Oxs2bCiQv1u3btq0aZPOnj1rWV29rTR9la9FixaKi4tTly5dtHr1aiurWS756zpVFv6+Tv3xxx+SpKpVqxaZh/XqHFf6Kp+/rle5ublasGCBTp48qZSUlELzsD4BrivJfgel58q+C2UzdOhQ9ezZU1dddZW3q1Lh7dy5U/Hx8UpKSlK/fv20a9cuj5VtacB9
5MgR5ebmqmbNmg7pNWvW1MGDBwv9zsGDBwvNn5OToyNHjlhWV28rTV/FxcXplVde0ZIlS7R06VLVr19fXbp00aeffuqJKpcb/rpOlQbrlGSM0ciRI9WhQwc1bty4yHysV673lb+uV99++60qV66s0NBQ3X333Xr77bfVqFGjQvOyPgGucXW/g9Iryb4LpbdgwQJt3rxZEyZM8HZVKrw2bdpo3rx5+uijjzRz5kwdPHhQ7dq109GjRz1SfpAnCrHZbA6fjTEF0pzlLyy9IipJX9WvX1/169e3f05JSdG+ffv0zDPP6Morr7S0nuWNP69TJcE6JQ0bNkzffPONPvvsM6d5/X29crWv/HW9ql+/vrZs2aJjx45pyZIlGjRokNauXVvkgau/r0+AK0qyj0bplHTfhZLbt2+fhg8frhUrVigsLMzb1anwunfvbv+/SZMmSklJUXJysubOnauRI0daXr6lV7irVaumwMDAAldoDx8+XOBMfr7Y2NhC8wcFBSkmJsayunpbafqqMG3bttXOnTvdXb1yzV/XKXfxp3Xqvvvu0/Lly7V69WolJCQUm9ff16uS9FVh/GG9CgkJ0SWXXKLWrVtrwoQJatasmaZMmVJoXn9fnwBXlHW/A9eUZN+F0vn66691+PBhtWrVSkFBQQoKCtLatWv14osvKigoSLm5ud6uYoUWERGhJk2aeOw4xNKAOyQkRK1atdLKlSsd0leuXKl27doV+p2UlJQC+VesWKHWrVsrODjYsrp6W2n6qjCZmZmKi4tzd/XKNX9dp9zFH9YpY4yGDRumpUuXatWqVUpKSnL6HX9dr0rTV4Xxh/XqQsYYZWdnFzrNX9cnwBXu2u+gdIrbd6F0unTpom+//VZbtmyx/7Vu3Vq33nqrtmzZosDAQG9XsULLzs7W9u3bPXccYvVb2RYsWGCCg4PNrFmzzLZt28yIESNMRESE2bNnjzHGmPT0dDNw4EB7/l27dpnw8HDzwAMPmG3btplZs2aZ4OBgs3jxYqur6nUl7avnn3/evP322+b777833333nUlPTzeSzJIlS7zVBI/IysoymZmZJjMz00gyzz33nMnMzDQ//fSTMYZ16nwl7St/XafuueceU6VKFbNmzRpz4MAB+9+pU6fseVivzilNX/njejV69Gjz6aefmt27d5tvvvnGPPzwwyYgIMCsWLHCGMP6VBE427/CfVzZ78A9nO27YB3eUm6dBx980KxZs8bs2rXLbNy40fTq1ctERkbaYyyrWR5wG2PM1KlTTWJiogkJCTEtW7Z0+BmHQYMGmdTUVIf8a9asMS1atDAhISGmTp06Zvr06Z6opk8oSV89/fTTJjk52YSFhZno6GjToUMH895773mh1p6V/xNDF/4NGjTIGMM6db6S9pW/rlOF9ZEkM3v2bHse1qtzStNX/rhe3X777fZ9efXq1U2XLl0cDlhZn8o/Z/tXuI8r+x24h7N9F6xDwG2dm2++2cTFxZng4GATHx9v+vTpY7Zu3eqx8m3G/P9bWQAAAAAAgNtY+gw3AAAAAAD+ioAbAAAAAAALEHADAAAAAGABAm4AAAAAACxAwA0AAAAAgAUIuAEAAAAAsAABNwAAAAAAFiDgBgAAAADAAgTcAAAAAABYgIAbAAAAAAALEHADAAAAAGABAm4AAAAAACxAwA0AAAAAgAUIuAEAAAAAsAABNwAAAAAAFiDgBgAAAADAAgTcAAAAAABYgIAbAAAAAAALEHADAAAAAGABAm4AAAAAACxAwA0AAAAAgAUIuAEAAAAAsAABNwAAAAAAFiDgBgAAAADAAgTcAAAAAABYgIAbAAAAAAALEHADAAAAAGABAm4AAAAAACxAwA0AAAAAgAUIuAEAAAAAsAABNwAAAAAAFiDgBgAAAADAAgTcAAAAAABYgIAbAAAAAAALEHADAAAAAGABAm4AAAAAACxAwA0AAAAAgAUIuAEPGzx4sOrUqeOVstevX6+xY8fq2LFjXikfAAAA8Cc2Y4zxdiUAf/Ljjz/q+PHjatGihcfLfuaZZ/T3v/9du3fv9lrQDwAAAPgLrnADHnLq1ClJUnJysleCbSvlt83dzp49q5ycHEvKNMbozz//LNM8AAD+o7gxydcUNUbm5uYqOzvbknkDKBwBN/ze2LFjZbPZlJmZqT59+igqKkpVqlTRgAED9OuvvxbIv3DhQqWkpCgiIkKVK1dWt27dlJmZ6ZBn8ODBqly5sr799lt17dpVkZGR6tKli33ahVeXbTabhg0bptmzZ6t+/fqqVKmSWrdurY0bN8oYo8mTJyspKUmVK1dW586d9cMPPxSo18cff6wuXbooKipK4eHhat++vT755BOHdv7973+XJCUlJclms8lms2nNmjVua1tRdu7cqf79+6tGjRoKDQ1Vw4YNNXXqVIc8a9askc1m02uvvaYHH3xQtWrVUmhoqH744Ydiy/ztt9907733qlatWgoJCVHdunU1ZsyYAgcU+X08Y8YMNWzYUKGhoZo7d26x9QYAVCw//PCDhgwZonr16ik8PFy1atXStddeq2+//dYhX3FjkuR8zC1JWUUxxmjatGlq3ry5KlWqpOjoaN10003atWuXQ75OnTqpcePG+vTTT9WuXTuFh4fr9ttv1549e2Sz2TRp0iQ9+eSTSkpKUmhoqFavXi1JWr58uVJSUhQeHq7IyEhdffXV2rBhg8O884+RNm/erJtuuknR0dFKTk4uUZ8D/o6AG/h/N9xwgy655BItXrxYY8eO1bJly9StWzedPXvWnmf8+PG65ZZb1KhRIy1atEivvfaasrKy1LFjR23bts1hfmfOnFHv3r3VuXNn/fvf/9bjjz9ebPnvvvuu/vWvf2nixImaP3++srKy1LNnTz344IP6/PPP9fLLL+uVV17Rtm3bdOONN+r8p0Fef/11de3aVVFRUZo7d64WLVqkqlWrqlu3bvYDgDvvvFP33XefJGnp0qXasGGDNmzYoJYtW1ratm3btunyyy/Xd999p2effVbvvvuuevbsqfvvv7/Q740ePVp79+7VjBkz9M4776hGjRpFlnn69GmlpaVp3rx5GjlypN577z0NGDBAkyZNUp8+fQrMe9myZZo+fboee+wxffTRR+rYsWOxywQAULHs379fMTExmjhxoj788ENNnTpVQUFBatOmjXbs2FEgf2FjkitjbmnKutBdd92lESNG6KqrrtKyZcs0bdo0bd26Ve3atdOhQ4cc8h44cEADBgxQ//799f777+vee++1T3vxxRe1atUqPfPMM/rggw/UoEEDvfnmm7ruuusUFRWl+fPna9asWfr999/VqVMnffbZZwXq0qdPH11yySV66623NGPGjJJ0OQAD+LmMjAwjyTzwwAMO6W+88YaRZF5//XVjjDF79+41QUFB5r777nPIl5WVZWJjY03fvn3taYMGDTKSzKuvvlqgvEGDBpnExESHNEkmNjbWnDhxwp62bNkyI8k0b97c5OXl2dNfeOEFI8l88803xhhjTp48aapWrWquvfZah3nm5uaaZs2amSuuuMKeNnn
yZCPJ7N692yGvu9pWmG7dupmEhATzxx9/OKQPGzbMhIWFmd9++80YY8zq1auNJHPllVcWmEdRZc6YMcNIMosWLXJIf/rpp40ks2LFCnuaJFOlShV7eQAA5OTkmDNnzph69eo5HAcUNSaVZMx1tazCbNiwwUgyzz77rEP6vn37TKVKlcxDDz1kT0tNTTWSzCeffOKQd/fu3UaSSU5ONmfOnHGoa3x8vGnSpInJzc21p2dlZZkaNWqYdu3a2dPyj5Eee+yxYusLoGhc4Qb+36233urwuW/fvgoKCrLfevXRRx8pJydHt912m3Jycux/YWFhSk1Ndbg1O9+NN97ocvlpaWmKiIiwf27YsKEkqXv37rLZbAXSf/rpJ0nn3jz+22+/adCgQQ71ysvL0zXXXKOvvvpKJ0+eLLZsq9p2+vRpffLJJ7rhhhsUHh7uMO8ePXro9OnT2rhxo8vzvXDaqlWrFBERoZtuuskhffDgwZJU4Pa+zp07Kzo62mm9AQAVU05OjsaPH69GjRopJCREQUFBCgkJ0c6dO7V9+/YC+S8cd0oy5pa0rPO9++67stlsGjBggEM5sbGxatasWYFxOTo6Wp07dy50Xr1791ZwcLD9844dO7R//34NHDhQAQH/CwUqV66sG2+8URs3bizwnHZJjmcAOArydgUAXxEbG+vwOSgoSDExMTp69Kgk2W/fuvzyywv9/vmDliSFh4crKirK5fKrVq3q8DkkJKTY9NOnTzvU68Kg83y//fabQzB/IavadvToUeXk5Oill17SSy+9VGieI0eOOHyOi4srNF9hZR49elSxsbEOJyQkqUaNGgoKCrIvO2fzBgD4h5EjR2rq1Kn6xz/+odTUVEVHRysgIEB33nlnoS/SvHDcKMmYW9KyLizHGKOaNWsWOr1u3brF1rO4afljY2HfiY+PV15enn7//XeFh4e7NH8AxSPgBv7fwYMHVatWLfvnnJwcHT16VDExMZKkatWqSZIWL16sxMREp/O7MAi0Sn69XnrpJbVt27bQPEUN2BfOw91ti46OVmBgoAYOHKihQ4cWmicpKcmleReWHhMToy+++ELGGIfphw8fVk5Ojr1dJa03AKBiev3113Xbbbdp/PjxDulHjhzRRRddVCD/heNGScbckpZ1YTk2m03r1q1TaGhogekXphU3vl04Lf+45sCBAwXy7t+/XwEBAQXuBmP8BEqPgBv4f2+88YZatWpl/7xo0SLl5OSoU6dOkqRu3bopKChIP/74o0/dWtW+fXtddNFF2rZtm4YNG1Zs3vwB+sIz61a1LTw8XGlpacrMzFTTpk3tV+fdpUuXLlq0aJGWLVumG264wZ4+b948+3QAAPLZbLYCwep7772nX375RZdcconT75dkzC1LWb169dLEiRP1yy+/qG/fvk7rVRL169dXrVq19Oabb2rUqFH2YPrkyZNasmSJ/c3lANyDgBv4f0uXLlVQUJCuvvpqbd26VY8++qiaNWtmH+jq1KmjcePGacyYMdq1a5euueYaRUdH69ChQ/ryyy8VERHh9E3kVqhcubJeeuklDRo0SL/99ptuuukm1ahRQ7/++qv+85//6Ndff9X06dMlSU2aNJEkTZkyRYMGDVJwcLDq169vadumTJmiDh06qGPHjrrnnntUp04dZWVl6YcfftA777yjVatWlbrtt912m6ZOnapBgwZpz549atKkiT777DONHz9ePXr00FVXXVXqeQMAKp5evXppzpw5atCggZo2baqvv/5akydPVkJCgkvfL8mYW5ay2rdvr7/97W8aMmSINm3apCuvvFIRERE6cOCAPvvsMzVp0kT33HNPqfogICBAkyZN0q233qpevXrprrvuUnZ2tiZPnqxjx45p4sSJpZovgMIRcAP/b+nSpRo7dqymT58um82ma6+9Vi+88ILDVdnRo0erUaNGmjJliubPn6/s7GzFxsbq8ssv19133+21ug8YMEAXX3yxJk2apLvuuktZWVmqUaOGmjdvbn+BmHTutzpHjx6tuXPnaubMmcrLy9Pq1avt6Va0rVGjRtq8ebOeeOIJPfLIIzp8+LAuuugi1atXTz169ChTu8PCwrR69WqNGTNGkydP1q+//qpatWpp1KhRysjIKNO8AQAVz5QpUxQcHKwJEyboxIkTatmypZYuXapHHnnE5Xm4OuaWtax//vOfatu2rf75z39q2rRpysvLU3x8vNq3b68rrriipE130L9/f0VERGjChAm6+eabFRgYqLZt22r16tVq165dmeYNwJHNmPN+zBfwQ2PHjtXjjz+uX3/9tcAzvwAAAABQWvwsGAAAAAAAFiDgBgAAAADAAtxSDgAAAACABbjCDQAAAACABQi4AQAAAACwgFd+FiwvL0/79+9XZGSkbDabN6oAAIBPMMYoKytL8fHxCgjwznlwxmUAAM5x97jslYB7//79ql27tjeKBgDAJ+3bt08JCQleKZtxGQAAR+4al70ScEdGRko614ioqChvVAEAAJ9w/Phx1a5d2z42egPjMgAA57h7XPZKwJ1/u1pUVBQDOwAAkldv5WZcBgDAkbvGZa8E3F6xeoLzPGmjra8HAAA4x9nYzLgMACjneEs5AAAAAAAWIOAGAAAAAMACBNwAAAAAAFiAgBsAAAAAAAsQcAMAAAAAYAECbgAAAAAALEDADQAAAACABQi4AQAAAACwAAE3AAAAAAAWIOAGAAAAAMACQd6uAAAAqKBWT/B2DQAA8CoC7vM5OzBIG+2ZegAAAAAAyj1uKQcAAAAAwAIE3AAAAAAAWICAGwAAAAAACxBwAwAAAABgAV6aVhK8VA0AAAAA4CKucAMAAAAAYAECbgAAAAAALEDADQAAAACABQi4AQAAAACwAAE3AAAAAAAWIOAGAAAAAMACBNwAAAAAAFiAgBsAAAAAAAsQcAMAAAAAYAECbgAAAAAALBDk7QpUKKsnFD89bbRn6gEAAAAA8DqucAMAAAAAYAECbgAAAAAALFBxbil3djs3AAAAAAAexBVuAAAAAAAsQMANAAAAAIAFCLgBAAAAALAAATcAAAAAABYg4AYAAAAAwAIV5y3l5cDzK793mueBqy/1QE0AAAAAAFYj4AYAAL7J2U9+po32TD0AACglbikHAAAAAMACXOEugQ27jhY7PaVujIdqAgAAAADwdQTcbuQsINfFnqkHAAAAAMD7CLg9qO3eV5zmeX7l38pUBi9dAwAAAADfQMANAAC8oqyPavHrHwAAX0fADQAAfBKPagEAyjsC7grGlbP9znA1AAAAAADKjoD7PE7PpAMAgHKlrCeiOQkNACgLAm4f4+zFahsvLttL1VzBwQkAAO7jbFxl3ASAistvAu6KcvXaFwJyX8DBCwDAlV//KOu4yHgDACgLvwm44V84QAIAeII73p3CmAUAFRcBN9yuory4jQMgAIAvYDwCgPLLKwG3MUaSdPz4cffN9OTp4if/me2+snxYkx0vWV7GVwlDLC9jwrLNls7flXXv9MkTxU63uo6SNLTzJZaXAcC78vdH+WOjN1gyLkv6cut+t8
6vNJyNi54Y06zmifHIGcYrABWFu8dlm/HCCP/zzz+rdu3ani4WAACftW/fPiUkJHilbMZlAAAcuWtc9krAnZeXp/379ysyMlI2m03SuTMJtWvX1r59+xQVFeXpKuH/sRy8j2XgfSwD7/OnZWCMUVZWluLj4xUQEOCVOhQ2LrvKn5aVu9BnJUeflRx9VnL0WclVxD5z97jslVvKAwICijxbEBUVVWEWVnnGcvA+loH3sQy8z1+WQZUqVbxafnHjsqv8ZVm5E31WcvRZydFnJUeflVxF6zN3jsveOZUOAAAAAEAFR8ANAAAAAIAFfCbgDg0NVUZGhkJDQ71dFb/GcvA+loH3sQy8j2VQfrCsSo4+Kzn6rOTos5Kjz0qOPnPOKy9NAwAAAACgovOZK9wAAAAAAFQkBNwAAAAAAFiAgBsAAAAAAAsQcAMAAAAAYAECbgAAAAAALOAzAfe0adOUlJSksLAwtWrVSuvWrfN2lfzG2LFjZbPZHP5iY2O9Xa0K7dNPP9W1116r+Ph42Ww2LVu2zGG6MUZjx45VfHy8KlWqpE6dOmnr1q3eqWwF5WwZDB48uMB20bZtW+9UtoKaMGGCLr/8ckVGRqpGjRq6/vrrtWPHDoc8bAu+j/H7f9yxb8/OztZ9992natWqKSIiQr1799bPP//swVZ4jrv2Af7UZ9OnT1fTpk0VFRWlqKgopaSk6IMPPrBPp7+cmzBhgmw2m0aMGGFPo98cOYsN6K+S8YmAe+HChRoxYoTGjBmjzMxMdezYUd27d9fevXu9XTW/cdlll+nAgQP2v2+//dbbVarQTp48qWbNmunll18udPqkSZP03HPP6eWXX9ZXX32l2NhYXX311crKyvJwTSsuZ8tAkq655hqH7eL999/3YA0rvrVr12ro0KHauHGjVq5cqZycHHXt2lUnT56052Fb8G2M347csW8fMWKE3n77bS1YsECfffaZTpw4oV69eik3N9dTzfAYd+0D/KnPEhISNHHiRG3atEmbNm1S586ddd1119mDHfqreF999ZVeeeUVNW3a1CGdfiuouNiA/ioh4wOuuOIKc/fddzukNWjQwKSnp3upRv4lIyPDNGvWzNvV8FuSzNtvv23/nJeXZ2JjY83EiRPtaadPnzZVqlQxM2bM8EINK74Ll4ExxgwaNMhcd911XqmPvzp8+LCRZNauXWuMYVsoDxi/i1aaffuxY8dMcHCwWbBggT3PL7/8YgICAsyHH37osbp7S2n2Af7eZ8YYEx0dbf71r3/RX05kZWWZevXqmZUrV5rU1FQzfPhwYwzrWWGKiw3or5Lz+hXuM2fO6Ouvv1bXrl0d0rt27ar169d7qVb+Z+fOnYqPj1dSUpL69eunXbt2ebtKfmv37t06ePCgwzYRGhqq1NRUtgkPW7NmjWrUqKFLL71Uf/3rX3X48GFvV6lC++OPPyRJVatWlcS24OsYv0vGlfX566+/1tmzZx3yxMfHq3Hjxn7Rp6XZB/hzn+Xm5mrBggU6efKkUlJS6C8nhg4dqp49e+qqq65ySKffCldUbEB/lZzXA+4jR44oNzdXNWvWdEivWbOmDh486KVa+Zc2bdpo3rx5+uijjzRz5kwdPHhQ7dq109GjR71dNb+Uv96zTXhX9+7d9cYbb2jVqlV69tln9dVXX6lz587Kzs72dtUqJGOMRo4cqQ4dOqhx48aS2BZ8HeN3ybiyPh88eFAhISGKjo4uMk9FVdp9gD/22bfffqvKlSsrNDRUd999t95++201atSI/irGggULtHnzZk2YMKHANPqtoOJiA/qr5IK8XYF8NpvN4bMxpkAarNG9e3f7/02aNFFKSoqSk5M1d+5cjRw50os1829sE95188032/9v3LixWrdurcTERL333nvq06ePF2tWMQ0bNkzffPONPvvsswLT2BZ8G8unZErTX/7Qp+7eB1TkPqtfv762bNmiY8eOacmSJRo0aJDWrl1rn05/Odq3b5+GDx+uFStWKCwsrMh89Nv/FBcb5L9Alv5yndevcFerVk2BgYEFznYcPny4wJkTeEZERISaNGminTt3ersqfin/LZBsE74lLi5OiYmJbBcWuO+++7R8+XKtXr1aCQkJ9nS2Bd/G+F0yrqzPsbGxOnPmjH7//fci81REZdkH+GOfhYSE6JJLLlHr1q01YcIENWvWTFOmTKG/ivD111/r8OHDatWqlYKCghQUFKS1a9fqxRdfVFBQkL3d9FvRzo8NWM9KzusBd0hIiFq1aqWVK1c6pK9cuVLt2rXzUq38W3Z2trZv3664uDhvV8UvJSUlKTY21mGbOHPmjNauXcs24UVHjx7Vvn372C7cyBijYcOGaenSpVq1apWSkpIcprMt+DbG75JxZX1u1aqVgoODHfIcOHBA3333XYXsU3fsA/ytzwpjjFF2djb9VYQuXbro22+/1ZYtW+x/rVu31q233qotW7aobt269JsT58cGrGel4OGXtBVqwYIFJjg42MyaNcts27bNjBgxwkRERJg9e/Z4u2p+4cEHHzRr1qwxu3btMhs3bjS9evUykZGR9L+FsrKyTGZmpsnMzDSSzHPPPWcyMzPNTz/9ZIwxZuLEiaZKlSpm6dKl5ttvvzW33HKLiYuLM8ePH/dyzSuO4pZBVlaWefDBB8369evN7t27zerVq01KSoqpVasWy8CN7rnnHlOlShWzZs0ac+DAAfvfqVOn7HnYFnwb47cjd+zb7777bpOQkGA+/vhjs3nzZtO5c2fTrFkzk5OT461mWcZd+wB/6rPRo0ebTz/91Ozevdt888035uGHHzYBAQFmxYoVxhj6y1Xnv6XcGPrtQs5iA/qrZHwi4DbGmKlTp5rExEQTEhJiWrZsaf9JCFjv5ptvNnFxcSY4ONjEx8ebPn36mK1bt3q7WhXa6tWrjaQCf4MGDTLGnPvJhYyMDBMbG2tCQ0PNlVdeab799lvvVrqCKW4ZnDp1ynTt2tVUr17dBAcHm4svvtgMGjTI7N2719vVrlAK639JZvbs2fY8bAu+j/H7f9yxb//zzz/NsGHDTNWqVU2lSpVMr169Kuy+x137AH/qs9tvv92+vVWvXt106dLFHmwbQ3+56sKAm35z5Cw2oL9KxmaMMZ64kg4AAAAAgD/x+jPcAAAAAABURATcAAAAAABYgIAbAAAAAAALEHADAAAAAGABAm4AAAAAACxAwA0AAAAAgAUIuAEAAAAAsAABNwAAAAAAFiDgBgAAAADAAgTcAAAAAABYgIAbAAAAAAALEHADAAAAAGABAm4AAAAAACxAwA0AAAAAgAUIuAEAAAAAsAABNwAAAAAAFiDgBgAAAADAAgTcAAAAAABYgIAbAAAAAAALEHADAAAAAGABAm4AAAAAACxAwA0AAAAAgAUIuAEAAAAAsAABNwAAAAAAFiDgBgAAAADAAgTcAAAAAABYgIAbAAAAAAALEHADAAAAAGABAm4AAAAAACxAwA0AAAAAgAUIuAEAAAAAsAABNwAAAAAAFiDgBgAAAADAAgTcAAAAAABYgIAbAAAAAAALEHADA
AAAAGABAm7Ax+zfv19jx47Vli1bCkwbPHiwKleu7PlKAQCAUhs/fryWLVvm7WoA8AICbsDH7N+/X48//nihATcAACh/CLgB/0XADaBc+/PPP2WMKXTaqVOnyjTv3NxcZWdnl2keAACg6DHZGKM///yzTPMu7lgA8DYCblRov/76q/72t7+pdu3aCg0NVfXq1dW+fXt9/PHH9jydOnVS48aNtWHDBrVr106VKlVSnTp1NHv2bEnSe++9p5YtWyo8PFxNmjTRhx9+WKCczz77TF26dFFkZKTCw8PVrl07vffeewXyfffdd7ruuusUHR2tsLAwNW/eXHPnzrVPX7NmjS6//HJJ0pAhQ2Sz2WSz2TR27FiH+fzwww/q0aOHKleurNq1a+vBBx90CAz37Nkjm82mZ555Rs8995ySkpJUuXJlpaSkaOPGjQXqtWnTJvXu3VtVq1ZVWFiYWrRooUWLFjnkOXXqlEaNGqWkpCSFhYWpatWqat26tebPn2/Ps2vXLvXr10/x8fEKDQ1VzZo11aVLF5eu1rtShzlz5shms2nFihW6/fbbVb16dYWHhys7O9u+HD/99FO1a9dO4eHhuv322yVJe/fu1YABA1SjRg2FhoaqYcOGevbZZ5WXl1egzyZNmqQnn3xSSUlJCg0N1erVq53WHQBQtP/+97+65ZZbVLNmTYWGhuriiy/Wbbfd5jBuORsfpXNjpM1m05tvvql//OMfiouLU+XKlXXttdfq0KFDysrK0t/+9jdVq1ZN1apV05AhQ3TixAmHedhsNg0bNkz//Oc/demllyo0NFSNGjXSggULHPL9+uuvuvfee9WoUSNVrlxZNWrUUOfOnbVu3boC7cvOzta4cePUsGFDhYWFKSYmRmlpaVq/fr29zJMnT2ru3Ln2cb1Tp06S/jeurV69Wvfcc4+qVaummJgY9enTR/v37y9Q1sKFC5WSkqKIiAhVrlxZ3bp1U2ZmpkMeV8biVatWqVOnToqJiVGlSpV08cUX68Ybb3TpRLUrdch/BO7bb79V165dFRkZqS5dujgsgxkzZqhhw4YKDQ21L2tXjqeKOxYAfFGQtysAWGngwIHavHmznnrqKV166aU6duyYNm/erKNHjzrkO3jwoIYMGaKHHnpICQkJeumll3T77bdr3759Wrx4sR5++GFVqVJF48aN0/XXX69du3YpPj5ekrR27VpdffXVatq0qWbNmqXQ0FBNmzZN1157rebPn6+bb75ZkrRjxw61a9dONWrU0IsvvqiYmBi9/vrrGjx4sA4dOqSHHnpILVu21OzZszVkyBA98sgj6tmzpyQpISHBXtezZ8+qd+/euuOOO/Tggw/q008/1RNPPKEqVarosccec2jX1KlT1aBBA73wwguSpEcffVQ9evTQ7t27VaVKFUnS6tWrdc0116hNmzaaMWOGqlSpogULFujmm2/WqVOnNHjwYEnSyJEj9dprr+nJJ59UixYtdPLkSX333XcOfdmjRw/l5uZq0qRJuvjii3XkyBGtX79ex44dK3Y5uVqHfLfffrt69uyp1157TSdPnlRwcLAk6cCBAxowYIAeeughjR8/XgEBAfr111/Vrl07nTlzRk888YTq1Kmjd999V6NGjdKPP/6oadOmOcz7xRdf1KWXXqpnnnlGUVFRqlevXrF1BwAU7T//+Y86dOigatWqady4capXr54OHDig5cuX68yZMwoNDXVpfDzfww8/rLS0NM2ZM0d79uzRqFGjdMsttygoKEjNmjXT/PnzlZmZqYcffliRkZF68cUXHb6/fPlyrV69WuPGjVNERISmTZtm//5NN90kSfrtt98kSRkZGYqNjdWJEyf09ttvq1OnTvrkk0/sAXNOTo66d++udevWacSIEercubNycnK0ceNG7d27V+3atdOGDRvUuXNnpaWl6dFHH5UkRUVFOdTpzjvvVM+ePfXmm29q3759+vvf/64BAwZo1apV9jzjx4/XI488Yj9GOHPmjCZPnqyOHTvqyy+/VKNGjSQ5H4v37Nmjnj17qmPHjnr11Vd10UUX6ZdfftGHH36oM2fOKDw8vMjl6WodJOnMmTPq3bu37rrrLqWnpysnJ8c+bdmyZVq3bp0ee+wxxcbGqkaNGi4fT+Ur6lgA8DkGqMAqV65sRowYUWye1NRUI8ls2rTJnnb06FETGBhoKlWqZH755Rd7+pYtW4wk8+KLL9rT2rZta2rUqGGysrLsaTk5OaZx48YmISHB5OXlGWOM6devnwkNDTV79+51KL979+4mPDzcHDt2zBhjzFdffWUkmdmzZxeo66BBg4wks2jRIof0Hj16mPr169s/796920gyTZo0MTk5Ofb0L7/80kgy8+fPt6c1aNDAtGjRwpw9e9Zhnr169TJxcXEmNzfXGGNM48aNzfXXX19ELxpz5MgRI8m88MILReYpiqt1mD17tpFkbrvttgLzyF+On3zyiUN6enq6kWS++OILh/R77rnH2Gw2s2PHDmPM//osOTnZnDlzpsRtAAAU1LlzZ3PRRReZw4cPF5nH1fFx9erVRpK59tprHfKNGDHCSDL333+/Q/r1119vqlat6pAmyVSqVMkcPHjQnpaTk2MaNGhgLrnkkiLrmJOTY86ePWu6dOlibrjhBnv6vHnzjCQzc+bMIr9rjDERERFm0KBBBdLzx7V7773XIX3SpElGkjlw4IAxxpi9e/eaoKAgc9999znky8rKMrGxsaZv377GGNfG4sWLFxtJZsuWLcXW+UKu1sGY/x2vvPrqqwXmI8lUqVLF/Pbbbw7prh5PFXcsAPgibilHhXbFFVdozpw5evLJJ7Vx40adPXu20HxxcXFq1aqV/XPVqlVVo0YNNW/e3H4lW5IaNmwoSfrpp58kSSdPntQXX3yhm266yeHt4YGBgRo4cKB+/vln7dixQ9K527e6dOmi2rVrO5Q9ePBgnTp1Shs2bHCpTTabTddee61DWtOmTe11Ol/Pnj0VGBjokO/8+v/www/673//q1tvvVXSuTP1+X89evTQgQMH7PW/4oor9MEHHyg9PV1r1qwp8LxV1apVlZycrMmTJ+u5555TZmamwy3bRSlJHfLdeOONhc4rOjpanTt3dkhbtWqVGjVqpCuuuMIhffDgwTLGOFw9kKTevXtzlhwA3ODUqVNau3at+vbtq+rVqxeZr6TjY69evRw+54/N+XeFnZ/+22+/FbitvEuXLqpZs6b9c2BgoG6++Wb98MMP+vnnn+3pM2bMUMuWLRUWFqagoCAFBwfrk08+0fbt2+15PvjgA4WFhdkfYSqt3r17O3y+cLz+6KOPlJOTo9tuu81hnAwLC1NqaqrWrFkjybWxuHnz5goJCdHf/vY3zZ07V7t27XKpjq7W4XxFjdedO3dWdHS0/XNJjqeczRvwNQTcqNAWLlyoQYMG6V//+pdSUlJUtWpV3XbbbTp48KBDvqpVqxb4bkhISIH0kJAQSdLp06clSb///ruMMYqLiyvw/fxAPf+W66NHj7qUz5nw8HCFhYU5pIWGhtrrdL6YmJgC+STZg+VDhw5J
kkaNGqXg4GCHv3vvvVeSdOTIEUnnbrX+xz/+oWXLliktLU1Vq1bV9ddfr507d0o6dyLgk08+Ubdu3TRp0iS1bNlS1atX1/3336+srKwi21OSOuQrrB+LSi9pvxc1bwBAyfz+++/Kzc11eCyqMCXdTxc1Njsbs/PFxsYWKCs/Lb+s5557Tvfcc4/atGmjJUuWaOPGjfrqq690zTXXOJxw/vXXXxUfH6+AgLIdUrs6Xl9++eUFxsqFCxfax0lXxuLk5GR9/PHHqlGjhoYOHark5GQlJydrypQpxdbR1TrkCw8PL3DrfL4Ll3dJjqeKmgfgq3iGGxVatWrV9MILL+iFF17Q3r17tXz5cqWnp+vw4cOFvvyspKKjoxUQEKADBw4UmJb/spNq1apJOjeYupLPk/LLHD16tPr06VNonvr160uSIiIi9Pjjj+vxxx/XoUOH7Fe7r732Wv33v/+VJCUmJmrWrFmSpO+//16LFi3S2LFjdebMGc2YMaPMdchns9kKzVdYekn7vah5AwBKpmrVqgoMDHS4alwYT4+PF550Pz8tP/B9/fXX1alTJ02fPt0h34UnkKtXr67PPvtMeXl5ZQ66i5PfB4sXL1ZiYmKxeV0Zizt27KiOHTsqNzdXmzZt0ksvvaQRI0aoZs2a6tevX5nrIBU/nl44rSTHU67MH/AlXOGG37j44os1bNgwXX311dq8ebNb5hkREaE2bdpo6dKlDme88/Ly9PrrryshIUGXXnqppHO3sK1atarAW0fnzZun8PBwtW3bVlLBs9pWql+/vurVq6f//Oc/at26daF/kZGRBb5Xs2ZNDR48WLfccot27NhR6FtNL730Uj3yyCNq0qRJsf1d2jq4qkuXLtq2bVuBOsybN082m01paWmlnjcAoGiVKlVSamqq3nrrrQJXP8/n6vjoLp988on9aq107icgFy5cqOTkZPvVeJvNZh+P833zzTcFbm/v3r27Tp8+rTlz5hRbZmhoaJnG9W7duikoKEg//vhjkWNlYZyNxYGBgWrTpo2mTp0qScWO16WtgytKcjwFlDdc4UaF9ccffygtLU39+/dXgwYNFBkZqa+++koffvhhkVdSS2PChAm6+uqrlZaWplGjRikkJETTpk3Td999p/nz59vPwGZkZOjdd99VWlqaHnvsMVWtWlVvvPGG3nvvPU2aNMn+1vDk5GRVqlRJb7zxhho2bKjKlSsrPj7e4Vlyd/rnP/+p7t27q1u3bho8eLBq1aql3377Tdu3b9fmzZv11ltvSZLatGmjXr16qWnTpoqOjtb27dv12muvKSUlReHh4frmm280bNgw/eUvf1G9evUUEhKiVatW6ZtvvlF6erpb6lAaDzzwgObNm6eePXtq3LhxSkxM1Hvvvadp06bpnnvuYQAHAAs999xz6tChg9q0aaP09HRdcsklOnTokJYvX65//vOfioyMdHl8dJdq1aqpc+fOevTRR+1vKf/vf//r8NNgvXr10hNPPKGMjAylpqZqx44dGjdunJKSkhzetn3LLbdo9uzZuvvuu7Vjxw6lpaUpLy9PX3zxhRo2bGi/WtykSROtWbNG77zzjuLi4hQZGVng7q3i1KlTR+PGjdOYMWO0a9cuXXPNNYqOjtahQ4f05Zdf2u9Cc2UsnjFjhlatWqWePXvq4osv1unTp/Xqq69Kkq666qoy16G0XD2eAsodL7+0DbDM6dOnzd13322aNm1qoqKiTKVKlUz9+vVNRkaGOXnypD1famqqueyyywp8PzEx0fTs2bNAuiQzdOhQh7R169aZzp07m4iICFOpUiXTtm1b88477xT47rfffmuuvfZaU6VKFRMSEmKaNWtW6NvI58+fbxo0aGCCg4ONJJORkWGMOffWz4iIiAL5MzIyzPmbc/4btydPnlxo/fPnl+8///mP6du3r6lRo4YJDg42sbGxpnPnzmbGjBn2POnp6aZ169YmOjrahIaGmrp165oHHnjAHDlyxBhjzKFDh8zgwYNNgwYNTEREhKlcubJp2rSpef755x3elF4UV+qQ/2bSr776qsD3i1qOxhjz008/mf79+5uYmBgTHBxs6tevbyZPnmx/+7mzPgMAlN62bdvMX/7yFxMTE2NCQkLMxRdfbAYPHmxOnz5tz+PK+Jj/lvK33nrLIb2osSF/bPz111/taflj+LRp00xycrIJDg42DRo0MG+88YbDd7Ozs82oUaNMrVq1TFhYmGnZsqVZtmyZGTRokElMTHTI++eff5rHHnvM1KtXz4SEhJiYmBjTuXNns379enueLVu2mPbt25vw8HAjyaSmphZb9/y2rl692iF92bJlJi0tzURFRZnQ0FCTmJhobrrpJvPxxx8bY1wbizds2GBuuOEGk5iYaEJDQ01MTIxJTU01y5cvL2TpFeSsDsYUfbxy/jIojCvHU8UdCwC+yGaMMZ4P8wEAAADPstlsGjp0qF5++WVvVwWAn+AZbgAAAAAALEDADQAAAACABXhpGgAAAPwCT1IC8DSucAMAAAAAYAECbgAAAAAALEDADQAAAACABbzyDHdeXp7279+vyMhIfsQeAODXjDHKyspSfHy8AgK8cx6ccRkAgHPcPS57JeDev3+/ateu7Y2iAQDwSfv27VNCQoJXymZcBgDAkbvGZa8E3JGRkZLONSIqKsobVQAAwCccP35ctWvXto+N3sC4DADAOe4el70ScOffrhYVFcXAfqHVE4qfnjbaM/UAAHiUN2/lLrfjsrMxU2LcBACUirvGZV6aBgAAAACABQi4AQAAAACwAAE3AAAAAAAWIOAGAAAAAMACBNwAAAAAAFjAK28pBwAA8Ah+/QMA4EUE3OUNBw4AAAAAUC5wSzkAAAAAABbgCre/cXaFXOIqOQAAAAC4AVe4AQAAAACwAFe4AQCA/+LdKAAACxFwAwAA3+TKY1AAAPgwAm538oWz5BycAAAAAIBPIOD2MRt2HS12ekrdGA/VBAAAAABQFrw0DQAAAAAACxBwAwAAAABgAW4p9ySerwYAAAAAv8EVbgAAAAAALEDADQAAAACABbilvJzhLeYAAAAAUD5whRsAAAAAAAsQcAMAAAAAYAECbgAAAAAALMAz3B7k7PlrT5TBM94AAAAA4Blc4QYAAAAAwAIE3AAAAAAAWICAGwAAAAAAC/AMNwAAQGmtnuA8T9po6+sBAPBJXOEGAAAAAMACBNwAAAAAAFiAW8rdyBM/++UTnN0+x61zAICKwpVbxgEAKAIBd0n4yaD7/Mrvi53+AGsNAAAAADhF6IQC2u59pfgMdWM8UxEAAAAAKMcIuAEAgHf4yZ1jAAD/xUvTAAAAAACwAFe44Xn8ZikAoJxw9kLUFBces3L2bpSyeuDqSy2dPwCg9Ai4/YxH3qTujlsEeRM6AAAAgHKOgBsAAMDPOf2FEq6iA0CpEHADAACUkkt3jl1sbR1cuWWdgBkAvIOAGwAAlEvueEzKlWewfZ3Tn/OUJD1jeT0AAAURcJ+PnydxC3e8YAYAAE/wyLtNysi1gBoA4IsIuAEAALyIgBoAKi4C7hIoD2f
BAQCAbyGgBgD/RcCNEivriQdXvs9t5wBQAfCoFgDAzxFwAwAAVHDO3mTu/Co8L10DgNIg4AaK4uzKTNpoz9QDAAAvc8vvdDsZV5/PubHsZQCAjyHghk9yetv5rlHFTnZ6SzrBMgDAj/AcOQB4BwH3eXgpmh9xx3OFvnAF3BfqAAAWYVz2Hf5yy7lbruQDwHn8JuB2tgOVpLYeqAdg50rQT8AMACgHNswq/s4zl1xctq+7cqxHEr2MVAAA/RZJREFUwAzA0/wm4OZWKv/i7KqIR96C7omr6J7AVXQAgAc4O1Z7fuXfPFQTC3niZDvjNuBT/CbgBgAAQPnlLCDfeLHzgLysb2t3FvQ/ELTEaR2cqQi3tVeENgDu4pWA2xgjSTp+/Lj7Zvrps8VOPvlntvvKQrn38db93q6CrqhT1fpC3s0ofvqVDzqfx8nTxU93th072TZdqkNZlbUOzr7vwjymrvqh2OlDO1/ivAx4hKeXVf5YmD82eoMl47LkfP/h7OuM3SiBJjteKvM8TjqZfvrkiWKnf/xz2Y8vTicUX8aEZZuLnX65kzpc0ars2/mX88YUO72Jk+8fz3bhGKiMY/OXe34rdvpXCUOcVqHM+3tfOAaqIDw5Nrt7XLYZL4zwP//8s2rXru3pYgEA8Fn79u1TQkKCV8pmXAYAwJG7xmWvBNx5eXnav3+/IiMjZbPZPF28zzh+/Lhq166tffv2KSoqytvV8Rn0S+Hol8LRL4WjX4rma31jjFFWVpbi4+MVEBDglTqUdlz2tb70FH9stz+2WaLdtNs/0G7Hdrt7XPbKLeUBAQFeO4vvi6Kiovxq5XYV/VI4+qVw9Evh6Jei+VLfVKlSxavll3Vc9qW+9CR/bLc/tlmi3f6GdvuXwtrtznHZO6fSAQAAAACo4Ai4AQAAAACwAAG3F4WGhiojI0OhoaHeropPoV8KR78Ujn4pHP1SNPrGffy1L/2x3f7YZol2027/QLutbbdXXpoGAAAAAEBFxxVuAAAAAAAsQMANAAAAAIAFCLgBAAAAALAAATcAAAAAABYg4AYAAAAAwAIE3GUwbdo0JSUlKSwsTK1atdK6deuKzb927Vq1atVKYWFhqlu3rmbMmFEgz5IlS9SoUSOFhoaqUaNGevvttx2mjx07VjabzeEvNjbWre0qK3f3y9atW3XjjTeqTp06stlseuGFF9xSrqd5o1/8cX2ZOXOmOnbsqOjoaEVHR+uqq67Sl19+WeZyPc0b/eKP68vSpUvVunVrXXTRRYqIiFDz5s312muvlbnc8sob45ov8NZ+yNusWN75FixYIJvNpuuvv97NtS47K9p97NgxDR06VHFxcQoLC1PDhg31/vvvW9WEUrGi3S+88ILq16+vSpUqqXbt2nrggQd0+vRpq5pQKiVp94EDB9S/f3/Vr19fAQEBGjFiRKH5Ktp+zZV2l4f9mhXLOl+Z9mkGpbJgwQITHBxsZs6cabZt22aGDx9uIiIizE8//VRo/l27dpnw8HAzfPhws23bNjNz5kwTHBxsFi9ebM+zfv16ExgYaMaPH2+2b99uxo8fb4KCgszGjRvteTIyMsxll11mDhw4YP87fPiw5e11lRX98uWXX5pRo0aZ+fPnm9jYWPP888+XuVxP81a/+OP60r9/fzN16lSTmZlptm/fboYMGWKqVKlifv7551KX62ne6hd/XF9Wr15tli5darZt22Z++OEH88ILL5jAwEDz4Ycflrrc8spb45q3eWt78zYr2p1vz549platWqZjx47muuuus7glJWNFu7Ozs03r1q1Njx49zGeffWb27Nlj1q1bZ7Zs2eKpZjllRbtff/11Exoaat544w2ze/du89FHH5m4uDgzYsQITzXLqZK2e/fu3eb+++83c+fONc2bNzfDhw8vkKci7tdcabev79esaHO+su7TCLhL6YorrjB33323Q1qDBg1Menp6ofkfeugh06BBA4e0u+66y7Rt29b+uW/fvuaaa65xyNOtWzfTr18/++eMjAzTrFmzMtbeOlb0y/kSExMLDSxLWq6neatf/H19McaYnJwcExkZaebOnVvqcj3NW/3C+nJOixYtzCOPPFLqcssrb41r3uat7c3brGp3Tk6Oad++vfnXv/5lBg0a5HMBtxXtnj59uqlbt645c+aM+yvsJla0e+jQoaZz584OeUaOHGk6dOjgplqXXVn236mpqYUGYRVxv3a+otp9IV/br1nVZnfs07ilvBTOnDmjr7/+Wl27dnVI79q1q9avX1/odzZs2FAgf7du3bRp0yadPXu22DwXznPnzp2Kj49XUlKS+vXrp127dpW1SW5hVb9YUa4neatf8vn7+nLq1CmdPXtWVatWLXW5nuStfsnnz+uLMUaffPKJduzYoSuvvLLU5ZZH3h7XvMXb25u3WNnucePGqXr16rrjjjvcX/Eysqrdy5cvV0pKioYOHaqaNWuqcePGGj9+vHJzc61pSAlZ1e4OHTro66+/tt9WvGvXLr3//vvq2bOnBa0oOav23xVxv1YavrRfs7LN7tinEXCXwpEjR5Sbm6uaNWs6pNesWVMHDx4s9DsHDx4sNH9OTo6OHDlSbJ7z59mmTRvNmzdPH330kWbOnKmDBw+qXbt2Onr0qDuaViZW9YsV5XqSt/pFYn2RpPT0dNWqVUtXXXVVqcv1JG/1i+S/68sff/yhypUrKyQkRD179tRLL72kq6++utTllkfeHNe8yZvbmzdZ1e7PP/9cs2bN0syZM62peBlZ1e5du3Zp8eLFys3N1fvvv69HHnlEzz77rJ566ilrGlJCVrW7X79+euKJJ9ShQwcFBwcrOTlZaWlpSk9Pt6YhJWTV/rsi7tdKw5f2a1a12V37tKAyfdvP2Ww2h8/GmAJpzvJfmO5snt27d7f/36RJE6WkpCg5OVlz587VyJEjS94IC1jRL1aU62ne6Bd/X18mTZqk+fPna82aNQoLCytTuZ7mjX7x1/UlMjJSW7Zs0YkTJ/TJJ59o5MiRqlu3rjp16lTqcssrb4xrvsBb+yFvc2e7s7KyNGDAAM2cOVPVqlVzf2XdyN3LOy8vTzVq1NArr7yiwMBAtWrVSvv379fkyZP12GOPubn2pefudq9Zs0ZPPfWUpk2bpjZt2uiHH37Q8OHDFRcXp0cffdTNtS89K/ZBFXG/VhK+ul9zZ5vduU8j4C6FatWqKTAwsMAZk8OHDxc4s5IvNja20PxBQUGKiYkpNk9R85SkiIgINWnSRDt37ixNU9zKqn6xolxP8la/FMaf1pdnnnlG48eP18cff6ymTZuWqVxP8la/FMZf1peAgABdcsklkqTmzZtr+/btmjBhgjp16uTz64u7+NK45km+tL15khXt3rp1q/bs2aNrr73WPj0vL0+SFBQUpB07dig5OdnNLSkZq5Z3XFycgoODFRgYaM/TsGFDHTx4UGfOnFFISIibW1IyVrX70Ucf1cCBA3XnnXdKOnei9uTJk/rb3/6mMWPGKCDAuzfSWrX/roj7tZLwxf2aFW3+8ccf3bZP45byUggJCVGrVq20cu
VKh/SVK1eqXbt2hX4nJSWlQP4VK1aodevWCg4OLjZPUfOUpOzsbG3fvl1xcXGlaYpbWdUvVpTrSd7ql8L4y/oyefJkPfHEE/rwww/VunXrMpfrSd7ql8L4y/pyIWOMsrOzS11ueeRL45on+dL25klWtLtBgwb69ttvtWXLFvtf7969lZaWpi1btqh27dqWtcdVVi3v9u3b64cffrAfjEvS999/r7i4OK8H25J17T516lSBoDowMFDm3EuZ3diC0rFq/10R92uu8tX9mhVtdus+rcSvWYMx5n+vnp81a5bZtm2bGTFihImIiDB79uwxxhiTnp5uBg4caM+f//MKDzzwgNm2bZuZNWtWgZ9X+Pzzz01gYKCZOHGi2b59u5k4cWKBnxl48MEHzZo1a8yuXbvMxo0bTa9evUxkZKS9XG+zol+ys7NNZmamyczMNHFxcWbUqFEmMzPT7Ny50+Vyvc1b/eKP68vTTz9tQkJCzOLFix1+3iorK8vlcr3NW/3ij+vL+PHjzYoVK8yPP/5otm/fbp599lkTFBRkZs6c6XK5FYW3xjVv89b25m1WtPtCvviWcivavXfvXlO5cmUzbNgws2PHDvPuu++aGjVqmCeffNLj7SuKFe3OyMgwkZGRZv78+WbXrl1mxYoVJjk52fTt29fj7StKSdttjLEfW7Vq1cr079/fZGZmmq1bt9qnV8T9mjHO2+3r+zUr2nyh0u7TCLjLYOrUqSYxMdGEhISYli1bmrVr19qnDRo0yKSmpjrkX7NmjWnRooUJCQkxderUMdOnTy8wz7feesvUr1/fBAcHmwYNGpglS5Y4TL/55ptNXFycCQ4ONvHx8aZPnz7Frhje4O5+2b17t5FU4O/C+RRXri/wRr/44/qSmJhYaL9kZGS4XK4v8Ea/+OP6MmbMGHPJJZeYsLAwEx0dbVJSUsyCBQtKVG5F4o1xzRd4az/kbVYs7/P5YsBtjDXtXr9+vWnTpo0JDQ01devWNU899ZTJycmxuikl4u52nz171owdO9YkJyebsLAwU7t2bXPvvfea33//3QOtcV1J213YtpuYmOiQpyLu15y1uzzs16xY1ucr7T7N9v+FAQAAAAAAN+IZbgAAAAAALEDADQAAAACABQi4AQAAAACwAAE3AAAAAAAWIOAGAAAAAMACBNwAAAAAAFiAgBsAAAAAAAsQcAMAAAAAYAECbgAAAAAALEDADQAAAACABQi4AQAAAACwAAE3AAAAAAAWIOAGAAAAAMACBNwAAAAAAFiAgBsAAAAAAAsQcAMAAAAAYAECbgAAAAAALEDADQAAAACABQi4AQAAAACwAAE3AAAAAAAWIOAGAAAAAMACBNwAAAAAAFiAgBsAAAAAAAsQcAMAAAAAYAECbgAAAAAALEDADQAAAACABQi4AQAAAACwAAE3AAAAAAAWIOAGAAAAAMACBNwAgP9j777jq6jy/4+/b0glJCGhJSEhFKVJFZRu6CAg9q4UXVGBFRW+CutqwALYC4q4iogNXBQRxUU6FkBBQKWIIEVYmqJIBEFCPr8/+OUul1Tgzm15PR8PHg8yc+7M+Zwzd858pl0AAAA4gIQbAAAAAAAHkHADAAAAAOAAEm4AAAAAABxAwg0AAAAAgANIuAEAAAAAcAAJNwAAAAAADiDhBkqZ1157TS6XS1u3bnVPe/vtt/XMM8/4rU4AAODUbN26VS6XS6+99tppfX706NGaMWOGV+sEID+XmZm/KwHAd37++Wf9+OOPatq0qaKioiRJvXr10po1azyScAAAELiOHDmiVatWqVatWqpUqdIpf75cuXK64oorTjthB1Ay4f6uAADfqlSp0mkNzL526NAhlS1bNt/0Y8eOKScnx32ywJvLBgAgWERFRally5b+roZXFDW2e2PM/vPPPxUTE3NGywBOF7eUA//f999/r2uvvVZVqlRRVFSUqlWrpj59+ujIkSPuMmvWrNHFF1+sxMRERUdHq0mTJpo8ebLHchYtWiSXy6UpU6bovvvuU2pqquLj49W5c2dt2LAh33pnz56tTp06KSEhQWXLllW9evU0ZswY9/wVK1bommuuUfXq1RUTE6Pq1avr2muv1bZt29xlvvnmG7lcLk2cODHf8v/zn//I5XJp5syZkvLfUt6+fXvNmjVL27Ztk8vlcv8zM5199tnq1q1bvmX+8ccfSkhI0KBBg4psUzPT+PHj1aRJE8XExCgxMVFXXHGFNm/e7FGuffv2atCggT799FO1bt1aZcuW1U033eS+Xe6xxx7Tww8/rBo1aigqKkoLFy6UJM2cOVOtWrVS2bJlFRcXpy5dumjp0qUeyx45cqRcLpdWrlypK664QomJiapVq1aR9QYAlFxpGT8laePGjbruuutUuXJlRUVFqV69enrhhRdK1E4ul0uDBw/WSy+9pNq1aysqKkr169fX1KlT85UtSXsVdEt53pi3du1aXXvttUpISFCVKlV000036ffff/eoy8GDBzV58mT3uN++fXtJxxPcYcOGqUaNGoqOjlZSUpKaN2+uKVOmFBvj7t27deuttyotLU2RkZGqUaOGRo0apZycnHz1LmhsL2rMPnz4sEaMGKEaNWooMjJSVatW1aBBg7R//36POlSvXl29evXS9OnT1bRpU0VHR2vUqFHF1h1wjAGw1atXW7ly5ax69eo2YcIEmz9/vr355pt21VVX2YEDB8zM7Pvvv7e4uDirVauWvf766zZr1iy79tprTZI9+uij7mUtXLjQJFn16tXt+uuvt1mzZtmUKVOsWrVqdvbZZ1tOTo677CuvvGIul8vat29vb7/9ts2bN8/Gjx9vAwcOdJeZNm2aPfDAA/b+++/b4sWLberUqZaZmWmVKlWyn3/+2V2uadOm1qZNm3yxXXXVVVa5cmU7evSomZlNmjTJJNmWLVvMzGzt2rXWpk0bS05OtqVLl7r/mZk9++yz5nK57IcffvBY5gsvvGCSbO3atUW26y233GIRERE2dOhQmz17tr399ttWt25dq1Kliu3evdtdLjMz05KSkiw9Pd3GjRtnCxcutMWLF9uWLVtMklWtWtU6dOhg7777rs2ZM8e2bNlib731lkmyrl272owZM+ydd96xZs2aWWRkpH322WfuZWdlZZkky8jIsHvvvdfmzp1rM2bMKLLeAICSKU3j59q1ay0hIcEaNmxor7/+us2ZM8eGDh1qYWFhNnLkyGLbSpKlp6db/fr1bcqUKTZz5kzr3r27SbJp06a5y5W0vfLGyEmTJrmn5Y15derUsQceeMDmzp1rTz31lEVFRVn//v3d5ZYuXWoxMTHWo0cP97ifN6bfeuutVrZsWXvqqads4cKF9tFHH9nYsWNt3LhxRca3a9cuS09Pt4yMDHvppZds3rx59tBDD1lUVJT169cvX70LGtsLG7Nzc3OtW7duFh4ebvfff7/NmTPHnnjiCYuNjbWmTZva4cOH3cvPyMiwlJQUq1mzpr366qu2cOFC++qrr4rtH8ApJNyAmXXs2NHKly9ve/fuLbTMNddcY1FRUfbTTz95TL/wwgutb
Nmytn//fjP73wFDjx49PMr9+9//NknuZDY7O9vi4+Otbdu2lpubW+K65uTk2B9//GGxsbH27LPPuqc/99xzJsk2bNjgnvbrr79aVFSUDR061D3t5ITbzKxnz56WkZGRb10HDhywuLg4GzJkiMf0+vXrW4cOHYqs59KlS02SPfnkkx7Tt2/fbjExMXbPPfe4p2VmZpokmz9/vkfZvEG5Vq1a9tdff7mnHzt2zFJTU61hw4Z27Ngx9/Ts7GyrXLmytW7d2j0tb/B+4IEHiqwvAODUlabxs1u3bpaWlma///67x3IHDx5s0dHR9uuvvxa5fkkWExPjccI5JyfH6tata2eddZZ7Wknbq6iE+7HHHvP47MCBAy06OtqjvWJjY61v37756tmgQQO75JJLioylILfeequVK1fOtm3b5jH9iSee8DhJX9jYfmL9Tx6zZ8+eXWBc77zzjkmyf/3rX+5pGRkZVqZMGY/+BPyJW8pR6h06dEiLFy/WVVddVeSzzQsWLFCnTp2Unp7uMb1fv346dOhQvluZe/fu7fF3o0aNJMl9K9uSJUt04MABDRw4UC6Xq9D1/vHHH7r33nt11llnKTw8XOHh4SpXrpwOHjyo9evXu8tdf/31ioqK8ri1bMqUKTpy5Ij69+9fdCMUIi4uTv3799drr72mgwcPSjreDuvWrdPgwYOL/OxHH30kl8ulG264QTk5Oe5/ycnJaty4sRYtWuRRPjExUR07dixwWb1791ZERIT77w0bNmjnzp268cYbFRb2v91YuXLldPnll2vZsmU6dOiQxzIuv/zyUwkdAFCM0jR+Hj58WPPnz9ell16qsmXLeoxrPXr00OHDh7Vs2bIiWuu4Tp06qUqVKu6/y5Qpo6uvvlqbNm3Sjh07Tqu9ClJQGx4+fFh79+4t9rPnn3++/vOf/2j48OFatGiR/vzzz2I/Ix0f9zt06KDU1FSP9rnwwgslSYsXL85XxxPH9hOdPGYvWLBA0vE2ONGVV16p2NhYzZ8/32N6o0aNVLt27RLVG3AaCTdKvd9++03Hjh1TWlpakeX27dunlJSUfNNTU1Pd809UoUIFj7/zXgSSN3D9/PPPklTseq+77jo9//zz+tvf/qZPPvlEX331lZYvX65KlSp5DIJJSUnq3bu3Xn/9dR07dkzS8ee1zz//fJ1zzjlFrqMof//735Wdna233npLkvT8888rLS1NF198cZGf27Nnj8xMVapUUUREhMe/ZcuW6ZdffvEoX1DbFjYvr60L64/c3Fz99ttvJV4+AODUlabxc9++fcrJydG4cePyjWk9evSQpHzjWkGSk5MLnZbXDqfaXgUprg2L8txzz+nee+/VjBkz1KFDByUlJemSSy7Rxo0bi/zcnj179OGHH+Zrn7w2PNNxPzw8PN+JHZfLpeTk5HxtwpiPQMJbylHqJSUlqUyZMu4zy4WpUKGCdu3alW/6zp07JUkVK1Y8pfXmDRpFrff333/XRx99pKysLA0fPtw9/ciRI/r111/zle/fv7+mTZumuXPnqlq1alq+fLlefPHFU6rXyc466yxdeOGFeuGFF3ThhRdq5syZGjVqlMqUKVPk5ypWrCiXy6XPPvuswLeOnjytqKsUJ8/LO5AorD/CwsKUmJhY4uUDAE5daRo/ExMTVaZMGd14442FvjC0Ro0axdZ99+7dhU7LG9u83V6nKjY2VqNGjdKoUaO0Z88e99Xuiy66SN9//32hn6tYsaIaNWqkRx55pMD5eScM8pzquJ+Tk6Off/7ZI+k2M+3evVvnnXdeiZcN+BpXuFHqxcTEKDMzU9OmTSvy7HSnTp20YMEC94CX5/XXX1fZsmVP+ac5WrdurYSEBE2YMEFmVmCZvLeFn5ycvvLKK+6z8Cfq2rWrqlatqkmTJmnSpEmKjo7WtddeW2xdoqKiijzrPWTIEH377bfq27evypQpo1tuuaXYZfbq1Utmpv/+979q3rx5vn8NGzYsdhmFqVOnjqpWraq3337bo+0OHjyo9957z/3mcgCAc0rT+Fm2bFl16NBBq1atUqNGjQoc106+qlyQ+fPna8+ePe6/jx07pnfeeUe1atVyX7H3dnsVprixX5KqVKmifv366dprr9WGDRvyPa51ol69emnNmjWqVatWge1zcsJ9Kjp16iRJevPNNz2mv/feezp48KB7PhCIuMINSHrqqafUtm1btWjRQsOHD9dZZ52lPXv2aObMmXrppZcUFxenrKws9/NJDzzwgJKSkvTWW29p1qxZeuyxx5SQkHBK6yxXrpyefPJJ/e1vf1Pnzp11yy23qEqVKtq0aZO++eYbPf/884qPj9cFF1ygxx9/XBUrVlT16tW1ePFiTZw4UeXLl8+3zDJlyqhPnz566qmnFB8fr8suu6xE9WrYsKGmT5+uF198Uc2aNVNYWJiaN2/unt+lSxfVr19fCxcu1A033KDKlSsXu8w2bdpowIAB6t+/v1asWKELLrhAsbGx2rVrlz7//HM1bNhQt99++ym1WZ6wsDA99thjuv7669WrVy/deuutOnLkiB5//HHt379fY8eOPa3lAgBOTWkaP5999lm1bdtW7dq10+23367q1asrOztbmzZt0ocffuh+zrgoFStWVMeOHXX//fcrNjZW48eP1/fff+/x02Debq/CNGzYUIsWLdKHH36olJQUxcXFqU6dOmrRooV69eqlRo0aKTExUevXr9cbb7xR7MnsBx98UHPnzlXr1q11xx13qE6dOjp8+LC2bt2qjz/+WBMmTCj2MYDCdOnSRd26ddO9996rAwcOqE2bNvr222+VlZWlpk2b6sYbbzzdZgCc57/3tQGBZd26dXbllVdahQoVLDIy0qpVq2b9+vXz+KmJ7777zi666CJLSEiwyMhIa9y4scfbQc3+95bVE3/iw6zgt4mamX388ceWmZlpsbGxVrZsWatfv77Hz37s2LHDLr/8cktMTLS4uDjr3r27rVmzxjIyMgp8u+gPP/xgkkySzZ07N9/8gt5S/uuvv9oVV1xh5cuXN5fLZQXtGkaOHGmSbNmyZUW0Yn6vvvqqtWjRwmJjYy0mJsZq1aplffr0sRUrVrjLZGZm2jnnnJPvs3lt9vjjjxe47BkzZliLFi0sOjraYmNjrVOnTvbFF194lMl74+mJPwEDAPCe0jJ+5tXlpptusqpVq1pERIRVqlTJWrdubQ8//HCx7STJBg0aZOPHj7datWpZRESE1a1b19566618ZUvSXkW9pfzkMa+gsX/16tXWpk0bK1u2rEmyzMxMMzMbPny4NW/e3BITEy0qKspq1qxpd911l/3yyy/Fxvjzzz/bHXfcYTVq1LCIiAhLSkqyZs2a2X333Wd//PGHR70LGtuLGrP//PNPu/feey0jI8MiIiIsJSXFbr/9dvvtt988ymVkZFjPnj2LrSvgKy6zQu7FAYATNG/eXC6XS8uXL/d3VQAACDoul0uDBg3S888/7++qAPAhbikHUKgDBw5ozZo1+uijj/T111/r/fff93eVAAAAgKBBwg2gUCtXrlSHDh1UoUIFZWVl
6ZJLLvF3lQAAAICgwS3lAAAAAAA4gJ8FAwAAAADAASTcAAAAAAA4wC/PcOfm5mrnzp2Ki4uTy+XyRxUAAAgIZqbs7GylpqYqLMw/58EZlwEAOM7b47JfEu6dO3cqPT3dH6sGACAgbd++XWlpaX5ZN+MyAACevDUu+yXhjouLk3Q8iPj4eH9UAQCAgHDgwAGlp6e7x0Z/YFwGAOA4b4/Lfkm4825Xi4+PZ2AHAEDy663cjMsAAHjy1rjM73CfaOGYoud3GOGbegAAgNDB8QUAlFqlJ+EubrADAAAAAMCL+FkwAAAAAAAcUHqucAMAAJwqbgcHAJwBrnADAAAAAOAArnADAACcLt4RAwAoAle4AQAAAABwAAk3AAAAAAAO4JZyAABQenFLOADAQVzhBgAAAADAASTcAAAAAAA4gIQbAAAAAAAH8Aw3AACAPxX3HHmHEb6pBwDA67jCDQAAAACAA7jCDQAAEMi4Ag4AQYsr3AAAAAAAOICEGwAAAAAAB4TOLeXF3W4FAAAAAIAPhU7CDQAAgAI9PfeHIuff1aW2j2oCAKULt5QDAAAAAOAAEm4AAAAAABxAwg0AAAAAgAN4hhsAACDEtfzpX8WUeMIn9QCA0oYr3AAAAAAAOICEGwAAAAAAB5BwAwAAAADgAJ7hBgAAKOX4nW4AcAYJ9ylgMAIAAAAAlBQJ9yngDZ8AACDgLBzj7xoAAArBM9wAAAAAADiAK9xexC3nAADA15Zu3ufvKgAACkHCDQAAQlcQ3G5dXMLcqmYFH9WkcMVdVCgJLjwAKI24pRwAAAAAAAdwhduHSnJ2mLO/AACUUBBcvfaG0nLLOI/mAQhFXOEGAAAAAMABXOE+QWk5gwwAgE8UdwW6wwjf1MPPQuH4orifRl1WbYCPagIAwYUr3AAAAAAAOICEGwAAAAAAB5BwAwAAAADgAJ7hBgAAgOO88VveABBsSLh9qLgXjkjS03OLfukIP4kBAAAAAMGBhNuLSpJQAwAAlEa86RxAaUTCDQAAUMpx0QAAnMFL0wAAAAAAcABXuANMcWeYecYbAAAAAIIDCTcAAADOiC9uSS/uLedcdAAQiEi4AQBAYFo4xt810NLN+4qc36pmBR/VBMXxxs+OOZ20l6SOnDgAQgsJNwAAwGkqLiFHcFk6cViR81vd/ISPagIgVJSahDtUBkSe8QYAAACA4FBqEm4cx61MAAAgEJ3pc+CB8Dve3rit/UzXwXEcEFhIuJFPsTvy8PeKXkCHEV6sDQAAzgmVO+AQGIo7aeCTkwLFvfuA4zTAp/yScJuZJOnAgQPeW+jBw0XP/vOI99YVwBpuGOf4OuYVV2Dt34ucvTytf7HrGNTxrJJXCACCWN5YmDc2+oMj47JU7Nisj7K8u74CfLX1V8fXgcDgjWOgg8XMHzNjZZHzzyvmePPwwT+KrcOBYr4XDYvZpg9UTypy/gvFxCB54Tjs0yeLrkPOxcXXIfyDogtcMPRUanRaXliw6Yw+z/GsFxWzTXlze/D2uOwyP4zwO3bsUHp6uq9XCwBAwNq+fbvS0tL8sm7GZQAAPHlrXPZLwp2bm6udO3cqLi5OLpfrjJd34MABpaena/v27YqPj/dCDf2LeAIb8QQ24glsoRaPdOYxmZmys7OVmpqqsLAwB2pYPG+Py1Jo9rUv0G6nh3Y7PbTb6aHdTk+wtJu3x2W/3FIeFhbmyFn8+Pj4gO68U0U8gY14AhvxBLZQi0c6s5gSEhK8XJtT49S4LIVmX/sC7XZ6aLfTQ7udHtrt9ARDu3lzXPbPqXQAAAAAAEIcCTcAAAAAAA4IiYQ7KipKWVlZioqK8ndVvIJ4AhvxBDbiCWyhFo8UmjF5A+1yemi300O7nR7a7fTQbqentLabX16aBgAAAABAqAuJK9wAAAAAAAQaEm4AAAAAABxAwg0AAAAAgANIuAEAAAAAcAAJNwAAAAAADgjYhHv8+PGqUaOGoqOj1axZM3322WdFll+8eLGaNWum6Oho1axZUxMmTMhX5r333lP9+vUVFRWl+vXr6/3333eq+vl4O57XXntNLpcr37/Dhw87GYbbqcSza9cuXXfddapTp47CwsJ05513FlguWPqnJPEEU/9Mnz5dXbp0UaVKlRQfH69WrVrpk08+yVcuWPqnJPEEU/98/vnnatOmjSpUqKCYmBjVrVtXTz/9dL5ywdI/JYknmPrnRF988YXCw8PVpEmTfPP82T/eEmrjsq94u93Wrl2ryy+/XNWrV5fL5dIzzzzjYO39x9vt9vLLL6tdu3ZKTExUYmKiOnfurK+++srJEPzC2+02ffp0NW/eXOXLl1dsbKyaNGmiN954w8kQ/MKJ/VueqVOnyuVy6ZJLLvFyrf0v1PIbx1gAmjp1qkVERNjLL79s69atsyFDhlhsbKxt27atwPKbN2+2smXL2pAhQ2zdunX28ssvW0REhL377rvuMkuWLLEyZcrY6NGjbf369TZ69GgLDw+3ZcuWBWU8kyZNsvj4eNu1a5fHP1841Xi2bNlid9xxh02ePNmaNGliQ4YMyVcmmPqnJPEEU/8MGTLEHn30Ufvqq6/shx9+sBEjRlhERIStXLnSXSaY+qck8QRT/6xcudLefvttW7NmjW3ZssXeeOMNK1u2rL300kvuMsHUPyWJJ5j6J8/+/futZs2a1rVrV2vcuLHHPH/2j7eE2rjsK06021dffWXDhg2zKVOmWHJysj399NM+isZ3nGi36667zl544QVbtWqVrV+/3vr3728JCQm2Y8cOX4XlOCfabeHChTZ9+nRbt26dbdq0yZ555hkrU6aMzZ4921dhOc6JdsuzdetWq1q1qrVr184uvvhihyPxrVDLb5wUkAn3+eefb7fddpvHtLp169rw4cMLLH/PPfdY3bp1Pabdeuut1rJlS/ffV111lXXv3t2jTLdu3eyaa67xUq0L50Q8kyZNsoSEBK/XtSRONZ4TZWZmFpigBlP/nKiweIK1f/LUr1/fRo0a5f47WPsnz8nxBHv/XHrppXbDDTe4/w72/jk5nmDsn6uvvtr++c9/WlZWVr6E25/94y2hNi77ihPtdqKMjIyQTLidbjczs5ycHIuLi7PJkyefeYUDhC/azcysadOm9s9//vPMKhtAnGq3nJwca9Omjb3yyivWt2/fkEu4Qy2/cVLA3VL+119/6euvv1bXrl09pnft2lVLliwp8DNLly7NV75bt25asWKFjh49WmSZwpbpLU7FI0l//PGHMjIylJaWpl69emnVqlXeD+AkpxNPSQRT/5RUsPZPbm6usrOzlZSU5J4WzP1TUDxS8PbPqlWrtGTJEmVmZrqnBXP/FBSPFFz9M2nSJP3444/KysoqcL6/+sdbQm1c9hUnx/9Q5qt2O3TokI4ePZpvbAhWvmg3M9P8+fO1YcMGXXDBBd6rvB852W4PPvigKlWqpJtvvtn7FfezUMtvnBZwCfcvv/yiY8eOqUqVKh7Tq1Spot27dxf4md27dxdYPicnR7/88kuRZQpbprc4FU/dunX12muvaebMmZoyZYqio6PVpk0bbdy
40ZlA/r/Tiackgql/SiKY++fJJ5/UwYMHddVVV7mnBXP/FBRPMPZPWlqaoqKi1Lx5cw0aNEh/+9vf3POCsX+KiieY+mfjxo0aPny43nrrLYWHhxdYxl/94y2hNi77ilPtFup81W7Dhw9X1apV1blzZ+9U3M+cbLfff/9d5cqVU2RkpHr27Klx48apS5cu3g/CD5xqty+++EITJ07Uyy+/7EzF/SzU8hunFXx0EABcLpfH32aWb1px5U+efqrL9CZvx9OyZUu1bNnSPb9NmzY699xzNW7cOD333HPeqvYp1e9M2zKY+qc4wdo/U6ZM0ciRI/XBBx+ocuXKXlmmN3g7nmDsn88++0x//PGHli1bpuHDh+uss87Stddee0bL9BZvxxMs/XPs2DFdd911GjVqlGrXru2VZQayUBuXfcWJdisNnGy3xx57TFOmTNGiRYsUHR3thdoGDifaLS4uTqtXr9Yff/yh+fPn6+6771bNmjXVvn1771Xcz7zZbtnZ2brhhhv08ssvq2LFit6vbAAJtfzGKQGXcFesWFFlypTJd3Zk7969+c6K5ElOTi6wfHh4uCpUqFBkmcKW6S1OxXOysLAwnXfeeY6fATqdeEoimPrndARD/7zzzju6+eabNW3atHxn/IOxf4qK52TB0D81atSQJDVs2FB79uzRyJEj3QlqMPZPUfGcLFD7Jzs7WytWrNCqVas0ePBgSccfYTAzhYeHa86cOerYsaPf+sdbQm1c9hVfjf+hxul2e+KJJzR69GjNmzdPjRo18m7l/cjJdgsLC9NZZ50lSWrSpInWr1+vMWPGhETC7US7rV27Vlu3btVFF13knp+bmytJCg8P14YNG1SrVi0vR+JboZbfOC3gbimPjIxUs2bNNHfuXI/pc+fOVevWrQv8TKtWrfKVnzNnjpo3b66IiIgiyxS2TG9xKp6TmZlWr16tlJQU71S8EKcTT0kEU/+cjkDvnylTpqhfv356++231bNnz3zzg61/iovnZIHePyczMx05csT9d7D1z8lOjqeg+YHYP/Hx8fruu++0evVq97/bbrtNderU0erVq9WiRQtJ/usfbwm1cdlXfDX+hxon2+3xxx/XQw89pNmzZ6t58+ber7wf+XJ7K26fHUycaLe6devmGxt69+6tDh06aPXq1UpPT3csHl8JtfzGcQ6+kO205b1mfuLEibZu3Tq78847LTY21rZu3WpmZsOHD7cbb7zRXT7vNfN33XWXrVu3ziZOnJjvNfNffPGFlSlTxsaOHWvr16+3sWPH+vxnc7wZz8iRI2327Nn2448/2qpVq6x///4WHh5uX375ZcDFY2a2atUqW7VqlTVr1syuu+46W7Vqla1du9Y9P5j6pyTxBFP/vP322xYeHm4vvPCCx08w7N+/310mmPqnJPEEU/88//zzNnPmTPvhhx/shx9+sFdffdXi4+Ptvvvuc5cJpv4pSTzB1D8nK+gt5f7sH28JtXHZV5xotyNHjrjHoJSUFBs2bJitWrXKNm7c6PP4nOJEuz366KMWGRlp7777rsfYkJ2d7fP4nOJEu40ePdrmzJljP/74o61fv96efPJJCw8Pt5dfftnn8TnFiXY7WSi+pTzU8hsnBWTCbWb2wgsvWEZGhkVGRtq5555rixcvds/r27evZWZmepRftGiRNW3a1CIjI6169er24osv5lvmtGnTrE6dOhYREWF169a19957z+kw3Lwdz5133mnVqlWzyMhIq1SpknXt2tWWLFnii1DM7NTjkZTvX0ZGhkeZYOqf4uIJpv7JzMwsMJ6+fft6LDNY+qck8QRT/zz33HN2zjnnWNmyZS0+Pt6aNm1q48ePt2PHjnksM1j6pyTxBFP/nKyghNvMv/3jLaE2LvuKt9tty5YtBe7jitoug5G32y0jI6PAdsvKyvJBNL7j7Xa777777KyzzrLo6GhLTEy0Vq1a2dSpU30Rik85sX87USgm3Gahl984xWX2/59WBwAAAAAAXhNwz3ADAAAAABAKSLgBAAAAAHAACTcAAAAAAA4g4QYAAAAAwAEk3AAAAAAAOICEGwAAAAAAB5BwAwAAAADgABJuAAAAAAAcQMINAAAAAIADSLgBAAAAAHAACTcAAAAAAA4g4QYAAAAAwAEk3AAAAAAAOICEGwAAAAAAB5BwAwAAAADgABJuAAAAAAAcQMINAAAAAIADSLgBAAAAAHAACTcAAAAAAA4g4QYAAAAAwAEk3AAAAAAAOICEGwAAAAAAB5BwAwAAAADgABJuAAAAAAAcQMINAAAAAIADSLgBAAAAAHAACTcAAAAAAA4g4QYAAAAAwAEk3AAAAAAAOICEGwAAAAAAB5BwAwAAAADgABJuAAAAAAAcQMINAAAAAIADSLgBAAAAAHAACTcAAAAAAA4g4QYAAAAAwAEk3AAAAAAAOICEGyhlDh06pJEjR2rRokX+rgoAIIC98847OueccxQTEyOXy6XVq1f7bN3r1q3TyJEjtXXr1tNexmuvvSaXy3Xayxg5cqRcLpfHtOrVq6tfv36nXadAcaZtA6Dkwv1dAQC+dejQIY0aNUqS1L59e/9WBgAQkH7++WfdeOON6t69u8aPH6+oqCjVrl3bZ+tft26dRo0apfbt26t69eo+W29x3n//fcXHx/u7GmesZ8+eWrp0qVJSUvxdFSDkkXADKNKhQ4dUtmxZv63/6NGjcrlcCg/Pv7s607qZmQ4fPqyYmJgzqSIAhJwffvhBR48e1Q033KDMzMwiy/p7nPClpk2b+rsKXlGpUiVVqlTJ39U4ZYVta94Yz//8809FR0fnu6sBOFPcUo5S7eeff9aAAQOUnp6uqKgoVapUSW3atNG8efMkSQ899JDCw8O1ffv2fJ+96aabVKFCBR0+fFjS8dvMevXqpY8++khNmzZVTEyM6tWrp48++kjS8du36tWrp9jYWJ1//vlasWKFx/L69euncuXK6fvvv1e3bt0UGxurlJQUjR07VpK0bNkytW3bVrGxsapdu7YmT56cr067d+/WrbfeqrS0NEVGRqpGjRoaNWqUcnJyJElbt251D7CjRo2Sy+WSy+Vy3x6Xd/vcypUrdcUVVygxMVG1atXSG2+8IZfLpaVLl+Zb54MPPqiIiAjt3LmzyLbeuHGjrrvuOlWuXFlRUVGqV6+eXnjhBY8yixYtksvl0htvvKGhQ4eqatWqioqK0qZNm9zt891336lr166Ki4tTp06dJEm//vqrBg4cqKpVqyoyMlI1a9bUfffdpyNHjngs3+VyafDgwZowYYLq1aunqKioAtsRAEqzfv36qW3btpKkq6++Wi6Xy31HVFH74rlz5+riiy9WWlqaoqOjddZZZ+nWW2/VL7/8km8d33//va699lpVqVJFUVFRqlatmvr06aMjR47otdde05VXXilJ6tChg3useu211055PSU1a9YsNWnSRFFRUapRo4aeeOKJAsudfEt53rj19ttv695771VKSorKlSuniy66SHv27FF2drYGDBigihUrqmLFiurfv7/++OMPj2WamcaPH68mTZooJi
ZGiYmJuuKKK7R582aPcu3bt1eDBg20fPlytWvXTmXLllXNmjU1duxY5ebmusvl5ubq4YcfVp06dRQTE6Py5curUaNGevbZZ91lCrul/NVXX1Xjxo0VHR2tpKQkXXrppVq/fr1HmbxtYNOmTerRo4fKlSun9PR0DR06NN+4W5h33nlHrVq1UmxsrMqVK6du3bpp1apVBa6noG2tqPH8888/V6dOnRQXF6eyZcuqdevWmjVrlsey8+KfM2eObrrpJlWqVElly5Ytcf2BU2JAKdatWzerVKmS/etf/7JFixbZjBkz7IEHHrCpU6eamdmePXssKirK7rvvPo/P7du3z2JiYuz//u//3NMyMjIsLS3NGjRoYFOmTLGPP/7YWrRoYREREfbAAw9YmzZtbPr06fb+++9b7dq1rUqVKnbo0CH35/v27WuRkZFWr149e/bZZ23u3LnWv39/k2QjRoyw2rVr28SJE+2TTz6xXr16mSRbsWKF+/O7du2y9PR0y8jIsJdeesnmzZtnDz30kEVFRVm/fv3MzOzw4cM2e/Zsk2Q333yzLV261JYuXWqbNm0yM7OsrCyTZBkZGXbvvffa3LlzbcaMGXbkyBFLTk6266+/3qMdjh49aqmpqXbllVcW2c5r1661hIQEa9iwob3++us2Z84cGzp0qIWFhdnIkSPd5RYuXGiSrGrVqnbFFVfYzJkz7aOPPrJ9+/ZZ3759LSIiwqpXr25jxoyx+fPn2yeffGJ//vmnNWrUyGJjY+2JJ56wOXPm2P3332/h4eHWo0cPj3rkLbtRo0b29ttv24IFC2zNmjXFbicAUJps2rTJXnjhBZNko0ePtqVLl9ratWvNzArdF5uZvfjiizZmzBibOXOmLV682CZPnmyNGze2OnXq2F9//eVe/urVq61cuXJWvXp1mzBhgs2fP9/efPNNu+qqq+zAgQO2d+9eGz16tEmyF154wT1W7d2795TWM2nSJJNkW7ZsKTLeefPmWZkyZaxt27Y2ffp0mzZtmp133nlWrVo1O/lQOSMjw/r27ev+O2/cysjIsH79+tns2bNtwoQJVq5cOevQoYN16dLFhg0bZnPmzLFHH33UypQpY3//+989lnnLLbdYRESEDR061GbPnm1vv/221a1b16pUqWK7d+92l8vMzLQKFSrY2WefbRMmTLC5c+fawIEDTZJNnjzZXW7MmDFWpkwZy8rKsvnz59vs2bPtmWee8RhvC2qbvDa/9tprbdasWfb6669bzZo1LSEhwX744Qd3uROPV5544gmbN2+ePfDAA+ZyuWzUqFFFtrWZ2SOPPGIul8tuuukm++ijj2z69OnWqlUri42NdW9neespbFsrbDxftGiRRUREWLNmzeydd96xGTNmWNeuXc3lcrmP7U6Mv2rVqjZgwAD7z3/+Y++++67l5OQUW3/gVJFwo1QrV66c3XnnnUWW6du3r1WuXNmOHDninvboo49aWFiYx0CVkZFhMTExtmPHDve01atXmyRLSUmxgwcPuqfPmDHDJNnMmTM91iPJ3nvvPfe0o0ePWqVKlUySrVy50j193759VqZMGbv77rvd02699VYrV66cbdu2zaP+TzzxhElyD2I///yzSbKsrKx8seYl3A888ECB8yIjI23Pnj3uae+8845JssWLFxfYdnm6detmaWlp9vvvv3tMHzx4sEVHR9uvv/5qZv87cLngggvyLSOvfV599VWP6RMmTDBJ9u9//9tj+qOPPmqSbM6cOe5pkiwhIcG9PgBAwfL2x9OmTfOYXti++GS5ubl29OhR27Ztm0myDz74wD2vY8eOVr58eXcCXZBp06aZJFu4cOFpr6ekCXeLFi0sNTXV/vzzT/e0AwcOWFJSUokT7osuusij3J133mmS7I477vCYfskll1hSUpL776VLl5oke/LJJz3Kbd++3WJiYuyee+5xT8vMzDRJ9uWXX3qUrV+/vnXr1s39d69evaxJkyZFxnxy2/z2228WExOT70T1Tz/9ZFFRUXbddde5p+VtAyePuz169LA6deoUud6ffvrJwsPD8510yM7OtuTkZLvqqqvyraegba2w8bxly5ZWuXJly87Odk/LycmxBg0aWFpamuXm5nrE36dPnyLrC3gDt5SjVDv//PP12muv6eGHH9ayZct09OjRfGWGDBmivXv3atq0aZKO36r14osvqmfPnvle5NKkSRNVrVrV/Xe9evUkHb8N7MRnjvKmb9u2zePzLpdLPXr0cP8dHh6us846SykpKR7PjSUlJaly5coen//oo4/UoUMHpaamKicnx/3vwgsvlCQtXry4xO1y+eWX55t2++23S5Jefvll97Tnn39eDRs21AUXXFDosg4fPqz58+fr0ksvVdmyZT3q1qNHDx0+fFjLli0rdv2FzVuwYIFiY2N1xRVXeEzPu+Vv/vz5HtM7duyoxMTEQpcPACheQfvpvXv36rbbblN6errCw8MVERGhjIwMSXLflnzo0CEtXrxYV1111Wk/Q1yS9ZTUwYMHtXz5cl122WWKjo52T4+Li9NFF11U4uX06tXL4++8cb5nz575pv/666/u28o/+ugjuVwu3XDDDR7jY3Jysho3bpzvF0WSk5N1/vnne0xr1KiRx/HA+eefr2+++UYDBw7UJ598ogMHDhRb/6VLl+rPP//M9wb29PR0dezYMd9Y6nK58rXPyfUoyCeffKKcnBz16dPHI97o6GhlZmYW+AsqhR0TnDyeHzx4UF9++aWuuOIKlStXzj29TJkyuvHGG7Vjxw5t2LChRMsGvImEG6XaO++8o759++qVV15Rq1atlJSUpD59+mj37t3uMk2bNlW7du3czxt/9NFH2rp1qwYPHpxveUlJSR5/R0ZGFjk97/nvPGXLlvUY8PPKnvz5vOknfn7Pnj368MMPFRER4fHvnHPOkaRTeratoLeWVqlSRVdffbVeeuklHTt2TN9++60+++yzAtvhRPv27VNOTo7GjRuXr255JxdOrlthb00tW7ZsvrfD7tu3T8nJyfleclK5cmWFh4dr3759JVo2AKBkCtoX5+bmqmvXrpo+fbruuecezZ8/X1999ZX7hOqff/4pSfrtt9907NgxpaWlnda6S7qekvrtt9+Um5ur5OTkfPMKmlaY0x3/9+zZIzNTlSpV8o2Ry5Ytyzc+VqhQId+6o6KiPOIeMWKEnnjiCS1btkwXXnihKlSooE6dOuV7d8yJ8sbKgsbI1NTUfGNpQccrUVFR+Y5rTrZnzx5J0nnnnZcv3nfeeSdfvAVta3lOrutvv/0mMys0BkkcE8AveEs5SrWKFSvqmWee0TPPPKOffvpJM2fO1PDhw7V3717Nnj3bXe6OO+7QlVdeqZUrV+r5559X7dq11aVLFz/WPL+KFSuqUaNGeuSRRwqcnzfYlERhb+gcMmSI3njjDX3wwQeaPXu2ypcvr+uvv77IZSUmJrrPLg8aNKjAMjVq1CjR+guaXqFCBX355ZcyM4/5e/fuVU5OjipWrFiiZQMASqag/eiaNWv0zTff6LXXXlPfvn3d0zdt2uRRL
ikpSWXKlNGOHTtOa90lXU9JJSYmyuVyeZxoz1PQNG+rWLGiXC6XPvvsM0VFReWbX9C04oSHh+vuu+/W3Xffrf3792vevHn6xz/+oW7dumn79u0FvuU7L5HftWtXvnk7d+7MN5aerrzlvPvuu+67EopS1Jh98rzExESFhYUVGsOJ6y/J8gFvIeEG/r9q1app8ODBmj9/vr744guPeZdeeqmqVaumoUOHavHixXr66acDbifdq1cvffzxx6pVq1aRt0znDd6nehVAkpo1a6bWrVvr0Ucf1Zo1azRgwADFxsYW+ZmyZcuqQ4cOWrVqlRo1auQ+u+8tnTp10r///W/NmDFDl156qXv666+/7p4PAHBW3ph4coL40ksvefwdExOjzMxMTZs2TY888kihiVxhY1VJ11NSeb8cMn36dD3++OPuq7bZ2dn68MMPT2uZp6JXr14aO3as/vvf/+qqq67y+vLLly+vK664Qv/973915513auvWrapfv36+cq1atVJMTIzefPNN9xviJWnHjh1asGBBvse2Tle3bt0UHh6uH3/80eu3c8fGxqpFixaaPn26nnjiCfdPhOXm5urNN99UWlqaT39LHshDwo1S6/fff1eHDh103XXXqW7duoqLi9Py5cs1e/ZsXXbZZR5ly5Qpo0GDBunee+9VbGxsvmecAsGDDz6ouXPnqnXr1rrjjjtUp04dHT58WFu3btXHH3+sCRMmKC0tTXFxccrIyNAHH3ygTp06KSkpSRUrVsz3PHphhgwZ4v6ZmIEDB5boM88++6zatm2rdu3a6fbbb1f16tWVnZ2tTZs26cMPP9SCBQtOO+4+ffrohRdeUN++fbV161Y1bNhQn3/+uUaPHq0ePXqoc+fOp71sAEDJ1K1bV7Vq1dLw4cNlZkpKStKHH36ouXPn5iv71FNPqW3btmrRooWGDx+us846S3v27NHMmTP10ksvKS4uTg0aNJAk/etf/1JcXJyio6NVo0aNU1pPST300EPq3r27unTpoqFDh+rYsWN69NFHFRsbq19//fW0l1sSbdq00YABA9S/f3+tWLFCF1xwgWJjY7Vr1y59/vnnatiwofsdKiV10UUXqUGDBmrevLkqVaqkbdu26ZlnnlFGRobOPvvsAj9Tvnx53X///frHP/6hPn366Nprr9W+ffs0atQoRUdHKysryxvhqnr16nrwwQd13333afPmzerevbsSExO1Z88effXVV4qNjdWoUaNOe/ljxoxRly5d1KFDBw0bNkyRkZEaP3681qxZoylTpgTcxRKUDiTcKLWio6PVokULvfHGG9q6dauOHj2qatWq6d5779U999yTr/zVV1+te++9VzfeeKMSEhL8UOOipaSkaMWKFXrooYf0+OOPa8eOHYqLi1ONGjXcA1qeiRMn6v/+7//Uu3dvHTlyRH379nX/vmlxLrnkEkVFRalDhw6FDtwnq1+/vlauXKmHHnpI//znP7V3716VL19eZ599tsdL4k5HdHS0Fi5cqPvuu0+PP/64fv75Z1WtWlXDhg3z2gECAKBoERER+vDDDzVkyBDdeuutCg8PV+fOnTVv3jxVq1bNo2zjxo311VdfKSsrSyNGjFB2draSk5PVsWNH911QNWrU0DPPPKNnn31W7du317FjxzRp0iT169evxOspqS5dumjGjBn65z//qauvvlrJyckaOHCg/vzzzzNK/krqpZdeUsuWLfXSSy9p/Pjxys3NVWpqqtq0aZPvBWkl0aFDB7333nt65ZVXdODAASUnJ6tLly66//77FRERUejnRowYocqVK+u5557TO++8o5iYGLVv316jR48u8XhfEiNGjFD9+vX17LPPasqUKTpy5IiSk5N13nnn6bbbbjujZWdmZmrBggXKyspSv379lJubq8aNG2vmzJn5XmwH+IrLzMzflQCCwbhx43THHXdozZo17heRlUYffvihevfurVmzZp1xsgwAAACEMhJuoBirVq3Sli1bdOutt6pNmzaaMWOGv6vkF+vWrdO2bds0ZMgQxcbGauXKldyaBQAAABSBhBsoRvXq1bV79261a9dOb7zxxin9TEgoad++vb744gude+65mjx5surWrevvKgEAAAABjYQbAAAAAAAHhPm7AgAAAAAAhCISbgAAAAAAHOCXnwXLzc3Vzp07FRcXx0uXAAClmpkpOztbqampCgvzz3lwxmUAAI7z9rjsl4R7586dSk9P98eqAQAISNu3b1daWppf1s24DACAJ2+Ny35JuOPi4iQdDyI+Pt4fVQAAICAcOHBA6enp7rHRHxiXAQA4ztvjsl8S7rzb1eLj4xnYAQCQ/HorN+MyAACevDUu+yXhxhlYOKbo+R1G+KYeAAA4jTEPABDkeEs5AAAAAAAOIOEGAAAAAMABJNwAAAAAADiAhBsAAAAAAAfw0jQAAOAfxb0UDQCAIEfC7UslObDgjasAAAAAEBK4pRwAAAAAAAeQcAMAAAAA4AASbgAAAAAAHEDCDQAAAACAA0i4AQAAAABwAAk3AAAAAAAOIOEGAAAAAMAB/A53qCnut775nW8AAAAA8AkSbi96eu4PRc6/i9YGAAAAgFKDW8oBAAAAAHAA11wBAEBwKu4xKolHqQAAfsUVbgAAAAAAHMAV7lNRzJn0lj/tK/rzNSt4sTIAAAAAgEBGwh1oSnJ7HAAAAAAg4JFwlzY87wYAKE34uUwAgB+RcPvQ0s3F3HIuqRW3nQMAAABASOClaQAAAAAAOICEGwAAAAAAB5BwAwAAAADgABJuAAAAAAAcQMINAAAAAIADeEt5gCnuTeYh8RZzfpoMAAAAQClAwh1ivJKw85ulAIAQUey42MFHFQEAlErcUg4AAAAAgAO4wh1kijtTDwBAacGYCAAIdCTcAACg1Hp67g9Fzr8r/L2iF8BjVgCAIpBww/tK8lI0AAAAAAhxJNyngFvXAAAAAAAlRcJdypTkpEFI/PQYACDgBcKJ7JY//avoAoyJAIAzQMINAACcwSNGAIBSjoQbAACgEPyONwDgTJBwn4gz8SVDOwEAAABAscL8XQEAAAAAAEIRV7hPEAgvbwEAAAAAhAYSbqAwxd0632GEb+oBAAhYT8/9odgyd3Wp7YOaAAACEbeUAwAAAADgAK5wI58zvbW+uN/xLtFvgfPWVwBAiFg6cViR81vd/ISPagIA8DUSbgQmp9+Ezu3gAAAAABxWahLu4s4uAwAAAADgTaUm4UbpUtxt69yyDgDwhpY//euMl1Hci9d46RoABC8SbpROTt+y7iu8SR0AQh/7egAIWiTcAAAAwawkJ5FJygHAL0i4EZDO9E3pPlHMAc7TOZcXOT9YbhHkVkcApyso9uVBoNjHpIr5dRBJXCUHAD8h4YbXcYB1XLHP9S0swQESB0DFJvwSST+A4Hamz4GX6Oc2S5KUF4WEHQBOi18SbjOTJB04cMBry3xhwaYi55/35xGvrQvB78DBw46v42Ax21xJ6vDV838vcv751ZOKXsBHWUXPv2BosXU4fPCPIucX9z0u7rvpDWNmrCxy/qCOZxU53xt1
PNN1DAr/oOgVlKCv4B3F9lUxfX2q8r5DeWOjPzgxLkvF7wfhO/PW7iy6wNqix5tinennJS1P61/k/PN2TCpyfrFjoqSvtv5a9DL6PFLsMkqFT58sen4ojEnFxVgSodAOwcKH26S3x2WX+WGE37Fjh9LT0329WgAAAtb27duVlpbml3UzLgMA4Mlb47JfEu7c3Fzt3LlTcXFxcrlcvl69Dhw4oPT0dG3fvl3x8fE+X7+vEGdoIc7QQpyh5UziNDNlZ2crNTVVYWFhDtWwaMWNy6WlH08HbVM42qZotE/haJvC0TaF81bbeHtc9sst5WFhYX47i3+i+Pj4UrGhEmdoIc7QQpyh5XTjTEhIcKA2JVfScbm09OPpoG0KR9sUjfYpHG1TONqmcN5oG2+Oy/45lQ4AAAAAQIgj4QYAAAAAwAGlMuGOiopSVlaWoqKi/F0VRxFnaCHO0EKcoSXU4wz1+M4EbVM42qZotE/haJvC0TaFC9S28ctL0wAAAAAACHWl8go3AAAAAABOI+EGAAAAAMABJNwAAAAAADiAhBsAAAAAAAeQcAMAAAAA4ICQSbjHjx+vGjVqKDo6Ws2aNdNnn31WZPnFixerWbNmio6OVs2aNTVhwgSP+WvXrtXll1+u6tWry+Vy6ZlnnnGw9iXn7ThffvlltWvXTomJiUpMTFTnzp311VdfORlCiXg7zunTp6t58+YqX768YmNj1aRJE73xxhtOhlAi3o7zRFOnTpXL5dIll1zi5VqfOm/H+dprr8nlcuX7d/jwYSfDKJYT/bl//34NGjRIKSkpio6OVr169fTxxx87FUKJeDvO9u3bF9ifPXv2dDKMYjnRn88884zq1KmjmJgYpaen66677vLJdutELO+9957q16+vqKgo1a9fX++///4Zr9cf/NE2I0eOzLe9JycnezUub/DXsVVp3G5K0jbBst1I/jteLY3bTknaJli2HX8d/zu+3VgImDp1qkVERNjLL79s69atsyFDhlhsbKxt27atwPKbN2+2smXL2pAhQ2zdunX28ssvW0REhL377rvuMl999ZUNGzbMpkyZYsnJyfb000/7KJrCORHnddddZy+88IKtWrXK1q9fb/3797eEhATbsWOHr8LKx4k4Fy5caNOnT7d169bZpk2b7JlnnrEyZcrY7NmzfRVWPk7EmWfr1q1WtWpVa9eunV188cUOR1I0J+KcNGmSxcfH265duzz++ZMTcR45csSaN29uPXr0sM8//9y2bt1qn332ma1evdpXYeXjRJz79u3z6Mc1a9ZYmTJlbNKkST6KKj8n4nzzzTctKirK3nrrLduyZYt98sknlpKSYnfeeWfQxbJkyRIrU6aMjR492tavX2+jR4+28PBwW7Zs2Wmv1x/81TZZWVl2zjnneGz3e/fudTzeU+GvY6vSut2UpG2CYbsx89/xamnddkrSNsGw7fjr+N8X201IJNznn3++3XbbbR7T6tata8OHDy+w/D333GN169b1mHbrrbday5YtCyyfkZEREAm303GameXk5FhcXJxNnjz5zCt8mnwRp5lZ06ZN7Z///OeZVfYMOBVnTk6OtWnTxl555RXr27ev3xNuJ+KcNGmSJSQkeL2uZ8KJOF988UWrWbOm/fXXX96v8Gnyxffz6aeftri4OPvjjz/OvMKnyYk4Bw0aZB07dvQoc/fdd1vbtm29VOuCORHLVVddZd27d/co061bN7vmmmtOe73+4K+2ycrKssaNG59h7Z3lr2Or0rrdnKiwtgmG7cbMf8erbDvHFdQ2wbDt+Ov43xfbTdDfUv7XX3/p66+/VteuXT2md+3aVUuWLCnwM0uXLs1Xvlu3blqxYoWOHj3qWF3PhK/iPHTokI4ePaqkpCTvVPwU+SJOM9P8+fO1YcMGXXDBBd6r/ClwMs4HH3xQlSpV0s033+z9ip8iJ+P8448/lJGRobS0NPXq1UurVq3yfgAl5FScM2fOVKtWrTRo0CBVqVJFDRo00OjRo3Xs2DFnAimGr/ZDEydO1DXXXKPY2FjvVPwUORVn27Zt9fXXX7tv9du8ebM+/vhjR2+ddyqWwsrkLfN01utr/mqbPBs3blRqaqpq1Kiha665Rps3bz7TkLzGX8dWpXm7KalA3m4k/x2vsu38T2HH8oG87fjr+N9X203QJ9y//PKLjh07pipVqnhMr1Klinbv3l3gZ3bv3l1g+ZycHP3yyy+O1fVM+CrO4cOHq2rVqurcubN3Kn6KnIzz999/V7ly5RQZGamePXtq3Lhx6tKli/eDKAGn4vziiy80ceJEvfzyy85U/BQ5FWfdunX12muvaebMmZoyZYqio6PVpk0bbdy40ZlAiuFUnJs3b9a7776rY8eO6eOPP9Y///lPPfnkk3rkkUecCaQYvtgPffXVV1qzZo3+9re/ea/ip8ipOK+55ho99NBDatu2rSIiIlSrVi116NBBw4cPdyYQORdLYWXylnk66/U1f7WNJLVo0UKvv/66PvnkE7388svavXu3WrdurX379nkjtDPmr2Or0rzdlESgbzeS/45X2Xb+p6Bj+UDfdvx1/O+r7Sbca0vyM5fL5fG3meWbVlz5gqYHGifjfOyxxzRlyhQtWrRI0dHRXqjt6XMizri4OK1evVp//PGH5s+fr7vvvls1a9ZU+/btvVfxU+TNOLOzs3XDDTfo5ZdfVsWKFb1f2TPg7f5s2bKlWrZs6Z7fpk0bnXvuuRo3bpyee+45b1X7lHk7ztzcXFWuXFn/+te/VKZMGTVr1kw7d+7U448/rgceeMDLtS85J/dDEydOVIMGDXT++ed7oaZnxttxLlq0SI888ojGjx+vFi1aaNOmTRoyZIhSUlJ0//33e7n2xdftTPusJMs81fX6gz/a5sILL3T/v2HDhmrVqpVq1aqlyZMn6+677z71IBzir2Or0rrdFCdYthvJf8erpX3bKaxtgmXb8dfxv9PbTdAn3BUrVlSZMmXynYXYu3dvvrMVeZKTkwssHx4ergoVKjhW1zPhdJxPPPGERo8erXnz5qlRo0berfwpcDLOsLAwnXXWWZKkJk2aaP369RozZoxfEm4n4ly7dq22bt2qiy66yD0/NzdXkhQeHq4NGzaoVq1aXo6kaL76foaFhem8887z2xVup+JMSUlRRESEypQp4y5Tr1497d69W3/99ZciIyO9HEnRnO7PQ4cOaerUqXrwwQe9W/FT5FSc999/v2688Ub31fuGDRvq4MGDGjBggO677z6FhXn/pjOnYimsTN4yT2e9vuavtilIbGysGjZs6Ld92Mn8dWxVmreb0xFo243kv+NVtp1TO5YPtG3HX8f/vtpugv6W8sjISDVr1kxz5871mD537ly1bt26wM+0atUqX/k5c+aoefPmioiIcKyuZ8LJOB9//HE99NBDmj17tpo3b+79yp8CX/anmenIkSNnXunT4EScdevW1XfffafVq1e7//Xu3VsdOnTQ6tWrlZ6e7lg8hfFVf5qZVq9erZSUFO9U/BQ5FWebNm20adMm94kTSfrhhx+UkpLi82Rbcr4///3
vf+vIkSO64YYbvFvxU+RUnIcOHcqXVJcpU0Z2/AWmXozgf5yKpbAyecs8nfX6mr/apiBHjhzR+vXr/bYPO5m/jq1K83ZzOgJtu5H8d7xa2redUz2WD7Rtx1/H/z7bbrz2+jU/ynud+8SJE23dunV25513WmxsrG3dutXMzIYPH2433niju3zea+TvuusuW7dunU2cOLHAn+NZtWqVrVq1ylJSUmzYsGG2atUq27hxo8/jy+NEnI8++qhFRkbau+++6/FTAdnZ2T6PL48TcY4ePdrmzJljP/74o61fv96efPJJCw8Pt5dfftnn8eVxIs6TBcJbyp2Ic+TIkTZ79mz78ccfbdWqVda/f38LDw+3L7/80ufx5XEizp9++snKlStngwcPtg0bNthHH31klStXtocfftjn8eVxcrtt27atXX311T6LpShOxJmVlWVxcXE2ZcoU27x5s82ZM8dq1aplV111VdDF8sUXX1iZMmVs7Nixtn79ehs7dmyhPwtW2HoDgb/aZujQobZo0SLbvHmzLVu2zHr16mVxcXEh3zYlObYqrdtNSdomGLYbM/8dr5bWbackbRMM246/jv99sd2ERMJtZvbCCy9YRkaGRUZG2rnnnmuLFy92z+vbt69lZmZ6lF+0aJE1bdrUIiMjrXr16vbiiy96zN+yZYtJyvfv5OX4mrfjzMjIKDDOrKwsH0RTOG/Hed9999lZZ51l0dHRlpiYaK1atbKpU6f6IpQieTvOkwVCwm3m/TjvvPNOq1atmkVGRlqlSpWsa9eutmTJEl+EUiQn+nPJkiXWokULi4qKspo1a9ojjzxiOTk5TodSJCfi3LBhg0myOXPmOF39EvN2nEePHrWRI0darVq1LDo62tLT023gwIH222+/BV0sZmbTpk2zOnXqWEREhNWtW9fee++9U1pvoPBH21x99dWWkpJiERERlpqaapdddpmtXbvWkfjOhL+OrUrjdlOStgmW7cbMf8erpXHbKUnbBMu246/jf6e3G5eZQ/exAQAAAABQigX9M9wAAAAAAAQiEm4AAAAAABxAwg0AAAAAgANIuAEAAAAAcAAJNwAAAAAADiDhBgAAAADAASTcAAAAAAA4gIQbAAAAAAAHkHADAAAAAOAAEm4AAAAAABxAwg0AAAAAgANIuAEAAAAAcAAJNwAAAAAADiDhBgAAAADAASTcAAAAAAA4gIQbAAAAAAAHkHADAAAAAOAAEm4AAAAAABxAwg0AAAAAgANIuAEAAAAAcAAJNwAAAAAADiDhBgAAAADAASTcAAAAAAA4gIQbAAAAAAAHkHADAAAAAOAAEm4AAAAAABxAwg0AAAAAgANIuAEAAAAAcAAJNwAAAAAADiDhBgAAAADAASTcAAAAAAA4gIQbAAAAAAAHkHADAAAAAOAAEm4AAAAAABxAwg0AAAAAgANIuAEAAAAAcAAJNxBiDh06pJEjR2rRokX+roq2bt0ql8ul1157zT1t5MiRcrlc/qsUAAAOOtVxeOfOnRo5cqRWr17taL0k6e2339Yzzzzj+HoA/A8JNxBiDh06pFGjRgVEwl2Qv/3tb1q6dKm/qwEAgCNOdRzeuXOnRo0aRcINhCgSbiBIHDp0yGfrOnr0qHJychxZdlpamlq2bOnIsgEAcIovx+FgQ9sAhSPhBk7R2rVr5XK5NG3aNPe0r7/+Wi6XS+ecc45H2d69e6tZs2buv3Nzc/XYY4+pbt26ioqKUuXKldWnTx/t2LHD43Pt27dXgwYN9Omnn6p169YqW7asbrrpJknSggUL1L59e1WoUEExMTGqVq2aLr/8ch06dEhbt25VpUqVJEmjRo2Sy+WSy+VSv379Co1n0aJFcrlceuONNzR06FBVrVpVUVFR2rRpk37++WcNHDhQ9evXV7ly5VS5cmV17NhRn332Wb7l7Ny5U1dddZXi4uKUkJCgq6++Wrt3785XrqBbyl0ul0aOHJmvbPXq1T3qfujQIQ0bNkw1atRQdHS0kpKS1Lx5c02ZMqXQ+AAAoSWUxuFFixbpvPPOkyT179/fXf7EMXHFihXq3bu3kpKSFB0draZNm+rf//63e/4vv/yi9PR0tW7dWkePHnVPX7dunWJjY3XjjTe6Y5o1a5a2bdvmXk/eeJx3LHDyVfmCHg3r16+fypUrp++++05du3ZVXFycOnXqJEn666+/9PDDD7vbt1KlSurfv79+/vnnAuMHSoNwf1cACDbnnHOOUlJSNG/ePF155ZWSpHnz5ikmJkbr1q3Tzp07lZqaqpycHC1evFi33Xab+7O33367/vWvf2nw4MHq1auXtm7dqvvvv1+LFi3SypUrVbFiRXfZXbt26YYbbtA999yj0aNHKywsTFu3blXPnj3Vrl07vfrqqypfvrz++9//avbs2frrr7+UkpKi2bNnq3v37rr55pv1t7/9TZLcg39RRowYoVatWmnChAkKCwtT5cqV3QNkVlaWkpOT9ccff+j9999X+/btNX/+fLVv316S9Oeff6pz587auXOnxowZo9q1a2vWrFm6+uqrvdXskqS7775bb7zxhh5++GE1bdpUBw8e1Jo1a7Rv3z6vrgcAELhCaRw+99xzNWnSJPXv31///Oc/1bNnT0nH7waTpIULF6p79+5q0aKFJkyYoISEBE2dOlVXX321Dh06pH79+qlixYqaOnWq2rdvr3vvvVdPPfWUDh06pCuvvFLVqlXThAkTJEnjx4/XgAED9OOPP+r9998/oz7466+/1Lt3b916660aPny4cnJylJubq4svvlifffaZ7rnnHrVu3Vrbtm1TVlaW2rdvrxUrVigmJuaM1gsEJQNwym644QarWbOm++/OnTvbLbfcYomJiTZ58mQzM/viiy9Mks2ZM8fMzNavX2+SbODAgR7L+vLLL02S/eMf/3BPy8zMNEk2f/58j7LvvvuuSbLVq1cXWreff/7ZJFlWVlaJYlm4cKFJsgsuuKDYsjk5OXb06FHr1KmTXXrppe7pL774okmyDz74wKP8LbfcYpJs0qRJ7mlZWVl28q6nsPpmZGRY37593X83aNDALrnkkhLFBQAIXaE0Di9fvjzfWJmnbt261rRpUzt69KjH9F69ellKSoodO3bMPe3RRx81Sfb+++9b3759LSYmxr799luPz/Xs2dMyMjLyrSfvWGDhwoUe07ds2ZKvbn379jVJ9uqrr3qUnTJlikmy9957r8D4xo8fX0QrAKGLW8qB09CpUydt3rxZW7Zs0eHDh/X555+re/fu6tChg+bOnSvp+Nn2qKgotW3bVtLxs9SS8t1Wdv7556tevXqaP3++x/TExER17NjRY1qTJk0UGRmpAQMGaPLkydq8ebPXYrr88ssLnD5hwgSde+65io6OVnh4uCIiIjR//nytX7/eXWbhwoWKi4tT7969PT573XXXea1+0vG2+s9//qPhw4dr0aJF+vPPP726fABAcAjFcfhkmzZt0vfff6/rr79ekpSTk+P+16NHD+3atUsbNmxwl/+///s/9ezZU9dee60mT56scePGqWHDho7V7+Tjho8++k
jly5fXRRdd5FHXJk2aKDk5OWBf5go4jYQbOA2dO3eWdHww//zzz3X06FF17NhRnTt3dg/Y8+bNU5s2bdy3T+Xd9pySkpJveampqfluiy6oXK1atTRv3jxVrlxZgwYNUq1atVSrVi09++yzZxxTQet76qmndPvtt6tFixZ67733tGzZMi1fvlzdu3f3SHb37dunKlWq5Pt8cnLyGdfrRM8995zuvfdezZgxQx06dFBSUpIuueQSbdy40avrAQAEtlAch0+2Z88eSdKwYcMUERHh8W/gwIGSjj+/nSfvWfHDhw8rOTnZ/ey2E8qWLav4+Ph89d2/f78iIyPz1Xf37t0edQVKE57hBk5DWlqaateurXnz5ql69epq3ry5ypcvr06dOmngwIH68ssvtWzZMo0aNcr9mQoVKkg6/kxY3rNZeXbu3Onx3JikQn+rul27dmrXrp2OHTumFStWaNy4cbrzzjtVpUoVXXPNNacdU0Hre/PNN9W+fXu9+OKLHtOzs7M9/q5QoYK++uqrfJ8v6KVpBYmKitKRI0fyTT/54Cc2NlajRo3SqFGjtGfPHvfV7osuukjff/99idYFAAh+oTgOnyyvPiNGjNBll11WYJk6deq4/79r1y4NGjRITZo00dq1azVs2DA999xzJVpXdHS0JOUbiwtLkgtqm4oVK6pChQqaPXt2gZ+Ji4srUV2AUMMVbuA0de7cWQsWLNDcuXPVpUsXSVLt2rVVrVo1PfDAAzp69Kj7DLwk921pb775psdyli9frvXr17vf8FlSZcqUUYsWLfTCCy9IklauXCnpePIqySu3W7tcLvfy8nz77bf5fke7Q4cOys7O1syZMz2mv/322yVaT/Xq1fXtt996TFuwYIH++OOPQj9TpUoV9evXT9dee602bNjAT5IAQCkTKuNwYeXr1Kmjs88+W998842aN29e4L+8JPbYsWO69tpr5XK59J///EdjxozRuHHjNH369HzrKqhe1atXl6R8Y/HJ43pRevXqpX379unYsWMF1vXEkwNAacIVbuA0derUSePHj9cvv/yiZ555xmP6pEmTlJiY6PFTJHXq1NGAAQM0btw4hYWF6cILL3S/HTU9PV133XVXseucMGGCFixYoJ49e6patWo6fPiwXn31VUn/u70uLi5OGRkZ+uCDD9SpUyclJSWpYsWK7sH0VPTq1UsPPfSQsrKylJmZqQ0bNujBBx9UjRo1PH6nu0+fPnr66afVp08fPfLIIzr77LP18ccf65NPPinRem688Ubdf//9euCBB5SZmal169bp+eefV0JCgke5Fi1aqFevXmrUqJESExO1fv16vfHGG2rVqpXKli17yvEBAIJXqIzDtWrVUkxMjN566y3Vq1dP5cqVU2pqqlJTU/XSSy/pwgsvVLdu3dSvXz9VrVpVv/76q9avX6+VK1e6fxotKytLn332mebMmaPk5GQNHTpUixcv1s0336ymTZuqRo0akqSGDRtq+vTpevHFF9WsWTOFhYWpefPmSk5OVufOnTVmzBglJiYqIyND8+fPz5ewF+Waa67RW2+9pR49emjIkCE6//zzFRERoR07dmjhwoW6+OKLdemll5Z4eUDI8Pdb24Bg9dtvv1lYWJjFxsbaX3/95Z7+1ltvmSS77LLL8n3m2LFj9uijj1rt2rUtIiLCKlasaDfccINt377do1xmZqadc845+T6/dOlSu/TSSy0jI8OioqKsQoUKlpmZaTNnzvQoN2/ePGvatKlFRUWZJI83fZ8s782k06ZNyzfvyJEjNmzYMKtatapFR0fbueeeazNmzLC+ffvme8vpjh077PLLL7dy5cpZXFycXX755bZkyZISvaX8yJEjds8991h6errFxMRYZmamrV69Ot9byocPH27Nmze3xMREi4qKspo1a9pdd91lv/zyS6HxAQBCU6iMw2bH3/Bdt25di4iIyPeG82+++cauuuoqq1y5skVERFhycrJ17NjRJkyYYGZmc+bMsbCwsHxvRd+3b59Vq1bNzjvvPDty5IiZmf366692xRVXWPny5c3lcnmMx7t27bIrrrjCkpKSLCEhwW644QZbsWJFgW8pj42NLTCOo0eP2hNPPGGNGze26OhoK1eunNWtW9duvfVW27hxY5FtAIQql5mZv5J9AAAAAABCFc9wAwAAAADgABJuAAAAAAAcQMINAAAAAIADSLgBAAAAAHAACTcAAAAAAA4g4QYAAAAAwAHh/lhpbm6udu7cqbi4OLlcLn9UAQCAgGBmys7OVmpqqsLC/HMenHEZAIDjvD0u+yXh3rlzp9LT0/2xagAAAtL27duVlpbml3UzLgMA4Mlb47JfEu64uDhJx4OIj4/3RxUAAAgIBw4cUHp6unts9AfGZQAAjvP2uOyXhDvvdrX4+HgGdl9bOKb4Mh1GOF8PAIAHf97KzbiMYhV3/MCxA4AQ461xmZemAQAAAADgABJuAAAAAAAcQMINAAAAAIADSLgBAAAAAHAACTcAAAAAAA4g4QYAAAAAwAF++VkwAAAAlFBJftITABCQuMINAAAAAIADuMINAAAA/yvuSn6HEb6pBwB4EVe4AQAAAABwAAk3AAAAAAAOIOEGAAAAAMABPMON/HiGCgAA3+Et5AAQsrjCDQAAAACAA0i4AQAAAABwALeUAwAAIPDxyBuAIMQVbgAAAAAAHMAV7mDD2V0AAAJHSV54xth8XBC8HO7puT8UOf+uLrV9VBMAoYKEG6eOpB8AAAAAikXCDQAAAPhAcVfQJa6iA6GGhBvexxVwAAAAACDhDjlB8HwUAADAyZZu3lfk/FY1KxS9AC88T9/yp38Vs4Anil8HAJyAt5QDAAAAAOAArnADAIDSizvDQkZxV8glqVWHM1xJADw2x5vUgeBCwu1L/HQIAAAAAJQa3FIOAAAAAIADuMINAADgpNJw27oPYizJLeOhoLhbxnmxGxBcuMINAAAAAIADuMIN3+NZdgAA4AfFXj32UT0AlB5c4QYAAAAAwAFc4QYAAKGrNDw/DZyC4q7ylwQ/PQaUHFe4AQAAAABwAFe4AQAAUCoU/4bvM+ONq8cAQgtXuAEAAAAAcAAJNwAAAAAADuCWcgSm4l5yw8+GAQAAAAhwJNyBhreplholec6Lt4ACAELF0s37/F2FUoHnyIHAwi3lAAAAAAA4gCvcCE7ccg4AAAAgwJFwexO3gwcNbuf2neLamnYGgOBXWm4XLzbOar6pB4DgQcKNkBQqzy+RrAIAAADBi4T7VHAFG6UQST8AhL7ScoXa31r+9K9iyyyrNsAHNXEWxw7A/5Bww+d8Mqh74ZauMx0sfHGVPRDqGCp3EwAAAADeRsINAAAABIiSXAX3t2C4qAAEChLuE3HLOEJMabn6zKAMAACAQETCDcCvSJYBlHbFPWrVqmYFx9eB0sMXz5EXt46n5xa//OLr+cQp1KigOvCLNfANEm6EpOJ20qHwQpLSorRcpQcAAEDoIeGG15WWs+gkgsGDq+iAnxT3qFaHEc5+3gu8MaZ54wo1gkMgPH/tjToEQhzFOuNHQS/3S
jWKFAD7MPgfCTcAeAFJPQAAAE5WehJuXogGoJTjpAAAAIBvlZ6EG5JKdmtcabj1zRu3SvEceOjwRSIaCOsIBCT98HCmJ8OD5GT6md6WXloe1ULp4Y3jsDP+XlQ74ypo6cRhZ/T5Vh2KL3Om46Y3jg3OdB3BchzlFL8k3GYmSTpw4ID3Fvrpk95bVhD7auuvZ7yMeWt3eqEmoe/wwT/O6PPn7ZhUbJnlaf3PaB3wjjEzVvp9HYM6nlXsMs50m/RGHV5YsOmM6uDVceE0FRdDSdrhVOTFnDc2+oMj47IkHTzs3eX5wcE/j/i7CgAcUJIxs7h94pnuH0qyzy2unsUt40yPDbyxDm+MLb5Yx8nL8ta47DI/jPA7duxQenq6r1cLAEDA2r59u9LS0vyybsZlAAA8eWtc9kvCnZubq507dyouLk4ul8vXq/eqAwcOKD09Xdu3b1d8fLy/qxOQaKPi0UbFo41KhnYqXqC1kZkpOztbqampCgsL80sdQmlczhNo/exNxBaciC04EVtwOpPYvD0u++WW8rCwML+dxXdKfHx8yG2o3kYbFY82Kh5tVDK0U/ECqY0SEhL8uv5QHJfzBFI/exuxBSdiC07EFpxONzZvjsv+OZUOAAAAAECII+EGAAAAAMABJNxnKCoqSllZWYqKivJ3VQIWbVQ82qh4tFHJ0E7Fo41Kh1DuZ2ILTsQWnIgtOAVSbH55aRoAAAAAAKGOK9wAAAAAADiAhBsAAAAAAAeQcAMAAAAA4AASbgAAAAAAHEDCXQKffvqpLrroIqWmpsrlcmnGjBke881MI0eOVGpqqmJiYtS+fXutXbvWP5X1o+LaqV+/fnK5XB7/WrZs6Z/K+sGYMWN03nnnKS4uTpUrV9Yll1yiDRs2eJRhWypZO5X2benFF19Uo0aNFB8fr/j4eLVq1Ur/+c9/3PPZjopvo9K+DYWSUB17QnnMCOX9fCjvn0vTfnXMmDFyuVy688473dOCue9OVFBswdp3I0eOzFfv5ORk9/xA6TMS7hI4ePCgGjdurOeff77A+Y899pieeuopPf/881q+fLmSk5PVpUsXZWdn+7im/lVcO0lS9+7dtWvXLve/jz/+2Ic19K/Fixdr0KBBWrZsmebOnaucnBx17dpVBw8edJdhWypZO0mle1tKS0vT2LFjtWLFCq1YsUIdO3bUxRdf7B5E2I6KbyOpdG9DoSRUx55QHjNCeT8fyvvn0rJfXb58uf71r3+pUaNGHtODue/yFBabFLx9d84553jU+7vvvnPPC5g+M5wSSfb++++7/87NzbXk5GQbO3ase9rhw4ctISHBJkyY4IcaBoaT28nMrG/fvnbxxRf7pT6BaO/evSbJFi9ebGZsS4U5uZ3M2JYKkpiYaK+88grbURHy2siMbShUhfLYE8pjRqjv50N5/xxq+9Xs7Gw7++yzbe7cuZaZmWlDhgwxs9D4vhUWm1nw9l1WVpY1bty4wHmB1Gdc4T5DW7Zs0e7du9W1a1f3tKioKGVmZmrJkiV+rFlgWrRokSpXrqzatWvrlltu0d69e/1dJb/5/fffJUlJSUmS2JYKc3I75WFbOu7YsWOaOnWqDh48qFatWrEdFeDkNsrDNlR6hEJfh/KYEar7+VDeP4fqfnXQoEHq2bOnOnfu7DE9FPqusNjyBGvfbdy4UampqapRo4auueYabd68WVJg9Vm4T9cWgnbv3i1JqlKlisf0KlWqaNu2bf6oUsC68MILdeWVVyojI0NbtmzR/fffr44dO+rrr79WVFSUv6vnU2amu+++W23btlWDBg0ksS0VpKB2ktiWJOm7775Tq1atdPjwYZUrV07vv/++6tev7x5E2I4KbyOJbag0CYW+DuUxIxT386G8fw7l/erUqVO1cuVKLV++PN+8YP++FRWbFLx916JFC73++uuqXbu29uzZo4cfflitW7fW2rVrA6rPSLi9xOVyefxtZvmmlXZXX321+/8NGjRQ8+bNlZGRoVmzZumyyy7zY818b/Dgwfr222/1+eef55vHtvQ/hbUT25JUp04drV69Wvv379d7772nvn37avHixe75bEeFt1H9+vXZhkqRUOjrUB4zQnE/H8r751Ddr27fvl1DhgzRnDlzFB0dXWi5YOy7ksQWrH134YUXuv/fsGFDtWrVSrVq1dLkyZPdL30LhD7jlvIzlPcmvLyzKHn27t2b74wKPKWkpCgjI0MbN270d1V86u9//7tmzpyphQsXKi0tzT2dbclTYe1UkNK4LUVGRuqss85S8+bNNWbMGDVu3FjPPvss29EJCmujgpTGbai0Cra+DuUxI1T386G8fw7V/erXX3+tvXv3qlmzZgoPD1d4eLgWL16s5557TuHh4e7+Cca+Ky62Y8eO5ftMMPXdiWJjY9WwYUNt3LgxoL5vJNxnqEaNGkpOTtbcuXPd0/766y8tXrxYrVu39mPNAt++ffu0fft2paSk+LsqPmFmGjx4sKZPn64FCxaoRo0aHvPZlo4rrp0KUtq2pYKYmY4cOcJ2VIS8NioI21DpESx9HcpjRmnbz4fy/jlU9qudOnXSd999p9WrV7v/NW/eXNdff71Wr16tmjVrBm3fFRdbmTJl8n0mmPruREeOHNH69euVkpISWN83n76iLUhlZ2fbqlWrbNWqVSbJnnrqKVu1apVt27bNzMzGjh1rCQkJNn36dPvuu+/s2muvtZSUFDtw4ICfa+5bRbVTdna2DR061JYsWWJbtmyxhQsXWqtWraxq1aqlpp1uv/12S0hIsEWLFtmuXbvc/w4dOuQuw7ZUfDuxLZmNGDHCPv30U9uyZYt9++239o9//MPCwsJszpw5ZsZ2ZFZ0G7ENhZZQHXtCecwI5f18KO+fS9t+9eQ3eQdz353sxNiCue+GDh1qixYtss2bN9uyZcusV69eFhcXZ1u3bjWzwOkzEu4SWLhwoUnK969v375mdvy181lZWZacnGxRUVF2wQUX2HfffeffSvtBUe106NAh69q1q1WqVMkiIiKsWrVq1rdvX/vpp5/8XW2fKahtJNmkSZPcZdiWim8ntiWzm266yTIyMiwyMtIqVapknTp1ch/MmbEdmRXdRmxDoSVUx55QHjNCeT8fyvvn0rZfPTnhDua+O9mJsQVz31199dWWkpJiERERlpqaapdddpmtXbvWPT9Q+sxlZubU1XMAAAAAAEornuEGAAAAAMABJNwAAAAAADiAhBsAAAAAAAeQcAMAAAAA4AASbgAAAAAAHEDCDQAAAACAA0i4AQAAAABwAAk3AAAAAAAOIOEGAAAAAMABJNwAAAAAADiAhBsAAAAAAAeQcAMAAAAA4AASbgAAAAAAHEDCDQAAAACAA0i4AQAAAABwAAk3AAAAAAAOIOEGAAAAAMABJNwAAAAAADiAhBsAAAAAAAeQcAMAAAAA4AASbgAAAAAAHEDCDQAAAACAA0i4AQAAAABwAAk3AAAAAAAOIOEGAAAAAMABJNwAAAAAADiAhBsAAAAAAAeQcAMAAAAA4AASbgAAAAAAHEDCDQAAAACAA0i4AQAAAABw
AAk3AAAAAAAOIOEGAAAAAMABJNwAAAAAADiAhBsAAAAAAAeQcAMAAAAA4AASbgAAAAAAHEDCDQSYQ4cOaeTIkVq0aJG/q1Kg1157TS6XS1u3bvX5unfu3KmRI0dq9erVPl83AADeFuhjPoAzR8INBJhDhw5p1KhRATv49uzZU0uXLlVKSorP171z506NGjWKhBsAEBICfcwHcOZIuAEfOXTokL+rcEb+/PNPmZkqVaqkli1bKioqyt9V8pq82AAA8IZAHPPNTH/++ae/qwGUOiTcKHXWrl0rl8uladOmuad9/fXXcrlcOuecczzK9u7dW82aNXP/nZubq8cee0x169ZVVFSUKleurD59+mjHjh0en2vfvr0aNGigTz/9VK1bt1bZsmV10003SZIWLFig9u3bq0KFCoqJiVG1atV0+eWX69ChQ9q6dasqVaokSRo1apRcLpdcLpf69etXaDyLFi2Sy+XSm2++qbvvvlvJycmKiYlRZmamVq1ala/8ihUr1Lt3byUlJSk6OlpNmzbVv//9b48yebeNz5kzRzfddJMqVaqksmXL6siRIwXeUp4X79KlS9W6dWvFxMSoevXqmjRpkiRp1qxZOvfcc1W2bFk1bNhQs2fPzlevjRs36rrrrlPlypUVFRWlevXq6YUXXvCI87zzzpMk9e/f3902I0eO9FpsAIDQEmpj/uHDhzV06FA1adJECQkJSkpKUqtWrfTBBx/kK+tyuTR48GBNmDBB9erVU1RUlCZPniyp+DH3VNcFoAgGlEIpKSk2YMAA999jx461mJgYk2T//e9/zczs6NGjFh8fb/fcc4+73IABA0ySDR482GbPnm0TJkywSpUqWXp6uv3888/ucpmZmZaUlGTp6ek2btw4W7hwoS1evNi2bNli0dHR1qVLF5sxY4YtWrTI3nrrLbvxxhvtt99+s8OHD9vs2bNNkt188822dOlSW7p0qW3atKnQWBYuXGiSLD093S6++GL78MMP7c0337SzzjrL4uPj7ccff3SXXbBggUVGRlq7du3snXfesdmzZ1u/fv1Mkk2aNMldbtKkSSbJqlatagMGDLD//Oc/9u6771pOTo573pYtWzzirVChgtWpU8cmTpxon3zyifXq1csk2ahRo6xhw4Y2ZcoU+/jjj61ly5YWFRXlbmczs7Vr11pCQoI1bNjQXn/9dZszZ44NHTrUwsLCbOTIkWZm9vvvv7vX/c9//tPdNtu3b/dabACA0BNKY/7+/futX79+9sYbb9iCBQts9uzZNmzYMAsLC7PJkyd7lM0b6xo1amRvv/22LViwwNasWVOiMfdU1wWgcCTcKJVuuOEGq1mzpvvvzp072y233GKJiYnuQeSLL74wSTZnzhwzM1u/fr1JsoEDB3os68svvzRJ9o9//MM9LTMz0yTZ/PnzPcq+++67JslWr15daN1+/vlnk2RZWVkliiUv4T733HMtNzfXPX3r1q0WERFhf/vb39zT6tata02bNrWjR496LKNXr16WkpJix44dM7P/JaV9+vTJt77CEm5JtmLFCve0ffv2WZkyZSwmJsYjuV69erVJsueee849rVu3bpaWlma///67x7oGDx5s0dHR9uuvv5qZ2fLly/Ml0N6MDQAQekJpzD9ZTk6OHT161G6++WZr2rSpxzxJlpCQ4B5D85R0zD2VdQEoHLeUo1Tq1KmTNm/erC1btujw4cP6/PPP1b17d3Xo0EFz586VJM2bN09RUVFq27atJGnhwoWSlO9Wr/PPP1/16tXT/PnzPaYnJiaqY8eOHtOaNGmiyMhIDRgwQJMnT9bmzZu9FtN1110nl8vl/jsjI0OtW7d213vTpk36/vvvdf3110uScnJy3P969OihXbt2acOGDR7LvPzyy0u8/pSUFI9b8ZKSklS5cmU1adJEqamp7un16tWTJG3btk3S8VvW5s+fr0svvVRly5bNV6/Dhw9r2bJlRa7b6dgAAMEr1Mb8adOmqU2bNipXrpzCw8MVERGhiRMnav369fnKduzYUYmJie6/T3XMPZV1ASgYCTdKpc6dO0s6PsB+/vnnOnr0qDp27KjOnTu7B9F58+apTZs2iomJkSTt27dPkgp8O3dqaqp7fp6CytWqVUvz5s1T5cqVNWjQINWqVUu1atXSs88+e8YxJScnFzgtr1579uyRJA0bNkwREREe/wYOHChJ+uWXX4qNoTBJSUn5pkVGRuabHhkZKen4oC8db9ecnByNGzcuX7169OhRYL1O5nRsAIDgFUpj/vTp03XVVVepatWqevPNN7V06VItX75cN910k3tcLapepzLmnuq6ABQs3N8VAPwhLS1NtWvX1rx581S9enU1b95c5cuXV6dOnTRw4EB9+eWXWrZsmUaNGuX+TIUKFSRJu3btUlpamsfydu7cqYoVK3pMO/Fq84natWundu3a6dixY1qxYoXGjRunO++8U1WqVNE111xz2jHt3r27wGl59c6r34gRI3TZZZcVuIw6deqUKAZvSkxMVJkyZXTjjTdq0KBBBZapUaNGkcsI1NgAAP4XSmP+m2++qRo1auidd97xWGdhL/48uV6nMuae6roAFIyEG6VW586d9e9//1vp6enq2bOnJKl27dqqVq2aHnjgAR09etR9VlyS+1axN9980/22bElavny51q9fr/vuu++U1l+mTBm1aNFCdevW1VtvvaWVK1fqmmuucf/c1qn+dMeUKVN09913uwfFbdu2acmSJerTp4+k4wnn2WefrW+++UajR48+pWU7qWzZsurQoYNWrVqlRo0aua+AF6SwtgnU2AAAgSFUxnyXy6XIyEiPBHj37t0lfnP4qYy5Z7ouAMeRcKPU6tSpk8aPH69ffvlFzzzzjMf0SZMmKTEx0eOZ5Dp16mjAgAEaN26cwsLCdOGFF2rr1q26//77lZ6errvuuqvYdU6YMEELFixQz549Va1aNR0+fFivvvqqpP/d8hYXF6eMjAx98MEH6tSpk5KSklSxYkVVr169yGXv3btXl156qW655Rb9/vvvysrKUnR0tEaMGOEu89JLL+nCCy9Ut27d1K9fP1WtWlW//vqr1q9fr5UrV3r8bIovPfvss2rbtq3atWun22+/XdWrV1d2drY2bdqkDz/8UAsWLJB0/Pa8mJgYvfXWW6pXr57KlSun1NRUpaamBmxsAAD/C5Uxv1evXpo+fboGDhyoK664Qtu3b9dDDz2klJQUbdy4sURtUdIx1xvrAiB+Fgyl12+//WZhYWEWGxtrf/31l3v6W2+9ZZLssssuy/eZY8eO2aOPPmq1a9e2iIgIq1ixot1www3un6bKk5mZaeecc06+zy9dutQuvfRSy8jIsKioKKtQoYJlZmbazJkzPcrNmzfPmjZtalFRUSbJ+vbtW2gceW8pf+ONN+yOO+6wSpUqWVRUlLVr187jreF5vvnmG7vqqquscuXKFhERYcnJydaxY0ebMGGCu0zem7yXL1+e7/OFvaW8oHgzMjKsZ8+e+aZLskGDBnlM27Jli910001WtWpVi4iIsEqVKlnr1q3t4Yc
f9ig3ZcoUq1u3rkVEROR7s+uZxgYACE2hMuabHf9Zs+rVq1tUVJTVq1fPXn75ZcvKyrKTD+sLGmvzlHTMLem6ABTOZWbml0wfgFcsWrRIHTp00LRp03TFFVf4uzoAAAAA/j/eUg4AAAAAgANIuAEAAAAAcAC3lAMAAAAA4ACucAMAAAAA4AASbgAAAAAAHOCX3+HOzc3Vzp07FRcXJ5fL5Y8qAAAQEMxM2dnZSk1NVViYf86DMy4DAHCct8dlvyTcO3fuVHp6uj9WDQBAQNq+fbvS0tL8sm7GZQAAPHlrXPZLwh0XFyfpeBDx8fH+qAIAAAHhwIEDSk9Pd4+N/sC4DADAcd4el/2ScOfdrhYfH8/ADgCA5NdbuRmXAQDw5K1x2S8JNxy0cEzR8zuM8E09AAAIBIyLAAA/4i3lAAAAAAA4gCvcpU1xZ/olzvYDAAAAgBdwhRsAAAAAAAeQcAMAAAAA4AASbgAAAAAAHEDCDQAAAACAA0i4AQAAAABwAAk3AAAAAAAOIOEGAAAAAMABJNwAAAAAADiAhBsAAAAAAAeQcAMAAAAA4IBwf1cAAADgtCwc4+8aAABQJK5wAwAAAADgABJuAAAAAAAcwC3lyK+4W/Q6jPBNPQAAAAAgiJFw49SRkAMAAABAsbilHAAAAAAAB3CFO9jwRlYAALyHu7YAAA7iCjcAAAAAAA4g4QYAAAAAwAHcUg7v4/Y8AAAAACDh9qmSPH9NMgoAAAAAIYGEGwAAoDDctQUAOAMk3AAAIDDxyxwAgCDHS9MAAAAAAHAACTcAAAAAAA4g4QYAAAAAwAE8wx1oeF4NAAAAAEICV7gBAAAAAHAAV7i9iavTAAAAAID/jyvcAAAAAAA4gCvcwGl6eu4PRc6/q0ttH9UEAAAAQCDiCjcAAAAAAA4g4QYAAAAAwAHcUg7fK8nL5TqMcL4eAAD/CoWXjTKmAQCKwBVuAAAAAAAcwBVuoBDFvRQtFJQkRl7+BgAAAJwernADAAAAAOAArnAjMBX3TBzPw0kKjJ8mC4Q6AAAAAIGIhPtUhMLLXULE0onDipzf6uYnfFQTZ5WG29oBIJgt3byv2DKtOvigImeIk6cA4AwSbvhcSQ5OzhQHDgCAYMGYBQChi4QbpRJXjgEApUkwjHuceAAQiki4AT8JhoMfKXjqCQCBiv0oAJReJNyAQwLhACsY6sAVCwAIfr74mUnGEwDBqPQk3CV54RlvvgaCEgdhAJzijfeOtPzpX0XOX1ZtwBktPxBOrgIAClZ6Eu6S4C3kAAAgwJBQA0DwIuGG1/niLeQIHYFwdToQ6gCEJE5khwxvJP1OnzgIhBMTjBcATkbCjXyKS5hb1azgo5qcPqdv34PvBMNBni944/lITiwAAAD4Fgk3ThlXsFHa+CLZBeCMYDiJzEni0BEIJzYDoQ4A/scvCbeZSZIOHDjgvYV++qT3lhXEvtr6q7+rEBQabhhXbJnlaf2LnH/ejkln9HmEluL2Z4cP/hHwdSju8y8s2FRsHQZ1POuMllHc533B13XMa/e8sdEfHBmXJengYe8u73Sq8OeRIucfKKaOxX3eF4obsxhvgseYGSvPeBnF7YOK29f7og7eEAjjRSDUAb7n7XHZZX4Y4Xfs2KH09HRfrxYAgIC1fft2paWl+WXdjMsAAHjy1rjsl4Q7NzdXO3fuVFxcnFwu1xkt68CBA0pPT9f27dsVHx/vpRr6TyjFQyyBK5TiIZbAFUrxOBmLmSk7O1upqakKCwvz6rJLKm9cNjNVq1aNPgsgoRKHFDqxhEocErEEolCJQwreWLw9LvvllvKwsDCvn8WPj48Pqo4sTijFQyyBK5TiIZbAFUrxOBVLQkKC15d5KvLG5bzb6OizwBMqcUihE0uoxCERSyAKlTik4IzFm+Oyf06lAwAAAAAQ4ki4AQAAAABwQNAn3FFRUcrKylJUVJS/q+IVoRQPsQSuUIqHWAJXKMUTSrEUJZTiDJVYQiUOKXRiCZU4JGIJRKEShxRasZwJv7w0DQAAAACAUBf0V7gBAAAAAAhEJNwAAAAAADiAhBsAAAAAAAeQcAMAAAAA4AASbgAAAAAAHBA0CffIkSPlcrk8/iUnJ7vnm5lGjhyp1NRUxcTEqH379lq7dq0fa1y46tWr54vF5XJp0KBBkqR+/frlm9eyZUs/1/q4Tz/9VBdddJFSU1Plcrk0Y8YMj/kl6YcjR47o73//uypWrKjY2Fj17t1bO3bs8GEU/1NUPEePHtW9996rhg0bKjY2VqmpqerTp4927tzpsYz27dvn669rrrnGx5EU3zcl2a4CpW+Ki6Wg74/L5dLjjz/uLhMo/TJmzBidd955iouLU+XKlXXJJZdow4YNHmWC5XtTXCzB9p0pSd8E0/fmTI0fP141atRQdHS0mjVrps8++8zfVfLgjeMAf/WVr8bO3377TTfeeKMSEhKUkJCgG2+8Ufv37/dpLN76zjgdiy/3zU7G4sv9mNN98uKLL6pRo0aKj49XfHy8WrVqpf/85z/u+cHQHyWNJVj65GRjxoyRy+XSnXfe6Z4WTP3iNxYksrKy7JxzzrFdu3a5/+3du9c9f+zYsRYXF2fvvfeefffdd3b11VdbSkqKHThwwI+1LtjevXs94pg7d65JsoULF5qZWd++fa179+4eZfbt2+ffSv9/H3/8sd1333323nvvmSR7//33PeaXpB9uu+02q1q1qs2dO9dWrlxpHTp0sMaNG1tOTo6Poyk6nv3791vnzp3tnXfese+//96WLl1qLVq0sGbNmnksIzMz02655RaP/tq/f7+PIym+b0qyXQVK3xQXy4kx7Nq1y1599VVzuVz2448/ussESr9069bNJk2aZGvWrLHVq1dbz549rVq1avbHH3+4ywTL96a4WILtO1OSvgmm782ZmDp1qkVERNjLL79s69atsyFDhlhsbKxt27bN31Vz88ZxgL/6yldjZ/fu3a1Bgwa2ZMkSW7JkiTVo0MB69erl01i89Z1xOhZf7pudjMWX+zGn+2TmzJk2a9Ys27Bhg23YsMH+8Y9/WEREhK1Zs8bMgqM/ShpLsPTJib766iurXr26NWrUyIYMGeKeHkz94i9BlXA3bty4wHm5ubmWnJxsY8eOdU87fPiwJSQk2IQJE3xUw9M3ZMgQq1WrluXm5prZ8S/hxRdf7N9KlcDJA21J+mH//v0WERFhU6dOdZf573//a2FhYTZ79myf1b0gBR04nOyrr74ySR4HoZmZmR47nkBQ2EFQUdtVoPZNSfrl4osvto4dO3pMC8R+MTt+wk2SLV682MyC+3tzciwFCZbvjFnB8QTr9+ZUnX/++Xbbbbd5TKtbt64NHz7cTzXK70yPAwKlr5waO9etW2eSbNmyZe4yS5cuNUn2/fff+yQWM+98Z/wRi1P7Zl/H4tR+zB99YmaWmJhor7zyStD2R0GxmAVfn2RnZ9vZZ59tc+fO9RjDQ6FffCFobimXpI0bNyo1NVU1atTQNddco82bN0uStmzZot27d6tr167uslFRUcrMzNSSJUv8Vd0S+euvv/
Tmm2/qpptuksvlck9ftGiRKleurNq1a+uWW27R3r17/VjLkilJP3z99dc6evSoR5nU1FQ1aNAg4PtKkn7//Xe5XC6VL1/eY/pbb72lihUr6pxzztGwYcOUnZ3tnwoWo6jtKlj7Zs+ePZo1a5ZuvvnmfPMCsV9+//13SVJSUpKk4P7enBxLYWWC5TtTWDyh+L050V9//aWvv/7aIwZJ6tq1a8DFcCbHAYHaV96q+9KlS5WQkKAWLVq4y7Rs2VIJCQk+j+9MvzP+iMWpfbOvY3FqP+brOI4dO6apU6fq4MGDatWqVdD2R0Gx5AmmPhk0aJB69uypzp07e0wP5n7xpXB/V6CkWrRooddff121a9fWnj179PDDD6t169Zau3atdu/eLUmqUqWKx2eqVKmibdu2+aO6JTZjxgzt379f/fr1c0+78MILdeWVVyojI0NbtmzR/fffr44dO+rrr79WVFSU/ypbjJL0w+7duxUZGanExMR8ZfI+H6gOHz6s4cOH67rrrlN8fLx7+vXXX68aNWooOTlZa9as0YgRI/TNN99o7ty5fqxtfsVtV8HaN5MnT1ZcXJwuu+wyj+mB2C9mprvvvltt27ZVgwYNJAXv96agWE4WTN+ZwuIJ1e/NiX755RcdO3aswG0wkGI40+OAQO0rb9V99+7dqly5cr7lV65c2afxeeM74+tYnNw3+zIWJ/djvorju+++U6tWrXT48GGVK1dO77//vurXr+9OuoKpPwqLRQquPpk6dapWrlyp5cuX55sXjN8TfwiahPvCCy90/79hw4Zq1aqVatWqpcmTJ7tfMnDiFWLp+I7n5GmBZuLEibrwwguVmprqnnb11Ve7/9+gQQM1b95cGRkZmjVrVr6kIhCdTj8Eel8dPXpU11xzjXJzczV+/HiPebfccov7/w0aNNDZZ5+t5s2ba+XKlTr33HN9XdVCne52Feh98+qrr+r6669XdHS0x/RA7JfBgwfr22+/1eeff55vXrB9b4qKRQq+70xh8YTq96YggT6GOnUcEChxeqPuBZX3dXze+s74Mhan982+isXp/Zgv4qhTp45Wr16t/fv367333lPfvn21ePHiQusQyP1RWCz169cPmj7Zvn27hgwZojlz5uQ7zjpRMPWLPwTVLeUnio2NVcOGDbVx40b3W0pPPgOyd+/efGdcAsm2bds0b948/e1vfyuyXEpKijIyMrRx40Yf1ez0lKQfkpOT9ddff+m3334rtEygOXr0qK666ipt2bJFc+fO9bhSV5Bzzz1XERERAd9fJ29Xwdg3n332mTZs2FDsd0jyf7/8/e9/18yZM7Vw4UKlpaW5pwfj96awWPIE23emuHhOFArfm5NVrFhRZcqUCbox9FSPAwK1r7xV9+TkZO3Zsyff8n/++We/xnc63xlfxuL0vtlXsTi9H/NVHJGRkTrrrLPUvHlzjRkzRo0bN9azzz4bdP1RVCwFCdQ++frrr7V37141a9ZM4eHhCg8P1+LFi/Xcc88pPDzcvZ5g6hd/CNqE+8iRI1q/fr1SUlLctyaeeDviX3/9pcWLF6t169Z+rGXRJk2apMqVK6tnz55Fltu3b5+2b9+ulJQUH9Xs9JSkH5o1a6aIiAiPMrt27dKaNWsCsq/yEoeNGzdq3rx5qlChQrGfWbt2rY4ePRrw/XXydhVsfSMdv0OkWbNmaty4cbFl/dUvZqbBgwdr+vTpWrBggWrUqOExP5i+N8XFIgXXd6Yk8ZwsFL43J4uMjFSzZs3y3dI/d+7cgI7hVI8DArWvvFX3Vq1a6ffff9dXX33lLvPll1/q999/92t8p/Od8UUsvto3Ox2Lr/Zj/tq+zExHjhwJmv4oSSwFCdQ+6dSpk7777jutXr3a/a958+a6/vrrtXr1atWsWTPo+8UnnHsfm3cNHTrUFi1aZJs3b7Zly5ZZr169LC4uzrZu3Wpmx19Jn5CQYNOnT7fvvvvOrr322oD9WTAzs2PHjlm1atXs3nvv9ZienZ1tQ4cOtSVLltiWLVts4cKF1qpVK6tatWpAxJKdnW2rVq2yVatWmSR76qmnbNWqVe43EJekH2677TZLS0uzefPm2cqVK61jx45++wmdouI5evSo9e7d29LS0mz16tUeP91w5MgRMzPbtGmTjRo1ypYvX25btmyxWbNmWd26da1p06Y+j6eoWEq6XQVK3xS3nZmZ/f7771a2bFl78cUX830+kPrl9ttvt4SEBFu0aJHHNnTo0CF3mWD53hQXS7B9Z4qLJ9i+N2ci72fBJk6caOvWrbM777zTYmNj3WNsIPDGcYC/+spXY2f37t2tUaNGtnTpUlu6dKk1bNjQ6z+r46uxxulYfLlvdjIWX+7HnO6TESNG2Keffmpbtmyxb7/91v7xj39YWFiYzZkzx8yCoz9KEksw9UlBTv6lkWDqF38JmoQ77zfdIiIiLDU11S677DJbu3ate35ubq5lZWVZcnKyRUVF2QUXXGDfffedH2tctE8++cQk2YYNGzymHzp0yLp27WqVKlWyiIgIq1atmvXt29d++uknP9XU08KFC01Svn99+/Y1s5L1w59//mmDBw+2pKQki4mJsV69evktvqLi2bJlS4HzdMJvpv/00092wQUXWFJSkkVGRlqtWrXsjjvu8MvvphcVS0m3q0Dpm+K2MzOzl156yWJiYgr8/eZA6pfCtqFJkya5ywTL96a4WILtO1NcPMH2vTlTL7zwgmVkZFhkZKSde+65Rf7cmz944zjAX33lq7Fz3759dv3111tcXJzFxcXZ9ddfb7/99pvPYvHmd8bpWHy5b3YyFl/ux5zuk5tuusm9D6pUqZJ16tTJnWybBUd/lCSWYOqTgpyccAdTv/iLy8zsNC6MAwAAAACAIgTtM9wAAAAAAAQyEm4AAAAAABxAwg0AAAAAgANIuAEAAAAAcAAJNwAAAAAADiDhBgAAAADAASTcAAAAAAA4gIQbAAAAAAAHkHADAAAAAOAAEm4AAAAAABxAwg0AAAAAgANIuAEAAAAAcAAJNwAAAAAADiDhBgAAAADAASTcAAAAAAA4gIQbAAAAAAAHkHADAAAAAOAAEm4AAAAAABxAwg0AAAAAgANIuAEAAAAAcAAJNwAAAAAADiDhBgAAAADAASTcAAAAAAA4gIQbAAAAAAAHkHADAAAAAOAAEm4AAAAAABxAwg0AAAAAgANIuAEAAAAAcAAJNwAAAAAADiDhBgAAAADAASTcAAAAAAA4gIQbAAAAAAAHkHADAAAAAOAAEm4AAAAAABxAwg0AAAAAgANIuAEAAAAAcAAJNwAAAAAADiDhBnzo0KFDGjlypBYtWuTvqjhq9OjRmjFjRr7pr732mlwul1asWOH7SgEASr3SMg5729tvv61nnnnG39UAghIJN+BDhw4d0qhRo0J+oC8s4QYAwJ9KyzjsbSTcwOkj4Qa84NChQ/6uAgAApRbjMIBARcKNkLJ27Vq5XC5NmzbNPe3rr7+Wy+XSOeec41G2d+/eatasmfvv3NxcPfbYY6pbt66ioqJUuXJl9enTRzt27PD4XPv27
dWgQQN9+umnat26tcqWLaubbrpJkrRgwQK1b99eFSpUUExMjKpVq6bLL79chw4d0tatW1WpUiVJ0qhRo+RyueRyudSvX79C48nNzdXDDz+sOnXqKCYmRuXLl1ejRo307LPPusuMHDlSLpdL3377ra688kolJCQoKSlJd999t3JycrRhwwZ1795dcXFxql69uh577LF86/npp590ww03qHLlyoqKilK9evX05JNPKjc316Pcr7/+qoEDB6pq1aqKjIxUzZo1dd999+nIkSPuMi6XSwcPHtTkyZPdMbZv395jOdnZ2br99ttVsWJFVahQQZdddpl27tzpUaZ69erq1auXZs+erXPPPVcxMTGqW7euXn311Xz13717t2699ValpaUpMjJSNWrU0KhRo5STk+NR7sUXX1Tjxo1Vrlw5xcXFqW7duvrHP/7hnn/o0CENGzZMNWrUUHR0tJKSktS8eXNNmTKl0D4CAPxPqI3DkrR//34NHTpUNWvWdNerR48e+v77791lSjI+SsfHyMGDB2vSpEnusb158+ZatmyZzEyPP/64atSooXLlyqljx47atGlTgbF/9tlnatmypWJiYlS1alXdf//9OnbsmEfZUaNGqUWLFkpKSlJ8fLzOPfdcTZw4UWaWL8a3335brVq1Urly5VSuXDk1adJEEydOdK9z1qxZ2rZtm7vNXC6XJGnr1q1yuVx64okn9NRTT7nr3qpVKy1btizfelasWKHevXsrKSlJ0dHRatq0qf797397lCnJWLx582Zdc801Sk1NVVRUlKpUqaJOnTpp9erVRfYl4BcGhJiUlBQbMGCA+++xY8daTEyMSbL//ve/ZmZ29OhRi4+Pt3vuucddbsCAASbJBg8ebLNnz7YJEyZYpUqVLD093X7++Wd3uczMTEtKSrL09HQbN26cLVy40BYvXmxbtmyx6Oho69Kli82YMcMWLVpkb731lt14443222+/2eHDh2327NkmyW6++WZbunSpLV261DZt2lRoLGPGjLEyZcpYVlaWzZ8/32bPnm3PPPOMjRw50l0mKyvLJFmdOnXsoYcesrlz59o999zjjqVu3br23HPP2dy5c61///4myd577z335/fu3WtVq1a1SpUq2YQJE2z27Nk2ePBgk2S33367u9yff/5pjRo1stjYWHviiSdszpw5dv/991t4eLj16NHDXW7p0qUWExNjPXr0cMe4du1aMzObNGmSSbKaNWva3//+d/vkk0/slVdescTEROvQoYNH7BkZGZaWlmb169e3119/3T755BO78sorTZItXrzYXW7Xrl2Wnp5uGRkZ9tJLL9m8efPsoYcesqioKOvXr5+73JQpU0yS/f3vf7c5c+bYvHnzbMKECXbHHXe4y9x6661WtmxZe+qpp2zhwoX20Ucf2dixY23cuHGF9hEAwFMojcMHDhywc845x2JjY+3BBx+0Tz75xN577z0bMmSILViwwMxKPj6amUmyjIwMa926tU2fPt3ef/99q127tiUlJdldd91lF198sX300Uf21ltvWZUqVaxRo0aWm5v7/9q78/CoyvP/45/JHgJhCQQCgSCUhEUUJLJqw64Ibi2ligtYFBWogNoWpQpYi4JKUb8slSKiAlpAERGFIIvsooBWVpWlUBYBRcImCdy/P/hldMiezJmZTN6v68rFNWeeOed+7jmcZ+5zzjzj0fe4uDirWbOmvfTSS7Zo0SJ76KGHTJINHDjQY1t9+/a1qVOnWnp6uqWnp9vf/vY3i46OtlGjRnm0e+KJJ0yS/eY3v7HZs2fb4sWLbdy4cfbEE0+YmdmWLVusXbt2VqNGDXfO1q5da2Zmu3fvNklWt25du/76623evHk2b948a9q0qVWuXNmOHz/u3s7SpUstIiLCrr32Wnv77bfto48+sr59+5okmzZtmrtdYcbilJQU+9WvfmVvvPGGrVixwubOnWuPPPKILVu2LM/3EvAXCm4EnTvvvNPq1avnfty5c2e77777rHLlyjZ9+nQzM1u9erVJssWLF5uZ2bZt20ySDRgwwGNd69evN0n2+OOPu5elpaWZJPv444892s6ZM8ck2ebNm/OM7ciRIybJRowYUai+9OjRw5o1a5Zvm+yC+4UXXvBY3qxZM5Nk77zzjntZZmamVatWzX7zm9+4lw0bNswk2fr16z1e/+CDD5rL5bIdO3aYmdnkyZNNkv373//2aDdmzBiPXJqZxcTEWJ8+fXLEml1wX5rnsWPHmiQ7ePCge1lSUpJFRUXZ3r173cvOnDljVapUsfvvv9+97P7777fy5ct7tDMze/75502Su9gfNGiQVapUKUdMv3T55ZfbLbfckm8bAED+gmkcfuqpp0ySpaen59mmKOOjJKtRo4adPHnSvWzevHkmyZo1a+ZRXI8fP94k2Zdffulelt339957z2Nb9913n4WEhOQYC7OdP3/eMjMz7amnnrK4uDj3dnbt2mWhoaF2xx135JuH7t27W1JSUo7l2QV306ZNLSsry738008/NUk2a9Ys97KGDRta8+bNLTMz02MdPXr0sISEBDt//ryZFTwWHz161CTZ+PHj840ZCBTcUo6g06lTJ+3atUu7d+/W2bNntWrVKl1//fXq0KGD0tPTJUlLlixRZGSkrrnmGknSsmXLJCnHbWUtW7ZUo0aN9PHHH3ssr1y5sjp27OixrFmzZoqIiFD//v01ffp07dq1q8R9admypb744gsNGDBAixYt0okTJ/Js26NHD4/HjRo1ksvlUrdu3dzLwsLC9Ktf/Up79+51L1u6dKkaN26sli1bery+b9++MjMtXbrU3S4mJkY9e/bM0U5Sjhzl56abbvJ4fMUVV0iSR1zSxZzWqVPH/TgqKkrJycke7RYsWKAOHTqoZs2aysrKcv9l93vFihWSLuby+PHjuv322/Xee+/p6NGjOeJq2bKlPvzwQw0bNkzLly/XmTNnCt0nAMBFwTQOf/jhh0pOTlbnzp3zbFPU8bFDhw6KiYlxP27UqJEkqVu3bu5btX+5/NKxsUKFCjnG0d69e+vChQv65JNPPOLq3LmzKlasqNDQUIWHh+vJJ5/UsWPH9N1330mS0tPTdf78eQ0cODDfPBSke/fuCg0NdT++dFz/5ptvtH37dt1xxx2S5DFe33DDDTp48KB27NghqeCxuEqVKqpfv76ee+45jRs3Tps2bcrxFTggkFBwI+hkD4pLlizRqlWrlJmZqY4dO6pz587uQW/JkiVq166doqOjJUnHjh2TJCUkJORYX82aNd3PZ8utXf369bVkyRLFx8dr4MCBql+/vurXr+/xfeuieuyxx/T8889r3bp16tatm+Li4tSpU6dcf1arSpUqHo8jIiJUrlw5RUVF5Vh+9uxZ9+Njx47l2e/s57P/rVGjhseHAUmKj49XWFhYjhzlJy4uzuNxZGSkJOUYVC9tl932l+0OHz6s999/X+Hh4R5/2d8VzC6s77rrLr366qvau3evfvvb3yo+Pl6tWrVyf/iTpJdeekl/+ctfNG/ePHXo0EFVqlTRLbfcoq+//rrQ
fQOAsi6YxuEjR44oMTEx3zZFHR9zG6/zW/7LMVuSqlevniOGGjVquGORpE8//VRdu3aVJE2ZMkWrV6/Whg0bNHz4cEk/j7dHjhyRpAL7WJCCxvXDhw9Lkh599NEc4/WAAQMk/TxeFzQWu1wuffzxx7ruuus0duxYXXXVVapWrZoeeughZWRklKgfgBMouBF0EhMTlZycrCVLlig9PV2pqamqVKmSOnXqpIMHD2r9+vVat26dx9nq7IHi4MGDOdZ34MABVa1a1WPZpYNqtmuvvVbvv/++fvzxR61bt05t2rTRkCFD9NZbbxWrL2FhYXr44Ye1ceNGff/995o1a5b27dun6667zmszssbFxeXZb0nuvsfFxenw4cM5Jlv57rvvlJWVlSNHvlK1alV17dpVGzZsyPWvX79+7rb33HOP1qxZox9//FEffPCBzEw9evRwn4GPiYnRqFGjtH37dh06dEiTJk3SunXrdOONN/qlbwBQGgXTOFytWrUck7ZdytfjY3bx+kuHDh1yxyJJb731lsLDw7VgwQL16tVLbdu2VWpqao7XZU8iV1AfSyo7B4899lie43WzZs0kFW4sTkpK0tSpU3Xo0CHt2LFDQ4cO1cSJE/WnP/3J0X4AxUHBjaDUuXNnLV26VOnp6erSpYskKTk5WXXq1NGTTz6pzMxMj4E++7a0N99802M9GzZs0LZt29SpU6cibT80NFStWrXShAkTJEkbN26UlPeV3MKoVKmSevbsqYEDB+r777/Xnj17iryO3HTq1Elbt251x5jt9ddfl8vlUocOHdztTp48meP3tV9//XX389kuvQrtpB49euirr75S/fr1lZqamuMv+0r9L8XExKhbt24aPny4zp07py1btuRoU716dfXt21e33367duzYwU/OAEARBMs43K1bN+3cudP99arcFGV89IaMjAzNnz/fY9nMmTMVEhKiX//615IunpAICwvzuM37zJkzeuONNzxe17VrV4WGhmrSpEn5brOk43pKSooaNGigL774ItexOjU1VRUqVMjxusKMxcnJyfrrX/+qpk2b5vgsAwSCMH8HADihU6dOmjhxoo4eParx48d7LJ82bZoqV67s8VMkKSkp6t+/v15++WWFhISoW7du2rNnj5544gnVrl1bQ4cOLXCbkydP1tKlS9W9e3fVqVNHZ8+edf+EVfaHigoVKigpKUnvvfeeOnXqpCpVqqhq1aqqW7duruu88cYbdfnllys1NVXVqlXT3r17NX78eCUlJalBgwbFT9AvDB06VK+//rq6d++up556SklJSfrggw80ceJEPfjgg0pOTpYk3X333ZowYYL69OmjPXv2qGnTplq1apVGjx6tG264weODU9OmTbV8+XK9//77SkhIUIUKFZSSkuKVeC/11FNPKT09XW3bttVDDz2klJQUnT17Vnv27NHChQs1efJkJSYm6r777lN0dLTatWunhIQEHTp0SM8884wqVqyoq6++WpLUqlUr9ejRQ1dccYUqV66sbdu26Y033lCbNm1Urlw5R+IHgGAULOPwkCFD9Pbbb+vmm2/WsGHD1LJlS505c0YrVqxQjx491KFDhyKNj94QFxenBx98UP/973+VnJyshQsXasqUKXrwwQfd8550795d48aNU+/evdW/f38dO3ZMzz//vPuEQ7a6devq8ccf19/+9jedOXNGt99+uypWrKitW7fq6NGjGjVqlKSL4/o777yjSZMmqUWLFgoJCcn1inl+/vnPf6pbt2667rrr1LdvX9WqVUvff/+9tm3bpo0bN7p/Sq6gsfjLL7/UoEGD9Lvf/U4NGjRQRESEli5dqi+//FLDhg3zQoYBL/PvnG2AM3744QcLCQmxmJgYO3funHv5jBkz3D99canz58/bmDFjLDk52cLDw61q1ap255132r59+zzapaWlWZMmTXK8fu3atXbrrbdaUlKSRUZGWlxcnKWlpdn8+fM92i1ZssSaN29ukZGRJinX2byzvfDCC9a2bVurWrWqRUREWJ06daxfv362Z88ed5vsWcp/+ZMpZmZ9+vSxmJiYHOvMLf69e/da7969LS4uzsLDwy0lJcWee+4594yh2Y4dO2YPPPCAJSQkWFhYmCUlJdljjz1mZ8+e9Wi3efNma9eunZUrV84kWVpampn9PEv5hg0bPNovW7bMJHn8nEdSUpJ179491/iz15ftyJEj9tBDD9lll11m4eHhVqVKFWvRooUNHz7cPRPs9OnTrUOHDla9enWLiIiwmjVrWq9evTxmfx02bJilpqZa5cqVLTIy0urVq2dDhw61o0eP5ogDAJC3YBmHs/syePBgq1OnjoWHh1t8fLx1797dtm/f7m5T2PFRufx8V/ZM388995zH8uyxcfbs2Tn6vnz5cktNTbXIyEhLSEiwxx9/PMfs36+++qqlpKS4x7NnnnnGpk6dapJs9+7dHm1ff/11u/rqqy0qKsrKly9vzZs39/ipru+//9569uxplSpVMpfLZdklRF6xZ/f10tngv/jiC+vVq5fFx8dbeHi41ahRwzp27GiTJ092tyloLD58+LD17dvXGjZsaDExMVa+fHm74oor7B//+IfHTOlAoHCZXfKFEwAAAAABp3379jp69Ki++uorf4cCoJD4DjcAAAAAAA6g4AYAAAAAwAHcUg4AAAAAgAO4wg0AAAAAgAMouAEAAAAAcIBffof7woULOnDggCpUqCCXy+WPEAAACAhmpoyMDNWsWVMhIf45D864DADARd4el/1ScB84cEC1a9f2x6YBAAhI+/btU2Jiol+2zbgMAIAnb43Lfim4K1SoIOliJ2JjY/0RAgAAAeHEiROqXbu2e2z0B8ZlAAAu8va47JeCO/t2tdjYWAZ2AAAkv97KzbgMAIAnb43Lfim4gaCw7Jn8n+/wmG/iAAD4T0FjgcR4AABlGAU3AABAXgpTUAMAkAd+FgwAAAAAAAdwhRvBidu9AQAAAPgZV7gBAAAAAHAAV7gBAEDpxIRlAIAAR8ENAADKLiZFAwA4iFvKAQAAAABwAAU3AAAAAAAOoOAGAAAAAMABFNwAAAAAADiAghsAAAAAAAdQcAMAAAAA4AB+FgwAAAQvfvYLAOBHXOEGAAAAAMABXOEGAACBKViuThfUjw6P+SYOAIDPcYUbAAAAAAAHUHADAAAAAOAACm4AAAAAABzAd7jhfcHyXbWSfnfQB3lYO/XRfJ9v0+/5Em8DAOCwYBk3AQA5cIUbAAAAAAAHcIUb8JfCXEHnqgYAAABQalFwIzA5fXtdsPzUDAAAAICARcENAAAQyPiONwCUWhTcQAAraFI0AAAAAIGLSdMAAAAAAHAAV7jhe974/jTfwfaZf6TvzPf5oV2SfRQJAAAAULpQcAMAAP/g5CkAIMhRcJc2vpg4hclZAAAAAKDEKLjLGm7nhpdxyzkA5G/trmP5Pt+mXpyPIgEA+BqTpgEAAAAA4ACucKNMKuhqg1Q2rjgUdHUaAIBAwR1VAEojrnADAAAAAOAArnADwazA79v/1idhAAAcVJi5VZjwFAD8goI
bAAAAJVKYrygVdMs3X3MCEIwouAE/Kcz3yB3fRp2Sb6P1f18poMXzJd8IAKBEAuH7zxTUAMoiCm4gD/yMS2DwxlUTAAAAwB+YNA0AAAAAAAdwhRsoJq6AAwB8wRfjzdqpj+a/jX6B//Ug7ogCEIi4wg0AAAAAgAO4wo1SiavL3lHwhGcorECYkAgA8sLxHgD8g4IbAADAj0r6qxW++NULZhgHgOKh4A42y57xdwSAz5X0gyBXnwHkxRfFbDAozBX0dXX6+yCSkuFuJQDeRsGNoBQIH5ACIYZAwFURAAAAlFUU3AAAAMiXN74DXtA6fHEFnJPAAHyNgjvQlIFbwrnyi6IIltsUC8JtjAhKZWBMAwAgPxTcAAAAKPUC4QStN66gF3SC1RdX6UsaAyeJgZ9RcHsTZ/IlcQUbcEJhPmDxAQcAACCwUHADAAAAheCLq8t8zxwILhTcyIEr1PAmb0y043QMa6fm//o2/Z73YjQAfIkxrfQIhPECpUcw3NbO3WtlQ/AU3AXdzt3hMee3UYDCDPpt6sWVaB0lfT1QGjn9IW3t1EcLjqGA5/+Rnv/3Br0xe29BcZb4xEFhjoElPNb64gMUv9sOAAB8xS8Ft5lJkk6cOOG9lZ46m//z3thWQdso6OVnfiqwzYkCtlHQOkr6egDOOHvqZL7PF/R/s6DXF2YdJT7mFuYYWMJtFNRPb4wbhcml0zHktr7ssdEfHBmXpQL3mU/3fO/d7QEFaLrjZUfXvyHxHkfXL0lX759WotcXJsaCjgUlPVZPWPpNgTGU1DPzNjq+jYEdf1Wi1xdmPCrpcdkbuS5pP72hoH54M0Zvj8su88MIv3//ftWuXdvXmwUAIGDt27dPiYmJftk24zIAAJ68NS77peC+cOGCDhw4oAoVKsjlcvl680V24sQJ1a5dW/v27VNsbKy/w/GqYO1bsPZLCt6+0a/SJ1j75ut+mZkyMjJUs2ZNhYSEOL693BR2XA7W99xXyF/JkL+SIX8lQ/5KpjTlz9vjsl9uKQ8JCfHbWfySiI2NDfgdpLiCtW/B2i8pePtGv0qfYO2bL/tVsWJFn2wnL0Udl4P1PfcV8lcy5K9kyF/JkL+SKS358+a47J9T6QAAAAAABDkKbgAAAAAAHEDBXQiRkZEaMWKEIiMj/R2K1wVr34K1X1Lw9o1+lT7B2rdg7Zc3kJuSIX8lQ/5KhvyVDPkrmbKcP79MmgYAAAAAQLDjCjcAAAAAAA6g4AYAAAAAwAEU3AAAAAAAOICCGwAAAAAAB1BwAwAAAADggDJbcE+cOFGXXXaZoqKi1KJFC61cuTLf9itWrFCLFi0UFRWlevXqafLkyTnajB8/XikpKYqOjlbt2rU1dOhQnT171qku5Koo/Tp48KB69+6tlJQUhYSEaMiQIbm2mzt3rho3bqzIyEg1btxY7777rkPR583b/ZoyZYquvfZaVa5cWZUrV1bnzp316aefOtiDvDnxnmV766235HK5dMstt3g36EJwol/Hjx/XwIEDlZCQoKioKDVq1EgLFy50qAd5c6Jvpe348c4776hLly6qVq2aYmNj1aZNGy1atChHu0A4fkje71sgHUO8zYnxsSxx4v9RWVLU/S/b6tWrFRYWpmbNmjkbYAArau5++uknDR8+XElJSYqMjFT9+vX16quv+ijawFPU/M2YMUNXXnmlypUrp4SEBN1zzz06duyYj6INLJ988oluvPFG1axZUy6XS/PmzSvwNWVq7LAy6K233rLw8HCbMmWKbd261QYPHmwxMTG2d+/eXNvv2rXLypUrZ4MHD7atW7falClTLDw83ObMmeNu8+abb1pkZKTNmDHDdu/ebYsWLbKEhAQbMmSIr7pV5H7t3r3bHnroIZs+fbo1a9bMBg8enKPNmjVrLDQ01EaPHm3btm2z0aNHW1hYmK1bt87h3vzMiX717t3bJkyYYJs2bbJt27bZPffcYxUrVrT9+/c73BtPTvQt2549e6xWrVp27bXX2s033+xMB/LgRL9++uknS01NtRtuuMFWrVple/bssZUrV9rmzZsd7o0nJ/pWGo8fgwcPtjFjxtinn35qO3futMcee8zCw8Nt48aN7jaBcPwwc6ZvgXIM8TYnxseyxIl9rSwpav6yHT9+3OrVq2ddu3a1K6+80jfBBpji5O6mm26yVq1aWXp6uu3evdvWr19vq1ev9mHUgaOo+Vu5cqWFhITYiy++aLt27bKVK1dakyZN7JZbbvFx5IFh4cKFNnz4cJs7d65JsnfffTff9mVt7CiTBXfLli3tgQce8FjWsGFDGzZsWK7t//znP1vDhg09lt1///3WunVr9+OBAwdax44dPdo8/PDDds0113gp6oIVtV+/lJaWlmsh0KtXL7v++us9ll133XV22223lSjWonCiX5fKysqyChUq2PTp04sbZrE41besrCxr166d/etf/7I+ffr4vOB2ol+TJk2yevXq2blz57wVZrE40bfSfvzI1rhxYxs1apT7cSAcP8yc6dul/HUM8TYnxseyxBf7WjArbv5+//vf21//+lcbMWJEmS24i5q7Dz/80CpWrGjHjh3zRXgBr6j5e+6556xevXoey1566SVLTEx0LMbSojAFd1kbO8rcLeXnzp3T559/rq5du3os79q1q9asWZPra9auXZuj/XXXXafPPvtMmZmZkqRrrrlGn3/+ufuWwl27dmnhwoXq3r27A73IqTj9Koy8+l6SdRaFU/261OnTp5WZmakqVap4bZ0FcbJvTz31lKpVq6Z+/fqVaD3F4VS/5s+frzZt2mjgwIGqXr26Lr/8co0ePVrnz58vaciF5lTfguH4ceHCBWVkZHj8H/L38UNyrm+X8scxxNucGh/LCl/ta8GquPmbNm2avv32W40YMcLpEANWcXI3f/58paamauzYsapVq5aSk5P16KOP6syZM74IOaAUJ39t27bV/v37tXDhQpmZDh8+rDlz5vhs3C7tytrYEebvAHzt6NGjOn/+vKpXr+6xvHr16jp06FCurzl06FCu7bOysnT06FElJCTotttu05EjR3TNNdfIzJSVlaUHH3xQw4YNc6wvv1ScfhVGXn0vyTqLwql+XWrYsGGqVauWOnfu7LV1FsSpvq1evVpTp07V5s2bSxhh8TjVr127dmnp0qW64447tHDhQn399dcaOHCgsrKy9OSTT5Y07EJxqm/BcPx44YUXdOrUKfXq1cu9zN/HD8m5vl3KH8cQb3NqfCwrfLWvBavi5O/rr7/WsGHDtHLlSoWFlbmPtG7Fyd2uXbu0atUqRUVF6d1339XRo0c1YMAAff/992Xue9zFyV/btm01Y8YM/f73v9fZs2eVlZWlm266SS+//LIvQi71ytrYUeaucGdzuVwej80sx7KC2v9y+fLly/X3v/9dEydO1MaNG/XOO+9owYIF+tvf/ublyPNX1H75a52BFMPYsWM1a9YsvfPOO4qKivLKOovCm33LyMjQnXfeqSlTpqhq1areCK/YvP2eXbhwQfHx8XrllVfUokUL3XbbbRo+fLgmTZpU0lCLzN
t9K+3Hj1mzZmnkyJF6++23FR8f75V1epsTfcvm72OIt3l7fCxrnNzXyoLC5u/8+fPq3bu3Ro0apeTkZF+FF9CKsu9duHBBLpdLM2bMUMuWLXXDDTdo3Lhxeu2118rkVW6paPnbunWrHnroIT355JP6/PPP9dFHH2n37t164IEHfBFqUChLY0eZOx1YtWpVhYaG5jhj9d133+U405KtRo0aubYPCwtTXFycJOmJJ57QXXfdpXvvvVeS1LRpU506dUr9+/fX8OHDFRLi7LmN4vSrMPLqe0nWWRRO9Svb888/r9GjR2vJkiW64oorSry+onCib99++6327NmjG2+80b3swoULkqSwsDDt2LFD9evXL37QheDUe5aQkKDw8HCFhoa6lzVq1EiHDh3SuXPnFBERUex1F5ZTfSvNx4+3335b/fr10+zZs3Nc3fX38UNyrm/Z/HkM8Tanxseywul9LdgVNX8ZGRn67LPPtGnTJg0aNEjSxfHOzBQWFqbFixerY8eOPond34qz7yUkJKhWrVqqWLGie1mjRo1kZtq/f78aNGjgaMyBpDj5e+aZZ9SuXTv96U9/kiRdccUViomJ0bXXXqunn3466K7QeltZGzvK3BXuiIgItWjRQunp6R7L09PT1bZt21xf06ZNmxztFy9erNTUVIWHh0u6+P29Sz8Uh4aGyi5OTOfFHuSuOP0qjLz6XpJ1FoVT/ZKk5557Tn/729/00UcfKTU1tUTrKg4n+tawYUP95z//0ebNm91/N910kzp06KDNmzerdu3a3gg9X069Z+3atdM333zjPoEgSTt37lRCQoJPim3Jub6V1uPHrFmz1LdvX82cOTPX7635+/ghOdc3yf/HEG9zanwsK5zc18qCouYvNjY2x3j3wAMPKCUlRZs3b1arVq18FbrfFWffa9eunQ4cOKCTJ0+6l+3cuVMhISFKTEx0NN5AU5z85TVuS/LJuF3albmxwzdzswWW7Kn/p06dalu3brUhQ4ZYTEyM7dmzx8zMhg0bZnfddZe7ffbU9UOHDrWtW7fa1KlTc0xdP2LECKtQoYLNmjXLdu3aZYsXL7b69etbr169ArZfZmabNm2yTZs2WYsWLax37962adMm27Jli/v51atXW2hoqD377LO2bds2e/bZZ/32s2De7NeYMWMsIiLC5syZYwcPHnT/ZWRk+KxfTvXtUv6YpdyJfv33v/+18uXL26BBg2zHjh22YMECi4+Pt6effrrU9600Hj9mzpxpYWFhNmHCBI//Q8ePH3e3CYTjh1N9C5RjiLc5MT6WJU7sa2VJcY6vv1SWZykvau4yMjIsMTHRevbsaVu2bLEVK1ZYgwYN7N577/VXF/yqqPmbNm2ahYWF2cSJE+3bb7+1VatWWWpqqrVs2dJfXfCrjIwM92cdSTZu3DjbtGmT+2fVyvrYUSYLbjOzCRMmWFJSkkVERNhVV11lK1ascD/Xp08fS0tL82i/fPlya968uUVERFjdunVt0qRJHs9nZmbayJEjrX79+hYVFWW1a9e2AQMG2A8//OCD3vysqP2SlOMvKSnJo83s2bMtJSXFwsPDrWHDhjZ37lwf9MSTt/uVlJSUa5sRI0b4pkO/4MR79kv+KLjNnOnXmjVrrFWrVhYZGWn16tWzv//975aVleWD3njydt9K4/EjLS0t13716dPHY52BcPww837fAukY4m3eHh/LGif+H5UlRd3/fqksF9xmRc/dtm3brHPnzhYdHW2JiYn28MMP2+nTp30cdeAoav5eeukla9y4sUVHR1tCQoLdcccdtn//fh9HHRiWLVuW77GsrI8dLjPuewAAAAAAwNvK3He4AQAAAADwBQpuAAAAAAAcQMENAAAAAIADKLgBAAAAAHAABTcAAAAAAA6g4AYAAAAAwAEU3AAAAAAAOICCGwAAAAAAB1BwAwAAAADgAApuAAAAAAAcQMENAAAAAIADKLgBAAAAAHAABTcAAAAAAA6g4AYAAAAAwAEU3AAAAAAAOICCGwAAAAAAB1BwAwAAAADgAApuAAAAAAAcQMENAAAAAIADKLgBAAAAAHAABTcAAAAAAA6g4AYAAAAAwAEU3AAAAAAAOICCGwAAAAAAB1BwAwAAAADgAApuAAAAAAAcQMENAAAAAIADKLgBAAAAAHAABTcAAAAAAA6g4AYAAAAAwAEU3AAAAAAAOICCGwAAAAAAB1BwAwAAAADgAApuAAAAAAAcQMENAAAAAIADKLgBAAAAAHAABTcQoE6fPq2RI0dq+fLl/g4l4OzZs0cul0uvvfaae9maNWs0cuRIHT9+3G9xAQD8i7GzdKlbt6769u1brNfOnDlT48eP92o8gBPC/B0AgNydPn1ao0aNkiS1b9/ev8EEmISEBK1du1b169d3L1uzZo1GjRqlvn37qlKlSv4LDgDgN4ydpcu7776r2NjYYr125syZ+uqrrzRkyBDvBgV4GQU34GOnT59WuXLl/B1GqRYZGanWrVv7OwwAgI8wdgan5s2b+zsEwHHcUo4ya8uWLXK5XJo9e7Z72eeffy6Xy6UmTZp4tL3pppvUokUL9+MLFy5o7NixatiwoSIjIxUfH6+7775b+/fv93hd+/btdfnll+uTTz5R27ZtVa5cOf3hD3+QJC1dulTt27dXXFycoqOjVadOHf32t7/V6dOntWfPHlWrVk2SNGrUKLlcLrlcrgJvuzp+/LgeeeQR1atXzx3XDTfcoO3bt7vbfP/99xowYIBq1aqliIgI1atXT8OHD9dPP/3ksS6Xy6VBgwbpjTfeUKNGjVSuXDldeeWVWrBgQY7tbt++XbfffruqV6+uyMhI1alTR3fffbd7nUeOHNGAAQPUuHFjlS9fXvHx8erYsaNWrlzpXkdmZqbi4+N111135dqv6OhoPfzww5Jy3lI+cuRI/elPf5IkXXbZZe58LV++XP369VOVKlV0+vTpHOvt2LFjjvcaAJA3xs7SO3ZK0okTJ/Too4/qsssuU0REhGrVqqUhQ4bo1KlT+eZI+vl9WblypVq3bq3o6GjVqlVLTzzxhM6fP+/RtrD5uvSW8uXLl8vlcmnWrFkaPny4atasqdjYWHXu3Fk7duzwiOWDDz7Q3r173e+zy+VyPz9p0iRdeeWVKl++vCpUqKCGDRvq8ccfL7CPgCMMKMMSEhKsf//+7sfPPvusRUdHmyT73//+Z2ZmmZmZFhsba3/+85/d7fr372+SbNCgQfbRRx/Z5MmTrVq1ala7dm07cuSIu11aWppVqVLFateubS+//LItW7bMVqxYYbt377aoqCjr0qWLzZs3z5YvX24zZsywu+66y3744Qc7e/asffTRRybJ+vXrZ2vXrrW1a9faN998k2dfTpw4YU2aNLGYmBh76qmnbNGiRTZ37lwbPHiwLV261MzMzpw5Y1dccYXFxMTY888/b4sXL7YnnnjCwsLC7IYbbvBYnySrW7eutWzZ0v7973/bwoULrX379hYWFmbffvutu93mzZutfPnyVrduXZs8ebJ9/PHH9uabb1qvXr3sxIkTZma2fft2e/DBB
+2tt96y5cuX24IFC6xfv34WEhJiy5Ytc69r6NChFh0dbT/++KNHLBMnTjRJ9uWXX5qZ2e7du02STZs2zczM9u3bZ3/84x9Nkr3zzjvufP3444/2xRdfmCSbMmWKxzq3bNlikmzChAl55hQAkBNjZ+kcO0+dOmXNmjWzqlWr2rhx42zJkiX24osvWsWKFa1jx4524cKFfN/3tLQ0i4uLs5o1a9pLL71kixYtsoceesgk2cCBA93tipKvpKQk69Onj/vxsmXL3Dm844477IMPPrBZs2ZZnTp1rEGDBpaVlWVmF8fwdu3aWY0aNdzv89q1a83MbNasWSbJ/vjHP9rixYttyZIlNnnyZHvooYfy7R/gFApulGl33nmn1atXz/24c+fOdt9991nlypVt+vTpZma2evVqk2SLFy82M7Nt27aZJBswYIDHutavX2+S7PHHH3cvS0tLM0n28ccfe7SdM2eOSbLNmzfnGduRI0dMko0YMaJQfXnqqadMkqWnp+fZZvLkySbJ/v3vf3ssHzNmjEcfzS5+aKhevbp74DczO3TokIWEhNgzzzzjXtaxY0erVKmSfffdd4WK08wsKyvLMjMzrVOnTnbrrbe6l3/55ZcmyV555RWP9i1btrQWLVq4H19acJuZPffccybJdu/enWN7aWlp1qxZM49lDz74oMXGxlpGRkah4wYAMHZmK21j5zPPPGMhISG2YcMGj3bZeV24cGG+289+X9577z2P5ffdd5+FhITY3r17zaxo+cqr4L60MP/3v/9tktxFtZlZ9+7dLSkpKUecgwYNskqVKuXbF8CXuKUcZVqnTp20a9cu7d69W2fPntWqVat0/fXXq0OHDkpPT5ckLVmyRJGRkbrmmmskScuWLZOkHLeotWzZUo0aNdLHH3/ssbxy5crq2LGjx7JmzZopIiJC/fv31/Tp07Vr164S9+XDDz9UcnKyOnfunGebpUuXKiYmRj179vRYnt2XS2Pv0KGDKlSo4H5cvXp1xcfHa+/evZIufqduxYoV6tWrl/s2vrxMnjxZV111laKiohQWFqbw8HB9/PHH2rZtm7tN06ZN1aJFC02bNs29bNu2bfr000/dtxMWx+DBg7V582atXr1a0sVb6t544w316dNH5cuXL/Z6AaAsYuy8qLSNnQsWLNDll1+uZs2aKSsry/133XXXub+GVZAKFSropptu8ljWu3dvXbhwQZ988omkoucrN5du44orrpAkdw7z07JlSx0/fly333673nvvPR09erTA1wBOouBGmZY9wC5ZskSrVq1SZmamOnbsqM6dO7sHhCVLlqhdu3aKjo6WJB07dkzSxZmyL1WzZk3389lya1e/fn0tWbJE8fHxGjhwoOrXr6/69evrxRdfLHZfjhw5osTExHzbHDt2TDVq1PD4npMkxcfHKywsLEfscXFxOdYRGRmpM2fOSJJ++OEHnT9/vsDtjhs3Tg8++KBatWqluXPnat26ddqwYYOuv/5697qy/eEPf9DatWvd352bNm2aIiMjdfvtt+e7jfzcfPPNqlu3riZMmCBJeu2113Tq1CkNHDiw2OsEgLKKsfOi0jZ2Hj58WF9++aXCw8M9/ipUqCAzK1RhWr169RzLatSoIenn97io+crNpTmMjIyUpBz9zs1dd92lV199VXv37tVvf/tbxcfHq1WrVu6TQYCvUXCjTEtMTFRycrKWLFmi9PR0paamqlKlSurUqZMOHjyo9evXa926dR5nvrMHgYMHD+ZY34EDB1S1alWPZZcOONmuvfZavf/++/rxxx+1bt06tWnTRkOGDNFbb71VrL5Uq1Ytx8Qzl4qLi9Phw4dlZh7Lv/vuO2VlZeWIvSBVqlRRaGhogdt988031b59e02aNEndu3dXq1atlJqaqoyMjBxtb7/9dkVGRuq1117T+fPn9cYbb+iWW25R5cqVixTbL4WEhGjgwIGaM2eODh48qIkTJ6pTp05KSUkp9joBoKxi7LyotI2dVatWVdOmTbVhw4Zc/5544okCYz98+HCOZYcOHZL083vs7XwVxz333KM1a9boxx9/1AcffCAzU48ePQp1hRzwNgpulHmdO3fW0qVLlZ6eri5dukiSkpOTVadOHT355JPKzMz0+NCQfYvbm2++6bGeDRs2aNu2berUqVORth8aGqpWrVq5r75u3LhRUtHO5kpSt27dtHPnTi1dujTPNp06ddLJkyc1b948j+Wvv/66+/miiI6OVlpammbPnp3vmXGXy+XuT7Yvv/xSa9euzdG2cuXKuuWWW/T6669rwYIFOnToUKFuJy8oX/fee68iIiJ0xx13aMeOHRo0aFCB6wQA5I6xs/SNnT169NC3336ruLg4paam5virW7dugbFnZGRo/vz5HstmzpypkJAQ/frXv5bk/Xzl5Zd3DeQlJiZG3bp10/Dhw3Xu3Dlt2bLFK9sGisSv3yAHAsDcuXNNkkmyFStWuJffc889JskqV65s58+f93hN//79zeVy2ZAhQ2zRokX2z3/+0+Lj46127dp29OhRd7u0tDRr0qRJjm1OmjTJfve739lrr71mS5cutYULF1rPnj1Nki1atMjdLikpyVJSUmzRokW2YcOGXCcEy5Y902r58uXt6aeftsWLF9t7771nDz/8cI6ZVitUqGDjxo2z9PR0GzFihIWHh+c60+ovZx39ZUy/nOAke6bVevXq2SuvvGJLly61WbNm2e233+6eNObJJ580l8tlTz75pH388cc2ceJEq1GjhtWvXz/XCU8WLVpkkiwxMdESExNz5D+3SdOyJ1q5//77bc2aNbZhwwaPSWvMLk6UJsmSkpJyrBMAUHiMnaVv7Dx58qQ1b97cEhMT7YUXXrD09HRbtGiRTZkyxX73u9/ZunXr8syTmecs5S+//LItWrTIBg8ebJLswQcfdLcrSr7ymjRt9uzZHu1yG/dHjBhhkmzixIm2fv1692Rw9957r/3xj3+0t956y1asWGFvv/22NWvWzCpWrFikSeoAb6HgRpn3ww8/WEhIiMXExNi5c+fcy2fMmGGS7De/+U2O15w/f97GjBljycnJFh4eblWrVrU777zT9u3b59Eurw8Na9eutVtvvdWSkpIsMjLS4uLiLC0tzebPn+/RbsmSJda8eXOLjIw0SR6DUl59GTx4sNWpU8fCw8MtPj7eunfvbtu3b3e3OXbsmD3wwAOWkJBgYWFhlpSUZI899pidPXvWY12F/dBgZrZ161b73e9+Z3FxcRYREWF16tSxvn37utf5008/2aOPPmq1atWyqKgou+qqq2zevHnWp0+fXD80nD9/3mrXrm2SbPjw4Tmez23gNTN77LHHrGbNmhYSEmKSPH42xcxs+fLlJsmeffbZfLIIACgIY2fpGzvNLhbdf/3rXy0lJcUiIiKsYsWK1rRpUxs6dKgdOnQo3zxlvy/Lly+31NRUi4yMtISEBHv88cctMzPTo21h81WSgvv777+3nj17WqVKlczlcln2dcTp06dbhw4drHr16hYREWE1a9a0Xr16uX8eDfA1l9klX7AAgCD1yCOP
[diff hunk collapsed: two base64-encoded matplotlib PNG "display_data" outputs ("image/png" with accompanying "text/plain" entries and empty "metadata") removed from the notebook; the raw image data is omitted here]
AbfJwuL/TxwiysREREREREREdUeD4moM6/nX/8TERERERERERG9JLiCjoiIiIiIiIiIak3JFXR1hivoiIiIiIiIiIiIBGKC7jny8fHBpEmTXpl+iYiIiIiIiIiemULgVc8wQfcaKy0tfWGxSkpK6nQML3LsRERERERERETPExN0z0lAQACSk5MRHR0NmUwGmUyG/Px8AEBWVhZ69eoFfX19mJubY9iwYbhx4wYAICkpCVpaWkhJSZH6ioqKQsOGDVFYWFhtv3FxcTA2NlYZw65duyCTyaTP4eHhcHNzw4YNG2BnZwe5XA6lUonbt29j9OjRaNSoEQwNDdG1a1ecOnWqxvlduXIF/v7+MDExgampKfz8/KT5Vcy/b9++iIiIQOPGjeHo6Ij8/HzIZDLEx8fDx8cH2tra2Lx5MxQKBebNm4emTZtCLpfDzc0NCQkJUl/VtSMiIiIiIiIiqg+YoHtOoqOj0aFDBwQFBaGwsBCFhYWwsrJCYWEhvL294ebmhhMnTiAhIQG///47Bg4cCOB/21eHDRuG27dv49SpU5g1axZiYmJgaWlZbb/P6sKFC4iPj8eOHTuQmZkJAOjduzeuXbuGvXv3IiMjA+7u7ujWrRtu3rxZZR/3799Hly5doK+vj8OHDyM1NRX6+vro0aOHykq5Q4cOITs7GwcOHMCePXuk8pkzZyIkJATZ2dnw9fVFdHQ0oqKisHTpUpw+fRq+vr547733kJOToxL3yXZEREREREREJI5SIe6qb3iK63NiZGQELS0t6OrqwsLCQipfs2YN3N3dsWjRIqlsw4YNsLKywvnz5+Ho6IgFCxbg4MGDGD16NM6ePYthw4bh/fffr7HfZ1VSUoJNmzbBzMwMAPDzzz/jzJkzuH79OuRyOQBg6dKl2LVrF7Zv347Ro0dX6mPr1q1QU1PD+vXrpRV6sbGxMDY2RlJSEt555x0AgJ6eHtavXw8tLS0AkFbYTZo0Cf369ZP6W7p0KWbOnIlBgwYBACIjI5GYmIgVK1bgiy++kOo92Y6IiIiIiIiIqD5ggu4Fy8jIQGJiIvT19Svdy83NhaOjI7S0tLB582a0bt0a1tbWWLFiRZ3Ft7a2lpJzFeO5d+8eTE1NVeo9ePAAubm51c7hwoULMDAwUCl/+PChShtXV1cpOfc4Dw8P6c937tzB1atX4enpqVLH09Oz0jbbx9tVp7i4GMXFxSplCqUCajIuFiUiIiIiIiKqU/VwJZsoTNC9YAqFAn369EFkZGSle5aWltKf09PTAQA3b97EzZs3oaenV2O/ampqUCqVKmVVHaTwZD8KhQKWlpZISkqqVPfJd9o93qZdu3bYsmVLpXuPJ/+qG3NV5Y+/Kw8AlEplpbKnPQMAiIiIwNy5c1XKmurboJmh7VPbEhERERERERGJwATdc6SlpYXy8nKVMnd3d+zYsQM2NjbQ0Kj68efm5mLy5MmIiYlBfHw8hg8fjkOHDkFNTa3afs3MzHD37l0UFRVJiayKd8zVxN3dHdeuXYOGhgZsbGyeaV7u7u7Ytm2bdKjEP2FoaIjGjRsjNTUVXl5eUnl6ejreeOONWvcXGhqKKVOmqJT5tnzvH42RiIiIiIiIiOh54r6/58jGxgbHjh1Dfn4+bty4AYVCgfHjx+PmzZsYPHgwfvnlF1y8eBH79+9HYGAgysvLUV5ejmHDhuGdd97BRx99hNjYWPz222+Iioqqsd8333wTurq6+OSTT3DhwgV88803iIuLe+oYu3fvjg4dOqBv377Yt28f8vPzkZ6ejtmzZ+PEiRNVthk6dCgaNmwIPz8/pKSkIC8vD8nJyZg4cSL++9//1vo5TZ8+HZGRkdi2bRvOnTuHjz/+GJmZmZg4cWKt+5LL5TA0NFS5uL2ViIiIiIiIqO7xkIi6w8zFczRt2jSoq6vD2dkZZmZmKCgoQOPGjZGWloby8nL4+vrCxcUFEydOhJGREdTU1LBw4ULk5+fjyy+/BABYWFhg/fr1mD17trQirqp+GzRogM2bN2Pv3r1wdXXFt99+i/Dw8KeOUSaTYe/evfDy8kJgYCAcHR0xaNAg5Ofnw9zcvMo2urq6OHz4MJo1a4Z+/frByckJgYGBePDgwd9aURcSEoKpU6di6tSpcHV1RUJCAnbv3g0HB4da90VERERERERE9KqRKZ98cRlRPePZpKuw2F/paQuLPbLoobDYInlp1v5047pySflAWGxrmY6w2K/rvO3KXs+/4wrInCcsdpzbHGGxL2qI+2vaEVp/CYudcr+BsNivq866N4XFvnLb4OmVnpM5Gn8Ii83f14jqL2sNI2Gxv7n0vbDYL9L1bt7CYjc6lCws9vPwev6/CyIiIiIiIiIiopcED4kgIiIiIiIiIqJaq4/vghOFK+iIiIiIiIiIiIgEYoKOiIiIiIiIiIhIIG5xJSIiIiIiIiKi2lPKRI+g3mCCjuq9A9MdhMU+EvGXsNgHQq2ExRZJ5DNv4ybu5D3dni2FxQbEnfon8uv9lba402s3ZkQJiy3yJFWRJ8iW/zdLWOxN724VFlvkiaIiNekhbpPJL9vE/Uy9qKkpLPaBaeJ+X9uytEhY7APT9YTFFmn+irvCYnd/UC4sdhMjcfMW+XNt0W5DYbGJXiVM0BERERERERERUa3xkIi6w3fQERERERERERERCcQE3QsWEBCAvn37ih4GERERERERERG9JLjF9QWLjo6GUql87nF8fHzg5uaGFStWPPdYRERERERERPT6USp4SERd4Qq6F6S8vBwKhQJGRkYwNjYWPZxnVlJSUmd9lZaW1ln/dTkuIiIiIiIiIiKRmKCrgo+PD4KDgxEcHAxjY2OYmppi9uzZKivfSkpKMGPGDDRp0gR6enp48803kZSUJN2Pi4uDsbEx9uzZA2dnZ8jlcly6dKnSFlcfHx9MmDABkyZNgomJCczNzfHll1+iqKgIH330EQwMDNC8eXP89NNPKmPMyspCr169oK+vD3NzcwwbNgw3btwA8GgbbXJyMqKjoyGTySCTyZCfn//Udo/PfcqUKWjYsCHefvvtap9TbGwsnJycoK2tjZYtW2L16tXSvfz8fMhkMsTHx8PHxwfa2trYvHmzNP+IiAg0btwYjo6OAIAzZ86ga9eu0NHRgampKUaPHo179+5J/VXXjoiIiIiIiIjEUCrEXfUNE3TV2LhxIzQ0NHDs2DGsXLkSy5cvx/r166X7H330EdLS0rB161acPn0aAwYMQI8ePZCTkyPVuX//PiIiIrB+/XqcPXsWjRo1qjZWw4YN8csvv2DChAn417/+hQEDBqBjx4749ddf4evri2HDhuH+/fsAgMLCQnh7e8PNzQ0nTpxAQkICfv/9dwwcOBDAo220HTp0QFBQEAoLC1FYWAgrK6untnty7mlpaVi3bl2VY46JicGsWbOwcOFCZGdnY9GiRQgLC8PGjRtV6s2cORMhISHIzs6Gr68vAODQoUPIzs7GgQMHsGfPHty/fx89evSAiYkJjh8/ju+++w4HDx5EcHCwSl9PtiMiIiIiIiIiqg/4DrpqWFlZYfny
5ZDJZGjRogXOnDmD5cuXIygoCLm5ufj222/x3//+F40bNwYATJs2DQkJCYiNjcWiRYsAPNrSuXr1arRp06bGWG3atMHs2bMBAKGhofjss8/QsGFDBAUFAQDmzJmDNWvW4PTp03jrrbewZs0auLu7S3EAYMOGDbCyssL58+fh6OgILS0t6OrqwsLCQqrzLO0AwN7eHosXL65xzPPnz0dUVBT69esHALC1tUVWVhbWrVuHESNGSPUmTZok1amgp6eH9evXQ0tLC8CjZN+DBw/w9ddfQ09PDwCwatUq9OnTB5GRkTA3N6+yHRERERERERGJo1TyHXR1hQm6arz11luQyf73jdahQwdERUWhvLwcv/76K5RKZaVtlsXFxTA1NZU+a2lpoXXr1k+N9XgddXV1mJqawtXVVSqrSFBdv34dAJCRkYHExETo6+tX6is3N7fa7Z/P2s7Dw6PG8f7xxx+4fPkyRo4cKSURAaCsrAxGRkYqdavqy9XVVSXJlp2djTZt2kjJOQDw9PSEQqHAuXPnpPk/2a4qxcXFKC4uVikrLyuHXEO9xnZERERERERERKIwQfc3KBQKqKurIyMjA+rqqomfx5NfOjo6Kkm+6mhqaqp8lslkKmUVfSgUCul/K1aXPcnS0rLGcT9Lu8cTZdX1Azxa+fbmm2+q3HvyeVTV15NlSqWy2uf0ePnTxgUAERERmDt3rkrZJ77umNWj5qQjEREREREREZEoTNBV4+jRo5U+Ozg4QF1dHW3btkV5eTmuX7+Ozp07v/Cxubu7Y8eOHbCxsYGGRtVfQi0tLZSXl9e63bMwNzdHkyZNcPHiRQwdOvRv91PB2dkZGzduRFFRkZSES0tLg5qaWq0PgwgNDcWUKVNUysq/nFJNbSIiIiIiIiL6u+rjYQ2i8JCIaly+fBlTpkzBuXPn8O233+Lzzz/HxIkTAQCOjo4YOnQohg8fjp07dyIvLw/Hjx9HZGQk9u7d+9zHNn78eNy8eRODBw/GL7/8gosXL2L//v0IDAyUknI2NjY4duwY8vPzcePGDSgUimdq96zCw8MRERGB6OhonD9/HmfOnEFsbCyWLVtW6/kMHToU2traGDFiBH777TckJiZiwoQJGDZsmLS99VnJ5XIYGhqqXNzeSkREREREREQvMyboqjF8+HA8ePAAb7zxBsaPH48JEyZg9OjR0v3Y2FgMHz4cU6dORYsWLfDee+/h2LFjsLKyeu5ja9y4MdLS0lBeXg5fX1+4uLhg4sSJMDIygpraoy/ptGnToK6uDmdnZ5iZmaGgoOCZ2j2rUaNGYf369YiLi4Orqyu8vb0RFxcHW1vbWs9HV1cX+/btw82bN9G+fXv0798f3bp1w6pVq2rdFxERERERERG9GEqFTNhV33CLazU0NTWxYsUKrFmzptr7c+fOrfS+swoBAQEICAioVB4XF6fyOSkpqVKd/Pz8SmVKpVLls4ODA3bu3FllbODRKr8jR45UKn9au6rGU50hQ4ZgyJAhVd6zsbGpNGag8vwruLq64ueff642VnXtiIiIiIiIiIhedVxBR0REREREREREJBBX0BERERERERERUa1VsXGO/iYm6KpQm22eRERERERERERE/wQTdEREREREREREVGv18bAGUfgOOiIiIiIiIiIiIoG4go6I6pzMtrmw2Bc1TwuL3UZYZKD84hVhsdXtmgiLfVFTU1jsS2V/CIst0kUNhbDY5f/NEhZbvamzsNgin7ndbQNhsYVKuCss9EEddWGxAXHfayJ/dwDE/e4gdt4iZYoewGun+Jy4n2uAobDIl5QPhMV+XXAFXd3hCjoiIiIiIiIiIiKBmKAjIiIiIiIiIiISiFtciYiIiIiIiIio1pRK0SOoP7iCjoiIiIiIiIiISCCuoCMiIiIiIiIiolrjIRF1hyvo6rnS0lLRQ5CUlJRUWf53x/gyzY2IiIiIiIiI6O9igu4VkpCQgE6dOsHY2BimpqZ49913kZubK93Pz8+HTCZDfHw8fHx8oK2tjc2bNwMAYmNj4eTkBG1tbbRs2RKrV69W6XvmzJlwdHSErq4u7OzsEBYW9tQE2JUrV+Dv7w8TExOYmprCz88P+fn50v2AgAD07dsXERERaNy4MRwdHasdo0KhwLx589C0aVPI5XK4ubkhISHhmeZGRERERERERPQqY4LuFVJUVIQpU6bg+PHjOHToENTU1PD+++9DoVCo1Js5cyZCQkKQnZ0NX19fxMTEYNasWVi4cCGys7OxaNEihIWFYePGjVIbAwMDxMXFISsrC9HR0YiJicHy5curHcv9+/fRpUsX6Ovr4/Dhw0hNTYW+vj569OihslLu0KFDyM7OxoEDB7Bnz55qxxgdHY2oqCgsXboUp0+fhq+vL9577z3k5OTUODciIiIiIiIiEkOplAm76hu+g+4V8sEHH6h8/uqrr9CoUSNkZWXBxcVFKp80aRL69esnfZ4/fz6ioqKkMltbW2RlZWHdunUYMWIEAGD27NlSfRsbG0ydOhXbtm3DjBkzqhzL1q1boaamhvXr10Mme/QvRmxsLIyNjZGUlIR33nkHAKCnp4f169dDS0sLAKQVdk+OcenSpZg5cyYGDRoEAIiMjERiYiJWrFiBL774otq5ERERERERERG96pige4Xk5uYiLCwMR48exY0bN6SVcwUFBSoJOg8PD+nPf/zxBy5fvoyRI0ciKChIKi8rK4ORkZH0efv27VixYgUuXLiAe/fuoaysDIaGhtWOJSMjAxcuXICBgYFK+cOHD1W23bq6ukrJucc9PsY7d+7g6tWr8PT0VKnj6emJU6dOVduuKsXFxSguLlYpKy8rh1xDvcZ2RERERERERFQ7SsXT69CzYYLuFdKnTx9YWVkhJiYGjRs3hkKhgIuLS6XDF/T09KQ/VyTxYmJi8Oabb6rUU1d/lLQ6evQoBg0ahLlz58LX1xdGRkbYunUroqKiqh2LQqFAu3btsGXLlkr3zMzMqhxLdWOsULESr4JSqaxUVl1/FSIiIjB37lyVsk983TGrR82JPSIiIiIiIiIiUZige0X8+eefyM7Oxrp169C5c2cAQGpq6lPbmZubo0mTJrh48SKGDh1aZZ20tDRYW1tj1qxZUtmlS5dq7Nfd3R3btm1Do0aNalxp9ywMDQ3RuHFjpKamwsvLSypPT0/HG2+8Uau+QkNDMWXKFJWy8i+nVFObiIiIiIiIiEg8HhLxiqg4KfXLL7/EhQsX8PPPP1dKRFUnPDwcERERiI6Oxvnz53HmzBnExsZi2bJlAAB7e3sUFBRg69atyM3NxcqVK/H999/X2OfQoUPRsGFD+Pn5ISUlBXl5eUhOTsbEiRPx3//+t9bzmz59OiIjI7Ft2zacO3cOH3/8MTIzMzFx4sRa9SOXy2FoaKhycXsrERERERERUd1TKGXCrtpavXo1bG1toa2tjXbt2iElJeWZ2qWlpUFDQwNubm61jlkbTNC9ItTU1LB161ZkZGTAxcUFkydPxpIlS56p7ahRo7B+/XrExcXB1dUV3t7eiIuLg62tLQDAz88PkydPRnB
wMNzc3JCeno6wsLAa+9TV1cXhw4fRrFkz9OvXD05OTggMDMSDBw/+1oq6kJAQTJ06FVOnToWrqysSEhKwe/duODg41LovIiIiIiIiIqIK27Ztw6RJkzBr1iycPHkSnTt3Rs+ePVFQUFBju9u3b2P48OHo1q3bcx8jt7i+Qrp3746srCyVMqVSKf3ZxsZG5fPjhgwZgiFDhlTb9+LFi7F48WKVskmTJtU4HgsLC2zcuLHa+3FxcZXKqhujmpoa5syZgzlz5lTZV01zIyIiIiIiIqIXT/k3VrKJsGzZMowcORKjRo0CAKxYsQL79u3DmjVrEBERUW27MWPGYMiQIVBXV8euXbue6xi5go6IiIiIiIiIiOqlkpISZGRk4J133lEpf+edd5Cenl5tu9jYWOTm5uLTTz993kMEwBV0RERERERERET0NygV4lbQFRcXo7i4WKVMLpdDLperlN24cQPl5eUwNzdXKTc3N8e1a9eq7DsnJwcff/wxUlJSoKHxYlJnXEFHRERERERERESvlIiICBgZGalcNW1XlclUk4lKpbJSGQCUl5djyJAhmDt3LhwdHet83NXhCjoiIiIiIiIiInqlhIaGYsqUKSplT66eA4CGDRtCXV290mq569evV1pVBwB3797FiRMncPLkSQQHBwMAFAoFlEolNDQ0sH//fnTt2rUOZ/IIE3RERERERERERFRrIs9yrGo7a1W0tLTQrl07HDhwAO+//75UfuDAAfj5+VWqb2hoiDNnzqiUrV69Gj///DO2b98OW1vbfz74KsiUPBqT6rlmDVyFxe5kYC8sdurdC8JiixSr6Sws9kEddWGxRbqkfCAsdpdyPWGxO+veFBY75X4DYbFf13lf1FAIiz3vxAJhsQ+3ChUW+3XVxOiu6CEI8c6Ny8JiW+mYCYt9+cEfwmKLJPL3tTkar+czf115aVoIix2R/42w2C9StkMvYbGdcvY+c91t27Zh2LBhWLt2LTp06IAvv/wSMTExOHv2LKytrREaGoorV67g66+/rrJ9eHg4du3ahczMzDoafWVcQUdERERERERERLUm8pCI2vD398eff/6JefPmobCwEC4uLti7dy+sra0BAIWFhSgoKBA6RiboiIiIiIiIiIioXhs3bhzGjRtX5b24uLga24aHhyM8PLzuB/UYnuL6EggICEDfvn1FD4OIiIiIiIiIiARggu4Fys/Ph0wme657lomIiIiIiIiIXgSFUibsqm+YoKMXorS0tFblf7c/IiIiIiIiIqJXTb1N0G3fvh2urq7Q0dGBqakpunfvjqKiIgD/21K6aNEimJubw9jYGHPnzkVZWRmmT5+OBg0aoGnTptiwYYNKn2fOnEHXrl2lPkePHo179+5J9xUKBebNm4emTZtCLpfDzc0NCQkJ0v2Ko3jbtm0LmUwGHx8flf6XLl0KS0tLmJqaYvz48SpJKBsbGyxatAiBgYEwMDBAs2bN8OWXX6q0v3LlCvz9/WFiYgJTU1P4+fkhPz9fup+UlIQ33ngDenp6MDY2hqenJy5dugQAOHXqFLp06QIDAwMYGhqiXbt2OHHiRLXP9/bt2xg9ejQaNWoEQ0NDdO3aFadOnZLuh4eHw83NDRs2bICdnR3kcjmUSiVkMhnWrl0LPz8/6OnpYcGCRyfTrVmzBs2bN4eWlhZatGiBTZs2qcSrrh0RERERERERiaFUyoRd9U29TNAVFhZi8ODBCAwMRHZ2NpKSktCvXz8olUqpzs8//4yrV6/i8OHDWLZsGcLDw/Huu+/CxMQEx44dw9ixYzF27FhcvvzoyPf79++jR48eMDExwfHjx/Hdd9/h4MGDCA4OlvqMjo5GVFQUli5ditOnT8PX1xfvvfcecnJyAAC//PILAODgwYMoLCzEzp07pbaJiYnIzc1FYmIiNm7ciLi4uEovKYyKioKHhwdOnjyJcePG4V//+hf+85//SOPr0qUL9PX1cfjwYaSmpkJfXx89evRASUkJysrK0LdvX3h7e+P06dM4cuQIRo8eDZns0Tf10KFD0bRpUxw/fhwZGRn4+OOPoampWeXzVSqV6N27N65du4a9e/ciIyMD7u7u6NatG27evCnVu3DhAuLj47Fjxw6Vbb2ffvop/Pz8cObMGQQGBuL777/HxIkTMXXqVPz2228YM2YMPvroIyQmJqrEfbIdEREREREREVF9UC9PcS0sLERZWRn69esnHZnr6uqqUqdBgwZYuXIl1NTU0KJFCyxevBj379/HJ598AgAIDQ3FZ599hrS0NAwaNAhbtmzBgwcP8PXXX0NPTw8AsGrVKvTp0weRkZEwNzfH0qVLMXPmTAwaNAgAEBkZicTERKxYsQJffPEFzMzMAACmpqawsLBQGY+JiQlWrVoFdXV1tGzZEr1798ahQ4cQFBQk1enVq5d04sjMmTOxfPlyJCUloWXLlti6dSvU1NSwfv16KekWGxsLY2NjJCUlwcPDA7dv38a7776L5s2bAwCcnJykvgsKCjB9+nS0bNkSAODg4FDt801MTMSZM2dw/fp1yOVyAI9W/+3atQvbt2/H6NGjAQAlJSXYtGmTNO8KQ4YMUUmwDRkyBAEBAdLcpkyZgqNHj2Lp0qXo0qVLte2IiIiIiIiISJzH1kHRP1QvV9C1adMG3bp1g6urKwYMGICYmBjcunVLpU6rVq2gpva/6Zubm6sk8dTV1WFqaorr168DALKzs9GmTRspOQcAnp6eUCgUOHfuHO7cuYOrV6/C09NTJY6npyeys7OfOuZWrVpBXV1d+mxpaSnFrtC6dWvpzzKZDBYWFlKdjIwMXLhwAQYGBtDX14e+vj4aNGiAhw8fIjc3Fw0aNEBAQAB8fX3Rp08fREdHo7CwUOpvypQpGDVqFLp3747PPvsMubm51Y41IyMD9+7dg6mpqRRLX18feXl5Ku2sra0rJecAwMPDQ+Vzdnb2Mz23J9tVpbi4GHfu3FG5lErFU9sREREREREREYlSLxN06urqOHDgAH766Sc4Ozvj888/R4sWLZCXlyfVeXL7pkwmq7JMoXiU3Kl4f1pVHi9/sk5N7R5XU+xnqaNQKNCuXTtkZmaqXOfPn8eQIUMAPFpRd+TIEXTs2BHbtm2Do6Mjjh49CuDRO+POnj2L3r174+eff4azszO+//77KseqUChgaWlZKda5c+cwffp0qd7jyczHVVX+LM+tuv4eFxERASMjI5XrzsM/ntqOiIiIiIiIiEiUepmgAx4lfDw9PTF37lycPHkSWlpa1SacnoWzszMyMzOlgyYAIC0tDWpqanB0dIShoSEaN26M1NRUlXbp6enSVlItLS0AQHl5+d8eR3Xc3d2Rk5ODRo0awd7eXuUyMjKS6rVt2xahoaFIT0+Hi4sLvvnmG+meo6MjJk+ejP3796Nfv36IjY2tNta1a9egoaFRKVbDhg1rPXYnJ6can1tthIaG4vbt2yqXoXblVXxERERERERE9M8olDJhV31TLxN0x44dw6JFi3DixAkUFBRg586d+OOPP/5WwqfC0KFDoa2tjREjRuC3335DYmIiJk
yYgGHDhsHc3BwAMH36dERGRmLbtm04d+4cPv74Y2RmZmLixIkAgEaNGkFHRwcJCQn4/fffcfv27TqZb8X4GjZsCD8/P6SkpCAvLw/JycmYOHEi/vvf/yIvLw+hoaE4cuQILl26hP379+P8+fNwcnLCgwcPEBwcjKSkJFy6dAlpaWk4fvx4tc+re/fu6NChA/r27Yt9+/YhPz8f6enpmD17do0nv1Zn+vTpiIuLw9q1a5GTk4Nly5Zh586dmDZtWq37ksvlMDQ0VLlksnr5bU5ERERERERE9US9PCTC0NAQhw8fxooVK3Dnzh1YW1sjKioKPXv2/Nt96urqYt++fZg4cSLat28PXV1dfPDBB1i2bJlUJyQkBHfu3MHUqVNx/fp1ODs7Y/fu3dKBCxoaGli5ciXmzZuHOXPmoHPnzkhKSvqn05XGd/jwYcycORP9+vXD3bt30aRJE3Tr1g2GhoZ48OAB/vOf/2Djxo34888/YWlpieDgYIwZMwZlZWX4888/MXz4cPz+++9o2LAh+vXrh7lz51YZSyaTYe/evZg1axYCAwPxxx9/wMLCAl5eXlKysjb69u2L6OhoLFmyBCEhIbC1tUVsbCx8fHz+4VMhIiIiIiIioudFWQ9XsokiUyp55gbVb80auD690nPSycBeWOzUuxeExRYpVtNZWOyDOupPr1QPXVI+EBa7S/nT3035vHTWvSksdsr9BsJiv67zvqgh7sCheScWCIt9uFWosNivqyZGd0UPQYh3blwWFttKR9zrUC4/eD3flSzy97U5Gq/nM39deWlaCIsdkf/N0yvVAyeb+QmL3bbg38JiPw/c+0dERERERERERCRQvdziSkREREREREREzxf3ZNYdrqAjIiIiIiIiIiISiCvoiIiIiIiIiIio1hQ8JKLOcAUdERERERERERGRQFxBR/VemG4bccHLxYUeKfB0rIuamsJio7RUWOiwSQbCYot0/6crwmLfyCsWFrtJD3F/x9U54fU8SVXkCbJ2t8X9+y3yJFWvsxHCYpelbRcWW5mXKyx2wWpxp7haL/ERFjts/GlhsUWe0ty9TNwJsiJ/X7soLDJwYJqDsNjlF8X9zlR8TtzPllOZ4k5SfeODO8Jivy6UXEFXZ7iCjoiIiIiIiIiISCAm6IiIiIiIiIiIiATiFlciIiIiIiIiIqo1HhJRd7iCjoiIiIiIiIiISCAm6F5CMpkMu3btEj0MIiIiIiIiIqJqKQVe9Q0TdPTClFZxumZJScnf6uvvtiMiIiIiIiIietm8sgk6hUKByMhI2NvbQy6Xo1mzZli4cKF0/8yZM+jatSt0dHRgamqK0aNH4969e9L9gIAA9O3bF4sWLYK5uTmMjY0xd+5clJWVYfr06WjQoAGaNm2KDRs2SG3y8/Mhk8mwdetWdOzYEdra2mjVqhWSkpKkOuXl5Rg5ciRsbW2ho6ODFi1aIDo6utL4N2zYgFatWkEul8PS0hLBwcEAABsbGwDA+++/D5lMJn0ODw+Hm5sbNm3aBBsbGxgZGWHQoEG4e/d/x2UrlUosXrwYdnZ20NHRQZs2bbB9+3bp/q1btzB06FCYmZlBR0cHDg4OiI2NBfAo4RUcHAxLS0toa2vDxsYGERERNX4NYmNj4eTkBG1tbbRs2RKrV6+u9Kzi4+Ph4+MDbW1tbN68WXruERERaNy4MRwdHWv19XqyHRERERERERHRq+6VPSQiNDQUMTExWL58OTp16oTCwkL85z//AQDcv38fPXr0wFtvvYXjx4/j+vXrGDVqFIKDgxEXFyf18fPPP6Np06Y4fPgw0tLSMHLkSBw5cgReXl44duwYtm3bhrFjx+Ltt9+GlZWV1G769OlYsWIFnJ2dsWzZMrz33nvIy8uDqakpFAoFmjZtivj4eDRs2BDp6ekYPXo0LC0tMXDgQADAmjVrMGXKFHz22Wfo2bMnbt++jbS0NADA8ePH0ahRI8TGxqJHjx5QV1eX4ubm5mLXrl3Ys2cPbt26hYEDB+Kzzz6TEpOzZ8/Gzp07sWbNGjg4OODw4cP48MMPYWZmBm9vb4SFhSErKws//fQTGjZsiAsXLuDBgwcAgJUrV2L37t2Ij49Hs2bNcPnyZVy+fLna5x8TE4NPP/0Uq1atQtu2bXHy5EkEBQVBT08PI0aMkOrNnDkTUVFRiI2NhVwuR3JyMg4dOgRDQ0McOHAASqXymb9eT7YjIiIiIiIiInF4SETdeSUTdHfv3kV0dDRWrVolJYOaN2+OTp06AQC2bNmCBw8e4Ouvv4aenh4AYNWqVejTpw8iIyNhbm4OAGjQoAFWrlwJNTU1tGjRAosXL8b9+/fxySefAHiUBPzss8+QlpaGQYMGSfGDg4PxwQcfAHiUbEtISMBXX32FGTNmQFNTE3PnzpXq2traIj09HfHx8VKCbsGCBZg6dSomTpwo1Wvfvj0AwMzMDABgbGwMCwsLlXkrFArExcXBwMAAADBs2DAcOnQICxcuRFFREZYtW4aff/4ZHTp0AADY2dkhNTUV69atg7e3NwoKCtC2bVt4eHgA+N9qPQAoKCiAg4MDOnXqBJlMBmtr6xq/BvPnz0dUVBT69esnzTMrKwvr1q1TSdBNmjRJqlNBT08P69evh5aWFoBHyb5n+Xo92Y6IiIiIiIiIqD54JRN02dnZKC4uRrdu3aq936ZNGynZAwCenp5QKBQ4d+6clPBp1aoV1NT+t8vX3NwcLi4u0md1dXWYmpri+vXrKv1XJMAAQENDAx4eHsjOzpbK1q5di/Xr1+PSpUt48OABSkpK4ObmBgC4fv06rl69Wu3Ya2JjYyMl5wDA0tJSGltWVhYePnyIt99+W6VNSUkJ2rZtCwD417/+hQ8++AC//vor3nnnHfTt2xcdO3YE8GgL6dtvv40WLVqgR48eePfdd/HOO+9UOY4//vgDly9fxsiRIxEUFCSVl5WVwcjISKVuRTLwca6uripJtmf9ej3ZrirFxcUoLi5WKStVlkNTpl5NCyIiIiIiIiL6O5RcQVdnXskEnY6OTo33lUolZLKqv0keL9fU1Kx0r6oyhULx1DFV9BsfH4/JkycjKioKHTp0gIGBAZYsWYJjx44909hrUtPYKv73xx9/RJMmTVTqyeVyAEDPnj1x6dIl/Pjjjzh48CC6deuG8ePHY+nSpXB3d0deXh5++uknHDx4EAMHDkT37t1V3mFXoSJWTEwM3nzzTZV7j2/JBaCSdKuu7Fm/XlX19aSIiAiVFYwA8K6BK94zbP3UtkREREREREREIrySh0Q4ODhAR0cHhw4dqvK+s7MzMjMzUVRUJJWlpaVBTU2tTg4XOHr0qPTnsrIyZGRkoGXLlgCAlJQUdOzYEePGjUPbtm1hb2+P3Nxcqb6BgQFsbGyqHTvwKBFXXl5eqzE5OztDLpejoKAA9vb2Ktfj788zMzNDQEAANm/ejBUrVuDLL7+U7hkaGsLf3x8xMTHYtm0bduzYgZs3b1aKZW5ujiZNmuDixYuVYtna2tZq3BVjr6uvV2hoKG7fvq1y9TRoVesxEREREREREVHNF
AKv+uaVXEGnra2NmTNnYsaMGdDS0oKnpyf++OMPnD17FiNHjsTQoUPx6aefYsSIEQgPD8cff/yBCRMmYNiwYdJ2yX/iiy++gIODA5ycnLB8+XLcunULgYGBAAB7e3t8/fXX2LdvH2xtbbFp0yYcP35cJXEVHh6OsWPHolGjRujZsyfu3r2LtLQ0TJgwAQCkBJ6npyfkcjlMTEyeOiYDAwNMmzYNkydPhkKhQKdOnXDnzh2kp6dDX18fI0aMwJw5c9CuXTu0atUKxcXF2LNnD5ycnAAAy5cvh6WlJdzc3KCmpobvvvsOFhYWMDY2rjJeeHg4QkJCYGhoiJ49e6K4uBgnTpzArVu3MGXKlFo9z7r8esnlcmnFYAVubyUiIiIiIiKil9krmaADgLCwMGhoaGDOnDm4evUqLC0tMXbsWACArq4u9u3bh4kTJ6J9+/bQ1dXFBx98gGXLltVJ7M8++wyRkZE4efIkmjdvjn//+99o2LAhAGDs2LHIzMyEv78/ZDIZBg8ejHHjxuGnn36S2o8YMQIPHz7E8uXLMW3aNDRs2BD9+/eX7kdFRWHKlCmIiYlBkyZNkJ+f/0zjmj9/Pho1aoSIiAhcvHgRxsbGcHd3lw690NLSQmhoKPLz86Gjo4POnTtj69atAAB9fX1ERkYiJycH6urqaN++Pfbu3avyjr7HjRo1Crq6uliyZAlmzJgBPT09uLq6YtKkSbV+ns/760VERERERERE9DKTKZVKpehBvCry8/Nha2uLkydPSoc+0MsvpumHoocghF1pqbDYF594X+KLJHLeHUKNhcUW6f5P/xEW+0be099N+bw06SHuLRFXEsQt6k+530BY7M66lV+78KJcuW3w9Er1kNfZCGGxy9Iqvwf3RVHm5T690nNSsPqysNjWS3yExd48/rSw2Bc1xP1M7f6gdq+1qUsif18Taeg0cb87lF+8Iix28bm7wmKfyrQQFvsN/6KnV3pODFbuERb7RTpsMUBYbK9r3wmL/Ty8ku+gIyIiIiIiIiIiqi9e2S2uREREREREREQkjoJ7MusME3S1YGNjA+4IJiIiIiIiIiKiusQtrkRERERERERERAJxBR0REREREREREdWaAjLRQ6g3mKAjeo5EngoGvJ4nc4k8kewNgSdzqds1ERZb5EmqQk/WTBB3GprQeb+eP1peWyJPUtXw7C8sdhnEzfvK7b+ExW4m8PTa19XrepKqSCJPn9ft2VJY7BsJ4k6IFvl93kTgyffivtr0qmKCjoiIiIiIiIiIak3JFXR1hu+gIyIiIiIiIiIiEogJunouLi4OxsbGoodBRERERERERPWMQuBV3zBBV8/5+/vj/Pnz0ufw8HC4ubmJGxAREREREREREangO+jqOR0dHejo6IgeBgCgpKQEWlpaKmXl5eWQyWRQU6tdrvjvtiMiIiIiIiIietkwu1EDhUKByMhI2NvbQy6Xo1mzZli4cKF0/8yZM+jatSt0dHRgamqK0aNH4969e9L9gIAA9O3bF0uXLoWlpSVMTU0xfvx4lJaWSnWKi4sxY8YMWFlZQS6Xw8HBAV999RWAR0mokSNHwtbWFjo6OmjRogWio6Oltvv27YO2tjb++usvlXGHhITA29sbgOoW17i4OMydOxenTp2CTCaDTCZDXFwcAgMD8e6776r0UVZWBgsLC2zYsKHa55Oeng4vLy/o6OjAysoKISEhKCoqku7b2NhgwYIFCAgIgJGREYKCgqTx7NmzB87OzpDL5bh06RJu3bqF4cOHw8TEBLq6uujZsydycnKkvqprR0RERERERERiKCETdtU3TNDVIDQ0FJGRkQgLC0NWVha++eYbmJubAwDu37+PHj16wMTEBMePH8d3332HgwcPIjg4WKWPxMRE5ObmIjExERs3bkRcXBzi4uKk+8OHD8fWrVuxcuVKZGdnY+3atdDX1wfwKEHYtGlTxMfHIysrC3PmzMEnn3yC+Ph4AED37t1hbGyMHTt2SP2Vl5cjPj4eQ4cOrTQff39/TJ06Fa1atUJhYSEKCwvh7++PUaNGISEhAYWFhVLdvXv34t69exg4cGCVz+bMmTPw9fVFv379cPr0aWzbtg2pqamV5r9kyRK4uLggIyMDYWFh0rOLiIjA+vXrcfbsWTRq1AgBAQE4ceIEdu/ejSNHjkCpVKJXr14qycyq2hERERERERERveq4xbUad+/eRXR0NFatWoURI0YAAJo3b45OnToBALZs2YIHDx7g66+/hp6eHgBg1apV6NOnDyIjI6VEnomJCVatWgV1dXW0bNkSvXv3xqFDhxAUFITz588jPj4eBw4cQPfu3QEAdnZ20hg0NTUxd+5c6bOtrS3S09MRHx+PgQMHQl1dHf7+/vjmm28wcuRIAMChQ4dw69YtDBgwoNKcdHR0oK+vDw0NDVhYWEjlHTt2RIsWLbBp0ybMmDEDABAbG4sBAwZIycInLVmyBEOGDMGkSZMAAA4ODli5ciW8vb2xZs0aaGtrAwC6du2KadOmSe1SU1NRWlqK1atXo02bNgCAnJwc7N69G2lpaejYsaP0fK2srLBr1y5pLk+2IyIiIiIiIiJx6uNhDaJwBV01srOzUVxcjG7dulV7v02bNlJyDgA8PT2hUChw7tw5qaxVq1ZQV1eXPltaWuL69esAgMzMTKirq0vbUauydu1aeHh4wMzMDPr6+oiJiUFBQYF0f+jQoUhKSsLVq1cBPEps9erVCyYmJrWa76hRoxAbGwsAuH79On788UcEBgZWWz8jIwNxcXHQ19eXLl9fXygUCuTl5Un1PDw8KrXV0tJC69atpc/Z2dnQ0NDAm2++KZWZmpqiRYsWyM7OrrZdVYqLi3Hnzh2Vq1RZ/vQHQEREREREREQkCBN01XjawQpKpRIyWdV7nh8v19TUrHRPoVA8U4z4+HhMnjwZgYGB2L9/PzIzM/HRRx+hpKREqvPGG2+gefPm2Lp1Kx48eIDvv/8eH374YY39VmX48OG4ePEijhw5gs2bN8PGxgadO3eutr5CocCYMWOQmZkpXadOnUJOTg6aN28u1Xs8gVlBR0dH5RkplcoqYzz5jJ9sV5WIiAgYGRmpXD/dPVtjGyIiIiIiIiIikZigq4aDgwN0dHRw6NChKu87OzsjMzNT5VCEtLQ0qKmpwdHR8ZliuLq6QqFQIDk5ucr7KSkp6NixI8aNG4e2bdvC3t4eubm5leoNGTIEW7ZswQ8//AA1NTX07t272phaWlooL6+8oszU1BR9+/ZFbGwsYmNj8dFHH9U4dnd3d5w9exb29vaVridPan0aZ2dnlJWV4dixY1LZn3/+ifPnz8PJyalWfYWGhuL27dsqV0+DVrXqg4iIiIiIiIieTiHwqm+YoKuGtrY2Zs6ciRkzZuDrr79Gbm4ujh49Kp2wOnToUGhra2PEiBH47bffkJiYiAkTJmDYsGHS++eexsbGBiNGjEBgYCB27dqFvLw8JCUlSYdA2Nvb48SJE9i3bx/Onz+PsLAwHD9+vFI/Q4cOxa+//oqFCxeif//+0vvfqouZl5eHzMxM3Lhx
A8XFxdK9UaNGYePGjcjOzpbeu1edmTNn4siRIxg/fjwyMzOl98hNmDDhmeb+OAcHB/j5+SEoKAipqak4deoUPvzwQzRp0gR+fn616ksul8PQ0FDl0pSpP70hEREREREREZEgTNDVICwsDFOnTsWcOXPg5OQEf39/6f1xurq62LdvH27evIn27dujf//+6NatG1atWlWrGGvWrEH//v0xbtw4tGzZEkFBQdKqvLFjx6Jfv37w9/fHm2++iT///BPjxo2r1IeDgwPat2+P06dPV3l66+M++OAD9OjRA126dIGZmRm+/fZb6V737t1haWkJX19fNG7cuMZ+WrdujeTkZOTk5KBz585o27YtwsLCYGlpWav5V4iNjUW7du3w7rvvokOHDlAqldi7d2+lLcJERERERERE9HJQQibsqm9kyupeAEavnfv376Nx48bYsGED+vXrJ3o4dSamae3fyVdXLmqIW3hrV8b8+4s2qN9fwmKr2zURFrtg9WVhsa/cNhAWu4nRXWGxRc77osC/OOmse1NYbJHPXKSOX7YTFlvDs7+w2GVp24XFTh+dISx2h1BjYbG3LC16eqXnhL+vvV7ebyXu9xbdni2FxRb5+1rK/QbCYov83aHl+b3CYr9IP5oPFha79+/fPr3SK0RD9ABIPIVCgWvXriEqKgpGRkZ47733RA+JiIiIiIiIiF5yivq3kE0YJugIBQUFsLW1RdOmTREXFwcNDX5bEBERERERERG9KMzEEGxsbMCdzkREREREREREYjBBR0REREREREREtaaoh4c1iMK3khIREREREREREQnEFXRU74k8uQev6YlFr6/X8+88mvQQOO+E1/Mk1TZu14TFvnjWSlhsfq+9eMq8XGGxyyDuJFWRJ8gC4k5xLb94RVjszrriTlK9WGIsLPbrejq1yBPBRbr/03+ExW7SQ+DXe7fAk5IFfp+LO7P3xeLLsurO6/n/JomIiIiIiIiIiF4STNAREREREREREREJxC2uRERERERERERUa+I2MNc/XEH3isvPz4dMJkNmZqbooRARERERERER0d/ABN0rzsrKCoWFhXBxcXnmNuHh4XBzc3t+gyIiIiIiIiKiek8hkwm76hsm6F5x6urqsLCwgIbGy79bubS09JnK/m5fRERERERERESvopciQadQKBAZGQl7e3vI5XI0a9YMCxculO6fOXMGXbt2hY6ODkxNTTF69Gjcu3dPuh8QEIC+ffti6dKlsLS0hKmpKcaPH6+SxCkuLsaMGTNgZWUFuVwOBwcHfPXVVwCA8vJyjBw5Era2ttDR0UGLFi0QHR0ttd23bx+0tbXx119/qYw7JCQE3t7e0uf09HR4eXlBR0cHVlZWCAkJQVFRUbXzrljJtm7dOlhZWUFXVxcDBgxQiaNQKDBv3jw0bdoUcrkcbm5uSEhIkO4/ucU1KSkJMpkMhw4dgoeHB3R1ddGxY0ecO3cOABAXF4e5c+fi1KlTkMlkkMlkiIuLk8bTrFkzyOVyNG7cGCEhITV+3X744Qe0a9cO2trasLOzw9y5c1FWVibdl8lkWLt2Lfz8/KCnp4cFCxZIc96wYQPs7Owgl8uhVCpRUFAAPz8/6Ovrw9DQEAMHDsTvv/9e6Vk92Y6IiIiIiIiI6FX3UiToQkNDERkZibCwMGRlZeGbb76Bubk5AOD+/fvo0aMHTExMcPz4cXz33Xc4ePAggoODVfpITExEbm4uEhMTsXHjRsTFxUmJJwAYPnw4tm7dipUrVyI7Oxtr166Fvr4+gEdJsKZNmyI+Ph5ZWVmYM2cOPvnkE8THxwMAunfvDmNjY+zYsUPqr7y8HPHx8Rg6dCiAR0lEX19f9OvXD6dPn8a2bduQmppaaZxPunDhAuLj4/HDDz8gISEBmZmZGD9+vHQ/OjoaUVFRWLp0KU6fPg1fX1+89957yMnJqbHfWbNmISoqCidOnICGhgYCAwMBAP7+/pg6dSpatWqFwsJCFBYWwt/fH9u3b8fy5cuxbt065OTkYNeuXXB1da22/3379uHDDz9ESEgIsrKysG7dOsTFxakkVgHg008/hZ+fH86cOSONoWLOO3bskBKLffv2xc2bN5GcnIwDBw4gNzcX/v7+VT6rx9sRERERERERkRhKgVd9I3xf5N27dxEdHY1Vq1ZhxIgRAIDmzZujU6dOAIAtW7bgwYMH+Prrr6GnpwcAWLVqFfr06YPIyEgpkWdiYoJVq1ZBXV0dLVu2RO/evXHo0CEEBQXh/PnziI+Px4EDB9C9e3cAgJ2dnTQGTU1NzJ07V/psa2uL9PR0xMfHY+DAgVBXV4e/vz+++eYbjBw5EgBw6NAh3Lp1CwMGDAAALFmyBEOGDMGkSZMAAA4ODli5ciW8vb2xZs0aaGtrVzn/hw8fYuPGjWjatCkA4PPPP0fv3r0RFRUFCwsLLF26FDNnzsSgQYMAAJGRkUhMTMSKFSvwxRdfVPtcFy5cKK3u+/jjj9G7d288fPgQOjo60NfXh4aGBiwsLKT6BQUFsLCwQPfu3aGpqYlmzZrhjTfeqLH/jz/+WPqa2dnZYf78+ZgxYwY+/fRTqd6QIUOkxFyFkpISbNq0CWZmZgCAAwcO4PTp08jLy4OVlRUAYNOmTWjVqhWOHz+O9u3bV9mOiIiIiIiIiKg+EL6CLjs7G8XFxejWrVu199u0aSMl5wDA09MTCoVC2rYJAK1atYK6urr02dLSEtevXwcAZGZmQl1dXWU76pPWrl0LDw8PmJmZQV9fHzExMSgoKJDuDx06FElJSbh69SqAR4nDXr16wcTEBACQkZGBuLg46OvrS5evry8UCgXy8vKqjdusWTMpOQcAHTp0kOZ2584dXL16FZ6eniptPD09kZ2dXW2fANC6dWuVZwFAeh5VGTBgAB48eAA7OzsEBQXh+++/V9mu+qSMjAzMmzdPZb5BQUEoLCzE/fv3pXoeHh6V2lpbW6sk2bKzs2FlZSUl5wDA2dkZxsbGKvN8sl1ViouLcefOHZWrRFFeYxsiIiIiIiIiqj2FwKu+EZ6g09HRqfG+UqmErJrTOR4v19TUrHRPoVA8U4z4+HhMnjwZgYGB2L9/PzIzM/HRRx+hpKREqvPGG2+gefPm2Lp1Kx48eIDvv/8eH374oXRfoVBgzJgxyMzMlK5Tp04hJycHzZs3rzF+VXN6fG5Pzr+mZ1Lh8edRUbfieVTFysoK586dwxdffAEdHR2MGzcOXl5e1R7GoFAoMHfuXJX5njlzBjk5OSqrBR9PrFZXVt18niyvqq8nRUREwMjISOX68tbFp7YjIiIiIiIiIhJF+BZXBwcH6Ojo4NChQxg1alSl+87Ozti4cSOKioqkBE1aWhrU1NTg6Oj4TDFcXV2hUCiQnJwsbXF9XEpKCjp27Ihx48ZJZbm5uZXqDRkyBFu2bEHTpk2hpqaG3r17S/fc3d1x9uxZ2NvbP9OYKhQUFODq1ato3LgxAODIkSPS3AwNDdG4cWOkpqbCy8tLapOenl7j9tOn0dLSQnl55VVlOjo6eO+99/Dee+9h/PjxaNmyJc6cOQN
3d/dKdd3d3XHu3Llaz7cqzs7OKCgowOXLl6VVdFlZWbh9+zacnJxq1VdoaCimTJmiUpbvPuAfj5GIiIiIiIiIVClqXjtEtSB8BZ22tjZmzpyJGTNm4Ouvv0Zubi6OHj0qnbA6dOhQaGtrY8SIEfjtt9+QmJiICRMmYNiwYdL7557GxsYGI0aMQGBgIHbt2oW8vDwkJSVJh0DY29vjxIkT2LdvH86fP4+wsDAcP368Uj9Dhw7Fr7/+ioULF6J///4qK8VmzpyJI0eOYPz48cjMzEROTg52796NCRMmPHX+I0aMwKlTp5CSkoKQkBAMHDhQej/c9OnTERkZiW3btuHcuXP4+OOPkZmZiYkTJz7T3Kt7Hnl5ecjMzMSNGzdQXFyMuLg4fPXVV/jtt99w8eJFbNq0CTo6OrC2tq6yjzlz5uDrr79GeHg4zp49i+zsbGzbtg2zZ8+u9Xi6d++O1q1bS8/3l19+wfDhw+Ht7V3lFtmayOVyGBoaqlxaaupPb0hEREREREREJIjwBB0AhIWFYerUqZgzZw6cnJzg7+8vvS9NV1cX+/btw82bN9G+fXv0798f3bp1w6pVq2oVY82aNejfvz/GjRuHli1bIigoCEVFRQCAsWPHol+/fvD398ebb76JP//8U2U1XQUHBwe0b98ep0+flk5vrdC6dWskJycjJycHnTt3Rtu2bREWFia9/6069vb26NevH3r16oV33nkHLi4uWL16tXQ/JCQEU6dOxdSpU+Hq6oqEhATs3r0bDg4OtZr/4z744AP06NEDXbp0gZmZGb799lsYGxsjJiYGnp6eaN26NQ4dOoQffvgBpqamVfbh6+uLPXv24MCBA2jfvj3eeustLFu2rNqEXk1kMhl27doFExMTeHl5oXv37rCzs8O2bdv+9hyJiIiIiIiIiF4VMqVSWR9Pp30lhIeHY9euXcjMzBQ9lHrtP469hMVOud9AWOzOujeFxX5dNekh7u881O2aCItdfvGKsNhXEsS9HvbKbQNhsdu4XRMW+/uzVk+v9JwM6veXsNiv6/dah1BjYbFlts/+Dt+6puHZX1jsw61ChcV+w79IWGyR/45tLDEWFnuE1l/CYov82XLxifeHv0jvt7osLLZI8hbivt6LdhsKi939gbgDA7v9/nosONnS+MOnV3pOhl7dLCz28/BSrKAjIiIiIiIiIiJ6XQk/JIKIiIiIiIiIiF493JJZd7iCTqDw8HBubyUiIiIiIiIies0xQUdERERERERERCQQt7gSEREREREREVGtKWSiR1B/8BRXqvc8m3QVFttaw0hY7Etlt4XFFilA1lj0EF47FzXEnbzH0+9ePJEnRIt85gd11IXFFvl9LpLIr7dIXmcjhMXO7RgsLPbIoofCYtOLJ/J3ZGuZjrDYl5QPhMUWaY68WFhskac0R+R/Iyz2i/R1E3GnuA6/Ur9OceUKOiIiIiIiIiIiqjVxf1Vf//AddERERERERERERAJxBR0REREREREREdUa35lWd7iCrp6ysbHBihUrRA+DiIiIiIiIiIieggm611h5eTkUihezY7y0tLRW5X+3PyIiIiIiIiKiJ61evRq2trbQ1tZGu3btkJKSUm3dnTt34u2334aZmRkMDQ3RoUMH7Nu377mOjwm6x2zfvh2urq7Q0dGBqakpunfvjqKiIhw+fBiampq4du2aSv2pU6fCy8sLABAXFwdjY2Ps2bMHLVq0gK6uLvr374+ioiJs3LgRNjY2MDExwYQJE1BeXi71YWNjgwULFmD48OHQ19eHtbU1/v3vf+OPP/6An58f9PX14erqihMnTqjETk9Ph5eXF3R0dGBlZYWQkBAUFRUBAHx8fHDp0iVMnjwZMpkMMpms0hidnZ0hl8uRkpLy1LlV5fbt2xg9ejQaNWoEQ0NDdO3aFadOnZLuh4eHw83NDRs2bICdnR3kcjmUSiVkMhnWrl0LPz8/6OnpYcGCBQCANWvWoHnz5tDS0kKLFi2wadMmlXjVtSMiIiIiIiIiMRQycVdtbNu2DZMmTcKsWbNw8uRJdO7cGT179kRBQUGV9Q8fPoy3334be/fuRUZGBrp06YI+ffrg5MmTdfDUqsYE3f8rLCzE4MGDERgYiOzsbCQlJaFfv35QKpXw8vKCnZ2dStKorKwMmzdvxkcffSSV3b9/HytXrsTWrVuRkJAg9bF3717s3bsXmzZtwpdffont27erxF6+fDk8PT1x8uRJ9O7dG8OGDcPw4cPx4Ycf4tdff4W9vT2GDx8OpfLR7u4zZ87A19cX/fr1w+nTp7Ft2zakpqYiODgYwKNMb9OmTTFv3jwUFhaisLBQZYwRERFYv349zp49Cw8Pj2ea2+OUSiV69+6Na9euSd+s7u7u6NatG27evCnVu3DhAuLj47Fjxw5kZmZK5Z9++in8/Pxw5swZBAYG4vvvv8fEiRMxdepU/PbbbxgzZgw++ugjJCYmqsR9sh0RERERERER0dMsW7YMI0eOxKhRo+Dk5IQVK1bAysoKa9asqbL+ihUrMGPGDLRv3x4ODg5YtGgRHBwc8MMPPzy3MfKQiP9XWFiIsrIy9OvXD9bW1gAAV1dX6f7IkSMRGxuL6dOnAwB+/PFH3L9/HwMHDpTqlJaWSivBAKB///7YtGkTfv/9d+jr68PZ2RldunRBYmIi/P39pXa9evXCmDFjAABz5szBmjVr0L59ewwYMAAAMHPmTHTo0AG///47LCwssGTJEgwZMgSTJk0CADg4OGDlypXw9vbGmjVr0KBBA6irq8PAwAAWFhYq8ywtLcXq1avRpk2bWs3tcYmJiThz5gyuX78OuVwOAFi6dCl27dqF7du3Y/To0QCAkpISbNq0CWZmZirthwwZopJgGzJkCAICAjBu3DgAwJQpU3D06FEsXboUXbp0qbYdEREREREREYnzYl6aVbXi4mIUFxerlMnlcilPUaGkpAQZGRn4+OOPVcrfeecdpKenP1MshUKBu3fvokGDBv9s0DXgCrr/16ZNG3Tr1g2urq4YMGAAYmJicOvWLel+QEAALly4gKNHjwIANmzYgIEDB0JPT0+qo6urKyXnAMDc3Bw2NjbQ19dXKbt+/bpK7NatW6vcB1STgxVlFe0yMjIQFxcHfX196fL19YVCoUBeXl6N89TS0lKJ96xze1xGRgbu3bsHU1NTlTHk5eUhNzdXqmdtbV0pOQcAHh4eKp+zs7Ph6empUubp6Yns7Owa21WluLgYd+7cUbkUSpE/MoiIiIiIiIiorkVERMDIyEjlioiIqFTvxo0bKC8vl3IrFczNzSu97qs6UVFRKCoqqnYhU13gCrr/p66ujgMHDiA9PR379+/H559/jlmzZuHYsWOwtbVFo0aN0KdPH8TGxsLOzg579+5FUlKSSh+ampoqn2UyWZVlTx7M8HidivfFVVVW0U6hUGDMmDEICQmpNI9mzZrVOE8dHR2pvwrPMrfHKRQKWFpaVlnH2NhY+nN1Cb6qyp8cU8X76p7W7kkRERGYO3euSllTfRs0M7R9alsiIiIiIiIiejWEhoZiypQpKm
VPrp573LPkHary7bffIjw8HP/+97/RqFGjvzfYZ8AE3WNkMhk8PT3h6emJOXPmwNraGt9//730BR81ahQGDRqEpk2bonnz5pVWfb0o7u7uOHv2LOzt7auto6WlpXIYxdPUZm7u7u64du0aNDQ0YGNjU5uhV8nJyQmpqakYPny4VJaeng4nJ6da91XVv6C+Ld/7x2MkIiIiIiIiIlUi96tVtZ21Kg0bNoS6unql1XLXr1+vtKruSdu2bcPIkSPx3XffoXv37v9ovE/DLa7/79ixY1i0aBFOnDiBgoIC7Ny5E3/88YdKksjX1xdGRkZYsGBBtQcovAgzZ87EkSNHMH78eGRmZiInJwe7d+/GhAkTpDo2NjY4fPgwrly5ghs3bjy1z9rMrXv37ujQoQP69u2Lffv2IT8/H+np6Zg9e3al02afxfTp0xEXF4e1a9ciJycHy5Ytw86dOzFt2rRa9yWXy2FoaKhyqcn4bU5ERERERET0OtLS0kK7du1w4MABlfIDBw6gY8eO1bb79ttvERAQgG+++Qa9e/d+3sNkgq6CoaEhDh8+jF69esHR0RGzZ89GVFQUevbsKdVRU1NDQEAAysvLVVZ7vWitW7dGcnIycnJy0LlzZ7Rt2xZhYWGwtLSU6sybNw/5+flo3rx5le+Be1Jt5iaTybB37154eXkhMDAQjo6OGDRoEPLz85+afa5K3759ER0djSVLlqBVq1ZYt24dYmNj4ePjU+u+iIiIiIiIiOjFUMrEXbUxZcoUrF+/Hhs2bEB2djYmT56MgoICjB07FsCj3XiP50K+/fZbDB8+HFFRUXjrrbdw7do1XLt2Dbdv367Lx6dCplQqlc+t93ooKCgIv//+O3bv3i16KHWuvs7Ns0lXYbGtNYyExb5U9vx+cLzMAmSNRQ/htXNRQ9zC9hFafwmLfeW2gbDYF594v+mL1Fn3prDYIp/5QR11YbFFfp+LJPLrLZLX2covt35RcjsGC4s9suihsNj04on8HdlapiMs9iXlA2GxRZojL356pedkY4mxsNgR+d8Ii/0irbX6UFjssZc316r+6tWrsXjxYhQWFsLFxQXLly+Hl5cXgEeHZ+bn50vv2vfx8UFycnKlPkaMGIG4uLh/OvQq8R10z+j27ds4fvw4tmzZgn//+9+ih1On6vPciIiIiIiIiOj5EPkOutoaN24cxo0bV+W9J5NuNR2c+bwwQfeM/Pz88Msvv2DMmDF4++23RQ+nTtXnuRERERERERERveyYoHtGIrKnL0p9nhsRERERERER0cuOCToiIiIiIiIiIqq1V2mL68uOp7gSEREREREREREJxBV0RERERERERERUa0rRA6hHmKCjeu91Pcb9df232+5hqbDYbdyuCYt9I09PWGy72wbCYl95IC52h1BjYbEvLi0SFvuKwK/3RU1NYbFf1w0c1kt8hMVulpcrLHb5xSvCYud2DBYWu3n6KmGxrdtNFRabXrwu5eJ+b+mse1NYbJFS7jcQFntjibj/T/TJe3eExSaqLW5xJSIiIiIiIiIiEug1XWNDRERERERERET/hEImegT1R52soFMqlRg9ejQaNGgAmUyGzMzMuui2TgUEBKBv3761ahMXFwdjY2Ppc3h4ONzc3Op0XM+Dj48PJk2aJHoYRERERERERET0DOpkBV1CQgLi4uKQlJQEOzs7NGzYsC66rZKPjw/c3NywYsWK5xajOtOmTcOECRNeeNza2rlzJzSFvqeHiIiIiIiIiOq71/Mtvc9HnSTocnNzYWlpiY4dO1Zbp6SkBFpaWnURThh9fX3o6+uLHsZTNWgg7gWgNanqe0CpVKK8vBwaGrX7Vvy77YiIiIiIiIiIXjb/eItrQEAAJkyYgIKCAshkMtjY2AB4tNItODgYU6ZMQcOGDfH2228DAJYtWwZXV1fo6enBysoK48aNw71791T6TEtLg7e3N3R1dWFiYgJfX1/cunULAQEBSE5ORnR0NGQyGWQyGfLz81FeXo6RI0fC1tYWOjo6aNGiBaKjo2s9l7i4ODRr1gy6urp4//338eeff6rcf3KLa8W22UWLFsHc3BzGxsaYO3cuysrKMH36dDRo0ABNmzbFhg0bVPq5cuUK/P39YWJiAlNTU/j5+SE/P79Sv0uXLoWlpSVMTU0xfvx4lJb+73TK1atXw8HBAdra2jA3N0f//v2le09ucb116xaGDx8OExMT6OrqomfPnsjJyVGZt7GxMfbt2wcnJyfo6+ujR48eKCwsrPF5ZWVloVevXtDX14e5uTmGDRuGGzduqIzjye+BpKQkyGQy7Nu3Dx4eHpDL5UhJSUFxcTFCQkLQqFEjaGtro1OnTjh+/LjUV3XtiIiIiIiIiEgMhcCrvvnHCbro6GjMmzcPTZs2RWFhoUpSZePGjdDQ0EBaWhrWrVv3KKCaGlauXInffvsNGzduxM8//4wZM2ZIbTIzM9GtWze0atUKR44cQWpqKvr06YPy8nJER0ejQ4cOCAoKQmFhIQoLC2FlZQWFQoGmTZsiPj4eWVlZmDNnDj755BPEx8c/8zyOHTuGwMBAjBs3DpmZmejSpQsWLFjw1HY///wzrl69isOHD2PZsmUIDw/Hu+++CxMTExw7dgxjx47F2LFjcfnyZQDA/fv30aVLF+jr6+Pw4cNITU2VEmIlJSVSv4mJicjNzUViYiI2btyIuLg4xMXFAQBOnDiBkJAQzJs3D+fOnUNCQgK8vLyqHWNAQABOnDiB3bt348iRI1AqlejVq5dKwu/+/ftYunQpNm3ahMOHD6OgoADTpk2rts/CwkJ4e3vDzc0NJ06cQEJCAn7//XcMHDhQpV5V3wMAMGPGDERERCA7OxutW7fGjBkzsGPHDmzcuBG//vor7O3t4evri5s3VY9Bf7IdEREREREREdGr7h/vDzQyMoKBgQHU1dVhYWGhcs/e3h6LFy9WKXt8ZZetrS3mz5+Pf/3rX1i9ejUAYPHixfDw8JA+A0CrVq2kP2tpaUFXV1cllrq6OubOnavSb3p6OuLj4ysljKoTHR0NX19ffPzxxwAAR0dHpKenIyEhocZ2DRo0wMqVK6GmpoYWLVpg8eLFuH//Pj755BMAQGhoKD777DOkpaVh0KBB2Lp1K9TU1LB+/XrIZI+OO4mNjYWxsTGSkpLwzjvvAABMTEywatUqqKuro2XLlujduzcOHTqEoKAgFBQUQE9PD++++y4MDAxgbW2Ntm3bVjm+nJwc7N69G2lpadIW5C1btsDKygq7du3CgAEDAAClpaVYu3YtmjdvDgAIDg7GvHnzqp33mjVr4O7ujkWLFkllGzZsgJWVFc6fPw9HR0cAlb8Hrl27BgCYN2+etKqyqKgIa9asQVxcHHr27AkAiImJwYEDB/DVV19h+vTpUvvH2xERERERERER1Qd1coprdTw8PCqVJSYm4u2330aTJk1gYGCA4cOH488//0RRURGA/62gq621a9fCw8MDZmZm0NfXR0xMDAoKCp65fXZ2Njp06KBS9uTnqrRq1Qpqav97jObm5nB1dZU+q6urw9TUFNevX
wcAZGRk4MKFCzAwMJDeadegQQM8fPgQubm5Kv2qq6tLny0tLaU+3n77bVhbW8POzg7Dhg3Dli1bcP/+/WrnpaGhgTfffFMqMzU1RYsWLZCdnS2V6erqSsm5J+NVJSMjA4mJidIc9PX10bJlSwBQmUdV3wNPlufm5qK0tBSenp5SmaamJt544w2VMdbUX4Xi4mLcuXNH5SpXltfYhoiIiIiIiIhqTynwqm+ea4JOT09P5fOlS5fQq1cvuLi4YMeOHcjIyMAXX3wBANJ2Sx0dnVrHiY+Px+TJkxEYGIj9+/cjMzMTH330kcqW0adRKv/el/fJ01JlMlmVZQrFox3SCoUC7dq1Q2Zmpsp1/vx5DBkypMZ+K/owMDDAr7/+im+//RaWlpaYM2cO2rRpg7/++uuZ56VUKqUVfNXFq+mZKBQK9OnTp9I8cnJyVLbbPvk9UFV5RZzHx1PVGGvqr0JERASMjIxUrqzb52tsQ0REREREREQk0nNN0D3pxIkTKCsrQ1RUFN566y04Ojri6tWrKnVat26NQ4cOVduHlpYWystVV0SlpKSgY8eOGDduHNq2bQt7e3uVVVzPwtnZGUePHlUpe/JzXXB3d0dOTg4aNWoEe3t7lcvIyOiZ+9HQ0ED37t2xePFinD59Gvn5+fj5558r1XN2dkZZWRmOHTsmlf355584f/48nJyc/tE8zp49Cxsbm0rzeFoS7Un29vbQ0tJCamqqVFZaWooTJ07UeoyhoaG4ffu2yuVs5FirPoiIiIiIiIjo6RQycVd980ITdM2bN0dZWRk+//xzXLx4EZs2bcLatWtV6oSGhuL48eMYN24cTp8+jf/85z9Ys2aNdDqojY0Njh07hvz8fNy4cQMKhQL29vY4ceIE9u3bh/PnzyMsLEzlsIpnERISgoSEBCxevBjnz5/HqlWrnvr+ub9j6NChaNiwIfz8/JCSkoK8vDwkJydj4sSJ+O9///tMfezZswcrV65EZmYmLl26hK+//hoKhQItWrSoVNfBwQF+fn4ICgpCamoqTp06hQ8//BBNmjSBn5/f357H+PHjcfPmTQwePBi//PILLl68iP379yMwMLBSAvVp9PT08K9//QvTp09HQkICsrKyEBQUhPv372PkyJG16ksul8PQ0FDlUpepP70hEREREREREZEgLzRB5+bmhmXLliEyMhIuLi7YsmULIiIiVOo4Ojpi//79OHXqFN544w106NAB//73v6Gh8eg8i2nTpkFdXR3Ozs4wMzNDQUEBxo4di379+sHf3x9vvvkm/vzzT4wbN65WY3vrrbewfv16fP7553Bzc8P+/fsxe/bsOpt7BV1dXRw+fBjNmjVDv3794OTkhMDAQDx48ACGhobP1IexsTF27tyJrl27wsnJCWvXrsW3336rcpjG42JjY9GuXTu8++676NChA5RKJfbu3VtpW2ttNG7cGGlpaSgvL4evry9cXFwwceJEGBkZqbyT71l99tln+OCDDzBs2DC4u7vjwoUL2LdvH0xMTP72GImIiIiIiIiIXgUy5d99+RrRK2KI9fvCYlvLav9OxbpySflAWGyRRj7UEha7jds1YbFv5NVua3ldunLbQFhskTqEGguLvWVpkbDYdv//zlgRLv6Dv1j6x7E1FMJij9D6S1hs6yU+wmIr82r3upK6VH7xirDYVxLEfa81T18lLPaIdlOFxaYXr0u5uN9bOuveFBZbpJT7DYTFFvnf0E/euyMstsHKPcJiv0ifWX8oLPbHlzYLi/08vNAVdERERERERERERKRKQ/QAiIiIiIiIiIjo1cMtmXWHK+iIiIiIiIiIiIgE4go6IiIiIiIiIiKqNQXX0NUZrqAjIiIiIiIiIiISiKe4Ur1XeuOisNi5HYOFxRZ5EptIcW5zhMV+v9VlYbF1e7YUFltm21xY7M3jTwuL/a/ricJi31n8rrDYby/JERb7wHQHYbFFfp+3+GiTsNhhum2ExRZJ5CmPI4seCottrWEkLPbGjChhsfn72ov3uv6+Jm9hICy2RjcvYbFF/r4m8gTZiPxvhMV+kRZaDxUWe9alLcJiPw/c4kpERERERERERLUmLgVa/3CLKxERERERERERkUBM0NFz5ePjg0mTJkmfbWxssGLFCmHjISIiIiIiIqK6oRR41Tfc4voakclk+P7779G3b19hYzh+/Dj09PSExSciIiIiIiIietlwBV09UVJS8tz6Li0trbO+zMzMoKurW2f9ERERERERERG96pigewF++OEHGBsbQ6F49PrEzMxMyGQyTJ8+XaozZswYDB48WPq8Y8cOtGrVCnK5HDY2NoiKUj3ZysbGBgsWLEBAQACMjIwQFBSEkpISBAcHw9LSEtra2rCxsUFERIRUHwDef/99yGQy6fOT8vPzIZPJEB8fDx8fH2hra2Pz5s34888/MXjwYDRt2hS6urpwdXXFt99+q9K2qKgIw4cPh76+PiwtLSuNuWIcFVtcK2JlZmZK9//66y/IZDIkJSUBAG7duoWhQ4fCzMwMOjo6cHBwQGxs7FOfORERERERERE9XwqBV33DBN0L4OXlhbt37+LkyZMAgOTkZDRs2BDJyclSnaSkJHh7ewMAMjIyMHDgQAwaNAhnzpxBeHg4wsLCEBcXp9LvkiVL4OLigoyMDISFhWHlypXYvXs34uPjce7cOWzevFlKxB0/fhwAEBsbi8LCQulzdWbOnImQkBBkZ2fD19cXDx8+RLt27bBnzx789ttvGD16NIYNG4Zjx45JbaZPn47ExER8//332L9/P5KSkpCRkfGPnl1YWBiysrLw008/ITs7G2vWrEHDhg3/UZ9ERERERERERC8TvoPuBTAyMoKbmxuSkpLQrl07JCUlYfLkyZg7dy7u3r2LoqIinD9/Hj4+PgCAZcuWoVu3bggLCwMAODo6IisrC0uWLEFAQIDUb9euXTFt2jTpc0FBARwcHNCpUyfIZDJYW1tL98zMzAAAxsbGsLCweOqYJ02ahH79+qmUPR5rwoQJSEhIwHfffYc333wT9+7dw1dffYWvv/4ab7/9NgBg48aNaNq0ae0e1hMKCgrQtm1beHh4AEC1K/+IiIiIiIiI6MVSyESPoP7gCroXxMfHB0lJSVAqlUhJSYGfnx9cXFyQmpqKxMREmJubo2XLlgCA7OxseHp6qrT39PRETk4OysvLpbKKpFWFgIAAZGZmokWLFggJCcH+/fv/9nif7Lu8vBwLFy5E69atYWpqCn19fezfvx8FBQUAgNzcXJSUlKBDhw5SmwYNGqBFixZ/ewwA8K9//Qtbt26Fm5sbZsyYgfT09BrrFxcX486dOypXcXHxPxoDEREREREREdHzxATdC+Lj44OUlBScOnUKampqcHZ2hre3N5KTk1W2twKAUqmETKaahlYqKx8i/ORpqO7u7sjLy8P8+fPx4MEDDBw4EP379/9b432y76ioKCxfvhwzZszAzz//jMzMTPj6+kqHU1Q1vqdRU1Or1PbJAyl69uyJS5cuYdKkSbh69Sq6deumspLvSRERETAyMlK5IqPX1npsRERE
RERERFQzBZTCrvqGCboXpOI9dCtWrIC3tzdkMhm8vb2RlJRUKUHn7OyM1NRUlfbp6elwdHSEurp6jXEMDQ3h7++PmJgYbNu2DTt27MDNmzcBAJqamior8GqjYtXfhx9+iDZt2sDOzg45OTnSfXt7e2hqauLo0aNS2a1bt3D+/Plq+6zYdltYWCiVPX5gxOP1AgICsHnzZqxYsQJffvlltX2Ghobi9u3bKtfMiWNrM1UiIiIiIiIioheK76B7QSreQ7d582ZER0cDeJS0GzBgAEpLS6X3zwHA1KlT0b59e8yfPx/+/v44cuQIVq1ahdWrV9cYY/ny5bC0tISbmxvU1NTw3XffwcLCAsbGxgAevb/t0KFD8PT0hFwuh4mJyTOP397eHjt27EB6ejpMTEywbNkyXLt2DU5OTgAAfX19jBw5EtOnT4epqSnMzc0xa9YsaZVcVXR0dPDWW2/hs88+g42NDW7cuIHZs2er1JkzZw7atWuHVq1aobi4GHv27JFiVkUul0Mul6uUlZbceOZ5EhERERERERG9aFxB9wJ16dIF5eXlUjLOxMQEzs7OMDMzU0k6ubu7Iz4+Hlu3boWLiwvmzJmDefPmqRwQURV9fX1ERkbCw8MD7du3R35+Pvbu3SslyaKionDgwAFYWVmhbdu2tRp7WFgY3N3d4evrCx8fH1hYWKBv374qdZYsWQIvLy+899576N69Ozp16oR27drV2O+GDRtQWloKDw8PTJw4EQsWLFC5r6WlhdDQULRu3RpeXl5QV1fH1q1bazV2IiIiIiIiIqp7SoFXfcMVdC/Q0qVLsXTpUpWyqrZ0AsAHH3yADz74oNq+8vPzK5UFBQUhKCio2jZ9+vRBnz59ahyjjY1Nle+Ta9CgAXbt2lVjW319fWzatAmbNm2SyqZPn17juJ2cnHDkyBGVssfjz549u9KqOiIiIiIiIiKi+oQJOiIiIiIiIiIiqjWF6AHUI9ziSkREREREREREJBATdERERERERERERAJxiysREREREREREdWaol4e1yAGV9AREREREREREREJxBV0RERERERERERUa1w/V3eYoKN673CrUGGxPfeNExZb5LxFStQuERf8rJWw0HaZfwmLDWQIi3xRR11Y7DfNWgiLvWVpkbDYX+lpC4stct7AaWGRrXTMhMW+qPF6ns12scRYYPRrAmOLk9sxWFjs5umrhMUWOe8rtw2ExYamprDQN/L0hMVGnsCfqQlJ4mKjgbDIdmXcNEivDn63EhERERERERERCcQVdEREREREREREVGuv51r754Mr6IiIiIiIiIiIiARigq6ekclk2LVrV4118vPzIZPJkJmZWaexn1e/RERERERERPTyUUAp7KpvmKB7hZSUCHz5/Qv2Os2ViIiIiIiIiF5vTNDVkR9++AHGxsZQKB7twM7MzIRMJsP06dOlOmPGjMHgwYOlzzt27ECrVq0gl8thY2ODqKgolT5tbGywYMECBAQEwMjICEFBQSgpKUFwcDAsLS2hra0NGxsbRERESPUB4P3334dMJpM+P8nW1hYA0LZtW8hkMvj4+Ej3YmNj4eTkBG1tbbRs2RKrV6+W7gUGBqJ169YoLi4GAJSWlqJdu3YYOnRojf36+Phg0qRJKmPo27cvAgICapwrAKSnp8PLyws6OjqwsrJCSEgIiopEnuJHRERERERERACgFHjVN0zQ1REvLy/cvXsXJ0+eBAAkJyejYcOGSE5OluokJSXB29sbAJCRkYGBAwdi0KBBOHPmDMLDwxEWFoa4uDiVfpcsWQIXFxdkZGQgLCwMK1euxO7duxEfH49z585h8+bNUiLu+PHjAB4l2QoLC6XPT/rll18AAAcPHkRhYSF27twJAIiJicGsWbOwcOFCZGdnY9GiRQgLC8PGjRsBACtXrkRRURE+/vhjAEBYWBhu3LghJfGq6/dZPTnXM2fOwNfXF/369cPp06exbds2pKamIjhY3HH0RERERERERER1jae41hEjIyO4ubkhKSkJ7dq1Q1JSEiZPnoy5c+fi7t27KCoqwvnz56VVZcuWLUO3bt0QFhYGAHB0dERWVhaWLFmisrKsa9eumDZtmvS5oKAADg4O6NSpE2QyGaytraV7ZmZmAABjY2NYWFhUO9aKeqampir15s+fj6ioKPTr1w/AoxVxWVlZWLduHUaMGAF9fX1s3rwZ3t7eMDAwQFRUFA4dOgQjI6Ma+31WT851+PDhGDJkiLT6zsHBAStXroS3tzfWrFkDbW3tWscgIiIiIiIiInrZcAVdHfLx8UFSUhKUSiVSUlLg5+cHFxcXpKamIjExEebm5mjZsiUAIDs7G56enirtPT09kZOTg/LycqnMw8NDpU5AQAAyMzPRokULhISEYP/+/XUy9j/++AOXL1/GyJEjoa+vL10LFixAbm6uVK9Dhw6YNm0a5s+fj6lTp8LLy6tO4gOV55qRkYG4uDiV8fj6+kKhUCAvL6/KPoqLi3Hnzh2Vq0RZXmVdIiIiIiIiIvr7FAKv+oYr6OqQj48PvvrqK5w6dQpqampwdnaGt7c3kpOTcevWLWl7KwAolUrIZDKV9kpl5V3Uenp6Kp/d3d2Rl5eHn376CQcPHsTAgQPRvXt3bN++/R+NveLdeTExMXjzzTdV7qmrq6vUS0tLg7q6OnJycp6pbzU1tUpzKy0trVTvybkqFAqMGTMGISEhleo2a9asylgRERGYO3euStkwXWeM0Hd5prESEREREREREb1oXEFXhyreQ7dixQp4e3tDJpPB29sbSUlJKu+fAwBnZ2ekpqaqtE9PT4ejo6NKQqwqhoaG8Pf3R0xMDLZt24YdO3bg5s2bAABNTU2VFXhV0dLSAgCVeubm5mjSpAkuXrwIe3t7lavi8Afg0XvisrOzkZycjH379iE2NrbGfoFHW18LCwulz+Xl5fjtt99qHCPwKBl59uzZSuOxt7eXYj0pNDQUt2/fVrkG6zk9NRYRERERERER1Y5S4D/1DVfQ1aGK99Bt3rwZ0dHRAB4l7QYMGIDS0lKV01KnTp2K9u3bY/78+fD398eRI0ewatUqlVNTq7J8+XJYWlrCzc0Nampq+O6772BhYQFjY2MAj05DPXToEDw9PSGXy2FiYlKpj0aNGkFHRwcJCQlo2rQptLW1YWRkhPDwcISEhMDQ0BA9e/ZEcXExTpw4gVu3bmHKlCnIzMzEnDlzsH37dnh6eiI6OhoTJ06Et7c37Ozsqu23a9eumDJlCn788Uc0b94cy5cvx19//fXU5zlz5ky89dZbGD9+PIKCgqCnp4fs7GwcOHAAn3/+eZVt5HI55HK5SpmWrOaEJxERERERERGRSFxBV8e6dOmC8vJyKRlnYmICZ2dnmJmZwcnpfyu53N3dER8fj61bt8LFxQVz5szBvHnzVA6IqIq+vj4iIyPh4eGB9u3bIz8/H3v37oWa2qMvZVRUFA4cOAArKyu0bdu2yj40NDSwcuVKrFu3Do0bN4afnx8AYNSoUVi/fj3i4uLg6uoKb29vxMXFwdbWFg8fPsTQoUMREBCAPn3
6AABGjhyJ7t27Y9iwYSgvL6+238DAQIwYMQLDhw+Ht7c3bG1t0aVLl6c+y9atWyM5ORk5OTno3Lkz2rZti7CwMFhaWj61LRERERERERHRq0KmrOrFZ0T1yCFzf2GxPfcNExY7zXeTsNgifaVdIix2l3K9p1d6TuyqeK/j6+CgjrgVsodLrwmLHSBrLCx2Z92bwmKn3G8gLLZIccqrwmJ7adb+VHb6Z0T+bLHWMBIWe468WFjs5umrhMXO7RgsLPaV2wbCYl/U1BQWW+R/x15Xr+t/v4P+u1n0EF6IYBtx/397Vf42YbGfB66gIyIiIiIiIiIiEojvoCMiIiIiIiIiolpT1MPDGkThCjoiIiIiIiIiIiKBuIKOiIiIiIiIiIhqjevn6g5X0BEREREREREREQnEFXRU73mdjRAWW+TJXF5nxZ1IJtJFtznCYg/q95ew2Op2TYTFVvPpIyz2xXe3Cou95cEfwmIPneMgLPbbS8SdKHpguriTkmW2zYXFnv/RKWGxu5eZCYv9up7yePj1PJT7tT1JVeS8xf1UE/v7WkPbImGx5S3EnZyr0c1LWOwrozOExT6ooy4sNlFtMUFHRERERERERES1xkMi6g63uBIREREREREREQnEBB1JZDIZdu3aJXoYRERERERERPQKUAi86hsm6F4SL0NyrLCwED179nyuMfLz8yGTyZCZmflc4xARERERERERvSqYoHsBSkpKRA+hRhXjs7CwgFwuFzyaZ1da+pq+RZmIiIiIiIiI6pXXPkH3ww8/wNjYGArFowWSmZmZkMlkmD59ulRnzJgxGDx4sPR5x44daNWqFeRyOWxsbBAVFaXSp42NDRYsWICAgAAYGRkhKCgIJSUlCA4OhqWlJbS1tWFjY4OIiAipPgC8//77kMlk0ucnVaw+27p1Kzp27AhtbW20atUKSUlJKvWysrLQq1cv6Ovrw9zcHMOGDcONGzek+z4+PggODsaUKVPQsGFDvP322wBUV/FVxIqPj0fnzp2ho6OD9u3b4/z58zh+/Dg8PDygr6+PHj164I8/VE8yjI2NhZOTE7S1tdGyZUusXr1aumdrawsAaNu2LWQyGXx8fJ6p3ePj8fHxgba2NjZv3lzlcyIiIiIiIiKi508p8J/65rVP0Hl5eeHu3bs4efIkACA5ORkNGzZEcnKyVCcpKQne3t4AgIyMDAwcOBCDBg3CmTNnEB4ejrCwMMTFxan0u2TJEri4uCAjIwNhYWFYuXIldu/ejfj4eJw7dw6bN2+WEnHHjx8H8ChBVVhYKH2uzvTp0zF16lScPHkSHTt2xHvvvYc///wTwKNtqt7e3nBzc8OJEyeQkJCA33//HQMHDlTpY+PGjdDQ0EBaWhrWrVtXbaxPP/0Us2fPxq+//goNDQ0MHjwYM2bMQHR0NFJSUpCbm4s5c/53THpMTAxmzZqFhQsXIjs7G4sWLUJYWBg2btwIAPjll18AAAcPHkRhYSF27tz5TO0qzJw5EyEhIcjOzoavr2+Nz4mIiIiIiIiI6FWgIXoAohkZGcHNzQ1JSUlo164dkpKSMHnyZMydOxd3795FUVERzp8/L630WrZsGbp164awsDAAgKOjI7KysrBkyRIEBARI/Xbt2hXTpk2TPhcUFMDBwQGdOnWCTCaDtbW1dM/MzAwAYGxsDAsLi6eOOTg4GB988AEAYM2aNUhISMBXX32FGTNmYM2aNXB3d8eiRYuk+hs2bICVlRXOnz8PR0dHAIC9vT0WL1781FjTpk2TEmETJ07E4MGDcejQIXh6egIARo4cqZKcnD9/PqKiotCvXz8Aj1bMZWVlYd26dRgxYoQ0V1NTU5W5Pq1dhUmTJkl1iIiIiIiIiEic+nhYgyiv/Qo64NGWz6SkJCiVSqSkpMDPzw8uLi5ITU1FYmIizM3N0bJlSwBAdna2lJyq4OnpiZycHJSXl0tlHh4eKnUCAgKQmZmJFi1aICQkBPv37//b4+3QoYP0Zw0NDXh4eCA7OxvAoxV+iYmJ0NfXl66Ksefm5lY7vuq0bt1a+rO5uTkAwNXVVaXs+vXrAIA//vgDly9fxsiRI1XiL1iwQCX2k2rT7mnjLi4uxp07d1Su4uLiZ5orEREREREREZEIr/0KOuBRgu6rr77CqVOnoKamBmdnZ3h7eyM5ORm3bt2StrcCgFKphEwmU2mvVFbe+6ynp6fy2d3dHXl5efjpp59w8OBBDBw4EN27d8f27dvrZA4VY1IoFOjTpw8iIyMr1bG0tKx2fNXR1NSsFOPJsor391X8b0xMDN58802VftTV1auNUZt2Txt3REQE5s6dq1I2e3oI5syYWGM7IiIiIiIiIqqd+vguOFGYoMP/3kO3YsUKeHt7QyaTwdvbGxEREbh16xYmTvxfcsfZ2Rmpqakq7dPT0+Ho6FhjEgoADA0N4e/vD39/f/Tv3x89evTAzZs30aBBA2hqaqqswKvJ0aNH4eXlBQAoKytDRkYGgoODATxKBO7YsQM2NjbQ0HixX15zc3M0adIEFy9exNChQ6uso6WlBQAqc32Wds8qNDQUU6ZMUSlTu3vlH/VJRERERERERPQ8MUGH/72HbvPmzYiOjgbwKGk3YMAAlJaWqpw0OnXqVLRv3x7z58+Hv78/jhw5glWrVqmcOFqV5cuXw9LSEm5ublBTU8N3330HCwsLGBsbA3h0kmvFu93kcjlMTEyq7euLL76Ag4MDnJycsHz5cty6dQuBgYEAgPHjxyMmJgaDBw/G9OnT0bBhQ1y4cAFbt25FTEzMU5OI/1R4eDhCQkJgaGiInj17ori4GCdOnMCtW7cwZcoUNGrUCDo6OkhISEDTpk2hra0NIyOjp7Z7VnK5HHK5XKWstORGNbWJiIiIiIiIiMTjO+j+X5cuXVBeXi4l40xMTODs7AwzMzM4OTlJ9dzd3REfH4+tW7fCxcUFc+bMwbx581QOiKiKvr4+IiMj4eHhgfbt2yM/Px979+6FmtqjL0FUVBQOHDgAKysrtG3btsa+PvvsM0RGRqJNmzZISUnBv//9bzRs2BAA0LhxY6SlpaG8vBy+vr5wcXHBxIkTYWRkJMV6nkaNGoX169cjLi4Orq6u8Pb2RlxcHGxtbQE8emfeypUrsW7dOjRu3Bh+fn7P1I6IiIiIiIiIXi4KgVd9I1NW9QI1einl5+fD1tYWJ0+ehJubm+jhvDJKb1wUFju3Y7Cw2M3TVwmLLVKc2xxhsQf1+0tYbHW7JsJiq/n0ERZ707tbhcWef/+UsNj/mdNRWOy3l+QIi31guoOw2DLb5sJit/hok7DYsZrOwmJffOydty9aZ92bwmKPLHooLLa1hpGw2BszooTF5u9rL57I39feb3VZWGx5CwNhsTW6eQmLnT46Q1jsgzrPdwdZTSLyvxEW+0UaYfOBsNgb83cIi/08cIsrERERERERERHVmoJrvuoMt7gSEREREREREREJxBV0rxAbGxtwRzIRERERERERUf3CFXRERERERERERFRrSoFXba1evRq2trbQ1tZGu3btkJKSUmP95ORktGvXDtra2r
Czs8PatWv/RtRnxwQdERERERERERHVW9u2bcOkSZMwa9YsnDx5Ep07d0bPnj1RUFBQZf28vDz06tULnTt3xsmTJ/HJJ58gJCQEO3Y8v4MpmKAjIiIiIiIiIqJaU0Ap7KqNZcuWYeTIkRg1ahScnJywYsUKWFlZYc2aNVXWX7t2LZo1a4YVK1bAyckJo0aNQmBgIJYuXVoXj61KfAcd1Xvl/80SFvvKbXFHqdsInLd6U2dhsUUqPndXWGxdO2Ghobwk7nvtdSWzbS4weo7A2PSiXdTUFD0EIUT+9xsaD8XFfk2J/HqL/Gn+urqRpycsdkOI+11R3S5XWGyx/y1RCIxNz1txcTGKi4tVyuRyOeRyuUpZSUkJMjIy8PHHH6uUv/POO0hPT6+y7yNHjuCdd95RKfP19cVXX32F0tJSaD6H72uuoCMiIiIiIiIiolpTCvwnIiICRkZGKldERESlMd64cQPl5eUwNzdXKTc3N8e1a9eqnNe1a9eqrF9WVoYbN27U3QN8DFfQERERERERERHRKyU0NBRTpkxRKXty9dzjZDKZymelUlmp7Gn1qyqvK0zQERERERERERHRK6Wq7axVadiwIdTV1Sutlrt+/XqlVXIVLCwsqqyvoaEBU1PTvz/oGnCLK1VLJpNh165doodBRERERERERC8hhcDrWWlpaaFdu3Y4cOCASvmBAwfQsWPHKtt06NChUv39+/fDw8Pjubx/DmCC7rVVUlIiLHZpaamw2ERERERERET0epkyZQrWr1+PDRs2IDs7G5MnT0ZBQQHGjh0L4NF22eHDh0v1x44di0uXLmHKlCnIzs7Ghg0b8NVXX2HatGnPbYxM0L2EfvjhBxgbG0OheJQTzszMhEwmw/Tp06U6Y8aMweDBg6XPO3bsQKtWrSCXy2FjY4OoqCiVPm1sbLBgwQIEBATAyMgIQUFBKCkpQXBwMCwtLaGtrQ0bGxvphYo2NjYAgPfffx8ymUz6XJWZM2fC0dERurq6sLOzQ1hYmEoSLjw8HG5ubtiwYQPs7Owgl8uhVCpx+/ZtjB49Go0aNYKhoSG6du2KU6dOSe1yc3Ph5+cHc3Nz6Ovro3379jh48ODffq5EREREREREVHcUUAq7asPf3x8rVqzAvHnz4ObmhsOHD2Pv3r2wtrYGABQWFqKgoECqb2tri7179yIpKQlubm6YP38+Vq5ciQ8++KBOn9/j+A66l5CXlxfu3r2LkydPol27dkhOTkbDhg2RnJws1UlKSsLkyZMBABkZGRg4cCDCw8Ph7++P9PR0jBs3DqampggICJDaLFmyBGFhYZg9ezYAYOXKldi9ezfi4+PRrFkzXL58GZcvXwYAHD9+HI0aNUJsbCx69OgBdXX1asdrYGCAuLg4NG7cGGfOnEFQUBAMDAwwY8YMqc6FCxcQHx+PHTt2SH317t0bDRo0wN69e2FkZIR169ahW7duOH/+PBo0aIB79+6hV69eWLBgAbS1tbFx40b06dMH586dQ7NmzerseRMRERERERFR/TZu3DiMGzeuyntxcXGVyry9vfHrr78+51H9DxN0LyEjIyO4ubkhKSkJ7dq1k5Jxc+fOxd27d1FUVITz58/Dx8cHALBs2TJ069YNYWFhAABHR0dkZWVhyZIlKgm6rl27qizHLCgogIODAzp16gSZTCZljgHAzMwMAGBsbAwLC4sax1uR8AMerbybOnUqtm3bppKgKykpwaZNm6R+f/75Z5w5cwbXr1+XXuq4dOlS7Nq1C9u3b8fo0aPRpk0btGnTRupjwYIF+P7777F7924EBwfX5pESEREREREREb20uMX1JeXj44OkpCQolUqkpKTAz88PLi4uSE1NRWJiIszNzdGyZUsAQHZ2Njw9PVXae3p6IicnB+Xl5VKZh4eHSp2AgABkZmaiRYsWCAkJwf79+//WWLdv345OnTrBwsIC+vr6CAsLU1kaCgDW1tZScg54tOrv3r17MDU1hb6+vnTl5eUhNzcXAFBUVIQZM2bA2dkZxsbG0NfXx3/+859KfT+uuLgYd+7cUbmKS/jOOyIiIiIiIqK6phT4T33DBN1LysfHBykpKTh16hTU1NTg7OwMb29vJCcnIykpCd7e3lJdpVIJmUym0l6prPzNqqenp/LZ3d0deXl5mD9/Ph48eICBAweif//+tRrn0aNHMWjQIPTs2RN79uzByZMnMWvWrEqHUDwZW6FQwNLSEpmZmSrXuXPnpHftTZ8+HTt27MDChQuRkpKCzMxMuLq61njARUREBIyMjFSuJRu+q9WciIiIiIiIiIheJG5xfUlVvIduxYoV8Pb2hkwmg7e3NyIiInDr36HrtgABAABJREFU1i1MnDhRquvs7IzU1FSV9unp6XB0dKzx3XEAYGhoCH9/f/j7+6N///7o0aMHbt68iQYNGkBTU1NlBV5V0tLSYG1tjVmzZkllly5deur83N3dce3aNWhoaFR7AEVKSgoCAgLw/vvvAwDu3buH/Pz8GvsNDQ3FlClTVMqU/zn01PEQERERERERUe0oRA+gHmGC7iVV8R66zZs3Izo6GsCjpN2AAQNQWloqvX8OAKZOnYr27dtj/vz58Pf3x5EjR7Bq1SqsXr26xhjLly+HpaUl3NzcoKamhu+++w4WFhYwNjYG8Oh9cocOHYKnpyfkcjlMTEwq9WFvb4+CggJs3boV7du3x48//ojvv//+qfPr3r07OnTogL59+yIyMhItWrTA1atXsXfvXvTt2xceHh6wt7fHzp070adPH8hkMoSFhUkn21ZHLpdL77Sr8FBL86njISIiIiIiIiIShVtcX2JdunRBeXm5lIwzMTGBs7MzzMzM4OTkJNVzd3dHfHw8tm7dChcXF8yZMwfz5s1TOSCiKvr6+oiMjISHhwfat2+P/Px87N27F2pqj74toqKicODAAVhZWaFt27ZV9uHn54fJkycjODgYbm5uSE9Plw6rqIlMJsPevXvh5eWFwMBAODo6YtCgQcjPz4e5uTmARwlEExMTdOzYEX369IGvry/c3d2f4ckRERERERER0fOmVCqFXfWNTFkfZ0X0mIeZe4TFTvPdJCy2575hwmKrN3UWFjvObY6w2O+3uiwstm7PlsJiy2ybC4u9efxpYbHn3z8lLPa5WHH/fncflyAs9oHpDsJii/w+b/GRuP+WhOm2eXqlesiuVNwBU3M0/hAW21rDSFjsjRlRwmIfbhUqLLbX2QhhsUUS+ftaZ92bwmI3tC0SFlvk74pbloqb90UNcRswI/K/ERb7RXq/WR9hsb8v+EFY7OeBK+iIiIiIiIiIiIgE4jvoiIiIiIiIiIio1hTgpsy6whV0REREREREREREAnEFHRERERERERER1Zq4t/zVP1xBR0REREREREREJBBPcaV6L9RmiOghCGFXJi7/LvK0pDnRbsJilx06LCz2lQRxz1zkiWQ38vSExRY576hzTYTFPlx6TVhsL00LYbFF6v6gXFjsi5qawmK/rhLVxf1s6VIu7mcqvV4CMucJi/1wXoiw2MXn7gqL/bqStzAQFttg5R5hs
V+kPs3eFRb7h4L69Yy5xZWIiIiIiIiIiGpNyUMi6gy3uBIREREREREREQnEFXRERET/x96dx9WU/38Af532fVGpkJIWlVKJQabFMoghM2TJkjCMoYTQdyajsieyDBpLIfs61kjcJoREtlJJZMnYtZC2z++Pvp2fqzDm2+kY3s953Me45557Xp9z7u326XM/CyGEEEIIIeSjVVIPujpDPejIe3Ech71794pdDEIIIYQQQgghhJDPFjXQ1aF/c2PWzJkzYW9vL3YxCCGEEEIIIYQQ8i/BGBPt9rmhBrq/qbS0VOwiEEIIIYQQQgghhJDP0GfRQLd//35oaWmhsrISAJCWlgaO4xAYGMjvM2bMGAwaNIi/v2vXLtjY2EBRUREmJiaIiIiQOqaJiQlmzZoFHx8faGpqYvTo0SgtLcX48eNhaGgIJSUlmJiYYO7cufz+ANC3b19wHMfff9v7jgFU9cKLiopCr169oKKiAisrKyQnJ+PGjRtwc3ODqqoq2rdvj5ycHKnjrly5Es2bN4eCggIsLS2xceNGqcfz8vLQp08fqKmpQUNDA15eXvjrr78AADExMQgJCcGlS5fAcRw4jkNMTAz/3MePH6Nv375QUVGBubk59u3bxz8mkUjAcRwSEhLg5OQEFRUVdOjQAZmZmTVeo9atW0NJSQmmpqYICQlBeXk5//jMmTPRtGlTKCoqolGjRvDz+//lz1esWAFzc3MoKSlBX18f/fr1q/XaEkIIIYQQQgghhPwbfRYNdC4uLigsLMTFixcBAImJidDV1UViYiK/j0QigaurKwAgNTUVXl5eGDhwIK5cuYKZM2ciODhYqlEKAMLDw9GyZUukpqYiODgYS5cuxb59+7B9+3ZkZmYiNjaWb4hLSUkBAERHRyM/P5+//7b3HaNaWFgYhg0bhrS0NLRo0QKDBw/GmDFjEBQUhPPnzwMAxo8fz++/Z88e+Pv7Y/Lkybh69SrGjBmDESNG4MSJEwCqupx6enri6dOnSExMRHx8PHJycjBgwAAAwIABAzB58mTY2NggPz8f+fn5/GMAEBISAi8vL1y+fBkeHh7w9vbG06dPpcr8888/IyIiAufPn4ecnBx8fX35x44cOYIhQ4bAz88P6enpiIqKQkxMDGbPng0A2LlzJxYvXoyoqChkZ2dj7969sLW1BQCcP38efn5+CA0NRWZmJuLi4uDi4lLrtSWEEEIIIYQQQkj9qRTx9rn5LFZx1dTUhL29PSQSCVq3bg2JRIKAgACEhISgsLAQxcXFyMrKgpubGwBg0aJF6Ny5M4KDgwEAFhYWSE9PR3h4OHx8fPjjdurUCVOmTOHv5+XlwdzcHB07dgTHcTA2NuYf09PTAwBoaWnBwMDgnWV93zGqjRgxAl5eXgCAadOmoX379ggODka3bt0AAP7+/hgxYgS//8KFC+Hj44Nx48YBACZNmoQzZ85g4cKFcHd3x7Fjx3D58mXk5ubCyMgIALBx40bY2NggJSUFbdq0gZqaGuTk5Gotu4+PD9/7cM6cOVi2bBnOnTuH7t278/vMnj2bbwCdPn06evbsiZKSEigpKWH27NmYPn06hg8fDgAwNTVFWFgYpk6dil9//RV5eXkwMDBAly5dIC8vj6ZNm6Jt27b89VJVVUWvXr2grq4OY2NjODg4vPP6vn79Gq9fv5baVs4qIMfJvvM5hBBCCCGEEEIIIWL6LHrQAYCbmxskEgkYY0hKSkKfPn3QsmVLnDx5EidOnIC+vj5atGgBAMjIyICzs7PU852dnZGdnY2Kigp+m5OTk9Q+Pj4+SEtLg6WlJfz8/HD06NGPLuffOYadnR3/b319fQDge5RVbyspKUFBQcF7zycjI4N/3MjIiG+cAwBra2toaWnx+7zPm+VRVVWFuro6Hj58+M59DA0NAYDfJzU1FaGhoVBTU+Nvo0ePRn5+Pl6+fIn+/fvj1atXMDU1xejRo7Fnzx5++GvXrl1hbGwMU1NTDB06FJs2bcLLly/fWda5c+dCU1NT6pb8Iv2D50gIIYQQQgghhJCPw0T873PzWTXQJSUl4dKlS5CRkYG1tTVcXV2RmJgoNbwVqBryyXGc1PNrWwFEVVVV6r6joyNyc3MRFhaGV69ewcvL66PnQ/s7x5CXl+f/XV3O2rZVz7n35rY3z6d6W23n+77tb3szuzrrzewPla+yshIhISFIS0vjb1euXEF2djaUlJRgZGSEzMxM/Pbbb1BWVsa4cePg4uKCsrIyqKur48KFC9iyZQsMDQ0xY8YMtGrVCs+fP6+1rEFBQXjx4oXUrb2m9QfPkRBCCCGEEEIIIUQsn00DXfU8dJGRkXB1dQXHcXB1dYVEIqnRQGdtbY2TJ09KPf/06dOwsLCArOz7h0JqaGhgwIABWL16NbZt24Zdu3bx87HJy8tL9cD7J8f4J6ysrGo9HysrKwBV55uXl4c7d+7wj6enp+PFixf8PgoKCn+r7P+Eo6MjMjMzYWZmVuMmI1P1FlRWVkbv3r2xdOlSSCQSJCcn48qVKwAAOTk5dOnSBQsWLMDly5dx69YtHD9+vNYsRUVFaGhoSN1oeCshhBBCCCGEEEI+ZZ/FHHTA/89DFxsbiyVLlgCoarTr378/ysrK+PnnAGDy5Mlo06YNwsLCMGDAACQnJ2P58uVYsWLFezMWL14MQ0ND2NvbQ0ZGBjt27ICBgQG0tLQAVK3kmpCQAGdnZygqKkJbW/ujj/FPBAYGwsvLC46OjujcuTP279+P3bt349ixYwCALl26wM7ODt7e3oiMjER5eTnGjRsHV1dXfhiviYkJcnNzkZaWhiZNmkBdXR2Kior/uExvmjFjBnr16gUjIyP0798fMjIyuHz5Mq5cuYJZs2YhJiYGFRUV+Oqrr6CiooKNGzdCWVkZxsbGOHDgAG7evAkXFxdoa2vj0KFDqKyshKWlZZ2UjRBCCCGEEEIIIf9M5Wc41FQsn00POgBwd3dHRUUF3xinra0Na2tr6Onp8T3FgKoeXdu3b8fWrVvRsmVLzJgxA6GhoVILRNRGTU0N8+fPh5OTE9q0aYNbt27h0KFDfC+wiIgIxMfHw8jI6J0LGXzoGP+Ep6cnlixZgvDwcNjY2CAqKgrR0dH8deA4Dnv37oW2tjZcXFzQpUsXmJqaYtu2bfwxvv/+e3Tv3h3u7u7Q09PDli1b/nF53tatWzccOHAA8fHxaNOmDdq1a4dFixbxC2RoaWlh9erVcHZ2hp2dHRISErB//37o6OhAS0sLu3fvRqdOnWBlZYVVq1Zhy5YtsLGxqbPyEUIIIYQQQgghhIiJY7VNvkbIZyTIZLDYRRCFabl47e835cRb9HrGEnvRsssT/hQt+16ceNdct1mxaNmPc1U/vJNAxDzviMzGomX/WfZAtGwX+Xevkv456/JKmCko/o6bb81DS4R3Qla8zxb3CvE+U8mXxSctVLTsklA/0bJfZxaKlv2lUrRUFy1bfekB0bLrU+cm34iWnXD34xfu/JR9Vj3oCCGEEEIIIYQQQgj5t6EGOkIIIYQQQgghhBBCRPTZLBJBCCGEEEII
IYQQQuoPLRJRd6gHHSGEEEIIIYQQQgghIqIedIQQQgghhBBCCCHkozHqQVdnqIGOfPaGKzwXLfveC/FWDWplf0+0bDFX1ixecVC0bLX5gaJlG3dOFy2b5eaIlq17+Lpo2WISc1XPP0WsOYh53mKaIfdItOz4KeaiZb/8Qn++b4q4SvPXKk9Fy6YVweufmOct5kqqSjOWipYtf1fE+tpt8bLLE/4ULVuus4to2YR8LGqgI4QQQgghhBBCCCEfrZJRD7q6QnPQEUIIIYQQQgghhBAiImqgI4QQQgghhBBCCCFERNRA9zdxHIe9e/eKXQzRxMTEQEtLS+xiEEIIIYQQQggh5BPBRLx9bqiBDkBpaanYRSCEEEIIIYQQQgghX6hPvoFu//790NLSQmVlJQAgLS0NHMchMPD/V0scM2YMBg0axN/ftWsXbGxsoKioCBMTE0REREgd08TEBLNmzYKPjw80NTUxevRolJaWYvz48TA0NISSkhJMTEwwd+5cfn8A6Nu3LziO4+/X5u7duxg4cCAaNGgAVVVVODk54ezZs/zjK1euRPPmzaGgoABLS0ts3LhR6vkcxyEqKgq9evWCiooKrKyskJycjBs3bsDNzQ2qqqpo3749cnL+f9XEmTNnwt7eHlFRUTAyMoKKigr69++P58+f8/ukpKSga9eu0NXVhaamJlxdXXHhwgWp7OfPn+OHH36Avr4+lJSU0LJlSxw4cAASiQQjRozAixcvwHEcOI7DzJkz+WszZ84c+Pr6Ql1dHU2bNsXvv/8uddx79+5hwIAB0NbWho6ODvr06YNbt27xj0skErRt2xaqqqrQ0tKCs7Mzbt++DQC4dOkS3N3doa6uDg0NDbRu3Rrnz59/5/UnhBBCCCGEEEJI/agEE+32ufnkG+hcXFxQWFiIixcvAgASExOhq6uLxMREfh+JRAJXV1cAQGpqKry8vDBw4EBcuXIFM2fORHBwMGJiYqSOGx4ejpYtWyI1NRXBwcFYunQp9u3bh+3btyMzMxOxsbF8Q1xKSgoAIDo6Gvn5+fz9txUVFcHV1RX379/Hvn37cOnSJUydOpVvXNyzZw/8/f0xefJkXL16FWPGjMGIESNw4sQJqeOEhYVh2LBhSEtLQ4sWLTB48GCMGTMGQUFBfOPU+PHjpZ5z48YNbN++Hfv370dcXBzS0tLw008/8Y8XFhZi+PDhSEpKwpkzZ2Bubg4PDw8UFhYCACorK9GjRw+cPn0asbGxSE9Px7x58yArK4sOHTogMjISGhoayM/PR35+PqZMmcIfOyIiAk5OTrh48SLGjRuHH3/8EdevXwcAvHz5Eu7u7lBTU8Off/6JkydPQk1NDd27d0dpaSnKy8vh6ekJV1dXXL58GcnJyfjhhx/AcRwAwNvbG02aNEFKSgpSU1Mxffp0yMvLv+8tQwghhBBCCCGEEPKvIid2AT5EU1MT9vb2kEgkaN26NSQSCQICAhASEoLCwkIUFxcjKysLbm5uAIBFixahc+fOCA4OBgBYWFggPT0d4eHh8PHx4Y/bqVMnqUamvLw8mJubo2PHjuA4DsbGxvxjenp6AAAtLS0YGBi8s6ybN2/Go0ePkJKSggYNGgAAzMzM+McXLlwIHx8fjBs3DgAwadIknDlzBgsXLoS7uzu/34gRI+Dl5QUAmDZtGtq3b4/g4GB069YNAODv748RI0ZIZZeUlGD9+vVo0qQJAGDZsmXo2bMnIiIiYGBggE6dOkntHxUVBW1tbSQmJqJXr144duwYzp07h4yMDFhYWAAATE1NpV4HjuNqPX8PDw/+nKZNm4bFixdDIpGgRYsW2Lp1K2RkZLBmzRq+0S06OhpaWlqQSCRwcnLCixcv0KtXLzRv3hwAYGVlJfW6BAYGokWLFgAAc3Pzd15/QgghhBBCCCGEkH+jT74HHQC4ublBIpGAMYakpCT06dMHLVu2xMmTJ3HixAno6+vzDTgZGRlwdnaWer6zszOys7NRUVHBb3NycpLax8fHB2lpabC0tISfnx+OHj360eVMS0uDg4MD3zj3tneVLSMjQ2qbnZ0d/299fX0AgK2trdS2kpISFBQU8NuaNm3KN84BQPv27VFZWYnMzEwAwMOHDzF27FhYWFhAU1MTmpqaKCoqQl5eHl/2Jk2a8I1zH+PN8lY34j18+BBAVY/GGzduQF1dHWpqalBTU0ODBg1QUlKCnJwcNGjQAD4+PujWrRu+/fZbLFmyBPn5+fzxJk2ahFGjRqFLly6YN2+e1NDe2rx+/RoFBQVSt9LKivc+hxBCCCGEEEIIIR+PhrjWnX9NA11SUhIuXboEGRkZWFtbw9XVFYmJiVLDWwGAMcb31Hpz29tUVVWl7js6OiI3NxdhYWF49eoVvLy80K9fv48qp7Ky8gf3qa1sb297cwhn9WO1baseOvu+nOr/+/j4IDU1FZGRkTh9+jTS0tKgo6PDL5Dxd8r+Lm8POeU4ji9bZWUlWrdujbS0NKlbVlYWBg8eDKCqR11ycjI6dOiAbdu2wcLCAmfOnAFQNb/etWvX0LNnTxw/fhzW1tbYs2fPO8syd+5cvgGy+vb7s5v/+NwIIYQQQgghhBBChPavaKCrnocuMjISrq6u4DgOrq6ukEgkNRrorK2tcfLkSannnz59GhYWFpCVlX1vjoaGBgYMGIDVq1dj27Zt2LVrF54+fQqgqhHqzR54tbGzs0NaWhr/nLdZWVnVWrY3h3T+U3l5ebh//z5/Pzk5GTIyMnyPuKSkJPj5+cHDw4NfQOPx48dSZb979y6ysrJqPb6CgsIHz782jo6OyM7ORsOGDWFmZiZ109TU5PdzcHBAUFAQTp8+jZYtW2Lz5s38YxYWFggICMDRo0fx3XffITo6+p15QUFBePHihdTtB23Td+5PCCGEEEIIIYSQf4YxJtrtc/OvaKCrnocuNjaWn2vOxcUFFy5ckJp/DgAmT56MhIQEhIWFISsrC+vXr8fy5cul5purzeLFi7F161Zcv34dWVlZ2LFjBwwMDKClpQWgarXShIQEPHjwAM+ePav1GIMGDYKBgQE8PT1x6tQp3Lx5E7t27UJycjIAIDAwEDExMVi1ahWys7OxaNEi7N69+4Nl+zuUlJQwfPhwXLp0iW+M8/Ly4ueMMzMzw8aNG5GRkYGzZ8/C29tbqtecq6srXFxc8P333yM+Ph65ubk4fPgw4uLi+PMvKipCQkICHj9+jJcvX/6tcnl7e0NXVxd9+vRBUlIScnNzkZiYCH9/f9y9exe5ubkICgpCcnIybt++jaNHjyIrKwtWVlZ49eoVxo8fD4lEgtu3b+PUqVNISUl5b4OmoqIiNDQ0pG4KMu9vmCWEEEIIIYQQQggR07+igQ4A3N3dUVFRwTfGaWtrw9raGnp6elINNo6Ojti+fTu2bt2Kli1bYsaMGQgNDZVaIKI2ampqmD9/PpycnNCmTRvcunULhw4dgoxM1SWKiIhAfHw8jIyM4ODgUOsxFBQUcPToUTRs2BAeHh6wtbXlV0IFAE9PTyxZsgTh4eGwsbFBVFQUoqOjpRoY/ykzMzN
FRUBD09PareWB8i6RaZwYMHo7y8nJUEnZycHP744w+RBF1tbS1j7k6LzpiIlpGRwa1bt1h7T3+sdX/+/PlQU1PDtm3bqGl7eXkhLi6OlQE0ksbZ2Vno+9TUVFZ8wfz9/WFoaCjSor9nzx6Ul5djx44d1LS1tbVRWFgokqArKCgQuclrbzQ0NMRWHnfr1o36Zwmbfqb37t2Do6Mjampq8Pr1a4wePRoqKirYunUrXr16hX379lHTVlZWRkREBBYtWgQdHR0MHz4cw4cPh729Pfr06UNNFwB2796NvXv3wtPTk1lzcnKCubk51q1bRyVB9+/4PHa2lsPGxkbqPqps0XaAyGZVMo/HE3to+ueff1LfL1VXV4vVeP36NR4+fEhVm4PjU4BL0HFw/M3jx4+xfPlyVtpE7OzsEBMTgw0bNgBo/WAUCAQIDw//R5+29qCsrAwJCQmdbkLVsmXLYGlpiXnz5qGlpQV2dna4evUqFBUVce7cOdjb21PVZ6tFZunSpQgKCkJdXZ1YPxWaprijR4/GqlWrcPr0aeYGo7GxEV9//TVGjx5NTReQfCJ6165d//bP0vQc9PT0xI8//ojNmzdT0/gYgYGBkJGRQU1NDUxNTZl1Nzc3BAYGUk3QsTmARtJ8mByaOnUqK3GcOHFCKJHQxtChQ7F582aqCboZM2bA398fKioqTDVsRkYGAgICMGPGDGq6ALB69WosX74cMTExzECpuro6BAcHU23jBlo/N1atWoXLly9L3M80ICAAAwYMEEmCTp06FfPnz6emCwBRUVEAWn/PbVNFd+7cCV9fX2hra6O2tpaadm1tLYYOHSqyPnToUGq6U6ZMAdC6N/yw5fD9iqqOypYtW6Cvrw83NzcAgIuLC06cOIEePXrg119/lYhdw7Zt21BSUgIejwdTU1MEBwdj2LBhVHXFDSSRVGJy2LBh2LRpE44dOwZpaWkArZ+rmzZtEqkWbi/YSkSL+9z6GDTtjjg42uBaXDk4/sbb2xu2traYN2+exLVv374Ne3t79O/fH6mpqZg8eTKKi4vx9OlTZGVlUT0dHzFiBEJCQuDo6EhN41Pks88+w6lTpzBgwACcOnUKvr6+SEtLQ0xMDNLS0pCVlUVNm+0WmQ9pa6OgXUn28OFD2NnZoaGhgTHyzs/PR/fu3XHx4kXo6upS05Z0a++/O7WStufg0qVLERMTA0NDQwwYMABKSkpCz9NMVLHZus/mAJrOiry8PG7duiW2JcrCwgKvXr2ipv3mzRvMnj0bv/zyC1ONKxAI4OnpiX379kFWVpaadr9+/VBeXo7Xr1/j888/B9Dq9SgnJyfSfnnz5s121WbTz1RTUxNZWVkwMTERen9XV1fDzMwMzc3N1LTbaGpqwuXLl5kk3c2bN2FmZoa8vDxqmhYWFnB3dxepzg0LC0NcXByKioqoabNZUcUmfD4fsbGxGDp0KC5evAhXV1fExcUhPj4eNTU1uHDhAjXt2NhYzJ07F87Ozsx+7cqVK0hMTMShQ4fg7u5OTZvNxOTt27dhZ2cHVVVVJhGZmZmJ58+fIzU1FRYWFu2uyVZr77/rU9tRuy04Pj24BB0Hx980NzfDxcUFWlpaEj+JBlpPgvfu3Yvc3FwIBALY2NjA19eXOZGnRWJiIlavXo3g4GCJV1Sxiby8PMrLy/HZZ59h4cKFUFRUxI4dO1BVVQVra2s8f/6cmraBgQHWr18v1CIDAIcPH8a6devEnpq2F/fu3fvH5/X09KhpA603VD///DMKCgqgoKAAKysrzJw5U+R119501kQ0m4kqFRUV3Lx5E0ZGRkI38Dk5OXB0dERDQwM17c7KiBEjcPLkSZEKi+fPn2PKlClU/78tLCzg4+MDPz8/ofW2lsDbt29T026jtLSUubZYWlpSv54BwPr16//tn127di3FSCSLuro6Ll++DDMzM6H39+XLlzFt2jT88ccf1LRXrFiBjIwMFBQUwMLCAnZ2dhg+fDiTUKDJiRMn4ObmhlGjRsHW1hY8Hg+XL19GSkoK4uPjJV7B2pFbPdtQUFBAaWkpdHV1ERAQgFevXiEqKgqlpaUYPHgwnj17Rk3b1NQUCxcuFDk43b59Ow4cOICSkhJq2mwmJgHg0aNH2LNnj9B+zc/PD+rq6lR1O2simoOjDS5Bx8HxN9HR0fDx8YGCggI0NDSEWuBon0SzCZsVVWyip6eHAwcOYOTIkTAwMMAPP/yAiRMnori4GF9++SXVDd/HKk3KyspgaWlJtdKks9JZE9FsMmHCBNjY2GDDhg1QUVFBYWEh9PT0MGPGDAgEAiQkJLAdYofjYybyjx8/Rq9evfD27Vtq2j/99BP8/PwQHByMESNGAABSUlIQERGBHTt2YMGCBdS0OSSPm5sbunXrhv379zPvby0tLTg5OeHzzz/HwYMHqWlLSUlBS0sLgYGBcHJyEmqhlwS5ubmIjIxESUkJCCEwMzNDUFAQUxlOC7ZbPdmiZ8+eSEhIwNChQ2FiYoKwsDC4uLjg7t27GDhwINUDVTk5ORQXF7NSGcxmYvJTozMkojk42uA86Dg4/mb16tX49ttvsXLlyn+73Lk9efXqFQoLC8VOmKTpeUCzWutTZu7cuXB1dWX8yNo80K5fv07dZNrQ0BDx8fEiLTJxcXESm0h3+/Zt1NTU4M2bN0LrtP01SktLkZ6eLvZ1vmbNGmq606ZNA9Dayt6GJBPRDx48wJkzZ8T+zjuSH9r7hIeHw97eHjdu3MCbN28QEhIi1Lrf3jg7O+PQoUPo2rWryOCED6E5LIENCgsLma9v376Nuro65vuWlhYkJSWhV69eVGPw9vbG69evsXHjRsZPVV9fX8RQnwYtLS04dOgQUlJSxF5bOmpLMyEECQkJSEtLE/vvpvk6j4yMhIODA8zMzPDq1Su4u7ujrKwMmpqaOHbsGDVdAMjLy0NGRgbS09MREREBaWlpZkiEvb099YRd//79ERsbS1VDHFFRUYzuxYsXcenSJSQlJSE+Ph7BwcHUK6rYwtnZGe7u7jAyMkJDQwPGjRsHoNUig7Ztha6uLlJSUkR0UlJSqNpyAICamhru378PXV1dJCUlISwsDEDr+14Sh+eNjY3Izs4We22heU1nOxHd1NSEjIwMsfs12t1UHBwAl6Dj4GB48+YN3NzcWEnOJSUlwdPTE/X19SLP0U4eSKIF6FNk3bp1sLCwwP379+Hi4sJMy5WWlsbKlSupaq9fvx5ubm74/fffxbbI0KSyshJTp05FUVGRkM9HW8UozdfagQMHsHjxYmhqakJHR0ekSpVmgo7NRHRKSgomT54MAwMD3L17FxYWFqiurgYhBDY2NqzFRRszMzMUFhZi7969kJaWRlNTE5ydnam17nfr1o15TYmbqtmR6du3L3g8Hng8HlO99j4KCgrYvXs39TgWL16MxYsX48mTJ1BQUICysjJ1TaB1YMGhQ4cwYcIEWFhYdJpp5AEBAdi/fz8cHBzQvXt3if67e/bsifz8fBw7dgw3b96EQCDAvHnz4OHhAQUFBara1tbWsLa2Zm6WCwoKsGPHDvj7
+0MgEFD9HPPw8GASgZI6UGujtraWSQqdO3cOrq6uGDNmDPT19TF48GCJxiJJIiMjoa+vj/v372Pr1q3MdaW2thZLliyhqh0UFAR/f3/k5+dj6NChzH7t0KFD2LlzJ1VtNhOTZ8+ehYeHB5qamqCioiKyX6OZoGMzEZ2Xl4fx48ejubkZTU1NUFdXR319PRQVFaGtrc0l6DgkAtfiysHxN4GBgdDS0hKpapIEhoaGGDt2LNasWcPKFFmAvYqqzgpbLTKTJk2CtLQ0Dhw4AD6fj+zsbDQ0NCAoKAjbtm2jOpVMT08PS5YswYoVK6hpfIoMGjQIjo6O+PbbbxmvJm1tbXh4eMDR0RGLFy9mO0Qq1NTUQFdXV2zSoKamhjHUb28IIaipqYGWlhYUFRWpaHxq3Lt3D4QQ5j2tpaXFPCcrKwttbW1mEl9HRFNTEzExMRg/fjzboUgUdXV1xMbGdrp/N9B6I902HKLNvL5v375wcHBAeHg4Nd1FixYhIyMDpaWl0NHRwfDhw5nqPdrV92y2enZmEhMTERERwfjNtU1xdXJyoqr79u1b7Nq1CzU1NZgzZw6zP9yxYweUlZWpTks2NjbG+PHj8d1330n8c5TN1l57e3sYGxtj7969UFVVRUFBAWRkZDBr1iwEBAT8y+p8Do72gEvQcXD8jb+/P2JiYmBtbQ0rKysRjyqabWhdu3ZFXl4e1WmtH4PNiioOyaOpqYnU1FRYWVmhW7duyM7OhomJCVJTUxEUFER1+l3Xrl2Rn58PPp9PTeNfwUYiWkVFBfn5+ejduzfU1NRw+fJlmJubo6CgAE5OTqiurqamzSbS0tKora0V8UNraGiAtrY2tWuLQCCAvLw8iouLJV7h0tlJSEhgDMw/fI+19wTT9+nZsyfS09NhbGxMTeNTxMDAAL/99hv1xNDHePjwIbKyssS2wNGsNFFTU8OLFy9gbW3NVLPZ2dmha9eu1DQ/pK6ujkkQtiXstLW1UVtbS03Tz88P586dg5GREfLy8lBdXQ1lZWXExcVhy5YtVN9jHJLl7du3WLhwIUJDQ1nZMykpKaGoqIgVbTYT0aqqqrh+/TpMTEygqqqKq1evwtTUFNevX4eXlxfu3LlDTZuDow2uxZWD42+KioqY06lbt24JPUe7bWT69OlIT09nJUEXEBAAAwMDXLp0SWxFFUf78+uvv0JaWhpjx44VWk9OToZAIGDaGGjQ0tLCtIdoamri0aNHMDExgZ6eHu7evUtNF2j1Eblw4QJ8fHyo6oiDzUS0kpISXr9+DaB141lRUQFzc3MAENvW3lFo8/f7kBcvXkBeXp6arpSUFNMS1BkTdBUVFdixYwdKSkrA4/FgamqKgIAA6p8vu3btwjfffAMvLy+cPn0ac+fORUVFBXJycuDr60tVOygoCDt37sSePXtYa2998+YNqqqq0Lt3b3TpIpnt9bp167B+/Xr89NNP1NtKP+TgwYPw8fGBrKys2MFaNBN0R44ckXhC7kNUVFSgpqYGNTU1qKqqokuXLtDR0aGqGRkZCQMDA9TU1Ei81bOzkpOTA4FAINJCfP36dUhLS2PAgAFUdGVkZJCYmIjQ0FAqf/+/YuzYsbhx4wYrCTo2W3tlZGSYa1n37t1RU1MDU1NTdOvWDTU1NVS1OTja4BJ0HBx/k5aWxpr2nj174OLigszMTLETJmludK9evYrU1FRoaWlBSkoKUlJS+PLLL7Fp0yb4+/tTrajqrKxcuRKbN28WWSeEYOXKlVQTdBYWFigsLASfz8fgwYOxdetWyMrKYv/+/dQ3YoaGhggNDcW1a9ck/jpnMxE9ZMgQZGVlwczMDBMmTEBQUBCKiopw8uRJDBkyhKo2GyxfvhxA6016aGioUHtMS0sLrl+/jr59+1KNYevWrQgODsbevXthYWFBVetTIjk5GZMnT0bfvn1ha2sLQgiuXLkCc3NznD17lhmGQ4MffvgB+/fvx8yZM3H48GGEhISAz+djzZo1ePr0KTVdALh8+TLS0tLw22+/wdzcXOTaQnNYQnNzM5YuXYrDhw8DaB2Ew+fz4e/vj549e1L1NHVxccGxY8egra0NfX19kX83zYqqNWvWYM2aNVi1apXEvXsnTpwoUb33WbFiBTIyMlBQUAALCwvY2dlh1apVsLOzozpl8p8qqpYtW0ZNt7Pj6+uLkJAQkQTdw4cPsWXLFly/fp2a9tSpU3Hq1CnmM1WSTJgwAcHBwbh9+7bY/RrNrgM2E9H9+vXDjRs3YGxsDAcHB6xZswb19fU4cuQILC0tqWpzcLTBtbhycHwCREdHw8fHBwoKCmJPoisrK6lpq6mpITc3F3w+H71790Z0dDQcHBxQUVEBS0tLNDc3U9PurCgoKKCkpAT6+vpC69XV1TA3N0dTUxM17eTkZMasv7KyEhMnTsSdO3egoaGBuLg4sQbz7YWBgcFHn6P9OmeztbeyshIvXryAlZUVmpub8dVXX+Hy5cswNDREZGRkhxvU4uDgAADIyMjAF198AVlZWeY5WVlZ6Ovr46uvvqJa3aampobm5ma8e/cOsrKyItVFtBNGbNGvXz+MHTtW5ABg5cqVuHDhAtWEjaKiIkpKSqCnpwdtbW1cvHgR1tbWKCsrw5AhQ9DQ0EBNe+7cuf/4/MGDB6lpBwQEICsrCzt27ICjoyNzAHLmzBmsXbuW6rXF1dUVaWlpmD59utghEWvXrqWmraGhgezsbFYq/9lESkoKWlpaCAwMhJOTE/WJse+jqqqKmzdvsmoT0dlQVlZm3tPvU1VVBSsrK/z111/UtDdu3Iht27Zh5MiR6N+/P5SUlISep3mo+U9Jd5rD69hu7b1x4wb++usvODg44MmTJ/Dy8mL2az/99BP1w0UODoBL0HFwfBLo6OjA398fK1eulPhJ9LBhwxAUFIQpU6bA3d0dz549w+rVq7F//37k5uaKtPt2JAQCAcrLy8X659jZ2VHT1dHRwdGjR0WSYZcuXYK7uzseP35MTVscT58+hZqaWoeefMgloiXP3LlzsXPnTlba0NqqmT6Gl5eXhCKRLPLy8igqKhJJfpaWlsLKygqvXr2ips3n85GQkAAbGxsMHDgQ8+fPx6JFi3DhwgXMmDGjwyZF9fT0EBcXhyFDhjBDYPh8PsrLy2FjY0PVL0lJSQnJycn48ssvqWl8jJCQEKirq1Ofev6pUVBQgIyMDGY4hbS0NDMkwt7enmrCbu7cubC0tGSloopN+Hw+cnJyoKGhIbTe2NgIGxsbqod7GhoaOHfuHL744guh9StXrmDChAlUBxaweajJJlwimqOzw7W4cnB8Arx58wZubm4ST84BwOrVq5mKrbCwMEycOBHDhg1jKqo6KteuXYO7uzsz/fB9aJ4OAq2tAcuWLUNiYiJTfVBeXo6goCCJTc0tLy9HRUUF7OzsoK6uLvI7oM2HHnC0YbO1l82bCzahWbX0r+ioCbh/hZaWFvLz80USdPn5+SLDOtqbESNG4OzZs7CxscG8efMQGBiIhIQE3LhxQ2KT7548eYK7d++Cx+PB2NhYaJotTU1xv9umpib
q1zddXV3WfNg2bdqEiRMnIikpSWwLHM3BWmxibW0Na2trpnqpoKAAO3bsgL+/PwQCAdW9g6GhITZs2IArV65IvKKKTaqrq8X+Xl+/fo2HDx9S1R49ejRWrVqF06dPo1u3bgBaP7u//vprqpYBQGuVXmeEzdbeESNG4OTJkyLt6s+fP8eUKVOQmpoq8Zg4Oh9cgo6D4xPAy8sLcXFx+PrrryWu/f6gAj6fj9u3b3eKiiofHx8MGDAA58+fR48ePST6bw0PD4ejoyP69OmDzz77DADw4MEDDBs2jLofWkNDA9MWxePxUFZWBj6fj/nz50NVVRURERFU9WNiYhAeHo6ysjIAgLGxMYKDgzF79myqumwmotm8uejMVFRU4ODBg6ioqMDOnTuhra2NpKQk6OrqMkM6OhoLFizAwoULUVlZiaFDh4LH4+Hy5cvYsmULgoKCqGrv37+fqUT28fGBuro6Ll++jEmTJlEfDNPU1ISlS5ciJiaGiUFaWhqenp7YvXu3kA9iezNw4ECcP38eS5cuBfB/hw4HDhwQqbppbyIiIhASEoJ9+/aJWCbQ5rvvvkNycjJMTEwAQMSaoyOTl5fHTHDNzMzE8+fP0bdvX6a9nxbR0dFQVVVFbm4ucnNzhZ6jPZiDDc6cOcN8nZyczCTIgFY/05SUFOqv+4iICNjZ2UFPT48ZJJefn4/u3bvjyJEjVLXfR9KHmkCrTcW2bduEBg4FBwdj2LBhVHXZTESnp6eLTCAHgFevXiEzM5OaLgfH+3AtrhwcnwD+/v6IiYmBtbU1rKysWDmJfr+iSkFB4aMTGDsKSkpKKCgooD4R6mMQQnDx4kUUFBRAQUEBVlZWVNtq2/D09MTjx48RHR0NU1NTph3rwoULCAwMRHFxMTXt7du3IzQ0FH5+foyBfVZWFr7//nuEhYUhMDCQmrY4aCei224upkyZgsOHD4u9ubh48SL16bmdkYyMDIwbNw62trb4/fffUVJSAj6fj61btyI7OxsJCQlsh0gFQgh27NiBiIgIPHr0CEDr5ODg4GD4+/t32Gv6okWLcOnSJezZswe2trYAWgdH+Pv7Y/To0di7dy817StXrsDR0REeHh44dOgQFi1ahOLiYly9ehUZGRno378/Ne33vRYVFRVF9g4024rV1NQQGRmJOXPmUNP4FFFTU8OLFy9gbW3NtLWyPVG2o9LWVfL+5PU2ZGRkoK+vj4iICOpDQ5qamvDzzz8L7ddmzpwp8n6jAVuHmrGxsZg7dy6cnZ2FBg4lJibi0KFDcHd3p6bNRmtvYWEhAKBv375ITU2Furo681xLSwuSkpIQFRWF6urqdtfm4PgQLkHHwfEJ8E+nrjwej2pJ9ccqqubNmyeRiiq2GDFiBEJCQuDo6Mh2KBJFR0cHycnJsLa2FvJLqqqqgqWlJV68eEFN28DAAOvXr4enp6fQ+uHDh7Fu3TqJtHNIMhH9qdxcdEa++OILuLi4YPny5UKv85ycHEyZMqVTVC62mZerqKhITPPZs2f48ccfhSou5s6dK3SzQwNNTU0kJCTA3t5eaD0tLQ2urq548uQJVf2ioiJs27YNubm5EAgEsLGxwYoVK6hP/Tt06NA/Xr9otnrr6OggMzOT6rCXT5Fz5859Egk5Niqq2MLAwAA5OTnQ1NRkOxSJwuahpqmpKRYuXCiisX37dhw4cAAlJSXUtNlASkqKeS+JS40oKChg9+7d8Pb2lnRoHJ0QLkHHwdHJYbOiik0SExOxevVqBAcHi/XPsbKyYikyuqioqODmzZswMjISSVw4OjpSnbQoLy+PW7duiVQtlpWVwdLSkqqBPZuJ6M56c8EmysrKKCoqgoGBgdDrvLq6Gn369KH6WvsUeN+LzcTERCKvvYyMDDg5OaFr164YMGAAACA3NxeNjY04c+YMhg8fTk1bUVERubm5Igb9xcXFGDRoENXJ2J2VTZs2oba2Frt27WI7lE4FWxVVnxqNjY0iPmEdDTYPNeXk5FBcXCyyXysvL4eFhYXEPkMllYhu86Pm8/nIzs4W8i+VlZWFtrY2pKWlqcbAwdGG5B3pOTg4PikuXLiALVu2MF5obRgZGeHevXssRUWfadOmoaSkBN7e3hg4cCD69u2Lfv36MX92VOzs7BATE8N8z+PxIBAIEB4eTt0/x9DQEPHx8SLrcXFx1KswAgMDISMjg5qaGiE/Kjc3NyQlJVHVrqqq4pJzEkZVVRW1tbUi63l5eejVqxcLEUmGpqYmeHt7o0ePHrCzs8OwYcPQo0cPzJs3j/qkYl9fX7i6uqKqqgonT57EyZMnUVlZiRkzZsDX15eq9hdffIG1a9cK3TS+fPkS69evp+4D5+DggB9//BF//vknVR1x2NvbIyYmBi9fvpS4dnZ2Ng4fPgw+n49JkybB2dlZ6MHR/mzfvh2LFy/G+PHjER8fj7i4ODg6OsLHxweRkZFsh0eNLVu2CHnFuri4QF1dHb169UJBQQGLkdGltrYWQ4cOFVkfOnSo2M+39kRXVxcpKSki6ykpKdDV1aWqDbQmoi0tLaGgoMC0FdP0/NPT04O+vj4EAgEGDBgAPT095tGjRw8uOcchUbghERwcnZympiaxBtr19fWQk5NjISLJ0FmnY4WHh8Pe3h43btzAmzdvEBISguLiYjx9+hRZWVlUtdevXw83Nzf8/vvvsLW1ZQzsU1JSxCbu2pMLFy4gOTmZlUS0v78/DA0NRYyN9+zZg/LycuzYsYOqfmfE3d0dK1aswC+//MIkobOysvDVV1+JVCN0JJYvX46MjAycPXtWxIstKCiIqhdbRUUFTpw4IXQjIy0tjeXLlwsdCtBg586dcHR0xGeffQZra2vweDzk5+dDXl4eycnJVLUtLS2xevVq+Pn5Yfz48Zg9ezbGjx8PWVlZqroA0L9/f4SEhGDp0qVwdXXFvHnzMGTIEOq6QGsSnEvESZbdu3dj7969QtcwJycnmJubY926dRL3cZUUUVFRiI2NBQBcvHgRly5dQlJSEuLj4xEcHIwLFy6wHCEd2g41PxwgJ4lDzaCgIPj7+yM/P19o4NChQ4ewc+dOqtofa+318fFBfX091df5pk2b0L17d5FW1p9++glPnjzBihUrqGlzcDAQDg6OTs348ePJ6tWrCSGEKCsrk8rKStLS0kJcXFzItGnTWI6Ogwa1tbVkzZo1ZMKECWTcuHHkm2++IY8ePZKI9o0bN4iHhwexsbEh/fr1Ix4eHuTmzZvUdZWVlUlpaSnzdUVFBSGEkOzsbKKurk5Vu2fPnuTGjRsi67m5uaRXr15UtTsrb968Ie7u7kRKSorweDwiIyNDpKSkyKxZs8i7d+/YDo8aGhoaJC0tTWQ9NTWVaGpqUtUeOnQoSUxMFFlPTEwkQ4YMoapNCCHNzc1k//79ZPny5SQwMJAcOHCANDc3U9clhJCWlhaSnJxMvLy8SNeuXYmamhpZsGABSU9Pp6797t07curUKeLk5ERkZGSIqakpCQ8PJ3V1ddS1OSSLnJwcKSsrE1kvLS0lcnJyLEQkGeTl5UlNTQ0hhBB/f3+ycO
FCQgghd+/eJaqqqmyGRpWEhAQiLS1Nxo4dS7799luyYcMGMnbsWNKlSxdy8uRJ6vonT54ktra2RF1dnairqxNbW1ty6tQp6rr6+vrk8OHDIuuHDh0i+vr6VLX19PRIVlaWyPq1a9eoa3NwtMF50HFwdHJu374Ne3t79O/fH6mpqZg8ebJQRVXv3r3ZDpEqt2/fRk1NjchY9cmTJ1PVFQgEKC8vx+PHjyEQCISek8Q0187GhAkTYGNjgw0bNkBFRQWFhYXQ09PDjBkzIBAIqE71/Jj3nqS9XDojFRUVyMvLg0AgQL9+/Tq8oT2bXmxxcXFMNVdbFde1a9fw/fffY/PmzUIxdVSPTwB49eoVzp49i40bN6KoqAgtLS0S037y5AmioqKwceNGtLS0YPz48fD398eIESMkFgMHPSwsLODu7i5SURUWFoa4uDgUFRWxFBldevbsiYSEBAwdOhQmJiYICwuDi4sL7t69i4EDB+L58+fUtNs8ejU0NITWGxsbYWNjQ2Wi6Pvk5uYiMjISJSUlIITAzMwMQUFBHdqKhU2/Ynl5eZSUlIhMkq2srISZmRm3X+OQCFyLKwdHJ8fMzAyFhYXYu3cvpKWl0dTUBGdnZ/j6+qJHjx5sh0eNyspKTJ06FUVFRUJTNtuMaGneVF27dg3u7u6MKe378Hg86jd0r169QmFhodjkIM3E5K+//gppaWmMHTtWaD05ORkCgQDjxo2jps1ma6+hoSGSkpLg5+cntP7bb7+Bz+dT1e7s9O7dm/kdd4Zph21ebDExMZCXlwcgOS+2mTNnAgBCQkLEPtd2naVxjftU2pLq6upw/PhxxMbGorCwEAMHDpSILtDqCXfw4EEcO3YM2tramDNnDmprazFp0iQsXrwY27Zt+581bGxskJKSAjU1NfTr1+8f31M3b978n/U4hGHTJoJNnJ2d4e7uDiMjIzQ0NDB7hfz8fJEkTntTXV0t9nr1+vVriUwD79+/P9PeK0lycnIgEAgwePBgofXr169DWlqaGQREAzZbe3V1dZGVlSWSoMvKykLPnj2panNwtMEl6Dg4OKCjo4P169ezHYZECQgIgIGBAS5dusRMbWpoaEBQUFC73Mj8Ez4+PhgwYADOnz+PHj16SDRxkJSUBE9PT9TX14s8Rzs5uHLlSmzevFlknRCClStXUk3QsZmIXr58Ofz8/PDkyROmkiUlJQURERGc/xxFfvzxR0RGRjLTDo2MjLBs2TLMnz+f5cjowaYXG5u+nlFRUTh69KjIurm5OWbMmEE1Qff8+XOcOHECR48eRXp6Ovh8Ptzd3XH8+HHqyYPHjx/jyJEjOHjwIMrKyjBp0iQcP34cY8eOZT5XXF1dMWXKlHb5XHNycmK8aadMmfI//30c/xnTpk3D9evXERkZiVOnTjEVVdnZ2R26oioyMhIGBgaoqanB1q1boaysDKB1iMKSJUuoaJ45c4b5Ojk5Gd26dWO+b2lpQUpKCvT19alot+Hh4QF7e3vY29tLvPrb19cXISEhIgm6hw8fYsuWLbh+/To1bTYT0fPnz8eyZcvw9u1bof1aSEgIgoKCqGpzcLTBtbhycHCwVlHFJpqamkhNTYWVlRW6deuG7OxsmJiYIDU1FUFBQcjLy6OmraSkhIKCAuo3b+IwNDTE2LFjsWbNGnTv3l2i2goKCigpKRHZ1FZXV8Pc3Jxq+x3b7N27Fxs3bsSjR48AAPr6+li3bl2HHljAJqGhoYiMjMTSpUuZyrGrV69iz549CAgIQFhYGMsR0uPly5eIjY3FnTt3mBt4Dw8PKCgosB0aNdhsS1JQUICamhpcXV3h4eEh0ao5WVlZ9O7dG97e3pgzZw60tLREfub58+dwcnJCWlqaxOLi4Ggv3r59i4ULFyI0NFSiFedSUlIAINRh0YaMjAz09fURERGBiRMnUoth0aJFyMjIQGlpKXR0dDB8+HAMHz4c9vb26NOnDzVdAFBWVkZhYaHI77yqqgpWVlb466+/qOqz1drbdmC8a9cuxvpGXl4eK1aswJo1a6hqc3C0wSXoODg6OWxWVLGJmpoacnNzwefz0bt3b0RHR8PBwQEVFRWwtLREc3MzNe0RI0YgJCQEjo6O1DQ+RteuXZGXl8eKt6COjg6OHj0q4od06dIluLu74/Hjx1T1P4VE9JMnT6CgoMBUAHDQQVNTE7t372baLts4duwYli5dKvZ6x/G/8/DhQ2RlZYl9j304xbg9MTIywtq1azFr1iyh9SNHjmDt2rVUfaIuXLiAUaNGMTf0kiQzMxPDhg2TuC4HO7BZUcUmqqqquHnzJiuWEAYGBsjJyYGmpqbEtduoq6tDeno60tPTmYSdtrY2amtrqWlqaGjg3LlzItYIV65cwYQJE/Ds2TNq2p8CL168QElJCRQUFGBkZMRUDnNwSAKuxZWDo5Pj5+cHFxcXViqq2MTCwoI5HRw8eDC2bt0KWVlZ7N+/n/omcOnSpQgKCkJdXR0sLS0hIyMj9DxNA/Xp06cjPT2dlQTd5MmTsWzZMiQmJjL65eXlCAoKop4g+1QS0eIqXDjan5aWFrEeOf3798e7d+9YiEhysJUkO3jwIHx8fCArKwsNDQ2h1n0ej0dVm822pDFjxlD9+/+JAQMGoLm5GYqKigCAe/fuITExEWZmZlTiUlNT+7ctGZ4+fdru+p0dZWVlREREYNGiRRKvqGKTqVOn4tSpU1i+fLnEtcW17jc2NkJVVVViMaioqEBNTQ1qampQVVVFly5doKOjQ1Vz9OjRWLVqFU6fPs209zY2NuLrr7/G6NGjqWp/ColoZWVliVZDc3C8D1dBx8HRyWGzoopNkpOTGR+yyspKTJw4EXfu3IGGhgbi4uKoTr0TV2lB00D9fZqbm+Hi4gItLS2xyUGaN9F//vknHB0dcePGDXz22WcAgAcPHmDYsGE4efIk1Q0vm629AJCQkID4+HixE4M5M/X2Z+nSpZCRkcH27duF1r/66iu8fPkS33//PUuR0eVfJcloVpLp6urCx8cHq1atkng1GdttSWy9v8eMGQNnZ2f4+PigsbERffr0gYyMDOrr67F9+3YsXry4XfUOHz7MfN3Q0ICwsDCMHTtWqI08OTkZoaGhCAwMbFdtjv+DjYoqNtm4cSO2bduGkSNHon///lBSUhJ6nua+ZcuWLdDX14ebmxsAwMXFBSdOnECPHj3w66+/wtrampr2ihUrkJGRgYKCAlhYWMDOzg7Dhw+HnZ0d9QThw4cPYWdnh4aGBqatND8/H927d8fFixehq6tLTZvN1l6gdUDGL7/8IvZ6fvLkSer6HBxcgo6Do5Pj7e0NW1tbzJs3j+1QWOfp06f/UYXAf8u9e/f+8Xk9PT1q2tHR0fDx8YGCgoLEb+CB1hvpixcvoqCgAAoKCrCysoKdnR1VTYDdRPSuXbvwzTffwMvLCwcOHMDcuXNRUVGBnJwc+Pr6YuPGjRKPqaOzdOlSxMTEQFdXF0OGDAHQOj35/v378PT0FEpMf5jE+/8ZNpNkGhoayM7OZvWwh422JDbf35qamsjIyIC5uTmio
6Oxe/du5OXl4cSJE1izZg1KSkqoaU+bNg0ODg4i06n37NmDS5cu4dSpU9S0OztNTU24fPkyk6S7efMmzMzMqHrnssmH3pLvQ3vfwufzERsbi6FDh+LixYtwdXVFXFwck5C/cOECNW0pKSloaWkhMDAQTk5OMDU1paYljqamJvz8889C+7WZM2eKHOzSgo1E9PHjx+Hp6YkxY8bg4sWLGDNmDMrKylBXV4epU6fi4MGD1LQ5ONrgEnQcHJ0cNiuqPgXKy8tRUVEBOzs7KCgoMFVsHRUdHR34+/tj5cqVrHgmsQWbieg+ffpg7dq1mDlzJlRUVFBQUAA+n481a9bg6dOn2LNnj8Rj6ug4ODj8Wz/H4/GQmppKORrJwWaSLCQkBOrq6li5cqXEtdmEzfe3oqIi7ty5g88//xyurq4wNzfH2rVrcf/+fZiYmFD1UlVWVkZ+fr7IsKOysjL069cPL168oKbdWWGzoqqzoqCggNLSUujq6iIgIACvXr1CVFQUSktLMXjwYKpebAUFBcjIyEB6ejoyMzMhLS3NVJLZ29tLPGEnadhIRFtZWWHRokXw9fVlrucGBgZYtGgRevTogfXr11PT5uBog0vQcXB0ctiuqGKLhoYGuLq6Ii0tDTweD2VlZeDz+Zg3bx5UVVURERFBPYbbt2+LLaGn6cemrq6OnJycTtfSzGYiWlFRESUlJdDT04O2tjYuXrwIa2trlJWVYciQIWhoaKCmzdG5YDNJ1tLSgokTJ+Lly5di32MdqVLxfdh8f1tZWWH+/PmYOnUqLCwskJSUhC+++AK5ubmYMGEC6urqqGnr6enBz88PwcHBQuvh4eHYs2fPv6wU5/jPYbui6u7Y2YYAACTGSURBVFOg7bZVUgepPXv2REJCAoYOHQoTExOEhYXBxcUFd+/excCBA/H8+XOJxAG0Jux27NiB2NhYCASCDjvEjc1EtJKSEoqLi6Gvrw9NTU2kpaXB0tISJSUlGDFiRIdtI+f4tOCGRHBwdHJWr16Nb7/9ttNVVAUGBkJGRgY1NTVCm1w3NzcEBgZSTdBVVlZi6tSpKCoqYrzngP/bcNLcdHl5eSEuLg5ff/01NY1PkaNHjyI5ORkKCgpIT0+XqIG9jo4OGhoaoKenBz09PVy7dg3W1taoqqoCd0bG0Z5s2rQJEydORFJSksSTZN999x2Sk5NhYmICACLvsY4Km+/vNWvWwN3dHYGBgRg5ciTjBXfhwgXGN4oW69evx7x585Cens7oXrt2DUlJSYiOjqaq3VnJy8tjKqoiIiI6VUVVTEwMwsPDUVZWBgAwNjZGcHAwZs+eTVXX2dkZ7u7uMDIyQkNDA8aNGwcAYqtHaZCXl8dUkGVmZuL58+fo27fvv10l/v8j4eHh0NLSwtq1ayWeiFZXV8dff/0FAOjVqxdu3boFS0tLNDY2Uq1I5uB4Hy5Bx8HRyXnz5g3c3Nw6VXIOaL2BSU5OZoYVtGFkZET95D8gIAAGBga4dOkS+Hw+srOz0dDQgKCgIGzbto2qdktLC7Zu3Yrk5GRYWVl1mioXNhPRI0aMwNmzZ2FjY4N58+YhMDAQCQkJuHHjBpydnSUaC0fHhs0k2fbt2/HTTz9hzpw5VHU+Ndh8f0+fPh1ffvklamtrhczqR44cialTp1LVnjNnDkxNTbFr1y6cPHkShBCYmZkhKysLgwcPpqrdWbG2toa1tTVzoNRWUeXv79+hK6q2b9+O0NBQ+Pn5wdbWFoQQZGVlwcfHB/X19VQHkkRGRsLAwAA1NTXYunUrlJWVAQC1tbVYsmQJNV2gdWryixcvYG1tDXt7eyxYsAB2dnbo2rUrVV22YTMRPWzYMFy8eBGWlpZwdXVFQEAAUlNTcfHiRYwcOZKaLgfH+3AtrhwcnZzAwEBoaWl1uooqFRUV3Lx5E0ZGRkK+QTk5OXB0dKTalqSpqYnU1FRYWVmhW7duyM7OhomJCVJTUxEUFETVX+OfTl07mh/X+7DZ2isQCCAQCNClS+uZWHx8PC5fvgxDQ0Nm4iYHR3ugpqaGyMhIVpJkOjo6yMzMhJGRkcS12YR7f3NIkn+qqAoPD2c7PCoYGBhg/fr18PT0FFo/fPgw1q1bh6qqKiq6b9++xcKFCxEaGgo+n09F4584d+5cp0jI/Ssk2dr79OlTvHr1Cj179oRAIMC2bduY63loaCjU1NSoaXNwtMEl6Dg4Ojn+/v6IiYmBtbV1p6qomjBhAmxsbLBhwwaoqKigsLAQenp6mDFjBgQCARISEqhpq6mpITc3F3w+H71790Z0dDQcHBxQUVEBS0vLDl1GLxAIUF5ejsePH0MgEAg9R3OaK1uJ6Hfv3mHjxo3w9vaGrq6uRLU5Oh9sJsk2bdqE2tpa7Nq1S+LaHBydgQ8rquzt7TtFAkdeXh63bt0SO5DE0tISr169oqatqqqKmzdvspKgY5O2A2sNDQ2h9cbGRtjY2FD3p2YjEf3u3Tv8/PPPGDt2LHR0dKhocHD8O3AtrhwcnZyioiLGq+bWrVtCz3Vk36Dw8HDY29vjxo0bePPmDUJCQlBcXIynT58iKyuLqraFhQUKCwvB5/MxePBgbN26FbKysti/f3+H3gReu3YN7u7uuHfvnog3E4/Ho3oqylZrb5cuXRAeHg4vLy8qfz8Hx/sEBARg9+7drCTJsrOzkZqainPnzsHc3FzkPXby5EmJxyQJDh48CGVlZbi4uAit//LLL2hubube+xztxpEjRzpFQu5DDA0NER8fL3LAFhcXR/0wYurUqTh16hSWL19OVedTo7q6Wuye7PXr13j48CFVbbZae7t06YLFixejpKSEqg4Hx7+CS9BxcHRy0tLS2A6BFczMzFBYWIi9e/dCWloaTU1NcHZ2hq+vL3r06EFVe/Xq1WhqagIAhIWFYeLEiRg2bBg0NDQQFxdHVZtNfHx8MGDAAJw/fx49evSQaAKYzUT0qFGjkJ6e3um8uTgkD5tJMlVV1U7pqbh582bs27dPZF1bWxsLFy7kEnQc7cbEiRPZDoEV1q9fDzc3N/z++++wtbUFj8fD5cuXkZKSgvj4eKrahoaG2LBhA65cuYL+/ftDSUlJ6HmaA6bY4MyZM8zXycnJ6NatG/N9S0sLUlJSoK+vTzUGNhPRgwcPRl5eHvT09CSuzcHRBtfiysHBwfEJ8PTpU6ipqXXoqkUlJSUUFBRIZPLZp0RUVBTWrVsHDw8PsRv8yZMnsxQZR0dj7ty5//j8wYMHJRRJ50FeXh537twRuWmtrq6GqakpXr58yU5gHBwdiNzcXERGRqKkpIQZSBIUFER9WrGBgcFHn+PxeNRbPSVN2xAtHo8n0ukgIyMDfX19REREdNhk8S+//IKVK1ciMDBQ7H7NysqKpcg4OhNcgo6Dg6PT8urVKxQWFor1Q5NE0qS8vBwVFRWws7ODgoICCCEdOkE3YsQIhISEwNHRke1QJMo/TY2l3drLwSFpnjx5grt374LH48HY2BhaWlpsh0SVzz//HHv27BH5zDh9+jR8fX3x4MEDliLj4ODg+O8wMDBATk4ONDU12Q5F
oojbr7UlK7n9Goek4FpcOTg4OiVJSUnw9PREfX29yHO0P4QbGhrg6uqKtLQ08Hg8lJWVgc/nY/78+VBVVUVERAQ1bTZZunQpgoKCUFdXB0tLS5H2u456Mvlh8peDgxYvX74EIQSKiooAgHv37iExMRFmZmYYM2YMVe2mpiYsXboUMTExzGteWloanp6e2L17NxNTR2PGjBnw9/eHiooKM+gmIyMDAQEBmDFjBsvRtS//SQtzR/Uc5JA8Hh4ezFAMNqdEt9W0dOSD1DbETcZtbGyEqqqq5IORILQmAnNw/Cd8/Fifg4ODowPj5+cHFxcX1NbWQiAQCD1on5AFBgZCRkYGNTU1Qjetbm5uSEpKoqrNJtOmTUNJSQm8vb0xcOBA9O3bF/369WP+7Eioq6szyV9vb2/89ddfLEfE0RlwcnJCTEwMgNabqUGDBiEiIgJOTk7Yu3cvVe3ly5cjIyMDZ8+eRWNjIxobG3H69GlkZGQgKCiIqjabhIWFYfDgwRg5ciQUFBSgoKCAMWPGYMSIEfjuu+/YDq9d6dat27/94OBoL5SVlREREQETExP07NkTM2fOxL59+3Dnzh2J6MfExMDS0pJ5f1tZWeHIkSMS0WaLLVu2CHkiu7i4QF1dHb169UJBQQGLkbU/NjY2ePbsGQDg8OHD0NLSgp6entgHB4ck4FpcOTg4OiVdu3ZFXl4eevfuLXFtHR0dJCcnw9raGioqKigoKACfz0dVVRUsLS3x4sULicckCe7du/ePz3ekzY+ysjIzqVdaWhp1dXUdvtWPg300NTWRkZEBc3NzREdHY/fu3cjLy8OJEyewZs0aqtPpNDU1kZCQAHt7e6H1tLQ0uLq64smTJ9S0PwVKS0tRUFAABQUFWFpadqjrGQfHp0BdXR3S09ORnp6OjIwMlJaWQltbG7W1tdQ0t2/fjtDQUPj5+cHW1haEEGRlZeH7779HWFgYAgMDqWmzCZ/PR2xsLIYOHYqLFy/C1dUVcXFxiI+PR01NDS5cuMB2iO2GgoICysrK8Nlnn0FaWhq1tbXQ1tZmOyyOTgzX4srBwdEpmT59OtLT01lJ0DU1NYlt96qvr4ecnJzE45EUnemG9YsvvsCUKVPQv39/EELg7+8PBQUFsT/7008/STg6jo5Kc3MzVFRUAAAXLlyAs7MzpKSkMGTIkH+ZIG8P7e7du4usa2tro7m5mar2p4CxsTGMjY3ZDoODo8OioqICNTU1qKmpQVVVFV26dIGOjg5Vzd27d2Pv3r3w9PRk1pycnGBubo5169Z12ARdbW0tdHV1AQDnzp2Dq6srxowZA319fQwePJjl6NqXvn37Yu7cufjyyy9BCMG2bdugrKws9mfXrFkj4eg4OiNcgo6Dg6NTsmfPHri4uCAzM1OsH5q/vz81bTs7O8TExGDDhg0AWv1MBAIBwsPD4eDgQE33U+H27duoqanBmzdvhNY70jTT2NhYREZGoqKiAjweD3/++SdevXrFdlgcHRxDQ0OcOnUKU6dORXJyMnPz+PjxY3Tt2pWq9hdffIG1a9ciJiYG8vLyAFo98davX48vvviCqjabtLS04NChQ0hJSRE7cCg1NZWlyOiTkJDAVNR8eD2/efMmS1FxdDRWrFiBjIwMFBQUwMLCAnZ2dli1ahXs7Oyoe6LV1tZi6NChIutDhw6lWrnHNmpqarh//z50dXWRlJSEsLAwAK0+fB1tUMKhQ4ewdu1anDt3DjweD7/99hu6dBFNkfB4PC5BxyERuBZXDg6OTkl0dDR8fHygoKAADQ0NIdNfHo+HyspKatq3b9+Gvb09+vfvj9TUVEyePBnFxcV4+vQpsrKyWKnqkwSVlZWYOnUqioqKmKlYwP8ZLne0TV8bBgYGuHHjBjQ0NNgOhaODk5CQAHd3d7S0tGDkyJFMG9KmTZvw+++/47fffqOmfevWLTg6OuLVq1ewtrYGj8dDfn4+5OXlkZycDHNzc2rabOLn54dDhw5hwoQJ6NGjh4iBfGRkJEuR0WXXrl345ptv4OXlhQMHDmDu3LmoqKhATk4OfH19sXHjRrZD5OggSElJQUtLC4GBgXBycoKpqanEtC0sLODu7o6vv/5aaD0sLAxxcXEoKiqSWCySxM/PD+fOnYORkRHy8vJQXV0NZWVlxMXFYcuWLR02AS8lJYW6ujquxZWDVbgEHQcHR6dER0cH/v7+WLlypdix6rSpq6vD3r17kZubC4FAABsbG/j6+qJHjx4Sj0VSTJo0CdLS0jhw4AD4fD6ys7PR0NCAoKAgbNu2DcOGDWM7RA6O/++pq6tDbW0trK2tmWtbdnY2unbtij59+lDVfvnyJWJjY3Hnzh0QQmBmZgYPD4+Ptnd3BDQ1NRETE4Px48ezHYpE6dOnD9auXYuZM2cKeamuWbMGT58+xZ49e9gOkaODUFBQgIyMDKSnpyMzMxPS0tIYPnw4M9mVZsLuxIkTcHNzw6hRo2Brawsej4fLly8jJSUF8fHxmDp1KjVtNnn79i127dqFmpoazJkzhxnktWPHDigrK2P+/PksR8jB0XHhEnQcHBydEnV1deTk5HTYarVPEU1NTaSmpsLKygrdunVDdnY2TExMkJqaiqCgIOTl5bEdIgcHB8d/RM+ePZGent7p/OcUFRVRUlICPT09aGtr4+LFi7C2tkZZWRmGDBmChoYGtkPk6KAUFBRgx44diI2NhUAgoF59n5ubi8jISJSUlDAHD0FBQR1u+nwbb9++xcKFCxEaGgo+n892OBwcnQ7Og46Dg6NT4uXlhbi4OJG2BUnx6tUrFBYWivUs6khebO/T0tLCGO9qamri0aNHMDExgZ6eHu7evctydBwcHP8LmzZtQvfu3eHt7S20/tNPP+HJkydYsWIFS5HRJSgoCDt37sSePXtE2ls7Mjo6OmhoaICenh709PRw7do1WFtbo6qqCtzZP0d7k5eXx0xwzczMxPPnz9G3b1+J+Pb2798fsbGx1HU+FWRkZJCYmIjQ0FC2Q+Hg6JRwCToODo5OSUtLC7Zu3Yrk5GRYWVmJDInYvn07Ne2kpCR4enqivr5e5Dkej9dhvdgsLCxQWFgIPp+PwYMHY+vWrZCVlcX+/fu5U1oOjv/PiYqKwtGjR0XWzc3NMWPGjA6boLt8+TLS0tLw22+/wdzcXOSz5OTJkyxFRpcRI0bg7NmzsLGxwbx58xAYGIiEhATcuHEDzs7ObIfH0YFQU1PDixcvYG1tDXt7eyxYsAB2dnbUB98AgIeHB9NKa2RkRF3vU2Hq1Kk4deoUli9fznYoHBydDq7FlYODo1PyT6euPB6P6uQ9Q0NDjB07FmvWrEH37t2p6XxqJCcno6mpCc7OzqisrMTEiRNx584daGhoIC4uDiNGjGA7RA4Ojv8SeXl5lJSUwMDAQGi9srISZmZmHXaK8dy5c//x+YMHD0ooEskiEAggEAiYaYfx8fG4fPkyDA0N4ePjA1lZWZYj5OgonDt3TmIJuQ9ZtGgRMjIyUFpaCh0dHQwfPpzxv6Pt6ckmGzduxLZt2zBy5Ej0798fSkpKQs/7+/u
zFBkHR8eHS9BxcHBwSJiuXbsiLy+P878D8PTpU6ipqXX41jCBQIDy8nKxLc12dnYsRcXB0X4YGRlh7dq1mDVrltD6kSNHsHbtWqqTsTkkT01NDXR1dUWu3YQQ3L9/H59//jlLkXFwtD91dXVMi21bwk5bWxu1tbVsh0aFDw9a3ofH43XY6zmfz0dOTg40NDSE1hsbG2FjY9Nh/90cnxZciysHBweHhJk+fTrS09M7bYKuvLwcFRUVsLOzg7q6eof3K7p27Rrc3d1x7949kX9rR25p5uhczJ8/H8uWLcPbt2+ZatiUlBSEhIQgKCiI5ejo8+TJE9y9exc8Hg/GxsbQ0tJiOySqGBgYoLa2Ftra2kLrT58+hYGBAXdd4+hQqKioQE1NDWpqalBVVUWXLl2go6PDdljUqKqqYjsEVqiurhZ77Xr9+jUePnzIQkQcnREuQcfBwcEhYfbs2QMXFxdkZmbC0tJSxLOoo7YONDQ0wNXVFWlpaeDxeCgrKwOfz8f8+fOhqqqKiIgItkOkgo+PDwYMGIDz58+jR48eHb5akKNzEhISgqdPn2LJkiV48+YNgNa21xUrVmDVqlUsR0ePpqYmLF26FDExMUx1rLS0NDw9PbF7924oKiqyHCEdCCFir2UvXryAvLw8CxFxcLQ/K1asQEZGBgoKCmBhYQE7OzusWrUKdnZ2UFVVZTs8idB2sNiR9y5nzpxhvk5OTka3bt2Y71taWpCSkgJ9fX0WIuPojHAtrhwcHBwSJjo6Gj4+PlBQUICGhobQpqcjtw54enri8ePHiI6OhqmpKQoKCsDn83HhwgUEBgaiuLiY7RCpoKSkhIKCAhgaGrIdCgcHdV68eIGSkhIoKCjAyMgIcnJybIdElUWLFuHSpUvYs2cPbG1tAbQOjvD398fo0aOxd+9eliNsX9pM43fu3IkFCxYIJSBbWlpw/fp1SEtLIysri60QOTjaDSkpKWhpaSEwMBBOTk4wNTVlOySJERMTg/DwcJSVlQEAjI2NERwcjNmzZ7McWfsjJSUFoHUP/mFqREZGBvr6+oiIiMDEiRPZCI+jk8FV0HFwcHBImNWrV+Pbb7/FypUrmU1BZ+DChQtITk7GZ599JrRuZGSEe/fusRQVfQYPHozy8nIuQcfRKVBWVsbAgQPZDkNinDhxAgkJCbC3t2fWxo8fDwUFBbi6una4BF1eXh6A1qqaoqIioWEQsrKysLa2xldffcVWeBwc7UpeXh4yMjKQnp6OiIgISEtLM0Mi7O3tO2zCbvv27QgNDYWfnx9sbW1BCEFWVhZ8fHxQX1+PwMBAtkNsV9qqnw0MDJCTkwNNTU2WI+LozHAJOg4ODg4J8+bNG7i5uXWq5BzQ2gomrt2rvr6+Q1fZLF26FEFBQairqxPb0mxlZcVSZBwcHP8rzc3NYqdxa2tro7m5mYWI6JKWlgagdXrtzp07WZmsycEhKaytrWFtbc1YjxQUFGDHjh3w9/eHQCDosF6Lu3fvxt69e+Hp6cmsOTk5wdzcHOvWretwCbo2xHnvNTY2dpp2Zo5PA67FlYODg0PCBAYGQktLC19//TXboUiUCRMmwMbGBhs2bICKigoKCwuhp6eHGTNmQCAQICEhge0QqSAuEdvWRsENieDg+P+bkSNHQkNDAzExMYz32suXL+Hl5YWnT5/i0qVLLEdIhz///BMtLS1QV1cXWn/69Cm6dOnCJe44Ogx5eXnMBNfMzEw8f/4cffv2hYODA8LDw9kOjwry8vK4deuWSOV/WVkZLC0t8erVK5Yio8uWLVugr68PNzc3AICLiwtOnDiBHj164Ndff4W1tTXLEXJ0BrgKOg4ODg4J09LSgq1btyI5ORlWVlYiFVXbt29nKTK6hIeHw97eHjdu3MCbN28QEhKC4uJiPH36tEP7FXXWaWgcHJ2BnTt3wtHREZ999hmsra3B4/GQn58PeXl5JCcnsx0eNWbMmIFJkyZhyZIlQuvx8fE4c+YMfv31V5Yi4+BoP9TU1PDixQtYW1vD3t4eCxYsgJ2dXYdPQBsaGiI+Pl7kIDkuLg5GRkYsRUWfqKgoxMbGAgAuXryIS5cuISkpCfHx8QgODsaFCxdYjpCjM8BV0HFwcHBIGAcHh48+x+PxkJqaKsFoJEtdXR327t2L3NxcCAQC2NjYwNfXFz169GA7NA4ODo7/ipcvXyI2NhZ37twBIQRmZmbw8PCAgoIC26FRQ11dHVlZWSIeXHfu3IGtrS0aGhpYioyDo/04d+5cp0jIfciJEyfg5uaGUaNGwdbWFjweD5cvX0ZKSgri4+MxdepUtkOkgoKCAkpLS6Grq4uAgAC8evUKUVFRKC0txeDBg/Hs2TO2Q+ToBHAJOg4ODg4ODglw+/Zt1NTU4M2bN0LrkydPZikiDg4Ojv8OJSUlXLt2DZaWlkLrRUVFGDx4cIf03+Pg6Ezk5uYiMjISJSUlzMFDUFAQ+vXrx3Zo1OjZsycSEhIwdOhQmJiYICwsDC4uLrh79y4GDhyI58+fsx0iRyeAa3Hl4ODg4JAYr169QmFhIR4/fsxMzWqjoyaqKisrMXXqVBQVFTHec0BrtSQAzoOOg+P/YzZt2oTu3bvD29tbaP2nn37CkydPsGLFCpYio8vAgQOxf/9+7N69W2h937596N+/P0tRcXBwtBf9+/dn2j07C87OznB3d4eRkREaGhowbtw4AEB+fr6IHx8HBy24BB0HBwcHh0RISkqCp6cn6uvrRZ7ryMMSAgICYGBggEuXLoHP5yM7OxsNDQ0ICgrCtm3b2A6Pg4PjfyAqKgpHjx4VWTc3N8eMGTM6bIJu48aNGDVqFAoKCjBy5EgAQEpKCnJycjifJg6O/8/x8PCAvb097O3tO7Tn3IdERkbCwMAANTU12Lp1K5SVlQEAtbW1In6bHBy04FpcOTg4ODgkgqGhIcaOHYs1a9age/fubIcjMTQ1NZGamgorKyt069YN2dnZMDExQWpqKoKCgpCXl8d2iBwcHP8l8vLyKCkpgYGBgdB6ZWUlzMzMOuy0Q6C1qiQ8PBz5+flQUFCAlZUVVq1a1alu6Dk4OiKLFi1CRkYGSktLoaOjg+HDh2P48OGwt7dHnz592A6PCm/fvsXChQsRGhoKPp/PdjgcnRgptgPg4ODg4OgcPH78GMuXL+9UyTmgtYW17RRWU1MTjx49AgDo6enh7t27bIbGwcHxP6Krqyt2CnVWVhZ69uzJQkSSo2/fvvj5559RXFyMGzdu4Kf/1979hVR9/3Ecf52jZ+usjp2TnLARxY7aqDzaP+oiOOdYEUFjYmU1AiNaIFSnTFALNlYxCFf2j9iMGEyDoaw/UIQG1Tm4rqzkSJGsZivoL9lFmEXm8XcxfvJz1Y/VPOcz/T4f4M3n48ULL/T4+n4/n/ePP1LOAcNATU2N2tvbde/ePVVXV2v06NHav3+/pk6dOmyHejkcDp04ccJ0DIAjrgCA5Fi2bJkikYgyMzNNR0mqnJwctbW1yefzac6cOaqqqtIHH3ygw4cP85QWGOK+/PJLbd68WT09PZo3b56kP496lp
eXq6yszHC65Hj+/Ll6enoGrFlt6iUwHLlcLnk8Hnk8HrndbqWmpiojI8N0rIQpLCzUyZMntWXLFtNRYGEccQUAJEV3d7eKiork9Xrl9/vlcDgG7IfDYUPJEqupqUnPnj3TkiVL1NHRoc8++0zt7e1KT09XfX19/z/1AIaevr4+VVZW6sCBA/0TmkeMGKGKigp9/fXXhtMlTnd3t8rLy9XQ0KDOzs7X9ofrnaKAFVRUVCgajSoWiyknJ0eBQEDBYFCBQEBut9t0vIT59ttvtXv3bs2fP18zZ87UyJEjB+wP18+p+HehoAMAJMWRI0dUUlIip9Op9PT0/imm0p9DIjo6OgymS64nT57I4/EM+BkAGLq6urp0/fp1OZ1OZWdn68MPPzQdKaHWr1+vCxcuaMeOHSouLtahQ4d09+5d1dTUaNeuXVq1apXpiADek91ul9frVWlpqQoKCjR58mTTkZLir3eJ/i+rfU6FORR0AICkyMjIUDgcVmVlpex2612BevPmTf3+++8KBAJyOp3q6+ujoAMwJE2YMEG1tbUKhUJKS0vTlStXlJWVpbq6Ov388886c+aM6YgA3lMsFlM0GlUkElFzc7NSUlL6h0SEQiHLFHaACRR0AICkGDNmjFpaWix3B11nZ6eWL1+uCxcuyGaz6caNG/L5fFq7dq3cbrf27NljOiIAvJNRo0bp2rVrmjhxosaPH6/jx49r9uzZunXrlvx+v7q6ukxHBDBIYrGY9u3bp6NHjyoej1viCPt/KxIepCLZrPcKAwDAiNWrV6u+vt50jKQrLS2Vw+HQnTt39NFHH/Wvr1ixQo2NjQaTAcD78fl8+uOPPyRJU6ZMUUNDgyTp1KlTw/qOKsAqWltbtXfvXhUUFCg/P191dXXKy8sb9gMUamtr5ff75XQ65XQ6lZubq7q6OtOxYCFMcQUAJEVvb6+qqqrU1NSk3Nzc14ZEVFdXG0qWWGfPnlVTU5PGjx8/YD07O1u3b982lAoA3t+aNWsUi8UUDAa1detWLV68WAcPHtSrV6+G7e9ywCo8Ho+6urqUl5enUCikdevWKRAIDPvpzNXV1frqq6+0YcMGzZ07V319fbp48aJKSkr0+PFjlZaWmo4IC+CIKwAgKfLz89+6Z7PZdP78+SSmSR6Xy6UrV64oOztbLpdLsVhMPp9PLS0tWrRo0RsnIALAUHLnzh1dunRJmZmZysvLMx0HwD9w+vRpSxRyf/XJJ59o+/btKi4uHrD+008/6ZtvvtGtW7cMJYOVUNABAJBAixcv1owZM7Rz5065XC61tbVp4sSJWrlypeLxuH755RfTEQHgb+vp6dHChQtVU1OjSZMmmY4DAINixIgRunr1qrKysgas37hxQ36/Xy9evDCUDFbCEVcAABLou+++UygU0qVLl/Ty5UuVl5fr2rVrevLkiS5evGg6HgC8E4fDoatXr3J5OoBhJSsrSw0NDdq2bduA9fr6emVnZxtKBavhDToAABLswYMH+v7773X58mXF43HNmDFD69ev17hx40xHA4B3VlZWJofDoV27dpmOAgCD4tixY1qxYoUWLFiguXPnymaz6ddff9W5c+fU0NCgwsJC0xFhARR0AAAAAP62jRs3qra2VllZWZo1a5ZGjhw5YJ9BEQCGosuXL2vv3r26fv26+vr6NGXKFJWVlWn69Ommo8EiKOgAAEiwFy9eqK2tTY8ePVI8Hh+w9/nnnxtKBQDvx6pDfwAASCQKOgAAEqixsVHFxcV6/Pjxa3s2m029vb0GUgHAu2lra1NOTo7sdrvpKAAw6FatWqVQKKRQKMSdczCGv7AAACTQhg0bVFRUpPv37ysejw/4opwDMFRMnz69/0GDz+dTZ2en4UQAMHhGjRqlPXv26NNPP9XHH3+sL774Qj/88IPa29tNR4OF8AYdAAAJlJaWptbWVmVmZpqOAgDvLT09XWfOnNGcOXNkt9v18OFDeb1e07EAYFA9ePBAkUhEkUhE0WhUv/32m8aOHav79++bjgYLSDUdAACA4WzZsmWKRCIUdACGtKVLlyoYDGrcuHGy2WyaNWuWUlJS3vi9HR0dSU4HAIPD5XLJ4/HI4/HI7XYrNTVVGRkZpmPBIniDDgCABOru7lZRUZG8Xq/8fr8cDseA/XA4bCgZALybxsZG3bx5U+FwWDt27JDL5Xrj923atCnJyQDgn6moqFA0GlUsFlNOTo4CgYCCwaACgYDcbrfpeLAICjoAABLoyJEjKikpkdPpVHp6umw2W/+ezWbjTRMAQ86aNWt04MCBtxZ0ADDU2O12eb1elZaWqqCgQJMnTzYdCRZEQQcAQAJlZGQoHA6rsrKS6YcAAAD/QrFYTNFoVJFIRM3NzUpJSVEwGOyf7Ephh2SgoAMAIIHGjBmjlpYW7qADAAAYImKxmPbt26ejR48qHo+rt7fXdCRYAEMiAABIoNWrV6u+vl7btm0zHQUAAABv0dra2j/Btbm5WU+fPtW0adOUn59vOhosgoIOAIAE6u3tVVVVlZqampSbm/vakIjq6mpDyQAAACBJHo9HXV1dysvLUygU0rp16xQIBJSWlmY6GiyEI64AACTQ/3vqarPZdP78+SSmAQAAwF+dPn2aQg7GUdABAAAAAAAABjFODgAAAAAAADCIgg4AAAAAAAAwiIIOAAAAAAAAMIiCDgAAAAAAADCIgg4AAAAAAAAwiIIOAAAAAAAAMIiCDgAAAAAAADCIgg4AAAAAAAAwiIIOAAAAAAAAMOg/y0S3Y0TUd1sAAAAASUVORK5CYII=", - "text/plain": [ - "
    " - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], - "source": [ - "import matplotlib.pyplot as plt\n", - "import numpy as np\n", - "from sklearn.model_selection import train_test_split \n", - "from sklearn.datasets import load_breast_cancer\n", - "from sklearn.linear_model import LogisticRegression\n", - "cancer = load_breast_cancer()\n", - "import pandas as pd\n", - "# Making a data frame\n", - "cancerpd = pd.DataFrame(cancer.data, columns=cancer.feature_names)\n", - "\n", - "fig, axes = plt.subplots(15,2,figsize=(10,20))\n", - "malignant = cancer.data[cancer.target == 0]\n", - "benign = cancer.data[cancer.target == 1]\n", - "ax = axes.ravel()\n", - "\n", - "for i in range(30):\n", - " _, bins = np.histogram(cancer.data[:,i], bins =50)\n", - " ax[i].hist(malignant[:,i], bins = bins, alpha = 0.5)\n", - " ax[i].hist(benign[:,i], bins = bins, alpha = 0.5)\n", - " ax[i].set_title(cancer.feature_names[i])\n", - " ax[i].set_yticks(())\n", - "ax[0].set_xlabel(\"Feature magnitude\")\n", - "ax[0].set_ylabel(\"Frequency\")\n", - "ax[0].legend([\"Malignant\", \"Benign\"], loc =\"best\")\n", - "fig.tight_layout()\n", - "plt.show()\n", - "\n", - "import seaborn as sns\n", - "correlation_matrix = cancerpd.corr().round(1)\n", - "# use the heatmap function from seaborn to plot the correlation matrix\n", - "# annot = True to print the values inside the square\n", - "plt.figure(figsize=(15,8))\n", - "sns.heatmap(data=correlation_matrix, annot=True)\n", - "plt.show()" - ] - }, - { - "cell_type": "markdown", - "id": "e9552a3c", - "metadata": {}, - "source": [ - "## Discussing the correlation data\n", - "\n", - "In the above example we note two things. In the first plot we display\n", - "the overlap of benign and malignant tumors as functions of the various\n", - "features in the Wisconsing breast cancer data set. We see that for\n", - "some of the features we can distinguish clearly the benign and\n", - "malignant cases while for other features we cannot. This can point to\n", - "us which features may be of greater interest when we wish to classify\n", - "a benign or not benign tumour.\n", - "\n", - "In the second figure we have computed the so-called correlation\n", - "matrix, which in our case with thirty features becomes a $30\\times 30$\n", - "matrix.\n", - "\n", - "We constructed this matrix using **pandas** via the statements" - ] - }, - { - "cell_type": "code", - "execution_count": 5, - "id": "623ddee7", - "metadata": {}, - "outputs": [], - "source": [ - "cancerpd = pd.DataFrame(cancer.data, columns=cancer.feature_names)" - ] - }, - { - "cell_type": "markdown", - "id": "7a61e306", - "metadata": {}, - "source": [ - "and then" - ] - }, - { - "cell_type": "code", - "execution_count": 6, - "id": "859552c6", - "metadata": {}, - "outputs": [], - "source": [ - "correlation_matrix = cancerpd.corr().round(1)" - ] - }, - { - "cell_type": "markdown", - "id": "43d915d7", - "metadata": {}, - "source": [ - "Diagonalizing this matrix we can in turn say something about which\n", - "features are of relevance and which are not. This leads us to\n", - "the classical Principal Component Analysis (PCA) theorem with\n", - "applications. This will be discussed later this semester ([week 43](https://compphysics.github.io/MachineLearning/doc/pub/week43/html/week43-bs.html))." 
- ] - }, - { - "cell_type": "markdown", - "id": "5c8e892e", - "metadata": {}, - "source": [ - "## Other measures in classification studies: Cancer Data again" - ] - }, - { - "cell_type": "code", - "execution_count": 7, - "id": "08b680f2", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "(426, 30)\n", - "(143, 30)\n", - "[1. 0.86666667 1. 0.85714286 1. 0.85714286\n", - " 1. 0.92857143 0.92857143 1. ]\n", - "Test set accuracy with Logistic Regression: 0.94\n" - ] - }, - { - "name": "stderr", - "output_type": "stream", - "text": [ - "/Users/mhjensen/miniforge3/envs/myenv/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:460: ConvergenceWarning: lbfgs failed to converge (status=1):\n", - "STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.\n", - "\n", - "Increase the number of iterations (max_iter) or scale the data as shown in:\n", - " https://scikit-learn.org/stable/modules/preprocessing.html\n", - "Please also refer to the documentation for alternative solver options:\n", - " https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression\n", - " n_iter_i = _check_optimize_result(\n", - "/Users/mhjensen/miniforge3/envs/myenv/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:460: ConvergenceWarning: lbfgs failed to converge (status=1):\n", - "STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.\n", - "\n", - "Increase the number of iterations (max_iter) or scale the data as shown in:\n", - " https://scikit-learn.org/stable/modules/preprocessing.html\n", - "Please also refer to the documentation for alternative solver options:\n", - " https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression\n", - " n_iter_i = _check_optimize_result(\n", - "/Users/mhjensen/miniforge3/envs/myenv/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:460: ConvergenceWarning: lbfgs failed to converge (status=1):\n", - "STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.\n", - "\n", - "Increase the number of iterations (max_iter) or scale the data as shown in:\n", - " https://scikit-learn.org/stable/modules/preprocessing.html\n", - "Please also refer to the documentation for alternative solver options:\n", - " https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression\n", - " n_iter_i = _check_optimize_result(\n", - "/Users/mhjensen/miniforge3/envs/myenv/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:460: ConvergenceWarning: lbfgs failed to converge (status=1):\n", - "STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.\n", - "\n", - "Increase the number of iterations (max_iter) or scale the data as shown in:\n", - " https://scikit-learn.org/stable/modules/preprocessing.html\n", - "Please also refer to the documentation for alternative solver options:\n", - " https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression\n", - " n_iter_i = _check_optimize_result(\n", - "/Users/mhjensen/miniforge3/envs/myenv/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:460: ConvergenceWarning: lbfgs failed to converge (status=1):\n", - "STOP: TOTAL NO. 
of ITERATIONS REACHED LIMIT.\n", - "\n", - "Increase the number of iterations (max_iter) or scale the data as shown in:\n", - " https://scikit-learn.org/stable/modules/preprocessing.html\n", - "Please also refer to the documentation for alternative solver options:\n", - " https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression\n", - " n_iter_i = _check_optimize_result(\n", - "/Users/mhjensen/miniforge3/envs/myenv/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:460: ConvergenceWarning: lbfgs failed to converge (status=1):\n", - "STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.\n", - "\n", - "Increase the number of iterations (max_iter) or scale the data as shown in:\n", - " https://scikit-learn.org/stable/modules/preprocessing.html\n", - "Please also refer to the documentation for alternative solver options:\n", - " https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression\n", - " n_iter_i = _check_optimize_result(\n", - "/Users/mhjensen/miniforge3/envs/myenv/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:460: ConvergenceWarning: lbfgs failed to converge (status=1):\n", - "STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.\n", - "\n", - "Increase the number of iterations (max_iter) or scale the data as shown in:\n", - " https://scikit-learn.org/stable/modules/preprocessing.html\n", - "Please also refer to the documentation for alternative solver options:\n", - " https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression\n", - " n_iter_i = _check_optimize_result(\n", - "/Users/mhjensen/miniforge3/envs/myenv/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:460: ConvergenceWarning: lbfgs failed to converge (status=1):\n", - "STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.\n", - "\n", - "Increase the number of iterations (max_iter) or scale the data as shown in:\n", - " https://scikit-learn.org/stable/modules/preprocessing.html\n", - "Please also refer to the documentation for alternative solver options:\n", - " https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression\n", - " n_iter_i = _check_optimize_result(\n", - "/Users/mhjensen/miniforge3/envs/myenv/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:460: ConvergenceWarning: lbfgs failed to converge (status=1):\n", - "STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.\n", - "\n", - "Increase the number of iterations (max_iter) or scale the data as shown in:\n", - " https://scikit-learn.org/stable/modules/preprocessing.html\n", - "Please also refer to the documentation for alternative solver options:\n", - " https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression\n", - " n_iter_i = _check_optimize_result(\n", - "/Users/mhjensen/miniforge3/envs/myenv/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:460: ConvergenceWarning: lbfgs failed to converge (status=1):\n", - "STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.\n", - "\n", - "Increase the number of iterations (max_iter) or scale the data as shown in:\n", - " https://scikit-learn.org/stable/modules/preprocessing.html\n", - "Please also refer to the documentation for alternative solver options:\n", - " https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression\n", - " n_iter_i = _check_optimize_result(\n", - "/Users/mhjensen/miniforge3/envs/myenv/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:460: ConvergenceWarning: lbfgs failed to converge (status=1):\n", - "STOP: TOTAL NO. 
of ITERATIONS REACHED LIMIT.\n", - "\n", - "Increase the number of iterations (max_iter) or scale the data as shown in:\n", - " https://scikit-learn.org/stable/modules/preprocessing.html\n", - "Please also refer to the documentation for alternative solver options:\n", - " https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression\n", - " n_iter_i = _check_optimize_result(\n" - ] - }, - { - "data": { - "image/png": "iVBORw0KGgoAAAANSUhEUgAAAfYAAAHFCAYAAAAABdu/AAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjguMCwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy81sbWrAAAACXBIWXMAAA9hAAAPYQGoP6dpAABCWElEQVR4nO3deVxU9f7H8few4wIGCC4houa+pJCJZrmkZWZat7TsuqJlVmSuqaVpGtmiZoW5pFipVyu1zRauS1kuqVGWeuumKC7gnigmCJzfH17m1wjawMxIM+f19HEeD/nO95zzOTDwmc/3fM85FsMwDAEAAI/gVdYBAAAA5yGxAwDgQUjsAAB4EBI7AAAehMQOAIAHIbEDAOBBSOwAAHgQEjsAAB6ExA4AgAchsXuI5ORkWSwWBQQEaP/+/UVeb9eunRo3blwGkTlH//79VbNmTZu2mjVrqn///lc1jn379slisSg5Odmu/nv37tVjjz2munXrKjAwUOXKlVOjRo309NNP69ChQy6PtWvXrgoJCZHFYtGwYcOcvo+y+BlI0vr162WxWK74s+jQoYMsFkuR9429lixZopkzZ5ZonZK+PwBX8CnrAOBcOTk5evrpp/XOO++UdSgut3LlSgUFBZV1GJf1ySef6P7771dYWJgee+wxNW/eXBaLRT/99JMWLFigTz/9VKmpqS7b/5NPPqktW7ZowYIFqlKliqpWrer0fZT1z6BixYp66623iny4SEtL0/r16x2KbcmSJfr5559L9IGoatWq2rRpk2rXrl3q/QKOIrF7mNtvv11LlizRyJEj1axZM5ft548//lBgYKDLtm+P5s2bl+n+ryQtLU3333+/6tatq3Xr1ik4ONj6WocOHZSQkKCVK1e6NIaff/5ZLVu2VI8ePVy2j7L+GfTq1Uvz58/Xf//7X1133XXW9gULFqh69epq0qSJdu3a5fI48vPzlZeXJ39/f7Vq1crl+wOuhKF4DzN69GiFhoZqzJgxf9n3/PnzGjt2rKKjo+Xn56fq1avr0Ucf1e+//27Tr2bNmrrzzju1YsUKNW/eXAEBAZo0aZJ1OHTJkiUaM2aMqlatqgoVKqhbt246cuSIzpw5o4ceekhhYWEKCwvTgAEDdPbsWZttv/HGG7r55psVHh6u8uXLq0mTJnrxxRd14cKFv4z/0mHgdu3aWYdnL13+PDSamZmphx9+WNdee638/PwUHR2tSZMmKS8vz2b7hw8fVs+ePVWxYkUFBwerV69eyszM/Mu4JGn69OnKzs5WUlKSTVIvZLFYdM8999i0LViwQM2aNVNAQIBCQkJ09913a/fu3TZ9+vfvrwoVKui3337THXfcoQoVKigyMlIjRoxQTk6OpP8fpv7tt9/02WefWb8H+/bts56y2bdvn812C9dZv369tS01NVV33nmnwsPD5e/vr2rVqqlr1646ePCgtU9xQ/Hp6en65z//aV2vQYMGeuWVV1RQUGDtUzhk/fLLL2v69OmKjo5WhQoVFBcXp82bN9v1PZakTp06KTIyUgsWLLC2FRQUaNGiRerXr5+8vIr+ibPnPdeuXTt9+umn2r9/v8376M+xv/jii5oyZYqio6Pl7++vdevWFRmKP3/+vJo3b646dero9OnT1u1nZmaqSpUqateunfLz8+0+XsAeVOwepmLFinr66af1xBNPaO3aterQoUOx/QzDUI8ePbRmzRqNHTtWbdu21Y4dOzRx4kRt2rRJmzZtkr+/v7X/999/r927d+vpp59WdHS0ypcvr+zsbEnSuHHj1L59eyUnJ2vfvn0aOXKkHnjgAfn4+KhZs2ZaunSpUlNTNW7cOFWsWFGzZs2ybnfPnj3q3bu39cPFjz/+qKlTp+o///mPzR9reyQlJSkrK8um7ZlnntG6detUr149SRf/oLZs2VJeXl6aMGGCateurU2bNmnKlCnat2+fFi5cKOniiMStt96qw4cPKzExUXXr1tWnn36qXr162RXLl19+qYiICLurt8TERI0bN04PPPCAEhMTdeLECT377LOKi4vT1q1bbarRCxcu6K677lJ8fLxGjBihr7/+Ws8995yCg4M1YcIEtWjRQps2bdLdd9+t2rVr6+WXX5akEg3FZ2dnq1OnToqOjtYbb7yhiIgIZWZmat26dTpz5sxl1zt27Jhat26t3NxcPffcc6pZs6Y++eQTjRw5Unv27FFSUpJN/zfeeEP169e3nst+5plndMcddygtLa3YD0SX8vLyUv/+/fXWW29pypQp8vb21pdffqmDBw9qwIABeuKJJ4qsY897LikpSQ899JD27Nlz2ZGVWbNmqW7dunr55ZcVFBRk8zMqFBAQoOXLlysmJkYDBw7UBx98oIKCAj344IMyDENLly6Vt7f3Xx4nUCIGPMLChQsNScbWrVuNnJwco1atWkZsbKxRUFBgGIZh3HLLLUajRo2s/T///HNDkvHiiy/abGfZsmWGJGPu3LnWtqioKMPb29v45ZdfbPquW7fOkGR069bNpn3YsGGGJCMhIcGmvUePHkZISMhljyE/P9+4cOGC8fbbbxve3t7GyZMnra/169fPiIqKsukfFRVl9OvX77Lbe+mll4ocy8MPP2xUqFDB2L9/v03fl19+2ZBk7Ny50zAMw5g9e7Yhyfjwww9t+g0ePNiQZCxcuPCy+zUMwwgICDBatWp1xT6FTp06ZQQGBhp33HGHTXt6errh7+9v9O7d29rWr18/Q5KxfPlym7533HGHUa9ePZu2qKgoo2vXrjZthe+TtLQ0m/bCn+W6desMwzCMbdu2GZKMVatWXTH2S38GTz31lCHJ2LJli02/Rx55xLBYLNb3UFpamiHJaNKkiZGXl2ft99133xmSjKVLl15xv4Xxvvfee8bevXsNi8VifPLJJ4ZhGMZ9991ntGvXzjAMw+jatWuR982fXek9d7l1C2OvXbu2kZubW+xrl74/Cn+vZs6caUyYMMHw8vIyvvzyyyseI1BaDMV7ID8/P02ZMkXbtm3T8uXLi+2zdu1aSSoyjHrfffepfPnyWrNmjU1706ZNVbdu3WK3deedd9p83aBBA0lS165di7SfPHnSZjg+NTVVd911l0JDQ+Xt7S1fX
1/17dtX+fn5+vXXX//6YC9j6dKlGj16tJ5++mkNHjzY2v7JJ5+offv2qlatmvLy8qxLly5dJElfffWVJGndunWqWLGi7rrrLpvt9u7du9QxXc6mTZv0xx9/FPlZREZGqkOHDkV+FhaLRd26dbNpa9q0abFXQ5RWnTp1dM0112jMmDF688037T5PvXbtWjVs2FAtW7a0ae/fv78Mw7C+7wp17drVpmJt2rSpJJXoWKKjo9WuXTstWLBAJ06c0IcffqiBAwdetr+z3nN33XWXfH197erbs2dPPfLIIxo1apSmTJmicePGqVOnTnbvCygJEruHuv/++9WiRQuNHz++2PPVJ06ckI+PjypXrmzTbrFYVKVKFZ04ccKm/UrDuCEhITZf+/n5XbH9/Pnzki6ei23btq0OHTqkV199VRs2bNDWrVv1xhtvSLo4HF4a69atU//+/dW3b18999xzNq8dOXJEH3/8sXx9fW2WRo0aSZKOHz8u6eL3JyIiosi2q1SpYlcMNWrUUFpaml19C7/XxX2Pq1WrVuRnUa5cOQUEBNi0+fv7W7+vzhAcHKyvvvpK119/vcaNG6dGjRqpWrVqmjhx4hXnP5w4ceKyx1H4+p+FhobafF14+qekP/v4+Hh9/PHHmj59ugIDA3XvvfcW28+Z77mSXmUwcOBAXbhwQT4+PkpISCjRukBJcI7dQ1ksFk2bNk2dOnXS3Llzi7weGhqqvLw8HTt2zCa5G4ahzMxM3XDDDUW252yrVq1Sdna2VqxYoaioKGv7Dz/8UOpt7tixQz169NAtt9yiefPmFXk9LCxMTZs21dSpU4tdvzABhYaG6rvvvivyur2T52677Ta99tpr2rx581+eZy9MbhkZGUVeO3z4sMLCwuzapz0KPxAUTrQrVPiB5s+aNGmif/3rXzIMQzt27FBycrImT56swMBAPfXUU8VuPzQ09LLHIcmpx/Jn99xzjx599FG98MILGjx48GWv2HDme64kvxPZ2dnq06eP6tatqyNHjmjQoEH68MMPS7xPwB5U7B7s1ltvVadOnTR58uQis9E7duwoSXr33Xdt2j/44ANlZ2dbX3elwj+Mf56kZxhGsQnZHunp6erSpYtq1aqlDz74oNhh0jvvvFM///yzateurdjY2CJLYWJv3769zpw5o48++shm/SVLltgVy5NPPqny5ctr6NChNrOhCxmGYZ2UFRcXp8DAwCI/i4MHD2rt2rVO/VkU3qxlx44dNu2XHuefWSwWNWvWTDNmzFClSpX0/fffX7Zvx44dtWvXriJ93n77bVksFrVv3770wV9BYGCgJkyYoG7duumRRx65bL+SvOf8/f1LPWp0qSFDhig9PV0rVqzQW2+9pY8++kgzZsxwyraBS1Gxe7hp06YpJiZGR48etQ43SxcvE7rttts0ZswYZWVlqU2bNtZZ8c2bN1efPn1cHlunTp3k5+enBx54QKNHj9b58+c1e/ZsnTp1qlTb69Kli37//Xe9/vrr2rlzp81rtWvXVuXKlTV58mSlpKSodevWSkhIUL169XT+/Hnt27dPq1ev1ptvvqlrr71Wffv21YwZM9S3b19NnTpV1113nVavXq0vvvjCrliio6P1r3/9S7169dL1119vvUGNJO3atUsLFiyQYRi6++67ValSJT3zzDMaN26c+vbtqwceeEAnTpzQpEmTFBAQoIkTJ5bq+1GcG264QfXq1dPIkSOVl5ena665RitXrtQ333xj0++TTz5RUlKSevTooVq1askwDK1YsUK///77Fc8NP/nkk3r77bfVtWtXTZ48WVFRUfr000+VlJSkRx555LLzNJxh+PDhGj58+BX7lOQ916RJE61YsUKzZ89WTEyMvLy8FBsbW+K45s+fr3fffVcLFy5Uo0aN1KhRIz322GMaM2aM2rRpU2Q+AuCwspu3B2f686z4S/Xu3duQZDMr3jAM448//jDGjBljREVFGb6+vkbVqlWNRx55xDh16pRNv+JmVxuG7cxke2KZOHGiIck4duyYte3jjz82mjVrZgQEBBjVq1c3Ro0aZXz22Wc2M7QNw75Z8ZIuu/x5lvKxY8eMhIQEIzo62vD19TVCQkKMmJgYY/z48cbZs2et/Q4ePGj84x//MCpUqGBUrFjR+Mc//mFs3LjRrlnxhfbs2WMMHTrUqFOnjuHv728EBgYaDRs2NIYPH15kZvr8+fONpk2bGn5+fkZwcLDRvXt36yz9P38fypcvX2Q/hd/bS78/xf3cfv31V6Nz585GUFCQUblyZePxxx83Pv30U5vv+X/+8x/jgQceMGrXrm0EBgYawcHBRsuWLY3k5OQi+7j0yoT9+/cbvXv3NkJDQw1fX1+jXr16xksvvWTk5+db+xTOHn/ppZeKxCfJmDhxYpH2P7vce+9Sxc1st/c9d/LkSePee+81KlWqZFgsFuv390qxXzorfseOHUZgYGCR79H58+eNmJgYo2bNmkV+3wBHWQzDMK7i5wgAAOBCnGMHAMCDkNgBAPAgJHYAADwIiR0AAA9CYgcAwIOQ2AEA8CBufYOagoICHT58WBUrVnTJLU8BAK5lGIbOnDmjatWqycvLdbXm+fPnlZub6/B2/Pz8ijyr4e/GrRP74cOHFRkZWdZhAAAcdODAAV177bUu2fb58+cVWDFUyjvn8LaqVKmitLS0v3Vyd+vEXrFiRUmSX8N+snj7lXE0gGukr3+5rEMAXOZMVpbqREda/567Qm5urpR3Tv4N+0mO5Ir8XGXuWqTc3FwSu6sUDr9bvP1I7PBYQUFBZR0C4HJX5XSqT4BDucKwuMe0NLdO7AAA2M0iyZEPEG4ylYvEDgAwB4vXxcWR9d2Ae0QJAADsQsUOADAHi8XBoXj3GIsnsQMAzIGheAAA4G6o2AEA5sBQPAAAnsTBoXg3GeR2jygBAIBdqNgBAObAUDwAAB6EWfEAAMDdULEDAMyBoXgAADyISYbiSewAAHMwScXuHh8/AACAXajYAQDmwFA8AAAexGJxMLEzFA8AAK4yKnYAgDl4WS4ujqzvBkjsAABzMMk5dveIEgAA2IWKHQBgDia5jp3EDgAwB4biAQCAu6FiBwCYA0PxAAB4EJMMxZPYAQDmYJKK3T0+fgAAALtQsQMAzIGheAAAPAhD8QAAwN1QsQMATMLBoXg3qYVJ7AAAc2AoHgAAuBsqdgCAOVgsDs6Kd4+KncQOADAHk1zu5h5RAgAAu1CxAwDMwSST50jsAABzMMlQPIkdAGAOJqnY3ePjBwAAsAsVOwDAHBiKBwDAgzAUDwAA3A0VOwDAFCwWiywmqNhJ7AAAUzBLYmcoHgAAD0LFDgAwB8v/FkfWdwMkdgCAKTAUDwAA3A4VOwDAFMxSsZPYAQCmQGIHAMCDmCWxc44dAAAPQsUOADAHLncDAMBzMBQPAAAclpSUpOjoaAUEBCgmJkYbNmy4Yv/FixerWbNmKleunKpWraoBAwboxIkTdu+PxA4AMIWLT221OLCUfJ/Lli3TsGHDNH78eKWmpqpt27bq0qWL0tPTi+3/zTffqG/fvoqP
j9fOnTv13nvvaevWrRo0aJDd+ySxAwBMwSJHkrpFllKcZJ8+fbri4+M1aNAgNWjQQDNnzlRkZKRmz55dbP/NmzerZs2aSkhIUHR0tG666SY9/PDD2rZtm937JLEDAFACWVlZNktOTk6x/XJzc7V9+3Z17tzZpr1z587auHFjseu0bt1aBw8e1OrVq2UYho4cOaL3339fXbt2tTs+EjsAwBQcG4b//4l3kZGRCg4Oti6JiYnF7u/48ePKz89XRESETXtERIQyMzOLXad169ZavHixevXqJT8/P1WpUkWVKlXSa6+9ZvdxktgBAOZgccIi6cCBAzp9+rR1GTt27JV3e8nJecMwLjs7f9euXUpISNCECRO0fft2ff7550pLS9OQIUPsPkwudwMAoASCgoIUFBT0l/3CwsLk7e1dpDo/evRokSq+UGJiotq0aaNRo0ZJkpo2bary5curbdu2mjJliqpWrfqX+6ViBwCYg6PD8CWcFu/n56eYmBilpKTYtKekpKh169bFrnPu3Dl5edmmZm9vb0kXK317ULEDAEzB0RvUlGbd4cOHq0+fPoqNjVVcXJzmzp2r9PR069D62LFjdejQIb399tuSpG7dumnw4MGaPXu2brvtNmVkZGjYsGFq2bKlqlWrZtc+SewAAFMoi8Teq1cvnThxQpMnT1ZGRoYaN26s1atXKyoqSpKUkZFhc017//79debMGb3++usaMWKEKlWqpA4dOmjatGn2x2nYW9v/DWVlZSk4OFj+TQbL4u1X1uEALnFq6+tlHQLgMllZWYoIDdbp06ftOm9d2n0EBwcr9MGF8vIrV+rtFOSe04nFA1waqzNQsQMAzIGHwAAA4DnKYii+LDArHgAAD0LFDgAwBbNU7CR2AIApmCWxMxQPAIAHoWIHAJiCWSp2EjsAwBxMcrkbQ/EAAHgQKnYAgCkwFA8AgAchsQMA4EHMktg5xw4AgAehYgcAmINJZsWT2AEApsBQPAAAcDskdpN76L622v3Jszq1eYa+XTxabZrXvmL/h3verNQPntbJTdP148pn1PvOlkX6BFcI1Iynemrvl1N1avMMpX7wtG67qaGrDgG4ojmzk1T/umhVqhCg1i1j9M03G67Yf8PXX6l1yxhVqhCgBnVrad6cNy/bd/myfynQ16L7/tHDyVHDFQordkcWd1DmiT0pKUnR0dEKCAhQTEyMNmy48i8dnOfezi300qh/aNpbX6jVAy9oY+oerXp9qCKrXFNs/8H33aTJj3fT1Dmr1eLeqZry5mrNfKqn7ri5sbWPr4+3Pn3zMUVVC9GDo95Ss7sn69Hnlujw0dNX67AAq/eWL9OoEcM05qnx2rw1Va1vaqsed3ZRenp6sf33paWpR7c71Pqmttq8NVWjx4zTiCcTtHLFB0X67t+/X2PHjFSbm9q6+jDgJBY5mNjd5CR7mSb2ZcuWadiwYRo/frxSU1PVtm1bdely+V86OFfCPzsoedUmJa/cpF/SjmjUyx/oYOYpDb6v+D9Uvbu21FsffKv3v/xe+w6d0HtfbNeiVZs0on8na59+PeJ0TVA59Rw+V5t+3Kv0jFPa+MNe/fTroat1WIDVrJnT1X9AvAbED1L9Bg308vSZujYyUvPmzC62/7y5byqyRg29PH2m6jdooAHxg9Sv/0DNnP6yTb/8/HwN6PugnpkwSdHRta7GoQB2K9PEPn36dMXHx2vQoEFq0KCBZs6cqcjISM2eXfwvHZzH18dbzRtEas2m3TbtazbvVqtm0cWu4+fro/O5F2za/si5oNjGUfLxufhW6npLE23ZkaaZT/XSvn8/r23vjdOogZ3l5eUen3ThOXJzc5X6/XZ17NTZpr3jrZ21edPGYtfZsnmTOt5q2//Wzrfp++3bdOHC/7/3n58yWWGVK6v/wHjnBw6XYSjexXJzc7V9+3Z17mz7S9S5c2dt3Fj8Lx2cJ+yaCvLx8dbRk2ds2o+cOKOI0KBi1/n3pt3q36O1mjeIlCS1aFhDfbu3kp+vj8IqVZAkRVcP1d23Npe3t0V3Pz5b0+Z/oSf6dNSYQbe59oCASxw/flz5+fkKD4+waY+IiNCRI5nFrnPkSKYiImz7h4dHKC8vT8ePH5ckbfz2WyUvfEtJb85zTeBwHYsTFjdQZpe7Ff7SXfpLFBERoczM4n/pcnJylJOTY/06KyvLpTGagWHYfm2xWGRc2vg/ifM+V0RokL5aNFIWi3T05Bm9+9EWjRjQSfn5BZIkLy8vHTt5Ro8+t1QFBYZSdx9Q1crBGta3oxLnfu7qwwGKuLTKMgzjipVXcf0L28+cOaOB/f+ppDfnKSwszPnBAk5Q5texl+SXLjExUZMmTboaYXm846fOKi8vXxGhFW3aw0MqFKniC53PuaAhkxbrsalLFRESpIzjpxX/jzbKOvuHjv+eLUnKPH5aF/LyVVDw/x8O/pOWqaqVg+Xr460LefmuOyjgT8LCwuTt7V2kOj969GiRKr5QRESVIoXFsWNH5ePjo9DQUO3auVP79+3TP3p0s75eUHDxQ22FAB/t2PmLatW+8pUlKDtcx+5ihb90l/4SHT16tEgVX2js2LE6ffq0dTlw4MDVCNUjXcjLV+ruA+rQqr5Ne4dW9bX5x7QrrpuXV6BDR39XQYGh+26L0Wcbdlqrmk0/7FXtyMo2vwDX1QhXxrHTJHVcVX5+fmreIkZr/51i0752TYpaxbUudp0bW8Vp7Rrb/mtSvlSLmFj5+vqqXv362pb6k7Zs+8G6dO12l25p115btv2gayMjXXY8cJxZzrGXWcXu5+enmJgYpaSk6O6777a2p6SkqHv37sWu4+/vL39//6sVoseb9e5avTWlr77fla4tO9IUf08bRVYJ0fz3L15yOPnxu1QtPFiDnnlHklSnRrhiG0dp68/7dE3Fckro00ENa1ezvi5J897boEfuv0WvjL5XSUu/Up0alTUqvrOSln5VJscIc0sYNlzx/fuoRUysbmwVp7fmz9WB9HQNemiIJOmZ8WN1+NAhvZX8tiRp8END9GbS6xo9crgGxg/Wls2blLzwLS16d6kkKSAgQI0aN7bZR6XgSpJUpB1/PxbLxcWR9d1BmQ7FDx8+XH369FFsbKzi4uI0d+5cpaena8iQIWUZlmm8/+X3Cgkur3EPdVGVsCDt/C1DPR5PUnrGKUlSlbAgRVYJsfb39rboiT4dVDcqQhfy8vX1tl/Vvv8rSs84ae1z8Mjv6jb0Db044h5tXT5Wh4/+rjeWrNcrySlF9g+42n09e+nkiRN6fupkZWZkqFGjxlr18WpFRUVJkjIzMnTgwP9fXlszOlqrPl6t0SOe1JzZb6hqtWp6ZcYs3X3PP8rqEIASsxiXmyl1lSQlJenFF19URkaGGjdurBkzZujmm2+2a92srCwFBwfLv8lgWbz9XBwpUDZObX29rEMAXCYrK0sRocE6ffq0goKKvyLHGfsIDg5Wrcffl5d/+VJvpyAnW3tfu9elsTpDmU+eGzp0qIYOHVrWYQAAPJ2DQ/Hucrlbmd9SFgAAOE+ZV+wAAFwNZrncjcQOADA
Fs8yKZygeAAAPQsUOADAFLy+LQw+kMtzkYVYkdgCAKTAUDwAA3A4VOwDAFJgVDwCABzHLUDyJHQBgCmap2DnHDgCAB6FiBwCYglkqdhI7AMAUzHKOnaF4AAA8CBU7AMAULHJwKN5NnttKYgcAmAJD8QAAwO1QsQMATIFZ8QAAeBCG4gEAgNuhYgcAmAJD8QAAeBCzDMWT2AEApmCWip1z7AAAeBAqdgCAOTg4FO8mN54jsQMAzIGheAAA4Hao2AEApsCseAAAPAhD8QAAwO1QsQMATIGheAAAPAhD8QAAwO1QsQMATMEsFTuJHQBgCpxjBwDAg5ilYuccOwAAHoSKHQBgCgzFAwDgQRiKBwAAboeKHQBgChY5OBTvtEhci4odAGAKXhaLw0tpJCUlKTo6WgEBAYqJidGGDRuu2D8nJ0fjx49XVFSU/P39Vbt2bS1YsMDu/VGxAwDgIsuWLdOwYcOUlJSkNm3aaM6cOerSpYt27dqlGjVqFLtOz549deTIEb311luqU6eOjh49qry8PLv3SWIHAJhCWcyKnz59uuLj4zVo0CBJ0syZM/XFF19o9uzZSkxMLNL/888/11dffaW9e/cqJCREklSzZs0S7ZOheACAKRTOindkKYnc3Fxt375dnTt3tmnv3LmzNm7cWOw6H330kWJjY/Xiiy+qevXqqlu3rkaOHKk//vjD7v1SsQMATMHLcnFxZH1JysrKsmn39/eXv79/kf7Hjx9Xfn6+IiIibNojIiKUmZlZ7D727t2rb775RgEBAVq5cqWOHz+uoUOH6uTJk3afZ6diBwCgBCIjIxUcHGxdihtS/7NLK33DMC5b/RcUFMhisWjx4sVq2bKl7rjjDk2fPl3Jycl2V+1U7AAAc7A4eJOZ/6164MABBQUFWZuLq9YlKSwsTN7e3kWq86NHjxap4gtVrVpV1atXV3BwsLWtQYMGMgxDBw8e1HXXXfeXYVKxAwBMoXDynCOLJAUFBdksl0vsfn5+iomJUUpKik17SkqKWrduXew6bdq00eHDh3X27Flr26+//iovLy9de+21dh0niR0AABcZPny45s+frwULFmj37t168sknlZ6eriFDhkiSxo4dq759+1r79+7dW6GhoRowYIB27dqlr7/+WqNGjdLAgQMVGBho1z4ZigcAmILlf/8cWb+kevXqpRMnTmjy5MnKyMhQ48aNtXr1akVFRUmSMjIylJ6ebu1foUIFpaSk6PHHH1dsbKxCQ0PVs2dPTZkyxe59ktgBAKbgrFnxJTV06FANHTq02NeSk5OLtNWvX7/I8H1JMBQPAIAHoWIHAJiCWR7baldinzVrlt0bTEhIKHUwAAC4SlncUrYs2JXYZ8yYYdfGLBYLiR0AgDJkV2JPS0tzdRwAALiUI49eLVzfHZR68lxubq5++eWXEj1KDgCAsuKsG9T83ZU4sZ87d07x8fEqV66cGjVqZL3+LiEhQS+88ILTAwQAwBmu9tPdykqJE/vYsWP1448/av369QoICLC233rrrVq2bJlTgwMAACVT4svdVq1apWXLlqlVq1Y2n14aNmyoPXv2ODU4AACchVnxl3Hs2DGFh4cXac/OznabYQoAgPkwee4ybrjhBn366afWrwuT+bx58xQXF+e8yAAAQImVuGJPTEzU7bffrl27dikvL0+vvvqqdu7cqU2bNumrr75yRYwAADjMIjnwCBjH1r2aSlyxt27dWt9++63OnTun2rVr68svv1RERIQ2bdqkmJgYV8QIAIDDzDIrvlT3im/SpIkWLVrk7FgAAICDSpXY8/PztXLlSu3evVsWi0UNGjRQ9+7d5ePDM2UAAH9PZfXY1qutxJn4559/Vvfu3ZWZmal69epJkn799VdVrlxZH330kZo0aeL0IAEAcJRZnu5W4nPsgwYNUqNGjXTw4EF9//33+v7773XgwAE1bdpUDz30kCtiBAAAdipxxf7jjz9q27Ztuuaaa6xt11xzjaZOnaobbrjBqcEBAOBMblJ0O6TEFXu9evV05MiRIu1Hjx5VnTp1nBIUAADOxqz4P8nKyrL+//nnn1dCQoKeffZZtWrVSpK0efNmTZ48WdOmTXNNlAAAOIjJc39SqVIlm08qhmGoZ8+e1jbDMCRJ3bp1U35+vgvCBAAA9rArsa9bt87VcQAA4FJmmRVvV2K/5ZZbXB0HAAAuZZZbypb6jjLnzp1Tenq6cnNzbdqbNm3qcFAAAKB0SvXY1gEDBuizzz4r9nXOsQMA/o54bOtlDBs2TKdOndLmzZsVGBiozz//XIsWLdJ1112njz76yBUxAgDgMIvF8cUdlLhiX7t2rT788EPdcMMN8vLyUlRUlDp16qSgoCAlJiaqa9eurogTAADYocQVe3Z2tsLDwyVJISEhOnbsmKSLT3z7/vvvnRsdAABOYpYb1JTqznO//PKLJOn666/XnDlzdOjQIb355puqWrWq0wMEAMAZGIq/jGHDhikjI0OSNHHiRN12221avHix/Pz8lJyc7Oz4AABACZQ4sT/44IPW/zdv3lz79u3Tf/7zH9WoUUNhYWFODQ4AAGcxy6z4Ul/HXqhcuXJq0aKFM2IBAMBlHB1Od5O8bl9iHz58uN0bnD59eqmDAQDAVbil7J+kpqbatTF3OWgAADyVRzwE5r8p0xQUFFTWYQAucU3bp8o6BMBljLycq7YvL5XiUrBL1ncHDp9jBwDAHZhlKN5dPoAAAAA7ULEDAEzBYpG8mBUPAIBn8HIwsTuy7tXEUDwAAB6kVIn9nXfeUZs2bVStWjXt379fkjRz5kx9+OGHTg0OAABn4SEwlzF79mwNHz5cd9xxh37//Xfl5+dLkipVqqSZM2c6Oz4AAJyicCjekcUdlDixv/baa5o3b57Gjx8vb29va3tsbKx++uknpwYHAABKpsST59LS0tS8efMi7f7+/srOznZKUAAAOJtZ7hVf4oo9OjpaP/zwQ5H2zz77TA0bNnRGTAAAOF3h090cWdxBiSv2UaNG6dFHH9X58+dlGIa+++47LV26VImJiZo/f74rYgQAwGHcUvYyBgwYoLy8PI0ePVrnzp1T7969Vb16db366qu6//77XREjAACwU6luUDN48GANHjxYx48fV0FBgcLDw50dFwAATmWWc+wO3XkuLCzMWXEAAOBSXnLsPLmX3COzlzixR0dHX/Ei/b179zoUEAAAKL0SJ/Zhw4bZfH3hwgWlpqbq888/16hRo5wVFwAATsVQ/GU88cQTxba/8cYb2rZtm8MBAQDgCjwEpoS6dOmiDz74wFmbAwAApeC0x7a+//77CgkJcdbmAABwqovPYy992e2xQ/HNmze3mTxnGIYyMzN17NgxJSUlOTU4AACchXPsl9GjRw+br728vFS5cmW1a9dO9evXd1ZcAACgFEqU2PPy8lSzZk3ddtttqlKliqtiAgDA6Zg8VwwfHx898sgjysnJcVU8AAC4hMUJ/9xBiWfF33jjjUpNTXVFLAAAuExhxe7I4g5KfI596NChGjFihA4ePKiYmBiVL1/e5vWmTZs6LTgAAFAydi
[... base64-encoded PNG data elided: two matplotlib figure outputs ("image/png" display_data cells) in this notebook diff ...]
KMGTMG//3vf/kuQg/qquO3Ul0R0tTUhKurKxITE6XKExMT4eHhIXcZd3d3mfo//PADhg8fDrFY3G2xKrvOtAXw7ErQ7NmzERsby/fcu1BH28PAwADXrl1DZmam8Jo/fz4GDRqEzMxMjBo1qqdCVzqd+W6MHj0ad+/eRWVlpVB269YtqKmpwcrKqlvjVWadaYvq6mqoqUkfOtXV1QH872oE6xlddvzuUNfqP4Gmn0Lu3r2bsrKyaMmSJSSRSOjXX38lIqLly5dTUFCQUL/p53dhYWGUlZVFu3fv5p/Pd5GOtkVsbCxpaGjQ1q1bqaioSHiVl5craheUSkfbozn+1VjX6WhbPH78mKysrGjq1Kl048YNSk5OpoEDB9Jbb72lqF1QGh1ti6ioKNLQ0KBt27ZRXl4enTt3joYPH04jR45U1C4ojcePH1NGRgZlZGQQANq4cSNlZGQIjzLoruO30iVCRERbt24lGxsb0tTUJBcXF0pOThbmzZo1izw9PaXqJyUl0UsvvUSamppka2tLkZGRPRyx8upIW3h6ehIAmdesWbN6PnAl1dHvxvM4EepaHW2L7OxsGj9+POno6JCVlRWFh4dTdXV1D0etnDraFlu2bKEhQ4aQjo4OmZubU2BgIN25c6eHo1Y+Z86cafUY0F3HbxERX8tjjDHGmGpSqj5CjDHGGGMdwYkQY4wxxlQWJ0KMMcYYU1mcCDHGGGNMZXEixBhjjDGVxYkQY4wxxlQWJ0KMMcYYU1mcCDHGpERHR8PIyEjRYXSara0tNm3a1Gqd1atXY9iwYT0SD2Osd+NEiDElNHv2bIhEIplXbm6uokNDdHS0VEzm5uaYNm0a8vPzu2T9ly5dwty5c4VpkUiEo0ePStV57733cOrUqS7ZXkua72e/fv3g4+ODGzdudHg9f+bElLHejhMhxpTUhAkTUFRUJPUaMGCAosMC8GxQ16KiIty9exexsbHIzMyEr68vGhoa/vC6TU1Noaur22odPT29Do1O3VnP7+f333+PqqoqTJ48GXV1dd2+bcZY+3AixJiS0tLSgpmZmdRLXV0dGzduxNChQyGRSGBtbY2QkBCpUc2bu3LlCsaOHQt9fX0YGBjA1dUVly9fFuafP38ef//736GjowNra2ssXrwYVVVVrcYmEolgZmYGc3NzjB07FhEREbh+/bpwxSoyMhL29vbQ1NTEoEGDsG/fPqnlV69ejf79+0NLSwsWFhZYvHixMO/5W2O2trYAAD8/P4hEImH6+VtjJ0+ehLa2NsrLy6W2sXjxYnh6enbZfg4fPhxhYWEoKCjAzZs3hTqttUdSUhKCg4NRUVEhXFlavXo1AKCurg5Lly6FpaUlJBIJRo0ahaSkpFbjYYzJ4kSIMRWjpqaGLVu24Pr16/jyyy9x+vRpLF26tMX6gYGBsLKywqVLl5CWlobly5dDLBYDAK5duwZvb2/885//xNWrV3Hw4EGcO3cOCxcu7FBMOjo6AICnT58iPj4eoaGhePfdd3H9+nXMmzcPwcHBOHPmDADg8OHD+Oyzz7Bjxw7k5OTg6NGjGDp0qNz1Xrp0CQAQFRWFoqIiYfp548ePh5GREeLi4oSyhoYGHDp0CIGBgV22n+Xl5YiNjQUA4f0DWm8PDw8PbNq0SbiyVFRUhPfeew8AEBwcjJSUFBw4cABXr16Fv78/JkyYgJycnHbHxBgDlHL0ecZU3axZs0hdXZ0kEonwmjp1qty6hw4dIhMTE2E6KiqKDA0NhWl9fX2Kjo6Wu2xQUBDNnTtXquzs2bOkpqZGT548kbtM8/X/9ttv5ObmRlZWVlRbW0seHh709ttvSy3j7+9PkyZNIiKiDRs2kIODA9XV1cldv42NDX322WfCNACKj4+XqhMREUHOzs7C9OLFi+nll18Wpk+ePEmamppUVlb2h/YTAEkkEtLV1RVG0vb19ZVbv0lb7UFElJubSyKRiH7//Xep8nHjxtEHH3zQ6voZY9I0FJuGMca6y9ixYxEZGSlMSyQSAMCZM2fwySefICsrC48ePUJ9fT1qampQVVUl1HleeHg43nrrLezbtw/jx4+Hv78/7O3tAQBpaWnIzc1FTEyMUJ+I0NjYiPz8fAwePFhubBUVFdDT0wMRobq6Gi4uLjhy5Ag0NTWRnZ0t1dkZAEaPHo3NmzcDAPz9/bFp0ybY2dlhwoQJmDRpEnx8fKCh0fl/Z4GBgXB3d8fdu3dhYWGBmJgYTJo0CX/5y1/+0H7q6+sjPT0d9fX1SE5Oxvr167F9+3apOh1tDwBIT08HEcHBwUGqvLa2tkf6PjGmTDgRYkxJSSQSvPDCC1JlBQUFmDRpEubPn4+PP/4YxsbGOHfuHObMmYOnT5/KXc/q1asREBCA77//HidOnEBERAQOHDgAPz8/NDY2Yt68eVJ9dJr079+/xdiaEgQ1NTX069dP5oAvEomkpolIKLO2tsbNmzeRmJiIH3/8ESEhIVi/fj2Sk5Olbjl1xMiRI2Fvb48DBw7gnXfeQXx8PKKiooT5nd1PNTU1oQ0cHR1RXFyM6dOn46effgLQufZoikddXR1paWlQV1eXmqenp9ehfWdM1XEixJgKuXz5Murr67FhwwaoqT3rInjo0KE2l3NwcICDgwPCwsIwY8YMREVFwc/PDy4uLrhx44ZMwtWW5xOE5gYPHoxz585h5syZQtn58+elrrro6OjA19cXvr6+WLBgARwdHXHt2jW4uLjIrE8sFrfr12gBAQGIiYmBlZUV1NTUMHnyZGFeZ/ezubCwMGzcuBHx8fHw8/NrV3toamrKxP/SSy+hoaEBJSUlGDNmzB+KiTFVx52lGVMh9vb2qK+vx+eff47bt29j3759MrdqnvfkyRMsXLgQSUlJKCgoQEpKCi5duiQkJcuWLUNqaioWLFiAzMxM5OTk4NixY1i0aFGnY3z//fcRHR2N7du3IycnBxs3bsSRI0eETsLR0dHYvXs3rl+/LuyDjo4ObGxs5K7P1tYWp06dQnFxMR4+fNjidgMDA5Geno41a9Zg6tSp0NbWFuZ11X4aGBjgrbfeQkREBIioXe1ha2uLyspKnDp1CqWlpaiuroaDgwMCAwMxc+ZMHDlyBPn5+bh06RLWrVuH48ePdygmxlSeIjsoMca6x6xZs2jKlCly523cuJHMzc1JR0eHvL29ae/evQSAHj58SETSnXNra2vpjTfeIGtra9LU1CQLCwtauHChVAfhixcv0iuvvEJ6enokkUjIycmJ1qxZ02Js8jr/Nrdt2zays7MjsVhMDg4OtHfvXmFefHw8jRo1igwMDEgikZCbmxv9+OOPwvzmnaWPHTtGL7zwAmloaJCNjQ0RyXaWbjJixAgCQKdPn5aZ11X7WVBQQBoaGnTw4EEiars9iIjmz59PJiYmBIAiIiKIiKiuro5WrVpFtra2JBaLyczMjPz8/Ojq1astxsQYkyUiIlJsKsYYY4wxphh8a4wxxhhjKosTIcYYY4ypLE6EGGOMMaayOBFijDHGmMriRIgxxhhjKosTIcYYY4ypLE6EGGOMMaayOBFijDHGm
MriRIgxxhhjKosTIcYYY4ypLE6EGGOMMaayOBFijDHGmMr6/2StLV9hbYG4AAAAAElFTkSuQmCC", - "text/plain": [ - "
    " - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "data": { - "image/png": "iVBORw0KGgoAAAANSUhEUgAAAkIAAAHFCAYAAAAe+pb9AAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjguMCwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy81sbWrAAAACXBIWXMAAA9hAAAPYQGoP6dpAACZY0lEQVR4nOzdeVxU5ffA8c+wLwqKKIIr7qa5YZpbam65Vi6gmLvmbi5pmZlalpllau57KgrqV81KTXPPJfel1NzFBURBQZBlmLm/P/g5dQEXcODOwHm/Xr5qztw79wwXmMO5z30enaIoCkIIIYQQuZCN1gkIIYQQQmhFCiEhhBBC5FpSCAkhhBAi15JCSAghhBC5lhRCQgghhMi1pBASQgghRK4lhZAQQgghci0phIQQQgiRa0khJIQQQohcSwohIczkzJkz9OrVC19fX5ycnMiTJw81atTgm2++ISoqSuv0nmnixInodLpM7btlyxYmTpyY7nMlS5akZ8+emU/sJRiNRlatWkWLFi0oVKgQ9vb25MuXj9dff51vv/2W+/fvZ+p1e/bsScmSJc2b7Auy5u8xISyVTpbYEOLlLVq0iEGDBlG+fHkGDRrEK6+8gl6v59ixYyxatIiqVauyceNGrdN8qokTJzJp0iQy8+tgyJAhzJkzJ919T548iZubG6VLlzZHmi8sPj6et99+m99//52AgADefvttfHx8iImJ4eDBgyxZsoRy5cqxf//+DL/2lStXiImJoXr16lmQ+dNZ+/eYEBZLEUK8lIMHDyq2trbKW2+9pSQkJKR5PjExUfnpp580yOzFTZgwQcnsr4PBgwdnet+s8v777yuAsnr16nSfj4uLUxYuXJjNWWVedn6PPX78WDEajWZ5LSGsgWX99hLCCrVp00axs7NTQkNDX2h7QJkwYUKaeIkSJZQePXqYHi9btkwBlJ07dyp9+/ZVPDw8lLx58yrdunVTYmNjlbCwMKVTp06Ku7u7UrhwYWXUqFFKUlKSaf/du3crgLJ7927Vca5du6YAyrJly0yx9Aqh4OBgpVmzZkrhwoUVJycnpUKFCspHH32kxMbGmrbp0aOHAqT5d+3atTTvKSIiQrG3t1c+/fTTNO/9/PnzCqDMnDnTFAsLC1Pef/99pUiRIoq9vb1SsmRJZeLEiYper3/m1/fOnTuKnZ2d0rp162dul9rs2bOVBg0aKAULFlRcXFyUypUrK1OnTlV9TZ+85xIlSqhigDJ48GBlxYoVSoUKFRRnZ2elSpUqys8//6zaLiIiQunXr59StGhRxcHBQfH09FTq1q2r7Nix45m5ZfX32G+//ab06tVL8fT0VABlzZo1CqD8/vvvaV5j7ty5CqCcPn3aFDt69KjStm1bJX/+/Iqjo6NSrVo1JSQk5IVyFUJrdtnbfxIiZzEYDOzatQs/Pz+KFSuWJcfo27cv7du3Jzg4mJMnT/LJJ5+QnJzMP//8Q/v27Xn//ff5/fffmTp1Kj4+PowcOdIsx7106RKtWrVi+PDhuLq6cuHCBaZOncqRI0fYtWsXAOPHjycuLo7169dz6NAh077e3t5pXq9gwYK0adOGH3/8kUmTJmFj8+8QxWXLluHg4EDXrl0BCA8Pp1atWtjY2PDZZ59RunRpDh06xOTJk7l+/TrLli17at67d+8mOTmZdu3aZej9XrlyhcDAQHx9fXFwcOD06dN8+eWXXLhwgaVLlz53/19//ZWjR4/y+eefkydPHr755hveffdd/vnnH0qVKgVAt27dOHHiBF9++SXlypXj4cOHnDhxgsjIyKe+bnZ8j/Xu3ZvWrVuzcuVK4uLiaNOmDYUKFWLZsmU0adJEte3y5cupUaMGVapUAVK+3m+99Ra1a9dm/vz5uLu7ExwcTEBAAI8fP9ZsjJgQL0zrSkwIaxYeHq4ASufOnV94HzL41/rQoUNV273zzjsKoEyfPl0Vr1atmlKjRg3T45ftCP2X0WhU9Hq9snfv3jTdgGddGkv9njZv3qwAyvbt202x5ORkxcfHR+nQoYMp1r9/fyVPnjzKjRs3VK/37bffKoDy999/PzXXr7/+WgGUbdu2pXlOr9er/j2NwWBQ9Hq9smLFCsXW1laJiooyPfe0jpCXl5cSExNjioWHhys2NjbKlClTTLE8efIow4cPf+px05Md32Pdu3dPs+3IkSMVZ2dn5eHDh6bYuXPnFED54YcfTLEKFSoo1atXT/P1bNOmjeLt7a0YDIYXzlsILchdY0JYuDZt2qgeV6xYEYDWrVunid+4ccNsx7169SqBgYEULlwYW1tb7O3tadiwIQDnz5/P1Gu2bNmSwoULqzo6v/32G3fu3KF3796m2C+//ELjxo3x8fEhOTnZ9K9ly5YA7N27N8PHPnXqFPb29qp//71z7OTJk7Rr144CBQqY3m/37t0xGAxcvHjxua/fuHFj8ubNa3rs5eVFoUKFVOekVq1aLF++nMmTJ3P48GH0en2G30dW6NChQ5pY7969iY+PJyQkxBRbtmwZjo6OBAYGAnD58mUuXLhg6uT991y1atWKsLAw/vnnn+x5E0JkkhRCQrwET09PXFxcuHbtWpYdw8PDQ/XYwcHhqfGEhASzHDM2NpYGDRrw559/MnnyZPbs2cPRo0fZsGEDkHJXVmbY2dnRrVs3Nm7cyMOHD4GUSy3e3t60aNHCtN3du3f5+eef0xQulSpVAnjmre/FixcHSFMUli9fnqNHj3L06FH69eunei40NJQGDRpw+/ZtZs6cyf79+zl69Chz5sx54fdboECBNDFHR0fVviEhIfTo0YPFixdTp04dPDw86N69O+Hh4U993ez4HkvvUmalSpV47bXXTEWrwWBg1apVvP3226bvvbt37wLw4YcfpjlXgwYNAp59roSwBDJGSIiXYGtrS5MmTdi6dSu3bt2iaNGiz93H0dGRxMTENPFnjRPJDCcnJ4A0x3qRD6Zdu3Zx584d9uzZY+oCAabi5WX06tWLadOmmcaRbN68meHDh2Nra2vaxtPTkypVqvDll1+m+xo+Pj5Pff1GjRphZ2fH5s2bef/9901xZ2dnatasCaR0nP5r06ZNxMXFsWHDBkqUKGGKnzp1KjNv8ak8PT2ZMWMGM2bMIDQ0lM2bN/Pxxx8TERHBtm3b0t0nO77HnjaHVK9evRg0aBDnz5/n6tWrhIWF0atXL9X7ARg7dizt27dP9zXKly//3HyF0JJ0hIR4SWPHjkVRFPr160dSUlKa5/V6PT///LPpccmSJTlz5oxqm127dhEbG2vWvJ5M+pf6WJs3b37uvk8+GB0dHVXxBQsWpNn2yTYv2iWqWLEitWvXZtmyZaxevZrExETVhyukXA7866+/KF26NDVr1kzz71mFkLe3N7179+bXX38lODj4hXJK7/0qisKiRYteaP/MKF68OEOGDKFZs2acOHHimdtq9T3WpUsXnJycW
L58OcuXL6dIkSI0b97c9Hz58uUpW7Ysp0+fTvc81axZU3W5UAhLJB0hIV5SnTp1mDdvHoMGDcLPz4+BAwdSqVIl9Ho9J0+eZOHChVSuXJm2bdsCKXcOjR8/ns8++4yGDRty7tw5Zs+ejbu7u1nzKly4ME2bNmXKlCnkz5+fEiVKsHPnTtPlrWepW7cu+fPnZ8CAAUyYMAF7e3uCgoI4ffp0mm1fffVVAKZOnUrLli2xtbWlSpUqpkt46enduzf9+/fnzp071K1bN03X4PPPP2fHjh3UrVuXYcOGUb58eRISErh+/Tpbtmxh/vz5z+yMzJgxg2vXrtG1a1c2b95smlDx8ePHXLhwgeDgYJycnLC3twegWbNmODg40KVLF8aMGUNCQgLz5s3jwYMHz/1avajo6GgaN25MYGAgFSpUIG/evBw9epRt27Y9tZvyhFbfY/ny5ePdd99l+fLlPHz4kA8//FB1tx+kFMctW7akRYsW9OzZkyJFihAVFcX58+c5ceIE69aty9gXSojspvFgbSFyjFOnTik9evRQihcvrjg4OCiurq5K9erVlc8++0yJiIgwbZeYmKiMGTNGKVasmOLs7Kw0bNhQOXXq1FPv6Dl69KjqOE/u8Lp3754q3qNHD8XV1VUVCwsLUzp27Kh4eHgo7u7uynvvvaccO3bshe4aO3jwoFKnTh3FxcVFKViwoNK3b1/lxIkTafZNTExU+vbtqxQsWFDR6XRPnUfov6KjoxVnZ2cFUBYtWpTu1/PevXvKsGHDFF9fX8Xe3l7x8PBQ/Pz8lHHjxqnmMnoag8GgrFixQmnWrJni6emp2NnZKe7u7kqtWrWU8ePHK7du3VJt//PPPytVq1ZVnJyclCJFiiijR49Wtm7dmubOu2fNI5Taf99/QkKCMmDAAKVKlSqKm5ub4uzsrJQvX16ZMGGCEhcX99z3oyjZ9z32X9u3bzfND3Xx4sV0tzl9+rTi7++vFCpUSLG3t1cKFy6svPnmm8r8+fNf6H0JoSVZYkMIIYQQuZaMERJCCCFEriWFkBBCCCFyLSmEhBBCCJFraVoI7du3j7Zt2+Lj44NOp2PTpk3P3Wfv3r34+fnh5OREqVKlmD9/ftYnKoQQQogcSdNCKC4ujqpVqzJ79uwX2v7atWu0atWKBg0amBafHDZsGP/73/+yOFMhhBBC5EQWc9eYTqdj48aNvPPOO0/d5qOPPmLz5s2qdY4GDBjA6dOnVStfCyGEEEK8CKuaUPHQoUOqWU0BWrRowZIlS9Dr9abJ0f4rMTFRNdW80WgkKiqKAgUKPHVaeSGEEEJYFkVRePToET4+Pmkm9nwZVlUIhYeH4+XlpYp5eXmRnJzM/fv30104cMqUKUyaNCm7UhRCCCFEFrp58+YLrbn3oqyqEIK0iwM+ubL3tO7O2LFjGTlypOlxdHQ0xYsX5+LFi2lW7xbZT6/Xs3v3bho3bpxuR09kHy3Pxemb0fT88Xi2HlOYm8JX9otpZ3tY60REDhGXpODq8O9ne0yiQrHvY82+fp1VFUKFCxcmPDxcFYuIiMDOzo4CBQqku4+jo2OahSMBPDw8nrqPyD56vR4XFxcKFCgghZDGtDwXy3+6go2ji+lxwbyOLOzmh00uvXydnJzMgYMHqFe3HnZ22v2atkmKxfvoV+QJO4xOMTxzW50xGYfYW8C/5yzJ1YfQRjMx2qb9HWwtDAYDp8+cpmqVqtja2mqdTq5x7OQZRo37gtHDBtDmrSYAREfHwPftzD6sxaoKoTp16qhWWAbYvn07NWvWlA9RIazUidAH7Lt4TxXr/0YpqhfPr1FG2tPr9dzKA1WKumv3u82QDKt7wZVdmdvfIQ8O3dZRpnBl8+aVzfR6PRfvPKJMtQbyOZMNjEYjU6dOZfz48RgMBiZM+Z62nbpRtmxZIiMjs+SYmt4+Hxsby6lTpzh16hSQcnv8qVOnCA0NBVIua3Xv3t20/YABA7hx4wYjR47k/PnzLF26lCVLlvDhhx9qkb4Qwgxm/n5J9dgzjyNda5fQKBthsu3jzBdB6KDDErDyIkhkr4iICFq2bMknn3yCwZDSgfTz88PV1TVLj6tpR+jYsWM0btzY9PjJWJ4ePXqwfPlywsLCTEURgK+vL1u2bGHEiBHMmTMHHx8fZs2aRYcOHbI9dyFyIqMCn20+xy9nwolLSs62Y/7XgIalcHaQSxCa+nMhHF2UuX11NtDyGyj/lnlzEjnanj17CAwMJCwsDEgZ9zt+/Hg+++yzLL8kqWkh1KhRI541jdHy5cvTxBo2bMiJEyeyMCshcq89YTp+unFLs+N75nGQblB2uH8J9n4Dj8LSPqcoEHpQHbN1hNbfgvNzbjDR6aBQRfAoZb5cRY5mMBj48ssvmTRpEkajEUi5GzwoKIgmTZpkSw5WNUZICJF1Hicls/O2tssP9n+jtHSDslrUNVjaAh5nYLzFO3Ph1Y5Zl5PIle7evUtgYCC7dv17CbZp06asWrUqzVQ5WUkWXRVCALD6yC1ik7W7S6tR+YJ0ryvdoCwV/xBWB2SsCGr4sRRBIksoisJff/0FgI2NDV988QXbtm3L1iIIpCMkhADikwws/uO6KtagrCeftKqYLcd3c7anSD7nbDlWrmVIhnU94f4/L75P1S7Q6OMsS0nkboULFyYoKIhevXqxatUqGjZsqEkeUggJIQj68waRcUmq2Kjm5ano7aZRRuKFJD6Cg7Mh6urzt425DTcOqGOFXoGavdPf3sMXSjdJGfcjhBncvn0bZ2dn1WTGTZs25dKlSzg5OWmWlxRCQuRgRqOC8TnrKickG5m/V/1B2qh8QaoVy5eFmYmXlvQYfmwLd05mbn/XghAYAvmKmzcvIdKxbds2unXrRt26ddm0aZNqUkQtiyCQQkiIHElRFL757R9WHbrBo8SM3wb/QZOyWZCVMBujETb2z3wRZOsInddIESSynF6v57PPPuPrr78GYPPmzSxYsIABAwZonNm/pBASIgf65UwY8/ZcydS+b5QtkKtndbYKuyfD+c2Z21dnA+/Og2KvmTcnIVK5efMmnTt35uDBf6djaNOmDZ06ddIwq7SkEBIihzEaFWbtvPT8DZ9iSOPSZsxGmN2pNbD/O3XM0R1q9nr+eB5bByjTTIogkeV+/vlnevbsSVRUFAB2dnZMnTqVESNGmH2tsJclhZAQOcyWv8K4FBGb4f1sbXQ08zFQXcYGWa4bB2HzUHVMZwv+P0LpxunvI0Q2SkpKYuzYsUyfPt0UK1GiBCEhIdSuXVvDzJ5OCiEhcpD0ukEVCudlTtcaz903v5MN+3ftyKrUxMuKugrBXcGoV8dbTZMiSFiEmJgYmjVrxpEjR0yxd955h6VLl5I/v+VebpdCSIgcZOtf4Vy8q+4GDW9altIF8zx3X71e/9xthEaeTIQYH6WOvz4IXuujSUpCpJY3b15KlizJkSNHcHBw4Ntvv2XIkCEWdyksNSmEhMghntYNav5KYY0yEmZh0P//RIgX1fGyLaD5ZE1SEiI9Op2ORYsWER0dzZdffomfn5/WKb0QKYSEyCG2
/R3OP3cfqWLDmpTFxsay/xoTz6AosPUjuLpbHS9UCTouARtZl01o58qVK9y8eZNGjRqZYm5ubmzbtk27pDJB1hoTIgdIrxtU3isvb1WSbpA1szm2CI4tVQddC0FgMDjm1SYpIYB169ZRo0YNOnTowM2bN7VO56VIISREDrD9XDgXwqUblJN4RZ/CZsen6qCdE3SRiRCFdhISEhg0aBD+/v7ExMQQFRXFJ598onVaL0UujQlh5YxGhRm/q7tB5bzy0LKydIOsSuQViDgHioIuPoaa1+eiU4zqbd6ZB0VrapOfyPUuXryIv78/p0+fNsUCAwOZO3euhlm9PCmEhLBy28/dTdMNGvqmdIOsytElsOVD+P/CJ91fzI3HQeX22ZqWEE+sXr2a/v37Exubcleqk5MTs2fPpnfv3hZ/V9jzSCEkhBVTlLRjg8oWykOrV701ykhk2MXfVEVQul7tBG+Mzr6chPh/jx8/ZtiwYSxZssQUq1ixImvXrqVy5coaZmY+UggJYcW2n7vLubAYVWxok7LYSjfIOoT/Bet7P7sIKlYb2s1+/vIZQpiZoii89dZb7N+/3xTr0aMHc+bMwdXVVcPMzEsGSwthpdLrBpUu6Epr6QZZh0d3YU1nSEq1HEq+EigFyvDIyQdjlS7QJRjsnbTJUeRqOp2OkSNHAuDi4sLy5ctZvnx5jiqCQDpCQlit389H8PcddTdomHSDrIM+HoIDITrVbcev+kP7hSQnJ7NryxZatWqFjb29NjkKQcoSGd9++y2tWrWiYsWKWqeTJaQjJIQVUhSFGb+rZxouXdCVNlV8NMpIvDCjETYNgtvH1PFitaHdD3IJTGjmr7/+4tNPP0VRFFV81KhRObYIAukICWGVdqbTDRr6pnSDrMLer+HvDepYvhLQebVcAhOaUBSFJUuWMHToUBISEvD19aVPn9yzhp10hISwMoqiMDPV2KBSnq60rSrdIIt3Zi3snaqOObpBYAi4emqTk8jVHj16xHvvvUe/fv1ISEgAYMmSJRiNzxjAn8NIISSEldl1IYKzt6NVsaFNykg3yNKF/gk/DVbHdDbQaRkUyrmXHYTlOnXqFH5+fqxevdoUGzhwILt27cLGJveUB3JpTAgrkKA3EJOgB0jTDfL1dKWtjA2ybA9upAyONiSp4y2/gTJNtclJ5FqKojB//nxGjBhBYmIikLJY6qJFi/D399c4u+wnhZAQFu77HReZt/cKScnpt6qHNC6DnW3u+evN6iREw+oAeHxfHa/VH2r10yYnkWtFR0fTr18/1q1bZ4r5+fkREhJC6dKlNcxMO/LbUwgLdvDKfWbuvPTUIqhkARferibdIItlSE6ZMPHeeXW8TFNo8ZU2OYlc7aOPPlIVQcOGDePAgQO5tggCKYSEsGipF1NNbcibZaUbZMl++wQu/66OFawIHZeCrTTkRfabPHkyRYoUIV++fGzcuJGZM2fi6OiodVqakp9EISzUoSuRHLkWle5zDrY2BLxWjPbVi2RzVuKFHVkERxaoYy6eEBgMTu7a5CRyHUVRVIuienp6smnTJjw9PSlZsqR2iVkQKYSEsFCpJ0z0dndi85D62NnocHawxcneVqPMxHNd/h22fqSO2TqmzBWUv6QmKYnc588//2TEiBFs3LgRLy8vU7xmzZoaZmV5pKcuhAU6fDWSP1N1gwY1LkPBvI7kd3WQIsiSRZyHdb1AMajjb8+B4rW1yUnkKoqi8N1331G/fn0OHTpEt27dctW8QBklHSEhLNDMVGODvN2d8K9ZVKNsxAuLuw+r/SFRPes3DT+CKp20yUnkKpGRkfTs2ZNffvnFFIuLiyM6Opr8+fNrmJnlko6QEBbmz6uRHLoaqYoNalQaRzvpAlk0fULKXEEPQ9XxSu2h0VhtchK5yoEDB6hevbqqCPr444/Zs2ePFEHPIB0hISxM6gkTC7s54f9aMY2yES9EUeDnYXDzT3W8iB+8M1cWUhVZymg08s033/Dpp59iMKRckvX09GTlypW89dZbGmdn+aQQEsKCHLkWxcEr6m7QQOkGWb5938KZEHXMvRh0XgP2ztrkJHKFe/fu0b17d7Zt22aKvfHGG6xevZoiReSu0hchl8aEsCAzd6rvFPNycyRAukGW7a8NsHuyOuaQB7oEQ16v9PcRwkx27txpKoJ0Oh3jx49n586dUgRlgHSEhLAQx65HceByqm5Qw9Jyh5glu3UcNg1Ux3Q2KRMmFq6sTU4iV+ncuTPbt29ny5YtrFq1iqZNZe26jJJCSAgLkXpsUKG8jnSuVVyjbMRzPbwJazpDcoI63vxLKNdCm5xEjvfo0SPy5s2ris2ePZuYmBgKFy6sUVbWTS6NCaERRVG49eAx1+7Hsf3vcPZfUi/KObCRdIMsVuKjlCIoLkId9+sFrw9Mfx8hXtLOnTspV64ca9euVcVdXFykCHoJ0hESQgPh0Ql0XXyYK/fi0n2+YF5Hukg3yDIZDfC/vnD3L3W8VCNoNU3uEBNmZzAY+Pzzz/niiy9QFIW+ffvi5+eXqxdKNScphITQwPif/npqEQQyNsii7fgMLm5TxzzLQacfwdZem5xEjnXnzh0CAwPZu3evKVanTp00l8dE5smlMSGy2V+3o9lx7u5Tny+Y15HA2tINskjHlsGh2eqYswcEhoBzPk1SEjnXb7/9RrVq1UxFkK2tLVOmTGHr1q0UKlRI4+xyDukICZHNZqUaFP1fpTxdmdapqnSDLNHVPbDlQ3XMxh4CVoFHKU1SEjlTcnIy48eP5+uvvzbFihYtypo1a6hfv76GmeVMUggJkY3+vhPN9lTdoFHNyjGgUcq1fntbadJapHsXIaQ7GJPV8bYzoWQ9bXISOdLt27cJCAjgwIEDpljr1q1Zvnw5np6eGmaWc8lvXSGyUepuUD4Xe3rWK4m9rY0UQZbqcdT/L6QarY7XHwHVu2qTk8ixdDodFy+mTKxqZ2fHt99+y+bNm6UIykLym1eIbHLuTgy//a3uBvWt70teJxlga7GSkyDkPXhwTR2v2Bbe/EybnESO5uPjw8qVKylZsiT79+9n1KhR2NjIR3VWkktjQmST1N0gd2d7etQtqU0y4vkUBX4ZDjcOqOPe1eDdBSAfTsIMbty4gbu7O/ny5TPFWrRowYULF3B0dNQusVxECiEhssH5sBi2/R2uikk3yMIkJ8G5TRB9K+Vx1BU4FaTeJq9PyhpiDq7Znp7IeTZt2kSvXr148803Wb9+Pbr/zEElRVD2kUJIiGzwwy51N8jNyY4e9Upqk4xIKzkRgjrCtX1P38beBbqsATfv7MtL5EiJiYmMGTOGWbNmAbBhwwaWLl1Knz59NM4sd5JCSIgsdiE8hi1n1d2gPvVL4SbdIMugKPDz8GcXQeig/SLwqZZNSYmc6urVq/j7+3P8+HFTrGPHjnTs2FHDrHI3ucgtRBb7Yedl1WM3Jzt6SjfIcvzxPZxe/extmk6Eim2yJR2Rc61fv57q1aubiiBHR0fmzp3L2rVrcXd31zi73Es6QkJkoX/CH7HlrzBVrHd9X9ydpRtkEc5thp2
T1DF7F/B9I+X/bR2gQhuo4p/9uYkcIyEhgVGjRjF37lxTrGzZsqxdu5Zq1appl5gApBASIkvN2nUJRfn3cV4nO3rV89UuIfGvOydhw/upgv9/CUy6P8JMHjx4wJtvvsmpU6dMsS5durBgwQJZL8xCyKUxIbLIpbuP2HI2VTeonnSDLEL0bVjdGZLj1fFmk6QIEmaVL18+ypQpA4CTkxMLFy4kKChIiiALIh0hIbLIrF2X1d0gRzt6SzdIe4mxsCYAYtUD2Kn+HtQdpk1OIsfS6XQsXryYx48fM2XKFKpUqaJ1SiIVKYSEyAKX7j7ilzN3VLFe9Uri7iLdIE0ZjSmXw8LPquMl6kPr7+E/87gIkRnnz5/n7t27NGrUyBRzd3fn119/1S4p8UxyaUyILPBDet2g+tIN0tzOifBPqg8kj9IQsBLsHDRJSeQcK1asoGbNmnTq1Inbt29rnY54QVIICWFmlyNi+TlVN6hnvZLkc5EPWk2dWAkHZqpjTvkgcC24eGiSksgZ4uLi6NWrFz169ODx48fcv3+fCRMmaJ2WeEFyaUwIM/sh1Z1ieRzt6CPdIG1d25+ybth/2diB/wrwLKNJSiJn+Ouvv/D39+f8+fOmWN++fZk5c+Yz9hKWRAohIV6SwajwT/gj4vXJRMXp+fl0qm5QXekGaSrySsoK8sZkdbz1dCjVUJuchNVTFIWlS5cydOhQ4uNT7j7MkycPCxYsIDAwUOPsREZIISTES4iMTSRg4WEuR8Sm+7yrg610g7T0OApW+0PCQ3W8zhDw66FJSsL6PXr0iIEDBxIU9O+ivFWrVmXt2rWUK1dOw8xEZkghJMRL+Hb7P08tgiBlbFB+V+kGacKgh7XdIVK9xAnlW0Gzz7XJSVg9RVFo2rQpR44cMcUGDhzI9OnTcXJy0jAzkVmaD5aeO3cuvr6+ODk54efnx/79+5+5fVBQEFWrVsXFxQVvb2969epFZGRkNmUrxL9uRj1m3bFbT30+r6MdfeuXysaMhImiwK8j4Xqq3yder6bMHG1jq01ewurpdDpGjx4NQN68eQkJCWHu3LlSBFkxTQuhkJAQhg8fzrhx4zh58iQNGjSgZcuWhIaGprv9H3/8Qffu3enTpw9///0369at4+jRo/Tt2zebMxcC5u65QrLx31HROl1K8ZPX0Y5KPm4s6OYn3SCtHJoNJ1aoY3m8IDAYHPNok5PIMTp27Mh3333HyZMn8feXdeisnaaXxqZPn06fPn1MhcyMGTP47bffmDdvHlOmTEmz/eHDhylZsiTDhqXM/urr60v//v355ptvsjVvIW49eMz64zdVsZ51SzKhbSWNMhImF7bA9vHqmJ0TdFkD7kW1yUlYrRMnThAUFESrVq1U8ZEjR2qUkTA3zQqhpKQkjh8/zscff6yKN2/enIMHD6a7T926dRk3bhxbtmyhZcuWREREsH79elq3bv3U4yQmJpKYmGh6HBMTA4Ber0ev15vhnYiX8eQcWNu5mL3rEnrDv90gRzsb+tYrYXXv47+s9VyohJ/F7n990aGowsnt5qIUqgJW8t5yxLmwcoqiMHfuXD766COSkpJo2rQpvXr10jqtXC2rfh40K4Tu37+PwWDAy8tLFffy8iI8PDzdferWrUtQUBABAQEkJCSQnJxMu3bt+OGHH556nClTpjBp0qQ08d27d+Pi4vJyb0KYzY4dO7RO4YVFJcK6k7bAv8sx1PZM5tj+ndolZUbWdC7+y1H/kIb/TMReH6eKn/PuyKVrdnBti0aZZZ61ngtrFxsby+zZszl8+LApNnv2bAoVKoROlmHRzOPHj7PkdTW/ayz1N5WiKE/9Rjt37hzDhg3js88+o0WLFoSFhTF69GgGDBjAkiVL0t1n7NixqhZmTEwMxYoVo3HjxhQoUMB8b0Rkil6vZ8eOHTRr1gx7e+tYh2vCz+cwKP8Oknaws+Grbm/g5WbdgyWt8VyY6B9ju7IdNvooVdj4qj9l286hrJV9eFn1ubByR44cYfjw4Vy/ft0Ua9euHT/++COurq7aJSay7MYozQohT09PbG1t03R/IiIi0nSJnpgyZQr16tUzjdivUqUKrq6uNGjQgMmTJ+Pt7Z1mH0dHRxwdHdPE7e3t5ReMBbGW83HnYTzrjqvXEAqsVZyiBfJqlJH5Wcu5MDEaYeNQCDuljhd7HZu3Z2NjxWuIWd25sGKKovD999/z0UcfkZycMvlm/vz5WbJkCTY2Nri6usq50FhWff01u2vMwcEBPz+/NK3fHTt2ULdu3XT3efz4MTY26pRtbVNug1UUJb1dhDCreXuuqMYGOdjaMKBhaQ0zEuz+Es79pI7lLwmdg8Au7R9BQqQWGRlJu3btGDVqlKkIqlu3LqdOnaJNmzYaZyeymqa3z48cOZLFixezdOlSzp8/z4gRIwgNDWXAgAFAymWt7t27m7Zv27YtGzZsYN68eVy9epUDBw4wbNgwatWqhY+Pj1ZvQ+QSYdHxhBxV3ynWuVYxCrtb9yUxq3Y6GPZ/q445uqcspOrqqU1Owup8+OGH/PLLL6bHH330EXv27KF48eIaZiWyi6ZjhAICAoiMjOTzzz8nLCyMypUrs2XLFkqUKAFAWFiYak6hnj178ujRI2bPns2oUaPIly8fb775JlOnTtXqLYhcZN6eKyQZjKbHDrY2DGwk3SDN3DgEm4eqYzpb6LQMCpbXJidhlaZOncpvv/2GXq9n5cqVvPXWW1qnJLKR5oOlBw0axKBBg9J9bvny5WliQ4cOZejQoWk3FiILhUcnEHxE3Q0KeK0Y3u7OGmWUy0Vdg5CuYEhSx1t9A2WaaJOTsBqpb8opVKgQP/30Ez4+PhQpUkTDzIQWNF9iQwhrMH+vuhtkb6uTbpBW4h/C6gB4nOoOktoD4DWZZV482759+3j99de5d++eKv7aa69JEZRLSSEkxHPcjUlg9RH1si8BrxXDJ590g7KdIRnW9YT7/6jjZZpB8y81SUlYB4PBwOTJk2ncuDFHjhyhR48eGI3G5+8ocjzNL40JYenm7blCUnLqblAZDTPKpRQFto6Bq7vV8UKvQMelYCu/zkT67t69y3vvvcfvv/9uiiUmJhIbG4ubm5uGmQlLIB0hIZ4hIiaBNam6QZ1qFqOIdIOy358L4FiqiVNdC0KXYHCSDzORvl27dlG1alVTEWRjY8OkSZPYvn27FEECkI6QEM80b+8VElN1gwbJ2KDsd3E7/DZWHbN1hM6rIX8JbXISFs1gMPD555/zxRdfmOaZ8/b2ZvXq1TRq1Ejb5IRFkUJIiKeIiElg9Z/qblBHv2IUzS9r1GWru3/D+t6gpBrP8c5cKFZLm5yERbtz5w5du3Zlz549pljz5s1ZuXIlhQoV0i4xYZHk0pgQT7Fg31VVN8jORrpB2S42AlZ3hqRH6nijsfBqR21yEhZv586dpiLI1taWr776iq1bt0oRJNIlHSEh0hHxKIFVh2+oYp1qFqWYh3SDso0+AYIDIVrdlaNyR2j4kTY5Cavw3nvvsWPHDnbt2kVwcDD169fXOiVhwaQQEiIdC/em1w
2SO8WyjaLAT4Ph1lF1vOhr8PYcsLLV5EXWio6Oxt3d3fRYp9Mxd+5cEhIS8PSUpVbEs8mlMSFSufcokVV/qrtBHWpINyhb7Z0Kf61Xx9yLpQyOtpe13cS/tmzZQunSpdmwYYMqnidPHimCxAuRQkiIVBbuu0KCXt0NGtxYukHZ5ux62DNFHXPIC4EhkEfGeIgUer2eMWPG0Lp1ayIjI+nduzfXrl3TOi1hheTSmBD/cT82kZWpxga1r1GE4gWkG5Qtbh6FTanWHtTZpEyY6FVJm5yExblx4wadO3fm8OHDpljDhg1Vl8eEeFHSERLiPxbtu6rqBtna6BjSuKyGGeUiD25AcBcwJKrjLaZAueba5CQszk8//US1atVMRZC9vT0zZsxg06ZNeHh4aJydsEbSERLi/92PTWTFoVTdoOrSDcoWCTGwpjPEqRfCpGYfqN1fm5yERUlKSmLMmDHMnDnTFPP19SUkJITXXntNw8yEtZNCSIj/t2j/VeL1BtNjWxsdQ96UsUFZzpCcMmFixDl1vFRjaDlV7hATXL9+nU6dOnHs2DFTrEOHDixevJh8+fJpl5jIEeTSmBBAZGwiK1N1g96pVoQSBVw1yigX2f4pXN6hjnmWh07LwdZek5SEZbGzszMNhHZwcGDOnDmsW7dOiiBhFlIICQEs2n+Nx0nqbtBQ6QZlvaOL4c956pizR8odYs75NElJWJ6iRYvy448/Uq5cOQ4fPsygQYPQSadQmIkUQiLXi4pLYsWh66rY29V8KOkp3aAsdXknbBmjjtk6pMwV5OGrTU7CIly+fJno6GhVrHXr1vz1119Ur15do6xETiWFkMj1Fu+/quoG2ehg6Jtyp1iWirgA63qCYlDH286CEnU0SUlYhuDgYGrUqEG/fv1Mq8Y/YW8vl0qF+UkhJHK1B3FJ/Hjwuir2TrUi+Eo3KOvE3YfV/pAYo443GAXVumiTk9BcfHw8/fv3p0uXLjx69Ih169axcuVKrdMSuYDcNSZytcV/XCUuVTdI7hTLQsmJEPIePFQPTOeVt6Hxp9rkJDR34cIF/P39OXv2rCnWrVs32rdvr2FWIreQjpDItVK6QeoP5HZVfShVMI9GGeVwigKbh0HoIXXcpzq8Mx9s5NdRbrRy5Upq1qxpKoKcnZ1ZunQpP/74I3nyyM+iyHrSERK51pI/rhGbmGx6nNINkrFBWWb/d3AmWB1zKwJdgsFBJq3MbeLi4hgyZAjLly83xV555RXWrVvHK6+8ol1iIteRQkjkSg8fJ7E81digtlV9KFNI/gLNEn9vgl1fqGP2rilFUN7CmqQktHP//n0aNmzIuXP/TqLZu3dvfvjhB1xcpCgW2Ut60SJXWpqqG6TTIfMGZZXbx2HjgFRBHXRYDN5VNElJaKtAgQKUK1cOAFdXV1auXMmSJUukCBKakI6QyHWiH+tZduC6Kta2ig9lCuXVJqGcLPoWrOkCyfHqePMvoEIrbXISmtPpdCxduhSDwcC0adMoX7681imJXEwKIZHrLDlwjUepukHDmkg3yOwSY1MWUo29q47X6A51hmiTk9DEmTNnePDgAQ0bNjTF8ufPz+bNmzXMSogUcmlM5CrR8XqWHbimirV+1Vu6QeZmNMCGfhB+Vh0v2QBafScLqeYSiqKwYMECatWqRadOnbhz547WKQmRhhRCIldZ+sc1HiWk7gbJnWJm9/sE+GeLOlagDPivADsHbXIS2SomJoYuXbowYMAAEhMTuXfvHl9++aXWaQmRhlwaE7lGdLyepam6Qa1e9aacl3SDzOr4j3DwB3XMKR8ErgUXD01SEtnrxIkT+Pv7c+XKFVNsyJAhTJs2TcOshEifdIRErrH8wHVVNwhgmMwbZF7X9sGvI9UxGzsIWAUFSmuTk8g2iqIwe/Zs6tSpYyqC3N3dWb9+PT/88ANOTk4aZyhEWtIRErlCTIKeJX9cVcVav+pN+cLSDTKb+5chpBsY1cUmbb4H3wba5CSyzcOHD+nTpw8bNmwwxV577TVCQkLw9fXVMDMhnk0KIZErLD9wnZhU3aChcqeY+TyOgtWdIOGhOl53WMpdYiJHMxqNNGrUiNOnT5tiI0aM4Ouvv8bBQcaECcsml8ZEjpfSDVKPDWpZuTAVCrtplFEOk5wEa7tDlLrjRoU20HSSNjmJbGVjY8PYsWOBlNvif/rpJ6ZPny5FkLAK0hESOd6PB64THa9XxeROMTNRFPh1BFzfr44XrgLtF8pCqrlIQEAAd+7coUOHDhQvXlzrdIR4YfJbSuRojxL0LE7VDXqrUmEqeks3yCwOzoKTq9SxPIX/fyFVV21yElnu0KFDjBs3Lk18xIgRUgQJqyMdIZGjrTh0Q7pBWeX8L7Bjgjpm5wxd1oB7EW1yElnKaDTy7bff8sknn2AwGKhYsSLvvfee1mkJ8VKkIyRyrNjEZBbtV49baVHJi1d8pBv00sJOp8wcjaKOt18IRWpokpLIWvfv36dNmzZ89NFHGAwGAFavXo2iKM/ZUwjLJoWQyLF+PHidh4+lG2R2MWGwujPoH6vjTSbAK+20yUlkqf3791OtWjW2bt0KpCyaOm7cODZv3oxOlksRVk4ujYkcKTYxmcWpukHNXvGiko+7RhnlEEmPUxZSfZRqzaiqgVB/hDY5iSxjNBqZMmUKn332GUajEYCCBQsSFBREs2bNNM5OCPOQQkjkSCsOXedBqm7QB9INejlGI2x8H8JOqePF60LbGbKQag5z9+5dunXrxo4dO0yxxo0bExQUhLe3t4aZCWFecmlM5Dhxicks2qfuBjWt6EXlItINeim7voDzP6tj+X1Tls+wc9QmJ5FlRo4caSqCdDodEydOZMeOHVIEiRxHOkIix1l5+IZ0g8zt1Gr4Y7o65uiespCqawFtchJZavr06ezatQtIGRTduHFjjTMSImtIISRylMdJySxM1Q1qUqEQrxaVblCm3TgIm4epYzpb8P8RCpbTJidhdoqiqAY+e3l58fPPP1OsWDG8vLw0zEyIrCWXxkSOsvLQDaLiklSxD5pKNyjTIq9AcFcwqjtstJoGpaVDkFPs2LGD1157jcjISFW8Zs2aUgSJHE8KIZFjpNcNerNCIaoUzadNQtYu/gGsDoD4KHX89UHwWh9tchJmlZyczKeffkqLFi04fvw4PXv2lHmBRK4jl8ZEjhF0OJTI1N0gGRuUOQY9rOsJkZfU8bItoPlkTVIS5nXr1i0CAwPZv//fdeIMBgOPHz/G1VWWRxG5h3SERI4Qn2Rgwb4rqljj8gWpWiyfNglZM0WBLaPh6h51vFAl6LgEbGw1SUuYz5YtW6hWrZqpCLK1teWbb77hl19+kSJI5DrSERI5QtCfN7gfm3pskAzkzQybowvh+DJ10LUQBAaDY15tkhJmodfrGTduHNOmTTPFihcvTnBwMHXq1NEwMyG0I4WQsHrxSQbm71WPDWpYriDVpBuUYV7Rp7A5NUMdtHVMWUg1n6wqbs1CQ0Pp3Lkzhw4dMsXatWvHsmXL8PDw0DAzIbQll8aE1UvpBiWqYnKnWCbc/Zua1+eiU
4zq+LvzoGhNbXISZrNz505TEWRvb8/333/Ppk2bpAgSuZ50hIRVS9AbWJDqTrE3yhWkRvH8GmVkpR7dxW5tV3TGBHW88Tio3EGbnIRZ9ezZk99//52DBw+ydu1aXnvtNa1TEsIiSCEkrNrqP0O59yhVN0juFMsYfTwEB6KLuaWOv9oJ3hitTU7ipT148ID8+f/9g0Cn0zF//nwMBgP58uXTLjEhLIxcGhNWK0FvYN5e9Z1iDcp64ldCukEvzGiETYPg9jF1vGgtaDdbFlK1Uhs2bKBUqVJs3rxZFc+bN68UQUKkIoWQsFprjqTtBg2XsUEZs/dr+HuDKqS4F4fOq8HeSaOkRGYlJCQwdOhQOnTowMOHD+nZsyc3btzQOi0hLJpcGhNWKUFvYN4edTeofhlP/ErIwM8XdmYt7J2qCultnMA/CPs8BTVKSmTW5cuX8ff35+TJk6ZY8+bNVZfHhBBpSUdIWKXgI6FEpB4bJN2gFxf6J/w0WBVSdDYc8x0ChSpqlJTIrJCQEGrUqGEqghwdHVmwYAFr1qzBzc1N4+yEsGzSERJWJ72xQfXKFOC1ktINeiEPbkBwIBjUE1Aam31FxD0fjZISmREfH8+IESNYsGCBKVa+fHnWrl1LlSpVNMxMCOshHSFhddYeu8ndmNR3isks0i8kITplIdXH99Xx1/phfK2vNjmJTLl8+TKvv/66qgh67733OHbsmBRBQmSAFELCqiQmG5i7W90Nqlu6ALV8pRv0XIZkWN8b7p1Xx0s3gbe+1iYnkWkODg7cvHkTAGdnZ5YsWcKKFSvIkyePxpkJYV2kEBJWZe3Rm4THqCf9k3mDXtBvn8Dl39WxghWg0zKwlavk1qZ48eL8+OOPvPLKKxw9epTevXujk+kOhMgwKYSE1UhMNjA31Z1ir5fyoHapAhplZEWOLIIjC9QxlwIQGAJO7trkJDLk/PnzPHr0SBVr27Ytp0+fplKlShplJYT1k0JIWI21x24RFp26GyRjg57r8u+w9SN1zNYhZa6g/CU1SUm8OEVRWLZsGX5+fvTv3x9FUVTP29lJN0+Il6F5ITR37lx8fX1xcnLCz8+P/fv3P3P7xMRExo0bR4kSJXB0dKR06dIsXbo0m7IVWklMNjBv92VVrLavB3VKSzfomSLOw7peoBjU8Xazofjr2uQkXlhsbCw9evSgd+/exMfHs2bNGkJCQrROS4gcRdM/JUJCQhg+fDhz586lXr16LFiwgJYtW3Lu3DmKFy+e7j7+/v7cvXuXJUuWUKZMGSIiIkhOTs7mzEV2W3/8FndSd4Nk3qBni7sPq/0hMUYdf2M0VA3QJifxwq5fv86YMWO4ePGiKda/f3/efvttDbMSIufRtBCaPn06ffr0oW/flNt2Z8yYwW+//ca8efOYMmVKmu23bdvG3r17uXr1Kh4eKXcJlSxZMjtTFhpISjamuVOslq8HdWRs0NPpE1LmCnoYqo5XehcafaJNTuKFKIrC4sWLGTNmDElJKXM95c2bl4ULF9K5c2eNsxMi59GsEEpKSuL48eN8/PHHqnjz5s05ePBguvts3ryZmjVr8s0337By5UpcXV1p164dX3zxBc7Ozunuk5iYSGLiv3POxMSk/HWs1+vR6/Vmejcis56cg2edi5Cjt7j9MF4VG9LIVzqBT6Mo2G4egs3NP1Vho08NDK1ngcGQ8i+VFzkXImvFxMQwaNAg1q5da4pVq1aNoKAgypYtK+dGA/JzYTmy6hxoVgjdv38fg8GAl5eXKu7l5UV4eHi6+1y9epU//vgDJycnNm7cyP379xk0aBBRUVFPHSc0ZcoUJk2alCa+e/duXFxcXv6NCLPYsWNHuvFkI3x/yhb497bgUnkVos7/yZYL2ZSclSkX/hMVw/6nij2292CfR08Sd+x+7v5POxciaz18+JCxY8cSFhZmirVq1YqePXty6dIlLl26pGF2Qn4utPf48eMseV3NbzdIPe+FoihPnQvDaDSi0+kICgrC3T3llt/p06fTsWNH5syZk25XaOzYsYwcOdL0OCYmhmLFitG4cWMKFJBLK1rT6/Xs2LGDZs2aYW9vn+b5kGO3iEo8p4pN6FCTujJIOl26c5uwO6kughQHV+y7b6CJV+Vn7vu8cyGylqIorFu3jl9//RV3d3f69+/PhAkT5FxoTH4uLEdkZGSWvK5mhZCnpye2trZpuj8RERFpukRPeHt7U6RIEVMRBFCxYkUUReHWrVuULZt28KyjoyOOjo5p4vb29vJNbUHSOx96g5H5+66pYjVL5OeN8l4ycVx6bh2Hn4ekCurQdViKfdHqL/wy8rOhnRUrVtC3b1+mTJnChQsX5FxYEDkX2suqr79mt887ODjg5+eXpt24Y8cO6tatm+4+9erV486dO8TGxppiFy9exMbGhqJFi2ZpviL7bThxi1sP1GODPmhaVoqg9Dy8CWs6Q7L6zjpafAnl39ImJ/FMR48eZd++faqYh4cHGzZsoFSpUhplJUTuo+k8QiNHjmTx4sUsXbqU8+fPM2LECEJDQxkwYACQclmre/fupu0DAwMpUKAAvXr14ty5c+zbt4/Ro0fTu3fvpw6WFtZJbzDywy71vEF+JfJTv4ynRhlZsMRHKUVQXIQ67tcTXh+kSUri6RRFYcaMGdSrVw9/f/+njokUQmQPTQuhgIAAZsyYweeff061atXYt28fW7ZsoUSJEgCEhYURGvrv7b958uRhx44dPHz4kJo1a9K1a1fatm3LrFmztHoLIotsPHE7bTeoiXSD0jAa4H994e5f6rjvG9DqW5Cvl0WJiorinXfeYcSIEej1eu7evcs333yjdVpC5GqaD5YeNGgQgwal/1fr8uXL08QqVKggo/dzOL3ByOxUs0hXL56PBmWlG5TGjs/g4jZ1rEBZ8F8BtjKewZIcOnSIzp07q/64Gz16NF9++aWGWQkhNF9iQ4jUNp68TWiU+jbJ4U3LSTcotWPL4NBsdcw5f8pCqs75tclJpGE0Gpk2bRpvvPGGqQgqUKAAv/zyC998840MwBVCY5p3hIT4r2SDkTmpukHViuXjDekGqV3dA1s+VMds7CFgFRQorUlKIq379+/To0cPtmzZYorVr1+fNWvWyA0eQlgI6QgJi7Lp1B1uRKq7QXKnWCr3LsLa7mBMNbN22xlQsr4mKYm0kpOTadCggakI0ul0fPLJJ+zevVuKICEsiBRCwmIkG4z8sEs9e27VYvloVK6gRhlZoMdRKQupJkSr4/WGQ/X3NElJpM/Ozo5PP/0UgIIFC7Jt2za+/PJL7OykES+EJZGfSGExfkqnGzRc7hT7V3IShLwHD9STTFKhDTSZoE1O4pm6du3KvXv3CAgIwNvbW+t0hBDpkI6QsAjJ6dwpVqWoO43KSzcIAEWBX0bAjQPquHdVaL8QbORHWWu7d+9m3LhxaeLDhw+XIkgICyYdIWERfjkbzrX7carYcBkb9K8DM+HUKnUsrzd0CQYHV21yEgAYDAa++OILPv/8cxRF4dVXX6Vz585apyWEeEHyZ6TQnFGBuXuuqmJVirrTuHwh
jTKyMOd/ht8nqmP2LilFkJuPJimJFGFhYTRr1oxJkyahKAoA69at0zgrIURGSCEkNHfivo5rqcYGDXtTukEA3DkJ/+sHKP8J6lIuh/lU0ygpASnrIlarVo3du3cDYGNjw+TJk6UQEsLKyKUxoSmDUeG3W+p6vHIRN5pUlG4QMXdgTRdIVi81QtOJULGtJimJlNviJ06cyFdffWXqAvn4+LBmzRreeOMNjbMTQmSUFEJCU7+eDSciQd35+aCJzCJNUlzKQqqPwtTxau9BvQ+0yUlw69YtAgMD2b9/vyn21ltvsWLFCgoWlIH9QlgjuTQmNGMwKsxJNTaoko8bTXN7N8hohA3vQ9hpdbxEPWjzvSykqqERI0aYiiBbW1umTp3Kr7/+KkWQEFZMOkJCM7+cucPVVHeKyQrzwM5JcOEXdcyjVMryGXYO2uQkAJg1axb79u3D0dGR4OBg6tatq3VKQoiXJIWQ0ITBqPDDLvW8Qa94u9HsFS+NMrIQJ1fBgRnqmJM7BK4FFw9NUsrNjEYjNv+Zo8nb25tff/2VUqVK4eEh50OInEAujQlNbDkbxuWIWFVsWG7vBl3/A34ero7Z2IH/CvAsq0lKudnmzZupWbMmUVFRqnjNmjWlCBIiB5FCSGQ7o1Fh1k71mmIVCueleW7uBkVeSVk+w6hXx1t/B6UaaZJSbpWUlMTIkSN5++23OXnyJL179zbdHSaEyHnk0pjIdlv+CuNSqm7QkEalsLHJpd2g+AcpC6nGP1DH6wwBv56apJRbXbt2jYCAAI4ePWqK2djYkJCQgLOzs4aZCSGySqYKobi4OL7++mt27txJREQERqNR9fzVq1efsqfI7dLrBnm7KDTLrXeKGfSwtjtEqsdLUa4lNPtcm5xyqQ0bNtC7d2+io6MBcHBw4LvvvmPw4MG5+5KtEDlcpgqhvn37snfvXrp164a3t7f8khAvbOtf4Vy8q+4GvVXUmDu7QYoCWz6Ea/vUca9XocNisLHVJq9cJjExkQ8//JDZs2ebYqVLlyYkJAQ/Pz8NMxNCZIdMFUJbt27l119/pV69eubOR+Rg6XWDyhXKQxWPh9okpLXDc+H4cnUsjxcEBoNjHk1Sym0uX75MQEAAJ06cMMX8/f1ZtGgRbm5uGmYmhMgumRosnT9/frlrQmTYb3+H88/dR6rYkMalyI3NIP7ZCr+NU8fsnKDzGnAvqk1OudCuXbtMRZCjoyPz588nODhYiiAhcpFMFUJffPEFn332GY8fP37+xkKQ0g2ambob5JWHFrnxTrHws7C+D+qFVIF350NRuRSTnfr164e/vz/lypXjzz//pH///nKpX4hcJlOXxr777juuXLmCl5cXJUuWxN7eXvX8f9vMQgBsPxfOhXB1N2hYk7K5b2zQo3BY3Rn06hm1efNTqPSuNjnlIpGRkRQoUMD0WKfTsWjRInQ6HXnz5tUwMyGEVjJVCL3zzjtmTkPkZCndIPVdUWUL5aFVZW8MhmSNstKAPj5lNfmYW+p4lQBo8KE2OeUiq1atYuDAgaxZs4Y2bdqY4nIZTIjcLVOF0IQJE8ydh8jBtp+7y/mwGFXsSTfIYNAoqexmNMLGAXAnVbe02OvQ7gdZSDULPX78mKFDh7J06VIAevTowalTpyhWrJjGmQkhLIFMqCiylKKkvVOsTKE8tHrVW6OMNLLnKzi3SR3LVwI6B4GdoyYp5Qbnzp2jU6dOnDt3zhRr166d3OwhhDB54ULIw8ODixcv4unpSf78+Z85oDD12jwi99px7i7nUnWDhr5ZBtvcNDbodAjsm6aOObqlLKTq6qlNTrnA8uXLGTRoEPHx8QC4uLgwb948unfvrnFmQghL8sKF0Pfff28aTDhjxoysykfkIIqS9k6x0gVdaVPFR6OMNBB6GDYPUcd0ttBpGRSqoE1OOVxsbCyDBw9mxYoVpljlypVZt24dFSrI11wIofbChVCPHj3S/X8hnub38xH8fSft2KBc0w2KugbBgWBIUsdbToUyTbXJKYc7f/487du358KFC6ZYv379mDlzpqwVJoRI10uPEYqPj0evV6+YLXdhiJRu0EVVrFRu6gYlRMOazvA4Uh2v1R9q9dMmp1zA2dmZsLAwAPLkycPChQvp0qWLxlkJISxZpiZUjIuLY8iQIRQqVIg8efKQP39+1T8hdp6P4K/bqbpBb+aSbpAhGdb1hHsX1PEyzaDFV5qklFuULFmSZcuWUb16dU6cOCFFkBDiuTJVCI0ZM4Zdu3Yxd+5cHB0dWbx4MZMmTcLHx0d1XV7kTumNDSrl6UrbqrmkG7TtY7iySx0rWBE6LgVbuVHTnE6fPs2jR+qJOt99912OHDlC2bJlNcpKCGFNMlUI/fzzz8ydO5eOHTtiZ2dHgwYN+PTTT/nqq68ICgoyd47Cyuz+J4Kzt6NVsSG55U6xPxfC0UXqmIsnBIaAk1wyNhdFUZg7dy61atVi4MCBKIp6uRI7Oyk4hRAvJlOFUFRUFL6+vkDKeKAnt8vXr1+fffv2mS87YXUURWHm7+pukK+nK+1yQzfo0u+w7SN1zNYROq+G/CW0ySkHio6Oxt/fn8GDB5OUlERQUBAbNmzQOi0hhJXKVCFUqlQprl+/DsArr7zC2rVrgZROUb58+cyVm7BCe/65x+lbqbpBjctgZ5upbzXrcfdcyrggxaiOvz0HitfWJKWc6OjRo1SvXp3169ebYh988IFqyQwhhMiITH069erVi9OnTwMwduxY01ihESNGMHr0aLMmKKyHoijMSDU2qGQBF96ulsO7QbH3YHUAJKnHqtDwI6jSSZucchhFUZg5cyb16tXj2rVrAOTLl4+NGzcyY8YMHB1ldm4hROZk6kL6iBEjTP/fuHFjLly4wLFjxyhdujRVq1Y1W3LCuuy9eI/TNx+qYkPeLJuzu0H6hJS5gqJD1fHKHaDRWG1yymEePHhA79692bRpkylWu3ZtQkJCKFFCLjkKIV5Ohgqh+Ph4du7caWpDjx07lsTERNPzhw8fpnz58jg5OZk3S2HxFEVhRqqxQSUKuPBOTu4GKUrKrNG3jqjjRWqmXBKThVRf2q1bt6hfvz43btwwxT788EO++uor7O3tNcxMCJFTZKgQWrFiBb/88oupEJo9ezaVKlUyzdh64cIFvL29VR0jkTvsu3SfU6m6QYNz+tigfdPg7Dp1zL1YyuBoe5nF2Bx8fHx45ZVXuHHjBh4eHqxYsYLWrVtrnZYQIgfJ0KdUUFAQvXv3VsVWr17N7t272b17N9OmTTMNnBa5R8qdYupZpIt7uPBu9SIaZZQN/vof7P5SHXPIA12CIa+XNjnlQDY2NqxYsYJOnTpx6tQpKYKEEGaXoULo4sWLlCtXzvTYyckJG5t/X6JWrVqcO3fOfNkJq7D/0n1OhD5UxYY0LoN9Tu0G3TwKGweqYzqblAkTC1fWJqcc4o8//kgzBYenpydr166lWLFiGmUlhMjJMvRJFR0drZqo7N69e5QsWdL02Gg0qsYMiZwvvVmki3k4826NHNoNehgKwV3AkOr7vMVXUK6FNjnlAEa
jkSlTptCoUSMCAgKIiIjQOiUhRC6RoUKoaNGi/PXXX099/syZMxQtWvSlkxLW48DlSI7feKCK5dhuUOIjWN0Z4u6p4zV7Q+0B2uSUA0RERNCyZUs++eQTDAYD4eHhfP/991qnJYTIJTL0adWqVSs+++wzEhIS0jwXHx/PpEmT5Bp+LpJyp5h6bFDR/M60r5EDi2GjAdb3gYi/1fFSjaDlN3KHWCbt2bOHatWqsX37dgB0Oh2fffYZX3zxhcaZCSFyiwzdNfbJJ5+wdu1aypcvz5AhQyhXrhw6nY4LFy4we/ZskpOT+eSTT7IqV2FhDl6J5FiqbtDgnNoN2v4pXPpNHfMsB51+BFu5jTujDAYDkydP5vPPP8doTJmN28vLi6CgIJo0aaJxdkKI3CRDhZCXlxcHDx5k4MCBfPzxx6aFDnU6Hc2aNWPu3Ll4eckdM7lBemuKFcnnTIec2A06ugQOz1XHnD1SFlJ1zqdJStYsPDycrl27smvXLlOsSZMmrFq1isKFC2uYmRAiN8rwzNK+vr5s27aNqKgoLl++DECZMmXw8PAwe3LCch26EsmR61Gq2ODGZXCwy2HdoCu7YUuqZWNs7KFzEHiU0iYnK6bX66lXrx5Xr14FUm6PnzRpEmPHjsXW1lbj7IQQuVGmltgA8PDwoFatWubMRViR1GuKFcnnTEe/HNYNuncR1vYAxaCOt5sFJepqk5OVs7e3Z+LEiXTv3h0fHx9Wr15Nw4YNtU5LCJGLZboQErnXoSuRHLmm7gYNalw6Z3WD4iJhdSdIjFbH64+EaoHa5JRDdOvWjQcPHtClSxcKFiyodTpCiFwuB31yieyS+k4xH3cnOvnloMnukhMh5D14cF0dr9gO3hyvSUrWatu2bYwbNy5NfNiwYVIECSEsgnSERIYcvhrJn6m6QQNz0tggRYGfh0PoQXXcuxq8uwBscsj7zGJ6vZ7x48czdepUAKpVq0anTp00zkoIIdKS3+oiQ1LfKebt7oR/zRw0NuiP7+H0anUsr0/KGmIOLtrkZGVCQ0Np1KiRqQgC+OmnnzTMSAghnk4KIfHC/rwayaGrkarYoEalcbTLIXf7nNsMOyepY/YuEBgMbt7a5GRlfv75Z6pXr87BgykdNTs7O7777jtWrlypcWZCCJE+uTQmXljqNcUKuznh/1oOGRt0+wRseD9VUAcdFoN3VU1SsiZJSUmMHTuW6dOnm2IlSpQgJCSE2rVra5iZEEI8mxRC4oUcuRbFwSupukGNc0g3KPo2rOkCyfHqeLPPoYIsGfM8165do3Pnzhw5csQUe+edd1i6dCn58+fXMDMhhHg+uTQmXsjMneo7xbzcHPGvmQO6QYmxsCYAYsPV8erdoO5QbXKyMiNGjDAVQQ4ODsyaNYsNGzZIESSEsArSERLPdex6FAcuq7tBAxuWxsneyrtBRmPK5bDws+p4yQbQerospPqC5syZw4EDB3Bzc2Pt2rX4+flpnZIQQrwwKYTEc6UeG1QoryOdaxXXKBsz+n0C/POrOuZRGvxXgJ2DNjlZAaPRiM1/phEoUqQIW7dupWzZsri7u2uYmRBCZJxcGhPPdPxGFPsv3VfFBjbKAd2gEyvg4Cx1zCkfBK4FF1k372nWrl1LjRo1ePjwoSpes2ZNKYKEEFZJCiHxTDNSzRtUMK8jXay9G3RtP/wyQh2zsYOAleBZRpucLFx8fDwDBgwgICCA06dP06dPHxRF0TotIYR4aXJpTDzV8RsP0naDrH1sUOSVlOUzjMnqeOvp4PuGNjlZuH/++Qd/f3/OnDljijk7O5OUlISjo6OGmQkhxMuTjpB4qtRjgwrmdSSwthV3gx5HwWp/SHiojtcdCn49NEnJ0gUFBeHn52cqgpydnVmyZAkrV66UIkgIkSNIR0ik62ToA/ZdvKeK9X+jlPV2g5KTYG13iLysjpdvBU0npb9PLvb48WOGDRvGkiVLTLGKFSuydu1aKleurGFmQghhXpp3hObOnYuvry9OTk74+fmxf//+F9rvwIED2NnZUa1ataxNMJdK3Q3yzONI19olNMrmJSkK/DoSrqf63ir8KrRfBDZWWtxlkXPnzlGrVi1VEdSjRw+OHj0qRZAQIsfRtBAKCQlh+PDhjBs3jpMnT9KgQQNatmxJaGjoM/eLjo6me/fuNGnSJJsyzV1O3XzInn/U3aABDUvh7GClBcOh2XAy1VpXeQpDlxBwzKNNThZs//79/P333wC4uLiwfPlyli9fjqurq8aZCSGE+WlaCE2fPp0+ffrQt29fKlasyIwZMyhWrBjz5s175n79+/cnMDCQOnXqZFOmucvM39WzSHvmcbDebtCFLbB9vDpm5wxd1oB7EW1ysnDvv/8+HTt2pHLlyhw7dowePWT8lBAi59JsjFBSUhLHjx/n448/VsWbN29uWrk6PcuWLePKlSusWrWKyZMnP/c4iYmJJCYmmh7HxMQAoNfr0ev1mcw+5zpzK5rdqbpBfeuXxE5nRK83mv14T85BlpyL8LPY/a8vOtS3eSe3m4NS6FWQ8w9AREQEhQoVMp2D5ORk5s+fj52dHS4uLvJzooEs/bkQGSLnwnJk1TnQrBC6f/8+BoMBLy8vVdzLy4vw8PB097l06RIff/wx+/fvx87uxVKfMmUKkyalHQy7e/duXFxcMp54DrfgvA3/bRTmsVPwiDrHli3nsvS4O3bsMOvrOekf8MY/E7HXx6ni57w7cemaHVzbYtbjWSNFUdixYwdLlixhzJgxpqUxzH0uRObJubAcci609/jx4yx5Xc3vGtOlWs9JUZQ0MQCDwUBgYCCTJk2iXLlyL/z6Y8eOZeTIkabHMTExFCtWjMaNG1OgQIHMJ54Dnb0dzblDf6pig5qU4936vll2TL1ez44dO2jWrBn29vZmetHH2K5sh43+gSpsfDWAsm1nU1bWECMmJoZBgwaxdu1aAObNm8ehQ4f4+++/zXsuRKZkyc+FyBQ5F5YjMjLy+RtlgmaFkKenJ7a2tmm6PxEREWm6RACPHj3i2LFjnDx5kiFDhgApax4pioKdnR3bt2/nzTffTLOfo6NjuvOd2Nvbyzd1KnP3XlM99nB1oGe9UtjbZ/23idnOh9EIG4dC2Cl1vHgdbN7+ARtZQ4yTJ0/i7+/P5cv/TiXg7++Pl5cXf//9t/xsWBA5F5ZDzoX2surrr9lgaQcHB/z8/NK0G3fs2EHdunXTbO/m5sbZs2c5deqU6d+AAQMoX748p06donbt2tmVeo509lY0v5+PUMXef6MULg6aNw0zZveXcO4ndSx/SQgIArvcPQGgoijMnTuXOnXqmIogNzc31q1bx5w5c3ByctI4QyGEyH6afsqNHDmSbt26UbNmTerUqcPChQsJDQ1lwIABQMplrdu3b7NixQpsbGzSzGFSqFAhnJycZG4TM0g9b5CHqwPdXreyO8VOrYH936pjju4pC6m65u7LoNHR0fTt25f169ebYn5+foSEhFC6dGkNMxNCCG1pWggFBAQQGRnJ559/TlhYGJUrV2bLli2UKJHyARwWFvbcOYXEy/vrdj
S/n7+rivVrUApXRyvqBt04CJuHqmM6W/BfDgXLa5KSpThz5gzvvvsuV69eNcU++OADpk6dKstkCCFyPc0/6QYNGsSgQYPSfW758uXP3HfixIlMnDjR/EnlMqm7Qfld7Olex4q6QVFXIbgrGFPdWtlqGpROO24st8mTJw/376csnpsvXz6WLVvGO++8o21SQghhITRfYkNo6+870ew4p+4G9bWmblD8Q1jdGeKj1PHaA+G1PpqkZGlKlSrF4sWLqV27NidPnpQiSAgh/kMKoVxuVqpuUD4Xe3rULalNMhll0MO6nnD/H3W8bHNo8aUmKVmCY8eOERennj+pU6dOHDhwgJIlS2qTlBBCWCgphHKxc3di+O3vtGOD8lhDN0hRYOsYuLpbHS9UCTouzZULqRqNRr799lvq1KnD4MGD0zxva5v7viZCCPE8UgjlYqm7Qe7OVjQ26M8FcGypOuZaEAKDwTGvNjlp6P79+7Rr147Ro0eTnJzMjz/+yM8//6x1WkIIYfGs4E9/kRXOh8Ww7W/1ZJb9GviS18kKJgy7uB1+G6uO2TpC5zWQr7g2OWnojz/+oEuXLty6dcsUGzt2LC1bttQwKyGEsA7SEcql0usGWcXYoLt/w/reoKRaAPaduVDsNW1y0ojRaGTKlCk0atTIVAQVLFiQbdu28dVXX73wenxCCJGbyW/KXOhCeAxb/1J3g/rUt4JuUGxEyh1iSY/U8UZj4dWO2uSkkYiICLp168b27dtNsYYNG7J69Wp8fHw0zEwIIayLFEK50A87L6seuznZ0bNeSW2SeVH6eAgOhOhUE2xW7ggNP9ImJ41cv36dunXrEhYWBqQsXDx+/HjGjx8vXSAhhMgg+a2Zy/wT/ohfz4apYn3ql8LNkrtBigI/DYZbR9XxorXg7TmQy1aTL168OFWqVCEsLAwvLy+CgoJo0qSJ1mkJIYRVkjFCucysXeqxQXmtoRu0dyr89T91zL04dA4C+9y3UKiNjQ0rVqwgMDCQU6dOSREkhBAvQTpCucjFu4/YkqYb5Iu7swV3g86uhz1T1DGHvBAYAnkKaZNTNvv9999xcnKifv36plihQoUICgrSMCshhMgZpCOUi8zaeQlF+fdxXic7etXz1S6h57l5BDalWodOZwOdloHXK9rklI2Sk5MZP348zZs3JyAggHv37mmdkhBC5DhSCOUSl+6mHRvUq54Fd4Me3EgZHG1IVMff+hrKNtMmp2x0+/ZtmjRpwuTJk1EUhTt37jB37lyt0xJCiBxHLo3lEj/suqzuBjna0cdSu0EJMbCmM8Sl6oC81hdqva9NTtlo27ZtdOvWzbRivK2tLV9++SWjR4/WODMhhMh5pBDKBS5HPOLnM3dUsV71SuLuYoHdIENyyoSJEefU8dJvwltTc/QdYnq9nvHjxzN16lRTrGjRogQHB1OvXj0NMxNCiJxLCqFcIHU3KI+jHb3rW2g3aPs4uLxDHfMsDx2XgW3O/Xa9efMmnTt35uDBg6ZYmzZtWL58OQUKFNAwMyGEyNly7ieLAOByRCw/n1Z3g3rWLUk+FweNMnqGI4vgz/nqmEuBlDvEnPNpklJ2SExMpF69ety8eRMAOzs7vv76a0aOHIkuB3fAhBDCEshg6Rxu9q5LGFN1g/pYYDdId3U3bE01Q7StAwQEgYfl5WtOjo6OTJo0CYASJUqwf/9+Ro0aJUWQEEJkA+kI5WBX78WyOVU3qEfdEuR3taxuUN7429hu+AoUg/qJdj9AiTraJJXNevbsSVxcHF27diV//vxapyOEELmGdIRysNm7Lqu6Qa4OtvStX0q7hNITd5/aV6ejS0y1kGqDD6FqZ21yymIbN25k3LhxqphOp2PIkCFSBAkhRDaTjlAOdfVeLJtO3VbFetQtaVndoOREbP/XE9ekVLfJv/IONB6X7i7WLDExkdGjR/PDDz8A4OfnR/v27TXOSgghcjfpCOVQs3eru0EuDrb0bWBB3SBFgc3DsLl5WB33qQHvzAObnPWteeXKFerVq2cqggC2bt2qYUZCCCFACqEc6fr9OH46pR4b1L1OSTwsqRu0/zs4E6yOuRWBLmvAwUWbnLLIunXrqFGjBsePHwdSBkfPmzePhQsXapyZEEIIuTSWA83efRnDf9pBLg629GtgQXde/b0Jdn2hCin2rui6BEPewtrklAUSEhIYOXIk8+bNM8XKli3L2rVrqVatmnaJCSGEMJFCKIe5ERnHxpPqsUHd6pSgQB5HjTJK5fZx2NhfFVLQYXhnAXbeVTRKyvwuXryIv78/p0+fNsUCAwOZP38+efPm1TAzIYQQ/yWXxnKY2bvU3SBne1vet5SxQdG3YE0XSE5Qhf8u0hml3FsaJZU1hg8fbiqCnJycWLx4MatWrZIiSAghLIx0hHKQ0MjHbEjVDepuKd2gxFhY3Rli76rCxmrduEJTymuUVlZZuHAh1apVo1ChQqxdu5bKlStrnZIQQoh0SEcoB5m9+1KablC/NyygG2Q0wIZ+cPesOu77Boa3vskRC6kaDOrJIIsWLcr27ds5evSoFEFCCGHBpBDKIW5GPWbDCXU36L3Xi+NpCd2g3yfAP1vUsQJlwH8F2Nprk5MZ/fjjj9SoUYPo6GhVvEaNGri6umqUlRBCiBchhVAOMWf3ZZL/0w1ysrfh/TdKa5jR/zv+Ixz8QR1zzg+Ba1P+a8Xi4uLo2bMnPXv25MyZM/Tr1w9FUZ6/oxBCCIshY4RygJtRj1l//JYq9l7tEhTMq3E36Ope+HWkOmZjDwGroIAFFGkv4ezZs/j7+3PhwgVTzN3dneTkZOztrb/LJYQQuYV0hHKAuXvU3SBHOxveb6jx2KD7l2FtNzAmq+NtZ0DJ+pqkZA6KorB48WJq1aplKoLy5MlDUFAQixYtkiJICCGsjHSErNytB49ZdyxVN+j1EhTK66RRRsDjKFjdCRLUY2ao9wFUf0+bnMzg0aNHDBgwgNWrV5tiVatWZe3atZQrV07DzIQQQmSWdISs3JzdV9J0g/pr2Q1KToKQbhB1VR2v0AaaTNQkJXM4deoUfn5+qiJo4MCBHD58WIogIYSwYtIRsmK3H8az/vhNVSywdnHtukGKAr+OgBt/qOOFq0D7hVa9kOq+ffu4dOkSAG5ubixatAh/f3+NsxJCCPGypBCyYnN3X0ZvUHeDBjbUcBDywVlwcpU6ltcbAkPAwbpvIx86dCi7d+/m5s2bhISEULq0dQ/2FkIIkUIKISt1+2E8a4+pu0FdahWnkJtG3aDzv8COCeqYnXPKavJuPtrk9BLCw8MpXPjfBWB1Oh0//vgjjo6OODpawNxMQgghzMJ6r1XkcvP2qLtBDnY2DGykUZci7HTKzNGkmkOn/ULwqa5JSpmlKAqzZs2iZMmSbN++XfWcm5ubFEFCCJHDSCFkhe48jGftUfWdYoG1iuOlRTco5k7KGmL6x+p404nwSrvsz+clPHjwgA4dOvDBBx+QmJjIe++9R1hYmNZpCSGEyEJyacwKzdtzhSSD0fTYwdaGAVqMDUqKgzWd4dEddbxaV6g3PPvze
Ql//vknAQEB3LhxwxTr3r07BQoU0DArIYQQWU06QlYmLDqekKPqsUGdaxWjsHs2d4OMRtjYP+Wy2H+VqAdtZljNQqqKovDdd99Rv359UxHk4eHB5s2b+fbbb3FwcNA4QyGEEFlJOkJWZn463SBNxgbt+gLO/6yO5fcF/5VgZx3FQ2RkJD179uSXX34xxerWrUtwcDDFihXTMDMhhBDZRTpCViQ8OoE1R9TdoIDXiuHt7py9iZxaDX9MV8ec3FMWUnW1jktJx44do1q1aqoi6OOPP2bPnj1SBAkhRC4iHSErMn+vuhtkb6vL/m7Q9QOweZg6prMF/xVQ0HpmWM6XLx/R0SlLgHh6erJy5UreeustjbMSQgiR3aQjZCXuxiSw+kioKhbwWjF88mVjNyjyCoR0BaNeHW/9HZRqlH15mEGZMmVYtGgRb7zxBqdOnZIiSAghcikphKzEvD1XSEpO3Q0qk30JxD+A1QEp//2v1wdDzV7Zl0cmHTx4kMeP1bf4BwQEsHv3booUKaJRVkIIIbQmhZAViIhJYE2qblCnmsUokl3dIIMe1vWEyEvqeLm3oPkX2ZNDJhkMBiZPnkyDBg0YNmxYmudtrHj9MyGEEC9PPgWswPy9V0lM1Q0alF1jgxQFtoyGq3vUca/K0GEx2NhmTx6ZEB4eTosWLRg/fjxGo5ElS5bw22+/aZ2WEEIICyKFkIWLiEkg6M8bqlhHv2IUze+SPQkcngfHl6ljroWgSzA45s2eHDJh586dVKtWjZ07dwIpnZ/PP/+cpk2bapyZEEIISyJ3jVm4BfvU3SA7m2zsBv2zDX77RB2zc0pZSDWfZd5ibjAY+Pzzz/niiy9QlJS1z7y9vVmzZg0NGzbUODshhBCWRgohCxbxKG03qFPNohTzyIZuUPhf8L8+pFlI9Z15ULRm1h8/E+7cuUNgYCB79+41xVq0aMGKFSsoVKiQhpkJIYSwVFIIWbCFe6+SoE/dDcqGO8Ue3U1ZQywpVh1vPA4qt8/642fC5cuXqVu3Lvfu3QPA1taWyZMnM2bMGBkQLYQQ4qnkE8JC3XuUyKpU3aAONbKhG6SPh+BAiFbPYM2r/vDG6Kw99kvw9fWlatWqABQtWpQ9e/bw8ccfSxEkhBDimeRTwkIt2p+2GzS4cRZ3g4xG2DQQbh9Tx4vVhnY/WPRCqra2tqxatYoePXpw6tQp6tevr3VKQgghrIBcGrNA92MTWXHouirWvkYRihfI4m7Q3q/h743qWL7iEBAE9tm8uv1z/Prrr+TPn5+6deuaYl5eXixfvly7pIQQQlgd6QhZoEX71N0gWxsdQxqXzdqDnlkLe6eqY45uKQup5imYtcfOAL1ez4cffkibNm0ICAggMjJS65SEEEJYMSmELExkbCIrDqnHBrWvnsXdoNA/4afB6pjOBjotg0IVs+64GXT9+nUaNGjAd999B8CtW7dYvHixxlkJIYSwZlIIWZiF+68SrzeYHtva6BjyZhaODXpwPWVwtCFJHW/5DZSxnMkHN23aRPXq1fnzzz8BsLe3Z+bMmYwZM0bjzIQQQlgzGSNkQaLikliZqhv0TrUilCjgmjUHTIhOWUj18X11vNb7UKtf1hwzgxITE/noo4+YOXOmKVaqVClCQkKoWdMy5zMSQghhPaQQsiCL9l/lcZK6GzQ0q7pBhmRY3xvuXVDHSzeBFlOy5pgZdPXqVfz9/Tl+/Lgp1rFjRxYvXoy7u7uGmQkhhMgppBCyEFFxSfx48Loq9nY1H0p6ZlE36LdP4PLv6ljBCinjgmy1/7aIj4+nXr16hIeHA+Do6Mj333/PgAED0FnwbfxCCCGsi4wRshCLU3WDbHQw9M0sulPsyCI4skAdc/GEwBBwsoxOi7OzM1988QUAZcuW5fDhwwwcOFCKICGEEGal/Z/+ggfpdIPeqVYE36zoBl36HbamGmBs6wCdV0P+kuY/3kvo06cPer2e9957j7x5LXeleyGEENZL847Q3Llz8fX1xcnJCT8/P/bv3//UbTds2ECzZs0oWLAgbm5u1KlTh99++y0bs80ai/+4SlyqblCW3CkWcR7W9wLFqI6/PQeK1zb/8TIgODiY8ePHq2I6nY6BAwdKESSEECLLaFoIhYSEMHz4cMaNG8fJkydp0KABLVu2JDQ0NN3t9+3bR7NmzdiyZQvHjx+ncePGtG3blpMnT2Zz5ubz8HESPx5U3ynWrqoPpQrmMe+B4u7Dan9IjFHH3xgDVfzNe6wMiI+PZ86cOXTv3p3JkyezefNmzXIRQgiR+2haCE2fPp0+ffrQt29fKlasyIwZMyhWrBjz5s1Ld/sZM2YwZswYXnvtNcqWLctXX31F2bJl+fnnn7M5c/NZ8sc1YhOTTY9TukFmHhukT0iZK+hhqgKzUnto/Il5j5UB58+fp27duuzYscMU27lzp2b5CCGEyH00GyOUlJTE8ePH+fjjj1Xx5s2bc/DgwRd6DaPRyKNHj/Dw8HjqNomJiSQmJpoex8SkdET0ej16vT4TmZvPw8d6lh64poq1frUwJfI7mi83RcF282Bsbv6pCht9amBoPROSk5+yY9ZauXIlQ4cO5fHjxwC4uLgwa9Ysunfvrvl5ya2efN3l6689ORfmZzQa0ev1KIqSof2Sk5Oxs7MjNjYWOzsZVpuVdDod9vb22Nik36PJqp8Hzc7q/fv3MRgMeHl5qeJeXl6mW6af57vvviMuLg5//6df2pkyZQqTJk1KE9+9ezcuLlm8iOlzbAm1IS7x3xOuQ6Gy7hZbttwy2zHKhf9ExbD/qWKP7T3Y59GTxB27zXacF5WQkMDChQvZtWuXKVa8eHFGjx6Np6cnW7ZsyfachNp/O3RCW3IuzMPW1hZPT0/s7e0ztX/hwoW5evWqmbMS6dHr9dy7dw+j0ZjmuSd/OJub5uVt6tuhFUV5oVuk16xZw8SJE/npp58oVKjQU7cbO3YsI0eOND2OiYmhWLFiNG7cmAIFCmQ+8ZcUHa9n3Hf7gX87Mq1f9aZ3xypmO4bu3CbsTqqLIMXBFfvuG2niVclsx3lRf/31F4GBgVy48O8kjj179qRly5a0adMm07+khHno9Xp27NhBs2bN5FxoTM6F+SiKwu3bt0lOTsbb2/up3YZn7R8XF4erq6tM35HFjEYjYWFheHl5UaRIkTRf76xaZFuzQsjT0xNbW9s03Z+IiIg0XaLUQkJC6NOnD+vWraNp02evh+Xo6Iijo2OauL29vaa/YFbsUY8N0ulgeLNy5svp1nH4eYg6prNB12Ep9kWrmecYGTR69GhTEZQnTx4WLFhAp06d2LJli+bnQ/xLzoXlkHPx8vR6PQkJCfj4+JAnT8ZvQnlySc3Z2TnDRZTIuEKFCnHnzh3TZbL/yqqfBc3OqoODA35+fmlavzt27KBu3bpP3W/NmjX07NmT1atX07p166xOM0tEx+tZlmZskDdlCpnpNvGHN2FNZ0hOUMebfwnl3zLPMTJh6dKl5M+fn6pVq3L8+HECAwM1y0UIkTsYDClTkzg4OGiciXgRT87Tk/OWHTS9NDZy5Ei6
detGzZo1qVOnDgsXLiQ0NJQBAwYAKZe1bt++zYoVK4CUIqh79+7MnDmT119/3dRNcnZ2tqq1p5YduMajBHU3aFgTM90plvgopQiKi1DH/XrB6wPNc4wX9GSQ4RPFixdn586dVKxYEScnp2zNRQiRu8llLeugxXnStM8XEBDAjBkz+Pzzz6lWrRr79u1jy5YtlChRAoCwsDDVnEILFiwgOTmZwYMH4+3tbfr3wQcfaPUWMiw6Xs+SP9TdoFavelPOywzdIKMB/tcX7v6ljvs2hFbTUiqubKAoCvPnz6dGjRo8evRI9Vz16tWlCBJCCGExNB8sPWjQIAYNGpTuc8uXL1c93rNnT9YnlMWWH7iu6gYBDDPXvEE7PoOL29SxAmXB/0ewzZ5xBtHR0bz//vusXbsWgP79+xMUFCR/jQkhRBbQ6XRs3LiRd955R+tUrJaM/MpGMQl6lvyhvgWz9avelC9shm7QsWVwaLY65uwBXdeCc/6Xf/0XcPz4cfz8/ExFEKQMis/Oa71CCJFThIeHM3ToUEqVKoWjoyPFihWjbdu2FjPxrKIoTJw4ER8fH5ydnWnUqBF///231mllmOYdodxk+YHrxKTqBg1tYoY1xa7ugS0fqmM29hCwCjxKvfzrP4eiKMyePZsPP/yQpKQkANzd3Vm6dCnt27fP8uMLIcSLMBoVHjxOyuA+Rh491qO3STTLXWP5XRywsXl+h/z69evUq1ePfPny8c0331ClShX0ej2//fYbgwcPVk1DopVvvvmG6dOns3z5csqVK8fkyZNp1qwZ//zzj1WtESmFUDZ5lJB2bFDLyoWpUNjt5V743kUI6Q7GVDNEt50JJeu93Gu/gAcPHtCnTx82btxoitWqVYvg4GB8fX2z/PhCCPGiHjxOwm/y75rmcPzTphTIk3ZKl9QGDRqETqfjyJEjuLq6muKVKlWid+/eT93vo48+YuPGjdy6dYvChQvTtWtXPvvsM9Ot56dPn2b48OEcO3YMnU5H2bJlWbBgATVr1uTGjRsMGTKEP/74g6SkJEqWLMm0adNo1apVmuMoisKMGTMYN26c6Q/eH3/8ES8vL1avXk3//v0z+qXRjBRC2eTHg9eJjldPD/7Sd4o9jvr/hVSj1fH6I6B615d77Rfw559/0rlzZ65fv26KjRo1iq+++kpuVRVCiEyKiopi27ZtfPnll6oi6Il8+fI9dd+8efOyfPlyfHx8OHv2LP369SNv3ryMGTMGgK5du1K9enXmzZuHra0tp06dMhVJgwcPJikpiX379uHq6sq5c+eeOvfStWvXCA8Pp3nz5qaYo6MjDRs25ODBg1IICbVHCXoW7Vd3g96qVJiK3i/RDUpOgpD34IH6danYFt78LPOvmwEHDhwwFUEeHh4sX76ctm3bZsuxhRAip7p8+TKKolChQoUM7/vpp5+a/r9kyZKMGjWKkJAQUyEUGhrK6NGjTa9dtuy/f5CHhobSoUMHXn31VQBKlXr60Ion09ekt0zWjRs3Mpy3lqQQygYrDt0wbzdIUeCX4XDjgDruXQ3eXQDZNPvpiBEj2L17N1FRUaxZs4bixYtny3GFECIne7IwbGbutl2/fj0zZszg8uXLxMbGkpycjJvbv390jxw5kr59+7Jy5UqaNm1Kp06dKF26NADDhg1j4MCBbN++naZNm9KhQweqVHn2sk+ZXSbLkkghlMViE5NZtF99p1iLSl684vMS3aADM+BUkDqW1we6BIND2jaqudy+fZsiRYqYHut0OoKCgnB2dpZlAIQQFi+/iwPHP332skypGY1GHsXGkjdPHrMNln6esmXLotPpOH/+fIZuiz98+DCdO3dm0qRJtGjRAnd3d4KDg/nuu+9M20ycOJHAwEB+/fVXtm7dyoQJEwgODubdd9+lb9++tGjRgl9//ZXt27czZcoUvvvuO4YOHZrmWIULFwZSOkPe3t6m+Issk2Vp5Pb5LPbjwes8fGzGbtC5zfD7RHXM3gW6rAE373R3eVlGo5GpU6dSqlSpNLdturm5SREkhLAKNjY6CuRxzPA/Dxf7TO2X3r8XuWPMw8ODFi1aMGfOHOLi4tI8//Dhw3T3O3DgACVKlGDcuHHUrFmTsmXLpnuZqly5cowYMYLt27fTvn17li1bZnquWLFiDBgwgA0bNjBq1CgWLVqU7rF8fX0pXLiwapmspKQk9u7d+8xlsiyRFEJZKC4xmcWpukHNXvGikk8mlwO5cxI2vJ8qqIP2i8CnWuZe8znu3btHmzZt+Pjjj0lKSqJr165EREQ8f0chhBCZNnfuXAwGA7Vq1eJ///sfly5d4vz588yaNYs6deqku0+ZMmUIDQ0lODiYK1euMGvWLNUdvfHx8QwZMoQ9e/Zw48YNDhw4wNGjR6lYsSIAw4cP57fffuPatWucOHGCXbt2mZ5LTafTMXz4cL766is2btzIX3/9Rc+ePXFxcbG6dSTl0lgWWnHoBg9SdYM+yGw3KOYOrOkCyfHqeLNJULFNJjN8tn379tGlSxfu3LkDpHzj9+vXDw8Pjyw5nhBCiBS+vr6cOHGCL7/8klGjRhEWFkbBggXx8/Nj3rx56e7z9ttvM2LECIYMGUJiYiKtW7dm/PjxTJw4EQBbW1siIyPp3r07d+/exdPTk/bt2zNp0iQgZaHTwYMHc+vWLdzc3Hjrrbf4/vvvn5rjmDFjiI+PZ9CgQTx48IDatWuzfft2q5pDCECnPBmVlUvExMTg7u7O/fv3KVCgQJYdJy4xmfpTd6kKoaYVvVjco2bGXywpDpa+BeFn1PHq70G72WZfQ8xgMDBlyhQmTJiA0WgEoFChQgQFBdG0acaurz+PXq9ny5YttGrVSi6xaUzOheWQc2E+CQkJXLt2DV9f30ytc2g0GomJicHNzc0sY4TEsz3rfEVGRuLp6Ul0dLRqAPjLko5QFll52EzdIKMx5XJY6iKoRH1o/b3Zi6C7d+/y3nvv8fvv/0469uabb7Jq1SrVgDghhBAiJ5DyNgs8Tkpm0T712KCmFQvxatFMjA3aOQku/KKOeZSCgJVgZ95JCw8ePEjVqlVNRZCNjQ2TJk1i+/btUgQJIYTIkaQjlAVWHb5BZJx6PZsPmpTL+AudXJVyq/x/OblD4FpwMf84nQIFChAbGwuAt7c3q1evplGjRmY/jhBCCGEppCNkZo+TklmwV90NerNCJrpB1/+An4erYzZ24L8SPF9yaY6nKF++PAsWLKBFixacOnVKiiAhhBA5nhRCZhZ0ODSdblAGC5fIKynLZxjVY4xoPR1KNXzJDP+1d+9e4uPVd6F17dqVrVu3UqhQIbMdRwghhLBUUgiZUXySgQX7rqhijcsXpGqxfBl4kQcpC6nGP1DH6wwBvx4vnySQnJzMJ598QqNGjRg+fHia561tenQhhBAis6QQMqOgP29wPzZVN6hpBsYGGfSwtjtEXlbHy7eCZp+bIUO4desWjRs3ZsqUKQAsXLiQXbt2meW1hRBCCGsjhZCZxCcZmJ9qbFCj8gWp9qLdIEWBLR/CtX3quNerKTNH29i+dI6//vor1ap
V448//gDAzs6OadOmyVggIYQQuZYUQmay+kgo92MTVbEMjQ06NAeOL1fH8nhBYDA45nmp3PR6PaNHj6ZNmzZERkYCULx4cfbt28eHH34ok4QJIYSV0ul0bNq0Ses0rJp8AppBgt7A/L3qsUFvlCtI9eL5X+wF/tkK2z9Vx+ycUhZSdS/6UrnduHGDN954g2+//dYUe/vttzl58uRT16sRQgihvfDwcIYOHUqpUqVwdHSkWLFitG3bNs3i11rZsGEDLVq0wNPTE51Ox6lTp7ROKVNkHiEzWP1nKPceZbIbFH4W1vcBUq108u58KOL3UnmdO3eOevXqmVYqtre3Z9q0aQwbNkwGRAshhAW7fv069erVI1++fHzzzTdUqVIFvV7Pb7/9xuDBg7lw4YLWKRIXF0e9evXo1KkT/fr10zqdTJNC6CWl1w1qUNYTvxIv0A16FA6rO4M+Th1/81Oo9O5L51a+fHmqV6/O7t278fX1JSQkhNdee+2lX1cIIayS0QjxURneR/f4EdgmgTmGETh7vNDrDBo0CJ1Ox5EjR3B1dTXFK1WqRO/evZ+630cffcTGjRu5desWhQsXpmvXrnz22WemNetOnz7N8OHDOXbsGDqdjrJly7JgwQJq1qzJjRs3GDJkCH/88QdJSUmULFmSadOm0apVq3SP1a1bNyClaLNmUgi9pDVHQolI1Q0a3vQFukFJj1NWk4+5pY5X6QwNPjRLbra2tgQFBTFhwgS++eYb8uXLZ5bXFUIIqxQfBdNKZ2gXGyATiyM93egr4Or5zE2ioqLYtm0bX375paoIeuJZv8vz5s3L8uXL8fHx4ezZs/Tr14+8efMyZswYIGWuuOrVqzNv3jxsbW05deqUqUgaPHgwSUlJ7Nu3D1dXV86dO0eePC83RtUaSCH0Ep7eDXrO8hdGI2waAHdOqOPFXod2szK9kOr//vc/ihQpwuuvv26KeXt7s3Dhwky9nhBCiOx3+fJlFEWhQoUKGd7300//HW9asmRJRo0aRUhIiKkQCg0NZfTo0abXLlv23z/cQ0ND6dChA6+++ioApUqVepm3YTVksPRLCDl6k7sxmRgbtOcrOPeTOpavBHQOAjvHDOeRkJDAkCFD6NixIwEBAURFZbD1K4QQwmIoSsqY0cyM5Vy/fj3169encOHC5MmTh/HjxxMaGmp6fuTIkfTt25emTZvy9ddfc+XKv3/MDxs2jMmTJ1OvXj0mTJjAmTNnXv7NWAEphDIpQW9g7h71xIf1yhSgZsnndINOh8C+aeqYo1vKQqrPaZem59KlS9StW5c5c+YAKRX9ihUrMvw6QgghLEPZsmXR6XScP38+Q/sdPnyYzp0707JlS3755RdOnjzJuHHjSEr6d6LfiRMn8vfff9O6dWt27drFK6+8wsaNGwHo27cvV69epVu3bpw9e5aaNWvyww8/mPW9WSK5NJZJa4+l1w16zizSoYdh8xB1TGcLnZZDoYy3QIODg+nXr59pxXgnJydmzZpF3759M/xaQgiR4zl7pIzRyQCj0cijR4/ImzeveeZcc37OH8uAh4cHLVq0YM6cOQwbNizNOKGHDx+mO07owIEDlChRgnHjxpliN27cSLNduXLlKFeuHCNGjKBLly4sW7aMd99NuUGnWLFiDBgwgAEDBjB27FgWLVrE0KFDM/gmrYsUQpmQmGxg7m71D1Pd0gWo5fuMb/CoaxAcCAb1Ehy0+gbKNMnQ8ePj4xk+fLhq7E/58uVZu3YtVapUydBrCSFErmFjk/HOu9GIYnAAVzfz3DX2gubOnUvdunWpVasWn3/+OVWqVCE5OZkdO3Ywb968dLtFZcqUITQ0lODgYF577TV+/fVXU7cHUj47Ro8eTceOHfH19eXWrVscPXqUDh06ADB8+HBatmxJuXLlePDgAbt27aJixYpPzTEqKorQ0FDu3LkDwD///ANA4cKFKVy4sDm/HFlKLo1lwtqjNwmPSVDFnjk2KCEaVgfA40h1vPYAeC1j3ZsLFy5Qu3ZtVRHUrVs3jh07JkWQEELkEL6+vpw4cYLGjRszatQoKleuTLNmzdi5cyfz5s1Ld5+3336bESNGMGTIEKpVq8bBgwcZP3686XlbW1siIyPp3r075cqVw9/fn5YtWzJp0iQADAYDgwcPpmLFirz11luUL1+euXPnPjXHzZs3U716dVq3bg1A586dqV69OvPnzzfjVyLr6ZQno7JyiZiYGNzd3bl//z4FChTI8P6JyQYaTdtDWPS/hVCdUgVY8/7r6e9gSIbVneBKqoVNyzSDLsFg++JNudjYWEqWLGlaJsPZ2Zk5c+bQs2dPq50gUa/Xs2XLFlq1amW6hVNoQ86F5ZBzYT4JCQlcu3YNX19fnJycMry/0WgkJiYGNzc3WY4oGzzrfEVGRuLp6Ul0dDRubm5mO6ac1Qxad+yWqggC+OBZ8wZt+zhtEVToFei4NENFEECePHmYPHkykDKp1rFjx+jVq5fVFkFCCCGE1mSMUAakjA1S3ylW29eD10s9pbP050I4ukgdcy2Y0glyylw1279/f2xsbHjvvfdwcXHJ1GsIIYQQIoV0hDJg/fFb3EnVDRre9Cl3il3aAds+UsdsHaHzashf4rnHUhSFpUuXqq7vQsq8Eu+//74UQUIIIYQZSEfoBSUlG9PcKVbL14M6pdPpBt09B+t6gWJUx9+ZC8VqPfdYsbGxDBw4kFWrVgHw+uuvmwajCSGEEMJ8pCP0gtYfv8Xth/Gq2PD07hSLvZdyh1jSI3W84cfwasfnHufMmTP4+fmZiiCA/fv3ZypnIYQQQjybFEIvICnZyJxUY4NqlUynG6RPSJkrKDpUHa/cERp9/MxjKIrCggULqFWrFhcvXgRSFs9bs2YNX3/99Uu/ByGEEEKkJZfGXsCGE2m7QR80Lau+W0tR4KfBcOuIeueir8Hbc565kGpMTAzvv/8+ISEhpliNGjUICQmhTJkyZnkPQgghhEhLOkLPoTcYmZ2qG1SzRH7qpu4G7f0G/lqvjrkXSxkcbf/0uStOnDhhKnqeGDJkCAcPHpQiSAghhMhi0hF6jg0nbnHrQaqxQU3LqbtBZ9enrCj/Xw55IDAE8hR66msrisKIESNMq/+6u7uzZMkS03TnQgghhMha0hF6hvS6QX4l8lOvzH+6QTePwqZB6h11NtBxGXhVeubr63Q6fvzxR/Lly8drr73GyZMnpQgSQghhEUqWLMmMGTNMj3U6HZs2bdIsn6wihdAzbDxxm5tRqcYGNfnP2KCHoRDcBQzqVehpMQXKNU/3NfV6vepxyZIl2b17N3/88Qe+vr5my10IIYT1erJ00pN/BQoU4K233uLMmTOa5RQWFkbLli01O35WkULoKdLrBtUono8GZf9/5eKEGFjdGeLuqXes2Qdq90/zeoqiMGPGDPz8/IiNjVU9V61aNRwcHMyavxBCCOv21ltvERYWRlhYGDt37sTOzo42bdpolk/hwoVxdHTU7PhZRQqhp9h08jahUY9VsQ+ejA0yGuB/fSDib/VOpRpDy6lp7hCLiorinXfeYcSIEZ
w9e5aBAweSy9a6FUIIkUGOjo4ULlyYwoULU61aNT766CNu3rzJvXspf4B/9NFHlCtXDhcXF0qVKsX48eNVVx1Onz5N48aNyZs3L25ubvj5+XHs2DHT8wcPHuSNN97A2dmZYsWKMWzYMOLi4p6az38vjV2/fh2dTseGDRto3LgxLi4uVK1alUOHDqn2yegxtCCFUDqS0+kGVSuWjzeedIO2fwqXtqt38iwPnZaDrXql6EOHDlGtWjU2b95sivn4+EghJIQQ4oXFxsYSFBREmTJlKFAgZZxq3rx5Wb58OefOnWPmzJksWrSI77//3rRP165dKVq0KEePHuX48eN8/PHH2NunfEadPXuWFi1a0L59e86cOUNISAh//PEHQ4YMyVBe48aN48MPP+TUqVOUK1eOLl26kJycbNZjZDW5aywdm07d4Uakuhs0/Mm8QUeXwOG56h2cPVLuEHPOZwoZjUa+/fZbPvnkEwwGAwAFChRgxYoVtGrVKqvfghBCiGeYPn0606dPf+521atXZ+XKlapYu3btOHHixHP3HTlyJCNHjsx0jr/88gt58uQBIC4uDm9vb3755RdsbFJ6GJ9++qlp25IlSzJq1ChCQkIYM2YMAKGhoYwePZoKFSoAULbsv6shTJs2jcDAQIYPH256btasWTRs2JB58+bh5PT0aV/+68MPPzQtATVp0iQqVarE5cuXqVChgtmOkdWkEEol2WBk9q5LqljVYvloWK4gXNkFW0ard7B1SJkryOPfgc737t2jR48ebN261RSrX78+a9asoWjRolmavxBCiOeLiYnh9u3bz92uWLFiaWL37t17oX1jYmIyldsTjRs3Zt68eUDKEIu5c+fSsmVLjhw5QokSJVi/fj0zZszg8uXLxMbGkpycjJubm2n/kSNH0rdvX1auXEnTpk3p1KkTpUuXBuD48eNcvnyZoKAg0/aKomA0Grl27RoVK1Z8oRyrVKli+n9vb28AIiIiqFChgtmOkdWkEErlp1N3uJ66G9SkLLr7F2FtT1AM6h3azoISdUwP9+/fT+fOnblz5w6Qck31k08+YeLEidjZyZdbCCEsgZubG0WKFHnudp6enmliBQsWfKF9/1uUZIarq6tqYl0/Pz/c3d1ZtGgRbdq0oXPnzkyaNIkWLVrg7u5OcHAw3333nWn7iRMnEhgYyK+//srWrVuZMGECwcHBvPvuuxiNRvr378+wYcPSHLd48eIvnOOTS22A6Y5qo9Fo+q85jpHV5JP5P9IbG1S1qDuNitnAYn9IjFbv0GAUVOuiCh0+fNhUBBUqVIhVq1bRrFmzLM1bCCFExrzoZSuj0Zims/PfMZ/ZSafTYWNjQ3x8PAcOHKBEiRKMGzfO9PyNGzfS7FOuXDnKlSvHiBEj6NKlC8uWLePdd9+lRo0a/P3331m6gkF2HMMcZLD0f/x85g7X7qtHsw9vXAJdSDd4cF298StvQ+NPSW3UqFG0bNmSxo0bc+rUKSmChBBCZEpiYiLh4eGEh4dz/vx5hg4dSmxsLG3btqVMmTKEhoYSHBzMlStXmDVrFhs3bjTtGx8fz5AhQ9izZw83btzgwIEDHD161HQ56qOPPuLQoUMMHjyYU6dOcenSJTZv3szQoUPNln92HMMcpCP0/wxGhR92qrtBVYq40ejilxB6UL2xT3V4Zz7Y2HDz5k3VNWQbGxuCg4NxdXXF1tY2O1IXQgiRA23bts007iZv3rxUqFCBdevW0ahRIwBGjBjBkCFDSExMpHXr1owfP56JEycCYGtrS2RkJN27d+fu3bt4enrSvn17Jk2aBKSM7dm7dy/jxo2jQYMGKIpC6dKlCQgIMFv+2XEMc9Apuew+7piYGNzd3bl//77pFkRImTdoeMgp1ba/v3aMMmdT3VXgVgT67cLgUpDJkyfz5Zdfsn37dtM3psgYvV7Pli1baNWqlepas8h+ci4sh5wL80lISODatWv4+vpm6i6lJ5fG3NzcTHdriazzrPMVGRmJp6cn0dHRLz3+6r+kI0RKN2hWqjvF+nv+lbYIsneFLsGExSp0fbsZu3fvBiAwMJCzZ8+qCishhBBCWD4phIBfztzh6r1/xwa9qrvKmPjU80vooMNidvx1l/fea0ZERASQcilsyJAh5M+fPxszFkIIIYQ55PpCyGBUmLXz325QYSJZ7jQdW0OCarvkNycycdVBvvrqK9Os0EWKFGHNmjU0aNAgO1MWQgghhJnk+kLo17NhXPn/bpALCSxx+JYCSpRqm1vF2xP46Qb2799virVs2ZIVK1akO8eEEEIIIaxDri6E/tsNssHITPs5VLJRz8OwJ+EVOo75mcjISCBlJP6UKVMYNWqUDJwTQgghrFyu/iTfcjaMyxGxAIyxC6aZ7XH1Bh6l8fKfTnx8PJAyE+b+/fsZPXq0FEFCCGFFctkN0lZLi/OUaz/Njf/pBvnb7maA3S/qDZzyQdd1VKxRh3nz5tGuXTtOnjxJnTp10r6YEEIIi/RkPrekpCSNMxEv4sl5ys55+HLtpbHfz0dwKSKW123O8aXdUlN8x5VkGpR0xClgFRRIWZyue/fudOvWzbSOihBCCOtgZ2eHi4sL9+7dw97ePsPdfKPRSFJSEgkJCXIlIIsZjUbu3buHi4tLtq7NmWsLoYV/XMNXF8Z8+++x1xlIMih8tCORGX8mMahTHeb4qu8EkyJICCGsj06nw9vbm2vXrqW7FtfzKIpCfHw8zs7O8jmQDWxsbChevHi2fq1zbSF0714Em/NOI58ujmsPjASsf8zROykr5s5dt5Muf/xB/fr1Nc5SCCHEy3JwcKBs2bKZujym1+vZt28fb7zxhszynQ0cHByyvfOmeSE0d+5cpk2bRlhYGJUqVWLGjBnPnJdn7969jBw5kr///hsfHx/GjBnDgAEDMnzc7+3nUsomnP+d09NnczzRiSlxBwcHpk+fTr169TL7loQQQlgYGxubTC2xYWtrS3JyMk5OTlII5VCaXvAMCQlh+PDhjBs3jpMnT9KgQQNatmxJaGhouttfu3aNVq1a0aBBA06ePMknn3zCsGHD+N///pfhY1dVLjBkSzwd1/1bBJUuVcq0Uq60QIUQQoicT9NCaPr06fTp04e+fftSsWJFZsyYQbFixZg3b16628+fP5/ixYszY8YMKlasSN++fenduzfffvttho/dbEUcc47qTY8D2rfjxMmT1KhRI9PvRwghhBDWRbNCKCkpiePHj9O8eXNVvHnz5hw8eDDdfQ4dOpRm+xYtWnDs2DH0en26+zzNmYiUuQoc7WDB1+NYs36TWVezFUIIIYTl02yM0P379zEYDHh5eaniXl5ehIeHp7tPeHh4utsnJydz//59vL290+yTmJhIYmKi6XF0dLTp/0vn17Hk+y+o3GYAUVFRafYVWU+v1/P48WMiIyPl+rvG5FxYDjkXlkPOheV48jlt7kkXNR8snXosjqIozxyfk9726cWfmDJlCpMmTUr3uSsPFBr1/BT4NAMZCyGEEEIrkZGRuLu7m+31NCuEPD09sbW1TdP9iYiISNP1eaJw4cLpb
m9nZ0eBAgXS3Wfs2LGMHDnS9Pjhw4eUKFGC0NBQs34hRebExMRQrFgxbt68KZcmNSbnwnLIubAcci4sR3R0NMWLF8fDw8Osr6tZIeTg4ICfnx87duzg3XffNcV37NjB22+/ne4+derU4eeff1bFtm/fTs2aNZ/asnR0dMTR0TFN3N3dXb6pLYibm5ucDwsh58JyyLmwHHIuLIe55xnS9K6xkSNHsnjxYpYuXcr58+cZMWIEoaGhpnmBxo4dS/fu3U3bDxgwgBs3bjBy5EjOnz/P0qVLWbJkCR9++KFWb0EIIYQQVkzTMUIBAQFERkby+eefExYWRuXKldmyZQslSpQAICwsTDWnkK+vL1u2bGHEiBHMmTMHHx8fZs2aRYcOHbR6C0IIIYSwYpoPlh40aBCDBg1K97nly5eniTVs2JATJ05k+niOjo5MmDAh3ctlIvvJ+bAcci4sh5wLyyHnwnJk1bnQKea+D00IIYQQwkpoOkZICCGEEEJLUggJIYQQIteSQkgIIYQQuZYUQkIIIYTItXJkITR37lx8fX1xcnLCz8+P/fv3P3P7vXv34ufnh5OTE6VKlWL+/PnZlGnOl5FzsWHDBpo1a0bBggVxc3OjTp06/Pbbb9mYbc6X0Z+NJw4cOICdnR3VqlXL2gRzkYyei8TERMaNG0eJEiVwdHSkdOnSLF26NJuyzdkyei6CgoKoWrUqLi4ueHt706tXLyIjI7Mp25xr3759tG3bFh8fH3Q6HZs2bXruPmb5/FZymODgYMXe3l5ZtGiRcu7cOeWDDz5QXF1dlRs3bqS7/dWrVxUXFxflgw8+UM6dO6csWrRIsbe3V9avX5/Nmec8GT0XH3zwgTJ16lTlyJEjysWLF5WxY8cq9vb2yokTJ7I585wpo+fjiYcPHyqlSpVSmjdvrlStWjV7ks3hMnMu2rVrp9SuXVvZsWOHcu3aNeXPP/9UDhw4kI1Z50wZPRf79+9XbGxslJkzZypXr15V9u/fr1SqVEl55513sjnznGfLli3KuHHjlP/9738KoGzcuPGZ25vr8zvHFUK1atVSBgwYoIpVqFBB+fjjj9PdfsyYMUqFChVUsf79+yuvv/56luWYW2T0XKTnlVdeUSZNmmTu1HKlzJ6PgIAA5dNPP1UmTJgghZCZZPRcbN26VXF3d1ciIyOzI71cJaPnYtq0aUqpUqVUsVmzZilFixbNshxzoxcphMz1+Z2jLo0lJSVx/Phxmjdvroo3b96cgwcPprvPoUOH0mzfokULjh07hl6vz7Jcc7rMnIvUjEYjjx49MvsCe7lRZs/HsmXLuHLlChMmTMjqFHONzJyLzZs3U7NmTb755huKFClCuXLl+PDDD4mPj8+OlHOszJyLunXrcuvWLbZs2YKiKNy9e5f169fTunXr7EhZ/Ie5Pr81n1nanO7fv4/BYEizer2Xl1eaVeufCA8PT3f75ORk7t+/j7e3d5blm5Nl5lyk9t133xEXF4e/v39WpJirZOZ8XLp0iY8//pj9+/djZ5ejflVoKjPn4urVq/zxxx84OTmxceNG7t+/z6BBg4iKipJxQi8hM+eibt26BAUFERAQQEJCAsnJybRr144ffvghO1IW/2Guz+8c1RF6QqfTqR4ripIm9rzt04uLjMvouXhizZo1TJw4kZCQEAoVKpRV6eU6L3o+DAYDgYGBTJo0iXLlymVXerlKRn42jEYjOp2OoKAgatWqRatWrZg+fTrLly+XrpAZZORcnDt3jmHDhvHZZ59x/Phxtm3bxrVr10yLhYvsZY7P7xz1Z56npye2trZpKvmIiIg0VeMThQsXTnd7Ozs7ChQokGW55nSZORdPhISE0KdPH9atW0fTpk2zMs1cI6Pn49GjRxw7doyTJ08yZMgQIOXDWFEU7Ozs2L59O2+++Wa25J7TZOZnw9vbmyJFiuDu7m6KVaxYEUVRuHXrFmXLls3SnHOqzJyLKVOmUK9ePUaPHg1AlSpVcHV1pUGDBkyePFmuImQjc31+56iOkMP/tXf3QVFVbxzAv7u4LPsiQViCgGyAblBQvIhDToGJsZnTMg6yo6tAwohTOoAYxkz4B0wRJuDAEI0NA2QSL445RpSuxcuCIxSsk8IqL4IkUU45CkIgL+f3h8PNTULwpxK7z2dmZzjn3nPuc/bMcp+599xdc3P4+PhAo9EY1Gs0Grz00ktTtvH3979n/1OnTsHX1xcCgeCRxWrsHmQugDtXgiIjI1FcXEz33B+i2c6HpaUlzp8/j3PnznGvHTt2QC6X49y5c1i5cuXjCt3oPMhnY9WqVfj1119x69Ytrq6trQ18Ph8ODg6PNF5j9iBzMTQ0BD7f8NRpZmYG4O+rEeTxeGjn71ktrZ4HJh+FzM/PZ62trSwuLo5JJBLW3d3NGGPsvffeY1u3buX2n3z8Lj4+nrW2trL8/Hx6fP4hme1cFBcXswULFrDc3FzW19fHvW7cuDFXQzAqs52Pf6Knxh6e2c7FwMAAc3BwYKGhoaylpYXV1NSwZcuWsejo6LkagtGY7VwUFBSwBQsWsE8++YR1dnayuro65uvry/z8/OZqCEZjYGCA6XQ6ptPpGACWmZnJdDod91UGj+r8bXSJEGOM5ebmMicnJ2Zubs68vb1ZTU0Nty0iIoIFBAQY7F9dXc28vLyYubk5k8lkLC8v7zFHbLxmMxcBAQEMwD2viIiIxx+4kZrtZ+NulAg9XLOdC71ez4KCgphIJGIODg5s9+7dbGho6DFHbZxmOxfZ2dnM3d2diUQiZmdnx9RqNbt69epjjtr4VFVVTXsOeFTnbx5jdC2PEEIIIabJqNYIEUIIIYTMBiVChBBCCDFZlAgRQgghxGRRIkQIIYQQk0WJECGEEEJMFiVChBBCCDFZlAgRQgghxGRRIkQIIVM4dOgQHB0dwefzcfDgwbkOZ1Z4PB6OHz8+12EQMi9QIkTIPBEZGQkejwcejweBQABnZ2fs2bMHg4ODcx3afclksnmVTPT392Pnzp3Yu3cvent7sX379rkOiRDyiBjVr88TYuwUCgUKCgowOjoKrVaL6OhoDA4OIi8vb9Z9McYwPj6OBQvo38A/9fT0YHR0FG+88Qb9mjghRo6uCBEyjwiFQtja2sLR0RGbN2+GWq3mboEwxrB//344OztDJBLhhRdewNGjR7m21dXV4PF4OHnyJHx9fSEUCqHVajExMYH09HS4urpCKBRi6dKl+OCDD7h2vb29UKlUsLa2ho2NDZRKJbq7u7ntkZGRCAkJwYEDB2BnZwcbGxu88847GB0dBQAEBgbiypUriI+P565oAcCff/6JTZs2wcHBAWKxGB4eHvjyyy8NxjswMAC1Wg2JRAI7OztkZWUhMDAQcXFx3D63b99GYmIi7O3tIZFIsHLlSlRXV0/7Pvb09ECpVEIqlcLS0hJhYWH4/fffAQCFhYXw8PAAADg7O4PH4xmM9+7j7ty5E3Z2drCwsIBMJkNaWhq3PTMzEx4eHpBIJHB0dMTbb79t8OvxhYWFsLKyQkVFBeRyOcRiMUJDQzE4OIiioiLIZDJY
W1tj165dGB8f59rJZDKkpqZi8+bNkEqlWLJkCXJycqYd7/3mkBBTRokQIfOYSCTiEo73338fBQUFyMvLQ0tLC+Lj47FlyxbU1NQYtElMTERaWhr0ej08PT2RlJSE9PR0JCcno7W1FcXFxVi8eDEAYGhoCKtXr4ZUKkVtbS3q6uoglUqhUChw+/Ztrs+qqip0dnaiqqoKRUVFKCwsRGFhIQDg2LFjcHBwQEpKCvr6+tDX1wcAGB4eho+PDyoqKnDhwgVs374dW7duRUNDA9fv7t27UV9fjxMnTkCj0UCr1aK5udlgPG+99Rbq6+tRUlKCn3/+GRs3boRCoUB7e/uU7xljDCEhIbh+/Tpqamqg0WjQ2dkJlUoFAFCpVDh9+jQAoLGxEX19fXB0dLynn+zsbJw4cQJlZWW4dOkSvvjiC8hkMm47n89HdnY2Lly4gKKiIvzwww9ITEw06GNoaAjZ2dkoKSnBd999h+rqamzYsAGVlZWorKzE4cOHcejQIYOEFgA+/vhjeHp6orm5GUlJSYiPj4dGo5lyvDOdQ0JM1v/3W7GEkMclIiKCKZVKrtzQ0MBsbGxYWFgYu3XrFrOwsGBnzpwxaBMVFcU2bdrEGPv7l52PHz/Obe/v72dCoZB99tlnUx4zPz+fyeVyNjExwdWNjIwwkUjETp48ycXl5OTExsbGuH02btzIVCoVV3ZycmJZWVn3HeO6detYQkICF5tAIGDl5eXc9hs3bjCxWMxiY2MZY4x1dHQwHo/Hent7DfpZs2YNS0pKmvIYp06dYmZmZqynp4era2lpYQBYY2MjY4wxnU7HALCurq5/jXXXrl3s1VdfNXhvplNWVsZsbGy4ckFBAQPAOjo6uLqYmBgmFovZwMAAVxccHMxiYmK4spOTE1MoFAZ9q1Qq9vrrr3NlAOyrr75ijM1sDgkxZbQ4gJB5pKKiAlKpFGNjYxgdHYVSqUROTg5aW1sxPDyMtWvXGux/+/ZteHl5GdT5+vpyf+v1eoyMjGDNmjVTHq+pqQkdHR1YuHChQf3w8DA6Ozu58nPPPQczMzOubGdnh/Pnz087lvHxcXz00UcoLS1Fb28vRkZGMDIyAolEAgC4fPkyRkdH4efnx7V54oknIJfLuXJzczMYY1i+fLlB3yMjI7CxsZnyuHq9Ho6OjgZXedzd3WFlZQW9Xo8VK1ZMG/ekyMhIrF27FnK5HAqFAuvXr8drr73Gba+qqsKHH36I1tZW9Pf3Y2xsDMPDwxgcHOTGKBaL4eLiwrVZvHgxZDIZpFKpQd21a9cMju3v739P+d8Wo890DgkxVZQIETKPrF69Gnl5eRAIBFiyZAkEAgEAoKurCwDwzTffwN7e3qCNUCg0KE+ehIE7t9amMzExAR8fHxw5cuSebU899RT392Qck3g8HiYmJqbtOyMjA1lZWTh48CC3liYuLo67XcMY4/q622T9ZHxmZmZoamoySMQAGCQT/2z/zz6nq/833t7e6OrqwrfffovTp08jLCwMQUFBOHr0KK5cuYJ169Zhx44dSE1NxZNPPom6ujpERUVxtzKBqd+3B3kvJ/ebykznkBBTRYkQIfOIRCKBq6vrPfXu7u4QCoXo6elBQEDAjPtbtmwZRCIRvv/+e0RHR9+z3dvbG6WlpXj66adhaWn5wHGbm5sbLPgFAK1WC6VSiS1btgC4c8Jub2+Hm5sbAMDFxQUCgQCNjY3c1Zv+/n60t7dzY/Ty8sL4+DiuXbuGl19+eUaxuLu7o6enB7/88gvXb2trK27evMkde6YsLS2hUqmgUqkQGhoKhUKB69ev46effsLY2BgyMjLA599ZillWVjarvqdz9uzZe8rPPvvslPs+rDkkxFjRYmlCjMDChQuxZ88exMfHo6ioCJ2dndDpdMjNzUVRUdG/trOwsMDevXuRmJiIzz//HJ2dnTh79izy8/MBAGq1GosWLYJSqYRWq0VXVxdqamoQGxuLq1evzjg+mUyG2tpa9Pb24o8//gAAuLq6QqPR4MyZM9Dr9YiJicFvv/1mMKaIiAi8++67qKqqQktLC7Zt2wY+n89d/Vi+fDnUajXCw8Nx7NgxdHV14ccff0R6ejoqKyunjCUoKAienp5Qq9Vobm5GY2MjwsPDERAQYHDb8H6ysrJQUlKCixcvoq2tDeXl5bC1tYWVlRVcXFwwNjaGnJwcXL58GYcPH8ann346477vp76+Hvv370dbWxtyc3NRXl6O2NjYKfd9WHNIiLGiRIgQI5Gamop9+/YhLS0Nbm5uCA4Oxtdff41nnnlm2nbJyclISEjAvn374ObmBpVKxa1JEYvFqK2txdKlS7Fhwwa4ublh27Zt+Ouvv2Z1dSElJQXd3d1wcXHhbsckJyfD29sbwcHBCAwMhK2tLUJCQgzaZWZmwt/fH+vXr0dQUBBWrVoFNzc3WFhYcPsUFBQgPDwcCQkJkMvlePPNN9HQ0DDlk17A39+6bG1tjVdeeQVBQUFwdnZGaWnpjMcD3Ln1lp6eDl9fX6xYsQLd3d2orKwEn8/Hiy++iMzMTKSnp+P555/HkSNHDB6t/38lJCSgqakJXl5eSE1NRUZGBoKDg6fc92HNISHGisfuvuFOCCH/YYODg7C3t0dGRgaioqLmOpw5IZPJEBcXZ/BdSoSQB0drhAgh/1k6nQ4XL16En58fbt68iZSUFACAUqmc48gIIcaCEiFCyH/agQMHcOnSJZibm8PHxwdarRaLFi2a67AIIUaCbo0RQgghxGTRYmlCCCGEmCxKhAghhBBisigRIoQQQojJokSIEEIIISaLEiFCCCGEmCxKhAghhBBisigRIoQQQojJokSIEEIIISaLEiFCCCGEmKz/AbJZwh57msdDAAAAAElFTkSuQmCC", - "text/plain": [ - "
    " - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], - "source": [ - "import matplotlib.pyplot as plt\n", - "import numpy as np\n", - "from sklearn.model_selection import train_test_split \n", - "from sklearn.datasets import load_breast_cancer\n", - "from sklearn.linear_model import LogisticRegression\n", - "\n", - "# Load the data\n", - "cancer = load_breast_cancer()\n", - "\n", - "X_train, X_test, y_train, y_test = train_test_split(cancer.data,cancer.target,random_state=0)\n", - "print(X_train.shape)\n", - "print(X_test.shape)\n", - "# Logistic Regression\n", - "logreg = LogisticRegression(solver='lbfgs')\n", - "logreg.fit(X_train, y_train)\n", - "\n", - "from sklearn.preprocessing import LabelEncoder\n", - "from sklearn.model_selection import cross_validate\n", - "#Cross validation\n", - "accuracy = cross_validate(logreg,X_test,y_test,cv=10)['test_score']\n", - "print(accuracy)\n", - "print(\"Test set accuracy with Logistic Regression: {:.2f}\".format(logreg.score(X_test,y_test)))\n", - "\n", - "import scikitplot as skplt\n", - "y_pred = logreg.predict(X_test)\n", - "skplt.metrics.plot_confusion_matrix(y_test, y_pred, normalize=True)\n", - "plt.show()\n", - "y_probas = logreg.predict_proba(X_test)\n", - "skplt.metrics.plot_roc(y_test, y_probas)\n", - "plt.show()\n", - "skplt.metrics.plot_cumulative_gain(y_test, y_probas)\n", - "plt.show()" - ] - }, - { - "cell_type": "markdown", - "id": "fe0c7fda", - "metadata": {}, - "source": [ - "## Optimization, the central part of any Machine Learning algortithm\n", - "\n", - "[Overview Video, why do we care about gradient methods?](https://www.uio.no/studier/emner/matnat/fys/FYS-STK3155/h20/forelesningsvideoer/OverarchingAimsWeek39.mp4?vrtx=view-as-webpage)\n", - "\n", - "Almost every problem in machine learning and data science starts with\n", - "a dataset $X$, a model $g(\\beta)$, which is a function of the\n", - "parameters $\\beta$ and a cost function $C(X, g(\\beta))$ that allows\n", - "us to judge how well the model $g(\\beta)$ explains the observations\n", - "$X$. The model is fit by finding the values of $\\beta$ that minimize\n", - "the cost function. Ideally we would be able to solve for $\\beta$\n", - "analytically, however this is not possible in general and we must use\n", - "some approximative/numerical method to compute the minimum." + "resulting in" ] }, { "cell_type": "markdown", - "id": "9df4ecc4", - "metadata": {}, - "source": [ - "## Revisiting our Logistic Regression case\n", - "\n", - "In our discussion on Logistic Regression we studied the \n", - "case of\n", - "two classes, with $y_i$ either\n", - "$0$ or $1$. Furthermore we assumed also that we have only two\n", - "parameters $\\beta$ in our fitting, that is we\n", - "defined probabilities" - ] - }, - { - "cell_type": "markdown", - "id": "a0a65501", - "metadata": {}, - "source": [ - "$$\n", - "\\begin{align*}\n", - "p(y_i=1|x_i,\\boldsymbol{\\beta}) &= \\frac{\\exp{(\\beta_0+\\beta_1x_i)}}{1+\\exp{(\\beta_0+\\beta_1x_i)}},\\nonumber\\\\\n", - "p(y_i=0|x_i,\\boldsymbol{\\beta}) &= 1 - p(y_i=1|x_i,\\boldsymbol{\\beta}),\n", - "\\end{align*}\n", - "$$" - ] - }, - { - "cell_type": "markdown", - "id": "8dc1194c", - "metadata": {}, - "source": [ - "where $\\boldsymbol{\\beta}$ are the weights we wish to extract from data, in our case $\\beta_0$ and $\\beta_1$." 
- ] - }, - { - "cell_type": "markdown", - "id": "62d70952", - "metadata": {}, - "source": [ - "## The equations to solve\n", - "\n", - "Our compact equations used a definition of a vector $\\boldsymbol{y}$ with $n$\n", - "elements $y_i$, an $n\\times p$ matrix $\\boldsymbol{X}$ which contains the\n", - "$x_i$ values and a vector $\\boldsymbol{p}$ of fitted probabilities\n", - "$p(y_i\\vert x_i,\\boldsymbol{\\beta})$. We rewrote in a more compact form\n", - "the first derivative of the cost function as" - ] - }, - { - "cell_type": "markdown", - "id": "41787d1e", - "metadata": {}, - "source": [ - "$$\n", - "\\frac{\\partial \\mathcal{C}(\\boldsymbol{\\beta})}{\\partial \\boldsymbol{\\beta}} = -\\boldsymbol{X}^T\\left(\\boldsymbol{y}-\\boldsymbol{p}\\right).\n", - "$$" - ] - }, - { - "cell_type": "markdown", - "id": "86fc7282", - "metadata": {}, - "source": [ - "If we in addition define a diagonal matrix $\\boldsymbol{W}$ with elements \n", - "$p(y_i\\vert x_i,\\boldsymbol{\\beta})(1-p(y_i\\vert x_i,\\boldsymbol{\\beta})$, we can obtain a compact expression of the second derivative as" - ] - }, - { - "cell_type": "markdown", - "id": "8f4c640e", - "metadata": {}, - "source": [ - "$$\n", - "\\frac{\\partial^2 \\mathcal{C}(\\boldsymbol{\\beta})}{\\partial \\boldsymbol{\\beta}\\partial \\boldsymbol{\\beta}^T} = \\boldsymbol{X}^T\\boldsymbol{W}\\boldsymbol{X}.\n", - "$$" - ] - }, - { - "cell_type": "markdown", - "id": "f37d28ca", - "metadata": {}, - "source": [ - "This defines what is called the Hessian matrix." - ] - }, - { - "cell_type": "markdown", - "id": "9b5cb6dd", - "metadata": {}, - "source": [ - "## Solving using Newton-Raphson's method\n", - "\n", - "If we can set up these equations, Newton-Raphson's iterative method is normally the method of choice. It requires however that we can compute in an efficient way the matrices that define the first and second derivatives. 
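Since Newton-Raphson needs exactly these first and second derivatives, a small sketch (toy data assumed; not the course's reference implementation) of assembling the gradient $-\boldsymbol{X}^T(\boldsymbol{y}-\boldsymbol{p})$ and the Hessian $\boldsymbol{X}^T\boldsymbol{W}\boldsymbol{X}$ and using them in a few updates may help fix ideas:

```python
import numpy as np

def gradient_and_hessian(beta, X, y):
    # Fitted probabilities p(y_i = 1 | x_i, beta)
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    grad = -X.T @ (y - p)              # first derivative: -X^T (y - p)
    W = p * (1.0 - p)                  # diagonal of the weight matrix
    hessian = X.T @ (W[:, None] * X)   # second derivative: X^T W X
    return grad, hessian

# Assumed toy data, for illustration only
rng = np.random.default_rng(0)
x = rng.uniform(-2, 2, size=200)
X = np.c_[np.ones_like(x), x]
y = (rng.uniform(size=200) < 1.0 / (1.0 + np.exp(-(0.5 + 2.0 * x)))).astype(float)

beta = np.zeros(2)
for _ in range(10):                    # a few Newton-Raphson steps
    grad, H = gradient_and_hessian(beta, X, y)
    beta = beta - np.linalg.solve(H, grad)
print(beta)
```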
\n", - "\n", - "Our iterative scheme is then given by" - ] - }, - { - "cell_type": "markdown", - "id": "f474d68a", - "metadata": {}, + "id": "5b6a9003", + "metadata": { + "editable": true + }, "source": [ "$$\n", - "\\boldsymbol{\\beta}^{\\mathrm{new}} = \\boldsymbol{\\beta}^{\\mathrm{old}}-\\left(\\frac{\\partial^2 \\mathcal{C}(\\boldsymbol{\\beta})}{\\partial \\boldsymbol{\\beta}\\partial \\boldsymbol{\\beta}^T}\\right)^{-1}_{\\boldsymbol{\\beta}^{\\mathrm{old}}}\\times \\left(\\frac{\\partial \\mathcal{C}(\\boldsymbol{\\beta})}{\\partial \\boldsymbol{\\beta}}\\right)_{\\boldsymbol{\\beta}^{\\mathrm{old}}},\n", + "\\left[\\int_{-\\infty}^{\\infty}dxp(x)\\exp{\\left(iq(\\mu-x)/m\\right)}\\right]^m\\approx\n", + " \\left[1-\\frac{q^2\\sigma^2}{2m^2}+\\dots \\right]^m,\n", "$$" ] }, { "cell_type": "markdown", - "id": "ff928190", - "metadata": {}, + "id": "88b7b6c2", + "metadata": { + "editable": true + }, "source": [ - "or in matrix form as" + "and in the limit $m\\rightarrow \\infty$ we obtain" ] }, { "cell_type": "markdown", - "id": "a9e9efc2", - "metadata": {}, + "id": "bb8051d4", + "metadata": { + "editable": true + }, "source": [ "$$\n", - "\\boldsymbol{\\beta}^{\\mathrm{new}} = \\boldsymbol{\\beta}^{\\mathrm{old}}-\\left(\\boldsymbol{X}^T\\boldsymbol{W}\\boldsymbol{X} \\right)^{-1}\\times \\left(-\\boldsymbol{X}^T(\\boldsymbol{y}-\\boldsymbol{p}) \\right)_{\\boldsymbol{\\beta}^{\\mathrm{old}}}.\n", + "\\tilde{p}(z)=\\frac{1}{\\sqrt{2\\pi}(\\sigma/\\sqrt{m})}\n", + " \\exp{\\left(-\\frac{(z-\\mu)^2}{2(\\sigma/\\sqrt{m})^2}\\right)},\n", "$$" ] }, { "cell_type": "markdown", - "id": "93061994", - "metadata": {}, + "id": "4950aac9", + "metadata": { + "editable": true + }, "source": [ - "The right-hand side is computed with the old values of $\\beta$. \n", - "\n", - "If we can compute these matrices, in particular the Hessian, the above is often the easiest method to implement." + "which is the normal distribution with variance\n", + "$\\sigma^2_m=\\sigma^2/m$, where $\\sigma$ is the variance of the PDF $p(x)$\n", + "and $\\mu$ is also the mean of the PDF $p(x)$." ] }, { "cell_type": "markdown", - "id": "08acd443", - "metadata": {}, + "id": "6d705546", + "metadata": { + "editable": true + }, "source": [ - "## Brief reminder on Newton-Raphson's method\n", + "## Wrapping it up\n", "\n", - "Let us quickly remind ourselves how we derive the above method.\n", + "Thus, the central limit theorem states that the PDF $\\tilde{p}(z)$ of\n", + "the average of $m$ random values corresponding to a PDF $p(x)$ \n", + "is a normal distribution whose mean is the \n", + "mean value of the PDF $p(x)$ and whose variance is the variance\n", + "of the PDF $p(x)$ divided by $m$, the number of values used to compute $z$.\n", "\n", - "Perhaps the most celebrated of all one-dimensional root-finding\n", - "routines is Newton's method, also called the Newton-Raphson\n", - "method. This method requires the evaluation of both the\n", - "function $f$ and its derivative $f'$ at arbitrary points. \n", - "If you can only calculate the derivative\n", - "numerically and/or your function is not of the smooth type, we\n", - "normally discourage the use of this method." - ] - }, - { - "cell_type": "markdown", - "id": "caa94b50", - "metadata": {}, - "source": [ - "## The equations\n", - "\n", - "The Newton-Raphson formula consists geometrically of extending the\n", - "tangent line at a current point until it crosses zero, then setting\n", - "the next guess to the abscissa of that zero-crossing. 
The mathematics\n", - "behind this method is rather simple. Employing a Taylor expansion for\n", - "$x$ sufficiently close to the solution $s$, we have" - ] - }, - { - "cell_type": "markdown", - "id": "ac3e7ef2", - "metadata": {}, - "source": [ - "\n", - "
    \n", - "\n", - "$$\n", - "f(s)=0=f(x)+(s-x)f'(x)+\\frac{(s-x)^2}{2}f''(x) +\\dots.\n", - " \\label{eq:taylornr} \\tag{2}\n", - "$$" - ] - }, - { - "cell_type": "markdown", - "id": "6bd1aafd", - "metadata": {}, - "source": [ - "For small enough values of the function and for well-behaved\n", - "functions, the terms beyond linear are unimportant, hence we obtain" - ] - }, - { - "cell_type": "markdown", - "id": "699697a1", - "metadata": {}, - "source": [ - "$$\n", - "f(x)+(s-x)f'(x)\\approx 0,\n", - "$$" - ] - }, - { - "cell_type": "markdown", - "id": "4efbdd72", - "metadata": {}, - "source": [ - "yielding" + "The central limit theorem leads to the well-known expression for the\n", + "standard deviation, given by" ] }, { "cell_type": "markdown", - "id": "4bd64a59", - "metadata": {}, + "id": "749b506b", + "metadata": { + "editable": true + }, "source": [ "$$\n", - "s\\approx x-\\frac{f(x)}{f'(x)}.\n", + "\\sigma_m=\n", + "\\frac{\\sigma}{\\sqrt{m}}.\n", "$$" ] }, { "cell_type": "markdown", - "id": "358dc6db", - "metadata": {}, + "id": "02d5afea", + "metadata": { + "editable": true + }, "source": [ - "Having in mind an iterative procedure, it is natural to start iterating with" + "The latter is true only if the average value is known exactly. This is obtained in the limit\n", + "$m\\rightarrow \\infty$ only. Because the mean and the variance are measured quantities we obtain \n", + "the familiar expression in statistics (the so-called Bessel correction)" ] }, { "cell_type": "markdown", - "id": "8a007c48", - "metadata": {}, + "id": "2664f854", + "metadata": { + "editable": true + }, "source": [ "$$\n", - "x_{n+1}=x_n-\\frac{f(x_n)}{f'(x_n)}.\n", + "\\sigma_m\\approx \n", + "\\frac{\\sigma}{\\sqrt{m-1}}.\n", "$$" ] }, { "cell_type": "markdown", - "id": "e0828d1d", - "metadata": {}, - "source": [ - "## Simple geometric interpretation\n", - "\n", - "The above is Newton-Raphson's method. It has a simple geometric\n", - "interpretation, namely $x_{n+1}$ is the point where the tangent from\n", - "$(x_n,f(x_n))$ crosses the $x$-axis. Close to the solution,\n", - "Newton-Raphson converges fast to the desired result. However, if we\n", - "are far from a root, where the higher-order terms in the series are\n", - "important, the Newton-Raphson formula can give grossly inaccurate\n", - "results. For instance, the initial guess for the root might be so far\n", - "from the true root as to let the search interval include a local\n", - "maximum or minimum of the function. If an iteration places a trial\n", - "guess near such a local extremum, so that the first derivative nearly\n", - "vanishes, then Newton-Raphson may fail totally" - ] - }, - { - "cell_type": "markdown", - "id": "26efa0c4", - "metadata": {}, + "id": "a986ee46", + "metadata": { + "editable": true + }, "source": [ - "## Extending to more than one variable\n", + "In many cases however the above estimate for the standard deviation,\n", + "in particular if correlations are strong, may be too simplistic. Keep\n", + "in mind that we have assumed that the variables $x$ are independent\n", + "and identically distributed. This is obviously not always the\n", + "case. For example, the random numbers (or better pseudorandom numbers)\n", + "we generate in various calculations do always exhibit some\n", + "correlations.\n", "\n", - "Newton's method can be generalized to systems of several non-linear equations\n", - "and variables. 
Consider the case with two equations" - ] - }, - { - "cell_type": "markdown", - "id": "8af30001", - "metadata": {}, - "source": [ - "$$\n", - "\\begin{array}{cc} f_1(x_1,x_2) &=0\\\\\n", - " f_2(x_1,x_2) &=0,\\end{array}\n", - "$$" + "The theorem is satisfied by a large class of PDFs. Note however that for a\n", + "finite $m$, it is not always possible to find a closed form /analytic expression for\n", + "$\\tilde{p}(x)$." ] }, { "cell_type": "markdown", - "id": "77528641", - "metadata": {}, - "source": [ - "which we Taylor expand to obtain" - ] - }, - { - "cell_type": "markdown", - "id": "d10154f0", - "metadata": {}, - "source": [ - "$$\n", - "\\begin{array}{cc} 0=f_1(x_1+h_1,x_2+h_2)=&f_1(x_1,x_2)+h_1\n", - " \\partial f_1/\\partial x_1+h_2\n", - " \\partial f_1/\\partial x_2+\\dots\\\\\n", - " 0=f_2(x_1+h_1,x_2+h_2)=&f_2(x_1,x_2)+h_1\n", - " \\partial f_2/\\partial x_1+h_2\n", - " \\partial f_2/\\partial x_2+\\dots\n", - " \\end{array}.\n", - "$$" - ] - }, - { - "cell_type": "markdown", - "id": "58a6cb05", - "metadata": {}, - "source": [ - "Defining the Jacobian matrix ${\\bf \\boldsymbol{J}}$ we have" - ] - }, - { - "cell_type": "markdown", - "id": "87917443", - "metadata": {}, - "source": [ - "$$\n", - "{\\bf \\boldsymbol{J}}=\\left( \\begin{array}{cc}\n", - " \\partial f_1/\\partial x_1 & \\partial f_1/\\partial x_2 \\\\\n", - " \\partial f_2/\\partial x_1 &\\partial f_2/\\partial x_2\n", - " \\end{array} \\right),\n", - "$$" - ] - }, - { - "cell_type": "markdown", - "id": "316440eb", - "metadata": {}, - "source": [ - "we can rephrase Newton's method as" - ] - }, - { - "cell_type": "markdown", - "id": "4ec22184", - "metadata": {}, - "source": [ - "$$\n", - "\\left(\\begin{array}{c} x_1^{n+1} \\\\ x_2^{n+1} \\end{array} \\right)=\n", - "\\left(\\begin{array}{c} x_1^{n} \\\\ x_2^{n} \\end{array} \\right)+\n", - "\\left(\\begin{array}{c} h_1^{n} \\\\ h_2^{n} \\end{array} \\right),\n", - "$$" - ] - }, - { - "cell_type": "markdown", - "id": "9da35b82", - "metadata": {}, - "source": [ - "where we have defined" - ] - }, - { - "cell_type": "markdown", - "id": "61c4f7fc", - "metadata": {}, - "source": [ - "$$\n", - "\\left(\\begin{array}{c} h_1^{n} \\\\ h_2^{n} \\end{array} \\right)=\n", - " -{\\bf \\boldsymbol{J}}^{-1}\n", - " \\left(\\begin{array}{c} f_1(x_1^{n},x_2^{n}) \\\\ f_2(x_1^{n},x_2^{n}) \\end{array} \\right).\n", - "$$" - ] - }, - { - "cell_type": "markdown", - "id": "ffd39c16", - "metadata": {}, + "id": "f21341e3", + "metadata": { + "editable": true + }, "source": [ - "We need thus to compute the inverse of the Jacobian matrix and it\n", - "is to understand that difficulties may\n", - "arise in case ${\\bf \\boldsymbol{J}}$ is nearly singular.\n", + "## Confidence Intervals\n", "\n", - "It is rather straightforward to extend the above scheme to systems of\n", - "more than two non-linear equations. In our case, the Jacobian matrix is given by the Hessian that represents the second derivative of cost function." - ] - }, - { - "cell_type": "markdown", - "id": "de590520", - "metadata": {}, - "source": [ - "## Steepest descent\n", + "Confidence intervals are used in statistics and represent a type of estimate\n", + "computed from the observed data. 
This gives a range of values for an\n", + "unknown parameter such as the parameters $\\boldsymbol{\\theta}$ from linear regression.\n", "\n", - "The basic idea of gradient descent is\n", - "that a function $F(\\mathbf{x})$, \n", - "$\\mathbf{x} \\equiv (x_1,\\cdots,x_n)$, decreases fastest if one goes from $\\bf {x}$ in the\n", - "direction of the negative gradient $-\\nabla F(\\mathbf{x})$.\n", + "With the OLS expressions for the parameters $\\boldsymbol{\\theta}$ we found \n", + "$\\mathbb{E}(\\boldsymbol{\\theta}) = \\boldsymbol{\\theta}$, which means that the estimator of the regression parameters is unbiased.\n", "\n", - "It can be shown that if" - ] - }, - { - "cell_type": "markdown", - "id": "6a0e0292", - "metadata": {}, - "source": [ - "$$\n", - "\\mathbf{x}_{k+1} = \\mathbf{x}_k - \\gamma_k \\nabla F(\\mathbf{x}_k),\n", - "$$" - ] - }, - { - "cell_type": "markdown", - "id": "ec6877a5", - "metadata": {}, - "source": [ - "with $\\gamma_k > 0$.\n", + "In the exercises this week we show that the variance of the estimate of the $j$-th regression coefficient is\n", + "$\\boldsymbol{\\sigma}^2 (\\boldsymbol{\\theta}_j ) = \\boldsymbol{\\sigma}^2 [(\\mathbf{X}^{T} \\mathbf{X})^{-1}]_{jj} $.\n", "\n", - "For $\\gamma_k$ small enough, then $F(\\mathbf{x}_{k+1}) \\leq\n", - "F(\\mathbf{x}_k)$. This means that for a sufficiently small $\\gamma_k$\n", - "we are always moving towards smaller function values, i.e a minimum." + "This quantity can be used to\n", + "construct a confidence interval for the estimates." ] }, { "cell_type": "markdown", - "id": "b7e72c2f", - "metadata": {}, + "id": "b22eb043", + "metadata": { + "editable": true + }, "source": [ - "## More on Steepest descent\n", + "## Standard Approach based on the Normal Distribution\n", "\n", - "The previous observation is the basis of the method of steepest\n", - "descent, which is also referred to as just gradient descent (GD). One\n", - "starts with an initial guess $\\mathbf{x}_0$ for a minimum of $F$ and\n", - "computes new approximations according to" + "We will assume that the parameters $\\theta$ follow a normal\n", + "distribution. We can then define the confidence interval. Here we will be using as\n", + "shorthands $\\mu_{\\theta}$ for the above mean value and $\\sigma_{\\theta}$\n", + "for the standard deviation. We have then a confidence interval" ] }, { "cell_type": "markdown", - "id": "cae90d84", - "metadata": {}, + "id": "a2e2c4c5", + "metadata": { + "editable": true + }, "source": [ "$$\n", - "\\mathbf{x}_{k+1} = \\mathbf{x}_k - \\gamma_k \\nabla F(\\mathbf{x}_k), \\ \\ k \\geq 0.\n", + "\\left(\\mu_{\\theta}\\pm \\frac{z\\sigma_{\\theta}}{\\sqrt{n}}\\right),\n", "$$" ] }, { "cell_type": "markdown", - "id": "1ab31b86", - "metadata": {}, - "source": [ - "The parameter $\\gamma_k$ is often referred to as the step length or\n", - "the learning rate within the context of Machine Learning." - ] - }, - { - "cell_type": "markdown", - "id": "87d0d18e", - "metadata": {}, - "source": [ - "## The ideal\n", - "\n", - "Ideally the sequence $\\{\\mathbf{x}_k \\}_{k=0}$ converges to a global\n", - "minimum of the function $F$. In general we do not know if we are in a\n", - "global or local minimum. In the special case when $F$ is a convex\n", - "function, all local minima are also global minima, so in this case\n", - "gradient descent can converge to the global solution. The advantage of\n", - "this scheme is that it is conceptually simple and straightforward to\n", - "implement. 
However the method in this form has some severe\n", - "limitations:\n", - "\n", - "In machine learing we are often faced with non-convex high dimensional\n", - "cost functions with many local minima. Since GD is deterministic we\n", - "will get stuck in a local minimum, if the method converges, unless we\n", - "have a very good intial guess. This also implies that the scheme is\n", - "sensitive to the chosen initial condition.\n", - "\n", - "Note that the gradient is a function of $\\mathbf{x} =\n", - "(x_1,\\cdots,x_n)$ which makes it expensive to compute numerically." - ] - }, - { - "cell_type": "markdown", - "id": "c92a82a1", - "metadata": {}, - "source": [ - "## The sensitiveness of the gradient descent\n", - "\n", - "The gradient descent method \n", - "is sensitive to the choice of learning rate $\\gamma_k$. This is due\n", - "to the fact that we are only guaranteed that $F(\\mathbf{x}_{k+1}) \\leq\n", - "F(\\mathbf{x}_k)$ for sufficiently small $\\gamma_k$. The problem is to\n", - "determine an optimal learning rate. If the learning rate is chosen too\n", - "small the method will take a long time to converge and if it is too\n", - "large we can experience erratic behavior.\n", - "\n", - "Many of these shortcomings can be alleviated by introducing\n", - "randomness. One such method is that of Stochastic Gradient Descent\n", - "(SGD), to be discussed next week." - ] - }, - { - "cell_type": "markdown", - "id": "b5a9af46", - "metadata": {}, + "id": "be028ae6", + "metadata": { + "editable": true + }, "source": [ - "## Convex functions\n", - "\n", - "Ideally we want our cost/loss function to be convex(concave).\n", + "where $z$ defines the level of certainty (or confidence). For a normal\n", + "distribution typical parameters are $z=2.576$ which corresponds to a\n", + "confidence of $99\\%$ while $z=1.96$ corresponds to a confidence of\n", + "$95\\%$. A confidence level of $95\\%$ is commonly used and it is\n", + "normally referred to as a *two-sigmas* confidence level, that is we\n", + "approximate $z\\approx 2$.\n", "\n", - "First we give the definition of a convex set: A set $C$ in\n", - "$\\mathbb{R}^n$ is said to be convex if, for all $x$ and $y$ in $C$ and\n", - "all $t \\in (0,1)$ , the point $(1 − t)x + ty$ also belongs to\n", - "C. Geometrically this means that every point on the line segment\n", - "connecting $x$ and $y$ is in $C$ as discussed below.\n", + "For more discussions of confidence intervals (and in particular linked with a discussion of the bootstrap method), see chapter 5 of the textbook by [Davison on the Bootstrap Methods and their Applications](https://www.cambridge.org/core/books/bootstrap-methods-and-their-application/ED2FD043579F27952363566DC09CBD6A)\n", "\n", - "The convex subsets of $\\mathbb{R}$ are the intervals of\n", - "$\\mathbb{R}$. Examples of convex sets of $\\mathbb{R}^2$ are the\n", - "regular polygons (triangles, rectangles, pentagons, etc...)." + "In this text you will also find an in-depth discussion of the\n", + "Bootstrap method, why it works and various theorems related to it." ] }, { "cell_type": "markdown", - "id": "77ee5272", - "metadata": {}, + "id": "e746545b", + "metadata": { + "editable": true + }, "source": [ - "## Convex function\n", + "## Resampling methods: Bootstrap background\n", "\n", - "**Convex function**: Let $X \\subset \\mathbb{R}^n$ be a convex set. 
Assume that the function $f: X \\rightarrow \\mathbb{R}$ is continuous, then $f$ is said to be convex if $$f(tx_1 + (1-t)x_2) \\leq tf(x_1) + (1-t)f(x_2) $$ for all $x_1, x_2 \\in X$ and for all $t \\in [0,1]$. If $\\leq$ is replaced with a strict inequaltiy in the definition, we demand $x_1 \\neq x_2$ and $t\\in(0,1)$ then $f$ is said to be strictly convex. For a single variable function, convexity means that if you draw a straight line connecting $f(x_1)$ and $f(x_2)$, the value of the function on the interval $[x_1,x_2]$ is always below the line as illustrated below." + "Since $\\widehat{\\theta} = \\widehat{\\theta}(\\boldsymbol{X})$ is a function of random variables,\n", + "$\\widehat{\\theta}$ itself must be a random variable. Thus it has\n", + "a pdf, call this function $p(\\boldsymbol{t})$. The aim of the bootstrap is to\n", + "estimate $p(\\boldsymbol{t})$ by the relative frequency of\n", + "$\\widehat{\\theta}$. You can think of this as using a histogram\n", + "in the place of $p(\\boldsymbol{t})$. If the relative frequency closely\n", + "resembles $p(\\vec{t})$, then using numerics, it is straight forward to\n", + "estimate all the interesting parameters of $p(\\boldsymbol{t})$ using point\n", + "estimators." ] }, { "cell_type": "markdown", - "id": "282df4c7", - "metadata": {}, + "id": "dea3037c", + "metadata": { + "editable": true + }, "source": [ - "## Conditions on convex functions\n", - "\n", - "In the following we state first and second-order conditions which\n", - "ensures convexity of a function $f$. We write $D_f$ to denote the\n", - "domain of $f$, i.e the subset of $R^n$ where $f$ is defined. For more\n", - "details and proofs we refer to: [S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press](http://stanford.edu/boyd/cvxbook/, 2004).\n", - "\n", - "**First order condition.**\n", - "\n", - "Suppose $f$ is differentiable (i.e $\\nabla f(x)$ is well defined for\n", - "all $x$ in the domain of $f$). Then $f$ is convex if and only if $D_f$\n", - "is a convex set and $$f(y) \\geq f(x) + \\nabla f(x)^T (y-x) $$ holds\n", - "for all $x,y \\in D_f$. This condition means that for a convex function\n", - "the first order Taylor expansion (right hand side above) at any point\n", - "a global under estimator of the function. To convince yourself you can\n", - "make a drawing of $f(x) = x^2+1$ and draw the tangent line to $f(x)$ and\n", - "note that it is always below the graph.\n", + "## Resampling methods: More Bootstrap background\n", "\n", - "**Second order condition.**\n", + "In the case that $\\widehat{\\theta}$ has\n", + "more than one component, and the components are independent, we use the\n", + "same estimator on each component separately. If the probability\n", + "density function of $X_i$, $p(x)$, had been known, then it would have\n", + "been straightforward to do this by: \n", + "1. Drawing lots of numbers from $p(x)$, suppose we call one such set of numbers $(X_1^*, X_2^*, \\cdots, X_n^*)$. \n", "\n", - "Assume that $f$ is twice\n", - "differentiable, i.e the Hessian matrix exists at each point in\n", - "$D_f$. Then $f$ is convex if and only if $D_f$ is a convex set and its\n", - "Hessian is positive semi-definite for all $x\\in D_f$. For a\n", - "single-variable function this reduces to $f''(x) \\geq 0$. Geometrically this means that $f$ has nonnegative curvature\n", - "everywhere.\n", + "2. Then using these numbers, we could compute a replica of $\\widehat{\\theta}$ called $\\widehat{\\theta}^*$. 
\n", "\n", - "This condition is particularly useful since it gives us an procedure for determining if the function under consideration is convex, apart from using the definition." + "By repeated use of the above two points, many\n", + "estimates of $\\widehat{\\theta}$ can be obtained. The\n", + "idea is to use the relative frequency of $\\widehat{\\theta}^*$\n", + "(think of a histogram) as an estimate of $p(\\boldsymbol{t})$." ] }, { "cell_type": "markdown", - "id": "e435596b", - "metadata": {}, + "id": "fd576cb1", + "metadata": { + "editable": true + }, "source": [ - "## More on convex functions\n", - "\n", - "The next result is of great importance to us and the reason why we are\n", - "going on about convex functions. In machine learning we frequently\n", - "have to minimize a loss/cost function in order to find the best\n", - "parameters for the model we are considering. \n", - "\n", - "Ideally we want the\n", - "global minimum (for high-dimensional models it is hard to know\n", - "if we have local or global minimum). However, if the cost/loss function\n", - "is convex the following result provides invaluable information:\n", - "\n", - "**Any minimum is global for convex functions.**\n", + "## Resampling methods: Bootstrap approach\n", "\n", - "Consider the problem of finding $x \\in \\mathbb{R}^n$ such that $f(x)$\n", - "is minimal, where $f$ is convex and differentiable. Then, any point\n", - "$x^*$ that satisfies $\\nabla f(x^*) = 0$ is a global minimum.\n", + "But\n", + "unless there is enough information available about the process that\n", + "generated $X_1,X_2,\\cdots,X_n$, $p(x)$ is in general\n", + "unknown. Therefore, [Efron in 1979](https://projecteuclid.org/euclid.aos/1176344552) asked the\n", + "question: What if we replace $p(x)$ by the relative frequency\n", + "of the observation $X_i$?\n", "\n", - "This result means that if we know that the cost/loss function is convex and we are able to find a minimum, we are guaranteed that it is a global minimum." + "If we draw observations in accordance with\n", + "the relative frequency of the observations, will we obtain the same\n", + "result in some asymptotic sense? The answer is yes." ] }, { "cell_type": "markdown", - "id": "7bc1bf29", - "metadata": {}, - "source": [ - "## Some simple problems\n", - "\n", - "1. Show that $f(x)=x^2$ is convex for $x \\in \\mathbb{R}$ using the definition of convexity. Hint: If you re-write the definition, $f$ is convex if the following holds for all $x,y \\in D_f$ and any $\\lambda \\in [0,1]$ $\\lambda f(x)+(1-\\lambda)f(y)-f(\\lambda x + (1-\\lambda) y ) \\geq 0$.\n", - "\n", - "2. Using the second order condition show that the following functions are convex on the specified domain.\n", - "\n", - " * $f(x) = e^x$ is convex for $x \\in \\mathbb{R}$.\n", - "\n", - " * $g(x) = -\\ln(x)$ is convex for $x \\in (0,\\infty)$.\n", - "\n", - "3. Let $f(x) = x^2$ and $g(x) = e^x$. Show that $f(g(x))$ and $g(f(x))$ is convex for $x \\in \\mathbb{R}$. Also show that if $f(x)$ is any convex function than $h(x) = e^{f(x)}$ is convex.\n", - "\n", - "4. 
A norm is any function that satisfy the following properties\n", - "\n", - " * $f(\\alpha x) = |\\alpha| f(x)$ for all $\\alpha \\in \\mathbb{R}$.\n", - "\n", - " * $f(x+y) \\leq f(x) + f(y)$\n", - "\n", - " * $f(x) \\leq 0$ for all $x \\in \\mathbb{R}^n$ with equality if and only if $x = 0$\n", - "\n", - "Using the definition of convexity, try to show that a function satisfying the properties above is convex (the third condition is not needed to show this)." - ] - }, - { - "cell_type": "markdown", - "id": "90fef1a2", - "metadata": {}, + "id": "8629a2e8", + "metadata": { + "editable": true + }, "source": [ - "## Revisiting our first homework\n", - "\n", - "We will use linear regression as a case study for the gradient descent\n", - "methods. Linear regression is a great test case for the gradient\n", - "descent methods discussed in the lectures since it has several\n", - "desirable properties such as:\n", + "## Resampling methods: Bootstrap steps\n", "\n", - "1. An analytical solution (recall homework set 1).\n", + "The independent bootstrap works like this: \n", "\n", - "2. The gradient can be computed analytically.\n", + "1. Draw with replacement $n$ numbers for the observed variables $\\boldsymbol{x} = (x_1,x_2,\\cdots,x_n)$. \n", "\n", - "3. The cost function is convex which guarantees that gradient descent converges for small enough learning rates\n", + "2. Define a vector $\\boldsymbol{x}^*$ containing the values which were drawn from $\\boldsymbol{x}$. \n", "\n", - "We revisit an example similar to what we had in the first homework set. We had a function of the type" - ] - }, - { - "cell_type": "code", - "execution_count": 12, - "id": "1c59342a", - "metadata": {}, - "outputs": [], - "source": [ - "x = 2*np.random.rand(m,1)\n", - "y = 4+3*x+np.random.randn(m,1)" - ] - }, - { - "cell_type": "markdown", - "id": "79d0e3da", - "metadata": {}, - "source": [ - "with $x_i \\in [0,1] $ is chosen randomly using a uniform distribution. Additionally we have a stochastic noise chosen according to a normal distribution $\\cal {N}(0,1)$. 
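As a quick numerical sanity check of the convexity definition used in these exercises (illustration only; the random evaluation points are an assumption), one can verify that the convexity gap for $f(x)=x^2$ is never negative:

```python
import numpy as np

# Check  lambda*f(x) + (1-lambda)*f(y) - f(lambda*x + (1-lambda)*y) >= 0
# at random points for f(x) = x**2 (a numerical illustration of the definition).
rng = np.random.default_rng(1)
f = lambda t: t**2
x, y = rng.normal(size=1000), rng.normal(size=1000)
lam = rng.uniform(size=1000)
gap = lam * f(x) + (1 - lam) * f(y) - f(lam * x + (1 - lam) * y)
print(gap.min() >= -1e-12)   # True: the convexity gap is nonnegative everywhere
```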
\n", - "The linear regression model is given by" - ] - }, - { - "cell_type": "markdown", - "id": "ec79a08a", - "metadata": {}, - "source": [ - "$$\n", - "h_\\beta(x) = \\boldsymbol{y} = \\beta_0 + \\beta_1 x,\n", - "$$" - ] - }, - { - "cell_type": "markdown", - "id": "fa4910ae", - "metadata": {}, - "source": [ - "such that" - ] - }, - { - "cell_type": "markdown", - "id": "e7665e13", - "metadata": {}, - "source": [ - "$$\n", - "\\boldsymbol{y}_i = \\beta_0 + \\beta_1 x_i.\n", - "$$" - ] - }, - { - "cell_type": "markdown", - "id": "90ffc363", - "metadata": {}, - "source": [ - "## Gradient descent example\n", - "\n", - "Let $\\mathbf{y} = (y_1,\\cdots,y_n)^T$, $\\mathbf{\\boldsymbol{y}} = (\\boldsymbol{y}_1,\\cdots,\\boldsymbol{y}_n)^T$ and $\\beta = (\\beta_0, \\beta_1)^T$\n", - "\n", - "It is convenient to write $\\mathbf{\\boldsymbol{y}} = X\\beta$ where $X \\in \\mathbb{R}^{100 \\times 2} $ is the design matrix given by (we keep the intercept here)" - ] - }, - { - "cell_type": "markdown", - "id": "3aa073fa", - "metadata": {}, - "source": [ - "$$\n", - "X \\equiv \\begin{bmatrix}\n", - "1 & x_1 \\\\\n", - "\\vdots & \\vdots \\\\\n", - "1 & x_{100} & \\\\\n", - "\\end{bmatrix}.\n", - "$$" - ] - }, - { - "cell_type": "markdown", - "id": "e1ddc571", - "metadata": {}, - "source": [ - "The cost/loss/risk function is given by (" - ] - }, - { - "cell_type": "markdown", - "id": "5709f3d7", - "metadata": {}, - "source": [ - "$$\n", - "C(\\beta) = \\frac{1}{n}||X\\beta-\\mathbf{y}||_{2}^{2} = \\frac{1}{n}\\sum_{i=1}^{100}\\left[ (\\beta_0 + \\beta_1 x_i)^2 - 2 y_i (\\beta_0 + \\beta_1 x_i) + y_i^2\\right]\n", - "$$" - ] - }, - { - "cell_type": "markdown", - "id": "b7b3b90f", - "metadata": {}, - "source": [ - "and we want to find $\\beta$ such that $C(\\beta)$ is minimized." - ] - }, - { - "cell_type": "markdown", - "id": "6651ef6c", - "metadata": {}, - "source": [ - "## The derivative of the cost/loss function\n", - "\n", - "Computing $\\partial C(\\beta) / \\partial \\beta_0$ and $\\partial C(\\beta) / \\partial \\beta_1$ we can show that the gradient can be written as" - ] - }, - { - "cell_type": "markdown", - "id": "646be0cc", - "metadata": {}, - "source": [ - "$$\n", - "\\nabla_{\\beta} C(\\beta) = \\frac{2}{n}\\begin{bmatrix} \\sum_{i=1}^{100} \\left(\\beta_0+\\beta_1x_i-y_i\\right) \\\\\n", - "\\sum_{i=1}^{100}\\left( x_i (\\beta_0+\\beta_1x_i)-y_ix_i\\right) \\\\\n", - "\\end{bmatrix} = \\frac{2}{n}X^T(X\\beta - \\mathbf{y}),\n", - "$$" + "3. Using the vector $\\boldsymbol{x}^*$ compute $\\widehat{\\theta}^*$ by evaluating $\\widehat \\theta$ under the observations $\\boldsymbol{x}^*$. \n", + "\n", + "4. Repeat this process $k$ times. \n", + "\n", + "When you are done, you can draw a histogram of the relative frequency\n", + "of $\\widehat \\theta^*$. This is your estimate of the probability\n", + "distribution $p(t)$. Using this probability distribution you can\n", + "estimate any statistics thereof. In principle you never draw the\n", + "histogram of the relative frequency of $\\widehat{\\theta}^*$. Instead\n", + "you use the estimators corresponding to the statistic of interest. For\n", + "example, if you are interested in estimating the variance of $\\widehat\n", + "\\theta$, apply the etsimator $\\widehat \\sigma^2$ to the values\n", + "$\\widehat \\theta^*$." ] }, { "cell_type": "markdown", - "id": "b6f528c2", - "metadata": {}, + "id": "ab8c1d8a", + "metadata": { + "editable": true + }, "source": [ - "where $X$ is the design matrix defined above." 
+ "## Code example for the Bootstrap method\n", + "\n", + "The following code starts with a Gaussian distribution with mean value\n", + "$\\mu =100$ and variance $\\sigma=15$. We use this to generate the data\n", + "used in the bootstrap analysis. The bootstrap analysis returns a data\n", + "set after a given number of bootstrap operations (as many as we have\n", + "data points). This data set consists of estimated mean values for each\n", + "bootstrap operation. The histogram generated by the bootstrap method\n", + "shows that the distribution for these mean values is also a Gaussian,\n", + "centered around the mean value $\\mu=100$ but with standard deviation\n", + "$\\sigma/\\sqrt{n}$, where $n$ is the number of bootstrap samples (in\n", + "this case the same as the number of original data points). The value\n", + "of the standard deviation is what we expect from the central limit\n", + "theorem." ] }, { - "cell_type": "markdown", - "id": "ae40f47b", - "metadata": {}, + "cell_type": "code", + "execution_count": 1, + "id": "d7b87cf8", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "%matplotlib inline\n", + "\n", + "import numpy as np\n", + "from time import time\n", + "from scipy.stats import norm\n", + "import matplotlib.pyplot as plt\n", + "\n", + "# Returns mean of bootstrap samples \n", + "# Bootstrap algorithm\n", + "def bootstrap(data, datapoints):\n", + " t = np.zeros(datapoints)\n", + " n = len(data)\n", + " # non-parametric bootstrap \n", + " for i in range(datapoints):\n", + " t[i] = np.mean(data[np.random.randint(0,n,n)])\n", + " # analysis \n", + " print(\"Bootstrap Statistics :\")\n", + " print(\"original bias std. error\")\n", + " print(\"%8g %8g %14g %15g\" % (np.mean(data), np.std(data),np.mean(t),np.std(t)))\n", + " return t\n", + "\n", + "# We set the mean value to 100 and the standard deviation to 15\n", + "mu, sigma = 100, 15\n", + "datapoints = 10000\n", + "# We generate random numbers according to the normal distribution\n", + "x = mu + sigma*np.random.randn(datapoints)\n", + "# bootstrap returns the data sample \n", + "t = bootstrap(x, datapoints)" + ] + }, + { + "cell_type": "markdown", + "id": "d57a0c6c", + "metadata": { + "editable": true + }, "source": [ - "## The Hessian matrix\n", - "The Hessian matrix of $C(\\beta)$ is given by" + "We see that our new variance and from that the standard deviation, agrees with the central limit theorem." ] }, { "cell_type": "markdown", - "id": "592c656d", - "metadata": {}, + "id": "bd8574db", + "metadata": { + "editable": true + }, "source": [ - "$$\n", - "\\boldsymbol{H} \\equiv \\begin{bmatrix}\n", - "\\frac{\\partial^2 C(\\beta)}{\\partial \\beta_0^2} & \\frac{\\partial^2 C(\\beta)}{\\partial \\beta_0 \\partial \\beta_1} \\\\\n", - "\\frac{\\partial^2 C(\\beta)}{\\partial \\beta_0 \\partial \\beta_1} & \\frac{\\partial^2 C(\\beta)}{\\partial \\beta_1^2} & \\\\\n", - "\\end{bmatrix} = \\frac{2}{n}X^T X.\n", - "$$" + "## Plotting the Histogram" ] }, { - "cell_type": "markdown", - "id": "aaff093b", - "metadata": {}, + "cell_type": "code", + "execution_count": 2, + "id": "5715940c", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], "source": [ - "This result implies that $C(\\beta)$ is a convex function since the matrix $X^T X$ always is positive semi-definite." 
+ "# the histogram of the bootstrapped data (normalized data if density = True)\n", + "n, binsboot, patches = plt.hist(t, 50, density=True, facecolor='red', alpha=0.75)\n", + "# add a 'best fit' line \n", + "y = norm.pdf(binsboot, np.mean(t), np.std(t))\n", + "lt = plt.plot(binsboot, y, 'b', linewidth=1)\n", + "plt.xlabel('x')\n", + "plt.ylabel('Probability')\n", + "plt.grid(True)\n", + "plt.show()" ] }, { "cell_type": "markdown", - "id": "dd177bee", - "metadata": {}, + "id": "9584858b", + "metadata": { + "editable": true + }, "source": [ - "## Simple program\n", + "## The bias-variance tradeoff\n", "\n", - "We can now write a program that minimizes $C(\\beta)$ using the gradient descent method with a constant learning rate $\\gamma$ according to" + "We will discuss the bias-variance tradeoff in the context of\n", + "continuous predictions such as regression. However, many of the\n", + "intuitions and ideas discussed here also carry over to classification\n", + "tasks. Consider a dataset $\\mathcal{D}$ consisting of the data\n", + "$\\mathbf{X}_\\mathcal{D}=\\{(y_j, \\boldsymbol{x}_j), j=0\\ldots n-1\\}$. \n", + "\n", + "Let us assume that the true data is generated from a noisy model" ] }, { "cell_type": "markdown", - "id": "94ead835", - "metadata": {}, + "id": "6f3cee73", + "metadata": { + "editable": true + }, "source": [ "$$\n", - "\\beta_{k+1} = \\beta_k - \\gamma \\nabla_\\beta C(\\beta_k), \\ k=0,1,\\cdots\n", + "\\boldsymbol{y}=f(\\boldsymbol{x}) + \\boldsymbol{\\epsilon}\n", "$$" ] }, { "cell_type": "markdown", - "id": "75c4e856", - "metadata": {}, + "id": "fecd4f4b", + "metadata": { + "editable": true + }, "source": [ - "We can use the expression we computed for the gradient and let use a\n", - "$\\beta_0$ be chosen randomly and let $\\gamma = 0.001$. Stop iterating\n", - "when $||\\nabla_\\beta C(\\beta_k) || \\leq \\epsilon = 10^{-8}$. **Note that the code below does not include the latter stop criterion**.\n", + "where $\\epsilon$ is normally distributed with mean zero and standard deviation $\\sigma^2$.\n", "\n", - "And finally we can compare our solution for $\\beta$ with the analytic result given by \n", - "$\\beta= (X^TX)^{-1} X^T \\mathbf{y}$." - ] - }, - { - "cell_type": "markdown", - "id": "228edb14", - "metadata": {}, - "source": [ - "## Gradient Descent Example\n", + "In our derivation of the ordinary least squares method we defined then\n", + "an approximation to the function $f$ in terms of the parameters\n", + "$\\boldsymbol{\\theta}$ and the design matrix $\\boldsymbol{X}$ which embody our model,\n", + "that is $\\boldsymbol{\\tilde{y}}=\\boldsymbol{X}\\boldsymbol{\\theta}$. 
\n", "\n", - "Here our simple example" + "Thereafter we found the parameters $\\boldsymbol{\\theta}$ by optimizing the means squared error via the so-called cost function" ] }, { - "cell_type": "code", - "execution_count": 13, - "id": "46647c95", - "metadata": {}, - "outputs": [], + "cell_type": "markdown", + "id": "1bf50201", + "metadata": { + "editable": true + }, "source": [ - "\n", - "# Importing various packages\n", - "from random import random, seed\n", - "import numpy as np\n", - "import matplotlib.pyplot as plt\n", - "from mpl_toolkits.mplot3d import Axes3D\n", - "from matplotlib import cm\n", - "from matplotlib.ticker import LinearLocator, FormatStrFormatter\n", - "import sys\n", - "\n", - "# the number of datapoints\n", - "n = 100\n", - "x = 2*np.random.rand(n,1)\n", - "y = 4+3*x+np.random.randn(n,1)\n", - "\n", - "X = np.c_[np.ones((n,1)), x]\n", - "# Hessian matrix\n", - "H = (2.0/n)* X.T @ X\n", - "# Get the eigenvalues\n", - "EigValues, EigVectors = np.linalg.eig(H)\n", - "print(f\"Eigenvalues of Hessian Matrix:{EigValues}\")\n", - "\n", - "beta_linreg = np.linalg.inv(X.T @ X) @ X.T @ y\n", - "print(beta_linreg)\n", - "beta = np.random.randn(2,1)\n", - "\n", - "eta = 1.0/np.max(EigValues)\n", - "Niterations = 1000\n", - "\n", - "for iter in range(Niterations):\n", - " gradient = (2.0/n)*X.T @ (X @ beta-y)\n", - " beta -= eta*gradient\n", - "\n", - "print(beta)\n", - "xnew = np.array([[0],[2]])\n", - "xbnew = np.c_[np.ones((2,1)), xnew]\n", - "ypredict = xbnew.dot(beta)\n", - "ypredict2 = xbnew.dot(beta_linreg)\n", - "plt.plot(xnew, ypredict, \"r-\")\n", - "plt.plot(xnew, ypredict2, \"b-\")\n", - "plt.plot(x, y ,'ro')\n", - "plt.axis([0,2.0,0, 15.0])\n", - "plt.xlabel(r'$x$')\n", - "plt.ylabel(r'$y$')\n", - "plt.title(r'Gradient descent example')\n", - "plt.show()" + "$$\n", + "C(\\boldsymbol{X},\\boldsymbol{\\theta}) =\\frac{1}{n}\\sum_{i=0}^{n-1}(y_i-\\tilde{y}_i)^2=\\mathbb{E}\\left[(\\boldsymbol{y}-\\boldsymbol{\\tilde{y}})^2\\right].\n", + "$$" ] }, { "cell_type": "markdown", - "id": "e0bb3c65", - "metadata": {}, + "id": "aa1ee75a", + "metadata": { + "editable": true + }, "source": [ - "## And a corresponding example using **scikit-learn**" + "We can rewrite this as" ] }, { - "cell_type": "code", - "execution_count": 14, - "id": "d29a0ccf", - "metadata": {}, - "outputs": [], + "cell_type": "markdown", + "id": "0b88cfa1", + "metadata": { + "editable": true + }, "source": [ - "# Importing various packages\n", - "from random import random, seed\n", - "import numpy as np\n", - "import matplotlib.pyplot as plt\n", - "from sklearn.linear_model import SGDRegressor\n", - "\n", - "n = 100\n", - "x = 2*np.random.rand(n,1)\n", - "y = 4+3*x+np.random.randn(n,1)\n", - "\n", - "X = np.c_[np.ones((n,1)), x]\n", - "beta_linreg = np.linalg.inv(X.T @ X) @ (X.T @ y)\n", - "print(beta_linreg)\n", - "sgdreg = SGDRegressor(max_iter = 50, penalty=None, eta0=0.1)\n", - "sgdreg.fit(x,y.ravel())\n", - "print(sgdreg.intercept_, sgdreg.coef_)" + "$$\n", + "\\mathbb{E}\\left[(\\boldsymbol{y}-\\boldsymbol{\\tilde{y}})^2\\right]=\\frac{1}{n}\\sum_i(f_i-\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right])^2+\\frac{1}{n}\\sum_i(\\tilde{y}_i-\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right])^2+\\sigma^2.\n", + "$$" ] }, { "cell_type": "markdown", - "id": "7d20e2cc", - "metadata": {}, + "id": "51802535", + "metadata": { + "editable": true + }, "source": [ - "## Gradient descent and Ridge\n", + "The three terms represent the square of the bias of the learning\n", + "method, which can be thought of as the 
error caused by the simplifying\n", + "assumptions built into the method. The second term represents the\n", + "variance of the chosen model and finally the last terms is variance of\n", + "the error $\\boldsymbol{\\epsilon}$.\n", "\n", - "We have also discussed Ridge regression where the loss function contains a regularized term given by the $L_2$ norm of $\\beta$," + "To derive this equation, we need to recall that the variance of $\\boldsymbol{y}$ and $\\boldsymbol{\\epsilon}$ are both equal to $\\sigma^2$. The mean value of $\\boldsymbol{\\epsilon}$ is by definition equal to zero. Furthermore, the function $f$ is not a stochastics variable, idem for $\\boldsymbol{\\tilde{y}}$.\n", + "We use a more compact notation in terms of the expectation value" ] }, { "cell_type": "markdown", - "id": "52a46927", - "metadata": {}, + "id": "c1fab3ca", + "metadata": { + "editable": true + }, "source": [ "$$\n", - "C_{\\text{ridge}}(\\beta) = \\frac{1}{n}||X\\beta -\\mathbf{y}||^2 + \\lambda ||\\beta||^2, \\ \\lambda \\geq 0.\n", + "\\mathbb{E}\\left[(\\boldsymbol{y}-\\boldsymbol{\\tilde{y}})^2\\right]=\\mathbb{E}\\left[(\\boldsymbol{f}+\\boldsymbol{\\epsilon}-\\boldsymbol{\\tilde{y}})^2\\right],\n", "$$" ] }, { "cell_type": "markdown", - "id": "446851f9", - "metadata": {}, + "id": "bf1b97b3", + "metadata": { + "editable": true + }, "source": [ - "In order to minimize $C_{\\text{ridge}}(\\beta)$ using GD we adjust the gradient as follows" + "and adding and subtracting $\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right]$ we get" ] }, { "cell_type": "markdown", - "id": "dc10da38", - "metadata": {}, + "id": "4e6a9591", + "metadata": { + "editable": true + }, "source": [ "$$\n", - "\\nabla_\\beta C_{\\text{ridge}}(\\beta) = \\frac{2}{n}\\begin{bmatrix} \\sum_{i=1}^{100} \\left(\\beta_0+\\beta_1x_i-y_i\\right) \\\\\n", - "\\sum_{i=1}^{100}\\left( x_i (\\beta_0+\\beta_1x_i)-y_ix_i\\right) \\\\\n", - "\\end{bmatrix} + 2\\lambda\\begin{bmatrix} \\beta_0 \\\\ \\beta_1\\end{bmatrix} = 2 (\\frac{1}{n}X^T(X\\beta - \\mathbf{y})+\\lambda \\beta).\n", + "\\mathbb{E}\\left[(\\boldsymbol{y}-\\boldsymbol{\\tilde{y}})^2\\right]=\\mathbb{E}\\left[(\\boldsymbol{f}+\\boldsymbol{\\epsilon}-\\boldsymbol{\\tilde{y}}+\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right]-\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right])^2\\right],\n", "$$" ] }, { "cell_type": "markdown", - "id": "e15d77aa", - "metadata": {}, + "id": "3bec9e3c", + "metadata": { + "editable": true + }, "source": [ - "We can easily extend our program to minimize $C_{\\text{ridge}}(\\beta)$ using gradient descent and compare with the analytical solution given by" + "which, using the abovementioned expectation values can be rewritten as" ] }, { "cell_type": "markdown", - "id": "89cd7379", - "metadata": {}, + "id": "a65f2f18", + "metadata": { + "editable": true + }, "source": [ "$$\n", - "\\beta_{\\text{ridge}} = \\left(X^T X + n\\lambda I_{2 \\times 2} \\right)^{-1} X^T \\mathbf{y}.\n", + "\\mathbb{E}\\left[(\\boldsymbol{y}-\\boldsymbol{\\tilde{y}})^2\\right]=\\mathbb{E}\\left[(\\boldsymbol{y}-\\mathbb{E}\\left[\\boldsymbol{\\tilde{y}}\\right])^2\\right]+\\mathrm{Var}\\left[\\boldsymbol{\\tilde{y}}\\right]+\\sigma^2,\n", "$$" ] }, { "cell_type": "markdown", - "id": "20a6a0b6", - "metadata": {}, + "id": "d73eda6c", + "metadata": { + "editable": true + }, "source": [ - "## The Hessian matrix for Ridge Regression\n", - "The Hessian matrix of Ridge Regression for our simple example is given by" + "that is the rewriting in terms of the so-called bias, the variance of the 
model $\\boldsymbol{\\tilde{y}}$ and the variance of $\\boldsymbol{\\epsilon}$." ] }, { "cell_type": "markdown", - "id": "2bcf31af", - "metadata": {}, + "id": "ecc681f6", + "metadata": { + "editable": true + }, "source": [ - "$$\n", - "\\boldsymbol{H} \\equiv \\begin{bmatrix}\n", - "\\frac{\\partial^2 C(\\beta)}{\\partial \\beta_0^2} & \\frac{\\partial^2 C(\\beta)}{\\partial \\beta_0 \\partial \\beta_1} \\\\\n", - "\\frac{\\partial^2 C(\\beta)}{\\partial \\beta_0 \\partial \\beta_1} & \\frac{\\partial^2 C(\\beta)}{\\partial \\beta_1^2} & \\\\\n", - "\\end{bmatrix} = \\frac{2}{n}X^T X+2\\lambda\\boldsymbol{I}.\n", - "$$" + "## A way to Read the Bias-Variance Tradeoff\n", + "\n", + "\n", + "\n", + "\n", + "

    Figure 1: Reading the bias-variance tradeoff (figure not shown).

    \n", + "" ] }, { "cell_type": "markdown", - "id": "3f9a5445", - "metadata": {}, + "id": "0b1fdbf0", + "metadata": { + "editable": true + }, + "source": [ + "## Example code for Bias-Variance tradeoff" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "e1bb5682", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], "source": [ - "This implies that the Hessian matrix is positive definite, hence the stationary point is a\n", - "minimum.\n", - "Note that the Ridge cost function is convex being a sum of two convex\n", - "functions. Therefore, the stationary point is a global\n", - "minimum of this function." + "import matplotlib.pyplot as plt\n", + "import numpy as np\n", + "from sklearn.linear_model import LinearRegression, Ridge, Lasso\n", + "from sklearn.preprocessing import PolynomialFeatures\n", + "from sklearn.model_selection import train_test_split\n", + "from sklearn.pipeline import make_pipeline\n", + "from sklearn.utils import resample\n", + "\n", + "np.random.seed(2018)\n", + "\n", + "n = 500\n", + "n_boostraps = 100\n", + "degree = 18 # A quite high value, just to show.\n", + "noise = 0.1\n", + "\n", + "# Make data set.\n", + "x = np.linspace(-1, 3, n).reshape(-1, 1)\n", + "y = np.exp(-x**2) + 1.5 * np.exp(-(x-2)**2) + np.random.normal(0, 0.1, x.shape)\n", + "\n", + "# Hold out some test data that is never used in training.\n", + "x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)\n", + "\n", + "# Combine x transformation and model into one operation.\n", + "# Not neccesary, but convenient.\n", + "model = make_pipeline(PolynomialFeatures(degree=degree), LinearRegression(fit_intercept=False))\n", + "\n", + "# The following (m x n_bootstraps) matrix holds the column vectors y_pred\n", + "# for each bootstrap iteration.\n", + "y_pred = np.empty((y_test.shape[0], n_boostraps))\n", + "for i in range(n_boostraps):\n", + " x_, y_ = resample(x_train, y_train)\n", + "\n", + " # Evaluate the new model on the same test data each time.\n", + " y_pred[:, i] = model.fit(x_, y_).predict(x_test).ravel()\n", + "\n", + "# Note: Expectations and variances taken w.r.t. different training\n", + "# data sets, hence the axis=1. Subsequent means are taken across the test data\n", + "# set in order to obtain a total value, but before this we have error/bias/variance\n", + "# calculated per data point in the test set.\n", + "# Note 2: The use of keepdims=True is important in the calculation of bias as this \n", + "# maintains the column vector form. 
Dropping this yields very unexpected results.\n", + "error = np.mean( np.mean((y_test - y_pred)**2, axis=1, keepdims=True) )\n", + "bias = np.mean( (y_test - np.mean(y_pred, axis=1, keepdims=True))**2 )\n", + "variance = np.mean( np.var(y_pred, axis=1, keepdims=True) )\n", + "print('Error:', error)\n", + "print('Bias^2:', bias)\n", + "print('Var:', variance)\n", + "print('{} >= {} + {} = {}'.format(error, bias, variance, bias+variance))\n", + "\n", + "plt.plot(x[::5, :], y[::5, :], label='f(x)')\n", + "plt.scatter(x_test, y_test, label='Data points')\n", + "plt.scatter(x_test, np.mean(y_pred, axis=1), label='Pred')\n", + "plt.legend()\n", + "plt.show()" ] }, { "cell_type": "markdown", - "id": "003f6d0d", - "metadata": {}, + "id": "256590ad", + "metadata": { + "editable": true + }, "source": [ - "## Program example for gradient descent with Ridge Regression" + "## Understanding what happens" ] }, { "cell_type": "code", - "execution_count": 15, - "id": "bb679580", - "metadata": {}, + "execution_count": 4, + "id": "a3b16f08", + "metadata": { + "collapsed": false, + "editable": true + }, "outputs": [], "source": [ - "from random import random, seed\n", - "import numpy as np\n", "import matplotlib.pyplot as plt\n", - "from mpl_toolkits.mplot3d import Axes3D\n", - "from matplotlib import cm\n", - "from matplotlib.ticker import LinearLocator, FormatStrFormatter\n", - "import sys\n", - "\n", - "# the number of datapoints\n", - "n = 100\n", - "x = 2*np.random.rand(n,1)\n", - "y = 4+3*x+np.random.randn(n,1)\n", - "\n", - "X = np.c_[np.ones((n,1)), x]\n", - "XT_X = X.T @ X\n", - "\n", - "#Ridge parameter lambda\n", - "lmbda = 0.001\n", - "Id = n*lmbda* np.eye(XT_X.shape[0])\n", - "\n", - "# Hessian matrix\n", - "H = (2.0/n)* XT_X+2*lmbda* np.eye(XT_X.shape[0])\n", - "# Get the eigenvalues\n", - "EigValues, EigVectors = np.linalg.eig(H)\n", - "print(f\"Eigenvalues of Hessian Matrix:{EigValues}\")\n", - "\n", - "\n", - "beta_linreg = np.linalg.inv(XT_X+Id) @ X.T @ y\n", - "print(beta_linreg)\n", - "# Start plain gradient descent\n", - "beta = np.random.randn(2,1)\n", - "\n", - "eta = 1.0/np.max(EigValues)\n", - "Niterations = 100\n", - "\n", - "for iter in range(Niterations):\n", - " gradients = 2.0/n*X.T @ (X @ (beta)-y)+2*lmbda*beta\n", - " beta -= eta*gradients\n", - "\n", - "print(beta)\n", - "ypredict = X @ beta\n", - "ypredict2 = X @ beta_linreg\n", - "plt.plot(x, ypredict, \"r-\")\n", - "plt.plot(x, ypredict2, \"b-\")\n", - "plt.plot(x, y ,'ro')\n", - "plt.axis([0,2.0,0, 15.0])\n", - "plt.xlabel(r'$x$')\n", - "plt.ylabel(r'$y$')\n", - "plt.title(r'Gradient descent example for Ridge')\n", + "import numpy as np\n", + "from sklearn.linear_model import LinearRegression, Ridge, Lasso\n", + "from sklearn.preprocessing import PolynomialFeatures\n", + "from sklearn.model_selection import train_test_split\n", + "from sklearn.pipeline import make_pipeline\n", + "from sklearn.utils import resample\n", + "\n", + "np.random.seed(2018)\n", + "\n", + "n = 40\n", + "n_boostraps = 100\n", + "maxdegree = 14\n", + "\n", + "\n", + "# Make data set.\n", + "x = np.linspace(-3, 3, n).reshape(-1, 1)\n", + "y = np.exp(-x**2) + 1.5 * np.exp(-(x-2)**2)+ np.random.normal(0, 0.1, x.shape)\n", + "error = np.zeros(maxdegree)\n", + "bias = np.zeros(maxdegree)\n", + "variance = np.zeros(maxdegree)\n", + "polydegree = np.zeros(maxdegree)\n", + "x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)\n", + "\n", + "for degree in range(maxdegree):\n", + " model = make_pipeline(PolynomialFeatures(degree=degree), 
LinearRegression(fit_intercept=False))\n", + " y_pred = np.empty((y_test.shape[0], n_boostraps))\n", + " for i in range(n_boostraps):\n", + " x_, y_ = resample(x_train, y_train)\n", + " y_pred[:, i] = model.fit(x_, y_).predict(x_test).ravel()\n", + "\n", + " polydegree[degree] = degree\n", + " error[degree] = np.mean( np.mean((y_test - y_pred)**2, axis=1, keepdims=True) )\n", + " bias[degree] = np.mean( (y_test - np.mean(y_pred, axis=1, keepdims=True))**2 )\n", + " variance[degree] = np.mean( np.var(y_pred, axis=1, keepdims=True) )\n", + " print('Polynomial degree:', degree)\n", + " print('Error:', error[degree])\n", + " print('Bias^2:', bias[degree])\n", + " print('Var:', variance[degree])\n", + " print('{} >= {} + {} = {}'.format(error[degree], bias[degree], variance[degree], bias[degree]+variance[degree]))\n", + "\n", + "plt.plot(polydegree, error, label='Error')\n", + "plt.plot(polydegree, bias, label='bias')\n", + "plt.plot(polydegree, variance, label='Variance')\n", + "plt.legend()\n", "plt.show()" ] }, { "cell_type": "markdown", - "id": "2050684c", - "metadata": {}, + "id": "8c4d3e7f", + "metadata": { + "editable": true + }, "source": [ - "## Using gradient descent methods, limitations\n", - "\n", - "* **Gradient descent (GD) finds local minima of our function**. Since the GD algorithm is deterministic, if it converges, it will converge to a local minimum of our cost/loss/risk function. Because in ML we are often dealing with extremely rugged landscapes with many local minima, this can lead to poor performance.\n", + "## Summing up\n", "\n", - "* **GD is sensitive to initial conditions**. One consequence of the local nature of GD is that initial conditions matter. Depending on where one starts, one will end up at a different local minima. Therefore, it is very important to think about how one initializes the training process. This is true for GD as well as more complicated variants of GD.\n", + "The bias-variance tradeoff summarizes the fundamental tension in\n", + "machine learning, particularly supervised learning, between the\n", + "complexity of a model and the amount of training data needed to train\n", + "it. Since data is often limited, in practice it is often useful to\n", + "use a less-complex model with higher bias, that is a model whose asymptotic\n", + "performance is worse than another model because it is easier to\n", + "train and less sensitive to sampling noise arising from having a\n", + "finite-sized training dataset (smaller variance). \n", "\n", - "* **Gradients are computationally expensive to calculate for large datasets**. In many cases in statistics and ML, the cost/loss/risk function is a sum of terms, with one term for each data point. For example, in linear regression, $E \\propto \\sum_{i=1}^n (y_i - \\mathbf{w}^T\\cdot\\mathbf{x}_i)^2$; for logistic regression, the square error is replaced by the cross entropy. To calculate the gradient we have to sum over *all* $n$ data points. Doing this at every GD step becomes extremely computationally expensive. An ingenious solution to this, is to calculate the gradients using small subsets of the data called \"mini batches\". This has the added benefit of introducing stochasticity into our algorithm.\n", + "The above equations tell us that in\n", + "order to minimize the expected test error, we need to select a\n", + "statistical learning method that simultaneously achieves low variance\n", + "and low bias. Note that variance is inherently a nonnegative quantity,\n", + "and squared bias is also nonnegative. 
Hence, we see that the expected\n", + "test MSE can never lie below $Var(\\epsilon)$, the irreducible error.\n", "\n", - "* **GD is very sensitive to choices of learning rates**. GD is extremely sensitive to the choice of learning rates. If the learning rate is very small, the training process take an extremely long time. For larger learning rates, GD can diverge and give poor results. Furthermore, depending on what the local landscape looks like, we have to modify the learning rates to ensure convergence. Ideally, we would *adaptively* choose the learning rates to match the landscape.\n", + "What do we mean by the variance and bias of a statistical learning\n", + "method? The variance refers to the amount by which our model would change if we\n", + "estimated it using a different training data set. Since the training\n", + "data are used to fit the statistical learning method, different\n", + "training data sets will result in a different estimate. But ideally the\n", + "estimate for our model should not vary too much between training\n", + "sets. However, if a method has high variance then small changes in\n", + "the training data can result in large changes in the model. In general, more\n", + "flexible statistical methods have higher variance.\n", "\n", - "* **GD treats all directions in parameter space uniformly.** Another major drawback of GD is that unlike Newton's method, the learning rate for GD is the same in all directions in parameter space. For this reason, the maximum learning rate is set by the behavior of the steepest direction and this can significantly slow down training. Ideally, we would like to take large steps in flat directions and small steps in steep directions. Since we are exploring rugged landscapes where curvatures change, this requires us to keep track of not only the gradient but second derivatives. The ideal scenario would be to calculate the Hessian but this proves to be too computationally expensive. \n", - "\n", - "* GD can take exponential time to escape saddle points, even with random initialization. As we mentioned, GD is extremely sensitive to initial condition since it determines the particular local minimum GD would eventually reach. However, even with a good initialization scheme, through the introduction of randomness, GD can still take exponential time to escape saddle points." + "You may also find this recent [article](https://www.pnas.org/content/116/32/15849) of interest." ] }, { "cell_type": "markdown", - "id": "6b20b26d", - "metadata": {}, + "id": "6ba8872d", + "metadata": { + "editable": true + }, "source": [ - "## Challenge yourself the coming weekend\n", + "## Another Example from Scikit-Learn's Repository\n", "\n", - "Write a code which implements gradient descent for a logistic regression example." + "This example demonstrates the problems of underfitting and overfitting and\n", + "how we can use linear regression with polynomial features to approximate\n", + "nonlinear functions. The plot shows the function that we want to approximate,\n", + "which is a part of the cosine function. In addition, the samples from the\n", + "real function and the approximations of different models are displayed. The\n", + "models have polynomial features of different degrees. We can see that a\n", + "linear function (polynomial with degree 1) is not sufficient to fit the\n", + "training samples. This is called **underfitting**. A polynomial of degree 4\n", + "approximates the true function almost perfectly. 
However, for higher degrees\n", + "the model will **overfit** the training data, i.e. it learns the noise of the\n", + "training data.\n", + "We evaluate quantitatively overfitting and underfitting by using\n", + "cross-validation. We calculate the mean squared error (MSE) on the validation\n", + "set, the higher, the less likely the model generalizes correctly from the\n", + "training data." ] }, { - "cell_type": "markdown", - "id": "3570021a", - "metadata": {}, + "cell_type": "code", + "execution_count": 5, + "id": "624a6bc3", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], "source": [ - "## Lab session: Material from last week and relevant for the first project" + "\n", + "\n", + "#print(__doc__)\n", + "\n", + "import numpy as np\n", + "import matplotlib.pyplot as plt\n", + "from sklearn.pipeline import Pipeline\n", + "from sklearn.preprocessing import PolynomialFeatures\n", + "from sklearn.linear_model import LinearRegression\n", + "from sklearn.model_selection import cross_val_score\n", + "\n", + "\n", + "def true_fun(X):\n", + " return np.cos(1.5 * np.pi * X)\n", + "\n", + "np.random.seed(0)\n", + "\n", + "n_samples = 30\n", + "degrees = [1, 4, 15]\n", + "\n", + "X = np.sort(np.random.rand(n_samples))\n", + "y = true_fun(X) + np.random.randn(n_samples) * 0.1\n", + "\n", + "plt.figure(figsize=(14, 5))\n", + "for i in range(len(degrees)):\n", + " ax = plt.subplot(1, len(degrees), i + 1)\n", + " plt.setp(ax, xticks=(), yticks=())\n", + "\n", + " polynomial_features = PolynomialFeatures(degree=degrees[i],\n", + " include_bias=False)\n", + " linear_regression = LinearRegression()\n", + " pipeline = Pipeline([(\"polynomial_features\", polynomial_features),\n", + " (\"linear_regression\", linear_regression)])\n", + " pipeline.fit(X[:, np.newaxis], y)\n", + "\n", + " # Evaluate the models using crossvalidation\n", + " scores = cross_val_score(pipeline, X[:, np.newaxis], y,\n", + " scoring=\"neg_mean_squared_error\", cv=10)\n", + "\n", + " X_test = np.linspace(0, 1, 100)\n", + " plt.plot(X_test, pipeline.predict(X_test[:, np.newaxis]), label=\"Model\")\n", + " plt.plot(X_test, true_fun(X_test), label=\"True function\")\n", + " plt.scatter(X, y, edgecolor='b', s=20, label=\"Samples\")\n", + " plt.xlabel(\"x\")\n", + " plt.ylabel(\"y\")\n", + " plt.xlim((0, 1))\n", + " plt.ylim((-2, 2))\n", + " plt.legend(loc=\"best\")\n", + " plt.title(\"Degree {}\\nMSE = {:.2e}(+/- {:.2e})\".format(\n", + " degrees[i], -scores.mean(), scores.std()))\n", + "plt.show()" ] }, { "cell_type": "markdown", - "id": "c5f36ff0", - "metadata": {}, + "id": "7dcfbdc3", + "metadata": { + "editable": true + }, "source": [ "## Various steps in cross-validation\n", "\n", @@ -2704,48 +1919,10 @@ }, { "cell_type": "markdown", - "id": "a6e47b16", - "metadata": {}, - "source": [ - "## How to set up the cross-validation for Ridge and/or Lasso\n", - "\n", - "* Define a range of interest for the penalty parameter.\n", - "\n", - "* Divide the data set into training and test set comprising samples $\\{1, \\ldots, n\\} \\setminus i$ and $\\{ i \\}$, respectively.\n", - "\n", - "* Fit the linear regression model by means of for example Ridge or Lasso regression for each $\\lambda$ in the grid using the training set, and the corresponding estimate of the error variance $\\boldsymbol{\\sigma}_{-i}^2(\\lambda)$, as" - ] - }, - { - "cell_type": "markdown", - "id": "5b7545c5", - "metadata": {}, - "source": [ - "$$\n", - "\\begin{align*}\n", - "\\boldsymbol{\\beta}_{-i}(\\lambda) & = ( \\boldsymbol{X}_{-i, 
\\ast}^{T}\n", - "\\boldsymbol{X}_{-i, \\ast} + \\lambda \\boldsymbol{I}_{pp})^{-1}\n", - "\\boldsymbol{X}_{-i, \\ast}^{T} \\boldsymbol{y}_{-i}\n", - "\\end{align*}\n", - "$$" - ] - }, - { - "cell_type": "markdown", - "id": "fa3a49a2", - "metadata": {}, - "source": [ - "* Evaluate the prediction performance of these models on the test set by $C[y_i, \\boldsymbol{X}_{i, \\ast}; \\boldsymbol{\\beta}_{-i}(\\lambda), \\boldsymbol{\\sigma}_{-i}^2(\\lambda)]$. Or, by the prediction error $|y_i - \\boldsymbol{X}_{i, \\ast} \\boldsymbol{\\beta}_{-i}(\\lambda)|$, the relative error, the error squared or the R2 score function.\n", - "\n", - "* Repeat the first three steps such that each sample plays the role of the test set once.\n", - "\n", - "* Average the prediction performances of the test sets at each grid point of the penalty bias/parameter. It is an estimate of the prediction performance of the model corresponding to this value of the penalty parameter on novel data." - ] - }, - { - "cell_type": "markdown", - "id": "685304e1", - "metadata": {}, + "id": "583f2b85", + "metadata": { + "editable": true + }, "source": [ "## Cross-validation in brief\n", "\n", @@ -2770,8 +1947,10 @@ }, { "cell_type": "markdown", - "id": "9cac2104", - "metadata": {}, + "id": "2b422220", + "metadata": { + "editable": true + }, "source": [ "## Code Example for Cross-validation and $k$-fold Cross-validation\n", "\n", @@ -2781,20 +1960,12 @@ { "cell_type": "code", "execution_count": 6, - "id": "1134c2ed", - "metadata": {}, - "outputs": [ - { - "data": { - "image/png": "iVBORw0KGgoAAAANSUhEUgAAAioAAAGwCAYAAACHJU4LAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjguMCwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy81sbWrAAAACXBIWXMAAA9hAAAPYQGoP6dpAABmvklEQVR4nO3dd1yV5f/H8ddhHTYIAoJMQUUUHLhQc+U2s+EqM7VpqWnDyva2rNTKhlrZ0LLSLHOPcu+BIjhQUVBUFJDNYZz79wff+EWOFIHrnMPn+Xicx4Nzn3u87+PB8+G6r/u6dJqmaQghhBBCmCAr1QGEEEIIIa5GChUhhBBCmCwpVIQQQghhsqRQEUIIIYTJkkJFCCGEECZLChUhhBBCmCwpVIQQQghhsmxUB7gZRqOR1NRUXFxc0Ol0quMIIYQQ4jpomkZOTg5+fn5YWV27zcSsC5XU1FQCAgJUxxBCCCFEJaSkpODv73/Ndcy6UHFxcQHKTtTV1VVxGiGEEEJcj+zsbAICAsq/x6/FrAuVvy/3uLq6SqEihBBCmJnr6bYhnWmFEEIIYbKkUBFCCCGEyZJCRQghhBAmy6z7qFyv0tJSiouLVccQFs7W1hZra2vVMYQQwqJYdKGiaRrnzp3j0qVLqqOIWsLd3Z169erJuD5CCFFFLLpQ+btI8fb2xtHRUb48RLXRNI38/HzS0tIA8PX1VZxICCEsg8UWKqWlpeVFiqenp+o4ohZwcHAAIC0tDW9vb7kMJIQQVcBiO9P+3SfF0dFRcRJRm/z9eZM+UUIIUTUstlD5m1zuETVJPm9CCFG1LL5QEUIIIYT5kkJFCCGEECZLChWhlE6n47ffflMdQwghhImSQkUIIcR1KzVqXDiVSsGlHNVRRC0hhYqZk7tLqkdRUZHqCEKYlPPZhUz+NY4Wr69m+V2P4FDHlcMNIomd+R2a0ag6nrBgtapQ0TSN/KKSGn9omnZDOY1GI++99x5hYWHo9XoCAwN5++23OXnyJDqdjp9//pmuXbtib2/PvHnzMBqNvPHGG/j7+6PX62nRogUrV64s319RURHjxo3D19cXe3t7goODmTJlSvnrr732GoGBgej1evz8/HjiiSf+M+PkyZNp3779ZcujoqJ49dVXAdi1axc9e/akbt26uLm50aVLF/bu3XtD78X1nsOlS5d45JFH8PHxwd7enmbNmrF06dLy1xctWkTTpk3R6/UEBwfz4YcfVth/cHAwb731FqNGjcLNzY2HH34YgK1bt9K5c2ccHBwICAjgiSeeIC8vr1LnIIS52j9/Cbe/u5IfdyaTYyjhjKs3AOFJB2kxfiS7u9+BIS9fcUphqSx2wLcrKSguJeKVVTV+3IQ3euNod/1v9eTJk5kzZw7Tp0+nU6dOnD17lsOHD5e//txzz/Hhhx8yd+5c9Ho9H330ER9++CGzZs2iZcuWfP3119x+++3Ex8fTsGFDPv74Y5YsWcLPP/9MYGAgKSkppKSkALBw4UKmT5/OggULaNq0KefOnWP//v3/mXH48OG8++67HD9+nNDQUADi4+OJi4tj4cKFAOTk5DBy5Eg+/vhjAD788EP69etHYmIiLi4u1/1+ANc8B6PRSN++fcnJyWHevHmEhoaSkJBQPuDanj17GDJkCK+99hpDhw5l69atPP7443h6ejJq1KjyY7z//vu8/PLLvPTSSwDExcXRu3dv3nzzTb766isuXLjAuHHjGDduHHPnzr2h/EKYqwNfLqDJoyN4Nawtn499lxf6R9DmrT6kJb7I8Vffpe3Cr2mz4Q9iO/am6fa12NrrVUcWFkan3eif+yYkOzsbNzc3srKycHV1rfBaYWEhSUlJhISEYG9vD0B+UYnJFyo5OTl4eXkxc+ZMHnrooQqvnTx5kpCQEGbMmMGECRPKl9evX5+xY8fywgsvlC9r27Ytbdq04dNPP+WJJ54gPj6etWvXXjbOx7Rp05g1axYHDx7E1tb2hs6ref
PmDBo0iJdffhmAF154gbVr17Jz584rrl9aWkqdOnX44YcfuO2224CyzrSLFy/mjjvuuOaxrnUOq1evpm/fvhw6dIhGjRpdtu3w4cO5cOECq1evLl/27LPPsmzZMuLj44GyFpWWLVuyePHi8nXuv/9+HBwcmDVrVvmyzZs306VLF/Ly8so/V/90pc+dEObq1JY91O1+C05FBexteytN1y9F71Dxcx339c80fHQE9iVF7OgzlHYrFihKK8zJtb6//61Wtag42FqT8EZvJce9XocOHcJgMHDrrbdedZ3WrVuX/5ydnU1qaiodO3assE7Hjh3LW0ZGjRpFz549ady4MX369OG2226jV69eAAwePJgZM2bQoEED+vTpQ79+/RgwYAA2Nv/90Rg+fDhff/01L7/8Mpqm8eOPPzJx4sTy19PS0njllVf4888/OX/+PKWlpeTn55OcnHzd78ffrnUOsbGx+Pv7X7FIgbL3dODAgZe9PzNmzKC0tLS85eWf7yuUtcQcO3aM+fPnly/TNA2j0UhSUhJNmjS54fMQwlyUGIooHnYPTkUFJDRsQbO/lmLncHnxHfnAEGLzC4kaP4rWq35hy5LH6Hh7FwWJhaWqVYWKTqe7oUswKvw9X8y1ODk5Xbbs360MmqaVL2vVqhVJSUmsWLGCtWvXMmTIEHr06MHChQsJCAjgyJEjrFmzhrVr1/L444/z/vvvs2HDhv9sYbn33nt5/vnn2bt3LwUFBaSkpDBs2LDy10eNGsWFCxeYMWMGQUFB6PV6YmJiKtVR9Vrn8F/v2T/fi38u+7d/v69Go5FHH330in12AgMDb/gchDAnu599i/anE8m2d8Z7+WLsHK/eQthi3P2sOnSMmbkepB0sZk2vYlztb6yFVoirqVWdac1Bw4YNcXBwYN26dde1vqurK35+fmzevLnC8q1bt1b4i9/V1ZWhQ4cyZ84cfvrpJxYtWkRGRgZQVhzdfvvtfPzxx6xfv55t27YRFxf3n8f29/enc+fOzJ8/n/nz59OjRw98fHzKX9+0aRNPPPEE/fr1K+/IevHixes6r6ud65XOISoqitOnT3P06NErbhcREXHF96dRo0bXnDiwVatWxMfHExYWdtnDzs6u0uchhKlLO3ScZl+UdTg/NPFF6oYF/+c2XWa8Sm5kC85nG5iy/PB/ri/E9TLt5oVayN7enueee45nn30WOzs7OnbsyIULF4iPj7/q5aBJkybx6quvEhoaSosWLZg7dy6xsbHllyymT5+Or68vLVq0wMrKil9++YV69erh7u7ON998Q2lpKe3atcPR0ZHvv/8eBwcHgoKCrivv8OHDee211ygqKmL69OkVXgsLC+P777+ndevWZGdnM2nSpOtqMbqSa51Dly5d6Ny5M3fffTfTpk0jLCyMw4cPo9Pp6NOnD08//TRt2rThzTffZOjQoWzbto2ZM2fy2WefXfOYzz33HO3bt2fs2LE8/PDDODk5cejQIdasWcMnn3xSqfMQwhycHPsMbYvyORLSlDZvTrqubextrXn3rkiGzt7O5jU7OeEPDdpGVnNSUStoCmVnZ2sTJkzQAgMDNXt7ey0mJkbbuXPndW+flZWlAVpWVtZlrxUUFGgJCQlaQUFBVUauEaWlpdpbb72lBQUFaba2tlpgYKD2zjvvaElJSRqg7du377L1X3/9da1+/fqara2t1rx5c23FihXlr8+ePVtr0aKF5uTkpLm6umq33nqrtnfvXk3TNG3x4sVau3btNFdXV83JyUlr3769tnbt2uvOmpmZqen1es3R0VHLycmp8NrevXu11q1ba3q9XmvYsKH2yy+/aEFBQdr06dPL1wG0xYsX/+dxrnUOmqZp6enp2ujRozVPT0/N3t5ea9asmbZ06dLy1xcuXKhFRESUv5/vv/9+hf3/O9ffdu7cqfXs2VNzdnbWnJyctKioKO3tt9++ak5z/twJoWmalpKRp90z/F1tY1AL7fCvK294++8feFErsrLW9rTpXg3phKW41vf3vym962fo0KEcPHiQzz//HD8/P+bNm8f06dNJSEigfv36/7n9jd71I0R1k8+dMHev/H6Q77adokOoJz88fPlYSf8laf0OgrrFYIXG8VUbCe11SzWkFObuRu76UdZHpaCggEWLFjF16lQ6d+5MWFgYr732GiEhIXz++eeqYgkhRK2VllPIgl1l4xON6xZWqX2EdG3HvpiyO/KyJ79SZdlE7aWsUCkpKaG0tPSyvzodHBwu6/j4N4PBQHZ2doWHqB6bNm3C2dn5qo+q9s4771z1WH379q3y4wkhLpcw+W0mrvuaXk4FxIR6Vno/dae+BUDzvRs4sye+quKJWkpZZ1oXFxdiYmJ48803adKkCT4+Pvz444/s2LGDhg0bXnGbKVOm8Prrr9dw0tqpdevWxMbG1tjxxowZw5AhQ674WmU74Aohrl9pcQnh82bRNesCu2/vctkt/TciqFNrDjRtR1T8DlLe/oD6v8pIzqLylPZROX78OA888AAbN27E2tqaVq1a0ahRI/bu3UtCQsJl6xsMBgwGQ/nz7OxsAgICpI+KMBnyuRPmav+cH2n+yL1ccnDB/vxZ7F0uH6/pRsR+/j0tHr+fbHtnbFJP41jHrYqSCktgFn1UAEJDQ9mwYQO5ubmkpKSwc+dOiouLCQkJueL6er0eV1fXCg8hhBA3r3TWbAAO97zjposUgKiH7uGMhy/WpSVs+XntTe9P1F4mMY6Kk5MTTk5OZGZmsmrVKqZOnao6khBC1BoXjycTtXcjAL5Pj6+SfVrZ2rBz6ixeOVhAg5J69KySvYraSGmLyqpVq1i5ciVJSUmsWbOGbt260bhxY0aPHq0ylhBC1CrHP52LjWbkaFATgjq3qbL93jK0F/kOzuxPucSxtNwq26+oXZQWKllZWYwdO5bw8HDuv/9+OnXqxOrVq294Fl8hhBCV57rkVwAyBtxVpfut66ynayMvAFb8eaBK9y1qD6WXfoYMGXLVOz2EEEJUv3OZ+exxqY+X4ylCHhtZ5fsf4VXM08+Nx9OQS+ndqVjbmkSPA2FGZFLCWmjUqFHccccd11yna9euTJw4sUbyCCHUWRZ/npd6j2Xs+3/gE3HloSFuRkyX5vhnX8An6wIJ83+v8v0LyyeFigm6UiGxcOFC7O3tmTp1Kq+99ho6ne6yx9q10rNeCHFjlh5IBaBvc/9q2b/e2YlDncsGbSyY/2O1HENYNilUzMCXX37J8OHDmTlzJs8++ywATZs25ezZsxUenTt3VpxUCGFOLpxMpXTHTnSakX6RvtV2HKfhwwBouHUtJYaiajuOsEy1s1DJy7v6o7Dw+tctKPjvdW/S1KlTGTduHD/88AMPPfRQ+XIbGxvq1atX4WFnZwdAXFwc3bt3x8HBAU9PTx555BFyc6/e4z4vL4/7778fZ2dnfH19+fDDD286txDC9CV9OY8l3z3FL0vfwdu1+gYoDB96G5mOrtTJz+LwT0ur7TjCMtXOQsXZ+eqPu++uuK6399XX/fccNMHBl69zE55//nnefPNNli5dyt3/z
nUV+fn59OnThzp16rBr1y5++eUX1q5dy7hx4666zaRJk/jrr79YvHgxq1evZv369ezZs+emsgshTJ/tihUAFLdtV63HsdHbkRjTA4C8HxZU67GE5ZHu1yZqxYoV/P7776xbt47u3btf9npcXFyFyQEjIiLYuXMn8+fPp6CggO+++w4np7LRJWfOnMmAAQN477338PHxqbCf3NxcvvrqK7777jt69iwbkunbb7/F3796rlcLIUxDYU4ejeO2A+A1rGpvS74S+3uGwLpfCdu8htLiErn7R1y32vlJucZlEKytKz5PS7v6ulb/apA6ebLSkf4tKiqKixcv8sorr9CmTRtcXFwqvN64cWOWLFlS/lyv1wNw6NAhmjdvXl6kAHTs2BGj0ciRI0cuK1SOHz9OUVERMTEx5cs8PDxo3LhxlZ2LEML0HP15KVHFhaS5eBLas1O1H6/J8DtY9FkfVgVF82BSOu0a+fz3RkJQWwsVpxuYx6K61v0P9evXZ9GiRXTr1o0+ffqwcuXKCsWKnZ0dYWFhl22nadpVZz290nKFc1IKIRQq+LXsVuGk9t3w/vcfXdXA1l7PpknvsDo2lZDjGVKoiOtWO/uomInAwEA2bNhAWloavXr1Ijs7+z+3iYiIIDY2lrx/dOTdsmULVlZWNGrU6LL1w8LCsLW1Zfv27eXLMjMzOXr0aNWchBDCJNXfsQEA/e0DauyY3cK9Afjr8DVaqoX4FylUTJy/vz/r168nPT2dXr16kZWVdc31hw8fjr29PSNHjuTgwYP89ddfjB8/nhEjRlx22QfA2dmZBx98kEmTJrFu3ToOHjzIqFGjsKqBv7CEEGqk7kvAPz2VEp0VYcNqrlDp0siLyPPH6LtoNmfj5I8hcX3k28gM1K9fnw0bNnDp0iV69uzJpUuXrrquo6Mjq1atIiMjgzZt2jBo0CBuvfVWZs6cedVt3n//fTp37sztt99Ojx496NSpE9HR0dVwJkIIU7A5346h90zhq2FP41y3To0d193RjqmbvubJLT+Q/O1PNXZcYd50mhl3UsjOzsbNzY2srCxcXV0rvFZYWEhSUhIhISHY21ff+ABC/JN87oQ5GPfDXpYeOMuEWxvyZM/LLwlXp22PTCJmzgcciIwh6sDWGj22MB3X+v7+N2lREUKIWsRo1Nh6PB2ATg3r1vjx6907CIBGh/ZQcCmnxo8vzI8UKkIIUYuc2LiT8b9/Qu/kfbQIcK/x4wd3bsM5d2/sS4o48uNvNX58YX6kUBFCiFrk4s+/MXrPHzyesBJb65r/CtBZWXGqfVcADL//UePHF+bH4gsVM+6CI8yQfN6EqXPYugmAgk5dlGWwH1h2p1HQjg1oRqOyHMI8WGyhYmtrC5TNfSNETfn78/b3508IU2IsKSXkyH4APPrcqixHo2EDKbK2wSk/h+SEE8pyCPNgsSPTWltb4+7uTtr/hsB3dHS86oitQtwsTdPIz88nLS0Nd3d3rP89FYMQJuDUpp2EFOaSb2tPgxoYNv9qHNxdeOalb1ic78JrubaMUJZEmAOLLVQA6tWrB1BerAhR3dzd3cs/d0KYmrQV6wgBjodFEqm3U5oluHNbSlcfZUviRUa0D1KaRZg2iy5UdDodvr6+eHt7U1xcrDqOsHC2trbSkiJMmvWWLQDktmmvOAl0DKvLB6uPsvX4RUpLjVgr6NgrzINFFyp/s7a2li8QIUStZ59yEgCXHt3UBgEi67vxyqZv6J6wheOt59Hotu6qIwkTJSWsEELUAqcz87ntnvfp8tjXNBjYU3UcbKytaFl4keBLZ0n/fbnqOMKESaEihBC1wO6TmaDT4d60EY6uzqrjAFDUpSsALpvXK80hTJsUKkIIUQvsPJkBQJugmpuE8L/Uu/s2ABomHqAwO1dxGmGqpFARQohaYOikEcz+9S06W2erjlIuMKYlaa510ZcWk/jrKtVxxL+czy5kw9ELZOQVKc0hhYoQQli4S8lnaX58P70St9Ms3F91nHI6KytOtYwBIHf5SsVpxL/tXr6ZGW9+x9PfblOaQwoVIYSwcCeXrQUg2SsAjxDTKVQA6FZ2t0+dXWq/DMXl6n49i8XznuGR9fOU5pBCRQghLFzh5rIi4FzTVoqTXM7/zj4c8/Bnl0cIeYUy3pUp8TgcB4C+XVulOWrFOCpCCFGbOe7fC4DWpo3iJJfzjQqn46RvOXOpgJCULDo1rKs6kgAMRcVQUABAve4dlWaRFhUhhLBgmtFI8PF4ADy7q5vf51rahngAsCMpXXES8bejafn0fPAzbnl+Ib7Nw5VmkUJFCCEs2OldcbgW5mKwtiWoq/qh86+kXYgHNqUlnNmyW3UU8T8HzlwCIDjMH52V2lJBLv0IIYQFO3b8LOfqR2Dn7EBze73qOFcUY1/IgY+GYm00UvjMHdi7OKmOVOvFnc4CyqY6UE1aVIQQwoJtcA5g8H1T+W3qN6qjXFVgszDy9Y7oS4s5sfwv1XEEcO/zo/jhxxfoUHBWdRS1hUpJSQkvvfQSISEhODg40KBBA9544w2MRqPKWEIIYTH2n74EQItA0xmR9t90VlacatoagKyV6xSnEYU5eTQ5cYAOyQcIDfNTHUftpZ/33nuPL774gm+//ZamTZuye/duRo8ejZubGxMmTFAZTQghzF6RoZiTSefA2p7m/u6q41xTSadOsGMNzju3qo5S6yVv2EEjYymZjm7Ua9pQdRy1LSrbtm1j4MCB9O/fn+DgYAYNGkSvXr3YvVs6VAkhxM06tW4rez4YzMIFkwnydFQd55q8+5fN6ByaeIDiQoPiNLVb5p+bAEgOjVDekRYUFyqdOnVi3bp1HD16FID9+/ezefNm+vXrd8X1DQYD2dnZFR5CCCGuLGP9JqzQsHNxRqfTqY5zTUG3tOWSgwuOxYWcWLVRdZxazWr3LgDyW7ZWnKSM0kLlueee45577iE8PBxbW1tatmzJxIkTueeee664/pQpU3Bzcyt/BAQE1HBiIYQwH1a7yr5w8lqY3oi0/2ZlY01Sk7KcmdJPRal6hw8A4NgpRnGSMkoLlZ9++ol58+bxww8/sHfvXr799ls++OADvv322yuuP3nyZLKyssofKSkpNZxYCCHMh9eh/33hdDTN8VP+7eJdQ5na+X6W1W+hOkqtlXUmjYALZd+tQb26KE5TRmln2kmTJvH8888zbNgwACIjIzl16hRTpkxh5MiRl62v1+vR601zHAAhhDAluRczCTx/CgD/3qbxhfNf6o28l89yAnEpseE1o4a1lWlfrrJERw6fIj8kGp+SXJoEqb/jBxS3qOTn52P1r4461tbWcnuyEELcpJS/tmGFxnk3LzxDzOMyeRNfF5z1NuQUlnDorPRBVGE77owa8jpfTP1RdZRySguVAQMG8Pbbb7Ns2TJOnjzJ4sWLmTZtGnfeeafKWEIIYfaytuwA4GyI2nlaboSNtRXd3I3cdmgjp39fqTpOrbQ/5RKASd3OrvTSzyeffMLLL7/M448/TlpaGn5+fjz66KO88sorKmMJIYTZ2+/iy+lmt+LeubPqKDdk+P6V
tF/yMbsv9oHHh6qOU6toRiOnDiWBzokWge6q45TTaZqmqQ5RWdnZ2bi5uZGVlYWrq6vqOEIIYTL6zNjI4XM5zB4RTa+m9VTHuW4Hv/uVZiPv5py7N/Uyz6uOU6ukxh7Cr2UESR5++J5Jwt7ertqOdSPf3+pHchFCCFGlDCWlHEvLBaCZCUwqdyNC+nenRGdFvUtpnE9IVB2nVjm7ej0AJc6u1Vqk3CgpVIQQwsIcjz9J6Pkk6uqt8HWzVx3nhjh5unOyfhgAp5euVZymdineuh2AjKbNFSepSAoVIYSwMLk/LWTV1+P46tc3TX5E2itJj4oGoGTjZsVJahe3g7EAWLUzrXF3pFARQggLo+3dA0B+eITiJJVj3fkWADz3y7xvNaUov5CQU4cB8OllWh2wpVARQggL434kHgC7NtGKk1SOf/8eAASfOUZe+iW1YWqJk+u2YF9SxCUHFwLayqUfIYQQ1aTEUETQ6WMAeHc2rSb861WvWUMm3/sqt4z5iv2ZJarj1AoZq/8C4GSj5uisrRWnqUgKFSGEsCCnt8diX1JEnp0D9VtHqY5TaTn9b+esqxd7TmWqjlIrbHEP5ruW/Unvc5vqKJeRQkUIISzIxY3bAEgOaIiVjWn9ZXwjWgfVAWC3FCrVTtM0frIP5pVej+H8yEOq41xG6ci0QgghqlbJnr0AZIc3U5zk5rTxseex7b/Q4o8TGO//y6yLLlN3OrOAtBwDNlY6mge4q45zGSlUhBDCgiwLv4XNGUaibzO9Jvwb0TjAk5CtP+FYXMiJzbto0NU8+9uYg6N/7SD6dAK61tHY25peQSiXfoQQwkJomsZSx0BmdhiG14A+quPcFBu9HSdCy1qFLqz8U3Eay+b05SwWzX+WZ9Z/qzrKFUmhIoQQFiItx0BmfjFWOgjzdlYd56bltmoLgNXWrYqTWDavA2Xj7tjd0klxkiuTQkUIISzEye376XV0G+2tc02yCf9GOXYvG3jML36v4iSWKyctneCzJwAIuO1WxWmuTAoVIYSwENrixcxe/DZPr/1KdZQqEXzbrRjRUT/jLBePnVQdxyKdXP4n1pqR1Dr18GrcQHWcK5JCRQghLITNwYMAFDU17zt+/ubqU5eTfmVfnil/yASF1SF33UYAUiNaKk5ydVKoCCGEhfA4UTZXi0Mr0/3SuVEXIltRaGPH+cNJqqNYJJc9ZTMml7aPUZzk6uT2ZCGEsADFhQb8z50CwKdTa8Vpqs6F518mMvJemgZ7Yd73MZmeovxCQhPjAPC5rZfiNFcnLSpCCGEBTm+Pxc5YQq6dI/UiG6uOU2WiIkMptrYlPjWLwuJS1XEsSty5HO655x0+7P0IQbeYbnErhYoQQliA9G27AEgJCENnZTn/tQd4OODloqe4VONAyiXVcSzK9lNZxPo1JnH4IyY3EeE/Wc6nWQgharHiffsByA6znNYUAJ1Ox8Tjf7Hqq8cpfe9d1XEsyo6kDADaNfBQnOTapI+KEEJYgEVt+jPf4EH/2023U2RlhTrpaHwxmdgd21VHsRglhiL6fv4mdeo1pt1jpj09gbSoCCGEBdhS4sLSJp3xurWz6ihVrk6PLgAEH92PsUT6qVSFE6s3cc/upby5dhbhfm6q41yTFCpCCGHmLuUXcTarEIBG9VwUp6l6IT1vodDGDveCHFK271MdxyJkLF8DwInwlljZmvbFFSlUhBDCzJ3cvIeHd/xK30vHcLW3VR2nytk52nMiJAKA8yvWKU5jGRy2bQagsOMtipP8NylUhBDCzBlWrubF9V/z6I5FqqNUm6z/TVCokwkKb1ppcQkNDpe1THn2M93xU/4mhYoQQpg5XVzZoF0F4U0VJ6k+jl3L/vKvd1AmKLxZSeu24GLIJ0fvSIOeHVXH+U9SqAghhJlzO3YIALuWUYqTVJ/gAT057lGfbb7hXLyUpzqOWUv/YxUAJxq3wNrE+6eA3J4shBBmzVhSSsCZEwDU7dBGcZrq41bfh0EvzicxLZdZZ3Lo7e6kOpLZyo8vK2wLOndTnOT6SIuKEEKYsbOxCTgWF2KwtqV+G8ttUQFoHVwHgD2nMhUnMV+GklIev+VR2j7+LR6PPaQ6znWRQkUIIczY+S3/GzrfNxgbvZ3iNNUrOsgDK2MpqTv3q45itvYlX6KguBSjry8NmwSpjnNd5NKPEEKYMcOBgwBkhjRSnKT6tdMbOPDRMOxKiyl8egD2LnL550ZtSbwAQMewuuh0OsVpro8UKkIIYcYW3zKIt0uDGN65EZbbQ6WMf0QIGbZ6nIsKOLxyA+GD+6mOZHZ6PnEfrYs1Clq8pzrKdZNLP0IIYcbis0s5WC8Mz3YtVUepdjorK5LDWwBwad0GtWHMUPb5izRN3EeXpL20aBqoOs51U1qoBAcHo9PpLnuMHTtWZSwhhDALRqPG8Qu5AIR5OytOUzMM7comXdTvlAkKb9Txn5dirRlJ8QqgXqT5XCpUWqjs2rWLs2fPlj/WrCmbe2Dw4MEqYwkhhFk4l3CcF5fNZMT+lQR6OKqOUyPce3YFIPjIfjSjUW0YM1O0cjUAZ1qb/iBv/6S0j4qXl1eF5++++y6hoaF06dLliusbDAYMBkP58+zs7GrNJ4QQpuzi5h2M2LecpHoh2Fh/ojpOjWjQqzOFNnbUyc8ieUcsgTGtVEcyG767yub30fcx/WHz/8lk+qgUFRUxb948Hnjggav2RJ4yZQpubm7lj4CAgBpOKYQQpiP/f3f8ZASFKU5Sc+wc7TkR3ASA88v/VJzGfJyLO0rghRRKdVY0GHyb6jg3xGQKld9++41Lly4xatSoq64zefJksrKyyh8pKSk1F1AIIUyM1eHDABQ1ClecpGYl3T6UaZ2Gs9EjRHUUs3Fq/kIAjjZohpuv13+sbVpM5vbkr776ir59++Ln53fVdfR6PXq9vgZTCSGE6XI7mQiAbbMIxUlqlv1DD/Cx7W4aGJ14WnUYM7Gz0J6ioBZY9eitOsoNM4lC5dSpU6xdu5Zff/1VdRQhhDALmtGIb+pJADzatlCapaZFB5UNpX/iQh4ZeUV4OFn2iLw3y1BSyueuEeQPe4ul4zupjnPDTOLSz9y5c/H29qZ///6qowghhFm4mHgSV0MepTori5/j59/cHe1o7VBE7yNbObx2q+o4Jm9XUib5RaV4u+hp6ueqOs4NU16oGI1G5s6dy8iRI7GxMYkGHiGEMHnndscBcKZuffROtePW5H96dtM8Zv32Drp581RHMXmH/1iHV24GXRp5mc2w+f+kvFBZu3YtycnJPPDAA6qjCCGE2dgb0pyoCQuY9dQ01VHU6Fg2Foj7vl2Kg5i+nu89y65P72dweoLqKJWivFDp1asXmqbRqJH5jJInhBCqJablkG3vjGuLZqqjKOHbtxsADU4ewpCXrziN6TqzJ56gtGRKdFaE39lTdZxKUV6oCCGEuHHH0v43dL5X7Rg6/9/82zYn3ckdfWkxJ5avVx3HZJ3+cREAR8Ka4+pTV3GaypFCRQghzNBDHz/Hq2tnEW5r+O+VLZDOyopTTaMBuLRyreI0pst+zSo
AcrqbZ2sKmMjtyUIIIa7fpeSz9IjfBEBe/e8Up1GnuOMtsHMdTjvkzp8ryb2YSXh8WR8en2F3KU5TedKiIoQQZubsjn0AnHP3xsnTXW0Yhbz6l7UShB3dT3Fh7WxZupaj3y5EX1rMGU8/gju3UR2n0qRQEUIIM5Oz9wAA5/1DFSdRK7hLO168cxK9HvyUg+fzVMcxOaW//QZASude6KzM9+teLv0IIYSZMSaU3WZaENpQcRK1rGysOT9wCKcPnWfnyUxaBnmojmQyikuNPNlhNG3cwnnwUfO97APSoiKEEGbH8UTZHD+6prVrjp8rad+grDjZkZShOIlp2ZmUwWmdAxvb9aFJzw6q49wUKVSEEMLMeKccB8C1Ze0aOv9K2tdz5MGdixnywTOUFpeojmMyVsefA6BHEx+srcxvNNp/kks/QghhRnKzcrEtKus46tuupeI06oUHevDklh9xLsrn2J9bCevdWXUk5TSjke4vjMHeuyExg15SHeemSYuKEEKYkePZJUSPn0+P53/GPaCe6jjK2ejtON64OQAXl61RnMY0HFu5iS4JW5iwZQHtG/mojnPTpFARQggzciwtF3Q66ob4q45iMvJjOgGg37pZcRLTcPH7HwE43DwGe1fzH7lYChUhhDAjiX8Pne9t/l9AVaVO71sBCD60F2NJqeI0amlGIwFrlgJgvOtuxWmqhhQqQghhRtp98BJfLXydDueOqI5iMhr06UK+rZ46+dkkb9mjOo5Sx9dsxj/9DIU2doQ/fK/qOFVCChUhhDAjYQd2cOvxXQQ6WauOYjLsHO05ERYJwPk/VilOo9aFr74HIKFlJ5zr1lGcpmpIoSKEEGaiMCcPv/RUAOq1a6E2jInJadcRg7UNF5POqI6ijGY0ErS27LKPNmSo4jRVR25PFkIIM3F29wFCNCPZeic8w4JUxzEp+icn0NzjFhzdXelr1LAy87FDKiPu8GlOezXA3pBPxIPDVMepMtKiIoQQZiJjVywAqX4hZj13S3VoFhGElZMTGXlFHD6XozqOEktO5PL4nS/w+uercajjqjpOlZFPuhBCmImiuLI5frKDwxQnMT12Nla0CykbTn/r4XOK09Q8o1FjedxZAPq2sqzWNilUhBDCTOiPHgbA2KSJ4iSm6e7cEyyb+wRtnhytOkqNO7h5Hw4nEnHW29C1sZfqOFVK+qgIIYSZyCzRkWPngENUM9VRTFJERCAN0k6Qn3mGovxC7BztVUeqMQVvTWHdmoWsGvQo9ra9VcepUtKiIoQQZqCk1MiYPhOJnPgzHncNUB3HJAV3aUeGkxuOxQaO/bFWdZwaU5iTR5ONKwCoP9CyihSQQkUIIcxCckY+xaUaDnY21PeUUWmvxMrGmqSo9gBkLV2pOE3NiZ81H1dDHufcvIgYdrvqOFVOChUhhDADfw+d38DLqVbeenu9Srt3B6DO1o2Kk9Qc63llg7wl9b0LKxvLGwhQChUhhDADzp/MYO2cMTy4Z4nqKCbNf1DZZbGwpHhy0tIVp6l+F48n0+zAVgD8xj2sOE31kEJFCCHMgF3CQcIyTuNrU7sn3fsvfi2acNrTDxvNyPFFy1XHqXbHPpqDjWbkaFATgjpGq45TLeSuHyGEMAPuJ48BoJc7fv7TsS592XnkBBfybGmhOkw1068q64uTOchyRqL9NylUhBDCxBlLSvE7dwoAzzYt1IYxA8Vvvs1T3+0m0OjII5qGTmeZfXriTmcx5PYX6XdsB68/YZmXfUAu/QghhMk7n5CIU3EhRVY2+EVLi8p/6RDqiZ21FckZ+Zy4mKc6TrX5YWcyxda2MHQI7oG+quNUGylUhBDCxKXtjAUg1csfW72d2jBmwElvQ9vgOjRJO8HBxZY5nkpubgF/7EsB4J62gYrTVC8pVIQQwsQVxMYBkBEUqjiJ+XgsYRUr5j5Bg4+mqI5SLeLf/ojlnzzA40kbyuc4slRSqAghhIk7U2JDvHcD8iMiVUcxG/6DbgOg8ZG95KVfUhumGnjM/4bArPN09rK12D44f5NCRQghTNwPzXvTf/THZEycpDqK2QiMaUmqhy92pSUkLvhDdZwqlbh8Aw1TjlBkbUPjSWNVx6l2yguVM2fOcN999+Hp6YmjoyMtWrRgz549qmMJIYRJ0DSNY/8blTbMS4bOv146KytS2nYGwPCHZRUqmdM+BuBA+57UCfJTnKb6KS1UMjMz6dixI7a2tqxYsYKEhAQ+/PBD3N3dVcYSQgiTcSErn5y8Qqx0ZcPni+tnP7Ds8k/Qjg1oRqPiNFUj81QqURuWAuA43vJbU0DxOCrvvfceAQEBzJ07t3xZcHCwukBCCGFi0pau4dC0Iexu1Br7Kf1VxzErjYYNxDDOlnqX0jixfgcNuseojnTTDr81jZiSIo75N6LJ4L6q49QIpS0qS5YsoXXr1gwePBhvb29atmzJnDlzrrq+wWAgOzu7wkMIISxZXmwc+tJinPQyPueNcnB34VBk2WzKZ7//WXGam1dsKCL0528ByHzoMXRWyntv1AilZ3nixAk+//xzGjZsyKpVqxgzZgxPPPEE33333RXXnzJlCm5ubuWPgICAGk4shBA17FACAAVhjRQHMU/p459i+NC3eKfZANVRbtqKwxd5ZOBkfm3dj6hnHlEdp8YoLVSMRiOtWrXinXfeoWXLljz66KM8/PDDfP7551dcf/LkyWRlZZU/UlJSajixEELULKfjiQBYN41QnMQ8tRraj+0NWhJ/oYCTZj5K7debk4j1a0zylOnonRxVx6kxSgsVX19fIiIq/vI1adKE5OTkK66v1+txdXWt8BBCCEtW70wSAO7RzRUnMU91nOyIaeAJwKr4c4rTVN7eUxnEplzCztqK4e2CVMepUUovenbs2JEjR45UWHb06FGCgmrXP4IQQlxJ1pk06uZmAODbvqXiNObrDm/oNmsO4auzYe+fquNUSsnQe3izyIakRyfg5aJXHadGKS1UnnzySTp06MA777zDkCFD2LlzJ7Nnz2b27NkqYwkhhEk4u2MvbsB5Ny98vCx7mPTq1DXcG4/dS7BCI+3QcbybmNdUBCc37KTtjtW0RkdKszdUx6lxSi/9tGnThsWLF/Pjjz/SrFkz3nzzTWbMmMHw4cNVxhJCCJNwKqeU5Y06EN+8o+ooZq1uoxCONiibdTppzjzFaW7chZfLipPYNt0I6hitOE3NU36/22233cZtt92mOoYQQpicXZ4hfHnnC4zuGEx31WHM3KU+t8FncTgvWwLTXlUd57ql7kug5eYVALi89rLiNGrUjpuwhRDCDCX+PXS+twydf7MCHx4BQJOj+0g7dFxxmuuX8sLr2GhGDjRtR8N+XVXHUUIKFSGEMFGXEpNA02jo7aI6itnza9GEQ6FRWKFx4pMvVce5LucTEmmxZjEA1pMnK06jjhQqQghhgvIysvj93WHETx9MQ32p6jgWIXvQUAC8lixSnOT6HJ/0GvrSYg6FRhFxj/kPWFdZUqgIIYQJSt0eC4DBTk8dPy+1YSxE+LgHSHX1YkO9JhxJTlcd55pOpefxRMMBfN
H2LrS336k1w+VfSe09cyGEMGFZe/cDcNavgeIklsPNvx6vfryMN3o8wuL4C6rjXNOMtYlctHdh66PPETG0dk9GKYWKEEKYoOKD8QDkNghTnMSy3B3tD8CivacpLjUqTnNliUnn+W3faQAm9WqsOI16UqgIIYQJsk88CoAW3kRxEsvSPdwHL0cbGsXtYPdPK1XHuaLcQUP54ccXeMAtl0h/N9VxlFM+jooQQojL1U0pu4XWqUWU4iSWxc7Gig+Or6DLTx+xPyEG7u2nOlIFB79fTMu9GyjRWeHXUS77wU20qJSUlLB27VpmzZpFTk4OAKmpqeTm5lZZOCGEqI0Mefn4XjwDgE+7FmrDWKDQx0cDEBm3nXNxRxWn+X8lhiKcnnsGgD19hxLUuY3iRKahUoXKqVOniIyMZODAgYwdO5YLF8o6JU2dOpVnnnmmSgMKIURtk5yayZy2d7Ey4ha8GoeojmNx/Ns1J75xNFZoJL0zTXWccntenkrI2RNk2TvT+PMPVccxGZUqVCZMmEDr1q3JzMzEwcGhfPmdd97JunXrqiycEELURkcLrHiv6yi+eHxKrb4ttToVjXkMgCa//0DBpRzFacpmym706fsAHBrzNO6BvooTmY5K/QZs3ryZl156CTs7uwrLg4KCOHPmTJUEE0KI2ioxreyLU4bOrz5RY0eS6uGLe0EOB975WHUcDj80njr52Zz0CSZ6Su0dhfZKKlWoGI1GSksvHynx9OnTuLjIUM9CCHEzcvfuxys3k4ZeTqqjWCxrWxuSRzwMgO83szCWqBv9d9fhVJz27QEg9/1p2NrrlWUxRZUqVHr27MmMGTPKn+t0OnJzc3n11Vfp18+0elALIYS5ufejyez6dATtjuxUHcWiNXtpItl6J3J1NmzYEq8kQ2FxKc8tPcodIz7k68kzaTbiTiU5TFmlbk+ePn063bp1IyIigsLCQu69914SExOpW7cuP/74Y1VnFEKIWqOkqJj655MB8GrdXHEay+Zctw5zP/6JN44bCY/NosstGlZWuhrNMPPPY5y4mIeXuxN3P/lIjR7bXFSqRcXPz4/Y2FgmTZrEo48+SsuWLXn33XfZt28f3t7eVZ1RCCFqjbOxh9CXFlNgo6delIxKWt3uvK8nzvZ2HDqbzar4czV67MRlf6F/83XsSop5c2BT3Bxta/T45qLSA745ODgwevRoRo8eXZV5hBCiVkvfGUsAkOoTSKitjMlZ3dwd7RjdKYQ5K+I4+sYH9PrhPaxr4H3PSUvH8f7hjM84SxM3a3o0u6Paj2muKtWi8u2337Js2bLy588++yzu7u506NCBU6dOVVk4IYSobQri4gDIDApVnKT2eLBDEMu/m8iEhdPY/fLUaj+eZjRy5M77qJ9xlrPuPrSZLWOmXEulCpV33nmnfPyUbdu2MXPmTKZOnUrdunV58sknqzSgEELUJjaHDwNQ3ChccZLaw81Jz/n7HgCg0SfvkXW6ei8B7XptOq23rqREZ0XWV9/iVl+6TFxLpQqVlJQUwsLKZvT87bffGDRoEI888ghTpkxh06ZNVRpQCCFqE/eTxwDQRzZVnKR2af3eiyTVC6FOfjaHH55Qbcc5+sc6oqa8AMCuB58k/K7e1XYsS1GpQsXZ2Zn09HQAVq9eTY8ePQCwt7enoKCg6tIJIUQtomkaX7YcwJetB+LRub3qOLWKjd6O/A+mA9Bu5c/EfbOoyo9xPiERj3uHYF9SRGzzTrT7/L0qP4YlqvQ4Kg899BAPPfQQR48epX///gDEx8cTFBRUpQGFEKK2OJtVyE/hXXi35yPUbxmhOk6t03T4QHb0HgyAz4QxVXoJ6FJ+Ee/MWou+qICkeiGErV2ClY11le3fklWqUPn000+JiYnhwoULLFq0CE9PTwD27NnDvffeW6UBhRCitkhMK5t9PriuE7bWMsePCpE/ziHFKwDv7IvsGTmeUqN20/vMKSxm5Nc7+d0hkHEPvo9+xTKc69apgrS1Q6XuwXJ3d+eDDz7gwIEDpKWlsWTJEgCio6OrNJwQQtQm6dv3EH06gcDgNqqj1FqOddw48813bB73FE83vZOBSxN47fbK9xfKSErhrc9Wst/ajzqOtrz45P34+chUMzeiUoXKypUruf/++0lPT0fTKlabOp3uivMACSGEuLb6875k0ZpFbDM+BmO6qo5TazXs15Wji5eS+eM+vtl6Ejd7Gyb2aHjDM1mf2rIHqzvu4NWcTE4/NI1Xxg+lkRQpN6xSbYvjxo1j8ODBpKamYjQaKzykSBFCiMpxPZEIgG2k9E9RrX9zP17oV3aLeMGU99jZbxhF+YXXta1mNLL7rY/x6taJgIunKXBw4oPhrWlW3606I1usSrWopKWl8dRTT+Hj41PVeYQQolbSjEZ8U5MAcI+WOX5MwSOdQ/E+l8yAqd9irRk50XAvJbNm0+i27lfd5vjqTRRMfIrWh3YDEN84Gp+lC6kXFlxDqS1PpQqVQYMGsX79ekJDZeREIYSoChmnzuBZkIMRHfXbtVQdR/zPHUO6se/8VwQ/P5EGqcdhwK3EN2xJ7h134dahLbqWLcnWrNmfcommEx4kZt96AAzWtuwd8ThtZ39QI0PyWzKd9u9OJtchPz+fwYMH4+XlRWRkJLa2FSdSeuKJJ6os4LVkZ2fj5uZGVlYWrq6uNXJMIYSoDvHzf6fpfXdwxsOX+umpquOIf8lIOs2J+8fQYssKbDRj+fJbH/qc454BADy98XvG7FjI/nY98Pt0Gn5yi/lV3cj3d6XKvB9++IFVq1bh4ODA+vXr0en+f1psnU5XY4WKEEJYitz9ZXP8XAxoQH3FWcTlPEL88di0lPMJiZx4/1Ocdu+g3qlE6jjYUt/dgSa+rrjGPM2lJlNo3ShEdVyLUqlC5aWXXuKNN97g+eefx+oGe0ELIYS4gvgEAApCGykOIq7FJ6IhPnNnlD9fqC5KrVGpQqWoqIihQ4dKkSKEEFXk9+jeLC12o2t/mftFiH+qVKUxcuRIfvrpp6rOIoQQtdZafX2+b3UbdXp0UR1FCJNSqRaV0tJSpk6dyqpVq4iKirqsM+20adOuaz+vvfYar7/+eoVlPj4+nDtXvVNsCyGEKcnKLyYtxwBAQ29nxWmEMC2VKlTi4uJo2bLs9rmDBw9WeO2fHWuvR9OmTVm7dm35c2trmaRJCFG7JO89yN1x6zgf2gQXe9v/3kCIWqRShcpff/1VdQFsbKhXr16V7U8IIcxN/orVfLh8OgeatgceVh1HCJOivDdsYmIifn5+hISEMGzYME6cOHHVdQ0GA9nZ2RUeQghh7rSD8QDkh8kdP0L8m9JCpV27dnz33XesWrWKOXPmcO7cOTp06EB6evoV158yZQpubm7lj4CAgBpOLIQQVc/x+FEArJrKAGFC/FulRqatLnl5eYSGhvLss8/y1FNPXfa6wWDAYDCUP8/OziYgIEBGphVCmLXz7t74ZF3g8KIVhN/VR3UcIapdtY9MW12cnJyIjIwkMTHxiq/r9Xr0en0NpxJCiOqTnZaOT9YFAHzbRytOI4TpUd5H5
Z8MBgOHDh3C19dXdRQhhKgRqdv2ApDm4ombn5fiNEKYHqWFyjPPPMOGDRtISkpix44dDBo0iOzsbEaOHKkylhBC1Jic3fsBOO/fQHESIUyT0ks/p0+f5p577uHixYt4eXnRvn17tm/fTlBQkMpYQghRYzZFdOCTwa/TPSqASNVhhDBBSguVBQsWqDy8EEIod6DQho0Nounds5nqKEKYJJPqoyKEELVN4vlcABp6uyhOIoRpMqm7foQQojbJTb/E0CWzSawbSEOvHqrjCGGSpFARQghFUrft5YltP3HRuQ51nKeqjiOESZJLP0IIoUjWnv/d8VNf7vgR4mqkUBFCCEVK4srm+MkJlTl+hLgaKVSEEEIRh2NHANDJHD9CXJUUKkIIoYh3ynEAnFtGKU4ihOmSQkUIIRTIz8yiXsY5AHzbt1KcRgjTJYWKEEIokLojFis0Mh3d8AjxVx1HCJMltycLIYQCBzyDefCR2XR2LeVN1WGEMGFSqAghhAKJ6QWcquOH1j5QdRQhTJpc+hFCCAVk6Hwhro+0qAghhAJ9vpxCuM6RJne9pDqKECZNChUhhKhhhdm53LnlN6w1Ixc831IdRwiTJpd+hBCihp3Zvg9rzcglBxfqhgSojiOESZNCRQghaljmrlgAzvqFoLOS/4aFuBb5DRFCiBpWsv8AANlhjRUnEcL0SaEihBA1zOHoobIfmkWqDSKEGZBCRQghapjPqUQAXFq3UBtECDMghYoQQtSgnPRLeGRnAFD/lnaK0whh+uT2ZCGEqEFHc40Me+oXooszWFDfW3UcIUyeFCpCCFGDjpzLpdjaFrsmzVRHEcIsyKUfIYSoQUfOZQMQXk+GzhfiekiLihBC1KCO014hMiMbl+gXgCaq4whh8qRQEUKIGqIZjbTZvY46+dkcc3lZdRwhzIJc+hFCiBqSfuwUdfKzKdVZ4d8hWnUcIcyCFCpCCFFDUrfsBuBM3frYuzorTiOEeZBCRQghakj+nlgALgY3VBtECDMihYoQQtQQ6/iDABiaNFWcRAjzIYWKEELUEPcTRwHQt2iuOIkQ5kMKFSGEqAGlpUZKDEUY0eEV00p1HCHMhtyeLIQQNSA5s4C+oz7G3VjEntZRquMIYTakRUUIIWrAkXM5APgH1MXaxlpxGiHMh8kUKlOmTEGn0zFx4kTVUYQQosr9Xag09nFVnEQI82ISl3527drF7NmziYqS5lAhhGVq/faz/HzsKBe8nwOkM60Q10t5i0pubi7Dhw9nzpw51KlT55rrGgwGsrOzKzyEEMIc+Cfspe3pBALc9KqjCGFWlBcqY8eOpX///vTo0eM/150yZQpubm7lj4CAgBpIKIQQN6cwOxf/C6cB8O3URnEaIcyL0kJlwYIF7N27lylTplzX+pMnTyYrK6v8kZKSUs0JhRDi5iVv3Im1ZiTDyY26DYNVxxHCrCjro5KSksKECRNYvXo19vb217WNXq9Hr5dmUyGEebm0dRcAZ4Ia42GlvCFbCLOirFDZs2cPaWlpREf//wyipaWlbNy4kZkzZ2IwGLC2llv4hBDmzxgbC0Bek2ZqgwhhhpQVKrfeeitxcXEVlo0ePZrw8HCee+45KVKEEBbD7WgCADYtW6gNIoQZUlaouLi40KxZxb8unJyc8PT0vGy5EEKYK6NR46TenTrOHtTt1E51HCHMjkmMoyKEEJYqOSOfxwY8i52NFQkdo/97AyFEBSZVqKxfv151BCGEqFIJZ8vGewqv54KNDJ0vxA2T7udCCFGNjpy8AECErwydL0RlSKEihBDVqNvLY9k5cwS9ju9QHUUIsySFihBCVCPfk0fwzsvEt4G/6ihCmCUpVIQQoppknkrFJ6vs0o9/F7njR4jKkEJFCCGqyZkN2wE47emHi5eH4jRCmCcpVIQQoprk7tgNQFqDcMVJhDBfUqgIIUQ1sY47AIChaZTiJEKYLylUhBCimngeOwSAQ9tWipMIYb5MasA3IYSwFIXFpfwZ0Jw0G0eCO7dXHUcIsyWFihBCVINjabm81e1B3B1t2RcRpjqOEGZLLv0IIUQ1SEgtGzo/wtcVnU6nOI0Q5ksKFSGEqAapew/iZMiniQydL8RNkUJFCCGqQZ/3niVuxlB6JW5XHUUIsyaFihBCVLESQxFBKYlYoeHbvoXqOEKYNSlUhBCiiiVv2YNDiYFcO0f82zRXHUcIsyaFihBCVLGL67cCcCo4HCsba8VphDBvUqgIIUQV03btAiC7mbSmCHGzpFARQogq5n44DgDbtm0UJxHC/EmhIoQQVai40EBwSiIA9bp1VJxGCPMnI9MKIUQVSjyTyY/dHyQq/RSDWjdTHUcIsyeFihBCVKED6UV83+o2OoR6MthKGq2FuFnyWySEEFUo7kwWAJH+boqTCGEZpEVFCCGqkOPyP2hS6kKUT6TqKEJYBJ2maZrqEJWVnZ2Nm5sbWVlZuLrKfBpCCLWK8gvB1QW70hLO7D5I/eimqiMJYZJu5PtbLv0IIUQVSd64A7vSErLsnfFr2UR1HCEsghQqQghRRdL/3ARAcoMIdNKRVogqIb9JQghRRax27AAgt5UM9CZEVZFCRQghqki9Q/sBcOwsA70JUVWkUBFCiCpwKfksARdSAAjq3VVtGCEsiBQqQghRBU6tWg9AilcA7oG+asMIYUFkHBUhhKgCG70b88GQN+ge4MRo1WGEsCDSoiKEEFVg58ViNoW0wmbQ3aqjCGFRlBYqn3/+OVFRUbi6uuLq6kpMTAwrVqxQGUkIIW6Y0agRm3IJgJYB7kqzCGFplBYq/v7+vPvuu+zevZvdu3fTvXt3Bg4cSHx8vMpYQghxQ5L3HOTxlV/S69Qewuu5qI4jhEVR2kdlwIABFZ6//fbbfP7552zfvp2mTWXoaSGEeUj7YzWP7VhIQsYxbKxfUR1HCItiMp1pS0tL+eWXX8jLyyMmJuaK6xgMBgwGQ/nz7OzsmoonhBBXpW3fDkBW82jFSYSwPMo708bFxeHs7Ixer2fMmDEsXryYiIiIK647ZcoU3Nzcyh8BAQE1nFYIIS7nFb8PAH2nDoqTCGF5lM+eXFRURHJyMpcuXWLRokV8+eWXbNiw4YrFypVaVAICAmT2ZCGEMtnnL+Lk64O1ZuTC4eN4NW6gOpIQJu9GZk9WfunHzs6OsLAwAFq3bs2uXbv46KOPmDVr1mXr6vV69Hp9TUcUQoirSlqyhuaakTMevtSXIkWIKqf80s+/aZpWodVECCFMWf669QCkRslEhEJUB6UtKi+88AJ9+/YlICCAnJwcFixYwPr161m5cqXKWEIIcd1sE/43nEKnW9QGEcJCKS1Uzp8/z4gRIzh79ixubm5ERUWxcuVKevbsqTKWEEJcl8LiUu697QXqtz/NNyP7qY4jhEVSWqh89dVXKg8vhBA3ZX/KJYqMGrlBDQgMra86jhAWyeT6qAghhLnYdTIDgLbBHuh0OsVphLBMyu/6EUIIc9XipQl8mpFNacSLqqMIYbGkUBFCiEooMRTRcs96nIoKOO4r4zgJUV3k0o8QQlTCybVbcSoqIFvv
RHDXdqrjCGGxpFARQohKuLhyLQBJ4S2wtpXGaSGqixQqQghRCfqtWwDIbyfz+whRnaRQEUKIG1RiKCI0fhcAngP6KE4jhGWT9kohLFRG0mlSVm8kraCEPY3bciHHQInRyKC572FrZwP+/jg0b4Z/7y54hshM5Dfi2Ir1hBvyyLJ3JrR3Z9VxhLBoUqgIYSEKc/I4/N2vFP26GL/9O/FPP4MHsNM/gi+GTy1f78Utq/HOy6yw7Qm/UM5370vA+IfxbxtVw8nNT8LxNAp9G1ESGERr6Z8iRLWS3zAhzJimaew+mYHxgQeJ3LKKFsWFFV4/5R1IQcNwRnUIxsfVHjsbK44UTeT4mdPYJZ/CM+kowedP0iD1OA3mzeTQqqWMfncBj3VrSNsQD0VnZfoWOIey8/5pvHV7BK1VhxHCwkmhIoQZys3J5+f95/lhZzLH0nL5IjUNx+JCzrl5capTT+wH3kZwv1sJqu9NENDlnxt3eqXCvjKSTnN83iL0Py/g65BO/HX0In8dvUivMA9eamZPYPuWNXlqJi/PUMK+5LIWqU6NvBWnEcLySWdaIcxIRtJptg1/nKJ6fnw970+OpeXiYGvN4Yef5PDiVfhknKPd0vk0f/ge3Opf35eoR4g/bV6eQFTcNiZ+/Rr3tA3ExkpH/QVzqdepLdsfepoSQ1E1n5n52LfnCPr8PPzrOBDk6ag6jhAWT6dpmqY6RGVlZ2fj5uZGVlYWrq4yMqSwXOlJKSROfJHmy3/GocQAwHc9RqB7803uaOGHi71tlR7v+IVczt8+mA7bVwJwJKQp7ksX4xPRsEqPY4623zGS1kvmsfaecfSZ/5HqOEKYpRv5/pYWFSFMWPb5i2wb/jgOjRvRfsn3OJQYSAwMZ9/0Lxm+/GtGtA+q8iIFINTLmZgty9j1xgyy7Z1pnBSPTdu2JCz4o8qPZW58dm7GRjPi1aKJ6ihC1ArSoiKECSosLuX7jYn0v7sLfllpABwNbILh1ddoNmoQOqua+xvjzJ54Cm+/g9DUY5TorNjz3Nu0m/J8jR3flFw8dpK6DUMAyDx5hjpBfooTCWGepEVFCDNVbChi/o5TdHn/L95ec5yFEd045RPEvulf0jDpIJEPDKnRIgWgfnRT/OL3srtjX2w0I63fe5EfvluNGf+NU2knFywB4Jh/QylShKghctePECbAWFLK3ve/wPfDd/ij1zjOB0ZR390B/w/ewr9NMEGKx+pwcHcheuNSto0Yx28ZNvyUUMzJFYeZ3DccnU6nNFtN0i1fDsCFDl0JU5xFiNpCChUhFNKMRg589RPOr71C69RjADy+93f6jB3GPe0C0dtYK074/3RWVsTM/4z4TSdg2SFmbzyBdV4uz94dXeOtPCqUGIpouK9sfp86Q+9SnEaI2sPy/3cRwkQdXriCQ41b0fyRewlNPUaO3pFtoyYSvWUlozqGmFSR8k8P3dKAd++KxDsnnTsfG8SOkU+ojlQjji5ehWthLpmOrjQc0EN1HCFqDWlREaKGHT6XTep9D9F93S8AFNrYETvwPsKnvUVMoK/idNdnWNtA/H87R6P0ZJj3KdtdXGj/2RTVsarVajyY3+tx2vg4cIcMmy9EjZEWFSFqyIkLuTz5Uyx9P9rEEodASnRW7Oh5N1kH4mm/8CvczaRI+Vund55l20NPA9D+83fZPeVTxYmq17LUEua37If100+pjiJErSKFihDV7OSGnezu2Je59z3H4n1n0DQoGTqU1G17abd6IT5NzLdbZsycD9h+5ygAol6eSMJPy9QGqiYpGfkkpuVibaWjcyMv1XGEqFWk/VKIanJ8zRayXnqVFjv/JBgNXxcvzg+9j/G9Ioj0d1Mdr8q0/flL9sacotXuv/AbfQ8pQRsIsLD5gY7MnMv9e+K40Os23ByqfoA9IcTVSYuKEFVIMxo5+P1i9rfoRGivTrTauQ4rNPa26U7BwkXMfiDGoooUACsba5qs+Z2jQU1wL8ghY/gosguLVceqUgHffM4ba2cx6twe1VGEqHWkUBGiChhKSvlldwoLewyn2f130Xz/Fozo2BPTm6S/ttNq5zrCet2iOma1cXB3wWPdSv5q2olHez/FkwtiMRotY0C4tEPHaZwUD0CDh+9TnEaI2kcu/QhxE87sOciSg+f5KkXjYq6BFv5t6G+7mLied+L3yvNEt2uuOmKNqRsaiOfKP8j4YhvrDqfx0bpEnuzZSHWsm5Y0Zx7ewOGQZoSHh6qOI0StIy0qQtygovxC9nwwm7iIttRvHYnLjA+5mGugnqs9vR8YSFHyadot+5GAWlSk/C3K352372gGwJHPvmHfp98rTnTznJeVDZuf1XeA4iRC1E7SoiLEdTAWl3Do52XkffM9jTevIrowt2w5OhraFPHFfa24tYkPttZS+w9uHUDRb0sY/tsUcpc7ktyqKYExrVTHqpSLx04SnhgLQMBDctlHCBWkUBHiKkpKjew5lcmahPMMGDeE5qfiy19Lc/HkxG2DCZw0nnYtIxSmNE1DXnqIhJ+/JCIxlot33k1e/D6cPN1Vx7phxz7+ivaaseyyj/w7C6GEFCpC/EP2+YskfreQkmXLGX/LI6QVl024V8c3gpBzJznSqRcOI0fQZNgAvGV00quytdfjvXwxF1pFE3z+JHsGDKPV5uVmNyfQmYTjFFtZk3XXYNVRhKi1dJoZz9WenZ2Nm5sbWVlZuLq6qo4jzFBhTh7Hf19D9qq1uG3fTNiJeOyMJQCMGvQasZExdG/sTf8gRzpF+qN3clSc2LwcXriC0KEDsDWWsn3CK7Sf8brqSNftWFouPaZtwLMwh9XP9cDTt67qSEJYjBv5/pY/CUWtkpFrYP+ZLGKTL6EtWsjYL1+jaWnFMT+SvQJIvaUHT43pQ0TXNthIv5NKCx/Ul+3jXqD9x28S/cnbHO7UnvBBfVXHui6/x54BoHmLUClShFBIChVhkTSjkfRjpzi7dS95u/Zgu3cPvkfj+Cz6Dua16g9Ao2I3niot5qKzByej2mLs0pX6d/UjsHUkgYrzW5J2019jz84dRG9fzfpZP+PZuzteLnrVsa6ptLiEXau2g21dBrbwUx1HiFpNaaEyZcoUfv31Vw4fPoyDgwMdOnTgvffeo3HjxipjCTNSYiji3LlMThVbc/xCLmlxR+j3/nP4nTlB3YIc/v13cIuzR9nqNYQW/u60CmxK8kMxBLRrQV0z6zthTnRWVoQv/YmXJ33G997NWf/jXuY92M6kW6riv13IgmmjWBfekY5vblAdR4haTWmhsmHDBsaOHUubNm0oKSnhxRdfpFevXiQkJODk5KQymjARuYYSzl/MpuCvjRSeTKbk9Bl0J5NwOJ2Mx7kUfDLPsyGqJy/2HgeAkyGfZ47tB6BUZ0Vq3fpcCArD0KIVLrd0oGfPWxjk+89J5YJr/qRqISdPd0a++wS/ztzM9hMZvL/yEJP7N1Ud66pKv5gFgFPDEOxtrRWnEaJ2U1qorFy5ssLzuXPn4u3tzZ49e+jcubOiVKK6aJpGrqGES/nFZGblwZ9/UpR2kdILFzGmp2OVno7
1pUwcLqaxNyCC9zoMJ9dQgkNRIYemD7rqfgOyLxDq5USwpxON6rmw0+8LPJtHUD+mFQEuTgTU4DmKqwvzdub9wc15ddZauj/2PPuemkjLiQ+qjnWZC0dOELl3IwD1nh6vOI0QwqT6qGRlZQHg4eFxxdcNBgMGg6H8eXZ2do3ksnSa0UhJUTGG3HyK8wowoKPQyRVDiZGiAgO2e3ZRkl9AaUEhxoICSgsKMBYY0AoLyPAJ4FiLDuQWlVCUnUP/mW9gk5+LTUEedvl56AvzsS/Mx8FQwKpGMUzqOwEAu5Jijn447KqZLhRBbnTZ3Tc2Ls4c9m9MibMzhZ7eFAcEYd0wDKcmDanbvAmdGjVgnc0//urtE16t75eovH6RvjhnbKddykFyn3uClPatTG6m5WPvfkKMZuRQaBRNurRVHUeIWs9kChVN03jqqafo1KkTzZo1u+I6U6ZM4fXXq//2xkNns9m2dDNN/lwCmgZGDdDQGY2gaWhGjUMde3K6cQs0NDxSkmi77Afg/9f9ezudpnGgQy8SI9tj1DQ8ziXTY9Gcste1/+2z7A1AZzSyp31P9rfujoZGnbRUBs2fBhrl+9QZjejQ0JUa2d76Vv6M6UepUcP94lkmzXkZK2MJ1qWlWBlLsSotxdpY9vMfrXozq9sISowaHpcusnTmA9gYS7E2GrHWjNgCf09eP79Fn/JLKW4FOez/+J6rvle/RXThwwFlPUFsS4t5dfOyq67rmZsJgL2tFXXcXDgUFIHR3gGDmzvF7h4Y69RB5+mJjb8fHuGN+bNTDN6u9jjrbeD1wzfzTypMSMzc6STs3kpEYiwX7ryb/IR9ONYxjRmlDXn5NFz4HQB5o0yvtUeI2shkCpVx48Zx4MABNm/efNV1Jk+ezFNPPVX+PDs7m4CAqm/YP3o+h60rt/PAr7Ouus7SfEfmXyz7z7XDyf2MX/7jVdddVerOT4b6ALQ6c4TJG/646rrrbbxYpm8CQHjaKd7es/6q625xqs92n/YABGZm0ij50FXXtb2UycXcIgBsio04Fhuuuq7eWIqTnTV6W2tcHVw47VmfYls7Suz0lNjaUWpnR4mtHqOdHcYmrbinbQCOdjY46W3Ylj8ZKxdnrN1csXFxxcbdFX0dN/QebjTz8eJwgN//X/OfHH/VDMJy2ert8F76KxejWxNyLondA+4heuNSkxgMbv/UL2ibm0GaiydRTz2iOo4QAhMZ8G38+PH89ttvbNy4kZCQkOverroGfIs7ncXGX/8kes0i0OlAp0OzsgJ06KzKnh/p2JOz4S3Q6aDOuRQiV/0K/3sNndX/frYCnY6UVh240Kysedsp/TyN1ywp3w86HVhZlf98oWlL0pu2QKfTYZ+dSfCfywAdWOnQWVmDDtDp0NnaktuwCblNo7Cx0mFjKMBr91Z0NrZY2dpiZWuDla0NOhsbrGxt0Xy80QICsLHSYa1p6FNPY2Vjg7WdDVZ2ttjY2mLn4oydkwPWNtJ5UFS/hJ+W0eie27HRjGx/6nXaf/iK0jyaprE3vC3RR3ez/dFnaf/Fe0rzCGHJbuT7W2mhomka48ePZ/Hixaxfv56GDRve0PYyMq0Q5m37+JdoP/NtiqxsOLFwGeF39lKWZf2RNB6Zs5VBiZt47ovncatwd5gQoirdyPe30rbWsWPHMm/ePH744QdcXFw4d+4c586do6CgQGUsIUQNaffRG+xteyvJ7vV4bcNpLuZe/ZJkddI0jelrjlJkY4vDww9KkSKECVHaoqLT6a64fO7cuYwaNeo/t5cWFSHMX+7FTIbN2sbBHI2YBp58/2DbGh8MbtPa3YxedQZbez0bn+1m8iPnCmHuzKZFRdO0Kz6up0gRQlgG57p1mPHwLTjaWbPtRDpz5v1Vo8c3lpRSf+Qw1n05hud98qRIEcLEqO9mL4So9cK8XZg6KIoHdy7moQd6s+/juTV27N1vzKBB6nHqFORwx52dauy4QojrI4WKEMIk3BblR0+XImyNpTSaNJbEZdXfspJ1+hwNP3wTgIQHxuNW36fajymEuDFSqAghTEb0T3M42KQ1TkUF1B1yJ8nb9lbr8Q4/+AR18rM4WS+Y6GmvVeuxhBCVI4WKEMJk2NrrCd64msSAxtTJz8KuX1/OJyRWy7HivllEu9W/AJA37SNs7aVvihCmSAoVIYRJca5bB88Na0nxCqDepTSKO3cldX/VTqGQkXQa3/FlI8/u6D2YpvfcXqX7F0JUHSlUhBAmxyPEH5u1q0n18MU/PZUFL83k+IXcKtl3SamRl5YkEOsTykmfYKIWfFUl+xVCVA8pVIQQJsk3KhybzZv4dOBYPm7al6GztrEvOfOm9qlpGi8uPsjys8WMG/YaRWvX4eDuUkWJhRDVQQoVIYTJ8m4SyrB5H9LUz5WLuUU8MmMN296eWal9GUtK+f2Zqfy86xRWOvjk3mgaNWtQxYmFEFVNChUhhEnzdNaz4JH29G7syYeL3yXmpfHsbdeTjKTT172Pgks57LulH3dMe57Jf83ljYHN6BkhtyILYQ6kUBFCmDwXe1s+H9EG265dKLayptXOtVhHNGHbo8+Sn5l11e00o5G4r38mI7Qx0dtXU2RlQ8vbu3Jf+6AaTC+EuBlK5/q5WTLXjxC1z7GVG7B68EEapB4HIM/OgYQOPcl56FG8u3ZEb2NF3v6DFP7+B16/LyT0TNntzefdvLgwczbN7rtDYXohBNzY97dNDWUSQogqEdanC6UnD7PrnZn4fvI+/umptFm/hCdcGrMkruzvrgd3/cbLf34JgMHaln19h9D0q49o5u2pMroQohKkUBFCmB1rWxvavDoR7eUnSPhlBdnzF1DSrh11NT1GTSO5cQv2XeqCoeMthD/zOO0DfVVHFkJUklz6EUIIIUSNupHvb+lMK4QQQgiTJYWKEEIIIUyWFCpCCCGEMFlSqAghhBDCZEmhIoQQQgiTJYWKEEIIIUyWFCpCCCGEMFlSqAghhBDCZEmhIoQQQgiTJYWKEEIIIUyWFCpCCCGEMFlSqAghhBDCZEmhIoQQQgiTJYWKEEIIIUyWjeoAN0PTNKBsumghhBBCmIe/v7f//h6/FrMuVHJycgAICAhQnEQIIYQQNyonJwc3N7drrqPTrqecMVFGo5HU1FRcXFzQ6XRVuu/s7GwCAgJISUnB1dW1SvdtCuT8zJ+ln6Olnx9Y/jnK+Zm/6jpHTdPIycnBz88PK6tr90Ix6xYVKysr/P39q/UYrq6uFvsBBDk/S2Dp52jp5weWf45yfuavOs7xv1pS/iadaYUQQghhsqRQEUIIIYTJkkLlKvR6Pa+++ip6vV51lGoh52f+LP0cLf38wPLPUc7P/JnCOZp1Z1ohhBBCWDZpURFCCCGEyZJCRQghhBAmSwoVIYQQQpgsKVSEEEIIYbKkULkOt99+O4GBgdjb2+Pr68uIESNITU1VHatKnDx5kgcffJCQkBAcHBwIDQ3l1VdfpaioSHW0KvX222/ToUMHHB0dcXd3Vx3npn322WeEhIRgb29PdHQ0mzZtUh2pymzcuJEBAwbg5+eHTq
fjt99+Ux2pSk2ZMoU2bdrg4uKCt7c3d9xxB0eOHFEdq0p9/vnnREVFlQ8SFhMTw4oVK1THqhZTpkxBp9MxceJE1VGqzGuvvYZOp6vwqFevnrI8Uqhch27duvHzzz9z5MgRFi1axPHjxxk0aJDqWFXi8OHDGI1GZs2aRXx8PNOnT+eLL77ghRdeUB2tShUVFTF48GAee+wx1VFu2k8//cTEiRN58cUX2bdvH7fccgt9+/YlOTlZdbQqkZeXR/PmzZk5c6bqKNViw4YNjB07lu3bt7NmzRpKSkro1asXeXl5qqNVGX9/f9599112797N7t276d69OwMHDiQ+Pl51tCq1a9cuZs+eTVRUlOooVa5p06acPXu2/BEXF6cujCZu2O+//67pdDqtqKhIdZRqMXXqVC0kJER1jGoxd+5czc3NTXWMm9K2bVttzJgxFZaFh4drzz//vKJE1QfQFi9erDpGtUpLS9MAbcOGDaqjVKs6depoX375peoYVSYnJ0dr2LChtmbNGq1Lly7ahAkTVEeqMq+++qrWvHlz1THKSYvKDcrIyGD+/Pl06NABW1tb1XGqRVZWFh4eHqpjiCsoKipiz5499OrVq8LyXr16sXXrVkWpxM3IysoCsNjfudLSUhYsWEBeXh4xMTGq41SZsWPH0r9/f3r06KE6SrVITEzEz8+PkJAQhg0bxokTJ5RlkULlOj333HM4OTnh6elJcnIyv//+u+pI1eL48eN88sknjBkzRnUUcQUXL16ktLQUHx+fCst9fHw4d+6colSisjRN46mnnqJTp040a9ZMdZwqFRcXh7OzM3q9njFjxrB48WIiIiJUx6oSCxYsYO/evUyZMkV1lGrRrl07vvvuO1atWsWcOXM4d+4cHTp0ID09XUmeWluoXKmz0L8fu3fvLl9/0qRJ7Nu3j9WrV2Ntbc3999+PZsKD+t7o+QGkpqbSp08fBg8ezEMPPaQo+fWrzDlaCp1OV+G5pmmXLROmb9y4cRw4cIAff/xRdZQq17hxY2JjY9m+fTuPPfYYI0eOJCEhQXWsm5aSksKECROYN28e9vb2quNUi759+3L33XcTGRlJjx49WLZsGQDffvutkjw2So5qAsaNG8ewYcOuuU5wcHD5z3Xr1qVu3bo0atSIJk2aEBAQwPbt2022KfNGzy81NZVu3boRExPD7Nmzqzld1bjRc7QEdevWxdra+rLWk7S0tMtaWYRpGz9+PEuWLGHjxo34+/urjlPl7OzsCAsLA6B169bs2rWLjz76iFmzZilOdnP27NlDWloa0dHR5ctKS0vZuHEjM2fOxGAwYG1trTBh1XNyciIyMpLExEQlx6+1hcrfhUdl/N2SYjAYqjJSlbqR8ztz5gzdunUjOjqauXPnYmVlHg1tN/NvaK7s7OyIjo5mzZo13HnnneXL16xZw8CBAxUmE9dL0zTGjx/P4sWLWb9+PSEhIaoj1QhN00z6/8zrdeutt152B8zo0aMJDw/nueees7giBcq+6w4dOsQtt9yi5Pi1tlC5Xjt37mTnzp106tSJOnXqcOLECV555RVCQ0NNtjXlRqSmptK1a1cCAwP54IMPuHDhQvlrKu+br2rJyclkZGSQnJxMaWkpsbGxAISFheHs7Kw23A166qmnGDFiBK1bty5vAUtOTraYfkW5ubkcO3as/HlSUhKxsbF4eHgQGBioMFnVGDt2LD/88AO///47Li4u5a1jbm5uODg4KE5XNV544QX69u1LQEAAOTk5LFiwgPXr17Ny5UrV0W6ai4vLZf2J/u6/aCn9jJ555hkGDBhAYGAgaWlpvPXWW2RnZzNy5Eg1gVTecmQODhw4oHXr1k3z8PDQ9Hq9FhwcrI0ZM0Y7ffq06mhVYu7cuRpwxYclGTly5BXP8a+//lIdrVI+/fRTLSgoSLOzs9NatWplUbe2/vXXX1f8txo5cqTqaFXiar9vc+fOVR2tyjzwwAPln08vLy/t1ltv1VavXq06VrWxtNuThw4dqvn6+mq2traan5+fdtddd2nx8fHK8ug0zYR7hAohhBCiVjOPzghCCCGEqJWkUBFCCCGEyZJCRQghhBAmSwoVIYQQQpgsKVSEEEIIYbKkUBFCCCGEyZJCRQghhBAmSwoVIYQQQpgsKVSEsEBdu3Zl4sSJqmNcUXp6Ot7e3pw8eRKA9evXo9PpuHTpUrUet7LH+eabb3B3d7+hbdq0acOvv/56Q9sIIa5MChUhxH86e/Ys9957L40bN8bKyuqqRdCiRYuIiIhAr9cTERHB4sWLL1tnypQpDBgwwOJmtv6nl19+meeffx6j0ag6ihBmTwoVIcR/MhgMeHl58eKLL9K8efMrrrNt2zaGDh3KiBEj2L9/PyNGjGDIkCHs2LGjfJ2CggK++uorHnrooZqKrkT//v3Jyspi1apVqqMIYfakUBHCwmVmZnL//fdTp04dHB0d6du3L4mJiRXWmTNnDgEBATg6OnLnnXcybdq0Cpc7goOD+eijj7j//vtxc3O74nFmzJhBz549mTx5MuHh4UyePJlbb72VGTNmlK+zYsUKbGxsrjnzeHp6Ovfccw/+/v44OjoSGRnJjz/+WGGdrl27Mn78eCZOnEidOnXw8fFh9uzZ5OXlMXr0aFxcXAgNDWXFihWX7X/Lli00b94ce3t72rVrR1xcXIXXv/nmGwIDA8vfi/T09AqvHz9+nIEDB+Lj44OzszNt2rRh7dq1FdaxtramX79+l+UWQtw4KVSEsHCjRo1i9+7dLFmyhG3btqFpGv369aO4uBgo++IeM2YMEyZMIDY2lp49e/L222/f8HG2bdtGr169Kizr3bs3W7duLX++ceNGWrdufc39FBYWEh0dzdKlSzl48CCPPPIII0aMqNAyA/Dtt99St25ddu7cyfjx43nssccYPHgwHTp0YO/evfTu3ZsRI0aQn59fYbtJkybxwQcfsGvXLry9vbn99tvL34sdO3bwwAMP8PjjjxMbG0u3bt146623Kmyfm5tLv379WLt2Lfv27aN3794MGDCA5OTkCuu1bduWTZs2Xd+bJ4S4OmXzNgshqs3f084fPXpUA7QtW7aUv3bx4kXNwcFB+/nnnzVNK5vSvX///hW2Hz58uObm5nbNff+bra2tNn/+/ArL5s+fr9nZ2ZU/HzhwoPbAAw9UWOevv/7SAC0zM/Oq59OvXz/t6aefrpChU6dO5c9LSko0JycnbcSIEeXLzp49qwHatm3bKhxnwYIF5eukp6drDg4O2k8//aRpmqbdc889Wp8+fSoce+jQoVd9L/4WERGhffLJJxWW/f7775qVlZVWWlp6zW2FENcmLSpCWLBDhw5hY2NDu3btypd5enrSuHFjDh06BMCRI0do27Zthe3+/fx66XS6Cs81TauwrKCgAHt7+2vuo7S0lLfffpuoqCg8PT1xdnZm9erVl7VYREVFlf9sbW2Np6cnkZGR5ct8fHwASEtLq7DdPy87eXh4VHgvDh06dNllqX8/z8vL49lnnyUiIgJ3d3ecnZ05fPjwZfkcHBwwGo0YDIZrnq8Q4tpsV
AcQQlQfTdOuuvzvAuLfxcS1truWevXqce7cuQrL0tLSygsGgLp165KZmXnN/Xz44YdMnz6dGTNmEBkZiZOTExMnTqSoqKjCera2thWe63S6Csv+PqfrufPmn+/Ff5k0aRKrVq3igw8+ICwsDAcHBwYNGnRZvoyMDBwdHXFwcPjPfQohrk5aVISwYBEREZSUlFTo35Gens7Ro0dp0qQJAOHh4ezcubPCdrt3777hY8XExLBmzZoKy1avXk2HDh3Kn7ds2ZKEhIRr7mfTpk0MHDiQ++67j+bNm9OgQYPLOv/ejO3bt5f/nJmZydGjRwkPDwfK3q9/vv7v9f/ON2rUKO68804iIyOpV69e+Zgw/3Tw4EFatWpVZbmFqK2kUBHCgjVs2JCBAwfy8MMPs3nzZvbv3899991H/fr1GThwIADjx49n+fLlTJs2jcTERGbNmsWKFSsua2WJjY0lNjaW3NxcLly4QGxsbIWiY8KECaxevZr33nuPw4cP895777F27doKY6707t2b+Pj4a7aqhIWFsWbNGrZu3cqhQ4d49NFHL2upuRlvvPEG69at4+DBg4waNYq6detyxx13APDEE0+wcuVKpk6dytGjR5k5cyYrV668LN+vv/5KbGws+/fv5957771iq82mTZsu61wshLhxUqgIYeHmzp1LdHQ0t912GzExMWiaxvLly8svk3Ts2JEvvviCadOm0bx5c1auXMmTTz55WV+Sli1b0rJlS/bs2cMPP/xAy5Yt6devX/nrHTp0YMGCBcydO5eoqCi++eYbfvrppwr9YyIjI2ndujU///zzVfO+/PLLtGrVit69e9O1a1fq1atXXkhUhXfffZcJEyYQHR3N2bNnWbJkCXZ2dgC0b9+eL7/8kk8++YQWLVqwevVqXnrppQrbT58+nTp16tChQwcGDBhA7969L2s5OXPmDFu3bmX06NFVlluI2kqnVeZitBDCoj388MMcPny4Wm6vXb58Oc888wwHDx7Eysoy/1aaNGkSWVlZzJ49W3UUIcyedKYVQvDBBx/Qs2dPnJycWLFiBd9++y2fffZZtRyrX79+JCYmcubMGQICAqrlGKp5e3vzzDPPqI4hhEWQFhUhBEOGDGH9+vXk5OTQoEEDxo8fz5gxY1THEkIIKVSEEEIIYbos8wKxEEIIISyCFCpCCCGEMFlSqAghhBDCZEmhIoQQQgiTJYWKEEIIIUyWFCpCCCGEMFlSqAghhBDCZEmhIoQQQgiT9X+fZkvEqg05YAAAAABJRU5ErkJggg==", - "text/plain": [ - "
    " - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], + "id": "ac654a70", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], "source": [ "import numpy as np\n", "import matplotlib.pyplot as plt\n", @@ -2887,34 +2058,228 @@ "plt.show()" ] }, + { + "cell_type": "markdown", + "id": "84ccde87", + "metadata": { + "editable": true + }, + "source": [ + "## More examples on bootstrap and cross-validation and errors" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "631a50c9", + "metadata": { + "collapsed": false, + "editable": true + }, + "outputs": [], + "source": [ + "# Common imports\n", + "import os\n", + "import numpy as np\n", + "import pandas as pd\n", + "import matplotlib.pyplot as plt\n", + "from sklearn.linear_model import LinearRegression, Ridge, Lasso\n", + "from sklearn.model_selection import train_test_split\n", + "from sklearn.utils import resample\n", + "from sklearn.metrics import mean_squared_error\n", + "# Where to save the figures and data files\n", + "PROJECT_ROOT_DIR = \"Results\"\n", + "FIGURE_ID = \"Results/FigureFiles\"\n", + "DATA_ID = \"DataFiles/\"\n", + "\n", + "if not os.path.exists(PROJECT_ROOT_DIR):\n", + " os.mkdir(PROJECT_ROOT_DIR)\n", + "\n", + "if not os.path.exists(FIGURE_ID):\n", + " os.makedirs(FIGURE_ID)\n", + "\n", + "if not os.path.exists(DATA_ID):\n", + " os.makedirs(DATA_ID)\n", + "\n", + "def image_path(fig_id):\n", + " return os.path.join(FIGURE_ID, fig_id)\n", + "\n", + "def data_path(dat_id):\n", + " return os.path.join(DATA_ID, dat_id)\n", + "\n", + "def save_fig(fig_id):\n", + " plt.savefig(image_path(fig_id) + \".png\", format='png')\n", + "\n", + "infile = open(data_path(\"EoS.csv\"),'r')\n", + "\n", + "# Read the EoS data as csv file and organize the data into two arrays with density and energies\n", + "EoS = pd.read_csv(infile, names=('Density', 'Energy'))\n", + "EoS['Energy'] = pd.to_numeric(EoS['Energy'], errors='coerce')\n", + "EoS = EoS.dropna()\n", + "Energies = EoS['Energy']\n", + "Density = EoS['Density']\n", + "# The design matrix now as function of various polytrops\n", + "\n", + "Maxpolydegree = 30\n", + "X = np.zeros((len(Density),Maxpolydegree))\n", + "X[:,0] = 1.0\n", + "testerror = np.zeros(Maxpolydegree)\n", + "trainingerror = np.zeros(Maxpolydegree)\n", + "polynomial = np.zeros(Maxpolydegree)\n", + "\n", + "trials = 100\n", + "for polydegree in range(1, Maxpolydegree):\n", + " polynomial[polydegree] = polydegree\n", + " for degree in range(polydegree):\n", + " X[:,degree] = Density**(degree/3.0)\n", + "\n", + "# loop over trials in order to estimate the expectation value of the MSE\n", + " testerror[polydegree] = 0.0\n", + " trainingerror[polydegree] = 0.0\n", + " for samples in range(trials):\n", + " x_train, x_test, y_train, y_test = train_test_split(X, Energies, test_size=0.2)\n", + " model = LinearRegression(fit_intercept=False).fit(x_train, y_train)\n", + " ypred = model.predict(x_train)\n", + " ytilde = model.predict(x_test)\n", + " testerror[polydegree] += mean_squared_error(y_test, ytilde)\n", + " trainingerror[polydegree] += mean_squared_error(y_train, ypred) \n", + "\n", + " testerror[polydegree] /= trials\n", + " trainingerror[polydegree] /= trials\n", + " print(\"Degree of polynomial: %3d\"% polynomial[polydegree])\n", + " print(\"Mean squared error on training data: %.8f\" % trainingerror[polydegree])\n", + " print(\"Mean squared error on test data: %.8f\" % testerror[polydegree])\n", + "\n", + "plt.plot(polynomial, np.log10(trainingerror), 
label='Training Error')\n", + "plt.plot(polynomial, np.log10(testerror), label='Test Error')\n", + "plt.xlabel('Polynomial degree')\n", + "plt.ylabel('log10[MSE]')\n", + "plt.legend()\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "45c7bf7f", + "metadata": { + "editable": true + }, + "source": [ + "Note that we kept the intercept column in the fitting here. This means that we need to set the **intercept** in the call to the **Scikit-Learn** function as **False**. Alternatively, we could have set up the design matrix $X$ without the first column of ones." + ] + }, + { + "cell_type": "markdown", + "id": "5d58c073", + "metadata": { + "editable": true + }, + "source": [ + "## The same example but now with cross-validation\n", + "\n", + "In this example we keep the intercept column again but add cross-validation in order to estimate the best possible value of the means squared error." + ] + }, { "cell_type": "code", - "execution_count": null, - "id": "884e5e5a", - "metadata": {}, + "execution_count": 8, + "id": "6e8fb6ba", + "metadata": { + "collapsed": false, + "editable": true + }, "outputs": [], - "source": [] + "source": [ + "# Common imports\n", + "import os\n", + "import numpy as np\n", + "import pandas as pd\n", + "import matplotlib.pyplot as plt\n", + "from sklearn.linear_model import LinearRegression, Ridge, Lasso\n", + "from sklearn.metrics import mean_squared_error\n", + "from sklearn.model_selection import KFold\n", + "from sklearn.model_selection import cross_val_score\n", + "\n", + "\n", + "# Where to save the figures and data files\n", + "PROJECT_ROOT_DIR = \"Results\"\n", + "FIGURE_ID = \"Results/FigureFiles\"\n", + "DATA_ID = \"DataFiles/\"\n", + "\n", + "if not os.path.exists(PROJECT_ROOT_DIR):\n", + " os.mkdir(PROJECT_ROOT_DIR)\n", + "\n", + "if not os.path.exists(FIGURE_ID):\n", + " os.makedirs(FIGURE_ID)\n", + "\n", + "if not os.path.exists(DATA_ID):\n", + " os.makedirs(DATA_ID)\n", + "\n", + "def image_path(fig_id):\n", + " return os.path.join(FIGURE_ID, fig_id)\n", + "\n", + "def data_path(dat_id):\n", + " return os.path.join(DATA_ID, dat_id)\n", + "\n", + "def save_fig(fig_id):\n", + " plt.savefig(image_path(fig_id) + \".png\", format='png')\n", + "\n", + "infile = open(data_path(\"EoS.csv\"),'r')\n", + "\n", + "# Read the EoS data as csv file and organize the data into two arrays with density and energies\n", + "EoS = pd.read_csv(infile, names=('Density', 'Energy'))\n", + "EoS['Energy'] = pd.to_numeric(EoS['Energy'], errors='coerce')\n", + "EoS = EoS.dropna()\n", + "Energies = EoS['Energy']\n", + "Density = EoS['Density']\n", + "# The design matrix now as function of various polytrops\n", + "\n", + "Maxpolydegree = 30\n", + "X = np.zeros((len(Density),Maxpolydegree))\n", + "X[:,0] = 1.0\n", + "estimated_mse_sklearn = np.zeros(Maxpolydegree)\n", + "polynomial = np.zeros(Maxpolydegree)\n", + "k =5\n", + "kfold = KFold(n_splits = k)\n", + "\n", + "for polydegree in range(1, Maxpolydegree):\n", + " polynomial[polydegree] = polydegree\n", + " for degree in range(polydegree):\n", + " X[:,degree] = Density**(degree/3.0)\n", + " OLS = LinearRegression(fit_intercept=False)\n", + "# loop over trials in order to estimate the expectation value of the MSE\n", + " estimated_mse_folds = cross_val_score(OLS, X, Energies, scoring='neg_mean_squared_error', cv=kfold)\n", + "#[:, np.newaxis]\n", + " estimated_mse_sklearn[polydegree] = np.mean(-estimated_mse_folds)\n", + "\n", + "plt.plot(polynomial, np.log10(estimated_mse_sklearn), label='Test Error')\n", + 
"plt.xlabel('Polynomial degree')\n", + "plt.ylabel('log10[MSE]')\n", + "plt.legend()\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "0c13445c", + "metadata": { + "editable": true + }, + "source": [ + "## Material for the lab sessions\n", + "\n", + "This week we will discuss during the first hour of each lab session\n", + "some technicalities related to the project and methods for updating\n", + "the learning like ADAgrad, RMSprop and ADAM. As teaching material, see\n", + "the jupyter-notebook from week 37 (September 12-16).\n", + "\n", + "For the lab session, the following video on cross validation (from 2024), could be helpful, see \n", + "\n", + "See also video on ADAgrad, RMSprop and ADAM (material from last week not covered during lecture) at " + ] } ], - "metadata": { - "kernelspec": { - "display_name": "Python 3 (ipykernel)", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.9.18" - } - }, + "metadata": {}, "nbformat": 4, "nbformat_minor": 5 } diff --git a/doc/pub/week39/html/._week39-bs000.html b/doc/pub/week39/html/._week39-bs000.html new file mode 100644 index 000000000..85a342f93 --- /dev/null +++ b/doc/pub/week39/html/._week39-bs000.html @@ -0,0 +1,313 @@ + + + + + + + +Week 39: Resampling methods and logistic regression + + + + + + + + + + + + + + + + + + + +

+Week 39: Resampling methods and logistic regression
+Morten Hjorth-Jensen
+Department of Physics, University of Oslo
+Week 39
+© 1999-2025, Morten Hjorth-Jensen. Released under CC Attribution-NonCommercial 4.0 license
    + + + diff --git a/doc/pub/week39/html/._week39-bs001.html b/doc/pub/week39/html/._week39-bs001.html new file mode 100644 index 000000000..02a5393e5 --- /dev/null +++ b/doc/pub/week39/html/._week39-bs001.html @@ -0,0 +1,304 @@ + + + + + + + +Week 39: Resampling methods and logistic regression + + + + + + + + + + + + + + + + + + + +
    +
    +

     

     

     

    + + +

    Plan for week 39, September 22-26, 2025

    + +
    +
    + +
      +
1. Resampling techniques: the bootstrap, cross-validation and the bias-variance tradeoff
    2. +
    3. Logistic regression, our first classification encounter and a stepping stone towards neural networks
    4. +
    5. Video of lecture
    6. +
    7. Whiteboard notes
    8. +
    +
    +
    + + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs002.html b/doc/pub/week39/html/._week39-bs002.html new file mode 100644 index 000000000..fc1a16a34 --- /dev/null +++ b/doc/pub/week39/html/._week39-bs002.html @@ -0,0 +1,305 @@ + + + + + + + +Week 39: Resampling methods and logistic regression + + + + + + + + + + + + + + + + + + + +

    +
    +

     

     

     

    + + +

    Readings and Videos, resampling methods

    +
    +
    + +
      +
    1. Raschka et al, pages 175-192
    2. +
    3. Hastie et al Chapter 7, here we recommend 7.1-7.5 and 7.10 (cross-validation) and 7.11 (bootstrap). See https://link.springer.com/book/10.1007/978-0-387-84858-7.
    4. +
    5. Video on bias-variance tradeoff
    6. +
    7. Video on Bootstrapping
    8. +
    9. Video on cross validation
    10. +
    +
    +
    + + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs003.html b/doc/pub/week39/html/._week39-bs003.html new file mode 100644 index 000000000..c849cd54d --- /dev/null +++ b/doc/pub/week39/html/._week39-bs003.html @@ -0,0 +1,305 @@ + + + + + + + +Week 39: Resampling methods and logistic regression + + + + + + + + + + + + + + + + + + + +
    +
    +

     

     

     

    + + +

    Readings and Videos, logistic regression

    +
    +
    + +
      +
    1. Hastie et al 4.1, 4.2 and 4.3 on logistic regression
    2. +
    3. Raschka et al, pages 53-76 on Logistic regression and pages 37-52 on gradient optimization
    4. +
    5. Video on Logistic regression
    6. +
    7. Yet another video on logistic regression
    8. +
    +
    +
    + + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs004.html b/doc/pub/week39/html/._week39-bs004.html new file mode 100644 index 000000000..3b5cafe9a --- /dev/null +++ b/doc/pub/week39/html/._week39-bs004.html @@ -0,0 +1,308 @@ + + + + + + + +Week 39: Resampling methods and logistic regression + + + + + + + + + + + + + + + + + + + +
    +
    +

     

     

     

    + + +

    Lab sessions week 39

    + +
    +
    + +
      +
    1. Discussions on how to structure your report for the first project
    2. +
    3. Exercise for week 39 on how to write the abstract and the introduction of the report and how to include references.
    4. +
    5. Work on project 1, in particular resampling methods like cross-validation and bootstrap. For more discussions of project 1, chapter 5 of Goodfellow et al is a good read, in particular sections 5.1-5.5 and 5.7-5.11.
    6. +
    7. Video on how to write scientific reports recorded during one of the lab sessions
    8. +
    9. A general guideline can be found at https://github.com/CompPhysics/MachineLearning/blob/master/doc/Projects/EvaluationGrading/EvaluationForm.md.
    10. +
    +
    +
    + + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs005.html b/doc/pub/week39/html/._week39-bs005.html new file mode 100644 index 000000000..758e7f551 --- /dev/null +++ b/doc/pub/week39/html/._week39-bs005.html @@ -0,0 +1,295 @@ + + + + + + + +Week 39: Resampling methods and logistic regression + + + + + + + + + + + + + + + + + + + +
    +
    +

     

     

     

    + + +

    Lecture material

    + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs006.html b/doc/pub/week39/html/._week39-bs006.html new file mode 100644 index 000000000..ba36aeeaf --- /dev/null +++ b/doc/pub/week39/html/._week39-bs006.html @@ -0,0 +1,320 @@ + + + + + + + +Week 39: Resampling methods and logistic regression + + + + + + + + + + + + + + + + + + + +
    +
    +

     

     

     

    + + +

    Resampling methods

    +
    +
    + +

    Resampling methods are an indispensable tool in modern +statistics. They involve repeatedly drawing samples from a training +set and refitting a model of interest on each sample in order to +obtain additional information about the fitted model. For example, in +order to estimate the variability of a linear regression fit, we can +repeatedly draw different samples from the training data, fit a linear +regression to each new sample, and then examine the extent to which +the resulting fits differ. Such an approach may allow us to obtain +information that would not be available from fitting the model only +once using the original training sample. +

    + +

    Two resampling methods are often used in Machine Learning analyses,

    +
      +
    1. The bootstrap method
    2. +
    3. and Cross-Validation
    4. +
    +

In addition there are several other methods such as the Jackknife and the Blocking methods. This week we will repeat some of the elements of the bootstrap method and focus more on cross-validation.

    +
    +
    + + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs007.html b/doc/pub/week39/html/._week39-bs007.html new file mode 100644 index 000000000..b69069758 --- /dev/null +++ b/doc/pub/week39/html/._week39-bs007.html @@ -0,0 +1,320 @@ + + + + + + + +Week 39: Resampling methods and logistic regression + + + + + + + + + + + + + + + + + + + +
    +
    +

     

     

     

    + + +

    Resampling approaches can be computationally expensive

    +
    +
    + + +

    Resampling approaches can be computationally expensive, because they +involve fitting the same statistical method multiple times using +different subsets of the training data. However, due to recent +advances in computing power, the computational requirements of +resampling methods generally are not prohibitive. In this chapter, we +discuss two of the most commonly used resampling methods, +cross-validation and the bootstrap. Both methods are important tools +in the practical application of many statistical learning +procedures. For example, cross-validation can be used to estimate the +test error associated with a given statistical learning method in +order to evaluate its performance, or to select the appropriate level +of flexibility. The process of evaluating a model’s performance is +known as model assessment, whereas the process of selecting the proper +level of flexibility for a model is known as model selection. The +bootstrap is widely used. +

    +
    +
    + + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs008.html b/doc/pub/week39/html/._week39-bs008.html new file mode 100644 index 000000000..3cd11c9d3 --- /dev/null +++ b/doc/pub/week39/html/._week39-bs008.html @@ -0,0 +1,310 @@ + + + + + + + +Week 39: Resampling methods and logistic regression + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    Why resampling methods ?

    +
    +
    + + +
      +
    • Our simulations can be treated as computer experiments. This is particularly the case for Monte Carlo methods which are widely used in statistical analyses.
    • +
    • The results can be analysed with the same statistical tools as we would use when analysing experimental data.
    • +
    • As in all experiments, we are looking for expectation values and an estimate of how accurate they are, i.e., possible sources for errors.
    • +
    +
    +
    + + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs009.html b/doc/pub/week39/html/._week39-bs009.html new file mode 100644 index 000000000..39760f974 --- /dev/null +++ b/doc/pub/week39/html/._week39-bs009.html @@ -0,0 +1,315 @@ + + + + + + + +Week 39: Resampling methods and logistic regression + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    Statistical analysis

    +
    +
    + + +
      +
    • As in other experiments, many numerical experiments have two classes of errors:
    • +
        +
      • Statistical errors
      • +
• Systematic errors
      • +
      +
    • Statistical errors can be estimated using standard tools from statistics
    • +
• Systematic errors are method specific and must be treated differently from case to case.
    • +
    +
    +
    + + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs010.html b/doc/pub/week39/html/._week39-bs010.html new file mode 100644 index 000000000..d06eaf79a --- /dev/null +++ b/doc/pub/week39/html/._week39-bs010.html @@ -0,0 +1,323 @@ + + + + + + + +Week 39: Resampling methods and logistic regression + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    Resampling methods

    + +

    With all these analytical equations for both the OLS and Ridge +regression, we will now outline how to assess a given model. This will +lead to a discussion of the so-called bias-variance tradeoff (see +below) and so-called resampling methods. +

    + +

    One of the quantities we have discussed as a way to measure errors is +the mean-squared error (MSE), mainly used for fitting of continuous +functions. Another choice is the absolute error. +

    + +

In the discussion below we will focus on the MSE. In particular, since we will split the data into test and training data, we discuss the

    +
      +
    1. prediction error or simply the test error \( \mathrm{Err_{Test}} \), where we have a fixed training set and the test error is the MSE arising from the data reserved for testing. We discuss also the
    2. +
    3. training error \( \mathrm{Err_{Train}} \), which is the average loss over the training data.
    4. +
    +

As our model becomes more and more complex, more of the training data tends to be used. The model may then adapt to more complicated structures in the data. This may lead to a decrease in the bias (see below for a code example) and a slight increase of the variance for the test error. For a certain level of complexity the test error will reach a minimum, before starting to increase again. The training error reaches a saturation.

    + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs011.html b/doc/pub/week39/html/._week39-bs011.html new file mode 100644 index 000000000..c3b9250f3 --- /dev/null +++ b/doc/pub/week39/html/._week39-bs011.html @@ -0,0 +1,319 @@ + + + + + + + +Week 39: Resampling methods and logistic regression + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    Resampling methods: Bootstrap

    +
    +
    + +

    Bootstrapping is a non-parametric approach to statistical inference +that substitutes computation for more traditional distributional +assumptions and asymptotic results. Bootstrapping offers a number of +advantages: +

    +
      +
    1. The bootstrap is quite general, although there are some cases in which it fails.
    2. +
    3. Because it does not require distributional assumptions (such as normally distributed errors), the bootstrap can provide more accurate inferences when the data are not well behaved or when the sample size is small.
    4. +
    5. It is possible to apply the bootstrap to statistics with sampling distributions that are difficult to derive, even asymptotically.
    6. +
    7. It is relatively simple to apply the bootstrap to complex data-collection plans (such as stratified and clustered samples).
    8. +
    +
    +
    + + +

    The textbook by Davison on the Bootstrap Methods and their Applications provides many more insights and proofs. In this course we will take a more practical approach and use the results and theorems provided in the literature. For those interested in reading more about the bootstrap methods, we recommend the above text and the one by Efron and Tibshirani.
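As a complement to the discussion above, the following lines give a minimal sketch (an addition, not part of the original notes) of the simplest non-parametric bootstrap: resampling a data set with replacement in order to estimate the uncertainty of a statistic. The synthetic Gaussian data and the number of bootstrap samples are arbitrary choices for illustration.

import numpy as np

rng = np.random.default_rng(2025)
data = rng.normal(loc=2.0, scale=1.0, size=200)   # synthetic sample, for illustration only

n_boot = 1000
boot_means = np.empty(n_boot)
for b in range(n_boot):
    # draw a resample of the same size as the data, with replacement
    sample = rng.choice(data, size=data.size, replace=True)
    boot_means[b] = sample.mean()

# bootstrap estimate of the standard error and a 95% percentile interval for the mean
print("sample mean:", data.mean())
print("bootstrap standard error:", boot_means.std())
print("95% percentile interval:", np.percentile(boot_means, [2.5, 97.5]))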

    + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs012.html b/doc/pub/week39/html/._week39-bs012.html new file mode 100644 index 000000000..4b431ef09 --- /dev/null +++ b/doc/pub/week39/html/._week39-bs012.html @@ -0,0 +1,359 @@ + + + + + + + +Week 39: Resampling methods and logistic regression + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    The bias-variance tradeoff

    + +

    We will discuss the bias-variance tradeoff in the context of +continuous predictions such as regression. However, many of the +intuitions and ideas discussed here also carry over to classification +tasks. Consider a dataset \( \mathcal{D} \) consisting of the data +\( \mathbf{X}_\mathcal{D}=\{(y_j, \boldsymbol{x}_j), j=0\ldots n-1\} \). +

    + +

    Let us assume that the true data is generated from a noisy model

    + +$$ +\boldsymbol{y}=f(\boldsymbol{x}) + \boldsymbol{\epsilon} +$$ + +

where \( \epsilon \) is normally distributed with mean zero and variance \( \sigma^2 \).

    + +

    In our derivation of the ordinary least squares method we defined then +an approximation to the function \( f \) in terms of the parameters +\( \boldsymbol{\theta} \) and the design matrix \( \boldsymbol{X} \) which embody our model, +that is \( \boldsymbol{\tilde{y}}=\boldsymbol{X}\boldsymbol{\theta} \). +

    + +

Thereafter we found the parameters \( \boldsymbol{\theta} \) by optimizing the mean squared error via the so-called cost function

    +$$ +C(\boldsymbol{X},\boldsymbol{\theta}) =\frac{1}{n}\sum_{i=0}^{n-1}(y_i-\tilde{y}_i)^2=\mathbb{E}\left[(\boldsymbol{y}-\boldsymbol{\tilde{y}})^2\right]. +$$ + +

    We can rewrite this as

    +$$ +\mathbb{E}\left[(\boldsymbol{y}-\boldsymbol{\tilde{y}})^2\right]=\frac{1}{n}\sum_i(f_i-\mathbb{E}\left[\boldsymbol{\tilde{y}}\right])^2+\frac{1}{n}\sum_i(\tilde{y}_i-\mathbb{E}\left[\boldsymbol{\tilde{y}}\right])^2+\sigma^2. +$$ + +

The first term represents the square of the bias of the learning method, which can be thought of as the error caused by the simplifying assumptions built into the method. The second term represents the variance of the chosen model and finally the last term is the variance of the error \( \boldsymbol{\epsilon} \).

    + +

To derive this equation, we need to recall that the variances of \( \boldsymbol{y} \) and \( \boldsymbol{\epsilon} \) are both equal to \( \sigma^2 \). The mean value of \( \boldsymbol{\epsilon} \) is by definition equal to zero. Furthermore, the function \( f \) is not a stochastic variable. We use a more compact notation in terms of the expectation value

    +$$ +\mathbb{E}\left[(\boldsymbol{y}-\boldsymbol{\tilde{y}})^2\right]=\mathbb{E}\left[(\boldsymbol{f}+\boldsymbol{\epsilon}-\boldsymbol{\tilde{y}})^2\right], +$$ + +

    and adding and subtracting \( \mathbb{E}\left[\boldsymbol{\tilde{y}}\right] \) we get

    +$$ +\mathbb{E}\left[(\boldsymbol{y}-\boldsymbol{\tilde{y}})^2\right]=\mathbb{E}\left[(\boldsymbol{f}+\boldsymbol{\epsilon}-\boldsymbol{\tilde{y}}+\mathbb{E}\left[\boldsymbol{\tilde{y}}\right]-\mathbb{E}\left[\boldsymbol{\tilde{y}}\right])^2\right], +$$ + +

    which, using the abovementioned expectation values can be rewritten as

    +$$ +\mathbb{E}\left[(\boldsymbol{y}-\boldsymbol{\tilde{y}})^2\right]=\mathbb{E}\left[(\boldsymbol{y}-\mathbb{E}\left[\boldsymbol{\tilde{y}}\right])^2\right]+\mathrm{Var}\left[\boldsymbol{\tilde{y}}\right]+\sigma^2, +$$ + +

    that is the rewriting in terms of the so-called bias, the variance of the model \( \boldsymbol{\tilde{y}} \) and the variance of \( \boldsymbol{\epsilon} \).

    + +Note that in order to derive these equations we have assumed we can replace the unknown function \( \boldsymbol{f} \) with the target/output data \( \boldsymbol{y} \). + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs013.html b/doc/pub/week39/html/._week39-bs013.html new file mode 100644 index 000000000..397bbd422 --- /dev/null +++ b/doc/pub/week39/html/._week39-bs013.html @@ -0,0 +1,306 @@ + + + + + + + +Week 39: Resampling methods and logistic regression + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    A way to Read the Bias-Variance Tradeoff

    + +

    +
    +

    +
    +

    + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs014.html b/doc/pub/week39/html/._week39-bs014.html new file mode 100644 index 000000000..fb75736a4 --- /dev/null +++ b/doc/pub/week39/html/._week39-bs014.html @@ -0,0 +1,368 @@ + + + + + + + +Week 39: Resampling methods and logistic regression + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    Understanding what happens

    + + +
    +
    +
    +
    +
    +
    import matplotlib.pyplot as plt
    +import numpy as np
    +from sklearn.linear_model import LinearRegression, Ridge, Lasso
    +from sklearn.preprocessing import PolynomialFeatures
    +from sklearn.model_selection import train_test_split
    +from sklearn.pipeline import make_pipeline
    +from sklearn.utils import resample
    +
    +np.random.seed(2018)
    +
    +n = 40
    +n_boostraps = 100
    +maxdegree = 14
    +
    +
    +# Make data set.
    +x = np.linspace(-3, 3, n).reshape(-1, 1)
    +y = np.exp(-x**2) + 1.5 * np.exp(-(x-2)**2)+ np.random.normal(0, 0.1, x.shape)
    +error = np.zeros(maxdegree)
    +bias = np.zeros(maxdegree)
    +variance = np.zeros(maxdegree)
    +polydegree = np.zeros(maxdegree)
    +x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)
    +
    +for degree in range(maxdegree):
    +    model = make_pipeline(PolynomialFeatures(degree=degree), LinearRegression(fit_intercept=False))
    +    y_pred = np.empty((y_test.shape[0], n_boostraps))
    +    for i in range(n_boostraps):
    +        x_, y_ = resample(x_train, y_train)
    +        y_pred[:, i] = model.fit(x_, y_).predict(x_test).ravel()
    +
    +    polydegree[degree] = degree
    +    error[degree] = np.mean( np.mean((y_test - y_pred)**2, axis=1, keepdims=True) )
    +    bias[degree] = np.mean( (y_test - np.mean(y_pred, axis=1, keepdims=True))**2 )
    +    variance[degree] = np.mean( np.var(y_pred, axis=1, keepdims=True) )
    +    print('Polynomial degree:', degree)
    +    print('Error:', error[degree])
    +    print('Bias^2:', bias[degree])
    +    print('Var:', variance[degree])
    +    print('{} >= {} + {} = {}'.format(error[degree], bias[degree], variance[degree], bias[degree]+variance[degree]))
    +
    +plt.plot(polydegree, error, label='Error')
    +plt.plot(polydegree, bias, label='bias')
    +plt.plot(polydegree, variance, label='Variance')
    +plt.legend()
    +plt.show()
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    + + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs015.html b/doc/pub/week39/html/._week39-bs015.html new file mode 100644 index 000000000..bf900503b --- /dev/null +++ b/doc/pub/week39/html/._week39-bs015.html @@ -0,0 +1,331 @@ + + + + + + + +Week 39: Resampling methods and logistic regression + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    Summing up

    + +

    The bias-variance tradeoff summarizes the fundamental tension in +machine learning, particularly supervised learning, between the +complexity of a model and the amount of training data needed to train +it. Since data is often limited, in practice it is often useful to +use a less-complex model with higher bias, that is a model whose asymptotic +performance is worse than another model because it is easier to +train and less sensitive to sampling noise arising from having a +finite-sized training dataset (smaller variance). +

    + +

    The above equations tell us that in +order to minimize the expected test error, we need to select a +statistical learning method that simultaneously achieves low variance +and low bias. Note that variance is inherently a nonnegative quantity, +and squared bias is also nonnegative. Hence, we see that the expected +test MSE can never lie below \( Var(\epsilon) \), the irreducible error. +

    + +

    What do we mean by the variance and bias of a statistical learning +method? The variance refers to the amount by which our model would change if we +estimated it using a different training data set. Since the training +data are used to fit the statistical learning method, different +training data sets will result in a different estimate. But ideally the +estimate for our model should not vary too much between training +sets. However, if a method has high variance then small changes in +the training data can result in large changes in the model. In general, more +flexible statistical methods have higher variance. +

    + +

    You may also find this recent article of interest.

    + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs016.html b/doc/pub/week39/html/._week39-bs016.html new file mode 100644 index 000000000..0830eaab2 --- /dev/null +++ b/doc/pub/week39/html/._week39-bs016.html @@ -0,0 +1,389 @@ + + + + + + + +Week 39: Resampling methods and logistic regression + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    Another Example from Scikit-Learn's Repository

    + +

This example demonstrates the problems of underfitting and overfitting and how we can use linear regression with polynomial features to approximate nonlinear functions. The plot shows the function that we want to approximate, which is a part of the cosine function. In addition, the samples from the real function and the approximations of different models are displayed. The models have polynomial features of different degrees. We can see that a linear function (polynomial with degree 1) is not sufficient to fit the training samples. This is called underfitting. A polynomial of degree 4 approximates the true function almost perfectly. However, for higher degrees the model will overfit the training data, i.e. it learns the noise of the training data. We evaluate overfitting and underfitting quantitatively by using cross-validation. We calculate the mean squared error (MSE) on the validation set; the higher it is, the less likely it is that the model generalizes correctly from the training data.

    + + + +
    +
    +
    +
    +
    +
    #print(__doc__)
    +
    +import numpy as np
    +import matplotlib.pyplot as plt
    +from sklearn.pipeline import Pipeline
    +from sklearn.preprocessing import PolynomialFeatures
    +from sklearn.linear_model import LinearRegression
    +from sklearn.model_selection import cross_val_score
    +
    +
    +def true_fun(X):
    +    return np.cos(1.5 * np.pi * X)
    +
    +np.random.seed(0)
    +
    +n_samples = 30
    +degrees = [1, 4, 15]
    +
    +X = np.sort(np.random.rand(n_samples))
    +y = true_fun(X) + np.random.randn(n_samples) * 0.1
    +
    +plt.figure(figsize=(14, 5))
    +for i in range(len(degrees)):
    +    ax = plt.subplot(1, len(degrees), i + 1)
    +    plt.setp(ax, xticks=(), yticks=())
    +
    +    polynomial_features = PolynomialFeatures(degree=degrees[i],
    +                                             include_bias=False)
    +    linear_regression = LinearRegression()
    +    pipeline = Pipeline([("polynomial_features", polynomial_features),
    +                         ("linear_regression", linear_regression)])
    +    pipeline.fit(X[:, np.newaxis], y)
    +
    +    # Evaluate the models using crossvalidation
    +    scores = cross_val_score(pipeline, X[:, np.newaxis], y,
    +                             scoring="neg_mean_squared_error", cv=10)
    +
    +    X_test = np.linspace(0, 1, 100)
    +    plt.plot(X_test, pipeline.predict(X_test[:, np.newaxis]), label="Model")
    +    plt.plot(X_test, true_fun(X_test), label="True function")
    +    plt.scatter(X, y, edgecolor='b', s=20, label="Samples")
    +    plt.xlabel("x")
    +    plt.ylabel("y")
    +    plt.xlim((0, 1))
    +    plt.ylim((-2, 2))
    +    plt.legend(loc="best")
    +    plt.title("Degree {}\nMSE = {:.2e}(+/- {:.2e})".format(
    +        degrees[i], -scores.mean(), scores.std()))
    +plt.show()
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    + + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs017.html b/doc/pub/week39/html/._week39-bs017.html new file mode 100644 index 000000000..82e5eaa7a --- /dev/null +++ b/doc/pub/week39/html/._week39-bs017.html @@ -0,0 +1,316 @@ + + + + + + + +Week 39: Resampling methods and logistic regression + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    Various steps in cross-validation

    + +

When the repetitive splitting of the data set is done randomly, samples may accidentally end up in a vast majority of the splits in either the training or the test set. Such samples may have an unbalanced influence on either model building or prediction evaluation. To avoid this, \( k \)-fold cross-validation structures the data splitting. The samples are divided into \( k \) more or less equally sized, exhaustive and mutually exclusive subsets. In turn (at each split) one of these subsets plays the role of the test set while the union of the remaining subsets constitutes the training set. Such a splitting warrants a balanced representation of each sample in both training and test set over the splits. Still, the division into the \( k \) subsets involves a degree of randomness. This may be fully excluded when choosing \( k=n \). This particular case is referred to as leave-one-out cross-validation (LOOCV).
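As a small illustration of the last point, the sketch below (an addition, assuming Scikit-Learn is available) estimates the MSE with LOOCV by passing LeaveOneOut() as the cv argument to cross_val_score; the synthetic data and the polynomial degree are arbitrary choices.

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.preprocessing import PolynomialFeatures

np.random.seed(3155)
n = 30
x = np.random.randn(n)
y = 3*x**2 + np.random.randn(n)

# polynomial design matrix, degree chosen arbitrarily for illustration
X = PolynomialFeatures(degree=2).fit_transform(x[:, np.newaxis])

# LeaveOneOut() is equivalent to KFold(n_splits=n): each sample is the test set exactly once
scores = cross_val_score(LinearRegression(), X, y,
                         scoring='neg_mean_squared_error', cv=LeaveOneOut())
print("LOOCV estimate of the MSE:", np.mean(-scores))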

    + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs018.html b/doc/pub/week39/html/._week39-bs018.html new file mode 100644 index 000000000..a4fd4bad7 --- /dev/null +++ b/doc/pub/week39/html/._week39-bs018.html @@ -0,0 +1,314 @@ + + + + + + + +Week 39: Resampling methods and logistic regression + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    Cross-validation in brief

    + +

    For the various values of \( k \)

    + +
      +
    1. shuffle the dataset randomly.
    2. +
    3. Split the dataset into \( k \) groups.
    4. +
    5. For each unique group: +
        +
      1. Decide which group to use as set for test data
      2. +
      3. Take the remaining groups as a training data set
      4. +
      5. Fit a model on the training set and evaluate it on the test set
      6. +
      7. Retain the evaluation score and discard the model
      8. +
      +
    6. Summarize the model using the sample of model evaluation scores
    7. +
    +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs019.html b/doc/pub/week39/html/._week39-bs019.html new file mode 100644 index 000000000..bc6713f40 --- /dev/null +++ b/doc/pub/week39/html/._week39-bs019.html @@ -0,0 +1,413 @@ + + + + + + + +Week 39: Resampling methods and logistic regression + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    Code Example for Cross-validation and \( k \)-fold Cross-validation

    + +

    The code here uses Ridge regression with cross-validation (CV) resampling and \( k \)-fold CV in order to fit a specific polynomial.

    + + +
    +
    +
    +
    +
    +
    import numpy as np
    +import matplotlib.pyplot as plt
    +from sklearn.model_selection import KFold
    +from sklearn.linear_model import Ridge
    +from sklearn.model_selection import cross_val_score
    +from sklearn.preprocessing import PolynomialFeatures
    +
    +# A seed just to ensure that the random numbers are the same for every run.
    +# Useful for eventual debugging.
    +np.random.seed(3155)
    +
    +# Generate the data.
    +nsamples = 100
    +x = np.random.randn(nsamples)
    +y = 3*x**2 + np.random.randn(nsamples)
    +
    +## Cross-validation on Ridge regression using KFold only
    +
    +# Decide degree on polynomial to fit
    +poly = PolynomialFeatures(degree = 6)
    +
    +# Decide which values of lambda to use
    +nlambdas = 500
    +lambdas = np.logspace(-3, 5, nlambdas)
    +
    +# Initialize a KFold instance
    +k = 5
    +kfold = KFold(n_splits = k)
    +
    +# Perform the cross-validation to estimate MSE
    +scores_KFold = np.zeros((nlambdas, k))
    +
    +i = 0
    +for lmb in lambdas:
    +    ridge = Ridge(alpha = lmb)
    +    j = 0
    +    for train_inds, test_inds in kfold.split(x):
    +        xtrain = x[train_inds]
    +        ytrain = y[train_inds]
    +
    +        xtest = x[test_inds]
    +        ytest = y[test_inds]
    +
    +        Xtrain = poly.fit_transform(xtrain[:, np.newaxis])
    +        ridge.fit(Xtrain, ytrain[:, np.newaxis])
    +
    +        Xtest = poly.fit_transform(xtest[:, np.newaxis])
    +        ypred = ridge.predict(Xtest)
    +
    +        scores_KFold[i,j] = np.sum((ypred - ytest[:, np.newaxis])**2)/np.size(ypred)
    +
    +        j += 1
    +    i += 1
    +
    +
    +estimated_mse_KFold = np.mean(scores_KFold, axis = 1)
    +
    +## Cross-validation using cross_val_score from sklearn along with KFold
    +
    +# kfold is an instance initialized above as:
    +# kfold = KFold(n_splits = k)
    +
    +estimated_mse_sklearn = np.zeros(nlambdas)
    +i = 0
    +for lmb in lambdas:
    +    ridge = Ridge(alpha = lmb)
    +
    +    X = poly.fit_transform(x[:, np.newaxis])
    +    estimated_mse_folds = cross_val_score(ridge, X, y[:, np.newaxis], scoring='neg_mean_squared_error', cv=kfold)
    +
+    # cross_val_score returns an array containing the estimated negative MSE for every fold.
+    # We have to take the mean of these values in order to get an estimate of the MSE of the model.
    +    estimated_mse_sklearn[i] = np.mean(-estimated_mse_folds)
    +
    +    i += 1
    +
    +## Plot and compare the slightly different ways to perform cross-validation
    +
    +plt.figure()
    +
    +plt.plot(np.log10(lambdas), estimated_mse_sklearn, label = 'cross_val_score')
    +#plt.plot(np.log10(lambdas), estimated_mse_KFold, 'r--', label = 'KFold')
    +
    +plt.xlabel('log10(lambda)')
    +plt.ylabel('mse')
    +
    +plt.legend()
    +
    +plt.show()
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    + + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs020.html b/doc/pub/week39/html/._week39-bs020.html new file mode 100644 index 000000000..6ca08c43a --- /dev/null +++ b/doc/pub/week39/html/._week39-bs020.html @@ -0,0 +1,402 @@ + + + + + + + +Week 39: Resampling methods and logistic regression + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    More examples on bootstrap and cross-validation and errors

    + + + +
    +
    +
    +
    +
    +
    # Common imports
    +import os
    +import numpy as np
    +import pandas as pd
    +import matplotlib.pyplot as plt
    +from sklearn.linear_model import LinearRegression, Ridge, Lasso
    +from sklearn.model_selection import train_test_split
    +from sklearn.utils import resample
    +from sklearn.metrics import mean_squared_error
    +# Where to save the figures and data files
    +PROJECT_ROOT_DIR = "Results"
    +FIGURE_ID = "Results/FigureFiles"
    +DATA_ID = "DataFiles/"
    +
    +if not os.path.exists(PROJECT_ROOT_DIR):
    +    os.mkdir(PROJECT_ROOT_DIR)
    +
    +if not os.path.exists(FIGURE_ID):
    +    os.makedirs(FIGURE_ID)
    +
    +if not os.path.exists(DATA_ID):
    +    os.makedirs(DATA_ID)
    +
    +def image_path(fig_id):
    +    return os.path.join(FIGURE_ID, fig_id)
    +
    +def data_path(dat_id):
    +    return os.path.join(DATA_ID, dat_id)
    +
    +def save_fig(fig_id):
    +    plt.savefig(image_path(fig_id) + ".png", format='png')
    +
    +infile = open(data_path("EoS.csv"),'r')
    +
    +# Read the EoS data as  csv file and organize the data into two arrays with density and energies
    +EoS = pd.read_csv(infile, names=('Density', 'Energy'))
    +EoS['Energy'] = pd.to_numeric(EoS['Energy'], errors='coerce')
    +EoS = EoS.dropna()
    +Energies = EoS['Energy']
    +Density = EoS['Density']
    +#  The design matrix now as function of various polytrops
    +
    +Maxpolydegree = 30
    +X = np.zeros((len(Density),Maxpolydegree))
    +X[:,0] = 1.0
    +testerror = np.zeros(Maxpolydegree)
    +trainingerror = np.zeros(Maxpolydegree)
    +polynomial = np.zeros(Maxpolydegree)
    +
    +trials = 100
    +for polydegree in range(1, Maxpolydegree):
    +    polynomial[polydegree] = polydegree
    +    for degree in range(polydegree):
    +        X[:,degree] = Density**(degree/3.0)
    +
    +# loop over trials in order to estimate the expectation value of the MSE
    +    testerror[polydegree] = 0.0
    +    trainingerror[polydegree] = 0.0
    +    for samples in range(trials):
    +        x_train, x_test, y_train, y_test = train_test_split(X, Energies, test_size=0.2)
    +        model = LinearRegression(fit_intercept=False).fit(x_train, y_train)
    +        ypred = model.predict(x_train)
    +        ytilde = model.predict(x_test)
    +        testerror[polydegree] += mean_squared_error(y_test, ytilde)
    +        trainingerror[polydegree] += mean_squared_error(y_train, ypred) 
    +
    +    testerror[polydegree] /= trials
    +    trainingerror[polydegree] /= trials
    +    print("Degree of polynomial: %3d"% polynomial[polydegree])
    +    print("Mean squared error on training data: %.8f" % trainingerror[polydegree])
    +    print("Mean squared error on test data: %.8f" % testerror[polydegree])
    +
    +plt.plot(polynomial, np.log10(trainingerror), label='Training Error')
    +plt.plot(polynomial, np.log10(testerror), label='Test Error')
    +plt.xlabel('Polynomial degree')
    +plt.ylabel('log10[MSE]')
    +plt.legend()
    +plt.show()
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    + +

    Note that we kept the intercept column in the fitting here. This means that we need to set the intercept in the call to the Scikit-Learn function as False. Alternatively, we could have set up the design matrix \( X \) without the first column of ones.

    + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs021.html b/doc/pub/week39/html/._week39-bs021.html new file mode 100644 index 000000000..6960fea7a --- /dev/null +++ b/doc/pub/week39/html/._week39-bs021.html @@ -0,0 +1,391 @@ + + + + + + + +Week 39: Resampling methods and logistic regression + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    The same example but now with cross-validation

    + +

In this example we keep the intercept column again but add cross-validation in order to estimate the best possible value of the mean squared error.

    + + +
    +
    +
    +
    +
    +
    # Common imports
    +import os
    +import numpy as np
    +import pandas as pd
    +import matplotlib.pyplot as plt
    +from sklearn.linear_model import LinearRegression, Ridge, Lasso
    +from sklearn.metrics import mean_squared_error
    +from sklearn.model_selection import KFold
    +from sklearn.model_selection import cross_val_score
    +
    +
    +# Where to save the figures and data files
    +PROJECT_ROOT_DIR = "Results"
    +FIGURE_ID = "Results/FigureFiles"
    +DATA_ID = "DataFiles/"
    +
    +if not os.path.exists(PROJECT_ROOT_DIR):
    +    os.mkdir(PROJECT_ROOT_DIR)
    +
    +if not os.path.exists(FIGURE_ID):
    +    os.makedirs(FIGURE_ID)
    +
    +if not os.path.exists(DATA_ID):
    +    os.makedirs(DATA_ID)
    +
    +def image_path(fig_id):
    +    return os.path.join(FIGURE_ID, fig_id)
    +
    +def data_path(dat_id):
    +    return os.path.join(DATA_ID, dat_id)
    +
    +def save_fig(fig_id):
    +    plt.savefig(image_path(fig_id) + ".png", format='png')
    +
    +infile = open(data_path("EoS.csv"),'r')
    +
    +# Read the EoS data as  csv file and organize the data into two arrays with density and energies
    +EoS = pd.read_csv(infile, names=('Density', 'Energy'))
    +EoS['Energy'] = pd.to_numeric(EoS['Energy'], errors='coerce')
    +EoS = EoS.dropna()
    +Energies = EoS['Energy']
    +Density = EoS['Density']
    +#  The design matrix now as function of various polytrops
    +
    +Maxpolydegree = 30
    +X = np.zeros((len(Density),Maxpolydegree))
    +X[:,0] = 1.0
    +estimated_mse_sklearn = np.zeros(Maxpolydegree)
    +polynomial = np.zeros(Maxpolydegree)
    +k =5
    +kfold = KFold(n_splits = k)
    +
    +for polydegree in range(1, Maxpolydegree):
    +    polynomial[polydegree] = polydegree
    +    for degree in range(polydegree):
    +        X[:,degree] = Density**(degree/3.0)
    +        OLS = LinearRegression(fit_intercept=False)
    +# loop over trials in order to estimate the expectation value of the MSE
    +    estimated_mse_folds = cross_val_score(OLS, X, Energies, scoring='neg_mean_squared_error', cv=kfold)
    +#[:, np.newaxis]
    +    estimated_mse_sklearn[polydegree] = np.mean(-estimated_mse_folds)
    +
    +plt.plot(polynomial, np.log10(estimated_mse_sklearn), label='Test Error')
    +plt.xlabel('Polynomial degree')
    +plt.ylabel('log10[MSE]')
    +plt.legend()
    +plt.show()
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    + + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs022.html b/doc/pub/week39/html/._week39-bs022.html new file mode 100644 index 000000000..273d94a9d --- /dev/null +++ b/doc/pub/week39/html/._week39-bs022.html @@ -0,0 +1,313 @@ + + + + + + + +Week 39: Resampling methods and logistic regression + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    Logistic Regression

    + +

In linear regression our main interest was centered on learning the coefficients of a functional fit (say a polynomial) in order to be able to predict the response of a continuous variable on some unseen data. The fit to the continuous variable \( y_i \) is based on some independent variables \( \boldsymbol{x}_i \). Linear regression resulted in analytical expressions for standard ordinary Least Squares or Ridge regression (in terms of matrices to invert) for several quantities, ranging from the variance, and thereby the confidence intervals, of the parameters \( \boldsymbol{\theta} \) to the mean squared error. If we can invert the product of the design matrices, linear regression then gives a simple recipe for fitting our data.

    + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs023.html b/doc/pub/week39/html/._week39-bs023.html new file mode 100644 index 000000000..17e09275e --- /dev/null +++ b/doc/pub/week39/html/._week39-bs023.html @@ -0,0 +1,318 @@ + + + + + + + +Week 39: Resampling methods and logistic regression + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    Classification problems

    + +

    Classification problems, however, are concerned with outcomes taking +the form of discrete variables (i.e. categories). We may for example, +on the basis of DNA sequencing for a number of patients, like to find +out which mutations are important for a certain disease; or based on +scans of various patients' brains, figure out if there is a tumor or +not; or given a specific physical system, we'd like to identify its +state, say whether it is an ordered or disordered system (typical +situation in solid state physics); or classify the status of a +patient, whether she/he has a stroke or not and many other similar +situations. +

    + +

    The most common situation we encounter when we apply logistic +regression is that of two possible outcomes, normally denoted as a +binary outcome, true or false, positive or negative, success or +failure etc. +

    + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs024.html b/doc/pub/week39/html/._week39-bs024.html new file mode 100644 index 000000000..8f8552d70 --- /dev/null +++ b/doc/pub/week39/html/._week39-bs024.html @@ -0,0 +1,316 @@ + + + + + + + +Week 39: Resampling methods and logistic regression + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    Optimization and Deep learning

    + +

Logistic regression will also serve as our stepping stone towards neural network algorithms and supervised deep learning. For logistic learning, the minimization of the cost function leads to a non-linear equation in the parameters \( \boldsymbol{\theta} \). The optimization of the problem therefore calls for minimization algorithms. This forms the bottleneck of all machine learning algorithms, namely how to find reliable minima of a multi-variable function. This leads us to the family of gradient descent methods. The latter are the workhorses of basically all modern machine learning algorithms.

    + +

    We note also that many of the topics discussed here on logistic +regression are also commonly used in modern supervised Deep Learning +models, as we will see later. +

    + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs025.html b/doc/pub/week39/html/._week39-bs025.html new file mode 100644 index 000000000..1abef133f --- /dev/null +++ b/doc/pub/week39/html/._week39-bs025.html @@ -0,0 +1,323 @@ + + + + + + + +Week 39: Resampling methods and logistic regression + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    Basics

    + +

    We consider the case where the outputs/targets, also called the +responses or the outcomes, \( y_i \) are discrete and only take values +from \( k=0,\dots,K-1 \) (i.e. \( K \) classes). +

    + +

    The goal is to predict the +output classes from the design matrix \( \boldsymbol{X}\in\mathbb{R}^{n\times p} \) +made of \( n \) samples, each of which carries \( p \) features or predictors. The +primary goal is to identify the classes to which new unseen samples +belong. +

    + +

    Let us specialize to the case of two classes only, with outputs +\( y_i=0 \) and \( y_i=1 \). Our outcomes could represent the status of a +credit card user that could default or not on her/his credit card +debt. That is +

$$
y_i = \begin{cases} 0 & \mathrm{no}\\ 1 & \mathrm{yes} \end{cases}.
$$

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs026.html b/doc/pub/week39/html/._week39-bs026.html new file mode 100644 index 000000000..9e1c87421 --- /dev/null +++ b/doc/pub/week39/html/._week39-bs026.html @@ -0,0 +1,320 @@ + + + + + + + +Week 39: Resampling methods and logistic regression + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    Linear classifier

    + +

Before moving to the logistic model, let us try to use our linear regression model to classify these two outcomes. We could for example fit a linear model and assign an observation to the default case if the predicted output is larger than \( 0.5 \) and to the no-default case if it is smaller than or equal to \( 0.5 \).

    + +

    We would then have our +weighted linear combination, namely +

$$
\begin{equation}
\boldsymbol{y} = \boldsymbol{X}\boldsymbol{\theta} + \boldsymbol{\epsilon},
\tag{1}
\end{equation}
$$

    where \( \boldsymbol{y} \) is a vector representing the possible outcomes, \( \boldsymbol{X} \) is our +\( n\times p \) design matrix and \( \boldsymbol{\theta} \) represents our estimators/predictors. +

    + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs027.html b/doc/pub/week39/html/._week39-bs027.html new file mode 100644 index 000000000..6622d37d3 --- /dev/null +++ b/doc/pub/week39/html/._week39-bs027.html @@ -0,0 +1,320 @@ + + + + + + + +Week 39: Resampling methods and logistic regression + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    Some selected properties

    + +

The main problem with our function is that it takes values on the entire real axis. In the case of logistic regression, however, the labels \( y_i \) are discrete variables. A typical example is the credit card data discussed below, where we can set the state of defaulting on the debt to \( y_i=1 \) and not defaulting to \( y_i=0 \) for one of the persons in the data set (see the full example below).

    + +

One simple way to get a discrete output is to use a sign (step) function that maps the output of a linear regressor to the values \( \{0,1\} \), that is \( f(s_i)=\mathrm{sign}(s_i)=1 \) if \( s_i\ge 0 \) and 0 otherwise. We will encounter this model in our first demonstration of neural networks.

    + +

Historically it is called the perceptron model in the machine learning literature. This model is extremely simple. However, in many cases it is more favorable to use a "soft" classifier that outputs the probability of a given category. This leads us to the logistic function.
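To make the hard classifier concrete, here is a minimal sketch (an addition, not part of the original text) of the sign/step classifier applied to a linear score; the parameters theta and the inputs are hypothetical and chosen only for illustration.

import numpy as np

# hypothetical parameters (intercept and slope) and inputs, for illustration only
theta = np.array([-0.5, 1.0])
x = np.array([-2.0, -0.3, 0.1, 0.8, 2.5])
X = np.column_stack((np.ones_like(x), x))     # design matrix with a column of ones

s = X @ theta                                 # linear score s_i for each input
yhat = np.where(s >= 0, 1, 0)                 # hard classification into {0, 1}
print(yhat)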

    + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs028.html b/doc/pub/week39/html/._week39-bs028.html new file mode 100644 index 000000000..8b3ec2aca --- /dev/null +++ b/doc/pub/week39/html/._week39-bs028.html @@ -0,0 +1,378 @@ + + + + + + + +Week 39: Resampling methods and logistic regression + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    Simple example

    + +

The following example on data for coronary heart disease (CHD) as a function of age may serve as an illustration. In the code here we read and plot whether a person has had CHD (output = 1) or not (output = 0). This output is plotted against the person's age. Clearly, the figure shows that attempting to make a standard linear regression fit may not be very meaningful.

    + + + +
    +
    +
    +
    +
    +
    # Common imports
    +import os
    +import numpy as np
    +import pandas as pd
    +import matplotlib.pyplot as plt
    +from sklearn.linear_model import LinearRegression, Ridge, Lasso
    +from sklearn.model_selection import train_test_split
    +from sklearn.utils import resample
    +from sklearn.metrics import mean_squared_error
    +from IPython.display import display
    +from pylab import plt, mpl
    +mpl.rcParams['font.family'] = 'serif'
    +
    +# Where to save the figures and data files
    +PROJECT_ROOT_DIR = "Results"
    +FIGURE_ID = "Results/FigureFiles"
    +DATA_ID = "DataFiles/"
    +
    +if not os.path.exists(PROJECT_ROOT_DIR):
    +    os.mkdir(PROJECT_ROOT_DIR)
    +
    +if not os.path.exists(FIGURE_ID):
    +    os.makedirs(FIGURE_ID)
    +
    +if not os.path.exists(DATA_ID):
    +    os.makedirs(DATA_ID)
    +
    +def image_path(fig_id):
    +    return os.path.join(FIGURE_ID, fig_id)
    +
    +def data_path(dat_id):
    +    return os.path.join(DATA_ID, dat_id)
    +
    +def save_fig(fig_id):
    +    plt.savefig(image_path(fig_id) + ".png", format='png')
    +
    +infile = open(data_path("chddata.csv"),'r')
    +
    +# Read the chd data as  csv file and organize the data into arrays with age group, age, and chd
    +chd = pd.read_csv(infile, names=('ID', 'Age', 'Agegroup', 'CHD'))
    +chd.columns = ['ID', 'Age', 'Agegroup', 'CHD']
    +output = chd['CHD']
    +age = chd['Age']
    +agegroup = chd['Agegroup']
    +numberID  = chd['ID'] 
    +display(chd)
    +
    +plt.scatter(age, output, marker='o')
    +plt.axis([18,70.0,-0.1, 1.2])
    +plt.xlabel(r'Age')
    +plt.ylabel(r'CHD')
    +plt.title(r'Age distribution and Coronary heart disease')
    +plt.show()
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    + + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs029.html b/doc/pub/week39/html/._week39-bs029.html new file mode 100644 index 000000000..4eb185a8a --- /dev/null +++ b/doc/pub/week39/html/._week39-bs029.html @@ -0,0 +1,351 @@ + + + + + + + +Week 39: Resampling methods and logistic regression + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    Plotting the mean value for each group

    + +

    What we could attempt however is to plot the mean value for each group.

    + + + +
    +
    +
    +
    +
    +
    agegroupmean = np.array([0.1, 0.133, 0.250, 0.333, 0.462, 0.625, 0.765, 0.800])
    +group = np.array([1, 2, 3, 4, 5, 6, 7, 8])
    +plt.plot(group, agegroupmean, "r-")
    +plt.axis([0,9,0, 1.0])
    +plt.xlabel(r'Age group')
    +plt.ylabel(r'CHD mean values')
    +plt.title(r'Mean values for each age group')
    +plt.show()
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    + +

    We are now trying to find a function \( f(y\vert x) \), that is a function which gives us an expected value for the output \( y \) with a given input \( x \). +In standard linear regression with a linear dependence on \( x \), we would write this in terms of our model +

    +$$ +f(y_i\vert x_i)=\theta_0+\theta_1 x_i. +$$ + +

This expression implies however that \( f(y_i\vert x_i) \) could take any value from minus infinity to plus infinity. If we instead let \( f(y_i\vert x_i) \) be represented by the mean value, the above example shows us that we can constrain the function to take values between zero and one, that is we have \( 0 \le f(y_i\vert x_i) \le 1 \). Looking at our last curve we see also that it has an S-shaped form. This leads us to a very popular model for the function \( f \), namely the so-called Sigmoid function or logistic model. We will consider this function as representing the probability of finding a value of \( y_i \) with a given \( x_i \).

    + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs030.html b/doc/pub/week39/html/._week39-bs030.html new file mode 100644 index 000000000..5bafd9ea0 --- /dev/null +++ b/doc/pub/week39/html/._week39-bs030.html @@ -0,0 +1,318 @@ + + + + + + + +Week 39: Resampling methods and logistic regression + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    The logistic function

    + +

Another widely studied model is the so-called perceptron model, which is an example of a "hard classification" model. We will encounter this model when we discuss neural networks as well. Each datapoint is deterministically assigned to a category (i.e. \( y_i=0 \) or \( y_i=1 \)). In many cases, and the coronary heart disease data forms one of many such examples, it is favorable to have a "soft" classifier that outputs the probability of a given category rather than a single value. For example, given \( x_i \), the classifier outputs the probability of being in a category \( k \). Logistic regression is the most common example of a so-called soft classifier. In logistic regression, the probability that a data point \( x_i \) belongs to a category \( y_i=\{0,1\} \) is given by the so-called logit function (or Sigmoid), which is meant to represent the likelihood of a given event,

$$
p(t) = \frac{1}{1+\exp{(-t)}}=\frac{\exp{(t)}}{1+\exp{(t)}}.
$$

    Note that \( 1-p(t)= p(-t) \).

    + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs031.html b/doc/pub/week39/html/._week39-bs031.html new file mode 100644 index 000000000..2406ff009 --- /dev/null +++ b/doc/pub/week39/html/._week39-bs031.html @@ -0,0 +1,379 @@ + + + + + + + +Week 39: Resampling methods and logistic regression + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

Examples of likelihood functions used in logistic regression and neural networks

    + +

    The following code plots the logistic function, the step function and other functions we will encounter from here and on.

    + + + +
    +
    +
    +
    +
    +
    """The sigmoid function (or the logistic curve) is a
+function that takes any real number, z, and outputs a number in (0,1).
    +It is useful in neural networks for assigning weights on a relative scale.
    +The value z is the weighted sum of parameters involved in the learning algorithm."""
    +
    +import numpy
    +import matplotlib.pyplot as plt
    +import math as mt
    +
    +z = numpy.arange(-5, 5, .1)
    +sigma_fn = numpy.vectorize(lambda z: 1/(1+numpy.exp(-z)))
    +sigma = sigma_fn(z)
    +
    +fig = plt.figure()
    +ax = fig.add_subplot(111)
    +ax.plot(z, sigma)
    +ax.set_ylim([-0.1, 1.1])
    +ax.set_xlim([-5,5])
    +ax.grid(True)
    +ax.set_xlabel('z')
    +ax.set_title('sigmoid function')
    +
    +plt.show()
    +
    +"""Step Function"""
    +z = numpy.arange(-5, 5, .02)
    +step_fn = numpy.vectorize(lambda z: 1.0 if z >= 0.0 else 0.0)
    +step = step_fn(z)
    +
    +fig = plt.figure()
    +ax = fig.add_subplot(111)
    +ax.plot(z, step)
    +ax.set_ylim([-0.5, 1.5])
    +ax.set_xlim([-5,5])
    +ax.grid(True)
    +ax.set_xlabel('z')
    +ax.set_title('step function')
    +
    +plt.show()
    +
    +"""tanh Function"""
    +z = numpy.arange(-2*mt.pi, 2*mt.pi, 0.1)
    +t = numpy.tanh(z)
    +
    +fig = plt.figure()
    +ax = fig.add_subplot(111)
    +ax.plot(z, t)
    +ax.set_ylim([-1.0, 1.0])
    +ax.set_xlim([-2*mt.pi,2*mt.pi])
    +ax.grid(True)
    +ax.set_xlabel('z')
    +ax.set_title('tanh function')
    +
    +plt.show()
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    + + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs032.html b/doc/pub/week39/html/._week39-bs032.html new file mode 100644 index 000000000..eb3593307 --- /dev/null +++ b/doc/pub/week39/html/._week39-bs032.html @@ -0,0 +1,316 @@ + + + + + + + +Week 39: Resampling methods and logistic regression + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    Two parameters

    + +

    We assume now that we have two classes with \( y_i \) either \( 0 \) or \( 1 \). Furthermore we assume also that we have only two parameters \( \theta \) in our fitting of the Sigmoid function, that is we define probabilities

    +$$ +\begin{align*} +p(y_i=1|x_i,\boldsymbol{\theta}) &= \frac{\exp{(\theta_0+\theta_1x_i)}}{1+\exp{(\theta_0+\theta_1x_i)}},\nonumber\\ +p(y_i=0|x_i,\boldsymbol{\theta}) &= 1 - p(y_i=1|x_i,\boldsymbol{\theta}), +\end{align*} +$$ + +

    where \( \boldsymbol{\theta} \) are the weights we wish to extract from data, in our case \( \theta_0 \) and \( \theta_1 \).

    + +

    Note that we used

    +$$ +p(y_i=0\vert x_i, \boldsymbol{\theta}) = 1-p(y_i=1\vert x_i, \boldsymbol{\theta}). +$$ + + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs033.html b/doc/pub/week39/html/._week39-bs033.html new file mode 100644 index 000000000..3e0a2070c --- /dev/null +++ b/doc/pub/week39/html/._week39-bs033.html @@ -0,0 +1,319 @@ + + + + + + + +Week 39: Resampling methods and logistic regression + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    Maximum likelihood

    + +

    In order to define the total likelihood for all possible outcomes from a +dataset \( \mathcal{D}=\{(y_i,x_i)\} \), with the binary labels +\( y_i\in\{0,1\} \) and where the data points are drawn independently, we use the so-called Maximum Likelihood Estimation (MLE) principle. +We aim thus at maximizing +the probability of seeing the observed data. We can then approximate the +likelihood in terms of the product of the individual probabilities of a specific outcome \( y_i \), that is +

$$
\begin{align*}
P(\mathcal{D}|\boldsymbol{\theta})& = \prod_{i=1}^n \left[p(y_i=1|x_i,\boldsymbol{\theta})\right]^{y_i}\left[1-p(y_i=1|x_i,\boldsymbol{\theta})\right]^{1-y_i}\nonumber \\
\end{align*}
$$

    from which we obtain the log-likelihood and our cost/loss function

$$
\mathcal{C}(\boldsymbol{\theta}) = \sum_{i=1}^n \left( y_i\log{p(y_i=1|x_i,\boldsymbol{\theta})} + (1-y_i)\log\left[1-p(y_i=1|x_i,\boldsymbol{\theta})\right]\right).
$$

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs034.html b/doc/pub/week39/html/._week39-bs034.html new file mode 100644 index 000000000..bf62b86d5 --- /dev/null +++ b/doc/pub/week39/html/._week39-bs034.html @@ -0,0 +1,316 @@ + + + + + + + +Week 39: Resampling methods and logistic regression + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    The cost function rewritten

    + +

Reordering the logarithms, we can rewrite the log-likelihood as

$$
\log P(\mathcal{D}|\boldsymbol{\theta}) = \sum_{i=1}^n \left(y_i(\theta_0+\theta_1x_i) -\log{(1+\exp{(\theta_0+\theta_1x_i)})}\right).
$$

The maximum likelihood estimator is defined as the set of parameters \( \boldsymbol{\theta} \) that maximizes the log-likelihood. Since the cost (error) function is just the negative log-likelihood, for logistic regression we have

    +$$ +\mathcal{C}(\boldsymbol{\theta})=-\sum_{i=1}^n \left(y_i(\theta_0+\theta_1x_i) -\log{(1+\exp{(\theta_0+\theta_1x_i)})}\right). +$$ + +

    This equation is known in statistics as the cross entropy. Finally, we note that just as in linear regression, +in practice we often supplement the cross-entropy with additional regularization terms, usually \( L_1 \) and \( L_2 \) regularization as we did for Ridge and Lasso regression. +
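As a minimal sketch, assuming NumPy and a small synthetic data set, the cross-entropy cost above can be evaluated for given parameter values:

import numpy as np

def cross_entropy(theta0, theta1, x, y):
    # Negative log-likelihood for the two-parameter logistic model
    z = theta0 + theta1 * x
    return -np.sum(y * z - np.log1p(np.exp(z)))

x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])   # synthetic inputs
y = np.array([0, 1, 0, 1, 1])               # synthetic binary labels
print(cross_entropy(0.0, 1.0, x, y))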

    + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs035.html b/doc/pub/week39/html/._week39-bs035.html new file mode 100644 index 000000000..b4c9f4ec3 --- /dev/null +++ b/doc/pub/week39/html/._week39-bs035.html @@ -0,0 +1,316 @@ + + + + + + + +Week 39: Resampling methods and logistic regression + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    Minimizing the cross entropy

    + +

    The cross entropy is a convex function of the weights \( \boldsymbol{\theta} \) and, +therefore, any local minimizer is a global minimizer. +

    + +

    Minimizing this +cost function with respect to the two parameters \( \theta_0 \) and \( \theta_1 \) we obtain +

    + +$$ +\frac{\partial \mathcal{C}(\boldsymbol{\theta})}{\partial \theta_0} = -\sum_{i=1}^n \left(y_i -\frac{\exp{(\theta_0+\theta_1x_i)}}{1+\exp{(\theta_0+\theta_1x_i)}}\right), +$$ + +

    and

    +$$ +\frac{\partial \mathcal{C}(\boldsymbol{\theta})}{\partial \theta_1} = -\sum_{i=1}^n \left(y_ix_i -x_i\frac{\exp{(\theta_0+\theta_1x_i)}}{1+\exp{(\theta_0+\theta_1x_i)}}\right). +$$ + + +
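A small sketch of these two derivatives, using the same kind of synthetic data as above and assuming NumPy, could read

import numpy as np

def gradient(theta0, theta1, x, y):
    # Partial derivatives of the cross entropy with respect to theta0 and theta1
    p = 1.0 / (1.0 + np.exp(-(theta0 + theta1 * x)))   # p(y=1|x,theta)
    return -np.sum(y - p), -np.sum(x * (y - p))

x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
y = np.array([0, 1, 0, 1, 1])
print(gradient(0.0, 1.0, x, y))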

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs036.html b/doc/pub/week39/html/._week39-bs036.html new file mode 100644 index 000000000..82971218b --- /dev/null +++ b/doc/pub/week39/html/._week39-bs036.html @@ -0,0 +1,316 @@ + + + + + + + +Week 39: Resampling methods and logistic regression + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    A more compact expression

    + +

    Let us now define a vector \( \boldsymbol{y} \) with \( n \) elements \( y_i \), an +\( n\times p \) matrix \( \boldsymbol{X} \) which contains the \( x_i \) values and a +vector \( \boldsymbol{p} \) of fitted probabilities \( p(y_i\vert x_i,\boldsymbol{\theta}) \). We can rewrite in a more compact form the first +derivative of the cost function as +

    + +$$ +\frac{\partial \mathcal{C}(\boldsymbol{\theta})}{\partial \boldsymbol{\theta}} = -\boldsymbol{X}^T\left(\boldsymbol{y}-\boldsymbol{p}\right). +$$ + +

If we in addition define a diagonal matrix \( \boldsymbol{W} \) with elements \( p(y_i\vert x_i,\boldsymbol{\theta})(1-p(y_i\vert x_i,\boldsymbol{\theta})) \), we can obtain a compact expression of the second derivative as

    + +$$ +\frac{\partial^2 \mathcal{C}(\boldsymbol{\theta})}{\partial \boldsymbol{\theta}\partial \boldsymbol{\theta}^T} = \boldsymbol{X}^T\boldsymbol{W}\boldsymbol{X}. +$$ + + +
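These compact expressions translate directly into NumPy; a sketch with a made-up design matrix \( \boldsymbol{X}=[1,x] \) is

import numpy as np

x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
y = np.array([0.0, 1.0, 0.0, 1.0, 1.0])
X = np.c_[np.ones_like(x), x]             # n x p design matrix (p = 2 here)
theta = np.zeros(X.shape[1])

p = 1.0 / (1.0 + np.exp(-X @ theta))      # fitted probabilities p(y=1|x,theta)
gradient = -X.T @ (y - p)                 # -X^T (y - p)
W = np.diag(p * (1.0 - p))                # diagonal weight matrix
hessian = X.T @ W @ X                     # X^T W X
print(gradient)
print(hessian)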

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs037.html b/doc/pub/week39/html/._week39-bs037.html new file mode 100644 index 000000000..98aa82919 --- /dev/null +++ b/doc/pub/week39/html/._week39-bs037.html @@ -0,0 +1,307 @@ + + + + + + + +Week 39: Resampling methods and logistic regression + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    Extending to more predictors

    + +

Within a binary classification problem, we can easily expand our model to include multiple predictors. The log-odds (the logarithm of the ratio between the two probabilities) is then, with \( p \) predictors,

    +$$ +\log{ \frac{p(\boldsymbol{\theta}\boldsymbol{x})}{1-p(\boldsymbol{\theta}\boldsymbol{x})}} = \theta_0+\theta_1x_1+\theta_2x_2+\dots+\theta_px_p. +$$ + +

    Here we defined \( \boldsymbol{x}=[1,x_1,x_2,\dots,x_p] \) and \( \boldsymbol{\theta}=[\theta_0, \theta_1, \dots, \theta_p] \) leading to

    +$$ +p(\boldsymbol{\theta}\boldsymbol{x})=\frac{ \exp{(\theta_0+\theta_1x_1+\theta_2x_2+\dots+\theta_px_p)}}{1+\exp{(\theta_0+\theta_1x_1+\theta_2x_2+\dots+\theta_px_p)}}. +$$ + + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs038.html b/doc/pub/week39/html/._week39-bs038.html new file mode 100644 index 000000000..206eda2de --- /dev/null +++ b/doc/pub/week39/html/._week39-bs038.html @@ -0,0 +1,318 @@ + + + + + + + +Week 39: Resampling methods and logistic regression + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    Including more classes

    + +

Until now we have mainly focused on two classes, the so-called binary system. Suppose we wish to extend to \( K \) classes. Let us for the sake of simplicity assume we have only two predictors. We then have the following model

$$
\log{\frac{p(C=1\vert x)}{p(C=K\vert x)}} = \theta_{10}+\theta_{11}x_1,
$$

and

$$
\log{\frac{p(C=2\vert x)}{p(C=K\vert x)}} = \theta_{20}+\theta_{21}x_1,
$$

and so on up to class \( C=K-1 \),

$$
\log{\frac{p(C=K-1\vert x)}{p(C=K\vert x)}} = \theta_{(K-1)0}+\theta_{(K-1)1}x_1,
$$

and the model is specified in terms of \( K-1 \) so-called log-odds or logit transformations.

    + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs039.html b/doc/pub/week39/html/._week39-bs039.html new file mode 100644 index 000000000..88282c44d --- /dev/null +++ b/doc/pub/week39/html/._week39-bs039.html @@ -0,0 +1,329 @@ + + + + + + + +Week 39: Resampling methods and logistic regression + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    More classes

    + +

    In our discussion of neural networks we will encounter the above again +in terms of a slightly modified function, the so-called Softmax function. +

    + +

    The softmax function is used in various multiclass classification +methods, such as multinomial logistic regression (also known as +softmax regression), multiclass linear discriminant analysis, naive +Bayes classifiers, and artificial neural networks. Specifically, in +multinomial logistic regression and linear discriminant analysis, the +input to the function is the result of \( K \) distinct linear functions, +and the predicted probability for the \( k \)-th class given a sample +vector \( \boldsymbol{x} \) and a weighting vector \( \boldsymbol{\theta} \) is (with two +predictors): +

    + +$$ +p(C=k\vert \mathbf {x} )=\frac{\exp{(\theta_{k0}+\theta_{k1}x_1)}}{1+\sum_{l=1}^{K-1}\exp{(\theta_{l0}+\theta_{l1}x_1)}}. +$$ + +

    It is easy to extend to more predictors. The final class is

    +$$ +p(C=K\vert \mathbf {x} )=\frac{1}{1+\sum_{l=1}^{K-1}\exp{(\theta_{l0}+\theta_{l1}x_1)}}, +$$ + +

    and they sum to one. Our earlier discussions were all specialized to +the case with two classes only. It is easy to see from the above that +what we derived earlier is compatible with these equations. +
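A minimal sketch of these \( K \) probabilities with one predictor \( x_1 \), the last class acting as reference and illustrative (made-up) parameter values, is

import numpy as np

def class_probabilities(x1, Theta):
    # Theta has shape (K-1, 2): row k holds (theta_k0, theta_k1)
    scores = Theta[:, 0] + Theta[:, 1] * x1        # the K-1 linear functions
    denom = 1.0 + np.sum(np.exp(scores))
    return np.append(np.exp(scores), 1.0) / denom  # last entry is class K

Theta = np.array([[0.5, 1.0],
                  [-0.2, 2.0]])                    # K = 3, illustrative values
p = class_probabilities(x1=0.3, Theta=Theta)
print(p, p.sum())                                  # the probabilities sum to one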

    + +

    To find the optimal parameters we would typically use a gradient +descent method. Newton's method and gradient descent methods are +discussed in the material on optimization +methods. +

    + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs040.html b/doc/pub/week39/html/._week39-bs040.html new file mode 100644 index 000000000..26d57c047 --- /dev/null +++ b/doc/pub/week39/html/._week39-bs040.html @@ -0,0 +1,303 @@ + + + + + + + +Week 39: Resampling methods and logistic regression + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

Optimization, the central part of any Machine Learning algorithm

    + +

Almost every problem in machine learning and data science starts with a dataset \( X \), a model \( g(\theta) \), which is a function of the parameters \( \theta \), and a cost function \( C(X, g(\theta)) \) that allows us to judge how well the model \( g(\theta) \) explains the observations \( X \). The model is fit by finding the values of \( \theta \) that minimize the cost function. Ideally, we would be able to solve for \( \theta \) analytically; however, this is not possible in general, and we must use an approximate numerical method to compute the minimum.

    + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs041.html b/doc/pub/week39/html/._week39-bs041.html new file mode 100644 index 000000000..677037092 --- /dev/null +++ b/doc/pub/week39/html/._week39-bs041.html @@ -0,0 +1,309 @@ + + + + + + + +Week 39: Resampling methods and logistic regression + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    Revisiting our Logistic Regression case

    + +

In our discussion of logistic regression we studied the case of two classes, with \( y_i \) either \( 0 \) or \( 1 \). Furthermore, we assumed that our fit involves only two parameters \( \theta \), that is, we defined the probabilities

    + +$$ +\begin{align*} +p(y_i=1|x_i,\boldsymbol{\theta}) &= \frac{\exp{(\theta_0+\theta_1x_i)}}{1+\exp{(\theta_0+\theta_1x_i)}},\nonumber\\ +p(y_i=0|x_i,\boldsymbol{\theta}) &= 1 - p(y_i=1|x_i,\boldsymbol{\theta}), +\end{align*} +$$ + +

    where \( \boldsymbol{\theta} \) are the weights we wish to extract from data, in our case \( \theta_0 \) and \( \theta_1 \).

    + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs042.html b/doc/pub/week39/html/._week39-bs042.html new file mode 100644 index 000000000..8c77fb850 --- /dev/null +++ b/doc/pub/week39/html/._week39-bs042.html @@ -0,0 +1,312 @@ + + + + + + + +Week 39: Resampling methods and logistic regression + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    The equations to solve

    + +

    Our compact equations used a definition of a vector \( \boldsymbol{y} \) with \( n \) +elements \( y_i \), an \( n\times p \) matrix \( \boldsymbol{X} \) which contains the +\( x_i \) values and a vector \( \boldsymbol{p} \) of fitted probabilities +\( p(y_i\vert x_i,\boldsymbol{\theta}) \). We rewrote in a more compact form +the first derivative of the cost function as +

    + +$$ +\frac{\partial \mathcal{C}(\boldsymbol{\theta})}{\partial \boldsymbol{\theta}} = -\boldsymbol{X}^T\left(\boldsymbol{y}-\boldsymbol{p}\right). +$$ + +

If we in addition define a diagonal matrix \( \boldsymbol{W} \) with elements \( p(y_i\vert x_i,\boldsymbol{\theta})(1-p(y_i\vert x_i,\boldsymbol{\theta})) \), we can obtain a compact expression of the second derivative as

    + +$$ +\frac{\partial^2 \mathcal{C}(\boldsymbol{\theta})}{\partial \boldsymbol{\theta}\partial \boldsymbol{\theta}^T} = \boldsymbol{X}^T\boldsymbol{W}\boldsymbol{X}. +$$ + +

    This defines what is called the Hessian matrix.

    + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs043.html b/doc/pub/week39/html/._week39-bs043.html new file mode 100644 index 000000000..6724eb072 --- /dev/null +++ b/doc/pub/week39/html/._week39-bs043.html @@ -0,0 +1,308 @@ + + + + + + + +Week 39: Resampling methods and logistic regression + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    Solving using Newton-Raphson's method

    + +

If we can set up these equations, the Newton-Raphson iterative method is normally the method of choice. It requires, however, that we can compute efficiently the matrices that define the first and second derivatives.

    + +

    Our iterative scheme is then given by

    + +$$ +\boldsymbol{\theta}^{\mathrm{new}} = \boldsymbol{\theta}^{\mathrm{old}}-\left(\frac{\partial^2 \mathcal{C}(\boldsymbol{\theta})}{\partial \boldsymbol{\theta}\partial \boldsymbol{\theta}^T}\right)^{-1}_{\boldsymbol{\theta}^{\mathrm{old}}}\times \left(\frac{\partial \mathcal{C}(\boldsymbol{\theta})}{\partial \boldsymbol{\theta}}\right)_{\boldsymbol{\theta}^{\mathrm{old}}}, +$$ + +

    or in matrix form as

    + +$$ +\boldsymbol{\theta}^{\mathrm{new}} = \boldsymbol{\theta}^{\mathrm{old}}-\left(\boldsymbol{X}^T\boldsymbol{W}\boldsymbol{X} \right)^{-1}\times \left(-\boldsymbol{X}^T(\boldsymbol{y}-\boldsymbol{p}) \right)_{\boldsymbol{\theta}^{\mathrm{old}}}. +$$ + +

    The right-hand side is computed with the old values of \( \theta \).

    + +

    If we can compute these matrices, in particular the Hessian, the above is often the easiest method to implement.
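A minimal sketch of this iteration for the two-parameter model, assuming NumPy and a small synthetic (non-separable) data set, is

import numpy as np

x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
y = np.array([0.0, 1.0, 0.0, 1.0, 1.0])
X = np.c_[np.ones_like(x), x]
theta = np.zeros(X.shape[1])

for iteration in range(10):
    p = 1.0 / (1.0 + np.exp(-X @ theta))                 # current probabilities
    gradient = -X.T @ (y - p)
    W = np.diag(p * (1.0 - p))
    hessian = X.T @ W @ X
    theta = theta - np.linalg.solve(hessian, gradient)   # Newton-Raphson step

print(theta)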

    + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs044.html b/doc/pub/week39/html/._week39-bs044.html new file mode 100644 index 000000000..4a9e1f5ea --- /dev/null +++ b/doc/pub/week39/html/._week39-bs044.html @@ -0,0 +1,613 @@ + + + + + + + +Week 39: Resampling methods and logistic regression + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    Example code for Logistic Regression

    + +

    Here we make a class for Logistic regression. The code uses a simple data set and includes both a binary case and a multiclass case.

    + + +
    +
    +
    +
    +
    +
    import numpy as np
    +
    +class LogisticRegression:
    +    """
    +    Logistic Regression for binary and multiclass classification.
    +    """
    +    def __init__(self, lr=0.01, epochs=1000, fit_intercept=True, verbose=False):
    +        self.lr = lr                  # Learning rate for gradient descent
    +        self.epochs = epochs          # Number of iterations
    +        self.fit_intercept = fit_intercept  # Whether to add intercept (bias)
    +        self.verbose = verbose        # Print loss during training if True
    +        self.weights = None
    +        self.multi_class = False      # Will be determined at fit time
    +
    +    def _add_intercept(self, X):
    +        """Add intercept term (column of ones) to feature matrix."""
    +        intercept = np.ones((X.shape[0], 1))
    +        return np.concatenate((intercept, X), axis=1)
    +
    +    def _sigmoid(self, z):
    +        """Sigmoid function for binary logistic."""
    +        return 1 / (1 + np.exp(-z))
    +
    +    def _softmax(self, Z):
    +        """Softmax function for multiclass logistic."""
    +        exp_Z = np.exp(Z - np.max(Z, axis=1, keepdims=True))
    +        return exp_Z / np.sum(exp_Z, axis=1, keepdims=True)
    +
    +    def fit(self, X, y):
    +        """
    +        Train the logistic regression model using gradient descent.
    +        Supports binary (sigmoid) and multiclass (softmax) based on y.
    +        """
    +        X = np.array(X)
    +        y = np.array(y)
    +        n_samples, n_features = X.shape
    +
    +        # Add intercept if needed
    +        if self.fit_intercept:
    +            X = self._add_intercept(X)
    +            n_features += 1
    +
    +        # Determine classes and mode (binary vs multiclass)
    +        unique_classes = np.unique(y)
    +        if len(unique_classes) > 2:
    +            self.multi_class = True
    +        else:
    +            self.multi_class = False
    +
    +        # ----- Multiclass case -----
    +        if self.multi_class:
    +            n_classes = len(unique_classes)
    +            # Map original labels to 0...n_classes-1
    +            class_to_index = {c: idx for idx, c in enumerate(unique_classes)}
    +            y_indices = np.array([class_to_index[c] for c in y])
    +            # Initialize weight matrix (features x classes)
    +            self.weights = np.zeros((n_features, n_classes))
    +
    +            # One-hot encode y
    +            Y_onehot = np.zeros((n_samples, n_classes))
    +            Y_onehot[np.arange(n_samples), y_indices] = 1
    +
    +            # Gradient descent
    +            for epoch in range(self.epochs):
    +                scores = X.dot(self.weights)          # Linear scores (n_samples x n_classes)
    +                probs = self._softmax(scores)        # Probabilities (n_samples x n_classes)
    +                # Compute gradient (features x classes)
    +                gradient = (1 / n_samples) * X.T.dot(probs - Y_onehot)
    +                # Update weights
    +                self.weights -= self.lr * gradient
    +
    +                if self.verbose and epoch % 100 == 0:
    +                    # Compute current loss (categorical cross-entropy)
    +                    loss = -np.sum(Y_onehot * np.log(probs + 1e-15)) / n_samples
    +                    print(f"[Epoch {epoch}] Multiclass loss: {loss:.4f}")
    +
    +        # ----- Binary case -----
    +        else:
    +            # Convert y to 0/1 if not already
    +            if not np.array_equal(unique_classes, [0, 1]):
    +                # Map the two classes to 0 and 1
    +                class0, class1 = unique_classes
    +                y_binary = np.where(y == class1, 1, 0)
    +            else:
    +                y_binary = y.copy().astype(int)
    +
    +            # Initialize weights vector (features,)
    +            self.weights = np.zeros(n_features)
    +
    +            # Gradient descent
    +            for epoch in range(self.epochs):
    +                linear_model = X.dot(self.weights)     # (n_samples,)
    +                probs = self._sigmoid(linear_model)   # (n_samples,)
    +                # Gradient for binary cross-entropy
    +                gradient = (1 / n_samples) * X.T.dot(probs - y_binary)
    +                self.weights -= self.lr * gradient
    +
    +                if self.verbose and epoch % 100 == 0:
    +                    # Compute binary cross-entropy loss
    +                    loss = -np.mean(
    +                        y_binary * np.log(probs + 1e-15) + 
    +                        (1 - y_binary) * np.log(1 - probs + 1e-15)
    +                    )
    +                    print(f"[Epoch {epoch}] Binary loss: {loss:.4f}")
    +
    +    def predict_prob(self, X):
    +        """
    +        Compute probability estimates. Returns a 1D array for binary or
    +        a 2D array (n_samples x n_classes) for multiclass.
    +        """
    +        X = np.array(X)
    +        # Add intercept if the model used it
    +        if self.fit_intercept:
    +            X = self._add_intercept(X)
    +        scores = X.dot(self.weights)
    +        if self.multi_class:
    +            return self._softmax(scores)
    +        else:
    +            return self._sigmoid(scores)
    +
    +    def predict(self, X):
    +        """
    +        Predict class labels for samples in X.
    +        Returns integer class labels (0,1 for binary, or 0...C-1 for multiclass).
    +        """
    +        probs = self.predict_prob(X)
    +        if self.multi_class:
    +            # Choose class with highest probability
    +            return np.argmax(probs, axis=1)
    +        else:
    +            # Threshold at 0.5 for binary
    +            return (probs >= 0.5).astype(int)
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    + +

    The class implements the sigmoid and softmax internally. During fit(), +we check the number of classes: if more than 2, we set +self.multi_class=True and perform multinomial logistic regression. We +one-hot encode the target vector and update a weight matrix with +softmax probabilities. Otherwise, we do standard binary logistic +regression, converting labels to 0/1 if needed and updating a weight +vector. In both cases we use batch gradient descent on the +cross-entropy loss (we add a small epsilon 1e-15 to logs for numerical +stability). Progress (loss) can be printed if verbose=True. +

    + + + +
    +
    +
    +
    +
    +
    # Evaluation Metrics
+#We define helper functions for accuracy and cross-entropy loss. Accuracy is the fraction of correct predictions. For loss, we compute the appropriate cross-entropy:
    +
    +def accuracy_score(y_true, y_pred):
    +    """Accuracy = (# correct predictions) / (total samples)."""
    +    y_true = np.array(y_true)
    +    y_pred = np.array(y_pred)
    +    return np.mean(y_true == y_pred)
    +
    +def binary_cross_entropy(y_true, y_prob):
    +    """
    +    Binary cross-entropy loss.
    +    y_true: true binary labels (0 or 1), y_prob: predicted probabilities for class 1.
    +    """
    +    y_true = np.array(y_true)
    +    y_prob = np.clip(np.array(y_prob), 1e-15, 1-1e-15)
    +    return -np.mean(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))
    +
    +def categorical_cross_entropy(y_true, y_prob):
    +    """
    +    Categorical cross-entropy loss for multiclass.
    +    y_true: true labels (0...C-1), y_prob: array of predicted probabilities (n_samples x C).
    +    """
    +    y_true = np.array(y_true, dtype=int)
    +    y_prob = np.clip(np.array(y_prob), 1e-15, 1-1e-15)
    +    # One-hot encode true labels
    +    n_samples, n_classes = y_prob.shape
    +    one_hot = np.zeros_like(y_prob)
    +    one_hot[np.arange(n_samples), y_true] = 1
    +    # Compute cross-entropy
    +    loss_vec = -np.sum(one_hot * np.log(y_prob), axis=1)
    +    return np.mean(loss_vec)
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +

    Synthetic data generation

    + +

• Binary classification data: create two Gaussian clusters in 2D, for example class 0 around mean [-2,-2] and class 1 around [2,2].
• Multiclass data: create several Gaussian clusters (one per class) spread out in feature space.

    + + + +
    +
    +
    +
    +
    +
    import numpy as np
    +
    +def generate_binary_data(n_samples=100, n_features=2, random_state=None):
    +    """
    +    Generate synthetic binary classification data.
    +    Returns (X, y) where X is (n_samples x n_features), y in {0,1}.
    +    """
    +    rng = np.random.RandomState(random_state)
    +    # Half samples for class 0, half for class 1
    +    n0 = n_samples // 2
    +    n1 = n_samples - n0
    +    # Class 0 around mean -2, class 1 around +2
    +    mean0 = -2 * np.ones(n_features)
    +    mean1 =  2 * np.ones(n_features)
    +    X0 = rng.randn(n0, n_features) + mean0
    +    X1 = rng.randn(n1, n_features) + mean1
    +    X = np.vstack((X0, X1))
    +    y = np.array([0]*n0 + [1]*n1)
    +    return X, y
    +
    +def generate_multiclass_data(n_samples=150, n_features=2, n_classes=3, random_state=None):
    +    """
    +    Generate synthetic multiclass data with n_classes Gaussian clusters.
    +    """
    +    rng = np.random.RandomState(random_state)
    +    X = []
    +    y = []
    +    samples_per_class = n_samples // n_classes
    +    for cls in range(n_classes):
    +        # Random cluster center for each class
    +        center = rng.uniform(-5, 5, size=n_features)
    +        Xi = rng.randn(samples_per_class, n_features) + center
    +        yi = [cls] * samples_per_class
    +        X.append(Xi)
    +        y.extend(yi)
    +    X = np.vstack(X)
    +    y = np.array(y)
    +    return X, y
    +
    +
    +# Generate and test on binary data
    +X_bin, y_bin = generate_binary_data(n_samples=200, n_features=2, random_state=42)
    +model_bin = LogisticRegression(lr=0.1, epochs=1000)
    +model_bin.fit(X_bin, y_bin)
    +y_prob_bin = model_bin.predict_prob(X_bin)      # probabilities for class 1
    +y_pred_bin = model_bin.predict(X_bin)           # predicted classes 0 or 1
    +
    +acc_bin = accuracy_score(y_bin, y_pred_bin)
    +loss_bin = binary_cross_entropy(y_bin, y_prob_bin)
    +print(f"Binary Classification - Accuracy: {acc_bin:.2f}, Cross-Entropy Loss: {loss_bin:.2f}")
    +#For multiclass:
    +# Generate and test on multiclass data
    +X_multi, y_multi = generate_multiclass_data(n_samples=300, n_features=2, n_classes=3, random_state=1)
    +model_multi = LogisticRegression(lr=0.1, epochs=1000)
    +model_multi.fit(X_multi, y_multi)
    +y_prob_multi = model_multi.predict_prob(X_multi)     # (n_samples x 3) probabilities
    +y_pred_multi = model_multi.predict(X_multi)          # predicted labels 0,1,2
    +
    +acc_multi = accuracy_score(y_multi, y_pred_multi)
    +loss_multi = categorical_cross_entropy(y_multi, y_prob_multi)
    +print(f"Multiclass Classification - Accuracy: {acc_multi:.2f}, Cross-Entropy Loss: {loss_multi:.2f}")
    +
    +# CSV Export
    +import csv
    +
    +# Export binary results
    +with open('binary_results.csv', mode='w', newline='') as f:
    +    writer = csv.writer(f)
    +    writer.writerow(["TrueLabel", "PredictedLabel"])
    +    for true, pred in zip(y_bin, y_pred_bin):
    +        writer.writerow([true, pred])
    +
    +# Export multiclass results
    +with open('multiclass_results.csv', mode='w', newline='') as f:
    +    writer = csv.writer(f)
    +    writer.writerow(["TrueLabel", "PredictedLabel"])
    +    for true, pred in zip(y_multi, y_pred_multi):
    +        writer.writerow([true, pred])
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs045.html b/doc/pub/week39/html/._week39-bs045.html new file mode 100644 index 000000000..12fd481df --- /dev/null +++ b/doc/pub/week39/html/._week39-bs045.html @@ -0,0 +1,467 @@ + + + + + + + +Week 39: Optimization and Gradient Methods + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    Using gradient descent methods, limitations

    + +
      +
• Gradient descent (GD) finds local minima of our function. Since the GD algorithm is deterministic, if it converges it will converge to a local minimum of our cost/loss/risk function. Because in ML we are often dealing with extremely rugged landscapes with many local minima, this can lead to poor performance.
• GD is sensitive to initial conditions. One consequence of the local nature of GD is that initial conditions matter: depending on where one starts, one ends up at a different local minimum. It is therefore very important to think about how the training process is initialized. This is true for GD as well as for more complicated variants of GD.
• Gradients are computationally expensive to calculate for large datasets. In many cases in statistics and ML, the cost/loss/risk function is a sum of terms, with one term for each data point. For example, in linear regression \( E \propto \sum_{i=1}^n (y_i - \mathbf{w}^T\cdot\mathbf{x}_i)^2 \); for logistic regression, the squared error is replaced by the cross entropy. To calculate the gradient we have to sum over all \( n \) data points, and doing this at every GD step becomes extremely computationally expensive. An ingenious solution is to calculate the gradients using small subsets of the data called "mini-batches". This has the added benefit of introducing stochasticity into our algorithm.
• GD is very sensitive to the choice of learning rate. If the learning rate is very small, the training process takes an extremely long time; for larger learning rates, GD can diverge and give poor results. Furthermore, depending on what the local landscape looks like, we have to modify the learning rates to ensure convergence. Ideally, we would adaptively choose the learning rates to match the landscape.
• GD treats all directions in parameter space uniformly. Another major drawback of GD is that, unlike Newton's method, the learning rate for GD is the same in all directions in parameter space. For this reason, the maximum learning rate is set by the behavior of the steepest direction, and this can significantly slow down training. Ideally, we would like to take large steps in flat directions and small steps in steep directions. Since we are exploring rugged landscapes where curvatures change, this requires us to keep track not only of the gradient but also of second derivatives. The ideal scenario would be to calculate the Hessian, but this proves to be too computationally expensive.
• GD can take exponential time to escape saddle points, even with random initialization. As mentioned, GD is extremely sensitive to the initial condition, since it determines the particular local minimum GD will eventually reach. However, even with a good initialization scheme that introduces randomness, GD can still take exponential time to escape saddle points.
    +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs046.html b/doc/pub/week39/html/._week39-bs046.html new file mode 100644 index 000000000..b46cb20b5 --- /dev/null +++ b/doc/pub/week39/html/._week39-bs046.html @@ -0,0 +1,539 @@ + + + + + + + +Week 39: Optimization and Gradient Methods + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    Improving gradient descent with momentum

    + +

We discuss here some simple examples where we introduce what is called 'memory' of previous steps, normally referred to as momentum gradient descent. The mathematics is explained below in connection with stochastic gradient descent.

    + + + +
    +
    +
    +
    +
    +
    from numpy import asarray
    +from numpy import arange
    +from numpy.random import rand
    +from numpy.random import seed
    +from matplotlib import pyplot
    + 
    +# objective function
    +def objective(x):
    +	return x**2.0
    + 
    +# derivative of objective function
    +def derivative(x):
    +	return x * 2.0
    + 
    +# gradient descent algorithm
    +def gradient_descent(objective, derivative, bounds, n_iter, step_size):
    +	# track all solutions
    +	solutions, scores = list(), list()
    +	# generate an initial point
    +	solution = bounds[:, 0] + rand(len(bounds)) * (bounds[:, 1] - bounds[:, 0])
    +	# run the gradient descent
    +	for i in range(n_iter):
    +		# calculate gradient
    +		gradient = derivative(solution)
    +		# take a step
    +		solution = solution - step_size * gradient
    +		# evaluate candidate point
    +		solution_eval = objective(solution)
    +		# store solution
    +		solutions.append(solution)
    +		scores.append(solution_eval)
    +		# report progress
    +		print('>%d f(%s) = %.5f' % (i, solution, solution_eval))
    +	return [solutions, scores]
    + 
    +# seed the pseudo random number generator
    +seed(4)
    +# define range for input
    +bounds = asarray([[-1.0, 1.0]])
    +# define the total iterations
    +n_iter = 30
    +# define the step size
    +step_size = 0.1
    +# perform the gradient descent search
    +solutions, scores = gradient_descent(objective, derivative, bounds, n_iter, step_size)
    +# sample input range uniformly at 0.1 increments
    +inputs = arange(bounds[0,0], bounds[0,1]+0.1, 0.1)
    +# compute targets
    +results = objective(inputs)
    +# create a line plot of input vs result
    +pyplot.plot(inputs, results)
    +# plot the solutions found
    +pyplot.plot(solutions, scores, '.-', color='red')
    +# show the plot
    +pyplot.show()
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    + + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs047.html b/doc/pub/week39/html/._week39-bs047.html new file mode 100644 index 000000000..3bdae2775 --- /dev/null +++ b/doc/pub/week39/html/._week39-bs047.html @@ -0,0 +1,545 @@ + + + + + + + +Week 39: Optimization and Gradient Methods + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    Same code but now with momentum gradient descent

    + + + +
    +
    +
    +
    +
    +
    from numpy import asarray
    +from numpy import arange
    +from numpy.random import rand
    +from numpy.random import seed
    +from matplotlib import pyplot
    + 
    +# objective function
    +def objective(x):
    +	return x**2.0
    + 
    +# derivative of objective function
    +def derivative(x):
    +	return x * 2.0
    + 
    +# gradient descent algorithm
    +def gradient_descent(objective, derivative, bounds, n_iter, step_size, momentum):
    +	# track all solutions
    +	solutions, scores = list(), list()
    +	# generate an initial point
    +	solution = bounds[:, 0] + rand(len(bounds)) * (bounds[:, 1] - bounds[:, 0])
    +	# keep track of the change
    +	change = 0.0
    +	# run the gradient descent
    +	for i in range(n_iter):
    +		# calculate gradient
    +		gradient = derivative(solution)
    +		# calculate update
    +		new_change = step_size * gradient + momentum * change
    +		# take a step
    +		solution = solution - new_change
    +		# save the change
    +		change = new_change
    +		# evaluate candidate point
    +		solution_eval = objective(solution)
    +		# store solution
    +		solutions.append(solution)
    +		scores.append(solution_eval)
    +		# report progress
    +		print('>%d f(%s) = %.5f' % (i, solution, solution_eval))
    +	return [solutions, scores]
    + 
    +# seed the pseudo random number generator
    +seed(4)
    +# define range for input
    +bounds = asarray([[-1.0, 1.0]])
    +# define the total iterations
    +n_iter = 30
    +# define the step size
    +step_size = 0.1
    +# define momentum
    +momentum = 0.3
    +# perform the gradient descent search with momentum
    +solutions, scores = gradient_descent(objective, derivative, bounds, n_iter, step_size, momentum)
    +# sample input range uniformly at 0.1 increments
    +inputs = arange(bounds[0,0], bounds[0,1]+0.1, 0.1)
    +# compute targets
    +results = objective(inputs)
    +# create a line plot of input vs result
    +pyplot.plot(inputs, results)
    +# plot the solutions found
    +pyplot.plot(solutions, scores, '.-', color='red')
    +# show the plot
    +pyplot.show()
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    + + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs048.html b/doc/pub/week39/html/._week39-bs048.html new file mode 100644 index 000000000..39c74d5ae --- /dev/null +++ b/doc/pub/week39/html/._week39-bs048.html @@ -0,0 +1,461 @@ + + + + + + + +Week 39: Optimization and Gradient Methods + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    Overview video on Stochastic Gradient Descent

    + +What is Stochastic Gradient Descent + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs049.html b/doc/pub/week39/html/._week39-bs049.html new file mode 100644 index 000000000..40ef9997c --- /dev/null +++ b/doc/pub/week39/html/._week39-bs049.html @@ -0,0 +1,471 @@ + + + + + + + +Week 39: Optimization and Gradient Methods + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    Batches and mini-batches

    + +

    In gradient descent we compute the cost function and its gradient for all data points we have.

    + +

In large-scale applications such as the ILSVRC challenge, the training data can have on the order of millions of examples. Hence, it seems wasteful to compute the full cost function over the entire training set in order to perform only a single parameter update. A very common approach to addressing this challenge is to compute the gradient over batches of the training data. For example, a typical batch could contain a few thousand examples from an entire training set of several million. This batch is then used to perform a parameter update.
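A small sketch of how the data indices can be split into such batches (assuming NumPy, with n and M chosen purely for illustration):

import numpy as np

n, M = 20, 5                              # illustrative values: n data points, batch size M
rng = np.random.default_rng(2024)
indices = rng.permutation(n)              # shuffle the data indices
batches = indices.reshape(n // M, M)      # n/M mini-batches of size M (assumes M divides n)
for k, batch in enumerate(batches):
    print(f"Batch {k}: data points {batch}")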

    + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs050.html b/doc/pub/week39/html/._week39-bs050.html new file mode 100644 index 000000000..67bd324f5 --- /dev/null +++ b/doc/pub/week39/html/._week39-bs050.html @@ -0,0 +1,483 @@ + + + + + + + +Week 39: Optimization and Gradient Methods + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    Stochastic Gradient Descent (SGD)

    + +

In stochastic gradient descent, the extreme case is the one where each mini-batch contains only a single data point.

    + +

    This process is called Stochastic Gradient +Descent (SGD) (or also sometimes on-line gradient descent). This is +relatively less common to see because in practice due to vectorized +code optimizations it can be computationally much more efficient to +evaluate the gradient for 100 examples, than the gradient for one +example 100 times. Even though SGD technically refers to using a +single example at a time to evaluate the gradient, you will hear +people use the term SGD even when referring to mini-batch gradient +descent (i.e. mentions of MGD for “Minibatch Gradient Descent”, or BGD +for “Batch gradient descent” are rare to see), where it is usually +assumed that mini-batches are used. The size of the mini-batch is a +hyperparameter but it is not very common to cross-validate or bootstrap it. It is +usually based on memory constraints (if any), or set to some value, +e.g. 32, 64 or 128. We use powers of 2 in practice because many +vectorized operation implementations work faster when their inputs are +sized in powers of 2. +

    + +

    In our notes with SGD we mean stochastic gradient descent with mini-batches.

    + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs051.html b/doc/pub/week39/html/._week39-bs051.html new file mode 100644 index 000000000..f5a76ca74 --- /dev/null +++ b/doc/pub/week39/html/._week39-bs051.html @@ -0,0 +1,473 @@ + + + + + + + +Week 39: Optimization and Gradient Methods + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    Stochastic Gradient Descent

    + +

    Stochastic gradient descent (SGD) and variants thereof address some of +the shortcomings of the Gradient descent method discussed above. +

    + +

    The underlying idea of SGD comes from the observation that the cost +function, which we want to minimize, can almost always be written as a +sum over \( n \) data points \( \{\mathbf{x}_i\}_{i=1}^n \), +

    +$$ +C(\mathbf{\beta}) = \sum_{i=1}^n c_i(\mathbf{x}_i, +\mathbf{\beta}). +$$ + + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs052.html b/doc/pub/week39/html/._week39-bs052.html new file mode 100644 index 000000000..e34ccd33c --- /dev/null +++ b/doc/pub/week39/html/._week39-bs052.html @@ -0,0 +1,474 @@ + + + + + + + +Week 39: Optimization and Gradient Methods + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    Computation of gradients

    + +

    This in turn means that the gradient can be +computed as a sum over \( i \)-gradients +

    +$$ +\nabla_\beta C(\mathbf{\beta}) = \sum_i^n \nabla_\beta c_i(\mathbf{x}_i, +\mathbf{\beta}). +$$ + +

    Stochasticity/randomness is introduced by only taking the +gradient on a subset of the data called minibatches. If there are \( n \) +data points and the size of each minibatch is \( M \), there will be \( n/M \) +minibatches. We denote these minibatches by \( B_k \) where +\( k=1,\cdots,n/M \). +

    + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs053.html b/doc/pub/week39/html/._week39-bs053.html new file mode 100644 index 000000000..81c971ba2 --- /dev/null +++ b/doc/pub/week39/html/._week39-bs053.html @@ -0,0 +1,480 @@ + + + + + + + +Week 39: Optimization and Gradient Methods + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    SGD example

    +

As an example, suppose we have \( n=10 \) data points \( (\mathbf{x}_1,\cdots, \mathbf{x}_{10}) \) and we choose a minibatch size of \( M=2 \); then we have \( n/M=5 \) minibatches, each containing two data points. In particular we have \( B_1 = (\mathbf{x}_1,\mathbf{x}_2), \cdots, B_5 = (\mathbf{x}_9,\mathbf{x}_{10}) \). Note that if you choose \( M=n \) you have only a single batch with all data points, and at the other extreme you may choose \( M=1 \), resulting in a minibatch for each datapoint, i.e. \( B_k = \mathbf{x}_k \).

    + +

The idea is now to approximate the gradient by replacing the sum over all data points with a sum over the data points in one of the minibatches, picked at random in each gradient descent step,

$$
\nabla_{\beta} C(\mathbf{\beta}) = \sum_{i=1}^n \nabla_\beta c_i(\mathbf{x}_i, \mathbf{\beta}) \rightarrow \sum_{i \in B_k} \nabla_\beta c_i(\mathbf{x}_i, \mathbf{\beta}).
$$

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs054.html b/doc/pub/week39/html/._week39-bs054.html new file mode 100644 index 000000000..81260a212 --- /dev/null +++ b/doc/pub/week39/html/._week39-bs054.html @@ -0,0 +1,472 @@ + + + + + + + +Week 39: Optimization and Gradient Methods + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    The gradient step

    + +

    Thus a gradient descent step now looks like

    +$$ +\beta_{j+1} = \beta_j - \gamma_j \sum_{i \in B_k}^n \nabla_\beta c_i(\mathbf{x}_i, +\mathbf{\beta}) +$$ + +

    where \( k \) is picked at random with equal +probability from \( [1,n/M] \). An iteration over the number of +minibathces (n/M) is commonly referred to as an epoch. Thus it is +typical to choose a number of epochs and for each epoch iterate over +the number of minibatches, as exemplified in the code below. +

    + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs055.html b/doc/pub/week39/html/._week39-bs055.html new file mode 100644 index 000000000..aaf538a29 --- /dev/null +++ b/doc/pub/week39/html/._week39-bs055.html @@ -0,0 +1,504 @@ + + + + + + + +Week 39: Optimization and Gradient Methods + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    Simple example code

    + + + +
    +
    +
    +
    +
    +
    import numpy as np 
    +
    +n = 100 #100 datapoints 
    +M = 5   #size of each minibatch
    +m = int(n/M) #number of minibatches
    +n_epochs = 10 #number of epochs
    +
    +j = 0
    +for epoch in range(1,n_epochs+1):
    +    for i in range(m):
    +        k = np.random.randint(m) #Pick the k-th minibatch at random
    +        #Compute the gradient using the data in minibatch Bk
+        #Compute new suggestion for beta
    +        j += 1
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    + +

Taking the gradient only on a subset of the data has two important benefits. First, it introduces randomness, which decreases the chance that our optimization scheme gets stuck in a local minimum. Second, if the size of the minibatches is small relative to the number of datapoints (\( M < n \)), the computation of the gradient is much cheaper since we sum over the datapoints in the \( k \)-th minibatch and not over all \( n \) datapoints.

    + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs056.html b/doc/pub/week39/html/._week39-bs056.html new file mode 100644 index 000000000..94b8ae245 --- /dev/null +++ b/doc/pub/week39/html/._week39-bs056.html @@ -0,0 +1,471 @@ + + + + + + + +Week 39: Optimization and Gradient Methods + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    When do we stop?

    + +

A natural question is when do we stop the search for a new minimum? One possibility is to compute the full gradient after a given number of epochs, check if its norm is smaller than some threshold, and stop if it is. However, a vanishing gradient is also the condition for a local minimum, so this would only tell us that we are close to a local/global minimum. We could therefore also evaluate the cost function at this point, store the result and continue the search; if the test kicks in at a later stage, we can compare the values of the cost function and keep the \( \beta \) that gave the lowest value.
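A sketch of such a check, here on a simple linear-regression cost with plain gradient descent standing in for the stochastic scheme (the tolerance and learning rate are illustrative), could look like this:

import numpy as np

rng = np.random.default_rng(42)
n = 100
x = 2 * rng.random((n, 1))
y = 4 + 3 * x + rng.standard_normal((n, 1))
X = np.c_[np.ones((n, 1)), x]

def cost(beta):
    return np.mean((X @ beta - y) ** 2)

def full_gradient(beta):
    return (2.0 / n) * X.T @ (X @ beta - y)

beta = rng.standard_normal((2, 1))
best_beta, best_cost = beta.copy(), cost(beta)
eta, tolerance = 0.1, 1e-6

for epoch in range(1000):
    beta -= eta * full_gradient(beta)
    if cost(beta) < best_cost:                          # keep the best beta seen so far
        best_beta, best_cost = beta.copy(), cost(beta)
    if np.linalg.norm(full_gradient(beta)) < tolerance:
        break                                           # full gradient is (almost) zero

print(best_cost, best_beta.ravel())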

    + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs057.html b/doc/pub/week39/html/._week39-bs057.html new file mode 100644 index 000000000..fa86cec0c --- /dev/null +++ b/doc/pub/week39/html/._week39-bs057.html @@ -0,0 +1,470 @@ + + + + + + + +Week 39: Optimization and Gradient Methods + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    Slightly different approach

    + +

Another approach is to let the step length \( \gamma_j \) depend on the number of epochs in such a way that it eventually becomes very small, so that we barely move at all. Such approaches are also called scaling or learning-rate schedules. There are many ways to scale the learning rate; see for example https://towardsdatascience.com/learning-rate-schedules-and-adaptive-learning-rate-methods-for-deep-learning-2c8f433990d1 for a discussion of different scaling functions for the learning rate.

    + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs058.html b/doc/pub/week39/html/._week39-bs058.html new file mode 100644 index 000000000..c6cf71ee5 --- /dev/null +++ b/doc/pub/week39/html/._week39-bs058.html @@ -0,0 +1,515 @@ + + + + + + + +Week 39: Optimization and Gradient Methods + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    Time decay rate

    + +

    As an example, let \( e = 0,1,2,3,\cdots \) denote the current epoch and let \( t_0, t_1 > 0 \) be two fixed numbers. Furthermore, let \( t = e \cdot m + i \) where \( m \) is the number of minibatches and \( i=0,\cdots,m-1 \). Then the function $$\gamma_j(t; t_0, t_1) = \frac{t_0}{t+t_1} $$ goes to zero as the number of epochs gets large. I.e. we start with a step length \( \gamma_j (0; t_0, t_1) = t_0/t_1 \) which decays in time \( t \).

    + +

    In this way we can fix the number of epochs, compute \( \beta \) and +evaluate the cost function at the end. Repeating the computation will +give a different result since the scheme is random by design. Then we +pick the final \( \beta \) that gives the lowest value of the cost +function. +

    + + + +
    +
    +
    +
    +
    +
    import numpy as np 
    +
    +def step_length(t,t0,t1):
    +    return t0/(t+t1)
    +
    +n = 100 #100 datapoints 
    +M = 5   #size of each minibatch
    +m = int(n/M) #number of minibatches
    +n_epochs = 500 #number of epochs
    +t0 = 1.0
    +t1 = 10
    +
    +gamma_j = t0/t1
    +j = 0
    +for epoch in range(1,n_epochs+1):
    +    for i in range(m):
    +        k = np.random.randint(m) #Pick the k-th minibatch at random
    +        #Compute the gradient using the data in minibatch Bk
    +        #Compute new suggestion for beta
    +        t = epoch*m+i
    +        gamma_j = step_length(t,t0,t1)
    +        j += 1
    +
    +print("gamma_j after %d epochs: %g" % (n_epochs,gamma_j))
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    + + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs059.html b/doc/pub/week39/html/._week39-bs059.html new file mode 100644 index 000000000..eb2c5b76e --- /dev/null +++ b/doc/pub/week39/html/._week39-bs059.html @@ -0,0 +1,549 @@ + + + + + + + +Week 39: Optimization and Gradient Methods + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    Code with a Number of Minibatches which varies

    + +

In the code here the mini-batches are picked at random at each step, and the learning rate follows a decaying schedule.

    + + +
    +
    +
    +
    +
    +
    # Importing various packages
    +from math import exp, sqrt
    +from random import random, seed
    +import numpy as np
    +import matplotlib.pyplot as plt
    +
    +n = 100
    +x = 2*np.random.rand(n,1)
    +y = 4+3*x+np.random.randn(n,1)
    +
    +X = np.c_[np.ones((n,1)), x]
    +XT_X = X.T @ X
    +theta_linreg = np.linalg.inv(X.T @ X) @ (X.T @ y)
    +print("Own inversion")
    +print(theta_linreg)
    +# Hessian matrix
    +H = (2.0/n)* XT_X
    +EigValues, EigVectors = np.linalg.eig(H)
    +print(f"Eigenvalues of Hessian Matrix:{EigValues}")
    +
    +theta = np.random.randn(2,1)
    +eta = 1.0/np.max(EigValues)
    +Niterations = 1000
    +
    +
    +for iter in range(Niterations):
    +    gradients = 2.0/n*X.T @ ((X @ theta)-y)
    +    theta -= eta*gradients
    +print("theta from own gd")
    +print(theta)
    +
    +xnew = np.array([[0],[2]])
    +Xnew = np.c_[np.ones((2,1)), xnew]
    +ypredict = Xnew.dot(theta)
    +ypredict2 = Xnew.dot(theta_linreg)
    +
    +n_epochs = 50
    +M = 5   #size of each minibatch
    +m = int(n/M) #number of minibatches
    +t0, t1 = 5, 50
    +
    +def learning_schedule(t):
    +    return t0/(t+t1)
    +
    +theta = np.random.randn(2,1)
    +
    +for epoch in range(n_epochs):
    +# Can you figure out a better way of setting up the contributions to each batch?
    +    for i in range(m):
    +        random_index = M*np.random.randint(m)
    +        xi = X[random_index:random_index+M]
    +        yi = y[random_index:random_index+M]
    +        gradients = (2.0/M)* xi.T @ ((xi @ theta)-yi)
    +        eta = learning_schedule(epoch*m+i)
    +        theta = theta - eta*gradients
+print("theta from own sgd")
    +print(theta)
    +
    +plt.plot(xnew, ypredict, "r-")
    +plt.plot(xnew, ypredict2, "b-")
    +plt.plot(x, y ,'ro')
    +plt.axis([0,2.0,0, 15.0])
    +plt.xlabel(r'$x$')
    +plt.ylabel(r'$y$')
    +plt.title(r'Random numbers ')
    +plt.show()
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    + + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs060.html b/doc/pub/week39/html/._week39-bs060.html new file mode 100644 index 000000000..4d77d1f1d --- /dev/null +++ b/doc/pub/week39/html/._week39-bs060.html @@ -0,0 +1,465 @@ + + + + + + + +Week 39: Optimization and Gradient Methods + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    Replace or not

    + +

In the above code, we have used sampling with replacement when setting up the mini-batches. An alternative is to shuffle the data at the start of each epoch and loop over the resulting disjoint mini-batches, that is, sampling without replacement.

    + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs061.html b/doc/pub/week39/html/._week39-bs061.html new file mode 100644 index 000000000..3838033ab --- /dev/null +++ b/doc/pub/week39/html/._week39-bs061.html @@ -0,0 +1,491 @@ + + + + + + + +Week 39: Optimization and Gradient Methods + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    Momentum based GD

    + +

Stochastic gradient descent (SGD) is almost always used with a momentum or inertia term that serves as a memory of the direction we are moving in parameter space. This is typically implemented as follows

    + +$$ +\begin{align} +\mathbf{v}_{t}&=\gamma \mathbf{v}_{t-1}+\eta_{t}\nabla_\theta E(\boldsymbol{\theta}_t) \nonumber \\ +\boldsymbol{\theta}_{t+1}&= \boldsymbol{\theta}_t -\mathbf{v}_{t}, +\tag{2} +\end{align} +$$ + +

    where we have introduced a momentum parameter \( \gamma \), with +\( 0\le\gamma\le 1 \), and for brevity we dropped the explicit notation to +indicate the gradient is to be taken over a different mini-batch at +each step. We call this algorithm gradient descent with momentum +(GDM). From these equations, it is clear that \( \mathbf{v}_t \) is a +running average of recently encountered gradients and +\( (1-\gamma)^{-1} \) sets the characteristic time scale for the memory +used in the averaging procedure. Consistent with this, when +\( \gamma=0 \), this just reduces down to ordinary SGD as discussed +earlier. An equivalent way of writing the updates is +

    + +$$ +\Delta \boldsymbol{\theta}_{t+1} = \gamma \Delta \boldsymbol{\theta}_t -\ \eta_{t}\nabla_\theta E(\boldsymbol{\theta}_t), +$$ + +

    where we have defined \( \Delta \boldsymbol{\theta}_{t}= \boldsymbol{\theta}_t-\boldsymbol{\theta}_{t-1} \).
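A minimal sketch of the update (2) on the simple cost \( E(\theta)=\theta^2 \), with illustrative values for \( \eta \) and \( \gamma \), is

import numpy as np

eta, gamma = 0.1, 0.9            # illustrative learning rate and momentum parameter
theta = np.array([1.0])
v = np.zeros_like(theta)

for t in range(50):
    grad = 2.0 * theta           # gradient of E(theta) = theta^2
    v = gamma * v + eta * grad   # running average of recent gradients
    theta = theta - v            # momentum update of the parameters
print(theta)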

    + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs062.html b/doc/pub/week39/html/._week39-bs062.html new file mode 100644 index 000000000..c42315d60 --- /dev/null +++ b/doc/pub/week39/html/._week39-bs062.html @@ -0,0 +1,483 @@ + + + + + + + +Week 39: Optimization and Gradient Methods + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    More on momentum based approaches

    + +

    Let us try to get more intuition from these equations. It is helpful +to consider a simple physical analogy with a particle of mass \( m \) +moving in a viscous medium with drag coefficient \( \mu \) and potential +\( E(\mathbf{w}) \). If we denote the particle's position by \( \mathbf{w} \), +then its motion is described by +

    + +$$ +m {d^2 \mathbf{w} \over dt^2} + \mu {d \mathbf{w} \over dt }= -\nabla_w E(\mathbf{w}). +$$ + +

    We can discretize this equation in the usual way to get

    + +$$ +m { \mathbf{w}_{t+\Delta t}-2 \mathbf{w}_{t} +\mathbf{w}_{t-\Delta t} \over (\Delta t)^2}+\mu {\mathbf{w}_{t+\Delta t}- \mathbf{w}_{t} \over \Delta t} = -\nabla_w E(\mathbf{w}). +$$ + +

    Rearranging this equation, we can rewrite this as

    + +$$ +\Delta \mathbf{w}_{t +\Delta t}= - { (\Delta t)^2 \over m +\mu \Delta t} \nabla_w E(\mathbf{w})+ {m \over m +\mu \Delta t} \Delta \mathbf{w}_t. +$$ + + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs063.html b/doc/pub/week39/html/._week39-bs063.html new file mode 100644 index 000000000..c60aade85 --- /dev/null +++ b/doc/pub/week39/html/._week39-bs063.html @@ -0,0 +1,509 @@ + + + + + + + +Week 39: Optimization and Gradient Methods + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    Momentum parameter

    + +

    Notice that this equation is identical to previous one if we identify +the position of the particle, \( \mathbf{w} \), with the parameters +\( \boldsymbol{\theta} \). This allows us to identify the momentum +parameter and learning rate with the mass of the particle and the +viscous drag as: +

    + +$$ +\gamma= {m \over m +\mu \Delta t }, \qquad \eta = {(\Delta t)^2 \over m +\mu \Delta t}. +$$ + +

    Thus, as the name suggests, the momentum parameter is proportional to +the mass of the particle and effectively provides inertia. +Furthermore, in the large viscosity/small learning rate limit, our +memory time scales as \( (1-\gamma)^{-1} \approx m/(\mu \Delta t) \). +

    + +

Why is momentum useful? SGD momentum helps the gradient descent algorithm gain speed in directions with persistent but small gradients even in the presence of stochasticity, while suppressing oscillations in high-curvature directions. This becomes especially important in situations where the landscape is shallow and flat in some directions and narrow and steep in others. It has been argued that first-order methods (with appropriate initial conditions) can perform comparably to more expensive second-order methods, especially in the context of complex deep learning models.

    + +

    These beneficial properties of momentum can sometimes become even more +pronounced by using a slight modification of the classical momentum +algorithm called Nesterov Accelerated Gradient (NAG). +

    + +

    In the NAG algorithm, rather than calculating the gradient at the +current parameters, \( \nabla_\theta E(\boldsymbol{\theta}_t) \), one +calculates the gradient at the expected value of the parameters given +our current momentum, \( \nabla_\theta E(\boldsymbol{\theta}_t +\gamma +\mathbf{v}_{t-1}) \). This yields the NAG update rule +

    + +$$ +\begin{align} +\mathbf{v}_{t}&=\gamma \mathbf{v}_{t-1}+\eta_{t}\nabla_\theta E(\boldsymbol{\theta}_t +\gamma \mathbf{v}_{t-1}) \nonumber \\ +\boldsymbol{\theta}_{t+1}&= \boldsymbol{\theta}_t -\mathbf{v}_{t}. +\tag{3} +\end{align} +$$ + +

    One of the major advantages of NAG is that it allows for the use of a larger learning rate than GDM for the same choice of \( \gamma \).
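A corresponding sketch of the Nesterov-type update on \( E(\theta)=\theta^2 \); in the sketch below the look-ahead point is taken as \( \boldsymbol{\theta}_t-\gamma\mathbf{v}_{t-1} \), i.e. the parameters expected after the momentum step alone given the update convention \( \boldsymbol{\theta}_{t+1}=\boldsymbol{\theta}_t-\mathbf{v}_t \):

import numpy as np

eta, gamma = 0.1, 0.9
theta = np.array([1.0])
v = np.zeros_like(theta)

for t in range(50):
    lookahead = theta - gamma * v    # parameters after the momentum step alone
    grad = 2.0 * lookahead           # gradient of E(theta) = theta^2 at the look-ahead point
    v = gamma * v + eta * grad
    theta = theta - v
print(theta)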

    + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs064.html b/doc/pub/week39/html/._week39-bs064.html new file mode 100644 index 000000000..3b806b349 --- /dev/null +++ b/doc/pub/week39/html/._week39-bs064.html @@ -0,0 +1,482 @@ + + + + + + + +Week 39: Optimization and Gradient Methods + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    Second moment of the gradient

    + +

    In stochastic gradient descent, with and without momentum, we still +have to specify a schedule for tuning the learning rates \( \eta_t \) +as a function of time. As discussed in the context of Newton's +method, this presents a number of dilemmas. The learning rate is +limited by the steepest direction which can change depending on the +current position in the landscape. To circumvent this problem, ideally +our algorithm would keep track of curvature and take large steps in +shallow, flat directions and small steps in steep, narrow directions. +Second-order methods accomplish this by calculating or approximating +the Hessian and normalizing the learning rate by the +curvature. However, this is very computationally expensive for +extremely large models. Ideally, we would like to be able to +adaptively change the step size to match the landscape without paying +the steep computational price of calculating or approximating +Hessians. +

    + +

    Recently, a number of methods have been introduced that accomplish +this by tracking not only the gradient, but also the second moment of +the gradient. These methods include AdaGrad, AdaDelta, Root Mean Squared Propagation (RMS-Prop), and +ADAM. +

    + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs065.html b/doc/pub/week39/html/._week39-bs065.html new file mode 100644 index 000000000..04a243b3c --- /dev/null +++ b/doc/pub/week39/html/._week39-bs065.html @@ -0,0 +1,485 @@ + + + + + + + +Week 39: Optimization and Gradient Methods + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    RMS prop

    + +

    In RMS prop, in addition to keeping a running average of the first +moment of the gradient, we also keep track of the second moment +denoted by \( \mathbf{s}_t=\mathbb{E}[\mathbf{g}_t^2] \). The update rule +for RMS prop is given by +

+ +$$ +\begin{align} +\mathbf{g}_t &= \nabla_\theta E(\boldsymbol{\theta}) +\tag{4}\\ +\mathbf{s}_t &=\beta \mathbf{s}_{t-1} +(1-\beta)\mathbf{g}_t^2 \nonumber \\ +\boldsymbol{\theta}_{t+1}&=\boldsymbol{\theta}_t - \eta_t { \mathbf{g}_t \over \sqrt{\mathbf{s}_t +\epsilon}}, \nonumber +\end{align} +$$ + +

    where \( \beta \) controls the averaging time of the second moment and is +typically taken to be about \( \beta=0.9 \), \( \eta_t \) is a learning rate +typically chosen to be \( 10^{-3} \), and \( \epsilon\sim 10^{-8} \) is a +small regularization constant to prevent divergences. Multiplication +and division by vectors is understood as an element-wise operation. It +is clear from this formula that the learning rate is reduced in +directions where the norm of the gradient is consistently large. This +greatly speeds up the convergence by allowing us to use a larger +learning rate for flat directions. +
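As a minimal numerical illustration of the update (4) (illustrative values only; a complete autograd-based RMSprop implementation appears later in these notes), consider a quadratic cost with one steep and one flat direction:

import numpy as np

# Quadratic cost with a steep (first) and a flat (second) direction
grad_E = lambda theta: np.array([100.0*theta[0], 0.1*theta[1]])

eta, beta, eps = 1e-3, 0.9, 1e-8
theta = np.array([1.0, 1.0])
s = np.zeros(2)

for t in range(2000):
    g = grad_E(theta)
    s = beta*s + (1 - beta)*g**2          # running second moment of the gradient
    theta = theta - eta*g/np.sqrt(s + eps)

print(theta)  # both components approach zero at a comparable rate despite very different curvatures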

    + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs066.html b/doc/pub/week39/html/._week39-bs066.html new file mode 100644 index 000000000..7c9e66ade --- /dev/null +++ b/doc/pub/week39/html/._week39-bs066.html @@ -0,0 +1,511 @@ + + + + + + + +Week 39: Optimization and Gradient Methods + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    ADAM optimizer

    + +

A related algorithm is the ADAM optimizer. In ADAM, we keep a running average of both the first and second moment of the gradient and use this information to adaptively change the learning rate for different parameters. The method is efficient when working with large problems involving lots of data and/or parameters. It is a combination of the gradient descent with momentum algorithm and the RMSprop algorithm discussed above.

    + +

    In addition to keeping a running average of the first and +second moments of the gradient +(i.e. \( \mathbf{m}_t=\mathbb{E}[\mathbf{g}_t] \) and +\( \mathbf{s}_t=\mathbb{E}[\mathbf{g}^2_t] \), respectively), ADAM +performs an additional bias correction to account for the fact that we +are estimating the first two moments of the gradient using a running +average (denoted by the hats in the update rule below). The update +rule for ADAM is given by (where multiplication and division are once +again understood to be element-wise operations below) +

+ +$$ +\begin{align} +\mathbf{g}_t &= \nabla_\theta E(\boldsymbol{\theta}) +\tag{5}\\ +\mathbf{m}_t &= \beta_1 \mathbf{m}_{t-1} + (1-\beta_1) \mathbf{g}_t \nonumber \\ +\mathbf{s}_t &=\beta_2 \mathbf{s}_{t-1} +(1-\beta_2)\mathbf{g}_t^2 \nonumber \\ +\hat{\mathbf{m}}_t&={\mathbf{m}_t \over 1-\beta_1^t} \nonumber \\ +\hat{\mathbf{s}}_t &={\mathbf{s}_t \over 1-\beta_2^t} \nonumber \\ +\boldsymbol{\theta}_{t+1}&=\boldsymbol{\theta}_t - \eta_t { \hat{\mathbf{m}}_t \over \sqrt{\hat{\mathbf{s}}_t} +\epsilon}, +\tag{6} +\end{align} +$$ + +

    where \( \beta_1 \) and \( \beta_2 \) set the memory lifetime of the first and +second moment and are typically taken to be \( 0.9 \) and \( 0.99 \) +respectively, and \( \eta \) and \( \epsilon \) are identical to RMSprop. +

    + +

Like in RMSprop, the effective step size of a parameter depends on the magnitude of its gradient squared. To understand this better, let us rewrite this expression in terms of the variance \( \boldsymbol{\sigma}_t^2 = \hat{\mathbf{s}}_t - (\hat{\mathbf{m}}_t)^2 \). Consider a single parameter \( \theta_t \). The update rule for this parameter is given by

+ +$$ +\Delta \theta_{t+1}= -\eta_t { \hat{m}_t \over \sqrt{\sigma_t^2 + \hat{m}_t^2 }+\epsilon}. +$$ + +
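A compact numerical sketch of the update rules (5)-(6) for a single parameter, with an analytically known gradient and illustrative parameter values (the full autograd-based ADAM code appears later in these notes):

import numpy as np

# One-dimensional toy cost E(theta) = (theta - 3)^2 with gradient 2*(theta - 3)
grad_E = lambda theta: 2.0*(theta - 3.0)

eta, beta1, beta2, eps = 0.001, 0.9, 0.99, 1e-8
theta, m, s = 0.0, 0.0, 0.0

for t in range(1, 5001):
    g = grad_E(theta)
    m = beta1*m + (1 - beta1)*g          # first moment
    s = beta2*s + (1 - beta2)*g**2       # second moment
    m_hat = m/(1 - beta1**t)             # bias corrections
    s_hat = s/(1 - beta2**t)
    theta -= eta*m_hat/(np.sqrt(s_hat) + eps)

print(theta)   # approaches the minimum at theta = 3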

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs067.html b/doc/pub/week39/html/._week39-bs067.html new file mode 100644 index 000000000..91731ecee --- /dev/null +++ b/doc/pub/week39/html/._week39-bs067.html @@ -0,0 +1,463 @@ + + + + + + + +Week 39: Optimization and Gradient Methods + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    Algorithms and codes for Adagrad, RMSprop and Adam

    + +

    The algorithms we have implemented are well described in the text by Goodfellow, Bengio and Courville, chapter 8.

    + +

    The codes which implement these algorithms are discussed after our presentation of automatic differentiation.

    + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs068.html b/doc/pub/week39/html/._week39-bs068.html new file mode 100644 index 000000000..253a5b779 --- /dev/null +++ b/doc/pub/week39/html/._week39-bs068.html @@ -0,0 +1,467 @@ + + + + + + + +Week 39: Optimization and Gradient Methods + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    Practical tips

    + +
      +
    • Randomize the data when making mini-batches. It is always important to randomly shuffle the data when forming mini-batches. Otherwise, the gradient descent method can fit spurious correlations resulting from the order in which data is presented.
    • +
• Transform your inputs. Learning becomes difficult when our landscape has a mixture of steep and flat directions. One simple trick for mitigating this is to standardize the data by subtracting the mean and normalizing the variance of the input variables. Whenever possible, also decorrelate the inputs. To understand why this is helpful, consider the case of linear regression. It is easy to show that for the squared error cost function, the Hessian of the cost function is just the correlation matrix between the inputs. Thus, by standardizing the inputs, we are ensuring that the landscape looks homogeneous in all directions in parameter space. Since most deep networks can be viewed as linear transformations followed by a non-linearity at each layer, we expect this intuition to hold beyond the linear case. A small standardization sketch is given right after this list.
    • +
• Monitor the out-of-sample performance. Always monitor the performance of your model on a validation set (a small portion of the training data that is held out of the training process to serve as a proxy for the test set). If the validation error starts increasing, the model is beginning to overfit and you should terminate the learning process. This early stopping significantly improves performance in many settings.
    • +
• Adaptive optimization methods don't always have good generalization. Recent studies have shown that adaptive methods such as ADAM, RMSprop, and AdaGrad tend to have poorer generalization than SGD or SGD with momentum, particularly in the high-dimensional limit (i.e. when the number of parameters exceeds the number of data points). Although it is not yet fully understood why adaptive methods generalize more poorly in this regime, simpler procedures like properly-tuned SGD may work as well or better in these applications.
    • +
    +
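As a small illustration of the input-transformation tip above (a sketch only; it assumes scikit-learn is available, as elsewhere in the course material):

import numpy as np
from sklearn.preprocessing import StandardScaler

np.random.seed(3155)
# Toy design matrix where the two features have very different scales
X = np.c_[np.random.randn(100), 100.0 + 25.0*np.random.randn(100)]

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)  # subtract the mean and divide by the standard deviation

print("Before:", X.mean(axis=0), X.std(axis=0))
print("After: ", X_scaled.mean(axis=0), X_scaled.std(axis=0))  # approximately 0 and 1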

    Geron's text, see chapter 11, has several interesting discussions.

    + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs069.html b/doc/pub/week39/html/._week39-bs069.html new file mode 100644 index 000000000..34120dda8 --- /dev/null +++ b/doc/pub/week39/html/._week39-bs069.html @@ -0,0 +1,557 @@ + + + + + + + +Week 39: Optimization and Gradient Methods + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    Automatic differentiation

    + +

Automatic differentiation (AD), also called algorithmic differentiation or computational differentiation, is a set of techniques to numerically evaluate the derivative of a function specified by a computer program. AD exploits the fact that every computer program, no matter how complicated, executes a sequence of elementary arithmetic operations (addition, subtraction, multiplication, division, etc.) and elementary functions (exp, log, sin, cos, etc.). By applying the chain rule repeatedly to these operations, derivatives of arbitrary order can be computed automatically, accurately to working precision, and using at most a small constant factor more arithmetic operations than the original program.

    + +

    Automatic differentiation is neither:

    + +
      +
    • Symbolic differentiation, nor
    • +
    • Numerical differentiation (the method of finite differences).
    • +
    +

Symbolic differentiation can lead to inefficient code and faces the difficulty of converting a computer program into a single expression, while numerical differentiation can introduce round-off and cancellation errors in the discretization process.
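To see how the chain rule can be applied mechanically to elementary operations, here is a toy forward-mode sketch based on dual numbers. It is only meant to illustrate the principle; autograd itself relies on reverse-mode AD and far more complete machinery. The function differentiated is the same \( f(x)=\sin(2\pi x + x^2) \) used in the autograd example below.

import math

class Dual:
    # A toy dual number: a value together with the derivative w.r.t. one chosen input
    def __init__(self, val, der=0.0):
        self.val, self.der = val, der
    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.der + other.der)
    __radd__ = __add__
    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # product rule
        return Dual(self.val*other.val, self.der*other.val + self.val*other.der)
    __rmul__ = __mul__

def sin(x):
    # chain rule for an elementary function
    return Dual(math.sin(x.val), math.cos(x.val)*x.der)

def f(x):
    return sin(2*math.pi*x + x*x)

x0 = 0.3
result = f(Dual(x0, 1.0))   # seed the derivative dx/dx = 1
print("value     :", result.val)
print("derivative:", result.der)
print("analytic  :", math.cos(2*math.pi*x0 + x0**2)*(2*math.pi + 2*x0))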

    + +

    Python has tools for so-called automatic differentiation. +Consider the following example +

    +$$ +f(x) = \sin\left(2\pi x + x^2\right) +$$ + +

    which has the following derivative

    +$$ +f'(x) = \cos\left(2\pi x + x^2\right)\left(2\pi + 2x\right) +$$ + +

    Using autograd we have

    + + + +
    +
    +
    +
    +
    +
    import autograd.numpy as np
    +
    +# To do elementwise differentiation:
    +from autograd import elementwise_grad as egrad 
    +
    +# To plot:
    +import matplotlib.pyplot as plt 
    +
    +
    +def f(x):
    +    return np.sin(2*np.pi*x + x**2)
    +
    +def f_grad_analytic(x):
    +    return np.cos(2*np.pi*x + x**2)*(2*np.pi + 2*x)
    +
    +# Do the comparison:
    +x = np.linspace(0,1,1000)
    +
    +f_grad = egrad(f)
    +
    +computed = f_grad(x)
    +analytic = f_grad_analytic(x)
    +
    +plt.title('Derivative computed from Autograd compared with the analytical derivative')
    +plt.plot(x,computed,label='autograd')
    +plt.plot(x,analytic,label='analytic')
    +
    +plt.xlabel('x')
    +plt.ylabel('y')
    +plt.legend()
    +
    +plt.show()
    +
    +print("The max absolute difference is: %g"%(np.max(np.abs(computed - analytic))))
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    + + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs070.html b/doc/pub/week39/html/._week39-bs070.html new file mode 100644 index 000000000..575da0e98 --- /dev/null +++ b/doc/pub/week39/html/._week39-bs070.html @@ -0,0 +1,506 @@ + + + + + + + +Week 39: Optimization and Gradient Methods + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    Using autograd

    + +

    Here we +experiment with what kind of functions Autograd is capable +of finding the gradient of. The following Python functions are just +meant to illustrate what Autograd can do, but please feel free to +experiment with other, possibly more complicated, functions as well. +

    + + + +
    +
    +
    +
    +
    +
    import autograd.numpy as np
    +from autograd import grad
    +
    +def f1(x):
    +    return x**3 + 1
    +
    +f1_grad = grad(f1)
    +
    +# Remember to send in float as argument to the computed gradient from Autograd!
    +a = 1.0
    +
    +# See the evaluated gradient at a using autograd:
    +print("The gradient of f1 evaluated at a = %g using autograd is: %g"%(a,f1_grad(a)))
    +
    +# Compare with the analytical derivative, that is f1'(x) = 3*x**2 
    +grad_analytical = 3*a**2
    +print("The gradient of f1 evaluated at a = %g by finding the analytic expression is: %g"%(a,grad_analytical))
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    + + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs071.html b/doc/pub/week39/html/._week39-bs071.html new file mode 100644 index 000000000..02ace03c4 --- /dev/null +++ b/doc/pub/week39/html/._week39-bs071.html @@ -0,0 +1,521 @@ + + + + + + + +Week 39: Optimization and Gradient Methods + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    Autograd with more complicated functions

    + +

To differentiate with respect to two (or more) arguments of a Python function, Autograd needs to know which variable the function is being differentiated with respect to.

    + + + +
    +
    +
    +
    +
    +
    import autograd.numpy as np
    +from autograd import grad
    +def f2(x1,x2):
    +    return 3*x1**3 + x2*(x1 - 5) + 1
    +
    +# By sending the argument 0, Autograd will compute the derivative w.r.t the first variable, in this case x1
    +f2_grad_x1 = grad(f2,0)
    +
    +# ... and differentiate w.r.t x2 by sending 1 as an additional arugment to grad
    +f2_grad_x2 = grad(f2,1)
    +
    +x1 = 1.0
    +x2 = 3.0 
    +
    +print("Evaluating at x1 = %g, x2 = %g"%(x1,x2))
    +print("-"*30)
    +
    +# Compare with the analytical derivatives:
    +
    +# Derivative of f2 w.r.t x1 is: 9*x1**2 + x2:
    +f2_grad_x1_analytical = 9*x1**2 + x2
    +
    +# Derivative of f2 w.r.t x2 is: x1 - 5:
    +f2_grad_x2_analytical = x1 - 5
    +
    +# See the evaluated derivations:
    +print("The derivative of f2 w.r.t x1: %g"%( f2_grad_x1(x1,x2) ))
+print("The analytical derivative of f2 w.r.t x1: %g"%( f2_grad_x1_analytical ))
    +
    +print()
    +
    +print("The derivative of f2 w.r.t x2: %g"%( f2_grad_x2(x1,x2) ))
+print("The analytical derivative of f2 w.r.t x2: %g"%( f2_grad_x2_analytical ))
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    + +

Note that, used this way, the grad function does not produce the full gradient of the function. The full gradient of a function of two or more variables is a vector in which each element is the derivative of the function with respect to one of the variables.
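One simple way to obtain the full gradient here is to evaluate grad for each argument and stack the results; a small self-contained sketch (reusing the same f2 as above):

import autograd.numpy as np
from autograd import grad

def f2(x1, x2):
    return 3*x1**3 + x2*(x1 - 5) + 1

# Partial derivatives w.r.t. each argument
f2_grad_x1 = grad(f2, 0)
f2_grad_x2 = grad(f2, 1)

def f2_full_gradient(x1, x2):
    # Collect both partial derivatives into one gradient vector
    return np.array([f2_grad_x1(x1, x2), f2_grad_x2(x1, x2)])

x1, x2 = 1.0, 3.0
print("Full gradient of f2:", f2_full_gradient(x1, x2))
# Analytical gradient: (9*x1**2 + x2, x1 - 5)
print("Analytical gradient:", np.array([9*x1**2 + x2, x1 - 5]))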

    + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs072.html b/doc/pub/week39/html/._week39-bs072.html new file mode 100644 index 000000000..4e439771a --- /dev/null +++ b/doc/pub/week39/html/._week39-bs072.html @@ -0,0 +1,506 @@ + + + + + + + +Week 39: Optimization and Gradient Methods + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    More complicated functions using the elements of their arguments directly

    + + + +
    +
    +
    +
    +
    +
    import autograd.numpy as np
    +from autograd import grad
    +def f3(x): # Assumes x is an array of length 5 or higher
    +    return 2*x[0] + 3*x[1] + 5*x[2] + 7*x[3] + 11*x[4]**2
    +
    +f3_grad = grad(f3)
    +
    +x = np.linspace(0,4,5)
    +
    +# Print the computed gradient:
    +print("The computed gradient of f3 is: ", f3_grad(x))
    +
    +# The analytical gradient is: (2, 3, 5, 7, 22*x[4])
    +f3_grad_analytical = np.array([2, 3, 5, 7, 22*x[4]])
    +
    +# Print the analytical gradient:
    +print("The analytical gradient of f3 is: ", f3_grad_analytical)
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    + +

Note that in this case, when sending an array as input argument, the output from Autograd is another array. This is the true gradient of the function, as opposed to the single partial derivatives in the previous example. By using arrays to represent the variables, the output from Autograd might be easier to work with, as it is closer to what one would expect from a gradient-evaluating function.

    + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs073.html b/doc/pub/week39/html/._week39-bs073.html new file mode 100644 index 000000000..210e64a4a --- /dev/null +++ b/doc/pub/week39/html/._week39-bs073.html @@ -0,0 +1,499 @@ + + + + + + + +Week 39: Optimization and Gradient Methods + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    Functions using mathematical functions from Numpy

    + + + +
    +
    +
    +
    +
    +
    import autograd.numpy as np
    +from autograd import grad
    +def f4(x):
    +    return np.sqrt(1+x**2) + np.exp(x) + np.sin(2*np.pi*x)
    +
    +f4_grad = grad(f4)
    +
    +x = 2.7
    +
    +# Print the computed derivative:
    +print("The computed derivative of f4 at x = %g is: %g"%(x,f4_grad(x)))
    +
    +# The analytical derivative is: x/sqrt(1 + x**2) + exp(x) + cos(2*pi*x)*2*pi
    +f4_grad_analytical = x/np.sqrt(1 + x**2) + np.exp(x) + np.cos(2*np.pi*x)*2*np.pi
    +
    +# Print the analytical gradient:
    +print("The analytical gradient of f4 at x = %g is: %g"%(x,f4_grad_analytical))
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    + + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs074.html b/doc/pub/week39/html/._week39-bs074.html new file mode 100644 index 000000000..9979274aa --- /dev/null +++ b/doc/pub/week39/html/._week39-bs074.html @@ -0,0 +1,496 @@ + + + + + + + +Week 39: Optimization and Gradient Methods + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    More autograd

    + + + +
    +
    +
    +
    +
    +
    import autograd.numpy as np
    +from autograd import grad
    +def f5(x):
    +    if x >= 0:
    +        return x**2
    +    else:
    +        return -3*x + 1
    +
    +f5_grad = grad(f5)
    +
    +x = 2.7
    +
    +# Print the computed derivative:
    +print("The computed derivative of f5 at x = %g is: %g"%(x,f5_grad(x)))
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    + + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs075.html b/doc/pub/week39/html/._week39-bs075.html new file mode 100644 index 000000000..9378dc5f4 --- /dev/null +++ b/doc/pub/week39/html/._week39-bs075.html @@ -0,0 +1,537 @@ + + + + + + + +Week 39: Optimization and Gradient Methods + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    And with loops

    + + + +
    +
    +
    +
    +
    +
    import autograd.numpy as np
    +from autograd import grad
    +def f6_for(x):
    +    val = 0
    +    for i in range(10):
    +        val = val + x**i
    +    return val
    +
    +def f6_while(x):
    +    val = 0
    +    i = 0
    +    while i < 10:
    +        val = val + x**i
    +        i = i + 1
    +    return val
    +
    +f6_for_grad = grad(f6_for)
    +f6_while_grad = grad(f6_while)
    +
    +x = 0.5
    +
    +# Print the computed derivaties of f6_for and f6_while
    +print("The computed derivative of f6_for at x = %g is: %g"%(x,f6_for_grad(x)))
    +print("The computed derivative of f6_while at x = %g is: %g"%(x,f6_while_grad(x)))
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    + + + +
    +
    +
    +
    +
    +
    import autograd.numpy as np
    +from autograd import grad
+# Both of the functions are implementations of the sum: sum(x**i) for i = 0, ..., 9
    +# The analytical derivative is: sum(i*x**(i-1)) 
    +f6_grad_analytical = 0
    +for i in range(10):
    +    f6_grad_analytical += i*x**(i-1)
    +
    +print("The analytical derivative of f6 at x = %g is: %g"%(x,f6_grad_analytical))
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    + + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs076.html b/doc/pub/week39/html/._week39-bs076.html new file mode 100644 index 000000000..53b23f1db --- /dev/null +++ b/doc/pub/week39/html/._week39-bs076.html @@ -0,0 +1,509 @@ + + + + + + + +Week 39: Optimization and Gradient Methods + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    Using recursion

    + + +
    +
    +
    +
    +
    +
    import autograd.numpy as np
    +from autograd import grad
    +
    +def f7(n): # Assume that n is an integer
    +    if n == 1 or n == 0:
    +        return 1
    +    else:
    +        return n*f7(n-1)
    +
    +f7_grad = grad(f7)
    +
    +n = 2.0
    +
    +print("The computed derivative of f7 at n = %d is: %g"%(n,f7_grad(n)))
    +
    +# The function f7 is an implementation of the factorial of n.
    +# By using the product rule, one can find that the derivative is:
    +
    +f7_grad_analytical = 0
    +for i in range(int(n)-1):
    +    tmp = 1
    +    for k in range(int(n)-1):
    +        if k != i:
    +            tmp *= (n - k)
    +    f7_grad_analytical += tmp
    +
    +print("The analytical derivative of f7 at n = %d is: %g"%(n,f7_grad_analytical))
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    + +

Note that if n is equal to zero or one, Autograd will give an error message. This message appears when the output is independent of the input.

    + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs077.html b/doc/pub/week39/html/._week39-bs077.html new file mode 100644 index 000000000..7792fbfe1 --- /dev/null +++ b/doc/pub/week39/html/._week39-bs077.html @@ -0,0 +1,496 @@ + + + + + + + +Week 39: Optimization and Gradient Methods + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    Unsupported functions

    +

Autograd supports many features. However, there are some operations that are not (yet) supported by Autograd.

    + +

    Assigning a value to the variable being differentiated with respect to

    + + +
    +
    +
    +
    +
    +
    import autograd.numpy as np
    +from autograd import grad
    +def f8(x): # Assume x is an array
    +    x[2] = 3
    +    return x*2
    +
    +#f8_grad = grad(f8)
    +
    +#x = 8.4
    +
    +#print("The derivative of f8 is:",f8_grad(x))
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    + +

Here, running this code, Autograd tells us that an 'ArrayBox' does not support item assignment. The item assignment happens when the program tries to assign the value 3 to x[2]. Autograd's implementation of the derivative computation does not allow such in-place assignments.
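A common workaround (our own illustrative sketch, not an official autograd recipe) is to build a new array instead of assigning into the input, for instance with np.where:

import autograd.numpy as np
from autograd import elementwise_grad as egrad

def f8_alternative(x):  # Assume x is an array
    # Replace element number 2 by the constant 3 without item assignment
    mask = np.arange(x.shape[0]) == 2
    x_new = np.where(mask, 3.0, x)
    return x_new*2

f8_grad = egrad(f8_alternative)

x = np.array([1.0, 2.0, 3.0, 4.0])
print("The derivative of f8_alternative is:", f8_grad(x))
# Element 2 no longer depends on the input, so its derivative is 0; the others are 2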

    + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs078.html b/doc/pub/week39/html/._week39-bs078.html new file mode 100644 index 000000000..7b4e40c3e --- /dev/null +++ b/doc/pub/week39/html/._week39-bs078.html @@ -0,0 +1,533 @@ + + + + + + + +Week 39: Optimization and Gradient Methods + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    The syntax a.dot(b) when finding the dot product

    + + +
    +
    +
    +
    +
    +
    import autograd.numpy as np
    +from autograd import grad
    +def f9(a): # Assume a is an array with 2 elements
    +    b = np.array([1.0,2.0])
    +    return a.dot(b)
    +
    +#f9_grad = grad(f9)
    +
    +#x = np.array([1.0,0.0])
    +
    +#print("The derivative of f9 is:",f9_grad(x))
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    + +

Here we are told that the 'dot' function does not belong to Autograd's version of a Numpy array. To overcome this, an alternative syntax which also computes the dot product can be used:

    + + + +
    +
    +
    +
    +
    +
    import autograd.numpy as np
    +from autograd import grad
    +def f9_alternative(x): # Assume a is an array with 2 elements
    +    b = np.array([1.0,2.0])
    +    return np.dot(x,b) # The same as x_1*b_1 + x_2*b_2
    +
    +f9_alternative_grad = grad(f9_alternative)
    +
    +x = np.array([3.0,0.0])
    +
    +print("The gradient of f9 is:",f9_alternative_grad(x))
    +
    +# The analytical gradient of the dot product of vectors x and b with two elements (x_1,x_2) and (b_1, b_2) respectively
    +# w.r.t x is (b_1, b_2).
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    + + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs079.html b/doc/pub/week39/html/._week39-bs079.html new file mode 100644 index 000000000..22540ce8a --- /dev/null +++ b/doc/pub/week39/html/._week39-bs079.html @@ -0,0 +1,534 @@ + + + + + + + +Week 39: Optimization and Gradient Methods + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    Using Autograd with OLS

    + +

We conclude the part on optimization by showing how we can write code for linear regression and logistic regression using autograd. The first example shows results with ordinary least squares.

    + + + +
    +
    +
    +
    +
    +
    # Using Autograd to calculate gradients for OLS
    +from random import random, seed
    +import numpy as np
    +import autograd.numpy as np
    +import matplotlib.pyplot as plt
    +from autograd import grad
    +
    +def CostOLS(beta):
    +    return (1.0/n)*np.sum((y-X @ beta)**2)
    +
    +n = 100
    +x = 2*np.random.rand(n,1)
    +y = 4+3*x+np.random.randn(n,1)
    +
    +X = np.c_[np.ones((n,1)), x]
    +XT_X = X.T @ X
    +theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)
    +print("Own inversion")
    +print(theta_linreg)
    +# Hessian matrix
    +H = (2.0/n)* XT_X
    +EigValues, EigVectors = np.linalg.eig(H)
    +print(f"Eigenvalues of Hessian Matrix:{EigValues}")
    +
    +theta = np.random.randn(2,1)
    +eta = 1.0/np.max(EigValues)
    +Niterations = 1000
    +# define the gradient
    +training_gradient = grad(CostOLS)
    +
    +for iter in range(Niterations):
    +    gradients = training_gradient(theta)
    +    theta -= eta*gradients
    +print("theta from own gd")
    +print(theta)
    +
    +xnew = np.array([[0],[2]])
    +Xnew = np.c_[np.ones((2,1)), xnew]
    +ypredict = Xnew.dot(theta)
    +ypredict2 = Xnew.dot(theta_linreg)
    +
    +plt.plot(xnew, ypredict, "r-")
    +plt.plot(xnew, ypredict2, "b-")
    +plt.plot(x, y ,'ro')
    +plt.axis([0,2.0,0, 15.0])
    +plt.xlabel(r'$x$')
    +plt.ylabel(r'$y$')
    +plt.title(r'Random numbers ')
    +plt.show()
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    + + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs080.html b/doc/pub/week39/html/._week39-bs080.html new file mode 100644 index 000000000..184294432 --- /dev/null +++ b/doc/pub/week39/html/._week39-bs080.html @@ -0,0 +1,531 @@ + + + + + + + +Week 39: Optimization and Gradient Methods + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    Same code but now with momentum gradient descent

    + + +
    +
    +
    +
    +
    +
    # Using Autograd to calculate gradients for OLS
    +from random import random, seed
    +import numpy as np
    +import autograd.numpy as np
    +import matplotlib.pyplot as plt
    +from autograd import grad
    +
    +def CostOLS(beta):
    +    return (1.0/n)*np.sum((y-X @ beta)**2)
    +
    +n = 100
    +x = 2*np.random.rand(n,1)
    +y = 4+3*x#+np.random.randn(n,1)
    +
    +X = np.c_[np.ones((n,1)), x]
    +XT_X = X.T @ X
    +theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)
    +print("Own inversion")
    +print(theta_linreg)
    +# Hessian matrix
    +H = (2.0/n)* XT_X
    +EigValues, EigVectors = np.linalg.eig(H)
    +print(f"Eigenvalues of Hessian Matrix:{EigValues}")
    +
    +theta = np.random.randn(2,1)
    +eta = 1.0/np.max(EigValues)
    +Niterations = 30
    +
    +# define the gradient
    +training_gradient = grad(CostOLS)
    +
    +for iter in range(Niterations):
    +    gradients = training_gradient(theta)
    +    theta -= eta*gradients
    +    print(iter,gradients[0],gradients[1])
    +print("theta from own gd")
    +print(theta)
    +
    +# Now improve with momentum gradient descent
    +change = 0.0
    +delta_momentum = 0.3
    +for iter in range(Niterations):
    +    # calculate gradient
    +    gradients = training_gradient(theta)
    +    # calculate update
    +    new_change = eta*gradients+delta_momentum*change
    +    # take a step
    +    theta -= new_change
    +    # save the change
    +    change = new_change
    +    print(iter,gradients[0],gradients[1])
    +print("theta from own gd wth momentum")
    +print(theta)
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    + + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs081.html b/doc/pub/week39/html/._week39-bs081.html new file mode 100644 index 000000000..90ee108a7 --- /dev/null +++ b/doc/pub/week39/html/._week39-bs081.html @@ -0,0 +1,516 @@ + + + + + + + +Week 39: Optimization and Gradient Methods + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    But none of these can compete with Newton's method

    + + + +
    +
    +
    +
    +
    +
    # Using Newton's method
    +from random import random, seed
    +import numpy as np
    +import autograd.numpy as np
    +import matplotlib.pyplot as plt
    +from autograd import grad
    +
    +def CostOLS(beta):
    +    return (1.0/n)*np.sum((y-X @ beta)**2)
    +
    +n = 100
    +x = 2*np.random.rand(n,1)
    +y = 4+3*x+np.random.randn(n,1)
    +
    +X = np.c_[np.ones((n,1)), x]
    +XT_X = X.T @ X
    +beta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)
    +print("Own inversion")
    +print(beta_linreg)
    +# Hessian matrix
    +H = (2.0/n)* XT_X
    +# Note that here the Hessian does not depend on the parameters beta
    +invH = np.linalg.pinv(H)
    +EigValues, EigVectors = np.linalg.eig(H)
    +print(f"Eigenvalues of Hessian Matrix:{EigValues}")
    +
    +beta = np.random.randn(2,1)
    +Niterations = 5
    +
    +# define the gradient
    +training_gradient = grad(CostOLS)
    +
    +for iter in range(Niterations):
    +    gradients = training_gradient(beta)
    +    beta -= invH @ gradients
    +    print(iter,gradients[0],gradients[1])
    +print("beta from own Newton code")
    +print(beta)
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    + + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs082.html b/doc/pub/week39/html/._week39-bs082.html new file mode 100644 index 000000000..10cf81083 --- /dev/null +++ b/doc/pub/week39/html/._week39-bs082.html @@ -0,0 +1,551 @@ + + + + + + + +Week 39: Optimization and Gradient Methods + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    Including Stochastic Gradient Descent with Autograd

    +

    In this code we include the stochastic gradient descent approach discussed above. Note here that we specify which argument we are taking the derivative with respect to when using autograd.

    + + + +
    +
    +
    +
    +
    +
    # Using Autograd to calculate gradients using SGD
    +# OLS example
    +from random import random, seed
    +import numpy as np
    +import autograd.numpy as np
    +import matplotlib.pyplot as plt
    +from autograd import grad
    +
    +# Note change from previous example
    +def CostOLS(y,X,theta):
    +    return np.sum((y-X @ theta)**2)
    +
    +n = 100
    +x = 2*np.random.rand(n,1)
    +y = 4+3*x+np.random.randn(n,1)
    +
    +X = np.c_[np.ones((n,1)), x]
    +XT_X = X.T @ X
    +theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)
    +print("Own inversion")
    +print(theta_linreg)
    +# Hessian matrix
    +H = (2.0/n)* XT_X
    +EigValues, EigVectors = np.linalg.eig(H)
    +print(f"Eigenvalues of Hessian Matrix:{EigValues}")
    +
    +theta = np.random.randn(2,1)
    +eta = 1.0/np.max(EigValues)
    +Niterations = 1000
    +
    +# Note that we request the derivative wrt third argument (theta, 2 here)
    +training_gradient = grad(CostOLS,2)
    +
    +for iter in range(Niterations):
    +    gradients = (1.0/n)*training_gradient(y, X, theta)
    +    theta -= eta*gradients
    +print("theta from own gd")
    +print(theta)
    +
    +xnew = np.array([[0],[2]])
    +Xnew = np.c_[np.ones((2,1)), xnew]
    +ypredict = Xnew.dot(theta)
    +ypredict2 = Xnew.dot(theta_linreg)
    +
    +plt.plot(xnew, ypredict, "r-")
    +plt.plot(xnew, ypredict2, "b-")
    +plt.plot(x, y ,'ro')
    +plt.axis([0,2.0,0, 15.0])
    +plt.xlabel(r'$x$')
    +plt.ylabel(r'$y$')
    +plt.title(r'Random numbers ')
    +plt.show()
    +
    +n_epochs = 50
    +M = 5   #size of each minibatch
    +m = int(n/M) #number of minibatches
    +t0, t1 = 5, 50
    +def learning_schedule(t):
    +    return t0/(t+t1)
    +
    +theta = np.random.randn(2,1)
    +
    +for epoch in range(n_epochs):
    +# Can you figure out a better way of setting up the contributions to each batch?
    +    for i in range(m):
    +        random_index = M*np.random.randint(m)
    +        xi = X[random_index:random_index+M]
    +        yi = y[random_index:random_index+M]
    +        gradients = (1.0/M)*training_gradient(yi, xi, theta)
    +        eta = learning_schedule(epoch*m+i)
    +        theta = theta - eta*gradients
    +print("theta from own sdg")
    +print(theta)
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    + + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs083.html b/doc/pub/week39/html/._week39-bs083.html new file mode 100644 index 000000000..fefcbc005 --- /dev/null +++ b/doc/pub/week39/html/._week39-bs083.html @@ -0,0 +1,542 @@ + + + + + + + +Week 39: Optimization and Gradient Methods + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    Same code but now with momentum gradient descent

    + + +
    +
    +
    +
    +
    +
    # Using Autograd to calculate gradients using SGD
    +# OLS example
    +from random import random, seed
    +import numpy as np
    +import autograd.numpy as np
    +import matplotlib.pyplot as plt
    +from autograd import grad
    +
    +# Note change from previous example
    +def CostOLS(y,X,theta):
    +    return np.sum((y-X @ theta)**2)
    +
    +n = 100
    +x = 2*np.random.rand(n,1)
    +y = 4+3*x+np.random.randn(n,1)
    +
    +X = np.c_[np.ones((n,1)), x]
    +XT_X = X.T @ X
    +theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)
    +print("Own inversion")
    +print(theta_linreg)
    +# Hessian matrix
    +H = (2.0/n)* XT_X
    +EigValues, EigVectors = np.linalg.eig(H)
    +print(f"Eigenvalues of Hessian Matrix:{EigValues}")
    +
    +theta = np.random.randn(2,1)
    +eta = 1.0/np.max(EigValues)
    +Niterations = 100
    +
    +# Note that we request the derivative wrt third argument (theta, 2 here)
    +training_gradient = grad(CostOLS,2)
    +
    +for iter in range(Niterations):
    +    gradients = (1.0/n)*training_gradient(y, X, theta)
    +    theta -= eta*gradients
    +print("theta from own gd")
    +print(theta)
    +
    +
    +n_epochs = 50
    +M = 5   #size of each minibatch
    +m = int(n/M) #number of minibatches
    +t0, t1 = 5, 50
    +def learning_schedule(t):
    +    return t0/(t+t1)
    +
    +theta = np.random.randn(2,1)
    +
    +change = 0.0
    +delta_momentum = 0.3
    +
    +for epoch in range(n_epochs):
    +    for i in range(m):
    +        random_index = M*np.random.randint(m)
    +        xi = X[random_index:random_index+M]
    +        yi = y[random_index:random_index+M]
    +        gradients = (1.0/M)*training_gradient(yi, xi, theta)
    +        eta = learning_schedule(epoch*m+i)
    +        # calculate update
    +        new_change = eta*gradients+delta_momentum*change
    +        # take a step
    +        theta -= new_change
    +        # save the change
    +        change = new_change
    +print("theta from own sdg with momentum")
    +print(theta)
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    + + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs084.html b/doc/pub/week39/html/._week39-bs084.html new file mode 100644 index 000000000..a4e36dd59 --- /dev/null +++ b/doc/pub/week39/html/._week39-bs084.html @@ -0,0 +1,523 @@ + + + + + + + +Week 39: Optimization and Gradient Methods + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    Similar (second order function now) problem but now with AdaGrad

    + + +
    +
    +
    +
    +
    +
    # Using Autograd to calculate gradients using AdaGrad and Stochastic Gradient descent
    +# OLS example
    +from random import random, seed
    +import numpy as np
    +import autograd.numpy as np
    +import matplotlib.pyplot as plt
    +from autograd import grad
    +
    +# Note change from previous example
    +def CostOLS(y,X,theta):
    +    return np.sum((y-X @ theta)**2)
    +
    +n = 1000
    +x = np.random.rand(n,1)
    +y = 2.0+3*x +4*x*x
    +
    +X = np.c_[np.ones((n,1)), x, x*x]
    +XT_X = X.T @ X
    +theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)
    +print("Own inversion")
    +print(theta_linreg)
    +
    +
    +# Note that we request the derivative wrt third argument (theta, 2 here)
    +training_gradient = grad(CostOLS,2)
    +# Define parameters for Stochastic Gradient Descent
    +n_epochs = 50
    +M = 5   #size of each minibatch
    +m = int(n/M) #number of minibatches
    +# Guess for unknown parameters theta
    +theta = np.random.randn(3,1)
    +
    +# Value for learning rate
    +eta = 0.01
    +# Including AdaGrad parameter to avoid possible division by zero
    +delta  = 1e-8
    +for epoch in range(n_epochs):
    +    Giter = 0.0
    +    for i in range(m):
    +        random_index = M*np.random.randint(m)
    +        xi = X[random_index:random_index+M]
    +        yi = y[random_index:random_index+M]
    +        gradients = (1.0/M)*training_gradient(yi, xi, theta)
    +        Giter += gradients*gradients
    +        update = gradients*eta/(delta+np.sqrt(Giter))
    +        theta -= update
    +print("theta from own AdaGrad")
    +print(theta)
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    + +

    Running this code we note an almost perfect agreement with the results from matrix inversion.

    + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs085.html b/doc/pub/week39/html/._week39-bs085.html new file mode 100644 index 000000000..f908adc27 --- /dev/null +++ b/doc/pub/week39/html/._week39-bs085.html @@ -0,0 +1,527 @@ + + + + + + + +Week 39: Optimization and Gradient Methods + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    RMSprop for adaptive learning rate with Stochastic Gradient Descent

    + + +
    +
    +
    +
    +
    +
    # Using Autograd to calculate gradients using RMSprop  and Stochastic Gradient descent
    +# OLS example
    +from random import random, seed
    +import numpy as np
    +import autograd.numpy as np
    +import matplotlib.pyplot as plt
    +from autograd import grad
    +
    +# Note change from previous example
    +def CostOLS(y,X,theta):
    +    return np.sum((y-X @ theta)**2)
    +
    +n = 1000
    +x = np.random.rand(n,1)
    +y = 2.0+3*x +4*x*x# +np.random.randn(n,1)
    +
    +X = np.c_[np.ones((n,1)), x, x*x]
    +XT_X = X.T @ X
    +theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)
    +print("Own inversion")
    +print(theta_linreg)
    +
    +
    +# Note that we request the derivative wrt third argument (theta, 2 here)
    +training_gradient = grad(CostOLS,2)
    +# Define parameters for Stochastic Gradient Descent
    +n_epochs = 50
    +M = 5   #size of each minibatch
    +m = int(n/M) #number of minibatches
    +# Guess for unknown parameters theta
    +theta = np.random.randn(3,1)
    +
    +# Value for learning rate
    +eta = 0.01
    +# Value for parameter rho
    +rho = 0.99
    +# Including AdaGrad parameter to avoid possible division by zero
    +delta  = 1e-8
    +for epoch in range(n_epochs):
    +    Giter = 0.0
    +    for i in range(m):
    +        random_index = M*np.random.randint(m)
    +        xi = X[random_index:random_index+M]
    +        yi = y[random_index:random_index+M]
    +        gradients = (1.0/M)*training_gradient(yi, xi, theta)
    +	# Accumulated gradient
    +	# Scaling with rho the new and the previous results
+        Giter = (rho*Giter+(1-rho)*gradients*gradients)
+        # Element-wise (Hadamard) operations: scale the learning rate by the root of the running second moment
+        update = gradients*eta/(delta+np.sqrt(Giter))
+        theta -= update
    +print("theta from own RMSprop")
    +print(theta)
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    + + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs086.html b/doc/pub/week39/html/._week39-bs086.html new file mode 100644 index 000000000..98e185b59 --- /dev/null +++ b/doc/pub/week39/html/._week39-bs086.html @@ -0,0 +1,532 @@ + + + + + + + +Week 39: Optimization and Gradient Methods + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    And finally ADAM

    + + + +
    +
    +
    +
    +
    +
# Using Autograd to calculate gradients using ADAM and Stochastic Gradient descent
    +# OLS example
    +from random import random, seed
    +import numpy as np
    +import autograd.numpy as np
    +import matplotlib.pyplot as plt
    +from autograd import grad
    +
    +# Note change from previous example
    +def CostOLS(y,X,theta):
    +    return np.sum((y-X @ theta)**2)
    +
    +n = 1000
    +x = np.random.rand(n,1)
    +y = 2.0+3*x +4*x*x# +np.random.randn(n,1)
    +
    +X = np.c_[np.ones((n,1)), x, x*x]
    +XT_X = X.T @ X
    +theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)
    +print("Own inversion")
    +print(theta_linreg)
    +
    +
    +# Note that we request the derivative wrt third argument (theta, 2 here)
    +training_gradient = grad(CostOLS,2)
    +# Define parameters for Stochastic Gradient Descent
    +n_epochs = 50
    +M = 5   #size of each minibatch
    +m = int(n/M) #number of minibatches
    +# Guess for unknown parameters theta
    +theta = np.random.randn(3,1)
    +
    +# Value for learning rate
    +eta = 0.01
    +# Value for parameters beta1 and beta2, see https://arxiv.org/abs/1412.6980
    +beta1 = 0.9
    +beta2 = 0.999
    +# Including AdaGrad parameter to avoid possible division by zero
    +delta  = 1e-7
    +iter = 0
    +for epoch in range(n_epochs):
    +    first_moment = 0.0
    +    second_moment = 0.0
    +    iter += 1
    +    for i in range(m):
    +        random_index = M*np.random.randint(m)
    +        xi = X[random_index:random_index+M]
    +        yi = y[random_index:random_index+M]
    +        gradients = (1.0/M)*training_gradient(yi, xi, theta)
    +        # Computing moments first
    +        first_moment = beta1*first_moment + (1-beta1)*gradients
    +        second_moment = beta2*second_moment+(1-beta2)*gradients*gradients
    +        first_term = first_moment/(1.0-beta1**iter)
    +        second_term = second_moment/(1.0-beta2**iter)
    +	# Scaling with rho the new and the previous results
    +        update = eta*first_term/(np.sqrt(second_term)+delta)
    +        theta -= update
    +print("theta from own ADAM")
    +print(theta)
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    + + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs087.html b/doc/pub/week39/html/._week39-bs087.html new file mode 100644 index 000000000..11f5da1cf --- /dev/null +++ b/doc/pub/week39/html/._week39-bs087.html @@ -0,0 +1,505 @@ + + + + + + + +Week 39: Optimization and Gradient Methods + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    And Logistic Regression

    + + + +
    +
    +
    +
    +
    +
    import autograd.numpy as np
    +from autograd import grad
    +
    +def sigmoid(x):
    +    return 0.5 * (np.tanh(x / 2.) + 1)
    +
    +def logistic_predictions(weights, inputs):
    +    # Outputs probability of a label being true according to logistic model.
    +    return sigmoid(np.dot(inputs, weights))
    +
    +def training_loss(weights):
    +    # Training loss is the negative log-likelihood of the training labels.
    +    preds = logistic_predictions(weights, inputs)
    +    label_probabilities = preds * targets + (1 - preds) * (1 - targets)
    +    return -np.sum(np.log(label_probabilities))
    +
    +# Build a toy dataset.
    +inputs = np.array([[0.52, 1.12,  0.77],
    +                   [0.88, -1.08, 0.15],
    +                   [0.52, 0.06, -1.30],
    +                   [0.74, -2.49, 1.39]])
    +targets = np.array([True, True, False, True])
    +
    +# Define a function that returns gradients of training loss using Autograd.
    +training_gradient_fun = grad(training_loss)
    +
    +# Optimize weights using gradient descent.
    +weights = np.array([0.0, 0.0, 0.0])
    +print("Initial loss:", training_loss(weights))
    +for i in range(100):
    +    weights -= training_gradient_fun(weights) * 0.01
    +
    +print("Trained loss:", training_loss(weights))
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    + + +
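For comparison (a sketch only, assuming scikit-learn is available as elsewhere in the course material), the same toy dataset can be fitted with scikit-learn's LogisticRegression; with weak regularization the weights should behave similarly to the maximum-likelihood fit above.

import numpy as np
from sklearn.linear_model import LogisticRegression

# Same toy dataset as above
inputs = np.array([[0.52, 1.12,  0.77],
                   [0.88, -1.08, 0.15],
                   [0.52, 0.06, -1.30],
                   [0.74, -2.49, 1.39]])
targets = np.array([True, True, False, True])

# fit_intercept=False to mimic the model above (no bias term); large C means weak regularization
clf = LogisticRegression(fit_intercept=False, C=1e5)
clf.fit(inputs, targets)

print("scikit-learn weights:", clf.coef_)
print("Predicted probabilities:", clf.predict_proba(inputs)[:, 1])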

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs088.html b/doc/pub/week39/html/._week39-bs088.html new file mode 100644 index 000000000..d532f9ec2 --- /dev/null +++ b/doc/pub/week39/html/._week39-bs088.html @@ -0,0 +1,488 @@ + + + + + + + +Week 39: Optimization and Gradient Methods + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    Introducing JAX

    + +

    Presently, instead of using autograd, we recommend using JAX

    + +

JAX is Autograd and XLA (Accelerated Linear Algebra) brought together for high-performance numerical computing and machine learning research. It provides composable transformations of Python+NumPy programs: differentiate, vectorize, parallelize, Just-In-Time compile to GPU/TPU, and more.

    + +

Here's a simple example of how you can use JAX to compute the derivative of the logistic function.

    + + + +
    +
    +
    +
    +
    +
    import jax.numpy as jnp
    +from jax import grad, jit, vmap
    +
    +def sum_logistic(x):
    +  return jnp.sum(1.0 / (1.0 + jnp.exp(-x)))
    +
    +x_small = jnp.arange(3.)
    +derivative_fn = grad(sum_logistic)
    +print(derivative_fn(x_small))
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    + + +

    + +
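The example above only uses grad. A brief sketch (illustrative values only) of the other transformations mentioned, jit and vmap, could look like this:

import jax.numpy as jnp
from jax import grad, jit, vmap

def sum_logistic(x):
  return jnp.sum(1.0 / (1.0 + jnp.exp(-x)))

# Just-in-time compile the gradient function for faster repeated evaluation
fast_grad = jit(grad(sum_logistic))
x_small = jnp.arange(3.)
print(fast_grad(x_small))

# Vectorize a scalar function over a batch of inputs with vmap
def logistic(x):
  return 1.0 / (1.0 + jnp.exp(-x))

batched_logistic = vmap(logistic)
print(batched_logistic(jnp.linspace(-1.0, 1.0, 5)))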

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/._week39-bs089.html b/doc/pub/week39/html/._week39-bs089.html new file mode 100644 index 000000000..f49f192c2 --- /dev/null +++ b/doc/pub/week39/html/._week39-bs089.html @@ -0,0 +1,490 @@ + + + + + + + +Week 39: Optimization and Gradient Methods + + + + + + + + + + + + + + + + + + + + +
    +

     

     

     

    + + +

    + + +

    + +

    + +
    + + + + +
    + +
    + + + diff --git a/doc/pub/week39/html/week39-bs.html b/doc/pub/week39/html/week39-bs.html index e57267e6f..85a342f93 100644 --- a/doc/pub/week39/html/week39-bs.html +++ b/doc/pub/week39/html/week39-bs.html @@ -8,8 +8,8 @@ - -Week 39: Optimization and Gradient Methods + +Week 39: Resampling methods and logistic regression @@ -36,248 +36,132 @@ @@ -305,101 +189,58 @@ - Week 39: Optimization and Gradient Methods + Week 39: Resampling methods and logistic regression +
    +

    Cross-validation in brief

    +

For the various values of \( k \) we perform the following steps:

    -
    -Second order condition +
      +

    1. shuffle the dataset randomly.
    2. +

    3. Split the dataset into \( k \) groups.
    4. +

    5. For each unique group: +
        +

      1. Decide which group to use as set for test data
      2. +

      3. Take the remaining groups as a training data set
      4. +

      5. Fit a model on the training set and evaluate it on the test set
      6. +

      7. Retain the evaluation score and discard the model
      8. +

      -

      Assume that \( f \) is twice -differentiable, i.e the Hessian matrix exists at each point in -\( D_f \). Then \( f \) is convex if and only if \( D_f \) is a convex set and its -Hessian is positive semi-definite for all \( x\in D_f \). For a -single-variable function this reduces to \( f''(x) \geq 0 \). Geometrically this means that \( f \) has nonnegative curvature -everywhere. -

      -
    - -

This condition is particularly useful since it gives us a procedure for determining if the function under consideration is convex, apart from using the definition.

    +

  • Summarize the model using the sample of model evaluation scores
  • +
    -

    More on convex functions

    +

    Code Example for Cross-validation and \( k \)-fold Cross-validation

    -

    The next result is of great importance to us and the reason why we are -going on about convex functions. In machine learning we frequently -have to minimize a loss/cost function in order to find the best -parameters for the model we are considering. -

    +

    The code here uses Ridge regression with cross-validation (CV) resampling and \( k \)-fold CV in order to fit a specific polynomial.

    -

    Ideally we want the -global minimum (for high-dimensional models it is hard to know -if we have local or global minimum). However, if the cost/loss function -is convex the following result provides invaluable information: -

    + +
    +
    +
    +
    +
    +
    import numpy as np
    +import matplotlib.pyplot as plt
    +from sklearn.model_selection import KFold
    +from sklearn.linear_model import Ridge
    +from sklearn.model_selection import cross_val_score
    +from sklearn.preprocessing import PolynomialFeatures
     
    -
    -Any minimum is global for convex functions -

    -

    Consider the problem of finding \( x \in \mathbb{R}^n \) such that \( f(x) \) -is minimal, where \( f \) is convex and differentiable. Then, any point -\( x^* \) that satisfies \( \nabla f(x^*) = 0 \) is a global minimum. -

    -
    +# A seed just to ensure that the random numbers are the same for every run. +# Useful for eventual debugging. +np.random.seed(3155) -

    This result means that if we know that the cost/loss function is convex and we are able to find a minimum, we are guaranteed that it is a global minimum.

    -
    +# Generate the data. +nsamples = 100 +x = np.random.randn(nsamples) +y = 3*x**2 + np.random.randn(nsamples) -
    -

    Some simple problems

    +## Cross-validation on Ridge regression using KFold only -
      -

    1. Show that \( f(x)=x^2 \) is convex for \( x \in \mathbb{R} \) using the definition of convexity. Hint: If you re-write the definition, \( f \) is convex if the following holds for all \( x,y \in D_f \) and any \( \lambda \in [0,1] \) $\lambda f(x)+(1-\lambda)f(y)-f(\lambda x + (1-\lambda) y ) \geq 0$.
    2. -

    3. Using the second order condition show that the following functions are convex on the specified domain.
    4. -
        -

      • \( f(x) = e^x \) is convex for \( x \in \mathbb{R} \).
      • -

      • \( g(x) = -\ln(x) \) is convex for \( x \in (0,\infty) \).
      • -
      -

      -

    5. Let \( f(x) = x^2 \) and \( g(x) = e^x \). Show that \( f(g(x)) \) and \( g(f(x)) \) is convex for \( x \in \mathbb{R} \). Also show that if \( f(x) \) is any convex function than \( h(x) = e^{f(x)} \) is convex.
    6. -

    7. A norm is any function that satisfy the following properties
    8. -
        -

      • \( f(\alpha x) = |\alpha| f(x) \) for all \( \alpha \in \mathbb{R} \).
      • -

      • \( f(x+y) \leq f(x) + f(y) \)
      • -

• \( f(x) \geq 0 \) for all \( x \in \mathbb{R}^n \) with equality if and only if \( x = 0 \)
      • -
      -

      -

    -

    -

    Using the definition of convexity, try to show that a function satisfying the properties above is convex (the third condition is not needed to show this).

    -
# Decide the degree of the polynomial to fit
poly = PolynomialFeatures(degree = 6)

# Decide which values of lambda to use
nlambdas = 500
lambdas = np.logspace(-3, 5, nlambdas)

# Initialize a KFold instance
k = 5
kfold = KFold(n_splits = k)

# Perform the cross-validation to estimate the MSE
scores_KFold = np.zeros((nlambdas, k))

i = 0
for lmb in lambdas:
    ridge = Ridge(alpha = lmb)
    j = 0
    for train_inds, test_inds in kfold.split(x):
        xtrain = x[train_inds]
        ytrain = y[train_inds]

        xtest = x[test_inds]
        ytest = y[test_inds]

        Xtrain = poly.fit_transform(xtrain[:, np.newaxis])
        ridge.fit(Xtrain, ytrain[:, np.newaxis])

        Xtest = poly.fit_transform(xtest[:, np.newaxis])
        ypred = ridge.predict(Xtest)

        scores_KFold[i,j] = np.sum((ypred - ytest[:, np.newaxis])**2)/np.size(ypred)

        j += 1
    i += 1

estimated_mse_KFold = np.mean(scores_KFold, axis = 1)

## Cross-validation using cross_val_score from sklearn along with KFold

# kfold is an instance initialized above as:
# kfold = KFold(n_splits = k)

estimated_mse_sklearn = np.zeros(nlambdas)
i = 0
for lmb in lambdas:
    ridge = Ridge(alpha = lmb)

    X = poly.fit_transform(x[:, np.newaxis])
    estimated_mse_folds = cross_val_score(ridge, X, y[:, np.newaxis], scoring='neg_mean_squared_error', cv=kfold)

    # cross_val_score returns an array containing the estimated negative mse for every fold.
    # We take the mean of that array in order to get an estimate of the mse of the model.
    estimated_mse_sklearn[i] = np.mean(-estimated_mse_folds)

    i += 1

## Plot and compare the slightly different ways to perform cross-validation

plt.figure()
plt.plot(np.log10(lambdas), estimated_mse_sklearn, label = 'cross_val_score')
#plt.plot(np.log10(lambdas), estimated_mse_KFold, 'r--', label = 'KFold')
plt.xlabel('log10(lambda)')
plt.ylabel('mse')
plt.legend()
plt.show()

Standard steepest descent

Before we proceed, we would like to discuss the approach called the standard steepest descent (different from the steepest descent discussion above), which again requires us to be able to compute a matrix. It belongs to the class of Conjugate Gradient (CG) methods.

The success of the CG method for finding solutions of non-linear problems is based on the theory of conjugate gradients for linear systems of equations. It belongs to the class of iterative methods for solving problems from linear algebra of the type

$$ \boldsymbol{A}\boldsymbol{x} = \boldsymbol{b}. $$

In the iterative process we end up with a problem like

$$ \boldsymbol{r} = \boldsymbol{b}-\boldsymbol{A}\boldsymbol{x}, $$

where \( \boldsymbol{r} \) is the so-called residual or error in the iterative process.

When we have found the exact solution, \( \boldsymbol{r}=0 \).

Gradient method

The residual is zero when we reach the minimum of the quadratic function

$$ P(\boldsymbol{x})=\frac{1}{2}\boldsymbol{x}^T\boldsymbol{A}\boldsymbol{x} - \boldsymbol{x}^T\boldsymbol{b}, $$

with the constraint that the matrix \( \boldsymbol{A} \) is positive definite and symmetric. This also defines the Hessian, which we want to be positive definite.

Steepest descent method

We denote the initial guess for \( \boldsymbol{x} \) as \( \boldsymbol{x}_0 \). We can assume without loss of generality that

$$ \boldsymbol{x}_0=0, $$

or consider the system

$$ \boldsymbol{A}\boldsymbol{z} = \boldsymbol{b}-\boldsymbol{A}\boldsymbol{x}_0, $$

instead.

Steepest descent method

One can show that the solution \( \boldsymbol{x} \) is also the unique minimizer of the quadratic form

$$ f(\boldsymbol{x}) = \frac{1}{2}\boldsymbol{x}^T\boldsymbol{A}\boldsymbol{x} - \boldsymbol{x}^T \boldsymbol{b}, \quad \boldsymbol{x}\in\mathbb{R}^n. $$

This suggests taking the first basis vector \( \boldsymbol{r}_1 \) (see below for its definition) to be the gradient of \( f \) at \( \boldsymbol{x}=\boldsymbol{x}_0 \), which equals

$$ \boldsymbol{A}\boldsymbol{x}_0-\boldsymbol{b}, $$

and for \( \boldsymbol{x}_0=0 \) it equals \( -\boldsymbol{b} \).

Final expressions

We can compute the residual iteratively as

$$ \boldsymbol{r}_{k+1}=\boldsymbol{b}-\boldsymbol{A}\boldsymbol{x}_{k+1}, $$

which equals

$$ \boldsymbol{b}-\boldsymbol{A}(\boldsymbol{x}_k+\alpha_k\boldsymbol{r}_k), $$

or

$$ (\boldsymbol{b}-\boldsymbol{A}\boldsymbol{x}_k)-\alpha_k\boldsymbol{A}\boldsymbol{r}_k, $$

which gives

$$ \alpha_k = \frac{\boldsymbol{r}_k^T\boldsymbol{r}_k}{\boldsymbol{r}_k^T\boldsymbol{A}\boldsymbol{r}_k}, $$

leading to the iterative scheme

$$ \boldsymbol{x}_{k+1}=\boldsymbol{x}_k+\alpha_k\boldsymbol{r}_{k}. $$
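As an illustration of the scheme above, here is a minimal sketch of steepest descent for a symmetric positive-definite system \( \boldsymbol{A}\boldsymbol{x}=\boldsymbol{b} \), assuming only NumPy (the small matrix and right-hand side are made up for the example):

import numpy as np

def steepest_descent(A, b, x0=None, tol=1e-10, max_iter=1000):
    # Solves Ax = b for symmetric positive-definite A using
    # x_{k+1} = x_k + alpha_k r_k with alpha_k = r^T r / (r^T A r).
    x = np.zeros_like(b) if x0 is None else x0.copy()
    for _ in range(max_iter):
        r = b - A @ x                      # residual
        rr = r @ r
        if np.sqrt(rr) < tol:
            break
        alpha = rr / (r @ (A @ r))         # exact line search along r
        x = x + alpha * r
    return x

A = np.array([[3.0, 1.0], [1.0, 2.0]])     # symmetric positive definite
b = np.array([1.0, 2.0])
x = steepest_descent(A, b)
print(x, np.linalg.solve(A, b))            # the two should agree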

    Steepest descent example

    +


    Steepest descent example

    -
    import numpy as np
    -import numpy.linalg as la
    -
    -import scipy.optimize as sopt
    -
    -import matplotlib.pyplot as pt
    -from mpl_toolkits.mplot3d import axes3d
    -
    -def f(x):
    -    return x[0]**2 + 3.0*x[1]**2
    -
    -def df(x):
    -    return np.array([2*x[0], 6*x[1]])
    -
    -fig = pt.figure()
    -ax = fig.add_subplot(projection = '3d')
    -
    -xmesh, ymesh = np.mgrid[-3:3:50j,-3:3:50j]
    -fmesh = f(np.array([xmesh, ymesh]))
    -ax.plot_surface(xmesh, ymesh, fmesh)
    +  
More examples on bootstrap and cross-validation and errors

# Common imports
    +import os
    +import numpy as np
    +import pandas as pd
    +import matplotlib.pyplot as plt
    +from sklearn.linear_model import LinearRegression, Ridge, Lasso
    +from sklearn.model_selection import train_test_split
    +from sklearn.utils import resample
    +from sklearn.metrics import mean_squared_error
    +# Where to save the figures and data files
    +PROJECT_ROOT_DIR = "Results"
    +FIGURE_ID = "Results/FigureFiles"
    +DATA_ID = "DataFiles/"
    +
    +if not os.path.exists(PROJECT_ROOT_DIR):
    +    os.mkdir(PROJECT_ROOT_DIR)
    +
    +if not os.path.exists(FIGURE_ID):
    +    os.makedirs(FIGURE_ID)
    +
    +if not os.path.exists(DATA_ID):
    +    os.makedirs(DATA_ID)
    +
    +def image_path(fig_id):
    +    return os.path.join(FIGURE_ID, fig_id)
    +
    +def data_path(dat_id):
    +    return os.path.join(DATA_ID, dat_id)
    +
    +def save_fig(fig_id):
    +    plt.savefig(image_path(fig_id) + ".png", format='png')
    +
    +infile = open(data_path("EoS.csv"),'r')
    +
    +# Read the EoS data as  csv file and organize the data into two arrays with density and energies
    +EoS = pd.read_csv(infile, names=('Density', 'Energy'))
    +EoS['Energy'] = pd.to_numeric(EoS['Energy'], errors='coerce')
    +EoS = EoS.dropna()
    +Energies = EoS['Energy']
    +Density = EoS['Density']
    +#  The design matrix now as function of various polytrops
    +
    +Maxpolydegree = 30
    +X = np.zeros((len(Density),Maxpolydegree))
    +X[:,0] = 1.0
    +testerror = np.zeros(Maxpolydegree)
    +trainingerror = np.zeros(Maxpolydegree)
    +polynomial = np.zeros(Maxpolydegree)
    +
    +trials = 100
    +for polydegree in range(1, Maxpolydegree):
    +    polynomial[polydegree] = polydegree
    +    for degree in range(polydegree):
    +        X[:,degree] = Density**(degree/3.0)
    +
    +# loop over trials in order to estimate the expectation value of the MSE
    +    testerror[polydegree] = 0.0
    +    trainingerror[polydegree] = 0.0
    +    for samples in range(trials):
    +        x_train, x_test, y_train, y_test = train_test_split(X, Energies, test_size=0.2)
    +        model = LinearRegression(fit_intercept=False).fit(x_train, y_train)
    +        ypred = model.predict(x_train)
    +        ytilde = model.predict(x_test)
    +        testerror[polydegree] += mean_squared_error(y_test, ytilde)
    +        trainingerror[polydegree] += mean_squared_error(y_train, ypred) 
    +
    +    testerror[polydegree] /= trials
    +    trainingerror[polydegree] /= trials
    +    print("Degree of polynomial: %3d"% polynomial[polydegree])
    +    print("Mean squared error on training data: %.8f" % trainingerror[polydegree])
    +    print("Mean squared error on test data: %.8f" % testerror[polydegree])
    +
    +plt.plot(polynomial, np.log10(trainingerror), label='Training Error')
    +plt.plot(polynomial, np.log10(testerror), label='Test Error')
    +plt.xlabel('Polynomial degree')
    +plt.ylabel('log10[MSE]')
    +plt.legend()
    +plt.show()
     
    @@ -908,7 +935,13 @@

    Steepest descent example

    -

And then as contour plot

    +

Note that we kept the intercept column in the fitting here. This means that we need to set fit_intercept to False in the call to the Scikit-Learn function. Alternatively, we could have set up the design matrix \( X \) without the first column of ones.

    +
    + +
    +

    The same example but now with cross-validation

    + +

In this example we keep the intercept column again but add cross-validation in order to estimate the best possible value of the mean squared error.

    @@ -916,9 +949,73 @@

    Steepest descent example

    -
    pt.axis("equal")
    -pt.contour(xmesh, ymesh, fmesh)
    -guesses = [np.array([2, 2./5])]
    +  
    # Common imports
    +import os
    +import numpy as np
    +import pandas as pd
    +import matplotlib.pyplot as plt
    +from sklearn.linear_model import LinearRegression, Ridge, Lasso
    +from sklearn.metrics import mean_squared_error
    +from sklearn.model_selection import KFold
    +from sklearn.model_selection import cross_val_score
    +
    +
    +# Where to save the figures and data files
    +PROJECT_ROOT_DIR = "Results"
    +FIGURE_ID = "Results/FigureFiles"
    +DATA_ID = "DataFiles/"
    +
    +if not os.path.exists(PROJECT_ROOT_DIR):
    +    os.mkdir(PROJECT_ROOT_DIR)
    +
    +if not os.path.exists(FIGURE_ID):
    +    os.makedirs(FIGURE_ID)
    +
    +if not os.path.exists(DATA_ID):
    +    os.makedirs(DATA_ID)
    +
    +def image_path(fig_id):
    +    return os.path.join(FIGURE_ID, fig_id)
    +
    +def data_path(dat_id):
    +    return os.path.join(DATA_ID, dat_id)
    +
    +def save_fig(fig_id):
    +    plt.savefig(image_path(fig_id) + ".png", format='png')
    +
    +infile = open(data_path("EoS.csv"),'r')
    +
    +# Read the EoS data as  csv file and organize the data into two arrays with density and energies
    +EoS = pd.read_csv(infile, names=('Density', 'Energy'))
    +EoS['Energy'] = pd.to_numeric(EoS['Energy'], errors='coerce')
    +EoS = EoS.dropna()
    +Energies = EoS['Energy']
    +Density = EoS['Density']
    +#  The design matrix now as function of various polytrops
    +
    +Maxpolydegree = 30
    +X = np.zeros((len(Density),Maxpolydegree))
    +X[:,0] = 1.0
    +estimated_mse_sklearn = np.zeros(Maxpolydegree)
    +polynomial = np.zeros(Maxpolydegree)
    +k =5
    +kfold = KFold(n_splits = k)
    +
    +for polydegree in range(1, Maxpolydegree):
    +    polynomial[polydegree] = polydegree
    +    for degree in range(polydegree):
    +        X[:,degree] = Density**(degree/3.0)
    +        OLS = LinearRegression(fit_intercept=False)
    +# loop over trials in order to estimate the expectation value of the MSE
    +    estimated_mse_folds = cross_val_score(OLS, X, Energies, scoring='neg_mean_squared_error', cv=kfold)
    +#[:, np.newaxis]
    +    estimated_mse_sklearn[polydegree] = np.mean(-estimated_mse_folds)
    +
    +plt.plot(polynomial, np.log10(estimated_mse_sklearn), label='Test Error')
    +plt.xlabel('Polynomial degree')
    +plt.ylabel('log10[MSE]')
    +plt.legend()
    +plt.show()
     
Logistic Regression

In linear regression our main interest was centered on learning the coefficients of a functional fit (say a polynomial) in order to be able to predict the response of a continuous variable on some unseen data. The fit to the continuous variable \( y_i \) is based on some independent variables \( \boldsymbol{x}_i \). Linear regression resulted in analytical expressions for standard ordinary Least Squares or Ridge regression (in terms of matrices to invert) for several quantities, ranging from the variance and thereby the confidence intervals of the parameters \( \boldsymbol{\theta} \) to the mean squared error. If we can invert the product of the design matrices, linear regression gives us a simple recipe for fitting our data.

Classification problems

Classification problems, however, are concerned with outcomes taking the form of discrete variables (i.e. categories). We may for example, on the basis of DNA sequencing for a number of patients, like to find out which mutations are important for a certain disease; or based on scans of various patients' brains, figure out if there is a tumor or not; or given a specific physical system, we would like to identify its state, say whether it is an ordered or disordered system (a typical situation in solid state physics); or classify the status of a patient, whether she/he has a stroke or not, and many other similar situations.

The most common situation we encounter when we apply logistic regression is that of two possible outcomes, normally denoted as a binary outcome: true or false, positive or negative, success or failure, etc.

Optimization and Deep learning

Logistic regression will also serve as our stepping stone towards neural network algorithms and supervised deep learning. For logistic learning, the minimization of the cost function leads to a non-linear equation in the parameters \( \boldsymbol{\theta} \). The optimization of the problem calls therefore for minimization algorithms. This forms the bottleneck of all machine learning algorithms, namely how to find reliable minima of a multi-variable function. This leads us to the family of gradient descent methods. The latter are the workhorses of basically all modern machine learning algorithms.

We note also that many of the topics discussed here on logistic regression are also commonly used in modern supervised Deep Learning models, as we will see later.

Basics

We consider the case where the outputs/targets, also called the responses or the outcomes, \( y_i \), are discrete and only take values from \( k=0,\dots,K-1 \) (i.e. \( K \) classes).

The goal is to predict the output classes from the design matrix \( \boldsymbol{X}\in\mathbb{R}^{n\times p} \) made of \( n \) samples, each of which carries \( p \) features or predictors. The primary goal is to identify the classes to which new unseen samples belong.

Let us specialize to the case of two classes only, with outputs \( y_i=0 \) and \( y_i=1 \). Our outcomes could represent the status of a credit card user who could default or not on her/his credit card debt. That is

$$ y_i = \begin{bmatrix} 0 & \mathrm{no}\\ 1 & \mathrm{yes} \end{bmatrix}. $$

Linear classifier

Before moving to the logistic model, let us try to use our linear regression model to classify these two outcomes. We could for example fit a linear model to the default case if \( y_i > 0.5 \) and the no default case if \( y_i \leq 0.5 \).

We would then have our weighted linear combination, namely

$$
\begin{equation}
\boldsymbol{y} = \boldsymbol{X}^T\boldsymbol{\theta} + \boldsymbol{\epsilon},
\tag{1}
\end{equation}
$$

where \( \boldsymbol{y} \) is a vector representing the possible outcomes, \( \boldsymbol{X} \) is our \( n\times p \) design matrix and \( \boldsymbol{\theta} \) represents our estimators/predictors.

Some selected properties

The main problem with our function is that it takes values on the entire real axis. In the case of logistic regression, however, the labels \( y_i \) are discrete variables. A typical example is the credit card data discussed below, where we can set the state of defaulting on the debt to \( y_i=1 \) and not defaulting to \( y_i=0 \) for one of the persons in the data set (see the full example below).

One simple way to get a discrete output is to have sign functions that map the output of a linear regressor to values \( \{0,1\} \), \( f(s_i)=\mathrm{sign}(s_i)=1 \) if \( s_i\ge 0 \) and 0 otherwise. We will encounter this model in our first demonstration of neural networks.

Historically it is called the perceptron model in the machine learning literature. This model is extremely simple. However, in many cases it is more favorable to use a "soft" classifier that outputs the probability of a given category. This leads us to the logistic function.

Simple example

The following example on data for coronary heart disease (CHD) as function of age may serve as an illustration. In the code here we read and plot whether a person has had CHD (output = 1) or not (output = 0). This output is plotted against the person's age. Clearly, the figure shows that attempting to make a standard linear regression fit may not be very meaningful.

    -

    Find guesses

    @@ -942,8 +1181,59 @@

    Steepest descent example

    -
    x = guesses[-1]
    -s = -df(x)
    +  
    # Common imports
    +import os
    +import numpy as np
    +import pandas as pd
    +import matplotlib.pyplot as plt
    +from sklearn.linear_model import LinearRegression, Ridge, Lasso
    +from sklearn.model_selection import train_test_split
    +from sklearn.utils import resample
    +from sklearn.metrics import mean_squared_error
    +from IPython.display import display
    +from pylab import plt, mpl
    +mpl.rcParams['font.family'] = 'serif'
    +
    +# Where to save the figures and data files
    +PROJECT_ROOT_DIR = "Results"
    +FIGURE_ID = "Results/FigureFiles"
    +DATA_ID = "DataFiles/"
    +
    +if not os.path.exists(PROJECT_ROOT_DIR):
    +    os.mkdir(PROJECT_ROOT_DIR)
    +
    +if not os.path.exists(FIGURE_ID):
    +    os.makedirs(FIGURE_ID)
    +
    +if not os.path.exists(DATA_ID):
    +    os.makedirs(DATA_ID)
    +
    +def image_path(fig_id):
    +    return os.path.join(FIGURE_ID, fig_id)
    +
    +def data_path(dat_id):
    +    return os.path.join(DATA_ID, dat_id)
    +
    +def save_fig(fig_id):
    +    plt.savefig(image_path(fig_id) + ".png", format='png')
    +
    +infile = open(data_path("chddata.csv"),'r')
    +
    +# Read the chd data as  csv file and organize the data into arrays with age group, age, and chd
    +chd = pd.read_csv(infile, names=('ID', 'Age', 'Agegroup', 'CHD'))
    +chd.columns = ['ID', 'Age', 'Agegroup', 'CHD']
    +output = chd['CHD']
    +age = chd['Age']
    +agegroup = chd['Agegroup']
    +numberID  = chd['ID'] 
    +display(chd)
    +
    +plt.scatter(age, output, marker='o')
    +plt.axis([18,70.0,-0.1, 1.2])
    +plt.xlabel(r'Age')
    +plt.ylabel(r'CHD')
    +plt.title(r'Age distribution and Coronary heart disease')
    +plt.show()
     
    @@ -958,8 +1248,13 @@

    Steepest descent example

    +
    + +
    +

    Plotting the mean value for each group

    + +

    What we could attempt however is to plot the mean value for each group.

    -

    Run it!

    @@ -967,13 +1262,14 @@

    Steepest descent example

    -
    def f1d(alpha):
    -    return f(x + alpha*s)
    -
    -alpha_opt = sopt.golden(f1d)
    -next_guess = x + alpha_opt * s
    -guesses.append(next_guess)
    -print(next_guess)
    +  
    agegroupmean = np.array([0.1, 0.133, 0.250, 0.333, 0.462, 0.625, 0.765, 0.800])
    +group = np.array([1, 2, 3, 4, 5, 6, 7, 8])
    +plt.plot(group, agegroupmean, "r-")
    +plt.axis([0,9,0, 1.0])
    +plt.xlabel(r'Age group')
    +plt.ylabel(r'CHD mean values')
    +plt.title(r'Mean values for each age group')
    +plt.show()
     
    @@ -989,7 +1285,57 @@

    Steepest descent example

    -

    What happened?

    +

We are now trying to find a function \( f(y\vert x) \), that is a function which gives us an expected value for the output \( y \) with a given input \( x \). In standard linear regression with a linear dependence on \( x \), we would write this in terms of our model as

$$ f(y_i\vert x_i)=\theta_0+\theta_1 x_i. $$

This expression implies however that \( f(y_i\vert x_i) \) could take any value from minus infinity to plus infinity. If we however let \( f(y\vert x) \) be represented by the mean value, the above example shows us that we can constrain the function to take values between zero and one, that is we have \( 0 \le f(y_i\vert x_i) \le 1 \). Looking at our last curve we see also that it has an S-shaped form. This leads us to a very popular model for the function \( f \), namely the so-called Sigmoid function or logistic model. We will consider this function as representing the probability for finding a value of \( y_i \) with a given \( x_i \).

The logistic function

Another widely studied model is the so-called perceptron model, which is an example of a "hard classification" model. We will encounter this model when we discuss neural networks as well. Each datapoint is deterministically assigned to a category (i.e. \( y_i=0 \) or \( y_i=1 \)). In many cases, and the coronary heart disease data forms one of many such examples, it is favorable to have a "soft" classifier that outputs the probability of a given category rather than a single value. For example, given \( x_i \), the classifier outputs the probability of being in a category \( k \). Logistic regression is the most common example of a so-called soft classifier. In logistic regression, the probability that a data point \( x_i \) belongs to a category \( y_i=\{0,1\} \) is given by the so-called logit function (or Sigmoid), which is meant to represent the likelihood for a given event,

$$ p(t) = \frac{1}{1+\exp{(-t)}}=\frac{\exp{(t)}}{1+\exp{(t)}}. $$

Note that \( 1-p(t)= p(-t) \).
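As a small numerical check of these relations (a sketch, assuming NumPy):

import numpy as np

def p(t):
    # logistic (sigmoid) function
    return 1.0 / (1.0 + np.exp(-t))

t = np.linspace(-6, 6, 13)
print(np.allclose(1.0 - p(t), p(-t)))                      # 1 - p(t) = p(-t)
print(np.allclose(p(t), np.exp(t) / (1.0 + np.exp(t))))    # the two forms agree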

    +
    + +
    +

Examples of likelihood functions used in logistic regression and neural networks

    + +

    The following code plots the logistic function, the step function and other functions we will encounter from here and on.

    +
    @@ -997,10 +1343,60 @@

    Steepest descent example

    -
    pt.axis("equal")
    -pt.contour(xmesh, ymesh, fmesh, 50)
    -it_array = np.array(guesses)
    -pt.plot(it_array.T[0], it_array.T[1], "x-")
    +  
    """The sigmoid function (or the logistic curve) is a
    +function that takes any real number, z, and outputs a number (0,1).
    +It is useful in neural networks for assigning weights on a relative scale.
    +The value z is the weighted sum of parameters involved in the learning algorithm."""
    +
    +import numpy
    +import matplotlib.pyplot as plt
    +import math as mt
    +
    +z = numpy.arange(-5, 5, .1)
    +sigma_fn = numpy.vectorize(lambda z: 1/(1+numpy.exp(-z)))
    +sigma = sigma_fn(z)
    +
    +fig = plt.figure()
    +ax = fig.add_subplot(111)
    +ax.plot(z, sigma)
    +ax.set_ylim([-0.1, 1.1])
    +ax.set_xlim([-5,5])
    +ax.grid(True)
    +ax.set_xlabel('z')
    +ax.set_title('sigmoid function')
    +
    +plt.show()
    +
    +"""Step Function"""
    +z = numpy.arange(-5, 5, .02)
    +step_fn = numpy.vectorize(lambda z: 1.0 if z >= 0.0 else 0.0)
    +step = step_fn(z)
    +
    +fig = plt.figure()
    +ax = fig.add_subplot(111)
    +ax.plot(z, step)
    +ax.set_ylim([-0.5, 1.5])
    +ax.set_xlim([-5,5])
    +ax.grid(True)
    +ax.set_xlabel('z')
    +ax.set_title('step function')
    +
    +plt.show()
    +
    +"""tanh Function"""
    +z = numpy.arange(-2*mt.pi, 2*mt.pi, 0.1)
    +t = numpy.tanh(z)
    +
    +fig = plt.figure()
    +ax = fig.add_subplot(111)
    +ax.plot(z, t)
    +ax.set_ylim([-1.0, 1.0])
    +ax.set_xlim([-2*mt.pi,2*mt.pi])
    +ax.grid(True)
    +ax.set_xlabel('z')
    +ax.set_title('tanh function')
    +
    +plt.show()
     
Note that we did only one iteration here. We can easily add more iterations using our previous guesses.

Conjugate gradient method

In the CG method we define so-called conjugate directions, and two vectors \( \boldsymbol{s} \) and \( \boldsymbol{t} \) are said to be conjugate if

$$ \boldsymbol{s}^T\boldsymbol{A}\boldsymbol{t}= 0. $$

The philosophy of the CG method is to perform searches in various conjugate directions of our vectors \( \boldsymbol{x}_i \) obeying the above criterion, namely

$$ \boldsymbol{x}_i^T\boldsymbol{A}\boldsymbol{x}_j= 0. $$

Two vectors are conjugate if they are orthogonal with respect to this inner product. Being conjugate is a symmetric relation: if \( \boldsymbol{s} \) is conjugate to \( \boldsymbol{t} \), then \( \boldsymbol{t} \) is conjugate to \( \boldsymbol{s} \).

Conjugate gradient method

An example is given by the eigenvectors of the matrix,

$$ \boldsymbol{v}_i^T\boldsymbol{A}\boldsymbol{v}_j= \lambda\boldsymbol{v}_i^T\boldsymbol{v}_j, $$

which is zero unless \( i=j \).

Conjugate gradient method

Assume now that we have a symmetric positive-definite matrix \( \boldsymbol{A} \) of size \( n\times n \). At each iteration \( i+1 \) we obtain the conjugate direction of a vector

$$ \boldsymbol{x}_{i+1}=\boldsymbol{x}_{i}+\alpha_i\boldsymbol{p}_{i}. $$

We assume that \( \boldsymbol{p}_{i} \) is a sequence of \( n \) mutually conjugate directions. Then the \( \boldsymbol{p}_{i} \) form a basis of \( \mathbb{R}^n \) and we can expand the solution of \( \boldsymbol{A}\boldsymbol{x} = \boldsymbol{b} \) in this basis, namely

$$ \boldsymbol{x} = \sum^{n}_{i=1} \alpha_i \boldsymbol{p}_i. $$

Conjugate gradient method

The coefficients are given by

$$ \boldsymbol{A}\boldsymbol{x} = \sum^{n}_{i=1} \alpha_i \boldsymbol{A} \boldsymbol{p}_i = \boldsymbol{b}. $$

Multiplying with \( \boldsymbol{p}_k^T \) from the left gives

$$ \boldsymbol{p}_k^T \boldsymbol{A}\boldsymbol{x} = \sum^{n}_{i=1} \alpha_i\boldsymbol{p}_k^T \boldsymbol{A}\boldsymbol{p}_i= \boldsymbol{p}_k^T \boldsymbol{b}, $$

and we can define the coefficients \( \alpha_k \) as

$$ \alpha_k = \frac{\boldsymbol{p}_k^T \boldsymbol{b}}{\boldsymbol{p}_k^T \boldsymbol{A} \boldsymbol{p}_k}. $$

Two parameters

We assume now that we have two classes, with \( y_i \) either \( 0 \) or \( 1 \). Furthermore we assume also that we have only two parameters \( \theta \) in our fitting of the Sigmoid function, that is we define probabilities

$$ \begin{align*} p(y_i=1|x_i,\boldsymbol{\theta}) &= \frac{\exp{(\theta_0+\theta_1x_i)}}{1+\exp{(\theta_0+\theta_1x_i)}},\nonumber\\ p(y_i=0|x_i,\boldsymbol{\theta}) &= 1 - p(y_i=1|x_i,\boldsymbol{\theta}), \end{align*} $$

where \( \boldsymbol{\theta} \) are the weights we wish to extract from the data, in our case \( \theta_0 \) and \( \theta_1 \).

Note that we used

$$ p(y_i=0\vert x_i, \boldsymbol{\theta}) = 1-p(y_i=1\vert x_i, \boldsymbol{\theta}). $$

Maximum likelihood

In order to define the total likelihood for all possible outcomes from a dataset \( \mathcal{D}=\{(y_i,x_i)\} \), with the binary labels \( y_i\in\{0,1\} \) and where the data points are drawn independently, we use the so-called Maximum Likelihood Estimation (MLE) principle. We aim thus at maximizing the probability of seeing the observed data. We can then approximate the likelihood in terms of the product of the individual probabilities of a specific outcome \( y_i \), that is

$$ P(\mathcal{D}|\boldsymbol{\theta}) = \prod_{i=1}^n \left[p(y_i=1|x_i,\boldsymbol{\theta})\right]^{y_i}\left[1-p(y_i=1|x_i,\boldsymbol{\theta})\right]^{1-y_i}, $$

from which we obtain the log-likelihood and our cost/loss function

$$ \mathcal{C}(\boldsymbol{\theta}) = \sum_{i=1}^n \left( y_i\log{p(y_i=1|x_i,\boldsymbol{\theta})} + (1-y_i)\log\left[1-p(y_i=1|x_i,\boldsymbol{\theta})\right]\right). $$

The cost function rewritten

Reordering the logarithms, we can rewrite the cost/loss function as

$$ \mathcal{C}(\boldsymbol{\theta}) = \sum_{i=1}^n \left(y_i(\theta_0+\theta_1x_i) -\log{(1+\exp{(\theta_0+\theta_1x_i)})}\right). $$

The maximum likelihood estimator is defined as the set of parameters that maximizes the log-likelihood with respect to \( \boldsymbol{\theta} \). Since the cost (error) function is just the negative log-likelihood, for logistic regression we have

$$ \mathcal{C}(\boldsymbol{\theta})=-\sum_{i=1}^n \left(y_i(\theta_0+\theta_1x_i) -\log{(1+\exp{(\theta_0+\theta_1x_i)})}\right). $$

This equation is known in statistics as the cross entropy. Finally, we note that, just as in linear regression, in practice we often supplement the cross entropy with additional regularization terms, usually \( L_1 \) and \( L_2 \) regularization, as we did for Ridge and Lasso regression.
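For concreteness, here is a minimal sketch that evaluates this cross-entropy cost for the two-parameter model; the data and the parameter values are assumptions made only for illustration:

import numpy as np

def cross_entropy(theta0, theta1, x, y):
    # C(theta) = -sum_i [ y_i*(theta0+theta1*x_i) - log(1 + exp(theta0+theta1*x_i)) ]
    z = theta0 + theta1*x
    return -np.sum(y*z - np.log(1.0 + np.exp(z)))

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([0, 0, 1, 1])
print(cross_entropy(-2.0, 1.0, x, y))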

    -

Minimizing the cross entropy

The cross entropy is a convex function of the weights \( \boldsymbol{\theta} \) and, therefore, any local minimizer is a global minimizer.

Minimizing this cost function with respect to the two parameters \( \theta_0 \) and \( \theta_1 \) we obtain

$$ \frac{\partial \mathcal{C}(\boldsymbol{\theta})}{\partial \theta_0} = -\sum_{i=1}^n \left(y_i -\frac{\exp{(\theta_0+\theta_1x_i)}}{1+\exp{(\theta_0+\theta_1x_i)}}\right), $$

and

$$ \frac{\partial \mathcal{C}(\boldsymbol{\theta})}{\partial \theta_1} = -\sum_{i=1}^n \left(y_ix_i -x_i\frac{\exp{(\theta_0+\theta_1x_i)}}{1+\exp{(\theta_0+\theta_1x_i)}}\right). $$

A more compact expression

Let us now define a vector \( \boldsymbol{y} \) with \( n \) elements \( y_i \), an \( n\times p \) matrix \( \boldsymbol{X} \) which contains the \( x_i \) values and a vector \( \boldsymbol{p} \) of fitted probabilities \( p(y_i\vert x_i,\boldsymbol{\theta}) \). We can then rewrite the first derivative of the cost function in a more compact form as

$$ \frac{\partial \mathcal{C}(\boldsymbol{\theta})}{\partial \boldsymbol{\theta}} = -\boldsymbol{X}^T\left(\boldsymbol{y}-\boldsymbol{p}\right). $$

If we in addition define a diagonal matrix \( \boldsymbol{W} \) with elements \( p(y_i\vert x_i,\boldsymbol{\theta})(1-p(y_i\vert x_i,\boldsymbol{\theta})) \), we can obtain a compact expression of the second derivative as

$$ \frac{\partial^2 \mathcal{C}(\boldsymbol{\theta})}{\partial \boldsymbol{\theta}\partial \boldsymbol{\theta}^T} = \boldsymbol{X}^T\boldsymbol{W}\boldsymbol{X}. $$

Extending to more predictors

Within a binary classification problem, we can easily expand our model to include multiple predictors. Our ratio between likelihoods is then, with \( p \) predictors,

$$ \log{ \frac{p(\boldsymbol{\theta}\boldsymbol{x})}{1-p(\boldsymbol{\theta}\boldsymbol{x})}} = \theta_0+\theta_1x_1+\theta_2x_2+\dots+\theta_px_p. $$

Here we defined \( \boldsymbol{x}=[1,x_1,x_2,\dots,x_p] \) and \( \boldsymbol{\theta}=[\theta_0, \theta_1, \dots, \theta_p] \), leading to

$$ p(\boldsymbol{\theta}\boldsymbol{x})=\frac{ \exp{(\theta_0+\theta_1x_1+\theta_2x_2+\dots+\theta_px_p)}}{1+\exp{(\theta_0+\theta_1x_1+\theta_2x_2+\dots+\theta_px_p)}}. $$

Including more classes

Till now we have mainly focused on two classes, the so-called binary system. Suppose we wish to extend to \( K \) classes. Let us for the sake of simplicity assume we have only two predictors. We then have the following model,

$$ \log{\frac{p(C=1\vert x)}{p(K\vert x)}} = \theta_{10}+\theta_{11}x_1, $$

and

$$ \log{\frac{p(C=2\vert x)}{p(K\vert x)}} = \theta_{20}+\theta_{21}x_1, $$

and so on till the class \( C=K-1 \),

$$ \log{\frac{p(C=K-1\vert x)}{p(K\vert x)}} = \theta_{(K-1)0}+\theta_{(K-1)1}x_1, $$

and the model is specified in terms of \( K-1 \) so-called log-odds or logit transformations.

Conjugate gradient method and iterations

If we choose the conjugate vectors \( \boldsymbol{p}_k \) carefully, then we may not need all of them to obtain a good approximation to the solution \( \boldsymbol{x} \). We want to regard the conjugate gradient method as an iterative method. This allows us to solve systems where \( n \) is so large that the direct method would take too much time.

We denote the initial guess for \( \boldsymbol{x} \) as \( \boldsymbol{x}_0 \). We can assume without loss of generality that

$$ \boldsymbol{x}_0=0, $$

or consider the system

$$ \boldsymbol{A}\boldsymbol{z} = \boldsymbol{b}-\boldsymbol{A}\boldsymbol{x}_0, $$

instead.

Conjugate gradient method

One can show that the solution \( \boldsymbol{x} \) is also the unique minimizer of the quadratic form

$$ f(\boldsymbol{x}) = \frac{1}{2}\boldsymbol{x}^T\boldsymbol{A}\boldsymbol{x} - \boldsymbol{x}^T \boldsymbol{b}, \quad \boldsymbol{x}\in\mathbb{R}^n. $$

This suggests taking the first basis vector \( \boldsymbol{p}_1 \) to be the gradient of \( f \) at \( \boldsymbol{x}=\boldsymbol{x}_0 \), which equals

$$ \boldsymbol{A}\boldsymbol{x}_0-\boldsymbol{b}, $$

and for \( \boldsymbol{x}_0=0 \) it equals \( -\boldsymbol{b} \). The other vectors in the basis will be conjugate to the gradient, hence the name conjugate gradient method.

Conjugate gradient method

Let \( \boldsymbol{r}_k \) be the residual at the \( k \)-th step:

$$ \boldsymbol{r}_k=\boldsymbol{b}-\boldsymbol{A}\boldsymbol{x}_k. $$

Note that \( \boldsymbol{r}_k \) is the negative gradient of \( f \) at \( \boldsymbol{x}=\boldsymbol{x}_k \), so the gradient descent method would be to move in the direction \( \boldsymbol{r}_k \). Here, we insist that the directions \( \boldsymbol{p}_k \) are conjugate to each other, so we take the direction closest to the gradient \( \boldsymbol{r}_k \) under the conjugacy constraint. This gives the following expression

$$ \boldsymbol{p}_{k+1}=\boldsymbol{r}_k-\frac{\boldsymbol{p}_k^T \boldsymbol{A}\boldsymbol{r}_k}{\boldsymbol{p}_k^T\boldsymbol{A}\boldsymbol{p}_k} \boldsymbol{p}_k. $$

Conjugate gradient method

We can also compute the residual iteratively as

$$ \boldsymbol{r}_{k+1}=\boldsymbol{b}-\boldsymbol{A}\boldsymbol{x}_{k+1}, $$

which equals

$$ \boldsymbol{b}-\boldsymbol{A}(\boldsymbol{x}_k+\alpha_k\boldsymbol{p}_k), $$

or

$$ (\boldsymbol{b}-\boldsymbol{A}\boldsymbol{x}_k)-\alpha_k\boldsymbol{A}\boldsymbol{p}_k, $$

which gives

$$ \boldsymbol{r}_{k+1}=\boldsymbol{r}_k-\alpha_k\boldsymbol{A}\boldsymbol{p}_{k}. $$
    -

Revisiting our first homework

We will use linear regression as a case study for the gradient descent methods. Linear regression is a great test case for the gradient descent methods discussed in the lectures since it has several desirable properties such as:

1. An analytical solution (recall homework set 1).

2. The gradient can be computed analytically.

3. The cost function is convex, which guarantees that gradient descent converges for small enough learning rates.

We revisit an example similar to what we had in the first homework set. We had a function of the type

m = 100
x = 2*np.random.rand(m,1)
y = 4+3*x+np.random.randn(m,1)

with \( x_i \) chosen randomly using a uniform distribution. Additionally we have a stochastic noise chosen according to a normal distribution \( \cal {N}(0,1) \). The linear regression model is given by

$$ h_\beta(x) = \boldsymbol{y} = \beta_0 + \beta_1 x, $$

such that

$$ \boldsymbol{y}_i = \beta_0 + \beta_1 x_i. $$

Gradient descent example

Let \( \mathbf{y} = (y_1,\cdots,y_n)^T \), \( \boldsymbol{y} = (\boldsymbol{y}_1,\cdots,\boldsymbol{y}_n)^T \) and \( \beta = (\beta_0, \beta_1)^T \).

It is convenient to write \( \boldsymbol{y} = X\beta \) where \( X \in \mathbb{R}^{100 \times 2} \) is the design matrix given by (we keep the intercept here)

$$ X \equiv \begin{bmatrix} 1 & x_1 \\ \vdots & \vdots \\ 1 & x_{100} \\ \end{bmatrix}. $$

The cost/loss/risk function is given by

$$ C(\beta) = \frac{1}{n}||X\beta-\mathbf{y}||_{2}^{2} = \frac{1}{n}\sum_{i=1}^{100}\left[ (\beta_0 + \beta_1 x_i)^2 - 2 y_i (\beta_0 + \beta_1 x_i) + y_i^2\right], $$

and we want to find \( \beta \) such that \( C(\beta) \) is minimized.

More classes

In our discussion of neural networks we will encounter the above again in terms of a slightly modified function, the so-called Softmax function.

The softmax function is used in various multiclass classification methods, such as multinomial logistic regression (also known as softmax regression), multiclass linear discriminant analysis, naive Bayes classifiers, and artificial neural networks. Specifically, in multinomial logistic regression and linear discriminant analysis, the input to the function is the result of \( K \) distinct linear functions, and the predicted probability for the \( k \)-th class given a sample vector \( \boldsymbol{x} \) and a weighting vector \( \boldsymbol{\theta} \) is (with two predictors):

$$ p(C=k\vert \mathbf {x} )=\frac{\exp{(\theta_{k0}+\theta_{k1}x_1)}}{1+\sum_{l=1}^{K-1}\exp{(\theta_{l0}+\theta_{l1}x_1)}}. $$

It is easy to extend to more predictors. The final class is

$$ p(C=K\vert \mathbf {x} )=\frac{1}{1+\sum_{l=1}^{K-1}\exp{(\theta_{l0}+\theta_{l1}x_1)}}, $$

and they sum to one. Our earlier discussions were all specialized to the case with two classes only. It is easy to see from the above that what we derived earlier is compatible with these equations.

To find the optimal parameters we would typically use a gradient descent method. Newton's method and gradient descent methods are discussed in the material on optimization methods.
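A small sketch of these softmax probabilities for \( K \) classes with a single predictor, following the convention above where the last class carries no parameters (the numbers below are made up for illustration):

import numpy as np

def softmax_probabilities(theta, x1):
    # theta has shape (K-1, 2) with rows (theta_l0, theta_l1); class K is the reference class
    scores = theta[:, 0] + theta[:, 1]*x1          # K-1 linear functions
    denom = 1.0 + np.sum(np.exp(scores))
    p = np.append(np.exp(scores), 1.0) / denom     # last entry is p(C=K|x)
    return p

theta = np.array([[0.5, -1.0], [0.2, 0.3]])        # K = 3 classes, two parameter pairs
p = softmax_probabilities(theta, x1=1.5)
print(p, p.sum())                                  # the probabilities sum to one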

    -

Optimization, the central part of any Machine Learning algorithm

Almost every problem in machine learning and data science starts with a dataset \( X \), a model \( g(\theta) \), which is a function of the parameters \( \theta \), and a cost function \( C(X, g(\theta)) \) that allows us to judge how well the model \( g(\theta) \) explains the observations \( X \). The model is fit by finding the values of \( \theta \) that minimize the cost function. Ideally we would be able to solve for \( \theta \) analytically, however this is not possible in general and we must use some approximative/numerical method to compute the minimum.

Revisiting our Logistic Regression case

In our discussion of Logistic Regression we studied the case of two classes, with \( y_i \) either \( 0 \) or \( 1 \). Furthermore we assumed also that we have only two parameters \( \theta \) in our fitting, that is we defined probabilities

$$ \begin{align*} p(y_i=1|x_i,\boldsymbol{\theta}) &= \frac{\exp{(\theta_0+\theta_1x_i)}}{1+\exp{(\theta_0+\theta_1x_i)}},\nonumber\\ p(y_i=0|x_i,\boldsymbol{\theta}) &= 1 - p(y_i=1|x_i,\boldsymbol{\theta}), \end{align*} $$

where \( \boldsymbol{\theta} \) are the weights we wish to extract from the data, in our case \( \theta_0 \) and \( \theta_1 \).

The equations to solve

Our compact equations used a definition of a vector \( \boldsymbol{y} \) with \( n \) elements \( y_i \), an \( n\times p \) matrix \( \boldsymbol{X} \) which contains the \( x_i \) values and a vector \( \boldsymbol{p} \) of fitted probabilities \( p(y_i\vert x_i,\boldsymbol{\theta}) \). We rewrote the first derivative of the cost function in a more compact form as

$$ \frac{\partial \mathcal{C}(\boldsymbol{\theta})}{\partial \boldsymbol{\theta}} = -\boldsymbol{X}^T\left(\boldsymbol{y}-\boldsymbol{p}\right). $$

If we in addition define a diagonal matrix \( \boldsymbol{W} \) with elements \( p(y_i\vert x_i,\boldsymbol{\theta})(1-p(y_i\vert x_i,\boldsymbol{\theta})) \), we can obtain a compact expression of the second derivative as

$$ \frac{\partial^2 \mathcal{C}(\boldsymbol{\theta})}{\partial \boldsymbol{\theta}\partial \boldsymbol{\theta}^T} = \boldsymbol{X}^T\boldsymbol{W}\boldsymbol{X}. $$

The derivative of the cost/loss function

Computing \( \partial C(\beta) / \partial \beta_0 \) and \( \partial C(\beta) / \partial \beta_1 \) we can show that the gradient can be written as

$$ \nabla_{\beta} C(\beta) = \frac{2}{n}\begin{bmatrix} \sum_{i=1}^{100} \left(\beta_0+\beta_1x_i-y_i\right) \\ \sum_{i=1}^{100}\left( x_i (\beta_0+\beta_1x_i)-y_ix_i\right) \\ \end{bmatrix} = \frac{2}{n}X^T(X\beta - \mathbf{y}), $$

where \( X \) is the design matrix defined above.

The Hessian matrix

The Hessian matrix of \( C(\beta) \) is given by

$$ \boldsymbol{H} \equiv \begin{bmatrix} \frac{\partial^2 C(\beta)}{\partial \beta_0^2} & \frac{\partial^2 C(\beta)}{\partial \beta_0 \partial \beta_1} \\ \frac{\partial^2 C(\beta)}{\partial \beta_0 \partial \beta_1} & \frac{\partial^2 C(\beta)}{\partial \beta_1^2} \\ \end{bmatrix} = \frac{2}{n}X^T X. $$

This result implies that \( C(\beta) \) is a convex function, since the matrix \( X^T X \) is always positive semi-definite.

Simple program

We can now write a program that minimizes \( C(\beta) \) using the gradient descent method with a constant learning rate \( \gamma \) according to

$$ \beta_{k+1} = \beta_k - \gamma \nabla_\beta C(\beta_k), \quad k=0,1,\cdots $$

We can use the expression we computed for the gradient, let \( \beta_0 \) be chosen randomly and let \( \gamma = 0.001 \). We stop iterating when \( ||\nabla_\beta C(\beta_k) || \leq \epsilon = 10^{-8} \). Note that the code below does not include the latter stop criterion.

And finally we can compare our solution for \( \beta \) with the analytic result given by \( \beta= (X^TX)^{-1} X^T \mathbf{y} \).

Gradient Descent Example

Here is our simple example

    -
    -
    -
    -
    -
    # Importing various packages
    -from random import random, seed
    -import numpy as np
    -import matplotlib.pyplot as plt
    -from mpl_toolkits.mplot3d import Axes3D
    -from matplotlib import cm
    -from matplotlib.ticker import LinearLocator, FormatStrFormatter
    -import sys
    -
    -# the number of datapoints
    -n = 100
    -x = 2*np.random.rand(n,1)
    -y = 4+3*x+np.random.randn(n,1)
    -
    -X = np.c_[np.ones((n,1)), x]
    -# Hessian matrix
    -H = (2.0/n)* X.T @ X
    -# Get the eigenvalues
    -EigValues, EigVectors = np.linalg.eig(H)
    -print(f"Eigenvalues of Hessian Matrix:{EigValues}")
    -
    -beta_linreg = np.linalg.inv(X.T @ X) @ X.T @ y
    -print(beta_linreg)
    -beta = np.random.randn(2,1)
    -
    -eta = 1.0/np.max(EigValues)
    -Niterations = 1000
    -
    -for iter in range(Niterations):
    -    gradient = (2.0/n)*X.T @ (X @ beta-y)
    -    beta -= eta*gradient
    -
    -print(beta)
    -xnew = np.array([[0],[2]])
    -xbnew = np.c_[np.ones((2,1)), xnew]
    -ypredict = xbnew.dot(beta)
    -ypredict2 = xbnew.dot(beta_linreg)
    -plt.plot(xnew, ypredict, "r-")
    -plt.plot(xnew, ypredict2, "b-")
    -plt.plot(x, y ,'ro')
    -plt.axis([0,2.0,0, 15.0])
    -plt.xlabel(r'$x$')
    -plt.ylabel(r'$y$')
    -plt.title(r'Gradient descent example')
    -plt.show()
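Here is a minimal sketch of the same gradient descent loop with the stopping criterion \( ||\nabla_\beta C(\beta_k)|| \leq \epsilon \) mentioned above included; it is a self-contained illustration that regenerates similar synthetic data rather than the author's exact run:

import numpy as np

n = 100
x = 2*np.random.rand(n,1)
y = 4+3*x+np.random.randn(n,1)
X = np.c_[np.ones((n,1)), x]

eta = 1.0/np.max(np.linalg.eigvalsh((2.0/n)*X.T @ X))   # step size from the largest Hessian eigenvalue
eps = 1.0e-8
beta = np.random.randn(2,1)
for _ in range(100000):
    gradient = (2.0/n)*X.T @ (X @ beta - y)
    if np.linalg.norm(gradient) <= eps:                 # stop criterion
        break
    beta -= eta*gradient
print(beta)
print(np.linalg.inv(X.T @ X) @ X.T @ y)                 # analytic solution for comparison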
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    +

And a corresponding example using scikit-learn
    # Importing various packages
    -from random import random, seed
    -import numpy as np
    -import matplotlib.pyplot as plt
    -from sklearn.linear_model import SGDRegressor
    -
    -n = 100
    -x = 2*np.random.rand(n,1)
    -y = 4+3*x+np.random.randn(n,1)
    -
    -X = np.c_[np.ones((n,1)), x]
    -beta_linreg = np.linalg.inv(X.T @ X) @ (X.T @ y)
    -print(beta_linreg)
    -sgdreg = SGDRegressor(max_iter = 50, penalty=None, eta0=0.1)
    -sgdreg.fit(x,y.ravel())
    -print(sgdreg.intercept_, sgdreg.coef_)
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    +

This defines what is called the Hessian matrix.

Solving using Newton-Raphson's method

If we can set up these equations, Newton-Raphson's iterative method is normally the method of choice. It requires however that we can compute in an efficient way the matrices that define the first and second derivatives.

    -
    -

Our iterative scheme is then given by

$$ \boldsymbol{\theta}^{\mathrm{new}} = \boldsymbol{\theta}^{\mathrm{old}}-\left(\frac{\partial^2 \mathcal{C}(\boldsymbol{\theta})}{\partial \boldsymbol{\theta}\partial \boldsymbol{\theta}^T}\right)^{-1}_{\boldsymbol{\theta}^{\mathrm{old}}}\times \left(\frac{\partial \mathcal{C}(\boldsymbol{\theta})}{\partial \boldsymbol{\theta}}\right)_{\boldsymbol{\theta}^{\mathrm{old}}}, $$

or in matrix form as

$$ \boldsymbol{\theta}^{\mathrm{new}} = \boldsymbol{\theta}^{\mathrm{old}}-\left(\boldsymbol{X}^T\boldsymbol{W}\boldsymbol{X} \right)^{-1}\times \left(-\boldsymbol{X}^T(\boldsymbol{y}-\boldsymbol{p}) \right)_{\boldsymbol{\theta}^{\mathrm{old}}}. $$

The right-hand side is computed with the old values of \( \theta \).

If we can compute these matrices, in particular the Hessian, the above is often the easiest method to implement.
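Here is a minimal sketch of this Newton-Raphson iteration for the two-parameter logistic model, with the gradient \( -\boldsymbol{X}^T(\boldsymbol{y}-\boldsymbol{p}) \) and the Hessian \( \boldsymbol{X}^T\boldsymbol{W}\boldsymbol{X} \) assembled explicitly (the data below are made up for illustration):

import numpy as np

def newton_logistic(x, y, n_iter=20):
    X = np.c_[np.ones(len(x)), x]                  # design matrix with intercept column
    theta = np.zeros(2)
    for _ in range(n_iter):
        p = 1.0/(1.0 + np.exp(-X @ theta))         # fitted probabilities
        gradient = -X.T @ (y - p)
        W = np.diag(p*(1.0 - p))
        hessian = X.T @ W @ X
        theta = theta - np.linalg.solve(hessian, gradient)
    return theta

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([0, 1, 0, 1, 1, 1])
print(newton_logistic(x, y))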

    -

Gradient descent and Ridge

We have also discussed Ridge regression, where the loss function contains a regularized term given by the \( L_2 \) norm of \( \beta \),

$$ C_{\text{ridge}}(\beta) = \frac{1}{n}||X\beta -\mathbf{y}||^2 + \lambda ||\beta||^2, \quad \lambda \geq 0. $$

In order to minimize \( C_{\text{ridge}}(\beta) \) using GD we adjust the gradient as follows

$$ \nabla_\beta C_{\text{ridge}}(\beta) = \frac{2}{n}\begin{bmatrix} \sum_{i=1}^{100} \left(\beta_0+\beta_1x_i-y_i\right) \\ \sum_{i=1}^{100}\left( x_i (\beta_0+\beta_1x_i)-y_ix_i\right) \\ \end{bmatrix} + 2\lambda\begin{bmatrix} \beta_0 \\ \beta_1\end{bmatrix} = 2 \left(\frac{1}{n}X^T(X\beta - \mathbf{y})+\lambda \beta\right). $$

We can easily extend our program to minimize \( C_{\text{ridge}}(\beta) \) using gradient descent and compare with the analytical solution given by

$$ \beta_{\text{ridge}} = \left(X^T X + n\lambda I_{2 \times 2} \right)^{-1} X^T \mathbf{y}. $$

The Hessian matrix for Ridge Regression

The Hessian matrix of Ridge Regression for our simple example is given by

$$ \boldsymbol{H} \equiv \begin{bmatrix} \frac{\partial^2 C(\beta)}{\partial \beta_0^2} & \frac{\partial^2 C(\beta)}{\partial \beta_0 \partial \beta_1} \\ \frac{\partial^2 C(\beta)}{\partial \beta_0 \partial \beta_1} & \frac{\partial^2 C(\beta)}{\partial \beta_1^2} \\ \end{bmatrix} = \frac{2}{n}X^T X+2\lambda\boldsymbol{I}. $$

This implies that the Hessian matrix is positive definite, hence the stationary point is a minimum. Note that the Ridge cost function is convex, being a sum of two convex functions. Therefore, the stationary point is a global minimum of this function.

Program example for gradient descent with Ridge Regression
    -
    from random import random, seed
    -import numpy as np
    -import matplotlib.pyplot as plt
    -from mpl_toolkits.mplot3d import Axes3D
    -from matplotlib import cm
    -from matplotlib.ticker import LinearLocator, FormatStrFormatter
    -import sys
    -
    -# the number of datapoints
    -n = 100
    -x = 2*np.random.rand(n,1)
    -y = 4+3*x+np.random.randn(n,1)
    -
    -X = np.c_[np.ones((n,1)), x]
    -XT_X = X.T @ X
    -
    -#Ridge parameter lambda
    -lmbda  = 0.001
    -Id = n*lmbda* np.eye(XT_X.shape[0])
    -
    -# Hessian matrix
    -H = (2.0/n)* XT_X+2*lmbda* np.eye(XT_X.shape[0])
    -# Get the eigenvalues
    -EigValues, EigVectors = np.linalg.eig(H)
    -print(f"Eigenvalues of Hessian Matrix:{EigValues}")
    -
    -
    -beta_linreg = np.linalg.inv(XT_X+Id) @ X.T @ y
    -print(beta_linreg)
    -# Start plain gradient descent
    -beta = np.random.randn(2,1)
    -
    -eta = 1.0/np.max(EigValues)
    -Niterations = 100
    -
    -for iter in range(Niterations):
    -    gradients = 2.0/n*X.T @ (X @ (beta)-y)+2*lmbda*beta
    -    beta -= eta*gradients
    -
    -print(beta)
    -ypredict = X @ beta
    -ypredict2 = X @ beta_linreg
    -plt.plot(x, ypredict, "r-")
    -plt.plot(x, ypredict2, "b-")
    -plt.plot(x, y ,'ro')
    -plt.axis([0,2.0,0, 15.0])
    -plt.xlabel(r'$x$')
    -plt.ylabel(r'$y$')
    -plt.title(r'Gradient descent example for Ridge')
    -plt.show()
Example code for Logistic Regression

Here we make a class for Logistic regression. The code uses a simple data set and includes both a binary case and a multiclass case.
    import numpy as np
    +
    +class LogisticRegression:
    +    """
    +    Logistic Regression for binary and multiclass classification.
    +    """
    +    def __init__(self, lr=0.01, epochs=1000, fit_intercept=True, verbose=False):
    +        self.lr = lr                  # Learning rate for gradient descent
    +        self.epochs = epochs          # Number of iterations
    +        self.fit_intercept = fit_intercept  # Whether to add intercept (bias)
    +        self.verbose = verbose        # Print loss during training if True
    +        self.weights = None
    +        self.multi_class = False      # Will be determined at fit time
    +
    +    def _add_intercept(self, X):
    +        """Add intercept term (column of ones) to feature matrix."""
    +        intercept = np.ones((X.shape[0], 1))
    +        return np.concatenate((intercept, X), axis=1)
    +
    +    def _sigmoid(self, z):
    +        """Sigmoid function for binary logistic."""
    +        return 1 / (1 + np.exp(-z))
    +
    +    def _softmax(self, Z):
    +        """Softmax function for multiclass logistic."""
    +        exp_Z = np.exp(Z - np.max(Z, axis=1, keepdims=True))
    +        return exp_Z / np.sum(exp_Z, axis=1, keepdims=True)
    +
    +    def fit(self, X, y):
    +        """
    +        Train the logistic regression model using gradient descent.
    +        Supports binary (sigmoid) and multiclass (softmax) based on y.
    +        """
    +        X = np.array(X)
    +        y = np.array(y)
    +        n_samples, n_features = X.shape
    +
    +        # Add intercept if needed
    +        if self.fit_intercept:
    +            X = self._add_intercept(X)
    +            n_features += 1
    +
    +        # Determine classes and mode (binary vs multiclass)
    +        unique_classes = np.unique(y)
    +        if len(unique_classes) > 2:
    +            self.multi_class = True
    +        else:
    +            self.multi_class = False
    +
    +        # ----- Multiclass case -----
    +        if self.multi_class:
    +            n_classes = len(unique_classes)
    +            # Map original labels to 0...n_classes-1
    +            class_to_index = {c: idx for idx, c in enumerate(unique_classes)}
    +            y_indices = np.array([class_to_index[c] for c in y])
    +            # Initialize weight matrix (features x classes)
    +            self.weights = np.zeros((n_features, n_classes))
    +
    +            # One-hot encode y
    +            Y_onehot = np.zeros((n_samples, n_classes))
    +            Y_onehot[np.arange(n_samples), y_indices] = 1
    +
    +            # Gradient descent
    +            for epoch in range(self.epochs):
    +                scores = X.dot(self.weights)          # Linear scores (n_samples x n_classes)
    +                probs = self._softmax(scores)        # Probabilities (n_samples x n_classes)
    +                # Compute gradient (features x classes)
    +                gradient = (1 / n_samples) * X.T.dot(probs - Y_onehot)
    +                # Update weights
    +                self.weights -= self.lr * gradient
    +
    +                if self.verbose and epoch % 100 == 0:
    +                    # Compute current loss (categorical cross-entropy)
    +                    loss = -np.sum(Y_onehot * np.log(probs + 1e-15)) / n_samples
    +                    print(f"[Epoch {epoch}] Multiclass loss: {loss:.4f}")
    +
    +        # ----- Binary case -----
    +        else:
    +            # Convert y to 0/1 if not already
    +            if not np.array_equal(unique_classes, [0, 1]):
    +                # Map the two classes to 0 and 1
    +                class0, class1 = unique_classes
    +                y_binary = np.where(y == class1, 1, 0)
    +            else:
    +                y_binary = y.copy().astype(int)
    +
    +            # Initialize weights vector (features,)
    +            self.weights = np.zeros(n_features)
    +
    +            # Gradient descent
    +            for epoch in range(self.epochs):
    +                linear_model = X.dot(self.weights)     # (n_samples,)
    +                probs = self._sigmoid(linear_model)   # (n_samples,)
    +                # Gradient for binary cross-entropy
    +                gradient = (1 / n_samples) * X.T.dot(probs - y_binary)
    +                self.weights -= self.lr * gradient
    +
    +                if self.verbose and epoch % 100 == 0:
    +                    # Compute binary cross-entropy loss
    +                    loss = -np.mean(
    +                        y_binary * np.log(probs + 1e-15) + 
    +                        (1 - y_binary) * np.log(1 - probs + 1e-15)
    +                    )
    +                    print(f"[Epoch {epoch}] Binary loss: {loss:.4f}")
    +
    +    def predict_prob(self, X):
    +        """
    +        Compute probability estimates. Returns a 1D array for binary or
    +        a 2D array (n_samples x n_classes) for multiclass.
    +        """
    +        X = np.array(X)
    +        # Add intercept if the model used it
    +        if self.fit_intercept:
    +            X = self._add_intercept(X)
    +        scores = X.dot(self.weights)
    +        if self.multi_class:
    +            return self._softmax(scores)
    +        else:
    +            return self._sigmoid(scores)
    +
    +    def predict(self, X):
    +        """
    +        Predict class labels for samples in X.
    +        Returns integer class labels (0,1 for binary, or 0...C-1 for multiclass).
    +        """
    +        probs = self.predict_prob(X)
    +        if self.multi_class:
    +            # Choose class with highest probability
    +            return np.argmax(probs, axis=1)
    +        else:
    +            # Threshold at 0.5 for binary
    +            return (probs >= 0.5).astype(int)
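As a quick illustration of how the class above might be used (a sketch with made-up data; the interface is as defined in the class):

import numpy as np

# Binary example
X_bin = np.array([[0.5], [1.0], [1.5], [3.0], [3.5], [4.0]])
y_bin = np.array([0, 0, 0, 1, 1, 1])
clf = LogisticRegression(lr=0.1, epochs=5000)
clf.fit(X_bin, y_bin)
print(clf.predict(X_bin))

# Multiclass example with three classes
X_multi = np.array([[0.0], [0.5], [2.0], [2.5], [4.0], [4.5]])
y_multi = np.array([0, 0, 1, 1, 2, 2])
clf_multi = LogisticRegression(lr=0.1, epochs=5000)
clf_multi.fit(X_multi, y_multi)
print(clf_multi.predict(X_multi))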
     
The class implements the sigmoid and softmax internally. During fit(), we check the number of classes: if there are more than 2, we set self.multi_class=True and perform multinomial logistic regression, one-hot encoding the target vector and updating a weight matrix with softmax probabilities. Otherwise, we do standard binary logistic regression, converting labels to 0/1 if needed and updating a weight vector. In both cases we use batch gradient descent on the cross-entropy loss (we add a small epsilon 1e-15 to the logarithms for numerical stability). Progress (the loss) can be printed if verbose=True.

    Using gradient descent methods, limitations

    - -
      -

    • Gradient descent (GD) finds local minima of our function. Since the GD algorithm is deterministic, if it converges, it will converge to a local minimum of our cost/loss/risk function. Because in ML we are often dealing with extremely rugged landscapes with many local minima, this can lead to poor performance.
    • -

    • GD is sensitive to initial conditions. One consequence of the local nature of GD is that initial conditions matter. Depending on where one starts, one will end up at a different local minima. Therefore, it is very important to think about how one initializes the training process. This is true for GD as well as more complicated variants of GD.
    • -

    • Gradients are computationally expensive to calculate for large datasets. In many cases in statistics and ML, the cost/loss/risk function is a sum of terms, with one term for each data point. For example, in linear regression, \( E \propto \sum_{i=1}^n (y_i - \mathbf{w}^T\cdot\mathbf{x}_i)^2 \); for logistic regression, the square error is replaced by the cross entropy. To calculate the gradient we have to sum over all \( n \) data points. Doing this at every GD step becomes extremely computationally expensive. An ingenious solution to this, is to calculate the gradients using small subsets of the data called "mini batches". This has the added benefit of introducing stochasticity into our algorithm.
    • -

• GD is very sensitive to the choice of learning rate. If the learning rate is very small, the training process takes an extremely long time. For larger learning rates, GD can diverge and give poor results. Furthermore, depending on what the local landscape looks like, we have to modify the learning rates to ensure convergence. Ideally, we would adaptively choose the learning rates to match the landscape.
    • -

    • GD treats all directions in parameter space uniformly. Another major drawback of GD is that unlike Newton's method, the learning rate for GD is the same in all directions in parameter space. For this reason, the maximum learning rate is set by the behavior of the steepest direction and this can significantly slow down training. Ideally, we would like to take large steps in flat directions and small steps in steep directions. Since we are exploring rugged landscapes where curvatures change, this requires us to keep track of not only the gradient but second derivatives. The ideal scenario would be to calculate the Hessian but this proves to be too computationally expensive.
    • -

    • GD can take exponential time to escape saddle points, even with random initialization. As we mentioned, GD is extremely sensitive to initial condition since it determines the particular local minimum GD would eventually reach. However, even with a good initialization scheme, through the introduction of randomness, GD can still take exponential time to escape saddle points.
    • -

    Improving gradient descent with momentum

    -

We discuss here some simple examples where we introduce what is called 'memory' about previous steps, or what is normally called momentum gradient descent. The mathematics is explained below in connection with Stochastic gradient descent.

    +

The class implements the sigmoid and softmax internally. During fit(), we check the number of classes: if more than 2, we set self.multi_class=True and perform multinomial logistic regression. We one-hot encode the target vector and update a weight matrix with softmax probabilities. Otherwise, we do standard binary logistic regression, converting labels to 0/1 if needed and updating a weight vector. In both cases we use batch gradient descent on the cross-entropy loss (we add a small epsilon 1e-15 to the logs for numerical stability). Progress (the loss) can be printed if verbose=True.
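To make the multiclass branch concrete, here is a minimal, self-contained sketch of a single batch-gradient-descent step with one-hot encoding and softmax probabilities. All names (W, lr, the random data) are illustrative and are not taken from the class above.

import numpy as np

# Illustrative random data: 6 samples, 3 features, 3 classes
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))
y = np.array([0, 2, 1, 2, 0, 1])
n_samples, n_features = X.shape
n_classes = int(y.max()) + 1

# One-hot encode the targets
Y = np.zeros((n_samples, n_classes))
Y[np.arange(n_samples), y] = 1

W = np.zeros((n_features, n_classes))   # weight matrix, one column per class
lr = 0.1                                # learning rate

# Softmax probabilities (shift the scores for numerical stability)
scores = X @ W
scores -= scores.max(axis=1, keepdims=True)
probs = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)

# Gradient of the categorical cross-entropy w.r.t. W, and one update
gradient = X.T @ (probs - Y) / n_samples
W -= lr * gradient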

    @@ -1683,61 +1899,38 @@

    Improving gradient descent wit
    -
    from numpy import asarray
    -from numpy import arange
    -from numpy.random import rand
    -from numpy.random import seed
    -from matplotlib import pyplot
    - 
    -# objective function
    -def objective(x):
    -	return x**2.0
    - 
    -# derivative of objective function
    -def derivative(x):
    -	return x * 2.0
    - 
    -# gradient descent algorithm
    -def gradient_descent(objective, derivative, bounds, n_iter, step_size):
    -	# track all solutions
    -	solutions, scores = list(), list()
    -	# generate an initial point
    -	solution = bounds[:, 0] + rand(len(bounds)) * (bounds[:, 1] - bounds[:, 0])
    -	# run the gradient descent
    -	for i in range(n_iter):
    -		# calculate gradient
    -		gradient = derivative(solution)
    -		# take a step
    -		solution = solution - step_size * gradient
    -		# evaluate candidate point
    -		solution_eval = objective(solution)
    -		# store solution
    -		solutions.append(solution)
    -		scores.append(solution_eval)
    -		# report progress
    -		print('>%d f(%s) = %.5f' % (i, solution, solution_eval))
    -	return [solutions, scores]
    - 
    -# seed the pseudo random number generator
    -seed(4)
    -# define range for input
    -bounds = asarray([[-1.0, 1.0]])
    -# define the total iterations
    -n_iter = 30
    -# define the step size
    -step_size = 0.1
    -# perform the gradient descent search
    -solutions, scores = gradient_descent(objective, derivative, bounds, n_iter, step_size)
    -# sample input range uniformly at 0.1 increments
    -inputs = arange(bounds[0,0], bounds[0,1]+0.1, 0.1)
    -# compute targets
    -results = objective(inputs)
    -# create a line plot of input vs result
    -pyplot.plot(inputs, results)
    -# plot the solutions found
    -pyplot.plot(solutions, scores, '.-', color='red')
    -# show the plot
    -pyplot.show()
    +  
    # Evaluation Metrics
+# We define helper functions for accuracy and cross-entropy loss. Accuracy is the fraction of correct predictions. For the loss, we compute the appropriate cross-entropy:
    +
    +def accuracy_score(y_true, y_pred):
    +    """Accuracy = (# correct predictions) / (total samples)."""
    +    y_true = np.array(y_true)
    +    y_pred = np.array(y_pred)
    +    return np.mean(y_true == y_pred)
    +
    +def binary_cross_entropy(y_true, y_prob):
    +    """
    +    Binary cross-entropy loss.
    +    y_true: true binary labels (0 or 1), y_prob: predicted probabilities for class 1.
    +    """
    +    y_true = np.array(y_true)
    +    y_prob = np.clip(np.array(y_prob), 1e-15, 1-1e-15)
    +    return -np.mean(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))
    +
    +def categorical_cross_entropy(y_true, y_prob):
    +    """
    +    Categorical cross-entropy loss for multiclass.
    +    y_true: true labels (0...C-1), y_prob: array of predicted probabilities (n_samples x C).
    +    """
    +    y_true = np.array(y_true, dtype=int)
    +    y_prob = np.clip(np.array(y_prob), 1e-15, 1-1e-15)
    +    # One-hot encode true labels
    +    n_samples, n_classes = y_prob.shape
    +    one_hot = np.zeros_like(y_prob)
    +    one_hot[np.arange(n_samples), y_true] = 1
    +    # Compute cross-entropy
    +    loss_vec = -np.sum(one_hot * np.log(y_prob), axis=1)
    +    return np.mean(loss_vec)
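As a quick sanity check of the helpers above (a minimal illustration with made-up values, assuming the functions just defined have been executed):

import numpy as np

y_true = np.array([1, 0, 1, 1])
y_pred = np.array([1, 0, 0, 1])
y_prob = np.array([0.9, 0.2, 0.4, 0.8])   # predicted probabilities for class 1

print(accuracy_score(y_true, y_pred))        # 0.75
print(binary_cross_entropy(y_true, y_prob))  # roughly 0.37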
     
    @@ -1752,10 +1945,11 @@

    Improving gradient descent wit

    -

    +

    Synthetic data generation

    -
    -

    Same code but now with momentum gradient descent

    +

Binary classification data: create two Gaussian clusters in 2D, for example class 0 around mean [-2,-2] and class 1 around [2,2]. Multiclass data: create several Gaussian clusters (one per class) spread out in feature space.
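A minimal sketch of such synthetic data with NumPy; the means, spread and sample counts below are illustrative choices, not values prescribed by the notes.

import numpy as np

rng = np.random.default_rng(2025)
n_per_class = 100

# Binary data: two Gaussian clusters in 2D
X0 = rng.normal(loc=[-2, -2], scale=1.0, size=(n_per_class, 2))
X1 = rng.normal(loc=[2, 2], scale=1.0, size=(n_per_class, 2))
X_bin = np.vstack([X0, X1])
y_bin = np.repeat([0, 1], n_per_class)

# Multiclass data: one Gaussian cluster per class
means = [[-4, 0], [0, 4], [4, 0]]
X_multi = np.vstack([rng.normal(loc=m, scale=1.0, size=(n_per_class, 2)) for m in means])
y_multi = np.repeat(np.arange(len(means)), n_per_class)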

    @@ -1764,2083 +1958,84 @@

    Same code but now with
    -
    from numpy import asarray
    -from numpy import arange
    -from numpy.random import rand
    -from numpy.random import seed
    -from matplotlib import pyplot
    - 
    -# objective function
    -def objective(x):
    -	return x**2.0
    - 
    -# derivative of objective function
    -def derivative(x):
    -	return x * 2.0
    - 
    -# gradient descent algorithm
    -def gradient_descent(objective, derivative, bounds, n_iter, step_size, momentum):
    -	# track all solutions
    -	solutions, scores = list(), list()
    -	# generate an initial point
    -	solution = bounds[:, 0] + rand(len(bounds)) * (bounds[:, 1] - bounds[:, 0])
    -	# keep track of the change
    -	change = 0.0
    -	# run the gradient descent
    -	for i in range(n_iter):
    -		# calculate gradient
    -		gradient = derivative(solution)
    -		# calculate update
    -		new_change = step_size * gradient + momentum * change
    -		# take a step
    -		solution = solution - new_change
    -		# save the change
    -		change = new_change
    -		# evaluate candidate point
    -		solution_eval = objective(solution)
    -		# store solution
    -		solutions.append(solution)
    -		scores.append(solution_eval)
    -		# report progress
    -		print('>%d f(%s) = %.5f' % (i, solution, solution_eval))
    -	return [solutions, scores]
    - 
    -# seed the pseudo random number generator
    -seed(4)
    -# define range for input
    -bounds = asarray([[-1.0, 1.0]])
    -# define the total iterations
    -n_iter = 30
    -# define the step size
    -step_size = 0.1
    -# define momentum
    -momentum = 0.3
    -# perform the gradient descent search with momentum
    -solutions, scores = gradient_descent(objective, derivative, bounds, n_iter, step_size, momentum)
    -# sample input range uniformly at 0.1 increments
    -inputs = arange(bounds[0,0], bounds[0,1]+0.1, 0.1)
    -# compute targets
    -results = objective(inputs)
    -# create a line plot of input vs result
    -pyplot.plot(inputs, results)
    -# plot the solutions found
    -pyplot.plot(solutions, scores, '.-', color='red')
    -# show the plot
    -pyplot.show()

    Overview video on Stochastic Gradient Descent

    - -What is Stochastic Gradient Descent -
    - -
    -

    Batches and mini-batches

    - -

    In gradient descent we compute the cost function and its gradient for all data points we have.

    - -

In large-scale applications such as the ILSVRC challenge, the training data can be on the order of millions of examples. Hence, it seems wasteful to compute the full cost function over the entire training set in order to perform only a single parameter update. A very common approach to addressing this challenge is to compute the gradient over batches of the training data. For example, a typical batch could contain some thousand examples from an entire training set of several millions. This batch is then used to perform a parameter update.

    -
    - -
    -

    Stochastic Gradient Descent (SGD)

    - -

In stochastic gradient descent, the extreme case is the one where each batch contains only a single data point.

    - -

    This process is called Stochastic Gradient -Descent (SGD) (or also sometimes on-line gradient descent). This is -relatively less common to see because in practice due to vectorized -code optimizations it can be computationally much more efficient to -evaluate the gradient for 100 examples, than the gradient for one -example 100 times. Even though SGD technically refers to using a -single example at a time to evaluate the gradient, you will hear -people use the term SGD even when referring to mini-batch gradient -descent (i.e. mentions of MGD for “Minibatch Gradient Descent”, or BGD -for “Batch gradient descent” are rare to see), where it is usually -assumed that mini-batches are used. The size of the mini-batch is a -hyperparameter but it is not very common to cross-validate or bootstrap it. It is -usually based on memory constraints (if any), or set to some value, -e.g. 32, 64 or 128. We use powers of 2 in practice because many -vectorized operation implementations work faster when their inputs are -sized in powers of 2. -

    - -

    In our notes with SGD we mean stochastic gradient descent with mini-batches.

    -
    - -
    -

    Stochastic Gradient Descent

    - -

    Stochastic gradient descent (SGD) and variants thereof address some of -the shortcomings of the Gradient descent method discussed above. -

    - -

    The underlying idea of SGD comes from the observation that the cost -function, which we want to minimize, can almost always be written as a -sum over \( n \) data points \( \{\mathbf{x}_i\}_{i=1}^n \), -

    -

     
    -$$ -C(\mathbf{\beta}) = \sum_{i=1}^n c_i(\mathbf{x}_i, -\mathbf{\beta}). -$$ -

     
    -

    - -
    -

    Computation of gradients

    - -

    This in turn means that the gradient can be -computed as a sum over \( i \)-gradients -

    -

     
    -$$ -\nabla_\beta C(\mathbf{\beta}) = \sum_i^n \nabla_\beta c_i(\mathbf{x}_i, -\mathbf{\beta}). -$$ -

     
    - -

    Stochasticity/randomness is introduced by only taking the -gradient on a subset of the data called minibatches. If there are \( n \) -data points and the size of each minibatch is \( M \), there will be \( n/M \) -minibatches. We denote these minibatches by \( B_k \) where -\( k=1,\cdots,n/M \). -

    -
    - -
    -

    SGD example

    -

As an example, suppose we have \( 10 \) data points \( (\mathbf{x}_1,\cdots, \mathbf{x}_{10}) \) and we choose minibatches of size \( M=2 \). We then have \( n/M=5 \) minibatches, each containing two data points. In particular we have \( B_1 = (\mathbf{x}_1,\mathbf{x}_2), \cdots, B_5 = (\mathbf{x}_9,\mathbf{x}_{10}) \). Note that if you choose \( M=n \) you have only a single batch with all data points, and on the other extreme, you may choose \( M=1 \), resulting in a minibatch for each datapoint, i.e. \( B_k = \mathbf{x}_k \).
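A minimal sketch of such a partition into minibatches, shuffling the indices first (the variable names are illustrative):

import numpy as np

n, M = 10, 2                                  # 10 data points, minibatch size M = 2
rng = np.random.default_rng(0)
indices = rng.permutation(n)                  # shuffle the data point indices
batches = np.array_split(indices, n // M)     # n/M = 5 minibatches B_1, ..., B_5
for k, B_k in enumerate(batches, start=1):
    print(f"B_{k}: data points {B_k}")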

    - -

The idea is now to approximate the gradient by replacing the sum over all data points with a sum over the data points in one of the minibatches, picked at random in each gradient descent step

    -

     
    -$$ -\nabla_{\beta} -C(\mathbf{\beta}) = \sum_{i=1}^n \nabla_\beta c_i(\mathbf{x}_i, -\mathbf{\beta}) \rightarrow \sum_{i \in B_k}^n \nabla_\beta -c_i(\mathbf{x}_i, \mathbf{\beta}). -$$ -

     
    -

    - -
    -

    The gradient step

    - -

    Thus a gradient descent step now looks like

    -

     
    -$$ -\beta_{j+1} = \beta_j - \gamma_j \sum_{i \in B_k}^n \nabla_\beta c_i(\mathbf{x}_i, -\mathbf{\beta}) -$$ -

     
    - -

where \( k \) is picked at random with equal probability from \( [1,n/M] \). An iteration over the number of minibatches (\( n/M \)) is commonly referred to as an epoch. Thus it is typical to choose a number of epochs and for each epoch iterate over the number of minibatches, as exemplified in the code below.

    -
    - -
    -

    Simple example code

    import numpy as np 
    -
    -n = 100 #100 datapoints 
    -M = 5   #size of each minibatch
    -m = int(n/M) #number of minibatches
    -n_epochs = 10 #number of epochs
    -
    -j = 0
    -for epoch in range(1,n_epochs+1):
    -    for i in range(m):
    -        k = np.random.randint(m) #Pick the k-th minibatch at random
    -        #Compute the gradient using the data in minibatch Bk
    -        #Compute new suggestion for 
    -        j += 1

Taking the gradient only on a subset of the data has two important benefits. First, it introduces randomness which decreases the chance that our optimization scheme gets stuck in a local minimum. Second, if the size of the minibatches is small relative to the number of datapoints (\( M < n \)), the computation of the gradient is much cheaper since we sum over the datapoints in the \( k \)-th minibatch and not all \( n \) datapoints.

    -
    - -
    -

    When do we stop?

    - -

A natural question is when do we stop the search for a new minimum? One possibility is to compute the full gradient after a given number of epochs and check if its norm is smaller than some threshold, and stop if it is. Note, however, that a vanishing gradient is valid also for local minima, so this would only tell us that we are close to a local/global minimum. Alternatively, we could evaluate the cost function at this point, store the result and continue the search. If the test kicks in at a later stage we can compare the values of the cost function and keep the \( \beta \) that gave the lowest value.
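A minimal sketch of such a stopping test for plain gradient descent on an OLS cost (all names and the threshold are illustrative, not values from the notes):

import numpy as np

rng = np.random.default_rng(0)
n = 100
x = 2 * rng.random((n, 1))
y = 4 + 3 * x + rng.normal(size=(n, 1))
X = np.c_[np.ones((n, 1)), x]

beta = rng.normal(size=(2, 1))
eta, tol, check_every = 0.1, 1e-6, 10
best_beta, best_cost = beta.copy(), np.inf

for epoch in range(1000):
    grad = (2.0 / n) * X.T @ (X @ beta - y)
    beta -= eta * grad
    if epoch % check_every == 0:
        cost = np.mean((X @ beta - y) ** 2)
        if cost < best_cost:                    # store the best beta seen so far
            best_beta, best_cost = beta.copy(), cost
        if np.linalg.norm(grad) < tol:          # stop when the full gradient is small
            break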

    -
    - -
    -

    Slightly different approach

    - -

Another approach is to let the step length \( \gamma_j \) depend on the number of epochs in such a way that it becomes very small after a reasonable time, so that we essentially stop moving. Such approaches are also called scaling, or learning rate scheduling. There are many ways to scale the learning rate; see for example https://towardsdatascience.com/learning-rate-schedules-and-adaptive-learning-rate-methods-for-deep-learning-2c8f433990d1 for a discussion of different scaling functions for the learning rate.

    -
    - -
    -

    Time decay rate

    - -

    As an example, let \( e = 0,1,2,3,\cdots \) denote the current epoch and let \( t_0, t_1 > 0 \) be two fixed numbers. Furthermore, let \( t = e \cdot m + i \) where \( m \) is the number of minibatches and \( i=0,\cdots,m-1 \). Then the function

     
    -$$\gamma_j(t; t_0, t_1) = \frac{t_0}{t+t_1} $$ -

     
    goes to zero as the number of epochs gets large. I.e. we start with a step length \( \gamma_j (0; t_0, t_1) = t_0/t_1 \) which decays in time \( t \).

    - -

    In this way we can fix the number of epochs, compute \( \beta \) and -evaluate the cost function at the end. Repeating the computation will -give a different result since the scheme is random by design. Then we -pick the final \( \beta \) that gives the lowest value of the cost -function. -

    import numpy as np 
    -
    -def step_length(t,t0,t1):
    -    return t0/(t+t1)
    -
    -n = 100 #100 datapoints 
    -M = 5   #size of each minibatch
    -m = int(n/M) #number of minibatches
    -n_epochs = 500 #number of epochs
    -t0 = 1.0
    -t1 = 10
    -
    -gamma_j = t0/t1
    -j = 0
    -for epoch in range(1,n_epochs+1):
    -    for i in range(m):
    -        k = np.random.randint(m) #Pick the k-th minibatch at random
    -        #Compute the gradient using the data in minibatch Bk
    -        #Compute new suggestion for beta
    -        t = epoch*m+i
    -        gamma_j = step_length(t,t0,t1)
    -        j += 1
    -
    -print("gamma_j after %d epochs: %g" % (n_epochs,gamma_j))

    Code with a Number of Minibatches which varies

    - -

    In the code here we vary the number of mini-batches.

    - - -
    -
    -
    -
    -
    -
    # Importing various packages
    -from math import exp, sqrt
    -from random import random, seed
    -import numpy as np
    -import matplotlib.pyplot as plt
    -
    -n = 100
    -x = 2*np.random.rand(n,1)
    -y = 4+3*x+np.random.randn(n,1)
    -
    -X = np.c_[np.ones((n,1)), x]
    -XT_X = X.T @ X
    -theta_linreg = np.linalg.inv(X.T @ X) @ (X.T @ y)
    -print("Own inversion")
    -print(theta_linreg)
    -# Hessian matrix
    -H = (2.0/n)* XT_X
    -EigValues, EigVectors = np.linalg.eig(H)
    -print(f"Eigenvalues of Hessian Matrix:{EigValues}")
    -
    -theta = np.random.randn(2,1)
    -eta = 1.0/np.max(EigValues)
    -Niterations = 1000
    -
    -
    -for iter in range(Niterations):
    -    gradients = 2.0/n*X.T @ ((X @ theta)-y)
    -    theta -= eta*gradients
    -print("theta from own gd")
    -print(theta)
    -
    -xnew = np.array([[0],[2]])
    -Xnew = np.c_[np.ones((2,1)), xnew]
    -ypredict = Xnew.dot(theta)
    -ypredict2 = Xnew.dot(theta_linreg)
    -
    -n_epochs = 50
    -M = 5   #size of each minibatch
    -m = int(n/M) #number of minibatches
    -t0, t1 = 5, 50
    -
    -def learning_schedule(t):
    -    return t0/(t+t1)
    -
    -theta = np.random.randn(2,1)
    -
    -for epoch in range(n_epochs):
    -# Can you figure out a better way of setting up the contributions to each batch?
    -    for i in range(m):
    -        random_index = M*np.random.randint(m)
    -        xi = X[random_index:random_index+M]
    -        yi = y[random_index:random_index+M]
    -        gradients = (2.0/M)* xi.T @ ((xi @ theta)-yi)
    -        eta = learning_schedule(epoch*m+i)
    -        theta = theta - eta*gradients
    -print("theta from own sdg")
    -print(theta)
    -
    -plt.plot(xnew, ypredict, "r-")
    -plt.plot(xnew, ypredict2, "b-")
    -plt.plot(x, y ,'ro')
    -plt.axis([0,2.0,0, 15.0])
    -plt.xlabel(r'$x$')
    -plt.ylabel(r'$y$')
    -plt.title(r'Random numbers ')
    -plt.show()

    Replace or not

    - -

In the above code, we have used sampling with replacement when setting up the mini-batches. The discussion here may be useful.

    -
    - -
    -

    Momentum based GD

    - -

    The stochastic gradient descent (SGD) is almost always used with a -momentum or inertia term that serves as a memory of the direction we -are moving in parameter space. This is typically implemented as -follows -

    - -

     
    -$$ -\begin{align} -\mathbf{v}_{t}&=\gamma \mathbf{v}_{t-1}+\eta_{t}\nabla_\theta E(\boldsymbol{\theta}_t) \nonumber \\ -\boldsymbol{\theta}_{t+1}&= \boldsymbol{\theta}_t -\mathbf{v}_{t}, -\tag{2} -\end{align} -$$ -

     
    - -

    where we have introduced a momentum parameter \( \gamma \), with -\( 0\le\gamma\le 1 \), and for brevity we dropped the explicit notation to -indicate the gradient is to be taken over a different mini-batch at -each step. We call this algorithm gradient descent with momentum -(GDM). From these equations, it is clear that \( \mathbf{v}_t \) is a -running average of recently encountered gradients and -\( (1-\gamma)^{-1} \) sets the characteristic time scale for the memory -used in the averaging procedure. Consistent with this, when -\( \gamma=0 \), this just reduces down to ordinary SGD as discussed -earlier. An equivalent way of writing the updates is -

    - -

     
    -$$ -\Delta \boldsymbol{\theta}_{t+1} = \gamma \Delta \boldsymbol{\theta}_t -\ \eta_{t}\nabla_\theta E(\boldsymbol{\theta}_t), -$$ -

     
    - -

    where we have defined \( \Delta \boldsymbol{\theta}_{t}= \boldsymbol{\theta}_t-\boldsymbol{\theta}_{t-1} \).
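A minimal sketch of the momentum update in equation (2) applied to a simple quadratic cost; the function, learning rate and momentum value are illustrative choices.

import numpy as np

def momentum_step(theta, v, gradient, eta=0.1, gamma=0.3):
    """One gradient-descent-with-momentum update, eq. (2):
    v_t = gamma*v_{t-1} + eta*grad(theta_t), theta_{t+1} = theta_t - v_t."""
    v_new = gamma * v + eta * gradient(theta)
    return theta - v_new, v_new

# Example: minimize E(theta) = theta^2, whose gradient is 2*theta
theta, v = np.array([1.0]), np.zeros(1)
for _ in range(100):
    theta, v = momentum_step(theta, v, lambda t: 2 * t)
print(theta)   # close to zero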

    -
    - -
    -

    More on momentum based approaches

    - -

    Let us try to get more intuition from these equations. It is helpful -to consider a simple physical analogy with a particle of mass \( m \) -moving in a viscous medium with drag coefficient \( \mu \) and potential -\( E(\mathbf{w}) \). If we denote the particle's position by \( \mathbf{w} \), -then its motion is described by -

    - -

     
    -$$ -m {d^2 \mathbf{w} \over dt^2} + \mu {d \mathbf{w} \over dt }= -\nabla_w E(\mathbf{w}). -$$ -

     
    - -

    We can discretize this equation in the usual way to get

    - -

     
    -$$ -m { \mathbf{w}_{t+\Delta t}-2 \mathbf{w}_{t} +\mathbf{w}_{t-\Delta t} \over (\Delta t)^2}+\mu {\mathbf{w}_{t+\Delta t}- \mathbf{w}_{t} \over \Delta t} = -\nabla_w E(\mathbf{w}). -$$ -

     
    - -

    Rearranging this equation, we can rewrite this as

    - -

     
    -$$ -\Delta \mathbf{w}_{t +\Delta t}= - { (\Delta t)^2 \over m +\mu \Delta t} \nabla_w E(\mathbf{w})+ {m \over m +\mu \Delta t} \Delta \mathbf{w}_t. -$$ -

     
    -

    - -
    -

    Momentum parameter

    - -

    Notice that this equation is identical to previous one if we identify -the position of the particle, \( \mathbf{w} \), with the parameters -\( \boldsymbol{\theta} \). This allows us to identify the momentum -parameter and learning rate with the mass of the particle and the -viscous drag as: -

    - -

     
    -$$ -\gamma= {m \over m +\mu \Delta t }, \qquad \eta = {(\Delta t)^2 \over m +\mu \Delta t}. -$$ -

     
    - -

    Thus, as the name suggests, the momentum parameter is proportional to -the mass of the particle and effectively provides inertia. -Furthermore, in the large viscosity/small learning rate limit, our -memory time scales as \( (1-\gamma)^{-1} \approx m/(\mu \Delta t) \). -

    - -

    Why is momentum useful? SGD momentum helps the gradient descent -algorithm gain speed in directions with persistent but small gradients -even in the presence of stochasticity, while suppressing oscillations -in high-curvature directions. This becomes especially important in -situations where the landscape is shallow and flat in some directions -and narrow and steep in others. It has been argued that first-order -methods (with appropriate initial conditions) can perform comparable -to more expensive second order methods, especially in the context of -complex deep learning models. -

    - -

    These beneficial properties of momentum can sometimes become even more -pronounced by using a slight modification of the classical momentum -algorithm called Nesterov Accelerated Gradient (NAG). -

    - -

    In the NAG algorithm, rather than calculating the gradient at the -current parameters, \( \nabla_\theta E(\boldsymbol{\theta}_t) \), one -calculates the gradient at the expected value of the parameters given -our current momentum, \( \nabla_\theta E(\boldsymbol{\theta}_t +\gamma -\mathbf{v}_{t-1}) \). This yields the NAG update rule -

    - -

     
    -$$ -\begin{align} -\mathbf{v}_{t}&=\gamma \mathbf{v}_{t-1}+\eta_{t}\nabla_\theta E(\boldsymbol{\theta}_t +\gamma \mathbf{v}_{t-1}) \nonumber \\ -\boldsymbol{\theta}_{t+1}&= \boldsymbol{\theta}_t -\mathbf{v}_{t}. -\tag{3} -\end{align} -$$ -

     
    - -

    One of the major advantages of NAG is that it allows for the use of a larger learning rate than GDM for the same choice of \( \gamma \).

    -
    - -
    -

    Second moment of the gradient

    - -

    In stochastic gradient descent, with and without momentum, we still -have to specify a schedule for tuning the learning rates \( \eta_t \) -as a function of time. As discussed in the context of Newton's -method, this presents a number of dilemmas. The learning rate is -limited by the steepest direction which can change depending on the -current position in the landscape. To circumvent this problem, ideally -our algorithm would keep track of curvature and take large steps in -shallow, flat directions and small steps in steep, narrow directions. -Second-order methods accomplish this by calculating or approximating -the Hessian and normalizing the learning rate by the -curvature. However, this is very computationally expensive for -extremely large models. Ideally, we would like to be able to -adaptively change the step size to match the landscape without paying -the steep computational price of calculating or approximating -Hessians. -

    - -

    Recently, a number of methods have been introduced that accomplish -this by tracking not only the gradient, but also the second moment of -the gradient. These methods include AdaGrad, AdaDelta, Root Mean Squared Propagation (RMS-Prop), and -ADAM. -

    -
    - -
    -

    RMS prop

    - -

    In RMS prop, in addition to keeping a running average of the first -moment of the gradient, we also keep track of the second moment -denoted by \( \mathbf{s}_t=\mathbb{E}[\mathbf{g}_t^2] \). The update rule -for RMS prop is given by -

    - -

     
$$
\begin{align}
\mathbf{g}_t &= \nabla_\theta E(\boldsymbol{\theta}) \tag{4}\\
\mathbf{s}_t &= \beta \mathbf{s}_{t-1} + (1-\beta)\mathbf{g}_t^2 \nonumber \\
\boldsymbol{\theta}_{t+1} &= \boldsymbol{\theta}_t - \eta_t \frac{\mathbf{g}_t}{\sqrt{\mathbf{s}_t + \epsilon}}, \nonumber
\end{align}
$$

     
    - -

    where \( \beta \) controls the averaging time of the second moment and is -typically taken to be about \( \beta=0.9 \), \( \eta_t \) is a learning rate -typically chosen to be \( 10^{-3} \), and \( \epsilon\sim 10^{-8} \) is a -small regularization constant to prevent divergences. Multiplication -and division by vectors is understood as an element-wise operation. It -is clear from this formula that the learning rate is reduced in -directions where the norm of the gradient is consistently large. This -greatly speeds up the convergence by allowing us to use a larger -learning rate for flat directions. -
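A minimal sketch of the RMSprop update (4) with the typical values quoted above (beta = 0.9, eta = 1e-3, epsilon = 1e-8); the quadratic cost is an illustrative choice.

import numpy as np

def rmsprop(gradient, theta0, eta=1e-3, beta=0.9, eps=1e-8, n_iter=5000):
    """Keep a running average s_t of the squared gradient and scale the
    step element-wise by 1/sqrt(s_t + eps), as in eq. (4)."""
    theta = np.array(theta0, dtype=float)
    s = np.zeros_like(theta)
    for _ in range(n_iter):
        g = gradient(theta)
        s = beta * s + (1 - beta) * g**2
        theta -= eta * g / np.sqrt(s + eps)
    return theta

# Example: E(theta) = theta_1^2 + 100*theta_2^2 (one steep and one flat direction)
gradient = lambda th: np.array([2 * th[0], 200 * th[1]])
print(rmsprop(gradient, [1.0, 1.0]))   # both components approach zero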

    -
    - -
    -

    ADAM optimizer

    - -

A related algorithm is the ADAM optimizer. In ADAM, we keep a running average of both the first and second moment of the gradient and use this information to adaptively change the learning rate for different parameters. The method is efficient when working with large problems involving lots of data and/or parameters. It is a combination of the gradient descent with momentum algorithm and the RMSprop algorithm discussed above.

    - -

    In addition to keeping a running average of the first and -second moments of the gradient -(i.e. \( \mathbf{m}_t=\mathbb{E}[\mathbf{g}_t] \) and -\( \mathbf{s}_t=\mathbb{E}[\mathbf{g}^2_t] \), respectively), ADAM -performs an additional bias correction to account for the fact that we -are estimating the first two moments of the gradient using a running -average (denoted by the hats in the update rule below). The update -rule for ADAM is given by (where multiplication and division are once -again understood to be element-wise operations below) -

    - -

     
$$
\begin{align}
\mathbf{g}_t &= \nabla_\theta E(\boldsymbol{\theta}) \tag{5}\\
\mathbf{m}_t &= \beta_1 \mathbf{m}_{t-1} + (1-\beta_1) \mathbf{g}_t \nonumber \\
\mathbf{s}_t &= \beta_2 \mathbf{s}_{t-1} + (1-\beta_2)\mathbf{g}_t^2 \nonumber \\
\hat{\mathbf{m}}_t &= \frac{\mathbf{m}_t}{1-\beta_1^t} \nonumber \\
\hat{\mathbf{s}}_t &= \frac{\mathbf{s}_t}{1-\beta_2^t} \nonumber \\
\boldsymbol{\theta}_{t+1} &= \boldsymbol{\theta}_t - \eta_t \frac{\hat{\mathbf{m}}_t}{\sqrt{\hat{\mathbf{s}}_t} + \epsilon}, \tag{6}
\end{align}
$$

     
    - -

    where \( \beta_1 \) and \( \beta_2 \) set the memory lifetime of the first and -second moment and are typically taken to be \( 0.9 \) and \( 0.99 \) -respectively, and \( \eta \) and \( \epsilon \) are identical to RMSprop. -

    - -

    Like in RMSprop, the effective step size of a parameter depends on the -magnitude of its gradient squared. To understand this better, let us -rewrite this expression in terms of the variance -\( \boldsymbol{\sigma}_t^2 = \boldsymbol{\mathbf{s}}_t - -(\boldsymbol{\mathbf{m}}_t)^2 \). Consider a single parameter \( \theta_t \). The -update rule for this parameter is given by -

    - -

     
    -$$ -\Delta \theta_{t+1}= -\eta_t { \boldsymbol{m}_t \over \sqrt{\sigma_t^2 + m_t^2 }+\epsilon}. -$$ -
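A minimal sketch of the ADAM update (5)-(6) with the typical values beta1 = 0.9 and beta2 = 0.99 quoted above; again the quadratic cost is only an illustration.

import numpy as np

def adam(gradient, theta0, eta=1e-3, beta1=0.9, beta2=0.99, eps=1e-8, n_iter=5000):
    """Running averages of the gradient (m) and its square (s), bias
    correction, then an element-wise scaled step, as in eqs. (5)-(6)."""
    theta = np.array(theta0, dtype=float)
    m = np.zeros_like(theta)
    s = np.zeros_like(theta)
    for t in range(1, n_iter + 1):
        g = gradient(theta)
        m = beta1 * m + (1 - beta1) * g
        s = beta2 * s + (1 - beta2) * g**2
        m_hat = m / (1 - beta1**t)      # bias-corrected first moment
        s_hat = s / (1 - beta2**t)      # bias-corrected second moment
        theta -= eta * m_hat / (np.sqrt(s_hat) + eps)
    return theta

gradient = lambda th: np.array([2 * th[0], 200 * th[1]])
print(adam(gradient, [1.0, 1.0]))   # both components approach zero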

     
    -

    - -
    -

    Algorithms and codes for Adagrad, RMSprop and Adam

    - -

    The algorithms we have implemented are well described in the text by Goodfellow, Bengio and Courville, chapter 8.

    - -

    The codes which implement these algorithms are discussed after our presentation of automatic differentiation.

    -
    - -
    -

    Practical tips

    - -
      -

    • Randomize the data when making mini-batches. It is always important to randomly shuffle the data when forming mini-batches. Otherwise, the gradient descent method can fit spurious correlations resulting from the order in which data is presented.
    • -

• Transform your inputs. Learning becomes difficult when our landscape has a mixture of steep and flat directions. One simple trick for minimizing these situations is to standardize the data by subtracting the mean and normalizing the variance of input variables; a short sketch is given after this list. Whenever possible, also decorrelate the inputs. To understand why this is helpful, consider the case of linear regression. It is easy to show that for the squared error cost function, the Hessian of the cost function is just the correlation matrix between the inputs. Thus, by standardizing the inputs, we are ensuring that the landscape looks homogeneous in all directions in parameter space. Since most deep networks can be viewed as linear transformations followed by a non-linearity at each layer, we expect this intuition to hold beyond the linear case.
    • -

• Monitor the out-of-sample performance. Always monitor the performance of your model on a validation set (a small portion of the training data that is held out of the training process to serve as a proxy for the test set). If the validation error starts increasing, then the model is beginning to overfit; terminate the learning process. This early stopping significantly improves performance in many settings.
    • -

• Adaptive optimization methods don't always have good generalization. Recent studies have shown that adaptive methods such as ADAM, RMSprop, and AdaGrad tend to have poor generalization compared to SGD or SGD with momentum, particularly in the high-dimensional limit (i.e. when the number of parameters exceeds the number of data points). Although it is not clear at this stage why these methods perform so well in training deep neural networks, simpler procedures like properly-tuned SGD may work as well or better in these applications.
    • -
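A minimal sketch of the standardization step mentioned in the list above: subtract the mean and divide by the standard deviation of each input feature, computed on the training data only (the data here is illustrative).

import numpy as np

rng = np.random.default_rng(0)
X_train = rng.normal(loc=5.0, scale=3.0, size=(100, 4))
X_test = rng.normal(loc=5.0, scale=3.0, size=(20, 4))

mu = X_train.mean(axis=0)       # training-set mean per feature
sigma = X_train.std(axis=0)     # training-set standard deviation per feature
X_train_std = (X_train - mu) / sigma
X_test_std = (X_test - mu) / sigma   # apply the same transformation to the test set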
    -

    -

    Geron's text, see chapter 11, has several interesting discussions.

    -
    - -
    -

    Automatic differentiation

    - -

Automatic differentiation (AD), also called algorithmic differentiation or computational differentiation, is a set of techniques to numerically evaluate the derivative of a function specified by a computer program. AD exploits the fact that every computer program, no matter how complicated, executes a sequence of elementary arithmetic operations (addition, subtraction, multiplication, division, etc.) and elementary functions (exp, log, sin, cos, etc.). By applying the chain rule repeatedly to these operations, derivatives of arbitrary order can be computed automatically, accurately to working precision, and using at most a small constant factor more arithmetic operations than the original program.

    - -

    Automatic differentiation is neither:

    - -
      -

    • Symbolic differentiation, nor
    • -

    • Numerical differentiation (the method of finite differences).
    • -
    -

    -

Symbolic differentiation can lead to inefficient code and faces the difficulty of converting a computer program into a single expression, while numerical differentiation can introduce round-off errors in the discretization process as well as cancellation errors.

    - -

    Python has tools for so-called automatic differentiation. -Consider the following example -

    -

     
    -$$ -f(x) = \sin\left(2\pi x + x^2\right) -$$ -

     
    - -

    which has the following derivative

    -

     
    -$$ -f'(x) = \cos\left(2\pi x + x^2\right)\left(2\pi + 2x\right) -$$ -

     
    - -

    Using autograd we have

    - - - -
    -
    -
    -
    -
    -
    import autograd.numpy as np
    -
    -# To do elementwise differentiation:
    -from autograd import elementwise_grad as egrad 
    -
    -# To plot:
    -import matplotlib.pyplot as plt 
    -
    -
    -def f(x):
    -    return np.sin(2*np.pi*x + x**2)
    -
    -def f_grad_analytic(x):
    -    return np.cos(2*np.pi*x + x**2)*(2*np.pi + 2*x)
    -
    -# Do the comparison:
    -x = np.linspace(0,1,1000)
    -
    -f_grad = egrad(f)
    -
    -computed = f_grad(x)
    -analytic = f_grad_analytic(x)
    -
    -plt.title('Derivative computed from Autograd compared with the analytical derivative')
    -plt.plot(x,computed,label='autograd')
    -plt.plot(x,analytic,label='analytic')
    -
    -plt.xlabel('x')
    -plt.ylabel('y')
    -plt.legend()
    -
    -plt.show()
    -
    -print("The max absolute difference is: %g"%(np.max(np.abs(computed - analytic))))

    Using autograd

    - -

    Here we -experiment with what kind of functions Autograd is capable -of finding the gradient of. The following Python functions are just -meant to illustrate what Autograd can do, but please feel free to -experiment with other, possibly more complicated, functions as well. -

    - - - -
    -
    -
    -
    -
    -
    import autograd.numpy as np
    -from autograd import grad
    -
    -def f1(x):
    -    return x**3 + 1
    -
    -f1_grad = grad(f1)
    -
    -# Remember to send in float as argument to the computed gradient from Autograd!
    -a = 1.0
    -
    -# See the evaluated gradient at a using autograd:
    -print("The gradient of f1 evaluated at a = %g using autograd is: %g"%(a,f1_grad(a)))
    -
    -# Compare with the analytical derivative, that is f1'(x) = 3*x**2 
    -grad_analytical = 3*a**2
    -print("The gradient of f1 evaluated at a = %g by finding the analytic expression is: %g"%(a,grad_analytical))

    Autograd with more complicated functions

    - -

To differentiate with respect to two (or more) arguments of a Python function, Autograd needs to know with respect to which variable the function is being differentiated.

    - - - -
    -
    -
    -
    -
    -
    import autograd.numpy as np
    -from autograd import grad
    -def f2(x1,x2):
    -    return 3*x1**3 + x2*(x1 - 5) + 1
    -
    -# By sending the argument 0, Autograd will compute the derivative w.r.t the first variable, in this case x1
    -f2_grad_x1 = grad(f2,0)
    -
    -# ... and differentiate w.r.t x2 by sending 1 as an additional arugment to grad
    -f2_grad_x2 = grad(f2,1)
    -
    -x1 = 1.0
    -x2 = 3.0 
    -
    -print("Evaluating at x1 = %g, x2 = %g"%(x1,x2))
    -print("-"*30)
    -
    -# Compare with the analytical derivatives:
    -
    -# Derivative of f2 w.r.t x1 is: 9*x1**2 + x2:
    -f2_grad_x1_analytical = 9*x1**2 + x2
    -
    -# Derivative of f2 w.r.t x2 is: x1 - 5:
    -f2_grad_x2_analytical = x1 - 5
    -
    -# See the evaluated derivations:
    -print("The derivative of f2 w.r.t x1: %g"%( f2_grad_x1(x1,x2) ))
    -print("The analytical derivative of f2 w.r.t x1: %g"%( f2_grad_x1(x1,x2) ))
    -
    -print()
    -
    -print("The derivative of f2 w.r.t x2: %g"%( f2_grad_x2(x1,x2) ))
    -print("The analytical derivative of f2 w.r.t x2: %g"%( f2_grad_x2(x1,x2) ))

    Note that the grad function will not produce the true gradient of the function. The true gradient of a function with two or more variables will produce a vector, where each element is the function differentiated w.r.t a variable.
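For completeness, a small sketch of how the full gradient vector can be assembled from the two partial derivatives returned by grad(f2, 0) and grad(f2, 1) (assuming autograd is installed):

import autograd.numpy as np
from autograd import grad

def f2(x1, x2):
    return 3*x1**3 + x2*(x1 - 5) + 1

# Stacking the two partial derivatives gives the full gradient (df/dx1, df/dx2)
full_gradient = lambda x1, x2: np.array([grad(f2, 0)(x1, x2), grad(f2, 1)(x1, x2)])
print(full_gradient(1.0, 3.0))   # expected: [9*1**2 + 3, 1 - 5] = [12., -4.]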

    -
    - -
    -

    More complicated functions using the elements of their arguments directly

    - - - -
    -
    -
    -
    -
    -
    import autograd.numpy as np
    -from autograd import grad
    -def f3(x): # Assumes x is an array of length 5 or higher
    -    return 2*x[0] + 3*x[1] + 5*x[2] + 7*x[3] + 11*x[4]**2
    -
    -f3_grad = grad(f3)
    -
    -x = np.linspace(0,4,5)
    -
    -# Print the computed gradient:
    -print("The computed gradient of f3 is: ", f3_grad(x))
    -
    -# The analytical gradient is: (2, 3, 5, 7, 22*x[4])
    -f3_grad_analytical = np.array([2, 3, 5, 7, 22*x[4]])
    -
    -# Print the analytical gradient:
    -print("The analytical gradient of f3 is: ", f3_grad_analytical)

Note that in this case, when sending an array as input argument, the output from Autograd is another array. This is the true gradient of the function, as opposed to the single partial derivative in the previous example. By using arrays to represent the variables, the output from Autograd might be easier to work with, as the output is closer to what one would expect from a gradient-evaluating function.

    -
    - -
    -

    Functions using mathematical functions from Numpy

    - - - -
    -
    -
    -
    -
    -
    import autograd.numpy as np
    -from autograd import grad
    -def f4(x):
    -    return np.sqrt(1+x**2) + np.exp(x) + np.sin(2*np.pi*x)
    -
    -f4_grad = grad(f4)
    -
    -x = 2.7
    -
    -# Print the computed derivative:
    -print("The computed derivative of f4 at x = %g is: %g"%(x,f4_grad(x)))
    -
    -# The analytical derivative is: x/sqrt(1 + x**2) + exp(x) + cos(2*pi*x)*2*pi
    -f4_grad_analytical = x/np.sqrt(1 + x**2) + np.exp(x) + np.cos(2*np.pi*x)*2*np.pi
    -
    -# Print the analytical gradient:
    -print("The analytical gradient of f4 at x = %g is: %g"%(x,f4_grad_analytical))

    More autograd

    - - - -
    -
    -
    -
    -
    -
    import autograd.numpy as np
    -from autograd import grad
    -def f5(x):
    -    if x >= 0:
    -        return x**2
    -    else:
    -        return -3*x + 1
    -
    -f5_grad = grad(f5)
    -
    -x = 2.7
    -
    -# Print the computed derivative:
    -print("The computed derivative of f5 at x = %g is: %g"%(x,f5_grad(x)))

    And with loops

    - - - -
    -
    -
    -
    -
    -
    import autograd.numpy as np
    -from autograd import grad
    -def f6_for(x):
    -    val = 0
    -    for i in range(10):
    -        val = val + x**i
    -    return val
    -
    -def f6_while(x):
    -    val = 0
    -    i = 0
    -    while i < 10:
    -        val = val + x**i
    -        i = i + 1
    -    return val
    -
    -f6_for_grad = grad(f6_for)
    -f6_while_grad = grad(f6_while)
    -
    -x = 0.5
    -
    -# Print the computed derivaties of f6_for and f6_while
    -print("The computed derivative of f6_for at x = %g is: %g"%(x,f6_for_grad(x)))
    -print("The computed derivative of f6_while at x = %g is: %g"%(x,f6_while_grad(x)))
    -
    import autograd.numpy as np
    -from autograd import grad
    -# Both of the functions are implementation of the sum: sum(x**i) for i = 0, ..., 9
    -# The analytical derivative is: sum(i*x**(i-1)) 
    -f6_grad_analytical = 0
    -for i in range(10):
    -    f6_grad_analytical += i*x**(i-1)
    -
    -print("The analytical derivative of f6 at x = %g is: %g"%(x,f6_grad_analytical))

    Using recursion

    - - -
    -
    -
    -
    -
    -
    import autograd.numpy as np
    -from autograd import grad
    -
    -def f7(n): # Assume that n is an integer
    -    if n == 1 or n == 0:
    -        return 1
    -    else:
    -        return n*f7(n-1)
    -
    -f7_grad = grad(f7)
    -
    -n = 2.0
    -
    -print("The computed derivative of f7 at n = %d is: %g"%(n,f7_grad(n)))
    -
    -# The function f7 is an implementation of the factorial of n.
    -# By using the product rule, one can find that the derivative is:
    -
    -f7_grad_analytical = 0
    -for i in range(int(n)-1):
    -    tmp = 1
    -    for k in range(int(n)-1):
    -        if k != i:
    -            tmp *= (n - k)
    -    f7_grad_analytical += tmp
    -
    -print("The analytical derivative of f7 at n = %d is: %g"%(n,f7_grad_analytical))

Note that if n is equal to zero or one, Autograd will give an error message. This message appears when the output is independent of the input.

    -
    - -
    -

    Unsupported functions

    -

Autograd supports many features. However, there are some functions that are not supported (yet) by Autograd.

    - -

    Assigning a value to the variable being differentiated with respect to

    - - -
    -
    -
    -
    -
    -
    import autograd.numpy as np
    -from autograd import grad
    -def f8(x): # Assume x is an array
    -    x[2] = 3
    -    return x*2
    -
    -#f8_grad = grad(f8)
    -
    -#x = 8.4
    -
    -#print("The derivative of f8 is:",f8_grad(x))

Here, running this code, Autograd tells us that an 'ArrayBox' does not support item assignment. The item assignment happens when the program tries to assign the value 3 to x[2]. However, Autograd has implemented the computation of the derivative such that this assignment is not possible.

    -
    - -
    -

    The syntax a.dot(b) when finding the dot product

    - - -
    -
    -
    -
    -
    -
    import autograd.numpy as np
    -from autograd import grad
    -def f9(a): # Assume a is an array with 2 elements
    -    b = np.array([1.0,2.0])
    -    return a.dot(b)
    -
    -#f9_grad = grad(f9)
    -
    -#x = np.array([1.0,0.0])
    -
    -#print("The derivative of f9 is:",f9_grad(x))

Here we are told that the 'dot' function does not belong to Autograd's version of a Numpy array. To overcome this, an alternative syntax which also computes the dot product can be used:

    - - - -
    -
    -
    -
    -
    -
    import autograd.numpy as np
    -from autograd import grad
    -def f9_alternative(x): # Assume a is an array with 2 elements
    -    b = np.array([1.0,2.0])
    -    return np.dot(x,b) # The same as x_1*b_1 + x_2*b_2
    -
    -f9_alternative_grad = grad(f9_alternative)
    -
    -x = np.array([3.0,0.0])
    -
    -print("The gradient of f9 is:",f9_alternative_grad(x))
    -
    -# The analytical gradient of the dot product of vectors x and b with two elements (x_1,x_2) and (b_1, b_2) respectively
    -# w.r.t x is (b_1, b_2).

    Using Autograd with OLS

    - -

We conclude the part on optimization by showing how we can write code for linear regression and logistic regression using autograd. The first example shows results with ordinary least squares.

    - - - -
    -
    -
    -
    -
    -
    # Using Autograd to calculate gradients for OLS
    -from random import random, seed
    -import numpy as np
    -import autograd.numpy as np
    -import matplotlib.pyplot as plt
    -from autograd import grad
    -
    -def CostOLS(beta):
    -    return (1.0/n)*np.sum((y-X @ beta)**2)
    -
    -n = 100
    -x = 2*np.random.rand(n,1)
    -y = 4+3*x+np.random.randn(n,1)
    -
    -X = np.c_[np.ones((n,1)), x]
    -XT_X = X.T @ X
    -theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)
    -print("Own inversion")
    -print(theta_linreg)
    -# Hessian matrix
    -H = (2.0/n)* XT_X
    -EigValues, EigVectors = np.linalg.eig(H)
    -print(f"Eigenvalues of Hessian Matrix:{EigValues}")
    -
    -theta = np.random.randn(2,1)
    -eta = 1.0/np.max(EigValues)
    -Niterations = 1000
    -# define the gradient
    -training_gradient = grad(CostOLS)
    -
    -for iter in range(Niterations):
    -    gradients = training_gradient(theta)
    -    theta -= eta*gradients
    -print("theta from own gd")
    -print(theta)
    -
    -xnew = np.array([[0],[2]])
    -Xnew = np.c_[np.ones((2,1)), xnew]
    -ypredict = Xnew.dot(theta)
    -ypredict2 = Xnew.dot(theta_linreg)
    -
    -plt.plot(xnew, ypredict, "r-")
    -plt.plot(xnew, ypredict2, "b-")
    -plt.plot(x, y ,'ro')
    -plt.axis([0,2.0,0, 15.0])
    -plt.xlabel(r'$x$')
    -plt.ylabel(r'$y$')
    -plt.title(r'Random numbers ')
    -plt.show()

    Same code but now with momentum gradient descent

    - - -
    -
    -
    -
    -
    -
    # Using Autograd to calculate gradients for OLS
    -from random import random, seed
    -import numpy as np
    -import autograd.numpy as np
    -import matplotlib.pyplot as plt
    -from autograd import grad
    -
    -def CostOLS(beta):
    -    return (1.0/n)*np.sum((y-X @ beta)**2)
    -
    -n = 100
    -x = 2*np.random.rand(n,1)
    -y = 4+3*x#+np.random.randn(n,1)
    -
    -X = np.c_[np.ones((n,1)), x]
    -XT_X = X.T @ X
    -theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)
    -print("Own inversion")
    -print(theta_linreg)
    -# Hessian matrix
    -H = (2.0/n)* XT_X
    -EigValues, EigVectors = np.linalg.eig(H)
    -print(f"Eigenvalues of Hessian Matrix:{EigValues}")
    -
    -theta = np.random.randn(2,1)
    -eta = 1.0/np.max(EigValues)
    -Niterations = 30
    -
    -# define the gradient
    -training_gradient = grad(CostOLS)
    -
    -for iter in range(Niterations):
    -    gradients = training_gradient(theta)
    -    theta -= eta*gradients
    -    print(iter,gradients[0],gradients[1])
    -print("theta from own gd")
    -print(theta)
    -
    -# Now improve with momentum gradient descent
    -change = 0.0
    -delta_momentum = 0.3
    -for iter in range(Niterations):
    -    # calculate gradient
    -    gradients = training_gradient(theta)
    -    # calculate update
    -    new_change = eta*gradients+delta_momentum*change
    -    # take a step
    -    theta -= new_change
    -    # save the change
    -    change = new_change
    -    print(iter,gradients[0],gradients[1])
    -print("theta from own gd wth momentum")
    -print(theta)
    -

    But none of these can compete with Newton's method

    - - - -
    -
    -
    -
    -
    -
    # Using Newton's method
    -from random import random, seed
    -import numpy as np
    -import autograd.numpy as np
    -import matplotlib.pyplot as plt
    -from autograd import grad
    -
    -def CostOLS(beta):
    -    return (1.0/n)*np.sum((y-X @ beta)**2)
    -
    -n = 100
    -x = 2*np.random.rand(n,1)
    -y = 4+3*x+np.random.randn(n,1)
    -
    -X = np.c_[np.ones((n,1)), x]
    -XT_X = X.T @ X
    -beta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)
    -print("Own inversion")
    -print(beta_linreg)
    -# Hessian matrix
    -H = (2.0/n)* XT_X
    -# Note that here the Hessian does not depend on the parameters beta
    -invH = np.linalg.pinv(H)
    -EigValues, EigVectors = np.linalg.eig(H)
    -print(f"Eigenvalues of Hessian Matrix:{EigValues}")
    -
    -beta = np.random.randn(2,1)
    -Niterations = 5
    -
    -# define the gradient
    -training_gradient = grad(CostOLS)
    -
    -for iter in range(Niterations):
    -    gradients = training_gradient(beta)
    -    beta -= invH @ gradients
    -    print(iter,gradients[0],gradients[1])
    -print("beta from own Newton code")
    -print(beta)

    Including Stochastic Gradient Descent with Autograd

    -

    In this code we include the stochastic gradient descent approach discussed above. Note here that we specify which argument we are taking the derivative with respect to when using autograd.

    - - - -
    -
    -
    -
    -
    -
    # Using Autograd to calculate gradients using SGD
    -# OLS example
    -from random import random, seed
    -import numpy as np
    -import autograd.numpy as np
    -import matplotlib.pyplot as plt
    -from autograd import grad
    -
    -# Note change from previous example
    -def CostOLS(y,X,theta):
    -    return np.sum((y-X @ theta)**2)
    -
    -n = 100
    -x = 2*np.random.rand(n,1)
    -y = 4+3*x+np.random.randn(n,1)
    -
    -X = np.c_[np.ones((n,1)), x]
    -XT_X = X.T @ X
    -theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)
    -print("Own inversion")
    -print(theta_linreg)
    -# Hessian matrix
    -H = (2.0/n)* XT_X
    -EigValues, EigVectors = np.linalg.eig(H)
    -print(f"Eigenvalues of Hessian Matrix:{EigValues}")
    -
    -theta = np.random.randn(2,1)
    -eta = 1.0/np.max(EigValues)
    -Niterations = 1000
    -
    -# Note that we request the derivative wrt third argument (theta, 2 here)
    -training_gradient = grad(CostOLS,2)
    -
    -for iter in range(Niterations):
    -    gradients = (1.0/n)*training_gradient(y, X, theta)
    -    theta -= eta*gradients
    -print("theta from own gd")
    -print(theta)
    -
    -xnew = np.array([[0],[2]])
    -Xnew = np.c_[np.ones((2,1)), xnew]
    -ypredict = Xnew.dot(theta)
    -ypredict2 = Xnew.dot(theta_linreg)
    -
    -plt.plot(xnew, ypredict, "r-")
    -plt.plot(xnew, ypredict2, "b-")
    -plt.plot(x, y ,'ro')
    -plt.axis([0,2.0,0, 15.0])
    -plt.xlabel(r'$x$')
    -plt.ylabel(r'$y$')
    -plt.title(r'Random numbers ')
    -plt.show()
    -
    -n_epochs = 50
    -M = 5   #size of each minibatch
    -m = int(n/M) #number of minibatches
    -t0, t1 = 5, 50
    -def learning_schedule(t):
    -    return t0/(t+t1)
    -
    -theta = np.random.randn(2,1)
    -
    -for epoch in range(n_epochs):
    -# Can you figure out a better way of setting up the contributions to each batch?
    -    for i in range(m):
    -        random_index = M*np.random.randint(m)
    -        xi = X[random_index:random_index+M]
    -        yi = y[random_index:random_index+M]
    -        gradients = (1.0/M)*training_gradient(yi, xi, theta)
    -        eta = learning_schedule(epoch*m+i)
    -        theta = theta - eta*gradients
    -print("theta from own sdg")
    -print(theta)

    Same code but now with momentum gradient descent

    - - -
    -
    -
    -
    -
    -
    # Using Autograd to calculate gradients using SGD
    -# OLS example
    -from random import random, seed
    -import numpy as np
    -import autograd.numpy as np
    -import matplotlib.pyplot as plt
    -from autograd import grad
    -
    -# Note change from previous example
    -def CostOLS(y,X,theta):
    -    return np.sum((y-X @ theta)**2)
    -
    -n = 100
    -x = 2*np.random.rand(n,1)
    -y = 4+3*x+np.random.randn(n,1)
    -
    -X = np.c_[np.ones((n,1)), x]
    -XT_X = X.T @ X
    -theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)
    -print("Own inversion")
    -print(theta_linreg)
    -# Hessian matrix
    -H = (2.0/n)* XT_X
    -EigValues, EigVectors = np.linalg.eig(H)
    -print(f"Eigenvalues of Hessian Matrix:{EigValues}")
    -
    -theta = np.random.randn(2,1)
    -eta = 1.0/np.max(EigValues)
    -Niterations = 100
    -
    -# Note that we request the derivative wrt third argument (theta, 2 here)
    -training_gradient = grad(CostOLS,2)
    -
    -for iter in range(Niterations):
    -    gradients = (1.0/n)*training_gradient(y, X, theta)
    -    theta -= eta*gradients
    -print("theta from own gd")
    -print(theta)
    -
    -
    -n_epochs = 50
    -M = 5   #size of each minibatch
    -m = int(n/M) #number of minibatches
    -t0, t1 = 5, 50
    -def learning_schedule(t):
    -    return t0/(t+t1)
    -
    -theta = np.random.randn(2,1)
    -
    -change = 0.0
    -delta_momentum = 0.3
    -
    -for epoch in range(n_epochs):
    -    for i in range(m):
    -        random_index = M*np.random.randint(m)
    -        xi = X[random_index:random_index+M]
    -        yi = y[random_index:random_index+M]
    -        gradients = (1.0/M)*training_gradient(yi, xi, theta)
    -        eta = learning_schedule(epoch*m+i)
    -        # calculate update
    -        new_change = eta*gradients+delta_momentum*change
    -        # take a step
    -        theta -= new_change
    -        # save the change
    -        change = new_change
    -print("theta from own sdg with momentum")
    -print(theta)

Similar problem (now with a second-order polynomial), but with AdaGrad

    - - -
    -
    -
    -
    -
    -
    # Using Autograd to calculate gradients using AdaGrad and Stochastic Gradient descent
    -# OLS example
    -from random import random, seed
    -import numpy as np
    -import autograd.numpy as np
    -import matplotlib.pyplot as plt
    -from autograd import grad
    -
    -# Note change from previous example
    -def CostOLS(y,X,theta):
    -    return np.sum((y-X @ theta)**2)
    -
    -n = 1000
    -x = np.random.rand(n,1)
    -y = 2.0+3*x +4*x*x
    -
    -X = np.c_[np.ones((n,1)), x, x*x]
    -XT_X = X.T @ X
    -theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)
    -print("Own inversion")
    -print(theta_linreg)
    -
    -
    -# Note that we request the derivative wrt third argument (theta, 2 here)
    -training_gradient = grad(CostOLS,2)
    -# Define parameters for Stochastic Gradient Descent
    -n_epochs = 50
    -M = 5   #size of each minibatch
    -m = int(n/M) #number of minibatches
    -# Guess for unknown parameters theta
    -theta = np.random.randn(3,1)
    -
    -# Value for learning rate
    -eta = 0.01
    -# Including AdaGrad parameter to avoid possible division by zero
    -delta  = 1e-8
    -for epoch in range(n_epochs):
    -    Giter = 0.0
    -    for i in range(m):
    -        random_index = M*np.random.randint(m)
    -        xi = X[random_index:random_index+M]
    -        yi = y[random_index:random_index+M]
    -        gradients = (1.0/M)*training_gradient(yi, xi, theta)
    -        Giter += gradients*gradients
    -        update = gradients*eta/(delta+np.sqrt(Giter))
    -        theta -= update
    -print("theta from own AdaGrad")
    -print(theta)

    Running this code we note an almost perfect agreement with the results from matrix inversion.

    -
    - -
    -

    RMSprop for adaptive learning rate with Stochastic Gradient Descent

    - - -
    -
    -
    -
    -
    -
    # Using Autograd to calculate gradients using RMSprop  and Stochastic Gradient descent
    -# OLS example
    -from random import random, seed
    -import numpy as np
    -import autograd.numpy as np
    -import matplotlib.pyplot as plt
    -from autograd import grad
    -
    -# Note change from previous example
    -def CostOLS(y,X,theta):
    -    return np.sum((y-X @ theta)**2)
    -
    -n = 1000
    -x = np.random.rand(n,1)
    -y = 2.0+3*x +4*x*x# +np.random.randn(n,1)
    -
    -X = np.c_[np.ones((n,1)), x, x*x]
    -XT_X = X.T @ X
    -theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)
    -print("Own inversion")
    -print(theta_linreg)
    -
    -
    -# Note that we request the derivative wrt third argument (theta, 2 here)
    -training_gradient = grad(CostOLS,2)
    -# Define parameters for Stochastic Gradient Descent
    -n_epochs = 50
    -M = 5   #size of each minibatch
    -m = int(n/M) #number of minibatches
    -# Guess for unknown parameters theta
    -theta = np.random.randn(3,1)
    -
    -# Value for learning rate
    -eta = 0.01
    -# Value for parameter rho
    -rho = 0.99
    -# Including AdaGrad parameter to avoid possible division by zero
    -delta  = 1e-8
    -for epoch in range(n_epochs):
    -    Giter = 0.0
    -    for i in range(m):
    -        random_index = M*np.random.randint(m)
    -        xi = X[random_index:random_index+M]
    -        yi = y[random_index:random_index+M]
    -        gradients = (1.0/M)*training_gradient(yi, xi, theta)
    -	# Accumulated gradient
    -	# Scaling with rho the new and the previous results
    -        Giter = (rho*Giter+(1-rho)*gradients*gradients)
    -	# Taking the diagonal only and inverting
    -        update = gradients*eta/(delta+np.sqrt(Giter))
    -	# Hadamard product
    -        theta -= update
    -print("theta from own RMSprop")
    -print(theta)

    And finally ADAM

    - - - -
    -
    -
    -
    -
    -
    # Using Autograd to calculate gradients using ADAM and Stochastic Gradient descent
    -# OLS example
    -from random import random, seed
    -import numpy as np
    -import autograd.numpy as np
    -import matplotlib.pyplot as plt
    -from autograd import grad
    -
    -# Note change from previous example
    -def CostOLS(y,X,theta):
    -    return np.sum((y-X @ theta)**2)
    -
    -n = 1000
    -x = np.random.rand(n,1)
    -y = 2.0+3*x +4*x*x# +np.random.randn(n,1)
    -
    -X = np.c_[np.ones((n,1)), x, x*x]
    -XT_X = X.T @ X
    -theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)
    -print("Own inversion")
    -print(theta_linreg)
    -
    -
    -# Note that we request the derivative wrt third argument (theta, 2 here)
    -training_gradient = grad(CostOLS,2)
    -# Define parameters for Stochastic Gradient Descent
    -n_epochs = 50
    -M = 5   #size of each minibatch
    -m = int(n/M) #number of minibatches
    -# Guess for unknown parameters theta
    -theta = np.random.randn(3,1)
    -
    -# Value for learning rate
    -eta = 0.01
    -# Value for parameters beta1 and beta2, see https://arxiv.org/abs/1412.6980
    -beta1 = 0.9
    -beta2 = 0.999
    -# Including AdaGrad parameter to avoid possible division by zero
    -delta  = 1e-7
    -iter = 0
    -for epoch in range(n_epochs):
    -    first_moment = 0.0
    -    second_moment = 0.0
    -    iter += 1
    -    for i in range(m):
    -        random_index = M*np.random.randint(m)
    -        xi = X[random_index:random_index+M]
    -        yi = y[random_index:random_index+M]
    -        gradients = (1.0/M)*training_gradient(yi, xi, theta)
    -        # Computing moments first
    -        first_moment = beta1*first_moment + (1-beta1)*gradients
    -        second_moment = beta2*second_moment+(1-beta2)*gradients*gradients
    -        first_term = first_moment/(1.0-beta1**iter)
    -        second_term = second_moment/(1.0-beta2**iter)
    -	# Scaling with rho the new and the previous results
    -        update = eta*first_term/(np.sqrt(second_term)+delta)
    -        theta -= update
    -print("theta from own ADAM")
    -print(theta)

    And Logistic Regression

    - - - -
    -
    -
    -
    -
    -
    import autograd.numpy as np
    -from autograd import grad
    -
    -def sigmoid(x):
    -    return 0.5 * (np.tanh(x / 2.) + 1)
    -
    -def logistic_predictions(weights, inputs):
    -    # Outputs probability of a label being true according to logistic model.
    -    return sigmoid(np.dot(inputs, weights))
    -
    -def training_loss(weights):
    -    # Training loss is the negative log-likelihood of the training labels.
    -    preds = logistic_predictions(weights, inputs)
    -    label_probabilities = preds * targets + (1 - preds) * (1 - targets)
    -    return -np.sum(np.log(label_probabilities))
    -
    -# Build a toy dataset.
    -inputs = np.array([[0.52, 1.12,  0.77],
    -                   [0.88, -1.08, 0.15],
    -                   [0.52, 0.06, -1.30],
    -                   [0.74, -2.49, 1.39]])
    -targets = np.array([True, True, False, True])
    -
    -# Define a function that returns gradients of training loss using Autograd.
    -training_gradient_fun = grad(training_loss)
    -
    -# Optimize weights using gradient descent.
    -weights = np.array([0.0, 0.0, 0.0])
    -print("Initial loss:", training_loss(weights))
    -for i in range(100):
    -    weights -= training_gradient_fun(weights) * 0.01
    -
    -print("Trained loss:", training_loss(weights))

    Introducing JAX

    - -

    Presently, instead of using autograd, we recommend using JAX

    - -

JAX is Autograd and XLA (Accelerated Linear Algebra) brought together for high-performance numerical computing and machine learning research. It provides composable transformations of Python+NumPy programs: differentiate, vectorize, parallelize, Just-In-Time compile to GPU/TPU, and more.

    - -

Here's a simple example of how you can use JAX to compute the derivative of the logistic function.

    - - - -
    -
    -
    -
    -
    -
    import jax.numpy as jnp
    -from jax import grad, jit, vmap
    -
    -def sum_logistic(x):
    -  return jnp.sum(1.0 / (1.0 + jnp.exp(-x)))
    +  
    import numpy as np
     
    -x_small = jnp.arange(3.)
    -derivative_fn = grad(sum_logistic)
    -print(derivative_fn(x_small))
    +def generate_binary_data(n_samples=100, n_features=2, random_state=None):
    +    """
    +    Generate synthetic binary classification data.
    +    Returns (X, y) where X is (n_samples x n_features), y in {0,1}.
    +    """
    +    rng = np.random.RandomState(random_state)
    +    # Half samples for class 0, half for class 1
    +    n0 = n_samples // 2
    +    n1 = n_samples - n0
    +    # Class 0 around mean -2, class 1 around +2
    +    mean0 = -2 * np.ones(n_features)
    +    mean1 =  2 * np.ones(n_features)
    +    X0 = rng.randn(n0, n_features) + mean0
    +    X1 = rng.randn(n1, n_features) + mean1
    +    X = np.vstack((X0, X1))
    +    y = np.array([0]*n0 + [1]*n1)
    +    return X, y
    +
    +def generate_multiclass_data(n_samples=150, n_features=2, n_classes=3, random_state=None):
    +    """
    +    Generate synthetic multiclass data with n_classes Gaussian clusters.
    +    """
    +    rng = np.random.RandomState(random_state)
    +    X = []
    +    y = []
    +    samples_per_class = n_samples // n_classes
    +    for cls in range(n_classes):
    +        # Random cluster center for each class
    +        center = rng.uniform(-5, 5, size=n_features)
    +        Xi = rng.randn(samples_per_class, n_features) + center
    +        yi = [cls] * samples_per_class
    +        X.append(Xi)
    +        y.extend(yi)
    +    X = np.vstack(X)
    +    y = np.array(y)
    +    return X, y
    +
    +
    +# Generate and test on binary data
    +X_bin, y_bin = generate_binary_data(n_samples=200, n_features=2, random_state=42)
    +model_bin = LogisticRegression(lr=0.1, epochs=1000)
    +model_bin.fit(X_bin, y_bin)
    +y_prob_bin = model_bin.predict_prob(X_bin)      # probabilities for class 1
    +y_pred_bin = model_bin.predict(X_bin)           # predicted classes 0 or 1
    +
    +acc_bin = accuracy_score(y_bin, y_pred_bin)
    +loss_bin = binary_cross_entropy(y_bin, y_prob_bin)
    +print(f"Binary Classification - Accuracy: {acc_bin:.2f}, Cross-Entropy Loss: {loss_bin:.2f}")
    +#For multiclass:
    +# Generate and test on multiclass data
    +X_multi, y_multi = generate_multiclass_data(n_samples=300, n_features=2, n_classes=3, random_state=1)
    +model_multi = LogisticRegression(lr=0.1, epochs=1000)
    +model_multi.fit(X_multi, y_multi)
    +y_prob_multi = model_multi.predict_prob(X_multi)     # (n_samples x 3) probabilities
    +y_pred_multi = model_multi.predict(X_multi)          # predicted labels 0,1,2
    +
    +acc_multi = accuracy_score(y_multi, y_pred_multi)
    +loss_multi = categorical_cross_entropy(y_multi, y_prob_multi)
    +print(f"Multiclass Classification - Accuracy: {acc_multi:.2f}, Cross-Entropy Loss: {loss_multi:.2f}")
    +
    +# CSV Export
    +import csv
    +
    +# Export binary results
    +with open('binary_results.csv', mode='w', newline='') as f:
    +    writer = csv.writer(f)
    +    writer.writerow(["TrueLabel", "PredictedLabel"])
    +    for true, pred in zip(y_bin, y_pred_bin):
    +        writer.writerow([true, pred])
    +
    +# Export multiclass results
    +with open('multiclass_results.csv', mode='w', newline='') as f:
    +    writer = csv.writer(f)
    +    writer.writerow(["TrueLabel", "PredictedLabel"])
    +    for true, pred in zip(y_multi, y_pred_multi):
    +        writer.writerow([true, pred])
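The code above assumes a LogisticRegression class and the helper functions accuracy_score, binary_cross_entropy and categorical_cross_entropy defined earlier in the notebook; they are not part of this excerpt. A minimal sketch of the binary versions of these helpers (the multiclass calls would need a softmax-based variant) could look like this:

import numpy as np

def accuracy_score(y_true, y_pred):
    # Fraction of correctly predicted labels
    return np.mean(np.asarray(y_true) == np.asarray(y_pred))

def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    # Average negative log-likelihood for targets in {0,1}
    y_prob = np.clip(y_prob, eps, 1.0 - eps)
    return -np.mean(y_true*np.log(y_prob) + (1 - y_true)*np.log(1.0 - y_prob))

class LogisticRegression:
    # Plain gradient-descent binary logistic regression (sketch only)
    def __init__(self, lr=0.1, epochs=1000):
        self.lr = lr
        self.epochs = epochs

    def fit(self, X, y):
        n, p = X.shape
        self.weights = np.zeros(p)
        self.bias = 0.0
        for _ in range(self.epochs):
            z = X @ self.weights + self.bias
            prob = 1.0/(1.0 + np.exp(-z))
            # Gradient of the cross-entropy with respect to weights and bias
            self.weights -= self.lr*(X.T @ (prob - y))/n
            self.bias -= self.lr*np.mean(prob - y)
        return self

    def predict_prob(self, X):
        z = X @ self.weights + self.bias
        return 1.0/(1.0 + np.exp(-z))

    def predict(self, X):
        return (self.predict_prob(X) >= 0.5).astype(int)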
     
    diff --git a/doc/pub/week39/html/week39-solarized.html b/doc/pub/week39/html/week39-solarized.html index 36a7e748b..6a8e90be3 100644 --- a/doc/pub/week39/html/week39-solarized.html +++ b/doc/pub/week39/html/week39-solarized.html @@ -8,8 +8,8 @@ - -Week 39: Optimization and Gradient Methods + +Week 39: Resampling methods and logistic regression @@ -63,248 +63,132 @@ @@ -326,19 +210,16 @@
    -

    Week 39: Optimization and Gradient Methods

    +

    Week 39: Resampling methods and logistic regression

-Morten Hjorth-Jensen [1, 2]
-[1] Department of Physics, University of Oslo
-[2] Department of Physics and Astronomy and Facility for Rare Isotope Beams, Michigan State University
+Morten Hjorth-Jensen
+Department of Physics, University of Oslo

    @@ -347,28 +228,46 @@

    Week 39












    -

    Plan for week 39, September 23-27, 2024

    +

    Plan for week 39, September 22-26, 2025

    + +
    +Material for the lecture on Monday September 22 +

    +

      +
    1. Resampling techniques, Bootstrap and cross validation and bias-variance tradeoff
    2. +
    3. Logistic regression, our first classification encounter and a stepping stone towards neural networks
    4. +
    5. Video of lecture
    6. +
    7. Whiteboard notes
    8. +
    +
    +









    -

    Lecture Monday September 23

    +

    Readings and Videos, resampling methods

    +
    + +

    +

      +
    1. Raschka et al, pages 175-192
    2. +
    3. Hastie et al Chapter 7, here we recommend 7.1-7.5 and 7.10 (cross-validation) and 7.11 (bootstrap). See https://link.springer.com/book/10.1007/978-0-387-84858-7.
    4. +
    5. Video on bias-variance tradeoff
    6. +
    7. Video on Bootstrapping
    8. +
    9. Video on cross validation
    10. +
    +
    + +









    +

    Readings and Videos, logistic regression

    -Material for the lecture on Monday September 23 +

    -

      -
    • Repetition of Logistic regression equations and classification problems and discussion of Gradient methods. Examples on how to implement Logistic Regression and discussion of gradient methods
    • -
    • Stochastic Gradient descent with examples and automatic differentiation (theme also for next week).
    • -
    • Video of lecture
    • -
    • Whiteboard notes
    • -
    • Readings and Videos:
    • -
        -
      • These lecture notes
      • -
      • For a good discussion on gradient methods, we would like to recommend Goodfellow et al section 4.3-4.5 and sections 8.3-8.6. We will come back to the latter chapter in our discussion of Neural networks as well.
      • -
      • Raschka et al, pages 53-76 on Logistic regression and pages 37-52 on gradient optimization
      • -
      • Video on gradient descent
      • -
      • Video on stochastic gradient descent
      • -
      -
    +
      +
    1. Hastie et al 4.1, 4.2 and 4.3 on logistic regression
    2. +
    3. Raschka et al, pages 53-76 on Logistic regression and pages 37-52 on gradient optimization
    4. +
    5. Video on Logistic regression
    6. +
    7. Yet another video on logistic regression
    8. +
    @@ -376,554 +275,573 @@

    Lecture Monday September 23

    Lab sessions week 39

    -Material for the active learning sessions on Tuesday and Wednesday +Material for the lab sessions on Tuesday and Wednesday

    -

    +
      +
    1. Discussions on how to structure your report for the first project
    2. +
    3. Exercise for week 39 on how to write the abstract and the introduction of the report and how to include references.
    4. +
    5. Work on project 1, in particular resampling methods like cross-validation and bootstrap. For more discussions of project 1, chapter 5 of Goodfellow et al is a good read, in particular sections 5.1-5.5 and 5.7-5.11.
    6. +
    7. Video on how to write scientific reports recorded during one of the lab sessions
    8. +
    9. A general guideline can be found at https://github.com/CompPhysics/MachineLearning/blob/master/doc/Projects/EvaluationGrading/EvaluationForm.md.
    10. +










    -

Lecture Monday September 23, Optimization, the central part of any Machine Learning algorithm

    - -

    The first few slides here are a repetition from last week.

    - -

Almost every problem in machine learning and data science starts with a dataset \( X \), a model \( g(\beta) \), which is a function of the parameters \( \beta \), and a cost function \( C(X, g(\beta)) \) that allows us to judge how well the model \( g(\beta) \) explains the observations \( X \). The model is fit by finding the values of \( \beta \) that minimize the cost function. Ideally we would be able to solve for \( \beta \) analytically; however, this is not possible in general and we must use some approximate/numerical method to compute the minimum.

    +

    Lecture material











    -

    Revisiting our Logistic Regression case

    - -

In our discussion on Logistic Regression we studied the case of two classes, with \( y_i \) either \( 0 \) or \( 1 \). Furthermore we assumed also that we have only two parameters \( \beta \) in our fitting, that is we defined probabilities

    - -$$ -\begin{align*} -p(y_i=1|x_i,\boldsymbol{\beta}) &= \frac{\exp{(\beta_0+\beta_1x_i)}}{1+\exp{(\beta_0+\beta_1x_i)}},\nonumber\\ -p(y_i=0|x_i,\boldsymbol{\beta}) &= 1 - p(y_i=1|x_i,\boldsymbol{\beta}), -\end{align*} -$$ +

    Resampling methods

    +
    + +

    +

Resampling methods are an indispensable tool in modern statistics. They involve repeatedly drawing samples from a training set and refitting a model of interest on each sample in order to obtain additional information about the fitted model. For example, in order to estimate the variability of a linear regression fit, we can repeatedly draw different samples from the training data, fit a linear regression to each new sample, and then examine the extent to which the resulting fits differ. Such an approach may allow us to obtain information that would not be available from fitting the model only once using the original training sample.

    + +

    Two resampling methods are often used in Machine Learning analyses,

    +
      +
    1. The bootstrap method
    2. +
    3. and Cross-Validation
    4. +
    +

In addition there are several other methods such as the Jackknife and the Blocking methods. This week we will repeat some of the elements of the bootstrap method and focus more on cross-validation.

    +
    -

    where \( \boldsymbol{\beta} \) are the weights we wish to extract from data, in our case \( \beta_0 \) and \( \beta_1 \).











    -

    The equations to solve

    - -

Our compact equations used a definition of a vector \( \boldsymbol{y} \) with \( n \) elements \( y_i \), an \( n\times p \) matrix \( \boldsymbol{X} \) which contains the \( x_i \) values and a vector \( \boldsymbol{p} \) of fitted probabilities \( p(y_i\vert x_i,\boldsymbol{\beta}) \). We rewrote in a more compact form the first derivative of the cost function as

    - -$$ -\frac{\partial \mathcal{C}(\boldsymbol{\beta})}{\partial \boldsymbol{\beta}} = -\boldsymbol{X}^T\left(\boldsymbol{y}-\boldsymbol{p}\right). -$$ +

    Resampling approaches can be computationally expensive

    +
    + +

    -

If we in addition define a diagonal matrix \( \boldsymbol{W} \) with elements \( p(y_i\vert x_i,\boldsymbol{\beta})(1-p(y_i\vert x_i,\boldsymbol{\beta})) \), we can obtain a compact expression of the second derivative as

Resampling approaches can be computationally expensive, because they involve fitting the same statistical method multiple times using different subsets of the training data. However, due to recent advances in computing power, the computational requirements of resampling methods generally are not prohibitive. In this chapter, we discuss two of the most commonly used resampling methods, cross-validation and the bootstrap. Both methods are important tools in the practical application of many statistical learning procedures. For example, cross-validation can be used to estimate the test error associated with a given statistical learning method in order to evaluate its performance, or to select the appropriate level of flexibility. The process of evaluating a model's performance is known as model assessment, whereas the process of selecting the proper level of flexibility for a model is known as model selection. The bootstrap is widely used.

    +
    -$$ -\frac{\partial^2 \mathcal{C}(\boldsymbol{\beta})}{\partial \boldsymbol{\beta}\partial \boldsymbol{\beta}^T} = \boldsymbol{X}^T\boldsymbol{W}\boldsymbol{X}. -$$ - -

    This defines what is called the Hessian matrix.











    -

    Solving using Newton-Raphson's method

    +

    Why resampling methods ?

    +
    +Statistical analysis +

    -

    If we can set up these equations, Newton-Raphson's iterative method is normally the method of choice. It requires however that we can compute in an efficient way the matrices that define the first and second derivatives.

    +
      +
    • Our simulations can be treated as computer experiments. This is particularly the case for Monte Carlo methods which are widely used in statistical analyses.
    • +
    • The results can be analysed with the same statistical tools as we would use when analysing experimental data.
    • +
    • As in all experiments, we are looking for expectation values and an estimate of how accurate they are, i.e., possible sources for errors.
    • +
    +
    + -

    Our iterative scheme is then given by

    +









    +

    Statistical analysis

    +
    + +

    -$$ -\boldsymbol{\beta}^{\mathrm{new}} = \boldsymbol{\beta}^{\mathrm{old}}-\left(\frac{\partial^2 \mathcal{C}(\boldsymbol{\beta})}{\partial \boldsymbol{\beta}\partial \boldsymbol{\beta}^T}\right)^{-1}_{\boldsymbol{\beta}^{\mathrm{old}}}\times \left(\frac{\partial \mathcal{C}(\boldsymbol{\beta})}{\partial \boldsymbol{\beta}}\right)_{\boldsymbol{\beta}^{\mathrm{old}}}, -$$ +

      +
    • As in other experiments, many numerical experiments have two classes of errors:
    • +
        +
      • Statistical errors
      • +
      • Systematical errors
      • +
      +
    • Statistical errors can be estimated using standard tools from statistics
    • +
    • Systematical errors are method specific and must be treated differently from case to case.
    • +
    +
    + -

    or in matrix form as

    +









    +

    Resampling methods

    -$$ -\boldsymbol{\beta}^{\mathrm{new}} = \boldsymbol{\beta}^{\mathrm{old}}-\left(\boldsymbol{X}^T\boldsymbol{W}\boldsymbol{X} \right)^{-1}\times \left(-\boldsymbol{X}^T(\boldsymbol{y}-\boldsymbol{p}) \right)_{\boldsymbol{\beta}^{\mathrm{old}}}. -$$ +

    With all these analytical equations for both the OLS and Ridge +regression, we will now outline how to assess a given model. This will +lead to a discussion of the so-called bias-variance tradeoff (see +below) and so-called resampling methods. +

    -

    The right-hand side is computed with the old values of \( \beta \).

    +

    One of the quantities we have discussed as a way to measure errors is +the mean-squared error (MSE), mainly used for fitting of continuous +functions. Another choice is the absolute error. +

    -

    If we can compute these matrices, in particular the Hessian, the above is often the easiest method to implement.
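A minimal NumPy sketch of this Newton-Raphson scheme for the two-parameter logistic regression case, with synthetic data chosen here only for illustration, could look like this:

import numpy as np

def sigmoid(z):
    return 1.0/(1.0 + np.exp(-z))

# Synthetic two-class data: intercept plus one feature, labels in {0,1}
rng = np.random.default_rng(2025)
n = 100
x = rng.normal(size=(n, 1))
X = np.c_[np.ones((n, 1)), x]                      # design matrix
y = (x[:, 0] + rng.normal(size=n) > 0).astype(float)

beta = np.zeros(2)
for iteration in range(10):
    p = sigmoid(X @ beta)                          # fitted probabilities
    gradient = -X.T @ (y - p)                      # first derivative of the cost
    W = np.diag(p*(1.0 - p))                       # diagonal matrix with p(1-p)
    hessian = X.T @ W @ X                          # second derivative (Hessian)
    beta = beta - np.linalg.solve(hessian, gradient)

print("beta from Newton-Raphson:", beta)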

    +

    In the discussions below we will focus on the MSE and in particular since we will split the data into test and training data, +we discuss the +

    +
      +
    1. prediction error or simply the test error \( \mathrm{Err_{Test}} \), where we have a fixed training set and the test error is the MSE arising from the data reserved for testing. We discuss also the
    2. +
    3. training error \( \mathrm{Err_{Train}} \), which is the average loss over the training data.
    4. +
    +

As our model becomes more and more complex, more of the training data tends to be used. The model may then adapt to more complicated structures in the data. This may lead to a decrease in the bias (see below for a code example) and a slight increase of the variance for the test error. For a certain level of complexity the test error will reach a minimum, before starting to increase again. The training error reaches a saturation.











    -

    Brief reminder on Newton-Raphson's method

    +

    Resampling methods: Bootstrap

    +
    + +

    +

    Bootstrapping is a non-parametric approach to statistical inference +that substitutes computation for more traditional distributional +assumptions and asymptotic results. Bootstrapping offers a number of +advantages: +

    +
      +
    1. The bootstrap is quite general, although there are some cases in which it fails.
    2. +
    3. Because it does not require distributional assumptions (such as normally distributed errors), the bootstrap can provide more accurate inferences when the data are not well behaved or when the sample size is small.
    4. +
    5. It is possible to apply the bootstrap to statistics with sampling distributions that are difficult to derive, even asymptotically.
    6. +
    7. It is relatively simple to apply the bootstrap to complex data-collection plans (such as stratified and clustered samples).
    8. +
    +
    -

    Let us quickly remind ourselves how we derive the above method.

    -

    Perhaps the most celebrated of all one-dimensional root-finding -routines is Newton's method, also called the Newton-Raphson -method. This method requires the evaluation of both the -function \( f \) and its derivative \( f' \) at arbitrary points. -If you can only calculate the derivative -numerically and/or your function is not of the smooth type, we -normally discourage the use of this method. -

    +

    The textbook by Davison on the Bootstrap Methods and their Applications provides many more insights and proofs. In this course we will take a more practical approach and use the results and theorems provided in the literature. For those interested in reading more about the bootstrap methods, we recommend the above text and the one by Efron and Tibshirani.
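As a minimal illustration of the idea, the following sketch (with synthetic data chosen only for illustration) uses the non-parametric bootstrap to estimate the standard error of a sample mean:

import numpy as np

rng = np.random.default_rng(42)
data = rng.exponential(scale=2.0, size=500)      # skewed, non-Gaussian sample

n_bootstraps = 1000
boot_means = np.empty(n_bootstraps)
for b in range(n_bootstraps):
    # Resample the data with replacement and recompute the statistic
    sample = rng.choice(data, size=len(data), replace=True)
    boot_means[b] = np.mean(sample)

print("Sample mean:", np.mean(data))
print("Bootstrap estimate of the standard error:", np.std(boot_means))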











    -

    The equations

    +

    The bias-variance tradeoff

    -

    The Newton-Raphson formula consists geometrically of extending the -tangent line at a current point until it crosses zero, then setting -the next guess to the abscissa of that zero-crossing. The mathematics -behind this method is rather simple. Employing a Taylor expansion for -\( x \) sufficiently close to the solution \( s \), we have +

    We will discuss the bias-variance tradeoff in the context of +continuous predictions such as regression. However, many of the +intuitions and ideas discussed here also carry over to classification +tasks. Consider a dataset \( \mathcal{D} \) consisting of the data +\( \mathbf{X}_\mathcal{D}=\{(y_j, \boldsymbol{x}_j), j=0\ldots n-1\} \).

    +

    Let us assume that the true data is generated from a noisy model

    + $$ - f(s)=0=f(x)+(s-x)f'(x)+\frac{(s-x)^2}{2}f''(x) +\dots. - \label{eq:taylornr} +\boldsymbol{y}=f(\boldsymbol{x}) + \boldsymbol{\epsilon} $$ -

    For small enough values of the function and for well-behaved -functions, the terms beyond linear are unimportant, hence we obtain -

    +

where \( \epsilon \) is normally distributed with mean zero and variance \( \sigma^2 \).

    -$$ - f(x)+(s-x)f'(x)\approx 0, -$$ +

    In our derivation of the ordinary least squares method we defined then +an approximation to the function \( f \) in terms of the parameters +\( \boldsymbol{\theta} \) and the design matrix \( \boldsymbol{X} \) which embody our model, +that is \( \boldsymbol{\tilde{y}}=\boldsymbol{X}\boldsymbol{\theta} \). +

    -

    yielding

    +

Thereafter we found the parameters \( \boldsymbol{\theta} \) by optimizing the mean squared error via the so-called cost function

    $$ - s\approx x-\frac{f(x)}{f'(x)}. +C(\boldsymbol{X},\boldsymbol{\theta}) =\frac{1}{n}\sum_{i=0}^{n-1}(y_i-\tilde{y}_i)^2=\mathbb{E}\left[(\boldsymbol{y}-\boldsymbol{\tilde{y}})^2\right]. $$ -

    Having in mind an iterative procedure, it is natural to start iterating with

    +

    We can rewrite this as

    $$ - x_{n+1}=x_n-\frac{f(x_n)}{f'(x_n)}. +\mathbb{E}\left[(\boldsymbol{y}-\boldsymbol{\tilde{y}})^2\right]=\frac{1}{n}\sum_i(f_i-\mathbb{E}\left[\boldsymbol{\tilde{y}}\right])^2+\frac{1}{n}\sum_i(\tilde{y}_i-\mathbb{E}\left[\boldsymbol{\tilde{y}}\right])^2+\sigma^2. $$ - -









    -

    Simple geometric interpretation

    - -

The above is Newton-Raphson's method. It has a simple geometric interpretation, namely \( x_{n+1} \) is the point where the tangent from \( (x_n,f(x_n)) \) crosses the \( x \)-axis. Close to the solution, Newton-Raphson converges fast to the desired result. However, if we are far from a root, where the higher-order terms in the series are important, the Newton-Raphson formula can give grossly inaccurate results. For instance, the initial guess for the root might be so far from the true root as to let the search interval include a local maximum or minimum of the function. If an iteration places a trial guess near such a local extremum, so that the first derivative nearly vanishes, then Newton-Raphson may fail totally.
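A minimal sketch of the one-dimensional iteration \( x_{n+1}=x_n-f(x_n)/f'(x_n) \), here applied to \( f(x)=x^2-2 \) (an illustrative choice, not taken from the lecture code), could look like this:

import numpy as np

def f(x):
    return x**2 - 2.0          # root at sqrt(2)

def df(x):
    return 2.0*x               # analytical derivative

x = 1.0                        # initial guess, reasonably close to the root
for n in range(6):
    x = x - f(x)/df(x)         # Newton-Raphson update
    print(f"iteration {n}: x = {x:.12f}")

print("Reference value sqrt(2) =", np.sqrt(2.0))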

The first term represents the square of the bias of the learning method, which can be thought of as the error caused by the simplifying assumptions built into the method. The second term represents the variance of the chosen model and finally, the last term is the variance of the error \( \boldsymbol{\epsilon} \).

    -









    -

    Extending to more than one variable

    - -

    Newton's method can be generalized to systems of several non-linear equations -and variables. Consider the case with two equations +

To derive this equation, we need to recall that the variance of \( \boldsymbol{y} \) and \( \boldsymbol{\epsilon} \) are both equal to \( \sigma^2 \). The mean value of \( \boldsymbol{\epsilon} \) is by definition equal to zero. Furthermore, the function \( f \) is not a stochastic variable, idem for \( \boldsymbol{\tilde{y}} \). We use a more compact notation in terms of the expectation value

    $$ - \begin{array}{cc} f_1(x_1,x_2) &=0\\ - f_2(x_1,x_2) &=0,\end{array} -$$ - -

    which we Taylor expand to obtain

    - -$$ - \begin{array}{cc} 0=f_1(x_1+h_1,x_2+h_2)=&f_1(x_1,x_2)+h_1 - \partial f_1/\partial x_1+h_2 - \partial f_1/\partial x_2+\dots\\ - 0=f_2(x_1+h_1,x_2+h_2)=&f_2(x_1,x_2)+h_1 - \partial f_2/\partial x_1+h_2 - \partial f_2/\partial x_2+\dots - \end{array}. +\mathbb{E}\left[(\boldsymbol{y}-\boldsymbol{\tilde{y}})^2\right]=\mathbb{E}\left[(\boldsymbol{f}+\boldsymbol{\epsilon}-\boldsymbol{\tilde{y}})^2\right], $$ -

    Defining the Jacobian matrix \( {\bf \boldsymbol{J}} \) we have

    +

    and adding and subtracting \( \mathbb{E}\left[\boldsymbol{\tilde{y}}\right] \) we get

    $$ - {\bf \boldsymbol{J}}=\left( \begin{array}{cc} - \partial f_1/\partial x_1 & \partial f_1/\partial x_2 \\ - \partial f_2/\partial x_1 &\partial f_2/\partial x_2 - \end{array} \right), +\mathbb{E}\left[(\boldsymbol{y}-\boldsymbol{\tilde{y}})^2\right]=\mathbb{E}\left[(\boldsymbol{f}+\boldsymbol{\epsilon}-\boldsymbol{\tilde{y}}+\mathbb{E}\left[\boldsymbol{\tilde{y}}\right]-\mathbb{E}\left[\boldsymbol{\tilde{y}}\right])^2\right], $$ -

    we can rephrase Newton's method as

    +

    which, using the abovementioned expectation values can be rewritten as

    $$ -\left(\begin{array}{c} x_1^{n+1} \\ x_2^{n+1} \end{array} \right)= -\left(\begin{array}{c} x_1^{n} \\ x_2^{n} \end{array} \right)+ -\left(\begin{array}{c} h_1^{n} \\ h_2^{n} \end{array} \right), +\mathbb{E}\left[(\boldsymbol{y}-\boldsymbol{\tilde{y}})^2\right]=\mathbb{E}\left[(\boldsymbol{y}-\mathbb{E}\left[\boldsymbol{\tilde{y}}\right])^2\right]+\mathrm{Var}\left[\boldsymbol{\tilde{y}}\right]+\sigma^2, $$ -

    where we have defined

    -$$ - \left(\begin{array}{c} h_1^{n} \\ h_2^{n} \end{array} \right)= - -{\bf \boldsymbol{J}}^{-1} - \left(\begin{array}{c} f_1(x_1^{n},x_2^{n}) \\ f_2(x_1^{n},x_2^{n}) \end{array} \right). -$$ - -

We thus need to compute the inverse of the Jacobian matrix, and it is important to understand that difficulties may arise in case \( {\bf \boldsymbol{J}} \) is nearly singular.

    +

    that is the rewriting in terms of the so-called bias, the variance of the model \( \boldsymbol{\tilde{y}} \) and the variance of \( \boldsymbol{\epsilon} \).

    -

It is rather straightforward to extend the above scheme to systems of more than two non-linear equations. In our case, the Jacobian matrix is given by the Hessian that represents the second derivative of the cost function.

    +Note that in order to derive these equations we have assumed we can replace the unknown function \( \boldsymbol{f} \) with the target/output data \( \boldsymbol{y} \).









    -

    Steepest descent

    +

    A way to Read the Bias-Variance Tradeoff

    -

    The basic idea of gradient descent is -that a function \( F(\mathbf{x}) \), -\( \mathbf{x} \equiv (x_1,\cdots,x_n) \), decreases fastest if one goes from \( \bf {x} \) in the -direction of the negative gradient \( -\nabla F(\mathbf{x}) \). -

    +

    +
    +

    +
    +

    -

    It can be shown that if

    -$$ -\mathbf{x}_{k+1} = \mathbf{x}_k - \gamma_k \nabla F(\mathbf{x}_k), -$$ +









    +

    Understanding what happens

    -

    with \( \gamma_k > 0 \).

    + +
    +
    +
    +
    +
    +
    import matplotlib.pyplot as plt
    +import numpy as np
    +from sklearn.linear_model import LinearRegression, Ridge, Lasso
    +from sklearn.preprocessing import PolynomialFeatures
    +from sklearn.model_selection import train_test_split
    +from sklearn.pipeline import make_pipeline
    +from sklearn.utils import resample
    +
    +np.random.seed(2018)
    +
    +n = 40
    +n_boostraps = 100
    +maxdegree = 14
    +
    +
    +# Make data set.
    +x = np.linspace(-3, 3, n).reshape(-1, 1)
    +y = np.exp(-x**2) + 1.5 * np.exp(-(x-2)**2)+ np.random.normal(0, 0.1, x.shape)
    +error = np.zeros(maxdegree)
    +bias = np.zeros(maxdegree)
    +variance = np.zeros(maxdegree)
    +polydegree = np.zeros(maxdegree)
    +x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)
    +
    +for degree in range(maxdegree):
    +    model = make_pipeline(PolynomialFeatures(degree=degree), LinearRegression(fit_intercept=False))
    +    y_pred = np.empty((y_test.shape[0], n_boostraps))
    +    for i in range(n_boostraps):
    +        x_, y_ = resample(x_train, y_train)
    +        y_pred[:, i] = model.fit(x_, y_).predict(x_test).ravel()
    +
    +    polydegree[degree] = degree
    +    error[degree] = np.mean( np.mean((y_test - y_pred)**2, axis=1, keepdims=True) )
    +    bias[degree] = np.mean( (y_test - np.mean(y_pred, axis=1, keepdims=True))**2 )
    +    variance[degree] = np.mean( np.var(y_pred, axis=1, keepdims=True) )
    +    print('Polynomial degree:', degree)
    +    print('Error:', error[degree])
    +    print('Bias^2:', bias[degree])
    +    print('Var:', variance[degree])
    +    print('{} >= {} + {} = {}'.format(error[degree], bias[degree], variance[degree], bias[degree]+variance[degree]))
    +
    +plt.plot(polydegree, error, label='Error')
    +plt.plot(polydegree, bias, label='bias')
    +plt.plot(polydegree, variance, label='Variance')
    +plt.legend()
    +plt.show()
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    -

    For \( \gamma_k \) small enough, then \( F(\mathbf{x}_{k+1}) \leq -F(\mathbf{x}_k) \). This means that for a sufficiently small \( \gamma_k \) -we are always moving towards smaller function values, i.e a minimum. -

    -

    More on Steepest descent

    +

    Summing up

    -

    The previous observation is the basis of the method of steepest -descent, which is also referred to as just gradient descent (GD). One -starts with an initial guess \( \mathbf{x}_0 \) for a minimum of \( F \) and -computes new approximations according to +

    The bias-variance tradeoff summarizes the fundamental tension in +machine learning, particularly supervised learning, between the +complexity of a model and the amount of training data needed to train +it. Since data is often limited, in practice it is often useful to +use a less-complex model with higher bias, that is a model whose asymptotic +performance is worse than another model because it is easier to +train and less sensitive to sampling noise arising from having a +finite-sized training dataset (smaller variance).

    -$$ -\mathbf{x}_{k+1} = \mathbf{x}_k - \gamma_k \nabla F(\mathbf{x}_k), \ \ k \geq 0. -$$ - -

    The parameter \( \gamma_k \) is often referred to as the step length or -the learning rate within the context of Machine Learning. +

    The above equations tell us that in +order to minimize the expected test error, we need to select a +statistical learning method that simultaneously achieves low variance +and low bias. Note that variance is inherently a nonnegative quantity, +and squared bias is also nonnegative. Hence, we see that the expected +test MSE can never lie below \( Var(\epsilon) \), the irreducible error.

    - -

    The ideal

    - -

    Ideally the sequence \( \{\mathbf{x}_k \}_{k=0} \) converges to a global -minimum of the function \( F \). In general we do not know if we are in a -global or local minimum. In the special case when \( F \) is a convex -function, all local minima are also global minima, so in this case -gradient descent can converge to the global solution. The advantage of -this scheme is that it is conceptually simple and straightforward to -implement. However the method in this form has some severe -limitations: +

    What do we mean by the variance and bias of a statistical learning +method? The variance refers to the amount by which our model would change if we +estimated it using a different training data set. Since the training +data are used to fit the statistical learning method, different +training data sets will result in a different estimate. But ideally the +estimate for our model should not vary too much between training +sets. However, if a method has high variance then small changes in +the training data can result in large changes in the model. In general, more +flexible statistical methods have higher variance.

    -

In machine learning we are often faced with non-convex high-dimensional cost functions with many local minima. Since GD is deterministic we will get stuck in a local minimum, if the method converges, unless we have a very good initial guess. This also implies that the scheme is sensitive to the chosen initial condition.

    +

    You may also find this recent article of interest.

    -

    Note that the gradient is a function of \( \mathbf{x} = -(x_1,\cdots,x_n) \) which makes it expensive to compute numerically. -

    +









    +

    Another Example from Scikit-Learn's Repository

    - -

    The sensitiveness of the gradient descent

    - -

    The gradient descent method -is sensitive to the choice of learning rate \( \gamma_k \). This is due -to the fact that we are only guaranteed that \( F(\mathbf{x}_{k+1}) \leq -F(\mathbf{x}_k) \) for sufficiently small \( \gamma_k \). The problem is to -determine an optimal learning rate. If the learning rate is chosen too -small the method will take a long time to converge and if it is too -large we can experience erratic behavior. +

This example demonstrates the problems of underfitting and overfitting and how we can use linear regression with polynomial features to approximate nonlinear functions. The plot shows the function that we want to approximate, which is a part of the cosine function. In addition, the samples from the real function and the approximations of different models are displayed. The models have polynomial features of different degrees. We can see that a linear function (polynomial with degree 1) is not sufficient to fit the training samples. This is called underfitting. A polynomial of degree 4 approximates the true function almost perfectly. However, for higher degrees the model will overfit the training data, i.e. it learns the noise of the training data. We evaluate overfitting and underfitting quantitatively by using cross-validation. We calculate the mean squared error (MSE) on the validation set; the higher it is, the less likely it is that the model generalizes correctly from the training data.

    -

    Many of these shortcomings can be alleviated by introducing -randomness. One such method is that of Stochastic Gradient Descent -(SGD), see below. -

    - -

    Convex functions

    + +
    +
    +
    +
    +
    +
    #print(__doc__)
     
    -

Ideally we want our cost/loss function to be convex (or concave).

    +import numpy as np +import matplotlib.pyplot as plt +from sklearn.pipeline import Pipeline +from sklearn.preprocessing import PolynomialFeatures +from sklearn.linear_model import LinearRegression +from sklearn.model_selection import cross_val_score + + +def true_fun(X): + return np.cos(1.5 * np.pi * X) + +np.random.seed(0) + +n_samples = 30 +degrees = [1, 4, 15] + +X = np.sort(np.random.rand(n_samples)) +y = true_fun(X) + np.random.randn(n_samples) * 0.1 + +plt.figure(figsize=(14, 5)) +for i in range(len(degrees)): + ax = plt.subplot(1, len(degrees), i + 1) + plt.setp(ax, xticks=(), yticks=()) + + polynomial_features = PolynomialFeatures(degree=degrees[i], + include_bias=False) + linear_regression = LinearRegression() + pipeline = Pipeline([("polynomial_features", polynomial_features), + ("linear_regression", linear_regression)]) + pipeline.fit(X[:, np.newaxis], y) + + # Evaluate the models using crossvalidation + scores = cross_val_score(pipeline, X[:, np.newaxis], y, + scoring="neg_mean_squared_error", cv=10) + + X_test = np.linspace(0, 1, 100) + plt.plot(X_test, pipeline.predict(X_test[:, np.newaxis]), label="Model") + plt.plot(X_test, true_fun(X_test), label="True function") + plt.scatter(X, y, edgecolor='b', s=20, label="Samples") + plt.xlabel("x") + plt.ylabel("y") + plt.xlim((0, 1)) + plt.ylim((-2, 2)) + plt.legend(loc="best") + plt.title("Degree {}\nMSE = {:.2e}(+/- {:.2e})".format( + degrees[i], -scores.mean(), scores.std())) +plt.show() +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    -

First we give the definition of a convex set: A set \( C \) in \( \mathbb{R}^n \) is said to be convex if, for all \( x \) and \( y \) in \( C \) and all \( t \in (0,1) \), the point \( (1-t)x + ty \) also belongs to \( C \). Geometrically this means that every point on the line segment connecting \( x \) and \( y \) is in \( C \), as discussed below.

    -

    The convex subsets of \( \mathbb{R} \) are the intervals of -\( \mathbb{R} \). Examples of convex sets of \( \mathbb{R}^2 \) are the -regular polygons (triangles, rectangles, pentagons, etc...). -

    + +

    Various steps in cross-validation

    -









    -

    Convex function

    - -

Convex function: Let \( X \subset \mathbb{R}^n \) be a convex set. Assume that the function \( f: X \rightarrow \mathbb{R} \) is continuous, then \( f \) is said to be convex if \( f(tx_1 + (1-t)x_2) \leq tf(x_1) + (1-t)f(x_2) \) for all \( x_1, x_2 \in X \) and for all \( t \in [0,1] \). If \( \leq \) is replaced with a strict inequality in the definition, and we demand \( x_1 \neq x_2 \) and \( t\in(0,1) \), then \( f \) is said to be strictly convex. For a single variable function, convexity means that if you draw a straight line connecting \( f(x_1) \) and \( f(x_2) \), the value of the function on the interval \( [x_1,x_2] \) is always below the line, as illustrated below.

When the repetitive splitting of the data set is done randomly, samples may accidentally end up in a vast majority of the splits in either the training or the test set. Such samples may have an unbalanced influence on either model building or prediction evaluation. To avoid this, \( k \)-fold cross-validation structures the data splitting. The samples are divided into \( k \) more or less equally sized exhaustive and mutually exclusive subsets. In turn (at each split) one of these subsets plays the role of the test set while the union of the remaining subsets constitutes the training set. Such a splitting warrants a balanced representation of each sample in both training and test set over the splits. Still the division into the \( k \) subsets involves a degree of randomness. This may be fully excluded when choosing \( k=n \). This particular case is referred to as leave-one-out cross-validation (LOOCV).
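The special case \( k=n \) can be run directly with Scikit-Learn's LeaveOneOut splitter; a minimal sketch on synthetic data (chosen only for illustration) reads:

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score

rng = np.random.default_rng(3155)
x = rng.normal(size=50)
y = 3*x**2 + rng.normal(size=50)
X = np.c_[x, x**2]                          # simple two-column design matrix

loo = LeaveOneOut()                         # k = n splits, one sample per test fold
scores = cross_val_score(LinearRegression(), X, y,
                         scoring='neg_mean_squared_error', cv=loo)
print("LOOCV estimate of the MSE:", -np.mean(scores))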











    -

    Conditions on convex functions

    +

    Cross-validation in brief

    -

In the following we state first and second-order conditions which ensure convexity of a function \( f \). We write \( D_f \) to denote the domain of \( f \), i.e., the subset of \( \mathbb{R}^n \) where \( f \) is defined. For more details and proofs we refer to: S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press.

    - -
    -First order condition -

    -

Suppose \( f \) is differentiable (i.e., \( \nabla f(x) \) is well defined for all \( x \) in the domain of \( f \)). Then \( f \) is convex if and only if \( D_f \) is a convex set and \( f(y) \geq f(x) + \nabla f(x)^T (y-x) \) holds for all \( x,y \in D_f \).

    - -

This condition means that for a convex function the first-order Taylor expansion (right-hand side above) at any point is a global underestimator of the function. To convince yourself you can make a drawing of \( f(x) = x^2+1 \) and draw the tangent line to \( f(x) \) and note that it is always below the graph.

    -
    - - -
    -Second order condition -

    -

Assume that \( f \) is twice differentiable, i.e., the Hessian matrix exists at each point in \( D_f \). Then \( f \) is convex if and only if \( D_f \) is a convex set and its Hessian is positive semi-definite for all \( x\in D_f \). For a single-variable function this reduces to \( f''(x) \geq 0 \). Geometrically this means that \( f \) has nonnegative curvature everywhere.

    -
    - - -

This condition is particularly useful since it gives us a procedure for determining if the function under consideration is convex, apart from using the definition.
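As a small numerical illustration of the second-order condition, one can check that all eigenvalues of the Hessian are nonnegative at a few points; the sketch below does this for \( f(x_1,x_2)=x_1^2+3x_2^2 \), whose Hessian is constant (an illustrative choice, not from the lecture code):

import numpy as np

# f(x1, x2) = x1^2 + 3*x2^2 has the constant Hessian [[2, 0], [0, 6]]
def hessian(x):
    return np.array([[2.0, 0.0],
                     [0.0, 6.0]])

# Convex everywhere <=> Hessian positive semi-definite, i.e. all eigenvalues nonnegative
for point in [np.array([0.0, 0.0]), np.array([1.0, -2.0]), np.array([-3.0, 0.5])]:
    eigvals = np.linalg.eigvalsh(hessian(point))
    print(point, "eigenvalues:", eigvals, "convex here:", bool(np.all(eigvals >= 0)))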

    +

    For the various values of \( k \)

    +
      +
    1. shuffle the dataset randomly.
    2. +
    3. Split the dataset into \( k \) groups.
    4. +
    5. For each unique group: +
        +
      1. Decide which group to use as set for test data
      2. +
      3. Take the remaining groups as a training data set
      4. +
      5. Fit a model on the training set and evaluate it on the test set
      6. +
      7. Retain the evaluation score and discard the model
      8. +
      +
    6. Summarize the model using the sample of model evaluation scores
    7. +










    -

    More on convex functions

    - -

    The next result is of great importance to us and the reason why we are -going on about convex functions. In machine learning we frequently -have to minimize a loss/cost function in order to find the best -parameters for the model we are considering. -

    - -

    Ideally we want the -global minimum (for high-dimensional models it is hard to know -if we have local or global minimum). However, if the cost/loss function -is convex the following result provides invaluable information: -

    +

    Code Example for Cross-validation and \( k \)-fold Cross-validation

    -
    -Any minimum is global for convex functions -

    -

    Consider the problem of finding \( x \in \mathbb{R}^n \) such that \( f(x) \) -is minimal, where \( f \) is convex and differentiable. Then, any point -\( x^* \) that satisfies \( \nabla f(x^*) = 0 \) is a global minimum. -

    -
    +

    The code here uses Ridge regression with cross-validation (CV) resampling and \( k \)-fold CV in order to fit a specific polynomial.

    + +
    +
    +
    +
    +
    +
    import numpy as np
    +import matplotlib.pyplot as plt
    +from sklearn.model_selection import KFold
    +from sklearn.linear_model import Ridge
    +from sklearn.model_selection import cross_val_score
    +from sklearn.preprocessing import PolynomialFeatures
     
    -

    This result means that if we know that the cost/loss function is convex and we are able to find a minimum, we are guaranteed that it is a global minimum.

    +# A seed just to ensure that the random numbers are the same for every run. +# Useful for eventual debugging. +np.random.seed(3155) -









    -

    Some simple problems

    +# Generate the data. +nsamples = 100 +x = np.random.randn(nsamples) +y = 3*x**2 + np.random.randn(nsamples) -
      -
1. Show that \( f(x)=x^2 \) is convex for \( x \in \mathbb{R} \) using the definition of convexity. Hint: If you re-write the definition, \( f \) is convex if the following holds for all \( x,y \in D_f \) and any \( \lambda \in [0,1] \): \( \lambda f(x)+(1-\lambda)f(y)-f(\lambda x + (1-\lambda) y ) \geq 0 \).
    2. -
    3. Using the second order condition show that the following functions are convex on the specified domain.
    4. -
        -
      • \( f(x) = e^x \) is convex for \( x \in \mathbb{R} \).
      • -
      • \( g(x) = -\ln(x) \) is convex for \( x \in (0,\infty) \).
      • -
      -
5. Let \( f(x) = x^2 \) and \( g(x) = e^x \). Show that \( f(g(x)) \) and \( g(f(x)) \) are convex for \( x \in \mathbb{R} \). Also show that if \( f(x) \) is any convex function then \( h(x) = e^{f(x)} \) is convex.
    6. -
    7. A norm is any function that satisfy the following properties
    8. -
        -
      • \( f(\alpha x) = |\alpha| f(x) \) for all \( \alpha \in \mathbb{R} \).
      • -
      • \( f(x+y) \leq f(x) + f(y) \)
      • -
• \( f(x) \geq 0 \) for all \( x \in \mathbb{R}^n \) with equality if and only if \( x = 0 \)
      • -
      -
    -

    Using the definition of convexity, try to show that a function satisfying the properties above is convex (the third condition is not needed to show this).

    +## Cross-validation on Ridge regression using KFold only -









    -

    Standard steepest descent

    +# Decide degree on polynomial to fit +poly = PolynomialFeatures(degree = 6) -

Before we proceed, we would like to discuss the approach called the standard Steepest descent (different from the above steepest descent discussion), which again requires us to be able to compute a matrix. It belongs to the class of Conjugate Gradient methods (CG).

    +# Decide which values of lambda to use +nlambdas = 500 +lambdas = np.logspace(-3, 5, nlambdas) -The success of the CG method -

    for finding solutions of non-linear problems is based on the theory -of conjugate gradients for linear systems of equations. It belongs to -the class of iterative methods for solving problems from linear -algebra of the type -

    -$$ -\begin{equation*} -\boldsymbol{A}\boldsymbol{x} = \boldsymbol{b}. -\end{equation*} -$$ +# Initialize a KFold instance +k = 5 +kfold = KFold(n_splits = k) -

    In the iterative process we end up with a problem like

    +# Perform the cross-validation to estimate MSE +scores_KFold = np.zeros((nlambdas, k)) -$$ -\begin{equation*} - \boldsymbol{r}= \boldsymbol{b}-\boldsymbol{A}\boldsymbol{x}, -\end{equation*} -$$ +i = 0 +for lmb in lambdas: + ridge = Ridge(alpha = lmb) + j = 0 + for train_inds, test_inds in kfold.split(x): + xtrain = x[train_inds] + ytrain = y[train_inds] -

    where \( \boldsymbol{r} \) is the so-called residual or error in the iterative process.

    + xtest = x[test_inds] + ytest = y[test_inds] -

    When we have found the exact solution, \( \boldsymbol{r}=0 \).

    + Xtrain = poly.fit_transform(xtrain[:, np.newaxis]) + ridge.fit(Xtrain, ytrain[:, np.newaxis]) -









    -

    Gradient method

    + Xtest = poly.fit_transform(xtest[:, np.newaxis]) + ypred = ridge.predict(Xtest) -

    The residual is zero when we reach the minimum of the quadratic equation

    -$$ -\begin{equation*} - P(\boldsymbol{x})=\frac{1}{2}\boldsymbol{x}^T\boldsymbol{A}\boldsymbol{x} - \boldsymbol{x}^T\boldsymbol{b}, -\end{equation*} -$$ + scores_KFold[i,j] = np.sum((ypred - ytest[:, np.newaxis])**2)/np.size(ypred) -

    with the constraint that the matrix \( \boldsymbol{A} \) is positive definite and -symmetric. This defines also the Hessian and we want it to be positive definite. -

    + j += 1 + i += 1 -









    -

    Steepest descent method

    -

    We denote the initial guess for \( \boldsymbol{x} \) as \( \boldsymbol{x}_0 \). -We can assume without loss of generality that -

    -$$ -\begin{equation*} -\boldsymbol{x}_0=0, -\end{equation*} -$$ +estimated_mse_KFold = np.mean(scores_KFold, axis = 1) -

    or consider the system

    -$$ -\begin{equation*} -\boldsymbol{A}\boldsymbol{z} = \boldsymbol{b}-\boldsymbol{A}\boldsymbol{x}_0, -\end{equation*} -$$ +## Cross-validation using cross_val_score from sklearn along with KFold -

    instead.

    +# kfold is an instance initialized above as: +# kfold = KFold(n_splits = k) -









    -

    Steepest descent method

    -
    - -

    -

    One can show that the solution \( \boldsymbol{x} \) is also the unique minimizer of the quadratic form

    -$$ -\begin{equation*} - f(\boldsymbol{x}) = \frac{1}{2}\boldsymbol{x}^T\boldsymbol{A}\boldsymbol{x} - \boldsymbol{x}^T \boldsymbol{x} , \quad \boldsymbol{x}\in\mathbf{R}^n. -\end{equation*} -$$ +estimated_mse_sklearn = np.zeros(nlambdas) +i = 0 +for lmb in lambdas: + ridge = Ridge(alpha = lmb) -

    This suggests taking the first basis vector \( \boldsymbol{r}_1 \) (see below for definition) -to be the gradient of \( f \) at \( \boldsymbol{x}=\boldsymbol{x}_0 \), -which equals -

    -$$ -\begin{equation*} -\boldsymbol{A}\boldsymbol{x}_0-\boldsymbol{b}, -\end{equation*} -$$ + X = poly.fit_transform(x[:, np.newaxis]) + estimated_mse_folds = cross_val_score(ridge, X, y[:, np.newaxis], scoring='neg_mean_squared_error', cv=kfold) -

and for \( \boldsymbol{x}_0=0 \) it is equal to \( -\boldsymbol{b} \).

    -
    + # cross_val_score return an array containing the estimated negative mse for every fold. + # we have to the the mean of every array in order to get an estimate of the mse of the model + estimated_mse_sklearn[i] = np.mean(-estimated_mse_folds) + i += 1 -









    -

    Final expressions

    -
    - -

    -

    We can compute the residual iteratively as

    -$$ -\begin{equation*} -\boldsymbol{r}_{k+1}=\boldsymbol{b}-\boldsymbol{A}\boldsymbol{x}_{k+1}, - \end{equation*} -$$ +## Plot and compare the slightly different ways to perform cross-validation -

    which equals

    -$$ -\begin{equation*} -\boldsymbol{b}-\boldsymbol{A}(\boldsymbol{x}_k+\alpha_k\boldsymbol{r}_k), - \end{equation*} -$$ +plt.figure() -

    or

    -$$ -\begin{equation*} -(\boldsymbol{b}-\boldsymbol{A}\boldsymbol{x}_k)-\alpha_k\boldsymbol{A}\boldsymbol{r}_k, - \end{equation*} -$$ +plt.plot(np.log10(lambdas), estimated_mse_sklearn, label = 'cross_val_score') +#plt.plot(np.log10(lambdas), estimated_mse_KFold, 'r--', label = 'KFold') -

    which gives

    +plt.xlabel('log10(lambda)') +plt.ylabel('mse') -$$ -\alpha_k = \frac{\boldsymbol{r}_k^T\boldsymbol{r}_k}{\boldsymbol{r}_k^T\boldsymbol{A}\boldsymbol{r}_k} -$$ +plt.legend() -

    leading to the iterative scheme

    -$$ -\begin{equation*} -\boldsymbol{x}_{k+1}=\boldsymbol{x}_k+\alpha_k\boldsymbol{r}_{k}, - \end{equation*} -$$ +plt.show() +
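A minimal NumPy sketch of this steepest-descent iteration for a small symmetric positive-definite system (the matrix and right-hand side below are chosen only for illustration) could look like this:

import numpy as np

A = np.array([[4.0, 1.0],
              [1.0, 3.0]])                  # symmetric positive definite
b = np.array([1.0, 2.0])

x = np.zeros(2)                             # initial guess x_0 = 0
for k in range(25):
    r = b - A @ x                           # residual r_k
    if np.linalg.norm(r) < 1e-10:           # stop when the residual vanishes
        break
    alpha = (r @ r)/(r @ (A @ r))           # optimal step length along r_k
    x = x + alpha*r                         # update x_{k+1} = x_k + alpha_k r_k

print("Steepest-descent solution:", x)
print("Direct solve for comparison:", np.linalg.solve(A, b))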
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +
    +










    -

    Steepest descent example

    +

    More examples on bootstrap and cross-validation and errors

    @@ -932,26 +850,84 @@

    Steepest descent example

    -
    import numpy as np
    -import numpy.linalg as la
    -
    -import scipy.optimize as sopt
    -
    -import matplotlib.pyplot as pt
    -from mpl_toolkits.mplot3d import axes3d
    -
    -def f(x):
    -    return x[0]**2 + 3.0*x[1]**2
    -
    -def df(x):
    -    return np.array([2*x[0], 6*x[1]])
    -
    -fig = pt.figure()
    -ax = fig.add_subplot(projection = '3d')
    -
    -xmesh, ymesh = np.mgrid[-3:3:50j,-3:3:50j]
    -fmesh = f(np.array([xmesh, ymesh]))
    -ax.plot_surface(xmesh, ymesh, fmesh)
    +  
    # Common imports
    +import os
    +import numpy as np
    +import pandas as pd
    +import matplotlib.pyplot as plt
    +from sklearn.linear_model import LinearRegression, Ridge, Lasso
    +from sklearn.model_selection import train_test_split
    +from sklearn.utils import resample
    +from sklearn.metrics import mean_squared_error
    +# Where to save the figures and data files
    +PROJECT_ROOT_DIR = "Results"
    +FIGURE_ID = "Results/FigureFiles"
    +DATA_ID = "DataFiles/"
    +
    +if not os.path.exists(PROJECT_ROOT_DIR):
    +    os.mkdir(PROJECT_ROOT_DIR)
    +
    +if not os.path.exists(FIGURE_ID):
    +    os.makedirs(FIGURE_ID)
    +
    +if not os.path.exists(DATA_ID):
    +    os.makedirs(DATA_ID)
    +
    +def image_path(fig_id):
    +    return os.path.join(FIGURE_ID, fig_id)
    +
    +def data_path(dat_id):
    +    return os.path.join(DATA_ID, dat_id)
    +
    +def save_fig(fig_id):
    +    plt.savefig(image_path(fig_id) + ".png", format='png')
    +
    +infile = open(data_path("EoS.csv"),'r')
    +
    +# Read the EoS data as  csv file and organize the data into two arrays with density and energies
    +EoS = pd.read_csv(infile, names=('Density', 'Energy'))
    +EoS['Energy'] = pd.to_numeric(EoS['Energy'], errors='coerce')
    +EoS = EoS.dropna()
    +Energies = EoS['Energy']
    +Density = EoS['Density']
    +#  The design matrix now as function of various polytrops
    +
    +Maxpolydegree = 30
    +X = np.zeros((len(Density),Maxpolydegree))
    +X[:,0] = 1.0
    +testerror = np.zeros(Maxpolydegree)
    +trainingerror = np.zeros(Maxpolydegree)
    +polynomial = np.zeros(Maxpolydegree)
    +
    +trials = 100
    +for polydegree in range(1, Maxpolydegree):
    +    polynomial[polydegree] = polydegree
    +    for degree in range(polydegree):
    +        X[:,degree] = Density**(degree/3.0)
    +
    +# loop over trials in order to estimate the expectation value of the MSE
    +    testerror[polydegree] = 0.0
    +    trainingerror[polydegree] = 0.0
    +    for samples in range(trials):
    +        x_train, x_test, y_train, y_test = train_test_split(X, Energies, test_size=0.2)
    +        model = LinearRegression(fit_intercept=False).fit(x_train, y_train)
    +        ypred = model.predict(x_train)
    +        ytilde = model.predict(x_test)
    +        testerror[polydegree] += mean_squared_error(y_test, ytilde)
    +        trainingerror[polydegree] += mean_squared_error(y_train, ypred) 
    +
    +    testerror[polydegree] /= trials
    +    trainingerror[polydegree] /= trials
    +    print("Degree of polynomial: %3d"% polynomial[polydegree])
    +    print("Mean squared error on training data: %.8f" % trainingerror[polydegree])
    +    print("Mean squared error on test data: %.8f" % testerror[polydegree])
    +
    +plt.plot(polynomial, np.log10(trainingerror), label='Training Error')
    +plt.plot(polynomial, np.log10(testerror), label='Test Error')
    +plt.xlabel('Polynomial degree')
    +plt.ylabel('log10[MSE]')
    +plt.legend()
    +plt.show()
     
    @@ -967,7 +943,12 @@

    Steepest descent example

    -

    And then as countor plot

    +

Note that we kept the intercept column in the design matrix here. This means that we need to set fit_intercept=False in the call to the Scikit-Learn function. Alternatively, we could have set up the design matrix \( X \) without the first column of ones and let Scikit-Learn fit the intercept.
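As a small check of this remark (an illustration added here, with made-up data rather than the EoS set), the two setups below should give essentially the same fit:

import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2024)
x = rng.uniform(0, 1, 50)
y = 2.0 + 3.0*x + 0.1*rng.standard_normal(50)

# Option 1: explicit column of ones, intercept handled by the design matrix
X_with_ones = np.column_stack((np.ones(len(x)), x))
fit1 = LinearRegression(fit_intercept=False).fit(X_with_ones, y)

# Option 2: no column of ones, let scikit-learn fit the intercept
X_plain = x.reshape(-1, 1)
fit2 = LinearRegression(fit_intercept=True).fit(X_plain, y)

print(fit1.coef_)                    # [intercept, slope]
print(fit2.intercept_, fit2.coef_)   # intercept and slope separately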

    + + +

    The same example but now with cross-validation

    + +

In this example we keep the intercept column again but add cross-validation in order to estimate the best possible value of the mean squared error.

    @@ -975,9 +956,73 @@

    Steepest descent example

    -
    pt.axis("equal")
    -pt.contour(xmesh, ymesh, fmesh)
    -guesses = [np.array([2, 2./5])]
    +  
    # Common imports
    +import os
    +import numpy as np
    +import pandas as pd
    +import matplotlib.pyplot as plt
    +from sklearn.linear_model import LinearRegression, Ridge, Lasso
    +from sklearn.metrics import mean_squared_error
    +from sklearn.model_selection import KFold
    +from sklearn.model_selection import cross_val_score
    +
    +
    +# Where to save the figures and data files
    +PROJECT_ROOT_DIR = "Results"
    +FIGURE_ID = "Results/FigureFiles"
    +DATA_ID = "DataFiles/"
    +
    +if not os.path.exists(PROJECT_ROOT_DIR):
    +    os.mkdir(PROJECT_ROOT_DIR)
    +
    +if not os.path.exists(FIGURE_ID):
    +    os.makedirs(FIGURE_ID)
    +
    +if not os.path.exists(DATA_ID):
    +    os.makedirs(DATA_ID)
    +
    +def image_path(fig_id):
    +    return os.path.join(FIGURE_ID, fig_id)
    +
    +def data_path(dat_id):
    +    return os.path.join(DATA_ID, dat_id)
    +
    +def save_fig(fig_id):
    +    plt.savefig(image_path(fig_id) + ".png", format='png')
    +
    +infile = open(data_path("EoS.csv"),'r')
    +
    +# Read the EoS data as  csv file and organize the data into two arrays with density and energies
    +EoS = pd.read_csv(infile, names=('Density', 'Energy'))
    +EoS['Energy'] = pd.to_numeric(EoS['Energy'], errors='coerce')
    +EoS = EoS.dropna()
    +Energies = EoS['Energy']
    +Density = EoS['Density']
+#  The design matrix, now as a function of various polytropes
    +
    +Maxpolydegree = 30
    +X = np.zeros((len(Density),Maxpolydegree))
    +X[:,0] = 1.0
    +estimated_mse_sklearn = np.zeros(Maxpolydegree)
    +polynomial = np.zeros(Maxpolydegree)
    +k =5
    +kfold = KFold(n_splits = k)
    +
    +for polydegree in range(1, Maxpolydegree):
    +    polynomial[polydegree] = polydegree
    +    for degree in range(polydegree):
    +        X[:,degree] = Density**(degree/3.0)
    +        OLS = LinearRegression(fit_intercept=False)
    +# loop over trials in order to estimate the expectation value of the MSE
    +    estimated_mse_folds = cross_val_score(OLS, X, Energies, scoring='neg_mean_squared_error', cv=kfold)
    +#[:, np.newaxis]
    +    estimated_mse_sklearn[polydegree] = np.mean(-estimated_mse_folds)
    +
    +plt.plot(polynomial, np.log10(estimated_mse_sklearn), label='Test Error')
    +plt.xlabel('Polynomial degree')
    +plt.ylabel('log10[MSE]')
    +plt.legend()
    +plt.show()
     
    @@ -993,7 +1038,140 @@

    Steepest descent example

    -

    Find guesses

    + + +

    Logistic Regression

    + +

    In linear regression our main interest was centered on learning the +coefficients of a functional fit (say a polynomial) in order to be +able to predict the response of a continuous variable on some unseen +data. The fit to the continuous variable \( y_i \) is based on some +independent variables \( \boldsymbol{x}_i \). Linear regression resulted in +analytical expressions for standard ordinary Least Squares or Ridge +regression (in terms of matrices to invert) for several quantities, +ranging from the variance and thereby the confidence intervals of the +parameters \( \boldsymbol{\theta} \) to the mean squared error. If we can invert +the product of the design matrices, linear regression gives then a +simple recipe for fitting our data. +

    + + +

    Classification problems

    + +

    Classification problems, however, are concerned with outcomes taking +the form of discrete variables (i.e. categories). We may for example, +on the basis of DNA sequencing for a number of patients, like to find +out which mutations are important for a certain disease; or based on +scans of various patients' brains, figure out if there is a tumor or +not; or given a specific physical system, we'd like to identify its +state, say whether it is an ordered or disordered system (typical +situation in solid state physics); or classify the status of a +patient, whether she/he has a stroke or not and many other similar +situations. +

    + +

    The most common situation we encounter when we apply logistic +regression is that of two possible outcomes, normally denoted as a +binary outcome, true or false, positive or negative, success or +failure etc. +

    + +









    +

    Optimization and Deep learning

    + +

    Logistic regression will also serve as our stepping stone towards +neural network algorithms and supervised deep learning. For logistic +learning, the minimization of the cost function leads to a non-linear +equation in the parameters \( \boldsymbol{\theta} \). The optimization of the +problem calls therefore for minimization algorithms. This forms the +bottle neck of all machine learning algorithms, namely how to find +reliable minima of a multi-variable function. This leads us to the +family of gradient descent methods. The latter are the working horses +of basically all modern machine learning algorithms. +

    + +

    We note also that many of the topics discussed here on logistic +regression are also commonly used in modern supervised Deep Learning +models, as we will see later. +

    + + +

    Basics

    + +

    We consider the case where the outputs/targets, also called the +responses or the outcomes, \( y_i \) are discrete and only take values +from \( k=0,\dots,K-1 \) (i.e. \( K \) classes). +

    + +

    The goal is to predict the +output classes from the design matrix \( \boldsymbol{X}\in\mathbb{R}^{n\times p} \) +made of \( n \) samples, each of which carries \( p \) features or predictors. The +primary goal is to identify the classes to which new unseen samples +belong. +

    + +

    Let us specialize to the case of two classes only, with outputs +\( y_i=0 \) and \( y_i=1 \). Our outcomes could represent the status of a +credit card user that could default or not on her/his credit card +debt. That is +

    + +$$ +y_i = \begin{bmatrix} 0 & \mathrm{no}\\ 1 & \mathrm{yes} \end{bmatrix}. +$$ + + +









    +

    Linear classifier

    + +

    Before moving to the logistic model, let us try to use our linear +regression model to classify these two outcomes. We could for example +fit a linear model to the default case if \( y_i > 0.5 \) and the no +default case \( y_i \leq 0.5 \). +

    + +

    We would then have our +weighted linear combination, namely +

+$$
+\begin{equation}
+\boldsymbol{y} = \boldsymbol{X}\boldsymbol{\theta} + \boldsymbol{\epsilon},
+\label{_auto1}
+\end{equation}
+$$
+
+

    where \( \boldsymbol{y} \) is a vector representing the possible outcomes, \( \boldsymbol{X} \) is our +\( n\times p \) design matrix and \( \boldsymbol{\theta} \) represents our estimators/predictors. +
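To make this concrete, here is a minimal sketch (with synthetic labels rather than the credit-card data) of using the linear model as a classifier by thresholding the continuous prediction at \( 0.5 \); its shortcomings are discussed next.

import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(42)
n = 200
x = rng.uniform(-3, 3, n).reshape(-1, 1)
# synthetic binary labels: the probability of class 1 grows with x
y = (rng.uniform(0, 1, n) < 1/(1 + np.exp(-2*x[:, 0]))).astype(int)

linfit = LinearRegression().fit(x, y)
yhat_continuous = linfit.predict(x)
# threshold the continuous output at 0.5 to obtain class labels
yhat_class = (yhat_continuous > 0.5).astype(int)
print("Fraction correctly classified:", np.mean(yhat_class == y))
print("Range of the linear prediction:", yhat_continuous.min(), yhat_continuous.max())

Nothing constrains the continuous prediction to the interval \( [0,1] \), which is the main weakness discussed below.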

    + +









    +

    Some selected properties

    + +

The main problem with our function is that it takes values on the entire real axis. In the case of logistic regression, however, the labels \( y_i \) are discrete variables. A typical example is the credit card data discussed below, where we set \( y_i=1 \) if a person in the data set defaults on her/his debt and \( y_i=0 \) if not (see the full example below).

    + +

One simple way to get a discrete output is to use a sign function that maps the output of a linear regressor to the values \( \{0,1\} \), that is \( f(s_i)=\mathrm{sign}(s_i)=1 \) if \( s_i\ge 0 \) and \( 0 \) otherwise. We will encounter this model in our first demonstration of neural networks.

    + +

    Historically it is called the perceptron model in the machine learning +literature. This model is extremely simple. However, in many cases it is more +favorable to use a ``soft" classifier that outputs +the probability of a given category. This leads us to the logistic function. +

    + +









    +

    Simple example

    + +

The following example on data for coronary heart disease (CHD) as a function of age may serve as an illustration. In the code here we read and plot whether a person has had CHD (output = 1) or not (output = 0). This output is plotted against the person's age. Clearly, the figure shows that attempting to make a standard linear regression fit may not be very meaningful.

    +
    @@ -1001,8 +1179,59 @@

    Steepest descent example

    -
    x = guesses[-1]
    -s = -df(x)
    +  
    # Common imports
    +import os
    +import numpy as np
    +import pandas as pd
    +import matplotlib.pyplot as plt
    +from sklearn.linear_model import LinearRegression, Ridge, Lasso
    +from sklearn.model_selection import train_test_split
    +from sklearn.utils import resample
    +from sklearn.metrics import mean_squared_error
    +from IPython.display import display
    +from pylab import plt, mpl
    +mpl.rcParams['font.family'] = 'serif'
    +
    +# Where to save the figures and data files
    +PROJECT_ROOT_DIR = "Results"
    +FIGURE_ID = "Results/FigureFiles"
    +DATA_ID = "DataFiles/"
    +
    +if not os.path.exists(PROJECT_ROOT_DIR):
    +    os.mkdir(PROJECT_ROOT_DIR)
    +
    +if not os.path.exists(FIGURE_ID):
    +    os.makedirs(FIGURE_ID)
    +
    +if not os.path.exists(DATA_ID):
    +    os.makedirs(DATA_ID)
    +
    +def image_path(fig_id):
    +    return os.path.join(FIGURE_ID, fig_id)
    +
    +def data_path(dat_id):
    +    return os.path.join(DATA_ID, dat_id)
    +
    +def save_fig(fig_id):
    +    plt.savefig(image_path(fig_id) + ".png", format='png')
    +
    +infile = open(data_path("chddata.csv"),'r')
    +
    +# Read the chd data as  csv file and organize the data into arrays with age group, age, and chd
    +chd = pd.read_csv(infile, names=('ID', 'Age', 'Agegroup', 'CHD'))
    +chd.columns = ['ID', 'Age', 'Agegroup', 'CHD']
    +output = chd['CHD']
    +age = chd['Age']
    +agegroup = chd['Agegroup']
    +numberID  = chd['ID'] 
    +display(chd)
    +
    +plt.scatter(age, output, marker='o')
    +plt.axis([18,70.0,-0.1, 1.2])
    +plt.xlabel(r'Age')
    +plt.ylabel(r'CHD')
    +plt.title(r'Age distribution and Coronary heart disease')
    +plt.show()
     
    @@ -1018,7 +1247,12 @@

    Steepest descent example

    -

    Run it!

    + +









    +

    Plotting the mean value for each group

    + +

    What we could attempt however is to plot the mean value for each group.

    +
    @@ -1026,13 +1260,14 @@

    Steepest descent example

    -
    def f1d(alpha):
    -    return f(x + alpha*s)
    -
    -alpha_opt = sopt.golden(f1d)
    -next_guess = x + alpha_opt * s
    -guesses.append(next_guess)
    -print(next_guess)
    +  
    agegroupmean = np.array([0.1, 0.133, 0.250, 0.333, 0.462, 0.625, 0.765, 0.800])
    +group = np.array([1, 2, 3, 4, 5, 6, 7, 8])
    +plt.plot(group, agegroupmean, "r-")
    +plt.axis([0,9,0, 1.0])
    +plt.xlabel(r'Age group')
    +plt.ylabel(r'CHD mean values')
    +plt.title(r'Mean values for each age group')
    +plt.show()
     
    @@ -1048,7 +1283,51 @@

    Steepest descent example

    -

    What happened?

    +

    We are now trying to find a function \( f(y\vert x) \), that is a function which gives us an expected value for the output \( y \) with a given input \( x \). +In standard linear regression with a linear dependence on \( x \), we would write this in terms of our model +

    +$$ +f(y_i\vert x_i)=\theta_0+\theta_1 x_i. +$$ + +

This expression implies however that \( f(y_i\vert x_i) \) could take any value from minus infinity to plus infinity. If we instead let \( f(y_i\vert x_i) \) be represented by the mean value, the above example shows us that we can constrain the function to take values between zero and one, that is we have \( 0 \le f(y_i\vert x_i) \le 1 \). Looking at our last curve we see also that it has an S-shaped form. This leads us to a very popular model for the function \( f \), namely the so-called Sigmoid function or logistic model. We will consider this function as representing the probability for finding a value of \( y_i \) with a given \( x_i \).

    + +









    +

    The logistic function

    + +

Another widely studied model is the so-called perceptron model, which is an example of a "hard classification" model. We will encounter this model when we discuss neural networks as well. Each datapoint is deterministically assigned to a category (i.e. \( y_i=0 \) or \( y_i=1 \)). In many cases, and the coronary heart disease data forms one of many such examples, it is favorable to have a "soft" classifier that outputs the probability of a given category rather than a single value. For example, given \( x_i \), the classifier outputs the probability of being in a category \( k \). Logistic regression is the most common example of a so-called soft classifier. In logistic regression, the probability that a data point \( x_i \) belongs to a category \( y_i=\{0,1\} \) is given by the so-called logit function (or Sigmoid), which is meant to represent the likelihood for a given event,

+$$
+p(t) = \frac{1}{1+\exp{(-t)}}=\frac{\exp{(t)}}{1+\exp{(t)}}.
+$$
+
+

    Note that \( 1-p(t)= p(-t) \).
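A quick numerical check of this identity (a small illustration added here):

import numpy as np

def p(t):
    return 1.0/(1.0 + np.exp(-t))

t = np.linspace(-5, 5, 11)
# the two columns below should agree to machine precision
print(np.column_stack((1 - p(t), p(-t))))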

    + +









    +

Examples of likelihood functions used in logistic regression and neural networks

    + +

The following code plots the logistic function, the step function and other functions we will encounter from here on.

    +
    @@ -1056,10 +1335,60 @@

    Steepest descent example

    -
    pt.axis("equal")
    -pt.contour(xmesh, ymesh, fmesh, 50)
    -it_array = np.array(guesses)
    -pt.plot(it_array.T[0], it_array.T[1], "x-")
    +  
    """The sigmoid function (or the logistic curve) is a
    +function that takes any real number, z, and outputs a number (0,1).
    +It is useful in neural networks for assigning weights on a relative scale.
    +The value z is the weighted sum of parameters involved in the learning algorithm."""
    +
    +import numpy
    +import matplotlib.pyplot as plt
    +import math as mt
    +
    +z = numpy.arange(-5, 5, .1)
    +sigma_fn = numpy.vectorize(lambda z: 1/(1+numpy.exp(-z)))
    +sigma = sigma_fn(z)
    +
    +fig = plt.figure()
    +ax = fig.add_subplot(111)
    +ax.plot(z, sigma)
    +ax.set_ylim([-0.1, 1.1])
    +ax.set_xlim([-5,5])
    +ax.grid(True)
    +ax.set_xlabel('z')
    +ax.set_title('sigmoid function')
    +
    +plt.show()
    +
    +"""Step Function"""
    +z = numpy.arange(-5, 5, .02)
    +step_fn = numpy.vectorize(lambda z: 1.0 if z >= 0.0 else 0.0)
    +step = step_fn(z)
    +
    +fig = plt.figure()
    +ax = fig.add_subplot(111)
    +ax.plot(z, step)
    +ax.set_ylim([-0.5, 1.5])
    +ax.set_xlim([-5,5])
    +ax.grid(True)
    +ax.set_xlabel('z')
    +ax.set_title('step function')
    +
    +plt.show()
    +
    +"""tanh Function"""
    +z = numpy.arange(-2*mt.pi, 2*mt.pi, 0.1)
    +t = numpy.tanh(z)
    +
    +fig = plt.figure()
    +ax = fig.add_subplot(111)
    +ax.plot(z, t)
    +ax.set_ylim([-1.0, 1.0])
    +ax.set_xlim([-2*mt.pi,2*mt.pi])
    +ax.grid(True)
    +ax.set_xlabel('z')
    +ax.set_title('tanh function')
    +
    +plt.show()
     
    @@ -1075,599 +1404,270 @@

    Steepest descent example

    -

    Note that we did only one iteration here. We can easily add more using our previous guesses.











    -

    Conjugate gradient method

    -
    - -

    -

    In the CG method we define so-called conjugate directions and two vectors -\( \boldsymbol{s} \) and \( \boldsymbol{t} \) -are said to be -conjugate if -

    +

    Two parameters

    + +

    We assume now that we have two classes with \( y_i \) either \( 0 \) or \( 1 \). Furthermore we assume also that we have only two parameters \( \theta \) in our fitting of the Sigmoid function, that is we define probabilities

$$
-\begin{equation*}
-\boldsymbol{s}^T\boldsymbol{A}\boldsymbol{t}= 0.
-\end{equation*}
+\begin{align*}
+p(y_i=1|x_i,\boldsymbol{\theta}) &= \frac{\exp{(\theta_0+\theta_1x_i)}}{1+\exp{(\theta_0+\theta_1x_i)}},\nonumber\\
+p(y_i=0|x_i,\boldsymbol{\theta}) &= 1 - p(y_i=1|x_i,\boldsymbol{\theta}),
+\end{align*}
$$

    The philosophy of the CG method is to perform searches in various conjugate directions -of our vectors \( \boldsymbol{x}_i \) obeying the above criterion, namely -

    +

    where \( \boldsymbol{\theta} \) are the weights we wish to extract from data, in our case \( \theta_0 \) and \( \theta_1 \).

    + +

    Note that we used

$$
-\begin{equation*}
-\boldsymbol{x}_i^T\boldsymbol{A}\boldsymbol{x}_j= 0.
-\end{equation*}
+p(y_i=0\vert x_i, \boldsymbol{\theta}) = 1-p(y_i=1\vert x_i, \boldsymbol{\theta}).
$$

    Two vectors are conjugate if they are orthogonal with respect to -this inner product. Being conjugate is a symmetric relation: if \( \boldsymbol{s} \) is conjugate to \( \boldsymbol{t} \), then \( \boldsymbol{t} \) is conjugate to \( \boldsymbol{s} \). -

    -
    + +

    Maximum likelihood

    -









    -

    Conjugate gradient method

    -
    - -

    -

    An example is given by the eigenvectors of the matrix

    +

    In order to define the total likelihood for all possible outcomes from a +dataset \( \mathcal{D}=\{(y_i,x_i)\} \), with the binary labels +\( y_i\in\{0,1\} \) and where the data points are drawn independently, we use the so-called Maximum Likelihood Estimation (MLE) principle. +We aim thus at maximizing +the probability of seeing the observed data. We can then approximate the +likelihood in terms of the product of the individual probabilities of a specific outcome \( y_i \), that is +

$$
-\begin{equation*}
-\boldsymbol{v}_i^T\boldsymbol{A}\boldsymbol{v}_j= \lambda\boldsymbol{v}_i^T\boldsymbol{v}_j,
-\end{equation*}
+\begin{align*}
+P(\mathcal{D}|\boldsymbol{\theta})& = \prod_{i=1}^n \left[p(y_i=1|x_i,\boldsymbol{\theta})\right]^{y_i}\left[1-p(y_i=1|x_i,\boldsymbol{\theta})\right]^{1-y_i}\nonumber \\
+\end{align*}
$$

    which is zero unless \( i=j \).

    -
    +

    from which we obtain the log-likelihood and our cost/loss function

+$$
+\mathcal{C}(\boldsymbol{\theta}) = \sum_{i=1}^n \left( y_i\log{p(y_i=1|x_i,\boldsymbol{\theta})} + (1-y_i)\log\left[1-p(y_i=1|x_i,\boldsymbol{\theta})\right]\right).
+$$









    -

    Conjugate gradient method

    -
    - -

    -

    Assume now that we have a symmetric positive-definite matrix \( \boldsymbol{A} \) of size -\( n\times n \). At each iteration \( i+1 \) we obtain the conjugate direction of a vector -

    +

    The cost function rewritten

    + +

    Reordering the logarithms, we can rewrite the cost/loss function as

$$
-\begin{equation*}
-\boldsymbol{x}_{i+1}=\boldsymbol{x}_{i}+\alpha_i\boldsymbol{p}_{i}.
-\end{equation*}
+\mathcal{C}(\boldsymbol{\theta}) = \sum_{i=1}^n \left(y_i(\theta_0+\theta_1x_i) -\log{(1+\exp{(\theta_0+\theta_1x_i)})}\right).
$$

    We assume that \( \boldsymbol{p}_{i} \) is a sequence of \( n \) mutually conjugate directions. -Then the \( \boldsymbol{p}_{i} \) form a basis of \( R^n \) and we can expand the solution -$ \boldsymbol{A}\boldsymbol{x} = \boldsymbol{b}$ in this basis, namely +

The maximum likelihood estimator is defined as the set of parameters \( \boldsymbol{\theta} \) that maximizes the log-likelihood. Since the cost (error) function is just the negative log-likelihood, for logistic regression we have that

-
$$
-\begin{equation*}
- \boldsymbol{x} = \sum^{n}_{i=1} \alpha_i \boldsymbol{p}_i.
-\end{equation*}
+\mathcal{C}(\boldsymbol{\theta})=-\sum_{i=1}^n \left(y_i(\theta_0+\theta_1x_i) -\log{(1+\exp{(\theta_0+\theta_1x_i)})}\right).
$$
    +

    This equation is known in statistics as the cross entropy. Finally, we note that just as in linear regression, +in practice we often supplement the cross-entropy with additional regularization terms, usually \( L_1 \) and \( L_2 \) regularization as we did for Ridge and Lasso regression. +
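As a small illustration (synthetic data and made-up parameter values, added here and not part of the original notes), the two-parameter cross entropy can be coded directly; scikit-learn's log_loss is used only as an independent check.

import numpy as np
from sklearn.metrics import log_loss   # used only as an independent check

def cross_entropy(theta0, theta1, x, y):
    """C(theta) = -sum_i [ y_i (theta0 + theta1 x_i) - log(1 + exp(theta0 + theta1 x_i)) ]."""
    z = theta0 + theta1*x
    return -np.sum(y*z - np.log(1 + np.exp(z)))

rng = np.random.default_rng(0)
x = rng.uniform(-2, 2, 100)
y = (rng.uniform(0, 1, 100) < 1/(1 + np.exp(-(0.5 + 1.5*x)))).astype(int)

theta0, theta1 = 0.5, 1.5          # made-up parameter values
p = 1/(1 + np.exp(-(theta0 + theta1*x)))
print(cross_entropy(theta0, theta1, x, y)/len(y))   # mean cost per data point
print(log_loss(y, p))                               # should agree with the line above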











    -

    Conjugate gradient method

    -
    - -

    -

    The coefficients are given by

    -$$ -\begin{equation*} - \mathbf{A}\mathbf{x} = \sum^{n}_{i=1} \alpha_i \mathbf{A} \mathbf{p}_i = \mathbf{b}. -\end{equation*} -$$ +

    Minimizing the cross entropy

    + +

    The cross entropy is a convex function of the weights \( \boldsymbol{\theta} \) and, +therefore, any local minimizer is a global minimizer. +

    -

    Multiplying with \( \boldsymbol{p}_k^T \) from the left gives

    +

    Minimizing this +cost function with respect to the two parameters \( \theta_0 \) and \( \theta_1 \) we obtain +

$$
-\begin{equation*}
- \boldsymbol{p}_k^T \boldsymbol{A}\boldsymbol{x} = \sum^{n}_{i=1} \alpha_i\boldsymbol{p}_k^T \boldsymbol{A}\boldsymbol{p}_i= \boldsymbol{p}_k^T \boldsymbol{b},
-\end{equation*}
+\frac{\partial \mathcal{C}(\boldsymbol{\theta})}{\partial \theta_0} = -\sum_{i=1}^n \left(y_i -\frac{\exp{(\theta_0+\theta_1x_i)}}{1+\exp{(\theta_0+\theta_1x_i)}}\right),
$$

    and we can define the coefficients \( \alpha_k \) as

    - +

    and

$$
-\begin{equation*}
- \alpha_k = \frac{\boldsymbol{p}_k^T \boldsymbol{b}}{\boldsymbol{p}_k^T \boldsymbol{A} \boldsymbol{p}_k}
-\end{equation*}
+\frac{\partial \mathcal{C}(\boldsymbol{\theta})}{\partial \theta_1} = -\sum_{i=1}^n \left(y_ix_i -x_i\frac{\exp{(\theta_0+\theta_1x_i)}}{1+\exp{(\theta_0+\theta_1x_i)}}\right).
$$
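A minimal gradient-descent sketch based directly on these two derivatives could look as follows (synthetic data; the learning rate and the number of iterations are arbitrary choices):

import numpy as np

rng = np.random.default_rng(1)
n = 200
x = rng.uniform(-2, 2, n)
true_theta0, true_theta1 = -0.5, 2.0
y = (rng.uniform(0, 1, n) < 1/(1 + np.exp(-(true_theta0 + true_theta1*x)))).astype(int)

theta0, theta1 = 0.0, 0.0
eta = 0.1           # learning rate, arbitrary choice
for _ in range(5000):
    p = 1/(1 + np.exp(-(theta0 + theta1*x)))
    grad0 = -np.sum(y - p)        # dC/dtheta0
    grad1 = -np.sum(x*(y - p))    # dC/dtheta1
    theta0 -= eta*grad0/n         # divide by n to keep the step size stable
    theta1 -= eta*grad1/n

print(theta0, theta1)   # should be in the vicinity of the true parameters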










    -

    Conjugate gradient method and iterations

    -
    - -

    +

    A more compact expression

    -

    If we choose the conjugate vectors \( \boldsymbol{p}_k \) carefully, -then we may not need all of them to obtain a good approximation to the solution -\( \boldsymbol{x} \). -We want to regard the conjugate gradient method as an iterative method. -This will us to solve systems where \( n \) is so large that the direct -method would take too much time. +

    Let us now define a vector \( \boldsymbol{y} \) with \( n \) elements \( y_i \), an +\( n\times p \) matrix \( \boldsymbol{X} \) which contains the \( x_i \) values and a +vector \( \boldsymbol{p} \) of fitted probabilities \( p(y_i\vert x_i,\boldsymbol{\theta}) \). We can rewrite in a more compact form the first +derivative of the cost function as

    -

    We denote the initial guess for \( \boldsymbol{x} \) as \( \boldsymbol{x}_0 \). -We can assume without loss of generality that -

$$
-\begin{equation*}
-\boldsymbol{x}_0=0,
-\end{equation*}
+\frac{\partial \mathcal{C}(\boldsymbol{\theta})}{\partial \boldsymbol{\theta}} = -\boldsymbol{X}^T\left(\boldsymbol{y}-\boldsymbol{p}\right).
$$

    or consider the system

    +

If we in addition define a diagonal matrix \( \boldsymbol{W} \) with elements \( p(y_i\vert x_i,\boldsymbol{\theta})(1-p(y_i\vert x_i,\boldsymbol{\theta})) \), we can obtain a compact expression of the second derivative as

+
$$
-\begin{equation*}
-\boldsymbol{A}\boldsymbol{z} = \boldsymbol{b}-\boldsymbol{A}\boldsymbol{x}_0,
-\end{equation*}
+\frac{\partial^2 \mathcal{C}(\boldsymbol{\theta})}{\partial \boldsymbol{\theta}\partial \boldsymbol{\theta}^T} = \boldsymbol{X}^T\boldsymbol{W}\boldsymbol{X}.
$$
-

    instead.
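In code, these compact expressions are just a couple of matrix products. The sketch below (synthetic data and an arbitrary parameter vector, added here for illustration) shows the shapes involved:

import numpy as np

rng = np.random.default_rng(3)
n = 100
X = np.column_stack((np.ones(n), rng.uniform(-2, 2, n)))   # n x p design matrix (intercept + one feature)
theta = np.array([0.1, -0.3])                              # arbitrary parameter vector
y = rng.integers(0, 2, n)                                  # synthetic 0/1 labels

p = 1/(1 + np.exp(-X @ theta))        # fitted probabilities
W = np.diag(p*(1 - p))                # diagonal weight matrix

gradient = -X.T @ (y - p)             # dC/dtheta = -X^T (y - p)
hessian  = X.T @ W @ X                # d2C/dtheta dtheta^T = X^T W X
print(gradient.shape, hessian.shape)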

    -
    -









    -

    Conjugate gradient method

    -
    - -

    -

    One can show that the solution \( \boldsymbol{x} \) is also the unique minimizer of the quadratic form

    +

    Extending to more predictors

    + +

    Within a binary classification problem, we can easily expand our model to include multiple predictors. Our ratio between likelihoods is then with \( p \) predictors

$$
-\begin{equation*}
- f(\boldsymbol{x}) = \frac{1}{2}\boldsymbol{x}^T\boldsymbol{A}\boldsymbol{x} - \boldsymbol{x}^T \boldsymbol{x} , \quad \boldsymbol{x}\in\mathbf{R}^n.
-\end{equation*}
+\log{ \frac{p(\boldsymbol{\theta}\boldsymbol{x})}{1-p(\boldsymbol{\theta}\boldsymbol{x})}} = \theta_0+\theta_1x_1+\theta_2x_2+\dots+\theta_px_p.
$$

    This suggests taking the first basis vector \( \boldsymbol{p}_1 \) -to be the gradient of \( f \) at \( \boldsymbol{x}=\boldsymbol{x}_0 \), -which equals -

    +

    Here we defined \( \boldsymbol{x}=[1,x_1,x_2,\dots,x_p] \) and \( \boldsymbol{\theta}=[\theta_0, \theta_1, \dots, \theta_p] \) leading to

$$
-\begin{equation*}
-\boldsymbol{A}\boldsymbol{x}_0-\boldsymbol{b},
-\end{equation*}
+p(\boldsymbol{\theta}\boldsymbol{x})=\frac{ \exp{(\theta_0+\theta_1x_1+\theta_2x_2+\dots+\theta_px_p)}}{1+\exp{(\theta_0+\theta_1x_1+\theta_2x_2+\dots+\theta_px_p)}}.
$$

    and -\( \boldsymbol{x}_0=0 \) it is equal \( -\boldsymbol{b} \). -The other vectors in the basis will be conjugate to the gradient, -hence the name conjugate gradient method. -

    -
    -









    -

    Conjugate gradient method

    -
    - -

    -

    Let \( \boldsymbol{r}_k \) be the residual at the \( k \)-th step:

    -$$ -\begin{equation*} -\boldsymbol{r}_k=\boldsymbol{b}-\boldsymbol{A}\boldsymbol{x}_k. -\end{equation*} -$$ +

    Including more classes

    -

    Note that \( \boldsymbol{r}_k \) is the negative gradient of \( f \) at -\( \boldsymbol{x}=\boldsymbol{x}_k \), -so the gradient descent method would be to move in the direction \( \boldsymbol{r}_k \). -Here, we insist that the directions \( \boldsymbol{p}_k \) are conjugate to each other, -so we take the direction closest to the gradient \( \boldsymbol{r}_k \) -under the conjugacy constraint. -This gives the following expression +

Till now we have mainly focused on two classes, the so-called binary system. Suppose we wish to extend to \( K \) classes. Let us for the sake of simplicity assume we have only two predictors. We then have the following model

    -$$ -\begin{equation*} -\boldsymbol{p}_{k+1}=\boldsymbol{r}_k-\frac{\boldsymbol{p}_k^T \boldsymbol{A}\boldsymbol{r}_k}{\boldsymbol{p}_k^T\boldsymbol{A}\boldsymbol{p}_k} \boldsymbol{p}_k. -\end{equation*} -$$ -
    - - -









    -

    Conjugate gradient method

    -
    - -

    -

    We can also compute the residual iteratively as

    -$$ -\begin{equation*} -\boldsymbol{r}_{k+1}=\boldsymbol{b}-\boldsymbol{A}\boldsymbol{x}_{k+1}, - \end{equation*} -$$ -

    which equals

$$
-\begin{equation*}
-\boldsymbol{b}-\boldsymbol{A}(\boldsymbol{x}_k+\alpha_k\boldsymbol{p}_k),
- \end{equation*}
+\log{\frac{p(C=1\vert x)}{p(K\vert x)}} = \theta_{10}+\theta_{11}x_1,
$$

    or

    +

    and

$$
-\begin{equation*}
-(\boldsymbol{b}-\boldsymbol{A}\boldsymbol{x}_k)-\alpha_k\boldsymbol{A}\boldsymbol{p}_k,
- \end{equation*}
+\log{\frac{p(C=2\vert x)}{p(K\vert x)}} = \theta_{20}+\theta_{21}x_1,
$$

    which gives

    - +

and so on up to the class \( C=K-1 \),

$$
-\begin{equation*}
-\boldsymbol{r}_{k+1}=\boldsymbol{r}_k-\boldsymbol{A}\boldsymbol{p}_{k},
- \end{equation*}
+\log{\frac{p(C=K-1\vert x)}{p(K\vert x)}} = \theta_{(K-1)0}+\theta_{(K-1)1}x_1,
$$
    - - -

    Revisiting our first homework

    - -

    We will use linear regression as a case study for the gradient descent -methods. Linear regression is a great test case for the gradient -descent methods discussed in the lectures since it has several -desirable properties such as: +

and the model is specified in terms of \( K-1 \) so-called log-odds or logit transformations.

    -
      -
    1. An analytical solution (recall homework set 1).
    2. -
    3. The gradient can be computed analytically.
    4. -
    5. The cost function is convex which guarantees that gradient descent converges for small enough learning rates
    6. -
    -

    We revisit an example similar to what we had in the first homework set. We had a function of the type

    - +









    +

    More classes

    - -
    -
    -
    -
    -
    -
    m = 100
    -x = 2*np.random.rand(m,1)
    -y = 4+3*x+np.random.randn(m,1)
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    +

    In our discussion of neural networks we will encounter the above again +in terms of a slightly modified function, the so-called Softmax function. +

    -

    with \( x_i \in [0,1] \) is chosen randomly using a uniform distribution. Additionally we have a stochastic noise chosen according to a normal distribution \( \cal {N}(0,1) \). -The linear regression model is given by +

    The softmax function is used in various multiclass classification +methods, such as multinomial logistic regression (also known as +softmax regression), multiclass linear discriminant analysis, naive +Bayes classifiers, and artificial neural networks. Specifically, in +multinomial logistic regression and linear discriminant analysis, the +input to the function is the result of \( K \) distinct linear functions, +and the predicted probability for the \( k \)-th class given a sample +vector \( \boldsymbol{x} \) and a weighting vector \( \boldsymbol{\theta} \) is (with two +predictors):

    -$$ -h_\beta(x) = \boldsymbol{y} = \beta_0 + \beta_1 x, -$$ -

    such that

$$
-\boldsymbol{y}_i = \beta_0 + \beta_1 x_i.
+p(C=k\vert \mathbf {x} )=\frac{\exp{(\theta_{k0}+\theta_{k1}x_1)}}{1+\sum_{l=1}^{K-1}\exp{(\theta_{l0}+\theta_{l1}x_1)}}.
$$
-
-
-

    Gradient descent example

    - -

    Let \( \mathbf{y} = (y_1,\cdots,y_n)^T \), \( \mathbf{\boldsymbol{y}} = (\boldsymbol{y}_1,\cdots,\boldsymbol{y}_n)^T \) and \( \beta = (\beta_0, \beta_1)^T \)

    - -

    It is convenient to write \( \mathbf{\boldsymbol{y}} = X\beta \) where \( X \in \mathbb{R}^{100 \times 2} \) is the design matrix given by (we keep the intercept here)

    +

    It is easy to extend to more predictors. The final class is

$$
-X \equiv \begin{bmatrix}
-1 & x_1 \\
-\vdots & \vdots \\
-1 & x_{100} & \\
-\end{bmatrix}.
+p(C=K\vert \mathbf {x} )=\frac{1}{1+\sum_{l=1}^{K-1}\exp{(\theta_{l0}+\theta_{l1}x_1)}},
$$

    The cost/loss/risk function is given by (

    -$$ -C(\beta) = \frac{1}{n}||X\beta-\mathbf{y}||_{2}^{2} = \frac{1}{n}\sum_{i=1}^{100}\left[ (\beta_0 + \beta_1 x_i)^2 - 2 y_i (\beta_0 + \beta_1 x_i) + y_i^2\right] -$$ +

    and they sum to one. Our earlier discussions were all specialized to +the case with two classes only. It is easy to see from the above that +what we derived earlier is compatible with these equations. +
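That the probabilities sum to one is easy to verify numerically; the sketch below evaluates the expressions above for an arbitrary, made-up set of parameters:

import numpy as np

K = 4                            # total number of classes
theta = np.array([[0.2, -1.0],   # theta_{l0}, theta_{l1} for l = 1, ..., K-1
                  [0.5,  0.3],
                  [-0.7, 1.2]])
x1 = 0.8                         # a single predictor value

scores = theta[:, 0] + theta[:, 1]*x1          # the K-1 linear functions
denom = 1 + np.sum(np.exp(scores))
p = np.append(np.exp(scores)/denom, 1/denom)   # classes 1..K-1 and the final class K
print(p, p.sum())                              # the probabilities sum to one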

    -

    and we want to find \( \beta \) such that \( C(\beta) \) is minimized.

    +

    To find the optimal parameters we would typically use a gradient +descent method. Newton's method and gradient descent methods are +discussed in the material on optimization +methods. +











    -

    The derivative of the cost/loss function

    - -

    Computing \( \partial C(\beta) / \partial \beta_0 \) and \( \partial C(\beta) / \partial \beta_1 \) we can show that the gradient can be written as

    -$$ -\nabla_{\beta} C(\beta) = \frac{2}{n}\begin{bmatrix} \sum_{i=1}^{100} \left(\beta_0+\beta_1x_i-y_i\right) \\ -\sum_{i=1}^{100}\left( x_i (\beta_0+\beta_1x_i)-y_ix_i\right) \\ -\end{bmatrix} = \frac{2}{n}X^T(X\beta - \mathbf{y}), -$$ +

Optimization, the central part of any Machine Learning algorithm

    -

    where \( X \) is the design matrix defined above.

    +

Almost every problem in machine learning and data science starts with a dataset \( X \), a model \( g(\theta) \), which is a function of the parameters \( \theta \), and a cost function \( C(X, g(\theta)) \) that allows us to judge how well the model \( g(\theta) \) explains the observations \( X \). The model is fit by finding the values of \( \theta \) that minimize the cost function. Ideally we would be able to solve for \( \theta \) analytically; however, this is not possible in general, and we must use some approximate numerical method to compute the minimum.
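As a deliberately generic illustration (not the approach developed in these notes, where we build the minimizers ourselves), one can hand such a cost function to a library minimizer like scipy.optimize.minimize:

import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(7)
x = rng.uniform(-2, 2, 100)
y = (rng.uniform(0, 1, 100) < 1/(1 + np.exp(-(1.0 + 2.0*x)))).astype(int)

def cost(theta):
    """Logistic regression cross entropy for parameters theta = [theta0, theta1]."""
    z = theta[0] + theta[1]*x
    return -np.sum(y*z - np.log(1 + np.exp(z)))

result = minimize(cost, x0=np.zeros(2))
print(result.x)   # estimated parameters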











    -

    The Hessian matrix

    -

    The Hessian matrix of \( C(\beta) \) is given by

    +

    Revisiting our Logistic Regression case

    + +

    In our discussion on Logistic Regression we studied the +case of +two classes, with \( y_i \) either +\( 0 \) or \( 1 \). Furthermore we assumed also that we have only two +parameters \( \theta \) in our fitting, that is we +defined probabilities +

+
$$
-\boldsymbol{H} \equiv \begin{bmatrix}
-\frac{\partial^2 C(\beta)}{\partial \beta_0^2} & \frac{\partial^2 C(\beta)}{\partial \beta_0 \partial \beta_1} \\
-\frac{\partial^2 C(\beta)}{\partial \beta_0 \partial \beta_1} & \frac{\partial^2 C(\beta)}{\partial \beta_1^2} & \\
-\end{bmatrix} = \frac{2}{n}X^T X.
+\begin{align*}
+p(y_i=1|x_i,\boldsymbol{\theta}) &= \frac{\exp{(\theta_0+\theta_1x_i)}}{1+\exp{(\theta_0+\theta_1x_i)}},\nonumber\\
+p(y_i=0|x_i,\boldsymbol{\theta}) &= 1 - p(y_i=1|x_i,\boldsymbol{\theta}),
+\end{align*}
$$

    This result implies that \( C(\beta) \) is a convex function since the matrix \( X^T X \) always is positive semi-definite.

    +

    where \( \boldsymbol{\theta} \) are the weights we wish to extract from data, in our case \( \theta_0 \) and \( \theta_1 \).











    -

    Simple program

    +

    The equations to solve

    + +

    Our compact equations used a definition of a vector \( \boldsymbol{y} \) with \( n \) +elements \( y_i \), an \( n\times p \) matrix \( \boldsymbol{X} \) which contains the +\( x_i \) values and a vector \( \boldsymbol{p} \) of fitted probabilities +\( p(y_i\vert x_i,\boldsymbol{\theta}) \). We rewrote in a more compact form +the first derivative of the cost function as +

    -

    We can now write a program that minimizes \( C(\beta) \) using the gradient descent method with a constant learning rate \( \gamma \) according to

$$
-\beta_{k+1} = \beta_k - \gamma \nabla_\beta C(\beta_k), \ k=0,1,\cdots
+\frac{\partial \mathcal{C}(\boldsymbol{\theta})}{\partial \boldsymbol{\theta}} = -\boldsymbol{X}^T\left(\boldsymbol{y}-\boldsymbol{p}\right).
$$

    We can use the expression we computed for the gradient and let use a -\( \beta_0 \) be chosen randomly and let \( \gamma = 0.001 \). Stop iterating -when \( ||\nabla_\beta C(\beta_k) || \leq \epsilon = 10^{-8} \). Note that the code below does not include the latter stop criterion. -

    - -

    And finally we can compare our solution for \( \beta \) with the analytic result given by -\( \beta= (X^TX)^{-1} X^T \mathbf{y} \). +

If we in addition define a diagonal matrix \( \boldsymbol{W} \) with elements \( p(y_i\vert x_i,\boldsymbol{\theta})(1-p(y_i\vert x_i,\boldsymbol{\theta})) \), we can obtain a compact expression of the second derivative as

    -









    -

    Gradient Descent Example

    - -

    Here our simple example

    - - -
    -
    -
    -
    -
    -
    # Importing various packages
    -from random import random, seed
    -import numpy as np
    -import matplotlib.pyplot as plt
    -from mpl_toolkits.mplot3d import Axes3D
    -from matplotlib import cm
    -from matplotlib.ticker import LinearLocator, FormatStrFormatter
    -import sys
    -
    -# the number of datapoints
    -n = 100
    -x = 2*np.random.rand(n,1)
    -y = 4+3*x+np.random.randn(n,1)
    -
    -X = np.c_[np.ones((n,1)), x]
    -# Hessian matrix
    -H = (2.0/n)* X.T @ X
    -# Get the eigenvalues
    -EigValues, EigVectors = np.linalg.eig(H)
    -print(f"Eigenvalues of Hessian Matrix:{EigValues}")
    -
    -beta_linreg = np.linalg.inv(X.T @ X) @ X.T @ y
    -print(beta_linreg)
    -beta = np.random.randn(2,1)
    -
    -eta = 1.0/np.max(EigValues)
    -Niterations = 1000
    -
    -for iter in range(Niterations):
    -    gradient = (2.0/n)*X.T @ (X @ beta-y)
    -    beta -= eta*gradient
    -
    -print(beta)
    -xnew = np.array([[0],[2]])
    -xbnew = np.c_[np.ones((2,1)), xnew]
    -ypredict = xbnew.dot(beta)
    -ypredict2 = xbnew.dot(beta_linreg)
    -plt.plot(xnew, ypredict, "r-")
    -plt.plot(xnew, ypredict2, "b-")
    -plt.plot(x, y ,'ro')
    -plt.axis([0,2.0,0, 15.0])
    -plt.xlabel(r'$x$')
    -plt.ylabel(r'$y$')
    -plt.title(r'Gradient descent example')
    -plt.show()
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    +$$ +\frac{\partial^2 \mathcal{C}(\boldsymbol{\theta})}{\partial \boldsymbol{\theta}\partial \boldsymbol{\theta}^T} = \boldsymbol{X}^T\boldsymbol{W}\boldsymbol{X}. +$$ +

    This defines what is called the Hessian matrix.











    -

    And a corresponding example using scikit-learn

    - - - -
    -
    -
    -
    -
    -
    # Importing various packages
    -from random import random, seed
    -import numpy as np
    -import matplotlib.pyplot as plt
    -from sklearn.linear_model import SGDRegressor
    -
    -n = 100
    -x = 2*np.random.rand(n,1)
    -y = 4+3*x+np.random.randn(n,1)
    -
    -X = np.c_[np.ones((n,1)), x]
    -beta_linreg = np.linalg.inv(X.T @ X) @ (X.T @ y)
    -print(beta_linreg)
    -sgdreg = SGDRegressor(max_iter = 50, penalty=None, eta0=0.1)
    -sgdreg.fit(x,y.ravel())
    -print(sgdreg.intercept_, sgdreg.coef_)
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    - - - -

    Gradient descent and Ridge

    +

    Solving using Newton-Raphson's method

    -

    We have also discussed Ridge regression where the loss function contains a regularized term given by the \( L_2 \) norm of \( \beta \),

    -$$ -C_{\text{ridge}}(\beta) = \frac{1}{n}||X\beta -\mathbf{y}||^2 + \lambda ||\beta||^2, \ \lambda \geq 0. -$$ +

    If we can set up these equations, Newton-Raphson's iterative method is normally the method of choice. It requires however that we can compute in an efficient way the matrices that define the first and second derivatives.

    -

    In order to minimize \( C_{\text{ridge}}(\beta) \) using GD we adjust the gradient as follows

    -$$ -\nabla_\beta C_{\text{ridge}}(\beta) = \frac{2}{n}\begin{bmatrix} \sum_{i=1}^{100} \left(\beta_0+\beta_1x_i-y_i\right) \\ -\sum_{i=1}^{100}\left( x_i (\beta_0+\beta_1x_i)-y_ix_i\right) \\ -\end{bmatrix} + 2\lambda\begin{bmatrix} \beta_0 \\ \beta_1\end{bmatrix} = 2 (\frac{1}{n}X^T(X\beta - \mathbf{y})+\lambda \beta). -$$ +

    Our iterative scheme is then given by

    -

    We can easily extend our program to minimize \( C_{\text{ridge}}(\beta) \) using gradient descent and compare with the analytical solution given by

$$
-\beta_{\text{ridge}} = \left(X^T X + n\lambda I_{2 \times 2} \right)^{-1} X^T \mathbf{y}.
+\boldsymbol{\theta}^{\mathrm{new}} = \boldsymbol{\theta}^{\mathrm{old}}-\left(\frac{\partial^2 \mathcal{C}(\boldsymbol{\theta})}{\partial \boldsymbol{\theta}\partial \boldsymbol{\theta}^T}\right)^{-1}_{\boldsymbol{\theta}^{\mathrm{old}}}\times \left(\frac{\partial \mathcal{C}(\boldsymbol{\theta})}{\partial \boldsymbol{\theta}}\right)_{\boldsymbol{\theta}^{\mathrm{old}}},
$$

    or in matrix form as

    -









    -

    The Hessian matrix for Ridge Regression

    -

    The Hessian matrix of Ridge Regression for our simple example is given by

$$
-\boldsymbol{H} \equiv \begin{bmatrix}
-\frac{\partial^2 C(\beta)}{\partial \beta_0^2} & \frac{\partial^2 C(\beta)}{\partial \beta_0 \partial \beta_1} \\
-\frac{\partial^2 C(\beta)}{\partial \beta_0 \partial \beta_1} & \frac{\partial^2 C(\beta)}{\partial \beta_1^2} & \\
-\end{bmatrix} = \frac{2}{n}X^T X+2\lambda\boldsymbol{I}.
+\boldsymbol{\theta}^{\mathrm{new}} = \boldsymbol{\theta}^{\mathrm{old}}-\left(\boldsymbol{X}^T\boldsymbol{W}\boldsymbol{X} \right)^{-1}\times \left(-\boldsymbol{X}^T(\boldsymbol{y}-\boldsymbol{p}) \right)_{\boldsymbol{\theta}^{\mathrm{old}}}.
$$

    This implies that the Hessian matrix is positive definite, hence the stationary point is a -minimum. -Note that the Ridge cost function is convex being a sum of two convex -functions. Therefore, the stationary point is a global -minimum of this function. -

    - -









    -

    Program example for gradient descent with Ridge Regression

    - - -
    -
    -
    -
    -
    -
    from random import random, seed
    -import numpy as np
    -import matplotlib.pyplot as plt
    -from mpl_toolkits.mplot3d import Axes3D
    -from matplotlib import cm
    -from matplotlib.ticker import LinearLocator, FormatStrFormatter
    -import sys
    -
    -# the number of datapoints
    -n = 100
    -x = 2*np.random.rand(n,1)
    -y = 4+3*x+np.random.randn(n,1)
    -
    -X = np.c_[np.ones((n,1)), x]
    -XT_X = X.T @ X
    -
    -#Ridge parameter lambda
    -lmbda  = 0.001
    -Id = n*lmbda* np.eye(XT_X.shape[0])
    -
    -# Hessian matrix
    -H = (2.0/n)* XT_X+2*lmbda* np.eye(XT_X.shape[0])
    -# Get the eigenvalues
    -EigValues, EigVectors = np.linalg.eig(H)
    -print(f"Eigenvalues of Hessian Matrix:{EigValues}")
    -
    -
    -beta_linreg = np.linalg.inv(XT_X+Id) @ X.T @ y
    -print(beta_linreg)
    -# Start plain gradient descent
    -beta = np.random.randn(2,1)
    -
    -eta = 1.0/np.max(EigValues)
    -Niterations = 100
    -
    -for iter in range(Niterations):
    -    gradients = 2.0/n*X.T @ (X @ (beta)-y)+2*lmbda*beta
    -    beta -= eta*gradients
    -
    -print(beta)
    -ypredict = X @ beta
    -ypredict2 = X @ beta_linreg
    -plt.plot(x, ypredict, "r-")
    -plt.plot(x, ypredict2, "b-")
    -plt.plot(x, y ,'ro')
    -plt.axis([0,2.0,0, 15.0])
    -plt.xlabel(r'$x$')
    -plt.ylabel(r'$y$')
    -plt.title(r'Gradient descent example for Ridge')
    -plt.show()
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    - +

    The right-hand side is computed with the old values of \( \theta \).

    -









    -

    Using gradient descent methods, limitations

    +

    If we can compute these matrices, in particular the Hessian, the above is often the easiest method to implement.
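A minimal Newton-Raphson sketch for the two-parameter logistic model discussed above (synthetic data; we solve the linear system instead of forming the inverse of the Hessian explicitly):

import numpy as np

rng = np.random.default_rng(11)
n = 500
x = rng.uniform(-2, 2, n)
y = (rng.uniform(0, 1, n) < 1/(1 + np.exp(-(0.5 + 2.0*x)))).astype(int)

X = np.column_stack((np.ones(n), x))   # design matrix with intercept column
theta = np.zeros(2)

for iteration in range(10):
    p = 1/(1 + np.exp(-X @ theta))
    gradient = -X.T @ (y - p)
    hessian = X.T @ (X * (p*(1 - p))[:, None])   # X^T W X without forming W explicitly
    theta = theta - np.linalg.solve(hessian, gradient)

print(theta)   # should be close to the parameters used to generate the data

For this convex problem a handful of Newton iterations is normally enough.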

    -
      -
    • Gradient descent (GD) finds local minima of our function. Since the GD algorithm is deterministic, if it converges, it will converge to a local minimum of our cost/loss/risk function. Because in ML we are often dealing with extremely rugged landscapes with many local minima, this can lead to poor performance.
    • -
    • GD is sensitive to initial conditions. One consequence of the local nature of GD is that initial conditions matter. Depending on where one starts, one will end up at a different local minima. Therefore, it is very important to think about how one initializes the training process. This is true for GD as well as more complicated variants of GD.
    • -
    • Gradients are computationally expensive to calculate for large datasets. In many cases in statistics and ML, the cost/loss/risk function is a sum of terms, with one term for each data point. For example, in linear regression, \( E \propto \sum_{i=1}^n (y_i - \mathbf{w}^T\cdot\mathbf{x}_i)^2 \); for logistic regression, the square error is replaced by the cross entropy. To calculate the gradient we have to sum over all \( n \) data points. Doing this at every GD step becomes extremely computationally expensive. An ingenious solution to this, is to calculate the gradients using small subsets of the data called "mini batches". This has the added benefit of introducing stochasticity into our algorithm.
    • -
    • GD is very sensitive to choices of learning rates. GD is extremely sensitive to the choice of learning rates. If the learning rate is very small, the training process take an extremely long time. For larger learning rates, GD can diverge and give poor results. Furthermore, depending on what the local landscape looks like, we have to modify the learning rates to ensure convergence. Ideally, we would adaptively choose the learning rates to match the landscape.
    • -
    • GD treats all directions in parameter space uniformly. Another major drawback of GD is that unlike Newton's method, the learning rate for GD is the same in all directions in parameter space. For this reason, the maximum learning rate is set by the behavior of the steepest direction and this can significantly slow down training. Ideally, we would like to take large steps in flat directions and small steps in steep directions. Since we are exploring rugged landscapes where curvatures change, this requires us to keep track of not only the gradient but second derivatives. The ideal scenario would be to calculate the Hessian but this proves to be too computationally expensive.
    • -
    • GD can take exponential time to escape saddle points, even with random initialization. As we mentioned, GD is extremely sensitive to initial condition since it determines the particular local minimum GD would eventually reach. However, even with a good initialization scheme, through the introduction of randomness, GD can still take exponential time to escape saddle points.
    • -










    -

    Improving gradient descent with momentum

    - -

We discuss here some simple examples where we introduce what is called 'memory' about previous steps, or what is normally called momentum gradient descent. The mathematics is explained below in connection with stochastic gradient descent.
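A minimal sketch of the momentum idea, on an arbitrary one-dimensional quadratic with made-up step size and momentum parameter, looks as follows:

import numpy as np

def df(x):
    """Gradient of the objective f(x) = x^2."""
    return 2.0*x

x = 1.0            # starting point
step_size = 0.1    # learning rate (arbitrary choice)
momentum = 0.3     # momentum parameter (arbitrary choice)
change = 0.0       # running memory of the previous update

for i in range(30):
    new_change = step_size*df(x) + momentum*change   # mix the current gradient with the previous step
    x = x - new_change
    change = new_change
print(x)   # close to the minimum at x = 0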

    +

    Example code for Logistic Regression

    +

    Here we make a class for Logistic regression. The code uses a simple data set and includes both a binary case and a multiclass case.

    @@ -1675,61 +1675,138 @@

    Improving gradient descent wit
    -
    from numpy import asarray
    -from numpy import arange
    -from numpy.random import rand
    -from numpy.random import seed
    -from matplotlib import pyplot
    - 
    -# objective function
    -def objective(x):
    -	return x**2.0
    - 
    -# derivative of objective function
    -def derivative(x):
    -	return x * 2.0
    - 
    -# gradient descent algorithm
    -def gradient_descent(objective, derivative, bounds, n_iter, step_size):
    -	# track all solutions
    -	solutions, scores = list(), list()
    -	# generate an initial point
    -	solution = bounds[:, 0] + rand(len(bounds)) * (bounds[:, 1] - bounds[:, 0])
    -	# run the gradient descent
    -	for i in range(n_iter):
    -		# calculate gradient
    -		gradient = derivative(solution)
    -		# take a step
    -		solution = solution - step_size * gradient
    -		# evaluate candidate point
    -		solution_eval = objective(solution)
    -		# store solution
    -		solutions.append(solution)
    -		scores.append(solution_eval)
    -		# report progress
    -		print('>%d f(%s) = %.5f' % (i, solution, solution_eval))
    -	return [solutions, scores]
    - 
    -# seed the pseudo random number generator
    -seed(4)
    -# define range for input
    -bounds = asarray([[-1.0, 1.0]])
    -# define the total iterations
    -n_iter = 30
    -# define the step size
    -step_size = 0.1
    -# perform the gradient descent search
    -solutions, scores = gradient_descent(objective, derivative, bounds, n_iter, step_size)
    -# sample input range uniformly at 0.1 increments
    -inputs = arange(bounds[0,0], bounds[0,1]+0.1, 0.1)
    -# compute targets
    -results = objective(inputs)
    -# create a line plot of input vs result
    -pyplot.plot(inputs, results)
    -# plot the solutions found
    -pyplot.plot(solutions, scores, '.-', color='red')
    -# show the plot
    -pyplot.show()
    +  
    import numpy as np
    +
    +class LogisticRegression:
    +    """
    +    Logistic Regression for binary and multiclass classification.
    +    """
    +    def __init__(self, lr=0.01, epochs=1000, fit_intercept=True, verbose=False):
    +        self.lr = lr                  # Learning rate for gradient descent
    +        self.epochs = epochs          # Number of iterations
    +        self.fit_intercept = fit_intercept  # Whether to add intercept (bias)
    +        self.verbose = verbose        # Print loss during training if True
    +        self.weights = None
    +        self.multi_class = False      # Will be determined at fit time
    +
    +    def _add_intercept(self, X):
    +        """Add intercept term (column of ones) to feature matrix."""
    +        intercept = np.ones((X.shape[0], 1))
    +        return np.concatenate((intercept, X), axis=1)
    +
    +    def _sigmoid(self, z):
    +        """Sigmoid function for binary logistic."""
    +        return 1 / (1 + np.exp(-z))
    +
    +    def _softmax(self, Z):
    +        """Softmax function for multiclass logistic."""
    +        exp_Z = np.exp(Z - np.max(Z, axis=1, keepdims=True))
    +        return exp_Z / np.sum(exp_Z, axis=1, keepdims=True)
    +
    +    def fit(self, X, y):
    +        """
    +        Train the logistic regression model using gradient descent.
    +        Supports binary (sigmoid) and multiclass (softmax) based on y.
    +        """
    +        X = np.array(X)
    +        y = np.array(y)
    +        n_samples, n_features = X.shape
    +
    +        # Add intercept if needed
    +        if self.fit_intercept:
    +            X = self._add_intercept(X)
    +            n_features += 1
    +
    +        # Determine classes and mode (binary vs multiclass)
    +        unique_classes = np.unique(y)
    +        if len(unique_classes) > 2:
    +            self.multi_class = True
    +        else:
    +            self.multi_class = False
    +
    +        # ----- Multiclass case -----
    +        if self.multi_class:
    +            n_classes = len(unique_classes)
    +            # Map original labels to 0...n_classes-1
    +            class_to_index = {c: idx for idx, c in enumerate(unique_classes)}
    +            y_indices = np.array([class_to_index[c] for c in y])
    +            # Initialize weight matrix (features x classes)
    +            self.weights = np.zeros((n_features, n_classes))
    +
    +            # One-hot encode y
    +            Y_onehot = np.zeros((n_samples, n_classes))
    +            Y_onehot[np.arange(n_samples), y_indices] = 1
    +
    +            # Gradient descent
    +            for epoch in range(self.epochs):
    +                scores = X.dot(self.weights)          # Linear scores (n_samples x n_classes)
    +                probs = self._softmax(scores)        # Probabilities (n_samples x n_classes)
    +                # Compute gradient (features x classes)
    +                gradient = (1 / n_samples) * X.T.dot(probs - Y_onehot)
    +                # Update weights
    +                self.weights -= self.lr * gradient
    +
    +                if self.verbose and epoch % 100 == 0:
    +                    # Compute current loss (categorical cross-entropy)
    +                    loss = -np.sum(Y_onehot * np.log(probs + 1e-15)) / n_samples
    +                    print(f"[Epoch {epoch}] Multiclass loss: {loss:.4f}")
    +
    +        # ----- Binary case -----
    +        else:
    +            # Convert y to 0/1 if not already
    +            if not np.array_equal(unique_classes, [0, 1]):
    +                # Map the two classes to 0 and 1
    +                class0, class1 = unique_classes
    +                y_binary = np.where(y == class1, 1, 0)
    +            else:
    +                y_binary = y.copy().astype(int)
    +
    +            # Initialize weights vector (features,)
    +            self.weights = np.zeros(n_features)
    +
    +            # Gradient descent
    +            for epoch in range(self.epochs):
    +                linear_model = X.dot(self.weights)     # (n_samples,)
    +                probs = self._sigmoid(linear_model)   # (n_samples,)
    +                # Gradient for binary cross-entropy
    +                gradient = (1 / n_samples) * X.T.dot(probs - y_binary)
    +                self.weights -= self.lr * gradient
    +
    +                if self.verbose and epoch % 100 == 0:
    +                    # Compute binary cross-entropy loss
    +                    loss = -np.mean(
    +                        y_binary * np.log(probs + 1e-15) + 
    +                        (1 - y_binary) * np.log(1 - probs + 1e-15)
    +                    )
    +                    print(f"[Epoch {epoch}] Binary loss: {loss:.4f}")
    +
    +    def predict_prob(self, X):
    +        """
    +        Compute probability estimates. Returns a 1D array for binary or
    +        a 2D array (n_samples x n_classes) for multiclass.
    +        """
    +        X = np.array(X)
    +        # Add intercept if the model used it
    +        if self.fit_intercept:
    +            X = self._add_intercept(X)
    +        scores = X.dot(self.weights)
    +        if self.multi_class:
    +            return self._softmax(scores)
    +        else:
    +            return self._sigmoid(scores)
    +
    +    def predict(self, X):
    +        """
    +        Predict class labels for samples in X.
    +        Returns integer class labels (0,1 for binary, or 0...C-1 for multiclass).
    +        """
    +        probs = self.predict_prob(X)
    +        if self.multi_class:
    +            # Choose class with highest probability
    +            return np.argmax(probs, axis=1)
    +        else:
    +            # Threshold at 0.5 for binary
    +            return (probs >= 0.5).astype(int)
     
    @@ -1745,9 +1822,16 @@

    Improving gradient descent wit

    - -









    -

    Same code but now with momentum gradient descent

    +

    The class implements the sigmoid and softmax internally. During fit(), +we check the number of classes: if more than 2, we set +self.multi_class=True and perform multinomial logistic regression. We +one-hot encode the target vector and update a weight matrix with +softmax probabilities. Otherwise, we do standard binary logistic +regression, converting labels to 0/1 if needed and updating a weight +vector. In both cases we use batch gradient descent on the +cross-entropy loss (we add a small epsilon 1e-15 to logs for numerical +stability). Progress (loss) can be printed if verbose=True. +

    @@ -1756,69 +1840,38 @@

    Same code but now with
    -
    from numpy import asarray
    -from numpy import arange
    -from numpy.random import rand
    -from numpy.random import seed
    -from matplotlib import pyplot
    - 
    -# objective function
    -def objective(x):
    -	return x**2.0
    - 
    -# derivative of objective function
    -def derivative(x):
    -	return x * 2.0
    - 
    -# gradient descent algorithm
    -def gradient_descent(objective, derivative, bounds, n_iter, step_size, momentum):
    -	# track all solutions
    -	solutions, scores = list(), list()
    -	# generate an initial point
    -	solution = bounds[:, 0] + rand(len(bounds)) * (bounds[:, 1] - bounds[:, 0])
    -	# keep track of the change
    -	change = 0.0
    -	# run the gradient descent
    -	for i in range(n_iter):
    -		# calculate gradient
    -		gradient = derivative(solution)
    -		# calculate update
    -		new_change = step_size * gradient + momentum * change
    -		# take a step
    -		solution = solution - new_change
    -		# save the change
    -		change = new_change
    -		# evaluate candidate point
    -		solution_eval = objective(solution)
    -		# store solution
    -		solutions.append(solution)
    -		scores.append(solution_eval)
    -		# report progress
    -		print('>%d f(%s) = %.5f' % (i, solution, solution_eval))
    -	return [solutions, scores]
    - 
    -# seed the pseudo random number generator
    -seed(4)
    -# define range for input
    -bounds = asarray([[-1.0, 1.0]])
    -# define the total iterations
    -n_iter = 30
    -# define the step size
    -step_size = 0.1
    -# define momentum
    -momentum = 0.3
    -# perform the gradient descent search with momentum
    -solutions, scores = gradient_descent(objective, derivative, bounds, n_iter, step_size, momentum)
    -# sample input range uniformly at 0.1 increments
    -inputs = arange(bounds[0,0], bounds[0,1]+0.1, 0.1)
    -# compute targets
    -results = objective(inputs)
    -# create a line plot of input vs result
    -pyplot.plot(inputs, results)
    -# plot the solutions found
    -pyplot.plot(solutions, scores, '.-', color='red')
    -# show the plot
    -pyplot.show()
    +  
    # Evaluation Metrics
+# We define helper functions for accuracy and cross-entropy loss. Accuracy is the fraction of correct predictions. For loss, we compute the appropriate cross-entropy:
    +
    +def accuracy_score(y_true, y_pred):
    +    """Accuracy = (# correct predictions) / (total samples)."""
    +    y_true = np.array(y_true)
    +    y_pred = np.array(y_pred)
    +    return np.mean(y_true == y_pred)
    +
    +def binary_cross_entropy(y_true, y_prob):
    +    """
    +    Binary cross-entropy loss.
    +    y_true: true binary labels (0 or 1), y_prob: predicted probabilities for class 1.
    +    """
    +    y_true = np.array(y_true)
    +    y_prob = np.clip(np.array(y_prob), 1e-15, 1-1e-15)
    +    return -np.mean(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))
    +
    +def categorical_cross_entropy(y_true, y_prob):
    +    """
    +    Categorical cross-entropy loss for multiclass.
    +    y_true: true labels (0...C-1), y_prob: array of predicted probabilities (n_samples x C).
    +    """
    +    y_true = np.array(y_true, dtype=int)
    +    y_prob = np.clip(np.array(y_prob), 1e-15, 1-1e-15)
    +    # One-hot encode true labels
    +    n_samples, n_classes = y_prob.shape
    +    one_hot = np.zeros_like(y_prob)
    +    one_hot[np.arange(n_samples), y_true] = 1
    +    # Compute cross-entropy
    +    loss_vec = -np.sum(one_hot * np.log(y_prob), axis=1)
    +    return np.mean(loss_vec)
     
    @@ -1833,133 +1886,12 @@

    Same code but now with

    +

    Synthetic data generation

    - -









    -

    Overview video on Stochastic Gradient Descent

    - -What is Stochastic Gradient Descent - -









    -

    Batches and mini-batches

    - -

    In gradient descent we compute the cost function and its gradient for all data points we have.

    - -

In large-scale applications such as the ILSVRC challenge, the training data can be on the order of millions of examples. Hence, it seems wasteful to compute the full cost function over the entire training set in order to perform only a single parameter update. A very common approach to addressing this challenge is to compute the gradient over batches of the training data. For example, a typical batch could contain a few thousand examples from an entire training set of several millions. This batch is then used to perform a parameter update.

    - -









    -

    Stochastic Gradient Descent (SGD)

    - -

In stochastic gradient descent, the extreme case is when each minibatch contains only a single data point, that is, the parameters are updated using one example at a time.

    - -

    This process is called Stochastic Gradient -Descent (SGD) (or also sometimes on-line gradient descent). This is -relatively less common to see because in practice due to vectorized -code optimizations it can be computationally much more efficient to -evaluate the gradient for 100 examples, than the gradient for one -example 100 times. Even though SGD technically refers to using a -single example at a time to evaluate the gradient, you will hear -people use the term SGD even when referring to mini-batch gradient -descent (i.e. mentions of MGD for “Minibatch Gradient Descent”, or BGD -for “Batch gradient descent” are rare to see), where it is usually -assumed that mini-batches are used. The size of the mini-batch is a -hyperparameter but it is not very common to cross-validate or bootstrap it. It is -usually based on memory constraints (if any), or set to some value, -e.g. 32, 64 or 128. We use powers of 2 in practice because many -vectorized operation implementations work faster when their inputs are -sized in powers of 2. -

    - -

    In our notes with SGD we mean stochastic gradient descent with mini-batches.

    - -









    -

    Stochastic Gradient Descent

    - -

    Stochastic gradient descent (SGD) and variants thereof address some of -the shortcomings of the Gradient descent method discussed above. -

    - -

    The underlying idea of SGD comes from the observation that the cost -function, which we want to minimize, can almost always be written as a -sum over \( n \) data points \( \{\mathbf{x}_i\}_{i=1}^n \), -

    -$$ -C(\mathbf{\beta}) = \sum_{i=1}^n c_i(\mathbf{x}_i, -\mathbf{\beta}). -$$ - - -









    -

    Computation of gradients

    - -

    This in turn means that the gradient can be -computed as a sum over \( i \)-gradients -

    -$$ -\nabla_\beta C(\mathbf{\beta}) = \sum_i^n \nabla_\beta c_i(\mathbf{x}_i, -\mathbf{\beta}). -$$ - -

    Stochasticity/randomness is introduced by only taking the -gradient on a subset of the data called minibatches. If there are \( n \) -data points and the size of each minibatch is \( M \), there will be \( n/M \) -minibatches. We denote these minibatches by \( B_k \) where -\( k=1,\cdots,n/M \). -

    - -









    -

    SGD example

    -

As an example, suppose we have \( 10 \) data points \( (\mathbf{x}_1,\cdots, \mathbf{x}_{10}) \) and we choose a minibatch size of \( M=2 \); then we have \( n/M=5 \) minibatches, each containing two data points. In particular we have \( B_1 = (\mathbf{x}_1,\mathbf{x}_2), \cdots, B_5 = (\mathbf{x}_9,\mathbf{x}_{10}) \). Note that if you choose \( M=n \) you have only a single batch with all data points, and on the other extreme, you may choose \( M=1 \), resulting in a minibatch for each datapoint, i.e. \( B_k = \mathbf{x}_k \).

    - -

The idea is now to approximate the gradient by replacing the sum over all data points with a sum over the data points in one of the minibatches, picked at random in each gradient descent step,

$$
\nabla_{\beta} C(\mathbf{\beta}) = \sum_{i=1}^n \nabla_\beta c_i(\mathbf{x}_i, \mathbf{\beta}) \rightarrow \sum_{i \in B_k} \nabla_\beta c_i(\mathbf{x}_i, \mathbf{\beta}).
$$









    -

    The gradient step

    - -

    Thus a gradient descent step now looks like

$$
\beta_{j+1} = \beta_j - \gamma_j \sum_{i \in B_k} \nabla_\beta c_i(\mathbf{x}_i, \mathbf{\beta})
$$

where \( k \) is picked at random with equal probability from \( [1,n/M] \). An iteration over the number of minibatches (\( n/M \)) is commonly referred to as an epoch. Thus it is typical to choose a number of epochs and for each epoch iterate over the number of minibatches, as exemplified in the code below.

    Binary classification data: Create two Gaussian clusters in 2D. For example, class 0 around mean [-2,-2] and class 1 around [2,2]. +Multiclass data: Create several Gaussian clusters (one per class) spread out in feature space.

    -









    -

    Simple example code

    -
    @@ -1967,20 +1899,84 @@

    Simple example code

    -
    import numpy as np 
    -
    -n = 100 #100 datapoints 
    -M = 5   #size of each minibatch
    -m = int(n/M) #number of minibatches
    -n_epochs = 10 #number of epochs
    -
    -j = 0
    -for epoch in range(1,n_epochs+1):
    -    for i in range(m):
    -        k = np.random.randint(m) #Pick the k-th minibatch at random
    -        #Compute the gradient using the data in minibatch Bk
    -        #Compute new suggestion for 
    -        j += 1
    +  
    import numpy as np
    +
    +def generate_binary_data(n_samples=100, n_features=2, random_state=None):
    +    """
    +    Generate synthetic binary classification data.
    +    Returns (X, y) where X is (n_samples x n_features), y in {0,1}.
    +    """
    +    rng = np.random.RandomState(random_state)
    +    # Half samples for class 0, half for class 1
    +    n0 = n_samples // 2
    +    n1 = n_samples - n0
    +    # Class 0 around mean -2, class 1 around +2
    +    mean0 = -2 * np.ones(n_features)
    +    mean1 =  2 * np.ones(n_features)
    +    X0 = rng.randn(n0, n_features) + mean0
    +    X1 = rng.randn(n1, n_features) + mean1
    +    X = np.vstack((X0, X1))
    +    y = np.array([0]*n0 + [1]*n1)
    +    return X, y
    +
    +def generate_multiclass_data(n_samples=150, n_features=2, n_classes=3, random_state=None):
    +    """
    +    Generate synthetic multiclass data with n_classes Gaussian clusters.
    +    """
    +    rng = np.random.RandomState(random_state)
    +    X = []
    +    y = []
    +    samples_per_class = n_samples // n_classes
    +    for cls in range(n_classes):
    +        # Random cluster center for each class
    +        center = rng.uniform(-5, 5, size=n_features)
    +        Xi = rng.randn(samples_per_class, n_features) + center
    +        yi = [cls] * samples_per_class
    +        X.append(Xi)
    +        y.extend(yi)
    +    X = np.vstack(X)
    +    y = np.array(y)
    +    return X, y
    +
    +
    +# Generate and test on binary data
    +X_bin, y_bin = generate_binary_data(n_samples=200, n_features=2, random_state=42)
    +model_bin = LogisticRegression(lr=0.1, epochs=1000)
    +model_bin.fit(X_bin, y_bin)
    +y_prob_bin = model_bin.predict_prob(X_bin)      # probabilities for class 1
    +y_pred_bin = model_bin.predict(X_bin)           # predicted classes 0 or 1
    +
    +acc_bin = accuracy_score(y_bin, y_pred_bin)
    +loss_bin = binary_cross_entropy(y_bin, y_prob_bin)
    +print(f"Binary Classification - Accuracy: {acc_bin:.2f}, Cross-Entropy Loss: {loss_bin:.2f}")
    +#For multiclass:
    +# Generate and test on multiclass data
    +X_multi, y_multi = generate_multiclass_data(n_samples=300, n_features=2, n_classes=3, random_state=1)
    +model_multi = LogisticRegression(lr=0.1, epochs=1000)
    +model_multi.fit(X_multi, y_multi)
    +y_prob_multi = model_multi.predict_prob(X_multi)     # (n_samples x 3) probabilities
    +y_pred_multi = model_multi.predict(X_multi)          # predicted labels 0,1,2
    +
    +acc_multi = accuracy_score(y_multi, y_pred_multi)
    +loss_multi = categorical_cross_entropy(y_multi, y_prob_multi)
    +print(f"Multiclass Classification - Accuracy: {acc_multi:.2f}, Cross-Entropy Loss: {loss_multi:.2f}")
    +
    +# CSV Export
    +import csv
    +
    +# Export binary results
    +with open('binary_results.csv', mode='w', newline='') as f:
    +    writer = csv.writer(f)
    +    writer.writerow(["TrueLabel", "PredictedLabel"])
    +    for true, pred in zip(y_bin, y_pred_bin):
    +        writer.writerow([true, pred])
    +
    +# Export multiclass results
    +with open('multiclass_results.csv', mode='w', newline='') as f:
    +    writer = csv.writer(f)
    +    writer.writerow(["TrueLabel", "PredictedLabel"])
    +    for true, pred in zip(y_multi, y_pred_multi):
    +        writer.writerow([true, pred])
     
    @@ -1996,1806 +1992,9 @@

    Simple example code

    -

Taking the gradient only on a subset of the data has two important benefits. First, it introduces randomness, which decreases the chance that our optimization scheme gets stuck in a local minimum. Second, if the size of the minibatches is small relative to the number of datapoints (\( M < n \)), the computation of the gradient is much cheaper, since we sum over the datapoints in the \( k \)-th minibatch and not over all \( n \) datapoints.

    - -









    -

    When do we stop?

    - -

A natural question is when do we stop the search for a new minimum? One possibility is to compute the full gradient after a given number of epochs and check if its norm is smaller than some threshold, stopping if it is. Note, however, that a vanishing gradient is also valid for local minima, so this would only tell us that we are close to a local/global minimum. Alternatively, we can evaluate the cost function at this point, store the result and continue the search. If the test kicks in at a later stage, we can compare the values of the cost function and keep the \( \beta \) that gave the lowest value.
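A small, self-contained sketch of this stopping strategy is shown below; the data set, learning rate and tolerance are illustrative assumptions and not values from the text.

import numpy as np

# Small OLS problem so that the stopping test can be run end to end (illustrative data)
rng = np.random.default_rng(0)
n, M = 100, 5                         # number of data points and minibatch size
x = 2 * rng.random((n, 1))
y = 4 + 3 * x + rng.standard_normal((n, 1))
X = np.c_[np.ones((n, 1)), x]

def C(beta):                          # cost function
    return (1.0 / n) * np.sum((y - X @ beta) ** 2)

def grad_C(beta):                     # full gradient
    return (2.0 / n) * X.T @ (X @ beta - y)

beta = rng.standard_normal((2, 1))
eta, tol = 0.1, 1e-6                  # learning rate and gradient-norm threshold (assumed)
best_beta, best_cost = beta.copy(), C(beta)
m = n // M                            # number of minibatches

for epoch in range(200):
    for i in range(m):
        idx = M * rng.integers(m)     # pick a minibatch at random
        xi, yi = X[idx:idx + M], y[idx:idx + M]
        beta -= eta * (2.0 / M) * xi.T @ (xi @ beta - yi)
    if epoch % 10 == 0:
        # With a fixed learning rate the gradient norm rarely gets below tol,
        # so we also store the beta with the lowest cost seen so far
        if np.linalg.norm(grad_C(beta)) < tol:
            break
        cost = C(beta)
        if cost < best_cost:
            best_beta, best_cost = beta.copy(), cost

print(best_cost, best_beta.ravel())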

    - -









    -

    Slightly different approach

    - -

Another approach is to let the step length \( \gamma_j \) depend on the number of epochs in such a way that it becomes very small after a reasonable time, so that we eventually do not move at all. Such approaches are also called scaling or learning-rate scheduling. There are many ways to scale the learning rate; see for example https://towardsdatascience.com/learning-rate-schedules-and-adaptive-learning-rate-methods-for-deep-learning-2c8f433990d1 for a discussion of different scaling functions for the learning rate.

    - -









    -

    Time decay rate

    - -

    As an example, let \( e = 0,1,2,3,\cdots \) denote the current epoch and let \( t_0, t_1 > 0 \) be two fixed numbers. Furthermore, let \( t = e \cdot m + i \) where \( m \) is the number of minibatches and \( i=0,\cdots,m-1 \). Then the function $$\gamma_j(t; t_0, t_1) = \frac{t_0}{t+t_1} $$ goes to zero as the number of epochs gets large. I.e. we start with a step length \( \gamma_j (0; t_0, t_1) = t_0/t_1 \) which decays in time \( t \).

    - -

    In this way we can fix the number of epochs, compute \( \beta \) and -evaluate the cost function at the end. Repeating the computation will -give a different result since the scheme is random by design. Then we -pick the final \( \beta \) that gives the lowest value of the cost -function. -

    - - - -
    -
    -
    -
    -
    -
    import numpy as np 
    -
    -def step_length(t,t0,t1):
    -    return t0/(t+t1)
    -
    -n = 100 #100 datapoints 
    -M = 5   #size of each minibatch
    -m = int(n/M) #number of minibatches
    -n_epochs = 500 #number of epochs
    -t0 = 1.0
    -t1 = 10
    -
    -gamma_j = t0/t1
    -j = 0
    -for epoch in range(1,n_epochs+1):
    -    for i in range(m):
    -        k = np.random.randint(m) #Pick the k-th minibatch at random
    -        #Compute the gradient using the data in minibatch Bk
    -        #Compute new suggestion for beta
    -        t = epoch*m+i
    -        gamma_j = step_length(t,t0,t1)
    -        j += 1
    -
    -print("gamma_j after %d epochs: %g" % (n_epochs,gamma_j))
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    - - -









    -

    Code with a Number of Minibatches which varies

    - -

    In the code here we vary the number of mini-batches.

    - - -
    -
    -
    -
    -
    -
    # Importing various packages
    -from math import exp, sqrt
    -from random import random, seed
    -import numpy as np
    -import matplotlib.pyplot as plt
    -
    -n = 100
    -x = 2*np.random.rand(n,1)
    -y = 4+3*x+np.random.randn(n,1)
    -
    -X = np.c_[np.ones((n,1)), x]
    -XT_X = X.T @ X
    -theta_linreg = np.linalg.inv(X.T @ X) @ (X.T @ y)
    -print("Own inversion")
    -print(theta_linreg)
    -# Hessian matrix
    -H = (2.0/n)* XT_X
    -EigValues, EigVectors = np.linalg.eig(H)
    -print(f"Eigenvalues of Hessian Matrix:{EigValues}")
    -
    -theta = np.random.randn(2,1)
    -eta = 1.0/np.max(EigValues)
    -Niterations = 1000
    -
    -
    -for iter in range(Niterations):
    -    gradients = 2.0/n*X.T @ ((X @ theta)-y)
    -    theta -= eta*gradients
    -print("theta from own gd")
    -print(theta)
    -
    -xnew = np.array([[0],[2]])
    -Xnew = np.c_[np.ones((2,1)), xnew]
    -ypredict = Xnew.dot(theta)
    -ypredict2 = Xnew.dot(theta_linreg)
    -
    -n_epochs = 50
    -M = 5   #size of each minibatch
    -m = int(n/M) #number of minibatches
    -t0, t1 = 5, 50
    -
    -def learning_schedule(t):
    -    return t0/(t+t1)
    -
    -theta = np.random.randn(2,1)
    -
    -for epoch in range(n_epochs):
    -# Can you figure out a better way of setting up the contributions to each batch?
    -    for i in range(m):
    -        random_index = M*np.random.randint(m)
    -        xi = X[random_index:random_index+M]
    -        yi = y[random_index:random_index+M]
    -        gradients = (2.0/M)* xi.T @ ((xi @ theta)-yi)
    -        eta = learning_schedule(epoch*m+i)
    -        theta = theta - eta*gradients
-print("theta from own sgd")
    -print(theta)
    -
    -plt.plot(xnew, ypredict, "r-")
    -plt.plot(xnew, ypredict2, "b-")
    -plt.plot(x, y ,'ro')
    -plt.axis([0,2.0,0, 15.0])
    -plt.xlabel(r'$x$')
    -plt.ylabel(r'$y$')
    -plt.title(r'Random numbers ')
    -plt.show()
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    - - -









    -

    Replace or not

    - -

In the above code, we have used sampling with replacement when setting up the mini-batches. The discussion here may be useful.

    - -









    -

    Momentum based GD

    - -

    The stochastic gradient descent (SGD) is almost always used with a -momentum or inertia term that serves as a memory of the direction we -are moving in parameter space. This is typically implemented as -follows -

    - -$$ -\begin{align} -\mathbf{v}_{t}&=\gamma \mathbf{v}_{t-1}+\eta_{t}\nabla_\theta E(\boldsymbol{\theta}_t) \nonumber \\ -\boldsymbol{\theta}_{t+1}&= \boldsymbol{\theta}_t -\mathbf{v}_{t}, -\label{_auto1} -\end{align} -$$ - -

    where we have introduced a momentum parameter \( \gamma \), with -\( 0\le\gamma\le 1 \), and for brevity we dropped the explicit notation to -indicate the gradient is to be taken over a different mini-batch at -each step. We call this algorithm gradient descent with momentum -(GDM). From these equations, it is clear that \( \mathbf{v}_t \) is a -running average of recently encountered gradients and -\( (1-\gamma)^{-1} \) sets the characteristic time scale for the memory -used in the averaging procedure. Consistent with this, when -\( \gamma=0 \), this just reduces down to ordinary SGD as discussed -earlier. An equivalent way of writing the updates is -

    - -$$ -\Delta \boldsymbol{\theta}_{t+1} = \gamma \Delta \boldsymbol{\theta}_t -\ \eta_{t}\nabla_\theta E(\boldsymbol{\theta}_t), -$$ - -

    where we have defined \( \Delta \boldsymbol{\theta}_{t}= \boldsymbol{\theta}_t-\boldsymbol{\theta}_{t-1} \).
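The two update equations translate directly into code. Below is a minimal sketch on a simple quadratic cost; the cost function and the parameter values are illustrative assumptions.

import numpy as np

# Gradient descent with momentum on the quadratic cost E(theta) = theta^T theta
def gradient(theta):
    return 2.0 * theta

theta = np.array([1.0, -2.0])
v = np.zeros_like(theta)
eta, gamma = 0.1, 0.9                       # learning rate and momentum parameter (assumed values)

for t in range(100):
    v = gamma * v + eta * gradient(theta)   # v_t = gamma*v_{t-1} + eta*grad E(theta_t)
    theta = theta - v                       # theta_{t+1} = theta_t - v_t

print(theta)                                # approaches the minimum at the origin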

    - -









    -

    More on momentum based approaches

    - -

    Let us try to get more intuition from these equations. It is helpful -to consider a simple physical analogy with a particle of mass \( m \) -moving in a viscous medium with drag coefficient \( \mu \) and potential -\( E(\mathbf{w}) \). If we denote the particle's position by \( \mathbf{w} \), -then its motion is described by -

    - -$$ -m {d^2 \mathbf{w} \over dt^2} + \mu {d \mathbf{w} \over dt }= -\nabla_w E(\mathbf{w}). -$$ - -

    We can discretize this equation in the usual way to get

    - -$$ -m { \mathbf{w}_{t+\Delta t}-2 \mathbf{w}_{t} +\mathbf{w}_{t-\Delta t} \over (\Delta t)^2}+\mu {\mathbf{w}_{t+\Delta t}- \mathbf{w}_{t} \over \Delta t} = -\nabla_w E(\mathbf{w}). -$$ - -

    Rearranging this equation, we can rewrite this as

    - -$$ -\Delta \mathbf{w}_{t +\Delta t}= - { (\Delta t)^2 \over m +\mu \Delta t} \nabla_w E(\mathbf{w})+ {m \over m +\mu \Delta t} \Delta \mathbf{w}_t. -$$ - - -









    -

    Momentum parameter

    - -

    Notice that this equation is identical to previous one if we identify -the position of the particle, \( \mathbf{w} \), with the parameters -\( \boldsymbol{\theta} \). This allows us to identify the momentum -parameter and learning rate with the mass of the particle and the -viscous drag as: -

    - -$$ -\gamma= {m \over m +\mu \Delta t }, \qquad \eta = {(\Delta t)^2 \over m +\mu \Delta t}. -$$ - -

    Thus, as the name suggests, the momentum parameter is proportional to -the mass of the particle and effectively provides inertia. -Furthermore, in the large viscosity/small learning rate limit, our -memory time scales as \( (1-\gamma)^{-1} \approx m/(\mu \Delta t) \). -

    - -

Why is momentum useful? SGD momentum helps the gradient descent algorithm gain speed in directions with persistent but small gradients, even in the presence of stochasticity, while suppressing oscillations in high-curvature directions. This becomes especially important in situations where the landscape is shallow and flat in some directions and narrow and steep in others. It has been argued that first-order methods (with appropriate initial conditions) can perform comparably to more expensive second-order methods, especially in the context of complex deep learning models.

    - -

    These beneficial properties of momentum can sometimes become even more -pronounced by using a slight modification of the classical momentum -algorithm called Nesterov Accelerated Gradient (NAG). -

    - -

    In the NAG algorithm, rather than calculating the gradient at the -current parameters, \( \nabla_\theta E(\boldsymbol{\theta}_t) \), one -calculates the gradient at the expected value of the parameters given -our current momentum, \( \nabla_\theta E(\boldsymbol{\theta}_t +\gamma -\mathbf{v}_{t-1}) \). This yields the NAG update rule -

    - -$$ -\begin{align} -\mathbf{v}_{t}&=\gamma \mathbf{v}_{t-1}+\eta_{t}\nabla_\theta E(\boldsymbol{\theta}_t +\gamma \mathbf{v}_{t-1}) \nonumber \\ -\boldsymbol{\theta}_{t+1}&= \boldsymbol{\theta}_t -\mathbf{v}_{t}. -\label{_auto2} -\end{align} -$$ - -

    One of the major advantages of NAG is that it allows for the use of a larger learning rate than GDM for the same choice of \( \gamma \).
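Below is a minimal sketch of NAG on the same simple quadratic cost as above (illustrative assumptions). Note that with the convention \( \boldsymbol{\theta}_{t+1}=\boldsymbol{\theta}_t-\mathbf{v}_t \), the sketch evaluates the gradient at the look-ahead point \( \boldsymbol{\theta}_t-\gamma\mathbf{v}_{t-1} \).

import numpy as np

# Nesterov accelerated gradient on the quadratic cost E(theta) = theta^T theta
def gradient(theta):
    return 2.0 * theta

theta = np.array([1.0, -2.0])
v = np.zeros_like(theta)
eta, gamma = 0.1, 0.9                 # assumed values

for t in range(100):
    lookahead = theta - gamma * v     # where the parameters are currently heading
    v = gamma * v + eta * gradient(lookahead)
    theta = theta - v

print(theta)                          # approaches the minimum at the origin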

    - -









    -

    Second moment of the gradient

    - -

    In stochastic gradient descent, with and without momentum, we still -have to specify a schedule for tuning the learning rates \( \eta_t \) -as a function of time. As discussed in the context of Newton's -method, this presents a number of dilemmas. The learning rate is -limited by the steepest direction which can change depending on the -current position in the landscape. To circumvent this problem, ideally -our algorithm would keep track of curvature and take large steps in -shallow, flat directions and small steps in steep, narrow directions. -Second-order methods accomplish this by calculating or approximating -the Hessian and normalizing the learning rate by the -curvature. However, this is very computationally expensive for -extremely large models. Ideally, we would like to be able to -adaptively change the step size to match the landscape without paying -the steep computational price of calculating or approximating -Hessians. -

    - -

    Recently, a number of methods have been introduced that accomplish -this by tracking not only the gradient, but also the second moment of -the gradient. These methods include AdaGrad, AdaDelta, Root Mean Squared Propagation (RMS-Prop), and -ADAM. -

    - -









    -

    RMS prop

    - -

    In RMS prop, in addition to keeping a running average of the first -moment of the gradient, we also keep track of the second moment -denoted by \( \mathbf{s}_t=\mathbb{E}[\mathbf{g}_t^2] \). The update rule -for RMS prop is given by -

$$
\begin{align}
\mathbf{g}_t &= \nabla_\theta E(\boldsymbol{\theta}) \label{_auto3}\\
\mathbf{s}_t &= \beta \mathbf{s}_{t-1} +(1-\beta)\mathbf{g}_t^2 \nonumber \\
\boldsymbol{\theta}_{t+1} &= \boldsymbol{\theta}_t - \eta_t { \mathbf{g}_t \over \sqrt{\mathbf{s}_t +\epsilon}}, \nonumber
\end{align}
$$

    where \( \beta \) controls the averaging time of the second moment and is -typically taken to be about \( \beta=0.9 \), \( \eta_t \) is a learning rate -typically chosen to be \( 10^{-3} \), and \( \epsilon\sim 10^{-8} \) is a -small regularization constant to prevent divergences. Multiplication -and division by vectors is understood as an element-wise operation. It -is clear from this formula that the learning rate is reduced in -directions where the norm of the gradient is consistently large. This -greatly speeds up the convergence by allowing us to use a larger -learning rate for flat directions. -
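A minimal sketch of these update equations on a simple quadratic cost is shown below (the cost function is an illustrative assumption); a full example combining RMSprop with stochastic gradient descent and automatic differentiation appears later in these notes.

import numpy as np

# RMSprop on the quadratic cost E(theta) = theta^T theta
def gradient(theta):
    return 2.0 * theta

theta = np.array([1.0, -2.0])
s = np.zeros_like(theta)
eta, beta, eps = 1e-3, 0.9, 1e-8              # values suggested in the text

for t in range(5000):
    g = gradient(theta)
    s = beta * s + (1 - beta) * g**2          # running average of the second moment
    theta -= eta * g / np.sqrt(s + eps)       # element-wise rescaled step

print(theta)                                  # ends up close to the minimum at the origin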

    - -









    -

    ADAM optimizer

    - -

A related algorithm is the ADAM optimizer. In ADAM, we keep a running average of both the first and second moment of the gradient and use this information to adaptively change the learning rate for different parameters. The method is efficient when working with large problems involving lots of data and/or parameters. It is a combination of the gradient descent with momentum algorithm and the RMSprop algorithm discussed above.

    - -

    In addition to keeping a running average of the first and -second moments of the gradient -(i.e. \( \mathbf{m}_t=\mathbb{E}[\mathbf{g}_t] \) and -\( \mathbf{s}_t=\mathbb{E}[\mathbf{g}^2_t] \), respectively), ADAM -performs an additional bias correction to account for the fact that we -are estimating the first two moments of the gradient using a running -average (denoted by the hats in the update rule below). The update -rule for ADAM is given by (where multiplication and division are once -again understood to be element-wise operations below) -

    - -$$ -\begin{align} -\mathbf{g}_t &= \nabla_\theta E(\boldsymbol{\theta}) -\label{_auto4}\\ -\mathbf{m}_t &= \beta_1 \mathbf{m}_{t-1} + (1-\beta_1) \mathbf{g}_t \nonumber \\ -\mathbf{s}_t &=\beta_2 \mathbf{s}_{t-1} +(1-\beta_2)\mathbf{g}_t^2 \nonumber \\ -\boldsymbol{\mathbf{m}}_t&={\mathbf{m}_t \over 1-\beta_1^t} \nonumber \\ -\boldsymbol{\mathbf{s}}_t &={\mathbf{s}_t \over1-\beta_2^t} \nonumber \\ -\boldsymbol{\theta}_{t+1}&=\boldsymbol{\theta}_t - \eta_t { \boldsymbol{\mathbf{m}}_t \over \sqrt{\boldsymbol{\mathbf{s}}_t} +\epsilon}, \nonumber \\ -\label{_auto5} -\end{align} -$$ - -

    where \( \beta_1 \) and \( \beta_2 \) set the memory lifetime of the first and -second moment and are typically taken to be \( 0.9 \) and \( 0.99 \) -respectively, and \( \eta \) and \( \epsilon \) are identical to RMSprop. -

    - -

    Like in RMSprop, the effective step size of a parameter depends on the -magnitude of its gradient squared. To understand this better, let us -rewrite this expression in terms of the variance -\( \boldsymbol{\sigma}_t^2 = \boldsymbol{\mathbf{s}}_t - -(\boldsymbol{\mathbf{m}}_t)^2 \). Consider a single parameter \( \theta_t \). The -update rule for this parameter is given by -

    - -$$ -\Delta \theta_{t+1}= -\eta_t { \boldsymbol{m}_t \over \sqrt{\sigma_t^2 + m_t^2 }+\epsilon}. -$$ - - -
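As for RMSprop, a minimal sketch of the ADAM update on a simple quadratic cost is shown below (illustrative assumptions); the full version with stochastic gradient descent appears later in these notes.

import numpy as np

# ADAM on the quadratic cost E(theta) = theta^T theta
def gradient(theta):
    return 2.0 * theta

theta = np.array([1.0, -2.0])
m = np.zeros_like(theta)
s = np.zeros_like(theta)
eta, beta1, beta2, eps = 1e-3, 0.9, 0.99, 1e-8   # values suggested in the text

for t in range(1, 5001):
    g = gradient(theta)
    m = beta1 * m + (1 - beta1) * g               # first moment
    s = beta2 * s + (1 - beta2) * g**2            # second moment
    m_hat = m / (1 - beta1**t)                    # bias corrections
    s_hat = s / (1 - beta2**t)
    theta -= eta * m_hat / (np.sqrt(s_hat) + eps)

print(theta)                                      # ends up close to the minimum at the origin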









    -

    Algorithms and codes for Adagrad, RMSprop and Adam

    - -

    The algorithms we have implemented are well described in the text by Goodfellow, Bengio and Courville, chapter 8.

    - -

    The codes which implement these algorithms are discussed after our presentation of automatic differentiation.

    - -









    -

    Practical tips

    - -
      -
    • Randomize the data when making mini-batches. It is always important to randomly shuffle the data when forming mini-batches. Otherwise, the gradient descent method can fit spurious correlations resulting from the order in which data is presented.
    • -
• Transform your inputs. Learning becomes difficult when our landscape has a mixture of steep and flat directions. One simple trick for mitigating this is to standardize the data by subtracting the mean and normalizing the variance of the input variables (a short standardization sketch follows after this list). Whenever possible, also decorrelate the inputs. To understand why this is helpful, consider the case of linear regression. It is easy to show that for the squared error cost function, the Hessian of the cost function is just the correlation matrix between the inputs. Thus, by standardizing the inputs, we are ensuring that the landscape looks homogeneous in all directions in parameter space. Since most deep networks can be viewed as linear transformations followed by a non-linearity at each layer, we expect this intuition to hold beyond the linear case.
    • -
• Monitor the out-of-sample performance. Always monitor the performance of your model on a validation set (a small portion of the training data that is held out of the training process to serve as a proxy for the test set). If the validation error starts increasing, then the model is beginning to overfit. Terminate the learning process. This early stopping significantly improves performance in many settings.
    • -
• Adaptive optimization methods don't always have good generalization. Recent studies have shown that adaptive methods such as ADAM, RMSProp, and AdaGrad tend to have poor generalization compared to SGD or SGD with momentum, particularly in the high-dimensional limit (i.e. the number of parameters exceeds the number of data points). Although it is not clear at this stage why these methods perform so well in training deep neural networks, simpler procedures like properly-tuned SGD may work as well or better in these applications.
    • -
    -

    Geron's text, see chapter 11, has several interesting discussions.
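To illustrate the input-transformation tip above, here is a minimal sketch of standardizing a design matrix (the data below are illustrative assumptions; scikit-learn's StandardScaler performs the same operation).

import numpy as np

# Standardize inputs: subtract the column means and divide by the column standard
# deviations, using statistics computed on the training data only
rng = np.random.default_rng(0)
X_train = rng.normal(loc=10.0, scale=5.0, size=(100, 3))
X_test = rng.normal(loc=10.0, scale=5.0, size=(20, 3))

mean = X_train.mean(axis=0)
std = X_train.std(axis=0)
X_train_std = (X_train - mean) / std
X_test_std = (X_test - mean) / std    # reuse the training statistics on new data

print(X_train_std.mean(axis=0))       # close to zero
print(X_train_std.std(axis=0))        # close to one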

    - -









    -

    Automatic differentiation

    - -

Automatic differentiation (AD), also called algorithmic differentiation or computational differentiation, is a set of techniques to numerically evaluate the derivative of a function specified by a computer program. AD exploits the fact that every computer program, no matter how complicated, executes a sequence of elementary arithmetic operations (addition, subtraction, multiplication, division, etc.) and elementary functions (exp, log, sin, cos, etc.). By applying the chain rule repeatedly to these operations, derivatives of arbitrary order can be computed automatically, accurately to working precision, and using at most a small constant factor more arithmetic operations than the original program.

    - -

    Automatic differentiation is neither:

    - -
      -
    • Symbolic differentiation, nor
    • -
    • Numerical differentiation (the method of finite differences).
    • -
    -

Symbolic differentiation can lead to inefficient code and faces the difficulty of converting a computer program into a single expression, while numerical differentiation can introduce round-off errors in the discretization process and suffers from cancellation effects.

    - -

    Python has tools for so-called automatic differentiation. -Consider the following example -

    -$$ -f(x) = \sin\left(2\pi x + x^2\right) -$$ - -

    which has the following derivative

    -$$ -f'(x) = \cos\left(2\pi x + x^2\right)\left(2\pi + 2x\right) -$$ - -

    Using autograd we have

    - - - -
    -
    -
    -
    -
    -
    import autograd.numpy as np
    -
    -# To do elementwise differentiation:
    -from autograd import elementwise_grad as egrad 
    -
    -# To plot:
    -import matplotlib.pyplot as plt 
    -
    -
    -def f(x):
    -    return np.sin(2*np.pi*x + x**2)
    -
    -def f_grad_analytic(x):
    -    return np.cos(2*np.pi*x + x**2)*(2*np.pi + 2*x)
    -
    -# Do the comparison:
    -x = np.linspace(0,1,1000)
    -
    -f_grad = egrad(f)
    -
    -computed = f_grad(x)
    -analytic = f_grad_analytic(x)
    -
    -plt.title('Derivative computed from Autograd compared with the analytical derivative')
    -plt.plot(x,computed,label='autograd')
    -plt.plot(x,analytic,label='analytic')
    -
    -plt.xlabel('x')
    -plt.ylabel('y')
    -plt.legend()
    -
    -plt.show()
    -
    -print("The max absolute difference is: %g"%(np.max(np.abs(computed - analytic))))
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    - - - -

    Using autograd

    - -

    Here we -experiment with what kind of functions Autograd is capable -of finding the gradient of. The following Python functions are just -meant to illustrate what Autograd can do, but please feel free to -experiment with other, possibly more complicated, functions as well. -

    - - - -
    -
    -
    -
    -
    -
    import autograd.numpy as np
    -from autograd import grad
    -
    -def f1(x):
    -    return x**3 + 1
    -
    -f1_grad = grad(f1)
    -
    -# Remember to send in float as argument to the computed gradient from Autograd!
    -a = 1.0
    -
    -# See the evaluated gradient at a using autograd:
    -print("The gradient of f1 evaluated at a = %g using autograd is: %g"%(a,f1_grad(a)))
    -
    -# Compare with the analytical derivative, that is f1'(x) = 3*x**2 
    -grad_analytical = 3*a**2
    -print("The gradient of f1 evaluated at a = %g by finding the analytic expression is: %g"%(a,grad_analytical))
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    - - -









    -

    Autograd with more complicated functions

    - -

To differentiate with respect to two (or more) arguments of a Python function, Autograd needs to know with respect to which variable the function is being differentiated.

    - - - -
    -
    -
    -
    -
    -
    import autograd.numpy as np
    -from autograd import grad
    -def f2(x1,x2):
    -    return 3*x1**3 + x2*(x1 - 5) + 1
    -
    -# By sending the argument 0, Autograd will compute the derivative w.r.t the first variable, in this case x1
    -f2_grad_x1 = grad(f2,0)
    -
-# ... and differentiate w.r.t x2 by sending 1 as an additional argument to grad
    -f2_grad_x2 = grad(f2,1)
    -
    -x1 = 1.0
    -x2 = 3.0 
    -
    -print("Evaluating at x1 = %g, x2 = %g"%(x1,x2))
    -print("-"*30)
    -
    -# Compare with the analytical derivatives:
    -
    -# Derivative of f2 w.r.t x1 is: 9*x1**2 + x2:
    -f2_grad_x1_analytical = 9*x1**2 + x2
    -
    -# Derivative of f2 w.r.t x2 is: x1 - 5:
    -f2_grad_x2_analytical = x1 - 5
    -
    -# See the evaluated derivations:
    -print("The derivative of f2 w.r.t x1: %g"%( f2_grad_x1(x1,x2) ))
-print("The analytical derivative of f2 w.r.t x1: %g"%( f2_grad_x1_analytical ))
    -
    -print()
    -
-print("The analytical derivative of f2 w.r.t x2: %g"%( f2_grad_x2_analytical ))
    -print("The analytical derivative of f2 w.r.t x2: %g"%( f2_grad_x2(x1,x2) ))
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    - -

Note that the grad function, used this way, will not produce the full gradient of the function. The true gradient of a function of two or more variables is a vector, where each element is the derivative of the function with respect to one of the variables.
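One way to assemble the full gradient vector from the two partial derivatives computed above is sketched below (assuming autograd is available; the variable names are illustrative).

import autograd.numpy as np
from autograd import grad

def f2(x1, x2):
    return 3*x1**3 + x2*(x1 - 5) + 1

# Stack the two partial derivatives into one gradient vector
x1, x2 = 1.0, 3.0
full_gradient = np.array([grad(f2, 0)(x1, x2), grad(f2, 1)(x1, x2)])
print(full_gradient)                  # analytically: [9*x1**2 + x2, x1 - 5] = [12, -4]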

    - -









    -

    More complicated functions using the elements of their arguments directly

    - - - -
    -
    -
    -
    -
    -
    import autograd.numpy as np
    -from autograd import grad
    -def f3(x): # Assumes x is an array of length 5 or higher
    -    return 2*x[0] + 3*x[1] + 5*x[2] + 7*x[3] + 11*x[4]**2
    -
    -f3_grad = grad(f3)
    -
    -x = np.linspace(0,4,5)
    -
    -# Print the computed gradient:
    -print("The computed gradient of f3 is: ", f3_grad(x))
    -
    -# The analytical gradient is: (2, 3, 5, 7, 22*x[4])
    -f3_grad_analytical = np.array([2, 3, 5, 7, 22*x[4]])
    -
    -# Print the analytical gradient:
    -print("The analytical gradient of f3 is: ", f3_grad_analytical)
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    - -

Note that in this case, when sending an array as input argument, the output from Autograd is another array. This is the true gradient of the function, as opposed to the previous example. By using arrays to represent the variables, the output from Autograd might be easier to work with, as the output is closer to what one would expect from a gradient-evaluating function.

    - - -

    Functions using mathematical functions from Numpy

    - - - -
    -
    -
    -
    -
    -
    import autograd.numpy as np
    -from autograd import grad
    -def f4(x):
    -    return np.sqrt(1+x**2) + np.exp(x) + np.sin(2*np.pi*x)
    -
    -f4_grad = grad(f4)
    -
    -x = 2.7
    -
    -# Print the computed derivative:
    -print("The computed derivative of f4 at x = %g is: %g"%(x,f4_grad(x)))
    -
    -# The analytical derivative is: x/sqrt(1 + x**2) + exp(x) + cos(2*pi*x)*2*pi
    -f4_grad_analytical = x/np.sqrt(1 + x**2) + np.exp(x) + np.cos(2*np.pi*x)*2*np.pi
    -
    -# Print the analytical gradient:
    -print("The analytical gradient of f4 at x = %g is: %g"%(x,f4_grad_analytical))
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    - - -









    -

    More autograd

    - - - -
    -
    -
    -
    -
    -
    import autograd.numpy as np
    -from autograd import grad
    -def f5(x):
    -    if x >= 0:
    -        return x**2
    -    else:
    -        return -3*x + 1
    -
    -f5_grad = grad(f5)
    -
    -x = 2.7
    -
    -# Print the computed derivative:
    -print("The computed derivative of f5 at x = %g is: %g"%(x,f5_grad(x)))
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    - - -









    -

    And with loops

    - - - -
    -
    -
    -
    -
    -
    import autograd.numpy as np
    -from autograd import grad
    -def f6_for(x):
    -    val = 0
    -    for i in range(10):
    -        val = val + x**i
    -    return val
    -
    -def f6_while(x):
    -    val = 0
    -    i = 0
    -    while i < 10:
    -        val = val + x**i
    -        i = i + 1
    -    return val
    -
    -f6_for_grad = grad(f6_for)
    -f6_while_grad = grad(f6_while)
    -
    -x = 0.5
    -
    -# Print the computed derivaties of f6_for and f6_while
    -print("The computed derivative of f6_for at x = %g is: %g"%(x,f6_for_grad(x)))
    -print("The computed derivative of f6_while at x = %g is: %g"%(x,f6_while_grad(x)))
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    - - - -
    -
    -
    -
    -
    -
    import autograd.numpy as np
    -from autograd import grad
    -# Both of the functions are implementation of the sum: sum(x**i) for i = 0, ..., 9
    -# The analytical derivative is: sum(i*x**(i-1)) 
    -f6_grad_analytical = 0
    -for i in range(10):
    -    f6_grad_analytical += i*x**(i-1)
    -
    -print("The analytical derivative of f6 at x = %g is: %g"%(x,f6_grad_analytical))
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    - - -









    -

    Using recursion

    - - -
    -
    -
    -
    -
    -
    import autograd.numpy as np
    -from autograd import grad
    -
    -def f7(n): # Assume that n is an integer
    -    if n == 1 or n == 0:
    -        return 1
    -    else:
    -        return n*f7(n-1)
    -
    -f7_grad = grad(f7)
    -
    -n = 2.0
    -
    -print("The computed derivative of f7 at n = %d is: %g"%(n,f7_grad(n)))
    -
    -# The function f7 is an implementation of the factorial of n.
    -# By using the product rule, one can find that the derivative is:
    -
    -f7_grad_analytical = 0
    -for i in range(int(n)-1):
    -    tmp = 1
    -    for k in range(int(n)-1):
    -        if k != i:
    -            tmp *= (n - k)
    -    f7_grad_analytical += tmp
    -
    -print("The analytical derivative of f7 at n = %d is: %g"%(n,f7_grad_analytical))
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    - -

Note that if n is equal to zero or one, Autograd will give an error message. This message appears when the output is independent of the input.

    - -









    -

    Unsupported functions

    -

Autograd supports many features. However, there are some functions that are not supported (yet) by Autograd.

    - -

    Assigning a value to the variable being differentiated with respect to

    - - -
    -
    -
    -
    -
    -
    import autograd.numpy as np
    -from autograd import grad
    -def f8(x): # Assume x is an array
    -    x[2] = 3
    -    return x*2
    -
    -#f8_grad = grad(f8)
    -
    -#x = 8.4
    -
    -#print("The derivative of f8 is:",f8_grad(x))
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    - -

Here, running this code, Autograd tells us that an 'ArrayBox' does not support item assignment. The item assignment happens when the program tries to assign the value 3 to x[2]. Autograd has implemented the computation of the derivative in such a way that this assignment is not possible.
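A common workaround, sketched below with illustrative names, is to build a new array instead of assigning into the variable being differentiated; the function here also returns a scalar so that grad applies.

import autograd.numpy as np
from autograd import grad

def f8_alternative(x):
    # Build a new array instead of assigning x[2] = 3, and return a scalar
    x_new = np.concatenate((x[:2], np.array([3.0]), x[3:]))
    return np.sum(x_new * 2)

f8_alternative_grad = grad(f8_alternative)

x = np.array([1.0, 2.0, 4.0, 5.0])
print(f8_alternative_grad(x))         # 2 for every entry except the replaced one, which gets 0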

    - -









    -

    The syntax a.dot(b) when finding the dot product

    - - -
    -
    -
    -
    -
    -
    import autograd.numpy as np
    -from autograd import grad
    -def f9(a): # Assume a is an array with 2 elements
    -    b = np.array([1.0,2.0])
    -    return a.dot(b)
    -
    -#f9_grad = grad(f9)
    -
    -#x = np.array([1.0,0.0])
    -
    -#print("The derivative of f9 is:",f9_grad(x))
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    - -

Here we are told that the 'dot' function does not belong to Autograd's version of a Numpy array. To overcome this, an alternative syntax which also computes the dot product can be used:

    - - - -
    -
    -
    -
    -
    -
    import autograd.numpy as np
    -from autograd import grad
    -def f9_alternative(x): # Assume a is an array with 2 elements
    -    b = np.array([1.0,2.0])
    -    return np.dot(x,b) # The same as x_1*b_1 + x_2*b_2
    -
    -f9_alternative_grad = grad(f9_alternative)
    -
    -x = np.array([3.0,0.0])
    -
    -print("The gradient of f9 is:",f9_alternative_grad(x))
    -
    -# The analytical gradient of the dot product of vectors x and b with two elements (x_1,x_2) and (b_1, b_2) respectively
    -# w.r.t x is (b_1, b_2).
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    - - -









    -

    Using Autograd with OLS

    - -

We conclude the part on optimization by showing how we can write codes for linear regression and logistic regression using autograd. The first example shows results with ordinary least squares.

    - - - -
    -
    -
    -
    -
    -
    # Using Autograd to calculate gradients for OLS
    -from random import random, seed
    -import numpy as np
    -import autograd.numpy as np
    -import matplotlib.pyplot as plt
    -from autograd import grad
    -
    -def CostOLS(beta):
    -    return (1.0/n)*np.sum((y-X @ beta)**2)
    -
    -n = 100
    -x = 2*np.random.rand(n,1)
    -y = 4+3*x+np.random.randn(n,1)
    -
    -X = np.c_[np.ones((n,1)), x]
    -XT_X = X.T @ X
    -theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)
    -print("Own inversion")
    -print(theta_linreg)
    -# Hessian matrix
    -H = (2.0/n)* XT_X
    -EigValues, EigVectors = np.linalg.eig(H)
    -print(f"Eigenvalues of Hessian Matrix:{EigValues}")
    -
    -theta = np.random.randn(2,1)
    -eta = 1.0/np.max(EigValues)
    -Niterations = 1000
    -# define the gradient
    -training_gradient = grad(CostOLS)
    -
    -for iter in range(Niterations):
    -    gradients = training_gradient(theta)
    -    theta -= eta*gradients
    -print("theta from own gd")
    -print(theta)
    -
    -xnew = np.array([[0],[2]])
    -Xnew = np.c_[np.ones((2,1)), xnew]
    -ypredict = Xnew.dot(theta)
    -ypredict2 = Xnew.dot(theta_linreg)
    -
    -plt.plot(xnew, ypredict, "r-")
    -plt.plot(xnew, ypredict2, "b-")
    -plt.plot(x, y ,'ro')
    -plt.axis([0,2.0,0, 15.0])
    -plt.xlabel(r'$x$')
    -plt.ylabel(r'$y$')
    -plt.title(r'Random numbers ')
    -plt.show()
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    - - -









    -

    Same code but now with momentum gradient descent

    - - -
    -
    -
    -
    -
    -
    # Using Autograd to calculate gradients for OLS
    -from random import random, seed
    -import numpy as np
    -import autograd.numpy as np
    -import matplotlib.pyplot as plt
    -from autograd import grad
    -
    -def CostOLS(beta):
    -    return (1.0/n)*np.sum((y-X @ beta)**2)
    -
    -n = 100
    -x = 2*np.random.rand(n,1)
    -y = 4+3*x#+np.random.randn(n,1)
    -
    -X = np.c_[np.ones((n,1)), x]
    -XT_X = X.T @ X
    -theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)
    -print("Own inversion")
    -print(theta_linreg)
    -# Hessian matrix
    -H = (2.0/n)* XT_X
    -EigValues, EigVectors = np.linalg.eig(H)
    -print(f"Eigenvalues of Hessian Matrix:{EigValues}")
    -
    -theta = np.random.randn(2,1)
    -eta = 1.0/np.max(EigValues)
    -Niterations = 30
    -
    -# define the gradient
    -training_gradient = grad(CostOLS)
    -
    -for iter in range(Niterations):
    -    gradients = training_gradient(theta)
    -    theta -= eta*gradients
    -    print(iter,gradients[0],gradients[1])
    -print("theta from own gd")
    -print(theta)
    -
    -# Now improve with momentum gradient descent
    -change = 0.0
    -delta_momentum = 0.3
    -for iter in range(Niterations):
    -    # calculate gradient
    -    gradients = training_gradient(theta)
    -    # calculate update
    -    new_change = eta*gradients+delta_momentum*change
    -    # take a step
    -    theta -= new_change
    -    # save the change
    -    change = new_change
    -    print(iter,gradients[0],gradients[1])
-print("theta from own gd with momentum")
    -print(theta)
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    - - -









    -

    But none of these can compete with Newton's method

    - - - -
    -
    -
    -
    -
    -
    # Using Newton's method
    -from random import random, seed
    -import numpy as np
    -import autograd.numpy as np
    -import matplotlib.pyplot as plt
    -from autograd import grad
    -
    -def CostOLS(beta):
    -    return (1.0/n)*np.sum((y-X @ beta)**2)
    -
    -n = 100
    -x = 2*np.random.rand(n,1)
    -y = 4+3*x+np.random.randn(n,1)
    -
    -X = np.c_[np.ones((n,1)), x]
    -XT_X = X.T @ X
    -beta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)
    -print("Own inversion")
    -print(beta_linreg)
    -# Hessian matrix
    -H = (2.0/n)* XT_X
    -# Note that here the Hessian does not depend on the parameters beta
    -invH = np.linalg.pinv(H)
    -EigValues, EigVectors = np.linalg.eig(H)
    -print(f"Eigenvalues of Hessian Matrix:{EigValues}")
    -
    -beta = np.random.randn(2,1)
    -Niterations = 5
    -
    -# define the gradient
    -training_gradient = grad(CostOLS)
    -
    -for iter in range(Niterations):
    -    gradients = training_gradient(beta)
    -    beta -= invH @ gradients
    -    print(iter,gradients[0],gradients[1])
    -print("beta from own Newton code")
    -print(beta)
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    - - -









    -

    Including Stochastic Gradient Descent with Autograd

    -

    In this code we include the stochastic gradient descent approach discussed above. Note here that we specify which argument we are taking the derivative with respect to when using autograd.

    - - - -
    -
    -
    -
    -
    -
    # Using Autograd to calculate gradients using SGD
    -# OLS example
    -from random import random, seed
    -import numpy as np
    -import autograd.numpy as np
    -import matplotlib.pyplot as plt
    -from autograd import grad
    -
    -# Note change from previous example
    -def CostOLS(y,X,theta):
    -    return np.sum((y-X @ theta)**2)
    -
    -n = 100
    -x = 2*np.random.rand(n,1)
    -y = 4+3*x+np.random.randn(n,1)
    -
    -X = np.c_[np.ones((n,1)), x]
    -XT_X = X.T @ X
    -theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)
    -print("Own inversion")
    -print(theta_linreg)
    -# Hessian matrix
    -H = (2.0/n)* XT_X
    -EigValues, EigVectors = np.linalg.eig(H)
    -print(f"Eigenvalues of Hessian Matrix:{EigValues}")
    -
    -theta = np.random.randn(2,1)
    -eta = 1.0/np.max(EigValues)
    -Niterations = 1000
    -
    -# Note that we request the derivative wrt third argument (theta, 2 here)
    -training_gradient = grad(CostOLS,2)
    -
    -for iter in range(Niterations):
    -    gradients = (1.0/n)*training_gradient(y, X, theta)
    -    theta -= eta*gradients
    -print("theta from own gd")
    -print(theta)
    -
    -xnew = np.array([[0],[2]])
    -Xnew = np.c_[np.ones((2,1)), xnew]
    -ypredict = Xnew.dot(theta)
    -ypredict2 = Xnew.dot(theta_linreg)
    -
    -plt.plot(xnew, ypredict, "r-")
    -plt.plot(xnew, ypredict2, "b-")
    -plt.plot(x, y ,'ro')
    -plt.axis([0,2.0,0, 15.0])
    -plt.xlabel(r'$x$')
    -plt.ylabel(r'$y$')
    -plt.title(r'Random numbers ')
    -plt.show()
    -
    -n_epochs = 50
    -M = 5   #size of each minibatch
    -m = int(n/M) #number of minibatches
    -t0, t1 = 5, 50
    -def learning_schedule(t):
    -    return t0/(t+t1)
    -
    -theta = np.random.randn(2,1)
    -
    -for epoch in range(n_epochs):
    -# Can you figure out a better way of setting up the contributions to each batch?
    -    for i in range(m):
    -        random_index = M*np.random.randint(m)
    -        xi = X[random_index:random_index+M]
    -        yi = y[random_index:random_index+M]
    -        gradients = (1.0/M)*training_gradient(yi, xi, theta)
    -        eta = learning_schedule(epoch*m+i)
    -        theta = theta - eta*gradients
-print("theta from own sgd")
    -print(theta)
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    - - -









    -

    Same code but now with momentum gradient descent

    - - -
    -
    -
    -
    -
    -
    # Using Autograd to calculate gradients using SGD
    -# OLS example
    -from random import random, seed
    -import numpy as np
    -import autograd.numpy as np
    -import matplotlib.pyplot as plt
    -from autograd import grad
    -
    -# Note change from previous example
    -def CostOLS(y,X,theta):
    -    return np.sum((y-X @ theta)**2)
    -
    -n = 100
    -x = 2*np.random.rand(n,1)
    -y = 4+3*x+np.random.randn(n,1)
    -
    -X = np.c_[np.ones((n,1)), x]
    -XT_X = X.T @ X
    -theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)
    -print("Own inversion")
    -print(theta_linreg)
    -# Hessian matrix
    -H = (2.0/n)* XT_X
    -EigValues, EigVectors = np.linalg.eig(H)
    -print(f"Eigenvalues of Hessian Matrix:{EigValues}")
    -
    -theta = np.random.randn(2,1)
    -eta = 1.0/np.max(EigValues)
    -Niterations = 100
    -
    -# Note that we request the derivative wrt third argument (theta, 2 here)
    -training_gradient = grad(CostOLS,2)
    -
    -for iter in range(Niterations):
    -    gradients = (1.0/n)*training_gradient(y, X, theta)
    -    theta -= eta*gradients
    -print("theta from own gd")
    -print(theta)
    -
    -
    -n_epochs = 50
    -M = 5   #size of each minibatch
    -m = int(n/M) #number of minibatches
    -t0, t1 = 5, 50
    -def learning_schedule(t):
    -    return t0/(t+t1)
    -
    -theta = np.random.randn(2,1)
    -
    -change = 0.0
    -delta_momentum = 0.3
    -
    -for epoch in range(n_epochs):
    -    for i in range(m):
    -        random_index = M*np.random.randint(m)
    -        xi = X[random_index:random_index+M]
    -        yi = y[random_index:random_index+M]
    -        gradients = (1.0/M)*training_gradient(yi, xi, theta)
    -        eta = learning_schedule(epoch*m+i)
    -        # calculate update
    -        new_change = eta*gradients+delta_momentum*change
    -        # take a step
    -        theta -= new_change
    -        # save the change
    -        change = new_change
-print("theta from own sgd with momentum")
    -print(theta)
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    - - -









    -

    Similar (second order function now) problem but now with AdaGrad

    - - -
    -
    -
    -
    -
    -
    # Using Autograd to calculate gradients using AdaGrad and Stochastic Gradient descent
    -# OLS example
    -from random import random, seed
    -import numpy as np
    -import autograd.numpy as np
    -import matplotlib.pyplot as plt
    -from autograd import grad
    -
    -# Note change from previous example
    -def CostOLS(y,X,theta):
    -    return np.sum((y-X @ theta)**2)
    -
    -n = 1000
    -x = np.random.rand(n,1)
    -y = 2.0+3*x +4*x*x
    -
    -X = np.c_[np.ones((n,1)), x, x*x]
    -XT_X = X.T @ X
    -theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)
    -print("Own inversion")
    -print(theta_linreg)
    -
    -
    -# Note that we request the derivative wrt third argument (theta, 2 here)
    -training_gradient = grad(CostOLS,2)
    -# Define parameters for Stochastic Gradient Descent
    -n_epochs = 50
    -M = 5   #size of each minibatch
    -m = int(n/M) #number of minibatches
    -# Guess for unknown parameters theta
    -theta = np.random.randn(3,1)
    -
    -# Value for learning rate
    -eta = 0.01
    -# Including AdaGrad parameter to avoid possible division by zero
    -delta  = 1e-8
    -for epoch in range(n_epochs):
    -    Giter = 0.0
    -    for i in range(m):
    -        random_index = M*np.random.randint(m)
    -        xi = X[random_index:random_index+M]
    -        yi = y[random_index:random_index+M]
    -        gradients = (1.0/M)*training_gradient(yi, xi, theta)
    -        Giter += gradients*gradients
    -        update = gradients*eta/(delta+np.sqrt(Giter))
    -        theta -= update
    -print("theta from own AdaGrad")
    -print(theta)
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    - -

    Running this code we note an almost perfect agreement with the results from matrix inversion.

    - -









    -

    RMSprop for adaptive learning rate with Stochastic Gradient Descent

    - - -
    -
    -
    -
    -
    -
    # Using Autograd to calculate gradients using RMSprop  and Stochastic Gradient descent
    -# OLS example
    -from random import random, seed
    -import numpy as np
    -import autograd.numpy as np
    -import matplotlib.pyplot as plt
    -from autograd import grad
    -
    -# Note change from previous example
    -def CostOLS(y,X,theta):
    -    return np.sum((y-X @ theta)**2)
    -
    -n = 1000
    -x = np.random.rand(n,1)
    -y = 2.0+3*x +4*x*x# +np.random.randn(n,1)
    -
    -X = np.c_[np.ones((n,1)), x, x*x]
    -XT_X = X.T @ X
    -theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)
    -print("Own inversion")
    -print(theta_linreg)
    -
    -
    -# Note that we request the derivative wrt third argument (theta, 2 here)
    -training_gradient = grad(CostOLS,2)
    -# Define parameters for Stochastic Gradient Descent
    -n_epochs = 50
    -M = 5   #size of each minibatch
    -m = int(n/M) #number of minibatches
    -# Guess for unknown parameters theta
    -theta = np.random.randn(3,1)
    -
    -# Value for learning rate
    -eta = 0.01
    -# Value for parameter rho
    -rho = 0.99
    -# Including AdaGrad parameter to avoid possible division by zero
    -delta  = 1e-8
    -for epoch in range(n_epochs):
    -    Giter = 0.0
    -    for i in range(m):
    -        random_index = M*np.random.randint(m)
    -        xi = X[random_index:random_index+M]
    -        yi = y[random_index:random_index+M]
    -        gradients = (1.0/M)*training_gradient(yi, xi, theta)
    -	# Accumulated gradient
    -	# Scaling with rho the new and the previous results
    -        Giter = (rho*Giter+(1-rho)*gradients*gradients)
    -	# Taking the diagonal only and inverting
    -        update = gradients*eta/(delta+np.sqrt(Giter))
    -	# Hadamard product
    -        theta -= update
    -print("theta from own RMSprop")
    -print(theta)
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    -
    - - -









    -

    And finally ADAM

    - - - -
    -
    -
    -
    -
    -
    # Using Autograd to calculate gradients using RMSprop  and Stochastic Gradient descent
    -# OLS example
    -from random import random, seed
    -import numpy as np
    -import autograd.numpy as np
    -import matplotlib.pyplot as plt
    -from autograd import grad
    -
    -# Note change from previous example
    -def CostOLS(y,X,theta):
    -    return np.sum((y-X @ theta)**2)
    -
    -n = 1000
    -x = np.random.rand(n,1)
    -y = 2.0+3*x +4*x*x# +np.random.randn(n,1)
    -
    -X = np.c_[np.ones((n,1)), x, x*x]
    -XT_X = X.T @ X
    -theta_linreg = np.linalg.pinv(XT_X) @ (X.T @ y)
    -print("Own inversion")
    -print(theta_linreg)
    -
    -
    -# Note that we request the derivative wrt third argument (theta, 2 here)
    -training_gradient = grad(CostOLS,2)
    -# Define parameters for Stochastic Gradient Descent
    -n_epochs = 50
    -M = 5   #size of each minibatch
    -m = int(n/M) #number of minibatches
    -# Guess for unknown parameters theta
    -theta = np.random.randn(3,1)
    -
    -# Value for learning rate
    -eta = 0.01
    -# Value for parameters beta1 and beta2, see https://arxiv.org/abs/1412.6980
    -beta1 = 0.9
    -beta2 = 0.999
    -# Including AdaGrad parameter to avoid possible division by zero
    -delta  = 1e-7
    -iter = 0
    -for epoch in range(n_epochs):
    -    first_moment = 0.0
    -    second_moment = 0.0
    -    iter += 1
    -    for i in range(m):
    -        random_index = M*np.random.randint(m)
    -        xi = X[random_index:random_index+M]
    -        yi = y[random_index:random_index+M]
    -        gradients = (1.0/M)*training_gradient(yi, xi, theta)
    -        # Computing moments first
    -        first_moment = beta1*first_moment + (1-beta1)*gradients
    -        second_moment = beta2*second_moment+(1-beta2)*gradients*gradients
    -        # Bias-corrected first and second moments
    -        first_term = first_moment/(1.0-beta1**iter)
    -        second_term = second_moment/(1.0-beta2**iter)
    -        # Elementwise update scaled by the adaptive learning rate
    -        update = eta*first_term/(np.sqrt(second_term)+delta)
    -        theta -= update
    -print("theta from own ADAM")
    -print(theta)
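
    A small remark, not in the original notes: the line random_index = M*np.random.randint(m) in the loops above draws minibatches with replacement, so within one epoch some minibatches may be visited several times while others are skipped. A common alternative is to shuffle the data once per epoch and sweep through it. The sketch below shows only that sampling pattern, with illustrative names; the RMSprop or ADAM update itself would go inside the inner loop exactly as in the code above.

    import numpy as np

    rng = np.random.default_rng(2024)   # illustrative seed
    n, M = 1000, 5                      # number of data points and minibatch size
    indices = np.arange(n)

    for epoch in range(2):              # a couple of epochs, for illustration only
        rng.shuffle(indices)            # new random ordering every epoch
        for start in range(0, n, M):
            batch = indices[start:start+M]
            # here one would form xi = X[batch], yi = y[batch]
            # and take the gradient step as above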

    And Logistic Regression

    import autograd.numpy as np
    -from autograd import grad
    -
    -def sigmoid(x):
    -    return 0.5 * (np.tanh(x / 2.) + 1)
    -
    -def logistic_predictions(weights, inputs):
    -    # Outputs probability of a label being true according to logistic model.
    -    return sigmoid(np.dot(inputs, weights))
    -
    -def training_loss(weights):
    -    # Training loss is the negative log-likelihood of the training labels.
    -    preds = logistic_predictions(weights, inputs)
    -    label_probabilities = preds * targets + (1 - preds) * (1 - targets)
    -    return -np.sum(np.log(label_probabilities))
    -
    -# Build a toy dataset.
    -inputs = np.array([[0.52, 1.12,  0.77],
    -                   [0.88, -1.08, 0.15],
    -                   [0.52, 0.06, -1.30],
    -                   [0.74, -2.49, 1.39]])
    -targets = np.array([True, True, False, True])
    -
    -# Define a function that returns gradients of training loss using Autograd.
    -training_gradient_fun = grad(training_loss)
    -
    -# Optimize weights using gradient descent.
    -weights = np.array([0.0, 0.0, 0.0])
    -print("Initial loss:", training_loss(weights))
    -for i in range(100):
    -    weights -= training_gradient_fun(weights) * 0.01
    -
    -print("Trained loss:", training_loss(weights))

    Introducing JAX

    Presently, instead of using Autograd, we recommend using JAX.

    JAX is Autograd and XLA (Accelerated Linear Algebra) brought together for high-performance numerical computing and machine learning research. It provides composable transformations of Python+NumPy programs: differentiate, vectorize, parallelize, Just-In-Time compile to GPU/TPU, and more.

    Here is a simple example of how you can use JAX to compute the derivative of the logistic function.

    import jax.numpy as jnp
    -from jax import grad, jit, vmap
    -
    -def sum_logistic(x):
    -  return jnp.sum(1.0 / (1.0 + jnp.exp(-x)))
    -
    -x_small = jnp.arange(3.)
    -derivative_fn = grad(sum_logistic)
    -print(derivative_fn(x_small))
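
    The example above only uses grad. As a further illustration, not part of the original notes, the other transformations mentioned earlier compose freely with it: jit compiles a function with XLA, and vmap vectorizes it over a batch dimension. The function logistic below is an illustrative helper, not something defined earlier in these notes.

    import jax.numpy as jnp
    from jax import grad, jit, vmap

    def sum_logistic(x):
        return jnp.sum(1.0 / (1.0 + jnp.exp(-x)))

    # JIT-compile the gradient function for fast repeated evaluation
    fast_grad = jit(grad(sum_logistic))
    x_small = jnp.arange(3.)
    print(fast_grad(x_small))

    # vmap applies a scalar function over a batch without explicit Python loops
    def logistic(x):
        return 1.0 / (1.0 + jnp.exp(-x))

    batch = jnp.linspace(-2.0, 2.0, 5)
    print(vmap(logistic)(batch))
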
    - © 1999-2024, Morten Hjorth-Jensen. Released under CC Attribution-NonCommercial 4.0 license
    + © 1999-2025, Morten Hjorth-Jensen. Released under CC Attribution-NonCommercial 4.0 license
    diff --git a/doc/pub/week39/html/week39.html b/doc/pub/week39/html/week39.html
    index 0efaaec20..59032cd58 100644
    --- a/doc/pub/week39/html/week39.html
    +++ b/doc/pub/week39/html/week39.html
    @@ -8,8 +8,8 @@
    -
    -Week 39: Optimization and Gradient Methods
    +
    +Week 39: Resampling methods and logistic regression